[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-wzhe06--SparkCTR":3,"tool-wzhe06--SparkCTR":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":79,"owner_website":82,"owner_url":83,"languages":84,"stars":93,"forks":94,"last_commit_at":95,"license":96,"difficulty_score":97,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":107,"github_topics":108,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":116,"updated_at":117,"faqs":118,"releases":154},3615,"wzhe06\u002FSparkCTR","SparkCTR","CTR prediction model based on spark(LR, GBDT, DNN)","SparkCTR 是一个基于 Apache Spark MLlib 构建的点击率（CTR）预测模型库，专为大规模广告和推荐场景设计。它致力于解决在海量数据环境下，如何高效、准确地预估用户点击行为的核心难题，帮助业务方优化广告投放策略与推荐效果。\n\n该工具特别适合大数据开发工程师、算法研究人员以及需要快速验证 CTR 模型的企业团队使用。其最大的技术亮点在于“纯净”的实现方式：完全依赖 Spark 原生能力，无需引入任何第三方深度学习框架或额外依赖库，极大地降低了部署门槛和环境配置复杂度。\n\nSparkCTR 不仅涵盖了逻辑回归、随机森林、梯度提升决策树（GBDT）等经典机器学习算法，还实现了因子分解机（FM）以及多种前沿的神经网络模型（如 IPNN、OPNN）。更值得一提的是，它支持业界经典的组合模型（如 GBDT+LR），并内置了便捷的模型选择示例，用户只需运行简单指令，即可一次性训练多种模型并自动对比各项性能指标。对于希望在不增加架构负担的前提下，探索从传统统计模型到深度学习方法的用户来说，SparkCTR 提供了一个开箱即用且易于扩展的理想方案。","# CTRmodel\nCTR prediction model based on pure Spark MLlib, no third-party library.\n\n# Realized Models\n* Naive Bayes\n* Logistic Regression\n* Factorization Machine\n* Random Forest\n* Gradient Boosted Decision Tree\n* 
GBDT + LR\n* Neural Network\n* Inner Product Neural Network (IPNN)\n* Outer Product Neural Network (OPNN)\n\n# Usage\nIt's a Maven project. Spark version is 2.3.0. Scala version is 2.11. \u003Cbr \u002F>\nAfter dependencies are imported by Maven automatically, you can simply run the example function (**com.ggstar.example.ModelSelection**) to train all the CTR models and get the metrics comparison among all the models.\n\n# Related Papers on CTR prediction\n* [[LR] Predicting Clicks - Estimating the Click-Through Rate for New Ads (Microsoft 2007)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FClassic%20CTR%20Prediction\u002F%5BLR%5D%20Predicting%20Clicks%20-%20Estimating%20the%20Click-Through%20Rate%20for%20New%20Ads%20%28Microsoft%202007%29.pdf) \u003Cbr \u002F>\n* [[FFM] Field-aware Factorization Machines for CTR Prediction (Criteo 2016)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FClassic%20CTR%20Prediction\u002F%5BFFM%5D%20Field-aware%20Factorization%20Machines%20for%20CTR%20Prediction%20%28Criteo%202016%29.pdf) \u003Cbr \u002F>\n* [[GBDT+LR] Practical Lessons from Predicting Clicks on Ads at Facebook (Facebook 2014)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FClassic%20CTR%20Prediction\u002F%5BGBDT%2BLR%5D%20Practical%20Lessons%20from%20Predicting%20Clicks%20on%20Ads%20at%20Facebook%20%28Facebook%202014%29.pdf) \u003Cbr \u002F>\n* [[PS-PLM] Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction (Alibaba 2017)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FClassic%20CTR%20Prediction\u002F%5BPS-PLM%5D%20Learning%20Piece-wise%20Linear%20Models%20from%20Large%20Scale%20Data%20for%20Ad%20Click%20Prediction%20%28Alibaba%202017%29.pdf) \u003Cbr \u002F>\n* [[FTRL] Ad Click Prediction a View from the Trenches (Google 
2013)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FClassic%20CTR%20Prediction\u002F%5BFTRL%5D%20Ad%20Click%20Prediction%20a%20View%20from%20the%20Trenches%20%28Google%202013%29.pdf) \u003Cbr \u002F>\n* [[FM] Fast Context-aware Recommendations with Factorization Machines (UKON 2011)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FClassic%20CTR%20Prediction\u002F%5BFM%5D%20Fast%20Context-aware%20Recommendations%20with%20Factorization%20Machines%20%28UKON%202011%29.pdf) \u003Cbr \u002F>\n* [[DCN] Deep & Cross Network for Ad Click Predictions (Stanford 2017)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BDCN%5D%20Deep%20%26%20Cross%20Network%20for%20Ad%20Click%20Predictions%20%28Stanford%202017%29.pdf) \u003Cbr \u002F>\n* [[Deep Crossing] Deep Crossing - Web-Scale Modeling without Manually Crafted Combinatorial Features (Microsoft 2016)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BDeep%20Crossing%5D%20Deep%20Crossing%20-%20Web-Scale%20Modeling%20without%20Manually%20Crafted%20Combinatorial%20Features%20%28Microsoft%202016%29.pdf) \u003Cbr \u002F>\n* [[PNN] Product-based Neural Networks for User Response Prediction (SJTU 2016)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BPNN%5D%20Product-based%20Neural%20Networks%20for%20User%20Response%20Prediction%20%28SJTU%202016%29.pdf) \u003Cbr \u002F>\n* [[DIN] Deep Interest Network for Click-Through Rate Prediction (Alibaba 2018)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BDIN%5D%20Deep%20Interest%20Network%20for%20Click-Through%20Rate%20Prediction%20%28Alibaba%202018%29.pdf) \u003Cbr \u002F>\n* [[ESMM] Entire Space Multi-Task Model - 
An Effective Approach for Estimating Post-Click Conversion Rate (Alibaba 2018)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BESMM%5D%20Entire%20Space%20Multi-Task%20Model%20-%20An%20Effective%20Approach%20for%20Estimating%20Post-Click%20Conversion%20Rate%20%28Alibaba%202018%29.pdf) \u003Cbr \u002F>\n* [[Wide & Deep] Wide & Deep Learning for Recommender Systems (Google 2016)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BWide%20%26%20Deep%5D%20Wide%20%26%20Deep%20Learning%20for%20Recommender%20Systems%20%28Google%202016%29.pdf) \u003Cbr \u002F>\n* [[xDeepFM] xDeepFM - Combining Explicit and Implicit Feature Interactions for Recommender Systems (USTC 2018)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BxDeepFM%5D%20xDeepFM%20-%20Combining%20Explicit%20and%20Implicit%20Feature%20Interactions%20for%20Recommender%20Systems%20%28USTC%202018%29.pdf) \u003Cbr \u002F>\n* [[Image CTR] Image Matters - Visually modeling user behaviors using Advanced Model Server (Alibaba 2018)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BImage%20CTR%5D%20Image%20Matters%20-%20Visually%20modeling%20user%20behaviors%20using%20Advanced%20Model%20Server%20%28Alibaba%202018%29.pdf) \u003Cbr \u002F>\n* [[AFM] Attentional Factorization Machines - Learning the Weight of Feature Interactions via Attention Networks (ZJU 2017)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BAFM%5D%20Attentional%20Factorization%20Machines%20-%20Learning%20the%20Weight%20of%20Feature%20Interactions%20via%20Attention%20Networks%20%28ZJU%202017%29.pdf) \u003Cbr \u002F>\n* [[DIEN] Deep Interest Evolution Network for Click-Through 
Rate Prediction (Alibaba 2019)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BDIEN%5D%20Deep%20Interest%20Evolution%20Network%20for%20Click-Through%20Rate%20Prediction%20%28Alibaba%202019%29.pdf) \u003Cbr \u002F>\n* [[DSSM] Learning Deep Structured Semantic Models for Web Search using Clickthrough Data (UIUC 2013)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BDSSM%5D%20Learning%20Deep%20Structured%20Semantic%20Models%20for%20Web%20Search%20using%20Clickthrough%20Data%20%28UIUC%202013%29.pdf) \u003Cbr \u002F>\n* [[FNN] Deep Learning over Multi-field Categorical Data (UCL 2016)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BFNN%5D%20Deep%20Learning%20over%20Multi-field%20Categorical%20Data%20%28UCL%202016%29.pdf) \u003Cbr \u002F>\n* [[DeepFM] A Factorization-Machine based Neural Network for CTR Prediction (HIT-Huawei 2017)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BDeepFM%5D%20A%20Factorization-Machine%20based%20Neural%20Network%20for%20CTR%20Prediction%20%28HIT-Huawei%202017%29.pdf) \u003Cbr \u002F>\n* [[NFM] Neural Factorization Machines for Sparse Predictive Analytics (NUS 2017)](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BNFM%5D%20Neural%20Factorization%20Machines%20for%20Sparse%20Predictive%20Analytics%20%28NUS%202017%29.pdf) \u003Cbr \u002F>\n\n# Other Resources\n* [Papers on Computational Advertising](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers) \u003Cbr \u002F>\n* [Papers on Recommender System](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers) \u003Cbr \u002F>\n\n","# CTR模型\n基于纯 Spark MLlib 的点击率预测模型，不依赖任何第三方库。\n\n# 已实现的模型\n* 朴素贝叶斯\n* 逻辑回归\n* 
因子分解机\n* 随机森林\n* 梯度提升决策树\n* GBDT + LR\n* 神经网络\n* 内积神经网络 (IPNN)\n* 外积神经网络 (OPNN)\n\n# 使用方法\n这是一个 Maven 项目。Spark 版本为 2.3.0，Scala 版本为 2.11。\u003Cbr \u002F>\n通过 Maven 自动导入依赖后，您可以直接运行示例函数 (**com.ggstar.example.ModelSelection**) 来训练所有 CTR 模型，并获取各模型之间的指标对比结果。\n\n# 有关 CTR 预测的相关论文\n* [[LR] 预测点击——估计新广告的点击率（微软，2007）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FClassic%20CTR%20Prediction\u002F%5BLR%5D%20Predicting%20Clicks%20-%20Estimating%20the%20Click-Through%20Rate%20for%20New%20Ads%20%28Microsoft%202007%29.pdf) \u003Cbr \u002F>\n* [[FFM] 面向领域的因子分解机用于点击率预测（Criteo，2016）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FClassic%20CTR%20Prediction\u002F%5BFFM%5D%20Field-aware%20Factorization%20Machines%20for%20CTR%20Prediction%20%28Criteo%202016%29.pdf) \u003Cbr \u002F>\n* [[GBDT+LR] 从 Facebook 广告点击预测中获得的实际经验（Facebook，2014）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FClassic%20CTR%20Prediction\u002F%5BGBDT%2BLR%5D%20Practical%20Lessons%20from%20Predicting%20Clicks%20on%20Ads%20at%20Facebook%20%28Facebook%202014%29.pdf) \u003Cbr \u002F>\n* [[PS-PLM] 从大规模数据中学习分段线性模型用于广告点击预测（阿里巴巴，2017）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FClassic%20CTR%20Prediction\u002F%5BPS-PLM%5D%20Learning%20Piece-wise%20Linear%20Models%20from%20Large%20Scale%20Data%20for%20Ad%20Click%20Prediction%20%28Alibaba%202017%29.pdf) \u003Cbr \u002F>\n* [[FTRL] 广告点击预测：来自一线的视角（谷歌，2013）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FClassic%20CTR%20Prediction\u002F%5BFTRL%5D%20Ad%20Click%20Prediction%20a%20View%20from%20the%20Trenches%20%28Google%202013%29.pdf) \u003Cbr \u002F>\n* [[FM] 
基于因子分解机的快速上下文感知推荐（UKON，2011）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FClassic%20CTR%20Prediction\u002F%5BFM%5D%20Fast%20Context-aware%20Recommendations%20with%20Factorization%20Machines%20%28UKON%202011%29.pdf) \u003Cbr \u002F>\n* [[DCN] 用于广告点击预测的深度与交叉网络（斯坦福大学，2017）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BDCN%5D%20Deep%20%26%20Cross%20Network%20for%20Ad%20Click%20Predictions%20%28Stanford%202017%29.pdf) \u003Cbr \u002F>\n* [[Deep Crossing] Deep Crossing——无需人工设计组合特征的全网规模建模（微软，2016）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BDeep%20Crossing%5D%20Deep%20Crossing%20-%20Web-Scale%20Modeling%20without%20Manually%20Crafted%20Combinatorial%20Features%20%28Microsoft%202016%29.pdf) \u003Cbr \u002F>\n* [[PNN] 基于产品的神经网络用于用户响应预测（上海交通大学，2016）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BPNN%5D%20Product-based%20Neural%20Networks%20for%20User%20Response%20Prediction%20%28SJTU%202016%29.pdf) \u003Cbr \u002F>\n* [[DIN] 用于点击率预测的深度兴趣网络（阿里巴巴，2018）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BDIN%5D%20Deep%20Interest%20Network%20for%20Click-Through%20Rate%20Prediction%20%28Alibaba%202018%29.pdf) \u003Cbr \u002F>\n* [[ESMM] 全空间多任务模型——一种有效估算点击后转化率的方法（阿里巴巴，2018）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BESMM%5D%20Entire%20Space%20Multi-Task%20Model%20-%20An%20Effective%20Approach%20for%20Estimating%20Post-Click%20Conversion%20Rate%20%28Alibaba%202018%29.pdf) \u003Cbr \u002F>\n* [[Wide & Deep] 用于推荐系统的 Wide & Deep 
学习（谷歌，2016）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BWide%20%26%20Deep%5D%20Wide%20%26%20Deep%20Learning%20for%20Recommender%20Systems%20%28Google%202016%29.pdf) \u003Cbr \u002F>\n* [[xDeepFM] xDeepFM——结合显式和隐式特征交互用于推荐系统（中国科学技术大学，2018）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BxDeepFM%5D%20xDeepFM%20-%20Combining%20Explicit%20and%20Implicit%20Feature%20Interactions%20for%20Recommender%20Systems%20%28USTC%202018%29.pdf) \u003Cbr \u002F>\n* [[图像 CTR] 图像很重要——利用高级模型服务器对用户行为进行可视化建模（阿里巴巴，2018）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BImage%20CTR%5D%20Image%20Matters%20-%20Visually%20modeling%20user%20behaviors%20using%20Advanced%20Model%20Server%20%28Alibaba%202018%29.pdf) \u003Cbr \u002F>\n* [[AFM] 注意力因子分解机——通过注意力网络学习特征交互的权重（浙江大学，2017）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BAFM%5D%20Attentional%20Factorization%20Machines%20-%20Learning%20the%20Weight%20of%20Feature%20Interactions%20via%20Attention%20Networks%20%28ZJU%202017%29.pdf) \u003Cbr \u002F>\n* [[DIEN] 用于点击率预测的深度兴趣演化网络（阿里巴巴，2019）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BDIEN%5D%20Deep%20Interest%20Evolution%20Network%20for%20Click-Through%20Rate%20Prediction%20%28Alibaba%202019%29.pdf) \u003Cbr \u002F>\n* [[DSSM] 利用点击数据学习用于网页搜索的深度结构化语义模型（伊利诺伊大学厄巴纳-香槟分校，2013）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BDSSM%5D%20Learning%20Deep%20Structured%20Semantic%20Models%20for%20Web%20Search%20using%20Clickthrough%20Data%20%28UIUC%202013%29.pdf) \u003Cbr \u002F>\n* [[FNN] 
针对多字段分类数据的深度学习（伦敦大学学院，2016）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BFNN%5D%20Deep%20Learning%20over%20Multi-field%20Categorical%20Data%20%28UCL%202016%29.pdf) \u003Cbr \u002F>\n* [[DeepFM] 基于因子分解机的神经网络用于点击率预测（哈尔滨工业大学-华为，2017）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BDeepFM%5D%20A%20Factorization-Machine%20based%20Neural%20Network%20for%20CTR%20Prediction%20%28HIT-Huawei%202017%29.pdf) \u003Cbr \u002F>\n* [[NFM] 用于稀疏预测分析的神经因子分解机（新加坡国立大学，2017）](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers\u002Fblob\u002Fmaster\u002FDeep%20Learning%20CTR%20Prediction\u002F%5BNFM%5D%20Neural%20Factorization%20Machines%20for%20Sparse%20Predictive%20Analytics%20%28NUS%202017%29.pdf) \u003Cbr \u002F>\n\n# 其他资源\n* [计算广告相关论文](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers) \u003Cbr \u002F>\n* [推荐系统相关论文](https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FAd-papers) \u003Cbr \u002F>","# SparkCTR 快速上手指南\n\nSparkCTR 是一个基于纯 Spark MLlib 实现的点击率（CTR）预测模型库，无需任何第三方依赖。它支持从传统的逻辑回归到深度的神经网络等多种主流 CTR 算法。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **JDK**: Java 8 (推荐)\n*   **Build Tool**: Maven 3.x+\n*   **Spark 版本**: 2.3.0\n*   **Scala 版本**: 2.11\n\n> **注意**：本项目严格依赖 Spark 2.3.0 和 Scala 2.11，使用其他版本可能导致兼容性问题。\n\n## 安装步骤\n\n本项目是一个标准的 Maven 工程，无需手动下载 jar 包，只需配置依赖即可自动构建。\n\n1.  
**克隆项目或引入依赖**\n    如果您是将此工具作为依赖集成到自己的项目中，请在 `pom.xml` 中添加以下依赖（假设项目已发布到仓库，若为本地源码则直接导入项目）：\n\n    ```xml\n    \u003Cdependencies>\n        \u003C!-- 请根据实际发布的 groupId 和 artifactId 填写，此处以源码结构为例 -->\n        \u003Cdependency>\n            \u003CgroupId>com.ggstar\u003C\u002FgroupId>\n            \u003CartifactId>ctrmodel\u003C\u002FartifactId>\n            \u003Cversion>1.0.0\u003C\u002Fversion> \n        \u003C\u002Fdependency>\n        \n        \u003C!-- Spark 核心依赖 (版本需严格匹配 2.3.0) -->\n        \u003Cdependency>\n            \u003CgroupId>org.apache.spark\u003C\u002FgroupId>\n            \u003CartifactId>spark-core_2.11\u003C\u002FartifactId>\n            \u003Cversion>2.3.0\u003C\u002Fversion>\n        \u003C\u002Fdependency>\n        \u003Cdependency>\n            \u003CgroupId>org.apache.spark\u003C\u002FgroupId>\n            \u003CartifactId>spark-mllib_2.11\u003C\u002FartifactId>\n            \u003Cversion>2.3.0\u003C\u002Fversion>\n        \u003C\u002Fdependency>\n    \u003C\u002Fdependencies>\n    ```\n\n    *国内加速建议*：建议在 Maven 的 `settings.xml` 中配置阿里云镜像源以加快依赖下载速度：\n    ```xml\n    \u003Cmirror>\n      \u003Cid>aliyunmaven\u003C\u002Fid>\n      \u003CmirrorOf>*\u003C\u002FmirrorOf>\n      \u003Cname>Aliyun Maven\u003C\u002Fname>\n      \u003Curl>https:\u002F\u002Fmaven.aliyun.com\u002Frepository\u002Fpublic\u003C\u002Furl>\n    \u003C\u002Fmirror>\n    ```\n\n2.  **构建项目**\n    在项目根目录下执行以下命令进行编译：\n\n    ```bash\n    mvn clean compile\n    ```\n\n## 基本使用\n\nSparkCTR 提供了一个示例类，可以一键运行并训练所有支持的模型，同时输出各模型的性能指标对比。这是验证环境和理解用法的最佳起点。\n\n**运行示例代码：**\n\n直接运行 `com.ggstar.example.ModelSelection` 类：\n\n```bash\nmvn exec:java -Dexec.mainClass=\"com.ggstar.example.ModelSelection\"\n```\n\n或者在 IDE（如 IntelliJ IDEA）中直接运行该类的 `main` 方法。\n\n**执行流程说明：**\n1.  程序会自动加载示例数据（或您配置的数据路径）。\n2.  
依次训练以下模型：\n    *   Naive Bayes (朴素贝叶斯)\n    *   Logistic Regression (逻辑回归)\n    *   Factorization Machine (因子分解机)\n    *   Random Forest (随机森林)\n    *   Gradient Boosted Decision Tree (GBDT)\n    *   GBDT + LR\n    *   Neural Network (神经网络)\n    *   IPNN \u002F OPNN (内积\u002F外积神经网络)\n3.  控制台将打印各模型的评估指标（如 AUC、LogLoss 等），方便进行模型选型。\n\n**自定义开发：**\n参考 `ModelSelection` 源码，您可以实例化具体的模型类（如 `FactorizationMachine`），传入特征向量和标签 RDD，调用 `fit` 方法进行训练，并使用 `transform` 进行预测。","某大型电商平台的广告算法团队需要在每日 TB 级用户行为日志上，快速迭代并对比多种点击率（CTR）预测模型以优化广告投放策略。\n\n### 没有 SparkCTR 时\n- **环境依赖复杂**：引入深度学习模型（如 PNN、DNN）需额外配置 TensorFlow 或 PyTorch 集群，与现有的纯 Spark 数据流水线割裂，运维成本极高。\n- **模型对比低效**：手动分别编写逻辑回归、GBDT 和神经网络的训练代码，耗时数天才能完成一轮多模型的效果横向评估。\n- **特征工程重复**：不同模型间无法复用特征处理逻辑，每次尝试新算法都要重新清洗和对齐海量稀疏特征。\n- **技术栈不统一**：传统机器学习与深度学习团队使用不同框架，导致代码难以合并，模型上线部署流程冗长且易出错。\n\n### 使用 SparkCTR 后\n- **架构轻量统一**：直接基于纯 Spark MLlib 运行，无需任何第三方库，无缝嵌入现有大数据集群，一键部署从 LR 到 IPNN 等所有主流模型。\n- **自动化模型优选**：通过运行 `ModelSelection` 示例函数，自动并行训练多种算法并输出指标对比报表，将模型筛选周期从数天缩短至小时级。\n- **流程高度复用**：在统一的 Spark DataFrame 管道中完成特征工程，同一套预处理逻辑可直接服务于 GBDT+LR、FM 及各类神经网络模型。\n- **落地门槛降低**：算法工程师仅需调整 Maven 依赖即可调用前沿的 Deep Crossing 或 DIN 模型，实现了从实验到生产环境的平滑过渡。\n\nSparkCTR 通过“零外部依赖”的纯 Spark 实现，让大规模 CTR 模型的研发从繁琐的环境搭建中解放出来，专注于算法效果本身的快速验证与迭代。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwzhe06_SparkCTR_0499bce6.png","wzhe06","Wang Zhe","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fwzhe06_0f390bbb.jpg","Engineering Manager @Bytedance\r\nComputational Advertising",null,"San Francisco Bay Area","wzhe06@gmail.com","https:\u002F\u002Fwzhe.me\u002F","https:\u002F\u002Fgithub.com\u002Fwzhe06",[85,89],{"name":86,"color":87,"percentage":88},"Scala","#c22d40",83,{"name":90,"color":91,"percentage":92},"Java","#b07219",17,924,259,"2026-03-01T11:34:13","Apache-2.0",4,"","未说明 (基于 Spark MLlib，通常依赖 CPU 集群)","未说明",{"notes":102,"python":100,"dependencies":103},"该项目是一个 Maven 项目，完全基于纯 Spark MLlib 实现，无需第三方库。主要运行环境为 Apache Spark 2.3.0 和 Scala 2.11。用户可通过导入 Maven 依赖后，运行示例类 
com.ggstar.example.ModelSelection 来训练模型并对比指标。",[104,105,106],"Apache Spark 2.3.0","Scala 2.11","Spark MLlib",[13],[109,110,111,112,113,114,115],"ctr-prediction","machine-learning","computational-advertising","scala","spark","spark-mllib","spark-ml","2026-03-27T02:49:30.150509","2026-04-06T07:14:25.948040",[119,124,129,134,139,144,149],{"id":120,"question_zh":121,"answer_zh":122,"source_url":123},16575,"如何运行示例函数（如 LR, GBDT, NN 等）？","建议使用 IntelliJ IDEA IDE 导入该项目，并将其设置为 Maven 项目。配置完成后，直接执行 example 包中对应模型的主函数（main function）即可运行。注意：原始数据文件目前不再提供，如果遇到读取 orc 文件的空指针异常，通常是因为缺少对应的数据资源文件。","https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FSparkCTR\u002Fissues\u002F6",{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},16576,"GBDT+LR 模型是否已实现？","本项目中尚未直接实现 GBDT+LR 模型，因为项目仍处于孵化阶段。但可以利用 @titicaca 的开源库来实现 GBDT + LR 的组合，建议参考该外部库进行集成。","https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FSparkCTR\u002Fissues\u002F1",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},16577,"为什么在模型中要对用户特征和物品特征做内积和外积操作？","这是一种特征交叉的方法。由于 User Embedding 通常是基于 Item Embedding 构建的（例如对用户点击过的广告 Item Embedding 取平均），两者具有强相关性。通过内积（inner product）和外积（outer product）计算得出的相似度更能体现用户的兴趣。这只是一个 Toy 模型，实际应用中需要根据具体场景尝试不同的特征工程组合。","https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FSparkCTR\u002Fissues\u002F2",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},16578,"使用 mvn install 打包后找不到 Class 文件怎么办？","该项目目前未实现 Maven 打包的相关配置，因此生成的 jar 包中不包含编译后的 class 文件。如果需要使用 Maven 打包，请开发者自行补充相关的 Maven 插件配置（如 maven-assembly-plugin 或 maven-shade-plugin），并欢迎将解决方案反馈到项目中。","https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FSparkCTR\u002Fissues\u002F3",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},16579,"文档中的 PDF 链接失效了怎么办？","维护者已收到反馈并更新了所有失效的论文和文档链接，请刷新页面或重新查看 README 中的最新链接地址。","https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FSparkCTR\u002Fissues\u002F5",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},16580,"这套框架能支持多大的模型或多少特征？","支持的模型规模和特征数量没有固定上限，主要取决于硬件环境配置以及 Spark 
集群的节点数目。建议在实际工程实践中，根据自身的集群资源进行测试和调整，以确定最佳的性能边界。","https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FSparkCTR\u002Fissues\u002F4",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},16581,"不同 CTR 模型之间的性能对比结果如何？是否有明显优于其他的模型？","理论上针对不同业务场景，没有一种模型能够在所有情况下都胜出。性能对比（如 AUC、PR 曲线等）需要用户基于自己的训练集和测试集数据进行实测，建议根据具体数据分布和业务需求选择合适的模型。","https:\u002F\u002Fgithub.com\u002Fwzhe06\u002FSparkCTR\u002Fissues\u002F7",[]]