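[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Alibaba-NLP--ZeroSearch":3,"tool-Alibaba-NLP--ZeroSearch":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui is a browser-based interface built on Gradio that lets users easily run the powerful Stable Diffusion image-generation models locally. The original models depend on the command line, have a steep learning curve, and scatter their features. This interface solves that by bringing the complex AI image-generation workflow together in one intuitive graphical platform.\n\nCasual creators who want to get started quickly, designers who need fine control over image details, and developers and researchers who want to explore the models' full potential can all benefit. Its core strength is an exceptionally rich feature set. Besides the basic text-to-image, image-to-image, inpainting, and outpainting modes, it pioneered advanced features such as attention weighting, prompt matrix, negative prompts, and \"Hires. fix\". It also ships with face-restoration tools such as GFPGAN and CodeFormer and supports several neural-network upscalers. Its extension system lets users add capabilities almost without limit. Even on GPUs with limited VRAM, stable-diffusion-webui offers optimization options that put high-quality AI art within reach.",162132,3,"2026-04-05T11:01:52",[13,14,15],"Dev Framework","Image","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code is a high-performance optimization system for AI coding assistants such as Claude Code, Codex, and Cursor. More than a set of configuration files, it is a complete framework refined through long real-world use. It targets the core pain points AI agents hit in practical development: low efficiency, memory loss, security risks, and no ability to keep learning.\n\nIt introduces modular skills, instinct enhancement, persistent memory, and built-in security scanning. Together these can significantly improve AI performance on complex tasks and help developers build more stable, smarter production-grade AI agents. Its \"research-first\" development philosophy and token-usage optimizations make model responses faster and cheaper while defending against potential attack vectors.\n\nThe toolkit is especially suited to software developers, AI researchers, and technical teams that want to deeply customize their AI workflows. Whether you are building a large codebase or need AI help with security audits and automated testing, everything-claude-code provides solid underlying support. An open-source project that won an Anthropic hackathon award, it combines multi-language support with a rich set of battle-tested hooks, letting AI truly grow into one that",140436,2,"2026-04-05T23:32:43",[13,15,26],"Language Model",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI is a powerful, highly modular visual AI engine for designing and executing complex Stable Diffusion image-generation pipelines. Instead of writing code, users work in an intuitive node-based flowchart interface and build custom generation pipelines by connecting functional modules.\n\nThis design addresses the complex configuration and limited flexibility of advanced AI 
image-generation workflows. Users with no programming background can freely combine models, tune parameters, and preview results in real time. This covers everything from basic text-to-image to multi-step high-resolution refinement. ComfyUI is highly compatible. It runs on Windows, macOS, and Linux and supports NVIDIA, AMD, Intel, and Apple Silicon hardware. It was also among the first to support cutting-edge models such as SDXL, Flux, and SD3.\n\nComfyUI serves researchers and developers exploring the algorithms' potential as well as designers and seasoned AI art enthusiasts seeking maximum creative freedom. Its modular architecture lets the community keep adding new features. This makes it one of the most flexible open-source diffusion tools, with one of the richest ecosystems, and helps users turn ideas into reality efficiently.",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat is a lightweight, fast AI assistant that offers a smooth, cross-platform experience for interacting with large models. It solves two common pain points: keeping conversations continuous across devices, and managing many AI models in one place. For daily work, study, or creative inspiration, NextChat lets users reach intelligent services anytime, anywhere, through the web, iOS, Android, Windows, macOS, or Linux.\n\nThe tool suits everyday users, students, professionals, and enterprise teams that need private deployment. Developers also get convenient self-hosting, with one-click deployment to platforms such as Vercel or Zeabur.\n\nNextChat's core strength is broad model compatibility. It natively supports mainstream models such as Claude, DeepSeek, GPT-4, and Gemini Pro, so users can switch between AI capabilities in a single interface. It was also an early adopter of MCP (Model Context Protocol), strengthening its context handling. For enterprises, NextChat offers a professional edition with brand customization, fine-grained permission control, internal knowledge-base integration, and security auditing, meeting strict requirements for data privacy and customized management.",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners is a structured introductory machine learning curriculum from Microsoft that helps complete beginners master classic machine learning. The learning path spans 12 weeks, with 26 concise lessons and 52 accompanying quizzes, and covers the full journey from basic concepts to practical application. Beginners who would otherwise face a vast body of knowledge with no structured guidance get a clear place to start.\n\nDevelopers changing careers, researchers who need more algorithmic background, and hobbyists curious about AI can all benefit. The course pairs clear theoretical explanations with hands-on practice, so learners build solid skills step by step. A distinctive highlight is its multilingual support. An automated pipeline provides versions in more than 50 languages, including Simplified Chinese, which greatly lowers the barrier for learners worldwide. The project is developed openly, with an active community and continuously updated content, so learners get current and accurate material. If you are looking for a clear, friendly, and professional path into machine learning, ML-For-Beginners is an ideal starting point.",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"Data Tools","Video","Plugin","Other","Audio",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
is a leading open-source retrieval-augmented generation (RAG) engine that builds a more accurate and reliable context layer for large language models. It combines cutting-edge RAG techniques with agent capabilities. It extracts knowledge efficiently from all kinds of documents and lets models reason and carry out tasks based on that knowledge.\n\nHallucination and stale knowledge are common pain points in LLM applications. RAGFlow deeply parses complex document structures such as tables, charts, and mixed layouts. This markedly improves retrieval accuracy, reduces fabricated answers, and keeps responses grounded and up to date. Its built-in agent mechanism goes further: the system can not only answer questions but also plan steps on its own to solve complex problems.\n\nThe tool is especially suited to developers, enterprise engineering teams, and AI researchers. It helps both those who want to quickly stand up a private knowledge-base Q&A system and those exploring large models in vertical domains. RAGFlow provides a visual workflow orchestration interface and flexible APIs. These lower the barrier for users without an algorithms background while giving professional developers room for deep customization. Released under the Apache 2.0 license, it is becoming an important bridge between general-purpose large models and industry-specific knowledge.",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":79,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":95,"env_os":96,"env_gpu":97,"env_ram":98,"env_deps":99,"category_tags":110,"github_topics":79,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":111,"updated_at":112,"faqs":113,"releases":144},3666,"Alibaba-NLP\u002FZeroSearch","ZeroSearch","ZeroSearch: Incentivize the Search Capability of LLMs without Searching","ZeroSearch is an approach from Alibaba's Tongyi Lab that gives large language models strong retrieval and reasoning capabilities without querying a real search engine during training. Conventional methods typically call a search engine in real time to obtain external information. That is costly and slow and raises privacy and stability concerns. ZeroSearch instead builds a simulated-search mechanism: a specially trained simulation LLM mimics a real search engine and generates high-quality search results locally. Reinforcement learning algorithms (REINFORCE, GRPO, and PPO) then optimize the policy model, which markedly improves the accuracy of the model's answers to factual questions.\n\nThe technique is especially suited to AI researchers, LLM developers, and teams that want to keep their models' knowledge current but are constrained by external APIs. Because training does not connect to a real search engine, the model learns \"how to think about searching\" at lower deployment cost and with better efficiency. ZeroSearch has open-sourced multiple model versions and matching datasets for both the Wikipedia and Google search settings. They are compatible with mainstream architectures such as Qwen and Llama, which makes them easy to integrate and build on. For projects that need efficient knowledge-augmented dialogue systems in resource-constrained environments, ZeroSearch offers a practical and forward-looking solution.","\u003Cdiv align=\"center\">\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_readme_af3b0f5a7196.jpg\" width=\"70%\" 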
height=\"280%\" \u002F>\n\u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\u003Ch1>ZeroSearch: Incentivize the Search Capability of LLMs without Searching\n\u003C\u002Fh1>\n\u003C\u002Fdiv>\n\n\n\u003Cdiv align=\"center\">\n  \u003Ca href='https:\u002F\u002Falibaba-nlp.github.io\u002FZeroSearch\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHomepage-ZeroSearch-6c5ce7?logo=github&logoColor=white'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.04588'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-d63031?logo=arxiv&logoColor=white'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fsunhaonlp\u002Fzerosearch-v2-6827f4ee6b6265069d443d4e'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Models-0984e3'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fsunhaonlp\u002FZeroSearch_dataset'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Datasets-00b894'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fx.com\u002F_akhaliq\u002Fstatus\u002F1920397374007984516'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl?url=https%3A%2F%2Fx.com%2FKevin_GuoweiXu%2Fstatus%2F1858338565463421244'>\u003C\u002Fa>\u003Cbr>\n\u003C\u002Fdiv>\n\n\n\n\u003Cp align=\"center\">\n  \u003Ci>\u003Cb>Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou\u003C\u002Fb>\u003C\u002Fi>\u003Cbr>\n  \u003Ci>\u003Cb>Yong Jiang, Pengjun Xie, Yan Zhang, Fei Huang, Jingren Zhou\u003C\u002Fb>\u003C\u002Fi>\u003Cbr>\n  \u003Ci>Tongyi Lab \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_readme_ad19c9d61b7a.png\" width=\"14px\">, Alibaba Group\u003C\u002Fi>\n\u003C\u002Fp>\n\n\n\n# 🔥 News\n\n- **[2025.06.08]** Released the [simulation 
LLMs](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fsunhaonlp\u002Fsimulation-llm-wiki-v2-6857b06122425526d82a42d4) and [policy models](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fsunhaonlp\u002Fzerosearch-policy-wiki-v2-68442dce61d2e68f6623e500) compatible with Wikipedia Search.\n- **[2025.05.17]** Released the [simulation LLMs](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fsunhaonlp\u002Fsimulation-llm-google-v2-6827f4e45bca955ed2b2d0ba) and [policy models](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fsunhaonlp\u002Fzerosearch-policy-google-v2-6827f4ee6b6265069d443d4e) compatible with Google Search.\n- **[2025.05.17]** Released the [simulation tuning dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fsunhaonlp\u002FSimulationTuning_dataset).\n- **[2025.05.17]** Added support for three RL algorithms: REINFORCE, GRPO, and PPO.\n- **[2025.05.08]** Released the initial codebase and paper.\n\n\n# 🤗 Resources\n\n| Retriever | Simulation Tuning Dataset                                    | Simulation LLMs                                              | Policy Models                                                 |\n| --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |\n| Wikipedia | •[SimulationTuning\\_wiki\\_dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fsunhaonlp\u002FSimulationTuning_wiki_dataset)     | •[Simulation\\_LLM\\_wiki\\_3B\\_V2](https:\u002F\u002Fhuggingface.co\u002Fsunhaonlp\u002FSimulation_LLM_wiki_3B_V2)\u003Cbr>•[Simulation\\_LLM\\_wiki\\_7B\\_V2](https:\u002F\u002Fhuggingface.co\u002Fsunhaonlp\u002FSimulation_LLM_wiki_7B_V2)\u003Cbr>•[Simulation\\_LLM\\_wiki\\_14B\\_V2](https:\u002F\u002Fhuggingface.co\u002Fsunhaonlp\u002FSimulation_LLM_wiki_14B_V2)             | 
•[ZeroSearch\\_wiki\\_V2\\_Qwen2.5\\_3B](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_wiki_V2_Qwen2.5_3B)\u003Cbr>•[ZeroSearch\\_wiki\\_V2\\_Qwen2.5\\_3B\\_Instruct](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_wiki_V2_Qwen2.5_3B_Instruct)\u003Cbr>•[ZeroSearch\\_wiki\\_V2\\_Llama\\_3.2\\_3B](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_wiki_V2_Llama_3.2_3B)\u003Cbr>•[ZeroSearch\\_wiki\\_V2\\_Llama\\_3.2\\_3B\\_Instruct](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_wiki_V2_Llama_3.2_3B_Instruct)\u003Cbr>•[ZeroSearch\\_wiki\\_V2\\_Qwen2.5\\_7B](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_wiki_V2_Qwen2.5_7B)\u003Cbr>•[ZeroSearch\\_wiki\\_V2\\_Qwen2.5\\_7B\\_Instruct](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_wiki_V2_Qwen2.5_7B_Instruct)                         |\n| Google    | •[SimulationTuning\\_google\\_dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fsunhaonlp\u002FSimulationTuning_google_dataset) | •[Simulation\\_LLM\\_google\\_3B\\_V2](https:\u002F\u002Fhuggingface.co\u002Fsunhaonlp\u002FSimulation_LLM_google_3B_V2)\u003Cbr>•[Simulation\\_LLM\\_google\\_7B\\_V2](https:\u002F\u002Fhuggingface.co\u002Fsunhaonlp\u002FSimulation_LLM_google_7B_V2)\u003Cbr>•[Simulation\\_LLM\\_google\\_14B\\_V2](https:\u002F\u002Fhuggingface.co\u002Fsunhaonlp\u002FSimulation_LLM_google_14B_V2) | 
•[ZeroSearch\\_google\\_V2\\_Qwen2.5\\_3B](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_google_V2_Qwen2.5_3B)\u003Cbr>•[ZeroSearch\\_google\\_V2\\_Qwen2.5\\_3B\\_Instruct](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_google_V2_Qwen2.5_3B_Instruct)\u003Cbr>•[ZeroSearch\\_google\\_V2\\_Llama\\_3.2\\_3B](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_google_V2_Llama_3.2_3B)\u003Cbr>•[ZeroSearch\\_google\\_V2\\_Llama\\_3.2\\_3B\\_Instruct](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_google_V2_Llama_3.2_3B_Instruct)\u003Cbr>•[ZeroSearch\\_google\\_V2\\_Qwen2.5\\_7B](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_google_V2_Qwen2.5_7B)\u003Cbr>•[ZeroSearch\\_google\\_V2\\_Qwen2.5\\_7B\\_Instruct](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_google_V2_Qwen2.5_7B_Instruct) |\n\n# 📌 Introduction\n\n- We propose ZeroSearch, a novel reinforcement learning framework that incentivizes the capability of LLMs to use a real search engine with simulated searches during training. \n- Through supervised fine-tuning, we transform the LLM into a retrieval module capable of generating both relevant and noisy documents in response to a query. We further introduce a curriculum rollout mechanism to progressively elicit the model’s reasoning ability by exposing it to increasingly challenging retrieval scenarios.\n- We conduct extensive experiments on both in-domain and out-of-domain datasets. Results show that ZeroSearch outperforms real search engine-based models while incurring zero API cost. 
Moreover, it generalizes well across both base and instruction-tuned LLMs of various sizes and supports different reinforcement learning algorithms.\n\n# 🛠 Dependencies\n\n```bash\nconda create -n zerosearch python=3.9\nconda activate zerosearch\npip install torch==2.4.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\npip install vllm==0.6.3\npip install wandb\npip install serpapi\n\n# verl\npip install -e .\n\n# flash attention 2\npip3 install flash-attn --no-build-isolation\n\n# sglang\n# If you encounter package conflicts when trying to install sglang in the current environment, we recommend creating a new environment and installing sglang there.\npip install sglang[all]\n```\n\n\n# 📖 Quick Start\n(1) Download the training dataset.\n\n```bash\nhuggingface-cli download --repo-type dataset --resume-download sunhaonlp\u002FZeroSearch_dataset --local-dir ZeroSearch_dataset\n\n# (Optional) Download the Simulation Tuning dataset, required only if you want to train your own simulation LLMs\nhuggingface-cli download --repo-type dataset --resume-download sunhaonlp\u002FSimulationTuning_dataset --local-dir SimulationTuning_dataset\n```\n\n(2) Download the simulation LLMs.\n\n```bash\n# Simulation LLMs are available in different parameter sizes. 
Choose the one that best suits your needs.\n# The 14B version is recommended for its stable and reliable simulation performance.\nhuggingface-cli download --resume-download sunhaonlp\u002FSimulation_LLM_google_3B_V2 --local-dir Simulation_LLM_google_3B\n\nhuggingface-cli download --resume-download sunhaonlp\u002FSimulation_LLM_google_7B_V2 --local-dir Simulation_LLM_google_7B\n\nhuggingface-cli download --resume-download sunhaonlp\u002FSimulation_LLM_google_14B_V2 --local-dir Simulation_LLM_google_14B\n```\n\n(3) Launch a local simulation server.\n\n```bash\n# Prompt-based simulation\npython -m sglang.launch_server --model-path Qwen2.5-14B-Instruct --host 0.0.0.0 --tp 2 --dp 2 --port 6001\n\n# Fine-tuning-based simulation\npython -m sglang.launch_server --model-path Simulation_LLM_google_14B --host 0.0.0.0 --tp 2 --dp 2 --port 6001\n```\n\n(4) Conduct RL training with Qwen2.5-3B-Instruct.\n\n```bash\n# Activate the conda environment\nconda activate zerosearch\n\n# Set your Google Search API key\nexport SER_API_KEY=your_api_key\n\n# You can run REINFORCE, GRPO or PPO training using the scripts below.\n# The START_THRESHOLD and END_THRESHOLD parameters define the initial and final difficulty levels of the training tasks. 
Adjusting these values can help optimize model performance.\n\n## Prompt-based simulation\nbash train_reinforce.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\nbash train_grpo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\nbash train_ppo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\n\n## Fine-tuning-based simulation\nbash train_reinforce.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM Simulation_LLM_google_14B START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\nbash train_grpo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM Simulation_LLM_google_14B START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\nbash train_ppo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM Simulation_LLM_google_14B START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\n```\n\n# 💡 Performance\n\n### 📊 Main Results\n\n\u003Cdiv align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_readme_94ec0c7ddc65.jpg\" width=\"80%\" height=\"auto\" 
\u002F>\n\u003C\u002Fdiv>\n\n### 📊 Compare ZeroSearch with Real Search Engine \n\n\u003Cdiv align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_readme_9df1e927655a.jpg\" width=\"80%\" height=\"auto\" \u002F>\n\u003C\u002Fdiv>\n\n### 📊 Choice of Simulation LLMs\n\n\u003Cdiv align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_readme_956c4f5669db.jpg\" width=\"80%\" height=\"auto\" \u002F>\n\u003C\u002Fdiv>\n\n### 📊 Case Study\n\n\u003Cdiv align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_readme_b62fefd51eed.jpg\" width=\"80%\" height=\"auto\" \u002F>\n\u003C\u002Fdiv>\n\n\n# 🙏 Acknowledgements\n\nThis work is implemented based on [Search-R1](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1), [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl), and [RAGEN](https:\u002F\u002Fgithub.com\u002FZihanWang314\u002FRAGEN\u002Ftree\u002Fmain). We sincerely thank the authors of these projects for their valuable contributions to the open-source community.\n\n## 👍 Awesome work inspired by ZeroSearch\n\n- [SSRL](https:\u002F\u002Fgithub.com\u002FTsinghuaC3I\u002FSSRL): SSRL: Self-Search Reinforcement Learning. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTsinghuaC3I\u002FSSRL)](https:\u002F\u002Fgithub.com\u002FTsinghuaC3I\u002FSSRL)\n\n\n# 📧 Contact\n\nIf you have any questions, feel free to reach out to me via email: [sunhao@stu.pku.edu.cn](mailto:sunhao@stu.pku.edu.cn)\n\n## 🚩Citation\n\nIf this work is helpful, please kindly cite as:\n\n```bibtex\n@article{sun2025zerosearch,\n  title={ZeroSearch: Incentivize the Search Capability of LLMs without Searching},\n  author={Sun, Hao and Qiao, Zile and Guo, Jiayan and Fan, Xuanbo and Hou, Yingyan and Jiang, Yong and Xie, Pengjun and Huang, Fei and Zhang, Yan},\n  journal={arXiv preprint arXiv:2505.04588},\n  year={2025}\n}\n```\n","\u003Cdiv align=\"center\">\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_readme_af3b0f5a7196.jpg\" width=\"70%\" height=\"280%\" \u002F>\n\u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\u003Ch1>ZeroSearch: Incentivize the Search Capability of LLMs without Searching\u003C\u002Fh1>\n\u003C\u002Fdiv>\n\n\n\u003Cdiv align=\"center\">\n  \u003Ca href='https:\u002F\u002Falibaba-nlp.github.io\u002FZeroSearch\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHomepage-ZeroSearch-6c5ce7?logo=github&logoColor=white'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.04588'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-d63031?logo=arxiv&logoColor=white'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fsunhaonlp\u002Fzerosearch-v2-6827f4ee6b6265069d443d4e'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Models-0984e3'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fsunhaonlp\u002FZeroSearch_dataset'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Datasets-00b894'>\u003C\u002Fa>\n  
\u003Ca href='https:\u002F\u002Fx.com\u002F_akhaliq\u002Fstatus\u002F1920397374007984516'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl?url=https%3A%2F%2Fx.com%2FKevin_GuoweiXu%2Fstatus%2F1858338565463421244'>\u003C\u002Fa>\u003Cbr>\n\u003C\u002Fdiv>\n\n\n\n\u003Cp align=\"center\">\n  \u003Ci>\u003Cb>Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou\u003C\u002Fb>\u003C\u002Fi>\u003Cbr>\n  \u003Ci>\u003Cb>Yong Jiang, Pengjun Xie, Yan Zhang, Fei Huang, Jingren Zhou\u003C\u002Fb>\u003C\u002Fi>\u003Cbr>\n  \u003Ci>Tongyi Lab \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_readme_ad19c9d61b7a.png\" width=\"14px\">, Alibaba Group\u003C\u002Fi>\n\u003C\u002Fp>\n\n\n\n# 🔥 News\n\n- **[2025.06.08]** Released the [simulation LLMs](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fsunhaonlp\u002Fsimulation-llm-wiki-v2-6857b06122425526d82a42d4) and [policy models](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fsunhaonlp\u002Fzerosearch-policy-wiki-v2-68442dce61d2e68f6623e500) compatible with Wikipedia Search.\n- **[2025.05.17]** Released the [simulation LLMs](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fsunhaonlp\u002Fsimulation-llm-google-v2-6827f4e45bca955ed2b2d0ba) and [policy models](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fsunhaonlp\u002Fzerosearch-policy-google-v2-6827f4ee6b6265069d443d4e) compatible with Google Search.\n- **[2025.05.17]** Released the [simulation tuning dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fsunhaonlp\u002FSimulationTuning_dataset).\n- **[2025.05.17]** Added support for three RL algorithms: REINFORCE, GRPO, and PPO.\n- **[2025.05.08]** Released the initial codebase and paper.\n\n\n# 🤗 Resources\n\n| Retriever | Simulation Tuning Dataset | Simulation LLMs | Policy Models |\n| --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |\n| Wikipedia | •[SimulationTuning_wiki_dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fsunhaonlp\u002FSimulationTuning_wiki_dataset)     | 
•[Simulation_LLM_wiki_3B_V2](https:\u002F\u002Fhuggingface.co\u002Fsunhaonlp\u002FSimulation_LLM_wiki_3B_V2)\u003Cbr>•[Simulation_LLM_wiki_7B_V2](https:\u002F\u002Fhuggingface.co\u002Fsunhaonlp\u002FSimulation_LLM_wiki_7B_V2)\u003Cbr>•[Simulation_LLM_wiki_14B_V2](https:\u002F\u002Fhuggingface.co\u002Fsunhaonlp\u002FSimulation_LLM_wiki_14B_V2)             | •[ZeroSearch_wiki_V2_Qwen2.5_3B](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_wiki_V2_Qwen2.5_3B)\u003Cbr>•[ZeroSearch_wiki_V2_Qwen2.5_3B_Instruct](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_wiki_V2_Qwen2.5_3B_Instruct)\u003Cbr>•[ZeroSearch_wiki_V2_Llama_3.2_3B](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_wiki_V2_Llama_3.2_3B)\u003Cbr>•[ZeroSearch_wiki_V2_Llama_3.2_3B_Instruct](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_wiki_V2_Llama_3.2_3B_Instruct)\u003Cbr>•[ZeroSearch_wiki_V2_Qwen2.5_7B](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_wiki_V2_Qwen2.5_7B)\u003Cbr>•[ZeroSearch_wiki_V2_Qwen2.5_7B_Instruct](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_wiki_V2_Qwen2.5_7B_Instruct)                         |\n| Google    | •[SimulationTuning_google_dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fsunhaonlp\u002FSimulationTuning_google_dataset) | •[Simulation_LLM_google_3B_V2](https:\u002F\u002Fhuggingface.co\u002Fsunhaonlp\u002FSimulation_LLM_google_3B_V2)\u003Cbr>•[Simulation_LLM_google_7B_V2](https:\u002F\u002Fhuggingface.co\u002Fsunhaonlp\u002FSimulation_LLM_google_7B_V2)\u003Cbr>•[Simulation_LLM_google_14B_V2](https:\u002F\u002Fhuggingface.co\u002Fsunhaonlp\u002FSimulation_LLM_google_14B_V2) | 
•[ZeroSearch_google_V2_Qwen2.5_3B](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_google_V2_Qwen2.5_3B)\u003Cbr>•[ZeroSearch_google_V2_Qwen2.5_3B_Instruct](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_google_V2_Qwen2.5_3B_Instruct)\u003Cbr>•[ZeroSearch_google_V2_Llama_3.2_3B](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_google_V2_Llama_3.2_3B)\u003Cbr>•[ZeroSearch_google_V2_Llama_3.2_3B_Instruct](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_google_V2_Llama_3.2_3B_Instruct)\u003Cbr>•[ZeroSearch_google_V2_Qwen2.5_7B](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_google_V2_Qwen2.5_7B)\u003Cbr>•[ZeroSearch_google_V2_Qwen2.5_7B_Instruct](https:\u002F\u002Fhuggingface.co\u002FAlibaba-NLP\u002FZeroSearch_google_V2_Qwen2.5_7B_Instruct) |\n\n# 📌 Introduction\n\n- We propose ZeroSearch, a novel reinforcement learning framework that incentivizes the capability of LLMs to use a real search engine with simulated searches during training.\n- Through supervised fine-tuning, we transform the LLM into a retrieval module capable of generating both relevant and noisy documents in response to a query. We further introduce a curriculum rollout mechanism to progressively elicit the model's reasoning ability by exposing it to increasingly challenging retrieval scenarios.\n- We conduct extensive experiments on both in-domain and out-of-domain datasets. Results show that ZeroSearch outperforms real search engine-based models while incurring zero API cost. Moreover, it generalizes well across both base and instruction-tuned LLMs of various sizes and supports different reinforcement learning algorithms.\n\n# 🛠 Dependencies\n\n```bash\nconda create -n zerosearch python=3.9\nconda activate zerosearch\npip install torch==2.4.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\npip install vllm==0.6.3\npip install wandb\npip install serpapi\n\n# verl\npip install -e .\n\n# flash attention 2\npip3 install flash-attn --no-build-isolation\n\n# sglang\n# If you encounter package conflicts when installing sglang in the current environment, we recommend creating a new environment and installing sglang there.\npip install sglang[all]\n```\n\n\n# 📖 Quick Start\n(1) Download the training dataset.\n\n```bash\nhuggingface-cli download --repo-type dataset --resume-download sunhaonlp\u002FZeroSearch_dataset --local-dir ZeroSearch_dataset\n\n# (Optional) Download the Simulation Tuning dataset, required only if you want to train your own simulation LLMs\nhuggingface-cli download --repo-type dataset --resume-download sunhaonlp\u002FSimulationTuning_dataset --local-dir SimulationTuning_dataset\n```\n\n(2) Download the simulation LLMs.\n\n```bash\n# 
Simulation LLMs are available in different parameter sizes. Choose the one that best suits your needs.\n# The 14B version is recommended for its stable and reliable simulation performance.\nhuggingface-cli download --resume-download sunhaonlp\u002FSimulation_LLM_google_3B_V2 --local-dir Simulation_LLM_google_3B\n\nhuggingface-cli download --resume-download sunhaonlp\u002FSimulation_LLM_google_7B_V2 --local-dir Simulation_LLM_google_7B\n\nhuggingface-cli download --resume-download sunhaonlp\u002FSimulation_LLM_google_14B_V2 --local-dir Simulation_LLM_google_14B\n```\n\n(3) Launch a local simulation server.\n\n```bash\n# Prompt-based simulation\npython -m sglang.launch_server --model-path Qwen2.5-14B-Instruct --host 0.0.0.0 --tp 2 --dp 2 --port 6001\n\n# Fine-tuning-based simulation\npython -m sglang.launch_server --model-path Simulation_LLM_google_14B --host 0.0.0.0 --tp 2 --dp 2 --port 6001\n```\n\n(4) Conduct RL training with Qwen2.5-3B-Instruct.\n\n```bash\n# Activate the conda environment\nconda activate zerosearch\n\n# Set your Google Search API key\nexport SER_API_KEY=your_api_key\n\n# You can run REINFORCE, GRPO or PPO training using the scripts below.\n# The START_THRESHOLD and END_THRESHOLD parameters define the initial and final difficulty levels of the training tasks. Adjusting these values can help optimize model performance.\n\n## Prompt-based simulation\nbash train_reinforce.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\nbash train_grpo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\nbash train_ppo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\n\n## Fine-tuning-based simulation\nbash train_reinforce.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM 
Simulation_LLM_google_14B START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\nbash train_grpo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM Simulation_LLM_google_14B START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\nbash train_ppo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM Simulation_LLM_google_14B START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\n```\n\n# 💡 Performance\n\n### 📊 Main Results\n\n\u003Cdiv align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_readme_94ec0c7ddc65.jpg\" width=\"80%\" height=\"auto\" \u002F>\n\u003C\u002Fdiv>\n\n### 📊 Compare ZeroSearch with Real Search Engine\n\n\u003Cdiv align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_readme_9df1e927655a.jpg\" width=\"80%\" height=\"auto\" \u002F>\n\u003C\u002Fdiv>\n\n### 📊 Choice of Simulation LLMs\n\n\u003Cdiv align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_readme_956c4f5669db.jpg\" width=\"80%\" height=\"auto\" \u002F>\n\u003C\u002Fdiv>\n\n### 📊 Case Study\n\n\u003Cdiv align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_readme_b62fefd51eed.jpg\" width=\"80%\" height=\"auto\" \u002F>\n\u003C\u002Fdiv>\n\n\n# 🙏 Acknowledgements\n\nThis work is implemented based on [Search-R1](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1), [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl), and [RAGEN](https:\u002F\u002Fgithub.com\u002FZihanWang314\u002FRAGEN\u002Ftree\u002Fmain). We sincerely thank the authors of these projects for their valuable contributions to the open-source community.\n\n## 👍 Awesome work inspired by ZeroSearch\n\n- [SSRL](https:\u002F\u002Fgithub.com\u002FTsinghuaC3I\u002FSSRL): SSRL: 
Self-Search Reinforcement Learning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTsinghuaC3I\u002FSSRL)](https:\u002F\u002Fgithub.com\u002FTsinghuaC3I\u002FSSRL)\n\n\n# 📧 Contact\n\nIf you have any questions, feel free to reach out to me via email: [sunhao@stu.pku.edu.cn](mailto:sunhao@stu.pku.edu.cn)\n\n## 🚩Citation\n\nIf this work is helpful, please kindly cite as:\n\n```bibtex\n@article{sun2025zerosearch,\n  title={ZeroSearch: Incentivize the Search Capability of LLMs without Searching},\n  author={Sun, Hao and Qiao, Zile and Guo, Jiayan and Fan, Xuanbo and Hou, Yingyan and Jiang, Yong and Xie, Pengjun and Huang, Fei and Zhang, Yan},\n  journal={arXiv preprint arXiv:2505.04588},\n  year={2025}\n}\n```","# ZeroSearch Quick Start Guide\n\nZeroSearch is a novel reinforcement learning framework that trains the search capability of large language models (LLMs) through simulated search, without calling a real search engine API during training. The approach incurs zero API cost. Across multiple model sizes and RL algorithms, it also outperforms training against a real search engine.\n\n## Environment Setup\n\n### System Requirements\n- **Operating system**: Linux (Ubuntu recommended)\n- **Python version**: 3.9\n- **GPU**: NVIDIA GPU with CUDA 12.1 support\n- **Dependency management**: Conda\n\n### Prerequisites\nThis project depends on core libraries such as `torch`, `vllm`, `sglang`, and `verl`. Make sure the correct CUDA driver is installed in your environment.\n\n## Installation\n\n1. **Create and activate a Conda environment**\n   ```bash\n   conda create -n zerosearch python=3.9\n   conda activate zerosearch\n   ```\n\n2. **Install core dependencies**\n   Install the pinned torch version from the official PyTorch index to ensure compatibility.\n   ```bash\n   pip install torch==2.4.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n   pip install vllm==0.6.3\n   pip install wandb\n   pip install serpapi\n   ```\n\n3. **Install the project code and extensions**\n   After cloning the repository, enter its directory and install `verl` and `flash-attn`.\n   ```bash\n   # Assumes you are in the project root\n   pip install -e .\n\n   # Install flash attention 2\n   pip3 install flash-attn --no-build-isolation\n   ```\n\n4. 
**安装 SGLang**\n   *注意：如果当前环境出现包冲突，建议新建一个独立环境专门安装 sglang。*\n   ```bash\n   pip install sglang[all]\n   ```\n\n## 基本使用\n\n以下是基于 Google 搜索模拟场景，使用 Qwen2.5-3B-Instruct 进行 REINFORCE 算法训练的最小化流程。\n\n### 第一步：下载数据集\n下载主训练数据集。如需自定义训练模拟模型，可选下载微调数据集。\n```bash\nhuggingface-cli download --repo-type dataset --resume-download sunhaonlp\u002FZeroSearch_dataset --local-dir ZeroSearch_dataset\n\n# (可选) 下载模拟微调数据集\nhuggingface-cli download --repo-type dataset --resume-download sunhaonlp\u002FSimulationTuning_dataset --local-dir SimulationTuning_dataset\n```\n\n### 第二步：下载模拟 LLM\n下载用于模拟搜索结果的模型（推荐 14B 版本以获得更稳定的性能）。\n```bash\nhuggingface-cli download --resume-download sunhaonlp\u002FSimulation_LLM_google_14B_V2 --local-dir Simulation_LLM_google_14B\n```\n\n### 第三步：启动本地模拟服务器\n使用 `sglang` 启动模拟服务。这里以微调后的模拟模型为例：\n```bash\npython -m sglang.launch_server --model-path Simulation_LLM_google_14B --host 0.0.0.0 --tp 2 --dp 2 --port 6001\n```\n*注：`--tp` 和 `--dp` 参数需根据您的显卡数量调整。*\n\n### 第四步：执行强化学习训练\n设置环境变量并运行训练脚本。请替换 `\u003Cyour_api_key>` 为您的 Google Search API Key（仅用于评估或特定模式，模拟训练主要依赖本地模型）。\n\n```bash\nconda activate zerosearch\nexport SER_API_KEY=\u003Cyour_api_key>\n\n# 运行 REINFORCE 训练示例\n# 参数说明：\n# SEARCH_MODE: simulate_sft (使用微调模型模拟) 或 simulate_prompt (使用提示词模拟)\n# SIMULATION_LLM: 模拟模型路径\n# START_THRESHOLD\u002FEND_THRESHOLD: 课程学习难度阈值\nbash train_reinforce.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM Simulation_LLM_google_14B START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5\n```\n\n*提示：您也可以通过修改脚本参数轻松切换至 `train_grpo.sh` 或 `train_ppo.sh` 来尝试不同的强化学习算法。*","某电商平台的智能客服团队正致力于升级问答系统，要求模型在回答“最新款手机参数”或“突发政策影响”等动态问题时，必须基于实时检索结果生成准确回复，严禁胡编乱造。\n\n### 没有 ZeroSearch 时\n- **训练成本高昂**：为了教会模型使用搜索引擎，团队需搭建复杂的真实检索环境，每次强化学习训练都要调用真实的 Google 或维基百科 API，导致网络延迟高且产生巨额 API 费用。\n- **开发迭代缓慢**：由于依赖外部真实搜索服务，训练过程极不稳定，一旦搜索接口波动或返回噪声数据，模型策略更新就会失败，调试周期长达数周。\n- 
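**Hard cold start**: The initial model had no sense of when to search and could not formulate effective queries on its own. It collected almost no useful positive-feedback samples in the real environment, so training was hard to get off the ground.\n\n### With ZeroSearch\n- **Zero-cost simulated training**: ZeroSearch uses a simulation LLM that closely mimics the behavior of a real search engine, so the team can run all RL training locally and offline, with no dependence on external APIs and none of their fees.\n- **Fast, stable iteration**: The simulated environment responds quickly and its data distribution is controllable. This supports rapid experimentation with algorithms such as REINFORCE and PPO and shortens a policy-optimization cycle that used to take weeks to a few days.\n- **Search skills that emerge on their own**: Driven by the reward signal from the simulated environment, and without any real search engine involved, the model quickly learns when to trigger a search, how to distill keywords, and how to integrate retrieved snippets. In other words, it learns to search without searching.\n\nZeroSearch's core value is that its high-fidelity simulated environment lets large models develop strong real-time retrieval and information-integration skills on their own, safely, efficiently, and at zero API cost.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlibaba-NLP_ZeroSearch_dd03b042.png","Alibaba-NLP","Tongyi Lab, Alibaba Group","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FAlibaba-NLP_4756c6c9.png","Our team at Tongyi Lab is dedicated to pioneering advancements in AI search technologies.",null,"yongjiang.jy@alibaba-inc.com","https:\u002F\u002Fgithub.com\u002FAlibaba-NLP",[83,87],{"name":84,"color":85,"percentage":86},"Python","#3572A5",98.7,{"name":88,"color":89,"percentage":90},"Shell","#89e051",1.3,1255,114,"2026-04-02T01:08:38","Apache-2.0",4,"Linux","An NVIDIA GPU is required. The install command targets CUDA 12.1 (cu121). The example sglang launch command uses `--tp 2` (tensor parallelism) and `--dp 2` (data parallelism), which implies a multi-GPU setup. Model sizes span 3B\u002F7B\u002F14B. Running the 14B simulation model alongside the policy model being trained calls for 24GB+ of VRAM (e.g., A10\u002FA100\u002FRTX 3090\u002F4090), depending on concurrency and sequence length.","Not specified (32GB+ recommended for loading large models and preprocessing data)",{"notes":100,"python":101,"dependencies":102},"1. Creating a dedicated conda environment is strongly recommended. 2. If installing sglang in the current environment causes package conflicts, install it in a separate new environment. 3. Using Google search requires setting the `SER_API_KEY` environment variable. 4. The project relies on vLLM, SGLang, and Flash Attention 2 for acceleration; make sure your GPU driver is compatible. 5. 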
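Two simulation modes are provided, prompt-based and fine-tuning-based; the latter requires downloading the simulation LLM weights separately.","3.9",[103,104,105,106,107,108,109],"torch==2.4.0","vllm==0.6.3","wandb","serpapi","verl","flash-attn","sglang",[26,13],"2026-03-27T02:49:30.150509","2026-04-06T08:09:03.549329",[114,119,124,129,134,139],{"id":115,"question_zh":116,"answer_zh":117,"source_url":118},16798,"What should I do if I get a vllm version warning or an undefined symbol error when launching the sglang server?","This is a known environment compatibility issue. The current workaround is to launch the service with vllm instead of sglang, replacing `python -m sglang.launch_server` in the launch command with the corresponding vllm launch command. An `undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv` error is usually caused by mismatched torch, vllm, and sglang versions. Reinstall the pytorch and vllm versions that exactly match your CUDA version, or temporarily fall back to using vllm alone for inference.","https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FZeroSearch\u002Fissues\u002F7",{"id":120,"question_zh":121,"answer_zh":122,"source_url":123},16799,"Does the project provide standalone inference code? With the inference script, the model keeps searching until the context overflows. How do I fix this?","The project supports standalone inference. An LLM that loops on search requests until it exhausts its context window (context size) is a corner case. The maintainers have not yet documented a parameter configuration that fully prevents it. As a workaround, cap the number of searches at the application layer, or stop the search process when a repeated search intent is detected.","https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FZeroSearch\u002Fissues\u002F27",{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},16800,"How is the Exact Match (EM) metric computed, and how does it differ from standard exact matching?","The metric is actually substring matching, not strict exact equality. The code uses the `in` operator rather than `==` to check whether the normalized prediction contains the normalized gold answer (`normalize_answer(answer) in normalize_answer(pred)`). The maintainers explain that although the paper calls it EM, every method, baselines included, is evaluated with the same substring matching. This keeps the results consistent with the reproduced baselines and the comparison fair.","https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FZeroSearch\u002Fissues\u002F30",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},16801,"Can I run evaluation or training without a SERPAPI key (for example, as a student)?","Yes. The project's core goal is to simulate the search engine during training rather than rely on a real search engine API at inference time. Without a SERPAPI key, you can evaluate and train with the search simulation model (Search Model \u002F Simulation LLM) the project provides. Make sure the configuration uses the simulation mode (simulate_sft); experiments then run without any real API calls.","https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FZeroSearch\u002Fissues\u002F13",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},16802,"What exactly does the config parameter `actor_rollout_ref.rollout.n_agent` do, and how does it differ from `n`?","`actor_rollout_ref.rollout.n_agent` is usually set to the number of trajectories sampled repeatedly for each input sample (number of repeated trajectories per input). `actor_rollout_ref.rollout.n`, by contrast, controls the number of samples drawn in vllm_rollout.py and is usually set to 1. In short, `n_agent` controls how many rollout trajectories RL samples per question, while `n` is a sampling parameter of the underlying inference engine.","https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FZeroSearch\u002Fissues\u002F26",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},16803,"What should I do if the reward suddenly collapses or drops when training with GRPO?","GRPO training can become unstable and collapse in some cases. The maintainers recommend switching to **REINFORCE**, whose training dynamics proved more stable in their experiments. The recommended stable training configuration is:\n1. **Algorithm**: use REINFORCE instead of GRPO.\n2. **Base model**: Qwen2.5-3B is recommended.\n3. **Simulation model**: the 14B simulation LLM (e.g., Simulation_LLM_google_14B) is recommended; it performs significantly better than the 7B version.\n\nReference training command:\n```bash\nbash train_reinforce.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM Simulation_LLM_google_14B START_THRESHOLD 0 END_THRESHOLD 0.5\n```","https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FZeroSearch\u002Fissues\u002F23",[]]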