[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-PeterGriffinJin--Search-R1":3,"tool-PeterGriffinJin--Search-R1":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":93,"forks":94,"last_commit_at":95,"license":96,"difficulty_score":97,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":115,"github_topics":81,"view_count":23,"oss_zip_url":81,"oss_zip_packed_at":81,"status":16,"created_at":116,"updated_at":117,"faqs":118,"releases":155},3379,"PeterGriffinJin\u002FSearch-R1","Search-R1","Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL","Search-R1 是一个基于强化学习（RL）的高效训练框架，旨在让大语言模型学会“边思考、边搜索”。它解决了传统模型在面对复杂问题时，难以自主协调内部推理与外部工具调用（如搜索引擎）的痛点。通过模拟 DeepSeek-R1 的思路并加以扩展，Search-R1 能让参数量较小的基础模型（如 3B 级别），在无需大量人工标注数据的情况下，自我进化出强大的逻辑推理和实时信息检索能力，被视为 OpenAI DeepResearch 的开源替代方案。\n\n该工具主要面向 AI 研究人员和开发者，特别是那些希望探索工具增强型大模型、复现前沿推理技术或构建自定义搜索智能体的团队。其核心技术亮点在于基于 veRL 构建，支持 PPO、GRPO 等多种主流强化学习算法，并具备极高的灵活性：用户可自由搭配不同的基座模型（如 Llama3、Qwen2.5）以及多种搜索后端（从本地稀疏\u002F稠密检索器到在线搜索引擎）。凭借模块化设计和完整的开源训练流水线，Search-R1 降低了高性能推理模型的研发门槛，助力社区在可解释性与动态知识获取领域取得更多突破。","# Search-R1: Train your LLMs to reason and call a search engine with reinforcement learning\n\n\u003Cdiv align=\"center\">\n  \u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPeterGriffinJin_Search-R1_readme_c3d1724ba2a6.png\" alt=\"logo\" width=\"300\"\u002F>\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.09516\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper1-blue?style=for-the-badge\" alt=\"Button1\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15117\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper2-green?style=for-the-badge\" alt=\"Button2\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FPeterJinGo\u002Fsearch-r1-67d1a021202731cb065740f5\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FResources-orange?style=for-the-badge\" alt=\"Button3\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002FBowenJin13\u002Fstatus\u002F1895544294473109889\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTweet-red?style=for-the-badge\" alt=\"Button4\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.2\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLogs-purple?style=for-the-badge\" alt=\"Button5\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\u003C!-- \u003Cstrong>Search-R1\u003C\u002Fstrong> is a reinforcement learning framework for \u003Cem>training reasoning and searching (tool-call) interleaved LLMs\u003C\u002Fem>.  -->\n\u003C!-- We built upon [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl). 
-->\n**Search-R1** is a reinforcement learning framework designed for training **reasoning-and-searching interleaved LLMs**—language models that learn to reason and make tool calls (e.g., to search engines) in a coordinated manner.\n\n\u003C!-- It can be seen as an extension of \u003Cstrong>DeepSeek-R1(-Zero)\u003C\u002Fstrong> with interleaved search engine calling and an opensource RL training-based solution for \u003Cstrong>OpenAI DeepResearch\u003C\u002Fstrong>. -->\nBuilt upon [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl), Search-R1 extends the ideas of **DeepSeek-R1(-Zero)** by incorporating interleaved search engine access and provides a fully open-source RL training pipeline. It serves as an alternative and open solution to **OpenAI DeepResearch**, enabling research and development in tool-augmented LLM reasoning.\n\n\u003C!-- Through RL (rule-based outcome reward), the 3B **base** LLM (both Qwen2.5-3b-base and Llama3.2-3b-base) develops reasoning and search engine calling abilities all on its own. -->\n\nWe support different RL methods (e.g., PPO, GRPO, reinforce), different LLMs (e.g., llama3, Qwen2.5, etc) and different search engines (e.g., local sparse\u002Fdense retrievers and online search engines).\n\nPaper: [link1](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.09516), [link2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15117); Model and data: [link](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FPeterJinGo\u002Fsearch-r1-67d1a021202731cb065740f5); Twitter thread: [link](https:\u002F\u002Fx.com\u002FBowenJin13\u002Fstatus\u002F1895544294473109889); Full experiment log: [prelim](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-open); [v0.1](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-nq_hotpotqa_train); [v0.2](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.2); [v0.3](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.3). 
Details about these logs and methods can be found [here](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fblob\u002Fmain\u002Fdocs\u002Fexperiment_log.md).\n\n\n![single-turn](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPeterGriffinJin_Search-R1_readme_8541341591f1.png)\n\n## News\n\n- [2025.10] Search-R1 is featured by Thinking Machines Lab's first product [Tinker](https:\u002F\u002Fgithub.com\u002Fthinking-machines-lab\u002Ftinker-cookbook)! Details: [Document](https:\u002F\u002Fgithub.com\u002Fthinking-machines-lab\u002Ftinker-cookbook\u002Ftree\u002Fmain\u002Ftinker_cookbook\u002Frecipes\u002Ftool_use\u002Fsearch).\n- [2025.7] Search-R1 is supported by [SkyRL](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyRL)! Detailed instructions: [code](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyRL\u002Ftree\u002Fmain\u002Fskyrl-train\u002Fexamples\u002Fsearch), [Document](https:\u002F\u002Fnovasky-ai.notion.site\u002Fskyrl-searchr1).\n- [2025.6] Search-R1 is now integrated into the latest version of veRL and can take advantage of its most up-to-date features! Detailed instructions: [veRL](https:\u002F\u002Fverl.readthedocs.io\u002Fen\u002Flatest\u002Fsglang_multiturn\u002Fsearch_tool_example.html), [English Document](https:\u002F\u002Fgithub.com\u002Fzhaochenyang20\u002FAwesome-ML-SYS-Tutorial\u002Fblob\u002Fmain\u002Frlhf\u002Fverl\u002Fmulti-turn\u002Ftool_examples\u002Fverl-multiturn-searchR1-like.md), [Chinese Document](https:\u002F\u002Fgithub.com\u002Fzhaochenyang20\u002FAwesome-ML-SYS-Tutorial\u002Fblob\u002Fmain\u002Frlhf\u002Fverl\u002Fmulti-turn\u002Ftool_examples\u002Fverl-multiturn-searchR1-like_ZH.md).\n- [2025.5] The second [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15117) conducting detailed empirical studies is published with logs: [v0.3](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.3). 
\n- [2025.4] We support [multinode](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fblob\u002Fmain\u002Fdocs\u002Fmultinode.md) training for 30B+ LLMs!\n- [2025.4] We support [different search engines](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fblob\u002Fmain\u002Fdocs\u002Fretriever.md) including sparse local retriever, dense local retriever with ANN indexing and online search engines!\n- [2025.3] The first Search-R1 [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.09516) is published with the logs: [v0.1](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-nq_hotpotqa_train); [v0.2](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.2).\n- [2025.2] We open-sourced the Search-R1 codebase with [preliminary results](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-open).\n\n## Links\n\n- [Installation](#installation)\n- [Quick start](#quick-start)\n- [Preliminary results](#preliminary-results)\n- [Inference](#inference)\n- [Use your own dataset](#use-your-own-dataset)\n- [Use your own search engine](#use-your-own-search-engine)\n- [Features](#features)\n- [Acknowledge](#acknowledge)\n- [Citations](#citations)\n\n## Installation\n\n### Search-r1 environment\n```bash\nconda create -n searchr1 python=3.9\nconda activate searchr1\n# install torch [or you can skip this step and let vllm install the correct version for you]\npip install torch==2.4.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n# install vllm\npip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1\n\n# verl\npip install -e .\n\n# flash attention 2\npip3 install flash-attn --no-build-isolation\npip install wandb\n```\n\n### Retriever environment (optional)\nIf you would like to call a local retriever as the search engine, you can install the environment as follows. 
(We recommend using a separate environment.)\n```bash\nconda create -n retriever python=3.10\nconda activate retriever\n\n# we recommend installing torch with conda for faiss-gpu\nconda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia\npip install transformers datasets pyserini\n\n## install the gpu version faiss to guarantee efficient RL rollout\nconda install -c pytorch -c nvidia faiss-gpu=1.8.0\n\n## API function\npip install uvicorn fastapi\n```\n\n\n## Quick start\n\nTrain a reasoning + search LLM on the NQ dataset with e5 as the retriever and Wikipedia as the corpus.\n\n(1) Download the index and corpus.\n```bash\nsave_path=\u002Fthe\u002Fpath\u002Fto\u002Fsave\npython scripts\u002Fdownload.py --save_path $save_path\ncat $save_path\u002Fpart_* > $save_path\u002Fe5_Flat.index\ngzip -d $save_path\u002Fwiki-18.jsonl.gz\n```\n\n(2) Process the NQ dataset.\n```bash\npython scripts\u002Fdata_process\u002Fnq_search.py\n```\n\n(3) Launch a local retrieval server.\n```bash\nconda activate retriever\nbash retrieval_launch.sh\n```\n\n(4) Run RL training (PPO) with Llama-3.2-3b-base.\n```bash\nconda activate searchr1\nbash train_ppo.sh\n```\n\n## Preliminary results\n\n(1) The base model (llama3.2-3b-base) learns to call the search engine and obtains improved performance.\n\n![llama-3b](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPeterGriffinJin_Search-R1_readme_6ef0f8069291.png)\n\n\n(2) The base model (Qwen2.5-7b-base) can learn to conduct multi-turn search engine calling and reasoning with RL.\n\n![multi-turn](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPeterGriffinJin_Search-R1_readme_22d3487c6e14.png)\n\n## Inference\n#### You can play with the trained Search-R1 model with your own question.\n(1) Launch a local retrieval server.\n```bash\nconda activate retriever\nbash retrieval_launch.sh\n```\n\n(2) Run inference.\n```bash\nconda activate searchr1\npython infer.py\n```\nYou can modify the 
```question``` on line 7 to something you're interested in.\n\n## Use your own dataset\n\n### QA data\nFor each question-answer sample, it should be a dictionary containing the desired content as below:\n\n```\ndata = {\n        \"data_source\": data_source,\n        \"prompt\": [{\n            \"role\": \"user\",\n            \"content\": question,\n        }],\n        \"ability\": \"fact-reasoning\",\n        \"reward_model\": {\n            \"style\": \"rule\",\n            \"ground_truth\": solution\n        },\n        \"extra_info\": {\n            'split': split,\n            'index': idx,\n        }\n    }\n```\n\nYou can refer to ```scripts\u002Fdata_process\u002Fnq_search.py``` for a concrete data processing example.\n\n### Corpora\n\nIt is recommended to make your corpus a jsonl file, where each line (a dictionary with \"id\" key and \"contents\" key) corresponds to one passage. You can refer to ```example\u002Fcorpus.jsonl``` for an example.\n\nThe \"id\" key corresponds to the passage id, while the \"contents\" key corresponds to the passage content ('\"' + title + '\"\\n' + text).\nFor example:\n```\n{\"id\": \"0\", \"contents\": \"Evan Morris Evan L. 
Morris (January 26, 1977 \\u2013 July 9, 2015) was a lobbyist for Genentech and its parent corporation Roche in Washington.\"}\n...\n{\"id\": \"100\", \"contents\": \"Three years later, when the United States Exploring Expedition to little-known portions of the globe was organised under Charles Wilkes, Hale was recommended, while yet an undergraduate.\"}\n...\n```\n\n**Index your corpora (optional).**\nIf you would like to use a local retriever as the search engine, you can index your own corpus with:\n```\nbash search_r1\u002Fsearch\u002Fbuild_index.sh\n```\nYou can change ```retriever_name``` and ```retriever_model``` to the off-the-shelf retriever of your choice.\n\n## Use your own search engine\n\nOur codebase supports local sparse retriever (e.g., BM25), local dense retriever (both flat indexing with GPUs and ANN indexing with CPUs) and online search engine (e.g., Google, Bing, etc.). More details can be found [here](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Ftree\u002Fmain\u002Fdocs\u002Fretriever.md).\n\nThe main idea is to launch a local or remote search engine server separately from the main RL training pipeline. \n\nThe LLM can call the search engine by calling the search API (e.g., \"http:\u002F\u002F127.0.0.1:8000\u002Fretrieve\").\n\nYou can refer to ```search_r1\u002Fsearch\u002Fretriever_server.py``` for an example of launching a local retriever server.\n\n## Features\n- Support local sparse retrievers (e.g., BM25). ✔️\n- Support local dense retrievers (both flat indexing and ANN indexing). ✔️\n- Support Google Search \u002F Bing Search \u002F Brave Search APIs and others. ✔️\n- Support off-the-shelf neural rerankers. ✔️\n- Support different RL methods (e.g., PPO, GRPO, reinforce). ✔️\n- Support different LLMs (e.g., llama3, Qwen2.5, etc.). 
✔️\n\n## Acknowledge\n\nThe concept of Search-R1 is inspired by [Deepseek-R1](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-R1) and [TinyZero](https:\u002F\u002Fgithub.com\u002FJiayi-Pan\u002FTinyZero\u002Ftree\u002Fmain).\nIts implementation is built upon [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) and [RAGEN](https:\u002F\u002Fgithub.com\u002FZihanWang314\u002FRAGEN\u002Ftree\u002Fmain). \nWe sincerely appreciate the efforts of these teams for their contributions to open-source research and development.\n\n## Awesome work powered or inspired by Search-R1\n\n- [DeepResearcher](https:\u002F\u002Fgithub.com\u002FGAIR-NLP\u002FDeepResearcher): Scaling Deep Research via Reinforcement Learning in Real-world Environments. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FGAIR-NLP\u002FDeepResearcher)](https:\u002F\u002Fgithub.com\u002FGAIR-NLP\u002FDeepResearcher)\n- [Multimodal-Search-R1](https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002Fmultimodal-search-r1): Incentivizing LMMs to Search. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FEvolvingLMMs-Lab\u002Fmultimodal-search-r1)](https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002Fmultimodal-search-r1)\n- [OTC](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.14870): Optimal Tool Calls via Reinforcement Learning.\n- [ZeroSearch](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FZeroSearch): Incentivize the Search Capability of LLMs without Searching. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAlibaba-NLP\u002FZeroSearch)](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FZeroSearch)\n- [IKEA](https:\u002F\u002Fgithub.com\u002Fhzy312\u002Fknowledge-r1): Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhzy312\u002Fknowledge-r1)](https:\u002F\u002Fgithub.com\u002Fhzy312\u002Fknowledge-r1)\n- [Scent of Knowledge](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.09316): Optimizing Search-Enhanced Reasoning with Information Foraging.\n- [AutoRefine](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2505.11277): Search and Refine During Think. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsyr-cn\u002FAutoRefine)](https:\u002F\u002Fgithub.com\u002Fsyr-cn\u002FAutoRefine)\n- [O^2-Searcher](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.16582): A Searching-based Agent Model for Open-Domain Open-Ended Question Answering. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAcade-Mate\u002FO2-Searcher)](https:\u002F\u002Fgithub.com\u002FAcade-Mate\u002FO2-Searcher)\n- [MaskSearch](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.20285): A Universal Pre-Training Framework to Enhance Agentic Search Capability. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAlibaba-NLP\u002FMaskSearch)](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FMaskSearch)\n- [VRAG-RL](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22019): Vision-Perception-Based RAG for Visually Rich Information Understanding. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAlibaba-NLP\u002FVRAG)](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FVRAG)\n- [R1-Code-Interpreter](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.21668): Training LLMs to Reason with Code via SFT and RL. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fyongchao98\u002FR1-Code-Interpreter)](https:\u002F\u002Fgithub.com\u002Fyongchao98\u002FR1-Code-Interpreter)\n- [R-Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04185): Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FQingFei1\u002FR-Search)](https:\u002F\u002Fgithub.com\u002FQingFei1\u002FR-Search)\n- [StepSearch](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.15107): Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FZillwang\u002FStepSearch)](https:\u002F\u002Fgithub.com\u002FZillwang\u002FStepSearch)\n- [SimpleTIR](https:\u002F\u002Fsimpletir.notion.site\u002Freport): Stable End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fltzheng\u002FSimpleTIR)](https:\u002F\u002Fgithub.com\u002Fltzheng\u002FSimpleTIR)\n- [Router-R1](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.09033): Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fulab-uiuc\u002FRouter-R1)](https:\u002F\u002Fgithub.com\u002Fulab-uiuc\u002FRouter-R1)\n- [SkyRL](https:\u002F\u002Fskyrl.readthedocs.io\u002Fen\u002Flatest\u002F): A Modular Full-stack RL Library for LLMs. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNovaSky-AI\u002FSkyRL)](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyRL)\n- [ASearcher](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.07976): Large-Scale RL for Search Agents. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FinclusionAI\u002FASearcher)](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher)\n- [ParallelSearch](https:\u002F\u002Fwww.arxiv.org\u002Fabs\u002F2508.09303): Decompose Query and Search Sub-queries in Parallel with RL. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTree-Shu-Zhao\u002FParallelSearch)](https:\u002F\u002Fgithub.com\u002FTree-Shu-Zhao\u002FParallelSearch)\n- [AutoTIR](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2507.21836): Autonomous Tools Integrated Reasoning via Reinforcement Learning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fweiyifan1023\u002FAutoTIR)](https:\u002F\u002Fgithub.com\u002Fweiyifan1023\u002FAutoTIR)\n- [verl-tool](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2509.01055): A version of verl to support diverse tool use. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTIGER-AI-Lab\u002Fverl-tool)](https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002Fverl-tool)\n- [Tree-GRPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.21240): Tree Search for LLM Agent Reinforcement Learning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAMAP-ML\u002FTree-GRPO)](https:\u002F\u002Fgithub.com\u002FAMAP-ML\u002FTree-GRPO)\n- [EviNote-RAG](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.00877): Enhancing RAG Models via Answer-Supportive Evidence Notes. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FDa1yuqin\u002FEviNoteRAG)](https:\u002F\u002Fgithub.com\u002FDa1yuqin\u002FEviNoteRAG)\n- [GlobalRAG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.20548v1): GlobalRAG: Enhancing Global Reasoning in Multi-hop Question Answering via Reinforcement Learning. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FCarnegieBin\u002FGlobalRAG)](https:\u002F\u002Fgithub.com\u002FCarnegieBin\u002FGlobalRAG)\n\n\n\n\n\n## Citations\n\n```bibtex\n@article{jin2025search,\n  title={Search-r1: Training llms to reason and leverage search engines with reinforcement learning},\n  author={Jin, Bowen and Zeng, Hansi and Yue, Zhenrui and Yoon, Jinsung and Arik, Sercan and Wang, Dong and Zamani, Hamed and Han, Jiawei},\n  journal={arXiv preprint arXiv:2503.09516},\n  year={2025}\n}\n```\n\n```bibtex\n@article{jin2025empirical,\n  title={An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents},\n  author={Jin, Bowen and Yoon, Jinsung and Kargupta, Priyanka and Arik, Sercan O and Han, Jiawei},\n  journal={arXiv preprint arXiv:2505.15117},\n  year={2025}\n}\n```\n","# Search-R1：使用强化学习训练您的大语言模型进行推理并调用搜索引擎\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPeterGriffinJin_Search-R1_readme_c3d1724ba2a6.png\" alt=\"logo\" width=\"300\"\u002F>\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.09516\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper1-blue?style=for-the-badge\" alt=\"Button1\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15117\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper2-green?style=for-the-badge\" alt=\"Button2\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FPeterJinGo\u002Fsearch-r1-67d1a021202731cb065740f5\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FResources-orange?style=for-the-badge\" alt=\"Button3\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002FBowenJin13\u002Fstatus\u002F1895544294473109889\">\n    \u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTweet-red?style=for-the-badge\" alt=\"Button4\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.2\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLogs-purple?style=for-the-badge\" alt=\"Button5\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\u003C!-- \u003Cstrong>Search-R1\u003C\u002Fstrong> is a reinforcement learning framework for \u003Cem>training reasoning and searching (tool-call) interleaved LLMs\u003C\u002Fem>.  -->\n\u003C!-- We built upon [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl). -->\n**Search-R1** 是一个强化学习框架，专为训练 **推理与搜索交织的大语言模型** 而设计——这些模型能够协同地学习推理和工具调用（例如调用搜索引擎）。\n\n\u003C!-- It can be seen as an extension of \u003Cstrong>DeepSeek-R1(-Zero)\u003C\u002Fstrong> with interleaved search engine calling and an opensource RL training-based solution for \u003Cstrong>OpenAI DeepResearch\u003C\u002Fstrong>. -->\n基于 [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl)，Search-R1 延伸了 **DeepSeek-R1(-Zero)** 的理念，融入了交错的搜索引擎调用功能，并提供了一个完全开源的强化学习训练流程。它作为 **OpenAI DeepResearch** 的替代性开源解决方案，推动了工具增强型大语言模型推理的研究与发展。\n\n\u003C!-- Through RL (rule-based outcome reward), the 3B **base** LLM (both Qwen2.5-3b-base and Llama3.2-3b-base) develops reasoning and search engine calling abilities all on its own. 
-->\n\n我们支持多种强化学习方法（如 PPO、GRPO、Reinforce），不同的大语言模型（如 Llama3、Qwen2.5 等）以及不同的搜索引擎（如本地稀疏\u002F稠密检索器和在线搜索引擎）。\n\n论文：[link1](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.09516)，[link2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15117)；模型与数据：[link](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FPeterJinGo\u002Fsearch-r1-67d1a021202731cb065740f5)；Twitter 帖子：[link](https:\u002F\u002Fx.com\u002FBowenJin13\u002Fstatus\u002F1895544294473109889)；完整实验日志：[prelim](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-open)；[v0.1](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-nq_hotpotqa_train)；[v0.2](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.2)；[v0.3](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.3)。关于这些日志和方法的详细信息，请参阅 [这里](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fblob\u002Fmain\u002Fdocs\u002Fexperiment_log.md)。\n\n\n![single-turn](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPeterGriffinJin_Search-R1_readme_8541341591f1.png)\n\n## 新闻\n\n- [2025.10] Search-R1 被 Thinking Machines Lab 的首款产品 [Tinker](https:\u002F\u002Fgithub.com\u002Fthinking-machines-lab\u002Ftinker-cookbook) 采用！详情：[文档](https:\u002F\u002Fgithub.com\u002Fthinking-machines-lab\u002Ftinker-cookbook\u002Ftree\u002Fmain\u002Ftinker_cookbook\u002Frecipes\u002Ftool_use\u002Fsearch)。\n- [2025.7] Search-R1 得到 [SkyRL](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyRL) 的支持！详细说明：[代码](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyRL\u002Ftree\u002Fmain\u002Fskyrl-train\u002Fexamples\u002Fsearch)，[文档](https:\u002F\u002Fnovasky-ai.notion.site\u002Fskyrl-searchr1)。\n- [2025.6] Search-R1 现已集成到 veRL 
的最新版本中，可充分利用其最新功能！详细说明：[veRL](https:\u002F\u002Fverl.readthedocs.io\u002Fen\u002Flatest\u002Fsglang_multiturn\u002Fsearch_tool_example.html)，[英文文档](https:\u002F\u002Fgithub.com\u002Fzhaochenyang20\u002FAwesome-ML-SYS-Tutorial\u002Fblob\u002Fmain\u002Frlhf\u002Fverl\u002Fmulti-turn\u002Ftool_examples\u002Fverl-multiturn-searchR1-like.md)，[中文文档](https:\u002F\u002Fgithub.com\u002Fzhaochenyang20\u002FAwesome-ML-SYS-Tutorial\u002Fblob\u002Fmain\u002Frlhf\u002Fverl\u002Fmulti-turn\u002Ftool_examples\u002Fverl-multiturn-searchR1-like_ZH.md)。\n- [2025.5] 第二篇进行详细实证研究的 [论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15117) 已发表，并附带日志：[v0.3](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.3)。\n- [2025.4] 我们支持针对 30B+ 规模大语言模型的 [多节点训练](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fblob\u002Fmain\u002Fdocs\u002Fmultinode.md)！\n- [2025.4] 我们支持 [不同搜索引擎](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fblob\u002Fmain\u002Fdocs\u002Fretriever.md)，包括本地稀疏检索器、带有 ANN 索引的本地稠密检索器以及在线搜索引擎！\n- [2025.3] 第一篇 Search-R1 [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.09516) 已发表，并附带日志：[v0.1](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-nq_hotpotqa_train)；[v0.2](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.2)。\n- [2025.2] 我们开源了 Search-R1 代码库，并发布了 [初步成果](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-open)。\n\n## 链接\n\n- [安装](#installation)\n- [快速入门](#quick-start)\n- [初步成果](#preliminary-results)\n- [推理](#inference)\n- [使用您自己的数据集](#use-your-own-dataset)\n- [使用您自己的搜索引擎](#use-your-own-search-engine)\n- [功能](#features)\n- [致谢](#acknowledge)\n- [引用](#citations)\n\n## 安装\n\n### Search-r1 环境\n```bash\nconda create -n searchr1 python=3.9\nconda activate searchr1\n# 安装 torch [或者您可以跳过这一步，让 vllm 自动为您安装正确版本]\npip install torch==2.4.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n# 安装 vllm\npip3 install vllm==0.6.3 # 或者您可以安装 0.5.4、0.4.2 和 0.3.1\n\n# verl\npip install -e .\n\n# flash 
attention 2\npip3 install flash-attn --no-build-isolation\npip install wandb\n```\n\n### 检索器环境（可选）\n如果您希望将本地检索器用作搜索引擎，可以按以下方式安装环境。（我们建议使用独立的环境。）\n```bash\nconda create -n retriever python=3.10\nconda activate retriever\n\n# 我们推荐使用 conda 安装 torch 以支持 faiss-gpu\nconda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia\npip install transformers datasets pyserini\n\n## 安装 GPU 版本 faiss 以确保高效的强化学习回放\nconda install -c pytorch -c nvidia faiss-gpu=1.8.0\n\n## API 函数\npip install uvicorn fastapi\n```\n\n## 快速入门\n\n使用 e5 作为检索器、Wikipedia 作为语料库，在 NQ 数据集上训练一个推理 + 检索的 LLM。\n\n(1) 下载索引和语料库。\n```bash\nsave_path=\u002Fthe\u002Fpath\u002Fto\u002Fsave\npython scripts\u002Fdownload.py --save_path $save_path\ncat $save_path\u002Fpart_* > $save_path\u002Fe5_Flat.index\ngzip -d $save_path\u002Fwiki-18.jsonl.gz\n```\n\n(2) 处理 NQ 数据集。\n```bash\npython scripts\u002Fdata_process\u002Fnq_search.py\n```\n\n(3) 启动本地检索服务器。\n```bash\nconda activate retriever\nbash retrieval_launch.sh\n```\n\n(4) 使用 Llama-3.2-3b-base 运行 RL 训练（PPO）。\n```bash\nconda activate searchr1\nbash train_ppo.sh\n```\n\n## 初步结果\n\n(1) 基础模型（llama3.2-3b-base）学会调用搜索引擎，并取得了更好的性能。\n\n![llama-3b](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPeterGriffinJin_Search-R1_readme_6ef0f8069291.png)\n\n\n(2) 基础模型（Qwen2.5-7b-base）可以通过 RL 学会进行多轮搜索引擎调用和推理。\n\n![multi-turn](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPeterGriffinJin_Search-R1_readme_22d3487c6e14.png)\n\n## 推理\n#### 您可以使用自己的问题与训练好的 Search-R1 模型进行交互。\n(1) 启动本地检索服务器。\n```bash\nconda activate retriever\nbash retrieval_launch.sh\n```\n\n(2) 运行推理。\n```bash\nconda activate searchr1\npython infer.py\n```\n您可以修改第 7 行的 ```question``` 为您感兴趣的内容。\n\n## 使用您自己的数据集\n\n### QA 数据\n对于每个问答样本，它应该是一个包含所需内容的字典，如下所示：\n\n```\ndata = {\n        \"data_source\": data_source,\n        \"prompt\": [{\n            \"role\": \"user\",\n            \"content\": question,\n        }],\n        \"ability\": \"fact-reasoning\",\n        
\"reward_model\": {\n            \"style\": \"rule\",\n            \"ground_truth\": solution\n        },\n        \"extra_info\": {\n            'split': split,\n            'index': idx,\n        }\n    }\n```\n\n您可以参考 ```scripts\u002Fdata_process\u002Fnq_search.py``` 获取具体的数据处理示例。\n\n### 语料库\n\n建议将您的语料库制作成 jsonl 文件，其中每行（一个带有 “id” 键和 “contents” 键的字典）对应一段文本。您可以参考 ```example\u002Fcorpus.jsonl``` 了解示例。\n\n“id” 键对应段落 ID，而 “contents” 键对应段落内容（'\"' + 标题 + '\"\\n' + 文本）。\n例如：\n```\n{\"id\": \"0\", \"contents\": \"Evan Morris Evan L. Morris (January 26, 1977 – July 9, 2015) was a lobbyist for Genentech and its parent corporation Roche in Washington.\"}\n...\n{\"id\": \"100\", \"contents\": \"Three years later, when the United States Exploring Expedition to little-known portions of the globe was organised under Charles Wilkes, Hale was recommended, while yet an undergraduate.\"}\n...\n```\n\n**索引您的语料库（可选）。**\n如果您希望使用本地检索器作为搜索引擎，可以通过以下命令索引您自己的语料库：\n```\nbash search_r1\u002Fsearch\u002Fbuild_index.sh\n```\n您可以将 ```retriever_name``` 和 ```retriever_model``` 更改为您感兴趣的现成检索器。\n\n## 使用您自己的搜索引擎\n\n我们的代码库支持本地稀疏检索器（如 BM25）、本地稠密检索器（包括使用 GPU 的平面索引和使用 CPU 的 ANN 索引）以及在线搜索引擎（如 Google、Bing 等）。更多详细信息请参阅 [这里](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Ftree\u002Fmain\u002Fdocs\u002Fretriever.md)。\n\n主要理念是将本地或远程搜索引擎服务器与主 RL 训练流程分开启动。\n\nLLM 可以通过调用搜索 API（例如 “http:\u002F\u002F127.0.0.1:8000\u002Fretrieve”）来调用搜索引擎。\n\n您可以参考 ```search_r1\u002Fsearch\u002Fretriever_server.py``` 了解如何启动本地检索器服务器的示例。\n\n## 特性\n- 支持本地稀疏检索器（如 BM25）。✔️\n- 支持本地稠密检索器（包括平面索引和 ANN 索引）。✔️\n- 支持 Google 搜索 \u002F Bing 搜索 \u002F Brave 搜索 API 等。✔️\n- 支持现成的神经重排序器。✔️\n- 支持不同的 RL 方法（如 PPO、GRPO、Reinforce）。✔️\n- 支持不同的 LLM（如 llama3、Qwen2.5 等）。✔️\n\n## 致谢\n\nSearch-R1 的概念灵感来源于 [Deepseek-R1](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-R1) 和 [TinyZero](https:\u002F\u002Fgithub.com\u002FJiayi-Pan\u002FTinyZero\u002Ftree\u002Fmain)。\n其实现基于 [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) 
和 [RAGEN](https:\u002F\u002Fgithub.com\u002FZihanWang314\u002FRAGEN\u002Ftree\u002Fmain)。\n我们衷心感谢这些团队为开源研究与开发所做出的贡献。\n\n## 由 Search-R1 提供支持或受其启发的优秀工作\n\n- [DeepResearcher](https:\u002F\u002Fgithub.com\u002FGAIR-NLP\u002FDeepResearcher)：在真实环境中通过强化学习扩展深度研究能力。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FGAIR-NLP\u002FDeepResearcher)](https:\u002F\u002Fgithub.com\u002FGAIR-NLP\u002FDeepResearcher)\n- [Multimodal-Search-R1](https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002Fmultimodal-search-r1)：激励多模态语言模型进行搜索。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FEvolvingLMMs-Lab\u002Fmultimodal-search-r1)](https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002Fmultimodal-search-r1)\n- [OTC](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.14870)：通过强化学习实现最优工具调用。\n- [ZeroSearch](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FZeroSearch)：在不实际搜索的情况下激励大语言模型的搜索能力。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAlibaba-NLP\u002FZeroSearch)](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FZeroSearch)\n- [IKEA](https:\u002F\u002Fgithub.com\u002Fhzy312\u002Fknowledge-r1)：基于强化学习的内外部知识协同推理，用于构建高效的自适应搜索代理。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhzy312\u002Fknowledge-r1)](https:\u002F\u002Fgithub.com\u002Fhzy312\u002Fknowledge-r1)\n- [知识之香](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.09316)：利用信息觅食优化增强搜索的推理。\n- [AutoRefine](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2505.11277)：在思考过程中进行搜索与精炼。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsyr-cn\u002FAutoRefine)](https:\u002F\u002Fgithub.com\u002Fsyr-cn\u002FAutoRefine)\n- [O^2-Searcher](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.16582)：一种基于搜索的开放域开放式问答代理模型。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAcade-Mate\u002FO2-Searcher)](https:\u002F\u002Fgithub.com\u002FAcade-Mate\u002FO2-Searcher)\n- 
[MaskSearch](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.20285)：一个通用的预训练框架，用于提升智能体式搜索能力。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAlibaba-NLP\u002FMaskSearch)](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FMaskSearch)\n- [VRAG-RL](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22019)：基于视觉感知的 RAG，用于理解视觉丰富的信息。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAlibaba-NLP\u002FVRAG)](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FVRAG)\n- [R1-Code-Interpreter](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.21668)：通过 SFT 和 RL 训练大语言模型使用代码进行推理。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fyongchao98\u002FR1-Code-Interpreter)](https:\u002F\u002Fgithub.com\u002Fyongchao98\u002FR1-Code-Interpreter)\n- [R-Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04185)：通过多奖励强化学习，利用搜索增强大语言模型的推理能力。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FQingFei1\u002FR-Search)](https:\u002F\u002Fgithub.com\u002FQingFei1\u002FR-Search)\n- [StepSearch](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.15107)：通过分步近端策略优化激发大语言模型的搜索能力。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FZillwang\u002FStepSearch)](https:\u002F\u002Fgithub.com\u002FZillwang\u002FStepSearch)\n- [SimpleTIR](https:\u002F\u002Fsimpletir.notion.site\u002Freport)：用于多轮工具集成推理的稳定端到端强化学习。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fltzheng\u002FSimpleTIR)](https:\u002F\u002Fgithub.com\u002Fltzheng\u002FSimpleTIR)\n- [Router-R1](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.09033)：通过强化学习教导大语言模型进行多轮路由与聚合。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fulab-uiuc\u002FRouter-R1)](https:\u002F\u002Fgithub.com\u002Fulab-uiuc\u002FRouter-R1)\n- 
[SkyRL](https:\u002F\u002Fskyrl.readthedocs.io\u002Fen\u002Flatest\u002F)：面向大语言模型的模块化全栈强化学习库。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNovaSky-AI\u002FSkyRL)](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyRL)\n- [ASearcher](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.07976)：用于搜索代理的大规模强化学习。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FinclusionAI\u002FASearcher)](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher)\n- [ParallelSearch](https:\u002F\u002Fwww.arxiv.org\u002Fabs\u002F2508.09303)：利用强化学习将查询分解并并行搜索子查询。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTree-Shu-Zhao\u002FParallelSearch)](https:\u002F\u002Fgithub.com\u002FTree-Shu-Zhao\u002FParallelSearch)\n- [AutoTIR](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2507.21836)：通过强化学习实现自主工具集成推理。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fweiyifan1023\u002FAutoTIR)](https:\u002F\u002Fgithub.com\u002Fweiyifan1023\u002FAutoTIR)\n- [verl-tool](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2509.01055)：一个支持多样化工具使用的 verl 版本。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTIGER-AI-Lab\u002Fverl-tool)](https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002Fverl-tool)\n- [Tree-GRPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.21240)：用于大语言模型代理强化学习的树状搜索。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAMAP-ML\u002FTree-GRPO)](https:\u002F\u002Fgithub.com\u002FAMAP-ML\u002FTree-GRPO)\n- [EviNote-RAG](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.00877)：通过答案支持性证据笔记增强 RAG 模型。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FDa1yuqin\u002FEviNoteRAG)](https:\u002F\u002Fgithub.com\u002FDa1yuqin\u002FEviNoteRAG)\n- 
[GlobalRAG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.20548v1)：通过强化学习提升多跳问答中的全局推理能力。[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FCarnegieBin\u002FGlobalRAG)](https:\u002F\u002Fgithub.com\u002FCarnegieBin\u002FGlobalRAG)\n\n\n\n\n\n## 引用\n\n```bibtex\n@article{jin2025search,\n  title={Search-r1: Training llms to reason and leverage search engines with reinforcement learning},\n  author={Jin, Bowen and Zeng, Hansi and Yue, Zhenrui and Yoon, Jinsung and Arik, Sercan and Wang, Dong and Zamani, Hamed and Han, Jiawei},\n  journal={arXiv preprint arXiv:2503.09516},\n  year={2025}\n}\n```\n\n```bibtex\n@article{jin2025empirical,\n  title={An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents},\n  author={Jin, Bowen and Yoon, Jinsung and Kargupta, Priyanka and Arik, Sercan O and Han, Jiawei},\n  journal={arXiv preprint arXiv:2505.15117},\n  year={2025}\n}\n```","# Search-R1 快速上手指南\n\nSearch-R1 是一个基于强化学习（RL）的框架，旨在训练大语言模型（LLM）具备**推理与调用搜索引擎相结合**的能力。它扩展了 DeepSeek-R1 的理念，支持多轮搜索与推理交错进行，是开源社区中实现类似 OpenAI DeepResearch 功能的重要方案。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: Linux (推荐 Ubuntu 20.04+)\n- **Python**: 3.9 (主训练环境), 3.10 (可选的检索器环境)\n- **GPU**: 支持 CUDA 12.1 的 NVIDIA 显卡\n- **显存**: 建议 24GB+ (取决于模型大小，3B 模型需较少显存，30B+ 需多卡或多节点)\n\n### 前置依赖\n确保已安装 `conda` 和 `git`。\n\n## 安装步骤\n\n### 1. 配置主训练环境 (Search-R1)\n此环境用于运行强化学习训练和推理。\n\n```bash\n# 创建并激活虚拟环境\nconda create -n searchr1 python=3.9\nconda activate searchr1\n\n# 安装 PyTorch (CUDA 12.1)\n# 国内用户可使用清华源加速：pip install torch==2.4.0 --index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\npip install torch==2.4.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n\n# 安装 vLLM (支持版本 0.6.3, 0.5.4, 0.4.2, 0.3.1)\npip3 install vllm==0.6.3\n\n# 克隆项目并安装 Search-R1\ngit clone https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1.git\ncd Search-R1\npip install -e .\n\n# 安装 Flash Attention 2 和 WandB\npip3 install flash-attn --no-build-isolation\npip install wandb\n```\n\n### 2. 
配置检索器环境 (可选)\n如果你计划使用**本地检索器**（如基于 BM25 的稀疏检索器或基于 Faiss 的稠密检索器）而非在线搜索 API，需单独配置此环境。\n\n```bash\n# 创建并激活虚拟环境\nconda create -n retriever python=3.10\nconda activate retriever\n\n# 安装 PyTorch 全家桶 (推荐通过 conda 安装以兼容 faiss-gpu)\n# 国内用户可使用清华源：conda install ... -c https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fpkgs\u002Fmain\u002F\nconda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia\n\n# 安装必要库\npip install transformers datasets pyserini\n\n# 安装 GPU 版 Faiss (保证 RL rollout 效率)\nconda install -c pytorch -c nvidia faiss-gpu=1.8.0\n\n# 安装 API 服务依赖\npip install uvicorn fastapi\n```\n\n## 基本使用\n\n以下示例演示如何在 **NQ (Natural Questions)** 数据集上，使用 **E5** 作为检索器、**Wikipedia** 作为语料库，训练一个具备搜索能力的 **Llama-3.2-3b-base** 模型。\n\n### 第一步：下载索引与语料库\n```bash\n# 设置保存路径\nsave_path=\u002Fthe\u002Fpath\u002Fto\u002Fsave\n\n# 下载数据脚本\npython scripts\u002Fdownload.py --save_path $save_path\n\n# 合并索引文件\ncat $save_path\u002Fpart_* > $save_path\u002Fe5_Flat.index\n\n# 解压语料库\ngzip -d $save_path\u002Fwiki-18.jsonl.gz\n```\n\n### 第二步：处理数据集\n```bash\npython scripts\u002Fdata_process\u002Fnq_search.py\n```\n\n### 第三步：启动本地检索服务\n切换到检索器环境并启动服务。\n```bash\nconda activate retriever\nbash retrieval_launch.sh\n```\n*注：保持该终端运行，检索服务需在后台持续响应。*\n\n### 第四步：开始强化学习训练\n切换回主训练环境，启动 PPO 训练。\n```bash\nconda activate searchr1\nbash train_ppo.sh\n```\n\n### 第五步：模型推理\n训练完成后，你可以用自己的问题进行测试。\n\n1. 确保检索服务仍在运行 (`retriever` 环境)。\n2. 
在主环境中运行推理脚本：\n```bash\nconda activate searchr1\npython infer.py\n```\n*提示：编辑 `infer.py` 第 7 行的 `question` 变量即可修改测试问题。*\n\n---\n**更多自定义选项**：\n- **更换搜索引擎**：支持本地 BM25、Faiss (Flat\u002FANN) 或在线 API (Google\u002FBing)，详见 `docs\u002Fretriever.md`。\n- **更换数据集**：参考 `scripts\u002Fdata_process\u002Fnq_search.py` 格式构建自己的 JSONL 数据。\n- **多节点训练**：支持 30B+ 大模型的多节点训练配置，详见 `docs\u002Fmultinode.md`。","某金融科技公司的情报分析团队需要每日从海量新闻和财报中自动提取关键事件，以辅助投资决策。\n\n### 没有 Search-R1 时\n- **信息滞后且幻觉频发**：模型仅依赖训练截止前的静态数据，面对突发市场动态（如刚刚发布的财报或政策）只能“瞎编”，导致分析结论严重失真。\n- **推理与检索割裂**：传统流程需先由人工编写复杂的检索查询词，再单独调用搜索引擎，最后将结果喂给模型，链路冗长且无法根据中间推理动态调整搜索策略。\n- **训练成本高昂且封闭**：想要让模型学会自主搜索，往往依赖闭源 API 或缺乏高效的强化学习框架，难以针对特定金融领域数据进行低成本微调。\n\n### 使用 Search-R1 后\n- **实时精准的事实核查**：Search-R1 通过强化学习让模型自主决定何时调用搜索引擎，能实时获取最新股价和公告，彻底消除因数据过时产生的幻觉。\n- **“思考 - 搜索”交织的智能决策**：模型在推理过程中可多次动态发起搜索（例如：发现异常数据后自动追问原因），实现了类似人类分析师的“边想边查”闭环。\n- **开源高效的专业定制**：基于 veRL 构建的开源训练管线，允许团队使用自有金融数据低成本训练 3B 量级的小模型，即可达到媲美大模型的复杂任务处理能力。\n\nSearch-R1 的核心价值在于将静态的语言模型进化为具备实时自主信息获取能力的动态智能体，以开源方案打破了专业领域对闭源黑盒模型的依赖。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPeterGriffinJin_Search-R1_c3d1724b.png","PeterGriffinJin","Bowen Jin","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FPeterGriffinJin_f55ead76.jpg","A Ph.D student interested in LLM and multimodality.","UIUC","Urbana Champaign",null,"peterjin.me","https:\u002F\u002Fgithub.com\u002FPeterGriffinJin",[85,89],{"name":86,"color":87,"percentage":88},"Python","#3572A5",94.4,{"name":90,"color":91,"percentage":92},"Shell","#89e051",5.6,4362,375,"2026-04-04T16:05:45","Apache-2.0",4,"Linux","必需 NVIDIA GPU。安装指令明确指定使用 CUDA 12.1 (cu121)。支持本地稠密检索器的 GPU 扁平索引（flat indexing with GPUs）。针对 30B+ 大模型支持多节点训练，暗示需要多卡或高显存配置。","未说明（建议根据模型大小配置，30B+ 模型需大量内存）",{"notes":102,"python":103,"dependencies":104},"项目基于 veRL 框架。主训练环境 (searchr1) 推荐 Python 3.9，若使用本地检索器需单独创建 Python 3.10 环境并安装 faiss-gpu。支持多种强化学习算法（PPO, GRPO, reinforce）及多种大模型（Llama3, Qwen2.5 等）。检索引擎可配置为本地稀疏\u002F稠密检索器或在线搜索引擎（Google\u002FBing）。首次运行需下载索引和语料库数据。","3.9 (主环境), 3.10 
(可选的检索器环境)",[105,106,107,108,109,110,111,112,113,114],"torch==2.4.0","vllm>=0.3.1","flash-attn","wandb","transformers","datasets","pyserini","faiss-gpu==1.8.0","uvicorn","fastapi",[26,13,15],"2026-03-27T02:49:30.150509","2026-04-06T06:51:55.876957",[119,124,129,134,139,143,147,151],{"id":120,"question_zh":121,"answer_zh":122,"source_url":123},15525,"安装 Search-R1 环境时遇到 vllm、outlines 和 pyairports 之间的依赖冲突怎么办？","可以通过手动安装 pyairports 并指定 outlines 版本来解决。具体步骤如下：\n1. 克隆 pyairports 仓库并本地安装：\n   git clone https:\u002F\u002Fgithub.com\u002FNICTA\u002Fpyairports\n   cd pyairports\n   pip install -e .\n2. 安装特定版本的 outlines（注意版本号应为 0.0.46）：\n   pip install outlines==0.0.46\n3. 完成上述步骤后，再安装其他依赖即可正常进行。","https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fissues\u002F147",{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},15526,"运行训练时出现 \"RuntimeError: batch size must be positive\" 错误如何解决？","该问题通常是由于填充移除逻辑（padding removal logic）导致某些样本的所有 token 被丢弃，从而产生零批次大小。解决方案包括：\n1. 设置 `use_remove_padding=False` 禁用填充移除功能。\n2. 如果遇到显存不足（OOM），可以尝试减小 `micro_batch_size`。\n3. 
开启参数\u002F梯度\u002F优化器卸载（`param\u002Fgrad\u002Foptimizer_off_load`）以及启用梯度检查点（`enable_gradient_checkpointing`）以降低显存占用。","https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fissues\u002F34",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},15527,"训练过程中遇到 CPU 或 GPU 显存溢出（OOM）错误该怎么办？","如果是 GPU OOM，可以尝试降低 `ppo_micro_batch_size` 参数以减少单次批处理的数据量。如果是 CPU 内存溢出，可能需要减少任务并行度或增加每个任务请求的 CPU 数量。对于检索服务（retriever），80GB 显存通常足够启动；但对于 RL 训练，可能需要 40GB 以上的 GPU 显存。若问题持续，可参考 Ray 官方文档调整内存监控阈值或禁用 worker 杀死机制。","https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fissues\u002F45",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},15528,"训练开始前的验证（validation）耗时过长，如何跳过初始评估？","如果希望跳过训练前的初始评估以节省时间，可以在配置中将 `+trainer.val_before_train` 设置为 `false`。评估耗时取决于计算设备性能，仅在 NQ 数据集上评估会比在七个数据集（如 nq_hotpotqa）上评估快得多。","https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fissues\u002F55",{"id":140,"question_zh":141,"answer_zh":142,"source_url":128},15529,"当 max_turns 大于 2 时训练失败，但 max_turns=2 时成功，这是什么原因？","这通常与批次大小计算或填充处理逻辑有关。当回合数增加时，序列长度变化可能导致某些批次在处理后被完全过滤掉，引发 \"batch size must be positive\" 错误。建议尝试设置 `use_remove_padding=False`，或者减小 `micro_batch_size` 来避免此问题。",{"id":144,"question_zh":145,"answer_zh":146,"source_url":133},15530,"运行检索服务（retrieval_launch.sh）时发生 OOM，80GB 显存是否足够？","80GB 显存通常足以启动检索服务（retriever）。请确保 faiss-gpu 已正确安装，以便检索索引能加载到 GPU 中。需要注意的是，后续的 RL 训练阶段可能还需要额外的 40GB 以上显存，因此需合理规划资源分配。",{"id":148,"question_zh":149,"answer_zh":150,"source_url":123},15531,"如何解决因 outlines 版本不兼容导致的安装失败？","Search-R1 依赖特定版本的 vllm，而这些版本又依赖 outlines\u003C0.1, >=0.0.43。由于 outlines 不同子版本对 pyairports 的依赖关系不同，容易引发冲突。推荐直接手动安装 outlines==0.0.46，并预先从源码安装 pyairports，以避免 pip 自动解析依赖时出错。",{"id":152,"question_zh":153,"answer_zh":154,"source_url":128},15532,"使用多卡（如 8*A100）训练大模型（如 Qwen 2.5-7B）时频繁报错，有哪些优化建议？","除了调整 `ppo_micro_batch_size` 外，还可以尝试以下优化措施：\n1. 禁用填充移除：设置 `use_remove_padding=False`。\n2. 
启用内存优化技术：打开 `param_off_load`、`grad_off_load`、`optimizer_off_load` 以及 `enable_gradient_checkpointing`。\n3. 检查 vllm 版本兼容性，部分用户反馈使用 vllm==0.4.2 可能更稳定。",[]]