[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Simple-Efficient--RL-Factory":3,"tool-Simple-Efficient--RL-Factory":62},[4,18,26,36,46,54],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159636,2,"2026-04-17T23:33:34",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":42,"last_commit_at":43,"category_tags":44,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,45],"插件",{"id":47,"name":48,"github_repo":49,"description_zh":50,"stars":51,"difficulty_score":32,"last_commit_at":52,"category_tags":53,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 
都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":55,"name":56,"github_repo":57,"description_zh":58,"stars":59,"difficulty_score":32,"last_commit_at":60,"category_tags":61,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[45,13,15,14],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":77,"owner_website":77,"owner_url":78,"languages":79,"stars":95,"forks":96,"last_commit_at":97,"license":98,"difficulty_score":10,"env_os":99,"env_gpu":100,"env_ram":101,"env_deps":102,"category_tags":116,"github_topics":77,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":117,"updated_at":118,"faqs":119,"releases":149},8937,"Simple-Efficient\u002FRL-Factory","RL-Factory","Train your Agent model via our easy and efficient framework","RL-Factory 是一个专为智能体（Agent）学习打造的强化学习后训练框架，旨在让用户能轻松、高效地训练自己的 Agent 模型。它核心解决了传统训练中环境与算法耦合紧密、配置复杂且效率低下的痛点。通过独特的“环境解耦”设计，用户只需定义工具配置和奖励函数即可启动训练，无需深入底层架构。\n\n该框架特别适合希望快速验证想法的 AI 研究人员、需要定制垂直领域智能体的开发者，以及想要尝试大模型工具调用能力的技术爱好者。其显著的技术亮点在于支持异步工具调用，使训练速度比现有框架提升约 2 倍；同时原生支持 Qwen3 等先进基座模型及 DeepSearch 等复杂场景，并提供规则奖励与模型评判等多种奖励计算方式。无论是构建多轮对话工具链，还是探索多模态智能体，RL-Factory 
都致力于以极简的代码投入，帮助用户专注于核心的奖励逻辑设计与工具搭建，加速智能体应用的迭代与落地。","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FSimple-Efficient_RL-Factory_readme_f2302eb9610a.png\" alt=\"Description\" style=\"width:300px; height:auto;\"\u002F>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n[📘Tutorial](docs\u002Frl_factory\u002Fen\u002Fmain_tutorial.md) &#124; [🛠️Installation](docs\u002Frl_factory\u002Fen\u002Finstall.md) &#124; [🎨Framework](docs\u002Frl_factory\u002Fen\u002Fframework_design.md) &#124; [🏆Model](https:\u002F\u002Fhuggingface.co\u002FSimple-Efficient\u002FRLFactory-Qwen3-8B-GRPO)\n\n\u003C\u002Fdiv>\n\n--- \n\nRLFactory is an **easy** and **efficient** RL post-training framework for **Agentic Learning**. \n\nRL-Factory decouples the environment from RL post-training, so you can train with just a tool config and a reward function, while async tool calling makes RL post-training **2x faster**.\n\nThe current version natively supports one-click **DeepSearch** training and features multi-turn tool calling, model-judge rewards, and training of multiple models, including **Qwen3**. 
Additional easy and efficient agentic learning modules will be added in upcoming releases.\n\n\u003Cdiv align=\"center\">\n  \u003Cb>Now, everyone can easily and quickly train an Agent with Qwen3 (as the base model) and MCP tools!\u003C\u002Fb>\n\u003C\u002Fdiv>\n\n## Release Log\nWe’ll keep a fast release cycle to quickly deliver and polish the upcoming features.\n+ **Version 0.1**\n  + **Environment decoupling**: define your tool-use environment easily (tool setup and reward function definition)\n  + **Qwen3 Model support**: quickly train your agent using Qwen3 (much better than Qwen2.5 at tool calling)\n  + **Efficient training**: 2x faster than existing frameworks for rapid model iteration (mainly through async tool use)\n+ **Version 0.2** We look forward to more people joining the development to build a great Agentic Training community. Please feel free to contact us.\n  + **WebUI**: build a WebUI for data processing, tool & environment definition, training configuration, and project management [#2](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F2)\n  + **More efficient training**: support the AsyncLLMEngine for more efficient rollout [#4](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F4)\n  + **More models**: test more models (such as DeepSeek, Llama, etc.) 
and add corresponding support configurations [#5](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F5)\n  + **Process Reward**: use process rewards to better guide your model's tool-call behavior [#6](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F6)\n  + **More applications**: help create more demos (such as [TravelPlanner](https:\u002F\u002Fgithub.com\u002FOSU-NLP-Group\u002FTravelPlanner)) that cover more benchmarks\n  + **Multimodal agentic learning**: add functional support for multimodal (image) agent training [#66](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F66)\n  + **Android Environment**: added Android OS environment support [#38](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fpull\u002F38)\n  + **Tools cache**: cached tool invocation results to improve post-processing efficiency [#57](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fpull\u002F57)\n  + **Handy evaluation**: added main_eval.sh as an evaluation utility [#36](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fpull\u002F36)\n  + **Upgrade to VeRL-0.5**: Upgraded to VeRL-0.5 with maximal component decoupling [update\u002Fverl_0_5](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Ftree\u002Fupdate\u002Fverl_0_5)\n  + **Add MS-SWIFT-3.7 to RL-Factory**: Added support for MS-SWIFT-3.7 to make the framework more convenient for individual developers [#95](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fpull\u002F95)\n\n## Our Framework Design\nOur goal is to let users focus on reward logic and tool setup for fast agentic learning with minimal code, while hardcore developers can focus on improving training efficiency and model performance. \n\nFor **ease of use**, we decouple the environment from RL-based post-training, which brings several advantages: 
\n+ **Easy-to-design reward function**: Calculate rewards through **rules**, **model-judge**, and even **tools** to cover all your reward-function requirements.\n+ **Seamless tool setup**: Simply provide the configuration file for your **MCP tools** and custom tools to integrate them into RL learning.\n+ **Multi-Agent extension**: Convert your agent to the MCP format for easy multi-agent interaction. LLM chat simulation will also be added in the future to improve multi-turn dialogue capabilities. \n\nFor **efficient learning**, we develop several essential modules within the RL post-training framework, making training **2x faster**.\n+ **Efficient tool-call**: Improve online RL training efficiency through batch processing and asynchronous parallel tool calls.\n+ **Efficient reward calculation**: Deploy an LRM (e.g., QwQ-32B) in a distributed manner for efficient **model judging**, and use asynchronous parallelism to speed up reward calculation. \n\nFor **future progression**, we will continue to prioritize **\"easy\"** and **\"efficient\"**.\n+ **Easier**: Use a WebUI to process data, define tools and environments, adjust training configurations, and manage projects. 
(The WebUI is under rapid development.)\n+ **More efficient**: Continuously iterate on the training framework (e.g., AsyncLLMEngine) and the RL training algorithms.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FSimple-Efficient_RL-Factory_readme_a7e0fffa5adf.png\" alt=\"Description\" style=\"width:750px; height:auto;\"\u002F>\n\u003C\u002Fdiv>\n\n## User Instructions\n- **Dependencies (Key)**\n  ```yaml\n  Cuda: >=12.0 (Recommended: 12.4)\n  Python: >=3.10  (Recommended: 3.10)\n  # For Qwen3 model support\n  vllm: >=0.8.3 (Recommended: 0.8.5)\n  ```\n- **Install Requirements**\n  ```bash\n  # Quote version specifiers: an unquoted `>=` is parsed by the shell as an output redirect\n  pip3 install accelerate bitsandbytes datasets deepspeed==0.16.4 einops flash-attn==2.7.0.post2 isort jsonlines loralib optimum packaging peft \"pynvml>=12.0.0\" \"ray[default]==2.46.0\" tensorboard torch==2.6.0 torchmetrics tqdm transformers==4.51.3 transformers_stream_generator wandb wheel\n  pip3 install vllm==0.8.5      # Mainly for Qwen3 model support\n  pip3 install \"qwen-agent[code_interpreter]\"\n  pip3 install llama_index bs4 pymilvus infinity_client codetiming tensordict==0.6 omegaconf torchdata==0.10.0 hydra-core easydict dill python-multipart mcp==1.9.3\n  pip3 install -e . --no-deps\n  pip3 install faiss-gpu-cu12   # Optional, needed for end-to-end search model training with rag_server\n  pip3 install nvidia-cublas-cu12==12.4.5.8  # Optional, needed if Ray workers die unexpectedly during training\n  ```\n  \u003Cdiv style=\"padding:10px; background-color:#fff3cd; color:#856404; border:1px solid #ffeeba; border-radius:4px;\">\n  \u003Cstrong>Note:\u003C\u002Fstrong> Currently, only Qwen models have been tested.\n  \u003C\u002Fdiv>\n- **What do you need to provide?**\n  + An **environment** is enough! 
See the minimal tutorial in [`docs\u002Frl_factory\u002Fmain_tutorial.md`](docs\u002Frl_factory\u002Fmain_tutorial.md).\n- **Training Command**\n  ```bash\n  # Before running, modify MODEL_PATH, REWARD_MODEL_PATH, and several actor_rollout_ref.env parameters as needed\n  bash main_grpo.sh\n  ```\n- **Evaluation or Inference Command**\n  ```bash\n  # Before running, modify MODEL_PATH, REWARD_MODEL_PATH, and several data and trainer parameters as needed\n  bash main_eval.sh\n  ```\n\n\n## Demo in DeepSearch Training\n+ In [`docs\u002Frl_factory\u002Fmain_tutorial.md`](docs\u002Frl_factory\u002Fmain_tutorial.md), we provide an RLFactory reproduction example of [Search-R1](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1). We use `Qwen3-4B` and `Qwen3-8B` as the base models for RL training. \n+ **Easy**: Start with Qwen3 and MCP tools to quickly train your own DeepSearch Agent.\n  + Provide only one tool configuration and one reward function to start training! \n  + Qwen3 demonstrates significant advantages in Agent Learning. It can accurately call tools even without SFT, and it also supports the MCP protocol.\n\n+ **Efficient**: Enjoy the efficient training enabled by asynchronous parallel tool calls.\n  + Compared to Search-R1 based on the original verl, training is **1.5 to 2 times** faster, and the efficiency gain is even greater if a **model judge** is involved.\n  + After 100 steps of training (about 5 hours on A100 × 8), `Qwen3-4B` scores 0.458 on NQ and `Qwen3-8B` scores 0.463. 
\n+ The table below presents our training results under identical computational resources, software, and verl versions:\n  + RLFactory trains in about half the time of Search-R1, demonstrating high efficiency.\n  + Qwen3 as the base model outperforms Qwen2.5, enabling domain-specific tool calling via RL post-training without SFT.\n\n\u003Cp align=\"center\">\n\n| Model Name | Test Score (NQ) | Total Training Time (100 steps) | Time per Step | Training Resources |\n| --- | :---: | :---: | :---: | :---: |\n| Search-R1-Qwen2.5-3B-Instruct-GRPO | 0.356 | 7.39 h | 266 s | A100 × 8 |\n| Search-R1-Qwen2.5-7B-Instruct-GRPO | 0.451 | 9.25 h | 333 s | A100 × 8 |\n| Search-R1-Qwen3-4B-GRPO | 0.420 | 7.95 h | 286 s | A100 × 8 |\n| **RLFactory-Qwen3-4B-GRPO** | **0.458** | **5.30 h** | **190 s** | A100 × 8 | \n| **RLFactory-Qwen3-8B-GRPO** | **0.463** | **5.76 h** | **207 s** | A100 × 8 | \n\n\u003C\u002Fp>\n\n\n\n## How to contribute?\nWe welcome all users and developers to contribute code to RLFactory. If you have any questions, encounter bugs, or would like to collaborate on development, please feel free to contact us!\n\n1. Submit an issue directly on GitHub.  \n2. Contact us via email at chaijiajun@meituan.com or gjyin@outlook.com.\n3. 
Join our WeChat group (preferred) and become a pioneer in Agent training!\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FSimple-Efficient_RL-Factory_readme_29d6fbd71eee.jpg\" alt=\"Description\" style=\"width:200px; height:auto;\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FSimple-Efficient_RL-Factory_readme_fc16c400a168.png\" alt=\"Description\" style=\"width:200px; height:auto;\"\u002F>\n\u003C\u002Fdiv>\n\n## Acknowledgement\nThis repo benefits from [verl](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002FveRL), [Search-R1](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1), and [Qwen-Agent](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen-Agent). Thanks for their wonderful work. We will also introduce [TRL](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftrl) in the future to further expand the applicability of our framework.\n\n## 📚 Citation\nOur technical report can be found [here](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2509.06980).\nIf you find our work useful, please consider citing it:\n```\n@misc{chai2025rlfactoryplugandplayreinforcementlearning,\n      title={RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use}, \n      author={Jiajun Chai and Guojun Yin and Zekun Xu and Chuhuai Yue and Yi Jia and Siyu Xia and Xiaohan Wang and Jiwen Jiang and Xiaoguang Li and Chengqi Dong and Hang He and Wei Lin},\n      year={2025},\n      eprint={2509.06980},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.06980}, \n}\n```\nIf you have contributed to this project and wish to be included in our technical report, please contact me (gjyin@outlook.com) promptly.\n\n\n\n## Star History\n[![Star History 
Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FSimple-Efficient_RL-Factory_readme_b063159bfa0b.png)](https:\u002F\u002Fwww.star-history.com\u002F#Simple-Efficient\u002FRL-Factory&Date)  \n","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FSimple-Efficient_RL-Factory_readme_f2302eb9610a.png\" alt=\"Description\" style=\"width:300px; height:auto;\"\u002F>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n[📘教程](docs\u002Frl_factory\u002Fen\u002Fmain_tutorial.md) &#124; [🛠️安装](docs\u002Frl_factory\u002Fen\u002Finstall.md) &#124; [🎨框架](docs\u002Frl_factory\u002Fen\u002Fframework_design.md) &#124; [🏆模型](https:\u002F\u002Fhuggingface.co\u002FSimple-Efficient\u002FRLFactory-Qwen3-8B-GRPO)\n\n\u003C\u002Fdiv>\n\n---\n\nRLFactory 是一个**简单**且**高效**的用于**智能体学习**的强化学习后训练框架。\n\nRL-Factory 将环境与强化学习后训练解耦，只需工具配置和奖励函数即可进行训练，同时支持异步工具调用，使强化学习后训练速度提升至**2倍**。\n\n当前版本原生支持一键式**DeepSearch**训练，并具备多轮工具调用、模型评判奖励以及对包括**Qwen3**在内的多种模型的训练功能。未来还将陆续加入更多简单高效的智能体学习模块。\n\n\u003Cdiv align=\"center\">\n  \u003Cb>现在，每个人都可以轻松快速地使用 Qwen3（作为基础模型）和 MCP 工具来训练智能体！\u003C\u002Fb>\n\u003C\u002Fdiv>\n\n## 发布日志\n我们将保持快速的发布节奏，以迅速推出并完善即将上线的功能。\n+ **版本 0.1**\n  + **环境解耦**：轻松定义你的工具使用环境（工具设置和奖励函数定义）\n  + **Qwen3 模型支持**：使用 Qwen3 快速训练你的智能体（在工具调用方面远优于 Qwen2.5）\n  + **高效训练**：相比现有框架，训练速度提升至2倍，实现快速的模型迭代（主要通过异步工具调用）\n+ **版本 0.2** 我们期待更多人共同参与开发与建设，打造一个优秀的智能体训练社区。欢迎随时联系我们。\n  + **WebUI**：构建用于数据处理、工具与环境定义、训练配置及项目管理的 WebUI [#2](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F2)\n  + **更高效的训练**：支持 AsyncLLMEngine，以实现更高效的 rollout [#4](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F4)\n  + **更多模型**：测试更多模型（如 Deepseek、Llama 等），并添加相应的支持配置 [#5](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F5)\n  + **过程奖励**：利用过程奖励更好地引导模型的工具调用行为 [#6](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F6)\n  + **更多应用**：帮助创建更多演示项目（如 
[TravelPlanner](https:\u002F\u002Fgithub.com\u002FOSU-NLP-Group\u002FTravelPlanner)），以适配更多基准测试\n  + **多模态智能体学习**：在功能上支持多模态（图像）智能体训练 [#66](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F66)\n  + **Android 环境**：新增 Android OS 环境支持 [#38](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fpull\u002F38)\n  + **工具缓存**：缓存工具调用结果，以提升后处理效率 [#57](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fpull\u002F57)\n  + **便捷评估**：新增 main_eval.sh 用于评估工具 [#36](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fpull\u002F36)\n  + **升级至 VeRL-0.5**：通过最大程度的组件解耦，升级至 VeRL-0.5 [update\u002Fverl_0_5](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Ftree\u002Fupdate\u002Fverl_0_5)\n  + **将 MS-SWIFT-3.7 加入 RL-Factory**：增加了对 MS-SWIFT-3.7 的支持，使个人开发者使用起来更加方便 [#95](https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fpull\u002F95)\n\n## 我们的框架设计\n我们的目标是让用户能够专注于奖励逻辑和工具设置，以最少的代码实现快速的智能体学习；而核心开发者则可以专注于提升训练效率和模型性能。\n\n为了实现**易用性**，我们通过以下几点优势将环境与基于强化学习的后训练解耦：\n+ **易于设计的奖励函数**：可通过**规则**、**模型评判**，甚至**工具**来计算奖励，满足你对奖励函数的所有需求。\n+ **无缝工具设置**：只需提供你的**MCP 工具**和自定义工具的配置文件，即可将其集成到强化学习中。\n+ **多智能体扩展**：将你的智能体转换为 MCP 格式，以便于多智能体交互。未来还将加入 LLM 对话模拟功能，以提升多轮对话能力。\n\n为了实现**高效学习**，我们在强化学习后训练框架中开发了若干关键模块，使训练速度提升至**2倍**。\n+ **高效工具调用**：通过批量处理和异步并行工具调用，提升在线强化学习训练效率。\n+ **高效奖励计算**：以分布式方式部署 LRM（如 QwQ-32B）进行高效的**模型评判**，并利用异步并行技术加快奖励计算速度。\n\n对于**未来的进展**，我们将继续优先考虑**“简单”**和**“高效”**：\n+ **更简单**：使用 WebUI 处理数据、定义工具与环境、调整训练配置并管理项目。（WebUI 正在快速开发中。）\n+ **更高效**：持续迭代和改进训练框架（如 AsyncLLMEngine）以及强化学习训练算法。\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FSimple-Efficient_RL-Factory_readme_a7e0fffa5adf.png\" alt=\"Description\" style=\"width:750px; height:auto;\"\u002F>\n\u003C\u002Fdiv>\n\n## 用户使用说明\n- **依赖项（关键）**\n  ```yaml\n  Cuda: >=12.0（推荐：12.4）\n  Python: >=3.10（推荐：3.10）\n  # 用于支持Qwen3模型\n  vllm: >=0.8.3（推荐：0.8.5）\n  ```\n- **安装要求**\n  ```bash\n  
pip3 install accelerate bitsandbytes datasets deepspeed==0.16.4 einops flash-attn==2.7.0.post2 isort jsonlines loralib optimum packaging peft \"pynvml>=12.0.0\" \"ray[default]==2.46.0\" tensorboard torch==2.6.0 torchmetrics tqdm transformers==4.51.3 transformers_stream_generator wandb wheel\n  pip3 install vllm==0.8.5      # 主要用于支持Qwen3模型\n  pip3 install \"qwen-agent[code_interpreter]\"\n  pip3 install llama_index bs4 pymilvus infinity_client codetiming tensordict==0.6 omegaconf torchdata==0.10.0 hydra-core easydict dill python-multipart mcp==1.9.3\n  pip3 install -e . --no-deps\n  pip3 install faiss-gpu-cu12   # 可选，用于rag_server端到端检索模型训练时需要\n  pip3 install nvidia-cublas-cu12==12.4.5.8  # 可选，训练过程中遇到ray worker死亡问题时需要\n  ```\n  \u003Cdiv style=\"padding:10px; background-color:#fff3cd; color:#856404; border:1px solid #ffeeba; border-radius:4px;\">\n  \u003Cstrong>注：\u003C\u002Fstrong>目前仅对Qwen系列模型进行了测试。\n  \u003C\u002Fdiv>\n- **您需要提供什么？**\n  + 一个**环境**就足够了！请参阅[`docs\u002Frl_factory\u002Fmain_tutorial.md`](docs\u002Frl_factory\u002Fmain_tutorial.md)中的最小化教程。\n- **训练命令**\n  ```bash\n  # 运行前，请根据需要修改MODEL_PATH、REWARD_MODEL_PATH以及若干actor_rollout_ref.env参数\n  bash main_grpo.sh\n  ```\n- **评估或推理命令**\n  ```bash\n  # 运行前，请根据需要修改MODEL_PATH、REWARD_MODEL_PATH以及若干数据和trainer参数\n  bash main_eval.sh\n  ```\n\n\n## DeepSearch训练中的演示\n+ 在[`docs\u002Frl_factory\u002Fmain_tutorial.md`](docs\u002Frl_factory\u002Fmain_tutorial.md)中，我们提供了基于[Search-R1](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1)的RLFactory复现示例。我们使用`Qwen3-4B`和`Qwen3-8B`作为RL训练的基础模型。\n+ **简单**：从Qwen3和MCP工具开始，快速训练属于您自己的DeepSearch智能体。\n  + 您只需提供一种工具配置和一种奖励函数即可开始训练！\n  + Qwen3在智能体学习方面表现出显著优势。即使没有经过SFT微调，它也能准确地调用工具，并且还支持MCP协议。\n\n+ **高效**：享受异步并行工具调用带来的高效训练体验。\n  + 与基于原始verl的Search-R1相比，训练速度提升**1.5至2倍**，如果加入**模型评判者**，效率提升会更加明显。\n  + 经过100步训练（约5小时，A100 × 8），`Qwen3-4B`在NQ上得分达到0.458，`Qwen3-8B`得分达到0.463。\n+ 以下表格展示了我们在相同计算资源、软件和verl版本下的训练结果：\n  + RLFactory的训练时间约为Search-R1的一半，展现出极高的效率。\n  + 
以Qwen3为基础模型的表现优于Qwen2.5，能够在无需SFT的情况下通过RL后训练实现领域特定的工具调用。\n\n\u003Cp align=\"center\">\n\n| 模型名称 | 测试分数（NQ） | 总训练时间（100步） | 每步耗时 | 训练资源 |\n| --- | :---: | :---: | :---: | :---: |\n| Search-R1-Qwen2.5-3B-Instruct-GRPO | 0.356 | 7.39 h | 266 s | A100 × 8 |\n| Search-R1-Qwen2.5-7B-Instruct-GRPO | 0.451 | 9.25 h | 333 s | A100 × 8 |\n| Search-R1-Qwen3-4B-GRPO | 0.420 | 7.95 h | 286 s | A100 × 8 |\n| **RLFactory-Qwen3-4B-GRPO** | **0.458** | **5.30 h** | **190 s** | A100 × 8 | \n| **RLFactory-Qwen3-8B-GRPO** | **0.463** | **5.76 h** | **207 s** | A100 × 8 | \n\n\u003C\u002Fp>\n\n\n\n## 如何贡献？\n我们欢迎所有用户和开发者为RLFactory贡献代码。如果您有任何疑问、遇到bug，或者希望参与开发合作，请随时联系我们！\n\n1. 直接在GitHub上提交issue。\n2. 通过电子邮件chaijiajun@meituan.com或gjyin@outlook.com与我们联系。\n3. 加入我们的微信群（首选），成为智能体训练领域的先锋！\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FSimple-Efficient_RL-Factory_readme_29d6fbd71eee.jpg\" alt=\"描述\" style=\"width:200px; height:auto;\"\u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FSimple-Efficient_RL-Factory_readme_fc16c400a168.png\" alt=\"描述\" style=\"width:200px; height:auto;\"\u002F>\n\u003C\u002Fdiv>\n\n## 致谢\n本仓库受益于[verl](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002FveRL)、[Search-R1](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1)、[Qwen-Agent](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen-Agent)。感谢他们的杰出工作。未来我们还将引入[TRL](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftrl)，以进一步扩展我们框架的应用范围。\n\n## 📚 引用\n我们的技术报告可在[这里](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2509.06980)找到。如果您认为我们的工作有用，请考虑引用：\n```\n@misc{chai2025rlfactoryplugandplayreinforcementlearning,\n      title={RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use}, \n      author={Jiajun Chai and Guojun Yin and Zekun Xu and Chuhuai Yue and Yi Jia and Siyu Xia and Xiaohan Wang and Jiwen Jiang and Xiaoguang Li and Chengqi Dong and Hang He and Wei Lin},\n      
year={2025},\n      eprint={2509.06980},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      
url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.06980}, \n}\n```\n如果您为本项目做出了贡献，并希望被纳入我们的技术报告，请尽快与我联系（gjyin@outlook.com）。\n\n\n\n## 星级历史\n[![星级历史图表](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FSimple-Efficient_RL-Factory_readme_b063159bfa0b.png)](https:\u002F\u002Fwww.star-history.com\u002F#Simple-Efficient\u002FRL-Factory&Date)","# RL-Factory 快速上手指南\n\nRL-Factory 是一个**简单**且**高效**的强化学习（RL）后训练框架，专为**智能体学习**（Agentic Learning）设计。它通过将环境与 RL 训练解耦，仅需配置文件和奖励函数即可启动训练，并支持异步工具调用，使训练速度提升约 **2 倍**。目前原生支持基于 **Qwen3** 模型的 DeepSearch 任务及 MCP 工具协议。\n\n## 1. 环境准备\n\n在开始之前，请确保您的开发环境满足以下系统要求和依赖版本。\n\n### 系统要求\n- **CUDA**: >= 12.0 (推荐 12.4)\n- **Python**: >= 3.10 (推荐 3.10)\n\n### 核心依赖\n- **vllm**: >= 0.8.3 (推荐 0.8.5，主要用于支持 Qwen3 模型)\n- **注意**: 目前主要测试了 Qwen 系列模型。\n\n## 2. 安装步骤\n\n请按照以下顺序执行命令安装所需依赖。建议在使用前配置好国内 pip 镜像源（如清华源或阿里源）以加速下载。\n\n```bash\n# 1. 安装基础深度学习及 RL 相关依赖（版本约束需加引号，否则 shell 会把 `>=` 解析为输出重定向）\npip3 install accelerate bitsandbytes datasets deepspeed==0.16.4 einops flash-attn==2.7.0.post2 isort jsonlines loralib optimum packaging peft \"pynvml>=12.0.0\" \"ray[default]==2.46.0\" tensorboard torch==2.6.0 torchmetrics tqdm transformers==4.51.3 transformers_stream_generator wandb wheel\n\n# 2. 安装 vllm (Qwen3 模型支持关键组件)\npip3 install vllm==0.8.5\n\n# 3. 安装智能体及工具链相关依赖\npip3 install \"qwen-agent[code_interpreter]\"\npip3 install llama_index bs4 pymilvus infinity_client codetiming tensordict==0.6 omegaconf torchdata==0.10.0 hydra-core easydict dill python-multipart mcp==1.9.3\n\n# 4. 安装 RL-Factory 本体\npip3 install -e . --no-deps\n\n# 5. (可选) 若需进行端到端搜索模型训练 (RAG)，安装 faiss-gpu\npip3 install faiss-gpu-cu12\n\n# 6. (可选) 若训练中遇到 ray worker died 问题，安装特定版本的 cublas\npip3 install nvidia-cublas-cu12==12.4.5.8\n```\n\n## 3. 
基本使用\n\nRL-Factory 的设计理念是“开箱即用”。您只需要定义一个**环境**（包含工具配置和奖励函数），即可开始训练。\n\n### 第一步：准备环境配置\n参考官方最小化教程 [`docs\u002Frl_factory\u002Fmain_tutorial.md`](docs\u002Frl_factory\u002Fmain_tutorial.md)，您需要准备：\n1.  **工具配置文件**：定义您的 MCP 工具或自定义工具。\n2.  
**奖励函数**：可以通过规则、模型评判（Model Judge）或工具调用来计算奖励。\n\n> **提示**：框架已内置 DeepSearch 示例，可直接复用其配置结构快速上手。\n\n### 第二步：启动训练\n修改脚本 `main_grpo.sh` 中的关键参数（如 `MODEL_PATH`, `REWARD_MODEL_PATH` 以及 `actor_rollout_ref.env` 相关配置），然后运行：\n\n```bash\nbash main_grpo.sh\n```\n\n### 第三步：评估与推理\n训练完成后，修改 `main_eval.sh` 中的模型路径和数据参数，运行评估脚本：\n\n```bash\nbash main_eval.sh\n```\n\n### 性能参考\n在 8 张 A100 显卡环境下，使用 RL-Factory 训练 Qwen3-8B 模型（100 steps）仅需约 **5.76 小时**，相比传统框架效率提升显著，且无需 SFT 即可实现精准的工具调用能力。","某电商团队希望基于 Qwen3 大模型构建一个能自动查询库存、比对价格并生成采购建议的智能 Agent，以辅助运营人员决策。\n\n### 没有 RL-Factory 时\n- **环境耦合严重**：开发者需手动编写大量代码将库存 API、价格数据库等工具硬编码进训练循环，每次更换数据源都要重构底层逻辑。\n- **训练效率低下**：由于缺乏异步工具调用支持，模型在等待 API 返回时处于空闲状态，导致单次迭代耗时极长，难以快速验证想法。\n- **奖励函数难定义**：难以灵活结合“规则判断”（如价格低于阈值）与“模型评判”（如建议合理性），导致 Agent 学会投机取巧而非真正优化采购策略。\n- **多轮交互支持弱**：原生框架对多轮工具调用的状态管理复杂，Agent 常在中途丢失上下文，无法完成复杂的连环查询任务。\n\n### 使用 RL-Factory 后\n- **环境解耦配置化**：只需通过配置文件定义工具集和奖励逻辑，即可将 Qwen3 与电商后台无缝对接，无需修改核心训练代码。\n- **训练速度翻倍**：利用内置的异步工具调用机制，模型在等待接口响应时并行处理其他样本，整体训练效率提升 2 倍，加速模型迭代。\n- **灵活奖励机制**：轻松组合规则奖励（价格合规）与模型裁判奖励（建议质量），引导 Agent 输出既符合业务约束又具备商业价值的方案。\n- **原生多轮支持**：框架天然支持多轮工具调用状态管理，Agent 能流畅执行“查库存->比价格->算利润->生成报告”的复杂链路。\n\nRL-Factory 通过解耦环境与训练流程，让开发者专注于业务逻辑与奖励设计，以最低成本打造出高效可靠的垂直领域智能体。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FSimple-Efficient_RL-Factory_f2302eb9.png","Simple-Efficient","AsX","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FSimple-Efficient_45a7f545.png","Running towards AGI",null,"https:\u002F\u002Fgithub.com\u002FSimple-Efficient",[80,84,88,92],{"name":81,"color":82,"percentage":83},"Python","#3572A5",95.4,{"name":85,"color":86,"percentage":87},"Shell","#89e051",4.4,{"name":89,"color":90,"percentage":91},"Roff","#ecdebe",0.1,{"name":93,"color":94,"percentage":91},"Jinja","#a52a22",1736,163,"2026-04-17T14:56:42","Apache-2.0","Linux","必需 NVIDIA GPU，推荐 A100（测试环境为 8xA100），需支持 CUDA 12.0+（推荐 12.4），需安装 faiss-gpu-cu12 和 nvidia-cublas-cu12","未说明",{"notes":103,"python":104,"dependencies":105},"目前仅测试过通义千问（Qwen）系列模型；训练 DeepSearch 等任务需要配置 MCP 
工具和环境；若遇到 ray worker 崩溃问题，需特定版本 nvidia-cublas-cu12；支持异步工具调用以加速训练。",">=3.10 (推荐 3.10)",[106,107,108,109,110,111,112,113,114,115],"torch==2.6.0","vllm>=0.8.3 (推荐 0.8.5)","transformers==4.51.3","deepspeed==0.16.4","ray==2.46.0","flash-attn==2.7.0.post2","accelerate","peft","qwen-agent","mcp==1.9.3",[35,14,13],"2026-03-27T02:49:30.150509","2026-04-18T14:13:06.016465",[120,125,130,135,140,145],{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},40097,"如何修改整体流程以不使用模型自身的 Function Call 能力，而是通过自定义占位符（如\u003Cquery>）解析并调用工具？","如果您保持标准的对话流程（system -> user -> assistant -> tool -> assistant ...），通常不需要修改 rollout 逻辑。对于不同模型（如 Qwen3 和 Llama3），框架已在 tokenizer.apply_chat_template() 中做了适配处理：Qwen3 会自动拼接包含工具调用指令的 prompt，而 Llama3 则有单独的格式适配。只要遵循既定流程，框架会自动处理 prompt 拼接和工具调用的解析，无需手动干预底层逻辑。","https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F41",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},40098,"在使用多模态数据输入时，是否会导致工具信息缺失（sys_prompt 不包含工具信息）？","这是一个已修复的 Bug。此前在 rl_dataset.py 的多模态输入分支（第 223-263 行）中，确实遗漏了对 tool_manager.get_prompt() 的调用，导致系统提示词（sys_prompt）中缺少工具相关信息。该问题现已修复，确保在多模态模式下也能正确生成包含工具定义的 prompt。如果您遇到类似问题，请确保代码已更新到最新版本。","https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F81",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},40099,"遇到 flash-attn 版本冲突或安装错误怎么办？","如果报错显示 flash_attn 版本与当前 PyTorch 版本不匹配（例如显示 torch2.5 但实际环境是 torch2.6），请尝试安装特定版本的 wheel 文件。建议使用 flash-attn 2.7.1 及以上版本，并确保其与您的 PyTorch 版本对应。例如，对于 Torch 2.6 + CUDA 12，可安装：\npip install https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention\u002Freleases\u002Fdownload\u002Fv2.7.1.post4\u002Fflash_attn-2.7.1.post4+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl","https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F50",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},40100,"使用 vLLM 进行多轮工具调用时，遇到 'ray.exceptions.ActorDiedError' 错误如何解决？","该错误通常由 vLLM 与 Torch 或 CUDA 库的版本兼容性引起，特别是 
nvidia-cublas 库。解决方法是强制安装特定版本的 nvidia-cublas-cu12。请执行以下命令：\npip3 install nvidia-cublas-cu12==12.4.5.8\n安装后重启任务，通常可解决因显存溢出或底层库冲突导致的 Actor 意外退出问题。","https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F8",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},40101,"框架是否支持过程奖励（Process Reward）机制，而不仅仅是最终结果奖励？","是的，框架已初步实现了过程奖励机制。除了传统的基于最终结果（如任务成功\u002F失败）的奖励外，现在支持用户自定义过程奖励模式。您可以在工具选择、参数有效性验证和推理连贯性等步骤注入奖励信号。这有助于解决复杂多轮任务中的反馈稀疏问题，加速模型收敛并提升工具调用熟练度。具体实现细节可联系维护者或查看最新代码。","https:\u002F\u002Fgithub.com\u002FSimple-Efficient\u002FRL-Factory\u002Fissues\u002F6",{"id":146,"question_zh":147,"answer_zh":148,"source_url":129},40102,"在多模态管理器（如 qwen2_5_vl_manager.py）中，tokenizer 初始化时为 None 是否会影响 get_prompt 的执行？","不会影响。虽然在 __init__() 中 self.tokenizer 被初始化为 None，但在 mmbase.py 首次调用 get_prompt 时，代码会检测到 tokenizer 为 None 并自动执行 _load_custom_chat_template(tokenizer) 进行加载。加载完成后，self.tokenizer 将被正确赋值，后续的 get_prompt 调用即可正常使用该 tokenizer 生成包含工具信息的 prompt。",[150],{"id":151,"version":152,"summary_zh":153,"released_at":154},323628,"v0.1.0","这个版本让你能够快速训练属于自己的智能体模型。  \n+ **环境解耦**：轻松定义工具使用环境（包括工具的配置和奖励函数的设定）  \n+ **支持通义千问3模型**：利用通义千问3快速训练智能体模型，其工具调用能力远超通义千问2.5  \n+ **高效训练**：相比现有框架，训练速度提升2倍，实现模型的快速迭代（主要得益于异步工具调用机制）","2025-05-23T04:01:54"]