[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-openreasoner--openr":3,"tool-openreasoner--openr":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160784,2,"2026-04-19T11:32:54",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":72,"owner_avatar_url":73,"owner_bio":74,"owner_company":74,"owner_location":74,"owner_email":74,"owner_twitter":74,"owner_website":74,"owner_url":75,"languages":76,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":93,"env_os":94,"env_gpu":95,"env_ram":94,"env_deps":96,"category_tags":104,"github_topics":74,"view_count":32,"oss_zip_url":74,"oss_zip_packed_at":74,"status":17,"created_at":105,"updated_at":106,"faqs":107,"releases":136},9861,"openreasoner\u002Fopenr","openr","OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models","OpenR 是一个专为提升大语言模型高级推理能力而设计的开源框架。面对复杂数学问题或逻辑推导任务时，普通大模型往往难以生成准确且连贯的解题步骤，OpenR 正是为了解决这一痛点而生。它提供了一套完整的工具链，支持从数据构建、模型训练到推理验证的全流程，帮助开发者高效地探索和实现更强大的推理算法。\n\n该框架特别适合人工智能研究人员和算法工程师使用，尤其是那些致力于优化模型在数学、科学等领域推理表现的专业人士。OpenR 不仅开源了核心的训练与推理代码，还分享了高质量的专用数据集（如 MATH-APS）以及经过微调的预训练模型，极大地降低了复现前沿研究成果的门槛。其技术亮点在于模块化设计，允许用户灵活集成不同的推理策略，并提供了详细的教程和文档，方便快速上手。无论是希望深入理解大模型推理机制的研究者，还是想要在实际应用中部署高精度推理模型的开发者，OpenR 
都是一个值得尝试的强大助手。","\u003Cdiv id=\"top\">\u003C\u002Fdiv>\n\u003C!--\n*** Thanks for checking out the Best-README-Template. If you have a suggestion\n*** that would make this better, please fork the repo and create a pull request\n*** or simply open an issue with the tag \"enhancement\".\n*** Don't forget to give the project a star!\n*** Thanks again! Now go create something AMAZING! :D\n-->\n\n\u003C!-- PROJECT SHIELDS -->\n\n\u003C!--\n*** I'm using markdown \"reference style\" links for readability.\n*** Reference links are enclosed in brackets [ ] instead of parentheses ( ).\n*** See the bottom of this document for the declaration of the reference variables\n*** for contributors-url, forks-url, etc. This is an optional, concise syntax you may use.\n*** https:\u002F\u002Fwww.markdownguide.org\u002Fbasic-syntax\u002F#reference-style-links\n-->\n\n\u003C!-- \n***[![MIT License][license-shield]][license-url]\n-->\n\n\u003C!-- PROJECT LOGO -->\n\n\u003Cbr \u002F>\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_6a636a3bde09.png\" alt=\"Logo\" width=\"200\">\n  \u003C\u002Fa>\n  \n\u003Ch1 align=\"center\" style=\"font-size: 30px;\">\u003Cstrong>\u003Cem>OpenR\u003C\u002Fem>\u003C\u002Fstrong>: An Open Source Framework for Advanced Reasoning with Large Language Models\u003C\u002Fh1>\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.09671\">Paper\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fblob\u002Fmain\u002Freports\u002FTutorial-LLM-Reasoning-Wang.pdf\">Tutorial\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\">Code\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fopenreasoner.github.io\u002F\">Docs\u003C\u002Fa>\n    ·\n    \u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopenreasoner\u002FMATH-APS\">Data\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fopenreasoner\u002FMath-psa\">Model\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\">Issue\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002Fmodelscope\u002FOpenR_Inference\">Demo\u003C\u002Fa>\n  \u003C\u002Fp>\n    \u003Cp align=\"center\">\n     [ \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fblob\u002Fmain\u002FREADME.md\">English\u003C\u002Fa> ][ \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fblob\u002Fmain\u002FREADME_zh.md\">中文\u003C\u002Fa> ]\n    \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n---\n[![GitHub contributors](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcontributors\u002Fopenreasoner\u002Fopenr)][contributors-url]\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArXiv-2410.09671-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.09671)\n![GitHub License](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fopenreasoner\u002Fopenr)\n[![GitHub Issues or Pull Requests](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fopenreasoner\u002Fopenr)][issues-url]\n[![GitHub forks](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fopenreasoner\u002Fopenr)][forks-url]\n[![GitHub Repo stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopenreasoner\u002Fopenr)][stars-url]\n[![HuggingFace 
Dataset](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-FFD21E?logo=huggingface&logoColor=000)](https:\u002F\u002Fhuggingface.co\u002Fopenreasoner)\n[![X](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fopenreasoner-%23000000.svg?logo=X&logoColor=white)](https:\u002F\u002Fx.com\u002Fopenreasoner)\n[![WeChat](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat_Group-07C160?logo=wechat&logoColor=white)](#community)\n\n\n\u003C!-- TABLE OF CONTENTS -->\n\n\u003Cdetails>\n  \u003Csummary>\u003Cspan style=\"font-size: 1.5em;\">\u003Cstrong>Table of Contents\u003C\u002Fstrong> 📖 \u003C\u002Fspan>\u003C\u002Fsummary>\n  \u003Col>\n    \u003Cli>\u003Ca href=\"#news-and-updates\">News and Updates\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#features\">Features\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#todo\">TODO\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#benchmark\">Benchmark\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#plots\">Plots\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#provided-datasets-and-models\">Datasets and Models\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\n      \u003Ca href=\"#getting-started\">Getting Started\u003C\u002Fa>\n      \u003Cul>\n        \u003Cli>\u003Ca href=\"#installation\">Installation\u003C\u002Fa>\u003C\u002Fli>\n        \u003Cli>\u003Ca href=\"#quickstart\">Quick Start\u003C\u002Fa>\u003C\u002Fli>\n      \u003C\u002Ful>\n    \u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#usage\">Usage\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#join-us\">Join Us\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#contact\">Contact\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#response-examples\">Response Examples\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#community\">Community\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#reference\">Reference\u003C\u002Fa>\u003C\u002Fli>\n  
\u003C\u002Fol>\n\n\u003C\u002Fdetails>\n\n\u003C!-- News and Updates -->\n\n## News and Updates\n- **[29\u002F11\u002F2024]** We have now added a [**demo**](https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002Fmodelscope\u002FOpenR_Inference) page on *ModelScope*. Many thanks to [@wangxingjun778](https:\u002F\u002Fgithub.com\u002Fwangxingjun778) !\n- **[24\u002F10\u002F2024]** ***OpenR*** now supports **MCTS** reasoning ([#24](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F24))! 🌲\n- **[15\u002F10\u002F2024]** Our report is on [**Arxiv**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.09671)! \n- **[12\u002F10\u002F2024]** ***OpenR*** has been released! 🚀 \n\n\n## Features\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_4a4a79c21426.png\" alt=\"Description\" style=\"width: 300px; margin-left: 50px; float: right;\">\n\u003C\u002Fp>\n\n| Feature                                | Contents                                                                                                                                                                                                                                                                              |\n|----------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| ✅ Process-supervision Data Generation | - [**OmegaPRM**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.06592): Improve Mathematical Reasoning in Language Models by Automated Process Supervision                                                                                                                                                |\n| ✅ Online Policy Training              | - [**RL 
Training**](train\u002Fmat\u002Ftrainers): APPO, GRPO, TPPO;                                                                                                                                                                                                                            |\n| ✅ Generative and Discriminative PRM Training | - [**PRM Training**](prm\u002Fcode): Supervised Training for PRMs\u003Cbr> - **Generative RM Training**: [Direct GenRM](gen_rm\u002F)                                                                                                                                                                |\n| ✅ Multiple Search Strategies          | - **Greedy Search**\u003Cbr> - **Best-of-N**\u003Cbr> - **Beam Search**\u003Cbr> - **MCTS**\u003Cbr> - [**rStar**](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.06195v1): Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers\u003Cbr> - **Critic-MCTS**: [Under Review](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F44) |\n| ✅ Test-time Computation and Scaling Law | TBA, see [benchmark](#benchmark)                                                                                                                                                                                                                                                      |\n\n\n## TODO\n\n\n| Feature                                 | TODO (\u003Cspan style=\"color:red;\">High Priority\u003C\u002Fspan>, We value you contribution!)                                                                                                                                                                                                                                                                                                                                                                                              
|\n|-----------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| 👨‍💻Data                                    | - Re-implement [**Journey Learning**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.18982)                                                                                                                                                                                                                                                                                                                                                                                                       |\n| 👨‍💻RL Training                             | - Distributed Training\u003Cbr\u002F>- Reinforcement Fine-Tuning (RFT) [#80](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\u002F80)                                                                                                                                                                                                                                                                                                                                                           |\n| 👨‍💻PRM                                     | - Larger-scale training\u003Cbr> - GenRM-CoT implementation \u003Cbr\u002F>- Soft-label training [#57](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\u002F57)                                                                                                                                                                                
                                                                                                                                                      |\n| 👨‍💻Reasoning                               | - Optimize code structure [#53](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F53) \u003Cbr> - More tasks on reasoning (AIME, etc.) [#53](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F53) \u003Cbr> - Multi-modal reasoning [#82](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\u002F82) \u003Cbr> - Reasoning in code generation [#68](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F68) \u003Cbr\u002F> - Dots [#75](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F75) \u003Cbr\u002F> - Consistency check \u003Cbr\u002F> - Benchmarking |\n\n## Benchmark\n\nSee [Benchmark](benchmark) !\n\n\n\n\n\n## Plots\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_84bbb9893a61.png\" alt=\"PRM_Results\" width=\"45%\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_6e62d09bc7d8.png\" alt=\"Inference_Results\" width=\"45%\" \u002F>\n\u003C\u002Fp>\n\n## Provided Datasets and Models\n\n[\u002F\u002F]: # ([PRM800K]&#40;https:\u002F\u002Fgithub.com\u002Fopenai\u002Fprm800k&#41; &#40;Process Supervision Dataset&#41;)\n\n[MATH-APS](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fmengfang\u002FMATH-APS) (Our Dataset)\n\n[MATH-psa](https:\u002F\u002Fhuggingface.co\u002Fopenreasoner\u002FMath-psa) (Our Process Reward Model)\n\n## Getting Started\n\n\n### Installation\n\n```\nconda create -n open_reasoner python=3.10\nconda activate open_reasoner\npip install -r requirements.txt\npip3 install  \"fschat[model_worker,webui]\"\npip install -U pydantic\ncd envs\u002FMATH\u002Flatex2sympy\npip install -e .\ncd -\n```\n\n\n### Download Base 
Models\n\n\nBefore running the project, please ensure that all required base models are downloaded. The models used in this project include:\n\n- `Qwen2.5-Math-1.5B-Instruct`, `Qwen2.5-Math-7B-Instruct`\n- `peiyi9979\u002Fmistral-7b-sft`\n- `peiyi9979\u002Fmath-shepherd-mistral-7b-prm`\n\nTo download these models, please refer to the [Hugging Face model downloading tutorial](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fmodels-downloading) for step-by-step guidance on downloading models from the Hugging Face Hub.\n\nPlease make sure that all models are saved in their directories according to the project setup before proceeding.\n\n\n### Quickstart\n\nBefore running inference, please modify the following variables in the scripts under `reason\u002Fllm_service\u002F` to set the appropriate base models for your usage:\n\n- `$MODEL_BASE`: Set this to the directory where your models are stored.\n- `$POLICY_MODEL_NAME`: Set this to the name of the policy model you wish to use.\n- `$VALUE_MODEL_NAME`: Set this to the name of the value model you wish to use.\n- `$NUM_LM_WORKER`: Set this to the number of language model (LM) workers to start.\n- `$NUM_RM_WORKER`: Set this to the number of reward model (RM) workers to start.\n\nWith these variables set, you can start the services and then run inference with the different search strategies.\n\n#### Start LM & RM Services\nFor example, to start the LM and RM services for the Math Shepherd model, run the following command:\n```bash\nsh reason\u002Fllm_service\u002Fcreate_service_math_shepherd.sh\n```\n\nTo kill the server processes, we recommend using the following command:\n```bash\ntmux kill-session -t {Your Session Name} # default is `FastChat`\n```\n\n## Usage\n\n#### Run Inference\n\n\n⚠️ Make sure the input (`--LM`, `--RM`) in the script aligns with the variables (`$POLICY_MODEL_NAME`, `$VALUE_MODEL_NAME`) in the pending worker!\n\n\n\n```bash\nexport PYTHONPATH=$(pwd)\nsh scripts\u002Feval\u002Fcot_greedy.sh\n\n# Method: cot. 
Average result: ({'majority_vote': 0.734, 'total_completion_tokens': 559.13},)\n\nsh scripts\u002Feval\u002Fcot_rerank.sh\n\n# Method: best_of_n. Average result: ({'majority_vote': 0.782, \n#                                       'prm_min_max': 0.772, \n#                                       'prm_min_vote': 0.792, \n#                                       'prm_last_max': 0.776, \n#                                       'prm_last_vote': 0.792, \n#                                       'total_completion_tokens': 4431.268},)\n\nsh scripts\u002Feval\u002Fbeam_search.sh\n\n# Method: beam_search. Average result: ({'majority_vote': 0.74, 'total_completion_tokens': 2350.492},)\n\nsh scripts\u002Feval\u002Fvanila_mcts.sh\n\n```\n\n#### Run Training\n\n⚠️ Before training, please modify the `$dataset_path`, `$model_name_or_path` and `$prm_name_or_path` in `train\u002Fmat\u002Fscripts\u002Ftrain_llm.sh`.\n\n```bash\ncd train\u002Fmat\u002Fscripts\nbash train_llm.sh\n```\n\n#### Run PRM Learning\n\n```bash\ncd prm\u002Fcode\n\n# single gpu\npython finetune_qwen_single_gpu.py --model_path $YOUR_MODEL_PATH \\\n                                   --train_data_path $TRAIN_DATA_PATH \\\n                                   --test_data_path $TEST_DATA_PATH\n\n\n# multi gpu\ntorchrun --nproc_per_node=2 finetune_qwen.py --model_path $YOUR_MODEL_PATH \\\n                                             --data_path $YOUR_DATA_FOLDER_PATH \\\n                                             --datasets both\n```\n\n## Join Us\n\n> Every contribution is valuable to the community.\n\nThank you for your interest in ***OpenR***! 🥰 We are deeply committed to the open-source community, \nand we welcome contributions from everyone. Your efforts, whether big or small, help us grow and improve. \nContributions aren’t limited to code: answering questions, helping others, enhancing our \ndocumentation, and sharing the project are equally impactful. 
\n\nFeel free to check out the [contribution guidance](CONTRIBUTING.md)! \n\n### Future Plan\n\n- Add More Comprehensive Evaluations on RL Training and Search Strategies\n\n- Scale the Prove-Verifier Model Size\n\n- Support Self-improvement Training\n\n\u003C!-- CONTACT -->\n\n## Contact\n\nThe ***OpenR*** community is maintained by:\n\n- **Openreasoner Team** (openreasoner@gmail.com)\n\n## License\nOpenR is released under the MIT License.\n\n## Citation\n\nIf you find our resources helpful, please cite our papers:\n\n```\n@misc{wang2024tutorial,\n  author = {Jun Wang},\n  title = {A Tutorial on LLM Reasoning: Relevant Methods Behind ChatGPT o1},\n  year = {2024},\n  url = {https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fblob\u002Fmain\u002Freports\u002Ftutorial.pdf},\n  note = {Available on GitHub}\n}\n```\n\n```\n@article{wang2024openr,\n  title={OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models},\n  author={Wang, Jun and Fang, Meng and Wan, Ziyu and Wen, Muning and Zhu, Jiachen and Liu, Anjie and Gong, Ziqin and Song, Yan and Chen, Lei and Ni, Lionel M and others},\n  journal={arXiv preprint arXiv:2410.09671},\n  year={2024}\n}\n```\n\n## Response Examples\n\n### Comparing PRM, Math-psa (Ours) vs. 
Math-Shepherd \n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_22171dcc2121.png\" alt=\"QA 1\" width=\"49%\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_b36d1b8d6a7a.png\" alt=\"QA 2\" width=\"49%\" \u002F>\n\u003C\u002Fp>\n\n\n### Justifying RL Training\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_06dff447d9fc.png\" alt=\"QA 3\" width=\"49%\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_4e56ae5d6cb4.png\" alt=\"QA 4\" width=\"49%\" \u002F>\n\u003C\u002Fp>\n\n### Exploring Test-time Computation\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_40c1d797a4c6.png\" alt=\"QA 5\" width=\"70%\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_c2ce89e0da1a.png\" alt=\"QA 6\" width=\"70%\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_e8ee1048524a.png\" alt=\"QA 7\" width=\"70%\" \u002F>\n\u003C\u002Fp>\n\n\n## Community\n\n**WeChat**:\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_adbe2bf87cb8.jpg\" width=\"30%\" \u002F>\n\n\n\n## Reference\n\n### Inference-time Computing\n[1] [Alphazero-like tree-search can guide large language model decoding and training.](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.17179)\n\n[2] [Reasoning with language model is planning with world model.](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.14992)\n\n[3] [Scaling LLM test-time compute optimally can be more effective than scaling model parameters](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.03314?)\n\n[4] [Think before you speak: Training language models with 
pause tokens](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.02226)\n\n\n### From Outcome Supervision to Process Supervision\n\n[1] [Training verifiers to solve math word problems](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.14168)\n\n[2] [Solving math word problems with process-and outcome-based feedback](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.14275)\n\n[3] [Let’s verify step by step](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.20050)\n\n[4] [Making large language models better reasoners with step-aware verifier](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2206.02336)\n\n[5] [Ovm, outcome-supervised value models for planning in\nmathematical reasoning](https:\u002F\u002Faclanthology.org\u002F2024.findings-naacl.55.pdf)\n\n[6] [Generative verifiers: Reward modeling as next-token prediction](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.15240)\n\n### Data Acquisition\n\n[1] [Star: Bootstrapping reasoning with reasoning](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002F639a9a172c044fbb64175b5fad42e9a5-Paper-Conference.pdf)\n\n[2] [Quiet-star: Language models can teach themselves to think before speaking](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.09629)\n\n[3] [Improve mathematical reasoning in language models by automated\nprocess supervision](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.06592)\n\n[4] [Shepherd: A critic for language model generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.04592)\n\n[5] [Math-shepherd: Verify and reinforce llms step-by-step without human annotations](https:\u002F\u002Faclanthology.org\u002F2024.acl-long.510.pdf)\n\n\u003C!-- MARKDOWN LINKS & IMAGES -->\n\n\u003C!-- https:\u002F\u002Fwww.markdownguide.org\u002Fbasic-syntax\u002F#reference-style-links -->\n\n[contributors-shield]: https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcontributors\u002Fopenreasoner\u002Fopenr.svg?style=for-the-badge\n[contributors-url]: 
https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fgraphs\u002Fcontributors\n[forks-shield]: https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fopenreasoner\u002Fopenr.svg?style=for-the-badge\n[forks-url]: https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fnetwork\u002Fmembers\n[stars-shield]: https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopenreasoner\u002Fopenr.svg?style=for-the-badge\n[stars-url]: https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fstargazers\n[issues-shield]: https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fopenreasoner\u002Fopenr.svg?style=for-the-badge\n[issues-url]: https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\n\n[license-shield]: https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fopenreasoner\u002Fopenr.svg?style=for-the-badge\n[license-url]: https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fblob\u002Fmain\u002FLICENSE.txt\n","\u003Cdiv id=\"top\">\u003C\u002Fdiv>\n\u003C!--\n*** 感谢您查看最佳 README 模板。如果您有任何改进建议，请 fork 该仓库并提交 pull request，\n*** 或直接开一个带有“enhancement”标签的问题。别忘了给项目点个 star！\n*** 再次感谢！现在就去创造一些令人惊叹的东西吧：D\n-->\n\n\u003C!-- 项目状态徽章 -->\n\n\u003C!--\n*** 我使用 Markdown 的“引用式”链接以提高可读性。\n*** 引用式链接用方括号 [ ] 而不是圆括号 ( ) 表示。\n*** 文档底部列出了 contributors-url、forks-url 等引用变量的声明，这是一种可选的简洁语法。\n*** https:\u002F\u002Fwww.markdownguide.org\u002Fbasic-syntax\u002F#reference-style-links\n-->\n\n\u003C!-- \n***[![MIT License][license-shield]][license-url]\n-->\n\n\u003C!-- 项目 Logo -->\n\n\u003Cbr \u002F>\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_6a636a3bde09.png\" alt=\"Logo\" width=\"200\">\n  \u003C\u002Fa>\n  \n\u003Ch1 align=\"center\" style=\"font-size: 30px;\">\u003Cstrong>\u003Cem>OpenR\u003C\u002Fem>\u003C\u002Fstrong>: 一个用于大型语言模型高级推理的开源框架\u003C\u002Fh1>\n\u003Cp 
align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.09671\">论文\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fblob\u002Fmain\u002Freports\u002FTutorial-LLM-Reasoning-Wang.pdf\">教程\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\">代码\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fopenreasoner.github.io\u002F\">文档\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopenreasoner\u002FMATH-APS\">数据\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fopenreasoner\u002FMath-psa\">模型\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\">问题\u003C\u002Fa>\n    ·\n    \u003Ca href=\"https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002Fmodelscope\u002FOpenR_Inference\">演示\u003C\u002Fa>\n  \u003C\u002Fp>\n    \u003Cp align=\"center\">\n     [ \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fblob\u002Fmain\u002FREADME.md\">英文\u003C\u002Fa> ][ \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fblob\u002Fmain\u002FREADME_zh.md\">中文\u003C\u002Fa> ]\n    \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n---\n[![GitHub contributors](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcontributors\u002Fopenreasoner\u002Fopenr)][contributors-url]\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArXiv-2410.09671-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.09671)\n![GitHub License](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fopenreasoner\u002Fopenr)\n[![GitHub Issues or Pull Requests](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fopenreasoner\u002Fopenr)][issues-url]\n[![GitHub forks](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fopenreasoner\u002Fopenr)][forks-url]\n[![GitHub Repo 
stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopenreasoner\u002Fopenr)][stars-url]\n[![HuggingFace Dataset](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-FFD21E?logo=huggingface&logoColor=000)](https:\u002F\u002Fhuggingface.co\u002Fopenreasoner)\n[![X](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fopenreasoner-%23000000.svg?logo=X&logoColor=white)](https:\u002F\u002Fx.com\u002Fopenreasoner)\n[![WeChat](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat_Group-07C160?logo=wechat&logoColor=white)](#community)\n\n\n\u003C!-- 目录 -->\n\n\u003Cdetails>\n  \u003Csummary>\u003Cspan style=\"font-size: 1.5em;\">\u003Cstrong>目录\u003C\u002Fstrong> 📖 \u003C\u002Fspan>\u003C\u002Fsummary>\n  \u003Col>\n    \u003Cli>\u003Ca href=\"#news-and-updates\">新闻与更新\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#features\">功能特性\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#todo\">待办事项\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#benchmark\">基准测试\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#plots\">图表\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#provided-datasets-and-models\">提供的数据集和模型\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\n      \u003Ca href=\"#getting-started\">开始使用\u003C\u002Fa>\n      \u003Cul>\n        \u003Cli>\u003Ca href=\"#installation\">安装\u003C\u002Fa>\u003C\u002Fli>\n        \u003Cli>\u003Ca href=\"#quickstart\">快速入门\u003C\u002Fa>\u003C\u002Fli>\n      \u003C\u002Ful>\n    \u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#usage\">使用方法\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#join-us\">加入我们\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#contact\">联系我们\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#response-examples\">响应示例\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#community\">社区\u003C\u002Fa>\u003C\u002Fli>\n    \u003Cli>\u003Ca href=\"#reference\">参考文献\u003C\u002Fa>\u003C\u002Fli>\n  
\u003C\u002Fol>\n\n\u003C\u002Fdetails>\n\n\u003C!-- 新闻与更新 -->\n\n## 新闻与更新\n- **[29\u002F11\u002F2024]** 我们现在在 *ModelScope* 上新增了 [**演示**](https:\u002F\u002Fwww.modelscope.cn\u002Fstudios\u002Fmodelscope\u002FOpenR_Inference) 页面。非常感谢 [@wangxingjun778](https:\u002F\u002Fgithub.com\u002Fwangxingjun778)！\n- **[24\u002F10\u002F2024]** ***OpenR*** 现在支持 **MCTS** 推理 ([#24](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F24))！🌲\n- **[15\u002F10\u002F2024]** 我们的报告已发表在 [**Arxiv**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.09671) 上！\n- **[12\u002F10\u002F2024]** ***OpenR*** 正式发布！🚀\n\n\n## 功能特性\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_4a4a79c21426.png\" alt=\"Description\" style=\"width: 300px; margin-left: 50px; float: right;\">\n\u003C\u002Fp>\n\n| 功能                                | 内容                                                                                                                                                                                                                                                                              |\n|----------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| ✅ 过程监督数据生成 | - [**OmegaPRM**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.06592): 通过自动化过程监督提升语言模型的数学推理能力                                                                                                                                                |\n| ✅ 在线策略训练              | - [**RL 训练**](train\u002Fmat\u002Ftrainers): APPO、GRPO、TPPO；                                                                                                                                                              
                                                              |\n| ✅ 生成式与判别式 PRM 训练 | - [**PRM 训练**](prm\u002Fcode): PRM 的监督训练\u003Cbr> - **生成式 RM 训练**: [Direct GenRM](gen_rm\u002F)                                                                                                                                                                |\n| ✅ 多种搜索策略          | - **贪心搜索**\u003Cbr> - **Best-of-N**\u003Cbr> - **束搜索**\u003Cbr> - **MCTS**\u003Cbr> - [**rStar**](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.06195v1): 相互推理使小型 LLM 更强大\u003Cbr> - **Critic-MCTS**: [正在审查](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F44) |\n| ✅ 测试时计算与规模定律 | 待定，详见 [基准测试](#benchmark)                                                                                                                                                                                                                                                      |\n\n## 待办事项\n\n\n| 功能                                 | TODO (\u003Cspan style=\"color:red;\">高优先级\u003C\u002Fspan>, 我们非常重视您的贡献!)                                                                                                                                                                                                                                                                                                                                                                                              
|\n|-----------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| 👨‍💻数据                                    | - 重新实现 [**旅程学习**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.18982)                                                                                                                                                                                                                                                                                                                                                                                                       |\n| 👨‍💻强化学习训练                             | - 分布式训练\u003Cbr\u002F>- 强化微调 (RFT) [#80](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\u002F80)                                                                                                                                                                                                                                                                                                                                                           |\n| 👨‍💻PRM                                     | - 更大规模的训练\u003Cbr> - GenRM-CoT 实现 \u003Cbr\u002F>- 软标签训练 [#57](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\u002F57)                                                                                                                                                                                                                                                                                       
                                               |\n| 👨‍💻推理                               | - 优化代码结构 [#53](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F53) \u003Cbr> - 更多推理任务（AIME 等）[#53](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F53) \u003Cbr> - 多模态推理 [#82](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\u002F82) \u003Cbr> - 代码生成中的推理 [#68](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F68) \u003Cbr\u002F> - Dots [#75](https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F75) \u003Cbr\u002F> - 一致性检查 \u003Cbr\u002F> - 基准测试 |\n\n## 基准测试\n\n请参阅 [基准测试](benchmark) !\n\n\n\n\n\n## 图表\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_84bbb9893a61.png\" alt=\"PRM_Results\" width=\"45%\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_6e62d09bc7d8.png\" alt=\"Inference_Results\" width=\"45%\" \u002F>\n\u003C\u002Fp>\n\n## 提供的数据集和模型\n\n[\u002F\u002F]: # ([PRM800K]&#40;https:\u002F\u002Fgithub.com\u002Fopenai\u002Fprm800k&#41; &#40;过程监督数据集&#41;)\n\n[MATH-APS](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fmengfang\u002FMATH-APS) (我们的数据集)\n\n[MATH-psa](https:\u002F\u002Fhuggingface.co\u002Fopenreasoner\u002FMath-psa) (我们的过程奖励模型)\n\n## 开始使用\n\n\n### 安装\n\n```\nconda create -n open_reasoner python=3.10\nconda activate open_reasoner\npip install -r requirements.txt\npip3 install  \"fschat[model_worker,webui]\"\npip install -U pydantic\ncd envs\u002FMATH\u002Flatex2sympy\npip install -e .\ncd -\n```\n\n\n### 下载基础模型\n\n\n在运行项目之前，请确保所有所需的基础模型均已下载。本项目中使用的模型包括：\n\n- `Qwen2.5-Math-1.5B-Instruct`, `Qwen2.5-Math-7B-Instruct`\n- `peiyi9979\u002Fmistral-7b-sft`\n- `peiyi9979\u002Fmath-shepherd-mistral-7b-prm`\n\n要下载这些模型，请参考 [Hugging Face 模型下载教程](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fmodels-downloading)，其中提供了从 
Hugging Face Hub 下载模型的分步指南。\n\n请务必在继续操作之前，将所有模型按照项目设置保存到相应的目录中。\n\n\n### 快速入门\n\n在运行推理之前，请修改 `reason\u002Fllm_service\u002F` 目录下脚本中的以下变量，以设置适合您使用的相应基础模型：\n\n- `$MODEL_BASE`: 设置为您存储模型的目录。\n- `$POLICY_MODEL_NAME`: 设置为您希望使用的策略模型名称。\n- `$VALUE_MODEL_NAME`: 设置为您希望使用的价值模型名称。\n- `$NUM_LM_WORKER`: 设置为要启动的语言模型 (LM) 工作进程数量。\n- `$NUM_RM_WORKER`: 设置为要启动的奖励模型 (RM) 工作进程数量。\n\n完成配置后，即可使用不同的方法进行推理。\n\n#### 启动 LM 和 RM 服务\n例如，要为 Math Shepherd 模型启动 LM 和 RM 服务，请运行以下命令：\n```bash\nsh reason\u002Fllm_service\u002Fcreate_service_math_shepherd.sh\n```\n\n若需停止服务器进程，建议使用以下命令：\n```bash\ntmux kill-session -t {您的会话名称} # 默认为 `FastChat`\n```\n\n## 使用方法\n\n#### 运行推理\n\n\n⚠️ 请确保脚本中的输入 (`--LM`, `--RM`) 与待处理工作进程中的变量 (`$POLICY_MODEL_NAME`, `$VALUE_MODEL_NAME`) 保持一致！\n\n\n\n```bash\nexport PYTHONPATH=$(pwd)\nsh scripts\u002Feval\u002Fcot_greedy.sh\n\n# 方法：cot。平均结果：({'majority_vote': 0.734, 'total_completion_tokens': 559.13},)\n\nsh scripts\u002Feval\u002Fcot_rerank.sh\n\n# 方法：best_of_n。平均结果：({'majority_vote': 0.782, \n#                                       'prm_min_max': 0.772, \n#                                       'prm_min_vote': 0.792, \n#                                       'prm_last_max': 0.776, \n#                                       'prm_last_vote': 0.792, \n#                                       'total_completion_tokens': 4431.268},)\n\nsh scripts\u002Feval\u002Fbeam_search.sh\n\n# 方法：束搜索。平均结果：({'majority_vote': 0.74, 'total_completion_tokens': 2350.492},)\n\nsh scripts\u002Feval\u002Fvanila_mcts.sh\n\n```\n\n#### 运行训练\n\n⚠️ 在训练之前，请修改 `train\u002Fmat\u002Fscripts\u002Ftrain_llm.sh` 中的 `$dataset_path`、`$model_name_or_path` 和 `$prm_name_or_path`。\n\n```bash\ncd train\u002Fmat\u002Fscripts\nbash train_llm.sh\n```\n\n#### 运行 PRM 学习\n\n```bash\ncd prm\u002Fcode\n\n# 单 GPU\npython finetune_qwen_single_gpu.py --model_path $YOUR_MODEL_PATH \\\n                                   --train_data_path $TRAIN_DATA_PATH \\\n                                   --test_data_path 
$TEST_DATA_PATH\n\n\n# 多 GPU\ntorchrun --nproc_per_node=2 finetune_qwen.py --model_path $YOUR_MODEL_PATH \\\n                                             --data_path $YOUR_DATA_FOLDER_PATH \\\n                                             --datasets both\n```\n\n## 加入我们\n\n> 每一份贡献对社区都至关重要。\n\n感谢您对 ***OpenR*** 的关注！🥰 我们深深致力于开源社区，并欢迎所有人的贡献。无论您的努力是大是小，都能帮助我们成长和进步。贡献不仅限于代码：回答问题、帮助他人、完善文档以及分享项目同样具有重要意义。\n\n请随时查看 [贡献指南](CONTRIBUTING.md)！\n\n### 未来计划\n\n- 对强化学习训练和搜索策略进行更全面的评估\n- 扩展证明-验证器模型的规模\n- 支持自我改进型训练\n\n\u003C!-- 联系方式 -->\n\n## 联系我们\n\n***OpenR*** 社区由以下团队维护：\n\n- **Openreasoner 团队** (openreasoner@gmail.com)\n\n## 许可证\n\nOpenR 根据 MIT 许可证发布。\n\n## 引用\n\n如果您觉得我们的资源有所帮助，请引用我们的论文：\n\n```\n@misc{wang2024tutorial,\n  author = {Jun Wang},\n  title = {A Tutorial on LLM Reasoning: Relevant Methods behind ChatGPT o1},\n  year = {2024},\n  url = {https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fblob\u002Fmain\u002Freports\u002Ftutorial.pdf},\n  note = {Available on GitHub}\n}\n```\n\n```\n@article{wang2024openr,\n  title={OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models},\n  author={Wang, Jun and Fang, Meng and Wan, Ziyu and Wen, Muning and Zhu, Jiachen and Liu, Anjie and Gong, Ziqin and Song, Yan and Chen, Lei and Ni, Lionel M and others},\n  journal={arXiv preprint arXiv:2410.09671},\n  year={2024}\n}\n```\n\n## 响应示例\n\n### 比较 PRM、Math-psa（我们）与 Math-Shepherd\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_22171dcc2121.png\" alt=\"QA 1\" width=\"49%\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_b36d1b8d6a7a.png\" alt=\"QA 2\" width=\"49%\" \u002F>\n\u003C\u002Fp>\n\n\n### 解释强化学习训练\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_06dff447d9fc.png\" alt=\"QA 3\" width=\"49%\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_4e56ae5d6cb4.png\" alt=\"QA 4\" width=\"49%\" 
\u002F>\n\u003C\u002Fp>\n\n### 探索测试时计算\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_40c1d797a4c6.png\" alt=\"QA 5\" width=\"70%\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_c2ce89e0da1a.png\" alt=\"QA 6\" width=\"70%\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_e8ee1048524a.png\" alt=\"QA 7\" width=\"70%\" \u002F>\n\u003C\u002Fp>\n\n\n## 社区\n\n**微信**：\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_readme_adbe2bf87cb8.jpg\" width=\"30%\" \u002F>\n\n\n\n## 参考文献\n\n### 推理时计算\n[1] [类似 AlphaZero 的树搜索可以指导大型语言模型的解码和训练。](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.17179)\n\n[2] [使用语言模型进行推理就是利用世界模型进行规划。](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.14992)\n\n[3] [优化 LLM 测试时的计算能力可能比增加模型参数更为有效](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.03314?)\n\n[4] [三思而后言：使用暂停标记训练语言模型](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.02226)\n\n\n### 从结果监督到过程监督\n[1] [训练验证者解决数学应用题](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.14168)\n\n[2] [通过过程和结果反馈解决数学应用题](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.14275)\n\n[3] [让我们逐步验证](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.20050)\n\n[4] [通过步骤感知的验证者使大型语言模型成为更好的推理者](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2206.02336)\n\n[5] [Ovm，用于数学推理规划的结果监督价值模型](https:\u002F\u002Faclanthology.org\u002F2024.findings-naacl.55.pdf)\n\n[6] [生成式验证者：将奖励建模视为下一个标记预测](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.15240)\n\n### 数据获取\n[1] [Star：通过推理来引导推理](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002F639a9a172c044fbb64175b5fad42e9a5-Paper-Conference.pdf)\n\n[2] [Quiet-star：语言模型可以自我教导，在开口前先思考](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.09629)\n\n[3] [通过自动化的过程监督提高语言模型的数学推理能力](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.06592)\n\n[4] 
[Shepherd：语言模型生成的批评者](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.04592)\n\n[5] [Math-shepherd：无需人工标注即可逐步验证并强化 LLM](https:\u002F\u002Faclanthology.org\u002F2024.acl-long.510.pdf)\n\n\u003C!-- Markdown 链接与图片 -->\n\n\u003C!-- https:\u002F\u002Fwww.markdownguide.org\u002Fbasic-syntax\u002F#reference-style-links -->\n\n[contributors-shield]: https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcontributors\u002Fopenreasoner\u002Fopenr.svg?style=for-the-badge\n[contributors-url]: https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fgraphs\u002Fcontributors\n[forks-shield]: https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fopenreasoner\u002Fopenr.svg?style=for-the-badge\n[forks-url]: https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fnetwork\u002Fmembers\n[stars-shield]: https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopenreasoner\u002Fopenr.svg?style=for-the-badge\n[stars-url]: https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fstargazers\n[issues-shield]: https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fopenreasoner\u002Fopenr.svg?style=for-the-badge\n[issues-url]: https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\n\n[license-shield]: https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fopenreasoner\u002Fopenr.svg?style=for-the-badge\n[license-url]: https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fblob\u002Fmain\u002FLICENSE.txt","# OpenR 快速上手指南\n\nOpenR 是一个用于大语言模型高级推理的开源框架，支持过程监督数据生成、在线策略训练（RL）、多种搜索策略（如 MCTS、Best-of-N）以及奖励模型（PRM）训练。\n\n## 环境准备\n\n*   **操作系统**: Linux (推荐) 或 macOS\n*   **Python 版本**: 3.10\n*   **依赖管理**: Conda (推荐)\n*   **前置知识**: 熟悉 Hugging Face 模型下载流程\n\n## 安装步骤\n\n### 1. 创建并激活虚拟环境\n建议使用 Conda 创建独立的 Python 3.10 环境：\n\n```bash\nconda create -n open_reasoner python=3.10\nconda activate open_reasoner\n```\n\n### 2. 
安装核心依赖\n安装项目所需的 Python 包及特定组件：\n\n```bash\npip install -r requirements.txt\npip3 install \"fschat[model_worker,webui]\"\npip install -U pydantic\n```\n\n### 3. 安装数学工具包\n编译并安装用于数学表达式处理的 `latex2sympy` 模块：\n\n```bash\ncd envs\u002FMATH\u002Flatex2sympy\npip install -e .\ncd -\n```\n\n### 4. 下载基础模型\n在运行前，请确保已下载以下基础模型至本地目录。您可以参考 [Hugging Face 下载教程](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fmodels-downloading)，或使用国内镜像站（如 ModelScope）加速下载。\n\n**所需模型列表：**\n*   `Qwen2.5-Math-1.5B-Instruct`\n*   `Qwen2.5-Math-7B-Instruct`\n*   `peiyi9979\u002Fmistral-7b-sft`\n*   `peiyi9979\u002Fmath-shepherd-mistral-7b-prm`\n\n> **提示**：国内用户推荐使用 ModelScope 进行模型下载，或将 Hugging Face  endpoint 配置为国内镜像源。\n\n## 基本使用\n\n### 1. 配置运行参数\n在执行推理之前，需要修改 `reason\u002Fllm_service\u002F` 目录下的脚本变量，以指向您的本地模型路径和配置工作进程数量。\n\n请设置以下变量：\n*   `$MODEL_BASE`: 存放模型的本地根目录路径。\n*   `$POLICY_MODEL_NAME`: 要使用的策略模型名称（例如 `Qwen2.5-Math-7B-Instruct`）。\n*   `$VALUE_MODEL_NAME`: 要使用的价值模型（奖励模型）名称（例如 `peiyi9979\u002Fmath-shepherd-mistral-7b-prm`）。\n*   `$NUM_LM_WORKER`: 启动的语言模型 (LM) 工作进程数。\n*   `$NUM_RM_WORKER`: 启动的奖励模型 (RM) 工作进程数。\n\n### 2. 
运行推理\n配置完成后，即可调用框架提供的脚本启动推理。OpenR 支持多种推理策略，包括贪婪搜索、Best-of-N、束搜索 (Beam Search) 以及蒙特卡洛树搜索 (MCTS)。\n\n*(请先运行 `reason\u002Fllm_service\u002F` 目录下的服务启动脚本，例如 `sh reason\u002Fllm_service\u002Fcreate_service_math_shepherd.sh`；再执行 `scripts\u002Feval\u002F` 目录下的评估脚本，例如 `sh scripts\u002Feval\u002Fcot_greedy.sh`，并确保脚本中的 `--LM`、`--RM` 与上述配置的模型名称一致)*\n\n---\n*更多详细用法、训练教程及基准测试报告，请访问 [OpenR 官方文档](https:\u002F\u002Fopenreasoner.github.io\u002F) 或查看 [arXiv 论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.09671)。*","某教育科技公司的算法团队正在开发一款针对高中生的自适应数学辅导系统，需要模型不仅能给出答案，还能生成逻辑严密、步骤清晰的解题过程。\n\n### 没有 openr 时\n- 模型面对复杂几何或代数题时，常出现“跳跃式”推理，直接给出结论而缺失关键推导步骤，导致学生无法理解。\n- 缺乏有效的验证机制，模型生成的解题路径中隐含逻辑错误（如公式误用），但输出结果看似合理，难以被自动检测发现。\n- 训练数据仅依赖标准答案，模型无法学习多样化的解题策略，遇到变体题目时泛化能力差，容易陷入死胡同。\n- 调试推理过程如同“黑盒”，开发人员难以定位模型是在哪一步逻辑链条上发生了断裂，优化效率极低。\n\n### 使用 openr 后\n- 利用 openr 的高级推理框架，模型被迫采用思维链（Chain-of-Thought）逐步拆解问题，输出的每一步推导都清晰可见且逻辑连贯。\n- 内置的自我反思与验证模块能实时检查中间步骤的合理性，自动修正公式滥用或计算偏差，显著降低“幻觉”错误率。\n- 通过 openr 提供的强化学习流程，模型在 MATH-APS 等数据集上学会了多种解题策略，面对新颖题型也能灵活迁移思路。\n- 框架开放的可视化分析工具让团队能精准追踪推理轨迹，快速识别薄弱节点并针对性微调模型，研发周期缩短了一半。\n\nopenr 将大模型从单纯的“答案生成器”升级为具备严谨逻辑闭环的“数学导师”，从根本上解决了复杂推理任务中的可靠性难题。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenreasoner_openr_6a636a3b.png","openreasoner","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fopenreasoner_6b059b76.png",null,"https:\u002F\u002Fgithub.com\u002Fopenreasoner",[77,81,85],{"name":78,"color":79,"percentage":80},"Python","#3572A5",97.3,{"name":82,"color":83,"percentage":84},"ANTLR","#9DC3FF",1.4,{"name":86,"color":87,"percentage":88},"Shell","#89e051",1.3,1842,131,"2026-04-15T09:34:03","MIT",4,"未说明","未说明 (项目涉及大语言模型推理与训练，隐含需要 NVIDIA GPU，但 README 未指定具体型号、显存或 CUDA 版本)",{"notes":97,"python":98,"dependencies":99},"建议使用 conda 创建名为 'open_reasoner' 的虚拟环境。运行前需手动下载基础模型（如 Qwen2.5-Math 系列、Mistral-7b 等）并配置相应路径变量。项目支持多种推理策略（如 MCTS、Beam Search）及在线策略训练。","3.10",[100,101,102,103],"requirements.txt 中定义的依赖","fschat[model_worker,webui]","pydantic","latex2sympy",[35,14],"2026-03-27T02:49:30.150509","2026-04-20T07:16:09.949710",[108,113,118,123,128,132],{"id":109,"question_zh":110,"answer_zh":111,"source_url":112},44277,"执行评估脚本时遇到 `requests.exceptions.MissingSchema: Invalid 
URL '\u002Fworker_generate'` 错误怎么办？","该错误通常是因为 `--LM` 和 `--RM` 参数配置不正确导致的。请确保：\n1. `--LM` 和 `--RM` 中的参数值必须与日志文件中的 model name 完全对应。\n2. 只需要填写 model name（例如 `Qwen2.5-Math-1.5B-Instruct`），不要填写完整的文件路径或包含多余的协议头。\n3. 检查生成的 `worker_addr` 是否为空或缺少 `http:\u002F\u002F` 前缀，这通常意味着服务启动脚本未正确注册模型地址。","https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\u002F18",{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},44278,"为什么我训练的 PRM 模型在 MCTS 方法下表现不如 CoT，甚至效果更差？","仅使用单一数据集（如 MATH-APS）训练 PRM 往往难以获得良好效果。建议采用混合数据集进行训练以提升泛化能力：\n1. 将 Math-shepherd、PRM800K 和 MATH-APS 数据集结合起来共同训练。\n2. 数学 PRM 训练数据相对稀缺，充分利用所有可用数据集是提升 MCTS 引导效果的关键。\n3. 如果复现官方开源的 PRM 有效但自定义训练无效，请重点检查数据预处理和训练超参数是否与官方一致。","https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\u002F49",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},44279,"运行 OmegaPRM_v2 或其他标注脚本时，为何无法生成节点数据的 JSON 文件？","这通常是由于代码中存在的小缺陷导致的。社区用户已发现并修复了相关问题：\n1. 检查是否遇到了特定的运行时错误导致流程中断。\n2. 参考社区提交的 PR #79 (https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fpull\u002F79)，其中包含了针对 OmegaPRM_v2 运行问题的修复代码。\n3. 应用该补丁后，问答数据应能正常生成带有节点标注的 JSON 文件。","https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\u002F72",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},44280,"当设置 `num_sequence` (或 `num_com`) 大于等于 2 时，推理过程出现维度不匹配（mismatch dimension）错误如何解决？","在使用 `best_of_n` 等方法且 `num_sequence >= 2` 时，可能会触发维度对齐问题。解决步骤如下：\n1. 确认报错具体位置是否在 reward model worker 中。\n2. 连接到 tmux 会话查看 `reward_model_worker` 的详细报错信息，通常涉及批次处理时的张量形状不一致。\n3. 确保使用的 PRM 模型（如 math-shepherd-mistral-7b-prm）服务已正确启动，并且能够处理多序列输入的批量评分请求。如果问题持续，可能需要检查评估代码中对多序列结果聚合的逻辑。","https:\u002F\u002Fgithub.com\u002Fopenreasoner\u002Fopenr\u002Fissues\u002F39",{"id":129,"question_zh":130,"answer_zh":131,"source_url":112},44281,"如何正确启动 LLM 服务和 RM 服务以避免评估时的连接错误？","正确的服务启动顺序和参数配置至关重要：\n1. 首先运行对应的服务启动脚本（如 `create_service_qwen2.5_math_vllm.sh` 或 `create_service_math_shepherd.sh`）。\n2. 启动后，务必进入 tmux 会话查看日志，确认服务已成功加载模型且没有报错。\n3. 
在运行评估脚本（`evaluate.py`）时，`--LM` 和 `--RM` 参数必须严格匹配服务启动时注册的模型名称。\n4. 如果日志显示 `worker_addr` 为空，说明服务注册失败，需检查启动脚本中的控制器地址配置。",{"id":133,"question_zh":134,"answer_zh":135,"source_url":117},44282,"MCTS 方法相比 CoT 消耗更多 Token 但准确率提升不明显，这是正常现象吗？","这在 PRM 模型未经过充分训练或数据适配时是可能发生的。观察到的现象包括：\n1. Vanilla MCTS 的 Token 消耗量通常是 CoT 的数倍（例如从 500+ 增加到 2000+）。\n2. 如果 PRM 模型质量不高（如仅在少量数据上训练），MCTS 的搜索策略可能无法选出优于 CoT Greedy 的路径，导致准确率持平甚至下降。\n3. 解决方案是优化 PRM 模型（使用混合数据集训练），高质量的 PRM 才能使 MCTS 发挥出超越 CoT 的性能（如官方数据显示从 0.826 提升至 0.84）。",[]]