[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-maitrix-org--llm-reasoners":3,"tool-maitrix-org--llm-reasoners":62},[4,18,26,36,46,54],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159267,2,"2026-04-17T11:29:14",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":42,"last_commit_at":43,"category_tags":44,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,45],"插件",{"id":47,"name":48,"github_repo":49,"description_zh":50,"stars":51,"difficulty_score":32,"last_commit_at":52,"category_tags":53,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 
都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":55,"name":56,"github_repo":57,"description_zh":58,"stars":59,"difficulty_score":32,"last_commit_at":60,"category_tags":61,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[45,13,15,14],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":78,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":10,"env_os":95,"env_gpu":96,"env_ram":97,"env_deps":98,"category_tags":108,"github_topics":77,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":109,"updated_at":110,"faqs":111,"releases":142},8529,"maitrix-org\u002Fllm-reasoners","llm-reasoners","A library for advanced large language model reasoning","llm-reasoners 是一个专为提升大语言模型（LLM）复杂推理能力而设计的开源库。它旨在解决模型在处理数学解题、逻辑推导等高难度任务时，因缺乏系统性思考而导致准确率不足的问题。\n\n该工具非常适合 AI 研究人员和开发者使用，尤其是那些希望复现前沿算法或深入分析模型推理过程的团队。llm-reasoners 的核心亮点在于其丰富的算法支持，不仅涵盖了思维链（CoT）、思维树（ToT）等经典方法，还集成了蒙特卡洛树搜索（MCTS）、推理时缩放（Inference-time Scaling）等最新研究成果。为了降低使用门槛，它提供了一行代码即可生成的直观可视化功能，让用户能清晰洞察复杂的推理路径。此外，通过集成高性能推理框架 SGLang 并支持多种后端，llm-reasoners 
在确保严格复现论文效果的同时，显著提升了推理效率，是探索和优化大模型推理策略的得力助手。","![logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmaitrix-org_llm-reasoners_readme_bc79edacb87e.png)\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fwww.llm-reasoners.net\u002F\">Home\u003C\u002Fa>\n  |\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.05221\">Paper (COLM2024)\u003C\u002Fa>\n  |\n  \u003Ca href=\"https:\u002F\u002Fwww.llm-reasoners.net\u002Fblog\">Blog\u003C\u002Fa>\n  |\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FPxDJby9W\">Discord\u003C\u002Fa>\n  |\n  \u003Ca href=\"https:\u002F\u002Fmaitrix.org\u002F\">@Maitrix.org\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n**LLM Reasoners** is a library designed to enhance LLMs' ability to perform complex reasoning using advanced algorithms. It provides:\n\n\n- **Cutting-Edge Reasoning Algorithms**\n  \n  The library offers the most up-to-date search algorithms for reasoning with LLMs, such as:\n  \n  - [Reasoner Agent](examples\u002FReasonerAgent-Web) ([Deng et al., 2025](https:\u002F\u002Freasoner-agent.maitrix.org\u002F))\n  - [Inference-time Scaling with PRM](examples\u002FInference-Scaling-SGL\u002Fmath500) ([Snell et al., 2024](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.03314))\n  - [Reasoning-via-Planning, MCTS](examples\u002FRAP) ([Hao et al., 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14992))\n  - [Tree-of-Thoughts, BFS](examples\u002FToT) ([Yao et al., 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10601))\n  \n  \u003Cdetails>\n    \u003Csummary>(Show more supported algorithms)\u003C\u002Fsummary>\n  \n  - [StructChem](examples\u002FStructChem) ([Ouyang et al., 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.09656))\n  - [Chain-of-thoughts](examples\u002FCoT) ([Wei et al., 2022](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.11903))\n  - [Least-to-most prompting](examples\u002FLeast-to-most) ([Zhou et al., 2022](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.10625))\n  - 
[Tree-of-Thoughts, DFS](examples\u002FToT) ([Yao et al., 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10601))\n  - [Self-Eval Guided Decoding, Beam Search](examples\u002FSelf-Eval) ([Xie et al., 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.00633))\n  - [Grace Decoding](examples\u002FGrace) ([Khalifa et al., 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14934))\n  - [Eurus](examples\u002FEurus) ([Yuan et al., 2024](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.02078))\n  - [PromptAgent](examples\u002FPromptAgent) ([Wang et al., 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.16427))\n  - [DRPO](examples\u002FDRPO) ([Singla et al., 2024](https:\u002F\u002Faclanthology.org\u002F2024.emnlp-main.1220\u002F))\n  \n  \u003C\u002Fdetails>\n\n- **Intuitive Visualization and Interpretation**: Our library provides a [visualization tool](https:\u002F\u002Fwww.llm-reasoners.net\u002F) to aid users in comprehending the reasoning process. Even for complex reasoning algorithms like Monte-Carlo Tree Search, users can easily diagnose and understand the process with **one line of Python code**. See an example in the tutorial [notebook](demo.ipynb).\n\n- **Efficient Reasoning with LLM**: Our library optimizes the performance of advanced reasoning techniques by integrating [SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang), a high-performance LLM inference framework, featuring structured generation (check out this [thread](https:\u002F\u002Fx.com\u002FMaitrixOrg\u002Fstatus\u002F1885387184557199857) and [example](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002FInference-Scaling-SGL\u002Fmath500)). 
We also support other LLM backends like `huggingface transformers`, `OpenAI API`, `Exllama`, `fairscale`, `llama.cpp`, etc.\n\n- **Rigorous Implementation and Reproducibility**: We prioritize precision and reliability in our implementations, ensuring that our algorithms are not just theoretical concepts but practically usable tools. All methods implemented in LLM Reasoners are carefully engineered to be faithful to their original formulations and performance. It powers our [analysis](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.05221) of reasoning algorithms published in COLM2024.\n\n    \u003Cdetails>\n    \n    \u003Csummary> (Examples of Reproducibility) \u003C\u002Fsummary>\n    \n    - LLM Reasoners has been tested to successfully reproduce the performance of [Tree-of-Thoughts](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10601), [Guided Decoding](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.00633) and [GRACE Decoding](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14934) with their official implementation. We list the results reported in their paper \u002F reproduced from their official repositories for reference (†). 
Some results are on the subsets of the first 100 examples (*).\n    \n    \u003Cdiv align=\"center\">\n        \n    |Method|Base LLM|GSM8k|\n    |--|--|--|\n    |[Guided Decoding](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.00633)\u003Csup>†\u003C\u002Fsup>|CodeX (PAL)|0.80|\n    |Guided Decoding|CodeX (PAL)|[0.83\\*](examples\u002Fguided_gsm8k)|\n    \n    |Method|Base LLM|Game of 24|\n    |--|--|--|\n    |[Tree-of-Thoughts](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10601)\u003Csup>†\u003C\u002Fsup>|GPT-3.5-turbo|0.22|\n    |Tree-of-Thoughts|GPT-3.5-turbo|[0.22](examples\u002Ftot_game24)|\n    \n    |Method|Base LLM|GSM8k|\n    |--|--|--|\n    |[GRACE Decoding](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14934)\u003Csup>†\u003C\u002Fsup>|Flan-T5-Large (Fine-tuned)|0.34|\n    |GRACE Decoding|Flan-T5-Large (Fine-tuned)|[0.33\\*](examples\u002Fgrace_gsm8k)|\n    \u003C\u002Fdiv>\n    \n    \u003C\u002Fdetails>\n\n## News\n- Feb. 21, 2025: We have integrated Deepseek R1 ([example](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002FLongCoT_Search\u002FProsQA)). Also check out our analysis of the search patterns of R1 ([thread](https:\u002F\u002Fx.com\u002FMaitrixOrg\u002Fstatus\u002F1893017035753574799)). \n\n- Feb. 6, 2025: Thrilled to introduce **ReasonerAgent** - A fully open source, ready-to-run agent that does research 🧐 in a web browser and answers your queries. Check out this [thread](https:\u002F\u002Fx.com\u002FMaitrixOrg\u002Fstatus\u002F1887584291087098063), and explore the [code](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002FReasonerAgent-Web) here! \n- Jan. 31, 2025: LLM Reasoners has integrated [SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang). Enjoy 100x speed-up with a one-line change! New applications like PRM-guided search for inference-time scaling are also available. 
See more details in this [post](https:\u002F\u002Fx.com\u002FMaitrixOrg\u002Fstatus\u002F1885387184557199857).\n- Dec. 20, 2024: We now support planning algorithms (MCTS, DFS\u002FBFS, Beam Search) in web environments with [BrowserGym](https:\u002F\u002Fgithub.com\u002FServiceNow\u002FBrowserGym), check the [README](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002Fbrowsergym) to try it out!\n\n\u003Cdetails>\n\n\u003Csummary>(Show more news)\u003C\u002Fsummary>\n\n- Nov. 13, 2024: We integrated [DRPO](https:\u002F\u002Faclanthology.org\u002F2024.emnlp-main.1220\u002F), a tuning-free alignment method published at EMNLP 2024 ([link](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002FDRPO)).\n  \n- Jul. 10, 2024: Our paper on [LLM Reasoners](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.05221) is accepted to [COLM 2024](https:\u002F\u002Fcolmweb.org\u002Findex.html)!\n- Jun. 24, 2024: [PromptAgent](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.16427) is in LLM Reasoners! Let it help you write down a super detailed prompt for your task ([here](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002FPromptAgent)).\n- May. 14, 2024: Check out [Eurus](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.02078), a suite of LLMs optimized for reasoning. With LLM Reasoners, Eurus-RM can easily boost Llama-8B from 0.49 to 0.73 📈 on GSM8k ([code](examples\u002FEurus)).\n- May. 2, 2024: We have integrated our first reasoning method for scientific reasoning, [StructChem](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.09656)! Check it out [here](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002FStructChem).\n- Apr. 
22, 2024: We integrated [Llama-3](https:\u002F\u002Fgithub.com\u002Fmeta-llama\u002Fllama3), with additional useful APIs (e.g., customizing EOS tokens, calculating likelihood).\n- **Apr. 8, 2024: Our new [paper](assets\u002FReasoners.pdf) introducing LLM Reasoners is available!**\n- Mar. 29, 2024: [Grace Decoding](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14934) has been incorporated!\n- Oct. 25, 2023: A [video tutorial](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=5QfOxtiw_ZU) on the visualizer of LLM Reasoners is available.\n\n- Oct. 23, 2023: Reasoning-via-Planning is accepted to EMNLP 2023! Check our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14992) with updated results and discussion!\n\u003C\u002Fdetails>\n\n\n## Introduction of the library\n\n![Library Structure](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmaitrix-org_llm-reasoners_readme_1ef221f1e905.png)\n\nWe abstract an LLM reasoning algorithm into three key components, *reward function*, *world model*, and *search algorithm* (see the formulation in our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.05221)), corresponding to three classes in the library, \u003Ctt>SearchConfig\u003C\u002Ftt>, \u003Ctt>WorldModel\u003C\u002Ftt> and \u003Ctt>SearchAlgorithm\u003C\u002Ftt> respectively. Besides, there are \u003Ctt>LLM APIs\u003C\u002Ftt> to power other modules, \u003Ctt>Benchmark\u003C\u002Ftt>, and \u003Ctt>Visualization\u003C\u002Ftt> to evaluate or debug the reasoning algorithm (middle). To implement a reasoning algorithm for a certain domain (a \u003Ctt>Reasoner\u003C\u002Ftt> object), a user may inherit the \u003Ctt>SearchConfig\u003C\u002Ftt> and \u003Ctt>WorldModel\u003C\u002Ftt> classes, and import a pre-implemented \u003Ctt>SearchAlgorithm\u003C\u002Ftt>. We also show a concrete example of solving Blocksworld with RAP using LLM Reasoners (bottom).\n\n\n## Quick Tour\nLet's go through the code of reasoning over Blocksworld problems. 
Note that the code is simplified for demonstration (check [here](demo.ipynb) for a runnable notebook).\n\nThe first step is to define the world model: you will set up an initial state given a question in `init_state`, judge whether a state is terminal in `is_terminal`, and most importantly, define the world dynamics with `step`:\n```python\nfrom typing import NamedTuple\nimport utils\nfrom reasoners import WorldModel, LanguageModel\nimport copy\n\nBWState = str\nBWAction = str\n\nclass BlocksWorldModel(WorldModel[BWState, BWAction]):\n    def __init__(self,\n                 base_model: LanguageModel,\n                 prompt: dict) -> None:\n        super().__init__()\n        self.base_model = base_model\n        self.prompt = prompt\n\n    def init_state(self) -> BWState:\n        # extract the statement from a given problem\n        # e.g., \"the red block is clear, the blue block is clear...\"\n        return BWState(utils.extract_init_state(self.example)) \n\n    def step(self, state: BWState, action: BWAction) -> tuple[BWState, dict]:\n        # call the LLM to predict the state transition\n        state = copy.deepcopy(state)\n        # load the prompt for the LLM to predict the next state\n        # e.g. \"... 
I have that \u003Cstate>, if I \u003Caction>, then ...\"\n        world_update_prompt = self.prompt[\"update\"].replace(\"\u003Cstate>\", state).replace(\"\u003Caction>\", action)\n        world_output = self.base_model.generate([world_update_prompt],\n                                    eos_token_id=\"\\n\", hide_input=True, temperature=0).text[0].strip()\n        new_state = utils.process_new_state(world_output)\n        # till now, we have the new state after the action\n        # the following part is to speed up the reward calculation\n\n        # we want to check the portion of the satisfied subgoals, and use it as a part of the reward\n        # since we have predicted the new state already, we can just check it here for convenience\n        goal_reached = utils.goal_check(utils.extract_goals(self.example), new_state)\n        # return the new state and the additional dictionary (to be passed to the reward function)\n        return new_state, {\"goal_reached\": goal_reached}\n\n    def is_terminal(self, state: BWState) -> bool:\n        # define the condition of the terminal state to stop the search\n        # e.g., all the subgoals are met\n        if utils.goal_check(utils.extract_goals(self.example), state) == 1:\n            return True\n        return False\n```\nThen, it's time to consider how to search for the optimal reasoning chain. It involves `get_actions` to get the action space given a state, and the most important `reward` as the guidance for reasoning. 
For Monte-Carlo Tree Search, we can additionally define a `fast_reward` to speed up the roll-out stage.\n```python\nimport utils\nfrom world_model import BWState, BWAction\nfrom reasoners import SearchConfig, LanguageModel\nclass BWConfig(SearchConfig):\n    def __init__(self,\n                 base_model: LanguageModel,\n                 prompt: dict,\n                 reward_alpha=0.5,\n                 goal_reward_default=0.,\n                 goal_reached_reward=100) -> None:\n        super().__init__()\n        self.base_model = base_model\n        self.example = None\n        self.prompt = prompt\n        # some parameters to calculate the fast reward or reward (explained below)\n        self.reward_alpha = reward_alpha\n        self.goal_reward_default = goal_reward_default\n        self.goal_reached_reward = goal_reached_reward\n\n    def get_actions(self, state: BWState) -> list[BWAction]:\n        # use a rule-based function to extract all legal actions\n        return utils.generate_all_actions(state)\n\n    def fast_reward(self, state: BWState, action: BWAction) -> tuple[float, dict]:\n        # build an in-context learning prompt (similar to the one used in Chain-of-thoughts reasoning)\n        inputs = self.prompt[\"icl\"].replace(\"\u003Cinit_state>\", state)\\\n            .replace(\"\u003Cgoals>\", utils.extract_goals(self.example))\n        # concatenate a candidate action after the prompt, and test its loglikelihood\n        intuition = self.base_model.get_loglikelihood(inputs, [inputs + action])[0]\n        # the reward is a combination of intuition and goal satisfaction\n        # in fast_reward, we skip the calculation of goal satisfaction and use a default value\n        fast_reward = intuition * self.reward_alpha + self.goal_reward_default * (1 - self.reward_alpha)\n        # cache some information for the reward calculation later (will be passed to `reward` function)\n        details = {'intuition': intuition}\n        return fast_reward, 
details\n\n    def reward(self, state: BWState, action: BWAction,\n               intuition: float = None,\n               goal_reached: float = None) -> tuple[float, dict]:\n        # note that `intuition` (cached in `fast_reward`) and `goal_reached` (cached in `step`) are automatically passed as parameters to this reward function\n        if goal_reached == 1:\n            # if the goal state is reached, we will assign a large reward\n            goal_reward = self.goal_reached_reward\n        else:\n            # otherwise assign the reward based on the portion of satisfied subgoals\n            goal_reward = goal_reached\n        # the reward is a combination of intuition and goal satisfaction\n        reward = intuition * self.reward_alpha + goal_reward * (1 - self.reward_alpha)\n        # return the reward and an additional dictionary (to be saved in the log for visualization later)\n        return reward, {'intuition': intuition, 'goal_reached': goal_reached}\n```\nNow, we are ready to apply a reasoning algorithm to solve the problem:\n```python\nimport json\nimport os\nimport pickle\n\nfrom reasoners import Reasoner\nfrom reasoners.algorithm import MCTS\nfrom reasoners.lm import LLaMAModel\nfrom world_model import BlocksWorldModel\nfrom search_config import BWConfig\n\nllama_model = LLaMAModel(llama_ckpts, llama_size, max_batch_size=1)\nwith open(prompt_path) as f:\n    prompt = json.load(f)\nworld_model = BlocksWorldModel(base_model=llama_model, prompt=prompt)\nconfig = BWConfig(base_model=llama_model, prompt=prompt)\n# save the history of every iteration for visualization\nsearch_algo = MCTS(output_trace_in_each_iter=True)\nreasoner = Reasoner(world_model=world_model, search_config=config, search_algo=search_algo)\nfor i, example in enumerate(dataset):\n    algo_output = reasoner(example)\n    # save the MCTS results as pickle files\n    with open(os.path.join(log_dir, 'algo_output', f'{resume + i + 1}.pkl'), 'wb') as f:\n        pickle.dump(algo_output, f)\n```\nFinally, we can easily visualize the reasoning 
process:\n```python\nimport pickle\nfrom reasoners.visualization import visualize\nwith open(\"logs\u002Fbw_MCTS\u002Fxxx\u002Falgo_output\u002F1.pkl\", 'rb') as f:\n    mcts_result = pickle.load(f)\n\nfrom reasoners.visualization.tree_snapshot import NodeData, EdgeData\nfrom reasoners.algorithm.mcts import MCTSNode\n\n# by default, a state will be presented along with the node, and the reward with the saved dictionary in `SearchConfig.reward` will be presented along with the edge.\n# we can also define a helper function to customize what we want to see in the visualizer.\ndef blocksworld_node_data_factory(n: MCTSNode) -> NodeData:\n    return NodeData({\"block state\": n.state if n.state else None,\n                     \"satisfied\": n.fast_reward_details if n.fast_reward_details else \"Not expanded\"})\ndef blocksworld_edge_data_factory(n: MCTSNode) -> EdgeData:\n    return EdgeData({\"reward\": n.reward, \"intuition\": n.fast_reward_details[\"intuition\"]})\nvisualize(mcts_result, node_data_factory=blocksworld_node_data_factory,\n                       edge_data_factory=blocksworld_edge_data_factory)\n```\nThen a URL of the visualized results will pop up. The figure will be interactive and look like the examples shown on our [demo website](https:\u002F\u002Fllm-reasoners.net\u002F).\n## Installation\n\nMake sure to use Python 3.10 or later.\n\n```bash\nconda create -n reasoners python=3.10\nconda activate reasoners\n```\n\n### Install from `pip`\n\n```bash\npip install llm-reasoners\n```\n\n### Install from GitHub\n(Recommended if you want to run the examples in the GitHub repo)\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FBer666\u002Fllm-reasoners --recursive\ncd llm-reasoners\npip install -e .\n```\nAdding `--recursive` will help you clone exllama and LLM-Planning automatically. Note that some other optional modules may require other dependencies. 
Please refer to the error message for details.\n\n## Citation\nThis project is an extension of the following paper:\n```bibtex\n@inproceedings{hao2023reasoning,\n  title={Reasoning with Language Model is Planning with World Model},\n  author={Hao, Shibo and Gu, Yi and Ma, Haodi and Hong, Joshua and Wang, Zhen and Wang, Daisy and Hu, Zhiting},\n  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},\n  pages={8154--8173},\n  year={2023}\n}\n@article{hao2024llm,\n  title={LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models},\n  author={Hao, Shibo and Gu, Yi and Luo, Haotian and Liu, Tianyang and Shao, Xiyan and Wang, Xinyuan and Xie, Shuhua and Ma, Haodi and Samavedhi, Adithya and Gao, Qiyue and others},\n  journal={arXiv preprint arXiv:2404.05221},\n  year={2024}\n}\n```\n","![logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmaitrix-org_llm-reasoners_readme_bc79edacb87e.png)\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fwww.llm-reasoners.net\u002F\">首页\u003C\u002Fa>\n  |\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.05221\">论文（COLM2024）\u003C\u002Fa>\n  |\n  \u003Ca href=\"https:\u002F\u002Fwww.llm-reasoners.net\u002Fblog\">博客\u003C\u002Fa>\n  |\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FPxDJby9W\">Discord\u003C\u002Fa>\n  |\n  \u003Ca href=\"https:\u002F\u002Fmaitrix.org\u002F\">@Maitrix.org\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n**LLM Reasoners** 是一个旨在利用先进算法提升大语言模型复杂推理能力的库。它提供以下功能：\n\n\n- **前沿的推理算法**\n\n  该库提供了当前最先进的用于大语言模型推理的搜索算法，例如：\n  \n  - [Reasoner Agent](examples\u002FReasonerAgent-Web) ([Deng 等, 2025](https:\u002F\u002Freasoner-agent.maitrix.org\u002F))\n  - [推理时缩放与 PRM](examples\u002FInference-Scaling-SGL\u002Fmath500) ([Snell 等, 2024](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.03314))\n  - [规划式推理，MCTS](examples\u002FRAP) ([Hao 等, 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14992))\n  
- [思维树，BFS](examples\u002FToT) ([Yao 等, 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10601))\n  \n  \u003Cdetails>\n    \u003Csummary>(显示更多支持的算法)\u003C\u002Fsummary>\n  \n  - [StructChem](examples\u002FStructChem) ([Ouyang 等, 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.09656))\n  - [思维链](examples\u002FCoT) ([Wei 等, 2022](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.11903))\n  - [由简入繁提示](examples\u002FLeast-to-most) ([Zhou 等, 2022](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.10625))\n  - [思维树，DFS](examples\u002FToT) ([Yao 等, 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10601))\n  - [自评引导解码，束搜索](examples\u002FSelf-Eval) ([Xie 等, 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.00633))\n  - [Grace 解码](examples\u002FGrace) ([Khalifa 等, 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14934))\n  - [Eurus](examples\u002FEurus) ([Yuan 等, 2024](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.02078))\n  - [PromptAgent](examples\u002FPromptAgent) ([Wang 等, 2023](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.16427))\n  - [DRPO](examples\u002FDRPO) ([Singla 等, 2024](https:\u002F\u002Faclanthology.org\u002F2024.emnlp-main.1220\u002F))\n  \n  \u003C\u002Fdetails>\n\n- **直观的可视化与解释**：我们的库提供了一个[可视化工具](https:\u002F\u002Fwww.llm-reasoners.net\u002F)，帮助用户理解推理过程。即使是像蒙特卡洛树搜索这样复杂的推理算法，用户也只需**一行 Python 代码**就能轻松诊断并理解整个流程。示例请参见教程[笔记本](demo.ipynb)。\n\n- **高效的 LLM 推理**：通过集成高性能 LLM 推理框架[SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang)，该库优化了高级推理技术的性能，其特点是结构化生成（请参阅此[推文](https:\u002F\u002Fx.com\u002FMaitrixOrg\u002Fstatus\u002F1885387184557199857)和[示例](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002FInference-Scaling-SGL\u002Fmath500)）。我们还支持其他 LLM 后端，如 `huggingface transformers`、`OpenAI API`、`Exllama`、`fairscale`、`llama.cpp` 等。\n\n- **严谨的实现与可复现性**：我们在实现中优先考虑精确性和可靠性，确保我们的算法不仅是理论概念，更是可实际使用的工具。LLM Reasoners 中实现的所有方法都经过精心设计，忠实于其原始表述和性能。这为我们发表在 COLM2024 
上的推理算法分析（[arxiv.org\u002Fabs\u002F2404.05221](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.05221)）提供了有力支撑。\n\n    \u003Cdetails>\n    \n    \u003Csummary> (可复现性的示例) \u003C\u002Fsummary>\n    \n    - LLM Reasoners 已成功复现了[思维树](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10601)、[引导解码](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.00633)和[GRACE 解码](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14934)在其官方实现中的表现。我们列出了这些论文中报告的结果以及从其官方仓库复现的结果供参考（†）。部分结果基于前 100 个示例的子集（*）。\n    \n    \u003Cdiv align=\"center\">\n        \n    |方法|基础 LLM|GSM8k|\n    |--|--|--|\n    |[引导解码](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.00633)\u003Csup>†\u003C\u002Fsup>|CodeX (PAL)|0.80|\n    |引导解码|CodeX (PAL)|[0.83\\*](examples\u002Fguided_gsm8k)|\n    \n    |方法|基础 LLM|24点游戏|\n    |--|--|--|\n    |[思维树](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10601)\u003Csup>†\u003C\u002Fsup>|GPT-3.5-turbo|0.22|\n    |思维树|GPT-3.5-turbo|[0.22](examples\u002Ftot_game24)|\n    \n    |方法|基础 LLM|GSM8k|\n    |--|--|--|\n    |[GRACE 解码](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14934)\u003Csup>†\u003C\u002Fsup>|Flan-T5-Large（微调版）|0.34|\n    |GRACE 解码|Flan-T5-Large（微调版）|[0.33\\*](examples\u002Fgrace_gsm8k)|\n    \u003C\u002Fdiv>\n    \n    \u003C\u002Fdetails>\n\n## 新闻\n- 2025年2月21日：我们已集成Deepseek R1（[示例](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002FLongCoT_Search\u002FProsQA)）。同时，请查看我们对R1搜索模式的分析（[推文](https:\u002F\u002Fx.com\u002FMaitrixOrg\u002Fstatus\u002F1893017035753574799)）。\n\n- 2025年2月6日：我们非常高兴地推出**ReasonerAgent**——一个完全开源、开箱即用的智能体，它可以在网页浏览器中进行研究并回答您的问题🧐。请查看这篇[推文](https:\u002F\u002Fx.com\u002FMaitrixOrg\u002Fstatus\u002F1887584291087098063)，并在此处探索[代码](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002FReasonerAgent-Web)！\n\n- 2025年1月31日：LLM 
Reasoners已集成[SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang)。只需一行代码更改，即可享受100倍的速度提升！此外，还提供了如PRM引导搜索等用于推理时扩展的新应用。更多详情请参见此[帖子](https:\u002F\u002Fx.com\u002FMaitrixOrg\u002Fstatus\u002F1885387184557199857)。\n\n- 2024年12月20日：我们现在已在基于[BrowserGym](https:\u002F\u002Fgithub.com\u002FServiceNow\u002FBrowserGym)的Web环境中支持规划算法（MCTS、DFS\u002FBFS、束搜索），请查阅[README](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002Fbrowsergym)以进行尝试！\n\n\u003Cdetails>\n\n\u003Csummary>（显示更多新闻）\u003C\u002Fsummary>\n\n- 2024年11月13日：我们集成了[DRPO](https:\u002F\u002Faclanthology.org\u002F2024.emnlp-main.1220\u002F)，这是一种在EMNLP 2024上发表的无需微调的对齐方法（[链接](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002FDRPO)）。\n\n- 2024年7月10日：我们关于[LLM Reasoners](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.05221)的论文已被[COLM 2024](https:\u002F\u002Fcolmweb.org\u002Findex.html)接收！\n\n- 2024年6月24日：[PromptAgent](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.16427)现已加入LLM Reasoners！它可以帮助您为任务编写超详细的提示词（[这里](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002FPromptAgent)）。\n\n- 2024年5月14日：请查看[Eurus](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.02078)，这是一系列专为推理优化的LLM。借助LLM Reasoners，Eurus-RM可以轻松将Llama-8B在GSM8k上的得分从0.49提升至0.73📈（[代码](examples\u002FEurus)）。\n\n- 2024年5月2日：我们首次集成了用于科学推理的推理方法[StructChem](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.09656)! 
欢迎在此处查看（[链接](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Fexamples\u002FStructChem)）。\n\n- 2024年4月22日：我们集成了[Llama-3](https:\u002F\u002Fgithub.com\u002Fmeta-llama\u002Fllama3)，并添加了额外的实用API（例如自定义EOS标记、计算似然度等）。\n\n- **2024年4月8日：我们介绍LLM Reasoners的新[论文](assets\u002FReasoners.pdf)现已发布！**\n\n- 2024年3月29日：[Grace Decoding](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14934)已被纳入！\n\n- 2023年10月25日：关于LLM Reasoners可视化工具的[视频教程](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=5QfOxtiw_ZU)现已上线。\n\n- 2023年10月23日：通过规划进行推理的方法已被EMNLP 2023接收！请查看我们的[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14992)，其中包含更新的结果和讨论！\n\u003C\u002Fdetails>\n\n\n## 库简介\n\n![库结构](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmaitrix-org_llm-reasoners_readme_1ef221f1e905.png)\n\n我们将LLM推理算法抽象为三个关键组件：*奖励函数*、*世界模型*和*搜索算法*（详见我们的[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.05221)中的公式），分别对应库中的三个类：\u003Ctt>SearchConfig\u003C\u002Ftt>、\u003Ctt>WorldModel\u003C\u002Ftt>和\u003Ctt>SearchAlgorithm\u003C\u002Ftt>。此外，还有用于驱动其他模块的\u003Ctt>LLM API\u003C\u002Ftt>、用于评估或调试推理算法的\u003Ctt>Benchmark\u003C\u002Ftt>和\u003Ctt>Visualization\u003C\u002Ftt>（中间）。要实现针对特定领域的推理算法（一个\u003Ctt>Reasoner\u003C\u002Ftt>对象），用户可以继承\u003Ctt>SearchConfig\u003C\u002Ftt>和\u003Ctt>WorldModel\u003C\u002Ftt>类，并导入一个预实现的\u003Ctt>SearchAlgorithm\u003C\u002Ftt>。我们还在底部展示了一个使用LLM Reasoners通过RAP解决Blocksworld问题的具体示例。\n\n## 快速导览\n让我们来过一遍针对 Blocksworld 问题的推理代码。请注意，这段代码为了演示目的进行了简化（可在[这里](demo.ipynb)找到可运行的笔记本）。\n\n第一步是定义世界模型：你需要在 `init_state` 中根据问题设置初始状态，在 `is_terminal` 中判断一个状态是否为终止状态，而最重要的是用 `step` 定义世界动力学：\n```python\nfrom typing import NamedTuple\nimport utils\nfrom reasoners import WorldModel, LanguageModel\nimport copy\n\nBWState = str\nBWAction = str\n\nclass BlocksWorldModel(WorldModel[BWState, BWAction]):\n    def __init__(self,\n                 base_model: LanguageModel,\n                 prompt: dict) -> None:\n        super().__init__()\n        self.base_model = base_model\n        
self.prompt = prompt\n\n    def init_state(self) -> BWState:\n        # 从给定的问题中提取陈述\n        # 例如：“红色方块是干净的，蓝色方块是干净的……”\n        return BWState(utils.extract_init_state(self.example)) \n\n    def step(self, state: BWState, action: BWAction) -> tuple[BWState, dict]:\n        # 调用 LLM 预测状态转移\n        state = copy.deepcopy(state)\n        # 加载 LLM 预测下一个状态的提示\n        # 例如：“……我有 \u003Cstate>, 如果我 \u003Caction>, 那么……”\n        world_update_prompt = self.prompt[\"update\"].replace(\"\u003Cstate>\", state).replace(\"\u003Caction>\", action)\n        world_output = self.base_model.generate([world_update_prompt],\n                                    eos_token_id=\"\\n\", hide_input=True, temperature=0).text[0].strip()\n        new_state = utils.process_new_state(world_output)\n        # 到目前为止，我们已经得到了执行动作后的新状态\n        # 接下来的部分是为了加快奖励计算\n\n        # 我们希望检查已满足子目标的比例，并将其作为奖励的一部分\n        # 由于我们已经预测了新状态，因此可以在这里方便地进行检查\n        goal_reached = utils.goal_check(utils.extract_goals(self.example), new_state)\n        # 返回新状态和附加字典（将传递给奖励函数）\n        return new_state, {\"goal_reached\": goal_reached}\n\n    def is_terminal(self, state: BWState) -> bool:\n        # 定义终止状态的条件以停止搜索\n        # 例如，所有子目标都已达成\n        if utils.goal_check(utils.extract_goals(self.example), state) == 1:\n            return True\n        return False\n```\n然后，我们需要考虑如何搜索最优的推理链。这涉及到在给定状态时使用 `get_actions` 获取动作空间，以及最重要的、作为推理指导的 `reward` 函数。对于蒙特卡洛树搜索，我们还可以额外定义一个 `fast_reward` 来加速展开阶段。\n```python\nimport utils\nfrom world_model import BWState, BWAction\nfrom reasoners import SearchConfig, LanguageModel\nclass BWConfig(SearchConfig):\n    def __init__(self,\n                 base_model: LanguageModel,\n                 prompt: dict,\n                 reward_alpha=0.5,\n                 goal_reward_default=0.,\n                 goal_reached_reward=100) -> None:\n        super().__init__()\n        self.base_model = base_model\n        self.example = None\n        self.prompt = prompt\n        # 
一些用于计算快速奖励或奖励的参数（如下文解释）\n        self.reward_alpha = reward_alpha\n        self.goal_reward_default = goal_reward_default\n        self.goal_reached_reward = goal_reached_reward\n\n    def get_actions(self, state: BWState) -> list[BWAction]:\n        # 使用基于规则的函数提取所有合法动作\n        return utils.generate_all_actions(state)\n\n    def fast_reward(self, state: BWState, action: BWAction) -> tuple[float, dict]:\n        # 构建上下文学习提示（类似于思维链推理中使用的提示）\n        inputs = self.prompt[\"icl\"].replace(\"\u003Cinit_state>\", state)\\\n            .replace(\"\u003Cgoals>\", utils.extract_goals(self.example))\n        # 将候选动作连接到提示之后，并测试其对数似然\n        intuition = self.base_model.get_loglikelihood(inputs, [inputs + action])[0]\n        # 奖励是直觉与目标达成度的结合\n        # 在快速奖励中，我们跳过目标达成度的计算，直接使用默认值\n        fast_reward = intuition * self.reward_alpha + self.goal_reward_default * (1 - self.reward_alpha)\n        # 
缓存一些信息以便稍后计算奖励（将传递到 `reward` 函数）\n        details = {'intuition': intuition}\n        return fast_reward, details\n\n    def reward(self, state: BWState, action: BWAction,\n               intuition: float = None,\n               goal_reached: float = None) -> tuple[float, dict]:\n        # 注意，`intuition`（在 `fast_reward` 中缓存）和 `goal_reached`（在 `step` 中缓存）会自动作为参数传递给这个奖励函数\n        if goal_reached == 1:\n            # 如果达到目标状态，我们将给予高额奖励\n            goal_reward = self.goal_reached_reward\n        else:\n            # 否则，根据已满足的子目标比例分配奖励\n            goal_reward = goal_reached\n        # 奖励是直觉与目标达成度的组合\n        reward = intuition * self.reward_alpha + goal_reward * (1 - self.reward_alpha)\n        # 返回奖励和附加字典（将在日志中保存以便后续可视化）\n        return reward, {'intuition': intuition, 'goal_reached': goal_reached}\n```\n现在，我们已经准备好应用推理算法来解决问题了：\n```python\nimport json\nimport os\nimport pickle\n\nfrom reasoners import Reasoner\nfrom reasoners.algorithm import MCTS\nfrom reasoners.lm import LLaMAModel\nfrom world_model import BlocksWorldModel\nfrom search_config import BWConfig\n\nllama_model = LLaMAModel(llama_ckpts, llama_size, max_batch_size=1)\nwith open(prompt_path) as f:\n    prompt = json.load(f)\nworld_model = BlocksWorldModel(base_model=llama_model, prompt=prompt)\nconfig = BWConfig(base_model=llama_model, prompt=prompt)\n# 保存每次迭代的历史以便可视化\nsearch_algo = MCTS(output_trace_in_each_iter=True)\nreasoner = Reasoner(world_model=world_model, search_config=config, search_algo=search_algo)\nfor i, example in enumerate(dataset):\n    algo_output = reasoner(example)\n    # 将 MCTS 结果保存为 pickle 文件\n    with open(os.path.join(log_dir, 'algo_output', f'{resume + i + 1}.pkl'), 'wb') as f:\n        pickle.dump(algo_output, f)\n```\n最后，我们可以轻松地可视化推理过程：\n```python\nimport pickle\nfrom reasoners.visualization import visualize\nwith open(\"logs\u002Fbw_MCTS\u002Fxxx\u002Falgo_output\u002F1.pkl\", 'rb') as f:\n    mcts_result = pickle.load(f)\n\nfrom reasoners.visualization.tree_snapshot import NodeData, EdgeData\nfrom reasoners.algorithm.mcts import MCTSNode\n\n# 
默认情况下，状态会与节点一起展示，而 `SearchConfig.reward` 中保存的奖励字典则会与边一起展示。\n# 我们也可以定义一个辅助函数来自定义可视化工具中显示的内容。\ndef blocksworld_node_data_factory(n: MCTSNode) -> NodeData:\n    return NodeData({\"积木状态\": n.state if n.state else None,\n                     \"满足情况\": n.fast_reward_details if n.fast_reward_details else \"未展开\"})\ndef blocksworld_edge_data_factory(n: MCTSNode) -> EdgeData:\n    return EdgeData({\"奖励\": n.reward, \"直觉\": n.fast_reward_details[\"intuition\"]})\nvisualize(mcts_result, node_data_factory=blocksworld_node_data_factory,\n                       edge_data_factory=blocksworld_edge_data_factory)\n```\n随后会弹出一个指向可视化结果的 URL 链接。该图像是交互式的，外观类似于我们在[演示网站](https:\u002F\u002Fllm-reasoners.net\u002F)上展示的示例。\n## 安装\n\n请确保使用 Python 3.10 或更高版本。\n\n```bash\nconda create -n reasoners python=3.10\nconda activate reasoners\n```\n\n### 通过 `pip` 安装\n\n```bash\npip install llm-reasoners\n```\n\n### 从 GitHub 安装\n（如果您想运行 GitHub 仓库中的示例，推荐此方法）\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FBer666\u002Fllm-reasoners --recursive\ncd llm-reasoners\npip install -e .\n```\n添加 `--recursive` 参数将帮助您自动克隆 exllama 和 LLM-Planning。请注意，其他一些可选模块可能需要额外的依赖项，请根据错误信息进行处理。\n\n## 引用\n本项目是以下论文的扩展：\n```bibtex\n@inproceedings{hao2023reasoning,\n  title={Reasoning with Language Model is Planning with World Model},\n  author={Hao, Shibo and Gu, Yi and Ma, Haodi and Hong, Joshua and Wang, Zhen and Wang, Daisy and Hu, Zhiting},\n  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},\n  pages={8154--8173},\n  year={2023}\n}\n@article{hao2024llm,\n  title={LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models},\n  author={Hao, Shibo and Gu, Yi and Luo, Haotian and Liu, Tianyang and Shao, Xiyan and Wang, Xinyuan and Xie, Shuhua and Ma, Haodi and Samavedhi, Adithya and Gao, Qiyue and others},\n  journal={arXiv preprint arXiv:2404.05221},\n  year={2024}\n}\n```","# LLM Reasoners 快速上手指南\n\n**LLM Reasoners** 是一个旨在通过先进算法增强大语言模型（LLM）复杂推理能力的开源库。它集成了思维链（CoT）、思维树（ToT）、蒙特卡洛树搜索（MCTS）等多种前沿推理算法，并提供直观的可视化调试工具和高性能的 SGLang 后端支持。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux (推荐 
Ubuntu 20.04+) 或 macOS。Windows 用户建议使用 WSL2。\n*   **Python 版本**: Python 3.10 或更高版本。\n*   **硬件要求**:\n    *   若使用本地模型推理，建议配备 NVIDIA GPU (显存取决于模型大小，运行 7B 模型建议至少 16GB)。\n    *   若仅调用 OpenAI API 等云端服务，对本地硬件无特殊要求。\n*   **前置依赖**:\n    *   `git`\n    *   `pip`\n    *   (可选) CUDA Toolkit (如需编译特定算子或使用 SGLang 加速)\n\n## 安装步骤\n\n### 1. 克隆仓库\n首先从 GitHub 克隆项目代码：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners.git\ncd llm-reasoners\n```\n\n### 2. 创建虚拟环境并安装依赖\n推荐使用 `conda` 或 `venv` 创建隔离环境：\n\n```bash\n# 使用 conda 创建环境\nconda create -n reasoners python=3.10 -y\nconda activate reasoners\n\n# 安装核心依赖\npip install -e .\n```\n\n> **注意**：如果您计划使用 **SGLang** 进行高性能推理加速，请参考 [SGLang 官方文档](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang) 单独安装其对应版本的依赖，因为 SGLang 对环境有特定的 CUDA 要求。\n\n### 3. 配置模型后端\n本库支持多种后端，您可以根据需求选择：\n*   **OpenAI API**: 无需额外安装，只需设置环境变量 `OPENAI_API_KEY`。\n*   **Hugging Face Transformers \u002F Exllama \u002F llama.cpp**: 需安装对应的 Python 包（通常 `pip install -e .` 已包含基础支持，特定量化后端可能需要额外指令）。\n*   **SGLang**: 需独立部署 SGLang 服务。\n\n## 基本使用\n\nLLM Reasoners 将推理算法抽象为三个核心组件：**WorldModel** (世界模型)、**SearchConfig** (搜索配置) 和 **SearchAlgorithm** (搜索算法)。\n\n以下是一个简化的示例，展示如何定义一个“积木世界”（Blocksworld）的推理任务并使用 MCTS 算法求解。\n\n### 1. 
定义世界模型 (WorldModel)\n世界模型负责定义状态初始化、状态转移（Step）和终止条件。\n\n```python\nimport utils\nfrom reasoners import WorldModel, LanguageModel\nimport copy\n\n# 定义状态和动作类型\nBWState = str\nBWAction = str\n\nclass BlocksWorldModel(WorldModel[BWState, BWAction]):\n    def __init__(self, base_model: LanguageModel, prompt: dict) -> None:\n        super().__init__()\n        self.base_model = base_model\n        self.prompt = prompt\n\n    def init_state(self) -> BWState:\n        # 从问题中提取初始状态\n        return BWState(utils.extract_init_state(self.example)) \n\n    def step(self, state: BWState, action: BWAction) -> tuple[BWState, dict]:\n        # 调用 LLM 预测状态转移\n        state = copy.deepcopy(state)\n        world_update_prompt = self.prompt[\"update\"].replace(\"\u003Cstate>\", state).replace(\"\u003Caction>\", action)\n        \n        # 生成新状态\n        world_output = self.base_model.generate(\n            [world_update_prompt],\n            eos_token_id=\"\\n\", \n            hide_input=True, \n            temperature=0\n        ).text[0].strip()\n        \n        new_state = utils.process_new_state(world_output)\n        \n        # 计算部分奖励所需的信息 (如子目标达成情况)\n        goal_reached = utils.goal_check(utils.extract_goals(self.example), new_state)\n        \n        return new_state, {\"goal_reached\": goal_reached}\n\n    def is_terminal(self, state: BWState) -> bool:\n        # 判断是否达到终止状态 (所有子目标完成)\n        if utils.goal_check(utils.extract_goals(self.example), state) == 1:\n            return True\n        return False\n```\n\n### 2. 
定义搜索配置 (SearchConfig)\n配置搜索空间（动作生成）和奖励函数（Reward），用于指导搜索方向。\n\n```python\nimport utils\nfrom reasoners import SearchConfig, LanguageModel\n\nclass BWConfig(SearchConfig):\n    def __init__(self, base_model: LanguageModel, prompt: dict, reward_alpha=0.5) -> None:\n        super().__init__()\n        self.base_model = base_model\n        self.prompt = prompt\n        self.reward_alpha = reward_alpha\n\n    def get_actions(self, state: BWState) -> list[BWAction]:\n        # 让 LLM 生成当前状态下可能的动作列表\n        prompt = self.prompt[\"get_actions\"].replace(\"\u003Cstate>\", state)\n        output = self.base_model.generate([prompt], temperature=0).text[0]\n        return utils.parse_actions(output)\n\n    def reward(self, state: BWState, action: BWAction, **kwargs) -> float:\n        # 定义奖励函数，结合最终结果和中间过程奖励\n        goal_reached = kwargs.get(\"goal_reached\", False)\n        if goal_reached:\n            return 100.0\n        # 这里可以添加基于启发式的中间奖励\n        return 0.0\n```\n\n### 3. 运行搜索算法\n实例化组件并启动搜索（以 MCTS 为例）。\n\n```python\nfrom reasoners import Reasoner\nfrom reasoners.algorithm import MCTS\nfrom reasoners.lm import OpenAIModel # 或其他后端，如 HuggingFaceModel\n\n# 1. 初始化基座模型\nllm = OpenAIModel(model=\"gpt-3.5-turbo\") \n\n# 2. 准备提示词 (需根据具体任务定义)\nprompts = {\n    \"update\": \"...\", \n    \"get_actions\": \"...\",\n    # 其他必要提示词\n}\n\n# 3. 实例化组件\nworld_model = BlocksWorldModel(base_model=llm, prompt=prompts)\nsearch_config = BWConfig(base_model=llm, prompt=prompts)\n# 保存每次迭代的搜索历史以便后续可视化\nsearch_algo = MCTS(output_trace_in_each_iter=True)\n\n# 4. 执行推理\n# example 是你的具体任务输入数据\nreasoner = Reasoner(world_model=world_model, search_config=search_config, search_algo=search_algo)\nresult = reasoner(\"your_blocksworld_problem_data\")\n\n# 5. 获取最佳推理路径（MCTS 返回的结果对象中包含搜索到的推理轨迹）\nprint(\"Best reasoning chain:\", result.trace)\n```\n\n### 4. 
可视化调试\nLLM Reasoners 提供了一行代码即可生成的可视化工具，帮助理解复杂的搜索过程（如 MCTS 树结构）：\n\n```python\n# 在 notebook 或脚本中调用\nfrom reasoners.visualization import visualize\nvisualize(result)\n# 这将生成一个可在浏览器中打开的链接，展示完整的推理树和节点得分\n```\n\n更多详细示例和完整可运行代码，请参阅仓库中的 `demo.ipynb` 或 `examples\u002F` 目录下的具体算法实现（如 `examples\u002FRAP`, `examples\u002FToT`）。","某量化分析团队正利用大模型处理复杂的金融逻辑推理任务，需要从海量非结构化新闻中推导潜在的市场趋势。\n\n### 没有 llm-reasoners 时\n- **推理深度不足**：直接使用基础提示词（Prompt）或简单的思维链（CoT），模型在面对多步逻辑跳转时容易“迷路”，导致结论缺乏严谨性。\n- **黑盒难以调试**：当模型得出错误结论时，开发者无法直观看到中间思考路径，像面对黑盒一样难以定位是哪一步逻辑崩塌。\n- **算法复现困难**：想要尝试前沿的“思维树（ToT）”或“蒙特卡洛树搜索（MCTS）”等高级策略，需从零编写复杂的搜索与评估代码，耗时且易出错。\n- **推理效率低下**：缺乏针对推理过程的底层优化，在处理大批量数据时，单次推理耗时过长，无法满足实时分析需求。\n\n### 使用 llm-reasoners 后\n- **策略灵活升级**：通过一行代码即可切换至 MCTS 或 ToT 等先进算法，让模型具备规划与自我修正能力，显著提升了复杂推导的准确率。\n- **过程透明可视**：利用内置的可视化工具，团队能清晰看到模型每一步的搜索树与评分细节，快速诊断并优化薄弱环节。\n- **开箱即用复现**：直接调用库中经严格验证的算法实现，无需重复造轮子，确保了实验结果的可复现性与学术严谨性。\n- **高性能加速**：集成 SGLang 后端进行结构化生成优化，在保持高推理质量的同时，大幅缩短了大规模任务的执行时间。\n\nllm-reasoners 将原本晦涩难懂的高级推理算法转化为可落地、可观测的工程利器，让大模型真正具备了处理复杂逻辑的“大脑”。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmaitrix-org_llm-reasoners_bc79edac.png","maitrix-org","Maitrix.org","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmaitrix-org_63b7cbd3.png","Open Organization to Build AI-powered Realities based on Techniques of Large Language\u002FMulti-Modal Models, Agent Models, World Models.",null,"maitrix.org@gmail.com","MaitrixOrg","https:\u002F\u002Fmaitrix.org","https:\u002F\u002Fgithub.com\u002Fmaitrix-org",[83,87],{"name":84,"color":85,"percentage":86},"Python","#3572A5",86.8,{"name":88,"color":89,"percentage":90},"Jupyter Notebook","#DA5B0B",13.2,2341,203,"2026-04-15T09:34:48","Apache-2.0","","未说明（支持多种后端如 SGLang, huggingface transformers, OpenAI API, Exllama, llama.cpp 等，具体取决于所选后端）","未说明",{"notes":99,"python":97,"dependencies":100},"该库是一个框架，支持多种 LLM 后端（包括 SGLang、Hugging Face、OpenAI API、Exllama、llama.cpp 等），因此具体的硬件和软件依赖取决于用户选择的后端。集成了 SGLang 可实现高性能推理。支持可视化调试工具。",[101,102,103,104,105,106,107],"SGLang","huggingface transformers","OpenAI 
API","Exllama","fairscale","llama.cpp","BrowserGym",[35,14,13],"2026-03-27T02:49:30.150509","2026-04-18T00:45:29.017167",[112,117,122,127,132,137],{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},38197,"该项目是否支持 LLaMA 2 模型？如何配置使用？","是的，项目已更新以支持 LLaMA 2。维护者已将 LLaMA 2 的集成更新到主分支。如果在实验中遇到问题，可以直接联系维护者。用户反馈在使用 LLaMA 2 进行 GSM8K 测试时，RAP 算法在少量样本（Few-shots）设置下表现良好（例如 k=4 时得分约 0.41-0.47），但在零样本（Zero-shot）模式下可能会因为输出格式不包含\"The answer is\"而导致解析失败，建议检查输出解析逻辑或使用 Few-shot 模式。","https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Fissues\u002F18",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},38198,"运行 ReAct HotpotQA 示例时遇到 'missing 1 required positional argument: toolset' 错误怎么办？","该问题通常是因为代码中未正确定义或传递 `toolset` 参数。根据社区反馈，此问题已在后续版本中解决。如果遇到类似错误，请确保拉取最新的代码库。此外，如果使用的是 Llama-3 模型，需确保模型目录中包含 `params.json` 文件，或者确认使用的模型路径和类型（如 ExLlamaModel）与推理脚本中的配置相匹配。","https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Fissues\u002F110",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},38199,"使用 Llama 3.1 Instruct 复现 ProntoQA (DFS-ToT) 结果为 0 准确率，如何解决？","这通常是因为默认配置未正确适配 Hugging Face 模型。维护者已更新代码以支持 HF 模型类。请使用以下命令尝试运行，并指定 `--base_lm hf` 参数：\n`CUDA_VISIBLE_DEVICES=4 python examples\u002FToT\u002Fprontoqa\u002Ftot_inference.py --base_lm hf --model_dir meta-llama\u002FLlama-3.1-8B-Instruct --batch_size 8 --search_algo dfs --log_dir logs\u002Fprontoqa_tot_dfs_abc --depth_limit 10 --total_states 10 --temperature 0.8 --max_per_state 3`\n该配置已验证可解决准确率均为 0 的问题，且适用于 Phi 等其他 HF 模型。","https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Fissues\u002F122",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},38200,"遇到 'TypeError: Too few parameters for WorldModel' 类型错误该如何排查？","此错误通常发生在初始化阶段，可能与 Python 版本、typing 扩展库或模型加载方式有关。建议首先尝试使用官方的 Meta LLaMA 仓库运行基础推理，以排除环境配置问题。如果官方仓库运行正常但本项目报错，请检查是否使用了正确的分支版本，或者尝试调整 `torchrun` 的参数（如 `--nproc-per-node`）。有用户反馈在下载新仓库并重新运行 GSM8k 示例后，原始 bug 
被修复。","https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Fissues\u002F47",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},38201,"项目是否集成了 GRACE 解码方法？","是的，项目已经完成了 GRACE 解码方法的集成。这是在社区贡献者（如 @adithya-samavedhi）的帮助下完成的。现在用户可以在项目中直接使用 GRACE 相关的功能。","https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Fissues\u002F27",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},38202,"使用 GPT-3.5 Turbo 运行 Blocksworld (ToT DFS) 时准确率为 0 或报错，原因是什么？","这通常是因为 OpenAI API 默认不直接返回所需的对数概率（loglikelihood）。用户需要自行实现 `get_loglikelihood` 函数，利用 OpenAI API 的 `top_logprobs` 和 `logprobs` 参数来获取数据。如果不正确实现该函数，搜索算法无法评估状态价值，从而导致准确率为 0。建议参考社区提供的代码片段来实现基于 OpenAI API 的对数概率计算。","https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002Fllm-reasoners\u002Fissues\u002F100",[143],{"id":144,"version":145,"summary_zh":146,"released_at":147},306352,"v1.0.0","# LLM Reasoners 初次发布\n\n**LLM Reasoners** 是一个旨在使大语言模型能够进行复杂推理的库，内置了先进的推理算法。它将多步推理视为一种规划任务，并*搜索最优的推理链*，通过“世界模型”和“奖励”的概念，在探索与利用之间实现最佳平衡。\n# 核心特性\n\n\n\n## 最前沿的推理算法\n\n我们提供了用于大语言模型推理的最新搜索算法，例如：\n\n- [基于规划的推理、蒙特卡洛树搜索（Hao 等，2023）](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14992)\n- [StructChem（Ouyang 等，2023）](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.09656)\n- [思维链（Wei 等，2022）](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.11903)\n- [从少到多提示法（Zhou 等，2022）](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.10625)\n- [思维树、广度优先搜索（Yao 等，2023）](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10601)\n- [思维树、深度优先搜索（Yao 等，2023）](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10601)\n- [引导解码、束搜索（Xie 等，2023）](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.00633)\n- [优雅解码、贪婪解码（Khalifa 等，2023）](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14934)\n\n## 直观的可视化与解释\n\n我们的库提供了一个[可视化工具](https:\u002F\u002Fwww.llm-reasoners.net\u002F)，帮助用户理解推理过程。即使是像蒙特卡洛树搜索这样复杂的推理算法，用户也只需**一行 Python 代码**就能轻松诊断并理解整个流程。示例请参见教程中的[笔记本](demo.ipynb)。\n\n## 与主流 LLM 库的兼容性\n\n我们的框架兼容多种流行的 LLM 框架，例如 `Huggingface 
Transformers`、`OpenAI`\u002F`Google`\u002F`Anthropic` API 等。特别地，我们集成了 LLaMA-1\u002F2\u002F3，并支持使用 `fairscale`（[1,2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama)，[3](https:\u002F\u002Fgithub.com\u002Fmeta-llama\u002Fllama3)）、[LLaMA.cpp](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fllama.cpp)、[Exllama](https:\u002F\u002Fgithub.com\u002FBer666\u002Fllm-reasoners\u002Ftree\u002Fmain\u002Freasoners\u002Flm#exllama) 或 `huggingface` 等不同实现方式，以满足不同的需求，比如最快的推理速度、最低的硬件要求等。","2024-05-02T18:48:36"]
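
为了更直观地理解上文反复出现的“世界模型 / 搜索配置 / 搜索算法”三组件抽象，下面给出一个不依赖 llm-reasoners 库本身的纯 Python 玩具示例（示意性代码：`ToyWorldModel`、`ToySearchConfig`、`greedy_search` 等名称均为本示例的假设，与库的真实 API 无关）。示例中状态是一个整数，动作为 +1 或 +3，目标是恰好达到 10，搜索算法在每一步贪心选择奖励最高的动作：

```python
class ToyWorldModel:
    """世界模型：定义初始状态、状态转移和终止条件。"""
    def init_state(self) -> int:
        return 0

    def step(self, state: int, action: int) -> int:
        return state + action

    def is_terminal(self, state: int) -> bool:
        return state >= 10


class ToySearchConfig:
    """搜索配置：定义动作空间和奖励函数。"""
    def get_actions(self, state: int) -> list[int]:
        return [1, 3]

    def reward(self, state: int, action: int, next_state: int) -> float:
        # 离目标 10 越近奖励越高（超过目标会被惩罚）
        return -abs(10 - next_state)


def greedy_search(world, config, max_steps: int = 20) -> list[int]:
    """搜索算法：每一步贪心选择奖励最高的动作，返回状态轨迹。"""
    state = world.init_state()
    trajectory = [state]
    for _ in range(max_steps):
        if world.is_terminal(state):
            break
        action = max(config.get_actions(state),
                     key=lambda a: config.reward(state, a, world.step(state, a)))
        state = world.step(state, action)
        trajectory.append(state)
    return trajectory


traj = greedy_search(ToyWorldModel(), ToySearchConfig())
print(traj)  # [0, 3, 6, 9, 10]
```

在 llm-reasoners 中，三者的分工与此对应：`WorldModel` 负责状态转移，`SearchConfig` 给出动作空间与奖励，而 `SearchAlgorithm`（如 MCTS）只依赖这两个接口，因此可以自由替换搜索策略而无需改动任务定义。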