[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-algorithmicsuperintelligence--optillm":3,"tool-algorithmicsuperintelligence--optillm":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":83,"stars":95,"forks":96,"last_commit_at":97,"license":98,"difficulty_score":23,"env_os":99,"env_gpu":100,"env_ram":101,"env_deps":102,"category_tags":112,"github_topics":113,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":133,"updated_at":134,"faqs":135,"releases":166},3326,"algorithmicsuperintelligence\u002Foptillm","optillm","Optimizing inference proxy for LLMs","OptiLLM 是一款专为大语言模型（LLM）设计的推理优化代理工具，旨在无需任何额外训练或微调的情况下，显著提升模型在数学、编程及逻辑推理等复杂任务中的准确率。它通过充当兼容 OpenAI API 的中间层，在推理阶段动态调用 20 多种前沿技术（如混合专家代理、蒙特卡洛树搜索及自动规划等），以“增加计算换质量”的策略，让轻量级模型也能展现出媲美顶级旗舰模型的性能表现。\n\n该工具主要解决了中小参数模型在处理高难度推理问题时准确率不足的痛点，帮助用户在不更换底层模型的前提下，低成本地获得更可靠的输出结果。由于其支持“即插即用”，只需简单修改 API 请求地址和模型名称即可生效，因此非常适合开发者、研究人员以及需要部署高性能 AI 应用的企业团队使用。无论是希望优化现有聊天机器人回答质量的工程师，还是致力于探索推理边界的研究者，都能通过 OptiLLM 快速验证效果。其独特的技术亮点在于集成了包括 MARS、CePO 在内的多种先进算法，并原生支持 OpenAI、Anthropic、Google 等百家模型提供商，兼具灵活性与生产环境的稳定性。","# OptiLLM\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Falgorithmicsuperintelligence_optillm_readme_ff913a070650.png\" alt=\"OptiLLM Logo\" width=\"400\" \u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>🚀 2-10x accuracy improvements on reasoning tasks with zero training\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fstargazers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Falgorithmicsuperintelligence\u002Foptillm?style=social\" alt=\"GitHub stars\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Foptillm\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Foptillm\" alt=\"PyPI version\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Foptillm\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Foptillm\" alt=\"PyPI downloads\">\u003C\u002Fa>\n  \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Falgorithmicsuperintelligence\u002Foptillm\" alt=\"License\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fcodelion\u002Foptillm\">🤗 HuggingFace Space\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1SpuUb8d9xAoTh32M-9wJsB50AOH54EaH?usp=sharing\">📓 Colab Demo\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fdiscussions\">💬 Discussions\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n**OptiLLM** is an OpenAI API-compatible optimizing inference proxy that implements 20+ state-of-the-art techniques to dramatically improve LLM accuracy and performance on reasoning tasks - without requiring any model training or fine-tuning.\n\nIt is possible to beat the frontier models using these techniques across diverse tasks by doing additional compute at inference time. A good example of how to combine such techniques together is the [CePO approach](optillm\u002Fcepo) from Cerebras.\n\n## ✨ Key Features\n\n- **🎯 Instant Improvements**: 2-10x better accuracy on math, coding, and logical reasoning\n- **🔌 Drop-in Replacement**: Works with any OpenAI-compatible API endpoint  \n- **🧠 20+ Optimization Techniques**: From simple best-of-N to advanced MCTS and planning\n- **📦 Zero Training Required**: Just proxy your existing API calls through OptiLLM\n- **⚡ Production Ready**: Used in production by companies and researchers worldwide\n- **🌍 Multi-Provider**: Supports OpenAI, Anthropic, Google, Cerebras, and 100+ models via LiteLLM\n\n## 🚀 Quick Start\n\nGet powerful reasoning improvements in 3 simple steps:\n\n```bash\n# 1. Install OptiLLM\npip install optillm\n\n# 2. Start the server\nexport OPENAI_API_KEY=\"your-key-here\"\noptillm\n\n# 3. 
**Before OptiLLM**: "x = 1" ❌  
**After OptiLLM**: "Let me work through this step by step: 2x + 3 = 7, so 2x = 4, therefore x = 2" ✅

## 📊 Proven Results

OptiLLM delivers measurable improvements across diverse benchmarks:

| Technique | Base Model | Improvement | Benchmark |
|-----------|------------|-------------|-----------|
| **MARS** | Gemini 2.5 Flash Lite | **+30.0 points** | AIME 2025 (43.3→73.3) |
| **CePO** | Llama 3.3 70B | **+18.6 points** | Math-L5 (51.0→69.6) |
| **AutoThink** | DeepSeek-R1-1.5B | **+9.34 points** | GPQA-Diamond (21.72→31.06) |
| **LongCePO** | Llama 3.3 70B | **+13.6 points** | InfiniteBench (58.0→71.6) |
| **MOA** | GPT-4o-mini | **Matches GPT-4** | Arena-Hard-Auto |
| **PlanSearch** | GPT-4o-mini | **+20% pass@5** | LiveCodeBench |

*Full benchmark results [below](#sota-results-on-benchmarks-with-optillm)* ⬇️

## 🏗️ Installation

### Using pip

```bash
pip install optillm
optillm
2024-10-22 07:45:05,612 - INFO - Loaded plugin: privacy
2024-10-22 07:45:06,293 - INFO - Loaded plugin: memory
2024-10-22 07:45:06,293 - INFO - Starting server with approach: auto
```

### Using docker

```bash
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest
docker run -p 8000:8000 ghcr.io/algorithmicsuperintelligence/optillm:latest
2024-10-22 07:45:05,612 - INFO - Loaded plugin: privacy
2024-10-22 07:45:06,293 - INFO - Loaded plugin: memory
2024-10-22 07:45:06,293 - INFO - Starting server with approach: auto
```

**Available Docker image variants:**

- **Full image** (`latest`): Includes all dependencies for local inference and plugins
- **Proxy-only** (`latest-proxy`): Lightweight image without local inference capabilities
- **Offline** (`latest-offline`): Self-contained image with pre-downloaded models (spaCy) for fully offline operation

```bash
# Proxy-only (smallest)
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest-proxy

# Offline (largest, includes pre-downloaded models)
docker pull ghcr.io/algorithmicsuperintelligence/optillm:latest-offline
```

### Install from source

Clone the repository with `git` and use `pip install` to set up the dependencies.

```bash
git clone https://github.com/algorithmicsuperintelligence/optillm.git
cd optillm
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## 🔒 SSL Configuration

OptiLLM supports SSL certificate verification configuration for working with self-signed certificates or corporate proxies.

**Disable SSL verification (development only):**
```bash
# Command line
optillm --no-ssl-verify

# Environment variable
export OPTILLM_SSL_VERIFY=false
optillm
```

**Use custom CA certificate:**
```bash
# Command line
optillm --ssl-cert-path /path/to/ca-bundle.crt

# Environment variable
export OPTILLM_SSL_CERT_PATH=/path/to/ca-bundle.crt
optillm
```

⚠️ **Security Note**: Disabling SSL verification is insecure and should only be used in development. For production environments with custom CAs, use `--ssl-cert-path` instead. See [SSL_CONFIGURATION.md](SSL_CONFIGURATION.md) for details.
## Implemented techniques

| Approach | Slug | Description |
| -------- | ---- | ----------- |
| [MARS (Multi-Agent Reasoning System)](optillm/mars) | `mars` | Multi-agent reasoning with diverse temperature exploration, cross-verification, and iterative improvement |
| [Cerebras Planning and Optimization](optillm/cepo) | `cepo` | Combines Best of N, Chain-of-Thought, Self-Reflection, Self-Improvement, and various prompting techniques |
| CoT with Reflection | `cot_reflection` | Implements chain-of-thought reasoning with `<thinking>`, `<reflection>` and `<output>` sections |
| PlanSearch | `plansearch` | Implements a search algorithm over candidate plans for solving a problem in natural language |
| ReRead | `re2` | Implements rereading to improve reasoning by processing queries twice |
| Self-Consistency | `self_consistency` | Implements an advanced self-consistency method |
| Z3 Solver | `z3` | Utilizes the Z3 theorem prover for logical reasoning |
| R* Algorithm | `rstar` | Implements the R* algorithm for problem-solving |
| LEAP | `leap` | Learns task-specific principles from few-shot examples |
| Round Trip Optimization | `rto` | Optimizes responses through a round-trip process |
| Best of N Sampling | `bon` | Generates multiple responses and selects the best one |
| Mixture of Agents | `moa` | Combines responses from multiple critiques |
| Monte Carlo Tree Search | `mcts` | Uses MCTS for decision-making in chat responses |
| PV Game | `pvg` | Applies a prover-verifier game approach at inference time |
| [Deep Confidence](optillm/deepconf) | N/A for proxy | Implements confidence-guided reasoning with multiple intensity levels for enhanced accuracy |
| CoT Decoding | N/A for proxy | Implements chain-of-thought decoding to elicit reasoning without explicit prompting |
| Entropy Decoding | N/A for proxy | Implements adaptive sampling based on the uncertainty of tokens during generation |
| Thinkdeeper | N/A for proxy | Implements the `reasoning_effort` param from OpenAI for reasoning models like DeepSeek R1 |
| [AutoThink](optillm/autothink) | N/A for proxy | Combines query complexity classification with steering vectors to enhance reasoning |
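To give a flavor of what the simpler approaches do under the hood, here is a conceptual sketch of best-of-N sampling (`bon`). This is not OptiLLM's implementation, just an illustration of the idea; the length-based selection rule is a placeholder for the rating step a real verifier or reward model would perform:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1")

def best_of_n(prompt: str, n: int = 3) -> str:
    """Sample n candidate answers and keep the 'best' one."""
    candidates = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.8,  # higher temperature for diverse candidates
        )
        candidates.append(resp.choices[0].message.content)
    # Placeholder selection rule; real best-of-N rates candidates
    # with a verifier or reward model instead.
    return max(candidates, key=len)
```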
## Implemented plugins

| Plugin | Slug | Description |
| ------ | ---- | ----------- |
| [System Prompt Learning](optillm/plugins/spl) | `spl` | Implements what [Andrej Karpathy called the third paradigm](https://x.com/karpathy/status/1921368644069765486) for LLM learning; enables the model to acquire problem-solving knowledge and strategies |
| [Deep Think](optillm/plugins/deepthink) | `deepthink` | Implements a Gemini-like Deep Think approach using inference-time scaling for reasoning LLMs |
| [Long-Context Cerebras Planning and Optimization](optillm/plugins/longcepo) | `longcepo` | Combines planning and divide-and-conquer processing of long documents to enable infinite context |
| Majority Voting | `majority_voting` | Generates k candidate solutions and selects the most frequent answer through majority voting (default k=6) |
| MCP Client | `mcp` | Implements the Model Context Protocol (MCP) client, enabling you to use any LLM with any MCP server |
| Router | `router` | Uses the [optillm-modernbert-large](https://huggingface.co/codelion/optillm-modernbert-large) model to route requests to different approaches based on the user prompt |
| Chain-of-Code | `coc` | Implements a chain-of-code approach that combines CoT with code execution and LLM-based code simulation |
| Memory | `memory` | Implements a short-term memory layer, enabling unbounded context length with any LLM |
| Privacy | `privacy` | Anonymizes PII data in the request and deanonymizes it back to the original value in the response |
| Read URLs | `readurls` | Reads all URLs found in the request, fetches the content at each URL and adds it to the context |
| Execute Code | `executecode` | Enables use of a code interpreter to execute Python code in requests and LLM-generated responses |
| JSON | `json` | Enables structured outputs using the outlines library; supports pydantic types and JSON schema |
| GenSelect | `genselect` | Generative Solution Selection: generates multiple candidates and selects the best based on quality criteria |
| Web Search | `web_search` | Performs Google searches using Chrome automation (Selenium) to gather search results and URLs |
| [Deep Research](optillm/plugins/deep_research) | `deep_research` | Implements Test-Time Diffusion Deep Researcher (TTD-DR) for comprehensive research reports using iterative refinement |
| [Proxy](optillm/plugins/proxy) | `proxy` | Load balancing and failover across multiple LLM providers with health monitoring and round-robin routing |
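Plugins are invoked with the same slug convention as the approaches. For instance, the `readurls` and `memory` plugins can be chained in the model-name prefix, the same combination that appears in the FRAMES benchmark results below. A minimal sketch, assuming the proxy is running locally:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1")

# readurls fetches the linked page into the context;
# memory layers short-term memory over the fetched content.
response = client.chat.completions.create(
    model="readurls&memory-gpt-4o-mini",
    messages=[{"role": "user", "content":
        "Summarize https://example.com/article in three bullet points."}],
)
print(response.choices[0].message.content)
```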
We support all major LLM providers and models for inference. You need to set the correct environment variable and the proxy will pick the corresponding client.

| Provider | Required Environment Variables | Additional Notes |
|----------|-------------------------------|------------------|
| OptiLLM | `OPTILLM_API_KEY` | Uses the inbuilt local server for inference; supports logprobs and decoding techniques like `cot_decoding` & `entropy_decoding` |
| OpenAI | `OPENAI_API_KEY` | You can use this with any OpenAI-compatible endpoint (e.g. OpenRouter) by setting the `base_url` |
| Cerebras | `CEREBRAS_API_KEY` | You can use this for fast inference with supported models, see [docs for details](https://inference-docs.cerebras.ai/introduction) |
| Azure OpenAI | `AZURE_OPENAI_API_KEY`<br>`AZURE_API_VERSION`<br>`AZURE_API_BASE` | - |
| Azure OpenAI (Managed Identity) | `AZURE_API_VERSION`<br>`AZURE_API_BASE` | Login required using `az login`, see [docs for details](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/managed-identity) |
| LiteLLM | depends on the model | See [docs for details](https://docs.litellm.ai/docs/providers) |

You can then run the optillm proxy as follows.

```bash
python optillm.py
2024-09-06 07:57:14,191 - INFO - Starting server with approach: auto
2024-09-06 07:57:14,191 - INFO - Server configuration: {'approach': 'auto', 'mcts_simulations': 2, 'mcts_exploration': 0.2, 'mcts_depth': 1, 'best_of_n': 3, 'model': 'gpt-4o-mini', 'rstar_max_depth': 3, 'rstar_num_rollouts': 5, 'rstar_c': 1.4, 'base_url': '', 'host': '127.0.0.1'}
 * Serving Flask app 'optillm'
 * Debug mode: off
2024-09-06 07:57:14,212 - INFO - WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:8000
2024-09-06 07:57:14,212 - INFO - Press CTRL+C to quit
```

> **Security Note**: By default, optillm binds to `127.0.0.1` (localhost only) for security. To allow external connections (e.g., for Docker or remote access), use `--host 0.0.0.0`. Only do this on trusted networks or with proper authentication configured via `--optillm-api-key`.

## Usage

Once the proxy is running, you can use it as a drop-in replacement for an OpenAI client by setting the `base_url` as `http://localhost:8000/v1`.

```python
import os
from openai import OpenAI

OPENAI_KEY = os.environ.get("OPENAI_API_KEY")
OPENAI_BASE_URL = "http://localhost:8000/v1"
client = OpenAI(api_key=OPENAI_KEY, base_url=OPENAI_BASE_URL)

response = client.chat.completions.create(
  model="moa-gpt-4o",
  messages=[
    {
      "role": "user",
      "content": "Write a Python program to build an RL model to recite text from any position that the user provides, using only numpy."
    }
  ],
  temperature=0.2
)

print(response)
```

The code above applies to both OpenAI and Azure OpenAI; just remember to populate the `OPENAI_API_KEY` env variable with the proper key.

There are multiple ways to select the optimization technique; they are applied in the following order of preference:

- You can control the technique you use for optimization by prepending the slug to the model name: `{slug}-model-name`. E.g. in the above code we are using `moa`, or mixture of agents, as the optimization approach.
In the proxy logs you will see the following, showing that `moa` is being used with the base model `gpt-4o-mini`.

```bash
2024-09-06 08:35:32,597 - INFO - Using approach moa, with gpt-4o-mini
2024-09-06 08:35:35,358 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-06 08:35:39,553 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-06 08:35:44,795 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-06 08:35:44,797 - INFO - 127.0.0.1 - - [06/Sep/2024 08:35:44] "POST /v1/chat/completions HTTP/1.1" 200 -
```

- Or, you can pass the slug in the `optillm_approach` field in the `extra_body`.

```python
response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[{ "role": "user", "content": "" }],
  temperature=0.2,
  extra_body={"optillm_approach": "bon|moa|mcts"}
)
```

- Or, you can just mention the approach in either your `system` or `user` prompt, within `<optillm_approach> </optillm_approach>` tags.

```python
response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[{ "role": "user", "content": "<optillm_approach>re2</optillm_approach> How many r's are there in strawberry?" }],
  temperature=0.2
)
```

> [!TIP]
> You can also combine different techniques using the symbols `&` and `|`. With `&`, the techniques are processed left to right as a pipeline,
> with the response from each stage used as the request to the next. With `|`, all the requests run in parallel and multiple responses are generated and returned as a list.
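A minimal sketch of both combination operators, using the documented `extra_body` mechanism (the `plansearch&moa` pipeline here is an illustrative choice, not a recommendation from the project):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1")

# '&' pipelines approaches left to right: plansearch output feeds moa.
piped = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a fast prime sieve."}],
    extra_body={"optillm_approach": "plansearch&moa"},
)

# '|' runs the approaches in parallel and returns multiple responses.
parallel = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a fast prime sieve."}],
    extra_body={"optillm_approach": "bon|moa"},
)
for choice in parallel.choices:
    print(choice.message.content)
```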
Please note that the convention described above works only when the optillm server has been started with the inference approach set to `auto`. Otherwise, the `model` attribute in the client request must be set with the model name only.

We now support all LLM providers (by wrapping around the [LiteLLM sdk](https://docs.litellm.ai/docs/#litellm-python-sdk)). E.g. you can use the Gemini Flash model with `moa` by setting the API key in the environment variable `os.environ['GEMINI_API_KEY']` and then calling the model `moa-gemini/gemini-1.5-flash-002`. In the output you will then see that LiteLLM is being used to call the base model.

```bash
9:43:21 - LiteLLM:INFO: utils.py:2952 -
LiteLLM completion() model= gemini-1.5-flash-002; provider = gemini
2024-09-29 19:43:21,011 - INFO -
LiteLLM completion() model= gemini-1.5-flash-002; provider = gemini
2024-09-29 19:43:21,481 - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-002:generateContent?key=[redacted] "HTTP/1.1 200 OK"
19:43:21 - LiteLLM:INFO: utils.py:988 - Wrapper: Completed Call, calling success_handler
2024-09-29 19:43:21,483 - INFO - Wrapper: Completed Call, calling success_handler
19:43:21 - LiteLLM:INFO: utils.py:2952 -
LiteLLM completion() model= gemini-1.5-flash-002; provider = gemini
```

> [!TIP]
> optillm is a transparent proxy and will work with any LLM API or provider that has an OpenAI API compatible chat completions endpoint, and in turn, optillm also exposes
> the same OpenAI API compatible chat completions endpoint. This should allow you to integrate it into any existing tools or frameworks easily. If the LLM you want to use
> doesn't have an OpenAI API compatible endpoint (like Google or Anthropic) you can use the [LiteLLM proxy server](https://docs.litellm.ai/docs/proxy/quick_start) that supports most LLMs.

The following sequence diagram illustrates how the request and responses go through optillm.

![Sequence diagram showing optillm in use](https://oss.gittoolsai.com/images/algorithmicsuperintelligence_optillm_readme_b93cb2279096.png)

In the diagram:
- `A` is an existing tool (like [oobabooga](https://github.com/oobabooga/text-generation-webui/)), framework (like [patchwork](https://github.com/patched-codes/patchwork)),
or your own code where you want to use the results from optillm. You can use it directly using any OpenAI client sdk.
- `B` is the optillm service (running directly or in a docker container) that will send requests to the `base_url`.
- `C` is any service providing an OpenAI API compatible chat completions endpoint.

### Local inference server

We support loading any HuggingFace model or LoRA directly in optillm. To use the built-in inference server set the `OPTILLM_API_KEY` to any value (e.g. `export OPTILLM_API_KEY="optillm"`)
and then use the same in your OpenAI client. You can pass any HuggingFace model in the model field. If it is a private model, make sure you set the `HF_TOKEN` environment variable
with your HuggingFace key. We also support adding any number of LoRAs on top of the model by using the `+` separator.

E.g. the following code loads the base model `meta-llama/Llama-3.2-1B-Instruct` and then adds two LoRAs on top: `patched-codes/Llama-3.2-1B-FixVulns` and `patched-codes/Llama-3.2-1B-FastApply`.
You can specify which LoRA to use via the `active_adapter` param in the `extra_body` field of the OpenAI SDK client. By default we will load the last specified adapter.

```python
from openai import OpenAI

client = OpenAI(api_key="optillm", base_url="http://localhost:8000/v1")

messages = [{"role": "user", "content": "Your prompt here"}]

response = client.chat.completions.create(
  model="meta-llama/Llama-3.2-1B-Instruct+patched-codes/Llama-3.2-1B-FixVulns+patched-codes/Llama-3.2-1B-FastApply",
  messages=messages,
  temperature=0.2,
  logprobs=True,
  top_logprobs=3,
  extra_body={"active_adapter": "patched-codes/Llama-3.2-1B-FastApply"},
)
```

You can also use the alternate decoding techniques like `cot_decoding` and `entropy_decoding` directly with the local inference server.

```python
response = client.chat.completions.create(
  model="meta-llama/Llama-3.2-1B-Instruct",
  messages=messages,
  temperature=0.2,
  extra_body={
        "decoding": "cot_decoding",  # or "entropy_decoding"
        # CoT specific params
        "k": 10,
        "aggregate_paths": True,
        # OR Entropy specific params
        "top_k": 27,
        "min_p": 0.03,
    }
)
```

### Starting the optillm proxy with an external server (e.g. llama.cpp or ollama)

- Set the `OPENAI_API_KEY` env variable to a placeholder value
  - e.g. `export OPENAI_API_KEY="sk-no-key"`
- Run `./llama-server -c 4096 -m path_to_model` to start the server with the specified model and a context length of 4096 tokens
- Run `python3 optillm.py --base_url base_url` to start the proxy
  - e.g. for llama.cpp, run `python3 optillm.py --base_url http://localhost:8080/v1`
> [!WARNING]
> The Anthropic API, llama.cpp-server, and ollama currently do not support sampling multiple responses from a model, which limits the available approaches to the following:
> `cot_reflection`, `leap`, `plansearch`, `rstar`, `rto`, `self_consistency`, `re2`, and `z3`. For models on HuggingFace, you can use the built-in local inference server as it supports multiple responses.
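As a concrete example for ollama (a sketch; it assumes ollama's OpenAI-compatible endpoint at `http://localhost:11434/v1` and a locally pulled model, and per the warning above only the single-response approaches apply):

```bash
# Start ollama (serves an OpenAI-compatible API on port 11434 by default)
ollama serve &

# The API key is a placeholder that ollama ignores
export OPENAI_API_KEY="sk-no-key"
python3 optillm.py --base_url http://localhost:11434/v1
```

Requests then go through the proxy as usual, e.g. with a `re2-` prefixed model name.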
### MCP Plugin

The Model Context Protocol (MCP) plugin enables OptiLLM to connect with MCP servers, bringing external tools, resources, and prompts into the context of language models. This allows for powerful integrations with filesystem access, database queries, API connections, and more.

OptiLLM supports both **local** and **remote** MCP servers through multiple transport methods:
- **stdio**: Local servers (traditional)
- **SSE**: Remote servers via Server-Sent Events
- **WebSocket**: Remote servers via WebSocket connections

#### What is MCP?

The [Model Context Protocol](https://modelcontextprotocol.io/) (MCP) is an open protocol standard that allows LLMs to securely access tools and data sources through a standardized interface. MCP servers can provide:

- **Tools**: Callable functions that perform actions (like writing files, querying databases, etc.)
- **Resources**: Data sources for providing context (like file contents)
- **Prompts**: Reusable prompt templates for specific use cases

#### Configuration

##### Setting up MCP Config

> **Note on Backwards Compatibility**: Existing MCP configurations will continue to work unchanged. The `transport` field defaults to "stdio" when not specified, maintaining full backwards compatibility with existing setups.

1. Create a configuration file at `~/.optillm/mcp_config.json` with the following structure:

**Local Server (stdio) - Traditional Method:**
```json
{
  "mcpServers": {
    "filesystem": {
      "transport": "stdio",
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/path/to/allowed/directory1",
        "/path/to/allowed/directory2"
      ],
      "env": {},
      "description": "Local filesystem access"
    }
  },
  "log_level": "INFO"
}
```

**Legacy Format (still works):**
```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/directory"],
      "env": {}
    }
  }
}
```

**Remote Server (SSE) - New Feature:**
```json
{
  "mcpServers": {
    "github": {
      "transport": "sse",
      "url": "https://api.githubcopilot.com/mcp",
      "headers": {
        "Authorization": "Bearer ${GITHUB_TOKEN}",
        "Accept": "text/event-stream"
      },
      "timeout": 30.0,
      "sse_read_timeout": 300.0,
      "description": "GitHub MCP server for repository access"
    }
  },
  "log_level": "INFO"
}
```

**Remote Server (WebSocket) - New Feature:**
```json
{
  "mcpServers": {
    "remote-ws": {
      "transport": "websocket",
      "url": "wss://api.example.com/mcp",
      "description": "Remote WebSocket MCP server"
    }
  },
  "log_level": "INFO"
}
```

**Mixed Configuration (Local + Remote):**
```json
{
  "mcpServers": {
    "filesystem": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/docs"],
      "description": "Local filesystem access"
    },
    "github": {
      "transport": "sse",
      "url": "https://api.githubcopilot.com/mcp",
      "headers": {
        "Authorization": "Bearer ${GITHUB_TOKEN}"
      },
      "description": "GitHub MCP server"
    },
    "remote-api": {
      "transport": "websocket",
      "url": "wss://api.company.com/mcp",
      "description": "Company internal MCP server"
    }
  },
  "log_level": "INFO"
}
```

##### Configuration Parameters

**Common Parameters:**
- **Server name**: A unique identifier for the server (e.g., "filesystem", "github")
- **transport**: Transport method - "stdio" (default), "sse", or "websocket"
- **description** (optional): Description of the server's functionality
- **timeout** (optional): Connection timeout in seconds (default: 5.0)

**stdio Transport (Local Servers):**
- **command**: The executable to run the server
- **args**: Command-line arguments for the server
- **env**: Environment variables for the server process

**sse Transport (Server-Sent Events):**
- **url**: The SSE endpoint URL
- **headers** (optional): HTTP headers for authentication
- **sse_read_timeout** (optional): SSE read timeout in seconds (default: 300.0)

**websocket Transport (WebSocket):**
- **url**: The WebSocket endpoint URL

**Environment Variable Expansion:**
Headers and other string values support environment variable expansion using `${VARIABLE_NAME}` syntax.
This is especially useful for API keys:
```json
{
  "headers": {
    "Authorization": "Bearer ${GITHUB_TOKEN}",
    "X-API-Key": "${MY_API_KEY}"
  }
}
```

#### Available MCP Servers

OptiLLM supports both local and remote MCP servers:

##### Local MCP Servers (stdio transport)

You can use any of the [official MCP servers](https://modelcontextprotocol.io/examples) or third-party servers that run as local processes:

- **Filesystem**: `@modelcontextprotocol/server-filesystem` - File operations
- **Git**: `mcp-server-git` - Git repository operations
- **SQLite**: `@modelcontextprotocol/server-sqlite` - SQLite database access
- **Brave Search**: `@modelcontextprotocol/server-brave-search` - Web search capabilities

##### Remote MCP Servers (SSE/WebSocket transport)

Remote servers provide centralized access without requiring local installation:

- **GitHub MCP Server**: `https://api.githubcopilot.com/mcp` - Repository management, issue tracking, and code analysis
- **Third-party servers**: Any MCP server that supports SSE or WebSocket protocols

##### Example: Comprehensive Configuration

```json
{
  "mcpServers": {
    "filesystem": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/documents"],
      "description": "Local file system access"
    },
    "search": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": {
        "BRAVE_API_KEY": "your-api-key-here"
      },
      "description": "Web search capabilities"
    },
    "github": {
      "transport": "sse",
      "url": "https://api.githubcopilot.com/mcp",
      "headers": {
        "Authorization": "Bearer ${GITHUB_TOKEN}",
        "Accept": "text/event-stream"
      },
      "description": "GitHub repository and issue management"
    }
  },
  "log_level": "INFO"
}
```

#### Using the MCP Plugin

Once configured, the MCP plugin will automatically:

1. Connect to all configured MCP servers
2. Discover available tools, resources, and prompts
3. Make these capabilities available to the language model
4. Handle tool calls and resource requests

The plugin enhances the system prompt with MCP capabilities so the model knows which tools are available. When the model decides to use a tool, the plugin:

1. Executes the tool with the provided arguments
2. Returns the results to the model
3. Allows the model to incorporate the results into its response
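How the plugin is engaged in a request is an assumption here; the sketch below follows the plugin-slug convention used elsewhere in this README via `optillm_approach`, with the filesystem server from the configuration above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1")

# Assumes the 'mcp' plugin slug from the plugin table above.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
        "List all the Python files in my documents directory."}],
    extra_body={"optillm_approach": "mcp"},
)
print(response.choices[0].message.content)
```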
#### Example Queries

Here are some examples of queries that will engage MCP tools:

**Local Server Examples:**
- "List all the Python files in my documents directory" (Filesystem)
- "What are the recent commits in my Git repository?" (Git)
- "Search for the latest information about renewable energy" (Search)
- "Query my database for all users who registered this month" (Database)

**Remote Server Examples:**
- "Show me the open issues in my GitHub repository" (GitHub MCP)
- "Create a new branch for the feature I'm working on" (GitHub MCP)
- "What are the most recent pull requests that need review?" (GitHub MCP)
- "Get the file contents from my remote repository" (GitHub MCP)

#### Troubleshooting

##### Logs

The MCP plugin logs detailed information to:
```
~/.optillm/logs/mcp_plugin.log
```

Check this log file for connection issues, tool execution errors, and other diagnostic information.

##### Common Issues

**Local Server Issues (stdio transport):**

1. **Command not found**: Make sure the server executable is available in your PATH, or use an absolute path in the configuration.

2. **Access denied**: For filesystem operations, ensure the paths specified in the configuration are accessible to the process.

**Remote Server Issues (SSE/WebSocket transport):**

3. **Connection timeout**: Remote servers may take longer to connect. Increase the `timeout` value in your configuration.

4. **Authentication failed**: Verify your API keys and tokens are correct. For the GitHub MCP server, ensure your `GITHUB_TOKEN` environment variable is set with appropriate permissions.

5. **Network errors**: Check your internet connection and verify the server URL is accessible.

6. **Environment variable not found**: If using `${VARIABLE_NAME}` syntax, ensure the environment variables are set before starting OptiLLM.

**General Issues:**

7. **Method not found**: Some servers don't implement all MCP capabilities (tools, resources, prompts). Verify which capabilities the server supports.

8. **Transport not supported**: Ensure you're using a supported transport: "stdio", "sse", or "websocket".

**Example: Testing GitHub MCP Connection**

To test if your GitHub MCP server configuration is working:

1. Set your GitHub token: `export GITHUB_TOKEN="your-github-token"`
2. Start OptiLLM and check the logs at `~/.optillm/logs/mcp_plugin.log`
3. Look for connection success messages and discovered capabilities
## Available parameters

optillm supports various command-line arguments for configuration. When using Docker, these can also be set as environment variables prefixed with `OPTILLM_`.

| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| `--approach` | Inference approach to use | `"auto"` |
| `--simulations` | Number of MCTS simulations | 2 |
| `--exploration` | Exploration weight for MCTS | 0.2 |
| `--depth` | Simulation depth for MCTS | 1 |
| `--best-of-n` | Number of samples for best_of_n approach | 3 |
| `--model` | OpenAI model to use | `"gpt-4o-mini"` |
| `--base-url` | Base URL for OpenAI compatible endpoint | `""` |
| `--rstar-max-depth` | Maximum depth for rStar algorithm | 3 |
| `--rstar-num-rollouts` | Number of rollouts for rStar algorithm | 5 |
| `--rstar-c` | Exploration constant for rStar algorithm | 1.4 |
| `--n` | Number of final responses to be returned | 1 |
| `--return-full-response` | Return the full response including the CoT with `<thinking>` tags | `False` |
| `--port` | Specify the port to run the proxy | 8000 |
| `--optillm-api-key` | Optional API key for client authentication to optillm | `""` |
| `--cepo_*` | See CePO Parameters section below for detailed config options | Various |
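Putting a few of these together, an illustrative invocation (the flag values here are arbitrary examples, not recommended settings):

```bash
optillm \
  --approach mcts \
  --simulations 4 \
  --exploration 0.3 \
  --depth 2 \
  --model gpt-4o-mini \
  --port 8000 \
  --optillm-api-key "your_secret_api_key"
```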
<details>
<summary><strong>CePO Parameters</strong></summary>

| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| `--cepo_bestofn_n` | Number of responses to be generated in best of n stage | 3 |
| `--cepo_bestofn_temperature` | Temperature for verifier in best of n stage | 0.1 |
| `--cepo_bestofn_max_tokens` | Maximum number of tokens for verifier in best of n stage | 4096 |
| `--cepo_bestofn_rating_type` | Type of rating in best of n stage ("absolute" or "pairwise") | `"absolute"` |
| `--cepo_planning_n` | Number of plans generated in planning stage | 3 |
| `--cepo_planning_m` | Number of attempts to generate n plans in planning stage | 6 |
| `--cepo_planning_temperature_step1` | Temperature for generator in step 1 of planning stage | 0.55 |
| `--cepo_planning_temperature_step2` | Temperature for generator in step 2 of planning stage | 0.25 |
| `--cepo_planning_temperature_direct_resp` | Temperature for generator after step 2 if planning fails and the model answers directly | 0.1 |
| `--cepo_planning_temperature_step3` | Temperature for generator in step 3 of planning stage | 0.1 |
| `--cepo_planning_temperature_step4` | Temperature for generator in step 4 of planning stage | 0 |
| `--cepo_planning_max_tokens_step1` | Maximum number of tokens in step 1 of planning stage | 4096 |
| `--cepo_planning_max_tokens_step2` | Maximum number of tokens in step 2 of planning stage | 4096 |
| `--cepo_planning_max_tokens_direct_resp` | Maximum number of tokens after step 2 if planning fails and the model answers directly | 4096 |
| `--cepo_planning_max_tokens_step3` | Maximum number of tokens in step 3 of planning stage | 4096 |
| `--cepo_planning_max_tokens_step4` | Maximum number of tokens in step 4 of planning stage | 4096 |
| `--cepo_use_reasoning_fallback` | Whether to fall back to lower levels of reasoning when a higher level fails | `False` |
| `--cepo_num_of_retries` | Number of retries if the LLM call fails, 0 for no retries | 0 |
| `--cepo_print_output` | Whether to print the output of each stage | `False` |
| `--cepo_config_file` | Path to CePO configuration file | `None` |
| `--cepo_use_plan_diversity` | Use additional plan diversity step | `False` |
| `--cepo_rating_model` | Specify a model for the rating step if different from the completion model | `None` |

</details>

## Running with Docker

optillm can optionally be built and run using Docker and the provided [Dockerfile](https://github.com/algorithmicsuperintelligence/optillm/blob/main/Dockerfile).

### Using Docker Compose

1. Make sure you have Docker and Docker Compose installed on your system.

2. Either update the environment variables in the docker-compose.yaml file or create a `.env` file in the project root directory and add any environment variables you want to set. For example, to set the OpenAI API key, add the following line to the `.env` file:

   ```bash
   OPENAI_API_KEY=your_openai_api_key_here
   ```

3. Run the following command to start optillm:

   ```bash
   docker compose up -d
   ```

   This will build the Docker image if it doesn't exist and start the optillm service.

4. optillm will be available at `http://localhost:8000`.

When using Docker, you can set these parameters as environment variables. For example, to set the approach and model, you would use:

```bash
OPTILLM_APPROACH=mcts
OPTILLM_MODEL=gpt-4
```

To secure the optillm proxy with an API key, set the `OPTILLM_API_KEY` environment variable:

```bash
OPTILLM_API_KEY=your_secret_api_key
```

When the API key is set, clients must include it in their requests using the `Authorization` header:

```plain
Authorization: Bearer your_secret_api_key
```
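A quick way to check that the key is enforced is a raw request against the chat completions endpoint (a sketch using curl):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_secret_api_key" \
  -d '{
    "model": "moa-gpt-4o-mini",
    "messages": [{"role": "user", "content": "Solve: If 2x + 3 = 7, what is x?"}]
  }'
```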
## SOTA results on benchmarks with optillm

### MARS on AIME 2025, IMO 2025, and LiveCodeBench (Oct 2025)

| Benchmark | Approach | Problems | Correct | Accuracy | Improvement |
|-----------|----------|----------|---------|----------|-------------|
| **AIME 2025** | Baseline | 30 | 13 | 43.3% | - |
| **AIME 2025** | **MARS** | 30 | **22** | **73.3%** | **+30.0pp (+69.2%)** |
| **IMO 2025** | Baseline | 6 | 1 | 16.7% | - |
| **IMO 2025** | **MARS** | 6 | **2** | **33.3%** | **+16.7pp (+100%)** |
| **LiveCodeBench v5/v6** | Baseline | 105 | 41 | 39.05% | - |
| **LiveCodeBench v5/v6** | **MARS** | 105 | **53** | **50.48%** | **+11.43pp (+29.3%)** |

Model: google/gemini-2.5-flash-lite-preview-09-2025 via OpenRouter  
Configuration: 3 agents, 2-pass verification, thinking tags disabled for proofs

### AutoThink on GPQA-Diamond & MMLU-Pro (May 2025)

| Model | GPQA-Diamond Accuracy (%) | GPQA-Diamond Avg. Tokens | MMLU-Pro Accuracy (%) | MMLU-Pro Avg. Tokens |
|-------|---------------------------|--------------------------|-----------------------|----------------------|
| DeepSeek-R1-Distill-Qwen-1.5B | 21.72 | 7868.26 | 25.58 | 2842.75 |
| with Fixed Budget | 28.47 | 3570.00 | 26.18 | 1815.67 |
| **with AutoThink** | **31.06** | **3520.52** | **26.38** | **1792.50** |

### LongCePO on LongBench v2 (Apr 2025)

| Model¹ | Context window | Short samples (up to 32K words) | Medium samples (32–128K words) |
|--------|----------------|---------------------------------|--------------------------------|
| Llama 3.3 70B Instruct | 128K | 36.7 (45.0) | 27.0 (33.0) |
| **LongCePO + Llama 3.3 70B Instruct** | **8K** | **36.8 ± 1.38** | **38.7 ± 2.574 (39.735)²** |
| Mistral-Large-Instruct-2411 | 128K | 41.7 (46.1) | 30.7 (34.9) |
| o1-mini-2024-09-12 | 128K | 48.6 (48.9) | 33.3 (32.9) |
| Claude-3.5-Sonnet-20241022 | 200K | 46.1 (53.9) | 38.6 (41.9) |
| Llama-4-Maverick-17B-128E-Instruct | 524K | 32.22 (50.56) | 28.84 (41.86) |

¹ Performance numbers reported by LongBench v2 authors, except for LongCePO and Llama-4-Maverick results.

² Numbers in parentheses for LongCePO indicate accuracy of majority voting from 5 runs.

### LongCePO on HELMET - InfiniteBench En.MC, 128K length (Apr 2025)

| Model | Accuracy (%) |
|-------|--------------|
| Llama 3.3 70B Instruct (full context) | 58.0 |
| **LongCePO + Llama 3.3 70B Instruct (8K context)** | **71.6 ± 1.855 (73.0)¹** |
| o1-mini-2024-09-12 (full context) | 58.0 |
| gpt-4o-2024-08-06 (full context) | 74.0 |

¹ Numbers in parentheses for LongCePO indicate accuracy of majority voting from 5 runs.

### CePO on math and code benchmarks (Sep 2025)

| Method | AIME 2024 | AIME 2025 | GPQA | LiveCodeBench |
| ----------------------: | :-------: | :-------: | :----: | :-----------: |
| Qwen3 8B | 74.0 | 68.3 | 59.3 | 55.7 |
| CePO (using Qwen3 8B) | 86.7 | 80.0 | 62.5 | 60.5 |
| Qwen3 32B | 81.4 | 72.9 | 66.8 | 65.7 |
| CePO (using Qwen3 32B) | **90.7** | **83.3** | 70.0 | **71.9** |
| Qwen3 235B | 85.7 | 81.5 | 71.1 | 70.7 |
| DeepSeek R1 | 79.8 | 70.0 | 71.5 | 64.3 |
| OpenAI o3-mini | 79.6 | 74.8 | 76.8 | 66.3 |
| Grok3 Think | 83.9 | 77.3 | **80.2** | 70.6 |

### CePO on math and code benchmarks (Mar 2025)

| Method | Math-L5 | MMLU-Pro (Math) | CRUX | LiveCodeBench (pass@1) | Simple QA |
| -----------------------------: | :-----: | :-------------: | :----: | :--------------------: | :-------: |
| Llama 3.3 70B | 51.0 | 78.6 | 72.6 | 27.1 | 20.9 |
| Llama 3.1 405B | 49.8 | 79.2 | 73.0 | 31.8 | 13.5 |
| CePO (using Llama 3.3 70B) | 69.6 | 84.8 | 80.1 | 31.9 | **22.6** |
| QwQ 32B | 61.4 | 90.8 | 82.5 | 44.3 | 7.8 |
| CePO (using QwQ 32B) | 88.1 | **92.0** | 86.3 | **51.5** | 8.2 |
| DeepSeek R1 Llama | 83.1 | 82.0 | 84.0 | 47.3 | 14.6 |
| CePO (using DeepSeek R1 Llama) | **90.2** | 84.0 | **89.4** | 47.2 | 15.5 |
### coc-claude-3-5-sonnet-20241022 on AIME 2024 pass@1 (Nov 2024)

| Model | Score |
|-------|------:|
| o1-mini | 56.67 |
| coc-claude-3-5-sonnet-20241022 | 46.67 |
| coc-gemini/gemini-exp-1121 | 46.67 |
| o1-preview | 40.00 |
| gemini-exp-1114 | 36.67 |
| claude-3-5-sonnet-20241022 | 20.00 |
| gemini-1.5-pro-002 | 20.00 |
| gemini-1.5-flash-002 | 16.67 |

### readurls&memory-gpt-4o-mini on Google FRAMES Benchmark (Oct 2024)

| Model | Accuracy |
| ----- | -------- |
| readurls&memory-gpt-4o-mini | 61.29 |
| gpt-4o-mini | 50.61 |
| readurls&memory-Gemma2-9b | 30.1 |
| Gemma2-9b | 5.1 |
| Gemma2-27b | 30.8 |
| Gemini Flash 1.5 | 66.5 |
| Gemini Pro 1.5 | 72.9 |

### plansearch-gpt-4o-mini on LiveCodeBench (Sep 2024)

| Model | pass@1 | pass@5 | pass@10 |
| ---------------------- | ------ | ------ | ------- |
| plansearch-gpt-4o-mini | 44.03 | 59.31 | 63.5 |
| gpt-4o-mini | 43.9 | 50.61 | 53.25 |
| claude-3.5-sonnet | 51.3 | | |
| gpt-4o-2024-05-13 | 45.2 | | |
| gpt-4-turbo-2024-04-09 | 44.2 | | |

### moa-gpt-4o-mini on Arena-Hard-Auto (Aug 2024)

![Results showing Mixture of Agents approach using gpt-4o-mini on Arena Hard Auto Benchmark](https://oss.gittoolsai.com/images/algorithmicsuperintelligence_optillm_readme_9dbb33886eb2.png)

### optillm with Patchwork (July 2024)

Since optillm is a drop-in replacement for the OpenAI API, you can easily integrate it with existing tools and frameworks using the OpenAI client. We used optillm with [patchwork](https://github.com/patched-codes/patchwork), an open-source framework that automates development gruntwork like PR reviews, bug fixing, and security patching using workflows called patchflows. We saw huge performance gains across all the supported patchflows when using the mixture-of-agents approach (moa), as shown below.

![Results showing optillm mixture of agents approach used with patchflows](https://oss.gittoolsai.com/images/algorithmicsuperintelligence_optillm_readme_cda8a1644c8c.png)

## Testing

OptiLLM includes a comprehensive test suite to ensure reliability and compatibility.

### Running Tests

The main test suite can be run from the project root:
```bash
# Test all approaches with default test cases
python tests/test.py

# Test specific approaches
python tests/test.py --approaches moa bon mcts

# Run a single test
python tests/test.py --single-test "Simple Math Problem"
```

### Unit and Integration Tests

Additional tests are available in the `tests/` directory:
```bash
# Run all tests (requires pytest)
./tests/run_tests.sh

# Run specific test modules
pytest tests/test_plugins.py -v
pytest tests/test_api_compatibility.py -v
```

### CI/CD

All tests are automatically run on pull requests via GitHub Actions.
The workflow tests:
- Multiple Python versions (3.10, 3.11, 3.12)
- Unit tests for plugins and core functionality
- API compatibility tests
- Integration tests with various approaches

See `tests/README.md` for more details on the test structure and how to write new tests.

## 🤝 Contributing

We ❤️ contributions! OptiLLM is built by the community, for the community.

- 🐛 **Found a bug?** [Open an issue](https://github.com/algorithmicsuperintelligence/optillm/issues/new)
- 💡 **Have an idea?** [Start a discussion](https://github.com/algorithmicsuperintelligence/optillm/discussions)
- 🔧 **Want to code?** Check out [good first issues](https://github.com/algorithmicsuperintelligence/optillm/labels/good%20first%20issue)

### Development Setup
```bash
git clone https://github.com/algorithmicsuperintelligence/optillm.git
cd optillm
python -m venv .venv
source .venv/bin/activate  # or `.venv\Scripts\activate` on Windows
pip install -r requirements.txt
pip install -r tests/requirements.txt

# Run tests
python -m pytest tests/
```

## References
- [Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques](https://arxiv.org/abs/2506.08060)
- [AutoThink: efficient inference for reasoning LLMs](https://dx.doi.org/10.2139/ssrn.5253327) - [Implementation](optillm/autothink)
- [Deep Think with Confidence: Confidence-guided reasoning and inference-time scaling](https://arxiv.org/abs/2508.15260) - [Implementation](optillm/deepconf)
- [Self-Discover: Large Language Models Self-Compose Reasoning Structures](https://arxiv.org/abs/2402.03620) - [Implementation](optillm/plugins/deepthink)
- [CePO: Empowering Llama with Reasoning using Test-Time Compute](https://cerebras.ai/blog/cepo) - [Implementation](optillm/cepo)
- [LongCePO: Empowering LLMs to efficiently leverage infinite context](https://cerebras.ai/blog/longcepo) - [Implementation](optillm/plugins/longcepo)
- [Chain of Code: Reasoning with a Language Model-Augmented Code Emulator](https://arxiv.org/abs/2312.04474) - [Inspired the implementation of the coc plugin](optillm/plugins/coc_plugin.py)
- [Entropy Based Sampling and Parallel CoT Decoding](https://github.com/xjdr-alt/entropix) - [Implementation](optillm/entropy_decoding.py)
- [Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation](https://arxiv.org/abs/2409.12941) - [Evaluation script](scripts/eval_frames_benchmark.py)
- [Writing in the Margins: Better Inference Pattern for Long Context Retrieval](https://www.arxiv.org/abs/2408.14906) - [Inspired the implementation of the memory plugin](optillm/plugins/memory_plugin.py)
- [Chain-of-Thought Reasoning Without Prompting](https://arxiv.org/abs/2402.10200) - [Implementation](optillm/cot_decoding.py)
- [Re-Reading Improves Reasoning in Large Language Models](https://arxiv.org/abs/2309.06275) - [Implementation](optillm/reread.py)
- [In-Context Principle Learning from Mistakes](https://arxiv.org/abs/2402.05403) - [Implementation](optillm/leap.py)
- [Planning In Natural Language Improves LLM Search For Code Generation](https://arxiv.org/abs/2409.03733) - [Implementation](optillm/plansearch.py)
Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.03733) - [Implementation](optillm\u002Fplansearch.py)\n- [Self-Consistency Improves Chain of Thought Reasoning in Language Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.11171) - [Implementation](optillm\u002Fself_consistency.py)\n- [Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.06195) - [Implementation](optillm\u002Frstar.py)\n- [Mixture-of-Agents Enhances Large Language Model Capabilities](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.04692) - [Inspired the implementation of moa](optillm\u002Fmoa.py)\n- [Prover-Verifier Games improve legibility of LLM outputs](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.13692) - [Implementation](optillm\u002Fpvg.py)\n- [Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.00451) - [Inspired the implementation of mcts](optillm\u002Fmcts.py)\n- [Unsupervised Evaluation of Code LLMs with Round-Trip Correctness](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.08699) - [Inspired the implementation of rto](optillm\u002Frto.py)\n- [Patched MOA: optimizing inference for diverse software development tasks](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.18521) - [Implementation](optillm\u002Fmoa.py)\n- [Patched RTC: evaluating LLMs for diverse software development tasks](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.16557) - [Implementation](optillm\u002Frto.py)\n- [AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.16891) - [Implementation](optillm\u002Fplugins\u002Fgenselect_plugin.py)\n- [Test-Time Diffusion Deep Researcher (TTD-DR): Think More, Research More, Answer Better!](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.16075v1) - [Implementation](optillm\u002Fplugins\u002Fdeep_research)\n\n## Citation\n\nIf you use this library in your research, please cite:\n\n```bibtex\n@software{optillm,\n  title = {OptiLLM: Optimizing inference proxy for LLMs},\n  author = {Asankhaya Sharma},\n  year = {2024},\n  publisher = {GitHub},\n  url = {https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm}\n}\n```\n\n---\n\n\u003Cp align=\"center\">\n  \u003Cstrong>Ready to optimize your LLMs? Install OptiLLM and see the difference! 
🚀\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  ⭐ \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\">Star us on GitHub\u003C\u002Fa> if you find OptiLLM useful!\n\u003C\u002Fp>\n","# OptiLLM\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Falgorithmicsuperintelligence_optillm_readme_ff913a070650.png\" alt=\"OptiLLM Logo\" width=\"400\" \u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>🚀 在推理任务上实现零训练下的2-10倍准确率提升\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fstargazers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Falgorithmicsuperintelligence\u002Foptillm?style=social\" alt=\"GitHub 星标\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Foptillm\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Foptillm\" alt=\"PyPI 版本\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Foptillm\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Foptillm\" alt=\"PyPI 下载量\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Falgorithmicsuperintelligence\u002Foptillm\" alt=\"许可证\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fcodelion\u002Foptillm\">🤗 HuggingFace Space\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1SpuUb8d9xAoTh32M-9wJsB50AOH54EaH?usp=sharing\">📓 Colab 演示\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fdiscussions\">💬 讨论区\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n**OptiLLM** 是一个兼容 OpenAI API 的优化推理代理，实现了 20 多种最先进的技术，能够在无需任何模型训练或微调的情况下，显著提升 LLM 在推理任务上的准确性和性能。\n\n通过在推理阶段增加计算资源，这些技术可以在各种任务中超越前沿模型。Cerebras 提出的 [CePO 方法](optillm\u002Fcepo) 就是一个将这些技术有效结合的良好范例。\n\n## ✨ 核心特性\n\n- **🎯 即时提升**: 在数学、编码和逻辑推理方面实现 2-10 倍的准确率提升\n- **🔌 即插即用**: 可与任何兼容 OpenAI API 的端点配合使用\n- **🧠 20+ 优化技术**: 从简单的最佳 N 抽样到高级的 MCTS 和规划方法\n- **📦 无需训练**: 只需将现有的 API 调用通过 OptiLLM 进行代理即可\n- **⚡ 生产就绪**: 已被全球多家公司和研究机构用于生产环境\n- **🌍 多提供商支持**: 支持 OpenAI、Anthropic、Google、Cerebras 等，并可通过 LiteLLM 使用 100 多种模型\n\n## 🚀 快速入门\n\n只需三个简单步骤，即可获得强大的推理能力提升：\n\n```bash\n# 1. 安装 OptiLLM\npip install optillm\n\n# 2. 启动服务器\nexport OPENAI_API_KEY=\"your-key-here\"\noptillm\n\n# 3. 
与任何 OpenAI 客户端一起使用 - 只需更改模型名称！\n```\n\n```python\nfrom openai import OpenAI\n\nclient = OpenAI(base_url=\"http:\u002F\u002Flocalhost:8000\u002Fv1\")\n\n# 添加 'moa-' 前缀以启用混合代理优化\nresponse = client.chat.completions.create(\n    model=\"moa-gpt-4o-mini\",  # 这将使 GPT-4o-mini 达到 GPT-4o 的性能！\n    messages=[{\"role\": \"user\", \"content\": \"解方程：如果 2x + 3 = 7，那么 x 是多少？\"}]\n)\n```\n\n**OptiLLM 之前**: “x = 1” ❌  \n**OptiLLM 之后**: “让我一步步来：2x + 3 = 7，所以 2x = 4，因此 x = 2” ✅\n\n## 📊 经验证的效果\n\nOptiLLM 在多种基准测试中均表现出可量化的提升：\n\n| 技术 | 基础模型 | 提升幅度 | 基准测试 |\n|-----------|------------|-------------|-----------|\n| **MARS** | Gemini 2.5 Flash Lite | **+30.0 分** | AIME 2025 (43.3→73.3) |\n| **CePO** | Llama 3.3 70B | **+18.6 分** | Math-L5 (51.0→69.6) |\n| **AutoThink** | DeepSeek-R1-1.5B | **+9.34 分** | GPQA-Diamond (21.72→31.06) |\n| **LongCePO** | Llama 3.3 70B | **+13.6 分** | InfiniteBench (58.0→71.6) |\n| **MOA** | GPT-4o-mini | **媲美 GPT-4** | Arena-Hard-Auto |\n| **PlanSearch** | GPT-4o-mini | **Pass@5 提升 20%** | LiveCodeBench |\n\n*完整的基准测试结果 [见下文](#sota-results-on-benchmarks-with-optillm)* ⬇️\n\n## 🏗️ 安装\n\n### 使用 pip\n\n```bash\npip install optillm\noptillm\n2024-10-22 07:45:05,612 - INFO - 加载了隐私插件\n2024-10-22 07:45:06,293 - INFO - 加载了记忆插件\n2024-10-22 07:45:06,293 - INFO - 使用自动方法启动服务器\n```\n\n### 使用 Docker\n\n```bash\ndocker pull ghcr.io\u002Falgorithmicsuperintelligence\u002Foptillm:latest\ndocker run -p 8000:8000 ghcr.io\u002Falgorithmicsuperintelligence\u002Foptillm:latest\n2024-10-22 07:45:05,612 - INFO - 加载了隐私插件\n2024-10-22 07:45:06,293 - INFO - 加载了记忆插件\n2024-10-22 07:45:06,293 - INFO - 使用自动方法启动服务器\n```\n\n**可用的 Docker 镜像变体:**\n\n- **完整镜像** (`latest`): 包含本地推理和插件的所有依赖项\n- **仅代理镜像** (`latest-proxy`): 不具备本地推理功能的轻量级镜像\n- **离线镜像** (`latest-offline`): 自包含镜像，预下载了 spaCy 等模型，可用于完全离线运行\n\n```bash\n# 仅代理（最小）\ndocker pull ghcr.io\u002Falgorithmicsuperintelligence\u002Foptillm:latest-proxy\n\n# 离线（最大，包含预下载的模型）\ndocker pull ghcr.io\u002Falgorithmicsuperintelligence\u002Foptillm:latest-offline\n```\n\n### 从源码安装\n\n使用 `git` 克隆仓库，并通过 `pip install` 安装依赖项。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm.git\ncd optillm\npython3 -m venv .venv\nsource .venv\u002Fbin\u002Factivate\npip install -r requirements.txt\n```\n\n## 🔒 SSL 配置\n\nOptILLM 支持 SSL 证书验证配置，以便在使用自签名证书或企业代理时正常工作。\n\n**禁用 SSL 验证（仅限开发）:**\n```bash\n# 命令行\noptillm --no-ssl-verify\n\n# 环境变量\nexport OPTILLM_SSL_VERIFY=false\noptillm\n```\n\n**使用自定义 CA 证书:**\n```bash\n# 命令行\noptillm --ssl-cert-path \u002Fpath\u002Fto\u002Fca-bundle.crt\n\n# 环境变量\nexport OPTILLM_SSL_CERT_PATH=\u002Fpath\u002Fto\u002Fca-bundle.crt\noptillm\n```\n\n⚠️ **安全提示**: 禁用 SSL 验证是不安全的，仅应在开发环境中使用。对于使用自定义 CA 的生产环境，请改用 `--ssl-cert-path`。详细信息请参阅 [SSL_CONFIGURATION.md](SSL_CONFIGURATION.md)。\n\n## 已实现的技术\n\n| 方法                             | Slug               | 描述                                                                                    |\n| ------------------------------------ | ------------------ | ---------------------------------------------------------------------------------------------- |\n| [MARS（多智能体推理系统）](optillm\u002Fmars) | `mars`             | 多智能体推理，结合多样化的温度探索、交叉验证和迭代改进 |\n| [Cerebras 规划与优化](optillm\u002Fcepo)   | `cepo`             | 结合最佳N个结果、思维链、自我反思、自我改进以及多种提示技术 |\n| 带反思的思维链                  | `cot_reflection`   | 实现带有\\\u003Cthinking\\>、\\\u003Creflection\\>和\\\u003Coutput\\>部分的思维链推理 |\n| 计划搜索                           | `plansearch`       | 在自然语言中对候选计划实施搜索算法以解决问题 |\n| 重读                               | `re2`            
  | 通过两次处理查询来改进推理，实现重读功能                          |\n| 自一致性                     | `self_consistency` | 实现一种先进的自一致性方法                                                 |\n| Z3 定理证明器                            | `z3`               | 利用Z3定理证明器进行逻辑推理                                           |\n| R* 算法                         | `rstar`            | 实现R*算法用于问题求解                                                |\n| LEAP                                 | `leap`             | 从少量示例中学习特定任务的原则                                         |\n| 往返优化              | `rto`              | 通过往返过程优化响应                                               |\n| 最佳N采样                   | `bon`              | 生成多个响应并选择最佳的一个                                          |\n| 智能体混合                    | `moa`              | 综合多个批评意见的响应                                              |\n| 蒙特卡洛树搜索              | `mcts`             | 在聊天回复中使用蒙特卡洛树搜索进行决策                              |\n| PV 游戏                              | `pvg`              | 在推理时应用证明者-验证者博弈的方法                                      |\n| [Deep Confidence](optillm\u002Fdeepconf) | 代理不适用 | 实现基于置信度的推理，采用多级强度以提升准确性 |\n| 思维链解码                         | 代理不适用     | 实现思维链解码，以在无需显式提示的情况下引导推理            |\n| 熵解码                     | 代理不适用     | 在生成过程中根据标记的不确定性实施自适应采样              |\n| Thinkdeeper                          | 代理不适用     | 实现来自OpenAI的`reasoning_effort`参数，适用于DeepSeek R1等推理模型      |\n| [AutoThink](optillm\u002Fautothink)       | 代理不适用     | 将查询复杂度分类与引导向量相结合，以增强推理能力            |\n\n## 已实现插件\n\n| 插件                  | Slug               | 描述                                                                                    |\n| ----------------------- | ------------------ | ---------------------------------------------------------------------------------------------- |\n| [系统提示学习](optillm\u002Fplugins\u002Fspl)  | `spl`              | 实现了[Andrej Karpathy 所称的第三范式](https:\u002F\u002Fx.com\u002Fkarpathy\u002Fstatus\u002F1921368644069765486) 的 LLM 学习方法，使模型能够获取解题知识和策略 |\n| [深度思考](optillm\u002Fplugins\u002Fdeepthink)              | `deepthink`        | 使用推理时缩放技术，为推理型 LLM 实现类似 Gemini 的深度思考方法 |\n| [长上下文 Cerebras 规划与优化](optillm\u002Fplugins\u002Flongcepo)              | `longcepo`              | 结合规划和分治处理长文档，以支持无限上下文长度 |\n| 多数投票         | `majority_voting`  | 生成 k 个候选解决方案，并通过多数投票选出最频繁的答案（默认 k=6） |\n| MCP 客户端              | `mcp`              | 实现模型上下文协议 (MCP) 客户端，使您能够将任何 LLM 与任何 MCP 服务器配合使用  |\n| 路由器                  | `router`           | 使用 [optillm-modernbert-large](https:\u002F\u002Fhuggingface.co\u002Fcodelion\u002Foptillm-modernbert-large) 模型，根据用户提示将请求路由到不同的方法 |\n| 代码链           | `coc`              | 实现一种结合思维链与代码执行及 LLM 基于代码模拟的代码链方法 |\n| 内存                  | `memory`           | 实现短期记忆层，使您能够与任何 LLM 配合使用无限制的上下文长度 |\n| 隐私                 | `privacy`          | 对请求中的 PII 数据进行匿名化处理，并在响应中将其还原为原始值            |\n| 读取 URL               | `readurls`         | 读取请求中找到的所有 URL，获取 URL 上的内容并将其添加到上下文中 |\n| 执行代码            | `executecode`      | 允许在请求和 LLM 生成的响应中使用代码解释器执行 Python 代码 |\n| JSON                    | `json`             | 使用 outlines 库实现结构化输出，支持 Pydantic 类型和 JSON Schema |\n| 生成选择               | `genselect`        | 生成式解决方案选择——生成多个候选方案，并根据质量标准选出最佳方案 |\n| 网络搜索              | `web_search`       | 使用 Chrome 自动化工具（Selenium）进行 Google 搜索，收集搜索结果和 URL |\n| [深度研究](optillm\u002Fplugins\u002Fdeep_research)           | `deep_research`    | 实现测试时扩散深度研究员（TTD-DR），通过迭代精炼生成全面的研究报告 |\n| [代理](optillm\u002Fplugins\u002Fproxy)      | `proxy`            | 在多个 LLM 提供商之间实现负载均衡和故障转移，具备健康监测和轮询路由功能 
|\n\n我们支持所有主要的 LLM 提供商及其模型进行推理。您只需设置正确的环境变量，代理就会自动选择相应的客户端。\n\n| 提供商 | 必需环境变量 | 附加说明 |\n|----------|-------------------------------|------------------|\n| OptiLLM | `OPTILLM_API_KEY` | 使用内置本地服务器进行推理，支持 logprobs 以及如 `cot_decoding` 和 `entropy_decoding` 等解码技术 |\n| OpenAI | `OPENAI_API_KEY` | 您可以通过设置 `base_url` 将其与任何兼容 OpenAI 的端点（例如 OpenRouter）一起使用 |\n| Cerebras | `CEREBRAS_API_KEY` | 您可以使用它来对受支持的模型进行快速推理，请参阅[文档了解详情](https:\u002F\u002Finference-docs.cerebras.ai\u002Fintroduction) |\n| Azure OpenAI | `AZURE_OPENAI_API_KEY`\u003Cbr>`AZURE_API_VERSION`\u003Cbr>`AZURE_API_BASE` | - |\n| Azure OpenAI（托管身份） | `AZURE_API_VERSION`\u003Cbr>`AZURE_API_BASE` | 需要使用 `az login` 登录，请参阅[文档了解详情](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fai-services\u002Fopenai\u002Fhow-to\u002Fmanaged-identity) |\n| LiteLLM | 取决于模型 | 请参阅[文档了解详情](https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders) |\n\n随后，您可以按如下方式运行 optillm 代理。\n\n```bash\npython optillm.py\n2024-09-06 07:57:14,191 - INFO - 启动服务器，采用自动模式\n2024-09-06 07:57:14,191 - INFO - 服务器配置：{'approach': 'auto', 'mcts_simulations': 2, 'mcts_exploration': 0.2, 'mcts_depth': 1, 'best_of_n': 3, 'model': 'gpt-4o-mini', 'rstar_max_depth': 3, 'rstar_num_rollouts': 5, 'rstar_c': 1.4, 'base_url': '', 'host': '127.0.0.1'}\n * 正在提供 Flask 应用程序 'optillm'\n * 调试模式：关闭\n2024-09-06 07:57:14,212 - INFO - 警告：这是一个开发服务器。请勿在生产环境中使用。请改用生产级 WSGI 服务器。\n * 运行于 http:\u002F\u002F127.0.0.1:8000\n2024-09-06 07:57:14,212 - INFO - 按 CTRL+C 退出\n```\n\n> **安全提示**：默认情况下，optillm 绑定到 `127.0.0.1`（仅限本地），以确保安全性。若需允许外部连接（例如用于 Docker 或远程访问），请使用 `--host 0.0.0.0`。但仅应在受信任的网络上或已通过 `--optillm-api-key` 配置适当的身份验证后才这样做。\n\n## 使用方法\n\n代理服务启动后，您可以通过将 `base_url` 设置为 `http:\u002F\u002Flocalhost:8000\u002Fv1`，将其用作 OpenAI 客户端的直接替代品。\n\n```python\nimport os\nfrom openai import OpenAI\n\nOPENAI_KEY = os.environ.get(\"OPENAI_API_KEY\")\nOPENAI_BASE_URL = \"http:\u002F\u002Flocalhost:8000\u002Fv1\"\nclient = OpenAI(api_key=OPENAI_KEY, base_url=OPENAI_BASE_URL)\n\nresponse = client.chat.completions.create(\n  model=\"moa-gpt-4o\",\n  messages=[\n    {\n      \"role\": \"user\",\n      \"content\": \"请编写一个 Python 程序，仅使用 numpy 构建一个强化学习模型，使其能够从用户指定的任意位置开始朗读文本。\"\n    }\n  ],\n  temperature=0.2\n)\n\nprint(response)\n```\n\n上述代码适用于 OpenAI 和 Azure OpenAI，只需确保将 `OPENAI_API_KEY` 环境变量设置为正确的 API 密钥即可。  \n\n优化技术有多种控制方式，它们按以下优先级顺序应用：\n\n- 您可以通过在模型名称前添加标识符 `{slug}-model-name` 来指定使用的优化技术。例如，在上面的代码中，我们使用了 `moa`（即混合代理）作为优化方法。在代理的日志中，您会看到类似以下内容，表明正在使用 `moa` 技术，并以 `gpt-4o-mini` 作为基础模型：\n\n```bash\n2024-09-06 08:35:32,597 - INFO - 使用 moa 方法，基础模型为 gpt-4o-mini\n2024-09-06 08:35:35,358 - INFO - HTTP 请求：POST https:\u002F\u002Fapi.openai.com\u002Fv1\u002Fchat\u002Fcompletions \"HTTP\u002F1.1 200 OK\"\n2024-09-06 08:35:39,553 - INFO - HTTP 请求：POST https:\u002F\u002Fapi.openai.com\u002Fv1\u002Fchat\u002Fcompletions \"HTTP\u002F1.1 200 OK\"\n2024-09-06 08:35:44,795 - INFO - HTTP 请求：POST https:\u002F\u002Fapi.openai.com\u002Fv1\u002Fchat\u002Fcompletions \"HTTP\u002F1.1 200 OK\"\n2024-09-06 08:35:44,797 - INFO - 127.0.0.1 - - [06\u002FSep\u002F2024 08:35:44] \"POST \u002Fv1\u002Fchat\u002Fcompletions HTTP\u002F1.1\" 200 -\n```\n\n- 或者，您也可以在 `extra_body` 中通过 `optillm_approach` 字段传递标识符。\n\n```python\nresponse = client.chat.completions.create(\n  model=\"gpt-4o-mini\",\n  messages=[{ \"role\": \"user\",\"content\": \"\" }],\n  temperature=0.2,\n  extra_body={\"optillm_approach\": \"bon|moa|mcts\"}\n)\n```\n\n- 另一种方式是在您的 `system` 或 `user` 提示中，使用 `\u003Coptillm_approach> \u003C\u002Foptillm_approach>` 标签来指定优化方法。\n\n```python\nresponse = 
client.chat.completions.create(\n  model=\"gpt-4o-mini\",\n  messages=[{ \"role\": \"user\",\"content\": \"\u003Coptillm_approach>re2\u003C\u002Foptillm_approach> 草莓这个词中有多少个 r？\" }],\n  temperature=0.2\n)\n```\n\n> [!提示]\n> 您还可以结合不同的技术，使用符号 `&` 和 `|`。当使用 `&` 时，技术会按照从左到右的顺序依次处理，前一阶段的响应将作为下一阶段的请求输入。而使用 `|` 时，则会并行执行所有请求，并返回多个响应结果，以列表形式呈现。
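\n\n例如，下面的代码片段使用 `&` 将 `readurls` 与 `memory` 两个插件串联起来（仅为最小示意，假设代理已在本地 8000 端口运行；这一组合与下文基准测试中的 `readurls&memory-gpt-4o-mini` 一致）：\n\n```python\nfrom openai import OpenAI\n\nclient = OpenAI(base_url=\"http:\u002F\u002Flocalhost:8000\u002Fv1\")\n\n# `&` 从左到右串联：readurls 先抓取提示中的链接内容，其输出再交给 memory 插件处理\nresponse = client.chat.completions.create(\n    model=\"readurls&memory-gpt-4o-mini\",\n    messages=[{\"role\": \"user\", \"content\": \"总结这篇文章的要点：https:\u002F\u002Fexample.com\u002Farticle\"}],\n)\nprint(response.choices[0].message.content)\n```\n\n请注意，上述约定仅在 optillm 服务器以 `auto` 推理模式启动时有效。否则，客户端请求中的 `model` 属性必须仅指定模型名称。\n\n目前，我们支持所有 LLM 提供商（通过封装 [LiteLLM SDK](https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002F#litellm-python-sdk) 实现）。例如，您可以将 Gemini Flash 模型与 `moa` 结合使用，只需在环境变量中设置 `os.environ['GEMINI_API_KEY']`，然后调用模型 `moa-gemini\u002Fgemini-1.5-flash-002`。在输出中，您会看到 LiteLLM 正在用于调用基础模型。\n\n```bash\n9:43:21 - LiteLLM:INFO: utils.py:2952 -\nLiteLLM completion() 模型= gemini-1.5-flash-002；提供商 = gemini\n2024-09-29 19:43:21,011 - INFO -\nLiteLLM completion() 模型= gemini-1.5-flash-002；提供商 = gemini\n2024-09-29 19:43:21,481 - INFO - HTTP 请求：POST https:\u002F\u002Fgenerativelanguage.googleapis.com\u002Fv1beta\u002Fmodels\u002Fgemini-1.5-flash-002:generateContent?key=[redacted] \"HTTP\u002F1.1 200 OK\"\n19:43:21 - LiteLLM:INFO: utils.py:988 - 包装器：调用完成，正在调用成功处理程序\n2024-09-29 19:43:21,483 - INFO - 包装器：调用完成，正在调用成功处理程序\n19:43:21 - LiteLLM:INFO: utils.py:2952 -\nLiteLLM completion() 模型= gemini-1.5-flash-002；提供商 = gemini\n```\n\n> [!提示]\n> optillm 是一个透明代理，可与任何具有 OpenAI API 兼容聊天补全端点的 LLM API 或提供商配合使用，同时 optillm 本身也暴露了一个兼容 OpenAI API 的聊天补全端点。这使得您可以轻松地将其集成到任何现有工具或框架中。如果您想要使用的 LLM 不具备 OpenAI API 兼容端点（如 Google 或 Anthropic），可以使用 [LiteLLM 代理服务器](https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproxy\u002Fquick_start)，它支持大多数 LLM。\n\n以下序列图展示了请求和响应如何通过 optillm 流转：\n\n![展示 optillm 使用情况的序列图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Falgorithmicsuperintelligence_optillm_readme_b93cb2279096.png)\n\n在图中：\n- `A` 是现有的工具（如 [oobabooga](https:\u002F\u002Fgithub.com\u002Foobabooga\u002Ftext-generation-webui\u002F)）、框架（如 [patchwork](https:\u002F\u002Fgithub.com\u002Fpatched-codes\u002Fpatchwork)）或您自己的代码，您希望从中获取 optillm 的结果。您可以直接使用任何 OpenAI 客户端 SDK 来访问它。\n- `B` 是 optillm 服务（可以直接运行或在 Docker 容器中运行），它会向 `base_url` 发送请求。\n- `C` 是任何提供 OpenAI API 兼容聊天补全端点的服务。\n\n### 本地推理服务器\n\n我们支持在 optillm 中直接加载任何 HuggingFace 模型或 LoRA。要使用内置的推理服务器，只需将 `OPTILLM_API_KEY` 设置为任意值（例如 `export OPTILLM_API_KEY=\"optillm\"`），然后在你的 OpenAI 客户端中使用相同的设置即可。你可以在 `model` 字段中传递任何 HuggingFace 模型。如果该模型是私有的，请确保设置包含你的 HuggingFace 密钥的 `HF_TOKEN` 环境变量。此外，我们还支持通过使用 `+` 分隔符在模型基础上添加任意数量的 LoRA。\n\n例如，以下代码加载了基础模型 `meta-llama\u002FLlama-3.2-1B-Instruct`，并在其上添加了两个 LoRA：`patched-codes\u002FLlama-3.2-1B-FixVulns` 和 `patched-codes\u002FLlama-3.2-1B-FastApply`。你可以通过 OpenAI SDK 客户端的 `extra_body` 字段中的 `active_adapter` 参数来指定要使用的 LoRA。默认情况下，系统会加载最后指定的适配器。\n\n```python\nOPENAI_BASE_URL = \"http:\u002F\u002Flocalhost:8000\u002Fv1\"\nOPENAI_KEY = \"optillm\"\nresponse = client.chat.completions.create(\n  model=\"meta-llama\u002FLlama-3.2-1B-Instruct+patched-codes\u002FLlama-3.2-1B-FastApply+patched-codes\u002FLlama-3.2-1B-FixVulns\",\n  messages=messages,\n  temperature=0.2,\n  logprobs = True,\n  top_logprobs = 3,\n  extra_body={\"active_adapter\": \"patched-codes\u002FLlama-3.2-1B-FastApply\"},\n)\n```\n\n你还可以直接在本地推理服务器上使用替代解码技术，如 `cot_decoding` 和 `entropy_decoding`。\n\n```python\nresponse = client.chat.completions.create(\n  model=\"meta-llama\u002FLlama-3.2-1B-Instruct\",\n  messages=messages,\n  temperature=0.2,\n  extra_body={\n        \"decoding\": \"cot_decoding\",  # 或 \"entropy_decoding\"\n        \u002F\u002F CoT 特定参数\n        \"k\": 10,\n        \"aggregate_paths\": True,\n        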
\u002F\u002F 或者熵解码特定参数\n        \"top_k\": 27,\n        \"min_p\": 0.03,\n    }\n)\n```\n\n### 使用外部服务器（如 llama.cpp 或 ollama）启动 optillm 代理\n\n- 将 `OPENAI_API_KEY` 环境变量设置为占位符值：\n  - 例如 `export OPENAI_API_KEY=\"sk-no-key\"`\n- 运行 `.\u002Fllama-server -c 4096 -m path_to_model` 来启动服务器，指定模型和 4096 个 token 的上下文长度。\n- 运行 `python3 optillm.py --base_url base_url` 来启动代理：\n  - 例如，对于 llama.cpp，运行 `python3 optillm.py --base_url http:\u002F\u002Flocalhost:8080\u002Fv1`。\n\n> [!警告]\n> Anthropic API、llama.cpp 服务器和 ollama 目前不支持从模型中采样多个响应，这限制了可用的方法仅限于以下几种：\n> `cot_reflection`、`leap`、`plansearch`、`rstar`、`rto`、`self_consistency`、`re2` 和 `z3`。对于 HuggingFace 上的模型，你可以使用内置的本地推理服务器，因为它支持多响应。\n\n### MCP 插件\n\n模型上下文协议（MCP）插件使 OptiLLM 能够连接到 MCP 服务器，从而将外部工具、资源和提示引入语言模型的上下文中。这使得与文件系统访问、数据库查询、API 连接等的强大集成成为可能。\n\nOptiLLM 支持通过多种传输方式连接 **本地** 和 **远程** MCP 服务器：\n- **stdio**：本地服务器（传统方式）\n- **SSE**：通过服务器发送事件的远程服务器\n- **WebSocket**：通过 WebSocket 连接的远程服务器\n\n#### 什么是 MCP？\n\n[模型上下文协议](https:\u002F\u002Fmodelcontextprotocol.io\u002F)（MCP）是一个开放的协议标准，允许 LLM 通过标准化接口安全地访问工具和数据源。MCP 服务器可以提供：\n\n- **工具**：可调用的函数，用于执行操作（如写入文件、查询数据库等）\n- **资源**：用于提供上下文的数据源（如文件内容）\n- **提示**：针对特定用例的可重用提示模板\n\n#### 配置\n\n##### 设置 MCP 配置\n\n> **关于向后兼容性的说明**：现有的 MCP 配置将继续正常工作，无需更改。如果未指定 `transport` 字段，则默认为 `\"stdio\"`，从而保持与现有设置的完全向后兼容性。\n\n1. 在 `~\u002F.optillm\u002Fmcp_config.json` 创建一个配置文件，结构如下：\n\n**本地服务器（stdio）——传统方法：**\n```json\n{\n  \"mcpServers\": {\n    \"filesystem\": {\n      \"transport\": \"stdio\",\n      \"command\": \"npx\",\n      \"args\": [\n        \"-y\",\n        \"@modelcontextprotocol\u002Fserver-filesystem\",\n        \"\u002Fpath\u002Fto\u002Fallowed\u002Fdirectory1\",\n        \"\u002Fpath\u002Fto\u002Fallowed\u002Fdirectory2\"\n      ],\n      \"env\": {},\n      \"description\": \"本地文件系统访问\"\n    }\n  },\n  \"log_level\": \"INFO\"\n}\n```\n\n**旧格式（仍然有效）：**\n```json\n{\n  \"mcpServers\": {\n    \"filesystem\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"@modelcontextprotocol\u002Fserver-filesystem\", \"\u002Fpath\u002Fto\u002Fdirectory\"],\n      \"env\": {}\n    }\n  }\n}\n```\n\n**远程服务器（SSE）——新功能：**\n```json\n{\n  \"mcpServers\": {\n    \"github\": {\n      \"transport\": \"sse\",\n      \"url\": \"https:\u002F\u002Fapi.githubcopilot.com\u002Fmcp\",\n      \"headers\": {\n        \"Authorization\": \"Bearer ${GITHUB_TOKEN}\",\n        \"Accept\": \"text\u002Fevent-stream\"\n      },\n      \"timeout\": 30.0,\n      \"sse_read_timeout\": 300.0,\n      \"description\": \"GitHub MCP 服务器，用于仓库访问\"\n    }\n  },\n  \"log_level\": \"INFO\"\n}\n```\n\n**远程服务器（WebSocket）——新功能：**\n```json\n{\n  \"mcpServers\": {\n    \"remote-ws\": {\n      \"transport\": \"websocket\",\n      \"url\": \"wss:\u002F\u002Fapi.example.com\u002Fmcp\",\n      \"description\": \"远程 WebSocket MCP 服务器\"\n    }\n  },\n  \"log_level\": \"INFO\"\n}\n```\n\n**混合配置（本地 + 远程）：**\n```json\n{\n  \"mcpServers\": {\n    \"filesystem\": {\n      \"transport\": \"stdio\",\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"@modelcontextprotocol\u002Fserver-filesystem\", \"\u002Fhome\u002Fuser\u002Fdocs\"],\n      \"description\": \"本地文件系统访问\"\n    },\n    \"github\": {\n      \"transport\": \"sse\",\n      \"url\": \"https:\u002F\u002Fapi.githubcopilot.com\u002Fmcp\",\n      \"headers\": {\n        \"Authorization\": \"Bearer ${GITHUB_TOKEN}\"\n      },\n      \"description\": \"GitHub MCP 服务器\"\n    },\n    \"remote-api\": {\n      \"transport\": \"websocket\",\n      \"url\": \"wss:\u002F\u002Fapi.company.com\u002Fmcp\",\n      \"description\": \"公司内部 MCP 
服务器\"\n    }\n  },\n  \"log_level\": \"INFO\"\n}\n```\n\n##### 配置参数\n\n**通用参数：**\n- **服务器名称**：服务器的唯一标识符（如“filesystem”、“github”）\n- **传输方式**：传输方法——“stdio”（默认）、“sse”或“websocket”\n- **描述**（可选）：服务器功能的描述\n- **超时时间**（可选）：连接超时时间，单位为秒（默认：5.0）\n\n**stdio 传输（本地服务器）：**\n- **命令**：运行服务器的可执行文件\n- **参数**：服务器的命令行参数\n- **环境变量**：服务器进程的环境变量\n\n**sse 传输（服务器发送事件）：**\n- **URL**：SSE 端点 URL\n- **头信息**（可选）：用于身份验证的 HTTP 头\n- **sse_read_timeout**（可选）：SSE 读取超时时间，单位为秒（默认：300.0）\n\n**websocket 传输（WebSocket）：**\n- **URL**：WebSocket 端点 URL\n\n**环境变量扩展：**\n标头和其他字符串值支持使用 `${VARIABLE_NAME}` 语法进行环境变量扩展。这对于 API 密钥尤其有用：\n```json\n{\n  \"headers\": {\n    \"Authorization\": \"Bearer ${GITHUB_TOKEN}\",\n    \"X-API-Key\": \"${MY_API_KEY}\"\n  }\n}\n```\n\n#### 可用的 MCP 服务器\n\nOptiLLM 支持本地和远程 MCP 服务器：\n\n##### 本地 MCP 服务器（stdio 传输）\n\n您可以使用任何 [官方 MCP 服务器](https:\u002F\u002Fmodelcontextprotocol.io\u002Fexamples) 或作为本地进程运行的第三方服务器：\n\n- **文件系统**：`@modelcontextprotocol\u002Fserver-filesystem` - 文件操作\n- **Git**：`mcp-server-git` - Git 仓库操作\n- **SQLite**：`@modelcontextprotocol\u002Fserver-sqlite` - SQLite 数据库访问\n- **Brave 搜索**：`@modelcontextprotocol\u002Fserver-brave-search` - 网络搜索功能\n\n##### 远程 MCP 服务器（SSE\u002FWebSocket 传输）\n\n远程服务器提供集中式访问，无需本地安装：\n\n- **GitHub MCP 服务器**：`https:\u002F\u002Fapi.githubcopilot.com\u002Fmcp` - 仓库管理、问题跟踪和代码分析\n- **第三方服务器**：任何支持 SSE 或 WebSocket 协议的 MCP 服务器\n\n##### 示例：综合配置\n\n```json\n{\n  \"mcpServers\": {\n    \"filesystem\": {\n      \"transport\": \"stdio\",\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"@modelcontextprotocol\u002Fserver-filesystem\", \"\u002Fhome\u002Fuser\u002Fdocuments\"],\n      \"description\": \"本地文件系统访问\"\n    },\n    \"search\": {\n      \"transport\": \"stdio\",\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"@modelcontextprotocol\u002Fserver-brave-search\"],\n      \"env\": {\n        \"BRAVE_API_KEY\": \"your-api-key-here\"\n      },\n      \"description\": \"网络搜索功能\"\n    },\n    \"github\": {\n      \"transport\": \"sse\",\n      \"url\": \"https:\u002F\u002Fapi.githubcopilot.com\u002Fmcp\",\n      \"headers\": {\n        \"Authorization\": \"Bearer ${GITHUB_TOKEN}\",\n        \"Accept\": \"text\u002Fevent-stream\"\n      },\n      \"description\": \"GitHub 仓库和问题管理\"\n    }\n  },\n  \"log_level\": \"INFO\"\n}\n```\n\n#### 使用 MCP 插件\n\n配置完成后，MCP 插件将自动：\n\n1. 连接到所有已配置的 MCP 服务器\n2. 发现可用的工具、资源和提示\n3. 将这些能力提供给语言模型\n4. 处理工具调用和资源请求\n\n插件会通过 MCP 能力增强系统提示，使模型知道有哪些工具可用。当模型决定使用某个工具时，插件会：\n\n1. 使用提供的参数执行该工具\n2. 将结果返回给模型\n3. 允许模型将结果整合到其响应中\n\n#### 查询示例\n\n以下是一些会触发 MCP 工具的查询示例：\n\n**本地服务器示例：**\n- “列出我文档目录中的所有 Python 文件”（文件系统）\n- “我的 Git 仓库最近有哪些提交？”（Git）\n- “搜索关于可再生能源的最新信息”（搜索）\n- “查询我的数据库中本月注册的所有用户”（数据库）\n\n**远程服务器示例：**\n- “展示我 GitHub 仓库中的未解决问题”（GitHub MCP）\n- “为我正在开发的功能创建一个新分支”（GitHub MCP）\n- “有哪些最近的拉取请求需要评审？”（GitHub MCP）\n- “获取我远程仓库中的文件内容”（GitHub MCP）\n\n#### 故障排除\n\n##### 日志\n\nMCP 插件会将详细信息记录到：\n```\n~\u002F.optillm\u002Flogs\u002Fmcp_plugin.log\n```\n\n请检查此日志文件以了解连接问题、工具执行错误及其他诊断信息。\n\n##### 常见问题\n\n**本地服务器问题（stdio 传输）：**\n\n1. **命令未找到**：确保服务器可执行文件在您的 PATH 中可用，或在配置中使用绝对路径。\n\n2. **权限不足**：对于文件系统操作，请确保配置中指定的路径对进程是可访问的。\n\n**远程服务器问题（SSE\u002FWebSocket 传输）：**\n\n3. **连接超时**：远程服务器可能需要更长时间才能建立连接。请增加配置中的 `timeout` 值。\n\n4. **认证失败**：请验证您的 API 密钥和令牌是否正确。对于 GitHub MCP 服务器，确保已设置具有适当权限的 `GITHUB_TOKEN` 环境变量。\n\n5. **网络错误**：请检查您的互联网连接，并确认服务器 URL 是否可访问。\n\n6. **未找到环境变量**：如果使用 `${VARIABLE_NAME}` 语法，请确保在启动 OptILLM 之前已设置相应的环境变量。\n\n**通用问题：**\n\n7. **方法未找到**：某些服务器并未实现所有 MCP 功能（工具、资源、提示）。请确认服务器支持哪些功能。\n\n8. 
**不支持的传输方式**：请确保您使用的是受支持的传输方式：“stdio”、“sse”或“websocket”。\n\n**示例：测试 GitHub MCP 连接**\n\n要测试您的 GitHub MCP 服务器配置是否正常工作：\n\n1. 设置您的 GitHub 令牌：`export GITHUB_TOKEN=\"your-github-token\"`\n2. 启动 OptILLM 并查看 `~\u002F.optillm\u002Flogs\u002Fmcp_plugin.log` 中的日志\n3. 查找连接成功的消息以及发现的能力\n\n## 可用参数\n\noptillm 支持多种命令行参数进行配置。在使用 Docker 时，这些参数也可以作为以 `OPTILLM_` 为前缀的环境变量来设置。\n\n| 参数                | 描述                                                     | 默认值   |\n|--------------------------|-----------------------------------------------------------------|-----------------|\n| `--approach`             | 使用的推理方法                                       | `\"auto\"`        |\n| `--simulations`          | MCTS 模拟次数                                      | 2               |\n| `--exploration`          | MCTS 的探索权重                                     | 0.2             |\n| `--depth`                | MCTS 的模拟深度                                     | 1               |\n| `--best-of-n`            | best_of_n 方法的样本数量                            | 3               |\n| `--model`                | 使用的 OpenAI 模型                                  | `\"gpt-4o-mini\"` |\n| `--base-url`             | OpenAI 兼容端点的基础 URL                           | `\"\"`            |\n| `--rstar-max-depth`      | rStar 算法的最大深度                                | 3               |\n| `--rstar-num-rollouts`   | rStar 算法的模拟次数                                | 5               |\n| `--rstar-c`              | rStar 算法的探索常数                                | 1.4             |\n| `--n`                    | 最终返回的回答数量                                  | 1               |\n| `--return-full-response` | 返回包含思维链（CoT）及 `\u003Cthinking>` 标签的完整响应  | `False`         |\n| `--port`                 | 指定代理服务运行的端口                              | 8000            |\n| `--optillm-api-key`      | 客户端认证 optillm 的可选 API 密钥                  | `\"\"`            |\n| `--cepo_*`               | 详细配置选项请参阅下方的 CePO 参数部分              | 各种           |\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>CePO 参数\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n| 参数 | 描述 | 默认值 |\n|-----------|-------------|---------------|\n| `--cepo_bestofn_n` | 在 best of n 阶段生成的回答数量 | 3 |\n| `--cepo_bestofn_temperature` | 在 best of n 阶段验证器的温度 | 0.1 |\n| `--cepo_bestofn_max_tokens` | 在 best of n 阶段验证器的最大令牌数 | 4096 |\n| `--cepo_bestofn_rating_type` | 在 best of n 阶段的评分类型（“绝对”或“成对”） | `\"absolute\"` |\n| `--cepo_planning_n` | 在规划阶段生成的计划数量 | 3 |\n| `--cepo_planning_m` | 在规划阶段尝试生成 n 个计划的次数 | 6 |\n| `--cepo_planning_temperature_step1` | 规划阶段第一步生成器的温度 | 0.55 |\n| `--cepo_planning_temperature_step2` | 规划阶段第二步生成器的温度 | 0.25 |\n| `--cepo_planning_temperature_direct_resp` | 如果规划失败并直接作答，第二步后生成器的温度 | 0.1 |\n| `--cepo_planning_temperature_step3` | 规划阶段第三步生成器的温度 | 0.1 |\n| `--cepo_planning_temperature_step4` | 规划阶段第四步生成器的温度 | 0 |\n| `--cepo_planning_max_tokens_step1` | 规划阶段第一步的最大令牌数 | 4096 |\n| `--cepo_planning_max_tokens_step2` | 规划阶段第二步的最大令牌数 | 4096 |\n| `--cepo_planning_max_tokens_direct_resp` | 如果规划失败并直接作答，第二步后的最大令牌数 | 4096 |\n| `--cepo_planning_max_tokens_step3` | 规划阶段第三步的最大令牌数 | 4096 |\n| `--cepo_planning_max_tokens_step4` | 规划阶段第四步的最大令牌数 | 4096 |\n| `--cepo_use_reasoning_fallback` | 当高级推理失败时是否回退到低级推理 | False |\n| `--cepo_num_of_retries` | LLM 调用失败时的重试次数，0 表示不重试 | 0 |\n| `--cepo_print_output` | 是否打印每个阶段的输出 | `False` |\n| `--cepo_config_file` | CePO 配置文件的路径 | `None` |\n| `--cepo_use_plan_diversity` | 是否使用额外的计划多样性步骤 | `False` |\n| `--cepo_rating_model` | 如果评分步骤使用的模型与完成步骤不同，则指定评分模型 | `None` |\n\n\u003C\u002Fdetails>\n\n## 使用 Docker 
运行\n\noptillm 可以选择使用 Docker 和提供的 [Dockerfile](https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fblob\u002Fmain\u002FDockerfile) 进行构建和运行。\n\n### 使用 Docker Compose\n\n1. 确保您的系统已安装 Docker 和 Docker Compose。\n\n2. 您可以更新 `docker-compose.yaml` 文件中的环境变量，或者在项目根目录下创建一个 `.env` 文件，并添加您想要设置的任何环境变量。例如，要设置 OpenAI API 密钥，请在 `.env` 文件中添加以下内容：\n\n   ```bash\n   OPENAI_API_KEY=your_openai_api_key_here\n   ```\n\n3. 运行以下命令启动 optillm：\n\n   ```bash\n   docker compose up -d\n   ```\n\n   如果 Docker 镜像不存在，此命令将构建镜像并启动 optillm 服务。\n\n4. optillm 将在 `http:\u002F\u002Flocalhost:8000` 上可用。\n\n在使用 Docker 时，您可以将这些参数设置为环境变量。例如，要设置推理方法和模型，可以使用：\n\n```bash\nOPTILLM_APPROACH=mcts\nOPTILLM_MODEL=gpt-4\n```\n\n要通过 API 密钥保护 optillm 代理，请设置 `OPTILLM_API_KEY` 环境变量：\n\n```bash\nOPTILLM_API_KEY=your_secret_api_key\n```\n\n当设置了 API 密钥后，客户端必须在其请求中使用 `Authorization` 头部包含该密钥：\n\n```plain\nAuthorization: Bearer your_secret_api_key\n```\n\n## optillm 在基准测试中的 SOTA 结果\n\n### MARS 在 AIME 2025、IMO 2025 和 LiveCodeBench（2025 年 10 月）上的表现\n\n| 基准测试 | 方法 | 题目数量 | 正确答案数 | 准确率 | 提升幅度 |\n|-----------|----------|----------|---------|----------|-------------|\n| **AIME 2025** | 基线 | 30 | 13 | 43.3% | - |\n| **AIME 2025** | **MARS** | 30 | **22** | **73.3%** | **+30.0pp (+69.2%)** |\n| **IMO 2025** | 基线 | 6 | 1 | 16.7% | - |\n| **IMO 2025** | **MARS** | 6 | **2** | **33.3%** | **+16.7pp (+100%)** |\n| **LiveCodeBench v5\u002Fv6** | 基线 | 105 | 41 | 39.05% | - |\n| **LiveCodeBench v5\u002Fv6** | **MARS** | 105 | **53** | **50.48%** | **+11.43pp (+29.3%)** |\n\n模型：通过 OpenRouter 使用 google\u002Fgemini-2.5-flash-lite-preview-09-2025  \n配置：3 个智能体，两轮验证，禁用证明中的思考标签\n\n### AutoThink 在 GPQA-Diamond 和 MMLU-Pro 上的表现（2025年5月）\n\n| **模型**     | **GPQA-Diamond**            |                          | **MMLU-Pro**               |                          |\n|----------------|-----------------------------|--------------------------|----------------------------|--------------------------|\n|                | 准确率 (%)                | 平均 token 数              | 准确率 (%)               | 平均 token 数              |\n| DeepSeek-R1-Distill-Qwen-1.5B    | 21.72                       | 7868.26                  | 25.58                      | 2842.75                  |\n| 固定预算下 | 28.47                     | 3570.00                  | 26.18                      | 1815.67                  |\n| **AutoThink 下**  | **31.06**                   | **3520.52**              | **26.38**                  | **1792.50**              |\n\n\n### LongCePO 在 LongBench v2 上的表现（2025年4月）\n\n| 模型¹                             | 上下文窗口 | 短样本（最多32K词） | 中等样本（32–128K词） |\n|----------------------------------|----------------|------------------|----------------|\n| Llama 3.3 70B Instruct           | 128K           | 36.7 (45.0)               | 27.0 (33.0)            |\n| **LongCePO + Llama 3.3 70B Instruct** | **8K**             | **36.8 ± 1.38**        |  **38.7 ± 2.574 (39.735)²**             |\n| Mistral-Large-Instruct-2411     | 128K           | 41.7 (46.1)                 | 30.7 (34.9)             |\n| o1-mini-2024-09-12               | 128K           | 48.6 (48.9)                | 33.3 (32.9)            |\n| Claude-3.5-Sonnet-20241022       | 200K           | 46.1 (53.9)                | 38.6 (41.9)            |\n| Llama-4-Maverick-17B-128E-Instruct | 524K         | 32.22 (50.56)                  | 28.84 (41.86)               |\n\n ¹ 性能数据由 LongBench v2 的作者提供，除 LongCePO 和 Llama-4-Maverick 的结果外。\n\n ² LongCePO 括号中的数字表示 5 次运行的多数投票准确率。\n\n### LongCePO 在 HELMET - InfiniteBench 
En.MC、128K 长度上的表现（2025年4月）\n\n| 模型   | 准确率 (%) |\n|---------|---------------|\n| Llama 3.3 70B Instruct  （完整上下文）  | 58.0          |\n| **LongCePO + Llama 3.3 70B Instruct（8K 上下文）** | **71.6 ± 1.855（73.0）¹**  |\n| o1-mini-2024-09-12（完整上下文） | 58.0          |\n| gpt-4o-2024-08-06（完整上下文） | 74.0          |\n\n ¹ LongCePO 括号中的数字表示 5 次运行的多数投票准确率。\n\n### CePO 在数学和代码基准测试上的表现（2025年9月）\n\n| 方法                  | AIME 2024 | AIME 2025 |  GPQA  | LiveCodeBench |\n| ----------------------: | :-------: | :-------: | :----: | :-----------: |\n| Qwen3 8B                |   74.0    |   68.3    |  59.3  |     55.7      |\n| CePO（使用 Qwen3 8B）   |   86.7    |   80.0    |  62.5  |     60.5      |\n| Qwen3 32B               |   81.4    |   72.9    |  66.8  |     65.7      |\n| CePO（使用 Qwen3 32B）  | **90.7**  | **83.3**  |  70.0  |   **71.9**    |\n| Qwen3 235B              |   85.7    |   81.5    |  71.1  |     70.7      |\n| DeepSeek R1             |   79.8    |   70.0    |  71.5  |     64.3      |\n| OpenAI o3-mini          |   79.6    |   74.8    |  76.8  |     66.3      |\n| Grok3 Think             |   83.9    |   77.3    |**80.2**|     70.6      |\n\n### CePO 在数学和代码基准测试上的表现（2025年3月）\n\n| 方法                         | Math-L5 | MMLU-Pro（数学） | CRUX | LiveCodeBench（pass@1） | Simple QA |\n| -----------------------------: | :-----: | :-------------: | :----: | :--------------------: | :-------: |\n| Llama 3.3 70B                  |  51.0   |      78.6       |  72.6  |          27.1          |    20.9   |\n| Llama 3.1 405B                 |  49.8   |      79.2       |  73.0  |          31.8          |    13.5   |\n| CePO（使用 Llama 3.3 70B）     |  69.6   |      84.8       |  80.1  |          31.9          |  **22.6** |\n| QwQ 32B                        |  61.4   |      90.8       |  82.5  |          44.3          |    7.8    |\n| CePO（使用 QwQ 32B）           |  88.1   |    **92.0**     |  86.3  |        **51.5**        |    8.2    |\n| DeepSeek R1 Llama              |  83.1   |      82.0       |  84.0  |          47.3          |    14.6   |\n| CePO（使用 DeepSeek R1 Llama） |**90.2** |      84.0       |**89.4**|          47.2          |    15.5   |\n\n### coc-claude-3-5-sonnet-20241022 在 AIME 2024 pass@1 上的表现（2024年11月）\n\n| 模型 | 分数 |\n|-------|-----:|\n| o1-mini | 56.67 |\n| coc-claude-3-5-sonnet-20241022 | 46.67 |\n| coc-gemini\u002Fgemini-exp-1121 | 46.67 |\n| o1-preview | 40.00 |\n| gemini-exp-1114 | 36.67 |\n| claude-3-5-sonnet-20241022 | 20.00 |\n| gemini-1.5-pro-002 | 20.00 |\n| gemini-1.5-flash-002 | 16.67 |\n\n### readurls&memory-gpt-4o-mini 在 Google FRAMES 基准测试上的表现（2024年10月）\n| 模型 | 准确率 |\n| ----- | -------- |\n| readurls&memory-gpt-4o-mini | 61.29 |\n| gpt-4o-mini | 50.61 |\n| readurls&memory-Gemma2-9b | 30.1 |\n| Gemma2-9b | 5.1 |\n| Gemma2-27b | 30.8 |\n| Gemini Flash 1.5 | 66.5 |\n| Gemini Pro 1.5 | 72.9 |\n\n### plansearch-gpt-4o-mini 在 LiveCodeBench 上的表现（2024年9月）\n\n| 模型                  | pass@1 | pass@5 | pass@10 |\n| ---------------------- | ------ | ------ | ------- |\n| plansearch-gpt-4o-mini | 44.03  | 59.31  | 63.5    |\n| gpt-4o-mini            | 43.9   | 50.61  | 53.25   |\n| claude-3.5-sonnet      | 51.3   |        |         |\n| gpt-4o-2024-05-13      | 45.2   |        |         |\n| gpt-4-turbo-2024-04-09 | 44.2   |        |         |\n\n### moa-gpt-4o-mini 在 Arena-Hard-Auto 上的表现（2024年8月）\n\n![展示使用 gpt-4o-mini 的混合代理方法在 Arena Hard Auto 基准测试上的结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Falgorithmicsuperintelligence_optillm_readme_9dbb33886eb2.png)\n\n### optillm 与 Patchwork 
结合使用（2024年7月）\n\n由于 optillm 是 OpenAI API 的直接替代品，您可以使用 OpenAI 客户端轻松将其集成到现有工具和框架中。我们使用 optillm 与 [patchwork](https:\u002F\u002Fgithub.com\u002Fpatched-codes\u002Fpatchwork) 结合，这是一个开源框架，可通过称为 patchflows 的工作流自动执行开发中的重复性任务，如 PR 审查、错误修复和安全补丁。正如下面所示，当我们采用混合代理方法（moa）时，所有支持的 patchflows 都实现了巨大的性能提升。\n\n![展示 optillm 混合代理方法与 patchflows 结合使用的结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Falgorithmicsuperintelligence_optillm_readme_cda8a1644c8c.png)\n\n## 测试\n\nOptiLLM 包含一个全面的测试套件，以确保可靠性和兼容性。\n\n### 运行测试\n\n主测试套件可以从项目根目录运行：\n```bash\n# 使用默认测试用例测试所有方法\npython tests\u002Ftest.py\n\n# 测试特定方法\npython tests\u002Ftest.py --approaches moa bon mcts\n\n# 运行单个测试\npython tests\u002Ftest.py --single-test \"简单数学问题\"\n```\n\n### 单元测试和集成测试\n\n`tests\u002F` 目录中还提供了其他测试：\n```bash\n# 运行所有测试（需要 pytest）\n.\u002Ftests\u002Frun_tests.sh\n\n# 运行特定的测试模块\npytest tests\u002Ftest_plugins.py -v\npytest tests\u002Ftest_api_compatibility.py -v\n```\n\n### CI\u002FCD\n\n所有测试都会通过 GitHub Actions 在拉取请求上自动运行。工作流会测试：\n- 多个 Python 版本（3.10、3.11、3.12）\n- 插件和核心功能的单元测试\n- API 兼容性测试\n- 使用多种方法的集成测试\n\n更多关于测试结构以及如何编写新测试的信息，请参阅 `tests\u002FREADME.md`。\n\n## 🤝 贡献\n\n我们非常欢迎贡献！OptiLLM 是由社区共建、服务于社区的项目。\n\n- 🐛 **发现 bug？** [提交 issue](https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fissues\u002Fnew)\n- 💡 **有想法？** [发起讨论](https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fdiscussions)\n- 🔧 **想参与开发？** 查看 [适合初学者的问题](https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Flabels\u002Fgood%20first%20issue)\n\n### 开发环境搭建\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm.git\ncd optillm\npython -m venv .venv\nsource .venv\u002Fbin\u002Factivate  # 或者在 Windows 上使用 `.venv\\Scripts\\activate`\npip install -r requirements.txt\npip install -r tests\u002Frequirements.txt\n\n# 运行测试\npython -m pytest tests\u002F\n```\n\n## 参考文献\n- [通过推理时技术激发微调 Transformer 的能力](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.08060)\n- [AutoThink：用于推理型 LLM 的高效推理](https:\u002F\u002Fdx.doi.org\u002F10.2139\u002Fssrn.5253327) - [实现](optillm\u002Fautothink)\n- [深度思考，充满信心：基于置信度的推理与推理时缩放](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.15260) - [实现](optillm\u002Fdeepconf)\n- [自我发现：大型语言模型自动生成推理结构](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.03620) - [实现](optillm\u002Fplugins\u002Fdeepthink)\n- [CePO：利用推理时计算赋能 Llama 模型进行推理](https:\u002F\u002Fcerebras.ai\u002Fblog\u002Fcepo) - [实现](optillm\u002Fcepo)\n- [LongCePO：赋能 LLM 高效利用无限上下文](https:\u002F\u002Fcerebras.ai\u002Fblog\u002Flongcepo) - [实现](optillm\u002Fplugins\u002Flongcepo)\n- [代码链：结合语言模型增强的代码模拟器进行推理](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.04474) - [启发了 coc 插件的实现](optillm\u002Fplugins\u002Fcoc_plugin.py)\n- [基于熵的采样与并行 CoT 解码](https:\u002F\u002Fgithub.com\u002Fxjdr-alt\u002Fentropix) - [实现](optillm\u002Fentropy_decoding.py)\n- [事实、获取与推理：检索增强生成的统一评估](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.12941) - [评估脚本](scripts\u002Feval_frames_benchmark.py)\n- [在边缘书写：适用于长上下文检索的更好推理模式](https:\u002F\u002Fwww.arxiv.org\u002Fabs\u002F2408.14906) - [启发了 memory 插件的实现](optillm\u002Fplugins\u002Fmemory_plugin.py)\n- [无需提示的思维链推理](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.10200) - [实现](optillm\u002Fcot_decoding.py)\n- [重读提升大型语言模型的推理能力](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.06275) - [实现](optillm\u002Freread.py)\n- [基于错误的上下文原则学习](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05403) - [实现](optillm\u002Fleap.py)\n- [自然语言规划提升 LLM 的代码生成搜索能力](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.03733) - 
[实现](optillm\u002Fplansearch.py)\n- [自我一致性提升语言模型中的思维链推理](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.11171) - [实现](optillm\u002Fself_consistency.py)\n- [相互推理使小型 LLM 成为更强大的问题解决者](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.06195) - [实现](optillm\u002Frstar.py)\n- [混合代理增强大型语言模型的能力](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.04692) - [启发了 moa 的实现](optillm\u002Fmoa.py)\n- [证明者-验证者游戏提升 LLM 输出的可读性](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.13692) - [实现](optillm\u002Fpvg.py)\n- [蒙特卡洛树搜索通过迭代偏好学习提升推理能力](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.00451) - [启发了 mcts 的实现](optillm\u002Fmcts.py)\n- [使用往返正确性对代码 LLM 进行无监督评估](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.08699) - [启发了 rto 的实现](optillm\u002Frto.py)\n- [Patched MOA：优化针对多样化软件开发任务的推理](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.18521) - [实现](optillm\u002Fmoa.py)\n- [Patched RTC：评估 LLM 在多样化软件开发任务中的表现](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.16557) - [实现](optillm\u002Frto.py)\n- [AIMO-2 冠军方案：利用 OpenMathReasoning 数据集构建最先进的数学推理模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.16891) - [实现](optillm\u002Fplugins\u002Fgenselect_plugin.py)\n- [测试时扩散深度研究员 (TTD-DR)：多思考、多研究、回答更出色！](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.16075v1) - [实现](optillm\u002Fplugins\u002Fdeep_research)\n\n## 引用\n\n如果您在研究中使用此库，请引用以下内容：\n\n```bibtex\n@software{optillm,\n  title = {OptiLLM: Optimizing inference proxy for LLMs},\n  author = {Asankhaya Sharma},\n  year = {2024},\n  publisher = {GitHub},\n  url = {https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm}\n}\n```\n\n---\n\n\u003Cp align=\"center\">\n  \u003Cstrong>准备好优化您的 LLM 了吗？安装 OptiLLM，感受其中的不同吧！🚀\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  ⭐ 如果您觉得 OptiLLM 很有用，请\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\">在 GitHub 上为它点个星标\u003C\u002Fa>！\n\u003C\u002Fp>","# OptiLLM 快速上手指南\n\nOptiLLM 是一个兼容 OpenAI API 的推理优化代理，无需训练或微调模型，即可通过 20+ 种前沿技术（如 MCTS、多智能体协作等）将大模型在数学、编程和逻辑推理任务上的准确率提升 2-10 倍。\n\n## 环境准备\n\n*   **系统要求**：支持 Linux、macOS 或 Windows（需安装 Python 环境）。\n*   **前置依赖**：\n    *   Python 3.8+\n    *   `pip` 包管理工具\n    *   （可选）Docker：若选择容器化部署\n*   **API Key**：需要拥有任意 OpenAI 兼容接口的 API Key（如 OpenAI, Anthropic, Google, Cerebras 等）。\n\n## 安装步骤\n\n你可以选择通过 `pip` 直接安装或使用 `Docker` 部署。\n\n### 方式一：使用 pip 安装（推荐）\n\n```bash\n# 安装 OptiLLM\npip install optillm\n\n# 设置你的 API Key\nexport OPENAI_API_KEY=\"your-key-here\"\n\n# 启动服务\noptillm\n```\n> **提示**：国内用户若下载缓慢，可尝试使用清华源：`pip install optillm -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n### 方式二：使用 Docker 部署\n\n```bash\n# 拉取最新镜像\ndocker pull ghcr.io\u002Falgorithmicsuperintelligence\u002Foptillm:latest\n\n# 运行容器（映射端口 8000 并传入 API Key）\ndocker run -p 8000:8000 -e OPENAI_API_KEY=\"your-key-here\" ghcr.io\u002Falgorithmicsuperintelligence\u002Foptillm:latest\n```\n\n## 基本使用\n\nOptiLLM 启动后会在本地 `http:\u002F\u002Flocalhost:8000\u002Fv1` 提供一个兼容 OpenAI 的接口。你只需更改客户端的 `base_url` 和 `model` 名称即可启用优化策略。\n\n### 1. 配置客户端\n\n将 OpenAI 客户端的基础 URL 指向本地 OptiLLM 服务：\n\n```python\nfrom openai import OpenAI\n\n# 指向本地 OptiLLM 服务\nclient = OpenAI(base_url=\"http:\u002F\u002Flocalhost:8000\u002Fv1\")\n```\n\n### 2. 
调用优化模型\n\n在模型名称前添加特定的**前缀**来激活不同的优化技术。例如，使用 `moa-` 前缀开启“多智能体混合（Mixture of Agents）”模式，可用小模型获得大模型的效果。\n\n```python\nresponse = client.chat.completions.create(\n    # 使用 moa- 前缀激活多智能体优化，让小模型具备 GPT-4o 级别的推理能力\n    model=\"moa-gpt-4o-mini\",  \n    messages=[\n        {\"role\": \"user\", \"content\": \"Solve: If 2x + 3 = 7, what is x?\"}\n    ]\n)\n\nprint(response.choices[0].message.content)\n```\n\n### 常用优化前缀参考\n\n| 前缀 | 对应技术 | 适用场景 |\n| :--- | :--- | :--- |\n| `moa-` | Mixture of Agents | 通用推理提升，综合多个模型意见 |\n| `mars-` | Multi-Agent Reasoning | 复杂数学与逻辑推导 |\n| `cepo-` | Cerebras Planning | 结合思维链与自我反思的综合规划 |\n| `plansearch-` | PlanSearch | 分步规划解决复杂问题 |\n| `cot_reflection-` | CoT with Reflection | 带自我反思的思维链 |\n\n**效果对比示例：**\n*   **普通模式**：直接回答 \"x = 1\" (错误) ❌\n*   **OptiLLM 模式**：逐步推导 \"2x + 3 = 7 → 2x = 4 → x = 2\" (正确) ✅","某金融科技公司的量化团队正利用开源大模型自动解析复杂的金融衍生品条款，并生成对应的定价逻辑代码，以加速新产品的上线流程。\n\n### 没有 optillm 时\n- **推理准确率低下**：面对多步嵌套的数学计算和逻辑判断，基础模型（如 Llama 3 或 GPT-4o-mini）经常跳过关键步骤，导致生成的定价公式存在隐蔽错误。\n- **调试成本高昂**：开发人员需要花费大量时间人工复核模型输出的代码逻辑，甚至需要重新微调模型才能勉强达到可用标准。\n- **算力与性能失衡**：为了获得可靠的推理结果，团队被迫调用昂贵的大型模型 API，显著增加了运营成本，且响应延迟较高。\n- **复杂场景失效**：在处理长上下文的历史数据对比或极端市场假设推演时，模型容易迷失重点，输出无关或幻觉内容。\n\n### 使用 optillm 后\n- **推理能力跃升**：通过集成 MCTS（蒙特卡洛树搜索）和思维链等 20+ 种优化技术，optillm 让轻量级模型在数学和逻辑任务上的准确率提升了 2-10 倍，直接输出正确的推导过程。\n- **零训练即时部署**：无需任何额外的模型训练或微调，只需将 API 请求代理至 optillm 服务器并添加特定前缀（如 `moa-`），即可立刻获得前沿模型的表现。\n- **降本增效显著**：团队成功用低成本的小参数模型替代了昂贵的大模型，在保持甚至超越原有精度的同时，大幅降低了 Token 消耗和等待时间。\n- **复杂任务稳健处理**：借助自动规划（Planning）和多智能体协作（Mixture of Agents）策略，optillm 能稳定处理长文本分析和复杂假设推演，确保证券定价逻辑的严密性。\n\noptillm 通过“以计算换智能”的推理时优化策略，让企业在不增加训练成本的前提下，瞬间解锁了小模型解决高难度专业问题的能力。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Falgorithmicsuperintelligence_optillm_ff913a07.png","algorithmicsuperintelligence","Algorithmic SuperIntelligence Labs","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Falgorithmicsuperintelligence_2f2eccb0.png","",null,"research@algorithmicsuperintelligence.ai","https:\u002F\u002Falgorithmicsuperintelligence.ai","https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence",[84,88,92],{"name":85,"color":86,"percentage":87},"Python","#3572A5",99.8,{"name":89,"color":90,"percentage":91},"Shell","#89e051",0.1,{"name":93,"color":94,"percentage":91},"Dockerfile","#384d54",3404,267,"2026-04-04T03:33:50","Apache-2.0","Linux, macOS, Windows","非必需（主要作为代理调用外部 API）；若使用本地推理插件，需根据具体模型决定，未指定具体型号","未说明（取决于是否运行本地模型及所选插件）",{"notes":103,"python":104,"dependencies":105},"该工具主要是一个推理优化代理，默认通过 API 调用外部模型（如 OpenAI, Anthropic 等），因此对本地 GPU 无强制要求。若启用‘离线模式’或使用需要本地计算的插件（如 Z3 Solver, Code Execution），则需安装相应依赖。提供多种 Docker 镜像：完整版（含本地推理依赖）、仅代理版（轻量）、离线版（预下载 spaCy 等模型）。支持 SSL 配置以适应企业代理环境。","3.8+",[106,107,108,109,110,111],"litellm","openai","requests","spacy (可选离线模式)","outlines (JSON 插件)","selenium (Web 搜索插件)",[13,53,15,26],[114,115,116,117,118,119,120,121,122,123,124,125,107,126,127,128,129,130,131,132],"agent","agentic-ai","agentic-workflow","agents","api-gateway","genai","large-language-models","llm","llm-inference","llmapi","mixture-of-experts","moa","openai-api","optimization","proxy-server","agentic-framework","chain-of-thought","monte-carlo-tree-search","prompt-engineering","2026-03-27T02:49:30.150509","2026-04-06T06:54:19.120916",[136,141,146,151,156,161],{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},15269,"如何在本地使用 llama.cpp 运行 optillm？有哪些限制和配置建议？","可以配合 llama.cpp server 使用，但需注意以下限制和配置：\n1. llama-server 默认不支持采样多个响应（参数 n>1），这会导致 'Only one completion choice is allowed' 错误。因此，依赖多响应的策略（如 bon, moa, mcts）可能无法正常工作或效果不佳。\n2. 
建议启动时设置 '--best_of_n 1' 以避免报错，但这会禁用最佳-of-N 策略。\n3. 推荐尝试不需要多响应的策略，如 'cot_reflection', 'leap', 'plansearch', 'rstar', 'rto', 'self_consistency' 和 'z3'。\n4. 务必设置上下文长度 'n_ctx' 至少为 4096（llama-server 默认为 2048），因为大多数策略的 'max_tokens' 设为 4096。\n5. 替代方案：可以考虑使用 ollama 在本地运行模型以获得更好的兼容性。","https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fissues\u002F8",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},15270,"为什么在使用某些策略（如 bon, moa, mcts）时会遇到 'list index out of range' 错误？","该错误通常是因为后端模型（特别是 vLLM）没有返回预期数量的响应选项（choices）。例如，optillm 请求了 3 个响应（n=3），但 vLLM 只返回了 1 个，导致代码在访问列表索引时越界。\n解决方案：\n1. 这是一个已知问题，项目计划在未来版本中实现回退机制（fallback）来处理这种情况。\n2. 临时检查：确认你的 OpenAI 客户端是否直接向后端（如 vLLM）设置了 'n' 参数并成功获取了多个响应。如果后端本身不支持或多响应配置未生效，optillm 的多响应策略将失败。","https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fissues\u002F67",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},15271,"Docker 容器在使用自定义 base_url 时，访问 '\u002Fv1\u002Fmodels' 接口报 500 错误怎么办？","这是因为 Flask 应用返回了无效的响应类型（SyncPage[Model] 而非 JSON）。\n解决方法是修改 'optillm.py' 文件中的 'proxy_models' 函数，将获取模型的代码从 'client.models.list()' 改为 'client.models.list().json()'，以确保返回正确的 JSON 格式。\n具体代码补丁如下：\n```python\n# 原代码\nmodels_response = client.models.list()\n# 修改后\nmodels_response = client.models.list().json()\n```\n维护者已合并相关修复，建议更新到最新版本或手动应用此补丁。","https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fissues\u002F194",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},15272,"为什么无法在代理模式下使用 'cot_decode' 方法？有什么变通方案吗？","'cot_decode' 方法需要在模型的前向传播（forward pass）过程中获取 logits（对数概率）。\n1. 对于通过 API 调用的闭源模型（如 GPT-4o, Gemini），无法获取 logits，因此该方法不可用。\n2. 对于开源权重模型（如 Llama 3.2），可以通过 PyTorch 直接在代理内部进行推理来实现，但这要求代理运行在带有 GPU 的机器上，否则速度会很慢。\n变通方案：如果你使用的是本地托管的开源模型，可以尝试在本地 GPU 服务器上运行 optillm 代理，以便直接加载模型并获取 logits；或者寻找其他能返回 logits 的 API 接口。","https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fissues\u002F59",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},15273,"安装 z3-solver 失败，是否有替代方案或支持其他求解器？","针对 z3-solver 安装困难的问题，维护者已采取以下措施：\n1. 已将 'sympy' 添加到求解器方法的可用库列表中，作为 z3 的替代或补充。Sympy 通常比 z3 更容易安装。\n2. 虽然 z3 和 sympy 在某些功能上相似，但 sympy 在安装便捷性上有优势。\n3. 目前尚未正式支持 Lean4 或 AlphaGeometry，相关想法建议在讨论区进一步交流。用户可以尝试使用 sympy 来解决类似的符号数学问题。","https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fissues\u002F38",{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},15274,"optillm 是否计划集成 DeepConf 等最新的推理时扩展技术？","关于集成 DeepConf（一种在推理时进行扩展的技术）：\n1. 维护者指出，如果使用 4-bit 量化模型，DeepConf 可能无法很好地工作，因为它至少需要查看 log probs（对数概率），并且需要大量的配置调整。\n2. 目前暂无明确的立即集成计划，主要受限于模型精度要求和配置复杂性。\n3. 
社区欢迎贡献者提交 PR 来实现此功能，但在实施前需充分考虑量化模型带来的局限性。","https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fissues\u002F230",[167,172,177,182,187,192,197,202,207,212,217,222,227,232,237,242,247,252,257,262],{"id":168,"version":169,"summary_zh":170,"released_at":171},89925,"v0.3.14","## 变更内容\n* 由 @codelion 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F297 中修复了 spaCy 的版本约束\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fcompare\u002Fv0.3.13...v0.3.14","2026-03-19T00:18:09",{"id":173,"version":174,"summary_zh":175,"released_at":176},89926,"v0.3.13","## 变更内容\n* 由 @ohpauleez 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F290 中提升了 CePO 功能\n* 由 @codelion 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F294 中修复了 macOS MPS 兼容性问题，并将版本升级至 0.3.13\n\n## 新贡献者\n* @ohpauleez 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F290 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fcompare\u002Fv0.3.12...v0.3.13","2026-01-28T03:24:25",{"id":178,"version":179,"summary_zh":180,"released_at":181},89927,"v0.3.12","## 变更内容\n* 修复默认地址绑定，由 @codelion 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F288 中完成\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fcompare\u002Fv0.3.11...v0.3.12","2025-12-25T04:42:47",{"id":183,"version":184,"summary_zh":185,"released_at":186},89928,"v0.3.11","## 变更内容\n* 修复了网络搜索功能，由 @codelion 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F284 中完成。\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fcompare\u002Fv0.3.10...v0.3.11","2025-12-03T03:49:59",{"id":188,"version":189,"summary_zh":190,"released_at":191},89929,"v0.3.10","## 变更内容\n* 修复 `math_verify` 包名中的拼写错误，由 @theodorosploumis 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F282 中完成\n* 新功能发布，由 @codelion 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F283 中完成\n\n## 新贡献者\n* @theodorosploumis 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F282 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fcompare\u002Fv0.3.9...v0.3.10","2025-11-30T06:17:13",{"id":193,"version":194,"summary_zh":195,"released_at":196},89930,"v0.3.9","## 变更内容\n* 由 @codelion 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F280 中修复了最大令牌数问题\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fcompare\u002Fv0.3.8...v0.3.9","2025-11-20T11:28:51",{"id":198,"version":199,"summary_zh":200,"released_at":201},89931,"v0.3.8","## 变更内容\n* 版本升级至 0.3.8，并由 @codelion 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F277 中添加了 google-cloud-aiplatform\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fcompare\u002Fv0.3.7...v0.3.8","2025-11-20T03:24:11",{"id":203,"version":204,"summary_zh":205,"released_at":206},89932,"v0.3.7","## 变更内容\n* 由 @codelion 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F271 中更新了 
README.md\n* 由 @codelion 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F273 中修复了响应格式为无的问题\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fcompare\u002Fv0.3.6...v0.3.7","2025-11-17T08:33:30",{"id":208,"version":209,"summary_zh":210,"released_at":211},89933,"v0.3.6","## 变更内容\n* 由 @codelion 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F270 中修复了 Docker 构建和 latest 标签问题。\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fcompare\u002Fv0.3.5...v0.3.6","2025-11-08T01:23:15",{"id":213,"version":214,"summary_zh":215,"released_at":216},89934,"v0.3.5","## 变更内容\n* 由 @codelion 在 https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F268 中修复了流式处理的 bug\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fcompare\u002Fv0.3.4...v0.3.5","2025-11-04T10:15:53",{"id":218,"version":219,"summary_zh":220,"released_at":221},89935,"v0.3.4","## What's Changed\r\n* Refactor TTD-DR plugin for improved citation handling by @codelion in https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F261\r\n* Add robust response validation and token config support by @codelion in https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fpull\u002F266\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Falgorithmicsuperintelligence\u002Foptillm\u002Fcompare\u002Fv0.3.3...v0.3.4","2025-11-01T08:27:28",{"id":223,"version":224,"summary_zh":225,"released_at":226},89936,"v0.3.3","## What's Changed\r\n* Feat mars by @codelion in https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fpull\u002F259\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fcompare\u002Fv0.3.2...v0.3.3","2025-10-03T11:12:25",{"id":228,"version":229,"summary_zh":230,"released_at":231},89937,"v0.3.2","## What's Changed\r\n* Add configurable SSL certificate verification support by @codelion in https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fpull\u002F258\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fcompare\u002Fv0.3.1...v0.3.2","2025-09-30T05:17:31",{"id":233,"version":234,"summary_zh":235,"released_at":236},89938,"v0.3.1","## What's Changed\r\n* Feat offline dockerfile by @codelion in https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fpull\u002F257\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fcompare\u002Fv0.3.0...v0.3.1","2025-09-30T03:28:23",{"id":238,"version":239,"summary_zh":240,"released_at":241},89939,"v0.3.0","## What's Changed\r\n* Add multi-transport support to MCP plugin by @codelion in https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fpull\u002F255\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fcompare\u002Fv0.2.10...v0.3.0","2025-09-29T11:36:43",{"id":243,"version":244,"summary_zh":245,"released_at":246},89940,"v0.2.10","## What's Changed\r\n* Cache expensive resources in privacy plugin by @codelion in https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fpull\u002F254\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fcompare\u002Fv0.2.9...v0.2.10","2025-09-29T01:42:19",{"id":248,"version":249,"summary_zh":250,"released_at":251},89941,"v0.2.9","## What's 
Changed\r\n* as by @codelion in https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fpull\u002F248\r\n* Cepo 2025 Q3 by @pawelf-cerebras in https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fpull\u002F250\r\n* Feat new cepo release by @codelion in https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fpull\u002F251\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fcompare\u002Fv0.2.8...v0.2.9","2025-09-27T01:34:11",{"id":253,"version":254,"summary_zh":255,"released_at":256},89942,"v0.2.8","## What's Changed\r\n* Fix proxy clients by @codelion in https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fpull\u002F246\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fcompare\u002Fv0.2.7...v0.2.8","2025-09-09T15:38:12",{"id":258,"version":259,"summary_zh":260,"released_at":261},89943,"v0.2.7","## What's Changed\r\n* Fix support per provider concurrency by @codelion in https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fpull\u002F245\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fcompare\u002Fv0.2.6...v0.2.7","2025-09-09T06:53:19",{"id":263,"version":264,"summary_zh":265,"released_at":266},89944,"v0.2.6","## What's Changed\r\n* Fix proxy timeout by @codelion in https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fpull\u002F244\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fcodelion\u002Foptillm\u002Fcompare\u002Fv0.2.5...v0.2.6","2025-09-09T01:40:18"]