[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-MiroMindAI--MiroThinker":3,"tool-MiroMindAI--MiroThinker":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",146793,2,"2026-04-08T23:32:35",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108111,"2026-04-08T11:23:26",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":109,"forks":110,"last_commit_at":111,"license":112,"difficulty_score":10,"env_os":75,"env_gpu":113,"env_ram":114,"env_deps":115,"category_tags":118,"github_topics":120,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":131,"updated_at":132,"faqs":133,"releases":168},5710,"MiroMindAI\u002FMiroThinker","MiroThinker","MiroThinker is a deep research agent optimized for complex research and prediction tasks. 
Our latest models, MiroThinker-1.7 and MiroThinker-H1, achieve 74.0 and 88.2 on BrowseComp, respectively.","MiroThinker 是一款专为复杂研究与预测任务打造的深度研究智能体。面对海量信息检索难、逻辑推理链条长以及金融趋势预测不准等痛点，它能够自主执行大规模网络搜索、多轮工具调用及深度文档分析，最终生成结构严谨的专业研究报告。\n\n这款工具特别适合需要处理高难度课题的研究人员、追求精准数据的金融分析师，以及希望构建高级自主代理系统的开发者。普通用户也能通过其在线平台轻松获取深度的行业洞察。\n\nMiroThinker 的技术亮点在于其卓越的“交互式扩展”能力与超长上下文处理机制。它支持高达 256K 的上下文窗口，单次任务可执行超过 600 次工具调用，确保在极复杂的任务中不丢失关键细节。其最新模型 MiroThinker-1.7 系列在 BrowseComp 等权威基准测试中表现优异，其中专有模型 H1 更是取得了 88.2 分的高分。值得一提的是，其开源版本在仅使用 300 亿参数的情况下，依然在中文复杂推理任务上刷新了开源模型的最佳纪录，实现了高性能与低成本的完美平衡。无论是本地部署还是在线体验，MiroThinker 都能成为您得力的科研与决策助手。","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_b1110b2f4908.png\" width=\"55%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n\u003Cdiv align=\"center\">\n\n[![MODEL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white)](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmiromind-ai\u002Fmirothinker-17)\n[![Blog](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBlog-4285F4?style=for-the-badge&logo=google-chrome&logoColor=white)](https:\u002F\u002Fmiromind.ai\u002F#blog)\n[![DATA](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FData-0040A1?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fmiromind-ai\u002FMiroVerse-v0.1)\n\n[![GITHUB](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGithub-24292F?style=for-the-badge&logo=github&logoColor=white)](https:\u002F\u002Fgithub.com\u002FMiroMindAI)\n[![WEBSITE](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-4285F4?style=for-the-badge&logo=google-chrome&logoColor=white)](https:\u002F\u002Fmiromind.ai\u002F)\n[![DISCORD](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.com\u002Finvite\u002FGPqEnkzQZd)\n
\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n### 🚀 [Try MiroThinker!](https:\u002F\u002Fdr.miromind.ai\u002F)\n\n\u003C\u002Fdiv>\n\n**MiroThinker**: A deep research agent optimized for research and prediction. It achieves an 88.2 on the challenging BrowseComp benchmark. See [Quick Start](#-quick-start).\n\n\n## 📋 Table of Contents\n\n- 📰 [News & Updates](#-news--updates)\n- 📝 [Introduction](#-introduction)\n- ✨ [Key Features](#-key-features)\n- 📈 [Performance on Benchmarks](#-performance-on-benchmarks)\n- 🚀 [Quick Start](#-quick-start)\n- 📊 [Benchmark Evaluation](#-benchmark-evaluation)\n- 🔬 [Trace Collection](#-trace-collection)\n- ❓ [FAQ & Troubleshooting](#-faq--troubleshooting)\n- 📄 [License](#-license)\n- 🙏 [Acknowledgments](#-acknowledgments)\n\n## 📰 News & Updates\n- **\\[2026-03-11\\]** 🎉🎉🎉 Introducing [MiroThinker-1.7](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmiromind-ai\u002Fmirothinker-17), including [MiroThinker-1.7-mini](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-1.7-mini) and [MiroThinker-1.7](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-1.7). MiroThinker-1.7-mini achieves 72.3 on BrowseComp-ZH, setting a new SOTA among open-source models while using only 30B parameters. Our proprietary agent, MiroThinker-H1, achieves leading performance on BrowseComp and BrowseComp-ZH among open-source and commercial models.\n- **\\[2026-01-23\\]** 🎉 We have brought two important updates to [MiroThinker online](http:\u002F\u002Fdr.miromind.ai): (a) Core Research Report Generation: Deep Research online reports now support generation, preview, and sharing. (b) Extended Document Upload Types: Now supports uploading various file formats, such as `.pdf`, `.doc`, `.ppt`, `.xls`, `.jpg`. Welcome to try it out! MiroThinker will continue to be maintained and iteratively upgraded, with the goal of becoming the best Research Agent you'll ever use! 
\n- **\\[2026-01-05\\]** 🎉🎉 We release [MiroThinker-v1.5](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmiromind-ai\u002Fmirothinker-v15), a series of open-source deep research agents optimized for financial prediction. [MiroThinker-v1.5-30B](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.5-30B) surpasses Kimi-K2-Thinking on BrowseComp-ZH at much lower cost, using only 1\u002F30 of the parameters. [MiroThinker-v1.5-235B](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.5-235B) scores 39.2% on HLE-Text, 69.8% on BrowseComp, 71.5% on BrowseComp-ZH, and 80.8% on GAIA-Val-165, setting a new state-of-the-art among search agents.\n\n\n\u003Cdetails>\n  \u003Csummary>📜 Click to expand older updates\u003C\u002Fsummary>\n\n- **\\[2025-11-13\\]** 🎉 [MiroThinker-v1.0](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmiromind-ai\u002Fmirothinker-v10) is now released! Introducing **interactive scaling** as a third dimension of performance improvement, MiroThinker v1.0 supports 256K context window and up to 600 tool calls per task. Available in 8B, 30B, and 72B parameter scales, achieving 37.7%, 47.1%, 55.6%, and 81.9% on HLE-Text, BrowseComp, BrowseComp-ZH, and GAIA-Text-103, respectively. See [Technical Report](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.11793) for more details.\n- **\\[2025-09-11\\]** MiroThinker-72B-Preview ranked 4th in this week's FutureX benchmark. 
See [FutureX](https:\u002F\u002Ffuturex-ai.github.io\u002F).\n- **\\[2025-09-08\\]** [MiroThinker-v0.2](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmiromind-ai\u002Fmirothinker-v02) is now released, achieving open-source SOTA performance across multiple benchmarks, including HLE (17.8%), HLE-Text-Only (19.1%), BrowseComp-EN (17.2%), BrowseComp-ZH (29.4%), XBench-DeepSearch (56.0%), and Frames (74.8%).\n- **\\[2025-09-07\\]** We supported more benchmarks, including [BrowseComp-ZH](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.19314), [XBench-DeepSearch](https:\u002F\u002Fxbench.org\u002Fagi\u002Faisearch), and [FutureX](https:\u002F\u002Ffuturex-ai.github.io\u002F). We plan to add more benchmarks in the future.\n- **\\[2025-08-22\\]** Introducing streamlined deployment options for MiroThinker with optimized resource usage and faster startup times. Experience the interactive demo: [🚀 Try Gradio Demo](apps\u002Fgradio-demo)\n- **\\[2025-08-08\\]** [MiroThinker-v0.1](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmiromind-ai\u002Fmirothinker-v01-689301b6d0563321862d44a1) released.\n\n\u003C\u002Fdetails>\n\n## 📝 Introduction\n\n### MiroThinker-1.7\nOur new MiroThinker family represents a significant leap in building reliable agents for long-chain tasks. 
Engineered with an enhanced post-training pipeline, the MiroThinker-1.7 family achieves SOTA performance in deep research tasks among open-source models.\n\n\n**Key Features**\n\n- 🚀 MiroThinker-1.7 supports a 256K context window, long-horizon reasoning, and deep multi-step analysis.\n- 🔧 Handles up to 300 tool interactions per task, now with more accurate stepwise reasoning and decision-making.\n- 📦 Released in 30B and 235B parameter scales, accompanied by a comprehensive suite of tools and workflows to flexibly support diverse research settings and compute budgets.\n- Our proprietary agent, MiroThinker-H1, provides promising evidence for long-chain verifiable reasoning — reasoning processes that are step-verifiable and globally verifiable, improving the performance of complex agentic workflows.\n\n\u003Cdiv align=\"center\">\n\n|      Model Name       |         Parameters            | Max Context | Max Tool Calls |                              HF Link                               |\n|:---------------------:|:-----------------------------:|:-----------:|:--------------:|:------------------------------------------------------------------:|\n| MiroThinker-1.7-mini  | 30B   |    256K     |      300       | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-1.7-mini) |\n| MiroThinker-1.7 | 235B |    256K     |      300       | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-1.7) |\n\n\u003C\u002Fdiv>\n\nMiroThinker-1.7 demonstrates strong general-research performance across a broad range of benchmarks, achieving 74.0%, 75.3%, 82.7%, and 42.9% on BrowseComp, BrowseComp-ZH, GAIA-Val-165, and HLE-Text, respectively. 
MiroThinker-1.7 achieves SOTA performance on BrowseComp-ZH.\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_43ea5f118a9d.png)\n\n\n\n\n### MiroThinker-v1.5\n\n\u003Cdetails>\n  \u003Csummary>📦 Click to expand MiroThinker-v1.5 details\u003C\u002Fsummary>\n\nMiroThinker v1.5 is the world-leading open-source search agent that advances tool-augmented reasoning through **interactive scaling** — training the agent to handle deeper and more frequent agent-environment interactions as a third dimension of performance improvement, beyond model size and context length.\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_44e8c904d068.png)\n\n**Key Features**\n\n- 🚀 MiroThinker v1.5 supports a 256K context window, long-horizon reasoning, and deep multi-step analysis.\n- 🔧 Handles up to 400 tool calls per task — a substantial improvement over previous open-source research agents.\n- 📦 Released in 30B and 235B parameter scales, accompanied by a comprehensive suite of tools and workflows to flexibly support diverse research settings and compute budgets.\n\n\u003Cdiv align=\"center\">\n\n|      Agent Name       |         Base Agent            | Max Context | Max Tool Calls |                              HF Link                               |\n|:---------------------:|:-----------------------------:|:-----------:|:--------------:|:------------------------------------------------------------------:|\n| MiroThinker-v1.5-30B  | Qwen3-30B-A3B-Thinking-2507   |    256K     |      400       | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.5-30B) |\n| MiroThinker-v1.5-235B | Qwen3-235B-A22B-Thinking-2507 |    256K     |      400       | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.5-235B) |\n\n\u003C\u002Fdiv>\n\nMiroThinker v1.5 demonstrates strong general-research performance across a broad range of benchmarks, achieving 39.2%, 
69.8%, 71.5%, and 80.8% on HLE-Text, BrowseComp, BrowseComp-ZH, and GAIA-Val-165, respectively. These results surpass previous open-source agents and set the new world-leading BrowseComp performance.\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_099e0eae1e18.png)\n\n\u003C\u002Fdetails>\n\n### MiroThinker-v1.0\n\n\u003Cdetails>\n  \u003Csummary>📦 Click to expand MiroThinker-v1.0 details\u003C\u002Fsummary>\n\nUnlike previous agents that scale only model size or context length, MiroThinker v1.0 introduces **interactive scaling** at the agent level, systematically training the agent to handle deeper and more frequent agent–environment interactions as a third dimension of performance improvement. Interactive scaling leverages environment feedback and external information acquisition to correct errors and refine trajectories.\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_81601a80ecbe.png)\n\n### ✨ Key Features\n\n- 🚀 **256K Context Window**: Supports long-horizon reasoning and deep multi-step analysis\n- 🔧 **600 Tool Calls**: Handles up to 600 tool calls per task — a substantial improvement over previous open-source research agents\n- 📦 **Multiple Scales**: Released in 8B, 30B, and 72B parameter scales, accompanied by a comprehensive suite of tools and workflows to flexibly support diverse research settings and compute budgets\n\n\u003Cdiv align=\"center\">\n\n|      Agent Name      |         Base Agent          | Max Context | Max Tool Calls |                              HF Link                               |\n|:--------------------:|:---------------------------:|:-----------:|:--------------:|:------------------------------------------------------------------:|\n| MiroThinker-v1.0-8B  |        Qwen3-8B             |    256K     |      600       | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.0-8B)  |\n| MiroThinker-v1.0-30B | 
Qwen3-30B-A3B-Thinking-2507 |    256K    |      600       | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.0-30B) |\n| MiroThinker-v1.0-72B |    Qwen2.5-72B-Instruct     |    256K    |      600       | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.0-72B) |\n\n\u003C\u002Fdiv>\n\nMiroThinker v1.0 demonstrates strong general-research performance across a broad range of benchmarks, achieving **37.7%**, **47.1%**, **55.6%**, and **81.9%** on HLE-Text, BrowseComp, BrowseComp-ZH, and GAIA-Text-103, respectively. These results surpass previous open-source agents and narrow the gap with commercial counterparts such as **GPT-5-high**.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_aefd6e80a90e.png\" width=\"100%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\n### MiroThinker-v0.2\n\n\u003Cdetails>\n  \u003Csummary>📦 Click to expand MiroThinker-v0.2 details\u003C\u002Fsummary>\n\nIn this new version, we introduced three key improvements:\n\n- 📚 **Richer training data** from both English and Chinese sources, yielding significant gains in benchmark performance and generalization\n- 🎯 **Unified DPO training** with a single preference dataset across all agents\n- 📏 **Extended context length** from 40k to 64k for more challenging multi-turn tool-use tasks\n\nCompared to v0.1, MiroThinker v0.2 delivers consistent gains across benchmarks. 
For example, scores improved from **57.3 → 64.1** on **GAIA-Text-103** and from **17.0 → 29.4** on **BrowseComp-ZH**, reflecting substantial advancements in the model’s general research agent capabilities.\n\n\u003Cdiv align=\"center\">\n\n|        Agent Name        |      Base Agent       | Max Context |                                HF Link                                 |\n|:------------------------:|:---------------------:|:-----------:|:----------------------------------------------------------------------:|\n| MiroThinker-4B-SFT-v0.2  |       Qwen3-4B        |    64K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-4B-SFT-v0.2)  |\n| MiroThinker-4B-DPO-v0.2  |       Qwen3-4B        |    64K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-4B-DPO-v0.2)  |\n| MiroThinker-8B-SFT-v0.2  |       Qwen3-8B        |    64K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-8B-SFT-v0.2)  |\n| MiroThinker-8B-DPO-v0.2  |       Qwen3-8B        |    64K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-8B-DPO-v0.2)  |\n| MiroThinker-14B-SFT-v0.2 |       Qwen3-14B       |    64K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-14B-SFT-v0.2) |\n| MiroThinker-14B-DPO-v0.2 |       Qwen3-14B       |    64K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-14B-DPO-v0.2) |\n| MiroThinker-32B-SFT-v0.2 |       Qwen3-32B       |    64K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-32B-SFT-v0.2) |\n| MiroThinker-32B-DPO-v0.2 |       Qwen3-32B       |    64K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-32B-DPO-v0.2) |\n\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\n### MiroThinker-v0.1\n\n\u003Cdetails>\n  \u003Csummary>📦 Click to expand MiroThinker-v0.1 details\u003C\u002Fsummary>\n\n\u003Cdiv align=\"center\">\n  
\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_ada3ef2bd3ee.png\" width=\"98%\" alt=\"MiroFlow Performance on GAIA-Validation\" \u002F>\n  \u003Cp>\u003Cstrong>Performance of Open-Source Agents on GAIA-Validation Benchmark.\u003C\u002Fstrong>\u003C\u002Fp>\n\u003C\u002Fdiv>\n\nWe have released the **MiroThinker v0.1** series, including both SFT and DPO variants at parameter scales of **8B**, **14B**, and **32B**. Notably, MiroThinker v0.1 achieves **state-of-the-art performance** among open-source models on the [GAIA benchmark](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fgaia-benchmark\u002FGAIA), a rigorous evaluation suite for advanced agentic capabilities, demonstrating its strength in long-context, decision-intensive, and real-world task scenarios.\n\n\u003Cdiv align=\"center\">\n\n| Agent Name                | Base Agent | Max Context | HF Link                                                               |\n| :-----------------------: |:----------:|:-----------:| :--------------------------------------------------------------------:|\n| MiroThinker-8B-SFT-v0.1   |  Qwen3-8B  |    40K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-8B-SFT-v0.1)  |\n| MiroThinker-8B-DPO-v0.1   |  Qwen3-8B  |    40K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-8B-DPO-v0.1)  |\n| MiroThinker-14B-SFT-v0.1  | Qwen3-14B  |    40K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-14B-SFT-v0.1) |\n| MiroThinker-14B-DPO-v0.1  | Qwen3-14B  |    40K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-14B-DPO-v0.1) |\n| MiroThinker-32B-SFT-v0.1  | Qwen3-32B  |    40K     | [🤗 link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-32B-SFT-v0.1) |\n| MiroThinker-32B-DPO-v0.1  | Qwen3-32B  |    40K     | [🤗 
link](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-32B-DPO-v0.1) |\n\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\n## ✨ Key Features\n\n### 🤖 **MiroThinker-Optimized Framework**\n\n- 🔓 **Fully Open-Source Agent Framework**: Complete transparency with open framework and open agents\n- 🔗 **Tool Integration**: Seamless integration with external tools and APIs\n- 📝 **Trace Collection**: Comprehensive logging and analysis of agent interactions with elapsed time and estimated completion time displayed in minutes. Ready for SFT and DPO\n- 📊 **Benchmark Evaluation**: Extensive testing across multiple benchmark datasets\n\n### 📊 **Comprehensive Benchmark Suite**\n\n\u003Cdetails open>\n  \u003Csummary>📋 Click to expand benchmark list\u003C\u002Fsummary>\n\n- **GAIA Validation**: A benchmark for General AI Assistants. ([paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.12983))\n- **GAIA-Text-103**: A subset of GAIA Validation for text-only tasks. ([paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22648))\n- **HLE**: Humanity's Last Exam. ([paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.14249))\n- **HLE-Text-2158**: A subset of HLE for text-only tasks. ([paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.14249))\n- **HLE-Text-500**: A subset of HLE for text-only tasks, created by [WebThinker](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.21776). ([paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.21776))\n- **BrowseComp-EN**: Web browsing and comprehension tasks. ([paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.12516))\n- **BrowseComp-ZH**: A Chinese version of BrowseComp. ([paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.19314))\n- **WebWalkerQA**: Web navigation and question answering. ([paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.07572))\n- **Frames**: Factuality, Retrieval, And reasoning MEasurement Set. 
([paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.12941))\n- **XBench-DeepSearch**: A benchmark for deep research agents. ([website](https:\u002F\u002Fxbench.org\u002Fagi\u002Faisearch))\n- **FutureX**: A live benchmark designed for predicting the unknown future. ([website](https:\u002F\u002Ffuturex-ai.github.io\u002F))\n- **SEAL-0**: A benchmark for evaluating LLMs on conflicting-evidence web questions. ([paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01062))\n- **AIME2025**: American Invitational Mathematics Examination 2025. ([website](https:\u002F\u002Fartificialanalysis.ai\u002Fevaluations\u002Faime-2025))\n- **DeepSearchQA**: Google's Deep Search Question Answering benchmark. ([paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.20827))\n\n\u003C\u002Fdetails>\n\n## 📈 Performance on Benchmarks\n\n### MiroThinker-1.7\n\n> To prevent potential information leakage (e.g., retrieving benchmark answers from HuggingFace), we blocked access to certain websites during evaluation.\n\n\u003Cdiv>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_f12d4f748c46.png\" width=\"100%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n### MiroThinker-v1.5\n\n\u003Cdetails>\n  \u003Csummary>📦 Click to expand MiroThinker-v1.5 details\u003C\u002Fsummary>\n\n> To prevent potential information leakage (e.g., searching benchmark answers from HuggingFace), access to HuggingFace has been explicitly disabled in these tools.\n\n> We further perform canary string testing on the tool outputs of all trajectories and disregard any trajectory found to be contaminated, treating it as an incorrect answer.\n\n\u003Cdiv>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_5297deca77da.png\" width=\"100%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\n### MiroThinker-v1.0\n\n\u003Cdetails>\n  \u003Csummary>📦 Click to expand 
MiroThinker-v1.0 details\u003C\u002Fsummary>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_2951e00652f7.png\" width=\"100%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\n### MiroThinker-v0.2\n\n\u003Cdetails>\n  \u003Csummary>📦 Click to expand MiroThinker-v0.2 details\u003C\u002Fsummary>\n\n#### Comparison with SOTA Research Agents\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_ab3ed096dfad.png\" width=\"90%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n#### GAIA Benchmark\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_3a0ed8050f59.png\" width=\"80%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\n### MiroThinker-v0.1\n\n\u003Cdetails>\n  \u003Csummary>📦 Click to expand MiroThinker-v0.1 details\u003C\u002Fsummary>\n\n#### GAIA Benchmark\n\n\u003Cdiv align=\"center\">\n\n| **Method**                   | Text-103\u003Cbr>Best Pass@1 | Text-103\u003Cbr>Pass@1 (Avg@8) | Val-165\u003Cbr>Best Pass@1 | Val-165\u003Cbr>Pass@1 (Avg@8) |\n|------------------------------|:-----------------------:|:--------------------------:|:----------------------:|:-------------------------:|\n| **🔹—— 7B\u002F8B Agents ——**     |                         |                            |                        |                           |\n| Search-o1-7B                 |          17.5           |             -              |           -            |             -             |\n| R1-Searcher-7B               |          20.4           |             -              |           -            |             -             |\n| WebDancer-7B                 |          31.0           |             -              |           -            |             -             |\n| WebSailor-7B        
         |          37.9           |             -              |           -            |             -             |\n| CK-Pro-8B                    |          40.3           |             -              |          32.7          |             -             |\n| **MiroThinker-8B-SFT-v0.1**  |          44.7           |            40.1            |          34.6          |           31.8            |\n|     + Commercial Tools       |          46.6           |            42.1            |          37.6          |           33.9            |\n| **MiroThinker-8B-DPO-v0.1**  |          46.6           |            44.8            |          37.0          |           35.4            |\n|     + Commercial Tools       |        **50.5**         |          **46.7**          |        **38.2**        |         **35.9**          |\n| **🔹—— 14B Agents ——**       |                         |                            |                        |                           |\n| **MiroThinker-14B-SFT-v0.1** |          47.6           |            44.4            |          37.0          |           34.4            |\n|     + Commercial Tools       |          49.5           |            47.5            |          41.8          |           39.8            |\n| **MiroThinker-14B-DPO-v0.1** |          48.5           |            46.6            |          42.4          |           39.2            |\n|     + Commercial Tools       |        **52.4**         |          **48.5**          |        **45.5**        |         **42.0**          |\n| **🔹—— 32B Agents ——**       |                         |                            |                        |                           |\n| Qwen3-32B                    |          31.1           |            26.7            |          29.7          |           26.4            |\n| Search-o1-32B                |          28.2           |             -              |           -            |             -             |\n| WebThinker-32B-RL            |    
      48.5           |             -              |           -            |             -             |\n| WebDancer-QwQ-32B            |          51.5           |             -              |           -            |             -             |\n| WebSailor-32B                |          53.2           |             -              |           -            |             -             |\n| WebShaper-QwQ-32B            |          53.3           |             -              |           -            |             -             |\n| **MiroThinker-32B-SFT-v0.1** |          55.3           |            51.3            |          44.9          |           42.7            |\n|     + Commercial Tools       |          58.3           |            54.2            |          48.5          |           45.8            |\n| **MiroThinker-32B-DPO-v0.1** |          57.3           |            54.1            |          48.5          |           45.9            |\n|     + Commercial Tools       |        **60.2**         |          **57.9**          |        **50.9**        |         **48.9**          |\n\n\u003C\u002Fdiv>\n\n1. Following the practices of WebThinker, WebAgents, and CognitiveKernel, we report the Best Pass@1, the highest score across three runs, which often reflects stronger performance, though it may exhibit some variability. To provide a more stable measure, we additionally report Pass@1 (Avg@8), which offers greater consistency at the cost of slightly lower scores.\n\n1. For consistency with prior open-source works, we evaluate GAIA-Text-103 using the WebAgents LLM-as-a-Judge template, and report results on GAIA-Val-165 using the official GAIA scorer script.\n\n1. By default, we use open-source tools wherever possible, except for the code tool [E2B](https:\u002F\u002Fgithub.com\u002Fe2b-dev\u002FE2B) and the Google search tool [Serper](https:\u002F\u002Fserper.dev\u002F). 
We use [Whisper](https:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fwhisper-large-v3-turbo), [Qwen2.5-VL-72B-Instruct](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen2.5-VL-72B-Instruct), and [Qwen3-235B-A22B-Thinking-2507](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-235B-A22B-Thinking-2507) in our implementation. The framework can be easily extended to other open-source tools of your choice.\n\n1. Replacing these open-source tools with commercial alternatives can yield performance gains. Commercial tools were mainly used for multimodal capabilities and certain complex reasoning subtasks. The majority of tasks, including planning, browsing, refinement, navigation, and more, were handled by our agents.\n\n#### More Benchmarks\n\n\u003Cdiv align=\"center\">\n\n| Method                       | HLE\u003Cbr>Pass@1 | Frames\u003Cbr>Pass@1 | BrowseComp\u003Cbr>Pass@1 | BrowseComp-ZH\u003Cbr>Pass@1 | WebWalkerQA\u003Cbr>Pass@1 |\n|------------------------------|:-------------:|:----------------:|:--------------------:|:-----------------------:|:---------------------:|\n| OpenAI Deep Research         |     26.6      |        -         |         51.5         |          42.9           |           -           |\n| Gemini Deep Research         |     26.9      |        -         |          -           |            -            |           -           |\n| Kimi-Researcher              |     26.9      |       78.8       |          -           |            -            |           -           |\n|                              |               |                  |                      |                         |                       |\n| WebDancer-7B                 |       -       |        -         |          -           |            -            |         36.0          |\n| WebSailor-7B                 |       -       |        -         |         6.7          |          14.2           |           -           |\n| **MiroThinker-8B-SFT-v0.1**  |       -       |       
58.0       |         5.5          |           9.3           |         41.3          |\n| **MiroThinker-8B-DPO-v0.1**  |       -       |       64.4       |         8.7          |          13.6           |         45.7          |\n|                              |               |                  |                      |                         |                       |\n| WebThinker-32B-RL            |       -       |        -         |          -           |            -            |         46.5          |\n| WebDancer-QwQ-32B            |       -       |        -         |         3.8          |          18.0           |         47.9          |\n| WebSailor-32B                |       -       |        -         |         10.5         |          25.5           |           -           |\n| WebShaper-32B                |       -       |        -         |          -           |            -            |         51.4          |\n| **MiroThinker-32B-SFT-v0.1** |     10.2      |       70.4       |         10.6         |          13.8           |         45.7          |\n| **MiroThinker-32B-DPO-v0.1** |     11.8      |       71.7       |         13.0         |          17.0           |         49.3          |\n\n\u003C\u002Fdiv>\n\n1. MiroThinker’s performance was tested with this repository and open-source tools; other agents’ results are from their papers and official sites.\n\n1. As [MiroVerse-v0.1](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fmiromind-ai\u002FMiroVerse-v0.1) mainly contains English data, the agent’s Chinese capability is limited. 
We plan to add more Chinese data to improve performance in the next version.\n\n\u003C\u002Fdetails>\n\n## 🚀 Quick Start\n\nFor best results, we recommend running MiroThinker with this tool-enabled agent framework and with thinking mode enabled.\n\n### Prerequisites\n\n- 🐍 **Python 3.10+**\n- 📦 **uv package manager** ([Installation guide](https:\u002F\u002Fgithub.com\u002Fastral-sh\u002Fuv))\n- 🔑 **Required API keys** (see the configuration section below)\n\n### Installation\n\n```bash\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002FMiroMindAI\u002FMiroThinker\ncd MiroThinker\n\n# Set up the environment\ncd apps\u002Fmiroflow-agent\nuv sync\n\n# Configure API keys\ncp .env.example .env\n# Edit .env with your API keys (SERPER_API_KEY, JINA_API_KEY, E2B_API_KEY, etc.)\n```\n\n> **📝 Environment Variables**: See the [Tool Configuration](#tool-configuration) section for required API keys.\n\n### Tool Configuration\n\n#### Minimal Configuration for MiroThinker-1.7\n\n| Server | Description | Tools Provided | Required Environment Variables |\n|:-------|:------------|:---------------|:-------------------------------|\n| **`tool-python`** | Execution environment and file management (E2B sandbox) | `create_sandbox`, `run_command`, `run_python_code`, `upload_file_from_local_to_sandbox`, `download_file_from_sandbox_to_local`, `download_file_from_internet_to_sandbox` | `E2B_API_KEY` |\n| **`search_and_scrape_webpage`** | Google search via Serper API | `google_search` | `SERPER_API_KEY`, `SERPER_BASE_URL` |\n| **`jina_scrape_llm_summary`** | Web scraping with LLM-based information extraction | `scrape_and_extract_info` | `JINA_API_KEY`, `JINA_BASE_URL`, `SUMMARY_LLM_BASE_URL`, `SUMMARY_LLM_MODEL_NAME`, `SUMMARY_LLM_API_KEY` |\n\n**Minimal `.env` configuration example:**\n\n```bash\n# Required keys (minimal 
setup)\nSERPER_API_KEY=your_serper_key\nSERPER_BASE_URL=\"https:\u002F\u002Fgoogle.serper.dev\"\nJINA_API_KEY=your_jina_key\nJINA_BASE_URL=\"https:\u002F\u002Fr.jina.ai\"\nE2B_API_KEY=your_e2b_key\n\n# Required for jina_scrape_llm_summary\n# Note: Summary LLM can be a small model (e.g., Qwen3-14B or GPT-5-Nano)\n# The choice has minimal impact on performance; use what's most convenient\nSUMMARY_LLM_BASE_URL=\"https:\u002F\u002Fyour_summary_llm_base_url\u002Fv1\u002Fchat\u002Fcompletions\"\nSUMMARY_LLM_MODEL_NAME=your_llm_model_name  # e.g., \"Qwen\u002FQwen3-14B\" or \"gpt-5-nano\"\nSUMMARY_LLM_API_KEY=your_llm_api_key  # Optional, depends on LLM provider\n\n# Required for benchmark evaluation (LLM-as-a-Judge)\nOPENAI_API_KEY=your_openai_key  # Required for running benchmark evaluations\nOPENAI_BASE_URL=\"https:\u002F\u002Fapi.openai.com\u002Fv1\"  # Optional, defaults to OpenAI's API\n```\n\n> **💡 Why this is minimal**: These three MCP servers cover the core capabilities needed for research tasks: web search, content extraction, and code execution. All other servers are optional enhancements.\n>\n> **🤖 Summary LLM**: The `SUMMARY_LLM` can be a small model like Qwen3-14B or GPT-5-Nano. The choice has minimal impact on overall performance; use whichever is most convenient for your setup.\n>\n> **📊 For Benchmark Evaluation**: If you plan to run benchmark evaluations, you also need `OPENAI_API_KEY` (and optionally `OPENAI_BASE_URL`) for the LLM-as-a-Judge functionality used in the evaluation scripts.\n>\n> **🖼️ For GAIA Multimodal Tasks**: GAIA-Val-165 includes tasks with image\u002Faudio\u002Fvideo files. Since MiroThinker is a text-only LLM, GPT-4o is used to pre-process these files into text descriptions. 
The same `OPENAI_API_KEY` is used for both this preprocessing and LLM-as-a-Judge.\n>\n> **📖 For more details**: See [MiroFlow Tools README](libs\u002Fmiroflow-tools\u002FREADME.md) for complete documentation of all available tools.\n\n\u003Cdetails>\n  \u003Csummary>🔧 Click to expand additional available tools\u003C\u002Fsummary>\n\nThe following optional tools are available but were not used in MiroThinker v1.0-1.7 evaluation:\n\n| Server Name          | Type         | Description                                 |\n|:---------------------|:-------------|:--------------------------------------------|\n| `tool-vqa`           | Commercial   | Vision processing using Claude              |\n| `tool-vqa-os`        | Open-Source  | Vision processing (open-source alternative) |\n| `tool-transcribe`    | Commercial   | Audio transcription using OpenAI            |\n| `tool-transcribe-os` | Open-Source  | Audio transcription using Whisper           |\n| `tool-reasoning`     | Commercial   | Reasoning engine using Claude               |\n| `tool-reasoning-os`  | Open-Source  | Reasoning engine (open-source alternative)  |\n| `tool-reading`       | Open-Source  | Document reading using MarkItDown           |\n| `tool-google-search` | Commercial   | Web search using Google + scraping          |\n| `tool-sogou-search` | Commercial   | Web search using Sogou (Chinese)           |\n\n> **📖 Local Deployment**: For instructions on deploying open-source tools (`tool-vqa-os`, `tool-transcribe-os`, `tool-reasoning-os`) locally, see [Local Tool Deployment Guide](assets\u002FLOCAL-TOOL-DEPLOYMENT.md).\n\nSee the [MiroFlow Tools README](libs\u002Fmiroflow-tools\u002FREADME.md) for complete documentation of all available tools.\n\n\u003C\u002Fdetails>\n\n#### Pre-configured Agent Settings\n\nThe `apps\u002Fmiroflow-agent\u002Fconf\u002Fagent\u002F` directory contains several pre-configured agent settings. 
Each configuration uses different tools and requires corresponding environment variables in your `.env` file.\n\n> **💡 Recommended**: For MiroThinker-1.7, use `mirothinker_1.7_keep5_max200` (with context management, recommended for most tasks) or `mirothinker_1.7_keep5_max300` (used only for BrowseComp and BrowseComp-ZH).\n\n| Configuration                          | Description | Max Turns | Context Retention | Required Environment Variables                                                                                                                               | Recommended For |\n|:---------------------------------------|:------------|:----------|:------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------|\n| **`mirothinker_1.7_keep5_max200`** ⭐  | Single-agent with context management | 200 | Keep 5 most recent | `SERPER_API_KEY`, `SERPER_BASE_URL`, `JINA_API_KEY`, `JINA_BASE_URL`, `E2B_API_KEY`, `SUMMARY_LLM_BASE_URL`, `SUMMARY_LLM_MODEL_NAME`, `SUMMARY_LLM_API_KEY` | **1.7 (recommended for most tasks)** |\n| **`mirothinker_1.7_keep5_max300`** ⭐  | Single-agent with context management | 300 | Keep 5 most recent | Same as above                                                                                                                              | **1.7 (for BrowseComp & BrowseComp-ZH)** |\n\n\u003Cdetails>\n  \u003Csummary>📦 Click to expand legacy configurations (v0.1\u002Fv0.2)\u003C\u002Fsummary>\n\n| Configuration            | Description | Max Turns | Context Retention | Required Environment Variables | Recommended For |\n|:-------------------------|:------------|:----------|:------------------|:-------------------------------|:----------------|\n| **`mirothinker_v1.5_keep5_max200`**  | Single-agent with context management | 200 | Keep 5 most recent | `SERPER_API_KEY`, `SERPER_BASE_URL`, `JINA_API_KEY`, 
`JINA_BASE_URL`, `E2B_API_KEY`, `SUMMARY_LLM_BASE_URL`, `SUMMARY_LLM_MODEL_NAME`, `SUMMARY_LLM_API_KEY` | **v1.5 (recommended for most tasks)** |\n| **`mirothinker_v1.5_keep5_max400`**  | Single-agent with context management | 400 | Keep 5 most recent | Same as above                                                                                                                              | **v1.5 (for BrowseComp & BrowseComp-ZH)** |\n| **`mirothinker_v1.5`**                 | Single-agent for MiroThinker v1.5 | 600 | Keep all results | Same as above | **v1.5** |\n| **`mirothinker_v1.0_keep5`**           | Single-agent with context management | 600 | Keep 5 most recent | Same as above                                                                                                                                   | **v1.0** |\n| **`mirothinker_v1.0`**                 | Single-agent for MiroThinker v1.0 | 600 | Keep all results | Same as above | **v1.0** |\n| **`multi_agent`**        | Multi-agent with commercial tools (v0.1\u002Fv0.2) | 50 | Keep all results | `E2B_API_KEY`, `ANTHROPIC_API_KEY`, `ANTHROPIC_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_BASE_URL`, `SERPER_API_KEY`, `SERPER_BASE_URL`, `JINA_API_KEY`, `JINA_BASE_URL` | v0.1\u002Fv0.2 |\n| **`multi_agent_os`**     | Multi-agent with open-source tools (v0.1\u002Fv0.2) | 50 | Keep all results | `E2B_API_KEY`, `VISION_API_KEY`, `VISION_BASE_URL`, `VISION_MODEL_NAME`, `WHISPER_API_KEY`, `WHISPER_BASE_URL`, `WHISPER_MODEL_NAME`, `REASONING_API_KEY`, `REASONING_BASE_URL`, `REASONING_MODEL_NAME`, `SERPER_API_KEY`, `SERPER_BASE_URL`, `JINA_API_KEY`, `JINA_BASE_URL` | v0.1\u002Fv0.2 |\n\n\u003C\u002Fdetails>\n\n> **💡 Note**: All environment variables are listed in `apps\u002Fmiroflow-agent\u002F.env.example`. 
Copy it to `.env` and fill in the values for the tools you plan to use.\n\n#### Creating Custom Tool Configurations\n\n\u003Cdetails>\n  \u003Csummary>🔧 Click to expand custom tool configuration guide\u003C\u002Fsummary>\n\nYou can create your own YAML configuration file to freely combine MCP servers. Here's how:\n\n1. **Create a new YAML file** in `apps\u002Fmiroflow-agent\u002Fconf\u002Fagent\u002F`:\n\n```yaml\n# conf\u002Fagent\u002Fmy_custom_config.yaml\ndefaults:\n  - default\n  - _self_\n\nmain_agent:\n  tools:\n    - tool-python                    # Execution environment\n    - search_and_scrape_webpage      # Google search\n    - jina_scrape_llm_summary        # Web scraping with LLM\n    - tool-vqa                       # Vision processing (optional)\n    - tool-transcribe                # Audio processing (optional)\n    - tool-reasoning                 # Reasoning engine (optional)\n    - tool-reading                   # Document reading (optional)\n  max_turns: 300  # Maximum number of turns\n\nsub_agents:\n  agent-browsing:  # Optional sub-agent\n    tools:\n      - tool-google-search\n      - tool-vqa\n      - tool-reading\n      - tool-python\n    max_turns: 50\n\nkeep_tool_result: -1  # Context retention budget: -1 keeps all tool results, or specify K to keep only the K most recent tool responses\n```\n\n> **💡 Context Retention Strategy**: The `keep_tool_result` parameter implements a **recency-based context retention** strategy. In the standard ReAct paradigm, all tool outputs are retained in the message history, which can lead to inefficient context utilization. Empirically, we observe that the agent's subsequent actions depend primarily on recent observations rather than distant ones. 
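As a concrete illustration of this retention rule, here is a minimal sketch (not the repository's implementation; the message dictionaries and the `tool` role are assumed for illustration):

```python
# Sketch of recency-based context retention (assumed message format:
# dicts with a "role" key, where tool outputs use role == "tool").
def trim_tool_results(messages, keep_tool_result):
    """Keep only the last `keep_tool_result` tool responses; -1 keeps all."""
    if keep_tool_result < 0:
        return list(messages)
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    keep = set(tool_indices[-keep_tool_result:]) if keep_tool_result else set()
    trimmed = []
    for i, m in enumerate(messages):
        if m["role"] != "tool" or i in keep:
            trimmed.append(m)
        else:
            # Mask the stale observation with a placeholder so the
            # thought/action sequence stays intact.
            trimmed.append({"role": "tool", "content": "[older tool result omitted]"})
    return trimmed

history = [
    {"role": "assistant", "content": "search the web"},
    {"role": "tool", "content": "result 1"},
    {"role": "assistant", "content": "open the top result"},
    {"role": "tool", "content": "result 2"},
]
print(trim_tool_results(history, 1)[1]["content"])  # → [older tool result omitted]
```

Only the older observation payloads are masked; every assistant thought and action remains in the history, mirroring the behavior described in the note above.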
This strategy retains only the most recent K tool responses (where K is the `keep_tool_result` value) while preserving the complete sequence of thoughts and actions.\n>\n> **Benefits:**\n>\n> - ✅ Preserves the reasoning and action trace\n> - ✅ Focuses the agent's attention on the most contextually relevant observations\n> - ✅ Frees additional context space for extended reasoning and deeper tool-use trajectories\n> - ✅ Does not lead to performance degradation while allowing more context space for interactive scaling\n>\n> **Usage:** Set `keep_tool_result: -1` to keep all tool results, or specify a positive integer K (e.g., `keep_tool_result: 5`) to keep only the K most recent tool responses.\n\n2. **Use your custom configuration** when running evaluations:\n\n```bash\ncd apps\u002Fmiroflow-agent\nuv run main.py llm=qwen-3 agent=my_custom_config llm.base_url=https:\u002F\u002Fyour_base_url\u002Fv1\n```\n\n3. **Configure environment variables** in `.env` based on the tools you use.\n\n   All available environment variables are listed in `apps\u002Fmiroflow-agent\u002F.env.example`. 
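Before launching, a quick presence check for the minimal variables can catch an incomplete `.env` early. This is an illustrative sketch, not part of the repository; the variable names come from the minimal-configuration table above:

```python
import os

# Minimal variables from the table above; extend this list to match
# whichever agent configuration you choose.
REQUIRED = [
    "SERPER_API_KEY",
    "JINA_API_KEY",
    "E2B_API_KEY",
    "SUMMARY_LLM_BASE_URL",
    "SUMMARY_LLM_MODEL_NAME",
]

def missing_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

print(missing_vars({"SERPER_API_KEY": "x"}))  # reports the other four names
```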
Copy it to `.env` and configure the variables according to your chosen configuration:\n\n   ```bash\n   cd apps\u002Fmiroflow-agent\n   cp .env.example .env\n   # Edit .env with your actual API keys\n   ```\n\n   **For MiroThinker v1.5** (`mirothinker_v1.5_keep5_max200.yaml`, `mirothinker_v1.5_keep5_max400.yaml`, or `mirothinker_v1.5.yaml`) and **v1.0** (`mirothinker_v1.0_keep5.yaml` or `mirothinker_v1.0.yaml`), see the [Minimal Configuration](#minimal-configuration-for-mirothinker-17) section above for the complete configuration example.\n\n   **For other configurations**, refer to the [Pre-configured Agent Settings](#pre-configured-agent-settings) table above to see which environment variables are required.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>🔑 Click to expand optional API keys\u003C\u002Fsummary>\n\n```bash\n# API for LLM-as-a-Judge (required for benchmark evaluation)\nOPENAI_API_KEY=your_openai_key\nOPENAI_BASE_URL=\"https:\u002F\u002Fapi.openai.com\u002Fv1\"  # Optional, defaults to OpenAI's API\n\n# API for Open-Source Audio Transcription Tool (for benchmark testing, optional)\nWHISPER_MODEL_NAME=\"openai\u002Fwhisper-large-v3-turbo\"\nWHISPER_API_KEY=your_whisper_key\nWHISPER_BASE_URL=\"https:\u002F\u002Fyour_whisper_base_url\u002Fv1\"\n\n# API for Open-Source VQA Tool (for benchmark testing, optional)\nVISION_MODEL_NAME=\"Qwen\u002FQwen2.5-VL-72B-Instruct\"\nVISION_API_KEY=your_vision_key\nVISION_BASE_URL=\"https:\u002F\u002Fyour_vision_base_url\u002Fv1\u002Fchat\u002Fcompletions\"\n\n# API for Open-Source Reasoning Tool (for benchmark testing, optional)\nREASONING_MODEL_NAME=\"Qwen\u002FQwen3-235B-A22B-Thinking-2507\"\nREASONING_API_KEY=your_reasoning_key\nREASONING_BASE_URL=\"https:\u002F\u002Fyour_reasoning_base_url\u002Fv1\u002Fchat\u002Fcompletions\"\n\n# API for Claude 3.7 Sonnet as a commercial tool (optional)\nANTHROPIC_API_KEY=your_anthropic_key\n\n# API for Sogou Search 
(optional)\nTENCENTCLOUD_SECRET_ID=your_tencent_cloud_secret_id\nTENCENTCLOUD_SECRET_KEY=your_tencent_cloud_secret_key\n\n# API for Summary LLM (can use small models like Qwen3-14B or GPT-5-Nano)\nSUMMARY_LLM_BASE_URL=\"https:\u002F\u002Fyour_summary_llm_base_url\u002Fv1\u002Fchat\u002Fcompletions\"\nSUMMARY_LLM_MODEL_NAME=your_summary_llm_model_name  # e.g., \"Qwen\u002FQwen3-14B\" or \"gpt-5-nano\"\nSUMMARY_LLM_API_KEY=your_summary_llm_api_key\n```\n\n\u003C\u002Fdetails>\n\n### Serve the MiroThinker Agent\n\n#### Option 1 (Recommended): Serve with SGLang or vLLM\n\nFor example, use SGLang to serve a MiroThinker model on port 61002:\n\n```bash\nNUM_GPUS=4\nPORT=61002\n\n# Download the model from Hugging Face\nAGENT_PATH=miromind-ai\u002FMiroThinker-1.7-mini\n\npython3 -m sglang.launch_server \\\n    --model-path $AGENT_PATH \\\n    --tp $NUM_GPUS \\\n    --dp 1 \\\n    --host 0.0.0.0 \\\n    --port $PORT \\\n    --trust-remote-code\n```\n\n> **📍 Server URL**: This will start a server at `http:\u002F\u002F0.0.0.0:$PORT`. 
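Once the server is up, you can sanity-check the OpenAI-compatible endpoint before wiring it into the agent. The snippet below is a hypothetical example (the model name and port follow the SGLang command above; SGLang exposes the standard `/v1/chat/completions` route):

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt):
    """Build a request for an OpenAI-compatible /chat/completions endpoint."""
    url = base_url.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request(
    "http://0.0.0.0:61002/v1",
    "miromind-ai/MiroThinker-1.7-mini",
    "Hello!",
)
print(req.full_url)  # → http://0.0.0.0:61002/v1/chat/completions
# With the server running, urllib.request.urlopen(req) returns the completion.
```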
Use this as your server base URL (e.g., `http:\u002F\u002F0.0.0.0:61002\u002Fv1`).\n\n#### Option 2: Quantized Lightweight Serving\n\nWe also provide guidance for serving MiroThinker with CPU-optimized and GPU-accelerated quantization, including deployment notes for llama.cpp, Ollama, SGLang, and other inference frameworks.\n\n> **📖 Complete Guide**: See the [Deployment Documentation](apps\u002Fgradio-demo\u002F) for detailed deployment instructions.\n\n### Run Your First Task\n\nAfter setting up the environment and starting your server, run `main.py` to test with a default question: *\"What is the title of today's arxiv paper in computer science?\"*\n\n```bash\ncd apps\u002Fmiroflow-agent\n\n# Using MiroThinker agents (requires your own server)\nuv run python main.py llm=qwen-3 agent=mirothinker_1.7_keep5_max200 llm.base_url=http:\u002F\u002Flocalhost:61002\u002Fv1\n\n# Or using Claude (requires ANTHROPIC_API_KEY in .env)\nuv run python main.py llm=claude-3-7 agent=single_agent_keep5\n\n# Or using GPT-5 (requires OPENAI_API_KEY in .env)\nuv run python main.py llm=gpt-5 agent=single_agent_keep5\n```\n\n**To customize your question**, edit `main.py` line 32:\n\n```python\ntask_description = \"Your custom question here\"\n```\n\nThe agent will search the web, execute code if needed, and provide an answer with sources.\n\n> **📖 More details**: See [apps\u002Fmiroflow-agent\u002FREADME.md](apps\u002Fmiroflow-agent\u002FREADME.md) for available configurations and troubleshooting.\n\n## 📊 Benchmark Evaluation\n\n> For researchers who want to reproduce our benchmark results or evaluate on standard benchmarks.\n\n### Download Benchmark Data\n\n```bash\ncd MiroThinker  # Back to project root\nwget https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fmiromind-ai\u002FMiroFlow-Benchmarks\u002Fresolve\u002Fmain\u002Fdata_20251115_password_protected.zip\nunzip data_20251115_password_protected.zip\n# Password: 
pf4*\nrm data_20251115_password_protected.zip\n```\n\n### Run Benchmark Evaluation\n\n> **Note:** For MiroThinker-1.7, use `mirothinker_1.7_keep5_max200` (recommended for most tasks) or `mirothinker_1.7_keep5_max300` (for BrowseComp and BrowseComp-ZH).\n\n**Available Parameters:**\n\nYou can customize the evaluation by setting the following environment variables before running the script:\n\n| Parameter | Default | Description |\n|:----------|:--------|:------------|\n| `LLM_MODEL` | `\"MiroThinker-Agents\"` | Agent name identifier |\n| `BASE_URL` | `\"https:\u002F\u002Fyour-api.com\u002Fv1\"` | Base URL of your server |\n| `NUM_RUNS` | Varies by benchmark | Number of evaluation runs (3 for most benchmarks, 8 for GAIA\u002FXBench\u002FFutureX\u002FSEAL-0, 32 for AIME2025) |\n| `LLM_PROVIDER` | `\"qwen\"` | LLM provider (e.g., `qwen`, `openai`, `anthropic`) |\n| `AGENT_SET` | `\"mirothinker_1.7_keep5_max200\"` | Agent configuration (e.g., `mirothinker_1.7_keep5_max200`, `mirothinker_1.7_keep5_max300`) 
|\n| `MAX_CONTEXT_LENGTH` | `262144` | Maximum context length (256K) |\n| `MAX_CONCURRENT` | `10` | Maximum concurrent tasks |\n| `PASS_AT_K` | `1` | Pass@K evaluation metric |\n| `TEMPERATURE` | `1.0` | Sampling temperature |\n| `API_KEY` | `\"xxx\"` | API key for the server |\n\n**Example Usage:**\n\n```bash\n# Navigate to the miroflow-agent directory first\ncd apps\u002Fmiroflow-agent\n\n# Basic usage with v1.7 (recommended)\nNUM_RUNS=8 LLM_MODEL=\"MiroThinker-1.7-mini\" BASE_URL=\"https:\u002F\u002Fyour-api.com\u002Fv1\" bash scripts\u002Frun_evaluate_multiple_runs_gaia-validation-text-103.sh\n\n# Or with v1.0\n# NUM_RUNS=8 LLM_MODEL=\"MiroThinker-v1.0-30B\" BASE_URL=\"https:\u002F\u002Fyour-api.com\u002Fv1\" bash scripts\u002Frun_evaluate_multiple_runs_gaia-validation-text-103.sh\n\n# Customize number of runs and agent configuration (v1.7 with context management)\nLLM_MODEL=\"MiroThinker-1.7-mini\" \\\nBASE_URL=\"https:\u002F\u002Fyour-api.com\u002Fv1\" \\\nNUM_RUNS=8 \\\nAGENT_SET=\"mirothinker_1.7_keep5_max200\" \\\nbash scripts\u002Frun_evaluate_multiple_runs_gaia-validation-text-103.sh\n```\n\n\u003Cdetails open>\n  \u003Csummary>📋 Click to expand all benchmark commands\u003C\u002Fsummary>\n\n> **⚠️ Important for MiroThinker-1.7**: To reproduce our reported results, you must set the correct `AGENT_SET`:\n>\n> - **BrowseComp & BrowseComp-ZH**: Use `AGENT_SET=\"mirothinker_1.7_keep5_max300\"`\n> - **All other benchmarks**: Use `AGENT_SET=\"mirothinker_1.7_keep5_max200\"`\n\n```bash\n# Navigate to the miroflow-agent directory first\ncd apps\u002Fmiroflow-agent\n\n# HLE\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_hle.sh\n\n# HLE-Text-2158\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_hle-text-2158.sh\n\n# HLE-Text-500\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" 
AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_hle-text-500.sh\n\n# GAIA-Text-103\nNUM_RUNS=8 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_gaia-validation-text-103.sh\n\n# GAIA-Validation (GAIA-Val-165)\nNUM_RUNS=8 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_gaia-validation.sh\n\n# BrowseComp-EN (⚠️ use max300)\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max300\" bash scripts\u002Frun_evaluate_multiple_runs_browsecomp.sh\n\n# BrowseComp-ZH (⚠️ use max300)\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max300\" bash scripts\u002Frun_evaluate_multiple_runs_browsecomp_zh.sh\n\n# WebWalkerQA\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_webwalkerqa.sh\n\n# XBench-DeepSearch\nNUM_RUNS=8 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_xbench_deepsearch.sh\n\n# FRAMES\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_frames.sh\n\n# SEAL-0\nNUM_RUNS=8 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_seal-0.sh\n\n# FutureX\nNUM_RUNS=8 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_futurex.sh\n\n# AIME2025\nNUM_RUNS=32 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_aime2025.sh\n\n# DeepSearchQA\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash 
scripts\u002Frun_evaluate_multiple_runs_deepsearchqa.sh\n```\n\n\u003C\u002Fdetails>\n\n#### Monitor Evaluation Progress\n\n\u003Cdetails>\n  \u003Csummary>📊 Click to expand progress monitoring commands\u003C\u002Fsummary>\n\n```bash\n# Navigate to the miroflow-agent directory first\ncd apps\u002Fmiroflow-agent\n\n# For HLE\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_hle.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# For HLE-Text-2158\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_hle-text-2158.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# For HLE-Text-500\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_hle-text-500.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# For BrowseComp-EN\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_browsecomp.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# For BrowseComp-ZH\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_browsecomp_zh.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# For GAIA-Validation\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_gaia-validation.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# For GAIA-Text-103\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_gaia-validation-text-103.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# For WebWalkerQA\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_webwalkerqa.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# For Frames\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_frames.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# For XBench-DeepSearch\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_xbench_deepsearch.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# For SEAL-0\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_seal-0.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# For AIME2025\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_aime2025.py 
\u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# For DeepSearchQA\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_deepsearchqa.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n```\n\n\u003C\u002Fdetails>\n\n## 🔬 Trace Collection\n\n\u003Cdetails>\n\u003Csummary>📋 Click to expand trace collection commands\u003C\u002Fsummary>\n\n```bash\ncd apps\u002Fcollect-trace\n\n# Collect Traces for SFT\nbash scripts\u002Fcollect_trace_claude37.sh\nbash scripts\u002Fcollect_trace_gpt5.sh\n\n# Collect Traces for DPO\nbash scripts\u002Fcollect_trace_qwen3.sh\n```\n\n\u003C\u002Fdetails>\n\n## ❓ FAQ & Troubleshooting\n\n### Common Issues\n\n\u003Cdetails>\n  \u003Csummary>🔧 Click to expand troubleshooting guide\u003C\u002Fsummary>\n\n#### **Q: Which version should I use?**\n\n**A:** We recommend **MiroThinker-1.7** ⭐ with the minimal configuration:\n\n- **v1.7** ⭐: Latest version with 256K context and world-leading performance. Use one of these configs (with context management):\n  - `mirothinker_1.7_keep5_max200` (up to 200 turns, recommended for most tasks)\n  - `mirothinker_1.7_keep5_max300` (up to 300 turns, used only for BrowseComp and BrowseComp-ZH)\n\n#### **Q: How do I get API keys?**\n\n**A:** You need these keys for the minimal setup:\n\n- **SERPER_API_KEY**: Get from [Serper.dev](https:\u002F\u002Fserper.dev\u002F) (Google search API)\n- **JINA_API_KEY**: Get from [Jina.ai](https:\u002F\u002Fjina.ai\u002F) (Web scraping)\n- **E2B_API_KEY**: Get from [E2B.dev](https:\u002F\u002Fe2b.dev\u002F) (Code execution sandbox)\n- **SUMMARY_LLM_API_KEY**: Your LLM API credentials (for content summarization). Can be a small model like Qwen3-14B or GPT-5-Nano; the choice has minimal impact on performance.\n- **OPENAI_API_KEY**: Get from [OpenAI](https:\u002F\u002Fplatform.openai.com\u002F) (Required for benchmark evaluation, used for LLM-as-a-Judge)\n- **OPENAI_BASE_URL**: Optional, defaults to `https:\u002F\u002Fapi.openai.com\u002Fv1`. 
Can be changed to use OpenAI-compatible APIs.\n\n#### **Q: Agent server connection errors**\n\n**A:** Common issues:\n\n- **Check base URL format**: Should end with `\u002Fv1` (e.g., `https:\u002F\u002Fyour-api.com\u002Fv1`)\n- **Verify API key**: Ensure `API_KEY` is set correctly in the environment or script\n- **Check server status**: Make sure your server is running and accessible\n- **Network issues**: Verify firewall\u002Fnetwork settings allow connections\n\n#### **Q: Evaluation script fails to run**\n\n**A:** Troubleshooting steps:\n\n1. **Check working directory**: Make sure you're in the `apps\u002Fmiroflow-agent` directory\n1. **Verify environment**: Run `uv sync` to ensure dependencies are installed\n1. **Check .env file**: Ensure all required environment variables are set\n1. **Review logs**: Check the `logs\u002F` directory for detailed error messages\n1. **Verify data path**: Ensure benchmark data is downloaded and in the correct location\n\n#### **Q: Out of memory errors**\n\n**A:** Solutions:\n\n- **Reduce context length**: Set `MAX_CONTEXT_LENGTH` to a smaller value (e.g., 131072 for 128K)\n- **Use context management with fewer turns**:\n  - For v1.7: Use `mirothinker_1.7_keep5_max200` or `mirothinker_1.7_keep5_max300` (with context management)\n- **Reduce concurrent tasks**: Set `MAX_CONCURRENT` to a smaller number (e.g., 5)\n- **Use smaller agents**:\n  - For v1.5: Try 30B instead of 235B\n  - For v1.0: Try 8B or 30B instead of 72B\n\n#### **Q: Tool execution errors**\n\n**A:** Common fixes:\n\n- **E2B errors**: Verify `E2B_API_KEY` is valid and your account has credits\n- **Serper errors**: Check `SERPER_API_KEY` and rate limits\n- **Jina errors**: Verify `JINA_API_KEY` and `JINA_BASE_URL` are correct\n- **LLM summarization errors**: Check the `SUMMARY_LLM_*` variables and agent availability\n\n#### **Q: How to monitor long-running evaluations?**\n\n**A:** Use the progress monitoring scripts:\n\n```bash\ncd apps\u002Fmiroflow-agent\npython 
benchmarks\u002Fcheck_progress\u002Fcheck_progress_\u003Cbenchmark_name>.py \u002Fpath\u002Fto\u002Flogs\n```\n\nThe scripts show completion status, elapsed time, and estimated remaining time.\n\n\u003C\u002Fdetails>\n\n### Getting Help\n\n- 📖 **Documentation**: Check the [MiroFlow Tools README](libs\u002Fmiroflow-tools\u002FREADME.md) for tool details\n- 💬 **Discord**: Join our [Discord community](https:\u002F\u002Fdiscord.com\u002Finvite\u002FGPqEnkzQZd)\n- 🐛 **Issues**: Report bugs on [GitHub Issues](https:\u002F\u002Fgithub.com\u002FMiroMindAI\u002FMiroThinker\u002Fissues)\n- 📧 **Contact**: Visit [our website](https:\u002F\u002Fmiromind.ai\u002F) for more information\n\n## 📄 License\n\nThis project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\nWe extend our sincere gratitude to:\n\n- 🏆 **Benchmark Contributors** for the comprehensive evaluation datasets\n- 🌍 **Open Source Community** for the tools and libraries that make this possible\n- 👥 **All Contributors** who have helped make MiroThinker better\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMiroMindAI\u002FMiroThinker\u002Fgraphs\u002Fcontributors\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_0f707f772df9.png\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\nJoin our community and help us build the future of AI agents!\n\n### References\n\nIf you find this project useful in your research, please consider citing:\n\n**MiroThinker** (Model & Method)\n```bibtex\n@article{miromind2026mirothinker,\n  title={MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification},\n  author={MiroMind Team and Bai, S. and Bing, L. and Lei, L. and Li, R. and Li, X. and Lin, X. and Min, E. and Su, L. and Wang, B. and Wang, L. and Wang, L. and Wang, S. and Wang, X. and Zhang, Y. and Zhang, Z. 
and others},\n  journal={arXiv preprint arXiv:2603.15726},\n  year={2026}\n}\n\n@article{miromind2025mirothinker,\n  title={MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling},\n  author={MiroMind Team and Bai, Song and Bing, Lidong and Chen, Carson and Chen, Guanzheng and Chen, Yuntao and Chen, Zhe and Chen, Ziyi and Dong, Xuan and others},\n  journal={arXiv preprint arXiv:2511.11793},\n  year={2025}\n}\n```\n\n**MiroFlow** (Framework)\n```bibtex\n@article{miromind2026miroflow,\n  title={MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks},\n  author={Su, Shiqian and Xing, Sen and Dong, Xuan and Zhong, Muyan and Wang, Bin and Zhu, Xizhou and Chen, Yuntao and Wang, Wenhai and Deng, Yue and Zhu, Pengxiang and others},\n  journal={arXiv preprint arXiv:2602.22808},\n  year={2026}\n}\n```\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_a1effcfbbcd0.png)](https:\u002F\u002Fstar-history.com\u002F#MiroMindAI\u002FMiroThinker&Date)\n","\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_b1110b2f4908.png\" width=\"55%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n\u003Cdiv 
align=\"center\">\n\n[![MODEL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white)](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmiromind-ai\u002Fmirothinker-17)\n[![Blog](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBlog-4285F4?style=for-the-badge&logo=google-chrome&logoColor=white)](https:\u002F\u002Fmiromind.ai\u002F#blog)\n[![DATA](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FData-0040A1?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fmiromind-ai\u002FMiroVerse-v0.1)\n\n[![GITHUB](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGithub-24292F?style=for-the-badge&logo=github&logoColor=white)](https:\u002F\u002Fgithub.com\u002FMiroMindAI)\n[![WEBSITE](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-4285F4?style=for-the-badge&logo=google-chrome&logoColor=white)](https:\u002F\u002Fmiromind.ai\u002F)\n[![DISCORD](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.com\u002Finvite\u002FGPqEnkzQZd)\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n### 🚀 [体验 MiroThinker！](https:\u002F\u002Fdr.miromind.ai\u002F)\n\n\u003C\u002Fdiv>\n\n**MiroThinker**：一款专为研究与预测优化的深度研究代理。它在极具挑战性的 BrowseComp 基准测试中取得了 88.2 的成绩。请参阅 [快速入门](#-quick-start)。\n\n## 📋 目录\n\n- 📰 [新闻与更新](#-news--updates)\n- 📝 [简介](#-introduction)\n- ✨ [关键特性](#-key-features)\n- 📈 [基准测试表现](#-performance-on-benchmarks)\n- 🚀 [快速入门](#-quick-start)\n- 📊 [基准评估](#-benchmark-evaluation)\n- 🔬 [轨迹收集](#-trace-collection)\n- ❓ [常见问题与故障排除](#-faq--troubleshooting)\n- 📄 [许可证](#-license)\n- 🙏 [致谢](#-acknowledgments)\n\n## 📰 新闻与更新\n- **[2026-03-11]** 🎉🎉🎉 推出 [MiroThinker-1.7](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmiromind-ai\u002Fmirothinker-17)，包括 [MiroThinker-1.7-mini](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-1.7-mini) 和 
[MiroThinker-1.7](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-1.7)。MiroThinker-1.7-mini 在 BrowseComp-ZH 上取得了 72.3 的成绩，仅使用 300 亿参数便刷新了开源模型的 SOTA 记录。而我们的专有代理 MiroThinker-H1 则在 BrowseComp 和 BrowseComp-ZH 上均位居开源及商用模型之首。\n- **\\[2026-01-23\\]** 🎉 我们为 [MiroThinker online](http:\u002F\u002Fdr.miromind.ai) 带来了两项重要更新：(a) 核心研究报告生成：在线深度研究报告现支持生成、预览与分享。(b) 扩展文档上传类型：现已支持上传多种文件格式，如 `.pdf`、`.doc`、`.ppt`、`.xls`、`.jpg` 等。欢迎试用！MiroThinker 将持续维护并迭代升级，致力于成为您用过的最佳研究代理！\n- **\\[2026-01-05\\]** 🎉🎉 我们发布了 [MiroThinker-v1.5](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmiromind-ai\u002Fmirothinker-v15)，这是一系列专为金融预测优化的开源深度研究代理。其中，[MiroThinker-v1.5-30B](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.5-30B) 以远低于 Kimi-K2-Thinking 的成本，在 BrowseComp-ZH 上超越了后者，且仅使用其 1\u002F30 的参数量。而 [MiroThinker-v1.5-235B](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.5-235B) 则在 HLE-Text 上获得了 39.2% 的成绩，在 BrowseComp 上为 69.8%，在 BrowseComp-ZH 上为 71.5%，在 GAIA-Val-165 上更是达到了 80.8%，一举刷新了搜索类代理的最新技术水平。\n\n\n\u003Cdetails>\n  \u003Csummary>📜 点击展开历史更新\u003C\u002Fsummary>\n\n- **\\[2025-11-13\\]** 🎉 [MiroThinker-v1.0](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmiromind-ai\u002Fmirothinker-v10) 现已发布！我们引入了“交互式扩展”这一性能提升的第三维度，MiroThinker v1.0 支持 256K 的上下文窗口，并可在单个任务中执行多达 600 次工具调用。该版本提供 80 亿、300 亿和 720 亿参数三种规模，分别在 HLE-Text、BrowseComp、BrowseComp-ZH 和 GAIA-Text-103 上取得了 37.7%、47.1%、55.6% 和 81.9% 的成绩。更多详情请参阅 [技术报告](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.11793)。\n- **\\[2025-09-11\\]** MiroThinker-72B-Preview 在本周的 FutureX 基准测试中位列第 4。详情请见 [FutureX](https:\u002F\u002Ffuturex-ai.github.io\u002F)。\n- **\\[2025-09-08\\]** [MiroThinker-v0.2](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmiromind-ai\u002Fmirothinker-v02) 已正式发布，其在多个基准测试中均取得了开源领域的 SOTA 成绩，包括 HLE（17.8%）、HLE-Text-Only（19.1%）、BrowseComp-EN（17.2%）、BrowseComp-ZH（29.4%）、XBench-DeepSearch（56.0%）以及 Frames（74.8%）。\n- **\\[2025-09-07\\]** 我们新增了多项基准测试，包括 
[BrowseComp-ZH](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.19314)、[XBench-DeepSearch](https:\u002F\u002Fxbench.org\u002Fagi\u002Faisearch) 以及 [FutureX](https:\u002F\u002Ffuturex-ai.github.io\u002F)。未来我们还将继续增加更多基准测试项目。\n- **\\[2025-08-22\\]** 我们推出了针对 MiroThinker 的精简部署方案，优化了资源使用并缩短了启动时间。立即体验互动演示：[🚀 试用 Gradio 演示](apps\u002Fgradio-demo)\n- **\\[2025-08-08\\]** [MiroThinker-v0.1](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmiromind-ai\u002Fmirothinker-v01-689301b6d0563321862d44a1) 正式发布。\n\n\u003C\u002Fdetails>\n\n## 📝 简介\n\n### MiroThinker-1.7\n我们全新的 MiroThinker 系列标志着构建可靠长链任务代理方面的一次重大飞跃。凭借强化的后训练流程，MiroThinker-1.7 系列在开源模型中实现了深度研究任务的 SOTA 表现。\n\n\n**关键特性**\n\n- 🚀 MiroThinker-1.7 支持 256K 的上下文窗口、长时序推理以及深入的多步分析。\n- 🔧 每个任务最多可进行 300 次工具交互，同时具备更精准的逐步推理与决策能力。\n- 📦 提供 300 亿和 2350 亿参数两种规模，并配备全面的工具集和工作流，灵活适应不同的研究场景与计算预算。\n- 我们的专有代理 MiroThinker-H1 为长链可验证推理提供了有力证据——即每一步均可验证、全局也可验证的推理过程，从而显著提升了复杂代理工作流的性能。\n\n\u003Cdiv align=\"center\">\n\n|      模型名称       |         参数            | 最大上下文 | 最大工具调用次数 |                              HF 链接                               |\n|:---------------------:|:-----------------------------:|:-----------:|:--------------:|:------------------------------------------------------------------:|\n| MiroThinker-1.7-mini  | 300 亿   |    256K     |      300       | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-1.7-mini) |\n| MiroThinker-1.7 | 2350 亿 |    256K     |      300       | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-1.7) |\n\n\u003C\u002Fdiv>\n\nMiroThinker-1.7 在广泛的基准测试中展现了强大的通用研究能力，分别在 BrowseComp、BrowseComp-ZH、GAIA-Val-165 和 HLE-Text 上取得了 74.0%、75.3%、82.7% 和 42.9% 的成绩。尤其值得一提的是，MiroThinker-1.7 在 BrowseComp-ZH 上达到了 SOTA 水平。\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_43ea5f118a9d.png)\n\n### MiroThinker-v1.5\n\n\u003Cdetails>\n  \u003Csummary>📦 点击展开 MiroThinker-v1.5 的详细信息\u003C\u002Fsummary>\n\nMiroThinker v1.5 是全球领先的开源搜索代理，通过 
**交互式扩展** 推动工具增强型推理——训练代理以处理更深入、更频繁的代理与环境交互，将其作为性能提升的第三个维度，超越模型规模和上下文长度。\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_44e8c904d068.png)\n\n**核心特性**\n\n- 🚀 MiroThinker v1.5 支持 256K 上下文窗口，实现长时序推理和深度多步分析。\n- 🔧 每个任务最多可调用 400 次工具——相比之前的开源研究代理有显著提升。\n- 📦 分别以 30B 和 235B 参数规模发布，并配备一套全面的工具和工作流，灵活支持多样化的研究场景和算力预算。\n\n\u003Cdiv align=\"center\">\n\n|      代理名称       |         基础模型            | 最大上下文 | 最大工具调用次数 |                              Hugging Face 链接                               |\n|:---------------------:|:-----------------------------:|:-----------:|:--------------:|:------------------------------------------------------------------:|\n| MiroThinker-v1.5-30B  | Qwen3-30B-A3B-Thinking-2507   |    256K     |      400       | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.5-30B) |\n| MiroThinker-v1.5-235B | Qwen3-235B-A22B-Thinking-2507 |    256K     |      400       | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.5-235B) |\n\n\u003C\u002Fdiv>\n\nMiroThinker v1.5 在广泛的基准测试中展现出强大的通用研究性能，在 HLE-Text、BrowseComp、BrowseComp-ZH 和 GAIA-Val-165 上分别达到 39.2%、69.8%、71.5% 和 80.8%。这些结果超越了此前的开源代理，并刷新了 BrowseComp 的世界纪录。\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_099e0eae1e18.png)\n\n\u003C\u002Fdetails>\n\n### MiroThinker-v1.0\n\n\u003Cdetails>\n  \u003Csummary>📦 点击展开 MiroThinker-v1.0 的详细信息\u003C\u002Fsummary>\n\n与仅通过扩大模型规模或上下文长度进行扩展的早期代理不同，MiroThinker v1.0 在代理层面引入了 **交互式扩展**，系统性地训练代理以应对更深、更频繁的代理与环境交互，从而形成性能提升的第三维度。交互式扩展利用环境反馈和外部信息获取来纠正错误并优化路径。\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_81601a80ecbe.png)\n\n### ✨ 核心特性\n\n- 🚀 **256K 上下文窗口**：支持长时序推理和深度多步分析\n- 🔧 **600 次工具调用**：每个任务最多可调用 600 次工具——相比之前的开源研究代理有显著提升\n- 📦 **多种规模**：分别以 8B、30B 和 72B 参数规模发布，并配备全面的工具和工作流，灵活支持不同的研究场景和算力预算\n\n\u003Cdiv align=\"center\">\n\n|      代理名称      |         基础模型          | 最大上下文 | 最大工具调用次数 |      
                        Hugging Face 链接                               |\n|:--------------------:|:---------------------------:|:-----------:|:--------------:|:------------------------------------------------------------------:|\n| MiroThinker-v1.0-8B  |        Qwen3-8B             |    256K     |      600       | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.0-8B)  |\n| MiroThinker-v1.0-30B | Qwen3-30B-A3B-Thinking-2507 |    256K    |      600       | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.0-30B) |\n| MiroThinker-v1.0-72B |    Qwen2.5-72B-Instruct     |    256K    |      600       | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-v1.0-72B) |\n\n\u003C\u002Fdiv>\n\nMiroThinker v1.0 在一系列基准测试中表现出色，在 HLE-Text、BrowseComp、BrowseComp-ZH 和 GAIA-Text-103 上分别达到 **37.7%**、**47.1%**、**55.6%** 和 **81.9%**。这些成绩不仅超越了以往的开源代理，还进一步缩小了与商业级模型如 **GPT-5-high** 的差距。\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_aefd6e80a90e.png\" width=\"100%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\n### MiroThinker-v0.2\n\n\u003Cdetails>\n  \u003Csummary>📦 点击展开 MiroThinker-v0.2 的详细信息\u003C\u002Fsummary>\n\n在这个新版本中，我们引入了三项关键改进：\n\n- 📚 来自英语和中文的 **更丰富的训练数据**，显著提升了基准测试表现和泛化能力\n- 🎯 所有代理统一使用 **单一偏好数据集进行 DPO 训练**\n- 📏 将 **上下文长度从 40K 扩展到 64K**，以更好地应对更具挑战性的多轮工具使用任务\n\n与 v0.1 相比，MiroThinker v0.2 在各项基准测试中均取得了稳定提升。例如，在 **GAIA-Text-103** 上，得分从 **57.3 → 64.1**；在 **BrowseComp-ZH** 上，则从 **17.0 → 29.4**，充分体现了模型作为通用研究代理能力的显著进步。\n\n\u003Cdiv align=\"center\">\n\n|        代理名称        |      基础模型       | 最大上下文 |                                Hugging Face 链接                                 |\n|:------------------------:|:---------------------:|:-----------:|:----------------------------------------------------------------------:|\n| MiroThinker-4B-SFT-v0.2  |       Qwen3-4B        |    64K     | [🤗 
链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-4B-SFT-v0.2)  |\n| MiroThinker-4B-DPO-v0.2  |       Qwen3-4B        |    64K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-4B-DPO-v0.2)  |\n| MiroThinker-8B-SFT-v0.2  |       Qwen3-8B        |    64K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-8B-SFT-v0.2)  |\n| MiroThinker-8B-DPO-v0.2  |       Qwen3-8B        |    64K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-8B-DPO-v0.2)  |\n| MiroThinker-14B-SFT-v0.2 |       Qwen3-14B       |    64K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-14B-SFT-v0.2) |\n| MiroThinker-14B-DPO-v0.2 |       Qwen3-14B       |    64K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-14B-DPO-v0.2) |\n| MiroThinker-32B-SFT-v0.2 |       Qwen3-32B       |    64K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-32B-SFT-v0.2) |\n| MiroThinker-32B-DPO-v0.2 |       Qwen3-32B       |    64K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-32B-DPO-v0.2) |\n\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\n### MiroThinker-v0.1\n\n\u003Cdetails>\n  \u003Csummary>📦 点击展开 MiroThinker-v0.1 的详细信息\u003C\u002Fsummary>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_ada3ef2bd3ee.png\" width=\"98%\" alt=\"MiroFlow 在 GAIA-Validation 上的性能\" \u002F>\n  \u003Cp>\u003Cstrong>开源智能体在 GAIA-Validation 基准测试上的性能。\u003C\u002Fstrong>\u003C\u002Fp>\n\u003C\u002Fdiv>\n\n我们发布了 **MiroThinker v0.1** 系列模型，包括 SFT 和 DPO 两种版本，参数规模分别为 **8B**、**14B** 和 **32B**。值得注意的是，MiroThinker v0.1 在 [GAIA 基准](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fgaia-benchmark\u002FGAIA) 上取得了开源模型中的 **最先进性能**。GAIA 是一个针对高级智能体能力的严格评估基准，能够充分展示模型在长上下文、决策密集型以及真实世界任务场景中的强大能力。\n\n\u003Cdiv align=\"center\">\n\n| 智能体名称              
  | 基础模型 | 最大上下文 | Hugging Face 链接                                                               |\n| :-----------------------: |:----------:|:-----------:| :--------------------------------------------------------------------:|\n| MiroThinker-8B-SFT-v0.1   |  Qwen3-8B  |    40K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-8B-SFT-v0.1)  |\n| MiroThinker-8B-DPO-v0.1   |  Qwen3-8B  |    40K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-8B-DPO-v0.1)  |\n| MiroThinker-14B-SFT-v0.1  | Qwen3-14B  |    40K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-14B-SFT-v0.1) |\n| MiroThinker-14B-DPO-v0.1  | Qwen3-14B  |    40K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-14B-DPO-v0.1) |\n| MiroThinker-32B-SFT-v0.1  | Qwen3-32B  |    40K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-32B-SFT-v0.1) |\n| MiroThinker-32B-DPO-v0.1  | Qwen3-32B  |    40K     | [🤗 链接](https:\u002F\u002Fhuggingface.co\u002Fmiromind-ai\u002FMiroThinker-32B-DPO-v0.1) |\n\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\n## ✨ 核心特性\n\n### 🤖 **MiroThinker 优化框架**\n\n- 🔓 **完全开源的智能体框架**：框架与智能体全部开源，实现完全透明\n- 🔗 **工具集成**：可无缝集成外部工具和 API\n- 📝 **轨迹记录**：全面记录并分析智能体的交互过程，显示耗时及预计完成时间（以分钟为单位）。支持 SFT 和 DPO 训练\n- 📊 **基准评测**：在多个基准数据集上进行广泛测试\n\n### 📊 **全面的基准测试套件**\n\n\u003Cdetails open>\n  \u003Csummary>📋 点击展开基准列表\u003C\u002Fsummary>\n\n- **GAIA Validation**：通用人工智能助手的基准测试。（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.12983)）\n- **GAIA-Text-103**：GAIA Validation 中仅针对文本任务的子集。（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22648)）\n- **HLE**：人类终极考试。（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.14249)）\n- **HLE-Text-2158**：HLE 中仅针对文本任务的子集。（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.14249)）\n- **HLE-Text-500**：HLE 中仅针对文本任务的子集，由 [WebThinker](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.21776) 
创建。（[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.21776)）\n- **BrowseComp-EN**：网页浏览与理解任务。（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.12516)）\n- **BrowseComp-ZH**：BrowseComp 的中文版。（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.19314)）\n- **WebWalkerQA**：网页导航与问答任务。（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.07572)）\n- **Frames**：事实性、检索与推理综合测评集。（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.12941)）\n- **XBench-DeepSearch**：深度研究型智能体的基准测试。（[官网](https:\u002F\u002Fxbench.org\u002Fagi\u002Faisearch)）\n- **FutureX**：用于预测未知未来的实时基准测试。（[官网](https:\u002F\u002Ffuturex-ai.github.io\u002F)）\n- **SEAL-0**：评估 LLM 在具有冲突证据的网络问题上的表现的基准测试。（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01062)）\n- **AIME2025**：2025 年美国邀请数学竞赛。（[官网](https:\u002F\u002Fartificialanalysis.ai\u002Fevaluations\u002Faime-2025)）\n- **DeepSearchQA**：谷歌的深度搜索问答基准测试。（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.20827)）\n\n\u003C\u002Fdetails>\n\n## 📈 基准测试性能\n\n### MiroThinker-1.7\n\n> 为防止潜在的信息泄露（例如从 HuggingFace 获取基准答案），我们在评估过程中屏蔽了对某些网站的访问。\n\n\u003Cdiv>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_f12d4f748c46.png\" width=\"100%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n### MiroThinker-v1.5\n\n\u003Cdetails>\n  \u003Csummary>📦 点击展开 MiroThinker-v1.5 的详细信息\u003C\u002Fsummary>\n\n> 为防止潜在的信息泄露（例如从 HuggingFace 搜索基准答案），这些工具已明确禁用了对 HuggingFace 的访问权限。\n\n> 我们进一步对所有轨迹的工具输出进行了金丝雀字符串测试，并将任何被污染的轨迹视为错误答案而予以排除。\n\n\u003Cdiv>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_5297deca77da.png\" width=\"100%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\n### MiroThinker-v1.0\n\n\u003Cdetails>\n  \u003Csummary>📦 点击展开 MiroThinker-v1.0 的详细信息\u003C\u002Fsummary>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_2951e00652f7.png\" 
width=\"100%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\n### MiroThinker-v0.2\n\n\u003Cdetails>\n  \u003Csummary>📦 点击展开 MiroThinker-v0.2 的详细信息\u003C\u002Fsummary>\n\n#### 与 SOTA 研究型智能体的对比\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_ab3ed096dfad.png\" width=\"90%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n#### GAIA 基准测试\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FMiroMindAI_MiroThinker_readme_3a0ed8050f59.png\" width=\"80%\" alt=\"MiroThinker\" \u002F>\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\n### MiroThinker-v0.1\n\n\u003Cdetails>\n  \u003Csummary>📦 点击展开 MiroThinker-v0.1 的详细信息\u003C\u002Fsummary>\n\n#### GAIA 基准测试\n\n\u003Cdiv align=\"center\">\n\n| **方法**                   | Text-103\u003Cbr>最佳通过率@1 | Text-103\u003Cbr>通过率@1（平均@8） | Val-165\u003Cbr>最佳通过率@1 | Val-165\u003Cbr>通过率@1（平均@8） |\n|------------------------------|:-----------------------:|:--------------------------:|:----------------------:|:-------------------------:|\n| **🔹—— 7B\u002F8B 代理 ——**     |                         |                            |                        |                           |\n| Search-o1-7B                 |          17.5           |             -              |           -            |             -             |\n| R1-Searcher-7B               |          20.4           |             -              |           -            |             -             |\n| WebDancer-7B                 |          31.0           |             -              |           -            |             -             |\n| WebSailor-7B                 |          37.9           |             -              |           -            |             -             |\n| CK-Pro-8B                    |          40.3           |             -              |          32.7          |             -             |\n| 
**MiroThinker-8B-SFT-v0.1**  |          44.7           |            40.1            |          34.6          |           31.8            |\n|     + 商业工具       |          46.6           |            42.1            |          37.6          |           33.9            |\n| **MiroThinker-8B-DPO-v0.1**  |          46.6           |            44.8            |          37.0          |           35.4            |\n|     + 商业工具       |        **50.5**         |          **46.7**          |        **38.2**        |         **35.9**          |\n| **🔹—— 14B 代理 ——**       |                         |                            |                        |                           |\n| **MiroThinker-14B-SFT-v0.1** |          47.6           |            44.4            |          37.0          |           34.4            |\n|     + 商业工具       |          49.5           |            47.5            |          41.8          |           39.8            |\n| **MiroThinker-14B-DPO-v0.1** |          48.5           |            46.6            |          42.4          |           39.2            |\n|     + 商业工具       |        **52.4**         |          **48.5**          |        **45.5**        |         **42.0**          |\n| **🔹—— 32B 代理 ——**       |                         |                            |                        |                           |\n| Qwen3-32B                    |          31.1           |            26.7            |          29.7          |           26.4            |\n| Search-o1-32B                |          28.2           |             -              |           -            |             -             |\n| WebThinker-32B-RL            |          48.5           |             -              |           -            |             -             |\n| WebDancer-QwQ-32B            |          51.5           |             -              |           -            |             -             |\n| WebSailor-32B                |          53.2           |             
-              |           -            |             -             |\n| WebShaper-QwQ-32B            |          53.3           |             -              |           -            |             -             |\n| **MiroThinker-32B-SFT-v0.1** |          55.3           |            51.3            |          44.9          |           42.7            |\n|     + 商业工具       |          58.3           |            54.2            |          48.5          |           45.8            |\n| **MiroThinker-32B-DPO-v0.1** |          57.3           |            54.1            |          48.5          |           45.9            |\n|     + 商业工具       |        **60.2**         |          **57.9**          |        **50.9**        |         **48.9**          |\n\n\u003C\u002Fdiv>\n\n1. 借鉴 WebThinker、WebAgents 和 CognitiveKernel 的做法，我们报告了 Best Pass@1，即三次运行中的最高分，这通常反映了更强的性能，尽管可能会有一定的波动。为了提供更稳定的指标，我们还报告了 Pass@1（Avg@8），它在牺牲一点分数的情况下提供了更高的稳定性。\n\n1. 为与先前的开源工作保持一致，我们使用 WebAgents 的 LLM-as-a-Judge 模板来评估 GAIA-Text-103，并使用官方的 GAIA 评分脚本来报告 GAIA-Val-165 的结果。\n\n1. 默认情况下，我们尽可能使用开源工具，除了代码工具 [E2B](https:\u002F\u002Fgithub.com\u002Fe2b-dev\u002FE2B) 和 Google 搜索工具 [Serper](https:\u002F\u002Fserper.dev\u002F)。我们在实现中使用了 [Whisper](https:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fwhisper-large-v3-turbo)、[Qwen2.5-VL-72B-Instruct](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen2.5-VL-72B-Instruct) 和 [Qwen3-235B-A22B-Thinking-2507](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-235B-A22B-Thinking-2507)。该框架可以轻松扩展到您选择的其他开源工具。\n\n1. 
将这些开源工具替换为商业替代品可以带来性能提升。商业工具主要用于多模态能力和某些复杂的推理子任务。而大多数任务，包括规划、浏览、细化、导航等，都由我们的代理来处理。\n\n#### 更多基准测试\n\n\u003Cdiv align=\"center\">\n\n| 方法 | HLE\u003Cbr>通过率@1 | Frames\u003Cbr>通过率@1 | BrowseComp\u003Cbr>通过率@1 | BrowseComp-ZH\u003Cbr>通过率@1 | WebWalkerQA\u003Cbr>通过率@1 |\n|------------------------------|:-------------:|:----------------:|:--------------------:|:-----------------------:|:---------------------:|\n| OpenAI Deep Research | 26.6 | - | 51.5 | 42.9 | - |\n| Gemini Deep Research | 26.9 | - | - | - | - |\n| Kimi-Researcher | 26.9 | 78.8 | - | - | - |\n| | | | | | |\n| WebDancer-7B | - | - | - | - | 36.0 |\n| WebSailor-7B | - | - | 6.7 | 14.2 | - |\n| **MiroThinker-8B-SFT-v0.1** | - | 58.0 | 5.5 | 9.3 | 41.3 |\n| **MiroThinker-8B-DPO-v0.1** | - | 64.4 | 8.7 | 13.6 | 45.7 |\n| | | | | | |\n| WebThinker-32B-RL | - | - | - | - | 46.5 |\n| WebDancer-QwQ-32B | - | - | 3.8 | 18.0 | 47.9 |\n| WebSailor-32B | 
-       |        -         |         10.5         |          25.5           |           -           |\n| WebShaper-32B                |       -       |        -         |          -           |            -            |         51.4          |\n| **MiroThinker-32B-SFT-v0.1** |     10.2      |       70.4       |         10.6         |          13.8           |         45.7          |\n| **MiroThinker-32B-DPO-v0.1** |     11.8      |       71.7       |         13.0         |          17.0           |         49.3          |\n\n\u003C\u002Fdiv>\n\n1. MiroThinker 的性能测试使用了本仓库和开源工具；其他智能体的结果则来自其论文和官方网站。\n\n1. 由于 [MiroVerse-v0.1](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fmiromind-ai\u002FMiroVerse-v0.1) 主要包含英文数据，因此该智能体的中文能力较为有限。我们计划在下一版本中加入更多中文数据以提升其表现。\n\n\u003C\u002Fdetails>\n\n\n\n## 🚀 快速入门\n\n为获得最佳使用效果，我们建议将 MiroThinker 与本工具支持的智能体框架结合使用，并启用思考模式。\n\n### 先决条件\n\n- 🐍 **Python 3.10+**\n- 📦 **uv 包管理器**（[安装指南](https:\u002F\u002Fgithub.com\u002Fastral-sh\u002Fuv)）\n- 🔑 **所需 API 密钥**（详见下方配置部分）\n\n### 安装步骤\n\n```bash\n# 克隆仓库\ngit clone https:\u002F\u002Fgithub.com\u002FMiroMindAI\u002FMiroThinker\ncd MiroThinker\n\n# 设置环境\ncd apps\u002Fmiroflow-agent\nuv sync\n\n# 配置 API 密钥\ncp .env.example .env\n# 编辑 .env 文件，填入您的 API 密钥（SERPER_API_KEY、JINA_API_KEY、E2B_API_KEY 等）\n```\n\n> **📝 环境变量**：所需 API 密钥请参阅【工具配置】章节。\n\n### 工具配置\n\n#### MiroThinker-1.7 的最小配置\n\n| 服务 | 描述 | 提供的工具 | 所需环境变量 |\n|:-------|:------------|:---------------|:-------------------------------|\n| **`tool-python`** | 执行环境与文件管理（E2B 沙盒） | `create_sandbox`、`run_command`、`run_python_code`、`upload_file_from_local_to_sandbox`、`download_file_from_sandbox_to_local`、`download_file_from_internet_to_sandbox` | `E2B_API_KEY` |\n| **`search_and_scrape_webpage`** | 使用 Serper API 进行谷歌搜索 | `google_search` | `SERPER_API_KEY`、`SERPER_BASE_URL` |\n| **`jina_scrape_llm_summary`** | 基于 LLM 的网页抓取与信息提取 | `scrape_and_extract_info` | `JINA_API_KEY`、`JINA_BASE_URL`、`SUMMARY_LLM_BASE_URL`、`SUMMARY_LLM_MODEL_NAME`、`SUMMARY_LLM_API_KEY` 
|\n\n**最小 `.env` 配置示例：**\n\n```bash\n# 适用于 MiroThinker v1.5 和 v1.0（最小化设置）\nSERPER_API_KEY=your_serper_key\nSERPER_BASE_URL=\"https:\u002F\u002Fgoogle.serper.dev\"\nJINA_API_KEY=your_jina_key\nJINA_BASE_URL=\"https:\u002F\u002Fr.jina.ai\"\nE2B_API_KEY=your_e2b_key\n\n# 用于 jina_scrape_llm_summary\n# 注意：摘要 LLM 可以是小型模型（如 Qwen3-14B 或 GPT-5-Nano）\n# 选择对性能影响较小，可根据实际情况选用\nSUMMARY_LLM_BASE_URL=\"https:\u002F\u002Fyour_summary_llm_base_url\u002Fv1\u002Fchat\u002Fcompletions\"\nSUMMARY_LLM_MODEL_NAME=your_llm_model_name  # 例如 \"Qwen\u002FQwen3-14B\" 或 \"gpt-5-nano\"\nSUMMARY_LLM_API_KEY=your_llm_api_key  # 可选，取决于 LLM 提供商\n\n# 运行基准评测所需（LLM-as-a-Judge）\nOPENAI_API_KEY=your_openai_key  # 运行基准评测时必需\nOPENAI_BASE_URL=\"https:\u002F\u002Fapi.openai.com\u002Fv1\"  # 可选，默认为 OpenAI 的 API\n\n```\n\n> **💡 为什么这是最小配置**：这 3 个 MCP 服务器涵盖了研究任务所需的核心能力：网页搜索、内容提取和代码执行。其他服务器均为可选的增强功能。\n>\n> **🤖 总结 LLM**：`SUMMARY_LLM` 可以是 Qwen3-14B 或 GPT-5-Nano 等小型模型。选择对整体性能影响很小，可根据你的设置方便性来决定使用哪一种。\n>\n> **📊 基准评测用**：如果你计划运行基准评测，则还需要 `OPENAI_API_KEY`（以及可选的 `OPENAI_BASE_URL`），用于评测脚本中使用的 LLM-as-a-Judge 功能。\n>\n> **🖼️ GAIA 多模态任务用**：GAIA-Val-165 包含图像\u002F音频\u002F视频文件的任务。由于 MiroThinker 是纯文本 LLM，因此使用 GPT-4o 将这些文件预处理成文本描述。同样的 `OPENAI_API_KEY` 同时用于此预处理和 LLM-as-a-Judge。\n>\n> **📖 更多详情**：请参阅 [MiroFlow 工具 README](libs\u002Fmiroflow-tools\u002FREADME.md)，了解所有可用工具的完整文档。\n\n\u003Cdetails>\n  \u003Csummary>🔧 点击展开更多可用工具\u003C\u002Fsummary>\n\n以下是一些可选工具，但在 MiroThinker v1.0-1.7 的评估中并未使用：\n\n| 服务器名称          | 类型         | 描述                                 |\n|:---------------------|:-------------|:--------------------------------------------|\n| `tool-vqa`           | 商业         | 使用 Claude 进行视觉处理              |\n| `tool-vqa-os`        | 开源         | 视觉处理（开源替代方案）             |\n| `tool-transcribe`    | 商业         | 使用 OpenAI 进行音频转录            |\n| `tool-transcribe-os` | 开源         | 使用 Whisper 进行音频转录           |\n| `tool-reasoning`     | 商业         | 使用 Claude 的推理引擎              |\n| `tool-reasoning-os`  | 开源         | 推理引擎（开源替代方案） 
|\n| `tool-reading`       | 开源 | 使用 MarkItDown 阅读文档 |\n| `tool-google-search` | 商业 | 使用 Google 搜索并抓取网页 |\n| `tool-sogou-search`  | 商业 | 使用搜狗进行网页搜索（中文） |\n\n> **📖 本地部署**：有关如何在本地部署开源工具（`tool-vqa-os`、`tool-transcribe-os`、`tool-reasoning-os`）的说明，请参阅 [本地工具部署指南](assets\u002FLOCAL-TOOL-DEPLOYMENT.md)。\n\n所有可用工具的完整文档请参阅 [MiroFlow 工具 README](libs\u002Fmiroflow-tools\u002FREADME.md)。\n\n\u003C\u002Fdetails>\n\n#### 预配置的智能体设置\n\n`apps\u002Fmiroflow-agent\u002Fconf\u002Fagent\u002F` 目录包含若干预配置的智能体设置。每种配置使用不同的工具，并要求在 `.env` 文件中设置相应的环境变量。\n\n> **💡 推荐**：对于 MiroThinker-1.7，建议使用 `mirothinker_1.7_keep5_max200`（带上下文管理，推荐用于大多数任务）或 `mirothinker_1.7_keep5_max300`（仅用于 BrowseComp 和 BrowseComp-ZH）。\n\n| 配置 | 描述 | 最大回合数 | 上下文保留 | 必需的环境变量 | 推荐用途 |\n|:---|:---|:---|:---|:---|:---|\n| **`mirothinker_1.7_keep5_max200`** ⭐ | 单智能体，带上下文管理 | 200 | 保留最近 5 条 | `SERPER_API_KEY`, `SERPER_BASE_URL`, `JINA_API_KEY`, `JINA_BASE_URL`, `E2B_API_KEY`, `SUMMARY_LLM_BASE_URL`, `SUMMARY_LLM_MODEL_NAME`, `SUMMARY_LLM_API_KEY` | **1.7（推荐用于大多数任务）** |\n| **`mirothinker_1.7_keep5_max300`** ⭐ | 单智能体，带上下文管理 | 300 | 保留最近 5 条 | 与上相同 | **1.7（用于 BrowseComp 和 BrowseComp-ZH）** |\n\n\n\u003Cdetails>\n  \u003Csummary>📦 点击展开旧版配置（v0.1\u002Fv0.2）\u003C\u002Fsummary>\n\n| 配置 | 描述 | 最大回合数 | 上下文保留 | 必需的环境变量 | 推荐用途 |\n|:---|:---|:---|:---|:---|:---|\n| **`mirothinker_v1.5_keep5_max200`** | 单智能体，带上下文管理 | 200 | 保留最近 5 条 | `SERPER_API_KEY`, 
`SERPER_BASE_URL`, `JINA_API_KEY`, `JINA_BASE_URL`, `E2B_API_KEY`, `SUMMARY_LLM_BASE_URL`, `SUMMARY_LLM_MODEL_NAME`, `SUMMARY_LLM_API_KEY` | **v1.5（推荐用于大多数任务）** |\n| **`mirothinker_v1.5_keep5_max400`** | 单智能体，带上下文管理 | 400 | 保留最近 5 条 | 与上相同 | **v1.5（用于 BrowseComp 和 BrowseComp-ZH）** |\n| **`mirothinker_v1.5`** | 适用于 MiroThinker v1.5 的单智能体 | 600 | 保留所有结果 | 与上相同 | **v1.5** |\n| **`mirothinker_v1.0_keep5`** | 单智能体，带上下文管理 | 600 | 保留最近 5 条 | 与上相同 | **v1.0** |\n| **`mirothinker_v1.0`** | 适用于 MiroThinker v1.0 的单智能体 | 600 | 保留所有结果 | 与上相同 | **v1.0** |\n| **`multi_agent`** | 多智能体，使用商业工具（v0.1\u002Fv0.2） | 50 | 保留所有结果 | `E2B_API_KEY`, `ANTHROPIC_API_KEY`, `ANTHROPIC_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_BASE_URL`, `SERPER_API_KEY`, `SERPER_BASE_URL`, `JINA_API_KEY`, `JINA_BASE_URL` | v0.1\u002Fv0.2 |\n| **`multi_agent_os`** | 多智能体，使用开源工具（v0.1\u002Fv0.2） | 50 | 保留所有结果 | `E2B_API_KEY`, `VISION_API_KEY`, `VISION_BASE_URL`, `VISION_MODEL_NAME`, `WHISPER_API_KEY`, `WHISPER_BASE_URL`, `WHISPER_MODEL_NAME`, `REASONING_API_KEY`, `REASONING_BASE_URL`, `REASONING_MODEL_NAME`, `SERPER_API_KEY`, `SERPER_BASE_URL`, `JINA_API_KEY`, `JINA_BASE_URL` | v0.1\u002Fv0.2 |\n\n\u003C\u002Fdetails>\n\n> **💡 注意**：所有环境变量均列于 `apps\u002Fmiroflow-agent\u002F.env.example` 中。请将其复制到 `.env` 文件，并根据你计划使用的工具填写相应值。\n\n#### 创建自定义工具配置\n\n\u003Cdetails>\n  \u003Csummary>🔧 点击展开自定义工具配置指南\u003C\u002Fsummary>\n\n你可以创建自己的 YAML 配置文件，自由组合 MCP 服务器。具体步骤如下：\n\n1. 
在 `apps\u002Fmiroflow-agent\u002Fconf\u002Fagent\u002F` 目录下创建一个新的 YAML 文件：\n\n```yaml\n\n# conf\u002Fagent\u002Fmy_custom_config.yaml\ndefaults:\n  - default\n  - _self_\n\nmain_agent:\n  tools:\n    - tool-python                    # 执行环境\n    - search_and_scrape_webpage      # Google 搜索\n    - jina_scrape_llm_summary        # 使用 LLM 进行网页抓取与摘要\n    - tool-vqa                       # 视觉处理（可选）\n    - tool-transcribe                # 音频处理（可选）\n    - tool-reasoning                 # 推理引擎（可选）\n    - tool-reading                   # 文档阅读（可选）\n  max_turns: 300  # 最大轮次\n\nsub_agents:\n  agent-browsing:  # 可选子代理\n    tools:\n      - tool-google-search\n      - tool-vqa\n      - tool-reading\n      - tool-python\n    max_turns: 50\n\nkeep_tool_result: -1  # 上下文保留预算：-1 表示保留所有工具结果，或指定 K 值以仅保留最近的 K 条工具响应\n```\n\n> **💡 上下文保留策略**：`keep_tool_result` 参数实现了一种基于**时效性的上下文保留**策略。在标准的 ReAct 框架中，所有工具输出都会保留在消息历史中，这可能导致上下文利用效率低下。根据经验观察，智能体的后续行为主要依赖于近期的观测结果，而非较早的信息。此策略仅保留最近的 K 条工具响应（K 即 `keep_tool_result` 的值），同时完整保留思维与行动序列。\n>\n> **优点：**\n>\n> - ✅ 保留推理与行动轨迹\n> - ✅ 使智能体专注于最相关的上下文信息\n> - ✅ 腾出更多上下文空间，支持更长时间的推理和更深入的工具使用路径\n> - ✅ 不会导致性能下降，同时为交互扩展提供更多上下文空间\n>\n> **使用方法**：设置 `keep_tool_result: -1` 以保留所有工具结果，或指定一个正整数 K（例如 `keep_tool_result: 5`）以仅保留最近的 K 条工具响应。\n\n2. **运行评估时使用自定义配置**：\n\n```bash\ncd apps\u002Fmiroflow-agent\nuv run main.py llm=qwen-3 agent=my_custom_config llm.base_url=https:\u002F\u002Fyour_base_url\u002Fv1\n```\n\n3. 
**根据所使用的工具配置 `.env` 环境变量**。\n\n   所有可用的环境变量均列于 `apps\u002Fmiroflow-agent\u002F.env.example` 中。将其复制到 `.env` 文件，并根据所选配置进行相应设置：\n\n   ```bash\n   cd apps\u002Fmiroflow-agent\n   cp .env.example .env\n   # 编辑 .env 文件，填入实际的 API 密钥\n   ```\n\n   **对于 MiroThinker v1.5**（`mirothinker_v1.5_keep5_max200.yaml`、`mirothinker_v1.5_keep5_max400.yaml` 或 `mirothinker_v1.5.yaml`）以及 **v1.0**（`mirothinker_v1.0_keep5.yaml` 或 `mirothinker_v1.0.yaml`），请参阅上文的[适用于 MiroThinker v1.5 和 v1.0 的最小配置]部分，获取完整的配置示例。\n\n   **对于其他配置**，请参考上文的[预配置智能体设置]表格，以了解所需的环境变量。\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n  \u003Csummary>🔑 点击展开可选 API 密钥\u003C\u002Fsummary>\n\n```bash\n# 用于 LLM-as-a-Judge 的 API（用于基准测试，基准评估时必需）\nOPENAI_API_KEY=your_openai_key\nOPENAI_BASE_URL=\"https:\u002F\u002Fapi.openai.com\u002Fv1\"  # 可选，默认为 OpenAI 的 API\n\n# 用于开源音频转录工具的 API（用于基准测试，可选）\nWHISPER_MODEL_NAME=\"openai\u002Fwhisper-large-v3-turbo\"\nWHISPER_API_KEY=your_whisper_key\nWHISPER_BASE_URL=\"https:\u002F\u002Fyour_whisper_base_url\u002Fv1\"\n\n# 用于开源 VQA 工具的 API（用于基准测试，可选）\nVISION_MODEL_NAME=\"Qwen\u002FQwen2.5-VL-72B-Instruct\"\nVISION_API_KEY=your_vision_key\nVISION_BASE_URL=\"https:\u002F\u002Fyour_vision_base_url\u002Fv1\u002Fchat\u002Fcompletions\"\n\n# 用于开源推理工具的 API（用于基准测试，可选）\nREASONING_MODEL_NAME=\"Qwen\u002FQwen3-235B-A22B-Thinking-2507\"\nREASONING_API_KEY=your_reasoning_key\nREASONING_BASE_URL=\"https:\u002F\u002Fyour_reasoning_base_url\u002Fv1\u002Fchat\u002Fcompletions\"\n\n# 用于 Claude Sonnet 3.7 作为商用工具的 API（可选）\nANTHROPIC_API_KEY=your_anthropic_key\n\n# 用于搜狗搜索的 API（可选）\nTENCENTCLOUD_SECRET_ID=your_tencent_cloud_secret_id\nTENCENTCLOUD_SECRET_KEY=your_tencent_cloud_secret_key\n\n# 用于摘要 LLM 的 API（可使用小型模型，如 Qwen3-14B 或 GPT-5-Nano）\nSUMMARY_LLM_BASE_URL=\"https:\u002F\u002Fyour_summary_llm_base_url\u002Fv1\u002Fchat\u002Fcompletions\"\nSUMMARY_LLM_MODEL_NAME=your_summary_llm_model_name  # 例如 \"Qwen\u002FQwen3-14B\" 或 \"gpt-5-nano\"\nSUMMARY_LLM_API_KEY=your_summary_llm_api_key\n```\n\n\u003C\u002Fdetails>\n\n### 
部署 MiroThinker 智能体\n\n#### 选项 1（推荐）：使用 SGLang 或 vLLM 部署\n\n使用 SGLang 在端口 61002 上部署 MiroThinker 模型：\n\n```bash\nNUM_GPUS=4\nPORT=61002\n\n# 从 HF 下载智能体\nAGENT_PATH=miromind-ai\u002FMiroThinker-1.7-mini\n\n\npython3 -m sglang.launch_server \\\n    --model-path $AGENT_PATH \\\n    --tp $NUM_GPUS \\\n    --dp 1 \\\n    --host 0.0.0.0 \\\n    --port $PORT \\\n    --trust-remote-code\n```\n\n> **📍 服务器地址**：这将启动一个位于 `http:\u002F\u002F0.0.0.0:$PORT` 的服务器。请将其用作您的服务器基础 URL（例如 `http:\u002F\u002F0.0.0.0:61002\u002Fv1`）。\n\n#### 选项 2：量化轻量级方案\n\n我们还提供了使用 CPU 优化和 GPU 加速量化技术部署 MiroThinker 智能体的全面指南，并附有详细的分析及针对 llama.cpp、Ollama、SGLang 等推理框架的部署指导。\n\n> **📖 完整指南**：请参阅 [部署文档](apps\u002Fgradio-demo\u002F) 获取详细的部署说明。\n\n### 运行您的第一个任务\n\n完成环境搭建并启动服务器后，运行 `main.py` 以使用默认问题进行测试：“今天计算机科学领域的 arXiv 论文标题是什么？”\n\n```bash\ncd apps\u002Fmiroflow-agent\n\n# 使用 MiroThinker 智能体（需自行搭建服务器）\nuv run python main.py llm=qwen-3 agent=mirothinker_1.7_keep5_max200 llm.base_url=http:\u002F\u002Flocalhost:61002\u002Fv1\n\n# 或使用 Claude（需在 .env 中配置 ANTHROPIC_API_KEY）\nuv run python main.py llm=claude-3-7 agent=single_agent_keep5\n\n# 或使用 GPT-5（需在 .env 中配置 OPENAI_API_KEY）\nuv run python main.py llm=gpt-5 agent=single_agent_keep5\n```\n\n**若要自定义问题**，请编辑 `main.py` 第 32 行：\n\n```python\ntask_description = \"您自定义的问题在此\"\n```\n\n智能体会在网络上搜索，必要时执行代码，并给出带有来源的答案。\n\n> **📖 更多详情**：请参阅 [apps\u002Fmiroflow-agent\u002FREADME.md](apps\u002Fmiroflow-agent\u002FREADME.md) 以了解可用配置及故障排除方法。\n\n## 📊 基准评估\n\n> 适用于希望复现我们的基准测试结果或在标准基准上进行评估的研究人员。\n\n### 下载基准数据\n\n```bash\ncd MiroThinker  # 返回项目根目录\nwget https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fmiromind-ai\u002FMiroFlow-Benchmarks\u002Fresolve\u002Fmain\u002Fdata_20251115_password_protected.zip\nunzip data_20251115_password_protected.zip\n# 密码：pf4*\nrm data_20251115_password_protected.zip\n```\n\n### 运行基准评估\n\n> **注意**：对于 MiroThinker-1.7，请使用 `mirothinker_1.7_keep5_max200`（带上下文管理）和 `mirothinker_1.7_keep5_max300`（带上下文管理）。\n\n**可用参数：**\n\n您可以在运行脚本之前通过设置以下环境变量来自定义评估：\n\n| 参数           
 | 默认值         | 描述                                       |
|-----------------|----------------|--------------------------------------------|
| `LLM_MODEL`     | `"MiroThinker-Agents"` | 代理名称标识符 |
| `BASE_URL`      | `"https://your-api.com/v1"` | 您服务器的基础 URL |
| `NUM_RUNS`      | 根据基准不同而异 | 评估运行次数（大多数基准为 3 次，GAIA/XBench/FutureX/SEAL-0 为 8 次，AIME2025 为 32 次） |
| `LLM_PROVIDER`  | `"qwen"`       | 大模型提供商（如 `qwen`、`openai`、`anthropic`） |
| `AGENT_SET`     | `"mirothinker_1.7_keep5_max200"` | 代理配置（如 `mirothinker_1.7_keep5_max200`、`mirothinker_1.7_keep5_max300` 等） |
| `MAX_CONTEXT_LENGTH` | `262144`  | 最大上下文长度（256K） |
| `MAX_CONCURRENT` | `10`          | 最大并发任务数 |
| `PASS_AT_K`     | `1`            | Pass@K 评估指标 |
| `TEMPERATURE`   | `1.0`          | 采样温度 |
| `API_KEY`       | `"xxx"`        | 服务器的 API 密钥 |

**示例用法：**

```bash
# 首先导航到 miroflow-agent 目录
cd apps/miroflow-agent

# 使用 1.7 的基本用法（推荐）
NUM_RUNS=8 LLM_MODEL="MiroThinker-1.7-mini" BASE_URL="https://your-api.com/v1" bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh

# 或者使用 v1.0
# NUM_RUNS=8 LLM_MODEL="MiroThinker-v1.0-30B" BASE_URL="https://your-api.com/v1" bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh

# 自定义运行次数和代理配置（1.7 带上下文管理）
LLM_MODEL="MiroThinker-1.7-mini" \
BASE_URL="https://your-api.com/v1" \
NUM_RUNS=8 \
AGENT_SET="mirothinker_1.7_keep5_max200" \
bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh
```

<details open>
  <summary>📋 点击展开所有基准测试命令</summary>

> **⚠️ 对于 MiroThinker-1.7 的重要提示**：要复现我们报告的结果，必须设置正确的 `AGENT_SET`：
>
> - **BrowseComp 和 BrowseComp-ZH**：使用 
`AGENT_SET=\"mirothinker_1.7_keep5_max300\"`\n> - **其他所有基准测试**：使用 `AGENT_SET=\"mirothinker_1.7_keep5_max200\"`\n\n```bash\n# 首先导航到 miroflow-agent 目录\ncd apps\u002Fmiroflow-agent\n\n# HLE\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_hle.sh\n\n# HLE-Text-2158\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_hle-text-2158.sh\n\n# HLE-Text-500\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_hle-text-500.sh\n\n# GAIA-Text-103\nNUM_RUNS=8 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_gaia-validation-text-103.sh\n\n# GAIA-Validation (GAIA-Val-165)\nNUM_RUNS=8 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_gaia-validation.sh\n\n# BrowseComp-EN（⚠️ 使用 max300）\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max300\" bash scripts\u002Frun_evaluate_multiple_runs_browsecomp.sh\n\n# BrowseComp-ZH（⚠️ 使用 max300）\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max300\" bash scripts\u002Frun_evaluate_multiple_runs_browsecomp_zh.sh\n\n# WebWalkerQA\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_webwalkerqa.sh\n\n# XBench-DeepSearch\nNUM_RUNS=8 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_xbench_deepsearch.sh\n\n# FRAMES\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_frames.sh\n\n# SEAL-0\nNUM_RUNS=8 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" 
AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_seal-0.sh\n\n# FutureX\nNUM_RUNS=8 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_futurex.sh\n\n# AIME2025\nNUM_RUNS=32 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_aime2025.sh\n\n# DeepSearchQA\nNUM_RUNS=3 LLM_MODEL=\"xxx\" BASE_URL=\"xxx\" AGENT_SET=\"mirothinker_1.7_keep5_max200\" bash scripts\u002Frun_evaluate_multiple_runs_deepsearchqa.sh\n```\n\n\u003C\u002Fdetails>\n\n#### 3. **监控评估进度**\n\n\u003Cdetails>\n  \u003Csummary>📊 点击展开进度监控命令\u003C\u002Fsummary>\n\n```bash\n# 首先导航到 miroflow-agent 目录\ncd apps\u002Fmiroflow-agent\n\n# 对于 HLE\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_hle.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# 对于 HLE-Text-2158\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_hle-text-2158.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# 对于 HLE-Text-500\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_hle-text-500.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# 对于 BrowseComp-EN\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_browsecomp.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# 对于 BrowseComp-ZH\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_browsecomp_zh.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# 对于 GAIA-Validation\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_gaia-validation.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# 对于 GAIA-Text-103\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_gaia-validation-text-103.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# 对于 WebWalkerQA\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_webwalkerqa.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# 对于 Frames\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_frames.py 
\u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# 对于 XBench-DeepSearch\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_xbench_deepsearch.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# 对于 SEAL-0\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_seal-0.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# 对于 AIME2025\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_aime2025.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n\n# 对于 DeepSearchQA\npython benchmarks\u002Fcheck_progress\u002Fcheck_progress_deepsearchqa.py \u002Fpath\u002Fto\u002Fevaluation\u002Flogs\n```\n\n\u003C\u002Fdetails>\n\n## 🔬 跟踪日志收集\n\n\u003Cdetails>\n\u003Csummary>📋 点击展开跟踪日志收集命令\u003C\u002Fsummary>\n\n```bash\ncd apps\u002Fcollect-trace\n\n# 收集 SFT 的跟踪日志\nbash scripts\u002Fcollect_trace_claude37.sh\nbash scripts\u002Fcollect_trace_gpt5.sh\n\n# 收集 DPO 的跟踪日志\nbash scripts\u002Fcollect_trace_qwen3.sh\n```\n\n\u003C\u002Fdetails>\n\n## ❓ 常见问题与故障排除\n\n### 常见问题\n\n\u003Cdetails>\n  \u003Csummary>🔧 点击展开故障排除指南\u003C\u002Fsummary>\n\n#### **问：我应该使用哪个版本？**\n\n**答：** 我们推荐使用 **MiroThinker-1.7** ⭐，并采用最小配置：\n\n- **v1.7** ⭐：最新版本，拥有256K上下文长度，性能处于世界领先水平。请使用带上下文管理的配置：\n  - `mirothinker_1.7_keep5_max200`（最多200轮对话，推荐用于大多数任务）\n  - `mirothinker_1.7_keep5_max300`（最多300轮对话，仅用于BrowseComp和BrowseComp-ZH）\n\n#### **问：如何获取API密钥？**\n\n**答：** 最小化设置需要以下密钥：\n\n- **SERPER_API_KEY**：从 [Serper.dev](https:\u002F\u002Fserper.dev\u002F) 获取（Google搜索API）\n- **JINA_API_KEY**：从 [Jina.ai](https:\u002F\u002Fjina.ai\u002F) 获取（网页抓取）\n- **E2B_API_KEY**：从 [E2B.dev](https:\u002F\u002Fe2b.dev\u002F) 获取（代码执行沙箱）\n- **SUMMARY_LLM_API_KEY**：您的LLM API凭证（用于内容摘要）。可以使用小型模型，如Qwen3-14B或GPT-5-Nano——选择对性能影响较小。\n- **OPENAI_API_KEY**：从 [OpenAI](https:\u002F\u002Fplatform.openai.com\u002F) 获取（用于基准评估，作为LLM评判者）\n- **OPENAI_BASE_URL**：可选，默认为 `https:\u002F\u002Fapi.openai.com\u002Fv1`。可更改为使用兼容OpenAI的API。\n\n#### **问：代理服务器连接错误**\n\n**答：** 常见问题：\n\n- **检查基础URL格式**：应以 `\u002Fv1` 结尾（例如 `https:\u002F\u002Fyour-api.com\u002Fv1`）\n- 
**验证 API 密钥**：确保在环境变量或脚本中正确设置了 `API_KEY`
- **检查服务器状态**：确保您的服务器正在运行且可访问
- **网络问题**：确认防火墙/网络设置允许连接

#### **问：评估脚本无法运行**

**答：** 故障排除步骤：

1. **检查工作目录**：确保您位于 `apps/miroflow-agent` 目录下
2. **验证环境**：运行 `uv sync` 以确保依赖项已安装
3. **检查 .env 文件**：确保所有必需的环境变量均已设置
4. **查看日志**：检查 `logs/` 目录中的详细错误信息
5. **验证数据路径**：确保基准测试数据已下载并位于正确位置

#### **问：内存不足错误**

**答：** 解决方案：

- **减少上下文长度**：将 `MAX_CONTEXT_LENGTH` 设置为较小值（例如，128K 时设为 131072）
- **使用较少轮次的上下文管理**：
  - 对于 v1.7：使用 `mirothinker_1.7_keep5_max200` 或 `mirothinker_1.7_keep5_max300`（带上下文管理）
- **减少并发任务数**：将 `MAX_CONCURRENT` 设置为较小数值（例如 5）
- **使用较小规模的代理**：
  - 对于 v1.5：尝试 30B 而非 235B
  - 对于 v1.0：尝试 8B 或 30B 而非 72B

#### **问：工具执行错误**

**答：** 常见修复方法：

- **E2B 错误**：验证 `E2B_API_KEY` 是否有效，且账户是否有余额
- **Serper 错误**：检查 `SERPER_API_KEY` 和速率限制
- **Jina 错误**：验证 `JINA_API_KEY` 和 `JINA_BASE_URL` 是否正确
- **LLM 摘要错误**：检查 `SUMMARY_LLM_*` 变量及代理可用性

#### **问：如何监控长时间运行的评估？**

**答：** 使用进度监控脚本：

```bash
cd apps/miroflow-agent
python benchmarks/check_progress/check_progress_<benchmark_name>.py /path/to/logs
```

这些脚本会显示完成状态、已用时间以及预计剩余时间。

</details>

### 获取帮助

- 📖 **文档**：查看 [MiroFlow Tools README](libs/miroflow-tools/README.md)，了解工具详情
- 💬 **Discord**：加入我们的 [Discord 社区](https://discord.com/invite/GPqEnkzQZd)
- 🐛 **问题**：在 [GitHub Issues](https://github.com/MiroMindAI/MiroThinker/issues) 上报告错误
- 📧 **联系**：访问 [我们的网站](https://miromind.ai/) 获取更多信息

## 📄 许可证

本项目采用 Apache 2.0 许可证授权，详情请参阅 [LICENSE](LICENSE) 文件。

## 🙏 致谢

我们向以下各方致以诚挚的感谢：

- 🏆 **基准测试贡献者** 提供了全面的评估数据集
- 🌍 **开源社区** 提供了使这一切成为可能的工具和库
- 👥 **所有贡献者** 帮助我们不断改进 MiroThinker

<div align="center">
  <a href="https://github.com/MiroMindAI/MiroThinker/graphs/contributors">
    <img src="https://oss.gittoolsai.com/images/MiroMindAI_MiroThinker_readme_0f707f772df9.png" />
  
</a>
</div>

加入我们的社区，共同构建 AI 代理的未来！

### 参考文献

如果您在研究中使用了本项目，请考虑引用以下内容：

**MiroThinker**（模型与方法）

```bibtex
@article{miromind2026mirothinker,
  title={MiroThinker-1.7 & H1: 通过验证迈向重型科研代理},
  author={MiroMind团队及白S.、冰L.、雷L.、李R.、李X.、林X.、敏E.、苏L.、王B.、王L.、王L.、王S.、王X.、张Y.、张Z.等},
  journal={arXiv预印本 arXiv:2603.15726},
  year={2026}
}

@article{miromind2025mirothinker,
  title={MiroThinker：通过模型、上下文和交互式扩展，突破开源科研代理的性能边界},
  author={MiroMind团队及白松、冰立东、陈卡森、陈冠正、陈云涛、陈哲、陈子怡、董轩等},
  journal={arXiv预印本 arXiv:2511.11793},
  year={2025}
}
```

**MiroFlow**（框架）

```bibtex
@article{miromind2026miroflow,
  title={MiroFlow：面向通用深度研究任务的高性能、稳健开源代理框架},
  author={苏世谦、邢森、董轩、钟牧言、王斌、朱熙周、陈云涛、王文海、邓悦、朱鹏翔等},
  journal={arXiv预印本 arXiv:2602.22808},
  year={2026}
}
```

[![Star 历史图表](https://oss.gittoolsai.com/images/MiroMindAI_MiroThinker_readme_a1effcfbbcd0.png)](https://star-history.com/#MiroMindAI/MiroThinker&Date)

---

# MiroThinker 快速上手指南

MiroThinker 是一款专为深度研究和预测任务优化的开源 AI 智能体（Agent）。它支持超长上下文（256K）和复杂的长程推理，能够在一个任务中执行数百次工具调用，在 BrowseComp 等基准测试中表现卓越。

## 环境准备

在开始之前，请确保您的开发环境满足以下要求：

*   **操作系统**: Linux (推荐 Ubuntu 20.04+) 或 macOS。Windows 用户建议使用 WSL2。
*   **Python**: 版本 >= 3.10。
*   **GPU**: 推荐使用 NVIDIA GPU。
    *   **30B 模型**: 建议显存 >= 48GB (单卡 A100/A800 或多卡并行)。
    *   **235B 模型**: 需要多卡集群环境。
*   **依赖管理**: 推荐使用 `conda` 或 `venv` 创建独立虚拟环境。

## 安装步骤

### 1. 克隆项目代码

```bash
git clone https://github.com/MiroMindAI/MiroThinker.git
cd MiroThinker
```

### 2. 创建并激活虚拟环境

```bash
conda create -n mirothinker python=3.10 -y
conda activate mirothinker
```

### 3. 
安装依赖\n项目通常使用 `requirements.txt` 管理依赖。为确保下载速度，建议配置国内镜像源（如清华源）进行安装：\n\n```bash\npip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n> **注意**：如果项目中包含特定的推理引擎依赖（如 vLLM 或 SGLang），请根据 `README` 中的具体子模块说明进行额外安装。若需使用量化版本以降低显存需求，请确保安装了相应的 `bitsandbytes` 或 `auto-gptq` 库。\n\n## 基本使用\n\n以下是使用 Python 加载 MiroThinker-1.7-mini (30B) 模型并进行简单推理的最小化示例。本示例基于 Hugging Face `transformers` 库。\n\n### 1. 下载模型\n您可以直接从 Hugging Face 下载，或使用国内镜像站（如 ModelScope）加速。\n\n**方式 A: 使用 Hugging Face CLI (需配置网络)**\n```bash\nhuggingface-cli download miromind-ai\u002FMiroThinker-1.7-mini --local-dir .\u002Fmodels\u002FMiroThinker-1.7-mini\n```\n\n**方式 B: 使用 ModelScope (推荐国内用户)**\n```bash\npip install modelscope -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n然后在代码中使用 ModelScope 加载，或先下载：\n```python\nfrom modelscope import snapshot_download\nmodel_dir = snapshot_download('miromind-ai\u002FMiroThinker-1.7-mini', cache_dir='.\u002Fmodels')\n```\n\n### 2. 运行推理示例\n\n创建一个名为 `run_inference.py` 的文件，输入以下代码：\n\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\n\n# 配置模型路径 (如果使用本地下载)\nmodel_path = \".\u002Fmodels\u002FMiroThinker-1.7-mini\" \n# 如果直接使用 HF ID 且网络通畅，可替换为：\"miromind-ai\u002FMiroThinker-1.7-mini\"\n\nprint(\"正在加载分词器和模型...\")\ntokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_path,\n    torch_dtype=torch.bfloat16,\n    device_map=\"auto\",\n    trust_remote_code=True\n)\nmodel.eval()\n\n# 构建提示词 (Prompt)\n# MiroThinker 通常需要特定的 Prompt 格式来触发深度推理能力\nquery = \"请分析 2025 年全球人工智能芯片市场的主要趋势，并预测 2026 年的增长点。\"\nmessages = [\n    {\"role\": \"user\", \"content\": query}\n]\n\n# 应用 Chat Template\ntext = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)\ninputs = tokenizer.encode(text, return_tensors=\"pt\").to(model.device)\n\nprint(\"开始生成回答...\")\nwith torch.no_grad():\n    outputs = model.generate(\n        
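        # 补充说明（以下采样参数均为示例取值，可按任务调整，并非官方推荐配置）：
        # temperature 越低输出越稳定，top_p 控制核采样的候选范围，
        # max_new_tokens 限制单次生成长度；长程推理任务可适当调大该值。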
inputs,\n        max_new_tokens=2048,\n        do_sample=True,\n        temperature=0.7,\n        top_p=0.9,\n        pad_token_id=tokenizer.eos_token_id\n    )\n\nresponse = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)\nprint(\"\\n--- MiroThinker 回答 ---\\n\")\nprint(response)\n```\n\n### 3. 执行脚本\n```bash\npython run_inference.py\n```\n\n> **提示**：MiroThinker 的核心优势在于**工具调用（Tool Use）**和**长程推理**。上述示例仅为基础文本生成。要完整体验其“深度研究”能力（如自动搜索、文件分析），请参考项目根目录下的 `apps\u002Fgradio-demo` 启动交互式界面，或查阅官方文档中关于 Agent 工作流配置的详细说明。","某金融科技公司的量化分析师需要在 24 小时内完成一份关于“全球半导体供应链断裂风险”的深度预测报告，以辅助下周的投资决策。\n\n### 没有 MiroThinker 时\n- **信息搜集碎片化**：分析师需手动在数十个新闻源、财报 PDF 和行业数据库中切换，耗时数小时仅能覆盖表面信息，极易遗漏关键隐性数据。\n- **逻辑推演深度不足**：面对海量杂乱数据，人工难以构建复杂的多步因果链，导致预测结论往往基于直觉而非严密的证据链支撑。\n- **报告产出效率低**：从数据清洗、交叉验证到撰写初稿，团队需通宵协作，且因疲劳容易出现数据引用错误或逻辑断层。\n- **多格式数据处理难**：大量的非结构化数据（如扫描版财报图片、手写会议纪要）无法被传统工具直接解析，需额外安排专人录入。\n\n### 使用 MiroThinker 后\n- **全自动深度调研**：MiroThinker 自主发起数百次工具调用，瞬间遍历全球新闻、学术库及财报，精准锁定供应链断点的关键信号。\n- **严密推理与预测**：依托其优化的深度研究能力，MiroThinker 自动构建多层级推理路径，输出带有高置信度评分的趋势预测，逻辑链条清晰可追溯。\n- **一键生成专业报告**：MiroThinker 直接整合分析结果，生成包含图表、引用来源完整的深度研究报告，将原本需要一天的工作压缩至分钟级。\n- **全格式文档兼容**：MiroThinker 直接读取并解析上传的 PDF、Excel 甚至图片格式的原始单据，无缝提取关键财务指标纳入分析模型。\n\nMiroThinker 
将原本依赖资深专家数天完成的复杂研判任务，转化为分钟级的高精度自动化决策支持，彻底重塑了深度研究的效率边界。

## 💬 社区问答精选

以下问答整理自项目的 GitHub Issues：

#### **问：MiroThinker 的 Pro 模式和标准模式有什么区别？**

**答：** 主要区别在于使用的模型尺寸不同。开启 Pro 模式后，系统会使用更大尺寸的模型，具备更强的理解能力，通常会进行更深入的调研。此外，目前只有 Pro 模式支持附件上传功能。如果问题不复杂或信息容易通过互联网搜索验证，使用标准模式通常更具性价比。（来源：[Issue #118](https://github.com/MiroMindAI/MiroThinker/issues/118)）

#### **问：网页刷新后，SSE（服务器发送事件）流如何恢复并获取历史记录？**

**答：** 系统采用双层存储策略：热数据层在任务执行时将事件实时写入消息队列，冷数据层在任务完成后将结果持久化到数据库。当客户端刷新重连时：1. 首先检查任务是否仍在运行（队列是否存在）；2. 若仍在运行，先以“快照”形式交付累积的历史记录，然后继续流式传输新事件；3. 若任务已结束，则直接从数据库获取完整历史记录。前端会通过标签页可见性变化、网络错误检测和心跳检查自动监测连接状态并重连。（来源：[Issue #74](https://github.com/MiroMindAI/MiroThinker/issues/74)）

#### **问：如何在 MCP 环境的 search_and_scrape_webpage.py 中正确添加日志调试？**

**答：** 普通的 `logger.info()` 或 `print()` 可能无法输出。解决方案是修改 `setup_mcp_logging` 函数，添加一个控制台处理器（console handler），并在调用该函数时设置参数 `console_handler=True`。示例代码如下：

```python
import logging

def setup_mcp_logging(level="INFO", addr=None, tool_name="unknown_tool", console_handler=False):
    root = logging.getLogger()
    root.setLevel(level)
    # 移除根处理器
    for h in root.handlers[:]:
        root.removeHandler(h)
        h.close()
    # ... (其他清理逻辑)
    if console_handler:
        # 添加控制台处理器以输出日志
        ch = logging.StreamHandler()
        ch.setLevel(level)
        root.addHandler(ch)
```

这样即可在服务器脚本中看到自定义的输出日志。（来源：[Issue #52](https://github.com/MiroMindAI/MiroThinker/issues/52)）

#### **问：复现 MiroThinker v1.5 时，Summary Model 应该使用哪个模型？需要开启思考模式吗？**

**答：** Summary Model 推荐使用 Qwen3-14B 或者 GPT-5-Nano。使用 Qwen3-14B 进行摘要时，不需要开启思考模式。如果在强化学习（RL）阶段考虑成本，使用 GPT-5-Nano 会更便宜一些。如果使用小模型（如 4B）作为 Summary Model 去跑大模型（如 30B），可能会导致分数显著下降。（来源：[Issue #83](https://github.com/MiroMindAI/MiroThinker/issues/83)）

#### **问：Demo 中的搜索结果是否进行了重排序或置信度筛选？**

**答：** Demo 上的搜索结果是谷歌搜索（通过 Serper API）和搜狗搜索（官方 API）结果的直接拼接，系统并未对搜索结果进行重排序，也没有基于置信度或相关性进行额外的筛选步骤。搜索结果是由 Serper 直接返回的原始数据。（来源：[Issue #59](https://github.com/MiroMindAI/MiroThinker/issues/59)）

#### **问：MiroThinker 计划提供 API 吗？**

**答：** 是的，开发团队计划在 4 月份推出 MiroThinker 的 API，用户可以关注后续更新。（来源：[Issue #118](https://github.com/MiroMindAI/MiroThinker/issues/118)）

#### **问：为什么我在 Pro 模式下找不到上传附件的按钮？**

**答：** 虽然 Pro 模式理论上支持附件上传，但如果按钮消失，可能是上下文长度限制或特定会话状态导致的。根据社区反馈，确保上下文窗口足够大（避免频繁出现上下文不足的错误）可能有助于功能正常显示。如果问题持续，建议检查浏览器控制台错误或尝试重新加载页面/会话。（来源：[Issue #88](https://github.com/MiroMindAI/MiroThinker/issues/88)）
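上文社区问答中描述的“快照 + 续流”式 SSE 恢复策略，可以用下面的极简示意代码勾勒其核心分支逻辑。注意：这只是一个假设性的内存实现草图，`TaskStore`、`resume_stream` 等名称均为本示例自拟，并非项目的实际 API；真实系统中热数据层是消息队列、冷数据层是数据库。

```python
from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class TaskStore:
    """双层存储示意（假设的内存实现）：running 模拟热数据层（消息队列），archive 模拟冷数据层（数据库）。"""
    running: dict[str, list[str]] = field(default_factory=dict)  # task_id -> 已累积的事件
    archive: dict[str, list[str]] = field(default_factory=dict)  # task_id -> 持久化的完整历史

    def finish(self, task_id: str) -> None:
        # 任务结束：将热数据层的事件落盘到冷数据层
        self.archive[task_id] = self.running.pop(task_id)

def resume_stream(store: TaskStore, task_id: str) -> Iterator[str]:
    """客户端刷新重连时的恢复逻辑示意。"""
    if task_id in store.running:
        # 任务仍在运行：先以“快照”形式一次性交付累积的历史记录
        yield from list(store.running[task_id])
        # （真实实现会在此处继续订阅消息队列，流式传输新事件）
    else:
        # 任务已结束：直接从冷数据层读取完整历史
        yield from store.archive.get(task_id, [])

store = TaskStore(running={"t1": ["step-1", "step-2"]})
print(list(resume_stream(store, "t1")))  # 运行中分支：快照交付
store.finish("t1")
print(list(resume_stream(store, "t1")))  # 已结束分支：读取归档
```

该草图刻意把“快照交付”与“续流”分开：重连客户端先拿到确定性的历史快照，再切换到实时事件流，从而避免刷新导致的事件丢失或重复。
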