[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-RUC-NLPIR--WebThinker":3,"tool-RUC-NLPIR--WebThinker":65},[4,23,32,40,49,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":22},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,2,"2026-04-05T10:45:23",[13,14,15,16,17,18,19,20,21],"图像","数据工具","视频","插件","Agent","其他","语言模型","开发框架","音频","ready",{"id":24,"name":25,"github_repo":26,"description_zh":27,"stars":28,"difficulty_score":29,"last_commit_at":30,"category_tags":31,"status":22},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[17,13,20,19,18],{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":29,"last_commit_at":38,"category_tags":39,"status":22},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 
适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[19,13,20,18],{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":46,"last_commit_at":47,"category_tags":48,"status":22},3215,"awesome-machine-learning","josephmisiti\u002Fawesome-machine-learning","awesome-machine-learning 是一份精心整理的机器学习资源清单，汇集了全球优秀的机器学习框架、库和软件工具。面对机器学习领域技术迭代快、资源分散且难以甄选的痛点，这份清单按编程语言（如 Python、C++、Go 等）和应用场景（如计算机视觉、自然语言处理、深度学习等）进行了系统化分类，帮助使用者快速定位高质量项目。\n\n它特别适合开发者、数据科学家及研究人员使用。无论是初学者寻找入门库，还是资深工程师对比不同语言的技术选型，都能从中获得极具价值的参考。此外，清单还延伸提供了免费书籍、在线课程、行业会议、技术博客及线下聚会等丰富资源，构建了从学习到实践的全链路支持体系。\n\n其独特亮点在于严格的维护标准：明确标记已停止维护或长期未更新的项目，确保推荐内容的时效性与可靠性。作为机器学习领域的“导航图”，awesome-machine-learning 以开源协作的方式持续更新，旨在降低技术探索门槛，让每一位从业者都能高效地站在巨人的肩膀上创新。",72149,1,"2026-04-03T21:50:24",[20,18],{"id":50,"name":51,"github_repo":52,"description_zh":53,"stars":54,"difficulty_score":46,"last_commit_at":55,"category_tags":56,"status":22},2234,"scikit-learn","scikit-learn\u002Fscikit-learn","scikit-learn 是一个基于 Python 构建的开源机器学习库，依托于 SciPy、NumPy 等科学计算生态，旨在让机器学习变得简单高效。它提供了一套统一且简洁的接口，涵盖了从数据预处理、特征工程到模型训练、评估及选择的全流程工具，内置了包括线性回归、支持向量机、随机森林、聚类等在内的丰富经典算法。\n\n对于希望快速验证想法或构建原型的数据科学家、研究人员以及 Python 开发者而言，scikit-learn 是不可或缺的基础设施。它有效解决了机器学习入门门槛高、算法实现复杂以及不同模型间调用方式不统一的痛点，让用户无需重复造轮子，只需几行代码即可调用成熟的算法解决分类、回归、聚类等实际问题。\n\n其核心技术亮点在于高度一致的 API 设计风格，所有估算器（Estimator）均遵循相同的调用逻辑，极大地降低了学习成本并提升了代码的可读性与可维护性。此外，它还提供了强大的模型选择与评估工具，如交叉验证和网格搜索，帮助用户系统地优化模型性能。作为一个由全球志愿者共同维护的成熟项目，scikit-learn 以其稳定性、详尽的文档和活跃的社区支持，成为连接理论学习与工业级应用的最",65628,"2026-04-05T10:10:46",[20,18,14],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":10,"last_commit_at":63,"category_tags":64,"status":22},3364,"keras","keras-team\u002Fkeras","Keras 是一个专为人类设计的深度学习框架，旨在让构建和训练神经网络变得简单直观。它解决了开发者在不同深度学习后端之间切换困难、模型开发效率低以及难以兼顾调试便捷性与运行性能的痛点。\n\n无论是刚入门的学生、专注算法的研究人员，还是需要快速落地产品的工程师，都能通过 Keras 
轻松上手。它支持计算机视觉、自然语言处理、音频分析及时间序列预测等多种任务。\n\nKeras 3 的核心亮点在于其独特的“多后端”架构。用户只需编写一套代码，即可灵活选择 TensorFlow、JAX、PyTorch 或 OpenVINO 作为底层运行引擎。这一特性不仅保留了 Keras 一贯的高层易用性，还允许开发者根据需求自由选择：利用 JAX 或 PyTorch 的即时执行模式进行高效调试，或切换至速度最快的后端以获得最高 350% 的性能提升。此外，Keras 具备强大的扩展能力，能无缝从本地笔记本电脑扩展至大规模 GPU 或 TPU 集群，是连接原型开发与生产部署的理想桥梁。",63927,"2026-04-04T15:24:37",[20,14,18],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":87,"forks":88,"last_commit_at":89,"license":90,"difficulty_score":29,"env_os":91,"env_gpu":92,"env_ram":91,"env_deps":93,"category_tags":99,"github_topics":100,"view_count":29,"oss_zip_url":79,"oss_zip_packed_at":79,"status":22,"created_at":114,"updated_at":115,"faqs":116,"releases":147},868,"RUC-NLPIR\u002FWebThinker","WebThinker","[NeurIPS 2025] 🌐 WebThinker: Empowering Large Reasoning Models with Deep Research Capability","WebThinker 是一款专为增强大型推理模型深度研究能力而设计的开源框架。它赋予 AI 自主进行网络搜索、信息整合及复杂逻辑推理的能力，使其不再局限于静态知识库，而是能像人类研究员一样探索未知领域。\n\n针对大模型在处理实时性要求高或需要多步验证的调研任务时表现不足的痛点，WebThinker 通过集成高效的搜索接口（如 Google Serper）与先进的推理模型（如 QwQ、R1 系列），实现了从问题理解到答案生成的闭环。该项目论文已被 NeurIPS 2025 接收，证明了其学术价值。\n\nWebThinker 特别适合希望构建智能体（Agent）的开发者、从事 AI 应用的研究人员，以及需要自动化深度信息检索的用户。其核心亮点在于对开源推理模型的优化适配，提供了多种参数规模的预训练模型供直接部署。所有代码均已公开，用户可轻松在本地或云端搭建属于自己的深度研究助手，开启更智能的信息探索之旅。","\n\u003Ch1 align=\"center\"> 🌐 WebThinker: Empowering Large Reasoning Models with Deep Research Capability\u003C\u002Fa>\u003C\u002Fh1>\n\n\n\u003Cdiv align=\"center\"> 
\n\n[![Notion](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNotion-WebThinker-red?style=flat&logo=notion&logoColor=white)](https:\u002F\u002Fforemost-beechnut-8ed.notion.site\u002FWebThinker-Empowering-Large-Reasoning-Models-with-Deep-Research-Capability-d13158a27d924a4b9df7f9ab94066b64) \n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-b5212f.svg?logo=arxiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.21776)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Hugging%20Face-yellow?logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2504.21776)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLICENSE-MIT-green.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT) \n[![Python 3.9+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.9+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002Frelease\u002Fpython-390\u002F) \n[![X (formerly Twitter) URL](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl?url=https%3A%2F%2Fx.com%2FKevin_GuoweiXu%2Fstatus%2F1858338565463421244)](https:\u002F\u002Fx.com\u002Fkakakbibibi\u002Fstatus\u002F1917768235069628823)\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Flixiaoxi45\u002FWebThinker-QwQ-32B\" target=\"_blank\">WebThinker-QwQ-32B\u003C\u002Fa> ｜\n🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Flixiaoxi45\u002FWebThinker-R1-7B\" target=\"_blank\">WebThinker-R1-7B\u003C\u002Fa> ｜\n🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Flixiaoxi45\u002FWebThinker-R1-14B\" target=\"_blank\">WebThinker-R1-14B\u003C\u002Fa> ｜\n🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Flixiaoxi45\u002FWebThinker-R1-32B\" target=\"_blank\">WebThinker-R1-32B\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\u003Ch5 align=\"center\"> If you like our project, please give us a star ⭐ on GitHub for the latest update.\u003C\u002Fh5>\n\n## 📣 Latest News\n\n- **[Sep 18, 2025]**: 🎉 Our paper 
**[WebThinker: Empowering Large Reasoning Models with Deep Research Capability](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.21776)** has been accepted at NeurIPS 2025!\n- **[May 30, 2025]**: 🔍 WebThinker now supports **[Google Serper API](https:\u002F\u002Fserper.dev\u002F)** for web search! Important: **[Bing Search API](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fbing\u002Fapis\u002Fbing-web-search-api)** will be retired in August 2025.\n- **[May 9, 2025]**: The brief introduction of WebThinker can be found on platforms like **[X](https:\u002F\u002Fx.com\u002Fkakakbibibi\u002Fstatus\u002F1917768235069628823)**, **[Zhihu](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F1903981050780192911)**, and **[WeChat](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FlVrTZQLmrJkkG5QYcEZTFA)**.\n- **[May 1, 2025]**: 🤗 **[WebThinker Model Collection](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Flixiaoxi45\u002Fwebthinker-6812d5fd1287ee53d68f0557)** is now available on Hugging Face. You can deploy our optimized models for your deep research tasks.\n- **[May 1, 2025]**: 📄 Our paper is now available on **[arXiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.21776)** and **[Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2504.21776)**.\n- **[March 31, 2025]**: 🎉 **[WebThinker Notion Page](https:\u002F\u002Fforemost-beechnut-8ed.notion.site\u002FWebThinker-Empowering-Large-Reasoning-Models-with-Deep-Research-Capability-d13158a27d924a4b9df7f9ab94066b64)** launched with comprehensive project details.\n- **[March 31, 2025]**: 🚀 Full codebase released. 
WebThinker now supports deep research with open-source reasoning models like QwQ-32B.\n\n## 🔥 Deep Research Agent Family\n\n\u003Cdetails open>\u003Csummary>Welcome to try our deep research agent series: \u003C\u002Fsummary>\u003Cp>\n\n\n> [**DeepAgent: A General Reasoning Agent with Scalable Toolsets (New!)**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21618) \u003Cbr>\n> **Authors:** Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Guanting Dong, Jiajie Jin, Yinuo Wang, Hao Wang, Yutao Zhu, Ji-Rong Wen, Yuan Lu, Zhicheng Dou \u003Cbr>\n> **TLDR:** An end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution with brain-inspired memory folding mechanism. \u003Cbr>\n[![github](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Github-black?logo=github)](https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FDeepAgent) [![github](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FRUC-NLPIR\u002FDeepAgent.svg?style=social)](https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FDeepAgent) [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2510.21618-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21618) [![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Paper-yellow?logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2510.21618)\n\n\n > [**WebThinker: Empowering Large Reasoning Models with Deep Research Capability (NeurIPS 2025)**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.21776) \u003Cbr>\n> **Authors:** Xiaoxi Li*, Jiajie Jin*, Guanting Dong*, Hongjin Qian, Yutao Zhu, Yongkang Wu, Ji-Rong Wen, Zhicheng Dou \u003Cbr>\n> **TLDR:** A deep research agent that empowers large reasoning models with autonomous search, web browsing, and research report drafting capabilities. 
\u003Cbr>\n[![github](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Github-black?logo=github)](https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FWebThinker) [![github](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FRUC-NLPIR\u002FWebThinker.svg?style=social)](https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FWebThinker) [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2504.21776-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.21776) [![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Paper-yellow?logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2504.21776)\n\n> [**Search-o1: Agentic Search-Enhanced Large Reasoning Models (EMNLP 2025)**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.05366) \u003Cbr>\n> **Authors:** Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, Zhicheng Dou \u003Cbr>\n> **TLDR:** An agentic search-enhanced framework that integrates autonomous knowledge retrieval with large reasoning models through Agentic RAG and reasoning-in-documents modules. 
\u003Cbr>\n[![github](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Github-black?logo=github)](https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FSearch-o1) [![github](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FRUC-NLPIR\u002FSearch-o1.svg?style=social)](https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FSearch-o1) [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2501.05366-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.05366) [![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Paper-yellow?logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2501.05366) [![Project Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-green)](https:\u002F\u002Fsearch-o1.github.io\u002F)\n\u003C\u002Fp>\u003C\u002Fdetails>\n\n\n\n## 🎬 Demo\n\n\u003Cdiv align=\"center\">\n    \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa38e82ec-5aed-4efe-a8b8-e9ee2d97e9b9\" \u002F>\n\u003C\u002Fdiv>\n\n## 💡 Overview\n\n**WebThinker** is a deep research framework fully powered by large reasoning models (LRMs). 
WebThinker enables LRMs to **autonomously search**, **deeply explore web pages**, and **draft research reports**, all within their thinking process.\n\nUnlike existing open-source deep search agents that typically employ retrieval-augmented generation (RAG) with predefined workflows, WebThinker allows the reasoning model itself to perform actions during thinking, achieving **end-to-end task execution** in a single generation.\n\n### 📊 Overall Performance\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FRUC-NLPIR_WebThinker_readme_dc106ac8bc3f.png\" width=\"100%\" \u002F>\n\u003C\u002Fp>\n\nAs shown above, WebThinker consistently outperforms competing approaches on both knowledge-intensive complex reasoning benchmarks (GPQA, GAIA, WebWalkerQA, HLE) and open-ended reasoning tasks for report generation. Our WebThinker-32B with QwQ-32B as backbone reasoning model achieves superior performance across all tasks.\n\n### ✨ The WebThinker Framework\n\n![Model Comparison](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FRUC-NLPIR_WebThinker_readme_5cdfd9edde0b.png)\n\n**WebThinker** enables reasoning models to autonomously conduct web searches and web page navigations to acquire external knowledge during their reasoning process. This approach significantly reduces the time and costs associated with information gathering for researchers in knowledge-intensive fields. Furthermore, WebThinker allows LRMs to draft section content while thinking and searching, producing comprehensive, customized reports that directly address users' research questions.\n\n**Key Features:**\n- We introduce a **Deep Web Explorer** that empowers LRMs to search, navigate pages by clicking interactive elements (like links or buttons), and extract relevant information. 
Based on initial search results, the LRM can initiate follow-up searches and traverse deeper links until it collects all relevant information.\n- For scientific reporting, our **Autonomous Think-Search-and-Draft** strategy integrates real-time knowledge seeking with report creation. We equip LRMs with three specialized tools: (1) drafting content for specific chapters, (2) checking the current report, and (3) editing the report—ensuring reports remain comprehensive, coherent, and adaptive to new insights.\n- We're developing **RL-based training strategies** to optimize end-to-end task performance by leveraging large-scale reasoning trajectories from complex tasks. Using the accuracy of reasoning, tool usage, and final outputs, we construct preference pairs for online DPO training, enabling the model to progressively improve its research capabilities.\n\n\n\n## 🔧 Installation\n\n### Environment Setup\n```bash\n# Create conda environment\nconda create -n webthinker python=3.9\nconda activate webthinker\n\n# Install requirements\ncd WebThinker-main\npip install -r requirements.txt\n```\n\n## 🏃 Quick Start\n\n### Pre-preparation\n\n#### Model Serving\nBefore running WebThinker, ensure your reasoning model and auxiliary model are served using vLLM. In our experiments, we use QwQ-32B as the reasoning model and Qwen2.5-32B-Instruct as the auxiliary model. You can also explore other instruction-tuned models as your auxiliary model, which will be used in webpage reading, report writing\u002Fediting, evaluation, etc. For detailed instructions on model serving, see [here](https:\u002F\u002Fdocs.vllm.ai\u002Fen\u002Fstable\u002Fserving\u002Fdistributed_serving.html). \n\n#### Web Parser Client\nFor better web crawling performance, we recommend setting up a web parser client in `scripts\u002Fsearch\u002Fbing_search.py` using [Crawl4AI](https:\u002F\u002Fgithub.com\u002Funclecode\u002Fcrawl4ai). 
This will help handle JavaScript-rendered content and provide more reliable webpage extraction.\n\nNow you can run different inference modes using the provided scripts. Below are examples of how to execute each mode:\n\n### Problem Solving Mode\n\n1. If you would like to ask a single question, run the following command:\n```bash\npython scripts\u002Frun_web_thinker.py \\\n    --single_question \"What is OpenAI Deep Research?\" \\\n    --search_engine \"serper\" \\\n    --serper_api_key \"YOUR_GOOGLE_SERPER_API\" \\\n    --api_base_url \"YOUR_API_BASE_URL\" \\\n    --model_name \"QwQ-32B\" \\\n    --aux_api_base_url \"YOUR_AUX_API_BASE_URL\" \\\n    --aux_model_name \"Qwen2.5-32B-Instruct\" \\\n    --tokenizer_path \"PATH_TO_YOUR_TOKENIZER\" \\\n    --aux_tokenizer_path \"PATH_TO_YOUR_AUX_TOKENIZER\"\n```\n\n2. If you would like to run results on benchmarks, run the following command:\n```bash\npython scripts\u002Frun_web_thinker.py \\\n    --dataset_name gaia \\\n    --split dev \\\n    --concurrent_limit 32 \\\n    --max_search_limit 15 \\\n    --search_engine \"serper\" \\\n    --serper_api_key \"YOUR_GOOGLE_SERPER_API\" \\\n    --api_base_url \"YOUR_API_BASE_URL\" \\\n    --model_name \"QwQ-32B\" \\\n    --aux_api_base_url \"YOUR_AUX_API_BASE_URL\" \\\n    --aux_model_name \"Qwen2.5-32B-Instruct\" \\\n    --tokenizer_path \"PATH_TO_YOUR_TOKENIZER\" \\\n    --aux_tokenizer_path \"PATH_TO_YOUR_AUX_TOKENIZER\"\n```\n\n### Report Generation Mode\n\n1. 
If you would like to ask a single question, run the following command:\n```bash\npython scripts\u002Frun_web_thinker_report.py \\\n    --single_question \"What are the models of OpenAI and what are the differences?\" \\\n    --search_engine \"serper\" \\\n    --serper_api_key \"YOUR_GOOGLE_SERPER_API\" \\\n    --api_base_url \"YOUR_API_BASE_URL\" \\\n    --model_name \"QwQ-32B\" \\\n    --aux_api_base_url \"YOUR_AUX_API_BASE_URL\" \\\n    --aux_model_name \"Qwen2.5-32B-Instruct\" \\\n    --tokenizer_path \"PATH_TO_YOUR_TOKENIZER\" \\\n    --aux_tokenizer_path \"PATH_TO_YOUR_AUX_TOKENIZER\"\n```\n\n2. If you would like to run results on benchmarks, run the following command:\n```bash\npython scripts\u002Frun_web_thinker_report.py \\\n    --dataset_name glaive \\\n    --split test \\\n    --concurrent_limit 32 \\\n    --search_engine \"serper\" \\\n    --serper_api_key \"YOUR_GOOGLE_SERPER_API\" \\\n    --api_base_url \"YOUR_API_BASE_URL\" \\\n    --model_name \"QwQ-32B\" \\\n    --aux_api_base_url \"YOUR_AUX_API_BASE_URL\" \\\n    --aux_model_name \"Qwen2.5-32B-Instruct\" \\\n    --tokenizer_path \"PATH_TO_YOUR_TOKENIZER\" \\\n    --aux_tokenizer_path \"PATH_TO_YOUR_AUX_TOKENIZER\"\n```\n\n**Parameters Explanation:**\n- `--dataset_name`: Name of the dataset to use (glaive).\n- `--split`: Data split to run (test).\n- `--single_question`: The question you want to ask when running in single question mode.\n- `--concurrent_limit`: Maximum number of concurrent requests.\n- `--max_search_limit`: Maximum number of search queries per reasoning session.\n- `--search_engine`: Search engine to use (bing or serper). 
Default: bing.\n- `--serper_api_key`: Your Google Serper API key (not required when using Bing).\n- `--bing_subscription_key`: Your Bing Search API subscription key (not required when using Serper).\n- `--api_base_url`: Base URL for the main model API.\n- `--model_name`: Name of the main model to use.\n- `--aux_api_base_url`: Base URL for the auxiliary model API.\n- `--aux_model_name`: Name of the auxiliary model to use.\n\n### Run Demo\n\nYou can run the demo we have created with the following command; it will conduct in-depth exploration and thinking based on the questions you input.\n```bash\ncd demo\nstreamlit run run_demo.py\n```\n\n**Note:** Before running, it is necessary to configure the relevant parameters in `demo\u002Fsettings.py`.\n\n### Benchmarks\n\nThe benchmarks we utilize are categorized into two types:\n- **Complex Reasoning Benchmarks:** \n    - **PhD-level Science QA:** [GPQA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.12022) (198 questions)\n    - **General AI Assistant:** [GAIA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.12983) (103 questions)\n    - **Web Exploration:** [WebWalkerQA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.07572) (680 questions)\n    - **Extremely Difficult Reasoning Problems:** [Humanity's Last Exam (HLE)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.14249) (500 questions)\n- **Scientific Report Evaluation:**\n    - **General Open-ended Reasoning Problem:** [Reasoning-v1-20m](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fglaiveai\u002Freasoning-v1-20m) (30 questions)\n\nAll the pre-processed data is available in the `.\u002Fdata\u002F` directory. For GAIA, HLE and Reasoning-v1-20m, we sampled a text-only subset of questions to efficiently conduct our evaluation.\n\n\n### Evaluation\n\nOur model inference scripts will automatically save the model's input and output texts for evaluation. 
\n\n#### Problem Solving Evaluation\n\nYou can use the following command to evaluate the model's problem solving performance:\n\n```bash\npython scripts\u002Fevaluate\u002Fevaluate.py \\\n    --output_path \"YOUR_OUTPUT_PATH\" \\\n    --task math \\\n    --use_llm \\\n    --api_base_url \"YOUR_AUX_API_BASE_URL\" \\\n    --model_name \"Qwen2.5-72B-Instruct\" \\\n    --extract_answer\n```\n**Parameters Explanation:**\n- `--output_path`: Path to the model's outputs for evaluation.\n- `--task`: Task name. You can always set it to math (suitable for any QA task), unless it is a code task, then set it to code. \n- `--use_llm`: Whether to use the LLM to evaluate the model's performance.\n- `--api_base_url`: Base URL for the LLM API.\n- `--model_name`: Model name for LLM evaluation.\n- `--extract_answer`: Whether to extract the answer from the model's output, otherwise it will use the last few lines of the model's output as the final answer. Only used when `--use_llm` is set to `True`.\n\n#### Report Generation Evaluation\n\nWe employ [DeepSeek-R1](https:\u002F\u002Fapi-docs.deepseek.com\u002F) and [GPT-4o](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fmodels\u002Fgpt-4o) to perform *listwise evaluation* for comparison of reports generated by different models. You can evaluate the reports using:\n\n```bash\npython scripts\u002Fevaluate\u002Fevaluate_report.py \\\n    --api-base-url \"YOUR_API_BASE_URL\" \\\n    --api-key \"YOUR_API_KEY\" \\\n    --models \"YOUR_MODEL_NAME\" \\\n    --model-to-test-dir \"YOUR_MODEL_OUTPUT_DIRECTORY\"\n```\n**Parameters Explanation:**\n- `--api-base-url`: Base URL for the LLM API (e.g., \"https:\u002F\u002Fopenrouter.ai\u002Fapi\u002Fv1\" or \"https:\u002F\u002Fapi.openai.com\u002Fv1\").\n- `--api-key`: Your API key for the LLM service.\n- `--models`: A list of model names (e.g., \"deepseek\u002Fdeepseek-r1\", \"openai\u002Fgpt-4o\") to be used for evaluating the reports. 
The script will iterate through these models to get evaluations.\n- `--model-to-test-dir`: Path to the directory where the generated reports (markdown files) from your model are stored.\n\n\n📊 **Report Comparison Available**: \n\nWe've included the complete set of 30 test reports generated by **WebThinker**, **Grok3 DeeperSearch** and **Gemini2.0 Deep Research** in the `.\u002Foutputs\u002F` directory for your reference and comparison.\n\n\n## 📄 Citation\n\nIf you find this work helpful, please cite our paper:\n```bibtex\n@article{Li2025WebThinker,\n  author       = {Xiaoxi Li and\n                  Jiajie Jin and\n                  Guanting Dong and\n                  Hongjin Qian and\n                  Yutao Zhu and\n                  Yongkang Wu and\n                  Ji{-}Rong Wen and\n                  Zhicheng Dou},\n  title        = {WebThinker: Empowering Large Reasoning Models with Deep Research Capability},\n  journal      = {CoRR},\n  volume       = {abs\u002F2504.21776},\n  year         = {2025},\n  url          = {https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2504.21776},\n  doi          = {10.48550\u002FARXIV.2504.21776},\n  eprinttype    = {arXiv},\n  eprint       = {2504.21776},\n  timestamp    = {Sun, 25 May 2025 20:50:43 +0200},\n  biburl       = {https:\u002F\u002Fdblp.org\u002Frec\u002Fjournals\u002Fcorr\u002Fabs-2504-21776.bib},\n  bibsource    = {dblp computer science bibliography, https:\u002F\u002Fdblp.org}\n}\n```\n\n## 📄 License\n\nThis project is released under the [MIT License](LICENSE).\n\n## 📞 Contact\n\nFor any questions or feedback, please reach out to us at [xiaoxi_li@ruc.edu.cn](xiaoxi_li@ruc.edu.cn).\n\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FRUC-NLPIR_WebThinker_readme_0a4aa37715c2.png)](https:\u002F\u002Fwww.star-history.com\u002F#RUC-NLPIR\u002FWebThinker&Date)\n","\u003Ch1 align=\"center\"> 🌐 WebThinker：赋能大型推理模型具备深度研究能力\u003C\u002Fh1>\n\n\n\u003Cdiv 
align=\"center\"> \n\n[![Notion](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNotion-WebThinker-red?style=flat&logo=notion&logoColor=white)](https:\u002F\u002Fforemost-beechnut-8ed.notion.site\u002FWebThinker-Empowering-Large-Reasoning-Models-with-Deep-Research-Capability-d13158a27d924a4b9df7f9ab94066b64) \n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-b5212f.svg?logo=arxiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.21776)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Hugging%20Face-yellow?logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2504.21776)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLICENSE-MIT-green.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT) \n[![Python 3.9+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.9+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002Frelease\u002Fpython-390\u002F) \n[![X (formerly Twitter) URL](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl?url=https%3A%2F%2Fx.com%2FKevin_GuoweiXu%2Fstatus%2F1858338565463421244)](https:\u002F\u002Fx.com\u002Fkakakbibibi\u002Fstatus\u002F1917768235069628823)\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Flixiaoxi45\u002FWebThinker-QwQ-32B\" target=\"_blank\">WebThinker-QwQ-32B\u003C\u002Fa> ｜\n🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Flixiaoxi45\u002FWebThinker-R1-7B\" target=\"_blank\">WebThinker-R1-7B\u003C\u002Fa> ｜\n🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Flixiaoxi45\u002FWebThinker-R1-14B\" target=\"_blank\">WebThinker-R1-14B\u003C\u002Fa> ｜\n🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Flixiaoxi45\u002FWebThinker-R1-32B\" target=\"_blank\">WebThinker-R1-32B\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\u003Ch5 align=\"center\"> 如果您喜欢我们的项目，请在 GitHub 上给我们一个星标 ⭐ 以获取最新更新。\u003C\u002Fh5>\n\n## 📣 最新动态\n\n- **[2025 年 9 月 18 日]**: 🎉 我们的论文 **[WebThinker: Empowering Large 
Reasoning Models with Deep Research Capability](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.21776)** 已被 NeurIPS 2025 接收！\n- **[2025 年 5 月 30 日]**: 🔍 WebThinker 现在支持使用 **[Google Serper API](https:\u002F\u002Fserper.dev\u002F)** 进行网络搜索！重要提示：**[Bing Search API](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fbing\u002Fapis\u002Fbing-web-search-api)** 将于 2025 年 8 月停用。\n- **[2025 年 5 月 9 日]**: WebThinker 的简要介绍可在 **[X](https:\u002F\u002Fx.com\u002Fkakakbibibi\u002Fstatus\u002F1917768235069628823)**、**[知乎](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F1903981050780192911)** 和 **[微信公众号](https:\u002F\u002Fmp.weixin.qq.com\u002Fs\u002FlVrTZQLmrJkkG5QYcEZTFA)** 等平台找到。\n- **[2025 年 5 月 1 日]**: 🤗 **[WebThinker 模型集合](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Flixiaoxi45\u002Fwebthinker-6812d5fd1287ee53d68f0557)** 现已在 Hugging Face 上线。您可以部署我们优化的模型用于深度研究任务。\n- **[2025 年 5 月 1 日]**: 📄 我们的论文现已发布在 **[arXiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.21776)** 和 **[Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2504.21776)** 上。\n- **[2025 年 3 月 31 日]**: 🎉 **[WebThinker Notion 页面](https:\u002F\u002Fforemost-beechnut-8ed.notion.site\u002FWebThinker-Empowering-Large-Reasoning-Models-with-Deep-Research-Capability-d13158a27d924a4b9df7f9ab94066b64)** 上线，包含详细的项目信息。\n- **[2025 年 3 月 31 日]**: 🚀 完整代码库已发布。WebThinker 现支持与 QwQ-32B 等开源推理模型配合进行深度研究。\n\n## 🔥 深度研究智能体家族\n\n\u003Cdetails open>\u003Csummary>欢迎尝试我们的深度研究智能体系列：\u003C\u002Fsummary>\u003Cp>\n\n\n> [**DeepAgent: A General Reasoning Agent with Scalable Toolsets (New!)**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21618) \u003Cbr>\n> **Authors:** Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Guanting Dong, Jiajie Jin, Yinuo Wang, Hao Wang, Yutao Zhu, Ji-Rong Wen, Yuan Lu, Zhicheng Dou \u003Cbr>\n> **TLDR:** 一个端到端的深度推理智能体，通过受大脑启发的记忆折叠机制（brain-inspired memory folding 
mechanism），执行自主思考、工具发现和动作执行。\u003Cbr>\n[![github](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Github-black?logo=github)](https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FDeepAgent) [![github](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FRUC-NLPIR\u002FDeepAgent.svg?style=social)](https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FDeepAgent) [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2510.21618-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21618) [![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Paper-yellow?logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2510.21618)\n\n\n > [**WebThinker: Empowering Large Reasoning Models with Deep Research Capability (NeurIPS 2025)**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.21776) \u003Cbr>\n> **Authors:** Xiaoxi Li*, Jiajie Jin*, Guanting Dong*, Hongjin Qian, Yutao Zhu, Yongkang Wu, Ji-Rong Wen, Zhicheng Dou \u003Cbr>\n> **TLDR:** 一个深度研究智能体，赋予大型推理模型自主搜索、网页浏览和研究报告撰写的能力。\u003Cbr>\n[![github](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Github-black?logo=github)](https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FWebThinker) [![github](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FRUC-NLPIR\u002FWebThinker.svg?style=social)](https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FWebThinker) [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2504.21776-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.21776) [![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Paper-yellow?logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2504.21776)\n\n> [**Search-o1: Agentic Search-Enhanced Large Reasoning Models (EMNLP 2025)**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.05366) \u003Cbr>\n> **Authors:** Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, Zhicheng Dou \u003Cbr>\n> **TLDR:** 
一个智能体搜索增强框架，通过智能体检索增强生成（Agentic RAG）和文档内推理（reasoning-in-documents）模块，将自主知识检索与大型推理模型相结合。\u003Cbr>\n[![github](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-Github-black?logo=github)](https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FSearch-o1) [![github](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FRUC-NLPIR\u002FSearch-o1.svg?style=social)](https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FSearch-o1) [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2501.05366-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.05366) [![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Paper-yellow?logo=huggingface)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2501.05366) [![Project Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-green)](https:\u002F\u002Fsearch-o1.github.io\u002F)\n\u003C\u002Fp>\u003C\u002Fdetails>\n\n\n\n## 🎬 演示\n\n\u003Cdiv align=\"center\">\n    \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa38e82ec-5aed-4efe-a8b8-e9ee2d97e9b9\" \u002F>\n\u003C\u002Fdiv>\n\n## 💡 概述\n\n**WebThinker** 是一个完全由大型推理模型（Large Reasoning Models, LRMs）驱动的深度研究框架。WebThinker 使 LRMs 能够在其思考过程中**自主搜索**、**深入探索网页**并**起草研究报告**。\n\n与通常采用预定义工作流和检索增强生成（Retrieval-Augmented Generation, RAG）的现有开源深度搜索智能体不同，WebThinker 允许推理模型本身在思考期间执行操作，从而在单次生成中实现**端到端任务执行**。\n\n### 📊 整体性能\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FRUC-NLPIR_WebThinker_readme_dc106ac8bc3f.png\" width=\"100%\" \u002F>\n\u003C\u002Fp>\n\n如上所示，WebThinker 在知识密集型复杂推理基准测试（GPQA, GAIA, WebWalkerQA, HLE）和用于报告生成的开放式推理任务中，始终优于竞争方法。我们的 WebThinker-32B 以 QwQ-32B 作为骨干推理模型，在所有任务上均实现了卓越的性能。\n\n### ✨ WebThinker 框架\n\n![Model Comparison](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FRUC-NLPIR_WebThinker_readme_5cdfd9edde0b.png)\n\n**WebThinker** 使推理模型能够在推理过程中自主进行网络搜索和网页导航，以获取外部知识。这种方法显著减少了知识密集型领域研究人员收集信息所需的时间和成本。此外，WebThinker 
允许大型推理模型（LRM）在思考和搜索的同时起草章节内容，生成全面、定制化的报告，直接回应用户的研究问题。\n\n**主要特性：**\n- 我们引入了一个 **深度网络探索器（Deep Web Explorer）**，赋能 LRM 进行搜索、通过点击交互元素（如链接或按钮）导航页面，并提取相关信息。基于初始搜索结果，LRM 可以发起后续搜索并遍历更深层的链接，直到收集到所有相关信息。\n- 对于科学报告，我们的 **自主思考 - 搜索 - 起草（Autonomous Think-Search-and-Draft）** 策略将实时知识寻求与报告创建相结合。我们为 LRM 配备了三个专用工具：(1) 起草特定章节的内容，(2) 检查当前报告，以及 (3) 编辑报告——确保报告保持全面、连贯，并能适应新的见解。\n- 我们正在开发 **基于强化学习（RL）的训练策略**，通过利用来自复杂任务的大规模推理轨迹来优化端到端任务性能。使用推理准确性、工具使用和最终输出的准确度，我们构建偏好对用于在线 DPO（直接偏好优化）训练，使模型能够逐步提高其研究能力。\n\n\n\n## 🔧 安装\n\n###  环境设置\n```bash\n# Create conda environment\nconda create -n webthinker python=3.9\nconda activate webthinker\n\n# Install requirements\ncd WebThinker-main\npip install -r requirements.txt\n```\n\n## 🏃 快速开始\n\n### 准备工作\n\n#### 模型服务部署\n在运行 WebThinker 之前，请确保您的推理模型和辅助模型已使用 vLLM 进行服务部署。在我们的实验中，我们使用 QwQ-32B 作为推理模型，使用 Qwen-32B-Instruct 作为辅助模型。您也可以探索其他指令微调模型作为您的辅助模型，这将用于网页阅读、报告撰写\u002F编辑、评估等。关于模型服务的详细说明，请参阅 [此处](https:\u002F\u002Fdocs.vllm.ai\u002Fen\u002Fstable\u002Fserving\u002Fdistributed_serving.html)。 \n\n#### 网页解析客户端\n为了获得更好的网络爬取性能，我们建议在 `scripts\u002Fsearch\u002Fbing_search.py` 中使用 [Crawl4AI](https:\u002F\u002Fgithub.com\u002Funclecode\u002Fcrawl4ai) 设置一个网页解析客户端。这将有助于处理 JavaScript 渲染的内容，并提供更可靠的网页提取。\n\n现在您可以使用提供的脚本运行不同的推理模式。以下是执行每种模式的示例：\n\n### 问题解决模式\n\n1. 如果您想询问单个问题，请运行以下命令：\n```bash\npython scripts\u002Frun_web_thinker.py \\\n    --single_question \"What is OpenAI Deep Research?\" \\\n    --search_engine \"serper\" \\\n    --serper_api_key \"YOUR_GOOGLE_SERPER_API\" \\\n    --api_base_url \"YOUR_API_BASE_URL\" \\\n    --model_name \"QwQ-32B\" \\\n    --aux_api_base_url \"YOUR_AUX_API_BASE_URL\" \\\n    --aux_model_name \"Qwen2.5-32B-Instruct\" \\\n    --tokenizer_path \"PATH_TO_YOUR_TOKENIZER\" \\\n    --aux_tokenizer_path \"PATH_TO_YOUR_AUX_TOKENIZER\"\n```\n\n2. 
如果您想在基准测试上运行结果，请运行以下命令：\n```bash\npython scripts\u002Frun_web_thinker.py \\\n    --dataset_name gaia \\\n    --split dev \\\n    --concurrent_limit 32 \\\n    --max_search_limit 15 \\\n    --search_engine \"serper\" \\\n    --serper_api_key \"YOUR_GOOGLE_SERPER_API\" \\\n    --api_base_url \"YOUR_API_BASE_URL\" \\\n    --model_name \"QwQ-32B\" \\\n    --aux_api_base_url \"YOUR_AUX_API_BASE_URL\" \\\n    --aux_model_name \"Qwen2.5-32B-Instruct\" \\\n    --tokenizer_path \"PATH_TO_YOUR_TOKENIZER\" \\\n    --aux_tokenizer_path \"PATH_TO_YOUR_AUX_TOKENIZER\"\n```\n\n### 报告生成模式\n\n1. 如果您想询问单个问题，请运行以下命令：\n```bash\npython scripts\u002Frun_web_thinker_report.py \\\n    --single_question \"What are the models of OpenAI and what are the differences?\" \\\n    --search_engine \"serper\" \\\n    --serper_api_key \"YOUR_GOOGLE_SERPER_API\" \\\n    --api_base_url \"YOUR_API_BASE_URL\" \\\n    --model_name \"QwQ-32B\" \\\n    --aux_api_base_url \"YOUR_AUX_API_BASE_URL\" \\\n    --aux_model_name \"Qwen2.5-32B-Instruct\" \\\n    --tokenizer_path \"PATH_TO_YOUR_TOKENIZER\" \\\n    --aux_tokenizer_path \"PATH_TO_YOUR_AUX_TOKENIZER\"\n```\n\n2. 
如果您想在基准测试上运行结果，请运行以下命令：\n```bash\npython scripts\u002Frun_web_thinker_report.py \\\n    --dataset_name glaive \\\n    --split test \\\n    --concurrent_limit 32 \\\n    --search_engine \"serper\" \\\n    --serper_api_key \"YOUR_GOOGLE_SERPER_API\" \\\n    --api_base_url \"YOUR_API_BASE_URL\" \\\n    --model_name \"QwQ-32B\" \\\n    --aux_api_base_url \"YOUR_AUX_API_BASE_URL\" \\\n    --aux_model_name \"Qwen2.5-32B-Instruct\" \\\n    --tokenizer_path \"PATH_TO_YOUR_TOKENIZER\" \\\n    --aux_tokenizer_path \"PATH_TO_YOUR_AUX_TOKENIZER\"\n```\n\n**参数说明：**\n- `--dataset_name`: 要使用的数据集名称 (glaive)。\n- `--split`: 要运行的数据划分 (test)。\n- `--single_question`: 在单问题模式下您想要提出的问题。\n- `--concurrent_limit`: 最大并发请求数。\n- `--max_search_limit`: 每个推理会话的最大搜索查询次数。\n- `--search_engine`: 要使用的搜索引擎 (bing 或 serper)。默认值：bing。\n- `--serper_api_key`: 您的 Google Serper API 密钥（使用 Bing 时不需要）。\n- `--bing_subscription_key`: 您的 Bing 搜索 API 订阅密钥（使用 Serper 时不需要）。\n- `--api_base_url`: 主模型 API 的基础 URL。\n- `--model_name`: 要使用的主模型名称。\n- `--aux_api_base_url`: 辅助模型 API 的基础 URL。\n- `--aux_model_name`: 要使用的辅助模型名称。\n\n### 运行演示\n\n您可以使用以下命令运行我们提供的演示，它将根据您输入的问题进行深入探索和思考。\n```bash\ncd demo\nstreamlit run run_demo.py\n```\n\n**注意：** 在运行之前，需要在 `demo\u002Fsettings.py` 中配置相关参数。\n\n### 基准测试 (Benchmarks)\n\n我们使用的基准测试分为两类：\n- **复杂推理基准测试 (Complex Reasoning Benchmarks):**\n    - **博士级科学问答 (QA):** [GPQA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.12022) (198 个问题)\n    - **通用 AI 助手:** [GAIA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.12983) (103 个问题)\n    - **网络探索:** [WebWalkerQA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.07572) (680 个问题)\n    - **极高难度推理问题:** [人类终极考试 (HLE)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.14249) (500 个问题)\n- **科学报告评估:**\n    - **通用开放域推理问题:** [Reasoning-v1-20m](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fglaiveai\u002Freasoning-v1-20m) (30 个问题)\n\n所有预处理的数据均可在 `.\u002Fdata\u002F` 目录中找到。对于 GAIA、HLE 和 Reasoning-v1-20m，我们采样了一部分仅包含文本的问题子集，以便更高效地进行评估。\n\n\n### 评估 (Evaluation)\n\n我们的模型推理脚本会自动保存模型的输入和输出文本以供评估。\n\n#### 问题解决评估 (Problem Solving Evaluation)\n\n您可以使用以下命令来评估模型的问题解决性能：\n\n```bash\npython scripts\u002Fevaluate\u002Fevaluate.py \\\n    --output_path \"YOUR_OUTPUT_PATH\" \\\n    --task math \\\n    --use_llm \\\n    --api_base_url \"YOUR_AUX_API_BASE_URL\" \\\n    --model_name \"Qwen2.5-72B-Instruct\" \\\n    --extract_answer\n```\n**参数说明:**\n- `--output_path`: 用于评估的模型输出路径。\n- `--task`: 任务名称。除非是代码任务（此时设为 code），否则通常可以将其设置为 math（适用于任何问答 (QA) 任务）。\n- `--use_llm`: 是否使用大语言模型 (LLM) 来评估模型的性能。\n- `--api_base_url`: 评估所用 LLM API 的基础 URL。\n- `--model_name`: 用于 LLM 评估的模型名称。\n- `--extract_answer`: 是否从模型输出中提取答案；若不提取，则使用模型输出的最后几行作为最终答案。仅在启用 `--use_llm` 时生效。\n\n#### 报告生成评估 (Report Generation Evaluation)\n\n我们采用 [DeepSeek-R1](https:\u002F\u002Fapi-docs.deepseek.com\u002F) 和 [GPT-4o](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fmodels\u002Fgpt-4o) 执行 *列表级评估 (listwise evaluation)*，以比较不同模型生成的报告。您可以使用以下命令评估报告：\n\n```bash\npython scripts\u002Fevaluate\u002Fevaluate_report.py \\\n    --api-base-url \"YOUR_API_BASE_URL\" \\\n    --api-key \"YOUR_API_KEY\" \\\n    --models \"YOUR_MODEL_NAME\" \\\n    --model-to-test-dir \"YOUR_MODEL_OUTPUT_DIRECTORY\"\n```\n**参数说明:**\n- `--api-base-url`: 评估所用 LLM API 的基础 URL（例如：\"https:\u002F\u002Fopenrouter.ai\u002Fapi\u002Fv1\" 或 \"https:\u002F\u002Fapi.openai.com\u002Fv1\"）。\n- `--api-key`: 您用于 LLM 服务的 API 密钥。\n- `--models`: 用于评估报告的模型名称列表（例如：\"deepseek\u002Fdeepseek-r1\", \"openai\u002Fgpt-4o\"）。脚本将遍历这些模型以获取评估结果。\n- `--model-to-test-dir`: 存储您的模型生成的报告（markdown 文件）的目录路径。\n\n\n📊 **提供报告对比**:\n\n我们在 `.\u002Foutputs\u002F` 目录中包含了由 **WebThinker**、**Grok3 DeeperSearch** 和 **Gemini2.0 Deep Research** 生成的全部 30 份测试报告，供您参考和对比。\n\n\n## 📄 引用 (Citation)\n\n如果您发现这项工作有帮助，请引用我们的论文：\n```bibtex\n@article{Li2025WebThinker,\n  author       = {Xiaoxi Li and\n                  Jiajie Jin and\n                  Guanting Dong and\n                  Hongjin Qian and\n         
         Yutao Zhu and\n                  Yongkang Wu and\n                  Ji{-}Rong Wen and\n                  Zhicheng Dou},\n  title        = {WebThinker: Empowering Large Reasoning Models with Deep Research Capability},\n  journal      = {CoRR},\n  volume       = {abs\u002F2504.21776},\n  year         = {2025},\n  url          = {https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2504.21776},\n  doi          = {10.48550\u002FARXIV.2504.21776},\n  eprinttype    = {arXiv},\n  eprint       = {2504.21776},\n  timestamp    = {Sun, 25 May 2025 20:50:43 +0200},\n  biburl       = {https:\u002F\u002Fdblp.org\u002Frec\u002Fjournals\u002Fcorr\u002Fabs-2504-21776.bib},\n  bibsource    = {dblp computer science bibliography, https:\u002F\u002Fdblp.org}\n}\n```\n\n## 📄 许可证 (License)\n\n本项目采用 [MIT 许可证](LICENSE) 发布。\n\n## 📞 联系方式 (Contact)\n\n如有任何问题或反馈，请通过 [xiaoxi_li@ruc.edu.cn](mailto:xiaoxi_li@ruc.edu.cn) 与我们联系。\n\n\n## 星标历史 (Star History)\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FRUC-NLPIR_WebThinker_readme_0a4aa37715c2.png)](https:\u002F\u002Fwww.star-history.com\u002F#RUC-NLPIR\u002FWebThinker&Date)","# WebThinker 快速上手指南\n\n## 📋 环境准备\n\n在开始之前，请确保满足以下系统要求和前置条件：\n\n1.  **Python 版本**: 需要 Python 3.9 或更高版本。\n2.  **API 密钥**:\n    *   推荐使用 **Google Serper API**（注意：Bing Search API 将于 2025 年 8 月停用）。\n    *   获取密钥：[https:\u002F\u002Fserper.dev\u002F](https:\u002F\u002Fserper.dev\u002F)\n3.  **模型服务 (vLLM)**:\n    *   WebThinker 依赖 vLLM 服务来运行推理模型和辅助模型。\n    *   **推理模型**: 例如 `QwQ-32B`。\n    *   **辅助模型**: 例如 `Qwen2.5-32B-Instruct` (用于网页阅读、报告撰写等)。\n    *   请参考 [vLLM 文档](https:\u002F\u002Fdocs.vllm.ai\u002Fen\u002Fstable\u002Fserving\u002Fdistributed_serving.html) 部署模型服务。\n4.  **Web 解析器 (可选但推荐)**:\n    *   为了处理 JavaScript 渲染内容，建议在 `scripts\u002Fsearch\u002Fbing_search.py` 中配置 [Crawl4AI](https:\u002F\u002Fgithub.com\u002Funclecode\u002Fcrawl4ai)。\n\n## 🛠️ 安装步骤\n\n1.  **创建并激活 Conda 环境**:\n    ```bash\n    conda create -n webthinker python=3.9\n    conda activate webthinker\n    ```\n\n2.  **安装依赖**:\n    ```bash\n    cd WebThinker-main\n    pip install -r requirements.txt\n    ```\n\n## 🚀 基本使用\n\n完成安装并配置好模型服务后，您可以运行以下命令体验问题解决模式。请确保替换命令中的 `YOUR_...` 占位符为您的实际配置信息。\n\n### 单问题解答示例\n\n```bash\npython scripts\u002Frun_web_thinker.py \\\n    --single_question \"What is OpenAI Deep Research?\" \\\n    --search_engine \"serper\" \\\n    --serper_api_key \"YOUR_GOOGLE_SERPER_API\" \\\n    --api_base_url \"YOUR_API_BASE_URL\" \\\n    --model_name \"QwQ-32B\" \\\n    --aux_api_base_url \"YOUR_AUX_API_BASE_URL\" \\\n    --aux_model_name \"Qwen2.5-32B-Instruct\" \\\n    --tokenizer_path \"PATH_TO_YOUR_TOKENIZER\" \\\n    --aux_tokenizer_path \"PATH_TO_YOUR_AUX_TOKENIZER\"\n```\n\n> **注意**: \n> - `--api_base_url` 需指向您启动的 vLLM 推理模型服务地址。\n> - `--aux_api_base_url` 需指向辅助模型服务地址。\n> - 如需生成完整研究报告，可使用 `scripts\u002Frun_web_thinker_report.py` 脚本。","某互联网大厂的产品经理正在筹备下一代智能健康监测设备的立项方案，急需了解全球最新的竞品动态与技术路线，以支撑季度战略规划。\n\n### 没有 WebThinker 时\n- 依赖通用大模型，因训练数据截止，无法获取近半年发布的海外竞品核心参数。\n- 需人工在多个搜索引擎反复查询，手动复制粘贴整理信息，耗时数天且易疲劳出错。\n- 面对海量碎片化新闻，难以辨别真伪，常因幻觉问题引用错误数据导致决策风险。\n- 仅能进行浅层信息罗列，缺乏对技术演进趋势的深度逻辑推演与关联分析。\n\n### 使用 WebThinker 后\n- 集成 Google Serper API，自动检索全网实时资讯，彻底突破模型知识库的时效限制。\n- 自主规划多步搜索策略，自动聚合并清洗数据，将原本数天的调研工作压缩至小时级。\n- 利用内置推理模型交叉验证信源，显著降低幻觉率，确保关键市场数据准确可信。\n- 输出包含引用链接的深度分析报告，直接提供可落地的技术路线建议，赋能业务决策。\n\nWebThinker 通过赋予大模型深度联网推理能力，让复杂市场调研从“人工搬运”升级为“智能洞察”。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FRUC-NLPIR_WebThinker_dc106ac8.png","RUC-NLPIR","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FRUC-NLPIR_de7305eb.jpg","",null,"https:\u002F\u002Fruc-nlpir.github.io\u002F","https:\u002F\u002Fgithub.com\u002FRUC-NLPIR",[83],{"name":84,"color":85,"percentage":86},"Python","#3572A5",100,1428,138,"2026-04-03T06:43:23","MIT","未说明","未明确说明，需支持 vLLM 部署大模型（如 QwQ-32B）",{"notes":94,"python":95,"dependencies":96},"1. 
需配置 Google Serper API Key（Bing Search API 将于 2025 年 8 月停用）；2. 必须使用 vLLM 部署推理模型及辅助模型；3. 推荐使用 Crawl4AI 搭建网页解析客户端以处理 JS 渲染内容；4. 支持问题解决与报告生成两种模式；5. 模型权重可在 Hugging Face 获取。","3.9+",[97,98],"vllm","crawl4ai",[18],[101,102,103,104,105,106,107,108,109,110,111,112,113],"deepresearch","deepsearch","deepseek-r1","gaia","gpqa","hle","o1","o3","qwq","reasoning","reportgen","webwalker","research","2026-03-27T02:49:30.150509","2026-04-06T09:44:28.871338",[117,122,127,132,137,142],{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},3729,"为什么我复现 GAIA 基准测试时，计算出的分数与 README 中的数值不一致？","这通常是因为对评估指标的理解差异。虽然精确匹配（exact match）可能因空格或大小写不同而判错，但在 `llm_equal` 指标下，只要内容正确即被视为正确。此外，请检查您的 NLTK 库路径配置是否正确，并建议将搜索返回的文档数量设置为 10，以确保检索到足够的相关信息。","https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FWebThinker\u002Fissues\u002F3",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},3730,"主推理代理（Main Agent）和网页探索代理（Web Explorer）应该如何选择和配对模型？","建议使用参数量级相匹配的指令微调模型。例如，当主推理模型使用 WebThinker-R1-7B 时，网页探索代理应配对 Qwen2.5-7B-Instruct；若主模型为 32B 版本，则辅助模型应对应使用 Qwen2.5-32B-Instruct。","https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FWebThinker\u002Fissues\u002F27",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},3731,"项目的训练数据和代码计划开源吗？","是的，目前与训练相关的代码和数据正在整理中。一旦准备就绪，项目方计划将其公开发布。数据的筛选标准包括报告格式、长度、章节顺序以及模型裁判给出的完整性、事实性等指标。","https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FWebThinker\u002Fissues\u002F18",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},3732,"安装完成后，如果找不到 WebThinker-main 文件夹或 requirements.txt 文件怎么办？","请确认您是否已经通过 git clone 命令成功克隆了整个仓库。部分用户在正确执行 clone 操作后，即可在本地看到包含 WebThinker-main 文件夹和依赖文件的完整目录结构。","https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FWebThinker\u002Fissues\u002F16",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},3733,"为什么我的搜索返回文档过少，导致测试结果不理想？","这可能是因为 NLTK 库未正确配置，导致无法完整抓取页面内容。建议您检查并正确设置 NLTK 的下载路径，同时在运行脚本时将返回文档的数量限制显式设置为 
10，以保证检索质量。","https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FWebThinker\u002Fissues\u002F12",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},3734,"为什么项目选择使用 LLM as Judge 的方式进行结果评估？","这是因为 Gaia、WebWalkerQA、HLE 等官方基准测试通常只提供标准答案标签，而不提供详细的中间验证集。因此，为了更客观地评估生成内容的质量（而不仅仅是字符串匹配），社区和官方普遍采用大语言模型作为裁判来进行语义层面的比对。","https:\u002F\u002Fgithub.com\u002FRUC-NLPIR\u002FWebThinker\u002Fissues\u002F5",[]]