[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-inclusionAI--ASearcher":3,"tool-inclusionAI--ASearcher":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",146793,2,"2026-04-08T23:32:35",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108111,"2026-04-08T11:23:26",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":72,"owner_avatar_url":73,"owner_bio":74,"owner_company":75,"owner_location":75,"owner_email":75,"owner_twitter":75,"owner_website":76,"owner_url":77,"languages":78,"stars":91,"forks":92,"last_commit_at":93,"license":75,"difficulty_score":94,"env_os":95,"env_gpu":96,"env_ram":97,"env_deps":98,"category_tags":107,"github_topics":75,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":108,"updated_at":109,"faqs":110,"releases":143},5752,"inclusionAI\u002FASearcher","ASearcher","An Open-Source Large-Scale Reinforcement Learning Project for Search Agents","ASearcher 是一个专为搜索智能体打造的开源大规模强化学习框架，旨在通过在线训练将搜索能力提升至专家水平。它主要解决了传统搜索代理在长程任务中难以维持高效探索、训练资源利用率低以及缺乏高质量合成数据等痛点。\n\n该项目非常适合希望构建高性能搜索机器人的开发者与研究人员。ASearcher 提供了从数据合成、模型权重到完整训练流程的全套开源资源，让用户能低成本地定制专属智能体。其核心技术亮点包括：引入基于提示词的数据合成智能体，自动生成高难度问答对以丰富训练多样性；采用全异步强化学习架构，将轨迹收集与模型训练解耦，彻底消除 GPU 空闲时间，支持超过 100 轮工具调用和 40 万 token 生成的超长程搜索任务。\n\n在性能表现上，ASearcher 无需依赖外部大模型，仅在 32B 参数规模下，便在 
GAIA、xBench-DeepSearch 等多个权威基准测试中超越了其他开源方案，并通过强化学习带来了显著的性能跃升。无论是进行学术研究还是工程落地，ASearcher 都为打造下一代自主搜索代理提供了坚实可靠的基础设施。","\u003Ch1 align=\"center\">\n\u003Cem>ASearcher\u003C\u002Fem>: An Open-Source Large-Scale\nReinforcement Learning Project for Search Agents\n\u003C\u002Fh1>\n\n\u003Cp align=\"center\">| \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.07976\">\u003Cb>📰 Paper\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FinclusionAI\u002FASearcher-train-data\">\u003Cb>🤗 Datasets\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FinclusionAI\u002Fasearcher-6891d8acad5ebc3a1e1fb2d1\">\u003Cb>🤗 Models\u003C\u002Fb>\u003C\u002Fa> | \u003C\u002Fp>\n\n# Introduction\n\nASearcher is an open-source framework designed for large-scale online reinforcement learning (RL) training of search agents. Our mission is to advance Search Intelligence to expert-level performance. We are fully committed to open-source by releasing model weights, detailed training methodologies, and data synthesis pipelines. Additionally, we provide comprehensive guidance on building and training customized agents based on AReaL. ASearcher empowers developers to build their own high-performance search agents easily and cost-effectively.\n\n**ASearcher Highlights**\n\n+ 🔁 **Data Synthesis Agent**: We introduce a prompt-based LLM agent that autonomously generates grounded, challenging, and highly uncertain QA pairs to enhance training diversity.\n+ ⚡ **Fully Asynchronous Agentic RL**: Our scalable agentic RL framework decouples trajectory collection from model training, eliminating GPU idle time and enabling efficient long-horizon RL training.\n+ 🌐 **RL Enables Long-Horizon Search**: Through RL training, ASearcher exhibits long-horizon search, with tool calls exceeding 100 rounds and generated tokens surpassing 400k during RL training. 
\n+ 🏆 **Cutting-Edge Performance**: With a simple agent design and no external LLMs, ASearcher achieves *Avg@4 scores of 58.7, 51.1, and 74.5* on GAIA, xBench-DeepSearch, and Frames, respectively, surpassing other open-source search agents on the same 32B scale. ASearcher achieves *Pass@4 scores of 74.7, 75.0, and 85.5* on GAIA, xBench-DeepSearch, and Frames.\n+ 📈 **Substantial Improvement Through RL**: RL training  brings improvements of *+15.0, +22.4, and +14.6* Avg@4 scores on GAIA, xBench-DeepSearch, and Frames, respectively.\n+ 🛠️ **Fully Open-Source**: We are committed to open-sourcing all components for agentic RL training, including datasets, data synthesis agent, training details, model weights, and detailed guidelines for customized agent development.\u003Cfont style=\"color:#DF2A3F;\"> The released models and data could be found at [🤗Huggingface](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FinclusionAI\u002Fasearcher-6891d8acad5ebc3a1e1fb2d1) \u003C\u002Ffont>.\n\n**📰 News & Updates**:\n- 2025-09-18: Training code and latest model for ASearcher-Web-QwQ are released! Checkout [ASearcher-Web-QwQ-V2](https:\u002F\u002Fhuggingface.co\u002FinclusionAI\u002FASearcher-Web-QwQ-V2) and [our training code](ASearcher\u002Ftrain\u002Fasearcher_reasoning.py) for smooth large-scale agentic RL training!\n- 2025-09-18: More clean & flexible training! ASearcher now uses [AReaL](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL) as a package.\n- 2025-08-30: **ASearcher-Web-QwQ-V2 incoming!** State-of-the-art search agent with improved training data and end-to-end agentic RL training. 
Model and data will be released soon.\n- 2025-08-09: Our [technical report](assets\u002FASearcher.pdf) is released.\n- 2025-08-05: **ASearcher** is released, try asynchronous RL training and automatic QA synthesis to train an advanced search agent!🎉\n\n# Results Showcase\nWe evaluate our approach on challenging QA benchmarks (GAIA, xBench-DeepSearch, and Frames), which test advanced problem-solving abilities and web search strategies. These benchmarks are specifically designed to assess an agent's capability to interact with the real web and retrieve up-to-date information, often beyond the internal knowledge of LLMs.\n\n**Cutting-Edge Performance.** Our agent, [ASearcher-Web-QwQ-v2](https:\u002F\u002Fhuggingface.co\u002FinclusionAI\u002FASearcher-Web-QwQ), achieves state-of-the-art performance among open-source agents, with the highest Avg@4 scores on GAIA and xBench. Additionally, we report Pass@4, which measures the ratio of questions where the agent finds the correct answer within four trials. [ASearcher-Web-QwQ-v2](https:\u002F\u002Fhuggingface.co\u002FinclusionAI\u002FASearcher-Web-QwQ) also outperforms existing open-source agents in terms of pass rate, further demonstrating its robustness.\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_ASearcher_readme_e8abacc843ca.png)\n\n\u003Cp align=\"center\"> Fig.1 The performance of various methods based on 32B-scale models on GAIA, xBench-DeepSearch, and frames. Avg@4 and Pass@4 are reported  \u003C\u002Fp>\n\n\n**Substantial Improvements Through RL.** When comparing performance before and after reinforcement learning (RL), [ASearcher-Web-QwQ-v2](https:\u002F\u002Fhuggingface.co\u002FinclusionAI\u002FASearcher-Web-QwQ) achieves improvements of *+15.0, +22.4, and +14.6* on GAIA, xBench-DeepSearch, and Frames, respectively. 
In terms of pass rate (Pass@4), [ASearcher-Web-QwQ-v2](https:\u002F\u002Fhuggingface.co\u002FinclusionAI\u002FASearcher-Web-QwQ) also demonstrates significant gains—particularly on xBench-DeepSearch, where it shows a remarkable improvement of 22.4.\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_ASearcher_readme_c449e6a6cf85.jpg)\n\u003Cp align=\"center\"> Fig.2 Comparison of the performance of QwQ-32B agent before and after RL Training. \u003C\u002Fp>\n\n\n\n# Data Synthesis\nWe develop a prompt-based LLM agent designed to autonomously generate grounded, challenging, and highly uncertain QA pairs. The process begins with basic questions, which the agent then iteratively refines through two key strategies:\n\n+ Fuzzing: Increasing uncertainty by obscuring key details in the query.\n+ Context Injection: Augmenting questions with external facts retrieved via tools to deepen complexity.\n\nEach generated question undergoes rigorous multi-stage validation:\n\n+ Quality Assurance: Checks for fluency, timeliness, and logical coherence.\n+ Difficulty Verification: Compares answers generated by an LRM against ground truth to ensure challenge.\n+ Answer Uniqueness Validation: Confirms that incorrect LRM answers are indeed invalid, preserving question integrity.\n\n\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_ASearcher_readme_277b90285109.png)\n\n\u003Cp align=\"center\"> Fig.3 Data Synthesis Agent. \u003C\u002Fp>\n\n# Fully Asynchronous Agentic RL training\n\nOur analysis reveals significant **variance in the execution time of agent trajectories**. By examining the number of turns and generated tokens during RL training, we observe that lengthy trajectories can require dozens more turns than shorter ones. 
In terms of token generation, longer trajectories exceed their shorter counterparts by up to two orders of magnitude, as illustrated in the figure below.\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_ASearcher_readme_986d84098591.png)\n\n\u003Cp align=\"center\"> Fig.4 (Left) Number of turns versus training steps. (Right) Number of generated tokens versus training steps. \u003C\u002Fp>\n\n \n**Fully Asynchronous RL Training Enables Long-Horizon Tool Use.** In batch-generation RL systems, a batch must wait for the longest trajectory to complete, resulting in significant GPU idle time. In contrast, fully asynchronous reinforcement learning (RL) eliminates this bottleneck by completely decoupling training from trajectory generation. This allows relaxed turn limits (e.g., 128 turns\u002Ftrajectory), enabling agents to explore deeper search paths without sacrificing training efficiency. Remarkably, our agent, ASearcher-Web-QwQ, achieves extreme long-horizon search, **with tool calls exceeding 100 turns and generated tokens surpassing 400k during RL training**.\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_ASearcher_readme_20a64b57c27e.png)\n\n\u003Cp align=\"center\"> Fig.5 Illustration of fully asynchronous RL Training. 
\u003C\u002Fp>\n \n\n# Quick Start\n## Evaluation\nTo reproduce the results presented in Fig.2, please run the following script.\n\n```bash\ncd evaluation\u002F\n\nMODEL_PATH=\u002Fpath\u002Fto\u002Fmodels \nDATA_DIR=\u002Fpath\u002Fto\u002Ftest_set # Could be downloaded from [https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FinclusionAI\u002FASearcher-test-data]\n\nDATA_NAMES=GAIA,xbench-deepsearch,Frames\nAGENT_TYPE=asearcher-reasoning\nPROMPT_TYPE=asearcher-reasoning\nSEARCH_CLIENT_TYPE=async-web-search-access\n\nSCRIPT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" &> \u002Fdev\u002Fnull && pwd)\"\nPROJECT_ROOT=\"$(dirname \"$SCRIPT_DIR\")\"\n\nPYTHONPATH=\"${PROJECT_ROOT}:$PYTHONPATH\" \\\nSERPER_API_KEY=${your_serper_api_key} \\\nJINA_API_KEY=${your_jina_api_key} \\\nTOKENIZERS_PARALLELISM=false \\\npython3 search_eval_async.py \\\n    --data_names ${DATA_NAMES} \\\n    --model_name_or_path ${MODEL_PATH}  \\\n    --output_dir ${MODEL_PATH} \\\n    --data_dir ${DATA_DIR} \\\n    --prompt_type $PROMPT_TYPE \\\n    --agent-type ${AGENT_TYPE} \\\n    --search-client-type ${SEARCH_CLIENT_TYPE} \\\n    --tensor_parallel_size 4 \\\n    --temperature 0.6 \\\n    --parallel-mode seed \\\n    --seed 1 \\\n    --use-jina \\\n    --llm_as_judge \\\n    --pass-at-k 1  # increase this value for more stable results\n```\nPlease also refer to the [Evaluation doc](docs\u002Fevaluation.md) for detailed guidelines.\n\n## Training\n\n\n### Fine-tuning a 7B model\n\n**1. 
Set Up the Environment**\n\nPlease refer to https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Ftutorial\u002Finstallation.html#runtime-environment\n\n**2.1 Training a 7B model on 16 nodes (recommended)**\n```bash\ncd AReaL\n\nexport SERPER_API_KEY=YOUR_SERPER_API_KEY\nexport JINA_API_KEY=YOUR_JINA_API_KEY\npython3 -m areal.launcher.ray ASearcher\u002Ftrain\u002Fasearcher.py \\\n    --config ASearcher\u002Fconfigs\u002Fasearcher_web_16nodes.yaml \\\n    experiment_name=\u003Cyour experiment name> \\\n    trial_name=\u003Cyour trial name> \\\n    allocation_mode=sglang.d96p1t1+d32p1t1 \\\n    cluster.n_nodes=16 \\\n    cluster.n_gpus_per_node=8\n```\n\n**2.2 Training a 7B model on a single node (might be slow)**\n\n```bash\ncd AReaL\n\nexport SERPER_API_KEY=YOUR_SERPER_API_KEY\nexport JINA_API_KEY=YOUR_JINA_API_KEY\n\npython3 -m areal.launcher.local ASearcher\u002Ftrain\u002Fasearcher.py \\\n    --config ASearcher\u002Fconfigs\u002Fasearcher_web.yaml \\\n    experiment_name=\u003Cyour experiment name> \\\n    trial_name=\u003Cyour trial name>\n```\n\n### Fine-tuning a QwQ-32B Agent\n\n**Step 1.** Launch Qwen2.5-72B-Instruct for LLM-as-Judge:\n\n```shell\npython3 -m areal.launcher.ray ASearcher\u002Ftrain\u002Fasearcher_reasoning.py \\\n    --config ASearcher\u002Fconfigs\u002Fasearcher_web_qwq.yaml \\\n    experiment_name=asearcher-qwen72b-inst-server-only \\\n    trial_name=run1 \\\n    cluster.n_nodes=1 allocation_mode=sglang.d2t4p1 \\\n    actor.path=Qwen\u002FQwen2.5-72B-Instruct \n```\n\n**Step 2.** Launch QwQ-32B agent training:\n\n```shell\npython3 -m areal.launcher.ray \\\n    ASearcher\u002Ftrain\u002Fasearcher_reasoning.py \\\n    --config ASearcher\u002Fconfigs\u002Fasearcher_web_qwq.yaml \\\n    experiment_name=asearcher-qwq-train \\\n    trial_name=run1 cluster.n_nodes=6 allocation_mode=sglang.d2t8+d4t8 \\\n    actor.path=Qwen\u002FQwQ-32B \\\n    train_dataset.path=path_to_ASearcher-LRM-35k \\\n    
judge_engine.experiment_name=asearcher-qwen72b-inst-server-only \\\n    judge_engine.trial_name=run1\n```\n\n\nPlease also refer to the [Training doc](docs\u002Ftraining.md) for detailed guidelines.\n\n## Launch demo\nPlease refer to the [Demo documentation](demo\u002FREADME.md) to see how to launch an ASearcher visualization demo.\n\n## (Optional) Customization\n\nPlease refer to our [guideline](docs\u002Fguideline.md) for more information about building a custom agent.\n\n## (Optional) Data Synthesis\nThe data synthesis agent is provided in `qa_synthesis\u002Fqa_synthesis_agent.py`. To run the agent for synthesizing QA pairs, you need to:\n\n1. Download related data, including the Wikipedia 2018 webpages, and a list of sampled links\n2. Launch SGLang servers for two models: `QwQ-32B` and `Qwen2.5-72B-instruct`\n3. Run `python3 qa_synthesis\u002Fqa_synthesis_agent.py` to synthesize high-quality QAs!\n\n## Acknowledgements\n\nWe would like to acknowledge that the primary contributors to this work are from the RL Lab at Ant Research and the Institute for Interdisciplinary Information Sciences at Tsinghua University.\n\nOur team has also received invaluable assistance from the following groups:\n\n- The [AWorld](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAWorld) team at Ant Group for sharing their experience in agent development\n- The Super Computing Technology (SCT) team at Ant Group, particularly for their specialized knowledge in large-scale cluster management and operations\n\nWe are also grateful for the foundational work and inspiration provided by the research community, including but not limited to [Search-o1](https:\u002F\u002Fsearch-o1.github.io\u002F), [Search-R1](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1) and [WebAgent](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FWebAgent).\n\n## Citation\n\nPlease cite our work if you find it useful!\n\n```\n@misc{gao2025turnsunlockinglonghorizonagentic,\n      title={Beyond Ten Turns: 
Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL}, \n      author={Jiaxuan Gao and Wei Fu and Minyang Xie and Shusheng Xu and Chuyi He and Zhiyu Mei and Banghua Zhu and Yi Wu},\n      year={2025},\n      eprint={2508.07976},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.07976}, \n}\n```","\u003Ch1 align=\"center\">\n\u003Cem>ASearcher\u003C\u002Fem>: 一个面向搜索代理的大规模开源强化学习项目\n\u003C\u002Fh1>\n\n\u003Cp align=\"center\">| \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.07976\">\u003Cb>📰 论文\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FinclusionAI\u002FASearcher-train-data\">\u003Cb>🤗 数据集\u003C\u002Fb>\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FinclusionAI\u002Fasearcher-6891d8acad5ebc3a1e1fb2d1\">\u003Cb>🤗 模型\u003C\u002Fb>\u003C\u002Fa> | \u003C\u002Fp>\n\n# 简介\n\nASearcher 是一个开源框架，专为大规模在线强化学习（RL）训练搜索代理而设计。我们的使命是将搜索智能提升至专家级水平。我们完全秉持开源理念，公开模型权重、详细的训练方法以及数据合成流水线。此外，我们还提供基于 AReaL 构建和训练自定义代理的全面指南。ASearcher 赋能开发者以简单且经济高效的方式构建自己的高性能搜索代理。\n\n**ASearcher 亮点**\n\n+ 🔁 **数据合成代理**：我们引入了一种基于提示的 LLM 代理，能够自主生成有据可依、具有挑战性且高度不确定的问答对，从而提升训练多样性。\n+ ⚡ **全异步代理式强化学习**：我们的可扩展代理式 RL 框架将轨迹收集与模型训练解耦，消除了 GPU 的空闲时间，实现了高效的长 horizon 强化学习训练。\n+ 🌐 **RL 支持长 horizon 搜索**：通过强化学习训练，ASearcher 展现出长 horizon 搜索能力，工具调用次数超过 100 轮，RL 训练过程中生成的 token 数量超过 40 万。\n+ 🏆 **前沿性能**：在简单的代理设计且不依赖外部 LLM 的情况下，ASearcher 在 GAIA、xBench-DeepSearch 和 Frames 上分别取得了 *Avg@4 分数 58.7、51.1 和 74.5*，超越了其他同为 32B 规模的开源搜索代理。ASearcher 在 GAIA、xBench-DeepSearch 和 Frames 上的 *Pass@4 分数分别为 74.7、75.0 和 85.5*。\n+ 📈 **通过 RL 实现显著提升**：强化学习训练使 ASearcher 在 GAIA、xBench-DeepSearch 和 Frames 上的 Avg@4 分数分别提升了 *+15.0、+22.4 和 +14.6*。\n+ 🛠️ **完全开源**：我们致力于开源代理式强化学习训练的所有组件，包括数据集、数据合成代理、训练细节、模型权重以及自定义代理开发的详细指南。\u003Cfont style=\"color:#DF2A3F;\"> 已发布的模型和数据可在 
[🤗Huggingface](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FinclusionAI\u002Fasearcher-6891d8acad5ebc3a1e1fb2d1) 上找到。\u003C\u002Ffont>\n\n**📰 新闻与更新**：\n- 2025-09-18：ASearcher-Web-QwQ 的训练代码及最新模型已发布！请查看 [ASearcher-Web-QwQ-V2](https:\u002F\u002Fhuggingface.co\u002FinclusionAI\u002FASearcher-Web-QwQ-V2) 和 [我们的训练代码](ASearcher\u002Ftrain\u002Fasearcher_reasoning.py)，以实现流畅的大规模代理式强化学习训练！\n- 2025-09-18：更清洁、更灵活的训练！ASearcher 现在使用 [AReaL](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL) 作为软件包。\n- 2025-08-30：**ASearcher-Web-QwQ-V2 即将发布！** 具有改进的训练数据和端到端代理式强化学习训练的最先进搜索代理。模型和数据将很快发布。\n- 2025-08-09：我们的 [技术报告](assets\u002FASearcher.pdf) 已发布。\n- 2025-08-05：**ASearcher** 正式发布，尝试异步强化学习训练和自动 QA 合成，训练出先进的搜索代理吧！🎉\n\n# 结果展示\n我们在具有挑战性的 QA 基准测试（GAIA、xBench-DeepSearch 和 Frames）上评估了我们的方法，这些基准测试旨在检验高级问题解决能力和网络搜索策略。这些基准测试专门设计用于评估代理与真实网络交互并检索最新信息的能力，而这些信息往往超出了 LLM 的内部知识范围。\n\n**前沿性能。** 我们的代理 [ASearcher-Web-QwQ-v2](https:\u002F\u002Fhuggingface.co\u002FinclusionAI\u002FASearcher-Web-QwQ) 在开源代理中达到了最先进的水平，在 GAIA 和 xBench 上拥有最高的 Avg@4 分数。此外，我们还报告了 Pass@4，即代理在四次尝试内找到正确答案的比例。[ASearcher-Web-QwQ-v2](https:\u002F\u002Fhuggingface.co\u002FinclusionAI\u002FASearcher-Web-QwQ) 在通过率方面也优于现有的开源代理，进一步证明了其稳健性。\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_ASearcher_readme_e8abacc843ca.png)\n\n\u003Cp align=\"center\"> 图1 基于 32B 规模模型的各种方法在 GAIA、xBench-DeepSearch 和 Frames 上的表现。报告了 Avg@4 和 Pass@4。\u003C\u002Fp>\n\n\n**通过 RL 实现显著提升。** 将强化学习（RL）训练前后的性能进行比较时，[ASearcher-Web-QwQ-v2](https:\u002F\u002Fhuggingface.co\u002FinclusionAI\u002FASearcher-Web-QwQ) 在 GAIA、xBench-DeepSearch 和 Frames 上分别实现了 *+15.0、+22.4 和 +14.6* 的提升。在通过率（Pass@4）方面，[ASearcher-Web-QwQ-v2](https:\u002F\u002Fhuggingface.co\u002FinclusionAI\u002FASearcher-Web-QwQ) 也表现出显著增长——尤其是在 xBench-DeepSearch 上，其通过率提高了 22.4。\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_ASearcher_readme_c449e6a6cf85.jpg)\n\u003Cp align=\"center\"> 图2 QwQ-32B 代理在 RL 训练前后性能的对比。\u003C\u002Fp>\n\n\n\n# 
数据合成\n我们开发了一种基于提示的 LLM 代理，旨在自主生成有据可依、具有挑战性且高度不确定的问答对。该过程从基础问题开始，随后代理会通过两种关键策略不断优化问题：\n\n+ 模糊化：通过模糊查询中的关键细节来增加不确定性。\n+ 上下文注入：通过工具检索的外部事实来丰富问题内容，从而加深复杂性。\n\n每个生成的问题都会经过严格的多阶段验证：\n\n+ 质量保证：检查语言流畅性、时效性和逻辑连贯性。\n+ 难度验证：将 LRM 生成的答案与标准答案进行比较，以确保问题的挑战性。\n+ 答案唯一性验证：确认 LRM 的错误答案确实无效，从而保持问题的完整性。\n\n\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_ASearcher_readme_277b90285109.png)\n\n\u003Cp align=\"center\"> 图3 数据合成代理。\u003C\u002Fp>\n\n# 全异步智能体强化学习训练\n\n我们的分析揭示了**智能体轨迹执行时间存在显著差异**。通过考察强化学习训练过程中每条轨迹的回合数和生成的标记数量，我们发现较长的轨迹可能比短轨迹多出数十个回合。在标记生成方面，较长轨迹的生成量甚至可以达到较短轨迹的两 orders of magnitude，如下图所示。\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_ASearcher_readme_986d84098591.png)\n\n\u003Cp align=\"center\"> 图4（左）回合数与训练步数的关系。（右）生成标记数与训练步数的关系。\u003C\u002Fp>\n\n**全异步强化学习训练支持长 horizon 工具使用。** 在批量生成的强化学习系统中，整个批次必须等待最长的轨迹完成，这会导致 GPU 出现大量空闲时间。相比之下，全异步强化学习（RL）通过将训练与轨迹生成完全解耦，消除了这一瓶颈。这使得我们可以放宽回合限制（例如每条轨迹最多 128 回合），从而使智能体能够在不牺牲训练效率的情况下探索更深层次的搜索路径。值得注意的是，我们的智能体 ASearcher-Web-QwQ 实现了极长的 horizon 搜索，**在强化学习训练期间工具调用次数超过 100 回合，生成标记数超过 40 万**。\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_ASearcher_readme_20a64b57c27e.png)\n\n\u003Cp align=\"center\"> 图5 全异步强化学习训练示意图。\u003C\u002Fp>\n \n\n# 快速入门\n## 评估\n要复现图2中的结果，请运行以下脚本。\n\n```bash\ncd evaluation\u002F\n\nMODEL_PATH=\u002Fpath\u002Fto\u002Fmodels \nDATA_DIR=\u002Fpath\u002Fto\u002Ftest_set # 可从 [https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FinclusionAI\u002FASearcher-test-data] 下载\n\nDATA_NAMES=GAIA,xbench-deepsearch,Frames\nAGENT_TYPE=asearcher-reasoning\nPROMPT_TYPE=asearcher-reasoning\nSEARCH_CLIENT_TYPE=async-web-search-access\n\nSCRIPT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" &> \u002Fdev\u002Fnull && pwd)\"\nPROJECT_ROOT=\"$(dirname \"$SCRIPT_DIR\")\"\n\nPYTHONPATH=\"${PROJECT_ROOT}:$PYTHONPATH\" \\\nSERPER_API_KEY=${your_serper_api_key} \\\nJINA_API_KEY=${your_jina_api_key} \\\nTOKENIZERS_PARALLELISM=false \\\npython3 search_eval_async.py \\\n    
--data_names ${DATA_NAMES} \\\n    --model_name_or_path ${MODEL_PATH}  \\\n    --output_dir ${MODEL_PATH} \\\n    --data_dir ${DATA_DIR} \\\n    --prompt_type $PROMPT_TYPE \\\n    --agent-type ${AGENT_TYPE} \\\n    --search-client-type ${SEARCH_CLIENT_TYPE} \\\n    --tensor_parallel_size 4 \\\n    --temperature 0.6 \\\n    --parallel-mode seed \\\n    --seed 1 \\\n    --use-jina \\\n    --llm_as_judge \\\n    --pass-at-k 1 \\ # 如果您希望获得更稳定的结果，请提高此值\n```\n有关详细指南，请参阅[评估文档](docs\u002Fevaluation.md)。\n\n## 训练\n\n\n### 微调一个 7B 模型\n\n**1. 设置环境**\n\n请参考 https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Ftutorial\u002Finstallation.html#runtime-environment\n\n**2.1 在 16 个节点上训练 7B 模型（推荐）**\n```bash\ncd AReaL\n\nexport SERPER_API_KEY=YOUR_SERPER_API_KEY\nexport JINA_API_KEY=YOUR_JINA_API_KEY\npython3 -m areal.launcher.ray ASearcher\u002Ftrain\u002Fasearcher.py \\\n    --config ASearcher\u002Fconfigs\u002Fasearcher_web_16nodes.yaml \\\n    experiment_name=\u003Cyour experiment name> \\\n    trial_name=\u003Cyour trial name> \\\n    allocation_mode=sglang.d96p1t1+d32p1t1 \\\n    cluster.n_nodes=16 \\\n    cluster.n_gpus_per_node=8\n```\n\n**2.2 在单个节点上训练 7B 模型（可能会较慢）**\n\n```bash\ncd AReaL\n\nexport SERPER_API_KEY=YOUR_SERPER_API_KEY\nexport JINA_API_KEY=YOUR_JINA_API_KEY\n\npython3 -m areal.launcher.local ASearcher\u002Ftrain\u002Fasearcher.py \\\n    --config ASearcher\u002Fconfigs\u002Fasearcher_web.yaml \\\n    experiment_name=\u003Cyour experiment name> \\\n    trial_name=\u003Cyour trial name>\n```\n\n### 微调一个 QwQ-32B 智能体\n\n**步骤 1.** 启动 Qwen2.5-72B-Instruct 作为 LLM-as-Judge：\n\n```shell\npython3 -m areal.launcher.ray ASearcher\u002Ftrain\u002Fasearcher_reasoning.py \\\n    --config ASearcher\u002Fconfigs\u002Fasearcher_web_qwq.yaml \\\n    experiment_name=asearcher-qwen72b-inst-server-only \\\n    trial_name=run1 \\\n    cluster.n_nodes=1 allocation_mode=sglang.d2t4p1 \\\n    actor.path=Qwen\u002FQwen2.5-72B-Instruct \n```\n\n**步骤 2.** 启动 QwQ-32B 
智能体训练：\n\n```shell\npython3 -m areal.launcher.ray \\\n    ASearcher\u002Ftrain\u002Fasearcher_reasoning.py \\\n    --config ASearcher\u002Fconfigs\u002Fasearcher_web_qwq.yaml \\\n    experiment_name=asearcher-qwq-train \\\n    trial_name=run1 cluster.n_nodes=6 allocation_mode=sglang.d2t8+d4t8 \\\n    actor.path=Qwen\u002FQwQ-32B \\\n    train_dataset.path=path_to_ASearcher-LRM-35k \\\n    judge_engine.experiment_name=asearcher-qwen72b-inst-server-only \\\n    judge_engine.trial_name=run1\n```\n\n\n有关详细指南，请参阅[训练文档](docs\u002Ftraining.md)。\n\n## 启动演示\n请参阅[演示文档](demo\u002FREADME.md)，了解如何启动 asearcher 可视化演示。\n\n## （可选）自定义\n有关构建自定义智能体的更多信息，请参阅我们的[指南](docs\u002Fguideline.md)。\n\n## （可选）数据合成\n数据合成智能体位于 `qa_synthesis\u002Fqa_synthesis_agent.py` 中。要运行该智能体进行 QA 数据合成，您需要：\n\n1. 下载相关数据，包括维基百科 2018 年网页以及采样链接列表。\n2. 启动两个模型的 SGLang 服务器：`QwQ-32B` 和 `Qwen2.5-72B-instruct`。\n3. 运行 `python3 qa_synthesis\u002Fqa_synthesis_agent.py` 来合成高质量的 QA 对！\n\n## 致谢\n\n我们谨此感谢本工作的主要贡献者来自蚂蚁集团研究实验室的 RL 实验室以及清华大学交叉信息研究院。\n\n此外，我们的团队还得到了以下团队的宝贵帮助：\n\n- 蚂蚁集团的 [AWorld](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAWorld) 团队，分享了他们在智能体开发方面的经验。\n- 蚂蚁集团超级计算技术（SCT）团队，特别是在大规模集群管理和运维方面的专业知识。\n\n我们也要感谢研究社区提供的基础性工作和启发，包括但不限于 [Search-o1](https:\u002F\u002Fsearch-o1.github.io\u002F)、[Search-R1](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1) 和 [WebAgent](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FWebAgent)。\n\n## 引用\n如果您认为我们的工作有用，请引用我们的研究成果！\n\n```\n@misc{gao2025turnsunlockinglonghorizonagentic,\n      title={超越十回合：利用大规模异步强化学习解锁长 horizon 智能体搜索}, \n      author={Jiaxuan Gao 和 Wei Fu 和 Minyang Xie 和 Shusheng Xu 和 Chuyi He 和 Zhiyu Mei 和 Banghua Zhu 和 Yi Wu},\n      year={2025},\n      eprint={2508.07976},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.07976}, \n}\n```","# ASearcher 快速上手指南\n\nASearcher 是一个开源的大规模在线强化学习（RL）框架，专为训练搜索智能体（Search Agents）设计。它支持异步代理 RL 训练、自动数据合成，并能实现超过 100 轮工具调用的长程搜索能力。\n\n## 环境准备\n\n### 
系统要求\n- **操作系统**: Linux (推荐 Ubuntu 20.04+)\n- **GPU**: NVIDIA GPU (建议显存 ≥ 24GB，多卡或多节点训练效果更佳)\n- **Python**: 3.9 或更高版本\n- **CUDA**: 11.8 或 12.x\n\n### 前置依赖\n在开始之前，请确保已安装以下基础依赖：\n- Git\n- CUDA Toolkit\n- NCCL (用于多卡通信)\n\n**API Key 准备**:\n运行前需获取以下服务的 API Key 并设置为环境变量：\n- [Serper](https:\u002F\u002Fserper.dev\u002F) (搜索引擎): `SERPER_API_KEY`\n- [Jina AI](https:\u002F\u002Fjina.ai\u002F) (网页读取): `JINA_API_KEY`\n\n## 安装步骤\n\nASearcher 基于 [AReaL](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FAReaL) 框架构建。请按照以下步骤安装运行时环境。\n\n### 1. 克隆项目\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher.git\ncd ASearcher\n```\n\n### 2. 安装 AReaL 运行时环境\n请参考 AReaL 官方安装文档配置基础环境（包含 PyTorch, SGLang, Ray 等核心组件）：\n```bash\n# 访问 AReaL 安装指南\n# https:\u002F\u002Finclusionai.github.io\u002FAReaL\u002Ftutorial\u002Finstallation.html#runtime-environment\n```\n*注：国内开发者若遇到网络问题，可尝试配置 pip 国内镜像源（如清华源或阿里源）加速 Python 包下载。*\n\n### 3. 安装项目依赖\n进入项目根目录安装特定依赖（如有 `requirements.txt`）：\n```bash\npip install -r requirements.txt\n```\n\n## 基本使用\n\n### 场景一：模型评估 (Evaluation)\n复现论文中的测试结果（GAIA, xBench-DeepSearch, Frames）。\n\n1. **准备数据与模型**：\n   - 从 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FinclusionAI\u002FASearcher-test-data) 下载测试集。\n   - 下载预训练模型权重。\n\n2. 
**运行评估脚本**：\n```bash\ncd evaluation\u002F\n\nMODEL_PATH=\u002Fpath\u002Fto\u002Fmodels \nDATA_DIR=\u002Fpath\u002Fto\u002Ftest_set \n\nDATA_NAMES=GAIA,xbench-deepsearch,Frames\nAGENT_TYPE=asearcher-reasoning\nPROMPT_TYPE=asearcher-reasoning\nSEARCH_CLIENT_TYPE=async-web-search-access\n\nSCRIPT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" &> \u002Fdev\u002Fnull && pwd)\"\nPROJECT_ROOT=\"$(dirname \"$SCRIPT_DIR\")\"\n\nPYTHONPATH=\"${PROJECT_ROOT}:$PYTHONPATH\" \\\nSERPER_API_KEY=${your_serper_api_key} \\\nJINA_API_KEY=${your_jina_api_key} \\\nTOKENIZERS_PARALLELISM=false \\\npython3 search_eval_async.py \\\n    --data_names ${DATA_NAMES} \\\n    --model_name_or_path ${MODEL_PATH}  \\\n    --output_dir ${MODEL_PATH} \\\n    --data_dir ${DATA_DIR} \\\n    --prompt_type $PROMPT_TYPE \\\n    --agent-type ${AGENT_TYPE} \\\n    --search-client-type ${SEARCH_CLIENT_TYPE} \\\n    --tensor_parallel_size 4 \\\n    --temperature 0.6 \\\n    --parallel-mode seed \\\n    --seed 1 \\\n    --use-jina \\\n    --llm_as_judge \\\n    --pass-at-k 1\n```\n\n### 场景二：强化学习训练 (Training)\n\n#### 选项 A：微调 7B 模型 (单节点快速尝试)\n适用于资源有限的开发环境，速度较慢。\n\n```bash\ncd AReaL\n\nexport SERPER_API_KEY=YOUR_SERPER_API_KEY\nexport JINA_API_KEY=YOUR_JINA_API_KEY\n\npython3 -m areal.launcher.local ASearcher\u002Ftrain\u002Fasearcher.py \\\n    --config ASearcher\u002Fconfigs\u002Fasearcher_web.yaml \\\n    experiment_name=my_7b_experiment \\\n    trial_name=run1\n```\n\n#### 选项 B：微调 QwQ-32B 智能体 (多节点分布式)\n需要多节点集群支持，分为两步：启动裁判模型和启动训练。\n\n**第一步：启动裁判模型 (LLM-as-Judge)**\n```shell\npython3 -m areal.launcher.ray ASearcher\u002Ftrain\u002Fasearcher_reasoning.py \\\n    --config ASearcher\u002Fconfigs\u002Fasearcher_web_qwq.yaml \\\n    experiment_name=asearcher-qwen72b-inst-server-only \\\n    trial_name=run1 \\\n    cluster.n_nodes=1 allocation_mode=sglang.d2t4p1 \\\n    actor.path=Qwen\u002FQwen2.5-72B-Instruct \n```\n\n**第二步：启动 QwQ-32B 训练任务**\n```shell\npython3 -m areal.launcher.ray \\\n    
ASearcher\u002Ftrain\u002Fasearcher_reasoning.py \\\n    --config ASearcher\u002Fconfigs\u002Fasearcher_web_qwq.yaml \\\n    experiment_name=asearcher-qwq-train \\\n    trial_name=run1 cluster.n_nodes=6 allocation_mode=sglang.d2t8+d4t8 \\\n    actor.path=Qwen\u002FQwQ-32B \\\n    train_dataset.path=path_to_ASearcher-LRM-35k \\\n    judge_engine.experiment_name=asearcher-qwen72b-inst-server-only \\\n    judge_engine.trial_name=run1\n```\n\n### 场景三：启动演示 (Demo)\n查看可视化的搜索过程：\n```bash\n# 详见 demo\u002FREADME.md 中的具体启动命令\n```\n\n### 场景四：数据合成 (可选)\n自动生成高质量的问答对用于训练：\n1. 下载 Wikipedia 2018 网页数据及采样链接列表。\n2. 启动 `QwQ-32B` 和 `Qwen2.5-72B-Instruct` 的 SGLang 服务。\n3. 运行合成脚本：\n```bash\npython3 qa_synthesis\u002Fqa_synthesis_agent.py\n```\n\n> **提示**：更多自定义智能体开发指南请参阅 `docs\u002Fguideline.md`。","某金融科技团队需要构建一个能自动追踪全球政策变动并生成深度研报的智能助手，以辅助投资分析师快速决策。\n\n### 没有 ASearcher 时\n- **搜索深度不足**：传统代理往往在几次搜索后就停止，无法像人类专家那样进行超过 100 轮的深层信息挖掘，导致遗漏关键隐性线索。\n- **训练成本高昂且低效**：强化学习训练中 GPU 常因等待数据收集而闲置，难以支撑长周期任务训练，模型迭代速度极慢。\n- **数据多样性匮乏**：缺乏高质量的合成数据，模型在面对复杂、不确定的真实世界问题时泛化能力差，容易陷入死循环或给出幻觉答案。\n- **性能瓶颈明显**：在 GAIA 等高难度基准测试中，现有开源方案得分较低，无法独立处理需要多步推理和实时网页交互的复杂查询。\n\n### 使用 ASearcher 后\n- **实现超长程搜索**：借助强化学习优化，ASearcher 能自主执行超 100 轮工具调用，生成超过 40 万 token 的推理链，彻底厘清复杂的政策关联。\n- **训练效率飞跃**：其全异步智能体强化学习架构解耦了数据采集与模型训练，消除了 GPU 空闲时间，大幅降低了大规模训练的时间与经济成本。\n- **数据自我进化**：内置的数据合成智能体能自动生成高难度、高不确定性的问答对，显著提升了模型处理陌生领域问题的鲁棒性。\n- **专家级表现**：在同等规模下，ASearcher 在 GAIA 等权威榜单上的平均分提升超过 15 分，能够精准输出包含实时数据的深度分析结论。\n\nASearcher 通过开源的大规模强化学习框架，让开发者能以低成本打造出具备专家级长程推理与实时搜索能力的智能代理。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FinclusionAI_ASearcher_e8abacc8.png","inclusionAI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FinclusionAI_70666e45.jpg","This organization contains the series of open-source projects from Ant Group with dedicated efforts to work towards Artificial General Intelligence 
(AGI).",null,"https:\u002F\u002Finclusion-ai.org","https:\u002F\u002Fgithub.com\u002FinclusionAI",[79,83,87],{"name":80,"color":81,"percentage":82},"Python","#3572A5",98,{"name":84,"color":85,"percentage":86},"Shell","#89e051",1.5,{"name":88,"color":89,"percentage":90},"Dockerfile","#384d54",0.5,574,38,"2026-04-08T16:56:22",5,"Linux","必需 NVIDIA GPU。训练 7B 模型推荐 16 节点（每节点 8 GPU）；训练 QwQ-32B 模型需多节点集群（示例配置为 6 节点，每节点 8 GPU）。依赖 SGLang 和 Ray 进行分布式加速，未明确具体显存大小，但运行 32B 模型及长上下文（400k tokens）通常需要高显存（建议单卡 80GB 或多卡并行）。","未说明（大规模集群训练通常要求每节点 512GB+）",{"notes":99,"python":100,"dependencies":101},"该项目专注于大规模在线强化学习，严重依赖分布式集群环境。核心框架基于 AReaL，并使用 SGLang 作为推理后端。训练和评估需要配置 Serper API Key 和 Jina API Key 以支持网络搜索功能。官方文档主要提供 Linux 下的 Ray 集群启动脚本，未提及 Windows 或 macOS 的支持。由于涉及长程搜索（超过 100 轮工具调用和 40 万 token），对集群通信带宽和稳定性有较高要求。","3.8+",[102,103,104,105,106],"AReaL","SGLang","Ray","torch","transformers",[35,14,13],"2026-03-27T02:49:30.150509","2026-04-09T10:05:30.923737",[111,116,121,126,131,135,139],{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},26097,"为什么我无法复现 ASearcher-Local-7B 的论文评估结果，得到的分数显著偏低？","这是因为早期版本的 ASearcher prompt 存在缺陷，导致性能下降。该问题已在 PR #15 中修复。请拉取最新的代码库版本，然后重新运行评估脚本即可复现正确的结果。","https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher\u002Fissues\u002F12",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},26098,"使用 index_builder.py 生成的索引文件与 Search-R1 提供的索引文件有何区别？这会影响性能吗？","两者的核心构建逻辑基本相同，主要区别在于底层语料库的分区策略（corpus partition strategy）。但这并不是导致评估结果不匹配的根本原因。评估结果差异主要是由初始 prompt 错误引起的（已在最新代码中修复），而非索引文件的差异。","https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher\u002Fissues\u002F19",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},26099,"如何获取完整的测试数据集？下载后运行评估脚本报错怎么办？","完整测试数据可在 HuggingFace 数据集页面的 \"Files and versions\" 选项卡中找到。如果运行脚本报错，请检查以下两点修复：1. 将下载的 'frames' 文件夹重命名为 'Frames'；2. 
在 Frames\u002Ftest.json 文件中，将键名 \"Question\" 修改为 \"question\"（注意大小写）。修正后即可正常运行 OpenAI API 评估脚本。","https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher\u002Fissues\u002F7",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},26100,"如何复现合成数据（Synthesized Data）的过程？合成数据是用于 SFT 还是 RL？","合成数据的步骤如下：1. 按照文档下载本地 Wiki 语料库和网页数据；2. 搭建两个 SGLang 服务器，分别部署 QwQ-32B 和 Qwen2.5-72B-Instruct 模型；3. 运行 qa_synthesis\u002Fqa_synthesis_agent.py 脚本生成问答数据。关于用途：在该项目中，合成数据专门用于在线代理强化学习（Online Agentic RL），不过这些数据也可以用于生成轨迹以进行监督微调（SFT）。","https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher\u002Fissues\u002F3",{"id":132,"question_zh":133,"answer_zh":134,"source_url":130},26101,"如何处理 Wiki 数据以生成 pages_path 和 links_path 所需的文件？","无需额外处理。直接从 HuggingFace 下载以下两个文件：pages 文件 (wiki_webpages.jsonl) 和 links 文件 (wikilinks.jsonl)。下载完成后，直接将这两个文件的路径分别填入代码配置中的 `pages_path` 和 `links_path` 变量即可运行 QA 合成程序。",{"id":136,"question_zh":137,"answer_zh":138,"source_url":120},26102,"在使用本地检索器（local retriever e5）进行评估时，应该使用什么 Prompt？它支持 access 工具吗？","评估时应参考 evaluation\u002Futils.py 中的相关 Prompt 定义。需要注意的是，本地检索器（local retriever）本身并不支持 `\u003Caccess> url \u003C\u002Faccess>` 这种直接访问 URL 的操作。如果模型生成了无效的 `\u003Caccess>\u003C\u002Faccess>` 动作，可能是因为 Prompt 设置不当或模型误解了本地环境的限制。建议确认是否使用了针对本地模式优化的 Prompt 模板。",{"id":140,"question_zh":141,"answer_zh":142,"source_url":115},26103,"为什么 Bamboogle 数据集的 F1 分数即使修复了 Prompt 后仍然低于论文报告的值？","这主要是由于样本量较小导致的方差较大。Bamboogle 数据集仅包含 125 个样本，而其他数据集如 2WikiMultihopQA 有 1000 个样本。小样本会导致评估结果波动较大（例如在 0.51 到 0.56 之间波动），因此单次运行的结果可能无法完全达到论文中的高分，这属于正常现象。",[]]