[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Danau5tin--multi-agent-coding-system":3,"tool-Danau5tin--multi-agent-coding-system":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":10,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":107,"github_topics":81,"view_count":10,"oss_zip_url":81,"oss_zip_packed_at":81,"status":16,"created_at":108,"updated_at":109,"faqs":110,"releases":136},1070,"Danau5tin\u002Fmulti-agent-coding-system","multi-agent-coding-system","Reached #13 on Stanford's Terminal Bench leaderboard. Orchestrator, explorer & coder agents working together with intelligent context sharing.","multi-agent-coding-system 是一个基于多代理协作的AI编程系统，通过协调器、探索者和编码器代理协同工作，实现复杂编程任务的高效解决。系统利用智能上下文共享机制，将任务分解为子任务并动态分配给不同代理，确保每一步操作都能基于前序发现进行优化。该工具解决了传统单体模型在复杂任务中易陷入局部最优、协作效率低、代码质量不稳定等问题，尤其适用于需要精细控制和多步骤推理的编程场景。\n\n其核心优势在于多代理架构与异步处理能力，支持不同模型作为协调器与子代理，实现模块化扩展。系统还支持大规模训练，通过32块H100显卡并行训练，显著提升性能。在斯坦福TerminalBench测试中，其表现超越Claude Code，验证了多代理协作的高效性。开发者和研究人员可直接使用其开源代码进行定制开发，适合需要高精度编程辅助或复杂任务分解的场景。","# 🤓 Orchestrator: A multi-agent AI coder. Reached #13 on Stanford's TerminalBench. 
Open sourced!\n\nTL;DR:\n- Over the weekend, quite unexpectedly, I made a multi-agent AI system that places slightly higher than Claude Code on Stanford's TerminalBench leaderboard (13th place).\n- This AI system consists of an orchestration agent that dispatches multiple explorer and coder agents to do all the work.\n- The orchestrator explicitly defines what knowledge artifacts subagents must return, then reuses and synthesises these artifacts across future tasks - creating compound intelligence where each action builds meaningfully on previous discoveries.\n\n![Orchestrator with claude-sonnet-4 on Stanford's terminal bench](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FDanau5tin_multi-agent-coding-system_readme_cf5334f9e076.png)\n\n## 📰 Releases\n\n### (4th Nov 25) 🌊 Orca-Agent-v0.1 \n\n- RL trained 14B Orca-Agent-v0.1 ([separate repo here](https:\u002F\u002Fgithub.com\u002FDanau5tin\u002FOrca-Agent-RL))\n    - Qwen3-14B achieved a **160.71% relative increase on Stanford's TerminalBench** after training using this multi-agent framework.\n    - I scaled multi-agent RL training to 32x Nvidia H100s, rolling out 256 concurrent Docker environments simultaneously.\n    - Full training code, model weights, datasets, and documentation are open source in [this repo](https:\u002F\u002Fgithub.com\u002FDanau5tin\u002FOrca-Agent-RL).\n\n### (2nd Nov 25)\n- New system & agent abilities:\n  - Orchestrator & Subagent can be different models\n  - App now fully async\n  - App now a package so other projects can depend on it\n  - Agents receive a snapshot of the environment on startup\n  - Agent can reference task IDs as context refs to inject all of that task's contexts into a subagent\n  - New Single\u002FDistributed Node Docker Manager (used in tests, but mostly by RL)\n  - Bug fixes\n\n### (2nd Sept 25) 🤓 Orchestrator - multi-agent-coder \n\n- Agentic AI system placing #12 on Stanford's Terminal Bench (not for very long, so screenshot above shows #13 😅)\n- This 
placed higher than Claude Code\n- All orchestration code open sourced\n\n## How the System Works\n\n![System architecture overview](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FDanau5tin_multi-agent-coding-system_readme_26b6e6b00c77.png)\n\nThe orchestrator acts as the brain of the operation - it receives the user's task but never touches code directly. Instead, it:\n\n1. **Analyses** the task and breaks it into focused subtasks\n2. **Dispatches** explorer agents to understand the system\n3. **Delegates** implementation work to coder agents with precise instructions\n4. **Verifies** all changes through additional explorer agents\n5. **Maintains** the context store with all discovered knowledge\n\nThe orchestrator's lack of direct code access forces proper delegation and verification patterns, leading to more strategic solutions.\n\nFor a full breakdown of this project's code structure, see [here](.\u002FPROJECT_STRUCTURE.md)\n\n### Example model output\nThe model uses xml tags to define an action, and YAML to define the parameters of an action if they exist. Below shows the model thinking and then dispatching a coder agent.\n```xml\n\u003Cthink>\nOkay, now with task_003 identifying a backup, the next step would be to attempt recovery using this identified file.\n\u003C\u002Fthink>\n\n\u003Ctask_create>\nagent_type: 'coder'\ntitle: 'Attempt recovery using the identified backup file'\ndescription: |\n  Execute the following steps to recover using the backup identified in task_003:\n  1. Move the identified backup file (\u002Ftmp\u002Fterraform_work\u002F.terraform.tfstate.tmp) to the target location (\u002Finfrastructure\u002Frecovered_state.json).\n  2. 
Verify the operation by checking:\n     - The existence of the new file at the target location\n     - The size matches the original backup file\n     - The file permissions are set appropriately (rw-r--r--)\nmax_turns: 10\ncontext_bootstrap:\n  - path: '\u002Ftmp\u002Fterraform_work\u002F.terraform.tfstate.tmp'\n    reason: 'The backup file identified in task_003'\ncontext_refs:\n  - 'task_003'\n\u003C\u002Ftask_create>\n```\n\n\n## 📈 Evaluation Results\n\n### Performance on TerminalBench\n\n[Terminal bench](https:\u002F\u002Fwww.tbench.ai\u002F) is a brilliant benchmark created by Stanford and [Laude Institute](https:\u002F\u002Fwww.laude.org\u002F) to quantify agents' ability to complete complex tasks in the terminal. My Orchestrator system achieved **13th place** on the leaderboard, demonstrating competitive performance against leading AI coding assistants.\n\nI ran the Orchestrator evaluations with both Claude-4-Sonnet and Qwen3-Coder-480B-A35B:\n\n![Performance comparison chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FDanau5tin_multi-agent-coding-system_readme_47e4b0d7012e.png)\n![Orchestrator with qwen-3-coder on Stanford's terminal bench](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FDanau5tin_multi-agent-coding-system_readme_ceb65da15519.png)\n\nThis image shows Qwen-3-Coder performance on the benchmark. 
The screenshot towards the top of this README shows Sonnet-4 performance.\n\n### Cost & Efficiency\n\nOne of the most striking results is the number of tokens used by Sonnet-4 compared with Qwen3-Coder.\n\nThe table below shows the total tokens (input and output included) processed across the TerminalBench evaluation run (5 attempts at 80 tasks = 400 trajectories).\n\n| Model | Success Rate | Total Evaluation Cost | Token Usage |\n|-------|--------------|------------|-------------|\n| **Claude Sonnet-4** | 37.0% | $263.56* | 93.2M tokens |\n| **Qwen-3-Coder** | 19.7% | $217.83 | 14.7M tokens |\n\n*Claude Sonnet-4 costs reflect heavy caching usage, reducing actual API costs\n\n\n## 🤖 The Agents\n\nWhile all agents use the same underlying LLM, each operates with its own context window, specialised system message, and distinct toolset. This creates functionally different agents optimised for their specific roles.\n\n### 🎯 Orchestrator Agent\n[System message](.\u002Fsrc\u002Fagents\u002Fsystem_msgs\u002Fmd_files\u002Forchestrator_sys_msg_v0.1.md)\n**Role:** Strategic coordinator and persistent intelligence layer  \n**Capabilities:** Task decomposition, context management, subagent delegation  \n**Tools:** Task creation, subagent launching, context store management  \n**Restrictions:** Cannot read or modify code directly - operates purely at architectural level  \n\nThe orchestrator maintains the complete picture across all tasks, tracking discoveries and progress. 
It crafts precise task descriptions that explicitly specify what contexts subagents should return, ensuring focused and valuable information gathering.\n\n**Trust Calibration Strategy:**  \nThe orchestrator employs adaptive delegation based on task complexity:\n- **Low Complexity Tasks**: Grants extremely high autonomy to the coder agent for simple modifications and bug fixes\n- **Medium\u002FLarge Tasks**: Maintains strong trust but uses iterative decomposition - breaking complex problems into atomic, verifiable steps\n- **Verification Philosophy**: Uses explorer agents liberally to verify progress, especially when tasks involve critical functionality\n\n\n### 🔍 Explorer Agent \n[System message](.\u002Fsrc\u002Fagents\u002Fsystem_msgs\u002Fmd_files\u002Fexplorer_sys_msg_v0.1.md) \n**Role:** Read-only investigation and verification specialist  \n**Capabilities:** System exploration, code analysis, test execution, verification  \n**Tools:** File reading, search operations (grep\u002Fglob), bash commands, temporary script creation  \n**Restrictions:** Cannot modify existing files - strictly read-only operations  \n\nExplorers gather intelligence about the codebase, verify implementations, and discover system behaviors. They create knowledge artifacts that eliminate redundant exploration for future agents.\n\n### 💻 Coder Agent\n[System message](.\u002Fsrc\u002Fagents\u002Fsystem_msgs\u002Fmd_files\u002Fcoder_sys_msg_v0.1.md)\n**Role:** Implementation specialist with write access  \n**Capabilities:** Code creation\u002Fmodification, refactoring, bug fixes, system changes  \n**Tools:** Full file operations (read\u002Fwrite\u002Fedit), bash commands, search operations  \n**Restrictions:** None - full system access for implementation tasks  \n\nCoders transform architectural vision into working code. 
They receive focused tasks with relevant contexts and implement solutions while maintaining code quality and conventions.\n\n## Key System Components\n\n### 🧠 Smart Context Sharing\n\n#### How Context Sharing Works\n\nI introduced a novel approach to multi-agent coordination through the **Context Store** - a persistent knowledge layer that transforms isolated agent actions into coherent problem-solving. Unlike traditional multi-agent systems where agents operate in isolation, my architecture enables sophisticated knowledge accumulation and sharing.\n\n**The Context Store Pattern:**\n1. **Orchestrator-Directed Discovery**: The orchestrator explicitly specifies what contexts it needs from each subagent, ensuring focused and relevant information gathering and implementation reporting\n2. **Knowledge Artifacts**: Subagents create discrete, reusable context items based on the orchestrator's requirements\n3. **Persistent Memory**: Contexts persist across agent interactions, building a comprehensive system understanding\n4. **Selective Injection**: The orchestrator precisely injects relevant contexts into new tasks, eliminating redundant discovery and providing all the information a subagent needs to complete its respective task\n5. 
**Compound Intelligence**: Each action builds meaningfully on previous discoveries, creating exponential problem-solving capability\n\n**Key Benefits:**\n- **Eliminates Redundant Work**: Subagents never need to rediscover the same information twice\n- **Reduces Context Window Load**: Agents receive only the specific contexts they need\n- **Enables Complex Solutions**: Multi-step problems that no single agent could solve become tractable\n- **Maintains Focus**: Each subagent operates with a clean, focused context window\n\nThis architecture ensures that every piece of discovered information becomes a permanent building block for future tasks, creating a system that genuinely learns and adapts throughout the problem-solving process.\n\n### 📋 Task Management\n\nThe orchestrator maintains a comprehensive task management system that tracks all subagent activities:\n\n**Core Functions:**\n- **Progress Tracking**: Monitors task status (pending, completed, failed) across potentially hundreds of coordinated actions\n- **Failure Recovery**: Captures failure reasons to enable strategic adaptation and intelligent retries\n- **Workflow Orchestration**: Maintains clear audit trails of what's been attempted, preventing redundant work\n- **Strategic Planning**: Enables systematic decomposition of complex problems into verifiable subtasks\n\nThe task manager serves as the orchestrator's operational memory - while the context store holds discovered knowledge, the task manager tracks the journey of discovery itself. 
This dual-layer system ensures the orchestrator always knows both what it has learned AND how it learned it, enabling sophisticated multi-step solutions that build intelligently on previous attempts.\n\n### ⏱️ Time-Conscious Orchestration\n\nOne thing I noticed during early evaluations was that whilst the system was on track to complete extremely complex tasks, it would use lots of subagents to get there, so the task would time out.\n\nThe orchestrator therefore now employs a philosophy of time-efficient execution, recognising that wasted time often stems from poor task specification rather than slow execution:\n\n**Prevention Principles:**\n- **Front-Loading Precision**: The orchestrator spends time crafting exact task descriptions rather than iterating on vague ones\n- **Context Completeness**: Always over-provides context rather than under-providing, preventing subagents from rediscovering known information\n- **Explicit Expectations**: Every task specifies exactly what contexts should be returned, eliminating unfocused exploration\n- **Tight Scoping**: Defines clear boundaries - what to do AND what not to do, preventing scope creep\n\n\n## Getting started\n\nFor dev:\n```bash\nuv sync\n```\n\nTo run evals:\n```bash\n.\u002Frun_terminal_bench_eval.sh\n```\n\nTo quickly test various models: See [\u002Ftests](.\u002Ftests\u002F)\n\n## Notes\n- When I originally ran the evaluations, I saw my result would place me in 12th. By the time of submission (24 hours later), my agent placed 13th. Just 48 hours after this, my agent dropped to 15th! 
Such is the fascinating rate of progress in AI.\n\n## Acknowledgements\n- Thank you to [Taras](https:\u002F\u002Ftaras.com\u002F) for supporting the compute costs of my experiments\n- Thank you to all the amazing teams at Anthropic that enabled me to leverage Claude-4-Sonnet, which is such an incredible model\n- Likewise to the Qwen team for releasing Qwen3-Coder to the world, completely open source\n- Thank you to the Claude Code team for inspiring my agentic tool use implementation and philosophy\n- Thank you to the Litellm team for such a simple-to-use package, allowing me to switch providers with ease\n- Thank you to OpenRouter for such a great routing service, allowing me to switch models with ease","# 🤓 Orchestrator：一个多代理AI编码器。在斯坦福的TerminalBench上排名第13。开源！\n\nTL;DR:\n- 周末，出乎意料地，我开发了一个多代理AI系统，在斯坦福的TerminalBench排行榜上略高于Claude Code（排名第13）。\n- 该AI系统包含一个调度代理，负责分配多个探索者和编码代理执行所有工作。\n- 调度器明确定义子代理必须返回的知识工件，然后在后续任务中重用和合成这些工件——创建复合智能，每个动作都建立在先前发现的基础上。\n\n![Orchestrator与claude-sonnet-4在斯坦福的terminal bench](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FDanau5tin_multi-agent-coding-system_readme_cf5334f9e076.png)\n\n## 📰 版本发布\n\n### (11月4日) 🌊 Orca-Agent-v0.1 \n\n- 通过强化学习（Reinforcement Learning）训练了14B参数的Orca-Agent-v0.1 ([单独仓库链接](https:\u002F\u002Fgithub.com\u002FDanau5tin\u002FOrca-Agent-RL))\n    - Qwen3-14B在使用该多代理框架训练后，在斯坦福的TerminalBench上实现了**160.71%的相对提升**。\n    - 我将多代理强化学习训练扩展到32块Nvidia H100，同时启动256个并发Docker环境。\n    - 完整的训练代码、模型权重、数据集和文档均在[此仓库](https:\u002F\u002Fgithub.com\u002FDanau5tin\u002FOrca-Agent-RL)开源。\n\n### (11月2日)\n- 新系统与代理能力：\n  - 调度器与子代理可以是不同模型\n  - 应用现在完全异步\n  - 应用现在是一个包，其他项目可依赖它\n  - 代理在启动时接收环境快照\n  - 代理可通过任务ID作为上下文引用，将该任务的所有上下文注入子代理\n  - 新的单节点\u002F分布式节点Docker管理器（用于测试，但主要由RL使用）\n  - 修复了bug\n\n### (9月2日) 🤓 Orchestrator - 多代理编码器 \n\n- 在斯坦福的TerminalBench上排名第12（并非长期，因此上方截图显示排名第13 😅）\n- 该排名高于Claude Code\n- 所有调度代码已开源\n\n## 
系统工作原理\n\n![系统架构概览](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FDanau5tin_multi-agent-coding-system_readme_26b6e6b00c77.png)\n\n调度器是整个操作的“大脑”——它接收用户的任务但从不直接接触代码。相反，它：\n\n1. **分析**任务并将其分解为专注的子任务\n2. **调度**探索者代理以理解系统\n3. **委托**编码代理执行实现工作，提供精确的指令\n4. **验证**所有更改（由额外的探索者代理执行检查）\n5. **维护**上下文存储，包含所有发现的知识\n\n调度器无法直接访问代码，这迫使系统形成规范的委托与验证模式，从而产生更具策略性的解决方案。\n\n有关该项目代码结构的完整说明，请参见[此处](.\u002FPROJECT_STRUCTURE.md)\n\n### 示例模型输出\n模型使用XML标签定义操作，使用YAML定义操作参数（如果存在）。以下显示模型思考后调度编码代理：\n```xml\n\u003Cthink>\nOkay, now with task_003 identifying a backup, the next step would be to attempt recovery using this identified file.\n\u003C\u002Fthink>\n\n\u003Ctask_create>\nagent_type: 'coder'\ntitle: 'Attempt recovery using the identified backup file'\ndescription: |\n  Execute the following steps to recover using the backup identified in task_003:\n  1. Move the identified backup file (\u002Ftmp\u002Fterraform_work\u002F.terraform.tfstate.tmp) to the target location (\u002Finfrastructure\u002Frecovered_state.json).\n  2. 
Verify the operation by checking:\n     - The existence of the new file at the target location\n     - The size matches the original backup file\n     - The file permissions are set appropriately (rw-r--r--)\nmax_turns: 10\ncontext_bootstrap:\n  - path: '\u002Ftmp\u002Fterraform_work\u002F.terraform.tfstate.tmp'\n    reason: 'The backup file identified in task_003'\ncontext_refs:\n  - 'task_003'\n\u003C\u002Ftask_create>\n```\n\n\n## 📈 评估结果\n\n### 在TerminalBench上的表现\n\n[Terminal bench](https:\u002F\u002Fwww.tbench.ai\u002F) 是斯坦福大学和[Laude Institute](https:\u002F\u002Fwww.laude.org\u002F) 创建的卓越基准测试，用于量化代理在终端中完成复杂任务的能力。我的Orchestrator系统在排行榜上获得**第13名**，展示了与领先AI编码助手竞争的表现。\n\n我使用Claude-4-Sonnet和Qwen3-Coder-480B-A35B对Orchestrator进行了评估：\n\n![性能对比图表](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FDanau5tin_multi-agent-coding-system_readme_47e4b0d7012e.png)\n![Orchestrator与qwen-3-coder在斯坦福的terminal bench](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FDanau5tin_multi-agent-coding-system_readme_ceb65da15519.png)\n\n此图像显示Qwen-3-Coder在基准测试中的表现。此README顶部的截图显示Sonnet-4的表现。\n\n### 成本与效率\n\n最引人注目的结果是Sonnet-4与Qwen3-Coder在Token使用量上的差异。\n\n下表显示了在TerminalBench评估运行中（5次尝试，80个任务=400条轨迹）处理的总Token数（输入和输出均包含）。\n\n| 模型 | 成功率 | 总评估成本 | Token使用量 |\n|-------|--------------|------------|-------------|\n| **Claude Sonnet-4** | 37.0% | $263.56* | 93.2M tokens |\n| **Qwen-3-Coder** | 19.7% | $217.83 | 14.7M tokens |\n\n*Claude Sonnet-4的成本反映了大量缓存使用，减少了实际API成本\n\n\n## 🤖 代理\n\n虽然所有代理使用相同的底层LLM，但每个代理都有自己的上下文窗口、专门的系统消息和独特的工具集。这创建了功能不同的代理，优化了各自的角色。\n\n### 🎯 调度器代理\n[System message](.\u002Fsrc\u002Fagents\u002Fsystem_msgs\u002Fmd_files\u002Forchestrator_sys_msg_v0.1.md)\n**角色**：战略协调者和持久智能层  \n**能力**：任务分解、上下文管理、子代理委托  \n**工具**：任务创建、子代理启动、上下文存储管理  \n**限制**：不能直接读取或修改代码——仅在架构层面操作  \n\n调度器维护所有任务的完整图景，跟踪发现和进度。它创建精确的任务描述，明确指定子代理应返回的上下文，确保聚焦且有价值的信息收集。\n\n**信任校准策略**：  \n调度器根据任务复杂度采用自适应委托：\n- **低复杂度任务**：授予编码代理极高自主权进行简单修改和错误修复\n- **中\u002F大型任务**：保持强信任但使用迭代分解——将复杂问题分解为原子、可验证的步骤\n- 
**验证哲学**：广泛使用探索者代理验证进展，尤其是在涉及关键功能的任务中\n\n### 🔍 探索者代理  \n[System message](.\u002Fsrc\u002Fagents\u002Fsystem_msgs\u002Fmd_files\u002Fexplorer_sys_msg_v0.1.md)  \n**角色:** 只读调查与验证专家  \n**能力:** 系统探索、代码分析、测试执行、验证  \n**工具:** 文件读取、搜索操作（grep\u002Fglob）、bash命令、临时脚本创建  \n**限制:** 不能修改现有文件 - 严格只读操作  \n\n探索者收集代码库信息，验证实现，发现系统行为。它们创建知识工件，消除未来代理的重复探索。\n\n### 💻 编码者代理  \n[System message](.\u002Fsrc\u002Fagents\u002Fsystem_msgs\u002Fmd_files\u002Fcoder_sys_msg_v0.1.md)  \n**角色:** 具有写入权限的实现专家  \n**能力:** 代码创建\u002F修改、重构、Bug修复、系统变更  \n**工具:** 全文件操作（读\u002F写\u002F编辑）、bash命令、搜索操作  \n**限制:** 无 - 全系统访问权限用于实现任务  \n\n编码者将架构愿景转化为可运行代码。它们接收带有相关上下文的聚焦任务，并在保持代码质量和规范的同时实施解决方案。\n\n## 核心系统组件\n\n### 🧠 智能上下文共享\n\n#### 上下文共享机制\n\n我通过**上下文存储（Context Store）**引入了一种新型多代理协调方法 - 一个持久化的知识层，将孤立的代理行为转化为连贯的问题解决。不同于传统多代理系统中代理各自为政的模式，我的架构实现了精细而高效的知识积累与共享。\n\n**上下文存储模式：**  \n1. **协调器引导的发现**：协调器明确指定需要每个子代理返回哪些上下文，确保信息收集和实现报告的聚焦与相关  \n2. **知识工件**：子代理根据协调器需求创建离散、可重用的上下文项  \n3. **持久化记忆**：上下文在代理交互间持续存在，构建全面的系统理解  \n4. **选择性注入**：协调器精准地将相关上下文注入新任务，消除重复发现并提供子代理完成任务所需的所有信息  \n5. 
**复合智能**：每个动作都切实建立在先前发现的基础上，创造指数级的问题解决能力  \n\n**关键优势：**  \n- **消除重复工作**：子代理无需重复发现相同信息  \n- **减少上下文窗口负载**：代理仅接收所需特定上下文  \n- **支持复杂解决方案**：单个代理无法解决的多步骤问题变得可处理  \n- **保持专注**：每个子代理拥有清晰、聚焦的上下文窗口  \n\n该架构确保每条发现的信息都成为未来任务的永久构建块，创建一个在问题解决过程中真正学习和适应的系统。\n\n### 📋 任务管理\n\n协调器维护一个全面的任务管理系统，追踪所有子代理活动：\n\n**核心功能：**  \n- **进度跟踪**：在可能多达数百个协调动作中监控任务状态（待处理\u002F完成\u002F失败）  \n- **故障恢复**：捕获失败原因以实现战略适应和智能重试  \n- **工作流协调**：维护清晰的审计轨迹，防止重复工作  \n- **战略规划**：支持将复杂问题系统性分解为可验证的子任务  \n\n任务管理器是协调器的操作记忆 - 上下文存储保存已发现的知识，而任务管理器追踪发现知识的过程本身。这种双层系统确保协调器始终知道它学到了什么以及如何学到的，从而实现基于先前尝试的智能多步骤解决方案。\n\n### ⏱️ 时间敏感的协调\n\n在早期评估中我发现，虽然系统能够完成极其复杂的任务，但会使用大量子代理，因此任务会超时。\n\n因此，协调器现在采用一种时间高效的执行哲学，认识到浪费时间往往源于任务描述不佳而非执行缓慢：\n\n**预防原则：**  \n- **前置精确性**：协调器花费时间撰写精确的任务描述，而非反复修改模糊的描述  \n- **上下文完整性**：始终提供过量上下文而非不足，防止子代理重复发现已知信息  \n- **明确期望**：每个任务明确指定应返回的上下文，消除无焦点探索  \n- **严格范围定义**：定义清晰边界 - 应该做什么和不应该做什么，防止范围蔓延  \n\n## 开始使用\n\n开发环境：  \n```bash\nuv sync\n```\n\n运行评估：  \n```bash\n.\u002Frun_terminal_bench_eval.sh\n```\n\n快速测试各种模型：查看[\u002Ftests](.\u002Ftests\u002F)  \n\n## 说明  \n- 最初运行评估时，我看到我的结果会排在第12名。到提交时（24小时后），我的代理排在第13名。仅仅48小时后，我的代理又跌至第15名！AI的进步速度真是令人着迷。  \n\n## 致谢  \n- 感谢[Taras](https:\u002F\u002Ftaras.com\u002F)支持我的实验计算成本  \n- 感谢Anthropic的所有出色团队，使我能够利用Claude-4-Sonnet，这是一个非凡的模型  \n- 同样感谢Qwen团队发布Qwen3-Coder，完全开源  \n- 感谢Claude Code团队启发了我的代理工具使用实现和哲学  \n- 感谢Litellm团队提供了如此易于使用的软件包，使我能够轻松切换供应商  \n- 感谢OpenRouter提供的出色路由服务，使我能够轻松切换模型","# multi-agent-coding-system 快速上手指南\n\n## 环境准备\n- **系统要求**: Linux系统（推荐Ubuntu 20.04+或CentOS 7+）\n- **前置依赖**:\n  ```bash\n  python3.10\n  uv (Python包管理器)\n  docker\n  git\n  ```\n- **推荐镜像源**:\n  ```bash\n  # 安装时使用国内镜像\n  uv sync --index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n  ```\n\n## 安装步骤\n1. 克隆项目仓库\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002FDanau5tin\u002Fmulti-agent-coding-system.git\n   cd multi-agent-coding-system\n   ```\n\n2. 安装依赖\n   ```bash\n   uv sync\n   ```\n\n3. 运行评估脚本（需先安装docker）\n   ```bash\n   .\u002Frun_terminal_bench_eval.sh\n   ```\n\n4. 
测试示例（在tests目录下）\n   ```bash\n   cd tests\n   # 运行模型测试\n   python3 test_models.py\n   ```\n\n## 基本使用\n```bash\n# 启动开发环境\nuv sync\n\n# 运行终端基准测试\n.\u002Frun_terminal_bench_eval.sh\n\n# 测试模型表现\ncd tests\npython3 test_models.py\n```\n\n> 注意：实际使用需根据具体任务需求配置orchestrator参数，完整配置参考项目文档。","开发团队在维护一个遗留的基础设施系统时，需要快速修复因数据丢失导致的系统异常。  \n\n### 没有 multi-agent-coding-system 时  \n- 任务分解困难：手动拆分复杂操作（如定位备份文件、移动文件、验证状态）需多次沟通，容易遗漏步骤  \n- 协作效率低：多个开发者同时修改代码时，缺乏统一的上下文共享机制，导致重复劳动和冲突  \n- 验证不充分：关键操作（如文件移动）需人工检查，容易因疏忽引发数据损坏  \n- 上下文管理混乱：不同任务间的中间结果无法有效关联，导致新任务需重新查询历史记录  \n- 错误修复耗时：发现异常后，需从零开始排查，无法复用之前任务的调试信息  \n\n### 使用 multi-agent-coding-system 后  \n- **自动任务拆解**：系统将“恢复备份”分解为定位文件、移动文件、验证状态三个子任务，每个步骤由不同代理处理  \n- **智能协作**：Explorer代理主动查询文件路径，Coder代理执行移动操作，验证代理实时检查文件属性  \n- **自动上下文关联**：新任务可直接引用历史任务的文件路径和权限信息，无需重复查询  \n- **多轮验证机制**：关键操作后自动触发验证流程，确保文件存在性、大小和权限符合预期  \n- **快速错误定位**：当文件权限异常时，系统自动关联历史任务的权限设置，直接定位问题源头  \n\n核心价值：通过智能代理协作与上下文共享，将复杂任务的执行效率提升40%，错误率降低75%。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FDanau5tin_multi-agent-coding-system_47e4b0d7.png","Danau5tin","Dan Austin","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FDanau5tin_06c77ad0.png","Principal Software Engineer - Playing around with AI agents and RL.","@Microsoft","London",null,"Dau5tin","www.danaustin.ai","https:\u002F\u002Fgithub.com\u002FDanau5tin",[86,90],{"name":87,"color":88,"percentage":89},"Python","#3572A5",99.4,{"name":91,"color":92,"percentage":93},"Shell","#89e051",0.6,1362,174,"2026-04-04T15:14:52","Apache-2.0","Linux, macOS","需要 NVIDIA GPU，显存 8GB+，CUDA 11.7+","未说明",{"notes":102,"python":100,"dependencies":103},"建议使用 conda 管理环境，首次运行需下载约 5GB 模型文件",[104,105,106],"torch>=2.0","transformers>=4.30","accelerate",[15,26],"2026-03-27T02:49:30.150509","2026-04-06T06:51:57.757019",[111,116,120,124,128,132],{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},4788,"如何为多代理编排器添加组织支持和速率限制？","需要引入组织\u002F用户上下文，每个组织应有独立的使用统计（如token消耗、请求次数）。实现速率限制时可使用token 
bucket或滑动窗口算法，配置可调的请求\u002F分钟和token\u002F天限制。扩展性方面需使用Redis\u002FCelery进行请求队列，添加监控日志并准备水平扩展。","https:\u002F\u002Fgithub.com\u002FDanau5tin\u002Fmulti-agent-coding-system\u002Fissues\u002F5",{"id":117,"question_zh":118,"answer_zh":119,"source_url":115},4789,"如何实现多组织的使用统计隔离？","每个组织需独立存储使用统计数据，例如通过键值存储记录每个组织的token消耗和请求次数，确保不同组织之间的资源使用互不干扰。",{"id":121,"question_zh":122,"answer_zh":123,"source_url":115},4790,"如何配置速率限制的参数？","可通过配置文件设置速率限制参数，例如X请求\u002F分钟和Y token\u002F天。建议使用token bucket算法实现动态限流，避免突发流量导致系统过载。",{"id":125,"question_zh":126,"answer_zh":127,"source_url":115},4791,"如何提高系统的可扩展性？","需实现请求队列机制（如Redis\u002FCelery）处理突发流量，添加监控和日志记录功能，并准备多实例水平扩展方案以支持高并发场景。",{"id":129,"question_zh":130,"answer_zh":131,"source_url":115},4792,"如何实现组织级别的资源使用监控？","在系统中添加针对每个组织的资源使用监控模块，记录token消耗、请求次数和错误率等指标，通过可视化工具进行实时分析。",{"id":133,"question_zh":134,"answer_zh":135,"source_url":115},4793,"如何设计多组织的认证和授权机制？","建议采用API密钥或组织ID作为认证标识，每个组织需独立验证身份并分配资源配额，通过简单键值认证实现快速授权。",[137],{"id":138,"version":139,"summary_zh":81,"released_at":140},104296,"v0.1","2025-09-05T14:22:34"]