[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-augmentcode--augment-swebench-agent":3,"tool-augmentcode--augment-swebench-agent":62},[4,18,26,36,46,54],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160411,2,"2026-04-18T23:33:24",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":42,"last_commit_at":43,"category_tags":44,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,45],"插件",{"id":47,"name":48,"github_repo":49,"description_zh":50,"stars":51,"difficulty_score":32,"last_commit_at":52,"category_tags":53,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 
都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":55,"name":56,"github_repo":57,"description_zh":58,"stars":59,"difficulty_score":32,"last_commit_at":60,"category_tags":61,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[45,13,15,14],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":78,"owner_twitter":73,"owner_website":79,"owner_url":80,"languages":81,"stars":90,"forks":91,"last_commit_at":92,"license":93,"difficulty_score":10,"env_os":94,"env_gpu":95,"env_ram":95,"env_deps":96,"category_tags":103,"github_topics":77,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":104,"updated_at":105,"faqs":106,"releases":137},9522,"augmentcode\u002Faugment-swebench-agent","augment-swebench-agent","The #1 open-source SWE-bench Verified implementation","augment-swebench-agent 是一款开源的 AI 编程助手，专为应对真实的软件工程挑战而设计。与仅解决孤立算法题的传统基准不同，它基于 SWE-bench Verified 标准，能够处理源自 GitHub 真实项目的复杂任务，包括代码库导航、回归测试迭代及多步骤问题修复。\n\n该工具旨在解决 AI 在面对大型项目时难以理解上下文、执行安全命令及整合多方案结果的难题。它特别适合软件开发者、AI 研究人员及希望评估大模型工程能力的技术团队使用。用户既可通过交互模式将其作为个人编码助手，也能利用评测套件进行自动化基准测试。\n\n其核心技术亮点在于采用了“核心驱动 + 集成投票”的双层架构：以业界领先的 Claude Sonnet 3.7 作为主代理，负责调用 Bash、文件编辑及顺序思维等工具；同时结合 OpenAI 
o1 模型进行多数投票集成，从多个候选方案中筛选最优解。此外，项目内置了安全的命令审批机制和 Docker 容器化支持，确保在隔离环境中稳定运行。作为一个轻量级且易于扩展的实现，它为构建和验证高性能开源编码 Agent 提供了坚实的基线参考。","# Augment SWE-bench Verified Agent\n\n[SWE-bench Verified](https:\u002F\u002Fwww.swebench.com\u002F) tests how well AI systems handle software engineering tasks pulled from actual GitHub issues in popular open-source projects. Some example problems can be found in OpenAI’s [original blog post on the benchmark](https:\u002F\u002Fopenai.com\u002Findex\u002Fintroducing-swe-bench-verified\u002F). Where most coding benchmarks focus on isolated LeetCode-style programming problems, SWE-bench involves codebase navigation, iterating against a suite of regression tests, and far more complexity overall.\n\nTo achieve a 65.4% success rate on our first-ever SWE-bench submission, we combined Claude Sonnet 3.7 as our core driver with OpenAI’s o1 as our ensembler. We deferred leveraging our own models in order to build a strong open-source baseline agent with off-the-shelf models.\n\nSince Anthropic's models are currently state-of-the-art on code, we used Claude Sonnet 3.7 as our agent's core driver, and we forked our agent system architecture from [Anthropic's own blog post about SWE-bench](https:\u002F\u002Fwww.anthropic.com\u002Fnews\u002Fclaude-3-7-sonnet).\n\n## Features\n\n- Small and simple coding agent implementation + SWE-bench Docker harness that is easy to run and build on top of.\n- Implementation of tools from our SWE-bench submission:\n  - Bash command execution\n  - File viewing and editing\n  - Sequential thinking for complex problem-solving\n- Prompt template + system prompt from our SWE-bench submission.\n- Integration with Anthropic's Claude for the core agent and OpenAI models for ensembling\n- Command approval management for safe execution\n- Majority vote ensembler for selecting the best solution from multiple candidates\n- Support for running the agent in a Docker container\n- Support for running the SWE-bench eval harness\n\n## 
Installation\n\n### Prerequisites\n\n- [Docker](https:\u002F\u002Fwww.docker.com\u002F) (We tested with `Docker version 26.1.3, build 26.1.3-0ubuntu1~22.04.1`.)\n- Anthropic API key (for Claude models)\n- OpenAI API key (for OpenAI models)\n\n### Setup\n\n1. Clone the repository:\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Faugmentcode\u002Faugment-swebench-agent.git\n   cd augment-swebench-agent\n   ```\n\n2. Install dependencies:\n   ```bash\n   .\u002Fsetup.sh\n   source .venv\u002Fbin\u002Factivate\n   ```\n\n3. Set your API keys:\n   ```bash\n   # For Anthropic Claude models\n   export ANTHROPIC_API_KEY=your_anthropic_api_key_here\n\n   # For OpenAI models\n   export OPENAI_API_KEY=your_openai_api_key_here\n   ```\n\n## Ways to use this repo\n\n- Interactive mode: Use `cli.py` to spin up an interactive agent for experimentation or as a personal coding assistant!\n- SWE-bench mode: Use `run_agent_on_swebench_problem.py` to run the agent on SWE-bench problems. This is similar to the script we used to generate our SWE-bench submission.\n\nMore details on both below!\n\n## Usage (interactive mode)\n\nRun the CLI interface to interact with the agent directly. By default, the agent will run\nin the current directory.\n\n```bash\npython cli.py\n```\n\nThis will start an interactive session where you can communicate with the agent and assign it tasks.\n\n### Command-line Options\n\n- `--workspace`: Path to the workspace directory (default: current directory)\n- `--problem-statement`: Provide a problem statement to make the agent non-interactive (default: None)\n- `--needs-permission`: Whether to require permission before executing commands (default: False)\n- `--use-container-workspace`: Path to the shared volume that is mounted into the Docker container. This must be set if you are using `--docker-container-id`. (default: None)\n- `--docker-container-id`: ID of the Docker container to use. This must be set if you are using `--use-container-workspace`. 
(default: None)\n\nExample:\n```bash\npython cli.py --workspace \u002Fpath\u002Fto\u002Fproject --problem-statement \"Fix the login issue\"\n```\n\n### Non-interactive Mode\n\nYou can run the agent in non-interactive mode by providing a problem statement:\n\n```bash\npython cli.py --problem-statement \"Implement a feature to sort items by date\"\n```\n\n### Using Docker\n\nTo use a Docker container for the workspace, specify both the path to the Docker container\nvolume and the Docker container ID:\n\n```bash\npython cli.py --use-container-workspace --docker-container-id \u003Ccontainer_id> --workspace \u002Fpath\u002Fto\u002Fdocker\u002Fvolume\n```\n\n## Usage (SWE-bench mode)\n\n### Quick Test Run\n\nFor a quick test run, execute the following command. It generates 2 candidate solutions for each of 5 problems, runs the evaluation step on each candidate solution, and finally provides instructions for running the ensembler on the results.\n\n```bash\npython run_agent_on_swebench_problem.py --num-examples 5 --num-candidate-solutions 2\n```\n\nYou can increase `--num-examples` and `--num-candidate-solutions` to run on more problems and generate more candidate solutions. Be aware, however, that this takes longer and costs more.\n\n### Command-line Options\n\n- `--num-examples`: Number of examples to run on (default: None, which runs on all examples)\n- `--shard-ct`: Number of shards to split the work into (default: 1)\n- `--shard-id`: Shard ID to run (0-indexed, default: 0)\n- `--num-processes`: Number of processes to use for each example (default: 8)\n- `--num-candidate-solutions`: Number of candidate solutions to generate for each example (default: 8)\n\n### Running on more examples\n\nSWE-bench Verified contains 500 examples in total. A full run can take a while, so this repository supports a few levels of parallelism:\n\n- Firstly, we suggest running 8 processes. This is the `--num-processes` flag. 
Beyond this, we ran into issues with Docker.\n- Secondly, the dataset can be split into shards with the `--shard-ct` and `--shard-id` flags. This makes it relatively easy to spread the work across multiple machines, which circumvents the issues with scaling Docker beyond 8 processes.\n\nIn our experiments, running the full evaluation with 1 candidate solution per problem took a couple of hours. This was\nwith 10 shards spread across separate pods (managed by Kubernetes), each running 8 processes.\n\nKeep in mind that running 80 agents in parallel, as we did, may hit Anthropic's API rate limits. Our rate limits with Anthropic are very high, and yours may be lower, so you may need to use a smaller `--shard-ct` and\u002For `--num-processes`.\n\nFor example, to run with 10 shards and 8 processes per shard, run the following command on 10 different machines, varying `--shard-id` from 0 to 9:\n\n```bash\npython run_agent_on_swebench_problem.py --shard-ct 10 --shard-id \u003Cworker_index> > logs.out 2> logs.err\n```\n\n### Majority Vote Ensembler\n\nThe Majority Vote Ensembler uses an LLM to select the best solution from multiple candidates. It presents all candidate solutions for a problem to OpenAI's o1 model and asks it to analyze them and select the most common solution.\n\n#### How It Works\n\n1. The tool takes a JSONL file containing problems, each with multiple candidate solutions (diffs).\n2. For each problem, it constructs a prompt using the `build_ensembler_prompt` function.\n3. The prompt is sent to o1.\n4. The LLM analyzes all candidate solutions and selects the best one.\n5. The tool extracts the selected solution index from the LLM's response.\n6. 
Results are saved to a JSON file.\n\n#### Usage\n\n```bash\npython majority_vote_ensembler.py path\u002Fto\u002Finput.jsonl --output_path path\u002Fto\u002Foutput.json --workers 8\n```\n\nWhere:\n- `path\u002Fto\u002Finput.jsonl` is a JSONL file containing problems and candidate solutions (see `example_ensembler_dataset.jsonl` for the format)\n- `--output_path` specifies where to save the results\n- `--workers` sets the number of worker threads for parallel processing (default: 8)\n\n#### Example\n\n```bash\npython majority_vote_ensembler.py example_ensembler_data.jsonl --output_path example_ensembler_results.json\n```\n\n#### Input Format\n\nEach line of the input JSONL file should contain one problem object with the following structure (pretty-printed here for readability). `diffs` holds the candidate solutions generated by the agent. `eval_outcomes` holds the results of running the eval harness on each candidate solution; each outcome's index corresponds to the index in the `diffs` array.\n\n```json\n{\n  \"id\": \"problem-1\",\n  \"instruction\": \"Add a function to calculate factorial\",\n  \"diffs\": [\n    \"```diff\\n@@ -10,3 +10,10 @@\\n def function():\\n     return x\\n+\\n+def new_function():\\n+    return y\\n```\",\n    \"...other candidate solutions...\"\n  ],\n  \"eval_outcomes\": [\n    {\n      \"is_success\": true\n    },\n    {\n      \"is_success\": false\n    },\n    {\n      \"is_success\": true\n    }\n  ]\n}\n```\n\n#### Output Format\n\nThe output JSON file contains an array of result objects, each with the following structure:\n\n```json\n[\n  {\n    \"id\": \"problem-1\",\n    \"instruction\": \"Add a function to calculate factorial\",\n    \"response\": \"[LLM's full response text]\",\n    \"selected_diff_index\": 2,\n    \"selected_diff\": \"[The selected diff content]\",\n    \"is_eval_success\": true\n  }\n]\n```\n\n## Development\n\n### Running Tests\n\n```bash\npytest\n```\n\n### Adding New Tools\n\nTo add a new tool to the agent:\n\n1. 
Create a new tool class in the `tools\u002F` directory.\n2. Implement the required methods (`run_impl`, `get_tool_param`, etc.).\n3. Add the tool to the agent's tools list in `tools\u002Fagent.py`.\n\n### Customizing the Agent Prompts\n\nThe agent's prompts are defined in the `prompts\u002F` directory. You can customize them by modifying the template strings in the respective files.\n\n### Customizing the Majority Vote Ensembler\n\nYou can customize the Majority Vote Ensembler by modifying:\n\n- `prompts\u002Fensembler_prompt.py`: to change the prompt template used for ensembling\n- the `get_client` call in the `process_problem` function: to change the LLM model\n\n## Contributing\n\nContributions are welcome! Please open an issue or submit a pull request.\n\n## License\n\nThis project is licensed under the MIT License.\n","# Augment SWE-bench 验证代理\n\n[SWE-bench Verified](https:\u002F\u002Fwww.swebench.com\u002F) 用于测试 AI 系统在处理来自热门开源项目真实 GitHub 问题的软件工程任务时的表现。一些示例问题可以在 OpenAI 的 [关于该基准的原始博客文章](https:\u002F\u002Fopenai.com\u002Findex\u002Fintroducing-swe-bench-verified\u002F) 中找到。与大多数专注于孤立 Leetcode 式编程题目的编码基准不同，SWE-bench 涉及代码库导航、针对回归测试套件进行迭代，以及整体上更高的复杂性。\n\n为了在我们首次提交 SWE-bench 测试中达到 65.4% 的成功率，我们结合了 Claude Sonnet 3.7 作为核心驱动模型，并使用 OpenAI 的 o1 作为集成器。我们选择不直接利用自己的模型，而是采用现成的模型来构建一个强大的开源基线代理。\n\n由于 Anthropic 的模型目前在代码生成领域处于最先进水平，我们便将 Claude Sonnet 3.7 用作代理的核心驱动模型，并从 Anthropic 自己关于 SWE-bench 的博客文章中借鉴了代理系统架构：[Anthropic 关于 SWE-bench 的博客文章](https:\u002F\u002Fwww.anthropic.com\u002Fnews\u002Fclaude-3-7-sonnet)。\n\n## 功能特性\n\n- 小型且简单的编码代理实现 + SWE-bench Docker 测试框架，易于运行和扩展。\n- 实现了我们在 SWE-bench 提交中使用的工具：\n  - Bash 命令执行\n  - 文件查看与编辑\n  - 用于复杂问题解决的序列化思维\n- 来自我们 SWE-bench 提交的提示模板 + 系统提示。\n- 集成 Anthropic 的 Claude 作为核心代理模型，以及 OpenAI 的模型用于集成。\n- 命令审批管理，确保安全执行。\n- 多数投票集成器，用于从多个候选方案中选出最佳解。\n- 支持在 Docker 容器中运行代理。\n- 支持运行 SWE-bench 评估框架。\n\n## 安装\n\n### 先决条件\n\n- [Docker](https:\u002F\u002Fwww.docker.com\u002F)（我们使用 `Docker version 26.1.3, build 26.1.3-0ubuntu1~22.04.1` 进行测试。）\n- Anthropic 
API 密钥（用于 Claude 模型）\n- OpenAI API 密钥（用于 OpenAI 模型）\n\n### 设置步骤\n\n1. 克隆仓库：\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Faugmentcode\u002Faugment-swebench-agent.git\n   cd augment-swebench-agent\n   ```\n\n2. 安装依赖项：\n   ```bash\n   .\u002Fsetup.sh\n   source .venv\u002Fbin\u002Factivate\n   ```\n\n3. 设置您的 API 密钥：\n   ```bash\n   # 对于 Anthropic Claude 模型\n   export ANTHROPIC_API_KEY=your_anthropic_api_key_here\n\n   # 对于 OpenAI 模型\n   export OPENAI_API_KEY=your_openai_api_key_here\n   ```\n\n## 使用本仓库的方式\n\n- 交互模式：使用 `cli.py` 启动交互式代理，用于实验或作为个人编码助手！\n- SWE-bench 模式：使用 `run_agent_on_swebench_problem.py` 在 SWE-bench 问题上运行代理。这与我们用于生成 SWE-bench 提交结果的脚本类似。\n\n以下将详细介绍这两种使用方式！\n\n## 使用方法（交互模式）\n\n运行 CLI 界面以直接与代理交互。默认情况下，代理将在当前目录下运行。\n\n```bash\npython cli.py\n```\n\n这将启动一个交互式会话，您可以在其中与代理沟通并为其分配任务。\n\n### 命令行选项\n\n- `--workspace`：工作区目录路径（默认：当前目录）\n- `--problem-statement`：提供问题陈述以使代理进入非交互模式（默认：无）\n- `--needs-permission`：是否需要在执行命令前获得许可（默认：假）\n- `--use-container-workspace`：挂载到 Docker 容器中的共享卷路径。如果您使用 `--docker-container-id`，则必须设置此参数。（默认：无）\n- `--docker-container-id`：要使用的 Docker 容器 ID。如果您使用 `--use-container-workspace`，则必须设置此参数。（默认：无）\n\n示例：\n```bash\npython cli.py --workspace \u002Fpath\u002Fto\u002Fproject --problem-statement \"修复登录问题\"\n```\n\n### 非交互模式\n\n您可以通过提供问题陈述来让代理以非交互模式运行：\n\n```bash\npython cli.py --problem-statement \"实现按日期排序商品的功能\"\n```\n\n### 使用 Docker\n\n如果您希望使用 Docker 容器作为工作区，则需要同时指定 Docker 容器卷的路径以及 Docker 容器 ID：\n\n```bash\npython cli.py --use-container-workspace --docker-container-id \u003Ccontainer_id> --workspace \u002Fpath\u002Fto\u002Fdocker\u002Fvolume\n```\n\n## 使用方法（SWE-bench 模式）\n\n### 快速测试运行\n\n作为一次测试运行，您可以执行以下命令。它将为 5 个问题中的每个问题生成 2 个候选解决方案，并对每个候选解决方案运行评估步骤。最后，它还会提供如何对结果进行集成的说明。\n\n```bash\npython run_agent_on_swebench_problem.py --num-examples 5 --num-candidate-solutions 2\n```\n\n您可以增加 `--num-examples` 和 `--num-candidate-solutions` 参数，以运行更多问题并生成更多候选解决方案。但请注意，这将花费更长的时间并产生更高的成本。\n\n### 命令行选项\n\n- `--num-examples`：要运行的示例数量（默认：无，即运行所有示例）\n- 
`--shard-ct`：将工作拆分为多少个分片（默认：1）\n- `--shard-id`：要运行的分片 ID（从 0 开始计数，默认：0）\n- `--num-processes`：每个示例使用的进程数量（默认：8）\n- `--num-candidate-solutions`：每个示例要生成的候选解决方案数量（默认：8）\n\n### 运行更多示例\n\nSWE-bench Verified 总共有 500 个示例。需要注意的是，这可能需要较长时间，因此本仓库支持多级并行处理：\n\n1. 首先，建议使用 8 个进程，即 `--num-processes` 参数。超过这个数量，Docker 可能会出现问题。\n2. 其次，我们支持将数据集拆分为多个分片，即 `--shard-ct` 和 `--shard-id` 参数。这样可以相对容易地将工作分配到多台机器上，从而避免 Docker 在超过 8 个进程时出现的问题。\n\n在我们的实验中，为每个问题运行 1 个候选解决方案的完整评估耗时约 2 小时。当时我们将 10 个分片分配到不同的 Pod 中（由 Kubernetes 管理），每个 Pod 运行 8 个进程。\n\n请记住，像我们一样并行运行 80 个代理可能会触发 Anthropic 的速率限制。而您可能没有我们那样高的 Anthropic API 速率限制。因此，您可能需要减少 `--shard-ct` 和\u002F或 `--num-processes` 的值。\n\n假设您想使用 10 个分片，每个分片 8 个进程，那么您需要在 10 台不同的机器上分别运行以下命令，每次更改 `--shard-id` 参数，从 0 到 9：\n\n```bash\npython run_agent_on_swebench_problem.py --shard-ct 10 --shard-id \u003Cworker_index> > logs.out 2> logs.err\n```\n\n### 多数投票集成器\n\n多数投票集成器是一种工具，它利用大语言模型从多个候选方案中选出最佳解决方案。其工作原理是将问题的多个候选解决方案提交给 OpenAI 的 o1 模型，并请求该模型分析并选择最普遍认同的方案。\n\n#### 工作原理\n\n1. 该工具接收一个包含问题的 JSON 文件，每个问题都附有多个候选解决方案（差异）。\n2. 对于每个问题，它使用 `build_ensembler_prompt` 函数构建提示。\n3. 将提示发送至 o1 模型。\n4. 大语言模型会分析所有候选解决方案，并选出最佳的一个。\n5. 工具从大语言模型的响应中提取所选解决方案的索引。\n6. 
最终结果会被保存到一个 JSON 文件中。\n\n#### 使用方法\n\n```bash\npython majority_vote_ensembler.py path\u002Fto\u002Finput.jsonl --output_path path\u002Fto\u002Foutput.json --workers 8\n```\n\n参数说明：\n- `path\u002Fto\u002Finput.jsonl` 是一个包含问题和候选解决方案的 JSONL 文件（格式请参考 `example_ensembler_dataset.jsonl`）。\n- `--output_path` 指定保存结果的路径。\n- `--workers` 设置用于并行处理的工作线程数量（默认为 8）。\n\n#### 示例\n\n```bash\npython majority_vote_ensembler.py example_ensembler_data.jsonl --output_path example_ensembler_results.json\n```\n\n#### 输入格式\n\n输入的 JSONL 文件应包含一系列问题对象，每个问题对象的结构如下。`diffs` 是由智能体生成的候选解决方案，而 `eval_outcomes` 则是针对每个候选解决方案运行评估框架后得到的结果，其中索引与 `diffs` 数组中的索引相对应。\n\n```json\n{\n  \"id\": \"problem-1\",\n  \"instruction\": \"添加一个计算阶乘的函数\",\n  \"diffs\": [\n    \"```diff\\n@@ -10,3 +10,10 @@\\n def function():\\n     return x\\n+\\n+def new_function():\\n+    return y\\n```\",\n    \"...其他候选解决方案...\"\n  ],\n  \"eval_outcomes\": [\n    {\n      \"is_success\": true\n    },\n    {\n      \"is_success\": false\n    },\n    {\n      \"is_success\": true\n    }\n  ]\n}\n```\n\n#### 输出格式\n\n输出的 JSON 文件将包含一个结果对象数组，每个对象的结构如下：\n\n```json\n[\n  {\n    \"id\": \"problem-1\",\n    \"instruction\": \"添加一个计算阶乘的函数\",\n    \"response\": \"[大语言模型的完整响应文本]\",\n    \"selected_diff_index\": 2,\n    \"selected_diff\": \"[被选中的差异内容]\",\n    \"is_eval_success\": true\n  }\n]\n```\n\n## 开发\n\n### 运行测试\n\n```bash\npytest\n```\n\n### 添加新工具\n\n要向智能体添加新工具：\n\n1. 在 `tools\u002F` 目录下创建一个新的工具类。\n2. 实现所需的方法（如 `run_impl`、`get_tool_param` 等）。\n3. 
将该工具添加到 `tools\u002Fagent.py` 中智能体的工具列表中。\n\n### 自定义智能体提示\n\n智能体的提示定义在 `prompts\u002F` 目录中。您可以通过修改相应文件中的模板字符串来定制提示。\n\n### 自定义多数投票集成器\n\n您可以自定义多数投票集成器，方法如下：\n\n- 修改 `prompts\u002Fensembler_prompt.py`：更改用于集成的提示模板。\n- 修改 `process_problem` 函数中的 `get_client` 调用，以更换使用的 LLM 模型。\n\n## 贡献\n\n欢迎贡献！请开立议题或提交拉取请求。\n\n## 许可证\n\n本项目采用 MIT 许可证授权。","# Augment SWE-bench Agent 快速上手指南\n\nAugment SWE-bench Agent 是一个开源的软件开发智能体，专为解决真实的 GitHub 问题而设计。它结合了 Claude Sonnet 3.7 作为核心驱动和 OpenAI o1 作为集成器，在 SWE-bench Verified 基准测试中取得了优异成绩。该工具支持代码库导航、回归测试迭代及复杂问题解决。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**：Linux 或 macOS（Windows 用户建议使用 WSL2）\n*   **Docker**：已安装并运行 Docker Engine（测试版本：`26.1.3` 或更高）。\n    *   国内用户可参考 [Docker 中国镜像加速](https:\u002F\u002Fdocker.mirrors.ustc.edu.cn\u002F) 配置以提升拉取速度。\n*   **Python**：建议 Python 3.8+（脚本将自动创建虚拟环境）\n*   **API 密钥**：\n    *   [Anthropic API Key](https:\u002F\u002Fconsole.anthropic.com\u002F)（用于 Claude 模型）\n    *   [OpenAI API Key](https:\u002F\u002Fplatform.openai.com\u002F)（用于 o1 模型及结果集成）\n\n## 安装步骤\n\n### 1. 克隆仓库\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Faugmentcode\u002Faugment-swebench-agent.git\ncd augment-swebench-agent\n```\n\n### 2. 安装依赖\n运行设置脚本并激活虚拟环境：\n```bash\n.\u002Fsetup.sh\nsource .venv\u002Fbin\u002Factivate\n```\n> **提示**：如果 `.\u002Fsetup.sh` 执行缓慢，可检查 `requirements.txt` 并使用国内 pip 源手动安装：\n> `pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n### 3. 
配置 API 密钥\n导出您的 API 密钥到环境变量：\n```bash\n# 配置 Anthropic (Claude)\nexport ANTHROPIC_API_KEY=your_anthropic_api_key_here\n\n# 配置 OpenAI (o1)\nexport OPENAI_API_KEY=your_openai_api_key_here\n```\n\n## 基本使用\n\n本工具提供两种主要模式：**交互式模式**（适合日常辅助开发）和 **SWE-bench 模式**（适合批量评测）。\n\n### 模式一：交互式使用 (Interactive Mode)\n\n这是最简单的使用方式，您可以直接与 Agent 对话，让它帮您修复代码或实现功能。\n\n**启动交互会话：**\n```bash\npython cli.py\n```\n\n**非交互式运行（直接下达任务）：**\n如果您想直接让 Agent 处理一个具体任务而不进入对话模式：\n```bash\npython cli.py --problem-statement \"Fix the login issue in auth.py\"\n```\n\n**常用参数说明：**\n*   `--workspace`: 指定工作目录（默认为当前目录）。\n*   `--needs-permission`: 设置为 `True` 可在执行命令前要求人工确认（更安全）。\n*   `--docker-container-id`: 若需在特定 Docker 容器中运行，需配合此参数及 `--use-container-workspace` 使用。\n\n### 模式二：SWE-bench 基准测试 (SWE-bench Mode)\n\n用于在标准的 SWE-bench 问题上运行 Agent 并生成解决方案。\n\n**快速测试运行：**\n以下命令将对 5 个问题各生成 2 个候选解决方案，并自动运行评估：\n```bash\npython run_agent_on_swebench_problem.py --num-examples 5 --num-candidate-solutions 2\n```\n\n**结果集成（多数投票）：**\n生成多个候选方案后，使用 o1 模型进行“多数投票”以选出最佳方案：\n```bash\npython majority_vote_ensembler.py path\u002Fto\u002Finput.jsonl --output_path results.json --workers 8\n```\n\n### 进阶提示\n*   **并行加速**：在处理大量数据时，可通过 `--shard-ct` 和 `--shard-id` 参数将任务分片，以便在多台机器或多进程下并行运行，绕过 Docker 进程数限制。\n*   **成本控制**：增加 `--num-candidate-solutions` 会显著提高 Token 消耗和时间，请根据需求调整。","某开源社区维护者正面对 GitHub 上积压的数十个复杂 Issue，需要快速定位并修复涉及多文件修改和回归测试的 Bug。\n\n### 没有 augment-swebench-agent 时\n- **上下文迷失**：面对大型代码库，人工梳理文件依赖关系耗时极长，容易遗漏关键调用链。\n- **测试迭代繁琐**：每次修改代码后需手动运行全套回归测试，失败后反复排查原因，效率低下。\n- **解决方案单一**：仅凭个人经验给出一种修复方案，缺乏多视角验证，极易引入新的边缘情况错误。\n- **执行风险不可控**：直接运行自动生成的脚本或命令存在破坏本地环境或误删文件的安全隐患。\n\n### 使用 augment-swebench-agent 后\n- **智能导航定位**：利用其内置的顺序思考（Sequential thinking）能力，自动遍历代码库并精准锁定故障文件与逻辑断点。\n- **自动化闭环验证**：代理自动执行 Bash 命令进行编辑与测试，在 Docker 隔离环境中无限次迭代直至所有回归测试通过。\n- **多数投票优选**：调用 OpenAI 模型作为集成器，对多个候选修复方案进行“多数投票”，自动选出鲁棒性最强的代码。\n- **安全命令审批**：通过命令审批管理机制，在执行高风险操作前拦截确认，确保开发环境的绝对安全。\n\naugment-swebench-agent 
将原本需要数小时的人工调试过程压缩为全自动化的闭环流程，显著提升了处理真实世界软件工程难题的成功率与安全性。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Faugmentcode_augment-swebench-agent_f95ee76d.png","augmentcode","Augment Code","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Faugmentcode_083527b4.png","Developer AI that deeply understands your codebase and how your team builds software.",null,"auggie@augmentcode.com","https:\u002F\u002Faugmentcode.com\u002F","https:\u002F\u002Fgithub.com\u002Faugmentcode",[82,86],{"name":83,"color":84,"percentage":85},"Python","#3572A5",99.4,{"name":87,"color":88,"percentage":89},"Shell","#89e051",0.6,867,155,"2026-04-15T20:42:58","NOASSERTION","Linux, macOS","未说明",{"notes":97,"python":98,"dependencies":99},"该工具主要依赖外部 API（Claude 和 OpenAI），无需本地 GPU 运行模型。必须安装 Docker 以支持容器化工作空间和安全执行命令。在大规模并行运行（如 80 个代理）时可能会遇到 API 速率限制，建议通过分片（sharding）和多机部署来扩展任务。","未说明 (需通过 .\u002Fsetup.sh 创建虚拟环境)",[100,101,102],"Docker (版本 26.1.3 已测试)","Anthropic API Key","OpenAI API Key",[35,13],"2026-03-27T02:49:30.150509","2026-04-19T15:26:26.007426",[107,112,117,122,127,132],{"id":108,"question_zh":109,"answer_zh":110,"source_url":111},42741,"如何提交功能建议或产品反馈？","请将功能建议和产品反馈提交到官方的 Discord 频道。维护者建议在 Discord 的 \"Feedback\" 频道中提出您的想法，以便团队更好地收集和处理。","https:\u002F\u002Fgithub.com\u002Faugmentcode\u002Faugment-swebench-agent\u002Fissues\u002F18",{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},42742,"文档过时了，我该如何贡献修改？","如果您发现文档过时并希望贡献内容，请加入官方 Discord 社区（https:\u002F\u002Fdiscord.com\u002Finvite\u002FVWFnprrCDh）进行讨论。这是与团队沟通贡献方式的最佳渠道。","https:\u002F\u002Fgithub.com\u002Faugmentcode\u002Faugment-swebench-agent\u002Fissues\u002F22",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},42743,"VS Code 插件无法使用或一直显示加载中怎么办？","如果遇到 VS Code 插件无法工作或一直显示加载状态的问题，这属于产品使用问题而非开源仓库代码问题。请访问官方 Discord 
服务器（https:\u002F\u002Fdiscord.gg\u002FNKpvMCKP）寻求技术支持和解决方案。","https:\u002F\u002Fgithub.com\u002Faugmentcode\u002Faugment-swebench-agent\u002Fissues\u002F15",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},42744,"遇到 HTTP 400 Bad Request 错误如何处理？","当遇到 HTTP 400 错误时，请记录您的 Request ID（如报错信息中所示），并前往官方 Discord 社区（https:\u002F\u002Fdiscord.gg\u002FNKpvMCKP）提供该 ID 以获取帮助。此仓库不直接处理此类产品服务问题。","https:\u002F\u002Fgithub.com\u002Faugmentcode\u002Faugment-swebench-agent\u002Fissues\u002F8",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},42745,"如何在基于浏览器的 VS Code (如 Open VSX) 中安装插件？","目前该插件主要发布在 Visual Studio Marketplace。对于使用 Open VSX Registry 的浏览器版 VS Code 用户，暂时可以通过手动下载 .vsix 文件进行安装。关于自动更新的支持或正式上架 Open VSX 的计划，请在官方 Discord（https:\u002F\u002Fdiscord.gg\u002FNKpvMCKP）中查询最新进展。","https:\u002F\u002Fgithub.com\u002Faugmentcode\u002Faugment-swebench-agent\u002Fissues\u002F6",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},42746,"遇到服务不可用、数据造假或需要退款该怎么办？","如果遇到服务质量问题（如返回模拟数据却声称是真实数据、功能不可用等），由于 GitHub 仓库无法直接处理退款，请直接联系 Augment Code 客服或通过购买平台的客服渠道申请退款。在申诉时，请详细说明服务未达标的情况（如业务路由返回 404、依赖冲突未解决等）以作为退款依据。","https:\u002F\u002Fgithub.com\u002Faugmentcode\u002Faugment-swebench-agent\u002Fissues\u002F21",[]]