[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-EmergenceAI--Agent-E":3,"tool-EmergenceAI--Agent-E":64},[4,17,27,35,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,43,44,45,15,46,26,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,46],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},2181,"OpenHands","OpenHands\u002FOpenHands","OpenHands 是一个专注于 AI 
驱动开发的开源平台，旨在让智能体（Agent）像人类开发者一样理解、编写和调试代码。它解决了传统编程中重复性劳动多、环境配置复杂以及人机协作效率低等痛点，通过自动化流程显著提升开发速度。\n\n无论是希望提升编码效率的软件工程师、探索智能体技术的研究人员，还是需要快速原型验证的技术团队，都能从中受益。OpenHands 提供了灵活多样的使用方式：既可以通过命令行（CLI）或本地图形界面在个人电脑上轻松上手，体验类似 Devin 的流畅交互；也能利用其强大的 Python SDK 自定义智能体逻辑，甚至在云端大规模部署上千个智能体并行工作。\n\n其核心技术亮点在于模块化的软件智能体 SDK，这不仅构成了平台的引擎，还支持高度可组合的开发模式。此外，OpenHands 在 SWE-bench 基准测试中取得了 77.6% 的优异成绩，证明了其解决真实世界软件工程问题的能力。平台还具备完善的企业级功能，支持与 Slack、Jira 等工具集成，并提供细粒度的权限管理，适合从个人开发者到大型企业的各类用户场景。",70612,"2026-04-05T11:12:22",[26,15,13,45],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":80,"owner_website":81,"owner_url":82,"languages":83,"stars":100,"forks":101,"last_commit_at":102,"license":103,"difficulty_score":23,"env_os":104,"env_gpu":105,"env_ram":105,"env_deps":106,"category_tags":117,"github_topics":79,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":118,"updated_at":119,"faqs":120,"releases":151},3466,"EmergenceAI\u002FAgent-E","Agent-E","Agent driven automation starting with the web. Try it: https:\u002F\u002Fwww.emergence.ai\u002Fweb-automation-api","Agent-E 是一款基于智能体（Agent）的开源自动化系统，旨在让用户通过自然语言指令直接操控浏览器完成复杂任务。它解决了传统网页自动化工具依赖繁琐代码编写、难以应对动态网页变化的痛点，将操作门槛降低至“说话即执行”。\n\n用户只需描述需求，Agent-E 即可自主规划并执行多项操作：包括自动填写表单、在电商平台按特定条件筛选商品、从新闻或学术网站提取关键信息、控制视频播放，乃至在 JIRA 等项目管理平台中过滤和处理任务。无论是查找本地餐厅还是整理历史资料，它都能像私人助理一样灵活响应。\n\n这款工具特别适合希望提升工作效率的普通用户、需要快速验证想法的产品经理，以及致力于研究多智能体协作的开发者与科研人员。其核心技术亮点在于构建了分层规划机制，并基于成熟的 AG2 智能体框架开发，支持多智能体协同编排。项目不仅提供本地部署脚本，还推出了包含高级日志和云端扩展的企业级托管服务。对于想要探索\"AI 驱动自动化”前沿应用的人群，Agent-E 提供了一个功能全面且易于上手的实践平台。","# Agent-E\n**Free Trial: Managed Web Agent & Orchestrator**\u003Cbr>\nTry our Web Agent (Agent-E with enterprise enhancements) and multi-agent orchestrator. 
Access features such as advanced logging, role-based access, and cloud-hosted scalable infrastructure, backed by expert support. Sign up [here](https:\u002F\u002Fwww.emergence.ai\u002Forchestrator)\n\n\n[Discord](https:\u002F\u002Fdiscord.gg\u002FwgNfmFuqJF) &nbsp;&nbsp; [Cite paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.13032) _Note: The WebVoyager validation used [nested_chat_for_hierarchial_planning branch](https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Ftree\u002Fnested_chat_for_hierarchial_planning) and GPT4-Turbo_\n\n\nAgent-E is an agent-based system that aims to automate actions on the user's computer. At the moment it focuses on automation within the browser. The system is based on the [AG2 agent framework](https:\u002F\u002Fdocs.ag2.ai\u002Fdocs\u002FHome).\n\nThis provides a natural language way of interacting with a web browser:\n- Fill out forms (web forms, not PDFs yet) using information about you or from another site\n- Search and sort products on e-commerce sites like Amazon based on various criteria, such as bestsellers or price.\n- Locate specific content and details on websites, from sports scores on ESPN to contact information on university pages.\n- Navigate to and interact with web-based media, including playing YouTube videos and managing playback settings like full-screen and mute.\n- Perform comprehensive web searches to gather information on a wide array of topics, from historical sites to top local restaurants.\n- Manage and automate tasks on project management platforms (like JIRA) by filtering issues, easing the workflow for users.\n- Provide personal shopping assistance, suggesting products based on the user's needs, such as storage options for game cards.\n\nWhile Agent-E is growing, it is already equipped to handle a versatile range of tasks, but the best task is the one that you come up with. So, take it for a spin and tell us what you were able to do with it. 
For more information see our [blog article](https:\u002F\u002Fwww.emergence.ai\u002Fblog\u002Fdistilling-the-web-for-multi-agent-automation).\n\n\n## Quick Start Using Scripts\nTo get started with Agent-E, follow the steps below to install dependencies and configure your environment.\n#### 1. Run the Installation Script\n\n- **macOS\u002FLinux**:\n  - From the project root, run the following command to set up the environment and install all dependencies:\n    ```bash\n    .\u002Finstall.sh\n    ```\n    - For **Playwright support**, you can pass the `-p` flag to install Playwright without further prompting:\n      ```bash\n      .\u002Finstall.sh -p\n      ```\n\n- **Windows**:\n  - From the project root, execute the following command in PowerShell:\n    ```powershell\n    .\\win_install.ps1\n    ```\n    - To install Playwright without further prompting, add the `-p` flag:\n      ```powershell\n      .\\win_install.ps1 -p\n      ```\n#### 2. Configure Environment Variables\n- Go to the newly created `.env` and `agents_llm_config.json` files and follow the instructions in them to set the required fields\n\n#### 3. Run Agent-E\nOnce you have set up the environment and installed all the dependencies, you can run Agent-E using the following command:\n```bash\npython -m ae.main\n```\n\n**For macOS Users**\n```bash\npython -u -m ae.main\n```\n\n\n## Manual Setup\n\n### 1. Install `uv`\nAgent-E uses `uv` to manage the Python virtual environment and package dependencies.\n\n- **macOS\u002FLinux**:\n  ```bash\n  curl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n  ```\n\n- **Windows**:\n  ```bash\n  powershell -c \"irm https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.ps1 | iex\"\n  ```\n\n- Alternatively, you can install `uv` using `pip`: `pip install uv`\n\n### 2. 
Set up the virtual environment\nUse `uv` to create and activate a virtual environment for the project.\n```bash\nuv venv --python 3.11  # 3.10+ should also work\nsource .venv\u002Fbin\u002Factivate  # On Windows: .venv\\Scripts\\activate\n```\n\n### 3. Install dependencies\nGenerate the `requirements.txt` file from the `pyproject.toml` and install dependencies.\n```bash\nuv pip compile pyproject.toml -o requirements.txt\nuv pip install -r requirements.txt\n```\n\nTo install extras for development, run:\n```bash\nuv pip install -r pyproject.toml --extra dev\n```\n\n### 4. (Optional) Install Playwright Drivers\nIf you do not have Google Chrome installed locally and don’t want to install it, you can use Playwright for browser automation.\n```bash\nplaywright install\n```\n\n### 5. Configure the environment\nCreate a `.env` file by copying the provided example file.\n```bash\ncp .env-example .env\n```\n- Edit the `.env` file and set the following variables:\n    - `AUTOGEN_MODEL_NAME` (e.g., `gpt-4-turbo` for optimal performance).\n    - `AUTOGEN_MODEL_API_KEY`: your LLM API key.\n    - If using a model other than OpenAI, configure `AUTOGEN_MODEL_BASE_URL` (the URL where the completion endpoint is hosted; don't include `\u002Fcompletion` in it), `AUTOGEN_MODEL_API_TYPE`, and `AUTOGEN_MODEL_API_VERSION`.\n- Optionally configure `AUTOGEN_LLM_TEMPERATURE` and `AUTOGEN_LLM_TOP_P`.\n- If you want to use your local Chrome browser instead of the Playwright browser, go to chrome:\u002F\u002Fversion\u002F in Chrome, find the path to your profile, and set `BROWSER_STORAGE_DIR` to that path\n\n## Environment Variables\n\nAgent-E relies on several environment variables for its configuration. You need to define these in a `.env` file in the project root. A sample `.env-example` file is provided for convenience.\n\n### Key Variables:\n\n- **`AUTOGEN_MODEL_NAME`**  \n  Name of the LLM model you want to use (e.g., `gpt-4-turbo`). 
This is required for most setups.\n  \n- **`AUTOGEN_MODEL_API_KEY`**  \n  Your API key for accessing the LLM model (e.g., OpenAI API key).\n  \n- **`AUTOGEN_MODEL_BASE_URL`** *(optional)*  \n  Base URL for the model if it's hosted on a service other than OpenAI (e.g., Azure OpenAI services). Example:  \n  `https:\u002F\u002Fapi.groq.com\u002Fopenai\u002Fv1`  \n  or  \n  `https:\u002F\u002F\u003CYOUR_AZURE_ENDPOINT>.openai.azure.com`\n\n- **`AUTOGEN_MODEL_API_TYPE`** *(optional)*  \n  Type of model API (e.g., `azure` for Azure-hosted models).\n\n- **`AUTOGEN_MODEL_API_VERSION`** *(optional)*  \n  Version of the model API to use, typically needed for Azure models (e.g., `2023-03-15-preview`).\n\n- **`AUTOGEN_LLM_TEMPERATURE`** *(optional)*  \n  Sets the temperature for the LLM. Controls randomness in output. Defaults to `0.0` for `gpt-*` models.\n\n- **`AUTOGEN_LLM_TOP_P`** *(optional)*  \n  Sets the top-p value, which controls the diversity of token sampling. Defaults to `0.001` for `gpt-*` models.\n\n- **`BROWSER_STORAGE_DIR`** *(optional)*  \n  Path to your local Chrome browser profile, required if using a local Chrome instance instead of Playwright.\n\n- **`SAVE_CHAT_LOGS_TO_FILE`**  \n  Set to `true` or `false` (Default: `true`). Indicates whether to save chat logs to a file or print them to stdout.\n\n- **`LOG_MESSAGES_FORMAT`**  \n  Set to `json` or `text` (Default: `text`). Specifies the format for logging messages.\n\n- **`ADDITIONAL_SKILL_DIRS`** *(optional)*\n  A comma-separated list of directories or `.py` files where additional skills can be loaded from. This is used to dynamically load skills from specified directories or files.\n  Example: `ADDITIONAL_SKILL_DIRS=\".\u002Fprivate_skills,.\u002Fextra_skills\u002Fmy_custom_skill.py\"` would be added to the `.env` file (or equivalent)\n\n- **`PLANNER_USER_INPUT_SKILL_ENABLED`** *(optional)*\n  Set to `true` or `false` (Default: `false`). 
Specifies whether to allow the planner agent to get user input or not.\n  \n## Running the Code\n\nOnce you have set up the environment and installed all the dependencies, you can run Agent-E using the `.\u002Frun.sh` script or using the following command:\n```bash\npython -m ae.main\n```\n\n### For macOS Users\nIf you encounter `BlockingIOError` (Errno 35) when running the program on macOS, execute the following command to avoid the issue:\n```bash\npython -u -m ae.main\n```\n\n### Expected Behavior\nOnce Agent-E is running, you should see an icon in the browser interface. Clicking on this icon will open a chat-like interface where you can input natural language commands. Example commands you can try:\n- `open youtube and search for funny cat videos`\n- `find iPhone 14 on Amazon and sort by best seller`\n\n## Advanced Usage\n\n### Launch via Web Endpoint\n\nAgent-E provides a FastAPI wrapper, allowing you to send commands via HTTP and receive streaming results. This feature is useful for programmatic task automation or integrating Agent-E into larger systems.\n\n#### To launch the FastAPI server:\n\n1. On Linux\u002FmacOS, run the following command:\n   ```bash\n   uvicorn ae.server.api_routes:app --reload --loop asyncio\n   ```\n2. On Windows, run the same command but without `--reload` (Python's async implementations still differ across OSes; removing `--reload` works around the issue, see this [answer on StackOverflow](https:\u002F\u002Fstackoverflow.com\u002Fa\u002F78795990)):\n   ```cmd\n   uvicorn ae.server.api_routes:app --loop asyncio\n   ```\n\n3. Send POST requests to execute tasks. 
For example, to execute a task using cURL:\n```bash\ncurl --location 'http:\u002F\u002F127.0.0.1:8000\u002Fexecute_task' \\\n--header 'Content-Type: application\u002Fjson' \\\n--data '{\n    \"command\": \"go to espn, look for soccer news, report the names of the most recent soccer champs\"\n}'\n```\nOptionally, the API request can include an `llm_config` object if you want to apply a different configuration during API request execution. The `llm_config` object should contain separate configurations for `planner_agent` and `browser_nav_agent`. See `agents_llm_config-example.json` for an example.\n\n```bash\ncurl --location 'http:\u002F\u002F127.0.0.1:8000\u002Fexecute_task' \\\n--header 'Content-Type: application\u002Fjson' \\\n--data '{\n    \"command\": \"go to espn, look for soccer news, report the names of the most recent soccer champs\",\n    \"llm_config\":{\"planner_agent\":{...}, \"browser_nav_agent\":{...}}\n}'\n```\n### Customizing LLM Parameters\nAgent-E supports advanced LLM configurations using environment variables or JSON-based configuration files. This allows users to customize how the underlying model behaves, such as setting temperature, top-p, and model API base URLs.\n\nTo configure Agent-E using a JSON file, add the following to your `.env` file:\n```makefile\nAGENTS_LLM_CONFIG_FILE=agents_llm_config.json\nAGENTS_LLM_CONFIG_FILE_REF_KEY=openai_gpt\n```\nA sample JSON config file is provided in the project root: `agents_llm_config-example.json`.\n\n\n#### Default Values for LLM Parameters\nIf you do not set `temperature`, `top_p`, or `seed` in your `.env` file or JSON configuration, Agent-E will use the following default values:\n- For `gpt-*` models:\n    - `\"temperature\": 0.0`\n    - `\"top_p\": 0.001`\n    - `\"seed\": 12345`\n- For other models:\n    - `\"temperature\": 0.1`\n    - `\"top_p\": 0.1`\n\n## Open-source Models\n\nAgent-E supports the use of open-source models through LiteLLM and Ollama. 
This allows users to run language models locally on their machines, with LiteLLM translating OpenAI-format inputs to local models' endpoints.\n\n### Steps to Use Open-source Models:\n\n1. **Install LiteLLM**:\n    ```bash\n    pip install 'litellm[proxy]'\n    ```\n\n2. **Install Ollama**:\n    - For Mac and Windows, download [Ollama](https:\u002F\u002Follama.com\u002Fdownload).\n    - For Linux:\n        ```bash\n        curl -fsSL https:\u002F\u002Follama.com\u002Finstall.sh | sh\n        ```\n\n3. **Pull Ollama Models**:\n    Before using a model, download it from the library. The list of available models is [here](https:\u002F\u002Follama.com\u002Flibrary). For example, to pull the Mistral v0.3 model:\n    ```bash\n    ollama pull mistral:v0.3\n    ```\n\n4. **Run LiteLLM**:\n    Start the LiteLLM proxy using the downloaded model:\n    ```bash\n    litellm --model ollama_chat\u002Fmistral:v0.3\n    ```\n\n5. **Configure Model in AutoGen**:\n    Modify your `.env` file as follows. No model name or API keys are required since the model is running locally; the base URL points at the LiteLLM proxy, which listens on port 4000 by default.\n    ```bash\n    AUTOGEN_MODEL_NAME=NotRequired\n    AUTOGEN_MODEL_API_KEY=NotRequired\n    AUTOGEN_MODEL_BASE_URL=http:\u002F\u002F0.0.0.0:4000\n    ```\n\n### Notes:\n- Running local Large Language Models (LLMs) with Agent-E is possible, but has not been thoroughly tested. Use this feature with caution.\n\n\n## Troubleshooting\n\nBelow are some common issues you may encounter when setting up or running Agent-E, along with steps to resolve them.\n\n### 1. pip not installed in the virtual environment\n\nIf you encounter an issue where `pip` is not installed in the virtual environment after setup, follow these steps:\n\n1. Activate the virtual environment:\n   ```bash\n   source .venv\u002Fbin\u002Factivate  # On Windows: .venv\\Scripts\\activate\n   ```\n\n2. Install `pip`:\n```bash\npython -m ensurepip --upgrade\n```\n\n3. Deactivate the virtual environment:\n```bash\ndeactivate\n```\n\n4. 
Reactivate the virtual environment:\n```bash\nsource .venv\u002Fbin\u002Factivate  # On Windows: .venv\\Scripts\\activate\n```\n\n5. Check for `pip` in the `.venv\u002Fbin` directory. You should now have `pip` installed.\n\n### 2. BlockingIOError on macOS\nIf you are on `macOS` and encounter the following error:\n```\nBlockingIOError: [Errno 35] write could not complete without blocking\n```\nThis happens when AutoGen tries to print large amounts of text to the terminal. To fix this, run the following command with the `-u` flag to make output unbuffered:\n```bash\npython -u -m ae.main\n```\nNote: Using unbuffered output may result in some output not appearing in the terminal.\n\n### 3. Playwright driver issues\nIf you do not have Google Chrome installed locally and run into issues with browser automation, install the Playwright drivers:\n```bash\nplaywright install\n```\nPlaywright will install the necessary browser binaries to run the automation tasks without needing Chrome locally installed.\n\n### 4. Chrome profile not found\nIf you want to use your local Chrome browser instead of Playwright and encounter issues finding the browser profile path, follow these steps:\n\n1. Open Chrome and go to `chrome:\u002F\u002Fversion\u002F`.\n2. Locate the `Profile Path`.\n3. 
Set the `BROWSER_STORAGE_DIR` environment variable in the `.env` file to this path:\n```\nBROWSER_STORAGE_DIR=\u002Fpath\u002Fto\u002Fyour\u002Fchrome\u002Fprofile\n```\n\nIf you encounter other issues, please refer to the project’s [GitHub issues](https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Fissues) or reach out on [Discord](https:\u002F\u002Fdiscord.gg\u002FwgNfmFuqJF) for assistance.\n\n\n## Demos\n\n| Video | Command | Description |\n|-----------|-------------|-------------|\n| [![Oppenheimer Video](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEmergenceAI_Agent-E_readme_07d8622ed5de.png)](https:\u002F\u002Fwww.youtube.com\u002Fembed\u002Fv4BgYiDHNZs) | There is an Oppenheimer video on youtube by Veritasium, can you find it and play it? | \u003Cul> \u003Cli>Navigates to www.youtube.com \u003C\u002Fli> \u003Cli>Searches for Oppenheimer Veritasium using the searchbar \u003C\u002Fli> \u003Cli> Plays the correct video \u003C\u002Fli>\u003C\u002Ful>|\n| [![Example 2: Use information to fill forms](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEmergenceAI_Agent-E_readme_50e76ca6bbf2.png)](https:\u002F\u002Fwww.youtube.com\u002Fembed\u002FuyE7tfKkB0E) | Can you do this task? Wait for me to review before submitting. | Takes the highlighted text from the email as part of the instruction. 
\u003Cul> \u003Cli>Navigates to the form URL \u003C\u002Fli> \u003Cli>Identifies elements in the form to fill \u003C\u002Fli> \u003Cli> Fills the form using information from memory defined in user preferences.txt \u003C\u002Fli> \u003Cli>Waits for user to review before submitting the form \u003C\u002Fli>\u003C\u002Ful> |\n| [![Example 3: Find and add specific product to amazon cart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEmergenceAI_Agent-E_readme_c6d603f8dbc9.png)](https:\u002F\u002Fwww.youtube.com\u002Fembed\u002FCiKZwU_F6TQ) | Find Finish dishwasher detergent tablets on amazon, sort by best seller and add the first one to my cart | \u003Cul> \u003Cli> Navigates to www.amazon.com \u003C\u002Fli> \u003Cli>Searches for Finish dishwasher detergent tablets using amazon search feature \u003C\u002Fli> \u003Cli> Sorts the search results by best seller \u003C\u002Fli> \u003Cli>Selects the first product to navigate to its product page. \u003C\u002Fli> \u003Cli> Adds the product to cart \u003C\u002Fli>\u003C\u002Ful> |\n| [![Example 4: Compare flight prices on Google Flights](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEmergenceAI_Agent-E_readme_33012b732061.png)](https:\u002F\u002Fwww.youtube.com\u002Fembed\u002FJDtnMx0pTmQ) | Compare business class flight options from Lisbon to Singapore for a one-way trip on September 15, 2024 on Google Flights? | \u003Cul> \u003Cli> Sets Journey type to one-way. \u003C\u002Fli> \u003Cli> Sets number of passengers to one. 
\u003C\u002Fli> \u003Cli> Sets departure date to September 15, 2024 \u003C\u002Fli> \u003Cli> Sets ticket type to business class \u003C\u002Fli> \u003Cli> Executes search \u003C\u002Fli> \u003Cli> Extracts flight information \u003C\u002Fli>\u003C\u002Ful>|\n\n\n\n## Architecture\n\n![Agent-E system view](docs\u002Fimages\u002Fagent-e-system-architecture.png?raw=true \"Agent-E system view\")\n\nBuilding on the foundation provided by the [AG2 agent framework](https:\u002F\u002Fdocs.ag2.ai\u002Fdocs\u002FHome) (formerly AutoGen), Agent-E's architecture leverages the interplay between skills and agents. Each skill embodies an atomic action, a fundamental building block that, when executed, returns a natural language description of its outcome. This granularity allows Agent-E to flexibly assemble these skills to tackle complex web automation workflows.\n\n![Agent-E AutoGen setup](docs\u002Fimages\u002Fagent-e-autogen-setup.png?raw=true \"Agent-E AutoGen setup\")\n\nThe diagram above shows the configuration chosen on top of AutoGen. The skills can be partitioned differently, but this is the partitioning we chose for the time being. We chose to use skills that map to what humans learn about the web browser rather than allowing the LLM to write code as it pleases. We see the use of configured skills as safer and more predictable in its outcomes. Certainly the agent can click on the wrong things, but at least it is not going to execute malicious unknown code.\n\n### Agents\nAt the moment there are two agents: the User Proxy (which executes the skills) and the Browser Navigation agent, which embodies all the skills for interacting with the web browser.\n\n### Skills Library\nAt the core of Agent-E's capabilities is the Skills Library, a repository of well-defined actions that the agent can perform; for now, web actions. 
These skills are grouped into two main categories:\n\n- **Sensing Skills**: Skills like `get_dom_with_content_type` and `geturl` that help the agent understand the current state of the webpage or the browser.\n- **Action Skills**: Skills that allow the agent to interact with and manipulate the web environment, such as `click`, `enter text`, and `open url`.\n\nEach skill is created with the intention to be as conversational as possible, making the interactions with LLMs more intuitive and error-tolerant. For instance, rather than simply returning a boolean value, a skill might explain in natural language what happened during its execution, enabling the LLM to better understand the context and correct course if necessary.\n\nBelow are the skills we have implemented:\n\n| Sensing Skills | Action Skills |\n|----------------|---------------|\n| `geturl` - Fetches and returns the current url. | `click` - given a DOM query selector, this will click on it. |\n| `get_dom_with_content_type` - Retrieves the HTML DOM of the active page based on the specified content type. Content type can be:\u003Cbr> - `text_only`: Extracts the inner text of the html DOM. Responds with text output.\u003Cbr> - `input_fields`: Extracts the interactive elements in the DOM (button, input, textarea, etc.) and responds with a compact JSON object.\u003Cbr> - `all_fields`: Extracts all the fields in the DOM and responds with a compact JSON object. | `enter_text_and_click` - Optimized method that combines enter text and click skills. The optimization here helps use cases such as enter text in a field and press the search button. Since the DOM would not have changed or changes should be immaterial to this action, identifying both selectors for an input field and an actionable button can happen based on the same DOM examination. |\n| `get_user_input` - Provides the orchestrator with a mechanism to receive user feedback to disambiguate or seek clarity on fulfilling their request. 
| `bulk_enter_text` - Optimized method that wraps the `enter_text` method so that multiple text entries can be performed in one shot. |\n|  | `enter_text` - Enters text in a field specified by the provided DOM query selector. |\n|  | `openurl` - Opens the given URL in the current or a new tab. |\n\n\n### DOM Distillation\n\nAgent-E's approach to managing the vast landscape of HTML DOM is methodical and, frankly, essential for efficiency. We've introduced DOM Distillation to pare down the DOM to just the elements pertinent to the user's task.\n\nIn practice, this means taking the expansive DOM and delivering a more digestible JSON snapshot. This isn't just about reducing size; it's about homing in on relevance, serving the LLMs only what's necessary to fulfill a request. So far we have three content types:\n\n- **Text only**: For when the mission is information retrieval, and the text is the target. No distractions.\n- **Input fields**: Zeroing in on elements that call for user interaction. It’s about streamlining actions.\n- **All content**: The full scope of the distilled DOM, encompassing all elements when the task demands a comprehensive understanding.\n\nIt's a surgical procedure, carefully removing extraneous information while preserving the structure and content needed for the agent’s operation. Of course, with any distillation there could be casualties, but the idea is to refine this over time to limit\u002Feliminate them.\n\nSince we can't rely on all web page authors to use best practices, such as adding unique ids to each HTML element, we had to inject our own attribute (`mmid`) into every DOM element. We can then guide the LLM to rely on using `mmid` in the generated DOM queries.\n\nTo cut down on some of the DOM noise, we use the DOM Accessibility Tree rather than the regular HTML DOM. 
The accessibility tree by nature is geared towards helping screen readers, which is closer to the mission of web automation than plain old HTML DOM.\n\nThe distillation process is a work in progress. We look to refine this process and condense the DOM further aiming to make interactions faster, cost-effective, and more accurate.\n\n## Testing and Benchmarking\n\nAgent-E builds on the work done by [Web Arena](https:\u002F\u002Fgithub.com\u002Fweb-arena-x\u002Fwebarena) for testing and evaluation. The `test` directory contains a `tasks` subdirectory with JSON files that define test cases, which also serve as examples.\n\nAgent-E operates in a real-world web environment, which introduces variability in testing. As a result, not all tests may pass consistently due to changes in live websites. The goal is to ensure Agent-E works as expected across a wide range of tasks, with a focus on practical web automation.\n\n### Running Tests\n\nTo run the full test suite, use the following command:\n\n```bash\npython -m test.run_tests\n```\n\n### macOS Users\nIf you're running the tests on macOS and encounter `BlockingIOError`, run the tests with unbuffered output:\n```bash\npython -u -m test.run_tests\n```\n\n### Running Specific Tests\nIf you want to run specific tests, you can modify the minimum and maximum task indices. This will run a subset of the tasks defined in the test configuration file.\n\nExample:\n```bash\npython -m test.run_tests --min_task_index 0 --max_task_index 28 --test_results_id first_28_tests\n```\nThis command will run tests from index 0 to 27 and assign the results the identifier `first_28_tests`.\n\n### Parameters for run_tests\nHere are additional parameters that you can pass to customize the test execution:\n- `--min_task_index`: Minimum task index to start tests from (default: 0).\n- `--max_task_index`: Maximum task index to end tests with, non-inclusive.\n- `--test_results_id`: A unique identifier for the test results. 
If not provided, a timestamp is used.\n- `--test_config_file`: Path to the test configuration file. Default is `test\u002Ftasks\u002Ftest.json`.\n- `--wait_time_non_headless`: The amount of time to wait between tests when running in non-headless mode.\n- `--take_screenshots`: Takes screenshots after every operation performed. Example: `--take_screenshots true`. Default is `false`.\n\n### Example Command\nHere’s an example of how to use the parameters (macOS users: add the `-u` flag to the command below):\n```bash\npython -m test.run_tests --min_task_index 0 --max_task_index 28 --test_results_id first_28_tests\n```\n\n\n## Contributing\n\nThank you for your interest in contributing to Agent-E! We welcome contributions from the community and appreciate your help in improving the project.\n\n### How to Contribute:\n\n1. **Fork the Repository**  \n   Start by forking the [Agent-E repository](https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E.git) to your GitHub account.\n\n2. **Create a New Branch**  \n   Create a new branch for your feature or bug fix:\n   ```bash\n   git checkout -b my-feature-branch\n   ```\n\n3. **Make Changes**\nImplement your changes in your new branch. Be sure to follow the project's coding style and best practices.\n\n4. **Run Tests**\nBefore submitting your pull request, ensure that all tests pass by running:\n```bash\npython -m test.run_tests\n```\n\n5. **Submit a Pull Request**\nOnce your changes are ready, push your branch to your GitHub fork and submit a pull request to the main repository. 
Please include a clear description of your changes and why they are necessary.\n\n### Contribution Guidelines:\n- Follow the [contributing guidelines](CONTRIBUTING.md) for more detailed information on contributing.\n- Be sure to write clear and concise commit messages.\n- When submitting a pull request, make sure to link any related issues and provide a detailed description of the changes.\n\n\n### Code of Conduct:\nPlease note that we have a [Code of Conduct](CODE_OF_CONDUCT.md) that all contributors are expected to follow. We are committed to providing a welcoming and inclusive environment for everyone.\n\n### Reporting Issues:\nIf you encounter a bug or have a feature request, please open an issue in the [GitHub issue tracker](https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Fissues). Be sure to provide detailed information so we can address the issue effectively.\n\n### Join the Discussion:\nWe encourage you to join our community on [Discord](https:\u002F\u002Fdiscord.gg\u002FwgNfmFuqJF) for discussions, questions, and updates on Agent-E.\n\n\n## Docs Generation\n\nAgent-E uses [Sphinx](https:\u002F\u002Fwww.sphinx-doc.org\u002Fen\u002Fmaster\u002F) to generate its documentation. To contribute or generate documentation locally, follow these steps:\n\n### Prerequisites\n\nEnsure that you have installed the development dependencies before generating the docs. You can install them using the following command:\n\n```bash\nuv pip install -r pyproject.toml --extra dev\n```\n\n### Steps to Generate Documentation\n1. Navigate to the project root directory:\n```bash\ncd Agent-E\n```\n\n2. Create a `docs` directory if it doesn’t exist:\n```bash\nmkdir docs\ncd docs\n```\n\n3. Initialize Sphinx using the quickstart command:\n```bash\nsphinx-quickstart\n```\n\n4. Configure Sphinx by editing the `docs\u002Fconf.py` file. 
Add the following lines to include the project in the Sphinx path and enable extensions:\n```python\nimport os\nimport sys\nsys.path.insert(0, os.path.abspath('..'))\nextensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']\nhtml_theme = 'sphinx_rtd_theme'\n```\n\n5. Generate API Documentation:\nFrom the project root, run the following command to generate API documentation files:\n```bash\nsphinx-apidoc -o docs\u002Fsource .\n```\n\n6. Build the Documentation:\nAfter generating the API documentation, go to the `docs` directory and build the HTML docs:\n```bash\nsphinx-build -b html . _build\n```\n\n### Viewing the Documentation\nOnce the documentation is built, open the generated HTML files in your browser by navigating to the `_build` directory and opening `index.html`.\n```bash\nopen _build\u002Findex.html\n```\n\nThis will display the generated documentation in your default web browser.\n\n## Join the Community\n\nWe encourage you to become part of the Agent-E community! Whether you're here to ask questions, share feedback, or contribute to the project, we welcome all participation.\n\n### Join the Conversation:\n\n- **Discord**: Connect with other users and developers in our [Discord community](https:\u002F\u002Fdiscord.gg\u002FwgNfmFuqJF). 
Feel free to ask questions, share your experiences, or discuss potential features with fellow users and contributors.\n\n### Stay Updated:\n\nStay informed about new features, updates, and announcements by following the project and engaging with the community.\n\n- **GitHub**: Keep an eye on the latest issues and pull requests, and contribute directly to the codebase on [GitHub](https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E).\n\nWe look forward to seeing you in the community!\n\n\n## Citation\n\nIf you use this work in your research or projects, please cite the following article:\n\n```\n@misc{abuelsaad2024-agente,\n      title={Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems},\n      author={Tamer Abuelsaad and Deepak Akkil and Prasenjit Dey and Ashish Jagmohan and Aditya Vempaty and Ravi Kokku},\n      year={2024},\n      eprint={2407.13032},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.13032},\n}\n```\nYou can also view the paper on [arXiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.13032).\n\n\n## TODO\n\nHere are some features and improvements planned for future releases of Agent-E:\n\n- **Robust Dropdown Handling**: Improve handling for dropdowns on sites like travel booking platforms (e.g., Booking.com, Expedia, Google Flights). Many of these menus have dynamic content or combine multiple aspects, such as selecting both departure and return dates within the same menu.\n- **Task Caching**: Implement caching for tasks that have been run before, allowing users to rerun tasks without requiring new LLM calls. This cache should be smart, selectively caching elements like DOM locators while excluding items such as information retrieval results to improve token efficiency.\n- **Open Source and Local LLM Compatibility**: Adapt Agent-E to work with open-source LLMs, ideally allowing it to run locally. 
This may involve simplifying prompts or refactoring skills to match the capabilities of these models.\n- **Multi-Tab and Bookmark Handling**: Add skills to support bookmarks and multi-tab usage. Currently, Agent-E can handle only one tab at a time; if another tab is opened, it loses state and the user must restart.\n- **PDF Text Handling**: Enhance support for managing large amounts of text from PDFs that exceed LLM context windows, possibly by chunking the text with some overlap to retain context.\n- **Browser Nav Agent History Optimization**: Improve the Browser Navigation Agent by developing ways to trim browser history, aiming to reduce token usage and cognitive load on the LLM.\n- **Harvest User Preferences**: Integrate Long-Term Memory (LTM) support to automatically populate user preferences over time, with options for users to manually inject preferences. This may involve using a vector database like FAISS locally or an externally hosted vector database.\n- **DOM Distillation Testing Harness**: Develop a testing harness for DOM distillation, allowing distillation improvements to be measured for accuracy and performance.\n- **DOM Distillation Optimization**: Continue to make DOM distillation faster and more efficient.\n- **Shadow DOM Support**: Some sites use Shadow DOMs; add support for extracting their content from the accessibility tree.\n- **Google Suite Compatibility**: Add support for Google Docs, Sheets, Slides, and Gmail, which often use canvas elements inaccessible via conventional DOM methods.\n- **Cross-Platform Installer**: Create an installer compatible with Windows, Mac, and ideally Ubuntu, aimed at non-technical users. 
This installer should allow for environment variable configuration within the app.\n- **Execution Process Video Recording**: Implement video recording to capture the execution process, as requested in issue [#106](https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Fissues\u002F106).\n- **DOM Distillation Optimizations**: Replace deprecated `snapshot()` method for DOM distillation.\n- **Token Optimization**: Investigate ways to reduce the number of tokens used by the AutoGen-required prompts and annotations.\n- ~~**Action Verification**: Implement response for every skill that reflects DOM changes (using Mutation Observers), so the LLM can judge if the skill executed properly.~~\n- ~~**Execution Planner**: Develop a planner agent that can make the LLM decide on multiple steps ahead for faster execution.~~\n- ~~[Nested chat did the trick]**Group Chat**: Enable group chat features and move some skills to different agents.~~\n\n\n","# Agent-E\n**免费试用：托管式网络代理与编排器**\u003Cbr>\n试用我们的网络代理（具备企业级增强功能的Agent-E）以及多代理编排器。您可以访问高级日志记录、基于角色的访问控制和云端托管的可扩展基础设施等功能，并享受专家支持。请在此处注册[这里](https:\u002F\u002Fwww.emergence.ai\u002Forchestrator)\n\n\n[Discord](https:\u002F\u002Fdiscord.gg\u002FwgNfmFuqJF) &nbsp;&nbsp; [引用论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.13032) _注：WebVoyager验证使用了[nested_chat_for_hierarchial_planning分支](https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Ftree\u002Fnested_chat_for_hierarchial_planning)和GPT4-Turbo_\n\n\nAgent-E是一个基于智能体的系统，旨在自动化用户计算机上的操作。目前它主要专注于浏览器内的自动化任务。该系统基于[AG2智能体框架](https:\u002F\u002Fdocs.ag2.ai\u002Fdocs\u002FHome)。\n\n这提供了一种通过自然语言与网页浏览器交互的方式：\n- 使用关于您的信息或来自其他网站的信息填写表单（目前仅支持网页表单，暂不支持PDF）\n- 根据各种标准（如畅销商品或价格）在亚马逊等电商网站上搜索和排序产品。\n- 在各类网站上查找特定内容和详细信息，从ESPN上的体育比分到大学官网上的联系方式。\n- 导航并操作基于网页的媒体内容，包括播放YouTube视频以及管理全屏、静音等播放设置。\n- 执行全面的网络搜索，获取涵盖广泛主题的信息，从历史遗迹到当地顶级餐厅。\n- 通过筛选问题来管理和自动化项目管理平台（如JIRA）上的任务，从而简化用户的流程。\n- 
提供个人购物协助，根据用户需求推荐产品，例如游戏卡的存储方案。\n\n尽管Agent-E仍在不断发展，但它已经能够处理多种多样的任务，而最出色的任务往往来自于您的创意。因此，请亲自体验一下，并告诉我们您用它完成了哪些事情。更多信息请参阅我们的[博客文章](https:\u002F\u002Fwww.emergence.ai\u002Fblog\u002Fdistilling-the-web-for-multi-agent-automation)。\n\n\n## 使用脚本快速入门\n要开始使用Agent-E，请按照以下步骤安装依赖项并配置您的环境。\n#### 1. 运行安装脚本\n\n- **macOS\u002FLinux**:\n  - 从项目根目录运行以下命令以设置环境并安装所有依赖项：\n    ```bash\n    .\u002Finstall.sh\n    ```\n    - 如果需要**Playwright支持**，可以传递`-p`标志以无需进一步提示即可安装Playwright：\n      ```bash\n      .\u002Finstall.sh -p\n      ```\n\n- **Windows**:\n  - 从项目根目录，在PowerShell中执行以下命令：\n    ```powershell\n    .\\win_install.ps1\n    ```\n    - 若要无需进一步提示地安装Playwright，可添加`-p`标志：\n      ```powershell\n      .\\win_install.ps1 -p\n      ```\n#### 2. 配置环境变量\n- 打开新创建的`.env`文件和`agents_llm_config.json`，按照说明设置相关字段。\n\n#### 3. 运行Agent-E\n在完成环境设置和所有依赖项的安装后，您可以通过以下命令运行Agent-E：\n```bash\npython -m ae.main\n```\n\n**适用于macOS用户**\n```bash\npython -u -m ae.main\n```\n\n\n## 手动设置\n\n### 1. 安装`uv`\nAgent-E使用`uv`来管理Python虚拟环境和包依赖。\n\n- **macOS\u002FLinux**:\n  ```bash\n  curl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n  ```\n\n- **Windows**:\n  ```powershell\n  powershell -c \"irm https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.ps1 | iex\"\n  ```\n\n- 或者，您也可以使用`pip`安装`uv`：`pip install uv`\n\n### 2. 设置虚拟环境\n使用`uv`为项目创建并激活虚拟环境。\n```bash\nuv venv --python 3.11  # 3.10及以上版本也可\nsource .venv\u002Fbin\u002Factivate  # Windows系统下：.venv\\Scripts\\activate\n```\n\n### 3. 安装依赖项\n从`pyproject.toml`生成`requirements.txt`文件，并安装依赖项。\n```bash\nuv pip compile pyproject.toml -o requirements.txt\nuv pip install -r requirements.txt\n```\n\n若需安装开发所需的额外依赖项，可运行：\n```bash\nuv pip install -r pyproject.toml --extra dev\n```\n\n### 4.（可选）安装Playwright驱动程序\n如果您本地未安装Google Chrome，且不想安装，可以使用Playwright进行浏览器自动化。\n```bash\nplaywright install\n```\n\n### 5. 
配置环境\n通过复制提供的示例文件创建`.env`文件。\n```bash\ncp .env-example .env\n```\n- 编辑`.env`文件并设置以下变量：\n    - `AUTOGEN_MODEL_NAME`（例如，使用`gpt-4-turbo`以获得最佳性能）。\n    - `AUTOGEN_MODEL_API_KEY` LLM API密钥。\n    - 如果使用非OpenAI模型，则需配置`AUTOGEN_MODEL_BASE_URL`（即托管completion端点的URL，但不要包含`\u002Fcompletion`）、`AUTOGEN_MODEL_API_TYPE`和`AUTOGEN_MODEL_API_VERSION`。\n- 您还可以选择性地配置`AUTOGEN_LLM_TEMPERATURE`和`AUTOGEN_LLM_TOP_P`。\n- 如果希望使用本地Chrome浏览器而非Playwright浏览器，请在Chrome中打开chrome:\u002F\u002Fversion\u002F，找到您的用户资料路径，并将`BROWSER_STORAGE_DIR`设置为该路径值。\n\n## 环境变量\n\nAgent-E的配置依赖于多个环境变量。您需要在项目根目录下的`.env`文件中定义这些变量。我们还提供了一个方便的`.env-example`示例文件。\n\n### 关键变量：\n\n- **`AUTOGEN_MODEL_NAME`**  \n  您想要使用的大型语言模型名称（例如 `gpt-4-turbo`）。这是大多数设置所必需的。\n\n- **`AUTOGEN_MODEL_API_KEY`**  \n  访问该大型语言模型的 API 密钥（例如 OpenAI 的 API 密钥）。\n\n- **`AUTOGEN_MODEL_BASE_URL`** *(可选)*  \n  如果模型托管在 OpenAI 以外的服务上（例如 Azure OpenAI 服务），则需要提供基础 URL。示例：  \n  `https:\u002F\u002Fapi.groq.com\u002Fopenai\u002Fv1`  \n  或  \n  `https:\u002F\u002F\u003CYOUR_AZURE_ENDPOINT>.openai.azure.com`\n\n- **`AUTOGEN_MODEL_API_TYPE`** *(可选)*  \n  模型 API 的类型（例如，`azure` 表示由 Azure 托管的模型）。\n\n- **`AUTOGEN_MODEL_API_VERSION`** *(可选)*  \n  要使用的模型 API 版本，通常适用于 Azure 模型（例如 `2023-03-15-preview`）。\n\n- **`AUTOGEN_LLM_TEMPERATURE`** *(可选)*  \n  设置大型语言模型的温度参数，控制输出的随机性。对于 `gpt-*` 模型，默认值为 `0.0`。\n\n- **`AUTOGEN_LLM_TOP_P`** *(可选)*  \n  设置 top-p 值，用于控制标记采样的多样性。对于 `gpt-*` 模型，默认值为 `0.001`。\n\n- **`BROWSER_STORAGE_DIR`** *(可选)*  \n  您本地 Chrome 浏览器配置文件的路径，如果使用本地 Chrome 实例而非 Playwright，则需要此路径。\n\n- **`SAVE_CHAT_LOGS_TO_FILE`**  \n  设置为 `true` 或 `false`（默认：`true`）。指示是否将聊天日志保存到文件中，还是打印到标准输出。\n\n- **`LOG_MESSAGES_FORMAT`**  \n  设置为 `json` 或 `text`（默认：`text`）。指定消息日志的格式。\n\n- **`ADDITIONAL_SKILL_DIRS`** *(可选)*  \n  一个以逗号分隔的目录或 `.py` 文件列表，从中可以加载额外的技能。这用于从指定的目录或文件动态加载技能。\n  示例：`ADDITIONAL_SKILL_DIRS=\".\u002Fprivate_skills,.\u002Fextra_skills\u002Fmy_custom_skill.py\"` 将被添加到 `.env` 文件（或等效文件）中。\n\n- **`PLANNER_USER_INPUT_SKILL_ENABLED`** *(可选)*  \n  设置为 `true` 或 
`false`（默认：`false`）。指定规划代理是否允许获取用户输入。\n\n## 运行代码\n\n在完成环境设置并安装所有依赖项后，您可以使用 `.\u002Frun.sh` 脚本或以下命令运行 Agent-E：\n```bash\npython -m ae.main\n```\n\n### 对于 macOS 用户\n如果您在 macOS 上运行程序时遇到 `BlockingIOError`（Errno 35），请执行以下命令以避免该问题：\n```bash\npython -u -m ae.main\n```\n\n### 预期行为\nAgent-E 启动后，您应该会在浏览器界面中看到一个图标。单击该图标将打开类似聊天的界面，您可以在其中输入自然语言指令。您可以尝试的示例命令包括：\n- `打开 YouTube 并搜索搞笑猫咪视频`\n- `在亚马逊上查找 iPhone 14，并按畅销排序`\n\n## 高级用法\n\n### 通过 Web 端点启动\n\nAgent-E 提供了一个 FastAPI 包装器，允许您通过 HTTP 发送命令并接收流式结果。此功能对于程序化任务自动化或将 Agent-E 集成到更大的系统中非常有用。\n\n#### 启动 FastAPI 服务器的方法：\n\n1. 启动服务器。在 Linux\u002FmacOS 上，运行以下命令：\n   ```bash\n   uvicorn ae.server.api_routes:app --reload --loop asyncio\n   ```\n   在 Windows 上，运行相同的命令，但不带 `--reload`（由于不同操作系统对异步实现的支持存在差异，去掉 `--reload` 可以作为一种临时解决方案，请参阅 [StackOverflow 上的解答](https:\u002F\u002Fstackoverflow.com\u002Fa\u002F78795990)）：\n   ```cmd\n   uvicorn ae.server.api_routes:app --loop asyncio\n   ```\n\n2. 发送 POST 请求来执行任务。例如，使用 cURL 执行任务：\n```bash\ncurl --location 'http:\u002F\u002F127.0.0.1:8000\u002Fexecute_task' \\\n--header 'Content-Type: application\u002Fjson' \\\n--data '{\n    \"command\": \"前往 ESPN，查看足球新闻，报告最近的足球冠军名单\"\n}'\n```\n可选地，API 请求可以包含一个 `llm_config` 对象，以便在 API 请求执行过程中应用不同的配置。`llm_config` 对象应分别为规划代理和浏览器导航代理提供单独的配置。有关示例，请参阅 `agents_llm_config-example.json`。\n\n```bash\ncurl --location 'http:\u002F\u002F127.0.0.1:8000\u002Fexecute_task' \\\n--header 'Content-Type: application\u002Fjson' \\\n--data '{\n    \"command\": \"前往 ESPN，查看足球新闻，报告最近的足球冠军名单\",\n    \"llm_config\":{\"planner_agent\":{...}, \"browser_nav_agent\":{...}}\n}'\n```\n\n### 自定义 LLM 参数\nAgent-E 支持使用环境变量或基于 JSON 的配置文件进行高级 LLM 配置。这使用户能够自定义底层模型的行为，例如设置温度、top-p 和模型 API 的基础 URL。\n\n要使用 JSON 文件配置 Agent-E，请在您的 `.env` 文件中添加以下内容：\n```makefile\nAGENTS_LLM_CONFIG_FILE=agents_llm_config.json\nAGENTS_LLM_CONFIG_FILE_REF_KEY=openai_gpt\n```\n项目根目录下提供了一个示例 JSON 配置文件：`agents_llm_config-example.json`。\n\n#### LLM 参数的默认值\n如果您未在 `.env` 文件或 JSON 配置中设置 `temperature`、`top_p` 或 `seed`，Agent-E 将使用以下默认值：\n- 对于 
`gpt-*` 模型：\n    - `\"temperature\": 0.0`\n    - `\"top_p\": 0.001`\n    - `\"seed\": 12345`\n- 对于其他模型：\n    - `\"temperature\": 0.1`\n    - `\"top_p\": 0.1`\n\n## 开源模型\n\nAgent-E 通过 LiteLLM 和 Ollama 支持使用开源模型。这使得用户能够在本地机器上运行语言模型，LiteLLM 会将 OpenAI 格式的输入转换为本地模型的端点。\n\n### 使用开源模型的步骤：\n\n1. **安装 LiteLLM**：\n    ```bash\n    pip install 'litellm[proxy]'\n    ```\n\n2. **安装 Ollama**：\n    - 对于 Mac 和 Windows，下载 [Ollama](https:\u002F\u002Follama.com\u002Fdownload)。\n    - 对于 Linux：\n        ```bash\n        curl -fsSL https:\u002F\u002Follama.com\u002Finstall.sh | sh\n        ```\n\n3. **拉取 Ollama 模型**：\n    在使用模型之前，需从库中下载。可用模型列表请见 [这里](https:\u002F\u002Follama.com\u002Flibrary)。例如，拉取 Mistral v0.3 模型：\n    ```bash\n    ollama pull mistral:v0.3\n    ```\n\n4. **运行 LiteLLM**：\n    使用已下载的模型启动 LiteLLM 代理：\n    ```bash\n    litellm --model ollama_chat\u002Fmistral:v0.3\n    ```\n\n5. **在 AutoGen 中配置模型**：\n    修改您的 `.env` 文件如下。由于模型在本地运行，因此无需提供模型名称或 API 密钥。\n    ```bash\n    AUTOGEN_MODEL_NAME=NotRequired\n    AUTOGEN_MODEL_API_KEY=NotRequired\n    AUTOGEN_MODEL_BASE_URL=http:\u002F\u002F0.0.0.0:4000\n    ```\n\n### 注意事项：\n- 使用 Agent-E 运行本地大型语言模型是可行的，但尚未经过充分测试。请谨慎使用此功能。\n\n\n## 故障排除\n\n以下是您在设置或运行 Agent-E 时可能遇到的一些常见问题及解决方法。\n\n### 1. 虚拟环境中未安装 pip\n\n如果在设置虚拟环境后发现 `pip` 未安装，请按照以下步骤操作：\n\n1. 激活虚拟环境：\n   ```bash\n   source .venv\u002Fbin\u002Factivate  # 在 Windows 上：.venv\\Scripts\\activate\n   ```\n\n2. 安装 `pip`：\n```bash\npython -m ensurepip --upgrade\n```\n\n3. 退出虚拟环境：\n```bash\ndeactivate\n```\n\n4. 再次激活虚拟环境：\n```bash\nsource .venv\u002Fbin\u002Factivate  # 在 Windows 上：.venv\\Scripts\\activate\n```\n\n5. 检查 `.venv\u002Fbin` 目录下是否有 `pip`，此时应已成功安装。\n\n### 2. macOS 上的 BlockingIOError\n如果您在 `macOS` 系统上遇到以下错误：\n```\nBlockingIOError: [Errno 35] write could not complete without blocking\n```\n这通常是由于 AutoGen 尝试向终端输出大量文本所致。解决方法是使用 `-u` 参数以启用非缓冲输出：\n```bash\npython -u -m ae.main\n```\n注意：启用非缓冲输出可能会导致部分输出无法显示在终端中。\n\n### 3. 
Playwright 驱动问题\n如果您本地未安装 Google Chrome，并且在进行浏览器自动化时遇到问题，请安装 Playwright 驱动：\n```bash\nplaywright install\n```\nPlaywright 将自动下载所需的浏览器二进制文件，从而无需在本地安装 Chrome 即可运行自动化任务。\n\n### 4. 找不到 Chrome 配置文件\n如果您希望使用本地的 Chrome 浏览器而非 Playwright，并且遇到无法找到浏览器配置文件路径的问题，请按照以下步骤操作：\n\n1. 打开 Chrome 浏览器并访问 `chrome:\u002F\u002Fversion\u002F`。\n2. 找到“配置文件路径”。\n3. 在 `.env` 文件中将 `BROWSER_STORAGE_DIR` 环境变量设置为该路径：\n```\nBROWSER_STORAGE_DIR=\u002Fpath\u002Fto\u002Fyour\u002Fchrome\u002Fprofile\n```\n\n如您遇到其他问题，请参考项目的 [GitHub Issues](https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Fissues) 或在 [Discord](https:\u002F\u002Fdiscord.gg\u002FwgNfmFuqJF) 上寻求帮助。\n\n\n## 演示视频\n\n| 视频 | 命令 | 描述 |\n|-----------|-------------|-------------|\n| [![奥本海默视频](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEmergenceAI_Agent-E_readme_07d8622ed5de.png)](https:\u002F\u002Fwww.youtube.com\u002Fembed\u002Fv4BgYiDHNZs) | YouTube 上有一段 Veritasium 制作的奥本海默视频，你能找到并播放它吗？ | \u003Cul> \u003Cli>导航至 www.youtube.com \u003C\u002Fli> \u003Cli>使用搜索栏查找“奥本海默 Veritasium”\u003C\u002Fli> \u003Cli>播放正确的视频 \u003C\u002Fli>\u003C\u002Ful>|\n| [![示例 2：利用信息填写表单](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEmergenceAI_Agent-E_readme_50e76ca6bbf2.png)](https:\u002F\u002Fwww.youtube.com\u002Fembed\u002FuyE7tfKkB0E) | 你能完成这项任务吗？请先让我审核后再提交。 | 根据邮件中高亮显示的文本作为指令内容。\u003Cul> \u003Cli>导航至表单网址 \u003C\u002Fli> \u003Cli>识别表单中的填写项 \u003C\u002Fli> \u003Cli>根据 user preferences.txt 中的记忆信息填写表单 \u003C\u002Fli> \u003Cli>等待用户审核后再提交表单 \u003C\u002Fli> |\n| [![示例 3：在亚马逊上找到特定商品并加入购物车](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEmergenceAI_Agent-E_readme_c6d603f8dbc9.png)](https:\u002F\u002Fwww.youtube.com\u002Fembed\u002FCiKZwU_F6TQ) | 在亚马逊上找到 Finish 洗碗机清洁片，按畅销排序后将第一个商品加入我的购物车 | \u003Cul> \u003Cli>导航至 www.amazon.com \u003C\u002Fli> \u003Cli>使用亚马逊搜索功能查找 Finish 洗碗机清洁片 \u003C\u002Fli> \u003Cli>按畅销排序搜索结果 \u003C\u002Fli> \u003Cli>选择第一个产品进入商品详情页 \u003C\u002Fli> \u003Cli>将商品加入购物车 \u003C\u002Fli>\u003C\u002Ful> |\n| [![示例 4：在 Google 
Flights 上比较航班价格](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEmergenceAI_Agent-E_readme_33012b732061.png)](https:\u002F\u002Fwww.youtube.com\u002Fembed\u002FJDtnMx0pTmQ) | 请在 Google Flights 上比较 2024 年 9 月 15 日从里斯本飞往新加坡的单程商务舱航班选项吗？ | \u003Cul>\u003Cli> 设置行程类型为单程。\u003C\u002Fli> \u003Cli> 设置乘客人数为 1 人。\u003C\u002Fli> \u003Cli> 设置出发日期为 9 月 15 日。\u003C\u002Fli> \u003Cli> 设置日期为 2024 年 9 月 15 日。\u003C\u002Fli> \u003Cli> 设置机票类型为商务舱。\u003C\u002Fli> \u003Cli> 执行搜索。\u003C\u002Fli> \u003Cli> 提取航班信息。\u003C\u002Fli>\u003C\u002Ful>|\n\n\n\n## 架构\n\n![Agent-E 系统视图](docs\u002Fimages\u002Fagent-e-system-architecture.png?raw=true \"Agent-E 系统视图\")\n\n基于 [AG2 代理框架](https:\u002F\u002Fdocs.ag2.ai\u002Fdocs\u002FHome)（原 AutoGen）的基础之上，Agent-E 的架构充分利用了技能与代理之间的协同作用。每个技能代表一个原子级的操作，是构建复杂网络自动化工作流的基本单元。当这些技能被执行时，会返回对其执行结果的自然语言描述。这种细粒度的设计使得 Agent-E 能够灵活地组合这些技能来应对复杂的网页自动化任务。\n\n![Agent-E AutoGen 配置](docs\u002Fimages\u002Fagent-e-autogen-setup.png?raw=true \"Agent-E AutoGen 配置\")\n\n上图展示了我们在 AutoGen 基础之上所采用的配置。虽然技能可以有不同的划分方式，但我们目前选择了这种方式。我们倾向于使用与人类对网页浏览器操作相对应的技能，而不是让大模型随意编写代码。我们认为，通过预定义的技能来进行操作更加安全且结果更可预测。当然，它仍有可能点击错误的内容，但至少不会执行未知的恶意代码。\n\n### 代理\n目前有两个代理：用户代理（负责执行技能）和浏览器导航代理。浏览器导航代理包含了所有用于与网页浏览器交互的技能。\n\n### 技能库\nAgent-E 的核心能力在于其技能库，这是一个包含代理可执行的、定义明确的操作的存储库；目前主要是网络操作。这些技能被分为两大类：\n\n- **感知技能**：如 `get_dom_with_content_type` 和 `geturl` 等技能，帮助代理理解当前网页或浏览器的状态。\n- **动作技能**：允许代理与网络环境交互并进行操作的技能，例如 `click`、`enter text` 和 `open url`。\n\n每项技能的设计都力求尽可能地接近对话式交互，从而使与 LLM 的交互更加直观且容错性更高。例如，与其简单地返回一个布尔值，技能可能会用自然语言解释其执行过程中发生了什么，以便 LLM 更好地理解上下文，并在必要时调整方向。\n\n以下是我们已实现的技能：\n\n| 感知技能 | 动作技能 |\n|----------------|---------------|\n| `geturl` - 获取并返回当前 URL。 | `click` - 根据提供的 DOM 查询选择器，点击该元素。 |\n| `get_dom_with_content_type` - 根据指定的内容类型，获取当前页面的 HTML DOM。内容类型包括：\u003Cbr> - `text_only`：提取 HTML DOM 的纯文本内容，以文本形式返回。\u003Cbr> - `input_fields`：提取 DOM 中的交互元素（按钮、输入框、文本区域等），并以简洁的 JSON 对象形式返回。\u003Cbr> - `all_fields`：提取 DOM 中的所有字段，并以简洁的 JSON 对象形式返回。 | `enter_text_and_click` - 
优化方法，结合了输入文本和点击技能。这种优化适用于诸如在输入框中输入文本并点击搜索按钮的场景。由于 DOM 在此过程中不会发生变化，或者变化对操作影响不大，因此可以基于同一份 DOM 快照同时识别输入框和可点击按钮的选择器。 |\n| `get_user_input` - 为编排器提供一种机制，用于接收用户反馈，以消除歧义或澄清如何完成用户的请求。 | `bulk_enter_text` - 优化方法，封装了 `enter_text` 方法，以便一次性完成多个文本输入。 |\n|  | `enter_text` - 根据提供的 DOM 查询选择器，在指定的输入框中输入文本。 |\n|  | `openurl` - 在当前标签页或新标签页中打开给定的 URL。 |\n\n\n### DOM 精炼\n\nAgent-E 对待庞大 HTML DOM 的方式是系统而有条理的，坦率地说，这对于提高效率至关重要。我们引入了 DOM 精炼技术，将 DOM 压缩至仅与用户任务相关的元素。\n\n具体来说，就是将庞大的 DOM 转换为更易于理解的 JSON 快照。这不仅仅是减少数据量，更重要的是聚焦于相关性，只向 LLM 提供完成请求所需的信息。目前我们支持三种内容类型：\n\n- **纯文本**：当任务是信息检索，目标就是文本时使用。不带任何干扰信息。\n- **输入字段**：专注于需要用户交互的元素。目的是简化操作流程。\n- **全部内容**：完整的精炼 DOM，涵盖所有元素，适用于需要全面理解页面的任务。\n\n这一过程就像外科手术一样，仔细剔除冗余信息，同时保留代理运行所需的结构和内容。当然，在精炼过程中可能会丢失一些信息，但我们的目标是通过不断优化来尽量减少甚至消除这种损失。\n\n由于我们无法依赖所有网页开发者遵循最佳实践，比如为每个 HTML 元素添加唯一 ID，因此我们不得不在每个 DOM 元素中注入我们自己的属性 (`mmid`)。这样就可以引导 LLM 在生成的 DOM 查询中使用 `mmid` 属性。\n\n为了进一步减少 DOM 中的噪声，我们使用 DOM 可访问性树，而不是普通的 HTML DOM。可访问性树本身就是为了辅助屏幕阅读器而设计的，这比单纯的 HTML DOM 更符合 Web 自动化的目标。\n\nDOM 精炼过程仍在持续改进中。我们希望进一步优化这一流程，压缩 DOM 数据，从而实现更快、更经济、更准确的交互。\n\n## 测试与基准测试\n\nAgent-E 基于 [Web Arena](https:\u002F\u002Fgithub.com\u002Fweb-arena-x\u002Fwebarena) 在测试和评估方面的工作成果。`test` 目录下包含一个 `tasks` 子目录，其中存放着定义测试用例的 JSON 文件，这些文件同时也作为示例。\n\nAgent-E 运行在真实的网络环境中，这使得测试结果存在一定的变异性。因此，由于实时网站的变化，并非所有测试都能始终通过。我们的目标是在广泛的任务范围内确保 Agent-E 能够按预期工作，重点放在实际的 Web 自动化上。\n\n### 运行测试\n\n要运行完整的测试套件，可以使用以下命令：\n\n```bash\npython -m test.run_tests\n```\n\n### macOS 用户\n如果您在 macOS 上运行测试时遇到 `BlockingIOError` 错误，请使用无缓冲输出运行测试：\n```bash\npython -u -m test.run_tests\n```\n\n### 运行特定测试\n如果只想运行部分测试，可以修改最小和最大任务索引。这样将只运行测试配置文件中定义的一部分任务。\n\n示例：\n```bash\npython -m test.run_tests --min_task_index 0 --max_task_index 28 --test_results_id first_28_tests\n```\n该命令将运行从索引 0 到 27 的测试，并将结果标记为 `first_28_tests`。\n\n### run_tests 的参数\n以下是可用于自定义测试执行的其他参数：\n- `--min_task_index`：开始测试的最小任务索引（默认值为 0）。\n- `--max_task_index`：结束测试的最大任务索引，不包括该索引。\n- `--test_results_id`：测试结果的唯一标识符。如果不提供，则会使用时间戳。\n- `--test_config_file`：测试配置文件的路径。默认值为 
`test\u002Ftasks\u002Ftest.json`。\n- `--wait_time_non_headless`：非无头模式下两次测试之间的等待时间。\n- `--take_screenshots`：在每次操作后拍摄截图。例如：`--take_screenshots true`。默认值为 `false`。\n\n### 示例命令\n以下是使用这些参数的示例（macOS 用户需在命令前加 `-u` 参数）：\n```bash\npython -m test.run_tests --min_task_index 0 --max_task_index 28 --test_results_id first_28_tests\n```\n\n\n## 贡献\n感谢您对参与 Agent-E 项目的兴趣！我们欢迎社区的贡献，并非常感激您为改进该项目所做出的努力。\n\n### 如何贡献：\n\n1. **Fork 仓库**  \n   首先将 [Agent-E 仓库](https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E.git) Fork 到您的 GitHub 账户。\n\n2. **创建新分支**  \n   为您的功能或 bug 修复创建一个新分支：\n   ```bash\n   git checkout -b my-feature-branch\n   ```\n\n3. **进行更改**  \n   在您的新分支中实现更改。请确保遵循项目的编码风格和最佳实践。\n\n4. **运行测试**  \n   在提交 Pull Request 之前，请确保所有测试都通过，运行以下命令：\n   ```bash\n   python -m test.run_tests\n   ```\n\n5. **提交 Pull Request**  \n   当您的更改准备就绪后，将您的分支推送到您的 GitHub Fork，并向主仓库提交 Pull Request。请务必包含清晰的更改描述以及这些更改的必要性说明。\n\n### 贡献指南：\n- 请参阅 [贡献指南](CONTRIBUTING.md)，以获取更详细的贡献信息。\n- 请确保编写清晰简洁的提交信息。\n- 提交 Pull Request 时，请务必链接相关问题，并提供详细的更改描述。\n\n\n### 行为准则：\n请注意，我们有一份 [行为准则](CODE_OF_CONDUCT.md)，所有贡献者均应遵守。我们致力于为所有人提供一个友好且包容的环境。\n\n### 报告问题：\n如果您遇到 bug 或有功能请求，请在 [GitHub 问题跟踪器](https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Fissues) 中提交一个问题。请务必提供详细信息，以便我们能够有效地解决问题。\n\n### 加入讨论：\n我们鼓励您加入我们的 [Discord](https:\u002F\u002Fdiscord.gg\u002FwgNfmFuqJF) 社区，参与讨论、提问以及了解 Agent-E 的最新动态。\n\n\n## 文档生成\n\nAgent-E 使用 [Sphinx](https:\u002F\u002Fwww.sphinx-doc.org\u002Fen\u002Fmaster\u002F) 生成其文档。要本地贡献或生成文档，请按照以下步骤操作：\n\n### 前置条件\n\n在生成文档之前，请确保已安装开发依赖项。您可以使用以下命令进行安装：\n\n```bash\nuv pip install -r pyproject.toml --extra dev\n```\n\n### 生成文档的步骤\n1. 导航到项目根目录：\n```bash\ncd Agent-E\n```\n\n2. 如果 `docs` 目录不存在，请创建它：\n```bash\nmkdir docs\ncd docs\n```\n\n3. 使用 quickstart 命令初始化 Sphinx：\n```bash\nsphinx-quickstart\n```\n\n4. 
编辑 `docs\u002Fconf.py` 文件以配置 Sphinx。添加以下内容，将项目路径添加到 Sphinx 搜索路径，并启用扩展：\n```python\nimport os\nimport sys\nsys.path.insert(0, os.path.abspath('..'))\nextensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']\nhtml_theme = 'sphinx_rtd_theme'\n```\n\n5. 生成 API 文档：  \n   从项目根目录运行以下命令以生成 API 文档文件：\n   ```bash\n   sphinx-apidoc -o docs\u002Fsource .\n   ```\n\n6. 构建文档：  \n   生成 API 文档后，进入 `docs` 目录并构建 HTML 文档：\n   ```bash\n   sphinx-build -b html . _build\n   ```\n\n### 查看文档\n文档构建完成后，您可以通过浏览器打开 `_build` 目录中的 `index.html` 文件来查看生成的文档：\n```bash\nopen _build\u002Findex.html\n```\n\n这将在您的默认网页浏览器中显示生成的文档。\n\n## 加入社区\n\n我们诚挚地邀请您加入 Agent-E 社区！无论您是想提问、分享反馈，还是为项目贡献力量，我们都欢迎您的参与。\n\n### 参与交流：\n\n- **Discord**：在我们的 [Discord 社区](https:\u002F\u002Fdiscord.gg\u002FwgNfmFuqJF) 中与其他用户和开发者交流。欢迎您提出问题、分享经验，或与社区成员一起讨论潜在的功能。\n\n### 保持更新：\n\n关注项目的最新动态、功能更新和公告，积极参与社区互动。\n\n- **GitHub**：随时查看最新的问题和 Pull Request，并直接在 [GitHub](https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E) 上为代码库做出贡献。\n\n我们期待在社区中见到您！\n\n\n## 引用\n\n如果您在研究或项目中使用了本工作，请引用以下文章：\n\n```\n@misc{abuelsaad2024-agente,\n      title={Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems},\n      author={Tamer Abuelsaad and Deepak Akkil and Prasenjit Dey and Ashish Jagmohan and Aditya Vempaty and Ravi Kokku},\n      year={2024},\n      eprint={2407.13032},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.13032},\n}\n```\n您也可以在 [arXiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.13032) 上查看该论文。\n\n## 待办事项\n\n以下是 Agent-E 未来版本计划推出的一些功能和改进：\n\n- **健壮的下拉菜单处理**：改进对旅游预订平台（如 Booking.com、Expedia、Google Flights）等网站下拉菜单的处理。这些菜单中许多都包含动态内容，或将多个方面整合到同一个菜单中，例如在同一个菜单里同时选择出发日期和返回日期。\n- **任务缓存**：为已执行过的任务实现缓存功能，允许用户无需再次调用 LLM 即可重新运行任务。该缓存应具备智能性，仅对 DOM 定位器等元素进行缓存，而排除信息检索结果等项目，以提升 token 使用效率。\n- **开源与本地 LLM 兼容性**：调整 Agent-E 以兼容开源 LLM，理想情况下使其能够在本地运行。这可能需要简化提示词或重构技能，以适配这些模型的能力。\n- **多标签页与书签管理**：新增支持书签和多标签页使用的技能。目前，Agent-E 
每次只能处理一个标签页，一旦打开其他标签页就会丢失状态，用户需重新启动。\n- **PDF 文本处理**：增强对 PDF 中超出 LLM 上下文窗口限制的大量文本的管理能力，可通过适当重叠分块的方式保留上下文信息。\n- **浏览器导航代理历史记录优化**：通过开发修剪浏览器历史记录的方法来优化浏览器导航代理，从而减少 token 使用量并降低 LLM 的认知负担。\n- **收集用户偏好**：集成长期记忆（LTM）支持，以便随时间自动填充用户偏好，并提供让用户手动注入偏好选项的功能。这可能涉及在本地使用 FAISS 等向量数据库，或采用外部托管的向量数据库。\n- **DOM 精炼测试框架**：开发用于 DOM 精炼的测试框架，以便衡量精炼效果的准确性和性能提升。\n- **DOM 精炼优化**：持续提升 DOM 精炼的速度和效率。\n- **Shadow DOM 支持**：部分网站使用 Shadow DOM，支持从无障碍树中提取其内容。\n- **Google 套件兼容性**：增加对 Google Docs、Sheets、Slides 和 Gmail 的支持，这些应用通常使用传统 DOM 方法无法访问的 canvas 元素。\n- **跨平台安装程序**：创建兼容 Windows、Mac 以及理想的 Ubuntu 系统的安装程序，面向非技术用户。该安装程序应在应用程序内支持环境变量配置。\n- **执行过程视频录制**：按照议题 [#106](https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Fissues\u002F106) 的要求，实现执行过程的视频录制功能。\n- **DOM 精炼优化**：替换已弃用的 `snapshot()` 方法，用于 DOM 精炼。\n- **Token 优化**：研究如何减少 AutoGen 所需提示词和注释中使用的 token 数量。\n- ~~**动作验证**：为每个技能实现响应机制，通过 Mutation Observers 检测 DOM 变化，使 LLM 能够判断技能是否正确执行。~~\n- ~~**执行规划器**：开发规划代理，让 LLM 能够提前规划多个步骤，以加快执行速度。~~\n- ~~【嵌套聊天奏效】**群聊**：启用群聊功能，并将部分技能分配至不同代理。~~","# Agent-E 快速上手指南\n\nAgent-E 是一个基于 AG2 框架的智能体系统，旨在通过自然语言指令自动化浏览器操作（如填写表单、搜索商品、导航网页等）。\n\n## 环境准备\n\n*   **操作系统**: macOS, Linux, 或 Windows\n*   **Python 版本**: 推荐 Python 3.11 (3.10+ 亦可)\n*   **前置依赖**:\n    *   需要拥有大模型 API Key（如 OpenAI GPT-4-Turbo），或本地部署的开源模型环境（Ollama + LiteLLM）。\n    *   浏览器：本地安装的 Google Chrome 或使用 Playwright 自动安装的浏览器驱动。\n\n## 安装步骤\n\n你可以选择**脚本一键安装**或**手动安装**。\n\n### 方式一：脚本一键安装（推荐）\n\n在项目根目录下执行以下命令：\n\n**macOS \u002F Linux:**\n```bash\n.\u002Finstall.sh\n# 若需自动安装 Playwright 浏览器驱动，添加 -p 参数\n.\u002Finstall.sh -p\n```\n\n**Windows (PowerShell):**\n```powershell\n.\\win_install.ps1\n# 若需自动安装 Playwright 浏览器驱动，添加 -p 参数\n.\\win_install.ps1 -p\n```\n\n### 方式二：手动安装\n\n如果脚本执行失败，可按以下步骤手动配置：\n\n1.  **安装 uv 包管理器**\n    ```bash\n    # macOS\u002FLinux\n    curl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n\n    # Windows\n    powershell -c \"irm https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.ps1 | iex\"\n    ```\n\n2.  
**创建并激活虚拟环境**\n    ```bash\n    uv venv --python 3.11\n    source .venv\u002Fbin\u002Factivate  # Windows: .venv\\Scripts\\activate\n    ```\n\n3.  **安装项目依赖**\n    ```bash\n    uv pip compile pyproject.toml -o requirements.txt\n    uv pip install -r requirements.txt\n    ```\n\n4.  **(可选) 安装 Playwright 驱动**\n    如果不使用本地 Chrome，需安装 Playwright：\n    ```bash\n    playwright install\n    ```\n\n## 配置环境变量\n\n1.  复制示例配置文件：\n    ```bash\n    cp .env-example .env\n    ```\n\n2.  编辑 `.env` 文件，填入你的大模型配置。以 OpenAI 为例：\n    ```makefile\n    AUTOGEN_MODEL_NAME=gpt-4-turbo\n    AUTOGEN_MODEL_API_KEY=sk-your-openai-api-key\n    # 其他模型（如 Azure 或本地模型）需额外配置 BASE_URL 等参数\n    ```\n\n    > **提示**：若使用本地开源模型（如 Ollama），需先启动 LiteLLM 代理，并将 `AUTOGEN_MODEL_BASE_URL` 指向本地地址（如 `http:\u002F\u002F0.0.0.0:4000`），此时 Name 和 Key 可设为 `NotRequired`。\n\n## 基本使用\n\n### 1. 启动 Agent-E\n\n确保虚拟环境已激活，运行以下命令：\n\n**通用命令:**\n```bash\npython -m ae.main\n```\n\n**macOS 用户专用** (避免 `BlockingIOError`):\n```bash\npython -u -m ae.main\n```\n\n### 2. 执行任务\n\n程序启动后，浏览器界面会出现一个图标。点击该图标打开对话窗口，输入自然语言指令即可。\n\n**示例指令：**\n*   `open youtube and search for funny cat videos` (打开 YouTube 并搜索搞笑猫咪视频)\n*   `find iPhone 14 on Amazon and sort by best seller` (在亚马逊查找 iPhone 14 并按畅销榜排序)\n*   `go to espn, look for soccer news, report the names of the most recent soccer champs` (访问 ESPN 查找足球新闻，报告最近的冠军球队名称)\n\n### 3. 
高级用法：API 调用\n\nAgent-E 支持通过 FastAPI 接口进行程序化调用。\n\n**启动服务:**\n```bash\n# Linux\u002FmacOS\nuvicorn ae.server.api_routes:app --reload --loop asyncio\n\n# Windows (移除 --reload)\nuvicorn ae.server.api_routes:app --loop asyncio\n```\n\n**发送请求示例:**\n```bash\ncurl --location 'http:\u002F\u002F127.0.0.1:8000\u002Fexecute_task' \\\n--header 'Content-Type: application\u002Fjson' \\\n--data '{\n    \"command\": \"go to espn, look for soccer news, report the names of the most recent soccer champs\"\n}'\n```","某电商运营专员每天需跨多个竞品网站收集特定品类（如“降噪耳机”）的销量排名、价格波动及用户评价，以制定当日调价策略。\n\n### 没有 Agent-E 时\n- 人工逐个打开亚马逊、京东等网站，手动输入搜索词并切换排序条件，耗时且易出错。\n- 遇到动态加载的页面或复杂的筛选器（如“仅看有货”、“四星以上”），需要反复点击尝试，效率极低。\n- 将不同页面的数据复制粘贴到 Excel 时，格式经常混乱，需花费大量时间清洗和对齐数据。\n- 一旦网站界面微调或增加验证码，原有的自动化脚本（如 Selenium）立即失效，维护成本高昂。\n- 无法实时响应突发需求，例如老板突然要求对比“过去一小时”的价格变化，人工根本无法完成。\n\n### 使用 Agent-E 后\n- 只需输入自然语言指令（如“查找亚马逊和京东上销量前十的降噪耳机，按价格升序排列”），Agent-E 自动规划路径并完成搜索与排序。\n- 智能识别并操作复杂的网页交互元素，自动处理滚动加载、弹窗关闭及多级筛选，无需人工干预。\n- 直接提取结构化数据并整理成表格，自动对齐字段，省去了繁琐的复制粘贴和格式调整环节。\n- 基于 DOM 精炼与无障碍树解析，适应网页布局的细微变化，即使界面更新也能稳定执行任务，大幅降低维护负担。\n- 支持即时响应临时指令，几分钟内即可输出跨平台的实时对比报告，让决策跟上市场节奏。\n\nAgent-E 将原本需要数小时的人工浏览与数据搬运工作，转化为分钟级的自然语言交互，让运营人员从重复劳动中解放出来专注于策略分析。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FEmergenceAI_Agent-E_973ac4ae.png","EmergenceAI","Emergence AI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FEmergenceAI_81d85349.png","Intelligent agents, seamlessly orchestrated. 
At emergence, we’re advancing the science and development of AI agents.",null,"emergence_ai","https:\u002F\u002Fwww.emergence.ai","https:\u002F\u002Fgithub.com\u002FEmergenceAI",[84,88,92,96],{"name":85,"color":86,"percentage":87},"Python","#3572A5",89.3,{"name":89,"color":90,"percentage":91},"JavaScript","#f1e05a",8.8,{"name":93,"color":94,"percentage":95},"Shell","#89e051",1,{"name":97,"color":98,"percentage":99},"PowerShell","#012456",0.9,1233,184,"2026-04-03T21:59:04","MIT","Linux, macOS, Windows","未说明",{"notes":107,"python":108,"dependencies":109},"该工具主要依赖大语言模型 API（如 OpenAI GPT-4-Turbo），也可配置为使用本地开源模型（需安装 Ollama 和 LiteLLM，但官方注明未经过充分测试）。浏览器自动化可通过安装 Playwright 驱动或使用本地 Chrome 浏览器实现。若在 macOS 上遇到 BlockingIOError，需使用 'python -u' 参数运行。","3.11 (3.10+ 应该也可以)",[110,111,112,113,114,115,116],"uv","AG2 (AutoGen)","Playwright","FastAPI","Uvicorn","LiteLLM (可选，用于开源模型)","Ollama (可选，用于本地模型)",[15],"2026-03-27T02:49:30.150509","2026-04-06T06:44:16.165245",[121,126,131,136,141,146],{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},15898,"为什么运行后界面没有反应或与预期不同？","该项目基于 AutoGen 代理框架，要求所使用的 LLM 必须支持函数调用（Function Calling）。如果您使用的是开源模型（如本地运行的 Llama3 via Ollama）且未实现函数调用功能，Agent-E 将无法正常工作。请确保更换为支持函数调用的模型或 API。","https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Fissues\u002F76",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},15899,"Agent-E 与 Web Voyager 有什么区别？其工作原理是什么？","Agent-E 不进行页面的视觉分析（Visual Analysis），而是直接基于 DOM（文档对象模型）进行操作。这种方法的优势在于成本较低且速度较快，但可能在某些复杂场景下不如视觉分析直观。项目认为视觉分析和 DOM 解析各有优劣，未来可能会结合两种方法。更多技术细节可参考项目论文：https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.13032。","https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Fissues\u002F117",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},15900,"在 Windows 10 上运行时提示找不到 Chrome 浏览器怎么办？","这通常是因为配置中使用了 'chrome-beta' 而非标准的 'chrome'。Playwright 对两者的初始化方式不同，目前默认仅支持标准版 Chrome。解决方法有两种：\n1. 安装 Playwright 管理的 Chrome 实例，并确保不要将 browser_storage_dir 指向其他 Chrome 实例。\n2. 
修改代码：在 `playwright_manager.py` 文件中，找到 `playwright.chromium().launch` 这一行，将浏览器类型参数从 \"chrome\" 改为 \"chrome-beta\" 以匹配您安装的版本。","https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Fissues\u002F59",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},15901,"使用时频繁遇到 'Too Many Requests' (429 错误) 是什么原因？","这通常是因为使用的模型（如 Llama3）在 DOM 分析方面表现不佳，导致它无法准确识别 HTML 元素，从而生成错误的查询选择器并触发多次重试。这会迅速消耗 API 配额（例如 Groq 免费层限制为每分钟 6k tokens），进而触发速率限制。建议检查日志文件 `ae\u002Flog_files\u002Fchat_messages.json` 确认模型是否生成了无效的选择器，或尝试更换对 DOM 理解能力更强的模型。","https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Fissues\u002F30",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},15902,"如何为 Agent-E 添加 Firefox 浏览器支持？","目前 Agent-E 主要基于 Chromium 引擎。若要支持 Firefox，需要解决 MutationObserver 在 Firefox 中的兼容性问题。具体调试步骤包括：\n1. 确认页面导航时是否成功注入了 MutationObserver 脚本（检查浏览器控制台是否有 \"Mutation Observer loaded\" 日志）。\n2. 在执行技能时，监听数据流并检查过滤逻辑。\n3. 如果出现问题，需在 `dom_mutation_observor.py` 文件中增加日志输出，以排查 Firefox 环境下为何无法正常工作。","https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Fissues\u002F111",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},15903,"Agent 在执行任务时卡住无响应怎么办？","如果 Agent 在浏览器聊天窗口和终端中长时间无响应，可能是异步函数执行阻塞或模型陷入死循环。虽然该特定问题因缺乏复现步骤而关闭，但通用排查建议包括：检查网络连接、确认 LLM API 是否正常返回、以及查看日志文件中是否有未处理的异常堆栈信息。对于复杂的多页遍历任务（如统计所有职位数量），模型可能会因上下文过长或逻辑死锁而挂起。","https:\u002F\u002Fgithub.com\u002FEmergenceAI\u002FAgent-E\u002Fissues\u002F29",[]]