[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-hrithikkoduri--WebRover":3,"tool-hrithikkoduri--WebRover":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":101,"forks":102,"last_commit_at":103,"license":104,"difficulty_score":105,"env_os":106,"env_gpu":107,"env_ram":107,"env_deps":108,"category_tags":122,"github_topics":81,"view_count":10,"oss_zip_url":81,"oss_zip_packed_at":81,"status":16,"created_at":123,"updated_at":124,"faqs":125,"releases":161},972,"hrithikkoduri\u002FWebRover","WebRover","WebRover is an autonomous AI agent designed to interpret user input and execute actions by interacting with web elements to accomplish tasks or answer questions. 
It leverages advanced language models and web automation tools to navigate the web, gather information, and provide structured responses based on the user's needs.","WebRover 是一款智能网页自动化助手，能够理解你的指令并自动操作浏览器完成任务。无论是预订机票、填写表单，还是深入调研某个学术话题、整理多源信息生成报告，它都能胜任。\n\n这款工具解决了传统自动化脚本灵活性差、搜索引擎无法深度整合信息的痛点。WebRover 内置三种专业模式：日常任务模式帮你搞定重复性网页操作；研究模式适合快速收集整理资料；深度研究模式则能跨多网站验证信息、自动引用来源，甚至输出格式规范的学术文档。\n\n特别适合需要频繁处理网页信息的研究人员、市场分析师、内容创作者，以及希望减少重复劳动的知识工作者。开发者也可基于其开源架构（Python + FastAPI 后端，Next.js 前端）进行二次开发。\n\n技术亮点在于结合了 LangGraph 状态管理与 Playwright 浏览器自动化，支持 GPT-4o、Claude 等主流大模型，并配备 RAG 检索增强生成管道，让 AI 在浏览网页时能\"记住\"上下文、交叉验证信息，输出结果可直接导出至 Google Docs 或 PDF。","# WebRover\n\n\u003Cdiv align=\"center\">\n  \u003C!-- Backend -->\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3776AB?style=for-the-badge&logo=python&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FFastAPI-009688?style=for-the-badge&logo=fastapi&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOpenAI-412991?style=for-the-badge&logo=openai&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLangChain-121212?style=for-the-badge&logo=chainlink&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLangGraph-FF6B6B?style=for-the-badge&logo=graph&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPlaywright-2EAD33?style=for-the-badge&logo=playwright&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPillow-3776AB?style=for-the-badge&logo=python&logoColor=white\" \u002F>\n  \n  \u003C!-- Frontend -->\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNext.js-000000?style=for-the-badge&logo=next.js&logoColor=white\" \u002F>\n  \u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTailwind_CSS-38B2AC?style=for-the-badge&logo=tailwind-css&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FReact-61DAFB?style=for-the-badge&logo=react&logoColor=black\" \u002F>\n\n  \u003Ch3>Your AI Co-pilot for Web Navigation 🚀\u003C\u002Fh3>\n\n  \u003Cp align=\"center\">\n    \u003Cb>Autonomous Web Agent | Task Automation | Information Retrieval | Deep Research\u003C\u002Fb>\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n# Overview\nWebRover is an AI-powered web agent that combines autonomous browsing with advanced research capabilities. While maintaining its core ability to automate web tasks, version 2.0 introduces sophisticated research workflows including multi-source analysis, academic paper generation, and deep topic exploration. The system intelligently routes queries between task automation and research modes, providing a versatile tool for both quick actions and comprehensive research.\n\n# Motivation\nWhile traditional web automation tools excel at task execution, and search engines help with information retrieval, there's a growing need for tools that can handle both while specializing in deep research workflows. WebRover bridges this gap by offering task automation alongside intelligent research capabilities, with a particular focus on comprehensive information gathering, analysis, and synthesis. 
This dual-purpose approach aims to transform how we interact with web content, making both task execution and research more efficient and thorough.\n\n## Demo Video - Deep Research Agent\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F325c6c55-9384-4939-a912-3b1d13635799\n> Watch as the WebRover Deep Research Agent explores a topic, gathers information, and generates an academic paper.\n\n## Demo Video - Task Agent\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F95ae9afb-3fdf-47f8-857e-f6a1a0d94df5\n> Watch as the WebRover Task Agent navigates a website and performs a task.\n\n\n\n## Key Features\n\n### Agent Capabilities\n- Three specialized agents for different use cases (Task, Research, Deep Research)\n- Dynamic agent selection based on task complexity\n- Real-time agent state visualization\n- Streaming agent actions and thoughts\n\n### Browser Integration\n- Local browser instance for privacy and control\n- Multi-tab management\n- PDF document handling\n- Secure browsing sessions\n\n### User Interface\n- Modern chat interface with real-time updates\n- Interactive agent selection\n- Action streaming with visual feedback\n- Real-time page annotations and highlights\n\n### Output Options\n- Direct chat responses\n- One-click Google Docs export\n- PDF download functionality\n- Copy to clipboard support\n\n### Research Tools\n- Vector store for information retention\n- Multi-source verification\n- Academic paper generation\n- Reference management\n\n### Technical Features\n- State-of-the-art LLM integration (GPT-4o, o3-mini-high, Claude-3.5 sonnet)\n- RAG pipeline for enhanced responses\n- LangGraph for state management\n- Playwright for reliable web automation\n\n## Agent Types\n\n### 1. Task Agent\nA specialized automation agent for executing web-based tasks and workflows.\n- Custom action planning for multi-step tasks\n- Dynamic element interaction based on context\n- Real-time task progress monitoring\n\n### 2. 
Research Agent\nAn information gathering specialist with smart content processing.\n- Intelligent source selection and validation\n- Adaptive search refinement\n- Single-pass comprehensive information gathering\n\n### 3. Deep Research Agent (New! 🎉)\nAn advanced research agent that produces academic-quality content through systematic topic exploration.\n- Automatic topic decomposition and structured research\n- Independent subtopic exploration\n- Academic paper generation with proper citations\n- Cross-referenced bibliography compilation\n\n### Agent Architecture Diagrams\n\n#### Deep Research Agent Flow\n![Deep Research Agent Architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhrithikkoduri_WebRover_readme_ac62f44f932a.png)\n\n*Deep Research Agent's workflow for comprehensive research and content generation*\n\n### Research Agent Flow\n![Research Agent Architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhrithikkoduri_WebRover_readme_3a255b946796.png)\n\n*Research Agent's workflow for information gathering and synthesis*\n\n\n#### Task Agent Flow\n![Task Agent Architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhrithikkoduri_WebRover_readme_7b1565d48aac.png)\n\n*Task Agent's workflow for automating web interactions*\n\n\n\n## Architecture\n\nThe system is built on a modern tech stack with three distinct agent types, each powered by:\n\n1. **State Management**\n   - LangGraph for maintaining agent state\n   - Handles complex navigation flows and decision making\n   - Structured workflow management\n\n2. **Browser Automation**\n   - Playwright for reliable web interaction\n   - Custom element detection and interaction system\n   - Automated navigation and content extraction\n\n3. **Content Processing**\n   - RAG (Retrieval Augmented Generation) pipeline\n   - Vector store integration for efficient information storage\n   - PDF and webpage content extraction\n   - Automatic content structuring and organization\n\n4. 
**AI Decision Making**\n   - Multiple LLM integration (GPT-4, Claude)\n   - Context-aware navigation\n   - Self-review mechanisms\n   - Structured output generation\n\n## Setup Instructions\n\n### Backend Setup\n\n1. Clone the repository\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fhrithikkoduri\u002FWebRover.git\n   cd WebRover\n   cd backend\n   ```\n\n2. Install Poetry (if not already installed)\n\n   Mac\u002FLinux:\n   ```bash\n   curl -sSL https:\u002F\u002Finstall.python-poetry.org | python3 -\n   ```\n   Windows:\n   ```bash\n   (Invoke-WebRequest -Uri https:\u002F\u002Finstall.python-poetry.org -UseBasicParsing).Content | python -\n   ```\n\n3. Set Python version for Poetry\n   ```bash\n   poetry env use python3.12\n   ```\n\n4. Activate the Poetry shell:\n   For Unix\u002FLinux\u002FMacOS:\n   ```bash\n   poetry shell\n   # or manually\n   source $(poetry env info --path)\u002Fbin\u002Factivate\n   ```\n   For Windows:\n   ```bash\n   poetry shell\n   # or manually\n   & (poetry env info --path)\\Scripts\\activate\n   ```\n\n5. Install dependencies using Poetry:\n   ```bash\n   poetry install\n   ```\n\n6. Set up environment variables in `.env`:\n   ```bash\n   OPENAI_API_KEY=\"your_openai_api_key\"\n   LANGCHAIN_API_KEY=\"your_langchain_api_key\"\n   LANGCHAIN_TRACING_V2=\"true\"\n   LANGCHAIN_ENDPOINT=\"https:\u002F\u002Fapi.smith.langchain.com\"\n   LANGCHAIN_PROJECT=\"your_project_name\"\n   ANTHROPIC_API_KEY=\"your_anthropic_api_key\"\n   ```\n\n7. Run the backend:\n\n   Make sure you are in the backend folder\n\n    ```bash\n    uvicorn app.main:app --reload --port 8000 \n    ```\n\n   For Windows users:\n\n    ```bash\n    uvicorn app.main:app --port 8000\n    ```\n\n8. Access the API at `http:\u002F\u002Flocalhost:8000`\n\n### Frontend Setup\n\n1. Open a new terminal and make sure you are in the WebRover folder:\n   ```bash\n   cd frontend\n   ```\n\n2. Install dependencies:\n   ```bash\n   npm install\n   ```\n\n3. 
Run the frontend:\n   ```bash\n   npm run dev\n   ```\n\n4. Access the frontend at `http:\u002F\u002Flocalhost:3000`\n\nFor Mac users: \n\nTry opening http:\u002F\u002Flocalhost:3000 in the Safari browser. \nIf you face any issues connecting to the browser, open a terminal and run:\n\n```bash\npkill -9 \"Chrome\"\n```\nand try again.\n\nIf you still face issues, try changing the websocket port from 9222 to 9223 in the `webrover_browser.py` file in the `backend\u002FBrowser` folder.\n\n\n## Contributing\n\n1. Fork the repository\n2. Create your feature branch (`git checkout -b feature\u002FAmazingFeature`)\n3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the branch (`git push origin feature\u002FAmazingFeature`)\n5. Open a Pull Request\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n---\n\nMade with ❤️ by [@hrithikkoduri](https:\u002F\u002Fgithub.com\u002Fhrithikkoduri)\n","# WebRover\n\n\u003Cdiv align=\"center\">\n  \u003C!-- 后端 -->\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3776AB?style=for-the-badge&logo=python&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FFastAPI-009688?style=for-the-badge&logo=fastapi&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOpenAI-412991?style=for-the-badge&logo=openai&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLangChain-121212?style=for-the-badge&logo=chainlink&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLangGraph-FF6B6B?style=for-the-badge&logo=graph&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPlaywright-2EAD33?style=for-the-badge&logo=playwright&logoColor=white\" \u002F>\n  \u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPillow-3776AB?style=for-the-badge&logo=python&logoColor=white\" \u002F>\n  \n  \u003C!-- 前端 -->\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNext.js-000000?style=for-the-badge&logo=next.js&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTailwind_CSS-38B2AC?style=for-the-badge&logo=tailwind-css&logoColor=white\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FReact-61DAFB?style=for-the-badge&logo=react&logoColor=black\" \u002F>\n\n  \u003Ch3>您的网页导航 AI 副驾驶 🚀\u003C\u002Fh3>\n\n  \u003Cp align=\"center\">\n    \u003Cb>自主网页智能体（Autonomous Web Agent）| 任务自动化 | 信息检索 | 深度研究\u003C\u002Fb>\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n# 概述\n\nWebRover 是一款 AI 驱动的网页智能体，将自主浏览与高级研究能力相结合。在保持自动化网页任务核心能力的同时，2.0 版本引入了复杂的研究工作流，包括多源分析、学术论文生成和深度主题探索。系统能够在任务自动化和研究模式之间智能路由查询，为快速操作和全面研究提供多功能工具。\n\n# 动机\n\n传统的网页自动化工具擅长任务执行，搜索引擎有助于信息检索，但人们越来越需要能够同时处理两者并专注于深度研究工作流的工具。WebRover 通过提供任务自动化和智能研究能力来弥补这一差距，特别侧重于全面的信息收集、分析和综合。这种双重用途的方法旨在改变我们与网页内容的交互方式，使任务执行和研究都更加高效和彻底。\n\n## 演示视频 - 深度研究智能体\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F325c6c55-9384-4939-a912-3b1d13635799\n> 观看 WebRover 深度研究智能体如何探索主题、收集信息并生成学术论文。\n\n## 演示视频 - 任务智能体\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F95ae9afb-3fdf-47f8-857e-f6a1a0d94df5\n> 观看 WebRover 任务智能体如何浏览网站并执行任务。\n\n\n\n## 核心功能\n\n### 智能体能力\n- 三种专用智能体，适用于不同场景（任务、研究、深度研究）\n- 基于任务复杂度的动态智能体选择\n- 实时智能体状态可视化\n- 智能体操作和思考的流式传输\n\n### 浏览器集成\n- 本地浏览器实例，确保隐私和控制\n- 多标签页管理\n- PDF 文档处理\n- 安全浏览会话\n\n### 用户界面\n- 现代聊天界面，实时更新\n- 交互式智能体选择\n- 操作流式传输，带视觉反馈\n- 实时页面注释和高亮\n\n### 输出选项\n- 直接聊天回复\n- 一键导出 Google Docs\n- PDF 下载功能\n- 复制到剪贴板支持\n\n### 研究工具\n- 向量存储（Vector Store）用于信息保留\n- 多源验证\n- 学术论文生成\n- 参考文献管理\n\n### 技术特性\n- 最先进的 LLM（Large Language 
Model，大语言模型）集成（GPT-4o、o3-mini-high、Claude-3.5 sonnet）\n- RAG（Retrieval Augmented Generation，检索增强生成）管道用于增强回复\n- LangGraph 用于状态管理\n- Playwright 用于可靠的网页自动化\n\n## 智能体类型\n\n### 1. 任务智能体（Task Agent）\n用于执行网页任务和工作流的专用自动化智能体。\n- 多步骤任务的自定义行动计划\n- 基于上下文的动态元素交互\n- 实时任务进度监控\n\n### 2. 研究智能体（Research Agent）\n具有智能内容处理的信息收集专家。\n- 智能来源选择和验证\n- 自适应搜索优化\n- 单次遍历全面信息收集\n\n### 3. 深度研究智能体（Deep Research Agent）（新功能！🎉）\n通过系统化主题探索生成学术级内容的高级研究智能体。\n- 自动主题分解和结构化研究\n- 独立子主题探索\n- 带正确引用的学术论文生成\n- 交叉引用参考文献汇编\n\n### 智能体架构图\n\n#### 深度研究智能体流程\n![Deep Research Agent Architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhrithikkoduri_WebRover_readme_ac62f44f932a.png)\n\n*深度研究智能体用于全面研究和内容生成的工作流*\n\n### 研究智能体流程\n![Research Agent Architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhrithikkoduri_WebRover_readme_3a255b946796.png)\n\n*研究智能体用于信息收集和综合的工作流*\n\n\n#### 任务智能体流程\n![Task Agent Architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhrithikkoduri_WebRover_readme_7b1565d48aac.png)\n\n*任务智能体用于自动化网页交互的工作流*\n\n\n\n## 架构\n\n系统基于现代技术栈构建，包含三种不同的智能体类型，每种均由以下技术驱动：\n\n1. **状态管理**\n   - LangGraph 用于维护智能体状态\n   - 处理复杂的导航流程和决策制定\n   - 结构化工作流管理\n\n2. **浏览器自动化**\n   - Playwright 用于可靠的网页交互\n   - 自定义元素检测和交互系统\n   - 自动导航和内容提取\n\n3. **内容处理**\n   - RAG（Retrieval Augmented Generation，检索增强生成）管道\n   - 向量存储（Vector Store）集成，实现高效信息存储\n   - PDF 和网页内容提取\n   - 自动内容结构化和组织\n\n4. **AI 决策制定**\n   - 多 LLM（Large Language Model，大语言模型）集成（GPT-4、Claude）\n   - 上下文感知导航\n   - 自我审查机制\n   - 结构化输出生成\n\n## 设置说明\n\n### 后端设置\n\n1. 克隆仓库\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fhrithikkoduri18\u002FWebRover.git\n   cd WebRover\n   cd backend\n   ```\n\n2. 安装 Poetry（如果尚未安装）\n\n   Mac\u002FLinux:\n   ```bash\n   curl -sSL https:\u002F\u002Finstall.python-poetry.org | python3 -\n   ```\n   Windows:\n   ```bash\n   (Invoke-WebRequest -Uri https:\u002F\u002Finstall.python-poetry.org -UseBasicParsing).Content | python -\n   ```\n\n3. 
为 Poetry 设置 Python 版本\n   ```bash\n   poetry env use python3.12\n   ```\n\n4. 激活 Poetry shell：\n   对于 Unix\u002FLinux\u002FMacOS：\n   ```bash\n   poetry shell\n   # 或手动激活\n   source $(poetry env info --path)\u002Fbin\u002Factivate\n   ```\n   对于 Windows：\n   ```bash\n   poetry shell\n   # 或手动激活\n   & (poetry env info --path)\\Scripts\\activate\n   ```\n\n5. 使用 Poetry 安装依赖：\n   ```bash\n   poetry install\n   ```\n\n6. 在 `.env` 中设置环境变量：\n   ```bash\n   OPENAI_API_KEY=\"your_openai_api_key\"\n   LANGCHAIN_API_KEY=\"your_langchain_api_key\"\n   LANGCHAIN_TRACING_V2=\"true\"\n   LANGCHAIN_ENDPOINT=\"https:\u002F\u002Fapi.smith.langchain.com\"\n   LANGCHAIN_PROJECT=\"your_project_name\"\n   ANTHROPIC_API_KEY=\"your_anthropic_api_key\"\n   ```\n\n7. 运行后端：\n\n   确保你位于 backend 文件夹中\n\n    ```bash\n    uvicorn app.main:app --reload --port 8000 \n    ```\n\n   对于 Windows 用户：\n\n    ```bash\n    uvicorn app.main:app --port 8000\n    ```\n\n8. 在 `http:\u002F\u002Flocalhost:8000` 访问 API\n\n### 前端设置\n\n1. 打开新终端并确保你位于 WebRover 文件夹中：\n   ```bash\n   cd frontend\n   ```\n\n2. 安装依赖：\n   ```bash\n   npm install\n   ```\n\n3. 运行前端：\n   ```bash\n   npm run dev\n   ```\n\n4. 在 `http:\u002F\u002Flocalhost:3000` 访问前端\n\n对于 Mac 用户：\n\n尝试在 Safari 浏览器中运行 http:\u002F\u002Flocalhost:3000。\n如果在连接浏览器时遇到任何问题，打开终端并运行：\n\n```bash\npkill -9 \"Chrome\"\n```\n然后重试。\n\n如果仍然遇到问题，尝试将 `backend\u002FBrowser` 文件夹中 `webrover_browser.py` 文件的 WebSocket 端口从 9222 改为 9223。\n\n## 贡献\n\n1. Fork 本仓库\n2. 创建你的功能分支（`git checkout -b feature\u002FAmazingFeature`）\n3. 提交你的更改（`git commit -m 'Add some AmazingFeature'`）\n4. 推送到分支（`git push origin feature\u002FAmazingFeature`）\n5. 
打开 Pull Request\n\n## 许可证\n\n本项目采用 MIT 许可证 - 详情请参见 [LICENSE](LICENSE) 文件。\n\n---\n\n由 [@hrithikkoduri](https:\u002F\u002Fgithub.com\u002Fhrithikkoduri) 用 ❤️ 制作","# WebRover 快速上手指南\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: macOS \u002F Linux \u002F Windows\n- **Python**: 3.12\n- **Node.js**: 18+（推荐 20 LTS）\n- **浏览器**: Chrome \u002F Edge（用于自动化控制）\n\n### 前置依赖\n| 组件 | 用途 | 安装方式 |\n|:---|:---|:---|\n| Poetry | Python 依赖管理 | 见下方安装步骤 |\n| npm | 前端包管理 | 随 Node.js 安装 |\n| Git | 代码克隆 | 系统自带或官网下载 |\n\n---\n\n## 安装步骤\n\n### 1. 克隆仓库\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fhrithikkoduri18\u002FWebRover.git\ncd WebRover\n```\n\n### 2. 后端安装\n\n```bash\ncd backend\n\n# 安装 Poetry（如未安装）\n# macOS\u002FLinux:\ncurl -sSL https:\u002F\u002Finstall.python-poetry.org | python3 -\n# Windows:\n(Invoke-WebRequest -Uri https:\u002F\u002Finstall.python-poetry.org -UseBasicParsing).Content | python -\n\n# 配置 Python 环境\npoetry env use python3.12\n\n# 激活虚拟环境\npoetry shell\n\n# 安装依赖（国内用户可配置清华源加速）\npoetry install\n```\n\n**配置环境变量**：创建 `.env` 文件\n\n```bash\nOPENAI_API_KEY=\"your_openai_api_key\"\nANTHROPIC_API_KEY=\"your_anthropic_api_key\"\nLANGCHAIN_API_KEY=\"your_langchain_api_key\"\nLANGCHAIN_TRACING_V2=\"true\"\nLANGCHAIN_ENDPOINT=\"https:\u002F\u002Fapi.smith.langchain.com\"\nLANGCHAIN_PROJECT=\"webrover\"\n```\n\n**启动后端服务**：\n\n```bash\n# macOS\u002FLinux\nuvicorn app.main:app --reload --port 8000\n\n# Windows\nuvicorn app.main:app --port 8000\n```\n\n服务地址：`http:\u002F\u002Flocalhost:8000`\n\n### 3. 
前端安装\n\n```bash\n# 新开终端，进入前端目录\ncd frontend\n\n# 安装依赖（国内用户可使用 npm 淘宝镜像）\nnpm install\n\n# 启动开发服务器\nnpm run dev\n```\n\n服务地址：`http:\u002F\u002Flocalhost:3000`\n\n> **macOS 用户注意**：如遇浏览器连接问题，执行 `pkill -9 \"Chrome\"` 后重试；若仍失败，修改 `backend\u002FBrowser\u002Fwebrover_browser.py` 中的 WebSocket 端口从 `9222` 改为 `9223`。\n\n---\n\n## 基本使用\n\n### 访问界面\n打开浏览器访问 `http:\u002F\u002Flocalhost:3000`，进入 WebRover 主界面。\n\n### 选择智能体类型\n\n| 智能体 | 适用场景 | 示例指令 |\n|:---|:---|:---|\n| **Task Agent** | 网页自动化任务 | \"登录 GitHub 并创建一个新仓库\" |\n| **Research Agent** | 信息快速检索 | \"查询 2024 年 Python 最新特性\" |\n| **Deep Research Agent** | 深度学术研究 | \"撰写一份关于大语言模型幻觉现象的综述论文\" |\n\n### 快速示例\n\n**任务自动化示例**：\n```\n访问 https:\u002F\u002Fnews.ycombinator.com，提取首页前 10 条新闻的标题和链接\n```\n\n**深度研究示例**：\n```\n研究\"具身智能\"领域，分析当前技术瓶颈、主流方案和发展趋势，生成带参考文献的学术报告\n```\n\n### 导出成果\n- 点击 **Google Docs** 一键导出到云端文档\n- 点击 **PDF** 下载本地文件\n- 点击 **复制** 将内容粘贴到剪贴板","一位市场分析师需要在2小时内完成一份关于\"2024年全球新能源汽车电池技术发展趋势\"的竞品调研报告，用于下午的投资决策会议。\n\n### 没有 WebRover 时\n\n- 手动打开数十个网页，在宁德时代、比亚迪、特斯拉等官网和财报之间反复切换，浏览器标签页混乱不堪\n- 需要逐页阅读PDF技术白皮书，关键数据散落在不同文档中，整理时频繁遗漏重要信息\n- 搜索学术数据库时，被大量低质量内容干扰，难以快速筛选出高引用论文和权威行业分析\n- 手动复制粘贴数据到文档，格式混乱，最后1小时还在调整排版，根本没时间深入分析趋势\n\n### 使用 WebRover 后\n\n- 直接输入指令\"收集宁德时代、比亚迪、特斯拉2024年电池技术路线和产能数据\"，WebRover自动多标签并行浏览，实时标注关键信息位置\n- 自动下载并解析PDF技术文档，提取能量密度、成本、量产时间等核心参数，结构化存储到向量数据库随时调用\n- 智能识别任务复杂度，自动切换至深度研究模式，交叉验证多个信源，生成带引用来源的学术级分析\n- 一键导出至Google Docs，自动格式化图表和参考文献，分析师得以专注洞察提炼，提前30分钟完成报告\n\nWebRover将原本需要4小时的机械性信息搜集工作压缩至20分钟，让专业分析师真正把时间花在价值判断而非网页跳转上。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhrithikkoduri_WebRover_5f624922.png","hrithikkoduri","Hrithik Koduri","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fhrithikkoduri_2b85cb65.jpg","Product Engineer @FlatFilers (now Obvious)","Flatfile Inc.","Austin, 
TX",null,"https:\u002F\u002Fwww.hrithikkoduri.com","https:\u002F\u002Fgithub.com\u002Fhrithikkoduri",[85,89,93,97],{"name":86,"color":87,"percentage":88},"Python","#3572A5",52.2,{"name":90,"color":91,"percentage":92},"JavaScript","#f1e05a",28.9,{"name":94,"color":95,"percentage":96},"TypeScript","#3178c6",18.6,{"name":98,"color":99,"percentage":100},"CSS","#663399",0.4,993,172,"2026-04-04T14:46:54","MIT",4,"Linux, macOS, Windows","未说明",{"notes":109,"python":110,"dependencies":111},"需要同时运行后端(Python)和前端(Node.js)服务；后端使用Poetry管理依赖；需要配置OpenAI、Anthropic和LangChain的API密钥；Playwright需要安装浏览器驱动；macOS用户如遇浏览器连接问题需手动终止Chrome进程或修改websocket端口","3.12",[112,113,114,115,116,117,118,119,120,121],"fastapi","uvicorn","langchain","langgraph","openai","anthropic","playwright","pillow","next.js","typescript",[15,26,13],"2026-03-27T02:49:30.150509","2026-04-06T06:45:21.065044",[126,131,136,141,146,151,156],{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},4303,"Windows 上无法连接浏览器怎么办？","此问题通常由环境变量配置错误导致。请检查 `.env` 文件格式，确保使用 LangChain 官方标准格式（每行末尾无逗号）：\n\n```\nLANGSMITH_TRACING=true\nLANGSMITH_ENDPOINT=\"https:\u002F\u002Fapi.smith.langchain.com\"\nLANGSMITH_API_KEY=\"\u003Cyour-api-key>\"\nLANGSMITH_PROJECT=\"pr-juicy-restoration-70\"\nOPENAI_API_KEY=\"\u003Cyour-openai-api-key>\"\n```\n\n注意：从 README 复制粘贴时容易带入多余逗号，请手动删除。另外，如果浏览器窗口已打开但发送消息后卡在 \"thinking\" 状态，请检查：\n1. 运行 `curl https:\u002F\u002Fapi.openai.com\u002Fv1\u002Fmodels -H \"Authorization: Bearer your_gpt_key\"` 确认 API Key 有效且包含 \"gpt-4o\" 模型\n2. 
确保 Playwright 已正确安装：`playwright install`","https:\u002F\u002Fgithub.com\u002Fhrithikkoduri\u002FWebRover\u002Fissues\u002F6",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},4304,"Windows 上出现 \"_make_subprocess_transport\" 错误导致应用崩溃","这是 Python 3.13 在 Windows 上的已知兼容性问题。解决方法：移除 `--reload` 参数启动应用：\n\n```bash\nuvicorn app.main:app --port 8000\n```\n\n原命令 `uvicorn app.main:app --reload --port 8000` 中的热重载功能在 Windows 上会导致 `NotImplementedError` 异常。","https:\u002F\u002Fgithub.com\u002Fhrithikkoduri\u002FWebRover\u002Fissues\u002F2",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},4305,"Mac 上运行提示 \"No module named 'fastapi'\" 或模块导入错误","请按以下步骤解决：\n\n1. 确保在正确的目录运行命令：应在 `backend` 根目录执行，而非 `backend\u002Fapp` 目录\n2. 正确启动命令为：`uvicorn app.main:app --reload --port 8000`\n3. 如果仍报错，安装缺失依赖：`pip install langchain_openai`\n\n注意：在虚拟环境所在目录（root 或 backend）运行安装命令。","https:\u002F\u002Fgithub.com\u002Fhrithikkoduri\u002FWebRover\u002Fissues\u002F5",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},4306,"发送请求后显示 \"thinking\" 但无响应，卡在初始页面","此问题通常由以下原因导致：\n\n1. **API Key 配置错误**：检查 `.env` 文件中 OpenAI API Key 是否正确设置\n2. **API Key 余额不足**：确保 OpenAI 账户有足够余额用于 API 调用\n3. **欧盟用户需切换 LangChain 端点**：欧盟地区需将 `LANGSMITH_ENDPOINT` 改为欧盟专用 URL\n\n建议先验证 OpenAI 连接是否正常，可通过简单 API 调用测试。","https:\u002F\u002Fgithub.com\u002Fhrithikkoduri\u002FWebRover\u002Fissues\u002F7",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},4307,"README 中的 git clone 命令无法使用","README 中的仓库 URL 大小写有误。请使用正确的命令：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fhrithikkoduri\u002FWebRover.git\n```\n\n注意 `WebRover` 的 W 和 R 需要大写，而非 `webrover`。同时 `cd` 命令也需对应修改为 `cd WebRover`。","https:\u002F\u002Fgithub.com\u002Fhrithikkoduri\u002FWebRover\u002Fissues\u002F15",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},4308,"任务执行时陷入无限循环无法完成","当前版本在处理复杂任务（如查找特定信息）时可能不够健壮，会陷入循环。建议：\n\n1. 尝试使用新版 WebRover，已针对任务代理进行优化\n2. 简化任务描述，避免过于复杂的指令\n3. 
开发者正在改进对历史操作的解析和下一步决策能力\n\n如遇到特定任务循环，可向开发者反馈具体用例以便优化。","https:\u002F\u002Fgithub.com\u002Fhrithikkoduri\u002FWebRover\u002Fissues\u002F10",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},4309,"如何配置使用本地 Chrome 浏览器而非 Playwright 浏览器？","目前 Issue 中未提供具体配置步骤，但这是一个常见需求。建议关注项目更新或向维护者询问具体的本地浏览器配置方法。","https:\u002F\u002Fgithub.com\u002Fhrithikkoduri\u002FWebRover\u002Fissues\u002F3",[162],{"id":163,"version":164,"summary_zh":165,"released_at":166},103763,"v2.0","### Release Note for WebRover\r\n\r\n#### Key Differences between Version 1 and Main Branch:\r\n\r\n**New Features and Enhancements:**\r\n- **Deep Research Agent:** Introduced a sophisticated deep research agent for comprehensive web data extraction and processing.\r\n- **Task Agent Integration:** Added a task agent to facilitate automated task execution based on user inputs.\r\n- **Browser Automation:** Enhanced browser automation capabilities with better error handling and retry logic.\r\n- **Improved UI\u002FUX:** Significant updates to the user interface, including new chat layout, UI themes, and toggle modes for better user experience.\r\n- **Dependency Management:** Switched from `requirements.txt` to `poetry` for managing project dependencies.\r\n\r\n**Bug Fixes:**\r\n- **Click Retrying Issue:** Resolved issues related to click retrying with scroll up adjustments.\r\n- **Import and Requirement Fixes:** Addressed various import and requirement issues to ensure smoother project setup and execution.\r\n\r\n**Documentation Updates:**\r\n- **Readme Improvements:** Enhanced the README file with updated setup instructions and additional guidance for running the backend server on different operating systems.\r\n\r\nFor more detailed information on the changes, you can review the commit history:\r\n- [Commits in Version 1](https:\u002F\u002Fgithub.com\u002Fhrithikkoduri\u002FWebRover\u002Fcommits?sha=version1)\r\n- [Commits in Main 
Branch](https:\u002F\u002Fgithub.com\u002Fhrithikkoduri\u002FWebRover\u002Fcommits?sha=main)","2025-02-13T00:43:08"]