[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-handrew--browserpilot":3,"tool-handrew--browserpilot":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":86,"forks":87,"last_commit_at":88,"license":89,"difficulty_score":10,"env_os":90,"env_gpu":90,"env_ram":90,"env_deps":91,"category_tags":96,"github_topics":97,"view_count":10,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":104,"updated_at":105,"faqs":106,"releases":136},457,"handrew\u002Fbrowserpilot","browserpilot","Natural language browser automation","BrowserPilot 是一个通过自然语言控制浏览器自动化的工具，允许用户用接近日常语言的指令操作网页。它解决了传统自动化脚本需要编写复杂代码、且容易因网页结构变化而失效的问题。用户无需掌握编程细节，只需用英文描述操作步骤（如“搜索关键词后点击第一个链接”），工具即可自动执行。\n\n这一工具更适合开发者和研究人员使用，尤其是需要频繁进行网页测试、数据抓取或自动化流程构建的场景。虽然基础使用需安装 Python 环境和 Chromedriver，但其核心优势在于降低了自动化门槛——用户可专注于逻辑描述而非代码实现。例如，通过“点击可见文本框”“等待 10 秒”等指令，即可完成从搜索到页面跳转的完整流程。\n\n技术层面，BrowserPilot 结合了 GPT-3 的自然语言理解和 Selenium 的浏览器控制能力，能将指令转化为可执行代码。其独特之处在于支持函数封装和 YAML 配置文件，既可复用常用操作（如“登录流程”），又能通过分离指令与代码降低维护成本。对于希望快速验证自动化方案或探索 AI 辅助开发的用户，这是一个兼具灵活性与扩展性的实践工具。","# 🛫 BrowserPilot\n\nAn intelligent web browsing agent controlled by natural language.\n\n![demo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhandrew_browserpilot_readme_3e8e6862e899.gif)\n\nLanguage is the most natural 
interface through which humans give and receive instructions. Instead of writing bespoke automation or scraping code which is brittle to changes, creating and adding agents should be as simple as writing plain English.\n\n## 🏗️ Installation\n\n1. `pip install browserpilot`\n2. Download Chromedriver (latest stable release) from [here](https:\u002F\u002Fsites.google.com\u002Fchromium.org\u002Fdriver\u002F) and place it in the same folder as this file. Unzip. In Finder, right click the unpacked chromedriver and click \"Open\". This will remove the restrictive default permissions and allow Python to access it.\n3. Create an environment variable in your favorite manner setting OPENAI_API_KEY to your API key.\n\n\n## 🦭 Usage\n### 🗺️ API\nThe form factor is fairly simple (see below).\n\n```python\nfrom browserpilot.agents.gpt_selenium_agent import GPTSeleniumAgent\n\ninstructions = \"\"\"Go to Google.com\nFind all textareas.\nFind the first visible textarea.\nClick on the first visible textarea.\nType in \"buffalo buffalo buffalo buffalo buffalo\" and press enter.\nWait 2 seconds.\nFind all anchor elements that link to Wikipedia.\nClick on the first one.\nWait for 10 seconds.\"\"\"\n\nagent = GPTSeleniumAgent(instructions, \"\u002Fpath\u002Fto\u002Fchromedriver\")\nagent.run()\n```\n\nThe harder (but funner) part is writing the natural language prompts.\n\n\n### 📑 Writing Prompts\n\nIt helps if you are familiar with how Selenium works and programming in general. This is because this project uses GPT-3 to translate natural language into code, so you should be as precise as you can. In this way, it is more like writing code with Copilot than it is talking to a friend; for instance, it helps to refer to things as `input`s or `textareas` (vs. \"text box\" \"search box\") or \"button which says 'Log in'\" rather than \"the login button\". Sometimes, it will also not pick up on specific words that are important, so it helps to break them out into separate lines. 
Instead of \"find all the visible textareas\", you do \"find all the textareas\" and then \"find the first visible textarea\".\n\nYou can look at some examples in `prompts\u002Fexamples` to get started.\n\nCreate \"functions\" by enclosing instructions in `BEGIN_FUNCTION func_name` and `END_FUNCTION`, and then call them by starting a line with `RUN_FUNCTION` or `INJECT_FUNCTION`. Below is an example: \n\n```\nBEGIN_FUNCTION search_buffalo\nGo to Google.com\nFind all textareas.\nFind the first visible textarea.\nClick on the first visible textarea.\nType in \"buffalo buffalo buffalo buffalo buffalo\" and press enter.\nWait 2 seconds.\nGet all anchors on the page that contain the word \"buffalo\".\nClick on the first link.\nEND_FUNCTION\n\nRUN_FUNCTION search_buffalo\nWait for 10 seconds.\n```\n\nYou may also choose to create a yaml or json file with a list of instructions. In general, it needs to have an `instructions` field, and optionally a `compiled` field which has the processed code.\n\nSee [buffalo wikipedia example](prompts\u002Fexamples\u002Fbuffalo_wikipedia.yaml).\n\nYou may pass an `instruction_output_file` to the constructor of GPTSeleniumAgent which will output a yaml file with the compiled instructions from GPT-3, to avoid having to pay API costs. \n\n## ✋🏼 Contributing\nThere are two ways I envision folks contributing.\n\n- **Adding to the Prompt Library**: Read \"Writing Prompts\" above and simply make a pull request to add something to `prompts\u002F`! At some point, I will figure out a protocol for folder naming conventions and the evaluation of submitted code (for security, accuracy, etc). This would be a particularly attractive option for those who aren't as familiar with coding.\n- **Contributing code**: I am happy to take suggestions! The main way to add to the repository is to extend the capabilities of the agent, or to create new agents entirely. 
The best way to do this is to familiarize yourself with \"Architecture and Prompt Patterns\" below, and to (a) expand the list of capabilities in the base prompt in `InstructionCompiler` and (b) write the corresponding method in `GPTSeleniumAgent`. \n\n## ⛩️ Architecture and Prompt Patterns\n\nThis repo was inspired by the work of [Yihui He](https:\u002F\u002Fgithub.com\u002Fyihui-he\u002FActGPT), [Adept.ai](https:\u002F\u002Fadept.ai\u002F), and [Nat Friedman](https:\u002F\u002Fgithub.com\u002Fnat\u002Fnatbot). In particular, the basic abstractions and prompts used were built off of Yihui's hackathon code. The idea to preprocess HTML and use GPT-3 to intelligently pick elements out is from Nat. \n\n- The prompts used can be found in [instruction compiler](browserpilot\u002Fagents\u002Fcompilers\u002Finstruction_compiler.py). The base prompt describes in plain English a set of actions that the browsing agent can take, some general conventions on how to write code, and some constraints on its behavior. **These actions correspond one-for-one with methods in `GPTSeleniumAgent`**. Those actions, to-date, include:\n    - `env.driver`, the Selenium webdriver.\n    - `env.find_elements(by='id', value=None)` finds and returns a list of elements.\n    - `env.find_element(by='id', value=None)` is similar to `env.find_elements()` except it only returns the first element.\n    - `env.find_nearest(e, xpath)` can be used to locate an element near another one.\n    - `env.send_keys(element, text)` sends `text` to element.\n    - `env.get(url)` goes to url.\n    - `env.click(element)` clicks the element.\n    - `env.wait(seconds)` waits for `seconds` seconds.\n    - `env.scroll(direction, iframe=None)` scrolls the page. Will switch to `iframe` if given. `direction` can be \"up\", \"down\", \"left\", or \"right\". \n    - `env.get_llm_response(text)` asks AI about a string `text`.\n    - `env.retrieve_information(prompt)` returns a string, information from a page given a prompt. 
Use prompt=\"Summarize:\" for summaries. Invoked with commands like \"retrieve\", \"find in the page\", or similar.\n    - `env.ask_llm_to_find_element(description)` asks AI to find an element that matches the description.\n    - `env.query_memory(prompt)` asks AI with a prompt to query its memory (an embeddings index) of the web pages it has browsed. Invoked with \"Query memory\".\n    - `env.save(text, filename)` saves the string `text` to a file `filename`.\n    - `env.get_text_from_page()` returns the free text from the page.\n- The rest of the code is basically middleware which exposes a Selenium object to GPT-3. **For each action mentioned in the base prompt, there is a corresponding method in GPTSeleniumAgent.**\n    - An `InstructionCompiler` is used to parse user input into semantically cogent blocks of actions.\n- The agent has a `Memory` which enables it to synthesize what it sees.\n\n\n## 🎉 Finished\n0.2.51 \n- Thanks to @rapatel0, you can now run BrowserPilot with Selenium Grid, remotely.\n\n0.2.42 - 0.2.44\n- Small changes in `examples.py` and dependencies.\n- Refactor for the big Llama Index upgrade.\n\n0.2.38 - 0.2.41\n- Change `enable_memory` to `memory_file` to enable more control over what the memory is called. Allow users to load memory as well.\n- Make `get_text_from_page` simpler.\n\n\n0.2.26 - 0.2.37\n- Bit the bullet and switched the default model to gpt-3.5-turbo. Will be much cheaper!\n- Also fixed retries. I wasn't actually getting the retry action!\n- Fiddle with the prompt a bit for GPT-3.5.\n- Concerningly, gpt-3.5-turbo keeps trying to import modules. I manually remove lines that try to import modules.\n- Compatibility with new Llama Index updates.\n\n0.2.14 - 0.2.25\n- Add options to avoid website detection of bots.\n- Add more OpenAI API error handling.\n- Improve stack trace prompt and a few other prompts.\n- Add \"displayed in viewport\" capability. 
\n- Make ```from browserpilot.agents import \u003Cagent>``` possible.\n- Make `find_element` and `find_elements` search only for displayed elements.\n- Save memory once finished running.\n- Add scroll option to top and bottom.\n\n0.2.10 - 0.2.13\n- Add more error handling for OpenAI exceptions.\n- Change all the embedding querying to use ChatGPT.\n- Get rid of the nltk dependency! Good riddance.\n- Expand the max token window for asking the LLM a question on a web page. \n- Fix an issue with the Memory module which tried to access OpenAI API key before it's initialized. Change the prompt slightly.\n- ❗️ Enable ChatGPT use with GPT Index, so that we can use GPT3.5-turbo to query embeddings.\n\n0.2.7 - 0.2.9\n- Vacillating on the default model. ChatGPT does not work well for writing code, as it takes too many freedoms with what it returns.\n- Also, I tried condensing the prompt a bit, which is growing a bit long.\n\n0.2.4 - 0.2.6\n- Give the agent a memory (still very experimental and not very good). Add capability to screenshot elements.\n- Bug fixes around versioning and prompt injection.\n\n0.2.3\n- Move `chrome_options` to somewhere more sensible. Just keep the yaml clean, you know?\n\n0.2.2\n- ChatGPT support.\n\n0.2.1\n- Dict support for loading instructions.\n\n0.2.0\n- 🎬 a `Studio` CLI which helps iteratively test prompts!\n- JSON loading.\n- Basic iframe support.\n\n\u003C0.2.0\n- GPTSeleniumAgent should be able to load prompts and cached successful runs in the form of yaml files. InstructionCompiler should be able to save instructions to yaml.\n- 💭 Add a summarization capability to the agent.\n- Demo\u002Ftest something where it has to ask the LLM to synthesize something it reads online.\n- 🚨 Figured out how to feed the content of the HTML page into the GPT-3 context window and have it reliably pick out specific elements from it, that would be great!\n\n## 🚨 Disclaimer \n\nThis package runs code output from the OpenAI API in Python using `exec`. 
**This is not considered a safe convention**. Accordingly, you should be extra careful when using this package. The standard disclaimer follows.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n\n","# 🛫 BrowserPilot\n\n一个通过自然语言控制的智能网页浏览代理。\n\n![demo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhandrew_browserpilot_readme_3e8e6862e899.gif)\n\n语言是人类给予和接收指令最自然的界面。与其编写易受变化影响的定制自动化或抓取代码，创建和添加代理应该像编写普通英文一样简单。\n\n## 🏗️ 安装\n\n1. `pip install browserpilot`\n2. 从 [这里](https:\u002F\u002Fsites.google.com\u002Fchromium.org\u002Fdriver\u002F) 下载最新稳定版的 Chromedriver（Chrome浏览器的驱动程序），将其放在与本文件相同的文件夹中。解压后，在Finder中右键点击解压后的chromedriver并点击\"Open\"。这将移除默认的限制性权限，允许Python访问它。\n3. 
使用你喜欢的方式创建环境变量，将 OPENAI_API_KEY 设置为你的 API 密钥（API密钥）。\n\n## 🦭 使用\n### 🗺️ API\n形式非常简单（见下文）。\n\n```python\nfrom browserpilot.agents.gpt_selenium_agent import GPTSeleniumAgent\n\ninstructions = \"\"\"Go to Google.com\nFind all textareas.\nFind the first visible textarea.\nClick on the first visible textarea.\nType in \"buffalo buffalo buffalo buffalo buffalo\" and press enter.\nWait 2 seconds.\nFind all anchor elements that link to Wikipedia.\nClick on the first one.\nWait for 10 seconds.\"\"\"\n\nagent = GPTSeleniumAgent(instructions, \"\u002Fpath\u002Fto\u002Fchromedriver\")\nagent.run()\n```\n\n更具挑战性（但也更有趣）的部分是编写自然语言提示。\n\n### 📑 编写提示\n\n如果你熟悉 Selenium 的工作原理和编程基础会更有帮助。这是因为该项目使用 GPT-3 将自然语言转换为代码，因此你需要尽可能精确。这种方式更像是用 Copilot 编写代码，而不是和朋友交谈；例如，建议使用 `input` 或 `textareas`（而非\"text box\" \"search box\"）或\"button which says 'Log in'\"（而非\"the login button\"）。有时，它可能无法识别某些重要词语，因此最好将它们拆分成单独的行。例如，不要写\"find all the visible textareas\"，而是分两步：\"find all the textareas\" 然后 \"find the first visible textarea\"。\n\n你可以查看 `prompts\u002Fexamples` 中的一些示例来入门。\n\n通过在 `BEGIN_FUNCTION func_name` 和 `END_FUNCTION` 中包裹指令来创建\"函数\"，然后通过以 `RUN_FUNCTION` 或 `INJECT_FUNCTION` 开头的行来调用它们。以下是示例：\n\n```\nBEGIN_FUNCTION search_buffalo\nGo to Google.com\nFind all textareas.\nFind the first visible textarea.\nClick on the first visible textarea.\nType in \"buffalo buffalo buffalo buffalo buffalo\" and press enter.\nWait 2 seconds.\nGet all anchors on the page that contain the word \"buffalo\".\nClick on the first link.\nEND_FUNCTION\n\nRUN_FUNCTION search_buffalo\nWait for 10 seconds.\n```\n\n你也可以选择创建一个包含指令列表的 yaml 或 json 文件。通常需要包含一个 `instructions` 字段，以及一个可选的 `compiled` 字段，用于存储处理后的代码。\n\n参见 [buffalo wikipedia 示例](prompts\u002Fexamples\u002Fbuffalo_wikipedia.yaml)。\n\n你可以将 `instruction_output_file` 传递给 GPTSeleniumAgent 的构造函数，这将输出一个包含 GPT-3 编译指令的 yaml 文件，避免支付 API 费用。\n\n## ✋🏼 贡献\n我设想了两种贡献方式。\n\n- **添加到提示库**：阅读上方的\"编写提示\"部分，直接提交 pull request 添加内容到 
`prompts\u002F`！在某个时间点，我会制定文件夹命名规范和提交代码的评估协议（出于安全、准确性等方面的考虑）。这对不太熟悉编程的人尤其有吸引力。\n- **贡献代码**：我很乐意接受建议！扩展代理功能或创建全新代理是主要的贡献方式。最佳方法是熟悉下方的\"架构和提示模式\"，然后（a）扩展 `InstructionCompiler` 中基础提示的功能列表，以及（b）在 `GPTSeleniumAgent` 中编写对应的方法。\n\n## ⛩️ 架构和提示模式\n\n这个仓库受到 [Yihui He](https:\u002F\u002Fgithub.com\u002Fyihui-he\u002FActGPT)、[Adept.ai](https:\u002F\u002Fadept.ai\u002F) 和 [Nat Friedman](https:\u002F\u002Fgithub.com\u002Fnat\u002Fnatbot) 的工作的启发。特别是，基本的抽象和提示模式是基于 Yihui 的黑客马拉松代码构建的。预处理 HTML 并使用 GPT-3 智能选择元素的想法来自 Nat。\n\n- 使用的提示可以在 [instruction compiler](browserpilot\u002Fagents\u002Fcompilers\u002Finstruction_compiler.py) 中找到。基础提示用普通英语描述了浏览代理可以执行的一系列操作，一些通用的编码规范，以及对其行为的一些约束。**这些操作与 `GPTSeleniumAgent` 中的方法一一对应**。到目前为止，这些操作包括：\n    - `env.driver`，Selenium 的网络驱动程序。\n    - `env.find_elements(by='id', value=None)` 查找并返回元素列表。\n    - `env.find_element(by='id', value=None)` 与 `env.find_elements()` 类似，但只返回第一个元素。\n    - `env.find_nearest(e, xpath)` 可用于定位另一个元素附近的元素。\n    - `env.send_keys(element, text)` 向元素发送 `text`。\n    - `env.get(url)` 前往 url。\n    - `env.click(element)` 点击元素。\n    - `env.wait(seconds)` 等待 `seconds` 秒。\n    - `env.scroll(direction, iframe=None)` 滚动页面。如果提供了 `iframe`，则会切换到该 iframe。`direction` 可以是 \"up\"、\"down\"、\"left\" 或 \"right\"。\n    - `env.get_llm_response(text)` 向 AI 询问字符串 `text`。\n    - `env.retrieve_information(prompt)` 返回一个字符串，根据提示从页面中检索信息。使用 prompt=\"Summarize:\" 来获取摘要。通过 \"retrieve\"、\"find in the page\" 或类似的命令调用。\n    - `env.ask_llm_to_find_element(description)` 请 AI 查找匹配描述的元素。\n    - `env.query_memory(prompt)` 通过提示向 AI 查询其记忆（嵌入索引）中已浏览的网页。通过 \"Query memory\" 调用。\n    - `env.save(text, filename)` 将字符串 `text` 保存到文件 `filename`。\n    - `env.get_text_from_page()` 返回页面中的自由文本。\n- 其余代码基本上是中间件，将 Selenium 对象暴露给 GPT-3。**对于基础提示中提到的每个操作，GPTSeleniumAgent 中都有一个对应的方法**。\n    - 使用 `InstructionCompiler` 将用户输入解析为语义连贯的操作块。\n- 代理有一个 `Memory`，使其能够综合它所看到的内容。\n\n## 🎉 完成\n0.2.51 \n- 感谢 @rapatel0，现在可以通过 Selenium Grid（分布式浏览器测试框架）远程运行 BrowserPilot。\n\n0.2.42 - 0.2.44\n- 对 `examples.py` 
示例代码和依赖项进行了小幅修改。\n- 为 Llama Index（机器学习索引库）重大升级进行代码重构。\n\n0.2.38 - 0.2.41\n- 将 `enable_memory` 改为 `memory_file` 以实现对内存文件名的更精细控制，同时支持用户加载内存。\n- 简化了 `get_text_from_page` 方法。\n\n0.2.26 - 0.2.37\n- 终于决定将默认模型切换为 gpt-3.5-turbo（成本将大幅降低！）\n- 修复了重试机制的问题，之前实际上无法触发重试操作。\n- 针对 GPT-3.5 优化了提示词。\n- 令人担忧的是，gpt-3.5-turbo 模型持续尝试导入模块，我手动删除了相关导入语句。\n- 兼容新的 Llama Index 更新。\n\n0.2.14 - 0.2.25\n- 添加防反爬虫检测选项。\n- 增强 OpenAI API 错误处理能力。\n- 优化堆栈跟踪提示和其他几个提示模板。\n- 新增 \"显示在视口内\" 功能。\n- 支持通过 ```from browserpilot.agents import \u003Cagent>``` 导入代理。\n- 使 `find_element` 和 `find_elements` 仅搜索可见元素。\n- 运行结束后保存内存。\n- 添加页面顶部和底部滚动选项。\n\n0.2.10 - 0.2.13\n- 增加更多 OpenAI 异常处理。\n- 将所有嵌入查询改为使用 ChatGPT。\n- 移除了 nltk 依赖项（终于摆脱了）。\n- 扩大了向 LLM 提问时的最大 token 窗口。\n- 修复内存模块在初始化前访问 OpenAI API 密钥的问题，稍作提示词调整。\n- ❗️ 启用 ChatGPT 与 GPT Index 集成，使用 GPT3.5-turbo 查询嵌入向量。\n\n0.2.7 -  0.2.9\n- 在默认模型选择上反复权衡。ChatGPT 生成代码效果不佳，因为它对返回结果的自由度太高。\n- 同时尝试精简提示词长度，当前提示词已略显冗长。\n\n0.2.4 - 0.2.6\n- 为代理添加实验性内存功能（仍处于早期阶段），新增元素截图能力。\n- 修复版本管理和提示注入相关的 bug。\n\n0.2.3\n- 将 `chrome_options` 移动到更合理的位置，保持 yaml 文件整洁。\n\n0.2.2\n- 添加 ChatGPT 支持。\n\n0.2.1\n- 支持通过字典加载指令。\n\n0.2.0\n- 🎬 新增 `Studio` CLI 工具，支持迭代测试提示词！\n- 支持 JSON 加载。\n- 基础 iframe 支持。\n\n\u003C0.2.0\n- GPTSeleniumAgent 应能通过 yaml 文件加载提示词和缓存的成功运行记录。InstructionCompiler 应能将指令保存为 yaml。\n- 💭 为代理添加摘要功能。\n- 演示\u002F测试需要让 LLM 合成在线阅读内容的场景。\n- 🚨 成功实现了将 HTML 页面内容输入 GPT-3 上下文窗口并可靠提取特定元素的功能！\n\n## 🚨 免责声明 \n\n该包通过 `exec` 执行来自 OpenAI API 的代码输出。**这种做法存在安全隐患**。因此使用本包时需格外谨慎。标准免责声明如下：\n\n软件按\"原样\"提供，不提供任何形式的明示或暗示担保，包括但不限于适销性、特定用途适用性和非侵权性担保。在任何情况下，作者或版权持有者均不对因软件使用或其他交易产生的索赔、损害或其他责任负责，无论该责任源于合同、侵权或其他法律行为。","# BrowserPilot 快速上手指南\n\n## 🧰 环境准备\n- **系统要求**：Python 3.8+\n- **前置依赖**：\n  - `pip`（Python包管理工具）\n  - [Chromedriver](https:\u002F\u002Fsites.google.com\u002Fchromium.org\u002Fdriver\u002F)（需与Chrome浏览器版本匹配）\n  - OpenAI API密钥（[申请地址](https:\u002F\u002Fplatform.openai.com\u002Faccount\u002Fapi-keys)）\n\n---\n\n## 📦 安装步骤\n1. 
**安装BrowserPilot**  \n   ```bash\n   pip install browserpilot\n   ```\n   *（可选：国内用户可使用镜像加速）*\n   ```bash\n   pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple browserpilot\n   ```\n\n2. **配置Chromedriver**  \n   - 下载最新稳定版Chromedriver并解压  \n   - 将`chromedriver`文件放入项目目录  \n   - macOS用户需右键点击`chromedriver` → \"打开\"以解除权限限制\n\n3. **设置环境变量**  \n   ```bash\n   export OPENAI_API_KEY=your_api_key_here\n   ```\n\n---\n\n## ▶️ 基本使用\n### 最简代码示例\n```python\nfrom browserpilot.agents.gpt_selenium_agent import GPTSeleniumAgent\n\ninstructions = \"\"\"Go to Google.com\nFind all textareas.\nFind the first visible textarea.\nClick on the first visible textarea.\nType in \"buffalo buffalo buffalo buffalo buffalo\" and press enter.\"\"\"\n\nagent = GPTSeleniumAgent(instructions, \"\u002Fpath\u002Fto\u002Fchromedriver\")\nagent.run()\n```\n\n### 关键说明\n- **指令格式**：使用自然语言描述操作流程（类似Selenium脚本）\n- **路径替换**：将`\u002Fpath\u002Fto\u002Fchromedriver`替换为实际解压路径\n- **API调用**：首次运行会通过OpenAI API编译指令为代码，后续可通过`instruction_output_file`参数缓存结果\n\n---\n\n> ⚠️ 安全提示：该工具会执行来自OpenAI API的代码，建议仅在受控环境中使用","某跨境电商运营团队每周需监控50+海外电商平台上同类产品的价格波动和促销信息，以便调整自身定价策略。团队成员需手动访问各平台搜索商品、记录价格、截图促销页面，耗时约8小时\u002F周且易遗漏关键数据。\n\n### 没有 browserpilot 时\n- **手动操作效率低下**：需逐个平台打开网页，重复搜索相同商品，平均每个平台耗时12分钟\n- **代码维护成本高**：现有Python+BeautifulSoup脚本因网页结构频繁变动，每周需2次代码调试\n- **非技术人员无法参与**：市场分析师需依赖开发人员编写爬虫，需求响应周期长达3天\n- **任务逻辑复杂难实现**：涉及多步骤交互（如登录账户、切换货币单位、展开详情页）时，需编写复杂代码\n- **无法快速调整策略**：临时增加监控品类时，需重新开发整套自动化流程\n\n### 使用 browserpilot 后\n- **自然语言指令秒级执行**：通过\"访问Amazon.com → 搜索'wireless headphones' → 提取所有价格和折扣标签\"等指令，单平台监控时间缩短至90秒\n- **抗网页结构变化**：当Target网站改版搜索框位置时，原有指令无需修改仍能正确定位元素\n- **业务人员自主操作**：市场分析师可直接编写\"登录Walmart商户后台 → 导出上周销售报表 → 截图库存预警区域\"等指令\n- **复杂流程可视化编排**：通过函数封装实现\"货币切换→商品筛选→价格比较\"的标准化操作流\n- **策略调整即时生效**：新增监控品类时，仅需在指令库中添加对应自然语言描述即可\n\n核心价值：将专业级浏览器自动化能力转化为自然语言交互，使业务人员能像写操作手册一样完成数据采集任务，使技术团队从重复性脚本开发中解放出来专注核心系统建设。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhandrew_browserpilot_ffa5e206.png","handrew","Andrew 
Han","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fhandrew_1d242464.jpg",null,"www.andrewhhan.com","https:\u002F\u002Fgithub.com\u002Fhandrew",[82],{"name":83,"color":84,"percentage":85},"Python","#3572A5",100,630,76,"2026-03-21T01:18:29","MIT","未说明",{"notes":92,"python":90,"dependencies":93},"需手动下载Chromedriver并配置执行权限；需设置OPENAI_API_KEY环境变量；依赖OpenAI API调用，可能产生费用",[94,95],"selenium","openai",[15,26],[98,99,100,101,102,94,103],"browser-automation","generative-ai","gpt-3","robotic-process-automation","rpa","selenium-python","2026-03-27T02:49:30.150509","2026-04-06T05:16:43.049071",[107,112,116,121,126,131],{"id":108,"question_zh":109,"answer_zh":110,"source_url":111},1778,"如何解决 llama_index 的 Document 导入错误？","该错误通常由 llama_index 版本不兼容导致。请确保安装的版本包含 Document 类。可尝试升级 llama_index 或调整代码引用方式。若问题持续，检查依赖管理工具（如 poetry）的锁定版本。","https:\u002F\u002Fgithub.com\u002Fhandrew\u002Fbrowserpilot\u002Fissues\u002F4",{"id":113,"question_zh":114,"answer_zh":115,"source_url":111},1779,"如何修复 openai>=1.0.0 的旧接口报错？","运行 `openai migrate` 自动升级代码至新接口，或降级 openai 版本：`pip install openai==0.28`。详细迁移指南请参考官方文档：https:\u002F\u002Fgithub.com\u002Fopenai\u002Fopenai-python",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},1780,"如何解决 'No index in storage context' 错误？","升级 browserpilot 至 0.2.48 版本：`pip install --upgrade browserpilot`。若问题仍存在，请检查 persist_dir 路径是否正确且具有读写权限。","https:\u002F\u002Fgithub.com\u002Fhandrew\u002Fbrowserpilot\u002Fissues\u002F2",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},1781,"如何让 Chrome 使用指定用户配置文件？","在 chrome_options 中正确设置 'user-data-dir' 参数指向完整配置文件路径。示例：`'user-data-dir': '\u002FUsers\u002Fevan\u002FLibrary\u002FApplication Support\u002FGoogle\u002FChrome\u002FProfile 1'`。确保路径无拼写错误且浏览器有权限访问该目录。","https:\u002F\u002Fgithub.com\u002Fhandrew\u002Fbrowserpilot\u002Fissues\u002F12",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},1782,"如何使用 GPT-4 模型执行指令？","需验证模型输出格式是否符合预期。检查响应是否以 ```python 开头，必要时调整提示词格式。示例命令：`--model 
gpt-4-1106-preview`。若仍无响应，尝试简化指令或增加调试日志。","https:\u002F\u002Fgithub.com\u002Fhandrew\u002Fbrowserpilot\u002Fissues\u002F6",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},1783,"如何定位网页元素失败的问题？","使用 `ask_llm_to_find_element` 指令让 LLM 分析 DOM 结构。在指令中添加：\"Ask LLM to find the [元素类型]\"。此方法可提高复杂页面元素识别的成功率。","https:\u002F\u002Fgithub.com\u002Fhandrew\u002Fbrowserpilot\u002Fissues\u002F9",[]]