[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-aws--nova-act":3,"tool-aws--nova-act":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",147882,2,"2026-04-09T11:32:47",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108111,"2026-04-08T11:23:26",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":80,"stars":93,"forks":94,"last_commit_at":95,"license":96,"difficulty_score":32,"env_os":97,"env_gpu":98,"env_ram":98,"env_deps":99,"category_tags":105,"github_topics":76,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":106,"updated_at":107,"faqs":108,"releases":138},5920,"aws\u002Fnova-act","nova-act","Amazon Nova Act is an AWS service for building and deploying highly reliable AI agents that automate UI-based workflows at scale.","Nova Act 是亚马逊推出的一项 AWS 服务，旨在帮助开发者构建和部署高可靠性的 AI 智能体，以规模化自动化基于用户界面（UI）的工作流程。它主要解决了传统自动化脚本难以应对复杂、动态网页交互的痛点，能够像真人一样在浏览器中完成重复性操作，并在遇到不确定情况时智能转交人工监督，确保流程稳健运行。\n\n这款工具特别适合需要处理大量网页自动化任务的软件开发者和企业技术团队。用户只需结合自然语言描述与 Python 代码，即可灵活定义工作流。从在网页端快速原型验证，到本地 IDE 调试，再到云端部署与监控，Nova Act 提供了完整的开发闭环。其独特亮点在于“人机协同”机制，既保留了 AI 
的高效，又通过人工介入保障了关键决策的准确性；同时支持通过 API 或 MCP 协议集成外部工具，并具备处理验证码、文件上传下载及敏感数据输入等复杂场景的能力。作为基于 Python SDK 的开源生态组件，Nova Act 让大规模 UI 自动化变得简单、可控且易于维护。","# Nova Act SDK\n\nA Python SDK for Amazon Nova Act.\n\nAmazon Nova Act is available as a new AWS service to build and manage fleets of reliable AI agents for automating production UI workflows at scale. Nova Act completes repetitive UI workflows in the browser and escalates to a human supervisor when appropriate. You can define workflows by combining the flexibility of natural language with Python code. Start by exploring in the web playground at nova.amazon.com\u002Fact, develop and debug in your IDE, deploy to AWS, and monitor your workflows in the AWS Console, all in just a few steps.\n\n(Preview) Nova Act also integrates with external tools through API calls, remote MCP, or agentic frameworks, such as Strands Agents.\n\n\n> #### ⚠️ Important: Nova Act SDK versions older than 3.0 are no longer supported. Users must upgrade to the latest version to receive security updates and new features.\n\n> Please follow the upgrade instructions below:\n\n > ```bash\n > # Upgrade to the latest version\n > pip install --upgrade nova-act\n >\n > # Check your current version\n > pip show nova-act\n > ```\n\n## Table of contents\n* [Pre-requisites](#pre-requisites)\n* [Nova Act IDE Extension](#quick-set-up-with-ide-extension)\n* [Nova Act Authentication and Installation](#authentication)\n* [Quick Start](#quick-start)\n* [How to prompt Nova Act](#how-to-prompt-act)\n* [Workflows](#workflows)\n* [Extract information from a web page](#extracting-information-from-a-web-page)\n* [Human-in-the-loop (HITL)](#human-in-the-loop-hitl) \n* [Tools](#tool-use-beyond-the-browser-preview)\n* [Run multiple sessions in parallel](#running-multiple-sessions-in-parallel)\n* [Authentication, cookies, and persisting browser state](#authentication-cookies-and-persistent-browser-state)\n* [Handling sensitive data](#entering-sensitive-information)\n* 
[Captchas](#captchas)\n* [Search on a website](#search-on-a-website)\n* [File upload and download](#file-upload-and-download)\n* [Working with Browser Dialogs](#working-with-browser-dialogs)\n* [Working with dates](#picking-dates)\n* [Setting the browser user agent](#setting-the-browser-user-agent)\n* [Using a proxy](#using-a-proxy)\n* [Time worked tracking utility](#time-worked-tracking-utility)\n* [Logging and viewing traces](#logging)\n* [Recording a video of a session](#recording-a-session)\n* [Storing Session Data in Amazon S3](#storing-session-data-in-your-amazon-s3-bucket)\n* [Navigating Pages](#navigating-pages)\n* [Viewing headless sessions](#viewing-a-session-that-is-running-in-headless-mode)\n* [Use Nova Act SDK with Amazon Bedrock AgentCore Browser Tool](#use-nova-act-sdk-with-amazon-bedrock-agentcore-browser-tool)\n* [Known limitations](#known-limitations)\n* [Disclosures](#disclosures)\n* [Report a Bug](#report-a-bug)\n* [Reference: Nova Act constructor parameters](#initializing-novaact)\n* [Reference: Actuating the browser](#actuating-the-browser)\n* [Reference: Nova Act CLI](#nova-act-cli)\n\n## Pre-requisites\n\n1. Operating System: MacOS Sierra+, Ubuntu 22.04+, WSL2 or Windows 10+\n2. Python: 3.10 or above\n\n> **Note:** Nova Act supports English.\n\n## Set Up\n\n### Quick Set Up with IDE Extension\n\nAccelerate your development process with the [Nova Act extension](https:\u002F\u002Fgithub.com\u002Faws\u002Fnova-act-extension). The extension automates the setup of your Nova Act development environment and brings the entire agent development experience directly into your IDE, enabling chat-to-script generation, browser session debugging, and step-by-step testing capabilities. 
For installation instructions and detailed documentation, visit the [extension repository](https:\u002F\u002Fgithub.com\u002Faws\u002Fnova-act-extension) or [website](https:\u002F\u002Fnova.amazon.com\u002Fact).\n\n### Authentication\n\n#### API Key Authentication\n\nNote: When using the Nova Act Playground and\u002For choosing Nova Act developer tools with API key authentication, access and use are subject to the nova.amazon.com Terms of Use. \n\n\nNavigate to https:\u002F\u002Fnova.amazon.com\u002Fact and generate an API key.\n\nTo save it as an environment variable, execute in the terminal:\n```sh\nexport NOVA_ACT_API_KEY=\"your_api_key\"\n```\n\n#### IAM-based Authentication\n\nNote: When choosing developer tools with AWS IAM authentication and\u002For deploying workflows to the Nova Act AWS service, your AWS Service Terms and\u002For Customer Agreement (or other agreement governing your use of the AWS Service) apply.\n\nNova Act also supports authentication using IAM credentials. For details, please refer to the Amazon [Nova Act User Guide documentation](https:\u002F\u002Fdocs.aws.amazon.com\u002Fnova-act\u002Flatest\u002Fuserguide\u002F). To use IAM-based credentials, use the Workflow constructs (see [Workflows](#workflows)). Please note that the SDK will instantiate a default boto session if AWS credentials are already configured in your environment.\n\n### Installation\n\n```bash\npip install nova-act\n```\n\nAlternatively, you can build `nova-act` from source. Clone this repo, and then:\n```sh\npip install .\n```\n\n#### [Optional] Install Google Chrome\nNova Act works best with Google Chrome but does not have permission to install this browser. You may skip this step if you already have Google Chrome installed or are fine with using Chromium. Otherwise, you can install Google Chrome by running the following command in the same environment where you installed Nova Act. 
For more information, visit https:\u002F\u002Fplaywright.dev\u002Fpython\u002Fdocs\u002Fbrowsers#google-chrome--microsoft-edge.\n```bash\nplaywright install chrome\n```\n\n\n## Quick Start\n\n*Note: The first time you run NovaAct, it may take 1 to 2 minutes to start. This is because NovaAct needs to [install Playwright modules](https:\u002F\u002Fplaywright.dev\u002Fpython\u002Fdocs\u002Fbrowsers#install-browsers). Subsequent runs will only take a few seconds to start. This functionality can be toggled off by setting the `NOVA_ACT_SKIP_PLAYWRIGHT_INSTALL` environment variable.*\n\n### Script mode\n\n```python\nfrom nova_act import NovaAct\n\nwith NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\") as nova:\n    nova.act(\"Find flights from Boston to Wolf on Feb 22nd\")\n```\n\nThe SDK will (1) open Chrome, (2) perform the task as described in the prompt, and then (3) close Chrome. Details of the run will be printed as console log messages.\n\nRefer to the section [Initializing NovaAct](#initializing-novaact) to learn about other runtime options that can be passed into NovaAct.\n\n### Interactive mode\n\nUsing interactive Python is a nice way to experiment:\n\n```sh\n% python\nPython 3.10.16 (main, Dec  3 2024, 17:27:57) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n>>> from nova_act import NovaAct\n>>> nova = NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\")\n>>> nova.start()\n>>> nova.act(\"Find flights from Boston to Wolf on Feb 22nd\")\n```\n\nPlease don't interact with the browser when an `act()` is running because the underlying model will not know what you've changed!\n> Note: When using interactive mode, `ctrl+x` can exit the agent action leaving the browser intact for another `act()` call. 
`ctrl+c` does not do this -- it will exit the browser and require a `NovaAct` restart.\n\n### Async mode\n\nNova Act provides an async implementation for use with `asyncio`. Import `NovaAct` from `nova_act.asyncio` and use `async with` and `await`:\n\n```python\nimport asyncio\nfrom nova_act.asyncio import NovaAct\n\nasync def main():\n    async with NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\") as nova:\n        await nova.act(\"Find flights from Boston to Wolf on Feb 22nd\")\n\nasyncio.run(main())\n```\n\n### Samples\n\nThe [samples](.\u002Fsrc\u002Fnova_act\u002Fsamples) folder contains several examples of using Nova Act to complete various tasks, including:\n* search for apartments on a real estate website, find each apartment's distance from a train station using a maps website, and combine these into a single result set. [This sample](.\u002Fsrc\u002Fnova_act\u002Fsamples\u002Fsearch_apartments_calculate_commute.py) demonstrates running multiple NovaActs in parallel (more detail below).\n* book a flight using data that is provided by a tool, and return the booking number. [This sample](.\u002Fsrc\u002Fnova_act\u002Fsamples\u002Fbooking_with_data_from_tool.py) demonstrates how to implement a python function as a tool that can be used to provide data for the workflow.\n* allows a human to log into an email application, and approve to print the number of emails. 
[This sample](.\u002Fsrc\u002Fnova_act\u002Fsamples\u002Fprint_number_of_emails.py) demonstrates providing HITL (Human in the loop) callback implementations to incorporate human participation in the workflow.\n\nFor more samples showing how to use Nova Act SDK, please refer to this [Github repository](https:\u002F\u002Fgithub.com\u002Famazon-agi-labs\u002Fnova-act-samples)\n\n## How to prompt act()\n\nThe simplest way to use Nova Act to achieve an end-to-end task is by specifying the entire goal, possibly with hints to guide the agent, in one prompt. However, the agent then must take many steps sequentially to achieve the goal, and any issues or nondeterminism along the way can throw the workflow off track. We have found that Nova Act works most reliably when the task can be accomplished in fewer than 30 steps.\n\nMake sure the prompt is direct and spells out exactly what you want Nova Act to do, including what information you want it to return, if any (read more on data extraction [here](#extracting-information-from-a-web-page)). Aim to completely specify the choices the agent should make and what values it should put in form fields. During your testing, if you see act() going off track, enhance the prompt with hints (e.g. how to use certain UI elements it encounters, how to get to a particular function on the website, or what paths to avoid) — just like you would do with a new team member who might be unfamiliar with the task and the website. If the agent is taking a long winding path or you are unable to get repeated reliability, break the task up into stages and connect these in code.\n\n**1. Be direct and succinct in what the agent should do**\n\n❌ DON'T\n```python\nnova.act(\"Let's see what routes vta offers\")\n```\n\n✅ DO\n```python\nnova.act(\"Navigate to the routes tab\")\n```\n\n❌ DON'T\n```python\nnova.act_get(\"I want to go and meet a friend. 
I should figure out when the Orange Line comes next.\")\n```\n\n✅ DO\n```python\nnova.act_get(f\"Find the next departure time for the Orange Line from Government Center after {time}\")\n```\n\n**2. Provide complete instructions**\n\n❌ DON'T\n```python\nnova.act(\"book me a hotel that costs less than $100 with the highest star rating\")\n```\n\n✅ DO\n```python\nnova.act(f\"book a hotel for two adults in Houston between {startdate} and {enddate} that costs less than $100 per night with the highest star rating. two queen beds preferred but single king also ok. stop when you get to the enter customer details or payment page.\")\n```\n\n**3. Break up large acts into smaller ones**\n\n❌ DON'T\n```python\nnova.act(\"book me a hotel that costs less than $100 with the highest star rating then find the closest car rental and get me car there, finally find a lunch spot nearby and book it at 12:30pm\")\n```\n\n✅ DO\n```python\nhotel_address = nova.act_get(f\"book a hotel for two adults in Houston between {startdate} and {enddate} that costs less than $100 per night with the highest star rating. two queen beds preferred but single king also ok. return the address of the hotel you booked.\").response\nnova.act(f\"book a restaurant near {hotel_address} at 12:30pm for two people\")\nnova.act(f\"rent a small sized car between {startdate} and {enddate} from a car rental place near {hotel_address}\")\n```\n\nAnd if the agent still struggles, break it down:\n\n```python\nnova.act(f\"search for hotels for two adults in Houston between {startdate} and {enddate}\")\nnova.act(\"sort by avg customer review\")\nhotel_address = nova.act_get(\"book the first hotel that is $100 or less. prefer two queen beds if there is an option. 
return the address of the hotel you booked.\").response\nnova.act(f\"book a restaurant near {hotel_address} at 12:30pm on {startdate} for two people\")\nnova.act(f\"search for car rental places near {hotel_address} and navigate to the closest one's website\")\nnova.act(f\"rent a small sized car between {startdate} and {enddate}, pickup time 12pm, drop-off 12pm.\")\n```\n\n## Workflows\n\nA workflow defines your agent's end-to-end task. Workflows consist of `act()` statements and Python code that orchestrates the automation logic.\n\nThe `nova-act` SDK provides a number of convenience wrappers for managing workflows deployed with the NovaAct AWS service. To get started, call the `CreateWorkflowDefinition` API (or use the AWS Console) to obtain a WorkflowDefinition.\n\n### The Context Manager\n\nThe core type driving workflow coordination with the NovaAct service is `Workflow`. This class provides a [context manager](https:\u002F\u002Fpeps.python.org\u002Fpep-0343\u002F) which will handle calling the necessary workflow API operations from the Amazon Nova Act service. It calls `CreateWorkflowRun` when your run starts and `UpdateWorkflowRun` with the appropriate status when it finishes. It is provided to the `NovaAct` client via a constructor argument, so that all called APIs will be associated with the correct workflow + run (`CreateSession`, `CreateAct`, `InvokeActStep`, `UpdateAct`, etc.). 
See the following example for how to use it:\n\n```python\nfrom nova_act import NovaAct, Workflow\n\ndef main():\n    with Workflow(\n        workflow_definition_name=\"\u003Cyour-workflow-name>\",\n        model_id=\"nova-act-latest\"\n    ) as workflow:\n        with NovaAct(\n            starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\",\n            workflow=workflow,\n        ) as nova:\n            nova.act(\"Find flights from Boston to Wolf on Feb 22nd\")\n\nif __name__ == \"__main__\":\n    main()\n```\n\n#### Retry handling\nBy default, when a Nova Act request times out, the Nova Act SDK will retry it once. This can be overridden by passing a `boto_config` object to the `Workflow` constructor. You can also use this object to override the default 60-second `read_timeout`. For example, to retry a request 4 times (for a total of 5 attempts) with a 90-second timeout:\n\n```python\nfrom botocore.config import Config\nfrom nova_act import Workflow\n\nboto_config = Config(retries={\"total_max_attempts\": 5, \"mode\": \"standard\"}, read_timeout=90)\nwith Workflow(\n    boto_config=boto_config,\n    workflow_definition_name=\"\u003Cyour-workflow-name>\",\n    model_id=\"nova-act-latest\"\n) as workflow:\n    ...  # create NovaAct clients with workflow=workflow, as in the example above\n```\nNote that retrying the same Nova Act request may result in increased cost if the request ends up executing multiple times. For more information on retries, including retry modes, please refer to the [botocore retry documentation](https:\u002F\u002Fbotocore.amazonaws.com\u002Fv1\u002Fdocumentation\u002Fapi\u002Flatest\u002Freference\u002Fconfig.html).\n\n### The Decorator\n\nFor convenience, the SDK also exposes a [decorator](https:\u002F\u002Fpeps.python.org\u002Fpep-0318\u002F) which can be used to annotate functions to be run under a given workflow. 
The decorator leverages [ContextVars](https:\u002F\u002Fpeps.python.org\u002Fpep-0567\u002F) to inject the correct `Workflow` object into each `NovaAct` instance within the function; no need to provide the `workflow` keyword argument! The following syntax provides identical functionality to the previous example:\n\n```python\nfrom nova_act import NovaAct, workflow\n\n@workflow(\n    workflow_definition_name=\"\u003Cyour-workflow-name>\",\n    model_id=\"nova-act-latest\",\n)\ndef main():\n    with NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\") as nova:\n        nova.act(\"Find flights from Boston to Wolf on Feb 22nd\")\n\nif __name__ == \"__main__\":\n    main()\n```\n\n#### Configuring AWS Credentials with `boto_session_kwargs`\n\nThe `Workflow` class accepts an optional `boto_session_kwargs` parameter for customizing the boto3 Session configuration. **By default, if not provided, the workflow uses `{\"region_name\": \"us-east-1\"}`** when AWS credentials are available.\n\nIf you need to customize your AWS session (e.g., to use a specific profile or provide explicit credentials), you can pass a custom dictionary to `boto_session_kwargs`. 
This works with both the **Context Manager** and **Decorator** versions:\n\n**Using the Context Manager:**\n\n```python\nfrom nova_act import NovaAct, Workflow\n\ndef main():\n    with Workflow(\n        workflow_definition_name=\"\u003Cyour-workflow-name>\",\n        model_id=\"nova-act-latest\",\n        boto_session_kwargs={\n            \"profile_name\": \"my-aws-profile\",\n            \"region_name\": \"us-east-1\"\n        }\n    ) as workflow:\n        with NovaAct(\n            starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\",\n            workflow=workflow,\n        ) as nova:\n            nova.act(\"Find flights from Boston to Wolf on Feb 22nd\")\n\nif __name__ == \"__main__\":\n    main()\n```\n\n**Using the Decorator:**\n\n```python\nfrom nova_act import NovaAct, workflow\n\n@workflow(\n    workflow_definition_name=\"\u003Cyour-workflow-name>\",\n    model_id=\"nova-act-latest\",\n    boto_session_kwargs={\n        \"profile_name\": \"my-aws-profile\",\n        \"region_name\": \"us-east-1\"\n    }\n)\ndef main():\n    with NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\") as nova:\n        nova.act(\"Find flights from Boston to Wolf on Feb 22nd\")\n\nif __name__ == \"__main__\":\n    main()\n```\n\n**Note:** If you don't provide `boto_session_kwargs` and don't use an API key, the workflow will automatically load AWS credentials using boto3 (more details [here](https:\u002F\u002Fboto3.amazonaws.com\u002Fv1\u002Fdocumentation\u002Fapi\u002Flatest\u002Fguide\u002Fconfiguration.html) on how boto3 loads AWS credentials).\n\n### Best Practices\n\n#### Multi-threading\n\nThe `Workflow` class will work as-is for multi-threaded workflows. 
See the following example:\n\n```python\nfrom threading import Thread\nfrom nova_act import NovaAct, Workflow\n\ndef multi_threaded_helper(workflow: Workflow):\n    with NovaAct(..., workflow=workflow) as nova:\n        # nova will have the appropriate workflow run\n        ...\n\nwith Workflow(\n    workflow_definition_name=\"my-workflow\",\n    model_id=\"nova-act-latest\"\n) as workflow:\n    t = Thread(target=multi_threaded_helper, args=(workflow,))\n    t.start()\n    t.join()\n```\n\nBecause the `@workflow` decorator leverages ContextVars for injecting context, and because ContextVars are intentionally designed to be thread-specific, users will have to provide the context to any functions that will run in different threads from where the wrapping function is defined. See the following example:\n\n```python\nfrom contextvars import copy_context\nfrom threading import Thread\nfrom nova_act import NovaAct, workflow\n\ndef multi_threaded_helper():\n    with NovaAct(...) as nova:\n        # nova will have the appropriate workflow run\n        ...\n\n@workflow(\n    workflow_definition_name=\"my-workflow\",\n    model_id=\"nova-act-latest\",\n)\ndef multi_threaded_workflow():\n    ctx = copy_context()\n    t = Thread(target=ctx.run, args=(multi_threaded_helper,))\n    t.start()\n    t.join()\n\nmulti_threaded_workflow()\n```\n\nAlternatively, use the `workflow` argument to inject it manually, as when using the `Workflow` class directly:\n\n```python\nfrom threading import Thread\nfrom nova_act import NovaAct, Workflow, get_current_workflow, workflow\n\ndef multi_threaded_helper(workflow: Workflow):\n    with NovaAct(..., workflow=workflow) as nova:\n        # nova will have the appropriate workflow run\n        ...\n\n@workflow(\n    workflow_definition_name=\"my-workflow\",\n    model_id=\"nova-act-latest\",\n)\ndef multi_threaded_workflow():\n    t = Thread(target=multi_threaded_helper, args=(get_current_workflow(),))\n    t.start()\n    t.join()\n\nmulti_threaded_workflow()\n```\n#### Multi-processing\nThe `Workflow` construct does not currently support passing between 
multi-processing because it maintains a boto3 Session and Client as instance variables, and those objects are not [pickle](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Fpickle.html)-able. Support coming soon!\n\n### Nova Act CLI\n\nThe Nova Act CLI provides a streamlined command-line interface for deploying Python workflows to AWS AgentCore Runtime, handling containerization, ECR management, IAM roles, and multi-region deployments automatically. See the [Nova Act CLI README](.\u002Fsrc\u002Fnova_act\u002Fcli\u002FREADME.md) for installation and usage instructions.\n\n## Common Building Blocks\n\n### Extracting information from a web page\n\nUse `pydantic` and ask `act_get` to respond to a question about the browser page in a certain schema.\n\n- Make sure you use a schema whenever you are expecting any kind of structured response, even just a bool (yes\u002Fno). If a schema is not provided, the returned object will not contain a response.\n- Put a prompt to extract information in its own separate `act()` call.\n\nFor convenience, the `act_get()` function works the same as `act()` but provides a default `STRING_SCHEMA`, so that a response will always be available in the return object whether or not a specific schema is provided. 
We recommend using `act_get()` for all extraction tasks, to ensure type safety.\n\nExample:\n\n```python\nfrom nova_act import NovaAct\nfrom pydantic import BaseModel\n\nclass Measurement(BaseModel):\n    value: float\n    unit: str\n\nclass PlanetData(BaseModel):\n    gravity: Measurement\n    average_temperature: Measurement\n\nwith NovaAct(\n        starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\"\n    ) as nova:\n        planet = 'Proxima Centauri b'\n        result = nova.act_get(\n            f\"Go to the {planet} page and return the gravity and average temperature.\",\n            schema=PlanetData.model_json_schema(),\n        )\n\n        # Parse the response into the data model\n        planet_data = PlanetData.model_validate(result.parsed_response)\n\n        # Do something with the parsed data\n        print(f\"✓ {planet} data:\\n{planet_data.model_dump_json(indent=2)}\")\n```\n\nIf all you need is a bool response, there's a convenient `BOOL_SCHEMA` constant:\nExample:\n\n```python\nfrom nova_act import NovaAct, ActInvalidModelGenerationError, BOOL_SCHEMA\nwith NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\") as nova:\n    try:\n        result = nova.act_get(\"Am I logged in?\", schema=BOOL_SCHEMA)\n    except ActInvalidModelGenerationError as e:\n        # act response did not match the schema ¯\\_(ツ)_\u002F¯\n        print(f\"Invalid result: {e}\")\n    else:\n        # result.parsed_response is now a bool\n        if result.parsed_response:\n            print(\"You are logged in\")\n        else:\n            print(\"You are not logged in\")\n```\n\n### Human-in-the-loop (HITL)\n\nNova Act's Human-in-the-Loop (HITL) capability enables seamless human supervision within autonomous web workflows. HITL is available in the Nova Act SDK for you to implement in your workflows (not provided as a managed AWS service). 
When your workflow encounters scenarios requiring human judgment or intervention, HITL can provide tools and user interfaces for supervisors to assist, verify, or take control of the process. \n\n#### HITL patterns\n\n##### Human approval\n\nHuman approval enables asynchronous human decision-making in automated processes. When Nova Act encounters a decision point requiring human judgment, it captures a screenshot of the current state and presents it to a human reviewer via a browser-based interface. Use this when you need binary or multi-choice decisions (Approve\u002FReject, Yes\u002FNo, or selecting from predefined options).\n\n##### UI takeover\n\nUI takeover enables real-time human control of a remote browser session. When Nova Act encounters a task that requires human interaction, it hands control of the browser to a human operator via a live-streaming interface. The operator can interact with the browser using mouse and keyboard in real time.\n\n#### Implementing HITL\n\nPlease refer to the [Amazon Nova Act User Guide documentation on HITL](https:\u002F\u002Fdocs.aws.amazon.com\u002Fnova-act\u002Flatest\u002Fuserguide\u002Fhitl.html#implementing-hitl) for implementing HITL in your production workflows.\n\n##### Implementing HITL using the SDK\n\nTo implement HITL patterns in the Nova Act SDK, define a class that extends `HumanInputCallbacksBase` and implements two of its abstract methods, `approve` and `ui_takeover`. 
Pass an instance of it to the `human_input_callbacks` argument of the `NovaAct` constructor.\n\n- `approve`: a callback that is triggered for the Human approval pattern (e.g., approving expense reports or purchases)\n- `ui_takeover`: a callback that is triggered for the UI takeover pattern (e.g., solving CAPTCHA challenges)\n\n```python\nfrom nova_act import NovaAct\nfrom nova_act.tools.human.interface.human_input_callback import (\n    ApprovalResponse, HumanInputCallbacksBase, UiTakeoverResponse,\n)\n\nclass MyHumanInputCallbacks(HumanInputCallbacksBase):\n    def approve(self, message: str) -> ApprovalResponse:\n        ...\n\n    def ui_takeover(self, message: str) -> UiTakeoverResponse:\n        ...\n\nwith NovaAct(\n    starting_page=...,\n    tty=False,\n    human_input_callbacks=MyHumanInputCallbacks(),\n) as nova:\n    result = nova.act_get(...)  # replace ... with your prompt\n    print(f\"Task completed: {result.response}\")\n```\n\nRefer to [this sample](.\u002Fsrc\u002Fnova_act\u002Fsamples\u002Fprint_number_of_emails.py) for a working example.\n\n\n### Tool Use Beyond the Browser (Preview)\n\n(Preview) Nova Act allows you to integrate external tools beyond the browser, such as an API call or a database query, into workflows. The Nova Act SDK lets you use a Python function as a tool that can be invoked during a workflow step. To make a Python function available as a tool, annotate it with the `@tool` decorator. 
You can pass a list of tools to the `tools` argument of the `NovaAct` constructor.\n\n```python\nfrom nova_act import NovaAct, tool\n\n@tool\ndef my_tool(input: str) -> str:\n    ...\n\nwith NovaAct(\n    starting_page=...,\n    tools=[my_tool],\n) as nova:\n    ...\n```\n\nRefer to [this sample](.\u002Fsrc\u002Fnova_act\u002Fsamples\u002Fbooking_with_data_from_tool.py) for a working example.\n\nUsers may also provide tools from an MCP server by leveraging a [Strands MCP Client](https:\u002F\u002Fstrandsagents.com\u002Flatest\u002Fdocumentation\u002Fdocs\u002Fuser-guide\u002Fconcepts\u002Ftools\u002Fmcp-tools\u002F):\n\n```python\nfrom mcp import StdioServerParameters, stdio_client\nfrom nova_act import NovaAct\nfrom strands.tools.mcp import MCPClient\n\nwith MCPClient(\n    lambda: stdio_client(\n        StdioServerParameters(command=\"uvx\", args=[\"awslabs.aws-documentation-mcp-server@latest\"])\n    )\n) as aws_docs_client:\n    with NovaAct(\n        starting_page=\"https:\u002F\u002Faws.amazon.com\u002F\", tools=aws_docs_client.list_tools_sync(),\n    ) as nova:\n        print(\n            nova.act_get(\n                \"Use the 'search_documentation' tool to tell me about Amazon Bedrock and how to use it with Python. \"\n                \"Ignore the web browser; do not click, scroll, type, etc.\"\n            )\n        )\n\n```\n\n#### Advanced: Tools Requiring Browser Control\n\nIf your custom tool needs to interact with the browser directly (similar to HITL tools), you can mark it with `requires_unlocked_actuator_context = True`. This temporarily suspends the actuator's internal hooks during tool execution, allowing external processes to control the browser.\n\n```python\nfrom nova_act import NovaAct, tool\n\n@tool\ndef my_browser_control_tool(message: str) -> str:\n    \"\"\"Tool that needs direct browser access.\"\"\"\n    # ... 
interact with browser externally\n    return \"done\"\n\n# Mark the tool as requiring unlocked context\nmy_browser_control_tool.requires_unlocked_actuator_context = True\n\nwith NovaAct(starting_page=..., tools=[my_browser_control_tool]) as nova:\n    nova.act(\"Use my_browser_control_tool to do something\")\n```\n\nThe actuator automatically re-locks the context on the next agent action.\n\n### Handling ActErrors\n\nOnce the `NovaAct` client is started, it might encounter errors during the `act()` execution. All of these error types are included in the [`nova_act.types.act_errors` module](.\u002Fsrc\u002Fnova_act\u002Ftypes\u002Fact_errors.py), and are organized as follows:\n1. `ActAgentError`: Indicates requested prompt failed to complete; users may retry with a different request.\n   * Examples include: `ActAgentFailed` (the agent raised an error because the task was not possible), `ActInvalidModelGenerationError` (model generated output that could not be interpreted), or `ActExceededMaxStepsError` (`act()` failed to complete within the configured maximum number of steps)\n1. `ActExecutionError`: Indicates a local error encountered while executing valid output from the agent\n   * Examples include: `ActActuationError` (client encountered an exception while actuating the Browser), or `ActCanceledError` (the user canceled execution).\n1. `ActClientError`: Indicates a request to the NovaAct Service was invalid; users may retry with a different request.\n   * Examples include: `ActGuardrailsError` (the request was blocked by our RAI guardrails) or `ActRateLimitExceededError` (request was throttled; rate should be reduced).\n1. 
`ActServerError`: Indicates the NovaAct Service encountered an error processing the request.\n   * Examples include: `ActInternalServerError` (internal error processing request), `ActBadResponseError` (the service returned a response with unrecognized shape), or `ActServiceUnavailableError` (the service could not be reached.)\n\nUsers may catch `ActAgentError`s and `ActClientError`s and retry with the appropriate request; for `ActExecutionError`s and `ActServerError`s, please submit an issue to the team to look into, including (1) your SDK version, (2) your platform + operating system, (3) the full error trace, and (4) steps to reproduce.\n\n### Running multiple sessions in parallel\nOne `NovaAct` instance can only actuate one browser at a time. However, it is possible to actuate multiple browsers concurrently with multiple `NovaAct` instances! They are quite lightweight. You can use this to parallelize parts of your task, creating a kind of browser use map-reduce for the internet. [This sample](.\u002Fsrc\u002Fnova_act\u002Fsamples\u002Fsearch_apartments_calculate_commute.py) shows running multiple sessions in parallel.\n\n### Authentication, cookies, and persistent browser state\n\nNova Act supports working with authenticated browser sessions by overriding its default settings. By default, when Nova Act runs, it clones the Chromium user data directory and deletes it at the end of the run. To use authenticated sessions, you need to specify an existing directory containing the authenticated sessions, and disable the cloning (which in turn disables deletion of the directory).\n\nSpecifically, you need to:\n1. (optional) Create a new local directory for the user data directory For example, `\u002Ftmp\u002Fuser-data-dir`. You can skip this step to use an existing Chromium profile.\n2. specify this directory when instantiating `NovaAct` via the `user_data_dir` parameter\n3. 
disable cloning this directory when instantiating `NovaAct` by passing in the parameter `clone_user_data_dir=False`\n4. instruct Nova Act to open the site(s) into which you want to authenticate\n5. authenticate into the sites. See [Entering sensitive information](#entering-sensitive-information) below for more information on entering sensitive data\n6. stop your Nova Act session\n\nThe next time you run Nova Act with `user_data_dir` set to the directory you created in step 1, you will start from an authenticated session. In subsequent runs, you can decide if you want to enable or disable cloning. If you are running multiple `NovaAct` instances in parallel, they must each create their own copy, so you must enable cloning in that use case (`clone_user_data_dir=True`).\n\nHere's an example script that shows how to pass in these parameters.\n\n```python\nimport os\n\nfrom nova_act import NovaAct\n\n# Example location from step 1; use any directory you control.\nuser_data_dir = \"\u002Ftmp\u002Fuser-data-dir\"\nos.makedirs(user_data_dir, exist_ok=True)\n\nwith NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\", user_data_dir=user_data_dir, clone_user_data_dir=False) as nova:\n    input(\"Log into your websites, then press enter...\")\n    # Add your nova.act() statements here.\n\nprint(f\"User data dir saved to {user_data_dir=}\")\n```\n\nThe script is included in the installation: `python -m nova_act.samples.setup_chrome_user_data_dir`.\n\n#### Run against the local default Chrome browser\n\nIf your local default Chrome browser has extensions or security features that the sites in your workflow require, you can configure the SDK to use the Chrome browser installed on your machine rather than the one managed by the SDK using the `NovaAct` parameters below.\n\n> **Important notes:**\n>\n> - This feature currently only works on macOS\n> - This will quit your default running Chrome and restart it with new arguments. 
At the end of the session, it will quit Chrome.\n> - If your Chrome browser has many tabs open, consider closing unnecessary ones before running the automation, as Chrome's performance during the restart can be affected by high numbers of open tabs.\n\nBefore starting NovaAct with this feature, you must copy the files from your system Chrome `user_data_dir` to a location of your choice.\nThis is necessary as Chrome does not allow CDP connections into instances started with the system default `user_data_dir`.\n\nManually, this can be done with:\n```bash\nrsync -a --exclude=\"Singleton*\" \u002FUsers\u002F$USER\u002FLibrary\u002FApplication\\ Support\u002FGoogle\u002FChrome\u002F \u003Cyour choice of location>\n```\n\nYou can also use the convenience function `rsync_from_default_user_data(\u003Cyour choice of location>)` to create and update that directory as part of your script.\nNote that invoking `rsync_from_default_user_data` will overwrite changes in the destination directory and make it an exact mirror of `\u002FUsers\u002F$USER\u002FLibrary\u002FApplication\\ Support\u002FGoogle\u002FChrome\u002F` by overwriting existing files with the same name as in the source and deleting files not in it. If you want to persist profile changes that NovaAct made in the working directory back to your system, you must then mirror the changes back into the system default dir with your own implementation after stopping NovaAct.\n\nWhen using this feature, you must specify `clone_user_data_dir=False` and pass the desired working dir as `user_data_dir` with the appropriate files populated. 
This is because `NovaAct` will not be cloning or deleting the `user_data_dir`s for you in this mode.\n\n```python\n>>> from nova_act import NovaAct, rsync_from_default_user_data\n>>> working_user_data_dir = \"\u002FUsers\u002F$USER\u002Fyour_choice_of_path\"\n>>> rsync_from_default_user_data(working_user_data_dir)\n>>> nova = NovaAct(use_default_chrome_browser=True, clone_user_data_dir=False, user_data_dir=working_user_data_dir, starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\")\n>>> nova.start()\n>>> nova.act_get(\"Find flights from Boston to Wolf on Feb 22nd\")\n...\n>>> nova.stop()\n>>> quit()\n```\n\n### Entering sensitive information\n\nTo enter a password or sensitive information (e.g., credit card and social security number), do not prompt the model with the sensitive information. Ask the model to focus on the element you want to fill in. Then use Playwright APIs directly to type the data, using `client.page.keyboard.type(sensitive_string)`. You can get that data in the way you wish: prompting in the command line using [`getpass`](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Fgetpass.html), using an argument, or setting env variable.\n\nNote that any passwords or other sensitive data saved with a Chromium-based browser's password manager on Linux systems without a system-level keyring (ex. Libsecret, KWallet) will be stored in plaintext within a user's profile directory.\n\n> **Caution:** If you instruct Nova Act to take an action on any browser screen displaying sensitive information, including information provided through Playwright APIs, that information will be included in the screenshots collected.\n\n```python\n# Sign in.\nnova.act(\"enter username janedoe and click on the password field\")\n# Collect the password from the command line and enter it via playwright. 
(Does not get sent over the network.)\nnova.page.keyboard.type(getpass())\n# Now that username and password are filled in, ask NovaAct to proceed.\nnova.act(\"sign in\")\n```\n\n### Security Options\n\nNovaAct is initialized with secure default behaviors which you may want to relax depending on your use-case.\n\n#### Allow Navigation to Local `file:\u002F\u002F` URLs\n\nTo enable local file navigation, define one or more filepath patterns in `SecurityOptions.allowed_file_open_paths`.\n\n```python\nfrom nova_act import NovaAct, SecurityOptions\n\nNovaAct(starting_page=\"file:\u002F\u002F\u002Fhome\u002Fnova-act\u002Fsite\u002Findex.html\", security_options=SecurityOptions(allowed_file_open_paths=['\u002Fhome\u002Fnova-act\u002Fsite\u002F*']))\n```\n\n#### Allow File Uploads\nTo allow the agent to upload files to websites, define one or more filepath patterns in `SecurityOptions.allowed_file_upload_paths`.\n\n```python\nfrom nova_act import NovaAct, SecurityOptions\n\nNovaAct(starting_page=\"https:\u002F\u002Fexample.com\", security_options=SecurityOptions(allowed_file_upload_paths=['\u002Fhome\u002Fnova-act\u002Fshared\u002F*']))\n```\n\n#### Filepath Structures\nThe filepath parameters support the following formats:\n- `[\"\u002Fhome\u002Fnova-act\u002Fshared\u002F*\"]` - Allow from a specific directory\n- `[\"\u002Fhome\u002Fnova-act\u002Fshared\u002Ffile.txt\"]` - Allow a specific filepath\n- `[\"*\"]` - Enable for all paths\n- `[]` - Disable the feature (Default)\n\n### State Guardrails\n\nState guardrails allow you to control which URLs the agent can visit during execution. You can provide a callback function that inspects the browser state after each observation and decides whether to allow or block continued execution. If blocked, `act()` will raise `ActStateGuardrailError`. 
This is useful for preventing the agent from navigating to unauthorized domains or sensitive pages.\n\n```python\nfrom nova_act import NovaAct, GuardrailDecision, GuardrailInputState\nfrom urllib.parse import urlparse\nimport fnmatch\n\ndef url_guardrail(state: GuardrailInputState) -> GuardrailDecision:\n    hostname = urlparse(state.browser_url).hostname\n    if not hostname:\n        return GuardrailDecision.BLOCK\n\n    # Example URL block-list\n    blocked = [\"*.blocked-domain.com\", \"*.another-blocked-domain.com\"]\n    if any(fnmatch.fnmatch(hostname, pattern) for pattern in blocked):\n        return GuardrailDecision.BLOCK\n\n    # Example URL allow-list\n    allowed = [\"allowed-domain.com\", \"*.another-allowed-domain.com\"]\n    if any(fnmatch.fnmatch(hostname, pattern) for pattern in allowed):\n        return GuardrailDecision.PASS\n\n    return GuardrailDecision.BLOCK\n\nwith NovaAct(starting_page=\"https:\u002F\u002Fallowed-domain.com\", state_guardrail=url_guardrail) as nova:\n    # The following will be blocked if agent tries to visit a blocklisted domain or leave one of the allowlisted domains\n    nova.act(\"Navigate to the homepage\")\n```\n\n### Captchas\n\nYou should use the `ui_takeover` callback (see [HITL](#human-in-the-loop-hitl)) if your script encounters captchas in certain places. This will allow redirecting the step of solving Captcha to a human.\n\n### Search on a website\n\n```python\nnova.go_to_url(website_url)\nnova.act(\"search for cats\")\n```\n\nIf the model has trouble finding the search button, you can instruct it to press enter to initiate the search.\n\n```python\nnova.act(\"search for cats. 
type enter to initiate the search.\")\n```\n\n### File upload and download\n\nYou can use playwright to download a file on a web page.\n\nThrough a download action button:\n\n```python\n# Ask playwright to capture any downloads, then actuate the page to initiate it.\nwith nova.page.expect_download() as download_info:\n    nova.act(\"click on the download button\")\n\n# Temp path for the download is available.\nprint(f\"Downloaded file {download_info.value.path()}\")\n\n# Now save the downloaded file permanently to a location of your choice.\ndownload_info.value.save_as(\"my_downloaded_file\")\n```\n\n> **Important notes**:\n>\n> - The browser will show the file being downloaded to the temporary path defined by Playwright ([see docs](https:\u002F\u002Fplaywright.dev\u002Fdocs\u002Fdownloads#introduction))\n>    - This temporary path is accessible via `download_info.value.path()`\n>  - When using `download_info.value.save_as()`:\n>    - If a full path is provided (e.g., \"\u002Fpath\u002Fto\u002Fmy_downloaded_file\"), the file will be saved there\n>    - If only a filename is provided (e.g., \"my_downloaded_file\"), it will be saved in the current working directory where the Python script was executed from\n\nTo download the current page:\n\n1. If it's HTML, then accessing `nova.page.content()` will give you the rendered DOM. You can save that to a file.\n2. If it is another content type, like a pdf, you can download it using `nova.page.request`:\n\n```python\n# Download the content using Playwright's request.\nresponse = nova.page.request.get(nova.page.url)\nwith open(\"downloaded.pdf\", \"wb\") as f:\n    f.write(response.body())\n```\n\nNovaAct can natively upload files using the appropriate upload action on the page. To do that, first you must allow NovaAct to access the file for upload. 
Then instruct it to\nupload it by filename:\n\n```python\nupload_filename = \"\u002Fupload_path\u002Fupload_me.pdf\"\n\nwith NovaAct(..., security_options=SecurityOptions(allowed_file_upload_paths=[\"\u002Fupload_path\u002F*\"])) as nova:\n    nova.act(f\"upload {upload_filename} using the upload receipt button\")\n```\n\n> **Important security note**:\n>\n> Pick `allowed_file_upload_paths` narrowly to minimize NovaAct's access to your filesystem to avoid data exfiltration by malicious sites or web content.\n\n### Working with Browser Dialogs\n\nPlaywright automatically dismisses browser native dialogs such as [alert](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FAPI\u002FWindow\u002Falert), [confirm](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FAPI\u002FWindow\u002Fconfirm), and [prompt](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FAPI\u002FWindow\u002Fprompt) by default. To handle them manually, register a dialog handler before Nova Act performs the action that triggers the dialog. 
For example:\n\n```python\ndef handle_dialog(dialog):\n    \"\"\"Handle dialog by printing its message and accepting it.\"\"\"\n    print(f\"Dialog message: {dialog.message}\")\n    dialog.accept()  # Accept and dismiss the dialog\n    # dialog.dismiss()  # Or dismiss\u002Fcancel the dialog\n\n# Register the handler\nnova.page.on(\"dialog\", handle_dialog)\n# Trigger the dialog\nnova.act(\"Do something that results in a dialog\")\n# Unregister the handler\nnova.page.remove_listener(\"dialog\", handle_dialog)\n```\n\nFor more details, see the [Playwright documentation](https:\u002F\u002Fplaywright.dev\u002Fpython\u002Fdocs\u002Fdialogs#alert-confirm-prompt-dialogs).\n\n### Picking dates\n\nSpecifying the start and end dates in absolute time works best.\n\n```python\nnova.act(\"select dates march 23 to march 28\")\n```\n\n### Setting the browser user agent\n\nNova Act comes with Playwright's Chrome and Chromium browsers. These use the default User Agent set by Playwright. You can override this with the `user_agent` option:\n\n```python\nnova = NovaAct(..., user_agent=\"MyUserAgent\u002F2.7\")\n```\n\n### Using a proxy\n\nNova Act supports proxy configurations for browser sessions. This can be useful when you need to route traffic through a specific proxy server:\n\n```python\n# Basic proxy without authentication\nproxy_config = {\n    \"server\": \"http:\u002F\u002Fproxy.example.com:8080\"\n}\n\n# Proxy with authentication\nproxy_config = {\n    \"server\": \"http:\u002F\u002Fproxy.example.com:8080\",\n    \"username\": \"myusername\",\n    \"password\": \"mypassword\"\n}\n\nnova = NovaAct(\n    starting_page=\"https:\u002F\u002Fexample.com\",\n    proxy=proxy_config\n)\n```\n\n> **Note:** If connecting to a CDP endpoint, the code that launches the browser and manages the lifecycle is responsible for configuring the proxy. 
These configuration params only apply if NovaAct is creating and launching the browser.\n\n\n### Logging\nBy default, `NovaAct` will emit all logs level `logging.INFO` or above. This can be overridden by specifying an integer value under the `NOVA_ACT_LOG_LEVEL` environment variable. Integers should correspond to [Python logging levels](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Flogging.html#logging-levels).\n \n### Viewing act traces\n \nAfter an `act()` finishes, it will output traces of what it did in a self-contained html file. The location of the file is printed in the console trace.\n \n```sh\n> ** View your act run here: \u002Fvar\u002Ffolders\u002F6k\u002F75j3vkvs62z0lrz5bgcwq0gw0000gq\u002FT\u002Ftmpk7_23qte_nova_act_logs\u002F15d2a29f-a495-42fb-96c5-0fdd0295d337\u002Fact_844b076b-be57-4014-b4d8-6abed1ac7a5e_output.html\n```\n \nYou can change the directory for this by passing in a `logs_directory` argument to `NovaAct`.\n\n### Time worked tracking utility\n\nThe time_worked utility tracks and reports the approximate time spent by the agent working on tasks, excluding time spent waiting for human input. This helps you understand the actual agent execution time.\n\n#### How It Works\nApproximate time worked is calculated using this basic formula:\n```\ntime_worked = (end_time - start_time) - human_wait_time\n```\n\nWhen an `act()` call completes (successfully or with an error), the following is calculated:\n- **Approx. Time Worked**: Total execution time (end time minus start time) minus any time spent waiting for human input\n- **Human Wait Time**: Time spent waiting for `approve()` or `ui_takeover()` callbacks from when the callback is issued to when the agent execution continues\n\n#### Console Output\n\nAt the end of each `act()` call, you'll see a time worked summary in the console, as well as in the JSON and HTML reports:\n\nWithout human input:\n```\n⏱️ Approx. Time Worked: 11.8s\n```\n\nWith human input:\n```\n⏱️  Approx. 
Time Worked: 28.3s (excluding 4.5s human wait)\n```\n\n#### Important Disclaimer\n\n> **Note:** Time worked calculations are approximate and may have inaccuracies due to system timing variations, network latency, or other factors. This metric should be viewed as a utility to help understand agent execution patterns and should not be used for formal time tracking or billing purposes.\n\n### Recording a session\n \nYou can easily record an entire browser session locally by setting the `logs_directory` and specifying `record_video=True` in the constructor for `NovaAct`.\n\n### Storing Session Data in Your Amazon S3 Bucket\n\nNova Act allows you to store session data (HTML traces, screenshots, etc.) in your own [Amazon S3](https:\u002F\u002Faws.amazon.com\u002Fs3\u002F) bucket using the `S3Writer` convenience utility:\n\n```python\nimport boto3\nfrom nova_act import NovaAct\nfrom nova_act.util.s3_writer import S3Writer\n\n# Create a boto3 session with appropriate credentials\nboto_session = boto3.Session()\n\n# Create an S3Writer\ns3_writer = S3Writer(\n    boto_session=boto_session,\n    s3_bucket_name=\"my-bucket\",\n    s3_prefix=\"my-prefix\u002F\",  # Optional\n    metadata={\"Project\": \"MyProject\"}  # Optional\n)\n\n# Use the S3Writer with NovaAct\nwith NovaAct(\n    starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\",\n    boto_session=boto_session,  # You may use API key here instead\n    stop_hooks=[s3_writer]\n) as nova:\n    result = nova.act_get(\"Find flights from Boston to Wolf on Feb 22nd\")\n```\n\nThe S3Writer requires the following AWS permissions:\n- s3:ListObjects on the bucket and prefix\n- s3:PutObject on the bucket and prefix\n\nWhen the NovaAct session ends, all session files will be automatically uploaded to the specified S3 bucket with the provided prefix.\n\n#### S3 Upload Troubleshooting\n\n**No files in S3 bucket?**\n- Check logs for \"Registered stop hooks\" message during initialization\n- 
Verify your code path actually executes the NovaAct context manager\n\n### Navigating pages\n\n> **Use `nova.go_to_url` instead of `nova.page.goto`**\n\nThe Playwright Page's `goto()` method has a default timeout of 30 seconds, which may cause failures for slow-loading websites. If the page does not finish loading within this time, `goto()` will raise a `TimeoutError`, potentially interrupting your workflow. Additionally, goto() does not always work well with act, as Playwright may consider the page ready before it has fully loaded.\nTo address these issues, we have implemented a new function, `go_to_url()`, which provides more reliable navigation. You can use it by calling: `nova.go_to_url(url)` after `nova.start()`. You can also use the `go_to_url_timeout` parameter on `NovaAct` initialization to modify the default max wait time in seconds for the start page load and subsequent `go_to_url()` calls.\n\n### Viewing a session that is running in headless mode\n\nWhen running the browser in headless mode (`headless: True`), you may need to see how the workflow is progressing as the agent is going through it. To do this:\n1. set the following environment variables before starting your Nova Act workflow\n```bash\nexport NOVA_ACT_BROWSER_ARGS=\"--remote-debugging-port=9222\"\n```\n2. start your Nova Act workflow as you normally do, with `headless: True`\n3. Open a local browser to `http:\u002F\u002Flocalhost:9222\u002Fjson`\n4. Look for the item of type `page` and copy and paste its `devtoolsFrontendUrl` into the browser\n\nYou'll now be observing the activity happening within the headless browser. You can also interact with the browser window as you normally would, which can be helpful for handling captchas. For example, in your Python script:\n1. ask Nova Act to check if there is a captcha\n2. if there is, `sleep()` for a period of time. Loop back to step 1. During `sleep()`...\n3. 
send an email \u002F SMS alert (eg, with [Amazon Simple Notification Service](https:\u002F\u002Faws.amazon.com\u002Fsns\u002F)) containing the `devtoolsFrontendUrl` signaling human intervention is required\n4. a human opens the `devtoolsFrontendUrl` and solves the captcha\n5. the next time step 1 is run, Nova Act will see the captcha has been solved, and the script will continue\n\nNote that if you are running Nova Act on a remote host, you may need to set up port forwarding to enable access from another system.\n\n\n## Use Nova Act SDK with Amazon Bedrock AgentCore Browser Tool\n\nThe Nova Act SDK can be used together with the [Amazon Bedrock AgentCore Browser Tool](https:\u002F\u002Fdocs.aws.amazon.com\u002Fbedrock-agentcore\u002Flatest\u002Fdevguide\u002Fbrowser-tool.html) for production-ready browser automation at scale. The AgentCore Browser Tool provides a fully managed cloud-based browser automation solution that addresses limitations around real-time data access, while the Nova Act SDK gives you the flexibility to build sophisticated agent workflows.\nSee [this blog post](https:\u002F\u002Faws.amazon.com\u002Fblogs\u002Fmachine-learning\u002Fintroducing-amazon-bedrock-agentcore-browser-tool\u002F) for integration instructions.\n\n> **Note**: When the Nova Act SDK and Bedrock AgentCore Browser run on different operating systems (e.g., SDK on MacOS and AgentCore Browser on Linux), keyboard commands may not translate correctly between systems. This impacts certain SDK functions like `agent_type()`, which uses keyboard shortcuts (such as `ControlOrMeta+A` for \"select all\") that are OS-dependent. This behavior is an expected consequence of the cross-OS integration architecture and should be considered when developing automations that use keyboard input methods.\n\n## Known limitations\nOur vision for Nova Act is to provide key capabilities to build useful agents at scale. 
If you encounter limitations with Nova Act, please provide feedback to [nova-act@amazon.com](mailto:nova-act@amazon.com?subject=Nova%20Act%20Bug%20Report) to help us make it better.\n\n\nFor example:\n\n* `act()` cannot interact with non-browser applications;\n* `act()` cannot interact with the browser window. This means that browser modals such as those requesting access to use your location don't interfere with `act()` but must be manually acknowledged if desired;\n* Screen size constraints:\n  * Nova Act is optimized for resolutions between `864×1296` and `1536×2304`\n  * Performance may degrade outside this range\n  * You can adjust the screen dimensions using `screen_width` and `screen_height` parameters (e.g., `screen_width=1920, screen_height=1080`)\n\nLearn more in the AWS AI Service Card for Amazon Nova Act.\n\n## Reference\n\n\n### Initializing `NovaAct`\n\nThe constructor accepts the following:\n\n* `starting_page (str)`: The URL of the starting page; supports both web URLs (`https:\u002F\u002F`) and local file URLs (`file:\u002F\u002F`) (required argument)\n  * Note: file URLs require passing `ignore_https_errors=True` to the constructor\n* `headless (bool)`: Whether to launch the browser in headless mode (defaults to `False`)\n* `user_data_dir (str)`: Path to a [user data directory](https:\u002F\u002Fchromium.googlesource.com\u002Fchromium\u002Fsrc\u002F+\u002Fmaster\u002Fdocs\u002Fuser_data_dir.md#introduction), which stores browser session data like cookies and local storage (defaults to `None`).\n* `nova_act_api_key (str)`: The API key you generated for authentication; required if the `NOVA_ACT_API_KEY` environment variable is not set. If passed, takes precedence over the environment variable.\n* `logs_directory (str)`: The directory where NovaAct will output its logs, run info, and videos (if `record_video` is set to `True`).\n* `record_video (bool)`: Whether to record video and save it to `logs_directory`. 
Must have `logs_directory` specified for video to record.\n* `proxy (dict)`: Proxy configuration for the browser. Should be a dictionary containing:\n  * `server` (required): The proxy server URL (must start with `http:\u002F\u002F` or `https:\u002F\u002F`)\n  * `username` (optional): Username for proxy authentication\n  * `password` (optional): Password for proxy authentication\n  * Note: Proxy is not supported when connecting to a CDP endpoint or using the default Chrome browser\n* `human_input_callbacks` (optional): An implementation of human input callbacks. If not provided, a request for human input tool will not be made.\n* `tools` (optional): A list of client provided tools.\n\nThis creates one browser session. You can create as many browser sessions as you wish and run them in parallel but a single session must be single-threaded.\n\n### Actuating the browser\n\n#### Use act\n\n`act()` takes a natural language prompt from the user and will actuate on the browser window on behalf of the user to achieve the goal. Arguments:\n\n* `max_steps` (int): Configure the maximum number of steps (browser actuations) `act()` will take before giving up on the task. Use this to make sure the agent doesn't get stuck forever trying different paths. Default is 30.\n* `timeout` (int): Number of seconds timeout for the entire act call. Prefer using `max_steps` as time per step can vary based on model server load and website latency.\n* `observation_delay_ms`: Additional delay in milliseconds before taking an observation of the page. 
Useful to wait for UI animations to complete.\n\nReturns an `ActResult`.\n\n```python\nclass ActResult:\n    metadata: ActMetadata\n\nclass ActMetadata:\n    session_id: str | None\n    act_id: str | None\n    num_steps_executed: int\n    start_time: float\n    end_time: float\n    prompt: str\n```\n\nIf a schema is passed to `act()` (the `act_get()` function conveniently provides a default `STRING_SCHEMA`), then the returned object will be an `ActGetResult`, a subclass which includes the raw and structured response:\n\n```python\nclass ActGetResult(ActResult):\n    response: str | None\n    parsed_response: JSONType\n    valid_json: bool | None\n    matches_schema: bool | None\n```\n\n#### Do it programmatically\n\n`NovaAct` exposes a Playwright [`Page`](https:\u002F\u002Fplaywright.dev\u002Fpython\u002Fdocs\u002Fapi\u002Fclass-page) object directly under the `page` attribute.\n\nThis can be used to retrieve the current state of the browser, for example a screenshot or the DOM, or to actuate it:\n\n```python\nscreenshot_bytes = nova.page.screenshot()\ndom_string = nova.page.content()\nnova.page.keyboard.type(\"hello\")\n```\n\n## Disclosures\n\nNote: When using the Nova Act Playground and\u002For choosing Nova Act developer tools with API key authentication, access and use are subject to the nova.amazon.com Terms of Use. When choosing Nova Act developer tools with AWS IAM authentication and\u002For deploying workflows to the Nova Act AWS service, your AWS Service Terms and\u002For Customer Agreement (or other agreement governing your use of the AWS Service) apply.\n\n1. Nova Act may not always get it right. \n2. ⚠️ Please be aware that Nova Act may encounter commands in the content it observes on third party websites, including user-generated content on trusted websites such as social media posts, search results, forum comments, news articles, and document attachments. 
These unauthorized commands, known as prompt injections, may cause the model to make mistakes or act in a manner that differs from its instructions, such as ignoring your instructions, performing unauthorized actions, or exfiltrating sensitive data. To reduce the risks associated with prompt injections, it is important to monitor Nova Act and review its actions, especially when processing untrusted user-contributed content.\n3. We recommend you do not provide sensitive information to Nova Act, such as account passwords. Note that if you use sensitive information through Playwright calls, the information could be collected in screenshots if it appears unobstructed in the browser when Nova Act is engaged in completing an action. (See Entering sensitive information below.)\n4. When choosing developer tools on nova.amazon.com\u002Fact with API key authentication, we collect information on interactions with Nova Act, including in-browser screenshots, to develop and improve our services. Email us at nova-act@amazon.com to request deletion of your Nova Act data.\n5. Do not share your API key generated on https:\u002F\u002Fnova.amazon.com\u002Fact. Anyone with access to your API key can use it to operate Nova Act under your Amazon account. If you lose your API key or believe someone else may have access to it, go to https:\u002F\u002Fnova.amazon.com\u002Fact to deactivate your key and obtain a new one.\n6. If you are using our browsing environment defaults, look for `NovaAct` in the user agent string to identify our agent. If you operate Nova Act in your own browsing environment or customize the user agent, we recommend that you include that same string.\n\n## Report a Bug\n\nHelp us improve! If you notice any issues, please let us know by submitting a bug report via nova-act@amazon.com. 
\n\n\nBe sure to include the following in the email:\n- Description of the issue;\n- Session ID, which will have been printed out as a console log message; and\n- Script of the workflow you are using.\n\nYour feedback is valuable in ensuring a better experience for everyone.\n\nThanks for experimenting with Nova Act!\n\n","# Nova Act SDK\n\n适用于 Amazon Nova Act 的 Python SDK。\n\nAmazon Nova Act 是一项新的 AWS 服务，用于构建和管理可靠的 AI 代理集群，以大规模自动化生产级 UI 工作流。Nova Act 可在浏览器中完成重复性 UI 工作流，并在适当的时候升级到人工主管处理。您可以通过结合自然语言的灵活性与 Python 代码来定义工作流。只需几个步骤：首先在 nova.amazon.com\u002Fact 的 Web 演示环境中探索，在您的 IDE 中开发和调试，部署到 AWS，并在 AWS 控制台中监控您的工作流。\n\n（预览版）Nova Act 还可通过 API 调用、远程 MCP 或代理框架（如 Strands Agents）与外部工具集成。\n\n\n> #### ⚠️ 重要提示：版本低于 3.0 的 Nova Act SDK 已不再受支持。用户必须升级到最新版本才能获得安全更新和新功能。\n\n> 请按照以下升级说明操作：\n\n > ```bash\n > # 升级到最新版本\n > pip install --upgrade nova-act\n >\n > # 检查当前版本\n > pip show nova-act\n > ```\n\n## 目录\n* [先决条件](#pre-requisites)\n* [Nova Act IDE 扩展](#quick-set-up-with-ide-extension)\n* [Nova Act 身份验证与安装](#authentication)\n* [快速入门](#quick-start)\n* [如何向 Nova Act 发出指令](#how-to-prompt-act)\n* [工作流](#workflows)\n* [从网页中提取信息](#extracting-information-from-a-web-page)\n* [人工介入流程 (HITL)](#human-in-the-loop-hitl) \n* [工具](#tool-use-beyond-the-browser-preview)\n* [并行运行多个会话](#running-multiple-sessions-in-parallel)\n* [身份验证、Cookie 和持久化浏览器状态](#authentication-cookies-and-persistent-browser-state)\n* [处理敏感数据](#entering-sensitive-information)\n* [验证码](#captchas)\n* [在网站上搜索](#search-on-a-website)\n* [文件上传和下载](#file-upload-and-download)\n* [处理浏览器对话框](#working-with-browser-dialogs)\n* [处理日期](#picking-dates)\n* [设置浏览器用户代理](#setting-the-browser-user-agent)\n* [使用代理](#using-a-proxy)\n* [工时跟踪工具](#time-worked-tracking-utility)\n* [日志记录和查看追踪信息](#logging)\n* [录制会话视频](#recording-a-session)\n* [将会话数据存储到 Amazon S3](#storing-session-data-in-your-amazon-s3-bucket)\n* [导航页面](#navigating-pages)\n* [查看无头模式下的会话](#viewing-a-session-that-is-running-in-headless-mode)\n* [将 Nova Act SDK 与 Amazon Bedrock AgentCore 
浏览器工具一起使用](#use-nova-act-sdk-with-amazon-bedrock-agentcore-browser-tool)\n* [已知限制](#known-limitations)\n* [披露声明](#disclosures)\n* [报告 Bug](#report-a-bug)\n* [参考：Nova Act 构造函数参数](#initializing-novaact)\n* [参考：控制浏览器](#actuating-the-browser)\n* [参考：Nova Act CLI](#nova-act-cli)\n\n## 先决条件\n\n1. 操作系统：MacOS Sierra+、Ubuntu 22.04+、WSL2 或 Windows 10+\n2. Python 版本：3.10 或更高\n\n> **注意：** Nova Act 支持英语。\n\n## 设置\n\n### 使用 IDE 扩展快速设置\n\n借助 [Nova Act 扩展](https:\u002F\u002Fgithub.com\u002Faws\u002Fnova-act-extension)，加速您的开发流程。该扩展可自动配置 Nova Act 开发环境，并将完整的代理开发体验直接带入您的 IDE，支持聊天生成脚本、浏览器会话调试以及逐步测试等功能。有关安装说明和详细文档，请访问 [扩展仓库](https:\u002F\u002Fgithub.com\u002Faws\u002Fnova-act-extension) 或 [官网](https:\u002F\u002Fnova.amazon.com\u002Fact)。\n\n### 身份验证\n\n#### API 密钥认证\n\n注意：当您使用 Nova Act Playground 和\u002F或选择使用 API 密钥认证的 Nova Act 开发工具时，访问和使用须遵守 nova.amazon.com 的使用条款。\n\n\n请前往 https:\u002F\u002Fnova.amazon.com\u002Fact 生成 API 密钥。\n\n要将其保存为环境变量，请在终端中执行：\n```sh\nexport NOVA_ACT_API_KEY=\"your_api_key\"\n```\n\n#### 基于 IAM 的认证\n\n注意：当您选择使用 AWS IAM 认证的开发工具和\u002F或将工作流部署到 Nova Act AWS 服务时，您的 AWS 服务条款和\u002F或客户协议（或其他管理您使用 AWS 服务的协议）将适用。\n\nNova Act 也支持使用 IAM 凭证进行身份验证。有关详细信息，请参阅 Amazon 的 [Nova Act 用户指南文档](https:\u002F\u002Fdocs.aws.amazon.com\u002Fnova-act\u002Flatest\u002Fuserguide\u002F)。要使用基于 IAM 的凭证，请使用工作流构造（参见 [工作流](#workflows)）。请注意，如果您的环境中已配置 AWS 凭证，SDK 将实例化一个默认的 boto 会话。\n\n### 安装\n\n```bash\npip install nova-act\n```\n\n或者，您也可以自行构建 `nova-act`。克隆此仓库后，执行：\n```bash\npip install .\n```\n\n#### 【可选】安装 Google Chrome\nNova Act 最适合与 Google Chrome 配合使用，但没有权限自动安装该浏览器。如果您已经安装了 Google Chrome，或者可以接受使用 Chromium 浏览器，则可以跳过此步骤。否则，您可以在安装 Nova Act 的同一环境中运行以下命令来安装 Google Chrome。更多信息请访问 https:\u002F\u002Fplaywright.dev\u002Fpython\u002Fdocs\u002Fbrowsers#google-chrome--microsoft-edge。\n```bash\nplaywright install chrome\n```\n\n\n## 快速入门\n\n*注：首次运行 NovaAct 时，可能需要 1 到 2 分钟才能启动。这是因为 NovaAct 需要 [安装 Playwright 
模块](https:\u002F\u002Fplaywright.dev\u002Fpython\u002Fdocs\u002Fbrowsers#install-browsers)。后续运行只需几秒钟即可启动。您可以通过设置 `NOVA_ACT_SKIP_PLAYWRIGHT_INSTALL` 环境变量来关闭此功能。*\n\n### 脚本模式\n\n```python\nfrom nova_act import NovaAct\n\nwith NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\") as nova:\n    nova.act(\"查找2月22日从波士顿飞往沃尔夫的航班\")\n```\n\nSDK 将执行以下操作：(1) 打开 Chrome 浏览器，(2) 按照提示完成任务，然后 (3) 关闭 Chrome 浏览器。运行的详细信息将以控制台日志消息的形式输出。\n\n有关可传递给 NovaAct 的其他运行时选项，请参阅【初始化 NovaAct】部分。\n\n### 交互模式\n\n使用交互式 Python 是一种很好的实验方式：\n\n```sh\n% python\nPython 3.10.16 (main, Dec  3 2024, 17:27:57) [Clang 16.0.0 (clang-1600.0)] on darwin\n输入 \"help\", \"copyright\", \"credits\" 或 \"license\" 以获取更多信息。\n>>> from nova_act import NovaAct\n>>> nova = NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\")\n>>> nova.start()\n>>> nova.act(\"查找2月22日从波士顿飞往沃尔夫的航班\")\n```\n\n请注意，当 `act()` 正在运行时，请勿与浏览器进行交互，因为底层模型将无法感知您的操作！\n> 注意：在使用交互模式时，按下 `ctrl+x` 可以退出代理操作，同时保持浏览器打开状态以便再次调用 `act()`。而按下 `ctrl+c` 则会退出浏览器，并需要重新启动 `NovaAct`。\n\n### 异步模式\n\nNova Act 提供了基于 `asyncio` 的异步实现。从 `nova_act.asyncio` 中导入 `NovaAct`，并使用 `async with` 和 `await`：\n\n```python\nimport asyncio\nfrom nova_act.asyncio import NovaAct\n\nasync def main():\n    async with NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\") as nova:\n        await nova.act(\"查找2月22日从波士顿飞往沃尔夫的航班\")\n\nasyncio.run(main())\n```\n\n### 示例\n\n[samples](.\u002Fsrc\u002Fnova_act\u002Fsamples) 文件夹中包含多个使用 Nova Act 完成各种任务的示例，包括：\n* 在房地产网站上搜索公寓，利用地图网站计算每个公寓到火车站的距离，并将这些信息整合到一个结果集中。[此示例](.\u002Fsrc\u002Fnova_act\u002Fsamples\u002Fsearch_apartments_calculate_commute.py) 展示了如何并行运行多个 NovaAct 实例（详情见下文）。\n* 使用工具提供的数据预订航班，并返回预订号。[此示例](.\u002Fsrc\u002Fnova_act\u002Fsamples\u002Fbooking_with_data_from_tool.py) 演示了如何将 Python 函数实现为工具，用于向工作流提供数据。\n* 
允许用户登录电子邮件应用，并批准打印邮件数量。[此示例](.\u002Fsrc\u002Fnova_act\u002Fsamples\u002Fprint_number_of_emails.py) 展示了如何通过 HITL（人工参与）回调实现，将人类纳入工作流中。\n\n更多关于如何使用 Nova Act SDK 的示例，请参阅此 [Github 仓库](https:\u002F\u002Fgithub.com\u002Famazon-agi-labs\u002Fnova-act-samples)。\n\n## 如何提示 act()\n\n使用 Nova Act 完成端到端任务最简单的方式，是在一个提示中直接指定整个目标，并可辅以一些引导性提示。然而，代理需要按顺序执行许多步骤才能达成目标，而过程中一旦出现任何问题或不确定性，都可能导致工作流偏离预期。我们发现，当任务能在少于 30 步内完成时，Nova Act 的表现最为稳定。\n\n请确保提示清晰直接，明确说明您希望 Nova Act 执行的具体操作，包括是否需要返回某些信息（有关从网页提取信息的更多信息，请参阅 [此处](#extracting-information-from-a-web-page)）。尽量完整地指定代理应做出的选择以及应在表单字段中填写的值。在测试过程中，如果发现 act() 运行偏离预期，可以通过添加提示来改进（例如，如何使用遇到的特定 UI 元素、如何到达网站上的某个功能，或应避免哪些路径），这就像指导一位对任务和网站不熟悉的新人一样。如果代理总是绕远路，或者始终无法保证稳定可靠的结果，则可以将任务拆分为多个阶段，并在代码中将它们连接起来。\n\n**1. 明确且简洁地描述代理应执行的操作**\n\n❌ 不要这样写：\n```python\nnova.act(\"让我们看看 VTA 提供了哪些路线\")\n```\n\n✅ 应该这样写：\n```python\nnova.act(\"导航到‘路线’选项卡\")\n```\n\n❌ 不要这样写：\n```python\nnova.act_get(\"我想去见朋友。我得查一下橙线什么时候会来下一班。\")\n```\n\n✅ 应该这样写：\n```python\nnova.act_get(f\"查找政府中心站之后 {time} 的橙线下一班发车时间\")\n```\n\n**2. 提供完整的指令**\n\n❌ 不要这样写：\n```python\nnova.act(\"帮我预订一家价格低于 100 美元且星级最高的酒店\")\n```\n\n✅ 应该这样写：\n```python\nnova.act(f\"在休斯敦为两名成人预订一间每晚房价低于 100 美元、星级最高的酒店，入住日期为 {startdate} 至 {enddate}。首选两张大床，但一张特大床也可以。当进入客户信息或支付页面时停止。\")\n```\n\n**3. 
将大型任务拆分为更小的任务**\n\n❌ 不要这样写：\n```python\nnova.act(\"帮我预订一家价格低于 100 美元且星级最高的酒店，然后找到最近的租车公司并取车，最后在附近找一家午餐店并在 12:30 预订座位\")\n```\n\n✅ 应该这样写：\n```python\nhotel_address = nova.act_get(f\"在休斯敦为两名成人预订一间每晚房价低于 100 美元、星级最高的酒店，入住日期为 {startdate} 至 {enddate}。首选两张大床，但一张特大床也可以。预订成功后返回酒店地址。\").response\nnova.act(f\"在 {hotel_address} 附近为两人预订 12:30 的餐厅\")\nnova.act(f\"在 {hotel_address} 附近寻找租车公司，并访问其官网\")\nnova.act(f\"从 {hotel_address} 附近的租车公司租一辆小型汽车，租期为 {startdate} 至 {enddate}, 取车时间为 12:00，还车时间为 12:00。\")\n```\n\n如果代理仍然难以完成任务，可以进一步细分：\n\n```python\nnova.act(f\"在休斯敦为两名成人搜索 {startdate} 至 {enddate} 期间的酒店\")\nnova.act(\"按客户平均评分排序\")\nhotel_address = nova.act_get(\"预订第一家房价不超过 100 美元的酒店。如果有选择，优先两张大床。预订成功后返回酒店地址。\").response\nnova.act(f\"在 {hotel_address} 附近预订 12:30 的午餐\")\nnova.act(f\"搜索 {hotel_address} 附近的租车公司，并访问最近的一家官网\")\nnova.act(f\"租一辆小型汽车，租期为 {startdate} 至 {enddate}, 取车时间为 12:00，还车时间为 12:00。\")\n```\n\n## 工作流\n\n工作流定义了代理的端到端任务。工作流由 act() 语句和用于编排自动化逻辑的 Python 代码组成。\n\n`nova-act` SDK 提供了一系列便捷的封装工具，用于管理部署在 NovaAct AWS 服务上的工作流。只需调用 CreateWorkflowDefinition API（或使用 AWS 控制台），即可获取 WorkflowDefinition 并开始使用。\n\n### 上下文管理器\n\n使用 NovaAct 服务协调工作流的核心类型是 `Workflow`。该类提供一个 [上下文管理器](https:\u002F\u002Fpeps.python.org\u002Fpep-0343\u002F)，用于处理对 Amazon Nova Act 服务中必要工作流 API 操作的调用。当您的运行开始时，它会调用 `CreateWorkflowRun`；当运行结束时，则会以适当的状态调用 `UpdateWorkflowRun`。此对象通过构造函数参数传递给 `NovaAct` 客户端，以便所有调用的 API 都能与正确的工作流和运行相关联（例如 `CreateSession`、`CreateAct`、`InvokeActStep`、`UpdateAct` 等）。以下示例展示了如何使用它：\n\n```python\nimport os\nfrom nova_act import NovaAct, Workflow\n\ndef main():\n    with Workflow(\n        workflow_definition_name=\"\u003Cyour-workflow-name>\",\n        model_id=\"nova-act-latest\"\n    ) as workflow:\n        with NovaAct(\n            starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\",\n            workflow=workflow,\n        ) as nova:\n            nova.act(\"查找2月22日从波士顿飞往沃尔夫的航班\")\n```\n\n#### 重试处理\n默认情况下，当 Nova Act 请求超时时，Nova Act SDK 会重试一次。您可以通过向 
`Workflow` 构造函数传递一个 `boto_config` 对象来覆盖此行为。您还可以使用该对象来覆盖默认的 60 秒读取超时时间。例如，要将请求重试 4 次（总共 5 次尝试），并设置 90 秒的超时时间：\n\n```python\nfrom botocore.config import Config\nfrom nova_act import Workflow\n\nboto_config = Config(retries={\"total_max_attempts\": 5, \"mode\": \"standard\"}, read_timeout=90)\nwith Workflow(\n    boto_config=boto_config,\n    workflow_definition_name=\"\u003Cyour-workflow-name>\",\n    model_id=\"nova-act-latest\"\n) as workflow:\n    ...  # 在此处创建 NovaAct 并运行工作流\n```\n\n请注意，重试相同的 Nova Act 请求可能会导致成本增加，因为请求可能被执行多次。有关重试的更多信息，包括重试模式，请参阅 [botocore 重试文档](https:\u002F\u002Fbotocore.amazonaws.com\u002Fv1\u002Fdocumentation\u002Fapi\u002Flatest\u002Freference\u002Fconfig.html)。\n\n### 装饰器\n\n为方便起见，SDK 还提供了一个 [装饰器](https:\u002F\u002Fpeps.python.org\u002Fpep-0318\u002F)，可用于标注在给定工作流下运行的函数。该装饰器利用 [ContextVars](https:\u002F\u002Fpeps.python.org\u002Fpep-0567\u002F) 将正确的 `Workflow` 对象注入到函数内的每个 `NovaAct` 实例中；无需再提供 `workflow` 关键字参数！以下语法提供了与上一示例相同的功能：\n\n```python\nfrom nova_act import NovaAct, workflow\n\n@workflow(\n    workflow_definition_name=\"\u003Cyour-workflow-name>\",\n    model_id=\"nova-act-latest\",\n)\ndef main():\n    with NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\") as nova:\n        nova.act(\"查找2月22日从波士顿飞往沃尔夫的航班\")\n```\n\n#### 使用 `boto_session_kwargs` 配置 AWS 凭证\n`Workflow` 类接受一个可选的 `boto_session_kwargs` 参数，用于自定义 boto3 会话的配置。**默认情况下，如果未提供此参数，当存在 AWS 凭证时，工作流将使用 `{\"region_name\": \"us-east-1\"}`**。\n\n如果您需要自定义 AWS 会话（例如使用特定的配置文件或显式提供凭证），可以将自定义字典传递给 `boto_session_kwargs`。此方法适用于 **上下文管理器** 和 **装饰器** 版本：\n\n**使用上下文管理器：**\n\n```python\nfrom nova_act import NovaAct, Workflow\n\ndef main():\n    with Workflow(\n        workflow_definition_name=\"\u003Cyour-workflow-name>\",\n        model_id=\"nova-act-latest\",\n        boto_session_kwargs={\n            \"profile_name\": \"my-aws-profile\",\n            \"region_name\": \"us-east-1\"\n        }\n    ) as workflow:\n        with NovaAct(\n            starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\",\n   
         workflow=workflow,\n        ) as nova:\n            nova.act(\"查找2月22日从波士顿飞往沃尔夫的航班\")\n```\n\n**使用装饰器：**\n\n```python\nfrom nova_act import NovaAct, workflow\n\n@workflow(\n    workflow_definition_name=\"\u003Cyour-workflow-name>\",\n    model_id=\"nova-act-latest\",\n    boto_session_kwargs={\n        \"profile_name\": \"my-aws-profile\",\n        \"region_name\": \"us-east-1\"\n    }\n)\ndef main():\n    with NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\") as nova:\n        nova.act(\"查找2月22日从波士顿飞往沃尔夫的航班\")\n```\n\n**注意：** 如果您未提供 `boto_session_kwargs` 且未使用 API 密钥，工作流将自动使用 boto3 加载 AWS 凭证（有关 boto3 如何加载 AWS 凭证的详细信息，请参阅 [此处](https:\u002F\u002Fboto3.amazonaws.com\u002Fv1\u002Fdocumentation\u002Fapi\u002Flatest\u002Fguide\u002Fconfiguration.html)）。\n\n### 最佳实践\n\n#### 多线程\n\n`Workflow` 类可以直接用于多线程工作流。请参阅以下示例：\n\n```python\nfrom nova_act import NovaAct, Workflow\n\ndef multi_threaded_helper(workflow: Workflow):\n    with NovaAct(..., workflow=workflow) as nova:\n       # nova 将运行相应的工作流\n \nwith Workflow(\n    workflow_definition_name=\"my-workflow\",\n    model_id=\"nova-act-latest\"\n) as workflow:\n    t = Thread(target=multi_threaded_helper, args=(workflow,))\n    t.start()\n    t.join()\n```\n\n由于 `@workflow` 装饰器利用 ContextVar 来注入上下文，而 ContextVar 本身被设计为线程专属，因此用户需要将上下文传递给那些将在不同于包装函数定义的线程中运行的函数。请参阅以下示例：\n\n```python\nfrom contextvars import copy_context\nfrom nova_act import NovaAct, workflow\n\ndef multi_threaded_helper():\n    with NovaAct(...) 
as nova:\n        ...  # nova 将运行相应的工作流\n\n@workflow(\n    workflow_definition_name=\"my-workflow\",\n    model_id=\"nova-act-latest\",\n)\ndef multi_threaded_workflow():\n    ctx = copy_context()\n    t = Thread(target=ctx.run, args=(multi_threaded_helper,))\n    t.start()\n    t.join()\n\nmulti_threaded_workflow()\n```\n\n或者，也可以直接使用 `workflow` 参数手动注入上下文，就像直接使用 `Workflow` 类时那样：\n\n```python\nfrom threading import Thread\nfrom nova_act import NovaAct, Workflow, get_current_workflow, workflow\n\ndef multi_threaded_helper(workflow: Workflow):\n    with NovaAct(..., workflow=workflow) as nova:\n        ...  # nova 将运行相应的工作流\n\n@workflow(\n    workflow_definition_name=\"my-workflow\",\n    model_id=\"nova-act-latest\",\n)\ndef multi_threaded_workflow():\n    t = Thread(target=multi_threaded_helper, args=(get_current_workflow(),))\n    t.start()\n    t.join()\n\nmulti_threaded_workflow()\n```\n\n#### 多进程\n目前，`Workflow` 构造不支持在多进程之间传递，因为它将 boto3 的 Session 和 Client 作为实例变量保存，而这些对象无法被 [pickle](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Fpickle.html) 序列化。对此的支持即将推出！\n\n### Nova Act CLI\n\nNova Act CLI 提供了一个简化的命令行界面，用于将 Python 工作流部署到 AWS AgentCore Runtime，并自动处理容器化、ECR 管理、IAM 角色以及多区域部署等工作。有关安装和使用说明，请参阅 [Nova Act CLI README](.\u002Fsrc\u002Fnova_act\u002Fcli\u002FREADME.md)。\n\n## 常用构建模块\n\n### 从网页中提取信息\n\n使用 `pydantic`，并让 `act_get` 根据特定模式对浏览器页面上的问题作出响应。\n\n- 当你期望任何结构化响应时，即使只是布尔值（是\u002F否），也务必使用模式。如果没有提供模式，返回的对象将不会包含响应。\n- 将提取信息的提示放在单独的 `act()` 调用中。\n\n为方便起见，`act_get()` 函数与 `act()` 的功能相同，但会提供一个默认的 `STRING_SCHEMA`，因此无论是否提供特定模式，返回的对象中始终会包含响应。我们建议在所有提取任务中使用 `act_get()`，以确保类型安全。\n\n示例：\n\n```python\nfrom nova_act import NovaAct\nfrom pydantic import BaseModel\n\nclass Measurement(BaseModel):\n    value: float\n    unit: str\n\nclass PlanetData(BaseModel):\n    gravity: Measurement\n    average_temperature: Measurement\n\nwith NovaAct(\n        starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\"\n    ) as nova:\n        planet = 'Proxima Centauri b'\n        result = nova.act_get(\n      
      f\"前往 {planet} 页面，返回其重力和平均温度。\",\n            schema=PlanetData.model_json_schema(),\n        )\n\n        # 将响应解析为数据模型\n        planet_data = PlanetData.model_validate(result.parsed_response)\n\n        # 对解析后的数据进行处理\n        print(f\"✓ {planet} 数据：\\n{planet_data.model_dump_json(indent=2)}\")\n```\n\n如果只需要布尔值响应，可以使用便捷的 `BOOL_SCHEMA` 常量：\n\n示例：\n\n```python\nfrom nova_act import NovaAct, ActInvalidModelGenerationError, BOOL_SCHEMA\nwith NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\") as nova:\n    try:\n        result = nova.act_get(\"我是否已登录？\", schema=BOOL_SCHEMA)\n    except ActInvalidModelGenerationError as e:\n        # act 响应未匹配模式 ¯\\_(ツ)_\u002F¯\n        print(f\"无效结果：{e}\")\n    else:\n        # result.parsed_response 现在是一个布尔值\n        if result.parsed_response:\n            print(\"您已登录\")\n        else:\n            print(\"您未登录\")\n```\n\n### 人机协作（HITL）\n\nNova Act 的人机协作（HITL）功能可在自动化 Web 工作流中实现无缝的人工监督。HITL 已集成到 Nova Act SDK 中，供您在工作流中实现（并非作为 AWS 托管服务提供）。当您的工作流遇到需要人工判断或干预的场景时，HITL 可以提供工具和用户界面，供监督人员协助、验证或接管流程。\n\n#### HITL 模式\n\n##### 人工审批\n\n人工审批允许在自动化流程中进行异步的人工决策。当 Nova Act 遇到需要人工判断的决策点时，它会捕获当前页面的截图，并通过基于浏览器的界面呈现给人工审核员。当您需要二元或多项选择决策（批准\u002F拒绝、是\u002F否，或从预定义选项中选择）时，可使用此模式。\n\n##### UI 接管\n\nUI 接管允许实时控制远程浏览器会话。当 Nova Act 遇到需要人工交互的任务时，它会通过实时流媒体界面将浏览器控制权交给人工操作员。操作员可以使用鼠标和键盘实时与浏览器进行交互。\n\n#### 实现 HITL\n\n请参阅 [Amazon Nova Act 用户指南中的 HITL 文档](https:\u002F\u002Fdocs.aws.amazon.com\u002Fnova-act\u002Flatest\u002Fuserguide\u002Fhitl.html#implementing-hitl)，了解如何在生产工作流中实现 HITL。\n\n##### 使用 SDK 实现 HITL\n\n要在 Nova Act SDK 中实现 HITL 模式，需定义一个继承自 `HumanInputCallbacksBase` 的类，并实现其两个抽象方法 `approve` 和 `ui_takeover`。然后将该类的实例传递给 `NovaAct` 构造函数的 `human_input_callbacks` 参数。\n\n- `approve`：用于触发人工审批模式的回调（例如，审批费用报告或采购申请）。\n- `ui_takeover`：用于触发 UI 接管模式的回调（例如，解决 CAPTCHA 挑战）。\n\n```python\nfrom nova_act import NovaAct, Workflow\nfrom nova_act.tools.human.interface.human_input_callback import (\n    ApprovalResponse, HumanInputCallbacksBase, 
UiTakeoverResponse,\n)\n\nclass MyHumanInputCallbacks(HumanInputCallbacksBase):\n    def approve(self, message: str) -> ApprovalResponse:\n        ...\n\n    def ui_takeover(self, message: str) -> UiTakeoverResponse:\n        ...\n\nwith NovaAct(\n    starting_page=...,\n    tty=False,\n    human_input_callbacks=MyHumanInputCallbacks(),\n) as nova:\n    result = nova.act_get(...)  # 在此处填入您的提示\n    print(f\"任务完成：{result.response}\")\n```\n\n有关实际示例，请参阅 [此示例](.\u002Fsrc\u002Fnova_act\u002Fsamples\u002Fprint_number_of_emails.py)。\n\n### 浏览器之外的工具使用（预览）\n\n（预览）Nova Act 允许您将浏览器之外的外部工具，如 API 调用或数据库查询，集成到工作流中。Nova Act SDK 支持将 Python 函数用作工具，在工作流步骤中调用。要使 Python 函数可用作工具，只需使用 `@tool` 装饰器对其进行标注。您可以将工具列表传递给 `NovaAct` 构造函数的 `tools` 参数。\n\n```python\nfrom nova_act import NovaAct, tool\n\n@tool\ndef my_tool(input: str) -> str:\n    ...\n\nwith NovaAct(\n    starting_page=...,\n    tools=[my_tool],\n) as nova:\n    ...\n```\n\n有关实际示例，请参阅 [此示例](.\u002Fsrc\u002Fnova_act\u002Fsamples\u002Fbooking_with_data_from_tool.py)。\n\n用户还可以通过利用 [Strands MCP 客户端](https:\u002F\u002Fstrandsagents.com\u002Flatest\u002Fdocumentation\u002Fdocs\u002Fuser-guide\u002Fconcepts\u002Ftools\u002Fmcp-tools\u002F)，从 MCP 服务器提供工具：\n\n```python\nfrom mcp import StdioServerParameters, stdio_client\nfrom nova_act import NovaAct\nfrom strands.tools.mcp import MCPClient\n\nwith MCPClient(\n    lambda: stdio_client(\n        StdioServerParameters(command=\"uvx\", args=[\"awslabs.aws-documentation-mcp-server@latest\"])\n    )\n) as aws_docs_client:\n    with NovaAct(\n        starting_page=\"https:\u002F\u002Faws.amazon.com\u002F\", tools=aws_docs_client.list_tools_sync(),\n    ) as nova:\n        print(\n            nova.act_get(\n                \"使用 'search_documentation' 工具告诉我关于 Amazon Bedrock 的信息，以及如何用 Python 使用它。\"\n                \"忽略网页浏览器；不要点击、滚动、输入等。\"\n            )\n        )\n\n```\n\n#### 进阶：需要浏览器控制的工具\n\n如果您的自定义工具需要直接与浏览器交互（类似于 HITL 工具），可以将其标记为 `requires_unlocked_actuator_context = True`。这会在工具执行期间暂时暂停执行器的内部钩子，从而允许外部进程控制浏览器。\n\n```python\nfrom nova_act import 
NovaAct, tool\n\n@tool\ndef my_browser_control_tool(message: str) -> str:\n    \"\"\"需要直接访问浏览器的工具。\"\"\"\n    # ... 外部与浏览器交互\n    return \"完成\"\n\n# 将工具标记为需要解锁上下文\nmy_browser_control_tool.requires_unlocked_actuator_context = True\n\nwith NovaAct(starting_page=..., tools=[my_browser_control_tool]) as nova:\n    nova.act(\"使用 my_browser_control_tool 做点什么\")\n```\n\n执行器会在下一次代理动作时自动重新锁定上下文。\n\n### 处理 ActError 错误\n\n一旦 `NovaAct` 客户端启动，在执行 `act()` 时可能会遇到错误。所有这些错误类型都包含在 [`nova_act.types.act_errors` 模块](.\u002Fsrc\u002Fnova_act\u002Ftypes\u002Fact_errors.py)中，并按以下类别组织：\n1. `ActAgentError`：表示请求的提示未能完成；用户可以尝试使用不同的请求重试。\n   * 示例包括：`ActAgentFailed`（代理因任务无法完成而引发错误）、`ActInvalidModelGenerationError`（模型生成了无法解析的输出）或 `ActExceededMaxStepsError`（`act()` 未在配置的最大步骤数内完成）。\n1. `ActExecutionError`：表示在执行代理的有效输出时遇到了本地错误。\n   * 示例包括：`ActActuationError`（客户端在驱动浏览器时遇到异常）或 `ActCanceledError`（用户取消了执行）。\n1. `ActClientError`：表示对 Nova Act 服务的请求无效；用户可以尝试使用不同的请求重试。\n   * 示例包括：`ActGuardrailsError`（请求被我们的 RAI 安全护栏阻止）或 `ActRateLimitExceededError`（请求被限流；应降低请求频率）。\n1. `ActServerError`：表示 Nova Act 服务在处理请求时遇到了错误。\n   * 示例包括：`ActInternalServerError`（处理请求时发生内部错误）、`ActBadResponseError`（服务返回了形状无法识别的响应）或 `ActServiceUnavailableError`（无法连接到服务）。\n\n用户可以捕获 `ActAgentError` 和 `ActClientError` 并使用适当的请求重试；对于 `ActExecutionError` 和 `ActServerError`，请向团队提交问题以进行调查，包括 (1) 您使用的 SDK 版本、(2) 您的平台和操作系统、(3) 完整的错误堆栈跟踪，以及 (4) 复现步骤。\n\n### 并行运行多个会话\n一个 `NovaAct` 实例一次只能操作一个浏览器。然而，通过使用多个 `NovaAct` 实例，可以同时操作多个浏览器！这些实例非常轻量级。你可以利用这一点来并行化任务的某些部分，从而实现一种针对互联网的浏览器“map-reduce”模式。[这个示例](.\u002Fsrc\u002Fnova_act\u002Fsamples\u002Fsearch_apartments_calculate_commute.py) 展示了如何并行运行多个会话。\n\n### 身份验证、Cookie 和持久化的浏览器状态\n\nNova Act 支持通过覆盖其默认设置来处理已身份验证的浏览器会话。默认情况下，当 Nova Act 运行时，它会克隆 Chromium 的用户数据目录，并在运行结束时将其删除。要使用已身份验证的会话，你需要指定一个包含已身份验证会话的现有目录，并禁用克隆功能（这也会禁用对该目录的删除）。\n\n具体步骤如下：\n\n1. （可选）为用户数据目录创建一个新的本地目录，例如 `\u002Ftmp\u002Fuser-data-dir`。你也可以跳过此步骤，直接使用现有的 Chromium 配置文件。\n2. 在实例化 `NovaAct` 时，通过 `user_data_dir` 参数指定该目录。\n3. 
在实例化 `NovaAct` 时，通过传递参数 `clone_user_data_dir=False` 来禁用对该目录的克隆。\n4. 指示 Nova Act 打开你要进行身份验证的网站。\n5. 在这些网站上完成身份验证。有关输入敏感信息的更多信息，请参阅下文的 [输入敏感信息](#entering-sensitive-information)。\n6. 停止你的 Nova Act 会话。\n\n下次你运行 Nova Act 并将 `user_data_dir` 设置为你在第一步中创建的目录时，你将从一个已身份验证的会话开始。在后续运行中，你可以决定是否启用或禁用克隆功能。如果你正在并行运行多个 `NovaAct` 实例，则每个实例都需要创建自己的副本，因此在这种情况下必须启用克隆功能（`clone_user_data_dir=True`）。\n\n以下是一个示例脚本，展示了如何传递这些参数：\n\n```python\nimport os\n\nfrom nova_act import NovaAct\n\nuser_data_dir = \"\u002Ftmp\u002Fuser-data-dir\"  # 示例路径，请替换为你自己的目录\nos.makedirs(user_data_dir, exist_ok=True)\n\nwith NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\", user_data_dir=user_data_dir, clone_user_data_dir=False) as nova:\n    input(\"请登录到你的网站，然后按回车键...\")\n    # 在此处添加你的 nova.act() 语句。\n\nprint(f\"用户数据目录已保存至 {user_data_dir=}\")\n```\n\n该脚本包含在安装包中：`python -m nova_act.samples.setup_chrome_user_data_dir`。\n\n#### 使用本地默认的 Chrome 浏览器运行\n\n如果你的本地默认 Chrome 浏览器安装了某些扩展程序或安全功能，而这些是你的工作流需要访问的网站所必需的，那么你可以通过以下 `NovaAct` 参数配置 SDK，使其使用你机器上安装的 Chrome 浏览器，而不是由 SDK 管理的浏览器。\n\n> **重要提示：**\n>\n> - 此功能目前仅适用于 MacOS。\n> - 这将退出你当前运行的 Chrome 浏览器，并以新的参数重新启动它。会话结束时，Chrome 将被退出。\n> - 如果你的 Chrome 浏览器打开了许多标签页，建议在运行自动化之前关闭不必要的标签页，因为重启过程中 Chrome 的性能可能会受到大量打开标签的影响。\n\n在使用此功能启动 Nova Act 之前，你必须将系统 Chrome 用户数据目录中的文件复制到你选择的位置。这是因为 Chrome 不允许通过 CDP 连接到使用系统默认用户数据目录启动的实例。\n\n手动操作可以通过以下命令完成：\n```\nrsync -a --exclude=\"Singleton*\" \u002FUsers\u002F$USER\u002FLibrary\u002FApplication\\ Support\u002FGoogle\u002FChrome\u002F \u003C你选择的位置>\n```\n\n你也可以使用便捷函数 `rsync_from_default_user_data(\u003C你选择的位置>)` 来在脚本中创建和更新该目录。请注意，调用 `rsync_from_default_user_data` 会覆盖目标目录中的更改，并使其成为 `\u002FUsers\u002F$USER\u002FLibrary\u002FApplication\\ Support\u002FGoogle\u002FChrome\u002F` 的精确镜像，即用源目录中同名的文件覆盖目标目录中的文件，并删除目标目录中那些在源目录中不存在的文件。如果你想将 Nova Act 在工作目录中对配置文件所做的更改永久保存回系统默认目录，那么在停止 Nova Act 后，你必须通过自己的实现将这些更改再次同步回系统默认目录。\n\n使用此功能时，必须指定 `clone_user_data_dir=False`，并将填充好相应文件的工作目录作为 `user_data_dir` 传入。这是因为在该模式下，`NovaAct` 不会为你克隆或删除 `user_data_dir`。\n\n```python\n>>> from nova_act import NovaAct, rsync_from_default_user_data\n>>> working_user_data_dir = 
\"\u002FUsers\u002F$USER\u002Fyour_choice_of_path\"\n>>> rsync_from_default_user_data(working_user_data_dir)\n>>> nova = NovaAct(use_default_chrome_browser=True, clone_user_data_dir=False, user_data_dir=working_user_data_dir, starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\")\n>>> nova.start()\n>>> nova.act_get(\"查找2月22日从波士顿飞往沃尔夫的航班\")\n...\n>>> nova.stop()\n>>> quit()\n```\n\n### 输入敏感信息\n\n要输入密码或其他敏感信息（例如信用卡号和社保号码），不要直接向模型提供这些敏感信息。而是让模型专注于你想要填写的表单元素。然后直接使用 Playwright API 来输入数据，例如 `nova.page.keyboard.type(sensitive_string)`。你可以通过任何你希望的方式获取这些数据：在命令行中使用 [`getpass`](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Fgetpass.html) 提示输入、通过命令行参数传递，或者设置环境变量。\n\n请注意，在没有系统级密钥环（如 Libsecret 或 KWallet）的 Linux 系统上，任何使用基于 Chromium 的浏览器密码管理器保存的密码或其他敏感数据都会以明文形式存储在用户的配置文件目录中。\n\n> **注意：** 如果你指示 Nova Act 对任何显示敏感信息的浏览器界面执行操作，包括通过 Playwright API 提供的信息，这些信息都将被包含在收集的截图中。\n\n```python\nfrom getpass import getpass\n\n# 登录。\nnova.act(\"输入用户名 janedoe 并点击密码字段\")\n# 从命令行获取密码并通过 Playwright 输入。（不会通过网络发送。）\nnova.page.keyboard.type(getpass())\n# 现在用户名和密码已经填写完毕，请求 NovaAct 继续操作。\nnova.act(\"登录\")\n```\n\n### 安全选项\n\nNovaAct 启动时采用安全的默认行为，但根据您的使用场景，您可能希望放宽这些限制。\n\n#### 允许导航到本地 `file:\u002F\u002F` URL\n\n要启用本地文件导航，请在 `SecurityOptions.allowed_file_open_paths` 中定义一个或多个文件路径模式：\n\n```python\nfrom nova_act import NovaAct, SecurityOptions\n\nNovaAct(starting_page=\"file:\u002F\u002F\u002Fhome\u002Fnova-act\u002Fsite\u002Findex.html\", security_options=SecurityOptions(allowed_file_open_paths=['\u002Fhome\u002Fnova-act\u002Fsite\u002F*']))\n```\n\n#### 允许文件上传\n\n要允许代理将文件上传到网站，请在 `SecurityOptions.allowed_file_upload_paths` 中定义一个或多个文件路径模式：\n\n```python\nfrom nova_act import NovaAct, SecurityOptions\n\nNovaAct(starting_page=\"https:\u002F\u002Fexample.com\", security_options=SecurityOptions(allowed_file_upload_paths=['\u002Fhome\u002Fnova-act\u002Fshared\u002F*']))\n```\n\n#### 文件路径结构\n文件路径参数支持以下格式：\n- `[\"\u002Fhome\u002Fnova-act\u002Fshared\u002F*\"]` - 允许从特定目录访问\n- `[\"\u002Fhome\u002Fnova-act\u002Fshared\u002Ffile.txt\"]` 
- 允许访问特定文件路径\n- `[\"*\"]` - 对所有路径启用\n- `[]` - 禁用该功能（默认）\n\n### 状态护栏\n\n状态护栏允许您控制代理在执行过程中可以访问的 URL。您可以提供一个回调函数，在每次观察后检查浏览器状态，并决定是否允许继续执行。如果被阻止，`act()` 将引发 `ActStateGuardrailError`。这有助于防止代理导航到未经授权的域名或敏感页面。\n\n```python\nfrom nova_act import NovaAct, GuardrailDecision, GuardrailInputState\nfrom urllib.parse import urlparse\nimport fnmatch\n\ndef url_guardrail(state: GuardrailInputState) -> GuardrailDecision:\n    hostname = urlparse(state.browser_url).hostname\n    if not hostname:\n        return GuardrailDecision.BLOCK\n\n    # 示例 URL 阻止列表\n    blocked = [\"*.blocked-domain.com\", \"*.another-blocked-domain.com\"]\n    if any(fnmatch.fnmatch(hostname, pattern) for pattern in blocked):\n        return GuardrailDecision.BLOCK\n\n    # 示例 URL 允许列表\n    allowed = [\"allowed-domain.com\", \"*.another-allowed-domain.com\"]\n    if any(fnmatch.fnmatch(hostname, pattern) for pattern in allowed):\n        return GuardrailDecision.PASS\n\n    return GuardrailDecision.BLOCK\n\nwith NovaAct(starting_page=\"https:\u002F\u002Fallowed-domain.com\", state_guardrail=url_guardrail) as nova:\n    # 如果代理尝试访问被列入阻止列表的域名或离开允许列表中的域名，以下操作将被阻止\n    nova.act(\"导航到主页\")\n```\n\n### 验证码\n\n如果您的脚本在某些地方遇到验证码，应使用 `ui_takeover` 回调函数（参见 [HITL](#human-in-the-loop-hitl)），以便将解决验证码的步骤转交给人工处理。\n\n### 在网站上搜索\n\n```python\nnova.go_to_url(website_url)\nnova.act(\"搜索猫\")\n```\n\n如果模型难以找到搜索按钮，您可以指示它按下回车键以启动搜索：\n\n```python\nnova.act(\"搜索猫。按回车键启动搜索。\")\n```\n\n### 文件上传和下载\n\n您可以使用 Playwright 下载网页上的文件。\n\n通过下载操作按钮：\n\n```python\n# 让 Playwright 捕获任何下载，然后触发页面开始下载。\nwith nova.page.expect_download() as download_info:\n    nova.act(\"点击下载按钮\")\n\n# 可以获取下载的临时路径。\nprint(f\"已下载文件 {download_info.value.path()}\")\n\n# 现在将下载的文件永久保存到您选择的位置。\ndownload_info.value.save_as(\"my_downloaded_file\")\n```\n\n> **重要提示**：\n>\n> - 浏览器会显示文件正在下载到 Playwright 定义的临时路径（[参阅文档](https:\u002F\u002Fplaywright.dev\u002Fdocs\u002Fdownloads#introduction)）\n>    - 该临时路径可通过 `download_info.value.path()` 获取\n>  - 使用 `download_info.value.save_as()` 
时：\n>    - 如果提供了完整路径（例如 `\u002Fpath\u002Fto\u002Fmy_downloaded_file`），文件将保存到该路径\n>    - 如果只提供了文件名（例如 `my_downloaded_file`），文件将保存到运行 Python 脚本的当前工作目录\n\n要下载当前页面：\n\n1. 如果是 HTML，则通过访问 `nova.page.content()` 可以获取渲染后的 DOM，您可以将其保存为文件。\n2. 如果是其他内容类型，例如 PDF，可以使用 `nova.page.request` 进行下载：\n\n```python\n# 使用 Playwright 的请求下载内容。\nresponse = nova.page.request.get(nova.page.url)\nwith open(\"downloaded.pdf\", \"wb\") as f:\n    f.write(response.body())\n```\n\nNovaAct 可以原生地使用页面上的相应上传操作上传文件。为此，您首先需要允许 NovaAct 访问要上传的文件，然后指示它按文件名上传：\n\n```python\nupload_filename = \"\u002Fupload_path\u002Fupload_me.pdf\"\n\nwith NovaAct(..., security_options=SecurityOptions(allowed_file_upload_paths=[\"\u002Fupload_path\u002F*\"])) as nova:\n    nova.act(f\"使用上传收据按钮上传 {upload_filename}\")\n```\n\n> **重要的安全提示**：\n>\n> 请谨慎选择 `allowed_file_upload_paths`，以尽量减少 NovaAct 对您文件系统的访问权限，从而避免恶意网站或网络内容窃取数据。\n\n### 处理浏览器对话框\n\nPlaywright 默认会自动关闭浏览器原生对话框，例如 [alert](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FAPI\u002FWindow\u002Falert)、[confirm](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FAPI\u002FWindow\u002Fconfirm) 和 [prompt](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FAPI\u002FWindow\u002Fprompt)。如需手动处理这些对话框，请在 Nova Act 执行触发对话框的操作之前注册一个对话框处理器。例如：\n\n```python\ndef handle_dialog(dialog):\n    \"\"\"处理对话框，打印其消息并接受它\"\"\"\n    print(f\"对话框消息：{dialog.message}\")\n    dialog.accept()  # 接受并关闭对话框\n    # dialog.dismiss()  # 或取消\u002F关闭对话框\n\n# 注册处理器\nnova.page.on(\"dialog\", handle_dialog)\n# 触发对话框\nnova.act(\"做一些会导致对话框出现的事情\")\n# 取消注册处理器\nnova.page.remove_listener(\"dialog\", handle_dialog)\n```\n\n有关详细信息，请参阅 [Playwright 文档](https:\u002F\u002Fplaywright.dev\u002Fpython\u002Fdocs\u002Fdialogs#alert-confirm-prompt-dialogs)。\n\n### 选择日期\n\n以绝对时间指定开始和结束日期效果最佳。\n\n```python\nnova.act(\"选择3月23日至3月28日\")\n```\n\n### 设置浏览器用户代理\n\nNova Act 配备了 Playwright 的 Chrome 和 Chromium 浏览器。这些浏览器使用 Playwright 设置的默认用户代理。您可以通过 `user_agent` 
选项覆盖此设置：\n\n```python\nnova = NovaAct(..., user_agent=\"MyUserAgent\u002F2.7\")\n```\n\n### 使用代理\n\nNova Act 支持浏览器会话的代理配置。当您需要通过特定的代理服务器路由流量时，这将非常有用：\n\n```python\n# 基本代理，无需身份验证\nproxy_config = {\n    \"server\": \"http:\u002F\u002Fproxy.example.com:8080\"\n}\n\n# 带认证的代理\nproxy_config = {\n    \"server\": \"http:\u002F\u002Fproxy.example.com:8080\",\n    \"username\": \"myusername\",\n    \"password\": \"mypassword\"\n}\n\nnova = NovaAct(\n    starting_page=\"https:\u002F\u002Fexample.com\",\n    proxy=proxy_config\n)\n```\n\n> **注意:** 如果连接到 CDP 端点，负责启动浏览器和管理生命周期的代码需要配置代理。这些配置参数仅在 NovaAct 创建并启动浏览器时才适用。\n\n\n### 日志记录\n默认情况下，`NovaAct` 会输出所有日志级别为 `logging.INFO` 或更高的日志信息。可以通过设置环境变量 `NOVA_ACT_LOG_LEVEL` 来覆盖此行为，该变量应指定一个整数值，对应于 [Python 日志级别](https:\u002F\u002Fdocs.python.org\u002F3\u002Flibrary\u002Flogging.html#logging-levels)。\n \n### 查看 act 跟踪信息\n \n在 `act()` 执行完成后，它会将所执行的操作以自包含的 HTML 文件形式输出。文件的位置会在控制台日志中打印出来。\n \n```sh\n> ** 在此处查看您的 act 运行结果：\u002Fvar\u002Ffolders\u002F6k\u002F75j3vkvs62z0lrz5bgcwq0gw0000gq\u002FT\u002Ftmpk7_23qte_nova_act_logs\u002F15d2a29f-a495-42fb-96c5-0fdd0295d337\u002Fact_844b076b-be57-4014-b4d8-6abed1ac7a5e_output.html\n```\n \n您可以通过向 `NovaAct` 传递 `logs_directory` 参数来更改此目录。\n\n### 工作时间跟踪工具\n\ntime_worked 工具用于跟踪并报告智能体在任务上花费的大致时间，不包括等待人工输入的时间。这有助于您了解智能体的实际执行时间。\n\n#### 工作原理\n大致工作时间使用以下基本公式计算：\n```\ntime_worked = (end_time - start_time) - human_wait_time\n```\n\n当 `act()` 调用完成（无论成功还是失败）时，会计算以下内容：\n- **大致工作时间**：总执行时间（结束时间减去开始时间）减去任何等待人工输入的时间\n- **人工等待时间**：从发出 `approve()` 或 `ui_takeover()` 回调请求到智能体继续执行之间所花费的时间\n\n#### 控制台输出\n\n每次 `act()` 调用结束时，您都会在控制台以及 JSON 和 HTML 报告中看到工作时间摘要：\n\n没有人工输入时：\n```\n⏱️ 大致工作时间：11.8秒\n```\n\n有人工输入时：\n```\n⏱️ 大致工作时间：28.3秒（不包括4.5秒的人工等待时间）\n```\n\n#### 重要声明\n\n> **注意:** 工作时间的计算是近似的，可能会因系统计时差异、网络延迟或其他因素而出现误差。此指标应被视为帮助理解智能体执行模式的辅助工具，不应用于正式的时间记录或计费目的。\n\n### 录制会话\n \n您可以通过设置 `logs_directory` 并在 `NovaAct` 构造函数中指定 `record_video=True`，轻松地在本地录制整个浏览器会话。\n\n### 将会话数据存储到您的 Amazon S3 存储桶\n\nNova Act 允许您使用 `S3Writer` 便捷工具将会话数据（HTML 
跟踪、截图等）存储到您自己的 [Amazon S3](https:\u002F\u002Faws.amazon.com\u002Fs3\u002F) 存储桶中：\n\n```python\nimport boto3\nfrom nova_act import NovaAct\nfrom nova_act.util.s3_writer import S3Writer\n\n# 使用适当的凭证创建 boto3 会话\nboto_session = boto3.Session()\n\n# 创建 S3Writer\ns3_writer = S3Writer(\n    boto_session=boto_session,\n    s3_bucket_name=\"my-bucket\",\n    s3_prefix=\"my-prefix\u002F\",  # 可选\n    metadata={\"Project\": \"MyProject\"}  # 可选\n)\n\n# 将 S3Writer 与 NovaAct 结合使用\nwith NovaAct(\n    starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\",\n    boto_session=boto_session,  # 您也可以在此处使用 API 密钥代替\n    stop_hooks=[s3_writer]\n) as nova:\n    result = nova.act_get(\"查找2月22日从波士顿飞往狼城的航班\")\n```\n\nS3Writer 需要以下 AWS 权限：\n- 对存储桶和前缀的 s3:ListObjects 权限\n- 对存储桶和前缀的 s3:PutObject 权限\n\n当 NovaAct 会话结束时，所有会话文件将自动上传到指定的 S3 存储桶，并带有提供的前缀。\n\n#### S3 上传故障排除\n\n**S3 存储桶中没有文件？**\n- 检查初始化过程中是否有“已注册停止钩子”的日志信息\n- 确认您的代码路径确实执行了 NovaAct 上下文管理器\n\n### 页面导航\n\n> **请使用 `nova.go_to_url` 代替 `nova.page.goto`**\n\nPlaywright Page 的 `goto()` 方法默认超时时间为 30 秒，这可能导致加载缓慢的网站出现失败。如果页面在此时间内未加载完毕，`goto()` 会抛出 `TimeoutError`，从而可能中断您的工作流程。此外，`goto()` 并不总是与 act 配合良好，因为 Playwright 可能会在页面完全加载之前就认为页面已就绪。\n为了解决这些问题，我们实现了一个新的函数 `go_to_url()`，它可以提供更可靠的导航。您可以在调用 `nova.start()` 后，通过 `nova.go_to_url(url)` 来使用它。您还可以在 `NovaAct` 初始化时使用 `go_to_url_timeout` 参数来修改起始页面加载以及后续 `go_to_url()` 调用的默认最大等待时间（以秒为单位）。\n\n### 查看以无头模式运行的会话\n\n当浏览器以无头模式运行时（`headless: True`），您可能需要查看智能体正在执行的工作流程进展。为此：\n1. 在启动 Nova Act 工作流之前，设置以下环境变量：\n```bash\nexport NOVA_ACT_BROWSER_ARGS=\"--remote-debugging-port=9222\"\n```\n2. 按照正常方式启动 Nova Act 工作流，同时保持 `headless: True`。\n3. 打开本地浏览器访问 `http:\u002F\u002Flocalhost:9222\u002Fjson`。\n4. 查找类型为 `page` 的条目，并将其 `devtoolsFrontendUrl` 复制粘贴到浏览器中。\n\n现在您将能够观察无头浏览器中的活动。您还可以像平常一样与浏览器窗口进行交互，这对于处理验证码非常有帮助。例如，在您的 Python 脚本中：\n1. 让 Nova Act 检查是否存在验证码\n2. 如果存在，就让脚本暂停一段时间，然后返回步骤 1。在暂停期间……\n3. 
发送一封电子邮件或短信提醒（例如使用 [Amazon Simple Notification Service](https:\u002F\u002Faws.amazon.com\u002Fsns\u002F)），其中包含 `devtoolsFrontendUrl`，表明需要人工干预\n4. 人工用户打开 `devtoolsFrontendUrl` 并解决验证码\n5. 下次运行步骤 1 时，Nova Act 会发现验证码已被解决，脚本将继续执行。\n\n请注意，如果您在远程主机上运行 Nova Act，可能需要设置端口转发才能从其他系统访问。\n\n## 将 Nova Act SDK 与 Amazon Bedrock AgentCore 浏览器工具结合使用\n\nNova Act SDK 可以与 [Amazon Bedrock AgentCore 浏览器工具](https:\u002F\u002Fdocs.aws.amazon.com\u002Fbedrock-agentcore\u002Flatest\u002Fdevguide\u002Fbrowser-tool.html) 配合使用，实现面向生产的规模化浏览器自动化。AgentCore 浏览器工具提供完全托管的云上浏览器自动化解决方案，解决了实时数据访问方面的限制；而 Nova Act SDK 则赋予您构建复杂智能体工作流的灵活性。\n有关集成说明，请参阅[这篇博客文章](https:\u002F\u002Faws.amazon.com\u002Fblogs\u002Fmachine-learning\u002Fintroducing-amazon-bedrock-agentcore-browser-tool\u002F)。\n\n> **注意**：当 Nova Act SDK 和 Bedrock AgentCore 浏览器运行在不同操作系统上时（例如，SDK 在 macOS 上，AgentCore 浏览器在 Linux 上），键盘命令可能无法正确跨系统转换。这会影响某些 SDK 函数，如 `agent_type()`，该函数使用依赖操作系统的键盘快捷键（例如“全选”的 `ControlOrMeta+A`）。这种行为是跨操作系统集成架构的预期结果，在开发使用键盘输入方法的自动化流程时应予以考虑。\n\n## 已知限制\n我们对 Nova Act 的愿景是提供关键能力，以规模化构建实用的智能体。如果您在使用 Nova Act 时遇到任何限制，请通过 [nova-act@amazon.com](mailto:nova-act@amazon.com?subject=Nova%20Act%20Bug%20Report) 向我们反馈，帮助我们不断改进。\n\n例如：\n\n* `act()` 无法与非浏览器应用程序交互；\n* `act()` 无法与浏览器窗口直接交互。这意味着，诸如请求访问您的位置信息之类的浏览器弹出窗口不会干扰 `act()` 的执行，但如果需要处理，则必须手动确认；\n* 屏幕尺寸限制：\n  * Nova Act 针对 `864×1296` 至 `1536×2304` 的分辨率进行了优化；超出此范围可能会导致性能下降；\n  * 您可以使用 `screen_width` 和 `screen_height` 参数调整屏幕尺寸（例如，`screen_width=1920, screen_height=1080`）。\n\n更多详细信息请参阅 Amazon Nova Act 的 AWS AI 服务卡片。\n\n## 参考\n\n### 初始化 `NovaAct`\n\n构造函数接受以下参数：\n\n* `starting_page (str)`：起始页面的 URL，支持 Web URL（`https:\u002F\u002F`）和本地文件 URL（`file:\u002F\u002F`）（必填参数）；\n  * 注意：使用文件 URL 时，需在构造函数中传入 `ignore_https_errors=True`。\n* `headless (bool)`：是否以无头模式启动浏览器（默认为 `False`）。\n* `user_data_dir (str)`：用户数据目录的路径，用于存储浏览器会话数据，如 Cookie 和本地存储（默认为 `None`）。\n* `nova_act_api_key (str)`：您生成用于身份验证的 API 密钥；如果未设置 `NOVA_ACT_API_KEY` 环境变量，则此参数为必填项。若同时提供了环境变量和此参数，则优先使用此参数。\n* `logs_directory (str)`：NovaAct 
将输出日志、运行信息以及视频（若 `record_video` 设置为 `True`）的目录。\n* `record_video (bool)`：是否录制视频并保存到 `logs_directory`。必须指定 `logs_directory` 才能进行视频录制。\n* `proxy (dict)`：浏览器代理配置，应为包含以下字段的字典：\n  * `server`（必填）：代理服务器 URL（必须以 `http:\u002F\u002F` 或 `https:\u002F\u002F` 开头）；\n  * `username`（可选）：代理认证用户名；\n  * `password`（可选）：代理认证密码；\n  * 注意：连接到 CDP 端点或使用默认 Chrome 浏览器时不支持代理。\n* `human_input_callbacks`（可选）：人工输入回调的实现。若未提供，则不会触发人工输入请求。\n* `tools`（可选）：客户端提供的工具列表。\n\n此操作将创建一个浏览器会话。您可以根据需要创建任意数量的浏览器会话并并行运行，但每个会话必须是单线程的。\n\n### 控制浏览器\n\n#### 使用 `act`\n\n`act()` 接受用户的自然语言提示，并代表用户在浏览器窗口中执行操作以达成目标。参数：\n\n* `max_steps (int)`：设置 `act()` 在放弃任务前最多执行的步骤数。建议使用此参数，以防止智能体因尝试不同路径而无限循环。默认值为 30。\n* `timeout (int)`：整个 `act` 调用的超时时间（单位：秒）。建议优先使用 `max_steps`，因为每步所需时间可能因模型服务器负载和网站延迟而异。\n* `observation_delay_ms`：在对页面进行观察之前额外增加的毫秒级延迟。有助于等待 UI 动画完成。\n\n返回一个 `ActResult` 对象。\n\n```python\nclass ActResult:\n    metadata: ActMetadata\n\nclass ActMetadata:\n    session_id: str | None\n    act_id: str | None\n    num_steps_executed: int\n    start_time: float\n    end_time: float\n    prompt: str\n```\n\n如果向 `act()` 传递了模式（`act_get()` 函数会便捷地提供默认的 `STRING_SCHEMA`），则返回的对象将是 `ActGetResult`，它是 `ActResult` 的子类，包含原始响应和结构化响应：\n\n```python\nclass ActGetResult(ActResult):\n    response: str | None\n    parsed_response: JSONType\n    valid_json: bool | None\n    matches_schema: bool | None\n```\n\n#### 以编程方式操作\n\n`NovaAct` 直接通过 `page` 属性暴露了 Playwright 的 [`Page`](https:\u002F\u002Fplaywright.dev\u002Fpython\u002Fdocs\u002Fapi\u002Fclass-page) 对象。\n\n这可用于获取浏览器的当前状态，例如截屏或 DOM，也可以直接对浏览器进行操作：\n\n```python\nscreenshot_bytes = nova.page.screenshot()\ndom_string = nova.page.content()\nnova.page.keyboard.type(\"hello\")\n```\n\n## 免责声明\n\n注意：在使用 Nova Act Playground 和\u002F或选择采用 API 密钥认证的 Nova Act 开发者工具时，访问和使用须遵守 nova.amazon.com 的《使用条款》。当您选择采用 AWS IAM 认证的 Nova Act 开发者工具，以及\u002F或者将工作流部署至 Nova Act AWS 服务时，则适用您的 AWS 服务条款和\u002F或客户协议（或其他规范您使用 AWS 服务的协议）。\n\n1. Nova Act 并非总是能够正确执行任务。\n2. 
⚠️ 请注意，Nova Act 在其观察到的第三方网站内容中，可能会遇到命令，这些内容可能来自受信任网站上的用户生成内容，例如社交媒体帖子、搜索结果、论坛评论、新闻文章以及文档附件等。此类未经授权的命令被称为“提示注入”，可能导致模型出现错误，或以与指令不符的方式行事，例如忽略您的指示、执行未授权操作，或泄露敏感数据。为降低提示注入带来的风险，务必对 Nova Act 的行为进行监控并审查其操作，尤其是在处理不可信的用户贡献内容时。\n3. 我们建议您不要向 Nova Act 提供敏感信息，例如账户密码。请注意，如果您通过 Playwright 调用传递敏感信息，且该信息在浏览器中清晰可见，那么在 Nova Act 执行相应操作时，这些信息可能会被截图记录下来。（请参阅下方“输入敏感信息”部分。）\n4. 当您在 nova.amazon.com\u002Fact 上选择使用 API 密钥认证的开发者工具时，我们会收集与 Nova Act 的交互信息，包括浏览器内的截图，用于开发和改进我们的服务。如需删除您的 Nova Act 数据，请发送邮件至 nova-act@amazon.com。\n5. 请勿分享您在 https:\u002F\u002Fnova.amazon.com\u002Fact 上生成的 API 密钥。任何拥有您 API 密钥的人，均可利用该密钥在您的 Amazon 账户下操作 Nova Act。如果您遗失了 API 密钥，或怀疑他人可能已获取该密钥，请前往 https:\u002F\u002Fnova.amazon.com\u002Fact 停用现有密钥并重新获取新的密钥。\n6. 如果您使用的是我们的默认浏览环境，请在用户代理字符串中查找“NovaAct”以识别我们的代理。如果您在自己的浏览环境中运行 Nova Act，或自定义了用户代理字符串，我们建议您也包含相同的字符串。\n\n## 报告问题\n\n帮助我们不断改进！如果您发现任何问题，请通过 nova-act@amazon.com 向我们提交问题报告。\n\n请确保在邮件中包含以下内容：\n- 问题描述；\n- 会话 ID（已在控制台日志中打印）；以及\n- 您正在使用的流程脚本。\n\n您的反馈对我们提升所有用户的体验至关重要。\n\n感谢您试用 Nova Act！","# Nova Act 快速上手指南\n\nNova Act 是亚马逊推出的一项新服务，旨在构建和管理可靠的 AI 代理集群，用于大规模自动化生产环境中的 UI 工作流。它结合自然语言提示与 Python 代码，自动在浏览器中完成重复性任务，并在必要时升级至人工监督。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：\n    *   macOS Sierra 及以上\n    *   Ubuntu 22.04 及以上\n    *   WSL2 或 Windows 10+\n*   **Python 版本**：3.10 或更高\n*   **浏览器**：推荐使用 Google Chrome（Nova Act 基于 Playwright，需安装对应浏览器驱动）\n*   **语言支持**：目前主要支持英文指令。\n\n## 安装步骤\n\n### 1. 安装 SDK\n使用 pip 安装最新版本的 `nova-act`（注意：3.0 以下版本已不再支持）：\n\n```bash\npip install --upgrade nova-act\n```\n\n### 2. 安装浏览器驱动\nNova Act 依赖 Playwright 运行浏览器。首次运行时会自动安装，也可手动预安装 Google Chrome 以获得最佳体验：\n\n```bash\nplaywright install chrome\n```\n\n> **提示**：若首次运行启动较慢（约 1-2 分钟），是因为正在后台安装 Playwright 模块。后续运行将只需几秒钟。如需跳过自动安装，可设置环境变量 `NOVA_ACT_SKIP_PLAYWRIGHT_INSTALL`。\n\n### 3. 配置认证\n您需要通过 API Key 或 AWS IAM 进行认证。\n\n**方式一：API Key（推荐用于本地开发测试）**\n1. 访问 [https:\u002F\u002Fnova.amazon.com\u002Fact](https:\u002F\u002Fnova.amazon.com\u002Fact) 生成 API Key。\n2. 
在终端中将其设置为环境变量：\n\n```sh\nexport NOVA_ACT_API_KEY=\"your_api_key\"\n```\n\n**方式二：AWS IAM（推荐用于部署到 AWS 服务）**\n确保环境中已配置 AWS 凭证（如 `~\u002F.aws\u002Fcredentials`），SDK 将自动创建默认的 boto session。详细配置请参考 AWS 官方文档。\n\n## 基本使用\n\nNova Act 提供三种主要使用模式：脚本模式、交互模式和异步模式。\n\n### 模式一：脚本模式（推荐）\n最简单的方式是使用上下文管理器，任务完成后自动关闭浏览器。\n\n```python\nfrom nova_act import NovaAct\n\nwith NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\") as nova:\n    nova.act(\"Find flights from Boston to Wolf on Feb 22nd\")\n```\n\n### 模式二：交互模式\n适合在 Python REPL 或 Jupyter Notebook 中逐步调试和实验。\n\n```python\nfrom nova_act import NovaAct\n\n# 初始化并启动\nnova = NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\")\nnova.start()\n\n# 执行动作\nnova.act(\"Find flights from Boston to Wolf on Feb 22nd\")\n\n# 提示：在 act() 运行时请勿手动操作浏览器。\n# 按 Ctrl+X 可退出当前动作但保留浏览器状态；按 Ctrl+C 会直接关闭浏览器。\n```\n\n### 模式三：异步模式\n适用于需要并发处理或多个会话的场景。\n\n```python\nimport asyncio\nfrom nova_act.asyncio import NovaAct\n\nasync def main():\n    async with NovaAct(starting_page=\"https:\u002F\u002Fnova.amazon.com\u002Fact\u002Fgym\u002Fnext-dot\u002Fsearch\") as nova:\n        await nova.act(\"Find flights from Boston to Wolf on Feb 22nd\")\n\nasyncio.run(main())\n```\n\n### 💡 编写高效提示词的技巧\n为了让 Agent 更可靠地执行任务，建议遵循以下原则：\n1.  **指令直接明确**：避免模糊表述，直接说明要点击的标签或输入的值。\n    *   ✅ `nova.act(\"Navigate to the routes tab\")`\n    *   ❌ `nova.act(\"Let's see what routes are available\")`\n2.  **提供完整细节**：包含所有必要参数（如日期、人数、价格限制）。\n3.  
**拆分复杂任务**：如果任务步骤超过 30 步，建议拆分为多个小的 `act()` 调用，并通过代码传递中间结果。","某大型电商运营团队每天需登录数十个供应商后台，手动下载昨日销售报表并整理入库，工作繁琐且易出错。\n\n### 没有 nova-act 时\n- 员工需重复执行点击、登录、筛选日期、下载文件等机械操作，耗时数小时且容易因疲劳点错按钮。\n- 遇到网页弹窗验证码或动态加载元素时，传统 RPA 脚本极易崩溃，需要专人频繁维护代码。\n- 流程缺乏弹性，一旦供应商后台界面微调，整个自动化脚本就必须重写，开发周期长。\n- 异常处理困难，系统无法判断何时该跳过错误继续运行，何时必须暂停并通知人工介入。\n- 难以规模化，想同时处理多个供应商账号时，只能堆砌人力或搭建复杂的虚拟机集群。\n\n### 使用 nova-act 后\n- 运营人员只需用自然语言描述“下载昨日报表”，nova-act 即可自动在浏览器中完成全流程，效率提升 90%。\n- 内置的智能识别能力轻松应对验证码和复杂对话框，遇到无法解决的异常时自动升级请求人类主管介入。\n- 结合 Python 代码与自然语言定义工作流，即使网页布局变更，只需调整提示词即可快速适配，无需重构底层逻辑。\n- 支持“人机协同”模式，关键节点自动暂停等待确认，既保证了自动化速度又确保了数据准确性。\n- 利用并行会话功能，单台机器即可同时管理数百个供应商账号的报表抓取任务，轻松实现规模化部署。\n\nnova-act 将原本僵化的 UI 自动化转变为灵活、可靠且可规模化的智能代理工作流，让人力从重复劳动中彻底解放。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Faws_nova-act_9ca1acff.png","aws","Amazon Web Services","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Faws_84ebd8ed.png","",null,"open-source-github@amazon.com","https:\u002F\u002Famazon.com\u002Faws","https:\u002F\u002Fgithub.com\u002Faws",[81,85,89],{"name":82,"color":83,"percentage":84},"Python","#3572A5",98.5,{"name":86,"color":87,"percentage":88},"JavaScript","#f1e05a",1.4,{"name":90,"color":91,"percentage":92},"Dockerfile","#384d54",0.1,906,145,"2026-04-07T15:42:40","Apache-2.0","macOS Sierra+, Ubuntu 22.04+, WSL2, Windows 10+","未说明",{"notes":100,"python":101,"dependencies":102},"该工具主要用于自动化浏览器 UI 工作流，依赖 Playwright 驱动浏览器（推荐安装 Google Chrome）。首次运行可能需要 1-2 分钟安装 Playwright 浏览器模块。支持通过 API Key 或 AWS IAM 进行认证。仅支持英语环境。版本低于 3.0 的 SDK 已不再支持。","3.10+",[64,103,104],"playwright","boto3",[13,52],"2026-03-27T02:49:30.150509","2026-04-09T23:49:02.289571",[109,114,118,123,128,133],{"id":110,"question_zh":111,"answer_zh":112,"source_url":113},26864,"遇到 'TargetClosedError: Browser.new_page: Target page, context or browser has been closed' 错误怎么办？","这通常是因为浏览器进程崩溃或客户端已停止。最近的更新已修复了部分相关问题。如果您在安装 Playwright 浏览器时遇到权限问题，可以设置环境变量 `NOVA_ACT_SKIP_PLAYWRIGHT_INSTALL` 来跳过自动安装步骤（前提是您已经手动安装了 Playwright 
和浏览器）。此外，也可以在全局级别设置该环境变量以避免重复安装导致的冲突。","https:\u002F\u002Fgithub.com\u002Faws\u002Fnova-act\u002Fissues\u002F18",{"id":115,"question_zh":116,"answer_zh":117,"source_url":113},26865,"如何在 Ubuntu 等系统上避免使用 sudo 安装 Playwright 浏览器？","为了避免以 root 权限安装带来的安全风险，您可以将浏览器安装到用户定义的空间。具体做法是设置环境变量 `PLAYWRIGHT_BROWSERS_PATH=$HOME\u002F.pw-browsers`，然后运行命令 `python -m playwright install chromium`。同时，在使用 nova-act 时，可以设置 `NOVA_ACT_SKIP_PLAYWRIGHT_INSTALL` 环境变量来跳过其内部的自动安装逻辑，从而使用您预先安装好的浏览器。",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},26866,"Nova-Act 是否支持 MCP (Model Context Protocol) 集成？","是的，Nova-Act 现在官方支持 MCP 服务器。您可以使用亚马逊官方提供的 MCP 服务器：https:\u002F\u002Fgithub.com\u002Famazon-agi-labs\u002Famazon-nova-act-mcp。此外，社区也维护了一个改进版的 MCP 服务器，可以通过 `uvx` 添加到配置中使用，项目地址为：https:\u002F\u002Fgithub.com\u002Fmadtank\u002Fnova-act-mcp。","https:\u002F\u002Fgithub.com\u002Faws\u002Fnova-act\u002Fissues\u002F17",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},26867,"运行示例代码时出现模糊的 'ActProtocolError: unhandled failure type' 错误如何调试？","这是一个通用异常，早期版本提供的信息不足。请确保您将 `nova-act` 升级到最新版本（`nova-act>=1.0.2344.0`），新版本已经合并了改进的错误报告机制，能提供更具体的错误原因（如路由服务拒绝、认证不匹配等）。如果升级后问题依旧，请提供新的详细报错输出以便进一步分析。","https:\u002F\u002Fgithub.com\u002Faws\u002Fnova-act\u002Fissues\u002F8",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},26868,"升级到 Nova-Act 2.0 版本后无法启动浏览器或运行示例文件怎么办？","这通常是由于旧的 Playwright 浏览器二进制文件与新版本的 Nova-Act 或 Playwright 库之间存在兼容性冲突。解决方法是彻底卸载并重新安装相关组件。请依次执行以下命令：\n1. 卸载包：`pip uninstall nova-act`\n2. 卸载所有浏览器：`playwright uninstall --all`\n3. 重新安装包：`pip install nova-act`\n4. 
重新安装浏览器：`playwright install`\n这样可以确保浏览器二进制文件与当前的 Playwright 版本完全匹配。","https:\u002F\u002Fgithub.com\u002Faws\u002Fnova-act\u002Fissues\u002F67",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},26869,"为什么在自定义 contenteditable 编辑器中无法可靠地选中文本？","这是当前自动化技术（Playwright\u002FNova-Act 底层依赖）的一个已知限制，并非特定编辑器的故障。目前的自动化引擎在处理某些自定义的 contenteditable 元素时存在局限性，导致无法像操作标准输入框那样可靠地选中文本。建议关注后续版本更新看是否有改进，或者尝试通过其他交互方式（如直接输入、快捷键等）变通解决。","https:\u002F\u002Fgithub.com\u002Faws\u002Fnova-act\u002Fissues\u002F82",[139,144,149,154,159,164,169,174,179,184,189,194,199,204,209,214,219,224,229,234],{"id":140,"version":141,"summary_zh":142,"released_at":143},172122,"v3.3.96.0","## :rocket: 新功能\n\n- **浏览器 CLI**：直接从命令行与 Nova Act 交互——浏览网页、提取数据、截取屏幕截图、管理标签页和会话，无需编写 Python 代码。\n  - 使用 `--cdp` 附加到已运行的 Chrome 实例，或使用 `--use-default-chrome` 启动带有扩展程序的默认 Chrome 浏览器。\n  - 根据脚本编写和调试需求，获取结构化的 JSON 输出（`--json`）、简洁的输出（`--quiet`）或详细的日志（`--verbose`）。\n  - 使用 `--observe` 实时查看浏览器操作，便于调试和开发。\n  - 每执行一条命令后，自动捕获无障碍快照和屏幕截图，让您随时了解当前页面状态。\n  - 自动检测您的身份验证方式（API 密钥或 AWS 凭证），并可通过 `--auth-mode`、`--aws-profile` 和 `--region` 进行配置。\n  - 浏览器会话可在多次 CLI 调用之间保持，并实现自动恢复和元数据持久化。\n  - 运行 `act browser doctor` 检查环境，或运行 `act browser setup` 进行引导式首次配置。\n  - 支持按命令捕获日志、按步骤跟踪无障碍性、录制会话、监控磁盘使用情况，以及自动生成失败截图。\n  - 使用 `act browser qa-plan` 可根据纯英文描述生成基于 Gherkin 的质量保证测试计划。","2026-04-01T16:18:31",{"id":145,"version":146,"summary_zh":147,"released_at":148},172123,"v3.3.35.0","## 🚀 新功能\n\n* **异步支持（预览）**：Nova Act 实现了异步版本，可与异步应用无缝集成，并支持并发浏览器自动化\n* **轨迹自动转储**：每次执行 Act 后，轨迹 JSON 文件会自动保存到会话日志目录中，取代之前的调用日志格式，以标准化的轨迹数据形式记录更丰富的元数据\n* **工具调用结果纳入日志**：会话日志的 JSON 和 HTML 报告现在会包含工具调用结果，从而提升调试可见性\n* **[CLI] 远程工作流可见性**：CLI 工作流列表现可拉取并合并远程 AWS 工作流定义与本地状态，通过筛选标志显示已同步、远程和本地的工作流标识\n* **[CLI] ARM64 AgentCore 支持**：Docker 镜像构建目标现已扩展至 ARM64 架构，以便在 x86_64 主机上部署时兼容 AgentCore\n\n## 🔨 改进\n\n* **工具请求拦截绕过**：工具在执行过程中可临时禁用 SDK 的请求拦截功能，避免外部进程直接控制浏览器时导致浏览器卡顿\n* **[CLI] 启动速度更快**：将耗时的导入操作延迟到命令执行时进行，使 CLI 响应更加迅速\n* **[CLI] 动态 Dockerfile 区域配置**：将 Dockerfile 
中的硬编码区域替换为动态区域解析，使部署可在任意区域正常运行\n* **[CLI] 错误信息优化**：异常链式调用可保留完整的堆栈跟踪信息，便于调试\n\n## 🐛 问题修复\n\n* **路径遍历修复**：修复安全相对路径处理逻辑，正确处理路径等于基础目录的边缘情况\n* **颜色输入事件**：颜色选择器输入控件在程序化设置值时，现能正确触发 input 和 change 事件","2026-03-27T15:14:34",{"id":150,"version":151,"summary_zh":152,"released_at":153},172124,"v3.1.263.0","## 🚀 新功能\n\n- **改进的 iFrame 触发机制**：优化了跨域元素的处理，以提升下拉菜单的交互体验\n\n## 🐛 问题修复\n\n- **SSL 稳定性**：通过将 unroute 限制为交互模式并改进上下文清理，修复了 SSL 钩子死锁问题\n- **安全性**：通过将 S3 上传路径与基础目录进行校验，修复了路径遍历漏洞","2026-02-26T19:21:43",{"id":155,"version":156,"summary_zh":157,"released_at":158},172125,"v3.1.157.0","## :rocket: 新功能\n\n- **Bedrock AgentCore 浏览器示例**：新增了一个示例，演示如何将 Nova Act 与 Bedrock AgentCore 浏览器一起使用。\n\n## :bug: 错误修复\n\n- **字符串响应解析**：修复了解析后的字符串模式响应中包含多余双引号的问题。\n- **Dockerfile 改进**：修复了 CLI Dockerfile 中 Playwright 的安装问题，以确保浏览器正确配置。\n\n## :hammer: 功能改进\n\n- **日志记录改进**：追踪日志记录器现为全局单例，便于下游应用重定向和处理日志。增强的追踪和日志输出提供了更有帮助的调试信息。","2026-02-05T17:05:51",{"id":160,"version":161,"summary_zh":162,"released_at":163},172126,"v3.1.89.0","## 🚀 新功能\n\n- **默认模型变更**：在使用 API 密钥认证时，将默认模型更新为 `nova-act-preview`。[更多信息请参阅 AWS 文档](https:\u002F\u002Fdocs.aws.amazon.com\u002Fnova-act\u002Flatest\u002Fuserguide\u002Fmodel-version-selection.html)\n\n## 🔨 改进\n\n- **依赖项更新**：升级了 `strands-agents` 和 `strands-agents-tools` 依赖项，以包含错误修复、性能改进和功能增强。\n- **增强的错误处理**：新增了 `ActInvalidToolError` 和 `ActInvalidToolSchemaError` 异常，以便在工具调用出现问题时提供更清晰的诊断信息。\n\n## 🐛 错误修复\n\n- **Linux 无头模式**：修复了在没有图形界面的 Linux 系统上浏览器启动失败的问题，当检测不到显示服务器时会自动启用无头模式。","2026-01-29T15:58:13",{"id":165,"version":166,"summary_zh":167,"released_at":168},172127,"v3.1.18.0","## ⚠️ 重要通知\n\n* Nova Act SDK 3.0 以下版本已不再受支持。用户必须升级到最新版本，以获取安全更新和新功能。\n* 每日 API Key 配额已下调。如果您需要为更具规模的项目分配专属配额，请将工作流迁移至 [Nova Act AWS 服务](https:\u002F\u002Faws.amazon.com\u002Fnova\u002Fact\u002F)。\n\n## 🚀 新增内容\n\n* MCP 工具集成文档：新增 README 章节，介绍如何使用 Strands MCP 客户端集成来自 MCP（模型上下文协议）服务器的工具。\n* 浏览器对话框处理文档：新增 README 章节，介绍如何通过自定义 Playwright 处理器来处理浏览器对话框（alert、confirm、prompt）。\n\n## 🔨 改进\n\n* 滚动行为：通过仅在所需轴向上检查滚动，修复了意外滚动问题。\n* 
更清晰的错误信息：限流错误和每日配额错误现在会提供更有帮助的提示信息，以指导用户操作。","2026-01-21T20:23:50",{"id":170,"version":171,"summary_zh":172,"released_at":173},172128,"v3.0.157.0","## 🔨  改进\n\n* **增强的元素定位**：通过改进活动元素检测和一致的深层元素定位，使元素交互更加可靠。\n* **更好的显示兼容性**：将屏幕分辨率容差提高至20%，以提升在不同显示配置下的兼容性。\n* **错误提示信息**：提供更清晰的身份验证错误信息，并优雅地处理不支持的视口尺寸。\n\n## 🐛  Bug修复\n\n* **Playwright安装**：通过固定Playwright版本来解决安装问题，防止兼容性冲突。\n* **后端稳定性**：修复了后端操作中的字符串处理和HTTP压缩问题。\n* **路由处理**：通过确保unroute钩子在移除路由前等待所有正在执行的处理器完成，从而修复“路由已处理”错误，避免页面导航时出现竞态条件。\n","2026-01-07T19:28:52",{"id":175,"version":176,"summary_zh":177,"released_at":178},172129,"v3.0.67.0","## ⚠️ 重要通知\n\n* 对 SDK 3.0 以下版本的支持将于 2026 年 1 月 21 日终止\n\n## 🔐 安全更新\n\n* 现已对所有页面导航应用 SSL 证书验证\n\n## 🚀 新功能\n\n* 针对 Shadow DOM 和 iframe 元素增强了焦点检测，提高了触发可靠性\n\n## 🔨 改进\n\n* 提供更清晰的认证配置错误提示信息，并附带改进的指导说明和文档链接\n* CLI 错误报告现可保留完整的堆栈跟踪，便于故障排查\n\n## 🐛 问题修复\n\n* 修复了 AgentCore 运行时连接以及重复工作流执行相关的 CLI 部署问题\n\n","2025-12-16T16:08:32",{"id":180,"version":181,"summary_zh":182,"released_at":183},172130,"v3.0.5.0","1. 更新 `pyproject.toml`，将 CLI 模板包含到生成的 wheel 包中。\n2. 
修复一个小问题：移除 boto3 客户端上未使用的 `# type: ignore` 注释。","2025-12-02T21:03:23",{"id":185,"version":186,"summary_zh":187,"released_at":188},172131,"v3.0.0.0","#### NovaAct 3.0.0.0\n\n🎉 **Nova Act SDK v3 已发布！**\n\n本次重大版本更新引入了与 Nova Act AWS 服务的集成、Nova Act CLI、人机协作功能，以及（预览版）浏览器之外的工具使用支持。通过 [nova.amazon.com\u002Fact](http:\u002F\u002Fnova.amazon.com\u002Fact) 提供的 API Key，依然可以继续访问 Nova Act 免费版。\n\n🚀 **新特性**\n\n* **Nova Act AWS 服务集成**：借助 `Workflow` 上下文管理器和装饰器，利用 AWS Nova Act 服务大规模部署和管理生产工作流。\n* **Nova Act CLI**：通过简单命令将工作流快速部署到 AWS 的 Amazon Nova Act 和 Amazon Bedrock AgentCore 运行时——直接在终端中创建、部署、运行并监控工作流。\n* **人机协作（HITL）**：使用 `HumanInputCallbacksBase` 在工作流中实现人工监督，包括用于决策的审批模式以及用于实时干预的 UI 接管功能（该功能不作为 AWS 托管服务提供）。\n* **自定义工具集成（预览版）**：通过 `@tool` 装饰器集成外部工具（如 API 调用或数据库查询），将工作流扩展至浏览器操作之外。\n* **原生文件上传**：直接使用 Nova Act 上传文件。\n* **代理执行时间跟踪**：监控近似执行时间，以了解并优化工作流性能。\n\n**🔨 改进**\n\n* **增强的 Action Viewer HTML**：Action Viewer 现在具有可折叠部分和更优的布局，便于调试和工作流分析。\n* **更好的控制台输出**：添加了表情符号和进度指示器，使工作流状态更加清晰。\n* **提升执行可靠性**：优化了点击、滚动和输入操作，改进了元素处理和输入清除机制，从而实现更一致的工作流执行。\n* **增强导航安全性**：状态护栏现在会验证起始页面和导航路径，防止工作流偏离预期轨道。\n\n⚠️ **破坏性变更**\n\n* **`act()` API 变更**：为了从 `act()` 中获取结果，您现在需要向 `act()` 函数传递 Schema，或者使用新的 `act_get()` 函数。如果未传递 Schema 而直接调用 `act()`，将不再返回响应属性。\n* **`SecurityOptions` 更新**：文件访问控制现使用 `allowed_file_open_paths` 实现作用域权限，取代了之前的 `allow_file_urls` 标志。\n\n#### 更改日志\n\n**新增**\n\n* 通过 `Workflow` 上下文管理器和装饰器与 Nova Act AWS 服务集成。\n* 通过 `@tool` 装饰器实现自定义工具集成，将工作流扩展至浏览器操作之外（预览版）。\n* 使用 `HumanInputCallbacksBase` 提供审批和 UI 接管模式的人机协作（HITL）支持。\n* 新增 `act` CLI，支持将 Nova Act 工作流快速部署到 AWS AgentCore 运行时。\n* 增加工作时间跟踪工具，用于估算和理解代理执行时间。\n* 改进了控制台输出，增加了表情符号和更多反馈信息。\n* 原生文件上传支持。\n\n**变更**\n\n* **破坏性变更**：移除响应属性","2025-12-02T18:05:32",{"id":190,"version":191,"summary_zh":192,"released_at":193},172132,"v2.3.18.0","## 🔐 Security Updates\r\n\r\n* Add configurable Security Options to block navigation to potentially unsafe URL schemes and unintended file uploads\r\n* Tighten Content Security Policy (CSP) and sanitization in Action Viewer HTML 
to mitigate security risks\r\n\r\n## 🚀 What's New\r\n\r\n* Add built-in method for data extraction to increase reliability of structured extract prompts\r\n\r\n## 🔨 Improvements\r\n\r\n* Enhance scrolling accuracy for more reliable page navigation\r\n* Improve error handling with better error classification and messaging\r\n* Improve type definitions for better development experience\r\n\r\n## 🐛 Bug Fixes\r\n\r\n* Fix dependency version compatibility issues\r\n* Prevent infinite loops when traversing complex DOM structures\r\n\r\n","2025-11-18T16:32:56",{"id":195,"version":196,"summary_zh":197,"released_at":198},172133,"v2.1.319.0","## Release Notes\r\n\r\n* Add state guardrail to control which URLs Nova Act can visit for enhanced security\r\n* Improve support for select and input elements to enable more reliable interaction with dropdown menus and form fields\r\n* Improve full page scrolling by detecting and recovering from failed scroll attempts\r\n* Reorder operations in browser initialization to improve starting page transition time\r\n* Safely terminate Chrome before copying user data to eliminate conflicts with default Chrome browser integration\r\n* Strengthen protection against XSS risks, we recommend updating to this version for improved security","2025-10-31T17:22:35",{"id":200,"version":201,"summary_zh":202,"released_at":203},172134,"v2.1.124.0","## Release Notes\r\n\r\n* Connect to existing browser sessions with new `cdp_use_existing_page` and optional `starting_page`  parameter to preserve browser context\r\n* Resolve text input clearing issues across operating systems for consistent agent typing behavior\r\n* Enhance scroll and click interactions within PDF documents\r\n* Refine Action View HTML styling and layout for enhanced observability","2025-10-08T16:57:01",{"id":205,"version":206,"summary_zh":207,"released_at":208},172135,"v2.1.36.0","## Features\r\n\r\n* Integration with the [Nova Act 
extension](https:\u002F\u002Fgithub.com\u002Faws\u002Fnova-act-extension) for enhanced development experience with automated environment setup, chat-based script generation, real-time debugging, and step-by-step testing capabilities\r\n\r\n## Fixes & Improvements\r\n\r\n* Expand file upload coverage\r\n* Fix scroll behavior when bounding box is the entire page before scrolling\r\n* Fix model parameters not being passed to product server start-plan call\r\n* Fix emoji decoding issues\r\n* Improve scroll and click functionality on PDFs\r\n* Fix scroll behavior when bounding box is the entire page (viewport dimensions)\r\n* Improve setting of session and act ids\r\n* Major error handling refactor with comprehensive improvements across multiple modules\r\n* Minor logging improvements\r\n* Relax Playwright dependency version constraints\r\n* Add allow-origins for Chrome devtools frontend\r\n* Interpret double and right clicks\r\n* Additional minor fixes, improvements, and cleanup\r\n\r\n## Documentation Updates\r\n\r\n* Add README note about cross-OS keyboard shortcuts with AgentCore Browser\r\n* Improve documentation following error refactor\r\n* Add README section for Nova Act extension","2025-09-23T15:22:33",{"id":210,"version":211,"summary_zh":212,"released_at":213},172136,"v2.0.357.0","## Fixes & Improvements\r\n\r\n* Strengthen type checking\r\n* Minor refactor of telemetry module for reusability\r\n* Remove legacy extension-related code\r\n* Improve stop hook and `S3Writer` logging\r\n* Factor browser profile and user data directory management out of `default_chrome_browser` feature to standardize `user_data_dir` behavior\r\n* Improve scrollable element check logic\r\n* Fix `go_to_url_timeout`\r\n* Correct inaccurate `ModelError` raises\r\n* Update `agent_type()` to insert the text if string length is > 10\r\n* Allow scrollbars in Playwright headless mode\r\n* Measure step server time\r\n* Fix `wait_for_page_to_settle` bug that resulted in wait logic being 
bypassed\r\n* Additional minor fixes, improvements, and cleanup\r\n\r\n## Documentation Updates\r\n\r\n* Add README note about unencrypted password storage in browsers on some operating systems\r\n* Update sample workflows","2025-09-04T19:55:40",{"id":215,"version":216,"summary_zh":217,"released_at":218},172137,"v2.0.177.0","## Features\r\n\r\n* Playwright Browser Actuation\r\n    * Overhauled the browser actuation stack to use Playwright for improved accuracy, client-side latency, and customization\r\n\r\n## Fixes & Improvements\r\n\r\n* Allow specifying user agent when `cdp_endpoint_url` is set\r\n* Add `observation_delay_ms` argument to `act()` for customizable delays before observations, e.g., waiting for UI animations\r\n* Improve Action Viewer log output\r\n* Improve error messages\r\n* Handle Chrome v138 breaking changes to extension loading\r\n* Additional minor fixes and improvements\r\n\r\n## Documentation Updates\r\n\r\n* README\r\n    * Minor reorganization of `Common Building Blocks` section\r\n    * Add section on integration with Amazon Bedrock AgentCore Browser Tool\r\n    * Add notes about Playwright downloads\r\n\r\n","2025-08-15T19:13:58",{"id":220,"version":221,"summary_zh":222,"released_at":223},172138,"v1.0.4013.0","## Features\r\n\r\n* Nova Act path to production (preview)\r\n    * Authenticate with AWS IAM\r\n    * Write Action Viewer logs to S3\r\n    * Integrate with the [Amazon Bedrock AgentCore Browser](https:\u002F\u002Faws.amazon.com\u002Fbedrock\u002Fagentcore)\r\n    * Learn more about the path to production preview in our [blog post](https:\u002F\u002Flabs.amazon.science\u002Fblog\u002Fprototype-to-production). Once you’re ready to bring your prototype to production, [join our waitlist](https:\u002F\u002Famazonexteu.qualtrics.com\u002Fjfe\u002Fform\u002FSV_9siTXCFdKHpdwCa). 
Access to the preview is limited to select customers.","2025-07-16T14:58:42",{"id":225,"version":226,"summary_zh":227,"released_at":228},172139,"v1.0.3949.0","## Features\r\n\r\n* Pass proxy configurations to Playwright via `proxy` option in the `NovaAct` constructor to route traffic through a specific proxy server\r\n\r\n## Fixes & Improvements\r\n\r\n* Refactor `PlaywrightInstanceManager` to improve code organization and make the actuation system more modular and maintainable\r\n* Improve session logs directory management\r\n* Fix page access logic for different actuator types (default vs custom)\r\n* Enhance error message clarity\r\n* Improve custom actuator functionality by providing access to the starting page URL during initialization\r\n* Improve unicode string decoding\r\n* Simplify DOM and `idToBboxMap` actuation logic\r\n* Improve observability\r\n* Bump Playwright version to 1.52.0\r\n* Various fixes and improvements to the extension and actuation preview","2025-07-10T19:34:58",{"id":230,"version":231,"summary_zh":232,"released_at":233},172140,"v1.0.3679.0","## Features\r\n\r\n* Pass CDP header to Playwright via `cdp_headers` option in the `NovaAct` constructor\r\n* Try upcoming features via `preview` option in `NovaAct` constructor\r\n    * Note: preview features may be unstable and the API may change in the future\r\n* Use Playwright for actuation via `playwright_actuation` preview feature\r\n* Customize actuation via `custom_actuator` preview feature\r\n\r\n## Fixes & Improvements\r\n\r\n* Refactor local log file writing\r\n* Improve `act` telemetry\r\n* Support `--profile-directory` flag when launching default Chrome\r\n* Support headless mode when launching default Chrome\r\n* Various extension fixes and improvements","2025-06-24T20:34:30",{"id":235,"version":236,"summary_zh":237,"released_at":238},172141,"v1.0.3380.0","## Features\r\n\r\n- Use the `use_default_chrome_browser` option of the `NovaAct` constructor to run workflows with your locally 
installed Chrome to access sites requiring specific extensions or security features (macOS only)\r\n\r\n## Fixes & Improvements\r\n\r\n- Add enum for common JavaScript expressions\r\n- Scaffolding for option to disable extension\r\n- Improve Nova Act client code organization and maintainability","2025-06-13T21:29:17"]