automagica

AI overview (generated automatically by AI; for reference only)

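Automagica is an open-source Smart Robotic Process Automation (RPA) project that incorporates AI, with the goal of making automation accessible to everyone. It targets repetitive, time-consuming computer work and can automate tasks such as browser-to-Excel interaction and file encryption and decryption.

The toolkit suits ordinary users who want to automate office work as well as developers who need to customize workflows. For developers, Automagica provides a Jupyter Notebook based development environment (Automagica Lab) and a visual flow designer (Automagica Flow), and it supports writing Python code directly, combining a low barrier to entry with a high degree of freedom.

Its distinguishing technical feature is Automagica Wand, an AI-powered UI element picker intended to identify and operate on interface elements more reliably, addressing the difficulty traditional RPA has in locating elements in dynamic interfaces. Although the core of the project has been acquired by Netcall and moved toward commercial integration, its original architecture still illustrates how AI vision capabilities can be combined with conventional automation scripts, and it remains a useful case study in applying AI to RPA.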

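Use case

A finance specialist at an e-commerce company downloads invoice PDFs from several supplier portals every day, extracts the key data, enters it into an Excel report, and encrypts sensitive files for archiving.

Without automagica

The employee has to log in to each website and download the files by hand, which is slow and prone to mistyped URLs or account names when tired.
Amounts and dates are read from the PDFs by eye, which is inefficient and invites transcription errors.
There is no unified process management: if one step fails (for example, because a web page element changed), the whole task stops and is hard to troubleshoot.
Sensitive financial files are scattered across local folders with no automated encryption, creating a risk of data leakage.
Every new supplier requires a new, complex script, so maintenance costs are high and non-technical staff cannot help improve the process.

With automagica

Automagica Bot logs in to each portal and downloads the invoices in bulk without manual intervention, freeing up staff time.
The AI-powered Automagica Wand identifies and captures the key fields in the PDFs, with a stated accuracy of over 99%.
The process is designed visually in Automagica Flow, so nodes can be adjusted quickly when a web page's structure changes, which makes the system considerably more stable.
The built-in encryption activity (Encrypt file) applies Fernet encryption to archived files automatically, so that stored data meets security and compliance requirements.
Business staff can modify the automation logic directly through the low-code interface without relying on a dedicated development team, which shortens response times.

automagica turns tedious, repetitive finance work into a stable, secure, and self-improving automated loop, helping the company cut costs and improve efficiency.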

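Runtime requirements

Operating system

Windows

GPU

Not specified

Memory

Not specified

Dependencies

Notes: The project was acquired by Netcall plc on October 13, 2020 and is no longer offered as open-source software (AGPL3 license). Some features (such as Random beep) explicitly work only on Windows. The Automagica Lab component requires Jupyter to be installed beforehand. The original free services were retained for already-deployed users only for a three-month transition period.

Python: not specified

Jupyter (for Automagica Lab)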

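Quick start

Automagica

The Automagica project started in 2018 with the aim of developing open-source software so that Robotic Process Automation technology would be available to everyone.

With each Automagica release, more advanced features such as Wand and Portal required a service infrastructure to support more reliable robots, advanced services, and management and control.

As usage of these services grew, so did the cost of hosting and maintaining the service layer.

To take these important services to the next stage, we are pleased to announce that on October 13, 2020, Netcall plc, a leading provider of low-code, customer engagement, and contact centre software, acquired Oakwood Technologies BV (trading as "Automagica").

Netcall will integrate Automagica's RPA into its Liberty platform, providing a powerful combination of RPA, low-code, and customer engagement solutions.

Automagica Robot is no longer available under the terms of the AGPL3 license.

However, we will not stop supporting robots that are already deployed. These robots can continue to use the Wand and OCR features free of charge for three months, starting today (October 13, 2020).

Existing Automagica Portal users can also access the platform free of charge for the next three months, during which time we will offer them options to migrate to the commercial service.

We would like to thank everyone who has contributed to the project.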

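Components

The Automagica suite consists of the following components:

Automagica Bot: the runtime/agent that executes automation tasks.
Automagica Flow: a visual flow designer for building automations quickly, with full support for Python code.
Automagica Wand: an AI-powered UI element picker.
Automagica Lab: a notebook-style automation development environment based on Jupyter Notebooks (requires Jupyter to be installed).
Automagica Portal: manages bots, credentials, automations, logs, and more.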

Example

Browser working together with Excel:

Activities

An overview of all official Automagica activities:

Process	Description
Cryptography	‌‌
Random key	Generate random Fernet key.
Encrypt text	Encrypt text with (Fernet) key.
Decrypt text	Decrypt bytes-like object to string with (Fernet) key
Encrypt file	Encrypt file with (Fernet) key. Note that file will be unusable unless unlocked with the same key.
Decrypt file	Decrypts file with (Fernet) key
Key from password	Generate key based on password and salt. If both password and salt are known the key can be regenerated.
Hash from file	Generate hash from file
Hash from text	Generate hash from text. Keep in mind that MD5 is not cryptographically secure.
Random	‌‌
Random number	Random numbers can be integers (not a fractional number) or a float (fractional number).
Random data	Generates all kinds of random data. Specifying locale changes format for some options
Random boolean	Generates a random boolean (True or False)
Random name	Generates a random name. Adding a locale adds a more common name in the specified locale. Provides first name and last name.
Random words	Generates a random sentence. Specifying locale changes language and content based on locale.
Random address	Generates a random address. Specifying locale changes random locations and streetnames based on locale.
Random beep	Generates a random beep, only works on Windows
Random date	Generates a random date.
Today's date	Generates today's date.
Generate unique identifier	Generates a random UUID4 (universally unique identifier).
Output	‌‌
Display overlay message	Display custom OSD (on-screen display) message. Can be used to display a message for a limited amount of time. Can be used for illustration, debugging or as OSD.
Print message in console	Print message in console. Can be used to display data in the Automagica Flow console
Browser	‌‌
Chrome	Open Chrome Browser
Save all images	Save all images on current page in the Browser
Browse to URL	Browse to URL.
Find elements by text	Find all elements by their text. Text does not need to match exactly, part of text is enough.
Find all links	Find all links on a webpage in the browser
Find first link on a webpage	Find first link on a webpage
Get all text on webpage	Get all the raw body text from current webpage
Highlight element	Highlight elements in yellow in the browser
Exit the browser	Quit the browser by exiting gracefully. One can also use the native 'quit' function
Find all XPaths	Find all elements with specified xpath on a webpage in the browser. Can also use native 'find_elements_by_xpath'
Find XPath in browser	Find element with specified xpath on a webpage in the browser. Can also use native 'find_element_by_xpath'
Find class in browser	Find element with specified class on a webpage in the browser. Can also use native 'find_element_by_class_name'
Find class in browser	Find all elements with specified class on a webpage in the browser. Can also use native 'find_elements_by_class_name' function
Find element in browser based on class and text	Find all elements with specified class and text on a webpage in the browser.
Find id in browser	Find element with specified id on a webpage in the browser. Can also use native 'find_element_by_id' function
Switch to iframe in browser	Switch to an iframe in the browser
Credential Management	‌‌
Set credential	Add a credential which stores credentials locally and securely. All parameters should be Unicode text.
Delete credential	Delete a locally stored credential. All parameters should be Unicode text.
Get credential	Get a locally stored credential. All parameters should be Unicode text.
FTP	‌‌
Create FTP connection (insecure)	Can be used to automate activities for FTP
Download file	Downloads a file from FTP server. Connection needs to be established first.
Upload file	Upload file to FTP server
List FTP files	Generate a list of all the files in the FTP directory
Check FTP directory	Check if FTP directory exists
Create FTP directory	Create an FTP directory. Make sure that sufficient permissions are present.
Keyboard	‌‌
Press key	Press and release an entered key. Make sure your keyboard is on US layout (standard QWERTY). If you are using this on macOS you might need to grant access to your terminal application.
Press key combination	Press a combination of two or three keys simultaneously. Make sure your keyboard is on US layout (standard QWERTY).
Type text	Simulate keystrokes. If an element ID is specified, text will be typed in a specific field or element based on the element ID (vision) by the recorder.
Mouse	‌‌
Get mouse coordinates	Get the x and y pixel coordinates of current mouse position.
Display mouse position	Displays mouse position in an overlay.
Mouse click	Clicks on an element based on the element ID (vision)
Mouse click coordinates	Clicks on an element based on pixel position determined by x and y coordinates. To find coordinates one could use display_mouse_position().
Double mouse click coordinates	Double clicks on a pixel position determined by x and y coordinates.
Double mouse click	Double clicks on an element based on the element ID (vision)
Right click	Right clicks on an element based on the element ID (vision)
Right click coordinates	Right clicks on an element based on pixel position determined by x and y coordinates.
Move mouse	Moves the pointer to an element based on the element ID (vision)
Move mouse coordinates	Moves the pointer to an element based on the pixel position determined by x and y coordinates
Move mouse relative	Moves the mouse an x- and y- distance relative to its current pixel position.
Drag mouse	Drags mouse to an element based on pixel position determined by x and y coordinates
Drag mouse	Drags mouse to an element based on the element ID (vision)
Image	‌‌
Random screen snippet	Take a random square snippet from the current screen. Mainly for testing and/or development purposes.
Screenshot	Take a screenshot of current screen.
Folder Operations	‌‌
List files in folder	List all files in a folder (and subfolders)
Create folder	Creates new folder at the given path.
Rename folder	Rename a folder
Open a folder	Open a folder with the default explorer.
Move a folder	Moves a folder from one place to another.
Remove folder	Remove a folder including all subfolders and files. For the function to work optimally, all files and subfolders in the main target folder should be closed.
Empty folder	Remove all contents from a folder. For the function to work optimally, all files and subfolders in the main target folder should be closed.
Checks if folder exists	Check whether folder exists or not, regardless if folder is empty or not.
Copy a folder	Copies a folder from one place to another.
Zip	Zip folder and its contents. Creates a .zip file.
Unzip	Unzips a file or folder from a .zip file.
Return most recent file in directory	Return most recent file in directory
Delay	‌‌
Wait	Make the robot wait for a specified number of seconds. Note that this activity is blocking. This means that subsequent activities will not occur until the specified waiting time has expired.
Wait for folder	Waits until a folder exists. Note that this activity is blocking and will keep the system waiting.
Word Application	‌‌
Start Word Application	For this activity to work, Microsoft Office Word needs to be installed on the system.
Save	Save active Word document
Save As	Save active Word document to a specific location
Append text	Append text at end of Word document.
Replace text	Can be used, for example, to replace an arbitrary placeholder value, such as 'XXXX' in a template document. Take note that all strings are case sensitive.
Read all text	Read all the text from a document
Export to PDF	Export the document to PDF
Export to HTML	Export to HTML
Set footers	Set the footers of the document
Set headers	Set the headers of the document
Quit Word	This closes Word, make sure to use 'save' or 'save_as' if you would like to save before quitting.
Word File	‌‌
Read and Write Word files	These activities can read, write and edit Word (docx) files without the need of having Word installed.
Read all text	Read all the text from the document
Append text	Append text at the end of the document
Save	Save document
Save as	Save file on specified path
Set headers	Set headers of Word document
Replace all	Replaces all occurrences of a placeholder text in the document with a replacement text.
Outlook Application	‌‌
Start Outlook Application	For this activity to work, Outlook needs to be installed on the system.
Send e-mail	Send an e-mail using Outlook
Retrieve folders	Retrieve list of folders from Outlook
Retrieve e-mails	Retrieve list of messages from Outlook
Delete e-mails	Deletes e-mail messages in a certain folder. Can be specified by searching on subject, body or sender e-mail.
Move e-mails	Move e-mail messages in a certain folder. Can be specified by searching on subject, body or sender e-mail.
Save attachments	Save all attachments from certain folder
Retrieve contacts	Retrieve all contacts
Add a contact	Add a contact to Outlook contacts
Quit	Close the Outlook application
Excel Application	‌‌
Start Excel Application	For this activity to work, Microsoft Office Excel needs to be installed on the system.
Add worksheet	Adds a worksheet to the current workbook
Activate worksheet	Activate a worksheet in the current Excel document by name
Save	Save the current workbook. Defaults to homedir
Save as	Save the current workbook to a specific path
Write cell	Write to a specific cell in the currently active workbook and active worksheet
Read cell	Read a cell from the currently active workbook and active worksheet
Write range	Write to a specific range in the currently active worksheet in the active workbook
Read range	Read a range of cells from the currently active worksheet in the active workbook
Run macro	Run a macro by name from the currently active workbook
Get worksheet names	Get names of all the worksheets in the currently active workbook
Get table	Get table data from the currently active worksheet by name of the table
Activate range	Activate a particular range in the currently active workbook
Activate first empty cell down	Activates the first empty cell going down
Activate first empty cell right	Activates the first empty cell going right
Activate first empty cell left	Activates the first empty cell going left
Activate first empty cell up	Activates the first empty cell going up
Write cell formula	Write a formula to a particular cell
Read cell formula	Read the formula from a particular cell
Insert empty row	Inserts an empty row to the currently active worksheet
Insert empty column	Inserts an empty column in the currently active worksheet. Existing columns will shift to the right.
Delete row in Excel	Deletes a row from the currently active worksheet. Existing data will shift up.
Delete column	Delete a column from the currently active worksheet. Existing columns will shift to the left.
Export to PDF	Export to PDF
Insert data as table	Insert list of dictionaries as a table in Excel
Read worksheet	Read data from a worksheet as a list of lists
Quit Excel	This closes Excel, make sure to use 'save' or 'save_as' if you would like to save before quitting.
Excel File	‌‌
Read and Write xlsx files.	This activity can read, write and edit Excel (xlsx) files without the need of having Excel installed.
Export file to dataframe	Export to pandas dataframe
Activate worksheet	Activate a worksheet. By default the first worksheet is activated.
Save as	Save file as
Save as	Save file
Write cell	Write a cell based on column and row
Read cell	Read a cell based on column and row
Add worksheet	Add a worksheet
Get worksheet names	Get worksheet names
PowerPoint Application	‌‌
Start PowerPoint Application	For this activity to work, PowerPoint needs to be installed on the system.
Save PowerPoint	Save PowerPoint Slidedeck
Save PowerPoint	Save PowerPoint Slidedeck
Close PowerPoint Application	Close PowerPoint
Add PowerPoint Slides	Adds slides to a presentation
Slide count	Returns the number of slides
Text to slide	Add text to a slide
Delete slide	Delete a slide
Replace all occurrences of text in PowerPoint slides	Can be used for example to replace arbitrary placeholder value in a PowerPoint.
PowerPoint to PDF	Export PowerPoint presentation to PDF file
Slides to images	Export PowerPoint slides to separate image files
Office 365	‌‌
Send email Office Outlook 365	Send email Office Outlook 365
Salesforce	‌‌
Salesforce API	Activity to make calls to Salesforce REST API.
E-mail (SMTP)	‌‌
Mail with SMTP	This function lets you send emails with an e-mail address.
Windows OS	‌‌
Find window with specific title	Find a specific window based on the name, either a perfect match or a partial match.
Login to Windows Remote Desktop	Create a RDP and login to Windows Remote Desktop
Stop Windows Remote Desktop	Stop Windows Remote Desktop
Set Windows password	Sets the password for a Windows user.
Check Windows password	Validates a Windows user password if it is correct
Lock Windows	Locks Windows requiring login to continue.
Check if Windows logged in	Checks if the current user is logged in and not on the lockscreen. Most automations do not work properly when the desktop is locked.
Check if Windows is locked	Checks if the current user is locked out and on the lockscreen. Most automations do not work properly when the desktop is locked.
Get Windows username	Get current logged in user's username
Set clipboard	Set any text to the Windows clipboard.
Get clipboard	Get the text currently in the Windows clipboard
Empty clipboard	Empty text from clipboard. Getting clipboard data after this should return None
Run VBScript	Run a VBScript file
Beep	Make a beeping sound. Make sure your volume is up and you have hardware connected.
Get all network interface names	Returns a list of all network interfaces of the current machine
Enable network interface	Enables a network interface by its name.
Disable network interface	Disables a network interface by its name.
Get default printer	Returns the name of the printer selected as default
Set default printer	Set the default printer.
Remove printer	Removes a printer by its name
Get service status	Returns the status of a service on the machine
Start a service	Starts a Windows service
Stop a service	Stops a Windows service
Set window to foreground	Sets a window to foreground by its title.
Get foreground window title	Retrieve the title of the current foreground window
Close window	Closes a window by its title
Maximize window	Maximizes a window by its title
Restore window	Restore a window by its title
Minimize window	Minimizes a window by its title
Resize window	Resize a window by its title
Hide window	Hides a window from the user desktop by using its title
Terminal	‌‌
Run SSH command	Runs a command over SSH (Secure Shell)
SNMP	‌‌
SNMP Get	Retrieves data from an SNMP agent using SNMP (Simple Network Management Protocol)
Active Directory	‌‌
AD interface	Interface to Windows Active Directory through ADSI. Connects to the AD domain to which the machine is joined by default.
Get AD object by name	Interface to Windows Active Directory through ADSI
Utilities	‌‌
Get user home path	Returns the current user's home path
Get desktop path	Returns the current user's desktop path
Get downloads path	Returns the current user's default download path
Open file	Opens file with default programs
Set wallpaper	Set Windows desktop wallpaper with the specified image
Download file from a URL	Download file from a URL
System	‌‌
Rename a file	This activity will rename a file. If the desired name already exists in the folder, the file will not be renamed. Make sure to add the extension to specify the file type.
Move a file	Moves a file from one place to another. If the new location already contains a file with the same name, a random 4 character uid is added to the name.
Remove a file	Remove a file
Check if file exists	This function checks whether the file with the given path exists.
Wait until a file exists.	Note that this activity is blocking and will keep the system waiting.
List to .txt	Writes a list to a text (.txt) file. Every element of the entered list is written on a new line of the text file.
Read list from .txt file	This activity reads the content of a .txt file to a list and returns that list. Every new line from the .txt file becomes a new element of the list. The activity will not work if the entered path is not attached to a .txt file.
Read .txt file	This activity reads a .txt file and returns the content
Append to .txt	Append a text line to a file and creates the file if it does not exist yet.
Make text file	Initialize text file
Read .txt file with newlines to list	Read a text file to a Python list-object
Copy a file	Copies a file from one place to another. If the new location already contains a file with the same name, a random 4 character uid is added to the name.
Get file extension	Get extension of a file
Print	Send file to the default printer. Make sure to have a default printer set up.
PDF	‌‌
Text from PDF	Extracts the text from a PDF. This activity reads text from a pdf file. Can only read PDF files that contain a text layer.
Merge PDF	Merges multiple PDFs into a single file
Extract page from PDF	Extracts a particular range of a PDF to a separate file.
Extract images from PDF	Save a specific page from a PDF as an image
Watermark a PDF	Watermark a PDF
System Monitoring	‌‌
CPU load	Get average CPU load for all cores.
Count CPU	Get the number of CPUs in the current system.
CPU frequency	Get frequency at which CPU currently operates.
CPU Stats	Get CPU statistics
Memory statistics	Get memory statistics
Disk stats	Get disk statistics of main disk
Partition info	Get disk partition info
Boot time	Get most recent boot time
Uptime	Get uptime since last boot
Image Processing	‌‌
Show image	Displays an image specified by the path variable on the default imaging program.
Rotate image	Rotate an image
Resize image	Resizes the image specified by the path variable.
Get image width	Get width of image
Get image height	Get height of image
Crop image	Crops the image specified by path to a region determined by the box variable.
Mirror image horizontally	Mirrors an image with a given path horizontally from left to right.
Mirror image vertically	Mirrors an image with a given path vertically from top to bottom.
Process	‌‌
Windows run	Use Windows Run to boot a process. Note this uses keyboard inputs, which means this process can be disrupted by interfering inputs
Run process	Use subprocess to open a windows process
Check if process is running	Check if process is running. Validates if given process name (name) is currently running on the system.
Get running processes	Get names of unique processes currently running on the system.
Kill process	Kills a process forcefully
Optical Character Recognition (OCR)	‌‌
Get text with OCR	This activity extracts all text from the current screen or an image if a path is specified.
Find text on screen with OCR	This activity finds position (coordinates) of specified text on the current screen using OCR.
Click on text with OCR	This activity clicks on position (coordinates) of specified text on the current screen using OCR.
Double click on text with OCR	This activity double clicks on position (coordinates) of specified text on the current screen using OCR.
Right click on text with OCR	This activity right clicks on position (coordinates) of specified text on the current screen using OCR.
UiPath	‌‌
Execute a UiPath process	This activity allows you to execute a process designed with the UiPath Studio. All console output from the Write Line activity will be printed as output.
AutoIt	‌‌
Execute an AutoIt script	This activity allows you to run an AutoIt script. If you use the ConsoleWrite function, the output will be presented to you.
Alternative frameworks	‌‌
Execute a Robot Framework test case	This activity allows you to run a Robot Framework test case. Console output of the test case will be printed.
Run a Blue Prism process	This activity allows you to run a Blue Prism process.
Run an Automation Anywhere task	This activity allows you to run an Automation Anywhere task.
General	‌‌
Raise exception	Raises an exception
SAP GUI	‌‌
Quit SAP GUI	Quits the SAP GUI completely and forcibly.
Log in to SAP GUI	Logs in to an SAP system on SAP GUI.
Click on a SAP GUI element	Clicks on an identifier in the SAP GUI.
Get text from a SAP GUI element	Retrieves the text from a SAP GUI element.
Set text of a SAP GUI element	Sets the text of a SAP GUI element.
Highlights a SAP GUI element	Temporarily highlights a SAP GUI element
Portal	‌‌
Create a new job in the Automagica Portal	This activity creates a new job in the Automagica Portal for a given process. The bot performing this activity needs to be in the same team as the process it creates a job for.
Get a credential from the Automagica Portal	This activity retrieves a credential from the Automagica Portal.
Vision	‌‌
Check if element is visible on screen	This activity can be used to check if a certain element is visible on the screen. Note that this uses Automagica Portal and uses some advanced and fuzzy matching algorithms for finding identical elements.
Wait for an element to appear	Wait for an element that is defined by the recorder
Wait Vanish	This activity allows the bot to wait for an element to vanish.
Read Text with Automagica Wand	This activity allows the bot to detect and read the text of an element by using the Automagica Portal API with a provided sample ID.
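The Cryptography rows in the table above note that a key derived from a password and salt can be regenerated whenever both are known, and that MD5 is not cryptographically secure. The sketch below illustrates both points using only the Python standard library. It is not Automagica's implementation: the function names `key_from_password` and `hash_text`, the choice of PBKDF2-HMAC-SHA256, and the iteration count are all assumptions made for illustration.

```python
import base64
import hashlib


def key_from_password(password, salt, iterations=100_000):
    """Derive a 32-byte key from a password and salt (hypothetical helper).

    The result is URL-safe base64 encoded, which is the format a Fernet key
    uses. PBKDF2-HMAC-SHA256 is an assumption, not Automagica's documented choice.
    """
    raw = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )
    return base64.urlsafe_b64encode(raw)


def hash_text(text, method="sha256"):
    """Return the hex digest of text (hypothetical helper).

    SHA-256 is the default here because MD5 is not cryptographically secure.
    """
    return hashlib.new(method, text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    salt = b"example-salt"  # in practice, store a random salt next to the data
    first = key_from_password("correct horse", salt)
    second = key_from_password("correct horse", salt)
    print(first == second)  # same password and salt regenerate the same key
    print(hash_text("abc"))
```

Because the derivation is deterministic, only the salt needs to be stored alongside the encrypted file; the key itself can be rebuilt from the password when the file has to be decrypted.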
	‌‌

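License

Copyright and license

All source code and other files in this repository are copyright Netcall plc unless otherwise stated.

Commercial license

For more information on licensing, trials, and commercial use, see this page.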

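Automagica quick-start guide

⚠️ Important: the Automagica open-source project was acquired by Netcall plc in October 2020. The original AGPL3 open-source license has been discontinued, and the official cloud services (such as Portal and Wand) were offered to existing users free of charge only for a limited transition period before becoming commercial services. This guide is based on the open-source architecture and is suitable for local development and learning; for production use, evaluate commercial licensing or a migration path.

Automagica is an open-source project that aims to make Robotic Process Automation (RPA) widely accessible. It combines Python with visual flow design and supports tasks such as browser automation, file handling, and data generation.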

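Prerequisites

Before you start, make sure your development environment meets the following requirements:

Operating system: Windows 10/11 (recommended; some features, such as beep sounds, are Windows-only), Linux, or macOS.
Python version: Python 3.6 or later.
Dependencies:
- pip (the Python package manager)
- Google Chrome and the matching ChromeDriver (for browser automation).
- Jupyter Notebook (if you plan to develop with Automagica Lab).

Faster downloads in mainland China: when installing Python dependencies, consider using the Tsinghua or Alibaba Cloud mirror to speed up downloads.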

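Installation

1. Install the core library

Open a terminal or Command Prompt (CMD) and run the following command to install the Automagica core package:

pip install automagica -i https://pypi.tuna.tsinghua.edu.cn/simple

2. Install the browser driver

Automagica relies on Chrome by default for web automation. Make sure Google Chrome is installed and download the chromedriver version that matches it.

Add the path of the chromedriver executable to the PATH environment variable, or place the executable in the project root directory.

Tip: you can also use webdriver-manager to manage the driver automatically:
pip install webdriver-manager -i https://pypi.tuna.tsinghua.edu.cn/simple

3. Install optional components

If you want the notebook-style development environment (Automagica Lab), install Jupyter:

pip install jupyter -i https://pypi.tuna.tsinghua.edu.cn/simple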
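Before moving on, it can help to confirm that the prerequisites listed above are actually in place. The following is a minimal sketch using only the Python standard library; `check_environment` is a hypothetical helper, not part of Automagica, and it assumes the driver executable is named `chromedriver` and that an installed Jupyter exposes an importable `jupyter` module.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 6)):
    """Return a dict of booleans, one per prerequisite (hypothetical helper)."""
    return {
        # The interpreter running this script must be at least min_python.
        "python": sys.version_info >= min_python,
        # shutil.which returns the executable's path if it is on PATH, else None.
        "chromedriver": shutil.which("chromedriver") is not None,
        # find_spec returns None when no importable module of that name exists.
        "jupyter": importlib.util.find_spec("jupyter") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

A missing `chromedriver` entry only means the executable is not on PATH; placing it in the project root, as described in step 2, is not detected by this check.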

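Basic usage

At its core, Automagica performs automation tasks through simple Python function calls (Activities). You can run them directly in a Python script or a Jupyter Notebook.

Example: browser operations and test data generation

The following example shows how to open a browser, visit a web page, extract its text, and generate random data.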

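from automagica import *

# 1. Open the Chrome browser and keep a handle to it
browser = Chrome()

# 2. Browse to a URL
browser.browse_to('https://www.example.com')

# 3. Get all the text on the web page
page_text = browser.get_text_on_webpage()
print_console(page_text)

# 4. Find the first link on the page
first_link = browser.find_first_link()
print_console(f"First link found: {first_link}")

# 5. Generate some random test data (for example, a random name and date)
random_name = generate_random_name(locale='zh_CN')  # request a Chinese-locale name
random_date = generate_random_date()

print_console(f"Generated Name: {random_name}")
print_console(f"Generated Date: {random_date}")

# 6. Show a temporary on-screen (OSD) message for 3 seconds
display_osd_message("Automation task complete!", 3)

# 7. Close the browser gracefully (a bare exit() would call Python's built-in and end the script)
browser.exit()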

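Advanced: visual orchestration with Flow

If you prefer a low-code approach, you can launch Automagica Flow (provided the graphical components are installed):

Enter automagica flow on the command line to start the visual designer.
Build the flow by dragging and dropping components.
Where more complex logic is needed, embed Python code such as the snippet above directly.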

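Quick reference for common activities

File encryption/decryption: encrypt_text_with_key(), decrypt_file_with_key()
Random data: generate_random_number(), generate_random_address()
Element location: by_xpath(), by_class(), find_elements_by_text()
Debug output: print_console(), display_osd_message()

With the steps above you can set up a local Automagica development environment and start writing automation scripts.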

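Frequently asked questions

Why does calling the OCR feature (ExtractTextFromImage) raise FileNotFoundError: [WinError 2] The system cannot find the file specified?

How do I install Automagica on Mac OS X? What should I do about missing sty or pywin32 dependencies?

How do I resolve an openssl/opensslv.h error when installing with pip or PyCharm?

What should I do if Jupyter will not open, or clicking to edit a script does nothing (the CMD window flashes and closes)?

How do I use the API Trigger correctly? Why does it return a 404 error?

Has the Automagica project stopped being open source? Are there community alternatives?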
