[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-jina-ai--agentchain":3,"tool-jina-ai--agentchain":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":97,"forks":98,"last_commit_at":99,"license":100,"difficulty_score":101,"env_os":102,"env_gpu":103,"env_ram":104,"env_deps":105,"category_tags":110,"github_topics":111,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":121,"updated_at":122,"faqs":123,"releases":124},3316,"jina-ai\u002Fagentchain","agentchain","Chain together LLMs for reasoning & orchestrate multiple large models for accomplishing complex tasks","AgentChain 是一款基于大语言模型（LLM）的智能编排框架，旨在将多个 AI 模型或智能体串联起来，以协同完成复杂的推理与执行任务。它核心解决了单一模型难以处理多步骤、跨模态复杂工作流的痛点，让用户能通过自然语言指令，灵活调度不同工具来实现从数据理解、内容生成到实际行动（如搜索网页、拨打电话）的全流程自动化。\n\n这款工具特别适合开发者、AI 研究人员以及需要构建定制化智能工作流的技术团队使用。其独特亮点在于“全模态”支持，不仅能处理文本，还能直接输入和输出图像、音频及表格数据，实现跨模态的信息转换与分析。此外，AgentChain 具备强大的智能编排能力，能像大脑一样自主规划任务路径，根据需求动态选择并组合最合适的工具或子模型。系统架构高度可定制，允许用户按需扩展新的智能体或调整分布式架构，为构建专属的多模态 AI 应用提供了坚实且灵活的基础。","\u003Cp>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjina-ai_agentchain_readme_32a43fdaf32e.png\" alt=\"AgentChain logo\" width=\"250px\">\u003C\u002Fa>\n\u003C\u002Fp>\n\nAgentChain uses Large Language Models (LLMs) for planning and 
orchestrating multiple Agents or Large Models (LMs) for accomplishing sophisticated tasks. AgentChain is fully multimodal: it accepts text, image, audio, and tabular data as input and output.\n\n- **🧠 LLMs as the brain:** AgentChain leverages state-of-the-art Large Language Models to provide users with the ability to plan and make decisions based on natural language inputs. This feature makes AgentChain a versatile tool for a wide range of applications, such as task execution given natural language instructions, data understanding, and data generation.\n- **🌟 Fully Multimodal IO:** AgentChain is fully multimodal, accepting input and producing output in various modalities, such as text, image, audio, or video (coming soon). This feature makes AgentChain a versatile tool for a wide range of applications, such as computer vision, speech recognition, and transitioning from one modality to another.\n- **🤝 Orchestrate Versatile Agents:** AgentChain can orchestrate multiple agents to perform complex tasks. Using composability and hierarchical structuring of tools, AgentChain can intelligently choose which tools to use, and when, for a given task. This feature makes AgentChain a powerful tool for projects that require complex combinations of tools.\n- **🔧 Customizable for Ad-hoc Needs:** AgentChain can be customized to fit specific project requirements, making it a versatile tool for a wide range of applications. Specific requirements can be met by enhancing capabilities with new agents (and distributed architecture coming soon).\n\n\n\n\n\n# Get started\n1. Install requirements: `pip install -r requirements.txt`\n2. Download model checkpoints: `bash download.sh`\n3. 
Depending on the agents you need, make sure to export the corresponding environment variables\n\n```shell\nOPENAI_API_KEY={YOUR_OPENAI_API_KEY} # mandatory since the LLM is central in this application\nSERPAPI_API_KEY={YOUR_SERPAPI_API_KEY}  # make sure to include a serp API key in case you need the agent to be able to search the web\n\n# These environment variables are needed in case you want the agent to be able to make phone calls\nAWS_ACCESS_KEY_ID={YOUR_AWS_ACCESS_KEY_ID}\nAWS_SECRET_ACCESS_KEY={YOUR_AWS_SECRET_ACCESS_KEY}\nTWILIO_ACCOUNT_SID={YOUR_TWILIO_ACCOUNT_SID}\nTWILIO_AUTH_TOKEN={YOUR_TWILIO_AUTH_TOKEN}\nAWS_S3_BUCKET_NAME={YOUR_AWS_S3_BUCKET_NAME} # make sure to create an S3 bucket with public access\n```\n4. Install the `ffmpeg` library (needed for Whisper): `sudo apt update && sudo apt install ffmpeg` (Ubuntu command)\n5. Run the main script: `python main.py`\n\n\n## System requirements\nAs of [this commit](https:\u002F\u002Fgithub.com\u002Fjina-ai\u002Fagentchain\u002Fcommit\u002Fda588a728c390fb538fd361d4f41dd50aa193751), running AgentChain requires at least 29 GB of GPU memory.\nMake sure to assign GPU devices correctly in `main.py`.\n\nYou can comment out some tools and models to reduce the GPU memory footprint (at the cost of fewer capabilities).\n\n\n# Demo\n\n\nAgentChain demo 1: transcribing audio and visualizing the result as an image. A video of the AgentChain interface shows an uploaded audio file and the resulting generated image, which is a representation of the audio content.\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F4182659\u002F225347932-87298e6c-58d0-4a29-892f-1398b1406c15.mp4\n\n---\n\nAgentChain demo 2: asking questions about an image. 
A video of the AgentChain interface shows an image and a question being asked about it, with the resulting answer displayed below.\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F4182659\u002F225348027-ed30f9d5-d05b-405a-9651-c08f4976cf83.mp4\n\n---\n\nAgentChain demo 3: question-answering on tabular data and making a phone call to report the results. A video of the AgentChain interface shows a table of data with a question being asked and the resulting answer displayed, followed by a phone call being made using the `CommsAgent`.\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F4182659\u002F225348128-6e9bdb3b-78ed-49e8-80f5-fd7c9ad66f28.mp4\n\n# Agents in AgentChain\n\n> The content of this document mostly shows **our vision** and **what we aim to achieve** with AgentChain.\nCheck the Demo section to understand what we have achieved so far.\n\n![](architecture.svg)\n\nAgentChain is a sophisticated system with the goal of solving general problems. It can orchestrate multiple agents to solve sub-problems. These agents are organized into different groups, each with its own set of capabilities and functionalities. Here are some of the agent groups in AgentChain:\n\n### SearchAgents\nThe `SearchAgents` group is responsible for gathering information from various sources, including search engines, online databases, and APIs. The agents in this group specialize in retrieving up-to-date world knowledge. Some examples of agents in this group include the `Google Search API`, `Bing API`, `Wikipedia API`, and `Serp`.\n\n### CommsAgents\nThe `CommsAgents` group is responsible for handling communication between different parties, such as sending emails, making phone calls, or messaging via various platforms. The agents in this group can integrate with a wide range of platforms. 
Some examples of agents in this group include `TwilioCaller`, `TwilioEmailWriter`, `TwilioMessenger` and `Slack`.\n\n### ToolsAgents\nThe `ToolsAgents` group is responsible for performing various computational tasks, such as performing calculations, running scripts, or executing commands. The agents in this group can work with a wide range of programming languages and tools. Some examples of agents in this group include `Math`, `Python REPL`, and `Terminal`.\n\n### MultiModalAgents\nThe `MultiModalAgents` group is responsible for handling input and output from various modalities, such as text, image, audio, or video (coming soon). The agents in this group can process and understand different modalities. Some examples of agents in this group include `OpenAI Whisper`, `Blip2`, `Coqui`, and `StableDiffusion`.\n\n### ImageAgents\nThe `ImageAgents` group is responsible for processing and manipulating images, such as enhancing image quality, detecting objects, or recognizing image content. The agents in this group can perform complex operations on images. Some examples of agents in this group include `Upscaler`, `ControlNet` and `YOLO`.\n\n### DBAgents\nThe `DBAgents` group is responsible for adding data to and fetching data from your database, such as retrieving metrics or aggregations. The agents in this group interact with databases and enrich other agents with your database information. Some examples of agents in this group include `SQL`, `MongoDB`, `ElasticSearch`, `Qdrant` and `Notion`.\n\n\n# Potential Applications\n\n### Example 1: 🏝️📸🌅 AgentChain Image Generation System for Travel Company\nFor a travel company promoting a new and exotic destination, it is crucial to have high-quality images that can grab the attention of potential travelers. However, manually creating stunning images can be time-consuming and expensive. 
That's why the travel company wants to use AgentChain to automate the image generation process and create beautiful visuals with the help of various agents.\n\nHere is how AgentChain can help by chaining different agents together:\n1. Use `SearchAgent` (`Google Search API`, `Wikipedia API`, `Serp`) to gather information and inspiration about the destination, such as the most popular landmarks, the local cuisine, and the unique features of the location.\n2. Use `ImageAgent` (`Upscaler`) to enhance the quality of images and make them more appealing by using state-of-the-art algorithms to increase the resolution and remove noise from the images.\n3. Use `MultiModalAgent` (`Blip2`) to generate descriptive captions for the images, providing more context and making the images more meaningful.\n4. Use `CommsAgent` (`TwilioEmailWriter`) to send the images to the target audience via email or other messaging platforms, attracting potential travelers with stunning visuals and promoting the new destination.\n\n### Example 2: 💼💹📈 AgentChain Financial Analysis Report for Investment Firm\nAs an investment firm that manages a large portfolio of stocks, it is critical to stay up-to-date with the latest market trends and analyze the performance of different stocks to make informed investment decisions. However, analyzing data from multiple sources can be time-consuming and error-prone. That's why the investment firm wants to use AgentChain to automate the analysis process and generate reports with the help of various agents.\n\nHere is how AgentChain can help by chaining different agents together:\n1. Use `ToolsAgent` (`Python REPL`, `TableQA`) to analyze data from different sources (e.g., CSV files, stock market APIs) and perform calculations related to financial metrics such as earnings, dividends, and P\u002FE ratios.\n2. 
Use `SearchAgent` (`Bing API`) to gather news and information related to the stocks in the portfolio, such as recent earnings reports, industry trends, and analyst ratings.\n3. Use `NLPAgent` (`GPT`) to create a summary and bullet points of the news and information gathered, providing insights into market sentiment and potential trends.\n4. Use `CommsAgent` (`TwilioEmailWriter`) to send a summary report of the analysis to the appropriate stakeholders, helping them make informed decisions about their investments.\n\n### Example 3: 🛍️💬💻 AgentChain Customer Service Chatbot for E-commerce Site\nAs an e-commerce site that wants to provide excellent customer service, it is crucial to have a chatbot that can handle customer inquiries and support requests in a timely and efficient manner. However, building a chatbot that can understand and respond to complex customer requests can be challenging. That's why the e-commerce site wants to use AgentChain to automate the chatbot process and provide superior customer service with the help of various agents.\n\nHere is how AgentChain can help by chaining different agents together:\n1. Use `MultiModalAgent` (`Blip2`, `Whisper`) to handle input from various modalities (text, image, audio), making it easier for customers to ask questions and make requests in a natural way.\n2. Use `SearchAgent` (`Google Search API`, `Wikipedia API`) or `DBAgent` to provide information about products or services whether in-house or public, such as specifications, pricing, and availability.\n3. Use `CommsAgent` (`TwilioMessenger`) to communicate with customers via messaging platforms, providing support and answering questions in real-time.\n4. Use `ToolsAgent` (`Math`) to perform calculations related to discounts, taxes, or shipping costs, helping customers make informed decisions about their purchases.\n5. 
Use `MultiModalAgent` (`Coqui`) to generate natural-sounding responses and hold more complex conversations, providing a personalized and engaging experience for customers.\n\n### Example 4: 🧑‍⚕️💊💤 AgentChain Personal Health Assistant\nAccess to personal health assistance can be expensive and limited. It is essential to have a personal health assistant that can help individuals manage their health and well-being. However, providing personalized health advice and reminders can be challenging, especially for seniors. That's why AgentChain aims to automate the health assistant process and provide personalized support with the help of various agents.\n\nHere is how AgentChain can help by chaining different agents together:\n1. Use `DBAgent` to handle input from various health monitoring devices (e.g., heart rate monitors, blood pressure monitors, sleep trackers), providing real-time health data and alerts to the health assistant.\n2. Use `SearchAgent` (`Google Search API`, `Wikipedia API`) or any other medical database to provide information about health topics and medications, such as side effects, dosage, and interactions.\n3. Use `NLPAgent` (`GPT`) to generate personalized recommendations for diet, exercise, and medication, taking into account the seniors' health goals and preferences.\n4. 
Use `CommsAgent` (`TwilioCaller`, `TwilioMessenger`) to advise, make reminders and provide alerts to help stay on track with their health goals, improving their quality of life and reducing the need for emergency care.\n\n\n## Acknowledgements\nWe appreciate the open source of the following projects:\n\n[Hugging Face](https:\u002F\u002Fgithub.com\u002Fhuggingface) &#8194;\n[LangChain](https:\u002F\u002Fgithub.com\u002Fhwchase17\u002Flangchain) &#8194;\n[Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) &#8194; \n[ControlNet](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet) &#8194; \n[InstructPix2Pix](https:\u002F\u002Fgithub.com\u002Ftimothybrooks\u002Finstruct-pix2pix) &#8194; \n[CLIPSeg](https:\u002F\u002Fgithub.com\u002Ftimojl\u002Fclipseg) &#8194;\n[BLIP](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FBLIP) &#8194;\n[Microsoft](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fvisual-chatgpt) &#8194;\n","\u003Cp>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjina-ai_agentchain_readme_32a43fdaf32e.png\" alt=\"AgentChain logo\" width=\"250px\">\u003C\u002Fa>\n\u003C\u002Fp>\n\nAgentChain 利用大型语言模型（LLMs）来规划并协调多个智能体或大型模型，以完成复杂的任务。AgentChain 是完全多模态的：它接受文本、图像、音频、表格数据作为输入和输出。\n\n- **🧠 LLMs 作为大脑**：AgentChain 利用最先进的大型语言模型，使用户能够基于自然语言输入进行规划和决策。这一特性使 AgentChain 成为一个适用于广泛应用场景的多功能工具，例如根据自然语言指令执行任务、理解数据以及生成数据。\n- **🌟 完全多模态输入输出**：AgentChain 是完全多模态的，可以接受来自不同模态的输入和输出，如文本、图像、音频或视频（即将推出）。这一特性使 AgentChain 成为一个适用于计算机视觉、语音识别以及跨模态转换等广泛领域的多功能工具。\n- **🤝 协调多种智能体**：AgentChain 可以协调多个智能体来完成复杂任务。通过工具的可组合性和层次化结构，AgentChain 能够智能地选择在特定任务中使用哪些工具以及何时使用。这一特性使 AgentChain 成为需要复杂工具组合的项目的强大工具。\n- **🔧 针对临时需求可定制**：AgentChain 可以根据具体项目需求进行定制，使其成为适用于各种应用场景的多功能工具。通过添加新智能体（以及即将推出的分布式架构），可以满足特定需求并增强功能。\n\n\n\n\n\n# 开始使用\n1. 安装依赖：`pip install -r requirements.txt`\n2. 下载模型检查点：`bash download.sh`\n3. 
根据所需的智能体配置，确保正确设置环境变量\n\n```shell\nOPENAI_API_KEY={YOUR_OPENAI_API_KEY} # 必需，因为 LLM 在本应用中处于核心地位\nSERPAPI_API_KEY={YOUR_SERPAPI_API_KEY}  # 如果需要智能体具备网络搜索能力，请务必提供 Serp API 密钥\n\n# 如果希望智能体能够拨打电话，则需要以下环境变量\nAWS_ACCESS_KEY_ID={YOUR_AWS_ACCESS_KEY_ID}\nAWS_SECRET_ACCESS_KEY={YOUR_AWS_SECRET_ACCESS_KEY}\nTWILIO_ACCOUNT_SID={YOUR_TWILIO_ACCOUNT_SID}\nTWILIO_AUTH_TOKEN={YOUR_TWILIO_AUTH_TOKEN}\nAWS_S3_BUCKET_NAME={YOUR_AWS_S3_BUCKET_NAME} # 请确保创建具有公共访问权限的 S3 存储桶\n```\n4. 安装 `ffmpeg` 库（Whisper 所需）：`sudo apt update && sudo apt install ffmpeg`（Ubuntu 命令）\n5. 运行主脚本：`python main.py`\n\n\n## 系统要求\n截至 [此提交](https:\u002F\u002Fgithub.com\u002Fjina-ai\u002Fagentchain\u002Fcommit\u002Fda588a728c390fb538fd361d4f41dd50aa193751)，运行 AgentChain 至少需要 29 GB 的 GPU 内存。不过，请务必在 `main.py` 中正确分配 GPU 设备。\n您也可以注释掉部分工具和模型以减少 GPU 内存占用（但会降低功能）。\n\n\n# 演示\n\n\nAgentChain 演示 1：将音频转录并以图像形式可视化结果。一段 AgentChain 界面视频展示了上传的音频以及生成的图像，该图像是对音频内容的呈现。\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F4182659\u002F225347932-87298e6c-58d0-4a29-892f-1398b1406c15.mp4\n\n---\n\nAgentChain 演示 2：针对图像提问。一段 AgentChain 界面视频展示了一张图片及围绕该图片提出的问题，下方显示了回答结果。\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F4182659\u002F225348027-ed30f9d5-d05b-405a-9651-c08f4976cf83.mp4\n\n---\n\nAgentChain 演示 3：对表格数据进行问答，并通过电话报告结果。一段 AgentChain 界面视频展示了一个数据表，用户提出了一个问题并得到了答案，随后使用 `CommsAgent` 拨打了电话。\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F4182659\u002F225348128-6e9bdb3b-78ed-49e8-80f5-fd7c9ad66f28.mp4\n\n# AgentChain 中的智能体\n\n> 本文档的内容主要展示了我们对 AgentChain 的愿景以及我们希望实现的目标。\n请参阅“演示”部分，了解我们目前所取得的成果。\n\n![](architecture.svg)\n\nAgentChain 是一个旨在解决通用问题的复杂系统。它可以协调多个智能体来完成子问题。这些智能体被组织成不同的组别，每组都有其独特的功能和能力。以下是 AgentChain 中的一些智能体组别：\n\n### 搜索智能体\n`SearchAgents` 组负责从各种来源收集信息，包括搜索引擎、在线数据库和 API。该组中的智能体擅长获取最新的世界知识信息。该组的一些例子包括 `Google Search API`、`Bing API`、`Wikipedia API` 和 `Serp`。\n\n### 通信智能体\n`CommsAgents` 组负责处理不同主体之间的通信，例如发送电子邮件、拨打电话或通过各种平台进行消息传递。该组中的智能体可以与广泛的平台集成。该组的一些例子包括 
`TwilioCaller`、`TwilioEmailWriter`、`TwilioMessenger` 和 `Slack`。\n\n### 工具智能体\n`ToolsAgents` 组负责执行各种计算任务，例如进行计算、运行脚本或执行命令。该组中的智能体可以使用多种编程语言和工具。该组的一些例子包括 `Math`、`Python REPL` 和 `Terminal`。\n\n### 多模态智能体\n`MultiModalAgents` 组负责处理来自不同模态的输入和输出，例如文本、图像、音频或视频（即将推出）。该组中的智能体能够处理和理解不同模态的信息。该组的一些例子包括 `OpenAI Whisper`、`Blip2`、`Coqui` 和 `StableDiffusion`。\n\n### 图像智能体\n`ImageAgents` 组负责处理和操作图像，例如提升图像质量、目标检测或图像识别。该组中的智能体可以对图像执行复杂的操作。该组的一些例子包括 `Upscaler`、 `ControlNet` 和 `YOLO`。\n\n### 数据库智能体\n`DBAgents` 组负责向您的数据库中添加和检索数据，例如从数据库中获取指标或聚合信息。该组中的智能体与数据库交互，并利用您的数据库信息丰富其他智能体的功能。该组的一些例子包括 `SQL`、 `MongoDB`、 `ElasticSearch`、 `Qrant` 和 `Notion`。\n\n\n# 潜在应用\n\n### 示例 1：🏝️📸🌅 旅行公司 AgentChain 图像生成系统\n作为一家正在推广全新异域目的地的旅行公司，拥有能够吸引潜在旅客注意力的高质量图片至关重要。然而，手动制作精美的图片既耗时又昂贵。因此，该旅行公司希望利用 AgentChain 自动化图像生成流程，并借助多种智能体创建精美视觉内容。\n\n以下是 AgentChain 如何通过串联不同智能体来提供帮助：\n1. 使用 `SearchAgent`（Google 搜索 API、维基百科 API、Serp）收集关于目的地的信息和灵感，例如最受欢迎的地标、当地美食以及该地的独特之处。\n2. 使用 `ImageAgent`（Upscaler）通过最先进的算法提升图像分辨率并去除噪点，从而增强图像质量，使其更具吸引力。\n3. 使用 `MultiModalAgent`（Blip2）为图像生成描述性文字，提供更多上下文信息，使图像更具意义。\n4. 使用 `CommsAgent`（TwilioEmailWriter）将生成的图片通过电子邮件或其他消息平台发送给目标受众，以惊艳的视觉效果吸引潜在旅客，推广新目的地。\n\n### 示例 2：💼💹📈 投资公司 AgentChain 财务分析报告\n作为一家管理大量股票投资组合的投资公司，及时掌握最新市场趋势并分析各只股票的表现，以便做出明智的投资决策，至关重要。然而，从多个来源分析数据既耗时又容易出错。因此，该投资公司希望利用 AgentChain 自动化分析流程，并借助各类智能体生成报告。\n\n以下是 AgentChain 如何通过串联不同智能体来提供帮助：\n1. 使用 `ToolsAgent`（Python REPL、TableQA）分析来自不同来源的数据（如 CSV 文件、股市 API），并计算与财务指标相关的数据，例如收益、股息和市盈率。\n2. 使用 `SearchAgent`（Bing API）收集与投资组合中股票相关的信息和新闻，例如近期财报、行业趋势以及分析师评级。\n3. 使用 `NLPAgent`（GPT）对收集到的新闻和信息进行摘要整理，并提炼要点，以洞察市场情绪和潜在趋势。\n4. 使用 `CommsAgent`（TwilioEmailWriter）将分析总结报告发送给相关利益方，帮助他们做出更明智的投资决策。\n\n### 示例 3：🛍️💬💻 电商网站 AgentChain 客服聊天机器人\n作为一家致力于提供优质客户服务的电商平台，拥有一款能够及时高效地处理客户咨询和支持请求的聊天机器人至关重要。然而，构建能够理解并回应复杂客户需求的聊天机器人颇具挑战性。因此，该电商网站希望利用 AgentChain 自动化聊天机器人流程，并借助各类智能体提供卓越的客户服务。\n\n以下是 AgentChain 如何通过串联不同智能体来提供帮助：\n1. 使用 `MultiModalAgent`（Blip2、Whisper）处理来自多种模态的输入（文本、图像、音频），让客户能够以更自然的方式提出问题和请求。\n2. 
使用 `SearchAgent`（Google 搜索 API、维基百科 API）或 `DBAgent` 提供产品或服务的相关信息，无论是内部还是公开资源，例如规格、价格和库存情况。\n3. 使用 `CommsAgent`（TwilioMessenger）通过消息平台与客户沟通，实时提供支持并解答疑问。\n4. 使用 `ToolsAgent`（Math）计算折扣、税费或运费等费用，帮助客户做出更明智的购物决策。\n5. 使用 `MultiModalAgent`（Coqui）生成自然流畅的回复，进行更复杂的对话交流，为客户提供个性化且富有吸引力的服务体验。\n\n### 示例 4：🧑‍⚕️💊💤 AgentChain 个人健康助手\n获得个人健康援助往往成本高昂且机会有限。拥有一位能够帮助个人管理健康和福祉的健康助手至关重要。然而，提供个性化的健康建议和提醒颇具挑战性，尤其是对于老年人而言。因此，AgentChain 致力于自动化健康助手流程，并借助各类智能体提供个性化支持。\n\n以下是 AgentChain 如何通过串联不同智能体来提供帮助：\n1. 使用 `DBAgent` 处理来自各类健康监测设备（如心率监测器、血压计、睡眠追踪器）的输入，向健康助手提供实时健康数据及警报。\n2. 使用 `SearchAgent`（Google 搜索 API、维基百科 API）或其他医疗数据库，提供有关健康主题和药物的信息，例如副作用、剂量和药物相互作用。\n3. 使用 `NLPAgent`（GPT）根据老年人的健康目标和偏好，生成个性化的饮食、运动和用药建议。\n4. 使用 `CommsAgent`（TwilioCaller、TwilioMessenger）提供建议、设置提醒并发出警报，帮助用户保持健康目标的执行进度，从而提升生活质量并减少对紧急医疗服务的需求。\n\n\n## 致谢\n我们感谢以下开源项目的支持：\n\n[Hugging Face](https:\u002F\u002Fgithub.com\u002Fhuggingface) &#8194;\n[LangChain](https:\u002F\u002Fgithub.com\u002Fhwchase17\u002Flangchain) &#8194;\n[Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) &#8194; \n[ControlNet](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet) &#8194; \n[InstructPix2Pix](https:\u002F\u002Fgithub.com\u002Ftimothybrooks\u002Finstruct-pix2pix) &#8194; \n[CLIPSeg](https:\u002F\u002Fgithub.com\u002Ftimojl\u002Fclipseg) &#8194;\n[BLIP](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FBLIP) &#8194;\n[Microsoft](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fvisual-chatgpt) &#8194;","# AgentChain 快速上手指南\n\nAgentChain 是一个利用大语言模型（LLM）进行规划，并编排多个智能体（Agents）以完成复杂任务的多模态框架。它支持文本、图像、音频和表格数据作为输入和输出。\n\n## 环境准备\n\n### 系统要求\n- **GPU 显存**：至少需要 **29 GB** 显存才能运行完整功能（基于当前版本）。\n  - *优化建议*：如果显存不足，可以编辑 `main.py` 注释掉部分工具或模型以减少显存占用，但会相应减少可用功能。\n- **操作系统**：推荐 Linux (如 Ubuntu)，Windows 用户建议使用 WSL2。\n- **依赖库**：需要安装 `ffmpeg` 以支持音频处理（Whisper 模型）。\n\n### 前置依赖\n确保已安装 Python 3.8+ 和 Git。\n\n## 安装步骤\n\n### 1. 
克隆项目并安装依赖\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fjina-ai\u002Fagentchain.git\ncd agentchain\npip install -r requirements.txt\n# 国内开发者建议使用清华源加速安装：\n# pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 2. 下载模型检查点\n执行官方提供的下载脚本：\n```bash\nbash download.sh\n```\n\n### 3. 安装 FFmpeg\nAgentChain 依赖 `ffmpeg` 处理音频，请在终端执行以下命令安装：\n```bash\n# Ubuntu\u002FDebian\nsudo apt update && sudo apt install ffmpeg\n\n# macOS (需先安装 Homebrew)\nbrew install ffmpeg\n```\n\n### 4. 配置环境变量\n根据你需要使用的智能体功能，导出相应的 API Key。LLM 是核心组件，`OPENAI_API_KEY` 为必填项。\n\n```bash\nexport OPENAI_API_KEY={YOUR_OPENAI_API_KEY}\n\n# 如果需要联网搜索功能\nexport SERPAPI_API_KEY={YOUR_SERPAPI_API_KEY}\n\n# 如果需要电话呼叫功能 (可选)\nexport AWS_ACCESS_KEY_ID={YOUR_AWS_ACCESS_KEY_ID}\nexport AWS_SECRET_ACCESS_KEY={YOUR_AWS_SECRET_ACCESS_KEY}\nexport TWILIO_ACCOUNT_SID={YOUR_TWILIO_ACCOUNT_SID}\nexport TWILIO_AUTH_TOKEN={YOUR_TWILIO_AUTH_TOKEN}\nexport AWS_S3_BUCKET_NAME={YOUR_AWS_S3_BUCKET_NAME}\n```\n> **注意**：请确保已创建具有公共访问权限的 AWS S3 存储桶（如需使用电话功能）。\n\n## 基本使用\n\n### 运行主程序\n配置完成后，直接运行主脚本即可启动 AgentChain：\n\n```bash\npython main.py\n```\n\n### 功能示例\n启动后，AgentChain 将根据你的输入自动编排不同的智能体组（如搜索、通信、多模态处理等）来完成任务。支持的典型场景包括：\n\n1.  **多模态转换**：上传音频文件，自动转录并生成代表音频内容的图像。\n2.  **视觉问答**：上传图片并提出问题，系统将分析图片内容并回答。\n3.  
**数据分析与报告**：上传表格数据，系统进行问答分析，并通过电话或邮件报告结果。\n\n### 自定义开发\n你可以通过修改代码组合不同的智能体组来满足特定需求：\n- **SearchAgents**: 调用 Google\u002FBing\u002FWikipedia 获取实时信息。\n- **CommsAgents**: 集成 Twilio\u002FSlack 进行邮件、电话或消息发送。\n- **MultiModalAgents**: 使用 Whisper, Blip2, StableDiffusion 处理音视频和图像。\n- **ToolsAgents**: 执行 Python 代码、数学计算或终端命令。\n- **DBAgents**: 连接 SQL\u002FMongoDB\u002FNotion 进行数据读写。\n\n通过调整 `main.py` 中的逻辑，你可以灵活构建适用于旅行推广、金融分析、电商客服或个人健康助手等场景的应用。","某电商运营团队需要每日处理大量用户语音投诉，从中提取关键信息、查询订单状态，并自动生成可视化报告拨打回访电话。\n\n### 没有 agentchain 时\n- **多模态数据割裂**：语音录音需人工转写为文字，图片证据需单独上传分析，数据流转依赖手动操作，效率极低。\n- **任务流程繁琐**：员工需先在客服系统查订单，再去数据库比对日志，最后手动拨号回访，跨系统切换耗时费力。\n- **决策缺乏智能**：面对复杂投诉（如“货不对板且物流延误”），难以自动规划先查物流再核对商品图的执行顺序，容易遗漏关键步骤。\n- **扩展成本高昂**：若要增加“自动发送赔偿短信”或“生成趋势图表”等新功能，需重新开发接口并编写大量胶水代码。\n\n### 使用 agentchain 后\n- **全模态自动流转**：agentchain 直接接收语音和图片输入，自动调用 Whisper 转写文本并利用视觉模型分析图片，无需人工干预预处理。\n- **智能编排闭环**：大模型作为“大脑”自动规划路径：先解析语音意图，接着调用搜索工具查订单，最后通过 Twilio 自动拨打回访电话，一气呵成。\n- **动态推理决策**：针对复杂场景，agentchain 能自主判断需先对比商品图与订单描述，再决定是否需要升级工单，确保逻辑严密无遗漏。\n- **灵活定制扩展**：只需配置新 Agent 节点，即可轻松加入“自动生成赔偿方案”或“绘制投诉热力图”等功能，无需重构底层架构。\n\nagentchain 通过将大模型的推理能力与多模态工具链深度结合，让复杂的跨系统自动化任务变得像自然对话一样简单高效。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjina-ai_agentchain_32a43fda.png","jina-ai","Jina AI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fjina-ai_e871d2a3.png","Your Search Foundation, Supercharged!",null,"hello@jina.ai","JinaAI_","https:\u002F\u002Fjina.ai","https:\u002F\u002Fgithub.com\u002Fjina-ai",[85,89,93],{"name":86,"color":87,"percentage":88},"Python","#3572A5",96.5,{"name":90,"color":91,"percentage":92},"Shell","#89e051",2.3,{"name":94,"color":95,"percentage":96},"Dockerfile","#384d54",1.2,609,56,"2026-03-29T07:15:50","MIT",4,"Linux","必需，至少需要 29GB 显存 (GPU Memory)，具体显卡型号未说明","未说明",{"notes":106,"python":104,"dependencies":107},"1. 必须设置 OPENAI_API_KEY 环境变量。2. 如需使用网络搜索、电话呼叫等功能，需额外配置 SerpAPI、AWS 和 Twilio 的相关密钥。3. 在 Ubuntu 系统上需手动安装 ffmpeg 库以支持 Whisper 模型。4. 可通过注释掉 main.py 中的部分工具和模型来降低显存占用，但会减少功能。5. 
需运行 download.sh 脚本下载模型检查点。",[108,109],"requirements.txt 中定义的依赖","ffmpeg",[26,54,14,55,13],[112,113,114,115,116,117,118,119,120],"artificial-intelligence","llm","machine-learning","multimodal","nlproc","langchain","stable-diffusion","blip","whisper","2026-03-27T02:49:30.150509","2026-04-06T08:42:14.374042",[],[]]