AIlice is a fully autonomous, general-purpose AI agent built on open-source large language models, conceived as a standalone personal assistant in the spirit of JARVIS from the Iron Man films. It addresses the difficulty traditional AI has in independently handling complex, multi-step tasks: AIlice autonomously decomposes a large goal into subtasks, dynamically creates and coordinates different agent roles to work on them, and integrates the results. Whether the job is in-depth thematic research, coding, system administration, or a literature review, AIlice handles it efficiently, and it even has self-evolution capabilities, autonomously extending itself with new features.

The project's distinctive IACT (Interactive Agents Call Tree) architecture is its core highlight, giving the system high fault tolerance and flexible task decomposition. AIlice also supports local deployment to protect privacy, and it integrates a practical voice dialogue feature and the latest MCP tool-extension capability, making interaction more natural and the system more capable.

AIlice is well suited to developers, researchers, and technology enthusiasts exploring advanced local AI applications. For users who want to break free of cloud dependence and build powerful automated workflows locally, it is a valuable open-source choice. With simple configuration, users can combine commercial or open-source models and unleash the real-world potential of large language models.

<div align="center">
    <img src="https://oss.gittoolsai.com/images/myshell-ai_AIlice_readme_5502bfb65596.png" height=256>
    <h1>AIlice</h1>

[![forks](https://img.shields.io/github/forks/myshell-ai/AIlice)](https://github.com/myshell-ai/AIlice)
[![stars](https://img.shields.io/github/stars/myshell-ai/AIlice)](https://github.com/myshell-ai/AIlice)
[![watchers](https://img.shields.io/github/watchers/myshell-ai/AIlice)](https://github.com/myshell-ai/AIlice)
[![Visitors](https://oss.gittoolsai.com/images/myshell-ai_AIlice_readme_b13f8d0d8b9d.png)](https://visitorbadge.io/status?path=https%3A%2F%2Fgithub.com%2Fmyshell-ai%2FAIlice)
[![license](https://img.shields.io/github/license/myshell-ai/AIlice)](./LICENSE)

</div>

<p align="center">
  <a href="#quick-start">Quick Start</a> •
  <a href="https://www.youtube.com/@stevenlu-zh6ds">Demo</a> •
  <a href="#development">Development</a> •
  <a href="https://twitter.com/stevenlu1729">Twitter</a> •
  <a href="https://www.reddit.com/r/AIlice/">Reddit</a>
</p>

---
**ATTENTION! We currently have no plans to create any related crypto tokens. Please be cautious and watch out for scams to avoid being deceived. (Updated on January 6, 2025)**

:fire: Mar 22, 2025: AIlice can now use MCP tools! Click [here](#configuring-extension-modules-and-mcp-servers).

:fire: Jan 23, 2025: Updated the voice dialogue feature. Thanks to ChatTTS's excellent implementation, voice dialogue has finally moved beyond its experimental status and become practical.

:fire: Jun 22, 2024: We have entered the era of locally running JARVIS-like AI assistants! The latest open-source LLMs enable us to perform complex tasks locally! Click [here](#guide-to-choosing-an-llm) to learn more.

----

AIlice is a fully **autonomous, general-purpose AI agent**. This project aims to create a standalone artificial intelligence assistant, similar to JARVIS, based on open-source LLMs. Using its unique IACT (Interactive Agents Call Tree) architecture, AIlice can decompose complex tasks into dynamically constructed agents and integrate results with high fault tolerance. Currently, AIlice demonstrates proficiency in a range of tasks, including **thematic research, coding, system management, literature reviews, and complex hybrid tasks** that go beyond these basic capabilities.

We will ultimately achieve **self-evolution of AI agents**: AI agents will autonomously build their own feature expansions and new types of agents, seamlessly unleashing the LLM's knowledge and reasoning capabilities into the real world.

---

## 🚀 Try AIlice Online

**Experience AIlice instantly at [kragent.ai](https://kragent.ai/)** - a platform built on AIlice that lets you explore its capabilities without local setup. Perfect for testing and discovering what this AI assistant can do!

Use **your own API key** to configure powerful commercial LLMs and unlock their full potential (it will be a completely different beast from the free default configuration!). For optimal cost-effectiveness, consider a hybrid setup combining commercial and open-source LLMs. Recommended models: **Claude 3.5/3.7, Gemini 2.5 Pro**.

---
- [Features](#features)
- [Quick Start](#quick-start)
  - [Quick Installation](#quick-installation)
  - [COOL things we can do](#cool-things-we-can-do)
- [Installation and Usage](#installation-and-usage)
  - [System Requirements](#system-requirements)
  - [Environment Configuration and Installation](#environment-configuration-and-installation)
  - [If You Need to Frequently Use Google](#if-you-need-to-frequently-use-google)
  - [Usage](#usage)
  - [Configuring Extension Modules and MCP Servers](#configuring-extension-modules-and-mcp-servers)
  - [Useful Tips](#useful-tips)
- [Selection and Configuration of LLM](#selection-and-configuration-of-LLM)
  - [Guide to Choosing an LLM](#guide-to-choosing-an-llm)
  - [The Most Outstanding Open-source Model](#the-most-outstanding-open-source-model)
  - [How to Add LLM Support](#how-to-add-llm-support)
    - [Using LLM through Inference Services](#using-llm-through-inference-services)
      - [Example 1: ollama + litellm](#example-1-ollama-litellm)
      - [Example 2: LM Studio](#example-2-lm-studio)
      - [Example 3: Add open source multimodal model support](#example-3-add-open-source-multimodal-model-support)
    - [Open Source Models on Huggingface](#open-source-models-on-huggingface)
  - [Using Different Models in Different Agents](#using-different-models-in-different-agents)
- [Development](#development)
  - [Design](#design)
    - [Computational Model: Interactive Agents Call Tree](#computational-model-interactive-agents-call-tree)
    - [Basic Computing Unit: Tai Chi Diagram of LLM and Interpreter](#basic-computing-unit-tai-chi-diagram-of-llm-and-interpreter)
    - [Agent Design: Implementing the Interpreter Framework](#agent-design-implementing-the-interpreter-framework)
    - [Scripting Language: From Text to Reality](#scripting-language-from-text-to-reality)
    - [Multimodal: Collaboration of Rich Text and Variable Mechanisms](#multimodal-collaboration-of-rich-text-and-variable-mechanisms)
    - [Self-Expansion: Growing Like a Tree](#self-expansion-growing-like-a-tree)
  - [How Developers Should Get Started](#how-developers-should-get-started)
  - [Project Development Standards and Constraints](#project-development-standards-and-constraints)
  - [Future Development Roadmap](#future-development-roadmap)


<a name="features"></a>
## Features
Key technical features of AIlice include:

- **In-depth research capabilities on specialized subjects.**
- **The ability to read and analyze articles and scholarly works.**
- **Advanced automation in programming and script execution, functioning as a comprehensive coder and an efficient system management tool, similar to an AI-powered operating system.**
- **Voice interaction support.**
- **Compatibility with open-source models and seamless integration with commercial models.**
- **Native multi-modal support across all agents.**
- **Rich media UI with image/video/audio, LaTeX formulas, code highlighting, and file upload/download support.**
- **A natural and highly fault-tolerant Interactive Agents Call Tree architecture.**
- **Flexible parsing of LLM outputs, enabling a broader range of function call mechanisms.**
- **The capability to self-construct and dynamically load modules for interacting with the environment, providing endless possibilities for expanding features.**

<a name="quick-start"></a>
## Quick Start
<a name="quick-installation"></a>
### Quick Installation

Install and run AIlice with the following commands. Once AIlice is launched, open the web page it serves in a browser; a dialogue interface will appear. Issue commands to AIlice through the conversation to accomplish various tasks. For your first use, you can try the commands provided in the [COOL things we can do](#cool-things-we-can-do) section to get familiar quickly.

**Local run:**

```bash
git clone https://github.com/myshell-ai/AIlice.git
cd AIlice
pip install -e .
ailice --contextWindowRatio=0.2
```

**Sandbox run:**

```bash
git clone https://github.com/myshell-ai/AIlice.git
cd AIlice
docker build -t ailice .
docker run -it -p 127.0.0.1:5000:5000 --name ailice ailice --expose=1 --contextWindowRatio=0.2
```

**Sandbox run with CUDA support** (please install [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) first):

```bash
git clone https://github.com/myshell-ai/AIlice.git
cd AIlice
docker build --build-arg BASE_IMAGE=nvidia/cuda:13.0.0-cudnn-devel-ubuntu24.04 -t ailice .
docker run --gpus all -it -p 127.0.0.1:5000:5000 --name ailice ailice --expose=1 --contextWindowRatio=0.2
```

**Sandbox run with GUI support** (Linux only; special configuration required for Windows and macOS):

```bash
git clone https://github.com/myshell-ai/AIlice.git
cd AIlice
docker build -t ailice .
docker run -it -p 127.0.0.1:5000:5000 \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    --name ailice \
    ailice --expose=1 --contextWindowRatio=0.2
```

- For a more detailed understanding of the installation and configuration methods, please visit the [Installation and Usage](#installation-and-usage) section and the [Selection and Configuration of LLM](#selection-and-configuration-of-LLM) section.
- To grasp the basic design principles of AIlice, navigate to the [Design](#design) section.


<a name="cool-things-we-can-do"></a>
### COOL Things We Can Do

#### Quick Start Examples
- "List the contents of the current directory."
- "Check the weather in San Francisco today."
- "Calculate the integral of e^(-x^2) from negative infinity to positive infinity with detailed derivation steps."
- "Generate a fractal visualization using any algorithm of your choice."

#### System Administration
- "Install the Google Chrome browser on this system. Download the latest stable version, verify the installation, and confirm it's working properly."

#### Software Development & Analysis
- "Clone the GiraffeCV project from GitHub, analyze its architecture, identify the main module interfaces, and provide a detailed report."

#### Artificial Intelligence
- "Use SDXL to generate an image of 'a fat orange cat'. Reference the sample code from its Hugging Face page, save the image to the current directory, and display the result."

#### Web Development
- "Generate a professional homepage for the AIlice AI agent project on GitHub. Run it on local port 59001 with a beautiful interface including images and text content."

#### Engineering Design & Simulation
- "Design a frequency modulation (FM) radio receiver with simulation. Provide a schematic diagram and simulation results."
- "Use CadQuery to generate a gear with custom parameters. Provide projection views from multiple angles after generation."
#### Cybersecurity
- "Perform a comprehensive security scan of the kragent.ai website. Check for common vulnerabilities and provide a detailed security assessment report with recommendations."

#### Academic Research & Literature Review
- "Find a recent review paper on the black hole information paradox. Use it to collect URLs of important literature from the past five years, read them, and report on the field's progress."

#### Theoretical Physics
- "Derive the wave equation and interference theory, create numerical simulations and visualizations, and generate LaTeX PDF slides."

#### Mathematics
- "Create a 3D visualization of holonomy in differential geometry."
- "Generate an animated demonstration of the Gibbs phenomenon in Fourier series approximation."

#### Medical Research
- "Investigate current therapeutic approaches for cardiac sarcoidosis. Search recent literature from 2022-2025, analyze clinical trial results, and compile a comprehensive report on emerging treatment modalities."

#### Computational Biology
- "Using molecular phylogenetic data and comparative genomics, calculate the most recent common ancestor (MRCA) time between humans and cats."

#### Astronomy & Astrophysics
- "Calculate the current position of Mars as viewed from California using the horizontal coordinate system (altitude and azimuth)."

#### Document Generation
- "Generate a science-style PDF paper with proper formatting and self-determined content."

#### Dataset Construction & Automated Collection
- "Search the internet for 100 physics tutorials across various branches and download the PDF files to a 'physics' folder. Organize them by subdiscipline (mechanics, electromagnetism, quantum physics, etc.) and create an index with metadata."

#### Video Processing & AI Analysis Pipeline
**Task:** Process Feynman's physics lectures through a complete AI pipeline

**Steps:**

1. Find Feynman's lecture videos on YouTube and download them to a `Feynman/` subdirectory (create the folder first)
2. Extract audio from the videos and save it to `Feynman/audio/`
3. Convert the audio to text using whisper-large-v3 (reference the Hugging Face example code) and merge it into a single document
4. Extract the answer to "Why do we need antiparticles?" from the transcribed text

*This multi-step task requires interactive communication with the AI agent, using the "Interrupt" button when needed to guide the process.*

#### Extensibility & Module Development

1. "Write an ext-module to obtain wiki page content through keywords."
2. "Load the newly implemented wiki module and use it to query the relativity entry."

*The AI agent can construct external interaction modules (ext-modules) autonomously, providing unlimited extensibility through simple prompts.*

<a name="installation-and-usage"></a>
## Installation and Usage

<a name="system-requirements"></a>
### System Requirements

If you do not plan to run LLMs locally, running AIlice has virtually no hardware requirements. For users who want to run LLMs locally, currently only models with 70B or more parameters can perform tasks well, so you need at least two RTX 4090 GPUs (48GB of VRAM) to complete tasks effectively.

- AIlice was developed on **Ubuntu**, so installation and usage on Ubuntu are the best supported.
- Usability in a **macOS** environment is similar to that in an Ubuntu environment.
- For **Windows** users, using **Docker** or installing **WSL** (Windows Subsystem for Linux) and running AIlice within WSL is the better choice, especially for users who need to execute programming tasks -- I haven't integrated Windows command-execution tools yet (this will be considered in the future, but the flexibility of command-line tools is of great significance for AI agents, which makes Linux platforms more advantageous). Additionally, the lack of testing on Windows significantly increases the likelihood of bugs; if you encounter related issues, please submit issues for resolution.

Before installing AIlice, it is strongly recommended to install **Anaconda and create a virtual environment** first (you can also use other tools you prefer, such as venv). You will also need **Chrome**, as AIlice needs it for web browsing. For users who want to run AIlice in a fully controlled container/virtual machine, you will need **Docker** (or another virtual machine, such as VirtualBox).

If you want to run AIlice in a virtual machine, ensure **Hyper-V** is turned off (otherwise llama.cpp cannot be installed). In a VirtualBox environment, you can disable it as follows: disable PAE/NX and VT-x/AMD-V (Hyper-V) in the VirtualBox settings for the VM, set the paravirtualization interface to Default, and disable nested paging.

<a name="environment-configuration-and-installation"></a>
### Environment Configuration and Installation
You can use the following commands to install AIlice:

```bash
git clone https://github.com/myshell-ai/AIlice.git
cd AIlice
pip install -e .
```

AIlice may run slowly after installation, because the long-term memory module's embedding-model computation might be running on the CPU. In that case, you can try running the following command to install GPU acceleration:

```bash
ailice_turbo
```

If you need the PDF reading / voice dialogue / Hugging Face models / model fine-tuning functions, you can use one of the following commands (installing too many features increases the likelihood of dependency conflicts, so it is recommended to install only the parts you need):

```bash
pip install -e .[pdf-reading]
pip install -e .[speech]
pip install -e .[huggingface]
pip install -e .[finetuning]
```

You can run AIlice now! Use the commands in [Usage](#usage).

<a name="if-you-need-to-frequently-use-google"></a>
### If You Need to Frequently Use Google
By default, the Google module in AIlice is restricted, and repeated usage can lead to errors that take some time to resolve. This is an awkward reality of the AI era: traditional search engines only allow access to genuine users, and AI agents currently don't fall within the category of "genuine users". While we have alternative solutions, they all require configuring an API key, which sets a high barrier to entry for ordinary users. However, if you require frequent access to Google, I assume you'd be willing to endure the hassle of applying for Google's official API key (specifically, the [Custom Search JSON API](https://developers.google.com/custom-search/v1/overview), which requires you to specify searching the entire internet at creation time) for search tasks.
For these users, please open config.json and use the following configuration:

```
{
    ...
    "services": {
        ...
        "google": {
          "cmd": "python3 -m ailice.modules.AGoogleAPI --addr=ipc:///tmp/AGoogle.ipc --api_key=YOUR_API_KEY --cse_id=YOUR_CSE_ID",
          "addr": "ipc:///tmp/AGoogle.ipc"
        },
        ...
    }
}
```

and install google-api-python-client:

```bash
pip install google-api-python-client
```

Then simply restart AIlice.
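Before wiring the key into config.json, you can sanity-check it with a standalone probe using google-api-python-client. This snippet is just an illustration, not part of AIlice, and the query string is arbitrary:

```python
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"
CSE_ID = "YOUR_CSE_ID"  # the CSE must be configured to search the entire web

# Build a client for the Custom Search JSON API and run one test query.
service = build("customsearch", "v1", developerKey=API_KEY)
results = service.cse().list(q="AIlice autonomous agent", cx=CSE_ID, num=5).execute()

for item in results.get("items", []):
    print(item["title"], "->", item["link"])
```

If this prints results, the same --api_key and --cse_id values should work in the module configuration above.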
<a name="usage"></a>
### Usage

You can directly copy a command from the typical use cases below to run AIlice.

```bash
ailice   # Use the models configured individually for different agents under the agentModelConfig field in config.json.
ailice_web --speechOn=1 --ttsDevice=cuda --sttDevice=cuda
ailice --modelID=anthropic:claude-sonnet-4-20250514 --contextWindowRatio=0.2
ailice --modelID=openrouter:z-ai/glm-4.5 --chatHistoryPath=./chat_history --contextWindowRatio=0.2
ailice --modelID=mistral:mistral-large-latest --prompt="researcher"
ailice --modelID=deepseek:deepseek-chat
ailice --modelID=hf:Open-Orca/Mistral-7B-OpenOrca --quantization=8bit --contextWindowRatio=0.6
ailice --modelID=groq:llama3-70b-8192
ailice --modelID=openrouter:google/gemini-2.5-pro
ailice --modelID=lm-studio:qwen2-72b --contextWindowRatio=0.5
```

Note that the last use case requires you to configure the LLM inference service first; please refer to [How to Add LLM Support](#how-to-add-llm-support). Inference frameworks such as LM Studio can use limited hardware resources to support larger models while providing faster inference and faster AIlice startup, which makes them more suitable for ordinary users.

When you run AIlice for the first time, you will be asked to enter the API key. You can also modify the API key later by editing the config.json file. Please note that when using an open-source LLM for the first time, downloading the model weights takes a long time, so make sure you have enough time and disk space.

When you turn on the speechOn switch for the first time, you may need to wait a long time at startup; this is because the weights of the speech recognition and TTS models are being downloaded in the background.

As shown in the examples, you can use the agent through `ailice`, which provides a web dialogue interface. You can view the default value of each parameter with

```bash
ailice --help
```

The default values for all command-line arguments can be customized by modifying the corresponding parameters in config.json.

- --**modelID** There are two modes for model configuration. In the first mode, the model is uniformly specified by modelID. In the second mode, different types of agents run on different models. When this parameter is an empty string (unspecified), the second mode is used automatically, i.e., the models configured individually for different agents under the agentModelConfig field in config.json are used; for details please refer to [Using Different Models in Different Agents](#using-different-models-in-different-agents). The currently supported models can be seen in config.json.
- --**quantization** is the quantization option; you can choose 4bit or 8bit. The default is no quantization.
- --**maxMemory** constrains CPU memory and GPU video memory capacity. It is unset by default; when set, the format looks like "{0: \"23GiB\", 1: \"24GiB\", \"cpu\": \"64GiB\"}".
- --**prompt** specifies the prompt to be executed, i.e., the type of agent. The default is 'main'; this agent decides which agent type to call according to your needs. You can also specify a particular type of agent and interact with it directly.
- --**temperature** sets the temperature parameter for LLM reasoning; the default is zero.
- --**flashAttention2** is the switch to enable FlashAttention-2 to speed up inference. It may have some impact on output quality.
- --**contextWindowRatio** is a user-specified proportion coefficient that, in some cases, determines the upper limit of the constructed prompt length as a fraction of the LLM context window (for example, a ratio of 0.2 with a 128k-token window caps constructed prompts at roughly 25,600 tokens). The default value is 0.6.
- --**speechOn** is the switch to enable voice conversation.
- --**ttsDevice** specifies the computing device used by the text-to-speech model. The default is "cpu"; you can set it to "cuda" if there is enough video memory.
- --**sttDevice** specifies the computing device used by the speech-to-text model. The default is "cpu"; you can set it to "cuda" if there is enough video memory.
- --**resetApiKey** sets whether to reset the model's API key after startup.
- --**chatHistoryPath** specifies the directory where chat history data is stored.
- --**certificate** sets the certificate for the web interface. The simplest option is an empty string, which uses plain HTTP for the UI web page. Setting it to 'adhoc' uses a self-generated certificate, encrypting the data flow between the UI and server, but it requires dismissing browser security warnings. The most secure method is to obtain a certificate and set this parameter to '{"cert": "your_cert.pem", "key": "your_key.pem"}'.
- --**expose** sets whether to provide public access.
- --**share** creates a publicly shareable link for AIlice. (For security reasons, we have temporarily removed this feature. It will be re-enabled once more security measures are implemented in the UI. Please ensure that the services provided by app.py are not exposed to any untrusted networks.)
<a name="configuring-extension-modules-and-mcp-servers"></a>
### Configuring Extension Modules and MCP Servers

AIlice's configuration file is named config.json, and its location is printed to the command line when AIlice starts. In this section, we introduce how to extend AIlice's external interaction capabilities through the configuration file.

In AIlice, we use the term "module" specifically for components that provide functions for interacting with the external world. Each module runs as an independent process; modules can run in different software or hardware environments from the core process, which makes AIlice capable of being distributed. The configuration file provides a series of basic module configurations required for AIlice's operation (such as the vector database, search, browser, code execution, etc.). You can also add configurations for new modules. Module configuration is very simple, consisting of only two items:

```json
  "services": {
    ...
    "scripter": {"cmd": "python3 -m ailice.modules.AScripter --addr=tcp://127.0.0.1:59000",
                 "addr": "tcp://127.0.0.1:59000"},
    ...
  }
```

**"cmd"** is a command line used to start the module's process. When AIlice starts, it automatically runs these commands to launch the modules. Users can specify any command, which provides significant flexibility: you can start a module's process locally, use Docker to start a process in a virtual environment, or even start a remote process. Some modules have multiple implementations (such as Google/Storage), and you can switch to another implementation here.

**"addr"** is the address and port of the module process. Users might be confused by the fact that many modules in the default configuration carry addresses and port numbers in both "cmd" and "addr", which looks redundant. This is because "cmd" can, in principle, contain any command (which may include addresses and port numbers, or none at all). Therefore, a separate "addr" item is necessary to tell AIlice how to access the module process.

AIlice can use tools from various **MCP servers**: simply wrap the MCP server with the ailice_mcp_wrapper command, which allows it to be used as a standard extension module for AIlice.

For an MCP server started locally in stdio mode, assuming the startup command is `mcp_echo hello`, we use the following configuration to launch it as a standard AIlice service:

```json
  "services": {
    ...
    "mcp_echo": {"cmd": "ailice_mcp_wrapper --addr tcp://127.0.0.1:59200 stdio mcp_echo hello",
                 "addr": "tcp://127.0.0.1:59200"},
    ...
  }
```

For an MCP server in SSE mode, assuming its service address is `http://example:8000/sse`, we use the following configuration to connect to it:

```json
  "services": {
    ...
    "mcp_echo": {"cmd": "ailice_mcp_wrapper --addr tcp://127.0.0.1:59200 sse --server_url http://example:8000/sse",
                 "addr": "tcp://127.0.0.1:59200"},
    ...
  }
```

For extension modules that are set up in the configuration file, AIlice automatically loads them after startup and selects the appropriate tools for the current task. In addition to using config.json, you can also give AIlice the address and port of an extension module process at runtime, and it can dynamically load and use the module.
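To make the cmd/addr split concrete, here is a minimal stdlib-only sketch (our illustration, not AIlice's actual module supervisor) that launches each configured service via its "cmd" and then polls its "addr" until the TCP port accepts connections. It only understands tcp:// addresses and skips ipc:// ones:

```python
import json
import shlex
import socket
import subprocess
import time
from urllib.parse import urlparse

def wait_for_tcp(addr: str, timeout: float = 30.0) -> bool:
    """Poll a tcp://host:port address until it accepts connections."""
    url = urlparse(addr)
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((url.hostname, url.port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False

with open("config.json") as f:
    config = json.load(f)

for name, svc in config["services"].items():
    # "cmd" starts the module process; "addr" is how the core reaches it.
    proc = subprocess.Popen(shlex.split(svc["cmd"]))
    if svc["addr"].startswith("tcp://"):
        reachable = wait_for_tcp(svc["addr"])
        print(f"{name}: pid={proc.pid} reachable={reachable}")
```

The point of the sketch is simply that the two fields answer two different questions: "cmd" is how the process comes into existence, and "addr" is how it is found afterwards.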
<a name="useful-tips"></a>
### Useful Tips

AIlice is an agent based on multi-agent cooperation, and as a user, you are also one of the "agents". Hence, when AIlice requires additional information, it will seek input from you, and the thoroughness of your details is crucial to her success.

**Interrupts are the second interaction mode supported by AIlice; they allow you to interrupt and provide prompts to AIlice's agents at any time to correct errors or provide guidance.** In `ailice`, during task execution, an interrupt button appears on the right side of the input box. Pressing it pauses AIlice's execution and waits for your prompt message. You can enter your prompt into the input box and press Enter to send the message to the agent currently executing the subtask. Proficient use of this feature requires a good understanding of how AIlice works, especially the agent call tree architecture. It also involves focusing more on the command-line window than on the dialogue interface during task execution. Overall, this is a highly useful feature, especially on less powerful language model setups.

In voice dialogue mode, you can **ask AIlice to switch between different voice tones** until you find one you like (it will remember your preferred voice tone).

<a name="selection-and-configuration-of-LLM"></a>
## Selection and Configuration of LLM

<a name="guide-to-choosing-an-llm"></a>
### Guide to Choosing an LLM

Updated on Aug 15, 2025.

Currently, AIlice can **handle more complex tasks using a locally run 72B open-source model (qwen-2.5-72b-instruct running on 4090x2)**. Considering the low cost of open-source models, we highly recommend starting with them. Moreover, running the LLM locally ensures absolute privacy protection, a rare quality in AI applications in our time. Click [here](#example-2-lm-studio) to learn how to run an LLM locally. For users whose GPUs are insufficient to run large models, you can use an online inference service (such as OpenRouter, covered below) to access these open-source models (though this sacrifices privacy). You can make agents excel by leveraging different models according to their strengths and weaknesses; for details, please refer to [Using Different Models in Different Agents](#using-different-models-in-different-agents).

**claude-3-5-sonnet/claude-3-7-sonnet/claude-sonnet-4/gemini-2.5-pro** provide the best performance.

**z-ai/glm-4.5** is very close to top-tier performance.

**qwen-2.5-72b-instruct** provides the best performance at the 70B level.

**gpt-4-turbo**/**gpt-3.5-turbo** are surprisingly lazy, and we have never been able to find a stable prompt formulation for them. **gpt-4o** used to have top-tier performance, but it currently exhibits the same laziness issues in function calls as the earlier turbo models. We no longer recommend using them.

For users whose hardware is insufficient to run open-source models locally and who are unable to obtain API keys for commercial models, try the following options:

- **openrouter/apipie** These services can route your inference requests to various open-source and commercial models without the need to deploy open-source models locally or apply for API keys for each commercial model. They're fantastic choices. AIlice automatically supports all models on OpenRouter/APIpie. Thanks @babybirdprd for recommending OpenRouter to me.

<a name="the-most-outstanding-open-source-model"></a>
### The Most Outstanding Open-source Model

We select the currently best-performing open-source models here as a reference for users of open-source models.

- The best among all models: **z-ai/glm-4.5**. This model provides excellent performance but exceeds the hardware capabilities of the vast majority of people, so it's not suitable for local running.

- **qwen-2.5-72b-instruct**: qwen-2.0-72b-instruct was the first open-source model with practical value, and version 2.5 continues to improve. It is an inexpensive option for relatively simple tasks and is also the best model that most people can run locally.

If you find a better model, please let me know.
<a name="how-to-add-llm-support"></a>
### How to Add LLM Support
For advanced users, trying more models is inevitable. Fortunately, this is not difficult to achieve.


<a name="using-llm-through-inference-services"></a>
#### Using LLM through Inference Services

For openai/mistral/anthropic/groq models, you don't need to do anything. Just use a modelID consisting of the official model name appended to the "oai:"/"mistral:"/"anthropic:"/"groq:" prefix. If you need a model that is not included in AIlice's supported list, you can add an entry for it in the config.json file: directly copy the entry of a similar model, change **contextWindow** to the actual value, keep **systemAsUser** consistent with the similar model, and set **args** to an empty dict.

You can use any third-party inference server compatible with the OpenAI API to replace the built-in LLM inference functionality in AIlice. Just use the same configuration format as the openai models and modify the **baseURL, apikey, contextWindow, and other parameters** (this is actually how AIlice supports Groq models).

For inference servers that do not support the OpenAI API, you can try using **litellm** to convert them into an OpenAI-compatible API (there is an example below).

It's important to note that AIlice's conversation records contain many SYSTEM messages, which is not a common use case for LLMs, so the level of support depends on the specific implementation of these inference servers. In this case, you can set the systemAsUser parameter to true to circumvent the issue. Although this might prevent the model from running AIlice at its optimal performance, it also allows compatibility with various efficient inference servers. For the average user, the benefits outweigh the drawbacks.
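What systemAsUser amounts to can be sketched in a few lines. The policy below (keep the leading system prompt, demote every later system-role message, such as a function-call result, to a user message) is one plausible reading of the idea, not AIlice's exact code:

```python
def demote_system_messages(messages: list[dict]) -> list[dict]:
    """Remap trailing system-role messages to user role.

    The initial system prompt is kept intact; everything else tagged
    "system" (e.g., function-call results) becomes a "user" message, so
    servers without real system-role support still accept the dialogue.
    """
    remapped = []
    for i, msg in enumerate(messages):
        if msg["role"] == "system" and i > 0:
            remapped.append({**msg, "role": "user"})
        else:
            remapped.append(msg)
    return remapped

history = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "assistant", "content": '!BASH<!|"ls"|!>'},
    {"role": "system", "content": "stdout: README.md src"},  # function result
]
print(demote_system_messages(history))
```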
<a name="example-1-ollama-litellm"></a>
##### Example 1: ollama + litellm
We use Ollama as an example to explain how to add support for such services. First, we use litellm to convert Ollama's interface into an OpenAI-compatible format:

```bash
pip install litellm
ollama pull mistral-openorca
litellm --model ollama/mistral-openorca --api_base http://localhost:11434 --temperature 0.0 --max_tokens 8192
```

Then, add support for this service in the config.json file (the location of this file is printed when AIlice is launched):

```json
{
  "maxMemory": {},
  "quantization": null,
  "models": {
    "oai": {
      ...
    },
    "ollama": {
      "modelWrapper": "AModelChatGPT",
      "apikey": "fake-key",
      "baseURL": "http://localhost:8000",
      "modelList": {
        "mistral-openorca": {
          "formatter": "AFormatterGPT",
          "contextWindow": 8192,
          "systemAsUser": false,
          "args": {}
        }
      }
    },
    ...
  },
  ...
}
```

Now we can run AIlice:

```bash
ailice --modelID=ollama:mistral-openorca
```

<a name="example-2-lm-studio"></a>
##### Example 2: LM Studio

In this example, we will use LM Studio to run one of the most capable open-source models I've seen, **Qwen2-72B-Instruct-Q3_K_S.gguf**, powering AIlice on a local machine.

Download the model weights of **Qwen2-72B-Instruct-Q3_K_S.gguf** using LM Studio.

In LM Studio's "LocalServer" window, set n_gpu_layers to -1 if you want to use the GPU only. Adjust the 'Context Length' parameter on the left to 16384 (or a smaller value based on your available memory), and change the 'Context Overflow Policy' to 'Keep the system prompt and the first user message, truncate middle'.

Run the service. We assume the address of the service is "http://localhost:1234/v1/".

Then, open config.json and make the following modifications:

```json
{
  "maxMemory": {},
  "quantization": null,
  "models": {
    "oai": {
      ...
    },
    "lm-studio": {
      "modelWrapper": "AModelChatGPT",
      "apikey": "fakekey",
      "baseURL": "http://localhost:1234/v1/",
      "modelList": {
        "qwen2-72b": {
          "formatter": "AFormatterGPT",
          "contextWindow": 32764,
          "systemAsUser": true,
          "args": {}
        }
      }
    },
    ...
  },
  ...
}
```

Finally, run AIlice. You can adjust the 'contextWindowRatio' parameter based on your available VRAM or memory; the larger the value, the more VRAM is required.

```bash
ailice --modelID=lm-studio:qwen2-72b --contextWindowRatio=0.5
```
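Before pointing AIlice at a local server like this, it can help to confirm that the endpoint really speaks the OpenAI chat API. Here is a minimal probe with the official openai Python client; the base_url and model name match the LM Studio example above, and the placeholder key is fine because the local server ignores it:

```python
from openai import OpenAI

# LM Studio (and litellm) expose an OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:1234/v1/", api_key="fakekey")

resp = client.chat.completions.create(
    model="qwen2-72b",  # the model name as configured in the example above
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    temperature=0.0,
)
print(resp.choices[0].message.content)
```

If this prints a reply, the same baseURL can go straight into config.json.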
<a name="example-3-add-open-source-multimodal-model-support"></a>
##### Example 3: Add open source multimodal model support

Similar to the previous section, after using LM Studio to download and run LLAVA, we modify the configuration file as follows:

```json
{
  "maxMemory": {},
  "quantization": null,
  "models": {
    "oai": {
      ...
    },
    "lm-studio": {
      "modelWrapper": "AModelChatGPT",
      "apikey": "fakekey",
      "baseURL": "http://localhost:1234/v1/",
      "modelList": {
        "llava-1.6-34b": {
          "formatter": "AFormatterGPTVision",
          "contextWindow": 4096,
          "systemAsUser": true,
          "args": {}
        }
      }
    },
    ...
  },
  ...
}
```

However, it should be noted that current open-source multimodal models are far from sufficient for agent tasks, so this example is for developers rather than users.


<a name="open-source-models-on-huggingface"></a>
#### Open Source Models on Huggingface

For open-source models on Hugging Face, you only need three pieces of information to add support for a new model: the model's Hugging Face address, its prompt format, and its context window length. Usually one line of code is enough to add a new model, but occasionally you are unlucky and need a dozen or so lines.

Here is the complete method for adding new LLM support:

Open config.json and add the config of the new LLM to models.hf.modelList, which looks like the following:

```json
{
  "maxMemory": {},
  "quantization": null,
  "models": {
    "hf": {
      "modelWrapper": "AModelCausalLM",
      "modelList": {
        "meta-llama/Llama-2-13b-chat-hf": {
          "formatter": "AFormatterLLAMA2",
          "contextWindow": 4096,
          "systemAsUser": false,
          "args": {}
        },
        "meta-llama/Llama-2-70b-chat-hf": {
          "formatter": "AFormatterLLAMA2",
          "contextWindow": 4096,
          "systemAsUser": false,
          "args": {}
        },
        ...
      }
    },
    ...
  },
  ...
}
```

- "formatter" is a class that defines the LLM's prompt format. You can find the definitions in core/llm/AFormatter; read that code to determine which format the model you want to add requires. If you don't find a match, you need to write one yourself. Fortunately, a Formatter is a very simple thing and can be completed in a dozen or so lines of code; you will understand how to do it after reading a few Formatter source files.

- The context window is a property that LLMs of the Transformer architecture usually have. It determines the length of text the model can process at one time. Set the new model's context window in the "contextWindow" key.

- "systemAsUser": we use the "system" role as the sender of the messages returned by function calls. However, not all LLMs have a clear definition of the system role, and there is no guarantee that the LLM can adapt to this approach. So we use systemAsUser to set whether the text returned by function calls is put in user messages. Try setting it to false first.

Everything is done! Use "hf:" as a prefix to the model name to form a modelID, and use the new model's modelID as the command parameter to start AIlice!
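To give a feel for how small a Formatter is, here is a sketch in the same spirit, modeled loosely on the Llama-2 chat template. The class name is ours and purely illustrative; the real AFormatterLLAMA2 in core/llm/AFormatter differs in detail:

```python
class LLAMA2StyleFormatter:
    """Render (role, content) messages into a Llama-2-style chat prompt."""

    def __call__(self, messages: list[dict]) -> str:
        system = ""
        prompt = ""
        for msg in messages:
            if msg["role"] == "system":
                # The system block is embedded into the next [INST] section.
                system = f"<<SYS>>\n{msg['content']}\n<</SYS>>\n\n"
            elif msg["role"] == "user":
                prompt += f"<s>[INST] {system}{msg['content']} [/INST]"
                system = ""
            else:  # assistant
                prompt += f" {msg['content']} </s>"
        return prompt

fmt = LLAMA2StyleFormatter()
print(fmt([
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "List the files in the current directory."},
]))
```

The whole job is just serializing the message list into the exact string layout the model was trained on, which is why a dozen or so lines usually suffice.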
<a name="using-different-models-in-different-agents"></a>
### Using Different Models in Different Agents

AIlice has two operating modes. One mode uses a single LLM to drive all agents, while the other allows each type of agent to specify a corresponding LLM. The latter mode enables us to better combine the capabilities of open-source and commercial models, achieving better performance at a lower cost. To use the second mode, configure the agentModelConfig item in config.json first:

```json
  "modelID": "",
  "agentModelConfig": {
    "DEFAULT": "openrouter:z-ai/glm-4.5",
    "main": "openrouter:anthropic/claude-sonnet-4",
    "search-engine": "openrouter:qwen/qwen-2.5-72b-instruct"
  },
```

First, ensure that the default value for modelID is set to an empty string, then configure the corresponding LLM for each type of agent in agentModelConfig.

Finally, you can activate the second operating mode by not specifying a modelID:

```bash
ailice
```


<a name="development"></a>
## Development

<a name="design"></a>
### Design
The basic principles in designing AIlice are:

- **Enriching the behavior of the LLM with highly dynamic prompt construction mechanisms;**
- **Separating different computational tasks as much as possible, using recursion and divide-and-conquer from traditional computing to solve complex problems;**
- **Agents should be able to interact in both directions.**

Let's briefly explain these fundamental principles.

Starting from the most obvious level, highly dynamic prompt construction makes it less likely for an agent to fall into a loop: the influx of new variables from the external environment continuously perturbs the LLM, helping it avoid that pitfall. Furthermore, feeding the LLM all of the currently available information can greatly improve its output. For example, in automated programming, error messages from interpreters or command lines help the LLM keep modifying the code until it reaches the correct result. Lastly, in dynamic prompt construction, new information in the prompts may also come from other agents, which acts as a form of linked inference computation, making the system's computational mechanisms more complex and varied and capable of producing richer behaviors.

Separating computational tasks is, from a practical standpoint, a consequence of our limited context window. We cannot expect to complete a complex task within a window of a few thousand tokens. If we can decompose a complex task so that each subtask is solved within limited resources, that is an ideal outcome. Traditional computing models have always taken advantage of this, but in the new computing centered on LLMs, it is not easy to achieve. The issue is that if one subtask fails, the entire task is at risk of failure. Recursion is even more challenging: how do you ensure that with each call, the LLM solves a part of the subproblem rather than passing the entire burden to the next level of the call? We have solved the first problem with the IACT architecture in AIlice; the second is theoretically not difficult to solve, but it likely requires a smarter LLM.
The third principle is what everyone is currently working on: having multiple intelligent agents interact and cooperate to complete more complex tasks. Implementing this principle actually addresses the aforementioned issue of subtask failure: multi-agent collaboration is crucial for the fault tolerance of agents in operation. In fact, this may be one of the biggest differences between the new computational paradigm and traditional computing: traditional computing is precise and error-free, assigning subtasks only through unidirectional communication (function calls), whereas the new computational paradigm is error-prone and requires bidirectional communication between computing units to correct errors. This is explained in detail in the following section on the IACT framework.


<a name="computational-model-interactive-agents-call-tree"></a>
#### Computational Model: Interactive Agents Call Tree
![IACT](https://oss.gittoolsai.com/images/myshell-ai_AIlice_readme_9648f3629f87.jpg)
*IACT architecture diagram. A user requirement to build a page for image collection and display is dynamically decomposed into two tasks: coder_spider and coder_website. When coder_spider encounters difficulties, it proactively seeks assistance from its caller, proxy_cat_gallery. Proxy_cat_gallery then creates another agent, researcher_api, and employs it to address the issue.*

AIlice can be regarded as **a computer powered by an LLM**, with the following features:

- Input, output, programs, and data are all represented as text.

- The LLM serves as the processor.

- Computational tasks are broken down through successive calls to basic computing units (analogous to functions in traditional computing), which are essentially various functional agents.

Therefore, **user-input text commands are executed as a kind of program, decomposed into various "subprograms", and addressed by different agents**, forming the fundamental architecture of AIlice. In the following, we explain the nature of these basic computing units in detail.

A natural idea is to let the LLM solve certain problems (such as information retrieval or document understanding) through multi-round dialogues with external callers and peripheral modules inside the simplest computational unit. We temporarily call this computational unit a "function". Then, by analogy with traditional computing, we let functions call each other, and finally add the concept of threads to implement multi-agent interaction. However, we can have a **much simpler and more elegant computational model** than this.

The key insight is that the "function" wrapping LLM reasoning can actually be called, and can return, multiple times. A "function" with coder functionality can pause its work and return a query to its caller when it encounters unclear requirements during coding. If the caller is still unclear about the answer, it continues to ask the next higher-level caller. This process can even go all the way up to the final user's chat window. When new information is added, the caller reactivates the coder's execution by passing in the supplementary information. This "function" is thus not a traditional function, but an object that can be called multiple times. The high intelligence of the LLM makes this interesting property possible. You can also see it as **agents strung together by calling relationships, where each agent can create and call more sub-agents, and can also converse with its caller to obtain supplementary information or report its progress**. In AIlice, we call this computational unit **"AProcessor"** (essentially what we have been referring to as an agent). Its code is located in core/AProcessor.py.
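The following toy sketch illustrates this "function that can return and be called multiple times" with a stubbed-in LLM. The names (ResumableAgent, fake_llm) are purely illustrative; the real AProcessor is far more general:

```python
class ResumableAgent:
    """A callable that keeps its dialogue history between calls.

    Returning ("ASK", question) pauses the task; the caller re-invokes
    the same object with the answer to resume where it left off.
    """

    def __init__(self, role: str):
        self.history = [f"system: you are a {role}"]

    def __call__(self, message: str):
        self.history.append(f"caller: {message}")
        reply = self.fake_llm()
        self.history.append(f"agent: {reply}")
        if reply.startswith("ASK:"):
            return ("ASK", reply[4:].strip())  # pause: need info from caller
        return ("DONE", reply)                 # this call is finished

    def fake_llm(self) -> str:
        # Stand-in for real LLM reasoning over self.history.
        if not any("port" in h for h in self.history if h.startswith("caller")):
            return "ASK: which port should the web server listen on?"
        return "server code written and tested"

coder = ResumableAgent("coder")
state, payload = coder("build a web server")  # -> ("ASK", "which port ...")
if state == "ASK":
    state, payload = coder("use port 59001")  # resume with the answer
print(state, payload)
```

Because the dialogue history is preserved inside the object, the second call is a continuation rather than a fresh start, which is exactly the property that makes bidirectional caller/callee communication possible.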
<a name="basic-computing-unit-tai-chi-diagram-of-llm-and-interpreter"></a>
#### Basic Computing Unit: Tai Chi Diagram of LLM and Interpreter
Next, we elaborate on the structure inside AProcessor. The interior of AProcessor is a multi-round dialogue. The "program" that defines the function of AProcessor is a prompt generation mechanism, which generates the prompt for each round of dialogue from the dialogue history. The dialogue is one-to-many: after the external caller inputs a request, the LLM holds multiple rounds of dialogue with the peripheral modules (which we call SYSTEM). The LLM outputs function calls in various grammatical forms, and the system calls the peripheral modules to produce results and puts those results in the reply message. The LLM finally arrives at an answer and responds to the external caller, ending this call. But because the dialogue history is preserved, the caller can call in again to continue executing more tasks.

The last part we want to introduce is the parsing module for LLM output. In fact, **we regard the output text of the LLM as a "script" of semi-natural, semi-formal language, and use a simple interpreter to execute it**. We can use regular expressions to express a carefully designed grammatical structure, parse it into a function call, and execute it. Under this design, we can create more flexible function-call grammar, such as a section with a certain fixed title (for example "UPDATE MEMORY") that is parsed out directly and triggers the execution of an action. Such an implicit function call does not require the LLM to be aware of its existence; the LLM only needs to strictly follow a format convention. For the most hardcore possibility, we have left room: the interpreter can not only use regular expressions for pattern matching, its Eval function is recursive. We don't know yet what this will be used for, but it seems not bad to leave a cool possibility open, right? Therefore, inside AProcessor, the computation is completed alternately by the LLM and the interpreter; their outputs are each other's inputs, forming a cycle.


<a name="agent-design-implementing-the-interpreter-framework"></a>
#### Agent Design: Implementing the Interpreter Framework
In AIlice, the interpreter is one of the most crucial components within an agent. We use the interpreter to map texts in the LLM output that match specific patterns to actions, including function calls, variable definitions and references, and any user-defined actions. Sometimes these actions directly interact with peripheral modules, affecting the external world; at other times, they modify the agent's internal state, thereby influencing its future prompts.

The basic structure of the interpreter is straightforward: a list of pattern-action pairs. Patterns are defined by regular expressions, and actions are specified by Python functions with type annotations. Since syntactic structures can be nested, we refer to the outermost structure as the entry pattern. During runtime, the interpreter actively detects these entry patterns in the LLM output text. Upon detecting an entry pattern (and if the corresponding action returns data), it immediately terminates the LLM generation to execute the relevant action.
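Here is a minimal sketch of such a pattern-action interpreter (ours, not the code in core/Interpreter.py): regex entry patterns are paired with plain Python actions, the LLM output is scanned, and a match fires its action, whose return value (if any) becomes the SYSTEM reply:

```python
import re
from typing import Callable, Optional

MEMORY: list[str] = []

def run_bash(cmd: str) -> str:
    # Stub for a scripter module that would actually execute the command.
    return f"(pretend we ran: {cmd})"

def update_memory(note: str) -> Optional[str]:
    # The implicit "fixed title" action: record the note, no reply needed.
    MEMORY.append(note)
    return None

# Entry patterns mapped to actions. The second one is the fixed-title
# style mentioned above: a section header triggers an action implicitly.
RULES: list[tuple[re.Pattern, Callable[[str], Optional[str]]]] = [
    (re.compile(r"!BASH<!\|(.*?)\|!>", re.S), run_bash),
    (re.compile(r"UPDATE MEMORY\n(.*)", re.S), update_memory),
]

def interpret(llm_output: str) -> Optional[str]:
    for pattern, action in RULES:
        m = pattern.search(llm_output)
        if m:
            return action(m.group(1))  # result is fed back to the LLM
    return None

print(interpret('!BASH<!|"ls -l"|!>'))
print(interpret("UPDATE MEMORY\nthe user prefers port 59001"), MEMORY)
```

The real interpreter also handles nesting and interruption of generation; the sketch only shows the core pattern-action loop.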
The design of agents in AIlice thus encompasses two fundamental aspects: **the logic for generating prompts based on the dialogue history and the agent's internal state, and a set of pattern-action pairs.** Essentially, the agent implements the interpreter framework with a set of pattern-action pairs; it becomes an integral part of the interpreter. The agent's internal state is one of the targets of the interpreter's actions, and changes to the internal state influence the direction of future prompts.

Generating prompts from the dialogue history and the internal state is a nearly standardized process, although developers are still free to choose entirely different generation logic. The primary challenge for developers is to create the system prompt template, which is pivotal for the agent and often demands the most effort to perfect. However, this task revolves entirely around crafting natural-language prompts.


<a name="scripting-language-from-text-to-reality"></a>
#### Scripting Language: From Text to Reality
AIlice uses a simple scripting language embedded within text to map the text-based capabilities of LLMs to the real world. **This straightforward scripting language includes non-nested function calls, mechanisms for creating and referencing variables, and operations for concatenating text content.** Its purpose is to let LLMs influence the world more naturally: smoother text-manipulation abilities, a simple function-invocation mechanism, and multimodal variable operations. Finally, note that agent designers always have the freedom to extend this scripting language with new syntax; what is introduced here is the minimal standard syntax.

The basic syntax is as follows:

Variable definition:

VAR_NAME := <!|"SOME_CONTENT"|!>

Function calls / variable references / text concatenation:

!FUNC-NAME<!|"...", '...', VAR_NAME1, "Execute the following code: \n" + VAR_NAME2, ...|!>

The basic variable types are str/AImage/various multimodal types. The str type is consistent with Python's string syntax, supporting triple quotes and escape characters.

This constitutes the entirety of the embedded scripting language.

The variable-definition mechanism provides a way to extend the context window, allowing LLMs to record important content into variables to prevent forgetting. During system operation, various variables are defined automatically. For example, if a block of code wrapped in triple backticks is detected within a text message, a variable is automatically created to store the code, enabling the LLM to reference the variable to execute the code and avoid the time and token costs of copying the code in full. Furthermore, some module functions may return data in multimodal types rather than text. In such cases, the system automatically defines variables of the corresponding multimodal type, allowing the LLM to reference them (for example, to send them to another module for processing).
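To make the syntax concrete, here is a rough recognizer for the two forms above. This is our simplification: it ignores the `+` concatenation operator and uses toy comma splitting (so it breaks on commas inside literals); AIlice's real grammar is richer:

```python
import re

VARS: dict[str, str] = {}

DEF_RE = re.compile(r'(\w+) := <!\|"(.*?)"\|!>', re.S)
CALL_RE = re.compile(r'!([\w-]+)<!\|(.*?)\|!>', re.S)

def eval_script(text: str) -> None:
    # VAR_NAME := <!|"SOME_CONTENT"|!>
    for name, content in DEF_RE.findall(text):
        VARS[name] = content
    # !FUNC-NAME<!|arg, arg, ...|!>
    for func, raw_args in CALL_RE.findall(text):
        args = []
        for piece in raw_args.split(","):  # toy splitting, see caveat above
            piece = piece.strip()
            if piece.startswith(('"', "'")):
                args.append(piece.strip("\"'"))   # string literal
            else:
                args.append(VARS.get(piece, piece))  # variable reference
        print(f"call {func} with {args}")

eval_script('CODE := <!|"print(1 + 1)"|!>')
eval_script('!PYTHON<!|CODE|!>')  # -> call PYTHON with ['print(1 + 1)']
```

The second call shows the point of the variable mechanism: the LLM can reference CODE by name instead of copying the code block in full.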
The variable definition mechanism introduces a way to extend the context window, allowing the LLM to record important content into variables to prevent forgetting. During system operation, various variables are also defined automatically. For example, if a block of code wrapped in triple backticks is detected within a text message, a variable is automatically created to store that code, so the LLM can execute it by referencing the variable, avoiding the time and token costs of copying the code in full. Furthermore, some module functions may return data in multimodal types rather than text. In such cases, the system automatically defines these as variables of the corresponding multimodal type so the LLM can reference them (for instance, to send them to another module for processing).\n\n\n\u003Ca name=\"multimodal-collaboration-of-rich-text-and-variable-mechanisms\">\u003C\u002Fa>\n#### Multimodal: Collaboration of Rich Text and Variable Mechanisms\nIn the long run, LLMs are bound to evolve into multimodal models capable of seeing and hearing. Therefore, **the exchanges between Ailice's agents should be in rich text**, not just plain text. Markdown provides some capability for marking up multimodal content, but it is insufficient; in the future we will need an extended version of Markdown that can embed multimodal data such as video and audio.\n\nLet's take images as an example to illustrate the multimodal mechanism in Ailice. When an agent receives text containing Markdown-marked images, the system automatically feeds them into a multimodal model to ensure the model can see these contents. Markdown typically marks images with paths or URLs, so we have extended the Markdown syntax to allow variable names to reference multimodal content.\n\nAnother minor issue is how agents, each with its own internal variable list, exchange multimodal variables. This is simple: the system automatically checks whether a message sent from one agent to another contains internal variable names. If it does, the variable content is passed along to the receiving agent.\n
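Here is a sketch of that forwarding check, under our own assumptions (a per-agent dict of variables, whole-word regex matching to detect a reference); the real mechanism in Ailice may differ:

```python
import re
from typing import Any

# Hypothetical sketch of inter-agent variable forwarding, not the actual AIlice code.
def forward_variables(message: str, sender_vars: dict[str, Any]) -> dict[str, Any]:
    """Copy every sender-side variable whose name appears in the outgoing
    message, so the receiving agent can resolve the same references."""
    carried = {}
    for name, value in sender_vars.items():
        # Match the variable name as a whole word inside the message text.
        if re.search(rf"\b{re.escape(name)}\b", message):
            carried[name] = value
    return carried

sender = {"IMAGE_1": b"<png bytes>", "NOTES": "draft"}
print(forward_variables("Please describe IMAGE_1 in detail.", sender))
# -> {'IMAGE_1': b'<png bytes>'}
```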
Why do we go to the trouble of implementing an additional multimodal variable mechanism when marking multimodal content with paths and URLs is much more convenient? Because path-based marking is only feasible when Ailice runs entirely in a local environment, and that is not the design intent. Ailice is meant to be distributed, with the core and modules potentially running on different computers; it might even load services running on the internet to provide certain computations. That makes returning the complete multimodal data more attractive. Of course, these designs made for the future may prove to be over-engineering, and if so, we will revise them.\n\n\n\u003Ca name=\"self-expansion-growing-like-a-tree\">\u003C\u002Fa>\n#### Self-Expansion: Growing Like a Tree\nOne of the goals of Ailice is to achieve introspection and self-expansion (which is why our logo features a butterfly with its reflection in the water). **This would enable her to understand her own code and build new functionality, including new external interaction modules (i.e., new functions) and new types of agents (APrompt classes)**. As a result, the knowledge and capabilities of LLMs would be unleashed more thoroughly.\n\nImplementing self-expansion involves two parts. On one hand, new modules and new agent types (APrompt classes) need to be loaded dynamically at runtime and integrated naturally into the computational system, which we refer to as dynamic loading. On the other hand, Ailice needs the ability to construct new modules and agent types in the first place.\n\nThe dynamic loading mechanism is significant in its own right: it represents **a novel software update mechanism**. We can let Ailice search the internet for her own extension code, check it for security, fix bugs and compatibility issues, and ultimately run the extension as part of herself. Ailice developers therefore only need to publish their contributed code on the internet, without merging it into the main codebase or providing any other installation method. The implementation of dynamic loading is continuously improving. Its core requirement is that each extension package provides a piece of text describing its functionality; at runtime, each agent in Ailice finds suitable functions or agent types for its sub-problems through semantic matching and other means.\n
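As a toy illustration of that matching step, the sketch below ranks extension descriptions against a sub-problem by simple word overlap. The registry contents are invented for the example (only AScripter is an actual module name from the default configuration), and a production version would presumably use embedding-based semantic similarity instead:

```python
# Hypothetical registry: each extension package ships a text description of
# what it does, as the dynamic loading mechanism requires.
extensions = {
    "AWikipedia": "fetch wikipedia page content by keyword",
    "AScripter": "execute bash and python code in a sandbox",
    "ABrowser": "open urls and read web pages interactively",
}

def pick_extension(task: str) -> str:
    """Pick the extension whose description shares the most words with the task.
    A stand-in for real semantic matching (e.g. embedding cosine similarity)."""
    task_words = set(task.lower().split())
    return max(extensions,
               key=lambda name: len(task_words & set(extensions[name].split())))

print(pick_extension("look up the wikipedia entry on relativity"))  # -> AWikipedia
```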
Building new modules is a relatively simple task, since the interface constraints a module must meet are very straightforward; we can teach an LLM to construct a new module from a single example. The more complex task is the self-construction of new agent types (APrompt classes), which requires a good understanding of Ailice's overall architecture. The construction of system prompts in particular is delicate work, challenging even for humans. We therefore pin our hopes on more powerful future LLMs achieving introspection, **allowing Ailice to understand herself by reading her own source code (for something as complex as a program, the best introduction is the thing itself), and thereby to construct better new agents**.\n\n\n\u003Ca name=\"how-developers-should-get-started\">\u003C\u002Fa>\n### How Developers Should Get Started\n\n- For developing agents: the main loop of Ailice is located in AiliceMain.py or ui\u002Fapp.py. To understand how an agent is constructed, read the code in the \"prompts\" folder; it shows how an agent's prompts are dynamically built.\n\n- For developers who want to understand Ailice's internal operating logic: read core\u002FAProcessor.py and core\u002FInterpreter.py. Together these two files are roughly three hundred lines of code, yet they contain the basic framework of Ailice.\n\n\n\u003Ca name=\"project-development-standards-and-constraints\">\u003C\u002Fa>\n### Project Development Standards and Constraints\n\n- In this project, **achieving the desired functionality of the AI agent is the primary goal; code clarity and simplicity are secondary**. Implementing an AI agent is still an exploratory topic, so we aim to **minimize rigid components in the software (such as architecture or interfaces that constrain future development) and give the application layer (e.g., the prompt classes) maximum flexibility**. Abstraction, deduplication, and decoupling are not immediate priorities.\n\n- When implementing a feature, **always choose the best method rather than the most obvious one**. The metric for \"best\" typically includes trivializing the problem from a higher perspective, maintaining code clarity and simplicity, and ensuring that changes neither significantly increase overall complexity nor limit the software's future possibilities.\n\n- Adding comments is not mandatory unless absolutely necessary; **strive to make the code clear enough to be self-explanatory**. Developers who value comments may disagree, but in the AI era we can generate detailed code explanations on demand, which removes the need for unstructured, hard-to-maintain comments.\n\n- Follow the principle of Occam's razor when adding code; **never add unnecessary lines**.\n\n- **Functions or methods in the core should not exceed 60 lines**.\n\n- There are no explicit coding-style constraints, but stay consistent with the existing code in naming and capitalization to avoid burdening readability.\n\nAilice aims to deliver its multimodal and self-expansion features within fewer than 5,000 lines, reaching its final form for the current stage. We pursue concise code not only because succinct code usually reflects a better implementation, but also because it lets the AI develop introspective capabilities early and facilitates self-expansion. Please adhere to the rules above and treat every line of code with diligence.\n\n\n\u003Ca name=\"future-development-roadmap\">\u003C\u002Fa>\n### Future Development Roadmap\n\nAilice's fundamental tasks are twofold: **to fully unleash the text-based capabilities of LLMs into the real world, and to explore better mechanisms for long-term memory and for forming a coherent understanding of vast amounts of text**. Our development efforts revolve around these two focal points.\n\nIf you are interested in the development of Ailice itself, consider the following directions:\n\n- Explore improved **long-term memory mechanisms** to enhance the capabilities of each agent. We need a long-term memory mechanism that **enables consistent understanding of large amounts of content and facilitates association**. The most feasible option at the moment is to replace the vector database with a knowledge graph, which would greatly benefit the comprehension of long texts and code and let us build a genuine personal AI assistant.\n\n- **Multimodal** support. Support for multimodal models is complete, and the development focus is shifting to multimodal support in the peripheral modules. We need a module that operates the computer from screenshots and simulates mouse and keyboard actions.\n\n- **Self-expansion** support. Our goal is to enable language models to **autonomously code new peripheral modules and agent types and dynamically load them for immediate use**. This capability will enable self-expansion, empowering the system to integrate new functionality seamlessly. Most of this is done, but we still need to develop the capability to construct new types of agents.\n\n- **Richer UI**. We need to organize agent outputs into a tree structure in the dialog window and update every agent's output dynamically, and to accept user input in the web interface and forward it to the scripter's standard input, which is especially needed when using sudo.\n\n- **Develop agents** with various functionalities on top of the current framework.\n\n- **Explore the application of the IACT architecture to complex tasks.** Using an interactive agent call tree, we can break large documents down for better reading comprehension, and decompose complex software engineering tasks into smaller modules, completing the entire project build and test cycle through iteration. This requires a series of intricate prompt designs and testing efforts, but it holds an exciting promise for the future.
The IACT architecture significantly alleviates the resource constraints imposed by the context window, allowing us to dynamically adapt to more intricate tasks.\n\n- **Build rich external interaction modules using self-expansion mechanisms! This will be accomplished in [AiliceEVO](https:\u002F\u002Fgithub.com\u002Fstevenlu137\u002FAIliceEVO).**\n\n- **Explore using Ailice as the execution environment for reinforcement learning to optimize the performance of small-sized language models.** In agent applications, the amount of knowledge in the model is less important than its reasoning and tool-use abilities. The latter largely depends on the model's understanding of its operational environment, and reinforcement learning (RL) may be the best option for developing this understanding.","\u003Cdiv align= \"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmyshell-ai_AIlice_readme_5502bfb65596.png\" height=256>\n    \u003Ch1>Ailice\u003C\u002Fh1>\n\n[![forks](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fmyshell-ai\u002FAIlice)](https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice)\n[![stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmyshell-ai\u002FAIlice)](https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice)\n[![watchers](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fwatchers\u002Fmyshell-ai\u002FAIlice)](https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice)\n[![Visitors](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmyshell-ai_AIlice_readme_b13f8d0d8b9d.png)](https:\u002F\u002Fvisitorbadge.io\u002Fstatus?path=https%3A%2F%2Fgithub.com%2Fmyshell-ai%2FAIlice)\n[![license](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fmyshell-ai\u002FAIlice)](.\u002FLICENSE)\n\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#quick-start\">快速入门\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fwww.youtube.com\u002F@stevenlu-zh6ds\">演示\u003C\u002Fa> •\n  \u003Ca href=\"#development\">开发\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fstevenlu1729\">Twitter\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fwww.reddit.com\u002Fr\u002FAIlice\u002F\">Reddit\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n**注意！我们目前没有任何发行相关加密代币的计划。请保持警惕，识别诈骗以避免上当受骗。（更新于2025年1月6日）**\n\n:fire: 2025年3月22日：Ailice现在可以使用MCP工具了！点击[这里](#configuring-extension-modules-and-mcp-servers)。\n\n:fire: 2025年1月23日：更新了语音对话功能。得益于ChatTTS的出色实现，语音对话终于从实验阶段走向实用。\n\n:fire: 2024年6月22日：我们已经进入了本地运行类似JARVIS的AI助手时代！最新的开源大模型使我们能够在本地执行复杂任务！点击[这里](#guide-to-choosing-an-llm)了解更多。\n\n----\n\nAilice是一个完全**自主的通用型AI智能体**。该项目旨在基于开源大模型打造一个类似于JARVIS的独立人工智能助手。借助其独特的IACT（交互式智能体调用树）架构，Ailice能够将复杂任务分解为动态构建的子智能体，并以高容错性整合结果。目前，Ailice在**主题研究、编程、系统管理、文献综述以及更复杂的混合任务**等方面表现出色。\n\n我们的最终目标是实现**AI智能体的自我进化**。也就是说，AI智能体将自主构建自身的功能扩展和新型智能体，从而无缝地将大模型的知识与推理能力释放到现实世界中。\n\n---\n\n## 🚀 在线体验Ailice\n\n**立即在[kragent.ai](https:\u002F\u002Fkragent.ai\u002F)体验Ailice**——这是一个基于Ailice构建的平台，无需本地部署即可探索其强大功能。非常适合测试并发现这款AI助手能做什么！\n\n使用**您自己的API密钥**来配置强大的商用大模型，充分发挥其潜力（这将与默认的免费配置完全不同！）。为了获得最佳性价比，建议采用商用与开源大模型相结合的混合方案。推荐模型：**Claude 3.5\u002F3.7、Gemini 2.5 Pro**。\n\n---\n\n- [功能](#features)\n- [快速入门](#quick-start)\n  - [快速安装](#quick-installation)\n  - [我们可以做的酷炫事情](#cool-things-we-can-do)\n- [安装与使用](#installation-and-usage)\n  - [系统要求](#system-requirements)\n  - [环境配置与安装](#environment-configuration-and-installation)\n  - [如果您需要频繁使用Google](#if-you-need-to-frequently-use-google)\n  - [使用方法](#usage)\n  - [配置扩展模块和MCP服务器](#configuring-extension-modules-and-mcp-servers)\n  - 
[实用技巧](#useful-tips)\n- [大模型的选择与配置](#selection-and-configuration-of-LLM)\n  - [如何选择大模型指南](#guide-to-choosing-an-llm)\n  - [最出色的开源模型](#the-most-outstanding-open-source-model)\n  - [如何添加大模型支持](#how-to-add-llm-support)\n    - [通过推理服务使用大模型](#using-llm-through-inference-services)\n      - [示例1：ollama + litellm](#example-1-ollama-litellm)\n      - [示例2：LM Studio](#example-2-lm-studio)\n      - [示例3：添加开源多模态模型支持](#example-3-add-open-source-multimodal-model-support)\n    - [Huggingface上的开源模型](#open-source-models-on-huggingface)\n  - [在不同智能体中使用不同模型](#using-different-models-in-different-agents)\n- [开发](#development)\n  - [设计](#design)\n    - [计算模型：交互式智能体调用树](#computational-model-interactive-agents-call-tree)\n    - [基本计算单元：大模型与解释器的太极图](#basic-computing-unit-tai-chi-diagram-of-llm-and-interpreter)\n    - [智能体设计：实现解释器框架](#agent-design-implementing-the-interpreter-framework)\n    - [脚本语言：从文本到现实](#scripting-language-from-text-to-reality)\n    - [多模态：富文本与变量机制的协作](#multimodal-collaboration-of-rich-text-and-variable-mechanisms)\n    - [自我扩展：像树一样生长](#self-expansion-growing-like-a-tree)\n  - [开发者如何入门](#how-developers-should-get-started)\n  - [项目开发标准与约束](#project-development-standards-and-constraints)\n  - [未来开发路线图](#future-development-roadmap)\n\n\n\u003Ca name=\"features\">\u003C\u002Fa>\n## 功能\nAilice的核心技术特点包括：\n\n- **针对专业领域的深度研究能力。**\n- **阅读和分析文章及学术著作的能力。**\n- **先进的编程与脚本执行自动化，可作为全面的编码工具和高效的系统管理工具，类似于AI驱动的操作系统。**\n- **语音交互支持。**\n- **兼容开源模型，并可无缝集成商用模型。**\n- **所有智能体原生支持多模态。**\n- **丰富的媒体界面，支持图片\u002F视频\u002F音频、LaTeX公式、代码高亮显示以及文件上传\u002F下载。**\n- **自然且高容错性的交互式智能体调用树架构。**\n- **灵活解析大模型输出，支持更广泛的函数调用机制。**\n- **能够自我构建并动态加载与环境交互的模块，为功能扩展提供无限可能。**\n\n\u003Ca name=\"quick-start\">\u003C\u002Fa>\n## 快速入门\n\n\u003Ca name=\"quick-installation\">\u003C\u002Fa>\n\n### 快速安装\n\n使用以下命令安装并运行 Ailice。Ailice 启动后，用浏览器打开它提供的网页，对话界面就会出现。通过对话向 Ailice 发出指令来完成各种任务。首次使用时，可以尝试 [COOL things we can do](#cool-things-we-can-do) 部分中提供的命令，快速上手。\n\n**本地运行：**\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice.git\ncd AIlice\npip install -e .\nailice --contextWindowRatio=0.2\n```\n\n**沙箱运行：**\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice.git\ncd AIlice\ndocker build -t ailice .\ndocker run -it -p 127.0.0.1:5000:5000 --name ailice ailice --expose=1 --contextWindowRatio=0.2\n```\n\n**支持 CUDA 的沙箱运行**（请先安装 [nvidia-container-toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Flatest\u002Finstall-guide.html)）：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice.git\ncd AIlice\ndocker build --build-arg BASE_IMAGE=nvidia\u002Fcuda:13.0.0-cudnn-devel-ubuntu24.04 -t ailice .\ndocker run --gpus all -it -p 127.0.0.1:5000:5000 --name ailice ailice --expose=1 --contextWindowRatio=0.2\n```\n\n**支持 GUI 的沙箱运行**（仅限 Linux，Windows 和 macOS 需特殊配置）：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice.git\ncd AIlice\ndocker build -t ailice .\ndocker run -it -p 127.0.0.1:5000:5000 \\\n    -e DISPLAY=$DISPLAY \\\n    -v \u002Ftmp\u002F.X11-unix:\u002Ftmp\u002F.X11-unix \\\n    --name ailice \\\n    ailice --expose=1 --contextWindowRatio=0.2\n```\n\n- 如需更详细的安装和配置方法，请参阅 [安装与使用](#installation-and-usage) 部分以及 [LLM 的选择与配置](#selection-and-configuration-of-LLM) 部分。\n- 若要了解 Ailice 的基本设计原则，请前往 [设计](#design) 部分。\n\n\n\u003Ca name=\"cool-things-we-can-do\">\u003C\u002Fa>\n### COOL Things We Can Do\n\n#### 快速入门示例\n- “列出当前目录的内容。”\n- “查看今天旧金山的天气。”\n- “计算 e^(-x^2) 从负无穷到正无穷的积分，并给出详细的推导步骤。”\n- “使用任意算法生成分形可视化图像。”\n\n#### 
系统管理\n- “在本系统上安装 Google Chrome 浏览器。下载最新稳定版本，验证安装并确认其正常工作。”\n\n#### 软件开发与分析\n- “从 GitHub 克隆 GiraffeCV 项目，分析其架构，识别主要模块接口，并提供详细报告。”\n\n#### 人工智能\n- “使用 SDXL 生成一张‘一只胖乎乎的橙色猫’的图片。参考其 Hugging Face 页面上的示例代码，保存到当前目录并展示结果。”\n\n#### 网页开发\n- “为 GitHub 上的 AIlice AI 代理项目生成一个专业的主页。在本地端口 59001 上运行，界面美观，包含图片和文字内容。”\n\n#### 工程设计与仿真\n- “设计一个频率调制 (FM) 无线电接收器并进行仿真。提供原理图和仿真结果。”\n- “使用 CadQuery 生成一个具有自定义参数的齿轮。生成后提供多角度的投影视图。”\n\n#### 网络安全\n- “对 kragent.ai 网站进行全面的安全扫描。检查常见漏洞，并提供包含建议的详细安全评估报告。”\n\n#### 学术研究与文献综述\n- “查找一篇关于黑洞信息悖论的最新综述论文。利用该论文收集过去五年内的重要文献链接，阅读这些文献，并汇报该领域的研究进展。”\n\n#### 理论物理\n- “推导波动方程和干涉理论，创建数值模拟和可视化效果，生成 LaTeX 格式的 PDF 幻灯片。”\n\n#### 数学\n- “创建微分几何中整体性的三维可视化效果。”\n- “生成傅里叶级数逼近中的吉布斯现象动画演示。”\n\n#### 医学研究\n- “调查目前治疗心脏结节病的方法。检索 2022 至 2025 年间的最新文献，分析临床试验结果，编写一份关于新兴治疗方式的综合报告。”\n\n#### 计算生物学\n- “利用分子系统发育数据和比较基因组学，计算人类与猫之间的最近共同祖先 (MRCA) 时间。”\n\n#### 天文学与天体物理学\n- “使用地平坐标系（高度角和方位角）计算从加州观测到的火星当前位置。”\n\n#### 文档生成\n- “生成一篇科学风格的 PDF 论文，格式规范，内容自定。”\n\n#### 数据集构建与自动化采集\n- “在互联网上搜索 100 篇涵盖各分支的物理教程，并将 PDF 文件下载到名为 ‘physics’ 的文件夹中。按子学科（力学、电磁学、量子物理等）整理，并创建带有元数据的索引。”\n\n#### 视频处理与 AI 分析流水线\n**任务：** 通过完整的 AI 流水线处理费曼的物理讲座\n\n**步骤：**\n\n1. 在 YouTube 上找到费曼的讲座视频，下载到 `Feynman\u002F` 子目录中（需先创建文件夹）\n2. 从视频中提取音频，保存到 `Feynman\u002Faudio\u002F`\n3. 使用 whisper-large-v3 将音频转换为文本（参考 Hugging Face 示例代码），合并成一个文档\n4. 从转录文本中提取答案：“为什么我们需要反粒子？”\n\n*此多步骤任务需要与 AI 代理进行交互式沟通，必要时使用“中断”按钮引导流程。*\n\n#### 扩展性与模块开发\n\n1. “编写一个扩展模块，通过关键词获取维基百科页面内容。”\n2. “加载新实现的维基百科模块，并用其查询相对论条目。”\n\n*AI 代理能够自主构建外部交互模块（ext-modules），只需简单的提示即可实现无限的扩展性。*\n\n\u003Ca name=\"installation-and-usage\">\u003C\u002Fa>\n## 安装与使用\n\n\u003Ca name=\"system-requirements\">\u003C\u002Fa>\n\n### 系统要求\n\n如果用户不打算在本地运行大语言模型，那么运行 Ailice 几乎没有硬件要求。对于希望在本地运行大语言模型的用户，目前只有参数量达到 700 亿或以上的模型才能较好地完成任务，因此至少需要两块 RTX 4090 显卡（每块显卡配备 48GB 显存）才能有效执行任务。\n\n- Ailice 是在 **Ubuntu** 上开发的，因此在 Ubuntu 上安装和使用有最佳的保障。\n- 在 **MacOS** 环境下的可用性与 Ubuntu 环境相似。\n- 对于 **Windows** 用户，使用 **Docker** 或安装 **WSL**（Windows Subsystem for Linux），并在 WSL 中运行 Ailice 是更好的选择，尤其是需要执行编程任务的用户——我尚未集成 Windows 的命令执行工具（未来会考虑这一点，但命令行工具的灵活性对 AI 代理来说非常重要，这也使得 Linux 平台更具优势）。此外，在 Windows 上缺乏充分测试会显著增加出现 bug 的可能性；如果您遇到相关问题，请提交 issue 以寻求解决。\n\n在安装 Ailice 之前，强烈建议先安装 **Anaconda 并创建虚拟环境**（您也可以使用其他喜欢的工具，如 venv）。您还需要 **Chrome** 浏览器，因为 Ailice 需要它来进行网页浏览。对于希望在完全受控的容器或虚拟机中运行 Ailice 的用户，您需要 **Docker**（或其他虚拟机工具，如 VirtualBox）。\n\n如果您想在虚拟机中运行 Ailice，请确保关闭 **Hyper-V**（否则无法安装 llama.cpp）。在 VirtualBox 环境下，可以通过以下步骤禁用 Hyper-V：在虚拟机的 VirtualBox 设置中，禁用 PAE\u002FNX 和 VT-X\u002FAMD-V（即 Hyper-V），并将“半虚拟化接口”设置为默认值，同时禁用嵌套分页。\n\n\u003Ca name=\"environment-configuration-and-installation\">\u003C\u002Fa>\n### 环境配置与安装\n您可以使用以下命令来安装 Ailice：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice.git\ncd AIlice\npip install -e .\n```\n\n安装完成后，Ailice 的运行速度可能会较慢，这是因为长期记忆模块中的嵌入式模型计算可能正在 CPU 上运行。在这种情况下，可以尝试运行以下命令来启用 GPU 加速：\n\n```bash\nailice_turbo\n```\n\n对于需要使用 PDF 阅读、语音对话、Hugging Face 模型或模型微调功能的用户，可以使用以下命令之一（安装过多功能会增加依赖冲突的可能性，因此建议仅安装必要的部分）：\n\n```bash\npip install -e .[pdf-reading]\npip install -e .[speech]\npip install -e .[huggingface]\npip install -e .[finetuning]\n```\n\n现在您可以运行 Ailice 了！请参考 [使用方法](#usage) 中的命令。\n\n\u003Ca name=\"if-you-need-to-frequently-use-google\">\u003C\u002Fa>\n### 如果您需要频繁使用 Google\n默认情况下，Ailice 中的 Google 模块是受限的，重复使用可能会导致错误，且需要一定时间才能解决。这在 AI 时代是一个令人尴尬的现实：传统搜索引擎只允许真正的用户访问，而 AI 代理目前并不属于“真实用户”的范畴。尽管我们有替代方案，但这些方案都需要配置 API 密钥，这对普通用户来说门槛较高。不过，对于那些需要频繁访问 Google 的用户，我相信您愿意承担申请 Google 官方 API 密钥的麻烦（我们指的是 [Custom Search JSON API](https:\u002F\u002Fdevelopers.google.com\u002Fcustom-search\u002Fv1\u002Foverview)，该 API 在创建时需要指定搜索整个互联网）。对于这类用户，请打开 
config.json 文件，并使用以下配置：\n\n```\n{\n    ...\n    \"services\": {\n        ...\n        \"google\": {\n          \"cmd\": \"python3 -m ailice.modules.AGoogleAPI --addr=ipc:\u002F\u002F\u002Ftmp\u002FAGoogle.ipc --api_key=YOUR_API_KEY --cse_id=YOUR_CSE_ID\",\n          \"addr\": \"ipc:\u002F\u002F\u002Ftmp\u002FAGoogle.ipc\"\n        },\n        ...\n    }\n}\n```\n\n然后安装 google-api-python-client：\n\n```bash\npip install google-api-python-client\n```\n\n最后只需重启 Ailice 即可。\n\n\u003Ca name=\"usage\">\u003C\u002Fa>\n\n### 使用方法\n\n您可以直接从下面的典型用例中复制命令来运行 Ailice。\n\n```bash\nailice   # 使用 config.json 文件中 agentModelConfig 字段下为不同智能体单独配置的模型。\nailice_web --speechOn=1 --ttsDevice=cuda --sttDevice=cuda\nailice --modelID=anthropic:claude-sonnet-4-20250514 --contextWindowRatio=0.2\nailice --modelID=openrouter:z-ai\u002Fglm-4.5 --chatHistoryPath=.\u002Fchat_history --contextWindowRatio=0.2\nailice --modelID=mistral:mistral-large-latest --prompt=\"researcher\"\nailice --modelID=deepseek:deepseek-chat\nailice --modelID=hf:Open-Orca\u002FMistral-7B-OpenOrca --quantization=8bit --contextWindowRatio=0.6\nailice --modelID=groq:llama3-70b-8192\nailice --modelID=openrouter:google\u002Fgemini-2.5-pro\nailice --modelID=lm-studio:qwen2-72b --contextWindowRatio=0.5\n```\n\n需要注意的是，最后一个用例需要您先配置 LLM 推理服务，请参阅 [如何添加 LLM 支持](#how-to-add-llm-support)。使用 LM Studio 等推理框架，可以在有限的硬件资源下支持更大的模型，提供更快的推理速度和更快速的 Ailice 启动速度，因此更适合普通用户。\n\n首次运行时，系统会提示您输入 API 密钥。您也可以通过编辑 config.json 文件来修改 API 密钥。请注意，首次使用开源 LLM 时，下载模型权重需要较长时间，请确保您有充足的时间和磁盘空间。\n\n首次开启 speechOn 开关时，启动过程中可能需要等待较长时间。这是因为语音识别和 TTS 模型的权重正在后台下载。\n\n如示例所示，您可以通过 `ailice` 使用智能体功能，并且它提供了 Web 对话界面。您可以通过以下命令查看每个参数的默认值：\n\n```bash\nailice --help\n```\n\n所有命令行参数的默认值都可以通过修改 config.json 中对应的参数进行自定义。\n\n- `--modelID`：模型配置有两种模式。第一种模式是统一通过 modelID 指定模型；第二种模式则是不同类型智能体会运行在不同的模型上。当该参数为空字符串（未指定）时，将自动采用第二种模式，即使用 config.json 文件中 agentModelConfig 字段下为不同智能体单独配置的模型。详细信息请参阅 [在不同智能体中使用不同模型](#using-different-models-in-different-agents)。当前支持的模型可在 config.json 中查看。\n- `--quantization`：量化选项，可选择 4bit 或 8bit。默认为不进行量化。\n- `--maxMemory`：显存容量限制，默认不设置。设置格式为 `{0:\"23GiB\", 1:\"24GiB\", \"cpu\": \"64GiB\"}`。\n- `--prompt`：指定要执行的提示词，即智能体类型。默认为 'main'，该智能体会根据您的需求决定调用合适的智能体类型。您也可以直接指定某一特定类型的智能体并与之交互。\n- `--temperature`：设置 LLM 推理的温度参数，默认为零。\n- `--flashAttention2`：启用 Flash Attention 2 以加速推理的开关。可能会对输出质量产生一定影响。\n- `--contextWindowRatio`：用户指定的比例系数，在某些情况下用于确定推理过程中构建的提示长度上限占 LLM 上下文窗口的比例。默认值为 0.6。\n- `--speechOn`：启用语音对话的开关。\n- `--ttsDevice`：指定文本转语音模型使用的计算设备。默认为 \"cpu\"，如果有足够的显存，可以设置为 \"cuda\"。\n- `--sttDevice`：指定语音转文本模型使用的计算设备。默认为 \"cpu\"，如果有足够的显存，可以设置为 \"cuda\"。\n- `--resetApiKey`：是否在启动后重置模型的 API 密钥。\n- `--chatHistoryPath`：用于指定聊天历史数据的存储目录。\n- `--certificate`：Web 界面的证书设置。最简单的选项是空字符串，此时 UI 页面将使用 HTTP 协议。设置为 'adhoc' 将使用自生成的证书，为 UI 和服务器之间的数据流提供加密，但需要忽略浏览器的安全警告。最安全的方法是申请证书，并将此参数设置为 `{\"cert\": \"your_cert.pem\", \"key\": \"your_key.pem\"}`。\n- `--expose`：是否对外公开访问。\n- `--share`：为 Ailice 创建一个可公开分享的链接。（出于安全考虑，我们暂时移除了此功能。待 UI 实施更多安全措施后，该功能将重新启用。请确保 app.py 提供的服务不会暴露到任何不受信任的网络中。）\n\n\n\u003Ca name=\"configuring-extension-modules-and-mcp-servers\">\u003C\u002Fa>\n\n### 配置扩展模块和 MCP 服务器\n\nAilice 的配置文件名为 config.json，其位置会在 Ailice 启动时输出到命令行中。在这一节中，我们将介绍如何通过配置文件来扩展 Ailice 的外部交互能力。\n\n在 Ailice 中，我们使用“模块”一词特指那些提供与外部世界交互功能的组件。每个模块都作为一个独立的进程运行；它们可以与核心进程运行在不同的软件或硬件环境中，从而使 Ailice 具备分布式能力。我们在 Ailice 运行所需的配置文件中提供了一系列基础模块配置（如向量数据库、搜索、浏览器、代码执行等）。你也可以添加新模块的配置。模块配置非常简单，仅包含两项：\n\n```json\n  \"services\": {\n    ...\n    \"scripter\": {\"cmd\": \"python3 -m ailice.modules.AScripter --addr=tcp:\u002F\u002F127.0.0.1:59000\",\n\t               \"addr\": 
\"tcp:\u002F\u002F127.0.0.1:59000\"},\n    ...\n  }\n```\n\n其中，“cmd”项是一个用于启动模块进程的命令行。当 Ailice 启动时，它会自动运行这些命令来启动模块。用户可以指定任意命令，这提供了极大的灵活性。你可以在本地启动模块进程，也可以利用 Docker 在虚拟环境中启动进程，甚至可以启动远程进程。有些模块有多种实现方式（如 Google\u002FStorage），你可以在这里进行配置以切换到另一种实现。\n\n“addr”指的是模块进程的地址和端口号。用户可能会感到困惑，因为在默认配置中，许多模块的“cmd”和“addr”都包含了地址和端口号，似乎存在冗余。这是因为“cmd”原则上可以包含任何命令（其中可能包含地址和端口号，也可能不包含）。因此，需要单独的“addr”项来告知 Ailice 如何访问模块进程。\n\nAilice 可以使用来自各种 MCP 服务器的工具，只需用 ailice_mcp_wrapper 命令将 MCP 服务器封装起来，即可将其作为 Ailice 的标准扩展模块使用。\n\n对于以 stdio 模式在本地启动的 MCP 服务器，假设启动命令为 `mcp_echo hello`，我们使用以下配置将其作为 Ailice 的标准服务启动。\n\n```json\n  \"services\": {\n    ...\n    \"mcp_echo\": {\"cmd\": \"ailice_mcp_wrapper --addr tcp:\u002F\u002F127.0.0.1:59200 stdio mcp_echo hello\",\n                 \"addr\": \"tcp:\u002F\u002F127.0.0.1:59200\"},\n    ...\n  }\n```\n\n对于 SSE 模式的 MCP 服务器，假设其服务地址为：`http:\u002F\u002Fexample:8000\u002Fsse`，我们使用以下配置将其连接起来。\n\n```json\n  \"services\": {\n    ...\n    \"mcp_echo\": {\"cmd\": \"ailice_mcp_wrapper --addr tcp:\u002F\u002F127.0.0.1:59200 sse --server_url http:\u002F\u002Fexample:8000\u002Fsse\",\n                 \"addr\": \"tcp:\u002F\u002F127.0.0.1:59200\"},\n    ...\n  }\n```\n\n对于在配置文件中设置的扩展模块，Ailice 会在启动后自动加载，并为当前任务选择合适的工具。除了使用 config.json 外，你还可以在运行时向 Ailice 提供扩展模块进程的地址和端口，这样它就可以动态加载并使用该模块。\n\n\n\u003Ca name=\"useful-tips\">\u003C\u002Fa>\n### 实用提示\n\nAilice 是一个基于多智能体协作的代理，而作为用户，你也是其中的一个“智能体”。因此，当 Ailice 需要更多信息时，它会向你寻求输入，而你提供的信息是否详尽，对她的成功至关重要。\n\n中断。**中断是 Ailice 支持的第二种交互模式，它允许你在任何时候打断并提示 Ailice 的智能体，以纠正错误或提供指导**。在 `ailice` 中，在 Ailice 执行任务的过程中，输入框右侧会出现一个中断按钮。按下它会暂停 Ailice 的执行，并等待你的提示信息。你可以将提示内容输入到输入框中，然后按 Enter 键将消息发送给正在执行子任务的智能体。\n熟练使用此功能需要对 Ailice 的工作原理有很好的理解，尤其是智能体调用树架构。此外，在 Ailice 执行任务时，还需要更多地关注命令行窗口，而不是对话界面。总体而言，这是一个非常有用的功能，尤其是在性能较弱的语言模型设置下。\n\n在语音对话模式下，你可以**让 Ailice 在不同的音色之间切换**，直到找到你喜欢的音色（它会记住你偏好的音色）。\n\n\u003Ca name=\"selection-and-configuration-of-LLM\">\u003C\u002Fa>\n## LLM 的选择与配置\n\n\u003Ca name=\"guide-to-choosing-an-llm\">\u003C\u002Fa>\n### LLM 选择指南\n\n更新于 2025 年 8 月 15 日。\n\n目前，Ailice 可以**使用本地运行的 72B 开源模型（qwen-2.5-72b-instruct 在 4090x2 上运行）处理更复杂的任务**。考虑到开源模型的成本较低，我们强烈建议用户从这些模型开始使用。此外，将 LLM 运行本地化可以确保绝对的隐私保护，这在当今的 AI 应用中是非常难得的品质。点击[这里](#example-2-lm-studio)了解如何在本地运行 LLM。对于 GPU 条件不足以运行大型模型的用户，可以使用在线推理服务（如 openrouter，稍后会提到）来访问这些开源模型（尽管这会牺牲隐私）。你可以根据不同模型的优势和劣势，让智能体发挥出最佳效果。有关详细信息，请参阅[在不同智能体中使用不同模型](#using-different-models-in-different-agents)。\n\n**claude-3-5-sonnet\u002Fclaude-3-7-sonnet\u002Fclaude-sonnet-4\u002Fgemini-2.5-pro** 提供了最佳性能。\n\n**z-ai\u002Fglm-4.5** 的性能非常接近顶级水平。\n\n**qwen-2.5-72b-instruct** 在 70B 级别中提供了最佳性能。\n\n**gpt-4-turbo**\u002F**gpt-3.5-turbo** 表现得异常懒惰，我们始终无法找到稳定的提示表达方式。\n**gpt-4o** 曾经拥有顶级性能，但目前（与之前的 turbo 型号类似）在函数调用方面也出现了懒惰问题。我们不再推荐使用它们。\n\n对于硬件能力不足以在本地运行开源模型，且无法获取商用模型 API 密钥的用户，可以尝试以下选项：\n\n- **openrouter\u002Fapipie** 这些服务可以将你的推理请求路由到各种开源或商用模型，而无需在本地部署开源模型或申请各种商用模型的 API 密钥。它们是非常棒的选择。Ailice 自动支持 OpenRouter\u002FApipie 中的所有模型。感谢 @babybirdprd 向我推荐了 OpenRouter。\n\n\u003Ca name=\"the-most-outstanding-open-source-model\">\u003C\u002Fa>\n\n### 最出色的开源模型\n\n我们将选择当前性能最佳的开源模型，为使用开源模型的用户提供参考。\n\n- 所有模型中表现最佳的是：**z-ai\u002Fglm-4.5**。该模型性能卓越，但其硬件需求远超大多数用户的设备能力，因此不适合在本地运行。\n\n- **qwen-2.5-72b-instruct**：通义千问2.0-72B-Instruct是首个具备实用价值的开源模型，而2.5版本在此基础上进一步优化。它提供了适用于相对简单任务的低成本模型，同时也是大多数用户能够在本地运行的最佳选择。\n\n如果您发现更好的模型，请告知我们。\n\n\u003Ca name=\"how-to-add-llm-support\">\u003C\u002Fa>\n### 如何添加LLM支持\n对于高级用户来说，尝试更多模型是不可避免的。幸运的是，实现这一点并不困难。\n\n\n\u003Ca name=\"using-llm-through-inference-services\">\u003C\u002Fa>\n#### 
通过推理服务使用LLM\n\n对于OpenAI、Mistral、Anthropic和Groq等模型，您无需进行任何操作。只需使用由官方模型名称加上“oai:”、“mistral:”、“anthropic:”或“groq:”前缀组成的modelID即可。如果需要使用Ailice不支持的模型，您可以通过在config.json文件中添加该模型条目来解决这个问题。添加方法是直接引用类似模型的条目，将**contextWindow**修改为实际值，保持**systemAsUser**与类似模型一致，并将**args**设置为空字典。\n\n您可以使用任何兼容OpenAI API的第三方推理服务器来替代Ailice内置的LLM推理功能。\n只需采用与OpenAI模型相同的配置格式，并修改**baseURL、apikey、contextWindow等参数**（实际上，Ailice正是通过这种方式支持Groq模型的）。\n\n对于不支持OpenAI API的推理服务器，您可以尝试使用**litellm**将其转换为兼容OpenAI的API（我们将在下文提供示例）。\n\n需要注意的是，由于Ailice的对话记录中包含大量SYSTEM消息，而这并非LLM的常见用例，因此对此类服务的支持程度取决于这些推理服务器的具体实现方式。在这种情况下，您可以将systemAsUser参数设置为true以规避问题。虽然这可能会使模型无法以最佳性能运行Ailice，但也让我们能够兼容各种高效的推理服务器。对于普通用户而言，这种做法带来的好处大于弊端。\n\n\n\u003Ca name=\"example-1-ollama-litellm\">\u003C\u002Fa>\n##### 示例1：ollama + litellm\n我们以Ollama为例，说明如何为这类服务添加支持。\n首先，我们需要使用Litellm将Ollama的接口转换为与OpenAI兼容的格式。\n\n```bash\npip install litellm\nollama pull mistral-openorca\nlitellm --model ollama\u002Fmistral-openorca --api_base http:\u002F\u002Flocalhost:11434 --temperature 0.0 --max_tokens 8192\n```\n\n然后，在config.json文件中添加对该服务的支持（该文件的位置会在Ailice启动时提示）。\n\n```json\n{\n  \"maxMemory\": {},\n  \"quantization\": null,\n  \"models\": {\n    \"oai\": {\n      ...\n    },\n    \"ollama\": {\n      \"modelWrapper\": \"AModelChatGPT\",\n      \"apikey\": \"fake-key\",\n      \"baseURL\": \"http:\u002F\u002Flocalhost:8000\",\n      \"modelList\": {\n        \"mistral-openorca\": {\n          \"formatter\": \"AFormatterGPT\",\n          \"contextWindow\": 8192,\n          \"systemAsUser\": false,\n          \"args\": {}\n        }\n      }\n    },\n    ...\n  },\n  ...\n}\n```\n\n现在我们可以运行Ailice：\n\n```bash\nailice --modelID=ollama:mistral-openorca\n```\n\n\u003Ca name=\"example-2-lm-studio\">\u003C\u002Fa>\n##### 示例2：LM Studio\n\n在这个示例中，我们将使用LM Studio运行我见过的最强大的开源模型：**Qwen2-72B-Instruct-Q3_K_S.gguf**，从而让Ailice在本地机器上运行。\n\n使用LM Studio下载**Qwen2-72B-Instruct-Q3_K_S.gguf**的模型权重。\n\n在LM Studio的“LocalServer”窗口中，如果您只想使用GPU，请将n_gpu_layers设置为-1。将左侧的“Context Length”参数调整为16384（或根据您的可用内存选择较小的值），并将“Context Overflow Policy”改为“保留系统提示和第一条用户消息，截断中间内容”。\n\n启动服务。假设服务地址为“http:\u002F\u002Flocalhost:1234\u002Fv1\u002F”。\n\n然后，打开config.json并进行如下修改：\n\n```json\n{\n  \"maxMemory\": {},\n  \"quantization\": null,\n  \"models\": {\n    \"oai\": {\n      ...\n    },\n    \"lm-studio\": {\n      \"modelWrapper\": \"AModelChatGPT\",\n      \"apikey\": \"fakekey\",\n      \"baseURL\": \"http:\u002F\u002Flocalhost:1234\u002Fv1\u002F\",\n      \"modelList\": {\n        \"qwen2-72b\": {\n          \"formatter\": \"AFormatterGPT\",\n          \"contextWindow\": 32764,\n          \"systemAsUser\": true,\n          \"args\": {}\n        }\n      }\n    },\n    ...\n  },\n  ...\n}\n```\n\n最后，运行Ailice。您可以根据可用的VRAM或内存空间调整‘contextWindowRatio’参数。该参数越大，所需的VRAM空间就越多。\n\n```bash\nailice --modelID=lm-studio:qwen2-72b --contextWindowRatio=0.5\n```\n\n\u003Ca name=\"example-3-add-open-source-multimodal-model-support\">\u003C\u002Fa>\n##### 示例3：添加开源多模态模型支持\n\n与上一节类似，我们在使用LM Studio下载并运行LLAVA后，按如下方式修改配置文件：\n\n```json\n{\n  \"maxMemory\": {},\n  \"quantization\": null,\n  \"models\": {\n    \"oai\": {\n      ...\n    },\n    \"lm-studio\": {\n      \"modelWrapper\": \"AModelChatGPT\",\n      \"apikey\": \"fakekey\",\n      \"baseURL\": \"http:\u002F\u002Flocalhost:1234\u002Fv1\u002F\",\n      \"modelList\": {\n        \"llava-1.6-34b\": {\n          \"formatter\": \"AFormatterGPTVision\",\n          \"contextWindow\": 4096,\n          \"systemAsUser\": true,\n          \"args\": {}\n        }\n      }\n    },\n    ...\n  },\n  
...\n}\n```\n\n然而，需要注意的是，目前的开源多模态模型还远远不足以完成智能体任务，因此这个示例更适合开发者而非普通用户。\n\n\n\u003Ca name=\"open-source-models-on-huggingface\">\u003C\u002Fa>\n#### Huggingface上的开源模型\n\n对于Huggingface上的开源模型，您只需了解以下信息即可添加新模型的支持：模型的Huggingface地址、模型的提示格式以及上下文窗口长度。\n通常只需一行代码即可添加新模型，但有时也可能需要十几行代码。\n\n以下是完整的新增LLM支持方法：\n打开config.json文件，您应将新LLM的配置添加到models.hf.modelList中，其格式如下：\n\n```json\n{\n  \"maxMemory\": {},\n  \"quantization\": null,\n  \"models\": {\n    \"hf\": {\n      \"modelWrapper\": \"AModelCausalLM\",\n      \"modelList\": {\n        \"meta-llama\u002FLlama-2-13b-chat-hf\": {\n          \"formatter\": \"AFormatterLLAMA2\",\n          \"contextWindow\": 4096,\n          \"systemAsUser\": false,\n          \"args\": {}\n        },\n        \"meta-llama\u002FLlama-2-70b-chat-hf\": {\n          \"formatter\": \"AFormatterLLAMA2\",\n          \"contextWindow\": 4096,\n          \"systemAsUser\": false,\n          \"args\": {}\n        },\n        ...\n      }\n    },\n  ...\n  }\n...\n}\n```\n\n- “formatter” 是一个定义 LLM 提示词格式的类。你可以在 core\u002Fllm\u002FAFormatter 中找到它们的定义。通过阅读这些代码，你可以确定要添加的模型需要哪种格式。如果找不到合适的格式，就需要自己编写一个。幸运的是，Formatter 的实现非常简单，通常十几行代码就能完成。相信在阅读了几份 Formatter 源码之后，你就能明白该如何操作。\n\n- 上下文窗口是 Transformer 架构的 LLM 通常具有的一个属性，它决定了模型一次能够处理的文本长度。你需要将新模型的上下文窗口设置为“contextWindow”键对应的值。\n\n- “systemAsUser”：我们使用“system”角色来表示函数调用返回的消息发送者。然而，并非所有 LLM 都对 system 角色有明确的定义，而且也不能保证 LLM 能够适应这种做法。因此，我们需要通过 systemAsUser 来设置是否将函数调用返回的文本放入用户消息中。建议先将其设置为 False。\n\n一切就绪！只需在模型名称前加上“hf:”作为前缀，形成 modelID，然后将新模型的 modelID 作为命令参数传递给 Ailice 即可启动！\n\n\n\u003Ca name=\"using-different-models-in-different-agents\">\u003C\u002Fa>\n\n\n### 在不同智能体中使用不同模型\n\nAilice 具有两种运行模式。一种模式是使用单一的 LLM 驱动所有智能体，而另一种模式则允许每种类型的智能体指定对应的 LLM。后一种模式使我们能够更好地结合开源模型和商业模型的能力，在降低成本的同时获得更好的性能。要使用第二种模式，首先需要在 config.json 中配置 agentModelConfig 项：\n\n```json\n  \"modelID\": \"\",\n  \"agentModelConfig\": {\n    \"DEFAULT\": \"openrouter:z-ai\u002Fglm-4.5\",\n    \"main\": \"openrouter:anthropic\u002Fclaude-sonnet-4\",\n    \"search-engine\": \"openrouter:qwen\u002Fqwen-2.5-72b-instruct\"\n  },\n```\n\n首先确保 modelID 的默认值被设置为空字符串，然后在 agentModelConfig 中为每种类型的智能体配置对应的 LLM。\n\n最后，可以通过不指定 modelID 来实现第二种运行模式：\n\n```bash\nailice\n```\n\n\n\u003Ca name=\"development\">\u003C\u002Fa>\n## 开发\n\n\u003Ca name=\"design\">\u003C\u002Fa>\n### 设计\n设计 Ailice 时的基本原则是：\n\n- **通过高度动态的提示词构建机制丰富 LLM 的行为；**\n- **尽可能地分离不同的计算任务，利用传统计算中的递归和分治思想来解决复杂问题。**\n- **智能体之间应能进行双向交互。**\n\n下面简要解释这些基本原则。\n\n从最直观的角度来看，高度动态的提示词构建机制可以降低智能体陷入循环的可能性。来自外部环境的新变量不断输入到 LLM 中，有助于其避免陷入死循环。此外，向 LLM 提供当前可用的所有信息，可以显著提升其输出质量。例如，在自动化编程中，解释器或命令行返回的错误信息可以帮助 LLM 不断调整代码，直到得到正确的结果。另外，在动态提示词构建过程中，提示词中的新信息也可能来自其他智能体，这相当于一种联动推理计算，使得系统的计算机制更加复杂多样，从而产生更丰富的行为。\n\n从实际角度来看，分离计算任务主要是因为我们的上下文窗口有限。我们不可能指望在几千个 token 的窗口内完成一项复杂的任务。如果能够将复杂任务分解成若干子任务，让每个子任务都在有限资源内解决，那将是理想的结果。在传统的计算模型中，我们一直采用这种方法，但在以 LLM 为中心的新计算模式中，要做到这一点并不容易。问题在于，一旦某个子任务失败，整个任务都有可能失败。递归则更为棘手：如何确保每次调用时，LLM 只解决子问题的一部分，而不是将全部负担转交给下一层调用？我们在 Ailice 中通过 IACT 架构解决了第一个问题，而第二个问题理论上并不难解决，只是可能需要更智能的 LLM 才能做到。\n\n第三个原则正是目前大家都在努力的方向：让多个智能体相互协作，共同完成更复杂的任务。这一原则的实现实际上也解决了前面提到的子任务失败问题。多智能体协作对于智能体在运行中的容错能力至关重要。事实上，这可能是新计算范式与传统计算之间最大的区别之一：传统计算精确无误，子任务仅通过单向通信（函数调用）分配；而新计算范式则容易出错，需要计算单元之间进行双向通信来纠正错误。这一点将在接下来关于 IACT 框架的章节中详细说明。\n\n\n\u003Ca name=\"computational-model-interactive-agents-call-tree\">\u003C\u002Fa>\n#### 计算模型：交互式智能体调用树\n![IACT](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmyshell-ai_AIlice_readme_9648f3629f87.jpg)\n*IACT 架构图。用户要求构建一个用于图片收集和展示的页面，该需求被动态分解为两个任务：coder_spider 和 coder_website。当 coder_spider 遇到困难时，会主动向其调用者 proxy_cat_gallery 寻求帮助。proxy_cat_gallery 随即创建另一个智能体 
researcher_api，并委托其解决问题。*\n\nAilice 可以被视为 **一台由 LLM 驱动的计算机**，其特点包括：\n\n- 以文本形式表示输入、输出、程序和数据。\n- 使用 LLM 作为处理器。\n- 通过连续调用基本计算单元（类似于传统计算中的函数），将计算任务分解为多个子任务，而这些基本计算单元本质上就是各种功能型智能体。\n\n因此，**用户输入的文本命令会被当作一种程序执行，分解为多个“子程序”，并由不同的智能体负责处理**，这构成了 Ailice 的基础架构。接下来，我们将详细探讨这些基本计算单元的本质。\n\n一个自然的想法是，让大语言模型通过与外部调用者及外围模块进行多轮对话，在最简单的计算单元中解决某些问题（如信息检索、文档理解等）。我们暂时将这个计算单元称为“函数”。然后，类比传统计算，允许函数之间相互调用，并最终引入线程的概念来实现多智能体交互。然而，我们其实可以构建一种**更为简单且优雅的计算模型**。\n\n关键在于，封装了大语言模型推理能力的“函数”实际上可以被多次调用并返回。具备编码功能的“函数”在编码过程中遇到需求不明确时，可以暂停工作，向其调用者返回一条查询语句。如果调用者对答案仍不清楚，则会继续向上一级调用者提问。这一过程甚至可以一直传递到最终用户的聊天窗口。当有新信息加入时，调用者可以通过传入补充信息重新激活编码者的执行流程。由此可见，这种“函数”并非传统意义上的函数，而是一个可以被多次调用的对象。正是大语言模型的高度智能使得这一特性成为可能。你也可以将其视为**由调用关系串联起来的智能体，每个智能体既可以创建和调用更多的子智能体，又能够与其调用者对话以获取补充信息或汇报进展**。在Ailice中，我们将这种计算单元称为**“AProcessor”**（本质上就是我们所说的智能体）。它的代码位于core\u002FAProcessor.py。\n\n\n\u003Ca name=\"basic-computing-unit-tai-chi-diagram-of-llm-and-interpreter\">\u003C\u002Fa>\n#### 基本计算单元：LLM与解释器的太极图\n接下来，我们将详细阐述AProcessor内部的结构。AProcessor的内部是一次多轮对话。“定义AProcessor功能”的“程序”是一种提示词生成机制，它根据对话历史生成每一轮对话的提示词。这种对话属于一对多的形式：外部调用者输入请求后，大语言模型会与外围模块（我们称之为SYSTEM）进行多轮对话，大语言模型会以各种语法形式输出函数调用，系统则调用外围模块生成结果，并将结果放入回复消息中。最后，大语言模型得到答案并回复外部调用者，本次调用结束。但由于对话历史仍然被保留，调用者可以再次发起调用，继续执行更多任务。\n\n我们最后要介绍的是用于解析大语言模型输出的模块。事实上，**我们将大语言模型的输出文本视为一种半自然语言、半形式语言的“脚本”，并使用一个简单的解释器来执行它**。我们可以利用正则表达式来描述精心设计的语法结构，将其解析为函数调用并执行。在这种设计下，我们可以设计更加灵活的函数调用语法形式，例如带有特定固定标题的部分（如“UPDATE MEMORY”），也可以直接被解析出来并触发相应动作的执行。这种隐式的函数调用无需让大语言模型意识到它的存在，只需要严格遵循某种格式规范即可。为了应对最极端的情况，我们还留有一定的余地。这里的解释器不仅可以使用正则表达式进行模式匹配，其Eval函数还是递归的。我们尚不清楚这会被用于何种场景，但留下这样一个酷炫的可能性似乎也不错，不是吗？因此，在AProcessor内部，计算是由大语言模型和解释器交替完成的，它们的输出互为输入，形成一个循环。\n\n\n\u003Ca name=\"agent-design-implementing-the-interpreter-framework\">\u003C\u002Fa>\n#### 智能体设计：实现解释器框架\n在Ailice中，解释器是智能体中最核心的组件之一。我们利用解释器将大语言模型输出中符合特定模式的文本映射为具体动作，包括函数调用、变量定义与引用，以及任何用户自定义的动作。有时这些动作会直接与外围模块交互，影响外部世界；另一些则用于修改智能体的内部状态，从而影响其未来的提示词。\n\n解释器的基本结构非常简单：一组模式-动作对。模式由正则表达式定义，动作则由带有类型注解的Python函数指定。由于语法结构可能存在嵌套，我们将顶层结构称为入口模式。运行时，解释器会主动检测大语言模型输出文本中的这些入口模式。一旦检测到入口模式（且对应的动作返回数据），它就会立即终止大语言模型的生成过程，以执行相关动作。\n\nAilice中智能体的设计包含两个基本方面：**基于对话历史和智能体内部状态生成提示词的逻辑，以及一组模式-动作对**。本质上，智能体通过一组模式-动作对实现了解释器框架的一部分；它本身也成为解释器不可分割的一部分。智能体的内部状态是解释器动作的目标之一，而内部状态的变化会影响未来提示词的方向。\n\n从对话历史和内部状态生成提示词的过程几乎是标准化的，尽管开发者仍有自由选择完全不同的生成逻辑。开发者面临的主要挑战是设计系统提示词模板，这是智能体的核心所在，往往也是需要投入最多精力去完善的部分。不过，这项工作完全围绕着撰写自然语言提示词展开。\n\n\n\u003Ca name=\"scripting-language-from-text-to-reality\">\u003C\u002Fa>\n#### 脚本语言：从文本到现实\nAilice利用一种内嵌于文本中的简单脚本语言，将大语言模型的文本处理能力映射到现实世界。**这种简洁的脚本语言包括非嵌套的函数调用、变量的创建与引用机制，以及文本内容的拼接操作**。其目的是让大语言模型能够更自然地对世界产生影响：从更流畅的文本操作能力，到简单的函数调用机制，再到多模态的变量操作能力。最后需要指出的是，对于智能体的设计者而言，他们始终拥有扩展该脚本语言新语法的自由。这里介绍的是一个最小化的标准语法结构。\n\n基本语法如下：\n\n变量定义：\nVAR_NAME := \u003C!|\"SOME_CONTENT\"|!>\n\n函数调用\u002F变量引用\u002F文本拼接：\n!FUNC-NAME\u003C!|\"...\", '...', VAR_NAME1, \"执行以下代码：\\n\" + VAR_NAME2, ...|!>\n\n基本变量类型包括 str、AImage 和各种多模态类型。str 类型与 Python 的字符串语法一致，支持三重引号和转义字符。\n\n这构成了嵌入式脚本语言的全部内容。\n\n变量定义机制引入了一种扩展上下文窗口的方式，允许 LLM 将重要内容记录到变量中，以防止遗忘。在系统运行过程中，各种变量会自动被定义。例如，如果在文本消息中检测到用三重反引号包裹的代码块，系统会自动创建一个变量来存储该代码，从而使 LLM 可以通过引用该变量来执行代码，避免了完整复制代码所需的时间和 token 开销。此外，某些模块函数可能会返回多模态类型的数据，而不是纯文本。在这种情况下，系统会自动将这些数据定义为相应多模态类型的变量，以便 LLM 能够引用它们（LLM 可能会将它们发送到另一个模块进行处理）。\n\n\n\u003Ca name=\"multimodal-collaboration-of-rich-text-and-variable-mechanisms\">\u003C\u002Fa>\n#### 多模态：富文本与变量机制的协同\n从长远来看，LLM 必然会演变为能够“看”和“听”的多模态模型。因此，**Ailice 各个智能体之间的交互应采用富文本形式**，而不仅仅是纯文本。虽然 Markdown 提供了一些标记多模态内容的能力，但仍然不够充分。因此，未来我们需要一种扩展版的 Markdown，以支持嵌入视频、音频等各种多模态数据。\n\n我们以图像为例来说明 Ailice 中的多模态机制。当智能体接收到包含 Markdown 标记图像的文本时，系统会自动将其输入到多模态模型中，以确保模型能够“看到”这些内容。通常，Markdown 使用路径或 URL 来标记图像，为此我们扩展了 Markdown 
语法，允许使用变量名来引用多模态内容。\n\n另一个小问题是，拥有各自内部变量列表的不同智能体如何交换多模态变量。解决方法很简单：系统会自动检查从一个智能体发送到另一个智能体的消息中是否包含内部变量名。如果包含，则会将变量内容一同传递给下一个智能体。\n\n既然使用路径和 URL 标记多模态内容更加方便，为什么我们还要费心实现额外的多模态变量机制呢？这是因为基于本地文件路径标记多模态内容，只有在 Ailice 完全运行于本地环境时才可行，而这并非我们的设计初衷。Ailice 的设计目标是分布式运行，核心和各个模块可能分别运行在不同的计算机上，甚至可以加载互联网上的服务来进行某些计算。这样一来，直接返回完整的多模态数据就更具吸引力。当然，这些面向未来的设计也许有些过度工程化，如果是这样，我们会在未来对其进行调整。\n\n\n\u003Ca name=\"self-expansion-growing-like-a-tree\">\u003C\u002Fa>\n#### 自我扩展：像树一样生长\nAilice 的目标之一是实现自我反思和自我扩展（这也是我们标志中蝴蝶倒映在水中的原因）。**这将使她能够理解自己的代码并构建新的功能，包括新的外部交互模块（即新功能）和新型智能体（APrompt 类）**。这样一来，LLM 的知识和能力将得到更充分的释放。\n\n实现自我扩展涉及两个方面。一方面，需要在运行时动态加载新的模块和新型智能体（APrompt 类），并自然地融入计算系统参与处理，这被称为动态加载。另一方面，Ailice 需要具备构建新模块和新型智能体的能力。\n\n动态加载机制本身具有重要意义：它代表了一种**全新的软件更新机制**。我们可以让 Ailice 在互联网上搜索自己的扩展代码，对代码进行安全性检查、修复 bug 和兼容性问题，并最终将扩展作为自身的一部分运行起来。因此，Ailice 的开发者只需将自己的贡献代码发布到网上，无需合并到主代码库中，也不必考虑其他安装方式。动态加载机制的实现正在不断完善。其核心在于扩展包提供一段描述其功能的文本。在运行时，Ailice 中的每个智能体都会通过语义匹配等方式，为自己找到合适的功能或智能体类型来解决子问题。\n\n构建新模块是一项相对简单的任务，因为模块需要满足的接口约束非常明确。我们可以通过示例来训练 LLM 构建新模块。更为复杂的则是新型智能体（APrompt 类）的自我构建，这需要对 Ailice 的整体架构有很好的理解。尤其是系统提示词的构建，极为精细，即使对人类来说也是一项挑战。因此，我们寄希望于未来更强大的 LLM 实现自我反思，**让 Ailice 通过阅读自己的源代码来理解自己（对于像程序这样复杂的事物，最好的介绍方式就是由其自身呈现），从而构建出更好的新型智能体**。\n\n\n\u003Ca name=\"how-developers-should-get-started\">\u003C\u002Fa>\n\n\n### 开发者入门指南\n\n- 对于开发智能体而言，Ailice 的主循环位于 AiliceMain.py 或 ui\u002Fapp.py 文件中。要进一步了解智能体的构建过程，需要阅读 “prompts” 文件夹中的代码，通过这些代码可以理解智能体的提示词是如何动态构建的。\n\n- 对于希望深入了解 Ailice 内部运行逻辑的开发者，请阅读 core\u002FAProcessor.py 和 core\u002FInterpreter.py。这两份文件总共约三百行代码，却包含了 Ailice 的基本框架。\n\n\n\u003Ca name=\"project-development-standards-and-constraints\">\u003C\u002Fa>\n\n### 项目开发标准与约束\n\n- 在本项目中，**实现AI Agent的预期功能是首要目标，其次才是代码的清晰与简洁**。目前AI Agent的实现仍处于探索阶段，因此我们希望**尽量减少软件中的刚性组件（如对后续开发造成限制的架构或接口），并为应用层提供最大的灵活性（例如提示词类）**。抽象、去重和解耦并非当前的优先事项。\n\n- 在实现某个功能时，**应选择最优方法，而非最直观的方法**。所谓“最优”，通常包括从更高维度简化问题、保持代码的清晰简洁，以及确保改动不会显著增加整体复杂度或限制软件未来的扩展性。\n\n- 添加注释并非强制要求，除非绝对必要；**应力求使代码足够清晰，无需额外解释**。虽然这对喜欢注释的开发者来说可能不是问题，但在AI时代，我们可以随时生成详细的代码说明，从而避免使用结构松散、难以维护的注释。\n\n- 添加代码时应遵循奥卡姆剃刀原则，**绝不添加不必要的代码行**。\n\n- **核心模块中的函数或方法不应超过60行**。\n\n- 虽然没有明确的编码风格约束，但仍需在命名和大小写使用上与原有代码保持一致或相似，以避免影响代码可读性。\n\nAilice的目标是在5000行以下的规模内实现多模态和自我扩展功能，并在当前阶段达到最终形态。追求简洁的代码不仅因为精炼的代码往往代表更好的实现方式，还因为它能够让AI尽早具备自我反思能力，从而更好地进行自我扩展。请严格遵守上述规则，认真对待每一行代码。\n\n\n\u003Ca name=\"future-development-roadmap\">\u003C\u002Fa>\n### 未来开发路线图\n\nAilice的根本任务有两个：**一是将基于文本的LLM能力充分释放到现实世界中；二是探索更优的长期记忆机制，形成对海量文本的一致性理解**。我们的开发工作将围绕这两个核心展开。\n\n如果您对Ailice本身的开发感兴趣，可以考虑以下几个方向：\n\n- 探索改进的**长期记忆机制**，以提升各Agent的能力。我们需要一种能够**持续理解大量内容并促进知识关联**的长期记忆机制。目前最可行的方案是用知识图谱替代向量数据库，这将极大提升对长篇文本或代码的理解能力，并帮助我们构建真正的个人AI助手。\n\n- **多模态**支持。多模态模型的支持已经完成，当前的开发重点正转向外围模块的多模态支持。我们需要一个能够基于屏幕截图操作计算机、模拟鼠标和键盘输入的模块。\n\n- **自我扩展**支持。我们的目标是让语言模型能够**自主编写代码并实现新的外围模块或Agent类型，然后动态加载以供立即使用**。这一能力将实现系统的自我扩展，使其能够无缝集成新功能。我们已完成大部分相关功能，但仍需进一步开发构建新型Agent的能力。\n\n- **更丰富的UI界面**。我们需要将Agent的输出组织成对话窗口中的树状结构，并动态更新所有Agent的输出。同时，还需在Web界面上接收用户输入，并将其传递给脚本执行器的标准输入，这一点在使用sudo时尤为重要。\n\n- 基于现有框架，**开发具有各种功能的Agent**。\n\n- **探索IACT架构在复杂任务中的应用**。通过构建交互式Agent调用树，我们可以将大型文档拆解以提升阅读理解能力，也可以将复杂的软件工程任务分解为更小的模块，通过迭代逐步完成整个项目的构建与测试。这需要一系列精细的提示词设计和测试工作，但为未来带来了令人期待的前景。IACT架构能够显著缓解上下文窗口带来的资源限制，使系统能够动态适应更复杂的任务。\n\n- **利用自我扩展机制构建丰富的外部交互模块！这将在[AiliceEVO](https:\u002F\u002Fgithub.com\u002Fstevenlu137\u002FAIliceEVO)中实现**。\n\n- **探索将Ailice用作强化学习的执行环境，以优化小型语言模型的性能**。在Agent应用中，模型的知识量不如其推理能力和工具使用能力重要。而后者在很大程度上取决于模型对其运行环境的理解，强化学习（RL）可能是培养这种理解的最佳途径。","# AIlice 快速上手指南\n\nAIlice 是一个完全自主的通用 AI 智能体（Agent），旨在基于开源大语言模型构建类似 JARVIS 的独立助手。它具备深度研究、编程自动化、系统管理及多模态交互能力。\n\n## 1. 
环境准备\n\n### 系统要求\n- **操作系统**：Linux, macOS, Windows (推荐 WSL2 或 Docker)\n- **Python 版本**：3.8 - 3.11\n- **内存**：建议 8GB 以上（运行本地大模型需更多）\n- **可选加速**：NVIDIA GPU (用于本地推理或 Docker CUDA 支持)\n\n### 前置依赖\n- Git\n- Python pip 包管理工具\n- Docker (如果使用容器化部署)\n- **网络环境**：由于项目涉及 GitHub 克隆及可能的模型下载，国内用户建议配置科学上网环境或使用国内镜像源加速 pip 安装。\n\n## 2. 安装步骤\n\n你可以选择直接在本地安装，或使用 Docker 容器运行（推荐，环境隔离更干净）。\n\n### 方案 A：本地直接安装\n\n1. **克隆项目代码**\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice.git\n   cd AIlice\n   ```\n\n2. **安装依赖**\n   *(国内用户若遇下载慢，可临时指定清华\u002F阿里镜像源)*\n   ```bash\n   pip install -e . -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   ```\n\n3. **启动服务**\n   ```bash\n   ailice --contextWindowRatio=0.2\n   ```\n\n### 方案 B：Docker 容器运行（推荐）\n\n1. **构建镜像**\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice.git\n   cd AIlice\n   docker build -t ailice .\n   ```\n\n2. **运行容器**\n   ```bash\n   docker run -it -p 127.0.0.1:5000:5000 --name ailice ailice --expose=1 --contextWindowRatio=0.2\n   ```\n\n   *注：若拥有 NVIDIA 显卡并希望启用 GPU 加速，请确保已安装 `nvidia-container-toolkit`，并使用以下命令：*\n   ```bash\n   docker build --build-arg BASE_IMAGE=nvidia\u002Fcuda:13.0.0-cudnn-devel-ubuntu24.04 -t ailice .\n   docker run --gpus all -it -p 127.0.0.1:5000:5000 --name ailice ailice --expose=1 --contextWindowRatio=0.2\n   ```\n\n## 3. 基本使用\n\n启动成功后，终端会显示一个本地访问地址（通常为 `http:\u002F\u002F127.0.0.1:5000`）。请在浏览器中打开该地址，即可进入富媒体对话界面。\n\n### 配置模型\n首次使用前，需在界面设置中配置你的 LLM API Key（支持 OpenAI, Claude, Gemini 等商业模型，或通过 Ollama\u002FLM Studio 接入本地开源模型）。\n\n### 快速尝试指令\n在对话框中输入以下自然语言指令，体验 AIlice 的自主执行能力：\n\n**1. 系统管理与文件操作**\n```text\nList the contents of the current directory.\n```\n*(列出当前目录内容)*\n\n**2. 实时信息查询**\n```text\nCheck the weather in San Francisco today.\n```\n*(查询旧金山今日天气)*\n\n**3. 复杂数学推导与计算**\n```text\nCalculate the integral of e^(-x^2) from negative infinity to positive infinity with detailed derivation steps.\n```\n*(计算高斯积分并提供详细推导步骤)*\n\n**4. 代码生成与可视化**\n```text\nGenerate fractal visualization using any algorithm of choice.\n```\n*(选择任意算法生成分形可视化图像)*\n\n**5. 综合任务示例**\n```text\nInstall Google Chrome browser on this system. 
Download the latest stable version, verify the installation, and confirm it's working properly.\n```\n*(自动下载、安装并验证谷歌浏览器)*\n\n---\n*提示：AIlice 支持多轮对话和任务拆解，你可以像指挥助手一样下达复杂的混合任务（如“查找论文、总结内容并生成 PDF 报告”）。*","一位独立开发者需要在周末紧急完成一个包含竞品数据抓取、代码原型编写及文献综述的复杂项目，但时间紧迫且缺乏助手。\n\n### 没有 AIlice 时\n- **任务割裂严重**：开发者需手动在浏览器搜索数据、切换 IDE 写代码、再打开论文网站查资料，上下文频繁中断，效率极低。\n- **容错成本高昂**：一旦爬虫脚本因网站结构变化报错，或代码出现逻辑漏洞，必须人工逐行调试，耗费大量宝贵时间。\n- **难以处理复杂逻辑**：面对“先分析数据再生成对应代码”的混合任务，传统单步 AI 工具无法自主拆解步骤，往往需要人类反复提示和纠偏。\n- **本地部署门槛高**：想要保护数据隐私而使用本地大模型，却因配置繁琐、显存优化困难而被迫放弃，只能依赖不安全的云端服务。\n\n### 使用 AIlice 后\n- **全流程自主闭环**：AIlice 基于 IACT 架构自动将项目拆解为“数据收集”、“代码构建”、“文献整理”三个动态子智能体，并行执行并整合结果，开发者只需下达一个指令。\n- **高容错自我修复**：当子任务遇到报错（如爬虫失效），AIlice 能自主诊断问题、调整策略并重试，无需人工介入即可恢复运行。\n- **复杂任务无缝协同**：针对混合需求，AIlice 自动规划执行顺序，让负责研究的智能体输出结论，直接驱动编码智能体生成适配的业务逻辑，实现真正的“思考 - 行动”链。\n- **本地化强力运行**：借助对开源大模型的深度优化，开发者可在本地电脑流畅运行类 JARVIS 助手，既保障了核心代码与数据安全，又节省了昂贵的 API 费用。\n\nAIlice 将开发者从繁琐的工具切换与调试中解放出来，使其真正专注于创意决策，实现了单人团队具备完整研发流水线的效能飞跃。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmyshell-ai_AIlice_5502bfb6.png","myshell-ai","MyShell","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmyshell-ai_dc0a3ec6.png","We are building an open ecosystem for AI Native Apps.",null,"ethan@myshell.ai","myshell_ai","https:\u002F\u002Fapp.myshell.ai\u002F","https:\u002F\u002Fgithub.com\u002Fmyshell-ai",[82,86,90,94,98],{"name":83,"color":84,"percentage":85},"Python","#3572A5",67.3,{"name":87,"color":88,"percentage":89},"HTML","#e34c26",14.2,{"name":91,"color":92,"percentage":93},"JavaScript","#f1e05a",12.1,{"name":95,"color":96,"percentage":97},"CSS","#663399",6.3,{"name":99,"color":100,"percentage":101},"Dockerfile","#384d54",0.1,1400,208,"2026-04-13T00:56:02","MIT","Linux, macOS, Windows","非必需。若本地运行开源大模型或启用 Docker CUDA 支持，建议配备 NVIDIA GPU（需安装 nvidia-container-toolkit），Docker 示例中使用 CUDA 13.0；若使用商业 API 则无特定显卡要求。","未说明（取决于所选用的大语言模型规模，本地运行大型模型通常建议 16GB+）",{"notes":110,"python":111,"dependencies":112},"1. 支持多种运行方式：本地直接运行、Docker 沙箱、Docker + CUDA 加速、Docker + GUI（Linux 原生支持，Win\u002FMac 需特殊配置）。2. 核心架构依赖外部 LLM，可灵活配置商业模型（如 Claude, Gemini）或通过 Ollama、LM Studio 等推理服务连接开源模型。3. 具备语音对话功能（基于 ChatTTS）。4. 支持 MCP (Model Context Protocol) 工具扩展。5. 首次使用需配置 LLM API Key 或本地推理服务地址。","未说明（通过 pip install -e . 
安装，通常兼容 Python 3.8+）",[113],"nvidia-container-toolkit (可选，用于 Docker GPU 支持)",[13,14,15,35],[116,117,118,119],"agent","ai","llm","llm-agent","2026-03-27T02:49:30.150509","2026-04-15T06:06:37.752598",[123,128,133,138,143,148],{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},33697,"如何在 Windows 上安装 AIlice？","目前默认安装已移除对 torch\u002Fhuggingface 栈的依赖，并用自研 UI 替换了 Gradio，因此默认选项下的 Windows 安装问题已不再存在。如果仍需手动安装 torch，请注意：在 Windows 上直接运行 `pip install torch` 默认安装的是 CPU 版本；若需 GPU 版本，必须指定版本，例如：`pip install torch==2.1.1+cu118`。","https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice\u002Fissues\u002F16",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},33698,"Windows 用户无法使用 Docker 时如何直接运行？","项目已支持直接在命令行运行，无需 Docker。请切换到最新的 dev 分支代码即可使用该功能。","https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice\u002Fissues\u002F6",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},33699,"遇到模型加载失败或 llama-cpp-python 相关错误怎么办？","这通常是由于 llama-cpp-python 库的版本兼容性问题导致的。建议尝试升级或切换该库的版本（例如从 0.1.68 升级到最新版）。此外，也可以尝试删除损坏的模型权重文件并重新下载。","https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice\u002Fissues\u002F41",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},33700,"如何强制停止正在运行的 Agent 任务？","由于提示词（prompt）方式停止任务可能因父 Agent 缺乏上下文而失效，最新版本已添加硬中断功能。您可以在 UI 中使用 `\u002Fstop` 命令来强制控制并停止任务执行。","https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice\u002Fissues\u002F65",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},33701,"如何为不同的模型单独配置采样参数（如 temperature, min_p）？","可以在 `config.json` 文件中为任何 LLM 配置推理参数。这样您就可以为每个模型（甚至同一模型在不同用途下，如思考或编码）设置独立的采样器参数，而无需依赖默认值。","https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FAIlice\u002Fissues\u002F50",{"id":149,"question_zh":150,"answer_zh":151,"source_url":132},33702,"如何添加并使用自定义的大语言模型（LLM）？","请按以下步骤操作：\n1. 在 `ailice\u002Fcore\u002Fllm` 目录下添加新的 LLM 脚本，参考现有代码实现 `__init__` 和 `Generate` 函数，如有需要可添加新的 Formatter。\n2. 在 `ailice\u002Fcore\u002Fllm\u002FALLMPool.py` 中将您的模型添加到池中，确保命令行可访问。\n3. 在 `ailice\u002Fcore\u002Fllm\u002FALLMeta.py` 中添加模型信息。\n4. 最后运行命令：`ailice_main --modelID={您在上述文件中设置的模型 ID} --prompt=\"main\"`。",[153],{"id":154,"version":155,"summary_zh":156,"released_at":157},263554,"v0.2.0-alpha","这是文本模态阶段的稳定版本，提示词针对 gpt4-turbo 进行了优化，在开源模型如 NousResearch\u002FNous-Hermes-2-Mixtral-8x7B-DPO 上也有不错的表现。不过，语音交互和网页界面功能目前还不够完善。\n\n在主分支启动多模态功能开发任务之后，可能会出现性能不稳定的情况。建议用户将本版本作为备选方案使用。","2024-02-07T07:59:24"]