[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Open-LLM-VTuber--Open-LLM-VTuber":3,"tool-Open-LLM-VTuber--Open-LLM-VTuber":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":67,"owner_name":67,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":77,"owner_website":78,"owner_url":79,"languages":80,"stars":93,"forks":94,"last_commit_at":95,"license":96,"difficulty_score":10,"env_os":97,"env_gpu":98,"env_ram":99,"env_deps":100,"category_tags":110,"github_topics":111,"view_count":23,"oss_zip_url":77,"oss_zip_packed_at":77,"status":16,"created_at":122,"updated_at":123,"faqs":124,"releases":152},1294,"Open-LLM-VTuber\u002FOpen-LLM-VTuber","Open-LLM-VTuber","Talk to any LLM with hands-free voice interaction, voice interruption, and Live2D taking face running locally across platforms","Open-LLM-VTuber 是一个支持语音交互的 AI 伴侣工具，能够实现实时语音对话、语音打断以及 Live2D 虚拟形象展示。所有功能均可在本地离线运行，无需依赖网络连接。它通过语音识别和合成技术，让用户可以像与真人对话一样，与 AI 进行自然交流，同时搭配生动的虚拟角色形象，提升互动体验。\n\n这个项目解决了传统 AI 对话工具缺乏直观交互方式的问题，使用户可以通过语音而非键盘输入与 AI 沟通，更加便捷自然。此外，Live2D 技术的应用让 AI 伴侣拥有更丰富的视觉表现力，增强了沉浸感。\n\nOpen-LLM-VTuber 适合对 AI 交互体验感兴趣的一般用户，也适合开发者和研究人员进行本地化部署与二次开发。其独特之处在于结合了语音交互、实时对话与 Live2D 视觉呈现，为用户提供了一个多功能、易用性强的 AI 伴侣平台。","![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_71d1698fa47e.jpg)\n\n\u003Ch1 align=\"center\">Open-LLM-VTuber\u003C\u002Fh1>\n\u003Ch3 align=\"center\">\n\n[![GitHub release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber)](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Freleases) \n[![license](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber)](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fblob\u002Fmaster\u002FLICENSE) 
\n[![CodeQL](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Factions\u002Fworkflows\u002Fcodeql.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Factions\u002Fworkflows\u002Fcodeql.yml)\n[![Ruff](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Factions\u002Fworkflows\u002Fruff.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Factions\u002Fworkflows\u002Fruff.yml)\n[![Docker](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOpen-LLM-VTuber%2FOpen--LLM--VTuber-%25230db7ed.svg?logo=docker&logoColor=blue&labelColor=white&color=blue)](https:\u002F\u002Fhub.docker.com\u002Fr\u002FOpen-LLM-VTuber\u002Fopen-llm-vtuber) \n[![QQ User Group](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FQQ_User_Group-792615362-white?style=flat&logo=qq&logoColor=white)](https:\u002F\u002Fqm.qq.com\u002Fq\u002FngvNUQpuKI)\n[![Static Badge](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJoin%20Chat-Zulip?style=flat&logo=zulip&label=Zulip(dev-community)&color=blue&link=https%3A%2F%2Folv.zulipchat.com)](https:\u002F\u002Folv.zulipchat.com)\n\n> **📢 v2.0 Development**: We are focusing on Open-LLM-VTuber v2.0 — a complete rewrite of the codebase. v2.0 is currently in its early discussion and planning phase. We kindly ask you to refrain from opening new issues or pull requests for feature requests on v1. To participate in the v2 discussions or contribute, join our developer community on [Zulip](https:\u002F\u002Folv.zulipchat.com). Weekly meeting schedules will be announced on Zulip. We will continue fixing bugs for v1 and work through existing pull requests.\n\n[![BuyMeACoffee](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBuy%20Me%20a%20Coffee-ffdd00?style=for-the-badge&logo=buy-me-a-coffee&logoColor=black)](https:\u002F\u002Fwww.buymeacoffee.com\u002Fyi.ting)\n[![](https:\u002F\u002Fdcbadge.limes.pink\u002Fapi\u002Fserver\u002F3UDA8YFDXx)](https:\u002F\u002Fdiscord.gg\u002F3UDA8YFDXx)\n\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber)\n\nENGLISH README | [中文 README](.\u002FREADME.CN.md) | [한국어 README](.\u002FREADME.KR.md) | [日本語 README](.\u002FREADME.JP.md)\n\n[Documentation](https:\u002F\u002Fopen-llm-vtuber.github.io\u002Fdocs\u002Fquick-start) | [![Roadmap](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRoadmap-GitHub_Project-yellow)](https:\u002F\u002Fgithub.com\u002Forgs\u002FOpen-LLM-VTuber\u002Fprojects\u002F2)\n\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F12358\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_34d606db08fc.png\" alt=\"Open-LLM-VTuber%2FOpen-LLM-VTuber | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\n\u003C\u002Fh3>\n\n\n> 常见问题 Common Issues doc (Written in Chinese): https:\u002F\u002Fdocs.qq.com\u002Fpdf\u002FDTFZGQXdTUXhIYWRq\n>\n> User Survey: https:\u002F\u002Fforms.gle\u002Fw6Y6PiHTZr1nzbtWA\n>\n> 调查问卷(中文): https:\u002F\u002Fwj.qq.com\u002Fs2\u002F16150415\u002Ff50a\u002F\n\n\n\n> :warning: This project is in its early stages and is currently under **active development**.\n\n> :warning: If you want to run the server remotely and access it on a different machine, such as running the server on your computer and access it on your phone, you will need to configure `https`, because the 
microphone on the front end will only launch in a secure context (a.k.a. https or localhost). See [MDN Web Doc](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FAPI\u002FMediaDevices\u002FgetUserMedia). Therefore, you should configure https with a reverse proxy to access the page on a remote machine (non-localhost).\n\n\n\n## ⭐️ What is this project?\n\n\n**Open-LLM-VTuber** is a unique **voice-interactive AI companion** that not only supports **real-time voice conversations**  and **visual perception** but also features a lively **Live2D avatar**. All functionalities can run completely offline on your computer!\n\nYou can treat it as your personal AI companion — whether you want a `virtual girlfriend`, `boyfriend`, `cute pet`, or any other character, it can meet your expectations. The project fully supports `Windows`, `macOS`, and `Linux`, and offers two usage modes: web version and desktop client (with special support for **transparent background desktop pet mode**, allowing the AI companion to accompany you anywhere on your screen).\n\nAlthough the long-term memory feature is temporarily removed (coming back soon), thanks to the persistent storage of chat logs, you can always continue your previous unfinished conversations without losing any precious interactive moments.\n\nIn terms of backend support, we have integrated a rich variety of LLM inference, text-to-speech, and speech recognition solutions. If you want to customize your AI companion, you can refer to the [Character Customization Guide](https:\u002F\u002Fopen-llm-vtuber.github.io\u002Fdocs\u002Fuser-guide\u002Flive2d) to customize your AI companion's appearance and persona.\n\nThe reason it's called `Open-LLM-Vtuber` instead of `Open-LLM-Companion` or `Open-LLM-Waifu` is because the project's initial development goal was to use open-source solutions that can run offline on platforms other than Windows to recreate the closed-source AI Vtuber `neuro-sama`.\n\n### 👀 Demo\n| ![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_19a9806ff9d3.jpg) | ![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_8a11e3cf7250.jpg) |\n|:---:|:---:|\n| ![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_99e04027c04f.jpg) | ![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_85c0fcebbada.jpg) |\n\n\n## ✨ Features & Highlights\n\n- 🖥️ **Cross-platform support**: Perfect compatibility with macOS, Linux, and Windows. We support NVIDIA and non-NVIDIA GPUs, with options to run on CPU or use cloud APIs for resource-intensive tasks. Some components support GPU acceleration on macOS.\n\n- 🔒 **Offline mode support**: Run completely offline using local models - no internet required. Your conversations stay on your device, ensuring privacy and security.\n\n- 💻 **Attractive and powerful web and desktop clients**: Offers both web version and desktop client usage modes, supporting rich interactive features and personalization settings. 
The desktop client can switch freely between window mode and desktop pet mode, allowing the AI companion to be by your side at all times.\n\n- 🎯 **Advanced interaction features**:\n  - 👁️ Visual perception, supporting camera, screen recording and screenshots, allowing your AI companion to see you and your screen\n  - 🎤 Voice interruption without headphones (AI won't hear its own voice)\n  - 🫱 Touch feedback, interact with your AI companion through clicks or drags\n  - 😊 Live2D expressions, set emotion mapping to control model expressions from the backend\n  - 🐱 Pet mode, supporting transparent background, global top-most, and mouse click-through - drag your AI companion anywhere on the screen\n  - 💭 Display AI's inner thoughts, allowing you to see AI's expressions, thoughts and actions without them being spoken\n  - 🗣️ AI proactive speaking feature\n  - 💾 Chat log persistence, switch to previous conversations anytime\n  - 🌍 TTS translation support (e.g., chat in Chinese while AI uses Japanese voice)\n\n- 🧠 **Extensive model support**:\n  - 🤖 Large Language Models (LLM): Ollama, OpenAI (and any OpenAI-compatible API), Gemini, Claude, Mistral, DeepSeek, Zhipu AI, GGUF, LM Studio, vLLM, etc.\n  - 🎙️ Automatic Speech Recognition (ASR): sherpa-onnx, FunASR, Faster-Whisper, Whisper.cpp, Whisper, Groq Whisper, Azure ASR, etc.\n  - 🔊 Text-to-Speech (TTS): sherpa-onnx, pyttsx3, MeloTTS, Coqui-TTS, GPTSoVITS, Bark, CosyVoice, Edge TTS, Fish Audio, Azure TTS, etc.\n\n- 🔧 **Highly customizable**:\n  - ⚙️ **Simple module configuration**: Switch various functional modules through simple configuration file modifications, without delving into the code\n  - 🎨 **Character customization**: Import custom Live2D models to give your AI companion a unique appearance. Shape your AI companion's persona by modifying the Prompt. Perform voice cloning to give your AI companion the voice you desire\n  - 🧩 **Flexible Agent implementation**: Inherit and implement the Agent interface to integrate any Agent architecture, such as HumeAI EVI, OpenAI Her, Mem0, etc.\n  - 🔌 **Good extensibility**: Modular design allows you to easily add your own LLM, ASR, TTS, and other module implementations, extending new features at any time\n\n\n## 👥 User Reviews\n> Thanks to the developer for open-sourcing and sharing the girlfriend for everyone to use\n> \n> This girlfriend has been used over 100,000 times\n\n\n## 🚀 Quick Start\n\nPlease refer to the [Quick Start](https:\u002F\u002Fopen-llm-vtuber.github.io\u002Fdocs\u002Fquick-start) section in our documentation for installation.\n\n\n\n## ☝ Update\n> :warning: `v1.0.0` has breaking changes and requires re-deployment. You *may* still update via the method below, but the `conf.yaml` file is incompatible and most of the dependencies needs to be reinstalled with `uv`. For those who came from versions before `v1.0.0`, I recommend deploy this project again with the [latest deployment guide](https:\u002F\u002Fopen-llm-vtuber.github.io\u002Fdocs\u002Fquick-start).\n\nPlease use `uv run update.py` to update if you installed any versions later than `v1.0.0`.\n\n## 😢 Uninstall  \nMost files, including Python dependencies and models, are stored in the project folder.\n\nHowever, models downloaded via ModelScope or Hugging Face may also be in `MODELSCOPE_CACHE` or `HF_HOME`. While we aim to keep them in the project's `models` directory, it's good to double-check.  \n\nReview the installation guide for any extra tools you no longer need, such as `uv`, `ffmpeg`, or `deeplx`.  
\n\n## 🤗 Want to contribute?\nCheckout the [development guide](https:\u002F\u002Fdocs.llmvtuber.com\u002Fdocs\u002Fdevelopment-guide\u002Foverview).\n\n\n# 🎉🎉🎉 Related Projects\n\n[ylxmf2005\u002FLLM-Live2D-Desktop-Assitant](https:\u002F\u002Fgithub.com\u002Fylxmf2005\u002FLLM-Live2D-Desktop-Assitant)\n- Your Live2D desktop assistant powered by LLM! Available for both Windows and MacOS, it senses your screen, retrieves clipboard content, and responds to voice commands with a unique voice. Featuring voice wake-up, singing capabilities, and full computer control for seamless interaction with your favorite character.\n\n\n\n\n\n\n## 📜 Third-Party Licenses\n\n### Live2D Sample Models Notice\n\nThis project includes Live2D sample models provided by Live2D Inc. These assets are licensed separately under the Live2D Free Material License Agreement and the Terms of Use for Live2D Cubism Sample Data. They are not covered by the MIT license of this project.\n\nThis content uses sample data owned and copyrighted by Live2D Inc. The sample data are utilized in accordance with the terms and conditions set by Live2D Inc. (See [Live2D Free Material License Agreement](https:\u002F\u002Fwww.live2d.jp\u002Fen\u002Fterms\u002Flive2d-free-material-license-agreement\u002F) and [Terms of Use](https:\u002F\u002Fwww.live2d.com\u002Feula\u002Flive2d-sample-model-terms_en.html)).\n\nNote: For commercial use, especially by medium or large-scale enterprises, the use of these Live2D sample models may be subject to additional licensing requirements. If you plan to use this project commercially, please ensure that you have the appropriate permissions from Live2D Inc., or use versions of the project without these models.\n\n\n## Contributors\nThanks our contributors and maintainers for making this project possible.\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_6a6810b9ce93.png\" \u002F>\n\u003C\u002Fa>\n\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_5ea3b225fd9f.png)](https:\u002F\u002Fstar-history.com\u002F#Open-LLM-VTuber\u002Fopen-llm-vtuber&Date)\n\n\n\n\n\n","![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_71d1698fa47e.jpg)\n\n\u003Ch1 align=\"center\">Open-LLM-VTuber\u003C\u002Fh1>\n\u003Ch3 align=\"center\">\n\n[![GitHub release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber)](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Freleases) \n[![license](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber)](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fblob\u002Fmaster\u002FLICENSE) 
\n[![CodeQL](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Factions\u002Fworkflows\u002Fcodeql.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Factions\u002Fworkflows\u002Fcodeql.yml)\n[![Ruff](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Factions\u002Fworkflows\u002Fruff.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Factions\u002Fworkflows\u002Fruff.yml)\n[![Docker](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOpen-LLM-VTuber%2FOpen--LLM--VTuber-%25230db7ed.svg?logo=docker&logoColor=blue&labelColor=white&color=blue)](https:\u002F\u002Fhub.docker.com\u002Fr\u002FOpen-LLM-VTuber\u002Fopen-llm-vtuber) \n[![QQ User Group](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FQQ_User_Group-792615362-white?style=flat&logo=qq&logoColor=white)](https:\u002F\u002Fqm.qq.com\u002Fq\u002FngvNUQpuKI)\n[![Static Badge](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJoin%20Chat-Zulip?style=flat&logo=zulip&label=Zulip(dev-community)&color=blue&link=https%3A%2F%2Folv.zulipchat.com)](https:\u002F\u002Folv.zulipchat.com)\n\n> **📢 v2.0 Development**: 我们正在专注于 Open-LLM-VTuber v2.0 —— 代码库的彻底重写。v2.0 目前处于早期讨论和规划阶段。我们恳请您暂时不要为 v1 提交新的功能需求 issue 或 pull request。如需参与 v2 的讨论或贡献，请加入我们在 [Zulip](https:\u002F\u002Folv.zulipchat.com) 上的开发者社区。每周会议安排将在 Zulip 上公布。我们将继续修复 v1 的 bug，并处理现有的 pull request。\n\n[![BuyMeACoffee](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBuy%20Me%20a%20Coffee-ffdd00?style=for-the-badge&logo=buy-me-a-coffee&logoColor=black)](https:\u002F\u002Fwww.buymeacoffee.com\u002Fyi.ting)\n[![](https:\u002F\u002Fdcbadge.limes.pink\u002Fapi\u002Fserver\u002F3UDA8YFDXx)](https:\u002F\u002Fdiscord.gg\u002F3UDA8YFDXx)\n\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber)\n\n英文 README | [中文 README](.\u002FREADME.CN.md) | [韩语 README](.\u002FREADME.KR.md) | [日语 README](.\u002FREADME.JP.md)\n\n[Documentation](https:\u002F\u002Fopen-llm-vtuber.github.io\u002Fdocs\u002Fquick-start) | [![Roadmap](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRoadmap-GitHub_Project-yellow)](https:\u002F\u002Fgithub.com\u002Forgs\u002FOpen-LLM-VTuber\u002Fprojects\u002F2)\n\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F12358\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_34d606db08fc.png\" alt=\"Open-LLM-VTuber%2FOpen-LLM-VTuber | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\n\u003C\u002Fh3>\n\n\n> 常见问题 Common Issues doc (Written in Chinese): https:\u002F\u002Fdocs.qq.com\u002Fpdf\u002FDTFZGQXdTUXhIYWRq\n>\n> User Survey: https:\u002F\u002Fforms.gle\u002Fw6Y6PiHTZr1nzbtWA\n>\n> 调查问卷(中文): https:\u002F\u002Fwj.qq.com\u002Fs2\u002F16150415\u002Ff50a\u002F\n\n\n\n> :warning: 本项目尚处于早期阶段，目前正处于**积极开发中**。\n\n> :warning: 如果您希望远程运行服务器并在其他设备上访问它，例如在电脑上运行服务器而在手机上访问，您需要配置 `https`，因为前端的麦克风只能在安全上下文中启动（即 https 或 localhost）。请参阅 [MDN Web Doc](https:\u002F\u002Fdeveloper.mozilla.org\u002Fen-US\u002Fdocs\u002FWeb\u002FAPI\u002FMediaDevices\u002FgetUserMedia)。因此，您应该通过反向代理配置 https，以便在远程设备（非 localhost）上访问该页面。\n\n\n\n## ⭐️ 什么是这个项目？\n\n\n**Open-LLM-VTuber** 是一款独特的**语音交互式 AI 伴侣**，不仅支持**实时语音对话**和**视觉感知**，还拥有生动的**Live2D 头像**。所有功能均可在您的电脑上完全离线运行！\n\n您可以将其视为自己的私人 AI 伴侣——无论您想要一个 `虚拟女友`、`男友`、`可爱宠物`，还是任何其他角色，它都能满足您的期望。该项目全面支持 `Windows`、`macOS` 和 
`Linux`，并提供两种使用模式：网页版和桌面客户端（特别支持**透明背景桌面宠物模式**，让 AI 伴侣可以伴随您在屏幕上的任何位置）。\n\n尽管长期记忆功能暂时移除（即将回归），但由于聊天记录的持久化存储，您始终可以继续之前未完成的对话，不会丢失任何珍贵的互动时刻。\n\n在后端支持方面，我们集成了丰富的 LLM 推理、文本转语音和语音识别解决方案。如果您想自定义您的 AI 伴侣，可以参考 [角色定制指南](https:\u002F\u002Fopen-llm-vtuber.github.io\u002Fdocs\u002Fuser-guide\u002Flive2d) 来调整 AI 伴侣的外观和个性。\n\n之所以命名为 `Open-LLM-Vtuber` 而不是 `Open-LLM-Companion` 或 `Open-LLM-Waifu`，是因为项目的最初开发目标是利用可在 Windows 以外的平台上离线运行的开源解决方案，重现闭源 AI Vtuber `neuro-sama`。\n\n### 👀 演示\n| ![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_19a9806ff9d3.jpg) | ![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_8a11e3cf7250.jpg) |\n|:---:|:---:|\n| ![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_99e04027c04f.jpg) | ![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_85c0fcebbada.jpg) |\n\n## ✨ 特色与亮点\n\n- 🖥️ **跨平台支持**：完美兼容 macOS、Linux 和 Windows。我们支持 NVIDIA 和非 NVIDIA GPU，并提供在 CPU 上运行或使用云端 API 来处理资源密集型任务的选项。部分组件还支持在 macOS 上进行 GPU 加速。\n\n- 🔒 **离线模式支持**：完全离线运行，仅使用本地模型——无需联网。您的对话全程保存在设备上，确保隐私与安全。\n\n- 💻 **美观且强大的网页端与桌面端客户端**：同时提供网页版和桌面端两种使用模式，支持丰富的交互功能与个性化设置。桌面端客户端可在窗口模式与桌面宠物模式之间自由切换，让 AI 伴侣时刻陪伴在您身边。\n\n- 🎯 **高级交互功能**：\n  - 👁️ 视觉感知，支持摄像头、屏幕录制与截图，让您的 AI 伴侣能够“看见”您和您的屏幕。\n  - 🎤 无需耳机即可语音打断（AI 不会听到自己的声音）。\n  - 🫱 触控反馈，通过点击或拖拽与您的 AI 伴侣互动。\n  - 😊 Live2D 表情，可从后台设置情绪映射以控制模型表情。\n  - 🐱 宠物模式，支持透明背景、全局置顶以及鼠标穿透——您可以将 AI 伴侣拖动到屏幕上的任意位置。\n  - 💭 显示 AI 的内心想法，让您无需言语即可看到 AI 的表情、思绪与行为。\n  - 🗣️ AI 主动发言功能。\n  - 💾 聊天记录持久化，随时切换回之前的对话。\n  - 🌍 支持 TTS 翻译（例如，您用中文聊天，而 AI 使用日语语音）。\n\n- 🧠 **广泛的模型支持**：\n  - 🤖 大语言模型（LLM）：Ollama、OpenAI（以及任何 OpenAI 兼容的 API）、Gemini、Claude、Mistral、DeepSeek、智谱 AI、GGUF、LM Studio、vLLM 等。\n  - 🎙️ 自动语音识别（ASR）：sherpa-onnx、FunASR、Faster-Whisper、Whisper.cpp、Whisper、Groq Whisper、Azure ASR 等。\n  - 🔊 文本转语音（TTS）：sherpa-onnx、pyttsx3、MeloTTS、Coqui-TTS、GPTSoVITS、Bark、CosyVoice、Edge TTS、Fish Audio、Azure TTS 等。\n\n- 🔧 **高度可定制**：\n  - ⚙️ **简单的模块配置**：只需修改简单的配置文件即可切换各种功能模块，无需深入代码。\n  - 🎨 **角色定制**：导入自定义 Live2D 模型，为您的 AI 伴侣打造独一无二的外观。通过调整 Prompt 来塑造 AI 伴侣的人设。还可进行语音克隆，赋予 AI 伴侣您心仪的嗓音。\n  - 🧩 **灵活的 Agent 实现**：继承并实现 Agent 接口，即可集成任何 Agent 架构，如 HumeAI EVI、OpenAI Her、Mem0 等。\n  - 🔌 **良好的扩展性**：模块化设计使您可以轻松添加自己的 LLM、ASR、TTS 等模块实现，随时扩展新功能。\n\n\n## 👥 用户评价\n> 感谢开发者开源并分享这款“女朋友”，让大家都能使用！\n> \n> 这款“女朋友”已被使用超过 10 万次。\n\n\n## 🚀 快速入门\n\n请参阅我们的文档中的[快速入门](https:\u002F\u002Fopen-llm-vtuber.github.io\u002Fdocs\u002Fquick-start)章节以进行安装。\n\n\n## ☝ 更新\n> :warning: `v1.0.0` 存在破坏性变更，需重新部署。您*仍*可以通过以下方法更新，但 `conf.yaml` 文件不兼容，且大部分依赖项需要使用 `uv` 重新安装。对于从 `v1.0.0` 之前版本升级的用户，建议按照[最新部署指南](https:\u002F\u002Fopen-llm-vtuber.github.io\u002Fdocs\u002Fquick-start)重新部署该项目。\n\n如果您安装的是 `v1.0.0` 之后的版本，请使用 `uv run update.py` 进行更新。\n\n## 😢 卸载  \n大多数文件，包括 Python 依赖和模型，都存储在项目文件夹中。\n\n不过，通过 ModelScope 或 Hugging Face 下载的模型也可能位于 `MODELSCOPE_CACHE` 或 `HF_HOME` 中。虽然我们尽量将这些模型放在项目的 `models` 目录下，但最好还是再确认一下。  \n\n请查阅安装指南，移除您不再需要的额外工具，例如 `uv`、`ffmpeg` 或 `deeplx`。\n\n\n## 🤗 想要贡献？\n请查看[开发指南](https:\u002F\u002Fdocs.llmvtuber.com\u002Fdocs\u002Fdevelopment-guide\u002Foverview)。\n\n\n# 🎉🎉🎉 相关项目\n\n[ylxmf2005\u002FLLM-Live2D-Desktop-Assitant](https:\u002F\u002Fgithub.com\u002Fylxmf2005\u002FLLM-Live2D-Desktop-Assitant)\n- 您的 Live2D 桌面助手，由 LLM 驱动！适用于 Windows 和 macOS，可感知您的屏幕、获取剪贴板内容，并以独特的声音响应语音指令。具备语音唤醒、唱歌功能以及对电脑的全面控制，让您与心爱的角色无缝互动。\n\n\n\n\n\n## 📜 第三方许可证\n\n### Live2D 示例模型声明\n\n本项目包含由 Live2D Inc. 提供的 Live2D 示例模型。这些资产分别依据 Live2D 免费素材许可协议及 Live2D Cubism 示例数据使用条款获得授权，不在本项目的 MIT 许可范围内。\n\n本内容使用了 Live2D Inc. 
所拥有并受版权保护的示例数据。示例数据的使用严格遵守 Live2D Inc. 制定的条款与条件（详见 [Live2D 免费素材许可协议](https:\u002F\u002Fwww.live2d.jp\u002Fen\u002Fterms\u002Flive2d-free-material-license-agreement\u002F) 和 [使用条款](https:\u002F\u002Fwww.live2d.com\u002Feula\u002Flive2d-sample-model-terms_en.html)）。\n\n注意：若用于商业用途，尤其是中大型企业，使用这些 Live2D 示例模型可能需要额外的授权许可。如您计划将本项目用于商业目的，请务必事先获得 Live2D Inc. 的相应授权，或使用不含这些模型的项目版本。\n\n\n## 贡献者\n感谢我们的贡献者与维护者，正是他们的努力才让这个项目成为现实。\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_6a6810b9ce93.png\" \u002F>\n\u003C\u002Fa>\n\n\n## 星标历史\n\n[![星标历史图表](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_readme_5ea3b225fd9f.png)](https:\u002F\u002Fstar-history.com\u002F#Open-LLM-VTuber\u002Fopen-llm-vtuber&Date)","# Open-LLM-VTuber 快速上手指南\n\n---\n\n## 环境准备\n\n### 系统要求\n- 支持 **Windows**, **macOS**, **Linux** 系统\n- 推荐使用 Python 3.10+ 环境\n- 需要安装以下依赖：\n  - `uv`（Python 包管理工具）\n  - `ffmpeg`\n  - `deeplx`（可选）\n\n> 💡 提示：如需加速依赖包下载，建议使用国内镜像源，例如：\n> ```bash\n> uv --index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple install \u003Cpackage-name>\n> ```\n\n---\n\n## 安装步骤\n\n### 1. 克隆项目仓库\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber.git\ncd Open-LLM-VTuber\n```\n\n### 2. 安装依赖\n```bash\nuv sync\n```\n\n> ⚠️ 注意：如果遇到依赖冲突或版本问题，请参考 [官方文档](https:\u002F\u002Fopen-llm-vtuber.github.io\u002Fdocs\u002Fquick-start) 中的最新部署指南。\n\n### 3. 启动服务\n```bash\nuv run main.py\n```\n\n> 📌 如果你使用的是 v1.0.0 之前的版本，需要重新部署项目并更新配置文件。请参考 [最新部署指南](https:\u002F\u002Fopen-llm-vtuber.github.io\u002Fdocs\u002Fquick-start)。\n\n---\n\n## 基本使用\n\n### 启动后访问 Web 界面\n默认情况下，服务会在本地启动一个 Web 服务器，你可以通过浏览器访问：\n\n```\nhttp:\u002F\u002Flocalhost:8000\n```\n\n> 🔒 如果你需要远程访问（如手机访问电脑上的服务），请配置 HTTPS 反向代理，否则前端麦克风功能将无法启用。\n\n### 使用桌面客户端（可选）\n如果你希望以桌面宠物模式运行 AI 伴侣，可以使用桌面客户端：\n\n```bash\nuv run desktop_client.py\n```\n\n> 🖼️ 桌面客户端支持透明背景、全局置顶和鼠标穿透等功能，AI 伴侣可以自由拖动到屏幕任意位置。\n\n---\n\n## 小结\n\n通过以上步骤，你已经成功部署了 Open-LLM-VTuber，并可以通过 Web 或桌面客户端与你的 AI 伴侣进行互动。更多高级功能和自定义设置，请参考 [官方文档](https:\u002F\u002Fopen-llm-vtuber.github.io\u002Fdocs\u002Fquick-start)。","一个独立开发者正在开发一款虚拟助手应用，用于为用户提供日常问答、情感陪伴和个性化互动体验。他希望在本地环境中实现一个具备语音交互能力的 AI 伴侣，并且能够实时响应用户的声音指令，同时展示一个生动的 2D 动态角色形象。\n\n### 没有 Open-LLM-VTuber 时  \n- 需要手动编写大量代码来集成语音识别、自然语言处理和动画渲染模块，开发周期长且复杂度高  \n- 无法实现流畅的语音中断功能，导致用户在对话中难以自然打断 AI 的回答  \n- 缺乏现成的 Live2D 角色模型支持，需要从零开始设计或购买角色素材并进行适配  \n- 本地运行时性能不稳定，容易出现延迟或卡顿现象，影响用户体验  \n- 部署和测试流程繁琐，需要依赖多个第三方服务和平台，增加了维护成本  \n\n### 使用 Open-LLM-VTuber 后  \n- 提供了一套完整的本地化解决方案，可快速集成语音交互、AI 对话和 Live2D 角色展示功能，大幅缩短开发时间  \n- 支持语音中断功能，使用户可以在对话中随时打断 AI，提升交互的自然性和灵活性  \n- 内置多种 Live2D 角色模板，用户可直接使用或自定义，无需额外开发角色模型  \n- 所有功能均基于本地运行，确保了数据隐私和稳定性，同时降低了对外部服务的依赖  \n- 提供统一的部署和调试环境，简化了测试流程，提高了开发效率  \n\nOpen-LLM-VTuber 让开发者能够高效构建个性化的 AI 交互应用，兼顾功能完整性与开发便捷性。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FOpen-LLM-VTuber_Open-LLM-VTuber_71d1698f.jpg","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FOpen-LLM-VTuber_ddbd8df0.png","Open LLM VTuber Project",null,"https:\u002F\u002Fopen-llm-vtuber.github.io\u002F","https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber",[81,85,89],{"name":82,"color":83,"percentage":84},"Python","#3572A5",96.7,{"name":86,"color":87,"percentage":88},"JavaScript","#f1e05a",2.8,{"name":90,"color":91,"percentage":92},"HTML","#e34c26",0.6,6454,851,"2026-04-05T10:15:37","NOASSERTION","Linux, macOS, Windows","支持 NVIDIA 和非 NVIDIA GPU，部分组件支持 macOS 
上的 GPU 加速，显存需求未明确说明","未说明",{"notes":101,"python":102,"dependencies":103},"建议使用 conda 管理环境，首次运行需下载约 5GB 模型文件。若需远程访问需配置 HTTPS。","3.8+",[104,105,106,107,108,109],"torch>=2.0","transformers>=4.30","accelerate","uv","ffmpeg","deeplx",[14,15,26,13],[112,113,114,115,116,117,118,119,120,121],"ai-vtuber","ai-waifu","ai","neuro-sama","chatbots","live2d","live2d-web","llm","ollama","ai-companion","2026-03-27T02:49:30.150509","2026-04-06T05:16:59.396518",[125,130,135,139,144,148],{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},5917,"允许网页麦克风权限后，cmd 没有响应，出现 Uncaught (in promise) AbortError: The user aborted a request 错误。","请检查 `static\u002Findex.js` 文件中的 Voice Activation Detection 部分，将代码更新为以下内容以升级 vad 库：\n\n```javascript\n\u003C!-- Voice Activation Detection -->\n\u003Cscript src=\"https:\u002F\u002Fcdn.jsdelivr.net\u002Fnpm\u002Fonnxruntime-web@1.19.2\u002Fdist\u002Fort.js\">\u003C\u002Fscript>\n\u003C!-- \u003Cscript src=\"libs\u002Fort.js\">\u003C\u002Fscript> -->\n\u003Cscript src=\"https:\u002F\u002Fcdn.jsdelivr.net\u002Fnpm\u002F@ricky0123\u002Fvad-web@0.0.19\u002Fdist\u002Fbundle.min.js\">\u003C\u002Fscript>\n\u003C!-- \u003Cscript src=\"libs\u002Fbundle.min.js\">\u003C\u002Fscript> -->\n```\n此问题可能与语音激活检测库版本过旧有关。","https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fissues\u002F24",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},5918,"运行 server.py 时出现 'ERROR: Exception in ASGI application' 错误。","请确保你提供的报错信息完整，并新开一个 issue 提供详细日志。此外，可以尝试更换 ASR 模型（如使用 paraformer-zh 或更小的模型）并使用 ollama 运行，这可能会解决部分兼容性问题。","https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fissues\u002F27",{"id":136,"question_zh":137,"answer_zh":138,"source_url":134},5919,"如何设置 LLM Provider？","目前支持的 LLM Provider 只能是 `MemGPT` 或 `ollama`。所有 OpenAI 格式的接口都应使用 Ollama。未来计划将名称改为 `openai-compatible` 以避免混淆。在配置文件中，可以参考如下注释进行设置。\n\n\u003Cimg width=\"623\" alt=\"Screenshot 2024-10-17 at 6 36 03 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F680c01e8-917f-4768-8683-83b7450eba65\">",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},5920,"如何解决 cmd 调用 uv run run_server.py 报错的问题？","请确保 Ollama 模型服务已正确启动。如果未启动，请仔细阅读文档中的快速入门部分，其中包含关于 Ollama 的详细指南。可参考 FAQ 中的 [该部分](https:\u002F\u002Fdocs.llmvtuber.com\u002Fdocs\u002Ffaq\u002F#%E9%81%87%E5%88%B0-error-calling-the-chat-endpoint-%E9%94%99%E8%AF%AF%E6%80%8E%E4%B9%88%E5%8A%9E) 获取更多帮助。","https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fissues\u002F205",{"id":145,"question_zh":146,"answer_zh":147,"source_url":143},5921,"如何解决 silero-vad 导致的依赖导入失败问题？","当前 web 和 Electron 前端并未使用 `silero-vad` 模块。在 `dev` 分支中，我们正在更新 `vad_factory` 以动态加载 `silero_vad`，并将在 `conf.default.yaml` 中将 `vad_model` 设置为 `null`，同时从依赖项中移除 `torchaudio`。此更改将防止因 `torchaudio` 作为 `silero-vad` 依赖而引发的导入错误。\n\n相关提交记录为：f3f984bd42129b251708bbebd99e63526734815a",{"id":149,"question_zh":150,"answer_zh":151,"source_url":129},5922,"如何提高语音识别和 LLM 推理的速度？","如果速度慢，可能是显存不足导致的。Ollama 在显存不足时会将部分内容转移到 RAM 或 swap 空间，虽然不会崩溃，但速度会变慢。建议使用较小的模型或更高量化的版本，例如 `qwen2.5:7b-instruct-q3_K_M` 或 `qwen2.5:7b-instruct-q4_K_M`。对于语音识别，可以考虑使用更小的模型如 `distil-large-v3`、`medium` 或 `SenseVoiceSmall`。另外，也可以考虑使用 Groq API，其速度快且免费额度较高。",[153,158,163,168,173,178,183,188,193,198,203,208,213,218,223,228,233,238,243],{"id":154,"version":155,"summary_zh":156,"released_at":157},105538,"v1.2.1","1.2.1 contains important fixes to the frontend. 
Please [update](https:\u002F\u002Fdocs.llmvtuber.com\u002Fdocs\u002Fuser-guide\u002Fupdate) to the latest version if you are using 1.2.0.\r\n\r\n## What's Changed\r\n* 🧑‍💻 feat: add cursor & copilot & gemini rules for code assist and code review by @t41372 in https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fpull\u002F259\r\n* **Version 1.2.1 (bug fix) by @ylxmf2005 in https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fpull\u002F266**\r\n* Enhancement MCP server argument 'cwd' by @Stewitch in https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fpull\u002F269\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fcompare\u002F1.2.0...v1.2.1\r\n\r\n\r\n## Faster download links for Chinese users 给内地用户准备的(相对)快速的下载链接\r\nOpen-LLM-VTuber-v1.2.1-zh.zip (包含 sherpa onnx asr 的 sense-voice 模型，就不用再从github上拉取了)\r\n- [Open-LLM-VTuber-v1.2.1-en.zip](https:\u002F\u002Fpub-17317087be374bc68161ac63de2022a5.r2.dev\u002Fv1.2.1\u002FOpen-LLM-VTuber-v1.2.1-en.zip)\r\n- [Open-LLM-VTuber-v1.2.1-zh.zip](https:\u002F\u002Fpub-17317087be374bc68161ac63de2022a5.r2.dev\u002Fv1.2.1\u002FOpen-LLM-VTuber-v1.2.1-zh.zip)\r\n\r\nopen-llm-vtuber-1.2.1-setup.exe (桌面版前端，Windows)\r\n- [open-llm-vtuber-1.2.1-setup.exe](https:\u002F\u002Fpub-17317087be374bc68161ac63de2022a5.r2.dev\u002Fv1.2.1\u002Fopen-llm-vtuber-1.2.1-setup.exe)\r\n\r\nopen-llm-vtuber-1.2.1.dmg (桌面版前端，macOS)\r\n- [open-llm-vtuber-electron-.dmg](https:\u002F\u002Fpub-17317087be374bc68161ac63de2022a5.r2.dev\u002Fv1.2.1\u002Fopen-llm-vtuber-1.2.1.dmg)\r\n","2025-08-26T08:47:08",{"id":159,"version":160,"summary_zh":161,"released_at":162},105539,"1.2.0","# v1.2.0 Release\r\n\r\nThis is a substantial update, packed with major features including Letta-based long-term memory, MCP support, Live2D Cubism 5 support, Chinese support for the frontend, an improved update system, a Bilibili Danmaku client, and numerous bug fixes.\r\n\r\nFirst, we'd like to apologize for the extended release cycle. We will do our best to avoid such long intervals between updates in the future.\r\n\r\nAdditionally, please note a licensing change for the project's frontend (the `Open-LLM-VTuber-Web` repository, which powers the built-in web and Electron clients). Effective with this release (v1.2.0), the frontend will transition from unspecified license (all rights reserved) to the `Open-LLM-VTuber License 1.0`.\r\n\r\nThe backend will remain under the MIT License for v1.2.0 but is expected to be unified under the `Open-LLM-VTuber License 1.0` around v1.3 or v1.4. We are still discussing the specifics and will provide a clear announcement in the GitHub Release when the change occurs. Please be aware that Live2D models have their own licenses, which you should check separately.\r\n\r\n### ⚠️ Notice: Potential Breaking Changes\r\n\r\nIn this version, we have refactored the Live2D implementation to add support for Live2D 5.0 models and fix display issues with many existing models. As part of this change, **support for Live2D 2.1 models has been removed**. While this should increase compatibility with modern models, if you encounter any issues with your Live2D model not displaying after updating, please let us know and consider rolling back to the previous version.\r\n\r\n## ✨ Highlights\r\n\r\n-   **(MCP)** The AI can now call tools that support the Model-Context Protocol (MCP). 
Built-in support is included for [time](https:\u002F\u002Fgithub.com\u002Fmodelcontextprotocol\u002Fservers\u002Ftree\u002Fmain\u002Fsrc\u002Ftime) and [ddg-search](https:\u002F\u002Fgithub.com\u002Fnickclyde\u002Fduckduckgo-mcp-server). The frontend now displays the status of tool calls. (See the Appendix for a demo).\r\n-   **(MCP)** Added support for BrowserBase's Browser Use MCP with a [Live View](https:\u002F\u002Fdocs.browserbase.com\u002Ffeatures\u002Fsession-live-view) in the frontend.\r\n-   **(Live2D)** The frontend Live2D SDK has been migrated from `pixi-live2d-display-lipsync` to the official Live2D Web SDK. This adds support for Cubism 5 but removes support for Cubism 2. Models now have improved feedback on click interactions.\r\n-   The default Live2D model has been changed to `mao_pro`, as the expressions for the `shizuku` model were removed by the official creators in the Live2D 5 version.\r\n-   **(Frontend)** Added Chinese language support.\r\n-   Implemented an interface for live streaming platforms and added a client for receiving Bilibili Danmaku (live comments).\r\n-   **(Memory)** Implemented Letta-based long-term memory.\r\n-   **(LLM)** Added support for LM Studio.\r\n-   **(TTS)** Added support for OpenAI-Compatible TTS, SparkTTS, and SiliconFlow TTS.\r\n-   Added a `requirements.txt` file for users who are not familiar with `pip` commands or prefer not to use `uv`.\r\n-   Numerous bug fixes.\r\n-   Updated the documentation, which now includes an \"Ask AI\" feature.\r\n\r\n## Detailed Changes Since v1.1.0:\r\n\r\n### Backend:\r\n\r\n-   Changed some preset options in the configuration file: `llm_provider` -> `ollama_llm`.\r\n-   Set `project_id` and `organization_id` in `conf.yaml` to `null` by default to prevent API errors.\r\n-   Azure ASR: Added a list for detected languages and fixed several bugs.\r\n-   Fixed bugs related to configuration file updates (2bc0c1b5f75ea79f563935b03a2267e6584d9bc @ylxmf2005).\r\n-   To allow Windows users to confidently use backslashes for file paths, all double quotes in the configuration file have been changed to single quotes (758d0b304bfa9d2c561987e9d3edac74857309c7).\r\n-   Fixed Claude's vision capabilities. It seems this was never working correctly—did no one notice until now?\r\n-   Information about Live2D models can now be fetched from the `GET \u002Flive2d-models\u002Finfo` route.\r\n-   When using the update script, the frontend (linked via git submodule) will now be updated as well.\r\n-   Fixed [#150](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fissues\u002F150): The `temperature` parameter was not passed during the initialization of OpenAI-Compatible LLMs.\r\n-   Fixed [#141](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fissues\u002F141): A dependency issue on Intel Macs.\r\n-   Implemented a live streaming platform interface and a Bilibili Danmaku client based on [blivedm](https:\u002F\u002Fgithub.com\u002Fxfgryujk\u002Fblivedm\u002Ftree\u002Fdev) (fea16ace015851656e6c044961758c69247ce69e), [#142](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fpull\u002F142) @Fluchw, @ylxmf2005.\r\n-   Merged [#161](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fpull\u002F161), adding the `StatelessLLMWithTemplate` class. Thanks, @aaronchantrill!\r\n-   Added OpenAI-Compatible TTS [#178](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fpull\u002F178). 
Thanks, @fastfading!\r\n-   Implemented Letta-based long-term memory [#179](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-","2025-08-03T13:43:12",{"id":164,"version":165,"summary_zh":166,"released_at":167},105540,"v1.1.0","## What's Changed\r\n\r\n### Major Features\r\n* Implemented group chat functionality (@ylxmf2005)\r\n* Added Silero-VAD voice activity detection (@AnyaCoder)\r\n* Added CosyVoice2 text-to-speech support (@Warma10032)\r\n* Added frontend ASR\u002FTTS tools accessible at `http:\u002F\u002Flocalhost:web-tool`\r\n  - Users can now directly use the project's speech recognition and text-to-speech engines\r\n* Introduced one-click CUDA-ready setup using pixi (@mokurin000)\r\n* Improved configuration management and update mechanism:\r\n  - `conf.yaml` is no longer tracked in git\r\n  - New config template system for generating and updating `conf.yaml` during upgrades\r\n\r\n### Bug Fixes & Improvements\r\n* Fixed sentence divider issues\r\n* Fixed system prompt override bug for certain LLMs\r\n* Removed deprecated `prompts\u002Fpersona` directory (unused since v1.0.0)\r\n* Major codebase refactoring of conversation and handler components (@ylxmf2005)\r\n\r\n### New Contributors\r\n* @mokurin000\r\n* @AnyaCoder\r\n* @Warma10032\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber\u002Fcompare\u002Fv1.0.0...v1.1.0\r\n\r\n\r\n## Which files should I get? 我应该下载哪些文件？\r\n\r\n### For Existing Open-LLM-VTuber Users (v1.0.0 or newer) 现有 Open-LLM-VTuber 用户（v1.0.0 或更新版本）\r\n1. Run `uv run upgrade.py` to update to the latest version 运行 `uv run upgrade.py` 来更新到最新版本\r\n2. Download the new electron app from the releases section 从发布区(下面)下载新的 electron 应用程序\r\n\r\n### For New Users or Versions Below v1.0.0 新用户或 v1.0.0 以下版本用户\r\nPlease refer to the [new deployment documentation](https:\u002F\u002Fdocs.llmvtuber.com\u002Fdocs\u002Fquick-start) for installation instructions.\r\n请参考[新部署文档](https:\u002F\u002Fdocs.llmvtuber.com\u002Fdocs\u002Fquick-start)获取安装说明。\r\n\r\n### Download Files 下载文件\r\nIf you are here because you read the documentation, download the zip file and the electron app below.\r\nDownload both of these files:\r\n1. The electron app\r\n2. The language-specific ZIP file:\r\n   - English: `Open-LLM-VTuber-v1.1.0-en.zip`\r\n   - Chinese: `Open-LLM-VTuber-v1.1.0-zh.zip`\r\n\r\nNote: The ZIP files are identical except for the language of the configuration file. Both packages include the SenseVoiceSmall model file to ensure accessibility for Chinese users.\r\n\r\n如果您是按照文档指引来到这里的，请下载以下的 zip 文件和 electron 应用程序。\r\n请下载这两个文件：\r\n1. electron 应用程序\r\n2. 
对应语言的 ZIP 文件：\r\n   - 英文版：`Open-LLM-VTuber-v1.1.0-en.zip`\r\n   - 中文版：`Open-LLM-VTuber-v1.1.0-zh.zip`\r\n\r\n注意：这些 ZIP 文件除了配置文件的语言不同外完全相同。两个包都包含 SenseVoiceSmall 模型文件以确保内地用户可以愉快使用。\r\n\r\n\r\n## Faster download links for Chinese users 给内地用户准备的(相对)快速的下载链接\r\nOpen-LLM-VTuber-v1.1.0-zh.zip (包含 sherpa onnx asr 的 sense-voice 模型，就不用再从github上拉取了)\r\n- [Open-LLM-VTuber-v1.1.0-en.zip](https:\u002F\u002Fpub-17317087be374bc68161ac63de2022a5.r2.dev\u002Fv1.1.0\u002FOpen-LLM-VTuber-v1.1.0-en.zip)\r\n- [Open-LLM-VTuber-v1.1.0-zh.zip](https:\u002F\u002Fpub-17317087be374bc68161ac63de2022a5.r2.dev\u002Fv1.1.0\u002FOpen-LLM-VTuber-v1.1.0-zh.zip)\r\n\r\nopen-llm-vtuber-electron-1.1.0-frontend.exe (桌面版前端，Windows)\r\n- https:\u002F\u002Fpub-17317087be374bc68161ac63de2022a5.r2.dev\u002Fv1.1.0\u002Fopen-llm-vtuber-electron-1.1.0-setup.exe\r\n\r\nopen-llm-vtuber-electron-1.1.0-frontend.dmg (桌面版前端，macOS)\r\n- https:\u002F\u002Fpub-17317087be374bc68161ac63de2022a5.r2.dev\u002Fv1.1.0\u002Fopen-llm-vtuber-electron-1.1.0.dmg\r\n","2025-02-19T18:12:05",{"id":169,"version":170,"summary_zh":171,"released_at":172},105541,"v1.0.0","# Open-LLM-VTuber v1.0.1 Release 💥\r\n\r\nThis release marks a significant milestone for Open-LLM-VTuber, featuring a complete rewrite of the backend and frontend with over 240+ new commits, along with numerous enhancements and new features. If you were using a version before this, version `v1.0.0` is basically a new app.\r\n\r\n⚠️ Direct upgrades from older versions are impossible due to architectural changes. Please refer to our **[new documentation site](https:\u002F\u002Fopen-llm-vtuber.github.io\u002Fdocs\u002Fintro)** for installation.\r\n\r\n(v1.0.0 had a bug after the release, so let's just ignore that and have the v1.0.1)\r\n\r\n| ![i4_pet_desktop](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F06eff9dc-e141-4401-90ac-823b08662aae) | ![i1](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe0175aa3-62c8-4cde-9c6f-5d010727c04f) |\r\n|:---:|:---:|\r\n| ![i3](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F082d8f29-9b48-4dbb-87f6-0f12d89a92f2) | ![i2](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff6b50eda-8187-4d37-b39b-a34e33683328) |\r\n![i4](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ffa4a5884-0ec7-4377-8a3b-204aafaf8ede) | ![i3_browser_world_fun](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F8e0819d2-75dd-4ebf-97ab-399bf2d01795) |\r\n\r\n\r\n## ✨ Highlights\r\n*   **Vision Capability:** Video chat with the AI.\r\n*   **Desktop Pet Mode:** A new Desktop Pet Mode lets you have your VTuber companion directly on your desktop.\r\n*   **Brand New Frontend:**  A completely redesigned frontend built with React, ChakuraUI, and Vite offers a modern user experience. Available as web and desktop apps, located in the [Open-LLM-VTuber-Web](https:\u002F\u002Fgithub.com\u002FOpen-LLM-VTuber\u002FOpen-LLM-VTuber-Web) repository.\r\n*   **Chat History Management:**  Implemented a system to store and retrieve conversation history, enabling persistent interactions with your AI.\r\n*   **New LLM support:**  Many new (stateless) LLM providers are now supported (and refactored), including Ollama, OpenAI, Gemini, Claude, Mistral, DeepSeek, Zhipu, and llama.cpp.\r\n*   **DeepSeek R1 Reasoning model support**: The reasoning chain will be displayed but not spoken. 
See your waifu's inner thoughts!\r\n*   **Major Backend Rewrite:** The core of Open-LLM-VTuber has been rebuilt from the ground up, focusing on asynchronous operations, improved memory management, and a more modular architecture.\r\n*   **Refactored Configuration:** The `conf.yaml` file was restructured, and `config_alts` has been renamed to `characters`.\r\n* **TTS Preprocessor**: Text inside `asterisks`, `brackets`, `parentheses`, and `angle brackets` will no longer be spoken by the TTS.\r\n*   **Dependency management:** Switched to `uv` for dependency management, removed unused dependencies such as `rich`, `playsound3`, and `sounddevice`.\r\n*   **Documentation Site:** A comprehensive documentation site is now live at [https:\u002F\u002Fopen-llm-vtuber.github.io\u002F](https:\u002F\u002Fopen-llm-vtuber.github.io\u002F).\r\n\r\n## 📋 Detailed Changes\r\n\r\n### 🧮 Backend\r\n\r\n*   **Architecture:**\r\n    *   The project structure has been reorganized to use the `src\u002F` directory.\r\n    *   The backend is now fully asynchronous, improving responsiveness.\r\n    *   CLI mode (`main.py`) has been removed.\r\n    *   The \"exit word\" has been removed.\r\n    *   Models are initialized and managed using `ServiceContext`, offering better memory management, particularly when switching characters.\r\n    *   Refactored LLMs into `agent` and `stateless_llm`, supporting a wider range of LLMs with a new agent interface: `basic_memory_agent` and `hume_ai_agent`.\r\n*   **LLM (Language Model) Enhancements:**\r\n    *   New (and old but refactored) providers: Ollama, OpenAI (and any OpenAI Compatible API), Gemini, Claude, Mistral, DeepSeek, Zhipu, llama.cpp.\r\n    *   `temperature` parameter added.\r\n    *   No more tokens will be generated after interruption, improving the responsiveness of voice interruption.\r\n    *   Ollama models are preloaded at startup, kept in memory for the server's duration, and unloaded at exit.\r\n    *   Added a `hf_mirror` flag to specify whether to use the Hugging Face mirror source.\r\n*   **TTS (Text-to-Speech) Enhancements:**\r\n    *   TTS now generates multiple audio segments concurrently and sends them sequentially, reducing latency.\r\n    *   New interruption logic for smoother transitions.\r\n    *   Added filters (`asterisks`, `brackets`, `parentheses`) to prevent unwanted text from being spoken.\r\n    *   Implemented `faster_first_response` feature to prioritize the synthesis and playback of the first sentence fragment, minimizing latency.\r\n*   **ASR (Automatic Speech Recognition) Enhancements:**\r\n    *   Made Sherpa-onnx ASR with the **SenseVoiceSmall int8** model the default for both English and Chinese presets, with automatic model download.\r\n    *   Added a `provider` option for sherpa-onnx-asr.\r\n*   **Other Improvements:**\r\n    *   Chat log persistence is used to maintain conversation history.\r\n    *   All `print` statements are replaced with `loguru` for structured logging.\r\n    *   Added a Chinese configurati","2025-02-04T04:29:35",{"id":174,"version":175,"summary_zh":176,"released_at":177},105542,"v0.5.2","# v0.5.2 patch\r\n\r\nThe default ASR provider was changed back to `FunASR`. It was mistakenly set to `faster-whisper` a while ago without notice, and it caused a lot of problems for people who use Nvidia GPU without the cudnn. 
This change was not intended and has now been reversed.\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fcompare\u002Fv0.5.1...v0.5.2","2024-12-21T21:18:39",{"id":179,"version":180,"summary_zh":181,"released_at":182},105543,"v0.5.1","## What's New\r\n\r\nIf you wonder where the `v0.5.0` goes, it's gone forever.\r\n\r\n(But if you are lucky enough to get `v0.5.0`, it's not a big deal. The difference between `v0.5.0` and `v0.5.1` is a new CORS fix, which is sort of irrelevant to you.)\r\n \r\n### Enhancements\r\n\r\n- 🎉  **llama.cpp Integration**: You can now run GGUF model files (LLM models) directly within the project, eliminating the need for external services like Ollama, LM Studio, or other APIs.\r\n- 🎉  **Sherpa-ONNX Support for ASR and TTS**: Added support for Sherpa-ONNX, enabling better speech recognition and text-to-speech experience. Contributed by @Neil2893 in https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F50.\r\n  - Sherpa-ONNX allows us to run models like SenseVoiceSmall, MeloTTS, and PiperTTS as easy as hell. With more testing and scripting to automate the model downloading process, Sherpa-ONNX with SenseVoiceSmall and MeloTTS or PiperTTS will likely be the new default ASR and TTS model for this project, as these models deliver great performance with fast inference even on CPU. The current SenseVoiceSmall implementation with FunASR is bulky and buggy, the MeloTTS is as difficult to install as possible, and PiperTTS is a dead project with hundreds of unfixed bugs, which includes one that stops me from integrating it into this project. Those issues are addressed with sherpa-onnx. Thanks a lot for the work done by @Neil2893 🎉 🎉 🎉 .\r\n\r\n\r\n- :tada: **VAD Tuning Options**: Introduced `negativeSpeechThreshold` and `redemptionFrames` parameters, giving users more control over VAD (Voice Activity Detection) settings to enhance their AI interaction experience. Contributed by @Neil2893 in https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F53.\r\n\r\n### Bug fix\r\n- 🐛 CORS policy issue: If users attempt to host the web part of this project separately, it will throw you a CORS policy error, and the Live2D model will not load. This issue is fixed (although the browser may be keeping a cache for the CORS stuff that prevents you from seeing this change).\r\n\r\n### New Contributor\r\n\r\n- Welcome [[@Neil2893](https:\u002F\u002Fgithub.com\u002FNeil2893)](https:\u002F\u002Fgithub.com\u002FNeil2893), who made their first contribution in [[#50](https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F50)](https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F50)!\r\n\r\n**Full Changelog**: [[v0.4.4...v0.5.1](https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fcompare\u002Fv0.4.4...v0.5.1)](https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fcompare\u002Fv0.4.4...v0.5.1)\r\n\r\n\r\n\r\n## Regarding the next version\r\n\r\nI will start refactoring this project, which includes breaking changes, as I wanted to change the architecture, clean up some tech debts, and prepare this project for more features. The next version, well, if everything goes well, will be `v1.0.0`. 
I'm also working with some folks to rewrite the front end with React, and an awesome guy is working on making the installation process super easy.\r\n\r\nI barely knew Python when I started this project (I started writing this project pretending it was JavaScript and did not even bother with OOP initially). Most of my knowledge about Python and best practices came from doing this project, and there were wrong decisions. I refactored many ugly parts in the past few months, but some of the changes I want involve breaking change. I think it's a good idea to put the breaking changes I can think of and do them in one go, and this is what I will be doing.\r\n\r\nThe biggest change planned that will influence the users (you) is that **I will be removing the CLI mode in `v1.0.0`**. I can't see a reason for anyone to run this project in CLI mode without the Live2D body after I added the text input feature in `v0.4.0`. If you worry about the GPU usage, just run the webpage in the background, and it won't render the Live2D body when it's not on the screen. In addition, the code will be much cleaner without the CLI mode. Let me know if you are super upset about the removal of the CLI mode.\r\n\r\nRegarding `v1.0.0`, you can check my to-do list and progress on [GitHub Project](https:\u002F\u002Fgithub.com\u002Fusers\u002Ft41372\u002Fprojects\u002F1\u002Fviews\u002F5). If you have any suggestions, please let me know. I'm not a super-experienced developer and might do things wrong or make the wrong decisions. Let me know about those things before I finish with the first big-breaking change in this project (or second, but I had very few users at that time to make a long announcement about it).\r\n\r\nThe to-do list will be in Chinese because... well, most of my users and all of my awesome contributors in this project speak Chinese (and also because Chinese is my first language, after all). I still write announcements in English because this is what I have been doing, but this text will be translated into Chinese when I post the same announcement on the QQ channel and QQ group. So yeah.","2024-12-15T01:36:24",{"id":184,"version":185,"summary_zh":186,"released_at":187},105544,"v0.4.4","## What's Changed\r\n* Update dockerfile by @SunnyPai0413 in https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F52\r\n* FunASR now requires `onnx` as a dependency without notice. It's now updated in our doc and in the auto-installation script.\r\n\r\n## New Contributors\r\n* @SunnyPai0413 made their first contribution in https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F52\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fcompare\u002Fv0.4.3...v0.4.4","2024-12-13T04:26:59",{"id":189,"version":190,"summary_zh":191,"released_at":192},105545,"v0.4.3","## What's Changed\r\n* bugfix: env won't reinstall if it doesn't exist. 
by @SunKSugaR in https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F47\r\n\r\n## New Contributors\r\n* @SunKSugaR made their first contribution in https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F47\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fcompare\u002Fv0.4.2...v0.4.3","2024-12-06T13:53:03",{"id":194,"version":195,"summary_zh":196,"released_at":197},105546,"v0.4.2","**Emergency Update 0.4.2**, everyone 🚨🚨🚨\r\n\r\nIn version 0.4.1, I accidentally removed the **persona settings option** and released it without noticed...  \r\n\r\nNo issues with earlier versions, but users of `v0.4.1` might notice the AI's personality acting a bit strange, and the option to choose persona settings (personality configuration) has disappeared. This was due to my **mistake** when I accidentally deleted it!  \r\n\r\nIf you're using v0.4.1, you can either update to the latest version or manually restore the persona settings at line `280` (right below the line `# some options: \"en_sarcastic_neuro\"`).  \r\n\r\nJust add this line back:  \r\n`PERSONA_CHOICE: \"en_sarcastic_neuro\"  # or if you rather edit persona prompt below, leave it blank ...`","2024-11-30T16:12:56",{"id":199,"version":200,"summary_zh":201,"released_at":202},105547,"v0.4.1","# Release v0.4.1\r\nA day (or a couple of hours depending on your timezone) after the v0.4.0 release, here comes the v0.4.1 release with some quick fixes.\r\n\r\n## 🚀 New Features\r\n- Added persistence for user preferences:\r\n  - VAD confidence threshold settings\r\n  - Background image selection\r\n  These settings are now saved in the browser localStorage and persist across sessions.\r\n\r\n## 🐛 Bug Fixes\r\n- Fixed audio sentence tracking to prevent missing lines\r\n  - Implemented improved end-of-audio detection\r\n  - Reduced instances of AI skipping sentences\r\n- Restored version number display at server launch\r\n\r\n## 📦 Full Changelog\r\n- View the complete list of changes: [v0.4.0...v0.4.1](https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fcompare\u002Fv0.4.0...v0.4.1)","2024-11-28T20:55:40",{"id":204,"version":205,"summary_zh":206,"released_at":207},105548,"v0.4.0","# Release v0.4.0\r\n\r\nI was going to add documentation for GPT-SoVITS, the upgrade script, and the installation scripts before releasing this version. I also plan to have the installation script detect if the user needs a proxy to download models from huggingface before releasing v0.4.0. However, I realized that I would never release v0.4.0 if I chose to do those things, and v0.4.0 would get bigger and bigger every day.\r\n\r\nSo yeah, another 2 weeks have passed (after five pre-releases), and here is the v0.4.0 release.\r\n\r\n## 🚀 What's New\r\n\r\n### 💬 Text Input in the Browser\r\nYou can now interact with the AI directly by typing in the Browser.\r\n\r\n### 🎉 GPT SoVITS Support\r\nAdded **GPT SoVITS support** by [@YveMU](https:\u002F\u002Fgithub.com\u002FYveMU) in [PR #40](https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F40).\r\n\r\n### ⚙️ Auto Installation Script (Experimental)\r\nIntroduced an experimental **auto-installation script** to simplify setup. 
This script:\r\n- Is cross-platform (at least it's intended to be).\r\n- Creates a Miniconda environment in the project directory (Miniconda itself is also installed into the project directory).\r\n- Installs FFmpeg and the correct Python version in the Miniconda environment.\r\n- Automatically configures dependencies for FunASR, edgeTTS, and ollama (excluding the ollama installation itself).\r\n\r\n### ⚡ ASR\u002FTTS Preloading & Caching\r\nASR and TTS models now preload when the server launches (enabled by default, but optional), significantly reducing the wait time when opening the webpage.\r\n\r\n### 🖱️ Pointer Interaction Toggle\r\nAdded a **Pointer Interactive Button** to stop the Live2D model from following your cursor.\r\n\r\n### 🔧 Adjustable VAD Confidence Threshold\r\nIntroduced a **Voice Activity Detection (VAD) Confidence Threshold** field:\r\n- Configure how confident the AI must be in detecting speech.\r\n- Example: at 98%, the AI will only listen when it's 98% certain you're speaking.\r\n\r\n### ✨ Special Character Filtering\r\nBy default, TTS will no longer vocalize special characters like emojis. (You can re-enable this in `conf.yaml`.)\r\n\r\n---\r\n\r\n## 🔄 What's Changed\r\n- **Voice interruption turned off by default**: You can turn it back on with the \"Voice Interruption\" button. This change is motivated by the following prevalent issues: \r\n\t- The AI gets interrupted by background noise. \r\n\t- The system goes crazy when you interrupt yourself (i.e., interrupt before the AI says anything).\r\n- **Default ASR**: FunASR is now the default ASR.\r\n- **ASR\u002FTTS Visibility**: The server shows the active ASR and TTS on launch.\r\n- **New Prompt**: Added a fun English prompt for discussing nuclear proliferation.\r\n\r\n---\r\n\r\n## 🎉 New Contributors\r\nThanks to our new contributor:\r\n- [@YveMU](https:\u002F\u002Fgithub.com\u002FYveMU) for their first contribution in [PR #40](https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F40).\r\n\r\n---\r\n\r\n## 📜 Full Changelog\r\nView the complete list of changes: [v0.3.1...v0.4.0](https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fcompare\u002Fv0.3.1...v0.4.0)","2024-11-28T03:50:23",{"id":209,"version":210,"summary_zh":211,"released_at":212},105549,"v0.3.1","# Release Notes - Version 0.3.1\r\n\r\nWell, yeah. I forgot to release version `v0.3.0`. \r\n\r\nIn addition, I realized that I have always been doing semantic versioning the wrong way, so starting from this release, we will do it the right way.\r\n\r\n## What's New\r\n* Added the Fish TTS API\r\n* Added the Claude API as an LLM by @Y0oMu in https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F35\r\n* Added `initialXshift` and `initialYshift` parameters to the Live2D configurations in `model_dict.json`. 
These two parameters let you change the initial position of the Live2D model.\r\n\r\n## Improvements\r\n* Improved the error messages for edge-tts and other TTS engines.\r\n* Removed `python-dotenv` as a requirement because it's not used anywhere.\r\n* The upgrade script `upgrade.py` no longer has any dependency requirements.\r\n\r\n## Bug Fixes\r\n* The GBK encoding fix now also applies to the loading of `model_dict.json`.\r\n\r\n## New Contributors\r\n* @Y0oMu made their first contribution in https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F35\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fcompare\u002Fv0.2.5...v0.3.1","2024-11-17T04:00:38",{"id":214,"version":215,"summary_zh":216,"released_at":217},105550,"v0.2.5","# Release Notes - Version 0.2.5\r\n\r\nIt's been three weeks since the previous release, so here is a new one.\r\n\r\n## What's New\r\n\r\n- **AzureTTS Enhancements**: Added customizable pitch and rate properties, allowing users to match the voice style of Neuro-sama.\r\n- **Experimental Mem0 Integration**: Introduced support for Mem0 as an experimental feature.\r\n- **Real-Time Configuration Switching**: Users can now switch configurations in real time via the `config_alts` directory, enabling dynamic adjustments of Live2D, voice, LLM, and other settings directly from the frontend.\r\n- **Dynamic Background Switching**: Allows users to change the background image in real time on the frontend.\r\n- **CoquiTTS Support**: Added support for CoquiTTS as an additional TTS option.\r\n- **Chinese Documentation**: Comprehensive documentation is now available in Chinese.\r\n- **AI-Generated Favicon**: Introduced a new favicon (`favicon.ico`), generated by Adobe Firefly Image 3 (trained exclusively on licensed content).\r\n- **Experimental Upgrade Script**: Added an upgrade script for experimental testing.\r\n\r\n## Bug Fixes\r\n\r\n- **No Voice Input Mode**: Reinstated support for the no-voice-input mode in CLI mode.\r\n- **TTS Naming Issues**: Resolved TTS naming inconsistencies (#29, #30).\r\n- **File Encoding**: Improved file handling for the `persona prompt` and `conf.yaml` files to support non-UTF-8 encodings, resolving issues related to GBK encoding.\r\n- **Bug #31**: Addressed the issue detailed in GitHub issue #31.\r\n\r\n## Improvements\r\n\r\n- **Version Tracking**: Implemented `__init__.py` for version tracking.\r\n- **Memory Management**: Fixed memory-related issues.\r\n- **Stability**: Enhanced stability, especially regarding interruptions.\r\n\r\n## Changes\r\n\r\n- **API Key Configuration**: Removed the `api_keys.py` file. Users should now add their AzureTTS API keys directly to `conf.yaml`.\r\n- **Default Settings Update**: Updated the default LLM to `qwen2.5` and modified the default language for some ASR components to `auto`.\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fcompare\u002Fv0.2.4...v0.2.5","2024-11-08T02:19:20",{"id":219,"version":220,"summary_zh":221,"released_at":222},105551,"v0.2.4","## Release Notes - Version 0.2.4\r\n\r\nIt's been a week since the last release, so here is a new one.\r\n\r\n### What's New\r\n\r\n- **Feature: xTTSv2 TTS Engine Support**  \r\n  Added support for the `xtts-api-server`, which integrates the xTTSv2 text-to-speech engine. Thanks to @Eggze2 for contributing! 
[#23](https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F23)\r\n  \r\n- **Feature: Environment Variable Support in `conf.yaml`**  \r\n  You can now reference environment variables directly in the `conf.yaml` file using the `${ENV_VAR_NAME}` syntax. This avoids hard-coding values in the configuration file by loading them dynamically from the environment.\r\n\r\n### Bug Fixes\r\n\r\n- **Hands-Free Voice Interactions (CLI)**  \r\n  Restored hands-free voice interactions in the CLI, which previously asked for a key press at the end of each conversation turn. This was not intended, and the functionality is now working as expected.\r\n\r\n### Improvements\r\n\r\n- **CLI Interruption Stability**  \r\n  Improved the stability of interruptions in CLI mode. The system now knows which sentence was interrupted, preventing sentences it didn't have a chance to say from being stored in the LLM's memory. This behavior is now consistent with the Live2D mode.\r\n  \r\n### Changes\r\n\r\n- **Randomized Cache Audio Filenames**  \r\n  Cached audio files are now named with random UUIDs instead of sequential names like `temp-1`, improving uniqueness and preventing potential naming conflicts.\r\n\r\n- **`PortAudio` Dependency Update**  \r\n  `PortAudio` is no longer required if the local microphone is not in use (e.g., when running in a headless container with no mic). Previously, the program would throw an error even when all you needed was a web server and no local microphone was necessary. Now, it only throws an error if microphone functionality is explicitly needed locally (e.g., when running `main.py`).\r\n\r\n### New Contributors\r\n\r\nA big thanks to our newest contributor:\r\n- @Eggze2 made their first contribution in [#23](https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fpull\u002F23)\r\n\r\n**Full Changelog**: [Compare v0.2.3...v0.2.4](https:\u002F\u002Fgithub.com\u002Ft41372\u002FOpen-LLM-VTuber\u002Fcompare\u002Fv0.2.3...v0.2.4)\r\n\r\n---\r\nThis release note was enhanced with GPT-4o, which is why it sounds so professional.\r\n","2024-10-15T04:21:30",{"id":224,"version":225,"summary_zh":226,"released_at":227},105552,"v0.2.3","I haven't really touched the project for two weeks now. I thought I was gonna make the new release once I fixed Piper, but that didn't really happen because I got sick (and felt very lazy when I wasn't). However, there have been some important bug fixes since the last release, so I guess I will just make a new release in case anyone downloads zip files from the releases instead of cloning from git.\r\n\r\n**New Feature:**\r\n- Audio dubbing in a different language through translation: you can now talk to the LLM in English (and the LLM thinks in English) while hearing Japanese (or any other language, really) audio. This is implemented by adding a translation layer right before the TTS, so nothing other than the TTS output is translated to Japanese.\r\n\r\n**Bug fix:**\r\n- Missing lines: under certain circumstances, the audio for some sentences would be skipped even though the corresponding text appeared on the screen. This fix is quite important because it's surprisingly easy to trigger...\r\n\r\n**Unfixed bug that was supposed to be fixed:**\r\n- Piper TTS is currently not working. It never worked, really; I just didn't test it enough before releasing it. It turns out the current way of interacting with Piper TTS has a big problem: it mixes up audio for different sentences and crashes stuff. 
This shouldn't be a problem if the Piper TTS custom filename flag for the CLI interface worked as expected, but Piper TTS has been kinda dead and unmaintained for months now, with hundreds of open issues.","2024-10-05T03:53:54",{"id":229,"version":230,"summary_zh":231,"released_at":232},105553,"v0.2.2","Added features:\r\n- New Docker support with NVIDIA GPU passthrough\r\n- Add: GroqWhisperASR\r\n\r\nBug fixes:\r\n- Fix: local interruption with \"i\"\r\n- Fix: uvicorn missing websocket\r\n- Fix: MeloTTS nltk download issue\r\n\r\nChanges:\r\n- Removed dependency: halo","2024-09-07T12:28:06",{"id":234,"version":235,"summary_zh":236,"released_at":237},105554,"v0.2.1","Voice interruption, but much more stable, with updated documentation and many bug fixes.\r\n\r\nWell, to be clear, a status button has been added at the top-left corner of the page.\r\n\r\n[![A demo video that is a bit too long](https:\u002F\u002Fimg.youtube.com\u002Fvi\u002F_M5Bkk18ogk\u002F0.jpg)](https:\u002F\u002Fyoutu.be\u002F_M5Bkk18ogk)","2024-09-03T04:51:23",{"id":239,"version":240,"summary_zh":241,"released_at":242},105555,"v0.2.0","Voice interruption!\r\n\r\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff3c45289-a848-4099-a9f0-80d32d7969ac\r\n\r\n### Some notable differences in this version (compared to the previous version):\r\n\r\nImplemented:\r\n- Voice interruption\r\n- Buttons to turn the microphone and voice interruption on\u002Foff\r\n\r\nChanges:\r\n- To use Live2D, you must now use the mic in the browser. The `MIC_IN_BROWSER` option is deprecated and has no effect.\r\n- The `LIVE2D` option in `config.yaml` is deprecated and has no effect.\r\n- To use Live2D, just run the server and open `localhost:12393` (or whatever port you use) in your browser. When the page is loaded, everything is loaded; when the page is closed, the session ends.\r\n- You can no longer click the Live2D figure to make it do some weird reactions.\r\n\r\nBug fix:\r\n- Fixed: Live2D lips had some bugs; sometimes they wouldn't move. Now using the \"RaSan147\u002Fpixi-live2d-display\" fork instead of the original \"guansss\u002Fpixi-live2d-display\" library.\r\n","2024-09-02T10:02:31",{"id":244,"version":245,"summary_zh":246,"released_at":247},105556,"v0.1.0","I redesigned the backend structure while implementing the voice interruption feature and introduced many breaking changes. It might be a good idea to release a version before the voice interruption version (which would be v0.2.0) gets merged into the main branch, so people can easily find and download the pre-interruption version.\r\n\r\nAlso, I didn't really do versioning because I didn't think about it. I guess now is a good time to do so.\r\n\r\nSome notable differences in this version (compared to the next version to be released):\r\n- No voice interruption\r\n- Supports Live2D while the microphone is NOT in the browser (but in the terminal on the server side)\r\n- Requires users to run `server.py`, open the browser page, and launch `main.py` for Live2D features\r\n- Live2D lips have some bugs, and sometimes the lips won't move\r\n- You can still click the Live2D figure to make it do some weird reactions\r\n- Two of the buttons at the bottom of the page were not implemented and were greyed out\r\n- Can't think of any others...","2024-09-02T10:00:17"]