[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-uezo--ChatdollKit":3,"tool-uezo--ChatdollKit":65},[4,17,27,35,48,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",150037,2,"2026-04-10T23:33:47",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,3,"2026-04-06T11:19:32",[15,26,14,13],"图像",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":10,"last_commit_at":33,"category_tags":34,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 
都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85092,"2026-04-10T11:13:16",[26,43,44,45,14,46,15,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":54,"last_commit_at":55,"category_tags":56,"status":16},5784,"funNLP","fighting41love\u002FfunNLP","funNLP 是一个专为中文自然语言处理（NLP）打造的超级资源库，被誉为\"NLP 民工的乐园”。它并非单一的软件工具，而是一个汇集了海量开源项目、数据集、预训练模型和实用代码的综合性平台。\n\n面对中文 NLP 领域资源分散、入门门槛高以及特定场景数据匮乏的痛点，funNLP 提供了“一站式”解决方案。这里不仅涵盖了分词、命名实体识别、情感分析、文本摘要等基础任务的标准工具，还独特地收录了丰富的垂直领域资源，如法律、医疗、金融行业的专用词库与数据集，甚至包含古诗词生成、歌词创作等趣味应用。其核心亮点在于极高的全面性与实用性，从基础的字典词典到前沿的 BERT、GPT-2 模型代码，再到高质量的标注数据和竞赛方案，应有尽有。\n\n无论是刚刚踏入 NLP 领域的学生、需要快速验证想法的算法工程师，还是从事人工智能研究的学者，都能在这里找到急需的“武器弹药”。对于开发者而言，它能大幅减少寻找数据和复现模型的时间；对于研究者，它提供了丰富的基准测试资源和前沿技术参考。funNLP 以开放共享的精神，极大地降低了中文自然语言处理的开发与研究成本，是中文 AI 
社区不可或缺的宝藏仓库。",79857,1,"2026-04-08T20:11:31",[15,43,46],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":54,"last_commit_at":63,"category_tags":64,"status":16},5773,"cs-video-courses","Developer-Y\u002Fcs-video-courses","cs-video-courses 是一个精心整理的计算机科学视频课程清单，旨在为自学者提供系统化的学习路径。它汇集了全球知名高校（如加州大学伯克利分校、新南威尔士大学等）的完整课程录像，涵盖从编程基础、数据结构与算法，到操作系统、分布式系统、数据库等核心领域，并深入延伸至人工智能、机器学习、量子计算及区块链等前沿方向。\n\n面对网络上零散且质量参差不齐的教学资源，cs-video-courses 解决了学习者难以找到成体系、高难度大学级别课程的痛点。该项目严格筛选内容，仅收录真正的大学层级课程，排除了碎片化的简短教程或商业广告，确保用户能接触到严谨的学术内容。\n\n这份清单特别适合希望夯实计算机基础的开发者、需要补充特定领域知识的研究人员，以及渴望像在校生一样系统学习计算机科学的自学者。其独特的技术亮点在于分类极其详尽，不仅包含传统的软件工程与网络安全，还细分了生成式 AI、大语言模型、计算生物学等新兴学科，并直接链接至官方视频播放列表，让用户能一站式获取高质量的教育资源，免费享受世界顶尖大学的课堂体验。",79792,"2026-04-08T22:03:59",[46,26,43,13],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":98,"env_os":99,"env_gpu":100,"env_ram":100,"env_deps":101,"category_tags":111,"github_topics":112,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":123,"updated_at":124,"faqs":125,"releases":156},6444,"uezo\u002FChatdollKit","ChatdollKit","ChatdollKit enables you to make your 3D model into a chatbot","ChatdollKit 是一款专为 Unity 引擎打造的 3D 虚拟助手开发套件，旨在帮助开发者轻松将静态的 3D 模型转化为具备语音交互能力的智能聊天机器人。它有效解决了传统 3D 角色缺乏自然对话能力、口型与动作不同步以及多平台适配复杂等痛点，让虚拟形象能够“听、说、看、动”。\n\n这款工具非常适合游戏开发者、互动媒体设计师以及希望构建沉浸式 AI 应用的研究人员使用。无论是制作 PC、移动端还是 WebGL 网页端的虚拟偶像或客服助手，ChatdollKit 都能提供一站式解决方案。\n\n其核心技术亮点在于原生支持多种主流大语言模型（如 ChatGPT、Claude、Gemini），并集成了先进的语音处理流程。ChatdollKit 不仅能自动同步语音与面部表情、实现精准唇形匹配，还具备独特的“打断发言”（Barge-in）功能，允许用户在 AI 说话时直接插话，使对话体验更加自然流畅。此外，它结合了多种语音活动检测技术，即使在嘈杂环境中也能准确识别用户意图，并支持跨平台部署，包括 
VR 和 AR 场景，是构建下一代拟人化 AI 交互界面的理想选择。","﻿# ChatdollKit\n3D virtual assistant SDK that enables you to make your 3D model into a voice-enabled chatbot. [🇯🇵日本語のREADMEはこちら](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fblob\u002Fmaster\u002FREADME.ja.md)\n\n- [🐈 Live demo](https:\u002F\u002Funagiken.blob.core.windows.net\u002Fchatdollkit\u002FChatdollKitDemoWebGL\u002Findex.html) A WebGL demo. Say \"Hello\" to start conversation. She’s multilingual, so you can ask her something like \"Let's talk in Japanese\" when you want to switch languages.\n- [🍎 iOS App: OshaberiAI](https:\u002F\u002Fapps.apple.com\u002Fus\u002Fapp\u002Foshaberiai\u002Fid6446883638) A Virtual Agent App made with ChatdollKit: a perfect fusion of character creation by AI prompt engineering, customizable 3D VRM models, and your favorite voices by VOICEVOX.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_492a59b524d1.png\" width=\"720\">\n\n\n## ✨ Features\n\n- **Generative AI Native**: Supports multiple LLMs like ChatGPT, Anthropic Claude, Google Gemini Pro, Dify, and others, with function calling (ChatGPT\u002FGemini) and multimodal capabilities.\n- **3D model expression**: Synchronizes speech and motion, controls facial expressions and animations autonomously, supports blinking and lip-sync.\n- **Dialog control**: Integrates Speech-to-Text and Text-to-Speech (OpenAI, Azure, Google, VOICEVOX \u002F AivisSpeech, Aivis Cloud API, Style-Bert-VITS2 etc.), manages dialog state (context), extracts intents and routes topics, supports wakeword detection.\n- **Multi platforms**: Compatible with Windows, Mac, Linux, iOS, Android, and other Unity-supported platforms, including VR, AR, and WebGL.\n\n\n## 💎 What's New in Version 0.8.16\n\n- **🎙️ WebSocket Streaming STT**: WebSocket-based streaming speech recognition offloads VAD to the server and completes recognition during turn-end detection, reducing overall response latency by several hundred 
milliseconds.\n- **🗣️ Barge-in Support**: Users can now interrupt AI speech mid-sentence with their voice, making conversations feel more natural and responsive.\n- **💃 ModelController Refactoring**: Extracted speech handling into `SpeechController` and face expressions into `FaceController`, improving maintainability and extensibility.\n\n\n\u003Cdetails>\n\u003Csummary>🕰️ Previous Updates (click to expand)\u003C\u002Fsummary>\n\n### 0.8.15\n\n- **🌏 WebGL Enhancements**: Add Silero VAD support, camera switching (front\u002Frear) with correct aspect ratio handling, file upload for images, optimized microphone data transfer, and fixes for lip-sync when muted.  \n- **✨ UI Control Improvements**: Sleeker and more streamlined UI controls that work out-of-the-box with zero configuration—just drop them onto your scene’s Canvas.  \n- **🥁 Stronger Noise Resistance**: Combine multiple voice activity detection methods (e.g., Silero VAD + built-in energy-based VAD) to better capture user speech even in noisy environments like event venues.\n\n\n### 0.8.14\n\n- **🎙️ Echo Cancelling Support**: Add native microphone support for Android, iOS, and macOS, with support for AEC, noise cancelling, and other features for voice conversation.\n- **🗣️ Conversation Improvement**: Prevent conversation breakdown caused by turn-end misrecognition and improve the conversation experience with features like automatic volume control when users interrupt during AI speech.\n- **💠 Platform Expansion**: Support for Aivis Cloud API TTS, AIAvatarKit TTS\u002FSTT, and the GPT-5 `reasoning_effort` parameter.\n\n### 0.8.13\n\n- **🥳 Silero VAD Support**: ML-based voice-activity detection vastly improves turn-end accuracy in noisy settings, enabling smooth conversations outdoors or at events.\n- **🪄 TTS Pre-processing**: Optional text pre-processing lets you fine-tune pronunciation (e.g., convert “OpenAI” to katakana) before synthesis.\n- **🤝 Grok & Gemini Compatibility**: Removes OpenAI-specific params from the 
OpenAI-style endpoint, so Grok, Gemini, and other API-compatible models work out of the box.\n\n### 0.8.11 and 0.8.12\n\n- **🤖 AIAvatarKit Backend**: Offloads AI agent logic to the server—boosting front-end maintainability—while letting you plug in frameworks like AutoGen (and any other agent SDK) for unlimited capability expansion.\n- **🌐 WebGL Improvements**: Upgraded mic capture to modern `AudioWorkletNode` for lower latency and reliability; stabilized mute\u002Funmute handling; improved error handling to immediately surface HTTP errors and prevent hangs; fixed API-key authorization in WebGL builds.\n\n### 0.8.10\n\n- **🌎 Dynamic Multi-Language**: The system can now autonomously switch languages for both speaking and listening during conversations.\n- **🔖 Long-Term Memory**: Past conversation history can now be stored and searched. Components are provided for [ChatMemory](https:\u002F\u002Fgithub.com\u002Fuezo\u002Fchatmemory), but you can also integrate with services like mem0 or Zep.\n\n\n### 0.8.8 and 0.8.9\n\n- **✨ Support NijiVoice as a Speech Synthesizer**: Now support NijiVoice, an AI-Powered Expressive Speech Generation Service.\n- **🥰🥳 Support Multiple AITuber Dialogue**: AITubers can now chat with each other, bringing dynamic and engaging interactions to life like never before!\n- **💪 Support Dify as a backend for AITuber**: Seamlessly integrate with any LLM while empowering AITubers with agentic capabilities, blending advanced knowledge and functionality for highly efficient and scalable operations!\n\n\n### 0.8.7\n\n- **✨ Update AITuber demo**: Support more APIs, bulk configuration, UI and mode!. 
(v0.8.7)\n\n\n### 0.8.6\n\n- **🎛️ Support VOICEVOX and AivisSpeech inline style**: Enables dynamic and autonomous switching of voice styles to enrich character expression and adapt to emotional nuances.\n- **🥰 Improve VRM runtime loading**: Allows seamless and error-free switching of 3D models at runtime, ensuring a smoother user experience.\n\n\n### 0.8.5\n\n- **🎓 Chain of Thought Prompting**: Say hello to Chain of Thought (CoT) Prompting! 🎉 Your AI character just got a major boost in IQ and EQ!\n\n\n### 0.8.4\n\n- **🧩 Modularized for Better Reusability and Maintainability**: We’ve reorganized key components, focusing on modularity to improve customizability and reusability. Check out the demos for more details!\n- **🧹 Removed Legacy Components**: Outdated components have been removed, simplifying the toolkit and ensuring compatibility with the latest features. Refer to [🔄 Migration from 0.7.x](#-migration-from-07x) if you're updating from v0.7.x.\n\n\n### 0.8.3\n\n- **🎧 Stream Speech Listener**: We’ve added `AzureStreamSpeechListener` for smoother conversations by recognizing speech as it’s spoken.\n- **🗣️ Improved Conversation**: Interrupt characters to take your turn, and enjoy more expressive conversations with natural pauses—enhancing the overall experience.\n- **💃 Easier Animation Registration**: We’ve simplified the process of registering animations for your character, making your code cleaner and easier to manage.\n\n\n### 0.8.2\n\n- **🌐 Control WebGL Character from JavaScript**: We’ve added the ability to control the ChatdollKit Unity application from JavaScript when running in WebGL builds. This allows for more seamless interactions between the Unity app and web-based systems.\n- **🗣️ Speech Synthesizer**: A new `SpeechSynthesizer` component has been introduced to streamline text-to-speech (TTS) operations. This component is reusable across projects without `Model` package, simplifying maintenance and reusability. 
\n\n\n### 0.8.1\n\n- **🏷️ User-Defined Tags Support**: You can now include custom tags in AI responses, enabling dynamic actions. For instance, embed language codes in replies to switch between multiple languages on the fly during conversations.\n- **🌐 External Control via Socket**: Now supports external commands through Socket communication. Direct conversation flow, trigger specific phrases, or control expressions and gestures, unlocking new use cases like AI Vtubers and remote customer service. Check out the client-side demo here: https:\u002F\u002Fgist.github.com\u002Fuezo\u002F9e56a828bb5ea0387f90cc07f82b4c15\n\n### 0.8 Beta\n\n- **⚡ Optimized AI Dialog Processing**: We've boosted response speed with parallel processing and made it easier for you to customize behavior with your own code. Enjoy faster, more flexible AI conversations!\n- **🥰 Emotionally Rich Speech**: Adjusts vocal tone dynamically to match the conversation, delivering more engaging and natural interactions.\n- **🎤 Enhanced Microphone Control**: Microphone control is now more flexible than ever! 
Easily start\u002Fstop devices, mute\u002Funmute, and adjust voice recognition thresholds independently.\n\n\u003C\u002Fdetails>\n\n\n## 🚀 Quick Start\n\nYou can learn how to setup ChatdollKit by watching this video that runs the demo scene(including chat with ChatGPT): https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=rRtm18QSJtc\n\n[![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_a43865f5727d.jpg)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=rRtm18QSJtc)\n\nTo run the demo for version 0.8, please follow the steps below after importing the dependencies:\n\n- Open scene `Demo\u002FDemo08`.\n- Select `AIAvatarVRM` object in scene.\n- Set OpenAI API key to following components on inspector:\n  - ChatGPTService\n  - OpenSpeechSynthesizer\n  - OpenAISpeechListener\n- Run on Unity Editor.\n- Say \"こんにちは\" or word longer than 3 characters.\n\n\n## 🔖 Table of Contents\n\n- [📦 Setup New Project](#-setup-new-project)\n  - [Import dependencies](#import-dependencies)\n  - [Resource preparation](#resource-preparation)\n  - [AIAvatarVRM prefab](#aiavatarvrm-prefab)\n  - [ModelController](#modelcontroller)\n  - [Animator](#animator)\n  - [AIAvatar](#aiavatar)\n  - [LLM Service](#llm-service)\n  - [Speech Service](#speech-service)\n  - [Microphone Controller](#microphone-controller)\n  - [Run](#run)\n- [🎓 LLM Service](#-llm-service)\n  - [Basic Settings](#basic-settings)\n  - [Facial Expressions](#facial-expressions)\n  - [Animations](#animations)\n  - [Pause in Speech](#pause-in-speech)\n  - [User Defined Tag](#user-defined-tag)\n  - [Multi Modal](#multi-modal)\n  - [Chain of Thought Prompting](#chain-of-thought-prompting)\n  - [Consecutive Request Merging](#consecutive-request-merging)\n  - [Timestamp Insertion](#timestamp-insertion)\n  - [Edit Chat Completion Request](#edit-chat-completion-request)\n  - [Long-Term Memory](#long-term-memory)\n- [🗣️ Speech Synthesizer (Text-to-Speech)](#%EF%B8%8F-speech-synthesizer-text-to-speech)\n  - [Voice 
Prefetch Mode](#voice-prefetch-mode)\n  - [Make custom SpeechSynthesizer](#make-custom-speechsynthesizer)\n  - [Performance and Quality Tuning](#performance-and-quality-tuning)\n  - [Preprocessing](#preprocessing)\n- [🎧 Speech Listener (Speech-to-Text)](#-speech-listener-speech-to-text)\n  - [Settings on AIAvatar Inspector](#settings-on-aiavatar-inspector)\n  - [Downsampling](#downsampling)\n  - [Using AzureStreamSpeechListener](#using-azurestreamspeechlistener)\n  - [Using Silero VAD](#using-silero-vad)\n  - [Using Multiple VADs Combination](#using-multiple-vads-combination)\n  - [Echo Cancelling](#echo-cancelling)\n  - [Custom Barge-in Condition](#custom-barge-in-condition)\n- [⏰ Wake Word Detection](#-wake-word-detection)\n  - [Wake Words](#wake-words)\n  - [Cancel Words](#cancel-words)\n  - [Interrupt Words](#interrupt-words)\n  - [Ignore Words](#ignore-words)\n  - [Wake Length](#wake-length)\n- [⚡️ AI Agent (Tool Call)](#%EF%B8%8F-ai-agent-tool-call)\n- [🎙️ Devices](#%EF%B8%8F-devices)\n  - [Microphone](#microphone)\n  - [Camera](#camera)\n- [🥰 3D Model Control](#-3d-model-control)\n  - [Idle Animations](#idle-animations)\n  - [Control by Script](#control-by-script)\n- [🎚️ UI Components](#%EF%B8%8F-ui-components)\n- [🎮 Control from External Programs](#-control-from-external-programs)\n  - [ChatdollKit Remote Client](#chatdollkit-remote-client)\n- [🌐 Run on WebGL](#-run-on-webgl)\n- [🔄 Migration from 0.7.x](#-migration-from-07x)\n- [❤️ Thanks](#%EF%B8%8F-thanks)\n\n\n## 📦 Setup New Project\n\nThe steps for setting up with a VRM model are as follows. For instructions on using models for VRChat, refer to [README v0.7.7](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fblob\u002Fv0.7.7\u002FREADME.md#-modelcontroller).\n\n**⚠️CAUTION**: Do not use the SRP (Scriptable Render Pipeline) project template in Unity. 
UniVRM, which ChatdollKit depends on, does not support SRP.\n\n### Import dependencies\n\nDownload the latest version of [ChatdollKit.unitypackage](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Freleases) and import it into your Unity project after importing the following dependencies:\n\n- `Burst` from Unity Package Manager (Window > Package Manager)\n- [UniTask](https:\u002F\u002Fgithub.com\u002FCysharp\u002FUniTask)(Tested on Ver.2.5.4)\n- [uLipSync](https:\u002F\u002Fgithub.com\u002Fhecomi\u002FuLipSync)(Tested on v3.1.0)\n- [UniVRM](https:\u002F\u002Fgithub.com\u002Fvrm-c\u002FUniVRM\u002Freleases\u002Ftag\u002Fv0.127.2)(v0.127.2)\n- [ChatdollKit VRM Extension](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Freleases)\n- JSON.NET: If your project doesn't have JSON.NET, add it from Package Manager > [+] > Add package from git URL... > com.unity.nuget.newtonsoft-json\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_4615b678a861.png\" width=\"640\">\n\n\n### Resource preparation\n\nAdd a 3D model to the scene and adjust it as you like. Also install any resources the model requires, such as shaders.\n\nThen import animation clips. In this README, I use [Anime Girls Idle Animations Free](https:\u002F\u002Fassetstore.unity.com\u002Fpackages\u002F3d\u002Fanimations\u002Fanime-girl-idle-animations-free-150406), which is also used in the demo. The pro edition is well worth purchasing👍\n\n\n### AIAvatarVRM prefab\n\nAdd the `ChatdollKit\u002FPrefabs\u002FAIAvatarVRM` prefab to the scene. 
Then create an EventSystem to use the UI components.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_73f78211661a.png\" width=\"640\">\n\n\n### ModelController\n\nSelect `Setup ModelController` in the context menu of ModelController.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_e480e297313a.png\" width=\"640\">\n\n\n### Animator\n\nSelect `Setup Animator` in the context menu of ModelController and select the folder that contains the animation clips, or their parent folder. In this case, put the animation clips in `01_Idles` and `03_Others` onto `Base Layer` for override blending, and those in `02_Layers` onto `Additive Layer` for additive blending.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_5d424d96e638.gif\" width=\"640\">\n\nNext, open the `Base Layer` of the newly created AnimatorController in the folder you selected. Confirm the transition value for the state you want to use as the idle animation.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_ab49c9711c03.png\" width=\"640\">\n\nLastly, set that value as `Idle Animation Value` on the inspector of ModelController.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_6a7b2f17c7b0.png\" width=\"640\">\n\n\n### AIAvatar\n\nOn the inspector of `AIAvatar`, set `Wake Word` to start conversation (e.g. hello \u002F こんにちは🇯🇵), `Cancel Word` to stop conversation (e.g. stop \u002F おしまい🇯🇵), and `Error Voice` and `Error Face`, which are used when an error occurs (e.g. Something wrong \u002F 調子が悪いみたい🇯🇵).\n\n`Prefix \u002F Suffix Allowance` is the allowable length for additional characters before or after the wake word. 
For example, if the wake word is \"Hello\" and the allowance is 4 characters, the phrase \"Ah, Hello!\" will still be detected as the wake word.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_a89004dddc8b.png\" width=\"640\">\n\n\n### LLM Service\n\nAttach the component corresponding to the LLM service from `ChatdollKit\u002FScripts\u002FLLM` and set the required fields like API keys and system prompts. In this example, we use ChatGPT, but the framework also supports Claude, Gemini, and Dify.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_3d768b5582ba.png\" width=\"640\">\n\n\n### Speech Service\n\nAttach the `SpeechListener` component from `ChatdollKit\u002FScripts\u002FSpeechListener` for speech recognition and the `SpeechSynthesizer` component from `ChatdollKit\u002FScripts\u002FSpeechSynthesizer` for speech synthesis. Configure the necessary fields like API keys and language codes. Enabling `PrintResult` in the SpeechListener settings will output recognized speech to the log, useful for debugging.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_adea1c55010f.png\" width=\"640\">\n\n\n### Microphone Controller\n\nAdd `ChatdollKit\u002FPrefabs\u002FRuntime\u002FMicrophoneController` to your scene. This provides a UI to adjust the minimum volume for speech recognition. If the environment is noisy, you can slide it to the left to filter out background noise.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_e744a1222cd3.png\" width=\"640\">\n\n\n### Run\n\nPress Play button of Unity editor. You can see the model starts with idling animation and blinking.\n\n- Adjust the microphone volume slider if necessary.\n- Say the word you set to `Wake Word` on inspector. (e.g. 
hello \u002F こんにちは🇯🇵)\n- Your model will reply \"Hi there!\" or something similar.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_7f8c56014437.png\" width=\"640\">\n\nEnjoy👍\n\n\n## 🎓 LLM Service\n\n### Basic Settings\n\nWe support ChatGPT, Claude, Gemini, and Dify as text generation AI services. Experimental support for Command R is also available, but it is unstable. To use LLM services, attach the LLMService component you want to use from `ChatdollKit\u002FScripts\u002FLLM` to the AIAvatar object and check the `IsEnabled` box. If other LLMServices are already attached, make sure to uncheck the `IsEnabled` box for those you don't plan to use.\n\nYou can configure parameters like API keys and system prompts directly on the attached LLMService in the inspector. For more details on these parameters, please refer to the API references for the LLM services.\n\nNOTE: To use OpenAI-compatible APIs, check `IsOpenAPICompatibleAPI` and set `ChatCompletionURL` in addition to the above.\n\n- Gemini: https:\u002F\u002Fgenerativelanguage.googleapis.com\u002Fv1beta\u002Fchat\u002Fcompletions\n- Grok: https:\u002F\u002Fapi.x.ai\u002Fv1\u002Fchat\u002Fcompletions\n\n\n### Facial Expressions\n\nYou can autonomously control facial expressions according to the conversation content.\n\nTo control expressions, include tags like `[face:ExpressionName]` in the AI responses, which can be set through system prompts. Here's an example of a system prompt:\n\n```\nYou have five expressions: 'Joy', 'Angry', 'Sorrow', 'Fun' and 'Surprised'.\nIf you want to express a particular emotion, please insert it at the beginning of the sentence like [face:Joy].\n\nExample\n[face:Joy]Hey, you can see the ocean! [face:Fun]Let's go swimming.\n```\n\nThe expression names must be understandable by the AI. 
Make sure they match exactly, including case sensitivity, with the expressions defined in the VRM model.\n\n\n### Animations\n\nYou can also control gestures (referred to as animations) autonomously based on the conversation content.\n\nTo control animations, include tags like `[anim:AnimationName]` in the AI responses, and set the instructions in the system prompt. Here's an example:\n\n```\nYou can express your emotions through the following animations:\n\n- angry_hands_on_waist\n- brave_hand_on_chest\n- calm_hands_on_back\n- concern_right_hand_front\n- energetic_right_fist_up\n- energetic_right_hand_piece\n- pitiable_right_hand_on_back_head\n- surprise_hands_open_front\n- walking\n- waving_arm\n- look_away\n- nodding_once\n- swinging_body\n\nIf you want to express emotions with gestures, insert the animation into the response message like [anim:waving_arm].\n\nExample\n[anim:waving_arm]Hey, over here!\n```\n\nThe animation names must be clear to the AI for it to understand the intended gesture.\n\nTo link the specified animation name to the animation defined in the `Animator Controller`, register them in `ModelController` through code as shown below:\n\n```csharp\n\u002F\u002F Base\nmodelController.RegisterAnimation(\"angry_hands_on_waist\", new Model.Animation(\"BaseParam\", 0, 3.0f));\nmodelController.RegisterAnimation(\"brave_hand_on_chest\", new Model.Animation(\"BaseParam\", 1, 3.0f));\nmodelController.RegisterAnimation(\"calm_hands_on_back\", new Model.Animation(\"BaseParam\", 2, 3.0f));\nmodelController.RegisterAnimation(\"concern_right_hand_front\", new Model.Animation(\"BaseParam\", 3, 3.0f));\nmodelController.RegisterAnimation(\"energetic_right_fist_up\", new Model.Animation(\"BaseParam\", 4, 3.0f));\nmodelController.RegisterAnimation(\"energetic_right_hand_piece\", new Model.Animation(\"BaseParam\", 5, 3.0f));\nmodelController.RegisterAnimation(\"pitiable_right_hand_on_back_head\", new Model.Animation(\"BaseParam\", 7, 
3.0f));\nmodelController.RegisterAnimation(\"surprise_hands_open_front\", new Model.Animation(\"BaseParam\", 8, 3.0f));\nmodelController.RegisterAnimation(\"walking\", new Model.Animation(\"BaseParam\", 9, 3.0f));\nmodelController.RegisterAnimation(\"waving_arm\", new Model.Animation(\"BaseParam\", 10, 3.0f));\n\u002F\u002F Additive\nmodelController.RegisterAnimation(\"look_away\", new Model.Animation(\"BaseParam\", 6, 3.0f, \"AGIA_Layer_look_away_01\", \"Additive Layer\"));\nmodelController.RegisterAnimation(\"nodding_once\", new Model.Animation(\"BaseParam\", 6, 3.0f, \"AGIA_Layer_nodding_once_01\", \"Additive Layer\"));\nmodelController.RegisterAnimation(\"swinging_body\", new Model.Animation(\"BaseParam\", 6, 3.0f, \"AGIA_Layer_swinging_body_01\", \"Additive Layer\"));\n```\n\nIf you use Anime Girls Idle Animations or its free edition, you can register the animations easily:\n\n```csharp\nmodelController.RegisterAnimations(AGIARegistry.GetAnimations(animationCollectionKey));\n```\n\n\n### Pause in Speech\n\nYou can insert pauses in the character's speech to make conversations feel more natural and human-like.\n\nTo control the length of pauses, include tags like `[pause:seconds]` in the AI responses, which can be set through system prompts. The specified number of seconds can be a float value, allowing precise control of the pause duration at that point in the dialogue. Here's an example of a system prompt:\n\n```\nYou can insert pauses in the character's speech to make conversations feel more natural and human-like.\n\nExample:\nHey, it's a beautiful day outside! [pause:1.5] What do you think we should do?\n```\n\n\n### User Defined Tag\n\nBesides expressions and animations, you can execute actions based on developer-defined tags. 
Include the instructions to insert tags in the system prompt and implement `HandleExtractedTags`.\n\nHere's an example of switching room lighting on\u002Foff during the conversation:\n\n\n```\nIf you want to switch the room light on or off, insert a tag like [light:on].\n\nExample:\n[light:off]OK, I will turn off the light. Good night.\n```\n\n```csharp\ndialogProcessor.LLMServiceExtensions.HandleExtractedTags = (tags, session) =>\n{\n    if (tags.ContainsKey(\"light\"))\n    {\n        var lightCommand = tags[\"light\"];\n        if (lightCommand.ToLower() == \"on\")\n        {\n            \u002F\u002F Turn on the light\n            Debug.Log($\"Turn on the light\");\n        }\n        else if (lightCommand.ToLower() == \"off\")\n        {\n            \u002F\u002F Turn off the light\n            Debug.Log($\"Turn off the light\");\n        }\n        else\n        {\n            Debug.LogWarning($\"Unprocessable command: {lightCommand}\");\n        }\n    }\n};\n```\n\n\n### Multi Modal\n\nYou can include images from cameras or files in requests to the LLM. Include the image binary data under the key `imageBytes` in the `payloads` argument of `DialogProcessor.StartDialogAsync`.\n\nAdditionally, you can enable the system to autonomously capture images when required based on the user's speech. To achieve this, add the tag `[vision:camera]` to the AI response by configuring it in the system prompt, and implement the image capture process that runs in the LLM service when this tag is received.\n\n```\nYou can use the camera to see things.\nWhen the user wants you to see something, insert [vision:camera] into your response message.\n\nExample\nuser: Look! 
I bought this today.\nassistant: [vision:camera]Let me see.\n```\n\n```csharp\ngameObject.GetComponent\u003CChatGPTService>().CaptureImage = async (source) =>\n{\n    if (simpleCamera != null)\n    {\n        try\n        {\n            return await simpleCamera.CaptureImageAsync();\n        }\n        catch (Exception ex)\n        {\n            Debug.LogError($\"Error at CaptureImageAsync: {ex.Message}\\n{ex.StackTrace}\");\n        }\n    }\n\n    return null;\n};\n```\n\n\n### Chain of Thought Prompting\n\nChain of Thought (CoT) prompting is a technique to enhance AI performance. For more information about CoT and examples of prompts, see https:\u002F\u002Fdocs.anthropic.com\u002Fen\u002Fdocs\u002Fbuild-with-claude\u002Fprompt-engineering\u002Fchain-of-thought .\n\nChatdollKit supports Chain of Thought by excluding sentences wrapped in `\u003Cthinking> ~ \u003C\u002Fthinking>` tags from speech synthesis.\n\nYou can customize the tag by setting a preferred word (e.g., \"reason\") as the `ThinkTag` in the inspector of `LLMContentProcessor`.\n\n\n### Consecutive Request Merging\n\nWhen a user speaks in short bursts, the system may receive multiple rapid requests in quick succession, causing the AI to respond to each fragment separately. Consecutive Request Merging combines these fragmented inputs into a single request, so the AI can respond to the full intent at once.\n\nConfigure the following settings on the `DialogProcessor` component in the inspector:\n\n|Item|Description|\n|----|----|\n|**Merge Request Threshold**|The time window (in seconds) for merging consecutive requests. If a new request arrives within this interval after the previous one, the requests are merged. Set to `0` to disable (default: `0`).|\n|**Merge Request Prefix**|The prefix text prepended to the merged request to instruct the AI to disregard the previous incomplete response (default: `\"Previous user's request and your response have been canceled. 
Please respond again to the following request:\"`).|\n\nWhen a merge occurs, the previous request text and the new request text are concatenated, and the prefix is prepended to inform the AI that the prior response was canceled. The merged text is not displayed in the user message window — the user sees only their latest utterance.\n\n\n### Timestamp Insertion\n\nYou can automatically insert the current date and time into requests sent to the LLM at regular intervals. This allows the AI character to be aware of the current time without relying on tool calls, enabling time-aware responses such as greetings appropriate to the time of day.\n\nConfigure the following settings on the `DialogProcessor` component in the inspector:\n\n|Item|Description|\n|----|----|\n|**Timestamp Insertion Interval**|The interval (in seconds) between timestamp insertions. Set to `0` to disable this feature (default: `0`).|\n|**Timestamp Prefix**|The prefix text prepended to the timestamp (default: `\"Current date and time: \"`).|\n\nWhen enabled, the current date and time is prepended to the user's request text in the format `Current date and time: 2026\u002F02\u002F14 14:30:00` before it is sent to the LLM. The timestamp is not displayed in the user message window.\n\n\n### Edit Chat Completion Request\n\nYou can customize the HTTP request headers and body of the ChatGPT API call by setting `EditChatCompletionRequest` on `ChatGPTService`. This delegate is invoked just before the request is serialized and sent, giving you full control over the `UnityWebRequest` and the request data dictionary.\n\n```csharp\nvar chatGPTService = gameObject.GetComponent\u003CChatGPTService>();\n\nchatGPTService.EditChatCompletionRequest = (request, data) =>\n{\n    \u002F\u002F Add or modify request headers\n    request.SetRequestHeader(\"X-Custom-Header\", \"my-value\");\n\n    \u002F\u002F Add or modify body parameters\n    data[\"custom_param\"] = \"some_value\";\n\n    \u002F\u002F Modify messages (e.g. 
remove history and keep only the last user message)\n    var messages = (List\u003CILLMMessage>)data[\"messages\"];\n    messages.RemoveRange(0, messages.Count - 1);\n};\n```\n\n**Parameters:**\n\n|Parameter|Type|Description|\n|----|----|-----|\n|`request`|`UnityWebRequest`|The HTTP request object. Use `SetRequestHeader()` to add or override headers.|\n|`data`|`Dictionary\u003Cstring, object>`|The request body dictionary. Modify it directly to change the JSON payload sent to the API.|\n\n**Example: OpenClaw integration**\n\nThe following example adds a custom session header, strips the conversation history to send only the latest user message, and prepends a `[channel:voice]` tag to the text content:\n\n```csharp\nchatGPTService.EditChatCompletionRequest = (request, data) =>\n{\n    request.SetRequestHeader(\"x-openclaw-session-key\", \"agent:main:main\");\n\n    var messages = (List\u003CILLMMessage>)data[\"messages\"];\n    messages.RemoveRange(0, messages.Count - 1);\n\n    \u002F\u002F Add [channel:voice] prefix to text content of user message\n    if (messages.Last() is ChatGPTUserMessage userMessage)\n    {\n        var textPart = userMessage.content.OfType\u003CTextContentPart>().FirstOrDefault();\n        if (textPart != null)\n        {\n            textPart.text = \"[channel:voice]\" + textPart.text;\n        }\n        else\n        {\n            userMessage.content.Insert(0, new TextContentPart(\"[channel:voice]\"));\n        }\n    }\n};\n```\n\n\n### Long-Term Memory\n\nChatdollKit itself does not have a built-in mechanism for managing long-term memory. However, by implementing `OnStreamingEnd`, it is possible to accumulate memory. 
Additionally, by using a tool that retrieves stored memories, the system can recall and reflect them in conversations.\n\nThe following is an example using [ChatMemory](https:\u002F\u002Fgithub.com\u002Fuezo\u002Fchatmemory).\n\nFirst, to store memories, attach the `Extension\u002FChatMemory\u002FChatMemoryIntegrator` component to the main GameObject and set the ChatMemory service URL and a user ID. The user ID can be any value, but if you are building a service for multiple users, make sure to assign an ID that can uniquely identify each user within your service from code-behind.\n\nNext, add the following code to an appropriate location (such as `Main`) so that the request and response messages are stored in ChatMemory as history when the LLM stream finishes.\n\n```csharp\nusing ChatdollKit.Extension.ChatMemory;\n\nvar chatMemory = gameObject.GetComponent\u003CChatMemoryIntegrator>();\ndialogProcessor.LLMServiceExtensions.OnStreamingEnd += async (text, payloads, llmSession, token) =>\n{\n    chatMemory.AddHistory(llmSession.ContextId, text, llmSession.CurrentStreamBuffer, token).Forget();\n};\n```\n\n\nTo retrieve memories and include them in the conversation, simply add the `Extension\u002FChatMemory\u002FChatMemoryTool` component to the main GameObject.\n\n**NOTE:** ChatMemory manages what is known as episodic memory. There is also an entity called `Knowledge`, which corresponds to factual information, but it is not automatically extracted or stored. Handle it manually as needed. (By default, it is included in search targets.)\n\n\n## 🗣️ Speech Synthesizer (Text-to-Speech)\n\nWe support cloud-based speech synthesis services such as Google, Azure, OpenAI, and Watson, in addition to VOICEVOX \u002F AivisSpeech, Aivis Cloud API, VOICEROID and Style-Bert-VITS2 for more characterful and engaging voices. 
To use a speech synthesis service, attach `SpeechSynthesizer` from `ChatdollKit\u002FScripts\u002FSpeechSynthesizer` to the AIAvatar object and check the `IsEnabled` box. If other `SpeechSynthesizer` components are attached, make sure to uncheck the `IsEnabled` box for those not in use.\n\nYou can configure parameters like API keys and endpoints on the attached `SpeechSynthesizer` in the inspector. For details about these parameters, refer to the API reference of the respective TTS service.\n\n### Voice Prefetch Mode\n\nThe `Voice Prefetch Mode` determines how speech synthesis requests are managed and processed. By default, the system operates in Parallel mode. The following modes are supported:\n\n1. **Parallel (default)**: In this mode, multiple speech synthesis requests are sent and processed simultaneously. This ensures the fastest response times when generating multiple speech outputs in quick succession. Use this mode when latency is critical and sufficient resources are available for parallel processing.\n1. **Sequential**: Requests are processed one at a time in the order they are enqueued. This mode is ideal for managing limited resources or ensuring the strict ordering of speech outputs. It avoids potential concurrency issues but may result in longer wait times for subsequent requests.\n1. **Disabled**: No prefetching is performed in this mode. Speech synthesis occurs only when explicitly triggered, making it suitable for minimal-resource scenarios or when prefetching is unnecessary.\n\nYou can change the `Voice Prefetch Mode` in the inspector on the SpeechSynthesizer component. Ensure the selected mode aligns with your performance and resource management requirements.\n\n\n### Make custom SpeechSynthesizer\n\nYou can easily create and use a custom `SpeechSynthesizer` for your preferred text-to-speech service. 
Create a class that inherits from `ChatdollKit.SpeechSynthesizer.SpeechSynthesizerBase`, and implement the asynchronous method `DownloadAudioClipAsync` that takes a `string text` and `Dictionary\u003Cstring, object> parameters`, and returns an `AudioClip` object playable in Unity.\n\n```csharp\nUniTask\u003CAudioClip> DownloadAudioClipAsync(string text, Dictionary\u003Cstring, object> parameters, CancellationToken cancellationToken)\n```\n\nNote that WebGL does not support compressed audio playback, so make sure to handle this by adjusting your code depending on the platform.\n\n\n### Performance and Quality Tuning\n\nTo achieve fast response times, rather than synthesizing the entire response message into speech, we split the text into smaller parts based on punctuation and progressively synthesize and play each segment. While this greatly improves performance, excessively splitting the text can reduce the quality of the speech, especially when using AI-based speech synthesis like Style-Bert-VITS2, affecting the tone and fluency.\n\nYou can balance performance and speech quality by adjusting how the text is split for synthesis in the `LLMContentProcessor` component's inspector.\n\n|Item|Description|\n|----|----|\n|**Split Chars**|Characters to split the text at for synthesis. Speech synthesis is always performed at these points.|\n|**Optional Split Chars**|Optional split characters. 
Normally, the text isn't split at these, but it will be if the text length exceeds the value set in Max Length Before Optional Split.|\n|**Max Length Before Optional Split**|Threshold for text length at which optional split characters are used as split points.|\n\n\n### Preprocessing\n\nImplement `SpeechSynthesizer.PreprocessText` method to preprocess the text to synthesize.\n\nInterface:\n\n```csharp\nFunc\u003Cstring, Dictionary\u003Cstring, object>, CancellationToken, UniTask\u003Cstring>> PreprocessText;\n```\n\n\n## 🎧 Speech Listener (Speech-to-Text)\n\nWe support cloud-based speech recognition services such as Google, Azure, and OpenAI. To use these services, attach the `SpeechListener` component from `ChatdollKit\u002FScripts\u002FSpeechListener` to the AIAvatar object. Be aware that if multiple SpeechListeners are attached, they will run in parallel, so ensure that only the one you want is active.\n\nYou can configure parameters such as API keys and endpoints on the attached SpeechListener in the inspector. For details of these parameters, please refer to the API references of the respective STT services and products.\n\nMost of the `Voice Recorder Settings` are controlled by the `AIAvatar` component, described later, so any settings in the inspector, except for those listed below, will be ignored.\n\n|Item|Description|\n|----|-----------|\n|**Auto Start**|When enabled, starts speech recognition automatically when the application launches.|\n|**Print Result**|When enabled, outputs the transcribed recognized speech to the console.|\n\n\n### Settings on AIAvatar Inspector\n\nMost of the settings related to the SpeechListener are configured in the inspector of the `AIAvatar` component.\n\n|Item|Description|\n|---|---|\n|**Conversation Timeout**|The waiting time (seconds) before the conversation is considered finished. After this period, it transitions to Idle mode, and the message window will be hidden. 
To resume the conversation, the wake word must be recognized again.|\n|**Idle Timeout**|The waiting time (seconds) before transitioning from Idle mode to Sleep mode. By default, there is no difference between Idle and Sleep modes, but it can be used to switch between different speech recognition methods or idle animations through user implementation.|\n|**Voice Recognition Threshold DB**|The volume threshold (decibels) for speech recognition. Sounds below this threshold will not be recognized.|\n|**Voice Recognition Raised Threshold DB**|An elevated threshold (decibels) for voice recognition, used to detect louder speech. This is utilized when the `Microphone Mute By` setting is set to `Threshold`.|\n|**Conversation Silence Duration Threshold**|If silence is detected for longer than this time, recording ends, and speech recognition is performed.|\n|**Conversation Min Recording Duration**|Speech recognition is performed only if the recorded sound exceeds this duration. This helps to ignore short noises and prevent misrecognition.|\n|**Conversation Max Recording Duration**|If the recorded sound exceeds this time, speech recognition is not performed, and the recording is ignored. This prevents overly long recordings from overburdening speech recognition.|\n|**Idle Silence Duration Threshold**|The amount of silence (seconds) required to stop recording during Idle mode. A smaller value is set to smoothly detect short periods of silence when waiting for the wake word.|\n|**Idle Min Recording Duration**|The minimum recording duration during Idle mode. A smaller value is set compared to conversation mode to smoothly detect short phrases.|\n|**Idle Max Recording Duration**|The maximum recording duration during Idle mode. Since wake words are usually short, a shorter value is set compared to conversation mode.|\n|**Microphone Mute By**|The method used to prevent the avatar's speech from being recognized during speech. 
\u003Cbr>\u003Cbr>- None: Does nothing.\u003Cbr>- Threshold: Raises the voice recognition threshold to `Voice Recognition Raised Threshold DB`.\u003Cbr>- Mute: Ignores input sound from the microphone.\u003Cbr>- Stop Device: Stops the microphone device.\u003Cbr>- Stop Listener: Stops the listener. **Select this when you use AzureStreamSpeechListener**|\n|**Stop Response On Barge In**|When enabled (default: true), the AI's speech is automatically stopped when the SpeechListener detects that the user has started speaking (barge-in). The trigger condition depends on the SpeechListener type: for non-stream listeners, it fires when voice recording exceeds a minimum duration (default: 1.5 seconds); for stream listeners (AIAvatarKitStream, AzureStream), it fires when the recognized text reaches a minimum length (default: 2 characters). You can override the trigger condition by setting `BargeInCondition` on the SpeechListener. Set to false to disable this feature.|\n\n\n**NOTE:** `AzureStreamSpeechListener` does not expose some of the properties above because it controls the microphone internally through the SDK DLL.\n\n\n### Downsampling\n\nThe `SpeechListener` class supports downsampling of raw microphone input to a lower sample rate before sending input data to the STT service. This feature helps reduce the audio payload size, leading to smoother transcription over limited-bandwidth networks.\n\nYou'll find the **Target Sample Rate** (int) field exposed in the inspector of the SpeechListener component:\n\n- Set to `0` (default) to use the original sample rate (no downsampling).\n- Set to a positive integer (e.g., `16000`) to downsample input to that rate (in Hz).\n\n\n### Using AIAvatarKitStreamSpeechListener\n\n[AIAvatarKit](https:\u002F\u002Fgithub.com\u002Fuezo\u002Faiavatarkit) is a framework that provides a Speech-to-Speech pipeline, and it also offers a streaming speech recognition server. 
By combining Silero VAD (also used in ChatdollKit) for turn-end detection with any speech recognition engine, you can build a real-time speech recognition server tailored to your use case.\n\nTo use this, attach the `AIAvatarKitStreamSpeechListener` component to your AIAvatar object and configure the connection URL. Then add the following speech display handling to your main logic initialization (in `Start()`, or any process that runs after `Awake()`).\n\n```csharp\n\u002F\u002F Show AI message partially with AIAvatarKitStreamSpeechListener\nvar aiavatarKitStreamSpeechListener = gameObject.GetComponent\u003CAIAvatarKitStreamSpeechListener>();\nif (aiavatarKitStreamSpeechListener != null)\n{\n    var userMessageWindow = (SimpleMessageWindow)aiAvatar.UserMessageWindow;\n\n    \u002F\u002F Disable text animation since partial results are streamed in real-time\n    userMessageWindow.IsTextAnimated = false;\n    \u002F\u002F Shorter PostGap is fine for streaming display; prioritize responsiveness\n    userMessageWindow.PostGap = 0.2f;\n\n    \u002F\u002F Manually hide user message window after PostGap on the first turn,\n    \u002F\u002F because the user message window is not managed by the normal dialog flow\n    var originalOnRecognized = aiavatarKitStreamSpeechListener.OnRecognized;\n    aiavatarKitStreamSpeechListener.OnRecognized = async (text) => {\n        if (originalOnRecognized != null)\n        {\n            await originalOnRecognized(text);\n        }\n        if (aiAvatar.Mode != AIAvatar.AvatarMode.Conversation)\n        {\n            await UniTask.Delay((int)(userMessageWindow.PostGap * 1000));\n            aiAvatar.UserMessageWindow?.Hide();\n        }\n    };\n\n    \u002F\u002F Display partial recognition results\n    aiavatarKitStreamSpeechListener.OnPartialRecognized = (partialText) => {\n        if (!string.IsNullOrEmpty(partialText))\n        {\n            aiAvatar.UserMessageWindow.Show(partialText);\n        }\n    };\n}\n```\n\n**NOTE**: 
Using Silero VAD in WebGL builds can cause high browser processing overhead. We recommend using `AIAvatarKitStreamSpeechListener` to offload VAD processing to the server side.\n\n**NOTE**: To use `AIAvatarKitStreamSpeechListener` in WebGL builds, add the NativeWebSocket package via Package Manager: `https:\u002F\u002Fgithub.com\u002Fendel\u002FNativeWebSocket.git#upm`. Unity's built-in WebSocket client (`System.Net.WebSockets`) is not supported on WebGL.\n\n\n### Using AzureStreamSpeechListener\n\nWhen using `AzureStreamSpeechListener`, some settings differ from those of other SpeechListeners. This is because `AzureStreamSpeechListener` controls the microphone internally through the SDK and performs transcription incrementally.\n\n**Microphone Mute By**: Select `Stop Listener`. If this is not set, the character will listen to its own speech, disrupting the conversation.\n\n**User Message Window**: Uncheck `Is Text Animated`, and set `Pre Gap` to `0` and `Post Gap` to around `0.2`.\n\n**Update()**: To display the recognized text incrementally, add the following code inside the `Update()` method:\n\n```csharp\nif (aiAvatar.Mode == AIAvatar.AvatarMode.Conversation)\n{\n    if (!string.IsNullOrEmpty(azureStreamSpeechListener.RecognizedTextBuffer))\n    {\n        aiAvatar.UserMessageWindow.Show(azureStreamSpeechListener.RecognizedTextBuffer);\n    }\n}\n```\n\n\n### Using Silero VAD\n\nSilero VAD is a machine learning-based voice activity detection model. It can detect human voice even in noisy environments, which significantly improves the accuracy of turn-end detection compared to microphone volume-based voice activity detection.\n\nThe usage procedure is as follows:\n\n- Import [onnxruntime-unity](https:\u002F\u002Fgithub.com\u002Fasus4\u002Fonnxruntime-unity). 
Follow the procedure on GitHub to edit manifest.json.\n- Download the [Silero VAD ONNX model](https:\u002F\u002Fgithub.com\u002Fsnakers4\u002Fsilero-vad\u002Ftree\u002Fmaster\u002Fsrc\u002Fsilero_vad\u002Fdata) and place it in the StreamingAssets folder. The filename should be `silero_vad.onnx`.\n- Download and import ChatdollKit's SileroVADExtension.\n- Attach `SileroVADProcessor` to the object where SpeechListener is attached.\n- In the `Awake` method of any MonoBehaviour component, set it as the voice detection function for SpeechListener.\n    ```csharp\n    var sileroVad = gameObject.GetComponent\u003CSileroVADProcessor>();\n    sileroVad.Initialize();\n    var speechListener = gameObject.GetComponent\u003CSpeechListenerBase>();\n    speechListener.DetectVoiceFunc = sileroVad.IsVoiced;\n    ```\n- Place SileroVADMicrophoneButton in the scene if necessary\n\nWhen executed, Silero VAD will be used for voice activity detection.\n\n\n### Using Multiple VADs Combination\n\nChatdollKit supports combining multiple types of VADs. For example, by combining Silero VAD, which can recognize only human voices even in noisy environments, with the built-in energy-based VAD, which only captures loud voices, the system can accurately pick up the user’s speech at event venues while partially filtering out surrounding voices and venue announcements.\n\nTo use multiple VADs, add multiple voice detection functions to `DetectVoiceFunctions` instead of `DetectVoiceFunc`.\n\n```csharp\nspeechListener.DetectVoiceFunctions = new List\u003CFunc\u003Cfloat[], float, bool>>()\n{\n    sileroVad.IsVoiced, speechListener.IsVoiceDetectedByVolume\n};\n```\n\n\n### Echo Cancelling\n\nUnity's built-in Microphone API doesn't support echo cancelling. 
To enable this feature, use platform-specific native microphone plugins.\n\n```csharp\nprivate void Awake()\n{\n    var microphoneManager = gameObject.GetComponent\u003CMicrophoneManager>();\n    \n    \u002F\u002F First, import the ChatdollKit_NativeMicrophone package\n    \u002F\u002F Then, set the appropriate provider for your platform:\n    \n    \u002F\u002F iOS\n    microphoneManager.MicrophoneProvider = new IOSMicrophoneProvider();\n    \u002F\u002F Android\n    microphoneManager.MicrophoneProvider = new AndroidMicrophoneProvider();\n    \u002F\u002F macOS\n    microphoneManager.MicrophoneProvider = new MacMicrophoneProvider();\n}\n```\n\nWith echo cancelling enabled, you can allow users to interrupt the AI while it's speaking. To enable this feature:\n\n1. In the Inspector, select the `AIAvatar` component\n2. Set `MicrophoneMuteBy` to `None`\n\nThis configuration allows the microphone to remain active during AI speech, enabling natural conversation interruptions while the echo cancelling prevents the AI's voice from being picked up by the microphone.\n\n\n### Custom Barge-in Condition\n\nBy default, barge-in is triggered based on the type of SpeechListener:\n\n- **Non-stream listeners** (OpenAI, Google, AIAvatarKit): Fires when the user has been recording for at least `BargeInMinDuration` seconds (default: 1.5s).\n- **Stream listeners** (AzureStream, AIAvatarKitStream): Fires when the partially recognized text reaches `BargeInMinTextLength` characters (default: 2).\n\nYou can override this logic by setting `BargeInCondition` on the SpeechListener. 
The delegate signature is `Func\u003Cstring, float, bool>` where:\n\n- `text` — Partially recognized text (`null` for non-stream listeners)\n- `recordDuration` — Elapsed recording time in seconds (`0f` for stream listeners)\n- Return `true` to trigger barge-in\n\n```csharp\n\u002F\u002F Example: Require at least 3 characters for stream listeners\naiAvatar.SpeechListener.BargeInCondition = (text, recordDuration) =>\n{\n    if (text != null)\n    {\n        return text.Length >= 3;\n    }\n    return recordDuration >= 2.0f;\n};\n```\n\n\n## ⏰ Wake Word Detection\n\nYou can detect wake words as triggers to start a conversation. In the AIAvatar component's inspector, you can also configure cancel words that end a conversation, or use the length of recognized speech as a trigger instead of specific phrases.\n\n### Wake Words\n\nThe conversation starts when this phrase is recognized. You can register multiple wake words. In versions 0.8 and later, all settings other than the following items are ignored.\n\n|Item|Description|\n|---|---|\n|**Text**|Phrase to start conversation.|\n|**Prefix \u002F Suffix Allowance**|The allowable length for additional characters before or after the wake word. For example, if the wake word is \"Hello\" and the allowance is 4 characters, the phrase \"Ah, Hello!\" will still be detected as the wake word.|\n\n### Cancel Words\n\nThe conversation ends when this phrase is recognized. You can register multiple cancel words.\n\n### Interrupt Words\n\nThe character stops speaking and starts listening to the user's request when an interrupt word is recognized. You can register multiple interrupt words. (e.g. \"Wait\")\n\n**NOTE:** In the AIAvatar's inspector, select `Threshold` under `Microphone Mute By` to allow ChatdollKit to listen to your voice while the character is speaking.\n\n### Ignore Words\n\nYou can register strings to be ignored when determining whether the recognized speech matches a wake word or cancel word. 
This is useful if you don’t want to consider the presence or absence of punctuation.\n\n### Wake Length\n\nYou can start a conversation based on the length of the recognized text, rather than specific phrases. This feature is disabled when the value is `0`. For example, in Idle mode, you can resume the conversation using text length instead of a wake word, and in Sleep mode, the conversation can resume with the wake word.\n\n\n## ⚡️ AI Agent (Tool Call)\n\nUsing the Tool Call (Function Calling) feature provided by the LLM, you can develop AI characters that function as AI agents, rather than simply engaging in conversation.\n\nBy creating a component that implements `ITool` or extends `ToolBase` and attaching it to the AIAvatar object, it will automatically be recognized as a tool and executed when needed. To create a custom tool, define `FunctionName` and `FunctionDescription`, and implement the `GetToolSpec` method, which returns the function definition, and the `ExecuteFunction` method, which handles the function’s process. For details, refer to `ChatdollKit\u002FExamples\u002FWeatherTool`.\n\n**NOTE**: See [Migration from FunctionSkill to Tool](#migration-from-functionskill-to-tool) if your project has custom LLMFunctionSkills.\n\n\n### Integration with Remote AI Agents\n\nWhile ChatdollKit natively supports simple tool calls, it also provides integration with server-side AI agents to enable more agentic behaviors.\n\nSpecifically, ChatdollKit allows you to call AI agents through RESTful APIs by registering them as an `LLMService`. This lets you send requests and receive responses without needing to be aware of the agentic processes happening behind the scenes.  \nCurrently, [Dify](https:\u002F\u002Fdify.ai) and [AIAvatarKit](https:\u002F\u002Fgithub.com\u002Fuezo\u002Faiavatarkit) are supported. 
You can use them by attaching either `DifyService` or `AIAvatarKitService`, configuring their settings, and enabling the `IsEnabled` flag.\n\n\n## 🎙️ Devices\n\nWe provide a device control mechanism. Currently, microphones and cameras are supported.\n\n### Microphone\n\nThe `MicrophoneManager` component captures audio from the microphone and makes the audio waveform data available to other components. It is primarily intended for use with the SpeechListener, but you can also register and use recording sessions through the `StartRecordingSession` method in custom user-implemented components.\n\nThe following are the settings that can be configured in the inspector.\n\n|Item|Description|\n|----|----|\n|**Sample Rate**|Specifies the sampling rate. Set it to 44100 when using WebGL.|\n|**Noise Gate Threshold DB**|Specifies the noise gate level in decibels. When used with the AIAvatar component, this value is controlled by the AIAvatar component.|\n|**Auto Start**|Starts capturing audio from the microphone when the application launches.|\n|**Is Debug**|Logs microphone start\u002Fstop and mute\u002Funmute actions.|\n\n\n### Camera\n\nWe provide the `SimpleCamera` prefab, which packages features such as image capture, preview display, and camera switching. Since the way cameras are handled varies by device, this is provided experimentally. For details, refer to the prefab and the scripts attached to it.\n\n\n## 🥰 3D Model Control\n\nThe `ModelController` component controls the gestures, facial expressions, and speech of 3D models.\n\n### Idle Animations\n\nIdle animations are looped while the model is waiting. 
To run the desired motion, register it in the state machine of the Animator Controller and configure the transition conditions by setting the parameter name as the `Idle Animation Key` and the value as the `Idle Animation Value` in the `ModelController` inspector.\n\nTo register multiple motions and randomly switch between them at regular intervals, use the `AddIdleAnimation` method in the code as shown below. The first argument is the `Animation` object to be executed, `weight` is the multiplier for the appearance probability, and `mode` is only specified if you want to display the animation in a particular model state. The constructor of the `Animation` class takes the parameter name as the first argument, the value as the second, and the duration (in seconds) as the third.\n\n```csharp\nmodelController.AddIdleAnimation(new Animation(\"BaseParam\", 2, 5f));\nmodelController.AddIdleAnimation(new Animation(\"BaseParam\", 6, 5f), weight: 2);\nmodelController.AddIdleAnimation(new Animation(\"BaseParam\", 99, 5f), mode: \"sleep\");\n```\n\n### Control by Script\n\nThis section is under construction. Essentially, you create an `AnimatedVoiceRequest` object and call `ModelController.AnimatedSay`. The `AIAvatar` internally makes requests that combine animations, expressions, and speech, so refer to that for guidance.\n\n\n## 🎚️ UI Components\n\nWe provide UI component prefabs commonly used in voice-interactive AI character applications. You can use them by simply adding them to the scene. For configuration details, refer to the demo.\n\n- **FPSManager**: Displays the current frame rate. You can also set the target frame rate using this component.\n- **MicrophoneController**: A slider to adjust the microphone's noise gate.\n- **RequestInput**: A text box for inputting requests. It also provides buttons for retrieving images from the file system and for launching the camera.\n- **SimpleCamera**: A component that handles image capture and preview display from the camera. 
You can also capture images without showing the preview.\n\n\n## 🎮 Control from External Programs\n\nYou can send requests to the ChatdollKit application from external programs using socket communication or from JavaScript. This feature enables new use cases such as AI Vtuber streaming, remote avatar customer service, and hybrid character operations combining AI and human interaction.\nAttach `ChatdollKit\u002FScripts\u002FNetwork\u002FSocketServer` to the AIAvatar object and set the port number (e.g., 8080) to control using socket communication, or, attach `ChatdollKit\u002FScripts\u002FIO\u002FJavaScriptMessageHandler` to control from JavaScript.\n\nAdditionally, to handle dialog requests over the network, attach the `ChatdollKit\u002FScripts\u002FDialog\u002FDialogPriorityManager` to the AIAvatar object. To process requests that make the character perform gestures, facial expressions, or speech created by humans instead of AI responses, attach the `ChatdollKit\u002FScripts\u002FModel\u002FModelRequestBroker` to the AIAvatar object.\n\nBelow is a code example for using both of the above components.\n\n```csharp\n\u002F\u002F Configure message handler for remote control\n#pragma warning disable CS1998\n#if UNITY_WEBGL && !UNITY_EDITOR\ngameObject.GetComponent\u003CJavaScriptMessageHandler>().OnDataReceived = async (message) =>\n{\n    HandleExternalMessage(message, \"JavaScript\");\n};\n#else\ngameObject.GetComponent\u003CSocketServer>().OnDataReceived = async (message) =>\n{\n    HandleExternalMessage(message, \"SocketServer\");\n};\n#endif\n#pragma warning restore CS1998\n```\n\n```csharp\nprivate void HandleExternalMessage(ExternalInboundMessage message, string source)\n{\n    \u002F\u002F Assign actions based on the request's Endpoint and Operation\n    if (message.Endpoint == \"dialog\")\n    {\n        if (message.Operation == \"start\")\n        {\n            if (source == \"JavaScript\")\n            {\n                
dialogPriorityManager.SetRequest(message.Text, message.Payloads, 0);\n            }\n            else\n            {\n                dialogPriorityManager.SetRequest(message.Text, message.Payloads, message.Priority);\n            }\n        }\n        else if (message.Operation == \"clear\")\n        {\n            dialogPriorityManager.ClearDialogRequestQueue(message.Priority);\n        }\n    }\n    else if (message.Endpoint == \"model\")\n    {\n        modelRequestBroker.SetRequest(message.Text);\n    }\n}\n```\n\n### ChatdollKit Remote Client\n\nThe `SocketServer` is designed to receive arbitrary information via socket communication, so no official client program is provided. However, Python sample code is available. Please refer to the following and adapt it to other languages or platforms as needed.\n\nhttps:\u002F\u002Fgist.github.com\u002Fuezo\u002F9e56a828bb5ea0387f90cc07f82b4c15\n\nOr, if you want to build an AITuber (AI VTuber), try the AITuber demo with [ChatdollKit AITuber Controller](https:\u002F\u002Fgithub.com\u002Fuezo\u002Fchatdollkit-aituber), which uses `SocketServer` internally.\n\n\n## 🌐 Run on WebGL\n\nRefer to the following tips for now. We are preparing a demo for WebGL.\n\n- It takes 5-10 minutes to build, depending on machine spec.\n- Debugging is very hard because the error message doesn't show the stack trace: `To use dlopen, you need to use Emscripten's linking support, see https:\u002F\u002Fgithub.com\u002Fkripken\u002Femscripten\u002Fwiki\u002FLinking`\n- Built-in async\u002Fawait doesn't work (the app stops at `await`) because JavaScript doesn't support threading. Use [UniTask](https:\u002F\u002Fgithub.com\u002FCysharp\u002FUniTask) instead.\n- CORS is required for HTTP requests.\n- The microphone is not supported. Use `ChatdollMicrophone`, which is compatible with WebGL.\n- Compressed audio formats like MP3 are not supported. Use WAV in the SpeechSynthesizer.\n- OVRLipSync is not supported. 
Use [uLipSync](https://github.com/hecomi/uLipSync) instead.
- Also add the code below to your main script to enable uLipSync:
    ```csharp
    var ul = gameObject.GetComponent<uLipSync.uLipSync>();
    modelController.SpeechController.HandlePlayingSamples = (samples) =>
    {
        ul.OnDataReceived(samples, 1);
    };
    ```
- If you want to show multibyte characters in the message window, add a font that includes multibyte characters to your project and set it on the message windows.


## 🔄 Migration from 0.7.x

The easiest way is to delete `Assets/ChatdollKit` and import the ChatdollKit unitypackage again. If you can't do so for some reason, you can resolve the errors with the following steps:

1. Import the latest version of the ChatdollKit unitypackage. Some errors will be shown in the console.

1. Import ChatdollKit_0.7to084Migration.unitypackage.

1. Add the `partial` keyword to `ModelController`, `AnimatedVoiceRequest` and `Voice`.

1. Replace `OnSayStart` with `OnSayStartMigration` in `DialogController`.

**⚠️Note**: This simply suppresses error outputs and does not enable continued use of legacy code. If any parts of your project still use `DialogController`, `LLMFunctionSkill`, `LLMContentSkill`, or `ChatdollKit`, replace each with the updated component as follows:

- `DialogController`: `DialogProcessor`
- `LLMFunctionSkill`: `Tool`
- `LLMContentSkill`: `LLMContentProcessor`
- `ChatdollKit`: `AIAvatar`


### Migration from FunctionSkill to Tool

If your component inherits from `LLMFunctionSkillBase`, you can easily migrate it to inherit from `ToolBase` by following these steps:

1. Change the inherited class

    Replace `LLMFunctionSkillBase` with `ToolBase` as the base class.

    ```csharp
    // Before
    public class MyFunctionSkill : LLMFunctionSkillBase

    // After
    public class MyFunctionSkill : ToolBase
    ```

1. 
Update the `ExecuteFunction` method signature

    Modify the `ExecuteFunction` method's parameters and return type as follows:

    ```csharp
    // Before
    public UniTask<FunctionResponse> ExecuteFunction(string argumentsJsonString, Request request, State state, User user, CancellationToken token)

    // After
    public UniTask<ToolResponse> ExecuteFunction(string argumentsJsonString, CancellationToken token)
    ```

1. Update the return type of `ExecuteFunction`

    Change `FunctionResponse` to `ToolResponse`.


## ❤️ Thanks

- [uLipSync](https://github.com/hecomi/uLipSync) (LipSync) (c)[hecomi](https://twitter.com/hecomi)
- [UniTask](https://github.com/Cysharp/UniTask) (async/await integration) (c)[neuecc](https://x.com/neuecc)
- [UniVRM](https://github.com/vrm-c/UniVRM/releases/tag/v0.89.0) (VRM) (c)[VRM Consortium](https://x.com/vrm_pr) / (c)[Masataka SUMI](https://x.com/santarh) for MToon


# ChatdollKit
一款3D虚拟助手SDK，可将您的3D模型轻松转化为具备语音功能的聊天机器人。[🇯🇵日语README在此](https://github.com/uezo/ChatdollKit/blob/master/README.ja.md)

- [🐈 实时演示](https://unagiken.blob.core.windows.net/chatdollkit/ChatdollKitDemoWebGL/index.html) 一个基于WebGL的演示。只需说“Hello”即可开始对话。她支持多语言，因此当您想切换语言时，可以说“让我们用日语聊聊”。
- [🍎 iOS应用：OshaberiAI](https://apps.apple.com/us/app/oshaberiai/id6446883638) 这是一款使用ChatdollKit开发的虚拟助手应用：完美融合了通过AI提示工程进行角色创建、可定制的3D VRM模型以及您喜爱的VOICEVOX语音。

<img src="https://oss.gittoolsai.com/images/uezo_ChatdollKit_readme_492a59b524d1.png" width="720">


## ✨ 功能特性

- **原生生成式AI支持**：支持多种大语言模型，如ChatGPT、Anthropic Claude、Google Gemini Pro、Dify等，并具备函数调用（ChatGPT/Gemini）和多模态能力。
- **3D模型表达**：同步语音与动作，自主控制面部表情与动画，支持眨眼和唇形同步。
- 
**对话管理**：集成语音转文本与文本转语音技术（OpenAI、Azure、Google、VOICEVOX \u002F AivisSpeech、Aivis Cloud API、Style-Bert-VITS2等），管理对话状态（上下文），提取意图并路由话题，支持唤醒词检测。\n- **跨平台支持**：兼容Windows、Mac、Linux、iOS、Android以及其他Unity支持的平台，包括VR、AR和WebGL。\n\n\n## 💎 版本0.8.16的新特性\n\n- **🎙️ WebSocket流式STT**：基于WebSocket的流式语音识别将VAD处理卸载到服务器端，并在轮次结束检测时完成识别，从而将整体响应延迟缩短数百毫秒。\n- **🗣️ 抢话支持**：用户现在可以在AI发言过程中随时用语音打断，使对话更加自然流畅、响应迅速。\n- **💃 ModelController重构**：将语音处理逻辑提取至`SpeechController`，面部表情控制提取至`FaceController`，提升了代码的可维护性和扩展性。\n\n\n\u003Cdetails>\n\u003Csummary>🕰️ 历史更新（点击展开）\u003C\u002Fsummary>\n\n### 0.8.15\n\n- **🌏 WebGL增强**：新增Silero VAD支持，支持前后摄像头切换并正确处理宽高比，添加图片文件上传功能，优化麦克风数据传输，并修复静音时的唇形同步问题。  \n- **✨ UI控件改进**：更简洁流畅的UI控件，无需任何配置即可开箱即用——直接拖放到场景中的Canvas上即可。  \n- **🥁 更强的抗噪能力**：结合多种语音活动检测方法（例如Silero VAD + 内置能量型VAD），即使在嘈杂的环境中（如活动现场）也能更好地捕捉用户语音。\n\n\n### 0.8.14\n\n- **🎙️ 回声消除支持**：为支持AEC、降噪等功能的Android、iOS和macOSX设备原生添加麦克风支持，优化语音对话体验。\n- **🗣️ 对话体验提升**：防止因轮次结束误判而导致的对话中断，并通过用户在AI发言时打断时自动调节音量等功能改善对话体验。\n- **💠 平台扩展**：支持Aivis Cloud API TTS、AIAvatarKit TTS\u002FSTT以及GPT-5的`reasoning_effort`参数。\n\n### 0.8.13\n\n- **🥳 Silero VAD支持**：基于ML的语音活动检测大幅提升了嘈杂环境下的轮次结束准确性，使户外或活动现场的对话更加顺畅。\n- **🪄 TTS预处理**：提供可选的文本预处理功能，允许您在合成前微调发音（例如将“OpenAI”转换为片假名）。\n- **🤝 Grok与Gemini兼容性**：移除了OpenAI风格端点中的特定参数，使得Grok、Gemini及其他兼容API的大模型可以直接使用。\n\n\n### 0.8.11和0.8.12\n\n- **🤖 AIAvatarKit后端**：将AI代理逻辑卸载到服务器端，提升前端的可维护性，同时允许您接入AutoGen等框架（以及任何其他代理SDK），实现无限的能力扩展。\n- **🌐 WebGL改进**：升级麦克风采集为现代的`AudioWorkletNode`，以降低延迟并提高可靠性；稳定了静音\u002F取消静音的处理逻辑；改进了错误处理机制，能够立即显示HTTP错误并防止程序卡死；修复了WebGL构建中的API密钥授权问题。\n\n### 0.8.10\n\n- **🌎 动态多语言支持**：系统现在能够在对话过程中自动切换说话和听语的语言。\n- **🔖 长期记忆**：过去的对话历史现在可以存储并检索。我们提供了[ChatMemory](https:\u002F\u002Fgithub.com\u002Fuezo\u002Fchatmemory)组件，但您也可以集成mem0或Zep等服务。\n\n\n### 0.8.8和0.8.9\n\n- **✨ 支持NijiVoice作为语音合成器**：现支持NijiVoice，这是一款基于AI的富有表现力的语音生成服务。\n- **🥰🥳 多AITuber对话支持**：AITuber之间现在可以相互聊天，带来前所未有的动态且引人入胜的互动！\n- **💪 支持Dify作为AITuber的后端**：可无缝对接任意LLM，同时赋予AITuber代理能力，融合先进知识与功能，实现高效且可扩展的运营！\n\n\n### 0.8.7\n\n- **✨ 更新AITuber演示**：支持更多API、批量配置、UI及模式！(v0.8.7)\n\n\n### 
0.8.6

- **🎛️ 支持VOICEVOX和AivisSpeech内联风格**：实现语音风格的动态自主切换，丰富角色表达并适应情感细微变化。
- **🥰 改进VRM运行时加载**：允许在运行时无缝且无错误地切换3D模型，确保更流畅的用户体验。


### 0.8.5

- **🎓 思维链提示**：欢迎使用思维链（CoT）提示！🎉 您的AI角色智商与情商都得到了大幅提升！


### 0.8.4

- **🧩 模块化设计，提升复用性与可维护性**：我们重新组织了关键组件，注重模块化设计，以提高自定义能力和复用性。详情请参阅演示！
- **🧹 移除遗留组件**：移除了过时的组件，简化了工具包并确保与最新功能的兼容性。如果您是从v0.7.x版本升级，请参考[🔄 从0.7.x迁移](#-migration-from-07x)。


### 0.8.3

- **🎧 流式语音监听器**：新增了`AzureStreamSpeechListener`，可在语音说出时实时识别，使对话更加流畅。
- **🗣️ 对话体验提升**：允许用户随时打断角色发言并接替对话，享受带有自然停顿的更具表现力的对话，全面提升体验。
- **💃 更简便的动画注册**：简化了角色动画的注册流程，使代码更整洁易管理。

### 0.8.2

- **🌐 通过 JavaScript 控制 WebGL 角色**：我们在 WebGL 构建中添加了从 JavaScript 控制 ChatdollKit Unity 应用程序的功能。这使得 Unity 应用与基于 Web 的系统之间的交互更加无缝。
- **🗣️ 语音合成器**：引入了一个新的 `SpeechSynthesizer` 组件，以简化文本到语音（TTS）操作。该组件无需 `Model` 包即可在不同项目中重复使用，从而简化维护和复用性。


### 0.8.1

- **🏷️ 支持用户自定义标签**：现在可以在 AI 回答中包含自定义标签，以实现动态操作。例如，在回复中嵌入语言代码，以便在对话过程中实时切换多种语言。
- **🌐 通过 Socket 进行外部控制**：现支持通过 Socket 通信接收外部命令。可以直接控制对话流程、触发特定短语，或控制表情和手势，从而解锁新的应用场景，如 AI 虚拟主播和远程客服。客户端演示请见：https://gist.github.com/uezo/9e56a828bb5ea0387f90cc07f82b4c15

### 0.8 Beta

- **⚡ 优化的 AI 对话处理**：我们通过并行处理提升了响应速度，并使您能够更轻松地使用自己的代码自定义行为。享受更快、更灵活的 AI 对话体验！
- **🥰 富有情感的语音**：根据对话内容动态调整语调，带来更具吸引力和自然的交互体验。
- **🎤 增强的麦克风控制**：麦克风控制现在比以往任何时候都更加灵活！您可以轻松地独立启动或停止设备、静音或取消静音，以及调整语音识别阈值。

</details>


## 🚀 快速入门

您可以通过观看此视频来学习如何设置 ChatdollKit，视频中展示了演示场景（包括与 ChatGPT 的聊天）：https://www.youtube.com/watch?v=rRtm18QSJtc

[![](https://oss.gittoolsai.com/images/uezo_ChatdollKit_readme_a43865f5727d.jpg)](https://www.youtube.com/watch?v=rRtm18QSJtc)

要运行 0.8 版本的演示，请在导入依赖项后按照以下步骤操作：

- 打开场景 `Demo/Demo08`。
- 在场景中选择 `AIAvatarVRM` 对象。
- 将 OpenAI API 密钥设置到检视器中的以下组件：
  - ChatGPTService
  - OpenAISpeechSynthesizer
  - OpenAISpeechListener
- 在 Unity 编辑器中运行。
- 说出“こんにちは”或任何超过 3 个字符的词语。


## 🔖 目录

- [📦 新项目设置](#-setup-new-project)
  - [导入依赖项](#import-dependencies)
  - 
[资源准备](#resource-preparation)\n  - [AIAvatarVRM 预制件](#aiavatarvrm-prefab)\n  - [ModelController](#modelcontroller)\n  - [Animator](#animator)\n  - [AIAvatar](#aiavatar)\n  - [LLM 服务](#llm-service)\n  - [语音服务](#speech-service)\n  - [麦克风控制器](#microphone-controller)\n  - [运行](#run)\n- [🎓 LLM 服务](#-llm-service)\n  - [基本设置](#basic-settings)\n  - [面部表情](#facial-expressions)\n  - [动画](#animations)\n  - [语音中的暂停](#pause-in-speech)\n  - [用户自定义标签](#user-defined-tag)\n  - [多模态](#multi-modal)\n  - [思维链提示](#chain-of-thought-prompting)\n  - [连续请求合并](#consecutive-request-merging)\n  - [时间戳插入](#timestamp-insertion)\n  - [编辑聊天完成请求](#edit-chat-completion-request)\n  - [长期记忆](#long-term-memory)\n- [🗣️ 语音合成器（文本到语音）](#%EF%B8%8F-speech-synthesizer-text-to-speech)\n  - [语音预取模式](#voice-prefetch-mode)\n  - [创建自定义 SpeechSynthesizer](#make-custom-speechsynthesizer)\n  - [性能和质量调优](#performance-and-quality-tuning)\n  - [预处理](#preprocessing)\n- [🎧 语音监听器（语音到文本）](#-speech-listener-speech-to-text)\n  - [AIAvatar 检视器中的设置](#settings-on-aiavatar-inspector)\n  - [降采样](#downsampling)\n  - [使用 AzureStreamSpeechListener](#using-azurestreamspeechlistener)\n  - [使用 Silero VAD](#using-silero-vad)\n  - [结合使用多个 VAD](#using-multiple-vads-combination)\n  - [回声消除](#echo-cancelling)\n  - [自定义打断条件](#custom-barge-in-condition)\n- [⏰ 唤醒词检测](#-wake-word-detection)\n  - [唤醒词](#wake-words)\n  - [取消词](#cancel-words)\n  - [打断词](#interrupt-words)\n  - [忽略词](#ignore-words)\n  - [唤醒时长](#wake-length)\n- [⚡️ AI 代理（工具调用）](#%EF%B8%8F-ai-agent-tool-call)\n- [🎙️ 设备](#%EF%B8%8F-devices)\n  - [麦克风](#microphone)\n  - [相机](#camera)\n- [🥰 3D 模型控制](#-3d-model-control)\n  - [空闲动画](#idle-animations)\n  - [脚本控制](#control-by-script)\n- [🎚️ UI 组件](#%EF%B8%8F-ui-components)\n- [🎮 外部程序控制](#-control-from-external-programs)\n  - [ChatdollKit 远程客户端](#chatdollkit-remote-client)\n- [🌐 在 WebGL 上运行](#-run-on-webgl)\n- [🔄 从 0.7.x 迁移](#-migration-from-07x)\n- [❤️ 致谢](#%EF%B8%8F-thanks)\n\n\n## 📦 新项目设置\n\n以下是使用 VRM 模型进行设置的步骤。有关使用 VRChat 模型的说明，请参阅 
[README v0.7.7](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fblob\u002Fv0.7.7\u002FREADME.md#-modelcontroller)。\n\n**⚠️ 注意**：请勿使用 Unity 中的 SRP（可编程渲染管线）项目模板。ChatdollKit 所依赖的 UniVRM 不支持 SRP。\n\n### 导入依赖项\n\n下载最新版本的 [ChatdollKit.unitypackage](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Freleases)，并在导入依赖项后将其导入您的 Unity 项目中；\n\n- 从 Unity 包管理器中安装 `Burst`（窗口 > 包管理器）\n- [UniTask](https:\u002F\u002Fgithub.com\u002FCysharp\u002FUniTask)（测试于 Ver.2.5.4）\n- [uLipSync](https:\u002F\u002Fgithub.com\u002Fhecomi\u002FuLipSync)（测试于 v3.1.0）\n- [UniVRM](https:\u002F\u002Fgithub.com\u002Fvrm-c\u002FUniVRM\u002Freleases\u002Ftag\u002Fv0.127.2)（v0.127.2）\n- [ChatdollKit VRM 扩展](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Freleases)\n- JSON.NET：如果您的项目没有 JSON.NET，请从包管理器 > [+] > 从 Git URL 添加包... > com.unity.nuget.newtonsoft-json 添加。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_4615b678a861.png\" width=\"640\">\n\n\n### 资源准备\n\n将 3D 模型添加到场景中，并根据需要进行调整。同时安装 3D 模型所需的着色器等资源。\n\n此外，还需导入动画片段。在本 README 中，我使用了 [Anime Girls Idle Animations Free](https:\u002F\u002Fassetstore.unity.com\u002Fpackages\u002F3d\u002Fanimations\u002Fanime-girl-idle-animations-free-150406)，这也是演示中使用的资源。我认为购买专业版是值得的👍\n\n\n### AIAvatarVRM 预制件\n\n将 `ChatdollKit\u002FPrefabs\u002FAIAvatarVRM` 预制件添加到场景中。同时创建 EventSystem 以使用 UI 组件。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_73f78211661a.png\" width=\"640\">\n\n\n### ModelController\n\n在 ModelController 的上下文菜单中选择 `Setup ModelController`。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_e480e297313a.png\" width=\"640\">\n\n### 动画师\n\n在 ModelController 的上下文菜单中选择 `Setup Animator`，然后选择包含动画剪辑的文件夹或其父文件夹。在此示例中，将 `01_Idles` 和 `03_Others` 文件夹中的动画剪辑放置到 `Base Layer` 上以进行覆盖混合，将 `02_Layers` 文件夹中的动画剪辑放置到 `Additive Layer` 上以进行叠加混合。\n\n\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_5d424d96e638.gif\" width=\"640\">\n\n接下来，在您选择的文件夹中查看新创建的 AnimatorController 的 `Base Layer`。确认用于过渡到您希望设置为 idle 动画的状态的值。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_ab49c9711c03.png\" width=\"640\">\n\n最后，在 ModelController 的检视器中将该值设置为 `Idle Animation Value`。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_6a7b2f17c7b0.png\" width=\"640\">\n\n\n### AIAvatar\n\n在 `AIAvatar` 的检视器中，设置用于开始对话的唤醒词（例如 hello \u002F こんにちは🇯🇵）、用于停止对话的取消词（例如 stop \u002F おしまい🇯🇵），以及发生错误时显示的错误语音和错误表情（例如 Something wrong \u002F 調子が悪いみたい🇯🇵）。\n\n`前缀\u002F后缀允许长度` 是唤醒词前后允许添加的字符长度。例如，如果唤醒词是 “Hello”，允许长度为 4 个字符，则短语 “Ah, Hello!” 仍会被识别为唤醒词。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_a89004dddc8b.png\" width=\"640\">\n\n\n### LLM 服务\n\n从 `ChatdollKit\u002FScripts\u002FLLM` 中附加与 LLM 服务相对应的组件，并设置 API 密钥、系统提示等必填字段。本示例使用 ChatGPT，但该框架也支持 Claude、Gemini 和 Dify。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_3d768b5582ba.png\" width=\"640\">\n\n\n### 语音服务\n\n从 `ChatdollKit\u002FScripts\u002FSpeechListener` 中附加 `SpeechListener` 组件用于语音识别，从 `ChatdollKit\u002FScripts\u002FSpeechSynthesizer` 中附加 `SpeechSynthesizer` 组件用于语音合成。配置 API 密钥、语言代码等必要字段。在 SpeechListener 设置中启用 `PrintResult` 将会把识别出的语音输出到日志中，这对于调试非常有用。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_adea1c55010f.png\" width=\"640\">\n\n\n### 麦克风控制器\n\n将 `ChatdollKit\u002FPrefabs\u002FRuntime\u002FMicrophoneController` 添加到您的场景中。这提供了一个 UI 界面，用于调整语音识别的最小音量。如果环境比较嘈杂，您可以将其向左滑动以过滤掉背景噪声。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_readme_e744a1222cd3.png\" width=\"640\">\n\n\n### 运行\n\n按下 Unity 编辑器的播放按钮。您可以看到模型会以空闲动画和眨眼动作开始运行。\n\n- 如有必要，调整麦克风音量滑块。\n- 对着麦克风说出您在检视器中设置的唤醒词。（例如 hello \u002F こんにちは🇯🇵）\n- 
您的模型将会回复 “Hi there!” 或其他内容。

<img src="https://oss.gittoolsai.com/images/uezo_ChatdollKit_readme_7f8c56014437.png" width="640">

尽情享受吧👍


## 🎓 LLM 服务

### 基本设置

我们支持 ChatGPT、Claude、Gemini 和 Dify 作为文本生成 AI 服务。目前还实验性地支持 Command R，但其运行尚不稳定。要使用 LLM 服务，需从 `ChatdollKit/Scripts/LLM` 中将您想要使用的 LLMService 组件附加到 AIAvatar 对象上，并勾选 `IsEnabled` 复选框。如果已经附加了其他 LLMService，请确保取消勾选您不打算使用的那些的 `IsEnabled` 复选框。

您可以在检视器中直接对附加的 LLMService 配置 API 密钥、系统提示等参数。有关这些参数的更多详细信息，请参阅相应 LLM 服务的 API 文档。

注意：要使用与 OpenAI 兼容的 API，除了上述设置外，还需勾选 `IsOpenAPICompatibleAPI` 并设置 `ChatCompletionURL`。

- Gemini: https://generativelanguage.googleapis.com/v1beta/chat/completions
- Grok: https://api.x.ai/v1/chat/completions


### 面部表情

您可以根据对话内容自主控制面部表情。

要控制表情，可在 AI 的回复中加入类似 `[face:ExpressionName]` 的标签，这些标签可以通过系统提示进行设置。以下是一个系统提示的示例：

```
你有五种表情：“喜悦”、“愤怒”、“悲伤”、“有趣”和“惊讶”。
如果你想表达某种特定的情绪，请将其插入句子开头，例如 [face:喜悦]。

示例：
[face:喜悦]嘿，你能看到大海！[face:有趣]我们去游泳吧。
```

表情名称必须是 AI 能够理解的。请确保它们与 VRM 模型中定义的表情完全一致，包括大小写。

### 动画

你还可以根据对话内容自主控制手势（称为动画）。要控制动画，可以在AI回复中加入类似 `[anim:AnimationName]` 的标签，并在系统提示中设置相关指令。以下是一个示例：

```
你可以通过以下动画来表达情感：

- angry_hands_on_waist
- brave_hand_on_chest
- calm_hands_on_back
- concern_right_hand_front
- energetic_right_fist_up
- energetic_right_hand_piece
- pitiable_right_hand_on_back_head
- surprise_hands_open_front
- walking
- waving_arm
- look_away
- nodding_once
- swinging_body

如果你想用手势表达情感，只需在回复消息中插入相应的动画，例如 [anim:waving_arm]。
示例：
[anim:waving_arm]嘿，我在这儿！
```

动画名称必须清晰明确，以便AI能够理解你想要表达的手势。

要将指定的动画名称与 `Animator Controller` 中定义的动画关联起来，可以通过代码在 `ModelController` 中进行注册，如下所示：

```csharp
// 基础层
modelController.RegisterAnimation("angry_hands_on_waist", new Model.Animation("BaseParam", 0, 3.0f));
modelController.RegisterAnimation("brave_hand_on_chest", new Model.Animation("BaseParam", 1, 
3.0f));
modelController.RegisterAnimation("calm_hands_on_back", new Model.Animation("BaseParam", 2, 3.0f));
modelController.RegisterAnimation("concern_right_hand_front", new Model.Animation("BaseParam", 3, 3.0f));
modelController.RegisterAnimation("energetic_right_fist_up", new Model.Animation("BaseParam", 4, 3.0f));
modelController.RegisterAnimation("energetic_right_hand_piece", new Model.Animation("BaseParam", 5, 3.0f));
modelController.RegisterAnimation("pitiable_right_hand_on_back_head", new Model.Animation("BaseParam", 7, 3.0f));
modelController.RegisterAnimation("surprise_hands_open_front", new Model.Animation("BaseParam", 8, 3.0f));
modelController.RegisterAnimation("walking", new Model.Animation("BaseParam", 9, 3.0f));
modelController.RegisterAnimation("waving_arm", new Model.Animation("BaseParam", 10, 3.0f));
// 叠加层
modelController.RegisterAnimation("look_away", new Model.Animation("BaseParam", 6, 3.0f, "AGIA_Layer_look_away_01", "Additive Layer"));
modelController.RegisterAnimation("nodding_once", new Model.Animation("BaseParam", 6, 3.0f, "AGIA_Layer_nodding_once_01", "Additive Layer"));
modelController.RegisterAnimation("swinging_body", new Model.Animation("BaseParam", 6, 3.0f, "AGIA_Layer_swinging_body_01", "Additive Layer"));
```

如果你使用 Anime Girls Idle Animations 或其免费版本，可以更方便地注册动画：

```csharp
modelController.RegisterAnimations(AGIARegistry.GetAnimations(animationCollectionKey));
```


### 语音停顿

你可以在角色的语音中插入停顿，使对话听起来更加自然和人性化。要控制停顿的长度，可以在AI回复中加入类似 `[pause:seconds]` 的标签，这些标签可以通过系统提示进行设置。指定的秒数可以是浮点值，从而精确控制对话中该位置的停顿时长。以下是一个系统提示的示例：

```
你可以在角色的语音中插入停顿，以使对话显得更加自然和人性化。

示例：
嘿，外面今天天气真好！[pause:1.5] 你觉得我们该做些什么呢？
```


### 用户自定义标签

除了表情和动画之外，你还可以根据开发者定义的标签执行特定操作。在系统提示中包含插入标签的指令，并实现 `HandleExtractedTags` 方法。以下是在对话过程中切换房间灯光开关的一个示例：

```
如果你想切换房间灯光的开关状态，可以在回复中插入类似 [light:on] 
的语言标签。\n\n示例：\n[light:off]好的，那我就把灯关了。晚安。\n```\n\n```csharp\ndialogProcessor.LLMServiceExtensions.HandleExtractedTags = (tags, session) =>\n{\n    if (tags.ContainsKey(\"light\"))\n    {\n        var lightCommand = tags[\"light\"];\n        if (lightCommand.ToLower() == \"on\")\n        {\n            \u002F\u002F 打开灯光\n            Debug.Log($\"打开灯光\");\n        }\n        else if (lightCommand.ToLower() == \"off\")\n        {\n            \u002F\u002F 关闭灯光\n            Debug.Log($\"关闭灯光\");\n        }\n        else\n        {\n            Debug.LogWarning($\"无法处理的命令：{lightCommand}\");\n        }\n    }\n};\n```\n\n\n### 多模态\n\n你可以将来自摄像头或文件的图像包含在发送给 LLM 的请求中。在 `DialogProcessor.StartDialogAsync` 方法的 `payloads` 参数中，以 `imageBytes` 为键传递图像二进制数据。此外，你还可以让系统根据用户的语音自动捕获图像。为此，需在系统提示中配置 AI 回复中的 `[vision:camera]` 标签，并在 LLM 服务中实现当接收到该标签时触发图像采集的流程。\n\n```\n你可以使用摄像头来获取你所看到的内容。\n当用户希望你观察某样东西时，可以在你的回复中插入 [vision:camera]。\n\n示例：\n用户：看！我今天刚买的。\n助手：[vision:camera]让我看看。\n```\n\n```csharp\ngameObject.GetComponent\u003CChatGPTService>().CaptureImage = async (source) =>\n{\n    if (simpleCamera != null)\n    {\n        try\n        {\n            return await simpleCamera.CaptureImageAsync();\n        }\n        catch (Exception ex)\n        {\n            Debug.LogError($\"CaptureImageAsync 出错：{ex.Message}\\n{ex.StackTrace}\");\n        }\n    }\n\n    return null;\n};\n```\n\n\n### 思维链提示法\n\n思维链提示法（CoT）是一种提升 AI 表现的技术。有关 CoT 的更多信息及提示示例，请参阅 https:\u002F\u002Fdocs.anthropic.com\u002Fen\u002Fdocs\u002Fbuild-with-claude\u002Fprompt-engineering\u002Fchain-of-thought 。ChatdollKit 支持思维链提示法，它会将 `\u003Cthinking> ~ \u003C\u002Fthinking>` 标签包裹的句子排除在语音合成之外。\n\n你可以通过在 `LLMContentProcessor` 的检视器中设置首选词（例如“reason”）作为 `ThinkTag` 来自定义该标签。\n\n### 连续请求合并\n\n当用户以短促的语句进行对话时，系统可能会在短时间内接收到多个快速连续的请求，导致 AI 对每个片段分别作出响应。通过启用“连续请求合并”功能，可以将这些零散的输入合并为一个完整的请求，从而使 AI 能够一次性理解并回应用户的完整意图。\n\n请在检视器中对 `DialogProcessor` 组件进行如下配置：\n\n|项|描述|\n|----|----|\n|**合并请求阈值**|用于合并连续请求的时间窗口（单位：秒）。如果在前一个请求之后的该时间间隔内又收到新的请求，则会将这两个请求合并。设置为 
`0` 可禁用此功能（默认值：`0`）。|\n|**合并请求前缀**|添加到合并后请求前面的文本前缀，用于指示 AI 忽略之前的不完整响应（默认：`\"之前的用户请求及您的回复已被取消。请针对以下请求重新作答：\"`）。|\n\n当发生合并时，系统会将前一个请求的文本与新请求的文本拼接在一起，并在前面加上指定的前缀，以告知 AI 前一响应已被取消。合并后的文本不会显示在用户消息窗口中——用户只会看到他们最新的发言内容。\n\n\n### 时间戳插入\n\n您可以按照设定的时间间隔，自动将当前日期和时间插入发送给 LLM 的请求中。这样，AI 角色无需依赖工具调用即可感知当前时间，从而实现基于时间的响应，例如根据一天中的不同时间段给出相应的问候语。\n\n请在检视器中对 `DialogProcessor` 组件进行如下配置：\n\n|项|描述|\n|----|----|\n|**时间戳插入间隔**|两次时间戳插入之间的时间间隔（单位：秒）。设置为 `0` 可禁用此功能（默认值：`0`）。|\n|**时间戳前缀**|添加到时间戳前面的文本前缀（默认：`\"当前日期和时间： \"`）。|\n\n启用后，当前日期和时间将以 `当前日期和时间： 2026\u002F02\u002F14 14:30:00` 的格式插入到用户的请求文本之前，然后再发送给 LLM。时间戳同样不会显示在用户消息窗口中。\n\n\n### 编辑 ChatGPT API 请求\n\n您可以通过设置 `ChatGPTService` 上的 `EditChatCompletionRequest` 来自定义 ChatGPT API 调用的 HTTP 请求头和请求体。此委托会在请求被序列化并发送之前被调用，使您能够完全控制 `UnityWebRequest` 对象以及请求数据字典。\n\n```csharp\nvar chatGPTService = gameObject.GetComponent\u003CChatGPTService>();\n\nchatGPTService.EditChatCompletionRequest = (request, data) =>\n{\n    \u002F\u002F 添加或修改请求头\n    request.SetRequestHeader(\"X-Custom-Header\", \"my-value\");\n\n    \u002F\u002F 添加或修改请求体参数\n    data[\"custom_param\"] = \"some_value\";\n\n    \u002F\u002F 修改消息内容（例如移除历史记录，仅保留最后一条用户消息）\n    var messages = (List\u003CILLMMessage>)data[\"messages\"];\n    messages.RemoveRange(0, messages.Count - 1);\n};\n```\n\n**参数说明：**\n\n|参数|类型|描述|\n|----|----|-----|\n|`request`|`UnityWebRequest`|HTTP 请求对象。使用 `SetRequestHeader()` 方法来添加或覆盖请求头。|\n|`data`|`Dictionary\u003Cstring, object>`|请求体字典。直接修改它即可改变发送给 API 的 JSON 数据负载。|\n\n**示例：集成 OpenClaw**\n\n以下示例展示了如何添加自定义会话头、清除对话历史仅保留最新用户消息，并在文本内容前添加 `[channel:voice]` 标签：\n\n```csharp\nchatGPTService.EditChatCompletionRequest = (request, data) =>\n{\n    request.SetRequestHeader(\"x-openclaw-session-key\", \"agent:main:main\");\n\n    var messages = (List\u003CILLMMessage>)data[\"messages\"];\n    messages.RemoveRange(0, messages.Count - 1);\n\n    \u002F\u002F 在用户消息的文本内容前添加 [channel:voice] 前缀\n    if (messages.Last() is ChatGPTUserMessage userMessage)\n    {\n        var textPart = 
userMessage.content.OfType<TextContentPart>().FirstOrDefault();
        if (textPart != null)
        {
            textPart.text = "[channel:voice]" + textPart.text;
        }
        else
        {
            userMessage.content.Insert(0, new TextContentPart("[channel:voice]"));
        }
    }
};
```


### 长期记忆

ChatdollKit 本身并不具备内置的长期记忆管理机制。然而，通过实现 `OnStreamingEnd` 回调，可以逐步积累记忆信息。此外，借助能够检索存储记忆的工具，系统还可以在对话中调用并引用这些记忆。

以下是一个使用 [ChatMemory](https://github.com/uezo/chatmemory) 的示例。

首先，为了存储记忆，请将 `Extension/ChatMemory/ChatMemoryIntegrator` 组件附加到主游戏对象上，并设置 ChatMemory 服务的 URL 以及一个用户 ID。用户 ID 可以是任意值，但如果您正在构建面向多用户的服务，则务必从代码层面为每个用户分配一个唯一标识符。

接下来，在适当的位置（例如 `Main` 脚本）添加以下代码，以便在 LLM 流式传输结束时，将请求和响应消息作为对话历史存储到 ChatMemory 中：

```csharp
using ChatdollKit.Extension.ChatMemory;

var chatMemory = gameObject.GetComponent<ChatMemoryIntegrator>();
dialogProcessor.LLMServiceExtensions.OnStreamingEnd += async (text, payloads, llmSession, token) =>
{
    chatMemory.AddHistory(llmSession.ContextId, text, llmSession.CurrentStreamBuffer, token).Forget();
};
```


要检索记忆并将其融入对话中，只需将 `Extension/ChatMemory/ChatMemoryTool` 组件添加到主游戏对象上即可。

**注意：** ChatMemory 管理的是所谓的情景记忆。此外还存在一种称为“知识”的实体，它对应于事实性信息，但并不会自动提取或存储。您需要根据需求手动处理。（默认情况下，它会被纳入搜索范围。）


## 🗣️ 语音合成器（文本转语音）

我们支持多种基于云的语音合成服务，如 Google、Azure、OpenAI 和 Watson；同时，也支持 VOICEVOX / AivisSpeech、Aivis Cloud API、VOICEROID 以及 Style-Bert-VITS2 等更具个性和表现力的语音引擎。要使用语音合成服务，只需将 `ChatdollKit/Scripts/SpeechSynthesizer` 中的 `SpeechSynthesizer` 组件附加到 AIAvatar 对象上，并勾选 `IsEnabled` 复选框。如果同时附加了其他 `SpeechSynthesizer` 组件，请确保将未使用的组件的 `IsEnabled` 复选框取消勾选。

您可以在检视器中为已附加的 `SpeechSynthesizer` 组件配置 API 密钥、端点等参数。有关这些参数的详细信息，请参阅各 TTS 服务的 API 文档。

### 语音预取模式

`语音预取模式`决定了语音合成请求的管理和处理方式。默认情况下，系统以并行模式运行。支持以下几种模式：

1. **并行（默认）**：在此模式下，多个语音合成请求会同时发送并被处理。这确保了在短时间内连续生成多个语音输出时具有最快的响应速度。当延迟至关重要且有足够的资源进行并行处理时，请使用此模式。
1. 
**串行**：请求会按照入队顺序依次处理。此模式非常适合资源有限的情况，或需要严格保证语音输出顺序的场景。它避免了潜在的并发问题，但可能会导致后续请求的等待时间较长。\n1. **禁用**：在此模式下不会执行任何预取操作。语音合成仅在显式触发时才会发生，因此适用于资源非常有限的场景，或者在不需要预取的情况下。\n\n您可以在 SpeechSynthesizer 组件的检视器中更改 `语音预取模式`。请确保所选模式与您的性能和资源管理需求相匹配。\n\n\n### 创建自定义 SpeechSynthesizer\n\n您可以轻松创建并使用自定义的 `SpeechSynthesizer` 来集成您偏好的文本转语音服务。只需创建一个继承自 `ChatdollKit.SpeechSynthesizer.SpeechSynthesizerBase` 的类，并实现异步方法 `DownloadAudioClipAsync`，该方法接收一个 `string text` 和一个 `Dictionary\u003Cstring, object> parameters` 参数，返回可在 Unity 中播放的 `AudioClip` 对象。\n\n```csharp\nUniTask\u003CAudioClip> DownloadAudioClipAsync(string text, Dictionary\u003Cstring, object> parameters, CancellationToken cancellationToken)\n```\n\n请注意，WebGL 不支持压缩音频的播放，因此请根据平台的不同调整代码以妥善处理这一问题。\n\n\n### 性能与质量调优\n\n为了获得快速的响应时间，我们不是将整个回复消息一次性合成为语音，而是根据标点符号将文本拆分成更小的部分，并逐步合成和播放每个片段。虽然这样做可以显著提升性能，但如果过度拆分文本，可能会降低语音质量，尤其是在使用基于 AI 的语音合成技术（如 Style-Bert-VITS2）时，从而影响语气和流畅度。\n\n您可以通过调整 `LLMContentProcessor` 组件检视器中的文本拆分方式来平衡性能和语音质量。\n\n|项目|描述|\n|----|----|\n|**拆分字符**|用于拆分文本以进行合成的字符。语音合成始终会在这些位置执行。|\n|**可选拆分字符**|可选的拆分字符。通常情况下，文本不会在这些位置拆分，但如果文本长度超过“可选拆分前的最大长度”设置的值，则会在这些位置拆分。|\n|**可选拆分前的最大长度**|当文本长度达到该阈值时，可选拆分字符将被用作拆分点。|\n\n\n### 预处理\n\n实现 `SpeechSynthesizer.PreprocessText` 方法，对要合成的文本进行预处理。\n\n接口：\n\n```csharp\nFunc\u003Cstring, Dictionary\u003Cstring, object>, CancellationToken, UniTask\u003Cstring>> PreprocessText;\n```\n\n\n## 🎧 语音监听器（语音转文本）\n\n我们支持基于云的语音识别服务，例如 Google、Azure 和 OpenAI。要使用这些服务，只需将 `ChatdollKit\u002FScripts\u002FSpeechListener` 中的 `SpeechListener` 组件附加到 AIAvatar 对象上即可。请注意，如果同时附加了多个 SpeechListener，它们会并行运行，因此请确保只启用您希望使用的那一个。\n\n您可以在检视器中为附加的 SpeechListener 配置 API 密钥、端点等参数。有关这些参数的详细信息，请参阅相应 STT 服务和产品的 API 文档。\n\n大多数 `录音机设置` 由稍后介绍的 `AIAvatar` 组件控制，因此除下列设置外，检视器中的其他设置将被忽略。\n\n|项目|描述|\n|----|-----------|\n|**自动启动**|启用后，应用程序启动时会自动开始语音识别。|\n|**打印结果**|启用后，会将识别出的语音转录内容输出到控制台。|\n\n### AIAvatar 检查器中的设置\n\n与 SpeechListener 相关的大多数设置都在 `AIAvatar` 
组件的检查器中进行配置。\n\n|项目|描述|\n|---|---|\n|**对话超时**|在对话被视为结束之前的等待时间（秒）。超过此时间段后，将切换到空闲模式，并隐藏消息窗口。要恢复对话，必须再次识别唤醒词。|\n|**空闲超时**|从空闲模式切换到睡眠模式之前的等待时间（秒）。默认情况下，空闲模式和睡眠模式之间没有区别，但可以通过用户自定义实现来切换不同的语音识别方法或空闲动画。|\n|**语音识别阈值 dB**|用于语音识别的音量阈值（分贝）。低于此阈值的声音将不会被识别。|\n|**语音识别提升阈值 dB**|用于检测更响亮语音的提升阈值（分贝）。当“麦克风静音方式”设置为“阈值”时，会使用此设置。|\n|**对话静默持续时间阈值**|如果检测到的静默时间超过此值，则停止录音并进行语音识别。|\n|**对话最小录音时长**|只有当录制的声音超过此时长时才会进行语音识别。这有助于忽略短促的噪音，防止误识别。|\n|**对话最大录音时长**|如果录制的声音超过此时间，则不会进行语音识别，而是直接忽略该录音。这样可以避免过长的录音给语音识别带来过大负担。|\n|**空闲静默持续时间阈值**|空闲模式下停止录音所需的静默时间（秒）。为了在等待唤醒词时平滑地检测到短暂的静默，通常会设置较小的值。|\n|**空闲最小录音时长**|空闲模式下的最小录音时长。相比对话模式，此处会设置更小的值，以便更流畅地检测短语。|\n|**空闲最大录音时长**|空闲模式下的最大录音时长。由于唤醒词通常较短，因此会比对话模式设置更短的时长。|\n|**麦克风静音方式**|用于在语音过程中防止角色说话被识别的方法。\u003Cbr>\u003Cbr>- 无：不执行任何操作。\u003Cbr>- 阈值：将语音识别阈值提升至“语音识别提升阈值 dB”。\u003Cbr>- 静音：忽略来自麦克风的输入声音。\u003Cbr>- 停止设备：停止麦克风设备。\u003Cbr>- 停止监听器：停止监听器。**当您使用 AzureStreamSpeechListener 时，请选择此项**|\n|**打断响应**|启用时（默认：true），当 SpeechListener 检测到用户开始讲话（打断）时，AI 的语音会自动停止。触发条件取决于 SpeechListener 类型：对于非流式监听器，当语音录音超过最小时长（默认：1.5 秒）时触发；对于流式监听器（AIAvatarKitStream、AzureStream），当识别出的文本达到最小长度（默认：2 个字符）时触发。您可以通过在 SpeechListener 上设置 `BargeInCondition` 来覆盖触发条件。将其设置为 false 可以禁用此功能。|\n\n\n**注意：** `AzureStreamSpeechListener` 不具备上述某些属性，因为它由 SDK DLL 在内部控制麦克风。\n\n\n### 下采样\n\n`SpeechListener` 类支持在将原始麦克风输入发送到 STT 服务之前，将其下采样到较低的采样率。此功能有助于减少音频负载大小，从而在带宽有限的网络上实现更流畅的转录。\n\n您可以在 SpeechListener 组件的检查器中找到 **目标采样率**（整数）字段：\n\n- 设置为 `0`（默认）以使用原始采样率（不进行下采样）。  \n- 设置为正整数（例如 `16000`）以将输入下采样到该采样率（单位：Hz）。\n\n\n### 使用 AIAvatarKitStreamSpeechListener\n\n[AIAvatarKit](https:\u002F\u002Fgithub.com\u002Fuezo\u002Faiavatarkit) 是一个提供语音到语音处理管道的框架，同时也提供了一个流式语音识别服务器。通过将 Silero VAD（也用于 ChatdollKit）用于话轮结束检测，并结合任何语音识别引擎，您可以构建一个针对特定应用场景的实时语音识别服务器。\n\n要使用此功能，请将 `AIAvatarKitStreamSpeechListener` 组件附加到您的 AIAvatar 对象，并配置连接 URL。然后，在主逻辑初始化中（例如在 `Start()` 方法中，或在 `Awake()` 之后运行的任何流程中），添加以下语音显示处理代码：\n\n```csharp\n\u002F\u002F 使用 AIAvatarKitStreamSpeechListener 部分显示 AI 消息\nvar aiavatarKitStreamSpeechListener = 
gameObject.GetComponent\u003CAIAvatarKitStreamSpeechListener>();\nif (aiavatarKitStreamSpeechListener != null)\n{\n    var userMessageWindow = (SimpleMessageWindow)aiAvatar.UserMessageWindow;\n\n    \u002F\u002F 禁用文本动画，因为部分结果是实时流式传输的\n    userMessageWindow.IsTextAnimated = false;\n    \u002F\u002F 流式显示时，较短的 PostGap 即可，优先考虑响应速度\n    userMessageWindow.PostGap = 0.2f;\n\n    \u002F\u002F 在第一轮结束后手动隐藏用户消息窗口，\n    \u002F\u002F 因为用户消息窗口不由正常的对话流程管理\n    var originalOnRecognized = aiavatarKitStreamSpeechListener.OnRecognized;\n    aiavatarKitStreamSpeechListener.OnRecognized = async (text) => {\n        if (originalOnRecognized != null)\n        {\n            await originalOnRecognized(text);\n        }\n        if (aiAvatar.Mode != AIAvatar.AvatarMode.Conversation)\n        {\n            await UniTask.Delay((int)(userMessageWindow.PostGap * 1000));\n            aiAvatar.UserMessageWindow?.Hide();\n        }\n    };\n\n    \u002F\u002F 显示部分识别结果\n    aiavatarKitStreamSpeechListener.OnPartialRecognized = (partialText) => {\n        if (!string.IsNullOrEmpty(partialText))\n        {\n            aiAvatar.UserMessageWindow.Show(partialText);\n        }\n    };\n}\n```\n\n**注意**：在 WebGL 构建中使用 Silero VAD 可能会导致浏览器处理开销过高。我们建议使用 `AIAvatarKitStreamSpeechListener` 将 VAD 处理卸载到服务器端。\n\n**注意**：要在 WebGL 构建中使用 `AIAvatarKitStreamSpeechListener`，请通过包管理器添加 NativeWebSocket 包：`https:\u002F\u002Fgithub.com\u002Fendel\u002FNativeWebSocket.git#upm`。Unity 自带的 WebSocket 客户端（`System.Net.WebSockets`）在 WebGL 上不受支持。\n\n### 使用 AzureStreamSpeechListener\n\n使用 `AzureStreamSpeechListener` 时，部分设置与其他 SpeechListener 不同。这是因为 `AzureStreamSpeechListener` 会通过 SDK 在内部控制麦克风，并以增量方式执行转录。\n\n**麦克风静音条件**：选择 `停止监听器`。如果未进行此设置，角色将听到自己的语音，从而打断对话。\n\n**用户消息窗口**：取消勾选 `文本动画显示`，并将 `前置间隔` 设置为 `0`，`后置间隔` 设置为约 `0.2`。\n\n**Update()**：为了逐步显示识别出的文本，请在 Update() 方法中添加以下代码：\n\n\n```csharp\nif (aiAvatar.Mode == AIAvatar.AvatarMode.Conversation)\n{\n    if (!string.IsNullOrEmpty(azureStreamSpeechListener.RecognizedTextBuffer))\n    {\n  
      aiAvatar.UserMessageWindow.Show(azureStreamSpeechListener.RecognizedTextBuffer);\n    }\n}\n```\n\n\n### 使用 Silero VAD\n\nSilero VAD 是一种基于机器学习的语音活动检测模型。通过使用它，即使在嘈杂环境中也能准确判断是否有人声，与仅依赖麦克风音量的语音活动检测相比，显著提高了在噪声环境下的端点检测精度。\n\n使用步骤如下：\n\n- 导入 [onnxruntime-unity](https:\u002F\u002Fgithub.com\u002Fasus4\u002Fonnxruntime-unity)。按照 GitHub 上的说明编辑 manifest.json 文件。\n- 下载 [Silero VAD ONNX 模型](https:\u002F\u002Fgithub.com\u002Fsnakers4\u002Fsilero-vad\u002Ftree\u002Fmaster\u002Fsrc\u002Fsilero_vad\u002Fdata)，并将其放置在 StreamingAssets 文件夹中，文件名为 `silero_vad.onnx`。\n- 下载并导入 ChatdollKit 的 SileroVADExtension。\n- 将 `SileroVADProcessor` 挂载到已附加 SpeechListener 的对象上。\n- 在任意 MonoBehaviour 组件的 Awake 方法中，将其设置为 SpeechListener 的语音检测函数。\n    ```csharp\n    var sileroVad = gameObject.GetComponent\u003CSileroVADProcessor>();\n    sileroVad.Initialize();\n    var speechListener = gameObject.GetComponent\u003CSpeechListenerBase>();\n    speechListener.DetectVoiceFunc = sileroVad.IsVoiced;\n    ``` \n- 如有需要，可在场景中放置 SileroVADMicrophoneButton。\n\n执行后，系统将使用 Silero VAD 进行语音活动检测。\n\n\n### 使用多种 VAD 组合\n\nChatdollKit 支持组合使用多种类型的 VAD。例如，可以将能够在嘈杂环境中仅识别人声的 Silero VAD 与仅捕捉响亮声音的内置能量型 VAD 结合使用。这样，在活动现场既能准确捕捉用户的语音，又能部分过滤掉周围的噪音和场地广播。\n\n要使用多种 VAD，需将多个语音检测函数添加到 `DetectVoiceFunctions` 中，而不是 `DetectVoiceFunc`。\n\n```csharp\nspeechListener.DetectVoiceFunctions = new List\u003CFunc\u003Cfloat[], float, bool>>()\n{\n    sileroVad.IsVoiced, speechListener.IsVoiceDetectedByVolume\n};\n```\n\n\n### 回声消除\n\nUnity 内置的 Microphone API 不支持回声消除功能。若需启用该功能，可使用特定平台的原生麦克风插件。\n\n```csharp\nprivate void Awake()\n{\n    var microphoneManager = gameObject.GetComponent\u003CMicrophoneManager>();\n    \n    \u002F\u002F 首先导入 ChatdollKit_NativeMicrophone 包\n    \u002F\u002F 然后根据平台设置相应的提供者：\n    \n    \u002F\u002F iOS\n    microphoneManager.MicrophoneProvider = new IOSMicrophoneProvider();\n    \u002F\u002F Android\n    microphoneManager.MicrophoneProvider = new AndroidMicrophoneProvider();\n    \u002F\u002F macOS\n    
microphoneManager.MicrophoneProvider = new MacMicrophoneProvider();\n}\n```\n\n启用回声消除后，用户可以在 AI 讲话时随时打断。要启用此功能：\n\n1. 在 Inspector 中选择 `AIAvatar` 组件。\n2. 将 `MicrophoneMuteBy` 设置为 `无`。\n\n这样配置后，麦克风将在 AI 讲话期间保持开启状态，允许自然地打断对话；同时，回声消除功能可防止 AI 的声音被麦克风拾取。\n\n\n### 自定义抢占条件\n\n默认情况下，抢占触发取决于 SpeechListener 的类型：\n\n- **非流式监听器**（OpenAI、Google、AIAvatarKit）：当用户录音时间达到至少 `BargeInMinDuration` 秒时触发（默认值为 1.5 秒）。\n- **流式监听器**（AzureStream、AIAvatarKitStream）：当部分识别的文本达到 `BargeInMinTextLength` 个字符时触发（默认值为 2 个字符）。\n\n您可以通过设置 SpeechListener 的 `BargeInCondition` 来覆盖此逻辑。委托签名为 `Func\u003Cstring, float, bool>`，其中：\n\n- `text` — 部分识别的文本（对于非流式监听器为 `null`）。\n- `recordDuration` — 录音经过的时间（以秒为单位，对于流式监听器为 `0f`）。\n- 返回 `true` 以触发抢占。\n\n```csharp\n\u002F\u002F 示例：要求流式监听器至少输入 3 个字符\naiAvatar.SpeechListener.BargeInCondition = (text, recordDuration) =>\n{\n    if (text != null)\n    {\n        return text.Length >= 3;\n    }\n    return recordDuration >= 2.0f;\n};\n```\n\n\n## ⏰ 唤醒词检测\n\n您可以检测唤醒词作为启动对话的触发条件。此外，还可以在 AIAvatar 组件的 Inspector 中配置结束对话的取消词，或者使用识别出的语音长度而非特定短语来触发对话。\n\n### 唤醒词\n\n当识别到该短语时，对话将开始。您可以注册多个唤醒词。除以下项目外，0.8 版本及更高版本中的其他设置将被忽略。\n\n|项目|描述|\n|---|---|\n|**文本**|用于启动对话的短语。|\n|**前缀\u002F后缀允许范围**|唤醒词前后允许额外字符的最大长度。例如，若唤醒词为“Hello”，允许范围为 4 个字符，则短语“Ah, Hello!”仍会被检测为唤醒词。|\n\n### 取消词\n\n当识别到该短语时，对话将结束。您可以注册多个取消词。\n\n### 打断词\n\n角色将停止讲话并开始倾听用户的请求。您可以注册多个打断词。（例如，“等一下”）\n\n**注意**：在 AIAvatar 的 Inspector 中，将 `麦克风静音条件` 下的 `阈值` 选中，以允许 ChatdollKit 在角色讲话时继续监听您的语音。\n\n### 忽略词\n\n您可以注册一些字符串，使其在判断识别出的语音是否匹配唤醒词或取消词时被忽略。这在不希望考虑标点符号是否存在的情况下非常有用。\n\n### 唤醒长度\n\n您也可以根据识别出的文本长度而非特定短语来启动对话。当该值为 `0` 时，此功能将被禁用。例如，在空闲模式下，可通过文本长度而非唤醒词恢复对话；而在睡眠模式下，则可使用唤醒词重新开始对话。\n\n## ⚡️ AI 代理（工具调用）\n\n借助大模型提供的工具调用（函数调用）功能，您可以开发充当 AI 代理的 AI 角色，而不仅仅是进行简单的对话交流。\n\n通过创建实现 `ITool` 接口或继承 `ToolBase` 类的组件，并将其附加到 AIAvatar 对象上，该组件将自动被识别为工具，并在需要时执行。要创建自定义工具，需定义 `FunctionName` 和 `FunctionDescription`，并实现返回函数定义的 `GetToolSpec` 方法以及处理函数逻辑的 `ExecuteFunction` 方法。有关详细信息，请参阅 `ChatdollKit\u002FExamples\u002FWeatherTool`。\n\n**注意**：如果您的项目中包含自定义的 
LLMFunctionSkills，请参阅 [从 FunctionSkill 迁移到 Tool](#migration-from-functionskill-to-tool)。\n\n### 与远程 AI 代理的集成\n\n虽然 ChatdollKit 原生支持简单的工具调用，但它也提供了与服务器端 AI 代理的集成能力，以实现更复杂的代理行为。\n\n具体来说，ChatdollKit 允许您通过 RESTful API 调用 AI 代理，并将其注册为 `LLMService`。这样，您无需了解背后的代理流程，即可发送请求并接收响应。目前支持 [Dify](https:\u002F\u002Fdify.ai) 和 [AIAvatarKit](https:\u002F\u002Fgithub.com\u002Fuezo\u002Faiavatarkit)。您可以通过附加 `DifyService` 或 `AIAvatarKitService`、配置其设置并启用 `IsEnabled` 标志来使用它们。\n\n## 🎙️ 设备\n\n我们提供设备控制机制。目前支持麦克风和摄像头。\n\n### 麦克风\n\n`MicrophoneManager` 组件负责从麦克风捕获音频，并将音频波形数据提供给其他组件使用。它主要用于配合 SpeechListener 使用，但您也可以在用户自定义的组件中通过 `StartRecordingSession` 方法注册并使用录音会话。\n\n以下是在检视器中可配置的设置：\n\n|项目|描述|\n|----|----|\n|**采样率**|指定采样率。使用 WebGL 时请设置为 44100。|\n|**噪声门阈值（dB）**|以分贝为单位指定噪声门级别。与 AIAvatar 组件一起使用时，此值由 AIAvatar 控制。|\n|**自动启动**|应用程序启动时开始从麦克风捕获音频。|\n|**调试模式**|记录麦克风的启停及静音\u002F取消静音操作。|\n\n### 摄像头\n\n我们提供了 `SimpleCamera` 预制件，其中封装了图像捕获、预览显示和摄像头切换等功能。由于不同设备对摄像头的处理方式有所不同，因此该预制件仅作为实验性功能提供。有关详细信息，请参考该预制件及其附带的脚本。\n\n## 🥰 3D 模型控制\n\n`ModelController` 组件用于控制 3D 模型的手势、面部表情和语音。\n\n### 空闲动画\n\n空闲动画会在模型处于等待状态时循环播放。要运行所需的动作，需将其注册到 Animator Controller 的状态机中，并在 `ModelController` 检视器中将“Idle Animation Key”设置为该过渡条件的参数名称、将“Idle Animation Value”设置为对应的参数值。\n\n若要注册多个动作并在固定时间间隔内随机切换，可按如下所示在代码中使用 `AddIdleAnimation` 方法。第一个参数是要执行的 `Animation` 对象，`weight` 是出现概率的倍数，而 `mode` 仅在希望在特定模型状态下显示动画时才需指定。`Animation` 类的构造函数接受三个参数：第一个是参数名称，第二个是参数值，第三个则是持续时间（以秒为单位）。\n\n```csharp\nmodelController.AddIdleAnimation(new Animation(\"BaseParam\", 2, 5f));\nmodelController.AddIdleAnimation(new Animation(\"BaseParam\", 6, 5f), weight: 2);\nmodelController.AddIdleAnimation(new Animation(\"BaseParam\", 99, 5f), mode: \"sleep\");\n```\n\n### 脚本控制\n\n本节正在建设中。基本上，您需要创建一个 `AnimatedVoiceRequest` 对象，并调用 `ModelController.AnimatedSay`。`AIAvatar` 内部会发出结合动画、表情和语音的请求，因此可参考其相关实现作为指导。\n\n## 🎚️ UI 组件\n\n我们提供了语音交互式 AI 角色应用中常用的 UI 组件预制件。您只需将其添加到场景中即可使用。有关配置细节，请参阅演示。\n\n- **FPSManager**：显示当前帧率。您还可以使用此组件设置目标帧率。\n- **MicrophoneController**：用于调节麦克风噪声门的滑块。\n- 
**RequestInput**：用于输入请求的文本框。它还提供从文件系统获取图片以及启动摄像头的按钮。\n- **SimpleCamera**：用于处理摄像头图像捕获和预览显示的组件。您也可以在不显示预览的情况下直接捕获图像。\n\n## 🎮 从外部程序控制\n\n你可以通过套接字通信或 JavaScript 从外部程序向 ChatdollKit 应用程序发送请求。此功能支持新的使用场景，例如 AI 虚拟主播直播、远程虚拟形象客服，以及结合 AI 和人工交互的混合型角色运营。\n\n将 `ChatdollKit\u002FScripts\u002FNetwork\u002FSocketServer` 挂载到 AIAvatar 对象上，并设置端口号（例如 8080），即可通过套接字通信进行控制；或者挂载 `ChatdollKit\u002FScripts\u002FIO\u002FJavaScriptMessageHandler` 来实现 JavaScript 控制。\n\n此外，为了处理网络上的对话请求，需要将 `ChatdollKit\u002FScripts\u002FDialog\u002FDialogPriorityManager` 挂载到 AIAvatar 对象上。若要处理让角色执行由人类而非 AI 回答生成的手势、面部表情或语音的请求，则需将 `ChatdollKit\u002FScripts\u002FModel\u002FModelRequestBroker` 挂载到 AIAvatar 对象上。\n\n以下是同时使用上述两个组件的代码示例：\n\n```csharp\n\u002F\u002F 配置远程控制的消息处理器\n#pragma warning disable CS1998\n#if UNITY_WEBGL && !UNITY_EDITOR\ngameObject.GetComponent\u003CJavaScriptMessageHandler>().OnDataReceived = async (message) =>\n{\n    HandleExternalMessage(message, \"JavaScript\");\n};\n#else\ngameObject.GetComponent\u003CSocketServer>().OnDataReceived = async (message) =>\n{\n    HandleExternalMessage(message, \"SocketServer\");\n};\n#endif\n#pragma warning restore CS1998\n```\n\n```csharp\nprivate void HandleExternalMessage(ExternalInboundMessage message, string source)\n{\n    \u002F\u002F 根据请求的 Endpoint 和 Operation 分配动作\n    if (message.Endpoint == \"dialog\")\n    {\n        if (message.Operation == \"start\")\n        {\n            if (source == \"JavaScript\")\n            {\n                dialogPriorityManager.SetRequest(message.Text, message.Payloads, 0);\n            }\n            else\n            {\n                dialogPriorityManager.SetRequest(message.Text, message.Payloads, message.Priority);\n            }\n        }\n        else if (message.Operation == \"clear\")\n        {\n            dialogPriorityManager.ClearDialogRequestQueue(message.Priority);\n        }\n    }\n    else if (message.Endpoint == \"model\")\n    {\n        modelRequestBroker.SetRequest(message.Text);\n    }            
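\n    \u002F\u002F 假设性补充（非官方源码，仅为示意）：对于既不是 \"dialog\" 也不是 \"model\" 的\n    \u002F\u002F Endpoint，记录一条警告日志，便于调试外部客户端发送的请求格式问题\n    else\n    {\n        Debug.LogWarning($\"Unknown endpoint: {message.Endpoint}\");\n    }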
\n}\n```\n\n### ChatdollKit 远程客户端\n\n`SocketServer` 设计用于通过套接字通信接收任意信息，因此未提供官方客户端程序。不过，我们提供了一个 Python 示例代码，请参考以下链接，并根据需要将其适配到其他语言或平台。\n\nhttps:\u002F\u002Fgist.github.com\u002Fuezo\u002F9e56a828bb5ea0387f90cc07f82b4c15\n\n或者，如果你希望搭建 AITuber（AI 虚拟主播），可以尝试使用内置了 `SocketServer` 的 [ChatdollKit AITuber Controller](https:\u002F\u002Fgithub.com\u002Fuezo\u002Fchatdollkit-aituber) 提供的 AITuber 示例。\n\n## 🌐 在 WebGL 上运行\n\n目前可参考以下提示。我们正在准备 WebGL 的演示版本。\n\n- 构建时间约为 5–10 分钟（取决于机器配置）。\n- 调试非常困难。错误信息不会显示堆栈跟踪，通常只会输出类似 “To use dlopen, you need to use Emscripten's linking support, see https:\u002F\u002Fgithub.com\u002Fkripken\u002Femscripten\u002Fwiki\u002FLinking” 的提示。\n- 内置的 Async\u002FAwait 无法正常工作（应用程序会在 `await` 处停止），因为 JavaScript 不支持多线程。请改用 [UniTask](https:\u002F\u002Fgithub.com\u002FCysharp\u002FUniTask)。\n- HTTP 请求需要 CORS 支持。\n- 不支持麦克风输入。请使用与 WebGL 兼容的 `ChatdollMicrophone`。\n- 不支持 MP3 等压缩音频格式。请在 SpeechSynthesizer 中使用 WAV 格式。\n- 不支持 OVRLipSync。请改用 [uLipSync](https:\u002F\u002Fgithub.com\u002Fhecomi\u002FuLipSync)。\n- 你还需要在主脚本中添加以下代码以启用 uLipSync：\n    ```csharp\n    var ul = gameObject.GetComponent\u003CuLipSync.uLipSync>();\n    modelController.SpeechController.HandlePlayingSamples = (samples) =>\n    {\n        ul.OnDataReceived(samples, 1);\n    };\n    ```\n- 如果希望在消息窗口中显示多字节字符，请将包含多字节字符的字体导入项目，并将其设置为消息窗口的字体。\n\n## 🔄 从 0.7.x 版本迁移\n\n最简单的方法是删除 `Assets\u002FChatdollKit` 文件夹，然后重新导入 ChatdollKit 的 Unity 包。但如果你因某些原因无法这样做，可以通过以下步骤解决错误：\n\n1. 导入最新版本的 ChatdollKit Unity 包。控制台可能会显示一些错误。\n2. 导入 `ChatdollKit_0.7to084Migration.unitypackage`。\n3. 在 `ModelController`、`AnimatedVoiceRequest` 和 `Voice` 类中添加 `partial` 关键字。\n4. 
将 `DialogController` 中的 `OnSayStart` 替换为 `OnSayStartMigration`。\n\n**⚠️ 注意**：此举仅能抑制错误输出，并不能继续使用旧版代码。如果项目中仍有部分代码使用 `DialogController`、`LLMFunctionSkill`、`LLMContentSkill` 或 `ChatdollKit`，请按如下方式替换为更新后的组件：\n\n- `DialogController`：`DialogProcessor`\n- `LLMFunctionSkill`：`Tool`\n- `LLMContentSkill`：`LLMContentProcessor`\n- `ChatdollKit`：`AIAvatar`\n\n### 从 FunctionSkill 迁移到 Tool\n\n如果你的组件继承自 `LLMFunctionSkillBase`，可以通过以下步骤轻松迁移为继承自 `ToolBase` 的组件：\n\n1. 更改基类\n\n    将 `LLMFunctionSkillBase` 替换为 `ToolBase` 作为基类。\n\n    ```csharp\n    \u002F\u002F 之前\n    public class MyFunctionSkill : LLMFunctionSkillBase\n\n    \u002F\u002F 之后\n    public class MyFunctionSkill : ToolBase\n    ```\n\n1. 更新 `ExecuteFunction` 方法签名\n\n    按照以下方式修改 `ExecuteFunction` 方法的参数和返回值类型：\n\n    ```csharp\n    \u002F\u002F 之前\n    public UniTask\u003CFunctionResponse> ExecuteFunction(string argumentsJsonString, Request request, State state, User user, CancellationToken token)\n\n    \u002F\u002F 之后\n    public UniTask\u003CToolResponse> ExecuteFunction(string argumentsJsonString, CancellationToken token)\n    ```\n\n1. 
更新 `ExecuteFunction` 的返回值类型\n\n    将 `FunctionResponse` 替换为 `ToolResponse`。\n\n## ❤️ 感谢\n\n- [uLipSync](https:\u002F\u002Fgithub.com\u002Fhecomi\u002FuLipSync)（唇形同步）（c）[hecomi](https:\u002F\u002Ftwitter.com\u002Fhecomi)\n- [UniTask](https:\u002F\u002Fgithub.com\u002FCysharp\u002FUniTask)（async\u002Fawait 集成）（c）[neuecc](https:\u002F\u002Fx.com\u002Fneuecc)\n- [UniVRM](https:\u002F\u002Fgithub.com\u002Fvrm-c\u002FUniVRM\u002Freleases\u002Ftag\u002Fv0.89.0)（VRM）（c）[VRM 联盟](https:\u002F\u002Fx.com\u002Fvrm_pr) \u002F （c）[Masataka SUMI](https:\u002F\u002Fx.com\u002Fsantarh) 为 MToon 做出的贡献","# ChatdollKit 快速上手指南\n\nChatdollKit 是一个基于 Unity 的 3D 虚拟助手 SDK，可将 3D 模型（如 VRM）转化为支持语音交互的聊天机器人。它原生支持多种大语言模型（LLM）、语音识别（STT）和语音合成（TTS），并具备表情同步、动作控制及多平台部署能力。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: Windows, macOS, Linux, iOS, Android\n- **游戏引擎**: Unity (推荐最新 LTS 版本)\n- **渲染管线**: **必须使用 Built-in Render Pipeline**。\n  > ⚠️ **注意**: 请勿使用 SRP (URP\u002FHDRP) 项目模板，因为依赖项 UniVRM 尚不支持 SRP。\n\n### 前置依赖\n在开始之前，请确保已准备好以下资源：\n1. **3D 模型**: 推荐使用 `.vrm` 格式的模型文件。\n2. **API Keys**:\n   - **LLM**: OpenAI (ChatGPT), Anthropic Claude, Google Gemini, Dify 等任一服务的 API Key。\n   - **语音服务**: OpenAI, Azure, Google, VOICEVOX 或 AivisSpeech 等任一 STT\u002FTTS 服务的 API Key。\n3. **网络环境**: 确保开发机器能访问对应的 AI 服务接口。\n\n## 安装步骤\n\n由于 ChatdollKit 主要作为 Unity 包或源码集成，请按照以下步骤导入：\n\n1. **创建 Unity 项目**\n   新建一个 Unity 项目，务必选择 **Built-in Render Pipeline** 模板。\n\n2. **导入依赖与插件**\n   - 下载 ChatdollKit 源码或 UnityPackage。\n   - 将 `ChatdollKit` 文件夹拖入项目的 `Assets` 目录。\n   - 根据提示安装必要的 Unity 包管理器（Package Manager）依赖，主要是 `UniVRM`。\n     ```bash\n     # 如果使用 git 方式导入 UniVRM (示例)\n     # 通常在 Unity Package Manager 中添加如下 URL:\n     https:\u002F\u002Fgithub.com\u002Fvrm-c\u002FUniVRM.git?path=Assets\u002FUniVRM\n     ```\n\n3. **资源准备**\n   - 将你的 `.vrm` 模型文件放入 `Assets\u002FResources` 或指定文件夹。\n   - 确保模型包含正确的 BlendShapes (表情) 和 Animator Controller。\n\n## 基本使用\n\n以下是最简单的运行流程，基于官方提供的 `Demo08` 场景进行配置。\n\n### 1. 
打开演示场景\n在 Unity 编辑器中，打开位于以下路径的场景：\n```text\nDemo\u002FDemo08\n```\n\n### 2. 配置核心对象\n在 Hierarchy 面板中找到并选中 `AIAvatarVRM` 对象。\n\n### 3. 设置 API Key\n在 Inspector 面板中，找到以下组件并填入你的 API Key：\n- **ChatGPTService**: 填入 LLM 服务的 API Key。\n- **OpenAISpeechSynthesizer**: 填入 TTS 服务的 API Key。\n- **OpenAISpeechListener**: 填入 STT 服务的 API Key。\n\n> 💡 **提示**: 如果你使用的是非 OpenAI 的服务（如 Gemini 或 VOICEVOX），请在对应组件的下拉菜单中选择服务类型，并填入相应的 Key。\n\n### 4. 加载模型 (如需)\n如果场景未自动加载模型，请在 `AIAvatarVRM` 组件的 `Model Path` 字段中指定你的 `.vrm` 文件路径，或在运行时通过代码加载。\n\n### 5. 运行与测试\n点击 Unity 编辑器顶部的 **Play** 按钮。\n- 允许浏览器或系统访问麦克风权限。\n- 对着麦克风说 **\"こんにちは\"** (你好) 或任何长度超过 3 个字符的词语。\n- 3D 角色将进行语音识别、思考并语音回复，同时伴随口型和表情变化。\n\n### 进阶：切换语言\n该工具支持多语言动态切换。在对话中直接下达指令即可，例如：\n- \"让我们用日语交谈吧\" (Let's talk in Japanese)\n- \"请用中文回答\"\n\n此时系统将自动切换后续的语音识别和合成语言。","某教育科技公司正致力于为其儿童编程学习平台开发一位能实时互动、表情丰富的 3D AI 导师，以替代传统的文字客服。\n\n### 没有 ChatdollKit 时\n- **开发周期漫长**：团队需分别集成语音识别、大语言模型和语音合成服务，并手动编写代码将音频流与 3D 模型的口型、眨眼动作逐帧对齐，耗时数周。\n- **交互体验生硬**：由于缺乏精准的语音活动检测（VAD），AI 常在用户说话时强行插话，或是在噪音环境下无法准确判断对话结束，导致对话频繁中断。\n- **多端适配困难**：想要将助手部署到 WebGL 网页端和 iOS App 时，需针对不同平台重写底层音频处理和渲染逻辑，维护成本极高。\n- **情感表达缺失**：3D 角色仅能机械地播放预设动画，无法根据对话内容（如鼓励、疑惑）自主切换面部表情和肢体动作，难以吸引儿童用户。\n\n### 使用 ChatdollKit 后\n- **一站式快速集成**：ChatdollKit 原生支持主流大模型及多种 TTS\u002FSTT 服务，自动同步语音与唇形动画，将原本数周的联调工作缩短至几天内完成。\n- **自然流畅的对话**：借助其内置的 Silero VAD 和“打断发言”（Barge-in）功能，即使在学习机嘈杂的环境中，AI 也能精准识别用户意图并允许随时插话，交互如真人般自然。\n- **跨平台无缝发布**：基于 Unity 架构，ChatdollKit 让同一套代码轻松部署于 Windows、iOS 及 WebGL 端，大幅降低了多平台维护难度。\n- **生动的角色演绎**：通过内置的表情控制器，3D 导师能根据对话语境自主做出微笑、点头等细腻表情，显著提升了儿童用户的学习沉浸感。\n\nChatdollKit 通过将复杂的音视频同步与对话管理封装为标准化 SDK，让开发者能专注于内容创作，快速打造出有温度、跨平台的 3D 智能助手。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fuezo_ChatdollKit_a43865f5.jpg","uezo","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fuezo_c545e95e.png","A hobby programmer \u002F waifu developer based in Tokyo.\r\n\r\n🥰→🐈 🍣 ☕️ 🌂 
💫",null,"Tokyo","uezo@uezo.net","uezochan","https:\u002F\u002Fwww.amazon.co.jp\u002Fstores\u002Fauthor\u002FB00C9SR66M","https:\u002F\u002Fgithub.com\u002Fuezo",[86,90],{"name":87,"color":88,"percentage":89},"C#","#178600",95.9,{"name":91,"color":92,"percentage":93},"JavaScript","#f1e05a",4.1,1130,114,"2026-04-08T16:40:05","Apache-2.0",4,"Windows, macOS, Linux, iOS, Android, WebGL","未说明",{"notes":102,"python":103,"dependencies":104},"该工具是基于 Unity 引擎的 3D 虚拟助手 SDK，而非 Python 库。重要提示：创建 Unity 项目时切勿使用 SRP (Scriptable Render Pipeline) 模板，因为其依赖的 UniVRM 不支持 SRP。支持多种大语言模型后端（如 ChatGPT, Claude, Gemini）及语音服务。","不适用 (基于 Unity 引擎)",[105,106,107,108,109,110],"Unity Engine","UniVRM","ChatGPT Service","Azure Speech Services","VOICEVOX","Silero VAD",[15,46],[113,114,115,116,117,118,119,120,121,122],"unity","chatbot","virtualassistant","azure","3d-model","chatgpt","unity3d","waifu","ai-companion","vrm","2026-03-27T02:49:30.150509","2026-04-11T08:01:46.193348",[126,131,136,141,146,151],{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},29160,"如何在 WebGL 构建中解决日文字符显示问题或语音合成崩溃？","在 WebGL 平台上，如果角色说话时出现杂音、文本不显示或崩溃，通常是因为字体不支持特定语言（如日语）。解决方案是更换为包含相应字符集的字体，例如使用 'IBM Plex Sans JP'、'Noto Sans' 或其他支持日语的字体。此外，如果是调用 Gemini API 报错缺少密钥，请确保注册并配置了对应的 Gemini API Key，而不仅仅是 OpenAI Key。","https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fissues\u002F295",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},29161,"运行 Demo 时遇到 'NullReferenceException' 或 'KeyNotFoundException: Neutral' 错误怎么办？","这通常是因为模型的面部表情配置中缺少名为 'Neutral' 的表情。虽然缺失该表情不应导致致命错误，但当前版本可能会抛出异常。解决方法是检查 Model Controller 中的 Animator 设置，确保配置了名为 'Neutral' 的面部表情。如果模型本身没有该表情，需要在配置中添加或映射一个默认的中性表情以避免字典查找失败。","https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fissues\u002F230",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},29162,"ChatGPT LLM 服务无响应，但语音监听正常，如何修复？","如果语音监听正常工作但 ChatGPT 没有回复，请检查 LLM 组件中的 'Prefix and Suffix Allowance'（前缀和后缀容差）设置。尝试将该值设置为大于 1（例如 4），不要设置为 0。许多用户在将此值调整后成功解决了编辑器或头显（如 Meta Quest 
3）上无响应的问题。","https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fissues\u002F495",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},29163,"使用 Google TTS Loader 时报错 'Length of created clip must be larger than 0' 是什么原因？","该错误表明 Google TTS 服务接收到的待合成文本为空。这通常是因为 API Key 或 Language 参数缺失，或者传递给 TTS 的文本内容为空字符串。调试方法是在代码中加入检查：如果 `voice.Text.Trim()` 为空，则记录日志并返回 null，避免尝试生成空音频片段。同时请确认 GoogleTTSLoader 组件中已正确填写 API Key 和语言设置。","https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fissues\u002F352",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},29164,"如何配置才能让 ChatGPT 自动触发特定的动画（如挥手）？","要让对话自动触发动画，需完成以下三步：\n1. 创建包含动画剪辑和过渡条件的 Animator Controller。\n2. 定义动画名称与触发值之间的映射关系（参考 Demo 中 Main.cs 的第 93 行示例）。\n3. 在系统提示词（System Prompt）中告知 ChatGPT 可用的动画名称列表，以便 AI 知道可以调用哪些动作（参考 Demo.unity 第 1801 行的配置示例）。","https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fissues\u002F324",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},29165,"集成自定义本地 TTS（如 GPT-SoVITS）时出现 'KeyNotFoundException: Voice' 错误如何解决？","该错误通常发生在 DialogController 尝试访问字典中不存在的 'Voice' 键时。这可能是因为自定义 TTS 脚本未正确返回预期的语音数据结构，或者在提交请求时语音对象未正确初始化。检查自定义 TTS 加载器脚本，确保其正确处理了语音文本并返回了有效的 AudioClip，同时在调用 `OnSubmitRequestInput` 之前确认语音字典中已包含必要的键值对。","https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fissues\u002F314",[157,162,167,172,177,182,187,192,197,202,207,212,217,222,227,232,237,242,247,252],{"id":158,"version":159,"summary_zh":160,"released_at":161},197976,"v0.8.16","## 💃 模型\n\n为提高可维护性，已将语音处理逻辑从 `ModelController` 中提取至 `SpeechController` ，并将面部表情处理逻辑提取至 `FaceController`。\n\n* 由 @lavender-snow 在 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F471 中改进了针对各类 3D 头像的眨眼和面部表情处理。\n* 重构了眨眼的设置与验证逻辑：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F473\n* 重构 ModelController：拆分面部与语音逻辑：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F475\n* 修复了 FaceController 
初始化时的空引用错误：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F476\n\n非常感谢 @lavender-snow 的贡献！✨🙏✨\n\n## 🦜 对话\n\n通过引入参考现实世界时间戳的时序对话，以及在用户打断时停止 AI 发音的插话支持，进一步提升了对话体验。\n\n* 向提示词中添加周期性时间戳插入功能：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F489\n* 添加插话支持，允许用户语音中断 AI 发音：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F484\n\n## 🎓 LLM\n\n* 允许通过 Inspector 开启或关闭每个工具：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F477\n* 为 Gemini 请求添加 thinkingConfig 字段：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F485\n\n## 🎙️ 语音监听器\n\n新增基于 WebSocket 的流式语音识别功能。由于 VAD 和语音识别均在服务器端完成，因此可将 VAD 处理负载从客户端卸载。此外，在轮次结束检测的等待时间内完成语音识别，整体响应延迟可缩短数百毫秒。\n\n* 为 AIAvatarSpeechListener 添加说话人匹配与后处理功能：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F478\n* 支持在 Inspector 中切换多个 SpeechListener：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F474\n* 新增用于实时流式语音识别的 AIAvatarKitStreamSpeechListener：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F483\n* 支持 WebSocket STT 的 API 密钥认证：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F486\n* 使用子协议在浏览器 WebSocket 中发送头部信息：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F490\n\n## 🗣️ 语音合成器\n\n* 添加对 Kotodama API 的 TTS 服务支持：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F480\n\n## 🍩 其他\n\n* 更新 README 中的架构概览图：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F470\n* 为 API 请求中的 Authorization 头添加空值检查：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F481\n* 使音量滑块操作更加自然直观：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F487\n* AIAvatar 
预制件：添加插话功能、音频混音器，并修复相关标志位：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F488\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002Fv0.8.15...v0.8.16","2026-02-14T22:36:42",{"id":163,"version":164,"summary_zh":165,"released_at":166},197977,"v0.8.15","## 🌏WebGL 更新\n\n支持 Silero VAD、前后摄像头切换以及正确的宽高比处理，新增图片文件上传支持，提升了麦克风输入性能，并修复了静音状态下唇形同步失效的 bug。重大改进 💪\n\n* 为 Silero VAD 添加 WebGL 支持 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F455\n* 为 SileroVAD WebGL JS 资源添加元数据文件 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F458\n* 为音频播放添加 HandlePlayingSamples 回调 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F456\n* 添加 WebGL 摄像头设备支持，并重构 SimpleCamera https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F462\n* 为 ImageButton 添加 WebGL 文件上传支持 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F464\n* 使用 malloc 优化 WebGL 麦克风数据传输 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F467\n\n\n## ✨ UI 控件改进\n\n除了外观更加时尚外，现在无需任何配置即可直接使用——只需将其放置在场景的 Canvas 上即可。\n\n* 为 AIAvatar 添加最大音量控制并改进静音功能 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F459\n* 添加聊天控件的 UI 预制体和脚本 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F460\n* 添加 MessageWindow 及容器预制体，并提升其可配置性 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F463\n\n\n## 🥁 进一步增强抗噪能力\n\n支持组合使用多种类型的 VAD。例如，将即使在嘈杂环境中也能仅识别人声的 Silero VAD 与仅捕捉响亮声音的内置基于能量的 VAD 结合使用，系统便可在活动现场准确捕捉用户语音，同时部分过滤掉周围环境音及场馆广播。\n\n* 在语音监听器中支持多种语音检测功能 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F466\n* 移除 SileroVADMicrophoneButton 组件及预制体 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F468\n\n\n## 🍩 其他更新\n\n* 在禁用函数调用时遮罩相关工具 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F457\n* 将 SpeechListenerBase 中的 OnDestroy 方法设为虚方法 
https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F461\n* 为 AIAvatar 对话处理添加 payload 支持 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F465\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002Fv0.8.14...v0.8.15","2025-08-21T15:13:14",{"id":168,"version":169,"summary_zh":170,"released_at":171},197978,"v0.8.14","## 🎙️ 回声消除支持\n\n* 为 Android、iOS 和 macOSX 添加原生麦克风支持 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F440\n* 添加实验性的原生麦克风插件🔌 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F449\n* 在录音时，于静音角色音量前增加延迟 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F453\n* 将回声消除说明添加至 README https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F454\n\n## 🗣️ 提升犹豫与停顿的稳定性\n\n* 添加请求合并功能，以防止语音识别片段化 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F442\n* 添加对角色音量控制的支持 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F444\n* 改进 ChatGPT 的上下文序列验证 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F450\n* 修复递归流式传输中禁用函数调用的问题 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F452\n\n## 🧩 平台增强\n\n* 添加关于兼容 OpenAI API 使用的说明 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F437\n* 添加对 Aivis Cloud API 作为 TTS 服务的支持💠 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F439\n* 添加对 AIAvatarKit STT 和 TTS 的支持 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F446\n\n## 🙏 错误修复及小型\u002F内部改动\n\n* 为 SileroVAD 添加 Android 特定的模型加载 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F441\n* 重构 LLM 的工具调用处理和上下文更新 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F443\n* 暴露录音状态和采样数量以便调试 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F445\n* 修复在 WebGL 中导致对话和构建失败的 bug https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F447\n* 更新 v0.8.14 的演示 
https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F448\n* 针对 v0.8.14 进行更新 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F451\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002Fv0.8.13...v0.8.14","2025-08-12T12:42:33",{"id":173,"version":174,"summary_zh":175,"released_at":176},197979,"v0.8.13","## 🥳 Silero VAD 支持\n\n基于机器学习的语音活动检测大幅提升了嘈杂环境下的话轮结束准确性，让户外或活动现场的对话更加流畅。\n\n* 由 @uezo 在 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F431 中添加了基于 ONNX 的语音活动检测处理器 SileroVADProcessor\n* 由 @uezo 在 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F434 中添加了 SileroVAD 麦克风按钮的 UI 和逻辑\n* 由 @uezo 在 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F430 中为 RecordingSession 添加了预滚动缓冲区\n\n\n## 🪄 TTS 预处理\n\n可选的文本预处理功能允许你在合成之前微调发音（例如将“OpenAI”转换为片假名）。\n\n* 由 @uezo 在 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F432 中为 SpeechSynthesizerBase 添加了 PreprocessText 钩子\n\n\n## 🤝 Grok 和 Gemini 兼容性\n\n移除了 OpenAI 风格端点中的 OpenAI 特定参数，使得 Grok、Gemini 及其他兼容 API 的模型可以开箱即用。\n\n* 由 @uezo 在 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F433 中为 ChatGPTService 添加了 OpenAI 兼容 API 选项\n\n\n## 🍩 其他更改\n\n* 由 @uezo 在 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F428 中支持临时内部请求并实现空闲状态恢复\n* 由 @uezo 在 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F429 中添加了对 AIAvatarKit 语言响应的支持\n* 由 @uezo 在 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F435 中修复了 WebGL 构建失败的 bug\n\n\n## 🎂 生日版本\n\n本次更新正值我的生日发布——感谢大家与我一同庆祝！🥳🎉🎈🍰\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002Fv0.8.12...v0.8.13","2025-07-18T14:59:33",{"id":178,"version":179,"summary_zh":180,"released_at":181},197980,"v0.8.12","## 变更内容\n* AIAvatarKit 流式传输改进 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F427\n\n\n**完整变更日志**: 
https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002Fv0.8.11...v0.8.12","2025-05-15T23:52:42",{"id":183,"version":184,"summary_zh":185,"released_at":186},197981,"v0.8.11","## 🤖 支持服务器端智能体框架协作\n\n将 AI 智能体逻辑卸载到服务器端，提升前端的可维护性；同时允许您接入 AutoGen 等框架（以及任何其他智能体 SDK），实现功能的无限扩展。\n\n- [添加对 AIAvatarKit 作为 AI 智能体后端服务的支持](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcommit\u002F930ea46)\n- [支持 Dify 的 `inputs` 参数](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcommit\u002F3f76f7a)\n- [修复 WebGL 中 API Key 授权失效的问题](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcommit\u002F887ee79)\n- [允许 AIAvatarKit 的 SystemPromptParams 为 null](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcommit\u002F44f2d34)\n\n\n## 🌐 WebGL 改进\n\n将麦克风采集升级为现代的 `AudioWorkletNode`，以降低延迟并提高可靠性；稳定了静音与取消静音的处理流程；优化了错误处理机制，能够立即提示 HTTP 错误并防止程序卡死；修复了 WebGL 构建中的 API Key 授权问题。\n\n- [将 WebGLMicrophone 实现从 ScriptProcessor 切换为 AudioWorklet](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcommit\u002F299b3f0)\n- [防止在 WebGL 上解除静音后继续处理已静音用户的声音](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcommit\u002F50700b6)\n- [立即返回 HTTP 错误，避免 AI 角色卡死](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcommit\u002Fe604b84)\n\n\n## 🍩 其他更新\n\n- [移除 CommandRService](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcommit\u002F0584495)\n- [新增选项，可在 SpeechGatewaySpeechSynthesizer 中包含 Wave 头信息](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcommit\u002Faf3a31c)\n- [为 ChatMemory 集成扩展添加频道信息](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcommit\u002F9cf2768)\n- [启用麦克风输入的降采样以用于语音识别](https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcommit\u002F60fdd7e)\n\n顺带一提，本次发布是在东京的 [Jonanjima 海滨公园](https:\u002F\u002Fwww.tokyo-park.or.jp\u002Fpark\u002Fformat\u002Findex027.html) 野餐时完成的 🏕️🌳✈️ — 
这里是东京一处绝佳的露营、烧烤和近距离观机的好去处。强烈推荐！\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.10...v0.8.11","2025-04-29T04:46:10",{"id":188,"version":189,"summary_zh":190,"released_at":191},197982,"0.8.10","## 🌎 动态多语言切换\n\n* 支持动态多语言语音合成 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F414\n* 为语音识别启用多语言支持 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F415\n* 在语音合成器中添加对 SpeechGateway 的支持 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F416\n\n\n## 🔖 长期记忆\n\n* 添加会话级别的标识 ContextId https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F418\n* 将请求信息添加到 `OnStreamingEnd` 的参数中 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F419\n* 添加对长期记忆的支持 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F420\n\n\n## 🍩 其他更新\n\n* 通过 AITuber 控制器 @buchizo 在 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F413 中支持 ChatGPT LLM 的 IsAzure 选项\n* 为 WebGL 启用回声消除和噪声抑制 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F417\n* 防止 WebGL 构建错误 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F421\n* 改进 HTTP 访问中的错误处理 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F422\n* 修复设置 Nijivoice 持续时间失败的 bug https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F423\n* 进行小幅改动并更新 v0.8.10 的 README https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F424\n\n非常感谢您的贡献，@buchizo san！🥰🥰🥰\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.9...0.8.10","2025-03-29T13:40:56",{"id":193,"version":194,"summary_zh":195,"released_at":196},197983,"0.8.9","## ✨ 支持 NijiVoice 作为语音合成器\n\n* 添加语音预取模式功能 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F408\n* 添加对 NijiVoice 作为语音合成器的支持 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F409\n\n\n## 🍩 其他更改\n\n* 改进对话处理 
https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F410\n* 修复 DialogProcessor 在处理 LLM 流之前失败的 bug https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F411\n* 针对 v0.8.9 的更新 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F412\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.8.1...0.8.9","2024-12-13T16:40:12",{"id":198,"version":199,"summary_zh":200,"released_at":201},197984,"0.8.8.1","## 💪 将 Dify 作为 AITuber 的后端支持\n\n与任何大语言模型无缝集成，同时为 AITuber 赋予智能体能力，融合先进知识与功能，实现高效且可扩展的运营！\n\n* 在 DifyService 中启用清除上下文功能 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F405\n* 更新 AITuber 演示版本至 v0.8.8.1 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F406\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.8...0.8.8.1","2024-12-05T13:07:23",{"id":203,"version":204,"summary_zh":205,"released_at":206},197985,"0.8.8","## 🥰🥳支持多AI主播对话\n\n* 添加用于套接字通信的客户端 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F402\n* 增加对多AI主播之间互动的支持 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F403\n\n\n## 🍩其他更新\n\n* 修复v0.8.7中的WebGL构建错误 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F401\n* v0.8.8版本更新 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F404\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.7...0.8.8","2024-12-03T13:20:06",{"id":208,"version":209,"summary_zh":210,"released_at":211},197986,"0.8.7","## ✨More Features For AITuber and Update Demo✨\r\n\r\n* Support start\u002Fstop SocketServer from external components https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F395\r\n* Add dummy components for the use case that doesn't use microphone https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F396\r\n* Update demo for v0.8.7 
https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F397\r\n* Update README and small changes for v0.8.7 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F399\r\n\r\n## 🐛 Bug fix\r\n\r\n* Fix bug where WebGL build fails https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F398\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.6...0.8.7","2024-11-29T14:36:59",{"id":213,"version":214,"summary_zh":215,"released_at":216},197987,"0.8.7beta-ootb-app","Out-of-the-box application for AITuber use case.\r\n\r\nUse [ChatdollKit AITuber Controller](https:\u002F\u002Fgithub.com\u002Fuezo\u002Fchatdollkit-aituber) to control.\r\n\r\n\r\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7170e419-20d6-4987-9db5-066af6b0abc1\r\n\r\n","2024-11-23T14:26:41",{"id":218,"version":219,"summary_zh":220,"released_at":221},197988,"0.8.6","## 🎛️ Support VOICEVOX and AivisSpeech inline style\r\n\r\nEnables dynamic and autonomous switching of voice styles to enrich character expression and adapt to emotional nuances.\r\n\r\n* Support applying inline style for VOICEVOX https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F390\r\n\r\n\r\n## 🥰 Improve VRM runtime loading\r\n\r\nAllows seamless and error-free switching of 3D models at runtime, ensuring a smoother user experience.\r\n\r\n* Resolve runtime VRM model loading errors and refactor configuration process https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F391\r\n\r\n\r\n## 🍩 Other changes\r\n\r\n* Enhance AITuber Demo with New REST API Features https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F392\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.5...0.8.6","2024-11-20T13:56:47",{"id":223,"version":224,"summary_zh":225,"released_at":226},197989,"0.8.5","## 🎓 Chain of Thought 
Prompting\r\n\r\nChain of Thought (CoT) prompting is a technique to enhance AI performance. For more information about CoT and examples of prompts, see https:\u002F\u002Fdocs.anthropic.com\u002Fen\u002Fdocs\u002Fbuild-with-claude\u002Fprompt-engineering\u002Fchain-of-thought .\r\n\r\nChatdollKit supports Chain of Thought by excluding sentences wrapped in \u003Cthinking> ~ \u003C\u002Fthinking> tags from speech synthesis.\r\n\r\nYou can customize the tag by setting a preferred word (e.g., \"reason\") as the ThinkTag in the inspector of LLMContentProcessor.\r\n\r\n* Enable Chain of Thought Prompting https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F387\r\n\r\n\r\n## 🍩 Other updates\r\n\r\n* Update UniVRM to 0.127.2 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F386\r\n* Add FileLogger to AITuber Demo https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F388\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.4.1...0.8.5","2024-11-13T14:46:08",{"id":228,"version":229,"summary_zh":230,"released_at":231},197990,"0.8.4.1","## 🥳 Update for AITuber 💬\r\n\r\n* Improve features for AITuber remote control by @uezo in https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F385\r\n\r\nSee also: https:\u002F\u002Fgithub.com\u002Fuezo\u002Fchatdollkit-aituber\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.4...0.8.4.1","2024-11-04T14:47:47",{"id":233,"version":234,"summary_zh":235,"released_at":236},197991,"0.8.4","## 🧩 Modularized for Better Reusability and Maintainability\r\n\r\nWe’ve reorganized key components, focusing on modularity to improve customizability and reusability. 
Check out the demos for more details!\r\n\r\n* Add modular demos https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F381\r\n* Support getting and clearing LLM context through DialogProcessor https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F383\r\n* Introduce LLMServiceExtensions for Centralized Custom Processing https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F382\r\n\r\n\r\n## 🧹 Removed Legacy Components\r\n\r\nOutdated components have been removed, simplifying the toolkit and ensuring compatibility with the latest features. Refer to [🔄 Migration from 0.7.x](#-migration-from-07x) if you're updating from v0.7.x.\r\n\r\n* Remove legacy v0.7.x components https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F376\r\n\r\n\r\n## 🍩 Other Updates\r\n\r\n* Set default StopChat behavior to skip user input prompt https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F377\r\n* Fix LLMService IsEnabled handling https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F379\r\n* Unify speech handling regardless of source in ModelController https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F380\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.3...0.8.4","2024-10-27T13:36:21",{"id":238,"version":239,"summary_zh":240,"released_at":241},197992,"0.8.3","## ✨ New features\r\n\r\n* Add SpeechListener for Azure Speech SDK stream mode https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F361\r\n* Add functionality to interrupt character speech https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F367\r\n* Add pause functionality to insert delays in character speech https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F368\r\n\r\n\r\n## 💃 Easier Animation Registration\r\n\r\n* Add named animation registration 
https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F360\r\n* Make it easier to register animations https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F366\r\n\r\n\r\n## 🍩 Other updates and bug fixes\r\n\r\n* Migrate to LLMService Context Management https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F362\r\n* Fix bug where SpeechListener doesn't listen after error #364 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F365\r\n* Add auto start functionality to AzureStreamSpeechListener https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F369\r\n* Add support for OpenAI TTS on WebGL https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F370\r\n* Fix bug where Function Calling fails https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F372\r\n* Fix Gemini fails after function calling #371 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F373\r\n* Update demo for v0.8.3 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F374\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.2...0.8.3","2024-10-11T12:58:47",{"id":243,"version":244,"summary_zh":245,"released_at":246},197993,"0.8.2","## 🌐 Control WebGL Character from JavaScript\r\n\r\nWe’ve added the ability to control the ChatdollKit Unity application from JavaScript when running in WebGL builds. This allows for more seamless interactions between the Unity app and web-based systems.\r\n\r\n* Enable WebGL interaction via external JavaScript https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F355\r\n\r\n\r\n## 🗣️ Speech Synthesizer\r\n\r\nA new `SpeechSynthesizer` component has been introduced to streamline text-to-speech (TTS) operations. This component is reusable across projects without `Model` package, simplifying maintenance and reusability. 
\r\n\r\n* Add SpeechSynthesizer as new mainstream TTS component https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F354\r\n* Speech synthesizer updates https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F358\r\n* Improve TTS handling for empty strings and errors https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F353\r\n\r\n\r\n## 🍩 Other Updates\r\n\r\n* Small changes for v0.8.2 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F356\r\n* Update demo v0.8.2 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F357\r\n* Prevent user request from being overwritten by noise https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F359\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.1...0.8.2\r\n","2024-09-23T03:55:05",{"id":248,"version":249,"summary_zh":250,"released_at":251},197994,"0.8.1","## 🏷️ User-Defined Tags\r\n\r\nYou can now include custom tags in AI responses, enabling dynamic actions. For instance, embed language codes in replies to switch between multiple languages on the fly during conversations.\r\n\r\n* Add support for user-defined tags in response messages https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F342\r\n* Add support for user-defined tags (Claude, Gemini and Dify) https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F350\r\n\r\n## 🌐 External Control via Socket\r\n\r\nNow supports external commands through Socket communication. 
Direct conversation flow, trigger specific phrases, or control expressions and gestures, unlocking new use cases like AI Vtubers and remote customer service.\r\n\r\n* Add SocketServer to enable external request handling via socket communication https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F345\r\n* Add DialogPriorityManager for handling prioritized dialog requests https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F346\r\n* Add option to hide user message window https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F347\r\n* Add ModelRequestBroker for simplified model control via tagged text https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F348\r\n\r\nCheck out the client-side demo here: https:\u002F\u002Fgist.github.com\u002Fuezo\u002F9e56a828bb5ea0387f90cc07f82b4c15\r\n\r\n\r\n## 🍩  Other Updates\r\n\r\n* Fix bug where expressions on error doesn't work https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F344\r\n* Improve text splitting logic in SplitString method https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F349\r\n* Update demo for v0.8.1 https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F351\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.8.0...0.8.1","2024-09-18T14:49:45",{"id":253,"version":254,"summary_zh":255,"released_at":256},197995,"0.8.0","## 💎 What's New in Version 0.8 Beta\r\n\r\nTo run the demo for version 0.8.0 beta, please follow the steps below after importing the dependencies:\r\n\r\n- Open scene `Demo\u002FDemo08`.\r\n- Select `AIAvatarVRM` object in scene.\r\n- Set OpenAI API key to following components on inspector:\r\n  - ChatGPTService\r\n  - OpenAITTSLoader\r\n  - OpenAISpeechListener\r\n- Run on Unity Editor.\r\n- Say \"こんにちは\" or word longer than 3 characters.\r\n- Enjoy👍\r\n\r\n### ⚡ Optimized AI Dialog Processing\r\n\r\nWe've boosted 
response speed with parallel processing and made it easier for you to customize behavior with your own code. Enjoy faster, more flexible AI conversations!\r\n\r\n* Optimize AI-driven interactions by @uezo in https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F335\r\n\r\n\r\n### 🥰 Emotionally Rich Speech\r\n\r\nAdjusts vocal tone dynamically to match the conversation, delivering more engaging and natural interactions.\r\n\r\n* Improve expressiveness of text-to-speech output by @uezo in https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F336\r\n* Allow adding emotion to speech synthesis by @uezo in https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F337\r\n\r\n\r\n### 🎤 Enhanced Microphone Control\r\n\r\nMicrophone control is now more flexible than ever! Easily start\u002Fstop devices, mute\u002Funmute, and adjust voice recognition thresholds independently.\r\n\r\n* Add new SpeechListener namespace with voice input modules by @uezo in https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F334\r\n\r\n\r\n## 🍩 Other Changes\r\n\r\n* Fix some bugs in StyleBertVITSTTSLoader by @uezo in https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F333\r\n* Update for v0.8.0 beta by @uezo in https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fpull\u002F338\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fuezo\u002FChatdollKit\u002Fcompare\u002F0.7.7...0.8.0","2024-09-08T08:12:13"]