[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-kyutai-labs--unmute":3,"tool-kyutai-labs--unmute":65},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159267,2,"2026-04-17T11:29:14",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,3,"2026-04-06T11:19:32",[15,26,14,13],"图像",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":10,"last_commit_at":33,"category_tags":34,"status":16},8553,"spec-kit","github\u002Fspec-kit","Spec Kit 是一款专为提升软件开发效率而设计的开源工具包，旨在帮助团队快速落地“规格驱动开发”（Spec-Driven Development）模式。传统开发中，需求文档往往与代码实现脱节，导致沟通成本高且结果不可控；而 Spec Kit 通过将规格说明书转化为可执行的指令，让 AI 
直接依据明确的业务场景生成高质量代码，从而减少从零开始的随意编码，确保产出结果的可预测性。\n\n该工具特别适合希望利用 AI 辅助编程的开发者、技术负责人及初创团队。无论是启动全新项目还是在现有工程中引入规范化流程，用户只需通过简单的命令行操作，即可初始化项目并集成主流的 AI 编程助手。其核心技术亮点在于“规格即代码”的理念，支持社区扩展与预设模板，允许用户根据特定技术栈定制开发流程。此外，Spec Kit 强调官方维护的安全性，提供稳定的版本管理，帮助开发者在享受 AI 红利的同时，依然牢牢掌握架构设计的主动权，真正实现从“凭感觉写代码”到“按规格建系统”的转变。",88749,"2026-04-17T09:48:14",[15,26,14,13],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":10,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85092,"2026-04-10T11:13:16",[26,51,52,53,14,54,15,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":62,"last_commit_at":63,"category_tags":64,"status":16},5784,"funNLP","fighting41love\u002FfunNLP","funNLP 是一个专为中文自然语言处理（NLP）打造的超级资源库，被誉为\"NLP 
民工的乐园”。它并非单一的软件工具，而是一个汇集了海量开源项目、数据集、预训练模型和实用代码的综合性平台。\n\n面对中文 NLP 领域资源分散、入门门槛高以及特定场景数据匮乏的痛点，funNLP 提供了“一站式”解决方案。这里不仅涵盖了分词、命名实体识别、情感分析、文本摘要等基础任务的标准工具，还独特地收录了丰富的垂直领域资源，如法律、医疗、金融行业的专用词库与数据集，甚至包含古诗词生成、歌词创作等趣味应用。其核心亮点在于极高的全面性与实用性，从基础的字典词典到前沿的 BERT、GPT-2 模型代码，再到高质量的标注数据和竞赛方案，应有尽有。\n\n无论是刚刚踏入 NLP 领域的学生、需要快速验证想法的算法工程师，还是从事人工智能研究的学者，都能在这里找到急需的“武器弹药”。对于开发者而言，它能大幅减少寻找数据和复现模型的时间；对于研究者，它提供了丰富的基准测试资源和前沿技术参考。funNLP 以开放共享的精神，极大地降低了中文自然语言处理的开发与研究成本，是中文 AI 社区不可或缺的宝藏仓库。",79857,1,"2026-04-08T20:11:31",[15,51,54],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":71,"readme_en":72,"readme_zh":73,"quickstart_zh":74,"use_case_zh":75,"hero_image_url":76,"owner_login":77,"owner_name":78,"owner_avatar_url":79,"owner_bio":80,"owner_company":81,"owner_location":81,"owner_email":81,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":115,"forks":116,"last_commit_at":117,"license":118,"difficulty_score":23,"env_os":119,"env_gpu":120,"env_ram":121,"env_deps":122,"category_tags":132,"github_topics":81,"view_count":10,"oss_zip_url":81,"oss_zip_packed_at":81,"status":16,"created_at":133,"updated_at":134,"faqs":135,"releases":170},8581,"kyutai-labs\u002Funmute","unmute","Make text LLMs listen and speak","Unmute 是一个让纯文本大语言模型（LLM）具备“听”和“说”能力的开源系统。它通过集成 Kyutai 研发的语音识别（STT）与语音合成（TTS）模型，将用户的语音实时转为文字输入给模型，再将模型生成的文字回复即时转化为语音播放，从而构建出流畅的语音对话体验。\n\n这一工具主要解决了传统文本模型无法直接处理音频交互的痛点，让用户无需打字即可与自然语言模型进行低延迟的实时口头交流。其技术亮点在于对语音处理链路进行了专门的低延迟优化，并采用灵活的架构设计：后端通过 WebSocket 实时流转音频数据，且兼容任意文本大模型。用户既可以直接使用官方服务，也能通过 OpenRouter 接入云端模型，或在本地部署如 Gemma 3 等开源模型。\n\nUnmute 特别适合开发者、AI 研究人员以及希望构建语音交互应用的技术爱好者。对于想要探索多模态交互原型或搭建私有化语音助手的团队来说，它提供了基于 Docker Compose 的便捷部署方案，只需单张支持 CUDA 的显卡即可在 Linux 环境下快速启动。虽然普通用户也可通过网页版直接体验，但其核心价值更在于为技术社区提供了一套可定制、","Unmute 是一个让纯文本大语言模型（LLM）具备“听”和“说”能力的开源系统。它通过集成 Kyutai 研发的语音识别（STT）与语音合成（TTS）模型，将用户的语音实时转为文字输入给模型，再将模型生成的文字回复即时转化为语音播放，从而构建出流畅的语音对话体验。\n\n这一工具主要解决了传统文本模型无法直接处理音频交互的痛点，让用户无需打字即可与自然语言模型进行低延迟的实时口头交流。其技术亮点在于对语音处理链路进行了专门的低延迟优化，并采用灵活的架构设计：后端通过 
WebSocket 实时流转音频数据，且兼容任意文本大模型。用户既可以直接使用官方服务，也能通过 OpenRouter 接入云端模型，或在本地部署如 Gemma 3 等开源模型。\n\nUnmute 特别适合开发者、AI 研究人员以及希望构建语音交互应用的技术爱好者。对于想要探索多模态交互原型或搭建私有化语音助手的团队来说，它提供了基于 Docker Compose 的便捷部署方案，只需单张支持 CUDA 的显卡即可在 Linux 环境下快速启动。虽然普通用户也可通过网页版直接体验，但其核心价值更在于为技术社区提供了一套可定制、可扩展的语音交互基础设施。","# Unmute\n\nTry it out at [Unmute.sh](https:\u002F\u002Funmute.sh)!\n\nUnmute is a system that allows text LLMs to listen and speak by wrapping them in Kyutai's Text-to-speech and Speech-to-text models.\nThe speech-to-text transcribes what the user says, the LLM generates a response in text, and the text-to-speech reads it out loud.\nBoth the STT and TTS are optimized for low latency and the system works with any text LLM you like.\n\nIf you want to use Kyutai STT or Kyutai TTS separately, check out [kyutai-labs\u002Fdelayed-streams-modeling](https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Fdelayed-streams-modeling).\nA pre-print about the models is available [here](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2509.08753).\n\nOn a high level, it works like this:\n\n```mermaid\ngraph LR\n    UB[User browser]\n    UB --> B(Backend)\n    UB --> F(Frontend)\n    B --> STT(Speech-to-text)\n    B --> LLM(LLM)\n    B --> TTS(Text-to-speech)\n```\n\n- The user opens the Unmute website, served by the **frontend**.\n- By clicking \"connect\", the user establishes a websocket connection to the **backend**, sending audio and other metadata back and forth in real time.\n  - The backend connects via websocket to the **speech-to-text** server, sending it the audio from the user and receiving back the transcription in real time.\n  - Once the speech-to-text detects that the user has stopped speaking and it's time to generate a response, the backend connects to an **LLM** server to retrieve the response. 
We serve the LLM using [OpenRouter](https:\u002F\u002Fopenrouter.ai\u002F), but you can also host your own using [VLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm).\n  - As the response is being generated, the backend feeds it to the **text-to-speech** server to read it out loud, and forwards the generated speech to the user.\n\n## Setup\n\n> [!NOTE]\n> If something isn't working for you, don't hesitate to open an issue. We'll do our best to help you figure out what's wrong.\n\nRequirements:\n- Hardware: a GPU with CUDA support and at least 16 GB VRAM. Architecture must be x86_64, no aarch64 support is planned.\n- OS: Linux, or Windows with WSL ([installation instructions](https:\u002F\u002Fubuntu.com\u002Fdesktop\u002Fwsl)). Running on Windows natively is not supported (see [#84](https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Funmute\u002Fissues\u002F84)). Neither is running on Mac (see [#74](https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Funmute\u002Fissues\u002F74)).\n\nWe provide multiple ways of deploying your own [unmute.sh](unmute.sh):\n\n| Name                      | Number of GPUs | Number of machines | Difficulty | Documented | Kyutai support |\n|---------------------------|----------------|--------------------|------------|------------|----------------|\n| Docker Compose            | 1+             | 1                  | Very easy  |✅         |✅              |\n| Dockerless                | 1 to 3         | 1 to 5             | Easy       |✅         |✅              |\n| Docker Swarm              | 1 to ~100      | 1 to ~100          | Medium     |✅         |❌              |\n\n\nSince Unmute is a complex system with many services that need to be running at the same time, we recommend using [**Docker Compose**](https:\u002F\u002Fdocs.docker.com\u002Fcompose\u002F) to run Unmute.\nIt allows you to start or stop all services using a single command.\nSince the services are Docker containers, you get a reproducible environment without 
having to worry about dependencies.\n\nWhile we support deploying with Docker Compose and without Docker, the Docker Swarm deployment is only given to show how we deploy and scale [unmute.sh](unmute.sh). It looks a lot like the compose files, but since debugging multi-node applications is hard, we cannot help you debug the swarm deployment.\n\n### LLM access on Hugging Face Hub\n\nYou can use any LLM you want.\nIn production, we use GPT OSS 120B served over OpenRouter.\nIn the default local setup (Docker Compose\u002FDockerless), Unmute uses [Gemma 3 1B](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-3-1b-it) as the LLM.\n\nThis model is freely available but requires you to accept its conditions before you can access it:\n\n1. Create a Hugging Face account.\n2. Accept the conditions on the [Gemma 3 1B model page](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-3-1b-it).\n3. [Create an access token.](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fen\u002Fsecurity-tokens) You can use a fine-grained token; the only permission you need to grant is \"Read access to contents of all public gated repos you can access\".\n   **Do not use tokens with write access when deploying publicly.** If the server is compromised, the attacker would get write access to any models\u002Fdatasets\u002Fetc. you have on Hugging Face.\n4. 
Add the token into your `~\u002F.bashrc` or equivalent as `export HUGGING_FACE_HUB_TOKEN=hf_...your token here...`\n\n### Start Unmute\n\nMake sure you have [**Docker Compose**](https:\u002F\u002Fdocs.docker.com\u002Fcompose\u002F) installed.\nYou'll also need the [NVIDIA Container Toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Flatest\u002Finstall-guide.html) to allow Docker to access your GPU.\nTo make sure the NVIDIA Container Toolkit is installed correctly, run:\n```bash\nsudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi\n```\n\nIf you use [google\u002Fgemma-3-1b-it](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-3-1b-it),\nthe default in `docker-compose.yml`, 16GB of GPU memory is sufficient.\nIf you're running into memory issues, open `docker-compose.yml` and look for `NOTE:` comments to see places that you might need to adjust.\n\nOn a machine with a GPU, run:\n\n```bash\n# Make sure you have the environment variable with the token:\necho $HUGGING_FACE_HUB_TOKEN  # This should print hf_...something...\n\ndocker compose up --build\n```\n\n#### Using multiple GPUs\n\nOn [Unmute.sh](https:\u002F\u002Funmute.sh\u002F), we run the speech-to-text, text-to-speech, and the VLLM server on separate GPUs,\nwhich improves the latency compared to a single-GPU setup.\nThe TTS latency decreases from ~750ms when running everything on a single L40S GPU to around ~450ms on [Unmute.sh](https:\u002F\u002Funmute.sh\u002F).\n\nIf you have at least three GPUs available, add this snippet to the `stt`, `tts` and `llm` services to ensure they are run on separate GPUs:\n\n```yaml\n  stt: # Similarly for `tts` and `llm`\n    # ...other configuration\n    deploy:\n      resources:\n        reservations:\n          devices:\n            - driver: nvidia\n              count: 1\n              capabilities: [gpu]\n```\n\n\n### Running without Docker\n\nAlternatively, you can choose to run Unmute by manually 
starting the services without going through Docker.\nThis can be more difficult to set up because of the various dependencies needed.\n\nThe following instructions only work for Linux and WSL.\n\n#### Software requirements\n\n* `uv`: Install with `curl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh`\n* `cargo`: Install with `curl https:\u002F\u002Fsh.rustup.rs -sSf | sh`\n* `pnpm`: Install with `curl -fsSL https:\u002F\u002Fget.pnpm.io\u002Finstall.sh | sh -`\n* `cuda 12.1`: Install it with conda or directly from the Nvidia website. Needed for the Rust processes (tts and stt).\n\n#### Hardware requirements\n\nStart each of the services one by one in a different tmux session or terminal:\n```bash\n.\u002Fdockerless\u002Fstart_frontend.sh\n.\u002Fdockerless\u002Fstart_backend.sh\n.\u002Fdockerless\u002Fstart_llm.sh        # Needs 6.1GB of vram\n.\u002Fdockerless\u002Fstart_stt.sh        # Needs 2.5GB of vram\n.\u002Fdockerless\u002Fstart_tts.sh        # Needs 5.3GB of vram\n```\nAnd the website should be accessible at `http:\u002F\u002Flocalhost:3000`.\n\n### Connecting to a remote server running Unmute\n\nIf you're running Unmute on a machine that you're accessing over SSH – call it `unmute-box`  – and you'd like to access it from your local computer,\nyou'll need to set up [port forwarding](https:\u002F\u002Fwww.ssh.com\u002Facademy\u002Fssh\u002Ftunneling-example).\n\n> [!NOTE]\n> If you're running over HTTP and not HTTPS, you'll need to forward the ports even if `http:\u002F\u002Funmute-box:3000` is accessible directly.\n> This is because browsers usually won't let you use the microphone on HTTP connections except for localhost, for security reasons.\n> See below for HTTPS instructions.\n\n**For Docker Compose**: By default, our Docker Compose setup runs on port 80.\nTo forward port 80 on the remote to port 3333 locally, use:\n\n```bash\nssh -N -L 3333:localhost:80 unmute-box\n```\nIf everything works correctly, this command will simply not output 
anything and just keep running.\nThen open `localhost:3333` in your browser.\n\n**For Dockerless**: You need to separately forward the backend (port 8000) and frontend (port 3000):\n\n```bash\nssh -N -L 8000:localhost:8000 -L 3000:localhost:3000 unmute-box\n```\n\n```mermaid\nflowchart LR\n    subgraph Local_Machine [Local Machine]\n        direction TB\n        browser[Browser]\n        browser -. \"User opens localhost:3000 in browser\" .-> local_frontend[localhost:3000]\n        browser -. \"Frontend queries API at localhost:8000\" .-> local_backend[localhost:8000]\n    end\n    subgraph Remote_Server [Remote Server]\n        direction TB\n        remote_backend[Backend:8000]\n        remote_frontend[Frontend:3000]\n    end\n    local_backend -- \"SSH Tunnel: 8000\" --> remote_backend\n    local_frontend -- \"SSH Tunnel: 3000\" --> remote_frontend\n```\n\n### HTTPS support\n\nFor simplicity, we omit HTTPS support from the Docker Compose and Dockerless setups.\nIf you want to make the deployment work over the HTTPS, consider using Docker Swarm\n(see [SWARM.md](\u002FSWARM.md)) or ask your favorite LLM how to make the Docker Compose or dockerless setup work over HTTPS.\n\n\n## Production deployment with Docker Swarm\n\nIf you're curious to know how we deploy and scale [unmute.sh](https:\u002F\u002Funmute.sh), take a look at our docs\non the [Docker Swarm deployment](.\u002FSWARM.md).\n\n## Modifying Unmute\n\nHere are some high-level pointers about how you'd go about making certain changes to Unmute.\n\n### Subtitles and dev mode\n\nPress \"S\" to turn on subtitles for both the user and the chatbot.\n\nThere is also a dev mode that can help debugging, but it's disabled by default.\nGo to `useKeyboardShortcuts.ts` and change `ALLOW_DEV_MODE` to `true`.\nThen press `D` to see a debug view.\nYou can add information to the dev mode by modifying `self.debug_dict` in `unmute_handler.py`.\n\n### Changing characters\u002Fvoices\n\nThe characters' voices and prompts are 
defined in [`voices.yaml`](voices.yaml).\nThe format of the config file should be intuitive.\nCertain system prompts contain dynamically generated elements.\nFor example, \"Quiz show\" has its 5 questions randomly chosen in advance from a fixed list.\nSystem prompts like this are defined in [`unmute\u002Fllm\u002Fsystem_prompt.py`](unmute\u002Fllm\u002Fsystem_prompt.py).\n\nNote that the file is only loaded when the backend starts and is then cached, so if you change something in `voices.yaml`,\nyou'll need to restart the backend.\n\nYou can check out the available voices in our [voice repository](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices).\nTo use one of the voices, change the `path_on_server` field in [`voices.yaml`](voices.yaml) to the relative\npath of the voice you want, for example [`voice-donations\u002FHaku.wav`](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvoice-donations\u002FHaku.wav).\n\nFrom June 2025 to February 2026, we also ran the [Unmute Voice Donation Project](https:\u002F\u002Funmute.sh\u002Fvoice-donation),\nwhere volunteers provided their voices for use with Kyutai TTS 1.6B (used by Unmute) and other open-source TTS models.\nYou can find these voices in the [voice repository](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices) as well.\n\n### Using external LLM servers\n\nThe Unmute backend can be used with any OpenAI compatible LLM server. 
By default, the `docker-compose.yml` configures VLLM to enable a fully self-contained, local setup.\nYou can modify this file to change to another external LLM, such as an OpenAI server, a local ollama setup, etc.\n\nFor ollama, as environment variables for the `unmute-backend` image, replace\n```yaml\n  backend:\n    image: unmute-backend:latest\n    [..]\n    environment:\n      [..]\n       - KYUTAI_LLM_URL=http:\u002F\u002Fllm:8000\n```\n\nwith\n```yaml\n  backend:\n    image: unmute-backend:latest\n    [..]\n    environment:\n      [..]\n      - KYUTAI_LLM_URL=http:\u002F\u002Fhost.docker.internal:11434\n      - KYUTAI_LLM_MODEL=gemma3\n      - KYUTAI_LLM_API_KEY=ollama\n    extra_hosts:\n      - \"host.docker.internal:host-gateway\"\n```\nThis points to your localhost server. Alternatively, to use an OpenAI-compatible server such as [OpenRouter](https:\u002F\u002Fopenrouter.ai\u002F), you can use\n```yaml\n  backend:\n    image: unmute-backend:latest\n    [..]\n    environment:\n      [..]\n      - KYUTAI_LLM_URL=https:\u002F\u002Fopenrouter.ai\u002Fapi\n      - KYUTAI_LLM_MODEL=google\u002Fgemma-3-12b-it # or whatever\n      - KYUTAI_LLM_API_KEY=sk-.. 
# your OpenRouter key\n```\n\nThe section for vllm can then be removed, as it is no longer needed:\n```yaml\n  llm:\n    image: vllm\u002Fvllm-openai:v0.11.0\n    [..]\n```\n\n### Swapping the frontend\n\nThe backend and frontend communicate over websocket using a protocol based on the\n[OpenAI Realtime API](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fguides\u002Frealtime) (\"ORA\").\nWhere possible, we try to match the ORA format, but there are some extra messages we needed to add,\nand others have simplified parameters.\nWe try to make it clear where we deviate from the ORA format, see [`unmute\u002Fopenai_realtime_api_events.py`](unmute\u002Fopenai_realtime_api_events.py).\n\nFor detailed information about the WebSocket communication protocol, message types, and audio processing pipeline, see the [browser-backend communication documentation](docs\u002Fbrowser_backend_communication.md).\n\nIdeally, it should be simple to write a single frontend that can communicate with either the Unmute backend\nor the OpenAI Realtime API, but we are not fully compatible yet.\nContributions welcome!\n\nThe frontend is a Next.js app defined in `frontend\u002F`.\nIf you'd like to compare to a different frontend implementation,\nthere is a Python client defined in\n[`unmute\u002Floadtest\u002Floadtest_client.py`](unmute\u002Floadtest\u002Floadtest_client.py),\na script that we use to benchmark the latency and throughput of Unmute.\n\n### Tool calling\n\nThis is a common requirement so we would appreciate a contribution to support tool calling in Unmute!\n\nThe easiest way to integrate tool calling into Unmute would be to do so in a way that's fully invisible to Unmute itself - just make it part of the LLM server.\nSee [this comment](https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Funmute\u002Fissues\u002F77#issuecomment-3035220686) on how this can be achieved.\nYou'd need to write a simple server in FastAPI to wrap vLLM but plug in the tool call responses.\n\n## 
Developing Unmute\n\n### Install pre-commit hooks\n\nFirst install `pre-commit` itself – you likely want to install it globally using `pip install pre-commit` rather than in a virtual environment or `uv`,\nbecause you need the `pre-commit` executable to always be available. Then run:\n\n```bash\npre-commit install --hook-type pre-commit\n```\n\nWe recommend using [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F) to manage Python dependencies.\nThe commands below assume you are using uv.\n\n### Run backend (dev mode, with autoreloading)\n\n```bash\nuv run fastapi dev unmute\u002Fmain_websocket.py\n```\n\n### Run backend (production)\n\n```bash\nuv run fastapi run unmute\u002Fmain_websocket.py\n```\n\n### Run loadtest\n\n`loadtest_client.py` is a script that connects to Unmute and simulates conversations with it in order to measure latency and throughput.\n\n```bash\nuv run unmute\u002Floadtest\u002Floadtest_client.py --server-url ws:\u002F\u002Flocalhost:8000 --n-workers 16\n```\n","# 解除静音\n\n快来 [Unmute.sh](https:\u002F\u002Funmute.sh) 体验吧！\n\nUnmute 是一个系统，它通过将文本型大语言模型（LLM）与 Kyutai 的语音合成（TTS）和语音识别（STT）模型相结合，使 LLM 具备“听”和“说”的能力。语音识别会将用户的语音转录为文本，LLM 生成回复文本，然后语音合成就会将这些文本朗读出来。语音识别和语音合成都针对低延迟进行了优化，该系统可以与任何你喜欢的文本型 LLM 配合使用。\n\n如果你想单独使用 Kyutai 的 STT 或 TTS，可以查看 [kyutai-labs\u002Fdelayed-streams-modeling](https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Fdelayed-streams-modeling)。关于这些模型的预印本可以在 [这里](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2509.08753) 获取。\n\n从高层次来看，其工作流程如下：\n\n```mermaid\ngraph LR\n    UB[用户浏览器]\n    UB --> B(后端)\n    UB --> F(前端)\n    B --> STT(语音识别)\n    B --> LLM(LLM)\n    B --> TTS(语音合成)\n```\n\n- 用户打开由 **前端** 提供服务的 Unmute 网站。\n- 点击“连接”后，用户会建立一个 WebSocket 连接至 **后端**，实时传输音频及其他元数据。\n  - 后端通过 WebSocket 连接到 **语音识别** 服务器，将用户的音频发送过去，并实时接收转录结果。\n  - 当语音识别检测到用户停止说话并准备生成回复时，后端会连接到 **LLM** 服务器以获取响应。我们使用 OpenRouter 来部署 LLM，但你也可以使用 VLLM 自行托管。\n  - 在生成回复的过程中，后端会将回复内容传递给 **语音合成** 服务器进行朗读，并将合成后的语音流返回给用户。\n\n## 设置\n\n> [!NOTE]  \n> 如果遇到问题，请随时提交 
Issue，我们会尽力帮助你解决问题。\n\n### 要求：\n- 硬件：支持 CUDA 的 GPU，显存至少 16 GB。架构必须是 x86_64，暂不支持 aarch64。\n- 操作系统：Linux，或带有 WSL 的 Windows（[安装说明](https:\u002F\u002Fubuntu.com\u002Fdesktop\u002Fwsl)）。不支持在原生 Windows 上运行（参见 [#84](https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Funmute\u002Fissues\u002F84)），也不支持在 Mac 上运行（参见 [#74](https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Funmute\u002Fissues\u002F74)）。\n\n我们提供了多种方式来部署你自己的 [unmute.sh](unmute.sh)：\n\n| 名称                      | GPU 数量 | 机器数量 | 难度       | 文档是否完善 | Kyutai 支持 |\n|---------------------------|----------|----------|------------|--------------|-------------|\n| Docker Compose            | 1+       | 1        | 非常简单   | ✅           | ✅          |\n| 无 Docker                 | 1–3      | 1–5      | 容易       | ✅           | ✅          |\n| Docker Swarm              | 1–~100   | 1–~100   | 中等难度   | ✅           | ❌          |\n\n由于 Unmute 是一个复杂的系统，包含多个需要同时运行的服务，我们建议使用 [**Docker Compose**](https:\u002F\u002Fdocs.docker.com\u002Fcompose\u002F) 来运行 Unmute。它允许你通过一条命令启动或停止所有服务。由于这些服务都是 Docker 容器，你可以获得一个可重复的环境，而无需担心依赖关系。\n\n虽然我们也支持使用 Docker Compose 和无 Docker 的部署方式，但 Docker Swarm 部署仅用于展示我们如何部署和扩展 [unmute.sh](unmute.sh)。它的配置文件与 Compose 类似，但由于多节点应用的调试较为困难，我们无法提供 Swarm 部署的调试支持。\n\n### Hugging Face Hub 上的 LLM 访问权限\n\n你可以使用任何你想要的 LLM。在生产环境中，我们使用通过 OpenRouter 提供服务的 GPT OSS 120B。而在默认的本地设置中（Docker Compose\u002F无 Docker），Unmute 使用 [Gemma 3 1B](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-3-1b-it) 作为 LLM。\n\n该模型可免费使用，但你需要先接受相关条款：\n\n1. 创建一个 Hugging Face 账号。\n2. 接受 [Mistral Small 3.2 24B 模型页面](https:\u002F\u002Fhuggingface.co\u002Fmistralai\u002FMistral-Small-3.2-24B-Instruct-2506)上的使用条款。\n3. [创建访问令牌。](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fen\u002Fsecurity-tokens) 你可以使用细粒度令牌，只需授予“读取你可访问的所有公开 gated repo 内容”的权限即可。\n   **请勿在公开部署时使用具有写入权限的令牌。** 如果服务器遭到入侵，攻击者将获得对你在 Hugging Face 上所有模型、数据集等的写入权限。\n4. 
将令牌添加到你的 `~\u002F.bashrc` 或等效文件中，格式为 `export HUGGING_FACE_HUB_TOKEN=hf_...你的令牌...\"`。\n\n### 启动 Unmute\n\n请确保已安装 [**Docker Compose**](https:\u002F\u002Fdocs.docker.com\u002Fcompose\u002F)。此外，还需要安装 [NVIDIA Container Toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Flatest\u002Finstall-guide.html)，以便 Docker 能够访问你的 GPU。要验证 NVIDIA Container Toolkit 是否正确安装，请运行以下命令：\n\n```bash\nsudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi\n```\n\n如果你使用的是 [google\u002Fgemma-3-1b-it](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-3-1b-it)，这是 `docker-compose.yml` 中的默认配置，那么 16 GB 的显存就足够了。如果遇到内存不足的问题，可以打开 `docker-compose.yml`，查找带有 `NOTE:` 标记的注释，看看哪些地方可能需要调整。\n\n在配备 GPU 的机器上，运行以下命令：\n\n```bash\n# 确保已设置包含令牌的环境变量：\necho $HUGGING_FACE_HUB_TOKEN  # 应该输出 hf_...一些内容...\n\ndocker compose up --build\n```\n\n#### 使用多 GPU\n\n在 [Unmute.sh](https:\u002F\u002Funmute.sh\u002F) 上，我们将语音识别、语音合成以及 VLLM 服务器分别部署在不同的 GPU 上，这样可以显著降低延迟，相比单 GPU 配置效果更好。例如，当所有服务都在一台 L40S GPU 上运行时，TTS 延迟约为 750 毫秒；而在 [Unmute.sh](https:\u002F\u002Funmute.sh\u002F) 上，这一延迟降至约 450 毫秒。\n\n如果你有至少三块 GPU，可以在 `stt`、`tts` 和 `llm` 服务的配置中添加以下片段，以确保它们分别运行在不同的 GPU 上：\n\n```yaml\n  stt: # `tts` 和 `llm` 同理\n    # ...其他配置\n    deploy:\n      resources:\n        reservations:\n          devices:\n            - driver: nvidia\n              count: 1\n              capabilities: [gpu]\n```\n\n### 不使用 Docker 运行\n\n或者，你也可以选择不通过 Docker，而是手动启动各个服务来运行 Unmute。\n由于需要处理多种依赖关系，这种方式的设置可能会更加复杂。\n\n以下说明仅适用于 Linux 和 WSL 环境。\n\n#### 软件要求\n\n* `uv`：使用 `curl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh` 安装\n* `cargo`：使用 `curl https:\u002F\u002Fsh.rustup.rs -sSf | sh` 安装\n* `pnpm`：使用 `curl -fsSL https:\u002F\u002Fget.pnpm.io\u002Finstall.sh | sh -` 安装\n* `cuda 12.1`：可通过 conda 或直接从 Nvidia 官网安装。Rust 进程（tts 和 stt）需要此工具。\n\n#### 硬件要求\n\n在不同的 tmux 
会话或终端中依次启动每个服务：\n```bash\n.\u002Fdockerless\u002Fstart_frontend.sh\n.\u002Fdockerless\u002Fstart_backend.sh\n.\u002Fdockerless\u002Fstart_llm.sh        # 需要 6.1GB 显存\n.\u002Fdockerless\u002Fstart_stt.sh        # 需要 2.5GB 显存\n.\u002Fdockerless\u002Fstart_tts.sh        # 需要 5.3GB 显存\n```\n之后，网站应可在 `http:\u002F\u002Flocalhost:3000` 访问。\n\n### 连接到运行 Unmute 的远程服务器\n\n如果你在一台通过 SSH 访问的机器上运行 Unmute——假设这台机器名为 `unmute-box`——并且希望从本地计算机访问它，\n你需要设置 [端口转发](https:\u002F\u002Fwww.ssh.com\u002Facademy\u002Fssh\u002Ftunneling-example)。\n\n> [!注意]\n> 如果你使用的是 HTTP 而不是 HTTPS，即使可以直接访问 `http:\u002F\u002Funmute-box:3000`，仍然需要进行端口转发。\n> 这是因为出于安全考虑，浏览器通常不允许在非本地的 HTTP 连接上使用麦克风。\n> HTTPS 的相关说明请见下文。\n\n**对于 Docker Compose**：默认情况下，我们的 Docker Compose 设置运行在端口 80 上。\n要将远程服务器的 80 端口转发到本地的 3333 端口，请使用：\n\n```bash\nssh -N -L 3333:localhost:80 unmute-box\n```\n\n如果一切正常，该命令不会输出任何内容，只会持续运行。然后在浏览器中打开 `localhost:3333`。\n\n**对于 Dockerless**：你需要分别转发后端（8000 端口）和前端（3000 端口）：\n\n```bash\nssh -N -L 8000:localhost:8000 -L 3000:localhost:3000 unmute-box\n```\n\n```mermaid\nflowchart LR\n    subgraph Local_Machine [本地机器]\n        direction TB\n        browser[浏览器]\n        browser -. \"用户在浏览器中打开 localhost:3000\" .-> local_frontend[localhost:3000]\n        browser -. 
\"前端向 localhost:8000 发起 API 请求\" .-> local_backend[localhost:8000]\n    end\n    subgraph Remote_Server [远程服务器]\n        direction TB\n        remote_backend[后端:8000]\n        remote_frontend[前端:3000]\n    end\n    local_backend -- \"SSH 隧道: 8000\" --> remote_backend\n    local_frontend -- \"SSH 隧道: 3000\" --> remote_frontend\n```\n\n### HTTPS 支持\n\n为简化起见，我们在 Docker Compose 和 Dockerless 的设置中未包含 HTTPS 支持。\n如果你想让部署通过 HTTPS 工作，可以考虑使用 Docker Swarm\n(参见 [SWARM.md](\u002FSWARM.md))，或者询问你喜欢的 LLM 如何使 Docker Compose 或 Dockerless 设置支持 HTTPS。\n\n\n## 使用 Docker Swarm 的生产部署\n\n如果你想知道我们是如何部署和扩展 [unmute.sh](https:\u002F\u002Funmute.sh) 的，请查看我们的文档中关于\n[Docker Swarm 部署](.\u002FSWARM.md)的部分。\n\n## 修改 Unmute\n\n以下是一些关于如何对 Unmute 进行特定修改的高层次指导。\n\n### 字幕与开发模式\n\n按下 “S” 键可为用户和聊天机器人同时开启字幕。\n\n此外，还有一个开发模式可以帮助调试，但默认是禁用的。打开 `useKeyboardShortcuts.ts` 文件，将 `ALLOW_DEV_MODE` 改为 `true`。\n然后按下 “D” 键即可查看调试界面。你可以通过修改 `unmute_handler.py` 中的 `self.debug_dict` 来添加更多调试信息。\n\n### 更改角色\u002F声音\n\n角色的声音和提示语定义在 [`voices.yaml`](voices.yaml) 文件中。\n配置文件的格式应该比较直观。某些系统提示语包含动态生成的内容。\n例如，“知识竞赛”中的 5 道题目是从一个固定列表中随机提前选出的。\n类似这样的系统提示语定义在 [`unmute\u002Fllm\u002Fsystem_prompt.py`](unmute\u002Fllm\u002Fsystem_prompt.py) 文件中。\n\n请注意，该文件仅在后端启动时加载并被缓存，因此如果你更改了 `voices.yaml` 中的内容，\n就需要重新启动后端。\n\n你可以在我们的 [语音仓库](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices) 中查看可用的语音。\n要使用其中一种语音，只需将 `voices.yaml` 中的 `path_on_server` 字段更改为所需语音的相对路径，\n例如 [`voice-donations\u002FHaku.wav`](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices\u002Fblob\u002Fmain\u002Fvoice-donations\u002FHaku.wav)。\n\n从 2025 年 6 月至 2026 年 2 月，我们还开展了 [Unmute 语音捐赠项目](https:\u002F\u002Funmute.sh\u002Fvoice-donation)，\n志愿者们提供了自己的声音，用于 Kyutai TTS 1.6B（Unmute 使用）及其他开源 TTS 模型。\n这些语音同样可以在 [语音仓库](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices) 中找到。\n\n### 使用外部 LLM 服务器\n\nUnmute 后端可以与任何兼容 OpenAI 的 LLM 服务器一起使用。默认情况下，`docker-compose.yml` 配置 VLLM，以实现完全自包含的本地部署。\n你可以修改此文件，切换到其他外部 LLM，比如 OpenAI 服务器、本地 Ollama 部署等。\n\n以 Ollama 为例，将 
`unmute-backend` 镜像的环境变量部分中的\n```yaml\n  backend:\n    image: unmute-backend:latest\n    [..]\n    environment:\n      [..]\n       - KYUTAI_LLM_URL=http:\u002F\u002Fllm:8000\n```\n\n替换为：\n```yaml\n  backend:\n    image: unmute-backend:latest\n    [..]\n    environment:\n      [..]\n      - KYUTAI_LLM_URL=http:\u002F\u002Fhost.docker.internal:11434\n      - KYUTAI_LLM_MODEL=gemma3\n      - KYUTAI_LLM_API_KEY=ollama\n    extra_hosts:\n      - \"host.docker.internal:host-gateway\"\n```\n\n这样就指向了你的本地服务器。或者，如果你想使用兼容 OpenAI 的服务器，比如 [OpenRouter](https:\u002F\u002Fopenrouter.ai\u002F)，可以这样配置：\n```yaml\n  backend:\n    image: unmute-backend:latest\n    [..]\n    environment:\n      [..]\n      - KYUTAI_LLM_URL=https:\u002F\u002Fopenrouter.ai\u002Fapi\n      - KYUTAI_LLM_MODEL=google\u002Fgemma-3-12b-it # 或其他型号\n      - KYUTAI_LLM_API_KEY=sk-.. # 你的 OpenRouter 密钥\n```\n\n随后可以移除 vllm 相关的部分，因为不再需要：\n```yaml\n  llm:\n    image: vllm\u002Fvllm-openai:v0.11.0\n    [..]\n```\n\n### 切换前端\n\n后端和前端通过 WebSocket 按照基于 [OpenAI 实时 API](https:\u002F\u002Fplatform.openai.com\u002Fdocs\u002Fguides\u002Frealtime)（简称 ORA）的协议进行通信。在可能的情况下，我们尽量与 ORA 格式保持一致，但也有一些额外的消息需要添加，而另一些消息则简化了参数。我们会明确指出与 ORA 格式的差异，请参阅 [`unmute\u002Fopenai_realtime_api_events.py`](unmute\u002Fopenai_realtime_api_events.py)。\n\n有关 WebSocket 通信协议、消息类型以及音频处理流程的详细信息，请参阅 [浏览器与后端通信文档](docs\u002Fbrowser_backend_communication.md)。\n\n理想情况下，编写一个能够同时与 Unmute 后端或 OpenAI 实时 API 通信的单一前端应该很简单，但目前我们尚未完全兼容。欢迎贡献代码！\n\n前端是一个 Next.js 应用，定义在 `frontend\u002F` 目录中。如果您想将其与其他前端实现进行比较，我们还提供了一个 Python 客户端，位于 [`unmute\u002Floadtest\u002Floadtest_client.py`](unmute\u002Floadtest\u002Floadtest_client.py)，该脚本用于基准测试 Unmute 的延迟和吞吐量。\n\n### 工具调用\n\n这是一个常见的需求，因此我们非常欢迎有人为 Unmute 添加对工具调用的支持！\n\n将工具调用集成到 Unmute 中最简单的方式，是让这一功能对 Unmute 本身完全透明——只需将其作为 LLM 服务器的一部分即可。关于如何实现这一点，请参阅 [此评论](https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Funmute\u002Fissues\u002F77#issuecomment-3035220686)。您需要使用 FastAPI 编写一个简单的服务器来封装 vLLM，并接入工具调用的响应。\n\n## 开发 Unmute\n\n### 安装 
pre-commit 钩子\n\n首先安装 `pre-commit` 工具本身——建议您使用 `pip install pre-commit` 将其全局安装，而不是在虚拟环境或 `uv` 中安装，因为您需要始终能够访问 `pre-commit` 可执行文件。然后运行以下命令：\n\n```bash\npre-commit install --hook-type pre-commit\n```\n\n我们推荐使用 [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F) 来管理 Python 依赖。以下命令假设您正在使用 uv。\n\n### 运行后端（开发模式，自动重载）\n\n```bash\nuv run fastapi dev unmute\u002Fmain_websocket.py\n```\n\n### 运行后端（生产模式）\n\n```bash\nuv run fastapi run unmute\u002Fmain_websocket.py\n```\n\n### 运行负载测试\n\n`loadtest_client.py` 是一个脚本，它会连接到 Unmute 并模拟与之的对话，以测量延迟和吞吐量。\n\n```bash\nuv run unmute\u002Floadtest\u002Floadtest_client.py --server-url ws:\u002F\u002Flocalhost:8000 --n-workers 16\n```","# Unmute 快速上手指南\n\nUnmute 是一个开源系统，通过集成 Kyutai 的语音转文本（STT）和文本转语音（TTS）模型，让纯文本大语言模型（LLM）具备“听”和“说”的能力。它支持低延迟实时对话，并兼容任意文本 LLM。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**：Linux 或 Windows (需使用 WSL2)。**不支持**原生 Windows 或 macOS。\n- **硬件架构**：x86_64 (暂不支持 aarch64)。\n- **GPU 要求**：支持 CUDA 的 NVIDIA 显卡，显存至少 **16 GB**。\n  - 若使用默认模型 `Gemma 3 1B`，16GB 显存即可。\n  - 若需多 GPU 部署以降低延迟，建议拥有 3 张及以上 GPU。\n\n### 前置依赖\n推荐使用 **Docker Compose** 方式部署，需安装以下工具：\n1. **Docker Compose**\n2. **NVIDIA Container Toolkit** (用于让 Docker 访问 GPU)\n\n验证 NVIDIA Toolkit 是否安装成功：\n```bash\nsudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi\n```\n若命令成功输出 GPU 信息，则环境就绪。\n\n### Hugging Face 访问配置\nUnmute 默认使用 Hugging Face 上的模型（如 `google\u002Fgemma-3-1b-it`）。\n1. 注册 Hugging Face 账号。\n2. 在模型页面接受使用条款（例如 [Mistral Small 3.2](https:\u002F\u002Fhuggingface.co\u002Fmistralai\u002FMistral-Small-3.2-24B-Instruct-2506) 或 Gemma 系列）。\n3. 创建 Access Token：\n   - 权限仅需勾选：**Read access to contents of all public gated repos you can access**。\n   - **警告**：切勿使用具有写入权限的 Token，以防服务器被攻破后泄露数据。\n4. 
将 Token 添加到环境变量（以 `.bashrc` 为例）：\n```bash\nexport HUGGING_FACE_HUB_TOKEN=hf_...your_token_here...\nsource ~\u002F.bashrc\n```\n> **国内加速提示**：若下载模型缓慢，可配置 Hugging Face 镜像源：\n> ```bash\n> export HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com\n> ```\n\n## 安装步骤\n\n### 方法一：使用 Docker Compose（推荐）\n这是最简单且可复现的部署方式，一键启动所有服务。\n\n1. 克隆项目代码（假设已获取源码）。\n2. 确认环境变量已设置：\n```bash\necho $HUGGING_FACE_HUB_TOKEN\n# 应输出 hf_ 开头的字符串\n```\n3. 启动服务：\n```bash\ndocker compose up --build\n```\n启动完成后，服务通常运行在 `http:\u002F\u002Flocalhost:80`。\n\n### 方法二：无 Docker 部署（仅限高级用户）\n若不使用 Docker，需手动安装依赖并在不同终端启动各服务。\n**软件依赖**：`uv`, `cargo`, `pnpm`, `cuda 12.1`。\n\n依次在独立的 tmux 会话或终端中运行：\n```bash\n.\u002Fdockerless\u002Fstart_frontend.sh\n.\u002Fdockerless\u002Fstart_backend.sh\n.\u002Fdockerless\u002Fstart_llm.sh        # 需 6.1GB 显存\n.\u002Fdockerless\u002Fstart_stt.sh        # 需 2.5GB 显存\n.\u002Fdockerless\u002Fstart_tts.sh        # 需 5.3GB 显存\n```\n启动后访问 `http:\u002F\u002Flocalhost:3000`。\n\n## 基本使用\n\n### 本地访问\n启动成功后，在浏览器打开：\n- Docker Compose 模式：`http:\u002F\u002Flocalhost` (或 `http:\u002F\u002Flocalhost:80`)\n- 无 Docker 模式：`http:\u002F\u002Flocalhost:3000`\n\n点击页面上的 **\"Connect\"** 按钮，建立 WebSocket 连接。允许浏览器使用麦克风权限，即可开始语音对话。\n\n### 远程访问（SSH 端口转发）\n若 Unmute 运行在远程服务器（例如通过 SSH 连接的 `unmute-box`），由于浏览器安全策略限制，HTTP 协议下非 localhost 无法调用麦克风，必须配置端口转发。\n\n**Docker Compose 模式（默认端口 80）：**\n在本地机器执行：\n```bash\nssh -N -L 3333:localhost:80 unmute-box\n```\n然后在本地浏览器访问 `http:\u002F\u002Flocalhost:3333`。\n\n**无 Docker 模式：**\n需分别转发前端（3000）和后端（8000）：\n```bash\nssh -N -L 8000:localhost:8000 -L 3000:localhost:3000 unmute-box\n```\n然后在本地浏览器访问 `http:\u002F\u002Flocalhost:3000`。\n\n### 进阶功能\n- **开启字幕**：在对话界面按键盘 **`S`** 键，可显示用户和机器人的实时字幕。\n- **开发者模式**：修改前端代码 `useKeyboardShortcuts.ts` 中的 `ALLOW_DEV_MODE` 为 `true`，重启后按 **`D`** 键查看调试信息。\n- **更换声音\u002F角色**：编辑 [`voices.yaml`](voices.yaml) 文件，修改 `path_on_server` 指向 [Kyutai 声音仓库](https:\u002F\u002Fhuggingface.co\u002Fkyutai\u002Ftts-voices) 中的其他音色文件，修改后需重启后端服务。\n- **接入外部 LLM**：修改 
`docker-compose.yml` 中的 `KYUTAI_LLM_URL` 环境变量，即可对接 OpenAI、Ollama 或其他兼容接口。","一位视障开发者正在深夜调试复杂的 Python 代码，急需通过自然对话快速定位报错原因并获取修复建议。\n\n### 没有 unmute 时\n- **交互断层严重**：必须依赖屏幕阅读器逐字朗读网页内容，再手动打字输入问题，思维频繁被打断，效率极低。\n- **上下文丢失快**：在“听报错 - 切窗口 - 打字提问 - 等回复”的繁琐流程中，难以保持对代码逻辑的连贯思考。\n- **多模态支持缺失**：无法直接口述代码片段或错误日志，只能依靠并不精准的语音转文字输入法，反复修正消耗大量精力。\n- **响应延迟感知强**：传统文本交互的等待感被放大，缺乏实时对话的流畅性，导致调试过程枯燥且充满挫败感。\n\n### 使用 unmute 后\n- **全双工语音闭环**：unmute 将 Gemma 3 等文本模型包裹在低延迟的语音系统中，开发者可直接口述问题，系统即时语音播报解决方案，实现“所说即所得”。\n- **思维流不断档**：省去了打字和切换界面的步骤，开发者能像与同事结对编程一样，边看代码边口头讨论，保持高度的逻辑专注。\n- **原生口语理解**：依托优化的语音识别模型，unmute 能精准转录包含专业术语的口述代码，自动处理停顿与修正，无需人工干预。\n- **实时流式反馈**：借助流式传输技术，unmute 在模型生成回答的同时即可开始朗读，大幅缩短等待时间，让调试节奏更加紧凑自然。\n\nunmute 的核心价值在于打破了文本大模型的“哑巴”限制，让开发者仅凭声音就能获得沉浸式、零摩擦的智能编程辅助体验。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkyutai-labs_unmute_02c8e24c.png","kyutai-labs","kyutai","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fkyutai-labs_c568cbc2.png","Kyutai - Open Science AI Lab",null,"https:\u002F\u002Fkyutai.org\u002F","https:\u002F\u002Fgithub.com\u002Fkyutai-labs",[85,89,93,96,99,103,107,111],{"name":86,"color":87,"percentage":88},"Python","#3572A5",66,{"name":90,"color":91,"percentage":92},"TypeScript","#3178c6",26.9,{"name":94,"color":95,"percentage":10},"JavaScript","#f1e05a",{"name":97,"color":98,"percentage":10},"Dockerfile","#384d54",{"name":100,"color":101,"percentage":102},"Shell","#89e051",1.3,{"name":104,"color":105,"percentage":106},"Jupyter Notebook","#DA5B0B",0.9,{"name":108,"color":109,"percentage":110},"CSS","#663399",0.5,{"name":112,"color":113,"percentage":114},"MDX","#fcb32c",0.4,1273,221,"2026-04-16T09:12:52","MIT","Linux, Windows (仅限 WSL)","必需 NVIDIA GPU (支持 CUDA)，架构必须为 x86_64 (不支持 aarch64\u002FMac)。单卡运行默认模型需至少 16GB 显存；若分离部署服务，LLM 需 6.1GB，STT 需 2.5GB，TTS 需 5.3GB 显存。","未说明",{"notes":123,"python":124,"dependencies":125},"1. 不支持在 macOS 或 Windows 原生环境运行，Windows 用户必须使用 WSL。\n2. 必须安装 NVIDIA Container Toolkit 以便 Docker 访问 GPU。\n3. 
需要配置 Hugging Face Token 以访问受限模型（如 Gemma 3）。\n4. 推荐使用 Docker Compose 进行一键部署；若无 Docker，需手动安装 uv, cargo, pnpm 及 CUDA 12.1。\n5. 多 GPU 部署可降低延迟（将 STT, TTS, LLM 分配至不同显卡）。","未说明 (后端使用 uv 管理，前端\u002F工具链涉及 Node.js 和 Rust)",[126,127,128,129,130,131],"Docker Compose","NVIDIA Container Toolkit","uv","cargo (Rust)","pnpm","CUDA 12.1",[15,55],"2026-03-27T02:49:30.150509","2026-04-18T02:22:20.334913",[136,141,146,151,156,161,166],{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},38440,"如何在 Nvidia RTX 50 系列显卡上运行？","在最新的 main 分支上应该可以直接运行。项目最近发布了支持 Torch 高达 2.9 版本的新版 Moshi，这意味着即将在 main 分支中添加官方支持，而无需特殊的变通方法或分离的 Python 环境。如果有用户已测试成功（如 RTX 5070 Ti），表明当前方案可行。","https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Funmute\u002Fissues\u002F76",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},38441,"Unmute（Docker 或普通版本）可以在 Apple Silicon (M 系列芯片) 上运行吗？","目前可以“勉强”运行。如果 LLM 不是本地部署，体验会比较流畅且延迟较低。在基础的 M4 芯片上已有成功案例，预计在 M4 Pro 或 M4 Max 上表现会更好。社区正在关注 MLX 社区是否有相关进展以提供更好支持。","https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Funmute\u002Fissues\u002F74",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},38442,"前端连接时提示需要麦克风权限，但没有弹出授权窗口怎么办？","这通常是因为使用了 HTTP 而非 HTTPS 协议，浏览器出于安全考虑禁止在非安全上下文中使用麦克风。\n解决方案：\n1. 最好部署 HTTPS。\n2. 
如果是本地测试且必须用 HTTP，可以在 Chrome 地址栏输入 `chrome:\u002F\u002Fflags\u002F#unsafely-treat-insecure-origin-as-secure`，将你的本地 IP（如 http:\u002F\u002F192.168.0.X）添加进去并启用该选项，然后重启浏览器即可。","https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Funmute\u002Fissues\u002F116",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},38443,"为什么找不到语音克隆（Voice Clone）所需的 safetensors 权重文件？","语音克隆功能目前尚未公开可用。维护者建议关注其最新发布的 Pocket TTS 项目。由于法规等原因，Kyutai 暂时无法公开发布相关的微调模型或语音克隆权重。社区也有其他替代方案（如基于 Sesame CSM 的项目），但官方 Unmute 暂不支持此功能。","https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Funmute\u002Fissues\u002F99",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},38444,"为了获得类似生产环境的低延迟性能，推荐什么样的硬件配置？","单张 H100 可能不足以处理平滑的中断和低延迟需求。测试表明，使用多张 GPU（例如 8x H100s）能显著提升性能，使其接近生产版本的效果。瓶颈可能在于显存容量（如 RTX 4060 的 8GB 可能不足）以及是否需要多卡并行来处理 LLM 和音频流。具体的系统提示词和配置可在开源代码库中查找。","https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Funmute\u002Fissues\u002F129",{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},38445,"可以在 Jetson Orin Nano 等 ARM64 架构设备上部署吗？","目前存在架构兼容性问题（aarch64\u002FARM64）。虽然目标是支持单 GPU 机器，但在 Jetson Orin Nano (8GB) 上安装时会遇到架构问题。目前的开发重点仍集中在带有 RTX 40\u002F50 系列显卡的标准 x64 游戏电脑上。在 ARM 架构上运行可能需要解决 Docker 部署及 vLLM 服务的兼容性问题。","https:\u002F\u002Fgithub.com\u002Fkyutai-labs\u002Funmute\u002Fissues\u002F139",{"id":167,"question_zh":168,"answer_zh":169,"source_url":165},38446,"为什么界面上有两个连接按钮？能否简化？","保留两个连接按钮（一个在按钮栏，一个在系统圆圈）是有意为之的设计。维护者担心如果只保留圆圈按钮，新用户可能不知道可以点击圆圈来开始连接。虽然这在窄屏下可能导致部分遮挡，但为了确保新用户的易用性，目前保持现状。",[]]