[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-rsxdalv--TTS-WebUI":3,"tool-rsxdalv--TTS-WebUI":62},[4,18,28,37,45,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":24,"last_commit_at":25,"category_tags":26,"status":17},9989,"n8n","n8n-io\u002Fn8n","n8n 是一款面向技术团队的公平代码（fair-code）工作流自动化平台，旨在让用户在享受低代码快速构建便利的同时，保留编写自定义代码的灵活性。它主要解决了传统自动化工具要么过于封闭难以扩展、要么完全依赖手写代码效率低下的痛点，帮助用户轻松连接 400 多种应用与服务，实现复杂业务流程的自动化。\n\nn8n 特别适合开发者、工程师以及具备一定技术背景的业务人员使用。其核心亮点在于“按需编码”：既可以通过直观的可视化界面拖拽节点搭建流程，也能随时插入 JavaScript 或 Python 代码、调用 npm 包来处理复杂逻辑。此外，n8n 原生集成了基于 LangChain 的 AI 能力，支持用户利用自有数据和模型构建智能体工作流。在部署方面，n8n 提供极高的自由度，支持完全自托管以保障数据隐私和控制权，也提供云端服务选项。凭借活跃的社区生态和数百个现成模板，n8n 让构建强大且可控的自动化系统变得简单高效。",184740,2,"2026-04-19T23:22:26",[16,14,13,15,27],"插件",{"id":29,"name":30,"github_repo":31,"description_zh":32,"stars":33,"difficulty_score":10,"last_commit_at":34,"category_tags":35,"status":17},10095,"AutoGPT","Significant-Gravitas\u002FAutoGPT","AutoGPT 是一个旨在让每个人都能轻松使用和构建 AI 的强大平台，核心功能是帮助用户创建、部署和管理能够自动执行复杂任务的连续型 AI 智能体。它解决了传统 AI 应用中需要频繁人工干预、难以自动化长流程工作的痛点，让用户只需设定目标，AI 即可自主规划步骤、调用工具并持续运行直至完成任务。\n\n无论是开发者、研究人员，还是希望提升工作效率的普通用户，都能从 AutoGPT 
中受益。开发者可利用其低代码界面快速定制专属智能体；研究人员能基于开源架构探索多智能体协作机制；而非技术背景用户也可直接选用预置的智能体模板，立即投入实际工作场景。\n\nAutoGPT 的技术亮点在于其模块化“积木式”工作流设计——用户通过连接功能块即可构建复杂逻辑，每个块负责单一动作，灵活且易于调试。同时，平台支持本地自托管与云端部署两种模式，兼顾数据隐私与使用便捷性。配合完善的文档和一键安装脚本，即使是初次接触的用户也能在几分钟内启动自己的第一个 AI 智能体。AutoGPT 正致力于降低 AI 应用门槛，让人人都能成为 AI 的创造者与受益者。",183572,"2026-04-20T04:47:55",[13,36,27,14,15],"语言模型",{"id":38,"name":39,"github_repo":40,"description_zh":41,"stars":42,"difficulty_score":10,"last_commit_at":43,"category_tags":44,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":46,"name":47,"github_repo":48,"description_zh":49,"stars":50,"difficulty_score":24,"last_commit_at":51,"category_tags":52,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",161147,"2026-04-19T23:31:47",[14,13,36],{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":59,"last_commit_at":60,"category_tags":61,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 
AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,27],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":68,"readme_en":69,"readme_zh":70,"quickstart_zh":71,"use_case_zh":72,"hero_image_url":73,"owner_login":74,"owner_name":75,"owner_avatar_url":76,"owner_bio":77,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":120,"forks":121,"last_commit_at":122,"license":123,"difficulty_score":24,"env_os":124,"env_gpu":125,"env_ram":126,"env_deps":127,"category_tags":136,"github_topics":138,"view_count":24,"oss_zip_url":78,"oss_zip_packed_at":78,"status":17,"created_at":156,"updated_at":157,"faqs":158,"releases":188},10124,"rsxdalv\u002FTTS-WebUI","TTS-WebUI","A single Gradio + React WebUI with extensions for ACE-Step, OmniVoice, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, MusicGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, and Bark!","TTS-WebUI 是一个功能强大的开源音频处理平台，旨在为用户提供一个统一的网页界面，轻松访问和管理数十种顶尖的 AI 语音与音乐生成模型。它巧妙地将 Gradio 和 React 前端技术结合，通过插件化架构集成了包括 CosyVoice、GPT-SoVITS、XTTSv2、RVC、MusicGen 以及 Bark 在内的众多热门模型，覆盖了从文本转语音、高保真语音克隆、歌声转换到背景音乐生成的全方位需求。\n\n过去，用户若想尝试不同的 AI 音频模型，往往需要分别配置复杂的环境、安装各自的依赖库，过程繁琐且容易出错。TTS-WebUI 完美解决了这一痛点，它将所有模型整合在一个简洁易用的界面中，支持一键安装和 Docker 部署，极大地降低了使用门槛。无论是希望快速生成配音内容的普通创作者、需要测试不同模型效果的研究人员，还是寻求高效工作流的开发者，都能从中受益。\n\n其独特的技术亮点在于高度的可扩展性与兼容性，不仅支持本地运行，还提供 Google Colab 云端笔记本来方便资源有限的用户体验。此外，它还支持与 
Silly Tavern 等流行前端集成，为角色扮演和互动叙事场景提","TTS-WebUI 是一个功能强大的开源音频处理平台，旨在为用户提供一个统一的网页界面，轻松访问和管理数十种顶尖的 AI 语音与音乐生成模型。它巧妙地将 Gradio 和 React 前端技术结合，通过插件化架构集成了包括 CosyVoice、GPT-SoVITS、XTTSv2、RVC、MusicGen 以及 Bark 在内的众多热门模型，覆盖了从文本转语音、高保真语音克隆、歌声转换到背景音乐生成的全方位需求。\n\n过去，用户若想尝试不同的 AI 音频模型，往往需要分别配置复杂的环境、安装各自的依赖库，过程繁琐且容易出错。TTS-WebUI 完美解决了这一痛点，它将所有模型整合在一个简洁易用的界面中，支持一键安装和 Docker 部署，极大地降低了使用门槛。无论是希望快速生成配音内容的普通创作者、需要测试不同模型效果的研究人员，还是寻求高效工作流的开发者，都能从中受益。\n\n其独特的技术亮点在于高度的可扩展性与兼容性，不仅支持本地运行，还提供 Google Colab 云端笔记本来方便资源有限的用户体验。此外，它还支持与 Silly Tavern 等流行前端集成，为角色扮演和互动叙事场景提供了丰富的声音表现力。TTS-WebUI 让探索前沿音频 AI 技术变得像浏览网页一样简单高效。","\u003Ch1 align=\"center\">TTS WebUI\u003C\u002Fh1>\n\n\u003Cdiv align=\"center\">\n\n  \u003Ch4 align=\"center\">\n\n  [Download Installer](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Freleases\u002Fdownload\u002Fv0.0.0\u002Ftts-webui-installer.zip) ||\n  [Installation](#installation) ||\n  [Docker Setup](#docker-setup) ||\n  [Silly Tavern](#integrations) ||\n  [Extensions](#extensions) ||\n  [Feedback \u002F Bug reports](https:\u002F\u002Fforms.gle\u002F2L62owhBsGFzdFBC8)\n\n  \u003C\u002Fh4>\n\n  [![banner](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_f2560aa3c133.png)](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui)\n\n  [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Frsxdalv\u002Ftts-webui?style=social)](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui\u002Fstargazers)\n  [![GitHub](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Frsxdalv\u002Ftts-webui)](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui\u002Fblob\u002Fmain\u002FLICENSE)\n  [![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1258772280071295018?label=discord&logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.gg\u002FV8BKTVRtJ9)\n  [![Open In 
Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Frsxdalv\u002Ftts-webui\u002Fblob\u002Fmain\u002Fdocumentation\u002Fnotebooks\u002Fgoogle_colab.ipynb)\n  [![GitHub forks](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Frsxdalv\u002Ftts-webui?style=social)](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui\u002Fnetwork\u002Fmembers)\n  [![YouTube](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FYouTube-%23FF0000.svg?logo=YouTube&logoColor=white)](https:\u002F\u002Fwww.youtube.com\u002F@TTS-WebUI)\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n## Videos\n\n\u003C\u002Fdiv>\n\n| [![Watch the video](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_bec48153aebb.jpg)](https:\u002F\u002Fyoutu.be\u002FJXojhFjZ39k) | [![Watch the video](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_9debe8accd5d.jpg)](https:\u002F\u002Fyoutu.be\u002FYvM3DdRHDsI) | [![Watch the video](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_e4561d7cb33d.jpg)](https:\u002F\u002Fyoutu.be\u002F_0rftbXPJLI) |\n| :------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------: |\n\n\u003Cdiv align=\"center\">\n\n## Examples\n\n\u003C\u002Fdiv>\n\n\n| \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F16ac948a-fe98-49ad-ad87-19c41fe7e65e\" width=\"300\">\u003C\u002Fvideo> | \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F55bde4f7-bbcc-4ecf-8f94-b315b9d22e74\" width=\"300\">\u003C\u002Fvideo> | \u003Cvideo 
src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ffcee8906-a101-400d-8499-4e72c7603042\" width=\"300\">\u003C\u002Fvideo> |\n| :-----------------------------------------: | :-----------------------------------------: | :-------------------------------: |\n\n\u003Cdiv align=\"center\">\n\n## Screenshots\n\n\u003C\u002Fdiv>\n\n| ![react_1](.\u002Fdocumentation\u002Fscreenshots\u002Freact_ui%20(1).png) | ![react_2](.\u002Fdocumentation\u002Fscreenshots\u002Freact_ui%20(2).png) | ![react_3](.\u002Fdocumentation\u002Fscreenshots\u002Freact_ui%20(3).png) |\n| :-----------------------------------------: | :-----------------------------------------: | :-------------------------------: |\n\n| ![gradio_1](.\u002Fdocumentation\u002Fscreenshots\u002Fgradio%20(1).png) | ![gradio_2](.\u002Fdocumentation\u002Fscreenshots\u002Fgradio%20(2).png) | ![gradio_3](.\u002Fdocumentation\u002Fscreenshots\u002Fgradio%20(3).png) |\n| :-----------------------------------------: | :-----------------------------------------: | :-------------------------------: |\n\n\u003Cdiv align=\"center\">\n\n## Supported Models\n\n| Text-to-speech                                                                      | Audio\u002FMusic Generation                                                                | Audio Conversion\u002FTools                                                       |\n| ----------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------- |\n| [Bark](https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark)                                             | [MusicGen](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Faudiocraft\u002Fblob\u002Fmain\u002Fdocs\u002FMUSICGEN.md) | [RVC](https:\u002F\u002Fgithub.com\u002FRVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI) |\n| 
[Tortoise](https:\u002F\u002Fgithub.com\u002Fneonbjb\u002Ftortoise-tts)                                 | [MAGNeT](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Faudiocraft\u002Fblob\u002Fmain\u002Fdocs\u002FMAGNET.md)     | [Demucs](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdemucs)                         |\n| [Maha TTS](https:\u002F\u002Fgithub.com\u002Fdubverse-ai\u002FMahaTTS)                                  | [Stable Audio](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fstable-audio-tools)                    | [Vocos](https:\u002F\u002Fgithub.com\u002Fgemelo-ai\u002Fvocos)                                  |\n| [MMS](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffairseq\u002Fblob\u002Fmain\u002Fexamples\u002Fmms\u002FREADME.md) | [Riffusion\\*](https:\u002F\u002Fgithub.com\u002Friffusion\u002Friffusion-hobby)                           | [Whisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper)                                 |\n| [Vall-E X](https:\u002F\u002Fgithub.com\u002FPlachtaa\u002FVALL-E-X)                                    | [AudioCraft Mac\\*](https:\u002F\u002Fgithub.com\u002Ftrizko\u002Faudiocraft)                              | [AP BWE](https:\u002F\u002Fgithub.com\u002Fyxlu-0102\u002FAP-BWE)                                |\n| [StyleTTS2](https:\u002F\u002Fgithub.com\u002Fsidharthrajaram\u002FStyleTTS2)                           | [AudioCraft Plus\\*](https:\u002F\u002Fgithub.com\u002FGrandaddyShmax\u002Faudiocraft_plus)                | [Resemble Enhance](https:\u002F\u002Fgithub.com\u002Fresemble-ai\u002Fresemble-enhance)          |\n| [SeamlessM4T](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fseamless_communication)           | [ACE-Step\\*](https:\u002F\u002Fgithub.com\u002FACE-Step\u002FACE-Step)                                    | [Audio Separator](https:\u002F\u002Fgithub.com\u002Fnomadkaraoke\u002Fpython-audio-separator)    |\n| 
[XTTSv2\\*](https:\u002F\u002Fgithub.com\u002Fcoqui-ai\u002FTTS)                                         | [Song Bloom\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.song_bloom)             | [PyRNNoise\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.pyrnnoise)      |\n| [MARS5\\*](https:\u002F\u002Fgithub.com\u002Fcamb-ai\u002Fmars5-tts)                                     |                                                                                       | [MiMo Audio\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.mimo_audio)    |\n| [F5-TTS\\*](https:\u002F\u002Fgithub.com\u002FSWivid\u002FF5-TTS)                                        |                                                                                       |                                                                              |\n| [Parler TTS\\*](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fparler-tts)                           |                                                                                       |                                                                              |\n| [OpenVoice\\*](https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FOpenVoice)                              |                                                                                       |                                                                              |\n| [OpenVoice V2\\*](https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FOpenVoice)                           |                                                                                       |                                                                              |\n| [Kokoro TTS\\*](https:\u002F\u002Fgithub.com\u002Fhexgrad\u002Fkokoro)                                   |                                                                                       |                                                                              
|\n| [DIA\\*](https:\u002F\u002Fgithub.com\u002Fnari-labs\u002Fdia)                                           |                                                                                       |                                                                              |\n| [CosyVoice\\*](https:\u002F\u002Fgithub.com\u002FFunAudioLLM\u002FCosyVoice)                             |                                                                                       |                                                                              |\n| [GPT-SoVITS\\*](https:\u002F\u002Fgithub.com\u002FX-T-E-R\u002FGPT-SoVITS-Inference)                     |                                                                                       |                                                                              |\n| [Piper TTS\\*](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper)                                     |                                                                                       |                                                                              |\n| [Kimi Audio 7B Instruct\\*](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002FKimi-Audio)                 |                                                                                       |                                                                              |\n| [Chatterbox\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Fchatterbox)                               |                                                                                       |                                                                              |\n| [VibeVoice\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.vibevoice)             |                                                                                       |                                                                              |\n| [Kitten 
TTS\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.kitten_tts)           |                                                                                       |                                                                              |\n| [Index-TTS2\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.index_tts)            |                                                                                       |                                                                              |\n| [VoxCPM\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.vox_cpm)                  |                                                                                       |                                                                              |\n| [FireRedTTS2\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.fireredtts2)         |                                                                                       |                                                                              |\n| [MegaTTS3\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.megatts3)               |                                                                                       |                                                                              |\n| [MiniMax Cloud TTS](https:\u002F\u002Fwww.minimaxi.com) (built-in)                            |                                                                                       |                                                                              |\n\n\\* These models are not installed by default, instead they are available as extensions.\n\n\u003C\u002Fdiv>\n\n\n## Installation\n\n### Using the Installer (Recommended)\n\nCurrent base installation size is around 10.7 GB. 
Each model requires an additional 2-8 GB of space.\n\n* Download the [latest version](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Freleases\u002Fdownload\u002Fv0.0.0\u002Ftts-webui-installer.zip) and extract it.\n* Run start_tts_webui.bat or start_tts_webui.sh to start the server. It will ask you to select the GPU\u002FChip you are using. Once everything has installed, it will start the Gradio server at http:\u002F\u002Flocalhost:7770 and the React UI at http:\u002F\u002Flocalhost:3000.\n* The output log is available in the installer_scripts\u002Foutput.log file.\n* Note: The start script sets up a conda environment and a Python virtual environment. You therefore don't need to create a venv beforehand; in fact, launching from another venv might break the script.\n\n### Manual installation\n\nPrerequisites:\n* git\n* Python 3.10 or 3.11 (3.12 not supported yet)\n* PyTorch\n* ffmpeg (with vorbis support)\n* (Optional) NodeJS 22.9.0 for React UI\n* SQLite (bundled with Python) for database support\n\n1. Clone the repository:\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui.git\n   cd tts-webui\n   ```\n2. Install required packages:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. Run the server:\n   ```bash\n   python server.py --no-react\n   ```\n\n4. For React UI:\n   ```bash\n   cd react-ui\n   npm install\n   npm run build\n   cd ..\n   python server.py\n   ```\n\nFor detailed manual installation instructions, please refer to the [Manual Installation Guide](.\u002Fdocumentation\u002Fmanual_installation.md).\n\n### Docker Setup\n\ntts-webui can also be run inside a Docker container. Using CUDA inside Docker requires the [NVIDIA Container Toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Flatest\u002Finstall-guide.html). 
To get started, pull the image from GitHub Container Registry:\n\n```\ndocker pull ghcr.io\u002Frsxdalv\u002Ftts-webui:main\n```\n\nOnce the image has been pulled it can be started with Docker Compose:\nThe ports are 7770 (env:TTS_PORT) for the Gradio backend and 3000 (env:UI_PORT) for the React front end.\n\n```\ndocker compose up -d\n```\n\nThe container will take some time to generate the first output while models are downloaded in the background. The status of this download can be verified by checking the container logs:\n\n```\ndocker logs tts-webui\n```\n\n#### Building the image yourself\nIf you wish to build your own docker container, you can use the included Dockerfile:\n\n```\ndocker build -t tts-webui .\n```\nPlease note that the docker-compose needs to be edited to use the image you just built.\n\n\n## Changelog\n\nApril:\n* Add torchcodec CPU to requirements\n* Upgrade PyTorch to 2.11.0\n* Update pin TorchAudio to 2.7.0\n\nMarch:\n* feat: Next.js 13.5 → 16.2.1 Upgrade\n\nDecember:\n* Minor bug fixes and improvements\n* refactor: extract bark config loading to separate function from WebUI\n* chore: remove postgres from tts-webui as it is a heavy dependency not needed for most users\n* feat: add database API server to manage generated audio files and metadata\n\nNovember:\n* Add extension marketplace directly in the Gradio UI using iframe\n* Move voices-tortoise to voices\u002Ftortoise\u002F (and maha-tts to voices\u002Fmaha_tts\u002F)\n* feat: add comprehensive test suite for tts_webui modules\n* feat: implement install button for extension marketplace\n* feat: add button to check dependencies in extension management UI\n* feat(cli): add extension-manager command to launch GUI tool\n* chore: remove uv from google colab\n* chore: move logging directory to installer_scripts\u002Flogs\u002F\n* feat: add ability to disable decorator extensions via config.json\n\nOctober:\n* Update Gradio to 5.49.1\n* Update @gradio\u002Fclient to 1.19.1\n* Fix Chatterbox 
installation issues, paths and streaming errors in Silly Tavern\n* Create new extension category - Conversational AI\n* Reorganize environment variables, new dotenv manager\n* Convert more of the UI into extensions and simplify base server\n* Add API_KEY to OpenAI TTS API extension\n* feat: support new extensions format - tabsInGroups\n* feat: add External Extensions Installer to manage and install external extensions via JSON\n* feat: Add links to TTS WebUI Extension Catalog in documentation and installer UI\n* feat: Add Log Viewer extension to view and manage log files\n\n## Past Changes\n\nSee the [2025 Changelog](.\u002Fdocumentation\u002Fchangelog-2025.md) for a detailed list of changes in 2025.\n\nSee the [2024 Changelog](.\u002Fdocumentation\u002Fchangelog-2024.md) for a detailed list of changes in 2024.\n\nSee the [2023 Changelog](.\u002Fdocumentation\u002Fchangelog-2023.md) for a detailed list of changes in 2023.\n\n## Extensions\n\nExtensions are available to install from the webui itself, or using React UI. They can also be installed using the extension manager or the External Extensions Installer (a built-in tool for adding custom extensions from JSON).\n\nInternally, extensions are just python packages that are installed using pip. Multiple extensions can be installed at the same time, but there might be compatibility issues between them. After installing or updating an extension, you need to restart the app to load it.\n\nFor a curated list of community-created extensions, visit the [TTS WebUI Extension Catalog](https:\u002F\u002Frsxdalv.github.io\u002Ftts-webui-extension-catalog\u002F). You can also find information on publishing your own extensions there.\n\nUpdates need to be done manually by using the mini-control panel:\n\n![mini-control-panel](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_6ca76eb72ac8.png)\n\n\n## Integrations\n\n### Silly Tavern\n\n1. Update OpenAI TTS API extension to latest version\n2. 
Start the API and test it with Python Requests\n \n   *(The OpenAI client might not be installed, so the Test with Python OpenAI client option might fail)*\n\n3. Once the audio generates successfully, go to Silly Tavern and add a new TTS API\n   Default provider endpoint: `http:\u002F\u002Flocalhost:7778\u002Fv1\u002Faudio\u002Fspeech`\n   ![silly-tavern-tts-api](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_acd245a1dfb8.png)\n4. Test it out!\n\n### Text Generation WebUI (oobabooga\u002Ftext-generation-webui)\n\n1. Install the https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftext-to-tts-webui extension in text-generation-webui\n2. Start the API and test it with Python Requests\n3. Configure using the panel: ![oobaboooga-text-to-tts-webui](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_431e8cdfbeaf.png)\n\n### OpenWebUI\n\n1. Enable the OpenAI API extension in TTS WebUI\n2. Start the API and test it with Python Requests\n3. Once the audio generates successfully, go to OpenWebUI and add a new TTS API\n   Default provider endpoint: `http:\u002F\u002Flocalhost:7778\u002Fv1\u002Faudio\u002Fspeech`\n4. Test it out!\n![openwebui](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_c3e79642a9bf.png)\n\n### OpenAI Compatible APIs\n\nUsing the instructions above, you can install an OpenAI compatible API and use it with Silly Tavern or other OpenAI compatible clients.\n\n## Compatibility \u002F Errors\n\n### Red messages in console\nThese messages:\n```\n---- requires ----, but you have ---- which is incompatible.\n```\nare completely normal. They stem both from a limitation of pip and from the fact that this Web UI combines many different AI projects. Since the projects are not always compatible with each other, their dependency requirements conflict and pip reports those conflicts. This is normal and expected. 
In the end, despite the warnings and errors, the projects work together.\nIt's not clear whether these conflicts can ever be fully resolved, but that is the hope.\n\n\n## Extra Voices for Bark, Prompt Samples\n\u003Cdiv align=\"center\">\n\n[![PromptEcho](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_f88d54d70cad.png)](https:\u002F\u002Fpromptecho.com\u002F)\n\n[![Bark Speaker Directory](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_4e6cc766644b.png)](https:\u002F\u002Frsxdalv.github.io\u002Fbark-speaker-directory\u002F)\n\n\u003C\u002Fdiv>\n\n## Bark Readme\n[README_Bark.md](.\u002Fdocumentation\u002FREADME_Bark.md)\n\n## Info about managing models, caches and system space for AI projects\nhttps:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui\u002Fdiscussions\u002F186#discussioncomment-7291274\n\n\n\n## Open Source Libraries\n\n\u003Cdetails>\n\u003Csummary>This project utilizes the following open source libraries:\u003C\u002Fsummary>\n\n- **suno-ai\u002Fbark** - [MIT License](https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\u002Fblob\u002Fmain\u002FLICENSE)\n  - Description: Inference code for the Bark model.\n  - Repository: [suno-ai\u002Fbark](https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark)\n\n- **tortoise-tts** - [Apache-2.0 License](https:\u002F\u002Fgithub.com\u002Fneonbjb\u002Ftortoise-tts\u002Fblob\u002Fmaster\u002FLICENSE)\n  - Description: A flexible text-to-speech synthesis library for various platforms.\n  - Repository: [neonbjb\u002Ftortoise-tts](https:\u002F\u002Fgithub.com\u002Fneonbjb\u002Ftortoise-tts)\n\n- **ffmpeg** - [LGPL License](https:\u002F\u002Fgithub.com\u002FFFmpeg\u002FFFmpeg\u002Fblob\u002Fmaster\u002FLICENSE.md)\n  - Description: A complete and cross-platform solution for video and audio processing.\n  - Repository: [FFmpeg](https:\u002F\u002Fgithub.com\u002FFFmpeg\u002FFFmpeg)\n  - Use: Encoding Vorbis Ogg files\n\n- **ffmpeg-python** - [Apache 2.0 
License](https:\u002F\u002Fgithub.com\u002Fkkroening\u002Fffmpeg-python\u002Fblob\u002Fmaster\u002FLICENSE)\n  - Description: Python bindings for FFmpeg library for handling multimedia files.\n  - Repository: [kkroening\u002Fffmpeg-python](https:\u002F\u002Fgithub.com\u002Fkkroening\u002Fffmpeg-python)\n\n- **audiocraft** - [MIT License](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Faudiocraft\u002Fblob\u002Fmain\u002FLICENSE)\n  - Description: A library for audio generation and MusicGen.\n  - Repository: [facebookresearch\u002Faudiocraft](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Faudiocraft)\n\n- **vocos** - [MIT License](https:\u002F\u002Fgithub.com\u002Fcharactr-platform\u002Fvocos\u002Fblob\u002Fmaster\u002FLICENSE)\n  - Description: An improved decoder for encodec samples\n  - Repository: [charactr-platform\u002Fvocos](https:\u002F\u002Fgithub.com\u002Fcharactr-platform\u002Fvocos)\n\n- **RVC** - [MIT License](https:\u002F\u002Fgithub.com\u002FRVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI\u002Fblob\u002Fmain\u002FLICENSE)\n  - Description: An easy-to-use Voice Conversion framework based on VITS.\n  - Repository: [RVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI](https:\u002F\u002Fgithub.com\u002FRVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI)\n\u003C\u002Fdetails>\n\n## Ethical and Responsible Use\nThis technology is intended for enablement and creativity, not for harm.\n\nBy engaging with this AI model, you acknowledge and agree to abide by these guidelines, employing the AI model in a responsible, ethical, and legal manner.\n- Non-Malicious Intent: Do not use this AI model for malicious, harmful, or unlawful activities. It should only be used for lawful and ethical purposes that promote positive engagement, knowledge sharing, and constructive conversations.\n- No Impersonation: Do not use this AI model to impersonate or misrepresent yourself as someone else, including individuals, organizations, or entities. 
It should not be used to deceive, defraud, or manipulate others.\n- No Fraudulent Activities: This AI model must not be used for fraudulent purposes, such as financial scams, phishing attempts, or any form of deceitful practices aimed at acquiring sensitive information, monetary gain, or unauthorized access to systems.\n- Legal Compliance: Ensure that your use of this AI model complies with applicable laws, regulations, and policies regarding AI usage, data protection, privacy, intellectual property, and any other relevant legal obligations in your jurisdiction.\n- Acknowledgement: By engaging with this AI model, you acknowledge and agree to abide by these guidelines, using the AI model in a responsible, ethical, and legal manner.\n\n## License\n\n### Codebase and Dependencies\n\nThe codebase is licensed under MIT. However, it's important to note that when installing the dependencies, you will also be subject to their respective licenses. Although most of these licenses are permissive, there may be some that are not. Therefore, it's essential to understand that the permissive license only applies to the codebase itself, not the entire project.\n\nThat being said, the goal is to maintain MIT compatibility throughout the project. 
If you come across a dependency that is not compatible with the MIT license, please feel free to open an issue and bring it to our attention.\n\nKnown non-permissive dependencies:\n| Library     | License           | Notes                                                                                     |\n|-------------|-------------------|-------------------------------------------------------------------------------------------|\n| encodec     | CC BY-NC 4.0      | Newer versions are MIT, but need to be installed manually                                  |\n| diffq       | CC BY-NC 4.0      | Optional in the future, not necessary to run, can be uninstalled, should be updated with demucs |\n| lameenc     | GPL License       | Future versions will make it LGPL, but need to be installed manually                      |\n| unidecode   | GPL License       | Not mission critical, can be replaced with another library, issue: https:\u002F\u002Fgithub.com\u002Fneonbjb\u002Ftortoise-tts\u002Fissues\u002F494 |\n\n\n### Model Weights\nModel weights have different licenses, please pay attention to the license of the model you are using.\n\nMost notably:\n- Bark: MIT\n- Tortoise: *Unknown* (Apache-2.0 according to repo, but no license file in HuggingFace)\n- MusicGen: CC BY-NC 4.0\n- AudioGen: CC BY-NC 4.0\n","\u003Ch1 align=\"center\">TTS WebUI\u003C\u002Fh1>\n\n\u003Cdiv align=\"center\">\n\n  \u003Ch4 align=\"center\">\n\n  [下载安装程序](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Freleases\u002Fdownload\u002Fv0.0.0\u002Ftts-webui-installer.zip) ||\n  [安装说明](#installation) ||\n  [Docker 部署](#docker-setup) ||\n  [Silly Tavern 集成](#integrations) ||\n  [扩展插件](#extensions) ||\n  [反馈\u002F错误报告](https:\u002F\u002Fforms.gle\u002F2L62owhBsGFzdFBC8)\n\n  \u003C\u002Fh4>\n\n  [![banner](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_f2560aa3c133.png)](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui)\n\n  [![GitHub 
星标](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Frsxdalv\u002Ftts-webui?style=social)](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui\u002Fstargazers)\n  [![GitHub 许可证](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Frsxdalv\u002Ftts-webui)](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui\u002Fblob\u002Fmain\u002FLICENSE)\n  [![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1258772280071295018?label=discord&logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.gg\u002FV8BKTVRtJ9)\n  [![在 Colab 中打开](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Frsxdalv\u002Ftts-webui\u002Fblob\u002Fmain\u002Fdocumentation\u002Fnotebooks\u002Fgoogle_colab.ipynb)\n  [![GitHub 分叉数](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Frsxdalv\u002Ftts-webui?style=social)](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui\u002Fnetwork\u002Fmembers)\n  [![YouTube](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FYouTube-%23FF0000.svg?logo=YouTube&logoColor=white)](https:\u002F\u002Fwww.youtube.com\u002F@TTS-WebUI)\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n## 视频\n\n\u003C\u002Fdiv>\n\n| [![观看视频](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_bec48153aebb.jpg)](https:\u002F\u002Fyoutu.be\u002FJXojhFjZ39k) | [![观看视频](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_9debe8accd5d.jpg)](https:\u002F\u002Fyoutu.be\u002FYvM3DdRHDsI) | [![观看视频](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_e4561d7cb33d.jpg)](https:\u002F\u002Fyoutu.be\u002F_0rftbXPJLI) |\n| :------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------: | 
:------------------------------------------------------------------------------------------------------: |\n\n\u003Cdiv align=\"center\">\n\n## 示例\n\n\u003C\u002Fdiv>\n\n\n| \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F16ac948a-fe98-49ad-ad87-19c41fe7e65e\" width=\"300\">\u003C\u002Fvideo> | \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F55bde4f7-bbcc-4ecf-8f94-b315b9d22e74\" width=\"300\">\u003C\u002Fvideo> | \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ffcee8906-a101-400d-8499-4e72c7603042\" width=\"300\">\u003C\u002Fvideo> |\n| :-----------------------------------------: | :-----------------------------------------: | :-------------------------------: |\n\n\u003Cdiv align=\"center\">\n\n## 截图\n\n\u003C\u002Fdiv>\n\n| ![react_1](.\u002Fdocumentation\u002Fscreenshots\u002Freact_ui%20(1).png) | ![react_2](.\u002Fdocumentation\u002Fscreenshots\u002Freact_ui%20(2).png) | ![react_3](.\u002Fdocumentation\u002Fscreenshots\u002Freact_ui%20(3).png) |\n| :-----------------------------------------: | :-----------------------------------------: | :-------------------------------: |\n\n| ![gradio_1](.\u002Fdocumentation\u002Fscreenshots\u002Fgradio%20(1).png) | ![gradio_2](.\u002Fdocumentation\u002Fscreenshots\u002Fgradio%20(2).png) | ![gradio_3](.\u002Fdocumentation\u002Fscreenshots\u002Fgradio%20(3).png) |\n| :-----------------------------------------: | :-----------------------------------------: | :-------------------------------: |\n\n\u003Cdiv align=\"center\">\n\n## 支持的模型\n\n| 文本转语音                                                                      | 音频\u002F音乐生成                                                                | 音频转换\u002F工具                                                       |\n| ----------------------------------------------------------------------------------- | 
------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------- |\n| [Bark](https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark)                                             | [MusicGen](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Faudiocraft\u002Fblob\u002Fmain\u002Fdocs\u002FMUSICGEN.md) | [RVC](https:\u002F\u002Fgithub.com\u002FRVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI) |\n| [Tortoise](https:\u002F\u002Fgithub.com\u002Fneonbjb\u002Ftortoise-tts)                                 | [MAGNeT](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Faudiocraft\u002Fblob\u002Fmain\u002Fdocs\u002FMAGNET.md)     | [Demucs](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdemucs)                         |\n| [Maha TTS](https:\u002F\u002Fgithub.com\u002Fdubverse-ai\u002FMahaTTS)                                  | [Stable Audio](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fstable-audio-tools)                    | [Vocos](https:\u002F\u002Fgithub.com\u002Fgemelo-ai\u002Fvocos)                                  |\n| [MMS](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffairseq\u002Fblob\u002Fmain\u002Fexamples\u002Fmms\u002FREADME.md) | [Riffusion\\*](https:\u002F\u002Fgithub.com\u002Friffusion\u002Friffusion-hobby)                           | [Whisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper)                                 |\n| [Vall-E X](https:\u002F\u002Fgithub.com\u002FPlachtaa\u002FVALL-E-X)                                    | [AudioCraft Mac\\*](https:\u002F\u002Fgithub.com\u002Ftrizko\u002Faudiocraft)                              | [AP BWE](https:\u002F\u002Fgithub.com\u002Fyxlu-0102\u002FAP-BWE)                                |\n| [StyleTTS2](https:\u002F\u002Fgithub.com\u002Fsidharthrajaram\u002FStyleTTS2)                           | [AudioCraft 
Plus\\*](https:\u002F\u002Fgithub.com\u002FGrandaddyShmax\u002Faudiocraft_plus)                | [Resemble Enhance](https:\u002F\u002Fgithub.com\u002Fresemble-ai\u002Fresemble-enhance)          |\n| [SeamlessM4T](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fseamless_communication)           | [ACE-Step\\*](https:\u002F\u002Fgithub.com\u002FACE-Step\u002FACE-Step)                                    | [Audio Separator](https:\u002F\u002Fgithub.com\u002Fnomadkaraoke\u002Fpython-audio-separator)    |\n| [XTTSv2\\*](https:\u002F\u002Fgithub.com\u002Fcoqui-ai\u002FTTS)                                         | [Song Bloom\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.song_bloom)             | [PyRNNoise\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.pyrnnoise)      |\n| [MARS5\\*](https:\u002F\u002Fgithub.com\u002Fcamb-ai\u002Fmars5-tts)                                     |                                                                                       | [MiMo Audio\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.mimo_audio)    |\n| [F5-TTS\\*](https:\u002F\u002Fgithub.com\u002FSWivid\u002FF5-TTS)                                        |                                                                                       |                                                                              |\n| [Parler TTS\\*](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fparler-tts)                           |                                                                                       |                                                                              |\n| [OpenVoice\\*](https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FOpenVoice)                              |                                                                                       |                                                                              |\n| [OpenVoice 
V2\\*](https:\u002F\u002Fgithub.com\u002Fmyshell-ai\u002FOpenVoice)                           |                                                                                       |                                                                              |\n| [Kokoro TTS\\*](https:\u002F\u002Fgithub.com\u002Fhexgrad\u002Fkokoro)                                   |                                                                                       |                                                                              |\n| [DIA\\*](https:\u002F\u002Fgithub.com\u002Fnari-labs\u002Fdia)                                           |                                                                                       |                                                                              |\n| [CosyVoice\\*](https:\u002F\u002Fgithub.com\u002FFunAudioLLM\u002FCosyVoice)                             |                                                                                       |                                                                              |\n| [GPT-SoVITS\\*](https:\u002F\u002Fgithub.com\u002FX-T-E-R\u002FGPT-SoVITS-Inference)                     |                                                                                       |                                                                              |\n| [Piper TTS\\*](https:\u002F\u002Fgithub.com\u002Frhasspy\u002Fpiper)                                     |                                                                                       |                                                                              |\n| [Kimi Audio 7B Instruct\\*](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002FKimi-Audio)                 |                                                                                       |                                                                              |\n| 
[Chatterbox\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Fchatterbox)                               |                                                                                       |                                                                              |\n| [VibeVoice\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.vibevoice)             |                                                                                       |                                                                              |\n| [Kitten TTS\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.kitten_tts)           |                                                                                       |                                                                              |\n| [Index-TTS2\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.index_tts)            |                                                                                       |                                                                              |\n| [VoxCPM\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.vox_cpm)                  |                                                                                       |                                                                              |\n| [FireRedTTS2\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.fireredtts2)         |                                                                                       |                                                                              |\n| [MegaTTS3\\*](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts_webui_extension.megatts3)               |                                                                                       |                                                                              |\n| [MiniMax Cloud TTS](https:\u002F\u002Fwww.minimaxi.com) (内置) 
                           |                                                                                       |                                                                              |\n\n\\* 这些模型默认不会安装，而是作为扩展提供。\n\n\u003C\u002Fdiv>\n\n\n\n\n## 安装\n\n### 使用安装程序（推荐）\n\n当前基础安装大小约为10.7 GB。每个模型还需要额外的2-8 GB空间。\n\n* 下载[最新版本](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Freleases\u002Fdownload\u002Fv0.0.0\u002Ftts-webui-installer.zip)并解压。\n* 运行 start_tts_webui.bat 或 start_tts_webui.sh 启动服务器。它会提示您选择使用的 GPU\u002F芯片。安装完成后，Gradio 服务器将在 http:\u002F\u002Flocalhost:7770 上启动，React UI 将在 http:\u002F\u002Flocalhost:3000 上启动。\n* 输出日志将保存在 installer_scripts\u002Foutput.log 文件中。\n* 注意：启动脚本会设置一个 conda 环境和一个 Python 虚拟环境。因此，在此之前无需创建 venv；实际上，从其他 venv 启动可能会导致该脚本无法正常运行。\n\n### 手动安装\n\n先决条件：\n* git\n* Python 3.10 或 3.11（暂不支持 3.12）\n* PyTorch\n* ffmpeg（支持 vorbis 编码）\n* （可选）NodeJS 22.9.0 用于 React UI\n* SQLite（随 Python 附带）用于数据库支持\n\n1. 克隆仓库：\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui.git\n   cd tts-webui\n   ```\n2. 安装所需包：\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n3. 运行服务器：\n   ```bash\n   python server.py --no-react\n   ```\n\n4. 
对于 React UI：\n   ```bash\n   cd react-ui\n   npm install\n   npm run build\n   cd ..\n   python server.py\n   ```\n\n有关详细的手动安装说明，请参阅[手动安装指南](.\u002Fdocumentation\u002Fmanual_installation.md)。\n\n### Docker 设置\n\ntts-webui 也可以在 Docker 容器中运行。在 Docker 中使用 CUDA 需要 [NVIDIA Container Toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Flatest\u002Finstall-guide.html)。要开始使用，可以从 GitHub Container Registry 拉取镜像：\n\n```\ndocker pull ghcr.io\u002Frsxdalv\u002Ftts-webui:main\n```\n\n镜像拉取完成后，可以使用 Docker Compose 启动容器：\n\n端口分别为 7770（环境变量：TTS_PORT）用于 Gradio 后端，以及 3000（环境变量：UI_PORT）用于 React 前端。\n\n```\ndocker compose up -d\n```\n\n容器需要一些时间来生成第一个输出，因为模型会在后台下载。可以通过查看容器日志来确认下载状态：\n\n```\ndocker logs tts-webui\n```\n\n#### 自己构建镜像\n如果您希望自行构建 Docker 容器，可以使用附带的 Dockerfile：\n\n```\ndocker build -t tts-webui .\n```\n请注意，需要编辑 docker-compose 文件以使用您刚刚构建的镜像。\n\n## 更改记录\n\n四月：\n* 在依赖项中添加 torchcodec CPU\n* 将 PyTorch 升级至 2.11.0\n* 将 TorchAudio 锁定版本更新至 2.7.0\n\n三月：\n* 新特性：Next.js 从 13.5 升级至 16.2.1\n\n十二月：\n* 修复了一些小错误并进行了改进\n* 重构：将 bark 配置加载提取到 WebUI 的独立函数中\n* 杂项：从 tts-webui 中移除 postgres，因为它是一个对大多数用户来说不必要的重型依赖\n* 新特性：添加数据库 API 服务器，用于管理生成的音频文件和元数据\n\n十一月：\n* 使用 iframe 直接在 Gradio UI 中添加扩展市场\n* 将 voices-tortoise 移至 voices\u002Ftortoise\u002F（并将 maha-tts 移至 voices\u002Fmaha_tts\u002F）\n* 新特性：为 tts_webui 模块添加全面的测试套件\n* 新特性：为扩展市场添加安装按钮\n* 新特性：在扩展管理 UI 中添加检查依赖项的按钮\n* 新特性（命令行）：添加 extension-manager 命令以启动 GUI 工具\n* 杂项：从 Google Colab 中移除 uv\n* 杂项：将日志目录移动到 installer_scripts\u002Flogs\u002F\n* 新特性：添加通过 config.json 禁用装饰性扩展的功能\n\n十月：\n* 将 Gradio 更新至 5.49.1\n* 将 @gradio\u002Fclient 更新至 1.19.1\n* 修复 Chatterbox 安装问题、路径问题以及 Silly Tavern 中的流媒体错误\n* 创建新的扩展类别——对话式 AI\n* 重新组织环境变量，引入新的 dotenv 管理器\n* 将更多 UI 功能转换为扩展，并简化基础服务器\n* 为 OpenAI TTS API 扩展添加 API_KEY\n* 新特性：支持新的扩展格式——tabsInGroups\n* 新特性：添加外部扩展安装程序，用于通过 JSON 管理和安装外部扩展\n* 新特性：在文档和安装程序 UI 中添加指向 TTS WebUI 扩展目录的链接\n* 新特性：添加日志查看器扩展，用于查看和管理日志文件\n\n## 历史变更\n\n有关 2025 年的详细变更列表，请参阅[2025 
年更改记录](.\u002Fdocumentation\u002Fchangelog-2025.md)。\n\n有关 2024 年的详细变更列表，请参阅[2024 年更改记录](.\u002Fdocumentation\u002Fchangelog-2024.md)。\n\n有关 2023 年的详细变更列表，请参阅[2023 年更改记录](.\u002Fdocumentation\u002Fchangelog-2023.md)。\n\n## 扩展\n\n扩展可以直接从 WebUI 或 React UI 安装，也可以使用扩展管理器或外部扩展安装程序（一个内置工具，用于通过 JSON 添加自定义扩展）进行安装。 \n\n在内部，扩展只是使用 pip 安装的 Python 包。可以同时安装多个扩展，但它们之间可能存在兼容性问题。安装或更新扩展后，需要重启应用才能加载。\n\n有关社区创建的精选扩展列表，请访问[TTS WebUI 扩展目录](https:\u002F\u002Frsxdalv.github.io\u002Ftts-webui-extension-catalog\u002F)。您还可以在那里找到有关发布您自己的扩展的信息。\n\n更新需要通过迷你控制面板手动完成：\n\n![mini-control-panel](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_6ca76eb72ac8.png)\n\n\n## 集成\n\n### Silly Tavern\n\n1. 将 OpenAI TTS API 扩展更新至最新版本\n2. 启动 API 并使用 Python Requests 测试\n\n   *(OpenAI 客户端可能未安装，因此使用 Python OpenAI 客户端测试可能会失败)*\n\n3. 当您成功生成音频时，前往 Silly Tavern 并添加一个新的 TTS API\n   默认提供商端点：`http:\u002F\u002Flocalhost:7778\u002Fv1\u002Faudio\u002Fspeech`\n   ![silly-tavern-tts-api](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_acd245a1dfb8.png)\n4. 试试看！\n\n### Text Generation WebUI (oobabooga\u002Ftext-generation-webui)\n\n1. 在 text-generation-webui 中安装 https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftext-to-tts-webui 扩展\n2. 启动 API 并使用 Python Requests 测试\n3. 使用面板进行配置：![oobaboooga-text-to-tts-webui](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_431e8cdfbeaf.png)\n\n### OpenWebUI\n\n1. 在 TTS WebUI 中启用 OpenAI API 扩展\n2. 启动 API 并使用 Python Requests 测试\n3. 当您成功生成音频时，前往 OpenWebUI 并添加一个新的 TTS API\n   默认提供商端点：`http:\u002F\u002Flocalhost:7778\u002Fv1\u002Faudio\u002Fspeech`\n4. 
试试看！\n![openwebui](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_c3e79642a9bf.png)\n\n### OpenAI 兼容 API\n\n按照上述说明，您可以安装一个 OpenAI 兼容的 API，并将其与 Silly Tavern 或其他 OpenAI 兼容客户端一起使用。\n\n## 兼容性 \u002F 错误\n\n### 控制台中的红色消息\n这些消息：\n```\n---- 需要 ----，但您安装了 ----，两者不兼容。\n```\n是完全正常的。这既是 pip 的局限性，也是因为这个 Web UI 将许多不同的 AI 项目整合在一起。由于这些项目之间并不总是相互兼容的，它们会抱怨其他已安装的项目。这是正常且预期的现象。最终，尽管有警告或错误提示，这些项目仍然可以协同工作。\n目前尚不清楚这种情况是否能够被解决，但这正是我们的期望。\n\n\n## Bark 的额外语音及提示词示例\n\u003Cdiv align=\"center\">\n\n[![PromptEcho](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_f88d54d70cad.png)](https:\u002F\u002Fpromptecho.com\u002F)\n\n[![Bark 说话人目录](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_readme_4e6cc766644b.png)](https:\u002F\u002Frsxdalv.github.io\u002Fbark-speaker-directory\u002F)\n\n\u003C\u002Fdiv>\n\n## Bark 说明文档\n[README_Bark.md](.\u002Fdocumentation\u002FREADME_Bark.md)\n\n## 关于管理 AI 项目的模型、缓存和系统空间的信息\nhttps:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui\u002Fdiscussions\u002F186#discussioncomment-7291274\n\n\n\n## 开源库\n\n\u003Cdetails>\n\u003Csummary>本项目使用了以下开源库：\u003C\u002Fsummary>\n\n- **suno-ai\u002Fbark** - [MIT 许可证](https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark\u002Fblob\u002Fmain\u002FLICENSE)\n  - 描述：Bark 模型的推理代码。\n  - 仓库：[suno\u002Fbark](https:\u002F\u002Fgithub.com\u002Fsuno-ai\u002Fbark)\n\n- **tortoise-tts** - [Apache-2.0 许可证](https:\u002F\u002Fgithub.com\u002Fneonbjb\u002Ftortoise-tts\u002Fblob\u002Fmaster\u002FLICENSE)\n  - 描述：一个适用于多种平台的灵活文本到语音合成库。\n  - 仓库：[neonbjb\u002Ftortoise-tts](https:\u002F\u002Fgithub.com\u002Fneonbjb\u002Ftortoise-tts)\n\n- **ffmpeg** - [LGPL 许可证](https:\u002F\u002Fgithub.com\u002FFFmpeg\u002FFFmpeg\u002Fblob\u002Fmaster\u002FLICENSE.md)\n  - 描述：一个完整且跨平台的视频和音频处理解决方案。\n  - 仓库：[FFmpeg](https:\u002F\u002Fgithub.com\u002FFFmpeg\u002FFFmpeg)\n  - 用途：编码 Vorbis Ogg 文件\n\n- **ffmpeg-python** - [Apache 2.0 
许可证](https:\u002F\u002Fgithub.com\u002Fkkroening\u002Fffmpeg-python\u002Fblob\u002Fmaster\u002FLICENSE)\n  - 描述：用于处理多媒体文件的 FFmpeg 库的 Python 绑定。\n  - 仓库：[kkroening\u002Fffmpeg-python](https:\u002F\u002Fgithub.com\u002Fkkroening\u002Fffmpeg-python)\n\n- **audiocraft** - [MIT 许可证](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Faudiocraft\u002Fblob\u002Fmain\u002FLICENSE)\n  - 描述：一个用于音频生成和 MusicGen 的库。\n  - 仓库：[facebookresearch\u002Faudiocraft](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Faudiocraft)\n\n- **vocos** - [MIT 许可证](https:\u002F\u002Fgithub.com\u002Fcharactr-platform\u002Fvocos\u002Fblob\u002Fmaster\u002FLICENSE)\n  - 描述：一种改进的 encodec 样本解码器。\n  - 仓库：[charactr-platform\u002Fvocos](https:\u002F\u002Fgithub.com\u002Fcharactr-platform\u002Fvocos)\n\n- **RVC** - [MIT 许可证](https:\u002F\u002Fgithub.com\u002FRVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI\u002Fblob\u002Fmain\u002FLICENSE)\n  - 描述：一个基于 VITS 的易用语音转换框架。\n  - 仓库：[RVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI](https:\u002F\u002Fgithub.com\u002FRVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI)\n\u003C\u002Fdetails>\n\n## 伦理与负责任的使用\n这项技术旨在促进赋能与创造力，而非造成伤害。\n\n通过使用本 AI 模型，您确认并同意遵守以下准则，以负责任、合乎伦理且合法的方式使用该 AI 模型。\n- 非恶意意图：请勿将本 AI 模型用于恶意、有害或非法活动。它仅应用于合法且合乎道德的目的，以促进积极互动、知识共享和建设性对话。\n- 禁止冒充：请勿使用本 AI 模型冒充或伪装成他人，包括个人、组织或实体。不得利用其进行欺骗、诈骗或操纵他人行为。\n- 禁止欺诈行为：本 AI 模型不得用于任何形式的欺诈目的，例如金融诈骗、网络钓鱼或其他旨在获取敏感信息、谋取经济利益或未经授权访问系统的欺骗性行为。\n- 遵守法律：请确保您对本 AI 模型的使用符合所在司法管辖区关于 AI 使用、数据保护、隐私、知识产权及其他相关法律义务的适用法律法规和政策。\n- 确认：通过使用本 AI 模型，您确认并同意遵守上述准则，以负责任、合乎伦理且合法的方式使用该 AI 模型。\n\n## 许可证\n\n### 代码库与依赖项\n\n代码库采用 MIT 许可证。然而，需要注意的是，在安装依赖项时，您还需遵守它们各自的许可证条款。尽管大多数许可证较为宽松，但也可能存在一些不兼容的情况。因此，务必明确：宽松许可证仅适用于代码库本身，而不涵盖整个项目。\n\n尽管如此，我们的目标是在整个项目中保持与 MIT 许可证的兼容性。如果您发现某个依赖项与 MIT 许可证不兼容，请随时提交问题并告知我们。\n\n已知的非宽松许可证依赖项：\n| 库名     | 许可证           | 备注                                                                                     
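|\n|-------------|-------------------|-------------------------------------------------------------------------------------------|\n| encodec     | CC BY-NC 4.0      | 新版本为 MIT，但需手动安装                                  |\n| diffq       | CC BY-NC 4.0      | 未来可选，运行并非必需，可卸载，应与 demucs 一起更新 |\n| lameenc     | GPL 许可证       | 未来版本将改为 LGPL，但需手动安装                      |\n| unidecode   | GPL 许可证       | 非关键组件，可用其他库替代，问题：https:\u002F\u002Fgithub.com\u002Fneonbjb\u002Ftortoise-tts\u002Fissues\u002F494 |\n\n\n### 模型权重\n模型权重有不同的许可证，请注意您所使用的模型的许可证。\n\n其中最值得注意的是：\n- Bark：MIT\n- Tortoise：*未知*（根据仓库信息为 Apache-2.0，但 HuggingFace 上没有许可证文件）\n- MusicGen：CC BY-NC 4.0\n- AudioGen：CC BY-NC 4.0","# TTS-WebUI Quick Start Guide\n\nTTS-WebUI is an open-source collection of text-to-speech (TTS), audio generation, and audio processing tools. It supports dozens of popular models, including Bark, Tortoise, XTTSv2, and F5-TTS, and offers both a Gradio and a React user interface.\n\n## Prerequisites\n\nBefore you start, make sure your system meets the following requirements:\n\n*   **Operating system**: Windows, Linux, or macOS\n*   **Python version**: **3.10** or **3.11** (3.12 is not yet supported)\n*   **GPU**: An NVIDIA card is recommended (with the matching CUDA driver installed); some models can run on CPU, but slowly\n*   **Dependencies**:\n    *   `git`: to clone the repository\n    *   `ffmpeg`: must be installed with vorbis encoding support (used for audio processing)\n    *   `NodeJS` (optional): version 22.9.0 is suggested; only needed for the React UI\n    *   `SQLite`: ships with Python, no separate install needed\n\n> **Tip for users in mainland China**: Consider configuring a domestic pip mirror to speed up dependency downloads.\n> ```bash\n> pip config set global.index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## Installation\n\nYou can use the **one-click installer** (recommended for beginners) or a **manual installation** (for developers).\n\n### Option 1: Use the installer (recommended)\n\nThis method sets up the Conda environment and the Python virtual environment automatically, with no manual steps.\n\n1.  **Download the installer**:\n    Download the latest `tts-webui-installer.zip` from [GitHub Releases](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Freleases) and extract it.\n    *(Note: the base installation is about 10.7 GB; each additional model needs 2-8 GB.)*\n\n2.  **Start the server**:\n    *   **Windows**: double-click `start_tts_webui.bat`\n    *   **Linux\u002FMac**: run `.\u002Fstart_tts_webui.sh` in a terminal\n\n3.  **Select your hardware**:\n    On first run the script prompts you to choose the GPU or chip type; follow the prompts.\n\n4.  
**Open the interface**:\n    Once installation finishes, the services start automatically:\n    *   **Gradio UI**: http:\u002F\u002Flocalhost:7770\n    *   **React UI**: http:\u002F\u002Flocalhost:3000\n\n### Option 2: Manual installation\n\nFor users who need a custom environment or want to develop on top of the project.\n\n1.  **Clone the repository**:\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Frsxdalv\u002Ftts-webui.git\n    cd tts-webui\n    ```\n\n2.  **Create and activate a virtual environment**:\n    ```bash\n    # Windows\n    python -m venv venv\n    venv\\Scripts\\activate\n\n    # Linux\u002FMac\n    python3 -m venv venv\n    source venv\u002Fbin\u002Factivate\n    ```\n\n3.  **Install PyTorch**:\n    See the [PyTorch website](https:\u002F\u002Fpytorch.org\u002F) for the install command that matches your system. For example (CUDA 11.8):\n    ```bash\n    pip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n    ```\n\n4.  **Install the project dependencies**:\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n5.  **Build the React UI (optional)**:\n    If you want the React interface, make sure NodeJS is installed, then run:\n    ```bash\n    cd react-ui\n    npm install\n    npm run build\n    cd ..\n    ```\n\n6.  **Start the server**:\n    ```bash\n    python server.py\n    ```\n    To run without the React UI, use `python server.py --no-react`.\n\n## Basic usage\n\nAfter startup, open `http:\u002F\u002Flocalhost:7770` (Gradio) or `http:\u002F\u002Flocalhost:3000` (React) in a browser.\n\n### A minimal text-to-speech workflow\n\n1.  **Choose a model**:\n    In the model dropdown at the left or top of the page, pick an installed TTS model (for example `Bark`, `XTTSv2`, or `Piper`).\n    *   *Note: models marked with `*` are not installed by default and must be installed from the \"Extensions\" tab.*\n\n2.  **Enter text**:\n    Type the text you want to convert into the text box.\n\n3.  **Adjust parameters (optional)**:\n    Depending on the model, adjust parameters such as Speed, Emotion, or Reference Audio.\n\n4.  **Generate audio**:\n    Click the **Generate** button.\n\n5.  
**Play and download**:\n    When generation finishes, the page shows an audio player; you can listen right away or click the download button to save a `.wav` \u002F `.mp3` file.\n\n### Advanced features\n*   **Voice cloning**: Upload a reference recording of a voice and the model can read new text in that voice (requires model support, e.g. XTTSv2).\n*   **Audio processing**: Switch to a model under the \"Audio Conversion\u002FTools\" category (e.g. RVC, Demucs) for voice conversion, vocal separation, or audio enhancement.\n*   **Extension management**: On the settings or extensions page you can download and manage more community-supported models in one click.","An indie game developer is producing a multilingual voice pack for a character dialogue system and needs to handle narration, emotional character performances, and background sound generation at the same time.\n\n### Without TTS-WebUI\n- **Environment setup nightmare**: To try different models such as Bark, CosyVoice, and RVC, the developer has to build several separate Python virtual environments; dependency conflicts are frequent and installation takes days.\n- **Fragmented workflow**: Narration is generated with one codebase, character voices are cloned with another script, and noise reduction requires yet another tool, so data is exported and re-imported between tools again and again, which is error-prone.\n- **Costly trial and error**: Every change to speed, emotion, or timbre means re-running a complex command-line script with no live preview, so validating creative ideas is very slow.\n- **Chaotic resource management**: Models differ in VRAM requirements and nothing schedules them centrally, so out-of-memory crashes are common and it is hard to run several models smoothly on one machine.\n\n### With TTS-WebUI\n- **One-step integrated deployment**: A single TTS-WebUI installer or Docker container brings together more than twenty popular models, from GPT-SoVITS to MusicGen, without having to deal with low-level dependency conflicts.\n- **Unified interface**: In one panel built on Gradio and React, the developer can switch between text-to-speech, voice cloning, and music generation modules and produce all audio assets end to end in one place.\n- **Interactive tuning**: Sliders adjust parameters on the fly with previews within seconds, so the voice and emotional delivery that best fit the game can be settled quickly, cutting hours of tuning down to minutes.\n- **Efficient resource scheduling**: TTS-WebUI manages model loading and VRAM release automatically, so the same session can use the lightweight Piper for fast iteration or a heavier model for the final render, with much better stability.\n\nTTS-WebUI consolidates a fragmented audio AI toolchain into a single creation platform, freeing developers from tedious environment maintenance so they can focus on the content itself.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frsxdalv_TTS-WebUI_f2560aa3.png","rsxdalv","Roberts Slisans","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Frsxdalv_3ceaddbe.jpg","TTS WebUI 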
Developer",null,"https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI","https:\u002F\u002Fgithub.com\u002Frsxdalv",[82,86,90,94,98,102,105,108,112,116],{"name":83,"color":84,"percentage":85},"TypeScript","#3178c6",57.9,{"name":87,"color":88,"percentage":89},"Python","#3572A5",32.1,{"name":91,"color":92,"percentage":93},"JavaScript","#f1e05a",3.2,{"name":95,"color":96,"percentage":97},"Shell","#89e051",1.6,{"name":99,"color":100,"percentage":101},"Batchfile","#C1F12E",1.2,{"name":103,"color":104,"percentage":101},"MDX","#fcb32c",{"name":106,"color":107,"percentage":101},"HTML","#e34c26",{"name":109,"color":110,"percentage":111},"PowerShell","#012456",0.6,{"name":113,"color":114,"percentage":115},"CSS","#663399",0.5,{"name":117,"color":118,"percentage":119},"Jupyter Notebook","#DA5B0B",0.2,3091,314,"2026-04-19T19:21:26","MIT","Windows, Linux, macOS","Supports multiple GPUs\u002Fchips (the install script prompts you to choose). VRAM requirements depend on the chosen model (each model needs an extra 2-8 GB of storage, which suggests some VRAM requirement); no minimum CUDA version is specified","Not specified",{"notes":128,"python":129,"dependencies":130},"The base installation is about 10.7 GB, and each model needs an extra 2-8 GB. The official recommendation is to use the provided installer, which sets up the conda environment and the Python virtual environment automatically; do not run the start script from inside another virtual environment, or it may fail. Python 3.12 is not supported. Some advanced models are not installed by default and must be installed separately as extensions.","3.10 or 3.11",[131,132,133,134,135],"git","PyTorch","ffmpeg (vorbis support required)","SQLite","NodeJS 22.9.0 (optional, for the React UI)",[15,14,27,137,13],"音频",[139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155],"gradio","text-to-speech","tts","ai","audio-generation","generator","music","musicgen","rvc","tortoise-tts","vocos","styletts2","generative-ai","openai-api","ace-step","cosyvoice","openvoice","2026-03-27T02:49:30.150509","2026-04-20T19:22:19.391797",[159,164,169,174,179,183],{"id":160,"question_zh":161,"answer_zh":162,"source_url":163},45448,"Running the start_macos.sh script on macOS does nothing except open a terminal window. How can I fix this?","This is usually caused by the torchaudio library failing to load. The main cause may be Conda choosing the wrong macOS chip architecture (for example, mixing up Apple Silicon and Intel). The problem has been fixed several times before, but it can reappear on new architectures or in specific environments. Check whether the Conda environment correctly detects the current system architecture, or consult the related architecture-compatibility discussion (such as Issue 
#196).","https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fissues\u002F230",{"id":165,"question_zh":166,"answer_zh":167,"source_url":168},45449,"Why does moving the virtual environment (VENV) folder cause errors or make pip impossible to find?","A virtual environment (VENV) usually breaks once it is moved, because a VENV stores its PATH information as hard-coded absolute paths. After the folder moves, those paths are no longer valid and the system may fall back to the global pip (for example `\u002Fhome\u002Fuser\u002F.local\u002Flib\u002F...`), so packages are not found or versions conflict. The Python ecosystem does not support flexible path remapping the way Node.js does. The fix is not to move the installation folder manually; if you must move it, recreate the virtual environment and reinstall the dependencies.","https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fissues\u002F433",{"id":170,"question_zh":171,"answer_zh":172,"source_url":173},45450,"On Windows, the one-click install script (start_windows.bat) closes immediately after opening. What is the cause?","This is usually because the installation path contains spaces. The script depends on Miniconda, and Miniconda cannot be installed silently into a path that contains spaces. Move the whole project folder to a path without spaces (for example `D:\\TTS-WebUI` rather than `C:\\Users\\Name\\My Projects\\TTS`) and run the script again. If the problem persists, try WSL (Windows Subsystem for Linux) for better compatibility and performance.","https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fissues\u002F383",{"id":175,"question_zh":176,"answer_zh":177,"source_url":178},45451,"When generating audio with Stable Audio, saving the file fails with an error about the file name. How can I fix this?","This happens because the Stable Audio prompt contains line breaks (\\n) or other special characters, which makes the generated file name invalid. For example, when the prompt contains several lines of information (Genre, Mood, Style, and so on), using it directly as the file name causes an error. The current workaround is to simplify the prompt by hand, removing line breaks and special symbols, or to wait for the developer to fix the file-name sanitization so that these characters are handled automatically.","https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fissues\u002F319",{"id":180,"question_zh":181,"answer_zh":182,"source_url":168},45452,"Installing NVIDIA driver-related components on Linux Mint LTS fails, and PyTorch is installed in the wrong location. What should I do?","This was a problem with an older installation flow. The installer now installs PyTorch through pip by default, which avoids the earlier path-configuration errors. If you run into this, make sure you are using the latest installer script. If it still fails, check whether the virtual environment folder was moved manually (see the related question), because moving the folder breaks the hard-coded path references.",{"id":184,"question_zh":185,"answer_zh":186,"source_url":187},45453,"On macOS, building the torchcrepe, rvc-beta, or pyworld wheel fails or hangs during installation. What should I do?","Building these packages on macOS (especially ones with C++ extensions, such as pyworld or torchcrepe) often produces compile errors or long periods with no response. This is usually related to Xcode Command Line Tools being missing or at a mismatched version. First run `xcode-select --install` to make sure the toolchain is complete. Also confirm that the Conda environment is activated correctly and that the architecture setting (arm64 vs x86_64) matches your Mac's chip. If the problem persists, see the dedicated macOS installation issue thread (such as Issue 
#196).","https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fissues\u002F196",[189,194,199,204,209,214,219],{"id":190,"version":191,"summary_zh":192,"released_at":193},360369,"v1.1.0","## What's Changed\n* feat: add torchcodec CPU to dependencies by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F655\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fcompare\u002Fv1.0.0...v1.1.0","2026-04-05T18:14:23",{"id":195,"version":196,"summary_zh":197,"released_at":198},360370,"v1.0.0","## What's Changed\n* feat: upgrade PyTorch to 2.11.0 by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F654\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fcompare\u002Fv0.5.1...v1.0.0","2026-04-05T15:19:46",{"id":200,"version":201,"summary_zh":202,"released_at":203},360371,"v0.5.1","## What's Changed\n* December fixes by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F633\n* feat: remove PostgreSQL and the Bark config from the defaults, and fix authentication by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F635\n* feat: add an extensions toolbar for easier access by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F642\n* feat: upgrade Next.js by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F645\n* fix: resolve the React build installation issue by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F646\n* Add MiniMax Cloud TTS as a built-in extension by @octo-patch in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F648\n* fix: add a seed parameter for MiniMax TTS and run the React build on PRs by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F649\n* fix: pin torchvision and torchaudio to versions that match torch by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F651\n* fix: pin the torchaudio version by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F653\n\n## New Contributors\n* 
@octo-patch made their first contribution in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F648\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fcompare\u002Fv0.4.0...v0.5.1","2026-04-04T14:47:54",{"id":205,"version":206,"summary_zh":207,"released_at":208},360372,"v0.4.0","## What's Changed\n* chore: clean up files and tidy the directory structure by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F616\n* feat: add the ability to disable decorator extensions via config.json by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F621\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fcompare\u002Fv0.3.0...v0.4.0","2025-11-23T23:45:18",{"id":210,"version":211,"summary_zh":212,"released_at":213},360373,"v0.3.0","## What's Changed\n* feat: external extensions by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F600\n* feat: integrate references to the TTS WebUI extension catalog by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F604\n* feat: logs extension by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F605\n* feat: tests and extension catalog sync by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F606\n* feat: install extensions via a button and fix the pip package manager wrapper by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F611\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fcompare\u002Fv0.0.1...v0.3.0","2025-11-14T09:42:32",{"id":215,"version":216,"summary_zh":217,"released_at":218},360374,"v0.0.1","## What's Changed\n* Switch to the new download archive by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F553\n* Fix the installer and add Higgs v2 by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F555\n* Add VibeVoice by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F556\n* fix: add information about Python 3.11 on Google Colab by @rsxdalv in 
https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F564\n* feat: add the Kitten TTS Mini extension by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F565\n* PyRNNoise by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F566\n* Chatterbox React UI by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F567\n* fix: rename 'language' to 'language_id' in the React Chatterbox UI by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F570\n* Rename Kokoro TTS API to OpenAI API TTS by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F573\n* feat: rename all extension_ packages to tts_webui_extension.* by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F574\n* Migration fixes by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F575\n* feat: installation upgrades by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F576\n* feat: integrate with the new pip index for extensions by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F577\n* feat: platform updates and MiMo Audio by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F578\n* Add index tts 2 by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F581\n* Add VoxCPM by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F583\n* Add FireRedTTS2 by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F584\n* feat: add the MegaTTS3 extension by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F586\n* chore: prepare the repository for release tagging by @rsxdalv in https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fpull\u002F587 
中完成\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Fcompare\u002Fv0.0.0...v0.0.1","2025-09-29T12:45:46",{"id":220,"version":221,"summary_zh":222,"released_at":223},360375,"v0.0.0","[下载 tts-webui-installer.zip](https:\u002F\u002Fgithub.com\u002Frsxdalv\u002FTTS-WebUI\u002Freleases\u002Fdownload\u002Fv0.0.0\u002Ftts-webui-installer.zip)，然后解压并运行 start_tts_webui 脚本。\n\n_这是一个自动更新的发布版本，包含仓库的 .zip 文件。_","2025-08-16T17:25:57"]