[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-RVC-Boss--GPT-SoVITS":3,"tool-RVC-Boss--GPT-SoVITS":66},[4,23,32,40,48,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":22},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,2,"2026-04-05T10:45:23",[13,14,15,16,17,18,19,20,21],"图像","数据工具","视频","插件","Agent","其他","语言模型","开发框架","音频","ready",{"id":24,"name":25,"github_repo":26,"description_zh":27,"stars":28,"difficulty_score":29,"last_commit_at":30,"category_tags":31,"status":22},2863,"TTS","coqui-ai\u002FTTS","🐸TTS 是一款功能强大的深度学习文本转语音（Text-to-Speech）开源库，旨在将文字自然流畅地转化为逼真的人声。它解决了传统语音合成技术中声音机械生硬、多语言支持不足以及定制门槛高等痛点，让高质量的语音生成变得触手可及。\n\n无论是希望快速集成语音功能的开发者，还是致力于探索前沿算法的研究人员，亦或是需要定制专属声音的数据科学家，🐸TTS 都能提供得力支持。它不仅预置了覆盖全球 1100 多种语言的训练模型，让用户能够即刻上手，还提供了完善的工具链，支持用户利用自有数据训练新模型或对现有模型进行微调，轻松实现特定风格的声音克隆。\n\n在技术亮点方面，🐸TTS 表现卓越。其最新的 ⓍTTSv2 模型支持 16 种语言，并在整体性能上大幅提升，实现了低于 200 毫秒的超低延迟流式输出，极大提升了实时交互体验。此外，它还无缝集成了 🐶Bark、🐢Tortoise 等社区热门模型，并支持调用上千个 Fairseq 模型，展现了极强的兼容性与扩展性。配合丰富的数据集分析与整理工具，🐸TTS 已成为科研与生产环境中备受信赖的语音合成解决方案。",44971,3,"2026-04-03T14:47:02",[21,20,13],{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":29,"last_commit_at":38,"category_tags":39,"status":22},2375,"LocalAI","mudler\u002FLocalAI","LocalAI 是一款开源的本地人工智能引擎，旨在让用户在任意硬件上轻松运行各类 AI 模型，包括大语言模型、图像生成、语音识别及视频处理等。它的核心优势在于彻底打破了高性能计算的门槛，无需昂贵的专用 GPU，仅凭普通 CPU 或常见的消费级显卡（如 NVIDIA、AMD、Intel 及 Apple Silicon）即可部署和运行复杂的 AI 任务。\n\n对于担心数据隐私的用户而言，LocalAI 提供了“隐私优先”的解决方案，确保所有数据处理均在本地基础设施内完成，无需上传至云端。同时，它完美兼容 
OpenAI、Anthropic 等主流 API 接口，这意味着开发者可以无缝迁移现有应用，直接利用本地资源替代云服务，既降低了成本又提升了可控性。\n\nLocalAI 内置了超过 35 种后端支持（如 llama.cpp、vLLM、Whisper 等），并集成了自主 AI 代理、工具调用及检索增强生成（RAG）等高级功能，且具备多用户管理与权限控制能力。无论是希望保护敏感数据的企业开发者、进行算法实验的研究人员，还是想要在个人电脑上体验最新 AI 技术的极客玩家，都能通过 LocalAI 获",44782,"2026-04-02T22:14:26",[13,21,19,17,20,14,16],{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":29,"last_commit_at":46,"category_tags":47,"status":22},3108,"bark","suno-ai\u002Fbark","Bark 是由 Suno 推出的开源生成式音频模型，能够根据文本提示创造出高度逼真的多语言语音、音乐、背景噪音及简单音效。与传统仅能朗读文字的语音合成工具不同，Bark 基于 Transformer 架构，不仅能模拟说话，还能生成笑声、叹息、哭泣等非语言声音，甚至能处理带有情感色彩和语气停顿的复杂文本，极大地丰富了音频表达的可能性。\n\n它主要解决了传统语音合成声音机械、缺乏情感以及无法生成非语音类音效的痛点，让创作者能通过简单的文字描述获得生动自然的音频素材。无论是需要为视频配音的内容创作者、探索多模态生成的研究人员，还是希望快速原型设计的开发者，都能从中受益。普通用户也可通过集成的演示页面轻松体验其神奇效果。\n\n技术亮点方面，Bark 支持商业使用（MIT 许可），并在近期更新中实现了显著的推理速度提升，同时提供了适配低显存 GPU 的版本，降低了使用门槛。此外，社区还建立了丰富的提示词库，帮助用户更好地驾驭模型生成特定风格的声音。只需几行 Python 代码，即可将创意文本转化为高质量音频，是连接文字与声音世界的强大桥梁。",39067,"2026-04-04T03:33:35",[21],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":54,"last_commit_at":55,"category_tags":56,"status":22},3788,"airi","moeru-ai\u002Fairi","airi 是一款开源的本地化 AI 伴侣项目，旨在将虚拟角色（如“二次元老婆”或赛博生命）带入用户的现实世界。它的核心目标是复刻并超越知名 AI 主播 Neuro-sama 的能力，让用户能够拥有完全自主掌控、可私有化部署的智能伙伴。\n\nairi 主要解决了用户对高度定制化、具备情感交互能力且数据隐私安全的 AI 角色的需求。不同于依赖云端服务的通用助手，airi 允许用户在本地运行，不仅保护了对话隐私，还赋予了用户定义角色性格与灵魂的自由。它支持实时语音聊天，甚至能直接参与《我的世界》（Minecraft）和《异星工厂》（Factorio）等游戏，实现了从单纯对话到共同娱乐的跨越。\n\n这款工具非常适合喜爱虚拟角色的普通用户、希望搭建个性化 AI 陪伴的技术爱好者，以及研究多模态交互的开发者。其独特的技术亮点在于跨平台支持（涵盖 Web、macOS 和 Windows）以及强大的游戏交互能力，让 AI 不仅能“说”，还能“玩”。通过容器化的灵魂设计，airi 为每个人创造专属数字生命提供了可能，让虚拟陪伴变得更加真实且触手可及。",37086,1,"2026-04-05T10:54:25",[19,21,17],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":63,"last_commit_at":64,"category_tags":65,"status":22},2735,"MockingBird","babysor\u002FMockingBird","MockingBird 是一款开源的实时语音克隆工具，旨在让用户仅需 5 
秒的参考音频，即可快速合成任意内容的语音，并实现逼真的音色复刻。它有效解决了传统语音合成技术中数据采集成本高、训练周期长以及难以实时生成的痛点，让个性化语音生成变得触手可及。\n\n这款工具特别适合开发者、AI 研究人员以及对语音技术感兴趣的技术爱好者使用。无论是用于构建交互式语音应用、进行声学模型研究，还是制作创意内容，MockingBird 都能提供强大的支持。普通用户若具备基础的编程环境配置能力，也可通过其提供的 Web 服务或工具箱体验前沿的变声效果。\n\n在技术亮点方面，MockingBird 基于 PyTorch 框架，不仅完美支持中文普通话及多种主流数据集，还实现了跨平台运行，兼容 Windows、Linux 乃至 M1 架构的 macOS。其独特的架构设计允许复用预训练的编码器与声码器，只需微调合成器即可获得出色效果，大幅降低了部署门槛。此外，项目内置了现成的 Web 服务器功能，方便用户通过远程调用快速集成到自己的应用中。尽管原作者已转向云端优化版本，但 MockingBird 作为经典的本地部署方案",36902,4,"2026-04-02T16:15:29",[17,21,13,20],{"id":67,"github_repo":68,"name":69,"description_en":70,"description_zh":71,"ai_summary_zh":71,"readme_en":72,"readme_zh":73,"quickstart_zh":74,"use_case_zh":75,"hero_image_url":76,"owner_login":77,"owner_name":78,"owner_avatar_url":79,"owner_bio":78,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":78,"owner_url":80,"languages":81,"stars":115,"forks":116,"last_commit_at":117,"license":118,"difficulty_score":29,"env_os":119,"env_gpu":120,"env_ram":121,"env_deps":122,"category_tags":130,"github_topics":131,"view_count":10,"oss_zip_url":78,"oss_zip_packed_at":78,"status":22,"created_at":138,"updated_at":139,"faqs":140,"releases":170},4128,"RVC-Boss\u002FGPT-SoVITS","GPT-SoVITS","1 min voice data can also be used to train a good TTS model! 
(few shot voice cloning)","GPT-SoVITS 是一款强大的开源语音合成与声音克隆工具，旨在让用户仅需极少量的音频数据即可训练出高质量的个性化语音模型。它核心解决了传统语音合成技术依赖海量录音数据、门槛高且成本大的痛点，实现了“零样本”和“少样本”的快速建模：用户只需提供 5 秒参考音频即可即时生成语音，或使用 1 分钟数据进行微调，从而获得高度逼真且相似度极佳的声音效果。\n\n该工具特别适合内容创作者、独立开发者、研究人员以及希望为角色配音的普通用户使用。其内置的友好 WebUI 界面集成了人声伴奏分离、自动数据集切片、中文语音识别及文本标注等辅助功能，极大地降低了数据准备和模型训练的技术门槛，让非专业人士也能轻松上手。\n\n在技术亮点方面，GPT-SoVITS 不仅支持中、英、日、韩、粤语等多语言跨语种合成，还具备卓越的推理速度，在主流显卡上可实现实时甚至超实时的生成效率。无论是需要快速制作视频配音，还是进行多语言语音交互研究，GPT-SoVITS 都能以极低的数据成本提供专业级的语音合成体验。","\u003Cdiv align=\"center\">\n\n\u003Ch1>GPT-SoVITS-WebUI\u003C\u002Fh1>\nA Powerful Few-shot Voice Conversion and Text-to-Speech WebUI.\u003Cbr>\u003Cbr>\n\n[![madewithlove](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fmade_with-%E2%9D%A4-red?style=for-the-badge&labelColor=orange)](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS)\n\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F7033\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FRVC-Boss_GPT-SoVITS_readme_4a68feb902da.png\" alt=\"RVC-Boss%2FGPT-SoVITS | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\n\u003C!-- img src=\"https:\u002F\u002Fcounter.seku.su\u002Fcmoe?name=gptsovits&theme=r34\" \u002F>\u003Cbr> -->\n\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10--3.12-blue?style=for-the-badge&logo=python)](https:\u002F\u002Fwww.python.org)\n[![GitHub release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002FRVC-Boss\u002Fgpt-sovits?style=for-the-badge&logo=github)](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002Fgpt-sovits\u002Freleases)\n\n[![Train In 
Colab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FColab-Training-F9AB00?style=for-the-badge&logo=googlecolab)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FRVC-Boss\u002FGPT-SoVITS\u002Fblob\u002Fmain\u002FColab-WebUI.ipynb)\n[![Huggingface](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F免费在线体验-free_online_demo-yellow.svg?style=for-the-badge&logo=huggingface)](https:\u002F\u002Flj1995-gpt-sovits-proplus.hf.space\u002F)\n[![Image Size](https:\u002F\u002Fimg.shields.io\u002Fdocker\u002Fimage-size\u002Fxxxxrt666\u002Fgpt-sovits\u002Flatest?style=for-the-badge&logo=docker)](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fxxxxrt666\u002Fgpt-sovits)\n\n[![简体中文](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F简体中文-阅读文档-blue?style=for-the-badge&logo=googledocs&logoColor=white)](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e)\n[![English](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEnglish-Read%20Docs-blue?style=for-the-badge&logo=googledocs&logoColor=white)](https:\u002F\u002Frentry.co\u002FGPT-SoVITS-guide#\u002F)\n[![Change Log](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FChange%20Log-View%20Updates-blue?style=for-the-badge&logo=googledocs&logoColor=white)](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fblob\u002Fmain\u002Fdocs\u002Fen\u002FChangelog_EN.md)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLICENSE-MIT-green.svg?style=for-the-badge&logo=opensourceinitiative)](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fblob\u002Fmain\u002FLICENSE)\n\n**English** | [**中文简体**](.\u002Fdocs\u002Fcn\u002FREADME.md) | [**日本語**](.\u002Fdocs\u002Fja\u002FREADME.md) | [**한국어**](.\u002Fdocs\u002Fko\u002FREADME.md) | [**Türkçe**](.\u002Fdocs\u002Ftr\u002FREADME.md)\n\n\u003C\u002Fdiv>\n\n---\n\n## Features:\n\n1. **Zero-shot TTS:** Input a 5-second vocal sample and experience instant text-to-speech conversion.\n\n2. 
**Few-shot TTS:** Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.\n\n3. **Cross-lingual Support:** Inference in languages different from the training dataset, currently supporting English, Japanese, Korean, Cantonese and Chinese.\n\n4. **WebUI Tools:** Integrated tools include voice accompaniment separation, automatic training set segmentation, Chinese ASR, and text labeling, assisting beginners in creating training datasets and GPT\u002FSoVITS models.\n\n**Check out our [demo video](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV12g4y1m7Uw) here!**\n\nUnseen speakers few-shot fine-tuning demo:\n\nhttps:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fassets\u002F129054828\u002F05bee1fa-bdd8-4d85-9350-80c060ab47fb\n\n**RTF(inference speed) of GPT-SoVITS v2 ProPlus**:\n0.028 tested in 4060Ti, 0.014 tested in 4090 (1400words~=4min, inference time is 3.36s), 0.526 in M4 CPU. You can test our [huggingface demo](https:\u002F\u002Flj1995-gpt-sovits-proplus.hf.space\u002F) (half H200) to experience high-speed inference .\n\n请不要尬黑GPT-SoVITS推理速度慢，谢谢！\n\n**User guide: [简体中文](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e) | [English](https:\u002F\u002Frentry.co\u002FGPT-SoVITS-guide#\u002F)**\n\n## Installation\n\nFor users in China, you can [click here](https:\u002F\u002Fwww.codewithgpu.com\u002Fi\u002FRVC-Boss\u002FGPT-SoVITS\u002FGPT-SoVITS-Official) to use AutoDL Cloud Docker to experience the full functionality online.\n\n### Tested Environments\n\n| Python Version | PyTorch Version  | Device        |\n| -------------- | ---------------- | ------------- |\n| Python 3.10    | PyTorch 2.5.1    | CUDA 12.4     |\n| Python 3.11    | PyTorch 2.5.1    | CUDA 12.4     |\n| Python 3.11    | PyTorch 2.7.0    | CUDA 12.8     |\n| Python 3.9     | PyTorch 2.8.0dev | CUDA 12.8     |\n| Python 3.9     | PyTorch 2.5.1    | Apple silicon |\n| Python 3.11    | PyTorch 2.7.0    | Apple 
silicon |\n| Python 3.9     | PyTorch 2.2.2    | CPU           |\n\n### Windows\n\nIf you are a Windows user (tested with win>=10), you can [download the integrated package](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS-windows-package\u002Fresolve\u002Fmain\u002FGPT-SoVITS-v3lora-20250228.7z?download=true) and double-click on _go-webui.bat_ to start GPT-SoVITS-WebUI.\n\n**Users in China can [download the package here](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e\u002Fdkxgpiy9zb96hob4#KTvnO).**\n\nInstall the program by running the following commands:\n\n```pwsh\nconda create -n GPTSoVits python=3.10\nconda activate GPTSoVits\npwsh -F install.ps1 --Device \u003CCU126|CU128|CPU> --Source \u003CHF|HF-Mirror|ModelScope> [--DownloadUVR5]\n```\n\n### Linux\n\n```bash\nconda create -n GPTSoVits python=3.10\nconda activate GPTSoVits\nbash install.sh --device \u003CCU126|CU128|ROCM|CPU> --source \u003CHF|HF-Mirror|ModelScope> [--download-uvr5]\n```\n\n### macOS\n\n**Note: The models trained with GPUs on Macs result in significantly lower quality compared to those trained on other devices, so we are temporarily using CPUs instead.**\n\nInstall the program by running the following commands:\n\n```bash\nconda create -n GPTSoVits python=3.10\nconda activate GPTSoVits\nbash install.sh --device \u003CMPS|CPU> --source \u003CHF|HF-Mirror|ModelScope> [--download-uvr5]\n```\n\n### Install Manually\n\n#### Install Dependences\n\n```bash\nconda create -n GPTSoVits python=3.10\nconda activate GPTSoVits\n\npip install -r extra-req.txt --no-deps\npip install -r requirements.txt\n```\n\n#### Install FFmpeg\n\n##### Conda Users\n\n```bash\nconda activate GPTSoVits\nconda install ffmpeg\n```\n\n##### Ubuntu\u002FDebian Users\n\n```bash\nsudo apt install ffmpeg\nsudo apt install libsox-dev\n```\n\n##### Windows Users\n\nDownload and place 
[ffmpeg.exe](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FVoiceConversionWebUI\u002Fblob\u002Fmain\u002Fffmpeg.exe) and [ffprobe.exe](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FVoiceConversionWebUI\u002Fblob\u002Fmain\u002Fffprobe.exe) in the GPT-SoVITS root\n\nInstall [Visual Studio 2017](https:\u002F\u002Faka.ms\u002Fvs\u002F17\u002Frelease\u002Fvc_redist.x86.exe)\n\n##### MacOS Users\n\n```bash\nbrew install ffmpeg\n```\n\n### Running GPT-SoVITS with Docker\n\n#### Docker Image Selection\n\nDue to rapid development in the codebase and a slower Docker image release cycle, please:\n\n- Check [Docker Hub](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fxxxxrt666\u002Fgpt-sovits) for the latest available image tags\n- Choose an appropriate image tag for your environment\n- `Lite` means the Docker image **does not include** ASR models and UVR5 models. You can manually download the UVR5 models, while the program will automatically download the ASR models as needed\n- The appropriate architecture image (amd64\u002Farm64) will be automatically pulled during Docker Compose\n- Docker Compose will mount **all files** in the current directory. Please switch to the project root directory and **pull the latest code** before using the Docker image\n- Optionally, build the image locally using the provided Dockerfile for the most up-to-date changes\n\n#### Environment Variables\n\n- `is_half`: Controls whether half-precision (fp16) is enabled. Set to `true` if your GPU supports it to reduce memory usage.\n\n#### Shared Memory Configuration\n\nOn Windows (Docker Desktop), the default shared memory size is small and may cause unexpected behavior. 
Increase `shm_size` (e.g., to `16g`) in your Docker Compose file based on your available system memory.\n\n#### Choosing a Service\n\nThe `docker-compose.yaml` defines two types of services:\n\n- `GPT-SoVITS-CU126` & `GPT-SoVITS-CU128`: Full version with all features.\n- `GPT-SoVITS-CU126-Lite` & `GPT-SoVITS-CU128-Lite`: Lightweight version with reduced dependencies and functionality.\n\nTo run a specific service with Docker Compose, use:\n\n```bash\ndocker compose run --service-ports \u003CGPT-SoVITS-CU126-Lite|GPT-SoVITS-CU128-Lite|GPT-SoVITS-CU126|GPT-SoVITS-CU128>\n```\n\n#### Building the Docker Image Locally\n\nIf you want to build the image yourself, use:\n\n```bash\nbash docker_build.sh --cuda \u003C12.6|12.8> [--lite]\n```\n\n#### Accessing the Running Container (Bash Shell)\n\nOnce the container is running in the background, you can access it using:\n\n```bash\ndocker exec -it \u003CGPT-SoVITS-CU126-Lite|GPT-SoVITS-CU128-Lite|GPT-SoVITS-CU126|GPT-SoVITS-CU128> bash\n```\n\n## Pretrained Models\n\n**If `install.sh` runs successfully, you may skip steps 1, 2, and 3.**\n\n**Users in China can [download all these models here](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e\u002Fdkxgpiy9zb96hob4#nVNhX).**\n\n1. Download pretrained models from [GPT-SoVITS Models](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS) and place them in `GPT_SoVITS\u002Fpretrained_models`.\n\n2. Download G2PW models from [G2PWModel.zip(HF)](https:\u002F\u002Fhuggingface.co\u002FXXXXRT\u002FGPT-SoVITS-Pretrained\u002Fresolve\u002Fmain\u002FG2PWModel.zip) | [G2PWModel.zip(ModelScope)](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FXXXXRT\u002FGPT-SoVITS-Pretrained\u002Fresolve\u002Fmaster\u002FG2PWModel.zip), unzip and rename to `G2PWModel`, and then place them in `GPT_SoVITS\u002Ftext` (Chinese TTS only).\n\n3. 
For UVR5 (Vocals\u002FAccompaniment Separation & Reverberation Removal, additionally), download models from [UVR5 Weights](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FVoiceConversionWebUI\u002Ftree\u002Fmain\u002Fuvr5_weights) and place them in `tools\u002Fuvr5\u002Fuvr5_weights`.\n\n   - If you want to use `bs_roformer` or `mel_band_roformer` models for UVR5, you can manually download the model and corresponding configuration file, and put them in `tools\u002Fuvr5\u002Fuvr5_weights`. **Rename the model file and configuration file, ensure that the model and configuration files have the same and corresponding names except for the suffix**. In addition, the model and configuration file names **must include `roformer`** in order to be recognized as models of the roformer class.\n\n   - The suggestion is to **directly specify the model type** in the model name and configuration file name, such as `mel_mand_roformer`, `bs_roformer`. If not specified, the features will be compared from the configuration file to determine which type of model it is. For example, the model `bs_roformer_ep_368_sdr_12.9628.ckpt` and its corresponding configuration file `bs_roformer_ep_368_sdr_12.9628.yaml` are a pair, `kim_mel_band_roformer.ckpt` and `kim_mel_band_roformer.yaml` are also a pair.\n\n4. For Chinese ASR (additionally), download models from [Damo ASR Model](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fdamo\u002Fspeech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch\u002Ffiles), [Damo VAD Model](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fdamo\u002Fspeech_fsmn_vad_zh-cn-16k-common-pytorch\u002Ffiles), and [Damo Punc Model](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fdamo\u002Fpunc_ct-transformer_zh-cn-common-vocab272727-pytorch\u002Ffiles) and place them in `tools\u002Fasr\u002Fmodels`.\n\n5. 
For English or Japanese ASR (additionally), download models from [Faster Whisper Large V3](https:\u002F\u002Fhuggingface.co\u002FSystran\u002Ffaster-whisper-large-v3) and place them in `tools\u002Fasr\u002Fmodels`. Also, [other models](https:\u002F\u002Fhuggingface.co\u002FSystran) may have a similar effect with a smaller disk footprint.\n\n## Dataset Format\n\nThe TTS annotation .list file format:\n\n```\n\nvocal_path|speaker_name|language|text\n\n```\n\nLanguage dictionary:\n\n- 'zh': Chinese\n- 'ja': Japanese\n- 'en': English\n- 'ko': Korean\n- 'yue': Cantonese\n\nExample:\n\n```\n\nD:\\GPT-SoVITS\\xxx\u002Fxxx.wav|xxx|en|I like playing Genshin.\n\n```\n\n## Finetune and inference\n\n### Open WebUI\n\n#### Integrated Package Users\n\nDouble-click `go-webui.bat` or use `go-webui.ps1`\nIf you want to switch to V1, double-click `go-webui-v1.bat` or use `go-webui-v1.ps1`\n\n#### Others\n\n```bash\npython webui.py \u003Clanguage(optional)>\n```\n\nIf you want to switch to V1, then\n\n```bash\npython webui.py v1 \u003Clanguage(optional)>\n```\n\nOr manually switch the version in the WebUI\n\n### Finetune\n\n#### Path Auto-filling is now supported\n\n1. Fill in the audio path\n2. Slice the audio into small chunks\n3. Denoise (optional)\n4. ASR\n5. Proofread the ASR transcriptions\n6. Go to the next Tab, then finetune the model\n\n### Open Inference WebUI\n\n#### Integrated Package Users\n\nDouble-click `go-webui-v2.bat` or use `go-webui-v2.ps1`, then open the inference webui at `1-GPT-SoVITS-TTS\u002F1C-inference`\n\n#### Others\n\n```bash\npython GPT_SoVITS\u002Finference_webui.py \u003Clanguage(optional)>\n```\n\nOR\n\n```bash\npython webui.py\n```\n\nthen open the inference webui at `1-GPT-SoVITS-TTS\u002F1C-inference`\n\n## V2 Release Notes\n\nNew Features:\n\n1. Support Korean and Cantonese\n\n2. An optimized text frontend\n\n3. Pre-trained model extended from 2k hours to 5k hours\n\n4. 
Improved synthesis quality for low-quality reference audio\n\n   [more details](\u003Chttps:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki\u002FGPT%E2%80%90SoVITS%E2%80%90v2%E2%80%90features-(%E6%96%B0%E7%89%B9%E6%80%A7)>)\n\nUse v2 from v1 environment:\n\n1. `pip install -r requirements.txt` to update some packages\n\n2. Clone the latest codes from github.\n\n3. Download v2 pretrained models from [huggingface](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS\u002Ftree\u002Fmain\u002Fgsv-v2final-pretrained) and put them into `GPT_SoVITS\u002Fpretrained_models\u002Fgsv-v2final-pretrained`.\n\n   Chinese v2 additional: [G2PWModel.zip(HF)](https:\u002F\u002Fhuggingface.co\u002FXXXXRT\u002FGPT-SoVITS-Pretrained\u002Fresolve\u002Fmain\u002FG2PWModel.zip)| [G2PWModel.zip(ModelScope)](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FXXXXRT\u002FGPT-SoVITS-Pretrained\u002Fresolve\u002Fmaster\u002FG2PWModel.zip)(Download G2PW models, unzip and rename to `G2PWModel`, and then place them in `GPT_SoVITS\u002Ftext`.)\n\n## V3 Release Notes\n\nNew Features:\n\n1. The timbre similarity is higher, requiring less training data to approximate the target speaker (the timbre similarity is significantly improved using the base model directly without fine-tuning).\n\n2. GPT model is more stable, with fewer repetitions and omissions, and it is easier to generate speech with richer emotional expression.\n\n   [more details](\u003Chttps:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki\u002FGPT%E2%80%90SoVITS%E2%80%90v3v4%E2%80%90features-(%E6%96%B0%E7%89%B9%E6%80%A7)>)\n\nUse v3 from v2 environment:\n\n1. `pip install -r requirements.txt` to update some packages\n\n2. Clone the latest codes from github.\n\n3. 
Download v3 pretrained models (s1v3.ckpt, s2Gv3.pth and models--nvidia--bigvgan_v2_24khz_100band_256x folder) from [huggingface](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS\u002Ftree\u002Fmain) and put them into `GPT_SoVITS\u002Fpretrained_models`.\n\n   additional: for Audio Super Resolution model, you can read [how to download](.\u002Ftools\u002FAP_BWE_main\u002F24kto48k\u002Freadme.txt)\n\n## V4 Release Notes\n\nNew Features:\n\n1. Version 4 fixes the issue of metallic artifacts in Version 3 caused by non-integer multiple upsampling, and natively outputs 48k audio to prevent muffled sound (whereas Version 3 only natively outputs 24k audio). The author considers Version 4 a direct replacement for Version 3, though further testing is still needed.\n   [more details](\u003Chttps:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki\u002FGPT%E2%80%90SoVITS%E2%80%90v3v4%E2%80%90features-(%E6%96%B0%E7%89%B9%E6%80%A7)>)\n\nUse v4 from v1\u002Fv2\u002Fv3 environment:\n\n1. `pip install -r requirements.txt` to update some packages\n\n2. Clone the latest codes from github.\n\n3. Download v4 pretrained models (gsv-v4-pretrained\u002Fs2v4.pth, and gsv-v4-pretrained\u002Fvocoder.pth) from [huggingface](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS\u002Ftree\u002Fmain) and put them into `GPT_SoVITS\u002Fpretrained_models`.\n\n## V2Pro Release Notes\n\nNew Features:\n\n1. Slightly higher VRAM usage than v2, surpassing v4's performance, with v2's hardware cost and speed.\n   [more details](\u003Chttps:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki\u002FGPT%E2%80%90SoVITS%E2%80%90features-(%E5%90%84%E7%89%88%E6%9C%AC%E7%89%B9%E6%80%A7)>)\n\n2.v1\u002Fv2 and the v2Pro series share the same characteristics, while v3\u002Fv4 have similar features. For training sets with average audio quality, v1\u002Fv2\u002Fv2Pro can deliver decent results, but v3\u002Fv4 cannot. 
Additionally, the synthesized tone and timbre of v3\u002Fv4 lean more toward the reference audio rather than the overall training set.\n\nUse v2Pro from v1\u002Fv2\u002Fv3\u002Fv4 environment:\n\n1. `pip install -r requirements.txt` to update some packages\n\n2. Clone the latest code from GitHub.\n\n3. Download v2Pro pretrained models (v2Pro\u002Fs2Dv2Pro.pth, v2Pro\u002Fs2Gv2Pro.pth, v2Pro\u002Fs2Dv2ProPlus.pth, v2Pro\u002Fs2Gv2ProPlus.pth, and sv\u002Fpretrained_eres2netv2w24s4ep4.ckpt) from [huggingface](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS\u002Ftree\u002Fmain) and put them into `GPT_SoVITS\u002Fpretrained_models`.\n\n## Todo List\n\n- [x] **High Priority:**\n\n  - [x] Localization in Japanese and English.\n  - [x] User guide.\n  - [x] Japanese and English dataset fine tune training.\n\n- [ ] **Features:**\n  - [x] Zero-shot voice conversion (5s) \u002F few-shot voice conversion (1min).\n  - [x] TTS speaking speed control.\n  - [ ] ~~Enhanced TTS emotion control.~~ Maybe use pretrained finetuned preset GPT models for better emotion.\n  - [ ] Experiment with changing SoVITS token inputs to probability distribution of GPT vocabs (transformer latent).\n  - [x] Improve English and Japanese text frontend.\n  - [ ] Develop tiny and larger-sized TTS models.\n  - [x] Colab scripts.\n  - [x] Try expanding the training dataset (2k hours -> 10k hours).\n  - [x] better sovits base model (enhanced audio quality)\n  - [ ] model mix\n\n## (Additional) Method for running from the command line\n\nUse the command line to open the WebUI for UVR5\n\n```bash\npython tools\u002Fuvr5\u002Fwebui.py \"\u003Cinfer_device>\" \u003Cis_half> \u003Cwebui_port_uvr5>\n```\n\n\u003C!-- If you can't open a browser, follow the format below for UVR processing,This is using mdxnet for audio processing\n```\npython mdxnet.py --model --input_root --output_vocal --output_ins --agg_level --format --device --is_half_precision\n``` -->\n\nThis is how the audio segmentation of the 
dataset is done using the command line\n\n```bash\npython audio_slicer.py \\\n    --input_path \"\u003Cpath_to_original_audio_file_or_directory>\" \\\n    --output_root \"\u003Cdirectory_where_subdivided_audio_clips_will_be_saved>\" \\\n    --threshold \u003Cvolume_threshold> \\\n    --min_length \u003Cminimum_duration_of_each_subclip> \\\n    --min_interval \u003Cshortest_time_gap_between_adjacent_subclips> \\\n    --hop_size \u003Cstep_size_for_computing_volume_curve>\n```\n\nThis is how dataset ASR processing is done using the command line (Chinese only)\n\n```bash\npython tools\u002Fasr\u002Ffunasr_asr.py -i \u003Cinput> -o \u003Coutput>\n```\n\nASR processing is performed through Faster_Whisper (ASR labeling for languages other than Chinese)\n\n(No progress bars, GPU performance may cause time delays)\n\n```bash\npython .\u002Ftools\u002Fasr\u002Ffasterwhisper_asr.py -i \u003Cinput> -o \u003Coutput> -l \u003Clanguage> -p \u003Cprecision>\n```\n\nA custom list save path is enabled\n\n## Credits\n\nSpecial thanks to the following projects and contributors:\n\n### Theoretical Research\n\n- [ar-vits](https:\u002F\u002Fgithub.com\u002Finnnky\u002Far-vits)\n- [SoundStorm](https:\u002F\u002Fgithub.com\u002Fyangdongchao\u002FSoundStorm\u002Ftree\u002Fmaster\u002Fsoundstorm\u002Fs1\u002FAR)\n- [vits](https:\u002F\u002Fgithub.com\u002Fjaywalnut310\u002Fvits)\n- [TransferTTS](https:\u002F\u002Fgithub.com\u002Fhcy71o\u002FTransferTTS\u002Fblob\u002Fmaster\u002Fmodels.py#L556)\n- [contentvec](https:\u002F\u002Fgithub.com\u002Fauspicious3000\u002Fcontentvec\u002F)\n- [hifi-gan](https:\u002F\u002Fgithub.com\u002Fjik876\u002Fhifi-gan)\n- [fish-speech](https:\u002F\u002Fgithub.com\u002Ffishaudio\u002Ffish-speech\u002Fblob\u002Fmain\u002Ftools\u002Fllama\u002Fgenerate.py#L41)\n- [f5-TTS](https:\u002F\u002Fgithub.com\u002FSWivid\u002FF5-TTS\u002Fblob\u002Fmain\u002Fsrc\u002Ff5_tts\u002Fmodel\u002Fbackbones\u002Fdit.py)\n- [shortcut flow 
matching](https:\u002F\u002Fgithub.com\u002Fkvfrans\u002Fshortcut-models\u002Fblob\u002Fmain\u002Ftargets_shortcut.py)\n\n### Pretrained Models\n\n- [Chinese Speech Pretrain](https:\u002F\u002Fgithub.com\u002FTencentGameMate\u002Fchinese_speech_pretrain)\n- [Chinese-Roberta-WWM-Ext-Large](https:\u002F\u002Fhuggingface.co\u002Fhfl\u002Fchinese-roberta-wwm-ext-large)\n- [BigVGAN](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FBigVGAN)\n- [eresnetv2](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fiic\u002Fspeech_eres2netv2w24s4ep4_sv_zh-cn_16k-common)\n\n### Text Frontend for Inference\n\n- [paddlespeech zh_normalization](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleSpeech\u002Ftree\u002Fdevelop\u002Fpaddlespeech\u002Ft2s\u002Ffrontend\u002Fzh_normalization)\n- [split-lang](https:\u002F\u002Fgithub.com\u002FDoodleBears\u002Fsplit-lang)\n- [g2pW](https:\u002F\u002Fgithub.com\u002FGitYCC\u002Fg2pW)\n- [pypinyin-g2pW](https:\u002F\u002Fgithub.com\u002Fmozillazg\u002Fpypinyin-g2pW)\n- [paddlespeech g2pw](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleSpeech\u002Ftree\u002Fdevelop\u002Fpaddlespeech\u002Ft2s\u002Ffrontend\u002Fg2pw)\n\n### WebUI Tools\n\n- [ultimatevocalremovergui](https:\u002F\u002Fgithub.com\u002FAnjok07\u002Fultimatevocalremovergui)\n- [audio-slicer](https:\u002F\u002Fgithub.com\u002Fopenvpi\u002Faudio-slicer)\n- [SubFix](https:\u002F\u002Fgithub.com\u002Fcronrpc\u002FSubFix)\n- [FFmpeg](https:\u002F\u002Fgithub.com\u002FFFmpeg\u002FFFmpeg)\n- [gradio](https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio)\n- [faster-whisper](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper)\n- [FunASR](https:\u002F\u002Fgithub.com\u002Falibaba-damo-academy\u002FFunASR)\n- [AP-BWE](https:\u002F\u002Fgithub.com\u002Fyxlu-0102\u002FAP-BWE)\n\nThankful to @Naozumi520 for providing the Cantonese training set and for the guidance on Cantonese-related knowledge.\n\n## Thanks to all contributors for their efforts\n\n\u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fgraphs\u002Fcontributors\" target=\"_blank\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FRVC-Boss_GPT-SoVITS_readme_c6ade5f7870e.png\" \u002F>\n\u003C\u002Fa>\n","\u003Cdiv align=\"center\">\n\n\u003Ch1>GPT-SoVITS-WebUI\u003C\u002Fh1>\n一款功能强大的少样本语音转换与文本转语音WebUI。\u003Cbr>\u003Cbr>\n\n[![madewithlove](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fmade_with-%E2%9D%A4-red?style=for-the-badge&labelColor=orange)](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS)\n\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F7033\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FRVC-Boss_GPT-SoVITS_readme_4a68feb902da.png\" alt=\"RVC-Boss%2FGPT-SoVITS | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\n\u003C!-- img src=\"https:\u002F\u002Fcounter.seku.su\u002Fcmoe?name=gptsovits&theme=r34\" \u002F>\u003Cbr> -->\n\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10--3.12-blue?style=for-the-badge&logo=python)](https:\u002F\u002Fwww.python.org)\n[![GitHub release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002FRVC-Boss\u002Fgpt-sovits?style=for-the-badge&logo=github)](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002Fgpt-sovits\u002Freleases)\n\n[![Train In Colab](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FColab-Training-F9AB00?style=for-the-badge&logo=googlecolab)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FRVC-Boss\u002FGPT-SoVITS\u002Fblob\u002Fmain\u002FColab-WebUI.ipynb)\n[![Huggingface](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F免费在线体验-free_online_demo-yellow.svg?style=for-the-badge&logo=huggingface)](https:\u002F\u002Flj1995-gpt-sovits-proplus.hf.space\u002F)\n[![Image 
Size](https:\u002F\u002Fimg.shields.io\u002Fdocker\u002Fimage-size\u002Fxxxxrt666\u002Fgpt-sovits\u002Flatest?style=for-the-badge&logo=docker)](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fxxxxrt666\u002Fgpt-sovits)\n\n[![简体中文](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F简体中文-阅读文档-blue?style=for-the-badge&logo=googledocs&logoColor=white)](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e)\n[![English](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEnglish-Read%20Docs-blue?style=for-the-badge&logo=googledocs&logoColor=white)](https:\u002F\u002Frentry.co\u002FGPT-SoVITS-guide#\u002F)\n[![Change Log](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FChange%20Log-View%20Updates-blue?style=for-the-badge&logo=googledocs&logoColor=white)](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fblob\u002Fmain\u002Fdocs\u002Fen\u002FChangelog_EN.md)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLICENSE-MIT-green.svg?style=for-the-badge&logo=opensourceinitiative)](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fblob\u002Fmain\u002FLICENSE)\n\n**English** | [**中文简体**](.\u002Fdocs\u002Fcn\u002FREADME.md) | [**日本語**](.\u002Fdocs\u002Fja\u002FREADME.md) | [**한국어**](.\u002Fdocs\u002Fko\u002FREADME.md) | [**Türkçe**](.\u002Fdocs\u002Ftr\u002FREADME.md)\n\n\u003C\u002Fdiv>\n\n---\n\n## Features:\n\n1. **Zero-shot TTS:** 输入一段5秒的语音样本，即可体验即时的文本转语音转换。\n\n2. **Few-shot TTS:** 仅需1分钟的训练数据即可微调模型，以获得更好的语音相似度和真实感。\n\n3. **跨语言支持:** 可在不同于训练数据集的语言中进行推理，目前支持英语、日语、韩语、粤语和中文。\n\n4. 
**WebUI工具:** 集成的工具包括伴奏分离、自动分割训练集、中文ASR以及文本标注，帮助初学者创建训练数据集和GPT\u002FSoVITS模型。\n\n**请在此处观看我们的[演示视频](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV12g4y1m7Uw)！**\n\n未见过的说话者少样本微调演示：\n\nhttps:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fassets\u002F129054828\u002F05bee1fa-bdd8-4d85-9350-80c060ab47fb\n\n**GPT-SoVITS v2 ProPlus的RTF（推理速度）**：\n在4060Ti上测试为0.028，在4090上测试为0.014（1400字≈4分钟，推理时间为3.36秒），在M4 CPU上为0.526。您可以在我们的[Huggingface演示](https:\u002F\u002Flj1995-gpt-sovits-proplus.hf.space\u002F)（半H200）上体验高速推理。\n\n请不要尬黑GPT-SoVITS推理速度慢，谢谢！\n\n**用户指南：[简体中文](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e) | [English](https:\u002F\u002Frentry.co\u002FGPT-SoVITS-guide#\u002F)**\n\n## 安装\n\n对于中国用户，您可以[点击此处](https:\u002F\u002Fwww.codewithgpu.com\u002Fi\u002FRVC-Boss\u002FGPT-SoVITS\u002FGPT-SoVITS-Official)使用AutoDL Cloud Docker在线体验完整功能。\n\n### 测试环境\n\n| Python版本 | PyTorch版本 | 设备 |\n| -------------- | ---------------- | ------------- |\n| Python 3.10    | PyTorch 2.5.1    | CUDA 12.4     |\n| Python 3.11    | PyTorch 2.5.1    | CUDA 12.4     |\n| Python 3.11    | PyTorch 2.7.0    | CUDA 12.8     |\n| Python 3.9     | PyTorch 2.8.0dev | CUDA 12.8     |\n| Python 3.9     | PyTorch 2.5.1    | 苹果芯片 |\n| Python 3.11    | PyTorch 2.7.0    | 苹果芯片 |\n| Python 3.9     | PyTorch 2.2.2    | CPU           |\n\n### Windows\n\n如果您是Windows用户（经win>=10测试），可以[下载集成包](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS-windows-package\u002Fresolve\u002Fmain\u002FGPT-SoVITS-v3lora-20250228.7z?download=true)并双击_go-webui.bat_以启动GPT-SoVITS-WebUI。\n\n**中国用户可[在此下载包](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e\u002Fdkxgpiy9zb96hob4#KTvnO)。**\n\n通过运行以下命令安装程序：\n\n```pwsh\nconda create -n GPTSoVits python=3.10\nconda activate GPTSoVits\npwsh -F install.ps1 --Device \u003CCU126|CU128|CPU> --Source \u003CHF|HF-Mirror|ModelScope> [--DownloadUVR5]\n```\n\n### Linux\n\n```bash\nconda create -n GPTSoVits python=3.10\nconda 
activate GPTSoVits\nbash install.sh --device \u003CCU126|CU128|ROCM|CPU> --source \u003CHF|HF-Mirror|ModelScope> [--download-uvr5]\n```\n\n### macOS\n\n**注意：在Mac上使用GPU训练的模型质量明显低于其他设备，因此我们暂时改用CPU。**\n\n通过运行以下命令安装程序：\n\n```bash\nconda create -n GPTSoVits python=3.10\nconda activate GPTSoVits\nbash install.sh --device \u003CMPS|CPU> --source \u003CHF|HF-Mirror|ModelScope> [--download-uvr5]\n```\n\n### 手动安装\n\n#### 安装依赖\n\n```bash\nconda create -n GPTSoVits python=3.10\nconda activate GPTSoVits\n\npip install -r extra-req.txt --no-deps\npip install -r requirements.txt\n```\n\n#### 安装FFmpeg\n\n##### Conda用户\n\n```bash\nconda activate GPTSoVits\nconda install ffmpeg\n```\n\n##### Ubuntu\u002FDebian用户\n\n```bash\nsudo apt install ffmpeg\nsudo apt install libsox-dev\n```\n\n##### Windows用户\n\n下载[ffmpeg.exe](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FVoiceConversionWebUI\u002Fblob\u002Fmain\u002Fffmpeg.exe)和[ffprobe.exe](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FVoiceConversionWebUI\u002Fblob\u002Fmain\u002Fffprobe.exe)，并将其放置到GPT-SoVITS根目录。\n\n安装[Visual Studio 2017](https:\u002F\u002Faka.ms\u002Fvs\u002F17\u002Frelease\u002Fvc_redist.x86.exe)运行库。\n\n##### macOS用户\n\n```bash\nbrew install ffmpeg\n```\n\n### 使用 Docker 运行 GPT-SoVITS\n\n#### Docker 镜像选择\n\n由于代码库更新迅速，而 Docker 镜像的发布周期较慢，请注意以下事项：\n\n- 请前往 [Docker Hub](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fxxxxrt666\u002Fgpt-sovits) 查看最新的镜像标签。\n- 根据您的环境选择合适的镜像标签。\n- `Lite` 表示该 Docker 镜像 **不包含** ASR 模型和 UVR5 模型。您可以手动下载 UVR5 模型，而 ASR 模型会在需要时由程序自动下载。\n- 在使用 Docker Compose 时，会根据您的架构自动拉取相应的镜像（amd64 或 arm64）。\n- Docker Compose 会挂载当前目录下的 **所有文件**。请切换到项目根目录，并在使用 Docker 镜像之前 **拉取最新代码**。\n- 您也可以选择使用提供的 Dockerfile 在本地构建镜像，以获得最新的更改。\n\n#### 环境变量\n\n- `is_half`：控制是否启用半精度（fp16）。如果您的 GPU 支持，则将其设置为 `true`，以减少显存占用。\n\n#### 共享内存配置\n\n在 Windows（Docker Desktop）上，默认的共享内存大小较小，可能会导致意外行为。请根据您可用的系统内存，在 Docker Compose 文件中增加 `shm_size` 的值（例如设置为 `16g`）。\n\n#### 服务选择\n\n`docker-compose.yaml` 文件定义了两类服务：\n\n- `GPT-SoVITS-CU126` 和 
`GPT-SoVITS-CU128`：完整版，包含所有功能。\n- `GPT-SoVITS-CU126-Lite` 和 `GPT-SoVITS-CU128-Lite`：轻量版，依赖项和功能较少。\n\n要使用 Docker Compose 运行特定服务，可以执行以下命令：\n\n```bash\ndocker compose run --service-ports \u003CGPT-SoVITS-CU126-Lite|GPT-SoVITS-CU128-Lite|GPT-SoVITS-CU126|GPT-SoVITS-CU128>\n```\n\n#### 本地构建 Docker 镜像\n\n如果您希望自行构建镜像，可以使用以下命令：\n\n```bash\nbash docker_build.sh --cuda \u003C12.6|12.8> [--lite]\n```\n\n#### 访问正在运行的容器（Bash Shell）\n\n当容器在后台运行时，您可以使用以下命令访问它：\n\n```bash\ndocker exec -it \u003CGPT-SoVITS-CU126-Lite|GPT-SoVITS-CU128-Lite|GPT-SoVITS-CU126|GPT-SoVITS-CU128> bash\n```\n\n## 预训练模型\n\n**如果 `install.sh` 脚本成功运行，您可以跳过第 1、2、3 步。**\n\n**中国用户可以从这里下载所有这些模型**：[点击下载](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e\u002Fdkxgpiy9zb96hob4#nVNhX)。\n\n1. 从 [GPT-SoVITS Models](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS) 下载预训练模型，并将其放置在 `GPT_SoVITS\u002Fpretrained_models` 目录下。\n\n2. 从 [G2PWModel.zip(HF)](https:\u002F\u002Fhuggingface.co\u002FXXXXRT\u002FGPT-SoVITS-Pretrained\u002Fresolve\u002Fmain\u002FG2PWModel.zip) 或 [G2PWModel.zip(ModelScope)](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FXXXXRT\u002FGPT-SoVITS-Pretrained\u002Fresolve\u002Fmaster\u002FG2PWModel.zip) 下载 G2PW 模型，解压后重命名为 `G2PWModel`，然后将其放置在 `GPT_SoVITS\u002Ftext` 目录下。（仅适用于中文 TTS）\n\n3. 
对于 UVR5（人声\u002F伴奏分离及混响去除，可选），请从 [UVR5 Weights](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FVoiceConversionWebUI\u002Ftree\u002Fmain\u002Fuvr5_weights) 下载模型，并将其放置在 `tools\u002Fuvr5\u002Fuvr5_weights` 目录下。\n\n   - 如果您想使用 `bs_roformer` 或 `mel_band_roformer` 模型进行 UVR5 处理，可以手动下载模型及其对应的配置文件，并将它们放入 `tools\u002Fuvr5\u002Fuvr5_weights` 目录。**请确保模型文件和配置文件的名称相同且对应一致，仅后缀不同**。此外，模型和配置文件的名称 **必须包含 `roformer`** 才能被识别为 roformer 类型的模型。\n\n   - 建议在模型名和配置文件名中 **直接标明模型类型**，例如 `mel_band_roformer`、`bs_roformer`。如果不明确指定，系统将通过比较配置文件来判断模型类型。例如，`bs_roformer_ep_368_sdr_12.9628.ckpt` 及其对应的配置文件 `bs_roformer_ep_368_sdr_12.9628.yaml` 是一对，`kim_mel_band_roformer.ckpt` 和 `kim_mel_band_roformer.yaml` 则是另一对。\n\n4. 对于中文 ASR（可选），请从 [Damo ASR Model](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fdamo\u002Fspeech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch\u002Ffiles)、[Damo VAD Model](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fdamo\u002Fspeech_fsmn_vad_zh-cn-16k-common-pytorch\u002Ffiles) 和 [Damo Punc Model](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fdamo\u002Fpunc_ct-transformer_zh-cn-common-vocab272727-pytorch\u002Ffiles) 下载模型，并将其放置在 `tools\u002Fasr\u002Fmodels` 目录下。\n\n5. 对于英文或日文 ASR（可选），请从 [Faster Whisper Large V3](https:\u002F\u002Fhuggingface.co\u002FSystran\u002Ffaster-whisper-large-v3) 下载模型，并将其放置在 `tools\u002Fasr\u002Fmodels` 目录下。此外，[其他模型](https:\u002F\u002Fhuggingface.co\u002FSystran) 也可能具有类似效果，但占用的磁盘空间更小。\n\n## 数据集格式\n\nTTS 标注 `.list` 文件格式如下：\n\n```\n人声路径|说话者姓名|语言|文本\n```\n\n语言字典：\n\n- 'zh'：中文\n- 'ja'：日语\n- 'en'：英语\n- 'ko'：韩语\n- 'yue'：粤语\n\n示例：\n\n```\nD:\\GPT-SoVITS\\xxx\u002Fxxx.wav|xxx|en|I like playing Genshin.\n```\n\n## 微调与推理\n\n### 打开 WebUI\n\n#### 集成包用户\n\n双击 `go-webui.bat` 或使用 `go-webui.ps1`。如果您想切换到 V1 版本，可以双击 `go-webui-v1.bat` 或使用 `go-webui-v1.ps1`。\n\n#### 其他用户\n\n```bash\npython webui.py \u003C语言（可选）>\n```\n\n如果想切换到 V1 版本：\n\n```bash\npython webui.py v1 \u003C语言（可选）>\n```\n\n或者您也可以在 WebUI 中手动切换版本。\n\n### 微调\n\n#### 现已支持路径自动填充\n\n1. 填写音频路径。\n2. 将音频切分成小片段。\n3. 
噪音去除（可选）。\n4. ASR。\n5. 校对 ASR 转录结果。\n6. 切换到下一个标签页，即可开始微调模型。\n\n### 打开推理 WebUI\n\n#### 集成包用户\n\n双击 `go-webui-v2.bat` 或使用 `go-webui-v2.ps1`，然后在 `1-GPT-SoVITS-TTS\u002F1C-inference` 打开推理 WebUI。\n\n#### 其他用户\n\n```bash\npython GPT_SoVITS\u002Finference_webui.py \u003C语言（可选）>\n```\n\n或者\n\n```bash\npython webui.py\n```\n\n然后在 `1-GPT-SoVITS-TTS\u002F1C-inference` 打开推理 WebUI。\n\n## V2 版本更新说明\n\n新特性：\n\n1. 支持韩语和粤语\n\n2. 优化了文本前端\n\n3. 预训练模型从2000小时扩展到5000小时\n\n4. 提升了低质量参考音频的合成质量\n\n   [更多详情](\u003Chttps:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki\u002FGPT%E2%80%90SoVITS%E2%80%90v2%E2%80%90features-(%E6%96%B0%E7%89%B9%E6%80%A7)>)\n\n在 v1 环境中使用 v2：\n\n1. 运行 `pip install -r requirements.txt` 更新部分依赖包\n\n2. 从 GitHub 克隆最新代码\n\n3. 从 [huggingface](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS\u002Ftree\u002Fmain\u002Fgsv-v2final-pretrained) 下载 v2 预训练模型，并将其放入 `GPT_SoVITS\u002Fpretrained_models\u002Fgsv-v2final-pretrained` 目录。\n\n   中文 v2 补充：[G2PWModel.zip(HF)](https:\u002F\u002Fhuggingface.co\u002FXXXXRT\u002FGPT-SoVITS-Pretrained\u002Fresolve\u002Fmain\u002FG2PWModel.zip)| [G2PWModel.zip(ModelScope)](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FXXXXRT\u002FGPT-SoVITS-Pretrained\u002Fresolve\u002Fmaster\u002FG2PWModel.zip)(下载 G2PW 模型，解压后重命名为 `G2PWModel`，再放入 `GPT_SoVITS\u002Ftext` 目录。)\n\n## V3 版本更新说明\n\n新特性：\n\n1. 音色相似度更高，只需较少的训练数据即可逼近目标说话人（直接使用基础模型而无需微调时，音色相似度显著提升）。\n\n2. GPT 模型更加稳定，重复和遗漏现象更少，更容易生成情感表达更丰富的语音。\n\n   [更多详情](\u003Chttps:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki\u002FGPT%E2%80%90SoVITS%E2%80%90v3v4%E2%80%90features-(%E6%96%B0%E7%89%B9%E6%80%A7)>)\n\n在 v2 环境中使用 v3：\n\n1. 运行 `pip install -r requirements.txt` 更新部分依赖包\n\n2. 从 GitHub 克隆最新代码\n\n3. 
从 [huggingface](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS\u002Ftree\u002Fmain) 下载 v3 预训练模型（s1v3.ckpt、s2Gv3.pth 和 models--nvidia--bigvgan_v2_24khz_100band_256x 文件夹），并将其放入 `GPT_SoVITS\u002Fpretrained_models` 目录。\n\n   补充：关于音频超分辨率模型，可参阅 [如何下载](.\u002Ftools\u002FAP_BWE_main\u002F24kto48k\u002Freadme.txt)\n\n## V4 版本更新说明\n\n新特性：\n\n1. 解决了 V3 版本因非整数倍上采样导致的金属质感问题，并原生输出 48kHz 音频以避免声音闷塞（而 V3 仅原生输出 24kHz）。作者认为 V4 可直接替代 V3，但仍需进一步测试。\n   [更多详情](\u003Chttps:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki\u002FGPT%E2%80%90SoVITS%E2%80%90v3v4%E2%80%90features-(%E6%96%B0%E7%89%B9%E6%80%A7)>)\n\n在 v1\u002Fv2\u002Fv3 环境中使用 v4：\n\n1. 运行 `pip install -r requirements.txt` 更新部分依赖包\n\n2. 从 GitHub 克隆最新代码\n\n3. 从 [huggingface](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS\u002Ftree\u002Fmain) 下载 v4 预训练模型（gsv-v4-pretrained\u002Fs2v4.pth 和 gsv-v4-pretrained\u002Fvocoder.pth），并将其放入 `GPT_SoVITS\u002Fpretrained_models` 目录。\n\n## V2Pro 版本更新说明\n\n新特性：\n\n1. 显存占用略高于 v2，性能超越 v4，同时保持 v2 的硬件成本和速度。\n   [更多详情](\u003Chttps:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki\u002FGPT%E2%80%90SoVITS%E2%80%90features-(%E5%90%84%E7%89%88%E6%9C%AC%E7%89%B9%E6%80%A7)>)\n\n2. v1\u002Fv2 和 v2Pro 系列具有相同特性，而 v3\u002Fv4 则有相似特点。对于平均音质的训练集，v1\u002Fv2\u002Fv2Pro 能取得不错效果，但 v3\u002Fv4 则难以达到。此外，v3\u002Fv4 合成的音色和韵律更倾向于参考音频，而非整体训练集。\n\n在 v1\u002Fv2\u002Fv3\u002Fv4 环境中使用 v2Pro：\n\n1. 运行 `pip install -r requirements.txt` 更新部分依赖包\n\n2. 从 GitHub 克隆最新代码\n\n3. 
从 [huggingface](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS\u002Ftree\u002Fmain) 下载 v2Pro 预训练模型（v2Pro\u002Fs2Dv2Pro.pth、v2Pro\u002Fs2Gv2Pro.pth、v2Pro\u002Fs2Dv2ProPlus.pth、v2Pro\u002Fs2Gv2ProPlus.pth，以及 sv\u002Fpretrained_eres2netv2w24s4ep4.ckpt），并将其放入 `GPT_SoVITS\u002Fpretrained_models` 目录。\n\n## 待办事项清单\n\n- [x] **高优先级：**\n\n  - [x] 日语和英语本地化。\n  - [x] 用户指南。\n  - [x] 日语和英语数据集微调训练。\n\n- [ ] **功能：**\n  - [x] 零样本语音转换（5秒）\u002F 少样本语音转换（1分钟）。\n  - [x] TTS 语速控制。\n  - [ ] ~~增强的 TTS 情感控制。~~ 或许可以使用预训练的微调版 GPT 模型来更好地控制情感。\n  - [ ] 尝试将 SoVITS 的词元输入改为 GPT 词汇的概率分布（Transformer 隐层表示）。\n  - [x] 改进英语和日语文本前端。\n  - [ ] 开发小型和大型 TTS 模型。\n  - [x] Colab 脚本。\n  - [x] 尝试扩充训练数据集（2000小时 -> 10000小时）。\n  - [x] 更好的 SoVITS 基础模型（提升音频质量）。\n  - [ ] 模型混合\n\n## （补充）命令行运行方法\n\n使用命令行打开 UVR5 的 WebUI：\n\n```bash\npython tools\u002Fuvr5\u002Fwebui.py \"\u003Cinfer_device>\" \u003Cis_half> \u003Cwebui_port_uvr5>\n```\n\n\u003C!-- 如果无法打开浏览器，请按照以下格式进行 UVR 处理，这是使用 mdxnet 进行音频处理的方式：\n```\npython mdxnet.py --model --input_root --output_vocal --output_ins --agg_level --format --device --is_half_precision\n``` -->\n\n以下是使用命令行对数据集进行音频分割的方法：\n\n```bash\npython audio_slicer.py \\\n    --input_path \"\u003C原始音频文件或目录路径>\" \\\n    --output_root \"\u003C保存分割后音频片段的目录>\" \\\n    --threshold \u003C音量阈值> \\\n    --min_length \u003C每个子片段的最小时长> \\\n    --min_interval \u003C相邻子片段之间的最短时间间隔> \\\n    --hop_size \u003C计算音量曲线的步长>\n```\n\n以下是使用命令行进行数据集 ASR 处理的方法（仅支持中文）：\n\n```bash\npython tools\u002Fasr\u002Ffunasr_asr.py -i \u003Cinput> -o \u003Coutput>\n```\n\nASR 处理由 Faster_Whisper 完成（除中文外的其他语言标注）。\n\n（无进度条，GPU 性能可能导致延迟）\n\n```bash\npython .\u002Ftools\u002Fasr\u002Ffasterwhisper_asr.py -i \u003Cinput> -o \u003Coutput> -l \u003Clanguage> -p \u003Cprecision>\n```\n\n支持自定义列表保存路径。\n\n## 致谢\n\n特别感谢以下项目和贡献者：\n\n### 理论研究\n\n- [ar-vits](https:\u002F\u002Fgithub.com\u002Finnnky\u002Far-vits)\n- 
[SoundStorm](https:\u002F\u002Fgithub.com\u002Fyangdongchao\u002FSoundStorm\u002Ftree\u002Fmaster\u002Fsoundstorm\u002Fs1\u002FAR)\n- [vits](https:\u002F\u002Fgithub.com\u002Fjaywalnut310\u002Fvits)\n- [TransferTTS](https:\u002F\u002Fgithub.com\u002Fhcy71o\u002FTransferTTS\u002Fblob\u002Fmaster\u002Fmodels.py#L556)\n- [contentvec](https:\u002F\u002Fgithub.com\u002Fauspicious3000\u002Fcontentvec\u002F)\n- [hifi-gan](https:\u002F\u002Fgithub.com\u002Fjik876\u002Fhifi-gan)\n- [fish-speech](https:\u002F\u002Fgithub.com\u002Ffishaudio\u002Ffish-speech\u002Fblob\u002Fmain\u002Ftools\u002Fllama\u002Fgenerate.py#L41)\n- [f5-TTS](https:\u002F\u002Fgithub.com\u002FSWivid\u002FF5-TTS\u002Fblob\u002Fmain\u002Fsrc\u002Ff5_tts\u002Fmodel\u002Fbackbones\u002Fdit.py)\n- [shortcut flow matching](https:\u002F\u002Fgithub.com\u002Fkvfrans\u002Fshortcut-models\u002Fblob\u002Fmain\u002Ftargets_shortcut.py)\n\n### 预训练模型\n\n- [中文语音预训练](https:\u002F\u002Fgithub.com\u002FTencentGameMate\u002Fchinese_speech_pretrain)\n- [Chinese-Roberta-WWM-Ext-Large](https:\u002F\u002Fhuggingface.co\u002Fhfl\u002Fchinese-roberta-wwm-ext-large)\n- [BigVGAN](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FBigVGAN)\n- [eresnetv2](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fiic\u002Fspeech_eres2netv2w24s4ep4_sv_zh-cn_16k-common)\n\n### 文本前端推理工具\n\n- [paddlespeech 中文规范化](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleSpeech\u002Ftree\u002Fdevelop\u002Fpaddlespeech\u002Ft2s\u002Ffrontend\u002Fzh_normalization)\n- [split-lang](https:\u002F\u002Fgithub.com\u002FDoodleBears\u002Fsplit-lang)\n- [g2pW](https:\u002F\u002Fgithub.com\u002FGitYCC\u002Fg2pW)\n- [pypinyin-g2pW](https:\u002F\u002Fgithub.com\u002Fmozillazg\u002Fpypinyin-g2pW)\n- [paddlespeech g2pw](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleSpeech\u002Ftree\u002Fdevelop\u002Fpaddlespeech\u002Ft2s\u002Ffrontend\u002Fg2pw)\n\n### WebUI 工具\n\n- 
[ultimatevocalremovergui](https:\u002F\u002Fgithub.com\u002FAnjok07\u002Fultimatevocalremovergui)\n- [audio-slicer](https:\u002F\u002Fgithub.com\u002Fopenvpi\u002Faudio-slicer)\n- [SubFix](https:\u002F\u002Fgithub.com\u002Fcronrpc\u002FSubFix)\n- [FFmpeg](https:\u002F\u002Fgithub.com\u002FFFmpeg\u002FFFmpeg)\n- [gradio](https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio)\n- [faster-whisper](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper)\n- [FunASR](https:\u002F\u002Fgithub.com\u002Falibaba-damo-academy\u002FFunASR)\n- [AP-BWE](https:\u002F\u002Fgithub.com\u002Fyxlu-0102\u002FAP-BWE)\n\n感谢 @Naozumi520 提供粤语训练数据集，并在粤语相关知识方面给予指导。\n\n## 感谢所有贡献者的辛勤付出\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fgraphs\u002Fcontributors\" target=\"_blank\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FRVC-Boss_GPT-SoVITS_readme_c6ade5f7870e.png\" \u002F>\n\u003C\u002Fa>","# GPT-SoVITS 快速上手指南\n\nGPT-SoVITS 是一款强大的少样本语音转换与文本转语音（TTS）工具。只需 5 秒音频即可实现零样本推理，或使用 1 分钟数据进行微调以获得高相似度声音。支持中、英、日、韩、粤语跨语言合成。\n\n## 环境准备\n\n### 系统要求\n*   **操作系统**: Windows 10\u002F11, Linux, macOS (Mac 训练建议使用 CPU 以保证质量)\n*   **Python 版本**: 3.9 - 3.12 (推荐 3.10)\n*   **硬件加速**: \n    *   NVIDIA GPU (推荐 CUDA 12.4+)\n    *   Apple Silicon (MPS)\n    *   CPU (仅推理或 Mac 训练)\n*   **依赖工具**: FFmpeg (必需)\n\n### 前置依赖安装 (FFmpeg)\n在运行主程序前，请确保已安装 FFmpeg：\n\n**Conda 用户 (推荐):**\n```bash\nconda activate GPTSoVits\nconda install ffmpeg\n```\n\n**Windows 手动安装:**\n下载 `ffmpeg.exe` 和 `ffprobe.exe` 放置于项目根目录，并安装 [Visual Studio 2017 运行库](https:\u002F\u002Faka.ms\u002Fvs\u002F17\u002Frelease\u002Fvc_redist.x86.exe)。\n\n**Ubuntu\u002FDebian:**\n```bash\nsudo apt install ffmpeg libsox-dev\n```\n\n**macOS:**\n```bash\nbrew install ffmpeg\n```\n\n---\n\n## 安装步骤\n\n### 方案一：Windows 一键整合包（最推荐）\n适合初学者，无需配置环境。\n1.  **国内用户**：[点击下载整合包](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e\u002Fdkxgpiy9zb96hob4#KTvnO)\n2.  
解压后双击运行 `go-webui.bat` 即可启动。\n\n### 方案二：源码安装 (Linux \u002F Windows \u002F macOS)\n适合开发者，需预先安装 Conda。\n\n1.  **创建虚拟环境**:\n    ```bash\n    conda create -n GPTSoVits python=3.10\n    conda activate GPTSoVits\n    ```\n\n2.  **执行安装脚本**:\n    *   **Linux**:\n        ```bash\n        bash install.sh --device \u003CCU126|CU128|ROCM|CPU> --source \u003CHF-Mirror|ModelScope> [--download-uvr5]\n        ```\n        > 注：国内用户请将 `--source` 设为 `ModelScope` (魔搭) 或 `HF-Mirror` 以加速下载。`--device` 根据显卡选择，如 `CU126`。\n\n    *   **Windows (PowerShell)**:\n        ```pwsh\n        pwsh -F install.ps1 --Device \u003CCU126|CU128|CPU> --Source \u003CHF-Mirror|ModelScope> [--DownloadUVR5]\n        ```\n\n    *   **macOS**:\n        ```bash\n        bash install.sh --device \u003CMPS|CPU> --source \u003CHF-Mirror|ModelScope> [--download-uvr5]\n        ```\n\n3.  **补充模型 (若安装脚本未自动完成)**:\n    国内用户可前往 [语雀文档模型下载区](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e\u002Fdkxgpiy9zb96hob4#nVNhX) 下载以下模型并放入对应目录：\n    *   `GPT_SoVITS\u002Fpretrained_models`: 基础预训练模型\n    *   `GPT_SoVITS\u002Ftext\u002FG2PWModel`: 中文 TTS 必需 (解压并重命名为 `G2PWModel`)\n    *   `tools\u002Fasr\u002Fmodels`: ASR 识别模型 (可选，用于自动标注)\n    *   `tools\u002Fuvr5\u002Fuvr5_weights`: 人声分离模型 (可选)\n\n---\n\n## 基本使用\n\n### 1. 启动 WebUI\n在安装完成的根目录下：\n\n*   **整合包用户**: 双击 `go-webui.bat`。\n*   **源码用户**:\n    ```bash\n    python webui.py zh\n    ```\n    *(注：`zh` 为界面语言，可选填)*\n\n浏览器将自动打开 `http:\u002F\u002F127.0.0.1:9874`。\n\n### 2. 零样本推理 (5 秒即刻体验)\n无需训练，直接测试效果：\n1.  进入 **\"推理\" (Inference)** 选项卡。\n2.  **上传参考音频**: 上传一段 5 秒以上的清晰人声录音（支持 wav\u002Fmp3）。\n3.  **输入文本**: 在文本框输入想要合成的文字。\n4.  **选择语言**: 选择参考音频的语言 (zh\u002Fen\u002Fja\u002Fko\u002Fyue)。\n5.  点击 **\"语音合成\"**，即可听到克隆声音。\n\n### 3. 微调训练 (提升相似度)\n若需固定某个音色进行长期创作：\n1.  进入 **\"小语种\u002F中文训练\" (Finetune)** 选项卡。\n2.  **填写音频路径**: 指向包含目标人物语音的文件夹。\n3.  
**一键流程**:\n    *   点击 **\"切片\"** (自动分割音频)。\n    *   点击 **\"去噪\"** (可选，提升音质)。\n    *   点击 **\"ASR\"** (自动识别语音转文字)。\n    *   点击 **\"校对\"** (检查并修正识别错误的文字)。\n4.  点击 **\"开始训练\"**。\n5.  训练完成后，回到 **\"推理\"** 选项卡，在下拉菜单中选择刚才训练好的模型名称即可使用。\n\n### 数据格式说明\n若手动制作 `.list` 标注文件，格式如下：\n```text\n音频绝对路径 | 说话人名称 | 语言代码 | 文本内容\n```\n示例：\n```text\nD:\\Audio\\sample.wav|SpeakerA|zh|你好，这是 GPT-SoVITS 的测试。\n```\n*语言代码：zh(中文), en(英文), ja(日文), ko(韩文), yue(粤语)*","一位独立游戏开发者需要为游戏中多位 NPC 快速生成多语种语音，但预算有限且无法聘请专业配音演员。\n\n### 没有 GPT-SoVITS 时\n- **成本高昂**：聘请真人配音演员费用昂贵，且按句计费，超出独立开发者的承受范围。\n- **周期漫长**：从联系配音、录制到后期修音，往往需要数周时间，严重拖慢游戏上线进度。\n- **修改困难**：一旦游戏剧本调整或发现台词错误，必须重新预约配音员补录，沟通成本极高。\n- **语种受限**：难以找到能同时流利演绎中文、日文及英文的同一名配音员，导致角色声音不统一。\n- **情感单一**：免费的基础 TTS 工具声音机械僵硬，缺乏情感起伏，无法沉浸式地展现角色性格。\n\n### 使用 GPT-SoVITS 后\n- **极低门槛**：仅需采集某位志愿者 1 分钟的干声样本，即可训练出高相似度的专属语音模型，几乎零成本。\n- **即时生成**：输入文本后秒级合成语音，开发者可随改随听，将原本数周的配音周期压缩至几小时。\n- **灵活迭代**：剧本变更时，只需在本地重新推理生成新音频，无需依赖外部人员，随时响应需求变化。\n- **跨语支持**：利用同一声音模型，轻松实现中、日、英等多语种无缝切换，确保全球服角色音色高度一致。\n- **情感逼真**：基于少量样本的微调能力，使合成语音具备丰富的语气和情感细节，听感接近真人演绎。\n\nGPT-SoVITS 让独立开发者仅用极少的数据和时间成本，就能获得媲美专业录音棚的多语种高质量语音产出能力。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FRVC-Boss_GPT-SoVITS_770f5b07.png","RVC-Boss",null,"https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FRVC-Boss_6e1c1cbf.png","https:\u002F\u002Fgithub.com\u002FRVC-Boss",[82,86,89,93,97,101,104,108,111],{"name":83,"color":84,"percentage":85},"Python","#3572A5",97,{"name":87,"color":88,"percentage":54},"Shell","#89e051",{"name":90,"color":91,"percentage":92},"Cuda","#3A4E3A",0.6,{"name":94,"color":95,"percentage":96},"PowerShell","#012456",0.5,{"name":98,"color":99,"percentage":100},"Jupyter Notebook","#DA5B0B",0.4,{"name":102,"color":103,"percentage":100},"C","#555555",{"name":105,"color":106,"percentage":107},"Dockerfile","#384d54",0.1,{"name":109,"color":110,"percentage":107},"C++","#f34b7d",{"name":112,"color":113,"percentage":114},"Batchfile","#C1F12E",0,56375,6160,"2026-04-05T22:15:46","MIT","Windows, Linux, macOS","NVIDIA 
GPU 推荐 (CUDA 12.4\u002F12.6\u002F12.8)，测试通过型号包括 RTX 4060Ti\u002F4090；Mac 支持 Apple Silicon (MPS) 但训练质量较低，建议仅用 CPU；支持 CPU 模式","未说明 (Docker 配置建议共享内存 shm_size 设为 16g)",{"notes":123,"python":124,"dependencies":125},"1. Windows 用户可直接下载整合包运行；2. macOS 用户使用 GPU 训练的模型质量显著低于其他设备，官方建议暂时使用 CPU 进行训练；3. 首次运行需下载预训练模型、G2PW 模型、UVR5 模型及 ASR 模型（中国用户有国内镜像源）；4. Docker 用户需注意设置共享内存大小以防异常；5. 支持零样本（5 秒音频）和少样本（1 分钟音频）推理，支持中、英、日、韩、粤语。","3.9 - 3.12 (官方测试环境主要为 3.10 和 3.11)",[126,127,128,129],"PyTorch (2.2.2 - 2.8.0dev)","FFmpeg","libsox-dev (Linux)","Visual Studio 2017 Redistributable (Windows)",[21],[132,133,134,135,136,137],"text-to-speech","tts","vits","voice-clone","voice-cloneai","voice-cloning","2026-03-27T02:49:30.150509","2026-04-06T09:46:56.008446",[141,146,151,156,161,165],{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},18797,"如何在 Mac (Apple Silicon\u002FMPS) 上安装和运行 GPT-SoVITS？","Mac 用户需按照以下步骤配置环境以支持 MPS 加速：\n1. 创建 Conda 环境：`conda create -n GPTSoVits python=3.9` 并激活。\n2. 安装依赖：`pip install -r requirements.txt`。\n3. 安装特定版本的 ASR 依赖（避免打标出错）：`pip install funasr==0.8.7`。\n4. 安装 PyTorch nightly 版本（稳定版 2.1.2 可能报错）：`pip3 install --pre torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fnightly\u002Fcpu`。\n5. 安装 ffmpeg：`brew install ffmpeg`。\n6. 修改 `webui.py` 中的显存大小设置（可选，默认约为系统内存的 2\u002F3）。\n7. 运行：`python webui.py`。\n注意：UVR5 功能在 Mac 上默认使用 CPU，ASR 功能也主要依赖 CPU，代码中已默认开启 `PYTORCH_ENABLE_MPS_FALLBACK=1` 以自动处理不支持的操作。","https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fissues\u002F165",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},18798,"是否支持使用 CPU 进行模型训练？","支持 CPU 训练，但需要手动修改代码配置：\n1. 在 `s1_train` 的 main 函数中，初始化 trainer 时将 `accelerator` 改为 `cpu`，`devices` 设为 `1`。\n2. 如果遇到类型不匹配错误，将 `precision` 手动指定为 `32`（默认为半精度，CPU 可能不支持）。\n3. 在 `s2_train` 中，注释掉 `os.environ[\"CUDA_VISIBLE_DEVICES\"] = ...` 这一行。\n4. 
在 `s2_train` 的 main 函数中手动设置 `n_gpu` 以指定进程数。","https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fissues\u002F290",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},18799,"GPT-SoVITS 是否支持粤语（Cantonese）推理？","是的，项目已支持粤语推理。此外，社区提供了支持普通话和粤语中英混合的 G2P（字素到音素）工具，可参考项目：https:\u002F\u002Fgithub.com\u002Fpengzhendong\u002Fg2p-mix。用户也可以利用现有的预训练模型在粤语数据集上进行微调以获得更好效果。","https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fissues\u002F189",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},18800,"在 Mac 上进行本地推理时遇到报错或无法运行怎么办？","如果在 Mac 上运行 WebUI 推理时报错，通常需要修改 `GPT_SoVITS\u002Finference_webui.py` 文件以强制使用 CPU 和全精度：\n1. 将设备设置从 `CUDA` 改为 `CPU`。\n2. 将模型精度从半精度改为全精度：把代码中的 `model.half()` 修改为 `model.float()`。\n修改保存后重新运行 `python webui.py` 即可。预处理脚本也需类似地将 `device` 参数设置为 `\"cpu\"`。","https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fissues\u002F93",{"id":162,"question_zh":163,"answer_zh":164,"source_url":145},18801,"安装依赖时出现 `opencc==1.1.1` 找不到对应版本或 Python 版本不匹配的错误如何解决？","该错误通常是因为 Python 版本不匹配，导致 `opencc==1.1.1` 无法找到兼容包。解决方案是确保使用 Python 3.9 创建 Conda 环境（`conda create -n GPTSoVits python=3.9`）。如果问题依旧，尝试安装更高版本的 opencc（如 `pip install opencc==1.1.8`），或者检查 `requirements.txt` 中是否硬编码了不兼容的版本号并进行调整。",{"id":166,"question_zh":167,"answer_zh":168,"source_url":169},18802,"在 Colab 上运行时遇到 `conda-libmamba-solver` 或 `libarchive.so.20` 错误怎么办？","这是 Colab 环境中 Conda 求解器配置冲突导致的。解决方法是在运行安装脚本前，显式配置 Conda 使用经典求解器。执行命令：`conda config --set solver classic` 或在相关脚本中禁用 libmamba 求解器。同时，确保目标目录（如 'GPT-SoVITS'）为空或不存在，以避免 `destination path already exists` 的错误。","https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fissues\u002F2231",[171,176,181,186],{"id":172,"version":173,"summary_zh":174,"released_at":175},109319,"20250606v2pro","[Windows 7z 安装包下载](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS-windows-package\u002Fresolve\u002Fmain\u002FGPT-SoVITS-v2pro-20250604.7z?download=true)|[Windows 7z 安装包（适用于 50x0 系列 NVIDIA 
显卡）下载](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS-windows-package\u002Fresolve\u002Fmain\u002FGPT-SoVITS-v2pro-20250604-nvidia50.7z?download=true)\n\n中国用户可使用以下3个源加速下载整合包：\n1、[魔搭直链，可满速下载](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FFlowerCry\u002Fgpt-sovits-7z-pacakges\u002Fresolve\u002Fmaster\u002FGPT-SoVITS-v2pro-20250604.7z)|[魔搭直链，可满速下载，50系N卡特供](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FFlowerCry\u002Fgpt-sovits-7z-pacakges\u002Fresolve\u002Fmaster\u002FGPT-SoVITS-v2pro-20250604-nvidia50.7z)\n2、[Hugging Face 中国镜像直链](https:\u002F\u002Fhf-mirror.com\u002Flj1995\u002FGPT-SoVITS-windows-package\u002Fresolve\u002Fmain\u002FGPT-SoVITS-v2pro-20250604.7z?download=true)|[Hugging Face 中国镜像直链，50系N卡特供](https:\u002F\u002Fhf-mirror.com\u002Flj1995\u002FGPT-SoVITS-windows-package\u002Fresolve\u002Fmain\u002FGPT-SoVITS-v2pro-20250604-nvidia50.7z?download=true) (不要跳转，直接复制链接到浏览器打开)\n3、[有百度网盘超级会员的可以用百度网盘满速下载](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1OE5qL0KreO-ASHwm6Zl9gA?pwd=mqpi)\n\n\n[教程持续更新](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e) [用户指南](https:\u002F\u002Frentry.co\u002FGPT-SoVITS-guide#\u002F)\n\n已更新至 v2pro，[更新内容详情(v2pro 新特性)](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki\u002FGPT%E2%80%90SoVITS%E2%80%90features-(%E5%90%84%E7%89%88%E6%9C%AC%E7%89%B9%E6%80%A7))\n\n[一些其他资料（some other documents）](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki)\n\n[变更日志](https:\u002F\u002Fgithub.com\u002FRVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI\u002Fblob\u002Fmain\u002Fdocs\u002Fen\u002FChangelog_EN.md) [更新日志](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fblob\u002Fmain\u002Fdocs\u002Fcn\u002FChangelog_CN.md)\n\n[国内云镜像](https:\u002F\u002Fwww.codewithgpu.com\u002Fi\u002FRVC-Boss\u002FGPT-SoVITS\u002FGPT-SoVITS-Official )","2025-06-06T03:04:27",{"id":177,"version":178,"summary_zh":179,"released_at":180},109320,"20250422v4","[Windows 7z 
安装包下载](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS-windows-package\u002Fresolve\u002Fmain\u002FGPT-SoVITS-v4-20250529.7z?download=true)|[Windows 7z 安装包（适用于 50x0 系列 NVIDIA 显卡）下载](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS-windows-package\u002Fresolve\u002Fmain\u002FGPT-SoVITS-v4-20250529-nvidia50.7z?download=true)\n\n中国用户可使用以下3个源加速下载整合包：\n1、[魔搭直链，可满速下载](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FFlowerCry\u002Fgpt-sovits-7z-pacakges\u002Fresolve\u002Fmaster\u002FGPT-SoVITS-v4-20250529.7z)|[魔搭直链，可满速下载，50系N卡特供](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FFlowerCry\u002Fgpt-sovits-7z-pacakges\u002Fresolve\u002Fmaster\u002FGPT-SoVITS-v4-20250529-nvidia50.7z)\n2、[Hugging Face 中国镜像直链](https:\u002F\u002Fhf-mirror.com\u002Flj1995\u002FGPT-SoVITS-windows-package\u002Fresolve\u002Fmain\u002FGPT-SoVITS-v4-20250529.7z?download=true)|[Hugging Face 中国镜像直链，50系N卡特供](https:\u002F\u002Fhf-mirror.com\u002Flj1995\u002FGPT-SoVITS-windows-package\u002Fresolve\u002Fmain\u002FGPT-SoVITS-v4-20250529-nvidia50.7z?download=true)（不要跳转，直接复制链接到浏览器打开）\n3、[有百度网盘超级会员的可以用百度网盘满速下载](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1OE5qL0KreO-ASHwm6Zl9gA?pwd=mqpi)\n\n\n[教程持续更新](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e) [用户指南](https:\u002F\u002Frentry.co\u002FGPT-SoVITS-guide#\u002F)\n\n已更新至 v4，[更新内容详情（v4 新特性）](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki\u002FGPT%E2%80%90SoVITS%E2%80%90v3v4%E2%80%90features-(%E6%96%B0%E7%89%B9%E6%80%A7))\n\n[一些其他资料（some other documents）](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki)\n\n[变更日志](https:\u002F\u002Fgithub.com\u002FRVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI\u002Fblob\u002Fmain\u002Fdocs\u002Fen\u002FChangelog_EN.md) 
[更新日志](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fblob\u002Fmain\u002Fdocs\u002Fcn\u002FChangelog_CN.md)\n\n[国内云镜像](https:\u002F\u002Fwww.codewithgpu.com\u002Fi\u002FRVC-Boss\u002FGPT-SoVITS\u002FGPT-SoVITS-Official )","2025-04-22T12:10:21",{"id":182,"version":183,"summary_zh":184,"released_at":185},109321,"20250228v3","[Windows 7z 安装包下载](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS-windows-package\u002Fresolve\u002Fmain\u002FGPT-SoVITS-v3lora-20250228.7z)\n\n中国用户可使用以下两个资源加速下载整合包：\n1、[需登录，免费满速下载链接](https:\u002F\u002Fdrive.uc.cn\u002Fs\u002Fa1fd91ae0a4f4)\n2、[拥有百度网盘超级会员的用户可使用百度网盘](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1OE5qL0KreO-ASHwm6Zl9gA?pwd=mqpi)\n\n[教程持续更新](www.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e) [用户指南](https:\u002F\u002Frentry.co\u002FGPT-SoVITS-guide#\u002F)\n\n已更新至 v3 版本，[更新内容详情（v3 新特性）](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki\u002FGPT%E2%80%90SoVITS%E2%80%90v3%E2%80%90features-(%E6%96%B0%E7%89%B9%E6%80%A7))\n\n[一些其他资料](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki)\n\n[变更日志（英文）](https:\u002F\u002Fgithub.com\u002FRVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI\u002Fblob\u002Fmain\u002Fdocs\u002Fen\u002FChangelog_EN.md) [更新日志（中文）](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fblob\u002Fmain\u002Fdocs\u002Fcn\u002FChangelog_CN.md)\n\n[国内云镜像](https:\u002F\u002Fwww.codewithgpu.com\u002Fi\u002FRVC-Boss\u002FGPT-SoVITS\u002FGPT-SoVITS-Official )","2025-02-28T14:28:49",{"id":187,"version":188,"summary_zh":189,"released_at":190},109322,"20240821v2","[Windows 7z 
安装包下载](https:\u002F\u002Fhuggingface.co\u002Flj1995\u002FGPT-SoVITS-windows-package\u002Fresolve\u002Fmain\u002FGPT-SoVITS-v2-240821.7z)\n\n中国用户可使用以下2个源加速下载整合包：\n1、[需登录，免费满速下载链接](https:\u002F\u002Fdrive.uc.cn\u002Fs\u002Fa1fd91ae0a4f4)\n2、[有百度网盘超级会员的可以用百度网盘](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1OE5qL0KreO-ASHwm6Zl9gA?pwd=mqpi)\n\n[教程持续更新](https:\u002F\u002Fwww.yuque.com\u002Fbaicaigongchang1145haoyuangong\u002Fib3g1e) [用户指南](https:\u002F\u002Frentry.co\u002FGPT-SoVITS-guide#\u002F)\n\n已更新至 v2 版本，[更新内容详情（v2 新特性）](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki\u002FGPT%E2%80%90SoVITS%E2%80%90v2%E2%80%90features-(%E6%96%B0%E7%89%B9%E6%80%A7))\n\n[一些其他资料（some other documents）](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fwiki)\n\n[变更日志（英文）](https:\u002F\u002Fgithub.com\u002FRVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI\u002Fblob\u002Fmain\u002Fdocs\u002Fen\u002FChangelog_EN.md) [更新日志（中文）](https:\u002F\u002Fgithub.com\u002FRVC-Boss\u002FGPT-SoVITS\u002Fblob\u002Fmain\u002Fdocs\u002Fcn\u002FChangelog_CN.md)\n\n[国内云端镜像](https:\u002F\u002Fwww.codewithgpu.com\u002Fi\u002FRVC-Boss\u002FGPT-SoVITS\u002FGPT-SoVITS-Official )","2024-08-21T15:50:40"]