[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-babysor--MockingBird":3,"tool-babysor--MockingBird":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":81,"owner_email":82,"owner_twitter":80,"owner_website":80,"owner_url":83,"languages":84,"stars":100,"forks":101,"last_commit_at":102,"license":103,"difficulty_score":104,"env_os":105,"env_gpu":106,"env_ram":107,"env_deps":108,"category_tags":120,"github_topics":121,"view_count":128,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":129,"updated_at":130,"faqs":131,"releases":162},2735,"babysor\u002FMockingBird","MockingBird","🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time","MockingBird 是一款开源的实时语音克隆工具，旨在让用户仅需 5 秒的参考音频，即可快速合成任意内容的语音，并实现逼真的音色复刻。它有效解决了传统语音合成技术中数据采集成本高、训练周期长以及难以实时生成的痛点，让个性化语音生成变得触手可及。\n\n这款工具特别适合开发者、AI 研究人员以及对语音技术感兴趣的技术爱好者使用。无论是用于构建交互式语音应用、进行声学模型研究，还是制作创意内容，MockingBird 都能提供强大的支持。普通用户若具备基础的编程环境配置能力，也可通过其提供的 Web 服务或工具箱体验前沿的变声效果。\n\n在技术亮点方面，MockingBird 基于 PyTorch 框架，不仅完美支持中文普通话及多种主流数据集，还实现了跨平台运行，兼容 Windows、Linux 乃至 M1 架构的 macOS。其独特的架构设计允许复用预训练的编码器与声码器，只需微调合成器即可获得出色效果，大幅降低了部署门槛。此外，项目内置了现成的 Web 服务器功能，方便用户通过远程调用快速集成到自己的应用中。尽管原作者已转向云端优化版本，但 MockingBird 作为经典的本地部署方案","MockingBird 是一款开源的实时语音克隆工具，旨在让用户仅需 5 秒的参考音频，即可快速合成任意内容的语音，并实现逼真的音色复刻。它有效解决了传统语音合成技术中数据采集成本高、训练周期长以及难以实时生成的痛点，让个性化语音生成变得触手可及。\n\n这款工具特别适合开发者、AI 
研究人员以及对语音技术感兴趣的技术爱好者使用。无论是用于构建交互式语音应用、进行声学模型研究，还是制作创意内容，MockingBird 都能提供强大的支持。普通用户若具备基础的编程环境配置能力，也可通过其提供的 Web 服务或工具箱体验前沿的变声效果。\n\n在技术亮点方面，MockingBird 基于 PyTorch 框架，不仅完美支持中文普通话及多种主流数据集，还实现了跨平台运行，兼容 Windows、Linux 乃至 M1 架构的 macOS。其独特的架构设计允许复用预训练的编码器与声码器，只需微调合成器即可获得出色效果，大幅降低了部署门槛。此外，项目内置了现成的 Web 服务器功能，方便用户通过远程调用快速集成到自己的应用中。尽管原作者已转向云端优化版本，但 MockingBird 作为经典的本地部署方案，依然为社区提供了宝贵的学习与实践资源。","> 🚧 While I no longer actively update this repo, you can find me continuously pushing this tech forward in a positive, open-source direction. I'm also building an optimized, cloud-hosted version: https:\u002F\u002Fnoiz.ai\u002F and [we're hiring](https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fissues\u002F1029).\n>\n![mockingbird](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbabysor_MockingBird_readme_d5f8dce10c87.jpg)\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F3869\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbabysor_MockingBird_readme_2afa9a759104.png\" alt=\"babysor%2FMockingBird | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\n[![MIT License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue.svg?style=flat)](http:\u002F\u002Fchoosealicense.com\u002Flicenses\u002Fmit\u002F)\n\n> English | [中文](README-CN.md)| [中文Linux](README-LINUX-CN.md)\n\n## Features\n🌍 **Chinese** supports Mandarin and has been tested with multiple datasets: aidatatang_200zh, magicdata, aishell3, data_aishell, etc.\n\n🤩 **PyTorch** works with PyTorch; tested with version 1.9.0 (the latest as of August 2021) on Tesla T4 and GTX 2060 GPUs\n\n🌍 **Windows + Linux** runs on both Windows and Linux (even on M1 macOS)\n\n🤩 **Easy & Awesome** gives good results with only a newly trained synthesizer, reusing the pretrained encoder\u002Fvocoder\n\n🌍 **Webserver Ready** serves your results via remote calls\n\n### [DEMO 
VIDEO](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV17Q4y1B7mY\u002F)\n\n## Quick Start\n\n### 1. Install Requirements\n#### 1.1 General Setup\n> Follow the original repo to check that your environment is ready.\n**Python 3.7 or higher** is needed to run the toolbox.\n\n* Install [PyTorch](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F).\n> If you get `ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu102 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2 )`, your Python version is probably too old; try Python 3.9 and the install should succeed.\n* Install [ffmpeg](https:\u002F\u002Fffmpeg.org\u002Fdownload.html#get-packages).\n* Run `pip install -r requirements.txt` to install the remaining necessary packages.\n> The recommended environment is `Repo Tag 0.0.1`, `Pytorch1.9.0 with Torchvision0.10.0 and cudatoolkit10.2`, `requirements.txt`, and `webrtcvad-wheels`. Because `requirements.txt` was exported a few months ago, it does not work with newer versions.\n* Install webrtcvad with `pip install webrtcvad-wheels` (if needed).\n\nor\n- install dependencies with `conda` or `mamba`\n\n  ```conda env create -n env_name -f env.yml```\n\n  ```mamba env create -n env_name -f env.yml```\n\n  Either command creates a virtual environment with the necessary dependencies installed. Switch to the new environment with `conda activate env_name`.\n  > env.yml only includes the dependencies needed to run the project and currently omits monotonic-align. 
You can check the official PyTorch website to install the GPU version of PyTorch.\n\n#### 1.2 Setup with an M1 Mac\n> The following steps are a workaround that lets you use the original `demo_toolbox.py` directly, without changing any code.\n>\n  >  The main issue is that the PyQt5 package used by `demo_toolbox.py` is not compatible with M1 chips. If you want to train models on an M1 chip, you can either forgo `demo_toolbox.py` or try `web.py` in the project instead.\n\n##### 1.2.1 Install `PyQt5` ([ref](https:\u002F\u002Fstackoverflow.com\u002Fa\u002F68038451\u002F20455983)).\n  * Create and open a Rosetta Terminal ([ref](https:\u002F\u002Fdev.to\u002Fcourier\u002Ftips-and-tricks-to-setup-your-apple-m1-for-development-547g)).\n  * Use the system Python to create a virtual environment for the project:\n    ```\n    \u002Fusr\u002Fbin\u002Fpython3 -m venv \u002FPathToMockingBird\u002Fvenv\n    source \u002FPathToMockingBird\u002Fvenv\u002Fbin\u002Factivate\n    ```\n  * Upgrade pip and install `PyQt5`:\n    ```\n    pip install --upgrade pip\n    pip install pyqt5\n    ```\n##### 1.2.2 Install `pyworld` and `ctc-segmentation`\n\n> Both packages seem to be unique to this project and do not appear in the original [Real-Time Voice Cloning](https:\u002F\u002Fgithub.com\u002FCorentinJ\u002FReal-Time-Voice-Cloning) project. With `pip install`, neither package has a prebuilt wheel, so pip tries to compile them from source and fails to find `Python.h`.\n\n  * Install `pyworld`\n      * `brew install python`: the Python installed by brew ships with `Python.h`.\n      * `export CPLUS_INCLUDE_PATH=\u002Fopt\u002Fhomebrew\u002FFrameworks\u002FPython.framework\u002FHeaders`: this path to the brew-installed `Python.h` is specific to M1 macOS. 
You need to add this path to the environment variables manually.\n      * `pip install pyworld` should then succeed.\n\n\n  * Install `ctc-segmentation`\n    > The same method does not work for `ctc-segmentation`; you need to compile it from the source code on [GitHub](https:\u002F\u002Fgithub.com\u002Flumaku\u002Fctc-segmentation).\n    * `git clone https:\u002F\u002Fgithub.com\u002Flumaku\u002Fctc-segmentation.git`\n    * `cd ctc-segmentation`\n    * `source \u002FPathToMockingBird\u002Fvenv\u002Fbin\u002Factivate`: activate the virtual environment if it is not already active.\n    * `cythonize -3 ctc_segmentation\u002Fctc_segmentation_dyn.pyx`\n    * `\u002Fusr\u002Fbin\u002Farch -x86_64 python setup.py build`: build for the x86 architecture.\n    * `\u002Fusr\u002Fbin\u002Farch -x86_64 python setup.py install --optimize=1 --skip-build`: install for the x86 architecture.\n\n##### 1.2.3 Other dependencies\n  * `\u002Fusr\u002Fbin\u002Farch -x86_64 pip install torch torchvision torchaudio`: using `PyTorch` as an example, explicitly install it for the x86 architecture.\n  * `pip install ffmpeg`: install ffmpeg.\n  * `pip install -r requirements.txt`: install the other requirements.\n\n##### 1.2.4 Run Inference (with the Toolbox)\n  > This runs the project under the x86 architecture. 
[ref](https:\u002F\u002Fyoutrack.jetbrains.com\u002Fissue\u002FPY-46290\u002FAllow-running-Python-under-Rosetta-2-in-PyCharm-for-Apple-Silicon).\n  * `vim \u002FPathToMockingBird\u002Fvenv\u002Fbin\u002FpythonM1`: create an executable file `pythonM1` in `\u002FPathToMockingBird\u002Fvenv\u002Fbin` that wraps the Python interpreter.\n  * Write the following content:\n    ```\n    #!\u002Fusr\u002Fbin\u002Fenv zsh\n    mydir=${0:a:h}\n    \u002Fusr\u002Fbin\u002Farch -x86_64 $mydir\u002Fpython \"$@\"\n    ```\n  * `chmod +x pythonM1`: make the file executable.\n  * If you use the PyCharm IDE, set the project interpreter to `pythonM1` ([steps here](https:\u002F\u002Fwww.jetbrains.com\u002Fhelp\u002Fpycharm\u002Fconfiguring-python-interpreter.html#add-existing-interpreter)); if you use Python from the command line, run `\u002FPathToMockingBird\u002Fvenv\u002Fbin\u002FpythonM1 demo_toolbox.py`\n\n\n### 2. Prepare your models\n> Note that we use the pretrained encoder\u002Fvocoder but not the pretrained synthesizer, since the original model is incompatible with Chinese symbols. This means demo_cli does not work at the moment, so an additional synthesizer model is required.\n\nYou can either train your own models or use existing ones:\n\n#### 2.1 Train encoder with your dataset (Optional)\n\n* Preprocess the audio and the mel spectrograms:\n`python encoder_preprocess.py \u003Cdatasets_root>` The `--dataset {dataset}` parameter selects which datasets to preprocess. Only the training set of each dataset is used. Possible names: librispeech_other, voxceleb1, voxceleb2. Separate multiple datasets with commas.\n\n* Train the encoder: `python encoder_train.py my_run \u003Cdatasets_root>\u002FSV2TTS\u002Fencoder`\n> The encoder uses visdom during training. You can disable it with `--no_visdom`, but it's nice to have. 
Run `visdom` in a separate CLI\u002Fprocess to start your visdom server.\n\n#### 2.2 Train synthesizer with your dataset\n* Download the dataset and unzip it: make sure you can access all the .wav files in the folder\n* Preprocess the audio and the mel spectrograms:\n`python pre.py \u003Cdatasets_root>`\nThe `--dataset {dataset}` parameter supports aidatatang_200zh, magicdata, aishell3, data_aishell, etc. If this parameter is not passed, the default dataset is aidatatang_200zh.\n\n* Train the synthesizer:\n`python train.py --type=synth mandarin \u003Cdatasets_root>\u002FSV2TTS\u002Fsynthesizer`\n\n* Move on to the next step once the attention line appears and the loss meets your needs, as shown in the training folder *synthesizer\u002Fsaved_models\u002F*.\n\n#### 2.3 Use a pretrained synthesizer model\n> Thanks to the community, some models are shared:\n\n| author | Download link | Preview Video | Info |\n| --- | ----------- | ----- |----- |\n| @author | https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1iONvRxmkI-t1nHqxKytY3g  [Baidu](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1iONvRxmkI-t1nHqxKytY3g) code: 4j5d  |  | 75k steps trained on multiple datasets\n| @author | https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1fMh9IlgKJlL2PIiRTYDUvw  [Baidu](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1fMh9IlgKJlL2PIiRTYDUvw) code: om7f  |  | 25k steps trained on multiple datasets, only works under version 0.0.1\n|@FawenYo | https:\u002F\u002Fyisiou-my.sharepoint.com\u002F:u:\u002Fg\u002Fpersonal\u002Flawrence_cheng_fawenyo_onmicrosoft_com\u002FEWFWDHzee-NNg9TWdKckCc4BC7bK2j9cCbOWn0-_tK0nOg?e=n0gGgC  | [input](https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fwiki\u002Faudio\u002Fself_test.mp3) [output](https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fwiki\u002Faudio\u002Fexport.wav) | 200k steps with a Taiwanese accent, only works under version 0.0.1\n|@miven| https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1PI-hM3sn5wbeChRryX-RCQ code: 2021 
https:\u002F\u002Fwww.aliyundrive.com\u002Fs\u002FAwPsbo8mcSP code: z2m0 | https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1uh411B7AD\u002F | only works under version 0.0.1\n\n#### 2.4 Train vocoder (Optional)\n> Note: the vocoder makes little difference to the result, so you may not need to train a new one.\n* Preprocess the data:\n`python vocoder_preprocess.py \u003Cdatasets_root> -m \u003Csynthesizer_model_path>`\n> Replace `\u003Cdatasets_root>` with your dataset root and `\u003Csynthesizer_model_path>` with the directory of your best trained synthesizer model, e.g. *synthesizer\\saved_models\\xxx*\n\n* Train the wavernn vocoder:\n`python vocoder_train.py mandarin \u003Cdatasets_root>`\n\n* Train the hifigan vocoder:\n`python vocoder_train.py mandarin \u003Cdatasets_root> hifigan`\n\n### 3. Launch\n#### 3.1 Using the web server\nYou can then run `python web.py` and open it in a browser; the default address is `http:\u002F\u002Flocalhost:8080`\n\n#### 3.2 Using the Toolbox\nYou can then try the toolbox:\n`python demo_toolbox.py -d \u003Cdatasets_root>`\n\n#### 3.3 Using the command line\nYou can then try the command:\n`python gen_voice.py \u003Ctext_file.txt> your_wav_file.wav`\nYou may need to install cn2an (`pip install cn2an`) for better handling of numbers in the text.\n\n## Reference\n> This repository is forked from [Real-Time-Voice-Cloning](https:\u002F\u002Fgithub.com\u002FCorentinJ\u002FReal-Time-Voice-Cloning), which only supports English.\n\n| URL | Designation | Title | Implementation source |\n| --- | ----------- | ----- | --------------------- |\n| [1803.09017](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.09017) | GlobalStyleToken (synthesizer)| Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis | This repo |\n| [2010.05646](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05646) | HiFi-GAN (vocoder)| HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | This repo |\n| 
[2106.02297](https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.02297) | Fre-GAN (vocoder)| Fre-GAN: Adversarial Frequency-consistent Audio Synthesis | This repo |\n|[**1806.04558**](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |\n|[1802.08435](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord\u002FWaveRNN](https:\u002F\u002Fgithub.com\u002Ffatchord\u002FWaveRNN) |\n|[1703.10135](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.10135.pdf) | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | [fatchord\u002FWaveRNN](https:\u002F\u002Fgithub.com\u002Ffatchord\u002FWaveRNN) |\n|[1710.10467](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.10467.pdf) | GE2E (encoder)| Generalized End-To-End Loss for Speaker Verification | This repo |\n\n## FAQ\n#### 1. Where can I download the dataset?\n| Dataset | Original Source | Alternative Sources |\n| --- | ----------- | ---------------|\n| aidatatang_200zh | [OpenSLR](http:\u002F\u002Fwww.openslr.org\u002F62\u002F) | [Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F110A11KZoVe7vy6kXlLb6zVPLb_J91I_t\u002Fview?usp=sharing) |\n| magicdata | [OpenSLR](http:\u002F\u002Fwww.openslr.org\u002F68\u002F) | [Google Drive (Dev set)](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1g5bWRUSNH68ycC6eNvtwh07nX3QhOOlo\u002Fview?usp=sharing) |\n| aishell3 | [OpenSLR](https:\u002F\u002Fwww.openslr.org\u002F93\u002F) | [Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1shYp_o4Z0X0cZSKQDtFirct2luFUwKzZ\u002Fview?usp=sharing) |\n| data_aishell | [OpenSLR](https:\u002F\u002Fwww.openslr.org\u002F33\u002F) |  |\n> After unzipping aidatatang_200zh, you also need to unzip all the files under `aidatatang_200zh\\corpus\\train`\n\n#### 2. What is `\u003Cdatasets_root>`?\nIf the dataset path is 
`D:\\data\\aidatatang_200zh`, then `\u003Cdatasets_root>` is `D:\\data`\n\n#### 3. Not enough VRAM\nTrain the synthesizer: reduce the batch_size in `synthesizer\u002Fhparams.py`\n```\n# Before\ntts_schedule = [(2,  1e-3,  20_000,  12),   # Progressive training schedule\n                (2,  5e-4,  40_000,  12),   # (r, lr, step, batch_size)\n                (2,  2e-4,  80_000,  12),   #\n                (2,  1e-4, 160_000,  12),   # r = reduction factor (# of mel frames\n                (2,  3e-5, 320_000,  12),   #     synthesized for each decoder iteration)\n                (2,  1e-5, 640_000,  12)],  # lr = learning rate\n# After\ntts_schedule = [(2,  1e-3,  20_000,  8),   # Progressive training schedule\n                (2,  5e-4,  40_000,  8),   # (r, lr, step, batch_size)\n                (2,  2e-4,  80_000,  8),   #\n                (2,  1e-4, 160_000,  8),   # r = reduction factor (# of mel frames\n                (2,  3e-5, 320_000,  8),   #     synthesized for each decoder iteration)\n                (2,  1e-5, 640_000,  8)],  # lr = learning rate\n```\n\nTrain the vocoder, preprocessing step: reduce `synthesis_batch_size` in `synthesizer\u002Fhparams.py`\n```\n# Before\n### Data Preprocessing\n        max_mel_frames = 900,\n        rescale = True,\n        rescaling_max = 0.9,\n        synthesis_batch_size = 16,                  # For vocoder preprocessing and inference.\n# After\n### Data Preprocessing\n        max_mel_frames = 900,\n        rescale = True,\n        rescaling_max = 0.9,\n        synthesis_batch_size = 8,                  # For vocoder preprocessing and inference.\n```\n\nTrain the vocoder, training step: reduce `voc_batch_size` in `vocoder\u002Fwavernn\u002Fhparams.py`\n```\n# Before\n# Training\nvoc_batch_size = 100\nvoc_lr = 1e-4\nvoc_gen_at_checkpoint = 5\nvoc_pad = 2\n\n# After\n# Training\nvoc_batch_size = 6\nvoc_lr = 1e-4\nvoc_gen_at_checkpoint = 5\nvoc_pad = 2\n```\n\n#### 4. If you get 
`RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]).`\nPlease refer to issue [#37](https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fissues\u002F37).\n\n#### 5. How can I improve CPU and GPU utilization?\nAdjust the batch_size as appropriate; a larger batch size generally raises utilization, as long as it still fits in memory.\n\n\n#### 6. What if I get `the page file is too small to complete the operation`?\nPlease refer to this [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Oh6dga-Oy10&ab_channel=CodeProf) and increase the virtual memory to 100G (102400 MB). For example, if the files are on drive D, change the virtual memory setting of drive D.\n\n#### 7. When should I stop training?\nFYI, in my run the attention line appeared after 18k steps and the loss dropped below 0.4 after 50k steps.\n![attention_step_20500_sample_1](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbabysor_MockingBird_readme_c7cc7e7f1b29.png)\n![step-135500-mel-spectrogram_sample_1](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbabysor_MockingBird_readme_7ff49cf3dc77.png)\n","> 🚧 雖然我不再積極更新這個倉庫，但我一直在推動這項技術向更好的方向發展，並致力於開源。同時，我也在構建一個優化且雲託管的版本：https:\u002F\u002Fnoiz.ai\u002F，並且我們正在招聘人才（[issue 1029](https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fissues\u002F1029)）。\n>\n![mockingbird](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbabysor_MockingBird_readme_d5f8dce10c87.jpg)\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F3869\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbabysor_MockingBird_readme_2afa9a759104.png\" alt=\"babysor%2FMockingBird | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\n[![MIT 
License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue.svg?style=flat)](http:\u002F\u002Fchoosealicense.com\u002Flicenses\u002Fmit\u002F)\n\n> 英文 | [中文](README-CN.md)| [中文Linux](README-LINUX-CN.md)\n\n## 特色\n🌍 **中文** 支持普通話，並已在多個數據集上測試過，包括 aidatatang_200zh、magicdata、aishell3、data_aishell 等。\n\n🤩 **PyTorch** 基於 PyTorch 框架開發，在 1.9.0 版本（截至 2021 年 8 月為最新版本）下進行了測試，使用 Tesla T4 和 GTX 2060 GPU。\n\n🌍 **Windows + Linux** 可在 Windows 和 Linux 系統上運行，甚至適用於 M1 Mac OS。\n\n🤩 **簡單又強大** 只需使用新訓練的合成器，即可實現出色效果，同時重用預訓練的編碼器和聲碼器。\n\n🌍 **Web 服務器就緒** 可以通過遠程調用提供結果。\n\n### [演示視頻](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV17Q4y1B7mY\u002F)\n\n## 快速入門\n\n### 1. 安裝依賴\n#### 1.1 一般設置\n> 請按照原始倉庫的說明，檢查是否已準備好所有環境。\n**Python 3.7 或更高版本** 是運行該工具箱的必要條件。\n\n* 安裝 [PyTorch](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F)。\n> 如果遇到 `ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu102 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2 )` 的錯誤，這很可能是由於 Python 版本過低所致。建議使用 3.9 版本，通常可以成功安裝。\n* 安裝 [ffmpeg](https:\u002F\u002Fffmpeg.org\u002Fdownload.html#get-packages)。\n* 執行 `pip install -r requirements.txt` 以安裝剩餘的必要軟體包。\n> 推薦的環境配置為：`Repo Tag 0.0.1`、`Pytorch 1.9.0 附帶 Torchvision 0.10.0 和 cudatoolkit 10.2`、`requirements.txt`、`webrtcvad-wheels`。由於 `requirements.txt` 是數月前導出的，可能無法與較新的版本兼容。\n* 如有需要，安裝 webrtcvad：`pip install webrtcvad-wheels`。\n\n或者\n- 使用 `conda` 或 `mamba` 安裝依賴：\n\n  ```conda env create -n env_name -f env.yml```\n\n  ```mamba env create -n env_name -f env.yml```\n\n  這將創建一個包含所需依賴的虛擬環境。切換到新環境後，執行 `conda activate env_name` 即可開始使用。\n  > env.yml 只包含運行項目所需的依賴，暫時未包含 monotonic-align。您可以訪問官方網站安裝 GPU 版本的 PyTorch。\n\n#### 1.2 在 M1 Mac 上設置\n> 以下步驟是繞過修改代碼，直接使用原始 `demo_toolbox.py` 的方法。\n>\n  > 由於 `demo_toolbox.py` 中使用的 PyQt5 套件與 M1 處理器不兼容，這是主要問題。如果希望在 M1 處理器上訓練模型，可以選擇放棄使用 `demo_toolbox.py`，或嘗試項目中的 `web.py`。\n\n##### 1.2.1 安裝 `PyQt5`，參考資料：[這裡](https:\u002F\u002Fstackoverflow.com\u002Fa\u002F68038451\u002F20455983)。\n  * 打開 Rosetta 
終端，參考資料：[這裡](https:\u002F\u002Fdev.to\u002Fcourier\u002Ftips-and-tricks-to-setup-your-apple-m1-for-development-547g)。\n  * 使用系統自帶的 Python 創建項目虛擬環境：\n    ```\n    \u002Fusr\u002Fbin\u002Fpython3 -m venv \u002FPathToMockingBird\u002Fvenv\n    source \u002FPathToMockingBird\u002Fvenv\u002Fbin\u002Factivate\n    ```\n  * 更新 pip 並安裝 `PyQt5`：\n    ```\n    pip install --upgrade pip\n    pip install pyqt5\n    ```\n##### 1.2.2 安裝 `pyworld` 和 `ctc-segmentation`\n\n> 這兩個套件似乎是專屬於本項目，未見於原始的 [Real-Time Voice Cloning](https:\u002F\u002Fgithub.com\u002FCorentinJ\u002FReal-Time-Voice-Cloning) 項目中。使用 `pip install` 安裝時，由於缺少輪子文件，程序會嘗試直接從 C 語言編譯，但找不到 `Python.h`。\n\n  * 安裝 `pyworld`：\n      * 使用 `brew install python` 安裝 Python，brew 安裝的 Python 會附帶 `Python.h`。\n      * 設定環境變量：`export CPLUS_INCLUDE_PATH=\u002Fopt\u002Fhomebrew\u002FFrameworks\u002FPython.framework\u002FHeaders`。brew 安裝的 `Python.h` 路徑對 M1 Mac 獨特，需手動添加到環境變量中。\n      * 執行 `pip install pyworld`，應可成功安裝。\n\n  * 安裝 `ctc-segmentation`：\n    > 對於 `ctc-segmentation`，上述方法不適用，需要從 [GitHub](https:\u002F\u002Fgithub.com\u002Flumaku\u002Fctc-segmentation) 上的源代碼編譯。\n    * 克隆代碼庫：`git clone https:\u002F\u002Fgithub.com\u002Flumaku\u002Fctc-segmentation.git`\n    * 切換到該目錄：`cd ctc-segmentation`\n    * 激活虛擬環境：`source \u002FPathToMockingBird\u002Fvenv\u002Fbin\u002Factivate`（若尚未激活）。\n    * 執行 `cythonize -3 ctc_segmentation\u002Fctc_segmentation_dyn.pyx`\n    * 使用 x86 架構編譯：`\u002Fusr\u002Fbin\u002Farch -x86_64 python setup.py build`\n    * 使用 x86 架構安裝：`\u002Fusr\u002Fbin\u002Farch -x86_64 python setup.py install --optimize=1 --skip-build`\n\n##### 1.2.3 其他依賴\n  * 使用 x86 架構安裝 PyTorch：`\u002Fusr\u002Fbin\u002Farch -x86_64 pip install torch torchvision torchaudio`。\n  * 安裝 ffmpeg：`pip install ffmpeg`。\n  * 安裝其他依賴：`pip install -r requirements.txt`。\n\n##### 1.2.4 運行推理（使用工具箱）\n  > 為了在 x86 架構下運行項目，參考資料：[這裡](https:\u002F\u002Fyoutrack.jetbrains.com\u002Fissue\u002FPY-46290\u002FAllow-running-Python-under-Rosetta-2-in-PyCharm-for-Apple-Silicon)。\n  * 創建可執行文件 
`pythonM1`：`vim \u002FPathToMockingBird\u002Fvenv\u002Fbin\u002FpythonM1`，並將其放置在 `\u002FPathToMockingBird\u002Fvenv\u002Fbin` 目錄下。\n  * 寫入以下內容：\n    ```\n    #!\u002Fusr\u002Fbin\u002Fenv zsh\n    mydir=${0:a:h}\n    \u002Fusr\u002Fbin\u002Farch -x86_64 $mydir\u002Fpython \"$@\"\n    ```\n  * 設定文件為可執行：`chmod +x pythonM1`。\n  * 若使用 PyCharm IDE，請將項目解釋器配置為 `pythonM1`（[配置步驟在此](https:\u002F\u002Fwww.jetbrains.com\u002Fhelp\u002Fpycharm\u002Fconfiguring-python-interpreter.html#add-existing-interpreter)）；若使用命令行 Python，則執行 `\u002FPathToMockingBird\u002Fvenv\u002Fbin\u002FpythonM1 demo_toolbox.py`。\n\n### 2. 准备您的模型\n> 注意，我们使用的是预训练的编码器和声码器，但不使用合成器，因为原始模型与中文符号不兼容。这意味着 demo_cli 目前无法正常工作，因此需要额外的合成器模型。\n\n您可以选择训练自己的模型，也可以直接使用现有的模型：\n\n#### 2.1 使用您的数据集训练编码器（可选）\n\n* 对音频和梅尔谱进行预处理：\n`python encoder_preprocess.py \u003Cdatasets_root>` 可以通过参数 `--dataset {dataset}` 指定要预处理的数据集。仅会使用这些数据集的训练集。可能的数据集名称包括：librispeech_other、voxceleb1、voxceleb2。多个数据集之间用逗号分隔。\n\n* 训练编码器：`python encoder_train.py my_run \u003Cdatasets_root>\u002FSV2TTS\u002Fencoder`\n> 编码器在训练过程中会使用 visdom 进行可视化。您可以通过 `--no_visdom` 参数禁用它，不过启用 visdom 会更有帮助。请在另一个命令行或进程中运行 `visdom` 来启动 visdom 服务器。\n\n#### 2.2 使用您的数据集训练合成器\n* 下载并解压数据集：确保可以访问文件夹中的所有 .wav 文件。\n* 对音频和梅尔谱进行预处理：\n`python pre.py \u003Cdatasets_root>`\n可以通过参数 `--dataset {dataset}` 指定 aidatatang_200zh、magicdata、aishell3、data_aishell 等数据集。如果未指定该参数，则默认使用 aidatatang_200zh 数据集。\n\n* 训练合成器：\n`python train.py --type=synth mandarin \u003Cdatasets_root>\u002FSV2TTS\u002Fsynthesizer`\n\n* 当您在训练目录 *synthesizer\u002Fsaved_models\u002F* 中看到注意力线出现，并且损失达到满意水平时，即可进入下一步。\n\n#### 2.3 使用预训练的合成器模型\n> 感谢社区贡献，以下是一些共享的模型：\n\n| 作者 | 下载链接 | 预览视频 | 信息 |\n| --- | ----------- | ----- |----- |\n| @author | https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1iONvRxmkI-t1nHqxKytY3g  [百度网盘](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1iONvRxmkI-t1nHqxKytY3g) 提取码：4j5d  |  | 基于多个数据集训练的 7.5 万步模型 |\n| @author | https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1fMh9IlgKJlL2PIiRTYDUvw  
[百度网盘](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1fMh9IlgKJlL2PIiRTYDUvw) 提取码：om7f  |  | 基于多个数据集训练的 2.5 万步模型，仅适用于版本 0.0.1 |\n|@FawenYo | https:\u002F\u002Fyisiou-my.sharepoint.com\u002F:u:\u002Fg\u002Fpersonal\u002Flawrence_cheng_fawenyo_onmicrosoft_com\u002FEWFWDHzee-NNg9TWdKckCc4BC7bK2j9cCbOWn0-_tK0nOg?e=n0gGgC  | [输入](https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fwiki\u002Faudio\u002Fself_test.mp3) [输出](https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fwiki\u002Faudio\u002Fexport.wav) | 基于台湾本地口音训练的 20 万步模型，仅适用于版本 0.0.1 |\n|@miven| https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1PI-hM3sn5wbeChRryX-RCQ 提取码：2021；阿里云盘链接：https:\u002F\u002Fwww.aliyundrive.com\u002Fs\u002FAwPsbo8mcSP 提取码：z2m0 | https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1uh411B7AD\u002F | 仅适用于版本 0.0.1 |\n\n#### 2.4 训练声码器（可选）\n> 注意：声码器对最终效果的影响较小，因此您可能不需要重新训练一个新的声码器。\n* 数据预处理：\n`python vocoder_preprocess.py \u003Cdatasets_root> -m \u003Csynthesizer_model_path>`\n> `\u003Cdatasets_root>` 替换为您数据集的根目录，`\u003Csynthesizer_model_path>` 替换为您的最佳合成器模型所在的目录，例如 *synthesizer\\saved_models\\xxx*。\n\n* 训练 wavernn 声码器：\n`python vocoder_train.py mandarin \u003Cdatasets_root>`\n\n* 训练 hifigan 声码器：\n`python vocoder_train.py mandarin \u003Cdatasets_root> hifigan`\n\n### 3. 
启动\n#### 3.1 使用 Web 服务器\n您可以尝试运行：`python web.py`，然后在浏览器中打开，默认地址为 `http:\u002F\u002Flocalhost:8080`。\n\n#### 3.2 使用工具箱\n您也可以尝试使用工具箱：\n`python demo_toolbox.py -d \u003Cdatasets_root>`\n\n#### 3.3 使用命令行\n您还可以尝试使用命令：\n`python gen_voice.py \u003Ctext_file.txt> your_wav_file.wav`\n为了获得更好的数字转换效果，您可能需要先安装 cn2an 库，命令为 `pip install cn2an`。\n\n## 参考文献\n> 本仓库基于 [Real-Time-Voice-Cloning](https:\u002F\u002Fgithub.com\u002FCorentinJ\u002FReal-Time-Voice-Cloning) 分支而来，原项目仅支持英文。\n\n| URL | 名称 | 标题 | 实现来源 |\n| --- | ----------- | ----- | --------------------- |\n| [1803.09017](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.09017) | GlobalStyleToken（合成器）| 风格标记：端到端语音合成中的无监督风格建模、控制与迁移 | 本仓库 |\n| [2010.05646](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05646) | HiFi-GAN（声码器）| 用于高效高保真语音合成的生成对抗网络 | 本仓库 |\n| [2106.02297](https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.02297) | Fre-GAN（声码器）| Fre-GAN：对抗式频率一致音频合成 | 本仓库 |\n|[**1806.04558**](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1806.04558.pdf) | **SV2TTS** | 从说话人验证到多说话人文本转语音合成的迁移学习 | 本仓库 |\n|[1802.08435](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.08435.pdf) | WaveRNN（声码器）| 高效神经音频合成 | [fatchord\u002FWaveRNN](https:\u002F\u002Fgithub.com\u002Ffatchord\u002FWaveRNN) |\n|[1703.10135](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.10135.pdf) | Tacotron（合成器）| Tacotron：迈向端到端语音合成 | [fatchord\u002FWaveRNN](https:\u002F\u002Fgithub.com\u002Ffatchord\u002FWaveRNN) |\n|[1710.10467](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.10467.pdf) | GE2E（编码器）| 用于说话人验证的广义端到端损失 | 本仓库 |\n\n## 常见问题解答\n#### 1. 
Where can I download the datasets?\n| Dataset | Original source | Alternative source |\n| --- | ----------- | ---------------|\n| aidatatang_200zh | [OpenSLR](http:\u002F\u002Fwww.openslr.org\u002F62\u002F) | [Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F110A11KZoVe7vy6kXlLb6zVPLb_J91I_t\u002Fview?usp=sharing) |\n| magicdata | [OpenSLR](http:\u002F\u002Fwww.openslr.org\u002F68\u002F) | [Google Drive (dev set)](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1g5bWRUSNH68ycC6eNvtwh07nX3QhOOlo\u002Fview?usp=sharing) |\n| aishell3 | [OpenSLR](https:\u002F\u002Fwww.openslr.org\u002F93\u002F) | [Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1shYp_o4Z0X0cZSKQDtFirct2luFUwKzZ\u002Fview?usp=sharing) |\n| data_aishell | [OpenSLR](https:\u002F\u002Fwww.openslr.org\u002F33\u002F) |  |\n> After extracting aidatatang_200zh, you also need to extract all files under `aidatatang_200zh\\corpus\\train`.\n\n#### 2. What is `\u003Cdatasets_root>`?\nIf the dataset path is `D:\\data\\aidatatang_200zh`, then `\u003Cdatasets_root>` is `D:\\data`.\n\n#### 3. Out of GPU memory\nTraining the synthesizer: reduce the `batch_size` (the last value in each `tts_schedule` entry) in `synthesizer\u002Fhparams.py`.\n```\n# Before\ntts_schedule = [(2,  1e-3,  20_000,  12),   # Progressive training schedule\n                (2,  5e-4,  40_000,  12),   # (r, lr, step, batch_size)\n                (2,  2e-4,  80_000,  12),   #\n                (2,  1e-4, 160_000,  12),   # r = reduction factor (# of mel frames generated per decoder iteration)\n                (2,  3e-5, 320_000,  12),   #\n                (2,  1e-5, 640_000,  12)],  # lr = learning rate\n# After\ntts_schedule = [(2,  1e-3,  20_000,  8),   # Progressive training schedule\n                (2,  5e-4,  40_000,  8),   # (r, lr, step, batch_size)\n                (2,  2e-4,  80_000,  8),   #\n                (2,  1e-4, 160_000,  8),   # r = reduction factor (# of mel frames generated per decoder iteration)\n                (2,  3e-5, 320_000,  8),   #\n                (2,  1e-5, 640_000,  8)],  # lr = learning rate\n```\n\nTraining the vocoder, preprocessing 
step: reduce `synthesis_batch_size` in `synthesizer\u002Fhparams.py`.\n```\n# Before\n### Data Preprocessing\n        max_mel_frames = 900,\n        rescale = True,\n        rescaling_max = 0.9,\n        
synthesis_batch_size = 16,                  # Used for vocoder preprocessing and inference.\n# After\n### Data Preprocessing\n        max_mel_frames = 900,\n        rescale = True,\n        rescaling_max = 0.9,\n        synthesis_batch_size = 8,                  # Used for vocoder preprocessing and inference.\n```\n\nTraining the vocoder, training step: reduce `voc_batch_size` in `vocoder\u002Fwavernn\u002Fhparams.py`.\n```\n# Before\n# Training\nvoc_batch_size = 100\nvoc_lr = 1e-4\nvoc_gen_at_checkpoint = 5\nvoc_pad = 2\n\n# After\n# Training\nvoc_batch_size = 6\nvoc_lr = 1e-4\nvoc_gen_at_checkpoint = 5\nvoc_pad = 2\n```\n\n#### 4. If you get `RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]).`\nSee issue [#37](https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fissues\u002F37).\n\n#### 5. How can I increase CPU and GPU utilization?\nTuning `batch_size` appropriately can improve utilization.\n\n#### 6. What if I get the error 'The paging file is too small for this operation to complete'?\nFollow this [video](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=Oh6dga-Oy10&ab_channel=CodeProf) to set the virtual memory to 100G (102400 MB). For example, if the files are on drive D, change the virtual memory setting of drive D.\n\n#### 7. When should I stop training?\nFor reference: my attention started to form after 18k steps, and the loss dropped below 0.4 after 50k steps.\n![attention_step_20500_sample_1](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbabysor_MockingBird_readme_c7cc7e7f1b29.png)\n![step-135500-mel-spectrogram_sample_1](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbabysor_MockingBird_readme_7ff49cf3dc77.png)","# MockingBird Quick Start Guide\n\nMockingBird is an open-source voice cloning tool built on PyTorch. 
It supports Chinese (Mandarin) and multiple datasets, and can perform high-quality speech synthesis and voice conversion.\n\n## 1. Environment setup\n\n### System requirements\n- **Operating system**: Windows, Linux, or macOS (M1 Macs need special configuration; see the note below)\n- **Python version**: 3.7 or later (3.9 recommended to avoid some dependency installation errors)\n- **GPU support**: an NVIDIA GPU is recommended (e.g. Tesla T4, GTX 2060), with a matching CUDA version installed\n\n### Prerequisites\nBefore you start, make sure the following software is installed:\n1. **PyTorch**: see the [PyTorch website](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F) for the install command that fits your environment.\n   > Recommended combination: `PyTorch 1.9.0` + `Torchvision 0.10.0` + `cudatoolkit 10.2`\n2. 
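**FFmpeg**: used for audio processing.\n   - Windows: download the binaries and add them to the PATH environment variable, or use `choco install ffmpeg`\n   - Linux: `sudo apt-get install ffmpeg`\n   - macOS: `brew install ffmpeg`\n\n## 2. Installation\n\n### Option 1: install with pip (recommended)\n\n1. Clone the repository:\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird.git\n   cd MockingBird\n   ```\n\n2. Install the Python dependencies:\n   > Note: `requirements.txt` was exported some time ago, so it is best to create a virtual environment first to keep its older pinned versions from conflicting with packages you already have.\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. Install the extra VAD module (optional but recommended):\n   ```bash\n   pip install webrtcvad-wheels\n   ```\n\n4. Install the number-conversion tool (improves how numbers are read aloud in Chinese):\n   ```bash\n   pip install cn2an\n   ```\n\n### Option 2: install with Conda\u002FMamba\n\nIf you prefer to manage environments with Conda:\n\n```bash\nconda env create -n mockingbird -f env.yml\nconda activate mockingbird\n```\n> Note: `env.yml` contains only the core dependencies; for GPU acceleration, install the matching PyTorch build manually.\n\n### ⚠️ Note for M1 Mac users\nThe M1 chip's native architecture is incompatible with some dependencies (such as PyQt5). To run the GUI toolbox (`demo_toolbox.py`), run it under Rosetta 2 translation:\n1. Open a Rosetta terminal and create an x86_64 virtual environment.\n2. Force the architecture when installing dependencies:\n   ```bash\n   \u002Fusr\u002Fbin\u002Farch -x86_64 pip install torch torchvision torchaudio\n   ```\n3. Add the architecture prefix when running scripts:\n   ```bash\n   \u002Fusr\u002Fbin\u002Farch -x86_64 python demo_toolbox.py\n   ```\n*If you only need inference, use `web.py` or the command line directly; no extra configuration is needed.*\n\n## 3. 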
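Basic usage\n\n### Step 1: Prepare the models\nMockingBird reuses the pretrained encoder and vocoder, but the **synthesizer** must be retrained for Chinese or replaced with a pretrained model shared by the community.\n\n- **Option A (recommended for beginners)**: download a community-pretrained Chinese synthesizer model.\n  Put the downloaded model file in the `synthesizer\u002Fsaved_models\u002F` directory.\n  *(Baidu Netdisk links can be found in the project Wiki or the README table)*\n\n- **Option B (advanced)**: train on your own dataset.\n  Make sure the dataset (e.g. aidatatang_200zh) has been extracted, then run:\n  ```bash\n  python pre.py \u003Cdatasets_root> --dataset aidatatang_200zh\n  python train.py --type=synth mandarin \u003Cdatasets_root>\u002FSV2TTS\u002Fsynthesizer\n  ```\n\n### Step 2: Start the service\n\n#### Method 1: Web interface (simplest)\nStart the local web server and work from your browser:\n```bash\npython web.py\n```\nOnce it is running, open `http:\u002F\u002Flocalhost:8080` in your browser.\n\n#### Method 2: Generate speech from the command line\nConvert text directly into speech in a target voice:\n\n```bash\npython gen_voice.py \u003Ctext_file.txt> your_wav_file.wav\n```\n\n- `\u003Ctext_file.txt>`: a file containing the text to synthesize.\n- `your_wav_file.wav`: the reference audio file (used to extract the voice).\n\n> Tip: for better reading of numbers in Chinese text, make sure the `cn2an` library is installed.\n\n#### Method 3: GUI toolbox\nIf your environment is fully set up (PyQt5 in particular), you can launch the interactive toolbox:\n```bash\npython demo_toolbox.py -d \u003Cdatasets_root>\n```","A small independent game studio is producing Chinese voice-over for a role-playing game set in Republican-era China, but its budget is limited and it cannot hire professional voice actors to record every side-quest dialogue.\n\n### Without MockingBird\n- **High cost and long turnaround**: professional voice actors charge per line, and every script change means booking the recording 
studio again, so the budget overruns quickly and development stalls.\n- **Hard to keep voices consistent**: different sessions are recorded by different stand-in actors, so the same character's timbre and intonation shift noticeably between chapters, breaking immersion.\n- **No real-time interaction**: free-form player input and dynamically generated dialogue can only be shown as subtitles, with no matching speech, which limits gameplay innovation.\n- **Scarce dialect and speech material**: it is hard to find ready-made voice libraries that convey a specific period feel or a slight regional accent.\n\n### With MockingBird\n- **Fast cloning at lower cost**: with just a 5-second sample from the lead voice actor, MockingBird can clone the voice and generate any number of new lines, cutting voice-over costs by more than 90%.\n- **Highly consistent timbre**: no matter how the script changes or how many branches are added, all generated speech keeps exactly the same voice characteristics, so characters stay consistent.\n- **Real-time speech synthesis**: through the web server, the game can call an interface to turn player-typed text into the character's voice in real time, enabling truly immersive, speak-as-you-type interaction.\n- **Adapts to Chinese contexts**: thanks to its support for Chinese datasets such as AIShell, it can easily generate natural, fluent Mandarin, even in particular tones of voice, without training complex extra models.\n\nMockingBird gives even small teams low-barrier access to film-quality dynamic speech generation, removing the constraints that traditional voice-over places on content creation.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fbabysor_MockingBird_d5f8dce1.jpg","babysor","Vega","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fbabysor_b7ad89e8.png","ex-Facebook Engineer. Focusing on cutting-edge SaaS\u002FIaaS\u002F Cloud Service, expertise in Distributed System, AI. 
",null,"Beijing","babysor00@gmail.com","https:\u002F\u002Fgithub.com\u002Fbabysor",[85,89,93,97],{"name":86,"color":87,"percentage":88},"Python","#3572A5",99.6,{"name":90,"color":91,"percentage":92},"Shell","#89e051",0.2,{"name":94,"color":95,"percentage":96},"Cython","#fedf5b",0.1,{"name":98,"color":99,"percentage":96},"Dockerfile","#384d54",36902,5233,"2026-04-02T16:15:29","NOASSERTION",4,"Windows, Linux, macOS (including M1, but the x86 build must run under Rosetta 2)","An NVIDIA GPU is needed for best performance (tested: Tesla T4, GTX 2060), CUDA 10.2; M1 Macs can run on the CPU or with specific configuration, but x86 emulation is the main recommendation","Not specified",{"notes":109,"python":110,"dependencies":111},"1. M1 Mac users must use Rosetta 2 to create an x86 virtual environment and compile pyworld and ctc-segmentation manually; demo_toolbox.py cannot be run directly (use web.py instead). 2. The default pretrained models do not support Chinese synthesis; train your own or use a community-provided Chinese synthesizer model. 3. Repo tag 0.0.1 with PyTorch 1.9.0 + CUDA 10.2 is recommended; newer dependency versions may be incompatible. 4. 
visdom can optionally be installed to monitor encoder training.","3.7+ (3.9 recommended to avoid PyTorch installation errors)",[112,113,114,115,116,117,118,119],"torch==1.9.0","torchvision==0.10.0","ffmpeg","PyQt5","webrtcvad-wheels","pyworld","ctc-segmentation","cn2an",[15,55,14,13],[122,123,124,125,126,127],"ai","speech","pytorch","deep-learning","text-to-speech","tts",8,"2026-03-27T02:49:30.150509","2026-04-06T08:17:44.699216",[132,137,142,147,152,157],{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},12662,"How do I fix a 'size mismatch' RuntimeError when loading a pretrained model?","This is usually caused by a mismatch between the code version and the pretrained model version. To fix it, edit the configuration: open `synthesizer\u002Fhparams.py` and set both `use_gst` and `use_ser_for_gst` to `False`. If the problem persists, make sure you are using a version of the main-branch code that matches the model.","https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fissues\u002F37",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},12663,"The synthesized audio is all noise, or only 2 seconds of noise. How do I fix this?","This is usually caused by a mismatch between the model configuration and the pretrained weights. Try the following steps: 1. Make sure you have set `use_gst` and `use_ser_for_gst` in `synthesizer\u002Fhparams.py` to `False` as described above; 2. Check that you are using the correct pretrained model file; 3. 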
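If you are on Windows, try reinstalling specific versions of the dependencies: `pip install PyQt5==5.15.4` and `pip install sounddevice==0.4.3`.","https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fissues\u002F24",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},12664,"What should I do if installing the requirements fails with 'No matching distribution found for monotonic-align'?","The pinned version `monotonic-align==0.0.3` cannot be found for your current Python environment. To fix this, edit `requirements.txt` and remove the version pin after `monotonic-align` (that is, delete `==0.0.3`), then rerun `pip install -r requirements.txt` to install the latest compatible version.","https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fissues\u002F884",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},12665,"How do I fix 'ModuleNotFoundError: No module named pyworld' at runtime?","`pyworld` needs a C++ build environment to install. First download and install Visual Studio Build Tools (link: https:\u002F\u002Fvisualstudio.microsoft.com\u002Fzh-hans\u002F), making sure to select 'C++ build tools' during installation. Then rerun `pip install pyworld`, which should now compile and install successfully.","https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fissues\u002F444",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},12666,"The HiFi-GAN vocoder fails with 'IndexError: list index out of range' and no sound is produced. What should I do?","This error is usually caused by missing Visual C++ runtime libraries, which makes vocoder initialization fail. Download and install the Visual C++ Redistributable or the full Visual Studio (with the C++ components). 
After installing, restart the terminal, run the program again, and make sure the vocoder model's config path can be read correctly.","https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fissues\u002F432",{"id":158,"question_zh":159,"answer_zh":160,"source_url":161},12667,"Running pre.py to preprocess data fails with 'LibsndfileError: System error' and no .npy files are generated. What should I do?","This is usually caused by conflicts when multiple processes handle audio files, or by file path problems. Following the related fix (Issue #988), try reducing the number of processes used during preprocessing, or check whether the audio file paths contain special characters. On Windows, make sure to run as administrator, or adjust how multiprocessing starts processes in the code.","https:\u002F\u002Fgithub.com\u002Fbabysor\u002FMockingBird\u002Fissues\u002F958",[163],{"id":164,"version":165,"summary_zh":80,"released_at":166},63062,"v0.0.1","2021-11-07T13:53:53"]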