[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-lukaszliniewicz--Pandrator":3,"tool-lukaszliniewicz--Pandrator":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":78,"owner_url":79,"languages":80,"stars":85,"forks":86,"last_commit_at":87,"license":88,"difficulty_score":23,"env_os":89,"env_gpu":90,"env_ram":91,"env_deps":92,"category_tags":106,"github_topics":107,"view_count":23,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":128,"updated_at":129,"faqs":130,"releases":163},2694,"lukaszliniewicz\u002FPandrator","Pandrator","Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. 
It aspires to be a user-friendly app with a GUI, an installer and all-in-one packages.","Pandrator 是一款功能强大的本地化多媒体处理工具，旨在帮助用户免费将 PDF、EPUB 电子书转换为有声书，或将视频字幕转化为多语言配音视频。它有效解决了传统转换工具发音生硬、缺乏情感以及依赖云端服务导致隐私泄露和费用高昂的痛点，让用户能在完全离线的环境下获得自然流畅的听觉体验。\n\n这款软件特别适合普通读者、内容创作者及教育工作者使用。其最大的亮点在于“开箱即用”的友好设计：提供图形化界面（GUI）和一键安装包，无需复杂的代码配置即可在 Windows 上轻松运行。在技术层面，Pandrator 并非单一的 AI 模型，而是一个集成了多种先进开源技术的框架。它核心采用 XTTS 模型实现高质量的多语言合成与即时声音克隆，并引入 RVC 技术进一步优化音色逼真度。此外，它还结合本地大语言模型（LLM）对文本进行智能预处理，自动修正 OCR 识别错误或优化数字缩写朗读，确保生成的语音逻辑清晰、听感自然。无论是想听书的用户，还是需要制作多语言视频内容的创作者，Pandrator 都能提供专业且便捷的解决方案。","\u003Cp align=\"left\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flukaszliniewicz_Pandrator_readme_ffd8182e493b.png\" alt=\"Icon\" width=\"200\" height=\"200\"\u002F>\n\u003C\u002Fp>\n\n# Pandrator: a multilingual GUI audiobook, subtitle and dubbing generator with voice cloning and translation\n>[!TIP]\n>**TL;DR:**\n> - Pandrator is not an AI model itself, but a GUI framework for Text-to-Speech, subtitle generation and translation projects. It can generate audiobooks and subtitles\u002Fdubbing by leveraging several AI tools, custom workflows and algorithms. It works on Windows out of the box. It does work on Linux, but you have to perform a manual installation at the moment.\n> - The easiest way to use it is to download one of the precompiled **[archives](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=3fRZMG)** - simply unpack them and use the included launcher. See **[this table](#self-contained-packages)** for their contents and sizes.\n> - You can talk to me or share tips\u002Fworkflows\u002Fideas on the Discord server.\n>\n> [![](https:\u002F\u002Fdcbadge.limes.pink\u002Fapi\u002Fserver\u002FJZzHv3MnaV)](https:\u002F\u002Fdiscord.gg\u002FJZzHv3MnaV)\n\n\n\n## Quick Demonstration\nThis video shows the process of launching Pandrator, selecting a source file, starting generation, stopping it and previewing the saved file. 
It has not been sped up as it's intended to illustrate the real performance (you may skip the first 35s when the XTTS server is launching, and please remember to turn on the sound). \n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7cab141a-e043-4057-8166-72cb29281c50\n\nAnd here you can see the dubbing workflow - from a YT video, through transcription, translation, speech generation to synchronisation. \n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fdfd4b6e8-3eda-49e4-bff4-f1683ec4cf21\n\n## About Pandrator\n\nPandrator aspires to be easy to use and install - it has a one-click installer and a graphical user interface. It is a tool designed to perform two tasks: \n- transform text, PDF (including see-through cropping), EPUB and SRT files into spoken audio in multiple languages based chiefly on open source software run locally, including preprocessing to make the generated speech sound as natural as possible by, among other things, splitting the text into paragraphs, sentences and smaller logical text blocks (clauses), which the TTS models can process with minimal artifacts. Each sentence can be regenerated if the first attempt is not satisfactory, including marking for regeneration using mouse or keyboard actions when listening back to the generation. Voice cloning is possible for models that support it, and text can be additionally preprocessed using LLMs (to remove OCR artifacts or spell out things that the TTS models struggle with, like Roman numerals and abbreviations, for example),\n- generate dubbing either directly from a video file, including transcription (using [WhisperX](https:\u002F\u002Fgithub.com\u002Fm-bain\u002FwhisperX)), or from an .srt file. It includes a complete workflow from a video file to a dubbed video file with subtitles - including translation using a variety of APIs and techniques to improve the quality of translation. 
[Subdub](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FSubdub), a companion app developed for this purpose, can also be used on its own. You can also correct or translate subtitles without generating audio. \n\nAt the moment, it leverages [XTTS](https:\u002F\u002Fhuggingface.co\u002Fcoqui\u002FXTTS-v2) for its exceptional multilingual capabilities, good quality and easy fine-tuning, and [Silero](https:\u002F\u002Fgithub.com\u002Fsnakers4\u002Fsilero-models) for text-to-speech conversion and voice cloning, enhanced by [RVC_CLI](https:\u002F\u002Fgithub.com\u002Fblaisewf\u002Frvc-cli) for quality improvement and better voice cloning results, and NISQA for audio quality evaluation. Additionally, it incorporates [Text Generation Webui's](https:\u002F\u002Fgithub.com\u002Foobabooga\u002Ftext-generation-webui) API for local LLM-based text pre-processing, enabling a wide range of text manipulations before audio generation.\n\n## Supported Languages\n- XTTS supports English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu) and Korean (ko). \n\n- Silero supports English, German, Russian, Spanish, French, Hindi, Tatar, Ukrainian, Uzbek and Kalmyk. \n\n>[!NOTE]\n> Please note that Pandrator is still in an alpha stage and I'm not an experienced developer (I'm a noob, in fact), so the code is far from perfect in terms of optimisation, features and reliability. Please keep this in mind and contribute, if you want to help me make it better.\n\n## Samples\nThe samples were generated using the minimal settings - no LLM text processing, RVC or TTS evaluation, and no sentences were regenerated. Both XTTS and Silero generations were faster than playback speed, and Silero used only one CPU core. 
\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1c763c94-c66b-4c22-a698-6c4bcf3e875d\n\nhttps:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPandrator\u002Fassets\u002F75737665\u002F118f5b9c-641b-4edd-8ef6-178dd924a883\n\nDubbing sample, including translation ([video source](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=_SwUpU0E2Eg&t=61s&pp=ygUn0LLRi9GB0YLRg9C_0LvQtdC90LjQtSDQu9C10LPQsNGB0L7QstCw)):\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1ba8068d-986e-4dec-a162-3b7cc49052f4\n\n## Requirements\n\n### Hardware Requirements\n\n| TTS Model       | CPU Requirements                                              | GPU Requirements                                                       |\n|------------|---------------------------------------------------------------|-------------------------------------------------------------------------|\n| XTTS       | A reasonably modern CPU with 4+ cores (for CPU-only generation)              | NVIDIA GPU with 4GB+ of VRAM for good performance                        |\n| Silero     | Performs well on most CPUs regardless of core count                   | N\u002FA                                                                     |\n\n### Dependencies\nThis project relies on several APIs and services (running locally) and libraries, notably:\n\n#### Required\n- [XTTS API Server by daswer123](https:\u002F\u002Fgithub.com\u002Fdaswer123\u002Fxtts-api-server.git) for Text-to-Speech (TTS) generation using Coqui [XTTSv2](https:\u002F\u002Fhuggingface.co\u002Fcoqui\u002FXTTS-v2) OR [Silero API Server by ouoertheo](https:\u002F\u002Fgithub.com\u002Fouoertheo\u002Fsilero-api-server) for TTS generation using the [Silero models](https:\u002F\u002Fgithub.com\u002Fsnakers4\u002Fsilero-models).\n- [FFmpeg](https:\u002F\u002Fgithub.com\u002FFFmpeg\u002FFFmpeg) for audio encoding.\n- [Sentence Splitter by mediacloud](https:\u002F\u002Fgithub.com\u002Fmediacloud\u002Fsentence-splitter) for 
splitting `.txt ` files into sentences, [customtkinter by TomSchimansky](https:\u002F\u002Fgithub.com\u002FTomSchimansky\u002FCustomTkinter), [num2words by savoirfairelinux](https:\u002F\u002Fgithub.com\u002Fsavoirfairelinux\u002Fnum2words), and many others. For a full list, see `requirements.txt`.\n\n#### Optional\n- [Subdub](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FSubdub), a command line app that transcribes video files, translates subtitles and synchronises the generated speech with the video, made specially for Pandrator.\n- [WhisperX by m-bain](https:\u002F\u002Fgithub.com\u002Fm-bain\u002FwhisperX), an enhanced implementation of OpenAI's Whisper model with improved alignment, used for dubbing and XTTS training. \n- [Easy XTTS Trainer](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002Feasy_xtts_trainer), a command line app that enables XTTS fine-tuning using one or more audio files, made specially for Pandrator.\n- [RVC Python by daswer123](https:\u002F\u002Fgithub.com\u002Fdaswer123\u002Frvc-python) for enhancing voice quality and cloning results with [Retrieval Based Voice Conversion](https:\u002F\u002Fgithub.com\u002FRVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI).\n- [Text Generation Webui API by oobabooga](https:\u002F\u002Fgithub.com\u002Foobabooga\u002Ftext-generation-webui.git) for LLM-based text pre-processing.\n- [NISQA by gabrielmittag](https:\u002F\u002Fgithub.com\u002Fgabrielmittag\u002FNISQA.git) for evaluating TTS generations (using the [FastAPI implementation](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FNISQA-API)).\n\n## Installation\n\n### Self-contained packages\nI've prepared packages (archives) that you can simply unpack - everything is preinstalled in its own portable conda environment. You can download them from **[here](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=sLidui)**.\n\nYou can use the launcher to start Pandrator, update it and install new features. 
\n\n| Package | Contents                                                   | Unpacked Size | \n|---------|-------------------------------------------------------------|---------------|\n| 1       | Pandrator and Silero                                        | 4GB           | \n| 2       | Pandrator and XTTS                                          | 14GB          | \n| 3       | Pandrator, XTTS, RVC, WhisperX (for dubbing) and XTTS fine-tuning | 36GB          | \n\n\n### GUI Installer and Launcher (Windows)\n\n![pandrator_installer_launcher_KLoHrNDIps](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flukaszliniewicz_Pandrator_readme_251448dff7cb.png)\n\nRun `pandrator_installer_launcher.exe` with administrator privileges. You will find it under [Releases](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPandrator\u002Freleases). The executable was created using [pyinstaller](https:\u002F\u002Fgithub.com\u002Fpyinstaller\u002Fpyinstaller) from `pandrator_installer_launcher.py` in the repository.\n\n**The file may be flagged as a threat by antivirus software, so you may have to add it as an exception; if you're not comfortable doing that, install C++ Build Tools and Calibre manually or perform a fully manual installation**\n\nYou can choose which TTS engines to install and whether to install the software that enables RVC voice cloning (RVC Python), dubbing (WhisperX) and XTTS fine-tuning (Easy XTTS Trainer). You may install more components later. \n\nThe Installer\u002FLauncher performs the following tasks:\n\n1. Creates the Pandrator folder\n2. Installs necessary tools if not already present:\n   - C++ Build Tools\n   - Calibre\n3. Installs Miniconda (locally, not system-wide)\n4. Clones the following repositories:\n   - Pandrator\n   - Subdub\n   - PyPDFCropper\n   - XTTS API Server (if selected)\n   - Silero API Server (if selected)\n5. 
Creates conda environments (pandrator_installer, xtts_api_server_installer, whisperx_installer, easy_xtts_training_installer).\nIf you want to perform some actions inside the environments, for example for debugging, troubleshooting or customization, please go to the Pandrator folder and run:\n```\nconda\u002FScripts\u002Fconda.exe run -p conda\u002Fenvs\u002Fenv_name --no-capture-output python [command]\n```\n6. Installs all necessary dependencies\n\n**Note:** You can use the Installer\u002FLauncher to launch Pandrator and all the tools at any moment.\n\nIf you want to perform the setup again, remove the Pandrator folder it created. Please allow at least a couple of minutes for the initial setup process to download models and install dependencies. Depending on the options you've chosen, it may take up to 30 minutes.\n\nFor additional functionality not yet included in the installer:\n- Install Text Generation Webui and remember to enable the API (add `--api` to `CMD_FLAGS.txt` in the main directory of the Webui before starting it).\n- Set up NISQA API for automatic evaluation of generations.\n\nPlease refer to the repositories linked under [Dependencies](#Dependencies) for detailed installation instructions. Remember that the API servers (XTTS, Silero) must be running to make use of the functionalities they offer.\n\n### Manual Installation\n\n#### Prerequisites\n\n- Git\n- Miniconda or Anaconda\n- Microsoft Visual C++ Build Tools\n- Calibre\n\n#### Installation Steps\n\n1. Install dependencies:\n   - Calibre: Download and install from [https:\u002F\u002Fcalibre-ebook.com\u002Fdownload_windows](https:\u002F\u002Fcalibre-ebook.com\u002Fdownload_windows)\n   - Microsoft Visual C++ Build Tools: \n     ```\n     winget install --id Microsoft.VisualStudio.2022.BuildTools --override \"--quiet --wait --add Microsoft.VisualStudio.Workload.VCTools --includeRecommended\" --accept-package-agreements --accept-source-agreements\n     ```\n\n2. 
Clone the repositories:\n   ```\n   mkdir Pandrator\n   cd Pandrator\n   git clone https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPandrator.git\n   git clone https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FSubdub.git\n   ```\n\n3. Create and activate a conda environment:\n   ```\n   conda create -n pandrator_installer python=3.10 -y\n   conda activate pandrator_installer\n   ```\n\n4. Install Pandrator and Subdub requirements:\n   ```\n   cd Pandrator\n   pip install -r requirements.txt\n   cd ..\u002FSubdub\n   pip install -r requirements.txt\n   cd ..\n   ```\n\n5. (Optional) Install XTTS:\n   ```\n   git clone https:\u002F\u002Fgithub.com\u002Fdaswer123\u002Fxtts-api-server.git\n   conda create -n xtts_api_server_installer python=3.10 -y\n   conda activate xtts_api_server_installer\n   pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n   pip install xtts-api-server\n   ```\n\n6. (Optional) Install Silero:\n   ```\n   conda create -n silero_api_server_installer python=3.10 -y\n   conda activate silero_api_server_installer\n   pip install silero-api-server\n   ```\n\n7. (Optional) Install RVC (Retrieval-based Voice Conversion):\n   ```\n   conda activate pandrator_installer\n   pip install pip==24\n   pip install rvc-python\n   pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n   ```\n\n8. (Optional) Install WhisperX:\n   ```\n   conda create -n whisperx_installer python=3.10 -y\n   conda activate whisperx_installer\n   conda install git -c conda-forge -y\n   pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n   conda install cudnn=8.9.7.29 -c conda-forge -y\n   conda install ffmpeg -c conda-forge -y\n   pip install git+https:\u002F\u002Fgithub.com\u002Fm-bain\u002Fwhisperx.git\n   ```\n\n9. 
(Optional) Install XTTS Fine-tuning:\n   ```\n   git clone https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002Feasy_xtts_trainer.git\n   conda create -n easy_xtts_trainer python=3.10 -y\n   conda activate easy_xtts_trainer\n   cd easy_xtts_trainer\n   pip install -r requirements.txt\n   pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n   cd ..\n   ```\n\n#### Running the Components\n\n1. Run Pandrator:\n   ```\n   conda activate pandrator_installer\n   cd Pandrator\n   python pandrator.py\n   ```\n\n2. Run XTTS API Server (if installed):\n   ```\n   conda activate xtts_api_server_installer\n   python -m xtts_api_server\n   ```\n   Additional options:\n   - For CPU only: Add `--device cpu`\n   - For low VRAM: Add `--lowvram` (for 4GB or less)\n   - To use DeepSpeed: Add `--deepspeed`\n\n3. Run Silero API Server (if installed):\n   ```\n   conda activate silero_api_server_installer\n   python -m silero_api_server\n   ```\n\n#### Folder Structure\n\nAfter installation, your folder structure should look like this:\n\n```\nPandrator\u002F\n├── Pandrator\u002F\n├── Subdub\u002F\n├── xtts-api-server\u002F (if XTTS is installed)\n├── easy_xtts_trainer\u002F (if XTTS Fine-tuning is installed)\n```\n\nFor more detailed information on using specific components or troubleshooting, please refer to the documentation of each individual repository.\n\n## Quick Start Guide\n\n### Basic Usage: Audiobooks\nIf you don't want to use the additional features like RVC, you have everything you need in the **Session tab**. \n\n#### Session\nEither create a new session or load an existing one (select a folder in `Outputs` to do that).\n\n#### File selection and preprocessing\nChoose a `.txt`, `.srt`, `.pdf`, `.epub`, `.mobi` or `.docx` file. If you choose a PDF or EPUB file, a preview window will open with the extracted text. 
For PDFs, you will be able to crop the document (with translucent pages) to remove headers and footers or selected pages. You may edit the extracted text (OCRed books often have poorly recognized text from the title page, for example) and check\u002Fadd paragraphs and Chapter markers (they will be created automatically for EPUB files). Files that contain a lot of text, regardless of format, can take a moment to finish preprocessing before generation begins. The GUI will freeze, but as long as there is processor activity, it's simply working.\n\n#### Selecting the TTS Engine and the voice\n1. Select the TTS server you want to use - XTTS or Silero - and the language from the dropdown. XTTS is the recommended option.\n2. Choose the voice you want to use.\n   1. **XTTS**: voices are short, 6-12s `.wav` files (22050 Hz sample rate, mono) stored in the `tts_voices` directory (`Pandrator\u002FPandrator\u002Ftts_voices`). You can upload and select them via the GUI. The XTTS model uses the audio to clone the voice. It doesn't matter what language the sample is in, you will be able to generate speech in all supported languages, but the quality will be best if you provide a sample in your target language. You may use the sample in the repository or upload your own. Please make sure that the audio is between 6 and 12s, mono, and the sample rate is 22050 Hz. You may use a tool like Audacity to prepare the files. The less noise, the better. You may use a tool like [Resemble AI](https:\u002F\u002Fgithub.com\u002Fresemble-ai\u002Fresemble-enhance) for denoising and\u002For enhancement of your samples on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FResembleAI\u002Fresemble-enhance). You may put several samples in a folder inside `tts_voices` and the model will use all of them at once (generally up to 4). It can improve the quality. \n   2. **Silero** offers a number of voices for each language it supports. It doesn't support voice cloning. 
Simply select a voice from the dropdown after choosing the language.\n\n#### Output options\nThe default output format is .m4b. You can also select opus, mp3 or wav, choose a cover image and provide metadata.\n\n#### Generation \nClick on \"Start Generation\" to begin. You may stop and resume it later, or close the programme and load the session later.\n\n#### Generated sentences\nYou can play back the generated sentences, also as a playlist, edit them (the text that will be used for regeneration), regenerate or remove individual ones. You can also mark them for regeneration. This is useful when you don't want to stop listening but work on all problematic sentences later. You can use the \"m\" key to mark the sentence that is currently playing or the right mouse button to mark both the current and the previous sentence (this can be useful if you're listening to the output and not looking at the screen).\n\"Save Output\" concatenates the sentences generated so far and encodes them as one file.\n\n### Dubbing\n\nPandrator offers a comprehensive workflow for generating dubbed videos from video files or existing subtitles. This includes transcription, translation, speech generation, and synchronization:\n\n1. **Select a Video or SRT File:** \n    - **Video File:** Choose a video file. The audio will be extracted automatically, and transcription will be performed using WhisperX. \n    - **SRT File:** Select an existing SRT subtitle file. In this case, you also need to specify the corresponding video file (unless you only want to translate the subtitles).\n2. **Transcription (if using a video file):**\n    - **Language:** Select the language spoken in the original video.\n    - **Model:** Choose a WhisperX model for transcription. Smaller models are faster, while larger ones provide higher accuracy. The `large-v3` model provides the best results. \n    - Pandrator will automatically run WhisperX to generate an SRT file containing the transcription.\n3. 
**Translation (optional):**\n    - **Enable Translation:** Toggle this option to translate the subtitles.\n    - **Original and Target Languages:** Select the original language of the subtitles and the language you want to translate them into.\n    - **Translation Model:** Choose a translation model (e.g., `haiku`, `sonnet`, `sonnet thinking`, `gemini-flash`,  `gemini-flash-thinking`, `gpt-4o-mini`, `gpt-4o`, `deepl`, `local`). With the exception of the local option, you have to set an API key in the _API Keys_ tab. Sonnet provides the best results, but is the most expensive. Gemini-flash-thinking is decent and free (you need to obtain an API key from Google AI Studio). You can translate 500,000 characters for free with DeepL. For local translation, you need to have Text Generation Webui set up and running with the model you want to use loaded.\n    - **Chain-of-thought (optional):** Enable this option to use chain-of-thought prompting, which may improve quality for non-thinking models - don't use with thinking models (available only for LLMs, not DeepL).\n4. In order to generate speech, click on __Generate Dubbing Audio__. You will be able to edit\u002Fregenerate the sentences as in the Audiobook workflow. You can also choose to only transcribe the chosen video file or only translate a subtitle file.\n5. **Synchronization:** When you're happy with the generated audio, click on __Add Dubbing to Video__. The dubbing will be synchronised with the video, producing a dubbed video file with embedded subtitles.\n\n### General Audio Settings\n1. You can change the length of silence appended to the end of sentences and paragraphs.\n2. You can enable a fade-in and -out effect and set the duration.\n3. You can enable RVC. For this to work, you have to install RVC_Python. You can do this in the Installer\u002FLauncher at any time. You need to select a model - an RVC model consists of two files, a `.pth` and an `.index` file. They need to have the same name (e.g. 
voicex.pth and voicex.index). For best results, use the same voice for XTTS. You can also fine-tune the RVC options such as pitch.\n\n### General Text Pre-Processing Settings\n1. You can disable\u002Fenable splitting long sentences and set the max length a text fragment sent for TTS generation may have (enabled by default; it tries to split sentences whose length exceeds the max length value; it looks for punctuation marks (, ; : -) and chooses the one closest to the midpoint of the sentence; if there are no punctuation marks, it looks for conjunctions like \"and\"; it performs this operation twice as some sentence fragments may still be too long after just one split).\n2. You can disable\u002Fenable appending short sentences (to preceding or following sentences; disabled by default, may perhaps improve the flow as the length of text fragments sent to the model is more uniform).\n3. Remove diacritics (useful when generating a text that contains many foreign words or transliterations from foreign alphabets, e.g. Japanese). Do not enable this if you generate in a language that needs diacritics, like German or Polish! The pronunciation will be wrong then.\n\n### LLM Pre-processing\n- Enable LLM processing to use language models for preprocessing the text before sending it to the TTS API. For example, you may ask the LLM to remove OCR artifacts, spell out abbreviations, correct punctuation etc.\n- You can define up to three prompts for text optimization. 
Each prompt is sent to the LLM API separately, and the output of the last prompt is used for TTS generation.\n- For each prompt, you can enable\u002Fdisable it, set the prompt text, choose the LLM model to use, and enable\u002Fdisable evaluation (if enabled, the LLM API will be called twice for each prompt, and then again for the model to choose the better result).\n- Load the available LLM models using the \"Load LLM Models\" button in the Session tab.\n\n### RVC Quality Enhancement and Voice Cloning\n- Enable RVC to enhance the generated audio quality and apply voice cloning.\n- Select the RVC model file (.pth) and the corresponding index file using the \"Select RVC Model\" and \"Select RVC Index\" buttons in the Audio Processing tab.\n- When RVC is enabled, the generated audio will be processed using the selected RVC model and index before being saved.\n\n### NISQA TTS Evaluation\n- Enable TTS evaluation to assess the quality of the generated audio using the NISQA (Non-Intrusive Speech Quality Assessment) model.\n- Set the target MOS (Mean Opinion Score) value and the maximum number of attempts for each sentence.\n- When TTS evaluation is enabled, the generated audio will be evaluated using the NISQA model, and the best audio (based on the MOS score) will be chosen for each sentence.\n- If the target MOS value is not reached within the maximum number of attempts, the best audio generated so far will be used.\n\n## Contributing\nContributions, suggestions for improvements, and bug reports are most welcome!\n\n## Tips\n- You can find a collection of voice samples for example [here](https:\u002F\u002Faiartes.com\u002Fvoiceai). 
They are intended for use with ElevenLabs, so you will need to pick an 8-12s fragment and save it as a 22050 Hz mono `.wav` using Audacity, for instance.\n- You can find a collection of RVC models, for example, [here](https:\u002F\u002Fvoice-models.com\u002F).\n\n## To-do\n- [ ] Add support for Surya for PDF OCR, layout and reading order detection, plus preprocessing of chapters, headers, footers, footnotes and tables.\n- [ ] Add support for StyleTTS2.\n- [ ] Add importing\u002Fexporting settings.\n- [ ] Add support for proprietary APIs for text pre-processing and TTS generation.\n- [ ] Include OCR for PDFs.\n- [ ] Add support for a higher quality local TTS model, Tortoise.\n- [ ] Add an option to the GUI to record a voice sample and use it for TTS.\n- [x] Add support for chapter segmentation.\n- [x] Add all API servers to the setup script.\n- [x] Add support for custom XTTS models.\n- [x] Add workflow to create dubbing from `.srt` subtitle files.\n- [x] Include support for PDF files.\n- [x] Integrate editing capabilities for processed sentences within the UI.\n- [x] Add support for a lower quality but faster local TTS model that can easily run on CPU, e.g. 
Silero or Piper.\n- [x] Add support for EPUB.\n\n\n","\u003Cp align=\"left\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flukaszliniewicz_Pandrator_readme_ffd8182e493b.png\" alt=\"Icon\" width=\"200\" height=\"200\"\u002F>\n\u003C\u002Fp>\n\n# Pandrator：一款支持语音克隆与翻译的多语言GUI有声书、字幕及配音生成工具\n>[!TIP]\n>**简而言之：**\n> - Pandrator本身并非AI模型，而是一个用于文本转语音、字幕生成和翻译项目的GUI框架。它能够借助多种AI工具、自定义工作流和算法来生成有声书以及字幕或配音。该软件在Windows系统上开箱即用。虽然也能在Linux上运行，但目前仍需手动安装。\n> - 使用它的最简单方式是下载其中一个预编译的**[压缩包](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=3fRZMG)**——只需解压并使用其中的启动器即可。其内容与大小请参见**[此表格](#self-contained-packages)**。\n> - 您可以在Discord服务器上与我交流，或分享技巧、工作流程与创意。\n>\n> [![](https:\u002F\u002Fdcbadge.limes.pink\u002Fapi\u002Fserver\u002FJZzHv3MnaV)](https:\u002F\u002Fdiscord.gg\u002Fhttps:\u002F\u002Fdiscord.gg\u002FJZzHv3MnaV)\n\n\n\n## 快速演示\n本视频展示了启动Pandrator、选择源文件、开始生成、停止生成并预览保存文件的过程。视频未加速，旨在展示实际运行效果（您可跳过前35秒XTTS服务器启动的部分，并请务必打开声音）。\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7cab141a-e043-4057-8166-72cb29281c50\n\n接下来的视频则演示了从YouTube视频到转录、翻译、语音合成再到同步的配音工作流。\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fdfd4b6e8-3eda-49e4-bff4-f1683ec4cf21\n\n## 关于Pandrator\n\nPandrator致力于易用性和易安装性——提供一键安装程序和图形化用户界面。它是一款专为两项任务设计的工具：\n- 将文本、PDF（包括透明裁剪）、EPUB和SRT文件转换为多语言的语音输出，主要基于本地运行的开源软件，包含预处理步骤，以尽可能使生成的语音听起来自然，例如将文本拆分为段落、句子及更小的逻辑文本块（从句），以便TTS模型在处理时产生较少的人工痕迹。如果首次尝试不满意，每个句子都可以重新生成；在回听生成结果时，可通过鼠标或键盘操作标记需要重做的部分。对于支持语音克隆的模型，还可以进行语音克隆；此外，还可利用LLM对文本进行进一步预处理（如去除OCR伪影，或将罗马数字、缩写等TTS模型难以处理的内容逐字读出）。\n- 
直接从视频文件生成配音，包括转录（使用[WhisperX](https:\u002F\u002Fgithub.com\u002Fm-bain\u002FwhisperX)）或从.SRT文件开始。它涵盖了从视频文件到带字幕的配音视频的完整工作流程——包括使用多种API和技巧来提升翻译质量。为此开发的配套应用[Subdub](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FSubdub)也可单独使用。您还可以在不生成音频的情况下直接校对或翻译字幕。\n\n目前，Pandrator主要利用[XTTS](https:\u002F\u002Fhuggingface.co\u002Fcoqui\u002FXTTS-v2)的强大多语言能力、优良品质及易于微调的特点，同时结合[Silero](https:\u002F\u002Fgithub.com\u002Fsnakers4\u002Fsilero-models)进行文本转语音和语音克隆，并通过[RVC_CLI](https:\u002F\u002Fgithub.com\u002Fblaisewf\u002Frvc-cli)进一步提升音质与语音克隆效果，以及NISQA来进行音频质量评估。此外，它还集成了[Text Generation Webui](https:\u002F\u002Fgithub.com\u002Foobabooga\u002Ftext-generation-webui)的API，用于本地LLM驱动的文本预处理，从而在音频生成之前实现多样化的文本操作。\n\n## 支持的语言\n- XTTS支持英语（en）、西班牙语（es）、法语（fr）、德语（de）、意大利语（it）、葡萄牙语（pt）、波兰语（pl）、土耳其语（tr）、俄语（ru）、荷兰语（nl）、捷克语（cs）、阿拉伯语（ar）、中文（zh-cn）、日语（ja）、匈牙利语（hu）和韩语（ko）。\n\n- Silero支持英语、德语、俄语、西班牙语、法语、印地语、鞑靼语、乌克兰语、乌兹别克语和卡尔梅克语。\n\n>[!NOTE]\n> 请注意，Pandrator目前仍处于Alpha阶段，而我并非经验丰富的开发者（实际上是个新手），因此代码在优化、功能和可靠性方面都远未完善。请理解这一点，并欢迎为改进这款工具贡献力量。\n\n## 示例\n这些示例均采用最低设置生成——未使用LLM文本处理、RVC或TTS评估，也未对任何句子进行重做。XTTS和Silero的生成速度均快于播放速度，且Silero仅使用了一个CPU核心。\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1c763c94-c66b-4c22-a698-6c4bcf3e875d\n\nhttps:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPandrator\u002Fassets\u002F75737665\u002F118f5b9c-641b-4edd-8ef6-178dd924a883\n\n配音示例，含翻译（[视频来源](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=_SwUpU0E2Eg&t=61s&pp=ygUn0LLRi9GB0YLRg9C_0LvQtdC90LjQtSDQu9C10LPQsNGB0L7QstCw)）：\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1ba8068d-986e-4dec-a162-3b7cc49052f4\n\n## 系统要求\n\n### 硬件要求\n\n| TTS模型       | CPU要求                                              | GPU要求                                                       |\n|------------|---------------------------------------------------------------|-------------------------------------------------------------------------|\n| XTTS       | 至少具备4核的较新CPU（仅使用CPU进行生成）              | 
NVIDIA显卡，显存4GB以上，以获得良好性能                        |\n| Silero     | 在大多数CPU上表现良好，无论核心数量多少                   | 无                                                                     |\n\n### 依赖项\n本项目依赖于多个 API 和服务（本地运行）以及库，其中主要包括：\n\n#### 必需\n- [XTTS API 服务器（由 daswer123 提供）](https:\u002F\u002Fgithub.com\u002Fdaswer123\u002Fxtts-api-server.git)，用于基于 Coqui [XTTSv2](https:\u002F\u002Fhuggingface.co\u002Fcoqui\u002FXTTS-v2) 的文本转语音 (TTS) 生成；或 [Silero API 服务器（由 ouoertheo 提供）](https:\u002F\u002Fgithub.com\u002Fouoertheo\u002Fsilero-api-server)，用于基于 [Silero 模型](https:\u002F\u002Fgithub.com\u002Fsnakers4\u002Fsilero-models) 的 TTS 生成。\n- [FFmpeg](https:\u002F\u002Fgithub.com\u002FFFmpeg\u002FFFmpeg)，用于音频编码。\n- [Sentence Splitter（由 mediacloud 提供）](https:\u002F\u002Fgithub.com\u002Fmediacloud\u002Fsentence-splitter)，用于将 `.txt` 文件按句子分割；[customtkinter（由 TomSchimansky 提供）](https:\u002F\u002Fgithub.com\u002FTomSchimansky\u002FCustomTkinter)、[num2words（由 savoirfairelinux 提供）](https:\u002F\u002Fgithub.com\u002Fsavoirfairelinux\u002Fnum2words) 等。完整列表请参阅 `requirements.txt`。\n\n#### 可选\n- [Subdub（由 lukaszliniewicz 提供）](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FSubdub)，一款命令行应用程序，可对视频文件进行转录、翻译字幕，并将生成的语音与视频同步，专为 Pandrator 打造。\n- [WhisperX（由 m-bain 提供）](https:\u002F\u002Fgithub.com\u002Fm-bain\u002FwhisperX)，OpenAI Whisper 模型的增强版，具有更优的对齐效果，用于配音和 XTTS 训练。\n- [Easy XTTS Trainer（由 lukaszliniewicz 提供）](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002Feasy_xtts_trainer)，一款命令行应用程序，允许使用一个或多个音频文件对 XTTS 进行微调，专为 Pandrator 设计。\n- [RVC Python（由 daswer123 提供）](https:\u002F\u002Fgithub.com\u002Fdaswer123\u002Frvc-python)，用于通过 [基于检索的语音转换](https:\u002F\u002Fgithub.com\u002FRVC-Project\u002FRetrieval-based-Voice-Conversion-WebUI) 提升语音质量和克隆效果。\n- [Text Generation Webui API（由 oobabooga 提供）](https:\u002F\u002Fgithub.com\u002Foobabooga\u002Ftext-generation-webui.git)，用于基于大语言模型的文本预处理。\n- [NISQA（由 gabrielmittag 提供）](https:\u002F\u002Fgithub.com\u002Fgabrielmittag\u002FNISQA.git)，用于评估 TTS 生成结果（采用 
[FastAPI 实现](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FNISQA-API))。\n\n## 安装说明\n\n### 自包含软件包\n我已准备好可以直接解压使用的软件包（压缩文件），其中所有内容均已预先安装在独立的便携式 conda 环境中。您可从 **[此处](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=sLidui)** 下载。\n\n您可以使用启动器来启动 Pandrator、更新程序并安装新功能。\n\n| 软件包 | 内容                                                   | 解压后大小 | \n|---------|-------------------------------------------------------------|---------------|\n| 1       | Pandrator 和 Silero                                        | 4GB           | \n| 2       | Pandrator 和 XTTS                                          | 14GB          | \n| 3       | Pandrator、XTTS、RVC、WhisperX（用于配音）及 XTTS 微调工具 | 36GB          | \n\n\n### GUI 安装程序与启动器（Windows）\n\n![pandrator_installer_launcher_KLoHrNDIps](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flukaszliniewicz_Pandrator_readme_251448dff7cb.png)\n\n请以管理员权限运行 `pandrator_installer_launcher.exe`。该文件位于 [Releases](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPandrator\u002Freleases) 页面。此可执行文件是使用 [pyinstaller](https:\u002F\u002Fgithub.com\u002Fpyinstaller\u002Fpyinstaller) 从仓库中的 `pandrator_installer_launcher.py` 编译而成。\n\n**该文件可能会被杀毒软件标记为威胁，因此您可能需要将其添加到白名单；如果您对此不放心，可以手动安装 C++ Build Tools 和 Calibre，或者进行完全手动安装。**\n\n您可以选择要安装的 TTS 引擎，以及是否安装支持 RVC 语音克隆（RVC Python）、配音（WhisperX）和 XTTS 微调（Easy XTTS Trainer）的软件。后续也可再安装其他组件。\n\n安装程序\u002F启动器将执行以下操作：\n\n1. 创建 Pandrator 文件夹\n2. 如果尚未安装，则安装必要的工具：\n   - C++ Build Tools\n   - Calibre\n3. 安装 Miniconda（仅限本地环境，而非系统全局）\n4. 克隆以下仓库：\n   - Pandrator\n   - Subdub\n   - PyPDFCropper\n   - XTTS API 服务器（如已选择）\n   - Silero API 服务器（如已选择）\n5. 创建 conda 环境（pandrator_installer、xtta_api_server_installer、whisperx_installer、easy_xtts_training_installer）。\n若需在这些环境中执行某些操作，例如调试、故障排除或自定义，请进入 Pandrator 文件夹并运行：\n```\nconda\u002FScripts\u002Fconda.exe -p conda\u002Fenvs\u002Fenv_name run no-capture-output python [command]\n```\n7. 
安装所有必要的依赖项。\n\n**注意：** 您可以随时使用安装程序\u002F启动器来启动 Pandrator 及所有相关工具。\n\n如需重新进行设置，请删除安装程序创建的 Pandrator 文件夹。初始设置过程中需要下载模型并安装依赖项，因此请预留至少几分钟时间；根据您选择的选项，整个过程可能需要长达 30 分钟。\n\n对于安装程序尚未包含的附加功能：\n- 安装 Text Generation Webui，并确保启用 API（在启动 Webui 前，将 `--api` 添加到主目录下的 `CMD_FLAGS.txt` 文件中）。\n- 配置 NISQA API，以实现对生成结果的自动评估。\n\n有关详细的安装说明，请参阅 [依赖项](#Dependencies) 部分所链接的各个仓库。请注意，XTTS 和 Silero 的 API 服务器必须处于运行状态，才能使用其提供的功能。\n\n### 手动安装\n\n#### 先决条件\n\n- Git\n- Miniconda 或 Anaconda\n- Microsoft Visual C++ 构建工具\n- Calibre\n\n#### 安装步骤\n\n1. 安装依赖：\n   - Calibre：从 [https:\u002F\u002Fcalibre-ebook.com\u002Fdownload_windows](https:\u002F\u002Fcalibre-ebook.com\u002Fdownload_windows) 下载并安装。\n   - Microsoft Visual C++ 构建工具：\n     ```\n     winget install --id Microsoft.VisualStudio.2022.BuildTools --override \"--quiet --wait --add Microsoft.VisualStudio.Workload.VCTools --includeRecommended\" --accept-package-agreements --accept-source-agreements\n     ```\n\n2. 克隆仓库：\n   ```\n   mkdir Pandrator\n   cd Pandrator\n   git clone https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPandrator.git\n   git clone https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FSubdub.git\n   ```\n\n3. 创建并激活 Conda 环境：\n   ```\n   conda create -n pandrator_installer python=3.10 -y\n   conda activate pandrator_installer\n   ```\n\n4. 安装 Pandrator 和 Subdub 的依赖：\n   ```\n   cd Pandrator\n   pip install -r requirements.txt\n   cd ..\u002FSubdub\n   pip install -r requirements.txt\n   cd ..\n   ```\n\n5. （可选）安装 XTTS：\n   ```\n   git clone https:\u002F\u002Fgithub.com\u002Fdaswer123\u002Fxtts-api-server.git\n   conda create -n xtts_api_server_installer python=3.10 -y\n   conda activate xtts_api_server_installer\n   pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n   pip install xtts-api-server\n   ```\n\n6. 
（可选）安装 Silero：\n   ```\n   conda create -n silero_api_server_installer python=3.10 -y\n   conda activate silero_api_server_installer\n   pip install silero-api-server\n   ```\n\n7. （可选）安装 RVC（基于检索的语音转换）：\n   ```\n   conda activate pandrator_installer\n   pip install pip==24\n   pip install rvc-python\n   pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n   ```\n\n8. （可选）安装 WhisperX：\n   ```\n   conda create -n whisperx_installer python=3.10 -y\n   conda activate whisperx_installer\n   conda install git -c conda-forge -y\n   pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n   conda install cudnn=8.9.7.29 -c conda-forge -y\n   conda install ffmpeg -c conda-forge -y\n   pip install git+https:\u002F\u002Fgithub.com\u002Fm-bain\u002Fwhisperx.git\n   ```\n\n9. （可选）安装 XTTS 微调工具：\n   ```\n   git clone https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002Feasy_xtts_trainer.git\n   conda create -n easy_xtts_trainer python=3.10 -y\n   conda activate easy_xtts_trainer\n   cd easy_xtts_trainer\n   pip install -r requirements.txt\n   pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n   cd ..\n   ```\n\n#### 运行组件\n\n1. 运行 Pandrator：\n   ```\n   conda activate pandrator_installer\n   cd Pandrator\n   python pandrator.py\n   ```\n\n2. 运行 XTTS API 服务器（如果已安装）：\n   ```\n   conda activate xtts_api_server_installer\n   python -m xtts_api_server\n   ```\n   额外选项：\n   - 仅使用 CPU：添加 `--device cpu`\n   - 低显存模式：添加 `--lowvram`（适用于 4GB 及以下显存）\n   - 使用 DeepSpeed：添加 `--deepspeed`\n\n3. 
运行 Silero API 服务器（如果已安装）：\n   ```\n   conda activate silero_api_server_installer\n   python -m silero_api_server\n   ```\n\n#### 文件夹结构\n\n安装完成后，您的文件夹结构应如下所示：\n\n```\nPandrator\u002F\n├── Pandrator\u002F\n├── Subdub\u002F\n├── xtts-api-server\u002F (如果安装了 XTTS)\n├── easy_xtts_trainer\u002F (如果安装了 XTTS 微调工具)\n```\n\n有关特定组件的使用或故障排除的更详细信息，请参阅各个仓库的文档。\n\n## 快速入门指南\n\n### 基本用法：有声书\n如果您不想使用 RVC 等附加功能，那么在 **“会话”选项卡** 中就已经具备了所需的一切。\n\n#### 会话\n您可以创建一个新的会话，也可以加载一个已有的会话（在 `Outputs` 文件夹中选择一个文件夹即可）。\n\n#### 文件选择与预处理\n选择 `.txt`、`.srt`、`.pdf`、`.epub`、`.mobi` 或 `.docx` 文件。如果选择 PDF 或 EPUB 文件，将会打开一个预览窗口，显示提取出的文本。对于 PDF 文件，您可以通过半透明页面对文档进行裁剪，以移除页眉、页脚或特定页面。您可以编辑提取出的文本（例如，OCR 识别的书籍通常在扉页等位置存在识别错误），并检查或添加段落和章节标记（EPUB 文件会自动创建这些标记）。无论格式如何，包含大量文本的文件在开始生成之前可能需要一些时间完成预处理。此时界面可能会卡住，但只要 CPU 仍在运行，就说明程序正在正常工作。\n\n#### 选择 TTS 引擎和语音\n1. 从下拉菜单中选择要使用的 TTS 服务器——XTTS 或 Silero——以及语言。推荐使用 XTTS。\n2. 选择您想要使用的语音。\n   1. **XTTS** 的语音是短小的 6–12 秒 `.wav` 文件（采样率为 22050Hz，单声道），存储在 `tts_voices` 目录中（`Pandrator\u002FPandrator\u002Ftts_voices`）。您可以通过 GUI 上传并选择这些语音。XTTS 模型会利用音频来克隆语音。样本的语言并不重要，您可以在所有支持的语言中生成语音，但如果提供目标语言的样本，效果会更好。您可以使用仓库中的示例文件，也可以上传自己的文件。请确保音频长度在 6 到 12 秒之间，为单声道，且采样率为 22050Hz。可以使用 Audacity 等工具准备文件。噪音越少越好。您还可以使用 [Resemble AI](https:\u002F\u002Fgithub.com\u002Fresemble-ai\u002Fresemble-enhance) 等工具，在 [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FResembleAI\u002Fresemble-enhance) 上对样本进行降噪和\u002F或增强。您可以将多个样本放入 `tts_voices` 目录下的一个文件夹中，模型会同时使用所有样本（通常最多 4 个），这样可以提升质量。\n   2. **Silero** 为其支持的每种语言提供了多种语音选项。它不支持语音克隆功能。只需在选择语言后，从下拉菜单中选择一种语音即可。\n\n#### 输出选项\n默认输出格式为 .m4b。您也可以选择 opus、mp3 或 wav 格式，选择封面图片并添加元数据。\n\n#### 生成\n点击“开始生成”即可开始。您可以随时停止并稍后再继续，或者关闭程序并在以后重新加载会话。\n\n#### 已生成的句子\n您可以播放已生成的句子，也可以将其作为播放列表播放；还可以编辑这些句子（即用于重新生成的文本）、重新生成或删除个别句子。您也可以将它们标记为待重新生成。这在您不想中断聆听，而希望稍后再处理有问题的句子时非常有用。您可以使用 `m` 键标记当前正在播放的句子，或使用鼠标右键同时标记当前句和上一句（如果您是在听输出而不是看屏幕，这种方式会更方便）。\n“保存输出”会将迄今为止生成的所有句子拼接在一起，并编码为一个文件。\n\n### 配音\nPandrator 提供了一个全面的工作流程，用于从视频文件或现有字幕文件生成配音视频。该流程包括转录、翻译、语音合成和同步：\n\n1. 
**选择视频或 SRT 文件：**\n    - **视频文件：** 选择一个视频文件。系统会自动提取音频，并使用 WhisperX 进行转录。\n    - **SRT 文件：** 选择一个现有的 SRT 字幕文件。在这种情况下，您还需要指定对应的视频文件（除非您只想翻译字幕）。\n2. **转录（如果使用视频文件）：**\n    - **语言：** 选择原视频中使用的语言。\n    - **模型：** 选择用于转录的 WhisperX 模型。较小的模型速度更快，而较大的模型则提供更高的准确性。“large-v3”模型的效果最佳。\n    - Pandrator 会自动运行 WhisperX，生成包含转录内容的 SRT 文件。\n3. **翻译（可选）：**\n    - **启用翻译：** 打开此选项以翻译字幕。\n    - **原文与目标语言：** 选择字幕的原文语言以及您希望翻译成的目标语言。\n    - **翻译模型：** 选择一个翻译模型（例如，“haiku”、“sonnet”、“sonnet thinking”、“gemini-flash”、“gemini-flash-thinking”、“gpt-4o-mini”、“gpt-4o”、“deepl”、“local”）。除“local”选项外，您需要在 _API 密钥_ 选项卡中设置 API 密钥。Sonnet 的效果最好，但价格也最贵。Gemini-flash-thinking 效果不错且免费（需从 Google AI Studio 获取 API 密钥）。通过 DeepL，您可以免费翻译 50 万字符。对于本地翻译，您需要安装并运行 Text Generation Webui，同时加载您希望使用的模型。\n    - **思维链（可选）：** 启用此选项以使用思维链提示，这可能会提高非思考型模型的质量——请勿与思考型模型一起使用（仅适用于 LLM，不适用于 DeepL）。\n4. 为了生成语音，点击 __生成配音音频__。您可以像在有声书流程中一样编辑或重新生成句子。您也可以选择只转录所选视频文件，或只翻译字幕文件。\n6. **同步：** 当您对生成的音频满意时，点击 __将配音添加到视频__。配音将与视频同步，生成带有嵌入字幕的配音视频。\n\n### 通用音频设置\n1. 您可以调整在句子和段落末尾添加的静音时长。\n2. 您可以启用淡入淡出效果，并设置持续时间。\n3. 您可以启用 RVC。要使 RVC 正常工作，您需要安装 RVC_Python。您可以在安装程序\u002F启动器中随时完成安装。您需要选择一个模型——RVC 模型由两个文件组成：一个 `.pth` 文件和一个 `.index` 文件。这两个文件必须同名（例如 voicex.pth 和 voicex.index）。为了获得最佳效果，建议为 XTTS 使用相同的语音。您还可以微调 RVC 的参数，例如音高。\n\n### 文本预处理通用设置\n1. 您可以禁用\u002F启用长句拆分功能，并设置发送至TTS生成的文本片段的最大长度（默认启用；当句子长度超过最大长度值时，系统会尝试拆分句子。它会寻找标点符号（, ; : -），并选择最接近句子中点的标点进行分割；如果没有标点符号，则会寻找“and”等连词。此操作会执行两次，因为一次拆分后某些句子片段可能仍然过长）。\n2. 您可以禁用\u002F启用短句拼接功能（将短句拼接到前一句或后一句；默认禁用，这可能会使文本流更加顺畅，因为发送给模型的文本片段长度更为均匀）。\n3. 
去除变音符号（在生成包含大量外来词或来自外文字母的音译文本时非常有用，例如日语）。如果您使用需要变音符号的语言（如德语或波兰语）进行生成，请勿启用此功能！否则发音将会错误。\n\n### LLM 预处理\n- 启用LLM处理功能，以便在将文本发送至TTS API之前，利用语言模型对其进行预处理。例如，您可以要求LLM去除OCR产生的噪声、展开缩写、修正标点符号等。\n- 您最多可以定义三个用于文本优化的提示。每个提示会单独发送至LLM API，最终由最后一个提示的输出用于TTS生成。\n- 对于每个提示，您可以启用或禁用它、设置提示文本、选择要使用的LLM模型，并启用或禁用评估功能（若启用，LLM API会对每个提示调用两次，随后再调用一次以让模型选出更好的结果）。\n- 使用“会话”选项卡中的“加载LLM模型”按钮，加载可用的LLM模型。\n\n### RVC 质量增强与语音克隆\n- 启用RVC功能，以提升生成音频的质量并应用语音克隆技术。\n- 在“音频处理”选项卡中，使用“选择RVC模型”和“选择RVC索引”按钮，分别选择RVC模型文件（.pth）和对应的索引文件。\n- 当RVC功能启用时，生成的音频将在保存之前，使用选定的RVC模型和索引进行处理。\n\n### NISQA TTS 评估\n- 启用TTS评估功能，以使用NISQA（非侵入式语音质量评估）模型评估生成音频的质量。\n- 设置目标MOS（平均意见得分）值以及每句话的最大尝试次数。\n- 当TTS评估功能启用时，生成的音频将使用NISQA模型进行评估，并为每句话选择MOS得分最高的音频。\n- 如果在最大尝试次数内未能达到目标MOS值，则将使用迄今为止生成的最佳音频。\n\n## 贡献\n我们非常欢迎您的贡献、改进建议以及错误报告！\n\n## 小贴士\n- 您可以在此处找到一些语音样本集：[这里](https:\u002F\u002Faiartes.com\u002Fvoiceai)。这些样本专为ElevenLabs设计，因此您需要从中选取一段8至12秒的音频，并使用Audacity等工具将其保存为22050kHz的单声道`.wav`格式。\n- 您也可以在此处找到一些RVC模型集：[这里](https:\u002F\u002Fvoice-models.com\u002F)。\n\n## 待办事项\n- [ ] 添加对Surya的支持，用于PDF的OCR、版面及阅读顺序检测，以及章节、页眉、页脚、脚注和表格的预处理。\n- [ ] 添加对StyleTTS2的支持。\n- [ ] 添加设置的导入导出功能。\n- [ ] 添加对专有API的支持，用于文本预处理和TTS生成。\n- [ ] 包含PDF的OCR功能。\n- [ ] 添加对更高质量本地TTS模型Tortoise的支持。\n- [ ] 添加录制语音样本并通过GUI直接用于TTS的功能。\n- [x] 添加对章节分割的支持。\n- [x] 将所有API服务器加入安装脚本。\n- [x] 添加对自定义XTTS模型的支持。\n- [x] 添加从`.srt`字幕文件制作配音的工作流程。\n- [x] 包含对PDF文件的支持。\n- [x] 在UI中集成已处理句子的编辑功能。\n- [x] 添加对低质量但运行速度更快、可在CPU上轻松运行的本地TTS模型的支持，例如Silero或Piper。\n- [x] 添加对EPUB的支持。","# Pandrator 快速上手指南\n\nPandrator 是一个多语言图形界面（GUI）工具，用于生成有声书、字幕和配音。它集成了语音克隆、翻译及多种 AI 模型（如 XTTS、Silero），支持本地运行，旨在让文本转语音和视频配音流程变得简单高效。\n\n## 1. 
环境准备\n\n### 系统要求\n*   **操作系统**: \n    *   **Windows**: 推荐，支持一键安装器。\n    *   **Linux**: 支持，但需手动安装配置。\n*   **硬件配置**:\n    *   **CPU**: 现代处理器，至少 4 核心（若仅使用 CPU 运行 XTTS）。\n    *   **GPU (可选但推荐)**: NVIDIA 显卡，显存 4GB+（用于加速 XTTS 生成和提升质量）。Silero 模型在普通 CPU 上即可良好运行。\n\n### 前置依赖\n无论采用哪种安装方式，以下基础工具是必须的：\n*   **Git**: 用于代码克隆。\n*   **Miniconda \u002F Anaconda**: 用于管理 Python 环境。\n*   **Microsoft Visual C++ Build Tools**: Windows 编译环境。\n*   **Calibre**: 用于处理 EPUB\u002FPDF 等电子书格式。\n\n> **提示**: 国内用户建议配置 Conda 和 Pip 的国内镜像源（如清华源、阿里源）以加速依赖下载。\n\n## 2. 安装步骤\n\n### 方案 A：Windows 一键安装包（推荐）\n这是最简单的方式，所有依赖已预装在独立的便携环境中。\n\n1.  **下载压缩包**:\n    访问官方提供的归档链接下载预设包（注意网络环境）：\n    *   **基础版 (Pandrator + Silero)**: 约 4GB\n    *   **进阶版 (Pandrator + XTTS)**: 约 14GB\n    *   **完整版 (含 RVC, WhisperX, 微调工具)**: 约 36GB\n    \n    > 下载地址参考：[OneDrive 归档链接](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=sLidui)\n\n2.  **解压与运行**:\n    *   解压下载的压缩包。\n    *   运行文件夹内的启动器（Launcher）即可直接使用。\n\n### 方案 B：Windows 安装器安装\n如果你希望自定义组件（如只装 XTTS 或包含配音功能），可使用官方安装器。\n\n1.  **下载安装器**:\n    从 [Releases 页面](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPandrator\u002Freleases) 下载 `pandrator_installer_launcher.exe`。\n    *   *注意*: 该文件可能被杀毒软件误报，需添加例外信任。\n\n2.  **运行安装**:\n    右键以**管理员身份**运行 `pandrator_installer_launcher.exe`。\n    *   按照界面提示选择需要安装的 TTS 引擎（XTTS\u002FSilero）。\n    *   勾选额外功能：RVC (声音克隆增强), WhisperX (视频配音\u002F转录), Easy XTTS Trainer (模型微调)。\n    *   安装程序会自动处理 C++ 工具、Calibre、Miniconda 环境创建及仓库克隆。\n    *   首次运行可能需要 5-30 分钟（取决于网速和所选组件）。\n\n### 方案 C：手动安装 (适用于 Linux 或高级用户)\n\n1.  **安装基础依赖**:\n    ```bash\n    # 安装 Calibre (以 Ubuntu\u002FDebian 为例)\n    sudo apt install calibre\n\n    # 安装构建工具 (Windows PowerShell 管理员模式)\n    winget install --id Microsoft.VisualStudio.2022.BuildTools --override \"--quiet --wait --add Microsoft.VisualStudio.Workload.VCTools --includeRecommended\" --accept-package-agreements --accept-source-agreements\n    ```\n\n2.  
**克隆仓库与环境配置**:\n    ```bash\n    mkdir Pandrator\n    cd Pandrator\n    \n    # 克隆主程序及相关子模块 (根据需求选择)\n    git clone https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPandrator.git\n    git clone https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FSubdub.git\n    git clone https:\u002F\u002Fgithub.com\u002Fdaswer123\u002Fxtts-api-server.git\n    # 如需 Silero\n    git clone https:\u002F\u002Fgithub.com\u002Fouoertheo\u002Fsilero-api-server.git\n\n    # 创建并激活 Conda 环境 (示例)\n    conda create -n pandrator python=3.10\n    conda activate pandrator\n\n    # 安装 Python 依赖\n    cd Pandrator\n    pip install -r requirements.txt\n    ```\n\n3.  **启动后端服务**:\n    Pandrator 依赖外部 API 服务器运行，使用前需单独启动它们：\n    ```bash\n    # 启动 XTTS 服务 (在 xtts-api-server 目录)\n    python server.py --port 8020\n\n    # 或在另一个终端启动 Silero 服务\n    python server.py --port 8000\n    ```\n\n## 3. 基本使用\n\n### 启动程序\n*   **安装包用户**: 直接运行解压目录下的启动器。\n*   **手动安装用户**: 在激活的 conda 环境中运行主脚本：\n    ```bash\n    python main.py\n    ```\n\n### 核心功能操作流程\n\n#### 场景一：制作有声书 (文本\u002FEPUB\u002FPDF -> 音频)\n1.  **加载文件**: 在 GUI 中选择源文件（支持 `.txt`, `.pdf`, `.epub`, `.srt`）。\n2.  **配置参数**:\n    *   选择 TTS 模型 (XTTS 或 Silero)。\n    *   选择目标语言 (XTTS 支持中、英、日、韩等 16 种语言)。\n    *   (可选) 上传参考音频进行**声音克隆**。\n    *   (可选) 启用 LLM 预处理以优化数字、缩写朗读效果。\n3.  **生成与预览**: 点击生成。生成过程中可试听单句，若不满意可标记并重生成特定句子。\n4.  **导出**: 完成后保存为音频文件。\n\n#### 场景二：视频配音 (视频 -> 转录 -> 翻译 -> 配音 -> 合成)\n1.  **输入视频**: 选择本地视频文件或 YouTube 链接。\n2.  **工作流设置**:\n    *   **转录**: 自动调用 WhisperX 提取字幕。\n    *   **翻译**: 选择目标语言，系统将翻译字幕内容。\n    *   **配音**: 使用选定的 TTS 模型生成对应语音。\n    *   **同步**: 自动调整语音节奏以匹配视频口型\u002F时长。\n3.  **执行**: 点击开始，等待全流程完成。\n4.  
**结果**: 获得带有新配音和新字幕的视频文件。\n\n> **提示**: 更多高级用法、工作流分享及社区支持，可加入官方 Discord 服务器进行交流。","一位独立教育创作者希望将手中的英文技术 PDF 文档和 YouTube 教程视频，快速转化为多语言的有声书和配音视频，以拓展全球受众。\n\n### 没有 Pandrator 时\n- **流程割裂且繁琐**：需要分别使用 OCR 工具提取文字、手动清洗格式、再找不同的 TTS 网站生成音频，最后还要用视频软件强行对齐字幕，耗时极长。\n- **语音情感生硬**：通用的在线朗读声音机械感强，缺乏真人语气，且难以克隆特定讲师的音色，导致学习体验枯燥。\n- **本地部署门槛高**：若想用高质量的开源模型（如 XTTS），需手动配置 Python 环境、安装依赖库并调试代码，对非程序员极不友好。\n- **多语言翻译困难**：视频配音需先转录、再翻译、最后合成，环节众多，一旦翻译出错需重新来过，试错成本极高。\n\n### 使用 Pandrator 后\n- **一站式自动化流水线**：直接导入 PDF 或视频链接，Pandrator 自动完成文本预处理、分段、翻译及音频生成，甚至能智能处理罗马数字和缩写。\n- **高保真语音克隆**：利用内置的 XTTS 和 RVC 增强技术，只需几秒参考音频即可克隆真人音色，生成的有声书语气自然、情感丰富。\n- **开箱即用的本地体验**：通过一键安装包即可在 Windows 上运行图形界面，无需编写代码或配置复杂环境，所有计算均在本地完成，保护数据隐私。\n- **可视化精修工作流**：支持边听边标记不满意的句子进行重生成，并能直接从视频生成带时间轴的双语字幕和配音视频，大幅降低后期修改难度。\n\nPandrator 将原本需要数天协作完成的复杂多媒体本地化工程，缩减为普通人几小时内即可独立完成的自动化流程。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flukaszliniewicz_Pandrator_ffd8182e.png","lukaszliniewicz","Lukasz Liniewicz","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Flukaszliniewicz_f149eb7f.jpg",null,"https:\u002F\u002Fgithub.com\u002Flukaszliniewicz",[81],{"name":82,"color":83,"percentage":84},"Python","#3572A5",100,543,39,"2026-04-02T11:29:26","AGPL-3.0","Windows, Linux","非必需。若使用 XTTS 模型以获得良好性能，需要 NVIDIA GPU 且显存 4GB+；Silero 模型仅需 CPU。未明确提及具体 CUDA 版本。","未说明",{"notes":93,"python":94,"dependencies":95},"1. Windows 支持一键安装器，Linux 需手动安装。2. 提供预编译包（自包含 Conda 环境），解压即用，大小在 4GB 至 36GB 之间。3. 首次运行需下载模型和依赖，耗时可能长达 30 分钟。4. 部分功能（如 LLM 预处理、RVC、配音）为可选组件，需在安装时选择或后续手动配置。5. 
杀毒软件可能会误报安装程序为威胁，需添加例外。","未说明 (通过 Miniconda\u002FAnaconda 环境自动管理)",[96,97,98,99,100,101,102,103,104,105],"XTTS API Server (基于 Coqui XTTSv2)","Silero API Server","FFmpeg","WhisperX (可选，用于配音)","RVC Python (可选，用于声音转换)","Text Generation Webui API (可选，用于文本预处理)","NISQA (可选，用于音频质量评估)","CustomTkinter","Sentence Splitter","num2words",[55,13,26],[108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127],"audiobook","audiobook-creator","audiobook-maker","audiobooks","text-processing","text-to-speech","customtkinterprojects","llm","rvc","tkinter-gui","xtts","xttsv2","silero","voice-cloning","voicecraft","dubbing","pdf-to-audio","subtitle-to-speech","subtitle-to-voice","voice-clone","2026-03-27T02:49:30.150509","2026-04-06T09:45:01.293739",[131,136,141,146,151,155,159],{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},12493,"安装或更新时出现 conda\u002Fpip 命令执行失败错误（非零退出状态）怎么办？","这通常与环境权限或损坏的安装有关。维护者建议尝试使用最新版本的安装程序进行“干净安装”（即卸载旧版本并删除相关文件夹后重新安装）。如果在更新过程中出错，请确保没有以管理员身份运行不必要的步骤（通常仅需安装系统依赖时需要），或者尝试将 Pandrator 文件夹移动到用户目录下（避免网络驱动器或受保护的系统路径）。如果问题依旧，请下载最新的 installer_launcher.exe 重试。","https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPandrator\u002Fissues\u002F46",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},12494,"如何在旧款显卡（如 GTX 1080Ti）上解决 whisperX 训练报错或显存不足的问题？","对于不支持默认 FP16 精度的旧显卡，需要禁用半精度运算。可以在启动脚本或配置中显式添加 `--fp16 False` 参数。如果通过命令行运行，确保命令中包含该标志（例如：`python -m whisperx ... 
--fp16 False`）。此外，建议为 xtts_training 和 whisperX 创建独立的 Conda 环境以避免依赖冲突。如果官方启动器尚未提供此选项切换，可以手动修改运行脚本或在启动参数中强制指定。","https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPandrator\u002Fissues\u002F50",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},12495,"生成的 TTS 音频与视频或字幕（.srt）不同步，逐渐产生延迟如何解决？","这是一个已知问题，通常发生在长音频生成中。目前的临时解决方案是将长视频或书籍拆分为较短的片段（例如按章节或每几分钟一段）分别进行处理，这样可以减少累积的时间漂移。维护者正在开发新功能以改进同步机制（如增加关键帧同步频率），但在更新发布前，分段处理是获得最佳同步效果的最有效方法。","https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPandrator\u002Fissues\u002F54",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},12496,"如何获得最佳的语音合成音质？有什么推荐的模型或设置？","为了获得最佳效果，建议寻找经过微调的 XTTS 模型（可在 Huggingface 上查找），或者如果您拥有 GPU，可以使用 RVC（Retrieval-based Voice Conversion）进行后期处理。虽然全书预处理速度较慢，但拆分章节单独运行可以提高稳定性和质量。多尝试不同的设置和声音样本，直到找到满意的效果后再进行大规模处理。","https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPandrator\u002Fissues\u002F29",{"id":152,"question_zh":153,"answer_zh":154,"source_url":135},12497,"Pandrator 需要管理员权限才能启动吗？为什么之前不需要现在需要了？","正常情况下，启动 Pandrator 不需要管理员权限，仅在通过 Chocolatey 安装系统级依赖时才需要。如果您发现必须使用管理员权限才能启动，可能是由于安装路径权限问题（例如安装在网络驱动器或受保护的系统目录）。尝试将整个 Pandrator 文件夹及启动器复制到您的用户文件夹（如桌面或文档目录）中通常可以解决此问题。如果仍然无效，请检查是否有杀毒软件或组策略限制了访问。",{"id":156,"question_zh":157,"answer_zh":158,"source_url":140},12498,"下载压缩包（.7z）时提示“磁盘读取错误”或文件损坏怎么办？","这通常是由于下载中断或网络波动导致的文件不完整。请尝试多次重新下载该文件，确保下载过程完全完成且未受干扰。如果使用的是浏览器下载，尝试更换下载工具（如 IDM 或迅雷）以确保文件完整性。如果问题持续存在，请检查磁盘空间是否充足以及硬盘是否有坏道。维护者确认文件本身无误，问题多出在下载环节。",{"id":160,"question_zh":161,"answer_zh":162,"source_url":135},12499,"ffmpeg 在 Pandrator 中无法工作，但在命令行中可以，如何修复？","如果直接在 CMD 中运行 `ffmpeg` 命令正常，说明 ffmpeg 已正确安装但未添加到 Pandrator 的运行环境变量中。请确保 ffmpeg 的可执行文件路径已添加到系统 PATH 环境变量中，或者将 ffmpeg.exe 直接复制到 Pandrator 的安装目录下。重启 Pandrator 后通常能识别到。如果仍不行，尝试以管理员身份运行一次 Pandrator 以刷新环境配置。",[164,169,174,179,184,189,194,199,204,209,214,219,224,229,234,239,244,249,254,259],{"id":165,"version":166,"summary_zh":167,"released_at":168},62840,"v.0.31","### 变更说明\n\n本次更新主要针对安装程序\u002F启动器、字幕处理以及 XTTS 训练进行了优化，预计将带来显著的改进。\n\n### 
安装程序\u002F启动器\n- 已从自动化安装流程中移除构建工具的安装步骤，用户可选择单独进行安装（程序会尝试自动安装构建工具；若失败，则会显示手动安装的指导说明）。\n- 除 Calibre 外，其他依赖项不再以系统全局方式安装。现使用 Dulwich 来克隆和更新代码库，而非直接调用 git；ffmpeg 则被安装到 Conda 环境中。\n\n#### 配音功能\n- 新增可用于翻译和校对的人工智能模型（包括 Gemini、Claude 和 Openrouter API），并优化了提示词设计。\n- 更新了 WhisperX 模型。\n- 新增校对任务，可在不进行翻译的情况下提升字幕质量。\n- 采用“思维链”而非单纯的评估模式。\n- 同步效果进一步优化。\n\n#### XTTS 训练\n- 改进了样本处理流程，减少了伪影。\n- 增加了参考语音片段的输出选项：扩展版和动态版——分别选取 3 或 4 段不同的语音样本，并将其放置在一个文件夹中。XTTS 现在可以直接接受文件夹作为输入，从而利用其中的所有片段以获得可能更好的训练效果。\n- 解决了训练启动时遇到的问题。\n\n如果您的源音频为专业级（录音棚品质），请勿启用任何预处理选项。\n\n#### 废弃功能\n已停止对 Voicecraft 的支持——目前仅在安装程序中移除，后续版本中也将从 Pandrator 主体中彻底移除。\n\n> 请重新安装 Pandrator（或再次下载完整安装包）。\n\n### 自包含软件包\n我准备了一些压缩包，您只需解压即可使用。所有组件均已预先安装在独立的便携式 Conda 环境中。您可以从以下链接下载：**[这里](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=sLidui)**。由于文件权限已更新，这些软件包无需管理员权限即可运行。\n\n您可以通过启动器来启动 Pandrator、更新软件以及安装新功能。\n\n| 软件包 | 内容                                                   | 解压后大小 | \n|--------|---------------------------------------------------------|------------|\n| 1      | Pandrator 与 Silero                                        | 4GB        | \n| 2      | Pandrator 与 XTTS（支持 Nvidia GPU）                                          | 14GB       | \n| 3      | Pandrator、XTTS、RVC、WhisperX（用于配音和训练）以及 XTTS 微调工具 | 36GB       | \n\n### 安装程序\n您可以使用下方的安装程序\u002F启动器，该程序基于仓库中的 `pandrator_installer_launcher.py` 文件生成；亦可直接使用源文件。请注意，务必以管理员身份运行可执行文件。Windows 系统或杀毒软件可能会将其识别为潜在威胁。您可以将其加入白名单，或者如果您对此感到不安，也可以查看仓库中的源代码，并手动安装 Pandrator。","2025-03-17T23:08:06",{"id":170,"version":171,"summary_zh":172,"released_at":173},62841,"v.03","### 更改\n\n本次更新再次聚焦于训练。新增了几项预处理选项，可提升数据集的质量，从而提高训练出的模型性能，尤其对于非英语语言（这类语言通常含有更多噪声和瑕疵）。\n\n> 若要使用这些新功能，请重新安装 Pandrator（或再次下载最大尺寸的安装包）。\n\n用于处理源音频的新选项包括：\n\n- 剪除尾部静音，\n- 去除呼吸声，\n- 添加淡入淡出效果，\n- 丢弃即使经过所有预处理后仍以突然中断结尾的片段，以避免生成句子末尾出现“咔哒”声。\n\n如果您的源音频是专业级（录音棚质量），则除了剪切、淡入淡出以及可能的突变检测之外，无需启用其他任何预处理选项。\n\n### 自包含安装包\n我已准备好可以直接解压使用的安装包（压缩文件），其中所有内容均已预先安装在独立的便携式 Conda 
环境中。您可从**[此处](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=sLidui)**下载。\n\n通过启动器即可启动 Pandrator、进行更新以及安装新功能。\n\n| 安装包 | 内容                                                   | 解压后大小 | \n|---------|-------------------------------------------------------------|---------------|\n| 1       | Pandrator 和 Silero                                        | 4GB           | \n| 2       | Pandrator 和 XTTS（仅 CPU 版本） | 7GB |\n| 3       | Pandrator 和 XTTS（支持 Nvidia GPU）                                          | 14GB          | \n| 4       | Pandrator、XTTS、RVC、WhisperX（用于配音和训练）以及 XTTS 微调工具 | 36GB          | \n\n### 安装程序\n您可以使用下方的安装\u002F启动器，该工具基于仓库中的 `pandrator_installer_launcher.py` 文件创建；亦可直接使用源文件。请务必以管理员身份运行可执行文件。Windows 或您的防病毒软件可能会将其标记为威胁。您可以将其加入白名单，或者如果您对此不放心，也可以查看仓库中的代码并手动安装 Pandrator。","2024-11-11T05:15:32",{"id":175,"version":176,"summary_zh":177,"released_at":178},62842,"v.0295","> 要更新到此版本，请下载安装程序可执行文件，并用它替换旧版本，然后重新安装 Pandrator，或直接下载其中一个软件包。别忘了在启动器内更新 Pandrator——WhisperX 和 EasyXTTSTrainer 也已更新。\n\n### 更改内容\n\n本次更新：\n\n- 进一步优化了训练流程，重点在于微调 Silero VAD 参数。\n- 修复了安装程序中的一些问题，尤其是更新过程。\n- 修正了配音会话的加载功能（现在应该可以加载一个会话，选择之前转录\u002F翻译好的 .srt 文件和原始视频文件，重新生成部分或全部配音内容，随后轻松创建一个新的同步版本，而无需手动删除文件）。\n\n### 自包含软件包\n我准备了一些可以直接解压使用的软件包（压缩文件），其中所有内容都预装在一个独立的便携式 Conda 环境中。您可以从 **[这里](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=sLidui)** 下载。\n\n您可以通过启动器来启动 Pandrator、更新它以及安装新功能。\n\n| 软件包 | 内容                                                   | 解压后大小 | \n|---------|-------------------------------------------------------------|---------------|\n| 1       | Pandrator 和 Silero                                        | 4GB           | \n| 2       | Pandrator 和 XTTS（仅 CPU） | 7GB |\n| 3       | Pandrator 和 XTTS（Nvidia GPU 支持）                                          | 14GB          | \n| 4       | Pandrator、XTTS、RVC、WhisperX（用于配音和训练）以及 XTTS 微调 | 36GB          | \n\n### 安装程序\n您可以使用下方的安装程序\u002F启动器，它是根据仓库中的 `pandrator_installer_launcher.py` 
文件制作的；也可以直接使用源代码文件。请务必以管理员身份运行该可执行文件。Windows 或您的杀毒软件可能会将其标记为威胁。您可以将其加入白名单，或者如果您对此不放心，也可以查看仓库中的代码并手动安装 Pandrator。","2024-11-07T04:54:52",{"id":180,"version":181,"summary_zh":182,"released_at":183},62843,"v.0.29","这是一次非常小的更新，主要修复了几个与依赖相关的 bug，并优化了训练工作流（特别是源音频的分割和精修流程）。由于启动器也进行了更新，如果您受到影响，请用新的启动器可执行文件替换旧版本，然后再更新 Pandrator。\n\n### 自包含软件包\n我准备了一些可以直接解压使用的软件包（压缩文件），其中所有内容都已预装在独立的便携式 Conda 环境中。您可以从**[这里](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=sLidui)**下载。\n\n您可以通过启动器来启动 Pandrator、更新它以及安装新功能。\n\n| 软件包 | 内容                                                   | 解压后大小 | \n|---------|-------------------------------------------------------------|---------------|\n| 1       | Pandrator 和 Silero                                        | 4GB           | \n| 2       | Pandrator 和 XTTS（仅 CPU） | 7GB |\n| 3       | Pandrator 和 XTTS（支持 Nvidia GPU）                                          | 14GB          | \n| 4       | Pandrator、XTTS、RVC、WhisperX（用于配音和训练）以及 XTTS 微调 | 36GB          | \n\n### 安装程序\n您可以使用下方的安装程序\u002F启动器，它是基于仓库中的 `pandrator_installer_launcher.py` 文件生成的；也可以直接使用源文件。请务必以管理员身份运行该可执行文件。Windows 或您的杀毒软件可能会将其标记为威胁。您可以将其加入白名单，或者如果您对此不放心，可以查看仓库中的代码并手动安装 Pandrator。","2024-11-04T05:29:14",{"id":185,"version":186,"summary_zh":187,"released_at":188},62844,"v.0.28","本次更新对 Easy XTTS 训练器进行了多项改进，旨在提升训练后模型的质量，并为训练过程提供更精细的控制。\n\n![python_ROFgHz97wb](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F336d9813-d9d0-464d-bc0c-86813cf7e2ab)\n\n* **音频分割优化：** 训练器现在会通过定位音频中最安静的点来确定各片段之间的最佳分割位置。这种方法能够实现更平滑的片段过渡，减少突兀的切断或混入前一或下一片段部分内容的情况，从而提升合成语音的整体质量和自然度，并有助于消除伪音等瑕疵。\n\n* **集成式音频预处理：** 您现在可以直接在 Pandrator 中将以下音频处理步骤作为训练流程的一部分应用：\n    * **归一化：** 将音频归一化至目标 LUFS 值（默认为 -16.0）。使用 `--normalize \u003Cvalue>` 可指定不同的目标值。\n    * **去齿音：** 通过 `--dess` 标志降低咝音。\n    * **降噪：** 使用 DeepFilterNet 进行降噪，启用 `--denoise`。\n    * **动态范围压缩：** 利用 `--compress` 选项，配合针对男声、女声或中性声音的预设配置文件。\n    * **采样率控制：** 使用 `--sample-rate` 显式设置采样率（22050Hz 或 44100Hz）。推荐使用 22050Hz。\n\n* **训练选项：**\n   
 * **训练集与验证集划分：** `--training-proportion` 参数（例如 `--training-proportion 8_2`）现可控制训练集与验证集的划分比例。\n    * **分割方法：** 训练器支持三种分割方式：`maximise-punctuation`、`punctuation-only` 和 `mixed`。其中，`--method-proportion` 参数用于控制混合模式下各子方法的比例。\n\n* **Pandrator 集成：** 训练好的模型以及参考音频样本（共两个：一个来自最长 10% 片段中的随机片段，另一个来自最长 70% 片段中的最快片段）会自动在 Pandrator 中可用，以便立即进行语音生成，这一点与以往版本一致。\n\n这些改进使得训练过程的控制更加精准，并有望生成更高品质的自定义 XTTS 语音。\n\n### 自包含软件包\n我已准备好了若干软件包（压缩文件），您只需解压即可使用——所有内容均已预先安装在独立的便携式 conda 环境中。您可以从 **[这里](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=sLidui)** 下载。\n\n您可以通过启动器来运行 Pandrator、更新软件以及安装新功能。\n\n| 软件包 | 内容                                                   | 解压后大小 | \n|---------|-------------------------------------------------------------|---------------|\n| 1       | Pandrator 和 Silero                                        | 4GB           | \n| 2       | Pandrator 和 XTTS（仅 CPU） | 7GB |\n| 3       | Pandrator 和 XTTS                                          | 14GB          | \n| 4       | Pandrator, X","2024-11-02T05:08:01",{"id":190,"version":191,"summary_zh":192,"released_at":193},62845,"v.0.27",">__编辑__（10月28日）：此前存在一个 bug，会导致 Pandrator 在特定情况下无法启动。该问题现已修复。如果您受到影响，请从本次发布中下载启动器，并使用更新功能。\n\n这是一次非常小的更新。我新增了在进行文本提取之前对 PDF 文件进行裁剪的功能（用于去除页眉和页脚），同时还支持使用 [PyCropPDF](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FPyCropPDF.git) 删除 TTS 不需要的页面（例如封面或目录）：\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fdf01a53f-19b4-4e5d-83be-2fed7923ace8\n\n您可以通过启动器中的“更新”选项来完成操作。\n\n### 自包含软件包\n我准备了一些可以直接解压使用的软件包（压缩文件），其中所有依赖都已预先安装在独立的便携式 conda 环境中。您可以从 **[这里](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=sLidui)** 下载。\n\n您可以通过启动器来启动 Pandrator、更新程序以及安装新功能。\n\n| 软件包 | 内容                                                   | 解压后大小 | \n|---------|-------------------------------------------------------------|---------------|\n| 1       | Pandrator 和 Silero                                        | 4GB           | \n| 2       | Pandrator 和 XTTS  
                                        | 14GB          | \n| 3       | Pandrator、XTTS、RVC、WhisperX（用于配音）以及 XTTS 微调 | 36GB          | \n\n### 安装程序\n您可以使用下方的安装程序\u002F启动器，它是由仓库中的 `pandrator_installer_launcher.py` 文件生成的；也可以直接使用源代码文件。请务必以管理员身份运行可执行文件。Windows 或您的防病毒软件可能会将其标记为威胁。您可以将其加入白名单，或者如果您对此不放心，也可以查看仓库中的代码并手动安装 Pandrator。","2024-10-26T02:06:24",{"id":195,"version":196,"summary_zh":197,"released_at":198},62846,"v.0.26","本次发布主要针对安装程序进行了优化。我们改用 Chocolatey 而不是 winget，以提升构建工具安装的可靠性；同时，XTTS 服务器的启动流程也得到了改进。希望这能解决部分用户在通过启动器启动时，服务器无法正常上线的问题。\n\n### 自包含软件包\n我准备了可以直接解压使用的软件包（压缩文件），其中所有组件都已预先安装到独立的便携式 Conda 环境中。您可从**[这里](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=sLidui)**下载。\n\n您可以使用启动器来启动 Pandrator、更新它以及安装新功能。\n\n| 软件包 | 内容                                                   | 解压后大小 | \n|--------|---------------------------------------------------------|------------|\n| 1      | Pandrator 和 Silero                                        | 4GB        | \n| 2      | Pandrator 和 XTTS                                          | 14GB       | \n| 3      | Pandrator、XTTS、RVC、WhisperX（用于配音）以及 XTTS 微调 | 36GB       | \n\n### 安装程序\n您可以使用下方的安装程序\u002F启动器，该程序基于仓库中的 `pandrator_installer_launcher.py` 文件生成；也可以直接使用源代码文件。请务必以管理员身份运行可执行文件。Windows 或您的杀毒软件可能会将其误报为威胁，请将其加入白名单；如果您对此不放心，也可以查看仓库中的源代码，并手动安装 Pandrator。","2024-10-19T01:32:41",{"id":200,"version":201,"summary_zh":202,"released_at":203},62847,"v.0.25","### 更新内容\n- 新增了为待重写的句子打标记并将其保存到列表的功能，可通过按钮、按下“m”键或右键点击来实现。这一功能在生成较长文本时尤为实用——您可以标记出有问题的句子，稍后再进行修改（右键点击会同时保存当前正在播放的句子及其前一句，而按下“m”键则仅保存当前句；如果听音频时未查看播放列表，可能难以及时找到对应的句子）。\n- 增加了使用 yt-dlp 从 YouTube（及其他网络平台）下载视频的功能（适用于配音\u002F字幕\u002F翻译工作流）。\n- 优化了元数据选项及处理方式。\n- 修复了一些小 bug，并进行了其他改进。\n\n![python_UWLQrHAIGy](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F4fe7c4e2-9e2c-477f-a2d3-2eab5216219a)\n\n### 自包含软件包\n我准备了一些可以直接解压使用的软件包（压缩文件），其中所有依赖都已预先安装在独立的便携式 Conda 环境中。您可以使用启动器来运行 Pandrator、更新程序以及安装新功能，具体操作取决于您下载的软件包版本。\n\n| 软件包 | 内容                            
                       | 解压后大小 | 下载链接     |\n|--------|---------------------------------------------------------|------------|--------------|\n| 1      | Pandrator 和 Silero                                        | 4GB        | [下载](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AgSiDu9lV3iMnoVipZuCpbxCWkfaCA?e=Xqbvsl) |\n| 2      | Pandrator 和 XTTS                                          | 14GB       | [下载](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AgSiDu9lV3iMnoVtKm77JrJYOrqEGw?e=sjcAMr) |\n| 3      | Pandrator、XTTS、RVC、WhisperX（用于配音）以及 XTTS 微调工具 | 36GB       | [下载](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AgSiDu9lV3iMnoVuGyd8Q_thLlS1nQ?e=vfON94) |\n\n### 安装程序\n您可以使用下方的安装程序\u002F启动器，该程序基于仓库中的 `pandrator_installer_launcher.py` 文件制作而成；也可以直接使用源文件。请务必以管理员身份运行可执行文件。Windows 或您的杀毒软件可能会将其识别为威胁，请将其加入白名单；如果您对此不放心，也可以查看仓库中的代码，并手动安装 Pandrator。","2024-10-12T04:03:33",{"id":205,"version":206,"summary_zh":207,"released_at":208},62848,"v.0.2","### 变更\n- 界面现在占满整个屏幕，分为两部分：左侧为设置，右侧为生成的句子播放器\u002F编辑器。\n- 通过并行化显著加快了长文件的预处理速度，时间缩短了3到4倍。\n- 引入了元数据功能：可以设置专辑标题、艺术家、流派，并上传封面图片。\n- 增加了对`.m4b`格式的支持。\n- 增加了章节检测功能（目前仅适用于epub文件）以及m4b文件中的章节标记功能（如果希望文件尽可能小，建议使用opus格式——即使在16k采样率下，opus在语音合成方面的表现也非常出色！）。\n- 对训练流程进行了一些小幅优化（训练完成后，会在`tts_voices`文件夹中自动创建一个包含参考样本的文件夹），并修复了RVC流程中的若干问题。\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7e5afab5-8f45-4698-9af0-13b5bcdd9bda)\n\n### 预装软件包\n您可以在**[这里](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=XsPCIB)**下载无需安装、解压即可使用的自包含软件包。所有组件均已打包在便携式conda环境中，无需额外安装。您也可以随时通过启动器安装其他组件。不过，请务必通过启动器更新Pandrator。\n\n### 安装程序\n您可以使用下方的安装\u002F启动器，它基于仓库中的`pandrator_installer_launcher.py`文件制作；或者直接使用该源文件。请注意以管理员身份运行可执行文件。Windows系统或您的杀毒软件可能会将其误报为威胁。您可以将其加入白名单，或者如果您对此不放心，也可以查看仓库中的代码并手动安装Pandrator。","2024-10-10T01:13:23",{"id":210,"version":211,"summary_zh":212,"released_at":213},62849,"v.0.15","### 变更\n除了修复 bug 和一些小的界面改进外，我还添加了微调（训练）自定义 XTTS 模型的功能。操作非常简单：只需选择一个音频文件或包含多个音频文件的文件夹，为模型命名，训练就会全自动进行。训练完成后，点击“连接到服务器”后，训练好的模型会出现在 GUI 中的“XTTS 
模型”下拉菜单中。需要配备至少 8 GB 显存的 Nvidia GPU。仅需 10 分钟左右的音频数据，就能显著提升零样本语音克隆的效果，不过我建议至少准备 30 分钟的音频。您可以尝试增加训练轮数和梯度累积层数。使用自定义模型时，仍然需要提供一段参考语音文件，您可以上传从源音频中提取的某个片段（这些片段位于 `Pandrator\u002Feasy_xtts_trainer\u002F\u003C模型名称>\u002Faudio_sources\u002Fprocessed` 目录下）。训练模型需要通过启动器安装相关工具（如果您已有安装，请下载最新的启动器可执行文件，将其放置在与 Pandrator 文件夹相同的目录下并运行即可完成安装）。\n\n![python_SeEf1P6KBF](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F0b7f424e-bd73-49f2-ad23-b3f98ac9369e)\n\n### 预装软件包\n您可以在 **[这里](https:\u002F\u002F1drv.ms\u002Ff\u002Fs!AgSiDu9lV3iMnPFKPO5BB_c72OLjtQ?e=XsPCIB)** 下载无需安装、解压即可使用的独立软件包。所有组件均已打包在便携式 Conda 环境中，无需额外安装。您也可以随时使用启动器安装其他组件。\n\n### 安装程序\n您可以使用下方的安装程序\u002F启动器，它是由仓库中的 `pandrator_installer_launcher.py` 文件生成的，或者直接使用该源文件。请务必以管理员身份运行可执行文件。Windows 或您的杀毒软件可能会将其误报为威胁。您可以将其加入白名单；如果您对此不放心，也可以查看仓库中的代码，并手动安装 Pandrator。","2024-10-06T02:38:58",{"id":215,"version":216,"summary_zh":217,"released_at":218},62850,"v.0.1","In this release, I've:\r\n\r\n- fixed splitting of Chinese and Japanese sentences, \r\n- added the option to regenerate all sentences, \r\n- changed the RVC implementation to [RVC Python](https:\u002F\u002Fgithub.com\u002Fdaswer123\u002Frvc-python) and added it to the installer as an optional tool (RVC model files are now kept in the `rvc_models` folder inside the Pandrator folder, each in its own directory; when uploading RVC models through the UI, please make sure that the .pth and .index files have the same name),\r\n- completely reworked the dubbing workflow by offloading most of it to a separate cli app, [Subdub](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FSubdub), which I made for this purpose. It is installed together with Pandrator when using the installer script or executable. 
It is now possible to select a video file, transcribe it (using WhisperX), translate the subtitles (using LLMs, including proprietary ones, or the DeepL API, which is free up to 500,000 characters a month), generate speech using the standard Pandrator workflow, mix the dubbing audio with the original soundtrack and save it to the video; it's also possible to load an .srt file and a video if transcription is not necessary,\r\n- added logging to a file and log preview in the UI,\r\n- made Pandrator connect automatically to the chosen TTS engine if opened through the launcher,\r\n- improved the UI a little.\r\n\r\nYou may use the installer\u002Flauncher below, which was created from the `pandrator_installer_launcher.py` file in the repository, or use the source file directly. Please remember to run the executable as an administrator. It's possible that Windows or your antivirus software will flag it as a threat. You may whitelist it, or, if you're not comfortable doing that, review the code in the repository and install Pandrator manually. \r\n\r\n","2024-09-28T03:30:58",{"id":220,"version":221,"summary_zh":222,"released_at":223},62851,"v.0.0.9.5","I've added an update function to the installer\u002Flauncher (to update Pandrator) and fixed some issues with RVC installation and processing. \r\n\r\nThe `.exe` installer\u002Flauncher was created using `pyinstaller` from `pandrator_installer_launcher.py` in the repository. **Please remember to run it as an administrator if you want it to install git, ffmpeg, C++ Build Tools and\u002For calibre.**\r\n\r\nIt's possible that your antivirus software flags the installer as a threat, because it is not signed. In that case, add it as an exception. 
If you're not comfortable doing that, review the code in the repository and perform a manual installation.","2024-09-10T02:24:45",{"id":225,"version":226,"summary_zh":227,"released_at":228},62852,"v.0.0.9","I've created a single installer\u002Flauncher that supports all TTS engines and RVC. This should make it much simpler to experiment with different tools. Also, advanced settings have been exposed for XTTS, including speed and temperature, and support for fine-tuned models has been added. \r\n\r\nThe `.exe` installer\u002Flauncher was created using `pyinstaller` from `pandrator_installer_launcher.py` in the repository. **Please remember to run it as an administrator if you want it to install git, ffmpeg, C++ Build Tools and\u002For calibre.**\r\n\r\nIt's possible that your antivirus software flags the installer as a threat, because it is not signed. In that case, add it as an exception. If you're not comfortable doing that, review the code in the repository and perform a manual installation.","2024-09-06T17:50:32",{"id":230,"version":231,"summary_zh":232,"released_at":233},62853,"v.0.0.8.5","The RVC implementation has been updated and it's possible to install RVC_CLI from the launcher, fully automatically. A full installation (including the dependencies) takes about 20-25 minutes with a fast internet connection (without RVC). But at least now it should be fully automated.\r\n\r\nThe `.exe` one-click installer files were created using `pyinstaller` from `pandrator_installer_launcher_xtts.py`, `pandrator_start_minimal_silero.py` and `pandrator_start_minimal_voicecraft.py` in the repository. **Please remember to run them as an administrator if you want them to install git, curl, ffmpeg, C++ Build Tools and\u002For calibre.**\r\n\r\nIt's possible that your antivirus software flags them as a threat. In that case, add them as an exception. 
If you're not comfortable doing that, review the code in the repository and perform a manual installation.","2024-09-04T20:20:43",{"id":235,"version":236,"summary_zh":237,"released_at":238},62854,"v.0.0.8","The XTTS installer has received a GUI; it's possible to install XTTS with or without GPU support and to choose whether to run XTTS with or without --deepspeed and --lowvram arguments. It also functions as a launcher. In the future Silero, VoiceCraft, RVC etc. will be added to it as well. Also, the installer checks if C++ Build Tools are installed, and, if not, installs them using winget. A full installation (including the dependencies) takes about 20-25 minutes with a fast internet connection. But at least now it should be fully automated.\r\n\r\nThe `.exe` one-click installer files were created using `pyinstaller` from `pandrator-start-minimal_xtts.py`, `pandrator_start_minimal_silero.py` and `pandrator_start_minimal_voicecraft.py` in the repository. **Please remember to run them as an administrator if you want them to install git, curl, ffmpeg, C++ Build Tools and\u002For calibre.**\r\n\r\nIt's possible that your antivirus software flags them as a threat. In that case, add them as an exception. 
If you're not comfortable doing that, review the code in the repository and perform a manual installation.","2024-08-24T03:22:22",{"id":245,"version":246,"summary_zh":247,"released_at":248},62856,"v.0.0.7","This release addresses primarily VoiceCraft and its recent updates. It adds VoiceCraft model selection to the GUI as well as advanced generation settings. The selected model will be downloaded automatically. \r\n\r\nThe `.exe` one-click installer files were created using `pyinstaller` from `pandrator-start-minimal_xtts.py`, `pandrator_start_minimal_silero.py` and `pandrator_start_minimal_voicecraft.py` in the repository. **Please remember to run them as an administrator if you want them to install git, ffmpeg and\u002For calibre.**\r\n\r\nIt's possible that your antivirus software flags them as a threat. In that case, add them as an exception. If you're not comfortable doing that, review the code in the repository and perform a manual installation.","2024-04-23T02:54:16",{"id":250,"version":251,"summary_zh":252,"released_at":253},62857,"v.0.0.6","New features:\r\n- Support for EPUB files using `ebook-convert` from [Calibre](https:\u002F\u002Fgithub.com\u002Fkovidgoyal\u002Fcalibre). \r\n\r\nImprovements:\r\n- Enhanced sentence splitting logic. \r\n\r\nThe `.exe` one-click installer files were created using `pyinstaller` from `pandrator-start-minimal_xtts.py`, `pandrator_start_minimal_silero.py` and `pandrator_start_minimal_voicecraft.py` in the repository. **Please remember to run them as an administrator if you want them to install git, ffmpeg and\u002For calibre.**\r\n\r\nIt's possible that your antivirus software flags them as a threat. In that case, add them as an exception. 
If you're not comfortable doing that, review the code in the repository and perform a manual installation.","2024-04-15T02:46:23",{"id":255,"version":256,"summary_zh":257,"released_at":258},62858,"v0.0.5","New features:\r\n- The ability to use PDF files as input (they are converted to .txt, and before the final conversion happens you can see a preview, enable or disable paragraph retention and edit the text). It is not perfect and uses a relatively simple conversion method, so results may vary depending on the layout complexity of the input PDF and other factors. You may use the LLM workflow to try and remove OCR artifacts\u002Fmisspelled words etc. I'm looking for a better conversion method, and if you have any suggestions, please let me know. \r\n- The option to select an external (remote) XTTS server, for example hosted on a service like RunPod or a Google Colab like [this one]() created by the author of XTTS Api Server. \r\n\r\nFixes:\r\n- Corrected dependencies (`ffmpeg-python`).\r\n- Improved lowering of the original track's volume during subtitle speech segments when mixing the synchronized audio output with a video track. \r\n- Minor UI improvements. \r\n\r\nThe `.exe` files were created using `pyinstaller` from `pandrator-start-minimal_xtts.py`, `pandrator_start_minimal_silero.py` and `pandrator_start_minimal_voicecraft.py` in the repository. **Please remember to run them as an administrator.**\r\n\r\nIt's possible that your antivirus software flags them as a threat. In that case, add them as an exception. If you're not comfortable doing that, review the code in the repository and perform a manual installation.","2024-04-07T01:21:55",{"id":260,"version":261,"summary_zh":262,"released_at":263},62859,"v0.0.4","- Added the [VoiceCraft](https:\u002F\u002Fgithub.com\u002Fjasonppy\u002FVoiceCraft) model through the [VoiceCraft API Server](https:\u002F\u002Fgithub.com\u002Flukaszliniewicz\u002FVoiceCraft_API) I made for this purpose. 
You can install Pandrator with VoiceCraft using the `pandrator_start_minimal_voicecraft.exe`. \r\n- It is now possible to generate speech from an `.srt` file and automatically mix it with a video's sound track. \r\n\r\nThe `.exe` files were created using `pyinstaller` from `pandrator-start-minimal_xtts.py`, `pandrator_start_minimal_silero.py` and `pandrator_start_minimal_voicecraft.py` in the repository. **Please remember to run them as an administrator.**\r\n\r\nIt's possible that your antivirus software flags them as a threat. In that case, add them as an exception. If you're not comfortable doing that, review the code in the repository and perform a manual installation.","2024-04-03T22:53:43"]