[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-KoljaB--RealtimeTTS":3,"tool-KoljaB--RealtimeTTS":64},[4,23,32,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":22},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85092,2,"2026-04-10T11:13:16",[13,14,15,16,17,18,19,20,21],"图像","数据工具","视频","插件","Agent","其他","语言模型","开发框架","音频","ready",{"id":24,"name":25,"github_repo":26,"description_zh":27,"stars":28,"difficulty_score":29,"last_commit_at":30,"category_tags":31,"status":22},4128,"GPT-SoVITS","RVC-Boss\u002FGPT-SoVITS","GPT-SoVITS 是一款强大的开源语音合成与声音克隆工具，旨在让用户仅需极少量的音频数据即可训练出高质量的个性化语音模型。它核心解决了传统语音合成技术依赖海量录音数据、门槛高且成本大的痛点，实现了“零样本”和“少样本”的快速建模：用户只需提供 5 秒参考音频即可即时生成语音，或使用 1 分钟数据进行微调，从而获得高度逼真且相似度极佳的声音效果。\n\n该工具特别适合内容创作者、独立开发者、研究人员以及希望为角色配音的普通用户使用。其内置的友好 WebUI 界面集成了人声伴奏分离、自动数据集切片、中文语音识别及文本标注等辅助功能，极大地降低了数据准备和模型训练的技术门槛，让非专业人士也能轻松上手。\n\n在技术亮点方面，GPT-SoVITS 不仅支持中、英、日、韩、粤语等多语言跨语种合成，还具备卓越的推理速度，在主流显卡上可实现实时甚至超实时的生成效率。无论是需要快速制作视频配音，还是进行多语言语音交互研究，GPT-SoVITS 都能以极低的数据成本提供专业级的语音合成体验。",56375,3,"2026-04-05T22:15:46",[21],{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":29,"last_commit_at":38,"category_tags":39,"status":22},2863,"TTS","coqui-ai\u002FTTS","🐸TTS 是一款功能强大的深度学习文本转语音（Text-to-Speech）开源库，旨在将文字自然流畅地转化为逼真的人声。它解决了传统语音合成技术中声音机械生硬、多语言支持不足以及定制门槛高等痛点，让高质量的语音生成变得触手可及。\n\n无论是希望快速集成语音功能的开发者，还是致力于探索前沿算法的研究人员，亦或是需要定制专属声音的数据科学家，🐸TTS 都能提供得力支持。它不仅预置了覆盖全球 1100 多种语言的训练模型，让用户能够即刻上手，还提供了完善的工具链，支持用户利用自有数据训练新模型或对现有模型进行微调，轻松实现特定风格的声音克隆。\n\n在技术亮点方面，🐸TTS 表现卓越。其最新的 ⓍTTSv2 模型支持 16 种语言，并在整体性能上大幅提升，实现了低于 200 毫秒的超低延迟流式输出，极大提升了实时交互体验。此外，它还无缝集成了 🐶Bark、🐢Tortoise 等社区热门模型，并支持调用上千个 Fairseq 模型，展现了极强的兼容性与扩展性。配合丰富的数据集分析与整理工具，🐸TTS 已成为科研与生产环境中备受信赖的语音合成解决方案。",44971,"2026-04-03T14:47:02",[21,20,13],{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":29,"last_commit_at":46,"category_tags":47,"status":22},2375,"LocalAI","mudler\u002FLocalAI","LocalAI 是一款开源的本地人工智能引擎，旨在让用户在任意硬件上轻松运行各类 AI 模型，包括大语言模型、图像生成、语音识别及视频处理等。它的核心优势在于彻底打破了高性能计算的门槛，无需昂贵的专用 GPU，仅凭普通 CPU 或常见的消费级显卡（如 NVIDIA、AMD、Intel 及 Apple Silicon）即可部署和运行复杂的 AI 任务。\n\n对于担心数据隐私的用户而言，LocalAI 提供了“隐私优先”的解决方案，确保所有数据处理均在本地基础设施内完成，无需上传至云端。同时，它完美兼容 OpenAI、Anthropic 等主流 API 接口，这意味着开发者可以无缝迁移现有应用，直接利用本地资源替代云服务，既降低了成本又提升了可控性。\n\nLocalAI 内置了超过 35 种后端支持（如 llama.cpp、vLLM、Whisper 等），并集成了自主 AI 代理、工具调用及检索增强生成（RAG）等高级功能，且具备多用户管理与权限控制能力。无论是希望保护敏感数据的企业开发者、进行算法实验的研究人员，还是想要在个人电脑上体验最新 AI 技术的极客玩家，都能通过 LocalAI 获",44782,"2026-04-02T22:14:26",[13,21,19,17,20,14,16],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":29,"last_commit_at":54,"category_tags":55,"status":22},3108,"bark","suno-ai\u002Fbark","Bark 是由 Suno 推出的开源生成式音频模型，能够根据文本提示创造出高度逼真的多语言语音、音乐、背景噪音及简单音效。与传统仅能朗读文字的语音合成工具不同，Bark 基于 Transformer 架构，不仅能模拟说话，还能生成笑声、叹息、哭泣等非语言声音，甚至能处理带有情感色彩和语气停顿的复杂文本，极大地丰富了音频表达的可能性。\n\n它主要解决了传统语音合成声音机械、缺乏情感以及无法生成非语音类音效的痛点，让创作者能通过简单的文字描述获得生动自然的音频素材。无论是需要为视频配音的内容创作者、探索多模态生成的研究人员，还是希望快速原型设计的开发者，都能从中受益。普通用户也可通过集成的演示页面轻松体验其神奇效果。\n\n技术亮点方面，Bark 支持商业使用（MIT 许可），并在近期更新中实现了显著的推理速度提升，同时提供了适配低显存 GPU 
的版本，降低了使用门槛。此外，社区还建立了丰富的提示词库，帮助用户更好地驾驭模型生成特定风格的声音。只需几行 Python 代码，即可将创意文本转化为高质量音频，是连接文字与声音世界的强大桥梁。",39067,"2026-04-04T03:33:35",[21],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":29,"last_commit_at":62,"category_tags":63,"status":22},5908,"ChatTTS","2noise\u002FChatTTS","ChatTTS 是一款专为日常对话场景打造的生成式语音模型，特别适用于大语言模型助手等交互式应用。它主要解决了传统文本转语音（TTS）技术在对话中缺乏自然感、情感表达单一以及难以处理停顿、笑声等细微语气的问题，让机器生成的语音听起来更像真人在聊天。\n\n这款工具非常适合开发者、研究人员以及希望为应用增添自然语音交互功能的设计师使用。普通用户也可以通过社区开发的衍生产品体验其能力。ChatTTS 的核心亮点在于其对对话任务的深度优化：它不仅支持中英文双语，还能精准控制韵律细节，自动生成自然的 laughter（笑声）、pauses（停顿）和 interjections（插入语），从而实现多说话人的互动对话效果。在韵律表现上，ChatTTS 超越了大多数开源 TTS 模型。目前开源版本基于 4 万小时数据预训练而成，虽主要用于学术研究与教育目的，但已展现出强大的潜力，并支持流式音频生成与零样本推理，为后续的多情绪控制等进阶功能奠定了基础。",39042,"2026-04-09T11:54:03",[19,17,20,21],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":105,"forks":106,"last_commit_at":107,"license":108,"difficulty_score":10,"env_os":109,"env_gpu":110,"env_ram":111,"env_deps":112,"category_tags":126,"github_topics":127,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":22,"created_at":132,"updated_at":133,"faqs":134,"releases":170},7837,"KoljaB\u002FRealtimeTTS","RealtimeTTS","Converts text to speech in realtime","RealtimeTTS 是一款专为实时应用打造的高性能文本转语音（TTS）开源库。它核心解决了传统语音合成延迟高、响应慢的痛点，能够将文字流几乎即时地转化为清晰自然的语音输出，完美适配大语言模型（LLM）的流式生成场景，让 AI 对话不再有明显的等待停顿。\n\n这款工具非常适合开发者构建实时语音助手、交互式客服系统或需要低延迟反馈的音频应用。其最大亮点在于强大的兼容性与稳定性：不仅支持 OpenAI、Elevenlabs、Azure、Coqui 等二十多种主流及前沿语音引擎，还内置了智能故障转移机制。当主用引擎出现波动时，它能自动切换至备用方案，确保服务持续稳定运行。此外，RealtimeTTS 具备多语言能力，并持续集成如 PocketTTS 等轻量级新引擎，在保持高质量音质的同时进一步优化了 CPU 占用与响应速度。无论是希望快速原型的工程师，还是追求极致体验的研究人员，都能通过它轻松实现流畅的实时语音交互。","# RealtimeTTS\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002FRealtimeTTS)](https:\u002F\u002Fpypi.org\u002Fproject\u002FRealtimeTTS\u002F)\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKoljaB_RealtimeTTS_readme_a1326e50ca2a.png)](https:\u002F\u002Fwww.pepy.tech\u002Fprojects\u002Frealtimetts)\n[![GitHub release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002FKoljaB\u002FRealtimeTTS.svg)](https:\u002F\u002FGitHub.com\u002FKoljaB\u002FRealtimeTTS\u002Freleases\u002F)\n[![GitHub commits](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKoljaB_RealtimeTTS_readme_ab755b94e79b.png)](https:\u002F\u002FGitHub.com\u002FNaereen\u002FKoljaB\u002FRealtimeTTS\u002Fcommit\u002F)\n[![GitHub forks](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002FKoljaB\u002FRealtimeTTS.svg?style=social&label=Fork&maxAge=2592000)](https:\u002F\u002FGitHub.com\u002FKoljaB\u002FRealtimeTTS\u002Fnetwork\u002F)\n[![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FKoljaB\u002FRealtimeTTS.svg?style=social&label=Star&maxAge=2592000)](https:\u002F\u002FGitHub.com\u002FKoljaB\u002FRealtimeTTS\u002Fstargazers\u002F)\n\n*Easy to use, low-latency text-to-speech library for realtime applications*\n\n## About the Project\n\nRealtimeTTS is a state-of-the-art text-to-speech (TTS) library designed for real-time applications. 
It stands out in its ability to quickly convert text streams into high-quality auditory output with minimal latency.\n\n> **Important:** [Installation](#installation) has changed to allow more customization. Please use `pip install realtimetts[all]` instead of `pip install realtimetts` now. More [info here](#installation).\n\n> **Hint:** *\u003Cstrong>Check out [Linguflex](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FLinguflex)\u003C\u002Fstrong>, the original project from which RealtimeTTS is spun off. It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.*\n\nhttps:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fassets\u002F7604638\u002F87dcd9a5-3a4e-4f57-be45-837fc63237e7\n\n## Key Features\n\n- **Low Latency**\n  - almost instantaneous text-to-speech conversion\n  - compatible with LLM outputs\n- **High-Quality Audio**\n  - generates clear and natural-sounding speech\n- **Multiple TTS Engine Support**\n  - supports OpenAI TTS, Elevenlabs, Azure Speech Services, Coqui TTS, StyleTTS2, Piper, gTTS, Edge TTS, Parler TTS, Kokoro, Cartesia, Faster Qwen 3, NeuTTS, PocketTTS, Modelslab, CAMB AI, MiniMax and System TTS\n- **Multilingual**\n- **Robust and Reliable**:\n  - ensures continuous operation through a fallback mechanism\n  - switches to alternative engines in case of disruptions, guaranteeing consistent performance and reliability, which is vital for critical and professional use cases\n\n> **Hint**: *check out [RealtimeSTT](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeSTT), the input counterpart of this library, for speech-to-text capabilities. Together, they form a powerful realtime audio wrapper around large language models.*\n\n## FAQ\n\nCheck the [FAQ page](.\u002FFAQ.md) for answers to a lot of questions around the usage of RealtimeTTS.\n\n## Documentation\n\nThe documentation for **RealtimeTTS** is available in the following languages:\n\n- **[English](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fen\u002F)**\n- **[French](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Ffr\u002F)**\n- **[Spanish](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fes\u002F)**\n- **[German](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fde\u002F)**\n- **[Italian](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fit\u002F)**\n- **[Chinese](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fzh\u002F)**\n- **[Japanese](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fja\u002F)**\n- **[Hindi](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fhi\u002F)**\n- **[Korean](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fko\u002F)**\n\n---\n\n## Updates\n\n- **New Engine:** PocketTTSEngine\n  - **Installation:** `pip install pocket-tts`\n  - Kyutai Labs' lightweight 100M parameter TTS, CPU-optimized (~6x real-time)\n  - Voice cloning via WAV files, ~200ms latency, 8 built-in voices\n\n- **New Engine:** NeuTTSEngine\n  - On-device voice cloning TTS with 3-second reference audio\n  - **Installation:** Clone from https:\u002F\u002Fgithub.com\u002Fneuphonic\u002Fneutts\n\n- **New Engine:** ZipVoiceEngine\n  - **Installation:** `pip install RealtimeTTS`\n  - **Test File Example:** [zipvoice_test.py](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002Ftests\u002Fzipvoice_test.py)\n\n- **New Engine:** OrpheusEngine\n  - **Installation:** `pip install RealtimeTTS[orpheus]`\n  - **Test File Example:** [orpheus_test.py](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002Ftests\u002Forpheus_test.py)\n\n- **New Engine:** KokoroEngine\n  - **Installation:** `pip install RealtimeTTS[kokoro]`\n  - **Test File Example:** [kokoro_test.py](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002Ftests\u002Fkokoro_test.py)\n\nSupport for more Kokoro languages. For a full installation that also covers Japanese and Chinese (see the updated test file): \n```shell\npip install \"RealtimeTTS[kokoro,jp,zh]\"\n```\n\nIf you run into problems with Japanese (Error \"module 'jieba' has no attribute 'lcut'\") try:\n```shell\npip uninstall jieba jieba3k\npip install jieba\n```\n\n\n- **New Engine:** PiperEngine\n  - **Installation Tutorial:** [Watch on YouTube](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=GGvdq3giiTQ)\n  - **Test File Example:** [piper_test.py](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002Ftests\u002Fpiper_test.py)\n\nStyleTTS2 engine:\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fd1634012-ba53-4445-a43a-7042826eedd7\n\nEdgeTTS engine:\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F73ec6258-23ba-4bc6-acc7-7351a13c5509\n\nSee [release history](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Freleases).\n\nAdded ParlerEngine. Needs flash attention; even then it barely runs fast enough for realtime inference on a 4090.\n\nParler Installation for Windows (after installing RealtimeTTS):\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fparler-tts.git\npip install torch==2.3.1+cu121 torchaudio==2.3.1 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\npip install https:\u002F\u002Fgithub.com\u002Foobabooga\u002Fflash-attention\u002Freleases\u002Fdownload\u002Fv2.6.3\u002Fflash_attn-2.6.3+cu122torch2.3.1cxx11abiFALSE-cp310-cp310-win_amd64.whl\npip install \"numpy\u003C2\"\n```\n\n## Tech Stack\n\nThis library uses:\n\n- **Text-to-Speech Engines**\n  - **OpenAIEngine** 🌐: OpenAI's TTS with 6 premium voices\n  - **CoquiEngine** 🏠: High-quality neural TTS with local processing\n  - **AzureEngine** 🌐: Microsoft's TTS with 500k free chars\u002Fmonth\n  - **ElevenlabsEngine** 🌐: Premium voice quality with extensive options\n  - **GTTSEngine** 🌐: Free Google Translate TTS, no GPU needed\n  - **EdgeEngine** 🌐: Edge free TTS service (Microsoft Azure)\n  - **ParlerEngine** 🏠: Local neural TTS for high-end GPUs\n  - **SystemEngine** 🏠: Built-in system TTS for quick setup\n  - **PiperEngine** 🏠: Very fast TTS system, also runs on Raspberry Pi \n  - **StyleTTS2Engine** 🏠: Expressive, natural speech\n  - **OrpheusEngine** 🏠: Llama‑powered TTS with emotion tags\n  - **CambEngine** 🌐: CAMB AI MARS models with 140+ languages\n  - **MiniMaxEngine** 🌐: MiniMax Cloud TTS with 12 voice presets\n  - **ZipVoiceEngine** 🏠: 123M zero‑shot model, state‑of‑the‑art quality\n  - **PocketTTSEngine** 🏠: Kyutai Labs 100M model, CPU-optimized with voice cloning\n  - **NeuTTSEngine** 🏠: Voice cloning with 3-second reference audio\n  - **CartesiaEngine** 🌐: Fast, API-based high-quality synthesis\n  - **FasterQwenEngine** 🏠: Local, fast, high-quality voice cloning\n  - **ModelsLabEngine** 🌐: API-based TTS\n  - **OmnivoiceEngine** 🏠: Hundreds of languages, very high quality voice cloning\n  \n\n🏠 Local processing (no internet required)\n🌐 Requires internet connection\n\n- **Sentence 
Boundary Detection**\n  - **NLTK Sentence Tokenizer**: Natural Language Toolkit's sentence tokenizer for straightforward text-to-speech tasks in English or when simplicity is preferred.\n  - **Stanza Sentence Tokenizer**: Stanza sentence tokenizer for working with multilingual text or when higher accuracy and performance are required.\n\n*By using \"industry standard\" components RealtimeTTS offers a reliable, high-end technological foundation for developing advanced voice solutions.*\n\n## Installation\n\n> **Note:** Basic Installation with `pip install realtimetts` is not recommended anymore, use `pip install realtimetts[all]` instead.\n\n> **Note:** Set `output_device_index` in TextToAudioStream if needed. Linux users: Install portaudio via `apt-get install -y portaudio19-dev` | MacOS users: Install portaudio via `brew install portaudio`\n\nThe RealtimeTTS library provides installation options for various dependencies for your use case. Here are the different ways you can install RealtimeTTS depending on your needs:\n\n### Full Installation\n\nTo install RealtimeTTS with support for all TTS engines:\n\n```bash\npip install -U realtimetts[all]\n```\n\n### Custom Installation\n\nInstall only required dependencies using these options:\n\n- **all**: Complete package with all engines\n- **system**: Local system TTS via pyttsx3\n- **azure**: Azure Speech Services support\n- **elevenlabs**: ElevenLabs API integration\n- **openai**: OpenAI TTS services\n- **gtts**: Google Text-to-Speech\n- **edge**: Microsoft Edge TTS\n- **coqui**: Coqui TTS engine\n- **camb**: CAMB AI MARS TTS\n- **minimax**: MiniMax Cloud TTS\n- **minimal**: Core package only (for custom engine development)\n\nExample: `pip install realtimetts[all]`, `pip install realtimetts[azure]`, `pip install realtimetts[azure,elevenlabs,openai]`\n\n### Virtual Environment Installation\n\nFor those who want to perform a full installation within a virtual environment, follow these steps:\n\n```bash\npython -m venv env_realtimetts\nenv_realtimetts\\Scripts\\activate.bat\npython.exe -m pip install --upgrade pip\npip install -U realtimetts[all]\n```\n\nMore information about [CUDA installation](#cuda-installation).\n\n## Engine Requirements\n\nDifferent engines supported by RealtimeTTS have unique requirements. Ensure you fulfill these requirements based on the engine you choose.\n\n### SystemEngine\nThe `SystemEngine` works out of the box with your system's built-in TTS capabilities. No additional setup is needed.\n\n### GTTSEngine\nThe `GTTSEngine` works out of the box using Google Translate's text-to-speech API. 
No additional setup is needed.\n\n### OpenAIEngine\nTo use the `OpenAIEngine`:\n- set environment variable OPENAI_API_KEY\n- install ffmpeg (see [CUDA installation](#cuda-installation) point 3)\n\n### AzureEngine\nTo use the `AzureEngine`, you will need:\n- Microsoft Azure Text-to-Speech API key (provided via AzureEngine constructor parameter \"speech_key\" or in the environment variable AZURE_SPEECH_KEY)\n- Microsoft Azure service region.\n\nMake sure you have these credentials available and correctly configured when initializing the `AzureEngine`.\n\n### CambEngine\nTo use the `CambEngine`, you need:\n- CAMB AI API key (provided via CambEngine constructor parameter \"api_key\" or in the environment variable CAMB_API_KEY)\n- `mpv` installed on your system (essential for streaming audio).\n- Available models: `mars-flash` (low-latency), `mars-pro` (high-fidelity), `mars-instruct` (instruction-following)\n- 140+ languages via BCP-47 codes (e.g., `en-us`, `es-es`, `ja-jp`)\n\n### MiniMaxEngine\nTo use the `MiniMaxEngine`, you need:\n- MiniMax API key (provided via MiniMaxEngine constructor parameter \"api_key\" or in the environment variable MINIMAX_API_KEY)\n- Available models: `speech-2.8-hd` (high quality), `speech-2.8-turbo` (fast)\n- 12 voice presets including English and multilingual options\n\n### ElevenlabsEngine\nFor the `ElevenlabsEngine`, you need:\n- Elevenlabs API key (provided via ElevenlabsEngine constructor parameter \"api_key\" or in the environment variable ELEVENLABS_API_KEY)\n- `mpv` installed on your system (essential for streaming mpeg audio, Elevenlabs only delivers mpeg).\n\n  🔹 **Installing `mpv`:**\n  - **macOS**:\n    ```bash\n    brew install mpv\n    ```\n\n  - **Linux and Windows**: Visit [mpv.io](https:\u002F\u002Fmpv.io\u002F) for installation instructions.\n\n### PiperEngine\n\n**PiperEngine** offers high-quality, real-time text-to-speech synthesis using the Piper model.\n\n- **Separate Installation:**\n  - Piper must be installed independently from RealtimeTTS. Follow the [Piper installation tutorial for Windows](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=GGvdq3giiTQ).\n\n- **Configuration:**\n  - Provide the correct paths to the Piper executable and voice model files when initializing `PiperEngine`.\n  - Ensure that the `PiperVoice` is correctly set up with the model and configuration files.\n\n### CoquiEngine\n\nDelivers high quality, local, neural TTS with voice-cloning.\n\nDownloads a neural TTS model first. In most cases it will be fast enough for realtime use with GPU synthesis. Needs around 4-5 GB VRAM.\n\n- to clone a voice, submit the filename of a WAV file containing the source voice as the \"voice\" parameter to the CoquiEngine constructor\n- voice cloning works best with a 22050 Hz mono 16-bit WAV file containing a short (~5-30 sec) sample\n\nOn most systems GPU support will be needed to run fast enough for realtime, otherwise you will experience stuttering.\n\n## Quick Start\n\nHere's a basic usage example:\n\n```python\nfrom RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine\n\nengine = SystemEngine() # replace with your TTS engine\nstream = TextToAudioStream(engine)\nstream.feed(\"Hello world! 
How are you today?\")\nstream.play_async()\n```\n\n## Feed Text\n\nYou can feed individual strings:\n\n```python\nstream.feed(\"Hello, this is a sentence.\")\n```\n\nOr you can feed generators and character iterators for real-time streaming:\n\n```python\ndef write(prompt: str):\n    for chunk in openai.ChatCompletion.create(\n        model=\"gpt-3.5-turbo\",\n        messages=[{\"role\": \"user\", \"content\" : prompt}],\n        stream=True\n    ):\n        if (text_chunk := chunk[\"choices\"][0][\"delta\"].get(\"content\")) is not None:\n            yield text_chunk\n\ntext_stream = write(\"A three-sentence relaxing speech.\")\n\nstream.feed(text_stream)\n```\n\n```python\nchar_iterator = iter(\"Streaming this character by character.\")\nstream.feed(char_iterator)\n```\n\n## Playback\n\nAsynchronously:\n\n```python\nstream.play_async()\nwhile stream.is_playing():\n    time.sleep(0.1)\n```\n\nSynchronously:\n\n```python\nstream.play()\n```\n\n## Testing the Library\n\nThe test subdirectory contains a set of scripts to help you evaluate and understand the capabilities of the RealtimeTTS library.\n\nNote that most of the tests still rely on the \"old\" OpenAI API (\u003C1.0.0). Usage of the new OpenAI API is demonstrated in openai_1.0_test.py.\n\n- **simple_test.py**\n    - **Description**: A \"hello world\" styled demonstration of the library's simplest usage.\n\n- **complex_test.py**\n    - **Description**: A comprehensive demonstration showcasing most of the features provided by the library.\n\n- **coqui_test.py**\n    - **Description**: Test of local coqui TTS engine.\n\n- **translator.py**\n    - **Dependencies**: Run `pip install openai realtimestt`.\n    - **Description**: Real-time translations into six different languages.\n\n- **openai_voice_interface.py**\n    - **Dependencies**: Run `pip install openai realtimestt`.\n    - **Description**: Wake word activated and voice based user interface to the OpenAI API.\n\n- **advanced_talk.py**\n    - **Dependencies**: Run `pip install openai keyboard realtimestt`.\n    - **Description**: Choose TTS engine and voice before starting AI conversation.\n\n- **minimalistic_talkbot.py**\n    - **Dependencies**: Run `pip install openai realtimestt`.\n    - **Description**: A basic talkbot in 20 lines of code.\n\n- **simple_llm_test.py**\n    - **Dependencies**: Run `pip install openai`.\n    - **Description**: Simple demonstration of how to integrate the library with large language models (LLMs).\n\n- **test_callbacks.py**\n    - **Dependencies**: Run `pip install openai`.\n    - **Description**: Showcases the callbacks and lets you check the latency times in a real-world application environment.\n\n## Pause, Resume & Stop\n\nPause the audio stream:\n\n```python\nstream.pause()\n```\n\nResume a paused stream:\n\n```python\nstream.resume()\n```\n\nStop the stream immediately:\n\n```python\nstream.stop()\n```\n\n## Requirements Explained\n\n- **Python Version**:\n  - **Required**: Python >= 3.9, \u003C 3.13\n  - **Reason**: The library depends on the GitHub library \"TTS\" from coqui, which requires Python versions in this range.\n\n- **PyAudio**: to create an output audio stream\n\n- **stream2sentence**: to split the incoming text stream into sentences\n\n- **pyttsx3**: System text-to-speech conversion engine\n\n- **pydub**: to convert audio chunk formats\n\n- **azure-cognitiveservices-speech**: Azure text-to-speech conversion engine\n\n- **elevenlabs**: Elevenlabs text-to-speech conversion engine\n\n- **coqui-TTS**: Coqui's XTTS text-to-speech 
library for high-quality local neural TTS\n\n  Shoutout to [Idiap Research Institute](https:\u002F\u002Fgithub.com\u002Fidiap) for maintaining a [fork of coqui tts](https:\u002F\u002Fgithub.com\u002Fidiap\u002Fcoqui-ai-TTS).\n\n- **openai**: to interact with OpenAI's TTS API\n\n- **gtts**: Google translate text-to-speech conversion\n\n\n## Configuration\n\n### Initialization Parameters for `TextToAudioStream`\n\nWhen you initialize the `TextToAudioStream` class, you have various options to customize its behavior. Here are the available parameters:\n\n#### `engine` (BaseEngine)\n- **Type**: `Union[BaseEngine, List[BaseEngine]]`\n- **Required**: Yes\n- **Description**: The core engine(s) used for text-to-audio synthesis.  \n  - If a single engine instance is provided, it will be used for all synthesis tasks.  \n  - If a list of engine instances is provided, the system uses them for fallback mechanisms.  \n\n#### `on_text_stream_start` (callable)\n- **Type**: `Callable`\n- **Required**: No\n- **Description**: A callback function triggered when the text streaming process begins.  \n  - **Use Case**: Displaying a \"Processing...\" status message or initializing resources.  \n  - **Signature**: `on_text_stream_start() -> None`.\n\n#### `on_text_stream_stop` (callable)\n- **Type**: `Callable`\n- **Required**: No\n- **Description**: A callback function triggered when the text streaming process ends.  \n  - **Use Case**: Cleaning up resources or signaling that the text-to-speech pipeline has completed processing.  \n  - **Signature**: `on_text_stream_stop() -> None`.\n\n#### `on_audio_stream_start` (callable)\n- **Type**: `Callable`\n- **Required**: No\n- **Description**: A callback function triggered when the audio playback starts.  \n  - **Use Case**: Logging playback events or updating UI elements to reflect active audio playback.  \n  - **Signature**: `on_audio_stream_start() -> None`.\n\n#### `on_audio_stream_stop` (callable)\n- **Type**: `Callable`\n- **Required**: No\n- **Description**: A callback function triggered when the audio playback ends.  \n  - **Use Case**: Resetting UI elements or initiating follow-up actions after playback.  \n  - **Signature**: `on_audio_stream_stop() -> None`.\n\n#### `on_character` (callable)\n- **Type**: `Callable`\n- **Required**: No\n- **Description**: A callback function triggered for every character processed during synthesis.  \n  - **Use Case**: Real-time visualization of character-level processing, useful for debugging or monitoring.  \n  - **Signature**: `on_character(character: str) -> None`.\n\n#### `on_word` (callable, optional)\n- **Type**: `Callable`\n- **Required**: No\n- **Default**: `None`\n- **Description**: A callback function triggered when a word starts playing. The callback receives an object (an instance of `TimingInfo`) that includes:\n  - **word**: the text of the word,\n  - **start_time**: the time offset (in seconds) when the word starts,\n  - **end_time**: the time offset (in seconds) when the word ends.\n- **Use Case**: Useful for tracking word-level progress or highlighting spoken words in a display.\n- **Notes**: Currently supported only by AzureEngine and KokoroEngine (for English voices, both American and British). Other engines don't provide word-level timings.\n\n#### `output_device_index` (int) ❗ NOT SUPPORTED for ElevenlabsEngine and EdgeEngine (MPV playout)\n- **Type**: `int`\n- **Required**: No\n- **Default**: `None`\n- **Description**: The index of the audio output device to use for playback.  
\n  - **How It Works**: The system will use the device corresponding to this index for audio playback. If `None`, the system's default audio output device is used.  \n  - **Obtaining Device Indices**: Use PyAudio's device query methods to retrieve available indices.\n\n#### `mpv_audio_device` (str) For ElevenlabsEngine and EdgeEngine (MPV playout)\n- **Type**: `str`\n- **Required**: No\n- **Default**: `None`\n- **Description**: The name of the audio device to use for playback.\n  - **How It Works**: The system will use the device corresponding to this name for audio playback. If `None`, the system's default audio output device is used.\n  - **Obtaining Device Names**: Use `mpv --audio-device=help` to get the device names.\n\n#### `tokenizer` (string)\n- **Type**: `str`\n- **Required**: No\n- **Default**: `\"nltk\"`\n- **Description**: Specifies the tokenizer used for splitting text into sentences or fragments.  \n  - **Supported Options**: `\"nltk\"` (default) and `\"stanza\"`.  \n  - **Custom Tokenization**: You can provide a custom tokenizer by setting the `tokenize_sentences` parameter instead.\n\n#### `language` (string)\n- **Type**: `str`\n- **Required**: No\n- **Default**: `\"en\"`\n- **Description**: Language code for sentence splitting.  \n  - **Examples**: `\"en\"` for English, `\"de\"` for German, `\"fr\"` for French.  \n  - Ensure that the tokenizer supports the specified language.\n\n#### `muted` (bool)\n- **Type**: `bool`\n- **Required**: No\n- **Default**: `False`\n- **Description**: Controls whether audio playback is muted.\n  - If `True`, audio playback is disabled and no audio stream will be opened, allowing the synthesis to generate audio data without playing it.  \n  - **Use Case**: Useful for scenarios where you want to save audio to a file or process audio chunks without hearing the output.\n\n#### `frames_per_buffer` (int)\n- **Type**: `int`\n- **Required**: No\n- **Default**: `pa.paFramesPerBufferUnspecified`\n- **Description**: Defines the number of audio frames processed per buffer by PyAudio.  \n  - **Implications**:  \n    - Lower values reduce latency but increase CPU usage.  \n    - Higher values increase latency but reduce CPU load.  \n  - If set to `pa.paFramesPerBufferUnspecified`, PyAudio selects a default value based on the platform and hardware.\n\n##### `comma_silence_duration` (float)  \n- **Type**: `float`  \n- **Required**: No  \n- **Default**: `0.0`  \n- **Description**: Duration of silence (in seconds) inserted after a comma.  \n\n##### `sentence_silence_duration` (float)  \n- **Type**: `float`  \n- **Required**: No  \n- **Default**: `0.0`  \n- **Description**: Duration of silence (in seconds) inserted after the end of a sentence.  \n\n##### `default_silence_duration` (float)  \n- **Type**: `float`  \n- **Required**: No  \n- **Default**: `0.0`  \n- **Description**: Default silence duration (in seconds) between fragments when no punctuation rule applies.  \n\n#### `playout_chunk_size` (int)\n- **Type**: `int`\n- **Required**: No\n- **Default**: `-1`\n- **Description**: Specifies the size of audio chunks (in bytes) to play out to the stream.  \n  - **Behavior**:  \n    - If `-1`, the chunk size is determined dynamically based on `frames_per_buffer` or a default internal value.  \n    - Smaller chunk sizes can reduce latency but may increase overhead.  \n    - Larger chunk sizes improve efficiency but may introduce playback delays.  
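\n\nAs a minimal sketch of how the buffering parameters above interact (the values are only illustrative, and `SystemEngine` stands in for whichever engine you use):\n\n```python\nimport logging\n\nfrom RealtimeTTS import TextToAudioStream, SystemEngine\n\n# Illustrative trade-off: larger buffers favor smooth playback over low latency.\nengine = SystemEngine()\nstream = TextToAudioStream(\n    engine,\n    frames_per_buffer=2048,   # more frames per buffer: higher latency, lower CPU usage\n    playout_chunk_size=4096,  # larger playout chunks: more efficient, may delay playback\n    level=logging.INFO,       # raise log verbosity while tuning\n)\nstream.feed(\"Testing buffered playout.\")\nstream.play()\n```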
\n\n#### `level` (int)\n- **Type**: `int`\n- **Required**: No\n- **Default**: `logging.WARNING`\n- **Description**: Sets the logging level for the internal logger.  \n  - **Examples**:  \n    - `logging.DEBUG`: Detailed information for debugging.  \n    - `logging.INFO`: General runtime information.  \n    - `logging.WARNING`: Warnings about potential issues.  \n    - `logging.ERROR`: Serious errors requiring attention.\n\n#### Example Usage:\n\n```python\nengine = YourEngine()  # Substitute with your engine\nstream = TextToAudioStream(\n    engine=engine,\n    on_text_stream_start=my_text_start_func,\n    on_text_stream_stop=my_text_stop_func,\n    on_audio_stream_start=my_audio_start_func,\n    on_audio_stream_stop=my_audio_stop_func,\n    level=logging.INFO\n)\n```\n\n### Methods\n\n#### `play` and `play_async`\n\nThese methods are responsible for executing the text-to-audio synthesis and playing the audio stream. The difference is that `play` is a blocking function, while `play_async` runs in a separate thread, allowing other operations to proceed.\n\n##### Parameters:\n\n###### `fast_sentence_fragment` (bool)\n- **Default**: `True`\n- **Description**: When set to `True`, the method will prioritize speed, generating and playing sentence fragments faster. This is useful for applications where latency matters.\n\n###### `fast_sentence_fragment_allsentences` (bool)\n- **Default**: `False`\n- **Description**: When set to `True`, applies the fast sentence fragment processing to all sentences, not just the first one.\n\n###### `fast_sentence_fragment_allsentences_multiple` (bool)\n- **Default**: `False`\n- **Description**: When set to `True`, allows yielding multiple sentence fragments instead of just a single one.\n\n###### `buffer_threshold_seconds` (float)\n- **Default**: `0.0`\n- **Description**: Specifies the time in seconds for the buffering threshold, which impacts the smoothness and continuity of audio playback.\n\n  - **How it Works**: Before synthesizing a new sentence, the system checks if there is more audio material left in the buffer than the time specified by `buffer_threshold_seconds`. If so, it retrieves another sentence from the text generator, assuming that it can fetch and synthesize this new sentence within the time window provided by the remaining audio in the buffer. This process allows the text-to-speech engine to have more context for better synthesis, enhancing the user experience.\n\n  A higher value ensures that there's more pre-buffered audio, reducing the likelihood of silence or gaps during playback. If you experience breaks or pauses, consider increasing this value.\n\n###### `minimum_sentence_length` (int)\n- **Default**: `10`\n- **Description**: Sets the minimum character length to consider a string as a sentence to be synthesized. This affects how text chunks are processed and played.\n\n###### `minimum_first_fragment_length` (int)\n- **Default**: `10`\n- **Description**: The minimum number of characters required for the first sentence fragment before yielding.\n\n###### `log_synthesized_text` (bool)\n- **Default**: `False`\n- **Description**: When enabled, logs the text chunks as they are synthesized into audio. 
Helpful for auditing and debugging.\n\n###### `reset_generated_text` (bool)\n- **Default**: `True`\n- **Description**: If True, reset the generated text before processing.\n\n###### `output_wavfile` (str)\n- **Default**: `None`\n- **Description**: If set, save the audio to the specified WAV file.\n\n###### `on_sentence_synthesized` (callable)\n- **Default**: `None`\n- **Description**: A callback function that gets called after a single sentence fragment was synthesized.\n\n###### `before_sentence_synthesized` (callable)\n- **Default**: `None`\n- **Description**: A callback function that gets called before a single sentence fragment gets synthesized.\n\n###### `on_audio_chunk` (callable)\n- **Default**: `None`\n- **Description**: Callback function that gets called when a single audio chunk is ready.\n\n###### `tokenizer` (str)\n- **Default**: `\"nltk\"`\n- **Description**: Tokenizer to use for sentence splitting. Currently supports \"nltk\" and \"stanza\".\n\n###### `tokenize_sentences` (callable)\n- **Default**: `None`\n- **Description**: A custom function that tokenizes sentences from the input text. You can provide your own lightweight tokenizer if you are unhappy with nltk and stanza. It should take text as a string and return split sentences as a list of strings.\n\n###### `language` (str)\n- **Default**: `\"en\"`\n- **Description**: Language to use for sentence splitting.\n\n###### `context_size` (int)\n- **Default**: `12`\n- **Description**: The number of characters used to establish context for sentence boundary detection. A larger context improves the accuracy of detecting sentence boundaries.\n\n###### `context_size_look_overhead` (int)\n- **Default**: `12`\n- **Description**: Additional context size for looking ahead when detecting sentence boundaries.\n\n###### `muted` (bool)\n- **Default**: `False`\n- **Description**: If True, disables audio playback via local speakers. Useful when you want to synthesize to a file or process audio chunks without playing them.\n\n###### `sentence_fragment_delimiters` (str)\n- **Default**: `\".?!;:,\\n…)]}。-\"`\n- **Description**: A string of characters that are considered sentence delimiters.\n\n###### `force_first_fragment_after_words` (int)\n- **Default**: `15`\n- **Description**: The number of words after which the first sentence fragment is forced to be yielded.\n\n### CUDA installation\n\nThese steps are recommended for those who require **better performance** and have a compatible NVIDIA GPU.\n\n> **Note**: *to check if your NVIDIA GPU supports CUDA, visit the [official CUDA GPUs list](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-gpus).*\n\nTo use PyTorch with CUDA support, please follow these steps:\n\n> **Note**: *newer PyTorch installations [may](https:\u002F\u002Fstackoverflow.com\u002Fa\u002F77069523) (unverified) not need Toolkit (and possibly cuDNN) installation anymore.*\n\n1. **Install NVIDIA CUDA Toolkit**:\n    For example, to install Toolkit 12.X, please\n    - Visit [NVIDIA CUDA Downloads](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads).\n    - Select your operating system, system architecture, and OS version.\n    - Download and install the software.\n\n    or to install Toolkit 11.8, please\n    - Visit [NVIDIA CUDA Toolkit Archive](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-11-8-0-download-archive).\n    - Select your operating system, system architecture, and OS version.\n    - Download and install the software.\n\n2. 
**Install NVIDIA cuDNN**:\n\n    For example, to install cuDNN 8.7.0 for CUDA 11.x please\n    - Visit [NVIDIA cuDNN Archive](https:\u002F\u002Fdeveloper.nvidia.com\u002Frdp\u002Fcudnn-archive).\n    - Click on \"Download cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x\".\n    - Download and install the software.\n\n3. **Install ffmpeg**:\n\n    You can download an installer for your OS from the [ffmpeg Website](https:\u002F\u002Fffmpeg.org\u002Fdownload.html).\n\n    Or use a package manager:\n\n    - **On Ubuntu or Debian**:\n        ```bash\n        sudo apt update && sudo apt install ffmpeg\n        ```\n\n    - **On Arch Linux**:\n        ```bash\n        sudo pacman -S ffmpeg\n        ```\n\n    - **On MacOS using Homebrew** ([https:\u002F\u002Fbrew.sh\u002F](https:\u002F\u002Fbrew.sh\u002F)):\n        ```bash\n        brew install ffmpeg\n        ```\n\n    - **On Windows using Chocolatey** ([https:\u002F\u002Fchocolatey.org\u002F](https:\u002F\u002Fchocolatey.org\u002F)):\n        ```bash\n        choco install ffmpeg\n        ```\n\n    - **On Windows using Scoop** ([https:\u002F\u002Fscoop.sh\u002F](https:\u002F\u002Fscoop.sh\u002F)):\n        ```bash\n        scoop install ffmpeg\n        ```\n\n4. **Install PyTorch with CUDA support**:\n\n    To upgrade your PyTorch installation to enable GPU support with CUDA, follow these instructions based on your specific CUDA version. This is useful if you wish to enhance the performance of RealtimeTTS with CUDA capabilities.\n\n    - **For CUDA 11.8:**\n\n        To update PyTorch and Torchaudio to support CUDA 11.8, use the following commands:\n\n        ```bash\n        pip install torch==2.5.1+cu118 torchaudio==2.5.1 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n        ```\n\n    - **For CUDA 12.X:**\n\n\n        To update PyTorch and Torchaudio to support CUDA 12.X, execute the following:\n\n        ```bash\n        pip install torch==2.5.1+cu121 torchaudio==2.5.1 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n        ```\n\n    Replace `2.5.1` with the version of PyTorch that matches your system and requirements.\n\n5. **Fix to resolve compatibility issues**:\n    If you run into library compatibility issues, try setting these libraries to fixed versions:\n\n    ```bash\n    pip install networkx==2.8.8\n    pip install typing_extensions==4.8.0\n    pip install fsspec==2023.6.0\n    pip install imageio==2.31.6\n    pip install numpy==1.24.3\n    pip install requests==2.31.0\n    ```\n\n## 💖 Acknowledgements\n\nHuge shoutout to the team behind [Coqui AI](https:\u002F\u002Fcoqui.ai\u002F) - especially the brilliant [Eren Gölge](https:\u002F\u002Fgithub.com\u002Ferogol) - for being the first to give us local high-quality synthesis with real-time speed and even a clonable voice!\n\nThank you [Pierre Nicolas Durette](https:\u002F\u002Fgithub.com\u002Fpndurette) for giving us a free TTS, usable without a GPU, through Google Translate via his gtts Python library.\n\n## Contribution\n\nContributions are always welcome (e.g. PR to add a new engine).\n\n## License Information\n\n### ❗ Important Note:\nWhile the source of this library is open-source, the usage of many of the engines it depends on is not: External engine providers often restrict commercial use in their free plans. 
This means the engines can be used for noncommercial projects, but commercial usage requires a paid plan.\n\n### Engine Licenses Summary:\n\n#### CoquiEngine\n- **License**: Open-source only for noncommercial projects.\n- **Commercial Use**: Requires a paid plan.\n- **Details**: [CoquiEngine License](https:\u002F\u002Fcoqui.ai\u002Fcpml)\n\n#### ElevenlabsEngine\n- **License**: Open-source only for noncommercial projects.\n- **Commercial Use**: Available with every paid plan.\n- **Details**: [ElevenlabsEngine License](https:\u002F\u002Fhelp.elevenlabs.io\u002Fhc\u002Fen-us\u002Farticles\u002F13313564601361-Can-I-publish-the-content-I-generate-on-the-platform-)\n\n#### AzureEngine\n- **License**: Open-source only for noncommercial projects.\n- **Commercial Use**: Available from the standard tier upwards.\n- **Details**: [AzureEngine License](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fanswers\u002Fquestions\u002F1192398\u002Fcan-i-use-azure-text-to-speech-for-commercial-usag)\n\n#### SystemEngine\n- **License**: Mozilla Public License 2.0 and GNU Lesser General Public License (LGPL) version 3.0.\n- **Commercial Use**: Allowed under this license.\n- **Details**: [SystemEngine License](https:\u002F\u002Fgithub.com\u002Fnateshmbhat\u002Fpyttsx3\u002Fblob\u002Fmaster\u002FLICENSE)\n\n#### GTTSEngine\n- **License**: MIT license\n- **Commercial Use**: It's under the MIT license, so it should be theoretically possible. Some caution might be necessary since it utilizes undocumented Google Translate speech functionality.\n- **Details**: [GTTS MIT License](https:\u002F\u002Fgithub.com\u002Fpndurette\u002FgTTS\u002Fblob\u002Fmain\u002FLICENSE)\n\n#### OpenAIEngine\n- **License**: please read [OpenAI Terms of Use](https:\u002F\u002Fopenai.com\u002Fpolicies\u002Fterms-of-use)\n\n**Disclaimer**: This is a summarization of the licenses as understood at the time of writing. It is not legal advice. 
Please read and respect the licenses of the different engine providers if you plan to use them in a project.\n\n## Contributors\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Ftraceloop\u002Fopenllmetry\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg alt=\"contributors\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKoljaB_RealtimeTTS_readme_1f54367972fe.png\"\u002F>\n\u003C\u002Fa>\n\n## Audio licensing\n\nAudio samples derived from the EARS dataset by Meta (Facebook Research):\nhttps:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Ffacebookresearch\u002Fears_dataset\n\nLicensed under CC BY-NC 4.0:\nhttps:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-nc\u002F4.0\u002F\n\n## Author\n\nKolja Beigel\nEmail: kolja.beigel@web.de\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#realtimetts\" target=\"_blank\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBack%20to%20Top-000000?style=for-the-badge\" alt=\"Back to Top\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n","# RealtimeTTS\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002FRealtimeTTS)](https:\u002F\u002Fpypi.org\u002Fproject\u002FRealtimeTTS\u002F)\n[![Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKoljaB_RealtimeTTS_readme_a1326e50ca2a.png)](https:\u002F\u002Fwww.pepy.tech\u002Fprojects\u002Frealtimetts)\n[![GitHub release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002FKoljaB\u002FRealtimeTTS.svg)](https:\u002F\u002FGitHub.com\u002FKoljaB\u002FRealtimeTTS\u002Freleases\u002F)\n[![GitHub commits](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKoljaB_RealtimeTTS_readme_ab755b94e79b.png)](https:\u002F\u002FGitHub.com\u002FNaereen\u002FKoljaB\u002FRealtimeTTS\u002Fcommit\u002F)\n[![GitHub forks](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002FKoljaB\u002FRealtimeTTS.svg?style=social&label=Fork&maxAge=2592000)](https:\u002F\u002FGitHub.com\u002FKoljaB\u002FRealtimeTTS\u002Fnetwork\u002F)\n[![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FKoljaB\u002FRealtimeTTS.svg?style=social&label=Star&maxAge=2592000)](https:\u002F\u002FGitHub.com\u002FKoljaB\u002FRealtimeTTS\u002Fstargazers\u002F)\n\n*易于使用、低延迟的实时文本转语音库*\n\n## 项目简介\n\nRealtimeTTS 是一款面向实时应用的先进文本转语音（TTS）库。它以能够快速将文本流转换为高质量音频输出，同时保持极低的延迟而著称。\n\n> **重要提示：** [安装方式](#installation)已更新，以便提供更多自定义选项。请使用 `pip install realtimetts[all]` 而不是 `pip install realtimetts`。更多[信息请点击这里](#installation)。\n\n> **提示：** *\u003Cstrong>请查看 [Linguflex](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FLinguflex)\u003C\u002Fstrong>，这是 RealtimeTTS 的原始项目。它允许您通过语音控制环境，是目前功能最强大、最复杂的开源助手之一。*\n\nhttps:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fassets\u002F7604638\u002F87dcd9a5-3a4e-4f57-be45-837fc63237e7\n\n## 核心特性\n\n- **低延迟**\n  - 几乎即时的文本转语音转换\n  - 兼容大型语言模型的输出\n- **高质量音频**\n  - 生成清晰自然的语音\n- **多引擎支持**\n  - 支持 OpenAI TTS、Elevenlabs、Azure Speech Services、Coqui TTS、StyleTTS2、Piper、gTTS、Edge TTS、Parler TTS、Kokoro、Cartesia、Faster Qwen 3、NeuTTS、PocketTTS、Modelslab、CAMB AI、MiniMax 和 System TTS\n- **多语言支持**\n- **稳定可靠**：\n  - 通过回退机制确保持续运行\n  - 在出现中断时自动切换到备用引擎，从而保证一致的性能和可靠性，这对于关键和专业应用场景至关重要\n\n> **提示**：*请查看 [RealtimeSTT](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeSTT)，该库的输入端对应产品，提供语音转文本功能。两者结合可为大型语言模型构建强大的实时音频处理框架。*\n\n## 常见问题解答\n\n请访问[常见问题解答页面](.\u002FFAQ.md)，获取关于 RealtimeTTS 使用的大量解答。\n\n## 文档\n\n**RealtimeTTS** 的文档提供以下语言版本：\n\n- **[英语](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fen\u002F)**\n- 
**[法语](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Ffr\u002F)**\n- **[西班牙语](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fes\u002F)**\n- **[德语](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fde\u002F)**\n- **[意大利语](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fit\u002F)**\n- **[中文](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fzh\u002F)**\n- **[日语](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fja\u002F)**\n- **[印地语](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fhi\u002F)**\n- **[韩语](https:\u002F\u002Fkoljab.github.io\u002FRealtimeTTS\u002Fko\u002F)**\n\n---\n\n## 更新内容\n\n- **新引擎：** PocketTTSEngine\n  - **安装方法：** `pip install pocket-tts`\n  - Kyutai Labs 推出的轻量级 100M 参数 TTS，专为 CPU 优化（约 6 倍实时速度）\n  - 支持通过 WAV 文件进行语音克隆，延迟约 200 毫秒，内置 8 种声音\n\n- **新引擎：** NeuTTSEngine\n  - 设备端语音克隆 TTS，仅需 3 秒参考音频\n  - **安装方法：** 从 https:\u002F\u002Fgithub.com\u002Fneuphonic\u002Fneutts 克隆\n\n- **新引擎：** ZipVoiceEngine\n  - **安装方法：** `pip install RealtimeTTS`\n  - **测试文件示例：** [zipvoice_test.py](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002Ftests\u002Fzipvoice_test.py)\n\n- **新引擎：** OrpheusEngine\n  - **安装方法：** `pip install RealtimeTTS[orpheus]`\n  - **测试文件示例：** [orpheus_test.py](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002Ftests\u002Forpheus_test.py)\n\n- **新引擎：** KokoroEngine\n  - **安装方法：** `pip install RealtimeTTS[kokoro]`\n  - **测试文件示例：** [kokoro_test.py](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002Ftests\u002Fkokoro_test.py)\n\n支持更多 Kokoro 语言。完整安装还包括日语和中文（请参阅更新后的测试文件）：\n```shell\npip install \"RealtimeTTS[kokoro,jp,zh]\"\n```\n\n如果在使用日语时遇到问题（错误“模块 'jieba' 没有属性 'lcut'”），请尝试：\n```shell\npip uninstall jieba jieba3k\npip install jieba\n```\n\n\n- **新引擎：** PiperEngine\n  - **安装教程：** [观看 YouTube 视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=GGvdq3giiTQ)\n  - **测试文件示例：** [piper_test.py](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002Ftests\u002Fpiper_test.py)\n\nStyleTTS2 引擎：\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fd1634012-ba53-4445-a43a-7042826eedd7\n\nEdgeTTS 引擎：\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F73ec6258-23ba-4bc6-acc7-7351a13c5509\n\n更多信息请参阅[发布历史](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Freleases)。\n\n新增 ParlerEngine。需要 Flash Attention，即便如此，在 4090 显卡上也仅勉强达到实时推理的速度。\n\nWindows 系统下 Parler 的安装步骤（在安装 RealtimeTTS 后）：\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fparler-tts.git\npip install torch==2.3.1+cu121 torchaudio==2.3.1 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\npip install https:\u002F\u002Fgithub.com\u002Foobabooga\u002Fflash-attention\u002Freleases\u002Fdownload\u002Fv2.6.3\u002Fflash_attn-2.6.3+cu122torch2.3.1cxx11abiFALSE-cp310-cp310-win_amd64.whl\npip install \"numpy\u003C2\"\n```\n\n## 技术栈\n\n本库使用以下组件：\n\n- **文本转语音引擎**\n  - **OpenAIEngine** 🌐：OpenAI 的 TTS，提供 6 种优质语音\n  - **CoquiEngine** 🏠：高质量神经网络 TTS，支持本地处理\n  - **AzureEngine** 🌐：微软的 TTS，每月免费提供 50 万字符\n  - **ElevenlabsEngine** 🌐：优质语音质量，选项丰富\n  - **GTTSEngine** 🌐：免费的 Google 翻译 TTS，无需 GPU\n  - **EdgeEngine** 🌐：Edge 免费 TTS 服务（基于 Microsoft Azure）\n  - **ParlerEngine** 🏠：适用于高端 GPU 的本地神经网络 TTS\n  - **SystemEngine** 🏠：内置系统 TTS，快速部署\n  - **PiperEngine** 🏠：超快速 TTS 系统，也可在 Raspberry Pi 上运行\n  - **StyleTTS2Engine** 🏠：富有表现力、自然的语音合成\n  - **OrpheusEngine** 
🏠：基于 Llama 的 TTS，支持情感标签\n  - **CambEngine** 🌐：CAMB AI MARS 模型，支持 140 多种语言\n  - **MiniMaxEngine** 🌐：MiniMax 云 TTS，提供 12 种预设语音\n  - **ZipVoiceEngine** 🏠：1.23 亿参数的零样本模型，达到最先进水平\n  - **PocketTTSEngine** 🏠：Kyutai Labs 1 亿参数模型，专为 CPU 优化并支持语音克隆\n  - **NeuTTSEngine** 🏠：仅需 3 秒参考音频即可实现语音克隆\n  - **CartesiaEngine** 🌐：基于 API 的快速高质量语音合成\n  - **FasterQwenEngine** 🏠：本地高速高质量语音克隆\n  - **ModelsLabEngine** 🌐：基于 API 的 TTS\n  - **OmnivoiceEngine** 🏠：支持数百种语言，具备极高品质的语音克隆能力\n\n🏠 本地处理（无需互联网）\n🌐 需要互联网连接\n\n- **句子边界检测**\n  - **NLTK 句子分词器**：自然语言工具包提供的句子分词器，适用于简单的英语文本转语音任务或追求简洁性的情况。\n  - **Stanza 句子分词器**：Stanza 句子分词器，适合多语言文本处理或对准确性和性能要求较高的场景。\n\n*通过使用“行业标准”组件，RealtimeTTS 提供了一个可靠、高端的技术基础，用于开发先进的语音解决方案。*\n\n## 安装\n\n> **注意**：不再推荐使用 `pip install realtimetts` 进行基础安装，建议改用 `pip install realtimetts[all]`。\n\n> **注意**：如有需要，请在 TextToAudioStream 中设置 `output_device_index`。Linux 用户：通过 `apt-get install -y portaudio19-dev` 安装 PortAudio；MacOS 用户：通过 `brew install portaudio` 安装 PortAudio。\n\nRealtimeTTS 库提供了针对不同使用场景的依赖项安装选项。以下是根据您的需求安装 RealtimeTTS 的几种方式：\n\n### 完整安装\n\n要安装支持所有 TTS 引擎的 RealtimeTTS：\n\n```bash\npip install -U realtimetts[all]\n```\n\n### 自定义安装\n\n仅安装所需依赖项，可选择以下选项：\n\n- **all**：包含所有引擎的完整包\n- **system**：通过 pyttsx3 使用本地系统 TTS\n- **azure**：支持 Azure 语音服务\n- **elevenlabs**：集成 ElevenLabs API\n- **openai**：OpenAI TTS 服务\n- **gtts**：Google 文本转语音\n- **edge**：Microsoft Edge TTS\n- **coqui**：Coqui TTS 引擎\n- **camb**：CAMB AI MARS TTS\n- **minimax**：MiniMax 云 TTS\n- **minimal**：仅核心包（用于自定义引擎开发）\n\n示例：`pip install realtimetts[all]`、`pip install realtimetts[azure]`、`pip install realtimetts[azure,elevenlabs,openai]`。\n\n### 虚拟环境安装\n\n若希望在虚拟环境中进行完整安装，请按以下步骤操作：\n\n```bash\npython -m venv env_realtimetts\nenv_realtimetts\\Scripts\\activate.bat\npython.exe -m pip install --upgrade pip\npip install -U realtimetts[all]\n```\n\n更多关于 [CUDA 安装](#cuda-installation) 的信息。\n\n## 引擎要求\n\nRealtimeTTS 支持的不同引擎具有各自独特的依赖和配置要求。请根据所选引擎确保满足相应条件。\n\n### SystemEngine\n`SystemEngine` 可直接利用系统自带的 TTS 功能，无需额外设置。\n\n### GTTSEngine\n`GTTSEngine` 直接使用 Google 翻译的文本转语音 API，无需额外配置。\n\n### OpenAIEngine\n要使用 `OpenAIEngine`：\n- 设置环境变量 `OPENAI_API_KEY`\n- 安装 FFmpeg（参见 [CUDA 安装](#cuda-installation) 第 3 点）。\n\n### AzureEngine\n要使用 `AzureEngine`，您需要：\n- Microsoft Azure 文本转语音 API 密钥（可通过 `AzureEngine` 构造函数参数 `speech_key` 或环境变量 `AZURE_SPEECH_KEY` 提供）\n- Microsoft Azure 服务区域。\n\n请确保在初始化 `AzureEngine` 时已准备好并正确配置这些凭据。\n\n### CambEngine\n要使用 `CambEngine`，您需要：\n- CAMB AI API 密钥（可通过 `CambEngine` 构造函数参数 `api_key` 或环境变量 `CAMB_API_KEY` 提供）\n- 系统中已安装 `mpv`（用于流式播放音频）。\n- 可用模型包括：`mars-flash`（低延迟）、`mars-pro`（高保真）、`mars-instruct`（指令跟随）。\n- 支持 140 多种语言，使用 BCP-47 代码表示（如 `en-us`、`es-es`、`ja-jp`）。\n\n### MiniMaxEngine\n要使用 `MiniMaxEngine`，您需要：\n- MiniMax API 密钥（可通过 `MiniMaxEngine` 构造函数参数 `api_key` 或环境变量 `MINIMAX_API_KEY` 提供）\n- 可用模型包括：`speech-2.8-hd`（高质量）、`speech-2.8-turbo`（快速）。\n- 提供 12 种预设语音，涵盖英语及多种语言。\n\n### ElevenlabsEngine\n对于 `ElevenlabsEngine`，您需要：\n- Elevenlabs API 密钥（可通过 `ElevenlabsEngine` 构造函数参数 `api_key` 或环境变量 `ELEVENLABS_API_KEY` 提供）\n- 系统中已安装 `mpv`（用于播放 Elevenlabs 仅提供的 MPEG 格式音频）。\n\n  🔹 **安装 `mpv`：**\n  - **macOS**：\n    ```bash\n    brew install mpv\n    ```\n\n  - **Linux 和 Windows**：请访问 [mpv.io](https:\u002F\u002Fmpv.io\u002F) 查看安装说明。\n\n### PiperEngine\n\n**PiperEngine** 使用 Piper 模型提供高质量的实时文本转语音合成。\n\n- **单独安装：**\n  - Piper 必须独立于 RealtimeTTS 安装。请参考 [Windows 平台 Piper 安装教程](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=GGvdq3giiTQ)。\n\n- **配置：**\n  - 初始化 `PiperEngine` 时，需正确指定 Piper 可执行文件和语音模型文件的路径。\n  - 确保 `PiperVoice` 已正确设置，并配备了相应的模型和配置文件。\n\n### CoquiEngine\n\n提供高质量的本地神经网络 
TTS，并支持语音克隆功能。\n\n该引擎会先下载一个神经网络 TTS 模型。在大多数情况下，借助 GPU 合成即可满足实时需求。通常需要约 4–5 GB 显存。\n\n- 要进行语音克隆，需将包含源语音的 WAV 文件名作为 `voice` 参数传递给 CoquiEngine 构造函数。\n- 最佳的语音克隆输入应为采样率 22050 Hz、单声道、16 位的 WAV 文件，且录音时长较短（约 5–30 秒）。\n\n在大多数系统上，若想达到实时效果，必须启用 GPU 加速，否则可能会出现卡顿现象。\n\n## 快速入门\n\n以下是一个基本的使用示例：\n\n```python\nfrom RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine\n\nengine = SystemEngine() # 替换为您使用的TTS引擎\nstream = TextToAudioStream(engine)\nstream.feed(\"你好，世界！你今天过得怎么样？\")\nstream.play_async()\n```\n\n## 输入文本\n\n您可以输入单个字符串：\n\n```python\nstream.feed(\"你好，这是一句话。\")\n```\n\n或者您也可以输入生成器和字符迭代器来实现实时流式传输：\n\n```python\ndef write(prompt: str):\n    for chunk in openai.ChatCompletion.create(\n        model=\"gpt-3.5-turbo\",\n        messages=[{\"role\": \"user\", \"content\" : prompt}],\n        stream=True\n    ):\n        if (text_chunk := chunk[\"choices\"][0][\"delta\"].get(\"content\")) is not None:\n            yield text_chunk\n\ntext_stream = write(\"一段由三句话组成的放松演讲。\")\n\nstream.feed(text_stream)\n```\n\n```python\nchar_iterator = iter(\"逐字流式传输。\")\nstream.feed(char_iterator)\n```\n\n## 播放\n\n异步播放：\n\n```python\nstream.play_async()\nwhile stream.is_playing():\n    time.sleep(0.1)\n```\n\n同步播放：\n\n```python\nstream.play()\n```\n\n## 测试库\n\n测试子目录包含一组脚本，可以帮助您评估和理解RealtimeTTS库的功能。\n\n请注意，大多数测试仍然依赖于“旧”OpenAI API（\u003C1.0.0）。新OpenAI API的用法在openai_1.0_test.py中进行了演示。\n\n- **simple_test.py**\n    - **描述**：一个类似于“hello world”的演示，展示了该库最简单的用法。\n\n- **complex_test.py**\n    - **描述**：一个全面的演示，展示了该库提供的大部分功能。\n\n- **coqui_test.py**\n    - **描述**：本地Coqui TTS引擎的测试。\n\n- **translator.py**\n    - **依赖项**：运行`pip install openai realtimestt`。\n    - **描述**：实时翻译成六种不同的语言。\n\n- **openai_voice_interface.py**\n    - **依赖项**：运行`pip install openai realtimestt`。\n    - **描述**：唤醒词激活且基于语音的OpenAI API用户界面。\n\n- **advanced_talk.py**\n    - **依赖项**：运行`pip install openai keyboard realtimestt`。\n    - **描述**：在开始AI对话之前选择TTS引擎和声音。\n\n- **minimalistic_talkbot.py**\n    - **依赖项**：运行`pip install openai realtimestt`。\n    - **描述**：一个20行代码的基本聊天机器人。\n\n- **simple_llm_test.py**\n    - **依赖项**：运行`pip install openai`。\n    - **描述**：简单演示如何将该库与大型语言模型（LLMs）集成。\n\n- **test_callbacks.py**\n    - **依赖项**：运行`pip install openai`。\n    - **描述**：展示了回调函数，并让您在实际应用环境中检查延迟时间。\n\n## 暂停、恢复与停止\n\n暂停音频流：\n\n```python\nstream.pause()\n```\n\n恢复已暂停的流：\n\n```python\nstream.resume()\n```\n\n立即停止流：\n\n```python\nstream.stop()\n```\n\n## 需求说明\n\n- **Python版本**：\n  - **要求**：Python >= 3.9，\u003C 3.13\n  - **原因**：该库依赖于Coqui的GitHub库“TTS”，而该库需要此范围内的Python版本。\n\n- **PyAudio**：用于创建输出音频流\n\n- **stream2sentence**：用于将传入的文本流分割成句子\n\n- **pyttsx3**：系统文本转语音转换引擎\n\n- **pydub**：用于转换音频块格式\n\n- **azure-cognitiveservices-speech**：Azure文本转语音转换引擎\n\n- **elevenlabs**：Elevenlabs文本转语音转换引擎\n\n- **coqui-TTS**：Coqui的XTTS文本转语音库，用于高质量的本地神经网络TTS\n\n特别感谢[Idiap研究所](https:\u002F\u002Fgithub.com\u002Fidiap)维护的[Coqui TTS的分支](https:\u002F\u002Fgithub.com\u002Fidiap\u002Fcoqui-ai-TTS)。\n\n- **openai**：用于与OpenAI的TTS API交互\n\n- **gtts**：Google翻译文本转语音转换\n\n\n## 配置\n\n### `TextToAudioStream`的初始化参数\n\n在初始化`TextToAudioStream`类时，您有多种选项可以自定义其行为。以下是可用的参数：\n\n#### `engine`（BaseEngine）\n- **类型**：`Union[BaseEngine, List[BaseEngine]]`\n- **必填**：是\n- **描述**：用于文本转音频合成的核心引擎。  \n  - 如果提供单个引擎实例，则所有合成任务都将使用该引擎。  \n  - 如果提供多个引擎实例列表，则系统会使用它们作为备用机制。  \n\n#### `on_text_stream_start`（可调用对象）\n- **类型**：`Callable`\n- **必填**：否\n- **描述**：当文本流处理过程开始时触发的回调函数。  \n  - **用途**：显示“正在处理…”的状态消息或初始化资源。  \n  - **签名**：`on_text_stream_start() -> None`。\n\n#### `on_text_stream_stop`（可调用对象）\n- **类型**：`Callable`\n- **必填**：否\n- 
**描述**：当文本流处理过程结束时触发的回调函数。  \n  - **用途**：清理资源或提示文本到语音管道已完成处理。  \n  - **签名**：`on_text_stream_stop() -> None`。\n\n#### `on_audio_stream_start`（可调用对象）\n- **类型**：`Callable`\n- **必填**：否\n- **描述**：当音频播放开始时触发的回调函数。  \n  - **用途**：记录播放事件或更新UI元素以反映音频正在播放。  \n  - **签名**：`on_audio_stream_start() -> None`。\n\n#### `on_audio_stream_stop`（可调用对象）\n- **类型**：`Callable`\n- **必填**：否\n- **描述**：当音频播放结束时触发的回调函数。  \n  - **用途**：重置UI元素或在播放结束后启动后续操作。  \n  - **签名**：`on_audio_stream_stop() -> None`。\n\n#### `on_character`（可调用对象）\n- **类型**：`Callable`\n- **必填**：否\n- **描述**：在合成过程中每处理一个字符时触发的回调函数。  \n  - **用途**：实时可视化字符级别的处理过程，可用于调试或监控。  \n  - **签名**：`on_character(character: str) -> None`。\n\n#### `on_word`（可选的可调用对象）\n- **类型**：`Callable`\n- **必填**：否\n- **默认值**：`None`\n- **描述**：当一个单词开始播放时触发的回调函数。该回调接收一个对象（`TimingInfo`的实例），其中包含：\n  - **word**：单词的文本，\n  - **start_time**：单词开始播放的时间偏移量（以秒为单位），\n  - **end_time**：单词结束播放的时间偏移量（以秒为单位）。\n- **用途**：可用于跟踪单词级别的进度或在显示设备上突出显示所说出的单词。  \n- **备注**：目前仅支持AzureEngine和KokoroEngine（针对美式和英式语音）。其他引擎不提供单词级别的计时信息。\n\n#### `output_device_index`（整数）❗ ElevenlabsEngine 和 EdgeEngine（MPV 播放）不支持此参数\n- **类型**: `int`\n- **是否必填**: 否\n- **默认值**: `None`\n- **描述**: 用于播放的音频输出设备索引。  \n  - **工作原理**: 系统将使用与此索引对应的设备进行音频播放。如果为 `None`，则使用系统的默认音频输出设备。  \n  - **获取设备索引**: 可使用 PyAudio 的设备查询方法来获取可用的索引。\n\n#### `mpv_audio_device`（字符串）适用于 ElevenlabsEngine 和 EdgeEngine（MPV 播放）\n- **类型**: `str`\n- **是否必填**: 否\n- **默认值**: `None`\n- **描述**: 用于播放的音频设备名称。\n  - **工作原理**: 系统将使用与此名称对应的设备进行音频播放。如果为 `None`，则使用系统的默认音频输出设备。\n  - **获取设备名称**: 可使用 `mpv --audio-device=help` 来查看设备名称列表。\n\n#### `tokenizer`（字符串）\n- **类型**: `str`\n- **是否必填**: 否\n- **默认值**: `\"nltk\"`\n- **描述**: 指定用于将文本拆分为句子或片段的分词器。  \n  - **支持选项**: `\"nltk\"`（默认）和 `\"stanza\"`。  \n  - **自定义分词**: 您可以通过设置 `tokenize_sentences` 参数来提供自定义分词器。\n\n#### `language`（字符串）\n- **类型**: `str`\n- **是否必填**: 否\n- **默认值**: `\"en\"`\n- **描述**: 用于句子拆分的语言代码。  \n  - **示例**: `\"en\"` 表示英语， `\"de\"` 表示德语， `\"fr\"` 表示法语。  \n  - 请确保所选分词器支持指定的语言。\n\n#### `muted`（布尔值）\n- **类型**: `bool`\n- **是否必填**: 否\n- **默认值**: `False`\n- **描述**: 控制是否静音音频播放。\n  - 如果为 `True`，音频播放将被禁用，不会打开音频流，从而允许合成器生成音频数据而不进行播放。  \n  - **适用场景**: 适用于需要将音频保存到文件或处理音频片段而无需听到输出的情况。\n\n#### `frames_per_buffer`（整数）\n- **类型**: `int`\n- **是否必填**: 否\n- **默认值**: `pa.paFramesPerBufferUnspecified`\n- **描述**: 定义 PyAudio 每个缓冲区处理的音频帧数。  \n  - **影响**:  \n    - 值越小，延迟越低，但 CPU 使用率越高。  \n    - 值越大，延迟越高，但 CPU 负载越低。  \n  - 如果设置为 `pa.paFramesPerBufferUnspecified`，PyAudio 将根据平台和硬件选择默认值。\n\n##### `comma_silence_duration`（浮点数）  \n- **类型**: `float`  \n- **是否必填**: 否  \n- **默认值**: `0.0`  \n- **描述**: 在逗号后插入的静音时长（以秒为单位）。  \n\n##### `sentence_silence_duration`（浮点数）  \n- **类型**: `float`  \n- **是否必填**: 否  \n- **默认值**: `0.0`  \n- **描述**: 在句末插入的静音时长（以秒为单位）。  \n\n##### `default_silence_duration`（浮点数）  \n- **类型**: `float`  \n- **是否必填**: 否  \n- **默认值**: `0.0`  \n- **描述**: 当没有适用的标点规则时，片段之间的默认静音时长（以秒为单位）。  \n\n#### `playout_chunk_size`（整数）\n- **类型**: `int`\n- **是否必填**: 否\n- **默认值**: `-1`\n- **描述**: 指定要播放到流中的音频块大小（以字节为单位）。  \n  - **行为**:  \n    - 如果为 `-1`，则块大小会根据 `frames_per_buffer` 或内部默认值动态确定。  \n    - 较小的块大小可以降低延迟，但可能会增加开销。  \n    - 较大的块大小可以提高效率，但可能会引入播放延迟。  \n\n#### `level`（整数）\n- **类型**: `int`\n- **是否必填**: 否\n- **默认值**: `logging.WARNING`\n- **描述**: 设置内部日志记录器的日志级别。  \n  - **示例**:  \n    - `logging.DEBUG`: 用于调试的详细信息。  \n    - `logging.INFO`: 一般运行时信息。  \n    - `logging.WARNING`: 关于潜在问题的警告。  \n    - `logging.ERROR`: 需要关注的严重错误。\n\n#### 示例用法：\n\n```python\nengine = YourEngine()  # 替换为您使用的引擎\nstream = TextToAudioStream(\n    engine=engine,\n    on_text_stream_start=my_text_start_func,\n   
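 # 注：各 my_..._func 回调均为示例占位，请替换为您自己定义的函数\n   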
#### Example usage:

```python
import logging
from RealtimeTTS import TextToAudioStream

# my_text_start_func etc. are your own callback functions.
engine = YourEngine()  # replace with the engine you are using
stream = TextToAudioStream(
    engine=engine,
    on_text_stream_start=my_text_start_func,
    on_text_stream_stop=my_text_stop_func,
    on_audio_stream_start=my_audio_start_func,
    on_audio_stream_stop=my_audio_stop_func,
    level=logging.INFO
)
```

### Methods

#### `play` and `play_async`

Both methods run the text-to-audio synthesis and play the audio stream. The difference is that `play` is a blocking call, while `play_async` runs in a separate thread, allowing other operations to continue.

##### Parameters:

###### `fast_sentence_fragment` (bool)
- **Default**: `True`
- **Description**: When `True`, the method prioritizes speed, generating and playing sentence fragments sooner. Useful for latency-sensitive applications.

###### `fast_sentence_fragment_allsentences` (bool)
- **Default**: `False`
- **Description**: When `True`, fast fragment processing is applied to all sentences, not just the first one.

###### `fast_sentence_fragment_allsentences_multiple` (bool)
- **Default**: `False`
- **Description**: When `True`, multiple sentence fragments may be emitted at once instead of just one.

###### `buffer_threshold_seconds` (float)
- **Default**: `0.0`
- **Description**: Buffering threshold in seconds, which affects the smoothness and continuity of audio playback.

  - **How it works**: Before synthesizing a new sentence, the system checks whether more than `buffer_threshold_seconds` of audio remains in the buffer. If so, it fetches the next sentence from the text generator, assuming the fetch and synthesis can complete within the time window covered by the remaining audio. This gives the text-to-speech engine more context, improving synthesis quality and the listening experience.
  
  Higher values keep more audio pre-buffered and reduce the chance of silence or gaps during playback. If you hit interruptions or pauses, consider raising this value.

###### `minimum_sentence_length` (int)
- **Default**: `10`
- **Description**: Minimum number of characters a string must have to be treated as a synthesizable sentence. Affects how text chunks are processed and played.

###### `minimum_first_fragment_length` (int)
- **Default**: `10`
- **Description**: Minimum number of characters required before the first sentence fragment is emitted.

###### `log_synthesized_text` (bool)
- **Default**: `False`
- **Description**: When enabled, logs each text chunk as it is synthesized into audio. Helpful for auditing and debugging.

###### `reset_generated_text` (bool)
- **Default**: `True`
- **Description**: If `True`, the generated text is reset before processing starts.

###### `output_wavfile` (str)
- **Default**: `None`
- **Description**: If set, the audio is saved to the given WAV file.

###### `on_sentence_synthesized` (callable)
- **Default**: `None`
- **Description**: Callback invoked after a single sentence fragment has been synthesized.

###### `before_sentence_synthesized` (callable)
- **Default**: `None`
- **Description**: Callback invoked before a single sentence fragment is synthesized.

###### `on_audio_chunk` (callable)
- **Default**: `None`
- **Description**: Callback invoked when a single audio chunk is ready.

###### `tokenizer` (str)
- **Default**: `"nltk"`
- **Description**: Tokenizer used for sentence splitting. Currently supports `"nltk"` and `"stanza"`.

###### `tokenize_sentences` (callable)
- **Default**: `None`
- **Description**: A custom function that splits sentences from the input text. If nltk and stanza don't suit you, provide your own lightweight tokenizer. It should accept a string and return a list of sentence strings.

###### `language` (str)
- **Default**: `"en"`
- **Description**: Language used for sentence splitting.

###### `context_size` (int)
- **Default**: `12`
- **Description**: Number of context characters used to determine sentence boundaries. A larger context improves boundary-detection accuracy.

###### `context_size_look_overhead` (int)
- **Default**: `12`
- **Description**: Additional context to look ahead when detecting sentence boundaries.

###### `muted` (bool)
- **Default**: `False`
- **Description**: If `True`, playback through the local speakers is disabled. Useful when synthesizing to a file or processing audio chunks without playing them (see the sketch after this list).

###### `sentence_fragment_delimiters` (str)
- **Default**: `".?!;:,\n…)]}。-"`
- **Description**: String of characters treated as sentence delimiters.

###### `force_first_fragment_after_words` (int)
- **Default**: `15`
- **Description**: Forces the first sentence fragment to be emitted after the given number of words have been read.
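As a concrete illustration of `muted`, `on_audio_chunk`, and `output_wavfile` from the list above, here is a hedged sketch that captures raw PCM chunks and writes a WAV file without ever opening the speakers. `collect_chunk` and the output path are illustrative names; the parameters themselves come from the documentation above.

```python
from RealtimeTTS import TextToAudioStream, SystemEngine

chunks = []

def collect_chunk(chunk: bytes):
    # Raw PCM audio arrives here as soon as each chunk is synthesized.
    chunks.append(chunk)

engine = SystemEngine()
stream = TextToAudioStream(engine, muted=True)  # never open an output stream
stream.feed("Synthesize to a file instead of the speakers.")
stream.play(
    muted=True,
    on_audio_chunk=collect_chunk,
    output_wavfile="synthesis.wav",  # example path
)
print(f"Captured {len(chunks)} audio chunks.")
```

This pattern suits headless servers and pipelines that forward audio elsewhere (e.g. over a WebSocket) instead of playing it locally.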
### CUDA Installation

These steps are recommended for users who need **better performance** and have a compatible NVIDIA GPU.

> **Note**: *To check whether your NVIDIA GPU supports CUDA, see the [official CUDA GPU list](https://developer.nvidia.com/cuda-gpus).*

To use PyTorch with CUDA support, follow these steps:

> **Note**: *Newer PyTorch builds [may](https://stackoverflow.com/a/77069523) (unverified) no longer require installing the Toolkit and cuDNN.*

1. **Install the NVIDIA CUDA Toolkit**:
    For example, to install Toolkit 12.X:
    - Visit the [NVIDIA CUDA downloads page](https://developer.nvidia.com/cuda-downloads).
    - Select your operating system, system architecture, and OS version.
    - Download and install the software.

    Or, to install Toolkit 11.8:
    - Visit the [NVIDIA CUDA Toolkit archive](https://developer.nvidia.com/cuda-11-8-0-download-archive).
    - Select your operating system, system architecture, and OS version.
    - Download and install the software.

2. **Install NVIDIA cuDNN**:

    For example, to install cuDNN 8.7.0 for CUDA 11.x:
    - Visit the [NVIDIA cuDNN archive](https://developer.nvidia.com/rdp/cudnn-archive).
    - Click "Download cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x".
    - Download and install the software.

3. **Install ffmpeg**:

    You can download an installer for your operating system from the [ffmpeg website](https://ffmpeg.org/download.html).

    Or use a package manager:

    - **On Ubuntu or Debian**:
        ```bash
        sudo apt update && sudo apt install ffmpeg
        ```

    - **On Arch Linux**:
        ```bash
        sudo pacman -S ffmpeg
        ```

    - **On macOS using Homebrew** ([https://brew.sh/](https://brew.sh/)):
        ```bash
        brew install ffmpeg
        ```

    - **On Windows using Chocolatey** ([https://chocolatey.org/](https://chocolatey.org/)):
        ```bash
        choco install ffmpeg
        ```

    - **On Windows using Scoop** ([https://scoop.sh/](https://scoop.sh/)):
        ```bash
        scoop install ffmpeg
        ```

4. **Install PyTorch with CUDA support**:

    To upgrade your PyTorch installation for CUDA-enabled GPU support, follow the instructions for your CUDA version. This is useful when you want CUDA to boost RealtimeTTS performance. (A quick verification snippet follows this list.)

    - **For CUDA 11.8:**

        To update PyTorch and Torchaudio with CUDA 11.8 support, run:

        ```bash
        pip install torch==2.5.1+cu118 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
        ```

    - **For CUDA 12.X:**

        To update PyTorch and Torchaudio with CUDA 12.X support, run:

        ```bash
        pip install torch==2.5.1+cu121 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
        ```

        Replace `2.5.1` with the PyTorch version that matches your system and requirements.

5. **Fix for compatibility issues**:
    If you run into library compatibility problems, try pinning these libraries to fixed versions:

    ```bash
    pip install networkx==2.8.8
    pip install typing_extensions==4.8.0
    pip install fsspec==2023.6.0
    pip install imageio==2.31.6
    pip install numpy==1.24.3
    pip install requests==2.31.0
    ```
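Once PyTorch is installed (step 4 above), you can confirm that the CUDA-enabled build actually sees your GPU. This is a generic PyTorch check, not anything RealtimeTTS-specific:

```python
import torch

# Confirm the CUDA-enabled PyTorch build can see the GPU.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("CUDA runtime:", torch.version.cuda)
```

If `CUDA available` prints `False` after following the steps above, the most common cause is a CPU-only PyTorch wheel still being installed; rerun the matching `--index-url` install command from step 4.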
## 💖 Acknowledgements

Huge thanks to the team at [Coqui AI](https://coqui.ai/), and especially the brilliant [Eren Gölge](https://github.com/erogol), for being the first to give us local, high-quality, realtime synthesis with cloneable voices!

Thanks to [Pierre Nicolas Durette](https://github.com/pndurette) for his gtts Python library, which provides free, GPU-less text-to-speech via Google Translate.

## Contributing

Contributions are always welcome (e.g. a PR adding a new engine).

## License Information

### ❗ Important note:
While the source code of this library is open source, usage of many of the engines it depends on is not: external engine providers often restrict commercial use in their free plans. That means the engines can be used in noncommercial projects, but commercial use requires a paid plan.

### Engine license summary:

#### CoquiEngine
- **License**: Open source only for noncommercial projects.
- **Commercial use**: Requires a paid plan.
- **Details**: [CoquiEngine license](https://coqui.ai/cpml)

#### ElevenlabsEngine
- **License**: Open source only for noncommercial projects.
- **Commercial use**: Available on every paid plan.
- **Details**: [ElevenlabsEngine license](https://help.elevenlabs.io/hc/en-us/articles/13313564601361-Can-I-publish-the-content-I-generate-on-the-platform-)

#### AzureEngine
- **License**: Open source only for noncommercial projects.
- **Commercial use**: Available from the standard tier upward.
- **Details**: [AzureEngine license](https://learn.microsoft.com/en-us/answers/questions/1192398/can-i-use-azure-text-to-speech-for-commercial-usag)

#### SystemEngine
- **License**: Mozilla Public License 2.0 and GNU Lesser General Public License (LGPL) version 3.0.
- **Commercial use**: Allowed under this license.
- **Details**: [SystemEngine license](https://github.com/nateshmbhat/pyttsx3/blob/master/LICENSE)

#### GTTSEngine
- **License**: MIT license.
- **Commercial use**: In principle yes, given the MIT license. Some caution may be warranted, since it relies on an undocumented Google Translate speech feature.
- **Details**: [GTTS MIT license](https://github.com/pndurette/gTTS/blob/main/LICENSE)

#### OpenAIEngine
- **License**: Please read the [OpenAI Terms of Use](https://openai.com/policies/terms-of-use)

**Disclaimer**: This is a summary of the licenses as understood, provided for reference only; it is not legal advice. Please read and respect each engine provider's license agreement if you plan to use these engines in a project.

## Contributors

<a href="https://github.com/KoljaB/RealtimeTTS/graphs/contributors">
  <img alt="contributors" src="https://oss.gittoolsai.com/images/KoljaB_RealtimeTTS_readme_1f54367972fe.png"/>
</a>

## Audio License

Audio samples are taken from Meta's (Facebook Research) EARS dataset:
https://huggingface.co/datasets/facebookresearch/ears_dataset

Licensed under CC BY-NC 4.0:
https://creativecommons.org/licenses/by-nc/4.0/

## Author

Kolja Beigel
Email: kolja.beigel@web.de

<p align="center">
  <a href="#realtimetts" target="_blank">
    <img src="https://img.shields.io/badge/Back%20to%20Top-000000?style=for-the-badge" alt="Back to top">
  </a>
</p>","# RealtimeTTS Quick Start Guide

RealtimeTTS is a low-latency text-to-speech (TTS) library built for realtime applications. It converts text streams into high-quality audio output almost instantly, which makes it a great fit alongside large language models (LLMs).

## 1. Environment Setup

### System requirements
- **Python**: 3.8 or newer recommended
- **Operating system**: Windows, macOS, Linux
- **Hardware**:
  - Local engines (e.g. Coqui, Piper, StyleTTS2): an NVIDIA GPU is recommended for best performance; some lightweight engines (e.g. PocketTTS) run smoothly on CPU.
  - Cloud engines (e.g. OpenAI, Azure, Elevenlabs): only a network connection is needed.

### Prerequisites
Depending on the operating system, extra audio drivers may be required:

- **Linux**:
  ```bash
  sudo apt-get install -y portaudio19-dev
  # For mpv-based engines (e.g. Elevenlabs), also install mpv
  sudo apt-get install mpv
  ```
- **macOS**:
  ```bash
  brew install portaudio
  # For mpv-based engines
  brew install mpv
  ```
- **Windows**:
  - Usually no extra PortAudio driver installation is needed.
  - For engines that stream MPEG audio (e.g. Elevenlabs), download and install `mpv` from [mpv.io](https://mpv.io/) and add it to the system PATH.

> **Note**: Some engines (e.g. OpenAI TTS) require `ffmpeg`.

## 2. Installation

The bare-bones installation is no longer the recommended route; the full installation with all dependencies avoids missing engine-specific packages.

### Recommended: full installation
Install with support for all TTS engines:
```bash
pip install -U realtimetts[all]
```

### Optional: custom installation
If you only need specific engines, you can shrink the install:
```bash
# Only the Azure and OpenAI engines
pip install realtimetts[azure,openai]

# Only the local system TTS
pip install realtimetts[system]
```

### Mirror for users in China
If downloads are slow, the Tsinghua or Alibaba mirrors help:
```bash
pip install -U realtimetts[all] -i https://pypi.tuna.tsinghua.edu.cn/simple
```

### Virtual environment (recommended)
To avoid dependency conflicts, install inside a virtual environment:
```bash
python -m venv env_realtimetts
# Activate on Windows
env_realtimetts\Scripts\activate
# Activate on macOS/Linux
source env_realtimetts/bin/activate

python -m pip install --upgrade pip
pip install -U realtimetts[all] -i https://pypi.tuna.tsinghua.edu.cn/simple
```
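Once installed, a quick way to sanity-check the audio setup is to enumerate playback devices with PyAudio (pulled in as a RealtimeTTS dependency); the chosen index can later be passed as the `output_device_index` constructor parameter documented earlier. A minimal sketch:

```python
import pyaudio

pa = pyaudio.PyAudio()
# List every device that can play audio; pass a chosen index
# to TextToAudioStream(output_device_index=...).
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info.get("maxOutputChannels", 0) > 0:
        print(i, info["name"])
pa.terminate()
```

If no output devices are listed, revisit the PortAudio prerequisite for your platform before debugging RealtimeTTS itself.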
## 3. Basic Usage

The following example shows the simplest synthesis path, using the default local engine (SystemEngine).

### Code example

```python
from RealtimeTTS import TextToAudioStream, SystemEngine

# Initialize the engine and the stream
engine = SystemEngine()
stream = TextToAudioStream(engine)

# Feed text and play it
text = "Hello, this is a realtime speech synthesis test."
print(f"Generating speech: {text}")
stream.feed(text)
stream.play_async()

# Wait for playback to finish (handle as needed in a real application)
input("Press Enter to stop playback...")
```

### Cloud engine example (OpenAI)

Set the `OPENAI_API_KEY` environment variable before use.

```python
import os
from RealtimeTTS import TextToAudioStream, OpenAIEngine

# Set the API key
os.environ["OPENAI_API_KEY"] = "sk-your-api-key-here"

# Initialize the OpenAI engine
engine = OpenAIEngine()
stream = TextToAudioStream(engine)

# Stream playback
stream.feed("Hello, this is a test using OpenAI TTS engine.")
stream.play_async()

input("Press Enter to stop...")
```

### Core functions
- **feed(text)**: Adds text to the queue; call it repeatedly for streaming input.
- **play_async()**: Plays audio asynchronously without blocking the main thread; suited to realtime interaction.
- **play()**: Plays synchronously, blocking until playback finishes.

By switching the Engine class (e.g. `AzureEngine`, `ElevenlabsEngine`, `CoquiEngine`), you can swap voice providers without changing the main logic.","A developer is building a realtime voice assistant on top of a large language model (LLM) and needs the AI's answers spoken out as promptly and fluidly as a human.

### Without RealtimeTTS
- **Noticeable latency**: Traditional TTS engines must wait for the LLM to finish a full sentence before synthesis starts, so users face a long "thinking silence" on every turn.
- **Disjointed interaction**: Without streaming output, speech arrives in whole blocks, losing the continuity of natural conversation; it feels like listening to a recording rather than talking.
- **Weak fault tolerance**: When the preferred speech API times out or errors, the program often crashes or goes silent, with no mechanism to switch to a backup engine automatically.
- **Tedious multi-engine integration**: Trying out Elevenlabs, Azure, or local Coqui means writing lots of duplicated code to adapt to each API's quirks.

### With RealtimeTTS
- **Near-zero latency**: Thanks to streaming, RealtimeTTS starts synthesizing as soon as the LLM emits its first tokens, achieving a seamless "speak while thinking" experience.
- **Far more lifelike**: Audio output is clear, natural, and continuous, removing mechanical pauses and making the assistant sound like a live conversation partner.
- **Stable and reliable**: The built-in failover mechanism automatically and seamlessly switches to a backup when the primary engine misbehaves, keeping the conversation running.
- **Multiplied development speed**: A single unified interface drives a dozen-plus engines such as OpenAI and Edge TTS, with no low-level differences to worry about, making it quick to find the best voice.

With its very low latency and broad engine compatibility, RealtimeTTS turns stiff text replies into fluid, natural, realtime spoken interaction, reshaping the immersiveness of human-machine dialogue.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKoljaB_RealtimeTTS_da9ab5cd.png","KoljaB","Kolja Beigel","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FKoljaB_1688cc4c.png","Open-source developer of robust, high-performance, real-time STT\u002FTTS pipelines. ",null,"kolja.beigel@web.de","LonLigrin","https:\u002F\u002Fwww.youtube.com\u002F@Linguflex","https:\u002F\u002Fgithub.com\u002FKoljaB",[85,89,93,97,101],{"name":86,"color":87,"percentage":88},"Python","#3572A5",96.4,{"name":90,"color":91,"percentage":92},"Shell","#89e051",1.5,{"name":94,"color":95,"percentage":96},"Batchfile","#C1F12E",1.2,{"name":98,"color":99,"percentage":100},"JavaScript","#f1e05a",0.6,{"name":102,"color":103,"percentage":104},"Dockerfile","#384d54",0.2,3862,386,"2026-04-15T11:06:49","MIT","Linux, macOS, Windows","非必需。大多数引擎支持 CPU 运行（如 PocketTTS, Piper, System TTS）。部分高级本地引擎（如 ParlerEngine, StyleTTS2）需要高性能 GPU（文中提及 RTX 4090 可实现实时推理），ParlerEngine 明确需要 Flash Attention 和 CUDA 支持（示例命令使用 cu121\u002Fcu122）。","未说明（取决于所选引擎，本地大模型引擎通常需要较高内存）",{"notes":113,"python":114,"dependencies":115},"1. 安装方式已变更：推荐使用 'pip install realtimetts[all]' 进行完整安装，或按需指定引擎（如 [azure, openai]）。\n2. 系统依赖：Linux 用户需安装 portaudio19-dev，macOS 用户需通过 brew 安装 portaudio。\n3. 音频播放：使用 Elevenlabs 或 CAMB 引擎时，系统必须安装 mpv 播放器。\n4. 本地引擎配置：PiperEngine 需单独安装并配置可执行路径；ParlerEngine 在 Windows 上安装复杂，需手动安装特定版本的 torch 和 flash-attention。\n5. 
网络要求：标记为 🌐 的引擎需要联网，标记为 🏠 的引擎可本地离线运行。","3.10+ (根据 ParlerEngine 的 Windows 安装示例中 cp310 推断，通常建议较新版本)",[116,117,118,119,120,121,122,123,124,125],"pyttsx3 (SystemEngine)","azure-cognitiveservices-speech (AzureEngine)","elevenlabs (ElevenlabsEngine)","openai (OpenAIEngine)","gtts (GTTSEngine)","edge-tts (EdgeEngine)","mpv (用于 Elevenlabs\u002FCAMB 音频流)","ffmpeg (OpenAIEngine 等需要)","torch (Parler\u002FStyleTTS2 等本地深度学习引擎需要)","flash-attn (ParlerEngine 需要)",[21],[128,129,130,131],"python","realtime","speech-synthesis","text-to-speech","2026-03-27T02:49:30.150509","2026-04-16T03:27:56.383825",[135,140,145,150,155,160,165],{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},35119,"使用 GTTS 或 Edge 引擎时语音在句子中间意外切断或停顿怎么办？","该问题已在 v0.4.14 版本中修复。升级到此版本或更高版本即可解决语音在句子中间异常切断的问题。如果升级后仍有问题，请检查是否使用了完整的文本输入而非流式片段，并确认未在其他底层引擎（如 PyTTSx3 或 edge_tts）中出现类似行为。","https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fissues\u002F223",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},35116,"如何在浏览器前端通过 JavaScript 接收并播放实时音频流？","可以通过 WebSocket 将后端生成的音频块流式传输到前端。在后端使用 `on_audio_chunk` 回调获取音频数据，并通过 WebSocket 发送给前端；前端使用 JavaScript 接收二进制数据并通过 Web Audio API 或 MediaSource Extensions 进行播放。维护者已表示将提供代码示例，用户也可参考现有的 WebSocket 音频流实现方案。","https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fissues\u002F51",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},35117,"在无声音设备的服务器（如 VPS、容器）中运行时报 ALSA 错误怎么办？","即使设置了 `muted=True`，旧版本仍可能尝试打开音频输出设备导致 ALSA 错误。解决方案是升级到最新版本，其中添加了全局 `muted` 参数到 `TextToAudioStream`，当设置为 `True` 时，可完全阻止 pyaudio 打开输出流，从而允许在无声音设备的环境中使用 `on_audio_chunk` 回调或 `output_wavfile` 参数。","https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fissues\u002F53",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},35118,"如何仅获取音频数据块而不通过系统扬声器播放？","可以使用 `play()` 方法中的 `on_audio_chunk` 回调函数来接收生成的音频数据块，同时设置 `muted=True` 参数。这样可以在不调用系统音频输出的情况下获取 PCM 数据，适用于需要将音频数据转发给其他服务（如实时通信、推流）的场景。","https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fissues\u002F36",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},35120,"如何为 Coqui 引擎更换或克隆不同的声音模型？","可以通过指定不同的模型文件路径来切换声音。注意：在引用声音文件时，不要手动添加 `.wav` 后缀，库会自动处理。例如，若声音文件名为 `voice.wav`，传入参数时应使用 `voice` 而非 `voice.wav`。此前有用户因错误添加后缀导致失败，移除后即可正常工作。","https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fissues\u002F70",{"id":161,"question_zh":162,"answer_zh":163,"source_url":164},35121,"是否支持每个引擎同时使用多种语言？","目前大多数引擎（如 GTTS）在初始化时只接受单个语言参数，不支持直接传入语言列表。如需多语言支持，建议根据文本语言动态创建不同的引擎实例，或切换到支持多语言的引擎（如 Coqui TTS），并在运行时按需加载对应语言模型。","https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fissues\u002F116",{"id":166,"question_zh":167,"answer_zh":168,"source_url":169},35122,"该项目是否支持 Fish TTS 进行实时语音合成？","截至当前讨论，官方尚未正式集成 Fish TTS。但有社区成员提到 Fish Speech v1.5 存在公开权重版本，不过其许可证可能限制商业用途。如需使用，需自行封装 Fish TTS 引擎并适配 RealtimeTTS 的接口规范，或关注后续官方更新。","https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fissues\u002F325",[171,176,181,186,191,196,201,206,211,216,221,226,231,236,241,246,251,256,261,266],{"id":172,"version":173,"summary_zh":174,"released_at":175},278971,"v0.6.0","# 实时TTS v0.6.0\n\n### 功能特性\n\n- **新增TTS引擎**\n  - **更快的通义千问TTS**：新增 `FasterQwenEngine` 引擎。详细实现方法请参阅 tests\u002Ffaster_qwen_emotions.py 和 tests\u002Ffaster_qwen_test.py。演示视频：https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ZOKcUpJlrXQ\n  - **Cartesia**：通过 WebSocket API 新增了对 Cartesia 语音合成的支持。（[#348]）\n  - **MiniMax 云TTS**：新增 `MiniMaxEngine`，支持 MiniMax T2A v2 API 的两种模型（`speech-2.8-hd`、`speech-2.8-turbo`），提供 12 种语音预设及运行时参数控制。（[#369]）\n  - **ModelsLab TTS**：新增 `ModelsLabEngine`，支持 
9 种语言下的 30 多种语音，具备语速和情感控制功能，并支持懒加载。（[#368]）\n  - **CAMB AI MARS TTS**：新增 `CambEngine` 和 `CambVoice`，用于 CAMB AI 的 MARS 模型，支持 140 多种语言及流式输出。（[#367]）\n  - **NeuTTS**：新增 `NeuTTSEngine`，支持设备端 TTS 及语音克隆功能（只需 3 秒参考音频，兼容 CPU\u002FCUDA\u002FMPS）。（[#359]）\n  - **PocketTTS**：新增 `PocketTTSEngine`，专为 CPU 优化的 TTS 引擎（Kyutai Labs，8 种语音，支持语音克隆，低延迟）。（[#358]）\n\n- **WebSocket 流式传输**\n  - 通过 WebSocket 端点实现实时 TTS 流式传输，支持多用户、双向音频，并优化了 Web UI。附带 Python 示例客户端。（[#356]）\n\n### 改进\n\n- **引擎易用性**\n  - 允许 `OpenAIEngine` 将 API 密钥作为参数传入（若未传入则回退到环境变量）。（[#361]）\n  - 为 `ZipvoiceEngine` 添加语言参数，用于指定输出和提示语音的语言。（[#362]）\n  - 在 `AudioConfiguration`\u002F`TextToAudioStream` 中新增 `mpv_audio_device` 选项，用于选择 MPV 播放设备。（[#327]）\n  - 为 `TextToAudioStream` 添加可调节音量参数（0.0–1.0）。（[#335]）\n  - PiperEngine：简化合成流程，并增加从模型配置中检测采样率的功能，以提升大模型的合成质量。（[#346], [#347]）\n  - 条件日志记录：仅在启用日志时才打印“SYNTHESIS FINISHED”。（[#332]）\n\n- **通用改进**\n  - 在 macOS 上安装 `portaudio` 库。（[#328]）\n\n### 修复\n\n- 修正 `requirements.txt` 中关于 `pypinyin` 版本的拼写错误。（[#355]）\n  - 修复 `__all__` 列表中遗漏的逗号，该问题曾影响引擎的导出。（[#367]）\n  - 为非 Kokoro TTS 引擎添加音频格式检测与转换功能。（[#356]）\n  - 修复语音获取错误，并改进引擎初始化逻辑。（[#356]）\n\n### 其他\n\n- 更新了关于新引擎、播放选项及 WebSocket 使用的文档。\n- 新增或更新了针对新引擎的测试文件和示例脚本。\n- 无破坏性变更；所有更新均向后兼容。\n\n---\n\n**PR 列表：**\n[#369], [#368], [#367], [#362], [#361], [#359], [#358], [#356], [#355], [#348], [#347], [#346], [#335], [#332], [#328], [#327]","2026-03-28T21:14:57",{"id":177,"version":178,"summary_zh":179,"released_at":180},278972,"v0.5.7","# RealtimeTTS v0.5.7\n\n**新引擎：**\n\n✨ 添加了 `ZipVoiceEngine` —— ZipVoice 体积小、速度快，能够以高质量的音质实现语音克隆。\n\n- 请参阅 [zipvoice_test](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002Ftests\u002Fzipvoice_test.py) 文件，获取实现示例。\n- 请参阅 [zipvoice docker 文件夹](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Ftree\u002Fmaster\u002Fdocker\u002Fzipvoice)，了解基于 ZipVoice 的实时流式 FastAPI 服务器实现。\n\n**错误修复与改进：**\n- 修复 #320 ","2025-07-21T18:10:53",{"id":182,"version":183,"summary_zh":184,"released_at":185},278973,"v0.5.6","# 实时TTS v0.5.6\n\n- **Coqui 引擎：** 更加健壮的错误处理","2025-06-27T08:03:52",{"id":187,"version":188,"summary_zh":189,"released_at":190},278974,"v0.5.5","# 实时TTS v0.5.5\n\n**错误修复与改进：**\n- **Coqui 引擎：** 增强了文本归一化功能，以更好地处理特殊字符（如前导非字母数字字符、智能引号、破折号等）及各类空白字符，从而提升合成的可靠性。\n- **流状态：** 修复了 `TextToAudioStream` 中的逻辑，使流的活跃状态报告更加准确。\n\n**其他变更：**\n- **Orpheus 引擎：** 放宽了对语音名称的严格校验（已将校验代码注释掉）。\n- **依赖项：** 升级了 `openai`（至 1.77.0）和 `edge-tts`（至 7.0.2）。","2025-05-03T21:38:10",{"id":192,"version":193,"summary_zh":194,"released_at":195},278975,"v0.5.3","# RealtimeTTS v0.5.3 发行说明\n\n**增强与变更：**\n- **静音控制**：将静音插入功能从 CoquiEngine 中移出，纳入 `TextToAudioStream`；在 play 和 play-async 方法中引入可配置的逗号、句子及默认静音时长。\n- **KokoroEngine**：现已使用 KokoroVoice。#303（感谢！）\n- **OrpheusEngine**：现在允许用户选择模型，并改进了按需停止检查机制。\n- **更线程安全的管道 (?)**：为 CoquiEngine 添加了 `SafePipe`，希望以此实现更可靠的进程间通信（仍需进一步测试）。","2025-04-19T18:47:39",{"id":197,"version":198,"summary_zh":199,"released_at":200},278976,"v0.5.1","# RealtimeTTS v0.5.1 发行说明\n\n**增强与变更：**\n- **音频修剪与淡入淡出**：可修剪音频开头和结尾的静音，并为生成的音频添加淡入淡出效果（可在 Kokoro 和 StyleTTS 引擎中配置）。\n- **停止事件处理**：实现了对正在进行的合成任务的快速可靠停止，尤其适用于 Coqui 引擎。\n- **Coqui 动态设置**：可通过 `engine.set_language()` 和 `engine.set_stream_chunk_size()` 方法，在运行时动态更改 Coqui 引擎的语言及流式传输块大小。\n- **依赖库更新**：将相关库（如 `openai`、`kokoro`、`elevenlabs` 等）升级至最新版本。","2025-04-11T11:08:49",{"id":202,"version":203,"summary_zh":204,"released_at":205},278977,"v0.5.0","# RealtimeTTS v0.5.0 发行说明\n\n**增强与变更：**\n- 重构了模块初始化，支持按需加载引擎。\n- 通过惰性加载减少了首次导入时间。\n- 
提供更清晰的错误信息，并附带缺失依赖项的安装提示。","2025-03-28T15:19:10",{"id":207,"version":208,"summary_zh":209,"released_at":210},278978,"v0.4.55","# RealtimeTTS v0.4.55 发行说明\n\n- **增强的 OpenAI 引擎初始化：**\n  - 新增可选参数：`instructions`、`debug`、`speed`、`response_format` 和 `timeout`。\n  - 更新可用语音，新增“ash”、“coral”和“sage”。\n  - **注意：** 目前 `speed` 和 `timeout` 参数尚无法正常工作；具体原因尚不明确——这些参数已提交至 API。\n\n- **文本转流改进：**\n  - 引入 `error_flag` 标志位，用于跟踪播放过程中出现的错误。\n  - 将合成工作线程设置为守护线程，以确保线程能够正确终止。\n\n- **依赖库更新：**\n  - 将 OpenAI 包从版本 1.66.3 升级至 1.68.2。","2025-03-24T15:21:05",{"id":212,"version":213,"summary_zh":214,"released_at":215},278979,"v0.4.54","# RealtimeTTS v0.4.54 发行说明\n\n**新引擎：**\n\n✨ 新增 `OrpheusEngine` - 适用于 Orpheus-3B 模型的实时 TTS，具备以下特性：\n- 多种语音预设（`zac`、`zoe`、`tara` 等）\n- 支持情感化语音标签（`\u003Claugh>`、`\u003Cgasp>` 等）\n- 低延迟流式传输（首个音频标记生成时间小于 100 毫秒）\n- 使用外部服务器 → 可在其他网络系统上生成 TTS\n\n**安装：**\n```bash\npip install realtimetts[orpheus]\n```\n\n**要求：** 需要本地运行的 LM Studio 或兼容的 API 服务器，并已加载 [Orpheus-3B-0.1-ft-Q8_0-GGUF](https:\u002F\u002Fhuggingface.co\u002FPkmX\u002Forpheus-3b-0.1-ft-Q8_0-GGUF\u002Fblob\u002Fmain\u002Forpheus-3b-0.1-ft-q8_0.gguf) 模型。使用前请先在 LM Studio 中加载该模型。\n\n**示例代码：**\n\n这里提供一个 [代码示例](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002Ftests\u002Forpheus_test.py)，展示如何使用 OrpheusEngine。","2025-03-22T20:10:20",{"id":217,"version":218,"summary_zh":219,"released_at":220},278980,"v0.4.52","# RealtimeTTS v0.4.52 发行说明\n\n- 修复了文本中使用星号 (*) 会导致 KokoroEngine 合成失败的 bug (#278)","2025-03-19T22:11:28",{"id":222,"version":223,"summary_zh":224,"released_at":225},278981,"v0.4.51","# RealtimeTTS v0.4.51 Release Notes\r\n\r\n- added on_word callback to TextToAudioStream that gets called when a word gets played out\r\n  Supported by AzureEngine and english voices of KokoroEngine.\r\n  An instance of class TimingInfo from BaseEngine.py gets submitted as parameter. It carries the members word, start_time and end_time. For examples please look into tests\u002Fkokoro_test.py and tests\u002Fazure_test.py\r\n\r\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe7cb35ed-2d4c-407f-a768-4f01a5afa619\r\n\r\n","2025-03-17T23:21:52",{"id":227,"version":228,"summary_zh":229,"released_at":230},278982,"v0.4.50","# RealtimeTTS v0.4.50 Release Notes\r\n\r\n- added possibility to mix voices for KokoroEngine:\r\n  ```python\r\n  engine.set_voice(\"0.714*af_nicole + 0.286*af_sky\")\r\n  ```\r\n\r\nThis feature allows you to mix a custom voice. 
You can use [🛠️ this tool](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fysharma\u002FMake_Custom_Voices_With_KokoroTTS) and copy\u002Fpaste the voice combination formula.\r\n\r\n📝  [Code to test voice mixing](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002Ftests\u002Fkokoro_mix_voices.py) mixing voices with Kokoro","2025-03-09T21:54:57",{"id":232,"version":233,"summary_zh":234,"released_at":235},278983,"v0.4.49","# RealtimeTTS v0.4.49 Release Notes\r\n\r\n- added speed parameter for KokoroEngine  \r\n  KokoroEngine takes a default_speed parameter and it can be set with engine.set_speed(speed: float).\r\n- #271 \r\n","2025-03-09T18:38:02",{"id":237,"version":238,"summary_zh":239,"released_at":240},278984,"v0.4.48","# RealtimeTTS v0.4.48 Release Notes\r\n\r\n- [set_voice](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002FRealtimeTTS\u002Fengines\u002Fcoqui_engine.py#L1023) method of CoquiEngine now also accepts a list of wav file references as parameter\r\n","2025-02-15T13:54:18",{"id":242,"version":243,"summary_zh":244,"released_at":245},278985,"v0.4.47","# RealtimeTTS v0.4.47 Release Notes\r\n\r\n- bugfix: paused streams could not be stopped\r\n","2025-02-09T10:48:05",{"id":247,"version":248,"summary_zh":249,"released_at":250},278986,"v0.4.46","# RealtimeTTS v0.4.46 Release Notes\r\n\r\n- support for more kokoro voices (japanese, chinese) by installing with `pip install \"RealtimeTTS[kokoro,jp,zh]\"`\r\n","2025-02-08T19:49:59",{"id":252,"version":253,"summary_zh":254,"released_at":255},278987,"v0.4.43","# RealtimeTTS v0.4.43 Release Notes\r\n\r\n- raises kokoro library dependency to version 0.7.3 from 2025-04-02 and therefore hopefully fixes #259 \r\n","2025-02-04T15:58:38",{"id":257,"version":258,"summary_zh":259,"released_at":260},278988,"v0.4.42","# RealtimeTTS v0.4.42 Release Notes\r\n\r\n- KokoroEngine can now be installed with `pip install RealtimeTTS[kokoro]` (does not need external installation anymore)\r\n- supports Kokoro-V1.0\r\n- support for [more voices](https:\u002F\u002Fhuggingface.co\u002Fhexgrad\u002FKokoro-82M\u002Fblob\u002Fmain\u002FVOICES.md)\r\n","2025-02-03T19:09:16",{"id":262,"version":263,"summary_zh":264,"released_at":265},278989,"v0.4.41","# RealtimeTTS v0.4.41 Release Notes\r\n\r\n### New Feature: KokoroEngine Support\r\n\r\n- **KokoroEngine Integration**  \r\n  - Introduces support for the Kokoro 82M TTS engine.\r\n  - Provides access to a variety of Kokoro voice models.\r\n\r\n- **Installation**:  \r\n  ```bash\r\n  pip install realtimetts[all]==0.4.41\r\n  ```\r\n\r\n- **Setup Resources**:\r\n  - Kokoro installation guide: [Kokoro Installation](https:\u002F\u002Fhuggingface.co\u002Fhexgrad\u002FKokoro-82M#usage)\r\n  - Test script: [kokoro_test.py](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002Ftests\u002Fkokoro_test.py)\r\n  - Engine implementation: [kokoro_engine.py](https:\u002F\u002Fgithub.com\u002FKoljaB\u002FRealtimeTTS\u002Fblob\u002Fmaster\u002FRealtimeTTS\u002Fengines\u002Fkokoro_engine.py)\r\n\r\n### Usage Overview\r\n\r\n```python\r\nfrom RealtimeTTS import TextToAudioStream, KokoroEngine\r\n\r\n# Initialize Kokoro engine\r\nengine = KokoroEngine(kokoro_root=\"path\u002Fto\u002FKokoro-82M\")\r\n\r\n# Switch voice as needed\r\nengine.set_voice(\"af_sky\")\r\n\r\n# Create audio stream\r\nstream = TextToAudioStream(engine)\r\n\r\n# Feed and play audio using Kokoro voices\r\nstream.feed(\"Hello 
world\")\r\nstream.play()\r\n\r\n```\r\n\r\n","2025-01-11T17:14:00",{"id":267,"version":268,"summary_zh":269,"released_at":270},278990,"v0.4.40","# RealtimeTTS v0.4.4 Release Notes\r\n\r\n### **Configurable Playback Parameters**\r\n\r\n#### **New Parameters: `frames_per_buffer` and `playout_chunk_size`**\r\n\r\n- **Purpose**:\r\n  - These new parameters provide finer control over audio playback buffering, which is especially useful for mitigating stuttering issues on Unix-based systems.\r\n\r\n- **Details**:\r\n  1. **`frames_per_buffer`**:\r\n     - Controls the number of audio frames processed per buffer by PyAudio. \r\n     - Lower values reduce latency but increase CPU usage, while higher values reduce CPU load but increase latency.\r\n     - **Recommended Settings for Stuttering**:\r\n       - Start by setting `frames_per_buffer` to `256`.\r\n       - If issues persist, reduce it further to `128`.\r\n       \r\n     Example:\r\n     ```python\r\n     stream = TextToAudioStream(engine, frames_per_buffer=256)\r\n     ```\r\n\r\n  2. **`playout_chunk_size`**:\r\n     - Specifies the size (in bytes) of audio chunks played out to the stream. \r\n     - Works in conjunction with `frames_per_buffer` to optimize audio smoothness.\r\n     - Defaults to dynamic calculation, but can be explicitly set for precise control.\r\n\r\n     Example:\r\n     ```python\r\n     stream = TextToAudioStream(engine, playout_chunk_size=1024)\r\n     ```\r\n\r\n#### **How These Parameters Address Stuttering**:\r\n- On Unix systems, default buffer sizes may cause sporadic stuttering during audio playback due to timing mismatches between the audio stream and system audio drivers.\r\n- By reducing `frames_per_buffer` to `256` or `128`, the playback becomes more responsive and better aligned with system timing.\r\n- Adjusting `playout_chunk_size` further enhances playback smoothness by ensuring optimal chunk delivery to the audio stream.\r\n\r\n---\r\n\r\n### **Usage Examples**\r\n\r\n#### **Basic Configuration**:\r\n```python\r\nfrom RealtimeTTS import TextToAudioStream, PiperEngine\r\n\r\nengine = PiperEngine(piper_path=\"path\u002Fto\u002Fpiper.exe\", voice=my_voice)\r\nstream = TextToAudioStream(\r\n    engine=engine,\r\n    frames_per_buffer=256,  # Start with 256 to reduce stuttering\r\n    playout_chunk_size=1024 # Optional for further customization\r\n)\r\nstream.play()\r\n```\r\n\r\n#### **Fine-Tuning for Stuttering**:\r\n- If playback issues occur:\r\n  1. Set `frames_per_buffer` to `256` (recommended starting point).\r\n  2. Reduce to `128` if stuttering persists.\r\n  3. Optionally adjust `playout_chunk_size` to a fixed value like `1024` or `512`.\r\n\r\n---\r\n\r\n- **Backward Compatibility**:\r\n  - Defaults for `frames_per_buffer` and `playout_chunk_size` maintain compatibility with previous versions, requiring no changes for existing setups unless adjustments are needed.\r\n\r\n","2025-01-06T12:07:16"]