[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-jamiepine--voicebox":3,"tool-jamiepine--voicebox":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":75,"owner_website":82,"owner_url":83,"languages":84,"stars":120,"forks":121,"last_commit_at":122,"license":123,"difficulty_score":10,"env_os":124,"env_gpu":125,"env_ram":126,"env_deps":127,"category_tags":137,"github_topics":138,"view_count":146,"oss_zip_url":81,"oss_zip_packed_at":81,"status":16,"created_at":147,"updated_at":148,"faqs":149,"releases":177},372,"jamiepine\u002Fvoicebox","voicebox","The open-source voice synthesis studio","Voicebox 是一款本地优先的开源语音合成工作室，旨在让用户完全掌控自己的声音数据。它支持从几秒音频中克隆声音，并在本地生成高质量的多语言语音，同时提供丰富的后期处理效果。\n\n针对云端语音服务存在的隐私泄露风险和持续付费成本，Voicebox 提供了完美的替代方案。它内置了包括 Qwen3-TTS 在内的五种语音引擎，覆盖英语、日语等 23 种语言。其独特的故事编辑器允许用户通过时间轴编排多轨道对话，非常适合制作有声书、播客或游戏配音。对于开发者而言，它还开放了 REST API，便于将语音合成功能集成到自己的应用中。\n\n技术上，Voicebox 基于 Rust 构建，采用 Tauri 框架实现原生性能，兼容 macOS、Windows、Linux 及 Docker 环境。无论是追求隐私安全的普通用户，还是需要定制化工具的开发者，Voicebox 都能提供灵活、免费且高效的本地化语音解决方案，让创意表达不再受限于网络和服务商。","\u003Cp align=\"center\">\n  \u003Cimg src=\".github\u002Fassets\u002Ficon-dark.webp\" alt=\"Voicebox\" width=\"120\" height=\"120\" \u002F>\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">Voicebox\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>The open-source voice synthesis 
studio.\u003C\u002Fstrong>\u003Cbr\u002F>\n  Clone voices. Generate speech. Apply effects. Build voice-powered apps.\u003Cbr\u002F>\n  All running locally on your machine.\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Freleases\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fdownloads\u002Fjamiepine\u002Fvoicebox\u002Ftotal?style=flat&color=blue\" alt=\"Downloads\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Freleases\u002Flatest\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002Fjamiepine\u002Fvoicebox?style=flat\" alt=\"Release\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fstargazers\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fjamiepine\u002Fvoicebox?style=flat\" alt=\"Stars\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fblob\u002Fmain\u002FLICENSE\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fjamiepine\u002Fvoicebox?style=flat\" alt=\"License\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fvoicebox.sh\">voicebox.sh\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fdocs.voicebox.sh\">Docs\u003C\u002Fa> •\n  \u003Ca href=\"#download\">Download\u003C\u002Fa> •\n  \u003Ca href=\"#features\">Features\u003C\u002Fa> •\n  \u003Ca href=\"#api\">API\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cbr\u002F>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fvoicebox.sh\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjamiepine_voicebox_readme_44f6cc7400fb.webp\" alt=\"Voicebox App Screenshot\" width=\"800\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp 
align=\"center\">\n  \u003Cem>Click the image above to watch the demo video on \u003Ca href=\"https:\u002F\u002Fvoicebox.sh\">voicebox.sh\u003C\u002Fa>\u003C\u002Fem>\n\u003C\u002Fp>\n\n\u003Cbr\u002F>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjamiepine_voicebox_readme_cc41f5f4cacb.webp\" alt=\"Voicebox Screenshot 2\" width=\"800\" \u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjamiepine_voicebox_readme_5ca3401f9bf3.webp\" alt=\"Voicebox Screenshot 3\" width=\"800\" \u002F>\n\u003C\u002Fp>\n\n\u003Cbr\u002F>\n\n## What is Voicebox?\n\nVoicebox is a **local-first voice cloning studio** — a free and open-source alternative to ElevenLabs. Clone voices from a few seconds of audio, generate speech in 23 languages across 5 TTS engines, apply post-processing effects, and compose multi-voice projects with a timeline editor.\n\n- **Complete privacy** — models and voice data stay on your machine\n- **5 TTS engines** — Qwen3-TTS, LuxTTS, Chatterbox Multilingual, Chatterbox Turbo, and HumeAI TADA\n- **23 languages** — from English to Arabic, Japanese, Hindi, Swahili, and more\n- **Post-processing effects** — pitch shift, reverb, delay, chorus, compression, and filters\n- **Expressive speech** — paralinguistic tags like `[laugh]`, `[sigh]`, `[gasp]` via Chatterbox Turbo\n- **Unlimited length** — auto-chunking with crossfade for scripts, articles, and chapters\n- **Stories editor** — multi-track timeline for conversations, podcasts, and narratives\n- **API-first** — REST API for integrating voice synthesis into your own projects\n- **Native performance** — built with Tauri (Rust), not Electron\n- **Runs everywhere** — macOS (MLX\u002FMetal), Windows (CUDA), Linux, AMD ROCm, Intel Arc, Docker\n\n---\n\n## Download\n\n| Platform              | Download                                               |\n| --------------------- | 
------------------------------------------------------ |\n| macOS (Apple Silicon) | [Download DMG](https:\u002F\u002Fvoicebox.sh\u002Fdownload\u002Fmac-arm)   |\n| macOS (Intel)         | [Download DMG](https:\u002F\u002Fvoicebox.sh\u002Fdownload\u002Fmac-intel) |\n| Windows               | [Download MSI](https:\u002F\u002Fvoicebox.sh\u002Fdownload\u002Fwindows)   |\n| Docker                | `docker compose up`                                    |\n\n> **[View all binaries →](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Freleases\u002Flatest)**\n\n> **Linux** — Pre-built binaries are not yet available. See [voicebox.sh\u002Flinux-install](https:\u002F\u002Fvoicebox.sh\u002Flinux-install) for build-from-source instructions.\n\n---\n\n## Features\n\n### Multi-Engine Voice Cloning\n\nFive TTS engines with different strengths, switchable per-generation:\n\n| Engine                      | Languages | Strengths                                                                                                                                |\n| --------------------------- | --------- | ---------------------------------------------------------------------------------------------------------------------------------------- |\n| **Qwen3-TTS** (0.6B \u002F 1.7B) | 10        | High-quality multilingual cloning, delivery instructions (\"speak slowly\", \"whisper\")                                                     |\n| **LuxTTS**                  | English   | Lightweight (~1GB VRAM), 48kHz output, 150x realtime on CPU                                                                              |\n| **Chatterbox Multilingual** | 23        | Broadest language coverage — Arabic, Danish, Finnish, Greek, Hebrew, Hindi, Malay, Norwegian, Polish, Swahili, Swedish, Turkish and more |\n| **Chatterbox Turbo**        | English   | Fast 350M model with paralinguistic emotion\u002Fsound tags                                                                                   
|\n| **TADA** (1B \u002F 3B)          | 10        | HumeAI speech-language model — 700s+ coherent audio, text-acoustic dual alignment                                                        |\n\n### Emotions & Paralinguistic Tags\n\nType `\u002F` in the text input to insert expressive tags that the model synthesizes inline with speech (Chatterbox Turbo):\n\n`[laugh]` `[chuckle]` `[gasp]` `[cough]` `[sigh]` `[groan]` `[sniff]` `[shush]` `[clear throat]`\n\n### Post-Processing Effects\n\n8 audio effects powered by Spotify's `pedalboard` library. Apply after generation, preview in real time, build reusable presets.\n\n| Effect           | Description                                   |\n| ---------------- | --------------------------------------------- |\n| Pitch Shift      | Up or down by up to 12 semitones              |\n| Reverb           | Configurable room size, damping, wet\u002Fdry mix  |\n| Delay            | Echo with adjustable time, feedback, and mix  |\n| Chorus \u002F Flanger | Modulated delay for metallic or lush textures |\n| Compressor       | Dynamic range compression                     |\n| Gain             | Volume adjustment (-40 to +40 dB)             |\n| High-Pass Filter | Remove low frequencies                        |\n| Low-Pass Filter  | Remove high frequencies                       |\n\nShips with 4 built-in presets (Robotic, Radio, Echo Chamber, Deep Voice) and supports custom presets. Effects can be assigned per-profile as defaults.\n\n### Unlimited Generation Length\n\nText is automatically split at sentence boundaries and each chunk is generated independently, then crossfaded together. 
Works with all engines.\n\n- Configurable auto-chunking limit (100–5,000 chars)\n- Crossfade slider (0–200ms) for smooth transitions\n- Max text length: 50,000 characters\n- Smart splitting respects abbreviations, CJK punctuation, and `[tags]`\n\n### Generation Versions\n\nEvery generation supports multiple versions with provenance tracking:\n\n- **Original** — clean TTS output, always preserved\n- **Effects versions** — apply different effects chains from any source version\n- **Takes** — regenerate with a new seed for variation\n- **Source tracking** — each version records its lineage\n- **Favorites** — star generations for quick access\n\n### Async Generation Queue\n\nGeneration is non-blocking. Submit and immediately start typing the next one.\n\n- Serial execution queue prevents GPU contention\n- Real-time SSE status streaming\n- Failed generations can be retried\n- Stale generations from crashes auto-recover on startup\n\n### Voice Profile Management\n\n- Create profiles from audio files or record directly in-app\n- Import\u002Fexport profiles to share or back up\n- Multi-sample support for higher quality cloning\n- Per-profile default effects chains\n- Organize with descriptions and language tags\n\n### Stories Editor\n\nMulti-voice timeline editor for conversations, podcasts, and narratives.\n\n- Multi-track composition with drag-and-drop\n- Inline audio trimming and splitting\n- Auto-playback with synchronized playhead\n- Version pinning per track clip\n\n### Recording & Transcription\n\n- In-app recording with waveform visualization\n- System audio capture (macOS and Windows)\n- Automatic transcription powered by Whisper (including Whisper Turbo)\n- Export recordings in multiple formats\n\n### Model Management\n\n- Per-model unload to free GPU memory without deleting downloads\n- Custom models directory via `VOICEBOX_MODELS_DIR`\n- Model folder migration with progress tracking\n- Download cancel\u002Fclear UI\n\n### GPU Support\n\n| Platform               
  | Backend        | Notes                                          |\n| ------------------------ | -------------- | ---------------------------------------------- |\n| macOS (Apple Silicon)    | MLX (Metal)    | 4-5x faster via Neural Engine                  |\n| Windows \u002F Linux (NVIDIA) | PyTorch (CUDA) | Auto-downloads CUDA binary from within the app |\n| Linux (AMD)              | PyTorch (ROCm) | Auto-configures HSA_OVERRIDE_GFX_VERSION       |\n| Windows (any GPU)        | DirectML       | Universal Windows GPU support                  |\n| Intel Arc                | IPEX\u002FXPU       | Intel discrete GPU acceleration                |\n| Any                      | CPU            | Works everywhere, just slower                  |\n\n---\n\n## API\n\nVoicebox exposes a full REST API for integrating voice synthesis into your own apps.\n\n```bash\n# Generate speech\ncurl -X POST http:\u002F\u002Flocalhost:17493\u002Fgenerate \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\"text\": \"Hello world\", \"profile_id\": \"abc123\", \"language\": \"en\"}'\n\n# List voice profiles\ncurl http:\u002F\u002Flocalhost:17493\u002Fprofiles\n\n# Create a profile\ncurl -X POST http:\u002F\u002Flocalhost:17493\u002Fprofiles \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\"name\": \"My Voice\", \"language\": \"en\"}'\n```\n\n**Use cases:** game dialogue, podcast production, accessibility tools, voice assistants, content automation.\n\nFull API documentation available at `http:\u002F\u002Flocalhost:17493\u002Fdocs`.\n\n---\n\n## Tech Stack\n\n| Layer         | Technology                                        |\n| ------------- | ------------------------------------------------- |\n| Desktop App   | Tauri (Rust)                                      |\n| Frontend      | React, TypeScript, Tailwind CSS                   |\n| State         | Zustand, React Query                              |\n| Backend       | FastAPI (Python)                            
      |\n| TTS Engines   | Qwen3-TTS, LuxTTS, Chatterbox, Chatterbox Turbo, TADA |\n| Effects       | Pedalboard (Spotify)                              |\n| Transcription | Whisper \u002F Whisper Turbo (PyTorch or MLX)          |\n| Inference     | MLX (Apple Silicon) \u002F PyTorch (CUDA\u002FROCm\u002FXPU\u002FCPU) |\n| Database      | SQLite                                            |\n| Audio         | WaveSurfer.js, librosa                            |\n\n---\n\n## Roadmap\n\n| Feature                 | Description                                    |\n| ----------------------- | ---------------------------------------------- |\n| **Real-time Streaming** | Stream audio as it generates, word by word     |\n| **Voice Design**        | Create new voices from text descriptions       |\n| **More Models**         | XTTS, Bark, and other open-source voice models  |\n| **Plugin Architecture** | Extend with custom models and effects          |\n| **Mobile Companion**    | Control Voicebox from your phone               |\n\n---\n\n## Development\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for detailed setup and contribution guidelines.\n\n### Quick Start\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox.git\ncd voicebox\n\njust setup   # creates Python venv, installs all deps\njust dev     # starts backend + desktop app\n```\n\nInstall [just](https:\u002F\u002Fgithub.com\u002Fcasey\u002Fjust): `brew install just` or `cargo install just`. 
Run `just --list` to see all commands.\n\n**Prerequisites:** [Bun](https:\u002F\u002Fbun.sh), [Rust](https:\u002F\u002Frustup.rs), [Python 3.11+](https:\u002F\u002Fpython.org), [Tauri Prerequisites](https:\u002F\u002Fv2.tauri.app\u002Fstart\u002Fprerequisites\u002F), and [Xcode](https:\u002F\u002Fdeveloper.apple.com\u002Fxcode\u002F) on macOS.\n\n### Building Locally\n\n```bash\njust build          # Build CPU server binary + Tauri app\njust build-local    # (Windows) Build CPU + CUDA server binaries + Tauri app\n```\n\n### Adding New Voice Models\n\nThe multi-engine architecture makes adding new TTS engines straightforward. A [step-by-step guide](docs\u002Fcontent\u002Fdocs\u002Fdeveloper\u002Ftts-engines.mdx) covers the full process: dependency research, backend protocol implementation, frontend wiring, and PyInstaller bundling.\n\nThe guide is optimized for AI coding agents. An [agent skill](.agents\u002Fskills\u002Fadd-tts-engine\u002FSKILL.md) can pick up a model name and handle the entire integration autonomously — you just test the build locally.\n\n### Project Structure\n\n```\nvoicebox\u002F\n├── app\u002F              # Shared React frontend\n├── tauri\u002F            # Desktop app (Tauri + Rust)\n├── web\u002F              # Web deployment\n├── backend\u002F          # Python FastAPI server\n├── landing\u002F          # Marketing website\n└── scripts\u002F          # Build & release scripts\n```\n\n---\n\n## Contributing\n\nContributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\n1. Fork the repo\n2. Create a feature branch\n3. Make your changes\n4. Submit a PR\n\n## Security\n\nFound a security vulnerability? Please report it responsibly. 
See [SECURITY.md](SECURITY.md) for details.\n\n---\n\n## License\n\nMIT License — see [LICENSE](LICENSE) for details.\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fvoicebox.sh\">voicebox.sh\u003C\u002Fa>\n\u003C\u002Fp>\n","\u003Cp align=\"center\">\n  \u003Cimg src=\".github\u002Fassets\u002Ficon-dark.webp\" alt=\"Voicebox\" width=\"120\" height=\"120\" \u002F>\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">Voicebox\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>开源语音合成工作室。\u003C\u002Fstrong>\u003Cbr\u002F>\n  克隆声音。生成语音。应用效果。构建语音驱动的应用程序。\u003Cbr\u002F>\n  所有功能均在您的本地机器上运行。\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Freleases\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fdownloads\u002Fjamiepine\u002Fvoicebox\u002Ftotal?style=flat&color=blue\" alt=\"下载量\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Freleases\u002Flatest\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002Fjamiepine\u002Fvoicebox?style=flat\" alt=\"版本\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fstargazers\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fjamiepine\u002Fvoicebox?style=flat\" alt=\"星标\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fblob\u002Fmain\u002FLICENSE\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fjamiepine\u002Fvoicebox?style=flat\" alt=\"许可证\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fvoicebox.sh\">voicebox.sh\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fdocs.voicebox.sh\">文档\u003C\u002Fa> •\n  \u003Ca href=\"#download\">下载\u003C\u002Fa> •\n  \u003Ca 
href=\"#features\">功能\u003C\u002Fa> •\n  \u003Ca href=\"#api\">API (应用程序编程接口)\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cbr\u002F>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fvoicebox.sh\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjamiepine_voicebox_readme_44f6cc7400fb.webp\" alt=\"Voicebox App Screenshot\" width=\"800\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cem>点击图片上方以在 \u003Ca href=\"https:\u002F\u002Fvoicebox.sh\">voicebox.sh\u003C\u002Fa> 上观看演示视频\u003C\u002Fem>\n\u003C\u002Fp>\n\n\u003Cbr\u002F>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjamiepine_voicebox_readme_cc41f5f4cacb.webp\" alt=\"Voicebox Screenshot 2\" width=\"800\" \u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjamiepine_voicebox_readme_5ca3401f9bf3.webp\" alt=\"Voicebox Screenshot 3\" width=\"800\" \u002F>\n\u003C\u002Fp>\n\n\u003Cbr\u002F>\n\n## 什么是 Voicebox？\n\nVoicebox 是一个**本地优先的语音克隆工作室** —— 它是 ElevenLabs 的免费开源替代品。只需几秒音频即可克隆声音，在 5 个 TTS (Text-to-Speech，文本转语音) 引擎中生成 23 种语言的语音，应用后期处理效果，并使用时间线编辑器编排多语音项目。\n\n- **完全隐私** — 模型和语音数据保留在您的机器上\n- **5 个 TTS 引擎** — Qwen3-TTS, LuxTTS, Chatterbox Multilingual, Chatterbox Turbo, 和 HumeAI TADA\n- **23 种语言** — 从英语到阿拉伯语、日语、印地语、斯瓦希里语等\n- **后期处理效果** — 音高移位、混响、延迟、合唱、压缩和滤波器\n- **富有表现力的语音** — 通过 Chatterbox Turbo 使用副语言标签如 `[laugh]`, `[sigh]`, `[gasp]`\n- **无限长度** — 自动分块并带有交叉淡入，适用于脚本、文章和章节\n- **故事编辑器** — 用于对话、播客和叙事的轨道时间线\n- **API 优先** — REST API 用于将语音合成集成到您的项目中\n- **原生性能** — 基于 Tauri (Rust) 构建，而非 Electron\n- **全平台运行** — macOS (MLX\u002FMetal), Windows (CUDA), Linux, AMD ROCm, Intel Arc, Docker\n\n---\n\n## 下载\n\n| 平台              | 下载                                               |\n| --------------------- | ------------------------------------------------------ |\n| macOS (Apple Silicon) | [下载 
DMG](https:\u002F\u002Fvoicebox.sh\u002Fdownload\u002Fmac-arm)   |\n| macOS (Intel)         | [下载 DMG](https:\u002F\u002Fvoicebox.sh\u002Fdownload\u002Fmac-intel) |\n| Windows               | [下载 MSI](https:\u002F\u002Fvoicebox.sh\u002Fdownload\u002Fwindows)   |\n| Docker                | `docker compose up`                                    |\n\n> **[查看所有二进制文件 →](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Freleases\u002Flatest)**\n\n> **Linux** — 预编译的二进制文件尚不可用。请参阅 [voicebox.sh\u002Flinux-install](https:\u002F\u002Fvoicebox.sh\u002Flinux-install) 获取从源码构建的说明。\n\n---\n\n## 功能\n\n### 多引擎语音克隆\n\n五个具有不同优势的 TTS 引擎，可按次切换：\n\n| 引擎                      | 语言数 | 优势                                                                                                                                |\n| --------------------------- | --------- | ---------------------------------------------------------------------------------------------------------------------------------------- |\n| **Qwen3-TTS** (0.6B \u002F 1.7B) | 10        | 高质量多语言克隆，语调指令（“慢慢说”，“耳语”）                                                     |\n| **LuxTTS**                  | 英语      | 轻量级（约 1GB VRAM\u002F显存），48kHz 输出，CPU 上 150 倍实时 (realtime) 速度                                                                              |\n| **Chatterbox Multilingual** | 23        | 最广泛的语言覆盖 — 阿拉伯语、丹麦语、芬兰语、希腊语、希伯来语、印地语、马来语、挪威语、波兰语、斯瓦希里语、瑞典语、土耳其语等 |\n| **Chatterbox Turbo**        | 英语      | 快速 3.5 亿参数模型，支持副语言情感\u002F声音标签                                                                                   |\n| **TADA** (1B \u002F 3B)          | 10        | HumeAI 语音语言模型 — 700 秒以上连贯音频，文本 - 声学双对齐                                                        |\n\n### 情感与副语言标签\n\n在文本输入框中输入 `\u002F` 即可插入模型会随语音合成的表达性标签（Chatterbox Turbo）：\n\n`[laugh]` `[chuckle]` `[gasp]` `[cough]` `[sigh]` `[groan]` `[sniff]` `[shush]` `[clear throat]`\n\n### 后期处理效果\n\n由 Spotify 的 `pedalboard` 库驱动的 8 种音频效果。生成后应用，实时预览，构建可重用的预设。\n\n| 效果           | 描述 
                                  |\n| ---------------- | --------------------------------------------- |\n| Pitch Shift (音高移位)      | 上下最多 12 个半音              |\n| Reverb (混响)           | 可配置房间大小、阻尼、干湿比混合  |\n| Delay (延迟)            | 可调节时间、反馈和混合的混响     |\n| Chorus \u002F Flanger (合唱 \u002F 镶边) | 调制延迟以产生金属感或丰富质感 |\n| Compressor (压缩器)       | 动态范围压缩                     |\n| Gain (增益)             | 音量调整（-40 至 +40 dB）             |\n| High-Pass Filter (高通滤波器) | 去除低频                        |\n| Low-Pass Filter (低通滤波器)  | 去除高频                       |\n\n内置 4 种预设（机器人、广播、回声室、深嗓音），支持自定义预设。效果可作为默认值分配给每个配置文件。\n\n### 无限生成长度\n\n文本会自动在句子边界处分割，每个块独立生成，然后交叉淡入合并。适用于所有引擎。\n\n- 可配置的自动分块限制（100–5,000 字符）\n- 交叉淡入滑块（0–200ms）以实现平滑过渡\n- 最大文本长度：50,000 字符\n- 智能分割尊重缩写、中日韩 (CJK) 标点符号和 `[tags]`\n\n### 生成版本\n\n每次生成都支持多个版本，并带有溯源追踪功能：\n\n- **原始** — 纯净的 TTS（文本转语音）输出，始终保留\n- **效果版本** — 应用来自任何源版本的不同效果链\n- **Takes（变体）** — 使用新种子重新生成以获得变体\n- **来源追踪** — 每个版本都会记录其谱系\n- **收藏** — 星标生成内容以便快速访问\n\n### 异步生成队列\n\n生成过程是非阻塞的。提交后立即可开始输入下一个。\n\n- 串行执行队列可防止 GPU（图形处理器）争用\n- 实时 SSE（服务器发送事件）状态流\n- 失败的生成可以重试\n- 崩溃产生的过时生成在启动时自动恢复\n\n### 语音配置文件管理\n\n- 从音频文件创建配置文件或在应用内直接录制\n- 导入\u002F导出配置文件以共享或备份\n- 支持多样本以实现更高质量的克隆\n- 每个配置文件的默认效果链\n- 通过描述和语言标签进行组织\n\n### 故事编辑器\n\n用于对话、播客和叙事的多人声时间线编辑器。\n\n- 支持拖放的多轨道编排\n- 行内音频裁剪和分割\n- 带同步播放头的自动回放\n- 每个轨道剪辑的版本固定\n\n### 录音与转录\n\n- 应用内录制，带波形可视化\n- 系统音频捕获（macOS 和 Windows）\n- 由 Whisper（语音识别模型）（包括 Whisper Turbo）驱动的自动转录\n- 以多种格式导出录音\n\n### 模型管理\n\n- 按模型卸载以释放 GPU（图形处理器）内存，无需删除下载内容\n- 通过 `VOICEBOX_MODELS_DIR` 设置自定义模型目录\n- 带进度跟踪的模型文件夹迁移\n- 下载取消\u002F清除界面\n\n### GPU 支持\n\n| 平台                 | 后端        | 备注                                          |\n| ------------------------ | -------------- | ---------------------------------------------- |\n| macOS (Apple 芯片)    | MLX（Apple 机器学习框架）(Metal)    | 通过神经引擎快 4-5 倍                  |\n| Windows \u002F Linux (英伟达) | PyTorch（深度学习框架）(CUDA) | 应用内自动下载 CUDA 二进制文件 |\n| Linux (超威半导体)              | PyTorch（深度学习框架）(ROCm) | 自动配置 HSA_OVERRIDE_GFX_VERSION       
|\n| Windows (任意 GPU)        | DirectML       | 通用 Windows GPU 支持                  |\n| Intel Arc                | IPEX\u002FXPU       | Intel 独立 GPU 加速                |\n| 任意                      | CPU            | 处处可用，只是较慢                  |\n\n---\n\n## API\n\nVoicebox 提供完整的 REST API，以便将语音合成功能集成到您自己的应用中。\n\n```bash\n# Generate speech\ncurl -X POST http:\u002F\u002Flocalhost:17493\u002Fgenerate \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\"text\": \"Hello world\", \"profile_id\": \"abc123\", \"language\": \"en\"}'\n\n# List voice profiles\ncurl http:\u002F\u002Flocalhost:17493\u002Fprofiles\n\n# Create a profile\ncurl -X POST http:\u002F\u002Flocalhost:17493\u002Fprofiles \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\"name\": \"My Voice\", \"language\": \"en\"}'\n```\n\n**用例：** 游戏对话、播客制作、辅助工具、语音助手、内容自动化。\n\n完整 API 文档位于 `http:\u002F\u002Flocalhost:17493\u002Fdocs`。\n\n---\n\n## 技术栈\n\n| 层级         | 技术                                        |\n| ------------- | ------------------------------------------------- |\n| 桌面应用   | Tauri（跨平台应用框架）(Rust)                                      |\n| 前端      | React, TypeScript, Tailwind CSS                   |\n| 状态管理         | Zustand, React Query                              |\n| 后端       | FastAPI（Python 框架）                                  |\n| TTS 引擎   | Qwen3-TTS, LuxTTS, Chatterbox, Chatterbox Turbo, TADA |\n| 效果       | Pedalboard (Spotify)                              |\n| 转录 | Whisper \u002F Whisper Turbo (PyTorch 或 MLX)          |\n| 推理     | MLX (Apple 芯片) \u002F PyTorch (CUDA\u002FROCm\u002FXPU\u002FCPU) |\n| 数据库      | SQLite                                            |\n| 音频         | WaveSurfer.js, librosa                            |\n\n---\n\n## 路线图\n\n| 功能                 | 描述                                    |\n| ----------------------- | ---------------------------------------------- |\n| **实时流式传输** | 逐字流式传输生成的音频     |\n| **语音设计**        | 根据文本描述创建新声音       |\n| **更多模型**    
     | XTTS、Bark 及其他开源语音模型  |\n| **插件架构** | 通过自定义模型和效果扩展          |\n| **移动伴侣**    | 用手机控制 Voicebox               |\n\n---\n\n## 开发\n\n详见 [CONTRIBUTING.md](CONTRIBUTING.md) 以获取详细的设置和贡献指南。\n\n### 快速开始\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox.git\ncd voicebox\n\njust setup   # creates Python venv, installs all deps\njust dev     # starts backend + desktop app\n```\n\n安装 [just](https:\u002F\u002Fgithub.com\u002Fcasey\u002Fjust)：`brew install just` 或 `cargo install just`。运行 `just --list` 查看所有命令。\n\n**前置条件：** [Bun](https:\u002F\u002Fbun.sh)、[Rust](https:\u002F\u002Frustup.rs)、[Python 3.11+](https:\u002F\u002Fpython.org)、[Tauri 前置条件](https:\u002F\u002Fv2.tauri.app\u002Fstart\u002Fprerequisites\u002F)，以及 macOS 上的 [Xcode](https:\u002F\u002Fdeveloper.apple.com\u002Fxcode\u002F)。\n\n### 本地构建\n\n```bash\njust build          # Build CPU server binary + Tauri app\njust build-local    # (Windows) Build CPU + CUDA server binaries + Tauri app\n```\n\n### 添加新语音模型\n\n多引擎架构使得添加新的 TTS（文本转语音）引擎变得简单。[分步指南](docs\u002Fcontent\u002Fdocs\u002Fdeveloper\u002Ftts-engines.mdx) 涵盖完整流程：依赖研究、后端协议实现、前端集成和 PyInstaller 打包。\n\n该指南针对 AI 编码代理进行了优化。一个 [代理技能](.agents\u002Fskills\u002Fadd-tts-engine\u002FSKILL.md) 可以接收模型名称并自主处理整个集成过程——您只需在本地测试构建结果。\n\n### 项目结构\n\n```\nvoicebox\u002F\n├── app\u002F              # Shared React frontend\n├── tauri\u002F            # Desktop app (Tauri + Rust)\n├── web\u002F              # Web deployment\n├── backend\u002F          # Python FastAPI server\n├── landing\u002F          # Marketing website\n└── scripts\u002F          # Build & release scripts\n```\n\n---\n\n## 贡献\n\n欢迎贡献！详见 [CONTRIBUTING.md](CONTRIBUTING.md) 获取指南。\n\n1. Fork 仓库\n2. 创建功能分支\n3. 进行修改\n4. 
提交 PR（拉取请求）\n\n## 安全\n\n发现安全漏洞？请负责任地报告。详见 [SECURITY.md](SECURITY.md)。\n\n---\n\n## 许可证\n\nMIT 许可证 — 详见 [LICENSE](LICENSE)。\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fvoicebox.sh\">voicebox.sh\u003C\u002Fa>\n\u003C\u002Fp>","# Voicebox 快速上手指南\n\n**Voicebox** 是一个本地优先的开源语音克隆工作室，支持在本地机器上克隆声音、生成语音、应用音效并构建语音驱动的应用。它提供隐私保护、多引擎支持（5 种 TTS）及跨平台运行能力。\n\n---\n\n## 环境准备\n\n### 系统要求\n*   **macOS**: Apple Silicon (MLX\u002FMetal) 或 Intel\n*   **Windows**: 支持 NVIDIA (CUDA)、DirectML 等\n*   **Linux**: 支持 AMD ROCm、Intel Arc、CPU (需源码编译)\n*   **Docker**: 支持容器化部署\n\n### 前置依赖 (开发\u002F源码编译)\n如果您计划从源码构建或在 Linux 上运行，请确保安装以下工具：\n*   [Bun](https:\u002F\u002Fbun.sh)\n*   [Rust](https:\u002F\u002Frustup.rs)\n*   [Python 3.11+]\n*   [Tauri Prerequisites](https:\u002F\u002Fv2.tauri.app\u002Fstart\u002Fprerequisites\u002F)\n\n### 硬件建议\n*   **GPU**: 推荐使用 NVIDIA GPU (CUDA) 或 Apple Silicon (Metal) 以获得最佳性能。\n*   **内存**: 根据模型大小，建议至少 8GB RAM，显存 4GB+ 更佳。\n\n---\n\n## 安装步骤\n\n### 方式一：预编译二进制文件 (推荐)\n适用于 macOS 和 Windows 用户，无需配置环境。\n\n| 平台 | 下载链接 |\n| :--- | :--- |\n| **macOS (Apple Silicon)** | [Download DMG](https:\u002F\u002Fvoicebox.sh\u002Fdownload\u002Fmac-arm) |\n| **macOS (Intel)** | [Download DMG](https:\u002F\u002Fvoicebox.sh\u002Fdownload\u002Fmac-intel) |\n| **Windows** | [Download MSI](https:\u002F\u002Fvoicebox.sh\u002Fdownload\u002Fwindows) |\n| **Docker** | `docker compose up` |\n\n> **注意**: Linux 用户暂无预编译包，请参考官方文档进行源码安装。\n\n### 方式二：从源码安装 (开发者)\n适用于需要自定义或 Linux 用户。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox.git\ncd voicebox\n\n# 安装依赖并创建虚拟环境\njust setup   \n\n# 启动后端与桌面应用\njust dev     \n```\n\n> **提示**: 首次运行时可能需要下载基础模型，请确保网络稳定。\n\n---\n\n## 基本使用\n\n### 图形界面操作\n1.  **启动应用**: 打开已安装的 Voicebox。\n2.  **克隆声音**: 导入几秒音频或使用内置录音功能创建声音配置文件 (Profile)。\n3.  **生成语音**: 输入文本，选择语言和引擎，点击生成。\n4.  **后期处理**: 使用 Pitch Shift、Reverb 等 8 种音效调整输出。\n5.  
**故事编辑**: 在时间轴编辑器中编排多角色对话或播客。\n\n### API 集成\nVoicebox 提供 REST API，可通过 `localhost:17493` 调用。\n\n**生成语音示例：**\n```bash\ncurl -X POST http:\u002F\u002Flocalhost:17493\u002Fgenerate \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\"text\": \"Hello world\", \"profile_id\": \"abc123\", \"language\": \"en\"}'\n```\n\n**获取声音列表：**\n```bash\ncurl http:\u002F\u002Flocalhost:17493\u002Fprofiles\n```\n\n**创建声音配置：**\n```bash\ncurl -X POST http:\u002F\u002Flocalhost:17493\u002Fprofiles \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\"name\": \"My Voice\", \"language\": \"en\"}'\n```\n\n完整 API 文档可在应用启动后访问 `http:\u002F\u002Flocalhost:17493\u002Fdocs`。","独立游戏开发者小李正在制作一款多结局叙事游戏，急需为五个不同性格的角色生成高质量对白，但面临预算和隐私的双重压力。\n\n### 没有 voicebox 时\n- 依赖云端 API 导致剧本和角色声音数据上传，存在核心创意泄露风险。\n- 按字符计费模式昂贵，数百句剧情对话的费用远超独立开发者的预算。\n- 通用语音缺乏情感变化，无法通过指令实现 `[笑]`、`[叹气]` 等细腻表现。\n- 多角色对话需分别生成再手动剪辑，音轨对齐困难且工作流繁琐耗时。\n\n### 使用 voicebox 后\n- 本地运行模型确保所有语音数据不出本机，彻底保障项目隐私安全无忧。\n- 免费开源引擎支持脚本自动分块与无限长度生成，大幅降低制作成本压力。\n- 调用 Chatterbox Turbo 引擎添加 paralinguistic 标签，精准控制角色情绪起伏。\n- 内置故事编辑器可直接编排多轨道对话，一键渲染出完整的场景音效文件。\n\nvoicebox 让个人创作者能在零成本且隐私受保护的前提下，轻松实现电影级的多角色语音合成效果。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fjamiepine_voicebox_c77f4a11.webp","jamiepine","Jamie Pine","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fjamiepine_cba7be2e.jpg","rust, ai and nice ui","Spacedrive Technology Inc.","Vancouver, 
BC",null,"https:\u002F\u002Fsd.app","https:\u002F\u002Fgithub.com\u002Fjamiepine",[85,89,93,97,101,105,109,113,116],{"name":86,"color":87,"percentage":88},"TypeScript","#3178c6",57.6,{"name":90,"color":91,"percentage":92},"Python","#3572A5",32.9,{"name":94,"color":95,"percentage":96},"Rust","#dea584",6,{"name":98,"color":99,"percentage":100},"Shell","#89e051",1.2,{"name":102,"color":103,"percentage":104},"Just","#384d54",1,{"name":106,"color":107,"percentage":108},"JavaScript","#f1e05a",0.6,{"name":110,"color":111,"percentage":112},"CSS","#663399",0.5,{"name":114,"color":103,"percentage":115},"Dockerfile",0.2,{"name":117,"color":118,"percentage":119},"HTML","#e34c26",0.1,14473,1722,"2026-04-05T10:23:24","MIT","macOS, Windows, Linux","非必需，支持 CPU 运行。推荐 NVIDIA GPU (CUDA)，LuxTTS 引擎需约 1GB 显存。支持 AMD ROCm、Apple Metal、Intel Arc、DirectML。","未说明",{"notes":128,"python":129,"dependencies":130},"Linux 暂无预编译包需源码构建；支持 Docker 部署；开发环境需安装 Bun、Rust、Just 及 Tauri 前置条件；模型数据完全本地存储；部分引擎需自动下载 CUDA 二进制文件。","3.11+",[131,132,133,134,135,136],"torch","fastapi","pedalboard","whisper","librosa","sqlite",[26,55,14,13,15],[139,140,141,142,134,143,144,145],"ai","voice-clone","qwen3-tts","voice-ai","qwen3-tts-ui","cuda","mlx",121,"2026-03-27T02:49:30.150509","2026-04-06T05:17:28.250297",[150,155,159,163,167,172],{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},1346,"为什么 Voicebox 默认只使用 CPU 而不是 GPU？","在 Voicebox v0.2.x 版本中，应用程序默认搭载 CPU 后端。这是为了简化初始安装体验，用户需要手动下载并安装 CUDA 加速后端才能启用 GPU。无需重新安装整个应用。","https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fissues\u002F141",{"id":156,"question_zh":157,"answer_zh":158,"source_url":154},1347,"如何在 Voicebox 中启用 GPU 加速？","请确保已更新至 v0.2.x 版本。进入 **设置 (Settings) → GPU 加速 (GPU Acceleration)**，点击此处下载并安装 CUDA 加速后端。安装完成后，下次生成音频时 GPU 应自动被检测到并生效。",{"id":160,"question_zh":161,"answer_zh":162,"source_url":154},1348,"Voicebox 支持哪些类型的显卡（如 AMD、Intel）？","支持多种硬件后端：\n- **NVIDIA**：通过 CUDA 后端支持（Windows\u002FLinux）。\n- **AMD**：Linux 系统支持 AMD ROCm。\n- 
**Intel**：Windows 系统支持 Intel Arc (XPU) 和 DirectML。",{"id":164,"question_zh":165,"answer_zh":166,"source_url":154},1349,"即使启用了 GPU，渲染音频的速度依然很慢吗？","根据用户反馈，即使使用 GPU，当前模型的渲染速度可能仍然较慢。维护者表示该模型目前主要用于娱乐和测试，建议等待下一代更新以获得更好的性能。",{"id":168,"question_zh":169,"answer_zh":170,"source_url":171},1350,"如果更新到 v0.2.x 后 GPU 仍未被检测到怎么办？","如果在安装 CUDA 后端后仍无法检测 GPU，请打开一个新的 Issue，并提供您的 GPU 型号和操作系统详细信息，以便进一步排查问题。","https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fissues\u002F84",{"id":173,"question_zh":174,"answer_zh":175,"source_url":176},1351,"在哪里可以下载 Voicebox 的最新版本进行更新？","您可以从官方站点 [voicebox.sh](https:\u002F\u002Fvoicebox.sh) 下载最新发行版，或者直接在应用内部进行更新操作。","https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fissues\u002F198",[178,183,188,193,198,203,208,213,218,223,228,233,238,243,248,253,258,263],{"id":179,"version":180,"summary_zh":181,"released_at":182},100849,"v0.3.0","This release rewrites the backend into a modular architecture, overhauls the settings UI into routed sub-pages, fixes audio player freezing, migrates documentation to Fumadocs, and ships a batch of bug fixes targeting the most-reported issues from the tracker.\r\n\r\nThe backend's 3,000-line monolith `main.py` has been decomposed into domain routers, a services layer, and a proper database package. A style guide and ruff configuration now enforce consistency. On the frontend, settings have been split into dedicated routed pages with server logs, a changelog viewer, and an about page. The audio player no longer freezes mid-playback, and model loading status is now visible in the UI. 
Seven user-reported bugs have been fixed, including server crashes during sample uploads, generation list staleness, cryptic error messages, and CUDA support for RTX 50-series GPUs.\r\n\r\n### Settings Overhaul ([#294](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F294))\r\n- Split settings into routed sub-tabs: General, Generation, GPU, Logs, Changelog, About\r\n- Added live server log viewer with auto-scroll\r\n- Added in-app changelog page that parses `CHANGELOG.md` at build time\r\n- Added About page with version info, license, and generation folder quick-open\r\n- Extracted reusable `SettingRow` component for consistent setting layouts\r\n\r\n### Audio Player Fix ([#293](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F293))\r\n- Fixed audio player freezing during playback\r\n- Improved playback UX with better state management and listener cleanup\r\n- Fixed restart race condition during regeneration\r\n- Added stable keys for audio element re-rendering\r\n- Improved accessibility across player controls\r\n\r\n### Backend Refactor ([#285](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F285))\r\n- Extracted all routes from `main.py` into 13 domain routers under `backend\u002Froutes\u002F` — `main.py` dropped from ~3,100 lines to ~10\r\n- Moved CRUD and service modules into `backend\u002Fservices\u002F`, platform detection into `backend\u002Futils\u002F`\r\n- Split monolithic `database.py` into a `database\u002F` package with separate `models`, `session`, `migrations`, and `seed` modules\r\n- Added `backend\u002FSTYLE_GUIDE.md` and `pyproject.toml` with ruff linting config\r\n- Removed dead code: unused `_get_cuda_dll_excludes`, stale `studio.py`, `example_usage.py`, old `Makefile`\r\n- Deduplicated shared logic across TTS backends into `backends\u002Fbase.py`\r\n- Improved startup logging with version, platform, data directory, and database stats\r\n- Fixed startup database session leak 
— sessions now rollback and close in `finally` block\r\n- Isolated shutdown unload calls so one backend failure doesn't block the others\r\n- Handled null duration in `story_items` migration\r\n- Reject model migration when target is a subdirectory of source cache\r\n\r\n### Documentation Rewrite ([#288](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F288))\r\n- Migrated docs site from Mintlify to Fumadocs (Next.js-based)\r\n- Rewrote introduction and root page with content from README\r\n- Added \"Edit on GitHub\" links and last-updated timestamps on all pages\r\n- Generated OpenAPI spec and auto-generated API reference pages\r\n- Removed stale planning docs (`CUDA_BACKEND_SWAP`, `EXTERNAL_PROVIDERS`, `MLX_AUDIO`, `TTS_PROVIDER_ARCHITECTURE`, etc.)\r\n- Sidebar groups now expand by default; root redirects to `\u002Fdocs`\r\n- Added OG image metadata and `\u002Fog` preview page\r\n\r\n### UI & Frontend\r\n- Added model loading status indicator and effects preset dropdown ([3187344](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fcommit\u002F3187344))\r\n- Fixed take-label race condition during regeneration\r\n- Added accessible focus styling to select component\r\n- Softened select focus indicator opacity\r\n- Addressed 4 critical and 12 major issues from CodeRabbit review\r\n\r\n### Bug Fixes ([#295](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F295))\r\n- Fixed sample uploads crashing the server — audio decoding now runs in a thread pool instead of blocking the async event loop ([#278](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fissues\u002F278))\r\n- Fixed generation list not updating when a generation completes — switched to `refetchQueries` for reliable cache busting, added SSE error fallback, and page reset on completion ([#231](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fissues\u002F231))\r\n- Fixed error toasts showing `[object Object]` instead of the 
actual error message ([#290](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fissues\u002F290))\r\n- Added Whisper model selection (`base`, `small`, `medium`, `large`, `turbo`) and expanded language support to the `\u002Ftranscribe` endpoint ([#233](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fissues\u002F233))\r\n- Upgraded CUDA backend build from cu121 to cu126 for RTX 50-series (Blackwell) GPU support ([#289](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fissues\u002F289))\r\n- Handled client disconnects in SSE and streaming endpoints to suppress `[Errno 32] Broken Pipe` errors ([#248](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fissues\u002F248))\r\n- Fixed Docker build failure from pip hash mismatch on Qwen3-TTS dependencies ([#286](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002F","2026-03-17T09:46:18",{"id":184,"version":185,"summary_zh":186,"released_at":187},100850,"v0.2.3","The \"it works in dev but not in prod\" release. This version fixes a series of PyInstaller bundling issues that prevented model downloading, loading, generation, and progress tracking from working in production builds.\r\n\r\n### Model Downloads Now Actually Work\r\n\r\nThe v0.2.1\u002Fv0.2.2 builds could not download or load models that weren't already cached from a dev install. This release fixes the entire chain:\r\n\r\n- **Chatterbox, Chatterbox Turbo, and LuxTTS** all download, load, and generate correctly in bundled builds\r\n- **Real-time download progress** — byte-level progress bars now work in production. The root cause: `huggingface_hub` silently disables tqdm progress bars based on logger level, which prevented our progress tracker from receiving byte updates. 
We now force-enable the internal counter regardless.\r\n- **Fixed Python 3.12.0 `code.replace()` bug** — the macOS build was on Python 3.12.0, which has a [known CPython bug](https:\u002F\u002Fgithub.com\u002Fpyinstaller\u002Fpyinstaller\u002Fissues\u002F7992) that corrupts bytecode when PyInstaller rewrites code objects. This caused `NameError: name 'obj' is not defined` crashes during scipy\u002Ftorch imports. Upgraded to Python 3.12.13.\r\n\r\n### PyInstaller Fixes\r\n\r\n- Collect all `inflect` files — `typeguard`'s `@typechecked` decorator calls `inspect.getsource()` at import time, which needs `.py` source files, not just bytecode. Fixes LuxTTS \"could not get source code\" error.\r\n- Collect all `perth` files — bundles the pretrained watermark model (`hparams.yaml`, `.pth.tar`) needed by Chatterbox at runtime\r\n- Collect all `piper_phonemize` files — bundles `espeak-ng-data\u002F` (phoneme tables, language dicts) needed by LuxTTS for text-to-phoneme conversion\r\n- Set `ESPEAK_DATA_PATH` in frozen builds so the espeak-ng C library finds the bundled data instead of looking at `\u002Fusr\u002Fshare\u002Fespeak-ng-data\u002F`\r\n- Collect all `linacodec` files — fixes `inspect.getsource` error in Vocos codec\r\n- Collect all `zipvoice` files — fixes source code lookup in LuxTTS voice cloning\r\n- Copy metadata for `requests`, `transformers`, `huggingface-hub`, `tokenizers`, `safetensors`, `tqdm` — fixes `importlib.metadata` lookups in frozen binary\r\n- Add hidden imports for `chatterbox`, `chatterbox_turbo`, `luxtts`, `zipvoice` backends\r\n- Add `multiprocessing.freeze_support()` to fix resource_tracker subprocess crash in frozen binary\r\n- `--noconsole` now only applied on Windows — macOS\u002FLinux need stdout\u002Fstderr for Tauri sidecar log capture\r\n- Hardened `sys.stdout`\u002F`sys.stderr` devnull redirect to test writability, not just `None` check\r\n\r\n### Updater\r\n\r\n- Fixed updater artifact generation with `v1Compatible` for `tauri-action` 
signature files\r\n- Updated `tauri-action` to v0.6 to fix updater JSON and `.sig` generation\r\n\r\n### Other Fixes\r\n\r\n- Full traceback logging on all backend model loading errors (was just `str(e)` before)","2026-03-15T23:28:51",{"id":189,"version":190,"summary_zh":191,"released_at":192},100851,"v0.2.2","UPDATE: I'm working on a rewrite of the model downloading, it's absolute hell and takes a while to test as it always works in dev and never in prod builds. Will have a solution up ASAP. If you're eager to test 0.2.x please compile from source. Next update will solve model downloading and the updater issue for good.\r\n\r\n## v0.2.2\r\n\r\n- Fix Chatterbox model support in bundled builds [SIKE fixed in 0.2.3]\r\n- Fix LuxTTS\u002FZipVoice support in bundled builds [SIKE fixed in 0.2.3]\r\n- Auto-update CUDA binary when app version changes\r\n- CUDA download progress bar\r\n- Fix server process staying alive on macOS (SIGHUP handling, watchdog grace period)\r\n- Hide console window when running CUDA binary on Windows","2026-03-15T16:55:42",{"id":194,"version":195,"summary_zh":196,"released_at":197},100852,"v0.2.1","# The best local voice cloning tool, just got better...\r\nSee the new website: https:\u002F\u002Fvoicebox.sh\r\n\r\n![voicebox-0 2 0](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F2cf64dfb-9ba8-49ce-b129-d9772da73567)\r\n\r\n\r\n> Released 2026-03-15 — [v0.2.1 on GitHub](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Freleases\u002Ftag\u002Fv0.2.1) (version bump due to an immutable release tag on GitHub)\r\n\r\nVoicebox v0.1.x was a single-engine voice cloning app built around Qwen3-TTS. v0.2.0 is a ground-up rethink: four TTS engines, 23 languages, paralinguistic emotion controls, a post-processing effects pipeline, unlimited generation length, an async generation queue, and support for every major GPU vendor. 
Plus Docker.\r\n\r\n---\r\n\r\n## New TTS Engines\r\n\r\n### Multi-Engine Architecture\r\nVoicebox now runs **four independent TTS engines** behind a thread-safe per-engine backend registry. Switch engines per-generation from a single dropdown — no restart required.\r\n\r\n| Engine | Languages | Size | Key Strengths |\r\n|--------|-----------|------|---------------|\r\n| **Qwen3-TTS 1.7B** | 10 | ~3.5 GB | Highest quality, delivery instructions (\"speak slowly\", \"whisper\") |\r\n| **Qwen3-TTS 0.6B** | 10 | ~1.2 GB | Lighter, faster variant |\r\n| **LuxTTS** | English | ~300 MB | CPU-friendly, 48 kHz output, 150x realtime |\r\n| **Chatterbox Multilingual** | 23 | ~3.2 GB | Broadest language coverage, zero-shot cloning |\r\n| **Chatterbox Turbo** | English | ~1.5 GB | 350M params, low latency, paralinguistic tags |\r\n\r\n### Chatterbox Multilingual — 23 Languages ([#257](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F257))\r\nZero-shot voice cloning in Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, and Turkish. The language dropdown dynamically filters to show only languages supported by the selected engine.\r\n\r\n### LuxTTS — Lightweight English TTS ([#254](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F254))\r\nA fast, CPU-friendly English engine. ~300 MB download, 48 kHz output, runs at 150x realtime on CPU. 
Good for quick drafts and machines without a GPU.\r\n\r\n### Chatterbox Turbo — Expressive English ([#258](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F258))\r\nA fast 350M-parameter English model with inline paralinguistic tags.\r\n\r\n### Paralinguistic Tags Autocomplete ([#265](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F265))\r\nType `\u002F` in the text input with Chatterbox Turbo selected to open an autocomplete for **9 expressive tags** that the model synthesizes inline with speech:\r\n\r\n`[laugh]` `[chuckle]` `[gasp]` `[cough]` `[sigh]` `[groan]` `[sniff]` `[shush]` `[clear throat]`\r\n\r\nTags render as inline badges in a rich text editor and serialize cleanly to the API.\r\n\r\n---\r\n\r\n## Generation\r\n\r\n### Unlimited Generation Length — Auto-Chunking ([#266](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F266))\r\nLong text is now **automatically split at sentence boundaries**, generated per-chunk, and crossfaded back together. Engine-agnostic — works with all four engines.\r\n\r\n- **Auto-chunking limit slider** — 100–5,000 chars (default 800)\r\n- **Crossfade slider** — 0–200ms (default 50ms), or 0 for a hard cut\r\n- **Max text length raised to 50,000 characters**\r\n- Smart splitting respects abbreviations (Dr., e.g., a.m.), CJK punctuation, and never breaks inside `[tags]`\r\n\r\n### Asynchronous Generation Queue ([#269](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F269))\r\nGeneration is now fully non-blocking. 
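A serial, non-blocking generation queue of this kind can be sketched with `asyncio` — a minimal illustration under assumed semantics, not the project's implementation (the real queue also streams SSE status and persists state):

```python
import asyncio

async def worker(queue: asyncio.Queue, results: list) -> None:
    # Serial consumer: one generation at a time, so the GPU is never contended
    while True:
        text = await queue.get()
        if text is None:                      # sentinel: no more work
            queue.task_done()
            break
        results.append(f"generated:{text}")   # stand-in for actual TTS inference
        queue.task_done()

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    task = asyncio.create_task(worker(queue, results))
    # Submissions return immediately; the worker drains the queue serially
    for text in ["first take", "second take"]:
        await queue.put(text)
    await queue.put(None)
    await queue.join()
    await task
    return results
```

The caller can keep enqueuing work while earlier generations run, which is the "submit and keep typing" behavior the release describes.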
Submit a generation and start typing the next one immediately.\r\n\r\n- Serial execution queue prevents GPU contention\r\n- Real-time SSE status streaming (`generating` → `completed` \u002F `failed`)\r\n- Failed generations can be retried without re-entering text\r\n- Stale generations from crashes are auto-recovered on startup\r\n- Generating status pill shown inline in the story editor\r\n\r\n### Generation Versions\r\nEvery generation now supports **multiple versions** with provenance tracking:\r\n\r\n- **Original** — the unprocessed TTS output, always preserved\r\n- **Effects versions** — apply different effects chains to create new versions from any source\r\n- **Takes** — regenerate with the same text\u002Fvoice but a new seed\r\n- **Source tracking** — each version records which version it was derived from\r\n- **Version pinning in stories** — pin a specific version to a story track clip\r\n- **Favorites** — star generations for quick access\r\n\r\n### Language Parameter Fix\r\nQwen TTS models now correctly receive the selected language. The generation form syncs with the voice profile's language setting.\r\n\r\n---\r\n\r\n## Post-Processing Effects ([#271](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F271))\r\n\r\nA full audio effects system powered by Spotify's `pedalboard` library. 
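The effect-chain idea can be sketched in pure Python. This is a stand-in for illustration only (`gain` and `hard_clip` are invented effects, not Voicebox's); `pedalboard`'s `Pedalboard([...])` applies a list of effect objects to an audio buffer in the same ordered fashion:

```python
# Pure-Python stand-in for an ordered audio effects chain.
# Effects here are hypothetical; pedalboard applies real DSP plugins this way.
from typing import Callable, List

Effect = Callable[[List[float]], List[float]]

def gain(factor: float) -> Effect:
    return lambda samples: [s * factor for s in samples]

def hard_clip(limit: float) -> Effect:
    return lambda samples: [max(-limit, min(limit, s)) for s in samples]

def apply_chain(samples: List[float], chain: List[Effect]) -> List[float]:
    # Each effect transforms the buffer in order, feeding the next one
    for effect in chain:
        samples = effect(samples)
    return samples
```

Because effects compose left to right, reordering the chain changes the result, which is why presets store the full ordered list rather than a set of toggles.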
Apply effects after generation, preview in real time, and build reusable presets.\r\n\r\n| Effect | Description |\r\n|--------|-------------|\r\n| Pitch Shift | ±12 semitones |\r\n| Reverb | Room size, damping, wet\u002Fdry mix |\r\n| Delay | Adjustable time, feedback, mix |\r\n| ","2026-03-15T15:27:23",{"id":199,"version":200,"summary_zh":201,"released_at":202},100853,"v0.1.13","## What's Changed\r\n\r\n### Stability and reliability\r\n- [#95](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F95) Fix: selecting 0.6B model still downloads and uses 1.7B\r\n- [#93](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F93) fix(mlx): bundle native libs and broaden error handling for Apple Silicon\r\n- [#79](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F79) fix: handle non-ASCII filenames in Content-Disposition headers\r\n- [#78](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F78) fix: guard getUserMedia call against undefined mediaDevices in non-secure contexts\r\n- [#77](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F77) fix: await for confirmation before deleting voices and channels\r\n- [#128](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F128) fix: resolve multiple issues (#96, #119, #111, #108, #121, #125, #127)\r\n- [#40](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F40) Fix: audio export path resolution\r\n\r\n### Build and packaging\r\n- [#122](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F122) fix(web): add @tailwindcss\u002Fvite plugin to web config\r\n- [#126](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F126) Create requirements.txt\r\n\r\n### UX and docs\r\n- [#44](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F44) Enhances floating generate box UX\r\n- 
[#57](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F57) chore: updates repo URL in README\r\n- [#146](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F146) Add Spacebot banner to landing page\r\n- [#1](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fpull\u002F1) Improvements\r\n","2026-02-23T22:01:37",{"id":204,"version":205,"summary_zh":206,"released_at":207},100854,"v0.1.12","## Model Download UX Overhaul\r\n\r\n- Real-time download progress tracking with accurate percentage and speed info\r\n- No more \"downloading\" notifications during generation when nothing is actually downloading\r\n- Better error handling and status reporting throughout the download process\r\n\r\n## Other Improvements\r\n\r\n- Enhanced health check endpoint with GPU type information\r\n- Improved model caching verification\r\n- More reliable SSE progress updates\r\n- Actual update notifications: you no longer need to go to Settings and check manually\r\n\r\nNote: CUDA support for Windows is coming in the next update; see this [issue](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fissues\u002F8#issuecomment-3828047073) and my [plan](https:\u002F\u002Fgithub.com\u002Fjamiepine\u002Fvoicebox\u002Fblob\u002Fmain\u002Fdocs\u002Fplans\u002FTTS_PROVIDER_ARCHITECTURE.md). 
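The "GPU type information" added to the health check can be illustrated with a small sketch. The function name and its inputs are hypothetical; only the returned values follow what later release notes report ("CUDA (GPU Name)", "MPS (Apple Silicon)", or None):

```python
# Hypothetical sketch of deriving a health-check gpu_type field.
# Only the output shapes are taken from the release notes.
from typing import Optional

def gpu_type(cuda_available: bool, cuda_name: Optional[str],
             mps_available: bool) -> Optional[str]:
    if cuda_available and cuda_name:
        return f"CUDA ({cuda_name})"      # e.g. "CUDA (RTX 4090)"
    if mps_available:
        return "MPS (Apple Silicon)"
    return None                           # no accelerator detected
```

A UI can then show "GPU: Not Available" only when the field is genuinely None, rather than when detection simply wasn't reported.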
","2026-01-31T10:10:36",{"id":209,"version":210,"summary_zh":211,"released_at":212},100855,"v0.1.11","- Fixed transcriptions on MLX\r\n- Fixed model download progress (finally)","2026-01-30T11:38:38",{"id":214,"version":215,"summary_zh":216,"released_at":217},100856,"v0.1.10","## Faster generation on Apple Silicon \r\n\r\nMassive speed gains, from around 20s per generation to 2-3s!\r\n\r\nAdded native MLX backend support for Apple Silicon, providing significantly faster TTS and STT generation on M-series macOS machines.\r\n\r\n**Note: this update broke transcriptions on Apple Silicon only, the patch is in the oven as we speak, 0.1.11 will follow.**\r\n\r\n### Features\r\n\r\n- **MLX Backend**: New backend implementation optimized for Apple Silicon using MLX framework\r\n- **Dynamic Backend Selection**: Automatically detects platform and selects between MLX (macOS) and PyTorch (other platforms)\r\n- **Improved Performance**: Leverages Apple's unified memory architecture for faster model inference\r\n\r\n### Backend Changes\r\n\r\n- Refactored TTS and STT logic into modular backend implementations (`mlx_backend.py`, `pytorch_backend.py`)\r\n- Added platform detection system to handle backend selection at runtime\r\n- Updated model loading and caching to support both backend types\r\n- Enhanced health check endpoints to report active backend type\r\n\r\n### Build & Release\r\n\r\n- Updated build process to include MLX-specific dependencies for macOS builds\r\n- Modified release workflow to handle platform-specific backend bundling\r\n- Added `requirements-mlx.txt` for MLX dependencies\r\n\r\n### Documentation\r\n\r\n- Updated setup and building guides with MLX-specific instructions\r\n- Added troubleshooting guidance for MLX-related issues\r\n- Enhanced architecture documentation to explain backend selection","2026-01-30T09:10:35",{"id":219,"version":220,"summary_zh":221,"released_at":222},100857,"v0.1.9","### Improved voice profile creation flow, \r\n- Voice create 
drafts: No longer lose work if you close the modal\r\n- Fixed Whisper transcribing only English or Chinese; all languages are now supported\r\n\r\n### Improved Stories editor:\r\n- Added spacebar for play\u002Fpause\r\n- Timeline now auto-scrolls to follow playhead during playback\r\n- Fixed items misaligning with the mouse when picked up\r\n- Fixed hitbox for selecting an item\r\n- Fixed playhead jumping forward when pressing play (the timing anchors bug)\r\n\r\n### Generation box improvements\r\n- Instruct mode no longer wipes prompt text\r\n- Improved UI cleanliness\r\n\r\n### Misc\r\n- Fixed \"Model downloading\" toast during generation when model is already downloaded","2026-01-30T01:44:16",{"id":224,"version":225,"summary_zh":226,"released_at":227},100858,"v0.1.8","## 🐛 Bug Fixes\r\n\r\n### Model Download Timeout Issues\r\nFixed critical issue where model downloads would fail with \"Failed to fetch\" errors on Windows:\r\n\r\n- **Root Cause**: Multi-GB model downloads exceeded HTTP request timeout (30-60s), causing frontend to show errors even though downloads were continuing in background\r\n- **Solution**: Refactored download endpoints to return immediately and continue downloads in background\r\n- `\u002Fmodels\u002Fdownload` endpoint now returns instantly with download starting in background\r\n- `\u002Fgenerate` and `\u002Ftranscribe` endpoints now auto-start model downloads when needed\r\n- Returns 202 Accepted status with download progress information for better UX\r\n- Frontend can track download progress via SSE endpoint and retry when complete\r\n\r\n### Cross-Platform Cache Path Issues\r\n- Fixed hardcoded `~\u002F.cache\u002Fhuggingface\u002Fhub` paths that don't work on Windows\r\n- All cache paths now use `hf_constants.HF_HUB_CACHE` for proper cross-platform support\r\n- Windows: Uses `%USERPROFILE%\\.cache\\huggingface\\hub` or `%LOCALAPPDATA%`\r\n- macOS\u002FLinux: Uses `~\u002F.cache\u002Fhuggingface\u002Fhub`\r\n- Ensures HuggingFace 
cache directory exists on startup (defensive fix)\r\n\r\n## ✨ Features\r\n\r\n### Windows Process Management\r\n- Added `\u002Fshutdown` endpoint for graceful server shutdown on Windows\r\n- Improved process lifecycle management for bundled server binary\r\n\r\n### GPU Detection Improvements  \r\n- Added `gpu_type` field to health check response\r\n- Now shows specific GPU type: \"CUDA (GPU Name)\", \"MPS (Apple Silicon)\", or None\r\n- Fixes UI showing \"GPU: Not Available\" when MPS\u002FCUDA is actually detected","2026-01-29T12:03:59",{"id":229,"version":230,"summary_zh":231,"released_at":232},100859,"v0.1.7","## Features\r\n\r\n- Trim and split audio clips in Story Editor\r\n- Auto-activation of stories in Story Editor with visible playhead\r\n- Conditional auto-play support in AudioPlayer for better user control\r\n\r\n## Improvements\r\n\r\n- Refactored audio loading with `setAudioWithAutoPlay` across HistoryTable, SampleList, and generation forms\r\n- Improved playback state management to accurately reflect current playing status\r\n- Cleaned up component formatting in HistoryTable and SampleList for consistency\r\n\r\n## Fixes\r\n\r\n- Audio now only auto-plays when explicitly intended, preventing unexpected playback\r\n","2026-01-29T06:48:47",{"id":234,"version":235,"summary_zh":236,"released_at":237},100860,"v0.1.6","Introducing Stories\r\n\r\n\u003Cimg width=\"1312\" height=\"912\" alt=\"Screenshot 2026-01-28 at 9 14 22 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F727d7e28-bebe-4a3c-a373-17b953a8e934\" \u002F>\r\n\r\nThis release introduces Stories, a full voice editor for composing podcasts and generated conversations.\r\n\r\n## What's New\r\n\r\n**Stories Editor**\r\n\r\nCreate multi-voice narratives, podcasts, or conversations with a timeline-based editor. 
Stories lets you:\r\n\r\n- Compose tracks with different voices\r\n- Edit and arrange audio segments inline\r\n- Build generated conversations with multiple participants\r\n- Manage playback across multiple voice tracks\r\n\r\n**Improved Voice Generation UI**\r\n\r\n- Auto-resizing input for longer prompts\r\n- Default voice selection for faster workflow\r\n- Better layout and interaction patterns\r\n\r\n**Track Editor Integration**\r\n\r\n- Inline track editing within story items\r\n- Improved playback controls and dynamics\r\n- Seamless integration between story content and audio tracks\r\n\r\nStories gives you fine-grained control over voice generation workflows, making it easier to create full podcasts or scripted conversations from scratch.\r\n","2026-01-29T05:17:36",{"id":239,"version":240,"summary_zh":241,"released_at":242},100861,"v0.1.5","Fixed the recording length limit at 0:29 to auto-stop instead of passing the limit and erroring, which caused users to lose their recording","2026-01-28T23:05:10",{"id":244,"version":245,"summary_zh":246,"released_at":247},100862,"v0.1.4","# Changelog - v0.1.4\r\n\r\n## Audio Channels\r\n\r\n- Audio channel management system\r\n- Native audio playback handling in AudioPlayer component\r\n- Improved debugging capabilities for audio playback\r\n\r\n## UI Improvements\r\n\r\n- Refactored ConnectionForm and Checkbox components\r\n- Improved layout consistency and responsiveness across components\r\n- Added safe area constants for better responsive design\r\n- Reorganized App layout with new component structure\r\n- Cleaner UI with CheckCircle2 icon removal","2026-01-28T02:57:57",{"id":249,"version":250,"summary_zh":251,"released_at":252},100863,"v0.1.3","- Improved the generate textbox\r\n- Maybe fixed Windows autoupdate restarting the entire computer","2026-01-27T22:39:52",{"id":254,"version":255,"summary_zh":256,"released_at":257},100864,"v0.1.2","### Audio Capture & Format Conversion:\r\n  - Added audio format 
conversion util audio.ts\r\n  - Enhanced system audio capture on both macOS\r\n  and Windows (significantly improved macos.rs\r\n  and windows.rs)\r\n  - Improved audio recording hooks\r\n  (useSystemAudioCapture, useAudioRecording)\r\n  - Added audio input entitlement for macOS\r\n  - Added audio capture tests\r\n\r\n ### Update System:\r\n  - Enhanced auto-updater functionality and\r\n  update status display","2026-01-27T05:42:31",{"id":259,"version":260,"summary_zh":261,"released_at":262},100865,"v0.1.1","# Voicebox v0.1.1 Release Notes\r\n\r\n## Platform Support\r\n\r\n- **macOS Audio Capture**: Added native audio capture support for sample creation on macOS\r\n- **Windows Audio Capture**: Implemented and refactored audio capture using WASAPI with improved thread safety and error handling\r\n- **Linux Support**: Temporarily removed Linux builds due to GitHub runner disk space constraints (coming soon)\r\n\r\n## Audio Features\r\n\r\n### Recording & Playback\r\n- Added play\u002Fpause functionality for audio samples across all components\r\n- Introduced three new audio sample components:\r\n  - `AudioSampleRecording` - Direct microphone recording\r\n  - `AudioSampleSystem` - System audio capture\r\n  - `AudioSampleUpload` - File upload with drag-and-drop support\r\n- Enhanced audio duration handling to fix metadata issues on Windows\r\n- Improved audio player with restart functionality and better reset handling\r\n\r\n### Audio Management\r\n- Added audio validation and error handling with better user feedback\r\n- Implemented consistent audio cleanup across the application\r\n- Fixed audio file duration handling across components\r\n\r\n## Voice Profile Management\r\n\r\n- Added voice profile import functionality with file size validation (100MB limit)\r\n- Enhanced profile form with new audio sample components\r\n- Improved drag-and-drop support for audio file uploads\r\n- Better error handling and user notifications during import\u002Fexport\r\n\r\n## Server 
Management\r\n\r\n- Changed default server URL from `localhost:8000` to `127.0.0.1:17493`\r\n- Added server reuse logic to detect and connect to existing instances\r\n- Implemented \"keep server running\" preference with cleanup on exit\r\n- Enhanced orphaned process handling\r\n- Improved server startup logging with dynamic URL display\r\n\r\n## UI\u002FUX Improvements\r\n\r\n- Added `TitleBarDragRegion` component for improved window dragging on macOS\r\n- Refactored App and Sidebar components for better platform support\r\n- Introduced alert dialog component for confirmations\r\n- Enhanced sidebar icon representation\r\n- Improved user feedback messages throughout the app\r\n- Refactored global styles to use Tailwind CSS directives\r\n\r\n## Build & Release Process\r\n\r\n- Added `.bumpversion.cfg` for automated version management\r\n- Enhanced icon generation script to support multi-size Windows icons (icon.ico)\r\n- Updated build scripts for better formatting and readability\r\n- Removed bumpversion dependency from backend requirements\r\n- Improved CONTRIBUTING.md with clearer release process documentation\r\n\r\n## Bug Fixes\r\n\r\n- Fixed date formatting to handle timezone-less date strings as UTC\r\n- Fixed getLatestRelease function to properly filter downloadable files\r\n- Improved audio duration metadata reading on Windows\r\n- Enhanced error handling across audio components\r\n\r\n## Technical Improvements\r\n\r\n- Introduced `AtomicBool` for thread-safe stop signal handling in audio capture\r\n- Updated WASAPI integration for better buffer management\r\n- Enhanced audio capture task spawning for non-Send type compatibility\r\n- Added 'windows' crate dependency for Windows-specific functionality\r\n- Improved progress tracking for model downloads in backend\r\n\r\n---\r\n\r\n**Full Changelog**: v0.1.0...v0.1.1\r\n","2026-01-27T03:00:57",{"id":264,"version":265,"summary_zh":266,"released_at":267},100866,"v0.1.0","# Voicebox v0.1.0\r\n\r\nThe first public 
release of Voicebox — an open-source voice synthesis studio powered by Qwen3-TTS.\r\n\r\n---\r\n\r\n## Download\r\n\r\n| Platform | Status |\r\n|----------|--------|\r\n| macOS (Apple Silicon) | Available |\r\n| macOS (Intel) | Available |\r\n| Windows (x64) | Available |\r\n| Linux | Coming soon* |\r\n\r\n*Linux builds are delayed due to GitHub Actions CI issues. We're working on it and will release Linux support in v0.1.1.\r\n\r\n---\r\n\r\n## What's in this release\r\n\r\n### Voice Cloning with Qwen3-TTS\r\n\r\nClone any voice from just a few seconds of audio using Alibaba's Qwen3-TTS model.\r\n\r\n- **Automatic model download** — Models download from HuggingFace on first use\r\n- **Multiple model sizes** — Support for 1.7B and 0.6B parameter models\r\n- **Voice prompt caching** — Regenerate instantly without reprocessing audio\r\n- **Multi-language** — English and Chinese support\r\n\r\n### Voice Profile Management\r\n\r\n- **Create profiles** from audio files or record directly in the app\r\n- **Multiple samples per profile** — Combine samples for higher quality cloning\r\n- **Import\u002FExport** — Share profiles or back them up\r\n- **Automatic transcription** — Whisper extracts reference text from samples\r\n\r\n### Speech Generation\r\n\r\n- **Simple text-to-speech** — Select a profile, type text, generate\r\n- **Seed control** — Reproducible generations with optional seed input\r\n- **Long-form support** — Generate up to 5,000 characters at once\r\n\r\n### Generation History\r\n\r\n- **Full history** — Every generation is saved with metadata\r\n- **Search** — Find past generations by text content\r\n- **Inline playback** — Listen without leaving the app\r\n- **Download** — Export audio files to your system\r\n\r\n### Flexible Deployment\r\n\r\n- **Local mode** — Backend runs alongside the desktop app\r\n- **Remote mode** — Connect to a GPU server on your network\r\n- **One-click server** — Turn any machine into a Voicebox server\r\n\r\n### Desktop 
Experience\r\n\r\n- **Native performance** — Built with Tauri (Rust), not Electron\r\n- **Cross-platform** — Same experience on macOS and Windows\r\n- **Bundled backend** — No Python installation required\r\n\r\n---\r\n\r\n## Tech Stack\r\n\r\n- **Desktop:** Tauri v2 (Rust)\r\n- **Frontend:** React, TypeScript, Tailwind CSS\r\n- **Backend:** FastAPI (Python)\r\n- **Voice Model:** Qwen3-TTS\r\n- **Transcription:** Whisper\r\n- **Database:** SQLite\r\n\r\n---\r\n\r\n## Known Issues\r\n\r\n- **First launch is slow** — Model downloads (2-7GB) on first use\r\n- **Apple Silicon performance** — Generation takes ~10s per paragraph on M1\u002FM2 chips; CUDA is significantly faster\r\n- **Linux not available** — CI pipeline issues; coming in v0.1.1\r\n\r\n---\r\n\r\n## What's Next\r\n\r\nWe're already working on the next release. Here's a preview:\r\n\r\n- **Linux support** — Top priority\r\n- **Real-time synthesis** — Stream audio as it generates\r\n- **Voice effects** — Pitch shift, reverb, and more\r\n- **Timeline editor** — Word-level precision audio editing\r\n- **Conversation mode** — Multi-speaker dialogue generation\r\n- **More models** — XTTS, Bark, and other open-source voice models\r\n\r\n---\r\n\r\n## Feedback\r\n\r\nFound a bug? Have a feature request? Open an issue on GitHub or reach out at [voicebox.sh](https:\u002F\u002Fvoicebox.sh).\r\n\r\n---\r\n\r\n**Thank you for trying Voicebox!**\r\n\r\nP.S: This was originally released yesterday, note to self, don't let Claude manage GitHub tags with bypass permissions turned on.","2026-01-27T04:03:34"]
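The v0.1.1 notes above mention a bug fix for "date formatting to handle timezone-less date strings as UTC" (release timestamps like `2026-01-27T03:00:57` carry no offset, so naive parsing silently interprets them in the viewer's local zone). A minimal sketch of that normalization in Python; `parse_release_date` is a hypothetical helper name, not the app's actual implementation, which lives in the frontend:

```python
from datetime import datetime, timezone

def parse_release_date(raw: str) -> datetime:
    """Parse an ISO 8601 timestamp; treat timezone-less strings as UTC.

    Without this, a naive datetime would be displayed (or compared)
    in local time, shifting release dates across timezones.
    """
    dt = datetime.fromisoformat(raw)
    if dt.tzinfo is None:
        # No offset in the string: pin it to UTC explicitly.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt

# Usage: both forms now refer to the same instant.
naive = parse_release_date("2026-01-27T03:00:57")
aware = parse_release_date("2026-01-27T03:00:57+00:00")
```

The same pitfall exists in the app's TypeScript frontend, where `new Date("2026-01-27T03:00:57")` is parsed as local time; the equivalent fix there is to append an explicit `Z` before constructing the `Date`.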