[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-vndee--local-talking-llm":3,"tool-vndee--local-talking-llm":64},[4,17,27,35,44,52],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[13,14,15,43],"视频",{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":23,"last_commit_at":50,"category_tags":51,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 
提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":53,"name":54,"github_repo":55,"description_zh":56,"stars":57,"difficulty_score":23,"last_commit_at":58,"category_tags":59,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,60,43,61,15,62,26,13,63],"数据工具","插件","其他","音频",{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":93,"forks":94,"last_commit_at":95,"license":96,"difficulty_score":97,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":115,"github_topics":116,"view_count":23,"oss_zip_url":121,"oss_zip_packed_at":121,"status":16,"created_at":122,"updated_at":123,"faqs":124,"releases":159},4235,"vndee\u002Flocal-talking-llm","local-talking-llm","A talking LLM that runs on your own computer without needing the internet.","local-talking-llm 是一款能让你在本地电脑上打造专属离线语音助手的开源项目。它无需联网即可运行，旨在解决用户对数据隐私的担忧以及对云端服务的依赖问题，让你拥有像电影《钢铁侠》中\"Jarvis\"那样能听会说、完全私有的智能伙伴。\n\n该项目非常适合希望保护隐私的开发者、AI 爱好者以及想要尝试构建本地化应用的研究人员。通过整合三大核心技术，它实现了完整的语音交互闭环：利用 OpenAI Whisper 进行高精度的语音转文字识别；接入 Ollama 运行本地大语言模型（或兼容云端模型）作为“大脑”处理对话逻辑；最后采用最新的 ChatterBox TTS 模型将回复转化为自然语音。\n\n其独特的技术亮点在于引入了先进的 ChatterBox 
模型，不仅支持仅需少量样本即可实现的“声音克隆”功能，还能灵活控制回复的情感色彩，同时具备更快的推理速度和内置的音频水印技术。整个流程从录音、转录、思考到发声均在本地完成，为用户提供了安全、流畅且高度可定制的语音交互体验。","## Build your own voice assistant and run it locally: Whisper + Ollama + ChatterBox\n\n> Original article: https:\u002F\u002Fblog.duy-huynh.com\u002Fbuild-your-own-voice-assistant-and-run-it-locally\u002F\n>\n> **Updated May 2025**: Now using [Chatterbox TTS](https:\u002F\u002Fgithub.com\u002Fresemble-ai\u002Fchatterbox), a state-of-the-art open-source TTS model from Resemble AI!\n>\n> The original implementation using Bark has been preserved in the `archive-2025-05-29` branch for reference.\n\n[![BuyMeACoffee](https:\u002F\u002Fraw.githubusercontent.com\u002Fpachadotdev\u002Fbuymeacoffee-badges\u002Fmain\u002Fbmc-yellow.svg)](https:\u002F\u002Fwww.buymeacoffee.com\u002Fvndee)\n\n\nAfter my latest post about how to build your own RAG and run it locally. Today, we're taking it a step further by not only implementing the conversational abilities of large language models but also adding listening and speaking capabilities. The idea is straightforward: we are going to create a voice assistant reminiscent of Jarvis or Friday from the iconic Iron Man movies, which can operate offline on your computer.\n\n**New Features with ChatterBox:**\n- 🎯 **Voice Cloning**: Clone any voice with just a short audio sample\n- 🎭 **Emotion Control**: Adjust emotional expressiveness of responses\n- 🚀 **Better Performance**: 0.5B parameter model with faster inference\n- 💧 **Watermarked Audio**: Built-in neural watermarking for authenticity\n\n### Techstack\nFirst, you should set up a virtual Python environment. You have several options for this, including pyenv, virtualenv, poetry, and others that serve a similar purpose. Personally, I'll use Poetry for this tutorial due to my personal preferences. 
Here are several crucial libraries you'll need to install:\n\n- **rich**: For a visually appealing console output.\n- **openai-whisper**: A robust tool for speech-to-text conversion.\n- **chatterbox-tts**: State-of-the-art text-to-speech synthesis with voice cloning and emotion control.\n- **langchain**: A straightforward library for interfacing with Large Language Models (LLMs).\n- **langchain-openai**: For connecting to OpenAI-compatible cloud LLM providers like [MiniMax](https:\u002F\u002Fwww.minimaxi.com).\n- **sounddevice**, **pyaudio**, and **speechrecognition**: Essential for audio recording and playback.\n\nFor a detailed list of dependencies, refer to the link here.\n\nThe most critical component here is the Large Language Model (LLM) backend. By default, we use **Ollama** for running LLMs locally. Alternatively, you can use **MiniMax** as a cloud LLM provider for higher-quality responses without local GPU requirements. If Ollama is new to you, I recommend checking out my previous article on offline RAG: \"Build Your Own RAG and Run It Locally: Langchain + Ollama + Streamlit\". Basically, you just need to download the Ollama application, pull your preferred model, and run it.\n\n### Architecture\nOkay, if everything has been set up, let's proceed to the next step. Below is the overall architecture of our application, which fundamentally comprises 3 main components:\n\n- **Speech Recognition**: Utilizing OpenAI's Whisper, we convert spoken language into text. Whisper's training on diverse datasets ensures its proficiency across various languages and dialects.\n- **Conversational Chain**: For the conversational capabilities, we'll employ the Langchain interface with a pluggable LLM backend — either a local model via Ollama (e.g., Gemma3, Llama-4) or a cloud model via [MiniMax](https:\u002F\u002Fwww.minimaxi.com) (MiniMax-M2.7). 
This setup promises a seamless and engaging conversational flow.\n- **Speech Synthesizer**: The transformation of text to speech is achieved through Chatterbox TTS, a state-of-the-art model from Resemble AI, renowned for its lifelike speech production and voice cloning capabilities.\n\nThe workflow is straightforward: record speech, transcribe to text, generate a response using an LLM, and vocalize the response using ChatterBox.\n\n```mermaid\nflowchart TD\n    A[🎤 User Speech Input] --> B[Speech Recognition\u003Cbr\u002F>OpenAI Whisper]\n    B --> C[📝 Text Transcription]\n    C --> D[Conversational Chain\u003Cbr\u002F>Langchain + Ollama \u002F MiniMax\u003Cbr\u002F>Gemma3 \u002F Llama-4 \u002F MiniMax-M2.7]\n    D --> E[🤖 Generated Response]\n    E --> F[Speech Synthesizer\u003Cbr\u002F>Chatterbox TTS]\n    F --> G[🔊 Audio Output]\n    G --> H[👤 User Hears Response]\n\n    style A fill:#e1f5fe\n    style B fill:#f3e5f5\n    style C fill:#e8f5e8\n    style D fill:#fff3e0\n    style E fill:#e8f5e8\n    style F fill:#f3e5f5\n    style G fill:#e1f5fe\n    style H fill:#fce4ec\n```\n\n### Installation\n\n**⚠️ Important**: We strongly recommend using [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F) for dependency management instead of pip with `requirements.txt`. 
The `requirements.txt` file was generated by `uv pip freeze` and contains pinned versions that may not install correctly across different systems.\n\n#### Option 1: Using uv (Recommended)\n\n```bash\n# Install uv if you haven't already\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n# or on macOS: brew install uv\n# or on Windows: powershell -ExecutionPolicy ByPass -c \"irm https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.ps1 | iex\"\n\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002Fvndee\u002Flocal-talking-llm.git\ncd local-talking-llm\n\n# Install dependencies using uv (recommended)\nuv sync\n\n# Activate the virtual environment\nsource .venv\u002Fbin\u002Factivate  # On Windows: .venv\\Scripts\\activate\n\n# Download NLTK data (for sentence tokenization)\npython -c \"import nltk; nltk.download('punkt_tab')\"\n```\n\n#### Option 2: Using pip (Alternative)\n\nIf you prefer to use pip, install directly from pyproject.toml:\n\n```bash\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002Fvndee\u002Flocal-talking-llm.git\ncd local-talking-llm\n\n# Create virtual environment\npython -m venv venv\nsource venv\u002Fbin\u002Factivate  # On Windows: venv\\Scripts\\activate\n\n# Install from pyproject.toml (NOT requirements.txt)\npip install -e .\n\n# Download NLTK data\npython -c \"import nltk; nltk.download('punkt')\"\n```\n\n#### Install and Setup Ollama\n\n```bash\n# Install and start Ollama\n# Follow instructions at https:\u002F\u002Follama.ai\nollama pull gemma3  # or any other model you prefer\n```\n\n#### Setup MiniMax (Cloud LLM Alternative)\n\nIf you don't have a local GPU or prefer higher-quality cloud models, you can use [MiniMax](https:\u002F\u002Fwww.minimaxi.com) as the LLM backend:\n\n1. Sign up at [MiniMax Platform](https:\u002F\u002Fwww.minimaxi.com) and get your API key\n2. 
Set the environment variable:\n   ```bash\n   export MINIMAX_API_KEY=\"your-api-key-here\"\n   ```\n\nNo Ollama installation is needed when using MiniMax — the LLM runs in the cloud while TTS and STT still run locally.\n\n### Usage\n\n#### Basic Usage\n```bash\npython app.py\n```\n\n#### With Voice Cloning\nRecord a 10-30 second audio sample of the voice you want to clone, then:\n```bash\npython app.py --voice path\u002Fto\u002Fvoice_sample.wav\n```\n\n#### With Custom Settings\n```bash\n# Adjust emotion and pacing\npython app.py --exaggeration 0.7 --cfg-weight 0.3\n\n# Use a different LLM model\npython app.py --model codellama\n\n# Save generated voice samples\npython app.py --save-voice\n```\n\n#### With MiniMax Cloud LLM\n```bash\n# Use MiniMax as the LLM provider (requires MINIMAX_API_KEY env var)\npython app.py --provider minimax\n\n# Use a specific MiniMax model with custom temperature\npython app.py --provider minimax --model MiniMax-M2.7 --temperature 0.8\n\n# Pass API key directly\npython app.py --provider minimax --api-key your-api-key-here\n\n# Combine with voice cloning and emotion control\npython app.py --provider minimax --voice path\u002Fto\u002Fvoice.wav --exaggeration 0.7\n```\n\n### Configuration Options\n\n- `--provider`: LLM provider (`ollama` or `minimax`, default: ollama)\n- `--api-key`: API key for cloud LLM providers (or use `MINIMAX_API_KEY` env var)\n- `--temperature`: LLM temperature (0.0-1.0, default: 0.7)\n- `--voice`: Path to audio file for voice cloning\n- `--exaggeration`: Emotion intensity (0.0-1.0, default: 0.5)\n  - Lower values (0.3-0.4): Calmer, more neutral delivery\n  - Higher values (0.7-0.9): More expressive and emotional\n- `--cfg-weight`: Controls pacing and delivery style (0.0-1.0, default: 0.5)\n  - Lower values: Faster, more dynamic speech\n  - Higher values: Slower, more deliberate speech\n- `--model`: Ollama model to use (default: llama2)\n- `--save-voice`: Save generated audio responses to `voices\u002F` 
directory\n\n### Implementation Details\n\n#### TextToSpeechService with ChatterBox\nThe new TextToSpeechService leverages ChatterBox's advanced features:\n\n```python\nfrom chatterbox.tts import ChatterboxTTS\n\nclass TextToSpeechService:\n    def __init__(self, device: str = \"cuda\" if torch.cuda.is_available() else \"cpu\"):\n        self.device = device\n        self.model = ChatterboxTTS.from_pretrained(device=device)\n        self.sample_rate = self.model.sr\n\n    def synthesize(self, text: str, audio_prompt_path: str = None,\n                  exaggeration: float = 0.5, cfg_weight: float = 0.5):\n        wav = self.model.generate(\n            text,\n            audio_prompt_path=audio_prompt_path,\n            exaggeration=exaggeration,\n            cfg_weight=cfg_weight\n        )\n        return self.sample_rate, wav.squeeze().cpu().numpy()\n```\n\nKey improvements over the previous Bark implementation:\n- **Voice Cloning**: Pass an audio file to clone any voice\n- **Emotion Control**: Adjust expressiveness with the `exaggeration` parameter\n- **Better Quality**: ChatterBox produces more natural-sounding speech\n- **Faster Inference**: Smaller model size (0.5B vs Bark's larger models)\n\n#### Dynamic Emotion Analysis\nThe app now includes automatic emotion detection to make responses more expressive:\n\n```python\ndef analyze_emotion(text: str) -> float:\n    emotional_keywords = ['amazing', 'terrible', 'love', 'hate', 'excited',\n                         'sad', 'happy', 'angry', '!', '?!']\n    emotion_score = 0.5\n    for keyword in emotional_keywords:\n        if keyword in text.lower():\n            emotion_score += 0.1\n    return min(0.9, max(0.3, emotion_score))\n```\n\n### Tips for Best Results\n\n1. **Voice Cloning**:\n   - Use a clear 10-30 second audio sample\n   - Ensure the sample has minimal background noise\n   - The voice should speak naturally in the sample\n\n2. 
**Emotion Control**:\n   - For general conversation: `exaggeration=0.5, cfg_weight=0.5`\n   - For dramatic\u002Fexpressive speech: `exaggeration=0.7+, cfg_weight=0.3`\n   - For calm\u002Fprofessional tone: `exaggeration=0.3, cfg_weight=0.7`\n\n3. **Performance**:\n   - Use CUDA if available for faster inference\n   - The first generation might be slower due to model loading\n   - Consider using smaller Whisper models (\"tiny.en\" or \"base.en\") for faster transcription\n\n### Scaling to Production\n\nFor those aiming to elevate this application to a production-ready status, consider:\n\n- **Performance Optimization**:\n  - Use optimized inference engines (ONNX, TensorRT)\n  - Implement model quantization for faster inference\n  - Add caching for frequently used phrases\n\n- **Enhanced Features**:\n  - Multi-speaker support with voice profiles\n  - Real-time voice conversion\n  - Integration with more LLM providers\n  - Web interface with real-time streaming\n\n- **Voice Database**:\n  - Create a library of voice samples\n  - Implement voice selection UI\n  - Add voice mixing capabilities\n\n- **API Service**:\n  - RESTful API for TTS requests\n  - WebSocket support for real-time communication\n  - Rate limiting and authentication\n\n### Troubleshooting\n\n#### Dependency Installation Issues\n\n**Problem**: `requirements.txt` installation fails with errors like:\n- `ERROR: Could not find a version that satisfies the requirement cPython==0.0.6`\n- `ModuleNotFoundError: No module named 'distutils'`\n- Various version conflicts\n\n**Solution**: The `requirements.txt` file was generated by `uv pip freeze` and contains exact versions that may not work across different systems. Use one of these alternatives:\n\n1. **Use uv (Recommended)**:\n   ```bash\n   uv sync\n   ```\n\n2. **Use pip with pyproject.toml**:\n   ```bash\n   pip install -e .\n   ```\n\n3. 
**Manual installation of core packages**:\n   ```bash\n   pip install chatterbox-tts langchain-ollama openai-whisper sounddevice rich nltk\n   ```\n\n#### Runtime Issues\n\n- **CUDA out of memory**: Use CPU mode or reduce model precision\n- **Microphone not working**: Check system permissions and device settings\n- **Slow inference**: Ensure you're using GPU if available, consider using smaller models\n- **Voice cloning quality**: Use higher quality audio samples with clear speech\n- **Import errors**: Make sure you activated the virtual environment before running the app\n\n### Conclusion\n\nWith the integration of ChatterBox, we've significantly enhanced our local voice assistant. The addition of voice cloning and emotion control opens up new possibilities for creating personalized and expressive AI assistants. Whether you're building a personal Jarvis, creating content, or developing voice-enabled applications, this updated stack provides a powerful foundation.\n\nThe combination of Whisper's robust speech recognition, Ollama's flexible LLM serving, and ChatterBox's advanced TTS capabilities creates a fully-featured voice assistant that runs entirely offline. 
No cloud services, no API keys, just pure local AI power!\n\n### Resources\n\n- [ChatterBox GitHub](https:\u002F\u002Fgithub.com\u002Fresemble-ai\u002Fchatterbox)\n- [Ollama](https:\u002F\u002Follama.ai)\n- [MiniMax Platform](https:\u002F\u002Fwww.minimaxi.com)\n- [Whisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper)\n- [Original Blog Post](https:\u002F\u002Fblog.duy-huynh.com\u002Fbuild-your-own-voice-assistant-and-run-it-locally\u002F)\n\n---\n\n","## 搭建属于你的语音助手并在本地运行：Whisper + Ollama + ChatterBox\n\n> 原文链接：https:\u002F\u002Fblog.duy-huynh.com\u002Fbuild-your-own-voice-assistant-and-run-it-locally\u002F\n>\n> **更新于2025年5月**：现已采用 Resemble AI 推出的最先进开源 TTS 模型——[Chatterbox TTS](https:\u002F\u002Fgithub.com\u002Fresemble-ai\u002Fchatterbox)，表现更胜一筹！\n>\n> 使用 Bark 的原始实现仍保留在 `archive-2025-05-29` 分支中，供参考。\n\n[![BuyMeACoffee](https:\u002F\u002Fraw.githubusercontent.com\u002Fpachadotdev\u002Fbuymeacoffee-badges\u002Fmain\u002Fbmc-yellow.svg)](https:\u002F\u002Fwww.buymeacoffee.com\u002Fvndee)\n\n\n继我上一篇关于如何搭建并本地运行 RAG 的文章之后，今天我们将更进一步：不仅实现大型语言模型的对话能力，还为其添加听和说的功能。我们的目标很简单——打造一款类似经典电影《钢铁侠》中贾维斯或弗莱迪那样的语音助手，它可以在你的电脑上离线运行。\n\n**ChatterBox 新特性：**\n- 🎯 **语音克隆**：只需一段简短音频样本即可克隆任意声音\n- 🎭 **情感控制**：调节回复的情感表达强度\n- 🚀 **性能提升**：0.5B 参数量模型，推理速度更快\n- 💧 **水印音频**：内置神经网络水印技术，确保音频真实性\n\n### 技术栈\n首先，你需要设置一个 Python 虚拟环境。你可以选择 pyenv、virtualenv、poetry 等工具来完成这一任务。出于个人习惯，本教程将使用 Poetry。以下是几个关键库，你需要安装它们：\n\n- **rich**：用于美化终端输出。\n- **openai-whisper**：强大的语音转文本工具。\n- **chatterbox-tts**：最先进的文本转语音合成模型，支持语音克隆与情感控制。\n- **langchain**：用于对接大型语言模型（LLM）的简单易用库。\n- **langchain-openai**：用于连接兼容 OpenAI 的云端 LLM 提供商，例如 [MiniMax](https:\u002F\u002Fwww.minimaxi.com)。\n- **sounddevice**、**pyaudio** 和 **speechrecognition**：用于音频录制与播放的必备库。\n\n详细的依赖列表请参阅此处链接。\n\n其中最关键的部分是大型语言模型（LLM）后端。默认情况下，我们使用 **Ollama** 在本地运行 LLM。当然，你也可以选择 **MiniMax** 作为云端 LLM 提供商，这样无需本地 GPU 即可获得更高质量的响应。如果你对 Ollama 还不熟悉，建议先阅读我之前关于离线 RAG 的文章：“搭建属于你的 RAG 并在本地运行：Langchain + Ollama + Streamlit”。基本上，你只需要下载 Ollama 应用程序，拉取你喜欢的模型并启动即可。\n\n### 
架构设计\n好了，如果一切准备就绪，接下来我们就可以开始下一步了。以下是整个应用的整体架构，主要由三个核心组件构成：\n\n- **语音识别模块**：利用 OpenAI 的 Whisper 将语音转换为文本。Whisper 经过多种数据集的训练，能够很好地处理不同语言和方言。\n- **对话链模块**：为了实现对话功能，我们将使用 Langchain 接口，并搭配可插拔的 LLM 后端——既可以是通过 Ollama 运行的本地模型（如 Gemma3、Llama-4），也可以是通过 [MiniMax](https:\u002F\u002Fwww.minimaxi.com) 提供的云端模型（MiniMax-M2.7）。这种组合能够提供流畅且富有吸引力的对话体验。\n- **语音合成模块**：文本到语音的转换则由 Resemble AI 的 Chatterbox TTS 完成，该模型以其逼真的语音生成能力和语音克隆功能而闻名。\n\n工作流程非常简单：录制语音、将其转录为文本、利用 LLM 生成回复，最后再通过 ChatterBox 将文本转化为语音输出。\n\n```mermaid\nflowchart TD\n    A[🎤 用户语音输入] --> B[语音识别\u003Cbr\u002F>OpenAI Whisper]\n    B --> C[📝 文本转录]\n    C --> D[对话链\u003Cbr\u002F>Langchain + Ollama \u002F MiniMax\u003Cbr\u002F>Gemma3 \u002F Llama-4 \u002F MiniMax-M2.7]\n    D --> E[🤖 生成回复]\n    E --> F[语音合成\u003Cbr\u002F>Chatterbox TTS]\n    F --> G[🔊 音频输出]\n    G --> H[👤 用户听到回复]\n\n    style A fill:#e1f5fe\n    style B fill:#f3e5f5\n    style C fill:#e8f5e8\n    style D fill:#fff3e0\n    style E fill:#e8f5e8\n    style F fill:#f3e5f5\n    style G fill:#e1f5fe\n    style H fill:#fce4ec\n```\n\n### 安装步骤\n\n**⚠️ 重要提示**：我们强烈推荐使用 [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F) 来管理依赖，而不是用 pip 和 `requirements.txt` 文件。`requirements.txt` 是通过 `uv pip freeze` 生成的，其中包含固定版本号，可能会在不同系统上无法正确安装。\n\n#### 方法一：使用 uv（推荐）\n\n```bash\n# 如果尚未安装 uv，请先安装\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n# 或者在 macOS 上：brew install uv\n# 或者在 Windows 上：powershell -ExecutionPolicy ByPass -c \"irm https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.ps1 | iex\"\n\n# 克隆仓库\ngit clone https:\u002F\u002Fgithub.com\u002Fvndee\u002Flocal-talking-llm.git\ncd local-talking-llm\n\n# 使用 uv 安装依赖（推荐）\nuv sync\n\n# 激活虚拟环境\nsource .venv\u002Fbin\u002Factivate  # Windows 上：.venv\\Scripts\\activate\n\n# 下载 NLTK 数据（用于句子分词）\npython -c \"import nltk; nltk.download('punkt_tab')\"\n```\n\n#### 方法二：使用 pip（替代方案）\n\n如果你更倾向于使用 pip，可以直接从 pyproject.toml 文件安装：\n\n```bash\n# 克隆仓库\ngit clone https:\u002F\u002Fgithub.com\u002Fvndee\u002Flocal-talking-llm.git\ncd 
local-talking-llm\n\n# 创建虚拟环境\npython -m venv venv\nsource venv\u002Fbin\u002Factivate  # Windows 上：venv\\Scripts\\activate\n\n# 从 pyproject.toml 安装（而非 requirements.txt）\npip install -e .\n\n# 下载 NLTK 数据\npython -c \"import nltk; nltk.download('punkt')\"\n```\n\n#### 安装并配置 Ollama\n\n```bash\n# 安装并启动 Ollama\n# 按照 https:\u002F\u002Follama.ai 上的说明操作\nollama pull gemma3  # 或其他你喜欢的模型\n```\n\n#### 配置 MiniMax（云端 LLM 替代方案）\n\n如果你没有本地 GPU，或者希望使用更高品质的云端模型，可以选用 [MiniMax](https:\u002F\u002Fwww.minimaxi.com) 作为 LLM 后端：\n\n1. 在 [MiniMax 平台](https:\u002F\u002Fwww.minimaxi.com) 注册并获取 API 密钥。\n2. 设置环境变量：\n   ```bash\n   export MINIMAX_API_KEY=\"your-api-key-here\"\n   ```\n\n使用 MiniMax 时无需安装 Ollama——LLM 将在云端运行，而 TTS 和 STT 仍然会在本地执行。\n\n### 使用方法\n\n#### 基本用法\n```bash\npython app.py\n```\n\n#### 语音克隆功能\n录制一段 10–30 秒的音频样本，作为你要克隆的声音，然后运行：\n```bash\npython app.py --voice path\u002Fto\u002Fvoice_sample.wav\n```\n\n#### 自定义设置\n```bash\n# 调整情感表达和语速\npython app.py --exaggeration 0.7 --cfg-weight 0.3\n\n# 使用不同的 LLM 模型\npython app.py --model codellama\n\n# 保存生成的语音样本\npython app.py --save-voice\n```\n\n#### 使用 MiniMax 云端 LLM\n```bash\n# 使用 MiniMax 作为 LLM 提供者（需要设置 MINIMAX_API_KEY 环境变量）\npython app.py --provider minimax\n\n# 使用特定的 MiniMax 模型并自定义温度\npython app.py --provider minimax --model MiniMax-M2.7 --temperature 0.8\n\n# 直接传递 API 密钥\npython app.py --provider minimax --api-key your-api-key-here\n\n# 结合语音克隆和情感控制\npython app.py --provider minimax --voice path\u002Fto\u002Fvoice.wav --exaggeration 0.7\n```\n\n### 配置选项\n\n- `--provider`: LLM 提供者（`ollama` 或 `minimax`，默认：ollama）\n- `--api-key`: 云 LLM 提供者的 API 密钥（或使用 `MINIMAX_API_KEY` 环境变量）\n- `--temperature`: LLM 温度（0.0-1.0，默认：0.7）\n- `--voice`: 用于语音克隆的音频文件路径\n- `--exaggeration`: 情感强度（0.0-1.0，默认：0.5）\n  - 较低值（0.3-0.4）：语气更平静、中性\n  - 较高值（0.7-0.9）：更具表现力和情感\n- `--cfg-weight`: 控制语速和表达风格（0.0-1.0，默认：0.5）\n  - 较低值：语速较快、更生动\n  - 较高值：语速较慢、更沉稳\n- `--model`: 要使用的 Ollama 模型（默认：llama2）\n- 
`--save-voice`: 将生成的音频回复保存到 `voices\u002F` 目录\n\n### 实现细节\n\n#### 基于 ChatterBox 的 TextToSpeechService\n新的 TextToSpeechService 利用 ChatterBox 的高级功能：\n\n```python\nfrom chatterbox.tts import ChatterboxTTS\n\nclass TextToSpeechService:\n    def __init__(self, device: str = \"cuda\" if torch.cuda.is_available() else \"cpu\"):\n        self.device = device\n        self.model = ChatterboxTTS.from_pretrained(device=device)\n        self.sample_rate = self.model.sr\n\n    def synthesize(self, text: str, audio_prompt_path: str = None,\n                  exaggeration: float = 0.5, cfg_weight: float = 0.5):\n        wav = self.model.generate(\n            text,\n            audio_prompt_path=audio_prompt_path,\n            exaggeration=exaggeration,\n            cfg_weight=cfg_weight\n        )\n        return self.sample_rate, wav.squeeze().cpu().numpy()\n```\n\n相比之前的 Bark 实现，主要改进包括：\n- **语音克隆**：传入音频文件即可克隆任意声音\n- **情感控制**：通过 `exaggeration` 参数调整表达力度\n- **音质提升**：ChatterBox 生成的语音更加自然\n- **推理速度更快**：模型规模更小（0.5B 对比 Bark 的大型模型）\n\n#### 动态情感分析\n应用现在包含自动情感检测功能，使回应更具表现力：\n\n```python\ndef analyze_emotion(text: str) -> float:\n    emotional_keywords = ['amazing', 'terrible', 'love', 'hate', 'excited',\n                         'sad', 'happy', 'angry', '!', '?!']\n    emotion_score = 0.5\n    for keyword in emotional_keywords:\n        if keyword in text.lower():\n            emotion_score += 0.1\n    return min(0.9, max(0.3, emotion_score))\n```\n\n### 最佳实践建议\n\n1. **语音克隆**：\n   - 使用清晰的 10-30 秒音频样本\n   - 确保样本背景噪声尽可能少\n   - 样本中的语音应自然流畅\n\n2. **情感控制**：\n   - 一般对话场景：`exaggeration=0.5, cfg_weight=0.5`\n   - 戏剧化\u002F富有表现力的场景：`exaggeration=0.7+，cfg_weight=0.3`\n   - 冷静专业的语气：`exaggeration=0.3，cfg_weight=0.7`\n\n3. 
**性能优化**：\n   - 如果有 CUDA 可用，尽量使用以加速推理\n   - 第一次生成可能会较慢，因为需要加载模型\n   - 可考虑使用较小的 Whisper 模型（如 `\"tiny.en\"` 或 `\"base.en\"`）以加快转录速度。\n\n### 生产级部署\n\n对于希望将此应用提升至生产级别的用户，可考虑以下方案：\n\n- **性能优化**：\n  - 使用优化的推理引擎（ONNX、TensorRT）\n  - 实施模型量化以提高推理速度\n  - 添加常用短语的缓存机制\n\n- **增强功能**：\n  - 支持多说话人及语音档案管理\n  - 实现实时语音转换\n  - 集成更多 LLM 提供者\n  - 构建带有实时流媒体功能的 Web 界面\n\n- **语音数据库**：\n  - 创建语音样本库\n  - 实现语音选择界面\n  - 增加语音混合功能\n\n- **API 服务**：\n  - 提供 TTS 请求的 RESTful API\n  - 支持 WebSocket 实时通信\n  - 实施速率限制与身份验证\n\n### 故障排除\n\n#### 依赖安装问题\n\n**问题**：运行 `requirements.txt` 安装时出现错误，例如：\n- `ERROR: Could not find a version that satisfies the requirement cPython==0.0.6`\n- `ModuleNotFoundError: No module named 'distutils'`\n- 各种版本冲突\n\n**解决方案**：`requirements.txt` 文件由 `uv pip freeze` 生成，其中包含的精确版本可能在不同系统上无法正常工作。请尝试以下替代方法：\n\n1. **推荐使用 uv**：\n   ```bash\n   uv sync\n   ```\n\n2. **使用 pip 和 pyproject.toml**：\n   ```bash\n   pip install -e .\n   ```\n\n3. **手动安装核心包**：\n   ```bash\n   pip install chatterbox-tts langchain-ollama openai-whisper sounddevice rich nltk\n   ```\n\n#### 运行时问题\n\n- **CUDA 显存不足**：切换到 CPU 模式或降低模型精度\n- **麦克风无法工作**：检查系统权限和设备设置\n- **推理缓慢**：确保在可用时使用 GPU，必要时可选用小型号模型\n- **语音克隆质量不佳**：使用高质量、语音清晰的音频样本\n- **导入错误**：运行应用前务必激活虚拟环境\n\n### 总结\n\n通过集成 ChatterBox，我们的本地语音助手得到了显著提升。新增的语音克隆和情感控制功能为打造个性化且富有表现力的 AI 助手开辟了全新可能性。无论您是想构建个人版的 Jarvis、创作内容，还是开发语音驱动的应用程序，这套升级后的技术栈都提供了强大的基础支持。\n\nWhisper 强大的语音识别能力、Ollama 灵活的 LLM 服务以及 ChatterBox 先进的 TTS 技术相结合，共同打造出一款完全离线运行的多功能语音助手。无需任何云端服务，也无需 API 密钥，纯粹依靠本地 AI 功能！\n\n### 资源链接\n\n- [ChatterBox GitHub](https:\u002F\u002Fgithub.com\u002Fresemble-ai\u002Fchatterbox)\n- [Ollama](https:\u002F\u002Follama.ai)\n- [MiniMax 平台](https:\u002F\u002Fwww.minimaxi.com)\n- [Whisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper)\n- [原文博客](https:\u002F\u002Fblog.duy-huynh.com\u002Fbuild-your-own-voice-assistant-and-run-it-locally\u002F)\n\n---","# local-talking-llm 快速上手指南\n\n构建属于你自己的本地语音助手（类似钢铁侠的 Jarvis），支持离线运行。该工具集成了 **Whisper**（语音识别）、**Ollama\u002FMiniMax**（大语言模型）和 
**ChatterBox**（语音合成与克隆）。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: Windows, macOS, 或 Linux\n- **Python**: 3.10 或更高版本\n- **硬件**: \n  - 推荐配备 NVIDIA GPU (用于加速 Whisper 和 ChatterBox 推理)\n  - 若无 GPU，可在 CPU 上运行但速度较慢\n- **依赖工具**:\n  - [Ollama](https:\u002F\u002Follama.ai) (若使用本地模型)\n  - [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F) (推荐的包管理工具，替代 pip)\n\n### 前置依赖\n确保已安装以下基础工具：\n- Git\n- FFmpeg (用于音频处理，大多数包管理器可直接安装)\n\n## 安装步骤\n\n推荐使用 `uv` 进行依赖管理，以避免版本冲突。\n\n### 1. 安装 uv (如未安装)\n```bash\n# Linux\u002FmacOS\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n\n# Windows (PowerShell)\npowershell -ExecutionPolicy ByPass -c \"irm https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.ps1 | iex\"\n```\n\n### 2. 克隆项目并安装依赖\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fvndee\u002Flocal-talking-llm.git\ncd local-talking-llm\n\n# 同步依赖并创建虚拟环境\nuv sync\n\n# 激活虚拟环境\n# Linux\u002FmacOS:\nsource .venv\u002Fbin\u002Factivate\n# Windows:\n.venv\\Scripts\\activate\n\n# 下载必要的 NLTK 数据\npython -c \"import nltk; nltk.download('punkt_tab')\"\n```\n\n### 3. 配置大语言模型后端\n\n#### 方案 A：使用 Ollama (本地运行，推荐)\n```bash\n# 安装 Ollama 后，拉取模型 (例如 gemma3 或 llama3)\nollama pull gemma3\n```\n\n#### 方案 B：使用 MiniMax (云端运行，无需本地显卡)\n1. 注册 [MiniMax 平台](https:\u002F\u002Fwww.minimaxi.com) 获取 API Key。\n2. 设置环境变量：\n```bash\nexport MINIMAX_API_KEY=\"your-api-key-here\"\n# Windows PowerShell: $env:MINIMAX_API_KEY=\"your-api-key-here\"\n```\n\n## 基本使用\n\n### 1. 启动语音助手\n默认使用本地 Ollama 模型：\n```bash\npython app.py\n```\n\n若使用 MiniMax 云端模型：\n```bash\npython app.py --provider minimax\n```\n\n### 2. 体验声音克隆 (可选)\n准备一段 10-30 秒的目标人声录音（格式为 `.wav`，背景噪音小），然后运行：\n```bash\npython app.py --voice path\u002Fto\u002Fvoice_sample.wav\n```\n\n### 3. 
调整情感与语速 (可选)\n通过参数控制回复的情感强度和语速风格：\n```bash\n# 高情感表达 (更夸张), 较快语速\npython app.py --exaggeration 0.8 --cfg-weight 0.3\n\n# 平静专业语调，较慢语速\npython app.py --exaggeration 0.3 --cfg-weight 0.7\n```\n\n### 常用参数说明\n- `--provider`: 选择模型提供商 (`ollama` 或 `minimax`)\n- `--model`: 指定模型名称 (如 `gemma3`, `MiniMax-M2.7`)\n- `--voice`: 声音克隆样本路径\n- `--exaggeration`: 情感强度 (0.0-1.0，默认 0.5)\n- `--cfg-weight`: 语速控制 (0.0-1.0，越低越快)\n- `--save-voice`: 将生成的语音保存到 `voices\u002F` 目录","资深数据分析师林工需要在无网络的保密实验室中，通过语音快速查询本地文档库并记录分析结论，同时双手需忙于操作实验设备。\n\n### 没有 local-talking-llm 时\n- **网络依赖导致中断**：由于实验室物理隔离互联网，无法使用云端语音助手，任何查询请求都因断网而失败。\n- **交互效率低下**：必须停下手中工作，手动打字输入指令或查阅资料，严重打断实验操作的连贯性。\n- **隐私泄露风险**：若强行将敏感数据上传至外部云服务进行处理，违反公司核心数据不出内网的安全合规要求。\n- **反馈形式单一**：只能依赖屏幕阅读返回结果，在视线需聚焦显微镜或精密仪器时，无法通过听觉获取信息。\n\n### 使用 local-talking-llm 后\n- **完全离线运行**：基于 Ollama 和 Whisper 构建的本地闭环，无需任何网络连接即可在保密环境中流畅响应。\n- **解放双手操作**：林工只需口述指令，local-talking-llm 自动完成“听写 - 思考 - 播报”全流程，实现真正的边做边问。\n- **数据绝对安全**：所有语音识别、大模型推理及 TTS 合成均在本地显卡完成，敏感实验数据从未离开过本机。\n- **拟人化情感交互**：利用 ChatterBox 的情绪控制功能，系统能用自然且带有语气的声音播报复杂结论，降低长时间工作的认知疲劳。\n\nlocal-talking-llm 将原本受限的离线环境转化为高效的智能交互空间，让开发者在保障数据主权的同时享受媲美科幻电影的语音助理体验。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvndee_local-talking-llm_3e04add2.png","vndee","Duy Huynh","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fvndee_f95de0ee.jpg","SWE - AI\u002FML & Data","Looking for the next cool idea","Vietnam","vndee.huynh@gmail.com","DeeHuynh99","blog.duy.dev","https:\u002F\u002Fgithub.com\u002Fvndee",[86,90],{"name":87,"color":88,"percentage":89},"Python","#3572A5",98,{"name":91,"color":92,"percentage":23},"Makefile","#427819",833,181,"2026-04-05T00:26:16","MIT",4,"Linux, macOS, Windows","非必需。本地运行 LLM (Ollama) 和 TTS (ChatterBox) 推荐使用 NVIDIA GPU (CUDA) 以加速推理；若无 GPU，可使用 CPU 运行或切换至云端 LLM (MiniMax)。具体显存需求取决于所选模型大小（如 ChatterBox 为 0.5B 参数模型，相对轻量）。","未说明（建议至少 8GB-16GB 以流畅运行本地大模型和音频处理）",{"notes":102,"python":103,"dependencies":104},"强烈建议使用 'uv' 进行依赖管理，避免直接使用 requirements.txt 导致版本冲突。若本地无 GPU，可配置使用 MiniMax 云端 API 
作为大语言模型后端，此时仅语音识别和合成在本地运行。首次运行前需下载 NLTK 数据 ('punkt_tab') 并拉取 Ollama 模型（如 gemma3, llama2 等）。支持声音克隆功能，需提供 10-30 秒的清晰音频样本。","未说明（需支持 uv 或 venv 的现代 Python 版本，通常建议 3.9+）",[105,106,107,108,109,110,111,112,113,114],"openai-whisper","chatterbox-tts","langchain","langchain-openai","sounddevice","pyaudio","speechrecognition","rich","nltk","ollama",[13,26,63],[117,118,119,120],"chatbot","llm","speech-recognition","speech-synthesis",null,"2026-03-27T02:49:30.150509","2026-04-06T14:04:01.959520",[125,130,135,140,145,150,155],{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},19289,"安装依赖时遇到 pyaudio 或 portaudio 错误，以及运行时报缺少 punkt 怎么办？","1. Portaudio 问题：建议尝试使用 `poetry` 来管理环境，因为该项目是在 poetry 环境下构建和测试的。如果必须手动安装，需先单独安装 portaudio 系统库。\n2. Punkt 缺失：这是 nltk 首次运行时需要的资源，通常会自动下载。如果失败，请手动下载 nltk 的 punkt 包。\n3. 模型切换：可以通过向 Ollama 传递特定参数来使用其他模型（如 phi2），例如在代码中指定 `Ollama(model=\"phi2\")`。","https:\u002F\u002Fgithub.com\u002Fvndee\u002Flocal-talking-llm\u002Fissues\u002F3",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},19290,"如何将默认模型从 llama2 替换为 llama3 或其他模型？","不需要修改 langchain_community 包。只需在初始化 Ollama 时传递模型名称参数即可。例如，使用 llama3 的代码为：`Ollama(model=\"llama3\")`。Ollama 完全支持 llama3 及其他库中可用的模型。","https:\u002F\u002Fgithub.com\u002Fvndee\u002Flocal-talking-llm\u002Fissues\u002F5",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},19291,"如何让项目支持 CUDA 加速而不是仅使用 CPU？","项目的三个关键组件对 CUDA 的支持情况如下：\n1. **语音转文字 (SpeechToTextService)**: Whisper 模型如果检测到 CUDA 会自动使用，无需额外配置。\n2. **文字转语音 (TextToSpeechService)**: 同样会在可用时自动使用 CUDA。\n3. **大语言模型 (Ollama LLM)**: 需要在 Linux 上按照 Ollama 官方开发文档进行设置以启用 CUDA 服务。","https:\u002F\u002Fgithub.com\u002Fvndee\u002Flocal-talking-llm\u002Fissues\u002F10",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},19292,"助手回答正确后为什么会继续输出无关内容（幻觉）？","可以通过以下两种方式解决：\n1. **修改提示词模板**：增加指令，例如添加 \"Only provide a response related to the conversation transcript. Do not include any other text or word count.\"（只提供与对话记录相关的回复，不要包含其他文本或字数统计）。\n2. 
**调整转录参数**：在使用 GPU 时，设置额外的转录参数以减少幻觉，例如：`result = stt.transcribe(audio_np, fp16=True, verbose=True, condition_on_previous_text=False, no_speech_threshold=0.1)`。","https:\u002F\u002Fgithub.com\u002Fvndee\u002Flocal-talking-llm\u002Fissues\u002F4",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},19293,"如何实现多语言支持（如西班牙语或中文）？","需要两步操作：\n1. **修改模型加载代码**：将 `stt = whisper.load_model(\"base.en\")` 改为 `stt = whisper.load_model(\"base\")`，以加载支持多语言的模型版本。\n2. **修改提示词模板**：在 template 中明确指定回复语言。例如，若要西班牙语回复，添加 \"Your response must be in Spanish\"；若要中文，则添加相应的语言指令。","https:\u002F\u002Fgithub.com\u002Fvndee\u002Flocal-talking-llm\u002Fissues\u002F1",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},19294,"运行时出现 urllib ConnectionRefusedError (WinError 10061) 连接被拒绝错误怎么办？","该错误通常是因为端口冲突或服务未启动。尝试更改 Ollama 的默认端口配置通常可以解决此问题。确保 Ollama 服务正在运行，并且应用程序配置的端口与 Ollama 监听的端口一致。","https:\u002F\u002Fgithub.com\u002Fvndee\u002Flocal-talking-llm\u002Fissues\u002F2",{"id":156,"question_zh":157,"answer_zh":158,"source_url":129},19295,"树莓派等低性能设备上 TTS（语音合成）速度极慢，有替代方案吗？","原项目使用的 Bark 引擎在树莓派等资源受限设备上速度非常慢。维护者建议可以尝试使用 Hugging Face 的 `parler-tts` 作为替代方案，或者寻找其他更轻量级的 TTS 引擎以适应低功耗硬件。",[]]