[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-pipecat-ai--pipecat":3,"tool-pipecat-ai--pipecat":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",143909,2,"2026-04-07T11:33:18",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":32,"env_os":95,"env_gpu":96,"env_ram":97,"env_deps":98,"category_tags":106,"github_topics":108,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":115,"updated_at":116,"faqs":117,"releases":138},5098,"pipecat-ai\u002Fpipecat","pipecat","Open Source framework for voice and multimodal conversational AI","Pipecat 是一个开源的 Python 框架，专为构建实时语音及多模态对话式 AI 智能体而设计。它致力于解决开发者在整合语音识别、文本转语音、视频流处理以及各类 AI 服务时面临的复杂编排难题，让团队能专注于打造独特的对话逻辑与应用体验，而非耗费精力在底层基础设施上。\n\n这款工具非常适合希望快速开发语音助手、AI 伴侣、交互式叙事应用或企业客服机器人的软件工程师与技术团队。无论是初创公司还是大型企业，都能利用它轻松实现低延迟的实时互动。\n\nPipecat 的核心亮点在于其“语音优先”的设计理念与高度模块化的管道架构。用户可以将音频、视频、AI 模型及不同的传输协议（如 WebSocket 或 WebRTC）像积木一样灵活组合，构建出复杂的对话行为。此外，它还拥有完善的生态系统，提供覆盖 JavaScript、Swift、Kotlin 
等多平台的客户端 SDK，并配套了专用的 CLI 工具、调试器 Whisker 以及结构化对话管理方案 Pipecat Flows，极大地降低了从原型开发到生产部署的门槛。","\u003Ch1>\u003Cdiv align=\"center\">\n \u003Cimg alt=\"pipecat\" width=\"300px\" height=\"auto\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpipecat-ai_pipecat_readme_ef470759fc62.png\">\n\u003C\u002Fdiv>\u003C\u002Fh1>\n\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fpipecat-ai)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fpipecat-ai) ![Tests](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Factions\u002Fworkflows\u002Ftests.yaml\u002Fbadge.svg) [![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fpipecat-ai\u002Fpipecat\u002Fgraph\u002Fbadge.svg?token=LNVUIVO4Y9)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fpipecat-ai\u002Fpipecat) [![Docs](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocumentation-blue)](https:\u002F\u002Fdocs.pipecat.ai) [![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1239284677165056021)](https:\u002F\u002Fdiscord.gg\u002Fpipecat) [![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Fpipecat-ai\u002Fpipecat)\n\n# 🎙️ Pipecat: Real-Time Voice & Multimodal AI Agents\n\n**Pipecat** is an open-source Python framework for building real-time voice and multimodal conversational agents. Orchestrate audio and video, AI services, different transports, and conversation pipelines effortlessly—so you can focus on what makes your agent unique.\n\n> Want to dive right in? 
Run `pipecat init quickstart` or follow the [quickstart guide](https:\u002F\u002Fdocs.pipecat.ai\u002Fgetting-started\u002Fquickstart).\n\n## 🚀 What You Can Build\n\n- **Voice Assistants** – natural, streaming conversations with AI\n- **AI Companions** – coaches, meeting assistants, characters\n- **Multimodal Interfaces** – voice, video, images, and more\n- **Interactive Storytelling** – creative tools with generative media\n- **Business Agents** – customer intake, support bots, guided flows\n- **Complex Dialog Systems** – design logic with structured conversations\n\n## 🧠 Why Pipecat?\n\n- **Voice-first**: Integrates speech recognition, text-to-speech, and conversation handling\n- **Pluggable**: Supports many AI services and tools\n- **Composable Pipelines**: Build complex behavior from modular components\n- **Real-Time**: Ultra-low latency interaction with different transports (e.g. WebSockets or WebRTC)\n\n## 🌐 Pipecat Ecosystem\n\n### 📱 Client SDKs\n\nBuilding client applications? You can connect to Pipecat from any platform using our official SDKs:\n\n\u003Ca href=\"https:\u002F\u002Fdocs.pipecat.ai\u002Fclient\u002Fjs\u002Fintroduction\">JavaScript\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdocs.pipecat.ai\u002Fclient\u002Freact\u002Fintroduction\">React\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdocs.pipecat.ai\u002Fclient\u002Freact-native\u002Fintroduction\">React Native\u003C\u002Fa> |\n\u003Ca href=\"https:\u002F\u002Fdocs.pipecat.ai\u002Fclient\u002Fios\u002Fintroduction\">Swift\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdocs.pipecat.ai\u002Fclient\u002Fandroid\u002Fintroduction\">Kotlin\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdocs.pipecat.ai\u002Fclient\u002Fc++\u002Fintroduction\">C++\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat-esp32\">ESP32\u003C\u002Fa>\n\n### 🧭 Structured conversations\n\nLooking to build structured conversations? 
Check out [Pipecat Flows](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat-flows) for managing complex conversational states and transitions.\n\n### 🪄 Beautiful UIs\n\nWant to build beautiful and engaging experiences? Check out the [Voice UI Kit](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fvoice-ui-kit), a collection of components, hooks and templates for building voice AI applications quickly.\n\n### 🛠️ Create and deploy projects\n\nCreate a new project in under a minute with the [Pipecat CLI](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat-cli). Then use the CLI to monitor and deploy your agent to production.\n\n### 🔍 Debugging\n\nLooking for help debugging your pipeline and processors? Check out [Whisker](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fwhisker), a real-time Pipecat debugger.\n\n### 🖥️ Terminal\n\nLove terminal applications? Check out [Tail](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Ftail), a terminal dashboard for Pipecat.\n\n### 🤖 Claude Code Skills\n\nUse [Pipecat Skills](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fskills) with [Claude Code](https:\u002F\u002Fclaude.ai\u002Fcode) to scaffold projects, deploy to Pipecat Cloud, and more. Install the marketplace with:\n\n```\nclaude plugin marketplace add pipecat-ai\u002Fskills\n```\n\nand install any of the available plugins.\n\n### 🧩 Community Integrations\n\nBuild and share your own Pipecat service integrations! 
Browse existing [community integrations](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fcommunity-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.\n\n### 📺️ Pipecat TV Channel\n\nCatch new features, interviews, and how-tos on our [Pipecat TV](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.\n\n## 🎬 See it in action\n\n\u003Cp float=\"left\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat-examples\u002Ftree\u002Fmain\u002Fsimple-chatbot\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpipecat-ai_pipecat_readme_a26c3771f48d.png\" width=\"400\" \u002F>\u003C\u002Fa>&nbsp;\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat-examples\u002Ftree\u002Fmain\u002Fstorytelling-chatbot\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpipecat-ai_pipecat_readme_c0b4bcc36960.png\" width=\"400\" \u002F>\u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat-examples\u002Ftree\u002Fmain\u002Ftranslation-chatbot\">\u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fpipecat-ai\u002Fpipecat-examples\u002Fmain\u002Ftranslation-chatbot\u002Fimage.png\" width=\"400\" \u002F>\u003C\u002Fa>&nbsp;\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fblob\u002Fmain\u002Fexamples\u002Fvision\u002Fvision-moondream.py\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpipecat-ai_pipecat_readme_c0b47ea255de.png\" width=\"400\" \u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n## 🧩 Available services\n\n| Category            | Services                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |\n| ------------------- | 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| Speech-to-Text      | [AssemblyAI](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fassemblyai), [AWS](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Faws), [Azure](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fazure), 
[Cartesia](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fcartesia), [Deepgram](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fdeepgram), [ElevenLabs](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Felevenlabs), [Fal Wizper](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Ffal), [Gladia](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fgladia), [Google](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fgoogle), [Gradium](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fgradium), [Groq (Whisper)](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fgroq), [NVIDIA Riva](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Friva), [OpenAI (Whisper)](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fopenai), [Sarvam](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fsarvam), [Soniox](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fsoniox), [Speechmatics](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fspeechmatics), [Whisper](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fstt\u002Fwhisper)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |\n| LLMs                | 
[Anthropic](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fanthropic), [AWS](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Faws), [Azure](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fazure), [Cerebras](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fcerebras), [DeepSeek](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fdeepseek), [Fireworks AI](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Ffireworks), [Gemini](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fgemini), [Grok](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fgrok), [Groq](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fgroq), [Mistral](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fmistral), [Nebius](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fnebius), [Novita](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fnovita), [NVIDIA NIM](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fnvidia), [Ollama](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Follama), [OpenAI](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fopenai), [OpenRouter](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fopenrouter), [Perplexity](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fperplexity), [Qwen](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fqwen), [SambaNova](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fsambanova), [Sarvam](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Fsarvam), [Together AI](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fllm\u002Ftogether)                                
                                                                                                                                                                                                                                                                                                                         |\n| Text-to-Speech      | [Async](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fasyncai), [AWS](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Faws), [Azure](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fazure), [Camb AI](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fcamb), [Cartesia](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fcartesia), [Deepgram](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fdeepgram), [ElevenLabs](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Felevenlabs), [Fish](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Ffish), [Google](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fgoogle), [Gradium](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fgradium), [Groq](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fgroq), [Hume](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fhume), [Inworld](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Finworld), [Kokoro](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fkokoro), [LMNT](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Flmnt), [MiniMax](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fminimax), [Neuphonic](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fneuphonic), [NVIDIA 
Riva](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Friva), [OpenAI](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fopenai), [Piper](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fpiper), [Resemble](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fresemble), [Rime](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Frime), [Sarvam](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fsarvam), [Smallest](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fsmallest), [Speechmatics](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fspeechmatics), [xAI](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fxai), [XTTS](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftts\u002Fxtts) |\n| Speech-to-Speech    | [AWS Nova Sonic](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fs2s\u002Faws), [Gemini Multimodal Live](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fs2s\u002Fgemini), [Grok Voice Agent](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fs2s\u002Fgrok), [OpenAI Realtime](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fs2s\u002Fopenai), [Ultravox](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fs2s\u002Fultravox),                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |\n| Transport           | [Daily (WebRTC)](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftransport\u002Fdaily), [FastAPI Websocket](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftransport\u002Ffastapi-websocket), [LiveKit (WebRTC)](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftransport\u002Flivekit), [SmallWebRTCTransport](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftransport\u002Fsmall-webrtc), [WebSocket Server](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftransport\u002Fwebsocket-server), [WhatsApp](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftransport\u002Fwhatsapp), Local                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |\n| Serializers         | [Exotel](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fserializers\u002Fexotel), [Genesys](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fserializers\u002Fgenesys), [Plivo](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fserializers\u002Fplivo), [Twilio](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fserializers\u002Ftwilio), [Telnyx](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fserializers\u002Ftelnyx), [Vonage](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fserializers\u002Fvonage)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |\n| Video               | [HeyGen](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fvideo\u002Fheygen), [LemonSlice](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Ftransport\u002Flemonslice), [Tavus](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fvideo\u002Ftavus), [Simli](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fvideo\u002Fsimli)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
|
| Memory              | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image      | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing    | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp Viva](https://docs.pipecat.ai/guides/features/krisp-viva), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter), [RNNoise](https://docs.pipecat.ai/server/utilities/audio/rnnoise-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Community           | [Browse community integrations →](https://docs.pipecat.ai/server/services/community-integrations) |

📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)

## ⚡ Getting started

You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you're ready.

1. Install uv

   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

   > **Need help?** Refer to the [uv install documentation](https://docs.astral.sh/uv/getting-started/installation/).
2. Install the module

   ```bash
   # For new projects
   uv init my-pipecat-app
   cd my-pipecat-app
   uv add pipecat-ai

   # Or for existing projects
   uv add pipecat-ai
   ```

3. Set up your environment

   ```bash
   cp env.example .env
   ```

4. To keep things lightweight, only the core framework is included by default. If you need support for third-party AI services, you can add the necessary dependencies with:

   ```bash
   uv add "pipecat-ai[option,...]"
   ```

> **Using pip?** You can still use `pip install pipecat-ai` and `pip install "pipecat-ai[option,...]"` to get set up.

## 🧪 Code examples

- [Foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples) — small snippets that build on each other, introducing one or two concepts at a time
- [Example apps](https://github.com/pipecat-ai/pipecat-examples) — complete applications that you can use as starting points for development

## 🛠️ Contributing to the framework

### Prerequisites

**Minimum Python Version:** 3.11
**Recommended Python Version:** >= 3.12

### Setup Steps

1. Clone the repository and navigate to it:

   ```bash
   git clone https://github.com/pipecat-ai/pipecat.git
   cd pipecat
   ```

2. Install development and testing dependencies:

   ```bash
   uv sync --group dev --all-extras \
     --no-extra gstreamer \
     --no-extra local
   ```

3. Install the git pre-commit hooks:

   ```bash
   uv run pre-commit install
   ```

> **Note**: Some extras (local, gstreamer) require system dependencies.
See the documentation if you encounter build errors.

### Claude Code Skills

Install development workflow skills for contributing to Pipecat with [Claude Code](https://claude.ai/code):

```
claude plugin marketplace add pipecat-ai/pipecat
claude plugin install pipecat-dev@pipecat-dev-skills
```

### Running tests

To run all tests, from the root directory:

```bash
uv run pytest
```

Run a specific test suite:

```bash
uv run pytest tests/test_name.py
```

## 🤝 Contributing

We welcome contributions from the community! Whether you're fixing bugs, improving documentation, or adding new features, here's how you can help:

- **Found a bug?** Open an [issue](https://github.com/pipecat-ai/pipecat/issues)
- **Have a feature idea?** Start a [discussion](https://discord.gg/pipecat)
- **Want to contribute code?** Check our [CONTRIBUTING.md](CONTRIBUTING.md) guide
- **Documentation improvements?** [Docs](https://github.com/pipecat-ai/docs) PRs are always welcome

Before submitting a pull request, please check existing issues and PRs to avoid duplicates.

We aim to review all contributions promptly and provide constructive feedback to help get your changes merged.

## 🛟 Getting help

➡️ [Join our Discord](https://discord.gg/pipecat)

➡️ [Read the docs](https://docs.pipecat.ai)

➡️ [Reach us on X](https://x.com/pipecat_ai)

---

<h1><div align="center">
 <img alt="pipecat" width="300px" height="auto" src="https://oss.gittoolsai.com/images/pipecat-ai_pipecat_readme_ef470759fc62.png">
</div></h1>

[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai)
![Tests](https://github.com/pipecat-ai/pipecat/actions/workflows/tests.yaml/badge.svg) [![codecov](https://codecov.io/gh/pipecat-ai/pipecat/graph/badge.svg?token=LNVUIVO4Y9)](https://codecov.io/gh/pipecat-ai/pipecat) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat) [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/pipecat-ai/pipecat)

# 🎙️ Pipecat: Real-Time Voice and Multimodal AI Agents

**Pipecat** is an open-source Python framework for building real-time voice and multimodal conversational agents. It orchestrates audio, video, AI services, different transports, and conversation flows, so you can focus on building an agent that is uniquely yours.

> Want to get started fast? Run `pipecat init quickstart` or follow the [quickstart guide](https://docs.pipecat.ai/getting-started/quickstart).

## 🚀 What you can build

- **Voice assistants**: natural, streaming conversations with AI
- **AI companions**: coaches, meeting assistants, characters, and more
- **Multimodal interfaces**: voice, video, images, and other modalities
- **Interactive storytelling**: creative tools powered by generative media
- **Business agents**: customer intake, support bots, guided workflows
- **Complex dialog systems**: build logic through structured conversation design

## 🧠 Why Pipecat?

- **Voice-first**: integrated speech recognition, text-to-speech, and conversation handling
- **Pluggable**: supports a wide range of AI services and tools
- **Composable pipelines**: build complex behavior from modular components
- **Real-time**: ultra-low-latency interaction over different transports (such as WebSockets or WebRTC)

## 🌐 The Pipecat ecosystem

### 📱 Client SDKs

Building a client app? You can connect to Pipecat from any platform using our official SDKs:

<a href="https://docs.pipecat.ai/client/js/introduction">JavaScript</a> | <a href="https://docs.pipecat.ai/client/react/introduction">React</a> | <a href="https://docs.pipecat.ai/client/react-native/introduction">React Native</a> |
<a href="https://docs.pipecat.ai/client/ios/introduction">Swift</a> | <a href="https://docs.pipecat.ai/client/android/introduction">Kotlin</a> |
<a href="https://docs.pipecat.ai/client/c++/introduction">C++</a> | <a href="https://github.com/pipecat-ai/pipecat-esp32">ESP32</a>

### 🧭 Structured conversations

Building structured conversations? Check out [Pipecat Flows](https://github.com/pipecat-ai/pipecat-flows), which helps you manage complex conversational states and transitions.

### 🪄 Beautiful UIs

Want to build beautiful, engaging experiences? Try the [Voice UI Kit](https://github.com/pipecat-ai/voice-ui-kit), a collection of components, hooks, and templates for quickly building voice AI applications.

### 🛠️ Creating and deploying projects

Use the [Pipecat CLI](https://github.com/pipecat-ai/pipecat-cli) to create a new project in under a minute, then use the CLI to monitor and deploy your agents to production.

### 🔍 Debugging

Need help debugging your pipelines and processors? Check out [Whisker](https://github.com/pipecat-ai/whisker), a real-time debugger for Pipecat.

### 🖥️ Terminal

Prefer terminal apps? Try [Tail](https://github.com/pipecat-ai/tail), a terminal dashboard for Pipecat.

### 🤖 Claude Code skills

Use [Pipecat Skills](https://github.com/pipecat-ai/skills) with [Claude Code](https://claude.ai/code) to quickly scaffold projects, deploy to Pipecat Cloud, and more. Install the marketplace plugin with:

```
claude plugin marketplace add pipecat-ai/skills
```

Then install any of the available plugins.

### 🧩 Community integrations

You can build and share your own Pipecat service integrations! Browse the existing [community integrations](https://docs.pipecat.ai/server/services/community-integrations) or follow our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.

### 📺️ Pipecat TV

Watch new features, interviews, and how-to videos on our [Pipecat TV](https://www.youtube.com/playlist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.

## 🎬 Demos

<p float="left">
    <a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/simple-chatbot"><img src="https://oss.gittoolsai.com/images/pipecat-ai_pipecat_readme_a26c3771f48d.png" width="400" /></a>&nbsp;
    <a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/storytelling-chatbot"><img
src="https://oss.gittoolsai.com/images/pipecat-ai_pipecat_readme_c0b4bcc36960.png" width="400" /></a>
    <br/>
    <a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/translation-chatbot/image.png" width="400" /></a>&nbsp;
    <a href="https://github.com/pipecat-ai/pipecat/blob/main/examples/vision/vision-moondream.py"><img src="https://oss.gittoolsai.com/images/pipecat-ai_pipecat_readme_c0b47ea255de.png" width="400" /></a>
</p>

## 🧩 Available services

| Category | Services |
| --- | --- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/server/services/tts/smallest), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/server/services/transport/whatsapp), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/services/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
                                                                                                                                                                                                                                                                                                                                                                           |\n| 音频处理          | [Silero VAD](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Futilities\u002Faudio\u002Fsilero-vad-analyzer), [Krisp Viva](https:\u002F\u002Fdocs.pipecat.ai\u002Fguides\u002Ffeatures\u002Fkrisp-viva), [Koala](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Futilities\u002Faudio\u002Fkoala-filter), [ai-coustics](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Futilities\u002Faudio\u002Faic-filter), [RNNoise](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Futilities\u002Faudio\u002Frnnoise-filter)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                    |\n| 分析与指标        | [OpenTelemetry](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Futilities\u002Fopentelemetry), [Sentry](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fanalytics\u002Fsentry)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |\n| 社区              | 
[浏览社区集成 →](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fcommunity-integrations)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |\n\n📚 [查看完整服务文档 →](https:\u002F\u002Fdocs.pipecat.ai\u002Fserver\u002Fservices\u002Fsupported-services)\n\n\n\n## ⚡ 快速入门\n\n您可以先在本地机器上运行 Pipecat，待准备就绪后再将代理进程迁移到云端。\n\n1. 
安装 uv\n\n   ```bash\n   curl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n   ```\n\n   > **需要帮助？** 请参考 [uv 安装文档](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002Fgetting-started\u002Finstallation\u002F)。\n\n2. 安装模块\n\n   ```bash\n   # 对于新项目\n   uv init my-pipecat-app\n   cd my-pipecat-app\n   uv add pipecat-ai\n\n   # 或者对于现有项目\n   uv add pipecat-ai\n   ```\n\n3. 配置环境变量\n\n   ```bash\n   cp env.example .env\n   ```\n\n4. 为了保持轻量化，默认仅包含核心框架。如果您需要支持第三方 AI 服务，可以通过以下命令添加必要的依赖：\n\n   ```bash\n   uv add \"pipecat-ai[option,...]\"\n   ```\n\n> **使用 pip 吗？** 您仍然可以使用 `pip install pipecat-ai` 和 `pip install \"pipecat-ai[option,...]\"` 来完成设置。\n\n## 🧪 代码示例\n\n- [基础示例](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Ftree\u002Fmain\u002Fexamples) — 一系列相互衔接的小片段，每次引入一到两个概念。\n- [示例应用](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat-examples) — 完整的应用程序，可作为开发的起点。\n\n## 🛠️ 参与框架贡献\n\n### 前提条件\n\n**最低 Python 版本：** 3.11  \n**推荐 Python 版本：** >= 3.12  \n\n### 设置步骤\n\n1. 克隆仓库并进入目录：\n\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat.git\n   cd pipecat\n   ```\n\n2. 安装开发和测试依赖：\n\n   ```bash\n   uv sync --group dev --all-extras \\\n     --no-extra gstreamer \\\n     --no-extra local\n   ```\n\n3. 
安装 Git 的 pre-commit 钩子：\n\n   ```bash\n   uv run pre-commit install\n   ```\n\n> **注意**：部分额外依赖（local、gstreamer）需要系统级依赖。如果遇到构建错误，请参阅相关文档。\n\n### Claude Code 技能\n\n使用 [Claude Code](https:\u002F\u002Fclaude.ai\u002Fcode) 安装开发工作流技能，以参与 Pipecat 的贡献：\n\n```\nclaude plugin marketplace add pipecat-ai\u002Fpipecat\nclaude plugin install pipecat-dev@pipecat-dev-skills\n```\n\n### 运行测试\n\n从根目录运行所有测试：\n\n```bash\nuv run pytest\n```\n\n运行特定测试套件：\n\n```bash\nuv run pytest tests\u002Ftest_name.py\n```\n\n## 🤝 贡献方式\n\n我们欢迎社区的贡献！无论您是修复 bug、改进文档，还是添加新功能，都可以通过以下方式参与：\n\n- **发现 bug？** 请提交 [issue](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fissues)\n- **有功能想法？** 请发起 [讨论](https:\u002F\u002Fdiscord.gg\u002Fpipecat)\n- **想贡献代码？** 请查阅我们的 [CONTRIBUTING.md](CONTRIBUTING.md) 指南\n- **改进文档？** 文档 PR 始终受到欢迎。\n\n在提交 Pull Request 之前，请先检查现有的 issue 和 PR，以避免重复。\n\n我们致力于及时审查所有贡献，并提供建设性的反馈，以帮助您的更改顺利合并。\n\n## 🛟 获取帮助\n\n➡️ [加入我们的 Discord](https:\u002F\u002Fdiscord.gg\u002Fpipecat)\n\n➡️ [阅读文档](https:\u002F\u002Fdocs.pipecat.ai)\n\n➡️ [在 X 上联系我们](https:\u002F\u002Fx.com\u002Fpipecat_ai)","# Pipecat 快速上手指南\n\nPipecat 是一个开源的 Python 框架，专为构建**实时语音和多模态对话 AI 代理**而设计。它支持低延迟的音频\u002F视频流处理，可轻松编排各种 AI 服务、传输协议和对话管道。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux, macOS 或 Windows (推荐 WSL2)\n*   **Python 版本**: Python 3.10 或更高版本\n*   **包管理器**: pip (Python 包安装程序)\n*   **前置依赖**:\n    *   建议创建虚拟环境以避免依赖冲突。\n    *   若需使用特定 AI 服务（如 OpenAI, Deepgram 等），请提前准备好相应的 API Key。\n\n## 安装步骤\n\n### 1. 创建并激活虚拟环境（推荐）\n\n```bash\npython -m venv pipecat-env\n# Linux\u002FmacOS\nsource pipecat-env\u002Fbin\u002Factivate\n# Windows\npipecat-env\\Scripts\\activate\n```\n\n### 2. 安装 Pipecat 核心库\n\n通过 PyPI 安装最新稳定版：\n\n```bash\npip install pipecat-ai\n```\n\n> **提示**：如果您需要特定的服务集成（例如 OpenAI LLM 或 Deepgram STT），可以通过 extras 安装，例如：`pip install pipecat-ai[openai,deepgram]`。\n\n### 3. 
初始化项目（可选但推荐）\n\n使用官方 CLI 工具快速搭建项目骨架：\n\n```bash\npip install pipecat-cli\npipecat init quickstart\n```\n\n按照终端提示完成配置，这将生成一个包含基础依赖和示例代码的项目目录。\n\n## 基本使用\n\n以下是一个最简单的示例，展示如何构建一个基础的语音对话管道。该示例结合了语音识别（STT）、大语言模型（LLM）和语音合成（TTS）。\n\n**注意**：运行前请确保已设置相关环境变量（如 `OPENAI_API_KEY`, `DEEPGRAM_API_KEY` 等）。\n\n```python\nimport asyncio\nimport os\nfrom pipecat.pipeline.pipeline import Pipeline\nfrom pipecat.pipeline.runner import PipelineRunner\nfrom pipecat.pipeline.task import PipelineTask\nfrom pipecat.processors.aggregators.sentence import SentenceAggregator\nfrom pipecat.services.deepgram import DeepgramSTTService\nfrom pipecat.services.openai import OpenAILLMService\nfrom pipecat.services.elevenlabs import ElevenLabsTTSService\nfrom pipecat.transports.websocket import WebSocketTransport\n\nasync def main():\n    # 初始化传输层 (以 WebSocket 为例)\n    transport = WebSocketTransport()\n\n    # 初始化服务\n    stt = DeepgramSTTService(api_key=os.getenv(\"DEEPGRAM_API_KEY\"))\n    llm = OpenAILLMService(api_key=os.getenv(\"OPENAI_API_KEY\"))\n    tts = ElevenLabsTTSService(api_key=os.getenv(\"ELEVENLABS_API_KEY\"))\n\n    # 构建管道：语音输入 -> 文本识别 -> 句子聚合 -> LLM 推理 -> 语音合成 -> 音频输出\n    pipeline = Pipeline(\n        [\n            transport.input(),\n            stt,\n            SentenceAggregator(),\n            llm,\n            tts,\n            transport.output(),\n        ]\n    )\n\n    # 创建任务并运行\n    task = PipelineTask(pipeline)\n    runner = PipelineRunner()\n\n    await runner.run(task)\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n### 下一步\n*   **查看文档**: 访问 [Pipecat 官方文档](https:\u002F\u002Fdocs.pipecat.ai) 获取详细的 API 参考和架构说明。\n*   **探索示例**: 浏览 [pipecat-examples](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat-examples) 仓库，获取更多关于多模态、故事生成和翻译机器人的完整代码。\n*   **客户端集成**: 使用官方提供的 JavaScript, React, Swift 或 Kotlin SDK 将后端管道连接到您的前端应用。","一家初创团队正在开发一款基于 Web 的实时 AI 心理陪伴助手，需要让用户通过浏览器与 AI 进行低延迟、自然流畅的语音对话。\n\n### 没有 pipecat 时\n- 
**延迟严重**：自行拼接语音识别（STT）、大模型和语音合成（TTS）服务时，音频流处理链路冗长，导致用户说完话后需等待数秒才能听到回复，对话体验割裂。\n- **架构复杂**：为了协调不同厂商的 API 和数据格式，团队需编写大量胶水代码来管理音频缓冲、状态同步和错误重试，开发周期被大幅拉长。\n- **多端适配难**：若要同时支持 Web、iOS 和 Android 客户端，需为每个平台单独重写底层的 WebSocket 或 WebRTC 连接逻辑，维护成本极高。\n- **调试黑盒**：当出现声音卡顿或对话中断时，缺乏可视化的链路追踪工具，开发者只能在海量日志中盲目排查音频帧丢失的具体环节。\n\n### 使用 pipecat 后\n- **极致流畅**：pipecat 内置的流式管道架构实现了音频帧的即时透传，将端到端延迟压缩至毫秒级，用户感觉像是在与真人实时交谈。\n- **快速组装**：利用其模块化组件，团队仅用少量代码即可将 Whisper、LLM 和 ElevenLabs 等服务像积木一样串联，核心业务逻辑开发时间缩短 70%。\n- **一次构建多端运行**：借助官方提供的 JavaScript、Swift 和 Kotlin 等客户端 SDK，同一套后端管道可直接服务于所有平台，无需重复造轮子。\n- **透明可观测**：通过集成 Whisker 调试器和 Tail 终端看板，团队能实时监控音频流在每个处理节点的耗时与状态，迅速定位并解决性能瓶颈。\n\npipecat 让开发者从繁琐的底层音视频编排中解放出来，专注于打造真正有温度的对话体验。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpipecat-ai_pipecat_ef470759.png","pipecat-ai","Pipecat","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fpipecat-ai_0678da1d.png","Pipecat is a framework for building voice (and multimodal) conversational agents.",null,"https:\u002F\u002Fpipecat.ai","https:\u002F\u002Fgithub.com\u002Fpipecat-ai",[80,84,88],{"name":81,"color":82,"percentage":83},"Python","#3572A5",100,{"name":85,"color":86,"percentage":87},"Shell","#89e051",0,{"name":89,"color":90,"percentage":87},"Jinja","#a52a22",11058,1882,"2026-04-07T08:51:09","BSD-2-Clause","未说明 (支持 Python 的跨平台系统)","非必需 (取决于所选服务)。框架本身为纯 Python，但集成本地模型 (如 Whisper, Ollama) 时需对应硬件；云服务 (OpenAI, Groq 等) 无需本地 GPU。","未说明 (取决于具体应用场景和所选模型)",{"notes":99,"python":100,"dependencies":101},"Pipecat 是一个模块化框架，其运行环境需求高度依赖于用户选择的具体服务后端。若使用云端 API (如 OpenAI, Deepgram)，仅需基础网络环境；若集成本地开源模型 (通过 Ollama 或本地 Whisper)，则需满足相应模型的算力与存储需求。建议通过 'pipecat init quickstart' 初始化项目以自动配置依赖。","3.9+ (推断自现代 AI 框架及 CLI 
工具兼容性，文档未明确指定最低版本)",[72,102,103,104,105],"aiohttp","websockets","numpy","loguru",[35,13,107,15,14],"音频",[109,110,111,112,113,114],"ai","real-time","voice","voice-assistant","chatbot-framework","chatbots","2026-03-27T02:49:30.150509","2026-04-07T22:50:55.347779",[118,123,128,133],{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},23171,"为什么 Google STT 服务会频繁抛出 409 错误（流超时）？","Google STT 服务需要持续的音频输入。如果在一段时间内没有收到输入（例如静音期间），它会抛出 409 错误并终止连接。这是一个已知的 API 行为限制。\n\n解决方案：\n1. 避免在 Google STT 服务之前阻断音频流。例如，不要将 `STTMuteFilter` 放在 `GoogleSTTService` 之前，因为这会切断连续输入。\n2. 建议将 `STTMuteFilter` 移至 `GoogleSTTService` 之后，仅过滤由 STT 生成的转录帧（TranscriptionFrames），而不是原始音频帧。\n3. 确保即使在用户不说话时，也有某种机制保持音频流的活跃或处理静默数据。","https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fissues\u002F2180",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},23172,"遇到 ElevenLabs 策略违规（Policy Violation）或 1008 错误是什么原因？","这通常不是 Pipecat 的代码错误，而是由以下原因引起的：\n1. **额度耗尽**：最常见的原因是您的 ElevenLabs 账户用完了按需（on-demand）信用额度。\n2. **语音模型限制**：某些特定的非标准语音或模型可能与当前 API 版本不兼容，导致报错，而标准语音通常正常工作。\n3. 
**服务端错误**：如果没有复现步骤且未近期报告，可能是 ElevenLabs 服务端的临时错误。\n\n建议检查您的 ElevenLabs 账户余额，并尝试切换回标准语音测试是否恢复正常。","https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fissues\u002F1426",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},23173,"WebSocket 断开后管道继续发送消息导致报错，该如何解决？","当调用方挂断导致 WebSocket 断开，而管道仍在尝试发送数据时，会触发 `WebSocketDisconnect` 异常。\n\n该问题已在 **Pipecat v0.0.78** 及更高版本（如 v0.0.79）中修复。如果您遇到此错误，请将 Pipecat 升级到最新版本：\n```bash\npip install --upgrade pipecat-ai\n```\n升级后，系统应能更优雅地处理连接断开情况，不再频繁抛出此类异常。","https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fissues\u002F2209",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},23174,"AudioBufferProcessor 合并输出的用户和机器人音频不同步怎么办？","这是一个已知的计算逻辑问题，`AudioBufferProcessor` 在计算静音时长时未正确考虑音频片段本身的持续时间，导致时间戳对齐错误。\n\n该问题已通过 [PR #3541](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3541) 修复。请确保您使用的是包含此修复的最新版本代码。如果您在使用最新版本后仍然遇到同步问题（特别是在使用 `SmallWebRTCTransport` 时），可能需要检查您的具体实现是否有其他阻塞用户输入的逻辑，或者尝试使用官方示例脚本（如 `34-audio-recording.py`）来排查是否是项目特定配置导致的问题。","https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fissues\u002F2851",[139,144,149,154,159,164,169,174,179,184,189,194,199,204,209,214,219,224,229,234],{"id":140,"version":141,"summary_zh":142,"released_at":143},136856,"v0.0.108","### 新增功能\n\n- 新增了 `SarvamLLMService`，支持 `sarvam-30b`、`sarvam-30b-16k`、`sarvam-105b` 和 `sarvam-105b-32k` 模型。\n  （PR [#3978](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3978)）\n\n- 在 `TTSService` 中新增了 `on_turn_context_created(context_id)` 钩子。可通过重写该方法，在文本开始流动之前执行特定于提供商的初始化操作（例如提前打开服务器端上下文）。每次创建新的对话上下文 ID 时都会调用此钩子。\n  （PR [#4013](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4013)）\n\n- 新增了 `XAIHttpTTSService`，用于通过 xAI 的 HTTP TTS API 进行文本转语音。\n  （PR [#4031](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4031)）\n\n- 在所有 LLM 适配器中，新增了对对话上下文中“developer”角色消息的支持。对于非 OpenAI 服务（Anthropic、Google、AWS Bedrock），“developer”消息会被转换为“user”消息（可使用 
`system_instruction` 设置系统指令）。而对于 OpenAI 服务，“developer”消息会直接保留在对话历史中。在 Responses API 中，这些消息则保持为“developer”角色（与现有的“system”→“developer”转换一致）。\n  （PR [#4089](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4089)）\n\n- 新增了 `SmallestTTSService`，这是一个基于 WebSocket 的 TTS 服务集成，对接 Smallest AI 的 Waves API。支持 Lightning v2 和 v3.1 模型，并提供可配置的语音、语言、语速、一致性、相似度以及增强设置。\n  （PR [#4092](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4092)）\n\n- 在轮次停止策略中新增了警告提示：当 `VADParams.stop_secs` 与推荐的默认值（0.2 秒）不同时，或当 `stop_secs >= STT p99 延迟` 时，STT 等待超时将被压缩至 0 秒，可能导致轮次检测延迟。这些警告会引导开发者使用其 VAD 设置重新运行 [stt-benchmark](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fstt-benchmark) 工具。\n  （PR [#4115](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4115)）\n\n- 在 `AssemblyAISTTSettings` 中新增了 `domain` 参数，用于启用专业识别模式，例如医疗模式（`domain=\"medical-v1\"`）。\n  （PR [#4117](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4117)）\n\n- 新增了 `NovitaLLMService`，用于通过 Novita AI 兼容 OpenAI 的 API 使用其 LLM 模型。\n  （PR [#4119](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4119)）\n\n- 在 `VADAnalyzer` 和 `VADController` 中新增了 `cleanup()` 方法，以便在不再需要时正确释放 VAD 分析器资源。自定义的 `VADAnalyzer` 子类可以重写 `cleanup()` 方法来释放持有的任何资源。\n  （PR [#4120](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4120)）\n\n- 在 `AssemblyAISTTService` 中新增了 `on_end_of_turn` 事件处理器。该事件会在最终转录文本推送完成后触发，为轮次结束逻辑提供一个可靠的钩子，避免与 `TranscriptionFrame` 发生竞争条件。此功能在 Pipecat 和 AssemblyAI 的轮次检测模式下均适用。\n  （PR [#4128](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4128)）\n\n- 新增了 `DeepgramFluxSageMakerSTTService`，用于在 AWS SageMaker 终端上运行 Deepgram Flux 语音转文本服务。","2026-03-28T04:48:42",{"id":145,"version":146,"summary_zh":147,"released_at":148},136857,"v0.0.107","### 新增\n\n- 向 `SyncParallelPipeline` 添加了 `frame_order` 参数。将 `frame_order=FrameOrder.PIPELINE` 设置为按管道定义顺序推送同步输出帧（先输出第一个管道的所有帧，再输出第二个管道的帧，依此类推），而非默认的到达顺序。\n  （PR 
[#4029](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4029)）\n\n- 向 `OutputImageRawFrame` 添加了 `sync_with_audio` 字段。当该字段设置为 `True` 时，输出传输队列会将图像帧与音频同步发送，确保只有在所有前置音频均已发送完毕后才会显示图像帧，从而实现音画同步播放。\n  （PR [#4029](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4029)）\n\n- 添加了 `OpenAIResponsesLLMService`，这是一种新的 LLM 服务，使用 OpenAI Responses API。支持流式文本、函数调用、用量指标以及带外推理功能。可与通用的 `LLMContext` 和 `LLMContextAggregatorPair` 配合使用。请参阅 `examples\u002Ffoundational\u002F07-interruptible-openai-responses.py` 和 `14-function-calling-openai-responses.py`。\n  （PR [#4074](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4074)）\n\n- 向 `TransportParams` 添加了 `audio_out_auto_silence` 参数（默认值为 `True`）。当该参数设置为 `False` 时，传输模块会在输出队列为空时等待音频数据，而不是插入静音，这在需要不间断音频播放且不允许人为间隙的场景中非常有用。\n  （PR [#4104](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4104)）\n\n### 变更\n\n- 重命名了追踪跨度属性，以符合 OpenTelemetry GenAI 语义规范：将 `gen_ai.system` 更名为 `gen_ai.provider.name`，`system` 更名为 `gen_ai.system_instructions`，`gen_ai.usage.cache_read_input_tokens` 更名为 `gen_ai.usage.cache_read.input_tokens`，`gen_ai.usage.cache_creation_input_tokens` 更名为 `gen_ai.usage.cache_creation.input_tokens`。\n  （PR [#3449](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3449)）\n\n- `DeepgramSageMakerTTSService` 现在会正确地通过基础 `TTSService` 的音频上下文队列路由音频。音频帧不再直接推送，而是通过 `append_to_audio_context()` 方法传递，从而实现正确的排序、中断处理以及开始\u002F结束帧的生命周期管理。中断事件现在会通过 `on_audio_context_interrupted` 在适当时机向 Deepgram 发送 `Clear` 消息，以清空其文本缓冲区。\n  （PR [#4083](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4083)）\n\n- `GradiumTTSService` 现在会在每个 TTS 上下文的第一条文本消息之前，按照 Gradium 的多路复用协议，发送包含 `client_req_id` 的上下文级 `setup` 消息。此前，仅在连接时发送一次带有 `client_req_id` 的设置消息，这导致在使用 `close_ws_on_eos=False` 时，Gradium 无法将请求与其会话关联起来。\n  （PR [#4091](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4091)）\n\n### 修复\n\n- 通过从 
`_settings.system_instruction` 读取系统指令，而非已移除的 `_system_instruction` 属性，修复了 LLM 追踪跨度中过时的 `system_instruction`。\n  （PR","2026-03-24T03:18:56",{"id":150,"version":151,"summary_zh":152,"released_at":153},136858,"v0.0.106","### 新增功能\n\n- 为 `ServiceUpdateSettingsFrame`（及其子类 `LLMUpdateSettingsFrame`、`TTSUpdateSettingsFrame`、`STTUpdateSettingsFrame`）添加了可选的 `service` 字段，用于指定目标服务实例。当设置了 `service` 时，只有匹配的服务会应用这些设置；其他服务则会原样转发该帧。这使得在管道中存在多个相同类型的服务时，可以单独更新其中某一服务。\n  （PR [#4004](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4004)）\n\n- 在 Daily 运行器的 `configure()` 方法中添加了 `sip_provider` 和 `room_geo` 参数。这两个便捷参数允许调用者直接指定 SIP 提供商名称和地理位置，而无需手动构建 `DailyRoomProperties` 和 `DailyRoomSipParams`。\n  （PR [#4005](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4005)）\n\n- 添加了 `PerplexityLLMAdapter`，它可以自动转换对话消息，以满足 Perplexity 更严格的 API 约束条件（严格的角色交替、不允许非初始系统消息、最后一条消息必须是用户或工具发出的）。此前，某些对话历史可能会导致 Perplexity API 报错，而使用 OpenAI 则不会出现此类问题（`PerplexityLLMService` 继承自 `OpenAILLMService`，因为 Perplexity 使用与 OpenAI 兼容的 API）。\n  （PR [#4009](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4009)）\n\n- 为 Daily 传输层增加了对 DTMF 输入事件的支持。传入的 DTMF 音调现在通过 Daily 的 `on_dtmf_event` 回调函数接收，并作为 `InputDTMFFrame` 推入管道，从而使机器人能够响应电话呼叫者的按键操作。\n  （PR [#4047](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4047)）\n\n- 添加了基于唤醒短语触发用户轮次的 `WakePhraseUserTurnStartStrategy`，并支持 `single_activation` 模式。同时废弃了 `WakeCheckFilter`。\n  （PR [#4064](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4064)）\n\n- 添加了 `default_user_turn_start_strategies()` 和 `default_user_turn_stop_strategies()` 辅助函数，用于组合自定义策略列表。\n  （PR [#4064](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F4064)）\n\n### 变更内容\n\n- 将工具结果的 JSON 序列化方式更改为使用 `ensure_ascii=False`，以保留 UTF-8 字符，而不对其进行转义。这有助于减少非英语语言的上下文大小和令牌使用量。\n  （PR [#3457](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3457)）\n\n- 
`OpenAIRealtimeSTTService` 的 `noise_reduction` 参数现已移至 `OpenAIRealtimeSTTSettings` 中，从而可以通过 `STTUpdateSettingsFrame` 在运行时进行更新。从版本 0.0.106 开始，直接使用 `noise_reduction` 初始化参数已被弃用。\n  （PR [#3991](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3991)）\n\n- 更新了 `sarvamai` 依赖项，由 `0.1.26a2`（alpha 版）升级至 `0.1.26`（稳定版）。\n  （PR [#3997](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3997)）\n\n- `SimliVideoService` 现在继承自 `AIService` 而不是 `FrameProcessor`，使其与其他 HeyGen 和 Tavus 视频服务保持一致。它支持通过 `SimliVideoService.Settings(...)` 进行配置。","2026-03-19T06:43:56",{"id":155,"version":156,"summary_zh":157,"released_at":158},136859,"v0.0.105","### 新增功能\n\n- 增加了并发音频上下文支持：通过将 `pause_frame_processing` 设置为 `False`，并将每句话路由到各自的音频上下文队列，`CartesiaTTSService` 现在可以在上一句话仍在播放时合成下一句话。\n  （PR [#3804](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3804)）\n\n- 为 Daily 传输增加了自定义视频轨道支持。使用 `DailyParams` 中的 `video_out_destinations` 可以同时发布多个视频轨道，与现有的 `audio_out_destinations` 功能相呼应。\n  （PR [#3831](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3831)）\n\n- 添加了 `ServiceSwitcherStrategyFailover`，当当前服务报告非致命错误时，会自动切换到下一个服务。恢复策略可以通过 `on_service_switched` 事件处理器实现。\n  （PR [#3861](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3861)）\n\n- 在 `register_function()` 和 `register_direct_function()` 中新增了可选的 `timeout_secs` 参数，用于对每个工具的函数调用进行超时控制，从而覆盖全局的 `function_call_timeout_secs` 默认值。\n  （PR [#3915](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3915)）\n\n- 为 Daily 传输的 `enable_recording` 属性添加了 `cloud-audio-only` 录音选项。\n  （PR [#3916](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3916)）\n\n- 在 `BaseOpenAILLMService`、`AnthropicLLMService` 和 `AWSBedrockLLMService` 中实现了 `system_instruction` 的绑定，使其作为默认系统提示工作，与 Google 服务的行为保持一致。这使得可以在多个 LLM 服务之间共享单个 `LLMContext`，每个服务可以独立提供自己的系统指令。\n\n    ```python\n    llm = OpenAILLMService(\n        
api_key=os.getenv(\"OPENAI_API_KEY\"),\n        system_instruction=\"你是一个乐于助人的助手。\",\n    )\n\n    context = LLMContext()\n\n    @transport.event_handler(\"on_client_connected\")\n    async def on_client_connected(transport, client):\n        context.add_message({\"role\": \"user\", \"content\": \"请自我介绍。\"})\n        await task.queue_frames([LLMRunFrame()])\n    ```  \n  （PR [#3918](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3918)）\n\n- 在 `AssemblyAIConnectionParams` 中新增了 `vad_threshold` 参数，用于配置 U3 Pro 中语音活动检测的灵敏度。将其与外部 VAD 阈值（例如 Silero VAD）对齐，可以避免出现“死区”——即 AssemblyAI 会转录 VAD 尚未检测到的语音内容。\n  （PR [#3927](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3927)）\n\n- 在 `BaseWhisperSTTService` 和 `OpenAISTTService` 中新增了 `push_empty_transcripts` 参数，允许将空转录作为 `TranscriptionFrame` 推送至下游，而不是直接丢弃（这是默认行为）。这一功能适用于 VAD 即使用户未说话也会触发的情况。在这种情况下，知道没有发生任何转录是非常有用的，这样代理就可以继续发言，而不必再等待转录结果。\n  （PR [#3930](ht","2026-03-11T01:01:15",{"id":160,"version":161,"summary_zh":162,"released_at":163},136860,"v0.0.104","### 新增功能\n\n- 新增了 `TextAggregationMetricsData` 指标，用于衡量从首个 LLM 令牌到首个完整句子的时间，反映 TTS 流程中句子聚合的延迟开销。\n  （PR [#3696](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3696)）\n\n- 增加了在运行时使用强类型对象而非字典来更新服务设置的支持。\n\n  例如，以往需要这样写：\n\n  ```python\n  await task.queue_frame(\n      STTUpdateSettingsFrame(settings={\"language\": Language.ES})\n  )\n  ```\n\n  现在可以改写为：\n\n  ```python\n  await task.queue_frame(\n      STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))\n  )\n  ```\n\n  每个服务现在都提供了如 `DeepgramSTTSettings` 这样的强类型类，用于表示该服务可运行时更新的配置项。\n  （PR [#3714](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3714)）\n\n- 增加了对 Azure 语音转文本服务指定私有端点的支持，从而可以在防火墙后的私有网络中使用。\n  （PR [#3764](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3764)）\n\n- 新增了 `LemonSliceTransport` 和 `LemonSliceApi`，以支持将实时 LemonSlice 虚拟形象添加到任何 Daily 会议房间中。\n  （PR 
[#3791](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3791)）\n\n- 在 Ultravox 服务的 `AgentInputParams` 和 `OneShotInputParams` 中新增了 `output_medium` 参数，用于在通话创建时控制初始输出媒介（文本或语音）。\n  （PR [#3806](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3806)）\n\n- 新增了通用的轮次检测指标类 `TurnMetricsData`，并包含端到端处理时间的测量。`KrispVivaTurn` 现在会发出 `TurnMetricsData`，其中的 `e2e_processing_time_ms` 用于跟踪从 VAD 语音转静音过渡到轮次完成之间的时间间隔。\n  （PR [#3809](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3809)）\n\n- 向 `AudioContextTTSService` 添加了 `on_audio_context_interrupted()` 和 `on_audio_context_completed()` 回调函数。子类可以通过重写这些回调来执行特定于提供商的清理操作，而无需再覆盖 `_handle_interruption()` 方法。\n  （PR [#3814](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3814)）\n\n- 向 `LLMContextSummarizer` 添加了 `on_summary_applied` 事件，用于可观测性监控，提供上下文摘要前后的消息数量信息。\n  （PR [#3855](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3855)）\n\n- 在 `LLMContextSummarizationConfig` 中新增了 `summary_message_template` 字段，用于自定义摘要注入上下文时的格式化方式（例如，用 XML 标签包裹）。\n  （PR [#3855](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3855)）\n\n- 在 `LLMContextSummarizationConfig` 中新增了 `summarization_timeout` 参数（默认值为 120 秒），以防止 LLM 调用卡死而永久阻塞后续的上下文摘要操作。\n  （PR [#3855](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3855)）\n\n- 在 `LLMContextSummarizationConfig` 中新增了可选的 `llm` 字段，用于将摘要生成路由到专门的 LLM 服务（例如，更便宜或更快的模型），而不是使用管道中的","2026-03-03T05:25:10",{"id":165,"version":166,"summary_zh":167,"released_at":168},136861,"v0.0.103","### 新增功能\n\n- 在 `InworldAITTSService` 中新增了 `\"timestampTransportStrategy\": \"ASYNC\"` 配置。这使得时间戳信息可以滞后于音频分块的到达，从而显著降低首个音频分块的延迟。\n  （PR [#3625](https:\u002F\u002Fgithub.com\u002Fpipecat-ai\u002Fpipecat\u002Fpull\u002F3625)）\n\n- 为 `RimeTTSService` 添加了特定于模型的 `InputParams`：Arcana 模型参数（`repetition_penalty`、`temperature`、`top_p`）以及 MistV2 
model parameters (`no_text_normalization`, `save_oovs`, `segment`). The WebSocket connection is now automatically re-established when the model, voice, or parameters change.
  (PR [#3642](https://github.com/pipecat-ai/pipecat/pull/3642))

- Added a `write_transport_frame()` hook to `BaseOutputTransport`, allowing transport subclasses to handle custom frame types that flow through the audio queue.
  (PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))

- Added `DailySIPTransferFrame` and `DailySIPReferFrame` for the Daily transport. These frames queue SIP transfer and SIP REFER operations along with the audio, ensuring the operations only execute once the bot has finished its current utterance.
  (PR [#3719](https://github.com/pipecat-ai/pipecat/pull/3719))

- Added keepalive support to `SarvamSTTService` to prevent idle connection timeouts (e.g. when using `ServiceSwitcher`).
  (PR [#3730](https://github.com/pipecat-ai/pipecat/pull/3730))

- Added `UserIdleTimeoutUpdateFrame`, which enables or disables user idle detection at runtime by dynamically updating the timeout.
  (PR [#3748](https://github.com/pipecat-ai/pipecat/pull/3748))

- Added a `broadcast_sibling_id` field to the base `Frame` class. It is set automatically by `broadcast_frame()` and `broadcast_frame_instance()` to the ID of the paired frame pushed in the opposite direction, allowing receivers to identify broadcast pairs.
  (PR [#3774](https://github.com/pipecat-ai/pipecat/pull/3774))

- Added an `ignored_sources` parameter to `RTVIObserverParams`, along with `add_ignored_source()` and `remove_ignored_source()` methods on `RTVIObserver`, for suppressing RTVI messages from specific pipeline processors (e.g. an LLM being evaluated silently).
  (PR [#3779](https://github.com/pipecat-ai/pipecat/pull/3779))

- Added `DeepgramSageMakerTTSService` for streaming to Deepgram TTS models deployed on AWS SageMaker endpoints over bidirectional HTTP/2. The service supports the Deepgram TTS protocol (Speak, Flush, Clear, Close), interruption handling, and per-turn TTFB metrics.
  (PR [#3785](https://github.com/pipecat-ai/pipecat/pull/3785))

### Changed

- ⚠️ `RimeTTSService` now defaults to `model="arcana"` and the `wss://users-ws.rime.ai/ws3` endpoint. The `InputParams` defaults have changed from MistV2-specific values to `None`; only explicitly set parameters are sent as query parameters.
  (PR [#3642](https://github.com/pipecat-ai/pipecat/pull/3642))

- `AICFilter` now shares read-only AIC models through a singleton `AICModelManager` in `aic_filter.py`.
    - Multiple filters using the same model path or `(model_id, model_download_dir)` share a single loaded model, with reference counting.

*Released 2026-02-21*

## v0.0.102

### Added

- Added `ResembleAITTSService` for text-to-speech via Resemble AI's streaming WebSocket API, with word-level timestamps and a jitter buffer for smooth audio playback.
  (PR [#3134](https://github.com/pipecat-ai/pipecat/pull/3134))

- Added `UserBotLatencyObserver` to track response latency between the user and the bot. When tracing is enabled, latency measurements are automatically recorded as the `turn.user_bot_latency_seconds` attribute on OpenTelemetry turn spans.
  (PR [#3355](https://github.com/pipecat-ai/pipecat/pull/3355))

- Added an `append_to_context` parameter to `TTSSpeakFrame` for conditionally adding the text to the LLM context.
    - Allows fine-grained control over whether text is added to the conversation context
    - Defaults to `True` for backwards compatibility
  (PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))

- Added a TTS context tracking system with a `context_id` field for tracing audio generation through the pipeline.
    - `TTSAudioRawFrame`, `TTSStartedFrame` and `TTSStoppedFrame` now carry a `context_id`
    - `AggregatedTextFrame` and `TTSTextFrame` also carry a `context_id`
    - This makes it possible to trace which TTS request generated a given audio segment
  (PR [#3584](https://github.com/pipecat-ai/pipecat/pull/3584))

- Added support for Inworld TTS WebSocket auto mode, further reducing latency.
  (PR [#3593](https://github.com/pipecat-ai/pipecat/pull/3593))

- Added two new frames for context summarization: `LLMContextSummaryRequestFrame` and `LLMContextSummaryResultFrame`.
  (PR [#3621](https://github.com/pipecat-ai/pipecat/pull/3621))

- Added context summarization, which automatically compresses conversation history once the conversation reaches a token or message limit, enabling efficient long-running conversations.
    - Enable it by setting `enable_context_summarization=True` in `LLMAssistantAggregatorParams`
    - Customize the behavior (max tokens, thresholds, etc.) with `LLMContextSummarizationConfig`
    - Incomplete function call sequences are automatically preserved during summarization
    - See the new examples:
      `examples/foundational/54-context-summarization-openai.py` and
      `examples/foundational/54a-context-summarization-google.py`
  (PR [#3621](https://github.com/pipecat-ai/pipecat/pull/3621))

- Added RTVI function call lifecycle events (`llm-function-call-started`, `llm-function-call-in-progress`, `llm-function-call-stopped`) with a configurable security level via `RTVIObserverParams.function_call_report_level`. Which information is exposed can be controlled per function (`DISABLED`, `NONE`, `NAME` or `FULL`).
  (PR [#3630](https://github.com/pipecat-ai/pipecat/pull/3630))

- Added `RequestMetadataFrame` and metadata handling for `ServiceSwitcher`, ensuring STT services correctly emit `STTMetadataFrame` when switching. Only the currently active service's metadata is passed downstream; when services switch, the newly active service re-emits its metadata.

*Released 2026-02-11*

## v0.0.101

### Added

- Additions for `AICFilter` and `AICVADAnalyzer`:
    - Added model download support to `AICFilter` via the `model_id` and `model_download_dir` parameters.
    - Added a `model_path` parameter to `AICFilter` for loading local `.aicmodel` files.
    - Added unit tests for `AICFilter` and `AICVADAnalyzer`.
  (PR [#3408](https://github.com/pipecat-ai/pipecat/pull/3408))

- Added handling of the `server_content.interrupted` signal in the Gemini Live service, enabling faster responses to interruptions when turn tracking isn't already happening in the pipeline (e.g. when using local VAD and context aggregators). If the pipeline already tracks turns, the extra interruption has no negative effect.
  (PR [#3429](https://github.com/pipecat-ai/pipecat/pull/3429))

- Added `GenesysFrameSerializer` for the Genesys AudioHook WebSocket protocol, enabling bidirectional audio streaming between Pipecat pipelines and Genesys Cloud contact centers.
  (PR [#3500](https://github.com/pipecat-ai/pipecat/pull/3500))

- Added read-only `reached_upstream_types` and `reached_downstream_types` properties to `PipelineTask` for inspecting the currently configured frame type filters.
  (PR [#3510](https://github.com/pipecat-ai/pipecat/pull/3510))

- Added `add_reached_upstream_filter()` and `add_reached_downstream_filter()` methods to `PipelineTask` for appending frame types to those filters.
  (PR [#3510](https://github.com/pipecat-ai/pipecat/pull/3510))

- Added `UserTurnCompletionLLMServiceMixin` for LLM services to detect and filter incomplete user turns. When the `filter_incomplete_user_turns` option is enabled in `LLMUserAggregatorParams`, the LLM outputs a turn completion marker at the start of each response: ✓ (complete), ○ (short and incomplete) or ◐ (long and incomplete). Incomplete user turns are suppressed, and the user can be re-prompted automatically via a configurable timeout.
  (PR [#3518](https://github.com/pipecat-ai/pipecat/pull/3518))

- Added the `FrameProcessor.broadcast_frame_instance(frame)` method, which broadcasts a frame instance by extracting the frame's fields and creating a new instance for each direction.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))

- When `enable_rtvi=True` (the default), `PipelineTask` now automatically adds an `RTVIProcessor` and registers an `RTVIObserver`, simplifying pipeline setup.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))

- Added the `RTVIProcessor.create_rtvi_observer()` factory method for creating RTVI observers.
  (PR [#3519](https://github.com/pipecat-ai/pipecat/pull/3519))

- Added a `video_out_codec` parameter to `TransportParams`, allowing configuration of the preferred video output codec in `DailyTransport` (e.g. `"VP8"`, `"H264"`, `"H265"`).
  (PR [#3520](https://github.com/pipecat-ai/pipecat/pull/3520))

- Added a `location` parameter to the Google TTS services (`GoogleHttpTTSService`, `GoogleTTSService`, `GeminiTTSService`) to support regional endpoints.
  (PR [#3523](https://github.com/pipecat-ai/pipecat/pull/3523))

- Added `P

*Released 2026-01-31*

## v0.0.100

### Added

- Added Hathora services to support Hathora-hosted TTS and STT models (non-streaming only).
  (PR [#3169](https://github.com/pipecat-ai/pipecat/pull/3169))

- Added `CambTTSService`, integrating Camb.ai's MARS models (mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech synthesis.
  (PR [#3349](https://github.com/pipecat-ai/pipecat/pull/3349))

- Added an `additional_headers` parameter to `WebsocketClientParams`, letting `WebsocketClientTransport` send custom headers on connection, e.g. for authentication.
  (PR [#3461](https://github.com/pipecat-ai/pipecat/pull/3461))

- Added `UserIdleController` for detecting user idle state, integrated into `LLMUserAggregator` and `UserTurnProcessor` via the optional `user_idle_timeout` parameter. The controller emits an `on_user_turn_idle` event for the application layer to handle. The old `UserIdleProcessor` is deprecated in favor of this new compositional design.
  (PR [#3482](https://github.com/pipecat-ai/pipecat/pull/3482))

- Added `on_user_mute_started` and `on_user_mute_stopped` event handlers to `LLMUserAggregator` for tracking changes in the user's mute state.
  (PR [#3490](https://github.com/pipecat-ai/pipecat/pull/3490))

### Changed

- Enhanced interruption handling in `AsyncAITTSService` with support for multi-context WebSocket sessions, enabling more robust context management.
  (PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))

- Throttled `UserSpeakingFrame` broadcasts to at most once every 200 ms, rather than once per received audio chunk, reducing frame processing overhead while the user is speaking.
  (PR [#3483](https://github.com/pipecat-ai/pipecat/pull/3483))

### Deprecated

- For consistency with other package names, `pipecat.turns.mute` (introduced in Pipecat 0.0.99) is deprecated and replaced by `pipecat.turns.user_mute`.
  (PR [#3479](https://github.com/pipecat-ai/pipecat/pull/3479))

### Fixed

- Fixed the TTFB metrics calculation in `AsyncAIHttpTTSService`.
  (PR [#3287](https://github.com/pipecat-ai/pipecat/pull/3287))

- Fixed an issue where the "bot-llm-text" RTVI event would not fire for realtime (speech-to-speech) services. The affected services were:

    - `AWSNovaSonicLLMService`
    - `GeminiLiveLLMService`
    - `OpenAIRealtimeLLMService`
    - `GrokRealtimeLLMService`

  The cause was that these services were not pushing `LLMTextFrame`s; they now push them as expected.
  (PR [#3446](https://github.com/pipecat-ai/pipecat/pull/3446))

- Fixed an issue where the `on_user_turn_stop_timeout` event could fire while the user was speaking when using `ExternalUserTurnStrategies`.
  (PR [#3454](https://github.com/pipecat-ai/pipecat/pull/3454))

- Fixed an issue where user turn start strategies were not reset after a user turn started, leading to incorrect strategy behavior.
  (PR [#3455](https://github.com/pipecat-ai/pipecat/pull/3455))

- Fixed `MinWordsUserTurnStartStrategy` so it no longer aggregates transcriptions, avoiding incorrectly starting a turn when there are pauses between words.
  (PR [#3462](https

*Released 2026-01-21*

## v0.0.99

### Added

- Introduced user turn strategies. User turn strategies indicate when a user turn starts or stops. In conversational agents these are commonly known as "start/stop of speech" or "turn-taking" schemes or strategies.

  User turn start strategies indicate when the user starts speaking (e.g. via VAD events, or once the user has said one or more words).

  User turn stop strategies indicate when the user stops speaking (e.g. using an end-of-turn detection model, or by watching incoming transcriptions).

  For both, you can specify a list of strategies; they are evaluated in order until one of them evaluates to true.

    Available user turn start strategies:
      - VADUserTurnStartStrategy
      - TranscriptionUserTurnStartStrategy
      - MinWordsUserTurnStartStrategy
      - ExternalUserTurnStartStrategy

    Available user turn stop strategies:
      - TranscriptionUserTurnStopStrategy
      - TurnAnalyzerUserTurnStopStrategy
      - ExternalUserTurnStopStrategy

    The defaults are:
      - Start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
      - Stop: [TranscriptionUserTurnStopStrategy]

  Turn strategies are configured when setting up the `LLMContextAggregatorPair`. For example:

    ```python
    context_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            user_turn_strategies=UserTurnStrategies(
                stop=[
                    TurnAnalyzerUserTurnStopStrategy(
                        turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
                    )
                ],
            )
        ),
    )
    ```

  To use user turn strategies, you must upgrade to the new universal `LLMContext` and `LLMContextAggregatorPair`. (PR [#3045](https://github.com/pipecat-ai/pipecat/pull/3045))

- Added `RNNoiseFilter` for real-time noise suppression using the RNNoise neural network via the pyrnnoise library. (PR [#3205](https://github.com/pipecat-ai/pipecat/pull/3205))

- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API, supporting real-time voice conversations:

    - Real-time audio streaming over a WebSocket connection
    - Built-in server-side VAD (voice activity detection)
    - Multiple voice options: Ara, Rex, Sal, Eve, Leo
    - Built-in tool support: web_search, x_search, file_search
    - Custom function calling using the standard Pipecat tools pattern
    - Configurable audio formats (PCM, 8 kHz–48 kHz sample rates)
  (PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267))

- Added an approximate TTFB calculation for Ultravox. (PR [#3268](https://github.com/pipecat-ai/pipecat/pull/3268))

- Added `AudioContextTTSService` to the TTS service base classes. `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and `WebsocketWordTTSService`. (PR [#3289](https://github.com/pipecat-ai/pipecat/pull/3289))

- `LLMUserAggregator` n

*Released 2026-01-14*

## v0.0.98

### Added

- Added `RimeNonJsonTTSService`, which supports non-JSON streaming mode. This new class supports WebSocket streaming for the Arcana model.
  (PR [#3085](https://github.com/pipecat-ai/pipecat/pull/3085))

- Added additional functionality related to "thinking" for Google and Anthropic LLMs.

  1. New typed parameters for Google and Anthropic LLMs that control the models' thinking behavior (like how much thinking to do, and whether to output thoughts or thought summaries):
     - `AnthropicLLMService.ThinkingConfig`
     - `GoogleLLMService.ThinkingConfig`
  2. New frames for representing thoughts output by LLMs:
     - `LLMThoughtStartFrame`
     - `LLMThoughtTextFrame`
     - `LLMThoughtEndFrame`
  3. A generic mechanism for recording LLM thoughts to context, used specifically to support Anthropic, whose thought signatures are expected to appear alongside the text of the thoughts within assistant context messages. See:
     - `LLMThoughtEndFrame.signature`
     - `LLMAssistantAggregator` handling of the above field
     - `AnthropicLLMAdapter` handling of `"thought"` context messages
  4. Google-specific logic for inserting thought signatures into the context, to help maintain thinking continuity in a chain of LLM calls.
     See:
     - `GoogleLLMService` sending `LLMMessagesAppendFrame`s to add LLM-specific `"thought_signature"` messages to context
     - `GeminiLLMAdapter` handling of `"thought_signature"` messages
  5. An expansion of `TranscriptProcessor` to process LLM thoughts in addition to user and assistant utterances. See:
     - `TranscriptProcessor(process_thoughts=True)` (defaults to `False`)
     - `ThoughtTranscriptionMessage`, which is now also emitted with the
       `"on_transcript_update"` event
  (PR [#3175](https://github.com/pipecat-ai/pipecat/pull/3175))

- Data and control frames can now be marked as non-interruptible by using the `UninterruptibleFrame` mixin. Frames marked as `UninterruptibleFrame` will not be interrupted during processing, and any queued frames of this type will be retained in the internal queues. This is useful when you need ordered frames (data or control) that should not be discarded or cancelled due to interruptions.
  (PR [#3189](https://github.com/pipecat-ai/pipecat/pull/3189))

- Added an `on_conversation_detected` event to `VoicemailDetector`.
  (PR [#3207](https://github.com/pipecat-ai/pipecat/pull/3207))

- Added an `x-goog-api-client` header with Pipecat's version to all Google services' requests.
  (PR [#3208](https://github.com/pipecat-ai/pipecat/pull/3208))

- Added support for the HeyGen LiveAvatar API (see https://www.liveavatar.com/).
  (PR [#3210](https://github.com/pipecat-ai/pipecat/pull/3210))

- Added functionality to `AWSNovaSonicLLMService` related to the new (and now default) Nova 2 Sonic model (`"amazon.nova-2-sonic-v1:0"`):

  - Added the `endpointing_sensitivity` parameter to control how quickly the model decides the user has stopped speaking.
  - Made the assistant-response-trigger hack a no-op; it's only needed for the older Nova Sonic model.
  (PR [#3212](https://github.com/pipecat-ai/pipecat/pull/3212))

- [Ultravox Realtime](https://docs.ultravox.ai) is now a supported speech-to-speech service.

  - Added `UltravoxRealtimeLLMService` for the integration.
  - Added the `49-ultravox-realtime.py` example (with tool calling).
  (PR [#3227](https://github.com/pipecat-ai/pipecat/pull/3227))

- Added Daily PSTN dial-in support to the development runner with the `--dialin` flag. This includes:

  - A `/daily-dialin-webhook` endpoint that handles incoming Daily PSTN webhooks
  - Automatic Daily room creation with SIP configuration
  - `DialinSettings` and `DailyDialinRequest` types in `pipecat.runner.types` for type-safe dial-in data
  - The runner now mimics Pipecat Cloud's dial-in webhook handling for local development
  (PR [#3235](https://github.com/pipecat-ai/pipecat/pull/3235))

- Added the Gladia session ID to logs for `GladiaSTTService`.
  (PR [#3236](https://github.com/pipecat-ai/pipecat/pull/3236))

- Added `InworldHttpTTSService`, which uses Inworld's HTTP-based TTS service in either streaming or non-streaming mode. Note: this class was previously named `InworldTTSService`.
  (PR [#3239](https://github.com/pipecat-ai/pipecat/pull/3239))

- Added a `language_hints_strict` parameter to `SonioxSTTService` that strictly enforces language hints.
This ensures that transcription occurs in the specified language.
  (PR [#3245](https://github.com/pipecat-ai/pipecat/pull/3245))

- Added Pipecat library version info to the `about` field in the `bot-ready` RTVI message.
  (PR [#3248](https://github.com/pipecat-ai/pipecat/pull/3248))

- Added `VisionFullResponseStartFrame`, `VisionFullResponseEndFrame` and `VisionTextFrame`. These are used by vision services, similar to LLM services.
  (PR [#3252](https://github.com/pipecat-ai/pipecat/pull/3252))

### Changed

- `FunctionCallInProgressFrame` a

*Released 2025-12-17*

## v0.0.97

### Added

- Added new Gradium services, `GradiumSTTService` and `GradiumTTSService`, for speech-to-text and text-to-speech functionality using Gradium's API.

- Additions for `AsyncAITTSService` and `AsyncAIHttpTTSService`:

    - Added new `languages`: `pt`, `nl`, `ar`, `ru`, `ro`, `ja`, `he`, `hy`, `tr`, `hi`, `zh`.
    - Updated the default model to `asyncflow_multilingual_v1.0` for improved accuracy and broader language coverage.

- Added optional tool and tool output filters for MCP services.

### Changed

- Updated Deepgram logging to include Deepgram request IDs for improved debugging.

- Text aggregation improvements:

    - **Breaking Change**: `BaseTextAggregator.aggregate()` now returns `AsyncIterator[Aggregation]` instead of `Optional[Aggregation]`. This enables the aggregator to return multiple results based on the provided text.
    - Refactored text aggregators to use inheritance: `SkipTagsAggregator` and `PatternPairAggregator` now inherit from `SimpleTextAggregator`, reusing the base class's sentence detection logic.

- Improved interruption handling to prevent bots from repeating themselves.
LLM services that return multiple sentences in a single response (e.g., `GoogleLLMService`) are now split into individual sentences before being sent to TTS. This ensures interruptions occur at sentence boundaries, preventing the bot from repeating content after being interrupted during long responses.

- Updated `AICFilter` to use Quail STT as the default model (`AICModelType.QUAIL_STT`). Quail STT is optimized for human-to-machine interaction (e.g., voice agents, speech-to-text) and operates at a native sample rate of 16 kHz with fixed enhancement parameters.

- If an unexpected exception is caught, or if `FrameProcessor.push_error()` is called with an exception, the file name and line number where the exception occurred are now logged.

- Updated Smart Turn model weights to v3.1.

- The Smart Turn analyzer now uses the full context of the turn rather than just the audio since VAD last triggered.

- Updated `CartesiaSTTService` to return the full transcription `result` in the `TranscriptionFrame` and `InterimTranscriptionFrame`. This provides access to word timestamp data.

- Added tracking headers (`X-Hume-Client-Name` and `X-Hume-Client-Version`) to all requests made by `HumeTTSService` to the Hume API for better usage tracking and analytics.
  - Added `stop()` and `cancel()` cleanup methods to `HumeTTSService` to properly close the HTTP client and prevent resource leaks.

### Deprecated

- NVIDIA service name changes (all functionality is unchanged):

    - `NimLLMService` is now deprecated, use `NvidiaLLMService` instead.
    - `RivaSTTService` is now deprecated, use `NvidiaSTTService` instead.
    - `RivaTTSService` is now deprecated, use `NvidiaTTSService` instead.
    - Use `uv pip install pipecat-ai[nvidia]` instead of `uv pip install pipecat-ai[riva]`.

- The `noise_gate_enable` parameter in `AICFilter` is deprecated and no longer has any effect.
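The sentence-level splitting described earlier in this release can be sketched as a stand-alone helper. This is an illustrative approximation only, not Pipecat's implementation; `split_sentences` and the abbreviation list are hypothetical, and it also shows why punctuation like "$29.95" or "Mr. Smith" needs special handling:

```python
import re

# Common abbreviations that end with a period but do not end a sentence.
_ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "st", "vs", "e.g", "i.e"}

def split_sentences(text: str) -> list[str]:
    """Split text into sentences, tolerating abbreviations and decimals."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]", text):
        end = match.end()
        # Skip a period that is part of a decimal number, e.g. "$29.95".
        if match.group() == "." and end < len(text) and text[end].isdigit():
            continue
        # Skip known abbreviations such as "Mr." or "Dr.".
        word = re.search(r"(\w[\w.]*)$", text[start:match.start()])
        if match.group() == "." and word and word.group(1).lower() in _ABBREVIATIONS:
            continue
        # Require whitespace (or end of text) after the terminator.
        if end < len(text) and not text[end].isspace():
            continue
        sentences.append(text[start:end].strip())
        start = end
    return sentences
```

Splitting at boundaries found this way lets an interruption discard only the sentences not yet sent to TTS, instead of replaying a whole multi-sentence response.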
Noise gating is now handled automatically by the AIC VAD system. Use `AICFilter.create_vad_analyzer()` for VAD functionality instead.

- Package `pipecat.sync` is deprecated, use `pipecat.utils.sync` instead.

### Fixed

- Fixed a bug in `PatternPairAggregator` where pattern handlers could be called multiple times for `KEEP` or `AGGREGATE` patterns.

- Fixed sentence aggregation to correctly handle ambiguous punctuation in streaming text, such as currency ("$29.95") and abbreviations ("Mr. Smith").

- Fixed an issue in `AWSTranscribeSTTService` where the `region` arg was always set to `us-east-1` when providing an `AWS_REGION` env var.

- Fixed an issue in `SarvamTTSService` where the last sentence was not being spoken. Now, audio is flushed when the TTS service receives the `LLMFullResponseEndFrame` or `EndFrame`.

- Fixed an issue in `DeepgramTTSService` where a `TTSStoppedFrame` was incorrectly pushed after a function call. This caused an issue with the voice-ui-kit's conversational panel rendering of the LLM output after a function call.

- Fixed an issue where `LLMTextFrame.skip_tts` was being overwritten by LLM services.

- Fixed an issue that caused `WebsocketService` instances to attempt reconnection during shutdown.

- Fixed an issue in `ElevenLabsTTSService` where character usage metrics were only reported on the first TTS generation per turn.

*Released 2025-12-05*

## v0.0.96

## 🦃 Happy Thanksgiving! 🦃

### Added

- Added `AWSBedrockAgentCoreProcessor` to support invoking an AgentCore-hosted agent in a Pipecat pipeline.

- Enhanced error handling across the framework:

  - Added an `on_error` callback to `FrameProcessor` for centralized error handling.

  - Renamed `push_error(error: ErrorFrame)` to `push_error_frame(error: ErrorFrame)` for clarity.

  - Added a new `push_error` method for simplified error reporting:

    ```python
    async def push_error(error_msg: str,
                         exception: Optional[Exception] = None,
                         fatal: bool = False)
    ```

  - Standardized error logging by replacing `logger.exception` calls with `logger.error` throughout the codebase.

- Added `cache_read_input_tokens`, `cache_creation_input_tokens` and `reasoning_tokens` to OTel spans for LLM calls.

- Added the `LiveKitRESTHelper` utility class for managing LiveKit rooms via the REST API.

- Added `DeepgramSageMakerSTTService`, which connects to a SageMaker-hosted Deepgram STT model. Added the `07c-interruptible-deepgram-sagemaker.py` foundational example.

- Added `SageMakerBidiClient` to connect to SageMaker-hosted BiDi-compatible services.

- Added support for `include_timestamps` and `enable_logging` in `ElevenLabsRealtimeSTTService`. When `include_timestamps` is enabled, timestamp data is included in the `TranscriptionFrame`'s `result` parameter.

- Added optional speaking rate control to `InworldTTSService`.

- Introduced a new `AggregatedTextFrame` type to support passing text along with an `aggregated_by` field describing the type of text included. `TTSTextFrame`s now inherit from `AggregatedTextFrame`.
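The inheritance relationship just described can be sketched with simplified stand-in classes. These are hypothetical, stripped-down versions of the frames (the real Pipecat classes carry more fields), just to show how an observer can distinguish spoken from unspoken text:

```python
from dataclasses import dataclass

@dataclass
class AggregatedTextFrame:
    """Hypothetical stand-in: text plus how it was aggregated."""
    text: str
    aggregated_by: str = "sentence"

@dataclass
class TTSTextFrame(AggregatedTextFrame):
    """Hypothetical stand-in: aggregated text that a TTS actually spoke."""
    pass

def accumulate(frames):
    """Accumulate perceived bot output, noting whether each chunk was spoken."""
    return [(f.text, isinstance(f, TTSTextFrame)) for f in frames]
```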
With this inheritance, an observer can watch for `AggregatedTextFrame`s to accumulate the perceived output and determine whether the text was spoken based on whether that frame is also a `TTSTextFrame`.

  With this frame, the LLM token stream can be transformed into custom composable chunks, allowing for aggregation outside the TTS service. This makes it possible to listen for or handle those aggregations, and sets the stage for things like composing a best effort of the perceived LLM output in a more digestible form, whether or not a TTS processes it, or even exists in the pipeline.

- Introduced `LLMTextProcessor`: a new processor meant to allow customization of how `LLMTextFrame`s should be aggregated and considered. Its purpose is to turn `LLMTextFrame`s into `AggregatedTextFrame`s. By default, a TTSService will still aggregate `LLMTextFrame`s by sentence for the service to consume. However, if you wish to override how the LLM text is aggregated, you should no longer override the TTS's internal text_aggregator; instead, insert this processor between your LLM and TTS in the pipeline.

- New `bot-output` RTVI message to represent what the bot actually "says".

  - The `RTVIObserver` now emits `bot-output` messages based on the new `AggregatedTextFrame`s (`bot-tts-text` and `bot-llm-text` are still supported and generated, but `bot-transcript` is now deprecated in favor of this new, more thorough message).

  - The new `RTVIBotOutputMessage` includes the fields:

    - `spoken`: A boolean indicating whether the text was spoken by TTS

    - `aggregated_by`: A string representing how the text was aggregated ("sentence", "word", "my custom aggregation")

  - Introduced new fields to `RTVIObserver` to support the new `bot-output` messaging:

    - `bot_output_enabled`: Defaults to `True`.
Set to `False` to disable bot-output messages.

    - `skip_aggregator_types`: Defaults to `None`. Set to a list of strings matching aggregation types that should not be included in bot-output messages (e.g. `credit_card`).

  - Introduced new methods, `add_text_transformer()` and `remove_text_transformer()`, on `RTVIObserver` to support providing (and subsequently removing) callbacks for various types of aggregations (or all aggregations with `*`) that can modify the text before it is sent as a `bot-output` or `tts-text` message. (Think obscuring a credit card number, or inserting extra detail the client might want that the context doesn't need.)

- In `MiniMaxHttpTTSService`:

  - Added support for the speech-2.6-hd and speech-2.6-turbo models

  - Added languages: Afrikaans, Bulgarian, Catalan, Danish, Persian, Filipino, Hebrew, Croatian, Hungarian, Malay, Norwegian, Nynorsk, Slovak, Slovenian, Swedish, and Tamil

  - Added new emotions: calm and fluent

- Added `enable_logging` to `SimliVideoService` input parameters. It's disabled by default.

### Changed

- Updated the `FishAudioTTSService` default model to `s1`.

- Updated `DeepgramTTSService` to use Deepgram's TTS WebSocket API. ⚠️ This is a potential breaking change, which only affects you if you're self-hosting `DeepgramTTSService`.
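The transformer callbacks described above can be pictured with a small registry sketch. The method names mirror the changelog, but this implementation is a simplified assumption, not the actual `RTVIObserver` code:

```python
from typing import Callable, Dict, List

# Transformers are keyed by aggregation type ("sentence", "word", ..., or "*"
# for all) and may rewrite text before it is emitted, e.g. to mask sensitive
# content. This registry is a hypothetical stand-in for illustration.
Transformer = Callable[[str], str]

class TextTransformerRegistry:
    def __init__(self) -> None:
        self._transformers: Dict[str, List[Transformer]] = {}

    def add_text_transformer(self, aggregation_type: str, fn: Transformer) -> None:
        self._transformers.setdefault(aggregation_type, []).append(fn)

    def remove_text_transformer(self, aggregation_type: str, fn: Transformer) -> None:
        self._transformers.get(aggregation_type, []).remove(fn)

    def apply(self, aggregation_type: str, text: str) -> str:
        # Wildcard transformers run first, then type-specific ones.
        for fn in self._transformers.get("*", []) + self._transformers.get(aggregation_type, []):
            text = fn(text)
        return text
```

A callback registered for a `credit_card` aggregation could, for instance, replace the digits with asterisks before the text leaves the observer.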
The new service uses WebSockets and improves TTFB latency.

- Updated `daily-python` to 0.22.0.

- `BaseTextAggrega

*Released 2025-11-27*

## v0.0.95

### Added

- Added ai-coustics integrated VAD (`AICVADAnalyzer`) with an `AICFilter` factory and example wiring; it leverages the enhancement model for robust detection with no ONNX dependency or added processing complexity.

- Added a watchdog to `DeepgramFluxSTTService` to prevent dangling tasks in case the user was speaking and we stop receiving audio.

- Introduced a minimum confidence parameter in `DeepgramFluxSTTService` to avoid generating transcriptions below a defined threshold.

- Added `ElevenLabsRealtimeSTTService`, which implements the Realtime STT service from ElevenLabs.

- Added word-level timestamps support to the Hume TTS service.

### Changed

- ⚠️ Breaking change: `LLMContext.create_image_message()`, `LLMContext.create_audio_message()`, `LLMContext.add_image_frame_message()` and `LLMContext.add_audio_frames_message()` are now async methods. This fixes an issue where the asyncio event loop would be blocked while encoding audio or images.

- `ConsumerProcessor` now queues frames from the producer internally instead of pushing them directly.
This allows us to subclass consumer processors and manipulate frames before they are pushed.

- `BaseTextFilter` now only requires subclasses to implement the `filter()` method.

- Extracted the connection retry logic into a new `send_with_retry` method inside `WebSocketService`.

- Refactored `DeepgramFluxSTTService` to automatically reconnect if sending a message fails.

- Updated all STT and TTS services to use a consistent error handling pattern with the `push_error()` method for better pipeline error event integration.

- Added support for `maybe_capture_participant_camera()` and `maybe_capture_participant_screen()` for `SmallWebRTCTransport` in the runner utils.

- Added Hindi support for the Rime TTS services.

- Updated `GeminiTTSService` to use the Google Cloud Text-to-Speech streaming API instead of the deprecated Gemini API. It now uses `credentials` / `credentials_path` for authentication; the `api_key` parameter is deprecated. Also added support for a `prompt` parameter for style instructions and expressive markup tags. Latency is significantly improved with streaming synthesis.

- Updated language mappings for the Google and Gemini TTS services to match the official documentation.

### Deprecated

- The `api_key` parameter in `GeminiTTSService` is deprecated.
Use `credentials` or `credentials_path` instead for Google Cloud authentication.

### Fixed

- Fixed a `SimliVideoService` connection issue.

- Fixed an issue in the `Runner` where, when using `SmallWebRTCTransport`, the `request_data` was not being passed to the `SmallWebRTCRunnerArguments` body.

- Fixed a subtle issue of assistant context messages ending up with double spaces between words or sentences.

- Fixed an issue where `NeuphonicTTSService` wasn't pushing `TTSTextFrame`s, meaning assistant messages weren't being written to context.

- Fixed an issue with OpenTelemetry where tracing wasn't correctly displaying LLM completions and tools when using the universal `LLMContext`.

- Fixed an issue where `DeepgramFluxSTTService` failed to connect if passing a `keyterm` or `tag` containing a space.

- Prevented `HeyGenVideoService` from automatically disconnecting after 5 minutes.

*Released 2025-11-19*

## v0.0.94

### Deprecated

- The `KrispFilter` is deprecated and will be removed in a future version. Use the `KrispVivaFilter` instead.

### Removed

- `LivekitFrameSerializer` has been removed.
Use `LiveKitTransport` instead.

### Fixed

- Fixed a bug related to `LLMAssistantAggregator` where spaces were sometimes missing from assistant messages in context.

*Released 2025-11-10*

## v0.0.93

### Added

- Added support for the Sarvam speech-to-text service (`SarvamSTTService`) with streaming WebSocket support for the `saarika` (STT) and `saaras` (STT-translate) models.

- Added support for passing in a `ToolsSchema` in lieu of a list of provider-specific dicts when initializing `OpenAIRealtimeLLMService` or when updating it using `LLMUpdateSettingsFrame`.

- Added `TransportParams.audio_out_silence_secs`, which specifies how many seconds of silence to output when an `EndFrame` reaches the output transport. This can help ensure that all audio data is fully delivered to clients.

- Added a new `FrameProcessor.broadcast_frame()` method. This will push two instances of a given frame class, one upstream and the other downstream.

  ```python
  await self.broadcast_frame(UserSpeakingFrame)
  ```

- Added `MetricsLogObserver` for logging performance metrics from `MetricsFrame` instances. Supports filtering via the `include_metrics` parameter to control which metric types are logged (TTFB, processing time, LLM token usage, TTS usage, smart turn metrics).

- Added `pronunciation_dictionary_locators` to `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`.

- Added support for loading external observers. You can now register custom pipeline observers by setting the `PIPECAT_OBSERVER_FILES` environment variable. This variable should contain a colon-separated list of Python files (e.g. `export PIPECAT_OBSERVER_FILES="observer1.py:observer2.py:..."`).
Each file must define a function with the following signature:

  ```python
  async def create_observers(task: PipelineTask) -> Iterable[BaseObserver]:
      ...
  ```

- Added support for new sonic-3 languages in `CartesiaTTSService` and `CartesiaHttpTTSService`.

- `EndFrame` and `EndTaskFrame` have an optional `reason` field to indicate why the pipeline is being ended.

- `CancelFrame` and `CancelTaskFrame` have an optional `reason` field to indicate why the pipeline is being canceled. This can also be specified when you cancel a task with `PipelineTask.cancel(reason="cancellation reason")`.

- Added the `include_prob_metrics` parameter to Whisper STT services to enable access to probability metrics from transcription results.

- Added utility functions `extract_whisper_probability()`, `extract_openai_gpt4o_probability()`, and `extract_deepgram_probability()` to extract probability metrics from `TranscriptionFrame` objects for Whisper-based, OpenAI GPT-4o-transcribe, and Deepgram STT services respectively.

- Added `LLMSwitcher.register_direct_function()`. It works much like `LLMSwitcher.register_function()` in that it's a shorthand for registering a function on all LLMs in the switcher, except this new method takes a direct function (a `FunctionSchema`-less function).

- Added `MCPClient.get_tools_schema()` and `MCPClient.register_tools_schema()` as a two-step alternative to `MCPClient.register_tools()`, to allow users to pass MCP tools to, say, `GeminiLiveLLMService` (as well as other speech-to-speech services) in the constructor.

- Added support for passing an `LLMSwitcher` to `MCPClient.register_tools()` (as well as the new `MCPClient.register_tools_schema()`).

- Added a `cpu_count` parameter to `LocalSmartTurnAnalyzerV3`. This is set to `1` by default for more predictable performance on low-CPU systems.

### Changed

- Improved `concatenate_aggregated_text()` to handle one-word outputs from OpenAI Realtime and Gemini Live. Text fragments are now correctly concatenated without spaces when these patterns are detected.

- `STTMuteFilter` no longer sends `STTMuteFrame` to the STT service. The filter now blocks frames locally without instructing the STT service to stop processing audio. This prevents inactivity-related errors (such as 409 errors from Google STT) while maintaining the same muting behavior at the application level. Important: the `STTMuteFilter` should be placed _after_ the STT service itself.

- Improved `GoogleSTTService` error handling to properly catch gRPC `Aborted` exceptions (corresponding to 409 errors) caused by stream inactivity. These exceptions are now logged at DEBUG level instead of ERROR level, since they indicate expected behavior when no audio is sent for 10+ seconds (e.g., during long silences or when audio input is blocked).
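The local muting behavior described above for `STTMuteFilter` can be illustrated with a toy filter: while muted, audio frames are dropped inside the filter rather than telling the STT service to stop. The class and frame shapes are simplified stand-ins, not Pipecat's actual API:

```python
class MuteFilter:
    """Hypothetical sketch: drop audio frames locally while muted."""

    def __init__(self) -> None:
        self.muted = False

    def process(self, frame: dict):
        """Return the frame to pass along, or None to drop it while muted."""
        if self.muted and frame.get("type") == "audio":
            return None  # blocked locally; the STT service is never told to stop
        return frame
```

Because the upstream service keeps receiving a live (if silent) stream, it never hits the inactivity errors that an explicit mute instruction could trigger.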
The service automatically reconnects when this occurs.

- Bumped the `fastapi` dependency's upper bound to `<0.122.0`.

- Updated the default model for `GoogleVertexLLMService` to `gemini-2.5-flash`.

- Updated the `GoogleVertexLLMService` to use the `GoogleLLMService` as a base class instead of the `OpenAILLMService`.

- Updated STT and TTS services to pass through unverified language codes with a warning instead of returning None. This allows developers to u

## [0.0.92] - 2025-10-31

## :jack_o_lantern: The Haunted Edition :ghost:

### Added

- Added a new `DeepgramHttpTTSService`, which delivers a meaningful reduction in latency when compared to the `DeepgramTTSService`.

- Added support for the `speaking_rate` input parameter in `GoogleHttpTTSService`.

- Added `enable_speaker_diarization` and `enable_language_identification` to `SonioxSTTService`.

- Added `SpeechmaticsTTSService`, which uses Speechmatics' TTS API. Updated examples 07a\* to use the new TTS service.

- Added support for including images or audio in LLM context messages using `LLMContext.create_image_message()` or `LLMContext.create_image_url_message()` (not all LLMs support URLs) and `LLMContext.create_audio_message()`. For example, when creating `LLMMessagesAppendFrame`:

  ```python
  message = LLMContext.create_image_message(image=..., size=...)
  await self.push_frame(LLMMessagesAppendFrame(messages=[message], run_llm=True))
  ```

- New event handlers for the `DeepgramFluxSTTService`: `on_start_of_turn`, `on_turn_resumed`, `on_end_of_turn`, `on_eager_end_of_turn`, `on_update`.

- Added `generation_config` parameter support to `CartesiaTTSService` and `CartesiaHttpTTSService` for Cartesia Sonic-3 models. 
Includes a new `GenerationConfig` class with `volume` (0.5-2.0), `speed` (0.6-1.5), and `emotion` (60+ options) parameters for fine-grained speech generation control.

- Expanded support for the universal `LLMContext` to `OpenAIRealtimeLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `OpenAIRealtimeLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime with `LLMSwitcher`.)

  Note: `TranscriptionFrame`s and `InterimTranscriptionFrame`s now go upstream from `OpenAIRealtimeLLMService`, so if you're using `TranscriptProcessor`, say, you'll want to adjust accordingly:

  ```python
  pipeline = Pipeline(
    [
      transport.input(),
      context_aggregator.user(),

      # BEFORE
      llm,
      transcript.user(),

      # AFTER
      transcript.user(),
      llm,

      transport.output(),
      transcript.assistant(),
      context_aggregator.assistant(),
    ]
  )
  ```

  Also worth noting: whether or not you use the new context-setup pattern with `OpenAIRealtimeLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: OpenAIContextAggregatorPair

  # Context frame type
  frame: OpenAILLMContextFrame

  # Context type
  context: OpenAIRealtimeLLMContext
  # or
  context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair

  # Context frame type
  frame: LLMContextFrame

  # Context type
  context: LLMContext
  ```

  Also note that `RealtimeMessagesUpdateFrame` and `RealtimeFunctionCallResultFrame` have been deprecated, 
since they're no longer used by `OpenAIRealtimeLLMService`. OpenAI Realtime now works more like other LLM services in Pipecat, relying on updates to its context, pushed by context aggregators, to update its internal state. Listen for `LLMContextFrame`s for context updates.

  Finally, `LLMTextFrame`s are no longer pushed from `OpenAIRealtimeLLMService` when it's configured with `output_modalities=['audio']`. If you need to process its output, listen for `TTSTextFrame`s instead.

- Expanded support for the universal `LLMContext` to `GeminiLiveLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `GeminiLiveLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime with `LLMSwitcher`.)

  Worth noting: whether or not you use the new context-setup pattern with `GeminiLiveLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: GeminiLiveContextAggregatorPair

  # Context frame type
  frame: OpenAILLMContextFrame

  # Context type
  context: GeminiLiveLLMContext
  # or
  context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair

  # Context frame type
  frame: LLMContextFrame

  # Context type
  context: LLMContext
  ```

  Also note that `LLMTextFrame`s are no longer pushed from `GeminiLiveLLMService` when it's configured with `modalities=GeminiModalities.AUDIO`. 
If you need to process its output, listen for `TTSTextFrame`s instead.

### Changed

- The development runner's `/start` endpoint now supports passing `dailyRoomProperties` and `dailyMeetingTokenProperties`

## [0.0.91] - 2025-10-22

### Added

- It is now possible to start a bot from the `/start` endpoint when using the runner's Daily transport. This follows the Pipecat Cloud format with `createDailyRoom` and `body` fields in the POST request body.

- Added an ellipsis character (`…`) to end-of-sentence detection in the string utils.

- Expanded support for the universal `LLMContext` to `AWSNovaSonicLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `AWSNovaSonicLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime.)

  Worth noting: whether or not you use the new context-setup pattern with `AWSNovaSonicLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: AWSNovaSonicContextAggregatorPair

  # Context frame type
  frame: OpenAILLMContextFrame

  # Context type
  context: AWSNovaSonicLLMContext
  # or
  context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair

  # Context frame type
  frame: LLMContextFrame

  # Context type
  context: LLMContext
  ```

- Added support for the `bulbul:v3` model in `SarvamTTSService` and `SarvamHttpTTSService`.

- Added `keyterms_prompt` parameter to `AssemblyAIConnectionParams`.

- Added `speech_model` parameter to 
`AssemblyAIConnectionParams` to access the multilingual model.

- Added support for trickle ICE to the `SmallWebRTCTransport`.

- Added support for updating `OpenAITTSService` settings (`instructions` and `speed`) at runtime via `TTSUpdateSettingsFrame`.

- Added `--whatsapp` flag to the runner to better surface WhatsApp transport logs.

- Added `on_connected` and `on_disconnected` events to websocket-based TTS and STT services.

- Added an `aggregate_sentences` arg to `ElevenLabsHttpTTSService`, which defaults to `True`.

- Added a `room_properties` arg to the Daily runner's `configure()` method, allowing `DailyRoomProperties` to be provided.

- The runner `--folder` argument now supports downloading files from subdirectories.

### Changed

- `RunnerArguments` now includes the `body` field, so there's no need to add it to subclasses. Also, all `RunnerArguments` fields are now keyword-only.

- `CartesiaSTTService` now inherits from `WebsocketSTTService`.

- Package upgrades:

  - `daily-python` upgraded to 0.20.0.
  - `openai` upgraded to support up to 2.x.x.
  - `openpipe` upgraded to support up to 5.x.x.

- `SpeechmaticsSTTService` updated dependencies to `speechmatics-rt>=0.5.0`.

### Deprecated

- The `send_transcription_frames` argument to `AWSNovaSonicLLMService` is deprecated. Transcription frames are now always sent. They go upstream, to be handled by the user context aggregator. See the "Added" section for details.

- Types in `pipecat.services.aws.nova_sonic.context` have been deprecated due to changes to support `LLMContext`. 
See the "Changed" section for details.

### Fixed

- Fixed an issue where the `RTVIProcessor` was sending duplicate `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` messages.

- Fixed an issue in `AWSBedrockLLMService` where both `temperature` and `top_p` were always sent together, causing conflicts with models like Claude Sonnet 4.5 that don't allow both parameters simultaneously. The service now only includes inference parameters that are explicitly set, and `InputParams` defaults have been changed to `None` to rely on AWS Bedrock's built-in model defaults.

- Fixed an issue in `RivaSegmentedSTTService` where a runtime error occurred due to a mismatch in the `_handle_transcription` method's signature.

- Fixed multiple pipeline task cancellation issues. `asyncio.CancelledError` is now handled properly in `PipelineTask`, making it possible to cleanly cancel an asyncio task that is executing a `PipelineRunner`. Also, `PipelineTask.cancel()` no longer blocks waiting for the `CancelFrame` to reach the end of the pipeline (going back to the behavior in < 0.0.83).

- Fixed an issue in `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` where the Flash models would split words, resulting in a space being inserted between words.

- Fixed an issue where audio filters' `stop()` would not be called when using `CancelFrame`.

- Fixed an issue in `ElevenLabsHttpTTSService` where `apply_text_normalization` was incorrectly set as a query parameter. 
It's now being added as a request parameter.

- Fixed an issue where `RimeHttpTTSService` and `PiperTTSService` could generate audio frames that were not 16-bit aligned, potentially leading to internal errors or static audio.

- Fixed an issue in `SpeechmaticsSTTService` where `AdditionalVocabEntry` items needed to have `sounds_like` for the session to

## [0.0.90] - 2025-10-10

### Added

- Added audio filter `KrispVivaFilter` using the Krisp VIVA SDK.

- Added `--folder` argument to the runner, allowing files saved in that folder to be downloaded from `http://HOST:PORT/file/FILE`.

- Added `GeminiLiveVertexLLMService`, for accessing Gemini Live via Google Vertex AI.

- Added some new configuration options to `GeminiLiveLLMService`:

  - `thinking`
  - `enable_affective_dialog`
  - `proactivity`

  Note that these new configuration options require using a newer model than the default, like "gemini-2.5-flash-native-audio-preview-09-2025". The last two require specifying `http_options=HttpOptions(api_version="v1alpha")`.

- Added `on_pipeline_error` event to `PipelineTask`. This event will get fired when an `ErrorFrame` is pushed (use `FrameProcessor.push_error()`).

  ```python
  @task.event_handler("on_pipeline_error")
  async def on_pipeline_error(task: PipelineTask, frame: ErrorFrame):
      ...
  ```

- Added a `service_tier` `InputParam` to the `BaseOpenAILLMService`. This parameter can influence the latency of the response. For example, `"priority"` will result in faster completions, in exchange for a higher price.

### Changed

- Updated `GeminiLiveLLMService` to use the `google-genai` library rather than using WebSockets directly.

### Deprecated

- `LivekitFrameSerializer` is now deprecated. 
Use `LiveKitTransport` instead.

- `pipecat.service.openai_realtime` is now deprecated; use `pipecat.services.openai.realtime` instead, or `pipecat.services.azure.realtime` for Azure Realtime.

- `pipecat.service.aws_nova_sonic` is now deprecated; use `pipecat.services.aws.nova_sonic` instead.

- `GeminiMultimodalLiveLLMService` is now deprecated; use `GeminiLiveLLMService` instead.

### Fixed

- Fixed a `GoogleVertexLLMService` issue that would generate an error if no token information was returned.

- `GeminiLiveLLMService` will now end gracefully (i.e. after the bot has finished) upon receiving an `EndFrame`.

- `GeminiLiveLLMService` will try to seamlessly reconnect when it loses its connection.

## [0.0.89] - 2025-10-08

### Fixed

- Reverted a change introduced in 0.0.88 that was causing pipelines to freeze when using interruption strategies and processors that block interruption frames (e.g. `STTMuteFilter`).
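As background for the 16-bit alignment fix noted under 0.0.91 above: 16-bit PCM requires every audio buffer to contain an even number of bytes, otherwise downstream consumers read sample boundaries off by one byte, which typically surfaces as static or internal errors. The following is a minimal, self-contained sketch of the kind of guard involved; the `align_pcm16` helper and its carry-over approach are illustrative only, not Pipecat's actual implementation:

```python
def align_pcm16(chunk: bytes, remainder: bytearray) -> bytes:
    """Return a 16-bit-aligned buffer (illustrative helper).

    Any odd trailing byte is held back in `remainder` and prepended
    to the next chunk, so no audio data is dropped.
    """
    data = bytes(remainder) + chunk
    remainder.clear()
    if len(data) % 2:          # odd length: carry the last byte over
        remainder.append(data[-1])
        data = data[:-1]
    return data
```

Feeding two 3-byte chunks in sequence yields a 2-byte buffer first (one byte carried over), then a 4-byte buffer that starts with the carried byte.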