[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-pytorch--audio":3,"tool-pytorch--audio":62},[4,18,28,37,45,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":24,"last_commit_at":25,"category_tags":26,"status":17},9989,"n8n","n8n-io\u002Fn8n","n8n 是一款面向技术团队的公平代码（fair-code）工作流自动化平台，旨在让用户在享受低代码快速构建便利的同时，保留编写自定义代码的灵活性。它主要解决了传统自动化工具要么过于封闭难以扩展、要么完全依赖手写代码效率低下的痛点，帮助用户轻松连接 400 多种应用与服务，实现复杂业务流程的自动化。\n\nn8n 特别适合开发者、工程师以及具备一定技术背景的业务人员使用。其核心亮点在于“按需编码”：既可以通过直观的可视化界面拖拽节点搭建流程，也能随时插入 JavaScript 或 Python 代码、调用 npm 包来处理复杂逻辑。此外，n8n 原生集成了基于 LangChain 的 AI 能力，支持用户利用自有数据和模型构建智能体工作流。在部署方面，n8n 提供极高的自由度，支持完全自托管以保障数据隐私和控制权，也提供云端服务选项。凭借活跃的社区生态和数百个现成模板，n8n 让构建强大且可控的自动化系统变得简单高效。",184740,2,"2026-04-19T23:22:26",[16,14,13,15,27],"插件",{"id":29,"name":30,"github_repo":31,"description_zh":32,"stars":33,"difficulty_score":10,"last_commit_at":34,"category_tags":35,"status":17},10095,"AutoGPT","Significant-Gravitas\u002FAutoGPT","AutoGPT 是一个旨在让每个人都能轻松使用和构建 AI 的强大平台，核心功能是帮助用户创建、部署和管理能够自动执行复杂任务的连续型 AI 智能体。它解决了传统 AI 应用中需要频繁人工干预、难以自动化长流程工作的痛点，让用户只需设定目标，AI 即可自主规划步骤、调用工具并持续运行直至完成任务。\n\n无论是开发者、研究人员，还是希望提升工作效率的普通用户，都能从 AutoGPT 
中受益。开发者可利用其低代码界面快速定制专属智能体；研究人员能基于开源架构探索多智能体协作机制；而非技术背景用户也可直接选用预置的智能体模板，立即投入实际工作场景。\n\nAutoGPT 的技术亮点在于其模块化“积木式”工作流设计——用户通过连接功能块即可构建复杂逻辑，每个块负责单一动作，灵活且易于调试。同时，平台支持本地自托管与云端部署两种模式，兼顾数据隐私与使用便捷性。配合完善的文档和一键安装脚本，即使是初次接触的用户也能在几分钟内启动自己的第一个 AI 智能体。AutoGPT 正致力于降低 AI 应用门槛，让人人都能成为 AI 的创造者与受益者。",183572,"2026-04-20T04:47:55",[13,36,27,14,15],"语言模型",{"id":38,"name":39,"github_repo":40,"description_zh":41,"stars":42,"difficulty_score":10,"last_commit_at":43,"category_tags":44,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":46,"name":47,"github_repo":48,"description_zh":49,"stars":50,"difficulty_score":24,"last_commit_at":51,"category_tags":52,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",161147,"2026-04-19T23:31:47",[14,13,36],{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":59,"last_commit_at":60,"category_tags":61,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 
AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,27],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":104,"forks":105,"last_commit_at":106,"license":107,"difficulty_score":24,"env_os":108,"env_gpu":109,"env_ram":108,"env_deps":110,"category_tags":114,"github_topics":118,"view_count":24,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":124,"updated_at":125,"faqs":126,"releases":156},9993,"pytorch\u002Faudio","audio","Data manipulation and transformation for audio signal processing, powered by PyTorch","torchaudio 是专为 PyTorch 打造的音频数据处理库，致力于将深度学习能力延伸至音频与语音领域。它主要解决了在机器学习工作流中高效加载、转换和处理音频信号的难题，让开发者无需在不同库之间切换，即可在一个统一的框架内完成从数据预处理到模型训练的全过程。\n\n这款工具非常适合 AI 研究人员、深度学习工程师以及需要构建音频相关模型的开发者使用。无论是处理常见的语音数据集，还是进行复杂的声学特征提取，torchaudio 都能提供流畅的支持。其核心亮点在于深度集成 PyTorch 生态：所有计算均基于 PyTorch 张量操作，不仅天然支持强大的 GPU 加速，还能利用自动求导系统实现端到端的可训练音频变换。库内置了丰富的功能模块，涵盖梅尔频谱图（MelSpectrogram）、MFCC 提取、重采样等常用变换，并提供与 Kaldi 等专业工具兼容的接口，确保实验结果的一致性与复现性。目前，torchaudio 已进入维护阶段，更加聚焦于为机器学习任务提供精简、高效的音频数据处理核心能力，是构建现代语音识别与音频分析系统的理想基石。","torchaudio: an audio library for 
PyTorch\n========================================\n\n[![Documentation](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdynamic\u002Fjson.svg?label=docs&url=https%3A%2F%2Fpypi.org%2Fpypi%2Ftorchaudio%2Fjson&query=%24.info.version&colorB=brightgreen&prefix=v)](https:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002F)\n[![Anaconda Badge](https:\u002F\u002Fanaconda.org\u002Fpytorch\u002Ftorchaudio\u002Fbadges\u002Fdownloads.svg)](https:\u002F\u002Fanaconda.org\u002Fpytorch\u002Ftorchaudio)\n[![Anaconda-Server Badge](https:\u002F\u002Fanaconda.org\u002Fpytorch\u002Ftorchaudio\u002Fbadges\u002Fplatforms.svg)](https:\u002F\u002Fanaconda.org\u002Fpytorch\u002Ftorchaudio)\n\n![TorchAudio Logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpytorch_audio_readme_59db8eb3d9a2.png)\n\n> [!NOTE]\n> **We have transitioned TorchAudio into a\n>  maintenance phase. This process removed some user-facing\n>  features. These features were deprecated from TorchAudio 2.8 and removed in 2.9.\n>  Our main goals were to reduce redundancies with the rest of the\n>  PyTorch ecosystem, make it easier to maintain, and create a version of\n>  TorchAudio that is more tightly scoped to its strengths: processing audio\n>  data for ML. Please see\n>  [our community message](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fissues\u002F3902)\n>  for more details.**\n\nThe aim of torchaudio is to apply [PyTorch](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fpytorch) to\nthe audio domain. By supporting PyTorch, torchaudio follows the same philosophy\nof providing strong GPU acceleration, having a focus on trainable features through\nthe autograd system, and having consistent style (tensor names and dimension names).\nTherefore, it is primarily a machine learning library and not a general signal\nprocessing library. 
The benefits of PyTorch can be seen in torchaudio through\nhaving all the computations be through PyTorch operations, which makes it easy\nto use and feel like a natural extension.\n\n- [Dataloaders for common audio datasets](http:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002Fdatasets.html)\n- Audio and speech processing functions\n  - [forced_align](https:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002Fgenerated\u002Ftorchaudio.functional.forced_align.html)\n- Common audio transforms\n  - [Spectrogram, AmplitudeToDB, MelScale, MelSpectrogram, MFCC, MuLawEncoding, MuLawDecoding, Resample](http:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002Ftransforms.html)\n- Compliance interfaces: Run code using PyTorch that aligns with other libraries\n  - [Kaldi: spectrogram, fbank, mfcc](https:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002Fcompliance.kaldi.html)\n\nInstallation\n------------\n\nPlease refer to https:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002Finstallation.html for the installation and build process of TorchAudio.\n\n\nAPI Reference\n-------------\n\nAPI Reference is located here: http:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002F\n\nContributing Guidelines\n-----------------------\n\nPlease refer to [CONTRIBUTING.md](.\u002FCONTRIBUTING.md)\n\nCitation\n--------\n\nIf you find this package useful, please cite as:\n\n```bibtex\n@article{yang2021torchaudio,\n  title={TorchAudio: Building Blocks for Audio and Speech Processing},\n  author={Yao-Yuan Yang and Moto Hira and Zhaoheng Ni and Anjali Chourdia and Artyom Astafurov and Caroline Chen and Ching-Feng Yeh and Christian Puhrsch and David Pollack and Dmitriy Genzel and Donny Greenberg and Edward Z. 
Yang and Jason Lian and Jay Mahadeokar and Jeff Hwang and Ji Chen and Peter Goldsborough and Prabhat Roy and Sean Narenthiran and Shinji Watanabe and Soumith Chintala and Vincent Quenneville-Bélair and Yangyang Shi},\n  journal={arXiv preprint arXiv:2110.15018},\n  year={2021}\n}\n```\n\n```bibtex\n@misc{hwang2023torchaudio,\n      title={TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch},\n      author={Jeff Hwang and Moto Hira and Caroline Chen and Xiaohui Zhang and Zhaoheng Ni and Guangzhi Sun and Pingchuan Ma and Ruizhe Huang and Vineel Pratap and Yuekai Zhang and Anurag Kumar and Chin-Yun Yu and Chuang Zhu and Chunxi Liu and Jacob Kahn and Mirco Ravanelli and Peng Sun and Shinji Watanabe and Yangyang Shi and Yumeng Tao and Robin Scheibler and Samuele Cornell and Sean Kim and Stavros Petridis},\n      year={2023},\n      eprint={2310.17864},\n      archivePrefix={arXiv},\n      primaryClass={eess.AS}\n}\n```\n\nDisclaimer on Datasets\n----------------------\n\nThis is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.\n\nIf you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!\n\nPre-trained Model License\n-------------------------\n\nThe pre-trained models provided in this library may have their own licenses or terms and conditions derived from the dataset used for training. 
It is your responsibility to determine whether you have permission to use the models for your use case.\n\nFor instance, the SquimSubjective model is released under the Creative Commons Attribution Non Commercial 4.0 International (CC-BY-NC 4.0) license. See [the link](https:\u002F\u002Fzenodo.org\u002Frecord\u002F4660670#.ZBtWPOxuerN) for additional details.\n\nOther pre-trained models that have different licenses are noted in the documentation. Please check out the [documentation page](https:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002F).\n","torchaudio：PyTorch 的音频库\n========================================\n\n[![文档](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdynamic\u002Fjson.svg?label=docs&url=https%3A%2F%2Fpypi.org%2Fpypi%2Ftorchaudio%2Fjson&query=%24.info.version&colorB=brightgreen&prefix=v)](https:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002F)\n[![Anaconda 徽章](https:\u002F\u002Fanaconda.org\u002Fpytorch\u002Ftorchaudio\u002Fbadges\u002Fdownloads.svg)](https:\u002F\u002Fanaconda.org\u002Fpytorch\u002Ftorchaudio)\n[![Anaconda 服务器徽章](https:\u002F\u002Fanaconda.org\u002Fpytorch\u002Ftorchaudio\u002Fbadges\u002Fplatforms.svg)](https:\u002F\u002Fanaconda.org\u002Fpytorch\u002Ftorchaudio)\n\n![TorchAudio Logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpytorch_audio_readme_59db8eb3d9a2.png)\n\n> [!NOTE]\n> **我们已将 TorchAudio 转入维护阶段。在此过程中，移除了部分面向用户的特性。这些特性自 TorchAudio 2.8 起已被弃用，并在 2.9 中彻底移除。我们的主要目标是减少与 PyTorch 生态其他部分的冗余，简化维护工作，并打造一个更专注于其优势——为机器学习处理音频数据——的 TorchAudio 版本。更多详情请参阅\n> [我们的社区公告](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fissues\u002F3902)。**\n\ntorchaudio 的目标是将 [PyTorch](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fpytorch) 应用于音频领域。通过支持 PyTorch，torchaudio 坚持相同的理念：提供强大的 GPU 加速、通过 autograd 系统聚焦可训练特征，并保持一致的风格（张量名称和维度名称）。因此，它主要是一个机器学习库，而非通用信号处理库。PyTorch 的优势在 torchaudio 中得以体现，所有计算都通过 PyTorch 操作完成，这使得使用起来非常方便，仿佛是 PyTorch 的自然扩展。\n\n- [常用音频数据集的数据加载器](http:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002Fdatasets.html)\n- 
音频和语音处理函数\n  - [强制对齐](https:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002Fgenerated\u002Ftorchaudio.functional.forced_align.html)\n- 常用音频变换\n  - [频谱图、幅度转分贝、梅尔尺度、梅尔频谱图、MFCC、μ律编码、μ律解码、重采样](http:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002Ftransforms.html)\n- 兼容接口：使用 PyTorch 编写与其他库兼容的代码\n  - [Kaldi：频谱图、FBank、MFCC](https:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002Fcompliance.kaldi.html)\n\n安装\n------------\n\n请参阅 https:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002Finstallation.html 以获取 TorchAudio 的安装和构建流程。\n\n\nAPI 参考\n-------------\n\nAPI 参考位于此处：http:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002F\n\n贡献指南\n-----------------------\n\n请参阅 [CONTRIBUTING.md](.\u002FCONTRIBUTING.md)\n\n引用\n--------\n\n如果您觉得本包有用，请按以下方式引用：\n\n```bibtex\n@article{yang2021torchaudio,\n  title={TorchAudio: Building Blocks for Audio and Speech Processing},\n  author={Yao-Yuan Yang and Moto Hira and Zhaoheng Ni and Anjali Chourdia and Artyom Astafurov and Caroline Chen and Ching-Feng Yeh and Christian Puhrsch and David Pollack and Dmitriy Genzel and Donny Greenberg and Edward Z. 
Yang and Jason Lian and Jay Mahadeokar and Jeff Hwang and Ji Chen and Peter Goldsborough and Prabhat Roy and Sean Narenthiran and Shinji Watanabe and Soumith Chintala and Vincent Quenneville-Bélair and Yangyang Shi},\n  journal={arXiv preprint arXiv:2110.15018},\n  year={2021}\n}\n```\n\n```bibtex\n@misc{hwang2023torchaudio,\n      title={TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch},\n      author={Jeff Hwang and Moto Hira and Caroline Chen and Xiaohui Zhang and Zhaoheng Ni and Guangzhi Sun and Pingchuan Ma and Ruizhe Huang and Vineel Pratap and Yuekai Zhang and Anurag Kumar and Chin-Yun Yu and Chuang Zhu and Chunxi Liu and Jacob Kahn and Mirco Ravanelli and Peng Sun and Shinji Watanabe and Yangyang Shi and Yumeng Tao and Robin Scheibler and Samuele Cornell and Sean Kim and Stavros Petridis},\n      year={2023},\n      eprint={2310.17864},\n      archivePrefix={arXiv},\n      primaryClass={eess.AS}\n}\n```\n\n数据集免责声明\n----------------------\n\n这是一个用于下载和准备公开数据集的工具库。我们不托管或分发这些数据集，也不对其质量或公平性作出保证，更不声称您拥有使用这些数据集的许可。您有责任根据数据集的许可协议确定自己是否有权使用该数据集。\n\n如果您是数据集的所有者，并希望更新其中的任何部分（描述、引用等），或者不希望您的数据集被包含在本库中，请通过 GitHub 问题与我们联系。感谢您对机器学习社区的贡献！\n\n预训练模型许可\n-------------------------\n\n本库中提供的预训练模型可能具有各自的许可或由训练所用数据集衍生的条款和条件。您有责任确定自己是否拥有针对特定用途使用这些模型的许可。\n\n例如，SquimSubjective 模型采用知识共享署名非商业性使用 4.0 国际许可协议（CC-BY-NC 4.0）发布。更多详细信息请参阅 [此链接](https:\u002F\u002Fzenodo.org\u002Frecord\u002F4660670#.ZBtWPOxuerN)。\n\n其他具有不同许可的预训练模型已在文档中注明。请查阅 [文档页面](https:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002F)。","# Torchaudio 快速上手指南\n\nTorchaudio 是 PyTorch 生态中专为音频和语音处理设计的库。它利用 PyTorch 的 GPU 加速能力和自动求导机制，专注于为机器学习任务提供高效的音频数据处理功能（如频谱图提取、重采样、特征工程等），而非通用的信号处理库。\n\n> **注意**：Torchaudio 目前已进入维护阶段（Maintenance Phase）。部分功能在 2.8 版本弃用并在 2.9 版本移除，旨在精简库结构，更聚焦于机器学习所需的音频数据处理核心能力。\n\n## 1. 
环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux, macOS, 或 Windows\n*   **Python**：建议 Python 3.8 及以上版本\n*   **前置依赖**：必须先安装 **PyTorch**。Torchaudio 的版本需与已安装的 PyTorch 版本兼容。\n*   **硬件**：可选配 NVIDIA GPU 以加速音频特征计算（需安装 CUDA 版本的 PyTorch）。\n\n## 2. 安装步骤\n\n推荐优先使用 PyTorch 官方提供的安装命令，该命令会自动匹配适合您当前环境的版本。\n\n### 方式一：使用 pip 安装（推荐）\n\n访问 [PyTorch 官网安装页面](https:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002Finstallation.html) 获取最新命令。通常情况下，如果您已安装 PyTorch，可直接运行：\n\n```bash\npip install torchaudio\n```\n\n**国内加速方案**：\n如果您在中国大陆地区，建议使用清华大学或阿里云镜像源以加快下载速度：\n\n```bash\npip install torchaudio -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 方式二：使用 Conda 安装\n\n如果您使用 Anaconda 或 Miniconda 管理环境：\n\n```bash\nconda install -c pytorch torchaudio\n```\n\n**国内加速方案**：\n配置清华源后执行：\n\n```bash\nconda install -c https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fcloud\u002Fpytorch torchaudio\n```\n\n## 3. 基本使用\n\nTorchaudio 的核心功能是加载音频文件并将其转换为 Tensor，以便直接输入到 PyTorch 模型中。以下是最简单的加载与变换示例。\n\n### 示例：加载音频并提取梅尔频谱图\n\n```python\nimport torch\nimport torchaudio\n\n# 1. 加载音频文件\n# waveform: 音频波形数据 (Tensor)\n# sample_rate: 采样率 (int)\nwaveform, sample_rate = torchaudio.load(\"example.wav\")\n\nprint(f\"波形形状：{waveform.shape}, 采样率：{sample_rate}\")\n\n# 2. 数据变换：提取梅尔频谱图 (MelSpectrogram)\n# 定义变换参数\ntransform = torchaudio.transforms.MelSpectrogram(\n    sample_rate=sample_rate,\n    n_mels=128\n)\n\n# 应用变换\nmel_specgram = transform(waveform)\n\nprint(f\"梅尔频谱图形状：{mel_specgram.shape}\")\n\n# 3. 
(可选) 转换为分贝刻度\nto_db = torchaudio.transforms.AmplitudeToDB()\nmel_specgram_db = to_db(mel_specgram)\n\n# 此时 mel_specgram_db 可直接用于训练神经网络\n```\n\n### 常用功能概览\n\n*   **数据集加载**：`torchaudio.datasets` 提供了常见音频数据集（如 LibriSpeech, GTZAN）的 DataLoader 接口。\n*   **合规性接口**：提供与 Kaldi 等传统工具库对齐的接口（如 `fbank`, `mfcc`），方便迁移旧代码。\n*   **强制对齐**：支持 `forced_align` 等功能，用于语音识别任务中的时间戳对齐。\n\n更多详细 API 请参考 [官方文档](https:\u002F\u002Fpytorch.org\u002Faudio\u002Fmain\u002F)。","某语音识别初创团队的算法工程师正在构建一个端到端的说话人情感分析模型，需要处理海量原始录音数据并提取声学特征。\n\n### 没有 audio 时\n- **数据处理割裂**：需先用 LibROSA 或 SciPy 加载音频，再手动转换为 NumPy 数组并传入 PyTorch，流程繁琐且容易在格式转换中出错。\n- **GPU 加速缺失**：频谱图（Spectrogram）和梅尔频率倒谱系数（MFCC）等特征提取只能在 CPU 上串行计算，处理大规模数据集时耗时极长，成为训练瓶颈。\n- **梯度断裂风险**：传统信号处理库不支持自动求导，导致无法将音频预处理环节纳入整体神经网络进行端到端的联合优化。\n- **生态兼容困难**：难以直接复用 PyTorch 现有的 DataLoader 机制，编写自定义数据集类时代码冗余度高，维护成本大。\n\n### 使用 audio 后\n- **原生无缝集成**：audio 提供原生的 PyTorch Tensor 接口，可直接加载波形并在一行代码内完成从文件到张量的转换，消除格式壁垒。\n- **全链路 GPU 加速**：利用 audio 内置的 `Spectrogram`、`MelSpectrogram` 等变换算子，特征提取过程直接运行在 GPU 上，数据准备速度提升数倍。\n- **支持端到端训练**：所有音频操作均基于 PyTorch Autograd 系统构建，允许梯度反向传播至预处理层，实现了真正的端到端模型微调。\n- **标准化数据流**：通过 audio 提供的专用 Datasets 和 Transforms 组件，轻松构建高效的数据流水线，代码风格与主流视觉任务保持一致。\n\naudio 通过将音频信号处理深度融入 PyTorch 生态，彻底打破了数据准备与模型训练之间的性能及功能隔阂，让音频深度学习开发变得高效且流畅。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fpytorch_audio_59db8eb3.png","pytorch","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fpytorch_be722ba8.jpg","",null,"https:\u002F\u002Fpytorch.org","https:\u002F\u002Fgithub.com\u002Fpytorch",[80,84,88,92,96,100],{"name":81,"color":82,"percentage":83},"Python","#3572A5",88.2,{"name":85,"color":86,"percentage":87},"Cuda","#3A4E3A",5.8,{"name":89,"color":90,"percentage":91},"C++","#f34b7d",5.6,{"name":93,"color":94,"percentage":95},"Shell","#89e051",0.3,{"name":97,"color":98,"percentage":99},"Batchfile","#C1F12E",0.1,{"name":101,"color":102,"percentage":103},"C","#555555",0,2865,768,"2026-04-18T15:51:43","BSD-2-Clause","未说明","支持 GPU 加速（基于 PyTorch），具体型号、显存大小及 CUDA 
版本未在文档中明确指定",{"notes":111,"python":108,"dependencies":112},"该工具已进入维护阶段，部分功能在 2.9 版本中被移除。主要定位为机器学习库而非通用信号处理库。安装详情需参考官方安装文档，数据集和预训练模型的使用需用户自行确认许可协议。",[113],"torch",[115,116,16,27,14,13,36,15,117],"音频","其他","视频",[65,119,120,121,122,73,123],"python","io","speech","machine-learning","audio-processing","2026-03-27T02:49:30.150509","2026-04-20T12:55:29.936144",[127,132,137,142,147,152],{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},44893,"如何在 Apple Silicon (M1) Mac 上安装 torchaudio？","在 M1 Mac 上，建议安装原生支持 ARM64 架构的 Python 环境。可以使用 Miniforge3 或新版原生 Anaconda 分布版。确保使用 Python 3.9 及以上版本，然后直接运行 `pip install torch torchaudio`。安装时需确认下载的包名称包含 `macosx_11_0_arm64` 字样（例如 `torchaudio-0.10.2-cp39-cp39-macosx_11_0_arm64.whl`），以确保架构匹配。","https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fissues\u002F1573",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},44894,"如何使用 file-like object（如内存流）加载 MP3 文件？","直接使用 `torchaudio.load()` 加载包含 MP3 数据的 file-like object 可能会失败，因为后端无法自动推断格式。解决方案有两种：1. 升级 torchaudio 到最新版本（>=1.9.0 或更高），新版本对此有更好的支持；2. 如果问题依旧，可能是 FFmpeg 库版本冲突导致，尝试设置环境变量 `LD_LIBRARY_PATH` 优先使用 conda 环境内的 `libavcodec` 库；3. 对于某些版本，可能需要显式指定 `format=\"mp3\"` 参数，但这取决于具体的后端实现和版本。","https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fissues\u002F2363",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},44895,"如何迁移到 torchaudio 0.7+ 的新后端接口？","为了获得正确且一致的 I\u002FO 体验，不同平台需进行以下配置：\n1. Linux\u002FmacOS 用户：请调用 `torchaudio.set_audio_backend(\"sox_io\")` 切换到新后端。\n2. 
Windows 用户：请设置 `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False` 并重新加载后端以使用新接口。\n注意：此次更新修复了非 16bit 有符号整数 WAV 格式的 bug，可能会导致部分向后不兼容的变化。","https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fissues\u002F903",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},44896,"新的 AudioMetaData 类中哪里可以找到编码和位深信息？","在弃用旧的 `sox` 后端后，新的 `AudioMetaData` 类（由 `info` 方法返回）包含了更标准化的字段。要获取音频文件的编码格式、位深（bits_per_sample）等详细信息，请直接访问 `AudioMetaData` 对象的 `encoding` 和 `bits_per_sample` 属性。这些字段替代了旧版 `sox_signalinfo_t` 和 `sox_encodinginfo_t` 中的对应数据。","https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fissues\u002F1094",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},44897,"Windows 上是否支持 MP3 格式？如何解决依赖问题？","原生的 Windows 支持正在逐步完善。如果在 Windows 上遇到 MP3 支持问题或缺少 Sox\u002FFFmpeg 二进制文件，可以考虑以下方案：\n1. 使用 `static-ffmpeg` 或 `static-sox` 等 Python 包，它们可以自动下载并管理所需的二进制文件（通过 `pip install static-ffmpeg` 等安装）。\n2. 对于 MP3 写入支持，可以尝试使用 `lameenc` 库，它无需额外依赖即可在所有操作系统上工作。\n3. 确保使用的是较新版本的 torchaudio，其中已包含对 Windows wheel 和 conda 包的构建支持。","https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fissues\u002F425",{"id":153,"question_zh":154,"answer_zh":155,"source_url":141},44898,"为什么加载 24-bit 音频文件会出错，如何支持？","早期版本的 torchaudio 可能对 24-bit 音频格式支持不完善。该功能已在 `0.8.1` 版本及以后的发布版中移植和支持。如果您遇到相关问题，请将 torchaudio 升级到 `0.8.1` 或更高版本（或使用 nightly build），即可正常处理 24-bit 音频文件。",[157,162,167,172,177,182,187,192,197,202,207,212,217,222,227,232,237,242,247,252],{"id":158,"version":159,"summary_zh":160,"released_at":161},359844,"v2.11.0","本版本兼容 PyTorch 2.11，并且也兼容 PyTorch 的未来版本。未添加任何新功能。","2026-03-23T18:40:32",{"id":163,"version":164,"summary_zh":165,"released_at":166},359845,"v2.10.0","本版本与 PyTorch 2.10 兼容。未添加任何新功能。\n\n此前被标记为弃用的 C++ 和 CUDA 扩展现已保留，并将继续保留在 torchaudio 中（包括 `lfilter`、`RNNTLoss`、`CUCTC`、`forced_align` 和 `overdrive`）。\n\n本版本标志着在 https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fissues\u002F3902 中描述的 TorchAudio 
迁移工作的最终完成。","2026-01-21T17:05:17",{"id":168,"version":169,"summary_zh":170,"released_at":171},359846,"v2.9.1","这是一个补丁版本，与 [PyTorch 2.9.1](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fpytorch\u002Freleases\u002Ftag\u002Fv2.9.1) 兼容。该版本没有新增功能。","2025-11-12T20:07:48",{"id":173,"version":174,"summary_zh":175,"released_at":176},359847,"v2.9.0","### 已弃用的 API\n\n大多数被标记为“移除”（[drop](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fissues\u002F3902#issue-3017467084)）的 API 现已删除。\n\n### `load()` 和 `save()` 迁移到 TorchCodec\n\n我们正在将 PyTorch 的解码和编码功能整合到 [TorchCodec](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ftorchcodec) 中。`torchaudio.load()` 和 `torchaudio.save()` 仍然存在，但其底层实现现在依赖于 TorchCodec。\n\n### C++ 和 CUDA 扩展\n\n我们仍在努力保留 `forced_align`、`lfilter`、`overdrive`、`RNNT` 和 `CUCTC`。","2025-10-15T17:17:43",{"id":178,"version":179,"summary_zh":180,"released_at":181},359848,"v2.8.0","从 https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fissues\u002F3902#issuecomment-3160818888 粘贴的更新内容：\n\n\n### 已弃用的 API\n\n\n大多数被标记为“移除”（Drop）的 API 现已明确废弃，在文档中以及从 Python 中调用时都会抛出弃用警告。这些 API 将在下一个 2.9 版本中被移除。\n\n### `load()` 和 `save()` 向 TorchCodec 的迁移\n\n\n正如我们之前所提到，我们正在将 PyTorch 的解码和编码功能整合到 [TorchCodec](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ftorchcodec) 中。\n\n`torchaudio.load()` 和 `torchaudio.save()` 是 TorchAudio 中最受欢迎的 API 之一，因此为了方便用户，我们提供了 [`torchaudio.load_with_torchcodec()`](https:\u002F\u002Fdocs.pytorch.org\u002Faudio\u002F2.8\u002Fgenerated\u002Ftorchaudio.load_with_torchcodec.html#torchaudio.load_with_torchcodec) 和 [`torchaudio.save_with_torchcodec()`](https:\u002F\u002Fdocs.pytorch.org\u002Faudio\u002Fstable\u002Fgenerated\u002Ftorchaudio.save_with_torchcodec.html#torchaudio.save_with_torchcodec)，它们在很大程度上可以作为直接替换使用。不过，我们仍然鼓励用户直接迁移到 TorchCodec 的 [`AudioDecoder()`](https:\u002F\u002Fdocs.pytorch.org\u002Ftorchcodec\u002Fstable\u002Fgenerated\u002Ftorchcodec.decoders.AudioDecoder.html#torchcodec.decoders.AudioDecoder) 和 
[`AudioEncoder()`](https:\u002F\u002Fdocs.pytorch.org\u002Ftorchcodec\u002Fstable\u002Fgenerated\u002Ftorchcodec.encoders.AudioEncoder.html#torchcodec.encoders.AudioEncoder)。\n\n在未来的版本中，`torchaudio.load()` 和 `torchaudio.save()` 仍将存在，但其底层实现将依赖于 `torchaudio.load_with_torchcodec()` 和 `torchaudio.save_with_torchcodec()`。\n\n我们希望这次迁移能够尽可能顺畅——大多数用户只需运行 `pip install torchcodec`，其余部分应该就能照常工作。\n\n目前，TorchCodec 尚未支持 Windows，但我们正在加紧开发，请大家耐心等待。\n\n### C++ 和 CUDA 扩展\n\n\n我们曾提到，正在探索保留目前计划删除的基于 C++ 的 API 的方案，具体包括：`forced_align`、`lfilter`、`overdrive`、`RNNT` 和 `CUCTC`。\n\n虽然我无法百分之百确定这一点，但我们现在更有信心可以通过将这些扩展移植到 PyTorch 新的“稳定 ABI”算子上来予以保留。我们目前正在积极地推进这项工作。","2025-08-06T17:08:37",{"id":183,"version":184,"summary_zh":185,"released_at":186},359849,"v2.7.1","此版本兼容 PyTorch 2.7.1，未新增任何功能。\r\n\r\n\r\n\r\n> [!NOTE]\r\n我们正在对 TorchAudio 进行重构，并将其过渡到维护阶段。在此过程中，部分面向用户的功能将会被移除。我们的主要目标是减少与 PyTorch 生态其他部分的重复，降低维护难度，并打造一个更加聚焦于其核心优势的 TorchAudio 版本：即为机器学习处理音频数据。更多详情请参阅[我们的社区公告](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fissues\u002F3902)。","2025-06-04T18:21:13",{"id":188,"version":189,"summary_zh":190,"released_at":191},359850,"v2.7.0","此版本兼容 PyTorch 2.7。未新增任何功能。\r\n\r\n\r\n\r\n> [!NOTE]\r\n我们正在对 TorchAudio 进行重构，并将其过渡到维护阶段。在此过程中，部分面向用户的功能将会被移除。我们的主要目标是减少与 PyTorch 生态其他部分的重复，降低维护难度，并打造一个更加聚焦于其核心优势的 TorchAudio 版本：即为机器学习处理音频数据。更多详情请参阅[我们的社区公告](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fissues\u002F3902)。","2025-04-24T15:24:54",{"id":193,"version":194,"summary_zh":195,"released_at":196},359851,"v2.6.0","此版本兼容 `PyTorch 2.6`。未新增功能。\n\n进行了以下修复和改进：\n\n- 修复使用负索引时音频裁剪不正确的问题：https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3860\n- 修复在请求非零 `pre_trigger_time` 时，VAD 返回空输出的问题：https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3866\n- ROCM 
兼容性改进：https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3840、https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3843","2025-01-29T17:29:44",{"id":198,"version":199,"summary_zh":200,"released_at":201},359852,"v2.5.0","本版本兼容 `PyTorch 2.5`。未新增功能。\n\n本版本包含一项改进：\n\n- 减少 lfilter 反向传播中的计算量 https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3831","2024-10-17T16:24:06",{"id":203,"version":204,"summary_zh":205,"released_at":206},359853,"v2.4.1","本版本兼容 [PyTorch 2.4.1](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fpytorch\u002Freleases\u002Ftag\u002Fv2.4.1) 修补版。未添加任何新功能。","2024-09-04T20:06:34",{"id":208,"version":209,"summary_zh":210,"released_at":211},359854,"v2.4.0","This release is compatible with `PyTorch 2.4`. There are no new features added.\r\n\r\nThis release contains 2 fixes:\r\n\r\n- Fix view size error when backpropagating through lfilter https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3794\r\n- [BC-Breaking] Fix model downloading in bento https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3803","2024-07-24T18:54:09",{"id":213,"version":214,"summary_zh":215,"released_at":216},359855,"v2.3.1","This release is compatible with [PyTorch 2.3.1](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fpytorch\u002Freleases\u002Ftag\u002Fv2.3.1) patch release. There are no new features added.","2024-06-05T19:22:04",{"id":218,"version":219,"summary_zh":220,"released_at":221},359856,"v2.3.0","This release is compatible with [PyTorch 2.3.0](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fpytorch\u002Freleases\u002Ftag\u002Fv2.3.0) patch release. 
There are no new features added.\r\n\r\nThis release contains minor documentation and code quality improvements (#3734, #3748, #3757, #3759)","2024-04-24T16:19:43",{"id":223,"version":224,"summary_zh":225,"released_at":226},359857,"v2.2.2","This release is compatible with [PyTorch 2.2.2](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fpytorch\u002Freleases\u002Ftag\u002Fv2.2.2) patch release. There are no new features added.","2024-03-28T15:39:10",{"id":228,"version":229,"summary_zh":230,"released_at":231},359858,"v2.2.1","This release is compatible with [PyTorch 2.2.1](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fpytorch\u002Freleases\u002Ftag\u002Fv2.2.1) patch release. There are no new features added.","2024-02-22T21:42:46",{"id":233,"version":234,"summary_zh":235,"released_at":236},359859,"v2.2.0","## New Features\r\n- Add path-like object support to StreamReader\u002FWriter https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3608 \r\n- Introduce `torio` top-level module, dedicated to core I\u002FO operations (https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3676, https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3680, https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3681, https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3682). Please refer to https:\u002F\u002Fpytorch.org\u002Faudio\u002F2.2.0\u002Ftorio.html for the details.\r\n\r\n## Bug Fixes\r\n- https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3685 Make F.vad return empty tensor for zero valued tensor input\r\n\r\n## Recipe Updates\r\n- https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fpull\u002F3631 Fix inconsistent naming\r\n","2024-01-30T18:17:31",{"id":238,"version":239,"summary_zh":240,"released_at":241},359860,"v2.1.2","This is a patch release, which is compatible with [PyTorch 2.1.2](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fpytorch\u002Freleases\u002Ftag\u002Fv2.1.2). 
There are no new features added.","2023-12-15T02:05:05",{"id":243,"version":244,"summary_zh":245,"released_at":246},359861,"v2.1.1","This is a minor release, which is compatible with PyTorch 2.1.1 and includes bug fixes, improvements and documentation updates.\r\n\r\n## Bug Fixes\r\n\r\n* Cherry-pick 2.1.1: Fix WavLM bundles (#3665)\r\n* Cherry-pick 2.1.1: Add back compression level in i\u002Fo dispatcher backend (#3666)\r\n","2023-11-15T22:19:00",{"id":248,"version":249,"summary_zh":250,"released_at":251},359862,"v2.1.0","## Highlights\r\n\r\nTorchAudio v2.1 introduces new features and backward-incompatible changes:\r\n\r\n1. [BETA] A new API to apply filters, effects and codecs    \r\n`torchaudio.io.AudioEffector` can apply filters, effects and encodings to waveforms in online\u002Foffline fashion.    \r\nYou can use it as a form of augmentation.   \r\nPlease refer to https:\u002F\u002Fpytorch.org\u002Faudio\u002F2.1\u002Ftutorials\u002Feffector_tutorial.html for the examples.\r\n1. [BETA] Tools for forced alignment    \r\nNew functions and a pre-trained model for forced alignment were added.    \r\n`torchaudio.functional.forced_align` computes alignment from an emission and `torchaudio.pipelines.MMS_FA` provides access to the model trained for multilingual forced alignment in the [MMS: Scaling Speech Technology to 1000+ languages](https:\u002F\u002Fai.meta.com\u002Fblog\u002Fmultilingual-model-speech-recognition\u002F) project.    \r\nPlease refer to https:\u002F\u002Fpytorch.org\u002Faudio\u002F2.1\u002Ftutorials\u002Fctc_forced_alignment_api_tutorial.html for the usage of the `forced_align` function, and https:\u002F\u002Fpytorch.org\u002Faudio\u002F2.1\u002Ftutorials\u002Fforced_alignment_for_multilingual_data_tutorial.html for how one can use `MMS_FA` to align transcripts in multiple languages.\r\n1. 
[BETA] TorchAudio-Squim: Models for reference-free speech assessment    \r\nModel architectures and pre-trained models from the paper [TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.01448) were added.\r\nYou can use the `torchaudio.pipelines.SQUIM_SUBJECTIVE` and `torchaudio.pipelines.SQUIM_OBJECTIVE` models to estimate various speech quality and intelligibility metrics. This is helpful when evaluating the quality of speech generation models, such as TTS.    \r\nPlease refer to https:\u002F\u002Fpytorch.org\u002Faudio\u002F2.1\u002Ftutorials\u002Fsquim_tutorial.html for details.\r\n1. [BETA] CUDA-based CTC decoder    \r\n`torchaudio.models.decoder.CUCTCDecoder` takes an emission stored in CUDA memory and performs CTC beam search on it on the CUDA device. The beam search is fast, and it eliminates the need to move data from the CUDA device to the CPU when performing automatic speech recognition. With PyTorch's CUDA support, it is now possible to perform the entire speech recognition pipeline in CUDA.    \r\nPlease refer to https:\u002F\u002Fpytorch.org\u002Faudio\u002F2.1\u002Ftutorials\u002Fasr_inference_with_cuda_ctc_decoder_tutorial.html for details.\r\n1. [Prototype] Utilities for AI music generation    \r\nWe are working to add utilities that are relevant to music AI. Since the last release, the following APIs were added to the prototype.    \r\nPlease refer to the respective documentation for their usage.    \r\n   - torchaudio.prototype.chroma_filterbank\r\n   - torchaudio.prototype.transforms.ChromaScale\r\n   - torchaudio.prototype.transforms.ChromaSpectrogram\r\n   - torchaudio.prototype.pipelines.VGGISH\r\n1. New recipes for training models    \r\nRecipes for audio-visual ASR, multi-channel DNN beamforming and TCPGen context-biasing were added.    
\r\nPlease refer to the recipes:    \r\n   - https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Ftree\u002Frelease\u002F2.1\u002Fexamples\u002Favsr\r\n   - https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Ftree\u002Frelease\u002F2.1\u002Fexamples\u002Fdnn_beamformer\r\n   - https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Ftree\u002Frelease\u002F2.1\u002Fexamples\u002Fasr\u002Flibrispeech_conformer_rnnt_biasing\r\n1. Update to FFmpeg support    \r\nThe version of supported FFmpeg libraries was updated.    \r\nTorchAudio v2.1 works with FFmpeg 6, 5 and 4.4. Support for 4.3, 4.2 and 4.1 is dropped.   \r\nPlease refer to https:\u002F\u002Fpytorch.org\u002Faudio\u002F2.1\u002Finstallation.html#optional-dependencies for details of the new FFmpeg integration mechanism.\r\n1. Update to libsox integration    \r\nTorchAudio now depends on libsox installed separately from torchaudio. The SoX I\u002FO backend no longer supports file-like objects. (These are supported by the FFmpeg backend and soundfile.)    \r\nPlease refer to https:\u002F\u002Fpytorch.org\u002Faudio\u002F2.1\u002Finstallation.html#optional-dependencies for details.\r\n\r\n## New Features\r\n\r\n### I\u002FO\r\n- Support overwriting PTS in `torchaudio.io.StreamWriter` (#3135)\r\n- Include format information after filtering in `torchaudio.io.StreamReader.get_out_stream_info` (#3155)\r\n- Support CUDA frame in `torchaudio.io.StreamReader` filter graph (#3183, #3479)\r\n- Support YUV444P in GPU decoder (#3199)\r\n- Add additional filter graph processing to `torchaudio.io.StreamWriter` (#3194)\r\n- Cache and reuse HW device context in GPU decoder (#3178)\r\n- Cache and reuse HW device context in GPU encoder (#3215)\r\n- Support changing the number of channels in `torchaudio.io.StreamReader` (#3216)\r\n- Support encode spec change in `torchaudio.io.StreamWriter` (#3207)\r\n- Support encode options such as compression rate and bit rate (#3179, #3203, #3224)\r\n- Add `420p10le` support to 
`torchaudio.io.StreamReader` CPU decoder (#3332)\r\n- Support multiple FFmpeg versions (#3464, #3476)\r\n- Support writing opus and mp3 with soundfile (#3554)\r\n- Add switch to disable sox integration and ffmpeg integration at runtime (#3500)\r\n\r\n### Ops\r\n- Add `torchaudio.io.AudioEffector` (#3163, #3372, #3374)\r\n- Add `torchaudio.transforms.SpecAugment` (#3309, #3314","2023-10-04T17:30:04",{"id":253,"version":254,"summary_zh":255,"released_at":256},359863,"v2.0.2","# TorchAudio 2.0.2 Release Note\r\n\r\nThis is a minor release, which is compatible with PyTorch 2.0.1 and includes bug fixes, improvements and documentation updates. There are no new features added.\r\n\r\n## Bug Fixes\r\n* #3239 Properly set the number of samples passed to the encoder (#3204)\r\n* #3238 Fix virtual function issue with CTC decoder (#3230)\r\n* #3245 Fix path-like object support in FFmpeg dispatcher (#3243, #3248)\r\n* #3261 Use scaled_dot_product_attention in Wav2vec2\u002FHuBERT's SelfAttention (#3253)\r\n* #3264 Use scaled_dot_product_attention in WavLM attention (#3252, #3265)\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fpytorch\u002Faudio\u002Fcompare\u002Fv2.0.1...v2.0.2","2023-05-08T20:03:52"]