[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-speechbrain--speechbrain":3,"tool-speechbrain--speechbrain":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",148568,2,"2026-04-09T23:34:24",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108111,"2026-04-08T11:23:26",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":64,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":77,"owner_website":78,"owner_url":79,"languages":80,"stars":97,"forks":98,"last_commit_at":99,"license":100,"difficulty_score":32,"env_os":101,"env_gpu":102,"env_ram":101,"env_deps":103,"category_tags":110,"github_topics":112,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":132,"updated_at":133,"faqs":134,"releases":164},6145,"speechbrain\u002Fspeechbrain","speechbrain","A PyTorch-based Speech Toolkit","SpeechBrain 是一个基于 PyTorch 的开源语音工具包，旨在加速对话式 AI 的开发。它就像一位全能助手，帮助开发者轻松构建语音助手、聊天机器人及大语言模型背后的核心技术，覆盖从语音识别、说话人确认到语音增强、分离及语言建模等多种任务。\n\n过去，语音处理与自然语言处理往往需要不同的技术栈，而 SpeechBrain 打破了这一壁垒，提供了一套统一的“整体性”解决方案。它解决了研究人员和工程师在复现前沿算法时面临的代码分散、环境配置复杂等痛点。通过提供超过 200 个经过验证的训练配方（Recipes）和 100 多个托管在 HuggingFace 上的预训练模型（如 Whisper、Wav2Vec2、Llama2 等），用户只需几行代码或简单的 YAML 配置即可完成模型的微调与推理，极大降低了实验门槛。\n\n这款工具特别适合 AI 
研究人员、语音算法工程师以及希望快速原型化的开发者使用。其独特的亮点在于高度一致的代码结构设计，确保了不同任务间的可复现性与易用性；此外，它还前瞻性地支持了脑电（EEG）模态，探索非言语人群的人机交互新可能。无论你是想从零训练模型，还是利用现有","SpeechBrain 是一个基于 PyTorch 的开源语音工具包，旨在加速对话式 AI 的开发。它就像一位全能助手，帮助开发者轻松构建语音助手、聊天机器人及大语言模型背后的核心技术，覆盖从语音识别、说话人确认到语音增强、分离及语言建模等多种任务。\n\n过去，语音处理与自然语言处理往往需要不同的技术栈，而 SpeechBrain 打破了这一壁垒，提供了一套统一的“整体性”解决方案。它解决了研究人员和工程师在复现前沿算法时面临的代码分散、环境配置复杂等痛点。通过提供超过 200 个经过验证的训练配方（Recipes）和 100 多个托管在 HuggingFace 上的预训练模型（如 Whisper、Wav2Vec2、Llama2 等），用户只需几行代码或简单的 YAML 配置即可完成模型的微调与推理，极大降低了实验门槛。\n\n这款工具特别适合 AI 研究人员、语音算法工程师以及希望快速原型化的开发者使用。其独特的亮点在于高度一致的代码结构设计，确保了不同任务间的可复现性与易用性；此外，它还前瞻性地支持了脑电（EEG）模态，探索非言语人群的人机交互新可能。无论你是想从零训练模型，还是利用现有大模型进行定制，SpeechBrain 都能让复杂的语音技术变得触手可及。","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fspeechbrain\u002Fspeechbrain\u002Fdevelop\u002Fdocs\u002Fimages\u002Fspeechbrain-logo.svg\" alt=\"SpeechBrain Logo\"\u002F>\n\u003C\u002Fp>\n\n[![Typing SVG](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_32faf143fc63.png)](https:\u002F\u002Fgit.io\u002Ftyping-svg)\n\n\n| 📘 [Tutorials](https:\u002F\u002Fspeechbrain.readthedocs.io) | 🌐 [Website](https:\u002F\u002Fspeechbrain.github.io\u002F) | 📚 [Documentation](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Findex.html) | 🤝 [Contributing](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Fcontributing.html) | 🤗 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspeechbrain) | ▶️ [YouTube](https:\u002F\u002Fwww.youtube.com\u002F@SpeechBrainProject) | 🐦 [X](https:\u002F\u002Ftwitter.com\u002FSpeechBrain1) |\n\n![GitHub Repo stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fspeechbrain\u002Fspeechbrain?style=social) *Please, help our community project. 
Star on GitHub!*\n\n**Exciting News (January, 2024):** Discover what is new in SpeechBrain 1.0 [here](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1IEPfKRuvJRSjoxu22GZhb3czfVHsAy0s?usp=sharing)!\n#\n# 🗣️💬 What SpeechBrain Offers\n\n- SpeechBrain is an **open-source** [PyTorch](https:\u002F\u002Fpytorch.org\u002F) toolkit that accelerates **Conversational AI** development, i.e., the technology behind *speech assistants*, *chatbots*, and *large language models*.\n\n- It is crafted for fast and easy creation of advanced technologies for **Speech** and **Text** Processing.\n\n\n## 🌐  Vision\n- With the rise of [deep learning](https:\u002F\u002Fwww.deeplearningbook.org\u002F), once-distant domains like speech processing and NLP are now very close. A well-designed neural network and large datasets are all you need.\n\n- We think it is now time for a **holistic toolkit** that, mimicking the human brain, jointly supports diverse technologies for complex Conversational AI systems.\n\n- This spans *speech recognition*, *speaker recognition*, *speech enhancement*, *speech separation*, *language modeling*, *dialogue*, and beyond.\n\n- Aligned with our long-term goal of natural human-machine conversation, including for non-verbal individuals, we have recently added support for the [EEG modality](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB).\n\n\n\n## 📚 Training Recipes\n- We share over 200 competitive training [recipes](recipes) on more than 40 datasets supporting 20 speech and text processing tasks (see below).\n\n- We support both training from scratch and fine-tuning pretrained models such as [Whisper](https:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fwhisper-large), [Wav2Vec2](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fwav2vec2), [WavLM](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fwavlm), 
[Hubert](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fhubert), [GPT2](https:\u002F\u002Fhuggingface.co\u002Fgpt2), [Llama2](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fllama2), and beyond. The models on [HuggingFace](https:\u002F\u002Fhuggingface.co\u002F) can be easily plugged in and fine-tuned.\n\n- For any task, you train the model using a command of this form:\n```bash\npython train.py hparams\u002Ftrain.yaml\n```\n\n- The hyperparameters are encapsulated in a YAML file, while the training process is orchestrated through a Python script.\n\n- We maintain a consistent code structure across different tasks.\n\n- For better replicability, training logs and checkpoints are hosted on Dropbox.\n\n## \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspeechbrain\" target=\"_blank\"> \u003Cimg src=\"https:\u002F\u002Fhuggingface.co\u002Ffront\u002Fassets\u002Fhuggingface_logo.svg\" alt=\"drawing\" width=\"40\"\u002F> \u003C\u002Fa> Pretrained Models and Inference\n\n- Access over 100 pretrained models hosted on [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspeechbrain).\n- Each model comes with a user-friendly interface for seamless inference. 
For example, transcribing speech using a pretrained model requires just three lines of code:\n\n```python\nfrom speechbrain.inference import EncoderDecoderASR\n\nasr_model = EncoderDecoderASR.from_hparams(source=\"speechbrain\u002Fasr-conformer-transformerlm-librispeech\", savedir=\"pretrained_models\u002Fasr-transformer-transformerlm-librispeech\")\nasr_model.transcribe_file(\"speechbrain\u002Fasr-conformer-transformerlm-librispeech\u002Fexample.wav\")\n```\n\n##  \u003Ca href=\"https:\u002F\u002Fspeechbrain.github.io\u002F\" target=\"_blank\"> \u003Cimg src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002Fd\u002Fd0\u002FGoogle_Colaboratory_SVG_Logo.svg\u002F1200px-Google_Colaboratory_SVG_Logo.svg.png\" alt=\"drawing\" width=\"50\"\u002F> \u003C\u002Fa>  Documentation\n- We are deeply dedicated to promoting inclusivity and education.\n- We have authored over 30 [tutorials](https:\u002F\u002Fspeechbrain.readthedocs.io) that not only describe how SpeechBrain works but also help users familiarize themselves with Conversational AI.\n- Every class or function has clear explanations and examples that you can run. Check out the [documentation](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Findex.html) for more details 📚.\n\n\n\n## 🎯 Use Cases\n- 🚀 **Research Acceleration**: Speeding up academic and industrial research. You can develop and integrate new models effortlessly, comparing their performance against our baselines.\n\n- ⚡️ **Rapid Prototyping**: Ideal for quick prototyping in time-sensitive projects.\n\n- 🎓 **Educational Tool**: SpeechBrain's simplicity makes it a valuable educational resource. 
It is used by institutions like [Mila](https:\u002F\u002Fmila.quebec\u002Fen\u002F), [Concordia University](https:\u002F\u002Fwww.concordia.ca\u002F), [Avignon University](https:\u002F\u002Funiv-avignon.fr\u002Fen\u002F), and many others for student training.\n\n#\n# 🚀 Quick Start\n\nTo get started with SpeechBrain, follow these simple steps:\n\n## 🛠️ Installation\n\n### Install via PyPI\n\n1. Install SpeechBrain using PyPI:\n\n    ```bash\n    pip install speechbrain\n    ```\n\n2. Access SpeechBrain in your Python code:\n\n    ```python\n    import speechbrain as sb\n    ```\n\n### Install from GitHub\nThis installation is recommended for users who wish to conduct experiments and customize the toolkit according to their needs.\n\n1. Clone the GitHub repository and install the requirements:\n\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain.git\n    cd speechbrain\n    pip install -r requirements.txt\n    pip install --editable .\n    ```\n\n2. 
Access SpeechBrain in your Python code:\n\n    ```python\n    import speechbrain as sb\n    ```\n\nAny modifications made to the `speechbrain` package will be automatically reflected, thanks to the `--editable` flag.\n\n## ✔️ Test Installation\n\nEnsure your installation is correct by running the following commands:\n\n```bash\npytest tests\npytest --doctest-modules speechbrain\n```\n\n## 🏃‍♂️ Running an Experiment\n\nIn SpeechBrain, you can train a model for any task using the following steps:\n\n```bash\ncd recipes\u002F\u003Cdataset>\u002F\u003Ctask>\u002F\npython experiment.py params.yaml\n```\n\nThe results will be saved in the `output_folder` specified in the YAML file.\n\n## 📘 Learning SpeechBrain\n\n- **Website:** Explore general information on the [official website](https:\u002F\u002Fspeechbrain.github.io).\n\n- **Tutorials:** Start with [basic tutorials](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fbasics.html) covering fundamental functionalities. 
Find advanced tutorials and topics in the Tutorial notebooks category in the [SpeechBrain documentation](https:\u002F\u002Fspeechbrain.readthedocs.io).\n\n- **Documentation:** Detailed information on the SpeechBrain API, contribution guidelines, and code is available in the [documentation](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Findex.html).\n\n#\n# 🔧 Supported Technologies\n- SpeechBrain is a versatile framework designed for implementing a wide range of technologies within the field of Conversational AI.\n- It excels not only in individual task implementations but also in combining various technologies into complex pipelines.\n\n## 🎙️ Speech\u002FAudio Processing\n| Tasks        | Datasets           | Technologies\u002FModels  |\n| ------------- |-------------| -----|\n| Speech Recognition      | [AISHELL-1](recipes\u002FAISHELL-1), [CommonVoice](recipes\u002FCommonVoice), [DVoice](recipes\u002FDVoice), [LibriSpeech](recipes\u002FLibriSpeech), [MEDIA](recipes\u002FMEDIA), [RescueSpeech](recipes\u002FRescueSpeech), [Switchboard](recipes\u002FSwitchboard), [TIMIT](recipes\u002FTIMIT), [Tedlium2](recipes\u002FTedlium2), [Voicebank](recipes\u002FVoicebank) | [CTC](https:\u002F\u002Fwww.cs.toronto.edu\u002F~graves\u002Ficml_2006.pdf), [Transducers](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1211.3711.pdf?origin=publication_detail), [Transformers](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762), [Seq2Seq](http:\u002F\u002Fzhaoshuaijiang.com\u002Ffile\u002FHybrid_CTC_Attention_Architecture_for_End-to-End_Speech_Recognition.pdf), Beamsearch techniques (for [CTC](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1911.01629.pdf), [seq2seq](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.02619.pdf), [transducers](https:\u002F\u002Fwww.merl.com\u002Fpublications\u002Fdocs\u002FTR2017-190.pdf)), [Rescoring](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1612.02695.pdf), [Conformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.08100), 
[Branchformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.02971), [Hyperconformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.18281), [Kaldi2-FST](https:\u002F\u002Fgithub.com\u002Fk2-fsa\u002Fk2) |\n| Speaker Recognition      | [VoxCeleb](recipes\u002FVoxCeleb) | [ECAPA-TDNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.07143), [ResNET](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.12592.pdf), [Xvectors](https:\u002F\u002Fwww.danielpovey.com\u002Ffiles\u002F2018_icassp_xvectors.pdf), [PLDA](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F6639151), [Score Normalization](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fabs\u002Fpii\u002FS1051200499903603) |\n| Speech Separation      | [WSJ0Mix](recipes\u002FWSJ0Mix), [LibriMix](recipes\u002FLibriMix), [WHAM!](recipes\u002FWHAMandWHAMR), [WHAMR!](recipes\u002FWHAMandWHAMR), [Aishell1Mix](recipes\u002FAishell1Mix), [BinauralWSJ0Mix](recipes\u002FBinauralWSJ0Mix) | [SepFormer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.13154), [RESepFormer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.09507), [SkiM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.10800), [DualPath RNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.06379), [ConvTasNET](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.07454) |\n| Speech Enhancement      | [DNS](recipes\u002FDNS), [Voicebank](recipes\u002FVoicebank) | [SepFormer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.13154), [MetricGAN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.04874), [MetricGAN-U](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.05866), [SEGAN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.09452), [spectral masking](http:\u002F\u002Fstaff.ustc.edu.cn\u002F~jundu\u002FPublications\u002Fpublications\u002FTrans2015_Xu.pdf), [time masking](http:\u002F\u002Fstaff.ustc.edu.cn\u002F~jundu\u002FPublications\u002Fpublications\u002FTrans2015_Xu.pdf) |\n| Interpretability | [ESC50](recipes\u002FESC50) | [Listenable Maps for Audio 
Classifiers (L-MAC)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.13086), [Learning-to-Interpret (L2I)](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002Fe53280d73dd5389e820f4a6250365b0e-Paper-Conference.pdf), [Non-Negative Matrix Factorization (NMF)](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002Fe53280d73dd5389e820f4a6250365b0e-Paper-Conference.pdf), [PIQ](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.12659) |\n| Speech Generation | [AudioMNIST](recipes\u002FAudioMNIST) | [Diffusion](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11239), [Latent Diffusion](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.10752) |\n| Text-to-Speech      | [LJSpeech](recipes\u002FLJSpeech), [LibriTTS](recipes\u002FLibriTTS) | [Tacotron2](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.05884), [Zero-Shot Multi-Speaker Tacotron2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.02418), [FastSpeech2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.04558) |\n| Vocoding      | [LJSpeech](recipes\u002FLJSpeech), [LibriTTS](recipes\u002FLibriTTS) | [HiFiGAN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05646), [DiffWave](https:\u002F\u002Farxiv.org\u002Fabs\u002F2009.09761) |\n| Spoken Language Understanding | [MEDIA](recipes\u002FMEDIA), [SLURP](recipes\u002FSLURP), [Fluent Speech Commands](recipes\u002Ffluent-speech-commands), [Timers-and-Such](recipes\u002Ftimers-and-such)  | [Direct SLU](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.01604), [Decoupled SLU](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.01604), [Multistage SLU](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.01604) |\n| Speech-to-Speech Translation  | [CVSS](recipes\u002FCVSS) | [Discrete Hubert](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.07447.pdf), [HiFiGAN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05646), [wav2vec2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11477) |\n| Speech Translation  | [Fisher CallHome 
(Spanish)](recipes\u002FFisher-Callhome-Spanish), [IWSLT22(lowresource)](recipes\u002FIWSLT22_lowresource) | [wav2vec2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11477) |\n| Emotion Classification      | [IEMOCAP](recipes\u002FIEMOCAP), [ZaionEmotionDataset](recipes\u002FZaionEmotionDataset) | [ECAPA-TDNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.07143), [wav2vec2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11477), [Emotion Diarization](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.12991) |\n| Language Identification | [VoxLingua107](recipes\u002FVoxLingua107), [CommonLanguage](recipes\u002FCommonLanguage)| [ECAPA-TDNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.07143) |\n| Voice Activity Detection  | [LibriParty](recipes\u002FLibriParty) | [CRDNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.04624) |\n| Sound Classification  | [ESC50](recipes\u002FESC50), [UrbanSound](recipes\u002FUrbanSound8k) | [CNN14](https:\u002F\u002Fgithub.com\u002Franchlai\u002Fsound_classification), [ECAPA-TDNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.07143) |\n| Self-Supervised Learning | [CommonVoice](recipes\u002FCommonVoice), [LibriSpeech](recipes\u002FLibriSpeech) | [wav2vec2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11477) |\n| Metric Learning | [REAL-M](recipes\u002FREAL-M\u002Fsisnr-estimation), [Voicebank](recipes\u002FVoicebank) | [Blind SNR-Estimation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.08909), [PESQ Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.05866) |\n| Alignment | [TIMIT](recipes\u002FTIMIT) | [CTC](https:\u002F\u002Fwww.cs.toronto.edu\u002F~graves\u002Ficml_2006.pdf), [Viterbi](https:\u002F\u002Fwww.cs.cmu.edu\u002F~cga\u002Fbehavior\u002Frabiner1.pdf), [Forward Forward](https:\u002F\u002Fwww.cs.cmu.edu\u002F~cga\u002Fbehavior\u002Frabiner1.pdf) |\n| Diarization | [AMI](recipes\u002FAMI) | [ECAPA-TDNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.07143), 
[X-vectors](https:\u002F\u002Fwww.danielpovey.com\u002Ffiles\u002F2018_icassp_xvectors.pdf), [Spectral Clustering](https:\u002F\u002Fweb.archive.org\u002Fweb\u002F20240305184559\u002Fhttp:\u002F\u002Fwww.ifp.illinois.edu\u002F~hning2\u002Fpapers\u002FNing_spectral.pdf) |\n\n## 📝 Text Processing\n| Tasks        | Datasets           | Technologies\u002FModels  |\n| ------------- |-------------| -----|\n| Language Modeling | [CommonVoice](recipes\u002FCommonVoice), [LibriSpeech](recipes\u002FLibriSpeech)| [n-grams](https:\u002F\u002Fweb.stanford.edu\u002F~jurafsky\u002Fslp3\u002F3.pdf), [RNNLM](https:\u002F\u002Fwww.fit.vutbr.cz\u002Fresearch\u002Fgroups\u002Fspeech\u002Fpubli\u002F2010\u002Fmikolov_interspeech2010_IS100722.pdf), [TransformerLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762) |\n| Response Generation | [MultiWOZ](recipes\u002FMultiWOZ\u002Fresponse_generation)| [GPT2](https:\u002F\u002Fd4mucfpksywv.cloudfront.net\u002Fbetter-language-models\u002Flanguage_models_are_unsupervised_multitask_learners.pdf), [Llama2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.09288) |\n| Grapheme-to-Phoneme | [LibriSpeech](recipes\u002FLibriSpeech) | [RNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.13703), [Transformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.13703), [Curriculum Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.13703), [Homograph loss](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.13703) |\n\n## 🧠 EEG Processing\n| Tasks        | Datasets           | Technologies\u002FModels  |\n| ------------- |-------------| -----|\n| Motor Imagery | [BNCI2014001](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FMotorImagery\u002FBNCI2014001), [BNCI2014004](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FMotorImagery\u002FBNCI2014004), 
[BNCI2015001](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FMotorImagery\u002FBNCI2015001), [Lee2019_MI](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FMotorImagery\u002FLee2019_MI), [Zhou2016](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FMotorImagery\u002FZhou2016) | [EEGNet](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Fblob\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fmodels\u002FEEGNet.py), [ShallowConvNet](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Fblob\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fmodels\u002FShallowConvNet.py), [EEGConformer](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Fblob\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fmodels\u002FEEGConformer.py) |\n| P300 | [BNCI2014009](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FP300\u002FBNCI2014009), [EPFLP300](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FP300\u002FEPFLP300), [bi2015a](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FP300\u002Fbi2015a) | [EEGNet](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Fblob\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fmodels\u002FEEGNet.py) |\n| SSVEP | [Lee2019_SSVEP](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FSSVEP\u002FLee2019_SSVEP) | [EEGNet](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Fblob\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fmodels\u002FEEGNet.py) |\n\n\n\n\n## 🔍 Additional 
Features\n\nSpeechBrain includes a range of native functionalities that enhance the development of Conversational AI technologies. Here are some examples:\n\n- **Training Orchestration:** The `Brain` class serves as a fully customizable tool for managing training and evaluation loops over data. It simplifies training loops while providing the flexibility to override any part of the process.\n\n- **Hyperparameter Management:** A YAML-based hyperparameter file specifies all hyperparameters, from individual numbers (e.g., learning rate) to complete objects (e.g., custom models). This elegant solution drastically simplifies the training script.\n\n- **Dynamic Dataloader:** Enables flexible and efficient data reading.\n\n- **GPU Training:** Supports single and multi-GPU training, including distributed training.\n\n- **Dynamic Batching:** On-the-fly dynamic batching enhances the efficient processing of variable-length signals.\n\n- **Mixed-Precision Training:** Accelerates training through mixed-precision techniques.\n\n- **Efficient Data Reading:** Reads large datasets efficiently from a shared Network File System (NFS) via [WebDataset](https:\u002F\u002Fgithub.com\u002Fwebdataset\u002Fwebdataset).\n\n- **Hugging Face Integration:** Interfaces seamlessly with [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspeechbrain) for popular models such as wav2vec2 and Hubert.\n\n- **Orion Integration:** Interfaces with [Orion](https:\u002F\u002Fgithub.com\u002FEpistimio\u002Forion) for hyperparameter tuning.\n\n- **Speech Augmentation Techniques:** Includes SpecAugment, Noise, Reverberation, and more.\n\n- **Data Preparation Scripts:** Includes scripts for preparing data for supported datasets.\n\nSpeechBrain is rapidly evolving, with ongoing efforts to support a growing array of technologies in the future.\n\n\n## 📊 Performance\n\n- SpeechBrain integrates a variety of technologies, including those that achieve competitive or state-of-the-art performance.\n\n- For a 
comprehensive overview of the achieved performance across different tasks, datasets, and technologies, please visit [here](PERFORMANCE.md).\n\n#\n# 📜 License\n\n- SpeechBrain is released under the [Apache License, version 2.0](https:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0), a popular BSD-like license.\n- You are free to redistribute SpeechBrain for both free and commercial purposes, with the condition of retaining license headers. Unlike the GPL, the Apache License is not viral, meaning you are not obligated to release modifications to the source code.\n\n#\n# 🔮 Future Plans\n\nWe have ambitious plans for the future, with a focus on the following priorities:\n\n- **Scale Up:** We aim to provide comprehensive recipes and technologies for training massive models on extensive datasets.\n\n- **Scale Down:** While scaling up delivers unprecedented performance, we recognize the challenges of deploying large models in production scenarios. We are focusing on real-time, streamable, and small-footprint Conversational AI.\n\n- **Multimodal Large Language Models**: We envision a future where a single foundation model can handle a wide range of text, speech, and audio tasks. Our core team is focused on enabling the training of advanced multimodal LLMs.\n\n#\n# 🤝 Contributing\n\n- SpeechBrain is a community-driven project, led by a core team with the support of numerous international collaborators.\n- We welcome contributions and ideas from the community. 
For more information, check [here](https:\u002F\u002Fspeechbrain.github.io\u002Fcontributing.html).\n\n#\n# 🙏 Sponsors\n\n- SpeechBrain is an academically driven project and relies on the passion and enthusiasm of its contributors.\n- As we cannot rely on the resources of a large company, we deeply appreciate any form of support, including donations or collaboration with the core team.\n- If you're interested in sponsoring SpeechBrain, please reach out to us at speechbrainproject@gmail.com.\n- A heartfelt thank you to all our sponsors, including the current ones:\n\n\n\n[\u003Cimg src=\"https:\u002F\u002Fhuggingface.co\u002Ffront\u002Fassets\u002Fhuggingface_logo.svg\" alt=\"Image 1\" width=\"250\"\u002F>](https:\u002F\u002Fspeechbrain.github.io\u002Fimg\u002Fhf.ico) &nbsp; &nbsp;\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_ca4b5331675d.png\" alt=\"Image 3\" width=\"250\"\u002F>](https:\u002F\u002Fviadialog.com\u002Fen\u002F) &nbsp; &nbsp;\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_3efd1a1edff3.png\" alt=\"Image 4\" width=\"250\"\u002F>](https:\u002F\u002Feurope.naverlabs.com\u002F)\n\n\u003Cbr>\u003Cbr>\n\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_77a46224ef61.png\" alt=\"Image 5\" width=\"250\"\u002F>](https:\u002F\u002Fwww.ovhcloud.com\u002Fen-ca\u002F) &nbsp; &nbsp;\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_bf83b7f16f03.png\" alt=\"Image 2\" width=\"250\"\u002F>](https:\u002F\u002Fusa.baidu.com\u002F) &nbsp; &nbsp;\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_22c03d6370f5.png\" alt=\"Image 6\" width=\"250\"\u002F>](https:\u002F\u002Fresearch.samsung.com\u002Faicenter_cambridge)\n\n\u003Cbr>\u003Cbr>\n\n[\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_62eec9889323.png\" alt=\"Image 7\" width=\"250\"\u002F>](https:\u002F\u002Fmila.quebec\u002Fen\u002F) &nbsp; &nbsp;\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_6c8731d96ff3.jpeg\" alt=\"Image 9\" width=\"250\"\u002F>](https:\u002F\u002Fwww.concordia.ca\u002F) &nbsp; &nbsp;\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_aa0be620b5f7.png\" alt=\"Image 8\" width=\"250\"\u002F>](https:\u002F\u002Flia.univ-avignon.fr\u002F) &nbsp; &nbsp;\n#\n# 📖 Citing SpeechBrain\n\nIf you use SpeechBrain in your research or business, please cite it using the following BibTeX entry:\n\n```bibtex\n@article{speechbrain_v1,\n  author  = {Mirco Ravanelli and Titouan Parcollet and Adel Moumen and Sylvain de Langen and Cem Subakan and Peter Plantinga and Yingzhi Wang and Pooneh Mousavi and Luca Della Libera and Artem Ploujnikov and Francesco Paissan and Davide Borra and Salah Zaiem and Zeyu Zhao and Shucong Zhang and Georgios Karakasidis and Sung-Lin Yeh and Pierre Champion and Aku Rouhe and Rudolf Braun and Florian Mai and Juan Zuluaga-Gomez and Seyed Mahed Mousavi and Andreas Nautsch and Ha Nguyen and Xuechen Liu and Sangeet Sagar and Jarod Duret and Salima Mdhaffar and Ga{{\\\"e}}lle Laperri{{\\`e}}re and Mickael Rouvier and Renato De Mori and Yannick Est{{\\`e}}ve},\n  title   = {Open-Source Conversational AI with SpeechBrain 1.0},\n  journal = {Journal of Machine Learning Research},\n  year    = {2024},\n  volume  = {25},\n  number  = {333},\n  url     = {http:\u002F\u002Fjmlr.org\u002Fpapers\u002Fv25\u002F24-0991.html}\n}\n\n@misc{speechbrain,\n  title={{SpeechBrain}: A General-Purpose Speech Toolkit},\n  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and 
Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},\n  year={2021},\n  eprint={2106.04624},\n  archivePrefix={arXiv},\n  primaryClass={eess.AS},\n  note={arXiv:2106.04624}\n}\n```\n\n","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fspeechbrain\u002Fspeechbrain\u002Fdevelop\u002Fdocs\u002Fimages\u002Fspeechbrain-logo.svg\" alt=\"SpeechBrain Logo\"\u002F>\n\u003C\u002Fp>\n\n[![Typing SVG](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_32faf143fc63.png)](https:\u002F\u002Fgit.io\u002Ftyping-svg)\n\n\n| 📘 [教程](https:\u002F\u002Fspeechbrain.readthedocs.io) | 🌐 [官网](https:\u002F\u002Fspeechbrain.github.io\u002F) | 📚 [文档](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Findex.html) | 🤝 [贡献指南](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Fcontributing.html) | 🤗 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspeechbrain) | ▶️ [YouTube](https:\u002F\u002Fwww.youtube.com\u002F@SpeechBrainProject) | 🐦 [X](https:\u002F\u002Ftwitter.com\u002FSpeechBrain1) |\n\n![GitHub 仓库星标数](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fspeechbrain\u002Fspeechbrain?style=social) *请支持我们的社区项目，给 GitHub 仓库点个赞吧！*\n\n**最新消息（2024年1月）：** 立即了解 SpeechBrain 1.0 的新特性 [点击这里](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1IEPfKRuvJRSjoxu22GZhb3czfVHsAy0s?usp=sharing)!\n#\n# 🗣️💬 SpeechBrain 能为你提供什么\n\n- SpeechBrain 是一个基于 [PyTorch](https:\u002F\u002Fpytorch.org\u002F) 的 **开源** 工具包，旨在加速 **对话式 AI** 的开发，即那些驱动 *语音助手*、*聊天机器人* 和 *大型语言模型* 的技术。\n\n- 它专为快速、便捷地构建先进的 **语音** 和 **文本** 处理技术而设计。\n\n\n## 🌐 愿景\n- 随着 [深度学习](https:\u002F\u002Fwww.deeplearningbook.org\u002F) 的兴起，曾经遥不可及的语音处理和自然语言处理等领域如今已变得触手可及。只需精心设计的神经网络和大规模数据集，便能实现目标。\n\n- 我们认为，现在正是打造一个 **综合性工具包** 的时候，它应像人脑一样，协同支持各种复杂对话式 AI 系统所需的技术。\n\n- 这些技术涵盖 
*语音识别*、*说话人识别*、*语音增强*、*语音分离*、*语言建模*、*对话系统* 等，并且还在不断扩展。\n\n- 为了实现我们长期追求的自然人机对话目标（包括为非言语人群服务），我们最近还新增了对 [EEG 模态](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB) 的支持。\n\n\n\n## 📚 训练配方\n- 我们在超过 40 个数据集上分享了 200 多种具有竞争力的训练配方，覆盖 20 种语音和文本处理任务（见下文）。\n\n- 我们既支持从头开始训练，也支持对预训练模型进行微调，例如 [Whisper](https:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fwhisper-large)、[Wav2Vec2](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fwav2vec2)、[WavLM](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fwavlm)、[Hubert](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fhubert)、[GPT2](https:\u002F\u002Fhuggingface.co\u002Fgpt2)、[Llama2](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fllama2)，以及其他模型。这些托管在 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002F) 上的模型可以轻松集成并进行微调。\n\n- 对于任何任务，您都可以使用以下命令来训练模型：\n```bash\npython train.py hparams\u002Ftrain.yaml\n```\n\n- 超参数被封装在一个 YAML 文件中，而训练过程则由 Python 脚本协调完成。\n\n- 我们在不同任务之间保持了统一的代码结构。\n\n- 为了提高实验的可重复性，训练日志和检查点都存储在 Dropbox 上。\n\n## \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspeechbrain\" target=\"_blank\"> \u003Cimg src=\"https:\u002F\u002Fhuggingface.co\u002Ffront\u002Fassets\u002Fhuggingface_logo.svg\" alt=\"drawing\" width=\"40\"\u002F> \u003C\u002Fa> 预训练模型与推理\n\n- 您可以访问托管在 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspeechbrain) 上的 100 多个预训练模型。\n- 每个模型都配有用户友好的接口，方便您进行无缝推理。例如，使用预训练模型转录语音仅需三行代码：\n\n```python\nfrom speechbrain.inference import EncoderDecoderASR\n\nasr_model = EncoderDecoderASR.from_hparams(source=\"speechbrain\u002Fasr-conformer-transformerlm-librispeech\", savedir=\"pretrained_models\u002Fasr-transformer-transformerlm-librispeech\")\nasr_model.transcribe_file(\"speechbrain\u002Fasr-conformer-transformerlm-librispeech\u002Fexample.wav\")\n```\n\n##  \u003Ca href=\"https:\u002F\u002Fspeechbrain.github.io\u002F\" target=\"_blank\"> 
\u003Cimg src=\"https:\u002F\u002Fupload.wikimedia.org\u002Fwikipedia\u002Fcommons\u002Fthumb\u002Fd\u002Fd0\u002FGoogle_Colaboratory_SVG_Logo.svg\u002F1200px-Google_Colaboratory_SVG_Logo.svg.png\" alt=\"drawing\" width=\"50\"\u002F> \u003C\u002Fa> 文档\n- 我们致力于推动包容性和教育事业。\n- 我们编写了 30 多篇 [教程](https:\u002F\u002Fspeechbrain.readthedocs.io)，不仅详细介绍了 SpeechBrain 的工作原理，还能帮助用户熟悉对话式 AI 技术。\n- 每个类或函数都有清晰的说明和可运行示例。更多详情请参阅 [文档](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Findex.html) 📚。\n\n\n\n## 🎯 应用场景\n- 🚀 **科研加速**：加快学术和工业研究进程。您可以轻松开发和集成新模型，并将其性能与我们的基线进行比较。\n\n- ⚡️ **快速原型开发**：非常适合时间紧迫的项目中的快速原型设计。\n\n- 🎓 **教学工具**：SpeechBrain 的简洁性使其成为宝贵的教学资源。许多机构，如 [Mila](https:\u002F\u002Fmila.quebec\u002Fen\u002F)、[康考迪亚大学](https:\u002F\u002Fwww.concordia.ca\u002F)、[阿维尼翁大学](https:\u002F\u002Funiv-avignon.fr\u002Fen\u002F) 等，都将其用于学生培训。\n\n#\n# 🚀 快速入门\n\n要开始使用 SpeechBrain，请按照以下简单步骤操作：\n\n## 🛠️ 安装\n\n### 通过 PyPI 安装\n\n1. 使用 PyPI 安装 SpeechBrain：\n\n    ```bash\n    pip install speechbrain\n    ```\n\n2. 在 Python 代码中导入 SpeechBrain：\n\n    ```python\n    import speechbrain as sb\n    ```\n\n### 从 GitHub 安装\n此安装方式推荐给希望进行实验并根据自身需求自定义工具包的用户。\n\n1. 克隆 GitHub 仓库并安装依赖项：\n\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain.git\n    cd speechbrain\n    pip install -r requirements.txt\n    pip install --editable .\n    ```\n\n2. 
在 Python 代码中导入 SpeechBrain：\n\n    ```python\n    import speechbrain as sb\n    ```\n\n由于使用了 `--editable` 标志，对 `speechbrain` 包所做的任何修改都会自动生效。\n\n## ✔️ 测试安装\n\n通过运行以下命令确保安装正确：\n\n```bash\npytest tests\npytest --doctest-modules speechbrain\n```\n\n## 🏃‍♂️ 运行实验\n\n在 SpeechBrain 中，您可以按照以下步骤为任何任务训练模型：\n\n```bash\ncd recipes\u002F\u003Cdataset>\u002F\u003Ctask>\u002F\npython experiment.py params.yaml\n```\n\n结果将保存在 YAML 文件中指定的 `output_folder` 中。\n\n## 📘 学习 SpeechBrain\n\n- **官网**：在[官方网站](https:\u002F\u002Fspeechbrain.github.io)上浏览相关信息。\n\n- **教程**：从涵盖基础功能的[基础教程](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fbasics.html)开始。更高级的教程和主题可在[SpeechBrain 文档](https:\u002F\u002Fspeechbrain.readthedocs.io)中的“Tutorial notebooks”类别中找到。\n\n- **文档**：SpeechBrain 的 API 详细信息、贡献指南以及代码均可在[文档](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Findex.html)中查阅。\n\n#\n# 🔧 支持的技术\n- SpeechBrain 是一个多功能框架，旨在实现对话式 AI 领域中的多种技术。\n- 它不仅擅长单个任务的实现，还能将各种技术组合成复杂的流水线。\n\n## 🎙️ 语音\u002F音频处理\n| 任务        | 数据集           | 技术\u002F模型  |\n| ------------- |-------------| -----|\n| 语音识别      | [AISHELL-1](recipes\u002FAISHELL-1), [CommonVoice](recipes\u002FCommonVoice), [DVoice](recipes\u002FDVoice), [LibriSpeech](recipes\u002FLibriSpeech), [MEDIA](recipes\u002FMEDIA), [RescueSpeech](recipes\u002FRescueSpeech), [Switchboard](recipes\u002FSwitchboard), [TIMIT](recipes\u002FTIMIT), [Tedlium2](recipes\u002FTedlium2), [Voicebank](recipes\u002FVoicebank) | [CTC](https:\u002F\u002Fwww.cs.toronto.edu\u002F~graves\u002Ficml_2006.pdf), [Transducers](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1211.3711.pdf?origin=publication_detail), [Transformers](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762), [Seq2Seq](http:\u002F\u002Fzhaoshuaijiang.com\u002Ffile\u002FHybrid_CTC_Attention_Architecture_for_End-to-End_Speech_Recognition.pdf), [用于 CTC 的 Beamsearch 
技术](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1911.01629.pdf),[seq2seq](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.02619.pdf),[transducers](https:\u002F\u002Fwww.merl.com\u002Fpublications\u002Fdocs\u002FTR2017-190.pdf)), [重打分](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1612.02695.pdf), [Conformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.08100), [Branchformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.02971), [Hyperconformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.18281), [Kaldi2-FST](https:\u002F\u002Fgithub.com\u002Fk2-fsa\u002Fk2) |\n| 说话人识别      | [VoxCeleb](recipes\u002FVoxCeleb) | [ECAPA-TDNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.07143), [ResNET](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.12592.pdf), [Xvectors](https:\u002F\u002Fwww.danielpovey.com\u002Ffiles\u002F2018_icassp_xvectors.pdf), [PLDA](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F6639151), [得分归一化](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fabs\u002Fpii\u002FS1051200499903603) |\n| 语音分离      | [WSJ0Mix](recipes\u002FWSJ0Mix), [LibriMix](recipes\u002FLibriMix), [WHAM!](recipes\u002FWHAMandWHAMR), [WHAMR!](recipes\u002FWHAMandWHAMR), [Aishell1Mix](recipes\u002FAishell1Mix), [BinauralWSJ0Mix](recipes\u002FBinauralWSJ0Mix) | [SepFormer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.13154), [RESepFormer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.09507), [SkiM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.10800), [DualPath RNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.06379), [ConvTasNET](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.07454) |\n| 语音增强      | [DNS](recipes\u002FDNS), [Voicebank](recipes\u002FVoicebank) | [SepFormer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.13154), [MetricGAN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.04874), [MetricGAN-U](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.05866), [SEGAN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.09452), 
[频谱掩蔽](http:\u002F\u002Fstaff.ustc.edu.cn\u002F~jundu\u002FPublications\u002Fpublications\u002FTrans2015_Xu.pdf), [时间掩蔽](http:\u002F\u002Fstaff.ustc.edu.cn\u002F~jundu\u002FPublications\u002Fpublications\u002FTrans2015_Xu.pdf) |\n| 可解释性 | [ESC50](recipes\u002FESC50) | [音频分类器的可听地图 (L-MAC)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.13086), [学习解释 (L2I)](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002Fe53280d73dd5389e820f4a6250365b0e-Paper-Conference.pdf), [非负矩阵分解 (NMF)](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002Fe53280d73dd5389e820f4a6250365b0e-Paper-Conference.pdf), [PIQ](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.12659) |\n| 语音生成 | [AudioMNIST](recipes\u002FAudioMNIST) | [扩散模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11239), [潜在扩散模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.10752) |\n| 文本转语音      | [LJSpeech](recipes\u002FLJSpeech), [LibriTTS](recipes\u002FLibriTTS) | [Tacotron2](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.05884), [零样本多说话人 Tacotron2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.02418), [FastSpeech2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.04558) |\n| 声码器      | [LJSpeech](recipes\u002FLJSpeech), [LibriTTS](recipes\u002FLibriTTS) | [HiFiGAN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05646), [DiffWave](https:\u002F\u002Farxiv.org\u002Fabs\u002F2009.09761) |\n| 口语语言理解 | [MEDIA](recipes\u002FMEDIA), [SLURP](recipes\u002FSLURP), [Fluent Speech Commands](recipes\u002Ffluent-speech-commands), [Timers and Such](recipes\u002Ftimers-and-such)  | [直接 SLU](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.01604), [解耦 SLU](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.01604), [多阶段 SLU](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.01604) |\n| 语音到语音翻译  | [CVSS](recipes\u002FCVSS) | [离散 Hubert](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.07447.pdf), [HiFiGAN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05646), 
[wav2vec2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11477) |\n| 语音翻译  | [Fisher CallHome（西班牙语）](recipes\u002FFisher-Callhome-Spanish), [IWSLT22（低资源）](recipes\u002FIWSLT22_lowresource) | [wav2vec2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11477) |\n| 情感分类      | [IEMOCAP](recipes\u002FIEMOCAP), [ZaionEmotionDataset](recipes\u002FZaionEmotionDataset) | [ECAPA-TDNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.07143), [wav2vec2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11477), [情感日志（Emotion Diarization）](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.12991) |\n| 语种识别 | [VoxLingua107](recipes\u002FVoxLingua107), [CommonLanguage](recipes\u002FCommonLanguage)| [ECAPA-TDNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.07143) |\n| 语音活动检测  | [LibriParty](recipes\u002FLibriParty) | [CRDNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2106.04624) |\n| 声音分类  | [ESC50](recipes\u002FESC50), [UrbanSound](recipes\u002FUrbanSound8k) | [CNN14](https:\u002F\u002Fgithub.com\u002Franchlai\u002Fsound_classification), [ECAPA-TDNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.07143) |\n| 自监督学习 | [CommonVoice](recipes\u002FCommonVoice), [LibriSpeech](recipes\u002FLibriSpeech) | [wav2vec2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11477) |\n| 度量学习 | [REAL-M](recipes\u002FREAL-M\u002Fsisnr-estimation), [Voicebank](recipes\u002FVoicebank) | [盲 SNR 估计](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.08909), [PESQ 学习](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.05866) |\n| 对齐 | [TIMIT](recipes\u002FTIMIT) | [CTC](https:\u002F\u002Fwww.cs.toronto.edu\u002F~graves\u002Ficml_2006.pdf), [维特比算法](https:\u002F\u002Fwww.cs.cmu.edu\u002F~cga\u002Fbehavior\u002Frabiner1.pdf), [前向前向算法](https:\u002F\u002Fwww.cs.cmu.edu\u002F~cga\u002Fbehavior\u002Frabiner1.pdf) |\n| 说话人日志 | [AMI](recipes\u002FAMI) | [ECAPA-TDNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2005.07143), [X-vectors](https:\u002F\u002Fwww.danielpovey.com\u002Ffiles\u002F2018_icassp_xvectors.pdf), 
[谱聚类](https:\u002F\u002Fweb.archive.org\u002Fweb\u002F20240305184559\u002Fhttp:\u002F\u002Fwww.ifp.illinois.edu\u002F~hning2\u002Fpapers\u002FNing_spectral.pdf) |\n\n## 📝 文本处理\n| 任务        | 数据集           | 技术\u002F模型  |\n| ------------- |-------------| -----|\n| 语言建模 | [CommonVoice](recipes\u002FCommonVoice), [LibriSpeech](recipes\u002FLibriSpeech)| [n-gram](https:\u002F\u002Fweb.stanford.edu\u002F~jurafsky\u002Fslp3\u002F3.pdf), [RNNLM](https:\u002F\u002Fwww.fit.vutbr.cz\u002Fresearch\u002Fgroups\u002Fspeech\u002Fpubli\u002F2010\u002Fmikolov_interspeech2010_IS100722.pdf), [TransformerLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762) |\n| 回答生成 | [MultiWOZ](recipes\u002FMultiWOZ\u002Fresponse_generation)| [GPT2](https:\u002F\u002Fd4mucfpksywv.cloudfront.net\u002Fbetter-language-models\u002Flanguage_models_are_unsupervised_multitask_learners.pdf), [Llama2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.09288) |\n| 字素到音素 | [LibriSpeech](recipes\u002FLibriSpeech) | [RNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.13703), [Transformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.13703), [课程学习](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.13703), [同形异义词损失](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.13703) |\n\n## 🧠 脑电图处理\n| 任务        | 数据集           | 技术\u002F模型  |\n| ------------- |-------------| -----|\n| 运动想象 | [BNCI2014001](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FMotorImagery\u002FBNCI2014001), [BNCI2014004](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FMotorImagery\u002FBNCI2014004), [BNCI2015001](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FMotorImagery\u002FBNCI2015001), 
[Lee2019_MI](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FMotorImagery\u002FLee2019_MI), [Zhou2016](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FMotorImagery\u002FZhou2016) | [EEGNet](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Fblob\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fmodels\u002FEEGNet.py), [ShallowConvNet](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Fblob\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fmodels\u002FShallowConvNet.py), [EEGConformer](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Fblob\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fmodels\u002FEEGConformer.py) |\n| P300 | [BNCI2014009](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FP300\u002FBNCI2014009), [EPFLP300](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FP300\u002FEPFLP300), [bi2015a](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FP300\u002Fbi2015a) | [EEGNet](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Fblob\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fmodels\u002FEEGNet.py) |\n| SSVEP | [Lee2019_SSVEP](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fhparams\u002FSSVEP\u002FLee2019_SSVEP) | [EEGNet](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Fblob\u002Fmain\u002Fbenchmarks\u002FMOABB\u002Fmodels\u002FEEGNet.py) |\n\n\n\n\n## 🔍 其他功能\n\nSpeechBrain 包含一系列原生功能，可增强对话式 AI 技术的开发。以下是一些示例：\n\n- **训练编排：** `Brain` 类是一个完全可定制的工具，用于管理数据上的训练和评估循环。它简化了训练循环，同时允许您覆写流程中的任何部分。\n  \n- **超参数管理：** 基于 YAML 
的超参数文件指定了所有超参数，从单个数字（如学习率）到完整对象（如自定义模型）。这种优雅的解决方案大大简化了训练脚本。\n  \n- **动态数据加载器：** 实现灵活高效的数据读取。\n  \n- **GPU 训练：** 支持单 GPU 和多 GPU 训练，包括分布式训练。\n  \n- **动态批处理：** 即时动态批处理可提高变长信号的处理效率。\n  \n- **混合精度训练：** 通过混合精度技术加速训练。\n  \n- **高效数据读取：** 通过 [WebDataset](https:\u002F\u002Fgithub.com\u002Fwebdataset\u002Fwebdataset) 从共享网络文件系统 (NFS) 高效读取大型数据集。\n  \n- **Hugging Face 集成：** 与 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspeechbrain) 无缝对接，支持 wav2vec2 和 Hubert 等流行模型。\n  \n- **Orion 集成：** 与 [Orion](https:\u002F\u002Fgithub.com\u002FEpistimio\u002Forion) 对接，用于超参数调优。\n  \n- **语音数据增广（Augmentation）技术：** 包括 SpecAugment、噪声、混响等。\n  \n- **数据准备脚本：** 包括为支持的数据集准备数据的脚本。\n\nSpeechBrain 正在快速发展，未来将继续努力支持越来越多的技术。\n\n\n## 📊 性能\n\n- SpeechBrain 集成了多种技术，其中一些技术达到了具有竞争力或最先进的性能水平。\n  \n- 如需全面了解不同任务、数据集和技术所取得的性能，请访问 [此处](PERFORMANCE.md)。\n\n#\n# 📜 许可证\n\n- SpeechBrain 根据 [Apache 许可证 2.0 版](https:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0) 发布，这是一种流行的类 BSD 许可证。\n- 您可以自由地以免费或商业目的重新分发 SpeechBrain，但需保留许可证头信息。与 GPL 不同，Apache 许可证并非“病毒式”许可，这意味着您没有义务公开源代码的修改内容。\n\n#\n# 🔮 未来计划\n\n我们对未来有着宏伟的计划，重点将放在以下几个方面：\n\n- **扩大规模：** 我们旨在为在大规模数据集上训练巨型模型提供全面的配方和技术。\n  \n- **缩小规模：** 尽管扩大规模能够带来前所未有的性能，但我们也意识到在生产环境中部署大型模型所面临的挑战。因此，我们将专注于实时、流式处理且占用空间小的对话式 AI。\n  \n- **多模态大型语言模型：** 我们设想未来能够使用单一基础模型来处理各种文本、语音和音频任务。我们的核心团队正致力于实现先进多模态 LLM 的训练。\n  \n#\n# 🤝 贡献\n\n- SpeechBrain 是一个由社区驱动的项目，由核心团队领导，并得到众多国际合作者的支持。\n- 我们欢迎社区的贡献和想法。更多信息请参阅 [这里](https:\u002F\u002Fspeechbrain.github.io\u002Fcontributing.html)。\n\n#\n\n# 🙏 赞助商\n\n- SpeechBrain 是一个以学术为导向的项目，依靠其贡献者的热情与投入。\n- 由于我们无法依赖大型公司的资源，因此我们非常感谢任何形式的支持，包括捐赠或与核心团队的合作。\n- 如果您有兴趣赞助 SpeechBrain，请通过 speechbrainproject@gmail.com 与我们联系。\n- 衷心感谢所有赞助商，包括目前的赞助商：\n\n\n\n[\u003Cimg src=\"https:\u002F\u002Fhuggingface.co\u002Ffront\u002Fassets\u002Fhuggingface_logo.svg\" alt=\"Image 1\" width=\"250\"\u002F>](https:\u002F\u002Fspeechbrain.github.io\u002Fimg\u002Fhf.ico) &nbsp; &nbsp;\n[\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_ca4b5331675d.png\" alt=\"Image 3\" width=\"250\"\u002F>](https:\u002F\u002Fviadialog.com\u002Fen\u002F) &nbsp; &nbsp;\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_3efd1a1edff3.png\" alt=\"Image 4\" width=\"250\"\u002F>](https:\u002F\u002Feurope.naverlabs.com\u002F)\n\n\u003Cbr>\u003Cbr>\n\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_77a46224ef61.png\" alt=\"Image 5\" width=\"250\"\u002F>](https:\u002F\u002Fwww.ovhcloud.com\u002Fen-ca\u002F) &nbsp; &nbsp;\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_bf83b7f16f03.png\" alt=\"Image 2\" width=\"250\"\u002F>](https:\u002F\u002Fusa.baidu.com\u002F) &nbsp; &nbsp;\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_22c03d6370f5.png\" alt=\"Image 6\" width=\"250\"\u002F>](https:\u002F\u002Fresearch.samsung.com\u002Faicenter_cambridge)\n\n\u003Cbr>\u003Cbr>\n\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_62eec9889323.png\" alt=\"Image 7\" width=\"250\"\u002F>](https:\u002F\u002Fmila.quebec\u002Fen\u002F) &nbsp; &nbsp;\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_6c8731d96ff3.jpeg\" alt=\"Image 9\" width=\"250\"\u002F>](https:\u002F\u002Fwww.concordia.ca\u002F) &nbsp; &nbsp;\n[\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_readme_aa0be620b5f7.png\" alt=\"Image 8\" width=\"250\"\u002F>](https:\u002F\u002Flia.univ-avignon.fr\u002F) &nbsp; &nbsp;\n#\n# 📖 引用 SpeechBrain\n\n如果您在研究或业务中使用 SpeechBrain，请使用以下 BibTeX 条目进行引用：\n\n```bibtex\n@article{speechbrain_v1,\n  author  = {Mirco Ravanelli and Titouan Parcollet and Adel Moumen and Sylvain de Langen and Cem Subakan 
and Peter Plantinga and Yingzhi Wang and Pooneh Mousavi and Luca Della Libera and Artem Ploujnikov and Francesco Paissan and Davide Borra and Salah Zaiem and Zeyu Zhao and Shucong Zhang and Georgios Karakasidis and Sung-Lin Yeh and Pierre Champion and Aku Rouhe and Rudolf Braun and Florian Mai and Juan Zuluaga-Gomez and Seyed Mahed Mousavi and Andreas Nautsch and Ha Nguyen and Xuechen Liu and Sangeet Sagar and Jarod Duret and Salima Mdhaffar and Ga{{\\\"e}}lle Laperri{{\\`e}}re and Mickael Rouvier and Renato De Mori and Yannick Est{{\\`e}}ve},\n  title   = {Open-Source Conversational AI with SpeechBrain 1.0},\n  journal = {Journal of Machine Learning Research},\n  year    = {2024},\n  volume  = {25},\n  number  = {333},\n  url     = {http:\u002F\u002Fjmlr.org\u002Fpapers\u002Fv25\u002F24-0991.html}\n}\n\n@misc{speechbrain,\n  title={{SpeechBrain}: A General-Purpose Speech Toolkit},\n  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},\n  year={2021},\n  eprint={2106.04624},\n  archivePrefix={arXiv},\n  primaryClass={eess.AS},\n  note={arXiv:2106.04624}\n}\n```","# SpeechBrain 快速上手指南\n\nSpeechBrain 是一个基于 PyTorch 的开源工具包，旨在加速对话式 AI（如语音助手、聊天机器人）的开发。它支持语音识别、说话人识别、语音增强、文本转语音等多种任务，并提供超过 200 个训练配方和 100+ 个预训练模型。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux, macOS 或 Windows (推荐 Linux)\n*   **Python**: 3.8 或更高版本\n*   **核心依赖**: [PyTorch](https:\u002F\u002Fpytorch.org\u002F) (需预先安装与您的 CUDA 版本匹配的 PyTorch)\n*   **硬件**: 推荐使用 NVIDIA GPU 进行模型训练和推理（可选，但强烈推荐）\n\n> **提示**: 如果您尚未安装 PyTorch，请访问 [PyTorch 官网](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F) 获取适合您环境的安装命令。国内用户可使用清华源加速安装：\n> ```bash\n> pip install torch 
torchvision torchaudio --index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 安装步骤\n\n您可以选择通过 PyPI 直接安装（适合快速使用），或从 GitHub 克隆源码（适合二次开发和实验）。\n\n### 方式一：通过 PyPI 安装（推荐新手）\n\n这是最快捷的安装方式，适合直接调用库功能。\n\n```bash\npip install speechbrain\n```\n*国内加速方案*:\n```bash\npip install speechbrain -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n安装完成后，在 Python 中导入即可使用：\n```python\nimport speechbrain as sb\n```\n\n### 方式二：从 GitHub 源码安装（推荐开发者）\n\n如果您需要修改源码、运行官方提供的训练配方（Recipes）或参与贡献，建议使用此方式。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain.git\ncd speechbrain\npip install -r requirements.txt\npip install --editable .\n```\n*国内加速方案 (克隆仓库)*:\n```bash\ngit clone https:\u002F\u002Fgitee.com\u002Fmirrors\u002Fspeechbrain.git  # 如果存在镜像，否则建议使用上述官方源配合代理\n# 或者配置 git 全局代理后使用官方源\n```\n*注意*: `--editable` 参数确保您对 `speechbrain` 代码的任何修改都会立即生效，无需重新安装。\n\n### 验证安装\n\n运行以下命令确保安装无误：\n\n```bash\npytest tests\npytest --doctest-modules speechbrain\n```\n\n## 基本使用\n\nSpeechBrain 的核心优势在于其简洁的接口，无论是加载预训练模型进行推理，还是启动新的训练任务都非常直观。\n\n### 1. 使用预训练模型进行推理\n\n只需几行代码即可调用托管在 HuggingFace 上的预训练模型。以下示例展示如何使用预训练模型进行语音转文字（ASR）：\n\n```python\nfrom speechbrain.inference import EncoderDecoderASR\n\n# 加载预训练模型\nasr_model = EncoderDecoderASR.from_hparams(\n    source=\"speechbrain\u002Fasr-conformer-transformerlm-librispeech\", \n    savedir=\"pretrained_models\u002Fasr-transformer-transformerlm-librispeech\"\n)\n\n# 转录音频文件\nresult = asr_model.transcribe_file(\"speechbrain\u002Fasr-conformer-transformerlm-librispeech\u002Fexample.wav\")\nprint(result)\n```\n\n### 2. 运行训练实验\n\nSpeechBrain 提供了统一的训练接口。所有的超参数都封装在 YAML 文件中，通过 Python 脚本驱动。\n\n假设您想在一个特定数据集（如 LibriSpeech）上运行语音识别任务，操作如下：\n\n```bash\ncd recipes\u002FLibriSpeech\u002FASR\u002FCTC\u002F\npython experiment.py params.yaml\n```\n\n*   **参数配置**: 修改 `params.yaml` 即可调整学习率、模型架构、数据路径等超参数。\n*   **结果保存**: 训练日志、模型检查点（Checkpoints）和最终结果将自动保存在 YAML 文件中指定的 `output_folder` 目录下。\n\n### 3. 
深入学习\n\n*   **教程**: 访问 [官方文档教程](https:\u002F\u002Fspeechbrain.readthedocs.io) 查看涵盖基础功能到高级应用的 30+ 个交互式教程。\n*   **模型库**: 在 [HuggingFace SpeechBrain 主页](https:\u002F\u002Fhuggingface.co\u002Fspeechbrain) 浏览更多支持的预训练模型。\n*   **代码结构**: 项目保持了高度一致的代码结构，不同任务间的代码逻辑相似，便于迁移和学习。","某初创团队正在开发一款面向听障人士的实时会议辅助系统，需要将语音精准转写为文字并区分不同发言人。\n\n### 没有 speechbrain 时\n- 团队需分别寻找语音识别、说话人分离和语音增强模型，代码框架不统一，集成耗时数周。\n- 缺乏现成的训练配方，复现论文效果困难，模型在嘈杂会议室环境下的准确率极低。\n- 微调 Whisper 或 Wav2Vec2 等预训练模型需要编写大量底层 PyTorch 代码，调试成本高昂。\n- 不同任务的数据预处理逻辑各异，导致数据管道混乱，难以快速迭代实验。\n\n### 使用 speechbrain 后\n- 直接调用 toolkit 内统一的接口，一键串联语音增强、分离与识别模块，原型开发缩短至 3 天。\n- 利用官方提供的 200+ 竞争级训练配方，快速适配会议室噪音场景，显著提升了复杂声学环境下的转写精度。\n- 仅需修改 YAML 配置文件即可完成对 HuggingFace 上预训练模型的微调，无需重写底层训练循环。\n- 所有任务共享一致的数据处理流程和代码结构，团队成员能高效协作并快速验证新算法。\n\nspeechbrain 通过提供 holistic 的全栈式解决方案，让开发者从繁琐的底层基建中解放，专注于构建真正包容的自然人机对话体验。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspeechbrain_speechbrain_d89b3746.png","SpeechBrain","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fspeechbrain_0ebb0d78.png","",null,"SpeechBrain1","https:\u002F\u002Fspeechbrain.github.io\u002F","https:\u002F\u002Fgithub.com\u002Fspeechbrain",[81,85,89,93],{"name":82,"color":83,"percentage":84},"Python","#3572A5",98.1,{"name":86,"color":87,"percentage":88},"Perl","#0298c3",1.5,{"name":90,"color":91,"percentage":92},"MATLAB","#e16737",0.3,{"name":94,"color":95,"percentage":96},"Shell","#89e051",0.1,11432,1680,"2026-04-09T10:33:17","Apache-2.0","未说明","未说明 (基于 PyTorch，通常建议 NVIDIA GPU 以加速训练)",{"notes":104,"python":101,"dependencies":105},"该工具是基于 PyTorch 的开源 toolkit。支持通过 PyPI (pip install speechbrain) 或 GitHub 源码安装。支持微调 Whisper, Wav2Vec2, Llama2 等预训练模型。训练日志和检查点托管在 Dropbox。具体版本依赖需查看 requirements.txt 文件，README 
中未列出确切版本号。",[106,107,108,109],"torch","torchaudio","transformers","huggingface_hub",[35,14,15,111],"音频",[113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,108,130,131],"speech-recognition","speech-toolkit","speaker-recognition","speech-to-text","speech-enhancement","speech-separation","audio","audio-processing","speech-processing","speechrecognition","asr","voice-recognition","spoken-language-understanding","speaker-diarization","speaker-verification","pytorch","huggingface","language-model","deep-learning","2026-03-27T02:49:30.150509","2026-04-10T15:52:20.206943",[135,140,145,150,155,160],{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},27815,"训练中断后恢复时，学习率（Learning Rate）为何会重置为异常值导致模型训练失败？","这通常是因为食谱（recipe）中的检查点配置（checkpointer）缺少了对优化器（optimizers）的引用。确保在初始化优化器的函数（如 init_optimizers()）中，将所有使用的优化器（例如 wav2vec 和 adam 的优化器）都添加到 checkpointer 的 recoverables 列表中。如果食谱文件中遗漏了这些优化器的注册，恢复训练时学习率调度器将无法正确加载之前的状态。","https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fissues\u002F1824",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},27816,"在 VoxCeleb 等数据集上训练速度极慢，即使使用多卡分布式训练（DDP）也没有提升，如何解决？","这通常是由于 CPU 线程数过多导致的瓶颈，特别是在高端 CPU（如 AMD EPYC）上。可以通过限制 PyTorch 使用的最大线程数来解决。在代码开头添加以下设置：\nimport torch\ntorch.set_num_threads(16)\n根据用户反馈，将线程数限制为 16 后，在 A100 GPU 上的训练速度提升了 6 倍。","https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fissues\u002F990",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},27817,"使用预训练的 ASR 模型进行解码时，出现严重的重复单词或插入错误（Insertion Errors），怎么办？","该问题在某些旧版本中可能存在，建议首先尝试升级到 SpeechBrain 的最新 master 分支，许多解码错误已在后续版本中修复。如果问题仍然存在，可以尝试将解码时的 batch size 设置为 1 以增加稳定性。如果特定音频仍出现重复，可能是该样本本身的特性或模型对该类数据的泛化能力不足，需检查数据预处理或微调模型。","https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fissues\u002F924",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},27818,"使用多张 NVIDIA GPU（如 RTX 3090）训练时，为什么只有一张显卡在工作，其他显卡空闲？","请确保在运行训练脚本时正确添加了分布式后端参数。对于 SpeechBrain，通常需要添加 --data_parallel_backend 
标志来启用多卡训练。命令示例：\npython train.py hparams\u002Ftransformer.yaml --data_folder \u003C数据路径> --data_parallel_backend\n如果仍然无效，请检查 PyTorch 版本与 CUDA 版本的兼容性，并确认 NCCL 后端是否正常初始化。","https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fissues\u002F868",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},27819,"使用 EncoderDecoderASR 进行推理的结果与训练时记录的 WER 评估结果不一致，原因是什么？","这种差异通常源于不同版本之间的配置变更或默认参数调整。如果在某个版本（如 0.5.10）正常而在新版本（如 0.5.12）出现异常，请检查两个版本间 hyperparams.yaml 文件的差异，特别是解码策略（beam size, lm weight 等）和预处理流程。建议直接使用训练时保存的完整实验文件夹进行推理，或者严格对齐训练和推理使用的 YAML 配置文件及 SpeechBrain 版本。","https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fissues\u002F1590",{"id":161,"question_zh":162,"answer_zh":163,"source_url":139},27820,"如何在自定义数据集上复现官方食谱（Recipe）的训练效果？","复现时请确保完全保留官方 YAML 文件中的模型结构和超参数设置，仅修改数据路径相关的变量（如 data_folder）。如果遇到问题，对比官方食谱（如 CommonVoice 或 LibriSpeech）中的 train.py 脚本，确保自定义脚本中正确注册了所有必要的组件（如优化器、调度器、检查点对象）。特别注意检查点配置中是否包含了所有需要恢复状态的对象（recoverables）。",[165,170,175,180,185,190,195,200,205,210,215,220,225,230,235],{"id":166,"version":167,"summary_zh":168,"released_at":169},188772,"v1.1.0","本次重大发布扩展了 SpeechBrain 对 SpeechLLM 的支持，并引入了多项新功能、配方和改进。\n\n## 亮点\n\n- 特征缓存 — 将提取的特征（例如 wav2vec 嵌入）保存到磁盘，并在需要时实时加载，从而跳过重新计算。这一功能支撑了我们在 LibriSpeech 数据集上的首个 ASR SpeechLLM 配方，实现了基于预计算嵌入的 LLM 训练。\n- 新配方 — 适用于 ASR 和翻译的 SpeechLLM、流式 SSL、FocalCodec 以及 SENSE 模型。\n\n此外还包含内部改进和错误修复。以下是主要变更的日志记录（省略了一些较小的修复）：\n\n## 变更内容\n* @TParcollet 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2850 中重构了 LLaMA 层块（代码来自三星 AI 中心剑桥分部）。\n* @Chaanks 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2790 中提供了 BestRQ 的流式处理配方。\n* @shucongzhang 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2765 中为 SpeechBrain SSL 准备了 Librilight 数据。\n* @ZhaoZeyu1995 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2772 中实现了基于 k2 的 CTC ASR 模型对齐功能。\n* @TParcollet 在 
https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2865 中提出了使用 LLaMA 的 SpeechLLM，并针对 CoVoST 数据集上的语音翻译任务开发了 Conformer 配方（代码来自三星 AI 中心剑桥分部）。\n* @pplantinga 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2985 中提出了特征缓存方案：CachedDynamicItem。\n* @younessdkhissi 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2975 中调整了转换器模型的贪心解码方法。\n* @Copilot 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2989 中用基于 soundfile 的音频 I\u002FO 包装器替换了 torchaudio 的 I\u002FO 接口。\n* @lucadellalib 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F3000 中引入了 FocalCodec [NeurIPS 2025]。\n* @Adel-Moumen 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F3005 中为缓存功能增加了压缩、文件名管理以及关闭和加载机制。\n* @Adel-Moumen 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F3008 中实现了 PaddedBatch 中按键配置填充的功能。\n* @Adel-Moumen 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2885 中开发了 SpeechLLM 的 LibriSpeech 配方。\n* @Adel-Moumen 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F3028 中移除了 CTC 的 CUDA 实现，并将转换器损失整合到现有框架中。\n* @MaryemBouziane 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2998 中添加了 SENSE 模型。\n\n## 新贡献者\n* @emmanuel-ferdman 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2900 中做出了首次贡献。\n* @ofiryaish 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2871 中做出了首次贡献。\n* @omidiu 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2923 中做出了首次贡献。\n* @svecjan 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2934 中做出了首次贡献。\n* @nouranali 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2855 中做出了首次贡献。\n* @OscarFree 在 
https:\u002F\u002Fgithub.com\u002Fs…","2026-03-30T14:41:48",{"id":171,"version":172,"summary_zh":173,"released_at":174},188773,"v1.0.3","## 主要变更\n* @TParcollet 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2767 中添加了人民语音数据集（3万小时）的Conformer ASR模型（代码来自三星剑桥人工智能中心）\n* @poonehmousavi 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2755 中实现了音频与音乐的自监督学习\n* @poonehmousavi 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2751 中新增了音频分词器\n* @shucongzhang 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2781 中引入了Libriheavy数据集（代码来自SAIC-剑桥）\n* @TParcollet 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2806 中提供了大规模ASR任务的Conformer配方（代码来自三星剑桥人工智能中心）\n* @shucongzhang 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2799 中为ASR引入了旋转位置嵌入（RoPE，代码来自三星剑桥）\n* @pplantinga 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2689 中实现了语音分析功能\n\n## 新贡献者\n* @rogiervd 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2734 中完成了首次贡献\n* @benniekiss 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2746 中完成了首次贡献\n* @mirofedurco 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2762 中完成了首次贡献\n* @kit1980 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2797 中完成了首次贡献\n* @IliasMAOUDJ 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2574 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcompare\u002Fv1.0.2...v1.0.3","2025-04-07T17:05:07",{"id":176,"version":177,"summary_zh":178,"released_at":179},188774,"v1.0.2","这是一次小版本更新，包含一些新功能和配方、内部改进、错误修复以及改进的教程。\n\n以下是主要变更的日志记录（省略了一些细微的错误修复）：\n\n## 值得注意的变更\n\n- 
添加了对适配器的支持，请参阅[新教程](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fnn\u002Fneural-network-adapters.html) (#2563)\n- 添加了 BEST-RQ，并提供了 LibriSpeech 的配方 (#2309)\n- 为 ASR 添加了 GigaSpeech 配方，包括 Conformer RNN-T 和 WavLM CTC (#2421)\n- 全面重构了 `fetch` 和 `Pretrained` 模块，尽可能减少了开箱即用时符号链接的使用（详见下文）\n- 将所有教程迁移到 SpeechBrain 仓库，并合并到[主文档](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002F)中，同时改进了文档（更新了过时信息、修复了失效链接等）。\n- 新增了以下教程：\n    - [语音识别指标](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Ftasks\u002Fasr-metrics.html)\n    - [用于低内存快速微调的神经网络适配器](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fnn\u002Fneural-network-adapters.html)\n    - [基于 Conformer 的流式语音识别](https:\u002F\u002Fspeechbrain.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fnn\u002Fconformer-streaming-asr.html)\n- 在可用时，TensorFloat32 现已默认启用 (#2682)\n\n### 新功能\n\n- 为 TransformerASR 添加了层级 Dropout 支持 (#2309)\n- 为 ASR、EEG 以及可能更多任务添加了符号翻转增强 (#2636)\n- 通过添加 `seed_everything` 并改进 DDP 的种子处理，提升了实验的可重复性 (#2654)\n- 添加了“quirks”模块，以一种易于查找的方式集中管理覆盖的 PyTorch 默认设置和变通方案，并配备了完善的日志记录功能 (#2558)\n\n### 错误修复\n\n- 提升了 VAD 推理的性能 (#2683)\n- 修复了 DDP 处理中的多个问题 (#2682)\n- 修复了损坏的增强集成测试 (#2628)\n- 修复了处理较新 CommonVoice 数据集时的错误 (#2647)\n- 修复了增强中的拼接 bug (#2717)\n- 移除了在 G2P 推理中错误引入的 EOS 标记 (#2718)\n- ……以及其他一些修复\n\n## 新的 `fetch` 语义\n\n我们对 `fetch` 的工作方式进行了一系列更改，这些更改会影响到多个方面，您需要了解。\n\n- 在与获取相关的各种代码中，例如推理接口的 `from_hparams` 方法中，`savedir` 参数指的是应将文件收集到的目录。**现在该参数是可选的，默认值为 `None`。**\n  - 当从本地路径或 HuggingFace 仓库获取文件（模型、音频等）时，不再需要指定目标目录。\n    - 对于本地文件获取，直接返回文件路径。\n    - 对于 HuggingFace 获取，直接使用 HuggingFace 缓存。\n    - 对于 URL 获取，仍需指定 `savedir`。\n  - 推理接口在加载音频文件时，将不再默认在您的工作目录中生成符号链接。\n  - 现在，默认情况下避免创建符号链接的做法变得更加……","2024-10-31T10:06:07",{"id":181,"version":182,"summary_zh":183,"released_at":184},188775,"v1.0.0","[![GitHub 仓库星标数](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fspeechbrain\u002Fspeechbrain?style=social) 
*请支持我们的社区项目，给 GitHub 仓库点个赞吧！*](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain)\n\n# 🚀 **SpeechBrain 1.0 有哪些新特性？**\n\n📅 2024年2月，我们发布了 [SpeechBrain 1.0](https:\u002F\u002Fspeechbrain.github.io\u002F)。这是由一支杰出的核心开发团队牵头，联合全球众多开发者历时一年共同努力的成果。\n\n\n## 📊 一些数据：\n - [SpeechBrain](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain) 已经发展成为一个重要的开源项目，跻身于最广泛使用的语音处理工具包之列。\n - 超过140位开发者为我们的仓库做出了贡献，在 [GitHub](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain) 上收获了超过7,300颗星。\n - 每月从 [PyPI](https:\u002F\u002Fpypi.org\u002Fproject\u002Fspeechbrain\u002F) 的下载量已达到惊人的20万次。\n - 面向对话式 AI 的配方数量已扩展至超过 [200个](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes)，并在 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspeechbrain\u002F) 上提供了100多个预训练模型。\n\n## 🌟 主要更新：\n- [SpeechBrain 1.0](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain) 引入了多项重大改进，进一步扩展了对多样化数据集和任务的支持，包括自然语言处理和脑电图（EEG）信号处理。\n  \n- 该工具包如今在对话式 AI 及各类序列处理应用中表现出色。\n  \n- 改进内容涵盖语音识别中的关键技术，例如可流式处理的 Conformer 转导器、与 K2 框架集成以实现有限状态转导器、CTC 解码及 n-gram 重打分；新增 CTC\u002F联合注意力 Beam Search 接口；增强了与 HuggingFace 模型（包括 GPT2 和 Llama2）的兼容性；并对数据增强、训练和推理流程进行了优化。\n\n- 我们还创建了一个专门用于基准测试的新仓库，地址是 [这里](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002F)。目前，该仓库包含了多个领域的基准测试，如语音自监督模型（[MP3S](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMP3S)）、持续学习（[CL-MASR](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FCL_MASR)）以及 EEG 处理（[SpeechBrain-MOABB](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FMOABB)）。\n\n有关详细的技术信息，请参阅下文。\n\n## 🔄 不兼容的变更\n熟悉 [SpeechBrain](https:\u002F\u002Fspeechbrain.github.io\u002F) 的用户都知道，我们一直尽力避免引入不向后兼容的更改。尽管 [SpeechBrain](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain) 一贯重视保持向后兼容性，但此次发布的新大版本为我们提供了进行重大增强和重构的机会。\n\n1. 
**🤗 HuggingFace 接口重构：**\n   - 此前，我们的接口仅限于特定模型，如 Whisper、HuBERT、WavL","2024-10-01T10:54:56",{"id":186,"version":187,"summary_zh":188,"released_at":189},188776,"v1.0.1","这是一次小版本更新，包含一些新功能和配方、内部改进、错误修复、兼容性提升，以及更广泛的 Python 向后兼容性。\n\n注意：v1.0.0 和 v1.0.1 早在此次发布日期之前就已在 GitHub 上发布。这些版本当时被误标为草稿。\n\n## 主要变更\n\n- 现在我们声明支持的 Python 版本范围为 `3.8` 至 `3.12`（此前为 `3.9` 至 `3.11`），并相应地改进了测试。\n- 对 Whisper 集成进行了重大改进，支持多种任务、修复微调问题、提升性能等（#2450）。\n- 改进了模型参数信息的打印（#2470）。\n- 新增了主要面向语音识别的指标（#2451）。\n- 为 v1.0 重构后中断的旧 `speechbrain.pretrained` 导入添加了向后兼容性支持（#2485）。\n- 更新了 BibTeX 引用。您可随时在此处找到最新版本：[https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain?tab=readme-ov-file#-citing-speechbrain](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain?tab=readme-ov-file#-citing-speechbrain)。\n\n### 配方及其他功能\n\n- 新增了 VoxPopuli 转换器配方（#2421）。\n- 升级了 CommonVoice 转换器和 Transformer 配方，并进行了多项改进（#2433、#2465、#2560）。\n- 重构了 ESC50 配方并新增 FocalNets 支持（#2499）；为便于解释，增加了单样本推理功能（#2616）。\n- 集成了 Speechtokenizer（#2497）。\n- 增加了对 HiFi-GAN 的支持，使其能够与新的 SSL 离散标记配合使用，并支持在 LJSpeech 和 LibriTTS 数据集上进行比特率可调的训练（#2571）。\n- 新增了用于音频分类器的 Listenable Maps 配方（#2538）。\n\n### 错误修复\n\n- 修复了 `ctc_segmentation` 中的错误（#2505）。\n- 对 `RelPosEncXL` 进行了修复和重构（#2498）。\n- 修复了输入归一化在某些情况下错误地对用户输入进行原地操作的问题（#2504）。\n- DDP 相关修复（#2506、#2633）。\n- 修复了在不使用流式处理功能时与较旧 torchaudio 版本的向后兼容性问题（#2532）。\n- 修复了分离与增强配方在遇到 NaN 时的行为问题（#2524）。\n- 修正了 LibriSpeech 转换器配方中 SpecAugment 过于激进的问题（#2548）。\n- 修复了 CommonVoice 数据准备过程中文件重复转换的问题（#2557）。\n- 在某些情况下修复了 SpectrogramDrop 的错误（#2564）。\n- 针对使用 `--editable` 标志时 Windows 安装可能出现的问题提供了一种潜在解决方案（#2541）。\n- 对 SSL 离散标记进行了改进并重构（#2509）。\n- 对四元数网络进行了修复和改进（#2464）。\n- 修复了 AISHELL 模型问题，并为 `TransformerASR` 中的 `causal` 参数添加了向后兼容性警告（#2606）。\n- …以及其他若干修复。\n\n### 内部变更\n\n- 通过在 CI 流程中引入拼写检查、头文件排序以及更严格的文档语法检查，提升了代码质量。\n- 显著提高了 CI 性能，从而改善了 PR 开发体验。\n- 重构了 SpeechBrain 的模块结构，在可能的情况下采用懒加载机制，以减少导入时间并大幅降低循环导入带来的困扰。\n- 
引入了一些基础设施，以便在导入状态字典时执行部分预处理。","2024-10-01T10:55:15",{"id":191,"version":192,"summary_zh":193,"released_at":194},188777,"v0.5.16","SpeechBrain 0.5.16 将是 SpeechBrain 1.0 正式发布之前的最后一个次要版本。\n\n在这个次要版本中，我们专注于优化现有功能，未引入任何接口变更，以确保平稳过渡到 SpeechBrain 1.0。在 1.0 版本中，将进行不向后兼容的修改。\n\nSpeechBrain 0.5.16 的主要亮点：\n\n**Bug 修复**：我们实施了多项小修复，以提升 SpeechBrain 的整体稳定性和性能。\n\n**测试与文档**：我们致力于改进测试基础设施和文档，从而提供更强大、更易用的用户体验。\n\n**扩展模型与数据集支持**：SpeechBrain 0.5.16 新增对若干新模型和数据集的支持，进一步增强了平台的通用性。详细列表请参阅下方的提交记录。\n\n敬请关注，并准备好迎接具有突破性的 SpeechBrain 1.0 版本！届时我们将推出重大变革和令人兴奋的新功能。\n\n感谢您一直以来对 SpeechBrain 社区的支持！\n\n## 提交记录\n- [[cea36b4](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002Fcea36b45e0445069d1fbaa0a7755dc58722b6c1c)]: 更新 README.md（Mirco Ravanelli）[#1599](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1599)\n- [[cead130](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002Fcead130eec8a5fb0c64638abc1a8ea219ff1146f)]: 更新 README.md（prometheus）[#975](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F975)\n- [[779c620](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F779c6206bc1aabdf851bfacd6c310d2720509b93)]: 更新 README.md（Mirco Ravanelli）[#2124](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F2124)\n- [[32af2ac](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F32af2aca61f6b44dbaa9debefce74b3af5f45dd7)]: 更新依赖项（避免弃用错误）（Mirco Ravanelli）[#975](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F975)\n- [[b039df1](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002Fb039df1b68ad1fe5ed33ec9fda0ad9d4755007d2)]: 小幅修复（Mirco Ravanelli）[#975](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F975)\n- [[07e7c73](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F07e7c7375299805835e8e42c6356f3757039f4ac)]: 
小幅修复（Mirco Ravanelli）[#975](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F975)\n- [[dac6842](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002Fdac684210aacb38b3746316833ad9fbc741748db)]: 更新 README.md（Mirco Ravanelli）[#975](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F975)\n- [[75f4c66](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F75f4c6659d2fcd498679002846c74cbb146a4324)]: 更新 README.md（Mirco Ravanelli）[#975](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F975)\n- [[327a3f5](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F327a3f5cff7b4470d2a9e886c2a405fd37e6319e)]: 修复 SSVEP YAML 文件（prometheus）[#975](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F975)\n- [[067d94e](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fs","2023-11-22T02:28:28",{"id":196,"version":197,"summary_zh":198,"released_at":199},188778,"v0.5.15","# SpeechBrain 0.5.15 发行说明\n\n我们非常高兴地宣布 SpeechBrain 0.5.15 版本正式发布！这一新版本标志着我们的开源对话式 AI 工具包迈出了重要一步。核心团队与日益壮大的贡献者网络通力合作，致力于在修复各类问题的同时，持续提升和扩展工具包的功能。\n\n## 新特性有哪些？\n\n本次发布具有里程碑意义，它很可能是备受期待的 SpeechBrain 1.0 正式版之前的最后一个次要版本，后者预计将在未来几个月内推出。在这一版本中，我们取得了多项重要进展，以下是关键成果的概览。如需了解所有变更的完整列表，请参阅文末的详细说明。\n\n## 值得关注的成果\n\n1. **基准测试仓库**：\n我们自豪地推出了 [benchmark repository](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks)，旨在为研究人员提供一套标准化的流程，用于对不同技术与模型进行基准测试和比较。目前，该仓库已包含以下基准测试：\n   - [CLMASR](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FCL_MASR)：评估面向新语言的语音识别中的持续学习技术。\n   - [MP3S 基准测试](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fbenchmarks\u002Ftree\u002Fmain\u002Fbenchmarks\u002FCL_MASR)：在多种任务及不同下游模型下（多探针）评估语音自监督表征。\n\n2. **用户体验优化**：\n为了方便用户访问日志和检查点文件，我们将日志和输出目录从 Google Drive 迁移到了 Dropbox。\n\n3. 
**性能更优的新模型**：\n我们实现了改进版的 Fastspeech 2.0，兼具高效性和较高性能。得益于更优秀的 Conformer 和 Branchformer 架构，在 Librispeech 数据集上的性能得到了显著提升。此外，我们还引入了高效的 Conformer Transducer 模型以及 SLI-GRU 模型。\n\n4. **事后可解释性技术支持**：\n我们现在为事后（post-hoc）可解释性技术提供了更好的支持。更多信息请参考 ESC50 的配方文档。\n\n5. **新增数据集**：\n我们新增了针对多个新数据集的配方，其中包括近期发布的 RescueSpeech（救援及特定领域环境下的语音识别）以及用于语音情感识别的 Zaion Emotion Dataset。\n\n6. **韩语 ASR 性能提升**：\n我们针对 KsponSpeech 数据集进行了优化，进一步提升了韩语自动语音识别的性能。\n\n7. **配方测试增强**：\n我们对配方测试进行了改进，以确保更高的可靠性和性能。\n\n8. **Whisper 相关修复**：\n我们修复了 Whisper 的配方及相关接口，同时保持向后兼容性。此举是为了应对原始模型中接口变化所带来的影响。\n\n9. **其他多项修复**：\n除上述成果外，我们还解决了若干其他问题，包括梯度累积以及其他一些小问题。","2023-07-22T18:07:47",{"id":201,"version":202,"summary_zh":203,"released_at":204},188779,"v0.5.14","本次发布是一个小版本，但非常重要。它在显著增加可用功能的同时，还修复了大量小 bug 和问题。以下是本次发布的成果概览；完整的变更详细列表请参阅本发布说明的底部。\n\n# 重要成果\n* 新增 22 名贡献者，非常感谢大家！\n* 新增 31 个配方（ASR、SLU、AST、AER、可解释性、SSL）。\n* 实现了配方的完全自动化测试。\n* 提高了持续集成对代码、URL、YAML、配方和 HuggingFace 模型的覆盖率。\n* 新增用于 ASR 的 Conformer Large 模型。\n* 集成了 Whisper，可用于微调或推理。\n* 完整重写并记录了 wav2vec2 的预训练流程。\n* 基于 IWSLT 的低资源语音翻译。\n* 还有许多其他新特性……详见下文。\n\n## 变更内容\n* 由 @anautsch 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1526 中修复 1522 号问题。\n* Bug 修复：修复了 OPEN_RIR 数据准备流程中的冲突。由 @xin-w8023 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1536 中完成。\n* 为 BinauralWSJ0Mix 添加噪声和混响版本。由 @huangzj421 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1502 中完成。\n* 由 @anautsch 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1566 中修复分布式命名空间问题。\n* 功能改进：使用成员字段替代硬编码。由 @xin-w8023 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1567 中实现。\n* 更新 Logo 至新版本。由 @pplantinga 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1575 中完成。\n* IWSLT 2022 语音翻译配方。由 @mzboito 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1475 中提供。\n* 修复 Issue #1277：timit 配方缺少大写选项。由 @Adel-Moumen 在 
https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1564 中完成。\n* 更新 README.md。由 @qanastek 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1577 中完成。\n* 从 huggingface_wav2vec 的所有 Transformer 层输出隐藏状态。由 @BenoitWang 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1570 中实现。\n* 修复 update_learning_rate 中的 bug。由 @wangxin22 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1578 中完成。\n* 修正 Tacotron2 parse_decoder_outputs() 中 unsqueeze() 输出的使用问题。由 @jqug 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1525 中完成。\n* 使用 SpeechBrain 实现 wav2vec2 预训练。由 @RuABraun 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1312 中完成。\n* 在 filter_ctc_output() 中移除冗余过滤操作。由 @olvb 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1584 中完成。\n* 修复 hubert 在 huggingface_wav2vec 中 output_all_hiddens 的问题。由 @gorinars 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1587 中完成。\n* 修复分离配方中 batch_evaluation 的返回值问题。由 @z-wony 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1555 中完成。\n* 修复尽管没有示例却导致 doctest 无限循环的问题。由 @anautsch 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1591 中完成。\n* 将文档中注明的最低 Python 版本修正为 3.7。由 @AsuMagic 在 https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1595 中完成。\n* Conformer","2023-03-24T17:40:58",{"id":206,"version":207,"summary_zh":208,"released_at":209},188780,"v0.5.13","这是一个小版本更新，改进了依赖项的版本规范。我们注意到 SpeechBrain 现已兼容 PyTorch 1.12，更新后的包也反映了这一点。有关相应更改的更多详细信息，请参阅每个提交旁边的链接问题。\n\n## 提交摘要\n- [[edb7714](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002Fedb771472e938aa16e2046a0875d5432d454fca7)]: 在核心模块中添加 no_sync 和 on_fit_batch_end 方法（Rudolf Arseni 
Braun）[#1449](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1449)\n- [[07155e9](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F07155e9ec3e1e912aa8d7c7798bc9252efd27b17)]: G2P 修复（flexthink）[#1473](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1473)\n- [[6602dab](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F6602dabe16fdb5cf632fd9e45c7d3a9bbfff25a7)]: 修复 #1469 问题，并为性能分析添加了最小化测试（anautsch）[#1476](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1476)\n- [[abbfab9](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002Fabbfab9dbb6353a1f8b773080e182cbc8ab7307c)]: 测试清理：通过代码检查工具；修复文档测试、单元测试和集成测试；在 CPU 上加载 YAML 文件（anautsch）[#1487](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1487)\n- [[1a16b41](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F1a16b411d667a8ba788ec8eeb1ec8a9f1fe32f9d)]: 修复 DDP 中的错误命令（=）[#1498](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1498)\n- [[0b0ec9d](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F0b0ec9d8bf7d5de4ce745b478ed91da2d457375c)]: 在 core.py 的 fit_batch() 中使用 no_sync()（Rudolf Arseni Braun）[#1449](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1449)\n- [[5c9b833](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F5c9b8332aad18791da075760ec147ae0e56ea30b)]: 移除 PyTorch 的最大兼容版本限制（Peter Plantinga）[#1504](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1504)\n- [[d0f4352](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002Fd0f4352eb578d8700f479c6181b7157e3801ce02)]: 移除对 HF Hub 的限制，因为它与 Colab 不兼容（Titouan）[#1508](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1508)\n- 
[[b78f6f8](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002Fb78f6f86cfb47a6c66ddb08372c1655f3c45833b)]: 向 Hub 添加修订版本号（Titouan）[#1510](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1510)\n- [[2c491a4](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F2c491a4662b1d7bb03f71c38a8f82d9faa766f33)]: 修复转换器损失输入设备的问题（Adel Moumen）[#1511](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1511)\n- [[4972f76](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F4972f76d92d42b529824f6b715ff5af321bcd753)]: 安装命令中缺少空格（pehonnet）[#1512](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1512)\n- [[6bc72af](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fcommit\u002F6bc72af63377c9389f0e7920c0bb55e5d809a4bd)]: 修复 core.py 中分布式采样器的 shuffle 参数（Rudolf Arseni Braun）[#1518](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fpull\u002F1518)\n- [[df7acd9]","2022-08-29T16:25:16",{"id":211,"version":212,"summary_zh":213,"released_at":214},188781,"v0.5.12","# 发行说明 - SpeechBrain v0.5.12\n\n我们付出了巨大的努力，非常高兴地宣布 SpeechBrain 的新版本发布！\n\nSpeechBrain 0.5.12 在未引入任何重大接口变更的情况下，显著扩展了工具包的功能。在此，我要衷心感谢众多为这一版本做出贡献的开发者。\n\n主要更新内容如下：\n\nA) **文本转语音 (TTS)**：我们开发了 SpeechBrain 的首个 TTS 系统。您可以在 [这里](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FLJSpeech\u002FTTS) 找到相关代码。该系统基于 Tacotron2 + HiFiGAN（作为声码器）。与简易推理接口配套的模型已在 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspeechbrain\u002Ftts-tacotron2-ljspeech) 上发布。\n\nB) **字素到音素转换 (G2P)**：我们开发了一种先进的字素到音素转换模型。代码可在 [这里](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FLibriSpeech\u002FG2P) 找到。当前版本在性能上显著优于我们之前的模型。\n\nC) **语音分离**：\n1. 
我们开发了一种名为“资源高效 SepFormer”（RE-Sepformer）的新型 *SepFormer* 模型。代码位于 [这里](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FWSJ0Mix\u002Fseparation)，预训练模型（配备简易推理接口）则可在 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspeechbrain\u002Fresepformer-wsj02mix) 上获取。\n2. 我们发布了针对 WSJMix 数据集的双耳语音分离配方。相关代码请见 [这里](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FBinauralWSJ0Mix)。\n3. 我们还发布了一个使用 AIShell mix 数据集的新配方。代码可在 [这里](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FAishell1Mix) 查看。\n\nD) **语音增强**：\n1. 我们发布了用于语音增强的 *SepFormer* 模型。代码位于 [这里](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FWHAMandWHAMR\u002Fenhancement)，而预训练模型（配备简易推理接口）则可在 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspeechbrain\u002Fsepformer-whamr-enhancement) 上找到。\n2. 我们实现了 *WideResNet* 用于语音增强，并将其应用于基于模仿损失的语音增强任务。代码位于 [这里](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FVoicebank\u002FMTL\u002FASR_enhance)，预训练模型（配备简易推理接口）则可在 [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspeechbrain\u002Fmtl-mimic-voicebank) 上获取。\n\nE) **特征前端**：\n1. 我们现在支持 *LEAF* 滤波器组。代码位于 [这里](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fblob\u002Fdevelop\u002Fspeechbrain\u002Flobes\u002Ffeatures.py)。使用该滤波器组的示例配方可在 [这里](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FGoogle-speech-commands) 找到。\n2. 我们现在支持 *SincConv 多通道* 模块（代码见 [这里](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fblob\u002Fdevelop\u002Fspeechbrain\u002Fnnet\u002FCNN.py)）。\n\nF) **配方重构**：\n1. 
我们重构了 *Voxceleb* 配方，并修复了归一化相关的问题。新的代码请见 [这里](https:\u002F\u002Fgithub.com\u002Fspeechbra","2022-06-26T20:19:45",{"id":216,"version":217,"summary_zh":218,"released_at":219},188782,"v0.5.11","Dear users, \r\nWe worked very hard, and we are very happy to announce the new version of SpeechBrain.\r\nSpeechBrain 0.5.11 further expands the toolkit without introducing any major interface change. \r\n\r\nThe main changes are the following:\r\n1. We implemented new recipes, such as:\r\n- [VoxLingua 107](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FVoxLingua107) for language identification.\r\n- [Sepformer](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FWHAMandWHAMR\u002Fenhancement) for speech enhancement \r\n- [MetricGAN-U](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FVoicebank\u002Fenhance\u002FMetricGAN-U) for speech enhancement\r\n- [SLURP](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FSLURP) with wav2vec for spoken language understanding.\r\n- [REALM](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FREAL-M\u002F) for speech separation with real data.\r\n- [Korean Speech Recognition](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FKsponSpeech) with KsponSpeech.\r\n- [CommonVoice](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Fblob\u002Fdevelop\u002Frecipes\u002FCommonVoice\u002FASR\u002Fseq2seq\u002Fhparams\u002Ftrain_de.yaml) for German. \r\n- [IEMOCAP](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FIEMOCAP) for language emotion recognition using wav2vec.\r\n\r\n2. 
Support for Dynamic batching with a [Tutorial](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1mypqbHDrusZaIbqPoiEGY-WIbnpMHa2I?usp=sharing) to help users familiarize themselves with it.\r\n\r\n3. Support for [wav2vec training](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FCommonVoice\u002Fself-supervised-learning\u002Fwav2vec2) within SpeechBrain.\r\n\r\n4. Developed an interface with Orion for hyperparameter tuning with a [Tutorial](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1b-5EOjZC7M9RvfWZ0Pq0HMV0KmQKu730?usp=sharing) to help users familiarize themselves with it.\r\n\r\n5. the torchaudio transducer loss is now [supported](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain\u002Ftree\u002Fdevelop\u002Frecipes\u002FLibriSpeech\u002FASR\u002Ftransducer). We also kept our numba implementation to help users customize the transducer loss part if needed.\r\n\r\n6. Improved CTC-Segmentation\r\n7. Fixed minor bugs and issues (e.g., fixed MVDR beamformer ).\r\n\r\nLet me thank all the amazing contributors for this achievement. 
\r\nPlease star our project if you appreciate our effort for the community.\r\nTogether, we are growing very fast, and we have big plans for the future.\r\n\r\nStay Tuned!","2021-12-20T04:22:27",{"id":221,"version":222,"summary_zh":223,"released_at":224},188783,"0.5.10","This version mainly expands the functionalities of SpeechBrain without introducing any backward-incompatible changes.\r\n\r\nNew Recipes:\r\n\r\n- Language Identification with CommonLanguage\r\n- EEG signal processing with ERPCore\r\n- Speech translation with Fisher-Call Home \r\n- Emotion Recognition with IEMOCAP \r\n- Voice Activity Detection with LibriParty\r\n- ASR with LibriSpeech wav2vec (WER=1.9 on test-clean)\r\n- SpeechEnhancement with CoopNet\r\n- SpeechEnhancement with SEGAN\r\n- Speech Separation with LibriMix, WHAM, and WHAMR\r\n- Support for guided attention\r\n- Spoken Language Understanding with SLURP\r\n\r\nBeyond that, we fixed some minor bugs and issues. ","2021-09-11T22:34:16",{"id":226,"version":227,"summary_zh":228,"released_at":229},188784,"v0.5.9","The main differences from the previous version are the following:\r\n\r\n- Added Wham\u002Fwhamr\u002Flibrimix for speech separation\r\n- Compatibility with PyTorch 1.9\r\n- Fixed minor bugs\r\n- Added SpeechBrain paper","2021-06-17T01:25:19",{"id":231,"version":232,"summary_zh":233,"released_at":234},188785,"v0.5.8","SpeechBrain 0.5.8 improves on the previous version in the following ways:\r\n\r\n- Added wav2vec support in TIMIT, CommonVoice, AISHELL-1 \r\n- Improved Fluent Speech Command Recipe\r\n- Improved SLU recipes\r\n- Recipe for UrbanSound8k\r\n- Fixed small bugs\r\n- Fixed typos","2021-06-06T01:42:20",{"id":236,"version":237,"summary_zh":238,"released_at":239},188786,"0.5.7","SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. 
Competitive or state-of-the-art performance is obtained in various domains.\r\nThe current version (v0.5.7) supports:\r\n- E2E Speech Recognition\r\n- Speaker Recognition (Identification and Verification)\r\n- Spoken Language Understanding (e.g., Intent recognition)\r\n- Speaker Diarization\r\n- Speech Enhancement\r\n- Speech Separation\r\n- Multi-microphone signal processing (beamforming, localization)\r\n\r\nMany other tasks will be supported soon. Take a look at our roadmap on [Discourse](https:\u002F\u002Fspeechbrain.discourse.group\u002Ft\u002Fspeechbrain-a-community-roadmap\u002F179).\r\nYour contributions are welcome! Please star our project to help us grow.\r\n\r\nFor more info and tutorials:\r\nhttps:\u002F\u002Fspeechbrain.github.io\u002F","2021-04-29T17:12:43"]