[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-lhotse-speech--lhotse":3,"tool-lhotse-speech--lhotse":62},[4,18,26,36,46,54],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160015,2,"2026-04-18T11:30:52",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":42,"last_commit_at":43,"category_tags":44,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,45],"插件",{"id":47,"name":48,"github_repo":49,"description_zh":50,"stars":51,"difficulty_score":32,"last_commit_at":52,"category_tags":53,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 
都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":55,"name":56,"github_repo":57,"description_zh":58,"stars":59,"difficulty_score":32,"last_commit_at":60,"category_tags":61,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[45,13,15,14],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":78,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":42,"env_os":95,"env_gpu":96,"env_ram":96,"env_deps":97,"category_tags":106,"github_topics":108,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":119,"updated_at":120,"faqs":121,"releases":152},9242,"lhotse-speech\u002Flhotse","lhotse","Tools for handling multimodal data in machine learning projects.","Lhotse 是一款专为机器学习项目设计的 Python 库，旨在让音频、文本、图像和视频等多模态数据的准备工作变得更加灵活高效。作为下一代 Kaldi 语音处理生态的重要组成部分，它主要解决了研究人员在处理大规模多模态数据集时面临的流程繁琐、格式不统一以及分布式训练数据加载困难等痛点。\n\n这款工具非常适合从事语音识别、多模态学习的研究人员以及需要构建复杂数据管道的 AI 开发者使用。Lhotse 的核心亮点在于其独特的“切片（Cuts）”概念，允许用户像操作代码对象一样灵活地裁剪、组合和变换音视频数据片段。它还提供了先进的数据加载算法，支持数据集混合、高效的动态分桶（bucketing）以及针对分布式训练的数据随机化处理。此外，Lhotse 内置了多种常用语料库的标准预处理方案，并推出了专为顺序 I\u002FO 优化的\"Lhotse 
Shar\"存储格式，显著提升了数据读取速度。通过纯 Python 的设计理念，Lhotse 降低了多模态数据处理的门槛，帮助用户更专注于模型创新而非数据清洗细节。","\u003Cdiv align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_readme_6f1d0db6afed.png\" width=376>\n\n[![PyPI Status](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Flhotse.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Flhotse)\n[![Python Versions](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Flhotse.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Flhotse\u002F)\n[![PyPI Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_readme_d50896e0bc3d.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Flhotse)\n[![Build Status](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https%3A%2F%2Factions-badge.atrox.dev%2Fpzelasko%2Flhotse%2Fbadge%3Fref%3Dmaster&style=flat)](https:\u002F\u002Factions-badge.atrox.dev\u002Fpzelasko\u002Flhotse\u002Fgoto?ref=master)\n[![Documentation Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_readme_13d664e1afd7.png)](https:\u002F\u002Flhotse.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Flhotse-speech\u002Flhotse\u002Fbranch\u002Fmaster\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Flhotse-speech\u002Flhotse)\n[![Code style: black](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcode%20style-black-000000.svg)](https:\u002F\u002Fgithub.com\u002Fpsf\u002Fblack)\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse-speech.github.io\u002Fblob\u002Fmaster\u002Fnotebooks\u002Flhotse-introduction.ipynb)\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2110.12561-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.12561)\n\n\u003C\u002Fdiv>\n\n\n# Lhotse\n\nLhotse is a Python library 
aiming to make multimodal (speech, audio, video, image, text) data preparation flexible and accessible to a wider community.\nAlongside [k2](https:\u002F\u002Fgithub.com\u002Fk2-fsa\u002Fk2), it is a part of the next generation [Kaldi](https:\u002F\u002Fgithub.com\u002Fkaldi-asr\u002Fkaldi) speech processing library.\n\n## Tutorial presentations and materials\n\n- (_Interspeech 2023_) Tutorial notebook [![Interspeech 2023 Tutorial](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1obZjUuVwks3A4oFX3gXFtPOM2LtrPQfL?usp=sharing)\n- (_Interspeech 2023_) [Tutorial slides](https:\u002F\u002Flivejohnshopkins-my.sharepoint.com\u002F:p:\u002Fg\u002Fpersonal\u002Fmwiesne2_jh_edu\u002FEYqRDl8cIr5BsVDxi1MOW5EBUpdqh10WFkzqixPIFM63hg?e=u3lrmL)\n- (_Interspeech 2021_) [Recorded lecture (3h)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=y6CJLFQlmhc&pp=ygUgaW50ZXJzcGVlY2ggMjAyMSBsaG90c2UgdHV0b3JpYWw%3D)\n\n## About\n\n### Main goals (updated for 2025)\n\n- Scale to multimodal data pipelines including audio, text, image, and video modalities.\n- Provide state-of-the-art dataloading algorithms such as dataset blending and efficient on-the-fly bucketing.\n- Handle data randomization (or de-duplication) for distributed multi-node training.\n- Attract a wider community to multimodal processing tasks with a **Python-centric design**.\n- Provide **standard data preparation recipes** for commonly used corpora.\n- Flexible data preparation for model training with the notion of **audio\u002Fvideo cuts**.\n- Support for efficient sequential I\u002FO data formats such as Lhotse Shar (similar to webdataset).\n\n### Tutorials\n\nWe offer the following tutorials available in `examples` directory:\n- Basic complete Lhotse workflow 
[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F00-basic-workflow.ipynb)\n- Transforming data with Cuts [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F01-cut-python-api.ipynb)\n- WebDataset integration [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F02-webdataset-integration.ipynb)\n- How to combine multiple datasets [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F03-combining-datasets.ipynb)\n- Lhotse Shar: storage format optimized for sequential I\u002FO and modularity [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F04-lhotse-shar.ipynb)\n- Image and Video Support in Lhotse [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F05-image-and-video-loading.ipynb)\n\n### Examples of use\n\nCheck out the following links to see how Lhotse is being put to use:\n- [Icefall recipes](https:\u002F\u002Fgithub.com\u002Fk2-fsa\u002Ficefall): where k2 and Lhotse meet.\n- Minimal ESPnet+Lhotse example: 
[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1HKSYPsWx_HoCdrnLpaPdYj5zwlPsM3NH)\n\n### Main ideas\n\nLike Kaldi, Lhotse provides standard data preparation recipes, but extends that with a seamless PyTorch integration\nthrough task-specific Dataset classes. The data and meta-data are represented in human-readable text manifests and\nexposed to the user through convenient Python classes.\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_readme_473db64c2fe8.png)\n\nLhotse introduces the notion of audio cuts, designed to ease the training data construction with operations such as\nmixing, truncation and padding that are performed on-the-fly to minimize the amount of storage required. Data\naugmentation and feature extraction are supported both in pre-computed mode, with feature matrices stored on disk\n(optionally using lilcom-compressed backends for better storage efficiency), and on-the-fly mode that computes the\ntransformations upon request. 
Additionally, Lhotse introduces feature-space cut mixing to make the best of both\nworlds.\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_readme_aaf00554498a.png)\n\n## Installation\n\nLhotse supports Python version 3.7 and later.\n\n### Pip\n\nLhotse is available on PyPI:\n\n    pip install lhotse\n\nTo install the latest, unreleased version, do:\n\n    pip install git+https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\n\n### Development installation\n\nFor development installation, you can fork\u002Fclone the GitHub repo and install with pip:\n\n    git clone https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\n    cd lhotse\n    pip install -e '.[dev]'\n    pre-commit install  # installs pre-commit hooks with style checks\n\n    # Running unit tests\n    pytest test\n\n    # Running linter checks\n    pre-commit run\n\nThis is an editable installation (`-e` option), meaning that your changes to the source code are automatically\nreflected when importing lhotse (no re-install needed). The `[dev]` part means you're installing extra dependencies\nthat are used to run tests, build documentation or launch Jupyter notebooks.\n\n### Environment variables\n\nLhotse uses several environment variables to customize its behavior. They are as follows:\n- `LHOTSE_REQUIRE_TORCHAUDIO` - when it is set to a value other than `1|True|true|yes`, we will not check for torchaudio being installed and will remove it from the requirements. It will disable many functionalities of Lhotse but the basic capabilities will remain (including reading audio with `soundfile`).\n- `LHOTSE_AUDIO_DURATION_MISMATCH_TOLERANCE` - used when we load audio from a file and receive a different number of samples than declared in `Recording.num_samples`. This is sometimes necessary because different codecs (or even different versions of the same codec) may use different padding when decoding compressed audio. 
Typically values up to 0.1, or even 0.3 (seconds) are still reasonable, and anything beyond that indicates a serious issue.\n- `LHOTSE_AUDIO_BACKEND` - may be set to any of the values returned from CLI `lhotse list-audio-backends` to override the default behavior of trial-and-error and always use a specific audio backend.\n- `LHOTSE_IO_BACKEND` - may be set to any of the values returned from CLI `lhotse list-io-backends` to override how Lhotse opens paths, URLs, and URIs via `open_best()` (for example when reading manifests or URL-backed `AudioSource`s). The same list is also available in Python via `lhotse.available_io_backends()`.\n- `LHOTSE_RESAMPLING_BACKEND` - may be set to any of the values returned from CLI `lhotse list-resampling-backends` to override the default behavior.\n- `LHOTSE_FEATURES_STORAGE_BACKEND` - may be set to any valid feature storage backend name (e.g. `numpy_files`, `lilcom_chunky`) to override the default feature storage backend (which is `numpy_files`). Use `lhotse.available_storage_backends()` to inspect the currently usable choices, or `lhotse.storage_backend_statuses()` \u002F CLI `lhotse list-storage-backends` for a full list that also marks unavailable backends with install hints. 
If you have `lilcom` installed and want smaller feature archives, `lilcom_chunky` is the preferred choice.\n- `LHOTSE_AUDIO_LOADING_EXCEPTION_VERBOSE` - when set to `1` we'll emit full exception stack traces when every available audio backend fails to load a given file (they might be very large).\n- `LHOTSE_DILL_ENABLED` - when it is set to any of `1|True|true|yes`, we will enable `dill`-based serialization of `CutSet` and `Sampler` across processes (it's disabled by default even when `dill` is installed).\n- `LHOTSE_LEGACY_OPUS_LOADING` - (`=1`) reverts to a legacy OPUS loading mechanism that triggered a new ffmpeg subprocess for each OPUS file.\n- `LHOTSE_PREPARING_RELEASE` - used internally by developers when releasing a new version of Lhotse.\n- `TORCHAUDIO_USE_BACKEND_DISPATCHER` - when set to `1` and torchaudio version is below 2.1, we'll enable the experimental ffmpeg backend of torchaudio.\n- `AIS_ENDPOINT` is read by the AIStore client to determine the AIStore endpoint URL. Required for AIStore dataloading.\n- `AIS_CONNECT_TIMEOUT` - used by AIStore SDK to set the connection timeout (in seconds) for AIStore client requests. Set to `0` to disable (no timeout). If not set, the SDK default is used (3s).\n- `AIS_READ_TIMEOUT` - used by AIStore SDK to set the read timeout (in seconds) for AIStore client requests. Set to `0` to disable (no timeout). If not set, the SDK default is used (20s).\n- `RANK`, `WORLD_SIZE`, `WORKER`, and `NUM_WORKERS` are internally used to inform Lhotse Shar dataloading subprocesses.\n- `READTHEDOCS` is internally used for documentation builds.\n- `LHOTSE_MSC_OVERRIDE_PROTOCOLS` - when set, it will override your input protocols before feeding to MSCIOBackend. Useful when you don't want to change your existing URL format but want to use MSCIOBackend. 
For example, if you have `s3:\u002F\u002Fs3-bucket\u002Fpath\u002Fto\u002Fmy\u002Fobject` and `gs:\u002F\u002Fgs-bucket\u002Fpath\u002Fto\u002Fmy\u002Fobject`, you can set `LHOTSE_MSC_OVERRIDE_PROTOCOLS=s3,gs` to override the URLs to `msc:\u002F\u002Fs3-bucket\u002Fpath\u002Fto\u002Fmy\u002Fobject` and `msc:\u002F\u002Fgs-bucket\u002Fpath\u002Fto\u002Fmy\u002Fobject`.\n- `LHOTSE_MSC_PROFILE` - when set, it will override your bucket name before feeding to MSCIOBackend. Useful when your MSC profile is not the same as your bucket name. For example, if you have `s3:\u002F\u002Fs3-bucket\u002Fpath\u002Fto\u002Fmy\u002Fobject`, you can set `LHOTSE_MSC_OVERRIDE_PROTOCOLS=s3` and `LHOTSE_MSC_PROFILE=msc-s3-profile` to override the URL to `msc:\u002F\u002Fmsc-s3-profile\u002Fpath\u002Fto\u002Fmy\u002Fobject`.\n- `LHOTSE_MSC_BACKEND_FORCED` - when set to `True`, forces Lhotse to use MSCIOBackend for all URLs. Use with caution as functionality may break if MSC does not support the provided URL format.\n\n### Optional dependencies\n\n**Other pip packages.** You can leverage optional features of Lhotse by installing the relevant supporting package:\n- `pip install lhotse[lilcom]` to enable lilcom-compressed feature and array storage backends. If storage efficiency is important, `lilcom_chunky` is the preferred feature-storage backend once this dependency is installed.\n- `torchcodec` (>= 0.9, requires torch >= 2.9) is supported as an audio backend when detected. It is a PyTorch-native audio decoder built on FFmpeg. Install it via `pip install torchcodec`. When installed, it takes precedence over torchaudio in the default backend chain.\n- `torchaudio` used to be a core dependency in Lhotse, but is now optional. Refer to [official PyTorch documentation for installation](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F).\n- `pip install lhotse[kaldi]` for a maximal feature set related to Kaldi compatibility. 
It includes libraries such as `kaldi_native_io` (a more efficient variant of `kaldi_io`) and `kaldifeat` that port some of Kaldi's functionality into Python.\n- `pip install lhotse[orjson]` for up to 50% faster reading of JSONL manifests.\n- `pip install lhotse[webdataset]`. We support \"compiling\" your data into WebDataset tarball format for more effective IO. You can still interact with the data as if it were a regular lazy CutSet. To learn more, check out the following tutorial: [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F02-webdataset-integration.ipynb)\n- `pip install h5py` if you want to extract speech features and store them as HDF5 arrays.\n- `pip install dill`. When `dill` is installed, we'll use it to pickle a CutSet that uses a lambda function in calls such as `.map` or `.filter`. This is helpful in PyTorch DataLoader with `num_jobs>0`. Without `dill`, depending on your environment, you'll see an exception or a hanging script.\n- `pip install aistore` to read manifests, tar files, and other data from AIStore using AIStore-supported URLs (set `AIS_ENDPOINT` environment variable to activate it). See [AIStore documentation](https:\u002F\u002Faiatscale.org) for more details.\n- `pip install smart_open` to read and write manifests and data in any location supported by `smart_open` (e.g. cloud, http).\n- `pip install opensmile` for feature extraction using the OpenSmile toolkit's Python wrapper.\n- `pip install multi-storage-client` to read and write manifests and data in different storage backends. 
See [multi-storage-client](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fmulti-storage-client) for more details.\n\n**sph2pipe.** For reading older LDC SPHERE (.sph) audio files that are compressed with codecs unsupported by ffmpeg and sox, please run:\n\n    # CLI\n    lhotse install-sph2pipe\n\n    # Python\n    from lhotse.tools import install_sph2pipe\n    install_sph2pipe()\n\nIt will download it to `~\u002F.lhotse\u002Ftools`, compile it, and auto-register in `PATH`. The program should be automatically detected and used by Lhotse.\n\n## Examples\n\nWe have example recipes showing how to prepare data and load it in Python as a PyTorch `Dataset`.\nThey are located in the `examples` directory.\n\nA short snippet to show how Lhotse can make audio data preparation quick and easy:\n\n```python\nfrom torch.utils.data import DataLoader\nfrom lhotse import CutSet, Fbank\nfrom lhotse.dataset import VadDataset, SimpleCutSampler\nfrom lhotse.recipes import prepare_switchboard\n\n# Prepare data manifests from a raw corpus distribution.\n# The RecordingSet describes the metadata about audio recordings;\n# the sampling rate, number of channels, duration, etc.\n# The SupervisionSet describes metadata about supervision segments:\n# the transcript, speaker, language, and so on.\nswbd = prepare_switchboard('\u002Fexport\u002Fcorpora3\u002FLDC\u002FLDC97S62')\n\n# CutSet is the workhorse of Lhotse, allowing for flexible data manipulation.\n# We create 5-second cuts by traversing SWBD recordings in windows.\n# No audio data is actually loaded into memory or stored to disk at this point.\ncuts = CutSet.from_manifests(\n    recordings=swbd['recordings'],\n    supervisions=swbd['supervisions']\n).cut_into_windows(duration=5)\n\n# We compute the log-Mel filter energies and store them on disk;\n# Then, we pad the cuts to 5 seconds to ensure all cuts are of equal length,\n# as the last window in each recording might have a shorter duration.\n# The padding will be performed once the 
features are loaded into memory.\ncuts = cuts.compute_and_store_features(\n    extractor=Fbank(),\n    storage_path='feats',\n    num_jobs=8\n).pad(duration=5.0)\n\n# Construct a PyTorch Dataset class for the Voice Activity Detection task:\ndataset = VadDataset()\nsampler = SimpleCutSampler(cuts, max_duration=300)\ndataloader = DataLoader(dataset, sampler=sampler, batch_size=None)\nbatch = next(iter(dataloader))\n```\n\nThe `VadDataset` will yield a batch with pairs of feature and supervision tensors such as the following - the speech\nstarts roughly at the first second (100 frames):\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_readme_38d97c8df09e.png)\n\n# Acknowledgment\n\nSome contributions to this project were supported by National Science Foundation CCRI award 2120435.\n","\u003Cdiv align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_readme_6f1d0db6afed.png\" width=376>\n\n[![PyPI Status](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Flhotse.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Flhotse)\n[![Python Versions](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Flhotse.svg)](https:\u002F\u002Fpypi.org\u002Fproject\u002Flhotse\u002F)\n[![PyPI Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_readme_d50896e0bc3d.png)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Flhotse)\n[![Build Status](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https%3A%2F%2Factions-badge.atrox.dev%2Fpzelasko%2Flhotse%2Fbadge%3Fref%3Dmaster&style=flat)](https:\u002F\u002Factions-badge.atrox.dev\u002Fpzelasko\u002Flhotse\u002Fgoto?ref=master)\n[![Documentation 
Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_readme_13d664e1afd7.png)](https:\u002F\u002Flhotse.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Flhotse-speech\u002Flhotse\u002Fbranch\u002Fmaster\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Flhotse-speech\u002Flhotse)\n[![Code style: black](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcode%20style-black-000000.svg)](https:\u002F\u002Fgithub.com\u002Fpsf\u002Fblack)\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse-speech.github.io\u002Fblob\u002Fmaster\u002Fnotebooks\u002Flhotse-introduction.ipynb)\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2110.12561-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.12561)\n\n\u003C\u002Fdiv>\n\n\n# Lhotse\n\nLhotse 是一个 Python 库，旨在使多模态（语音、音频、视频、图像、文本）数据的准备更加灵活，并让更广泛的社区能够轻松使用。它与 [k2](https:\u002F\u002Fgithub.com\u002Fk2-fsa\u002Fk2) 一起，构成了下一代 [Kaldi](https:\u002F\u002Fgithub.com\u002Fkaldi-asr\u002Fkaldi) 语音处理库的一部分。\n\n## 教程演示和资料\n\n- (_Interspeech 2023_) 教程笔记本 [![Interspeech 2023 教程](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1obZjUuVwks3A4oFX3gXFtPOM2LtrPQfL?usp=sharing)\n- (_Interspeech 2023_) [教程幻灯片](https:\u002F\u002Flivejohnshopkins-my.sharepoint.com\u002F:p:\u002Fg\u002Fpersonal\u002Fmwiesne2_jh_edu\u002FEYqRDl8cIr5BsVDxi1MOW5EBUpdqh10WFkzqixPIFM63hg?e=u3lrmL)\n- (_Interspeech 2021_) [录制讲座 (3小时)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=y6CJLFQlmhc&pp=ygUgaW50ZXJzcGVlY2ggMjAyMSBsaG90c2UgdHV0b3JpYWw%3D)\n\n## 关于\n\n### 主要目标（更新至 2025 年）\n\n- 扩展到包括音频、文本、图像和视频模态在内的多模态数据流水线。\n- 提供最先进的数据加载算法，如数据集混合和高效的按需分桶。\n- 处理分布式多节点训练中的数据随机化（或去重）问题。\n- 以 **以 Python 为中心的设计** 吸引更广泛的社区参与多模态处理任务。\n- 为常用语料库提供 
**标准的数据准备配方**。\n- 通过 **音频\u002F视频片段** 的概念，实现灵活的模型训练数据准备。\n- 支持高效的顺序 I\u002FO 数据格式，例如 Lhotse Shar（类似于 webdataset）。\n\n### 教程\n\n我们在 `examples` 目录中提供了以下教程：\n- 基本完整的 Lhotse 工作流程 [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F00-basic-workflow.ipynb)\n- 使用 Cuts 转换数据 [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F01-cut-python-api.ipynb)\n- WebDataset 集成 [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F02-webdataset-integration.ipynb)\n- 如何合并多个数据集 [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F03-combining-datasets.ipynb)\n- Lhotse Shar：针对顺序 I\u002FO 和模块化的存储格式优化 [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F04-lhotse-shar.ipynb)\n- Lhotse 中的图像和视频支持 [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F05-image-and-video-loading.ipynb)\n\n### 使用示例\n\n请查看以下链接，了解 Lhotse 的实际应用：\n- [Icefall 配方](https:\u002F\u002Fgithub.com\u002Fk2-fsa\u002Ficefall)：k2 和 Lhotse 的结合点。\n- 最小化 ESPnet+Lhotse 
示例：[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1HKSYPsWx_HoCdrnLpaPdYj5zwlPsM3NH)\n\n### 核心理念\n\n与 Kaldi 类似，Lhotse 提供标准的数据准备配方，但通过特定任务的 Dataset 类实现了与 PyTorch 的无缝集成。数据和元数据以人类可读的文本清单形式呈现，并通过便捷的 Python 类暴露给用户。\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_readme_473db64c2fe8.png)\n\nLhotse 引入了音频片段的概念，旨在简化训练数据的构建，通过诸如混合、截断和填充等操作，在运行时完成，从而最大限度地减少所需的存储空间。数据增强和特征提取既支持预计算模式——将特征矩阵存储在磁盘上（可选使用 lilcom 压缩后端以提高存储效率），也支持按需计算变换的实时模式。此外，Lhotse 还引入了特征空间片段混合，以兼顾两者的优点。\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_readme_aaf00554498a.png)\n\n## 安装\n\nLhotse 支持 Python 3.7 及更高版本。\n\n### Pip\n\nLhotse 已在 PyPI 上发布：\n\n    pip install lhotse\n\n若要安装最新的未发布版本，请执行：\n\n    pip install git+https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\n\n### 开发环境安装\n\n对于开发环境的安装，您可以 fork 或 clone GitHub 仓库，然后使用 pip 安装：\n\n    git clone https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\n    cd lhotse\n    pip install -e '.[dev]'\n    pre-commit install  # 安装带有代码风格检查的 pre-commit 钩子\n\n    # 运行单元测试\n    pytest test\n\n    # 运行 linter 检查\n    pre-commit run\n\n这是一种可编辑的安装方式（`-e` 选项），这意味着您对源代码所做的更改会在导入 lhotse 时自动生效（无需重新安装）。`[dev]` 部分表示您正在安装用于运行测试、构建文档或启动 Jupyter 笔记本的额外依赖项。\n\n### 环境变量\n\nLhotse 使用多个环境变量来定制其行为。这些变量如下：\n\n- `LHOTSE_REQUIRE_TORCHAUDIO` - 当该变量被设置且其值不是 `1`、`True`、`true` 或 `yes` 时，Lhotse 将不会检查是否已安装 torchaudio，并将其从依赖项中移除。这会禁用 Lhotse 的许多功能，但基本功能（包括使用 `soundfile` 读取音频）仍将保留。\n  \n- `LHOTSE_AUDIO_DURATION_MISMATCH_TOLERANCE` - 当从文件加载音频时，如果实际采样数与 `Recording.num_samples` 中声明的采样数不一致，此变量将发挥作用。有时这是必要的，因为不同的编解码器（甚至同一编解码器的不同版本）在解码压缩音频时可能会采用不同的填充方式。通常，0.1 秒或甚至 0.3 秒的差异仍属合理范围；超过此范围则表明存在严重问题。\n\n- `LHOTSE_AUDIO_BACKEND` - 可设置为 CLI 命令 `lhotse list-audio-backends` 返回的任意值，以覆盖默认的尝试机制，强制使用特定的音频后端。\n\n- `LHOTSE_IO_BACKEND` - 可设置为 CLI 命令 `lhotse list-io-backends` 返回的任意值，以覆盖 Lhotse 使用 `open_best()` 打开路径、URL 和 URI 
的方式（例如在读取清单文件或基于 URL 的 `AudioSource` 时）。同样，Python 中也可通过 `lhotse.available_io_backends()` 获取该列表。\n\n- `LHOTSE_RESAMPLING_BACKEND` - 可设置为 CLI 命令 `lhotse list-resampling-backends` 返回的任意值，以覆盖默认行为。\n\n- `LHOTSE_FEATURES_STORAGE_BACKEND` - 可设置为任何有效的特征存储后端名称（如 `numpy_files`、`lilcom_chunky`），以覆盖默认的特征存储后端（即 `numpy_files`）。可使用 `lhotse.available_storage_backends()` 检查当前可用的选项，或使用 `lhotse.storage_backend_statuses()` 和 CLI 命令 `lhotse list-storage-backends` 查看完整列表，其中还标注了不可用的后端及其安装提示。如果您已安装 `lilcom` 并希望获得更小的特征归档文件，建议选择 `lilcom_chunky`。\n\n- `LHOTSE_AUDIO_LOADING_EXCEPTION_VERBOSE` - 当设置为 `1` 时，若所有可用的音频后端均无法加载指定文件，Lhotse 将输出完整的异常堆栈跟踪（可能非常庞大）。\n\n- `LHOTSE_DILL_ENABLED` - 当设置为 `1`、`True`、`true` 或 `yes` 时，将在进程间启用基于 `dill` 的 `CutSet` 和 `Sampler` 序列化（即使已安装 `dill`，默认也为禁用状态）。\n\n- `LHOTSE_LEGACY_OPUS_LOADING` - （设为 `1`）将回退到旧版 OPUS 加载机制，该机制会为每个 OPUS 文件启动一个新的 ffmpeg 子进程。\n\n- `LHOTSE_PREPARING_RELEASE` - 开发者在发布 Lhotse 新版本时内部使用。\n\n- `TORCHAUDIO_USE_BACKEND_DISPATCHER` - 当设置为 `1` 且 torchaudio 版本低于 2.1 时，将启用 torchaudio 的实验性 ffmpeg 后端。\n\n- `AIS_ENDPOINT` 由 AIStore 客户端读取，用于确定 AIStore 端点 URL。这是进行 AIStore 数据加载所必需的。\n\n- `AIS_CONNECT_TIMEOUT` - 由 AIStore SDK 用于设置 AIStore 客户端请求的连接超时时间（单位：秒）。设置为 `0` 表示禁用超时（无超时）。若未设置，则使用 SDK 默认值（3 秒）。\n\n- `AIS_READ_TIMEOUT` - 由 AIStore SDK 用于设置 AIStore 客户端请求的读取超时时间（单位：秒）。设置为 `0` 表示禁用超时（无超时）。若未设置，则使用 SDK 默认值（20 秒）。\n\n- `RANK`、`WORLD_SIZE`、`WORKER` 和 `NUM_WORKERS` 在内部用于向 Lhotse Shar 数据加载子进程中传递信息。\n\n- `READTHEDOCS` 在内部用于文档构建。\n\n- `LHOTSE_MSC_OVERRIDE_PROTOCOLS` - 当设置时，会在输入数据传递给 MSCIOBackend 之前覆盖协议部分。这在您不想更改现有 URL 格式但仍希望使用 MSCIOBackend 时非常有用。例如，如果您有 `s3:\u002F\u002Fs3-bucket\u002Fpath\u002Fto\u002Fmy\u002Fobject` 和 `gs:\u002F\u002Fgs-bucket\u002Fpath\u002Fto\u002Fmy\u002Fobject`，可以设置 `LHOTSE_MSC_OVERRIDE_PROTOCOLS=s3,gs`，将 URL 覆盖为 `msc:\u002F\u002Fs3-bucket\u002Fpath\u002Fto\u002Fmy\u002Fobject` 和 `msc:\u002F\u002Fgs-bucket\u002Fpath\u002Fto\u002Fmy\u002Fobject`。\n\n- `LHOTSE_MSC_PROFILE` - 当设置时，会在输入数据传递给 MSCIOBackend 之前覆盖存储桶名称。这在您的 MSC 配置文件名称与存储桶名称不一致时很有用。例如，如果您有 
`s3:\u002F\u002Fs3-bucket\u002Fpath\u002Fto\u002Fmy\u002Fobject`，可以同时设置 `LHOTSE_MSC_OVERRIDE_PROTOCOLS=s3` 和 `LHOTSE_MSC_PROFILE=msc-s3-profile`，将 URL 赋值为 `msc:\u002F\u002Fmsc-s3-profile\u002Fpath\u002Fto\u002Fmy\u002Fobject`。\n\n- `LHOTSE_MSC_BACKEND_FORCED` - 当设置为 `True` 时，将强制 Lhotse 对所有 URL 使用 MSCIOBackend。请谨慎使用，因为如果 MSC 不支持提供的 URL 格式，可能会导致功能失效。\n\n### 可选依赖项\n\n**其他 pip 包。** 您可以通过安装相关支持包来利用 Lhotse 的可选功能：\n- `pip install lhotse[lilcom]` 以启用 lilcom 压缩特性和数组存储后端。如果存储效率很重要，在安装此依赖项后，`lilcom_chunky` 是首选的特性存储后端。\n- `torchcodec`（>= 0.9，需要 torch >= 2.9）在检测到时被支持为音频后端。它是基于 FFmpeg 构建的 PyTorch 原生音频解码器。可通过 `pip install torchcodec` 安装。安装后，它会在默认后端链中优先于 torchaudio。\n- `torchaudio` 曾经是 Lhotse 的核心依赖项，但现在已成为可选项。请参阅 [PyTorch 官方安装文档](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F)。\n- `pip install lhotse[kaldi]` 以获得与 Kaldi 兼容的最大化功能集。它包括诸如 `kaldi_native_io`（`kaldi_io` 的更高效变体）和 `kaldifeat` 等库，这些库将部分 Kaldi 功能移植到 Python 中。\n- `pip install lhotse[orjson]` 以实现高达 50% 的 JSONL 清单读取速度提升。\n- `pip install lhotse[webdataset]`。我们支持将您的数据“编译”成 WebDataset tarball 格式，以提高 I\u002FO 效率。您仍然可以像操作普通的惰性 CutSet 一样与数据交互。要了解更多信息，请查看以下教程：[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Flhotse-speech\u002Flhotse\u002Fblob\u002Fmaster\u002Fexamples\u002F02-webdataset-integration.ipynb)\n- 如果您希望提取语音特征并将其存储为 HDF5 数组，请安装 `pip install h5py`。\n- `pip install dill`。当安装了 `dill` 时，我们将使用它来序列化在 `.map` 或 `.filter` 等调用中使用 lambda 函数的 CutSet。这在 PyTorch DataLoader 中且 `num_jobs>0` 时非常有帮助。如果没有 `dill`，根据您的环境，您可能会看到异常或脚本挂起。\n- `pip install aistore` 以使用 AIStore 支持的 URL 从 AIStore 读取清单、tar 文件及其他数据（设置 `AIS_ENDPOINT` 环境变量以激活）。更多详情请参阅 [AIStore 文档](https:\u002F\u002Faiatscale.org)。\n- `pip install smart_open` 以在 `smart_open` 支持的任何位置（例如云、http）读取和写入清单及数据。\n- `pip install opensmile` 以使用 OpenSmile 工具包的 Python 封装进行特征提取。\n- `pip install multi-storage-client` 以在不同的存储后端中读取和写入清单及数据。更多详情请参阅 
[multi-storage-client](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fmulti-storage-client)。\n\n**sph2pipe。** 若需读取使用 ffmpeg 和 sox 均不支持的编解码器压缩的较旧 LDC SPHERE (.sph) 音频文件，请运行：\n\n    # CLI\n    lhotse install-sph2pipe\n\n    # Python\n    from lhotse.tools import install_sph2pipe\n    install_sph2pipe()\n\n该命令会将 sph2pipe 下载到 `~\u002F.lhotse\u002Ftools`，编译并自动注册到 `PATH` 中。此后 Lhotse 应能自动检测并使用该程序。\n\n## 示例\n\n我们提供了一些示例脚本，展示如何准备数据并在 Python 中将其加载为 PyTorch `Dataset`。\n这些示例位于 `examples` 目录中。\n\n以下是一个简短片段，展示 Lhotse 如何使音频数据准备变得快速而简单：\n\n```python\nfrom torch.utils.data import DataLoader\nfrom lhotse import CutSet, Fbank\nfrom lhotse.dataset import VadDataset, SimpleCutSampler\nfrom lhotse.recipes import prepare_switchboard\n\n# 从原始语料分布中准备数据清单。\n# RecordingSet 描述了音频录音的元数据：采样率、声道数、时长等。\n# SupervisionSet 描述了监督片段的元数据：转录文本、说话人、语言等。\nswbd = prepare_switchboard('\u002Fexport\u002Fcorpora3\u002FLDC\u002FLDC97S62')\n\n# CutSet 是 Lhotse 的主力工具，允许灵活的数据操作。\n# 我们通过以 5 秒为窗口遍历 SWBD 录音来创建 5 秒的切片。\n# 此时实际上并未将音频数据加载到内存或存储到磁盘。\ncuts = CutSet.from_manifests(\n    recordings=swbd['recordings'],\n    supervisions=swbd['supervisions']\n).cut_into_windows(duration=5)\n\n# 我们计算 log-Mel 滤波器组能量并将其存储到磁盘；\n# 然后，我们将切片填充至 5 秒，以确保所有切片长度一致，\n# 因为每个录音中的最后一个窗口可能持续时间较短。\n# 填充将在特征加载到内存后执行。\ncuts = cuts.compute_and_store_features(\n    extractor=Fbank(),\n    storage_path='feats',\n    num_jobs=8\n).pad(duration=5.0)\n\n# 构建用于语音活动检测任务的 PyTorch Dataset 类：\ndataset = VadDataset()\nsampler = SimpleCutSampler(cuts, max_duration=300)\ndataloader = DataLoader(dataset, sampler=sampler, batch_size=None)\nbatch = next(iter(dataloader))\n```\n\n`VadDataset` 将产生包含特征和监督张量对的批次，如下所示——语音大约从第一秒（100 帧）开始：\n\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_readme_38d97c8df09e.png)\n\n# 致谢\n\n本项目的部分贡献得到了美国国家科学基金会 CCRI 资助项目 2120435 的支持。","# Lhotse 快速上手指南\n\nLhotse 是一个专为多模态（语音、音频、视频、图像、文本）数据准备设计的 Python 库。它是下一代 Kaldi 语音处理生态（配合 k2）的核心组件，旨在通过以 Python 为中心的设计，提供灵活的数据加载、增强及标准化食谱。\n\n## 1. 
环境准备\n\n*   **操作系统**：Linux, macOS, Windows (WSL 推荐)\n*   **Python 版本**：3.7 及以上\n*   **核心依赖**：\n    *   `torch` (PyTorch)\n    *   `torchaudio` (可选，但推荐用于音频处理)\n    *   `soundfile` (基础音频读取)\n*   **可选依赖**（根据需求安装）：\n    *   `lilcom`：用于高效压缩特征存储。\n    *   `torchcodec`：基于 FFmpeg 的 PyTorch 原生音频\u002F视频解码器（需 torch >= 2.9）。\n\n## 2. 安装步骤\n\n### 基础安装\n通过 PyPI 安装稳定版：\n\n```bash\npip install lhotse\n```\n\n> **国内加速建议**：如果遇到下载缓慢，可使用清华或阿里镜像源：\n> ```bash\n> pip install lhotse -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n### 安装最新开发版\n如需体验最新功能（未发布版本）：\n\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\n```\n\n### 开发模式安装\n如果您需要修改源码或运行测试：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\ncd lhotse\npip install -e '.[dev]'\npre-commit install\n```\n\n### 启用高级功能（可选）\n*   **启用高效特征压缩**：\n    ```bash\n    pip install lhotse[lilcom]\n    ```\n*   **启用高性能音视频解码**：\n    ```bash\n    pip install torchcodec\n    ```\n\n## 3. 基本使用\n\nLhotse 的核心概念是 **Cut**（切片），它允许您在内存中动态组合、截断和填充音频数据，而无需预先修改磁盘上的文件。\n\n以下是最简单的完整工作流示例：\n\n```python\nfrom lhotse import Recording, SupervisionSegment, CutSet\nfrom lhotse.audio import AudioSource\nfrom lhotse.cut import MonoCut\n\n# 1. 准备元数据 (通常从标准食谱生成，此处手动构建示例)\n# 创建一个录音对象 (指向本地音频文件)\nrecording = Recording(\n    id=\"rec-1\",\n    sources=[AudioSource(type=\"file\", channels=[0], source=\"path\u002Fto\u002Faudio.wav\")],\n    sampling_rate=16000,\n    num_samples=160000,\n    duration=10.0\n)\n\n# 创建标注对象\nsupervision = SupervisionSegment(\n    id=\"seg-1\",\n    recording_id=\"rec-1\",\n    start=0.0,\n    duration=5.0,\n    text=\"你好，这是 Lhotse 测试\",\n    language=\"Chinese\"\n)\n\n# 2. 构建 Cut (将录音与标注关联)\ncut = MonoCut(\n    id=\"cut-1\",\n    start=0.0,\n    duration=5.0,\n    channel=0,\n    recording=recording,\n    supervisions=[supervision]\n)\n\n# 3. 
放入 CutSet 并进行操作\ncuts = CutSet.from_cuts([cut])\n\n# 示例：动态混合噪音 (假设有一个 noise_cuts)\n# mixed_cuts = cuts.mix(noise_cuts, snr=[10, 20], mix_prob=0.5)\n\n# 示例：截断或填充\n# padded_cuts = cuts.pad(num_frames=1000)\n\n# 4. 加载音频数据 (在单个 Cut 上调用，返回 numpy 数组)\naudio = cut.load_audio()\nprint(f\"音频形状：{audio.shape}\")\n\n# 5. 转换为 PyTorch DataLoader\nfrom torch.utils.data import DataLoader\nfrom lhotse.dataset import VadDataset, SimpleCutSampler\n\nsampler = SimpleCutSampler(cuts, max_duration=300)\ndataset = VadDataset()\ndataloader = DataLoader(dataset, sampler=sampler, batch_size=None)\n\nfor batch in dataloader:\n    # 训练循环\n    pass\n```\n\n### 关键特性提示\n*   **惰性加载**：`load_audio()` 仅在调用时读取文件，支持动态混音和增强。\n*   **多模态支持**：除了音频，Lhotse 同样支持 `VideoCut` 和图像数据的加载与处理。\n*   **数据存储**：推荐使用 `Lhotse Shar` 格式或 `WebDataset` 格式进行大规模数据的序列化存储，以提升 I\u002FO 效率。","某语音识别初创团队正在构建一个支持多语种、多场景的端到端训练管线，需要整合 Librispeech、Common Voice 及内部采集的数十万小时异构音频数据。\n\n### 没有 lhotse 时\n- **数据加载代码冗余**：针对不同数据集格式（如 Kaldi archive、JSON manifest、原始 WAV）需编写大量重复的解析脚本，维护成本极高。\n- **动态增强难以实现**：无法在训练时实时进行音频切片拼接（Mixing）或变速变调，只能预先离线处理所有增强数据，导致存储空间爆炸式增长。\n- **多模态对齐困难**：处理音视频同步或音频-文本对齐时，缺乏统一的时间轴管理对象，常出现切片错位或元数据丢失。\n- **分布式训练效率低**：在多节点训练中，数据随机化策略简陋，容易导致部分 GPU 负载不均或某些样本被重复\u002F遗漏训练。\n\n### 使用 lhotse 后\n- **统一数据抽象层**：利用 lhotse 的 `Cut` 对象统一封装音频、文本及时间戳，通过标准 Recipe 一键加载各类开源语料库，代码量减少 70%。\n- **链式动态增强**：借助 `CutSet` 的链式 API，在数据加载流中实时执行混音、裁剪和声学增强，无需占用额外磁盘空间即可生成无限多样的训练样本。\n- **精准多模态调度**：内置的时间轴机制自动确保音频切片与对应文本、视频帧严格对齐，彻底消除因手动计算时间偏移导致的对齐错误。\n- **智能数据桶化**：内置的高效 Bucketing 算法自动按语音时长分组并均衡分配至各训练节点，显著提升了 GPU 利用率并加速模型收敛。\n\nlhotse 将繁琐的多模态数据准备转化为灵活的 Python 流水线，让团队能专注于模型架构创新而非数据清洗细节。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flhotse-speech_lhotse_aaf00554.png","lhotse-speech","Lhotse","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Flhotse-speech_51746a9e.png","Lhotse is a Python library aiming to make speech and audio data preparation flexible and accessible to a wider 
community.",null,"pzelasko@jhu.edu","PiotrZelasko","https:\u002F\u002Flhotse.readthedocs.io\u002Fen\u002Flatest\u002F","https:\u002F\u002Fgithub.com\u002Flhotse-speech",[83,87],{"name":84,"color":85,"percentage":86},"Python","#3572A5",100,{"name":88,"color":89,"percentage":90},"Shell","#89e051",0,1125,270,"2026-04-18T08:21:44","Apache-2.0","Linux, macOS, Windows","未说明",{"notes":98,"python":99,"dependencies":100},"Lhotse 是一个用于多模态数据准备的 Python 库，核心功能不强制依赖 GPU。torchaudio 现为可选依赖，若未安装需设置环境变量 LHOTSE_REQUIRE_TORCHAUDIO 为 false 以使用 soundfile 作为后端。支持通过 pip 直接安装或从源码进行开发模式安装。可选安装 lilcom 以获得更高的特征存储压缩率，或安装 torchcodec (需 PyTorch >= 2.9) 作为优先的音频解码后端。","3.7+",[101,102,103,104,105],"torch","torchaudio (可选)","soundfile","lilcom (可选)","torchcodec (可选，>=0.9)",[16,107,15,13,14],"音频",[109,110,111,112,113,114,115,116,117,118],"speech","audio","kaldi","machine-learning","ai","deep-learning","pytorch","data","python","speech-recognition","2026-03-27T02:49:30.150509","2026-04-19T03:09:44.090568",[122,127,132,137,142,147],{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},41478,"为什么处理大型数据集时内存占用过高，如何优化？","内存占用高通常是因为使用了 Eager CutSet（将所有 manifest 加载到内存）或 BucketingSampler（需要预计算并持有所有 cuts）。优化方案如下：\n1. 使用 Lazy CutSet：它仅存储文件路径，非常节省内存。\n2. 替换采样器：对于大数据集，请使用 DynamicBucketingSampler。它只读取样本子集来计算桶，迭代时仅在内存中保持 `buffer_size` 数量的 cuts，随用随填，显著降低内存开销。\n3. 避免在大数据集上使用标准的 BucketingSampler，它更适合中小规模数据。","https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fissues\u002F556",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},41479,"训练过程中 RAM 持续上升导致 OOM（内存溢出）怎么办？","如果在处理超过 10000 小时的大型数据集时遇到内存持续增长，请尝试以下解决方案：\n1. 使用动态采样器：切换到 DynamicBucketingSampler，即使使用 HDF5 文件也能大幅降低内存占用。\n2. 更新格式：确保使用最新的 LilcomChunkyWriter\u002FReader 格式存储特征。\n3. 
集成 WebDataset：对于超大规模数据，考虑集成 WebDataset。\n维护者确认，在使用了动态采样器和新的分块格式后，大 Manifest 的内存问题已得到解决。","https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fissues\u002F518",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},41480,"如何在 Lhotse 中直接使用类似 Kaldi wav.scp 中的管道命令（如 'cmd |'）来加载音频？","不能直接将管道命令字符串（如 \"spx2wav \u002Fpath\u002Fto\u002Ffile - |\"）传递给 `Recording.from_file`，因为这会被视为文件名。\n解决方案：\n1. 推荐做法：先生成标准的 Kaldi `wav.scp` 文件，然后使用 `lhotse.kaldi.load_kaldi_data_dir` 直接加载整个数据目录。\n2. 如果必须编程式创建，需确保底层依赖 `kaldi_native_io` 已更新至支持该功能的版本（v1.6 及以上），相关修复已在 PR #667 中合并。","https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fissues\u002F659",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},41481,"使用 BucketingSampler 时损失函数出现锯齿状波动（sawtooth patterns）如何解决？","这种波动通常是因为桶内的数据按长度排序导致的。虽然可以通过在桶内打乱或随机反转顺序来增加随机性，但根本原因往往与填充（padding）有关：采样器统计的是未填充的时长，而数据集内部填充会增加实际时长。\n解决方案：\n1. 检查是否启用了 cut 拼接（concatenate-cuts），尝试关闭它（如设置 `--concatenate-cuts=0`）。\n2. 确认日志显示，在应用相关修复或调整配置后，震荡现象通常会消失。","https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fissues\u002F364",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},41482,"在提取 GigaSpeech XL 等超大数据集特征时进程意外终止，如何恢复或避免重算？","针对大数据集特征提取中断的问题：\n1. 断点续传：Lhotse 的特征提取机制通常支持增量处理，如果输出文件（如 .h5）已部分生成，重新运行脚本时可能会跳过已完成的部分（具体取决于使用的脚本逻辑，如 icefall 中的实现）。\n2. 工具链更新：确保使用最新的 icefall 和 lhotse 版本，社区已通过多个 PR（如 #120）优化了 GigaSpeech 的特征提取稳定性。\n3. 文件大小预估：GigaSpeech XL (10,000 小时) 的特征文件大小确实可能在 1TB 级别（参考 XS 子集 1000 小时约 960MB 的比例推算），请确保磁盘空间充足。","https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fissues\u002F452",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},41483,"使用 DynamicBucketingSampler 加载状态字典时报 'StopIteration' 异常导致训练中止，如何处理？","这是一个已知问题，当从某个 batch 恢复训练并加载 sampler 状态时，DynamicBucketingSampler 可能会抛出未被捕获的 `StopIteration` 异常。\n解决方案：\n1. 升级版本：该问题通常在后续的 lhotse 或 icefall 版本修复中被解决，请确保升级到最新版本。\n2. 
临时规避：如果无法立即升级，尝试不加载 sampler 的状态字典，从头开始迭代，或者在训练脚本中增加对 `StopIteration` 的捕获逻辑（但这通常需要修改框架源码）。\n建议查看相关的 GitHub PR 或更新日志以获取具体的修复版本号。","https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fissues\u002F753",[153,158,163,168,173,178,183,188,193,198,203,208,213,218,223,228,233,238,243,248],{"id":154,"version":155,"summary_zh":156,"released_at":157},333458,"v1.32.2","## 变更内容\n* 在 README.md 中添加了 NSF 资助的致谢，由 @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1539 中完成。\n* 修复了 CutSampler 在较新 PyTorch 版本中的初始化问题，由 @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1543 中完成。\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.32.1...v1.32.2","2026-01-14T15:09:03",{"id":159,"version":160,"summary_zh":161,"released_at":162},333459,"v1.32.1","修复在 Windows 上导入 Lhotse v1.32.0 时的问题。\n\n**完整更新日志**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.32.0...v1.32.1","2025-11-24T16:43:29",{"id":164,"version":165,"summary_zh":166,"released_at":167},333460,"v1.32.0","## 食谱\r\n* @domklement 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1517 中添加了 NOTSOFAR-1 的食谱\r\n* @Lakoc 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1518 中添加了 LibriMix 全集和 WHAM 噪声准备的食谱\r\n* @Lakoc 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1521 中添加了 LibriSpeechMix 食谱\r\n* @Lakoc 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1519 中恢复了 AMI 食谱中的标点符号\r\n* @domklement 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1526 中添加了 chime6 数据集的下载\r\n* @ialmajai 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1502 中修复了 grid 食谱\r\n\r\n## 新特性\r\n* @racoiaws 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1510 中新增了一种增强方法：编解码器压缩（GSM、Opus、Vorbis、MP3）  \r\n* @racoiaws 在 
https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1511 中新增了一种增强方法：通过 libsox 进行来回重采样实现低通滤波  \r\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1525 中支持 cut.load_custom_video() 和 collate_video(..., recording_field='custom_video')  \r\n* @gaikwadabhishek 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1529 中添加了 AISBatchLoader，用于从 AIStore 高效加载批量数据  \r\n* @gaikwadabhishek 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1534 中为 AudioSamples 类添加了对 AIStore 批量加载的支持  \r\n\r\n## 修复与改进\r\n* @denini08 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1504 中修复了由于 NamedTemporaryFile 导致的 Windows 测试失败问题  \r\n* @somniumism 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1523 中修复了 DynamicBucketer 中的 AttributeError  \r\n* @gaikwadabhishek 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1530 中为 Cut.iter_data() 方法添加了图像支持  \r\n* @gaikwadabhishek 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1531 中为 AISBatchLoader 添加了单元测试，并修复了清单跟踪问题  \r\n* @KarelVesely84 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1527 中避免了 OnTheFlyFeatures 和 PerturbVolume cut_transform 中出现的 bug  \r\n\r\n## 新贡献者\r\n* @ialmajai 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1502 中做出了首次贡献  \r\n* @denini08 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1504 中做出了首次贡献  \r\n* @Lakoc 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1519 中做出了首次贡献  \r\n* @somniumism 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1523 中做出了首次贡献  \r\n\r\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002F1.31.0...v1.32.0","2025-11-21T20:04:12",{"id":169,"version":170,"summary_zh":171,"released_at":172},333461,"1.31.0","（跳过 
1.31.0 版本，因为我之前不小心在 PyPI 上上传并删除了该版本，现在无法再次使用）\n\n## 新的配方\n* @yfyeung 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1506 中添加了 GigaSpeech 2 配方\n\n## 新特性\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1482 中优先使用 torchaudio 的 FFMPEG 后端处理视频数据\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1483 中增加了通过 Pillow 加载图像的功能\n* @racoiaws 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1512 中新增了软\u002F硬削波增强功能\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1505 中为 Lhotse Shar 引入了随机分片切分方式\n\n## 修复与改进\n* 小修复：@yfyeung 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1485 中移除了所有配方的执行权限\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1486 中重构了 IO 模块，使其同时支持文件和 URL 格式的 Features、Array 和 Image 数据\n* @teowenshen 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1495 中更新了 edacc 下载链接\n* @KarelVesely84 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1496 中允许读取包含空参考文本的 Kaldi `text` 文件\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1499 中支持访问 MixedCut.custom\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1494 中支持不对自定义录音进行切分\n* @yfyeung 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1492 中支持在 `supervision_intervals` 和 `supervision_masks` 中进行左填充\n* @hoangtran9122 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1487 中修复了当 drop_last=True 时无法加入当前线程的问题\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1513 中允许音频与清单之间最多有半秒的时长差异\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1514 中允许从 tar 文件内部随机读取文件\n\n## 新贡献者\n* @hoangtran9122 在 
https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1487 中做出了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.30.3...1.31.0","2025-09-18T21:42:28",{"id":174,"version":175,"summary_zh":176,"released_at":177},333462,"v1.30.3","## 变更内容\n* [修复] 由 @yuekaizhang 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1475 中提出，避免重复导入 librosa\n* 支持为写入操作设置初始分片偏移量，由 @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1476 中实现\n* 测试：CUDA 兼容性改进、移除未使用的测试用例，并添加 torchaudio 重采样功能，由 @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1480 中完成\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.30.2...v1.30.3","2025-05-15T14:31:28",{"id":179,"version":180,"summary_zh":181,"released_at":182},333463,"v1.30.2","## 变更内容\n* 由 @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1474 中修复了 `.repeat()` 方法在输入为空时的边界情况。\n\n感谢 @monica-sekoyan 对该问题的调试。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.30.1...v1.30.2","2025-04-28T15:51:40",{"id":184,"version":185,"summary_zh":186,"released_at":187},333464,"v1.30.1","补丁版本，修复了 AIStore 和多存储客户端逻辑等中的若干问题。\n\n## 变更内容\n* 修复多任务 to_shar 导出的返回值，由 @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1470 中完成。\n* 修复：为 AIStore 客户端初始化添加超时机制，由 @gaikwadabhishek 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1465 中完成。\n* 在 CI 中恢复 kaldifeat 模块，由 @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1471 中完成。\n* 修复：添加 LHOTSE_MSC_BACKEND_FORCED 标志，仅对非 MSC URL 强制使用 MSCIOBackend，由 @jayya2 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1472 中完成。\n\n## 新贡献者\n* @gaikwadabhishek 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1465 
中完成了首次贡献。\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.30.0...v1.30.1","2025-04-21T19:32:54",{"id":189,"version":190,"summary_zh":191,"released_at":192},333465,"v1.30.0","## 新特性\n\n* 添加多存储客户端后端，用于文件打开，由 @jayya2 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1455 中实现。\n\n更多关于 [multi-storage-client 的信息](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fmulti-storage-client)。\n\n## Bug 修复及其他改进\n* 修复原始配方中遗漏的硬编码路径，由 @m-wiesner 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1438 中完成。\n* [文档] 进行小幅修正，由 @pengzhendong 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1439 中完成。\n* 更新 CI 配置，由 @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1452 中完成。\n* 避免覆盖 dnsmos 注释中已存在的自定义字段，由 @pkufool 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1450 中实现。\n* 默认按 num_samples 对切片进行填充，由 @pengzhendong 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1449 中完成。\n* 从 qa.py 中移除 `validate_cut_set` 函数，由 @t13m 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1458 中完成。\n* 修复音频合并时样本数不匹配的边缘情况，由 @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1462 中完成。\n* 修复音频合并中的更多边缘情况，由 @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1463 中完成。\n* 向 `collate_features` 函数添加 `features_dtype` 参数，由 @t13m 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1456 中完成。\n\n## 新贡献者\n* @t13m 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1458 中完成了首次贡献。\n* @jayya2 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1455 中完成了首次贡献。\n\n**完整变更日志**: 
https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.29.0...v1.30.0","2025-03-19T14:37:39",{"id":194,"version":195,"summary_zh":196,"released_at":197},333466,"v1.29.0","## 变更内容\n\n### 配方\n* @JinZr 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1423 中添加了中国构音障碍语音数据库的配方。\n* @yuta0306 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1434 中利用 Hugging Face 数据集的功能优化了 ReazonSpeech 的下载速度。\n\n### 新功能\n* @anteju 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1422 中增加了导出到 Shar 时以原始格式保存音频的选项。\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1433 中实现了 `CutSet.from_huggingface_dataset()`，用于导入 Hugging Face 数据集。\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1435 中将 AIStore 序列化后端扩展至写入功能。\n\n### 其他改进\n* @pengzhendong 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1419 中将文档中的 `max_frames` 改为 `max_duration`。\n* @pengzhendong 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1424 中添加了 OpenSMILE 的网址。\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1421 中将文件读取的 IO 重构为后端模块。\n* @racoiaws 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1427 中修复了在某些环境下对 `.m4a` 文件的支持问题（可能也适用于其他 libsndfile 不支持的格式）。\n* @pengzhendong 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1426 中为 `CustomFieldMixin` 类添加了 `to_dict` 方法。\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1432 中修复了在 `num_workers > 1` 的情况下，轮询采样器连续选择相同样本的问题。\n* @pzelasko 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1436 中修复了复制设置了自定义属性的 `MixedCut` 时出现的问题。\n\n## 新贡献者\n* @racoiaws 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1427 中完成了首次贡献。\n* @yuta0306 在 
https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1434 中完成了首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.28.0...v1.29.0","2024-12-13T15:02:03",{"id":199,"version":200,"summary_zh":201,"released_at":202},333467,"v1.28.0","## 新功能\n* 由 @domklement 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1398 中实现从 CutSet 到 HuggingFace 数据集的转换\n* 添加工作流：标注 DNSMOS P.835，由 @yfyeung 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1406 中完成\n\n## 新配方\n* 添加圣巴巴拉美国英语口语语料库 (SBCSAE) 的配方，由 @mmaciej2 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1395 中完成\n* 添加广播数据配方，由 @m-wiesner 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1400 中完成\n* Fleurs 数据集配方，由 @m-wiesner 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1402 中完成\n* 添加 Emilia 语料库，由 @csukuangfj 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1404 中完成\n\n## 变更内容\n* [spgispeech] 修复 durations 对象为 null 的问题，由 @frankyoujian 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1390 中完成\n* 当 ffmpeg 不可用时，将 backend 设置为 None，由 @pengzhendong 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1392 中完成\n* 修复 ksponspeech 配方，由 @yfyeung 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1394 中完成\n* 修复 ksponspeech 的命令行工具，由 @yfyeung 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1393 中完成\n* [修复] fisher_english 配方，由 @pengzhendong 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1410 中完成\n* 将 sphinx 版本从 7.2.6 降级到 7.1.2，由 @annapovey 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1409 中完成\n* 更新 lhotse.py 文件，由 @pengzhendong 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1414 中完成\n* 将 torchaudio 改为可选依赖项，由 @pzelasko 在 
https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1382 中完成\n* 轻微修复，由 @pengzhendong 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1418 中完成\n* 支持在 AIStore SDK 版本 ≥1.9.1 时进行 AIStore ObjectFile 的容错读取\n\n## 新贡献者\n* @frankyoujian 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1390 中完成了首次贡献\n* @pengzhendong 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1392 中完成了首次贡献\n* @mmaciej2 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1395 中完成了首次贡献\n* @domklement 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1398 中完成了首次贡献\n* @annapovey 在 https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1409 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.27.0...v1.28.0","2024-11-19T14:54:25",{"id":204,"version":205,"summary_zh":206,"released_at":207},333468,"v1.27.0","## New recipes\r\n* [Recipe] Wenetspeech4tts by @yuekaizhang in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1384\r\n* [Recipe] Spatial LibriSpeech by @JinZr in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1386\r\n\r\n## Other enhancements\r\n* Cap the 'trng' random seeds to 2**31 avoiding numpy error by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1379\r\n* `CutSet`.prefetch() for background cuts loading during iteration by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1380\r\n* Include a copyright NOTICE listing major copyright holders by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1381\r\n* Added has_custom to MixedCut by @anteju in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1383\r\n* Fix to fixed batch size bucketing and audio loading network connectio… by 
@pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1387\r\n\r\n## New Contributors\r\n* @anteju made their first contribution in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1383\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.26.0...v1.27.0","2024-08-22T15:25:45",{"id":209,"version":210,"summary_zh":211,"released_at":212},333469,"v1.26.0","## What's Changed\r\n* Add EARS recipe by @Ryu1845 in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1375\r\n* Concurrent dynamic bucketing by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1373\r\n* Refactor bucket selection for customization by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1377\r\n\r\n## New Contributors\r\n* @Ryu1845 made their first contribution in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1375\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.25.0...v1.26.0","2024-07-26T15:58:52",{"id":214,"version":215,"summary_zh":216,"released_at":217},333470,"v1.25.0","## What's Changed\r\n* [feature] Add `.narrowband()` effect (mulaw, lpc10 codecs) by @rouseabout in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1348\r\n* [feature\u002Foptimization] Support for pre-determined batch sizes in `DynamicBucketingSampler` by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1372\r\n* [bug] Fix `MixedCut` transforms serialization by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1370\r\n\r\n\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.24.2...v1.25.0","2024-07-18T23:45:10",{"id":219,"version":220,"summary_zh":221,"released_at":222},333471,"v1.24.2","## New recipes\r\n* Add KsponSpeech recipe by @whsqkaak in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1353\r\n\r\n## New features\r\nSeveral new APIs for manifest classes added in #1361:\r\n* `cut.iter_data()` which iterates over (key, manifest) pairs of all data items attached to a given cut (e.g., `(\"recording\", Recording(...)), (\"custom_features\", TemporalArray(...))`)\r\n* `is_in_memory` property for all manifest types to indicate if it contains data that is held in memory\r\n* `is_placeholder` for non-cut manifests to indicate if a manifest is just a placeholder (has some metadata, but can't be used to load data)\r\n* `cut.drop_in_memory_data()` which converts manifests with in-memory data to placeholders (this is useful for manifests that live longer than just dataloading to avoid blowing up CPU memory and\u002For slowing down the program)\r\n\r\n## Bug fixes\r\n* Restoring smart open for local files if available by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1360\r\n* Fix Recording.to_dict() when transforms are dicts and transform pickling issues by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1355\r\n* Utils for discovering attached data and dropping in-memory data by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1361\r\n* Numpy 2.0 compatibility by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1362\r\n\r\n## New Contributors\r\n* @whsqkaak made their first contribution in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1353\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.24.1...v1.24.2","2024-06-25T15:59:43",{"id":224,"version":225,"summary_zh":226,"released_at":227},333472,"v1.24.1","## What's Changed\r\n* Support for reading data from AIStore using Python SDK by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1354\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.24...v1.24.1","2024-06-10T20:35:38",{"id":229,"version":230,"summary_zh":231,"released_at":232},333473,"v1.24","## What's Changed\r\n\r\n### New features\r\n\r\nNotably, there's a new optimization for dynamic bucketing sampler in multi-GPU training - it will choose the same (or the closest possible) bucket on each DDP rank to keep the total training step times closer. The expected speedup is dependent on the model and the number of GPUs. We observed 8 and 13% speedups across two experiments compared to non-synchronized bucket selection. 
The new option is called `sync_buckets` and is enabled by default.\r\n\r\n* Dynamic bucket selection RNG sync by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1341\r\n* Add new sampler: weighted sampler by @marcoyang1998 in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1344\r\n* `reverb_rir`: support Cut input and in memory data by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1332 \r\n \r\n### Recipes\r\n\r\n* Add the ReazonSpeech recipe by @Triplecq in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1330\r\n \r\n### Other improvements\r\n\r\n* Missing 'subset' parameter by @daniel-dona in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1336\r\n* Fix describe on cuts by @keeofkoo in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1340\r\n* Use libsndfile in recording chunk dataset by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1335\r\n* Fix librispeech manifest caching by @haerski in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1343\r\n* Fix one-off edge case in split_lazy by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1347\r\n* Increase the start diff tolerance for feature loading by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1349\r\n* More test coverage for lhotse subset by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1345\r\n\r\n\r\n## New Contributors\r\n* @keeofkoo made their first contribution in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1340\r\n* @haerski made their first contribution in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1343\r\n* @Triplecq made their first contribution in 
https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1330\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.23...v1.24","2024-06-05T19:59:32",{"id":234,"version":235,"summary_zh":236,"released_at":237},333474,"v1.23","## What's Changed\r\n\r\n### Recipes\r\n* MDCC recipe by @JinZr in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1302\r\n* Updated text_norm for `aishell` recipe by @JinZr in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1305\r\n* Allow skipping missing files in AMI download by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1318\r\n* Add Chinese TTS dataset `baker`. by @csukuangfj in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1304\r\n* In CommonVoice corpus, use .tsv headers to parse and not column index by @daniel-dona in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1328\r\n\r\n### Fixes to a regression in noise mixing augmentations\r\n* Enhance `CutSet.mix()` randomness and data utilization by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1315\r\n* Fix randomness in CutMix transform by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1316\r\n* select a random sub-region of the noise based on the delta duration by @osadj in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1317\r\n\r\n### Other improvements\r\n* Add dataset for audio tagging by @marcoyang1998 in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1241\r\n* Fix _get_strided_batch device by @lifeiteng in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1303\r\n* Fix typo in README.md by @yfyeung in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1308\r\n* Fix export of 
features\u002Farray to shar by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1323\r\n* Fix `trim_to_supervision_groups` by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1322\r\n\r\n## New Contributors\r\n* @daniel-dona made their first contribution in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1328\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.22...v1.23","2024-04-30T18:43:23",{"id":239,"version":240,"summary_zh":241,"released_at":242},333475,"v1.22","## What's Changed\r\n\r\n### New features\r\n\r\n* Extending Lhotse dataloading to text\u002Fmultimodal data by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1295\r\n\r\nAs an experimental feature, we are extending the API of Lhotse samplers to enable key sampling features for non-audio data such as text. That means text (and other) data can be dynamically multiplexed and bucketed in the same way as audio data with some lightweight wrappers. 
Please refer to new documentation here: https:\u002F\u002Flhotse.readthedocs.io\u002Fen\u002Flatest\u002Fdatasets.html#customizing-sampling-constraints\r\n\r\n* Multi-channel support improvements\r\n  * Fix loading multi-channel custom recording fields in multi cuts by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1298\r\n  * Channel selection for multi-channel custom recording fields by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1299\r\n\r\nLhotse `MultiCut`s:\r\n* are now exportable into Lhotse Shar format\r\n* gained a new method `cut = cut.with_channels([0, 1, ...])` to modify the channels they refer to\r\n* can have multi-channel custom Recordings with channels selectable via a special custom key (e.g., if defining `cut.target_recording`, audio can be read via `cut.load_target_recording()` and channels will be auto-selected by looking up `cut.target_recording_channel_selector`).\r\n\r\n### Recipes\r\n\r\n* Add new recipe: speechio by @yuekaizhang in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1297\r\n* tedlium2 recipe by @JinZr in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1296\r\n\r\n### Other improvements\r\n\r\n* Use audio backends and export custom fields in Lhotse Shar by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1290\r\n* Documentation for random seeds in lhotse + extended support of lazy r… by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1291\r\n* Cutconcat fixed max duration by @swigls in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1292\r\n* Fix feature_dim of Spectrogram extractors. 
by @csukuangfj in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1294\r\n* fix whisper for multi-channel data by @yuekaizhang in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1289\r\n* Xfail flaky SileroVAD tests by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1300\r\n\r\n## New Contributors\r\n* @swigls made their first contribution in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1292\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.21...v1.22","2024-03-07T19:38:04",{"id":244,"version":245,"summary_zh":246,"released_at":247},333476,"v1.21","## What's Changed\r\n\r\nThis release patches lhotse to handle cases where libsox is not available for torchaudio. The audio backend code went through an additional round of refactoring, and `libsndfile` is now preferred as the default since it showed faster audio decoding performance in our testing. Going forward, when `LHOTSE_AUDIO_BACKEND` is set, we will use the same backend for audio loading, audio saving, and reading audio metadata (if possible). 
This release also adds support for Python 3.12 and PyTorch 2.2.\r\n\r\n* Add VAD to Supervisions in LibriLight Recipe by @yfyeung in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1280\r\n* Fixes for manifest validation and fixing by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1284\r\n* Handle error with cachedir creation gracefully by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1287\r\n* `AudioBackend` specific `save_audio` and `info`, managing missing SoX in torchaudio, Python 3.12 \u002F PyTorch 2.2 support, using `libsndfile` as preferred audio backend by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1288\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.20...v1.21","2024-02-13T19:57:33",{"id":249,"version":250,"summary_zh":251,"released_at":252},333477,"v1.20","## What's Changed\r\n\r\n### New features\r\n* Extended the subset of lhotse that works without installing torchaudio by @pzelasko in #1253 #1255\r\n* Ensure `drop_last=False` always returns an equal number of mini-batches by re-distributing and\u002For duplicating some data by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1277\r\n* Improved CPU memory usage and shuffling + bucketing in `DynamicBucketingSampler` by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1276\r\n* Enable seed randomization in dynamic samplers by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1278\r\n\r\n### Recipes\r\n* Fluent Speech Commands dataset, SLU task by @HSTEHSTEHSTE in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1272\r\n\r\n### Other improvements\r\n* Update docs with env vars used by Lhotse by @pzelasko in 
https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1252\r\n* support whisper large v3; deepspeed launcher rank world_size setting by @yuekaizhang in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1260\r\n* Fix non-deterministic tests by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1261\r\n* Fix duplication issues in CutSet.mix() by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1268\r\n* Support controllable `CutSet.mux` weights in multiprocess dataloading by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1266\r\n* Fix distributed sampler initialization and `exceeded` sampler warning false positives  by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1270\r\n* Install kaldi-native-io explicitly in the kaldi doc example. by @csukuangfj in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1275\r\n* Allow duplicate cut IDs in a CutSet (CutSet is list-like instead of dict-like) by @pzelasko in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1279\r\n\r\n## New Contributors\r\n* @HSTEHSTEHSTE made their first contribution in https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fpull\u002F1272\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flhotse-speech\u002Flhotse\u002Fcompare\u002Fv1.19...v1.20","2024-01-31T20:51:42"]