[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-facebookresearch--seamless_communication":3,"tool-facebookresearch--seamless_communication":65},[4,17,27,35,48,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",154349,2,"2026-04-13T23:32:16",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,3,"2026-04-06T11:19:32",[15,26,14,13],"图像",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":10,"last_commit_at":33,"category_tags":34,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 
模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85092,"2026-04-10T11:13:16",[26,43,44,45,14,46,15,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":54,"last_commit_at":55,"category_tags":56,"status":16},5784,"funNLP","fighting41love\u002FfunNLP","funNLP 是一个专为中文自然语言处理（NLP）打造的超级资源库，被誉为“NLP 民工的乐园”。它并非单一的软件工具，而是一个汇集了海量开源项目、数据集、预训练模型和实用代码的综合性平台。\n\n面对中文 NLP 领域资源分散、入门门槛高以及特定场景数据匮乏的痛点，funNLP 提供了“一站式”解决方案。这里不仅涵盖了分词、命名实体识别、情感分析、文本摘要等基础任务的标准工具，还独特地收录了丰富的垂直领域资源，如法律、医疗、金融行业的专用词库与数据集，甚至包含古诗词生成、歌词创作等趣味应用。其核心亮点在于极高的全面性与实用性，从基础的字典词典到前沿的 BERT、GPT-2 模型代码，再到高质量的标注数据和竞赛方案，应有尽有。\n\n无论是刚刚踏入 NLP 领域的学生、需要快速验证想法的算法工程师，还是从事人工智能研究的学者，都能在这里找到急需的“武器弹药”。对于开发者而言，它能大幅减少寻找数据和复现模型的时间；对于研究者，它提供了丰富的基准测试资源和前沿技术参考。funNLP 以开放共享的精神，极大地降低了中文自然语言处理的开发与研究成本，是中文 AI 
社区不可或缺的宝藏仓库。",79857,1,"2026-04-08T20:11:31",[15,43,46],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":54,"last_commit_at":63,"category_tags":64,"status":16},5773,"cs-video-courses","Developer-Y\u002Fcs-video-courses","cs-video-courses 是一个精心整理的计算机科学视频课程清单，旨在为自学者提供系统化的学习路径。它汇集了全球知名高校（如加州大学伯克利分校、新南威尔士大学等）的完整课程录像，涵盖从编程基础、数据结构与算法，到操作系统、分布式系统、数据库等核心领域，并深入延伸至人工智能、机器学习、量子计算及区块链等前沿方向。\n\n面对网络上零散且质量参差不齐的教学资源，cs-video-courses 解决了学习者难以找到成体系、高难度大学级别课程的痛点。该项目严格筛选内容，仅收录真正的大学层级课程，排除了碎片化的简短教程或商业广告，确保用户能接触到严谨的学术内容。\n\n这份清单特别适合希望夯实计算机基础的开发者、需要补充特定领域知识的研究人员，以及渴望像在校生一样系统学习计算机科学的自学者。其独特的技术亮点在于分类极其详尽，不仅包含传统的软件工程与网络安全，还细分了生成式 AI、大语言模型、计算生物学等新兴学科，并直接链接至官方视频播放列表，让用户能一站式获取高质量的教育资源，免费享受世界顶尖大学的课堂体验。",79792,"2026-04-08T22:03:59",[46,26,43,13],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":71,"readme_en":72,"readme_zh":73,"quickstart_zh":74,"use_case_zh":75,"hero_image_url":76,"owner_login":77,"owner_name":78,"owner_avatar_url":79,"owner_bio":80,"owner_company":81,"owner_location":81,"owner_email":81,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":123,"forks":124,"last_commit_at":125,"license":126,"difficulty_score":23,"env_os":127,"env_gpu":128,"env_ram":129,"env_deps":130,"category_tags":138,"github_topics":81,"view_count":10,"oss_zip_url":81,"oss_zip_packed_at":81,"status":16,"created_at":139,"updated_at":140,"faqs":141,"releases":170},7397,"facebookresearch\u002Fseamless_communication","seamless_communication","Foundational Models for State-of-the-Art Speech and Text Translation","seamless_communication 是 Meta 推出的一套先进 AI 模型家族，旨在打破语言壁垒，实现更自然、真实的跨语言沟通。其核心基础模型 SeamlessM4T 支持近 100 种语言的语音与文本互译，涵盖语音转语音、语音转文本、文本转语音及自动语音识别等多种任务。在此基础上衍生的 SeamlessExpressive 能保留说话人的语调风格与情感色彩，让翻译听起来更像真人；而 SeamlessStreaming 则专注于低延迟的实时流式翻译，适用于同声传译场景。\n\n这套工具主要解决了传统机器翻译中语气生硬、缺乏情感以及实时性不足的问题，让跨语言交流不再丢失“人情味”。它非常适合开发者构建多语言应用、研究人员探索前沿语音技术，以及需要高质量实时翻译服务的企业用户。普通用户也可通过在线演示直接体验其强大的翻译能力。\n\n技术亮点方面，最新发布的 SeamlessM4T v2 
采用了创新的 UnitY2 架构，在提升翻译质量的同时显著降低了语音生成的延迟。此外，该系列模型已集成至 Hugging Face Transformers 库，并提供了详尽的教程笔记，方便各类用户快","seamless_communication 是 Meta 推出的一套先进 AI 模型家族，旨在打破语言壁垒，实现更自然、真实的跨语言沟通。其核心基础模型 SeamlessM4T 支持近 100 种语言的语音与文本互译，涵盖语音转语音、语音转文本、文本转语音及自动语音识别等多种任务。在此基础上衍生的 SeamlessExpressive 能保留说话人的语调风格与情感色彩，让翻译听起来更像真人；而 SeamlessStreaming 则专注于低延迟的实时流式翻译，适用于同声传译场景。\n\n这套工具主要解决了传统机器翻译中语气生硬、缺乏情感以及实时性不足的问题，让跨语言交流不再丢失“人情味”。它非常适合开发者构建多语言应用、研究人员探索前沿语音技术，以及需要高质量实时翻译服务的企业用户。普通用户也可通过在线演示直接体验其强大的翻译能力。\n\n技术亮点方面，最新发布的 SeamlessM4T v2 采用了创新的 UnitY2 架构，在提升翻译质量的同时显著降低了语音生成的延迟。此外，该系列模型已集成至 Hugging Face Transformers 库，并提供了详尽的教程笔记，方便各类用户快速上手并进行二次开发。","![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_seamless_communication_readme_570c4247e5ec.jpg)\n# Seamless Intro\n\nSeamless is a family of AI models that enable more natural and authentic communication across languages. SeamlessM4T is a massive multilingual multimodal machine translation model supporting around 100 languages. SeamlessM4T serves as foundation for SeamlessExpressive, a model that preserves elements of prosody and voice style across languages and SeamlessStreaming, a model supporting simultaneous translation and streaming ASR for around 100 languages. 
SeamlessExpressive and SeamlessStreaming are combined into Seamless, a unified model featuring multilinguality, real-time and expressive translations.\n\n## Links\n\n### Demos\n\n|                        | SeamlessM4T v2                                                                                                                        | SeamlessExpressive                                                                                                                               | SeamlessStreaming                                                                      |\n| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------- |\n| Demo                   | [SeamlessM4T v2 Demo](https:\u002F\u002Fseamless.metademolab.com\u002Fm4t?utm_source=github&utm_medium=web&utm_campaign=seamless&utm_content=readme) | [SeamlessExpressive Demo](https:\u002F\u002Fseamless.metademolab.com\u002Fexpressive?utm_source=github&utm_medium=web&utm_campaign=seamless&utm_content=readme) |                                                                                          |\n| HuggingFace Space Demo | [🤗 SeamlessM4T v2 Space](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffacebook\u002Fseamless-m4t-v2-large)                                                | [🤗 SeamlessExpressive Space](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffacebook\u002Fseamless-expressive)                                                         | [🤗 SeamlessStreaming Space](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffacebook\u002Fseamless-streaming) |\n\n### 
Papers\n[Seamless](https:\u002F\u002Fai.facebook.com\u002Fresearch\u002Fpublications\u002Fseamless-multilingual-expressive-and-streaming-speech-translation\u002F)\n\n[EMMA](https:\u002F\u002Fai.meta.com\u002Fresearch\u002Fpublications\u002Fefficient-monotonic-multihead-attention\u002F)\n\n[SONAR](https:\u002F\u002Fai.meta.com\u002Fresearch\u002Fpublications\u002Fsonar-expressive-zero-shot-expressive-speech-to-speech-translation\u002F)\n\n### Blog\n[AI at Meta Blog](https:\u002F\u002Fai.meta.com\u002Fresearch\u002Fseamless-communication\u002F)\n\n## Tutorial\nAn exhaustive [tutorial](Seamless_Tutorial.ipynb) given at the NeurIPS 2023 - Seamless EXPO, which is a one-stop shop to learn how to use the entire suite of Seamless models. Please feel free to play with the notebook.\n\n## SeamlessM4T\nSeamlessM4T is our foundational all-in-one **M**assively **M**ultilingual and **M**ultimodal **M**achine **T**ranslation model delivering high-quality translation for speech and text in nearly 100 languages.\n\nSeamlessM4T models support the tasks of:\n- Speech-to-speech translation (S2ST)\n- Speech-to-text translation (S2TT)\n- Text-to-speech translation (T2ST)\n- Text-to-text translation (T2TT)\n- Automatic speech recognition (ASR)\n\n:star2: We are releasing SeamlessM4T v2, an updated version with our novel *UnitY2* architecture. This new model improves over SeamlessM4T v1 in quality as well as inference latency in speech generation tasks.\n\nTo learn more about the collection of SeamlessM4T models, the approach used in each, their language coverage and their performance, visit the [SeamlessM4T README](docs\u002Fm4t\u002FREADME.md) or [🤗 Model Card](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-m4t-v2-large).\n\n> [!NOTE]\n> Seamless M4T is also available in the 🤗 Transformers library. 
Visit [this section](docs\u002Fm4t\u002FREADME.md#transformers-usage) for more details.\n\n## SeamlessExpressive\n\nSeamlessExpressive is a speech-to-speech translation model that captures certain underexplored aspects of prosody such as speech rate and pauses, while preserving the style of one's voice and high content translation quality.\n\nTo learn more about SeamlessExpressive models, visit the [SeamlessExpressive README](docs\u002Fexpressive\u002FREADME.md) or [🤗 Model Card](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-expressive)\n\n\n## SeamlessStreaming\n\nSeamlessStreaming is a streaming translation model. The model supports speech as input modality and speech\u002Ftext as output modalities.\n\nThe SeamlessStreaming model supports the following tasks:\n- Speech-to-speech translation (S2ST)\n- Speech-to-text translation (S2TT)\n- Automatic speech recognition (ASR)\n\nTo learn more about SeamlessStreaming models, visit the [SeamlessStreaming README](docs\u002Fstreaming\u002FREADME.md) or [🤗 Model Card](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-streaming)\n\n## Seamless\n\nThe Seamless model is the unified model for expressive streaming speech-to-speech translations.\n\n## What's new\n- [12\u002F18\u002F2023] We are open-sourcing our Conformer-based [W2v-BERT 2.0 speech encoder](#w2v-bert-20-speech-encoder) as described in Section 3.2.1 of the [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.05187.pdf), which is at the core of our Seamless models.\n- [12\u002F14\u002F2023] We are releasing the Seamless [tutorial](#tutorial) given at NeurIPS 2023.\n\n# Quick Start\n## Installation\n> [!NOTE]\n> One of the prerequisites is [fairseq2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffairseq2) which has pre-built packages available only\n> for Linux x86-64 and Apple-silicon Mac computers. 
In addition, it has a dependency on [libsndfile](https:\u002F\u002Fgithub.com\u002Flibsndfile\u002Flibsndfile) which\n> might not be installed on your machine. If you experience any installation issues, please refer to its\n> [README](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffairseq2) for further instructions.\n\n```\npip install .\n```\n\n> [!NOTE]\n> Transcribing inference audio for computing metrics uses [Whisper](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper#setup), which is automatically installed. Whisper in turn requires the command-line tool [`ffmpeg`](https:\u002F\u002Fffmpeg.org\u002F) to be installed on your system, which is available from most package managers.\n\n\n## Running inference\n\n### SeamlessM4T Inference\nHere’s an example of using the CLI from the root directory to run inference.\n\nS2ST task:\n```bash\nm4t_predict \u003Cpath_to_input_audio> --task s2st --tgt_lang \u003Ctgt_lang> --output_path \u003Cpath_to_save_audio>\n```\nT2TT task:\n```bash\nm4t_predict \u003Cinput_text> --task t2tt --tgt_lang \u003Ctgt_lang> --src_lang \u003Csrc_lang>\n```\nPlease refer to the [inference README](src\u002Fseamless_communication\u002Fcli\u002Fm4t\u002Fpredict) for detailed instructions on how to run inference and the list of supported languages on the source and target sides for the speech and text modalities.\n\nFor running S2TT\u002FASR natively (without Python) using GGML, please refer to [the unity.cpp section](#unitycpp).\n\n### SeamlessExpressive Inference\n> [!NOTE]\n> Please check the [section](#seamlessexpressive-models) on how to download the model.\n\nHere’s an example of using the CLI from the root directory to run inference.\n\n```bash\nexpressivity_predict \u003Cpath_to_input_audio> --tgt_lang \u003Ctgt_lang> --model_name seamless_expressivity --vocoder_name vocoder_pretssel --output_path \u003Cpath_to_save_audio>\n```\n\n### SeamlessStreaming and Seamless Inference\n\n[Streaming Evaluation 
README](src\u002Fseamless_communication\u002Fcli\u002Fstreaming) has detailed instructions for running evaluations for the SeamlessStreaming and Seamless models. The CLI has an `--no-scoring` option that can be used to skip the scoring part and just run inference.\n\nPlease check the inference [README](src\u002Fseamless_communication\u002Finference) for more details.\n\n## Running SeamlessStreaming Demo\nYou can duplicate the [SeamlessStreaming HF space](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffacebook\u002Fseamless-streaming?duplicate=true) to run the streaming demo.\n\n\nYou can also run the demo locally, by cloning the space from [here](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffacebook\u002Fseamless-streaming\u002Ftree\u002Fmain). See the [README](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffacebook\u002Fseamless-streaming\u002Fblob\u002Fmain\u002FREADME.md) of the SeamlessStreaming HF repo for more details on installation.\n\n## Running SeamlessM4T & SeamlessExpressive [Gradio](https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio) demos locally\n\nTo launch the same demo Space we host on Hugging Face locally:\n\n```bash\ncd demo\npip install -r requirements.txt\npython app.py\n```\n\n# Resources and usage\n## Model\n### SeamlessM4T models\n| Model Name              | #params | checkpoint                                                                                                                                                                     | metrics                                                                             |\n| ----------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------- |\n| SeamlessM4T-Large v2    | 2.3B    | [🤗 Model 
card](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-m4t-v2-large) - [checkpoint](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-m4t-v2-large\u002Fresolve\u002Fmain\u002FseamlessM4T_v2_large.pt  )                   | [metrics](https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fseamless\u002Fmetrics\u002FseamlessM4T_large_v2.zip) |\n| SeamlessM4T-Large (v1)  | 2.3B    | [🤗 Model card](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-m4t-large) - [checkpoint](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-m4t-large\u002Fresolve\u002Fmain\u002Fmultitask_unity_large.pt)    | [metrics](https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fseamless\u002Fmetrics\u002FseamlessM4T_large.zip)    |\n| SeamlessM4T-Medium (v1) | 1.2B    | [🤗 Model card](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-m4t-medium) - [checkpoint](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-m4t-medium\u002Fresolve\u002Fmain\u002Fmultitask_unity_medium.pt) | [metrics](https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fseamless\u002Fmetrics\u002FseamlessM4T_medium.zip)   |\n\n### SeamlessExpressive models\n\n[🤗 Model card](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-expressive)\n\nTo access and download SeamlessExpressive, please request the model artifacts through [this request form](https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fseamless-downloads\u002F). 
Upon approval, you will then receive an email with download links to each model artifact.\n\nPlease note that SeamlessExpressive is made available under its own [License](SEAMLESS_LICENSE) and [Acceptable Use Policy](ACCEPTABLE_USE_POLICY).\n\n### SeamlessStreaming models\n| Model Name        | #params | checkpoint                                                                                                                                                                                                                                                                                              | metrics                                                                                     |\n| ----------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- |\n| SeamlessStreaming | 2.5B    | [🤗 Model card](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-streaming) - [monotonic decoder checkpoint](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-streaming\u002Fresolve\u002Fmain\u002Fseamless_streaming_monotonic_decoder.pt) - [streaming UnitY2 checkpoint](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fseamless-streaming\u002Fresolve\u002Fmain\u002Fseamless_streaming_unity.pt) | [metrics](https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fseamless\u002Fmetrics\u002Fstreaming\u002Fseamless_streaming.zip) |\n\n### Seamless models\nSeamless model is simply the SeamlessStreaming model with the non-expressive `vocoder_v2` swapped out with the expressive `vocoder_pretssel`.\nPlease check out above [section](#seamlessexpressive-models) on how to acquire `vocoder_pretssel` checkpoint.\n\n### W2v-BERT 2.0 
speech encoder\n| Model Name   | #params | checkpoint   |\n| ------------ | ------- | ------------ |\n| W2v-BERT 2.0 | 600M    | [🤗 Model card](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fconformer-shaw) - [checkpoint](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fconformer-shaw\u002Fresolve\u002Fmain\u002Fconformer_shaw.pt) |\n\nHere's how you should do a forward pass through the speech encoder:\n\n```python\nimport torch\n\nfrom fairseq2.data.audio import AudioDecoder, WaveformToFbankConverter\nfrom fairseq2.memory import MemoryBlock\nfrom fairseq2.nn.padding import get_seqs_and_padding_mask\nfrom fairseq2.data import Collater\nfrom pathlib import Path\nfrom seamless_communication.models.conformer_shaw import load_conformer_shaw_model\n\n\naudio_wav_path, device, dtype = ...\naudio_decoder = AudioDecoder(dtype=torch.float32, device=device)\nfbank_converter = WaveformToFbankConverter(\n    num_mel_bins=80,\n    waveform_scale=2**15,\n    channel_last=True,\n    standardize=True,\n    device=device,\n    dtype=dtype,\n)\ncollater = Collater(pad_value=1)\n\nmodel = load_conformer_shaw_model(\"conformer_shaw\", device=device, dtype=dtype)\nmodel.eval()\n\nwith Path(audio_wav_path).open(\"rb\") as fb:\n    block = MemoryBlock(fb.read())\n\ndecoded_audio = audio_decoder(block)\nsrc = 
collater(fbank_converter(decoded_audio))[\"fbank\"]\nseqs, padding_mask = get_seqs_and_padding_mask(src)\n\nwith torch.inference_mode():\n  seqs, padding_mask = model.encoder_frontend(seqs, padding_mask)\n  seqs, padding_mask = model.encoder(seqs, padding_mask)\n```\n\n## Evaluation\n\n### SeamlessM4T Evaluation\nTo reproduce our results, or to evaluate using the same metrics over your own test sets, please check out the [README here](src\u002Fseamless_communication\u002Fcli\u002Fm4t\u002Fevaluate).\n### SeamlessExpressive Evaluation\n\nBelow is the script for efficient batched evaluation.\n\n```bash\nexport MODEL_DIR=\"\u002Fpath\u002Fto\u002FSeamlessExpressive\u002Fmodel\"\nexport TEST_SET_TSV=\"input.tsv\" # Your dataset in a TSV file, with headers \"id\", \"audio\"\nexport TGT_LANG=\"spa\" # Target language to translate into, options including \"fra\", \"deu\", \"eng\" (\"cmn\" and \"ita\" are experimental)\nexport OUTPUT_DIR=\"tmp\u002F\" # Output directory for generated text\u002Funit\u002Fwaveform\nexport TGT_TEXT_COL=\"tgt_text\" # The column in your ${TEST_SET_TSV} for reference target text to calculate BLEU score. You can skip this argument.\nexport DFACTOR=\"1.0\" # Duration factor for model inference to tune predicted duration (preddur=DFACTOR*preddur) at each position, which affects output speech rate. A greater value means a slower speech rate (defaults to 1.0). 
See the expressive evaluation README for details on the duration factor we used.\nexpressivity_evaluate ${TEST_SET_TSV} \\\n  --gated-model-dir ${MODEL_DIR} --task s2st --tgt_lang ${TGT_LANG} \\\n  --audio_root_dir \"\" --output_path ${OUTPUT_DIR} --ref_field ${TGT_TEXT_COL} \\\n  --model_name seamless_expressivity --vocoder_name vocoder_pretssel \\\n  --text_unk_blocking True --duration_factor ${DFACTOR}\n```\n\nPlease check out this [README section](docs\u002Fexpressive\u002FREADME.md#automatic-evaluation).\n\n### SeamlessStreaming and Seamless Evaluation\n\n[Streaming Evaluation README](src\u002Fseamless_communication\u002Fcli\u002Fstreaming) has detailed instructions for running evaluations on the SeamlessStreaming and Seamless models.\n\n## Unity.cpp\nTo enable Seamless Communication Everywhere, we implemented unity.cpp so users could run SeamlessM4T models in GGML - a C tensor library allowing easier integration on verbose platforms.\n\nTo transcribe\u002Ftranslate a given audio file,\n\n```\n.\u002Fggml\u002Fbin\u002Funity --model seamlessM4T_medium.ggml input.wav\n```\n\nFor build details and more usage, please check out [unity.cpp](ggml).\n\n## Expressive Datasets\n\nWe created two expressive speech-to-speech translation datasets, mExpresso and mDRAL, between English and five other languages -- French, German, Italian, Mandarin and Spanish. We currently open source the speech-to-text part of mExpresso for out-of-English directions, and we will open source the remaining part of the datasets soon. For details, please check out the [README](docs\u002Fexpressive\u002FREADME.md#benchmark-datasets).\n\n### SeamlessAlignExpressive\nWe’re introducing the first expressive speech alignment procedure. Starting with raw data, the expressive alignment procedure automatically discovers pairs of audio segments sharing not only the same meaning, but the same overall expressivity. 
To showcase this procedure, we are making metadata available to create a benchmarking dataset called SeamlessAlignExpressive, that can be used to validate the quality of our alignment method. SeamlessAlignExpressive is the first large-scale (11k+ hours) collection of multilingual audio alignments for expressive translation. More details can be found on the [SeamlessAlignExpressive README](docs\u002Fexpressive\u002Fseamless_align_expressive_README.md).\n\n\n## Converting raw audio to units\nPlease check out the [README here](src\u002Fseamless_communication\u002Fcli\u002Fm4t\u002Faudio_to_units). Note that SeamlessM4T v1 model uses reduced units and other models use non-reduced units.\n\n# Libraries\n\nSeamless Communication depends on 4 libraries developed by Meta.\n\n## [fairseq2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffairseq2)\nfairseq2 is our next-generation open-source library of sequence modeling components that provides researchers and developers with building blocks for machine translation, language modeling, and other sequence generation tasks. All SeamlessM4T models in this repository are powered by fairseq2.\n\n## [SONAR and BLASER 2.0](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FSONAR)\nSONAR, Sentence-level multimOdal and laNguage-Agnostic Representations is a new multilingual and -modal sentence embedding space which outperforms existing sentence embeddings such as LASER3 and LabSE on the xsim and xsim++ multilingual similarity search tasks. SONAR provides text and speech encoders for many languages. SeamlessAlign was mined based on SONAR embeddings.\n\nBLASER 2.0 is our latest model-based evaluation metric for multimodal translation. It is an extension of BLASER, supporting both speech and text. It operates directly on the source signal, and as such, does not require any intermediate ASR system like ASR-BLEU. As in the first version, BLASER 2.0 leverages the similarity between input and output sentence embeddings. 
SONAR is the underlying embedding space for BLASER 2.0. Scripts to run evaluation with BLASER 2.0 can be found in the [SONAR repo](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FSONAR).\n\n## [stopes](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fstopes)\nAs part of the seamless communication project, we've extended the stopes library. Version 1 provided a text-to-text mining tool to build training datasets for translation models. Version 2 has been extended, thanks to SONAR, to support tasks around training large speech translation models. In particular, we provide tools to read\u002Fwrite the fairseq audiozip datasets and a new mining pipeline that can do speech-to-speech, text-to-speech, speech-to-text and text-to-text mining, all based on the new SONAR embedding space.\n\n## [SimulEval](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FSimulEval)\nSimulEval is a library used for evaluating simultaneous translation models. SimulEval also provides a backend for generation using partial\u002Fincremental inputs with flexible\u002Fextensible states, which is used to implement streaming inference. Users define agents which implement SimulEval's interface, which can be connected together in a pipeline. You can find agents implemented for SeamlessStreaming [here](src\u002Fseamless_communication\u002Fstreaming\u002Fagents).\n\n## [Legacy] SeamlessM4T v1 instructions\n#### Finetuning SeamlessM4T v1 models\nPlease check out the [README here](src\u002Fseamless_communication\u002Fcli\u002Fm4t\u002Ffinetune).\n\n#### On-device models\nApart from Seamless-M4T large (2.3B) and medium (1.2B) models, we are also releasing a small model (281M) targeted for on-device inference. 
To learn more about the usage and model details check out the [README here](docs\u002Fm4t\u002Fon_device_README.md).\n\n#### SeamlessAlign mined dataset\nWe open-source the metadata to SeamlessAlign, the largest open dataset for multimodal translation, totaling 270k+ hours of aligned Speech and Text data. The dataset can be rebuilt by the community based on the [SeamlessAlign readme](docs\u002Fm4t\u002Fseamless_align_README.md).\n\n\n# Citation\nIf you use Seamless in your work or any models\u002Fdatasets\u002Fartifacts published in Seamless, please cite :\n\n```bibtex\n@inproceedings{seamless2023,\n   title=\"Seamless: Multilingual Expressive and Streaming Speech Translation\",\n   author=\"{Seamless Communication}, Lo{\\\"i}c Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, Yilin Yang, Ethan Ye, Ivan Evtimov, Pierre Fernandez, Cynthia Gao, Prangthip Hansanti, Elahe Kalbassi, Amanda Kallet, Artyom Kozhevnikov, Gabriel Mejia, Robin San Roman, Christophe Touret, Corinne Wong, Carleigh Wood, Bokai Yu, Pierre Andrews, Can Balioglu, Peng-Jen Chen, Marta R. 
# Seamless Introduction

Seamless is a family of AI models that enable more natural and authentic communication across languages. SeamlessM4T is a massive multilingual multimodal machine translation model supporting around 100 languages. SeamlessM4T serves as the foundation for SeamlessExpressive and SeamlessStreaming: SeamlessExpressive preserves elements such as prosody and voice style across languages, while SeamlessStreaming supports simultaneous translation and streaming automatic speech recognition (ASR) for around 100 languages. SeamlessExpressive and SeamlessStreaming are combined into Seamless, a unified model that is multilingual, real-time, and expressive.

## Links

### Demos

|                        | SeamlessM4T v2 | SeamlessExpressive | SeamlessStreaming |
| ---------------------- | -------------- | ------------------ | ----------------- |
| Demo                   | [SeamlessM4T v2 demo](https://seamless.metademolab.com/m4t?utm_source=github&utm_medium=web&utm_campaign=seamless&utm_content=readme) | [SeamlessExpressive demo](https://seamless.metademolab.com/expressive?utm_source=github&utm_medium=web&utm_campaign=seamless&utm_content=readme) | |
| HuggingFace Space demo | [🤗 SeamlessM4T v2 Space](https://huggingface.co/spaces/facebook/seamless-m4t-v2-large) | [🤗 SeamlessExpressive Space](https://huggingface.co/spaces/facebook/seamless-expressive) | [🤗 SeamlessStreaming Space](https://huggingface.co/spaces/facebook/seamless-streaming) |

### Papers
[Seamless](https://ai.facebook.com/research/publications/seamless-multilingual-expressive-and-streaming-speech-translation/)

[EMMA](https://ai.meta.com/research/publications/efficient-monotonic-multihead-attention/)

[SONAR](https://ai.meta.com/research/publications/sonar-expressive-zero-shot-expressive-speech-to-speech-translation/)

### Blog
[Meta AI blog](https://ai.meta.com/research/seamless-communication/)

## Tutorial
A comprehensive [tutorial](Seamless_Tutorial.ipynb), given at NeurIPS 2023 - Seamless EXPO, is a one-stop shop for learning how to use the full suite of Seamless models. Feel free to experiment with the notebook.

## SeamlessM4T
SeamlessM4T is our foundational all-in-one **M**assively **M**ultilingual and **M**ultimodal **M**achine **T**ranslation model, delivering high-quality translation for speech and text in nearly 100 languages.

SeamlessM4T models support the tasks of:
- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Text-to-speech translation (T2ST)
- Text-to-text translation (T2TT)
- Automatic speech recognition (ASR)

:star2: We released SeamlessM4T v2, an updated version with our novel *UnitY2* architecture. Compared to SeamlessM4T v1, the new model improves both quality and inference latency on speech generation tasks.

To learn more about the SeamlessM4T family of models, the approach used in each, their language coverage, and their performance, visit the [SeamlessM4T README](docs/m4t/README.md) or the [🤗 Model Card](https://huggingface.co/facebook/seamless-m4t-v2-large).

> [!NOTE]
> SeamlessM4T is also available in the 🤗 Transformers library. Visit [this section](docs/m4t/README.md#transformers-usage) for more details.

## SeamlessExpressive

SeamlessExpressive is a speech-to-speech translation model that captures under-explored aspects of prosody, such as speech rate and pauses, while preserving the speaker's style and maintaining high content translation quality.

To learn more about the SeamlessExpressive models, visit the [SeamlessExpressive README](docs/expressive/README.md) or the [🤗 Model Card](https://huggingface.co/facebook/seamless-expressive).

## SeamlessStreaming

SeamlessStreaming is a streaming translation model. The model takes speech as the input modality and outputs either speech or text.

SeamlessStreaming models support the tasks of:
- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Automatic speech recognition (ASR)

To learn more about the SeamlessStreaming models, visit the [SeamlessStreaming README](docs/streaming/README.md) or the [🤗 Model Card](https://huggingface.co/facebook/seamless-streaming).

## Seamless

The Seamless model is the unified model for expressive streaming speech-to-speech translation.

## What's new
- [12/18/2023] We open-sourced our Conformer-based [W2v-BERT 2.0 speech encoder](#w2v-bert-20-speech-encoder), described in Section 3.2.1 of the [paper](https://arxiv.org/pdf/2312.05187.pdf), which sits at the core of our Seamless models.
- [12/14/2023] We released the [tutorial](#tutorial) given at NeurIPS 2023.

# Quick start
## Installation
> [!NOTE]
> One of the prerequisites is [fairseq2](https://github.com/facebookresearch/fairseq2), which has pre-built packages available only for Linux x86-64 and Apple Silicon Mac computers. It also depends on [libsndfile](https://github.com/libsndfile/libsndfile), which may not be installed on your system. If you experience any installation issues, please refer to its [README](https://github.com/facebookresearch/fairseq2) for further instructions.

```
pip install .
```

> [!NOTE]
> Transcribing inference audio for metric computation requires [Whisper](https://github.com/openai/whisper#setup), which is installed automatically. Whisper in turn requires the command-line tool [`ffmpeg`](https://ffmpeg.org/) to be installed on your system, which is available from most package managers.

## Running inference

### SeamlessM4T inference
Here are examples of using the command-line interface from the root directory to run inference.

S2ST task:
```bash
m4t_predict <path_to_input_audio> --task s2st --tgt_lang <tgt_lang> --output_path <path_to_save_audio>
```
T2TT task:
```bash
m4t_predict <input_text> --task t2tt --tgt_lang <tgt_lang> --src_lang <src_lang>
```
Please refer to the [inference README](src/seamless_communication/cli/m4t/predict) for detailed instructions on how to run inference and the list of supported source and target languages for the speech and text modalities.

For running S2TT/ASR natively with GGML (without Python), please refer to the [unity.cpp section](#unitycpp).

### SeamlessExpressive inference
> [!NOTE]
> Please check [this section](#seamlessexpressive-models) on how to download the model.

Here is an example of using the CLI from the root directory to run inference.

```bash
expressivity_predict <path_to_input_audio> --tgt_lang <tgt_lang> --model_name seamless_expressivity --vocoder_name vocoder_pretssel --output_path <path_to_save_audio>
```

### SeamlessStreaming and Seamless inference

The [streaming evaluation README](src/seamless_communication/cli/streaming) has detailed instructions for running evaluations of the SeamlessStreaming and Seamless models. The CLI has a `--no-scoring` option that can be used to skip the scoring part and run inference only.

Please check the inference [README](src/seamless_communication/inference) for more details.

## Running the SeamlessStreaming demo
You can duplicate the [SeamlessStreaming HF Space](https://huggingface.co/spaces/facebook/seamless-streaming?duplicate=true) to run the streaming demo.

You can also run the demo locally by cloning the Space from [here](https://huggingface.co/spaces/facebook/seamless-streaming/tree/main). See the [README](https://huggingface.co/spaces/facebook/seamless-streaming/blob/main/README.md) of the SeamlessStreaming HF repo for more details on installation.

## Running SeamlessM4T and SeamlessExpressive [Gradio](https://github.com/gradio-app/gradio) demos locally

To launch the same demo Space we host on Hugging Face locally:

```bash
cd demo
pip install -r requirements.txt
python app.py
```

# Resources and usage
## Models
### SeamlessM4T models
| Model name              | #params | Checkpoint | Metrics |
| ----------------------- | ------- | ---------- | ------- |
| SeamlessM4T-Large v2    | 2.3B    | [🤗 Model card](https://huggingface.co/facebook/seamless-m4t-v2-large) - [checkpoint](https://huggingface.co/facebook/seamless-m4t-v2-large/resolve/main/seamlessM4T_v2_large.pt) | [metrics](https://dl.fbaipublicfiles.com/seamless/metrics/seamlessM4T_large_v2.zip) |
| SeamlessM4T-Large (v1)  | 2.3B    | [🤗 Model card](https://huggingface.co/facebook/seamless-m4t-large) - [checkpoint](https://huggingface.co/facebook/seamless-m4t-large/resolve/main/multitask_unity_large.pt) | [metrics](https://dl.fbaipublicfiles.com/seamless/metrics/seamlessM4T_large.zip) |
| SeamlessM4T-Medium (v1) | 1.2B    | [🤗 Model card](https://huggingface.co/facebook/seamless-m4t-medium) - [checkpoint](https://huggingface.co/facebook/seamless-m4t-medium/resolve/main/multitask_unity_medium.pt) | [metrics](https://dl.fbaipublicfiles.com/seamless/metrics/seamlessM4T_medium.zip) |

### SeamlessExpressive models

[🤗 Model card](https://huggingface.co/facebook/seamless-expressive)

To access and download SeamlessExpressive, please request the model artifacts through [this request form](https://ai.meta.com/resources/models-and-libraries/seamless-downloads/). Upon approval, you will receive an email with download links to each model artifact.

Please note that SeamlessExpressive is made available under its own [license](SEAMLESS_LICENSE) and [acceptable use policy](ACCEPTABLE_USE_POLICY).

### SeamlessStreaming models
| Model name        | #params | Checkpoint | Metrics |
| ----------------- | ------- | ---------- | ------- |
| SeamlessStreaming | 2.5B    | [🤗 Model card](https://huggingface.co/facebook/seamless-streaming) - [monotonic decoder checkpoint](https://huggingface.co/facebook/seamless-streaming/resolve/main/seamless_streaming_monotonic_decoder.pt) - [streaming UnitY2 checkpoint](https://huggingface.co/facebook/seamless-streaming/resolve/main/seamless_streaming_unity.pt) | [metrics](https://dl.fbaipublicfiles.com/seamless/metrics/streaming/seamless_streaming.zip) |

### Seamless models
The Seamless model is simply the SeamlessStreaming model with the non-expressive `vocoder_v2` swapped out for the expressive `vocoder_pretssel`.
Please check the [section above](#seamlessexpressive-models) on how to acquire the `vocoder_pretssel` checkpoint.

### W2v-BERT 2.0 speech encoder
| Model name   | #params | Checkpoint |
| ------------ | ------- | ---------- |
| W2v-BERT 2.0 | 600M    | [🤗 Model card](https://huggingface.co/facebook/conformer-shaw) - [checkpoint](https://huggingface.co/facebook/conformer-shaw/resolve/main/conformer_shaw.pt) |

Here is how to make a forward pass through the speech encoder:

```python
import torch

from fairseq2.data.audio import AudioDecoder, WaveformToFbankConverter
from fairseq2.memory import MemoryBlock
from fairseq2.nn.padding import get_seqs_and_padding_mask
from fairseq2.data import Collater
from pathlib import Path
from seamless_communication.models.conformer_shaw import load_conformer_shaw_model


audio_wav_path, device, dtype = ...
audio_decoder = AudioDecoder(dtype=torch.float32, device=device)
fbank_converter = WaveformToFbankConverter(
    num_mel_bins=80,
    waveform_scale=2**15,
    channel_last=True,
    standardize=True,
    device=device,
    dtype=dtype,
)
collater = Collater(pad_value=1)

model = load_conformer_shaw_model("conformer_shaw", device=device, dtype=dtype)
model.eval()

with Path(audio_wav_path).open("rb") as fb:
    block = MemoryBlock(fb.read())

decoded_audio = audio_decoder(block)
src = collater(fbank_converter(decoded_audio))["fbank"]
seqs, padding_mask = get_seqs_and_padding_mask(src)

with torch.inference_mode():
    seqs, padding_mask = model.encoder_frontend(seqs, padding_mask)
    seqs, padding_mask = model.encoder(seqs, padding_mask)
```

## Evaluation

### SeamlessM4T evaluation
To reproduce our results, or to evaluate with the same metrics on your own test sets, please check out the [README here](src/seamless_communication/cli/m4t/evaluate).

### SeamlessExpressive evaluation

Below is the script for efficient batched evaluation.

```bash
export MODEL_DIR="/path/to/SeamlessExpressive/model"
export TEST_SET_TSV="input.tsv" # Your dataset as a TSV file, with headers "id" and "audio"
export TGT_LANG="spa" # Target language; one of "fra", "deu", "eng" ("cmn" and "ita" are experimental)
export OUTPUT_DIR="tmp/" # Output directory for generated text/units/waveforms
export TGT_TEXT_COL="tgt_text" # The reference target-text column in ${TEST_SET_TSV} used to compute the BLEU score; you can skip this argument
export DFACTOR="1.0" # Duration factor that scales the durations predicted during inference (preddur = DFACTOR * preddur) at each position, which affects the output speech rate; larger values give slower speech (default: 1.0). See the expressive evaluation README for details on the duration factors we used.
expressivity_evaluate ${TEST_SET_TSV} \
  --gated-model-dir ${MODEL_DIR} --task s2st --tgt_lang ${TGT_LANG} \
  --audio_root_dir "" --output_path ${OUTPUT_DIR} --ref_field ${TGT_TEXT_COL} \
  --model_name seamless_expressivity --vocoder_name vocoder_pretssel \
  --text_unk_blocking True --duration_factor ${DFACTOR}
```

Please see this [README section](docs/expressive/README.md#automatic-evaluation).

### SeamlessStreaming and Seamless evaluation

The [streaming evaluation README](src/seamless_communication/cli/streaming) has detailed instructions for running evaluations of the SeamlessStreaming and Seamless models.

## Unity.cpp
To enable Seamless Communication everywhere, we implemented unity.cpp so users could run SeamlessM4T models in GGML — a C tensor library allowing easier integration on resource-constrained platforms.

To transcribe/translate a given audio file:

```
./ggml/bin/unity --model seamlessM4T_medium.ggml input.wav
```

For details on building and more usage, please check out [unity.cpp](ggml).

## Expressive datasets

We created two expressive speech-to-speech translation datasets, mExpresso and mDRAL, between English and five other languages — French, German, Italian, Mandarin, and Spanish. We currently open-source the speech-to-text portion of mExpresso for the out-of-English directions, and will open-source the remainder soon. For details, please check out the [README](docs/expressive/README.md#benchmark-datasets).

### SeamlessAlignExpressive
We introduce the first expressive speech alignment procedure. Starting with raw data, the procedure automatically discovers pairs of audio segments that share not only the same meaning, but also the same overall expressivity. To showcase this procedure, we are making metadata available to create a benchmark dataset called SeamlessAlignExpressive, which can be used to validate the quality of our alignment method. SeamlessAlignExpressive is the first large-scale (11k+ hours) collection of multilingual expressive audio alignments. More details can be found in the [SeamlessAlignExpressive README](docs/expressive/seamless_align_expressive_README.md).

## Converting raw audio to units
Please check out the [README here](src/seamless_communication/cli/m4t/audio_to_units). Note that SeamlessM4T v1 models use reduced units, while the other models use non-reduced units.

# Libraries

Seamless Communication depends on 4 libraries developed by Meta.

## [fairseq2](https://github.com/facebookresearch/fairseq2)
fairseq2 is our next-generation open-source library of sequence modeling components that provides researchers and developers with building blocks for machine translation, language modeling, and other sequence generation tasks. All SeamlessM4T models in this repository are powered by fairseq2.

## [SONAR and BLASER 2.0](https://github.com/facebookresearch/SONAR)
SONAR, Sentence-level multimOdal and laNguage-Agnostic Representations, is a new multilingual and multimodal sentence embedding space that outperforms existing sentence embeddings such as LASER3 and LabSE on the xsim and xsim++ multilingual similarity search tasks. SONAR provides text and speech encoders for many languages. SeamlessAlign was mined based on SONAR embeddings.

BLASER 2.0 is our latest model-based evaluation metric for multimodal translation. It is an extension of BLASER that supports both speech and text. It operates directly on the source signal, and as such does not require any intermediate ASR system like ASR-BLEU. As in the first version, BLASER 2.0 leverages the similarity between input and output sentence embeddings. SONAR is the underlying embedding space for BLASER 2.0. Scripts to run evaluation with BLASER 2.0 can be found in the [SONAR repo](https://github.com/facebookresearch/SONAR).

## [stopes](https://github.com/facebookresearch/stopes)
As part of the Seamless Communication project, we've extended the stopes library. Version 1 provided a text-to-text mining tool for building training datasets for translation models. Version 2 has been extended, thanks to SONAR, to support tasks around training large speech translation models. In particular, we provide tools for reading and writing fairseq audiozip datasets, and a new mining pipeline that can do speech-to-speech, text-to-speech, speech-to-text, and text-to-text mining, all based on the new SONAR embedding space.

## [SimulEval](https://github.com/facebookresearch/SimulEval)
SimulEval is a library for evaluating simultaneous translation models. SimulEval also provides a backend for running streaming inference on partial/incremental input with flexible and extensible states. Users can define agents that implement the SimulEval interface and chain them into a pipeline. The agents implemented for SeamlessStreaming can be found in [src/seamless_communication/streaming/agents].

## [Legacy] SeamlessM4T v1 instructions
#### Finetuning SeamlessM4T v1 models
Please check out the [README here](src/seamless_communication/cli/m4t/finetune).

#### On-device models
Apart from the Seamless-M4T large (2.3B) and medium (1.2B) models, we also released a small model (281M) targeting on-device inference. To learn more about usage and model details, check out the [README here](docs/m4t/on_device_README.md).

#### SeamlessAlign mined dataset
We open-sourced the metadata of SeamlessAlign, the largest open dataset for multimodal translation, totaling 270k+ hours of aligned speech and text data. The dataset can be rebuilt by the community following the [SeamlessAlign README](docs/m4t/seamless_align_README.md).

# Citation
If you use Seamless in your work, or any of the models, datasets, or artifacts published with it, please cite:

```bibtex
@inproceedings{seamless2023,
  title="Seamless: Multilingual Expressive and Streaming Speech Translation",
  author="{Seamless Communication}, Lo{\"i}c Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, Yilin Yang, Ethan Ye, Ivan Evtimov, Pierre Fernandez, Cynthia Gao, Prangthip Hansanti, Elahe Kalbassi, Amanda Kallet, Artyom Kozhevnikov, Gabriel Mejia, Robin San Roman, Christophe Touret, Corinne Wong, Carleigh Wood, Bokai Yu, Pierre Andrews, Can Balioglu, Peng-Jen Chen, Marta R. Costa-juss{\`a}, Maha Elbayad, Hongyu Gong, Francisco Guzm{\'a}n, Kevin Heffernan, Somya Jain, Justine Kao, Ann Lee, Xutai Ma, Alex Mourachko, Benjamin Peloquin, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Anna Sun, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang, Mary Williamson",
  journal={ArXiv},
  year={2023}
}
```

# License

We have three license categories.

The following non-generative components are MIT licensed, as found in [MIT_LICENSE](MIT_LICENSE):
- [W2v-BERT 2.0 speech encoder](#w2v-bert-20-speech-encoder)
- Code
- Text-only part of the mExpresso dataset, found in the [SeamlessExpressive README](docs/expressive/README.md).
- UnitY2 forced alignment extractor, found in the [UnitY2 Aligner README](docs/m4t/unity2_aligner_README.md).
- Speech toxicity tool with the etox dataset, found in the [ETOX README](src/seamless_communication/cli/toxicity/etox).
- MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector, found in the [MuTox README](src/seamless_communication/cli/toxicity/mutox).

The following models are CC-BY-NC 4.0 licensed, as found in the [LICENSE](LICENSE):
- SeamlessM4T models (v1 and v2).
- SeamlessStreaming models.

The following models are Seamless licensed, as found in [SEAMLESS_LICENSE](SEAMLESS_LICENSE):
- Seamless models.
- SeamlessExpressive models.

---

# Seamless Communication Quick Start Guide

Seamless is a family of AI models from Meta for natural, authentic cross-language communication. Core capabilities include speech-to-speech translation (S2ST), speech-to-text translation (S2TT), text-to-speech translation (T2ST), and automatic speech recognition (ASR), covering roughly 100 languages.

## Environment setup

### System requirements
- **Operating system**: Linux x86-64 or Apple Silicon (M1/M2/M3) Mac.
  > **Note**: the core dependency `fairseq2` currently ships pre-built packages only for these platforms; Windows users may need to run under WSL2.
- **Command-line tools**: `ffmpeg` must be installed (used for audio processing).
  - Ubuntu/Debian: `sudo apt-get install ffmpeg`
  - macOS (Homebrew): `brew install ffmpeg`
  - CentOS/RHEL: `sudo yum install ffmpeg`

### Prerequisites
- Python 3.9+
- PyTorch (handled automatically during installation)
- The `libsndfile` library (usually checked when installing `fairseq2`; if missing, install it via your system package manager)

## Installation

Installing directly with pip is recommended. Since the main dependencies are hosted on PyPI, users who experience slow downloads can temporarily switch to a mirror such as Tsinghua's or Alibaba's.

```bash
# Install using the default index
pip install .

# If your network is restricted, a mirror can speed up installation
pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple
```

> **Tip**: `Whisper` is pulled in automatically for evaluation metric computation; no manual installation is needed.

## Basic usage

After installation, run inference from the project root with the command-line tools. Below are minimal examples for the core **SeamlessM4T v2** model.

### 1. Speech-to-speech translation (S2ST)
Translates input audio into audio in the target language.

```bash
m4t_predict <path_to_input_audio> --task s2st --tgt_lang <tgt_lang> --output_path <path_to_save_audio>
```

**Example**: translate `input.wav` into French and save the result:
```bash
m4t_predict input.wav --task s2st --tgt_lang fra --output_path output_fra.wav
```

### 2. Text-to-text translation (T2TT)
Translates input text from the source language into the target language.

```bash
m4t_predict "<input_text>" --task t2tt --tgt_lang <tgt_lang> --src_lang <src_lang>
```

**Example**: translate the English "Hello world" into Chinese:
```bash
m4t_predict "Hello world" --task t2tt --tgt_lang cmn --src_lang eng
```

### 3. Expressive speech translation (SeamlessExpressive)
Speech translation that preserves the speaker's voice style and prosody (first request and download the model weights as documented).

```bash
expressivity_predict <path_to_input_audio> --tgt_lang <tgt_lang> --model_name seamless_expressivity --vocoder_name vocoder_pretssel --output_path <path_to_save_audio>
```

> **Notes**:
> - `<tgt_lang>`: the target language code (e.g. `cmn` for Mandarin Chinese, `fra` for French, `spa` for Spanish).
> - For the full list of supported languages and more advanced usage (such as streaming translation), see the README files under `src/seamless_communication/cli` in the source tree.

---

# Use case

A multinational medical aid organization is building a real-time multilingual emergency command system, so that doctors and field responders who speak different languages can communicate directly by voice.

### Without seamless_communication
- **Severe communication latency**: traditional pipelines chain speech-to-text, text translation, and speech synthesis, adding multi-second pauses to every exchange and delaying emergency instructions.
- **Loss of emotional cues**: existing tools produce flat, mechanical speech that cannot convey whether a speaker is anxious or calm, making it hard for listeners to judge urgency.
- **Poor low-resource language support**: for less common languages (such as Swahili or specific dialects), systems often fail outright or produce very high error rates, forcing reliance on scarce human interpreters.
- **Complex deployment and maintenance**: separate ASR, machine translation, and TTS models must each be integrated, with difficult interface plumbing and heavy server resource consumption.

### With seamless_communication
- **Streaming simultaneous translation**: the SeamlessStreaming model outputs the target language almost in real time as speech comes in, keeping conversations fluid.
- **Preserved vocal expression**: SeamlessExpressive retains the original speaker's intonation, rhythm, and emotional color, so instructions sound authentic and compelling.
- **Broad language coverage**: built on SeamlessM4T v2, the system handles high-quality translation across nearly 100 languages, including many that were previously unsupported.
- **Unified, efficient deployment**: a single unified model covers speech-to-speech, text-to-text, and the rest of the pipeline, substantially reducing inference latency and operational complexity.

With a single model, seamless_communication delivers high-fidelity, low-latency real-time translation across roughly a hundred languages, removing the language barrier in emergency response scenarios.

---

# FAQ

**Q: What should I do about `RuntimeError: expected scalar type Half but found Float` during finetuning?**

A: This is usually a precision (autocast) issue. An official PR fixing it has been merged, so updating to the latest code resolves it. On older versions you can temporarily use the fix branch `zrthxn:fix/finetune-precision-autocast`. If you also hit CUDA out-of-memory errors, set the batch size to 1 and try the medium model instead of large. ([issue #414](https://github.com/facebookresearch/seamless_communication/issues/414))

**Q: How much GPU memory does finetuning SeamlessM4T need? Why does it still fail with 40 GB?**

A: Even with 40 GB of VRAM (e.g. an A100), finetuning the large model can still crash with out-of-memory errors. Suggested workarounds:
1. Use the medium model (`--model_name seamlessM4T_medium`).
2. Set the batch size to 1 (`--batch_size 1`).
3. Drop the `torchrun`-related arguments (such as `--rdzv-backend` and `--rdzv-endpoint`), which saves roughly 6 GB.

Example command:
```
m4t_finetune --mode SPEECH_TO_TEXT --train_dataset <path> --eval_dataset <path> --learning_rate 1e-6 --batch_size 1 --model_name seamlessM4T_medium
```
([issue #421](https://github.com/facebookresearch/seamless_communication/issues/421))

**Q: How do I fix `OSError: libsndfile is not found` after installation?**

A: The system is missing the `libsndfile` library. Even if it was installed through conda, some Linux distributions (e.g. Ubuntu/Pop!_OS) still need the underlying system package: run `sudo apt install libsndfile1` and retry. In a Conda environment, also make sure you have run `conda install -c conda-forge libsndfile`. ([issue #42](https://github.com/facebookresearch/seamless_communication/issues/42))

**Q: SeamlessM4T Large outputs Simplified-Chinese Mandarin instead of Cantonese when the target language is `yue`. Is this a bug?**

A: This is a known issue. With the large model, targeting `yue` sometimes incorrectly produces Simplified-Chinese Mandarin, while the medium model behaves correctly. It is related to labels and code switching in the training data. For now, use the medium model for Cantonese translation, or try `cmn_Hant` (Traditional-script Chinese) as a workaround; although it is intended for Traditional-script Mandarin, it can come closer to the expected output in some cases. ([issue #64](https://github.com/facebookresearch/seamless_communication/issues/64))

**Q: What GPU is recommended for running SeamlessM4T on Google Colab?**

A: According to user reports, the A100 GPUs offered by Google Colab generally run SeamlessM4T well. If the large model runs into memory problems, switch to the medium model, or set the batch size to 1 when finetuning. ([issue #414](https://github.com/facebookresearch/seamless_communication/issues/414))

**Q: How can I deploy SeamlessM4T on machines with limited VRAM (e.g. 4 GB or 12 GB)?**

A: Match the model size to the available memory:
- 4 GB VRAM: only the medium model fits.
- 12 GB VRAM: the large model fits.

Make sure the environment dependencies are installed correctly with Conda. For cloud deployments (e.g. Cerebrium), following the reference example configuration helps avoid compatibility issues. ([issue #42](https://github.com/facebookresearch/seamless_communication/issues/42))
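
The `expressivity_evaluate` script above reads a TSV manifest with `id` and `audio` columns, plus an optional reference-text column (`tgt_text` by default). Here is a minimal sketch of generating such a manifest with the Python standard library; the helper name `write_manifest` and everything beyond those column names are our assumptions, not part of the seamless_communication API:

```python
import csv
from pathlib import Path


def write_manifest(rows, path="input.tsv", ref_col="tgt_text"):
    """Write a TSV manifest for expressivity_evaluate.

    rows: iterable of (utterance_id, audio_path, reference_text) tuples.
    The header row uses the "id" and "audio" column names expected by the
    evaluation CLI, plus ref_col for the BLEU reference text.
    """
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["id", "audio", ref_col])
        for utt_id, audio_path, ref_text in rows:
            writer.writerow([utt_id, audio_path, ref_text])
    return path


# Hypothetical paths and reference texts, for illustration only.
manifest = write_manifest([
    ("utt0", "audio/utt0.wav", "Hola mundo"),
    ("utt1", "audio/utt1.wav", "Buenos dias"),
])
print(manifest.read_text(encoding="utf-8"))
```

The resulting `input.tsv` can then be passed to `expressivity_evaluate` via the `TEST_SET_TSV` variable, with `TGT_TEXT_COL` pointing at the reference column.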