[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-awslabs--sockeye":3,"tool-awslabs--sockeye":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 
多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":105,"forks":106,"last_commit_at":107,"license":108,"difficulty_score":23,"env_os":109,"env_gpu":109,"env_ram":109,"env_deps":110,"category_tags":114,"github_topics":115,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":133,"updated_at":134,"faqs":135,"releases":146},1208,"awslabs\u002Fsockeye","sockeye","Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch","Sockeye 是一个开源的神经机器翻译框架，专注于高效构建和部署序列到序列模型，基于 PyTorch 开发。它解决了大规模机器翻译模型训练和推理的效率问题——通过分布式训练和优化推理技术，让模型在实际应用中运行更快、资源消耗更低，已成功应用于 Amazon Translate 等产品。目前 Sockeye 处于维护模式，功能稳定可靠，不再添加新特性。它特别适合 NLP 领域的开发者和研究人员，用于快速搭建高质量翻译系统，尤其适合处理中英文等语言对的翻译任务。技术亮点包括对 PyTorch 的深度优化、支持大规模数据训练，以及提供从旧模型迁移的便捷工具（如 MXNet 到 PyTorch 的转换）。如果你在开发翻译应用或研究 NMT，Sockeye 能帮你简化流程，专注模型创新。","# Sockeye\n\n**Sockeye has entered maintenance mode and is no longer adding new features. We are grateful to everyone who has contributed to Sockeye throughout its development with pull requests, issue reports, and more.**\n\n[![PyPI version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fsockeye.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fsockeye)\n[![GitHub license](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fawslabs\u002Fsockeye.svg)](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Fblob\u002Fmain\u002FLICENSE)\n[![GitHub issues](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fawslabs\u002Fsockeye.svg)](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Fissues)\n[![Documentation Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fawslabs_sockeye_readme_6bf48b3e9a6d.png)](http:\u002F\u002Fsockeye.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest)\n[![Torch Nightly](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Factions\u002Fworkflows\u002Ftorch_nightly.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Factions\u002Fworkflows\u002Ftorch_nightly.yml)\n\nSockeye is an open-source sequence-to-sequence framework for Neural Machine Translation built on [PyTorch](https:\u002F\u002Fpytorch.org\u002F). 
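For illustration, a conversion pass over a fully-trained MXNet model might look as follows (a minimal sketch: the converter's exact flags are not documented in this README, so inspect its `--help` output first; `\u003Cmodel>` is a placeholder path):\n```bash\n# Run in an environment that has MXNet installed.\npython -m sockeye.mx_to_pt --help\n# After a successful conversion, the model directory holds both parameter files:\n#   \u003Cmodel>\u002Fparams.best     (converted PyTorch parameters)\n#   \u003Cmodel>\u002Fparams.best.mx  (backup of the original MXNet parameters)\nls \u003Cmodel>\u002Fparams.best \u003Cmodel>\u002Fparams.best.mx\n```\n\n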
All CLIs of Version 3.0.0 now use PyTorch by default, e.g. `sockeye-{train,translate,score}`.\nMXNet-based CLIs\u002Fmodules are still operational and accessible via `sockeye-{train,translate,score}-mx`.\n\nSockeye 3 can be installed and run without MXNet, but if MXNet is installed, an extended test suite is executed to ensure\nequivalence between PyTorch and MXNet models. Note that running Sockeye 3.0.0 with MXNet requires MXNet 2.x to be\ninstalled (`pip install --pre -f https:\u002F\u002Fdist.mxnet.io\u002Fpython 'mxnet>=2.0.0b2021'`).\n\n## Installation\n\nDownload the current version of Sockeye:\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye.git\n```\n\nInstall the sockeye module and its dependencies:\n```bash\ncd sockeye && pip3 install --editable .\n```\n\nFor faster GPU training, install [NVIDIA Apex](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex). NVIDIA also provides [PyTorch Docker containers](https:\u002F\u002Fngc.nvidia.com\u002Fcatalog\u002Fcontainers\u002Fnvidia:pytorch) that include Apex.\n\n
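Once a trained model is available (see the tutorial above), a quick smoke test of an installation can look like this (a minimal sketch, assuming a model directory `.\u002Fmodel`; `sockeye-translate` reads sentences from stdin by default):\n```bash\n# Translate one sentence with a trained model (sketch).\necho \"das ist ein test\" | sockeye-translate --models .\u002Fmodel\n```\n\n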
## Documentation\n\n- For information on how to use Sockeye, please visit [our documentation](https:\u002F\u002Fawslabs.github.io\u002Fsockeye\u002F).\n- Developers may be interested in our [developer guidelines](https:\u002F\u002Fawslabs.github.io\u002Fsockeye\u002Fdevelopment.html).\n\n### Older versions\n\n- Sockeye 3.0, based on PyTorch & MXNet 2.x, is available in the `sockeye_30` branch.\n- Sockeye 2.x, based on the MXNet Gluon API, is available in the `sockeye_2` branch.\n- Sockeye 1.x, based on the MXNet Module API, is available in the `sockeye_1` branch.\n\n## Citation\n\nFor more information about Sockeye, see our papers ([BibTeX](sockeye.bib)).\n\n##### Sockeye 3.x\n\n> Felix Hieber, Michael Denkowski, Tobias Domhan, Barbara Darques Barros, Celina Dong Ye, Xing Niu, Cuong Hoang, Ke Tran, Benjamin Hsu, Maria Nadejde, Surafel Lakew, Prashant Mathur, Anna Currey, Marcello Federico.\n> [Sockeye 3: Fast Neural Machine Translation with PyTorch](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.05851). ArXiv e-prints.\n\n##### Sockeye 2.x\n\n> Tobias Domhan, Michael Denkowski, David Vilar, Xing Niu, Felix Hieber, Kenneth Heafield.\n> [The Sockeye 2 Neural Machine Translation Toolkit at AMTA 2020](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.amta-research.10\u002F). Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (AMTA'20).\n\n> Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar.\n> [Sockeye 2: A Toolkit for Neural Machine Translation](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fsockeye-2-a-toolkit-for-neural-machine-translation). Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, Project Track (EAMT'20).\n\n##### Sockeye 1.x\n\n> Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post.\n> [The Sockeye Neural Machine Translation Toolkit at AMTA 2018](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FW18-1820\u002F). Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (AMTA'18).\n>\n> Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton and Matt Post. 2017.\n> [Sockeye: A Toolkit for Neural Machine Translation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.05690). ArXiv e-prints.\n\n## Research with Sockeye\n\nSockeye has been used for both academic and industrial research. A list of known publications that use Sockeye is shown below.\nIf you know more, please let us know or submit a pull request (last updated: May 2022).\n\n### 2023\n* Zhang, Xuan, Kevin Duh, Paul McNamee. \"A Hyperparameter Optimization Toolkit for Neural Machine Translation Research\". Proceedings of ACL (2023).\n\n### 2022\n* Currey, Anna, Maria Nădejde, Raghavendra Pappagari, Mia Mayer, Stanislas Lauly, Xing Niu, Benjamin Hsu, Georgiana Dinu. \"MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation\". Proceedings of EMNLP (2022).\n* Domhan, Tobias, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne and Felix Hieber. \"The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation\". Proceedings of NAACL-HLT (2022)\n* Fischer, Lukas, Patricia Scheurer, Raphael Schwitter, Martin Volk. 
\"Machine Translation of 16th Century Letters from Latin to German\". Workshop on Language Technologies for Historical and Ancient Languages (2022).\n* Knowles, Rebecca, Patrick Littell. \"Translation Memories as Baselines for Low-Resource Machine Translation\". Proceedings of LREC (2022)\n* McNamee, Paul, Kevin Duh. \"The Multilingual Microblog Translation Corpus: Improving and Evaluating Translation of User-Generated Text\". Proceedings of LREC (2022)\n* Nadejde Maria, Anna Currey, Benjamin Hsu, Xing Niu, Marcello Federico, Georgiana Dinu. \"CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality\". Proceedings of NAACL (2022).\n* Weller-Di Marco, Marion, Matthias Huck, Alexander Fraser. \"Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies\n\". arXiv preprint arXiv:2203.13550 (2022)\n\n\n### 2021\n\n* Bergmanis, Toms, Mārcis Pinnis. \"Facilitating Terminology Translation with Target Lemma Annotations\". arXiv preprint arXiv:2101.10035 (2021)\n* Briakou, Eleftheria, Marine Carpuat. \"Beyond Noise: Mitigating the Impact of Fine-grained Semantic Divergences on Neural Machine Translation\". arXiv preprint arXiv:2105.15087 (2021)\n* Hasler, Eva, Tobias Domhan, Sony Trenous, Ke Tran, Bill Byrne, Felix Hieber. \"Improving the Quality Trade-Off for Neural Machine Translation Multi-Domain Adaptation\". Proceedings of EMNLP (2021)\n* Tang, Gongbo, Philipp Rönchen, Rico Sennrich, Joakim Nivre. \"Revisiting Negation in Neural Machine Translation\". Transactions of the Association for Computation Linguistics 9 (2021)\n* Vu, Thuy, Alessandro Moschitti. \"Machine Translation Customization via Automatic Training Data Selection from the Web\". arXiv preprint arXiv:2102.1024 (2021)\n* Xu, Weijia, Marine Carpuat. \"EDITOR: An Edit-Based Transformer with Repositioning for Neural Machine Translation with Soft Lexical Constraints.\" Transactions of the Association for Computation Linguistics 9 (2021)\n* Müller, Mathias, Rico Sennrich. \"Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation\". Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (2021)\n* Popović, Maja, Alberto Poncelas. \"On Machine Translation of User Reviews.\" Proceedings of RANLP (2021)\n* Popović, Maja. \"On nature and causes of observed MT errors.\" Proceedings of the 18th MT Summit (Volume 1: Research Track) (2021)\n* Jain, Nishtha, Maja Popović, Declan Groves, Eva Vanmassenhove. \"Generating Gender Augmented Data for NLP.\" Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing (2021)\n* Vilar, David, Marcello Federico. \"A Statistical Extension of Byte-Pair Encoding.\" Proceedings of IWSLT (2021)\n\n### 2020\n\n* Dinu, Georgiana, Prashant Mathur, Marcello Federico, Stanislas Lauly, Yaser Al-Onaizan. \"Joint translation and unit conversion for end-to-end localization.\" Proceedings of IWSLT (2020)\n* Exel, Miriam, Bianka Buschbeck, Lauritz Brandt, Simona Doneva. \"Terminology-Constrained Neural Machine Translation at SAP\". Proceedings of EAMT (2020).\n* Hisamoto, Sorami, Matt Post, Kevin Duh. \"Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?\" Transactions of the Association for Computational Linguistics, Volume 8 (2020)\n* Naradowsky, Jason, Xuan Zhan, Kevin Duh. 
\"Machine Translation System Selection from Bandit Feedback.\" arXiv preprint arXiv:2002.09646 (2020)\n* Niu, Xing, Prashant Mathur, Georgiana Dinu, Yaser Al-Onaizan. \"Evaluating Robustness to Input Perturbations for Neural Machine Translation\". arXiv preprint \tarXiv:2005.00580 (2020)\n* Niu, Xing, Marine Carpuat. \"Controlling Neural Machine Translation Formality with Synthetic Supervision.\" Proceedings of AAAI (2020)\n* Keung, Phillip, Julian Salazar, Yichao Liu, Noah A. Smith. \"Unsupervised Bitext Mining and Translation\nvia Self-Trained Contextual Embeddings.\" arXiv preprint arXiv:2010.07761 (2020).\n* Sokolov, Alex, Tracy Rohlin, Ariya Rastrow. \"Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion.\" arXiv preprint arXiv:2006.14194 (2020)\n* Stafanovičs, Artūrs, Toms Bergmanis, Mārcis Pinnis. \"Mitigating Gender Bias in Machine Translation with Target Gender\nAnnotations.\" arXiv preprint arXiv:2010.06203 (2020)\n* Stojanovski, Dario, Alexander Fraser. \"Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation.\" arXiv preprint arXiv preprint arXiv:2004.14927 (2020)\n* Stojanovski, Dario, Benno Krojer, Denis Peskov, Alexander Fraser. \"ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation\". Proceedings of COLING (2020)\n* Zhang, Xuan, Kevin Duh. \"Reproducible and Efficient Benchmarks for Hyperparameter Optimization of Neural Machine Translation Systems.\" Transactions of the Association for Computational Linguistics, Volume 8 (2020)\n* Swe Zin Moe, Ye Kyaw Thu, Hnin Aye Thant, Nandar Win Min, and Thepchai Supnithi, \"Unsupervised Neural Machine Translation between Myanmar Sign Language and Myanmar Language\", Journal of Intelligent Informatics and Smart Technology, April 1st Issue, 2020, pp. 53-61. (Submitted December 21, 2019; accepted March 6, 2020; revised March 16, 2020; published online April 30, 2020)\n* Thazin Myint Oo, Ye Kyaw Thu, Khin Mar Soe and Thepchai Supnithi, \"Neural Machine Translation between Myanmar (Burmese) and Dawei (Tavoyan)\", In Proceedings of the 18th International Conference on Computer Applications (ICCA 2020), Feb 27-28, 2020, Yangon, Myanmar, pp. 219-227\n* Müller, Mathias, Annette Rios, Rico Sennrich. \"Domain Robustness in Neural Machine Translation.\" Proceedings of AMTA (2020)\n* Rios, Annette, Mathias Müller, Rico Sennrich. \"Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation.\" Proceedings of the 5th WMT: Research Papers (2020)\n* Popović, Maja, Alberto Poncelas. \"Neural Machine Translation between similar South-Slavic languages.\" Proceedings of the 5th WMT: Research Papers (2020)\n* Popović, Maja, Alberto Poncelas. \"Extracting correctly aligned segments from unclean parallel data using character n-gram matching.\" Proceedings of Conference on Language Technologies & Digital Humanities (JTDH 2020).\n* Popović, Maja, Alberto Poncelas, Marija Brkic, Andy Way. \"Neural Machine Translation for translating into Croatian and Serbian.\" Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects (2020)\n\n### 2019\n\n* Agrawal, Sweta, Marine Carpuat. \"Controlling Text Complexity in Neural Machine Translation.\" Proceedings of EMNLP (2019)\n* Beck, Daniel, Trevor Cohn, Gholamreza Haffari. \"Neural Speech Translation using Lattice Transformations and Graph Networks.\" Proceedings of TextGraphs-13 (EMNLP 2019)\n* Currey, Anna, Kenneth Heafield. 
\"Zero-Resource Neural Machine Translation with Monolingual Pivot Data.\" Proceedings of EMNLP (2019)\n* Gupta, Prabhakar, Mayank Sharma. \"Unsupervised Translation Quality Estimation for Digital Entertainment Content Subtitles.\" IEEE International Journal of Semantic Computing (2019)\n* Hu, J. Edward, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, and Benjamin Van Durme. \"Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting.\" Proceedings of NAACL-HLT (2019)\n* Rosendahl, Jan, Christian Herold, Yunsu Kim, Miguel Graça,Weiyue Wang, Parnia Bahar, Yingbo Gao and Hermann Ney “The RWTH Aachen University Machine Translation Systems for WMT 2019” Proceedings of the 4th WMT: Research Papers (2019)\n* Thompson, Brian, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, and Philipp Koehn. \"Overcoming catastrophic forgetting during domain adaptation of neural machine translation.\" Proceedings of NAACL-HLT 2019 (2019)\n* Tättar, Andre, Elizaveta Korotkova, Mark Fishel “University of Tartu’s Multilingual Multi-domain WMT19 News Translation Shared Task Submission” Proceedings of 4th WMT: Research Papers (2019)\n* Thazin Myint Oo, Ye Kyaw Thu and Khin Mar Soe, \"Neural Machine Translation between Myanmar (Burmese) and Rakhine (Arakanese)\", In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, NAACL-2019, June 7th 2019, Minneapolis, United States, pp. 80-88\n\n### 2018\n\n* Domhan, Tobias. \"How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures\". Proceedings of 56th ACL (2018)\n* Kim, Yunsu, Yingbo Gao, and Hermann Ney. \"Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies.\" arXiv preprint arXiv:1905.05475 (2019)\n* Korotkova, Elizaveta, Maksym Del, and Mark Fishel. \"Monolingual and Cross-lingual Zero-shot Style Transfer.\" arXiv preprint arXiv:1808.00179 (2018)\n* Niu, Xing, Michael Denkowski, and Marine Carpuat. \"Bi-directional neural machine translation with synthetic parallel data.\" arXiv preprint arXiv:1805.11213 (2018)\n* Niu, Xing, Sudha Rao, and Marine Carpuat. \"Multi-Task Neural Models for Translating Between Styles Within and Across Languages.\" COLING (2018)\n* Post, Matt and David Vilar. \"Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation.\" Proceedings of NAACL-HLT (2018)\n* Schamper, Julian, Jan Rosendahl, Parnia Bahar, Yunsu Kim, Arne Nix, and Hermann Ney. \"The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018.\" Proceedings of the 3rd WMT: Shared Task Papers (2018)\n* Schulz, Philip, Wilker Aziz, and Trevor Cohn. \"A stochastic decoder for neural machine translation.\" arXiv preprint arXiv:1805.10844 (2018)\n* Tamer, Alkouli, Gabriel Bretschner, and Hermann Ney. \"On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation.\" Proceedings of the 3rd WMT: Research Papers (2018)\n* Tang, Gongbo, Rico Sennrich, and Joakim Nivre. \"An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation.\" Proceedings of 3rd WMT: Research Papers (2018)\n* Thompson, Brian, Huda Khayrallah, Antonios Anastasopoulos, Arya McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, and Philipp Koehn. \"Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation.\" arXiv preprint arXiv:1809.05218 (2018)\n* Vilar, David. 
\"Learning Hidden Unit Contribution for Adapting Neural Machine Translation Models.\" Proceedings of NAACL-HLT (2018)\n* Vyas, Yogarshi, Xing Niu and Marine Carpuat “Identifying Semantic Divergences in Parallel Text without Annotations”. Proceedings of NAACL-HLT (2018)\n* Wang, Weiyue, Derui Zhu, Tamer Alkhouli, Zixuan Gan, and Hermann Ney. \"Neural Hidden Markov Model for Machine Translation\". Proceedings of 56th ACL (2018)\n* Zhang, Xuan, Gaurav Kumar, Huda Khayrallah, Kenton Murray, Jeremy Gwinnup, Marianna J Martindale, Paul McNamee, Kevin Duh, and Marine Carpuat. \"An Empirical Exploration of Curriculum Learning for Neural Machine Translation.\" arXiv preprint arXiv:1811.00739 (2018)\n* Swe Zin Moe, Ye Kyaw Thu, Hnin Aye Thant and Nandar Win Min, \"Neural Machine Translation between Myanmar Sign Language and Myanmar Written Text\", In the second Regional Conference on Optical character recognition and Natural language processing technologies for ASEAN languages 2018 (ONA 2018), December 13-14, 2018, Phnom Penh, Cambodia.\n* Tang, Gongbo, Mathias Müller, Annette Rios and Rico Sennrich. \"Why Self-attention? A Targeted Evaluation of Neural Machine Translation Architectures.\" Proceedings of EMNLP (2018)\n\n### 2017\n\n* Domhan, Tobias and Felix Hieber. \"Using target-side monolingual data for neural machine translation through multi-task learning.\" Proceedings of EMNLP (2017).\n","# 红鲑鱼\n\n**红鲑鱼已进入维护模式，不再添加新功能。我们感谢所有在开发过程中通过提交拉取请求、报告问题等方式为红鲑鱼做出贡献的人们。**\n\n[![PyPI版本](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fsockeye.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fsockeye)\n[![GitHub许可证](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fawslabs\u002Fsockeye.svg)](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Fblob\u002Fmain\u002FLICENSE)\n[![GitHub问题](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fawslabs\u002Fsockeye.svg)](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Fissues)\n[![文档状态](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fawslabs_sockeye_readme_6bf48b3e9a6d.png)](http:\u002F\u002Fsockeye.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest)\n[![Torch Nightly](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Factions\u002Fworkflows\u002Ftorch_nightly.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Factions\u002Fworkflows\u002Ftorch_nightly.yml)\n\nSockeye 是一个基于 [PyTorch](https:\u002F\u002Fpytorch.org\u002F) 的开源序列到序列框架，用于神经机器翻译。它实现了分布式训练和针对最先进模型的优化推理，为 [Amazon Translate](https:\u002F\u002Faws.amazon.com\u002Ftranslate\u002F) 和其他机器翻译应用提供支持。近期的开发和变更记录在我们的 [CHANGELOG](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Fblob\u002Fmaster\u002FCHANGELOG.md) 中。\n\n如需快速开始训练任意规模数据的标准 NMT 模型，请参阅 [WMT 2014 英德教程](docs\u002Ftutorials\u002Fwmt_large.md)。\n\n如有任何问题或需要报告问题，请在 GitHub 上 [提交 issue](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Fissues\u002Fnew)。\n\n### 版本 3.1.x：仅支持 PyTorch\n在版本 3.1.x 中，我们移除了对 MXNet 2.x 的支持。使用 PyTorch 和 Sockeye 3.0.x 训练的模型仍然与 Sockeye 3.1.x 兼容。而使用 MXNet 训练并在 Sockeye 3.0.x 的转换工具中转为 PyTorch 格式的 2.3.x 模型，则无法在 Sockeye 3.1.x 中使用。\n\n### 版本 3.0.0：同时支持 PyTorch 和 MXNet\n从版本 3.0.0 开始，Sockeye 也基于 PyTorch 构建。我们保持了与 2.3.x 版本 MXNet 模型的向后兼容性。如果安装了 MXNet 2.x，Sockeye 可以同时使用 PyTorch 或 MXNet 运行。\n\n所有使用 MXNet 训练的 2.3.x 模型都可以通过转换 CLI (`sockeye.mx_to_pt`) 转换为 PyTorch 模型。这将生成一个 PyTorch 参数文件 (`\u003Cmodel>\u002Fparams.best`)，并将现有的 MXNet 参数文件备份为 `\u003Cmodel>\u002Fparams.best.mx`。请注意，此操作仅适用于已完成训练并用于推理的模型。继续使用 PyTorch 对 MXNet 
模型进行训练是不被支持的（因为我们不会转换训练和优化器状态）。\n\n`sockeye.mx_to_pt` 需要环境中已安装 MXNet。\n\n版本 3.0.0 的所有 CLI 现默认使用 PyTorch，例如 `sockeye-{train,translate,score}`。基于 MXNet 的 CLI\u002F模块仍然可用，可通过 `sockeye-{train,translate,score}-mx` 访问。\n\nSockeye 3 可以在没有 MXNet 的情况下安装和运行，但如果安装了 MXNet，则会执行扩展测试套件，以确保 PyTorch 和 MXNet 模型之间的等效性。需要注意的是，使用 MXNet 运行 Sockeye 3.0.0 需要安装 MXNet 2.x（`pip install --pre -f https:\u002F\u002Fdist.mxnet.io\u002Fpython 'mxnet>=2.0.0b2021'`）。\n\n## 安装\n\n下载当前版本的 Sockeye：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye.git\n```\n\n安装 sockeye 模块及其依赖项：\n```bash\ncd sockeye && pip3 install --editable .\n```\n\n为了加速 GPU 训练，建议安装 [NVIDIA Apex](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex)。NVIDIA 还提供了包含 Apex 的 [PyTorch Docker 容器](https:\u002F\u002Fngc.nvidia.com\u002Fcatalog\u002Fcontainers\u002Fnvidia:pytorch)。\n\n## 文档\n\n- 如需了解如何使用 Sockeye，请访问 [我们的文档](https:\u002F\u002Fawslabs.github.io\u002Fsockeye\u002F)。\n- 开发者可能对我们的 [开发者指南](https:\u002F\u002Fawslabs.github.io\u002Fsockeye\u002Fdevelopment.html) 感兴趣。\n\n### 较旧版本\n\n- 基于 PyTorch 和 MXNet 2.x 的 Sockeye 3.0 可在 `sockeye_30` 分支中找到。\n- 基于 MXNet Gluon API 的 Sockeye 2.x 可在 `sockeye_2` 分支中找到。\n- 基于 MXNet Module API 的 Sockeye 1.x 可在 `sockeye_1` 分支中找到。\n\n## 引用\n\n有关 Sockeye 的更多信息，请参阅我们的论文（[BibTeX](sockeye.bib)）。\n\n##### Sockeye 3.x\n\n> Felix Hieber, Michael Denkowski, Tobias Domhan, Barbara Darques Barros, Celina Dong Ye, Xing Niu, Cuong Hoang, Ke Tran, Benjamin Hsu, Maria Nadejde, Surafel Lakew, Prashant Mathur, Anna Currey, Marcello Federico.\n> [Sockeye 3：使用 PyTorch 的快速神经机器翻译](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.05851)。ArXiv e-prints。\n\n##### Sockeye 2.x\n\n> Tobias Domhan, Michael Denkowski, David Vilar, Xing Niu, Felix Hieber, Kenneth Heafield.\n> [AMTA 2020 上的 Sockeye 2 神经机器翻译工具包](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002F2020.amta-research.10\u002F)。第14届美洲机器翻译协会会议（AMTA'20）论文集。\n\n> Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar。\n> [Sockeye 2：神经机器翻译工具包](https:\u002F\u002Fwww.amazon.science\u002Fpublications\u002Fsockeye-2-a-toolkit-for-neural-machine-translation)。第22届欧洲机器翻译协会年度会议项目赛道（EAMT'20）论文集。\n\n##### Sockeye 1.x\n\n> Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post。\n> [AMTA 2018 上的 Sockeye 神经机器翻译工具包](https:\u002F\u002Fwww.aclweb.org\u002Fanthology\u002FW18-1820\u002F)。第13届美洲机器翻译协会会议（AMTA'18）论文集。\n>\n> Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov、Ann Clifton 和 Matt Post。2017年。\n> [Sockeye：神经机器翻译工具包](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.05690)。ArXiv e-prints。\n\n## 使用 Sockeye 的研究\n\nSockeye 已被用于学术和工业研究。以下列出了已知使用 Sockeye 的出版物清单。如果您了解更多，请告知我们或提交拉取请求（最后更新：2022年5月）。\n\n### 2023年\n* Zhang, Xuan, Kevin Duh, Paul McNamee. “用于神经机器翻译研究的超参数优化工具包”。ACL 2023 年论文集。\n\n### 2022年\n* Currey, Anna, Maria Nădejde, Raghavendra Pappagari, Mia Mayer, Stanislas Lauly, Xing Niu, Benjamin Hsu, Georgiana Dinu. “MT-GenEval：用于评估机器翻译性别准确性的反事实与上下文数据集”。EMNLP会议论文集（2022年）。\n* Domhan, Tobias, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne 和 Felix Hieber. “魔鬼藏在细节中：关于神经机器翻译中词汇选择的陷阱”。NAACL-HLT会议论文集（2022年）。\n* Fischer, Lukas, Patricia Scheurer, Raphael Schwitter, Martin Volk. “16世纪拉丁文书信到德语的机器翻译”。历史与古代语言技术研讨会（2022年）。\n* Knowles, Rebecca, Patrick Littell. “翻译记忆库作为低资源机器翻译的基线”。LREC会议论文集（2022年）。\n* McNamee, Paul, Kevin Duh. “多语言微型博客翻译语料库：改进与评估用户生成文本的翻译”。LREC会议论文集（2022年）。\n* Nadejde Maria, Anna Currey, Benjamin Hsu, Xing Niu, Marcello Federico, Georgiana Dinu. 
“CoCoA-MT：一种用于对比控制式机器翻译的数据集与基准，应用于正式度任务”。NAACL会议论文集（2022年）。\n* Weller-Di Marco, Marion, Matthias Huck, Alexander Fraser. “神经机器翻译中目标端形态建模：策略比较”。arXiv预印本 arXiv:2203.13550（2022年）\n\n\n### 2021年\n\n* Bergmanis, Toms, Mārcis Pinnis. “借助目标词形标注促进术语翻译”。arXiv预印本 arXiv:2101.10035（2021年）。\n* Briakou, Eleftheria, Marine Carpuat. “超越噪声：缓解细粒度语义差异对神经机器翻译的影响”。arXiv预印本 arXiv:2105.15087（2021年）。\n* Hasler, Eva, Tobias Domhan, Sony Trenous, Ke Tran, Bill Byrne、Felix Hieber. “改善神经机器翻译多领域适应中的质量权衡”。EMNLP会议论文集（2021年）。\n* Tang, Gongbo, Philipp Rönchen, Rico Sennrich, Joakim Nivre. “重新审视神经机器翻译中的否定表达”。计算语言学协会汇刊第9卷（2021年）。\n* Vu, Thuy, Alessandro Moschitti. “通过从网络自动选择训练数据实现机器翻译定制化”。arXiv预印本 arXiv:2102.1024（2021年）。\n* Xu, Weijia, Marine Carpuat. “EDITOR：一种基于编辑操作并具有重排功能的Transformer模型，适用于带有软性词汇约束的神经机器翻译”。计算语言学协会汇刊第9卷（2021年）。\n* Müller, Mathias, Rico Sennrich. “理解神经机器翻译中最小贝叶斯风险解码的特性”。第59届计算语言学协会年会暨第11届自然语言处理国际联合会议论文集（第一卷：长篇论文）（2021年）。\n* Popović, Maja, Alberto Poncelas. “关于用户评论的机器翻译”。RANLP会议论文集（2021年）。\n* Popović, Maja. “关于观测到的机器翻译错误的本质与成因”。第18届机器翻译峰会论文集（第一卷：研究专题）（2021年）。\n* Jain, Nishtha, Maja Popović, Declan Groves, Eva Vanmassenhove. “为自然语言处理生成性别增强型数据”。第三届自然语言处理中的性别偏见研讨会论文集（2021年）。\n* Vilar, David, Marcello Federico. “字节对编码的统计扩展”。IWSLT会议论文集（2021年）。\n\n### 2020年\n\n* Dinu, Georgiana、Prashant Mathur、Marcello Federico、Stanislas Lauly、Yaser Al-Onaizan. “端到端定位中的联合翻译与单位转换”。IWSLT会议论文集（2020）\n* Exel, Miriam、Bianka Buschbeck、Lauritz Brandt、Simona Doneva. “SAP公司的术语约束神经机器翻译”。EAMT会议论文集（2020）\n* Hisamoto, Sorami、Matt Post、Kevin Duh. “序列到序列模型的成员推断攻击：我的数据是否在你的机器翻译系统中？”计算语言学协会汇刊，第8卷（2020）\n* Naradowsky, Jason、Xuan Zhan、Kevin Duh. “基于赌博机反馈的机器翻译系统选择”。arXiv预印本 arXiv:2002.09646（2020）\n* Niu, Xing、Prashant Mathur、Georgiana Dinu、Yaser Al-Onaizan. “神经机器翻译对输入扰动的鲁棒性评估”。arXiv预印本 arXiv:2005.00580（2020）\n* Niu, Xing、Marine Carpuat. “利用合成监督控制神经机器翻译的正式程度”。AAAI会议论文集（2020）\n* Keung, Phillip、Julian Salazar、Yichao Liu、Noah A. Smith. “基于自训练上下文嵌入的无监督双语文本挖掘与翻译”。arXiv预印本 arXiv:2010.07761（2020）\n* Sokolov, Alex、Tracy Rohlin、Ariya Rastrow. “用于多语言字素到音素转换的神经机器翻译”。arXiv预印本 arXiv:2006.14194（2020）\n* Stafanovičs, Artūrs、Toms Bergmanis、Mārcis Pinnis. “通过目标语性别标注缓解机器翻译中的性别偏见”。arXiv预印本 arXiv:2010.06203（2020）\n* Stojanovski, Dario、Alexander Fraser. “利用文档级上下文解决零资源领域问题的神经机器翻译”。arXiv预印本 arXiv:2004.14927（2020）\n* Stojanovski, Dario、Benno Krojer、Denis Peskov、Alexander Fraser. “ContraCAT：用于机器翻译的对比型共指分析模板”。COLING会议论文集（2020）\n* Zhang, Xuan、Kevin Duh. “神经机器翻译系统超参数优化的可复现且高效的基准测试”。计算语言学协会汇刊，第8卷（2020）\n* Swe Zin Moe、Ye Kyaw Thu、Hnin Aye Thant、Nandar Win Min以及Thepchai Supnithi，“缅甸手语与缅甸语之间的无监督神经机器翻译”，《智能信息与智能技术期刊》，2020年4月1日刊，第53–61页。（2019年12月21日提交；2020年3月6日接受；2020年3月16日修订；2020年4月30日在线发表）\n* Thazin Myint Oo、Ye Kyaw Thu、Khin Mar Soe以及Thepchai Supnithi，“缅甸语（缅语）与德威语（塔沃语）之间的神经机器翻译”，载于第18届国际计算机应用大会（ICCA 2020），2020年2月27–28日，缅甸仰光，第219–227页\n* Müller, Mathias、Annette Rios、Rico Sennrich. “神经机器翻译中的领域鲁棒性”。AMTA会议论文集（2020）\n* Rios, Annette、Mathias Müller、Rico Sennrich. “子词分割与单一桥接语言对零样本神经机器翻译的影响”。第五届WMT研究论文集（2020）\n* Popović, Maja、Alberto Poncelas. “相似南斯拉夫语种间的神经机器翻译”。第五届WMT研究论文集（2020）\n* Popović, Maja、Alberto Poncelas. “利用字符n-gram匹配从不干净的平行数据中提取正确对齐的片段”。语言技术与数字人文会议论文集（JTDH 2020）\n* Popović, Maja、Alberto Poncelas、Marija Brkic、Andy Way. “用于克罗地亚语和塞尔维亚语互译的神经机器翻译”。第七届面向相似语言、变体及方言的自然语言处理研讨会论文集（2020）\n\n### 2019年\n\n* Agrawal, Sweta、Marine Carpuat. “控制神经机器翻译中的文本复杂度”。EMNLP会议论文集（2019）\n* Beck, Daniel、Trevor Cohn、Gholamreza Haffari. “使用格变换和图网络的神经语音翻译”。TextGraphs-13（EMNLP 2019）会议论文集\n* Currey, Anna、Kenneth Heafield. “基于单语枢纽数据的零资源神经机器翻译”。EMNLP会议论文集（2019）\n* Gupta, Prabhakar、Mayank Sharma. 
“针对数字娱乐内容字幕的无监督翻译质量评估”。IEEE语义计算国际期刊（2019）\n* Hu, J. Edward、Huda Khayrallah、Ryan Culkin、Patrick Xia、Tongfei Chen、Matt Post以及Benjamin Van Durme. “改进的基于词汇约束的解码方法，用于翻译和单语改写”。NAACL-HLT会议论文集（2019）\n* Rosendahl, Jan、Christian Herold、Yunsu Kim、Miguel Graça、Weiyue Wang、Parnia Bahar、Yingbo Gao以及Hermann Ney，“亚琛工业大学2019年WMT机器翻译系统”。第四届WMT研究论文集（2019）\n* Thompson, Brian、Jeremy Gwinnup、Huda Khayrallah、Kevin Duh以及Philipp Koehn. “克服神经机器翻译领域适应过程中的灾难性遗忘现象”。NAACL-HLT 2019会议论文集（2019）\n* Tättar, Andre、Elizaveta Korotkova、Mark Fishel，“塔尔图大学多语言多领域WMT19新闻翻译共享任务提交”。第四届WMT研究论文集（2019）\n* Thazin Myint Oo、Ye Kyaw Thu以及Khin Mar Soe，“缅甸语（缅语）与若开语（阿拉干语）之间的神经机器翻译”，载于第六届面向相似语言、变体及方言的自然语言处理研讨会，NAACL-2019，2019年6月7日，美国明尼阿波利斯，第80–88页\n\n### 2018年\n\n* 多曼，托比亚斯。“你需要多少注意力？神经机器翻译架构的细粒度分析”。第56届ACL会议论文集（2018）\n* 金允洙、高英博和赫尔曼·奈伊。“在无共享词表的情况下有效实现神经机器翻译模型的跨语言迁移”。arXiv预印本arXiv:1905.05475（2019）\n* 科罗特科娃，叶利扎维塔、马克西姆·德尔和马克·费舍尔。“单语与跨语言零样本风格转换”。arXiv预印本arXiv:1808.00179（2018）\n* 牛星、迈克尔·登科夫斯基和玛丽娜·卡普阿特。“利用合成平行数据的双向神经机器翻译”。arXiv预印本arXiv:1805.11213（2018）\n* 牛星、苏达·拉奥和玛丽娜·卡普阿特。“用于同语种及跨语种风格间翻译的多任务神经网络模型”。COLING会议（2018）\n* 波斯特和大卫·维拉尔。“基于动态束宽分配的快速词汇约束解码在神经机器翻译中的应用”。NAACL-HLT会议论文集（2018）\n* 沙姆珀，朱利安、扬·罗森达尔、帕尔尼亚·巴哈尔、金允洙、阿尔内·尼克斯和赫尔曼·奈伊。“亚琛工业大学在WMT 2018上的有监督机器翻译系统”。第三届WMT共享任务论文集（2018）\n* 舒尔茨，菲利普、威尔克·阿齐兹和特雷弗·科恩。“一种用于神经机器翻译的随机解码器”。arXiv预印本arXiv:1805.10844（2018）\n* 塔梅尔、阿尔库利、加布里埃尔·布雷施纳和赫尔曼·奈伊。“关于多头注意力机制神经机器翻译中的对齐问题”。第三届WMT研究论文集（2018）\n* 唐功波、里科·森尼希和约阿基姆·尼夫雷。“注意力机制分析：以神经机器翻译中的词义消歧为例”。第三届WMT研究论文集（2018）\n* 汤普森，布莱恩、胡达·海拉拉、安东尼奥斯·阿纳斯塔索普洛斯、艾莉娅·麦卡锡、凯文·杜、丽贝卡·马文、保罗·麦克纳米、杰里米·格温纳普、蒂姆·安德森和菲利普·科恩。“冻结子网络以分析神经机器翻译中的领域适应”。arXiv预印本arXiv:1809.05218（2018）\n* 维拉尔，大卫。“学习隐藏单元贡献以适配神经机器翻译模型”。NAACL-HLT会议论文集（2018）\n* 维亚斯，约加尔希、牛星和玛丽娜·卡普阿特“在无标注情况下识别平行文本中的语义差异”。NAACL-HLT会议论文集（2018）\n* 王伟悦、朱德瑞、塔梅尔·阿尔库利、甘子轩和赫尔曼·奈伊。“用于机器翻译的神经隐马尔可夫模型”。第56届ACL会议论文集（2018）\n* 张璇、高拉夫·库马尔、胡达·海拉拉、肯顿·默里、杰里米·格温纳普、玛丽安娜·J·马丁代尔、保罗·麦克纳米、凯文·杜以及玛丽娜·卡普阿特。“神经机器翻译中课程学习的实证探索”。arXiv预印本arXiv:1811.00739（2018）\n* 斯韦·津·莫、耶·乔·图、欣·艾·坦特和南达尔·温·敏，“缅甸手语与缅甸书面语之间的神经机器翻译”，载于2018年第二届东盟语言光学字符识别与自然语言处理技术区域会议（ONA 2018），2018年12月13日至14日，柬埔寨金边。\n* 唐功波、马蒂亚斯·穆勒、阿内特·里奥斯和里科·森尼希。“为什么是自注意力？神经机器翻译架构的针对性评估”。EMNLP会议论文集（2018）\n\n### 2017年\n\n* 多曼，托比亚斯和费利克斯·希伯。“通过多任务学习利用目标端单语数据进行神经机器翻译”。EMNLP会议论文集（2017）。","# Sockeye 快速上手指南\n\n## 环境准备\n- **系统要求**：Linux\u002FmacOS（Windows 需额外配置）\n- **前置依赖**：Python 3.6+，PyTorch 1.7+（安装时自动处理依赖）\n- **GPU 加速**：可选 NVIDIA GPU + CUDA（推荐用于训练）\n- **国内加速**：安装时使用清华镜像源加速依赖下载\n\n## 安装步骤\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye.git\ncd sockeye\npip3 install --editable . 
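Training your own model first is the usual path. A minimal invocation looks roughly like this (a sketch: the file names are placeholders, tokenized parallel training and validation data is assumed, and most options are left at their defaults; see the tutorial linked below for a realistic setup):\n```bash\nsockeye-train --source train.de --target train.en --validation-source dev.de --validation-target dev.en --output nmt_model\n```\n\n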
> Full training and translation walkthrough: [WMT 2014 English-German quickstart](https:\u002F\u002Fawslabs.github.io\u002Fsockeye\u002Ftutorials\u002Fwmt_large.html)","A cross-border e-commerce platform needs real-time multilingual translation for users worldwide, but the team previously built its translation models on a homegrown framework and kept hitting efficiency bottlenecks.\n\n### Without sockeye\n- Hand-written preprocessing scripts: every iteration meant manually processing a 10GB+ corpus, taking 2+ days.\n- Training ran on a single GPU; the WMT 2014 English-German dataset took 7 days, so new algorithms could not be validated quickly.\n- Translation latency reached 3 seconds, and user churn rose 15% because of the lag.\n- Code was scattered across scripts; new team members needed 2 weeks to understand the system.\n- Multi-GPU resources went unused, wasting server budget.\n\n### With sockeye\n- Sockeye's CLIs (data preparation plus `sockeye-train`) cover preprocessing and training end to end, compressing the iteration cycle to 4 hours.\n- Distributed multi-GPU training cut WMT training time to 2 days.\n- The optimized inference engine brought translation latency under 0.5 seconds, lifting user retention 12%.\n- A standardized code structure lets new members start fine-tuning models within a day.\n- GPU utilization rose from 40% to 85%, cutting annual server costs 30%.\n\nsockeye raises NMT development efficiency and deployment performance to an industrial standard.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fawslabs_sockeye_9c87c848.png","awslabs","Amazon Web Services - Labs","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fawslabs_9e60acf8.png","AWS Labs",null,"http:\u002F\u002Famazon.com\u002Faws\u002F","https:\u002F\u002Fgithub.com\u002Fawslabs",[83,87,91,95,98,102],{"name":84,"color":85,"percentage":86},"Python","#3572A5",99,{"name":88,"color":89,"percentage":90},"JavaScript","#f1e05a",0.4,{"name":92,"color":93,"percentage":94},"Shell","#89e051",0.3,{"name":96,"color":97,"percentage":94},"TeX","#3D6117",{"name":99,"color":100,"percentage":101},"Dockerfile","#384d54",0.1,{"name":103,"color":104,"percentage":101},"CSS","#663399",1216,320,"2026-04-02T16:33:41","Apache-2.0","Not specified",{"notes":111,"python":109,"dependencies":112},"GPU acceleration requires NVIDIA Apex; NVIDIA's PyTorch Docker containers are recommended",[113],"torch>=2.0",[26,13],[116,117,118,119,120,121,122,123,124,67,125,126,127,128,129,130,131,132],"deep-learning","deep-neural-networks","machine-learning","machine-translation","neural-machine-translation","encoder-decoder","attention-mechanism","sequence-to-sequence","sequence-to-sequence-models","attention-is-all-you-need","attention-model","seq2seq","translation","transformer-architecture","transformer","transformer-network","pytorch","2026-03-27T02:49:30.150509","2026-04-06T07:14:16.416209",[136,141],{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},5505,"How do I fix mxnet failing to install?","Make sure you have enough disk space (a WSL2 install can take 30GB+) and prefer installing on Linux. Note for Windows users: Sockeye is designed primarily for Linux, and WSL2 installs can fail when disk space runs out. Concretely: check free disk space, and prefer a native Linux system over Windows WSL2.","https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Fissues\u002F973",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},5506,"How do I get a higher BLEU score?","Use an appropriate model configuration, e.g. --num-embed 256, --rnn-num-hidden 512, --rnn-attention-type dot, --max-seq-len 100, --batch-size 128. With such a configuration (python -m sockeye.train -s corpus.tc.BPE.de -t corpus.tc.BPE.en --num-embed 256 --rnn-num-hidden 512 --rnn-attention-type dot --max-seq-len 100 --batch-size 128), training reached a BLEU score of 27.23, far above the initial 0.171. (These --rnn-* flags date from older, MXNet-era Sockeye versions.)","https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Fissues\u002F479",[147,152,157,162,167,172,177,182,187,192,197,202,207,212,217,222,227,232,237,242],{"id":148,"version":149,"summary_zh":150,"released_at":151},105005,"3.1.34","## [3.1.34]\r\n\r\n### Fixed\r\n- Do not mask prepended tokens by default (for self-attention).\r\n- Do not require specifying `--end-of-prepending-tag` if it is already done when preparing the data.","2023-03-03T07:44:13",{"id":153,"version":154,"summary_zh":155,"released_at":156},105006,"3.1.33","## [3.1.33]\r\n\r\n### Fixed\r\n- Two small fixes to SampleK. 
Previously, the device was not set correctly, which caused issues when running sampling on GPUs, and SampleK did not return the top-k values correctly.\r\n\r\n## [3.1.32]\r\n\r\n### Added\r\n\r\n- Sockeye now supports blocking cross-attention between decoder and encoded prepended tokens.\r\n  - If the source contains prepended text and a tag indicating the end of prepended text,\r\n    Sockeye supports blocking the cross-attention between decoder and encoded prepended tokens (including the tag).\r\n    To enable this operation, specify `--end-of-prepending-tag` for training or data preparation,\r\n    and `--transformer-block-prepended-cross-attention` for training.\r\n\r\n### Changed\r\n\r\n- Sockeye uses a new dictionary-based prepared data format that supports storing the length of prepended source tokens\r\n  (version 7). The previous format (version 6) is still supported.","2023-03-01T09:01:57",{"id":158,"version":159,"summary_zh":160,"released_at":161},105007,"3.1.31","## [3.1.31]\r\n\r\n### Fixed\r\n\r\n- Fixed sequence copying integration tests to correctly specify that scoring\u002Ftranslation outputs should not be checked.\r\n- Enabled `bfloat16` integration and system testing on all platforms.\r\n\r\n## [3.1.30]\r\n\r\n### Added\r\n\r\n- Added support for `--dtype bfloat16` to `sockeye-translate`, `sockeye-score`, and `sockeye-quantize`.\r\n\r\n### Fixed\r\n\r\n- Fixed compatibility issue with `numpy==1.24.0` by using `pickle` instead of `numpy` to save\u002Fload `ParallelSampleIter` data permutations.","2023-02-01T09:12:16",{"id":163,"version":164,"summary_zh":165,"released_at":166},105008,"3.1.29","## [3.1.29]\r\n\r\n### Changed\r\n\r\n- Running `sockeye-evaluate` no longer applies text tokenization for TER (same behavior as other metrics).\r\n- Turned on type checking for all `sockeye` modules except `test_utils` and addressed resulting type issues.\r\n- Refactored code in various modules without changing user-level behavior.\r\n\r\n## [3.1.28]\r\n\r\n### Added\r\n\r\n- Added kNN-MT model from [Khandelwal et al., 2021](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.00710).\r\n  - Installation: see [faiss document](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffaiss\u002Fblob\u002Fmain\u002FINSTALL.md) -- installation via conda is recommended.\r\n  - Building a faiss index from a sockeye model takes two steps:\r\n    - Generate decoder states: `sockeye-generate-decoder-states -m [model] --source [src] --target [tgt] --output-dir [output dir]`\r\n    - Build index: `sockeye-knn -i [input_dir] -o [output_dir] -t [faiss_index_signature]` where `input_dir` is the same as `output_dir` from the `sockeye-generate-decoder-states` command.\r\n    - Faiss index signature reference: [see here](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffaiss\u002Fwiki\u002FThe-index-factory)\r\n  - Running inference using the built index: `sockeye-translate ... --knn-index [index_dir] --knn-lambda [interpolation_weight]` where `index_dir` is the same as `output_dir` from the `sockeye-knn` command.","2022-12-12T08:29:00",{"id":168,"version":169,"summary_zh":170,"released_at":171},105009,"3.1.27","## [3.1.27]\r\n\r\n### Changed\r\n\r\n- Allow torch 1.13 in requirements.txt.\r\n- Replaced deprecated `torch.testing.assert_allclose` with `torch.testing.assert_close` for PyTorch 1.14 compatibility.\r\n\r\n## [3.1.26]\r\n\r\n### Added\r\n\r\n- Added `--tf32 0|1`, a boolean toggle for `torch.backends.cuda.matmul.allow_tf32` that enables transparent TensorFloat-32 acceleration of float32 matrix multiplies (10-bit mantissa precision, 19 bits total). Defaults to true for backward compatibility with torch \u003C 1.12. Training may be continued with a different `--tf32` setting.
\r\n\r\n### Changed\r\n\r\n- `device.init_device()` is now called by train, translate, and score.\r\n- Allow torch 1.12 in requirements.txt.\r\n\r\n## [3.1.25]\r\n\r\n### Changed\r\n- Updated to sacrebleu==2.3.1. Changed default BLEU floor smoothing offset from 0.01 to 0.1.\r\n\r\n## [3.1.24]\r\n\r\n### Fixed\r\n\r\n- Updated DeepSpeed checkpoint conversion to support newer versions of DeepSpeed.\r\n\r\n## [3.1.23]\r\n\r\n### Changed\r\n\r\n- Change decoder softmax size logging level from info to debug.\r\n\r\n## [3.1.22]\r\n\r\n### Added\r\n\r\n- Log the beam search average output vocabulary size.\r\n\r\n### Changed\r\n\r\n- Common base Search class for GreedySearch and BeamSearch.\r\n- .pylintrc: suppress warnings about deprecated pylint warning suppressions.\r\n\r\n## [3.1.21]\r\n\r\n### Fixed\r\n\r\n- `sockeye-translate` now passes the `skip_nvs` and `nvs_thresh` args to the Translator constructor instead of ignoring them.\r\n\r\n## [3.1.20]\r\n\r\n### Added\r\n\r\n- Added training support for [DeepSpeed](https:\u002F\u002Fwww.deepspeed.ai\u002F).\r\n  - Installation: `pip install deepspeed`\r\n  - Usage: `deepspeed --no_python ... sockeye-train ...`\r\n  - DeepSpeed mode uses Zero Redundancy Optimizer (ZeRO) stage 1 ([Rajbhandari et al., 2019](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.02054v3)).\r\n  - Run in FP16 mode with `--deepspeed-fp16` or BF16 mode with `--deepspeed-bf16`.\r\n\r\n## [3.1.19]\r\n\r\n### Added\r\n\r\n- Clean up GPU and CPU memory used during training initialization before starting the main training loop.\r\n\r\n### Changed\r\n\r\n- Refactored training code in advance of adding DeepSpeed support:\r\n  - Moved logic for flagging interleaved key-value parameters from layers.py to model.py.\r\n  - Refactored LearningRateScheduler API to be compatible with PyTorch\u002FDeepSpeed.\r\n  - Refactored optimizer and learning rate scheduler creation to be modular.\r\n  - Migrated to ModelWithLoss API, which wraps a Sockeye model and its losses in a single module.\r\n  - Refactored primary and secondary worker logic to reduce redundant calculations.\r\n  - Refactored code for saving\u002Floading training states.\r\n  - Added utility code for managing model\u002Ftraining configurations.\r\n\r\n### Removed\r\n\r\n- Removed unused training option `--learning-rate-t-scale`.\r\n\r\n## [3.1.18]\r\n\r\n### Added\r\n\r\n- Added `sockeye-train` and `sockeye-translate` option `--clamp-to-dtype` that clamps outputs of transformer attention, feed-forward networks, and process blocks to the min\u002Fmax finite values for the current dtype. This can prevent inf\u002Fnan values from overflow when running large models in float16 mode. See: https:\u002F\u002Fdiscuss.huggingface.co\u002Ft\u002Ft5-fp16-issue-is-fixed\u002F3139\r\n\r\n## [3.1.17]\r\n\r\n### Added\r\n\r\n- Added support for offline model quantization with `sockeye-quantize`.\r\n  - Pre-quantizing a model avoids the load-time memory spike of runtime quantization. 
For example, a float16 model loads directly as float16 instead of loading as float32 then casting to float16.\r\n\r\n## [3.1.16]\r\n\r\n### Added\r\n- Added nbest list reranking options using isometric translation criteria as proposed in an ICASSP 2021 paper https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.03847.\r\nTo use this feature, pass a criterion (`isometric-ratio`, `isometric-diff`, `isometric-lc`) when specifying `--metric`.\r\n- Added `--output-best-non-blank` to output the non-blank best hypothesis from the nbest list.\r\n\r\n## [3.1.15]\r\n\r\n### Fixed\r\n\r\n- Fix the type of `valid_length` to be `pt.Tensor` instead of `Optional[pt.Tensor] = None` for JIT tracing.","2022-11-06T14:13:40",{"id":173,"version":174,"summary_zh":175,"released_at":176},105010,"3.1.14","## [3.1.14]\r\n\r\n### Added\r\n- Added the implementation of Neural Vocabulary Selection (NVS) to Sockeye as presented in our NAACL 2022 paper \"The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation\" (Tobias Domhan, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne and Felix Hieber).\r\n  - To use NVS, simply specify `--neural-vocab-selection` to `sockeye-train`. This will train a model with Neural Vocabulary Selection that is automatically used by `sockeye-translate`. If you want to look at translations without vocabulary selection, specify `--skip-nvs` as an argument to `sockeye-translate`.\r\n\r\n## [3.1.13]\r\n\r\n### Added\r\n\r\n- Added `sockeye-train` argument `--no-reload-on-learning-rate-reduce` that disables reloading the best training checkpoint when reducing the learning rate. This currently only applies to the `plateau-reduce` learning rate scheduler since other schedulers do not reload checkpoints.","2022-05-05T08:42:03",{"id":178,"version":179,"summary_zh":180,"released_at":181},105011,"3.1.12","## [3.1.12]\r\n\r\n### Fixed\r\n\r\n- Fix scoring with batches of size 1 (which may occur when `|data| % batch_size == 1`).\r\n\r\n## [3.1.11]\r\n\r\n### Fixed\r\n\r\n- When resuming training with a fully trained model, `sockeye-train` will correctly exit without creating a duplicate (but separately numbered) checkpoint.\r\n","2022-04-26T08:56:30",{"id":183,"version":184,"summary_zh":185,"released_at":186},105012,"3.1.10","## [3.1.10]\r\n\r\n### Fixed\r\n\r\n- When loading parameters, SockeyeModel now ignores false positive missing parameters for traced modules. These modules use the same parameters as their original non-traced versions.","2022-04-12T07:23:27",{"id":188,"version":189,"summary_zh":190,"released_at":191},105013,"3.1.9","## [3.1.9]\r\n\r\n### Changed\r\n\r\n- Clarified usage of `batch_size` in Translator code.\r\n\r\n## [3.1.8]\r\n\r\n### Fixed\r\n\r\n- When saving parameters, SockeyeModel now skips parameters for traced modules because these modules are created at runtime and use the same parameters as non-traced versions. When loading parameters, SockeyeModel ignores parameters for traced modules that may have been saved by earlier versions.","2022-04-11T14:12:57",{"id":193,"version":194,"summary_zh":195,"released_at":196},105014,"3.1.7","## [3.1.7]\r\n\r\n### Changed\r\n\r\n- SockeyeModel components are now traced regardless of whether `inference_only` is set, including for the CheckpointDecoder during training.\r\n\r\n## [3.1.6]\r\n\r\n### Changed\r\n\r\n- Moved offsetting of topk scores out of the (traced) TopK module. 
This allows sending requests of variable\r\n  batch size to the same Translator\u002FModel\u002FBeamSearch instance.\r\n\r\n## [3.1.5]\r\n\r\n### Changed\r\n- Allow PyTorch 1.11 in requirements.","2022-03-23T09:21:54",{"id":198,"version":199,"summary_zh":200,"released_at":201},105015,"3.1.4","## [3.1.4]\r\n\r\n### Added\r\n- Added support for adding a target prefix and target prefix factors to the input in JSON format during inference.","2022-03-10T09:14:37",{"id":203,"version":204,"summary_zh":205,"released_at":206},105016,"3.1.3","## [3.1.3]\r\n\r\n### Added\r\n- Added support for adding source prefixes to the input in JSON format during inference.\r\n\r\n## [3.1.2]\r\n\r\n### Changed\r\n- Optimized creation of the source length mask by using `expand` instead of `repeat_interleave`.\r\n\r\n## [3.1.1]\r\n\r\n### Changed\r\n- Updated torch dependency to 1.10.x (`torch>=1.10.0,\u003C1.11.0`).","2022-02-28T09:43:29",{"id":208,"version":209,"summary_zh":210,"released_at":211},105017,"3.1.0","## [3.1.0]\r\nSockeye is now exclusively based on PyTorch.\r\n\r\n### Changed\r\n- Renamed `x_pt` modules to `x`. Updated entry points in `setup.py`.\r\n\r\n### Removed\r\n- Removed MXNet from the codebase.\r\n- Removed device locking \u002F GPU acquisition logic. Removed dependency on `portalocker`.\r\n- Removed arguments `--softmax-temperature`, `--weight-init-*`, `--mc-dropout`, `--horovod`, `--device-ids`.\r\n- Removed all MXNet-related tests.","2022-02-11T09:25:54",{"id":213,"version":214,"summary_zh":215,"released_at":216},105018,"3.0.15","## [3.0.15]\r\n\r\n### Fixed\r\n- Fixed GPU-based scoring by copying to a CPU tensor first before converting to numpy.\r\n\r\n## [3.0.14]\r\n\r\n### Added\r\n- Added support for the Translation Error Rate (TER) metric as implemented in sacrebleu==1.4.14.\r\n  Checkpoint decoder metrics will now include TER scores and early stopping can be determined\r\n  via TER improvements (`--optimized-metric ter`).","2022-02-09T19:13:09",{"id":218,"version":219,"summary_zh":220,"released_at":221},105019,"3.0.13","## [3.0.13]\r\n\r\n### Changed\r\n- Use `expand` instead of `repeat` for attention masks to not allocate additional memory.\r\n- Avoid repeated `transpose` for initializing cached encoder-attention states in the decoder.\r\n\r\n## [3.0.12]\r\n\r\n### Removed\r\n- Removed unused code for Weight Normalization. Minor code cleanups.\r\n\r\n## [3.0.11]\r\n\r\n### Fixed\r\n\r\n- Fixed training with a single, fixed learning rate instead of a rate scheduler (`--learning-rate-scheduler none --initial-learning-rate ...`).","2022-02-03T12:08:07",{"id":223,"version":224,"summary_zh":225,"released_at":226},105020,"3.0.10","## [3.0.10]\r\n\r\n### Changed\r\n\r\n- End-to-end trace `decode_step` of the Sockeye model. Creates less overhead during decoding and a small speedup.\r\n\r\n## [3.0.9]\r\n\r\n### Fixed\r\n\r\n- Fixed not calling the traced target embedding module during inference.\r\n\r\n## [3.0.8]\r\n\r\n### Changed\r\n\r\n- Add support for JIT tracing source\u002Ftarget embeddings and JIT scripting the output layer during inference.\r\n","2022-01-19T07:19:21",{"id":228,"version":229,"summary_zh":230,"released_at":231},105021,"3.0.7","## [3.0.7]\r\n\r\n### Changed\r\n\r\n- Improve training speed by using `torch.nn.functional.multi_head_attention_forward` for self- and encoder-attention\r\n  during training. 
This requires reorganization of the parameter layout of the key-value input projections,\r\n  as the current Sockeye attention interleaves them for faster inference.\r\n  Attention masks (both source masks and autoregressive masks) need some shape adjustments, as requirements\r\n  for the fused MHA op differ slightly.\r\n  - Non-interleaved format for joint key-value input projection parameters:\r\n    `in_features=hidden, out_features=2*hidden -> Shape: (2*hidden, hidden)`\r\n  - Interleaved format for the joint key-value input projection stores key and value parameters, grouped by heads:\r\n    `Shape: ((num_heads * 2 * hidden_per_head), hidden)`\r\n  - Models save and load key-value projection parameters in interleaved format.\r\n  - When `model.training == True`, key-value projection parameters are put into\r\n    non-interleaved format for `torch.nn.functional.multi_head_attention_forward`.\r\n  - When `model.training == False`, i.e. model.eval() is called, key-value projection\r\n    parameters are again converted into interleaved format in place.\r\n\r\n## [3.0.6]\r\n\r\n### Fixed\r\n\r\n- Fixed checkpoint decoder issue that prevented using `bleu` as `--optimized-metric` for distributed training ([#995](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fsockeye\u002Fissues\u002F995)).\r\n\r\n## [3.0.5]\r\n\r\n### Fixed\r\n\r\n- Fixed data download in multilingual tutorial.","2021-12-20T09:59:18",{"id":233,"version":234,"summary_zh":235,"released_at":236},105022,"3.0.4","## [3.0.4]\r\n\r\n### Fixed\r\n\r\n- Make sure data permutation indices are in int64 format (doesn't seem to be the case by default on all platforms).\r\n\r\n## [3.0.3]\r\n\r\n### Fixed\r\n\r\n- Fixed ensemble decoding for models without target factors.\r\n\r\n## [3.0.2]\r\n\r\n### Changed\r\n\r\n- `sockeye-translate`: Beam search now computes and returns secondary target factor scores. Secondary target factors\r\n  do not participate in beam search, but are greedily chosen at every time step. Accumulated scores for secondary factors\r\n  are not normalized by length. Factor scores are included in JSON output (`--output-type json`).\r\n- `sockeye-score` now returns tab-separated scores for each target factor. Users can decide how to combine factor scores\r\n  depending on the downstream application. Scores for the first, primary factor (i.e. output words) are normalized;\r\n  other factors are not.
\r\n\r\n## [3.0.1]\r\n\r\n### Fixed\r\n\r\n- Parameter averaging (`sockeye-average`) now always uses the CPU, which enables averaging parameters from GPU-trained models on CPU-only hosts.","2021-12-13T17:39:31",{"id":238,"version":239,"summary_zh":240,"released_at":241},105023,"3.0.0","## [3.0.0] Sockeye 3: Fast Neural Machine Translation with PyTorch\r\n\r\nSockeye is now based on PyTorch.\r\nWe maintain backwards compatibility with MXNet models in version 2.3.x until 3.1.0.\r\nIf MXNet 2.x is installed, Sockeye can run with either PyTorch or MXNet, but MXNet is no longer strictly required.\r\n\r\n### Added\r\n\r\n- Added model converter CLI `sockeye.mx_to_pt` that converts MXNet models to PyTorch models.\r\n- Added `--apex-amp` training argument that runs the entire model in FP16 mode, replacing `--dtype float16` (requires [Apex](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex)).\r\n- Training automatically uses Apex fused optimizers if available (requires [Apex](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex)).\r\n- Added training argument `--label-smoothing-impl` to choose the label smoothing implementation (the default of `mxnet` uses the same logic as MXNet Sockeye 2).\r\n\r\n### Changed\r\n\r\n- CLI names point to the PyTorch code base (e.g. `sockeye-train` etc.).\r\n- MXNet-based CLIs are now accessible via `sockeye-\u003Cname>-mx`.\r\n- MXNet code requires MXNet >= 2.0 since we adopted the new numpy interface.\r\n- `sockeye-train` now uses PyTorch's distributed data-parallel mode for multi-process (multi-GPU) training. Launch with: `torchrun --no_python --nproc_per_node N sockeye-train --dist ...`\r\n- Updated the [quickstart tutorial](docs\u002Ftutorials\u002Fwmt_large.md) to cover multi-device training with PyTorch Sockeye.\r\n- Changed `--device-ids` argument (plural) to `--device-id` (singular). For multi-GPU training, see the distributed mode noted above.\r\n- Updated default value: `--pad-vocab-to-multiple-of 8`\r\n- Removed `--horovod` argument used with `horovodrun` (use `--dist` with `torchrun`).\r\n- Removed `--optimizer-params` argument (use `--optimizer-betas`, `--optimizer-eps`).\r\n- Removed `--no-hybridization` argument (use `PYTORCH_JIT=0`, see [Disable JIT for Debugging](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fjit.html#disable-jit-for-debugging)).\r\n- Removed `--omp-num-threads` argument (use `--env=OMP_NUM_THREADS=N`).\r\n\r\n### Removed\r\n\r\n- Removed support for constrained decoding (both positive and negative lexical constraints).\r\n- Removed support for beam histories.\r\n- Removed `--amp-scale-interval` argument.\r\n- Removed `--kvstore` argument.\r\n- Removed arguments: `--weight-init`, `--weight-init-scale`, `--weight-init-xavier-factor-type`, `--weight-init-xavier-rand-type`.\r\n- Removed `--decode-and-evaluate-device-id` argument.\r\n- Removed arguments: `--monitor-pattern`, `--monitor-stat-func`.\r\n- Removed CUDA-specific requirements files in `requirements\u002F`.","2021-11-30T09:48:34",{"id":243,"version":244,"summary_zh":245,"released_at":246},105024,"2.3.24","## [2.3.24]\r\n### Added\r\n\r\n- Use of the safe YAML loader for the model configuration files.\r\n\r\n## [2.3.23]\r\n### Changed\r\n\r\n- Do not sort BIAS_STATE in beam search. It is constant across decoder steps.","2021-11-05T09:28:33"]