[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-fe1ixxu--ALMA":3,"tool-fe1ixxu--ALMA":65},[4,17,27,35,48,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",150037,2,"2026-04-10T23:33:47",[13,14,15],"开发框架","Agent","语言模型","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,3,"2026-04-06T11:19:32",[15,26,14,13],"图像",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":10,"last_commit_at":33,"category_tags":34,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 
都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85092,"2026-04-10T11:13:16",[26,43,44,45,14,46,15,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":54,"last_commit_at":55,"category_tags":56,"status":16},5784,"funNLP","fighting41love\u002FfunNLP","funNLP 是一个专为中文自然语言处理（NLP）打造的超级资源库，被誉为\"NLP 民工的乐园”。它并非单一的软件工具，而是一个汇集了海量开源项目、数据集、预训练模型和实用代码的综合性平台。\n\n面对中文 NLP 领域资源分散、入门门槛高以及特定场景数据匮乏的痛点，funNLP 提供了“一站式”解决方案。这里不仅涵盖了分词、命名实体识别、情感分析、文本摘要等基础任务的标准工具，还独特地收录了丰富的垂直领域资源，如法律、医疗、金融行业的专用词库与数据集，甚至包含古诗词生成、歌词创作等趣味应用。其核心亮点在于极高的全面性与实用性，从基础的字典词典到前沿的 BERT、GPT-2 模型代码，再到高质量的标注数据和竞赛方案，应有尽有。\n\n无论是刚刚踏入 NLP 领域的学生、需要快速验证想法的算法工程师，还是从事人工智能研究的学者，都能在这里找到急需的“武器弹药”。对于开发者而言，它能大幅减少寻找数据和复现模型的时间；对于研究者，它提供了丰富的基准测试资源和前沿技术参考。funNLP 以开放共享的精神，极大地降低了中文自然语言处理的开发与研究成本，是中文 AI 
社区不可或缺的宝藏仓库。",79857,1,"2026-04-08T20:11:31",[15,43,46],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":23,"last_commit_at":63,"category_tags":64,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[14,26,13,15,46],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":81,"owner_email":82,"owner_twitter":76,"owner_website":83,"owner_url":84,"languages":85,"stars":101,"forks":102,"last_commit_at":103,"license":104,"difficulty_score":23,"env_os":105,"env_gpu":106,"env_ram":107,"env_deps":108,"category_tags":117,"github_topics":82,"view_count":10,"oss_zip_url":82,"oss_zip_packed_at":82,"status":16,"created_at":118,"updated_at":119,"faqs":120,"releases":149},6510,"fe1ixxu\u002FALMA","ALMA","State-of-the-art LLM-based translation models.","ALMA 是一款基于大语言模型（LLM）的先进机器翻译系统，由约翰霍普金斯大学与微软联合研发。它致力于解决传统翻译模型在多语言场景下性能不足、尤其是低资源语言翻译质量不佳的难题。通过独特的“两阶段微调”策略，ALMA 先在单语数据上学习语言规律，再利用高质量平行语料进行优化，从而实现了卓越的翻译效果。\n\n该项目已迭代至第三代 X-ALMA，支持从 6 种扩展至 50 种语言的互译，并在各类资源水平的语言上均保持顶尖性能。其核心技术亮点包括创新的“对比偏好优化（CPO）”算法，该方法无需依赖参考译文即可提升模型表现，性能可比肩甚至超越 GPT-4；以及 X-ALMA 采用的即插即用式语言模块架构和自适应拒绝偏好优化技术。\n\nALMA 非常适合自然语言处理研究人员、AI 
开发者以及需要高质量多语言翻译解决方案的企业团队使用。无论是希望探索大模型翻译新范式的研究者，还是寻求部署高性能翻译引擎的工程师，都能从中获得强大的技术支持。目前，其核心算法已被国际顶级学术会议收录，并整合进主流开源社区，是构建下一代翻译系统的理想选择。","\u003Cp align=\"center\">\n    \u003Cimg alt=\"ALMA\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffe1ixxu_ALMA_readme_62a0e8c38265.png\" width=\"500\" height=\"203\">\n\u003C\u002Fp>\n\n\u003Cdiv align=\"center\">\n    \n# ALMA: Advanced Language Model-based translator\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n\u003Ca href=\"LICENSE\" alt=\"MIT License\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-FAD689.svg\" \u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11674\" alt=\"ALMA paper\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FALMA-Paper-D9AB42\" \u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.08417\" alt=\"ALMA-R paper\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FALMA--R-Paper-F6C555\" \u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.03115\" alt=\"X-ALMA paper\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FX--ALMA-Paper-F3B425\" \u002F>\u003C\u002Fa>\n\u003C!-- \u003Ca href=\"https:\u002F\u002Fnotes.aimodels.fyi\u002Falma-a-new-training-method-that-boosts-translation-performance-for-large-language-models\u002F\">\u003Cimg alt=\"Summary Link\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fsummary-link-F6C555\" \u002F>\u003C\u002Fa> -->\n\u003Ca href=\"https:\u002F\u002Fwww.clsp.jhu.edu\u002F\" alt=\"jhu\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJohns_Hopkins_University-BEC23F\" \u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002F\" alt=\"MSlogo\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMicrosoft-B1B479?logo=microsoft\" \u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Ffe1ixxu\">\n  
\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Fhaoranxu?style=social&logo=twitter\"\n      alt=\"follow on Twitter\">\u003C\u002Fa>\n\u003C\u002Fp>\n\nALMA has three generations: ALMA (1st), ALMA-R (2nd), and **X-ALMA (3rd, NEW!)**.\n\n[**ALMA**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11674) (**A**dvanced **L**anguage **M**odel-based Tr**A**nslator) is a many-to-many LLM-based translation model that adopts a new translation model paradigm: it begins with fine-tuning on monolingual data and is further optimized using high-quality parallel data. This two-step fine-tuning process ensures strong translation performance.\n\n**[ALMA-R](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.08417v2.pdf)** builds upon ALMA models, applying further LoRA fine-tuning with our proposed **Contrastive Preference Optimization (CPO)** instead of the Supervised Fine-tuning used in ALMA. CPO fine-tuning requires our [triplet preference data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FALMA-R-Preference) for preference learning. ALMA-R now matches or even exceeds GPT-4 and WMT winners!\n\n**[X-ALMA](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.03115) (NEW!) extends ALMA(-R) from 6 languages to 50 languages and ensures top-tier performance across 50 diverse languages, regardless of their resource levels. This is achieved by a plug-and-play language-specific module architecture and a carefully designed 5-step training recipe with novel *Adaptive-Rejection Preference Optimization* methods.** \n\n*Old ALMA Repo:*\n- The original **ALMA** repository can be found [here](https:\u002F\u002Fgithub.com\u002Ffe1ixxu\u002FALMA\u002Ftree\u002Fa3cc7877752779346312bb07798172eadc83d692).\n- The original **ALMA-R** repository can be found [here](https:\u002F\u002Fgithub.com\u002Ffe1ixxu\u002FALMA\u002Ftree\u002Fac120eb44c609ad9a386d617172d40432c2c0df6).\n\n# News 🌟\n⭐ Jan. 22 2025 **X-ALMA** has been accepted at **ICLR 2025**!\n\n⭐ Oct. 
6 2024 **X-ALMA** is out! Please find the [paper here](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.03115) and [models & datasets here](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fhaoranxu\u002Fx-alma-66fde464ef90be465920abaa).\n\n⭐ Jun. 20 2024 We want to give a shout out to [SimPO](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.14734), which shares a similar reference-free preference learning framework with CPO but in a more stable manner due to its special length normalization and target reward margin. The most exciting thing is that CPO and SimPO can potentially be used together! Learn more about [CPO-SimPO](https:\u002F\u002Fgithub.com\u002Ffe1ixxu\u002FCPO_SIMPO)!\n\n⭐ May.1 CPO paper has been accepted at **ICML 2024**!\n\n⭐ Mar.22 2024 CPO method now is merged at [huggingface trl](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftrl)! See details [here](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftrl\u002Fpull\u002F1382).\n\n⭐ Jan.16 2024 **ALMA-R** is released! Please check more details with our new paper: [Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.08417).\n\n⭐ Jan.16 2024 The ALMA paper: [A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11674) has been accepted at **ICLR 2024**! 
Check out more details [here](https:\u002F\u002Fopenreview.net\u002Fforum?id=farT6XXntP)!\n\n# Contents 📄\n- [Download ALMA Models and Dataset](#download-alma-models-and-dataset-)\n- [A Quick Start](#a-quick-start)\n- [Environment Setup](#environment-setup-)\n- [Evaluation](#evaluation-)\n- [Training](#training-)\n- [FAQs](#faqs-)\n\n:star: Supports :star:\n  - AMD and Nvidia Cards\n  - Data Parallel Evaluation\n  - Also supports LLaMA-1, LLaMA-2, OPT, Falcon, BLOOM, MPT\n  - LoRA Fine-tuning\n  - Monolingual data fine-tuning, parallel data fine-tuning\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffe1ixxu_ALMA_readme_db4b483817d4.png\" width=\"700\" height=\"300\">\n\u003C\u002Fp>\n\n# Download ALMA Models and Dataset 🚀\n\nWe release seven translation models for the ALMA series:\n\nModel checkpoints are released on Hugging Face:\n|     Models    | Base Model Link | LoRA Link |\n|:-------------:|:---------------:|:---------:|\n|    ALMA-7B (1st gen)    |        [haoranxu\u002FALMA-7B](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-7B)        |     -     |\n|  ALMA-7B-LoRA (1st gen) |        [haoranxu\u002FALMA-7B-Pretrain](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-7B-Pretrain)        |     [haoranxu\u002FALMA-7B-Pretrain-LoRA](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-7B-Pretrain-LoRA)     |\n|  ALMA-7B-R (2nd gen) |        [haoranxu\u002FALMA-7B-R (LoRA merged)](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-7B-R)        |     -    |\n|    ALMA-13B (1st gen)   |        [haoranxu\u002FALMA-13B](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-13B)        |     -     |\n| ALMA-13B-LoRA (1st gen) |        [haoranxu\u002FALMA-13B-Pretrain](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-13B-Pretrain)        |     [haoranxu\u002FALMA-13B-Pretrain-LoRA](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-13B-Pretrain-LoRA)     |\n| ALMA-13B-R (2nd gen) |     
   [haoranxu\u002FALMA-13B-R (LoRA merged)](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-13B-R)        |    -   |\n|  **X-ALMA (NEW, 3rd gen)** |        [X-ALMA Models](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fhaoranxu\u002Fx-alma-66fde464ef90be465920abaa)        |    -   |\n\n**Note that `ALMA-7B-Pretrain` and `ALMA-13B-Pretrain` are NOT translation models. They only experience stage 1 monolingual fine-tuning (20B tokens for the 7B model and 12B tokens for the 13B model), and should be utilized in conjunction with their LoRA models.** \n\n*We have also provided the WMT'22 and WMT'23 translation outputs from ALMA-13B-LoRA and ALMA-13B-R in the `outputs` directory. These outputs also includes our outputs of baselines and can be directly accessed and used for subsequent evaluations.*\n\nDatasets used by ALMA and ALMA-R are also released at huggingface now (NEW!)\n|     Datasets    | Train \u002F Validation| Test |\n|:-------------:|:---------------:|:---------:|\n|   ALMA Human-Written Parallel Data    |        [Parallel train and validation](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FALMA-Human-Parallel)        |     [WMT'22](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FWMT22-Test)    |\n|  ALMA-R Triplet Preference Data |        [Triplet Preference Data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FALMA-R-Preference)        |   [WMT'22](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FWMT22-Test) and [WMT'23](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FWMT23-Test)   |\n|  **X-ALMA Data** |   50-language   [parallel data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FX-ALMA-Parallel-Data) and [preference data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FX-ALMA-Preference)        |   [WMT'23](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FWMT23-Test) and 
[FLORES-200](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FX-ALMA-Parallel-Data)  |\n\n\n# A Quick Start\nX-ALMA is designed with a plug-and-play architecture, consisting of two components: a base model and language-specific modules, with each module shared by the languages within one group.\nThere are three ways to load X-ALMA for translation. The example below translates \"我爱机器翻译。\" into English (X-ALMA should also be able to do multilingual open-ended QA). \n\n**The first way**: loading the merged model where the language-specific module has been merged into the base model (Recommended):\n```\nimport torch\nfrom transformers import AutoModelForCausalLM\nfrom transformers import AutoTokenizer\nfrom peft import PeftModel\n\nGROUP2LANG = {\n    1: [\"da\", \"nl\", \"de\", \"is\", \"no\", \"sv\", \"af\"],\n    2: [\"ca\", \"ro\", \"gl\", \"it\", \"pt\", \"es\"],\n    3: [\"bg\", \"mk\", \"sr\", \"uk\", \"ru\"],\n    4: [\"id\", \"ms\", \"th\", \"vi\", \"mg\", \"fr\"],\n    5: [\"hu\", \"el\", \"cs\", \"pl\", \"lt\", \"lv\"],\n    6: [\"ka\", \"zh\", \"ja\", \"ko\", \"fi\", \"et\"],\n    7: [\"gu\", \"hi\", \"mr\", \"ne\", \"ur\"],\n    8: [\"az\", \"kk\", \"ky\", \"tr\", \"uz\", \"ar\", \"he\", \"fa\"],\n}\nLANG2GROUP = {lang: str(group) for group, langs in GROUP2LANG.items() for lang in langs}\ngroup_id = LANG2GROUP[\"zh\"]\n\nmodel = AutoModelForCausalLM.from_pretrained(f\"haoranxu\u002FX-ALMA-13B-Group{group_id}\", torch_dtype=torch.float16, device_map=\"auto\")\ntokenizer = AutoTokenizer.from_pretrained(f\"haoranxu\u002FX-ALMA-13B-Group{group_id}\", padding_side='left')\n\n# Add the source sentence into the prompt template\nprompt=\"Translate this from Chinese to English:\\nChinese: 我爱机器翻译。\\nEnglish:\"\n\n# X-ALMA needs the chat template; ALMA and ALMA-R do not.\nchat_style_prompt = [{\"role\": \"user\", \"content\": prompt}]\nprompt = tokenizer.apply_chat_template(chat_style_prompt, tokenize=False, add_generation_prompt=True)\n\ninput_ids = 
tokenizer(prompt, return_tensors=\"pt\", padding=True, max_length=40, truncation=True).input_ids.cuda()\n\n# Translation\nwith torch.no_grad():\n    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)\noutputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)\nprint(outputs)\n```\n\n**The second way**: loading the base model and language-specific module (Recommended):\n```\nmodel = AutoModelForCausalLM.from_pretrained(\"haoranxu\u002FX-ALMA-13B-Pretrain\", torch_dtype=torch.float16, device_map=\"auto\")\nmodel = PeftModel.from_pretrained(model, f\"haoranxu\u002FX-ALMA-13B-Group{group_id}\")\ntokenizer = AutoTokenizer.from_pretrained(f\"haoranxu\u002FX-ALMA-13B-Group{group_id}\", padding_side='left')\n```\n\n**The third way**: loading the base model with all language-specific modules like MoE: (Require large GPU memory)\n```\nfrom modeling_xalma import XALMAForCausalLM\nmodel = XALMAForCausalLM.from_pretrained(\"haoranxu\u002FX-ALMA\", torch_dtype=torch.float16, device_map=\"auto\")\ntokenizer = AutoTokenizer.from_pretrained(\"haoranxu\u002FX-ALMA\", padding_side='left')\n\n# Add `lang=\"zh\"`: specify the language to instruct the model on which group to use for the third loading method during generation.\ngenerated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9, lang=\"zh\")\n```\n\n\nThe ALMA and ALMA-R translation prompt is:\n```\nTranslate this from \u003Csource language name> into \u003Ctarget language name>:\n\u003Csource language name>: \u003Csource language sentence>\n\u003Ctarget language name>:\n```\n\nThe X-ALMA translation prompt is:\n```\n\u003Cs>[INST] Translate this from \u003Csource language name> into \u003Ctarget language name>:\n\u003Csource language name>: \u003Csource language sentence>\n\u003Ctarget language name>: [\u002FINST]\n```\n\n# Environment Setup 🔧\n```\nconda create -n xalma 
python=3.11\nconda activate xalma\n```\nIf you use **AMD GPUs**, please first install torch with ROCm.\n\nThen install other dependencies:\n```\nbash install_alma.sh\n```\n# Evaluation 💻\n### Evaluation on X-ALMA\nThis is a quick start to evaluate our X-ALMA model. To produce translation outputs for FLORES-200 in both the en→cs and cs→en directions, run the following command. To evaluate on WMT'23 instead, simply pass `--override_test_data_path haoranxu\u002FWMT23-Test`. **Note that you don't need to enable `--chat_style` for ALMA and ALMA-R; it is only for X-ALMA.**\n\n```\naccelerate launch --config_file configs\u002Fdeepspeed_eval_config_bf16.yaml \\\n    run_llmmt.py \\\n    --model_name_or_path haoranxu\u002FX-ALMA-13B-Group5 \\\n    --do_predict \\\n    --low_cpu_mem_usage \\\n    --language_pairs en-cs,cs-en \\\n    --mmt_data_path placeholder \\\n    --override_test_data_path haoranxu\u002FFLORES-200 \\\n    --per_device_eval_batch_size 1 \\\n    --output_dir .\u002Fyour_output_dir\u002F \\\n    --predict_with_generate \\\n    --max_new_tokens 256 \\\n    --max_source_length 256 \\\n    --bf16 \\\n    --seed 42 \\\n    --num_beams 5 \\\n    --overwrite_cache \\\n    --overwrite_output_dir \\\n    --chat_style # `--chat_style` is only for X-ALMA; you don't need to enable it for ALMA and ALMA-R\n\n```\nThe generated outputs will be saved in `your_output_dir`. The translation file for the `en→cs` direction is named `test-en-cs`, and the file for the `cs→en` direction is named `test-cs-en`.\nThe `--language_pairs` argument specifies the translation directions you wish to evaluate. It supports testing multiple directions at once. 
For example, you can use `de-en,en-de,en-cs,cs-en`.\n\nMore examples for evaluating ALMA(-R) can be found under the `.\u002Fevals` folder.\n\n**Note that this will perform data-parallel evaluation supported by deepspeed: that is, placing a single full copy of your model onto each available GPU and splitting batches across GPUs, so that evaluating on K GPUs is roughly K times faster than on one**. For those with limited GPU memory, we offer an alternative method. The user can pass `--multi_gpu_one_model` to run the process by distributing a single model across multiple GPUs. Please see evaluation examples in `evals\u002Falma_13b_r.sh` or the `evals\u002F*no_parallel` files.\n\n# Training 🔥\nHere we show how to \n- run Contrastive Preference Optimization on top of ALMA models (ALMA→ALMA-R)\n- fine-tune LLaMA-2-7B on monolingual OSCAR data (stage 1)\n- fine-tune on human-written parallel data once stage 1 is completed, with either full-weight or LoRA fine-tuning (stage 2)\n\nPlease note that we do not share the training process for X-ALMA specifically, as it would require releasing numerous intermediate checkpoints, making the process overly complex.\n\n### CPO Fine-Tuning\nTo run the CPO fine-tuning with our triplet preference data, run the following command:\n```\nbash runs\u002Fcpo_ft.sh ${your_output_dir}\n```\n### OSCAR Monolingual Fine-Tuning\nTo execute the OSCAR monolingual fine-tuning, use the following command:\n```\nbash runs\u002Fmono_ft.sh ${your_output_dir}\n```\n### Parallel Data Fine-Tuning (Full-Weight)\nOnce the monolingual data fine-tuning is complete, proceed to the parallel data fine-tuning using the full-weight approach. Execute the following command:\n```\nbash runs\u002Fparallel_ft.sh ${your_output_dir} ${training_pairs}\n```\nwhere `training_pairs` is the set of translation directions to train on. 
The default is all 10 directions: `de-en,cs-en,is-en,zh-en,ru-en,en-de,en-cs,en-is,en-zh,en-ru`.\n\n### Parallel Data Fine-Tuning (LoRA)\nIn Stage 2, there's also an option to employ LoRA for fine-tuning on the parallel data. To do so, execute the following command:\n```\nbash runs\u002Fparallel_ft_lora.sh ${your_output_dir} ${training_pairs}\n```\n\n# FAQs ❓\n### What language directions do ALMA and ALMA-R support?\nCurrently, ALMA supports 10 directions: English↔German, English↔Czech, English↔Icelandic, English↔Chinese, English↔Russian. However, it may surprise us in other directions :)\n\n### What language directions does X-ALMA support?\nX-ALMA supports 50 languages (English plus the 49 listed here) and 98 directions (into and from English): da,nl,de,is,no,sv,af,ca,ro,gl,it,pt,es,bg,mk,sr,uk,ru,id,ms,th,vi,mg,fr,hu,el,cs,pl,lt,lv,ka,zh,ja,ko,fi,et,gu,hi,mr,ne,ur,az,kk,ky,tr,uz,ar,he,fa\n\n### When should I stop fine-tuning at stage 1?\nOur 7B and 13B models are trained on 20B and 12B tokens, respectively. However, as indicated in the paper, fine-tuning on 1B tokens should already boost the performance substantially. The number of steps required to fine-tune on 1 billion tokens depends on your batch size. In our case, the batch size is calculated as follows: 16 GPUs * 4 (batch size per GPU) * 4 (gradient accumulation steps) = 256. With a sequence length of 512, we need roughly 8,000 steps to train on 1 billion tokens: 10^9 \u002F (256*512) ≈ 7,630 steps. 
However, you may choose to fine-tune more steps to get better performance.\n\n### How to decide the interleave probability at stage 1?\nPlease find the reasons for interleave probability selection for stage 1 in Appendix D.1 in the [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.11674.pdf)!\n\n# Reference\nPlease find more details for ALMA models in our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11674) or the [summary](https:\u002F\u002Fnotes.aimodels.fyi\u002Falma-a-new-training-method-that-boosts-translation-performance-for-large-language-models\u002F) of the paper.\n```\n@inproceedings{\n    xu2024a,\n    title={A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models},\n    author={Haoran Xu and Young Jin Kim and Amr Sharaf and Hany Hassan Awadalla},\n    booktitle={The Twelfth International Conference on Learning Representations},\n    year={2024},\n    url={https:\u002F\u002Fopenreview.net\u002Fforum?id=farT6XXntP}\n}\n```\n\nPlease also find more detailed information for the ALMA-R model with Contrastive Preference Optimization in the [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.08417v2.pdf).\n```\n@inproceedings{\n    xu2024contrastive,\n    title={Contrastive Preference Optimization: Pushing the Boundaries of {LLM} Performance in Machine Translation},\n    author={Haoran Xu and Amr Sharaf and Yunmo Chen and Weiting Tan and Lingfeng Shen and Benjamin Van Durme and Kenton Murray and Young Jin Kim},\n    booktitle={Forty-first International Conference on Machine Learning},\n    year={2024},\n    url={https:\u002F\u002Fopenreview.net\u002Fforum?id=51iwkioZpn}\n}\n```\n\nPlease find details about X-ALMA in the latest [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.03115)\n```\n@inproceedings{\nxu2025xalma,\ntitle={X-{ALMA}: Plug \\& Play Modules and Adaptive Rejection for Quality Translation at Scale},\nauthor={Haoran Xu and Kenton Murray and Philipp Koehn and Hieu Hoang and Akiko 
Eriguchi and Huda Khayrallah},\nbooktitle={The Thirteenth International Conference on Learning Representations},\nyear={2025},\nurl={https:\u002F\u002Fopenreview.net\u002Fforum?id=csbf1p8xUq}\n}\n```\n","\u003Cp align=\"center\">\n    \u003Cimg alt=\"ALMA\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffe1ixxu_ALMA_readme_62a0e8c38265.png\" width=\"500\" height=\"203\">\n\u003C\u002Fp>\n\n\u003Cdiv align=\"center\">\n    \n# ALMA：基于先进语言模型的翻译器\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n\u003Ca href=\"LICENSE\" alt=\"MIT许可证\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-FAD689.svg\" \u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11674\" alt=\"ALMA论文\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FALMA-Paper-D9AB42\" \u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.08417\" alt=\"ALMA-R论文\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FALMA--R-Paper-F6C555\" \u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.03115\" alt=\"X-ALMA论文\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FX--ALMA-Paper-F3B425\" \u002F>\u003C\u002Fa>\n\u003C!-- \u003Ca href=\"https:\u002F\u002Fnotes.aimodels.fyi\u002Falma-a-new-training-method-that-boosts-translation-performance-for-large-language-models\u002F\">\u003Cimg alt=\"摘要链接\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fsummary-link-F6C555\" \u002F>\u003C\u002Fa> -->\n\u003Ca href=\"https:\u002F\u002Fwww.clsp.jhu.edu\u002F\" alt=\"jhu\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FJohns_Hopkins_University-BEC23F\" \u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002F\" alt=\"MSlogo\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMicrosoft-B1B479?logo=microsoft\" \u002F>\u003C\u002Fa>\n\u003Ca 
href=\"https:\u002F\u002Ftwitter.com\u002Ffe1ixxu\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Fhaoranxu?style=social&logo=twitter\"\n      alt=\"在Twitter上关注\">\u003C\u002Fa>\n\u003C\u002Fp>\n\nALMA共有三代：ALMA（第一代）、ALMA-R（第二代）以及**X-ALMA（第三代，全新！）**。\n\n[**ALMA**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11674)（**A**dvanced **L**anguage **M**odel-based Tr**A**nslator）是一种多对多的基于大语言模型的翻译模型，它采用了一种全新的翻译模型范式：先在单语数据上进行微调，再利用高质量的平行数据进一步优化。这一两步微调流程确保了强大的翻译性能。\n\n**[ALMA-R](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.08417v2.pdf)** 在ALMA模型的基础上，使用我们提出的**对比偏好优化（CPO）**进行了进一步的LoRA微调，而ALMA则采用了监督微调。CPO微调需要我们的[三元组偏好数据](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FALMA-R-Preference)来进行偏好学习。ALMA-R如今的表现可以媲美甚至超越GPT-4或WMT竞赛的冠军！\n\n**[X-ALMA](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.03115)（全新！）将ALMA(-R)的支持语言从6种扩展到了50种，并且无论资源丰富程度如何，都能在50种不同的语言中保持顶尖的翻译性能。这是通过即插即用的语言特定模块架构以及精心设计的5步训练方案，结合新颖的*自适应拒绝偏好优化*方法实现的。**\n\n*旧版ALMA仓库：*\n- 原始的**ALMA**仓库可以在这里找到：[这里](https:\u002F\u002Fgithub.com\u002Ffe1ixxu\u002FALMA\u002Ftree\u002Fa3cc7877752779346312bb07798172eadc83d692)。\n- 原始的**ALMA-R**仓库可以在这里找到：[这里](https:\u002F\u002Fgithub.com\u002Ffe1ixxu\u002FALMA\u002Ftree\u002Fac120eb44c609ad9a386d617172d40432c2c0df6)。\n\n# 新闻 🌟\n⭐ 2025年1月22日 **X-ALMA** 已被**ICLR 2025** 接受！\n\n⭐ 2024年10月6日 **X-ALMA** 正式发布！请在此处查看[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.03115)以及[模型和数据集](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fhaoranxu\u002Fx-alma-66fde464ef90be465920abaa)。\n\n⭐ 2024年6月20日 我们要特别感谢[SimPO](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.14734)，它与CPO共享类似的无参考偏好学习框架，但由于其特殊的长度归一化和目标奖励间隔，表现更加稳定。最令人兴奋的是，CPO和SimPO有可能结合使用！更多关于[CPO-SimPO](https:\u002F\u002Fgithub.com\u002Ffe1ixxu\u002FCPO_SIMPO)的信息，请参阅相关资料！\n\n⭐ 2024年5月1日 CPO论文已被**ICML 2024** 接受！\n\n⭐ 2024年3月22日 CPO方法现已合并至[huggingface 
trl](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftrl)！详情请见[此处](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftrl\u002Fpull\u002F1382)。\n\n⭐ 2024年1月16日 **ALMA-R** 正式发布！请参阅我们的新论文以获取更多详细信息：[对比偏好优化：推动机器翻译中大语言模型性能的边界](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.08417)。\n\n⭐ 2024年1月16日 ALMA论文：[机器翻译的范式转变：提升大语言模型的翻译性能](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11674)已被**ICLR 2024** 接受！更多详情请见[此处](https:\u002F\u002Fopenreview.net\u002Fforum?id=farT6XXntP)！\n\n# 目录 📄\n- [下载ALMA模型和数据集](#download-alma-models-and-dataset-)\n- [快速入门](#a-quick-start)\n- [环境设置](#environment-setup-)\n- [评估](#evaluation-)\n- [训练](#training-)\n- [常见问题解答](#faqs-)\n\n:star: 支持事项 :star:\n  - AMD和Nvidia显卡\n  - 数据并行评估\n  - 同时支持LLaMA-1、LLaMA-2、OPT、Faclon、BLOOM、MPT\n  - LoRA微调\n  - 单语数据微调、平行数据微调\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffe1ixxu_ALMA_readme_db4b483817d4.png\" width=\"700\" height=\"300\">\n\u003C\u002Fp>\n\n# 下载 ALMA 模型和数据集 🚀\n\n我们发布了 ALMA 系列的七种翻译模型：\n\n模型检查点已在 Hugging Face 上发布：\n|     模型    | 基础模型链接 | LoRA 链接 |\n|:-------------:|:---------------:|:---------:|\n|    ALMA-7B (第一代)    |        [haoranxu\u002FALMA-7B](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-7B)        |     -     |\n|  ALMA-7B-LoRA (第一代) |        [haoranxu\u002FALMA-7B-Pretrain](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-7B-Pretrain)        |     [haoranxu\u002FALMA-7B-Pretrain-LoRA](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-7B-Pretrain-LoRA)     |\n|  ALMA-7B-R (第二代) |        [haoranxu\u002FALMA-7B-R (LoRA 合并)](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-7B-R)        |     -    |\n|    ALMA-13B-LoRA (第一代)   |        [haoranxu\u002FALMA-13B](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-13B)        |     -     |\n| ALMA-13B-LoRA |        [haoranxu\u002FALMA-13B-Pretrain](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-13B-Pretrain)        |     
[haoranxu\u002FALMA-13B-Pretrain-LoRA](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-13B-Pretrain-LoRA)     |\n| ALMA-13B-R (第二代) |        [haoranxu\u002FALMA-13B-R (LoRA 合并)](https:\u002F\u002Fhuggingface.co\u002Fhaoranxu\u002FALMA-13B-R)        |    -   |\n|  **X-ALMA (全新，第三代)** |        [X-ALMA 模型](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fhaoranxu\u002Fx-alma-66fde464ef90be465920abaa)        |    -   |\n\n**请注意，`ALMA-7B-Pretrain` 和 `ALMA-13B-Pretrain` 并非翻译模型。它们仅经历了第一阶段的单语微调（7B 模型为 200 亿 token，13B 模型为 120 亿 token），应与其对应的 LoRA 模型结合使用。**\n\n*我们还在 `outputs` 目录中提供了 ALMA-13B-LoRA 和 ALMA-13B-R 在 WMT'22 和 WMT'23 上的翻译结果。这些输出还包括我们的基线结果，可以直接访问并用于后续评估。*\n\nALMA 和 ALMA-R 使用的数据集现在也已在 Hugging Face 上发布（全新！）：\n|     数据集    | 训练 \u002F 验证| 测试 |\n|:-------------:|:---------------:|:---------:|\n|   ALMA 人工编写的平行数据    |        [平行训练和验证](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FALMA-Human-Parallel)        |     [WMT'22](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FWMT22-Test)    |\n|  ALMA-R 三元组偏好数据 |        [三元组偏好数据](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FALMA-R-Preference)        |   [WMT'22](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FWMT22-Test) 和 [WMT'23](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FWMT23-Test)   |\n|  **X-ALMA 数据** |   50 种语言   [平行数据](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FX-ALMA-Parallel-Data) 和 [偏好数据](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FX-ALMA-Preference)        |   [WMT'23](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FWMT23-Test) 和 [FLORES-200](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhaoranxu\u002FX-ALMA-Parallel-Data)  |\n\n\n# 快速入门\nX-ALMA 采用即插即用架构，由基础模型和特定语言模块组成，每个模块可在不同语言组之间共享。\n有三种方式加载 X-ALMA 进行翻译。以下示例将“我爱机器翻译。”翻译成英语（X-ALMA 也可以进行多语言开放式问答）。\n\n**第一种方法**：加载已将特定语言模块合并到基础模型中的合并模型（推荐）：\n```\nimport torch\nfrom transformers import 
AutoModelForCausalLM\nfrom transformers import AutoTokenizer\nfrom peft import PeftModel\n\nGROUP2LANG = {\n    1: [\"da\", \"nl\", \"de\", \"is\", \"no\", \"sv\", \"af\"],\n    2: [\"ca\", \"ro\", \"gl\", \"it\", \"pt\", \"es\"],\n    3: [\"bg\", \"mk\", \"sr\", \"uk\", \"ru\"],\n    4: [\"id\", \"ms\", \"th\", \"vi\", \"mg\", \"fr\"],\n    5: [\"hu\", \"el\", \"cs\", \"pl\", \"lt\", \"lv\"],\n    6: [\"ka\", \"zh\", \"ja\", \"ko\", \"fi\", \"et\"],\n    7: [\"gu\", \"hi\", \"mr\", \"ne\", \"ur\"],\n    8: [\"az\", \"kk\", \"ky\", \"tr\", \"uz\", \"ar\", \"he\", \"fa\"],\n}\nLANG2GROUP = {lang: str(group) for group, langs in GROUP2LANG.items() for lang in langs}\ngroup_id = LANG2GROUP[\"zh\"]\n\nmodel = AutoModelForCausalLM.from_pretrained(f\"haoranxu\u002FX-ALMA-13B-Group{group_id}\", torch_dtype=torch.float16, device_map=\"auto\")\ntokenizer = AutoTokenizer.from_pretrained(f\"haoranxu\u002FX-ALMA-13B-Group{group_id}\", padding_side='left')\n\n# Add the source sentence into the prompt template\nprompt=\"Translate this from Chinese to English:\\nChinese: 我爱机器翻译。\\nEnglish:\"\n\n# X-ALMA needs the chat template, but ALMA and ALMA-R do not.\nchat_style_prompt = [{\"role\": \"user\", \"content\": prompt}]\nprompt = tokenizer.apply_chat_template(chat_style_prompt, tokenize=False, add_generation_prompt=True)\n\ninput_ids = tokenizer(prompt, return_tensors=\"pt\", padding=True, max_length=40, truncation=True).input_ids.cuda()\n\n# Translation\nwith torch.no_grad():\n    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)\noutputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)\nprint(outputs)\n```\n\n**The second way**: load the base model and the language-specific module (recommended):\n```\nmodel = AutoModelForCausalLM.from_pretrained(\"haoranxu\u002FX-ALMA-13B-Pretrain\", torch_dtype=torch.float16, device_map=\"auto\")\nmodel = PeftModel.from_pretrained(model, f\"haoranxu\u002FX-ALMA-13B-Group{group_id}\")\ntokenizer = AutoTokenizer.from_pretrained(f\"haoranxu\u002FX-ALMA-13B-Group{group_id}\", 
padding_side='left')\n```\n\n**The third way**: load the base model with all language-specific modules, similar to an MoE architecture (requires large GPU memory):\n```\nfrom modeling_xalma import XALMAForCausalLM\nmodel = XALMAForCausalLM.from_pretrained(\"haoranxu\u002FX-ALMA\", torch_dtype=torch.float16, device_map=\"auto\")\ntokenizer = AutoTokenizer.from_pretrained(\"haoranxu\u002FX-ALMA\", padding_side='left')\n\n# Add `lang=\"zh\"`: with this third loading method, the language tells the model which language group's module to use during generation.\ngenerated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9, lang=\"zh\")\n```\n\n\nThe translation prompt for ALMA and ALMA-R is:\n```\nTranslate this from \u003Csource language name> into \u003Ctarget language name>:\n\u003Csource language name>: \u003Csource language sentence>\n\u003Ctarget language name>:\n```\n\nThe translation prompt for X-ALMA is:\n```\n\u003Cs>[INST] Translate this from \u003Csource language name> into \u003Ctarget language name>:\n\u003Csource language name>: \u003Csource language sentence>\n\u003Ctarget language name>: [\u002FINST]\n```\n\n# Environment Setup 🔧\n```\nconda create -n xalma python=3.11\nconda activate xalma\n```\nIf you use **AMD GPUs**, please first install a ROCm-enabled build of PyTorch.\n\nThen install the other dependencies:\n```\nbash install_alma.sh\n```\n# Evaluation 💻\n\n### Evaluation on X-ALMA\nThis is a quick start for evaluating our X-ALMA models. To generate translations for the FLORES-200 dataset in both the en→cs and cs→en directions, run the command below. To evaluate on WMT'23 instead of FLORES-200, pass `--override_test_data_path haoranxu\u002FWMT23-Test` instead. **Note that ALMA and ALMA-R do not need `--chat_style`; this option applies only to X-ALMA.**\n\n```\naccelerate launch --config_file configs\u002Fdeepspeed_eval_config_bf16.yaml \\\n    run_llmmt.py \\\n    --model_name_or_path haoranxu\u002FX-ALMA-13B-Group5 \\\n    --do_predict \\\n    --low_cpu_mem_usage \\\n    --language_pairs en-cs,cs-en \\\n    --mmt_data_path placeholder \\\n    --override_test_data_path haoranxu\u002FFLORES-200 \\\n    --per_device_eval_batch_size 1 \\\n    --output_dir .\u002Fyour_output_dir\u002F \\\n    --predict_with_generate \\\n    --max_new_tokens 256 \\\n    --max_source_length 256 \\\n    --bf16 \\\n    --seed 42 \\\n    --num_beams 5 \\\n    --overwrite_cache \\\n    --overwrite_output_dir \\\n    --chat_style # `--chat_style` is only for X-ALMA. ALMA and ALMA-R do not need `--chat_style`\n```\n\nThe generated outputs are saved in the `your_output_dir` directory. The translation file for the en→cs direction is named 
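`test-en-cs`, and the file for the cs→en direction is named `test-cs-en`.\nThe variable `${test_pairs}` denotes the translation directions you wish to evaluate; several directions can be tested at once, for example `de-en,en-de,en-cs,cs-en`.\n\nFor more examples of evaluating ALMA(-R), see the scripts under the `.\u002Fevals` folder.\n\n**Note that this runs data-parallel evaluation backed by DeepSpeed: a full copy of the model is placed on each available GPU and the batches are split across GPUs, so K GPUs give a K-times speedup over a single GPU.** For users with limited GPU memory, we offer an alternative: add the `--multi_gpu_one_model` flag to spread a single model across multiple GPUs. See `evals\u002Falma_13b_r.sh` or the `evals\u002F*no_parallel` files for evaluation examples.\n\n# Training 🔥\nHere we show how to:\n- Run contrastive preference optimization on top of ALMA models (ALMA→ALMA-R).\n- Fine-tune LLaMA-2-7B on monolingual OSCAR data (stage 1).\n- After stage 1, fine-tune on human-written parallel data with full weights or with LoRA (stage 2).\n\nNote that we do not release the detailed training recipe for X-ALMA, because doing so would involve publishing a large number of intermediate checkpoints and would make the process overly complicated.\n\n### CPO Fine-Tuning\nTo run CPO fine-tuning with our triplet preference data, execute:\n```\nbash runs\u002Fcpo_ft.sh ${your_output_dir}\n```\n\n### OSCAR Monolingual Fine-Tuning\nTo run OSCAR monolingual fine-tuning, use:\n```\nbash runs\u002Fmono_ft.sh ${your_output_dir}\n```\n\n### Parallel Data Fine-Tuning (Full Weights)\nOnce monolingual fine-tuning is complete, you can continue with full-weight fine-tuning on the parallel data:\n```\nbash runs\u002Fparallel_ft.sh ${your_output_dir} ${training_pairs}\n```\nwhere `training_pairs` is the set of translation directions you choose. The default is all 10 directions: `de-en,cs-en,is-en,zh-en,ru-en,en-de,en-cs,en-is,en-zh,en-ru`.\n\n### Parallel Data Fine-Tuning (LoRA)\nIn stage 2 you can instead fine-tune on the parallel data with LoRA:\n```\nbash runs\u002Fparallel_ft_lora.sh ${your_output_dir} ${training_pairs}\n```\n\n# FAQs ❓\n### What language directions do ALMA and ALMA-R support?\nCurrently, ALMA supports 10 directions: English↔German, English↔Czech, English↔Icelandic, English↔Chinese, English↔Russian. However, it may surprise us in other directions :)\n\n### What language directions does X-ALMA support?\nX-ALMA supports 50 languages and 98 directions (the 49 languages below, each translated into and from English): da,nl,de,is,no,sv,af,ca,ro,gl,it,pt,es,bg,mk,sr,uk,ru,id,ms,th,vi,mg,fr,hu,el,cs,pl,lt,lv,ka,zh,ja,ko,fi,et,gu,hi,mr,ne,ur,az,kk,ky,tr,uz,ar,he,fa.\n\n### When should I stop fine-tuning at stage 1?\nOur 7B and 13B models were trained on 20B and 12B tokens, respectively. However, as described in the paper, fine-tuning on 1B tokens already boosts performance substantially. The number of steps needed for 1B tokens depends on your batch size. In our case, the batch size is 16 GPUs × 4 (batch size per GPU) × 4 (gradient accumulation steps) = 256. With a sequence length of 512, 1B tokens takes 10^9 \u002F (256 * 512) ≈ 7,630 steps, i.e. roughly 8,000 steps. You can of course fine-tune for more steps to get better results.\n\n### How do I decide the sampling ratio for stage 1?\nFor the reasoning behind the stage 1 sampling ratios, please see [Appendix D.1](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.11674.pdf) of the paper!\n\n# Reference\nFor more details about the ALMA models, please refer to our 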
[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.11674) or a [summary](https:\u002F\u002Fnotes.aimodels.fyi\u002Falma-a-new-training-method-that-boosts-translation-performance-for-large-language-models\u002F) of the paper.\n```\n@inproceedings{\n    xu2024a,\n    title={A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models},\n    author={Haoran Xu and Young Jin Kim and Amr Sharaf and Hany Hassan Awadalla},\n    booktitle={The Twelfth International Conference on Learning Representations},\n    year={2024},\n    url={https:\u002F\u002Fopenreview.net\u002Fforum?id=farT6XXntP}\n}\n```\n\nFor more details about the ALMA-R models trained with contrastive preference optimization, please refer to the [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.08417v2.pdf).\n```\n@inproceedings{\n    xu2024contrastive,\n    title={Contrastive Preference Optimization: Pushing the Boundaries of {LLM} Performance in Machine Translation},\n    author={Haoran Xu and Amr Sharaf and Yunmo Chen and Weiting Tan and Lingfeng Shen and Benjamin Van Durme and Kenton Murray and Young Jin Kim},\n    booktitle={Forty-first International Conference on Machine Learning},\n    year={2024},\n    url={https:\u002F\u002Fopenreview.net\u002Fforum?id=51iwkioZpn}\n}\n```\n\nFor details about X-ALMA, please refer to the latest [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.03115).\n```\n@inproceedings{\nxu2025xalma,\ntitle={X-{ALMA}: Plug {\\&} Play Modules and Adaptive Rejection for Quality Translation at Scale},\nauthor={Haoran Xu and Kenton Murray and Philipp Koehn and Hieu Hoang and Akiko Eriguchi and Huda Khayrallah},\nbooktitle={The Thirteenth International Conference on Learning Representations},\nyear={2025},\nurl={https:\u002F\u002Fopenreview.net\u002Fforum?id=csbf1p8xUq}\n}\n```","# ALMA \u002F X-ALMA Quick Start Guide\n\nThe ALMA family is a set of LLM-based translation models spanning three generations: **ALMA** (1st gen), **ALMA-R** (2nd gen), and **X-ALMA** (3rd gen, 50 languages). This guide focuses on quickly deploying and using the latest **X-ALMA**, and also covers the earlier models.\n\n## Prerequisites\n\n### System requirements\n- **OS**: Linux (recommended), macOS, or Windows (with a matching CUDA\u002FROCm setup)\n- **GPU**: NVIDIA (CUDA) or AMD (ROCm)\n- **Python**: 3.11 or later\n- **Suggested GPU memory**: \n    - 7B models: 16GB+ recommended\n    - 13B models: 24GB+ recommended\n    - Loading all language modules in MoE mode needs considerably more\n\n### Dependencies\nMake sure the `conda` package manager is installed. If you use an AMD GPU, install a ROCm-enabled build of PyTorch first.\n\n## Installation\n\n1. **Create and activate a virtual environment**\n   ```bash\n   conda create -n xalma python=3.11\n   conda activate xalma\n   ```\n\n2. 
**Install the project dependencies**\n   After cloning the repository, run the provided install script:\n   ```bash\n   bash install_alma.sh\n   ```\n   *Note: if dependency downloads are slow on networks in mainland China, configure a pip mirror (such as the Tsinghua mirror) before running the script, or replace the `pip install` commands in the script with `pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple ...`.*\n\n## Basic usage\n\nX-ALMA uses a plug-and-play architecture made of a base model and language-specific modules. The example below uses **X-ALMA-13B** to translate Chinese into English.\n\n### Code example\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom peft import PeftModel\n\n# Language-group mapping (X-ALMA splits its languages into 8 groups)\nGROUP2LANG = {\n    1: [\"da\", \"nl\", \"de\", \"is\", \"no\", \"sv\", \"af\"],\n    2: [\"ca\", \"ro\", \"gl\", \"it\", \"pt\", \"es\"],\n    3: [\"bg\", \"mk\", \"sr\", \"uk\", \"ru\"],\n    4: [\"id\", \"ms\", \"th\", \"vi\", \"mg\", \"fr\"],\n    5: [\"hu\", \"el\", \"cs\", \"pl\", \"lt\", \"lv\"],\n    6: [\"ka\", \"zh\", \"ja\", \"ko\", \"fi\", \"et\"],\n    7: [\"gu\", \"hi\", \"mr\", \"ne\", \"ur\"],\n    8: [\"az\", \"kk\", \"ky\", \"tr\", \"uz\", \"ar\", \"he\", \"fa\"],\n}\nLANG2GROUP = {lang: str(group) for group, langs in GROUP2LANG.items() for lang in langs}\n\n# Look up the group ID that Chinese belongs to\ngroup_id = LANG2GROUP[\"zh\"]\n\n# Option 1: load the model with the language module already merged (recommended, fastest inference)\nmodel = AutoModelForCausalLM.from_pretrained(\n    f\"haoranxu\u002FX-ALMA-13B-Group{group_id}\", \n    torch_dtype=torch.float16, \n    device_map=\"auto\"\n)\ntokenizer = AutoTokenizer.from_pretrained(\n    f\"haoranxu\u002FX-ALMA-13B-Group{group_id}\", \n    padding_side='left'\n)\n\n# Build the prompt (X-ALMA requires the chat template)\nprompt_text = \"Translate this from Chinese to English:\\nChinese: 我爱机器翻译。\\nEnglish:\"\nchat_style_prompt = [{\"role\": \"user\", \"content\": prompt_text}]\nprompt = tokenizer.apply_chat_template(chat_style_prompt, tokenize=False, add_generation_prompt=True)\n\n# Encode the input\ninput_ids = tokenizer(prompt, return_tensors=\"pt\", padding=True, max_length=40, truncation=True).input_ids.cuda()\n\n# Generate the translation\nwith torch.no_grad():\n    generated_ids = model.generate(\n        input_ids=input_ids, \n        num_beams=5, \n        
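max_new_tokens=20, \n        do_sample=True, \n        temperature=0.6, \n        top_p=0.9\n    )\n\noutputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)\nprint(outputs)\n```\n\n### Prompt formats\n\nThe prompt format differs slightly between model generations:\n\n*   **ALMA \u002F ALMA-R**:\n    ```text\n    Translate this from \u003Csource language name> into \u003Ctarget language name>:\n    \u003Csource language name>: \u003Csource sentence>\n    \u003Ctarget language name>:\n    ```\n\n*   **X-ALMA** (must include the `[INST]` markers):\n    ```text\n    \u003Cs>[INST] Translate this from \u003Csource language name> into \u003Ctarget language name>:\n    \u003Csource language name>: \u003Csource sentence>\n    \u003Ctarget language name>: [\u002FINST]\n    ```\n\n### Other loading options\nTo switch language modules dynamically without reloading the base model, use **Option 2** (base model + LoRA adapter):\n```python\n# Load the base model\nmodel = AutoModelForCausalLM.from_pretrained(\"haoranxu\u002FX-ALMA-13B-Pretrain\", torch_dtype=torch.float16, device_map=\"auto\")\n# Attach the module for a specific language group\nmodel = PeftModel.from_pretrained(model, f\"haoranxu\u002FX-ALMA-13B-Group{group_id}\")\ntokenizer = AutoTokenizer.from_pretrained(f\"haoranxu\u002FX-ALMA-13B-Group{group_id}\", padding_side='left')\n```","The localization team at a multinational e-commerce company urgently needs high-quality translations of tens of thousands of English product descriptions, full of industry terminology and user reviews, into Japanese, Arabic, and other less widely served languages in order to expand into emerging markets.\n\n### Without ALMA\n- **Poor quality on lower-resource languages**: general-purpose LLMs perform poorly on low-resource languages such as Arabic and Thai, often producing grammatical errors or misreading cultural context, which makes the text hard for local users to read.\n- **Inconsistent terminology**: without domain-specific fine-tuning (e-commerce, legal, and so on), the same term is translated differently across passages and needs repeated manual correction.\n- **Dependence on costly APIs**: chasing quality means calling commercial endpoints such as GPT-4, which is expensive at scale and carries data-privacy risk.\n- **Missing long-tail languages**: traditional machine translation engines tend to cover only major languages, which falls short of a strategy that targets around 50 languages at once.\n\n### With ALMA\n- **Stronger lower-resource performance**: X-ALMA's language-specific modules produce fluent output even for low-resource languages and capture cultural nuance more reliably.\n- **Consistent domain terminology**: the two-stage recipe of monolingual fine-tuning followed by parallel-data fine-tuning helps ALMA handle e-commerce context so that core terms stay consistent across the site.\n- **Lower cost through self-hosting**: the team can deploy the open-source ALMA-R model, whose CPO preference optimization reaches quality the ALMA-R paper reports as matching or exceeding GPT-4, without relying on an external API.\n- **One stack for many languages**: with X-ALMA's support for 50 languages, there is no need to switch between models for different language families.\n\nWith preference optimization and a multilingual modular architecture, ALMA lets a company obtain high-quality translation across 50 languages at low cost in a self-hosted environment.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffe1ixxu_ALMA_db4b4838.png","fe1ixxu","Haoran Xu","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ffe1ixxu_f665cdb2.jpg","Senior Researcher at Microsoft GenAI | CS Ph.D. 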
at Johns Hopkins University | ex-Intern at Microsoft Research| ex-intern at Meta AI | ex-intern at Amazon Alexa","Microsoft","Seattle",null,"https:\u002F\u002Fwww.fe1ixxu.com\u002F","https:\u002F\u002Fgithub.com\u002Ffe1ixxu",[86,90,94,97],{"name":87,"color":88,"percentage":89},"Ruby","#701516",68.5,{"name":91,"color":92,"percentage":93},"Smalltalk","#596706",29.3,{"name":95,"color":96,"percentage":10},"Python","#3572A5",{"name":98,"color":99,"percentage":100},"Shell","#89e051",0.2,584,43,"2026-04-09T07:56:16","MIT","Linux","Required. NVIDIA and AMD GPUs are supported; AMD users need a ROCm build of torch. Memory needs depend on model size: running X-ALMA-13B or loading all language modules (MoE mode) needs a large amount of GPU memory (24GB+ suggested for BF16\u002FFP16 inference), while smaller models or LoRA mode need less.","Not specified (size it to the model's parameter count; 32GB+ is suggested for 13B models)",{"notes":109,"python":110,"dependencies":111},"1. The project provides three generations of models (ALMA, ALMA-R, X-ALMA); X-ALMA supports 50 languages. 2. AMD GPU users must first install a ROCm-enabled PyTorch build. 3. A conda virtual environment named 'xalma' is recommended. 4. Some features (such as loading all X-ALMA language modules at once) need very large GPU memory. 5. The training and evaluation scripts rely on DeepSpeed config files.","3.11",[112,113,114,115,116],"torch","transformers","peft","accelerate","deepspeed",[15],"2026-03-27T02:49:30.150509","2026-04-11T10:02:44.089137",[121,126,131,136,141,145],{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},29455,"How can I control formal (Sie) versus informal (Du) address in German translations?","You can steer this with a prompt format similar to a chat fine-tune. Use an all-English prompt and a few in-context examples (multi-shot) that make the register clear. For example:\n\nTranslate this from English to German:\nEnglish: Explanation for you, a friend.\nGerman: Erklärung für dich, einen Freund.\nEnglish: Hi, description for you, my friend.\nGerman: Hallo, Beschreibung für dich, mein Freund.\nEnglish: \u003Cyour English input>\nGerman:\n\nNote: there is no need to add \u003C\u002Fs> or extra colons by hand; the model handles that. Keep the prompt entirely in English; a German prompt is not needed.","https:\u002F\u002Fgithub.com\u002Ffe1ixxu\u002FALMA\u002Fissues\u002F13",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},29456,"What should I do about the 'Current loss scale already at minimum' overflow error when pre-training LLaMA-2 on monolingual data?","This is usually a known error triggered by newer versions of the Hugging Face transformers library. Two things to try:\n1. Run with the `--bf16` flag.\n2. 
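Uninstall the current transformers library and reinstall the version pinned by the project:\npip install git+https:\u002F\u002Fgithub.com\u002Ffe1ixxu\u002FALMA.git@hf-install\n\nIf the cause is insufficient GPU memory (CUDA out of memory), try lowering `per_device_train_batch_size` (for example to 2) and setting `gradient_accumulation_steps` to 1, though this significantly increases training time.","https:\u002F\u002Fgithub.com\u002Ffe1ixxu\u002FALMA\u002Fissues\u002F33",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},29457,"When reproducing the paper, my COMET and BLEU scores are lower than expected. What could be the cause?","Score gaps usually come from the number of epochs and the hyperparameter settings. From experience:\n1. The high scores in the paper generally correspond to epoch=2, but using that setting directly can give abnormally low scores; epoch=1 may land closer to normal results.\n2. Make sure the `transformers` version is correct (for example 4.30.0.dev0); different versions can change the results.\n3. On a single GPU (such as a 40G A100), adjust `gradient_accumulation_steps` (for example to 64) to match the paper's batch size.\n4. When loading the model, confirm you are using the right checkpoint path, typically `output_dir\u002Fcheckpoint-{step}\u002Fadapter_model\u002F` or `output_dir\u002Fcheckpoint-{step}`.","https:\u002F\u002Fgithub.com\u002Ffe1ixxu\u002FALMA\u002Fissues\u002F9",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},29458,"How much compute is needed to apply ALMA-style fine-tuning to another base model (such as Falcon)?","Training on 1B tokens takes roughly 300 GPU hours. The exact time depends on model size and hardware. See the project's FAQ section (on when to stop stage 1 fine-tuning) and the original paper for more detail.","https:\u002F\u002Fgithub.com\u002Ffe1ixxu\u002FALMA\u002Fissues\u002F16",{"id":142,"question_zh":143,"answer_zh":144,"source_url":125},29459,"Should I use an English prompt or a German prompt for translation?","A prompt written fully in English is enough. The German prompt that appears in the project (such as \"Übersetzen Sie dies vom Englischen ins Deutsche:\") is only used for the ablation study in Appendix E of the paper; for actual use, an all-English prompt is recommended for best results.",{"id":146,"question_zh":147,"answer_zh":148,"source_url":140},29460,"Why do my translations differ from the official demo or from other users' results?","Differences usually come from one of the following:\n1. **Different prompt format**: whether a multi-shot prefix (such as \"Description from a friend\") is used noticeably affects the output.\n2. **Different library versions**: make sure your `transformers` and `accelerate` versions match the official setup (for example transformers 4.35.0.dev0).\n3. **Loading method**: confirm that the base model and the LoRA weights are loaded correctly. Example code:\n```python\nmodel = AutoModelForCausalLM.from_pretrained(\"haoranxu\u002FALMA-13B-Pretrain\", torch_dtype=torch.float16, device_map=\"auto\")\nmodel = PeftModel.from_pretrained(model, \"haoranxu\u002FALMA-13B-Pretrain-LoRA\")\n```\nDifferent quantization schemes or inference parameters can also change the output.",[]]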