[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-modelscope--ms-swift":3,"tool-modelscope--ms-swift":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":83,"stars":96,"forks":97,"last_commit_at":98,"license":99,"difficulty_score":10,"env_os":100,"env_gpu":101,"env_ram":102,"env_deps":103,"category_tags":109,"github_topics":110,"view_count":131,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":132,"updated_at":133,"faqs":134,"releases":170},154,"modelscope\u002Fms-swift","ms-swift","Use PEFT or Full-parameter to CPT\u002FSFT\u002FDPO\u002FGRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).","ms-swift 是由 ModelScope 社区推出的轻量级大模型微调与部署框架，支持 600 多个纯文本大模型（如 Qwen3.5、Llama4、GLM-5）和 400 多个多模态大模型（如 Qwen3-VL、Llava、InternVL3.5）的全流程训练与部署。它覆盖预训练、指令微调（SFT）、人类对齐（DPO、GRPO 等）、推理、评估、量化及部署等环节，帮助用户高效定制专属模型。\n\nms-swift 解决了大模型训练门槛高、流程复杂、多模态支持不足等问题，通过统一接口简化从数据准备到模型上线的全过程。它特别适合 AI 开发者和研究人员使用，尤其适用于需要快速实验新算法、适配多种模型架构或进行多模态任务的场景。\n\n技术上，ms-swift 集成了 Megatron 并行策略（TP\u002FPP\u002FCP\u002FEP）加速训练，并支持 GRPO、DPO、KTO 等多种偏好学习与强化学习算法，同时兼容 vLLM、SGLang、LMDeploy 等推理后端，以及 GPTQ、AWQ、FP8 等量化方案，兼顾灵活性与高性能。","# SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning)\n\n\u003Cp align=\"center\">\n    \u003Cbr>\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_983e54e49bf1.png\"\u002F>\n    \u003Cbr>\n\u003Cp>\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fmodelscope.cn\u002Fhome\">ModelScope Community Website\u003C\u002Fa>\n\u003Cbr>\n        \u003Ca href=\"README_CN.md\">中文\u003C\u002Fa> &nbsp ｜ &nbsp English &nbsp\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.11-5be.svg\">\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpytorch-%E2%89%A52.0-orange.svg\">\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fmodelscope\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fmodelscope-%E2%89%A51.23-5D91D4.svg\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fms-swift\u002F\">\u003Cimg src=\"https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fms-swift.svg\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fmodelscope\u002Fms-swift\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fpepy.tech\u002Fproject\u002Fms-swift\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_f617c87ed07f.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpulls\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPR-welcome-55EB99.svg\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F6427\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_17f4a7d6fa98.png\" alt=\"modelscope%2Fswift | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.05517\">Paper\u003C\u002Fa> &nbsp ｜ \u003Ca href=\"https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002F\">English Documentation\u003C\u002Fa> &nbsp ｜ &nbsp \u003Ca href=\"https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002F\">中文文档\u003C\u002Fa> &nbsp\n\u003C\u002Fp>\n\n## 📖 Table of Contents\n- [Groups](#-Groups)\n- [Introduction](#-introduction)\n- [News](#-news)\n- [Installation](#%EF%B8%8F-installation)\n- [Quick Start](#-quick-Start)\n- [Usage](#-Usage)\n- [License](#-License)\n- [Citation](#-citation)\n\n\n## ☎ Groups\n\nYou can contact us and communicate with us by adding our group:\n\n\n[Discord Group](https:\u002F\u002Fdiscord.com\u002Finvite\u002FD27yfEFVz5)              |  WeChat Group\n:-------------------------:|:-------------------------:\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_dd3952153976.jpg\" width=\"200\" height=\"200\">  |  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_feddca09c050.png\" width=\"200\" height=\"200\">\n\n\n## 📝 Introduction\n🍲 **ms-swift** is a large model and multimodal large model fine-tuning and deployment framework provided by the ModelScope community. It now supports training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment for 600+ text-only large models and 400+ multimodal large models. Large models include: Qwen3, Qwen3.5, InternLM3, GLM4.5, Mistral, DeepSeek-R1, Llama4, etc. Multimodal large models include: Qwen3-VL, Qwen3-Omni, Llava, InternVL3.5, MiniCPM-V-4, Ovis2.5, GLM4.5-V, DeepSeek-VL2, etc.\n\n🍔 In addition, ms-swift integrates the latest training technologies, including Megatron parallelism techniques such as TP, PP, CP, EP to accelerate training, as well as numerous GRPO algorithm family reinforcement learning algorithms including: GRPO, DAPO, GSPO, SAPO, CISPO, RLOO, Reinforce++, etc. to enhance model intelligence. ms-swift supports a wide range of training tasks, including preference learning algorithms such as DPO, KTO, RM, CPO, SimPO, ORPO, as well as Embedding, Reranker, and sequence classification tasks. ms-swift provides full-pipeline support for large model training, including acceleration for inference, evaluation, and deployment modules using vLLM, SGLang, and LMDeploy, as well as model quantization using GPTQ, AWQ, BNB, and FP8 technologies.\n\n**Why Choose ms-swift?**\n\n- 🍎 **Model Types**: Supports **600+ text-only large models**, **400+ multimodal large models**, and All-to-All full modality models from training to deployment full pipeline, with Day-0 support for popular models.\n- **Dataset Types**: Built-in 150+ datasets for pre-training, fine-tuning, human alignment, multimodal and various other tasks, with support for custom datasets. Users only need to prepare datasets for one-click training.\n- **Hardware Support**: Supports A10\u002FA100\u002FH100, RTX series, T4\u002FV100, CPU, MPS, and domestic hardware Ascend NPU, etc.\n- **Lightweight Training**: Supports lightweight fine-tuning methods such as LoRA, QLoRA, DoRA, LoRA+, LLaMAPro, LongLoRA, LoRA-GA, ReFT, RS-LoRA, Adapter, LISA, etc.\n- **Quantized Training**: Supports training on BNB, AWQ, GPTQ, AQLM, HQQ, EETQ quantized models, requiring only 9GB training resources for 7B models.\n- **Memory Optimization**: GaLore, Q-Galore, UnSloth, Liger-Kernel, Flash-Attention 2\u002F3, and **Ulysses and Ring-Attention sequence parallelism techniques** support, reducing memory consumption for long-text training.\n- **Distributed Training**: Supports distributed data parallelism (DDP), device_map simple model parallelism, DeepSpeed ZeRO2 ZeRO3, FSDP\u002FFSDP2, and Megatron distributed training technologies.\n- 🍓 **Multimodal Training**: Supports multimodal packing technology to improve training speed by 100%+, supports mixed modality data training with text, images, video and audio, and supports independent control of vit\u002Faligner\u002Fllm.\n- **Agent Training**: Supports Agent templates, allowing one dataset to be used for training different models.\n- 🍊 **Training Tasks**: Supports pre-training and instruction fine-tuning, as well as training tasks such as DPO, GKD, KTO, RM, CPO, SimPO, ORPO, and supports **Embedding\u002FReranker** and sequence classification tasks.\n- 🥥 **Megatron Parallelism**: Provides TP\u002FPP\u002FSP\u002FCP\u002FETP\u002FEP\u002FVPP parallel strategies to significantly boost **MoE model training speed**. Supports full-parameter and LoRA training methods for 300+ pure text large models and 100+ multimodal large models. Supports CPT\u002FSFT\u002FGRPO\u002FDPO\u002FKTO\u002FRM training tasks.\n- 🍉 **Reinforcement Learning**: Built-in **rich GRPO family algorithms**, including GRPO, DAPO, GSPO, SAPO, CISPO, CHORD, RLOO, Reinforce++, etc. Supports synchronous and asynchronous vLLM engine inference acceleration, with extensible reward functions, multi-turn inference Schedulers, and environments through plugins.\n- **Full-Pipeline Capabilities**: Covers the entire workflow of training, inference, evaluation, quantization, and deployment.\n- **UI Training**: Provides Web-UI interface for training, inference, evaluation, and quantization, completing the full pipeline for large models.\n- **Inference Acceleration**: Supports Transformers, vLLM, SGLang, and LmDeploy inference acceleration engines, providing OpenAI interfaces for accelerating inference, deployment, and evaluation modules.\n- **Model Evaluation**: Uses EvalScope as the evaluation backend, supporting 100+ evaluation datasets for evaluating text-only and multimodal models.\n- **Model Quantization**: Supports quantization export for AWQ, GPTQ, FP8, and BNB. Exported models support inference acceleration using vLLM\u002FSGLang\u002FLmDeploy.\n\n\n## 🎉 News\n- 🎁 2026.03.03: **ms-swift v4.0** major version is officially released. For release notes, please refer to [here](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Freleases\u002Ftag\u002Fv4.0.0). You can provide your suggestions to us in [this issue](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fissues\u002F7250). Thank you for your support.\n- 🎁 2025.11.14: Megatron GRPO is now available!  Check out the [docs](.\u002Fdocs\u002Fsource_en\u002FMegatron-SWIFT\u002FGRPO.md) and [examples](examples\u002Fmegatron\u002Fgrpo).\n- 🎁 2025.11.04: Support for [Mcore-Bridge](docs\u002Fsource_en\u002FMegatron-SWIFT\u002FMcore-Bridge.md), making Megatron training as simple and easy to use as transformers.\n- 🎁 2025.10.28: Ray [here](docs\u002Fsource_en\u002FInstruction\u002FRay.md).\n- 🎁 2025.09.07: Added support for CHORD training algorithm. See the [documentation](.\u002Fdocs\u002Fsource_en\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FCHORD.md).\n- 🎁 2025.09.06: Ulysses can now be used with ring-attention, allowing sequences to be sharded into any number of chunks (no longer limited by the number of heads). The argument remains `--sequence_parallel_size N`.\n- 🎁 2025.09.02: Megatron-SWIFT now supports multimodal model training. Documentation can be found [here](.\u002Fdocs\u002Fsource_en\u002FMegatron-SWIFT\u002FMultimodal-Model.md).\n- 🎁 2025.08.12: Support [Dynamic Fine-Tuning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.05629)(DFT) in SFT training, use parameter `--enable_dft_loss true`. Training scripts can be found [here](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Ffull\u002Fdft.sh).\n- 🎁 2025.07.09: Megatron-SWIFT supports LoRA training. Compared to ms-swift, it achieves significant speedup on MoE models. Training scripts can be found [here](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Fmegatron\u002Flora).\n- 🎁 2025.06.23: Fine-tuning of reranker models is supported. Training scripts can be found here: [Reranker](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Freranker\u002Ftrain_reranker.sh).\n- 🎁 2025.06.15: Support for GKD training on both pure text large models and multimodal models. Training scripts can be found here: [Pure Text](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fgkd), [Multimodal](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fmultimodal\u002Frlhf\u002Fgkd).\n\n\u003Cdetails>\u003Csummary>More\u003C\u002Fsummary>\n\n- 🎁 2025.06.11: Support for using Megatron parallelism techniques for RLHF training. The training script can be found [here](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Frlhf).\n- 🎁 2025.05.29: Support sequence parallel in pretrain, sft, dpo and grpo, check script [here](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fsequence_parallel).\n- 🎁 2025.05.11: GRPO now supports custom processing logic for reward models. See the GenRM example [here](.\u002Fdocs\u002Fsource_en\u002FInstruction\u002FGRPO\u002FDeveloperGuide\u002Freward_model.md).\n- 🎁 2025.04.15: The ms-swift paper has been accepted by AAAI 2025. You can find the paper at [this link](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F35383).\n- 🎁 2025.03.23: Multi-round GRPO is now supported for training multi-turn dialogue scenarios (e.g., agent tool calling). Please refer to the [doc](.\u002Fdocs\u002Fsource_en\u002FInstruction\u002FGRPO\u002FDeveloperGuide\u002Fmulti_turn.md).\n- 🎁 2025.03.16: Support for Megatron's parallel training techniques is now available. Please see the [Megatron-SWIFT training documentation](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FMegatron-SWIFT\u002FQuick-start.html).\n- 🎁 2025.03.15: Fine-tuning of embedding models for both pure text and multimodal models is supported. Please check the [training script](examples\u002Ftrain\u002Fembedding).\n- 🎁 2025.03.05: The hybrid mode for GRPO is supported, with a script for training a 72B model on 4 GPUs (4*80G) available [here](examples\u002Ftrain\u002Fgrpo\u002Finternal\u002Fvllm_72b_4gpu.sh). Tensor parallelism with vllm is also supported, with the training script available [here](examples\u002Ftrain\u002Fgrpo\u002Finternal).\n- 🎁 2025.02.21: The GRPO algorithm now supports LMDeploy, with the training script available [here](examples\u002Ftrain\u002Fgrpo\u002Finternal\u002Ffull_lmdeploy.sh). Additionally, the performance of the GRPO algorithm has been tested, achieving a training speed increase of up to 300% using various tricks. Please check the WanDB table [here](https:\u002F\u002Fwandb.ai\u002Ftastelikefeet\u002Fgrpo_perf_test?nw=nwuseryuzezyz).\n- 🎁 2025.02.21: The `swift sample` command is now supported. The reinforcement fine-tuning script can be found [here](docs\u002Fsource_en\u002FInstruction\u002FReinforced-Fine-tuning.md), and the large model API distillation sampling script is available [here](examples\u002Fsampler\u002Fdistill\u002Fdistill.sh).\n- 🔥 2025.02.12: Support for the GRPO (Group Relative Policy Optimization) training algorithm has been added. Documentation is available [here](docs\u002Fsource_en\u002FInstruction\u002FGRPO\u002FGetStarted\u002FGRPO.md).\n- 🎁 2024.12.04: Major update to **ms-swift 3.0**. Please refer to the [release notes and changes](docs\u002Fsource_en\u002FInstruction\u002FReleaseNote3.0.md).\n- 🎉 2024.08.12: The ms-swift paper has been published on arXiv and can be read [here](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.05517).\n- 🔥 2024.08.05: Support for using [evalscope](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fevalscope\u002F) as a backend for evaluating large models and multimodal models.\n- 🔥 2024.07.29: Support for using [vllm](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) and [lmdeploy](https:\u002F\u002Fgithub.com\u002FInternLM\u002Flmdeploy) to accelerate inference for large models and multimodal models. When performing infer\u002Fdeploy\u002Feval, you can specify `--infer_backend vllm\u002Flmdeploy`.\n- 🔥 2024.07.24: Support for human preference alignment training for multimodal large models, including DPO\u002FORPO\u002FSimPO\u002FCPO\u002FKTO\u002FRM\u002FPPO.\n- 🔥 2024.02.01: Support for Agent training! The training algorithm is derived from [this paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.00986.pdf).\n\u003C\u002Fdetails>\n\n## 🛠️ Installation\nTo install using pip:\n```shell\npip install ms-swift -U\n\n# Using uv\npip install uv\nuv pip install ms-swift -U --torch-backend=auto\n```\n\nTo install from source:\n```shell\n# pip install git+https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift.git\n\ngit clone https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift.git\ncd ms-swift\n# The main branch is for swift 4.x. To install swift 3.x, please run the following command:\n# git checkout release\u002F3.12\npip install -e .\n\n# Using uv\nuv pip install -e . --torch-backend=auto\n```\n\nRunning Environment:\n\n|              | Range        | Recommended         | Notes                                     |\n|--------------|--------------|---------------------|-------------------------------------------|\n| python       | >=3.9        | 3.11\u002F3.12                |                                           |\n| cuda         |              | cuda12              | No need to install if using CPU, NPU, MPS |\n| torch        | >=2.0        | 2.8.0\u002F2.10.0         |                            |\n| transformers | >=4.33       | 4.57.6\u002F5.2.0              |                          |\n| modelscope   | >=1.23       |                     |                                           |\n| peft         | >=0.11,\u003C0.19 |                     |                                           |\n| flash_attn   |              | 2.8.3\u002F3.0.0b1 |                                           |\n| trl          | >=0.15,\u003C0.30 | 0.28.0              | RLHF                                      |\n| deepspeed    | >=0.14       | 0.18.8              | Training                                  |\n| vllm         | >=0.5.1      | 0.11.0\u002F0.17.1       | Inference\u002FDeployment                      |\n| sglang       | >=0.4.6      |          | Inference\u002FDeployment                      |\n| lmdeploy     | >=0.5   | 0.10.1                 | Inference\u002FDeployment                      |\n| evalscope    | >=1.0       |                     | Evaluation                                |\n| gradio       |              | 5.32.1              | Web-UI\u002FApp                                |\n\nFor more optional dependencies, you can refer to [here](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Frequirements\u002Finstall_all.sh).\n\n\n## 🚀 Quick Start\n\n10 minutes of self-cognition fine-tuning of Qwen3-4B-Instruct-2507 on a single 3090 GPU:\n\n### Command Line Interface (Recommended)\n\n```shell\n# 13GB\nCUDA_VISIBLE_DEVICES=0 \\\nswift sft \\\n    --model Qwen\u002FQwen3-4B-Instruct-2507 \\\n    --tuner_type lora \\\n    --dataset 'AI-ModelScope\u002Falpaca-gpt4-data-zh#500' \\\n              'AI-ModelScope\u002Falpaca-gpt4-data-en#500' \\\n              'swift\u002Fself-cognition#500' \\\n    --torch_dtype bfloat16 \\\n    --num_train_epochs 1 \\\n    --per_device_train_batch_size 1 \\\n    --per_device_eval_batch_size 1 \\\n    --learning_rate 1e-4 \\\n    --lora_rank 8 \\\n    --lora_alpha 32 \\\n    --target_modules all-linear \\\n    --gradient_accumulation_steps 16 \\\n    --eval_steps 50 \\\n    --save_steps 50 \\\n    --save_total_limit 2 \\\n    --logging_steps 5 \\\n    --max_length 2048 \\\n    --output_dir output \\\n    --warmup_ratio 0.05 \\\n    --dataloader_num_workers 4 \\\n    --model_author swift \\\n    --model_name swift-robot\n```\n\nTips:\n\n- If you want to train with a custom dataset, you can refer to [this guide](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FCustomization\u002FCustom-dataset.html) to organize your dataset format and specify `--dataset \u003Cdataset_path>`.\n- The `--model_author` and `--model_name` parameters are only effective when the dataset includes `swift\u002Fself-cognition`.\n- To train with a different model, simply modify `--model \u003Cmodel_id\u002Fmodel_path>`.\n- By default, **ModelScope** is used for downloading models and datasets. If you want to use HuggingFace, simply specify `--use_hf true`.\n\nAfter training is complete, use the following command to infer with the trained weights:\n\n- Here, `--adapters` should be replaced with the last checkpoint folder generated during training. Since the adapters folder contains the training parameter file `args.json`, there is no need to specify `--model`, `--system` separately; Swift will automatically read these parameters. To disable this behavior, you can set `--load_args false`.\n\n```shell\n# Using an interactive command line for inference.\nCUDA_VISIBLE_DEVICES=0 \\\nswift infer \\\n    --adapters output\u002Fvx-xxx\u002Fcheckpoint-xxx \\\n    --stream true \\\n    --temperature 0 \\\n    --max_new_tokens 2048\n\n# merge-lora and use vLLM for inference acceleration\nCUDA_VISIBLE_DEVICES=0 \\\nswift infer \\\n    --adapters output\u002Fvx-xxx\u002Fcheckpoint-xxx \\\n    --stream true \\\n    --merge_lora true \\\n    --infer_backend vllm \\\n    --vllm_max_model_len 8192 \\\n    --temperature 0 \\\n    --max_new_tokens 2048\n```\n\nFinally, use the following command to push the model to ModelScope:\n\n```shell\nCUDA_VISIBLE_DEVICES=0 \\\nswift export \\\n    --adapters output\u002Fvx-xxx\u002Fcheckpoint-xxx \\\n    --push_to_hub true \\\n    --hub_model_id '\u003Cyour-model-id>' \\\n    --hub_token '\u003Cyour-sdk-token>' \\\n    --use_hf false\n```\n\n\n### Web-UI\nThe Web-UI is a **zero-threshold** training and deployment interface solution based on Gradio interface technology. For more details, you can check [here](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FGetStarted\u002FWeb-UI.html).\n\n```shell\nSWIFT_UI_LANG=en swift web-ui\n```\n\n![image.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_1d5df5a1f3f7.jpg)\n\n### Using Python\n\nms-swift also supports training and inference using Python. Below is pseudocode for training and inference. For more details, you can refer to [here](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Fnotebook\u002Fqwen2_5-self-cognition\u002Fself-cognition-sft.ipynb).\n\nTraining:\n\n```python\nfrom peft import LoraConfig, get_peft_model\nfrom swift import get_model_processor, get_template, load_dataset, EncodePreprocessor\nfrom swift.trainers import Seq2SeqTrainer, Seq2SeqTrainingArguments\n# Retrieve the model and template, and add a trainable LoRA module\nmodel, tokenizer = get_model_processor(model_id_or_path, ...)\ntemplate = get_template(tokenizer, ...)\nlora_config = LoraConfig(...)\nmodel = get_peft_model(model, lora_config)\n\n# Download and load the dataset, and encode the text into tokens\ntrain_dataset, val_dataset = load_dataset(dataset_id_or_path, ...)\ntrain_dataset = EncodePreprocessor(template=template)(train_dataset, num_proc=num_proc)\nval_dataset = EncodePreprocessor(template=template)(val_dataset, num_proc=num_proc)\n\n# Train the model\ntraining_args = Seq2SeqTrainingArguments(...)\ntrainer = Seq2SeqTrainer(\n    model=model,\n    args=training_args,\n    template=template,\n    train_dataset=train_dataset,\n    eval_dataset=val_dataset,\n)\ntrainer.train()\n```\nInference:\n\n```python\nfrom swift import TransformersEngine, InferRequest, RequestConfig\n# Perform inference using the native Transformers engine\nengine = TransformersEngine(model_id_or_path, adapters=[lora_checkpoint])\ninfer_request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])\nrequest_config = RequestConfig(max_tokens=max_new_tokens, temperature=temperature)\n\nresp_list = engine.infer([infer_request], request_config)\nprint(f'response: {resp_list[0].choices[0].message.content}')\n```\n\n## ✨ Usage\nHere is a minimal example of training to deployment using ms-swift. For more details, you can check the [examples](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples).\n\n- If you want to use other models or datasets (including multimodal models and datasets), you only need to modify `--model` to specify the corresponding model's ID or path, and modify `--dataset` to specify the corresponding dataset's ID or path.\n- By default, ModelScope is used for downloading models and datasets. If you want to use HuggingFace, simply specify `--use_hf true`.\n\n|   Useful Links |\n| ------ |\n|   [🔥Command Line Parameters](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FCommand-line-parameters.html)   |\n|   [Megatron-SWIFT](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FMegatron-SWIFT\u002FQuick-start.html)   |\n|   [GRPO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FGetStarted\u002FGRPO.html)   |\n|   [Supported Models and Datasets](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FSupported-models-and-datasets.html)   |\n|   [Custom Models](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FCustomization\u002FCustom-model.html), [🔥Custom Datasets](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FCustomization\u002FCustom-dataset.html)   |\n|   [LLM Tutorial](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fmodelscope-classroom\u002Ftree\u002Fmain\u002FLLM-tutorial)   |\n\n### Training\n\nSupported Training Methods:\n\n| Method                                                       | Full-Parameter                                               | LoRA | QLoRA                                                        | Deepspeed                                                    | Multi-Machine                                                | Multimodal                                                   |\n| ------------------------------------------------------------ | ------------------------------------------------------------ | ---- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |\n| [Pre-training](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fpretrain) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [Supervised Fine-Tuning](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Flora_sft.sh) | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Ffull\u002Ftrain.sh) | ✅    | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fqlora) | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fmulti-gpu\u002Fdeepspeed) | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fmulti-node) | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fmultimodal) |\n| [GRPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fgrpo) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [GKD](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fgkd) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fmultimodal\u002Frlhf\u002Fgkd) |\n| [PPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fppo) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ❌                                                            |\n| [DPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fdpo) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fmultimodal\u002Frlhf\u002Fdpo) |\n| [KTO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fkto.sh) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fmultimodal\u002Frlhf\u002Fkto.sh) |\n| [Reward Model](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Frm.sh) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [CPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fcpo.sh) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [SimPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fsimpo.sh) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [ORPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Forpo.sh) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [Embedding](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fembedding) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [Reranker](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Freranker) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [Sequence Classification](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fseq_cls) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n\n\nPre-training:\n```shell\n# 8*A100\nNPROC_PER_NODE=8 \\\nCUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \\\nswift pt \\\n    --model Qwen\u002FQwen2.5-7B \\\n    --dataset swift\u002Fchinese-c4 \\\n    --streaming true \\\n    --tuner_type full \\\n    --deepspeed zero2 \\\n    --output_dir output \\\n    --max_steps 10000 \\\n    ...\n```\n\nFine-tuning:\n```shell\nCUDA_VISIBLE_DEVICES=0 swift sft \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --dataset AI-ModelScope\u002Falpaca-gpt4-data-en \\\n    --tuner_type lora \\\n    --output_dir output \\\n    ...\n```\n\nRLHF:\n```shell\nCUDA_VISIBLE_DEVICES=0 swift rlhf \\\n    --rlhf_type dpo \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --dataset hjh0119\u002FshareAI-Llama3-DPO-zh-en-emoji \\\n    --tuner_type lora \\\n    --output_dir output \\\n    ...\n```\n\n\n### Megatron-SWIFT\n\nms-swift supports using Megatron parallelism techniques to accelerate training, including large-scale cluster training and MoE model training. The following training methods are supported:\n\n| Method                 | Full-Parameter | LoRA | MoE  | Multimodal | FP8  |\n| ---------------------- | -------------- | ---- | ---- | ---------- | ---- |\n| Pre-training           | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [Supervised Fine-Tuning](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron) | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [GRPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Fgrpo)                   | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [GKD](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Frlhf\u002Fgkd)                   | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [DPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Frlhf\u002Fdpo)                    | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [KTO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Frlhf\u002Fkto)                    | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [RM](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Frlhf\u002Frm)                     | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [Embedding](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Fembedding) | ✅ | ✅| ✅ | ✅ | ✅ |\n| [Reranker](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Freranker) | ✅ | ✅| ✅ | ✅ | ✅ |\n| [Sequence Classification](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Fseq_cls)    | ✅              | ✅    | ✅    | ✅          | ✅    |\n\n\n```shell\nNPROC_PER_NODE=2 CUDA_VISIBLE_DEVICES=0,1 megatron sft \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --save_safetensors true \\\n    --dataset AI-ModelScope\u002Falpaca-gpt4-data-zh \\\n    --tuner_type lora \\\n    --output_dir output \\\n    ...\n```\n\n### Reinforcement Learning\n\nms-swift supports a rich set of GRPO family algorithms:\n\n| Method                                                       | Full-Parameter | LoRA | Multimodal | Multi-Machine |\n| ------------------------------------------------------------ | -------------- | ---- | ---------- | ------------- |\n| [GRPO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FGetStarted\u002FGRPO.html) | ✅              | ✅    | ✅          | ✅             |\n| [DAPO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FDAPO.html) | ✅              | ✅    | ✅          | ✅             |\n| [GSPO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FGSPO.html) | ✅              | ✅    | ✅          | ✅             |\n| [SAPO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FSAPO.html) | ✅              | ✅    | ✅          | ✅             |\n| [CISPO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FCISPO.html) | ✅              | ✅    | ✅          | ✅             |\n| [CHORD](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FCHORD.html) | ✅              | ✅    | ✅          | ✅             |\n| [RLOO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FRLOO.html) | ✅              | ✅    | ✅          | ✅             |\n| [Reinforce++](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FREINFORCEPP.html) | ✅              | ✅    | ✅          | ✅             |\n\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 \\\nswift rlhf \\\n    --rlhf_type grpo \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --tuner_type lora \\\n    --use_vllm true \\\n    --vllm_mode colocate \\\n    --dataset AI-MO\u002FNuminaMath-TIR#10000 \\\n    --output_dir output \\\n    ...\n```\n\n\n### Inference\n```shell\nCUDA_VISIBLE_DEVICES=0 swift infer \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --stream true \\\n    --infer_backend transformers \\\n    --max_new_tokens 2048\n\n# LoRA\nCUDA_VISIBLE_DEVICES=0 swift infer \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --adapters swift\u002Ftest_lora \\\n    --stream true \\\n    --infer_backend transformers \\\n    --temperature 0 \\\n    --max_new_tokens 2048\n```\n\n### Interface Inference\n```shell\nCUDA_VISIBLE_DEVICES=0 swift app \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --stream true \\\n    --infer_backend transformers \\\n    --max_new_tokens 2048\n```\n\n### Deployment\n```shell\nCUDA_VISIBLE_DEVICES=0 swift deploy \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --infer_backend vllm\n```\n\n### Sampling\n```shell\nCUDA_VISIBLE_DEVICES=0 swift sample \\\n    --model LLM-Research\u002FMeta-Llama-3.1-8B-Instruct \\\n    --sampler_engine transformers \\\n    --num_return_sequences 5 \\\n    --dataset AI-ModelScope\u002Falpaca-gpt4-data-zh#5\n```\n\n### Evaluation\n```shell\nCUDA_VISIBLE_DEVICES=0 swift eval \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --infer_backend lmdeploy \\\n    --eval_backend OpenCompass \\\n    --eval_dataset ARC_c\n```\n\n### Quantization\n```shell\nCUDA_VISIBLE_DEVICES=0 swift export \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --quant_bits 4 --quant_method awq \\\n    --dataset AI-ModelScope\u002Falpaca-gpt4-data-zh \\\n    --output_dir Qwen2.5-7B-Instruct-AWQ\n```\n\n### Push Model\n```shell\nswift export \\\n    --model \u003Cmodel-path> \\\n    --push_to_hub true \\\n    --hub_model_id '\u003Cmodel-id>' \\\n    --hub_token '\u003Csdk-token>'\n```\n\n## 🏛 License\n\nThis framework is licensed under the [Apache License (Version 2.0)](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmaster\u002FLICENSE). For models and datasets, please refer to the original resource page and follow the corresponding License.\n\n## 📎 Citation\n\n```bibtex\n@misc{zhao2024swiftascalablelightweightinfrastructure,\n      title={SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning},\n      author={Yuze Zhao and Jintao Huang and Jinghan Hu and Xingjun Wang and Yunlin Mao and Daoze Zhang and Zeyinzi Jiang and Zhikai Wu and Baole Ai and Ang Wang and Wenmeng Zhou and Yingda Chen},\n      year={2024},\n      eprint={2408.05517},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.05517},\n}\n```\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_b8e7cc07dcb0.png)](https:\u002F\u002Fstar-history.com\u002F#modelscope\u002Fms-swift&Date)\n","# SWIFT（Scalable lightWeight Infrastructure for Fine-Tuning，可扩展轻量级微调基础设施）\n\n\u003Cp align=\"center\">\n    \u003Cbr>\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_983e54e49bf1.png\"\u002F>\n    \u003Cbr>\n\u003Cp>\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fmodelscope.cn\u002Fhome\">ModelScope 社区官网\u003C\u002Fa>\n\u003Cbr>\n        中文 &nbsp ｜ &nbsp \u003Ca href=\"README_CN.md\">English\u003C\u002Fa> &nbsp\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.11-5be.svg\">\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpytorch-%E2%89%A52.0-orange.svg\">\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fmodelscope\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fmodelscope-%E2%89%A51.23-5D91D4.svg\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fms-swift\u002F\">\u003Cimg src=\"https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fms-swift.svg\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fmodelscope\u002Fms-swift\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fpepy.tech\u002Fproject\u002Fms-swift\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_f617c87ed07f.png\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpulls\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPR-welcome-55EB99.svg\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F6427\" target=\"_blank\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_17f4a7d6fa98.png\" alt=\"modelscope%2Fswift | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.05517\">论文\u003C\u002Fa> &nbsp ｜ \u003Ca href=\"https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002F\">英文文档\u003C\u002Fa> &nbsp ｜ &nbsp \u003Ca href=\"https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002F\">中文文档\u003C\u002Fa> &nbsp\n\u003C\u002Fp>\n\n## 📖 目录\n- [交流群组](#-Groups)\n- [简介](#-introduction)\n- [最新动态](#-news)\n- [安装](#%EF%B8%8F-installation)\n- [快速开始](#-quick-Start)\n- [使用方法](#-Usage)\n- [许可证](#-License)\n- [引用](#-citation)\n\n\n## ☎ 交流群组\n\n您可以通过加入我们的群组与我们联系和交流：\n\n\n[Discord 群组](https:\u002F\u002Fdiscord.com\u002Finvite\u002FD27yfEFVz5)              |  微信群\n:-------------------------:|:-------------------------:\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_dd3952153976.jpg\" width=\"200\" height=\"200\">  |  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_feddca09c050.png\" width=\"200\" height=\"200\">\n\n\n## 📝 简介\n🍲 **ms-swift** 是由 ModelScope 社区提供的大模型与多模态大模型微调及部署框架。目前支持 **600+ 纯文本大模型** 和 **400+ 多模态大模型** 的训练（预训练、微调、人类对齐）、推理、评估、量化和部署。其中大模型包括：Qwen3、Qwen3.5、InternLM3、GLM4.5、Mistral、DeepSeek-R1、Llama4 等；多模态大模型包括：Qwen3-VL、Qwen3-Omni、Llava、InternVL3.5、MiniCPM-V-4、Ovis2.5、GLM4.5-V、DeepSeek-VL2 等。\n\n🍔 此外，ms-swift 集成了最新的训练技术，包括 Megatron 并行技术（如 TP、PP、CP、EP）以加速训练，以及丰富的 GRPO（Generalized Reinforcement Learning Policy Optimization，广义强化学习策略优化）算法族强化学习算法，包括：GRPO、DAPO、GSPO、SAPO、CISPO、RLOO、Reinforce++ 等，以提升模型智能。ms-swift 支持广泛的训练任务，包括偏好学习算法（如 DPO、KTO、RM、CPO、SimPO、ORPO），以及 Embedding、Reranker 和序列分类任务。ms-swift 提供了大模型训练的全链路支持，包括使用 vLLM、SGLang 和 LMDeploy 对推理、评估和部署模块进行加速，以及使用 GPTQ、AWQ、BNB 和 FP8 技术进行模型量化。\n\n**为什么选择 ms-swift？**\n\n- 🍎 **模型类型**：支持 **600+ 纯文本大模型**、**400+ 多模态大模型**，以及 All-to-All 全模态模型，覆盖从训练到部署的完整流程，并对热门模型提供 Day-0 支持。\n- **数据集类型**：内置 150+ 个用于预训练、微调、人类对齐、多模态等任务的数据集，同时支持自定义数据集。用户只需准备数据集即可一键训练。\n- **硬件支持**：支持 A10\u002FA100\u002FH100、RTX 系列、T4\u002FV100、CPU、MPS，以及国产硬件 Ascend NPU 等。\n- **轻量级训练**：支持 LoRA、QLoRA、DoRA、LoRA+、LLaMAPro、LongLoRA、LoRA-GA、ReFT、RS-LoRA、Adapter、LISA 等轻量级微调方法。\n- **量化训练**：支持在 BNB、AWQ、GPTQ、AQLM、HQQ、EETQ 量化模型上进行训练，7B 模型仅需 9GB 训练资源。\n- **显存优化**：支持 GaLore、Q-Galore、UnSloth、Liger-Kernel、Flash-Attention 2\u002F3，以及 **Ulysses 和 Ring-Attention 序列并行技术**，显著降低长文本训练的显存消耗。\n- **分布式训练**：支持分布式数据并行（DDP）、device_map 简易模型并行、DeepSpeed ZeRO2\u002FZeRO3、FSDP\u002FFSDP2，以及 Megatron 分布式训练技术。\n- 🍓 **多模态训练**：支持多模态打包（packing）技术，训练速度提升 100%+；支持文本、图像、视频和音频的混合模态数据训练；支持对 ViT\u002FAligner\u002FLLM 进行独立控制。\n- **Agent 训练**：支持 Agent 模板，允许使用同一份数据集训练不同模型。\n- 🍊 **训练任务**：支持预训练和指令微调，以及 DPO、GKD、KTO、RM、CPO、SimPO、ORPO 等训练任务，并支持 **Embedding\u002FReranker** 和序列分类任务。\n- 🥥 **Megatron 并行**：提供 TP\u002FPP\u002FSP\u002FCP\u002FETP\u002FEP\u002FVPP 并行策略，显著提升 **MoE（Mixture of Experts，混合专家）模型训练速度**。支持 300+ 纯文本大模型和 100+ 多模态大模型的全参数和 LoRA 训练方法，支持 CPT\u002FSFT\u002FGRPO\u002FDPO\u002FKTO\u002FRM 训练任务。\n- 🍉 **强化学习**：内置 **丰富的 GRPO 算法族**，包括 GRPO、DAPO、GSPO、SAPO、CISPO、CHORD、RLOO、Reinforce++ 等。支持同步和异步 vLLM 引擎推理加速，通过插件机制支持可扩展的奖励函数、多轮推理调度器（Schedulers）和环境。\n- **全链路能力**：覆盖训练、推理、评估、量化和部署的完整工作流。\n- **UI 训练**：提供 Web-UI 界面，用于训练、推理、评估和量化，完成大模型的全链路操作。\n- **推理加速**：支持 Transformers、vLLM、SGLang 和 LmDeploy 推理加速引擎，提供 OpenAI 接口以加速推理、部署和评估模块。\n- **模型评估**：使用 EvalScope 作为评估后端，支持 100+ 个评估数据集，用于评估纯文本和多模态模型。\n- **模型量化**：支持 AWQ、GPTQ、FP8 和 BNB 的量化导出。导出的模型可使用 vLLM\u002FSGLang\u002FLmDeploy 进行推理加速。\n\n## 🎉 新闻\n- 🎁 2026.03.03：**ms-swift v4.0** 主版本正式发布。发布说明请参见 [此处](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Freleases\u002Ftag\u002Fv4.0.0)。您可以在 [此 issue](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fissues\u002F7250) 中向我们提供您的建议。感谢您的支持！\n- 🎁 2025.11.14：Megatron GRPO 现已可用！请查看 [文档](.\u002Fdocs\u002Fsource_en\u002FMegatron-SWIFT\u002FGRPO.md) 和 [示例](examples\u002Fmegatron\u002Fgrpo)。\n- 🎁 2025.11.04：支持 [Mcore-Bridge](docs\u002Fsource_en\u002FMegatron-SWIFT\u002FMcore-Bridge.md)，使 Megatron 训练像 transformers 一样简单易用。\n- 🎁 2025.10.28：Ray 支持，详见 [此处](docs\u002Fsource_en\u002FInstruction\u002FRay.md)。\n- 🎁 2025.09.07：新增对 CHORD 训练算法的支持。详见 [文档](.\u002Fdocs\u002Fsource_en\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FCHORD.md)。\n- 🎁 2025.09.06：Ulysses 现在可与 ring-attention 结合使用，允许将序列分片为任意数量的块（不再受限于注意力头的数量）。参数仍为 `--sequence_parallel_size N`。\n- 🎁 2025.09.02：Megatron-SWIFT 现已支持多模态模型训练。文档见 [此处](.\u002Fdocs\u002Fsource_en\u002FMegatron-SWIFT\u002FMultimodal-Model.md)。\n- 🎁 2025.08.12：在 SFT（Supervised Fine-Tuning，监督微调）训练中支持 [Dynamic Fine-Tuning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.05629)（DFT），使用参数 `--enable_dft_loss true`。训练脚本见 [此处](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Ffull\u002Fdft.sh)。\n- 🎁 2025.07.09：Megatron-SWIFT 支持 LoRA（Low-Rank Adaptation，低秩适配）训练。相比 ms-swift，在 MoE（Mixture of Experts，混合专家）模型上显著提速。训练脚本见 [此处](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Fmegatron\u002Flora)。\n- 🎁 2025.06.23：支持重排序（reranker）模型的微调。训练脚本见：[Reranker](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Freranker\u002Ftrain_reranker.sh)。\n- 🎁 2025.06.15：支持在纯文本大模型和多模态模型上进行 GKD（Generalized Knowledge Distillation，广义知识蒸馏）训练。训练脚本见：[纯文本](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fgkd)，[多模态](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fmultimodal\u002Frlhf\u002Fgkd)。\n\n\u003Cdetails>\u003Csummary>更多\u003C\u002Fsummary>\n\n- 🎁 2025.06.11：支持在 RLHF（Reinforcement Learning from Human Feedback，基于人类反馈的强化学习）训练中使用 Megatron 并行技术。训练脚本见 [此处](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Frlhf)。\n- 🎁 2025.05.29：在预训练、SFT、DPO 和 GRPO 中支持序列并行（sequence parallel）。脚本见 [此处](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fsequence_parallel)。\n- 🎁 2025.05.11：GRPO 现在支持为奖励模型（reward model）自定义处理逻辑。GenRM 示例见 [此处](.\u002Fdocs\u002Fsource_en\u002FInstruction\u002FGRPO\u002FDeveloperGuide\u002Freward_model.md)。\n- 🎁 2025.04.15：ms-swift 论文已被 AAAI 2025 接收。论文链接见 [此处](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F35383)。\n- 🎁 2025.03.23：GRPO 现在支持多轮对话场景（例如智能体工具调用）的训练。请参考 [文档](.\u002Fdocs\u002Fsource_en\u002FInstruction\u002FGRPO\u002FDeveloperGuide\u002Fmulti_turn.md)。\n- 🎁 2025.03.16：现已支持 Megatron 的并行训练技术。请参阅 [Megatron-SWIFT 训练文档](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FMegatron-SWIFT\u002FQuick-start.html)。\n- 🎁 2025.03.15：支持纯文本和多模态模型的嵌入（embedding）模型微调。请查看 [训练脚本](examples\u002Ftrain\u002Fembedding)。\n- 🎁 2025.03.05：GRPO 支持混合模式，可在 4 张 GPU（4*80G）上训练 72B 模型的脚本见 [此处](examples\u002Ftrain\u002Fgrpo\u002Finternal\u002Fvllm_72b_4gpu.sh)。同时支持与 vLLM 结合的张量并行（tensor parallelism），训练脚本见 [此处](examples\u002Ftrain\u002Fgrpo\u002Finternal)。\n- 🎁 2025.02.21：GRPO 算法现已支持 LMDeploy，训练脚本见 [此处](examples\u002Ftrain\u002Fgrpo\u002Finternal\u002Ffull_lmdeploy.sh)。此外，GRPO 算法性能已通过测试，结合多种技巧最高可实现 300% 的训练速度提升。WanDB 表格见 [此处](https:\u002F\u002Fwandb.ai\u002Ftastelikefeet\u002Fgrpo_perf_test?nw=nwuseryuzezyz)。\n- 🎁 2025.02.21：现已支持 `swift sample` 命令。强化微调脚本见 [此处](docs\u002Fsource_en\u002FInstruction\u002FReinforced-Fine-tuning.md)，大模型 API 蒸馏采样脚本见 [此处](examples\u002Fsampler\u002Fdistill\u002Fdistill.sh)。\n- 🔥 2025.02.12：新增对 GRPO（Group Relative Policy Optimization，组相对策略优化）训练算法的支持。文档见 [此处](docs\u002Fsource_en\u002FInstruction\u002FGRPO\u002FGetStarted\u002FGRPO.md)。\n- 🎁 2024.12.04：**ms-swift 3.0** 重大更新。请参阅 [发布说明和变更](docs\u002Fsource_en\u002FInstruction\u002FReleaseNote3.0.md)。\n- 🎉 2024.08.12：ms-swift 论文已在 arXiv 上发布，可在此处阅读 [链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.05517)。\n- 🔥 2024.08.05：支持使用 [evalscope](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fevalscope\u002F) 作为后端评估大模型和多模态模型。\n- 🔥 2024.07.29：支持使用 [vllm](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm) 和 [lmdeploy](https:\u002F\u002Fgithub.com\u002FInternLM\u002Flmdeploy) 加速大模型和多模态模型的推理。在执行 infer\u002Fdeploy\u002Feval 时，可指定 `--infer_backend vllm\u002Flmdeploy`。\n- 🔥 2024.07.24：支持多模态大模型的人类偏好对齐训练，包括 DPO\u002FORPO\u002FSimPO\u002FCPO\u002FKTO\u002FRM\u002FPPO。\n- 🔥 2024.02.01：支持智能体（Agent）训练！该训练算法源自 [此论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.00986.pdf)。\n\u003C\u002Fdetails>\n\n## 🛠️ 安装\n使用 pip 安装：\n```shell\npip install ms-swift -U\n\n# 使用 uv\npip install uv\nuv pip install ms-swift -U --torch-backend=auto\n```\n\n从源码安装：\n```shell\n# pip install git+https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift.git\n\ngit clone https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift.git\ncd ms-swift\n# main 分支对应 swift 4.x。如需安装 swift 3.x，请运行以下命令：\n# git checkout release\u002F3.12\npip install -e .\n```\n\n# 使用 uv\nuv pip install -e . --torch-backend=auto\n```\n\n运行环境：\n\n|              | 范围         | 推荐版本            | 说明                                      |\n|--------------|--------------|---------------------|-------------------------------------------|\n| python       | >=3.9        | 3.11\u002F3.12           |                                           |\n| cuda         |              | cuda12              | 如果使用 CPU、NPU 或 MPS 则无需安装       |\n| torch        | >=2.0        | 2.8.0\u002F2.10.0        |                                           |\n| transformers | >=4.33       | 4.57.6\u002F5.2.0        |                                           |\n| modelscope   | >=1.23       |                     |                                           |\n| peft         | >=0.11,\u003C0.19 |                     |                                           |\n| flash_attn   |              | 2.8.3\u002F3.0.0b1       |                                           |\n| trl          | >=0.15,\u003C0.30 | 0.28.0              | RLHF（基于人类反馈的强化学习）            |\n| deepspeed    | >=0.14       | 0.18.8              | 训练                                      |\n| vllm         | >=0.5.1      | 0.11.0\u002F0.17.1       | 推理\u002F部署                                 |\n| sglang       | >=0.4.6      |                     | 推理\u002F部署                                 |\n| lmdeploy     | >=0.5        | 0.10.1              | 推理\u002F部署                                 |\n| evalscope    | >=1.0        |                     | 评估                                      |\n| gradio       |              | 5.32.1              | Web-UI\u002F应用                               |\n\n更多可选依赖项，请参考 [此处](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Frequirements\u002Finstall_all.sh)。\n\n\n## 🚀 快速开始\n\n在单张 3090 GPU 上对 Qwen3-4B-Instruct-2507 进行 10 分钟的自我认知微调：\n\n### 命令行接口（推荐）\n\n```shell\n# 显存占用约 13GB\nCUDA_VISIBLE_DEVICES=0 \\\nswift sft \\\n    --model Qwen\u002FQwen3-4B-Instruct-2507 \\\n    --tuner_type lora \\\n    --dataset 'AI-ModelScope\u002Falpaca-gpt4-data-zh#500' \\\n              'AI-ModelScope\u002Falpaca-gpt4-data-en#500' \\\n              'swift\u002Fself-cognition#500' \\\n    --torch_dtype bfloat16 \\\n    --num_train_epochs 1 \\\n    --per_device_train_batch_size 1 \\\n    --per_device_eval_batch_size 1 \\\n    --learning_rate 1e-4 \\\n    --lora_rank 8 \\\n    --lora_alpha 32 \\\n    --target_modules all-linear \\\n    --gradient_accumulation_steps 16 \\\n    --eval_steps 50 \\\n    --save_steps 50 \\\n    --save_total_limit 2 \\\n    --logging_steps 5 \\\n    --max_length 2048 \\\n    --output_dir output \\\n    --warmup_ratio 0.05 \\\n    --dataloader_num_workers 4 \\\n    --model_author swift \\\n    --model_name swift-robot\n```\n\n提示：\n\n- 如果你想使用自定义数据集进行训练，可以参考[此指南](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FCustomization\u002FCustom-dataset.html)来组织你的数据集格式，并指定 `--dataset \u003Cdataset_path>`。\n- `--model_author` 和 `--model_name` 参数仅在数据集中包含 `swift\u002Fself-cognition` 时生效。\n- 若要使用其他模型进行训练，只需修改 `--model \u003Cmodel_id\u002Fmodel_path>`。\n- 默认情况下，使用 **ModelScope** 下载模型和数据集。如果你想使用 HuggingFace，只需指定 `--use_hf true`。\n\n训练完成后，使用以下命令加载训练好的权重进行推理：\n\n- 此处 `--adapters` 应替换为训练过程中生成的最后一个 checkpoint 文件夹。由于 adapters 文件夹中包含训练参数文件 `args.json`，因此无需单独指定 `--model`、`--system`；Swift 会自动读取这些参数。若要禁用此行为，可设置 `--load_args false`。\n\n```shell\n# 使用交互式命令行进行推理。\nCUDA_VISIBLE_DEVICES=0 \\\nswift infer \\\n    --adapters output\u002Fvx-xxx\u002Fcheckpoint-xxx \\\n    --stream true \\\n    --temperature 0 \\\n    --max_new_tokens 2048\n\n# 合并 LoRA 权重并使用 vLLM 加速推理\nCUDA_VISIBLE_DEVICES=0 \\\nswift infer \\\n    --adapters output\u002Fvx-xxx\u002Fcheckpoint-xxx \\\n    --stream true \\\n    --merge_lora true \\\n    --infer_backend vllm \\\n    --vllm_max_model_len 8192 \\\n    --temperature 0 \\\n    --max_new_tokens 2048\n```\n\n最后，使用以下命令将模型推送到 ModelScope：\n\n```shell\nCUDA_VISIBLE_DEVICES=0 \\\nswift export \\\n    --adapters output\u002Fvx-xxx\u002Fcheckpoint-xxx \\\n    --push_to_hub true \\\n    --hub_model_id '\u003Cyour-model-id>' \\\n    --hub_token '\u003Cyour-sdk-token>' \\\n    --use_hf false\n```\n\n\n### Web-UI\nWeb-UI 是一个基于 Gradio 界面技术的**零门槛**训练与部署界面解决方案。更多详情请查看 [此处](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FGetStarted\u002FWeb-UI.html)。\n\n```shell\nSWIFT_UI_LANG=en swift web-ui\n```\n\n![image.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_1d5df5a1f3f7.jpg)\n\n### 使用 Python\n\nms-swift 也支持使用 Python 进行训练和推理。以下是训练和推理的伪代码。更多详情请参考 [此处](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Fnotebook\u002Fqwen2_5-self-cognition\u002Fself-cognition-sft.ipynb)。\n\n训练：\n\n```python\nfrom peft import LoraConfig, get_peft_model\nfrom swift import get_model_processor, get_template, load_dataset, EncodePreprocessor\nfrom swift.trainers import Seq2SeqTrainer, Seq2SeqTrainingArguments\n# 获取模型和模板，并添加可训练的 LoRA 模块\nmodel, tokenizer = get_model_processor(model_id_or_path, ...)\ntemplate = get_template(tokenizer, ...)\nlora_config = LoraConfig(...)\nmodel = get_peft_model(model, lora_config)\n\n# 下载并加载数据集，将文本编码为 token\ntrain_dataset, val_dataset = load_dataset(dataset_id_or_path, ...)\ntrain_dataset = EncodePreprocessor(template=template)(train_dataset, num_proc=num_proc)\nval_dataset = EncodePreprocessor(template=template)(val_dataset, num_proc=num_proc)\n\n# 训练模型\ntraining_args = Seq2SeqTrainingArguments(...)\ntrainer = Seq2SeqTrainer(\n    model=model,\n    args=training_args,\n    template=template,\n    train_dataset=train_dataset,\n    eval_dataset=val_dataset,\n)\ntrainer.train()\n```\n推理：\n\n```python\nfrom swift import TransformersEngine, InferRequest, RequestConfig\n# 使用原生 Transformers 引擎进行推理\nengine = TransformersEngine(model_id_or_path, adapters=[lora_checkpoint])\ninfer_request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])\nrequest_config = RequestConfig(max_tokens=max_new_tokens, temperature=temperature)\n\nresp_list = engine.infer([infer_request], request_config)\nprint(f'response: {resp_list[0].choices[0].message.content}')\n```\n\n## ✨ 使用方法\n\n以下是一个使用 ms-swift 从训练到部署的最小示例。更多详情，请查看 [示例](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples)。\n\n- 如果你想使用其他模型或数据集（包括多模态模型和数据集），只需修改 `--model` 参数以指定对应模型的 ID 或路径，并修改 `--dataset` 参数以指定对应数据集的 ID 或路径。\n- 默认情况下，使用 ModelScope 下载模型和数据集。如果你想使用 HuggingFace，只需指定 `--use_hf true`。\n\n|   有用链接 |\n| ------ |\n|   [🔥命令行参数](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FCommand-line-parameters.html)   |\n|   [Megatron-SWIFT](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FMegatron-SWIFT\u002FQuick-start.html)   |\n|   [GRPO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FGetStarted\u002FGRPO.html)   |\n|   [支持的模型和数据集](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FSupported-models-and-datasets.html)   |\n|   [自定义模型](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FCustomization\u002FCustom-model.html), [🔥自定义数据集](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FCustomization\u002FCustom-dataset.html)   |\n|   [大语言模型（LLM）教程](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fmodelscope-classroom\u002Ftree\u002Fmain\u002FLLM-tutorial)   |\n\n### 训练\n\n支持的训练方法：\n\n| 方法                                                         | 全参数训练（Full-Parameter）                                 | LoRA | QLoRA                                                        | Deepspeed                                                    | 多机训练（Multi-Machine）                                    | 多模态（Multimodal）                                         |\n| ------------------------------------------------------------ | ------------------------------------------------------------ | ---- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |\n| [预训练（Pre-training）](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fpretrain) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [监督微调（Supervised Fine-Tuning）](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Flora_sft.sh) | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Ffull\u002Ftrain.sh) | ✅    | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fqlora) | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fmulti-gpu\u002Fdeepspeed) | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fmulti-node) | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fmultimodal) |\n| [GRPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fgrpo) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [GKD](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fgkd) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fmultimodal\u002Frlhf\u002Fgkd) |\n| [PPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fppo) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ❌                                                            |\n| [DPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fdpo) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fmultimodal\u002Frlhf\u002Fdpo) |\n| [KTO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fkto.sh) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | [✅](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fmultimodal\u002Frlhf\u002Fkto.sh) |\n| [奖励模型（Reward Model）](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Frm.sh) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [CPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fcpo.sh) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [SimPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Fsimpo.sh) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [ORPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Frlhf\u002Forpo.sh) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [嵌入模型训练（Embedding）](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fembedding) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [重排序模型（Reranker）](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Freranker) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n| [序列分类（Sequence Classification）](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Fseq_cls) | ✅                                                            | ✅    | ✅                                                            | ✅                                                            | ✅                                                            | ✅                                                            |\n\n\n预训练（Pre-training）：\n```shell\n\n# 8*A100\nNPROC_PER_NODE=8 \\\nCUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \\\nswift pt \\\n    --model Qwen\u002FQwen2.5-7B \\\n    --dataset swift\u002Fchinese-c4 \\\n    --streaming true \\\n    --tuner_type full \\\n    --deepspeed zero2 \\\n    --output_dir output \\\n    --max_steps 10000 \\\n    ...\n```\n\n微调（Fine-tuning）:\n```shell\nCUDA_VISIBLE_DEVICES=0 swift sft \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --dataset AI-ModelScope\u002Falpaca-gpt4-data-en \\\n    --tuner_type lora \\\n    --output_dir output \\\n    ...\n```\n\n基于人类反馈的强化学习（RLHF）:\n```shell\nCUDA_VISIBLE_DEVICES=0 swift rlhf \\\n    --rlhf_type dpo \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --dataset hjh0119\u002FshareAI-Llama3-DPO-zh-en-emoji \\\n    --tuner_type lora \\\n    --output_dir output \\\n    ...\n```\n\n\n### Megatron-SWIFT\n\nms-swift 支持使用 Megatron 并行技术加速训练，包括大规模集群训练和 MoE（Mixture of Experts，混合专家）模型训练。支持以下训练方法：\n\n| 方法                 | 全参数微调（Full-Parameter） | LoRA | MoE  | 多模态（Multimodal） | FP8  |\n| ---------------------- | -------------- | ---- | ---- | ---------- | ---- |\n| 预训练（Pre-training）           | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [监督微调（Supervised Fine-Tuning）](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron) | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [GRPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Fgrpo)                   | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [GKD](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Frlhf\u002Fgkd)                   | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [DPO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Frlhf\u002Fdpo)                    | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [KTO](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Frlhf\u002Fkto)                    | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [奖励模型（RM）](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Frlhf\u002Frm)                     | ✅              | ✅    | ✅    | ✅          | ✅    |\n| [嵌入模型（Embedding）](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Fembedding) | ✅ | ✅| ✅ | ✅ | ✅ |\n| [重排序模型（Reranker）](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Freranker) | ✅ | ✅| ✅ | ✅ | ✅ |\n| [序列分类（Sequence Classification）](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Fseq_cls)    | ✅              | ✅    | ✅    | ✅          | ✅    |\n\n\n```shell\nNPROC_PER_NODE=2 CUDA_VISIBLE_DEVICES=0,1 megatron sft \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --save_safetensors true \\\n    --dataset AI-ModelScope\u002Falpaca-gpt4-data-zh \\\n    --tuner_type lora \\\n    --output_dir output \\\n    ...\n```\n\n### 强化学习（Reinforcement Learning）\n\nms-swift 支持丰富的 GRPO 系列算法：\n\n| 方法                                                       | 全参数微调（Full-Parameter） | LoRA | 多模态（Multimodal） | 多机（Multi-Machine） |\n| ------------------------------------------------------------ | -------------- | ---- | ---------- | ------------- |\n| [GRPO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FGetStarted\u002FGRPO.html) | ✅              | ✅    | ✅          | ✅             |\n| [DAPO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FDAPO.html) | ✅              | ✅    | ✅          | ✅             |\n| [GSPO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FGSPO.html) | ✅              | ✅    | ✅          | ✅             |\n| [SAPO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FSAPO.html) | ✅              | ✅    | ✅          | ✅             |\n| [CISPO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FCISPO.html) | ✅              | ✅    | ✅          | ✅             |\n| [CHORD](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FCHORD.html) | ✅              | ✅    | ✅          | ✅             |\n| [RLOO](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FRLOO.html) | ✅              | ✅    | ✅          | ✅             |\n| [Reinforce++](https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FREINFORCEPP.html) | ✅              | ✅    | ✅          | ✅             |\n\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 \\\nswift rlhf \\\n    --rlhf_type grpo \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --tuner_type lora \\\n    --use_vllm true \\\n    --vllm_mode colocate \\\n    --dataset AI-MO\u002FNuminaMath-TIR#10000 \\\n    --output_dir output \\\n    ...\n```\n\n\n### 推理（Inference）\n```shell\nCUDA_VISIBLE_DEVICES=0 swift infer \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --stream true \\\n    --infer_backend transformers \\\n    --max_new_tokens 2048\n\n# LoRA\nCUDA_VISIBLE_DEVICES=0 swift infer \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --adapters swift\u002Ftest_lora \\\n    --stream true \\\n    --infer_backend transformers \\\n    --temperature 0 \\\n    --max_new_tokens 2048\n```\n\n### 接口推理（Interface Inference）\n```shell\nCUDA_VISIBLE_DEVICES=0 swift app \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --stream true \\\n    --infer_backend transformers \\\n    --max_new_tokens 2048\n```\n\n### 部署（Deployment）\n```shell\nCUDA_VISIBLE_DEVICES=0 swift deploy \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --infer_backend vllm\n```\n\n### 采样（Sampling）\n```shell\nCUDA_VISIBLE_DEVICES=0 swift sample \\\n    --model LLM-Research\u002FMeta-Llama-3.1-8B-Instruct \\\n    --sampler_engine transformers \\\n    --num_return_sequences 5 \\\n    --dataset AI-ModelScope\u002Falpaca-gpt4-data-zh#5\n```\n\n### 评估（Evaluation）\n```shell\nCUDA_VISIBLE_DEVICES=0 swift eval \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --infer_backend lmdeploy \\\n    --eval_backend OpenCompass \\\n    --eval_dataset ARC_c\n```\n\n### 量化（Quantization）\n```shell\nCUDA_VISIBLE_DEVICES=0 swift export \\\n    --model Qwen\u002FQwen2.5-7B-Instruct \\\n    --quant_bits 4 --quant_method awq \\\n    --dataset AI-ModelScope\u002Falpaca-gpt4-data-zh \\\n    --output_dir Qwen2.5-7B-Instruct-AWQ\n```\n\n### 推送模型（Push Model）\n```shell\nswift export \\\n    --model \u003Cmodel-path> \\\n    --push_to_hub true \\\n    --hub_model_id '\u003Cmodel-id>' \\\n    --hub_token '\u003Csdk-token>'\n```\n\n## 🏛 许可证（License）\n\n本框架采用 [Apache License (Version 2.0)](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmaster\u002FLICENSE) 许可证。对于模型和数据集，请参考原始资源页面并遵守相应的许可证。\n\n## 📎 引用（Citation）\n\n```bibtex\n@misc{zhao2024swiftascalablelightweightinfrastructure,\n      title={SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning},\n      author={Yuze Zhao and Jintao Huang and Jinghan Hu and Xingjun Wang and Yunlin Mao and Daoze Zhang and Zeyinzi Jiang and Zhikai Wu and Baole Ai and Ang Wang and Wenmeng Zhou and Yingda Chen},\n      year={2024},\n      eprint={2408.05517},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.05517},\n}\n```\n\n## Star History（Star 历史）\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_readme_b8e7cc07dcb0.png)](https:\u002F\u002Fstar-history.com\u002F#modelscope\u002Fms-swift&Date)","# ms-swift 快速上手指南\n\n## 环境准备\n\n- **操作系统**：Linux \u002F macOS \u002F Windows（推荐 Linux）\n- **Python 版本**：≥ 3.11\n- **PyTorch 版本**：≥ 2.0\n- **ModelScope 版本**：≥ 1.23\n- **硬件支持**：A10\u002FA100\u002FH100、RTX 系列、T4\u002FV100、CPU、MPS，以及国产昇腾 NPU 等\n\n> 💡 建议使用国内镜像源加速安装（如清华源、阿里源）。\n\n## 安装步骤\n\n### 1. 创建并激活虚拟环境（可选但推荐）\n\n```bash\npython -m venv swift-env\nsource swift-env\u002Fbin\u002Factivate  # Linux\u002FmacOS\n# swift-env\\Scripts\\activate  # Windows\n```\n\n### 2. 安装 ms-swift\n\n使用 pip 安装（推荐使用国内镜像加速）：\n\n```bash\npip install ms-swift -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n或从源码安装（获取最新功能）：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift.git\ncd ms-swift\npip install -e . -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n> ⚠️ 若使用昇腾 NPU，请参考[官方文档](https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002F)配置 CANN 和自定义算子。\n\n## 基本使用\n\n以下是一个使用 LoRA 微调 Qwen3 模型的最简示例：\n\n```bash\n# 下载示例脚本（首次运行会自动下载模型和数据集）\nswift sft \\\n    --model_type qwen3-7b \\\n    --dataset alpaca-en \\\n    --lora_target_modules ALL \\\n    --output_dir output_qwen3_lora\n```\n\n训练完成后，可直接进行推理：\n\n```bash\nswift infer \\\n    --ckpt_dir output_qwen3_lora\u002Fvx-x_xxx\u002Fcheckpoint-xxx \\\n    --load_dataset_config true\n```\n\n> ✅ `ms-swift` 内置 150+ 数据集和 1000+ 模型，支持一行命令完成训练、推理、评估、量化全流程。更多高级用法请参考 [中文文档](https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002F)。","某医疗AI创业团队希望基于开源多模态大模型，快速构建一个能理解医学影像（如X光片）并生成专业诊断建议的辅助系统。\n\n### 没有 ms-swift 时\n- 需手动适配多个主流多模态模型（如 LLaVA、InternVL3.5）的训练代码，每换一个模型就要重写数据加载和训练逻辑，耗时数周。\n- 缺乏统一接口支持 DPO 或 GRPO 等对齐算法，难以利用医生标注的偏好数据优化模型输出的专业性和安全性。\n- 在单台 A100 服务器上微调 GLM4.5-V 这类大模型时显存不足，需自行集成 DeepSpeed 或 Megatron 并行策略，技术门槛高。\n- 无法便捷地对训练后的模型进行量化（如 AWQ）和部署，推理延迟高，难以满足临床实时性要求。\n\n### 使用 ms-swift 后\n- 一行命令即可切换 Qwen3-Omni、GLM4.5-V 等 400+ 多模态模型，内置医学图像-文本数据集模板，5 分钟启动 SFT 训练。\n- 原生支持 DPO、GRPO 等人类反馈对齐算法，直接用医生提供的“好\u002F坏”诊断样本优化模型，显著提升回答合规性。\n- 自动启用 TP\u002FPP 并行与梯度检查点，在单卡 A100 上成功微调 20B 级多模态模型，显存占用降低 40%。\n- 训练完成后一键导出 AWQ 量化模型，并通过集成的 vLLM 加速推理，响应时间从 8 秒降至 1.2 秒。\n\nms-swift 将多模态大模型从实验到落地的周期从数月压缩至几天，让小团队也能高效驾驭前沿 AI 能力。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmodelscope_ms-swift_983e54e4.png","modelscope","ModelScope","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmodelscope_66a27ef8.png","Model-as-a-Service in the making: bring accessible AI to all.",null,"contact@modelscope.cn","https:\u002F\u002Fwww.modelscope.cn\u002F","https:\u002F\u002Fgithub.com\u002Fmodelscope",[84,88,92],{"name":85,"color":86,"percentage":87},"Python","#3572A5",99.8,{"name":89,"color":90,"percentage":91},"Shell","#89e051",0.2,{"name":93,"color":94,"percentage":95},"Makefile","#427819",0,13532,1327,"2026-04-05T09:52:08","Apache-2.0","Linux, macOS, Windows","支持 NVIDIA GPU（A10\u002FA100\u002FH100、RTX 系列、T4\u002FV100）、Apple MPS、CPU 及国产昇腾 NPU；显存最低可支持 9GB（用于 7B 模型 QLoRA 训练）；未明确指定 CUDA 版本","未说明",{"notes":104,"python":105,"dependencies":106},"支持多种轻量化微调方法（如 LoRA、QLoRA 等）和量化训练（BNB、AWQ、GPTQ 等），可在低至 9GB 显存的设备上训练 7B 模型；支持分布式训练（DeepSpeed、FSDP、Megatron 等）及多模态模型训练；提供 Web UI 界面和完整训练-推理-评估-部署 pipeline。","3.11",[107,108],"torch>=2.0","modelscope>=1.23",[13,54,26],[111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130],"llm","lora","llama","sft","multimodal","peft","internvl","liger","deepseek-r1","embedding","grpo","open-r1","megatron","llama4","qwen3","reranker","qwen3-next","moe","qwen3-vl","qwen3-omni",38,"2026-03-27T02:49:30.150509","2026-04-06T06:44:28.962744",[135,140,144,148,153,157,161,166],{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},276,"如何对 MiniCPM-V 2.6 进行推理和 LoRA 微调？","首先克隆 swift 仓库并安装依赖：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fswift.git\ncd swift\npip install -e .[llm]\n```\n推理命令：\n```bash\nCUDA_VISIBLE_DEVICES=0 swift infer \\\n  --model_type minicpm-v-v2_6-chat \\\n  --model_id_or_path OpenBMB\u002FMiniCPM-V-2_6\n```\n使用 COCO 数据集进行 LoRA 微调：\n```bash\nCUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \\\n  --model_type minicpm-v-v2_6-chat \\\n  --model_id_or_path OpenBMB\u002FMiniCPM-V-2_6 \\\n  --sft_type lora \\\n  --dataset coco-en-mini#20000 \\\n  --deepspeed default-zero2\n```\n自定义数据集支持 JSON\u002FJSONL 格式，字段包括 query、response、images 和可选 history。","https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fissues\u002F1613",{"id":141,"question_zh":142,"answer_zh":143,"source_url":139},277,"为什么使用 --model_type minicpm-v-v2_6-chat 报错“model_type not in MODEL_MAPPING”？","该错误是因为使用的 swift 版本不匹配。Issue 中维护者指出：示例命令适用于 swift2，而 swift3 的用法已变更，请参考最新文档：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fmultimodal。请确保使用与当前 swift 版本对应的 model_type 名称。",{"id":145,"question_zh":146,"answer_zh":147,"source_url":139},278,"LoRA 微调后的模型显存占用高，甚至在 int4 模型上也出现 CUDA out of memory，怎么办？","维护者指出：长文本（尤其是高分辨率图像）的主要显存占用不在模型本身，而在上下文长度。建议降低图片分辨率以减少上下文长度，从而缓解显存压力。",{"id":149,"question_zh":150,"answer_zh":151,"source_url":152},279,"如何加载 LoRA 微调后的模型进行推理自己的测试集？","可以使用如下 Python 脚本加载模型和 LoRA 权重进行推理：\n```python\nos.environ['CUDA_VISIBLE_DEVICES'] = '0,1'\nfrom swift.llm import get_model_tokenizer, get_template, ModelType, get_default_template_type\nfrom swift.tuners import Swift\nfrom transformers import GenerationConfig\n\nckpt_dir = '\u002Fpath\u002Fto\u002Fcheckpoint\u002F'\nmodel_type = ModelType.qwen1half_72b_chat\ntemplate_type = get_default_template_type(model_type)\nmodel_id_or_path = '\u002Fpath\u002Fto\u002Fbase\u002Fmodel\u002F'\n\nmodel, tokenizer = get_model_tokenizer(model_type, model_id_or_path=model_id_or_path, model_kwargs={'device_map': 'auto'})\nmodel = Swift.from_pretrained(model, ckpt_dir, inference_mode=True)\ntemplate = get_template(template_type, tokenizer)\n\nmodel.generation_config = GenerationConfig(\n    max_new_tokens=2048,\n    temperature=0.9,\n    repetition_penalty=1.05,\n    do_sample=True\n)\n```\n注意：需同时指定 model_type 和 model_id_or_path，并确保路径正确。","https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fissues\u002F613",{"id":154,"question_zh":155,"answer_zh":156,"source_url":152},280,"推理时出现 “probability tensor contains either inf, nan or element \u003C 0” 错误怎么办？","该错误通常由生成参数设置不当引起。建议调整 generation_config 参数，例如降低 temperature（如设为 0.9）、设置合理的 repetition_penalty（如 1.05），并确保 do_sample 与 top_p\u002Ftop_k 配合使用。参考正确配置见 Issue 评论中的推理脚本。",{"id":158,"question_zh":159,"answer_zh":160,"source_url":152},281,"如何下载模型到本地指定路径用于微调或推理？","可使用 ModelScope 的 snapshot_download 函数将模型下载到自定义目录：\n```python\nfrom modelscope import snapshot_download\nmodel_dir = snapshot_download('qwen\u002FQwen1.5-72B-Chat', cache_dir='\u002Fyour\u002Flocal\u002Fpath')\n```\n之后在训练或推理命令中通过 --model_id_or_path 指向该本地路径即可。",{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},282,"如何在 ms-swift 3.0 中简化 model_type 的使用？","ms-swift 3.0 已弱化 model_type 概念，支持仅通过 --model_id_or_path 自动从 config.json 中检测模型类型。用户无需手动指定 model_type，系统会自动识别并加载对应配置。","https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fissues\u002F2217",{"id":167,"question_zh":168,"answer_zh":169,"source_url":165},283,"ms-swift 3.0 是否支持多 LoRA 推理和批量推理？","是的，ms-swift 3.0 已优化多 LoRA 推理体验，并支持 PT（PyTorch）后端的 batch 推理和多卡\u002FDeepSpeed 推理，提供统一的推理与部署接口，兼容 vLLM、LMDeploy 等框架。",[171,176,181,186,191,196,201,206,211,216,221,226,231,236,241,246,251,256,261,266],{"id":172,"version":173,"summary_zh":174,"released_at":175},99974,"v4.0.4","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv4.0.3...v4.0.4","2026-04-03T22:36:01",{"id":177,"version":178,"summary_zh":179,"released_at":180},99975,"v4.0.3","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv4.0.2...v4.0.3\r\n","2026-03-29T04:21:35",{"id":182,"version":183,"summary_zh":184,"released_at":185},99976,"v4.0.2","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv4.0.1...v4.0.2\r\n\r\n","2026-03-14T14:20:54",{"id":187,"version":188,"summary_zh":189,"released_at":190},99977,"v4.0.1","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv4.0.0...v4.0.1","2026-03-08T04:33:07",{"id":192,"version":193,"summary_zh":194,"released_at":195},99978,"v4.0.0","## 中文版\r\n\r\n### 新特性\r\n1. **架构优化**\r\n   a. 目录结构重构与依赖关系优化，使用模块化设计，提升架构的可扩展性和可定制性。\r\n   b. `model_type`与`template`解耦，简化同一 model_type 含多个 template 的模型支持流程。\r\n   c. Megatron-SWIFT 训练循环重写，使用 megatron-core 替代 megatron-lm 依赖。（兼容Ascend NPU）\r\n2. **Megatron-SWIFT**\r\n   a. 新模型支持：Qwen3.5系列、GLM4.7-Flash、MiniMax-M2.1、OLMoE。\r\n   b. Embedding 任务支持，训练示例：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Fembedding\r\n   c. Reranker 任务支持，训练示例：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Freranker\r\n   d. 新增`save_total_limit`参数，自动清理过期 checkpoint，并保留指标最优和最新的权重。\r\n   e. Qwen3-Next\u002FQwen3.5 新增`apply_wd_to_qk_layernorm`参数，支持对 qk layernorm 应用权重衰减。\r\n   f. 多模态MoE模型lora支持 `--target_modules all-router` 配置。\r\n3. **RL**\r\n   a. 支持GDPO算法计算优势，使用参数`--scale_rewards gdpo`。（感谢 @Auraithm 的贡献）\r\n   b. GKD 支持使用 top-k logits 计算KL以节约显存，使用参数 `--gkd_topk_logits`。\r\n   c. GKD 支持使用 teacher server，避免显式加载教师模型。\r\n4. **训练**\r\n  a. 新增 muon clip 优化器支持，训练示例：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Foptimizer\u002Fmuonclip.sh （感谢 @vx120 的贡献）\r\n  b. 依赖更新：兼容最新依赖 python3.12, transformers 5.2.0, vllm 0.15.1, trl 0.28, liger-kernel 0.7.0等。\r\n  c. generative reranker lm_head 部分计算优化，降低显存占用。\r\n  d. fsdp2支持激活 cpu offload；deepspeed elastic支持。（感谢招商 @meichangsu1 的贡献）\r\n\r\n### 新模型\r\n\r\n1. **纯文本模型**\r\n    a. Qwen\u002FQwen3-Coder-Next\r\n    b. ZhipuAI\u002FGLM-4.7-Flash, ZhipuAI\u002FGLM-5\r\n    c. MiniMaxAI\u002FMiniMax-M2.1\r\n    d. Tencent-YouTu-Research\u002FYoutu-LLM-2B\r\n    e. IQuestLab\u002FIQuest-Coder-V1-40B-Instruct\r\n    f. allenai\u002FOLMoE-1B-7B-0924-Instruct系列（感谢 @qianhao0713 的贡献）\r\n2. **多模态模型**\r\n    a. Qwen\u002FQwen3.5-35B-A3B, Qwen\u002FQwen3.5-9B 系列。训练脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fqwen3_5\r\n    b. Qwen3-VL-Embedding, Qwen3-VL-Reranker。训练脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fembedding\u002Fqwen3, https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Freranker\u002Fqwen3\r\n    c. deepseek-ai\u002FDeepSeek-OCR-2\r\n    d. ZhipuAI\u002FGLM-OCR\r\n    e. PaddlePaddle\u002FPaddleOCR-VL-1.5\r\n    f. OpenBMB\u002FMiniCPM-o-4_5\r\n    g. stepfun-ai\u002FStep3-VL-10B\r\n    h. google\u002Fmedgemma-4b-it 系列\r\n\r\n\r\n## English Version\r\n\r\n\r\n### New Features\r\n\r\n1. **Architecture Optimization**\r\n   a. Directory structure refactoring and dependency optimization with modular design to enhance architecture scalability and customizability.\r\n   b. Decoupling of `model_type` and `template` to simplify support for models with multiple templates under the same model_type.\r\n   c. Rewritten Megatron-SWIFT training loop using megatron-core instead of megatron-lm dependency. (Compatible with Ascend NPU)\r\n2. **Megatron-SWIFT**\r\n   a. New model support: Qwen3.5 series, GLM4.7-Flash, MiniMax-M2.1, OLMoE.\r\n   b. Embedding task support. Training example: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Fembedding\r\n   c. Reranker task support. Training example: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Freranker\r\n   d. Added `save_total_limit` parameter to automatically clean up expired checkpoints while retaining the best-performing and latest weights.\r\n   e. Added `apply_wd_to_qk_layernorm` parameter for Qwen3-Next\u002FQwen3.5 to support weight decay on qk layernorm.\r\n   f. Multi-modal MoE model LoRA supports `--target_modules all-router` configuration.\r\n3. **RL**\r\n   a. Support for GDPO algorithm to compute advantages using parameter `--scale_rewards gdpo`. (Thanks to @Auraithm)\r\n   b. GKD supports using top-k logits to compute KL for memory savings with parameter `--gkd_topk_logits`.\r\n   c. GKD supports using teacher server to avoid explicitly loading the teacher model.\r\n4. **Training**\r\n   a. Added Muon-CLIP optimizer support. Training example: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Ftrain\u002Foptimizer\u002Fmuonclip.sh (Thanks to @vx120)\r\n   b. Dependency updates: Compatible with latest dependencies including python3.12, transformers 5.2.0, vllm 0.15.1, trl 0.28, liger-kernel 0.7.0, etc.\r\n   c. Optimized generative reranker lm_head computation to reduce memory usage.\r\n   d. FSDP2 supports CPU offload activation; DeepSpeed elastic support. (Thanks to @meichangsu1)\r\n\r\n### New Models\r\n\r\n1. **Text-only Models**\r\n   a. Qwen\u002FQwen3-Coder-Next\r\n   b. ZhipuAI\u002FGLM-4.7-Flash, ZhipuAI\u002FGLM-5\r\n   c. MiniMaxAI\u002FMiniMax-M2.1\r\n   d. Tencent-YouTu-Research\u002FYoutu-LLM-2B\r\n   e. IQuestLab\u002FIQuest-Coder-V1-40B-Instruct\r\n   f. allenai\u002FOLMoE-1B-7B-0924-Instruct series (Thanks to @qianhao0713)\r\n2. **Multi-modal Models**\r\n   a. Qwen\u002FQwen3.5-35B-A3B, Qwen\u002FQwen3.5-9B series. Training scripts: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fqwen3_5\r\n   b. Qwen3-VL-Embedding, Qwen3-VL-Reranker. Training scripts: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fembedding\u002Fqwen3, https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Freranker\u002Fqwen3\r\n   c. deepseek-ai\u002FDeepSeek-OCR-2\r\n   d. ZhipuAI\u002FGLM-OCR\r\n   e. PaddlePaddle\u002FPaddleOCR-VL-1.5\r\n   f. OpenBMB\u002FMiniCPM-o-4_5\r\n   g. stepfun-ai\u002FStep3-VL-10B\r\n   h. google\u002Fmedgemma-4b-i","2026-03-03T08:25:57",{"id":197,"version":198,"summary_zh":199,"released_at":200},99979,"v3.12.6","## What's Changed\r\n* [bugfix] fix grpo move_model_batches by @hjh0119 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F8091\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv3.12.5...v3.12.6\r\n","2026-02-28T01:46:03",{"id":202,"version":203,"summary_zh":204,"released_at":205},99980,"v3.12.5","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv3.12.4...v3.12.5\r\n","2026-02-14T10:10:49",{"id":207,"version":208,"summary_zh":209,"released_at":210},99981,"v3.12.4","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv3.12.3...v3.12.4\r\n","2026-02-03T16:44:33",{"id":212,"version":213,"summary_zh":214,"released_at":215},99982,"v3.12.3","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv3.12.2...v3.12.3\r\n","2026-01-24T06:19:29",{"id":217,"version":218,"summary_zh":219,"released_at":220},99983,"v3.12.2","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv3.12.1...v3.12.2","2026-01-17T07:20:11",{"id":222,"version":223,"summary_zh":224,"released_at":225},99984,"v3.12.1","## What's Changed\r\n* [bugfix] fix glm4_7 agent_template by @Jintao-Huang in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7256\r\n* [bugfix] fix DeepSeek-OCR vllm deploy by @hjh0119 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7258\r\n* [feat] add async reward function support for GRPO training by @hjh0119 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7252\r\n* [model] support medgemma by @slin000111 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7261\r\n* [megatron] Support MiniMaxAI\u002FMiniMax-M2.1 by @Jintao-Huang in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7262\r\n* Support muonclip optimizer by @vx120 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7191\r\n* add task_type by @slin000111 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7265\r\n* [bugfix] fix mtp save by @Jintao-Huang in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7267\r\n* [feat] support megatron grpo entropy mask & log by @hjh0119 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7263\r\n* [model] support iquestcoder by @Jintao-Huang in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7271\r\n* [bugfix] fix reward model adapters  by @hjh0119 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7293\r\n* Fix the issue of repeated inference in multi-turn scheduler. by @Simon-ss7 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7279\r\n* [bugfix] auto-enable async engine for vLLM encode tasks by @hjh0119 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7301\r\n* [bugfix] fix vllm_engine load_format by @Jintao-Huang in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7302\r\n* fix npu megatron cp by @addsubmuldiv in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7299\r\n* [misc] Remove unnecessary clone operations during weight synchronization by @hjh0119 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7308\r\n* [model] support youtu-llm by @hjh0119 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7306\r\n* [megatron] fix gpt_bridge oom by @Jintao-Huang in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7310\r\n* [misc] fix youtu agent template type-checking by @hjh0119 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7311\r\n* [bugfix] Fix duplicate 'load_format' argument being passed in rollout by @hjh0119 in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7312\r\n\r\n## New Contributors\r\n* @Simon-ss7 made their first contribution in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7279\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv3.12.0...v3.12.1","2026-01-08T02:29:08",{"id":227,"version":228,"summary_zh":229,"released_at":230},99985,"v3.12.0","## 中文版\r\n\r\n### 新特性\r\n1. **Megatron-SWIFT**\r\n   a. GKD算法支持Megatron训练，文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FMegatron-SWIFT\u002FGKD.html\r\n   b. 新模型支持：GLM4 Dense; GLM4.7; GLM4.6v-Flash, GLM-4.1V。\r\n   c. `save_safetensors` 支持断点续训，将 Mcore-Bridge 加载和存储方式作为推荐方式。\r\n   d. 非 padding-free 训练模式支持更多训练阶段：GRPO\u002FDPO\u002FKTO\u002FRM\u002F序列分类。\r\n   e. group_by_length 参数支持，将数据集长度大致相同的样本分组在一起（含随机因素），加速非packing模式下训练速度。\r\n   f. 支持 `--report_to` 参数，将训练日志在 wandb\u002Fswanlab 中记录并可视化。\r\n   g. Qwen3-Next 使用 Zero-Centered RMSNorm，与 transformers 对齐。\r\n   h. `train_dataloader_shuffle` 参数支持，控制训练数据集是否随机。\r\n   i. template.encode 新增重试机制，避免 megatron 训练因网络问题获取图片\u002F视频报错而卡住。\r\n2. **RL**\r\n   a. 增加 Off-Policy Sequence Masking (from DeepSeek-V3.2)，文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002Ftraining_inference_mismatch.html#off-policy-sequence-masking\r\n   b. GRPO 增加参数 num_generations_eval 设置 eval 阶段的生成数量。\r\n   c. 优化 GKD loss 计算的显存峰值。\r\n   d. GRPO\u002FGKD server mode 支持使用 ipv6 地址。\r\n   e. 支持使用 structured_outputs_regex 进行结构化输出采样。\r\n3. **训练**\r\n   a. embedding\u002Freranker\u002F序列分类任务支持序列 packing 和序列并行。训练脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fsequence_parallel\r\n   b. 支持 `--fsdp fsdp2` 使用 ms-swift 内置的 FSDP2 配置文件。\r\n   c. loss_scale 支持3种基本策略：'default'、'last_round'、'all'与其他策略的混合使用，例如：'last_round+ignore_empty_think'。\r\n   d. cached_dataset 支持 embedding\u002Freranker\u002F序列分类训练任务，训练脚本参考https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fcached_dataset\r\n   e. thinking template 重构，ThinkingTemplate 功能合入 Template，新增`enable_thinking`, `add_non_thinking_prefix`参数。\r\n   f. 新增 `SWIFT_PATCH_CONV3D` 环境变量，避免 torch2.9 环境跑 conv3d 缓慢的问题。\r\n   g. 支持 `swanlab_notification_method` 参数，在训练完成\u002F发生错误时，指定 swanlab 通知方式。\r\n   h. `dataloader_prefetch_factor` 参数默认值从10修改为2。\r\n4. **国产化硬件**（感谢昇腾和招商银行技术团队的贡献）\r\n   a. 新增更多训练脚本：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fascend\r\n   b. Qwen3-VL 混合算子支持，具体查看这个PR：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F7079\r\n   c. 更新 Megatron-SWIFT NPU 性能采集\u002F精度采集相关文档，参考这里：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FMegatron-SWIFT\u002FAscend.html\r\n\r\n\r\n### 新模型\r\n1. 纯文本模型：\r\n   a. ZhipuAI\u002FGLM-4.7系列\r\n   b. iic\u002FQwenLong-L1.5-30B-A3B\r\n   c. gongjy\u002FMiniMind2 （感谢 @PiggerZZM 的贡献）\r\n2. 多模态模型：\r\n   a. ZhipuAI\u002FGLM-4.6V; ZhipuAI\u002FGLM-4.6V-Flash系列\r\n   b. Tencent-Hunyuan\u002FHunyuanOCR\r\n\r\n## English Version\r\n\r\n### New Features\r\n\r\n1. **Megatron-SWIFT**\r\n   a. GKD algorithm supports Megatron training. Documentation reference: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FMegatron-SWIFT\u002FGKD.html\r\n   b. New model support: GLM4 Dense; GLM4.7; GLM4.6v-Flash, GLM-4.1V.\r\n   c. `save_safetensors` supports checkpoint resumption, with Mcore-Bridge loading and storage method as the recommended approach.\r\n   d. Non-padding-free training mode supports more training stages: GRPO\u002FDPO\u002FKTO\u002FRM\u002Fsequence classification.\r\n   e. `group_by_length` parameter support, grouping samples with similar lengths in the dataset together (with random factors) to accelerate training speed in non-packing mode.\r\n   f. Support for `--report_to` parameter to log and visualize training logs in wandb\u002Fswanlab.\r\n   g. Qwen3-Next uses Zero-Centered RMSNorm, aligned with transformers.\r\n   h. `train_dataloader_shuffle` parameter support to control whether training dataset is shuffled.\r\n   i. Added retry mechanism to template.encode to prevent megatron training from freezing due to network issues when fetching images\u002Fvideos.\r\n2. **RL**\r\n   a. Added Off-Policy Sequence Masking (from DeepSeek-V3.2). Documentation reference: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002Ftraining_inference_mismatch.html#off-policy-sequence-masking\r\n   b. GRPO adds `num_generations_eval` parameter to set the number of generations during eval stage.\r\n   c. Optimized memory peak for GKD loss calculation.\r\n   d. GRPO\u002FGKD server mode supports using ipv6 addresses.\r\n   e. Support for structured output sampling using `structured_outputs_regex`.\r\n3. **Training**\r\n   a. Embedding\u002Freranker\u002Fsequence classification tasks support sequence packing and sequence parallelism. Training script reference: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fsequence_parallel\r\n   b. Support for `--fsdp fsdp2` to use ms-swift built-in FSDP2 configuration file.\r\n   c. `loss_scale` supports 3 basic strategies: 'default', 'last_round', 'all' and their hybrid use with other strategies, e.g., 'last_round+ignore_empty_think'.\r\n   d. `cached_dataset` supports embedding\u002Freranker\u002Fsequence classification training tasks. Training script reference: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fcached_dataset\r\n   e. Thinking template refactored, ThinkingTemplate functionality merged into Template, added `enable_thinking` and `add_non_thinking_prefix` parameters.\r\n   f. Added `SWIFT_PATCH_CONV3D` environment variable to avoid slow conv3d execution in torch2.9 environment.\r\n   g. Support for `swanlab_notification_method` parameter to specify swanlab notification method when training completes\u002Ferrors occur.\r\n ","2025-12-30T03:24:43",{"id":232,"version":233,"summary_zh":234,"released_at":235},99986,"v3.11.3","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv3.11.2...v3.11.3\r\n","2025-12-28T12:54:28",{"id":237,"version":238,"summary_zh":239,"released_at":240},99987,"v3.11.2","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv3.11.1...v3.11.2","2025-12-21T02:59:12",{"id":242,"version":243,"summary_zh":244,"released_at":245},99988,"v3.11.1","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv3.11.0...v3.11.1\r\n","2025-12-15T01:10:38",{"id":247,"version":248,"summary_zh":249,"released_at":250},99989,"v3.11.0","## 中文版\r\n\r\n### 新特性\r\n1. **Megatron-SWIFT**\r\n    a. **支持 GRPO Megatron 训练**，训练文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FMegatron-SWIFT\u002FGRPO.html\r\n    b. **FP8 blockwise 训练支持**，支持FP8加载和导出权重，训练脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Ffp8\r\n    c. **MTP 训练支持**，训练脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Fmegatron\u002Flora\u002Fmtp.sh\r\n    d. 新模型支持：GPT-OSS，Llama4，InternVL3.5-GPT-OSS等。\r\n    e. 支持 `--save_strategy epoch` 策略存储模型。\r\n    f. 兼容 megaron-core 0.12-0.15 版本。\r\n2. **RL**\r\n    a. **新算法 SAPO 支持**，文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FSAPO.html\r\n    b. **新算法 CISPO 支持**，文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FCISPO.html\r\n    c. **缓解训推不一致的算法支持**，包括 TIS\u002FMIS 与 rollout off-policy metrics 记录，文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002Ftraining_inference_mismatch.html\r\n    d. tree-rollout 支持，文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002Ftreepo.html （感谢招商银行团队 @li2zhi 的贡献）\r\n    e. gkd 训练支持使用 liger_kernel loss（`--use_liger_kernel true`）。\r\n    f. 新增 GRPO loss_type，文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FInstruction\u002FGRPO\u002FDeveloperGuide\u002Floss_types.html\r\n3. **训练**\r\n    a. cached dataset 重构，更好支持大型数据集离线 tokenize 场景，脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fcached_dataset\r\n    b. 预训练场景 `--truncation_strategy split` 策略支持，将长文本切成多条数据样本避免 tokens 浪费。\r\n    c. `packing_num_proc` 参数支持。\r\n    d. Qwen2.5-VL系列模型兼容使用 \"qwen_vl_utils>=0.14\"。\r\n    e. MFU 日志插件支持。(感谢 @y2logic 的贡献)\r\n4. **国产化硬件**（感谢昇腾和招商银行技术团队的贡献）\r\n    a. **Megatron-SWIFT 支持昇腾 NPU**，文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FBestPractices\u002FNPU-support.html\r\n    b. 昇腾NPU混合算子支持 Qwen2、Qwen3、Qwen3-MoE 系列模型，加速训练过程。\r\n\r\n### 新模型\r\n1. 纯文本模型：\r\n    a. moonshotai\u002FKimi-K2-Thinking\r\n2. 多模态模型：\r\n    a. SenseNova\u002FSenseNova-SI-InternVL3-2B系列\r\n    b. mistralai\u002FMinistral-3-3B-Instruct-2512系列\r\n    c. mistralai\u002FMistral-Small-3.2-24B-Instruct-2506\r\n\r\n## English Version\r\n\r\n### New Features\r\n1. **Megatron-SWIFT**\r\n    a. **GRPO training support on Megatron**, documentation: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FMegatron-SWIFT\u002FGRPO.html\r\n    b. **FP8 blockwise training support**, including FP8 weight loading and exporting. Training scripts: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Ffp8\r\n    c. **MTP training support**, training script: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Fmegatron\u002Flora\u002Fmtp.sh\r\n    d. New model support: GPT-OSS, Llama4, InternVL3.5-GPT-OSS, etc.\r\n    e. Support for saving strategy `--save_strategy epoch`.\r\n    f. Compatible with megaron-core versions 0.12–0.15.\r\n2. **RL**\r\n    a. **New algorithm SAPO supported**, documentation: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FSAPO.html\r\n    b. **New algorithm CISPO supported**, documentation: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FCISPO.html\r\n    c. **Algorithms for mitigating training–inference mismatch**, including TIS\u002FMIS and rollout off-policy metrics. Docs: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002Ftraining_inference_mismatch.html\r\n    d. Tree-rollout support, docs: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002Ftreepo.html (Thanks to CMB team @li2zhi for the contribution)\r\n    e. GKD training supports liger_kernel loss (`--use_liger_kernel true`).\r\n    f. New GRPO loss types added, docs: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FDeveloperGuide\u002Floss_types.html\r\n3. **Training**\r\n    a. Cached dataset refactoring for better offline tokenization of large datasets. Scripts: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fcached_dataset\r\n    b. Pretraining `--truncation_strategy split` support, splitting long text into multiple samples to avoid token waste.\r\n    c. Added `packing_num_proc` parameter support.\r\n    d. Qwen2.5-VL series models compatible with \"qwen_vl_utils>=0.14\".\r\n    e. MFU logging plugin support (Thanks to @y2logic).\r\n4. **Domestic Hardware Support** (Thanks to Ascend and CMB technical teams)\r\n    a. **Megatron-SWIFT supports Ascend NPU**, documentation: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FBestPractices\u002FNPU-support.html\r\n    b. Ascend NPU mixed operators support Qwen2, Qwen3, Qwen3-MoE series models, accelerating training.\r\n\r\n### New Models\r\n1. Text-only models:\r\n    a. moonshotai\u002FKimi-K2-Thinking\r\n2. Multimodal models:\r\n    a. SenseNova\u002FSenseNova-SI-InternVL3-2B series\r\n    b. mistralai\u002FMinistral-3-3B-Instruct-2512 series\r\n    c. mistralai\u002FMistral-Small-3.2-24B-Instruct-2506\r\n\r\n## What's Changed\r\n* bump version 3.11.0.dev by @Jintao-Huang in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F6560\r\n* [model] support Kimi-K2 by @Jintao-Huang in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F6562\r\n* [bugfix] fix pp vit_lr by @Jintao-Huang in https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fpull\u002F6565\r\n* [bugfix] fix tools ","2025-12-09T02:44:58",{"id":252,"version":253,"summary_zh":254,"released_at":255},99990,"v3.10.3","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv3.10.2...v3.10.3\r\n","2025-11-30T06:35:16",{"id":257,"version":258,"summary_zh":259,"released_at":260},99991,"v3.10.2","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv3.10.1...v3.10.2","2025-11-23T09:58:22",{"id":262,"version":263,"summary_zh":264,"released_at":265},99992,"v3.10.1","**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fcompare\u002Fv3.10.0...v3.10.1\r\n","2025-11-16T16:50:12",{"id":267,"version":268,"summary_zh":269,"released_at":270},99993,"v3.10.0","## 中文版\r\n\r\n### 新特性\r\n\r\n1. **Megatron-SWIFT**\r\n    a. **Mcore-Bridge发布**。支持直接加载和存储 safetensors 格式的模型权重；支持LoRA增量权重双向转换；支持多机转换。文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FMegatron-SWIFT\u002FMcore-Bridge.html 。训练脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Fmcore_bridge\r\n    b. megatron-core 版本升级至0.14.0。\r\n    c. 多模态模型训练新增 `vit_lr` 和 `aligner_lr` 参数支持。\r\n    d. 新增存储优化参数：async_save, save_retain_interval等。\r\n    e. 支持batched mrope，加速Qwen3-VL、Qwen2.5-VL等模型的训练速度。\r\n2. **RL**\r\n    a. GRPO LoRA 训练权重同步速度优化，具体参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FInstruction\u002FGRPO\u002FGetStarted\u002FGRPO.html#id3\r\n    b. GRPO 训练显存优化以降低峰值显存占用。\r\n    c. RLVR 新算法支持：**RLOO**，文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FRLOO.html 。**REINFORCE++** Baseline，文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FREINFORCEPP.html\r\n    d. **GKD 支持使用 vLLM 加速策略模型rollout**，并新增参数teacher_deepspeed额外控制教师模型分片策略。文档参考：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FInstruction\u002FGKD.html\r\n    e. GSPO 支持使用liger_kernel减少显存使用。\r\n3. **训练**\r\n    a. **PT\u002FSFT\u002F采样\u002F数据蒸馏中支持了RAY**，具体参考文档：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FInstruction\u002FRay.html\r\n    b. Qwen3-VL、Qwen3-Omni支持混合模态数据训练；Qwen3-VL支持ulysses序列并行。训练脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fqwen3_vl\r\n    c. 支持 yaml 方式配置训练参数，脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fyaml\r\n    d. 新增 FSDP2 训练启动案例，脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fmulti-gpu\u002Ffsdp2_lora\r\n    e. 新增自定义多模态模型注册最佳实践：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FBestPractices\u002FMLLM-Registration.html\r\n    f. embedding 训练中的 InfoNCE 损失与 Qwen3-Embedding 论文描述对齐。具体参考文档：https:\u002F\u002Fswift.readthedocs.io\u002Fzh-cn\u002Flatest\u002FBestPractices\u002FEmbedding.html\r\n    g. 新增多标签分类训练案例，脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fseq_cls\u002Fmulti_label\r\n    h. agent_template 支持 seed-oss。感谢@hpsun1109的贡献。\r\n4. **全链路**\r\n    a. `swift export`支持 **GPTQ-v2** 量化，脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Fblob\u002Fmain\u002Fexamples\u002Fexport\u002Fquantize\u002Fgptq_v2.sh 。感谢@zzc0430的贡献。\r\n    b. `swift deploy` vllm推理后端支持 DP 部署，使用`--vllm_data_parallel_size`参数。感谢@YushunXiang 的贡献。\r\n    c. `swift deploy` 新增 health\u002Fping endpoints。\r\n    d. vLLM 部署新增参数  `vllm_mm_processor_cache_gb`\u002F`vllm_engine_kwargs`。\r\n\r\n\r\n### 新模型\r\n1. 纯文本模型：\r\n    a. Qwen\u002FQwen3Guard-Gen-0.6B系列\r\n    b. MiniMax\u002FMiniMax-M2\r\n2. 多模态模型：\r\n    a. Qwen\u002FQwen3-VL-2B-Instruct系列\r\n    b. deepseek-ai\u002FDeepSeek-OCR，训练脚本参考：https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fdeepseek_ocr\r\n    c. PaddlePaddle\u002FPaddleOCR-VL\r\n    d. ZhipuAI\u002FGlyph\r\n    e. PaddlePaddle\u002FERNIE-4.5-VL-28B-A3B-Thinking系列\r\n    f. lmms-lab\u002FLLaVA-OneVision-1.5-4B-Instruct系列\r\n\r\n\r\n\r\n## English Version\r\n\r\n### New Features\r\n\r\n1. **Megatron-SWIFT**\r\n   a. **Mcore-Bridge Release**. Supports direct loading and saving of model weights in safetensors format; supports bidirectional conversion of LoRA incremental weights; supports multi-node conversion. Documentation: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FMegatron-SWIFT\u002FMcore-Bridge.html. Training scripts: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmegatron\u002Fmcore_bridge\r\n   b. Upgraded megatron-core version to 0.14.0.\r\n   c. Added `vit_lr` and `aligner_lr` parameter support for multimodal model training.\r\n   d. Added storage optimization parameters: async_save, save_retain_interval, etc.\r\n   e. Support for batched mrope to accelerate training speed of Qwen3-VL, Qwen2.5-VL, and other models.\r\n2. **RL**\r\n   a. GRPO LoRA training weight synchronization speed optimization. Details: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FGetStarted\u002FGRPO.html#memory-optimization-solutions-in-colocate-mode\r\n   b. GRPO training memory optimization to reduce peak memory consumption.\r\n   c. New RLVR algorithm support: **RLOO**, documentation: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FRLOO.html. **REINFORCE++** Baseline, documentation: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGRPO\u002FAdvancedResearch\u002FREINFORCEPP.html\r\n   d. **GKD supports using vLLM to accelerate policy model rollout**, with new parameter teacher_deepspeed for additional control of teacher model sharding strategy. Documentation: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FGKD.html\r\n   e. GSPO supports using liger_kernel to reduce memory usage.\r\n3. **Training**\r\n   a. **RAY support added for PT\u002FSFT\u002FSampling\u002FData Distillation**, documentation: https:\u002F\u002Fswift.readthedocs.io\u002Fen\u002Flatest\u002FInstruction\u002FRay.html\r\n   b. Qwen3-VL and Qwen3-Omni support mixed modality data training; Qwen3-VL supports Ulysses sequence parallelism. Training scripts: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fmodels\u002Fqwen3_vl\r\n   c. Support for YAML-based training parameter configuration, scripts: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Fyaml\r\n   d. Added FSDP2 training launch example, scripts: https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift\u002Ftree\u002Fmain\u002Fexamples\u002Ftrain\u002Fmulti-gpu\u002Ffsd","2025-11-11T12:14:09"]