[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-axolotl-ai-cloud--axolotl":3,"tool-axolotl-ai-cloud--axolotl":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":80,"owner_website":81,"owner_url":82,"languages":83,"stars":103,"forks":104,"last_commit_at":105,"license":106,"difficulty_score":10,"env_os":107,"env_gpu":108,"env_ram":109,"env_deps":110,"category_tags":121,"github_topics":122,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":125,"updated_at":126,"faqs":127,"releases":156},2918,"axolotl-ai-cloud\u002Faxolotl","axolotl","Go ahead and axolotl questions","Axolotl 是一个免费且开源的大语言模型（LLM）微调框架，旨在让开发者能够轻松、高效地定制属于自己的 AI 模型。它主要解决了大模型微调过程中配置复杂、环境搭建困难以及显存资源消耗过大等痛点，通过统一的接口简化了从数据准备到模型训练的全流程。\n\n这款工具特别适合 AI 研究人员、机器学习工程师以及希望深入探索大模型潜力的开发者使用。无论是想要复现前沿论文算法，还是希望基于特定领域数据训练专用模型，Axolotl 都能提供强大的支持。\n\n在技术亮点方面，Axolotl 紧跟社区前沿，不仅支持 Mistral、Qwen、GLM 等最新主流模型架构，还引入了多项创新优化技术。例如，它支持针对混合专家模型（MoE）的专家量化和 ScatterMoE LoRA 技术，能显著降低训练时的显存占用；同时集成了 SageAttention、GDPO（广义直接偏好优化）以及 EAFT 等先进算法，有效提升了长上下文处理能力和训练效率。凭借其对多 GPU 训练的友好支持和活跃的社区生态，Axolotl 已成为当前大模型微调领域值得信赖的得力助手。","\u003Cp align=\"center\">\n    \u003Cpicture>\n        \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002F887513285d98132142bf5db2a74eb5e0928787f1\u002Fimage\u002Faxolotl_logo_digital_white.svg\">\n        \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002F887513285d98132142bf5db2a74eb5e0928787f1\u002Fimage\u002Faxolotl_logo_digital_black.svg\">\n        \u003Cimg alt=\"Axolotl\" src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002F887513285d98132142bf5db2a74eb5e0928787f1\u002Fimage\u002Faxolotl_logo_digital_black.svg\" width=\"400\" height=\"104\" style=\"max-width: 100%;\">\n    \u003C\u002Fpicture>\n\u003C\u002Fp>\n  \u003Cp align=\"center\">\n      \u003Cstrong>A Free and Open Source LLM Fine-tuning Framework\u003C\u002Fstrong>\u003Cbr>\n  \u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Faxolotl-ai-cloud\u002Faxolotl.svg?color=blue\" alt=\"GitHub License\">\n    \u003Cimg 
src=\"https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Factions\u002Fworkflows\u002Ftests.yml\u002Fbadge.svg\" alt=\"tests\">\n    \u003Ca href=\"https:\u002F\u002Fcodecov.io\u002Fgh\u002Faxolotl-ai-cloud\u002Faxolotl\">\u003Cimg src=\"https:\u002F\u002Fcodecov.io\u002Fgh\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fbranch\u002Fmain\u002Fgraph\u002Fbadge.svg\" alt=\"codecov\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Freleases\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Faxolotl-ai-cloud\u002Faxolotl.svg\" alt=\"Releases\">\u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fgraphs\u002Fcontributors\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcontributors-anon\u002Faxolotl-ai-cloud\u002Faxolotl?color=yellow&style=flat-square\" alt=\"contributors\" style=\"height: 20px;\">\u003C\u002Fa>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Faxolotl-ai-cloud\u002Faxolotl\" alt=\"GitHub Repo stars\">\n    \u003Cbr\u002F>\n    \u003Ca href=\"https:\u002F\u002Fdiscord.com\u002Finvite\u002FHhrNrHJPRb\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdiscord-7289da.svg?style=flat-square&logo=discord\" alt=\"discord\" style=\"height: 20px;\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Faxolotl_ai\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Faxolotl_ai?style=social\" alt=\"twitter\" style=\"height: 20px;\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fblob\u002Fmain\u002Fexamples\u002Fcolab-notebooks\u002Fcolab-axolotl-example.ipynb\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"google-colab\" style=\"height: 20px;\">\u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Factions\u002Fworkflows\u002Ftests-nightly.yml\u002Fbadge.svg\" alt=\"tests-nightly\">\n    \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Factions\u002Fworkflows\u002Fmulti-gpu-e2e.yml\u002Fbadge.svg\" alt=\"multigpu-semi-weekly tests\">\n\u003C\u002Fp>\n\n\n## 🎉 Latest Updates\n\n- 2026\u002F03:\n  - New model support has been added in Axolotl for [Mistral Small 4](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fmistral4), [Qwen3.5, Qwen3.5 MoE](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fqwen3.5), [GLM-4.7-Flash](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fglm47-flash), [GLM-4.6V](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fglm46v), and [GLM-4.5-Air](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fglm45).\n  - [MoE expert quantization](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fexpert_quantization.html) support (via `quantize_moe_experts: true`) greatly reduces VRAM when training MoE models (FSDP2 compat).\n- 2026\u002F02:\n  - [ScatterMoE LoRA](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3410) support. 
LoRA fine-tuning directly on MoE expert weights using custom Triton kernels.\n  - Axolotl now has support for [SageAttention](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2823) and [GDPO](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3353) (Generalized DPO).\n- 2026\u002F01:\n  - New integration for [EAFT](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3366) (Entropy-Aware Focal Training), weights loss by entropy of the top-k logit distribution, and [Scalable Softmax](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3338), improves long context in attention.\n- 2025\u002F12:\n  - Axolotl now includes support for [Kimi-Linear](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fkimi-linear.html), [Plano-Orchestrator](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fplano.html), [MiMo](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fmimo.html), [InternVL 3.5](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Finternvl3_5.html), [Olmo3](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Folmo3.html), [Trinity](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Ftrinity.html), and [Ministral3](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fministral3.html).\n  - [Distributed Muon Optimizer](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3264) support has been added for FSDP2 pretraining.\n- 2025\u002F10: New model support has been added in Axolotl for: [Qwen3 Next](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fqwen3-next.html), [Qwen2.5-vl, Qwen3-vl](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fqwen2_5-vl), [Qwen3, Qwen3MoE](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fqwen3.html), [Granite 4](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fgranite4.html), [HunYuan](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fhunyuan.html), [Magistral 2509](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fmagistral\u002Fvision.html), [Apertus](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fapertus.html), and [Seed-OSS](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fseed-oss.html).\n\n\u003Cdetails>\n\n\u003Csummary>Expand older updates\u003C\u002Fsummary>\n\n- 2025\u002F09: Axolotl now has text diffusion training. Read more [here](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fsrc\u002Faxolotl\u002Fintegrations\u002Fdiffusion).\n- 2025\u002F08: QAT has been updated to include NVFP4 support. See [PR](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3107).\n- 2025\u002F07:\n  - ND Parallelism support has been added into Axolotl. Compose Context Parallelism (CP), Tensor Parallelism (TP), and Fully Sharded Data Parallelism (FSDP) within a single node and across multiple nodes. 
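A minimal sketch of composing these in a config (hedged: `tensor_parallel_size` is the YAML key named in the v0.12.0 release notes further down; `context_parallel_size` and the 8-GPU layout are assumptions to verify against the linked docs):\n\n```bash\n# Hedged sketch: append assumed ND-parallel keys to a stock example config, then train.\n# Hypothetical 8-GPU layout: 2-way TP x 2-way CP x 2-way FSDP shard.\ncat >> examples\u002Fllama-3\u002Flora-1b.yml <<'EOF'\ntensor_parallel_size: 2   # key named in the v0.12.0 release notes\ncontext_parallel_size: 2  # assumed key for Context Parallelism\nEOF\naxolotl train examples\u002Fllama-3\u002Flora-1b.yml\n```\n\n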
Check out the [blog post](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Faccelerate-nd-parallel) for more info.\n  - Axolotl adds more models: [GPT-OSS](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fgpt-oss.html), [Gemma 3n](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fgemma3n.html), [Liquid Foundation Model 2 (LFM2)](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002FLiquidAI.html), and [Arcee Foundation Models (AFM)](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Farcee.html).\n  - FP8 finetuning with fp8 gather op is now possible in Axolotl via `torchao`. Get started [here](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmixed_precision.html#sec-fp8)!\n  - [Voxtral](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fvoxtral.html), [Magistral 1.1](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fmagistral.html), and [Devstral](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fdevstral.html) with mistral-common tokenizer support has been integrated in Axolotl!\n  - TiledMLP support for single-GPU to multi-GPU training with DDP, DeepSpeed and FSDP support has been added to support Arctic Long Sequence Training. (ALST). See [examples](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Falst) for using ALST with Axolotl!\n- 2025\u002F06: Magistral with mistral-common tokenizer support has been added to Axolotl. See [docs](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fmagistral.html) to start training your own Magistral models with Axolotl!\n- 2025\u002F05: Quantization Aware Training (QAT) support has been added to Axolotl. Explore the [docs](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fqat.html) to learn more!\n- 2025\u002F04: Llama 4 support has been added in Axolotl. See [docs](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fllama-4.html) to start training your own Llama 4 models with Axolotl's linearized version!\n- 2025\u002F03: Axolotl has implemented Sequence Parallelism (SP) support. Read the [blog](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Faxolotl-ai-co\u002Flong-context-with-sequence-parallelism-in-axolotl) and [docs](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fsequence_parallelism.html) to learn how to scale your context length when fine-tuning.\n- 2025\u002F03: (Beta) Fine-tuning Multimodal models is now supported in Axolotl. Check out the [docs](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmultimodal.html) to fine-tune your own!\n- 2025\u002F02: Axolotl has added LoRA optimizations to reduce memory usage and improve training speed for LoRA and QLoRA in single GPU and multi-GPU training (DDP and DeepSpeed). Jump into the [docs](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Flora_optims.html) to give it a try.\n- 2025\u002F02: Axolotl has added GRPO support. Dive into our [blog](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Faxolotl-ai-co\u002Ftraining-llms-w-interpreter-feedback-wasm) and [GRPO example](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Fgrpo_code) and have some fun!\n- 2025\u002F01: Axolotl has added Reward Modelling \u002F Process Reward Modelling fine-tuning support. 
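As a hedged illustration of the config shape only (the exact schema is in those docs; `reward_model`, the dataset `type`, and the dataset path below are assumptions, not confirmed keys):\n\n```bash\n# Hypothetical reward-model config sketch; verify every key against the RM docs.\ncat > rm-sketch.yml <<'EOF'\nbase_model: NousResearch\u002FMeta-Llama-3.1-8B   # any HF base model\nreward_model: true               # assumed flag enabling reward modelling\ndatasets:\n  - path: org\u002Fpreference-pairs   # hypothetical dataset of chosen\u002Frejected pairs\n    type: bradley_terry          # assumed preference format name\noutput_dir: .\u002Foutputs\u002Frm\nEOF\naxolotl train rm-sketch.yml\n```\n\n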
See [docs](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Freward_modelling.html).\n\n\u003C\u002Fdetails>\n\n## ✨ Overview\n\nAxolotl is a free and open-source tool designed to streamline post-training and fine-tuning for the latest large language models (LLMs).\n\nFeatures:\n\n- **Multiple Model Support**: Train various models like GPT-OSS, LLaMA, Mistral, Mixtral, Pythia, and many more models available on the Hugging Face Hub.\n- **Multimodal Training**: Fine-tune vision-language models (VLMs) including LLaMA-Vision, Qwen2-VL, Pixtral, LLaVA, SmolVLM2, GLM-4.6V, InternVL 3.5, Gemma 3n, and audio models like Voxtral with image, video, and audio support.\n- **Training Methods**: Full fine-tuning, LoRA, QLoRA, GPTQ, QAT, Preference Tuning (DPO, IPO, KTO, ORPO), RL (GRPO, GDPO), and Reward Modelling (RM) \u002F Process Reward Modelling (PRM).\n- **Easy Configuration**: Re-use a single YAML configuration file across the full fine-tuning pipeline: dataset preprocessing, training, evaluation, quantization, and inference.\n- **Performance Optimizations**: [Multipacking](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmultipack.html), [Flash Attention 2\u002F3\u002F4](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fattention.html#flash-attention), [Xformers](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fattention.html#xformers), [Flex Attention](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fattention.html#flex-attention), [SageAttention](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fattention.html#sageattention), [Liger Kernel](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fcustom_integrations.html#liger-kernels), [Cut Cross Entropy](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fcustom_integrations.html#cut-cross-entropy), [ScatterMoE](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fcustom_integrations.html#kernels-integration), [Sequence Parallelism (SP)](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fsequence_parallelism.html), [LoRA optimizations](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Flora_optims.html), [Multi-GPU training (FSDP1, FSDP2, DeepSpeed)](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmulti-gpu.html), [Multi-node training (Torchrun, Ray)](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmulti-node.html), and many more!\n- **Flexible Dataset Handling**: Load from local, HuggingFace, and cloud (S3, Azure, GCP, OCI) datasets.\n- **Cloud Ready**: We ship [Docker images](https:\u002F\u002Fhub.docker.com\u002Fu\u002Faxolotlai) and also [PyPI packages](https:\u002F\u002Fpypi.org\u002Fproject\u002Faxolotl\u002F) for use on cloud platforms and local hardware.\n\n\n\n## 🚀 Quick Start - LLM Fine-tuning in Minutes\n\n**Requirements**:\n\n- NVIDIA GPU (Ampere or newer for `bf16` and Flash Attention) or AMD GPU\n- Python 3.11\n- PyTorch ≥2.9.1\n\n### Google Colab\n\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fblob\u002Fmain\u002Fexamples\u002Fcolab-notebooks\u002Fcolab-axolotl-example.ipynb#scrollTo=msOCO4NRmRLa)\n\n### Installation\n\n#### Using pip\n\n```bash\npip3 install -U packaging==26.0 setuptools==75.8.0 wheel ninja\npip3 install --no-build-isolation axolotl[flash-attn,deepspeed]\n\n# Download example axolotl configs, deepspeed configs\naxolotl fetch examples\naxolotl fetch deepspeed_configs  # OPTIONAL\n```\n\n#### Using Docker\n\nInstalling with Docker can be less error prone than installing in your own 
environment.\n```bash\ndocker run --gpus '\"all\"' --rm -it axolotlai\u002Faxolotl:main-latest\n```\n\nOther installation approaches are described [here](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Finstallation.html).\n\n#### Cloud Providers\n\n\u003Cdetails>\n\n- [RunPod](https:\u002F\u002Frunpod.io\u002Fgsc?template=v2ickqhz9s&ref=6i7fkpdz)\n- [Vast.ai](https:\u002F\u002Fcloud.vast.ai?ref_id=62897&template_id=bdd4a49fa8bce926defc99471864cace&utm_source=github&utm_medium=developer_community&utm_campaign=template_launch_axolotl&utm_content=readme)\n- [PRIME Intellect](https:\u002F\u002Fapp.primeintellect.ai\u002Fdashboard\u002Fcreate-cluster?image=axolotl&location=Cheapest&security=Cheapest&show_spot=true)\n- [Modal](https:\u002F\u002Fwww.modal.com?utm_source=github&utm_medium=github&utm_campaign=axolotl)\n- [Novita](https:\u002F\u002Fnovita.ai\u002Fgpus-console?templateId=311)\n- [JarvisLabs.ai](https:\u002F\u002Fjarvislabs.ai\u002Ftemplates\u002Faxolotl)\n- [Latitude.sh](https:\u002F\u002Flatitude.sh\u002Fblueprint\u002F989e0e79-3bf6-41ea-a46b-1f246e309d5c)\n\n\u003C\u002Fdetails>\n\n### Your First Fine-tune\n\n```bash\n# Fetch axolotl examples\naxolotl fetch examples\n\n# Or, specify a custom path\naxolotl fetch examples --dest path\u002Fto\u002Ffolder\n\n# Train a model using LoRA\naxolotl train examples\u002Fllama-3\u002Flora-1b.yml\n```\n\nThat's it! Check out our [Getting Started Guide](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fgetting-started.html) for a more detailed walkthrough.\n\n\n## 📚 Documentation\n\n- [Installation Options](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Finstallation.html) - Detailed setup instructions for different environments\n- [Configuration Guide](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fconfig-reference.html) - Full configuration options and examples\n- [Dataset Loading](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fdataset_loading.html) - Loading datasets from various sources\n- [Dataset Guide](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fdataset-formats\u002F) - Supported formats and how to use them\n- [Multi-GPU Training](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmulti-gpu.html)\n- [Multi-Node Training](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmulti-node.html)\n- [Multipacking](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmultipack.html)\n- [API Reference](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fapi\u002F) - Auto-generated code documentation\n- [FAQ](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Ffaq.html) - Frequently asked questions\n\n## 🤝 Getting Help\n\n- Join our [Discord community](https:\u002F\u002Fdiscord.gg\u002FHhrNrHJPRb) for support\n- Check out our [Examples](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002F) directory\n- Read our [Debugging Guide](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fdebugging.html)\n- Need dedicated support? Please contact [✉️wing@axolotl.ai](mailto:wing@axolotl.ai) for options\n\n## 🌟 Contributing\n\nContributions are welcome! Please see our [Contributing Guide](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fblob\u002Fmain\u002F.github\u002FCONTRIBUTING.md) for details.\n\n## 📈 Telemetry\n\nAxolotl has opt-out telemetry that helps us understand how the project is being used\nand prioritize improvements. We collect basic system information, model types, and\nerror rates—never personal data or file paths. Telemetry is enabled by default. 
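The opt-out mentioned next is a single environment variable; a concrete session looks like:\n\n```bash\nexport AXOLOTL_DO_NOT_TRACK=1   # the documented opt-out switch\naxolotl train examples\u002Fllama-3\u002Flora-1b.yml   # example config from the Quick Start above\n```\n\n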
To\ndisable it, set AXOLOTL_DO_NOT_TRACK=1. For more details, see our [telemetry documentation](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Ftelemetry.html).\n\n## ❤️ Sponsors\n\nInterested in sponsoring? Contact us at [wing@axolotl.ai](mailto:wing@axolotl.ai)\n\n## 📝 Citing Axolotl\n\nIf you use Axolotl in your research or projects, please cite it as follows:\n\n```bibtex\n@software{axolotl,\n  title = {Axolotl: Open Source LLM Post-Training},\n  author = {{Axolotl maintainers and contributors}},\n  url = {https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl},\n  license = {Apache-2.0},\n  year = {2023}\n}\n```\n\n## 📜 License\n\nThis project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.\n","\u003Cp align=\"center\">\n    \u003Cpicture>\n        \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002F887513285d98132142bf5db2a74eb5e0928787f1\u002Fimage\u002Faxolotl_logo_digital_white.svg\">\n        \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fraw.githubusercontent.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002F887513285d98132142bf5db2a74eb5e0928787f1\u002Fimage\u002Faxolotl_logo_digital_black.svg\">\n        \u003Cimg alt=\"Axolotl\" src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002F887513285d98132142bf5db2a74eb5e0928787f1\u002Fimage\u002Faxolotl_logo_digital_black.svg\" width=\"400\" height=\"104\" style=\"max-width: 100%;\">\n    \u003C\u002Fpicture>\n\u003C\u002Fp>\n  \u003Cp align=\"center\">\n      \u003Cstrong>一个免费且开源的大型语言模型微调框架\u003C\u002Fstrong>\u003Cbr>\n  \u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Faxolotl-ai-cloud\u002Faxolotl.svg?color=blue\" alt=\"GitHub 许可证\">\n    \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Factions\u002Fworkflows\u002Ftests.yml\u002Fbadge.svg\" alt=\"测试\">\n    \u003Ca href=\"https:\u002F\u002Fcodecov.io\u002Fgh\u002Faxolotl-ai-cloud\u002Faxolotl\">\u003Cimg src=\"https:\u002F\u002Fcodecov.io\u002Fgh\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fbranch\u002Fmain\u002Fgraph\u002Fbadge.svg\" alt=\"Codecov\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Freleases\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Faxolotl-ai-cloud\u002Faxolotl.svg\" alt=\"发布版本\">\u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fgraphs\u002Fcontributors\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcontributors-anon\u002Faxolotl-ai-cloud\u002Faxolotl?color=yellow&style=flat-square\" alt=\"贡献者\" style=\"height: 20px;\">\u003C\u002Fa>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Faxolotl-ai-cloud\u002Faxolotl\" alt=\"GitHub 仓库星标数\">\n    \u003Cbr\u002F>\n    \u003Ca href=\"https:\u002F\u002Fdiscord.com\u002Finvite\u002FHhrNrHJPRb\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdiscord-7289da.svg?style=flat-square&logo=discord\" alt=\"Discord\" style=\"height: 20px;\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Faxolotl_ai\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Faxolotl_ai?style=social\" alt=\"Twitter\" style=\"height: 20px;\">\u003C\u002Fa>\n    
\u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fblob\u002Fmain\u002Fexamples\u002Fcolab-notebooks\u002Fcolab-axolotl-example.ipynb\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Google Colab\" style=\"height: 20px;\">\u003C\u002Fa>\n    \u003Cbr\u002F>\n    \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Factions\u002Fworkflows\u002Ftests-nightly.yml\u002Fbadge.svg\" alt=\"夜间测试\">\n    \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Factions\u002Fworkflows\u002Fmulti-gpu-e2e.yml\u002Fbadge.svg\" alt=\"多GPU半周测试\">\n\u003C\u002Fp>\n\n## 🎉 最新更新\n\n- 2026年3月：\n  - Axolotl 新增对以下模型的支持：[Mistral Small 4](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fmistral4)、[Qwen3.5、Qwen3.5 MoE](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fqwen3.5)、[GLM-4.7-Flash](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fglm47-flash)、[GLM-4.6V](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fglm46v) 以及 [GLM-4.5-Air](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fglm45)。\n  - 支持 [MoE 专家量化](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fexpert_quantization.html)（通过 `quantize_moe_experts: true`），在训练 MoE 模型时可大幅降低显存占用（与 FSDP2 兼容）。\n- 2026年2月：\n  - 支持 [ScatterMoE LoRA](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3410)，利用自定义 Triton 内核直接对 MoE 专家权重进行 LoRA 微调。\n  - Axolotl 现已支持 [SageAttention](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2823) 和 [GDPO](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3353)（广义 DPO）。\n- 2026年1月：\n  - 新增对 [EAFT](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3366)（基于熵的焦点训练）的支持，该方法根据 top-k 对数几率分布的熵来调整损失权重；同时新增 [可扩展 Softmax](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3338)，可提升注意力机制在长上下文场景下的表现。\n- 2025年12月：\n  - Axolotl 现已支持 [Kimi-Linear](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fkimi-linear.html)、[Plano-Orchestrator](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fplano.html)、[MiMo](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fmimo.html)、[InternVL 3.5](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Finternvl3_5.html)、[Olmo3](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Folmo3.html)、[Trinity](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Ftrinity.html) 以及 [Ministral3](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fministral3.html)。\n  - 新增对 [分布式缪子优化器](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3264) 的支持，适用于 FSDP2 预训练。\n- 2025年10月：Axolotl 新增对以下模型的支持：[Qwen3 Next](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fqwen3-next.html)、[Qwen2.5-vl、Qwen3-vl](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fqwen2_5-vl)、[Qwen3、Qwen3MoE](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fqwen3.html)、[Granite 
4](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fgranite4.html)、[HunYuan](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fhunyuan.html)、[Magistral 2509](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fmagistral\u002Fvision.html)、[Apertus](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fapertus.html) 以及 [Seed-OSS](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fseed-oss.html)。\n\n\u003Cdetails>\n\n\u003Csummary>展开旧版更新\u003C\u002Fsummary>\n\n- 2025年9月：Axolotl 现已支持文本扩散训练。更多信息请参见 [这里](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fsrc\u002Faxolotl\u002Fintegrations\u002Fdiffusion)。\n- 2025年8月：QAT 已更新，新增 NVFP4 支持。详情请参阅 [PR](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3107)。\n- 2025年7月：\n  - Axolotl 新增 ND 并行支持。可在单节点内及跨多个节点组合使用上下文并行（CP）、张量并行（TP）和全分片数据并行（FSDP）。更多信息请参阅 [博客文章](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Faccelerate-nd-parallel)。\n  - Axolotl 增加了更多模型支持：[GPT-OSS](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fgpt-oss.html)、[Gemma 3n](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fgemma3n.html)、[Liquid Foundation Model 2 (LFM2)](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002FLiquidAI.html) 以及 [Arcee Foundation Models (AFM)](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Farcee.html)。\n  - 通过 `torchao`，Axolotl 现已支持使用 fp8 gather 操作进行 FP8 微调。开始使用请参阅 [这里](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmixed_precision.html#sec-fp8)！\n  - [Voxtral](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fvoxtral.html)、[Magistral 1.1](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fmagistral.html) 以及 [Devstral](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fdevstral.html)，均支持 mistral-common 分词器，现已集成到 Axolotl 中！\n  - 新增 TiledMLP 支持，可用于从单 GPU 到多 GPU 的训练，并兼容 DDP、DeepSpeed 和 FSDP，以支持北极长序列训练（ALST）。有关如何使用 ALST 与 Axolotl 的示例，请参阅 [这里](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Falst)。\n- 2025年6月：Axolotl 新增了支持 mistral-common 分词器的 Magistral 模型。请参阅 [文档](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fmagistral.html)，开始使用 Axolotl 训练您自己的 Magistral 模型吧！\n- 2025年5月：Axolotl 新增了量化感知训练（QAT）支持。探索 [文档](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fqat.html)，了解更多详情！\n- 2025年4月：Axolotl 新增了 Llama 4 的支持。请参阅 [文档](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmodels\u002Fllama-4.html)，使用 Axolotl 的线性化版本开始训练您自己的 Llama 4 模型吧！\n- 2025年3月：Axolotl 实现了序列并行（SP）支持。阅读 [博客](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Faxolotl-ai-co\u002Flong-context-with-sequence-parallelism-in-axolotl) 和 [文档](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fsequence_parallelism.html)，了解如何在微调过程中扩展您的上下文长度。\n- 2025年3月：（Beta 版）Axolotl 现已支持多模态模型的微调。请参阅 [文档](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmultimodal.html)，开始微调您自己的多模态模型吧！\n- 2025年2月：Axolotl 新增了 LoRA 优化功能，可减少内存占用并提升单 GPU 和多 GPU 训练中 LoRA 及 QLoRA 的训练速度（DDP 和 DeepSpeed）。立即查看 [文档](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Flora_optims.html)，尝试一下吧！\n- 2025年2月：Axolotl 新增了 GRPO 支持。深入阅读我们的 [博客](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Faxolotl-ai-co\u002Ftraining-llms-w-interpreter-feedback-wasm) 和 [GRPO 示例](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Fgrpo_code)，尽情体验吧！\n- 2025年1月：Axolotl 新增了奖励建模\u002F过程奖励建模的微调支持。请参阅 [文档](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Freward_modelling.html)。\n\n\u003C\u002Fdetails>\n\n## ✨ 概述\n\nAxolotl 
是一款免费且开源的工具，旨在简化最新大型语言模型（LLMs）的训练后优化和微调流程。\n\n特性：\n\n- **多模型支持**：支持训练多种模型，如 GPT-OSS、LLaMA、Mistral、Mixtral、Pythia 等，以及 Hugging Face Hub 上提供的众多其他模型。\n- **多模态训练**：可对视觉-语言模型（VLMs）进行微调，包括 LLaMA-Vision、Qwen2-VL、Pixtral、LLaVA、SmolVLM2、GLM-4.6V、InternVL 3.5、Gemma 3n 等；同时支持音频模型如 Voxtral，具备图像、视频和音频处理能力。\n- **训练方法**：全量微调、LoRA、QLoRA、GPTQ、QAT、偏好微调（DPO、IPO、KTO、ORPO）、强化学习（GRPO、GDPO）以及奖励建模（RM）\u002F过程奖励建模（PRM）。\n- **简易配置**：可在整个微调流程中复用单一 YAML 配置文件，涵盖数据集预处理、训练、评估、量化和推理等环节。\n- **性能优化**：[Multipacking](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmultipack.html)、[Flash Attention 2\u002F3\u002F4](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fattention.html#flash-attention)、[Xformers](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fattention.html#xformers)、[Flex Attention](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fattention.html#flex-attention)、[SageAttention](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fattention.html#sageattention)、[Liger Kernel](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fcustom_integrations.html#liger-kernels)、[Cut Cross Entropy](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fcustom_integrations.html#cut-cross-entropy)、[ScatterMoE](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fcustom_integrations.html#kernels-integration)、[序列并行（SP）](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fsequence_parallelism.html)、[LoRA 优化](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Flora_optims.html)、[多 GPU 训练（FSDP1、FSDP2、DeepSpeed）](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmulti-gpu.html)、[多节点训练（Torchrun、Ray）](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmulti-node.html)，以及其他众多优化技术！\n- **灵活的数据集处理**：支持从本地、HuggingFace 及云端（S3、Azure、GCP、OCI）加载数据集。\n- **云原生支持**：我们提供 [Docker 镜像](https:\u002F\u002Fhub.docker.com\u002Fu\u002Faxolotlai) 和 [PyPI 包](https:\u002F\u002Fpypi.org\u002Fproject\u002Faxolotl\u002F)，便于在云平台和本地硬件上使用。\n\n\n\n## 🚀 快速入门 - 数分钟内完成 LLM 微调\n\n**要求**：\n\n- NVIDIA GPU（Ampere 或更新架构，以支持 `bf16` 和 Flash Attention）或 AMD GPU\n- Python 3.11\n- PyTorch ≥2.9.1\n\n### Google Colab\n\n[![在 Colab 中打开](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fblob\u002Fmain\u002Fexamples\u002Fcolab-notebooks\u002Fcolab-axolotl-example.ipynb#scrollTo=msOCO4NRmRLa)\n\n### 安装\n\n#### 使用 pip\n\n```bash\npip3 install -U packaging==26.0 setuptools==75.8.0 wheel ninja\npip3 install --no-build-isolation axolotl[flash-attn,deepspeed]\n\n# 下载示例 Axolotl 配置及 DeepSpeed 配置\naxolotl fetch examples\naxolotl fetch deepspeed_configs  # 可选\n```\n\n#### 使用 Docker\n\n通过 Docker 安装通常比在本地环境中安装更不易出错。\n```bash\ndocker run --gpus '\"all\"' --rm -it axolotlai\u002Faxolotl:main-latest\n```\n\n其他安装方式请参阅 [此处](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Finstallation.html)。\n\n#### 云服务提供商\n\n\u003Cdetails>\n\n- [RunPod](https:\u002F\u002Frunpod.io\u002Fgsc?template=v2ickqhz9s&ref=6i7fkpdz)\n- [Vast.ai](https:\u002F\u002Fcloud.vast.ai?ref_id=62897&template_id=bdd4a49fa8bce926defc99471864cace&utm_source=github&utm_medium=developer_community&utm_campaign=template_launch_axolotl&utm_content=readme)\n- [PRIME Intellect](https:\u002F\u002Fapp.primeintellect.ai\u002Fdashboard\u002Fcreate-cluster?image=axolotl&location=Cheapest&security=Cheapest&show_spot=true)\n- [Modal](https:\u002F\u002Fwww.modal.com?utm_source=github&utm_medium=github&utm_campaign=axolotl)\n- [Novita](https:\u002F\u002Fnovita.ai\u002Fgpus-console?templateId=311)\n- 
[JarvisLabs.ai](https:\u002F\u002Fjarvislabs.ai\u002Ftemplates\u002Faxolotl)\n- [Latitude.sh](https:\u002F\u002Flatitude.sh\u002Fblueprint\u002F989e0e79-3bf6-41ea-a46b-1f246e309d5c)\n\n\u003C\u002Fdetails>\n\n### 您的首次微调\n\n```bash\n# 获取 Axolotl 示例配置\naxolotl fetch examples\n\n# 或指定自定义路径\naxolotl fetch examples --dest path\u002Fto\u002Ffolder\n\n# 使用 LoRA 训练模型\naxolotl train examples\u002Fllama-3\u002Flora-1b.yml\n```\n\n就是这样！更多详细步骤请参阅我们的 [入门指南](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fgetting-started.html)。\n\n\n## 📚 文档\n\n- [安装选项](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Finstallation.html) - 针对不同环境的详细设置说明\n- [配置指南](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fconfig-reference.html) - 全面的配置选项与示例\n- [数据集加载](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fdataset_loading.html) - 从各类来源加载数据集\n- [数据集指南](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fdataset-formats\u002F) - 支持的格式及其使用方法\n- [多 GPU 训练](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmulti-gpu.html)\n- [多节点训练](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmulti-node.html)\n- [Multipacking](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmultipack.html)\n- [API 参考](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fapi\u002F) - 自动生成的代码文档\n- [常见问题解答](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Ffaq.html) - 常见问题汇总\n\n## 🤝 获取帮助\n\n- 加入我们的 [Discord 社区](https:\u002F\u002Fdiscord.gg\u002FHhrNrHJPRb) 寻求支持\n- 浏览我们的 [示例目录](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002F)\n- 阅读我们的 [调试指南](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fdebugging.html)\n- 如需专属支持，请联系 [✉️wing@axolotl.ai](mailto:wing@axolotl.ai) 了解相关选项\n\n## 🌟 贡献\n\n欢迎贡献！详情请参阅我们的 [贡献指南](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fblob\u002Fmain\u002F.github\u002FCONTRIBUTING.md)。\n\n## 📈 遥测\n\nAxolotl 提供可选择关闭的遥测功能，用于了解项目使用情况并优先推进改进。我们仅收集基础系统信息、模型类型和错误率——绝不会收集个人数据或文件路径。遥测默认启用。如需禁用，请设置 AXOLOTL_DO_NOT_TRACK=1。更多详情请参阅我们的 [遥测文档](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Ftelemetry.html)。\n\n## ❤️ 赞助商\n\n有意赞助者请联系 [wing@axolotl.ai](mailto:wing@axolotl.ai)\n\n## 📝 引用 Axolotl\n\n如果您在研究或项目中使用 Axolotl，请按以下方式引用：\n\n```bibtex\n@software{axolotl,\n  title = {Axolotl: 开源 LLM 训练后优化工具},\n  author = {{Axolotl 维护者及贡献者}},\n  url = {https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl},\n  license = {Apache-2.0},\n  year = {2023}\n}\n```\n\n## 📜 许可证\n\n本项目采用 Apache 2.0 许可证授权，详情请参阅 [LICENSE](LICENSE) 文件。","# Axolotl 快速上手指南\n\nAxolotl 是一个免费且开源的大语言模型（LLM）微调框架，旨在简化最新模型的后期训练和微调流程。它支持多种模型架构、多模态训练以及丰富的优化策略。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **硬件要求**：\n    *   NVIDIA GPU（推荐 Ampere 架构或更新版本，以支持 `bf16` 精度和 Flash Attention）。\n    *   也支持 AMD GPU。\n*   **软件要求**：\n    *   Python 3.11\n    *   PyTorch ≥ 2.9.1\n*   **系统依赖**：\n    *   建议安装 `ninja` 编译工具以加速构建过程。\n\n> **提示**：国内开发者若遇到网络问题，建议在 pip 安装时配置清华源或阿里源（例如：`pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple ...`）。\n\n## 安装步骤\n\n您可以选择通过 `pip` 直接安装或使用 `Docker` 容器运行。\n\n### 方式一：使用 Pip 安装（推荐）\n\n首先升级基础构建工具，然后安装 Axolotl 及其依赖（包含 Flash Attention 和 DeepSpeed 支持）：\n\n```bash\npip3 install -U packaging==26.0 setuptools==75.8.0 wheel ninja\npip3 install --no-build-isolation axolotl[flash-attn,deepspeed]\n```\n\n安装完成后，获取示例配置文件：\n\n```bash\n# 下载 Axolotl 示例配置\naxolotl fetch examples\n\n# 下载 DeepSpeed 配置（可选）\naxolotl fetch deepspeed_configs\n```\n\n### 方式二：使用 Docker 安装\n\n如果您希望避免环境依赖冲突，可以使用官方 Docker 镜像：\n\n```bash\ndocker run --gpus '\"all\"' --rm -it axolotlai\u002Faxolotl:main-latest\n```\n\n## 基本使用\n\nAxolotl 的核心优势在于通过单一的 
YAML 配置文件驱动整个微调流程（数据预处理、训练、评估、量化及推理）。\n\n### 1. 准备配置文件\n\n从刚才下载的示例中选择一个适合您模型的配置文件（例如 `examples\u002Flora_llama3.yml`），并根据您的需求修改以下关键字段：\n*   `base_model`: 指定预训练模型的路径或 Hugging Face ID。\n*   `dataset`: 指定训练数据集的路径。\n*   `output_dir`: 指定模型保存路径。\n\n### 2. 启动训练\n\n使用 `accelerate` 启动训练任务。以下是最简单的单卡训练命令示例：\n\n```bash\naccelerate launch -m axolotl.cli.train examples\u002Flora_llama3.yml\n```\n\n如果您需要指定具体的 GPU 或使用多卡训练，可以结合 `torchrun` 或配置 `deepspeed` 参数（需在 YAML 中启用）。\n\n### 3. 快速体验 (Google Colab)\n\n如果您没有本地 GPU 环境，可以直接在 Google Colab 中运行官方提供的示例笔记本进行快速体验：\n\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fblob\u002Fmain\u002Fexamples\u002Fcolab-notebooks\u002Fcolab-axolotl-example.ipynb)","某医疗科技公司的算法团队急需将通用的 Qwen3.5 大模型微调为专业的“临床病历结构化助手”，以从非结构化文本中提取关键诊疗信息。\n\n### 没有 axolotl 时\n- **环境配置繁琐**：工程师需手动拼接 DeepSpeed、FSDP 和各类注意力优化库（如 FlashAttention），常因版本冲突导致数天的环境调试。\n- **显存资源浪费**：面对参数量巨大的 MoE 架构模型，缺乏原生的专家量化（Expert Quantization）支持，单卡显存迅速爆满，被迫增加昂贵的多卡集群。\n- **训练策略单一**：难以快速验证前沿算法（如 GDPO 或 ScatterMoE LoRA），自定义修改底层训练循环极易引入 Bug，导致实验迭代周期长达数周。\n- **长上下文失效**：在处理长篇病历时，通用框架缺乏针对长序列的注意力优化（如 Scalable Softmax），导致模型在长文档末尾信息丢失严重。\n\n### 使用 axolotl 后\n- **开箱即用**：通过简单的 YAML 配置文件即可一键启动针对 Qwen3.5 的微调，自动适配最新的 Triton 内核与依赖，环境搭建缩短至小时级。\n- **显存效率倍增**：启用 `quantize_moe_experts` 功能后，直接在混合专家模型上应用量化，显存占用大幅降低，使得在消费级显卡上训练大模型成为可能。\n- **前沿算法无缝集成**：直接调用内置的 ScatterMoE LoRA 和 GDPO 策略，无需修改底层代码，团队能在一天内完成多种高效微调方案的对比验证。\n- **长文本精准捕捉**：利用集成的 Scalable Softmax 技术，模型在处理数千字的复杂病历时，关键信息提取准确率显著提升，彻底解决“长文遗忘”问题。\n\naxolotl 通过将复杂的分布式训练细节封装为简洁配置，让医疗 AI 团队从底层基建中解放，专注于核心业务逻辑的快速落地。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Faxolotl-ai-cloud_axolotl_181e30f8.png","axolotl-ai-cloud","Axolotl AI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Faxolotl-ai-cloud_d570a45d.png","",null,"axolotl_ai","https:\u002F\u002Faxolotl.ai","https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud",[84,88,92,96,100],{"name":85,"color":86,"percentage":87},"Python","#3572A5",97.1,{"name":89,"color":90,"percentage":91},"Jinja","#a52a22",2.5,{"name":93,"color":94,"percentage":95},"Shell","#89e051",0.2,{"name":97,"color":98,"percentage":99},"CSS","#663399",0.1,{"name":101,"color":102,"percentage":99},"Dockerfile","#384d54",11568,1285,"2026-04-03T12:31:54","Apache-2.0","Linux","必需。支持 NVIDIA GPU (Ampere 架构或更新版本以支持 bf16 和 Flash Attention) 或 AMD GPU。未明确具体显存大小，但提及量化技术可减少 VRAM 占用。","未说明",{"notes":111,"python":112,"dependencies":113},"该工具主要面向 LLM 微调和后训练。推荐使用 Docker 安装以减少环境配置错误。NVIDIA 显卡需 Ampere 架构（如 RTX 30 系列、A100 等）或更新版本才能启用 bf16 精度和 Flash Attention 加速。支持多种并行策略（FSDP, DeepSpeed, Sequence Parallelism 等）以适应多卡和多节点训练。","3.11",[114,115,116,117,118,119,120],"torch>=2.9.1","packaging==26.0","setuptools==75.8.0","wheel","ninja","flash-attn","deepspeed",[26,13],[123,124],"fine-tuning","llm","2026-03-27T02:49:30.150509","2026-04-06T05:44:06.887760",[128,133,138,143,148,152],{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},13497,"如何将训练好的模型上传到 Hugging Face Hub？","Axolotl 使用步数（steps）而不是比例来保存模型。虽然理论上在训练停止时会自动保存，但该功能并不总是可靠。建议明确配置保存策略，不要完全依赖自动保存功能。","https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fissues\u002F302",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},13498,"在 Lambda Labs VM 上运行时遇到 PyTorch C 扩展加载失败或 bitsandbytes 未定义符号错误怎么办？","这通常是因为环境配置问题。如果是 PyTorch 导入错误，尝试使用 `python setup.py develop` 而不是 `install` 工作流。对于 bitsandbytes 的 `undefined symbol` 错误（如 `cquantize_blockwise_fp16_nf4`），请确保 bitsandbytes 是专门为 GPU 
编译安装的，并检查是否所有安装步骤都已正确执行。有时需要强制重新安装 tensorflow 和 psutil 等依赖包来解决冲突。","https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fissues\u002F242",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},13499,"使用 DeepSpeed Zero3 进行多卡训练时，为什么显存没有随显卡数量增加而减少（模型未分片）？","如果在多卡环境下显存占用没有降低，可能是配置文件未正确生效。确保在启动命令中同时使用了 `--use_deepspeed` 和 `--deepspeed_config_file` 参数，例如：`accelerate launch --use_deepspeed --deepspeed_config_file deepspeed\u002Fzero3_bf16.json -m axolotl.cli.train ...`。此外，检查 YAML 配置中是否正确设置了 `bf16: true`（或 `fp16: true`）以及 `gradient_checkpointing: true` 以优化显存。","https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fissues\u002F1129",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},13500,"最近版本更新后，为什么在 8xH200 显卡上训练时显存占用大幅下降（仅使用 46GB 而非预期的 143GB）？","这可能是由于新版本中 Axolotl 不再自动注入某些 DeepSpeed 设置，导致批处理大小（batch size）被自动调整到了较低的值。尝试直接在 DeepSpeed 的 JSON 配置文件中明确设置 `micro_batch_size` 和其他相关参数，而不是仅依赖 Axolotl 配置文件。如果问题依旧，检查是否有数据集分割（dataset split）引入的新问题，或者回退到之前的稳定版本（如 0.4.11 构建版）进行对比测试。","https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fissues\u002F2688",{"id":149,"question_zh":150,"answer_zh":151,"source_url":147},13501,"每次重启训练时，为什么会出现长达一小时的 GCC 编译过程（\u002F_inductor\u002Fcompile_worker\u002F）？","这是 PyTorch 2.x 引入的 TorchInductor 编译器行为，它会在首次运行时编译内核。如果在每次重启时都重新编译，可能是因为缓存未被正确保存或读取。确保运行环境的临时目录可写，并且没有频繁清理缓存。如果使用的是旧版本或特定构建版，尝试升级到最新稳定版或检查是否有针对 Inductor 缓存的配置选项被意外禁用。",{"id":153,"question_zh":154,"answer_zh":155,"source_url":147},13502,"启动 Axolotl 时出现 'triton\u002Fautotune' 找不到错误是否正常？","这不是正常行为。该错误通常表明 Triton 库未正确安装或与当前 PyTorch\u002FCUDA 版本不兼容。请尝试重新安装 Triton（`pip install triton`）或确保安装了与您的 GPU 架构匹配的预编译版本。如果问题持续，可能需要检查 Python 环境是否干净，避免不同版本的深度学习库冲突。",[157,162,167,172,177,182,187,192,197,202,207,212,217,222,227,232,237,242,247,252],{"id":158,"version":159,"summary_zh":160,"released_at":161},72285,"v0.16.1","## Axolotl v0.16.1 发行说明\n\n### Gemma 4 支持\n![gemma-4_blog_keyword_header-dark width-2200 format-webp](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F07137cc3-dcd8-4c33-8596-1124bcbf03fd)\n\n\n示例 YAML：https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fblob\u002Fmain\u002Fexamples\u002Fgemma4\u002F26b-a4b-moe-qlora.yaml\n\n* Gemma 4 支持由 @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3574 中实现\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fcompare\u002Fv0.16.0...v0.16.1","2026-04-02T21:47:32",{"id":163,"version":164,"summary_zh":165,"released_at":166},72286,"v0.16.0","# Axolotl v0.16.0 发行说明\n\n我们非常高兴地推出这一新版本。自 v0.15.0（2026年3月6日）以来，我们共进行了约80次新的提交。\n\n---\n\n## 亮点\n\n### **Async GRPO — 异步强化学习训练** (#3486)\n\n全面支持异步分组相对策略优化，并集成 vLLM。包括带有回放缓冲区的异步数据生产者、流式部分批次训练、原生 LoRA 权重同步至 vLLM，以及 FP8 兼容性。通过 FSDP1\u002FFSDP2 和 DeepSpeed ZeRO-3 支持多 GPU 训练。\n\n单步时间最高可提升 **58%**（Qwen2-0.5B 上，1.59秒\u002F步 vs 基线的3.79秒）。\n\n| 优化措施         | 单步时间   | 提升幅度 |\n| ---              | ---        | ---      |\n| 基线             | 3.79秒     | —        |\n| + 批量权重同步   | 2.52秒     | 加速34%  |\n| + Liger 内核融合 | 2.01秒     | 加速47%  |\n| + 流式部分批次   | 1.79秒     | 加速53%  |\n| + 元素分块 + 重投修复（500步） | 1.59秒     | 加速58%  |\n\n### **ScatterMoE + LoRA 融合 Triton 内核** (#3513)\n\n专为训练带有 LoRA 适配器的 MoE 模型设计的自定义融合 Triton 内核。通过将基础专家矩阵乘法和 LoRA 计算合并为一次内核调用，这些内核相比 eager 基线，可实现 **前向传播速度最高提升15倍，激活内存占用减少40倍**。\n\n\u003Cimg width=\"8773\" height=\"5142\" alt=\"scattermoe_speedup-2\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ffc462b43-d488-4a3b-ae9e-7dfdb6312730\" 
\u002F>\n\n\n\u003Cdetails>\n\n\u003Csummary>实现细节\u003C\u002Fsummary>\n\n关键创新包括无原子操作的拆分反向内核（dA\u002FdB 梯度计算速度提升11倍）、自动调优的分块大小、融合 gather 反向操作（在工作负载较小时可消除中间分组 X 缓冲区）、选择性 NF4 反量化（仅对路由到的专家进行反量化——在短序列场景下，当只有少数专家被激活时，每层权重内存占用可减少高达约97%，但随着上下文长度增加，大多数专家都会接收到至少一个标记，因此节省效果会减弱），以及针对 H200\u002FB200 显存压力的参数调优。\n\n\u003C\u002Fdetails>\n\n### **SonicMoE 融合 LoRA** (#3519)\n\n为 SonicMoE（基于 CUTLASS 的 Hopper\u002FBlackwell GPU MoE 内核）提供 LoRA 支持。\n\n\u003Cimg width=\"10058\" height=\"4972\" alt=\"qwen35_moe_comparison-2\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F3a7095c9-d362-4c35-9b44-7cb4bbd3a8ba\" \u002F>\n\n这使得在单张 H100 SXM GPU 上，Qwen3.5-35B-A3B 8位 LoRA 微调的速度最高可提升1.45倍，同时相比 grouped_mm 基线，内存占用减少30%。\n\n### **GRPO 展平与打包** (#3552)\n\n支持 GRPO 训练中的批量展平和样本打包，从而提高 token 效率和训练吞吐量（约10%）。\n\n### **Flash Attention 4 支持** (#3481)\n\n支持 Hopper 和 Blackwell GPU 上的 Flash Attention 4，并根据硬件自动回退到 FA2\u002F3。\n\n\u003Cimg width=\"7104\" height=\"5142\" alt=\"fa2_vs_fa4\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1d6bc8ad-fde9-4904-97c7-960e4363d810\" \u002F>\n\n我们发现，该功能在较长序列长度以及 Blackwell GPU 上的效果更为显著。\n\n### **NeMo Gym 集成** (#3516)\n\n完整集","2026-04-02T14:25:23",{"id":168,"version":169,"summary_zh":170,"released_at":171},72287,"v0.15.0","# Axolotl v0.15.0 发行说明\n\n本次发布带来了新的模型支持、显著的 MoE 改进、基于 Torch 2.10.0 和 `uv` 构建的基础架构更新，以及全面的质量改进修复。\n\n## 🚀 主要变更\n\n### Torch 2.10.0 和 uv 构建\n我们已升级到 **Torch 2.10.0**，并引入了基于 `uv` 的 Docker 构建，以实现更快、更可重复的镜像构建。单元测试中现已使用 Python 3.14。\n- 贡献者：@winglian，相关 PR：#3429、#3430、#3431 和 #3450。\n\n### ScatterMoE LoRA 和 SonicMoE\n新增了对 **ScatterMoE 的 LoRA 支持**，并引入了 **SonicMoE** 作为新的 MoE 内核选项，相比 transformers 中的 `grouped_mm`，它能够实现更快、更节省显存的 MoE 训练。\n- ScatterMoE LoRA：@winglian，PR #3410。\n- SonicMoE：@NanoCode012，PR #3411。\n\n### MoE 专家量化\n新增了对 Transformers v5 中 MoE 专家权重量化的支持，这能大幅降低峰值预留显存。例如，GLM-4.7-Flash QLoRA 的预留显存从 **~127 GiB 降至 ~23 GiB**。可通过设置 `quantize_moe_experts: true` 来启用。详情请参阅 [专家量化文档](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fexpert_quantization.html)。\n- 贡献者：@NanoCode012，PR #3439。\n\n## 🎉 新特性\n- **新模型支持：**\n  - GLM 模型，并附有专用补丁。（@ved1beta，PR #3329）\n  - Step3.5，用于 Cut Cross Entropy。（@NanoCode012，PR #3384）\n  - Qwen3.5 及 Qwen3.5 MoE 模型，支持打包功能。（@ved1beta，PR #3442）\n- **SageAttention：** 添加了 SageAttention 集成，以实现高效的注意力计算。（@NanoCode012，PR #2823）\n- **MXFP4 量化：** 新增对 MXFP4 量化的支持。（@ved1beta，PR #3375）\n- **Hub 版本支持：** 增加了 `hub_revision` 参数，用于在推送检查点时指定分支。（@madScientist10，PR #3387）\n- **点号表示法 CLI 参数：** 支持使用点号表示法来指定嵌套配置选项。（@ManasVardhan，PR #3419）\n- **SFT 样本生成：** 新增对 SFT 训练的样本生成支持。（@ved1beta，PR #3240）\n- **`train_per_sec_per_gpu` 指标：** 新增训练吞吐量指标。（@ved1beta，PR #3364）\n\n## ⚠️ 破坏性变更\n- **`dataset_processes` → `dataset_num_proc`：** 配置字段 `dataset_processes` 已重命名为 `dataset_num_proc`。（@tgoab，PR #3352）\n\n## 🐛 Bug 修复\n- **上下文并行：**\n  - 修复了状态字典保存和评估问题。（@ved1beta，PR #3382）\n  - 更正了 `total_num_steps` 和 `batch_size` 的计算。（@Yatimai，PR #3444）\n- **GRPO：**\n  - 修复了配置无法接受 `max_prompt_length` 的问题。（@NanoCode012，PR #3390）\n  - 将回放逻辑移至 `set_training_kwargs` 中。（@ved1beta，PR #3392）\n- **遥测：**\n  - 移除了遥测警告并改进了日志记录。（@NanoCode012，PR #3397、#3398）\n  - 在非主进程上禁用了遥测功能。（@NanoCode012，PR #3438）\n- **LoRA 核心：** 改进了失败消息提示，并优化了对 `trust_remote_code` 的处理。（@NanoCode012，PR #3378）\n- **生成模式：** 修复了 `add_special_tokens` 的处理问题，并启用了测试模式。","2026-03-06T17:55:44",{"id":173,"version":174,"summary_zh":175,"released_at":176},72288,"v0.14.0","这是一个重大版本发布，标志着我们已迁移到 **Transformers v5**。伴随这一核心依赖的重大升级，我们还为 MoE 模型引入了显著的性能优化，并推出了全新的微调方法。\n\n## 🚀 主要变更\n\n### Transformers v5 升级\n我们将底层的 `transformers` 依赖从 v4 
升级至 **v5**。这是一项长期工作，旨在确保 Axolotl 始终与最新的生态系统进展、稳定性改进以及未来的模型架构保持兼容。\n- 由 @winglian 在 [#3272](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3272) 和 [#3376](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3376) 中贡献。\n\n### 更快的 MoE 训练\n我们新增了通过 transformers 选择 MoE 内核的支持：`batched_mm` 和 `grouped_mm`。同时，我们也加入了针对 `scattermoe` 的自定义集成。这些改进显著 **加速了训练**，并降低了混合专家（MoE）模型的 **显存占用**。\n- 由 @winglian 在 [#3377](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3377) 中贡献。\n\n## 🎉 新特性\n- **EAFT 支持：** 新增对高效适配微调（EAFT）的支持。([#3366](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3366)，由 @salmanmohammadi 贡献)\n- **新的 CCE 支持：** 为 **GLM 4.7 Flash**、**GLM Image**、**GLM 4.6v** 和 **Exaone 4** 新增 Cut Cross Entropy 支持。([#3373](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3373)，由 @NanoCode012 贡献)\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fcompare\u002Fv0.13.2...v0.14.0","2026-01-30T19:10:53",{"id":178,"version":179,"summary_zh":180,"released_at":181},72289,"v0.13.2","# Axolotl v0.13.2 发行说明\n\n这是一个补丁版本，引入了 GDPO 支持，并更新了核心基础设施，包括更现代的 CUDA 默认配置和 Python 版本。\n\n## 🎉 新特性\n*   **GDPO 支持:** 添加了对 GDPO 的支持。([#3353](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3353) 由 @ved1beta 提供)\n\n## 📦 依赖与基础设施更新\n*   **CUDA 12.9.1:** 基础镜像现在默认使用 CUDA 12.9.1。([#3367](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3367) 由 @winglian 提供)\n*   **Python 3.12:** 在基础镜像中添加了对 Python 3.12 的支持。([#3367](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3367) 由 @winglian 提供)\n*   **vLLM 升级:** 将 vLLM 依赖升级至 v0.14.0。([#3345](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3345) 由 @winglian 提供)\n\n## 其他修复\n* 由 @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3365 中进行的版本开发\n* 仅移除以 'v' 开头的字符；例如，不从版本号中的 '.dev' 中移除，由 @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3368 中实现\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fcompare\u002Fv0.13.1...v0.13.2","2026-01-22T15:59:13",{"id":183,"version":184,"summary_zh":185,"released_at":186},72290,"v0.13.1","本次发布新增对 **PyTorch 2.9.1** 的支持，通过引入 **新的实验跟踪工具**（SwanLab 和 Trackio）扩展了我们的生态系统，并增加了对一系列新模型的支持，包括 **Olmo3**、**Ministral 3**、**InternVL 3.5** 和 **Kimi**。此外，我们还对量化工作流和指标记录进行了显著改进。\n\n## 🎉 新特性\n\n### 扩展的模型支持\n我们新增了对更多模型的支持：\n- **Olmo3**：包括 Olmo 和 Olmo2。（[#3275](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3275)，由 @NanoCode012 提供）\n- **Ministral 3**（[#3297](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3297) 和 [#3300](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3300)，由 @NanoCode012 提供）\n- **InternVL 3.5**：（[#3141](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3141)，由 @NanoCode012 提供）\n- **Kimi**：采用实验性训练代码。（[#3257](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3257)，由 @NanoCode012 提供）\n- **Trinity**：由 ArceeAI 提供。（[#3292](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3292)，由 @NanoCode012 提供）\n- **Exaone 4**：（[#3279](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3279)，由 @nayohan 提供）\n- **MiMo & 
Plano**：（[#3332](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3332)，由 @NanoCode012 提供）\n\n### 新的实验跟踪集成\n- **SwanLab**：现在可以使用 SwanLab 进行实验跟踪。（[#3334](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3334)，由 @PraMamba 提供）\n- **Trackio**：新增了 Trackio 验证集成。（[#3253](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3253)，由 @abidlabs 提供）\n\n### 训练与 PEFT 改进\n- **Liger Kernel for DPO**：为 DPO 训练添加了 Liger 支持内核。（[#3302](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3302)，由 @ved1beta 提供）\n- **分布式 Muon**：新增对分布式 Muon 优化器的支持。（[#3264](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3264)，由 @salmanmohammadi 提供）\n- **权重共享安全性**：新增 `peft_ensure_weight_tying`，以确保 PEFT 中参数的正确处理。（[#3278](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3278)，由 @NanoCode012 提供）\n- **适配器数据类型**：新增 `peft_autocast_adapter_dtype` 配置选项，用于更精细地控制适配器的数据类型。（[#3311](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3311)，由 @xzuyn 提供）\n- **低成本 PPL 指标**：一种计算困惑度的新指标，计算成本更低。（[#3317](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3317)，由 @xzuyn 提供）\n- **缩放 Softmax**：通过 `s * log(n) + b` 对 Softmax 计算进行缩放。（[PR #3338](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3338)，由 @ved1beta 提供）\n\n## ⚠️ 已弃用功能与警告\n\n### PyTorch 2.7.1 已弃用\n对 PyTorch 2.7.1 的支持已被弃用。建议升级到更新的支持版本。\n- 由 @winglian 在 [#3339](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3339) 中贡献。\n\n## 🔧 修复与改进\n\n### 量化与 CLI\n- **保存处理器**：量化 CLI 现在能够正确地将处理器一同保存。","2026-01-20T13:59:00",{"id":188,"version":189,"summary_zh":190,"released_at":191},72291,"v0.13.0","本次发布包含多项重大新功能，包括适用于大规模数据集的**流式监督微调（SFT）**、全新的**文本扩散训练插件**，以及在**量化感知训练（QAT）**能力上的重大升级，并新增对**NVFP4**格式的支持。同时，我们非常高兴地宣布，现已支持包括**Gemma3**、**通义千问3系列**、**文心一言**、**磐石4**等在内的众多新型模型。\n\n除了这些亮点功能外，本次版本还增加了对PyTorch 2.9和CUDA 13的支持，对依赖库进行了大幅更新，引入了基于Ruff的新开发工具链，并修复了大量关键问题，以提升训练稳定性与用户体验。\n\n## 🎉 新特性\n\n### 流式监督微调（SFT）\n\n现在，您无需预先处理并将整个数据集加载到内存中，即可对任意规模的数据集进行微调。流式SFT能够实时处理数据，从而显著降低内存占用和启动时间，非常适合大规模训练场景。\n\n- 贡献者：@djsaunde，参见[#3101](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3101)。\n\n### 文本扩散训练插件\n\n借助我们的文本扩散训练插件，您可以探索一种全新的训练范式！该插件允许您基于扩散模型的目标来训练模型，为文本生成和编辑任务开辟了新的可能性。\n\n- 贡献者：@djsaunde，参见[#3067](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3067)，并在[#3191](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3191)中进行了关键修复。\n\n### 升级后的量化感知训练（QAT），支持NVFP4\n\n我们已迁移到全新的QAT API，并增强了`axolotl quantize`命令的功能。此次发布首次引入对**NVFP4**的支持——这是一种全新的4位浮点格式，进一步推动了模型量化与效率的边界。\n\n- 贡献者：@SalmanMohammadi，参见[#3107](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3107)。\n\n### 扩展的模型支持\n\n我们新增了对一系列强大且新兴模型的支持：\n- **通义千问3系列**：`qwen3-next`、`qwen3_vl`和`qwen3_vl_moe`（由@NanoCode012贡献，分别参见[#3150](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3150)和[#3178](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3178)，以及@miketung贡献的[#3183](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3183)）。\n- **磐石**：`Granite 
\n\n### 扩展的模型支持\n\n我们新增了对一系列强大且新兴模型的支持：\n- **通义千问3系列**：`qwen3-next`、`qwen3_vl`和`qwen3_vl_moe`（由@NanoCode012贡献，分别参见[#3150](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3150)和[#3178](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3178)，以及@miketung贡献的[#3183](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3183)）。\n- **磐石**：`Granite 4`示例及MoE变体`granitemoeshared`和`granitemoehybrid`（由@NanoCode012贡献，分别参见[#3256](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3256)和[#3178](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3178)，以及[#3158](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3158)）。\n- **Gemma3**：支持`gemma3_text`的注意力机制处理（由@NanoCode012贡献，参见[#3103](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3103)）。\n- **混元**：新增对Hunyuan v1模型的支持（由@NanoCode012贡献，参见[#3016](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3016)）。\n- **Mistral**：新增`Magistral-small-2509`支持，并原生集成Mistral3分词器（由@NanoCode012贡献，参见[#3165](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3165)）。","2025-12-02T15:00:47",{"id":193,"version":194,"summary_zh":195,"released_at":196},72292,"v0.12.2","## 变更内容\n* @SalmanMohammadi 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3043 中添加了 citation.cff 文件\n* @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3047 中将 monkeypatch 测试放在独立的运行器中执行\n* @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3051 中更新了训练参数检查，以适应新的默认值\n* @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3054 中对插件注册问题进行了后续修复\n* @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3049 中修复了 vllm 标签问题，并添加了不含 tmux 的云镜像\n* 杂项：@github-actions[bot] 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3050 中更新了 pre-commit 钩子\n* @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3052 中移除了 prepare-from-posids 补丁\n* @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3055 中提供了临时解决方案，以解除 main 分支文档构建的阻塞\n* @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3064 中升级了 transformers==4.55.1 和 bitsandbytes==0.47.0\n* @NanoCode012 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3061 中修复了 fsdp_config 验证为 None 的问题\n* @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3066 中使用了更新的 transformers 4.55.2 补丁版本\n* @SalmanMohammadi 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3060 中添加了在 PR 中跳过慢速测试的选项\n* @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3063 中对 VLM 进行了多项修复\n* @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3073 中针对 GPT-OSS 改进了 FSDP 分片合并及文档\n* @ved1beta 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3068 中新增了带有 excess_length_strategy 的截断支持\n* @ved1beta 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3074 中向 VllmserveCliArgs 添加了 data_parallel_size 参数\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fcompare\u002Fv0.12.1...v0.12.2","2025-08-18T14:41:48",{"id":198,"version":199,"summary_zh":200,"released_at":201},72293,"v0.12.1","v0.12.1 是一个补丁版本，用于修复在通过 CLI 使用 Ray Trainer 时出现的回归问题。\n\n## 变更内容\n* 使用 `exec` 替代 `subprocess`，以使 CLI 中的 `Ctrl+C` 操作更加友好，由 @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3044 中实现。\n* 修复 Ray Train 相关问题，并为 Ray Trainer 添加 FSDP2 冒烟测试，由 @winglian 在 
https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3053 中实现。\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fcompare\u002Fv0.12.0...v0.12.1","2025-08-11T13:38:12",{"id":203,"version":204,"summary_zh":205,"released_at":206},72294,"v0.12.0","我们正在推出分布式训练功能的重大升级，包括面向大规模训练的 ND-并行支持、对 DeepSpeed 自动张量并行的支持，以及 FP8 训练。此外，我们也很高兴地宣布支持微调最新的 gpt-oss 模型（以及更多模型！），并带来了一系列修复和依赖项更新。\n\n## 🎉 新特性\n\n### ND-并行：高级并行策略\n我们与 Accelerate 配合，引入了 ND-并行支持，允许您组合上下文并行、张量并行和全分片数据并行等多种并行技术，从而高效地大规模微调大型模型。更多详情请参阅 Hugging Face 官方 [博客文章](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Faccelerate-nd-parallel)！\n\n- 由 @SalmanMohammadi 和 @winglian 在 [#2977](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2977) 和 [#3019](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3019) 中贡献。\n\n### 扩展的模型支持\n我们新增了一批强大的模型支持：\n- GPT-OSS ([#3020](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3020)，@winglian 贡献）—— 使用我们的[示例配置](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fmain\u002Fexamples\u002Fgpt-oss)即可快速上手！\n- **Gemma 3n** ([#2852](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2852)，@NanoCode012 贡献）\n- **Liquid 基础模型 2** ([#2905](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2905)，@winglian 贡献）\n- **Voxtral & Magistral Small 1.1** ([#2979](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2979)，@NanoCode012 贡献）\n- **Devstral** ([#2896](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2896)，@NanoCode012 贡献）\n\n### 使用 `torchao` 的实验性 FP8 混合精度训练\n现在您可以体验实验性的 FP8 混合精度训练！通过利用 `torchao` 库，您可以使用 FP8 数据类型进行训练，并在 FP8 下执行 gather 操作，从而显著节省显存并可能提升训练速度。请参阅[文档](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fmixed_precision.html#sec-fp8)以启用此功能。\n\n- 由 @djsaunde 在 [#2926](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2926) 中贡献。\n\n### 改进的 Slurm 支持\n我们修复了一些在预处理阶段可能导致任务卡死的问题，并提供了一个易于使用的 Slurm 示例，以满足您大型集群的需求。请查看[README 和示例](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fblob\u002F2c8497e489111176adfea5a88d5415ad28d72548\u002Fexamples\u002Fslurm\u002FREADME.md)。\n\n- 由 @winglian 在 https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F3038 中贡献。\n\n### DeepSpeed 自动张量并行（AutoTP）\n现在您可以利用 DeepSpeed 的自动张量并行功能，自动将模型的各层分配到多块 GPU 上。这会大幅降低每块 GPU 的显存需求，使您能够在相同的硬件条件下微调比以往更大规模的模型。只需在 YAML 配置中设置 `tensor_parallel_size: int` 即可启用。\n\n- 贡献者"
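紧接上文，这里给出一个启用 AutoTP 的最小配置草图。仅作示意：`tensor_parallel_size` 即上文给出的配置键，其余为常见的 Axolotl 配置项，模型与取值均为演示用途：

```yaml
base_model: meta-llama/Llama-3.1-70B   # 示例：较大的模型正是 AutoTP 的用武之地
tensor_parallel_size: 4                # 将模型各层自动切分到 4 块 GPU 上
micro_batch_size: 1
sequence_len: 4096
```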
,"2025-08-08T12:25:18",{"id":208,"version":209,"summary_zh":210,"released_at":211},72295,"v0.11.0.post1","## What's Changed\r\nTiledMLP also works on a single GPU and, when enabled with DeepSpeed ZeRO-3, can achieve training at up to a 400k context length.\r\n\r\n* tiled_mlp supports single gpu by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2891\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fcompare\u002Fv0.11.0...v0.11.0.post1","2025-07-09T16:49:28",{"id":213,"version":214,"summary_zh":215,"released_at":216},72296,"v0.11.0","## 🚨 Breaking Changes\r\n\r\n### Upstream Patches for CCE, Phi3, Phi4\r\n\r\nOur Cut-Cross-Entropy (CCE) patches have been moved to a [dedicated upstream fork](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Fml-cross-entropy). This improves maintainability, enables the community to easily contribute new patches, and allows patches to be re-used across projects. This update includes:\r\n\r\n- Updates to support `transformers>=4.52.4`.\r\n- New patches for `phi3` and `phi4_multimodal`.\r\n- All patches have been sanity-tested for reliability.\r\n\r\nPlease make sure to install from our fork instead. We recommend using the provided script in the repo:\r\n\r\n```bash\r\npython scripts\u002Fcutcrossentropy_install.py | sh\r\n```\r\n\r\n- Contributed by @NanoCode012 in [#2813](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2813).\r\n\r\n### Dropped support for PyTorch 2.5.1 and vLLM installation requirements\r\n\r\nAs PyTorch 2.8.0 is slated to be released later this month, we have now dropped support for 2.5.1. We recommend using torch==2.7.0 or 2.7.1. \r\n\r\nDocker images now default to torch 2.7.1 when using `main-latest` tags.\r\n\r\nvLLM is no longer included in Docker images for torch==2.6.0. This is because the vLLM wheels use the incorrect ABI for 2.6.0, and the last version of vLLM to support torch 2.6.0 is 0.8.5.post1. See https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\u002Fissues\u002F13608 for more details. \r\nSimilarly, vLLM is only included in the torch==2.7.0 images, as it is pinned to that particular version; torch 2.7.1 support is still [in review](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\u002Fpull\u002F19507).\r\n\r\n## 🎉 New features\r\n\r\n### Added Chunked cross entropy loss\r\n\r\nWe've introduced `chunked_cross_entropy` as an alternative to the default trainer loss function. This can help reduce peak memory usage during training, especially for models with large vocabularies (see the combined config sketch at the end of these notes).\r\n\r\n- Contributed by @winglian in [#2625](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2625).\r\n\r\n### Added Support for Falcon-h1\r\n\r\nYou can now fine-tune models from the Falcon-h1 family. Run one of the [example configs](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Ftree\u002Fv0.11.0\u002Fexamples\u002Ffalcon-h1).\r\n\r\n- Contributed by @younesbelkada in [#2811](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2811).\r\n\r\n### Added Support for Devstral Small\r\n\r\nIt is now possible to fine-tune Devstral models in Axolotl. Give it a try following our [docs](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fblob\u002Fv0.11.0\u002Fexamples\u002Fdevstral\u002FREADME.md).\r\n\r\n- Contributed by @NanoCode012 in [#2880](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2880).\r\n\r\n### TiledMLP support\r\n\r\nTiledMLP, authored by [Arctic Long Sequence Training](https:\u002F\u002Fwww.snowflake.com\u002Fen\u002Fengineering-blog\u002Farctic-long-sequence-training-multi-million-token-ai\u002F), reduces the activation footprint of long sequences in the MLP modules.\r\n\r\nThis currently only works with DeepSpeed ZeRO-1 through ZeRO-3; single GPU, DDP, and FSDP aren't supported yet. Enable it via `tiled_mlp: true`. Follow the linked PR for more info.\r\n\r\n- Contributed by @winglian in [#2865](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2865).\r\n\r\n### DenseMixer integration\r\n\r\n[DenseMixer](https:\u002F\u002Fgithub.com\u002Fyaof20\u002FDenseMixer\u002F) is a MoE post-training method that improves router gradient estimation in MoE training. 
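As a sketch, integrations like this are typically switched on through Axolotl's `plugins` list; the exact class path below is an assumption about the usual `axolotl.integrations` layout, so defer to the docs linked next for the real name:

```yaml
# Sketch: enable the DenseMixer integration via the plugin system.
# The class path is assumed, not confirmed; check the DenseMixer docs.
plugins:
  - axolotl.integrations.densemixer.DenseMixerPlugin
```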
Read our [docs](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fcustom_integrations.html#densemixer) to learn more.\r\n\r\n- Contributed by @winglian in [#2868](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2868).\r\n\r\n### Flexible Evaluation Sequence Length\r\n\r\nYou can now set a different `eval_sequence_len` in your config. This allows you to train with one sequence length but run evaluations on a longer or shorter one, providing more flexibility for testing model capabilities.\r\n\r\n- Contributed by @winglian in [#2836](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2836).\r\n\r\n### Improved Merge LoRA on CPU for DPO\r\n\r\nThe `--lora-on-cpu` flag now correctly moves LoRA adapters to CPU, even for DPO. This is useful for saving VRAM when merging LoRA adapters on machines with limited GPU memory.\r\n\r\n- Contributed by @kallewoof in [#2766](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2766).\r\n\r\n### Other Feature Enhancements\r\n\r\n- **Log Configuration on Startup:** Axolotl now logs the full, resolved configuration at the start of every run, making it much easier to verify your settings. (by @djsaunde in [#2819](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2819))\r\n- **`chat_template` kwargs:** Restored the ability to pass additional arguments to your chat templates for more flexible formatting. (by @NanoCode012 in [#2837](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2837))\r\n- Support passing Jinja2 template file paths to `chat_template_jinja`, and re-format string templates into files (by @winglian in [#2795](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2795))\r\n\r\n### 📦 Dependency Updates\r\n\r\n- `flash-attn` upgraded to `2.8.0.post2`. (by @winglian in [#2828](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2828))\r\n- `accelerate` upgraded to `1.8.1` and `bitsandbytes` to `0.46.0`.
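To tie these notes together, here is a hedged sketch combining the new options; `tiled_mlp`, `eval_sequence_len`, and `chunked_cross_entropy` are the option names given above, while their values and placement here are illustrative:

```yaml
sequence_len: 16384           # train at 16k tokens
eval_sequence_len: 4096       # evaluate at a shorter length
tiled_mlp: true               # TiledMLP (DeepSpeed ZeRO-1 to ZeRO-3 only, per the notes above)
chunked_cross_entropy: true   # chunked loss to reduce peak memory (illustrative placement)
```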
\r\n\r\n- Check it out here: https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fconfig-reference.html\r\n- Contributed by @djsaunde in [#2718](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2718), [#2806](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2806).\r\n\r\n### 📦 Dependency Updates\r\n\r\n- `transformers` bumped to `4.52.4` ([#2800](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2800))\r\n- `trl` bumped to `0.18.2` ([#2814](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2814))\r\n\r\n## 🔧 Major fixes\r\n\r\n### Chat Template Parsing Fix\r\n\r\nFixed an issue where `{% generation %}` and `{% endgeneration %}` tags in Jinja chat templates were not being correctly ignored, preventing potential formatting errors.\r\n\r\n- Contributed by @nyxkrage in [#2787](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2787).\r\n\r\n### Logging fixes\r\n\r\n- Addressed logging errors on Python 3.10 by @NanoCode012 in [#2802](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2802)\r\n- Fixed handling of logging in distributed states by @djsaunde in [#2808](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2808)\r\n\r\n## Other Improvements\r\n\r\n- update favicon by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2801\r\n- Set dev version by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2807\r\n- fix(doc): address exitcode formatting to help search by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2809\r\n\r\n## New Contributors\r\n\r\n- @nyxkrage made their first contribution in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2787\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fcompare\u002Fv0.10.0...v0.10.1","2025-06-19T15:30:04",{"id":223,"version":224,"summary_zh":225,"released_at":226},72298,"v0.10.0","## 🚨 Breaking Changes & Deprecations\r\n\r\n### **PyTorch 2.5 Future Deprecation Notice**\r\n\r\n- Support for `torch==2.5.1` will be deprecated in a future release. We now recommend using **PyTorch 2.6.0 or higher**.\r\n\r\n### **Internal Refactors**\r\n\r\n- The module for loading models has been refactored from `axolotl.models` to `axolotl.loaders`. This is an internal change and should not affect most users, but may impact those with custom scripts that import directly from these modules. ([#2680](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2680))\r\n\r\n### Updated transformers to `4.52.3`\r\n\r\nTransformers has done a major refactor for vision language models and forward functions for other modeling code. There may be loading issues or patches (Liger \u002F CCE) that do not work with this new version until we fix them.\r\n\r\n- Contributed by @winglian in [#2735](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2735)\r\n\r\n## 🎉 New Features\r\n\r\n### Sparse Finetuning using [LLMCompressor](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fllm-compressor\u002Ftree\u002Fmain)\r\n\r\nUsing LLMCompressor, the integration allows users to efficiently fine-tune models with structured\u002Funstructured sparsity, recovering 99% accuracy or better for sparse models, and 3X faster inference. 
Learn how to use it [here](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fcustom_integrations.html#llmcompressor).\r\n\r\n- Contributed by @rahul-tuli in [#2479](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2479).\r\n\r\n### Quantization-Aware Training (QAT)\r\n\r\nQAT simulates quantization during training to achieve higher quality post-training quantized (PTQ) models than applying PTQ to models trained without QAT. Check the [docs](https:\u002F\u002Fdocs.axolotl.ai\u002Fdocs\u002Fqat.html).\r\n\r\n- Contributed by @SalmanMohammadi in [#2590](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2590) and @winglian in [#2776](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2776).\r\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F97ebe800-a2aa-42be-9d66-38bdfe8ecf6d)\r\n\r\n### Mistral Tokenizer and Improved Tool Calling\r\n\r\n- **Mistral Native Tokenizer:** We now use `mistral-common` for an official, robust implementation of the Mistral tokenizer. (by @NanoCode012 in [#2780](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2780))\r\n- **Enhanced Tool Calling:** Added improved `chat_template` support for a dedicated `tools` column in your dataset, streamlining the training of function-calling models. (by @NanoCode012 in [#2774](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2774))\r\n- **`chat_template` kwargs:** Pass additional arguments to your chat templates for more flexible formatting control. (by @NanoCode012 in [#2694](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2694))\r\n\r\n### Efficient Chunked Knowledge Distillation\r\n\r\nWe've added Liger-style chunking to efficiently calculate Knowledge Distillation (KD) loss and now support online distillation using logprobs from `vllm`\u002F`sglang`. (by @winglian in [#2700](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2700))\r\n\r\n### **📦 Dependency & Build Updates**\r\n\r\n- Added CI and Docker images for CUDA 12.8 (for B200 GPUs). (by @winglian in [#2683](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2683))\r\n- Added base images for PyTorch 2.7.1. (by @winglian in [#2764](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2764), [#2784](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2784))\r\n- Added support for `uv` in base images and test tooling. (by @winglian in [#2691](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2691), [#2750](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2750))\r\n- Removed the `hqq` dependency. (by @NanoCode012 in [#2759](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2759))\r\n\r\n### **📚 Documentation & Examples**\r\n\r\n- Added documentation for Group Relative Policy Optimization (GRPO) for RLHF. (by @mhenrichsen in [#2748](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2748))\r\n- Added new chat templates for Command-R+ and AYA-23 models. (by @hyeobiiii in [#2731](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2731))\r\n- Fixed the `lora_target_modules` syntax in the Qwen2-VL example config. 
(by @cummins-orgs in [#2793](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2793))\r\n\r\n## 🔧 Major fixes\r\n\r\n### **Performance & Stability**\r\n\r\n- **Slow Dataset Processing:** Fixed a performance regression that made dataset processing slow by allowing `num_proc` to be configured. (by @michelyang in [#2681](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2681))\r\n- **Sample Packing:** Limited the number of processes used by the multipack sampler to prevent resource exhaustion and slowdowns. (by @winglian in [#2771](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2771))\r\n- **DeepSpeed:** Fixed issues where distributed state was not initialized correctly ([#2737] by @djsaunde) and where the config was not being set for ZeRO Stage 3 ([#2754] by @NanoCode012).\r\n- **RL Training:** Resolved an issue where RL plugins could overwrite the trainer class ([#2697] by @NanoCode012) and improved feature parity","2025-06-17T16:13:53",{"id":228,"version":229,"summary_zh":230,"released_at":231},72299,"v0.9.2","## 🚨 Breaking Changes\r\n\r\nNo breaking changes in this release! 🎉\r\n\r\n## 🎉 New Features\r\n\r\n### **Activation Checkpointing with Disk Offloading**\r\n\r\nYou can now significantly reduce VRAM usage by offloading activation checkpoints to disk with prefetching. This makes it possible to train larger models or use larger batch sizes on memory-constrained hardware. This is only recommended on resource-constrained systems like Google Colab.\r\n\r\n- Contributed by @winglian in [#2663](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2663).\r\n\r\n### Support for Atropos\r\n\r\n[Atropos](https:\u002F\u002Fgithub.com\u002FNousResearch\u002Fatropos\u002F) is a framework for Reinforcement Learning by NousResearch. The [Atropos plugin for Axolotl](https:\u002F\u002Fgithub.com\u002FNousResearch\u002Fatropos\u002Ftree\u002Fmain?tab=readme-ov-file#axolotl) adds Atropos' RL environments into Axolotl's training pipelines. This allows you to leverage Atropos for reinforcement learning while utilizing Axolotl's extensive features for model fine-tuning.\r\n\r\n- Contributed by @winglian in [#2666](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2666).\r\n\r\n## 🔧 Major Fixes\r\n\r\n- **LoRA Kernel Stability:** The LoRA kernel is now disabled when `lora_dropout` is non-zero instead of being auto-enabled. (by @NanoCode012 in [#2655](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2655))\r\n- **RunPod Stability:** Improved handling of environment variables for RunPod serverless deployments to prevent accidental deletion of secrets. (by @winglian in [#2653](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2653))\r\n- **General Bug Fixes:** Pass `save_only_model` to RL trainer and improve mistral fft tests. (by @winglian in [#2661](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2661))\r\n\r\n### Other Improvements\r\n\r\n- **Documentation:** Updated GRPO documentation for clarity. 
(by @winglian in [#2649](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2649))\r\n\r\n**Full Changelog**: `https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fcompare\u002Fv0.9.1.post1...v0.9.2`","2025-05-13T21:53:31",{"id":233,"version":234,"summary_zh":235,"released_at":236},72300,"v0.9.1.post1","# Bugfix\r\n\r\nThe new multipack implementation was binning too uniformly and in a sorted manner, leading to a loss increase.\r\n\r\n## What's Changed\r\n* don't sort multipack sampler by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2657\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fcompare\u002Fv0.9.1...v0.9.1.post1\r\n","2025-05-10T01:54:28",{"id":238,"version":239,"summary_zh":240,"released_at":241},72301,"v0.9.1","## Efficiency & Performance\r\n\r\nAxolotl delivers demonstrably faster training due to improved packing and kernels, allowing you to accomplish more in less time. Our benchmarks show that we outperform the next fastest trainer by 30% on real-world workloads, which translates directly to increased productivity and faster time-to-results.\r\n\r\n\u003Cimg width=\"406\" alt=\"Screenshot 2025-05-07 at 9 09 58 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7c12ed36-6a6a-4f6e-a6a1-c1e583034d1f\" \u002F>\r\n\r\nAxolotl delivers this performance advantage while consuming less VRAM and eliminating resource spikes throughout your training runs.\r\n\r\n\u003Cimg width=\"533\" alt=\"Screenshot 2025-05-07 at 8 49 15 PM\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F6215783f-434d-498e-9d9a-efb3e2140ce6\" \u002F>\r\n\r\n### Get started on Google Colab\r\n\r\nFine-tune your own Qwen3-14B for free on Google Colab: https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1EscYgLM38dWMcG5IyJz1qHl3VO7a2hZz\r\n\r\n## 🚨 Breaking Changes & Deprecations\r\n\r\n### PyTorch 2.4.1 Support Removed\r\n\r\n- Support for `torch==2.4.1` has been officially removed. Please upgrade to a newer version of PyTorch. (by @winglian in [#2582](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2582))\r\n\r\n## 🎉 New Features\r\n\r\n### Greatly Improved Sample Packing\r\n\r\n- We've implemented an improved Parallel Bin Packing algorithm that achieves **~99% packing efficiency** on most datasets. This can improve your workload throughput by up to 10%. (by @winglian in [#2631](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2631))\r\n- **`pad_to_sequence_len`** is now automatically enabled when using sample packing for better performance and stability. (by @winglian in [#2607](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2607))\r\n\r\n### Xformers Attention for Packed Sequences\r\n\r\nAdded support for using `xformers` optimized attention with packed sequences in `fp16`, boosting training speed even further (a combined config sketch follows the TTS section below). (by @winglian in [#2619](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2619))\r\n\r\n### Support fine-tuning Text-to-Speech model with LLM backbone\r\n\r\nAxolotl now supports training a Text-to-Speech (TTS) model on top of an LLM. (by @mhenrichsen in [#2614](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2614))
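Pulling the packing-related options above into one place, a minimal hedged sketch; `sequence_len`, `sample_packing`, `pad_to_sequence_len`, and `xformers_attention` are standard Axolotl config keys, with illustrative values:

```yaml
sequence_len: 4096
sample_packing: true        # improved parallel bin packing (~99% efficiency)
pad_to_sequence_len: true   # now enabled automatically with packing; shown here for clarity
xformers_attention: true    # xformers attention for packed sequences (fp16)
```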
\r\n\r\n### Automatic LoRA Kernel Activation\r\n\r\nLoRA kernels are now **automatically enabled** where possible, providing a hands-free performance boost for LoRA training. This feature is automatically disabled for RL training to ensure stability. (by @djsaunde in [#2589](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2589) and @winglian in [#2600](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2600))\r\n\r\n### CAME Optimizer Support\r\n\r\nYou can now use the [CAME optimizer](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.02047), a memory-efficient optimizer designed for large language models. (by @xzuyn in [#2385](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2385))\r\n\r\n### General User Experience\r\n\r\n- **DeepSpeed Config Logging:** Your DeepSpeed configuration is now automatically saved to Weights & Biases, making your runs easier to reproduce and debug. (by @winglian in [#2593](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2593))\r\n- **Automatic Reasoning Dataset Splitting:** Axolotl can now automatically split existing reasoning datasets to leverage new chat templates with reasoning\u002Ftool-use turns. (by @winglian in [#2591](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2591))\r\n\r\n### 📦 Dependency Updates\r\n\r\n- `liger-kernel` bumped to `0.5.9`. (by @winglian in [#2640](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2640))\r\n- `vllm` bumped to `0.8.5` for Qwen2 support. (by @winglian in [#2583](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2583))\r\n- `datasets` and other Hugging Face libraries have been updated.\r\n\r\n## 🔧 Major Fixes\r\n\r\n### Model and Training Stability\r\n\r\n- **Qwen2 Models:** Fixed multiple issues with packing and kernel support for the Qwen2 and Qwen2-MoE model families, ensuring they train correctly. (by @winglian in [#2588](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2588), [#2612](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2612), [#2622](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2622) and @NanoCode012 in [#2596](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2596))\r\n- **Evaluation:** Fixed a bug that could cause evaluation runs to fail. (by @djsaunde in [#2586](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2586))\r\n- **DeepSpeed:** Resolved an issue where the learning rate was passed as a tensor instead of a float, causing errors. (by @winglian in [#2595](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2595)). This was also pushed upstream at https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fpull\u002F37704 by @NanoCode012 and https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fpull\u002F37881 by @winglian.\r\n- **DPO Trainer:** Fixed an issue where evaluation steps were incorrectly overridden. 
(by @winglian in [#2628](https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2628))\r\n\r\n## Other Improvements\r\n\r\n### 📚 Documentation & Examples\r\n\r\n- Add","2025-05-08T11:31:19",{"id":243,"version":244,"summary_zh":245,"released_at":246},72302,"v0.9.0","## What's Changed\r\n* [llama4] fix the mm yaml, add scout single gpu yaml by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2510\r\n* upgrade transformers to 4.51.1 by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2508\r\n* fix: liger swiglu for llama4 by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2504\r\n* Add Llama4 maverick examples and wandb links by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2512\r\n* fix: allow merge lora on pre-quantized model by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2511\r\n* feat: add CNAME by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2513\r\n* Update rlhf.qmd by @bursteratom in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2519\r\n* add mocks for loading datasets in cli train tests by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2497\r\n* Feat(examples): add deepcogito by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2516\r\n* feat(doc): explain deepspeed configs by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2514\r\n* chore: update doc links by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2509\r\n* [ci] make e2e tests a bit faster by reducing test split size by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2522\r\n* remove strict=false from example yamls by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2523\r\n* feat: add examples for deepcoder by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2517\r\n* make sure all of the model is on the same device by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2524\r\n* feat: update cce to latest by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2521\r\n* Fix: add delinearization and make qlora work with fsdp2 by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2515\r\n* batch api HF adapter for ring-flash-attn; cleanup and improvements by @djsaunde in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2520\r\n* re-enable DS zero3 ci with updated transformers by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2533\r\n* adding codecov reporting by @djsaunde in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2372\r\n* fix: preprocess yielding whole dataset to each worker by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2503\r\n* fix(doc): cut cross entropy installation instructions broken in qmd by @NanoCode012 in 
https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2532\r\n* zero val fix for beta by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2538\r\n* fix: upgrade liger to 0.5.8 and use native Gemma3 patches by @chiwanpark in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2527\r\n* don't run multigpu tests twice, run SP in separate test by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2542\r\n* Fixed Rex Scheduler Warm Up by @Catgat in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2535\r\n* prevent rate limiting to hf when using dispatch batches by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2536\r\n* make sure to download fixtures for kd test by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2541\r\n* fix missing host\u002Fport for vllm by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2543\r\n* feat: add glm and glm4 multipack and cce by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2546\r\n* Codecov fixes \u002F improvements by @djsaunde in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2549\r\n* add base docker image with pytorch 2.7.0 and variant for cuda 12.8 by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2551\r\n* builds for torch==2.7.0 by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2552\r\n* Fix(doc): add delinearize instruction by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2545\r\n* disable codecov pr annotations by @djsaunde in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2556\r\n* make sure to validate the config before normalizing so defaults get set by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2554\r\n* Sequence parallel training context manager by @djsaunde in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2553\r\n* don't fail on codecov upload for external contributor PRs by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2564\r\n* fix: gradient checkpointing functools.partial object has no attribute __self__ by @ekojsalim in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2563\r\n* make cce default to true when using the plugin by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2562\r\n* chore(doc): minor update docker tags on doc by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2559\r\n* fix: crash when pretraining_dataset with dispatch_batches is false by @chiwanpark in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2558\r\n* fix support for wandb ru","2025-04-28T22:24:09",{"id":248,"version":249,"summary_zh":250,"released_at":251},72303,"v0.8.1","## What's Changed\n* add additional tf32 opt for cudnn by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2477\n* fix(example): align example to correct adapter by @NanoCode012 in 
https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2478\n* removing deepspeed guard for LoRA Triton kernels by @djsaunde in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2480\n* check if fixture exists in the cache already by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2485\n* simplify the example configs to be more minimal and less daunting by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2486\n* fix: cohere cce scaling wrong tensor by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2483\n* fix tokenizer overrides w gemma3 by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2488\n* Update dependencies and show slow tests in CI by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2492\n* Flex Attention + Packing with BlockMask support by @bursteratom in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2363\n* FSDP2 support by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2469\n* llama4 support by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2493\n* feat: add llama4 multimodal by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2499\n* fix: duplicate llama4 chattemplate enum by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2500\n* fix(doc): clarify roles mapping in chat_template by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2490\n* Feat: Add doc on loading datasets and support for Azure\u002FOCI by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2482\n* SP cu_seqlens fix, refactor by @djsaunde in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2495\n* feat: add llama4 CCE by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2498\n* Llama4 linearized by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2502\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fcompare\u002Fv0.8.0...v0.8.1","2025-04-08T00:50:24",{"id":253,"version":254,"summary_zh":255,"released_at":256},72304,"v0.8.0","## New Features\r\n\r\n#### Sequence parallelism support via ring-flash-attn\r\nThis enables long context training by distributing sequences across GPUs, reducing memory requirements per device while allowing context length to scale near-linearly with the number of GPUs. This complements other parallelism features that Axolotl offers, including FSDP and DeepSpeed. 
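As a sketch, assuming the `sequence_parallel_degree` key from the sequence-parallelism docs, enabling this looks roughly like the following (surrounding values are illustrative):

```yaml
# Split each training sequence across 4 GPUs via ring-flash-attn.
# sequence_parallel_degree is the key from the sequence-parallelism docs;
# the other values here are illustrative.
sequence_parallel_degree: 4
sequence_len: 32768          # long-context training enabled by the split
flash_attention: true        # ring-flash-attn builds on flash attention
```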
See our documentation [here](https:\u002F\u002Faxolotl-ai-cloud.github.io\u002Faxolotl\u002Fdocs\u002Fsequence_parallelism.html).\r\n\u003Cimg width=\"763\" alt=\"Screenshot 2025-04-02 at 9 17 14 AM\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F308db66d-084e-45b1-87c3-1a7b405390bc\" \u002F>\r\n\r\n#### Gemma-3 support has landed alongside several features to help you fine-tune Gemma-3 models:\r\n- Cut cross entropy\r\n- Liger kernel\r\n- Multimodal\r\n- Fixed loss calculation for Gradient Accumulation\r\n\r\n#### Multimodal Beta support for a variety of multi-modal models: \r\n- Mllama\r\n- Pixtral\r\n- Llava-1.5\r\n- Mistral-Small-3.1\r\n- Gemma-3\r\n- Qwen2-VL\r\n- Qwen2.5-VL\r\n\r\n#### Additional Features\r\n- Updated cut-cross-entropy patches for several models: Cohere, Cohere-2, Gemma, Gemma-2, Gemma-3, Mistral-3, and Mllama\r\n- Support for the REX Learning Rate Scheduler - https:\u002F\u002Farxiv.org\u002Fabs\u002F2107.04197\r\n- Tokenizer Overrides - you can now fine-tune with custom values in tokenizers using reserved tokens\r\n- Single-gpu and DDP support for Muon Optimizer\r\n- Sequential packing for Curriculum learning\r\n- Speeding up GRPO training with distributed vLLM - you can now use `axolotl vllm-serve path\u002Fto\u002Fconfig.yaml` to serve a separate vLLM instance which can utilize multiple GPUs to speed up trajectory generation during GRPO.\r\n\r\n### Notes\r\n\r\nv0.8.x will be the last set of releases that will officially support torch\u003C=2.4.1. With the PyTorch 2.7 release this month, we aim to support the latest 2 stable releases of PyTorch.\r\nWe expect FSDP2 support to be a fast follow and we'll include that in v0.8.1 once we can fix and validate issues such as saving checkpoints.\r\n\r\n## What's Changed\r\n* `train.py` refactor by @djsaunde in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2371\r\n* fix(doc): add installation for cce to docs by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2375\r\n* chore(docs): remove phorm by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2378\r\n* feat(doc): add docker images explanation by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2379\r\n* feat(doc): document drop_system_message and clarify limitation by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2381\r\n* chore(doc): add clarification about mpi4py error on single gpu deepspeed by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2383\r\n* fix(doc): add missing low_cpu_mem_usage config to docs by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2369\r\n* feat(grpo): add reward_weights config and refactor by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2365\r\n* Add REX LR Scheduler by @xzuyn in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2380\r\n* Update Tokenizer Overrides Handling in models.py by @mhenrichsen in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F1549\r\n* various fixes 20250305 by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2384\r\n* Optimizer refactor and add Muon support by @winglian in 
https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2367\r\n* remove lion-pytorch as it's already handled upstream by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2389\r\n* refactor: trl grpo configs to have descriptions by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2386\r\n* feat(doc): add more info on RewardModel datasets by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2391\r\n* chore(doc): add faq when having no default chat_template by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2398\r\n* Use Latest Cut Cross Entropy by @xzuyn in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2392\r\n* fix: create mount folder on modal if not exist by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2390\r\n* include iproute2 and nvtop in cloud image by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2393\r\n* fix(modal): add git pull when getting branch files by @NanoCode012 in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2399\r\n* pass additional info for fix untrained tokens when using distributed + offloading by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2388\r\n* use max of 32 dataset processes if not explicit by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2403\r\n* build cloud images with torch 2.6.0 by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2413\r\n* only validate hf user token on rank 0 by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2408\r\n* fixes against upstream main branches by @winglian in https:\u002F\u002Fgithub.com\u002Faxolotl-ai-cloud\u002Faxolotl\u002Fpull\u002F2407\r\n* chore(docs): add cookbook\u002Fblog link to doc","2025-04-02T13:51:40"]