[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-mosaicml--llm-foundry":3,"tool-mosaicml--llm-foundry":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":80,"owner_website":81,"owner_url":82,"languages":83,"stars":99,"forks":100,"last_commit_at":101,"license":102,"difficulty_score":10,"env_os":103,"env_gpu":104,"env_ram":105,"env_deps":106,"category_tags":115,"github_topics":116,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":122,"updated_at":123,"faqs":124,"releases":145},3185,"mosaicml\u002Fllm-foundry","llm-foundry","LLM training code for Databricks foundation models","llm-foundry 是 Databricks Mosaic 团队开源的一套大语言模型（LLM）训练与部署代码库，旨在帮助用户高效地从头训练、微调、评估及部署基础模型。它解决了大模型开发中流程复杂、资源利用率低以及难以复现前沿技术的痛点，让开发者能够快速验证最新的研究成果。\n\n这套工具非常适合 AI 研究人员、算法工程师以及希望构建自定义大模型的企业开发团队使用。无论是想训练参数量从 1.25 亿到 700 亿不等的模型，还是进行推理优化，llm-foundry 都提供了完整的脚本和模块化支持。其核心技术亮点在于深度集成了 Composer 训练框架，支持 Flash Attention 以提升计算效率，利用 ALiBi 技术实现上下文长度的灵活扩展，并有效缓解训练过程中的损失尖峰问题。此外，它还孕育了著名的 MPT 系列模型和先进的 DBRX 混合专家（MoE）架构。通过 llm-foundry，用户不仅能轻松处理数据预处理和模型转换，还能在学术基准或自定义任务上进行严谨评估，是探索大模型技术不可或缺的实用利器。","\u003C!-- SETUPTOOLS_LONG_DESCRIPTION_HIDE_BEGIN -->\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\">\n    \u003Cpicture>\n      \u003Cimg alt=\"LLM Foundry\" 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmosaicml_llm-foundry_readme_d68c2d9f7672.png\" width=\"95%\">\n    \u003C\u002Fpicture>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\u003C!-- SETUPTOOLS_LONG_DESCRIPTION_HIDE_END -->\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fllm-foundry\u002F\">\n        \u003Cimg alt=\"PyPi Version\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fllm-foundry\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fllm-foundry\u002F\">\n        \u003Cimg alt=\"PyPi Package Version\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fllm-foundry\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fmosaicml.me\u002Fslack\">\n        \u003Cimg alt=\"Chat @ Slack\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fslack-chat-2eb67d.svg?logo=slack\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fmain\u002FLICENSE\">\n        \u003Cimg alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-green.svg\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\u003Cbr \u002F>\n\n# LLM Foundry\n\nThis repository contains code for training, finetuning, evaluating, and deploying LLMs for inference with [Composer](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fcomposer) and the [MosaicML platform](https:\u002F\u002Fforms.mosaicml.com\u002Fdemo?utm_source=github.com&utm_medium=referral&utm_campaign=llm-foundry). 
Designed to be easy-to-use, efficient _and_ flexible, this codebase enables rapid experimentation with the latest techniques.\n\nYou'll find in this repo:\n* `llmfoundry\u002F` - source code for models, datasets, callbacks, utilities, etc.\n* `scripts\u002F` - scripts to run LLM workloads\n  * `data_prep\u002F` - convert text data from original sources to StreamingDataset format\n  * `train\u002F` - train or finetune HuggingFace and MPT models from 125M - 70B parameters\n    * `train\u002Fbenchmarking` - profile training throughput and MFU\n  * `inference\u002F` - convert models to HuggingFace or ONNX format, and generate responses\n    * `inference\u002Fbenchmarking` - profile inference latency and throughput\n  * `eval\u002F` - evaluate LLMs on academic (or custom) in-context-learning tasks\n* `mcli\u002F` - launch any of these workloads using [MCLI](https:\u002F\u002Fdocs.mosaicml.com\u002Fprojects\u002Fmcli\u002Fen\u002Flatest\u002F) and the [MosaicML platform](https:\u002F\u002Fwww.mosaicml.com\u002Fplatform)\n* `TUTORIAL.md` - a deeper dive into the repo, example workflows, and FAQs\n\n# DBRX\n\nDBRX is a state-of-the-art open source LLM trained by Databricks Mosaic team. It uses the Mixture-of-Experts (MoE) architecture and was trained with optimized versions of [Composer](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fcomposer), LLM Foundry, and [MegaBlocks](https:\u002F\u002Fgithub.com\u002Fdatabricks\u002Fmegablocks). The model has 132B total parameters and 36B active parameters. 
We have released two DBRX models:\n\n\n| Model              | Context Length | Download                                           |\n| ------------------ | -------------- | -------------------------------------------------- |\n| DBRX Base          | 32768          | https:\u002F\u002Fhuggingface.co\u002Fdatabricks\u002Fdbrx-base        |\n| DBRX Instruct      | 32768          | https:\u002F\u002Fhuggingface.co\u002Fdatabricks\u002Fdbrx-instruct    |\n\nOur model weights and code are licensed for both researchers and commercial entities. The Databricks Open Source License can be found at [LICENSE](https:\u002F\u002Fgithub.com\u002Fdatabricks\u002Fdbrx\u002Fblob\u002Fmain\u002FLICENSE), and our Acceptable Use Policy can be found [here](https:\u002F\u002Fwww.databricks.com\u002Flegal\u002Facceptable-use-policy-open-model).\n\nFor more information about the DBRX models, see https:\u002F\u002Fgithub.com\u002Fdatabricks\u002Fdbrx.\n\n# MPT\n\nMosaic Pretrained Transformers (MPT) are GPT-style models with some special features -- Flash Attention for efficiency, ALiBi for context length extrapolation, and stability improvements to mitigate loss spikes. As part of MosaicML's Foundation series, we have open-sourced several MPT models:\n\n\n| Model              | Context Length | Download                                           | Commercial use? 
|\n| ------------------ | -------------- | -------------------------------------------------- | --------------- |\n| MPT-30B            | 8192           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-30b            | Yes             |\n| MPT-30B-Instruct   | 8192           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-30b-instruct   | Yes             |\n| MPT-30B-Chat       | 8192           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-30b-chat       | No              |\n| MPT-7b-8k          | 8192           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-7b-8k          | Yes             |\n| MPT-7b-8k-Chat | 8192           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-7b-8k-chat         | No              |\n| MPT-7B             | 2048           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-7b             | Yes             |\n| MPT-7B-Instruct    | 2048           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-7b-instruct    | Yes             |\n| MPT-7B-Chat        | 2048           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-7b-chat        | No              |\n| MPT-7B-StoryWriter | 65536          | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-7b-storywriter | Yes             |\n\nTo try out these models locally, [follow the instructions](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Ftree\u002Fmain\u002Fscripts\u002Finference#interactive-generation-with-modelgenerate) in `scripts\u002Finference\u002FREADME.md` to prompt HF models using our [hf_generate.py](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fmain\u002Fscripts\u002Finference\u002Fhf_generate.py) or [hf_chat.py](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fmain\u002Fscripts\u002Finference\u002Fhf_chat.py) scripts.\n\n# MPT Community\n\nWe've been overwhelmed by all the amazing work the community has put into MPT! 
Here we provide a few links to some of them:\n* [ReplitLM](https:\u002F\u002Fgithub.com\u002Freplit\u002FreplitLM): `replit-code-v1-3b` is a 2.7B Causal Language Model focused on Code Completion. The model has been trained on a subset of the Stack Dedup v1.2 dataset covering 20 languages such as Java, Python, and C++\n* [LLaVa-MPT](https:\u002F\u002Fgithub.com\u002Fhaotian-liu\u002FLLaVA#LLaVA-MPT-7b): Visual instruction tuning to get MPT multimodal capabilities\n* [ggml](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fggml\u002Ftree\u002Fmaster): Optimized MPT version for efficient inference on consumer hardware\n* [GPT4All](https:\u002F\u002Fgpt4all.io\u002Findex.html): locally running chat system, now with MPT support!\n* [Q8MPT-Chat](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FIntel\u002FQ8-Chat): 8-bit optimized MPT for CPU by our friends at Intel\n\nTutorial videos from the community:\n* [Using MPT-7B with Langchain](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DXpk9K7DgMo&t=3s) by [@jamesbriggs](https:\u002F\u002Fwww.youtube.com\u002F@jamesbriggs)\n* [MPT-7B StoryWriter Intro](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=O9Y_ZdsuKWQ) by [AItrepreneur](https:\u002F\u002Fwww.youtube.com\u002F@Aitrepreneur)\n* [Fine-tuning MPT-7B on a single GPU](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=KSlWkrByc0o&t=9s) by [@AIology2022](https:\u002F\u002Fwww.youtube.com\u002F@AIology2022)\n* [How to Fine-tune MPT-7B-Instruct on Google Colab](https:\u002F\u002Fyoutu.be\u002F3de0Utr9XnI) by [@VRSEN](https:\u002F\u002Fwww.youtube.com\u002F@vrsen)\n\nSomething missing? 
Contribute with a PR!\n\n# Latest News\n* [Blog: Introducing DBRX: A New State-of-the-Art Open LLM](https:\u002F\u002Fwww.databricks.com\u002Fblog\u002Fintroducing-dbrx-new-state-art-open-llm)\n* [Blog: LLM Training and Inference with Intel Gaudi2 AI Accelerators](https:\u002F\u002Fwww.databricks.com\u002Fblog\u002Fllm-training-and-inference-intel-gaudi2-ai-accelerators)\n* [Blog: Training LLMs at Scale with AMD MI250 GPUs](https:\u002F\u002Fwww.databricks.com\u002Fblog\u002Ftraining-llms-scale-amd-mi250-gpus)\n* [Blog: Training LLMs with AMD MI250 GPUs and MosaicML](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Famd-mi250)\n* [Blog: Announcing MPT-7B-8K: 8K Context Length for Document Understanding](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Flong-context-mpt-7b-8k)\n* [Blog: MPT-30B: Raising the bar for open-source foundation models](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Fmpt-30b)\n* [Blog: Introducing MPT-7B](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Fmpt-7b)\n* [Blog: Benchmarking LLMs on H100](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Fcoreweave-nvidia-h100-part-1)\n* [Blog: Blazingly Fast LLM Evaluation](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Fllm-evaluation-for-icl)\n* [Blog: GPT3 Quality for $500k](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Fgpt-3-quality-for-500k)\n* [Blog: Billion parameter GPT training made easy](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Fbillion-parameter-gpt-training-made-easy)\n\n\n\n# Hardware and Software Requirements\nThis codebase has been tested with PyTorch 2.4 with NVIDIA A100s and H100s.\nThis codebase may also work on systems with other devices, such as consumer NVIDIA cards and AMD cards, but we are not actively testing these systems.\nIf you have success\u002Ffailure using LLM Foundry on other systems, please let us know in a GitHub issue and we will 
update the support matrix!\n\n| Device         | Torch Version | Cuda Version | Status                       |\n| -------------- | ------------- | ------------ | ---------------------------- |\n| A100-40GB\u002F80GB | 2.7.0         | 12.8         | :white_check_mark: Supported |\n| H100-80GB      | 2.7.0         | 12.8         | :white_check_mark: Supported |\n\n## MosaicML Docker Images\nWe highly recommend using our prebuilt Docker images. You can find them here: https:\u002F\u002Fhub.docker.com\u002Forgs\u002Fmosaicml\u002Frepositories.\n\nThe `mosaicml\u002Fpytorch` images are pinned to specific PyTorch and CUDA versions, and are stable and rarely updated.\n\nThe `mosaicml\u002Fllm-foundry` images are built with new tags upon every commit to the `main` branch.\nYou can select a specific commit hash such as `mosaicml\u002Fllm-foundry:2.7.0_cu128-9867a7b` or take the latest one using `mosaicml\u002Fllm-foundry:2.7.0_cu128-latest`.\n\n**Please Note:** The `mosaicml\u002Fllm-foundry` images do not come with the `llm-foundry` package preinstalled, just the dependencies. You will still need to `pip install llm-foundry` either from PyPi or from source.\n\n| Docker Image                                           | Torch Version | Cuda Version      | LLM Foundry dependencies installed? |\n| ------------------------------------------------------ | ------------- | ----------------- | ----------------------------------- |\n| `mosaicml\u002Fpytorch:2.7.0_cu128-python3.12-ubuntu22.04`  | 2.7.0         | 12.8 (Infiniband) | No                                  |\n| `mosaicml\u002Fllm-foundry:2.7.0_cu128-latest`              | 2.7.0         | 12.8 (Infiniband) | Yes                                 |\n| `mosaicml\u002Fllm-foundry:2.7.0_cu128_aws-latest`          | 2.7.0         | 12.8 (EFA)        | Yes                                 |\n\n\n# Installation\n\nThis assumes you already have PyTorch, CMake, and packaging installed. 
If not, you can install them with `pip install cmake packaging torch`.\n\nTo get started, clone the repo and set up your environment. Instructions to do so differ slightly depending on whether you're using Docker.\n\n### With Docker (recommended)\n\nWe *strongly* recommend working with LLM Foundry inside a Docker container (see our recommended Docker image above). If you are doing so, follow these steps to clone the repo and install the requirements.\n\n\u003C!--pytest.mark.skip-->\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry.git\ncd llm-foundry\npip install -e \".[gpu]\"  # or `pip install -e .` if no NVIDIA GPU.\n```\n\n### Without Docker (not recommended)\n\nIf you choose not to use Docker, you should create and use a virtual environment.\n\n\u003C!--pytest.mark.skip-->\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry.git\ncd llm-foundry\n\n# Create and activate a virtual environment\npython3 -m venv llmfoundry-venv\nsource llmfoundry-venv\u002Fbin\u002Factivate\n\npip install cmake packaging torch  # setup.py requires these to be installed\n\npip install -e \".[gpu]\"  # or `pip install -e .` if no NVIDIA GPU.\n```\n\n### TransformerEngine and amp_fp8 support\nNVIDIA H100 GPUs have FP8 support; we have installed Flash Attention and Transformer Engine in our Docker images already (see above). 
If you are not using our Docker images, you can install these packages with:\n\u003C!--pytest.mark.skip-->\n```bash\npip install flash-attn --no-build-isolation\npip install git+https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTransformerEngine.git@stable\n```\n\nSee [here](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fmain\u002FTUTORIAL.md#TransformerEngine-and-amp_fp8-support) for more details on enabling TransformerEngine layers and amp_fp8.\n\n### AMD (BETA support)\n\nIn [our testing of AMD GPUs](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Famd-mi250), the env setup includes:\n\n\u003C!--pytest.mark.skip-->\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry.git\ncd llm-foundry\n\n# Creating and activate a virtual environment\npython3 -m venv llmfoundry-venv-amd\nsource llmfoundry-venv-amd\u002Fbin\u002Factivate\n\n# installs\npip install cmake packaging torch\npip install -e .  # This installs some things that are not needed but they don't hurt\npip3 install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Frocm5.4.2\n```\n**Lastly**, install the ROCm enabled flash attention (instructions [here](https:\u002F\u002Fgithub.com\u002FROCmSoftwarePlatform\u002Fflash-attention\u002Ftree\u002Fflash_attention_for_rocm2#amd-gpurocm-support)).\n\nNotes:\n1. We don't yet have a Docker image where everything works perfectly. 
You might need to up\u002Fdowngrade some packages (in our case, we needed to downgrade to `numpy==1.23.5`) before everything works without issue.\n\n### Intel Gaudi\nSupport for LLM Foundry on Intel Gaudi devices is experimental, please use the branch `habana_alpha` and see the [README on that branch](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fhabana_alpha) which has [install instructions and known issues.](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Ftree\u002Fhabana_alpha?tab=readme-ov-file#intel-gaudi)\n\nFor training and inference performance results on Intel Gaudi2 accelerators, see our blog: https:\u002F\u002Fwww.databricks.com\u002Fblog\u002Fllm-training-and-inference-intel-gaudi2-ai-accelerators\n\n\n# Quickstart\n\n> **Note**\n> Make sure to go through the installation steps above before trying the quickstart!\n\nHere is an end-to-end workflow for preparing a subset of the C4 dataset, training an MPT-125M model for 10 batches,\nconverting the model to HuggingFace format, evaluating the model on the Winograd challenge, and generating responses to prompts.\n\n**(Remember this is a quickstart just to demonstrate the tools -- To get good quality, the LLM must be trained for longer than 10 batches 😄)**\n\n\u003C!--pytest.mark.skip-->\n```bash\ncd scripts\n\n# Convert C4 dataset to StreamingDataset format\npython data_prep\u002Fconvert_dataset_hf.py \\\n  --dataset allenai\u002Fc4 --data_subset en \\\n  --out_root my-copy-c4 --splits train_small val_small \\\n  --concat_tokens 2048 --tokenizer EleutherAI\u002Fgpt-neox-20b --eos_text '\u003C|endoftext|>'\n\n# Train an MPT-125m model for 10 batches\ncomposer train\u002Ftrain.py \\\n  train\u002Fyamls\u002Fpretrain\u002Fmpt-125m.yaml \\\n  variables.data_local=my-copy-c4 \\\n  train_loader.dataset.split=train_small \\\n  eval_loader.dataset.split=val_small \\\n  max_duration=10ba \\\n  eval_interval=0 \\\n  save_folder=mpt-125m\n\n# Convert the model to 
HuggingFace format\npython inference\u002Fconvert_composer_to_hf.py \\\n  --composer_path mpt-125m\u002Fep0-ba10-rank0.pt \\\n  --hf_output_path mpt-125m-hf \\\n  --output_precision bf16 \\\n  # --hf_repo_for_upload user-org\u002Frepo-name\n\n# Evaluate the model on a subset of tasks\ncomposer eval\u002Feval.py \\\n  eval\u002Fyamls\u002Fhf_eval.yaml \\\n  icl_tasks=eval\u002Fyamls\u002Fcopa.yaml \\\n  model_name_or_path=mpt-125m-hf\n\n# Generate responses to prompts\npython inference\u002Fhf_generate.py \\\n  --name_or_path mpt-125m-hf \\\n  --max_new_tokens 256 \\\n  --prompts \\\n    \"The answer to life, the universe, and happiness is\" \\\n    \"Here's a quick recipe for baking chocolate chip cookies: Start by\"\n```\n\nNote: the `composer` command used above to train the model refers to the [Composer](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fcomposer) library's distributed launcher.\n\nIf you have a write-enabled [HuggingFace auth token](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fsecurity-tokens), you can optionally upload your model to the Hub! Just export your token like this:\n\n```bash\nexport HF_TOKEN=your-auth-token\n```\n\nand uncomment the line containing `--hf_repo_for_upload ...` in the above call to `inference\u002Fconvert_composer_to_hf.py`.\n\n# Registry\n\nYou can use the registry to customize your workflows without forking the library. Some components of LLM Foundry are registrable, such as models, loggers, and callbacks. This means that you can register new options for these components, and then use them in your yaml config.\n\n## Discovering registrable components\nTo help find and understand registrable components, you can use the `llmfoundry registry` cli command.\n\nWe provide two commands currently:\n- `llmfoundry registry get [--group]`: List all registries, and their components, optionally specifying a specific registry. 
Example usage: `llmfoundry registry get --group loggers` or `llmfoundry registry get`\n- `llmfoundry registry find \u003Cgroup> \u003Cname>`: Get information about a specific registered component. Example usage: `llmfoundry registry find loggers wandb`\n\nUse `--help` on any of these commands for more information.\n\nThese commands can also help you understand what each registry is composed of, as each registry contains a docstring that will be printed out. The general concept is that each registry defines an interface, and components registered to that registry must implement that interface. If there is a part of the library that is not currently extendable, but you think it should be, please open an issue!\n\n## How to register\n\nThere are a few ways to register a new component:\n\n### Python entrypoints\n\nYou can specify registered components via a Python entrypoint if you are building your own package with registered components.\nThis would be the expected usage if you are building a large extension to LLM Foundry, and going to be overriding many components. Note that things registered via entrypoints will override components registered directly in code.\n\nFor example, the following would register the `MyLogger` class, under the key `my_logger`, in the `llm_foundry.loggers` registry:\n\n\u003C!--pytest.mark.skip-->\n```yaml\n[build-system]\nrequires = [\"setuptools>=42\", \"wheel\"]\nbuild-backend = \"setuptools.build_meta\"\n\n[project]\nname = \"foundry_registry\"\nversion = \"0.1.0\"\ndependencies = [\n    \"mosaicml\",\n    \"llm-foundry\",\n]\n\n# Note: Even though in python code, this would be llmfoundry.registry.loggers,\n# when specified in the entry_points, it has to be \"llmfoundry_loggers\". 
That is,\n# the segments of the name should be joined by an _ in the entry_points section.\n[project.entry-points.\"llmfoundry_loggers\"]\nmy_logger = \"foundry_registry.loggers:MyLogger\"\n```\n\nIf developing new components via entrypoints, it is important to note that Python entrypoints are global to the Python environment. This means that if you have multiple packages that register components with the same key, the last one installed will be the one used. This can be useful for overriding components in LLM Foundry, but can also lead to unexpected behavior if not careful. Additionally, if you change the pyproject.toml, you will need to reinstall the package for the changes to take effect. You can do this quickly by installing with `pip install -e . --no-deps` to avoid reinstalling dependencies.\n\n### Direct call to register\n\nYou can also register a component directly in your code:\n\n\u003C!--pytest.mark.skip-->\n```python\nfrom composer.loggers import LoggerDestination\nfrom llmfoundry.registry import loggers\n\nclass MyLogger(LoggerDestination):\n    pass\n\nloggers.register(\"my_logger\", func=MyLogger)\n```\n\n### Decorators\n\nYou can also use decorators to register components directly from your code:\n\n\u003C!--pytest.mark.skip-->\n```python\nfrom composer.loggers import LoggerDestination\nfrom llmfoundry.registry import loggers\n\n@loggers.register(\"my_logger\")\nclass MyLogger(LoggerDestination):\n    pass\n```\n\nFor both the direct call and decorator approaches, if using the LLM Foundry train\u002Feval scripts, you will need to provide the `code_paths` argument, which is a list of files need to execute in order to register your components. 
For example, you may have a file called `foundry_imports.py` that contains the following:\n\n\u003C!--pytest.mark.skip-->\n```python\nfrom foundry_registry.loggers import MyLogger\nfrom llmfoundry.registry import loggers\n\nloggers.register(\"my_logger\", func=MyLogger)\n```\n\nYou would then provide `code_paths` to the train\u002Feval scripts in your yaml config:\n\n\u003C!--pytest.mark.skip-->\n```yaml\n...\ncode_paths:\n  - foundry_imports.py\n...\n```\n\nOne of these would be the expected usage if you are building a small extension to LLM Foundry, only overriding a few components, and thus don't want to create an entire package.\n\n# Learn more about LLM Foundry!\n\nCheck out [TUTORIAL.md](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fmain\u002FTUTORIAL.md) to keep learning about working with LLM Foundry. The tutorial highlights example workflows, points you to other resources throughout the repo, and answers frequently asked questions!\n\n# Contact Us\n\nIf you run into any problems with the code, please file Github issues directly to this repo.\n\nIf you want to train LLMs on the MosaicML platform, reach out to us at [demo@mosaicml.com](mailto:demo@mosaicml.com)!\n","\u003C!-- SETUPTOOLS_LONG_DESCRIPTION_HIDE_BEGIN -->\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\">\n    \u003Cpicture>\n      \u003Cimg alt=\"LLM Foundry\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmosaicml_llm-foundry_readme_d68c2d9f7672.png\" width=\"95%\">\n    \u003C\u002Fpicture>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\u003C!-- SETUPTOOLS_LONG_DESCRIPTION_HIDE_END -->\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fllm-foundry\u002F\">\n        \u003Cimg alt=\"PyPi Version\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fllm-foundry\">\n    \u003C\u002Fa>\n    \u003Ca 
href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Fllm-foundry\u002F\">\n        \u003Cimg alt=\"PyPi Package Version\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fllm-foundry\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fmosaicml.me\u002Fslack\">\n        \u003Cimg alt=\"Chat @ Slack\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fslack-chat-2eb67d.svg?logo=slack\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fmain\u002FLICENSE\">\n        \u003Cimg alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-green.svg\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\u003Cbr \u002F>\n\n# LLM Foundry\n\n本仓库包含使用 [Composer](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fcomposer) 和 [MosaicML 平台](https:\u002F\u002Fforms.mosaicml.com\u002Fdemo?utm_source=github.com&utm_medium=referral&utm_campaign=llm-foundry) 进行 LLM 训练、微调、评估及推理部署的相关代码。该代码库设计简洁易用、高效且灵活，可帮助用户快速尝试最新的技术方法。\n\n在本仓库中，您将找到：\n* `llmfoundry\u002F` - 模型、数据集、回调函数、工具等源代码\n* `scripts\u002F` - 用于运行 LLM 工作负载的脚本\n  * `data_prep\u002F` - 将原始文本数据转换为 StreamingDataset 格式\n  * `train\u002F` - 训练或微调 HuggingFace 和 MPT 模型，参数规模从 1.25 亿到 700 亿不等\n    * `train\u002Fbenchmarking` - 用于性能分析训练吞吐量和 MFU\n  * `inference\u002F` - 将模型转换为 HuggingFace 或 ONNX 格式，并生成响应\n    * `inference\u002Fbenchmarking` - 用于评估推理延迟和吞吐量\n  * `eval\u002F` - 在学术（或自定义）上下文学习任务上评估 LLM\n* `mcli\u002F` - 使用 [MCLI](https:\u002F\u002Fdocs.mosaicml.com\u002Fprojects\u002Fmcli\u002Fen\u002Flatest\u002F) 和 [MosaicML 平台](https:\u002F\u002Fwww.mosaicml.com\u002Fplatform) 启动上述各类工作负载\n* `TUTORIAL.md` - 对本仓库的深入介绍、示例工作流及常见问题解答\n\n# DBRX\n\nDBRX 是由 Databricks Mosaic 团队训练的最先进开源 LLM。它采用专家混合（MoE）架构，并基于优化后的 [Composer](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fcomposer)、LLM Foundry 和 [MegaBlocks](https:\u002F\u002Fgithub.com\u002Fdatabricks\u002Fmegablocks) 进行训练。该模型总参数量为 1320 亿，活跃参数量为 360 亿。我们已发布了两款 DBRX 模型：\n\n\n| 模型              | 上下文长度 | 
下载                                           |\n| ------------------ | -------------- | -------------------------------------------------- |\n| DBRX Base          | 32768          | https:\u002F\u002Fhuggingface.co\u002Fdatabricks\u002Fdbrx-base        |\n| DBRX Instruct      | 32768          | https:\u002F\u002Fhuggingface.co\u002Fdatabricks\u002Fdbrx-instruct    |\n\n我们的模型权重和代码对研究人员及商业实体均开放许可。Databricks 开源许可证可在 [LICENSE](https:\u002F\u002Fgithub.com\u002Fdatabricks\u002Fdbrx\u002Fblob\u002Fmain\u002FLICENSE) 中找到，其可接受使用政策则可在此处查阅：[这里](https:\u002F\u002Fwww.databricks.com\u002Flegal\u002Facceptable-use-policy-open-model)。\n\n有关 DBRX 模型的更多信息，请参阅 https:\u002F\u002Fgithub.com\u002Fdatabricks\u002Fdbrx。\n\n# MPT\n\nMosaic 预训练 Transformer（MPT）是一系列 GPT 风格的模型，具备多项特色功能——高效的 Flash Attention、支持长上下文外推的 ALiBi 机制，以及用于缓解损失骤增的稳定性改进。作为 MosaicML Foundation 系列的一部分，我们已开源了多款 MPT 模型：\n\n\n| 模型              | 上下文长度 | 下载                                           | 是否可用于商业用途？ |\n| ------------------ | -------------- | -------------------------------------------------- | --------------- |\n| MPT-30B            | 8192           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-30b            | 是             |\n| MPT-30B-Instruct   | 8192           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-30b-instruct   | 是             |\n| MPT-30B-Chat       | 8192           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-30b-chat       | 否              |\n| MPT-7b-8k          | 8192           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-7b-8k          | 是             |\n| MPT-7b-8k-Chat     | 8192           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-7b-8k-chat     | 否              |\n| MPT-7B             | 2048           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-7b             | 是             |\n| MPT-7B-Instruct    | 2048           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-7b-instruct    | 是             |\n| 
MPT-7B-Chat        | 2048           | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-7b-chat        | 否              |\n| MPT-7B-StoryWriter | 65536          | https:\u002F\u002Fhuggingface.co\u002Fmosaicml\u002Fmpt-7b-storywriter | 是             |\n\n如需在本地试用这些模型，请按照 `scripts\u002Finference\u002FREADME.md` 中的说明，使用我们的 [hf_generate.py](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fmain\u002Fscripts\u002Finference\u002Fhf_generate.py) 或 [hf_chat.py](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fmain\u002Fscripts\u002Finference\u002Fhf_chat.py) 脚本，通过 HF 模型进行交互式生成。\n\n# MPT 社区\n\n社区为 MPT 所做的精彩工作让我们倍感振奋！以下是我们整理的一些相关链接：\n* [ReplitLM](https:\u002F\u002Fgithub.com\u002Freplit\u002FreplitLM)：`replit-code-v1-3b` 是一款专注于代码补全的 27 亿参数因果语言模型。该模型基于 Stack Dedup v1.2 数据集的一个子集进行训练，涵盖 Java、Python、C++ 等 20 种编程语言。\n* [LLaVa-MPT](https:\u002F\u002Fgithub.com\u002Fhaotian-liu\u002FLLaVA#LLaVA-MPT-7b)：通过视觉指令微调，使 MPT 具备多模态能力。\n* [ggml](https:\u002F\u002Fgithub.com\u002Fggerganov\u002Fggml\u002Ftree\u002Fmaster)：针对消费级硬件优化的 MPT 版本，可实现高效推理。\n* [GPT4All](https:\u002F\u002Fgpt4all.io\u002Findex.html)：一款可在本地运行的聊天系统，现已支持 MPT！\n* [Q8MPT-Chat](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FIntel\u002FQ8-Chat)：由英特尔团队开发的 8 位优化版 MPT，专为 CPU 设计。\n\n社区分享的教学视频：\n* [@jamesbriggs](https:\u002F\u002Fwww.youtube.com\u002F@jamesbriggs) 的 [使用 MPT-7B 与 Langchain](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=DXpk9K7DgMo&t=3s)\n* [AItrepreneur](https:\u002F\u002Fwww.youtube.com\u002F@Aitrepreneur) 的 [MPT-7B StoryWriter 简介](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=O9Y_ZdsuKWQ)\n* [@AIology2022](https:\u002F\u002Fwww.youtube.com\u002F@AIology2022) 的 [单 GPU 上微调 MPT-7B](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=KSlWkrByc0o&t=9s)\n* [@VRSEN](https:\u002F\u002Fwww.youtube.com\u002F@vrsen) 的 [如何在 Google Colab 上微调 MPT-7B-Instruct](https:\u002F\u002Fyoutu.be\u002F3de0Utr9XnI)\n\n如果您发现遗漏的内容，欢迎提交 PR 参与贡献！\n\n# 最新消息\n* 
[博客：推出DBRX：一款全新的前沿开源大模型](https:\u002F\u002Fwww.databricks.com\u002Fblog\u002Fintroducing-dbrx-new-state-art-open-llm)\n* [博客：使用英特尔Gaudi2 AI加速器进行大模型训练与推理](https:\u002F\u002Fwww.databricks.com\u002Fblog\u002Fllm-training-and-inference-intel-gaudi2-ai-accelerators)\n* [博客：利用AMD MI250 GPU大规模训练大模型](https:\u002F\u002Fwww.databricks.com\u002Fblog\u002Ftraining-llms-scale-amd-mi250-gpus)\n* [博客：使用AMD MI250 GPU和MosaicML训练大模型](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Famd-mi250)\n* [博客：宣布MPT-7B-8K：8K上下文长度用于文档理解](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Flong-context-mpt-7b-8k)\n* [博客：MPT-30B：为开源基础模型树立新标杆](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Fmpt-30b)\n* [博客：推出MPT-7B](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Fmpt-7b)\n* [博客：在H100上对大模型进行基准测试](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Fcoreweave-nvidia-h100-part-1)\n* [博客：极速的大模型评估](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Fllm-evaluation-for-icl)\n* [博客：以50万美元获得GPT3级别的质量](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Fgpt-3-quality-for-500k)\n* [博客：轻松实现十亿参数GPT的训练](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Fbillion-parameter-gpt-training-made-easy)\n\n\n\n# 硬件与软件要求\n该代码库已在配备NVIDIA A100和H100显卡的PyTorch 2.4环境中进行了测试。\n此代码库也可能在搭载其他设备的系统上运行，例如消费级NVIDIA显卡和AMD显卡，但我们目前并未积极测试这些系统。\n如果您在其他系统上使用LLM Foundry，无论成功还是遇到问题，都请通过GitHub Issue告知我们，我们将更新支持矩阵！\n\n| 设备         | Torch版本 | CUDA版本 | 状态                       |\n| -------------- | ------------- | ------------ | ---------------------------- |\n| A100-40GB\u002F80GB | 2.7.0         | 12.8         | :white_check_mark: 支持     |\n| H100-80GB      | 2.7.0         | 12.8         | :white_check_mark: 支持     |\n\n## MosaicML 
Docker镜像\n我们强烈建议使用我们预构建的Docker镜像。您可以在以下地址找到它们：https:\u002F\u002Fhub.docker.com\u002Forgs\u002Fmosaicml\u002Frepositories。\n\n`mosaicml\u002Fpytorch`镜像固定了特定的PyTorch和CUDA版本，稳定且很少更新。\n\n`mosaicml\u002Fllm-foundry`镜像则会在每次向`main`分支提交时生成新的标签。您可以选择特定的提交哈希，如`mosaicml\u002Fllm-foundry:2.7.0_cu128-9867a7b`，也可以使用最新的版本`mosaicml\u002Fllm-foundry:2.7.0_cu128-latest`。\n\n**请注意：** `mosaicml\u002Fllm-foundry`镜像并未预装`llm-foundry`包，仅安装了依赖项。您仍需从PyPi或源码中执行`pip install llm-foundry`。\n\n| Docker镜像                                           | Torch版本 | CUDA版本      | 是否已安装LLM Foundry依赖？ |\n| ------------------------------------------------------ | ------------- | ----------------- | ----------------------------------- |\n| `mosaicml\u002Fpytorch:2.7.0_cu128-python3.12-ubuntu22.04`  | 2.7.0         | 12.8（Infiniband） | 否                                  |\n| `mosaicml\u002Fllm-foundry:2.7.0_cu128-latest`              | 2.7.0         | 12.8（Infiniband） | 是                                 |\n| `mosaicml\u002Fllm-foundry:2.7.0_cu128_aws-latest`          | 2.7.0         | 12.8（EFA）        | 是                                 |\n\n\n# 安装\n\n此处假设您已安装PyTorch、CMake和packaging。若未安装，可通过`pip install cmake packaging torch`进行安装。\n\n要开始使用，首先克隆仓库并设置环境。具体步骤会因您是否使用Docker而略有不同。\n\n### 使用Docker（推荐）\n\n我们强烈建议在Docker容器内使用LLM Foundry（请参阅上方推荐的Docker镜像）。若您决定这样做，请按照以下步骤克隆仓库并安装所需依赖。\n\n\u003C!--pytest.mark.skip-->\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry.git\ncd llm-foundry\npip install -e \".[gpu]\"  # 或者如果无NVIDIA GPU，则使用`pip install -e .`\n```\n\n### 不使用Docker（不推荐）\n\n如果您选择不使用Docker，则应创建并使用虚拟环境。\n\n\u003C!--pytest.mark.skip-->\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry.git\ncd llm-foundry\n\n# 创建并激活虚拟环境\npython3 -m venv llmfoundry-venv\nsource llmfoundry-venv\u002Fbin\u002Factivate\n\npip install cmake packaging torch  # setup.py需要这些包\n\npip install -e \".[gpu]\"  # 或者如果无NVIDIA GPU，则使用`pip install -e .`\n```\n\n### 
TransformerEngine与amp_fp8支持\nNVIDIA H100显卡支持FP8；我们在Docker镜像中已安装Flash Attention和Transformer Engine（见上文）。若您未使用我们的Docker镜像，可按如下方式安装这些包：\n\u003C!--pytest.mark.skip-->\n```bash\npip install flash-attn --no-build-isolation\npip install git+https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTransformerEngine.git@stable\n```\n\n有关启用TransformerEngine层和amp_fp8的更多详细信息，请参阅[此处](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fmain\u002FTUTORIAL.md#TransformerEngine-and-amp_fp8-support)。\n\n### AMD（测试版支持）\n\n根据我们对AMD GPU的测试结果（详见[这里](https:\u002F\u002Fwww.mosaicml.com\u002Fblog\u002Famd-mi250)），环境设置包括：\n\n\u003C!--pytest.mark.skip-->\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry.git\ncd llm-foundry\n\n# 创建并激活虚拟环境\npython3 -m venv llmfoundry-venv-amd\nsource llmfoundry-venv-amd\u002Fbin\u002Factivate\n\n# 安装\npip install cmake packaging torch\npip install -e .  # 这会安装一些不必要的东西，但不会造成影响\npip3 install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Frocm5.4.2\n```\n**最后**，还需安装支持ROCm的flash attention（安装说明[在此](https:\u002F\u002Fgithub.com\u002FROCmSoftwarePlatform\u002Fflash-attention\u002Ftree\u002Fflash_attention_for_rocm2#amd-gpurocm-support)）。\n\n注意事项：\n1. 
目前我们尚未有所有组件完美兼容的Docker镜像。您可能需要调整某些包的版本（例如，在我们的案例中，曾将numpy降级至1.23.5），才能确保一切正常运行。\n\n### Intel Gaudi\nLLM Foundry对Intel Gaudi设备的支持尚处于实验阶段，请使用`habana_alpha`分支，并参考该分支的[README文件](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fhabana_alpha)，其中包含[安装说明及已知问题](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Ftree\u002Fhabana_alpha?tab=readme-ov-file#intel-gaudi)。\n\n关于Intel Gaudi2加速器在训练与推理方面的性能结果，请参阅我们的博客：https:\u002F\u002Fwww.databricks.com\u002Fblog\u002Fllm-training-and-inference-intel-gaudi2-ai-accelerators\n\n# 快速入门\n\n> **注意**\n> 请务必先完成上述安装步骤，再尝试快速入门！\n\n以下是准备 C4 数据集子集、训练一个 MPT-125M 模型 10 个批次、将模型转换为 HuggingFace 格式、在 Winograd 挑战上评估模型，并生成提示响应的端到端工作流。\n\n**(请记住，这只是用于演示工具的快速入门——要获得高质量的结果，LLM 至少需要训练超过 10 个批次 😄)**\n\n\u003C!--pytest.mark.skip-->\n```bash\ncd scripts\n\n# 将 C4 数据集转换为 StreamingDataset 格式\npython data_prep\u002Fconvert_dataset_hf.py \\\n  --dataset allenai\u002Fc4 --data_subset en \\\n  --out_root my-copy-c4 --splits train_small val_small \\\n  --concat_tokens 2048 --tokenizer EleutherAI\u002Fgpt-neox-20b --eos_text '\u003C|endoftext|>'\n\n# 训练一个 MPT-125m 模型，共 10 个批次\ncomposer train\u002Ftrain.py \\\n  train\u002Fyamls\u002Fpretrain\u002Fmpt-125m.yaml \\\n  variables.data_local=my-copy-c4 \\\n  train_loader.dataset.split=train_small \\\n  eval_loader.dataset.split=val_small \\\n  max_duration=10ba \\\n  eval_interval=0 \\\n  save_folder=mpt-125m\n\n# 将模型转换为 HuggingFace 格式\npython inference\u002Fconvert_composer_to_hf.py \\\n  --composer_path mpt-125m\u002Fep0-ba10-rank0.pt \\\n  --hf_output_path mpt-125m-hf \\\n  --output_precision bf16 \\\n  # --hf_repo_for_upload user-org\u002Frepo-name\n\n# 在部分任务子集上评估模型\ncomposer eval\u002Feval.py \\\n  eval\u002Fyamls\u002Fhf_eval.yaml \\\n  icl_tasks=eval\u002Fyamls\u002Fcopa.yaml \\\n  model_name_or_path=mpt-125m-hf\n\n# 生成提示响应\npython inference\u002Fhf_generate.py \\\n  --name_or_path mpt-125m-hf \\\n  --max_new_tokens 256 \\\n  --prompts \\\n    \"生命、宇宙以及幸福的答案是\" \\\n    
\"这里有一个快速烘焙巧克力曲奇饼干的食谱：首先\"\n```\n\n注意：上述用于训练模型的 `composer` 命令指的是 [Composer](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fcomposer) 库中的分布式启动器。\n\n如果你拥有可写权限的 [HuggingFace 认证令牌](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fsecurity-tokens)，你还可以选择将你的模型上传到 Hub！只需按如下方式导出你的令牌：\n\n```bash\nexport HF_TOKEN=your-auth-token\n```\n\n然后取消注释上面调用 `inference\u002Fconvert_composer_to_hf.py` 中包含 `--hf_repo_for_upload ...` 的那一行。\n\n# 注册表\n\n你可以使用注册表来自定义你的工作流程，而无需分叉该库。LLM Foundry 的一些组件是可以注册的，例如模型、日志记录器和回调函数。这意味着你可以为这些组件注册新的选项，然后在你的 YAML 配置中使用它们。\n\n## 发现可注册的组件\n为了帮助查找和理解可注册的组件，你可以使用 `llmfoundry registry` CLI 命令。\n\n目前我们提供了两个命令：\n- `llmfoundry registry get [--group]`：列出所有注册表及其组件，也可以选择指定某个特定的注册表。示例用法：`llmfoundry registry get --group loggers` 或 `llmfoundry registry get`\n- `llmfoundry registry find \u003Cgroup> \u003Cname>`：获取关于特定已注册组件的信息。示例用法：`llmfoundry registry find loggers wandb`\n\n对于这些命令中的任何一个，都可以使用 `--help` 查看更多信息。\n\n这些命令还可以帮助你了解每个注册表的具体组成，因为每个注册表都包含一段文档字符串，会在打印时显示出来。其基本概念是，每个注册表定义了一个接口，而注册到该注册表的组件必须实现这个接口。如果当前库中存在某些无法扩展的部分，但你认为应该可以扩展，请提交一个问题！\n\n## 如何注册\n有几种方法可以注册一个新的组件：\n\n### Python 入口点\n如果你正在构建一个包含注册组件的自有包，可以通过 Python 入口点来指定注册的组件。\n这通常是当你计划对 LLM Foundry 进行大规模扩展并覆盖许多组件时的预期用法。需要注意的是，通过入口点注册的内容会覆盖直接在代码中注册的组件。\n\n例如，以下内容将在 `llm_foundry.loggers` 注册表中以 `my_logger` 为键注册 `MyLogger` 类：\n\n\u003C!--pytest.mark.skip-->\n```yaml\n[build-system]\nrequires = [\"setuptools>=42\", \"wheel\"]\nbuild-backend = \"setuptools.build_meta\"\n\n[project]\nname = \"foundry_registry\"\nversion = \"0.1.0\"\ndependencies = [\n    \"mosaicml\",\n    \"llm-foundry\",\n]\n\n# 注意：尽管在 Python 代码中应为 llmfoundry.registry.loggers，\n# 但在 entry_points 中必须写成 \"llmfoundry_loggers\"。也就是说，\n# 名称各部分之间需要用下划线连接，在 entry_points 部分也是如此。\n[project.entry-points.\"llmfoundry_loggers\"]\nmy_logger = \"foundry_registry.loggers:MyLogger\"\n```\n\n如果通过入口点开发新组件，需要注意的是，Python 入口点在整个 Python 环境中是全局的。这意味着，如果有多个包使用相同的键注册组件，最后安装的那个包所使用的组件将会生效。这在覆盖 LLM Foundry 中的组件时非常有用，但也可能因疏忽而导致意外行为。此外，如果你修改了 pyproject.toml 
文件，就需要重新安装该包才能使更改生效。你可以通过运行 `pip install -e . --no-deps` 来快速完成这一操作，从而避免重新安装依赖项。\n\n### 直接调用注册\n你也可以直接在代码中注册一个组件：\n\n\u003C!--pytest.mark.skip-->\n```python\nfrom composer.loggers import LoggerDestination\nfrom llmfoundry.registry import loggers\n\nclass MyLogger(LoggerDestination):\n    pass\n\nloggers.register(\"my_logger\", func=MyLogger)\n```\n\n### 装饰器\n你还可以使用装饰器直接从代码中注册组件：\n\n\u003C!--pytest.mark.skip-->\n```python\nfrom composer.loggers import LoggerDestination\nfrom llmfoundry.registry import loggers\n\n@loggers.register(\"my_logger\")\nclass MyLogger(LoggerDestination):\n    pass\n```\n\n无论是直接调用还是使用装饰器的方法，如果你使用 LLM Foundry 的训练\u002F评估脚本，都需要提供 `code_paths` 参数，即一个列表，其中包含了执行以注册你的组件所需的文件路径。例如，你可能有一个名为 `foundry_imports.py` 的文件，内容如下：\n\n\u003C!--pytest.mark.skip-->\n```python\nfrom foundry_registry.loggers import MyLogger\nfrom llmfoundry.registry import loggers\n\nloggers.register(\"my_logger\", func=MyLogger)\n```\n\n然后你需要在你的 YAML 配置中为训练\u002F评估脚本提供 `code_paths`：\n\n\u003C!--pytest.mark.skip-->\n```yaml\n...\ncode_paths:\n  - foundry_imports.py\n...\n```\n\n以上方法之一通常适用于你只对 LLM Foundry 进行小规模扩展，仅覆盖少数几个组件，因此不想创建整个包的情况。\n\n# 了解更多关于 LLM Foundry 的信息！\n\n请查看 [TUTORIAL.md](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fmain\u002FTUTORIAL.md)，继续学习如何使用 LLM Foundry。该教程重点介绍了示例工作流，引导您找到仓库中的其他资源，并解答了常见问题！\n\n# 联系我们\n\n如果您在使用代码时遇到任何问题，请直接向本仓库提交 GitHub 问题。\n\n如果您希望在 MosaicML 平台上训练大语言模型，请通过 [demo@mosaicml.com](mailto:demo@mosaicml.com) 与我们联系！","# LLM Foundry 快速上手指南\n\nLLM Foundry 是一个用于训练、微调、评估和部署大语言模型（LLM）的开源代码库，基于 [Composer](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fcomposer) 构建。它支持从 1.25 亿到 700 亿参数的 HuggingFace 和 MPT 模型，并提供了 DBRX 和 MPT 系列模型的官方支持。\n\n## 环境准备\n\n### 系统要求\n*   **硬件**: 推荐使用 NVIDIA A100 (40GB\u002F80GB) 或 H100 (80GB) GPU。虽然可能在消费级 NVIDIA 显卡或 AMD 显卡上运行，但未经过官方全面测试。\n*   **软件版本**:\n    *   PyTorch: 2.7.0 (推荐)\n    *   CUDA: 12.8\n*   **前置依赖**: 需预先安装 `cmake`, `packaging`, `torch`。\n\n### Docker 环境（强烈推荐）\n为了简化环境配置，建议使用官方预建的 Docker 镜像。\n* 
  **基础镜像**: `mosaicml\u002Fpytorch:2.7.0_cu128-python3.12-ubuntu22.04` (仅含 PyTorch 与 CUDA，未安装 LLM Foundry 依赖)\n*   **完整镜像**: `mosaicml\u002Fllm-foundry:2.7.0_cu128-latest` (含 LLM Foundry 依赖，但仍需安装包本身)\n\n> **注意**：国内用户若拉取 Docker Hub 较慢，可配置阿里云、腾讯云等国内容器镜像加速器。\n\n## 安装步骤\n\n### 方式一：使用 Docker（推荐）\n\n1.  启动容器并挂载代码目录（示例）：\n    ```bash\n    docker run --gpus all -it --rm -v $(pwd):\u002Fworkspace mosaicml\u002Fllm-foundry:2.7.0_cu128-latest bash\n    ```\n2.  在容器内克隆仓库并安装：\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry.git\n    cd llm-foundry\n    # 可选：配置 pip 国内源加速安装\n    # pip config set global.index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    pip install -e \".[gpu]\"\n    ```\n    *(若无 NVIDIA GPU，请使用 `pip install -e .`)*\n\n### 方式二：本地环境安装（不推荐）\n\n若不使用 Docker，请创建虚拟环境进行安装：\n\n```bash\n# 1. 克隆仓库\ngit clone https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry.git\ncd llm-foundry\n\n# 2. 创建并激活虚拟环境\npython3 -m venv llmfoundry-venv\nsource llmfoundry-venv\u002Fbin\u002Factivate\n\n# 3. 安装基础依赖\npip install cmake packaging torch\n\n# 4. 
安装 LLM Foundry (建议配置国内 pip 源)\n# pip config set global.index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\npip install -e \".[gpu]\"\n```\n\n### 可选：启用 FP8 支持 (仅限 H100)\n如果未使用官方 Docker 镜像且需要在 H100 上启用 FP8 加速，需额外安装：\n```bash\npip install flash-attn --no-build-isolation\npip install git+https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTransformerEngine.git@stable\n```\n\n## 基本使用\n\nLLM Foundry 提供了多种脚本用于数据处理、训练和推理。以下是最基础的本地交互式推理示例。\n\n### 运行交互式生成 (Inference)\n\n使用提供的脚本加载 HuggingFace 格式的模型（如 MPT-7B）并进行对话或文本生成。\n\n**单轮生成示例:**\n```bash\npython scripts\u002Finference\u002Fhf_generate.py \\\n    --name_or_path mosaicml\u002Fmpt-7b \\\n    --prompts \"Hello, how are you?\" \\\n    --max_new_tokens 100\n```\n\n**交互式聊天示例:**\n```bash\npython scripts\u002Finference\u002Fhf_chat.py \\\n    --name_or_path mosaicml\u002Fmpt-7b-instruct\n```\n\n### 主要功能模块说明\n*   **数据准备**: `scripts\u002Fdata_prep\u002F` - 将原始文本转换为 StreamingDataset 格式。\n*   **训练\u002F微调**: `scripts\u002Ftrain\u002F` - 支持训练或微调 125M 至 70B 参数量的模型。\n*   **评估**: `scripts\u002Feval\u002F` - 在学术基准或自定义任务上评估模型性能。\n*   **MCLI 集成**: 通过 `mcli\u002F` 目录可直接在 MosaicML 平台上启动大规模训练任务。\n\n更多详细工作流和高级用法请参考项目根目录下的 `TUTORIAL.md` 文件。","某金融科技公司的大模型团队需要在私有云环境中，基于海量内部合规文档训练一个具备长上下文理解能力的专属客服模型。\n\n### 没有 llm-foundry 时\n- **数据预处理繁琐**：团队需手动编写复杂的 ETL 脚本将原始文本转换为模型可接受的格式，缺乏统一的 StreamingDataset 支持，导致数据加载成为训练瓶颈。\n- **训练稳定性差**：在尝试复现 Flash Attention 或 ALiBi 等先进架构时，常因代码兼容性问题遭遇损失值尖峰（loss spikes），甚至导致训练中途崩溃。\n- **资源利用率低**：缺乏专业的性能分析工具，难以定位多卡训练中的通信瓶颈，昂贵的 GPU 集群实际算力利用率（MFU）长期徘徊在低位。\n- **评估流程割裂**：模型训练完成后，需额外开发脚本才能进行学术基准测试或自定义任务评估，迭代反馈周期长达数天。\n\n### 使用 llm-foundry 后\n- **数据流水线标准化**：直接利用内置的 `data_prep` 脚本，一键将异构文本数据高效转换为优化的 StreamingDataset 格式，大幅缩短数据准备时间。\n- **训练高效且稳定**：开箱即用地集成 Flash Attention 和 ALiBi 技术，配合内置的稳定性优化策略，轻松训练出支持 8k+ 上下文的 MPT 模型且无损失震荡。\n- **极致性能调优**：通过自带的基准测试脚本实时监控吞吐量与 MFU，快速识别并解决性能瓶颈，使集群算力利用率显著提升。\n- **全流程闭环**：在同一框架下完成从训练、微调到基于学术\u002F自定义任务的自动化评估，将模型迭代周期从“天”级压缩至“小时”级。\n\nllm-foundry 
通过提供一站式、高稳定性的训练基础设施，让企业能够以最低的工程成本快速构建并部署生产级大语言模型。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmosaicml_llm-foundry_d68c2d9f.png","mosaicml","Databricks Mosaic Research","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmosaicml_b4287c5e.png","We remove the barriers to state-of-the-art generative AI model development and make data + AI available to all",null,"DbrxMosaicAI","https:\u002F\u002Fwww.databricks.com\u002Fresearch\u002Fmosaic","https:\u002F\u002Fgithub.com\u002Fmosaicml",[84,88,92,96],{"name":85,"color":86,"percentage":87},"Python","#3572A5",98.3,{"name":89,"color":90,"percentage":91},"Shell","#89e051",1.7,{"name":93,"color":94,"percentage":95},"Makefile","#427819",0,{"name":97,"color":98,"percentage":95},"Dockerfile","#384d54",4397,587,"2026-04-03T13:11:15","Apache-2.0","Linux","训练必需 NVIDIA GPU (已测试 A100-40GB\u002F80GB, H100-80GB)；推理可能支持消费级 NVIDIA 或 AMD 显卡但未积极测试。CUDA 版本需 12.8。","未说明",{"notes":107,"python":108,"dependencies":109},"强烈建议使用官方提供的 Docker 镜像 (mosaicml\u002Fllm-foundry) 进行部署，镜像已预装大部分依赖但需单独 pip install llm-foundry。若不使用 Docker，需手动安装 Flash Attention 和 TransformerEngine 以支持 H100 的 FP8 特性。代码库主要基于 PyTorch 2.7.0 和 CUDA 12.8 在 A100\u002FH100 上测试通过。","3.12",[110,111,112,113,114],"torch==2.7.0","cmake","packaging","flash-attn","transformerengine",[26,13],[117,118,119,120,121],"deep-learning","llm","neural-networks","nlp","pytorch","2026-03-27T02:49:30.150509","2026-04-06T07:12:50.592513",[125,130,135,140],{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},14676,"MPT 模型是否支持 FasterTransformer？是否有相关的转换脚本？","FasterTransformer 将继续支持现有的 MPT 模型，但官方不再投入精力确保未来模型的兼容性。由于 FasterTransformer 的最后一次发布是在 1 月份，官方建议转向其他开发更活跃的服务，例如 LightLLM 或 VLLM。如果必须使用，可以参考社区提供的 config.pbtxt 配置示例，其中包含 backend 设置为 \"fastertransformer\" 以及 input_ids、input_lengths 等输入参数的详细定义。","https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fissues\u002F67",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},14677,"如何使用本地数据集对 MPT 模型进行微调？需要转换为 HuggingFace 
数据集格式吗？","可以直接使用本地的 jsonl 文件路径进行微调，无需预先转换为标准的 HuggingFace 数据集对象。在 YAML 配置文件中，可以通过 `hf_kwargs` 指定 `data_files` 参数指向本地文件路径（例如：`train: \u002Fpath\u002Fto\u002Ftrain.jsonl`），并设置 `hf_name` 为任意标识符（如 'my-local-dataset'）。同时需要指定 `preprocessing_fn` 来处理数据加载逻辑。如果在运行中遇到损失不收敛等问题，可能需要调整 batch size 等超参数。","https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fissues\u002F94",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},14678,"使用 LION 优化器时损失函数无法收敛怎么办？","这通常不是代码 Bug，而是 LION 优化器本身对超参数比较敏感。官方建议不要直接套用默认设置，而需要针对具体任务实验调整学习率调度（schedule）、batch size 和数据分布。此外，注意力机制的实现方式（如 `attn_impl` 设置为 `torch`）也会影响结果。可以参考 lucidrains 的 LION PyTorch 实现仓库获取调优建议。如果遇到类似损失尖峰或不下降的情况，优先尝试减小 batch size 或调整学习率。","https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fissues\u002F317",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},14679,"如何在 MPT-30B 中实现函数调用（Function Calling）支持？","目前官方尚未提供开源的微调模型直接支持函数调用，但社区提出了基于 ChatML 格式的扩展方案。建议在系统提示词（system prompt）中加入函数定义的 Schema（例如使用 TypeScript 风格定义命名空间和函数类型），并在对话中引入新的角色标签，如 `\u003C|im_start|>function` 用于模型生成函数调用，`\u003C|im_start|>tool` 用于返回函数执行结果。具体的上下文构建方式是将函数定义放在第一个 system block 中，随后跟随正常的用户和助手对话。","https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fissues\u002F379",[146,151,156,161,166,171,176,181,186,191,196,201,206,211,216,221,226,231,236,241],{"id":147,"version":148,"summary_zh":149,"released_at":150},81585,"v0.22.0","## 变更内容\n* 将测试改为使用 Foundry 镜像，由 @bowenyang008 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1839 中完成\n* 为支持 Qwen2 系列模型，在 QKV 输入投影中修复偏差问题，由 @gupta-abhay 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1837 中完成\n* 修复与 attention_bias 的向后兼容性，由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1840 中完成\n* 将 Head Dim 添加为 Attention 和 Rope 的可配置参数，由 @jdchang1 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1842 中完成\n* 支持重用 W_k 和 W_v 的输入，由 @ShashankMosaicML 在 
https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1710 中完成\n* 在错误信息中显示数据类型，由 @emmanuel-ferdman 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1834 中完成\n* 更新了 Composer 和 Streaming 的版本，由 @ethantang-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1857 中完成\n* 更新并限制了我们的 Composer 和 MLflow 库版本，由 @ethantang-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1859 中完成\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.21.0...v0.22.0","2025-07-29T00:40:29",{"id":152,"version":153,"summary_zh":154,"released_at":155},81586,"v0.21.0","## 简要总结\n* Torch 版本已升级至 2.7.0\n* 通过 ENV 变量 FSDP_VERSION=2 支持 FSDP2。目前仅支持预训练（使用 meta 初始化）。启用 FSDP2 不需要修改 YAML 文件，仅适用于 FSDP(1) 的属性将被忽略并发出警告。更多详情请参阅 Composer [发布](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fcomposer\u002Freleases)\n\n## 变更内容\n* 在块覆盖中添加对 nope 位置编码的支持。由 @ShashankMosaicML 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1794 中实现\n* 将 foundry 版本升级至 0.21.0.dev0。由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1812 中完成\n* 在注意力机制中添加温度调节功能。由 @ShashankMosaicML 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1793 中实现\n* 更新 MCLI YAML 文件中的 foundry 版本。由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1813 中完成\n* 升级 yapf 版本。由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1814 中完成\n* 允许为 llama4 子选择合适的配置文件。由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1815 中完成\n* 将 RMSNorm 更改为使用 PyTorch 原生实现。由 @josejg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1809 中完成\n* 更新 datasets 依赖版本，从 \u003C3.6,>=3.3.2 调整为 >=3.3.2,\u003C3.7。由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1817 中完成\n* 将 onnxruntime 从 1.19.2 
升级至 1.22.0。由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1819 中完成\n* 更新 huggingface-hub[hf_xet] 依赖版本，从 \u003C0.31,>=0.30.0 调整为 >=0.30.0,\u003C0.32。由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1818 中完成\n* 弃用推理 API 包装器。由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1821 中完成\n* 修复 Dtensor 初始化问题。由 @bowenyang008 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1820 中完成\n* 更新 accelerate 依赖版本，从 \u003C1.7,>=0.25 调整为 >=0.25,\u003C1.8。由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1824 中完成\n* 将 onnx 从 1.17.0 升级至 1.18.0。由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1823 中完成\n* 为 Python 3.12 更新 docformatter，并将 blank_line_before_module_docstring 设置为 false。由 @sashaDoubov 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1825 中完成\n* 删除无用的 print(\"here\") 语句。由 @tsebaka 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1826 中完成\n* 将 CI 测试版本更新至最新。由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1827 中完成\n* 将 coverage[toml] 从 7.8.0 升级至 7.8.2。由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1830 中完成\n* 实现可配置的分片大小。由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1833 中完成\n* 将 Composer 升级至 0.31.0。由 @bowenyang008 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1835 中完成\n* 修复与 Composer 主分支不兼容的单体检查点保存问题。由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1836 中完成\n* 将 torch 版本升级至 2.7。由 @bowenyang008 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1832 中完成\n* 将 huggingface-hub 的上限版本提升至 0.33。由 @bowenyang008 在 
https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1834 中完成","2025-05-31T00:57:58",{"id":157,"version":158,"summary_zh":159,"released_at":160},81587,"v0.20.0","## 变更内容\n* 由 @KuuCi 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1778 中将 Dev 版本升级至 0.20.0.dev0\n* 由 @KuuCi 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1779 中将示例 YAML 文件中的版本更新为 0.19.0\n* 由 @ethantang-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1781 中使分词器在构建 LLM 时变为可选\n* 由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1780 中移除了 CI 过程中对 Hugging Face 的更多调用\n* 由 @adyasha-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1787 中修改了多模态消息的验证检查\n* 由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1786 中移除了 CI 中与 Hugging Face 的所有连接\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1788 中将 transformers 的依赖范围从 \u003C4.50,>=v4.49.0 更新为 >=v4.49.0,\u003C4.52\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1776 中将 einops 从 0.8.0 升级至 0.8.1\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1775 中将 gitpython 从 3.1.43 升级至 3.1.44\n* 由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1790 中将 transformers 更新至 4.51\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1796 中将 setuptools 的依赖范围从 \u003C78.0.0 更新为 \u003C80.0.0\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1797 中将 tiktoken 的依赖范围从 \u003C0.8.1,>=0.4 更新为 >=0.4,\u003C0.9.1\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1800 中将 packaging 的依赖范围从 \u003C25,>=21 更新为 >=21,\u003C26\n* 由 @dependabot 在 
https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1799 中将 accelerate 的依赖范围从 \u003C1.4,>=0.25 更新为 >=0.25,\u003C1.7\n* 由 @ethantang-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1792 中扩展了 hf_checkpointer，以支持保存任何额外内容\n* 由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1795 中实现了仅在全局 Rank 0 加载模型的混合初始化方式\n* 由 @ethantang-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1801 中为 hf_base.py 添加了 attn_implementation 参数\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1803 中将 setuptools 的依赖范围从 \u003C80.0.0 更新为 \u003C81.0.0\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1807 中将 datasets 的依赖范围从 \u003C3.4,>=3.3.2 更新为 >=3.3.2,\u003C3.6\n* 由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1810 中更新了 grouped-gemm 的版本\n* 由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1811 中移除了部分已弃用的旧代码和注释\n\n## 新贡献者\n* @ethantang-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1781 中做出了首次贡献\n* @adyasha-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1787 中做出了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.19.0...v0.20.0","2025-04-29T20:09:02",{"id":162,"version":163,"summary_zh":164,"released_at":165},81588,"v0.19.0","## 新增功能\n### 1. 
升级到 Python 3.12（https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1755）\n我们新增了对 Python 3.12 的支持，并弃用了对 Python 3.9 的支持。\n\n## 变更内容\n* 使用 llmfoundry 镜像代替 PyTorch 镜像进行 GPU 测试，由 @rithwik-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1752 中完成\n* 将开发版本号提升至 0.19.0.dev0，由 @rithwik-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1753 中完成\n* 更新 mcli YAML 示例以使用 0.18.0 和 PyTorch 2.6，由 @rithwik-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1754 中完成\n* 修复使用 HF 模型和 TE Layers 进行 FSDP 训练时的元数据初始化问题，由 @jjuvonen-amd 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1745 中完成\n* 修复 `llmfoundry\u002Fdata\u002Ftext_data.py` 中的 bug，由 @gsganden 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1760 中完成\n* 将 setuptools 的版本要求从 \u003C76.0.0 更新至 \u003C78.0.0，由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1758 中完成\n* 更新 README.md，由 @gsganden 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1721 中完成\n* 添加针对通用表格下载错误的错误处理机制，由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1761 中完成\n* 稍微修改了打包逻辑以支持继承，由 @abaheti95 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1762 中完成\n* 移除注册回退机制，由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1764 中完成\n* 将计划器的保存与加载操作移至配置日志记录之后，由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1769 中完成\n* 升级到 Python 3.12，由 @KuuCi 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1755 中完成\n* 修复 GPU 测试中的 Python 3.10 相关问题，由 @KuuCi 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1770 中完成\n* 移除测试中大量重复调用 Hugging Face API 的代码，由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1768 中完成\n* 将 coverage[toml] 
的版本从 7.6.10 更新至 7.8.0，由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1767 中完成\n* 更新 mlflow 的版本要求，从 \u003C2.19,>=2.14.1 调整为 >=2.14.1,\u003C2.22，由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1766 中完成\n* 升级 Composer 至 0.30.0，由 @KuuCi 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1772 中完成\n* 升级 streaming 至 0.12.0，由 @KuuCi 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1777 中完成\n\n## 新贡献者\n* @jjuvonen-amd 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1745 中完成了首次贡献\n* @gsganden 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1760 中完成了首次贡献\n* @abaheti95 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1762 中完成了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.18.0...v0.19.0","2025-04-07T20:25:51",{"id":167,"version":168,"summary_zh":169,"released_at":170},81589,"v0.18.0","## 变更内容\n- Torch 已升级至 `2.6.0`（在 #1740 中）\n    - 在最新的 megablocks 版本中，稀疏支持已被禁用（作为最新 Torch 升级的一部分），我们也相应地将这些禁用措施同步到了 llm-foundry 中（更多详情请参阅 [megablocks 发布说明](https:\u002F\u002Fgithub.com\u002Fdatabricks\u002Fmegablocks\u002Freleases\u002Ftag\u002Fv0.8.0)）。\n- 由于版本兼容性问题，`TransformerEngine` 已从 `all` 依赖组中移除（在 #1742 中）。我们预计将在未来的版本中重新添加该组件。\n- Transformers 已升级至 `v4.49.0`（在 #1735 中），这会导致主权重采用 `torch.bfloat16` 格式（更多信息请参阅 https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fissues\u002F36567）。由于 `llm-foundry` 不支持低精度的主权重，因此我们在加载时手动将其硬编码为 `torch.float32`（见 #1734）。\n\n## 详细变更\n* 移除已弃用参数，由 @bigning 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1727 中完成。\n* 为 FA 2.7.1.post1 升级而更新 TE，由 @KuuCi 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1730 中完成。\n* 修复 Transformers 中的数据类型问题，由 @dakinggg 在 
https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1734 中完成。\n* 将 Composer 升级至 0.29.0，由 @rithwik-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1733 中完成。\n* 将 Transformer 升级至 v4.49.0，由 @KuuCi 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1735 中完成。\n* 将 FA2 升级至 2.7.4.post1，由 @KuuCi 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1728 中完成。\n* 注释 GHCR 镜像上传相关代码，由 @KuuCi 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1739 中完成。\n* 从 all 依赖组中移除 TE，由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1742 中完成。\n* 将 Torch 升级至 2.6，由 @rithwik-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1740 中完成。\n* 更新 Makefile 以使用 WORLD_SIZE，由 @irenedea 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1751 中完成。\n\n## 新贡献者\n* @rithwik-db 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1733 中完成了首次贡献。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.17.1...v0.18.0","2025-03-18T18:31:07",{"id":172,"version":173,"summary_zh":174,"released_at":175},81590,"v0.17.1","## 新增功能\n### 数据集版本升级 (#1724)\n我们已将 Hugging Face 数据集库的版本升级，以修复在分词或过滤后多进程池容易挂起的常见问题。\n\n## 变更内容\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1714 中将 accelerate 的依赖版本从 \u003C1.2,>=0.25 更新为 >=0.25,\u003C1.4\n* 由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1724 中升级了 datasets 版本\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.17.0...v0.17.1","2025-02-21T22:12:05",{"id":177,"version":178,"summary_zh":179,"released_at":180},81591,"v0.17.0","## 变更内容\n* 更新 mcli 示例以使用 0.16.0 版本，由 @irenedea 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1713 中完成。\n* 重构 HF 检查点程序，由 
@milocress 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1690 中完成。\n  之前，MlFlow 要求将 PEFT 模型指定为与 Transformers 模型不同的特殊“风味”。现在不再需要这种变通方法，因此我们可以简化代码路径，并将上传 HuggingFace 检查点与注册训练好的模型清晰地分离。\n* 将版本号提升至 0.18.0.dev，由 @milocress 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1717 中完成。\n  移除了 `mpt` 损失计算中已弃用的 `sample_weighing_factor` 参数。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.16.0...v0.17.0","2025-01-30T23:53:25",{"id":182,"version":183,"summary_zh":184,"released_at":185},81592,"v0.16.0","## 新增功能\n### Streaming 0.11.0 🚀 (#1711)\n我们已将 [streaming](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fstreaming) 升级至 0.11.0。现在，通过注册表，StreamingDataset 可以与自定义的 Stream 实现一起使用。有关示例用法，请参阅[文档页面](https:\u002F\u002Fdocs.mosaicml.com\u002Fprojects\u002Fstreaming\u002Fen\u002Fstable\u002Fdataset_configuration\u002Fmixing_data_sources.html)。\n\n## 变更内容\n* 由 @j316chuck 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1688 中修复 llama3 示例 YAML 文件。\n* 由 @snarayan21 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1689 中更新示例 YAML 文件，以使用最新的 foundry 版本。\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1670 中将 datasets 的依赖版本从 \u003C2.21,>=2.20.0 更新为 >=2.20.0,\u003C3.2。\n* 由 @KuuCi 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1697 中处理源数据集中多个斜杠的问题，将其合并为一个斜杠。\n* 由 @snarayan21 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1701 中使加载的 PEFT 适配器可选地设置为可训练。\n* 由 @ShashankMosaicML 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1700 中为 QA 和 messages 数据集添加预处理器。\n* 由 @b-chu 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1704 中更新 pycln。\n* 由 @b-chu 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1703 中添加权限错误处理。\n* 由 @dependabot 在 
https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1698 中将 datasets 的依赖版本从 \u003C3.2,>=2.20.0 更新为 >=2.20.0,\u003C3.3。\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1702 中将 coverage[toml] 从 7.6.4 升级至 7.6.10。\n* 由 @es94129 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1711 中将 mosaicml-streaming 更新至 0.11.0。\n* 由 @irenedea 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1712 中将版本号提升至 0.17.0.dev0。\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.15.1...v0.16.0","2025-01-17T19:34:50",{"id":187,"version":188,"summary_zh":189,"released_at":190},81593,"v0.15.1","## 变更内容\n* 由 @j316chuck 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1667 中将版本号升级至 0.16.0.dev0\n* 由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1673 中将 mlflow 的依赖范围从 \u003C2.18,>=2.14.1 更新为 >=2.14.1,\u003C2.19\n* 由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1668 中加速了嵌入测试\n* 由 @j316chuck 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1674 中添加了 mcli 的 YAML 版本升级\n* 由 @snarayan21 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1684 中升级了 OpenAI 的版本\n* 由 @snarayan21 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1685 中将 Streaming 升级至 v0.10.0\n* 由 @mattyding 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1679 中修复了在使用流式数据且无远程路径时的自动打包问题\n* 由 @snarayan21 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1687 中将 Composer 升级至 v0.28.0\n* 由 @janEbert 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1677 中公开了 `DistributedSampler` 的随机种子参数\n* 由 @j316chuck 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1686 中添加了 Llama3 微调示例的 YAML 
文件\n\n## 新贡献者\n* @janEbert 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1677 中完成了首次贡献\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.15.0...v0.15.1","2024-12-05T20:59:24",{"id":192,"version":193,"summary_zh":194,"released_at":195},81594,"v0.15.0","## 新特性\n\n### 开源嵌入模型 + 对比学习代码（https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1615）\n\nLLM Foundry 现在支持使用对比损失对嵌入模型进行微调。Foundry 目前支持多种选择对比损失中负样本段落的方法，这些负样本可以是随机选取的，也可以是预先定义好的。更多信息请参阅 [README](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fblob\u002Fmain\u002Fllmfoundry\u002Fmodels\u002Fllm_embed\u002FREADME.md)。\n\n### PyTorch 2.5.1（https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1665）\n\n此版本将 LLM Foundry 更新至 PyTorch 2.5.1，带来了对 PyTorch 2.5.1 中新特性和优化的支持。\n\n### 改进的错误信息（https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1657、https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1660、https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1623、https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1625）\n\n提供了多项改进的错误信息，使用户错误的调试更加清晰。\n\n\n## 变更内容\n* 更新 mcli 示例以使用 0.14.0，由 @irenedea 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1624 中完成\n* 开源嵌入模型 + 对比学习代码，由 @KuuCi 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1615 中实现\n* 捕获 Delta 表未找到的错误，由 @milocress 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1625 中完成\n* 添加 Mlflow 403 PL 用户错误处理，由 @mattyding 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1623 中实现\n* 捕捉数据准备集群启动失败的情况，由 @milocress 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1628 中完成\n* 提升 Mlflow 的最大版本号，由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1629 中完成\n* 再次封装集群连接失败的异常处理，由 @milocress 在 
https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1630 中实现\n* 添加 Mlflow `log_model` 选项，由 @nancyhung 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1544 中完成\n* 将损失生成相关的 token 计数逻辑移至数据加载器中，由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1632 中完成\n* 将 Databricks-Connect 版本从 14.1.0 升级至 15.4.3，由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1636 中完成\n* 修复数据集下载路径问题，由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1639 中完成\n* 撤销“将 Databricks-Connect 版本从 14.1.0 升级至 15.4.3”的更改，由 @XiaohanZhangCMU 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1640 中完成\n* 提升 Transformers 版本，由 @dakinggg 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1631 中完成\n* 修复 GPU 测试中的 test_tp_train 和 test_huggingface_conversion_callback_interval，由 @irenedea 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1642 中完成\n* 更新 datasets 依赖项，由 \u003C2.20,>=2.19 调整为 >=2.20.0,\u003C2.21，由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1330 中完成\n* 为 Transformers 的 save_pretrained 方法添加最大分片大小参数，由 @b-chu 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1648 中完成\n* 更新 huggingface-hub 的依赖项，由 \u003C0.25,>=0.19.0 调整为 >=0.19.0,\u003C0.27，由 @dependabot 在 https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1652 中完成\n* 更新 acce","2024-11-23T02:13:34",{"id":197,"version":198,"summary_zh":199,"released_at":200},81595,"v0.14.5","* Move transform_model_pre_registration in hf_checkpointer (https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1664)\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.14.4...v0.14.5","2024-11-18T17:15:22",{"id":202,"version":203,"summary_zh":204,"released_at":205},81596,"v0.14.4","* Add max shard size to transformers save_pretrained by @b-chu in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1648\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.14.3...v0.14.4","2024-11-07T20:42:47",{"id":207,"version":208,"summary_zh":209,"released_at":210},81597,"v0.14.3","## What's Changed\r\n* Fix dataset download location by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1639\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.14.2...v0.14.3","2024-11-05T15:41:03",{"id":212,"version":213,"summary_zh":214,"released_at":215},81598,"v0.14.2","## Bug Fixes\r\n### Move loss generating token counting to the dataloader (#1632)\r\nFixes a throughput regression due to https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1610, which was released in v0.14.0.\r\n\r\n## What's Changed\r\n* Move loss generating token counting to the dataloader by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1632\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.14.1...v0.14.2","2024-11-04T02:14:03",{"id":217,"version":218,"summary_zh":219,"released_at":220},81599,"v0.14.1","## New Features\r\n### Use log_model for registering models (#1544)\r\nInstead of calling the mlflow register API directly, we use the intended `log_model` API, which both logs the model to the mlflow run artifacts and registers it to Unity Catalog.\r\n\r\n## What's Changed\r\n* Catch delta table not found error by @milocress in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1625\r\n* Add 
Mlflow 403 PL UserError @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1623\r\n* Catches when data prep cluster fails to start by @milocress in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1628\r\n* add another cluster connection failure wrapper by @milocress in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1630\r\n* Use log_model API to register the model by @nancyhung @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1544\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.14.0...v0.14.1","2024-11-01T23:55:45",{"id":222,"version":223,"summary_zh":224,"released_at":225},81600,"v0.14.0","## New Features\r\n### Load Checkpoint Callback (#1570)\r\nWe added support for Composer's LoadCheckpoint [callback](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fcomposer\u002Fblob\u002F28756dd52e96371689b764cb72c336406460ad35\u002Fcomposer\u002Fcallbacks\u002Fload_checkpoint.py#L18), which loads a checkpoint at a specified event. This enables use cases like loading model base weights with peft.\r\n```\r\ncallbacks:\r\n    load_checkpoint:\r\n        load_path: \u002Fpath\u002Fto\u002Fyour\u002Fweights\r\n```\r\n\r\n## Breaking Changes\r\n### Accumulate over tokens in a Batch for Training Loss (#1618,#1610,#1595)\r\nWe added a new flag `accumulate_train_batch_on_tokens` which specifies whether training loss is accumulated over the number of tokens in a batch, rather than the number of samples. It is true by default. This will slightly change loss curves for models trained with padding. The old behavior can be recovered by simply setting this to False explicitly. 
\r\n\r\n### Default Run Name (#1611)\r\nIf no run name is provided, we now will default to using composer's [randomly generated run names](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fcomposer\u002Fblob\u002Fmain\u002Fcomposer\u002Ftrainer\u002Ftrainer.py#L549). (Previously, we defaulted to using \"llm\" for the run name.)\r\n\r\n## What's Changed\r\n* Update mcli examples to use 0.13.0 by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1594\r\n* Pass accumulate_train_batch_on_tokens through to composer by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1595\r\n* Loosen MegaBlocks version pin by @mvpatel2000 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1597\r\n* Add configurability for hf checkpointer register timeout by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1599\r\n* Loosen MegaBlocks to \u003C1.0 by @mvpatel2000 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1598\r\n* Finetuning dataloader validation tweaks by @mvpatel2000 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1600\r\n* Bump onnx from 1.16.2 to 1.17.0 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1604\r\n* Remove TE from dockerfile and instead add as optional dependency by @snarayan21 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1605\r\n* Data prep on multiple GPUs by @eitanturok in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1576\r\n* Add env var for configuring the maximum number of processes to use for dataset processing by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1606\r\n* Updated error message for cluster check by @nancyhung in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1602\r\n* Use fun 
default composer run names by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1611\r\n* Ensure log messages are properly formatted again by @snarayan21 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1614\r\n* Add UC not enabled error for delta to json conversion by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1613\r\n* Use a temporary directory for downloading finetuning dataset files by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1608\r\n* Bump composer version to 0.26.0 by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1616\r\n* Add loss generating token counts by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1610\r\n* Change accumulate_train_batch_on_tokens default to True by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1618\r\n* Bump version to 0.15.0.dev0 by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1621\r\n* Add load checkpoint callback by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1570\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.13.0...v0.14.0","2024-10-28T22:41:18",{"id":227,"version":228,"summary_zh":229,"released_at":230},81601,"v0.13.1","# 🚀 LLM Foundry v0.13.1\r\n\r\n## What's Changed\r\n* Add configurability to HF checkpointer timeout by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1599\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.13.0...v0.13.1","2024-10-18T16:50:38",{"id":232,"version":233,"summary_zh":234,"released_at":235},81602,"v0.13.0","# 🚀 LLM Foundry v0.13.0\r\n\r\n## 🛠️ Bug Fixes & Cleanup \r\n### 
Pytorch 2.4 Checkpointing (#1569, #1581, #1583)\r\nResolved issues related to checkpointing for Curriculum Learning (CL) callbacks. \r\n\r\n## 🔧 Dependency Updates\r\nBumped tiktoken from 0.4.0 to 0.8.0 (#1572) \r\nUpdated onnxruntime from 1.19.0 to 1.19.2 (#1590) \r\n\r\n## What's Changed\r\n* Update mcli yamls by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1552\r\n* Use `allenai\u002Fc4` instead of `c4` dataset by @eitanturok in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1554\r\n* Tensor Parallelism by @eitanturok in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1521\r\n* Insufficient Permissions Error when trying to access table by @KuuCi in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1555\r\n* Add NoOp optimizer by @snarayan21 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1560\r\n* Deterministic GCRP Errors  by @KuuCi in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1559\r\n* Simplify CL API by @b-chu in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1510\r\n* Reapply #1389 by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1561\r\n* Add dataset swap callback by @b-chu in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1536\r\n* Add error to catch more unknown example types by @milocress in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1562\r\n* Add FileExtensionNotFoundError by @b-chu in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1564\r\n* Add InvalidConversationError by @b-chu in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1565\r\n* Release docker img by @KuuCi in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1547\r\n* Revert FT dataloader changes from 
#1561, keep #1564 by @snarayan21 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1566\r\n* Cleanup TP by @eitanturok in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1556\r\n* Changes for dataset swap callback by @gupta-abhay in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1569\r\n* Do not consider run_name when auto-detecting autoresume by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1571\r\n* Allow parameters with requires_grad=False in meta init by @sashaDoubov in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1567\r\n* Bump tiktoken from 0.4.0 to 0.8.0 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1572\r\n* Add extensions to FinetuningFileNotFoundError by @b-chu in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1578\r\n* Handle long file names in convert text to mds by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1579\r\n* Set streaming log level by @mvpatel2000 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1582\r\n* Fix pytorch checkpointing for CL callback by @b-chu in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1581\r\n* Fix pytorch checkpointing for CL callback by @b-chu in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1583\r\n* Error if filtered dataset contains 0 examples by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1585\r\n* Change cluster errors from NetworkError to UserError by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1586\r\n* Do not autoresume if a default name is set, only on user defined ones by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1588\r\n* 
Bump onnxruntime from 1.19.0 to 1.19.2 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1590\r\n* Make FinetuningStreamingDataset parameters more flexible by @XiaohanZhangCMU in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1580\r\n* Add build callback tests by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1577\r\n* Bump version to 0.14.0.dev0 by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1587\r\n* Fix typo in eval code by using 'fsdp' instead of 'fsdp_config' by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1593\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fcompare\u002Fv0.12.0...v0.13.0","2024-10-15T06:23:20",{"id":237,"version":238,"summary_zh":239,"released_at":240},81603,"v0.12.0","# 🚀 LLM Foundry v0.12.0\r\n\r\n## New Features\r\n\r\n### PyTorch 2.4 (#1505)\r\nThis release updates LLM Foundry to the PyTorch 2.4 release, bringing with it support for the new features and optimizations in PyTorch 2.4\r\n\r\n### Extensibility improvements (#1450, #1449, #1468, #1467, #1478, #1493, #1495, #1511, #1512, #1527)\r\nNumerous improvements to the extensibility of the modeling and data loading code, enabling easier reuse for subclassing and extending. 
Please see the linked PRs for more details on each change.\r\n\r\n### Improved error messages (#1457, #1459, #1519, #1518, #1522, #1534, #1548, #1551)\r\nVarious improved error messages, making debugging user errors more clear.\r\n\r\n### Sliding window in torch attention (#1455)\r\nWe've added support for sliding window attention to the reference attention implementation, allowing easier testing and comparison against more optimized attention variants.\r\n\r\n## Bug fixes\r\n\r\n### Extra BOS token for llama 3.1 with completion data (#1476)\r\nA bug resulted in an extra BOS token being added between prompt and response during finetuning. This is fixed so that the prompt and response supplied by the user are concatenated without any extra tokens put between them.\r\n\r\n## What's Changed\r\n* Add test for logged_config transforms by @b-chu in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1441\r\n* Bump version to 0.12.0.dev0. by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1447\r\n* Update pytest-codeblocks requirement from \u003C0.17,>=0.16.1 to >=0.16.1,\u003C0.18 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1445\r\n* Bump coverage[toml] from 7.4.4 to 7.6.1 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1442\r\n* Enabled generalizing build_inner_model in ComposerHFCausalLM by @gupta-abhay in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1450\r\n* Update llm foundry version in mcli yamls by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1451\r\n* merge to main by @XiaohanZhangCMU in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F865\r\n* allow embedding resizing passed through by @jdchang1 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1449\r\n* Update packaging requirement 
from \u003C23,>=21 to >=21,\u003C25 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1444\r\n* Update pytest requirement from \u003C8,>=7.2.1 to >=7.2.1,\u003C9 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1443\r\n* Implement ruff rules enforcing PEP 585 by @snarayan21 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1453\r\n* Adding sliding window attn to scaled_multihead_dot_product_attention by @ShashankMosaicML in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1455\r\n* Add user error for UnicodeDeocdeError in convert text to mds by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1457\r\n* Fix log_config by @josejg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1432\r\n* Add EnvironmentLogger Callback by @josejg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1350\r\n* Update mosaicml\u002Fci-testing to 0.1.2 by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1458\r\n* Correct error message for inference wrapper by @josejg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1459\r\n* Update CI tests to v0.1.2 by @KuuCi in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1466\r\n* Bump onnxruntime from 1.18.1 to 1.19.0 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1461\r\n* Update tenacity requirement from \u003C9,>=8.2.3 to >=8.2.3,\u003C10 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1460\r\n* Simple change to enable mapping functions for ft constructor by @gupta-abhay in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1468\r\n* use default eval interval from composer by @milocress in 
https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1369\r\n* Consistent Naming EnviromentLoggingCallback by @josejg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1470\r\n* Register NaN Monitor Callback by @josejg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1471\r\n* Add train subset num batches by @mvpatel2000 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1472\r\n* Parent class hf models by @jdchang1 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1467\r\n* Remove extra bos for prompt\u002Fresponse data with llama3.1 by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1476\r\n* Add prepare fsdp back by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1477\r\n* Add date_string when applying tokenizer chat template by @snarayan21 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1474\r\n* Make sample tokenization extensible by @gupta-abhay in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1478\r\n* Use Streaming version 0.8.1 by @snarayan21 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1479\r\n* Bump hf-transfer from 0.1.3 to 0.1.8 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1480\r\n* fix hf checkpointer by @milocress in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1489\r\n* Fix device mismatch when running hf.generate by @ShashankMosaicML in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1","2024-09-26T03:52:01",{"id":242,"version":243,"summary_zh":244,"released_at":245},81604,"v0.11.0","# 🚀 LLM Foundry v0.11.0\r\n\r\n## New Features\r\n\r\n### LLM Foundry CLI Commands (#1337, #1345, #1348, #1354)\r\nWe've added CLI commands for our commonly used 
scripts.\r\n\r\nFor example, instead of calling `composer llm-foundry\u002Fscripts\u002Ftrain.py parameters.yaml`, you can now do `composer -c llm-foundry train parameters.yaml`.\r\n\r\n\r\n### Docker Images Contain All Optional Dependencies (#1431)\r\n[LLM Foundry Docker images](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry?tab=readme-ov-file#mosaicml-docker-images) now have all optional dependencies. \r\n\r\n\r\n### Support for Llama3 Rope Scaling (#1391)\r\nTo use it, you can add the following to your parameters:\r\n```\r\nmodel:\r\n    name: mpt_causal_lm\r\n    attn_config:\r\n      rope: true\r\n      ...\r\n      rope_impl: hf\r\n      rope_theta: 500000\r\n      rope_hf_config:\r\n        type: llama3\r\n        ...\r\n```\r\n\r\n\r\n### Tokenizer Registry (#1386)\r\nWe now have a tokenizer registry so you can easily add custom tokenizers.\r\n\r\n### LoadPlanner and SavePlanner Registries (#1358)\r\nWe now have LoadPlanner and SavePlanner registries so you can easily add custom checkpoint loading and saving logic.\r\n\r\n\r\n### Faster Auto-packing (#1435)\r\nThe auto packing startup is now much faster. 
To use auto packing with finetuning datasets, you can add `packing_ratio: auto` to your config like so:\r\n```\r\n  train_loader:\r\n    name: finetuning\r\n    dataset:\r\n      ...\r\n      packing_ratio: auto\r\n```\r\n\r\n\r\n## What's Changed\r\n* Extra serverless by @XiaohanZhangCMU in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1320\r\n* Fixing sequence_id =-1 bug, adding tests by @ShashankMosaicML in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1324\r\n* Registry docs update by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1323\r\n* Add dependabot by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1322\r\n* `HUGGING_FACE_HUB_TOKEN` -> `HF_TOKEN` by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1321\r\n* Bump version by @b-chu in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1326\r\n* Relax hf hub pin by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1314\r\n* Error if metadata matches existing keys by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1313\r\n* Update transformers requirement from \u003C4.41,>=4.40 to >=4.42.3,\u003C4.43 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1327\r\n* Bump einops from 0.7.0 to 0.8.0 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1328\r\n* Bump onnxruntime from 1.15.1 to 1.18.1 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1329\r\n* Bump onnx from 1.14.0 to 1.16.1 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1331\r\n* Currently multi-gpu generate does not work with hf.generate for hf checkpoints. This PR fixes that. 
by @ShashankMosaicML in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1332\r\n* Fix registry for callbacks with configs by @mvpatel2000 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1333\r\n* Adding a child class of hf's rotary embedding to make hf generate work on multiple gpus. by @ShashankMosaicML in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1334\r\n* Add a config arg to just save an hf checkpoint by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1335\r\n* Deepcopy config in callbacks_with_config by @mvpatel2000 in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1336\r\n* Avoid HF race condition by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1338\r\n* Nicer error message for undefined symbol by @dakinggg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1339\r\n* Bump sentencepiece from 0.1.97 to 0.2.0 by @dependabot in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1342\r\n* Removing logging exception through update run metadata by @jjanezhang in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1292\r\n* [MCLOUD-4910] Escape UC names during data prep by @naren-loganathan in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1343\r\n* Add CLI for train.py by @KuuCi in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1337\r\n* Add fp32 to the set of valid inputs to attention layer by @j316chuck in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1347\r\n* Log all extraneous_keys in one go for ease of development by @josejg in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1344\r\n* Fix MLFlow Save Model for TE by @j316chuck in 
https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1353\r\n* Add flag for saving only composer checkpoint by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1356\r\n* Expose flag for should_save_peft_only by @irenedea in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1357\r\n* Command utils + train by @KuuCi in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1361\r\n* Readd Clear Resolver by @KuuCi in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1365\r\n* Add Eval to Foundry CLI by @KuuCi in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1345\r\n* Enhanced Logging for convert_delta_to_json and convert_text_to_mds by @vanshcsingh in https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fllm-foundry\u002Fpull\u002F1366\r\n* Add convert_dataset_hf to CLI by @KuuCi","2024-08-13T17:16:23"]