[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-rasbt--LLMs-from-scratch":3,"tool-rasbt--LLMs-from-scratch":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",141543,2,"2026-04-06T11:32:54",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 
都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,52],"视频",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[14,35],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":73,"owner_website":78,"owner_url":79,"languages":80,"stars":93,"forks":94,"last_commit_at":95,"license":96,"difficulty_score":10,"env_os":97,"env_gpu":98,"env_ram":99,"env_deps":100,"category_tags":105,"github_topics":106,"view_count":123,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":124,"updated_at":125,"faqs":126,"releases":154},4487,"rasbt\u002FLLMs-from-scratch","LLMs-from-scratch","Implement a ChatGPT-like LLM in PyTorch from scratch, step by step","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备，这个项目都能提供坚实的路径指引。","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备，这个项目都能提供坚实的路径指引。","# Build a Large Language Model (From Scratch)\n\nThis repository contains the code for developing, pretraining, and finetuning a GPT-like LLM and is the official code repository for the book [Build a Large Language Model (From Scratch)](https:\u002F\u002Famzn.to\u002F4fqvn0D).\n\n\u003Cbr>\n\u003Cbr>\n\n\u003Ca href=\"https:\u002F\u002Famzn.to\u002F4fqvn0D\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_readme_bf20af783348.jpg\" width=\"250px\">\u003C\u002Fa>\n\n\u003Cbr>\n\nIn [*Build a Large Language Model (From Scratch)*](http:\u002F\u002Fmng.bz\u002ForYv), you'll learn and understand how large language models (LLMs) work from the inside out by coding them from the ground up, step by step. 
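As a small taste of this from-scratch approach, the following is a minimal single-head causal self-attention sketch in plain PyTorch (an illustrative example in the spirit of the book's code, not an excerpt from it; the names and shapes are made up for the demo):\n\n```python\nimport torch\n\n# Minimal single-head causal self-attention (illustrative sketch only).\ntorch.manual_seed(123)\nbatch, seq_len, d_in, d_out = 1, 4, 8, 8\nx = torch.randn(batch, seq_len, d_in)\n\nW_q = torch.nn.Linear(d_in, d_out, bias=False)\nW_k = torch.nn.Linear(d_in, d_out, bias=False)\nW_v = torch.nn.Linear(d_in, d_out, bias=False)\n\nqueries, keys, values = W_q(x), W_k(x), W_v(x)\nscores = queries @ keys.transpose(1, 2) * d_out**-0.5  # (batch, seq, seq)\n# Causal mask: each position may only attend to itself and earlier positions\nmask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)\nscores = scores.masked_fill(mask, float(\"-inf\"))\nweights = torch.softmax(scores, dim=-1)\ncontext = weights @ values  # (batch, seq, d_out)\nprint(context.shape)\n```\n\n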
In this book, I'll guide you through creating your own LLM, explaining each stage with clear text, diagrams, and examples.\n\nThe method described in this book for training and developing your own small-but-functional model for educational purposes mirrors the approach used in creating large-scale foundational models such as those behind ChatGPT. In addition, this book includes code for loading the weights of larger pretrained models for finetuning.\n\n- Link to the official [source code repository](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch)\n- [Link to the book at Manning (the publisher's website)](http:\u002F\u002Fmng.bz\u002ForYv)\n- [Link to the book page on Amazon.com](https:\u002F\u002Fwww.amazon.com\u002Fgp\u002Fproduct\u002F1633437167)\n- ISBN 9781633437166\n\n\u003Ca href=\"http:\u002F\u002Fmng.bz\u002ForYv#reviews\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_readme_ef3e482564e0.png\" width=\"220px\">\u003C\u002Fa>\n\n\n\u003Cbr>\n\u003Cbr>\n\nTo download a copy of this repository, click on the [Download ZIP](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Farchive\u002Frefs\u002Fheads\u002Fmain.zip) button or execute the following command in your terminal:\n\n```bash\ngit clone --depth 1 https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch.git\n```\n\n\u003Cbr>\n\n(If you downloaded the code bundle from the Manning website, please consider visiting the official code repository on GitHub at [https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch) for the latest updates.)\n\n\u003Cbr>\n\u003Cbr>\n\n\n# Table of Contents\n\nPlease note that this `README.md` file is a Markdown (`.md`) file. If you have downloaded this code bundle from the Manning website and are viewing it on your local computer, I recommend using a Markdown editor or previewer for proper viewing. 
If you haven't installed a Markdown editor yet, [Ghostwriter](https:\u002F\u002Fghostwriter.kde.org) is a good free option.\n\nYou can alternatively view this and other files on GitHub at [https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch) in your browser, which renders Markdown automatically.\n\n\u003Cbr>\n\u003Cbr>\n\n\n> **Tip:**\n> If you're seeking guidance on installing Python and Python packages and setting up your code environment, I suggest reading the [README.md](setup\u002FREADME.md) file located in the [setup](setup) directory.\n\n\u003Cbr>\n\u003Cbr>\n\n[![Code tests Linux](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Factions\u002Fworkflows\u002Fbasic-tests-linux-uv.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Factions\u002Fworkflows\u002Fbasic-tests-linux-uv.yml)\n[![Code tests Windows](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Factions\u002Fworkflows\u002Fbasic-tests-windows-uv-pip.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Factions\u002Fworkflows\u002Fbasic-tests-windows-uv-pip.yml)\n[![Code tests macOS](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Factions\u002Fworkflows\u002Fbasic-tests-macos-uv.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Factions\u002Fworkflows\u002Fbasic-tests-macos-uv.yml)\n\n\n\n| Chapter Title                                              | Main Code (for Quick Access)                                                                                                    | All Code + Supplementary      |\n|------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|-------------------------------|\n| [Setup recommendations](setup) \u003Cbr\u002F>[How to best read this book](https:\u002F\u002Fsebastianraschka.com\u002Fblog\u002F2025\u002Freading-books.html)                            | -                                                                                                                               | -                             |\n| Ch 1: Understanding Large Language Models                  | No code                                                                                                                         | -                             |\n| Ch 2: Working with Text Data                               | - [ch02.ipynb](ch02\u002F01_main-chapter-code\u002Fch02.ipynb)\u003Cbr\u002F>- [dataloader.ipynb](ch02\u002F01_main-chapter-code\u002Fdataloader.ipynb) (summary)\u003Cbr\u002F>- [exercise-solutions.ipynb](ch02\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb)               | [.\u002Fch02](.\u002Fch02)            |\n| Ch 3: Coding Attention Mechanisms                          | - [ch03.ipynb](ch03\u002F01_main-chapter-code\u002Fch03.ipynb)\u003Cbr\u002F>- [multihead-attention.ipynb](ch03\u002F01_main-chapter-code\u002Fmultihead-attention.ipynb) (summary) \u003Cbr\u002F>- [exercise-solutions.ipynb](ch03\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb)| [.\u002Fch03](.\u002Fch03)             |\n| Ch 4: Implementing a GPT Model from Scratch                | - [ch04.ipynb](ch04\u002F01_main-chapter-code\u002Fch04.ipynb)\u003Cbr\u002F>- [gpt.py](ch04\u002F01_main-chapter-code\u002Fgpt.py) (summary)\u003Cbr\u002F>- 
[exercise-solutions.ipynb](ch04\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb) | [.\u002Fch04](.\u002Fch04)           |\n| Ch 5: Pretraining on Unlabeled Data                        | - [ch05.ipynb](ch05\u002F01_main-chapter-code\u002Fch05.ipynb)\u003Cbr\u002F>- [gpt_train.py](ch05\u002F01_main-chapter-code\u002Fgpt_train.py) (summary) \u003Cbr\u002F>- [gpt_generate.py](ch05\u002F01_main-chapter-code\u002Fgpt_generate.py) (summary) \u003Cbr\u002F>- [exercise-solutions.ipynb](ch05\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb) | [.\u002Fch05](.\u002Fch05)              |\n| Ch 6: Finetuning for Text Classification                   | - [ch06.ipynb](ch06\u002F01_main-chapter-code\u002Fch06.ipynb)  \u003Cbr\u002F>- [gpt_class_finetune.py](ch06\u002F01_main-chapter-code\u002Fgpt_class_finetune.py)  \u003Cbr\u002F>- [exercise-solutions.ipynb](ch06\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb) | [.\u002Fch06](.\u002Fch06)              |\n| Ch 7: Finetuning to Follow Instructions                    | - [ch07.ipynb](ch07\u002F01_main-chapter-code\u002Fch07.ipynb)\u003Cbr\u002F>- [gpt_instruction_finetuning.py](ch07\u002F01_main-chapter-code\u002Fgpt_instruction_finetuning.py) (summary)\u003Cbr\u002F>- [ollama_evaluate.py](ch07\u002F01_main-chapter-code\u002Follama_evaluate.py) (summary)\u003Cbr\u002F>- [exercise-solutions.ipynb](ch07\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb) | [.\u002Fch07](.\u002Fch07)  |\n| Appendix A: Introduction to PyTorch                        | - [code-part1.ipynb](appendix-A\u002F01_main-chapter-code\u002Fcode-part1.ipynb)\u003Cbr\u002F>- [code-part2.ipynb](appendix-A\u002F01_main-chapter-code\u002Fcode-part2.ipynb)\u003Cbr\u002F>- [DDP-script.py](appendix-A\u002F01_main-chapter-code\u002FDDP-script.py)\u003Cbr\u002F>- [exercise-solutions.ipynb](appendix-A\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb) | [.\u002Fappendix-A](.\u002Fappendix-A) |\n| Appendix B: References and Further Reading                 | No code                                                                                                                         | [.\u002Fappendix-B](.\u002Fappendix-B) |\n| Appendix C: Exercise Solutions                             | - [list of exercise solutions](appendix-C)                                                                 | [.\u002Fappendix-C](.\u002Fappendix-C) |\n| Appendix D: Adding Bells and Whistles to the Training Loop | - [appendix-D.ipynb](appendix-D\u002F01_main-chapter-code\u002Fappendix-D.ipynb)                                                          | [.\u002Fappendix-D](.\u002Fappendix-D)  |\n| Appendix E: Parameter-efficient Finetuning with LoRA       | - [appendix-E.ipynb](appendix-E\u002F01_main-chapter-code\u002Fappendix-E.ipynb)                                                          | [.\u002Fappendix-E](.\u002Fappendix-E) |\n\n\u003Cbr>\n&nbsp;\n\nThe mental model below summarizes the contents covered in this book.\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_readme_3a3e65304b21.jpg\" width=\"650px\">\n\n\n\u003Cbr>\n&nbsp;\n\n## Prerequisites\n\nThe most important prerequisite is a strong foundation in Python programming.\nWith this knowledge, you will be well prepared to explore the fascinating world of LLMs\nand understand the concepts and code examples presented in this book.\n\nIf you have some experience with deep neural networks, you may find certain concepts more familiar, as LLMs are built upon these 
architectures.\n\nThis book uses PyTorch to implement the code from scratch without using any external LLM libraries. While proficiency in PyTorch is not a prerequisite, familiarity with PyTorch basics is certainly useful. If you are new to PyTorch, Appendix A provides a concise introduction to PyTorch. Alternatively, you may find my book, [PyTorch in One Hour: From Tensors to Training Neural Networks on Multiple GPUs](https:\u002F\u002Fsebastianraschka.com\u002Fteaching\u002Fpytorch-1h\u002F), helpful for learning about the essentials.\n\n\n\n\u003Cbr>\n&nbsp;\n\n## Hardware Requirements\n\nThe code in the main chapters of this book is designed to run on conventional laptops within a reasonable timeframe and does not require specialized hardware. This approach ensures that a wide audience can engage with the material. Additionally, the code automatically utilizes GPUs if they are available. (Please see the [setup](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Fblob\u002Fmain\u002Fsetup\u002FREADME.md) doc for additional recommendations.)\n\n\n&nbsp;\n## Video Course\n\n[A 17-hour and 15-minute companion video course](https:\u002F\u002Fwww.manning.com\u002Flivevideo\u002Fmaster-and-build-large-language-models) is available in which I code through each chapter of the book. The course is organized into chapters and sections that mirror the book's structure so that it can be used as a standalone alternative to the book or as a complementary code-along resource.\n\n\u003Ca href=\"https:\u002F\u002Fwww.manning.com\u002Flivevideo\u002Fmaster-and-build-large-language-models\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_readme_4a642af55bac.webp\" width=\"350px\">\u003C\u002Fa>\n\n\n&nbsp;\n\n\n## Companion Book \u002F Sequel\n\n[*Build A Reasoning Model (From Scratch)*](https:\u002F\u002Fmng.bz\u002FlZ5B), while a standalone book, can be considered a sequel to *Build A Large Language Model (From Scratch)*.\n\nIt starts with a pretrained model and implements different reasoning approaches, including inference-time scaling, reinforcement learning, and distillation, to improve the model's reasoning capabilities.\n\nSimilar to *Build A Large Language Model (From Scratch)*, [*Build A Reasoning Model (From Scratch)*](https:\u002F\u002Fmng.bz\u002FlZ5B) takes a hands-on approach, implementing these methods from scratch.\n\n\u003Ca href=\"https:\u002F\u002Fmng.bz\u002FlZ5B\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_readme_3603d5aed1a1.webp\" width=\"120px\">\u003C\u002Fa>\n\n- Amazon link (TBD)\n- [Manning link](https:\u002F\u002Fmng.bz\u002FlZ5B)\n- [GitHub repository](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch)\n\n\u003Cbr>\n\n&nbsp;\n## Exercises\n\nEach chapter of the book includes several exercises. The solutions are summarized in Appendix C, and the corresponding code notebooks are available in the main chapter folders of this repository (for example, [.\u002Fch02\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb](.\u002Fch02\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb)).\n\nIn addition to the code exercises, you can download a free 170-page PDF titled [Test Yourself On Build a Large Language Model (From Scratch)](https:\u002F\u002Fwww.manning.com\u002Fbooks\u002Ftest-yourself-on-build-a-large-language-model-from-scratch) from the Manning website. 
It contains approximately 30 quiz questions and solutions per chapter to help you test your understanding.\n\n\u003Ca href=\"https:\u002F\u002Fwww.manning.com\u002Fbooks\u002Ftest-yourself-on-build-a-large-language-model-from-scratch\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_readme_388299f23c0d.jpg\" width=\"150px\">\u003C\u002Fa>\n\n&nbsp;\n## Bonus Material\n\nSeveral folders contain optional materials as a bonus for interested readers:\n- **Setup**\n  - [Python Setup Tips](setup\u002F01_optional-python-setup-preferences)\n  - [Installing Python Packages and Libraries Used in This Book](setup\u002F02_installing-python-libraries)\n  - [Docker Environment Setup Guide](setup\u002F03_optional-docker-environment)\n\n- **Chapter 2: Working With Text Data**\n  - [Byte Pair Encoding (BPE) Tokenizer From Scratch](ch02\u002F05_bpe-from-scratch\u002Fbpe-from-scratch-simple.ipynb)\n  - [Comparing Various Byte Pair Encoding (BPE) Implementations](ch02\u002F02_bonus_bytepair-encoder)\n  - [Understanding the Difference Between Embedding Layers and Linear Layers](ch02\u002F03_bonus_embedding-vs-matmul)\n  - [Dataloader Intuition With Simple Numbers](ch02\u002F04_bonus_dataloader-intuition)\n\n- **Chapter 3: Coding Attention Mechanisms**\n  - [Comparing Efficient Multi-Head Attention Implementations](ch03\u002F02_bonus_efficient-multihead-attention\u002Fmha-implementations.ipynb)\n  - [Understanding PyTorch Buffers](ch03\u002F03_understanding-buffers\u002Funderstanding-buffers.ipynb)\n\n- **Chapter 4: Implementing a GPT Model From Scratch**\n  - [FLOPs Analysis](ch04\u002F02_performance-analysis\u002Fflops-analysis.ipynb)\n  - [KV Cache](ch04\u002F03_kv-cache)\n  - [Attention Alternatives](ch04\u002F#attention-alternatives)\n    - [Grouped-Query Attention](ch04\u002F04_gqa)\n    - [Multi-Head Latent Attention](ch04\u002F05_mla)\n    - [Sliding Window Attention](ch04\u002F06_swa)\n    - [Gated DeltaNet](ch04\u002F08_deltanet)\n  - [Mixture-of-Experts (MoE)](ch04\u002F07_moe)\n\n- **Chapter 5: Pretraining on Unlabeled Data**\n  - [Alternative Weight Loading Methods](ch05\u002F02_alternative_weight_loading\u002F)\n  - [Pretraining GPT on the Project Gutenberg Dataset](ch05\u002F03_bonus_pretraining_on_gutenberg)\n  - [Adding Bells and Whistles to the Training Loop](ch05\u002F04_learning_rate_schedulers)\n  - [Optimizing Hyperparameters for Pretraining](ch05\u002F05_bonus_hparam_tuning)\n  - [Building a User Interface to Interact With the Pretrained LLM](ch05\u002F06_user_interface)\n  - [Converting GPT to Llama](ch05\u002F07_gpt_to_llama)\n  - [Memory-efficient Model Weight Loading](ch05\u002F08_memory_efficient_weight_loading\u002Fmemory-efficient-state-dict.ipynb)\n  - [Extending the Tiktoken BPE Tokenizer with New Tokens](ch05\u002F09_extending-tokenizers\u002Fextend-tiktoken.ipynb)\n  - [PyTorch Performance Tips for Faster LLM Training](ch05\u002F10_llm-training-speed)\n  - [LLM Architectures](ch05\u002F#llm-architectures-from-scratch)\n    - [Llama 3.2 From Scratch](ch05\u002F07_gpt_to_llama\u002Fstandalone-llama32.ipynb)\n    - [Qwen3 Dense and Mixture-of-Experts (MoE) From Scratch](ch05\u002F11_qwen3\u002F)\n    - [Gemma 3 From Scratch](ch05\u002F12_gemma3\u002F)\n    - [Olmo 3 From Scratch](ch05\u002F13_olmo3\u002F)\n    - [Tiny Aya From Scratch](ch05\u002F15_tiny-aya\u002F)\n    - [Qwen3.5 From Scratch](ch05\u002F16_qwen3.5\u002F)\n  - [Chapter 5 with other LLMs as Drop-In Replacement (e.g., Llama 3, Qwen 
3)](ch05\u002F14_ch05_with_other_llms\u002F)\n- **Chapter 6: Finetuning for classification**\n  - [Additional Experiments Finetuning Different Layers and Using Larger Models](ch06\u002F02_bonus_additional-experiments)\n  - [Finetuning Different Models on the 50k IMDb Movie Review Dataset](ch06\u002F03_bonus_imdb-classification)\n  - [Building a User Interface to Interact With the GPT-based Spam Classifier](ch06\u002F04_user_interface)\n- **Chapter 7: Finetuning to follow instructions**\n  - [Dataset Utilities for Finding Near Duplicates and Creating Passive Voice Entries](ch07\u002F02_dataset-utilities)\n  - [Evaluating Instruction Responses Using the OpenAI API and Ollama](ch07\u002F03_model-evaluation)\n  - [Generating a Dataset for Instruction Finetuning](ch07\u002F05_dataset-generation\u002Fllama3-ollama.ipynb)\n  - [Improving a Dataset for Instruction Finetuning](ch07\u002F05_dataset-generation\u002Freflection-gpt4.ipynb)\n  - [Generating a Preference Dataset With Llama 3.1 70B and Ollama](ch07\u002F04_preference-tuning-with-dpo\u002Fcreate-preference-data-ollama.ipynb)\n  - [Direct Preference Optimization (DPO) for LLM Alignment](ch07\u002F04_preference-tuning-with-dpo\u002Fdpo-from-scratch.ipynb)\n  - [Building a User Interface to Interact With the Instruction-Finetuned GPT Model](ch07\u002F06_user_interface)\n\nMore bonus material from the [Reasoning From Scratch](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch) repository:\n\n- **Qwen3 (From Scratch) Basics**\n  - [Qwen3 Source Code Walkthrough](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002FchC\u002F01_main-chapter-code\u002FchC_main.ipynb)\n  - [Optimized Qwen3](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Ftree\u002Fmain\u002Fch02\u002F03_optimized-LLM)\n\n- **Evaluation**\n  - [Verifier-Based Evaluation (MATH-500)](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Ftree\u002Fmain\u002Fch03)\n  - [Multiple-Choice Evaluation (MMLU)](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002FchF\u002F02_mmlu)\n  - [LLM Leaderboard Evaluation](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002FchF\u002F03_leaderboards)\n  - [LLM-as-a-Judge Evaluation](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002FchF\u002F04_llm-judge)\n- **Inference Scaling**\n  - [Self-Consistency](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002Fch04\u002F01_main-chapter-code\u002Fch04_main.ipynb)\n  - [Self-Refinement](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002Fch05\u002F01_main-chapter-code\u002Fch05_main.ipynb)\n\n- **Reinforcement Learning** (RL)\n  - [RLVR with GRPO From Scratch](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002Fch06\u002F01_main-chapter-code\u002Fch06_main.ipynb)\n\n\n\u003Cbr>\n&nbsp;\n\n## Questions, Feedback, and Contributing to This Repository\n\n\nI welcome all sorts of feedback, best shared via the [Manning Forum](https:\u002F\u002Flivebook.manning.com\u002Fforum?product=raschka&page=1) or [GitHub Discussions](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Fdiscussions). 
Likewise, if you have any questions or just want to bounce ideas off others, please don't hesitate to post these in the forum as well.\n\nPlease note that since this repository contains the code corresponding to a print book, I currently cannot accept contributions that would extend the contents of the main chapter code, as it would introduce deviations from the physical book. Keeping it consistent helps ensure a smooth experience for everyone.\n\n\n&nbsp;\n## Citation\n\nIf you find this book or code useful for your research, please consider citing it.\n\nChicago-style citation:\n\n> Raschka, Sebastian. *Build A Large Language Model (From Scratch)*. Manning, 2024. ISBN: 978-1633437166.\n\nBibTeX entry:\n\n```\n@book{build-llms-from-scratch-book,\n  author       = {Sebastian Raschka},\n  title        = {Build A Large Language Model (From Scratch)},\n  publisher    = {Manning},\n  year         = {2024},\n  isbn         = {978-1633437166},\n  url          = {https:\u002F\u002Fwww.manning.com\u002Fbooks\u002Fbuild-a-large-language-model-from-scratch},\n  github       = {https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch}\n}\n```\n","# 从零开始构建大型语言模型\n\n本仓库包含开发、预训练和微调类似 GPT 的大型语言模型的代码，也是书籍《从零开始构建大型语言模型》（[Build a Large Language Model (From Scratch)](https:\u002F\u002Famzn.to\u002F4fqvn0D)）的官方代码库。\n\n\u003Cbr>\n\u003Cbr>\n\n\u003Ca href=\"https:\u002F\u002Famzn.to\u002F4fqvn0D\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_readme_bf20af783348.jpg\" width=\"250px\">\u003C\u002Fa>\n\n\u003Cbr>\n\n在《从零开始构建大型语言模型》一书中，您将通过逐步从头编写代码，由内而外学习并理解大型语言模型（LLM）的工作原理。在这本书中，我将引导您创建属于自己的 LLM，并用清晰的文字、图表和示例解释每一个步骤。\n\n本书介绍的方法用于训练和开发一个小型但功能完整的模型，以供学习之用，其流程与构建 ChatGPT 等大规模基础模型所采用的方法一致。此外，本书还提供了加载更大规模预训练模型权重以便进行微调的代码。\n\n- 官方[源代码仓库链接](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch)\n- [Manning 出版社官网上的图书链接](http:\u002F\u002Fmng.bz\u002ForYv)\n- [Amazon.com 上的图书页面链接](https:\u002F\u002Fwww.amazon.com\u002Fgp\u002Fproduct\u002F1633437167)\n- ISBN 9781633437166\n\n\u003Ca href=\"http:\u002F\u002Fmng.bz\u002ForYv#reviews\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_readme_ef3e482564e0.png\" width=\"220px\">\u003C\u002Fa>\n\n\n\u003Cbr>\n\u003Cbr>\n\n要下载本仓库的副本，请点击“[Download ZIP](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Farchive\u002Frefs\u002Fheads\u002Fmain.zip)”按钮，或在终端中执行以下命令：\n\n```bash\ngit clone --depth 1 https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch.git\n```\n\n\u003Cbr>\n\n（如果您是从 Manning 网站下载的代码包，请考虑访问 GitHub 上的官方代码仓库 [https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch)，以获取最新更新。）\n\n\u003Cbr>\n\u003Cbr>\n\n# 目录\n\n请注意，此 `README.md` 文件是一个 Markdown（`.md`）文件。如果您是从 Manning 网站下载了本代码包，并在本地计算机上查看它，建议使用 Markdown 编辑器或预览工具以获得最佳阅读体验。如果您尚未安装 Markdown 编辑器，[Ghostwriter](https:\u002F\u002Fghostwriter.kde.org) 是一个不错的免费选择。\n\n您也可以在 GitHub 上通过浏览器查看此文件及其他文件，网址为 [https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch)，GitHub 会自动渲染 Markdown 内容。\n\n\u003Cbr>\n\u003Cbr>\n\n\n> **提示：**\n> 如果您正在寻找关于安装 Python 和 Python 包以及设置代码环境的指导，建议阅读位于 [setup](setup) 目录下的 [README.md](setup\u002FREADME.md) 文件。\n\n\u003Cbr>\n\u003Cbr>\n\n[![Linux 
代码测试](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Factions\u002Fworkflows\u002Fbasic-tests-linux-uv.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Factions\u002Fworkflows\u002Fbasic-tests-linux-uv.yml)\n[![Windows 代码测试](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Factions\u002Fworkflows\u002Fbasic-tests-windows-uv-pip.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Factions\u002Fworkflows\u002Fbasic-tests-windows-uv-pip.yml)\n[![macOS 代码测试](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Factions\u002Fworkflows\u002Fbasic-tests-macos-uv.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Factions\u002Fworkflows\u002Fbasic-tests-macos-uv.yml)\n\n\n\n| 章节标题                                              | 主要代码（便于快速访问）                                                                                                    | 所有代码 + 补充材料      |\n|------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|-------------------------------|\n| [设置建议](setup) \u003Cbr\u002F>[如何更好地阅读本书](https:\u002F\u002Fsebastianraschka.com\u002Fblog\u002F2025\u002Freading-books.html)                            | -                                                                                                                               | -                             |\n| 第1章：理解大型语言模型                  | 无代码                                                                                                                         | -                             |\n| 第2章：处理文本数据                               | - [ch02.ipynb](ch02\u002F01_main-chapter-code\u002Fch02.ipynb)\u003Cbr\u002F>- [dataloader.ipynb](ch02\u002F01_main-chapter-code\u002Fdataloader.ipynb)（摘要）\u003Cbr\u002F>- [exercise-solutions.ipynb](ch02\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb)               | [.\u002Fch02](.\u002Fch02)            |\n| 第3章：实现注意力机制                          | - [ch03.ipynb](ch03\u002F01_main-chapter-code\u002Fch03.ipynb)\u003Cbr\u002F>- [multihead-attention.ipynb](ch03\u002F01_main-chapter-code\u002Fmultihead-attention.ipynb)（摘要） \u003Cbr\u002F>- [exercise-solutions.ipynb](ch03\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb)| [.\u002Fch03](.\u002Fch03)             |\n| 第4章：从零开始实现 GPT 模型                | - [ch04.ipynb](ch04\u002F01_main-chapter-code\u002Fch04.ipynb)\u003Cbr\u002F>- [gpt.py](ch04\u002F01_main-chapter-code\u002Fgpt.py)（摘要）\u003Cbr\u002F>- [exercise-solutions.ipynb](ch04\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb) | [.\u002Fch04](.\u002Fch04)           |\n| 第5章：在无标签数据上进行预训练                        | - [ch05.ipynb](ch05\u002F01_main-chapter-code\u002Fch05.ipynb)\u003Cbr\u002F>- [gpt_train.py](ch05\u002F01_main-chapter-code\u002Fgpt_train.py)（摘要） \u003Cbr\u002F>- [gpt_generate.py](ch05\u002F01_main-chapter-code\u002Fgpt_generate.py)（摘要） \u003Cbr\u002F>- [exercise-solutions.ipynb](ch05\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb) | [.\u002Fch05](.\u002Fch05)              |\n| 第6章：用于文本分类的微调                   | - [ch06.ipynb](ch06\u002F01_main-chapter-code\u002Fch06.ipynb)  \u003Cbr\u002F>- [gpt_class_finetune.py](ch06\u002F01_main-chapter-code\u002Fgpt_class_finetune.py)  \u003Cbr\u002F>- [exercise-solutions.ipynb](ch06\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb) 
| [.\u002Fch06](.\u002Fch06)              |\n| 第7章：按照指令进行微调                    | - [ch07.ipynb](ch07\u002F01_main-chapter-code\u002Fch07.ipynb)\u003Cbr\u002F>- [gpt_instruction_finetuning.py](ch07\u002F01_main-chapter-code\u002Fgpt_instruction_finetuning.py)（摘要）\u003Cbr\u002F>- [ollama_evaluate.py](ch07\u002F01_main-chapter-code\u002Follama_evaluate.py)（摘要）\u003Cbr\u002F>- [exercise-solutions.ipynb](ch07\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb) | [.\u002Fch07](.\u002Fch07)  |\n| 附录A：PyTorch简介                        | - [code-part1.ipynb](appendix-A\u002F01_main-chapter-code\u002Fcode-part1.ipynb)\u003Cbr\u002F>- [code-part2.ipynb](appendix-A\u002F01_main-chapter-code\u002Fcode-part2.ipynb)\u003Cbr\u002F>- [DDP-script.py](appendix-A\u002F01_main-chapter-code\u002FDDP-script.py)\u003Cbr\u002F>- [exercise-solutions.ipynb](appendix-A\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb) | [.\u002Fappendix-A](.\u002Fappendix-A) |\n| 附录B：参考文献和拓展阅读                 | 无代码                                                                                                                         | [.\u002Fappendix-B](.\u002Fappendix-B) |\n| 附录C：习题解答                             | - [习题解答列表](appendix-C)                                                                 | [.\u002Fappendix-C](.\u002Fappendix-C) |\n| 附录D：为训练循环添加额外功能               | - [appendix-D.ipynb](appendix-D\u002F01_main-chapter-code\u002Fappendix-D.ipynb)                                                          | [.\u002Fappendix-D](.\u002Fappendix-D)  |\n| 附录E：使用 LoRA 进行参数高效的微调       | - [appendix-E.ipynb](appendix-E\u002F01_main-chapter-code\u002Fappendix-E.ipynb)                                                          | [.\u002Fappendix-E](.\u002Fappendix-E) |\n\n\u003Cbr>\n&nbsp;\n\n下图的心理模型总结了本书涵盖的内容。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_readme_3a3e65304b21.jpg\" width=\"650px\">\n\n\n\u003Cbr>\n&nbsp;\n\n## 先决条件\n\n最重要的先决条件是扎实的 Python 编程基础。\n具备这一知识后，您将能够很好地探索 LLM 的精彩世界，\n并理解本书中介绍的概念和代码示例。\n\n如果您对深度神经网络有一定了解，可能会觉得某些概念更加熟悉，因为 LLM 正是建立在这些架构之上的。\n\n本书使用 PyTorch 从头开始实现代码，未使用任何外部 LLM 库。虽然熟练掌握 PyTorch 并非必需，但熟悉 PyTorch 的基础知识无疑会有所帮助。如果您是 PyTorch 新手，附录 A 提供了简明的 PyTorch 入门介绍。此外，我的书籍《一小时学 PyTorch：从张量到多 GPU 训练神经网络》（[https:\u002F\u002Fsebastianraschka.com\u002Fteaching\u002Fpytorch-1h\u002F](https:\u002F\u002Fsebastianraschka.com\u002Fteaching\u002Fpytorch-1h\u002F)）也能帮助您快速掌握 PyTorch 的核心内容。\n\n\n\n\u003Cbr>\n&nbsp;\n\n## 硬件要求\n\n本书主要章节中的代码设计为可在普通笔记本电脑上以合理的时间范围内运行，无需特殊硬件。这种做法确保了广泛的读者群体能够参与学习。此外，如果系统中存在 GPU，代码会自动利用 GPU 进行加速。（请参阅 [设置](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Fblob\u002Fmain\u002Fsetup\u002FREADME.md) 文档以获取更多建议。）\n\n\n&nbsp;\n## 视频课程\n\n[一部长达 17 小时 15 分钟的配套视频课程](https:\u002F\u002Fwww.manning.com\u002Flivevideo\u002Fmaster-and-build-large-language-models)，我在其中逐章编写并演示书中的代码。该课程按章节和小节组织，与书籍结构完全对应，因此既可以作为独立的学习资源，也可以作为与书籍内容相辅相成的代码实践指南。\n\n\u003Ca href=\"https:\u002F\u002Fwww.manning.com\u002Flivevideo\u002Fmaster-and-build-large-language-models\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_readme_4a642af55bac.webp\" width=\"350px\">\u003C\u002Fa>\n\n\n&nbsp;\n\n\n## 配套书籍 \u002F 续作\n\n[*从零构建推理模型*](https:\u002F\u002Fmng.bz\u002FlZ5B) 虽然是一本独立的书籍，但也可被视为 *从零构建大型语言模型* 的续作。\n\n本书从一个预训练模型入手，实现了多种推理方法，包括推理时缩放、强化学习和知识蒸馏等，以提升模型的推理能力。\n\n与 *从零构建大型语言模型* 类似，[*从零构建推理模型*](https:\u002F\u002Fmng.bz\u002FlZ5B) 同样采用动手实践的方式，从头开始实现这些方法。\n\n\u003Ca href=\"https:\u002F\u002Fmng.bz\u002FlZ5B\">\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_readme_3603d5aed1a1.webp\" width=\"120px\">\u003C\u002Fa>\n\n- 亚马逊链接（待定）\n- [Manning 链接](https:\u002F\u002Fmng.bz\u002FlZ5B)\n- [GitHub 仓库](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch)\n\n\u003Cbr>\n\n&nbsp;\n## 练习题\n\n本书每章都配有若干练习题。答案汇总在附录 C 中，相应的代码笔记本则位于本仓库各主章节文件夹内（例如，[.\u002Fch02\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb](.\u002Fch02\u002F01_main-chapter-code\u002Fexercise-solutions.ipynb)）。\n\n除了代码练习之外，您还可以从 Manning 官网免费下载一份 170 页的 PDF 文件，名为 [测试自己：从零构建大型语言模型](https:\u002F\u002Fwww.manning.com\u002Fbooks\u002Ftest-yourself-on-build-a-large-language-model-from-scratch)。该文档每章约有 30 道测验题及答案，帮助您检验对内容的理解。\n\n\u003Ca href=\"https:\u002F\u002Fwww.manning.com\u002Fbooks\u002Ftest-yourself-on-build-a-large-language-model-from-scratch\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_readme_388299f23c0d.jpg\" width=\"150px\">\u003C\u002Fa>\n\n&nbsp;\n\n## 附加资料\n\n若干文件夹包含可选材料，供感兴趣的读者参考：\n\n- **设置**\n  - [Python 设置技巧](setup\u002F01_optional-python-setup-preferences)\n  - [安装本书中使用的 Python 包和库](setup\u002F02_installing-python-libraries)\n  - [Docker 环境设置指南](setup\u002F03_optional-docker-environment)\n\n- **第 2 章：处理文本数据**\n  - [从头实现字节对编码（BPE）分词器](ch02\u002F05_bpe-from-scratch\u002Fbpe-from-scratch-simple.ipynb)\n  - [比较多种字节对编码（BPE）实现](ch02\u002F02_bonus_bytepair-encoder)\n  - [理解嵌入层与线性层的区别](ch02\u002F03_bonus_embedding-vs-matmul)\n  - [用简单示例直观理解数据加载器](ch02\u002F04_bonus_dataloader-intuition)\n\n- **第 3 章：实现注意力机制**\n  - [比较高效的多头注意力实现](ch03\u002F02_bonus_efficient-multihead-attention\u002Fmha-implementations.ipynb)\n  - [理解 PyTorch 缓冲区](ch03\u002F03_understanding-buffers\u002Funderstanding-buffers.ipynb)\n\n- **第 4 章：从零开始实现 GPT 模型**\n  - [FLOPs 分析](ch04\u002F02_performance-analysis\u002Fflops-analysis.ipynb)\n  - [KV 缓存](ch04\u002F03_kv-cache)\n  - [注意力机制的替代方案](ch04\u002F#attention-alternatives)\n    - [分组查询注意力](ch04\u002F04_gqa)\n    - [多头潜在注意力](ch04\u002F05_mla)\n    - [滑动窗口注意力](ch04\u002F06_swa)\n    - [门控 DeltaNet](ch04\u002F08_deltanet)\n  - [专家混合（MoE）](ch04\u002F07_moe)\n\n- **第 5 章：在无标签数据上预训练**\n  - [替代权重加载方法](ch05\u002F02_alternative_weight_loading\u002F)\n  - [在古腾堡计划数据集上预训练 GPT](ch05\u002F03_bonus_pretraining_on_gutenberg)\n  - [为训练循环添加额外功能](ch05\u002F04_learning_rate_schedulers)\n  - [优化预训练超参数](ch05\u002F05_bonus_hparam_tuning)\n  - [构建与预训练 LLM 交互的用户界面](ch05\u002F06_user_interface)\n  - [将 GPT 转换为 Llama](ch05\u002F07_gpt_to_llama)\n  - [内存高效的模型权重加载](ch05\u002F08_memory_efficient_weight_loading\u002Fmemory-efficient-state-dict.ipynb)\n  - [扩展 Tiktoken BPE 分词器的新词](ch05\u002F09_extending-tokenizers\u002Fextend-tiktoken.ipynb)\n  - [加速 LLM 训练的 PyTorch 性能技巧](ch05\u002F10_llm-training-speed)\n  - [LLM 架构](ch05\u002F#llm-architectures-from-scratch)\n    - [从零开始实现 Llama 3.2](ch05\u002F07_gpt_to_llama\u002Fstandalone-llama32.ipynb)\n    - [从零开始实现 Qwen3 密集模型和专家混合（MoE）](ch05\u002F11_qwen3\u002F)\n    - [从零开始实现 Gemma 3](ch05\u002F12_gemma3\u002F)\n    - [从零开始实现 Olmo 3](ch05\u002F13_olmo3\u002F)\n    - [从零开始实现 Tiny Aya](ch05\u002F15_tiny-aya\u002F)\n    - [从零开始实现 Qwen3.5](ch05\u002F16_qwen3.5\u002F)\n  - [使用其他 LLM 作为直接替换品的第 5 章内容（例如 Llama 3、Qwen 3）](ch05\u002F14_ch05_with_other_llms\u002F)\n\n- **第 6 章：用于分类的微调**\n  - [针对不同层及更大模型的额外微调实验](ch06\u002F02_bonus_additional-experiments)\n  - [在 5 万条 IMDb 电影评论数据集上微调不同模型](ch06\u002F03_bonus_imdb-classification)\n  - [构建与基于 GPT 的垃圾邮件分类器交互的用户界面](ch06\u002F04_user_interface)\n\n- **第 7 章：遵循指令的微调**\n  - 
[用于查找近似重复和创建被动语态条目的数据集工具](ch07\u002F02_dataset-utilities)\n  - [使用 OpenAI API 和 Ollama 评估指令响应](ch07\u002F03_model-evaluation)\n  - [生成用于指令微调的数据集](ch07\u002F05_dataset-generation\u002Fllama3-ollama.ipynb)\n  - [改进用于指令微调的数据集](ch07\u002F05_dataset-generation\u002Freflection-gpt4.ipynb)\n  - [使用 Llama 3.1 70B 和 Ollama 生成偏好数据集](ch07\u002F04_preference-tuning-with-dpo\u002Fcreate-preference-data-ollama.ipynb)\n  - [直接偏好优化（DPO）用于 LLM 对齐](ch07\u002F04_preference-tuning-with-dpo\u002Fdpo-from-scratch.ipynb)\n  - [构建与指令微调后的 GPT 模型交互的用户界面](ch07\u002F06_user_interface)\n\n更多来自 [Reasoning From Scratch](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch) 仓库的附加资料：\n\n- **Qwen3（从零开始）基础**\n  - [Qwen3 源代码详解](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002FchC\u002F01_main-chapter-code\u002FchC_main.ipynb)\n  - [优化版 Qwen3](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Ftree\u002Fmain\u002Fch02\u002F03_optimized-LLM)\n\n- **评估**\n  - [基于验证器的评估（MATH-500）](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Ftree\u002Fmain\u002Fch03)\n  - [选择题评估（MMLU）](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002FchF\u002F02_mmlu)\n  - [LLM 排行榜评估](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002FchF\u002F03_leaderboards)\n  - [LLM 作为评判者的评估](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002FchF\u002F04_llm-judge)\n\n- **推理扩展**\n  - [自洽性](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002Fch04\u002F01_main-chapter-code\u002Fch04_main.ipynb)\n  - [自我精炼](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002Fch05\u002F01_main-chapter-code\u002Fch05_main.ipynb)\n\n- **强化学习**（RL）\n  - [从零开始实现 RLVR 与 GRPO](https:\u002F\u002Fgithub.com\u002Frasbt\u002Freasoning-from-scratch\u002Fblob\u002Fmain\u002Fch06\u002F01_main-chapter-code\u002Fch06_main.ipynb)\n\n\n\u003Cbr>\n&nbsp;\n\n## 问题、反馈及参与贡献\n\n我欢迎各种形式的反馈，最佳方式是通过 [Manning 论坛](https:\u002F\u002Flivebook.manning.com\u002Fforum?product=raschka&page=1) 或 [GitHub Discussions](https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Fdiscussions) 提交。同样地，如果您有任何问题或只是想与他人交流想法，请随时在论坛上发帖。\n\n请注意，由于本仓库包含与纸质书对应的代码，目前我无法接受会扩展主章节代码内容的贡献，因为这会导致与实体书内容产生偏差。保持一致性有助于确保所有用户的顺畅体验。\n\n\n&nbsp;\n## 引用\n\n如果您发现本书或代码对您的研究有所帮助，请考虑引用它。\n\n芝加哥格式引用：\n\n> Raschka, Sebastian. 
*从零构建大型语言模型*。Manning 出版社，2024 年。ISBN：978-1633437166。\n\nBibTeX 条目（引用条目中保留英文原书名）：\n\n```\n@book{build-llms-from-scratch-book,\n  author       = {Sebastian Raschka},\n  title        = {Build A Large Language Model (From Scratch)},\n  publisher    = {Manning},\n  year         = {2024},\n  isbn         = {978-1633437166},\n  url          = {https:\u002F\u002Fwww.manning.com\u002Fbooks\u002Fbuild-a-large-language-model-from-scratch},\n  github       = {https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch}\n}\n```","# LLMs-from-scratch 快速上手指南\n\n本指南基于 Sebastian Raschka 的开源项目 `LLMs-from-scratch`，旨在帮助开发者从零开始构建、预训练和微调类 GPT 的大型语言模型（LLM）。该项目是同名书籍的官方代码库，适合希望深入理解 LLM 内部原理的开发者。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**：Linux、Windows 或 macOS 均可运行。\n- **硬件**：\n  - 主章节代码设计为可在普通笔记本电脑上运行，无需专用硬件。\n  - 若可用，代码会自动利用 GPU 加速训练过程。\n- **前置知识**：\n  - 扎实的 Python 编程基础。\n  - 了解深度学习神经网络概念更佳（非必须）。\n  - 熟悉 PyTorch 基础操作（若不熟悉，可参考项目附录 A 或作者的其他教程）。\n\n### 依赖项\n- Python 3.8+\n- PyTorch\n- 其他常用数据科学库（如 numpy, pandas, matplotlib 等）\n\n> **提示**：详细的 Python 环境配置和包安装建议，请参考项目根目录下的 `setup\u002FREADME.md` 文件。\n\n## 安装步骤\n\n### 1. 克隆仓库\n使用以下命令将代码库克隆到本地：\n\n```bash\ngit clone --depth 1 https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch.git\n```\n\n或者下载 ZIP 压缩包后解压。\n\n### 2. 安装依赖\n进入项目目录并安装所需依赖。推荐使用 `uv` 或 `pip` 进行安装。\n\n**使用 uv (推荐，速度更快):**\n```bash\ncd LLMs-from-scratch\nuv pip install -r requirements.txt\n```\n\n**使用 pip:**\n```bash\ncd LLMs-from-scratch\npip install -r requirements.txt\n```\n\n> **国内加速建议**：如果下载速度慢，可使用国内镜像源。\n> 例如使用清华源安装：\n> ```bash\n> pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 基本使用\n\n本项目按书籍章节组织代码，每个章节包含主要的 Jupyter Notebook (`.ipynb`) 和总结性的 Python 脚本 (`.py`)。以下是快速体验核心功能的流程：\n\n### 1. 处理文本数据 (第 2 章)\n学习如何构建数据集和数据加载器。\n```bash\njupyter notebook ch02\u002F01_main-chapter-code\u002Fch02.ipynb\n```\n或打开摘要版笔记本（`.ipynb` 文件需通过 Jupyter 打开，不能直接用 python 运行）：\n```bash\njupyter notebook ch02\u002F01_main-chapter-code\u002Fdataloader.ipynb\n```\n\n### 2. 实现注意力机制 (第 3 章)\n从零编写多头注意力机制。\n```bash\njupyter notebook ch03\u002F01_main-chapter-code\u002Fch03.ipynb\n```\n\n### 3. 构建 GPT 模型 (第 4 章)\n组装完整的 GPT 架构。\n```bash\njupyter notebook ch04\u002F01_main-chapter-code\u002Fch04.ipynb\n```\n或直接运行模型定义脚本：\n```bash\npython ch04\u002F01_main-chapter-code\u002Fgpt.py\n```\n\n### 4. 预训练模型 (第 5 章)\n在无标签数据上预训练模型并生成文本。\n```bash\njupyter notebook ch05\u002F01_main-chapter-code\u002Fch05.ipynb\n```\n运行训练脚本示例：\n```bash\npython ch05\u002F01_main-chapter-code\u002Fgpt_train.py\n```\n运行文本生成脚本示例：\n```bash\npython ch05\u002F01_main-chapter-code\u002Fgpt_generate.py\n```\n\n
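> **补充示例**：下面用一个极简的自回归贪心解码循环，示意上述生成脚本背后的基本思想（假设性演示代码，模型为随机初始化的占位模块，并非书中原文）：\n\n```python\nimport torch\n\n# 贪心解码示意：每步取概率最大的词元并追加到序列末尾。\n# 真实使用时 model 应为训练好的 GPT；这里仅演示循环本身。\nvocab_size, emb_dim = 50257, 16\nmodel = torch.nn.Sequential(\n    torch.nn.Embedding(vocab_size, emb_dim),\n    torch.nn.Linear(emb_dim, vocab_size),\n)\nidx = torch.tensor([[1, 2, 3]])  # (batch=1, seq=3) 的词元 id\nfor _ in range(5):\n    with torch.no_grad():\n        logits = model(idx)  # (1, seq, vocab_size)\n    next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)\n    idx = torch.cat([idx, next_id], dim=1)\nprint(idx)  # 原有 3 个词元之后追加了 5 个新词元\n```\n\n### 5. 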
微调模型 (第 6-7 章)\n针对文本分类或指令遵循任务进行微调。\n```bash\n# 文本分类微调\njupyter notebook ch06\u002F01_main-chapter-code\u002Fch06.ipynb\n\n# 指令微调\njupyter notebook ch07\u002F01_main-chapter-code\u002Fch07.ipynb\n```\n\n> **注意**：所有章节的练习解答代码位于各章节文件夹下的 `exercise-solutions.ipynb` 文件中。附录部分提供了 PyTorch 入门及参数高效微调 (LoRA) 的额外代码示例。","某高校人工智能实验室的研究生团队试图深入理解大语言模型（LLM）的内部机制，并计划构建一个针对特定学术领域的轻量级对话模型。\n\n### 没有 LLMs-from-scratch 时\n- **黑盒困境**：团队成员只能调用现成的 API 或加载庞大的预训练权重，完全无法知晓 Transformer 架构中注意力机制等核心组件的具体实现细节。\n- **学习断层**：面对复杂的数学公式和抽象论文，缺乏可运行的代码作为对照，导致从理论推导到工程落地的转化极其困难，试错成本高昂。\n- **定制受限**：想要修改模型结构以适应低资源环境或特殊数据格式时，因不熟悉底层逻辑而不敢轻易动手，往往陷入“调包侠”的被动局面。\n- **教育资源匮乏**：市面上缺乏系统性指导从零构建类 ChatGPT 模型的教程，新手在数据预处理、预训练到微调的全流程中容易迷失方向。\n\n### 使用 LLMs-from-scratch 后\n- **白盒掌控**：借助该工具提供的分步代码，团队成员亲手用 PyTorch 实现了每一个模块，彻底厘清了从词嵌入到自注意力机制的完整数据流向。\n- **知行合一**：配合书籍中的图表与实例，复杂的算法原理瞬间转化为可视化的代码逻辑，极大缩短了理解周期，让复现论文变得轻松自如。\n- **灵活魔改**：在清晰掌握底层架构的基础上，团队成功删减了冗余层并调整了上下文窗口，高效训练出适配实验室服务器的小型专用模型。\n- **全流程贯通**：利用其涵盖的数据加载、预训练及微调完整链路代码，学生们系统性地掌握了大模型开发的全貌，具备了独立研发能力。\n\nLLMs-from-scratch 将高深的大模型技术拆解为可执行的代码步骤，让开发者从单纯的“使用者”蜕变为真正的“创造者”。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frasbt_LLMs-from-scratch_1a2cfa5e.png","rasbt","Sebastian Raschka","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Frasbt_4eb76c31.jpg","AI Research Engineer working on LLMs.",null,"https:\u002F\u002Fsebastianraschka.com","https:\u002F\u002Fgithub.com\u002Frasbt",[81,85,89],{"name":82,"color":83,"percentage":84},"Jupyter Notebook","#DA5B0B",75.2,{"name":86,"color":87,"percentage":88},"Python","#3572A5",24.8,{"name":90,"color":91,"percentage":92},"Dockerfile","#384d54",0,90106,13794,"2026-04-06T11:19:32","NOASSERTION","Linux, macOS, Windows","非必需（代码可在普通笔记本电脑上运行），若有 GPU 会自动利用；具体型号、显存大小及 CUDA 版本未在 README 中明确说明","未说明",{"notes":101,"python":102,"dependencies":103},"本项目旨在从零构建 GPT 类模型，主要依赖 PyTorch，不使用外部 LLM 库。代码设计为可在普通笔记本电脑上运行，无需专用硬件。详细的 Python 安装、包管理及环境设置指南请参阅仓库中 setup 目录下的 README.md 文件。若熟悉深度神经网络或 PyTorch 基础将更有帮助，附录 A 提供了 PyTorch 简介。","未说明（需具备扎实的 Python 编程基础）",[104],"PyTorch",[35,15,13,14],[107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122],"chatgpt","gpt","large-language-models","llm","python","pytorch","ai","artificial-intelligence","language-model","deep-learning","machine-learning","from-scratch","generative-ai","transformers","neural-networks","chatbot",5,"2026-03-27T02:49:30.150509","2026-04-06T22:44:35.706045",[127,132,137,142,146,150],{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},20419,"如何确保使用 Ollama API 时生成可复现（确定性）的输出？","为了使 Ollama API 的输出完全可复现，需要在请求的 JSON 数据中添加一个 `options` 字段，并设置以下参数：\n1. `\"seed\": 123`（或其他固定整数）：设置随机种子。\n2. `\"temperature\": 0`：将温度设为 0 以消除随机性。\n3. `\"num_ctx\": 2048`（或固定值）：固定上下文窗口大小，否则输出仍会有轻微随机性。\n\n示例代码片段：\n```python\ndata = {\n    \"model\": model,\n    \"messages\": [{\"role\": \"user\", \"content\": prompt}],\n    \"options\": {\n        \"seed\": 123,\n        \"temperature\": 0,\n        \"num_ctx\": 2048\n    }\n}\n```\n注意：某些非确定性可能源于 KV 缓存或硬件差异，需关注 Ollama 后续更新以禁用 KV 缓存。","https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Fissues\u002F249",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},20420,"在加载权重到 GPT 模型时遇到张量尺寸不匹配（如 256 与 1024）的错误怎么办？","此错误通常由以下原因导致：\n1. **代码执行顺序问题**：确保按顺序执行了 Notebook 中的所有单元格，避免跳跃执行导致变量状态不一致。\n2. **模型结构修改未同步**：如果你替换了多头注意力模块（Multihead Attention Module）或修改了模型架构，权重变量的名称和形状会发生变化。必须相应地更新 `load_weights_into_gpt` 函数中的权重加载逻辑，使其与新的模型结构匹配。\n3. 
**版本不一致**：确保拉取了仓库的最新提交（`git pull`），因为代码可能已更新。\n\n如果修改了模型结构，请检查权重矩阵的转置操作（`.T`）和拆分逻辑（`np.split`）是否与当前模型定义的维度一致。","https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Fissues\u002F215",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},20421,"为什么在 Windows 上解压 IMDB 数据集或运行相关脚本非常慢？","Windows 文件系统在处理大量小文件（IMDB 数据集约 50,000 个文件）时效率较低，导致解压和读取过程耗时显著增加（可能需要 3-5 分钟甚至更久）。\n相比之下：\n- macOS 上约需 29 秒。\n- Google Colab (Ubuntu) 上仅需约 5 秒。\n\n**建议解决方案**：\n1. 如果可能，使用 WSL (Windows Subsystem for Linux) 或 Docker 运行 Ubuntu 环境，虽然 Docker 在某些配置下也可能较慢，但通常优于原生 Windows 文件系统。\n2. 作者已意识到此问题并计划替换为更小的数据集以提高易用性。在等待更新期间，耐心等待解压完成即可，这并非代码错误。","https:\u002F\u002Fgithub.com\u002Frasbt\u002FLLMs-from-scratch\u002Fissues\u002F155",{"id":143,"question_zh":144,"answer_zh":145,"source_url":141},20422,"运行 IMDB 分类训练脚本时验证集损失（Val loss）出现 NaN 是什么原因？","验证集损失出现 NaN 通常是因为数据集划分或文件路径配置不正确。具体表现为：\n1. **缺少必要脚本**：确保文件夹中包含 `gpt_download.py` 和 `previous_chapters.py` 等依赖脚本，否则无法正常运行训练。\n2. **数据集准备失败**：`download-prepare-dataset.py` 可能未正确创建测试集和验证集。如果直接从其他章节复制文件而未重新生成数据划分，可能导致数据加载错误。\n3. **解决方法**：重新运行数据下载和预处理脚本，确保 train\u002Fval\u002Ftest 集合正确生成。检查日志中是否有文件缺失或路径错误的警告。",{"id":147,"question_zh":148,"answer_zh":149,"source_url":136},20423,"在 GitHub 上寻求帮助时，如果提供私有仓库链接导致维护者无法查看代码怎么办？","维护者无法访问私有仓库。如果遇到此类情况：\n1. **公开仓库**：将你的仓库设置为公共（Public），以便他人查看代码。\n2. **使用 Fork**：最佳实践是 Fork 原项目仓库而不是直接克隆后新建私有库，这样便于维护者对比差异。\n3. **更新代码**：确保你的代码基于原项目的最新提交（`git pull`），排除因版本过旧导致的已知问题。\n\n在提问时，请直接提供可公开访问的代码链接或最小复现代码片段。",{"id":151,"question_zh":152,"answer_zh":153,"source_url":141},20424,"不同操作系统（Windows, macOS, Linux）在运行本项目时的性能差异有多大？","性能差异主要体现在文件 I\u002FO 操作上，尤其是处理包含大量小文件的数据集时：\n- **Windows**：最慢，解压和处理 IMDB 数据集可能需要数分钟，受限于文件系统对小文件的处理能力。\n- **macOS**：表现中等，同类任务约需 30 秒左右。\n- **Linux (如 Google Colab\u002FUbuntu)**：最快，通常只需几秒到 2-3 分钟。\n\n如果在 Windows 上遇到性能瓶颈，建议使用 WSL2 或在云端 Linux 环境（如 Colab, Kaggle, Lightning Studios）中运行代码以获得更佳体验。",[]]