[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-stanford-crfm--mistral":3,"tool-stanford-crfm--mistral":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160015,2,"2026-04-18T11:30:52",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":77,"owner_website":78,"owner_url":79,"languages":80,"stars":101,"forks":102,"last_commit_at":103,"license":104,"difficulty_score":105,"env_os":106,"env_gpu":107,"env_ram":106,"env_deps":108,"category_tags":118,"github_topics":77,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":119,"updated_at":120,"faqs":121,"releases":156},9143,"stanford-crfm\u002Fmistral","mistral","Mistral: A strong, northwesterly wind: Framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗  Transformers.","Mistral 是一个专为大规模语言模型训练设计的开源框架，旨在让复杂的模型训练过程变得更加透明和易于上手。它基于广受认可的 Hugging Face Transformers 构建，就像其名字所寓意的“强劲西北风”一样，致力于为用户带来清晰、高效的开发体验。\n\n过去，从零开始预训练大型语言模型往往面临环境配置复杂、分布式训练门槛高以及评估流程繁琐等挑战。Mistral 通过提供一套完整的工具链和脚本，有效解决了这些痛点。它不仅支持灵活接入新的预训练数据集，还内置了多种单机及多机分布式训练方案，甚至涵盖了在 Google Cloud 等云平台上的部署指南，同时提供了便捷的模型评估脚本。\n\n这款工具特别适合 
AI 研究人员、深度学习工程师以及希望深入探索大模型底层机制的开发者使用。无论是想在单张 GPU 上快速验证想法，还是需要在多节点集群上进行大规模实验，Mistral 都能提供稳定支持。其独特的技术亮点在于与 Hugging Face 生态的无缝集成：训练生成的模型检查点直接采用标准格式，用户无需额外转换即可直接加载并使用，极大地简化了从训练到应用的流程。如果你渴望在透明、可控的环境","Mistral 是一个专为大规模语言模型训练设计的开源框架，旨在让复杂的模型训练过程变得更加透明和易于上手。它基于广受认可的 Hugging Face Transformers 构建，就像其名字所寓意的“强劲西北风”一样，致力于为用户带来清晰、高效的开发体验。\n\n过去，从零开始预训练大型语言模型往往面临环境配置复杂、分布式训练门槛高以及评估流程繁琐等挑战。Mistral 通过提供一套完整的工具链和脚本，有效解决了这些痛点。它不仅支持灵活接入新的预训练数据集，还内置了多种单机及多机分布式训练方案，甚至涵盖了在 Google Cloud 等云平台上的部署指南，同时提供了便捷的模型评估脚本。\n\n这款工具特别适合 AI 研究人员、深度学习工程师以及希望深入探索大模型底层机制的开发者使用。无论是想在单张 GPU 上快速验证想法，还是需要在多节点集群上进行大规模实验，Mistral 都能提供稳定支持。其独特的技术亮点在于与 Hugging Face 生态的无缝集成：训练生成的模型检查点直接采用标准格式，用户无需额外转换即可直接加载并使用，极大地简化了从训练到应用的流程。如果你渴望在透明、可控的环境中构建自己的大语言模型，Mistral 将是一个得力的助手。","\u003Cdiv align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstanford-crfm_mistral_readme_9297b1525833.png\" height=\"300px\"\u002F>\u003C\u002Fdiv>\n\n# Mistral\n\n> *Mistral*: A strong and cool northwesterly wind that builds as it moves, bringing good health and clear skies.\n\n[![license](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-green.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n[![pre-commit](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpre--commit-enabled-green?logo=pre-commit&logoColor=white)](https:\u002F\u002Fgithub.com\u002Fpre-commit\u002Fpre-commit)\n\nA framework for transparent and accessible large-scale language model training, built with [Hugging Face 🤗](https:\u002F\u002Fhuggingface.co\u002F) . 
Includes tools\nand helpful scripts for incorporating new pre-training datasets, various schemes for single node and distributed training - including on\ncloud providers like GCP, and importantly, scripts for evaluation.\n\nVisit our [Read the Docs](https:\u002F\u002Fnlp.stanford.edu\u002Fmistral) for the full documentation.\n\nA Propulsion Endeavor 🚀\n\n---\n\n## Quickstart\n\n### Installation\n\nMistral has been tested with Python 3.8.12, PyTorch 1.11.0 (compiled with CUDA 11.3), CUDA 11.3, NCCL 2.10, Transformers 4.17.0, and DeepSpeed 0.6.0.\n\nThe environment can be easily built with the following commands:\n\n```bash\nconda create -n mistral python=3.8.12 pytorch=1.11.0 torchdata cudatoolkit=11.3 -c pytorch\nconda activate mistral\npip install -r setup\u002Fpip-requirements.txt\n```\n\nA `.yaml` export of a tested environment is provided at `environments\u002Fenvironment-gpu.yaml`.\n\nEnvironments and non-Python dependencies can be managed with conda, and Python dependencies can be managed with pip (note: conda was used for the PyTorch install to get the version compiled with CUDA 11.3).\n\n\n### Training GPT-2 Micro\n\n#### Prerequisites\n\nFirst, make sure to update `conf\u002Fmistral-micro.yaml` with the directories you want to store the Hugging Face\ncache and model runs.\n\n```\n# Artifacts & Caching\nartifacts:\n    cache_dir: \u002Fpath\u002Fto\u002Fartifacts\n    run_dir: \u002Fpath\u002Fto\u002Fruns\n```\n\nNext, make sure that `\u002Fpath\u002Fto\u002Fmistral` is on your `PYTHONPATH`.\n\n#### Single-node single-GPU training\n\nFor single-node single-gpu training, run:\n\n```bash\nconda activate mistral\ncd mistral\nCUDA_VISIBLE_DEVICES=0 python train.py --config conf\u002Fmistral-micro.yaml --nnodes 1 --nproc_per_node 1 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 2 --run_id tutorial-gpt2-micro\n```\n\n#### Multi-node multi-GPU training with DeepSpeed\n\nModify `\u002Fjob\u002Fhostfile` in the following 
way:\n\n```\n\u003CHostname of first machine> slots=\u003CNumber of GPUs>\n\u003CHostname of second machine> slots=\u003CNumber of GPUs>\n...\n\u003CHostname of the nth machine> slots=\u003CNumber of GPUs>\n```\n\nBelow is an example hostfile where we train on `machine1` and `machine2` with 8 GPUs each:\n\n```\nmachine1 slots=8\nmachine2 slots=8\n```\n\nTo start distributed training, run:\n\n```bash\nconda activate mistral\ncd mistral\ndeepspeed --num_gpus 8 --num_nodes 2 --master_addr machine1 train.py --config conf\u002Ftutorial-gpt2-micro.yaml --nnodes 2 --nproc_per_node 8 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 4 --training_arguments.deepspeed conf\u002Fdeepspeed\u002Fz2-small-conf.json --run_id tutorial-gpt2-micro-multi-node\n```\n\nNote: You may need to adjust your batch size depending on the capacity of your GPUs.\n\nIf you are interested in training a model on Google Cloud, check out our\n[Google Cloud + Kubernetes Tutorial](https:\u002F\u002Fnlp.stanford.edu\u002Fmistral\u002Ftutorials\u002Fgcp_plus_kubernetes.html).\n\n### Using the model\n\nModel checkpoints will be stored in the directory specified by `artifacts.run_dir`. 
An example checkpoint might be\nin `\u002Fpath\u002Fto\u002Fruns\u002Ftutorial-gpt2-micro\u002Fcheckpoint-1000`.\n\nMistral stores model checkpoints in the Hugging Face format, so models can be loaded and used in the same manner as if\none had trained the model with Hugging Face.\n\nFor instance, to generate text with 🤗  Transformers:\n\n```python\nfrom transformers import GPT2LMHeadModel, GPT2Tokenizer\n\ntokenizer = GPT2Tokenizer.from_pretrained(\"gpt2\")\n\nmodel = GPT2LMHeadModel.from_pretrained(\"stanford-crfm\u002Feowyn-x777-checkpoint-400000\")\n\ninput_ids = tokenizer.encode(\n    \"Hello world, this is a language model prompt.\", return_tensors=\"pt\"\n)\n\nsample_output = model.generate(input_ids, do_sample=True, max_length=50, top_k=50)\n\nprint(\"Output:\\n\" + 100 * \"-\")\nprint(tokenizer.decode(sample_output[0], skip_special_tokens=True))\n```\n\nCheck out this [Google CoLab Notebook](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fstanford-crfm\u002Fmistral\u002Fblob\u002Fmain\u002Fgenerate_text.ipynb) to run\nthis demo!\n\n---\n\n## Resources\n\nThe Propulsion team has trained 5 GPT-2 Medium models and 5 GPT-2 Small models on the [OpenWebText corpus](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopenwebtext),\nas found in [🤗  datasets](https:\u002F\u002Fhuggingface.co\u002Fdatasets).\n\nEach GPT-2 Small model has 600 checkpoints, subject to the following checkpoint schedule:\n\n- Every 10 Steps, for the first 0 - 100 Steps.\n- Every 50 Steps, from 100 - 2000 Steps.\n- Every 100 Steps, from 2000 - 20,000 Steps.\n- Every 1000 Steps, from 20,000 - 400,000 Steps.\n\nCheckpoints can be downloaded from [🤗 hub](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm).\n\n| Run | Type | Seed | Download |\n| --- | --- | --- | --- |\n| Alias | GPT-2 Small | 21 | [download](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Falias-gpt2-small-x21\u002Ftree\u002Fmain) |\n| Battlestar | GPT-2 Small | 49 | 
[download](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fbattlestar-gpt2-small-x49\u002Ftree\u002Fmain) |\n| Caprica | GPT-2 Small | 81 | [download](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fcaprica-gpt2-small-x81\u002Ftree\u002Fmain) |\n| Darkmatter | GPT-2 Small | 343 | [download](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fdarkmatter-gpt2-small-x343\u002Ftree\u002Fmain) |\n| Expanse | GPT-2 Small | 777 | [download](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fexpanse-gpt2-small-x777\u002Ftree\u002Fmain) |\n| Arwen | GPT-2 Medium | 21 | [download](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Farwen-gpt2-medium-x21\u002Ftree\u002Fmain) |\n| Beren | GPT-2 Medium | 49 | [download](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fberen-gpt2-medium-x49\u002Ftree\u002Fmain) |\n| Celebrimbor | GPT-2 Medium | 81 | [download](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fcelebrimbor-gpt2-medium-x81\u002Ftree\u002Fmain) |\n| Durin | GPT-2 Medium | 343 | [download](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fdurin-gpt2-medium-x343\u002Ftree\u002Fmain) |\n| Eowyn | GPT-2 Medium | 777 | [download](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Feowyn-gpt2-medium-x777\u002Ftree\u002Fmain) |\n\n\nEach model has a distinct git repo, and each checkpoint is stored as a branch.\n\nAs an example, here's how to get the battlestar model's checkpoint for step 300000:\n\n```\n# Make sure you have git-lfs installed\n# (https:\u002F\u002Fgit-lfs.github.com)\ngit lfs install\n\n# get checkpoint 300000 for battlestar\ngit clone https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fbattlestar-gpt2-small-x49 --branch checkpoint-300000 --single-branch\ncd battlestar-gpt2-small-x49\ngit lfs pull\n```\n\nFor convenience, every model and step checkpoint is listed in `mistral_models.json`.\n\n---\n\n## Issues\n\nTo ask questions, report issues, or request features, please use the [GitHub Issue 
Tracker](https:\u002F\u002Fgithub.com\u002Fstanford-crfm\u002Fmistral\u002Fissues).\nBefore creating a new issue, please make sure to search for existing issues that may solve your problem.\n\n---\n\n## Differences between Mistral and Hugging Face\n\nPlease visit the [following page](https:\u002F\u002Fnlp.stanford.edu\u002Fmistral\u002Fhugging_face_differences.html) that outlines the\ndifferences between the two codebases.\n\n---\n\n## Contributing\n\nPlease see the [following page](https:\u002F\u002Fnlp.stanford.edu\u002Fmistral\u002Fcontributing.html) for information on contributing.\n","\u003Cdiv align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstanford-crfm_mistral_readme_9297b1525833.png\" height=\"300px\"\u002F>\u003C\u002Fdiv>\n\n# Mistral\n\n> *Mistral*：一种强劲而清凉的西北风，越吹越强，带来宜人的气候与晴朗的天空。\n\n[![license](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-green.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n[![pre-commit](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpre--commit-enabled-green?logo=pre-commit&logoColor=white)](https:\u002F\u002Fgithub.com\u002Fpre-commit\u002Fpre-commit)\n\n一个用于透明且易于访问的大规模语言模型训练框架，基于 [Hugging Face 🤗](https:\u002F\u002Fhuggingface.co\u002F) 构建。包含用于引入新预训练数据集的工具和实用脚本、单节点及分布式训练的各种方案——包括在 GCP 等云服务提供商上的部署，以及重要的评估脚本。\n\n请访问我们的 [Read the Docs](https:\u002F\u002Fnlp.stanford.edu\u002Fmistral) 获取完整文档。\n\nA Propulsion Endeavor 🚀\n\n---\n\n## 快速入门\n\n### 安装\n\nMistral 已在以下环境中测试通过：Python 3.8.12、PyTorch 1.11.0（使用 CUDA 11.3 编译）、CUDA 11.3、NCCL 2.10、Transformers 4.17.0 和 DeepSpeed 0.6.0。\n\n可通过以下命令轻松构建环境：\n\n```bash\nconda create -n mistral python=3.8.12 pytorch=1.11.0 torchdata cudatoolkit=11.3 -c pytorch\nconda activate mistral\npip install -r setup\u002Fpip-requirements.txt\n```\n\n我们提供了一个经过测试的环境配置文件导出 `.yaml` 文件，位于 `environments\u002Fenvironment-gpu.yaml`。\n\n可以使用 conda 管理环境和非 Python 依赖项，而 Python 依赖项则可以通过 pip 管理（注意：为了获取使用 CUDA 11.3 编译的版本，PyTorch 的安装是通过 conda 
完成的）。\n\n### 训练 GPT-2 Micro\n\n#### 前提条件\n\n首先，请确保更新 `conf\u002Fmistral-micro.yaml` 文件中的目录设置，以指定 Hugging Face 缓存和模型运行日志的存储路径。\n\n```\n# 资源与缓存\nartifacts:\n    cache_dir: \u002Fpath\u002Fto\u002Fartifacts\n    run_dir: \u002Fpath\u002Fto\u002Fruns\n```\n\n其次，确保 `\u002Fpath\u002Fto\u002Fmistral` 已添加到你的 `PYTHONPATH` 中。\n\n#### 单节点单 GPU 训练\n\n对于单节点单 GPU 训练，运行以下命令：\n\n```bash\nconda activate mistral\ncd mistral\nCUDA_VISIBLE_DEVICES=0 python train.py --config conf\u002Fmistral-micro.yaml --nnodes 1 --nproc_per_node 1 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 2 --run_id tutorial-gpt2-micro\n```\n\n#### 多节点多 GPU 分布式训练（使用 DeepSpeed）\n\n按如下方式修改 `\u002Fjob\u002Fhostfile` 文件：\n\n```\n\u003C第一台机器的主机名> slots=\u003CGPU数量>\n\u003C第二台机器的主机名> slots=\u003CGPU数量>\n...\n\u003C第 n 台机器的主机名> slots=\u003CGPU数量>\n```\n\n以下是一个示例 hostfile，我们在 `machine1` 和 `machine2` 上各使用 8 张 GPU 进行训练：\n\n```\nmachine1 slots=8\nmachine2 slots=8\n```\n\n要启动分布式训练，运行以下命令：\n\n```bash\nconda activate mistral\ncd mistral\ndeepspeed --num_gpus 8 --num_nodes 2 --master_addr machine1 train.py --config conf\u002Ftutorial-gpt2-micro.yaml --nnodes 2 --nproc_per_node 8 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 4 --training_arguments.deepspeed conf\u002Fdeepspeed\u002Fz2-small-conf.json --run_id tutorial-gpt2-micro-multi-node\n```\n\n注意：你可能需要根据 GPU 的容量调整批量大小。\n\n如果你有兴趣在 Google Cloud 上训练模型，请查看我们的 [Google Cloud + Kubernetes 教程](https:\u002F\u002Fnlp.stanford.edu\u002Fmistral\u002Ftutorials\u002Fgcp_plus_kubernetes.html)。\n\n### 使用模型\n\n模型检查点将存储在 `artifacts.run_dir` 指定的目录中。例如，检查点可能位于 `\u002Fpath\u002Fto\u002Fruns\u002Ftutorial-gpt2-micro\u002Fcheckpoint-1000`。\n\nMistral 以 Hugging Face 格式存储模型检查点，因此这些模型可以像直接使用 Hugging Face 训练的模型一样加载和使用。\n\n例如，使用 🤗 Transformers 生成文本：\n\n```python\nfrom transformers import GPT2LMHeadModel, GPT2Tokenizer\n\ntokenizer = GPT2Tokenizer.from_pretrained(\"gpt2\")\n\nmodel = 
GPT2LMHeadModel.from_pretrained(\"stanford-crfm\u002Feowyn-x777-checkpoint-400000\")\n\ninput_ids = tokenizer.encode(\n    \"你好，世界！这是一个语言模型提示。\", return_tensors=\"pt\"\n)\n\nsample_output = model.generate(input_ids, do_sample=True, max_length=50, top_k=50)\n\nprint(\"输出：\\n\" + 100 * \"-\")\nprint(tokenizer.decode(sample_output[0], skip_special_tokens=True))\n```\n\n请查看这个 [Google Colab 笔记本](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fstanford-crfm\u002Fmistral\u002Fblob\u002Fmain\u002Fgenerate_text.ipynb)，运行此演示！\n\n---\n\n## 资源\n\nPropulsion 团队已在 [OpenWebText 语料库](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopenwebtext) 上训练了 5 个 GPT-2 Medium 模型和 5 个 GPT-2 Small 模型，该语料库可在 [🤗 datasets](https:\u002F\u002Fhuggingface.co\u002Fdatasets) 中找到。\n\n每个 GPT-2 Small 模型共有 600 个检查点，其保存策略如下：\n\n- 前 100 步，每 10 步保存一次。\n- 第 100 步至第 2000 步，每 50 步保存一次。\n- 第 2000 步至第 20000 步，每 100 步保存一次。\n- 第 20000 步至第 400000 步，每 1000 步保存一次。\n\n这些检查点可从 [🤗 hub](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm) 下载。\n\n| 运行 | 类型   | 种子 | 下载链接 |\n| --- | ------ | ---- | ---------- |\n| Alias | GPT-2 Small | 21 | [下载](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Falias-gpt2-small-x21\u002Ftree\u002Fmain) |\n| Battlestar | GPT-2 Small | 49 | [下载](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fbattlestar-gpt2-small-x49\u002Ftree\u002Fmain) |\n| Caprica | GPT-2 Small | 81 | [下载](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fcaprica-gpt2-small-x81\u002Ftree\u002Fmain) |\n| Darkmatter | GPT-2 Small | 343 | [下载](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fdarkmatter-gpt2-small-x343\u002Ftree\u002Fmain) |\n| Expanse | GPT-2 Small | 777 | [下载](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fexpanse-gpt2-small-x777\u002Ftree\u002Fmain) |\n| Arwen | GPT-2 Medium | 21 | [下载](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Farwen-gpt2-medium-x21\u002Ftree\u002Fmain) |\n| Beren | GPT-2 Medium | 49 | 
[下载](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fberen-gpt2-medium-x49\u002Ftree\u002Fmain) |\n| Celebrimbor | GPT-2 Medium | 81 | [下载](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fcelebrimbor-gpt2-medium-x81\u002Ftree\u002Fmain) |\n| Durin | GPT-2 Medium | 343 | [下载](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fdurin-gpt2-medium-x343\u002Ftree\u002Fmain) |\n| Eowyn | GPT-2 Medium | 777 | [下载](https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Feowyn-gpt2-medium-x777\u002Ftree\u002Fmain) |\n\n\n每个模型都有独立的 Git 仓库，每个检查点都以分支形式存储。\n\n例如，以下是获取 battlestar 模型第 300000 步检查点的方法：\n\n```\n# 确保已安装 git-lfs\n# (https:\u002F\u002Fgit-lfs.github.com)\ngit lfs install\n\n# 获取 battlestar 模型的第 300000 步检查点\ngit clone https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fbattlestar-gpt2-small-x49 --branch checkpoint-300000 --single-branch\ncd battlestar-gpt2-small-x49\ngit lfs pull\n```\n\n为方便起见，所有模型及其步骤检查点均列在 `mistral_models.json` 文件中。\n\n---\n\n## 问题\n\n如需提问、报告问题或请求功能，请使用 [GitHub 问题跟踪器](https:\u002F\u002Fgithub.com\u002Fstanford-crfm\u002Fmistral\u002Fissues)。\n在创建新问题之前，请务必搜索现有问题，看看是否已有解决方案。\n\n---\n\n## Mistral 与 Hugging Face 的区别\n\n请访问[此页面](https:\u002F\u002Fnlp.stanford.edu\u002Fmistral\u002Fhugging_face_differences.html)，其中概述了这两个代码库之间的差异。\n\n---\n\n## 贡献\n\n有关贡献的信息，请参阅[此页面](https:\u002F\u002Fnlp.stanford.edu\u002Fmistral\u002Fcontributing.html)。","# Mistral 快速上手指南\n\nMistral 是一个基于 Hugging Face 🤗 构建的大规模语言模型训练框架，旨在提供透明且易于访问的训练流程。它支持单节点及分布式训练（包括云端），并提供了数据集集成和模型评估的实用脚本。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下经过测试的版本要求：\n\n*   **Python**: 3.8.12\n*   **PyTorch**: 1.11.0 (需使用 CUDA 11.3 编译)\n*   **CUDA**: 11.3\n*   **NCCL**: 2.10\n*   **Transformers**: 4.17.0\n*   **DeepSpeed**: 0.6.0\n\n**前置依赖：**\n*   已安装 `conda` (推荐用于管理非 Python 依赖和 PyTorch)\n*   已安装 `git` 和 `git-lfs` (用于下载模型检查点)\n\n> **注意**：国内用户若遇到连接问题，可配置 conda 使用清华或中科大镜像源，并在 pip 安装时指定国内镜像源（如 `-i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`）。\n\n## 安装步骤\n\n推荐使用 conda 
创建独立环境以确保依赖版本兼容。\n\n1.  **创建并激活 conda 环境**：\n    ```bash\n    conda create -n mistral python=3.8.12 pytorch=1.11.0 torchdata cudatoolkit=11.3 -c pytorch\n    conda activate mistral\n    ```\n\n2.  **安装 Python 依赖**：\n    ```bash\n    pip install -r setup\u002Fpip-requirements.txt\n    ```\n    *(国内用户建议添加镜像源参数，例如：`pip install -r setup\u002Fpip-requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`)*\n\n3.  **验证环境（可选）**：\n    项目提供了一个测试过的环境导出文件 `environments\u002Fenvironment-gpu.yaml`，如需复现完全一致的环境可参考该文件。\n\n## 基本使用\n\n### 1. 配置运行路径\n\n在运行训练前，需修改配置文件 `conf\u002Fmistral-micro.yaml`，设置 artifacts 缓存目录和模型运行输出目录。\n\n```yaml\n# Artifacts & Caching\nartifacts:\n    cache_dir: \u002Fpath\u002Fto\u002Fartifacts  # 替换为您的实际路径\n    run_dir: \u002Fpath\u002Fto\u002Fruns         # 替换为您的实际路径\n```\n\n确保将 mistral 项目根目录添加到 `PYTHONPATH`：\n```bash\nexport PYTHONPATH=\u002Fpath\u002Fto\u002Fmistral:$PYTHONPATH\n```\n\n### 2. 单节点单卡训练 (GPT-2 Micro)\n\n激活环境后，使用以下命令启动训练：\n\n```bash\nconda activate mistral\ncd mistral\nCUDA_VISIBLE_DEVICES=0 python train.py --config conf\u002Fmistral-micro.yaml --nnodes 1 --nproc_per_node 1 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 2 --run_id tutorial-gpt2-micro\n```\n\n### 3. 
加载与使用模型\n\n训练完成的检查点默认保存在 `artifacts.run_dir` 指定的目录下（例如 `\u002Fpath\u002Fto\u002Fruns\u002Ftutorial-gpt2-micro\u002Fcheckpoint-1000`）。Mistral 生成的模型格式与 Hugging Face 完全兼容，可直接使用 `transformers` 库加载。\n\n以下是一个生成文本的 Python 示例：\n\n```python\nfrom transformers import GPT2LMHeadModel, GPT2Tokenizer\n\n# 加载分词器\ntokenizer = GPT2Tokenizer.from_pretrained(\"gpt2\")\n\n# 加载预训练模型 (此处以斯坦福发布的 Eowyn 模型为例，也可替换为您的本地检查点路径)\nmodel = GPT2LMHeadModel.from_pretrained(\"stanford-crfm\u002Feowyn-x777-checkpoint-400000\")\n\n# 编码输入\ninput_ids = tokenizer.encode(\n    \"Hello world, this is a language model prompt.\", return_tensors=\"pt\"\n)\n\n# 生成文本\nsample_output = model.generate(input_ids, do_sample=True, max_length=50, top_k=50)\n\nprint(\"Output:\\n\" + 100 * \"-\")\nprint(tokenizer.decode(sample_output[0], skip_special_tokens=True))\n```\n\n### 4. 下载预训练模型 (可选)\n\n如果您不想从头训练，可以从 Hugging Face Hub 下载斯坦福团队预训练的 GPT-2 Small\u002FMedium 模型检查点。由于检查点存储在 Git 分支中，需使用 `git` 克隆特定分支：\n\n```bash\n# 确保已安装 git-lfs\ngit lfs install\n\n# 示例：下载 battlestar 模型的第 300,000 步检查点\ngit clone https:\u002F\u002Fhuggingface.co\u002Fstanford-crfm\u002Fbattlestar-gpt2-small-x49 --branch checkpoint-300000 --single-branch\ncd battlestar-gpt2-small-x49\ngit lfs pull\n```\n\n更多可用模型列表请参阅项目文档中的资源表格。","某高校 NLP 实验室的研究团队正试图复现一篇关于小参数语言模型的论文，并需要在有限的本地 GPU 集群上快速验证新的预训练数据混合策略。\n\n### 没有 mistral 时\n- **环境配置繁琐**：团队成员需手动对齐 PyTorch、CUDA、DeepSpeed 和 Transformers 的版本，常因依赖冲突导致数天的环境调试时间浪费。\n- **分布式训练门槛高**：编写多节点多卡的启动脚本极其复杂，尤其是配置 GCP 云主机间的通信和 DeepSpeed 参数时，极易出错且难以排查。\n- **评估流程割裂**：训练完成后，缺乏统一的评估脚本，研究人员需自行编写代码将模型转换为 Hugging Face 格式才能进行效果测试，中断了实验闭环。\n- **实验复现困难**：由于缺乏标准化的配置文件管理，不同成员运行的实验参数记录混乱，导致结果难以精确复现和对比。\n\n### 使用 mistral 后\n- **一键构建环境**：利用 mistral 提供的经过验证的 Conda 环境和 pip 依赖列表，团队在几分钟内即可搭建好包含 CUDA 11.3 和 DeepSpeed 的标准开发环境。\n- **简化分布式部署**：通过简单的 hostfile 配置和内置的 DeepSpeed 启动命令，轻松实现跨机器、多 GPU 的并行训练，甚至可直接复用其 GCP+Kubernetes 教程上云。\n- **无缝评估集成**：mistral 自动将模型检查点保存为标准的 Hugging Face 格式，研究人员可直接调用官方脚本或 Transformers 库进行即时推理和评估，无需额外转换。\n- **标准化实验管理**：基于 YAML 
的配置文件统一管理数据路径、超参数和运行目录，确保了每次实验的透明度和可复现性，大幅提升了协作效率。\n\nmistral 通过提供从环境搭建、分布式训练到模型评估的全链路标准化工具，让研究团队能将精力完全聚焦于算法创新而非工程基建。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstanford-crfm_mistral_af7846c5.png","stanford-crfm","Stanford Center for Research on Foundation Models","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fstanford-crfm_d57d38b2.png","",null,"https:\u002F\u002Fcrfm.stanford.edu\u002F","https:\u002F\u002Fgithub.com\u002Fstanford-crfm",[81,85,89,93,97],{"name":82,"color":83,"percentage":84},"Python","#3572A5",59.6,{"name":86,"color":87,"percentage":88},"Shell","#89e051",31.9,{"name":90,"color":91,"percentage":92},"Jupyter Notebook","#DA5B0B",6,{"name":94,"color":95,"percentage":96},"Dockerfile","#384d54",1.8,{"name":98,"color":99,"percentage":100},"Makefile","#427819",0.7,579,51,"2026-04-09T07:52:52","Apache-2.0",4,"未说明","需要 NVIDIA GPU（支持 CUDA），测试环境为 CUDA 11.3，需安装 NCCL 以支持分布式训练，显存大小需根据 batch size 调整（未明确具体数值）",{"notes":109,"python":110,"dependencies":111},"建议使用 conda 创建环境以确保 PyTorch 编译版本与 CUDA 11.3 匹配；分布式训练需配置 hostfile 并指定机器 hostname 和 GPU 数量；模型检查点以 Hugging Face 格式存储，可通过 git-lfs 下载特定分支的检查点。","3.8.12",[112,113,114,115,116,117],"pytorch=1.11.0","cudatoolkit=11.3","torchdata","transformers=4.17.0","deepspeed=0.6.0","nccl=2.10",[35,14],"2026-03-27T02:49:30.150509","2026-04-18T22:38:05.711185",[122,127,132,137,142,147,151],{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},41039,"如何正确配置 Mistral 的运行环境以解决依赖链问题？","建议按照以下步骤设置环境：\n1. 创建 Conda 环境：`conda create -n mistral python=3.8.8`\n2. 激活环境：`conda activate mistral`\n3. 安装 CUDA 工具包：`conda install cudatoolkit=10.2`（需根据实际显卡驱动调整版本）\n4. 
使用 pip 安装核心依赖：`pip install torch transformers datasets huggingface-hub deepspeed jsonlines quinine wandb`\n注意：必须确保安装的 PyTorch 版本与本地 CUDA 版本兼容。","https:\u002F\u002Fgithub.com\u002Fstanford-crfm\u002Fmistral\u002Fissues\u002F103",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},41040,"Mistral 项目经过测试的稳定版本组合是什么？","项目已在以下特定版本组合中通过测试，推荐参考此配置以避免兼容性问题：\n- Python: 3.8.12\n- PyTorch: 1.11.0 (编译基于 CUDA 11.3)\n- CUDA: 11.3\n- NCCL: 2.10\n- Transformers: 4.17.0\n- DeepSpeed: 0.6.0\n用户需确保设置的 PyTorch 是使用与其系统匹配的 CUDA 版本编译的。如果需要编译 DeepSpeed 内核，也需正确设置对应的 CUDA 版本。","https:\u002F\u002Fgithub.com\u002Fstanford-crfm\u002Fmistral\u002Fissues\u002F119",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},41041,"在进行在线评估时，如何处理 OpenWebText 数据的分词以获得更准确的损失值？","为了获得与论文更接近的数值，建议模仿 Megatron-LM 的做法进行去分词（detokenization）处理。具体来说，GPT-2 模型在预训练时很少见到像 \" .\"（空格加句号）这样的 token，因此直接计算可能导致偏差。可以参考 Megatron-LM 中的 detokenizer.py 实现逻辑，在计算评估指标前先对数据进行适当的去分词处理。","https:\u002F\u002Fgithub.com\u002Fstanford-crfm\u002Fmistral\u002Fissues\u002F12",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},41042,"如何在数据预处理阶段解决内存不足或批处理速度慢的问题？","如果机器内存不足以支撑多线程复制数据集，可以尝试临时增大 batch size，直接加载整个数据集后进行截断，这样可以最小化数据丢失并避免频繁的磁盘交换。虽然增大批次大小可能会增加单次预处理时间（例如从 1000 到 16000 可能耗时从 5 分钟增加到 1 小时），但这通常是解决内存瓶颈和扩展至大型数据集（如 OpenWebText）的可行权宜之计。长期方案是自定义 Dataset 类并将数据缓存为 HDF5 格式。","https:\u002F\u002Fgithub.com\u002Fstanford-crfm\u002Fmistral\u002Fissues\u002F5",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},41043,"是否支持从本地 .jsonl 文件加载自定义数据集？","是的，Mistral v2.0 路线图确认已实现支持在 `*.jsonl` 文件中使用本地\u002F自定义数据集的功能。这允许用户方便地导入自己的数据进行训练，而无需依赖远程 HuggingFace 数据集。","https:\u002F\u002Fgithub.com\u002Fstanford-crfm\u002Fmistral\u002Fissues\u002F109",{"id":148,"question_zh":149,"answer_zh":150,"source_url":146},41044,"如何使用命令行分别执行数据处理和启动训练任务？","Mistral v2.0 引入了新的 CLI 工作流，允许将数据处理与训练启动分离。具体命令如下：\n1. 仅运行数据处理：`mistral process-data --config conf\u002Fgpt2-small-demo.yaml`\n2. 
启动训练（如需可自动触发数据处理）：`mistral train --config conf\u002Fgpt2-small-demo.yaml --launcher deepspeed`\n这种重构解决了之前在使用 DeepSpeed 时可能出现的分词崩溃问题，并提供了更灵活的启动方式。",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},41045,"关于从 Checkpoint 恢复训练时的 WandB Run ID 问题需要注意什么？","虽然在 Mistral 的使用场景中 `run_id` 不应重复，但 WandB 本身对此并不敏感。Mistral 中的 `run_id` 被用作 WandB 中的 `name` 字段。在实现断点续训功能时，需注意处理 run_id 的逻辑，确保在恢复运行时能正确关联或更新 WandB 的记录，尽管 WandB 允许同名存在，但为了追踪清晰，建议在代码逻辑中妥善处理中断后的续跑标识。","https:\u002F\u002Fgithub.com\u002Fstanford-crfm\u002Fmistral\u002Fissues\u002F18",[157],{"id":158,"version":159,"summary_zh":160,"released_at":161},324608,"v1.0","这是 Mistral 的初始版本。\n\n完整的文档可以在这里找到：[https:\u002F\u002Fnlp.stanford.edu\u002Fmistral\u002F](https:\u002F\u002Fnlp.stanford.edu\u002Fmistral\u002F)。","2021-09-09T10:41:35"]