[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-stepfun-ai--NextStep-1":3,"tool-stepfun-ai--NextStep-1":64},[4,18,26,35,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,2,"2026-04-06T11:32:50",[14,15,13],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[43,15,13,14],"语言模型",{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth 
Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,52],"视频",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85013,"2026-04-06T11:09:19",[15,16,52,61,13,62,43,14,63],"插件","其他","音频",{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":79,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":95,"env_os":96,"env_gpu":97,"env_ram":98,"env_deps":99,"category_tags":109,"github_topics":79,"view_count":32,"oss_zip_url":79,"oss_zip_packed_at":79,"status":17,"created_at":110,"updated_at":111,"faqs":112,"releases":147},4987,"stepfun-ai\u002FNextStep-1","NextStep-1","[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autogressive Image Generation with Continuous Tokens. 
A research project developed by the StepFun’s Multimodal Intelligence team.","NextStep-1 是由阶跃星辰（StepFun）多模智能团队研发的前沿开源项目，旨在突破传统自回归模型在图像生成领域的瓶颈。长期以来，自回归架构虽擅长处理文本，但在生成图像时往往依赖昂贵的扩散模型或不得不将图像压缩为有损的离散标记，导致细节丢失。NextStep-1 另辟蹊径，采用了一种直接处理“连续图像标记”的创新路径。\n\n作为一个拥有 140 亿参数的超大模型，NextStep-1 能够联合建模离散文本序列与连续图像序列。它巧妙地结合了标准的语言模型头部用于处理文本，以及一个轻量级的流匹配（Flow Matching）头部用于处理视觉数据。这种统一的“下一标记预测”框架不仅结构简单、易于扩展，更完整保留了视觉数据的丰富信息，从而生成细节惊人、画质卓越的图像。\n\n该工具特别适合人工智能研究人员、算法工程师以及对新一代生成式 AI 架构感兴趣的技术开发者使用。通过开源代码与权重，NextStep-1 为探索大规模自回归图像生成提供了宝贵的基线与研究素材。其独特的连续标记技术与流匹配机制，代表了当前图像生成领域的重要技术方向，曾荣获 ICLR 2026 口头报告殊荣，是理解未来多模态","NextStep-1 是由阶跃星辰（StepFun）多模智能团队研发的前沿开源项目，旨在突破传统自回归模型在图像生成领域的瓶颈。长期以来，自回归架构虽擅长处理文本，但在生成图像时往往依赖昂贵的扩散模型或不得不将图像压缩为有损的离散标记，导致细节丢失。NextStep-1 另辟蹊径，采用了一种直接处理“连续图像标记”的创新路径。\n\n作为一个拥有 140 亿参数的超大模型，NextStep-1 能够联合建模离散文本序列与连续图像序列。它巧妙地结合了标准的语言模型头部用于处理文本，以及一个轻量级的流匹配（Flow Matching）头部用于处理视觉数据。这种统一的“下一标记预测”框架不仅结构简单、易于扩展，更完整保留了视觉数据的丰富信息，从而生成细节惊人、画质卓越的图像。\n\n该工具特别适合人工智能研究人员、算法工程师以及对新一代生成式 AI 架构感兴趣的技术开发者使用。通过开源代码与权重，NextStep-1 为探索大规模自回归图像生成提供了宝贵的基线与研究素材。其独特的连续标记技术与流匹配机制，代表了当前图像生成领域的重要技术方向，曾荣获 ICLR 2026 口头报告殊荣，是理解未来多模态大模型演进的关键参考。","# NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale\n\n\u003Cdiv align=\"center\">\n\n[![Homepage](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Homepage&message=Project%20Page&color=blue&logo=home)](https:\u002F\u002Fstepfun.ai\u002Fresearch\u002Fen\u002Fnextstep1)&nbsp;[![huggingface 
weights](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Weights-StepFun\u002FNextStep1-yellow)](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fstepfun-ai\u002Fnextstep-1-689d80238a01322b93b8a3dc)&nbsp;[![arXiv:2508.10711](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2508.10711-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10711)&nbsp;[![Blog](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBlog-NextStep1-blue)](https:\u002F\u002Fstepfun-ai.github.io\u002FNextStep-1\u002Fnextstep_1_blog\u002F)&nbsp;[![Blog](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBlog-NextStep1.1-blue)](https:\u002F\u002Fstepfun-ai.github.io\u002FNextStep-1\u002Fnextstep_1p1_blog\u002F)\n\n\u003C\u002Fdiv>\n\n> Autoregressive models—generating content step-by-step like reading a sentence—excel in language but struggle with images. Traditionally, they either depend on costly diffusion models or compress images into discrete, lossy tokens via vector quantization (VQ).\n>\n> NextStep-1 takes a different path: a 14B-parameter autoregressive model that works directly with continuous image tokens, preserving the full richness of visual data. It models sequences of discrete text tokens and continuous image tokens jointly—using a standard LM head for text and a lightweight 157M-parameter flow matching head for visuals. This unified next-token prediction framework is simple, scalable, and capable of producing stunningly detailed images.\n\n\u003Cdiv align=\"center\">\n\u003Cimg width=\"720\" alt=\"t2i_demo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstepfun-ai_NextStep-1_readme_ea575fc0f7e6.gif\">\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\u003Cimg width=\"720\" alt=\"edit_demo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstepfun-ai_NextStep-1_readme_dae3a338d964.gif\">\n\u003C\u002Fdiv>\n\n## 🔥 News\n\n- **Feb. 25, 2026**: **vLLM-Omni** supports high performance inference of NextStep-1.1. 
Please check [here](https:\u002F\u002Fdocs.vllm.ai\u002Fprojects\u002Fvllm-omni\u002Fen\u002Flatest\u002Fuser_guide\u002Fexamples\u002Foffline_inference\u002Ftext_to_image\u002F?h=nextstep#nextstep-models) for details!\n\n- **Feb. 16, 2026**: The training code of NextStep-1 (this repo) and the post-training blogs of NextStep-1.1 ([link](https:\u002F\u002Fstepfun-ai.github.io\u002FNextStep-1\u002Fnextstep_1p1_blog\u002F)) have been released. Welcome to discuss and contribute. Happy Chinese New Year!\n\n- **Feb. 6, 2026**: NextStep-1 has been selected as **Oral Presentation** by ICLR 2026! 🎉🎉🎉\n\n- **Dec. 24, 2025**: 🔥 We release **NextStep-1.1**, a text-to-image model that substantially elevates output quality through extended training and a Flow-based Reinforcement Learning (RL) post-training paradigm. Feel free to try with checkpoints hosted on our [HF repo](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1.1)!\n\n  Checkpoints are available on:\n  - 🤗 **Hugging Face**:\n    - Pretrain: [NextStep-1.1-Pretrain](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1.1-Pretrain)\n    - Post-train: [NextStep-1.1](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1.1)\n  - 🇨🇳 **ModelScope**:\n    - Pretrain: [NextStep-1.1-Pretrain](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fstepfun-ai\u002FNextStep-1.1-Pretrain)\n    - Post-train: [NextStep-1.1](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fstepfun-ai\u002FNextStep-1.1)\n\n- **Aug. 18, 2025**: 👋 We deploy NextStep-1-Large-Edit on [HuggingFace Spaces](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fstepfun-ai\u002FNextStep-1-Large-Edit). Feel free to try it out!\n\n- **Aug. 18, 2025**: 👋 We open the [WeChat Group](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstepfun-ai_NextStep-1_readme_7ce332a4b4b3.png). 
Feel free to join us!\n\n  \u003Cdiv align=\"center\">\n  \u003Cimg width=\"360\" alt=\"wechat\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstepfun-ai_NextStep-1_readme_7ce332a4b4b3.png\">\n  \u003C\u002Fdiv>\n\n- **Aug. 14, 2025**: 👋 We release the inference code and [huggingface model weights](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fstepfun-ai\u002Fnextstep-1-689d80238a01322b93b8a3dc) of NextStep-1-Large-Pretrain, NextStep-1-Large and NextStep-1-Large-Edit\n\n- **Aug. 14, 2025**: 👋 We have made our [technical report](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10711) available as open source.\n\n---\n\n## 📑 Table of Contents\n\n- [🔥 News](#-news)\n- [📦 Installation & Environment](#-installation--environment)\n- [📥 Model & Data Preparation](#-model--data-preparation)\n  - [2.1 Download Model Weights](#21-download-model-weights)\n  - [2.2 Download Training Datasets](#22-download-training-datasets)\n  - [2.3 Process Custom Data (Optional)](#23-process-custom-data-optional)\n- [🚀 Training](#-training)\n  - [3.1 Start Training (via `smartrun`)](#31-start-training-via-smartrun)\n  - [3.2 Override Training Parameters](#32-override-training-parameters)\n  - [3.3 Inspect and Compare Configurations](#33-inspect-and-compare-configurations)\n- [🔮 Inference](#-inference)\n  - [4.1 Convert Checkpoint Format](#41-convert-checkpoint-format)\n  - [4.2 Run Inference](#42-run-inference)\n- [📚 References](#-references)\n- [📄 License](#-license)\n- [📖 Citation](#-citation)\n\n---\n\n## 📦 Installation & Environment\n\n### 1.1 Clone the Repository\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fstepfun-ai\u002FNextStep-1\ncd NextStep-1\n```\n\n### 1.2 Create Conda Environment\n\n```bash\nconda create -n nextstep python=3.10 -y\nconda activate nextstep\n```\n\n### 1.3 Install Dependencies\n\n> ⚠️ **Note**: Pre-installing PyTorch based on your CUDA version is recommended.\n\n```bash\npip install uv\nuv pip install -e .\n```\n\n> ☕ **Tip**: This 
installation may take a while. Grab a cup of coffee and take a break! ☕\n\n### 1.4 Built-in CLI Tools\n\nThe following CLI tools are available after installation:\n\n- **`smartrun`**: An intelligent distributed launcher that automatically wraps `torchrun` parameters.\n- **`gen_meta`**: Scans datasets to generate metadata indices (sample counts, checksums, etc.).\n- **`warmup_data`**: Pre-warms and caches data indices to significantly speed up training startup.\n- **`eshow`**: Inspect or compare experiment configurations.\n- **`singlegpu_debug` \u002F `multigpu_debug`**: Dedicated debug entries for remote attachment.\n\n---\n\n## 📥 Model & Data Preparation\n\n### 2.1 Download Model Weights\n\nDownload models to `.\u002Fnextstep_models`. Please update the corresponding paths in `nextstep\u002Fmodel_zoos.py`.\n\n```bash\nbash download_models.sh\n```\n> ☕ **Tip**: This download may take a while. Grab a cup of coffee and take a break! ☕\n\n#### Available Models\n\nThe following table lists all available models and their training stages:\n\n| Model | Pre-Training 256px | Pre-Training 512px | Annealing | RL | Visual Diversity | Fine-Tunability | Hugging Face |\n|-------|-------------------|-------------------|----------|----|-----------|------------------|--------------|\n| **NextStep-1-f8ch16-Tokenizer** | ❌ | ❌ | ❌ | ❌ | - | - | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1-f8ch16-Tokenizer) |\n| **NextStep-1.1-Pretrain-256px** | ✅ | ❌ | ❌ | ❌ | High | Easy | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1.1-Pretrain-256px) |\n| **NextStep-1.1-Pretrain** | ✅ | ✅ | ✅ | ❌ | Medium | Medium | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1.1-Pretrain) |\n| **NextStep-1.1** | ✅ | ✅ | ✅ | ✅ | 
Low | Hard | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1.1) |\n| **NextStep-1-Large-Pretrain** | ✅ | ✅ | ✅ | ❌ | High | Medium | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1-Large-Pretrain) |\n| **NextStep-1-Large** | ✅ | ✅ | ✅ | ✅ | Low | Hard | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1-Large) |\n| **NextStep-1-Large-Edit** | ✅ | ✅ | ✅ | ✅ | Low | Hard | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1-Large-Edit) |\n\n> ⚠️ **Note**: The NextStep-1 series models are an older version and perform worse than NextStep-1.1, so we do not recommend them. Use the NextStep-1.1 series models instead.\n\n> 💡 **Quick Inference**: To run inference quickly, use the inference script below.\n\n```bash\npython3 inference\u002Finference.py\n```\n\n### 2.2 Download Training Datasets\n\nDownload datasets to `.\u002Fnextstep_data`.\n\n```bash\nbash download_datasets.sh\n```\n> ☕ **Tip**: This download may take a while. Grab a cup of coffee and take a break! ☕\n\n> ⚠️ **Important Note**: The datasets provided in `download_datasets.sh` are only example open-source datasets for demonstration purposes. NextStep's actual training used approximately **1 billion images** from proprietary in-house data sources that cannot be open-sourced. For the best training results, we strongly recommend collecting and preparing your own large-scale datasets following the data processing guidelines in section 2.3.\n\n### 2.3 Process Custom Data (Optional)\n\n> 💡 **Skip this section** if you are only using the default datasets from step 2.2. 
Follow these steps to process custom data:\n\n#### 2.3.1 Data Processing\n\nConvert raw data into the unified WebDataset (Tar) format.\n\n```bash\npython3 nextstep\u002Fdata\u002Fbuild_wds.py\n```\n\n**Data Specification** (generates `assets\u002Fidx_0000_0000.tar`):\n\n- **`key.json`**: Must contain a `caption` field using `\u003Cimage_n>` placeholders to define the interleaved sequence.\n- **`key-{i}.png`**: Images must be named `key-0.png`, `key-1.png`, etc., matching the placeholders in the JSON.\n- ⚠️ **Important**: The `key` must **NOT** contain dots (`.`) or hyphens (`-`). You must use the `build_wds.py` script to ensure correct indexing. **Modify `load_data` and `create_example` in the script to fit your specific data source.**\n\n#### 2.3.2 Metadata Generation\n\nCalculate sample counts for each Tar file to build training indices.\n\n```bash\ngen_meta \u002Fpath\u002Fto\u002Fyour\u002Fdataset\u002Froot_dir\n```\n\n> 💡 After completion, update `configs\u002Fdata\u002Fpretrain_data.json` and the corresponding Python data config files in `configs\u002Fdata` with the new data.\n\n#### 2.3.3 Warmup Indices\n\nRecommended for large-scale training to cache indices locally.\n\n```bash\nwarmup_data \u002Fpath\u002Fto\u002Fyour\u002Fdataset\u002Froot_dir --n_jobs 32\n```\n\n#### 2.3.4 Data Visualization\n\nPreview data distribution and content in Tar files or configurations.\n\n```bash\nstreamlit run nextstep\u002Fservice\u002F_preview.py --server.port 8501\n```\n\n#### 2.3.5 W&B Credentials\n\nCreate a `.config` file in the root directory for experiment tracking. You can find your API key at https:\u002F\u002Fwandb.ai\u002Fsettings.\n\n```text\nWANDB_MODE=online\nWANDB_API_KEY=YOUR_WANDB_API_KEY\nWANDB_BASE_URL=https:\u002F\u002Fapi.wandb.ai\n```\n\n---\n\n## 🚀 Training\n\n> ⚠️ **Before training**, please carefully review the configurations in the `configs` directory. 
You may need to modify the model or output paths in the configuration files.\n\n### 3.1 Start Training (via `smartrun`)\n\n**Option 1**: Start from the NextStep-1.1-Pretrain-256px model and train for a small number of steps (~10K)\n\n```bash\nsmartrun -m configs.nextstep_qwen14b_512px\n```\n\n> 💡 This command automatically uses all available machine resources. On a single machine with 8 GPUs, it is equivalent to: `torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 -m configs.nextstep_qwen14b_512px`\n\n**Option 2**: Start from the Qwen2.5-14B model and train for a very large number of steps (~500K)\n\n```bash\nsmartrun -m configs.nextstep_qwen14b_256px\n```\n\n### 3.2 Override Training Parameters\n\nOverride specific parameters on the command line when launching training:\n\n```bash\nsmartrun -m configs.nextstep_qwen14b_512px \\\n  training.max_steps=1000 \\\n  training.save_steps=200 \\\n  data.num_workers=2\n```\n\n### 3.3 Inspect and Compare Configurations\n\n**View a single configuration:**\n\n```bash\neshow configs\u002Fnextstep_qwen14b_512px.py\n```\n\n**Compare differences between two configurations** (e.g., 256px vs 512px):\n\n```bash\neshow configs\u002Fnextstep_qwen14b_256px.py configs\u002Fnextstep_qwen14b_512px.py\n```\n\n> 📌 **Tips**: Adjust specific parameters, configuration files, and data paths to match your setup. 
For detailed explanations, see [`configs\u002FREADME.md`](.\u002Fconfigs\u002FREADME.md).\n\n---\n\n## 🔮 Inference\n\n### 4.1 Convert Checkpoint Format\n\nConvert DeepSpeed sharded checkpoints to standard HuggingFace format:\n\n```bash\npython3 nextstep\u002Fdeepspeed\u002Fzero_to_fp32.py \u002Fpath\u002Fto\u002Fyour\u002Ftrained\u002Fcheckpoint_dir\n```\n\n### 4.2 Run Inference\n\n**Basic inference:**\n\n```bash\npython3 inference\u002Finference.py --model_name_or_path \u002Fpath\u002Fto\u002Fyour\u002Ftrained\u002Fcheckpoint_dir\n```\n\n**Quick start with default model:**\n\n```bash\npython3 inference\u002Finference.py\n```\n\n---\n\n## 📖 Documentation\n\nFor detailed documentation on specific modules, please refer to:\n\n- [NextStep Package](.\u002Fnextstep\u002FREADME.md) - Core package overview\n- [Configuration System](.\u002Fconfigs\u002FREADME.md) - Configuration files and training setup\n- [Training Engine](.\u002Fnextstep\u002Fengine\u002FREADME.md) - Training and validation implementation\n- [Models](.\u002Fnextstep\u002Fmodels\u002FREADME.md) - Model architecture and implementation\n- [Datasets](.\u002Fnextstep\u002Fdatasets\u002FREADME.md) - Dataset adapters and mixed sampling\n- [Data Processing](.\u002Fnextstep\u002Fdata\u002FREADME.md) - Data loading, indexing, and utilities\n- [Service](.\u002Fnextstep\u002Fservice\u002FREADME.md) - Data preview and visualization service\n- [Utils](.\u002Fnextstep\u002Futils\u002FREADME.md) - Utility functions and helpers\n\n---\n\n## 📚 References\n\n### Core Frameworks\n\n- [DeepSpeed](https:\u002F\u002Fgithub.com\u002Fdeepspeedai\u002FDeepSpeed)\n- [Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)\n- [Diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers)\n\n### Datasets\n\n- [Dolma](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fallenai\u002Fdolma)\n- [BLIP3o](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBLIP3o\u002FBLIP3o-60k)\n- 
[GPT-Image-Edit](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUCSC-VLAA\u002FGPT-Image-Edit-1.5M)\n- [Multimodal Textbook](https:\u002F\u002Fgithub.com\u002FDAMO-NLP-SG\u002Fmultimodal_textbook)\n\n---\n\n## 📄 License\n\nNextStep is licensed under the Apache License 2.0. You can find the license files in the respective GitHub and HuggingFace repositories.\n\n---\n\n## 📖 Citation\n\nIf you find NextStep useful for your research and applications, please consider starring this repository and citing:\n\n```bibtex\n@article{nextstepteam2025nextstep1,\n  title={NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale},\n  author={NextStep Team and Chunrui Han and Guopeng Li and Jingwei Wu and Quan Sun and Yan Cai and Yuang Peng and Zheng Ge and Deyu Zhou and Haomiao Tang and Hongyu Zhou and Kenkun Liu and Ailin Huang and Bin Wang and Changxin Miao and Deshan Sun and En Yu and Fukun Yin and Gang Yu and Hao Nie and Haoran Lv and Hanpeng Hu and Jia Wang and Jian Zhou and Jianjian Sun and Kaijun Tan and Kang An and Kangheng Lin and Liang Zhao and Mei Chen and Peng Xing and Rui Wang and Shiyu Liu and Shutao Xia and Tianhao You and Wei Ji and Xianfang Zeng and Xin Han and Xuelin Zhang and Yana Wei and Yanming Xu and Yimin Jiang and Yingming Wang and Yu Zhou and Yucheng Han and Ziyang Meng and Binxing Jiao and Daxin Jiang and Xiangyu Zhang and Yibo Zhu},\n  journal={arXiv preprint arXiv:2508.10711},\n  year={2025}\n}\n```\n","# NextStep-1：迈向大规模连续标记的自回归图像生成\n\n\u003Cdiv 
align=\"center\">\n\n[![主页](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=Homepage&message=Project%20Page&color=blue&logo=home)](https:\u002F\u002Fstepfun.ai\u002Fresearch\u002Fen\u002Fnextstep1)&nbsp;[![huggingface权重](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Weights-StepFun\u002FNextStep1-yellow)](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fstepfun-ai\u002Fnextstep-1-689d80238a01322b93b8a3dc)&nbsp;[![arXiv:2508.10711](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2508.10711-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10711)&nbsp;[![博客](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBlog-NextStep1-blue)](https:\u002F\u002Fstepfun-ai.github.io\u002FNextStep-1\u002Fnextstep_1_blog\u002F)&nbsp;[![博客](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBlog-NextStep1.1-blue)](https:\u002F\u002Fstepfun-ai.github.io\u002FNextStep-1\u002Fnextstep_1p1_blog\u002F)\n\n\u003C\u002Fdiv>\n\n> 自回归模型——像阅读句子一样逐步生成内容——在语言领域表现出色，但在图像生成方面却面临挑战。传统上，它们要么依赖于成本高昂的扩散模型，要么通过向量量化（VQ）将图像压缩为有损的离散标记。\n>\n> NextStep-1则采取了不同的路径：一个拥有140亿参数的自回归模型，直接处理连续的图像标记，从而保留了视觉数据的全部丰富性。它联合建模离散文本标记和连续图像标记的序列——对文本使用标准的语言模型头，而对视觉内容则采用一个轻量级的1.57亿参数流匹配头。这种统一的下一个标记预测框架简单、可扩展，并且能够生成令人惊叹的高细节图像。\n\n\u003Cdiv align=\"center\">\n\u003Cimg width=\"720\" alt=\"t2i_demo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstepfun-ai_NextStep-1_readme_ea575fc0f7e6.gif\">\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\u003Cimg width=\"720\" alt=\"edit_demo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstepfun-ai_NextStep-1_readme_dae3a338d964.gif\">\n\u003C\u002Fdiv>\n\n## 🔥 最新消息\n\n- **2026年2月25日**：**vLLM-Omni**支持NextStep-1.1的高性能推理。详情请见[这里](https:\u002F\u002Fdocs.vllm.ai\u002Fprojects\u002Fvllm-omni\u002Fen\u002Flatest\u002Fuser_guide\u002Fexamples\u002Foffline_inference\u002Ftext_to_image\u002F?h=nextstep#nextstep-models)！\n\n- 
**2026年2月16日**：NextStep-1的训练代码（本仓库）以及NextStep-1.1的后训练博客（[链接](https:\u002F\u002Fstepfun-ai.github.io\u002FNextStep-1\u002Fnextstep_1p1_blog\u002F)）已发布。欢迎讨论与贡献。祝大家春节快乐！\n\n- **2026年2月6日**：NextStep-1已被ICLR 2026选为**口头报告**！🎉🎉🎉\n\n- **2025年12月24日**：🔥 我们发布了**NextStep-1.1**，这是一款通过扩展训练和基于流的强化学习（RL）后训练范式显著提升输出质量的文生图模型。欢迎大家使用我们[Hugging Face仓库](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1.1)中托管的检查点进行尝试！\n\n  检查点可在以下平台获取：\n  - 🤗 **Hugging Face**：\n    - 预训练：[NextStep-1.1-Pretrain](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1.1-Pretrain)\n    - 后训练：[NextStep-1.1](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1.1)\n  - 🇨🇳 **ModelScope**：\n    - 预训练：[NextStep-1.1-Pretrain](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fstepfun-ai\u002FNextStep-1.1-Pretrain)\n    - 后训练：[NextStep-1.1](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Fstepfun-ai\u002FNextStep-1.1)\n\n- **2025年8月18日**：👋 我们在[HuggingFace Spaces](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fstepfun-ai\u002FNextStep-1-Large-Edit)上部署了NextStep-1-Large-Edit，欢迎大家试用！\n\n- **2025年8月18日**：👋 我们开通了[微信群](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstepfun-ai_NextStep-1_readme_7ce332a4b4b3.png)，欢迎大家加入！\n\n  \u003Cdiv align=\"center\">\n  \u003Cimg width=\"360\" alt=\"wechat\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstepfun-ai_NextStep-1_readme_7ce332a4b4b3.png\">\n  \u003C\u002Fdiv>\n\n- **2025年8月14日**：👋 我们发布了NextStep-1-Large-Pretrain、NextStep-1-Large以及NextStep-1-Large-Edit的推理代码和[huggingface模型权重](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fstepfun-ai\u002Fnextstep-1-689d80238a01322b93b8a3dc)。\n\n- **2025年8月14日**：👋 我们已将我们的[技术报告](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10711)开源。\n\n---\n\n## 📑 目录\n\n- [🔥 最新消息](#-news)\n- [📦 安装与环境配置](#-installation--environment)\n- [📥 模型与数据准备](#-model--data-preparation)\n  - [2.1 下载模型权重](#21-download-model-weights)\n  - [2.2 下载训练数据集](#22-download-training-datasets)\n  - [2.3 
处理自定义数据（可选）](#23-process-custom-data-optional)\n- [🚀 训练](#-training)\n  - [3.1 开始训练（通过`smartrun`）](#31-start-training-via-smartrun)\n  - [3.2 覆盖训练参数](#32-override-training-parameters)\n  - [3.3 检查并比较配置](#33-inspect-and-compare-configurations)\n- [🔮 推理](#-inference)\n  - [4.1 转换检查点格式](#41-convert-checkpoint-format)\n  - [4.2 运行推理](#42-run-inference)\n- [📚 参考文献](#-references)\n- [📄 许可证](#-license)\n- [📖 引用](#-citation)\n\n---\n\n## 📦 安装与环境配置\n\n### 1.1 克隆仓库\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fstepfun-ai\u002FNextStep-1\ncd NextStep-1\n```\n\n### 1.2 创建Conda环境\n\n```bash\nconda create -n nextstep python=3.10 -y\nconda activate nextstep\n```\n\n### 1.3 安装依赖\n\n> ⚠️ **注意**：建议根据你的CUDA版本预先安装PyTorch。\n\n```bash\npip install uv\nuv pip install -e .\n```\n\n> ☕ **提示**：这次安装可能需要一些时间。不妨泡杯咖啡，稍作休息吧！☕\n\n### 1.4 内置CLI工具\n\n安装完成后，以下CLI工具可用：\n\n- **`smartrun`**：一个智能分布式启动器，可自动封装`torchrun`参数。\n- **`gen_meta`**：扫描数据集以生成元数据索引（样本数量、校验和等）。\n- **`warmup_data`**：预热并缓存数据索引，以显著加快训练启动速度。\n- **`eshow`**：检查或比较实验配置。\n- **`singlegpu_debug` \u002F `multigpu_debug`**：专门用于远程调试的入口。\n\n---\n\n## 📥 模型与数据准备\n\n### 2.1 下载模型权重\n\n将模型下载到 `.\u002Fnextstep_models` 目录。请相应更新 `nextstep\u002Fmodel_zoos.py` 中的路径。\n\n```bash\nbash download_models.sh\n```\n> ☕ **提示**: 这个下载可能需要一些时间。不妨泡杯咖啡，稍作休息吧！☕\n\n#### 可用模型\n\n下表列出了所有可用模型及其训练阶段：\n\n| 模型 | 预训练 256px | 预训练 512px | 退火 | RL | 视觉多样性 | 微调难易度 | Hugging Face |\n|-------|-------------------|-------------------|----------|----|-----------|------------------|--------------|\n| **NextStep-1-f8ch16-Tokenizer** | ❌ | ❌ | ❌ | ❌ | - | - | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1-f8ch16-Tokenizer) |\n| **NextStep-1.1-Pretrain-256px** | ✅ | ❌ | ❌ | ❌ | 高 | 易 | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1.1-Pretrain-256px) |\n| **NextStep-1.1-Pretrain** | ✅ | ✅ | ✅ | 
❌ | 中 | 中 | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1.1-Pretrain) |\n| **NextStep-1.1** | ✅ | ✅ | ✅ | ✅ | 低 | 难 | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1.1) |\n| **NextStep-1-Large-Pretrain** | ✅ | ✅ | ✅ | ❌ | 高 | 中 | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1-Large-Pretrain) |\n| **NextStep-1-Large** | ✅ | ✅ | ✅ | ✅ | 低 | 难 | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1-Large) |\n| **NextStep-1-Large-Edit** | ✅ | ✅ | ✅ | ✅ | 低 | 难 | [![🤗](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace-yellow)](https:\u002F\u002Fhuggingface.co\u002Fstepfun-ai\u002FNextStep-1-Large-Edit) |\n\n> ⚠️ **注意**: NextStep-1 系列模型属于旧版本，性能不如 NextStep-1.1 系列，因此不建议使用。请改用 NextStep-1.1 系列模型。\n\n> 💡 **快速推理**: 如果您想快速对模型进行推理，请参考下面的推理脚本。\n\n```bash\npython3 inference\u002Finference.py\n```\n\n### 2.2 下载训练数据集\n\n将数据集下载到 `.\u002Fnextstep_data` 目录。\n\n```bash\nbash download_datasets.sh\n```\n> ☕ **提示**: 这个下载可能需要一些时间。不妨泡杯咖啡，稍作休息吧！☕\n\n> ⚠️ **重要提示**: `download_datasets.sh` 中提供的数据集仅为演示用途的开源示例数据集。而 NextStep 的实际训练则使用了来自公司内部专有数据源的约 **10亿张图片**，这些数据无法公开。为了获得最佳训练效果，我们强烈建议您按照第 2.3 节中的数据处理指南，收集并准备自己的大规模数据集。\n\n### 2.3 处理自定义数据（可选）\n\n> 💡 **跳过此部分**：如果您仅使用第 2.2 节中的默认数据集，则无需执行以下步骤。若要处理自定义数据，请按如下操作：\n\n#### 2.3.1 数据处理\n\n将原始数据转换为统一的 WebDataset (Tar) 格式。\n\n```bash\npython3 nextstep\u002Fdata\u002Fbuild_wds.py\n```\n\n**数据规范**（生成 `assets\u002Fidx_0000_0000.tar`）：\n\n- **`key.json`**: 必须包含一个 `caption` 字段，使用 `\u003Cimage_n>` 占位符来定义交错序列。\n- **`key-{i}.png`**: 图片必须命名为 `key-0.png`、`key-1.png` 等，与 JSON 中的占位符一一对应。\n- ⚠️ **重要**: `key` 中 **不能** 包含点号 (`.`) 或连字符 (`-`)。您必须使用 `build_wds.py` 脚本来确保索引正确。**请根据您的具体数据源修改脚本中的 
`load_data` 和 `create_example` 函数。**\n\n#### 2.3.2 元数据生成\n\n计算每个 Tar 文件的样本数量，以构建训练索引。\n\n```bash\ngen_meta \u002Fpath\u002Fto\u002Fyour\u002Fdataset\u002Froot_dir\n```\n\n> 💡 完成后，请将新数据更新到 `configs\u002Fdata\u002Fpretrain_data.json` 以及 `configs\u002Fdata` 目录下的相应 Python 数据配置文件中。\n\n#### 2.3.3 预热索引\n\n建议在大规模训练时使用，以便将索引缓存在本地。\n\n```bash\nwarmup_data \u002Fpath\u002Fto\u002Fyour\u002Fdataset\u002Froot_dir --n_jobs 32\n```\n\n#### 2.3.4 数据可视化\n\n预览 Tar 文件或配置中的数据分布和内容。\n\n```bash\nstreamlit run nextstep\u002Fservice\u002F_preview.py --server.port 8501\n```\n\n#### 2.3.5 W&B 凭证\n\n在根目录下创建一个 `.config` 文件，用于实验跟踪。API 密钥可在 https:\u002F\u002Fwandb.ai\u002Fsettings 找到。\n\n```text\nWANDB_MODE=online\nWANDB_API_KEY=YOUR_WANDB_API_KEY\nWANDB_BASE_URL=https:\u002F\u002Fapi.wandb.ai\n```\n\n---\n\n## 🚀 训练\n\n> ⚠️ **在开始训练之前**, 请仔细检查 `configs` 目录中的配置文件。您可能需要修改配置文件中的模型或输出路径。\n\n### 3.1 开始训练（通过 `smartrun`）\n\n**选项 1**: 使用 NextStep-1.1-Pretrain-256px 模型，进行少量训练步数（约 1 万步）\n\n```bash\nsmartrun -m configs.nextstep_qwen14b_512px\n```\n\n> 💡 该命令会自动利用所有可用的机器资源。如果您在单台机器上运行此命令，等同于：`torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 -m configs.nextstep_qwen14b_512px`\n\n**选项 2**: 使用 Qwen2.5-14B 模型，进行大量训练步数（约 50 万步）\n\n```bash\nsmartrun -m configs.nextstep_qwen14b_256px\n```\n\n### 3.2 覆盖训练参数\n\n在训练过程中覆盖特定参数：\n\n```bash\nsmartrun -m configs.nextstep_qwen14b_512px \\\n  training.max_steps=1000 \\\n  training.save_steps=200 \\\n  data.num_workers=2\n```\n\n### 3.3 检查和比较配置\n\n**查看单个配置:**\n\n```bash\neshow configs\u002Fnextstep_qwen14b_512px.py\n```\n\n**比较两个配置之间的差异**（例如 256px 与 512px）：\n\n```bash\neshow configs\u002Fnextstep_qwen14b_256px.py configs\u002Fnextstep_qwen14b_512px.py\n```\n\n> 📌 **提示**: 请根据您的实际情况调整具体的参数、配置文件和数据路径。详细说明请参阅 [`configs\u002FREADME.md`](.\u002Fconfigs\u002FREADME.md)。\n\n---\n\n## 🔮 推理\n\n### 4.1 转换检查点格式\n\n将 DeepSpeed 分片检查点转换为标准的 HuggingFace 格式：\n\n```bash\npython3 nextstep\u002Fdeepspeed\u002Fzero_to_fp32.py 
\u002Fpath\u002Fto\u002Fyour\u002Ftrained\u002Fcheckpoint_dir\n```\n\n### 4.2 运行推理\n\n**基础推理:**\n\n```bash\npython3 inference\u002Finference.py --model_name_or_path \u002Fpath\u002Fto\u002Fyour\u002Ftrained\u002Fcheckpoint_dir\n```\n\n**使用默认模型快速启动:**\n\n```bash\npython3 inference\u002Finference.py\n```\n\n---\n\n## 📖 文档\n\n有关特定模块的详细文档，请参阅：\n\n- [NextStep 包](.\u002Fnextstep\u002FREADME.md) - 核心包概述\n- [配置系统](.\u002Fconfigs\u002FREADME.md) - 配置文件与训练设置\n- [训练引擎](.\u002Fnextstep\u002Fengine\u002FREADME.md) - 训练与验证实现\n- [模型](.\u002Fnextstep\u002Fmodels\u002FREADME.md) - 模型架构与实现\n- [数据集](.\u002Fnextstep\u002Fdatasets\u002FREADME.md) - 数据集适配器与混合采样\n- [数据处理](.\u002Fnextstep\u002Fdata\u002FREADME.md) - 数据加载、索引及工具函数\n- [服务](.\u002Fnextstep\u002Fservice\u002FREADME.md) - 数据预览与可视化服务\n- [工具](.\u002Fnextstep\u002Futils\u002FREADME.md) - 工具函数与辅助程序\n\n---\n\n## 📚 参考文献\n\n### 核心框架\n\n- [DeepSpeed](https:\u002F\u002Fgithub.com\u002Fdeepspeedai\u002FDeepSpeed)\n- [Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)\n- [Diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers)\n\n### 数据集\n\n- [Dolma](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fallenai\u002Fdolma)\n- [BLIP3o](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FBLIP3o\u002FBLIP3o-60k)\n- [GPT-Image-Edit](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUCSC-VLAA\u002FGPT-Image-Edit-1.5M)\n- [多模态教材](https:\u002F\u002Fgithub.com\u002FDAMO-NLP-SG\u002Fmultimodal_textbook)\n\n---\n\n## 📄 许可证\n\nNextStep 采用 Apache License 2.0 许可证。您可以在相应的 GitHub 和 HuggingFace 仓库中找到许可证文件。\n\n---\n\n## 📖 引用\n\n如果您在研究和应用中觉得 NextStep 有用，请考虑给本仓库点个赞，并引用以下内容：\n\n```bibtex\n@article{nextstepteam2025nextstep1,\n  title={NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale},\n  author={NextStep Team and Chunrui Han and Guopeng Li and Jingwei Wu and Quan Sun and Yan Cai and Yuang Peng and Zheng Ge and Deyu Zhou and Haomiao Tang and Hongyu Zhou and Kenkun Liu and Ailin Huang and Bin Wang and Changxin Miao and Deshan Sun and En Yu and Fukun Yin and Gang Yu and Hao Nie and Haoran Lv and Hanheng Hu and Jia Wang and Jian 
Zhou and Jianjian Sun and Kaijun Tan and Kang An and Kangheng Lin and Liang Zhao and Mei Chen and Peng Xing and Rui Wang and Shiyu Liu and Shutao Xia and Tianhao You and Wei Ji and Xianfang Zeng and Xin Han and Xuelin Zhang and Yana Wei and Yanming Xu and Yimin Jiang and Yingming Wang and Yu Zhou and Yucheng Han and Ziyang Meng and Binxing Jiao and Daxin Jiang and Xiangyu Zhang and Yibo Zhu},\n  journal={arXiv preprint arXiv:2508.10711},\n  year={2025}\n}\n```","# NextStep-1 快速上手指南\n\nNextStep-1 是一个 140 亿参数的自回归图像生成模型，它直接使用连续令牌（continuous tokens）处理图像，无需传统的向量量化（VQ），能够联合建模离散文本令牌和连续图像令牌，生成高细节的图像。\n\n## 1. 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux (推荐 Ubuntu)\n*   **Python 版本**: 3.10\n*   **GPU**: 支持 CUDA 的 NVIDIA GPU (建议显存充足以运行大模型)\n*   **依赖管理**: 已安装 `conda` 和 `git`\n\n> **注意**: 建议根据您的 CUDA 版本预先安装 PyTorch，以获得最佳兼容性。\n\n## 2. 安装步骤\n\n### 2.1 克隆仓库\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fstepfun-ai\u002FNextStep-1\ncd NextStep-1\n```\n\n### 2.2 创建 Conda 环境\n```bash\nconda create -n nextstep python=3.10 -y\nconda activate nextstep\n```\n\n### 2.3 安装依赖\n使用 `uv` 进行快速安装（推荐）：\n```bash\npip install uv\nuv pip install -e .\n```\n> 💡 **提示**: 首次安装可能需要一些时间，请耐心等待。\n\n## 3. 
基本使用\n\n### 3.1 下载模型权重\n官方提供了多个版本的模型。**强烈推荐使用 NextStep-1.1 系列**，因为其性能优于旧版 NextStep-1 系列。\n\n执行以下脚本下载模型到 `.\u002Fnextstep_models` 目录：\n```bash\nbash download_models.sh\n```\n> ⚠️ **注意**: 下载完成后，请检查 `nextstep\u002Fmodel_zoos.py` 中的路径配置是否正确。\n\n**可用模型推荐：**\n*   **NextStep-1.1**: 经过强化学习后训练，画质最高，适合最终生成。\n*   **NextStep-1.1-Pretrain**: 预训练版本，视觉多样性较高，适合微调。\n*   *(不推荐使用 NextStep-1-Large 等旧版系列)*\n\n### 3.2 快速推理\n安装并下载模型后，您可以直接运行官方提供的推理脚本来体验文生图功能：\n\n```bash\npython3 inference\u002Finference.py\n```\n\n该脚本将加载默认配置并生成示例图像。如需修改提示词或参数，请编辑 `inference\u002Finference.py` 文件或参考项目完整文档中的高级推理配置。\n\n### 3.3 (可选) 自定义数据训练\n如果您希望使用自己的数据集进行训练，需先将数据转换为 WebDataset (Tar) 格式：\n\n```bash\npython3 nextstep\u002Fdata\u002Fbuild_wds.py\n```\n*   **数据格式要求**: JSON 文件中需包含 `caption` 字段（使用 `\u003Cimage_n>` 占位符），图片文件命名为 `key-0.png`, `key-1.png` 等。\n*   **生成元数据**:\n    ```bash\n    gen_meta \u002Fpath\u002Fto\u002Fyour\u002Fdataset\u002Froot_dir\n    ```\n*   **预热索引** (加速训练启动):\n    ```bash\n    warmup_data \u002Fpath\u002Fto\u002Fyour\u002Fdataset\u002Froot_dir --n_jobs 32\n    ```\n\n### 3.4 启动训练\n使用内置的智能启动器 `smartrun` 开始训练。以下示例启动一个 512px 分辨率的预训练任务：\n\n```bash\nsmartrun -m configs.nextstep_qwen14b_512px\n```\n\n如需覆盖特定参数（如步数、保存频率）：\n```bash\nsmartrun -m configs.nextstep_qwen14b_512px \\\n  training.max_steps=1000 \\\n  training.save_steps=200\n```","某电商平台的视觉设计团队需要在短时间内为数千款新品生成高质量、细节丰富的商品宣传图，并支持对局部细节进行精准修改。\n\n### 没有 NextStep-1 时\n- **细节丢失严重**：传统自回归模型依赖向量量化（VQ）将图像压缩为离散令牌，导致生成的商品纹理模糊，无法还原面料质感或金属光泽。\n- **工作流割裂**：文生图与图像编辑需调用两套不同的模型架构（如扩散模型 + 专用编辑模型），增加了系统集成的复杂度和推理延迟。\n- **迭代成本高昂**：若需调整光影或局部特征，往往需要重新生成整张图片，难以实现“指哪改哪”的精细化控制。\n- **训练资源浪费**：为了弥补离散化带来的信息损失，必须投入更多算力训练更大的扩散模型，且推理速度受限于多步去噪过程。\n\n### 使用 NextStep-1 后\n- **还原真实质感**：NextStep-1 直接处理连续图像令牌，完整保留了视觉数据的丰富性，生成的丝绸褶皱和珠宝反光达到照片级逼真度。\n- **统一架构高效推理**：利用单一的自回归框架同时处理文本和连续图像令牌，配合 vLLM-Omni 加速，实现了文生图与局部编辑的无缝切换与高速响应。\n- **精准局部编辑**：基于连续的下一个令牌预测机制，设计师可仅重绘商品的特定区域（如更换背景或调整 Logo 位置），无需破坏整体构图。\n- **扩展性更强**：140 亿参数规模结合轻量级流匹配头，在保持简单架构的同时轻松应对大规模并发请求，显著降低了单位图像的生成成本。\n\nNextStep-1 
通过突破性的连续令牌自回归技术，将高保真图像生成与灵活编辑能力统一于单一模型，彻底解决了传统方案在细节还原与工作流效率上的双重瓶颈。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fstepfun-ai_NextStep-1_f75419c8.png","stepfun-ai","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fstepfun-ai_576b766a.png","",null,"opensource@stepfun.com","https:\u002F\u002Fgithub.com\u002Fstepfun-ai",[83,87],{"name":84,"color":85,"percentage":86},"Python","#3572A5",99.5,{"name":88,"color":89,"percentage":90},"Shell","#89e051",0.5,654,23,"2026-04-06T07:08:46","Apache-2.0",4,"Linux","必需 NVIDIA GPU。训练需多卡分布式环境（示例命令使用 8 卡），推理支持 vLLM-Omni。具体显存未说明，但模型参数量达 14B，建议高显存配置。需预安装与 CUDA 版本匹配的 PyTorch。","未说明（建议 64GB+ 以支持 14B 参数模型训练及大数据集处理）",{"notes":100,"python":101,"dependencies":102},"1. 推荐使用 conda 创建 Python 3.10 环境。2. 安装前需根据本地 CUDA 版本手动预安装 PyTorch。3. 模型系列包含 NextStep-1（旧版）和 NextStep-1.1（推荐），后者性能更优。4. 官方提供的训练数据集仅为示例，实际训练使用了约 10 亿张专有图片，用户需自行准备大规模数据集以达到最佳效果。5. 提供 `smartrun` 工具自动管理分布式训练参数。6. 支持将 DeepSpeed 分片检查点转换为 HuggingFace 格式。","3.10",[103,104,105,106,107,108],"torch (需根据 CUDA 版本预安装)","uv","smartrun (内置工具)","WebDataset","streamlit","wandb",[15,62],"2026-03-27T02:49:30.150509","2026-04-07T18:38:34.159964",[113,118,123,128,133,138,143],{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},22650,"运行示例代码时遇到 `import_utils` 模块缺失或 LayerNorm 形状不匹配报错，如何解决？","这是代码中的已知问题。请修改 `nextstep\u002Fmodels\u002Fmodeling_flux_vae.py` 文件中的 `layer_norm_2d` 调用部分：\n将 `mean = layer_norm_2d(mean, mean.size()[1:2])`\n改为 `mean = layer_norm_2d(mean, [mean.shape[1]])`。\n此外，建议优先参考 Hugging Face 仓库提供的最新推理代码，因为本地仓库的推理代码可能存在小问题。","https:\u002F\u002Fgithub.com\u002Fstepfun-ai\u002FNextStep-1\u002Fissues\u002F2",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},22651,"模型支持哪些分辨率？是否可以使用 1024x1024 进行生成？","虽然训练过程中使用了多种分辨率，但并未包含 1024px。当前版本对动态分辨率的支持尚不完善，强行使用 1024px 可能导致结果不符合预期。官方强烈建议固定使用 **512x512** 分辨率以获得最佳效果。您可以参考代码库中 `nextstep\u002Fmodels\u002Faspect_ratio.py` 
文件查看训练时使用的具体补丁分辨率设置。","https:\u002F\u002Fgithub.com\u002Fstepfun-ai\u002FNextStep-1\u002Fissues\u002F3",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},22652,"训练代码（Training Code）何时开源？在哪里可以获取？","训练和评估代码已经正式开源并上传。您可以直接在 GitHub 仓库中查找相关文件，或访问 Hugging Face 上的项目页面获取最新的训练脚本。","https:\u002F\u002Fgithub.com\u002Fstepfun-ai\u002FNextStep-1\u002Fissues\u002F6",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},22653,"为什么 NextStep-1 VAE 的重建指标（rFID）相比 FLUX 有所下降？","这是因为在 NextStep-1 的 VAE 微调过程中，团队移除了对抗损失（GAN loss）。这一设计选择导致了重建指标（如 rFID）与包含 GAN 损失的模型（如 FLUX）相比有所差异。","https:\u002F\u002Fgithub.com\u002Fstepfun-ai\u002FNextStep-1\u002Fissues\u002F7",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},22654,"生成一张 512x512 的图像耗时较长（约 120 秒），有什么加速推理的建议吗？","如果您希望显著提升推理速度，可以尝试使用 `vllm-omni` 进行部署。该项目针对此类模型进行了推理优化，具体的集成方法可以参考相关的 Pull Request 或文档。","https:\u002F\u002Fgithub.com\u002Fstepfun-ai\u002FNextStep-1\u002Fissues\u002F15",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},22655,"项目微信群的二维码过期了，如何加入社区交流群？","官方已更新了微信群二维码。如果二维码再次过期或无法扫描，您可以直接添加维护者的微信号（例如：18370619005），备注来意后会被邀请入群。","https:\u002F\u002Fgithub.com\u002Fstepfun-ai\u002FNextStep-1\u002Fissues\u002F13",{"id":144,"question_zh":145,"answer_zh":146,"source_url":117},22656,"文生图（Text-to-Image）任务运行失败，而图像编辑任务正常，是什么原因？","这通常是由于本地仓库中的推理代码存在未修复的小缺陷导致的。官方建议不要直接使用当前的本地推理脚本，而是前往 Hugging Face 仓库（stepfun-ai\u002FNextStep-1-Large）下载最新的推理代码补丁或完整实现，那里包含了针对文生图流程的修正。",[]]