[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-NVlabs--edm2":3,"tool-NVlabs--edm2":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 
多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":10,"env_os":95,"env_gpu":96,"env_ram":97,"env_deps":98,"category_tags":111,"github_topics":79,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":112,"updated_at":113,"faqs":114,"releases":125},3428,"NVlabs\u002Fedm2","edm2","EDM2 and Autoguidance -- Official PyTorch implementation","edm2 是 NVIDIA 研究团队开源的先进扩散模型训练与生成框架，基于 PyTorch 构建。它核心解决了传统扩散模型在训练动态稳定性及生成图像质量上的瓶颈问题。通过两项荣获 CVPR 和 NeurIPS 口头报告的研究成果，edm2 不仅深入分析并优化了模型的训练过程，还创新性地提出了“自引导”（Autoguidance）技术——即利用模型自身的“弱化版本”来引导生成过程，从而在无需额外分类器的情况下显著提升图像细节与真实感。\n\n该项目提供了从超小型到超大型多种预训练模型，支持在 ImageNet 数据集上以不同分辨率进行高效推理与微调，并能灵活权衡 FID 指标与语义感知质量。edm2 特别适合 AI 研究人员、算法工程师以及对生成式图像质量有极高要求的专业开发者使用。虽然其训练环节需要多张高端 NVIDIA GPU 支持，但项目提供了完善的 Docker 环境与预设脚本，使得具备基础深度学习环境的用户也能轻松复现顶尖的图像生成效果，是探索下一代扩散模型架构的理想工具。","## EDM2 and Autoguidance &mdash; Official PyTorch implementation\n\n![Teaser image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVlabs_edm2_readme_bc38497547fd.jpg)\n\n**Analyzing and Improving the Training Dynamics of Diffusion Models** (CVPR 2024 oral)\u003Cbr>\nTero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, Samuli Laine\u003Cbr>\nhttps:\u002F\u002Farxiv.org\u002Fabs\u002F2312.02696\u003Cbr>\n\n**Guiding a Diffusion Model with a Bad Version of Itself** (NeurIPS 2024 oral)\u003Cbr>\nTero Karras, Miika Aittala, Tuomas Kynk&auml;&auml;nniemi, Jaakko Lehtinen, Timo Aila, Samuli Laine\u003Cbr>\nhttps:\u002F\u002Farxiv.org\u002Fabs\u002F2406.02507\u003Cbr>\n\nFor business inquiries, please visit our website and submit the form: [NVIDIA Research Licensing](https:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fresearch\u002Finquiries\u002F)\n\n## Requirements\n\n* Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.\n* 1+ high-end NVIDIA GPU for sampling and 8+ GPUs for training. We have done all testing and development using V100 and A100 GPUs.\n* 64-bit Python 3.9 and PyTorch 2.1 (or later). 
* Other Python libraries: `pip install click Pillow psutil requests scipy tqdm diffusers==0.26.3 accelerate==0.27.2`
* For downloading the raw snapshots needed for post-hoc EMA reconstruction, we recommend using [Rclone](https://rclone.org/install/).

For convenience, we provide a [Dockerfile](./Dockerfile) with the required dependencies. You can use it as follows:

```.bash
# Build Docker image
docker build --tag edm2:latest .

# Run generate_images.py using Docker
docker run --gpus all -it --rm --user $(id -u):$(id -g) \
    -v `pwd`:/scratch --workdir /scratch -e HOME=/scratch \
    edm2:latest \
    python generate_images.py --preset=edm2-img512-s-guid-dino --outdir=out
```

If you hit an error, please ensure you have correctly installed the [NVIDIA container runtime](https://docs.docker.com/config/containers/resource_constraints/#gpu). See [NVIDIA PyTorch container release notes](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-02.html#rel-24-02) for driver compatibility details.

Breakdown of the `docker run` command line:

- `--gpus all -it --rm --user $(id -u):$(id -g)`: With all GPUs enabled, run an interactive session with current user's UID/GID to avoid Docker writing files as root.
- ``-v `pwd`:/scratch --workdir /scratch``: Mount current running dir (e.g., the top of this git repo on your host machine) to `/scratch` in the container and use that as the current working dir.
- `-e HOME=/scratch`: Specify where to cache temporary files. If you want more fine-grained control, you can instead set `DNNLIB_CACHE_DIR` (for pre-trained model download cache). You want these cache dirs to reside on persistent volumes so that their contents are retained across multiple `docker run` invocations.

## Using pre-trained models

We provide pre-trained models for our proposed EDM2 configuration (config G) for different model sizes trained with ImageNet-512 and ImageNet-64. To generate images using a given model, run:

```.bash
# Generate a couple of images and save them as out/*.png
python generate_images.py --preset=edm2-img512-s-guid-dino --outdir=out
```

The above command automatically downloads the necessary models and caches them under `$HOME/.cache/dnnlib`, which can be overridden by setting the `DNNLIB_CACHE_DIR` environment variable. The `--preset=edm2-img512-s-guid-dino` option indicates that we will be using the S-sized EDM2 model, trained with ImageNet-512 and sampled using guidance, with EMA length and guidance strength chosen to minimize FD<sub>DINOv2</sub>.
The following presets are supported:

```
# EDM2 paper
edm2-img512-{xs|s|m|l|xl|xxl}-fid              # Table 2, minimize fid
edm2-img512-{xs|s|m|l|xl|xxl}-dino             # Table 5, minimize fd_dinov2
edm2-img64-{s|m|l|xl}-fid                      # Table 3, minimize fid
edm2-img512-{xs|s|m|l|xl|xxl}-guid-{fid|dino}  # Table 2, classifier-free guidance

# Autoguidance paper
edm2-img512-{s|xxl}-autog-{fid|dino}           # Table 1, conditional ImageNet-512
edm2-img512-s-uncond-autog-{fid|dino}          # Table 1, unconditional ImageNet-512
edm2-img64-s-autog-{fid|dino}                  # Table 1, conditional ImageNet-64
```

Each of these maps to a specific set of options that point to the models in [https://nvlabs-fi-cdn.nvidia.com/edm2/posthoc-reconstructions/](https://nvlabs-fi-cdn.nvidia.com/edm2/posthoc-reconstructions/). For example, `--preset=edm2-img512-xxl-guid-dino` is equivalent to:

```.bash
# Expanded command line for --preset=edm2-img512-xxl-guid-dino
python generate_images.py \
    --net=https://nvlabs-fi-cdn.nvidia.com/edm2/posthoc-reconstructions/edm2-img512-xxl-0939524-0.015.pkl \
    --gnet=https://nvlabs-fi-cdn.nvidia.com/edm2/posthoc-reconstructions/edm2-img512-xs-uncond-2147483-0.015.pkl \
    --guidance=1.7 \
    --outdir=out
```

In other words, we will use the XXL-sized conditional model at 939524 kimg and EMA length 0.015, and guide it with respect to the XS-sized unconditional model at 2147483 kimg with guidance strength 1.7. For further details, see `config_presets` in [`generate_images.py`](./generate_images.py).
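
Conceptually, both classifier-free guidance and autoguidance form the guided prediction by extrapolating from the guiding network's output (`--gnet`) toward the main network's output (`--net`), with `--guidance` as the extrapolation weight. A minimal sketch of that linear combination, illustrative only and with made-up call signatures rather than the repository's actual sampler code:

```python
import torch

def guided_denoise(net, gnet, x, sigma, class_labels, guidance=1.7):
    # Blend two denoiser outputs, as in classifier-free guidance / autoguidance:
    # guidance = 1 reproduces the main network, values > 1 extrapolate away from
    # the weaker guiding network (cf. --guidance=1.7 above). The call signature
    # here is hypothetical; see generate_images.py for the actual sampler code.
    d_main = net(x, sigma, class_labels)
    d_guid = gnet(x, sigma, class_labels)
    return d_guid + guidance * (d_main - d_guid)

# Toy stand-ins for the two networks, just to show the blending behaviour.
net  = lambda x, sigma, labels: 0.9 * x
gnet = lambda x, sigma, labels: 0.5 * x
x = torch.randn(4, 3, 64, 64)
print(guided_denoise(net, gnet, x, sigma=1.0, class_labels=None).shape)
```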
## Calculating FLOPs and metrics

The computational cost of a given model can be estimated using `count_flops.py`:

```.bash
# Calculate FLOPs for a given model
python count_flops.py \
    https://nvlabs-fi-cdn.nvidia.com/edm2/posthoc-reconstructions/edm2-img512-s-2147483-0.130.pkl
```

To calculate FID and FD<sub>DINOv2</sub>, we first need to generate 50,000 random images. This can be quite time-consuming in practice, so it makes sense to distribute the workload across multiple GPUs. This can be done by launching `generate_images.py` through `torchrun`:

```.bash
# Generate 50000 images using 8 GPUs and save them as out/*/*.png
torchrun --standalone --nproc_per_node=8 generate_images.py \
    --preset=edm2-img512-s-guid-fid --outdir=out --subdirs --seeds=0-49999
```

Alternatively, `generate_images.py` can be launched as a multi-GPU or multi-node job in a compute cluster. This should work out-of-the-box as long as the cluster environment spawns a separate process for each GPU and populates the necessary environment variables. For further details, please refer to the [`torchrun`](https://pytorch.org/docs/stable/elastic/run.html) documentation.

Having generated 50,000 images, FID and FD<sub>DINOv2</sub> can then be calculated using `calculate_metrics.py`:

```.bash
# Calculate metrics for a random subset of 50000 images in out/
python calculate_metrics.py calc --images=out \
    --ref=https://nvlabs-fi-cdn.nvidia.com/edm2/dataset-refs/img512.pkl
```

Here, the `--ref` option points to pre-computed reference statistics for the dataset that the model was originally trained with. The necessary reference statistics for our pre-trained models are available at [https://nvlabs-fi-cdn.nvidia.com/edm2/dataset-refs/](https://nvlabs-fi-cdn.nvidia.com/edm2/dataset-refs/).

Note that the numerical values of the metrics vary across different random seeds and are highly sensitive to the number of images. By default, `calculate_metrics.py` uses 50,000 generated images, in line with established best practices. Providing fewer images will result in an error, whereas providing more will use a random subset. To reduce the effect of random variation, we recommend repeating the calculation multiple times with different random seeds, e.g., `--seeds=0-49999`, `--seeds=50000-99999`, and `--seeds=100000-149999`. In our paper, we calculated each metric multiple times and reported the minimum.
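
If you prefer to split the 50,000 images, or the repeated evaluations above, across several independently launched jobs rather than one `torchrun` invocation, the only bookkeeping needed is a partition of the seed range into `--seeds=A-B` chunks. A small helper sketch (hypothetical convenience code, not part of the repository):

```python
def seed_chunks(total=50_000, jobs=8, offset=0):
    # Split seeds [offset, offset + total) into contiguous --seeds=A-B arguments,
    # e.g. one chunk per cluster job running generate_images.py.
    per_job = total // jobs
    return [f"--seeds={offset + i * per_job}-{offset + (i + 1) * per_job - 1}"
            for i in range(jobs)]

print(seed_chunks())                 # first evaluation: seeds 0-49999
print(seed_chunks(offset=50_000))    # a second, independent evaluation
```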
When performing larger sweeps over, say, EMA lengths or training snapshots, it may be impractical to use `generate_images.py` as outlined above. As an alternative, the metrics can also be calculated directly for a given network pickle, generating the necessary images on the fly:

```.bash
# Calculate metrics directly for a given model without saving any images
torchrun --standalone --nproc_per_node=8 calculate_metrics.py gen \
    --net=https://nvlabs-fi-cdn.nvidia.com/edm2/posthoc-reconstructions/edm2-img512-s-2147483-0.130.pkl \
    --ref=https://nvlabs-fi-cdn.nvidia.com/edm2/dataset-refs/img512.pkl \
    --seed=123456789
```

We also provide the necessary APIs to do these kinds of operations programmatically from external Python scripts. For further details, see `gen()` in [`calculate_metrics.py`](./calculate_metrics.py).

## Post-hoc EMA reconstruction

The models in [https://nvlabs-fi-cdn.nvidia.com/edm2/posthoc-reconstructions/](https://nvlabs-fi-cdn.nvidia.com/edm2/posthoc-reconstructions/) correspond to specific choices for the EMA length. In addition, we also provide the raw snapshots for each training run in [https://nvlabs-fi-cdn.nvidia.com/edm2/raw-snapshots/](https://nvlabs-fi-cdn.nvidia.com/edm2/raw-snapshots/) that can be used to reconstruct arbitrary EMA profiles.

Note that the raw snapshots can take up a considerable amount of disk space. In the paper, we saved snapshots every 8Mi (= 8 [mebi](https://en.wikipedia.org/wiki/Binary_prefix#mebi) = 8&times;2<sup>20</sup>) training images, corresponding to 118&ndash;635 GB of data per training run depending on model size. In [https://nvlabs-fi-cdn.nvidia.com/edm2/raw-snapshots/](https://nvlabs-fi-cdn.nvidia.com/edm2/raw-snapshots/), we provide the snapshots at 32Mi intervals instead, corresponding to 30&ndash;159 GB per training run. We have done extensive testing to verify that this is sufficient for accurate reconstruction.

To reconstruct new EMA profiles, the first step is to download the raw snapshots corresponding to a given training run. We recommend using [Rclone](https://rclone.org/install/) for this:

```.bash
# Download raw snapshots for the pre-trained edm2-img512-xs model
rclone copy --progress --http-url https://nvlabs-fi-cdn.nvidia.com/edm2 \
    :http:raw-snapshots/edm2-img512-xs/ raw-snapshots/edm2-img512-xs/
```

The above command downloads 128 network pickles, 238 MB each, yielding 29.8 GB in total. Once the download is complete, new EMA profiles can be reconstructed using `reconstruct_phema.py`:

```.bash
# Reconstruct a new EMA profile with std=0.150
python reconstruct_phema.py --indir=raw-snapshots/edm2-img512-xs \
    --outdir=out --outstd=0.150
```

This reads each of the input pickles once and saves the reconstructed model at `out/phema-2147483-0.150.pkl`, to be used with, e.g., `generate_images.py`. To perform a sweep over EMA length, it is also possible to reconstruct multiple EMA profiles simultaneously:

```.bash
# Reconstruct a set of 31 EMA profiles, streaming over the input data 4 times
python reconstruct_phema.py --indir=raw-snapshots/edm2-img512-xs \
    --outdir=out --outstd=0.010,0.015,...,0.250 --batch=8
```

See [`python reconstruct_phema.py --help`](./docs/phema-help.txt) for the full list of options.

Note that our post-hoc EMA approach is not specific to diffusion models in any way &mdash; it can be applied to other kinds of deep learning models as well. To try it out in your own training runs, you can **(1)** include [`training/phema.py`](./training/phema.py) in your codebase, **(2)** modify your training loop to use `phema.PowerFunctionEMA`, and **(3)** take a copy of [`reconstruct_phema.py`](./reconstruct_phema.py) and modify it to suit your needs.
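
To make steps (1)&ndash;(3) concrete, the sketch below shows the general pattern the approach relies on: keep an EMA copy of the weights during training and write it out as periodic snapshots, which a `reconstruct_phema.py`-style script can later combine into arbitrary EMA profiles. The sketch uses a plain exponential average for brevity; the repository's `phema.PowerFunctionEMA` replaces this with the power-function profile from the paper, and its actual constructor and method names should be taken from `training/phema.py`, not from this sketch.

```python
import copy, os, torch

net = torch.nn.Linear(32, 32)                      # stand-in for your model
opt = torch.optim.SGD(net.parameters(), lr=1e-2)
ema_net = copy.deepcopy(net)                       # EMA copy of the weights
beta, batch_size, snapshot_every = 0.999, 64, 1_000
os.makedirs("snapshots", exist_ok=True)

for step in range(1, 10_001):
    x = torch.randn(batch_size, 32)
    loss = (net(x) - x).pow(2).mean()              # stand-in for the real training loss
    opt.zero_grad(); loss.backward(); opt.step()

    with torch.no_grad():                          # update the EMA copy in place
        for p_ema, p in zip(ema_net.parameters(), net.parameters()):
            p_ema.lerp_(p, 1 - beta)

    if step % snapshot_every == 0:                 # periodic raw snapshots; post-hoc
        torch.save(ema_net.state_dict(),           # reconstruction combines these later
                   f"snapshots/ema-{step:08d}.pt")
```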
## Preparing datasets

Datasets are stored as uncompressed ZIP archives containing uncompressed PNG or NPY files, along with a metadata file `dataset.json` for labels. When using latent diffusion, it is necessary to create two different versions of a given dataset: the original RGB version, used for evaluation, and a VAE-encoded latent version, used for training.

To set up ImageNet-512:

1. Download the ILSVRC2012 data archive from [Kaggle](https://www.kaggle.com/competitions/imagenet-object-localization-challenge/data) and extract it somewhere, e.g., `downloads/imagenet`.

2. Crop and resize the images to create the original RGB dataset:

```.bash
# Convert raw ImageNet data to a ZIP archive at 512x512 resolution
python dataset_tool.py convert --source=downloads/imagenet/ILSVRC/Data/CLS-LOC/train \
    --dest=datasets/img512.zip --resolution=512x512 --transform=center-crop-dhariwal
```

3. Run the images through a pre-trained VAE encoder to create the corresponding latent dataset:

```.bash
# Convert the pixel data to VAE latents
python dataset_tool.py encode --source=datasets/img512.zip \
    --dest=datasets/img512-sd.zip
```

4. Calculate reference statistics for the original RGB dataset, to be used with `calculate_metrics.py`:

```.bash
# Compute dataset reference statistics for calculating metrics
python calculate_metrics.py ref --data=datasets/img512.zip \
    --dest=dataset-refs/img512.pkl
```

## Training new models

New models can be trained using `train_edm2.py`. For example, to train an XS-sized conditional model for ImageNet-512 using the same hyperparameters as in our paper, run:

```.bash
# Train XS-sized model for ImageNet-512 using 8 GPUs
torchrun --standalone --nproc_per_node=8 train_edm2.py \
    --outdir=training-runs/00000-edm2-img512-xs \
    --data=datasets/img512-sd.zip \
    --preset=edm2-img512-xs \
    --batch-gpu=32
```

This example performs single-node training using 8 GPUs, but in practice, we recommend using at least 32 A100 GPUs, i.e., 4 DGX nodes. Note that training large models may easily run out of GPU memory, depending on the number of GPUs and the available VRAM. The best way to avoid this is to limit the per-GPU batch size using gradient accumulation. In the above example, the total batch size is 2048 images, i.e., 256 per GPU, but we limit it to 32 per GPU by specifying `--batch-gpu=32`. Modifying `--batch-gpu` is safe in the sense that it has no interaction with the other hyperparameters, whereas modifying the total batch size would also necessitate adjusting, e.g., the learning rate.
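
The arithmetic behind `--batch-gpu` is ordinary gradient accumulation: the 2048-image batch is split into micro-batches of 32 per GPU, and gradients are accumulated over `2048 / (num_gpus x batch_gpu)` micro-batches before each optimizer step. A self-contained single-GPU sketch of the idea (not the actual loop in `train_edm2.py`):

```python
import torch

total_batch, num_gpus, batch_gpu = 2048, 8, 32
accum_rounds = total_batch // (num_gpus * batch_gpu)   # 8 micro-batches per optimizer step

net = torch.nn.Linear(64, 64)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

opt.zero_grad()
for _ in range(accum_rounds):
    x = torch.randn(batch_gpu, 64)                 # one micro-batch held in memory on this GPU
    loss = (net(x) - x).pow(2).mean()              # stand-in for the diffusion training loss
    (loss / accum_rounds).backward()               # average gradients over the micro-batches
opt.step()                                         # one update for the full 2048-image batch
```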
By default, the training script prints status every 128Ki (= 128 [kibi](https://en.wikipedia.org/wiki/Binary_prefix#kibi) = 128&times;2<sup>10</sup>) training images (controlled by `--status`), saves network snapshots every 8Mi (= 8&times;2<sup>20</sup>) training images (controlled by `--snapshot`), and dumps training checkpoints every 128Mi training images (controlled by `--checkpoint`). The status is saved in `log.txt` (one-line summary) and `stats.json` (comprehensive set of statistics). The network snapshots are saved in `network-snapshot-*.pkl`, and they can be used directly with, e.g., `generate_images.py` and `reconstruct_phema.py`.

The training checkpoints, saved in `training-state-*.pt`, can be used to resume the training at a later time. When the training script starts, it will automatically look for the highest-numbered checkpoint and load it if available. To resume training, simply run the same `train_edm2.py` command line again &mdash; it is important to use the same set of options to avoid accidentally changing the hyperparameters mid-training. If you wish to have the ability to suspend the training at any time so that no progress is lost, you can modify the `should_suspend()` function in [torch_utils/distributed.py](./torch_utils/distributed.py) to implement the desired signaling protocol.
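
One simple signaling protocol is to catch a Unix signal and have `should_suspend()` report it, so that the training loop can save its state and exit cleanly; re-running the same command later resumes from that checkpoint. A sketch of that idea, assuming you replace only the body of the existing function in `torch_utils/distributed.py` and keep whatever signature it already has:

```python
import signal

_suspend_requested = False

def _on_sigusr1(signum, frame):
    # Remember that a suspend was requested; the training loop notices the flag
    # at a safe point and stops after saving its state.
    global _suspend_requested
    _suspend_requested = True

signal.signal(signal.SIGUSR1, _on_sigusr1)

def should_suspend():
    # Illustrative replacement body: send SIGUSR1 to the training processes
    # (e.g. `kill -USR1 <pid>`) to request a clean suspend.
    return _suspend_requested
```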
See [`python train_edm2.py --help`](./docs/train-help.txt) for the full list of options.

## 2D toy example

The 2D toy example used in the autoguidance paper can be reproduced with `toy_example.py`:

```.bash
# Visualize sampling distributions using autoguidance.
python toy_example.py plot
```

See [`python toy_example.py --help`](./docs/toy-help.txt) for the full list of options.

![2D toy example](https://oss.gittoolsai.com/images/NVlabs_edm2_readme_44ead30c6c77.jpg)

## License

Copyright &copy; 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

All material, including source code and pre-trained models, is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).

## Citation

```
@inproceedings{Karras2024edm2,
  title     = {Analyzing and Improving the Training Dynamics of Diffusion Models},
  author    = {Tero Karras and Miika Aittala and Jaakko Lehtinen and
               Janne Hellsten and Timo Aila and Samuli Laine},
  booktitle = {Proc. CVPR},
  year      = {2024},
}

@inproceedings{Karras2024autoguidance,
  title     = {Guiding a Diffusion Model with a Bad Version of Itself},
  author    = {Tero Karras and Miika Aittala and Tuomas Kynk\"a\"anniemi and
               Jaakko Lehtinen and Timo Aila and Samuli Laine},
  booktitle = {Proc. NeurIPS},
  year      = {2024},
}
```

## Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.

## Acknowledgments

We thank Eric Chan, Qinsheng Zhang, Erik H&auml;rk&ouml;nen, Arash Vahdat, Ming-Yu Liu, David Luebke, and Alex Keller for discussions and comments, and Tero Kuosmanen and Samuel Klenberg for maintaining our compute infrastructure.
# EDM2 Quick Start Guide

EDM2 is NVIDIA Research's framework for training and sampling state-of-the-art diffusion models. It bundles the results of two top-venue papers (CVPR 2024 & NeurIPS 2024), improving the training dynamics of diffusion models and introducing the "Autoguidance" technique.

## Environment setup

### System requirements
*   **Operating system**: Linux and Windows are supported; **Linux is strongly recommended** for best performance and compatibility.
*   **GPU**:
    *   Image generation (sampling): at least 1 high-end NVIDIA GPU (tested on V100/A100).
    *   Model training: 8 or more GPUs recommended.
*   **Software versions**:
    *   Python 3.9 (64-bit)
    *   PyTorch 2.1 or later

### Installing dependencies
Make sure PyTorch is installed, then install the remaining required Python libraries:

```bash
pip install click Pillow psutil requests scipy tqdm diffusers==0.26.3 accelerate==0.27.2
```

> **Tip**: If you need to download the raw snapshots used for post-hoc EMA reconstruction, install [Rclone](https://rclone.org/install/) as well.
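
Before installing the remaining dependencies, it is worth confirming that the interpreter and PyTorch build meet the requirements above (64-bit Python 3.9+, PyTorch 2.1+). A quick, generic check, not part of the repository's scripts:

```python
import platform, struct, sys
import torch

print("Python     :", platform.python_version(), f"({8 * struct.calcsize('P')}-bit)")
print("PyTorch    :", torch.__version__)
print("CUDA build :", torch.version.cuda)

assert sys.version_info >= (3, 9), "EDM2 expects 64-bit Python 3.9 or newer"
assert tuple(int(v) for v in torch.__version__.split("+")[0].split(".")[:2]) >= (2, 1), \
    "EDM2 expects PyTorch 2.1 or newer"
```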
### Docker option (recommended)
To avoid dependency conflicts, the project ships a Dockerfile with all dependencies included.

1.  **Build the image**:
    ```bash
    docker build --tag edm2:latest .
    ```

2.  **Run the generation script**:
    ```bash
    docker run --gpus all -it --rm --user $(id -u):$(id -g) \
        -v `pwd`:/scratch --workdir /scratch -e HOME=/scratch \
        edm2:latest \
        python generate_images.py --preset=edm2-img512-s-guid-dino --outdir=out
    ```
    *Note: make sure the [NVIDIA Container Runtime](https://docs.docker.com/config/containers/resource_constraints/#gpu) is installed correctly.*

## Installation

If you prefer to run directly in a local environment (without Docker), follow these steps:

1.  **Clone the repository**:
    ```bash
    git clone https://github.com/NVlabs/edm2.git
    cd edm2
    ```

2.  **Install PyTorch**:
    Visit the [PyTorch website](https://pytorch.org) and install the build matching your CUDA version. Users in mainland China can speed up the download with the Tsinghua mirror:
    ```bash
    pip install torch torchvision torchaudio --index-url https://pypi.tuna.tsinghua.edu.cn/simple
    ```

3.  **Install the project dependencies**:
    ```bash
    pip install click Pillow psutil requests scipy tqdm diffusers==0.26.3 accelerate==0.27.2
    ```
    (If the repository ships a `requirements.txt`, `pip install -r requirements.txt` works as well.)

## Basic usage

EDM2 provides pre-trained models that can generate high-quality images out of the box. The scripts download the models automatically and cache them under `$HOME/.cache/dnnlib`.

### 1. Generate images (simplest example)
Generate images with a preset configuration and save them to the `out` directory:

```bash
python generate_images.py --preset=edm2-img512-s-guid-dino --outdir=out
```

*   `--preset=edm2-img512-s-guid-dino`: the S-sized model trained on ImageNet-512, with a guidance setup tuned to minimize FD_DINOv2.

### 2. Common presets
Swap the `--preset` argument as needed:

*   **Minimize FID** (Tables 2/3):
    `edm2-img512-{xs|s|m|l|xl|xxl}-fid`
    `edm2-img64-{s|m|l|xl}-fid`
*   **Minimize FD_DINOv2** (Table 5):
    `edm2-img512-{xs|s|m|l|xl|xxl}-dino`
*   **Classifier-free guidance**:
    `edm2-img512-{xs|s|m|l|xl|xxl}-guid-{fid|dino}`
*   **Autoguidance** (NeurIPS 2024):
    `edm2-img512-{s|xxl}-autog-{fid|dino}` (conditional)
    `edm2-img512-s-uncond-autog-{fid|dino}` (unconditional)

### 3. Multi-GPU parallel generation
Evaluating metrics such as FID typically requires 50,000 generated images. Use `torchrun` to distribute the work across multiple GPUs:

```bash
torchrun --standalone --nproc_per_node=8 generate_images.py \
    --preset=edm2-img512-s-guid-fid --outdir=out --subdirs --seeds=0-49999
```
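
Before launching the `torchrun` command above, it can save a failed start to check how many GPUs PyTorch actually sees, since `--nproc_per_node` should not exceed that number. A quick, generic check (not part of the repository):

```python
import torch

print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs  :", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} ->", torch.cuda.get_device_name(i))
```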
### 4. Calculate evaluation metrics
After generating the images, compute FID and FD_DINOv2 with:

```bash
python calculate_metrics.py calc --images=out \
    --ref=https://nvlabs-fi-cdn.nvidia.com/edm2/dataset-refs/img512.pkl
```

> **Note**: the metric values are sensitive to the random seed; run the calculation several times and take the minimum for stable results.

## Use case

An art team at a game studio urgently needs to batch-generate high-resolution (512x512) photorealistic vegetation and rock textures for an open-world project, to fill out a large asset library.

### Without edm2
- Generated images are blurry in the details; high-frequency textures such as leaf veins and rock cracks are largely lost, making the output unusable for close-up rendering.
- Raising quality requires many more sampling steps, so each image takes too long to generate and iteration slows down badly.
- The model tends to produce artifacts or distortions on complex structures, forcing artists to spend time retouching or regenerating.
- It is hard to keep the overall style consistent while preserving diversity, so the asset library ends up visually uneven.

### With edm2
- Thanks to the improved training dynamics, the generated 512px images have sharp, detail-rich textures.
- With Autoguidance, fewer sampling steps reach or exceed the previous quality bar, so generation is much faster.
- The "guide the model with a worse version of itself" strategy suppresses structural distortions, giving geometrically more consistent output.
- Presets such as `edm2-img512-s-guid-dino` minimize perceptual error, keeping batch-produced assets stylistically uniform and realistic.

By rethinking how diffusion models are trained and guided, edm2 makes high-quality, high-consistency game-asset generation both fast and controllable.

## Project information

- **Repository**: [NVlabs/edm2](https://github.com/NVlabs/edm2) · 831 stars · 56 forks · difficulty 3 · last commit 2026-04-03
- **Owner**: [NVlabs](https://github.com/NVlabs) (NVIDIA Research Projects) · http://research.nvidia.com
- **License**: no standard license detected (NOASSERTION); the README licenses all material under CC BY-NC-SA 4.0
- **Languages**: Python 99.7%, Dockerfile 0.3%
- **Categories**: image, development framework
- **Operating systems**: Linux, Windows
- **GPU**: NVIDIA GPU required. Sampling needs 1+ high-end GPU, training needs 8+. Officially tested on V100 and A100. The NVIDIA Container Runtime is required for GPU passthrough under Docker.
- **RAM**: not specified
- **Python**: 3.9
- **Dependencies**: torch>=2.1, click, Pillow, psutil, requests, scipy, tqdm, diffusers==0.26.3, accelerate==0.27.2
- **Notes**: Linux is recommended for best performance and compatibility. A Dockerfile is provided for quick environment setup. Rclone is recommended for downloading the raw snapshots used for post-hoc EMA reconstruction. Pre-trained models are downloaded and cached automatically; the cache path can be customized via the `DNNLIB_CACHE_DIR` environment variable. When computing metrics such as FID, multi-GPU parallelism is recommended to speed up generating the 50,000 images.

## FAQ

**Q: When using the sd-vae-ft-mse latent encoder, should the data be normalized to [0, 1] or [-1, 1]?**

A: The code is correct in scaling images to [0, 1] before applying the VAE. Although the SD VAE is generally assumed to have been trained on [-1, 1] inputs, OpenAI's Consistency Models paper found and adopted this [0, 1] treatment. See the implementation: https://github.com/NVlabs/edm2/blob/38d5a70fe338edc8b3aac4da8a0cefbc4a057fb8/training/encoders.py#L112 (source: https://github.com/NVlabs/edm2/issues/9)

**Q: How can the current version be configured to use CFG (classifier-free guidance) instead of autoguidance?**

A: Configure `gnet` with its `label_dim` set to 0 to enable CFG. This requires changes to the training network configuration and the generation script:
1. Network configuration: https://github.com/NVlabs/edm2/blob/main/training/networks_edm2.py#L193
2. Generation script: https://github.com/NVlabs/edm2/blob/main/generate_images.py#L48-L59
(source: https://github.com/NVlabs/edm2/issues/12)
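
For the first question above, the practical takeaway is that EDM2 feeds the SD VAE pixel values scaled to [0, 1] rather than [-1, 1]. A minimal illustration of that preprocessing convention (illustrative only; the authoritative code is `training/encoders.py` in the repository):

```python
import torch

def to_vae_input(img_uint8: torch.Tensor) -> torch.Tensor:
    # img_uint8: (N, 3, H, W) uint8 batch in [0, 255].
    # EDM2 scales pixels to [0, 1] before the SD VAE encoder, following the
    # treatment used in OpenAI's Consistency Models paper.
    return img_uint8.to(torch.float32) / 255.0

print(to_vae_input(torch.randint(0, 256, (2, 3, 8, 8), dtype=torch.uint8)).max() <= 1.0)
```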