[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-yang-song--score_sde_pytorch":3,"tool-yang-song--score_sde_pytorch":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",144730,2,"2026-04-07T23:26:32",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":78,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":99,"forks":100,"last_commit_at":101,"license":102,"difficulty_score":10,"env_os":103,"env_gpu":104,"env_ram":103,"env_deps":105,"category_tags":112,"github_topics":113,"view_count":32,"oss_zip_url":79,"oss_zip_packed_at":79,"status":17,"created_at":123,"updated_at":124,"faqs":125,"releases":160},5359,"yang-song\u002Fscore_sde_pytorch","score_sde_pytorch","PyTorch implementation for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)","score_sde_pytorch 是一个基于 PyTorch 实现的开源项目，核心用于“基于分数的生成建模”。它通过随机微分方程（SDE）这一统一框架，将复杂的数据分布逐步转化为简单的噪声分布，再通过逆向过程从噪声中高质量地还原出图像。\n\n该工具主要解决了传统生成模型在理论统一性、采样效率及似然计算方面的局限。它不仅复现并整合了 NCSN、DDPM 等经典算法，还支持精确的似然计算、潜在空间操控以及图像修复、上色等条件生成任务。其在 CIFAR-10 数据集上取得了极低的 FID 分数，并能生成高达 1024 
像素的高保真人脸图像，展现了卓越的生成能力。\n\n技术亮点在于其高度模块化的设计，允许研究人员轻松扩展新的随机微分方程、预测器或校正器。此外，该项目已与 Hugging Face 🤗 Diffusers 库深度集成，用户仅需几行代码即可调用预训练模型进行推理，大幅降低了使用门槛。同时，官方还提供了支持分类条件生成的 JAX 版本供选择。\n\nscore_sde_pytorch 非常适合人工智能研究人员、算法开发者以及对生成式模型原理有深入探索需求的技术人员使用。对于希望快速验证 S","score_sde_pytorch 是一个基于 PyTorch 实现的开源项目，核心用于“基于分数的生成建模”。它通过随机微分方程（SDE）这一统一框架，将复杂的数据分布逐步转化为简单的噪声分布，再通过逆向过程从噪声中高质量地还原出图像。\n\n该工具主要解决了传统生成模型在理论统一性、采样效率及似然计算方面的局限。它不仅复现并整合了 NCSN、DDPM 等经典算法，还支持精确的似然计算、潜在空间操控以及图像修复、上色等条件生成任务。其在 CIFAR-10 数据集上取得了极低的 FID 分数，并能生成高达 1024 像素的高保真人脸图像，展现了卓越的生成能力。\n\n技术亮点在于其高度模块化的设计，允许研究人员轻松扩展新的随机微分方程、预测器或校正器。此外，该项目已与 Hugging Face 🤗 Diffusers 库深度集成，用户仅需几行代码即可调用预训练模型进行推理，大幅降低了使用门槛。同时，官方还提供了支持分类条件生成的 JAX 版本供选择。\n\nscore_sde_pytorch 非常适合人工智能研究人员、算法开发者以及对生成式模型原理有深入探索需求的技术人员使用。对于希望快速验证 SDE 理论或复现前沿论文结果的用户而言，这是一个不可或缺的基础设施。","# Score-Based Generative Modeling through Stochastic Differential Equations\n\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fscore-based-generative-modeling-through-1\u002Fimage-generation-on-cifar-10)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fimage-generation-on-cifar-10?p=score-based-generative-modeling-through-1)\n\nThis repo contains a PyTorch implementation for the paper [Score-Based Generative Modeling through Stochastic Differential Equations](https:\u002F\u002Fopenreview.net\u002Fforum?id=PxTIG12RRHS)\n\nby [Yang Song](https:\u002F\u002Fyang-song.github.io), [Jascha Sohl-Dickstein](http:\u002F\u002Fwww.sohldickstein.com\u002F), [Diederik P. Kingma](http:\u002F\u002Fdpkingma.com\u002F), [Abhishek Kumar](http:\u002F\u002Fusers.umiacs.umd.edu\u002F~abhishek\u002F), [Stefano Ermon](https:\u002F\u002Fcs.stanford.edu\u002F~ermon\u002F), and [Ben Poole](https:\u002F\u002Fcs.stanford.edu\u002F~poole\u002F)\n\n--------------------\n\nWe propose a unified framework that generalizes and improves previous work on score-based generative models through the lens of stochastic differential equations (SDEs). 
In particular, we can transform data to a simple noise distribution with a continuous-time stochastic process described by an SDE. This SDE can be reversed for sample generation if we know the score of the marginal distributions at each intermediate time step, which can be estimated with score matching. The basic idea is captured in the figure below:\n\n![schematic](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_5c2b74e6c4e8.jpg)\n\nOur work enables a better understanding of existing approaches, new sampling algorithms, exact likelihood computation, uniquely identifiable encoding, and latent code manipulation, and it brings new conditional generation abilities (including but not limited to class-conditional generation, inpainting and colorization) to the family of score-based generative models.\n\nAll combined, we achieved an FID of **2.20** and an Inception score of **9.89** for unconditional generation on CIFAR-10, as well as high-fidelity generation of **1024px** CelebA-HQ images (samples below). In addition, we obtained a likelihood value of **2.99** bits\u002Fdim on uniformly dequantized CIFAR-10 images.\n\n![FFHQ samples](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_84ff65144675.jpg)\n\n## What does this code do?\nAside from the **NCSN++** and **DDPM++** models in our paper, this codebase also re-implements many previous score-based models in one place, including **NCSN** from [Generative Modeling by Estimating Gradients of the Data Distribution](https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.05600), **NCSNv2** from [Improved Techniques for Training Score-Based Generative Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.09011), and **DDPM** from [Denoising Diffusion Probabilistic Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.11239).\n\nIt supports training new models and evaluating the sample quality and likelihoods of existing models. 
We carefully designed the code to be modular and easily extensible to new SDEs, predictors, or correctors.\n\n## **Integration with 🤗 Diffusers library**\n\nMost models are now also available in 🧨 Diffusers and accessible via the [ScoreSdeVE pipeline](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fapi\u002Fpipelines\u002Fscore_sde_ve).\n\nDiffusers allows you to test score-SDE-based models in PyTorch in just a couple of lines of code.\n\nYou can install diffusers as follows:\n\n```\npip install diffusers torch accelerate\n```\n\nAnd then try out the models with just a couple of lines of code:\n\n```python\nfrom diffusers import DiffusionPipeline\n\nmodel_id = \"google\u002Fncsnpp-ffhq-1024\"\n\n# load model and scheduler\nsde_ve = DiffusionPipeline.from_pretrained(model_id)\n\n# run pipeline in inference (sample random noise and denoise);\n# `.images` is a list of PIL images, so take the first one\nimage = sde_ve().images[0]\n\n# save image\nimage.save(\"sde_ve_generated_image.png\")\n```\n\nMore models can be found directly [on the Hub](https:\u002F\u002Fhuggingface.co\u002Fmodels?library=diffusers&pipeline_tag=unconditional-image-generation&sort=downloads&search=ncsnpp).\n\n## JAX version\n\nPlease find a JAX implementation [here](https:\u002F\u002Fgithub.com\u002Fyang-song\u002Fscore_sde), which additionally supports class-conditional generation with a pre-trained classifier, and resuming an evaluation process after pre-emption.\n\n###  JAX vs. PyTorch\n\nIn general, this PyTorch version consumes less memory but runs slower than JAX. Here is a benchmark on training an NCSN++ cont. model with VE SDE. 
Hardware is 4x Nvidia Tesla V100 GPUs (32GB).\n\n| Framework | Time (seconds per step) | Memory usage in total (GB) |\n|:----:|:----:|:----:|\n| PyTorch | 0.56 | 20.6 |\n| JAX (`n_jitted_steps=1`) | 0.30 | 29.7 |\n| JAX (`n_jitted_steps=5`) | 0.20 | 74.8 |\n\n## How to run the code\n\n### Dependencies\n\nRun the following to install a subset of necessary Python packages for our code:\n```sh\npip install -r requirements.txt\n```\n\n### Stats files for quantitative evaluation\n\nWe provide the stats file for CIFAR-10. You can download [`cifar10_stats.npz`](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F14UB27-Spi8VjZYKST3ZcT8YVhAluiFWI\u002Fview?usp=sharing) and save it to `assets\u002Fstats\u002F`. Check out [#5](https:\u002F\u002Fgithub.com\u002Fyang-song\u002Fscore_sde\u002Fpull\u002F5) on how to compute this stats file for new datasets.\n\n### Usage\n\nTrain and evaluate our models through `main.py`.\n\n```sh\nmain.py:\n  --config: Training configuration.\n    (default: 'None')\n  --eval_folder: The folder name for storing evaluation results\n    (default: 'eval')\n  --mode: \u003Ctrain|eval>: Running mode: train or eval\n  --workdir: Working directory\n```\n\n* `config` is the path to the config file. Our prescribed config files are provided in `configs\u002F`. They are formatted according to [`ml_collections`](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fml_collections) and should be quite self-explanatory.\n\n  **Naming conventions of config files**: the path of a config file is a combination of the following dimensions:\n  * dataset: One of `cifar10`, `celeba`, `celebahq`, `celebahq_256`, `ffhq_256`, `ffhq`.\n  * model: One of `ncsn`, `ncsnv2`, `ncsnpp`, `ddpm`, `ddpmpp`.\n  * continuous: train the model with continuously sampled time steps. 
\n\n* `workdir` is the path that stores all artifacts of one experiment, like checkpoints, samples, and evaluation results.\n\n* `eval_folder` is the name of a subfolder in `workdir` that stores all artifacts of the evaluation process, like meta checkpoints for pre-emption prevention, image samples, and NumPy dumps of quantitative results.\n\n* `mode` is either \"train\" or \"eval\". When set to \"train\", it starts the training of a new model, or resumes the training of an old model if its meta-checkpoints (for resuming a run after pre-emption in a cloud environment) exist in `workdir\u002Fcheckpoints-meta`. When set to \"eval\", it can do an arbitrary combination of the following:\n\n  * Evaluate the loss function on the test \u002F validation dataset.\n\n  * Generate a fixed number of samples and compute their Inception score, FID, or KID. Prior to evaluation, stats files must have already been downloaded\u002Fcomputed and stored in `assets\u002Fstats`.\n\n  * Compute the log-likelihood on the training or test dataset.\n\n  These functionalities can be configured through config files, or more conveniently, through the command-line support of the `ml_collections` package. For example, to generate samples and evaluate sample quality, supply the `--config.eval.enable_sampling` flag; to compute log-likelihoods, supply the `--config.eval.enable_bpd` flag, and specify `--config.eval.dataset=train\u002Ftest` to indicate whether to compute the likelihoods on the training or test dataset.\n\n## How to extend the code\n* **New SDEs**: inherit the `sde_lib.SDE` abstract class and implement all abstract methods. The `discretize()` method is optional and the default is Euler-Maruyama discretization. Existing sampling methods and likelihood computation will automatically work for this new SDE.\n* **New predictors**: inherit the `sampling.Predictor` abstract class, implement the `update_fn` abstract method, and register its name with `@register_predictor`. 
The new predictor can be directly used in `sampling.get_pc_sampler` for Predictor-Corrector sampling, and all other controllable generation methods in `controllable_generation.py`.\n* **New correctors**: inherit the `sampling.Corrector` abstract class, implement the `update_fn` abstract method, and register its name with `@register_corrector`. The new corrector can be directly used in `sampling.get_pc_sampler`, and all other controllable generation methods in `controllable_generation.py`.\n\n## Pretrained checkpoints\nAll checkpoints are provided in this [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1tFmF_uh57O6lx9ggtZT_5LdonVK2cV-e?usp=sharing).\n\n**Instructions**: You may find two checkpoints for some models. The first checkpoint (with a smaller number) is the one for which we reported FID scores in our paper's Table 3 (also corresponding to the FID and IS columns in the table below). The second checkpoint (with a larger number) is the one for which we reported likelihood values and FIDs of black-box ODE samplers in our paper's Table 2 (also the FID (ODE) and NNL (bits\u002Fdim) columns in the table below). The former corresponds to the smallest FID during the course of training (every 50k iterations). The latter is the last checkpoint during training.\n\nPer Google's policy, we cannot release our original CelebA and CelebA-HQ checkpoints. That said, I have re-trained models on FFHQ 1024px, FFHQ 256px and CelebA-HQ 256px with personal resources, and they achieved similar performance to our internal checkpoints.\n\nHere is a detailed list of checkpoints and their results reported in the paper. 
**FID (ODE)** corresponds to the sample quality of black-box ODE solver applied to the probability flow ODE.\n\n| Checkpoint path | FID | IS | FID (ODE) | NNL (bits\u002Fdim) |\n|:----------|:-------:|:----------:|:----------:|:----------:|\n| [`ve\u002Fcifar10_ncsnpp\u002F`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1sP4GwvrYiI-sDPTp7sKYzsxJLGVamVMZ?usp=sharing) |  2.45 | 9.73 | - | - |\n| [`ve\u002Fcifar10_ncsnpp_continuous\u002F`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1b0gy_LLgO_DaQBgoWXwlVnL_rcAUgREh?usp=sharing) | 2.38 | 9.83 | - | - |\n| [`ve\u002Fcifar10_ncsnpp_deep_continuous\u002F`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F11s6A_xM7qiztdj8AHQWqaIAUSC3I7uX2?usp=sharing) | **2.20** | **9.89** | - | - |\n| [`vp\u002Fcifar10_ddpm\u002F`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1zDKcy3xbsN3F4AfyB_DfY_1oho89iKcf?usp=sharing) | 3.24 | - | 3.37 | 3.28 |\n| [`vp\u002Fcifar10_ddpm_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1RHNxW1qY-mTr0JMAE5t4V181Hi_aVWXK?usp=sharing) | - | - | 3.69| 3.21 |\n| [`vp\u002Fcifar10_ddpmpp`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1zOVj03ZBcq339p5QEKJPh2bBrxR_HOCM?usp=sharing) | 2.78 | 9.64 | - | - |\n| [`vp\u002Fcifar10_ddpmpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1xYjVMx10N9ivQQBIsEoXEeu9nvSGTBrC?usp=sharing) | 2.55 | 9.58 | 3.93 | 3.16 |\n| [`vp\u002Fcifar10_ddpmpp_deep_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1ZMLBiu9j7-rpdTQu8M2LlHAEQq4xRYrj?usp=sharing) | 2.41 | 9.68 | 3.08 | 3.13 |\n| [`subvp\u002Fcifar10_ddpm_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1DeebpmBkCxlZx89t3z45Te37T7BPOzd2?usp=sharing) | - | - | 3.56 | 3.05 |\n| 
[`subvp\u002Fcifar10_ddpmpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1bLgmnEAZnysRZfWt8qN3omGfijJ_B884?usp=sharing) | 2.61 | 9.56 | 3.16 | 3.02 |\n| [`subvp\u002Fcifar10_ddpmpp_deep_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F16QGkviGcizSbIPRk37-YksUhlNIna4Ys?usp=sharing) | 2.41 | 9.57 | **2.92** | **2.99** |\n\n| Checkpoint path | Samples |\n|:-----|:------:|\n| [`ve\u002Fbedroom_ncsnpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F18GmxDvfGR8se9uFucc9uweeVrX_GzuUG?usp=sharing) | ![bedroom_samples](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_772513bfb9ae.jpeg) |\n| [`ve\u002Fchurch_ncsnpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1zVChA0HrnJU66Jkt4P6KOnlREhBMc4Yh?usp=sharing) | ![church_samples](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_ecc24b3b908b.jpeg) |\n| [`ve\u002Fffhq_1024_ncsnpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1ZqLNr_kH0o9DxvwSlrQPMmkrhEnXhBm2?usp=sharing) |![ffhq_1024](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_c4f07a106670.jpeg)|\n| [`ve\u002Fffhq_256_ncsnpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1KG72ZKUCUa8dDcA03hOf1BsnK8kBcdPD?usp=sharing) |![ffhq_256_samples](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_caa2de1b67b1.jpg)|\n| [`ve\u002Fcelebahq_256_ncsnpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F19VJ7UZTE-ytGX6z5rl-tumW9c0Ps3itk?usp=sharing) |![celebahq_256_samples](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_358ef1dc50a7.jpg)|\n\n\n## Demonstrations and tutorials\n| Link | Description|\n|:----:|:-----|\n|[![Open In 
Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1dRR_0gNRmfLtPavX2APzUggBuXyjWW55?usp=sharing)  | Load our pretrained checkpoints and play with sampling, likelihood computation, and controllable synthesis (JAX + FLAX)|\n|[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F17lTrPLTt_0EDXa4hkbHmbAFQEkpRDZnh?usp=sharing) | Load our pretrained checkpoints and play with sampling, likelihood computation, and controllable synthesis (PyTorch) |\n|[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1SeXMpILhkJPjXUaesvzEhc3Ke6Zl_zxJ?usp=sharing) | Tutorial of score-based generative models in JAX + FLAX |\n|[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F120kYYBOVa1i0TD85RjlEkFjaWDxSFUx3?usp=sharing)| Tutorial of score-based generative models in PyTorch |\n\n\n## Tips\n* When using the JAX codebase, you can jit multiple training steps together to improve training speed at the cost of more memory usage. This can be set via `config.training.n_jitted_steps`. For CIFAR-10, we recommend using `config.training.n_jitted_steps=5` when your GPU\u002FTPU has sufficient memory; otherwise we recommend using `config.training.n_jitted_steps=1`. Our current implementation requires `config.training.log_freq` to be divisible by `n_jitted_steps` for logging and checkpointing to work normally.\n* The `snr` (signal-to-noise ratio) parameter of `LangevinCorrector` behaves somewhat like a temperature parameter. Larger `snr` typically results in smoother samples, while smaller `snr` gives more diverse but lower-quality samples. 
Typical values of `snr` are in the range `0.05 - 0.2`, and the parameter requires tuning to strike the sweet spot.\n* For VE SDEs, we recommend choosing `config.model.sigma_max` to be the maximum pairwise distance between data samples in the training dataset.\n\n## References\n\nIf you find the code useful for your research, please consider citing\n```bib\n@inproceedings{\n  song2021scorebased,\n  title={Score-Based Generative Modeling through Stochastic Differential Equations},\n  author={Yang Song and Jascha Sohl-Dickstein and Diederik P Kingma and Abhishek Kumar and Stefano Ermon and Ben Poole},\n  booktitle={International Conference on Learning Representations},\n  year={2021},\n  url={https:\u002F\u002Fopenreview.net\u002Fforum?id=PxTIG12RRHS}\n}\n```\n\nThis work is built upon some previous papers which might also interest you:\n\n* Song, Yang, and Stefano Ermon. \"Generative Modeling by Estimating Gradients of the Data Distribution.\" *Proceedings of the 33rd Annual Conference on Neural Information Processing Systems*. 2019.\n* Song, Yang, and Stefano Ermon. \"Improved techniques for training score-based generative models.\" *Proceedings of the 34th Annual Conference on Neural Information Processing Systems*. 2020.\n* Ho, Jonathan, Ajay Jain, and Pieter Abbeel. \"Denoising diffusion probabilistic models.\" *Proceedings of the 34th Annual Conference on Neural Information Processing Systems*. 2020.\n\n","# 基于分数的生成模型：通过随机微分方程\n\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fscore-based-generative-modeling-through-1\u002Fimage-generation-on-cifar-10)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fimage-generation-on-cifar-10?p=score-based-generative-modeling-through-1)\n\n本仓库包含论文《基于分数的生成模型：通过随机微分方程》的 PyTorch 实现，该论文由 Yang Song、Jascha Sohl-Dickstein、Diederik P. 
Kingma、Abhishek Kumar、Stefano Ermon 和 Ben Poole 共同撰写。\n\n--------------------\n\n我们提出了一种统一的框架，通过随机微分方程（SDE）的视角，对先前基于分数的生成模型工作进行了泛化和改进。具体而言，我们可以利用由 SDE 描述的连续时间随机过程，将数据转换为简单的噪声分布。如果已知每个中间时间步长上边缘分布的分数函数，就可以逆转这一 SDE 过程以进行采样；而这些分数函数则可通过分数匹配方法来估计。其基本思想如图所示：\n\n![示意图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_5c2b74e6c4e8.jpg)\n\n我们的工作不仅有助于更好地理解现有方法，还带来了新的采样算法、精确的似然计算、唯一可识别的编码、潜在空间代码操控等功能，并为基于分数的生成模型家族增添了新的条件生成能力（包括但不限于类别条件生成、图像修复和彩色化等）。综合来看，在 CIFAR-10 数据集上的无条件生成任务中，我们取得了 FID 为 **2.20**、Inception 分数为 **9.89** 的优异成绩，并能高质量地生成 **1024px** 尺寸的 Celeba-HQ 图像（见下方样本）。此外，在对 CIFAR-10 图像进行均匀去量化处理后，我们还获得了 **2.99** 比特\u002F维度的似然值。\n\n![FFHQ 样本](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_84ff65144675.jpg)\n\n## 该代码的功能是什么？\n除了我们论文中的 **NCSN++** 和 **DDPM++** 模型外，本代码库还在一处重新实现了许多先前的基于分数的模型，包括来自论文《通过估计数据分布的梯度进行生成建模》的 **NCSN**、来自论文《训练基于分数的生成模型的改进技术》的 **NCSNv2**，以及来自论文《去噪扩散概率模型》的 **DDPM**。\n\n该代码支持训练新模型，也支持评估现有模型的样本质量和似然值。我们精心设计了代码结构，使其具有模块化特性，便于扩展到新的随机微分方程、预测器或校正器。\n\n## 与 🤗 Diffusers 库的集成\n\n目前，大多数模型也已在 🧨 Diffusers 中提供，并可通过 [ScoreSdeVE 流水线](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fapi\u002Fpipelines\u002Fscore_sde_ve) 访问。\n\n借助 Diffusers，您只需几行代码即可在 PyTorch 中测试基于分数 SDE 的模型。\n\n安装 Diffusers 的命令如下：\n\n```\npip install diffusers torch accelerate\n```\n\n然后只需几行代码即可试用这些模型：\n\n```python\nfrom diffusers import DiffusionPipeline\n\nmodel_id = \"google\u002Fncsnpp-ffhq-1024\"\n\n# 加载模型和调度器\nsde_ve = DiffusionPipeline.from_pretrained(model_id)\n\n# 在推理模式下运行流水线（生成随机噪声并逐步去噪）；\n# `.images` 是 PIL 图像列表，这里取第一张\nimage = sde_ve().images[0]\n\n# 保存图像\nimage.save(\"sde_ve_generated_image.png\")\n```\n\n更多模型可以直接在 [Hub 上找到](https:\u002F\u002Fhuggingface.co\u002Fmodels?library=diffusers&pipeline_tag=unconditional-image-generation&sort=downloads&search=ncsnpp)。\n\n## JAX 版本\n请参阅 [此处](https:\u002F\u002Fgithub.com\u002Fyang-song\u002Fscore_sde) 的 JAX 实现，该版本额外支持使用预训练分类器进行类别条件生成，以及在中断后恢复评估过程。\n\n### JAX 与 PyTorch 的对比\n总体而言，PyTorch 
版本的内存消耗较低，但运行速度比 JAX 慢。以下是在 4 张 Nvidia Tesla V100 GPU（每张 32GB）硬件上训练 NCSN++ VE SDE 模型的基准测试结果：\n\n| 框架         | 每步耗时（秒） | 总内存占用（GB） |\n|:----:|:----:|:----:|\n| PyTorch      | 0.56           | 20.6            |\n| JAX (`n_jitted_steps=1`) | 0.30           | 29.7            |\n| JAX (`n_jitted_steps=5`) | 0.20           | 74.8            |\n\n## 如何运行代码\n\n### 依赖项\n运行以下命令以安装本代码所需的部分 Python 包：\n```sh\npip install -r requirements.txt\n```\n\n### 定量评估用统计文件\n我们提供了 CIFAR-10 数据集的统计文件。您可以下载 [`cifar10_stats.npz`](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F14UB27-Spi8VjZYKST3ZcT8YVhAluiFWI\u002Fview?usp=sharing)，并将其保存到 `assets\u002Fstats\u002F` 目录下。有关如何为新数据集计算此类统计文件，请参阅 [#5](https:\u002F\u002Fgithub.com\u002Fyang-song\u002Fscore_sde\u002Fpull\u002F5)。\n\n### 使用方法\n\n通过 `main.py` 训练和评估我们的模型。\n\n```sh\nmain.py:\n  --config: 训练配置。\n    (默认值：'None')\n  --eval_folder: 存储评估结果的文件夹名称\n    (默认值：'eval')\n  --mode: \u003Ctrain|eval>：运行模式：训练或评估\n  --workdir: 工作目录\n```\n\n* `config` 是配置文件的路径。我们提供的预设配置文件位于 `configs\u002F` 目录下。这些配置文件按照 [`ml_collections`](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fml_collections) 的格式编写，应该非常直观易懂。\n\n  **配置文件命名规范**：配置文件的路径由以下维度组合而成：\n  * 数据集：`cifar10`、`celeba`、`celebahq`、`celebahq_256`、`ffhq_256`、`celebahq`、`ffhq` 中的一个。\n  * 模型：`ncsn`、`ncsnv2`、`ncsnpp`、`ddpm`、`ddpmpp` 中的一个。\n  * 连续性：使用连续采样的时间步长进行训练。\n\n* `workdir` 是存储一次实验所有产物的路径，例如检查点、样本和评估结果。\n\n* `eval_folder` 是 `workdir` 中用于存储评估过程所有产物的子文件夹名称，例如用于防止中断的元检查点、图像样本以及定量结果的 NumPy 文件。\n\n* `mode` 可以是 `\"train\"` 或 `\"eval\"`。当设置为 `\"train\"` 时，将开始训练新模型；如果 `workdir\u002Fcheckpoints-meta` 中存在该模型的元检查点（用于在云环境中被抢占后恢复运行），则会从中断处继续训练。当设置为 `\"eval\"` 时，可以执行以下任意组合的操作：\n\n  * 在测试\u002F验证数据集上评估损失函数。\n  * 生成固定数量的样本，并计算其 Inception 分数、FID 或 KID。在评估之前，必须先下载或计算好统计文件，并将其存储在 `assets\u002Fstats` 目录中。\n  * 在训练或测试数据集上计算对数似然。\n\n这些功能可以通过配置文件进行配置，也可以更方便地通过 `ml_collections` 包的命令行支持来实现。例如，要生成样本并评估样本质量，可以使用 `--config.eval.enable_sampling` 标志；要计算对数似然，则使用 `--config.eval.enable_bpd` 标志，并通过 `--config.eval.dataset=train\u002Ftest` 
指定是在训练集还是测试集上计算似然。\n\n## 如何扩展代码\n* **新的随机微分方程 (SDE)**：继承 `sde_lib.SDE` 抽象类，并实现所有抽象方法。`discretize()` 方法是可选的，默认采用欧拉-丸山离散化。现有的采样方法和似然计算将自动适用于这个新的 SDE。\n* **新的预测器**：继承 `sampling.Predictor` 抽象类，实现 `update_fn` 抽象方法，并使用 `@register_predictor` 注册其名称。新预测器可以直接用于 `sampling.get_pc_sampler` 中的预测-校正采样，以及 `controllable_generation.py` 中的所有其他可控生成方法。\n* **新的校正器**：继承 `sampling.Corrector` 抽象类，实现 `update_fn` 抽象方法，并使用 `@register_corrector` 注册其名称。新校正器可以直接用于 `sampling.get_pc_sampler`，以及 `controllable_generation.py` 中的所有其他可控生成方法。\n\n## 预训练检查点\n所有检查点均在此 [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1tFmF_uh57O6lx9ggtZT_5LdonVK2cV-e?usp=sharing) 中提供。\n\n**说明**：对于某些模型，您可能会找到两个检查点。第一个检查点（编号较小）是我们论文表3中报告FID分数的版本（也对应下表中的FID和IS列）。第二个检查点（编号较大）是我们论文表2中报告黑箱ODE采样器的似然值和FID的版本（也对应下表中的FID(ODE)和NNL (bits\u002Fdim)列）。前者对应于训练过程中每5万次迭代时取得的最小FID，而后者则是训练结束时的最后一个检查点。\n\n根据Google的政策，我们无法发布原始的CelebA和CelebA-HQ检查点。不过，我已利用个人资源在FFHQ 1024px、FFHQ 256px以及CelebA-HQ 256px数据集上重新训练了模型，其性能与我们的内部检查点相当。\n\n以下是检查点及其在论文中报告结果的详细列表。**FID (ODE)** 对应于应用于概率流ODE的黑箱ODE求解器的样本质量。\n\n| 检查点路径 | FID | IS | FID (ODE) | NNL (bits\u002Fdim) |\n|:----------|:-------:|:----------:|:----------:|:----------:|\n| [`ve\u002Fcifar10_ncsnpp\u002F`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1sP4GwvrYiI-sDPTp7sKYzsxJLGVamVMZ?usp=sharing) |  2.45 | 9.73 | - | - |\n| [`ve\u002Fcifar10_ncsnpp_continuous\u002F`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1b0gy_LLgO_DaQBgoWXwlVnL_rcAUgREh?usp=sharing) | 2.38 | 9.83 | - | - |\n| [`ve\u002Fcifar10_ncsnpp_deep_continuous\u002F`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F11s6A_xM7qiztdj8AHQWqaIAUSC3I7uX2?usp=sharing) | **2.20** | **9.89** | - | - |\n| [`vp\u002Fcifar10_ddpm\u002F`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1zDKcy3xbsN3F4AfyB_DfY_1oho89iKcf?usp=sharing) | 3.24 | - | 3.37 | 3.28 |\n| 
[`vp\u002Fcifar10_ddpm_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1RHNxW1qY-mTr0JMAE5t4V181Hi_aVWXK?usp=sharing) | - | - | 3.69| 3.21 |\n| [`vp\u002Fcifar10_ddpmpp`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1zOVj03ZBcq339p5QEKJPh2bBrxR_HOCM?usp=sharing) | 2.78 | 9.64 | - | - |\n| [`vp\u002Fcifar10_ddpmpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1xYjVMx10N9ivQQBIsEoXEeu9nvSGTBrC?usp=sharing) | 2.55 | 9.58 | 3.93 | 3.16 |\n| [`vp\u002Fcifar10_ddpmpp_deep_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1ZMLBiu9j7-rpdTQu8M2LlHAEQq4xRYrj?usp=sharing) | 2.41 | 9.68 | 3.08 | 3.13 |\n| [`subvp\u002Fcifar10_ddpm_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1DeebpmBkCxlZx89t3z45Te37T7BPOzd2?usp=sharing) | - | - | 3.56 | 3.05 |\n| [`subvp\u002Fcifar10_ddpmpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1bLgmnEAZnysRZfWt8qN3omGfijJ_B884?usp=sharing) | 2.61 | 9.56 | 3.16 | 3.02 |\n| [`subvp\u002Fcifar10_ddpmpp_deep_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F16QGkviGcizSbIPRk37-YksUhlNIna4Ys?usp=sharing) | 2.41 | 9.57 | **2.92** | **2.99** |\n\n| 检查点路径 | 样本 |\n|:-----|:------:|\n| [`ve\u002Fbedroom_ncsnpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F18GmxDvfGR8se9uFucc9uweeVrX_GzuUG?usp=sharing) | ![bedroom_samples](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_772513bfb9ae.jpeg) |\n| [`ve\u002Fchurch_ncsnpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1zVChA0HrnJU66Jkt4P6KOnlREhBMc4Yh?usp=sharing) | ![church_samples](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_ecc24b3b908b.jpeg) |\n| 
[`ve\u002Fffhq_1024_ncsnpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1ZqLNr_kH0o9DxvwSlrQPMmkrhEnXhBm2?usp=sharing) |![ffhq_1024](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_c4f07a106670.jpeg)|\n| [`ve\u002Fffhq_256_ncsnpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1KG72ZKUCUa8dDcA03hOf1BsnK8kBcdPD?usp=sharing) |![ffhq_256_samples](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_caa2de1b67b1.jpg)|\n| [`ve\u002Fcelebahq_256_ncsnpp_continuous`](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F19VJ7UZTE-ytGX6z5rl-tumW9c0Ps3itk?usp=sharing) |![celebahq_256_samples](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_readme_358ef1dc50a7.jpg)|\n\n\n## 演示与教程\n| 链接 | 描述|\n|:----:|:-----|\n|[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1dRR_0gNRmfLtPavX2APzUggBuXyjWW55?usp=sharing)  | 加载我们的预训练检查点，体验采样、似然计算和可控生成（JAX + FLAX）|\n|[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F17lTrPLTt_0EDXa4hkbHmbAFQEkpRDZnh?usp=sharing) | 加载我们的预训练检查点，体验采样、似然计算和可控生成（PyTorch）|\n|[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1SeXMpILhkJPjXUaesvzEhc3Ke6Zl_zxJ?usp=sharing) | JAX + FLAX中的基于分数的生成模型教程 |\n|[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F120kYYBOVa1i0TD85RjlEkFjaWDxSFUx3?usp=sharing)| PyTorch中的基于分数的生成模型教程 |\n\n\n## 小贴士\n* 
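When using the JAX codebase, you can jit-compile several training steps together to speed up training, at the cost of higher memory usage. This is controlled by `config.training.n_jitted_steps`. For CIFAR-10, we recommend `config.training.n_jitted_steps=5` if your GPU\u002FTPU has enough memory, and `config.training.n_jitted_steps=1` otherwise. The current implementation requires `config.training.log_freq` to be divisible by `n_jitted_steps` for logging and checkpointing to work correctly.\n* The `snr` (signal-to-noise ratio) parameter of `LangevinCorrector` behaves somewhat like a temperature parameter. A larger `snr` typically yields smoother samples, while a smaller `snr` yields more diverse but lower-quality samples. Typical values of `snr` lie in `0.05 - 0.2`, and tuning is needed to find the best trade-off.\n* For VE SDEs, we recommend setting `config.model.sigma_max` to the maximum pairwise distance between samples in the training dataset.\n\n## References\n\nIf you find the code useful for your research, please consider citing:\n```bibtex\n@inproceedings{\n  song2021scorebased,\n  title={Score-Based Generative Modeling through Stochastic Differential Equations},\n  author={Yang Song and Jascha Sohl-Dickstein and Diederik P Kingma and Abhishek Kumar and Stefano Ermon and Ben Poole},\n  booktitle={International Conference on Learning Representations},\n  year={2021},\n  url={https:\u002F\u002Fopenreview.net\u002Fforum?id=PxTIG12RRHS}\n}\n```\n\nThis work builds on several earlier papers that may also interest you:\n\n* Yang Song and Stefano Ermon. “Generative Modeling by Estimating Gradients of the Data Distribution.” *Proceedings of the 33rd Annual Conference on Neural Information Processing Systems*. 2019.\n* Yang Song and Stefano Ermon. “Improved Techniques for Training Score-Based Generative Models.” *Proceedings of the 34th Annual Conference on Neural Information Processing Systems*. 2020.\n* Jonathan Ho, Ajay Jain, and Pieter Abbeel. “Denoising Diffusion Probabilistic Models.” *Proceedings of the 34th Annual Conference on Neural Information Processing Systems*. 2020.","# score_sde_pytorch Quick Start Guide\n\n`score_sde_pytorch` is a PyTorch implementation of score-based generative models built on stochastic differential equations (SDEs). It supports training and evaluating several model families (such as NCSN++ and DDPM++) and provides high-quality image generation, likelihood computation, and conditional generation.\n\n## Prerequisites\n\nBefore you start, make sure your environment meets the following requirements:\n\n*   **Operating system**: Linux (recommended) or macOS\n*   **Python**: 3.7 or later\n*   **GPU**: An NVIDIA GPU with the matching CUDA driver is recommended. CPU execution is supported, but training and sampling are significantly slower.\n*   **Dependencies**:\n    *   PyTorch\n    *   torchvision\n    *   ml_collections\n    *   Other scientific Python libraries (numpy, scipy, PIL, etc.)\n\n> **Note**: The repository does not ship any package-mirror configuration. Users in mainland China may want to install Python dependencies through the Tsinghua or Alibaba mirror.\n\n## Installation\n\n### 1. Clone the repository\nFirst, clone the source code from GitHub:\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fyang-song\u002Fscore_sde_pytorch.git\ncd score_sde_pytorch\n```\n\n### 2. 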
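Install dependencies\nInstall the required Python packages with `pip`. The command below uses the Tsinghua mirror, which speeds up downloads in mainland China; drop the `-i` option elsewhere:\n\n```bash\npip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\nIf `requirements.txt` does not include PyTorch, install it separately to match your setup (the example below targets CUDA 11.8):\n```bash\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n```\n\n### 3. Download the statistics files (optional, for quantitative evaluation)\nTo compute metrics such as FID or Inception Score, download the precomputed statistics files. For CIFAR-10:\n\n```bash\nmkdir -p assets\u002Fstats\n# Manually download cifar10_stats.npz and place it in assets\u002Fstats\u002F\n# Download link: https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F14UB27-Spi8VjZYKST3ZcT8YVhAluiFWI\u002Fview\n```\n\n## Basic usage\n\nTraining and evaluation both go through the `main.py` script. Pick a configuration file from the `configs\u002F` directory before you start.\n\n### 1. Train a model\nChoose a configuration file and start training. For example, with the CIFAR-10 NCSN++ continuous-time configuration:\n\n```bash\npython main.py --config=configs\u002Fve\u002Fcifar10_ncsnpp_continuous.py --workdir=.\u002Fexperiments\u002Fcifar10_test --mode=train\n```\n\n*   `--config`: Path to the configuration file.\n*   `--workdir`: Working directory that stores checkpoints, samples, and logs.\n*   `--mode=train`: Runs training. If the working directory already contains a checkpoint, training resumes from it automatically.\n\n### 2. Evaluate and sample\nAfter training (or with a pretrained model), switch to evaluation mode to generate images or compute metrics:\n\n```bash\npython main.py --config=configs\u002Fve\u002Fcifar10_ncsnpp_continuous.py --workdir=.\u002Fexperiments\u002Fcifar10_test --mode=eval --eval_folder=eval_results\n```\n\nCommand-line overrides (a feature of `ml_collections` config flags) select the evaluation task:\n*   **Generate samples and compute FID\u002FIS**:\n    ```bash\n    python main.py ... --mode=eval --config.eval.enable_sampling=True\n    ```\n*   **Compute the log-likelihood (bits\u002Fdim)**:\n    ```bash\n    python main.py ... --mode=eval --config.eval.enable_bpd=True --config.eval.dataset=test\n    ```\n\n### 3. 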
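Quick trial (via 🤗 Diffusers)\nTo test a pretrained model without setting up the full training environment, use the Hugging Face `diffusers` library, which generates an image in a few lines of code:\n\n**Install diffusers:**\n```bash\npip install diffusers torch accelerate -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n**Run the example:**\n```python\nfrom diffusers import DiffusionPipeline\n\n# Load a pretrained model (here FFHQ at 1024px)\nmodel_id = \"google\u002Fncsnpp-ffhq-1024\"\nsde_ve = DiffusionPipeline.from_pretrained(model_id)\n\n# Generate an image\nimage = sde_ve().images[0]\n\n# Save the image\nimage.save(\"sde_ve_generated_image.png\")\n```\n\nMore pretrained models are available on the [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Fmodels?library=diffusers&pipeline_tag=unconditional-image-generation&sort=downloads&search=ncsnpp).","A designer at a digital art studio needs to quickly produce a large number of high-resolution, stylistically consistent, copyright-safe background assets for a game project, and also wants to repair locally damaged regions or colorize images.\n\n### Without score_sde_pytorch\n- **Limited generation quality**: Traditional GAN models tend to produce artifacts or suffer mode collapse when generating 1024px images, which makes commercial-grade quality hard to reach.\n- **Hard to extend**: Conditional tasks such as inpainting or colorizing grayscale images often require retraining a dedicated model, with development cycles of several weeks.\n- **Complex theoretical validation**: Without a unified stochastic differential equation (SDE) framework, developers cannot easily compute exact likelihoods or manipulate latent codes, so tuning becomes black-box trial and error.\n- **High reproduction cost**: Community code for NCSN, DDPM, and related algorithms is scattered and exposes inconsistent interfaces, so integrating and comparing architectures takes substantial engineering effort.\n\n### With score_sde_pytorch\n- **Markedly better image quality**: Reverse-time SDE sampling reaches an FID of 2.20 on CIFAR-10 and stably generates high-fidelity 1024px faces and scenes at CelebA-HQ quality.\n- **Flexible task switching**: Within the unified framework, adjusting the sampling procedure enables class-conditional generation, inpainting, and colorization without retraining.\n- **Stronger controllability**: Exact likelihood computation and latent-space manipulation let designers quantify generation quality and make finer, targeted edits to image details.\n- **Faster development**: The modular codebase integrates mainstream models such as NCSN++ and DDPM++, and pretrained models can be called in a few lines of code through Hugging Face Diffusers.\n\nThrough a unified SDE perspective, score_sde_pytorch turns high-quality image generation from a complex research experiment into efficient, controllable, and versatile engineering practice.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fyang-song_score_sde_pytorch_c4f07a10.jpg","yang-song","Yang Song","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fyang-song_cc0a3772.jpg","Research Scientist at OpenAI.","OpenAI","Palo Alto, CA",null,"https:\u002F\u002Fyang-song.net","https:\u002F\u002Fgithub.com\u002Fyang-song",[83,87,91,95],{"name":84,"color":85,"percentage":86},"Jupyter 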
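Notebook","#DA5B0B",93.5,{"name":88,"color":89,"percentage":90},"Python","#3572A5",6.1,{"name":92,"color":93,"percentage":94},"Cuda","#3A4E3A",0.4,{"name":96,"color":97,"percentage":98},"C++","#f34b7d",0,2094,353,"2026-04-04T13:52:45","Apache-2.0","Not specified","Requires an NVIDIA GPU. The benchmarks use 4x Nvidia Tesla V100 (32GB). The PyTorch version uses less memory than the JAX version (about 20.6GB for training NCSN++), but the actual minimum depends on the model and the dataset resolution (for example, 1024px images need more memory).",{"notes":106,"python":103,"dependencies":107},"1. Dependencies are installed from `requirements.txt`; the README does not list specific versions.\n2. Quantitative evaluation (such as FID) requires downloading statistics files (such as `cifar10_stats.npz`) into `assets\u002Fstats\u002F` beforehand.\n3. Pretrained weights are hosted on Google Drive and must be downloaded manually. Some original CelebA checkpoints were not released for policy reasons, but retrained FFHQ and CelebA-HQ replacements are provided.\n4. Training and evaluation run through `main.py`, with configuration files in the `configs\u002F` directory.\n5. Integration with the Hugging Face 🤗 Diffusers library is available and simplifies inference.",[108,109,110,111],"torch","ml_collections","diffusers (optional, for integration)","accelerate (optional, used with diffusers)",[15,14],[114,115,116,117,118,119,120,121,122],"pytorch","stochastic-differential-equations","inverse-problems","generative-models","score-matching","score-based-generative-modeling","controllable-generation","iclr-2021","diffusion-models","2026-03-27T02:49:30.150509","2026-04-08T13:03:35.007254",[126,131,136,141,146,151,156],{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},24299,"How can I train on a GPU with limited memory (for example 16GB) or resolve out-of-memory errors?","If you hit a CUDA out-of-memory (OOM) error, try lowering the image resolution (for example from 128 to 64). Although the JAX version samples faster than PyTorch, adjusting image_size in the config is the most direct fix when memory is the constraint. Note that a lower resolution may affect the final FID score.","https:\u002F\u002Fgithub.com\u002Fyang-song\u002Fscore_sde_pytorch\u002Fissues\u002F1",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},24300,"How do I run the code on a single GPU and select a specific GPU device?","Do not set `CUDA_VISIBLE_DEVICES` inside main.py or hard-code `cuda:1` in the code; either can cause cuDNN mapping errors. Instead:\n1. Export the environment variable from a shell script or the command line: `export CUDA_VISIBLE_DEVICES=1` (replace 1 with the index of the GPU you want).\n2. 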
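In the config file or code, always set the device to `torch.device('cuda:0')`, because once `CUDA_VISIBLE_DEVICES` is set, the first GPU visible to PyTorch is always index 0.","https:\u002F\u002Fgithub.com\u002Fyang-song\u002Fscore_sde_pytorch\u002Fissues\u002F27",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},24301,"Does training on multiple GPUs change model performance? How can I speed up training?","Training on multiple GPUs does not affect the final model performance. The original config files are already designed for multi-GPU training. For example, on 4 V100 GPUs with a batch size of 128, training takes about 3 days. If single-GPU training is too slow, use multiple GPUs in parallel to iterate faster.","https:\u002F\u002Fgithub.com\u002Fyang-song\u002Fscore_sde_pytorch\u002Fissues\u002F4",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},24302,"Should the score function of discrete models apply torch.round()?","Yes. For discrete models it is best to apply `torch.round(labels)` after obtaining the labels (for example after line 155 of models\u002Futils.py), so that the time steps used at sampling match those used in training (integers rather than floats). The maintainer notes, however, that omitting the rounding also has a benefit: it lets the continuous SDE framework work with models pretrained with discrete losses (such as DDPM and NCSN), so that quantities like the log-likelihood can be computed. If cross-framework compatibility is not needed, adding the rounding is the more consistent choice.","https:\u002F\u002Fgithub.com\u002Fyang-song\u002Fscore_sde_pytorch\u002Fissues\u002F2",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},24303,"Where can I find code examples for manipulating latent representations (such as interpolation and temperature scaling)?","See the official Colab notebook for examples of manipulating latent representations, including interpolation and temperature adjustment: [Latent-space manipulation example](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F17lTrPLTt_0EDXa4hkbHmbAFQEkpRDZnh#scrollTo=oe3rLGRm28nc&line=1&uniqifier=1).\nNote: for large images, VP or subVP SDEs usually perform better than VE SDEs for latent-space manipulation.","https:\u002F\u002Fgithub.com\u002Fyang-song\u002Fscore_sde_pytorch\u002Fissues\u002F5",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},24304,"Why does PC sampling with subVPSDE raise 'AttributeError: alphas'?","This is a limitation of subVPSDE. According to the paper, only the VE and VP SDEs have elementwise drift coefficients and diagonal diffusion coefficients, and therefore support certain sampling attributes (such as alphas). subVPSDE does not fully provide these attributes, so using it directly in the \"PC sampling\", \"PC inpainting\", or \"PC colorizer\" sections of the demo notebook raises an error. Use a VE or VP SDE for these tasks.","https:\u002F\u002Fgithub.com\u002Fyang-song\u002Fscore_sde_pytorch\u002Fissues\u002F8",{"id":157,"question_zh":158,"answer_zh":159,"source_url":130},24305,"Sampling for FID evaluation is too slow. Is there a way to speed it up?","Slow sampling is inherent to diffusion models. If the PyTorch implementation is too slow, try the JAX version, which usually samples faster. You can also compute FID on a small number of samples (for example 1000) to save time, but the maintainer cautions that the resulting score deviates substantially from one evaluated on the standard 50k samples and should be treated as a rough reference only.",[]]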