[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-NVlabs--NVAE":3,"tool-NVlabs--NVAE":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 
是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":10,"env_os":95,"env_gpu":96,"env_ram":97,"env_deps":98,"category_tags":107,"github_topics":79,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":108,"updated_at":109,"faqs":110,"releases":149},2953,"NVlabs\u002FNVAE","NVAE","The Official PyTorch Implementation of \"NVAE: A Deep Hierarchical Variational Autoencoder\" (NeurIPS 2020 spotlight paper)","NVAE 是一个基于 PyTorch 实现的深度分层变分自编码器，源自 NeurIPS 2020 的焦点论文。它主要用于构建高性能的生成模型，能够在 MNIST、CIFAR-10、CelebA 及 ImageNet 等多个图像数据集上训练出具有业界领先（SOTA）似然估计能力的模型。\n\n传统变分自编码器在处理高分辨率图像时，往往难以平衡生成质量与概率建模的准确性。NVAE 通过引入深层分层结构，有效解决了这一难题，显著提升了模型捕捉复杂数据分布的能力，从而生成更清晰、细节更丰富的图像。\n\n这款工具特别适合人工智能研究人员、深度学习开发者以及计算机视觉领域的工程师使用。如果你正在探索生成式 AI 的前沿技术，或需要复现高质量的图像生成实验，NVAE 提供了完整的训练脚本和数据预处理流程。其独特的技术亮点在于“深度分层”架构，通过在潜在空间的不同层级进行建模，极大地增强了模型对图像全局结构与局部纹理的理解力。虽然配置过程涉及 LMDB 数据集转换等技术细节，但其开源代码为学术研究和工程落地提供了坚实的基础。","# The Official PyTorch Implementation of \"NVAE: A Deep Hierarchical Variational Autoencoder\" [(NeurIPS 2020 Spotlight 
Paper)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.03898)\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"http:\u002F\u002Flatentspace.cc\u002Farash_vahdat\u002F\" target=\"_blank\">Arash&nbsp;Vahdat\u003C\u002Fa> &emsp; \u003Cb>&middot;\u003C\u002Fb> &emsp;\n  \u003Ca href=\"http:\u002F\u002Fjankautz.com\u002F\" target=\"_blank\">Jan&nbsp;Kautz\u003C\u002Fa> \n\u003C\u002Fdiv>\n\u003Cbr>\n\u003Cbr>\n\n[NVAE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.03898) is a deep hierarchical variational autoencoder that enables training SOTA \nlikelihood-based generative models on several image datasets.\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVlabs_NVAE_readme_c0ea1efb82c4.png\" width=\"800\">\n\u003C\u002Fp>\n\n## Requirements\nNVAE is built in Python 3.7 using PyTorch 1.6.0. Use the following command to install the requirements:\n```\npip install -r requirements.txt\n``` \n\n## Set up file paths and data\nWe have examined NVAE on several datasets. For large datasets, we store the data in LMDB datasets\nfor I\u002FO efficiency. Click below on each dataset to see how you can prepare your data. Below, `$DATA_DIR` indicates\nthe path to a data directory that will contain all the datasets and `$CODE_DIR` refers to the code directory:\n\n\u003Cdetails>\u003Csummary>MNIST and CIFAR-10\u003C\u002Fsummary>\n\nThese datasets will be downloaded automatically, when you run the main training for NVAE using `train.py`\nfor the first time. 
You can use `--data=$DATA_DIR\u002Fmnist` or `--data=$DATA_DIR\u002Fcifar10`, so that the datasets\nare downloaded to the corresponding directories.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>CelebA 64\u003C\u002Fsummary>\nRun the following commands to download the CelebA images and store them in an LMDB dataset:\n\n```shell script\ncd $CODE_DIR\u002Fscripts\npython create_celeba64_lmdb.py --split train --img_path $DATA_DIR\u002Fceleba_org --lmdb_path $DATA_DIR\u002Fceleba64_lmdb\npython create_celeba64_lmdb.py --split valid --img_path $DATA_DIR\u002Fceleba_org --lmdb_path $DATA_DIR\u002Fceleba64_lmdb\npython create_celeba64_lmdb.py --split test  --img_path $DATA_DIR\u002Fceleba_org --lmdb_path $DATA_DIR\u002Fceleba64_lmdb\n```\nAbove, the images are downloaded to `$DATA_DIR\u002Fceleba_org` automatically, and then the LMDB datasets are created\nat `$DATA_DIR\u002Fceleba64_lmdb`.\n\u003C\u002Fdetails>\n \n\u003Cdetails>\u003Csummary>ImageNet 32x32\u003C\u002Fsummary>\n\nRun the following commands to download tfrecord files from [GLOW](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fglow) and to convert them\nto LMDB datasets:\n```shell script\nmkdir -p $DATA_DIR\u002Fimagenet-oord\ncd $DATA_DIR\u002Fimagenet-oord\nwget https:\u002F\u002Fstorage.googleapis.com\u002Fglow-demo\u002Fdata\u002Fimagenet-oord-tfr.tar\ntar -xvf imagenet-oord-tfr.tar\ncd $CODE_DIR\u002Fscripts\npython convert_tfrecord_to_lmdb.py --dataset=imagenet-oord_32 --tfr_path=$DATA_DIR\u002Fimagenet-oord\u002Fmnt\u002Fhost\u002Fimagenet-oord-tfr --lmdb_path=$DATA_DIR\u002Fimagenet-oord\u002Fimagenet-oord-lmdb_32 --split=train\npython convert_tfrecord_to_lmdb.py --dataset=imagenet-oord_32 --tfr_path=$DATA_DIR\u002Fimagenet-oord\u002Fmnt\u002Fhost\u002Fimagenet-oord-tfr --lmdb_path=$DATA_DIR\u002Fimagenet-oord\u002Fimagenet-oord-lmdb_32 --split=validation\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>CelebA HQ 256\u003C\u002Fsummary>\n\nRun the following commands to download 
tfrecord files from [GLOW](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fglow) and to convert them\nto LMDB datasets\n```shell script\nmkdir -p $DATA_DIR\u002Fceleba\ncd $DATA_DIR\u002Fceleba\nwget https:\u002F\u002Fstorage.googleapis.com\u002Fglow-demo\u002Fdata\u002Fceleba-tfr.tar\ntar -xvf celeba-tfr.tar\ncd $CODE_DIR\u002Fscripts\npython convert_tfrecord_to_lmdb.py --dataset=celeba --tfr_path=$DATA_DIR\u002Fceleba\u002Fceleba-tfr --lmdb_path=$DATA_DIR\u002Fceleba\u002Fceleba-lmdb --split=train\npython convert_tfrecord_to_lmdb.py --dataset=celeba --tfr_path=$DATA_DIR\u002Fceleba\u002Fceleba-tfr --lmdb_path=$DATA_DIR\u002Fceleba\u002Fceleba-lmdb --split=validation\n```\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\u003Csummary>FFHQ 256\u003C\u002Fsummary>\n\nVisit [this Google drive location](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1WocxvZ4GEZ1DI8dOz30aSj2zT6pkATYS) and download\n`images1024x1024.zip`. Run the following commands to unzip the images and to store them in LMDB datasets:\n```shell script\nmkdir -p $DATA_DIR\u002Fffhq\nunzip images1024x1024.zip -d $DATA_DIR\u002Fffhq\u002F\ncd $CODE_DIR\u002Fscripts\npython create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR\u002Fffhq\u002Fimages1024x1024\u002F --ffhq_lmdb_path=$DATA_DIR\u002Fffhq\u002Fffhq-lmdb --split=train\npython create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR\u002Fffhq\u002Fimages1024x1024\u002F --ffhq_lmdb_path=$DATA_DIR\u002Fffhq\u002Fffhq-lmdb --split=validation\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>LSUN\u003C\u002Fsummary>\n\nWe use LSUN datasets in our follow-up works. Visit [LSUN](https:\u002F\u002Fwww.yf.io\u002Fp\u002Flsun) for \ninstructions on how to download this dataset. 
Since the LSUN scene datasets come in the\nLMDB format, they are ready to be loaded using torchvision data loaders.\n\n\u003C\u002Fdetails>\n\n\n## Running the main NVAE training and evaluation scripts\nWe use the following commands to train NVAEs on each dataset for \nTable 1 in the [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.03898.pdf). Normalizing flows are enabled\nfor all datasets except MNIST. Check Table 6 in the paper for more information on training\ndetails. Note that for multinode training (experiments with more than 8 GPUs), we use the `mpirun` \ncommand to run the training scripts on multiple nodes. Please adjust the commands below according to your setup. \nBelow, `IP_ADDR` is the IP address of the machine that will host the process with rank 0 \n(see [here](https:\u002F\u002Fpytorch.org\u002Ftutorials\u002Fintermediate\u002Fdist_tuto.html#initialization-methods)). \n`NODE_RANK` is the index of each node among all the nodes that are running the job.\n\n\u003Cdetails>\u003Csummary>MNIST\u003C\u002Fsummary>\n\nTwo 16-GB V100 GPUs are used for training NVAE on dynamically binarized MNIST. 
Training takes about 21 hours.\n\n```shell script\nexport EXPR_ID=UNIQUE_EXPR_ID\nexport DATA_DIR=PATH_TO_DATA_DIR\nexport CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR\nexport CODE_DIR=PATH_TO_CODE_DIR\ncd $CODE_DIR\npython train.py --data $DATA_DIR\u002Fmnist --root $CHECKPOINT_DIR --save $EXPR_ID --dataset mnist --batch_size 200 \\\n        --epochs 400 --num_latent_scales 2 --num_groups_per_scale 10 --num_postprocess_cells 3 --num_preprocess_cells 3 \\\n        --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 --num_latent_per_group 20 --num_preprocess_blocks 2 \\\n        --num_postprocess_blocks 2 --weight_decay_norm 1e-2 --num_channels_enc 32 --num_channels_dec 32 --num_nf 0 \\\n        --ada_groups --num_process_per_node 2 --use_se --res_dist --fast_adamax \n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>CIFAR-10\u003C\u002Fsummary>\n\nEight 16-GB V100 GPUs are used for training NVAE on CIFAR-10. Training takes about 55 hours.\n\n```shell script\nexport EXPR_ID=UNIQUE_EXPR_ID\nexport DATA_DIR=PATH_TO_DATA_DIR\nexport CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR\nexport CODE_DIR=PATH_TO_CODE_DIR\ncd $CODE_DIR\npython train.py --data $DATA_DIR\u002Fcifar10 --root $CHECKPOINT_DIR --save $EXPR_ID --dataset cifar10 \\\n        --num_channels_enc 128 --num_channels_dec 128 --epochs 400 --num_postprocess_cells 2 --num_preprocess_cells 2 \\\n        --num_latent_scales 1 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \\\n        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --num_groups_per_scale 30 --batch_size 32 \\\n        --weight_decay_norm 1e-2 --num_nf 1 --num_process_per_node 8 --use_se --res_dist --fast_adamax \n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>CelebA 64\u003C\u002Fsummary>\n\nEight 16-GB V100 GPUs are used for training NVAE on CelebA 64. 
Training takes about 92 hours.\n\n```shell script\nexport EXPR_ID=UNIQUE_EXPR_ID\nexport DATA_DIR=PATH_TO_DATA_DIR\nexport CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR\nexport CODE_DIR=PATH_TO_CODE_DIR\ncd $CODE_DIR\npython train.py --data $DATA_DIR\u002Fceleba64_lmdb --root $CHECKPOINT_DIR --save $EXPR_ID --dataset celeba_64 \\\n        --num_channels_enc 64 --num_channels_dec 64 --epochs 90 --num_postprocess_cells 2 --num_preprocess_cells 2 \\\n        --num_latent_scales 3 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \\\n        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --weight_decay_norm 1e-1 --num_groups_per_scale 20 \\\n        --batch_size 16 --num_nf 1 --ada_groups --num_process_per_node 8 --use_se --res_dist --fast_adamax\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>ImageNet 32x32\u003C\u002Fsummary>\n\n24 16-GB V100 GPUs are used for training NVAE on ImageNet 32x32. Training takes about 70 hours.\n\n```shell script\nexport EXPR_ID=UNIQUE_EXPR_ID\nexport DATA_DIR=PATH_TO_DATA_DIR\nexport CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR\nexport CODE_DIR=PATH_TO_CODE_DIR\nexport IP_ADDR=IP_ADDRESS\nexport NODE_RANK=NODE_RANK_BETWEEN_0_TO_2\ncd $CODE_DIR\nmpirun --allow-run-as-root -np 3 -npernode 1 bash -c \\\n        'python train.py --data $DATA_DIR\u002Fimagenet-oord\u002Fimagenet-oord-lmdb_32 --root $CHECKPOINT_DIR --save $EXPR_ID --dataset imagenet_32 \\\n        --num_channels_enc 192 --num_channels_dec 192 --epochs 45 --num_postprocess_cells 2 --num_preprocess_cells 2 \\\n        --num_latent_scales 1 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \\\n        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --num_groups_per_scale 28 \\\n        --batch_size 24 --num_nf 1 --warmup_epochs 1 \\\n        --weight_decay_norm 1e-2 --weight_decay_norm_anneal --weight_decay_norm_init 1e0 \\\n        --num_process_per_node 8 --use_se --res_dist \\\n        --fast_adamax --node_rank 
$NODE_RANK --num_proc_node 3 --master_address $IP_ADDR '\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>CelebA HQ 256\u003C\u002Fsummary>\n\n24 32-GB V100 GPUs are used for training NVAE on CelebA HQ 256. Training takes about 94 hours.\n\n```shell script\nexport EXPR_ID=UNIQUE_EXPR_ID\nexport DATA_DIR=PATH_TO_DATA_DIR\nexport CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR\nexport CODE_DIR=PATH_TO_CODE_DIR\nexport IP_ADDR=IP_ADDRESS\nexport NODE_RANK=NODE_RANK_BETWEEN_0_TO_2\ncd $CODE_DIR\nmpirun --allow-run-as-root -np 3 -npernode 1 bash -c \\\n        'python train.py --data $DATA_DIR\u002Fceleba\u002Fceleba-lmdb --root $CHECKPOINT_DIR --save $EXPR_ID --dataset celeba_256 \\\n        --num_channels_enc 30 --num_channels_dec 30 --epochs 300 --num_postprocess_cells 2 --num_preprocess_cells 2 \\\n        --num_latent_scales 5 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \\\n        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --weight_decay_norm 1e-2 --num_groups_per_scale 16 \\\n        --batch_size 4 --num_nf 2 --ada_groups --min_groups_per_scale 4 \\\n        --weight_decay_norm_anneal --weight_decay_norm_init 1. --num_process_per_node 8 --use_se --res_dist \\\n        --fast_adamax --num_x_bits 5 --node_rank $NODE_RANK --num_proc_node 3 --master_address $IP_ADDR '\n```\n\nIn our early experiments, a smaller model with 24 channels instead of 30, could be trained on only 8 GPUs in \nthe same time (with the batch size of 6). The smaller models obtain only 0.01 bpd higher \nnegative log-likelihood.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>FFHQ 256\u003C\u002Fsummary>\n\n24 32-GB V100 GPUs are used for training NVAE on FFHQ 256. Training takes about 160 hours. 
\n\n```shell script\nexport EXPR_ID=UNIQUE_EXPR_ID\nexport DATA_DIR=PATH_TO_DATA_DIR\nexport CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR\nexport CODE_DIR=PATH_TO_CODE_DIR\nexport IP_ADDR=IP_ADDRESS\nexport NODE_RANK=NODE_RANK_BETWEEN_0_TO_2\ncd $CODE_DIR\nmpirun --allow-run-as-root -np 3 -npernode 1 bash -c \\\n        'python train.py --data $DATA_DIR\u002Fffhq\u002Fffhq-lmdb --root $CHECKPOINT_DIR --save $EXPR_ID --dataset ffhq \\\n        --num_channels_enc 30 --num_channels_dec 30 --epochs 200 --num_postprocess_cells 2 --num_preprocess_cells 2 \\\n        --num_latent_scales 5 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \\\n        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --weight_decay_norm 1e-1 --num_groups_per_scale 16 \\\n        --batch_size 4 --num_nf 2 --ada_groups --min_groups_per_scale 4 \\\n        --weight_decay_norm_anneal --weight_decay_norm_init 1. --num_process_per_node 8 --use_se --res_dist \\\n        --fast_adamax --num_x_bits 5 --learning_rate 8e-3 --node_rank $NODE_RANK --num_proc_node 3 --master_address $IP_ADDR '\n```\n\nIn our early experiments, a smaller model with 24 channels instead of 30 could be trained on only 8 GPUs in \nthe same time (with a batch size of 6). The smaller model obtains only 0.01 bpd higher \nnegative log-likelihood.\n\u003C\u002Fdetails>\n\n**If for any reason your training is stopped, use the exact same command with the addition of `--cont_training`\nto continue training from the last saved checkpoint. 
If you observe NaN, continuing the training using this flag\nusually will not fix the NaN issue.**\n\n## Known Issues\n\u003Cdetails>\u003Csummary>Cannot build CelebA 64 or training gives NaN right at the beginning on this dataset \u003C\u002Fsummary>\n\nSeveral users have reported issues building CelebA 64 or have encountered NaN at the beginning of training on this dataset.\nIf you face similar issues on this dataset, you can download this dataset manually and build LMDBs using the instructions\nin this issue: https:\u002F\u002Fgithub.com\u002FNVlabs\u002FNVAE\u002Fissues\u002F2 .\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>Getting NaN after a few epochs of training \u003C\u002Fsummary>\n\nOne of the main challenges in training very deep hierarchical VAEs is the training instability that we discussed in the paper.\nWe have verified that the settings in the commands above can be trained in a stable way. If you modify the settings\nabove and you encounter NaN after a few epochs of training, you can use these tricks to stabilize your training:\ni) Increase the spectral regularization coefficient, `--weight_decay_norm`. ii) Use exponential decay on \n`--weight_decay_norm` using `--weight_decay_norm_anneal` and `--weight_decay_norm_init`. iii) Decrease the learning rate.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>Training freezes with no NaN \u003C\u002Fsummary>\n\nIn some very rare cases, we observed that training freezes after 2-3 days of training. We believe the root cause\nis a race condition in one of the low-level libraries. 
If for any reason the training \nis stopped, kill your current run, and use the exact same command with the addition of `--cont_training`\nto continue training from the last saved checkpoint.\n\u003C\u002Fdetails>\n\n## Monitoring the training progress\nWhile running any of the commands above, you can monitor the training progress using TensorBoard:\n\n\u003Cdetails>\u003Csummary>Click here\u003C\u002Fsummary>\n\n```shell script\ntensorboard --logdir $CHECKPOINT_DIR\u002Feval-$EXPR_ID\u002F\n```\nAbove, `$CHECKPOINT_DIR` and `$EXPR_ID` are the same variables used for running the main training script.\n\n\u003C\u002Fdetails> \n\n## Post-training sampling, evaluation, and checkpoints\n\n\u003Cdetails>\u003Csummary>Evaluating Log-Likelihood\u003C\u002Fsummary>\n\nYou can use the following command to load a trained model and evaluate it on the test dataset:\n\n```shell script\ncd $CODE_DIR\npython evaluate.py --checkpoint $CHECKPOINT_DIR\u002Feval-$EXPR_ID\u002Fcheckpoint.pt --data $DATA_DIR\u002Fmnist --eval_mode=evaluate --num_iw_samples=1000\n```\nAbove, `--num_iw_samples` indicates the number of importance-weighted samples used in evaluation. \n`$CHECKPOINT_DIR` and `$EXPR_ID` are the same variables used for running the main training script.\nSet `--data` to the same argument that was used when training NVAE (our example is for MNIST).\n\n\u003C\u002Fdetails> \n\n\u003Cdetails>\u003Csummary>Sampling\u003C\u002Fsummary>\n\nYou can also use the following command to generate samples from a trained model:\n\n```shell script\ncd $CODE_DIR\npython evaluate.py --checkpoint $CHECKPOINT_DIR\u002Feval-$EXPR_ID\u002Fcheckpoint.pt --eval_mode=sample --temp=0.6 --readjust_bn\n```\nwhere `--temp` sets the temperature used for sampling and `--readjust_bn` enables readjustment of the BN statistics\nas described in the paper. 
If you remove `--readjust_bn`, the sampling will proceed with the BN layers in eval mode \n(i.e., BN layers will use the running means and variances extracted during training).\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>Computing FID\u003C\u002Fsummary>\n\nYou can compute the FID score using 50K samples. To do so, you will need to create\na mean and covariance statistics file on the training data using a command like:\n\n```shell script\ncd $CODE_DIR\npython scripts\u002Fprecompute_fid_statistics.py --data $DATA_DIR\u002Fcifar10 --dataset cifar10 --fid_dir \u002Ftmp\u002Ffid-stats\u002F\n```\nThe command above computes the reference statistics on the CIFAR-10 dataset and stores them in the `--fid_dir` directory.\nGiven the reference statistics file, we can run the following command to compute the FID score:\n\n```shell script\ncd $CODE_DIR\npython evaluate.py --checkpoint $CHECKPOINT_DIR\u002Feval-$EXPR_ID\u002Fcheckpoint.pt --data $DATA_DIR\u002Fcifar10 --eval_mode=evaluate_fid --fid_dir \u002Ftmp\u002Ffid-stats\u002F --temp=0.6 --readjust_bn\n```\nwhere `--temp` sets the temperature used for sampling and `--readjust_bn` enables readjustment of the BN statistics\nas described in the paper. If you remove `--readjust_bn`, the sampling will proceed with the BN layers in eval mode \n(i.e., BN layers will use the running means and variances extracted during training).\nAbove, `$CHECKPOINT_DIR` and `$EXPR_ID` are the same variables used for running the main training script.\nSet `--data` to the same argument that was used when training NVAE (our example is for CIFAR-10).\n\n\u003C\u002Fdetails> \n\n\u003Cdetails>\u003Csummary>Checkpoints\u003C\u002Fsummary> \n\nWe provide checkpoints on MNIST, CIFAR-10, CelebA 64, CelebA HQ 256, FFHQ in \n[this Google drive directory](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1KVpw12AzdVjvbfEYM_6_3sxTy93wWkbe?usp=sharing). 
\nFor CIFAR10, we provide two checkpoints as we observed that a multiscale NVAE provides better qualitative\nresults than a single scale model on this dataset. The multiscale model is only slightly worse in terms\nof log-likelihood (0.01 bpd). We also observe that one of our early models on CelebA HQ 256 with 0.01 bpd \nworse likelihood generates much better images in low temperature on this dataset.\n\nYou can use the commands above to evaluate or sample from these checkpoints.\n\n\u003C\u002Fdetails> \n\n## How to construct smaller NVAE models\nIn the commands above, we are constructing big NVAE models that require several days of training\nin most cases. If you'd like to construct smaller NVAEs, you can use these tricks:\n\n* Reduce the network width: `--num_channels_enc` and `--num_channels_dec` are controlling the number\nof initial channels in the bottom-up and top-down networks respectively. Recall that we halve the\nnumber of channels with every spatial downsampling layer in the bottom-up network, and we double the number of\nchannels with every upsampling layer in the top-down network. By reducing\n`--num_channels_enc` and `--num_channels_dec`, you can reduce the overall width of the networks.\n\n* Reduce the number of residual cells in the hierarchy: `--num_cell_per_cond_enc` and \n`--num_cell_per_cond_dec` control the number of residual cells used between every latent variable\ngroup in the bottom-up and top-down networks respectively. In most of our experiments, we are using\ntwo cells per group for both networks. You can reduce the number of residual cells to one to make the model\nsmaller.\n\n* Reduce the number of epochs: You can reduce the training time by reducing `--epochs`.\n\n* Reduce the number of groups: You can make NVAE smaller by using a smaller number of latent variable groups. \nWe use two schemes for setting the number of groups:\n    1. 
An equal number of groups: This is set by `--num_groups_per_scale` which indicates the number of groups \n    in each scale of latent variables. Reduce this number to have a small NVAE.\n    \n    2. An adaptive number of groups: This is enabled by `--ada_groups`. In this case, the highest\n    resolution of latent variables will have `--num_groups_per_scale` groups and \n    the smaller scales will get half the number of groups successively (see groups_per_scale in utils.py).\n    We don't let the number of groups go below `--min_groups_per_scale`. You can reduce\n    the total number of groups by reducing `--num_groups_per_scale` and `--min_groups_per_scale`\n    when `--ada_groups` is enabled.\n\n## Understanding the implementation\nIf you are modifying the code, you can use the following figure to map the code to the paper.\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVlabs_NVAE_readme_e5b0af377556.png\" width=\"900\">\n\u003C\u002Fp>\n\n\n## Traversing the latent space\nWe can generate images by traversing in the latent space of NVAE. This sequence is generated using our model\ntrained on CelebA HQ, by interpolating between samples generated with temperature 0.6. \nSome artifacts are due to color quantization in GIFs.\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVlabs_NVAE_readme_bd12cbc331c5.png\" width=\"512\">\n\u003C\u002Fp>\n\n## License\nPlease check the LICENSE file. NVAE may be used non-commercially, meaning for research or \nevaluation purposes only. For business inquiries, please contact \n[researchinquiries@nvidia.com](mailto:researchinquiries@nvidia.com).\n\nYou should take into consideration that VAEs are trained to mimic the training data distribution, and, any \nbias introduced in data collection will make VAEs generate samples with a similar bias. 
Additional bias could be \nintroduced during model design, training, or when VAEs are sampled using small temperatures. Bias correction in \ngenerative learning is an active area of research, and we recommend interested readers to check this area before \nbuilding applications using NVAE.\n\n## Bibtex:\nPlease cite our paper, if you happen to use this codebase:\n\n```\n@inproceedings{vahdat2020NVAE,\n  title={{NVAE}: A Deep Hierarchical Variational Autoencoder},\n  author={Vahdat, Arash and Kautz, Jan},\n  booktitle={Neural Information Processing Systems (NeurIPS)},\n  year={2020}\n}\n```\n","# “NVAE：一种深度层次化变分自编码器”的官方 PyTorch 实现 [(NeurIPS 2020 Spotlight 论文)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.03898)\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"http:\u002F\u002Flatentspace.cc\u002Farash_vahdat\u002F\" target=\"_blank\">Arash&nbsp;Vahdat\u003C\u002Fa> &emsp; \u003Cb>&middot;\u003C\u002Fb> &emsp;\n  \u003Ca href=\"http:\u002F\u002Fjankautz.com\u002F\" target=\"_blank\">Jan&nbsp;Kautz\u003C\u002Fa> \n\u003C\u002Fdiv>\n\u003Cbr>\n\u003Cbr>\n\n[NVAE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.03898) 是一种深度层次化变分自编码器，它使得在多个图像数据集上训练 SOTA 基于似然的生成模型成为可能。\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVlabs_NVAE_readme_c0ea1efb82c4.png\" width=\"800\">\n\u003C\u002Fp>\n\n## 要求\nNVAE 使用 Python 3.7 和 PyTorch 1.6.0 构建。请使用以下命令安装依赖：\n```\npip install -r requirements.txt\n``` \n\n## 设置文件路径和数据\n我们已经在多个数据集上测试了 NVAE。对于大型数据集，为了提高 I\u002FO 效率，我们将数据存储在 LMDB 数据集中。点击下方的每个数据集，查看如何准备您的数据。其中，`$DATA_DIR` 表示包含所有数据集的数据目录路径，而 `$CODE_DIR` 指的是代码目录：\n\n\u003Cdetails>\u003Csummary>MNIST 和 CIFAR-10\u003C\u002Fsummary>\n\n首次运行 `train.py` 进行 NVAE 主训练时，这些数据集会自动下载。您可以使用 `--data=$DATA_DIR\u002Fmnist` 或 `--data=$DATA_DIR\u002Fcifar10` 参数，以便将数据集下载到相应的目录中。\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>CelebA 64\u003C\u002Fsummary>\n运行以下命令下载 CelebA 图像并将其存储为 LMDB 数据集：\n\n```shell script\ncd $CODE_DIR\u002Fscripts\npython 
create_celeba64_lmdb.py --split train --img_path $DATA_DIR\u002Fceleba_org --lmdb_path $DATA_DIR\u002Fceleba64_lmdb\npython create_celeba64_lmdb.py --split valid --img_path $DATA_DIR\u002Fceleba_org --lmdb_path $DATA_DIR\u002Fceleba64_lmdb\npython create_celeba64_lmdb.py --split test  --img_path $DATA_DIR\u002Fceleba_org --lmdb_path $DATA_DIR\u002Fceleba64_lmdb\n```\n上述命令会自动将图像下载到 `$DATA_DIR\u002Fceleba_org` 目录，随后在 `$DATA_DIR\u002Fceleba64_lmdb` 创建 LMDB 数据集。\n\u003C\u002Fdetails>\n \n\u003Cdetails>\u003Csummary>ImageNet 32x32\u003C\u002Fsummary>\n\n运行以下命令从 [GLOW](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fglow) 下载 tfrecord 文件，并将其转换为 LMDB 数据集：\n```shell script\nmkdir -p $DATA_DIR\u002Fimagenet-oord\ncd $DATA_DIR\u002Fimagenet-oord\nwget https:\u002F\u002Fstorage.googleapis.com\u002Fglow-demo\u002Fdata\u002Fimagenet-oord-tfr.tar\ntar -xvf imagenet-oord-tfr.tar\ncd $CODE_DIR\u002Fscripts\npython convert_tfrecord_to_lmdb.py --dataset=imagenet-oord_32 --tfr_path=$DATA_DIR\u002Fimagenet-oord\u002Fmnt\u002Fhost\u002Fimagenet-oord-tfr --lmdb_path=$DATA_DIR\u002Fimagenet-oord\u002Fimagenet-oord-lmdb_32 --split=train\npython convert_tfrecord_to_lmdb.py --dataset=imagenet-oord_32 --tfr_path=$DATA_DIR\u002Fimagenet-oord\u002Fmnt\u002Fhost\u002Fimagenet-oord-tfr --lmdb_path=$DATA_DIR\u002Fimagenet-oord\u002Fimagenet-oord-lmdb_32 --split=validation\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>CelebA HQ 256\u003C\u002Fsummary>\n\n运行以下命令从 [GLOW](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fglow) 下载 tfrecord 文件，并将其转换为 LMDB 数据集：\n```shell script\nmkdir -p $DATA_DIR\u002Fceleba\ncd $DATA_DIR\u002Fceleba\nwget https:\u002F\u002Fstorage.googleapis.com\u002Fglow-demo\u002Fdata\u002Fceleba-tfr.tar\ntar -xvf celeba-tfr.tar\ncd $CODE_DIR\u002Fscripts\npython convert_tfrecord_to_lmdb.py --dataset=celeba --tfr_path=$DATA_DIR\u002Fceleba\u002Fceleba-tfr --lmdb_path=$DATA_DIR\u002Fceleba\u002Fceleba-lmdb --split=train\npython convert_tfrecord_to_lmdb.py --dataset=celeba 
--tfr_path=$DATA_DIR\u002Fceleba\u002Fceleba-tfr --lmdb_path=$DATA_DIR\u002Fceleba\u002Fceleba-lmdb --split=validation\n```\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\u003Csummary>FFHQ 256\u003C\u002Fsummary>\n\nVisit [this Google Drive location](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1WocxvZ4GEZ1DI8dOz30aSj2zT6pkATYS) and download `images1024x1024.zip`. Run the following commands to unzip the images and store them as LMDB datasets:\n```shell script\nmkdir -p $DATA_DIR\u002Fffhq\nunzip images1024x1024.zip -d $DATA_DIR\u002Fffhq\u002F\ncd $CODE_DIR\u002Fscripts\npython create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR\u002Fffhq\u002Fimages1024x1024\u002F --ffhq_lmdb_path=$DATA_DIR\u002Fffhq\u002Fffhq-lmdb --split=train\npython create_ffhq_lmdb.py --ffhq_img_path=$DATA_DIR\u002Fffhq\u002Fimages1024x1024\u002F --ffhq_lmdb_path=$DATA_DIR\u002Fffhq\u002Fffhq-lmdb --split=validation\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>LSUN\u003C\u002Fsummary>\n\nWe use the LSUN datasets in our follow-up works. Visit [LSUN](https:\u002F\u002Fwww.yf.io\u002Fp\u002Flsun) for instructions on how to download this dataset. Since the LSUN scene datasets come in the LMDB format, they can be loaded directly with torchvision data loaders.\n\n\u003C\u002Fdetails>\n\n\n## Running the main NVAE training and evaluation scripts\nWe use the following commands to train NVAE on each dataset for Table 1 in the [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.03898.pdf). Normalizing flows are enabled on all datasets except MNIST. See Table 6 in the paper for more training details. Note that for multi-node training (experiments with more than 8 GPUs), we use the `mpirun` command to run the training script on multiple nodes. Please adjust the commands below to your setup. Here, `IP_ADDR` is the IP address of the machine that will host the process with rank 0 (see [here](https:\u002F\u002Fpytorch.org\u002Ftutorials\u002Fintermediate\u002Fdist_tuto.html#initialization-methods)). `NODE_RANK` is the index of the current node among all the nodes running the job.\n\n\u003Cdetails>\u003Csummary>MNIST\u003C\u002Fsummary>\n\nTwo 16-GB V100 GPUs are used for training NVAE on dynamically binarized MNIST. Training takes about 21 hours.\n\n```shell script\nexport EXPR_ID=UNIQUE_EXPR_ID\nexport DATA_DIR=PATH_TO_DATA_DIR\nexport CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR\nexport CODE_DIR=PATH_TO_CODE_DIR\ncd $CODE_DIR\npython train.py --data $DATA_DIR\u002Fmnist --root $CHECKPOINT_DIR --save $EXPR_ID --dataset mnist --batch_size 200 \\\n        --epochs 400 --num_latent_scales 2 --num_groups_per_scale 10 
--num_postprocess_cells 3 --num_preprocess_cells 3 \\\n        --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 --num_latent_per_group 20 --num_preprocess_blocks 2 \\\n        --num_postprocess_blocks 2 --weight_decay_norm 1e-2 --num_channels_enc 32 --num_channels_dec 32 --num_nf 0 \\\n        --ada_groups --num_process_per_node 2 --use_se --res_dist --fast_adamax \n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>CIFAR-10\u003C\u002Fsummary>\n\nEight 16-GB V100 GPUs are used for training NVAE on CIFAR-10. Training takes about 55 hours.\n\n```shell script\nexport EXPR_ID=UNIQUE_EXPR_ID\nexport DATA_DIR=PATH_TO_DATA_DIR\nexport CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR\nexport CODE_DIR=PATH_TO_CODE_DIR\ncd $CODE_DIR\npython train.py --data $DATA_DIR\u002Fcifar10 --root $CHECKPOINT_DIR --save $EXPR_ID --dataset cifar10 \\\n        --num_channels_enc 128 --num_channels_dec 128 --epochs 400 --num_postprocess_cells 2 --num_preprocess_cells 2 \\\n        --num_latent_scales 1 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \\\n        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --num_groups_per_scale 30 --batch_size 32 \\\n        --weight_decay_norm 1e-2 --num_nf 1 --num_process_per_node 8 --use_se --res_dist --fast_adamax \n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>CelebA 64\u003C\u002Fsummary>\n\nEight 16-GB V100 GPUs are used for training NVAE on CelebA 64. Training takes about 92 hours.\n\n```shell script\nexport EXPR_ID=UNIQUE_EXPR_ID\nexport DATA_DIR=PATH_TO_DATA_DIR\nexport CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR\nexport CODE_DIR=PATH_TO_CODE_DIR\ncd $CODE_DIR\npython train.py --data $DATA_DIR\u002Fceleba64_lmdb --root $CHECKPOINT_DIR --save $EXPR_ID --dataset celeba_64 \\\n        --num_channels_enc 64 --num_channels_dec 64 --epochs 90 --num_postprocess_cells 2 --num_preprocess_cells 2 \\\n        --num_latent_scales 3 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \\\n        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --weight_decay_norm 1e-1 --num_groups_per_scale 20 \\\n        --batch_size 16 --num_nf 1 
--ada_groups --num_process_per_node 8 --use_se --res_dist --fast_adamax\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>ImageNet 32x32\u003C\u002Fsummary>\n\n24 16-GB V100 GPUs are used for training NVAE on ImageNet 32x32. Training takes about 70 hours.\n\n```shell script\nexport EXPR_ID=UNIQUE_EXPR_ID\nexport DATA_DIR=PATH_TO_DATA_DIR\nexport CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR\nexport CODE_DIR=PATH_TO_CODE_DIR\nexport IP_ADDR=IP_ADDRESS\nexport NODE_RANK=NODE_RANK_BETWEEN_0_TO_2\ncd $CODE_DIR\nmpirun --allow-run-as-root -np 3 -npernode 1 bash -c \\\n        'python train.py --data $DATA_DIR\u002Fimagenet-oord\u002Fimagenet-oord-lmdb_32 --root $CHECKPOINT_DIR --save $EXPR_ID --dataset imagenet_32 \\\n        --num_channels_enc 192 --num_channels_dec 192 --epochs 45 --num_postprocess_cells 2 --num_preprocess_cells 2 \\\n        --num_latent_scales 1 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \\\n        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --num_groups_per_scale 28 \\\n        --batch_size 24 --num_nf 1 --warmup_epochs 1 \\\n        --weight_decay_norm 1e-2 --weight_decay_norm_anneal --weight_decay_norm_init 1e0 \\\n        --num_process_per_node 8 --use_se --res_dist \\\n        --fast_adamax --node_rank $NODE_RANK --num_proc_node 3 --master_address $IP_ADDR '\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>CelebA HQ 256\u003C\u002Fsummary>\n\n24 32-GB V100 GPUs are used for training NVAE on CelebA HQ 256. Training takes about 94 hours.\n\n```shell script\nexport EXPR_ID=UNIQUE_EXPR_ID\nexport DATA_DIR=PATH_TO_DATA_DIR\nexport CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR\nexport CODE_DIR=PATH_TO_CODE_DIR\nexport IP_ADDR=IP_ADDRESS\nexport NODE_RANK=NODE_RANK_BETWEEN_0_TO_2\ncd $CODE_DIR\nmpirun --allow-run-as-root -np 3 -npernode 1 bash -c \\\n        'python train.py --data $DATA_DIR\u002Fceleba\u002Fceleba-lmdb --root $CHECKPOINT_DIR --save $EXPR_ID --dataset celeba_256 \\\n        --num_channels_enc 30 --num_channels_dec 30 --epochs 300 --num_postprocess_cells 2 --num_preprocess_cells 2 \\\n        --num_latent_scales 5 --num_latent_per_group 20 
--num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \\\n        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --weight_decay_norm 1e-2 --num_groups_per_scale 16 \\\n        --batch_size 4 --num_nf 2 --ada_groups --min_groups_per_scale 4 \\\n        --weight_decay_norm_anneal --weight_decay_norm_init 1. --num_process_per_node 8 --use_se --res_dist \\\n        --fast_adamax --num_x_bits 5 --node_rank $NODE_RANK --num_proc_node 3 --master_address $IP_ADDR '\n```\n\nIn our early experiments, a smaller model with 24 channels instead of 30 could be trained on only 8 GPUs in the same time (with a batch size of 6). These smaller models obtain a negative log-likelihood that is only 0.01 bpd higher than that of the large model.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>FFHQ 256\u003C\u002Fsummary>\n\n24 32-GB V100 GPUs are used for training NVAE on FFHQ 256. Training takes about 160 hours.\n\n```shell script\nexport EXPR_ID=UNIQUE_EXPR_ID\nexport DATA_DIR=PATH_TO_DATA_DIR\nexport CHECKPOINT_DIR=PATH_TO_CHECKPOINT_DIR\nexport CODE_DIR=PATH_TO_CODE_DIR\nexport IP_ADDR=IP_ADDRESS\nexport NODE_RANK=NODE_RANK_BETWEEN_0_TO_2\ncd $CODE_DIR\nmpirun --allow-run-as-root -np 3 -npernode 1 bash -c \\\n        'python train.py --data $DATA_DIR\u002Fffhq\u002Fffhq-lmdb --root $CHECKPOINT_DIR --save $EXPR_ID --dataset ffhq \\\n        --num_channels_enc 30 --num_channels_dec 30 --epochs 200 --num_postprocess_cells 2 --num_preprocess_cells 2 \\\n        --num_latent_scales 5 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \\\n        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --weight_decay_norm 1e-1  --num_groups_per_scale 16 \\\n        --batch_size 4 --num_nf 2  --ada_groups --min_groups_per_scale 4 \\\n        --weight_decay_norm_anneal --weight_decay_norm_init 1. 
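--num_process_per_node 8 --use_se --res_dist \\\n        --fast_adamax --num_x_bits 5 --learning_rate 8e-3 --node_rank $NODE_RANK --num_proc_node 3 --master_address $IP_ADDR '\n```\n\nIn our early experiments, a smaller model with 24 channels instead of 30 could be trained on only 8 GPUs in the same time (with a batch size of 6). These smaller models obtain a negative log-likelihood that is only 0.01 bpd higher than that of the large model.\n\u003C\u002Fdetails>\n\n**If your training is stopped for any reason, use the exact same command with the addition of `--cont_training` to continue training from the last saved checkpoint. If you observe NaN, continuing the training with this flag usually will not fix the NaN issue.**\n\n## Known Issues\n\u003Cdetails>\u003Csummary>Cannot build the CelebA 64 dataset, or training gives NaN right at the beginning on this dataset\u003C\u002Fsummary>\n\nSeveral users have reported problems building the CelebA 64 dataset, or NaN right at the beginning of training on this dataset.\nIf you face similar issues on this dataset, you can download the dataset manually and build the LMDB files by following the instructions in this issue:\nhttps:\u002F\u002Fgithub.com\u002FNVlabs\u002FNVAE\u002Fissues\u002F2\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>Getting NaN after a few epochs of training\u003C\u002Fsummary>\n\nOne of the main challenges in training very deep hierarchical VAEs is the training instability that we discussed in the paper.\nWe have verified that the settings in the commands above train in a stable way. If you modify these settings and encounter NaN after a few epochs of training,\nyou can try the following tricks to stabilize training: i) increase the spectral regularization coefficient, `--weight_decay_norm`; ii) apply exponential decay to\n`--weight_decay_norm` using `--weight_decay_norm_anneal` and `--weight_decay_norm_init`; iii) decrease the learning rate.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>Training freezes with no NaN\u003C\u002Fsummary>\n\nIn some very rare cases, we observed that training freezes after 2 to 3 days. We believe the root cause is a race condition in one of the low-level libraries.\nIf the training is stopped for any reason, kill your current run and use the exact same command with the addition of `--cont_training`\nto continue training from the last saved checkpoint.\n\u003C\u002Fdetails>\n\n## Monitoring the training progress\nWhile running any of the commands above, you can monitor the training progress using TensorBoard:\n\n\u003Cdetails>\u003Csummary>Click here\u003C\u002Fsummary>\n\n```shell script\ntensorboard --logdir $CHECKPOINT_DIR\u002Feval-$EXPR_ID\u002F\n```\nwhere `$CHECKPOINT_DIR` and `$EXPR_ID` are the same variables used for running the main training script.\n\u003C\u002Fdetails>\n\n## Post-training sampling, evaluation, and checkpoints\n\n\u003Cdetails>\u003Csummary>Evaluating log-likelihood\u003C\u002Fsummary>\n\nYou can use the following command to load a trained model and evaluate it on the test dataset:\n\n```shell script\ncd $CODE_DIR\npython evaluate.py --checkpoint $CHECKPOINT_DIR\u002Feval-$EXPR_ID\u002Fcheckpoint.pt --data $DATA_DIR\u002Fmnist --eval_mode=evaluate --num_iw_samples=1000\n```\nwhere `--num_iw_samples` is the number of importance-weighted samples used in evaluation.\n`$CHECKPOINT_DIR` and `$EXPR_ID` are the same variables used for running the main training script.\nSet `--data` to the same argument that was used when training NVAE (our example is for 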
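MNIST).\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>Sampling\u003C\u002Fsummary>\n\nYou can also use the following command to generate samples from a trained model:\n\n```shell script\ncd $CODE_DIR\npython evaluate.py --checkpoint $CHECKPOINT_DIR\u002Feval-$EXPR_ID\u002Fcheckpoint.pt --eval_mode=sample --temp=0.6 --readjust_bn\n```\nwhere `--temp` sets the temperature used for sampling and `--readjust_bn` enables the readjustment of BN statistics described in the paper.\nIf you remove `--readjust_bn`, sampling proceeds with the BN layers in eval mode (i.e., the BN layers use the running mean and variance extracted during training).\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>Computing FID\u003C\u002Fsummary>\n\nYou can compute the FID score using 50K samples. To do so, you first need to create a mean and covariance statistics file on the training data using a command like:\n\n```shell script\ncd $CODE_DIR\npython scripts\u002Fprecompute_fid_statistics.py --data $DATA_DIR\u002Fcifar10 --dataset cifar10 --fid_dir \u002Ftmp\u002Ffid-stats\u002F\n```\nThe command above computes the reference statistics on the CIFAR-10 dataset and stores them in the `--fid_dir` directory.\nGiven the reference statistics file, we can run the following command to compute the FID score:\n\n```shell script\ncd $CODE_DIR\npython evaluate.py --checkpoint $CHECKPOINT_DIR\u002Feval-$EXPR_ID\u002Fcheckpoint.pt --data $DATA_DIR\u002Fcifar10 --eval_mode=evaluate_fid --fid_dir \u002Ftmp\u002Ffid-stats\u002F --temp=0.6 --readjust_bn\n```\nwhere `--temp` sets the temperature used for sampling and `--readjust_bn` enables the readjustment of BN statistics described in the paper.\nIf you remove `--readjust_bn`, sampling proceeds with the BN layers in eval mode (i.e., the BN layers use the running mean and variance extracted during training).\nIn the commands above, `$CHECKPOINT_DIR` and `$EXPR_ID` are the same variables used for running the main training script.\nSet `--data` to the same argument that was used when training NVAE (our example is for CIFAR-10).\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>Checkpoints\u003C\u002Fsummary>\n\nWe provide checkpoints for MNIST, CIFAR-10, CelebA 64, CelebA HQ 256, and FFHQ in [this Google Drive directory](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1KVpw12AzdVjvbfEYM_6_3sxTy93wWkbe?usp=sharing).\nFor CIFAR-10, we provide two checkpoints because we observed that a multiscale NVAE gives better qualitative results than a single-scale model on this dataset.\nThe multiscale model is only slightly worse in terms of log-likelihood (by 0.01 bpd), but it generates higher-quality images.\nWe also observed that one of our early models on CelebA HQ 256, with a likelihood that is 0.01 bpd worse, generates higher-quality images at low temperatures.\nYou can use the commands above to evaluate or sample from these checkpoints.\n\u003C\u002Fdetails>\n\n## How to construct smaller NVAE models\nThe commands above construct big NVAE models that usually require several days of training. If you would like to construct smaller NVAEs, you can use these tricks:\n\n* Reduce the network width: `--num_channels_enc` and `--num_channels_dec` 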
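control the initial number of channels in the bottom-up and top-down networks respectively. Recall that we halve the number of channels with every spatial downsampling layer in the bottom-up network, and we double the number of channels with every upsampling layer in the top-down network. By reducing `--num_channels_enc` and `--num_channels_dec`, you can reduce the overall width of the networks.\n\n* Reduce the number of residual cells in the hierarchy: `--num_cell_per_cond_enc` and `--num_cell_per_cond_dec` control the number of residual cells used between every latent variable group in the bottom-up and top-down networks respectively. In most of our experiments, we use two cells per group in both networks. You can reduce the number of residual cells per group to one to make the model smaller.\n\n* Reduce the number of epochs: you can shorten the training time by reducing `--epochs`.\n\n* Reduce the number of groups: you can make NVAE smaller by using fewer latent variable groups. We use two schemes for setting the number of groups:\n    1. An equal number of groups per scale: this is set by `--num_groups_per_scale`, which is the number of groups in each scale of latent variables. Reduce this number to get a smaller NVAE.\n    \n    2. An adaptive number of groups: this is enabled by `--ada_groups`. In this case, the highest resolution of latent variables has `--num_groups_per_scale` groups and each smaller scale gets half the number of groups of the previous one (see groups_per_scale in utils.py). We don't let the number of groups go below `--min_groups_per_scale`. When `--ada_groups` is enabled, you can reduce the total number of groups by reducing both `--num_groups_per_scale` and `--min_groups_per_scale`.\n\n## Understanding the implementation\nIf you are modifying the code, you can use the following figure to map the code to the paper.\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVlabs_NVAE_readme_e5b0af377556.png\" width=\"900\">\n\u003C\u002Fp>\n\n\n## Traversing the latent space\nWe can generate images by traversing the latent space of NVAE. This sequence is generated using our model trained on CelebA HQ, by interpolating between samples generated with temperature 0.6. Some of the artifacts are due to color quantization in the GIF.\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVlabs_NVAE_readme_bd12cbc331c5.png\" width=\"512\">\n\u003C\u002Fp>\n\n## License\nPlease check the LICENSE file. NVAE may be used non-commercially only, meaning for research or evaluation purposes only. For business inquiries, please contact [researchinquiries@nvidia.com](mailto:researchinquiries@nvidia.com).\n\nYou should take into consideration that VAEs are trained to mimic the training data distribution, so any bias introduced in data collection will make VAEs generate samples with a similar bias. Additional bias could be introduced during model design, training, or when VAEs are sampled using small temperatures. Bias correction in generative learning is an active area of research, and we recommend that interested readers review this area before building applications using NVAE.\n\n## BibTeX:\nPlease cite our paper if you use this codebase:\n\n```\n@inproceedings{vahdat2020NVAE,\n  title={{NVAE}: A Deep Hierarchical Variational Autoencoder},\n  author={Vahdat, Arash and Kautz, Jan},\n  booktitle={Neural Information Processing Systems (NeurIPS)},\n  year={2020}\n}\n```","# NVAE Quick Start Guide\n\nNVAE (Deep Hierarchical Variational Autoencoder) is a PyTorch-based deep hierarchical variational autoencoder for training state-of-the-art likelihood-based generative models on several image datasets.\n\n## Prerequisites\n\n*   **Operating system**: Linux 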
(recommended)\n*   **Python**: 3.7\n*   **Deep learning framework**: PyTorch 1.6.0\n*   **Hardware requirements**: \n    *   Basic experiments (e.g. MNIST): two 16GB V100 GPUs in the reference configuration\n    *   Large datasets (e.g. ImageNet, CelebA HQ): a multi-node, multi-GPU setup is recommended (e.g. 24 V100 GPUs)\n*   **Dependency management**: the project manages its dependencies with `requirements.txt`.\n\n> **Tip**: users in mainland China may want to use the Tsinghua or Alibaba mirror to speed up pip installs.\n\n## Installation\n\n1.  **Clone the repository**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002FNVlabs\u002FNVAE.git\n    cd NVAE\n    ```\n\n2.  **Install the dependencies**\n    Users in mainland China can install the Python dependencies from a local mirror:\n    ```bash\n    pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    ```\n\n3.  **Configure the data paths**\n    Make sure the following environment variables are set before running the commands below (adjust them to your actual paths):\n    ```bash\n    export DATA_DIR=\u002Fpath\u002Fto\u002Fyour\u002Fdata\n    export CHECKPOINT_DIR=\u002Fpath\u002Fto\u002Fsave\u002Fcheckpoints\n    export CODE_DIR=$(pwd)\n    ```\n\n## Basic usage\n\n### 1. Prepare the datasets\n\nSmall datasets (MNIST, CIFAR-10) are downloaded automatically by the training script. Large datasets must be converted to LMDB format beforehand for I\u002FO efficiency.\n\n**Example: preparing the CelebA 64 dataset**\n```bash\ncd $CODE_DIR\u002Fscripts\n# Download the original images automatically and convert them to LMDB format (train, validation, and test splits)\npython create_celeba64_lmdb.py --split train --img_path $DATA_DIR\u002Fceleba_org --lmdb_path $DATA_DIR\u002Fceleba64_lmdb\npython create_celeba64_lmdb.py --split valid --img_path $DATA_DIR\u002Fceleba_org --lmdb_path $DATA_DIR\u002Fceleba64_lmdb\npython create_celeba64_lmdb.py --split test  --img_path $DATA_DIR\u002Fceleba_org --lmdb_path $DATA_DIR\u002Fceleba64_lmdb\n```\n\n**Example: preparing the ImageNet 32x32 dataset**\nFirst download the tfrecord files published by GLOW, then convert them:\n```bash\nmkdir -p $DATA_DIR\u002Fimagenet-oord\ncd $DATA_DIR\u002Fimagenet-oord\nwget https:\u002F\u002Fstorage.googleapis.com\u002Fglow-demo\u002Fdata\u002Fimagenet-oord-tfr.tar\ntar -xvf imagenet-oord-tfr.tar\n\ncd $CODE_DIR\u002Fscripts\npython convert_tfrecord_to_lmdb.py --dataset=imagenet-oord_32 --tfr_path=$DATA_DIR\u002Fimagenet-oord\u002Fmnt\u002Fhost\u002Fimagenet-oord-tfr --lmdb_path=$DATA_DIR\u002Fimagenet-oord\u002Fimagenet-oord-lmdb_32 --split=train\npython convert_tfrecord_to_lmdb.py --dataset=imagenet-oord_32 
--tfr_path=$DATA_DIR\u002Fimagenet-oord\u002Fmnt\u002Fhost\u002Fimagenet-oord-tfr --lmdb_path=$DATA_DIR\u002Fimagenet-oord\u002Fimagenet-oord-lmdb_32 --split=validation\n```\n\n### 2. Start training\n\nThe smallest configuration is the **MNIST** training run (single node, 2 GPUs):\n\n```bash\nexport EXPR_ID=mnist_test\ncd $CODE_DIR\n\npython train.py --data $DATA_DIR\u002Fmnist --root $CHECKPOINT_DIR --save $EXPR_ID --dataset mnist --batch_size 200 \\\n        --epochs 400 --num_latent_scales 2 --num_groups_per_scale 10 --num_postprocess_cells 3 --num_preprocess_cells 3 \\\n        --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 --num_latent_per_group 20 --num_preprocess_blocks 2 \\\n        --num_postprocess_blocks 2 --weight_decay_norm 1e-2 --num_channels_enc 32 --num_channels_dec 32 --num_nf 0 \\\n        --ada_groups --num_process_per_node 2 --use_se --res_dist --fast_adamax \n```\n\n**Multi-node distributed training example (ImageNet 32x32)**\nMulti-node training uses `mpirun` and requires the master node IP address and the node rank to be set:\n\n```bash\nexport IP_ADDR=192.168.1.100  # replace with the IP address of the master node (rank 0)\nexport NODE_RANK=0            # rank of the current node (0, 1, 2...)\n\nmpirun --allow-run-as-root -np 3 -npernode 1 bash -c \\\n        'python train.py --data $DATA_DIR\u002Fimagenet-oord\u002Fimagenet-oord-lmdb_32 --root $CHECKPOINT_DIR --save $EXPR_ID --dataset imagenet_32 \\\n        --num_channels_enc 192 --num_channels_dec 192 --epochs 45 --num_postprocess_cells 2 --num_preprocess_cells 2 \\\n        --num_latent_scales 1 --num_latent_per_group 20 --num_cell_per_cond_enc 2 --num_cell_per_cond_dec 2 \\\n        --num_preprocess_blocks 1 --num_postprocess_blocks 1 --num_groups_per_scale 28 \\\n        --batch_size 24 --num_nf 1 --warmup_epochs 1 \\\n        --weight_decay_norm 1e-2 --weight_decay_norm_anneal --weight_decay_norm_init 1e0 \\\n        --num_process_per_node 8 --use_se --res_dist \\\n        --fast_adamax --node_rank $NODE_RANK --num_proc_node 3 --master_address $IP_ADDR '\n```\n\n### 3. 
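Key parameters\n*   `--dataset`: dataset name (mnist, cifar10, celeba_64, imagenet_32, celeba_256, ffhq).\n*   `--num_process_per_node`: number of GPUs used per node.\n*   `--num_nf`: number of normalizing flows; 0 disables them (as in the MNIST command), and the other commands use 1 or 2.\n*   `--epochs`: number of training epochs.\n*   `--batch_size`: batch size per GPU.","An algorithm team at a fashion e-commerce company is building a high-fidelity face image generation system for virtual model try-on and personalized marketing material.\n\n### Without NVAE\n- **Blurry image details**: conventional variational autoencoders (VAEs) tend to lose high-frequency texture on high-resolution faces (such as CelebA HQ 256x256), so generated skin looks over-smoothed and lacks realistic pores and hair strands.\n- **Weak control over the latent space**: the model struggles to capture multi-scale features from the global outline down to local facial features, so developers cannot adjust hairstyle, expression, or lighting independently without affecting other facial structure.\n- **Hard-to-converge training**: adding levels to a deep network to improve quality often runs into vanishing gradients or mode collapse, which makes training unstable and makes it hard to reach leading likelihood scores on large datasets.\n\n### With NVAE\n- **Fine texture detail restored**: with NVAE's deep hierarchical architecture, the model learns latent variables at multiple levels, and the generated faces clearly show skin texture, hair direction, and even catchlights in the eyes, at a SOTA level of visual quality.\n- **Fine-grained editing**: thanks to the hierarchical latent space, the team can manipulate variables at different levels separately, changing only a person's expression while keeping identity fixed, or adjusting only the lighting without distorting facial features.\n- **Stable training on large datasets**: with the spectral regularization settings from the README commands, NVAE trains stably on large datasets such as ImageNet and FFHQ, which shortens the cycle from experiment to deployment.\n\nThrough deep hierarchical modeling, NVAE breaks through the resolution and detail bottlenecks of conventional generative models, making high-quality, controllable face synthesis practical for production use.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVlabs_NVAE_826f15e2.png","NVlabs","NVIDIA Research Projects","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FNVlabs_fc20d641.jpg","",null,"http:\u002F\u002Fresearch.nvidia.com","https:\u002F\u002Fgithub.com\u002FNVlabs",[83,87],{"name":84,"color":85,"percentage":86},"Python","#3572A5",99.8,{"name":88,"color":89,"percentage":90},"Dockerfile","#384d54",0.2,1092,179,"2026-03-21T05:04:14","NOASSERTION","Linux","An NVIDIA GPU is required. Requirements by dataset: MNIST (2x 16GB V100), CIFAR-10\u002FCelebA 64 (8x 16GB V100), ImageNet 32x32 (24x 16GB V100), CelebA HQ 256\u002FFFHQ 256 (24x 32GB V100). Multi-node distributed training (mpirun) is supported.","Not specified",{"notes":99,"python":100,"dependencies":101},"1. Large datasets (e.g. CelebA, ImageNet, FFHQ) must be converted to LMDB format beforehand for I\u002FO efficiency; the README provides the conversion scripts. 2. Multi-node training requires configuring mpirun and specifying the master node IP address and node rank. 3. Some smaller models can be trained on fewer GPUs (e.g. 8), but the official configurations are based on V100 GPUs. 4. 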
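Dependencies are installed from `requirements.txt`; the reference version is PyTorch 1.6.0.","3.7",[102,103,104,105,106],"torch==1.6.0","lmdb","numpy","pillow","scipy",[14],"2026-03-27T02:49:30.150509","2026-04-06T07:13:01.953307",[111,116,121,126,131,136,141,145],{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},13633,"What should I do when the training loss becomes \"nan\"?","This is usually related to the learning rate setting or to the training epoch. Try changing the initial learning rate to 1e-3 (with the '--learning_rate 1e-3' argument). Also confirm the epoch at which the error actually occurs: users sometimes mistake the iteration count for the epoch count (for example, with batch_size=8 one epoch is roughly 5000 iterations), so the problem may appear late in training rather than at the start.","https:\u002F\u002Fgithub.com\u002FNVlabs\u002FNVAE\u002Fissues\u002F2",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},13634,"Where can I download pretrained models?","The maintainers have updated the README, which now contains download links for the pretrained checkpoints; see the README in the project root for the latest location.","https:\u002F\u002Fgithub.com\u002FNVlabs\u002FNVAE\u002Fissues\u002F7",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},13635,"How can I get the encoder's latent variable outputs for an image, for use in clustering?","This can be done by extracting the latent space embeddings from the model. Note that the latent space of the original NVAE model is very high-dimensional, so clustering on it directly may work poorly. Consider simplifying the model parameters to reduce the dimensionality, or clustering separately at different hierarchy levels, which can give groupings closer to what you expect.","https:\u002F\u002Fgithub.com\u002FNVlabs\u002FNVAE\u002Fissues\u002F15",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},13636,"Is the use of two consecutive ELU activations in the code intentional or a bug?","It is intentional. The decoder sampler cell consists of nn.ELU + Conv2D, while the encoder sampler is only a Conv2D layer with no activation. The maintainer reported trying an activation and normalization on the encoder in experiments without seeing a performance gain, so it was left as is.","https:\u002F\u002Fgithub.com\u002FNVlabs\u002FNVAE\u002Fissues\u002F25",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},13637,"What does the Gaussian parameter preprocessing (soft_clamp5) do?","This preprocessing bounds the KL divergence of each latent variable to keep the model from becoming unstable. Although the KL of a single variable is bounded, the model has a large number of latent variables, so a small mismatch between the encoder and the prior can make the accumulated KL extremely large and cause training to collapse. All experiments in the paper used this preprocessing step.","https:\u002F\u002Fgithub.com\u002FNVlabs\u002FNVAE\u002Fissues\u002F27",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},13638,"How do I fix generated images not showing in TensorBoard (black pixels or a blank panel)?","First try reinstalling TensorBoard: run 'pip install tensorboard --ignore-installed'. With the latest versions you may need to add the '--bind_all' argument when launching. Some versions (such as tensorboardx==2.1) also behave better. Note as well that early in training (for example the first 10 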
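epochs) you may only see reconstructed images; the generated images show up properly only after enough epochs of training.","https:\u002F\u002Fgithub.com\u002FNVlabs\u002FNVAE\u002Fissues\u002F8",{"id":142,"question_zh":143,"answer_zh":144,"source_url":120},13639,"What is the BPD elbo metric?","BPD elbo is the evidence lower bound (ELBO) from variational inference divided by the number of dimensions (here, the number of pixels per image). It is then usually divided by log(2) to convert the result to base-2 units (bits per dimension).",{"id":146,"question_zh":147,"answer_zh":148,"source_url":115},13640,"How do I train on 256x256 images?","No additional changes to the model architecture are usually needed. Adjust the run flags and parameter settings according to the information given in the paper to train on higher-resolution images.",[]]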