[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-YannDubs--disentangling-vae":3,"tool-YannDubs--disentangling-vae":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",159636,2,"2026-04-17T23:33:34",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":78,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":32,"env_os":95,"env_gpu":96,"env_ram":95,"env_deps":97,"category_tags":103,"github_topics":105,"view_count":32,"oss_zip_url":123,"oss_zip_packed_at":123,"status":17,"created_at":124,"updated_at":125,"faqs":126,"releases":162},8844,"YannDubs\u002Fdisentangling-vae","disentangling-vae","Experiments for understanding disentanglement in VAE latent representations","disentangling-vae 是一个基于 PyTorch 的开源研究工具，旨在深入探索变分自编码器（VAE）中潜在表示的“解耦”特性。在深度学习中，模型往往难以将数据的独立特征（如形状、颜色、姿态）分离开来，而该工具通过对比五种主流损失函数（包括标准 VAE、β-VAE、FactorVAE 及β-TCVAE 等），帮助研究者理解不同算法如何提升特征分离的效果。\n\n它主要解决了学术界在评估和比较不同解耦方法时缺乏统一基准的问题。disentangling-vae 
不仅提供了完整的训练代码，还集成了互信息差距（MIG）等专业评估指标与可视化功能，能够自动生成潜在维度遍历的动态图，直观展示模型学习到的特征结构。\n\n这款工具特别适合人工智能研究人员、算法工程师及相关领域的学生使用。无论是想要复现经典论文实验，还是希望在同一架构下公平对比不同损失函数的性能，都能从中获得高效支持。其核心亮点在于“单一架构对比”设计，确保了实验变量的严格控制，同时支持 CPU 与 GPU 运行，并预置了 MNIST、CelebA 等多种数据集的实验配置，大大降低了复现前沿研究的门槛。","# Disentangled VAE [![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](https:\u002F\u002Fgithub.com\u002FYannDubs\u002Fdisentangling-vae\u002Fblob\u002Fmaster\u002FLICENSE) [![Python 3.6+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.6+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002Frelease\u002Fpython-360\u002F)\n\nThis repository contains code (training \u002F metrics \u002F plotting) to investigate disentangling in VAE as well as compare 5 different losses ([summary of the differences](#losses-explanation)) using a [single architecture](#single-model-comparison):\n\n* **Standard VAE Loss** from [Auto-Encoding Variational Bayes](https:\u002F\u002Farxiv.org\u002Fabs\u002F1312.6114)\n* **β-VAE\u003Csub>H\u003C\u002Fsub>** from [β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework](https:\u002F\u002Fopenreview.net\u002Fpdf?id=Sy2fzU9gl)\n* **β-VAE\u003Csub>B\u003C\u002Fsub>** from [Understanding disentangling in β-VAE](https:\u002F\u002Farxiv.org\u002Fabs\u002F1804.03599)\n* **FactorVAE** from [Disentangling by Factorising](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.05983)\n* **β-TCVAE** from [Isolating Sources of Disentanglement in Variational Autoencoders](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.04942)\n\nNotes:\n- Tested for python >= 3.6\n- Tested for CPU and GPU\n\nTable of Contents:\n1. [Install](#install)\n2. [Run](#run)\n3. [Plot](#plot)\n3. [Data](#data)\n4. [Our Contributions](#our-contributions)\n5. [Losses Explanation](#losses-explanation)\n6. 
[Citing](#cite)\n\n## Install\n\n```\n# clone repo\npip install -r requirements.txt\n```\n\n## Run\n\nUse `python main.py \u003Cmodel-name> \u003Cparam>` to train and\u002For evaluate a model. For example:\n\n```\npython main.py btcvae_celeba_mini -d celeba -l btcvae --lr 0.001 -b 256 -e 5\n```\n\nYou can run predefined experiments and hyper-parameters using `-x \u003Cexperiment>`. These hyperparameters are defined in `hyperparam.ini`. Pretrained models for each experiment can be found in `results\u002F\u003Cexperiment>` (created using `.\u002Fbin\u002Ftrain_all.sh`).\n\n\n### Output\nThis will create a directory `results\u002F\u003Csaving-name>\u002F` which will contain:\n\n* **model.pt**: The model at the end of training.\n* **model-**`i`**.pt**: Model checkpoint after `i` epochs. The saving interval is set by `--checkpoint-every`.\n* **specs.json**: The parameters used to run the program (defaults and values modified with the CLI).\n* **training.gif**: GIF of latent traversals of the latent dimensions Z at every epoch of training.\n* **train_losses.log**: All (sub-)losses computed during training.\n* **test_losses.log**: All (sub-)losses computed at the end of training with the model in evaluate mode (no sampling).\n* **metrics.log**: [Mutual Information Gap](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.04942) metric and [Axis Alignment Metric](#axis-alignment-metric). Only written if `--is-metrics` is set (slow).\n\n\n### Help\n```\nusage: main.py ...\n\nPyTorch implementation and evaluation of disentangled Variational AutoEncoders\nand metrics.\n\noptional arguments:\n  -h, --help            show this help message and exit\n\nGeneral options:\n  name                  Name of the model for storing or loading purposes.\n  -L, --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}\n                        Logging levels. (default: info)\n  --no-progress-bar     Disables progress bar. (default: False)\n  --no-cuda             Disables CUDA training, even when have one. 
(default:\n                        False)\n  -s, --seed SEED       Random seed. Can be `None` for stochastic behavior.\n                        (default: 1234)\n\nTraining specific options:\n  --checkpoint-every CHECKPOINT_EVERY\n                        Save a checkpoint of the trained model every n epoch.\n                        (default: 30)\n  -d, --dataset {mnist,fashion,dsprites,celeba,chairs}\n                        Path to training data. (default: mnist)\n  -x, --experiment {custom,debug,best_celeba,VAE_mnist,VAE_fashion,VAE_dsprites,VAE_celeba,VAE_chairs,betaH_mnist,betaH_fashion,betaH_dsprites,betaH_celeba,betaH_chairs,betaB_mnist,betaB_fashion,betaB_dsprites,betaB_celeba,betaB_chairs,factor_mnist,factor_fashion,factor_dsprites,factor_celeba,factor_chairs,btcvae_mnist,btcvae_fashion,btcvae_dsprites,btcvae_celeba,btcvae_chairs}\n                        Predefined experiments to run. If not `custom` this\n                        will overwrite some other arguments. (default: custom)\n  -e, --epochs EPOCHS   Maximum number of epochs to run for. (default: 100)\n  -b, --batch-size BATCH_SIZE\n                        Batch size for training. (default: 64)\n  --lr LR               Learning rate. (default: 0.0005)\n\nModel specfic options:\n  -m, --model-type {Burgess}\n                        Type of encoder and decoder to use. (default: Burgess)\n  -z, --latent-dim LATENT_DIM\n                        Dimension of the latent variable. (default: 10)\n  -l, --loss {VAE,betaH,betaB,factor,btcvae}\n                        Type of VAE loss function to use. (default: betaB)\n  -r, --rec-dist {bernoulli,laplace,gaussian}\n                        Form of the likelihood ot use for each pixel.\n                        (default: bernoulli)\n  -a, --reg-anneal REG_ANNEAL\n                        Number of annealing steps where gradually adding the\n                        regularisation. What is annealed is specific to each\n                        loss. 
(default: 0)\n\nBetaH specific parameters:\n  --betaH-B BETAH_B     Weight of the KL (beta in the paper). (default: 4)\n\nBetaB specific parameters:\n  --betaB-initC BETAB_INITC\n                        Starting annealed capacity. (default: 0)\n  --betaB-finC BETAB_FINC\n                        Final annealed capacity. (default: 25)\n  --betaB-G BETAB_G     Weight of the KL divergence term (gamma in the paper).\n                        (default: 1000)\n\nfactor VAE specific parameters:\n  --factor-G FACTOR_G   Weight of the TC term (gamma in the paper). (default:\n                        6)\n  --lr-disc LR_DISC     Learning rate of the discriminator. (default: 5e-05)\n\nbeta-tcvae specific parameters:\n  --btcvae-A BTCVAE_A   Weight of the MI term (alpha in the paper). (default:\n                        1)\n  --btcvae-G BTCVAE_G   Weight of the dim-wise KL term (gamma in the paper).\n                        (default: 1)\n  --btcvae-B BTCVAE_B   Weight of the TC term (beta in the paper). (default:\n                        6)\n\nEvaluation specific options:\n  --is-eval-only        Whether to only evaluate using precomputed model\n                        `name`. (default: False)\n  --is-metrics          Whether to compute the disentangled metrcics.\n                        Currently only possible with `dsprites` as it is the\n                        only dataset with known true factors of variations.\n                        (default: False)\n  --no-test             Whether not to compute the test losses.` (default:\n                        False)\n  --eval-batchsize EVAL_BATCHSIZE\n                        Batch size for evaluation. (default: 1000)\n```\n\n## Plot\n\nUse `python main_viz.py \u003Cmodel-name> \u003Cplot_types> \u003Cparam>` to plot using pretrained models. 
For example:\n\n```\npython main_viz.py btcvae_celeba_mini gif-traversals reconstruct-traverse -c 7 -r 6 -t 2 --is-posterior\n```\n\nThis will save the plots in the model directory `results\u002F\u003Cmodel-name>\u002F`. Generated plots for all experiments are found in their respective directories (created using `.\u002Fbin\u002Fplot_all.sh`).\n\n### Help\n```\nusage: main_viz.py ...\n\nCLI for plotting using pretrained models of `disvae`\n\npositional arguments:\n  name                  Name of the model for storing and loading purposes.\n  {generate-samples,data-samples,reconstruct,traversals,reconstruct-traverse,gif-traversals,all}\n                        List of all plots to generate. `generate-samples`:\n                        random decoded samples. `data-samples` samples from\n                        the dataset. `reconstruct` first rnows\u002F\u002F2 will be the\n                        original and rest will be the corresponding\n                        reconstructions. `traversals` traverses the most\n                        important rnows dimensions with ncols different\n                        samples from the prior or posterior. `reconstruct-\n                        traverse` first row for original, second are\n                        reconstructions, rest are traversals. `gif-traversals`\n                        grid of gifs where rows are latent dimensions, columns\n                        are examples, each gif shows posterior traversals.\n                        `all` runs every plot.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -s, --seed SEED       Random seed. 
Can be `None` for stochastic behavior.\n                        (default: None)\n  -r, --n-rows N_ROWS   The number of rows to visualize (if applicable).\n                        (default: 6)\n  -c, --n-cols N_COLS   The number of columns to visualize (if applicable).\n                        (default: 7)\n  -t, --max-traversal MAX_TRAVERSAL\n                        The maximum displacement induced by a latent\n                        traversal. Symmetrical traversals are assumed. If\n                        `m>=0.5` then uses absolute value traversal, if\n                        `m\u003C0.5` uses a percentage of the distribution\n                        (quantile). E.g. for the prior the distribution is a\n                        standard normal so `m=0.45` corresponds to an absolute\n                        value of `1.645` because `2m=90%` of a standard normal\n                        is between `-1.645` and `1.645`. Note in the case of\n                        the posterior, the distribution is not standard normal\n                        anymore. (default: 2)\n  -i, --idcs IDCS [IDCS ...]\n                        List of indices to of images to put at the begining of\n                        the samples. (default: [])\n  -u, --upsample-factor UPSAMPLE_FACTOR\n                        The scale factor with which to upsample the image (if\n                        applicable). (default: 1)\n  --is-show-loss        Displays the loss on the figures (if applicable).\n                        (default: False)\n  --is-posterior        Traverses the posterior instead of the prior.\n                        (default: False)\n```\n\n### Examples\n\nHere are examples of plots you can generate:\n\n* `python main_viz.py \u003Cmodel> reconstruct-traverse --is-show-loss --is-posterior` first row are originals, second are reconstructions, rest are traversals. 
Shown for `btcvae_dsprites`:\n\n    ![btcvae_dsprites reconstruct-traverse](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_1515905cc993.png)\n\n* `python main_viz.py \u003Cmodel> gif-traversals` grid of gifs where rows are latent dimensions, columns are examples, each gif shows posterior traversals. Shown for `btcvae_celeba`:\n\n    ![btcvae_celeba gif-traversals](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_4a0c07731ae6.gif)\n\n* Grid of gifs generated using code in `bin\u002Fplot_all.sh`. The columns of the grid correspond to the datasets (besides FashionMNIST), the rows correspond to the models (in order: Standard VAE, β-VAE\u003Csub>H\u003C\u002Fsub>, β-VAE\u003Csub>B\u003C\u002Fsub>, FactorVAE, β-TCVAE):\n\n    ![grid_posteriors](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_027567d3a036.gif)\n\nFor more examples, all of the plots for the predefined experiments are found in their respective directories (created using `.\u002Fbin\u002Fplot_all.sh`).\n\n## Data\n\nCurrent datasets that can be used:\n- [MNIST](http:\u002F\u002Fyann.lecun.com\u002Fexdb\u002Fmnist\u002F)\n- [FashionMNIST](https:\u002F\u002Fgithub.com\u002Fzalandoresearch\u002Ffashion-mnist)\n- [3D Chairs](https:\u002F\u002Fwww.di.ens.fr\u002Fwillow\u002Fresearch\u002Fseeing3Dchairs)\n- [Celeba](http:\u002F\u002Fmmlab.ie.cuhk.edu.hk\u002Fprojects\u002FCelebA.html)\n- [2D Shapes \u002F Dsprites](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fdsprites-dataset\u002F)\n\nThe dataset will be downloaded the first time you run it and will be stored in `data` for future uses. The download will take time and might not work anymore if the download links change. In this case either:\n\n1. Open an issue\n2. Change the URLs (`urls[\"train\"]`) for the dataset you want in `utils\u002Fdatasets.py` (please open a PR in this case :) )\n3. 
Download the data by hand and save it under the same names (not recommended)\n\n## Our Contributions\n\nIn addition to replicating the aforementioned papers, we also propose and investigate the following:\n\n### Axis Alignment Metric\n\nQualitative inspection is unsuitable for reliably comparing models because it is subjective and time-consuming. Recent papers use quantitative measures of disentanglement based on the ground truth factors of variation **v** and the latent dimensions **z**. The [Mutual Information Gap (MIG)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.04942) is an appealing information-theoretic metric because it does not use any classifier. To get a MIG of 1 on dSprites, where there are 10 latent dimensions and 5 generative factors, 5 of the latent dimensions should exactly encode the true factors of variation, and the remaining ones should be independent of these 5.\n\nAlthough a metric like MIG is what we would like to use in the long term, current models score poorly on it, and it is hard to tell what they should improve. We thus propose an axis alignment metric (AAM), which does not measure how much information about **v** is encoded by **z**, but rather whether each v\u003Csub>k\u003C\u002Fsub> is encoded in only a single z\u003Csub>j\u003C\u002Fsub>. For example, on dSprites it is possible to get an AAM of 1 even if **z** encodes only 90% of the variance in the x position of the shapes, as long as this 90% is encoded by a single latent dimension z\u003Csub>j\u003C\u002Fsub>. This makes it easier to understand what each model is good and bad at. 
Formally:\n\n![Axis Alignment Metric](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_65b4eca50369.png)\n\nwhere the subscript *(d)* denotes the *d*\u003Csup>th\u003C\u002Fsup> order statistic and *I*\u003Csub>x\u003C\u002Fsub> is estimated using empirical distributions and stratified sampling (as with MIG):\n\n![Mutual Information for AAM](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_2533661ae17d.png)\n\n\n### Single Model Comparison\n\nThe model is decoupled from all the losses, so it should be easy to modify the encoder \u002F decoder without modifying the losses. We used a single model in order to compare the different losses more objectively. The model used is the one from [Understanding disentangling in β-VAE](https:\u002F\u002Farxiv.org\u002Fabs\u002F1804.03599), which is summarized below:\n\n![Model Architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_ef4074d874e2.png)\n\n\n## Losses Explanation\n\nAll the previous losses are special cases of the following loss:\n\n![Loss Overview](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_980688f77c48.png)\n\n1. **Index-code mutual information**: the mutual information between the latent variables **z** and the data variable **x**. The literature disagrees on how this term should be treated. From the [information bottleneck perspective](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.01350) it should be penalized. [InfoGAN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.03657) gets good results by increasing the mutual information (negative α). Finally, [Wasserstein Auto-Encoders](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.01558) drop this term.\n\n2. **Total Correlation (TC)**: the KL divergence between the joint and the product of the marginals of the latent variable. 
*I.e.* a measure of dependence between the latent dimensions. Increasing β encourages the model to find statistically independent factors of variation in the data distribution.\n\n3. **Dimension-wise KL divergence**: the KL divergence between each dimension of the marginal posterior and the prior. This term keeps the learned latent space compact and close to the prior, which enables sampling novel examples.\n\nThe losses differ in their estimates of each of these terms and in the hyperparameters they use:\n\n* [**Standard VAE Loss**](https:\u002F\u002Farxiv.org\u002Fabs\u002F1312.6114): α=β=ɣ=1. The terms are not estimated separately. Their sum, the KL between the posterior and the prior, is computed exactly in closed form. Tightest lower bound.\n* [**β-VAE\u003Csub>H\u003C\u002Fsub>**](https:\u002F\u002Fopenreview.net\u002Fpdf?id=Sy2fzU9gl): α=β=ɣ>1. The KL is computed exactly in closed form, as in the standard VAE. This loss simply multiplies it by a hyper-parameter (β in the paper).\n* [**β-VAE\u003Csub>B\u003C\u002Fsub>**](https:\u002F\u002Farxiv.org\u002Fabs\u002F1804.03599): α=β=ɣ>1. Same as **β-VAE\u003Csub>H\u003C\u002Fsub>**, but penalizes the deviation of the 3 terms (i.e. the KL) from a capacity C that increases during training.\n* [**FactorVAE**](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.05983): α=ɣ=1, β>1. Adds a weighted Total Correlation term to the standard VAE loss. The total correlation is estimated using a classifier and the density-ratio trick. Because the standard VAE loss already contains the TC with weight 1, β in our framework corresponds to ɣ+1, where ɣ is the weight in their paper.\n* [**β-TCVAE**](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.04942): α=ɣ=1 (although these can be modified), β>1. Conceptually equivalent to FactorVAE, but each term is estimated separately using minibatch stratified sampling.\n\n## Cite\n\nWhen using one of the models implemented in this repo in academic work, please cite the corresponding paper (linked at the top of the README). 
In case you want to cite this specific implementation then you can use:\n\n```\n@misc{dubois2019dvae,\n  title        = {Disentangling VAE},\n  author       = {Dubois, Yann and Kastanos, Alexandros and Lines, Dave and Melman, Bart},\n  month        = {march},\n  year         = {2019},\n  howpublished = {\\url{http:\u002F\u002Fgithub.com\u002FYannDubs\u002Fdisentangling-vae\u002F}}\n}\n```\n","# 解耦VAE [![许可证：MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](https:\u002F\u002Fgithub.com\u002FYannDubs\u002Fdisentangling-vae\u002Fblob\u002Fmaster\u002FLICENSE) [![Python 3.6+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.6+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002Frelease\u002Fpython-360\u002F)\n\n此仓库包含用于研究VAE中解耦现象的代码（训练、指标计算、绘图），并使用[单一架构](#single-model-comparison)比较5种不同的损失函数（[差异总结](#losses-explanation)）：\n\n* 来自[《变分贝叶斯自动编码》](https:\u002F\u002Farxiv.org\u002Fabs\u002F1312.6114)的**标准VAE损失**\n* 来自[《β-VAE：利用约束变分框架学习基本视觉概念》](https:\u002F\u002Fopenreview.net\u002Fpdf?id=Sy2fzU9gl)的**β-VAE\u003Csub>H\u003C\u002Fsub>**\n* 来自[《理解β-VAE中的解耦》](https:\u002F\u002Farxiv.org\u002Fabs\u002F1804.03599)的**β-VAE\u003Csub>B\u003C\u002Fsub>**\n* 来自[《通过因子分解实现解耦》](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.05983)的**FactorVAE**\n* 来自[《在变分自编码器中分离解耦源》](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.04942)的**β-TCVAE**\n\n注意事项：\n- 已测试兼容Python 3.6及以上版本\n- 已测试支持CPU和GPU\n\n目录：\n1. [安装](#install)\n2. [运行](#run)\n3. [绘图](#plot)\n3. [数据](#data)\n4. [我们的贡献](#our-contributions)\n5. [损失函数解释](#losses-explanation)\n6. 
[引用](#cite)\n\n## 安装\n\n```\n# 克隆仓库\npip install -r requirements.txt\n```\n\n## 运行\n\n使用`python main.py \u003C模型名称> \u003C参数>`来训练和\u002F或评估模型。例如：\n\n```\npython main.py btcvae_celeba_mini -d celeba -l btcvae --lr 0.001 -b 256 -e 5\n```\n\n您也可以使用`-x \u003C实验>`来运行预定义的实验及超参数，这些超参数可在`hyperparam.ini`中找到。每个实验的预训练模型可从`results\u002F\u003C实验>`目录中获取（通过`.\u002Fbin\u002Ftrain_all.sh`脚本生成）。\n\n### 输出\n这将创建一个名为`results\u002F\u003C保存名称>\u002F`的目录，其中包含：\n\n* **model.pt**：训练结束时的模型。\n* **model-**`i`**.pt**：每经过`i`次迭代后保存的模型检查点，默认每10次保存一次。\n* **specs.json**：运行程序时使用的参数（默认参数及通过命令行修改后的参数）。\n* **training.gif**：训练过程中每一epoch的潜在空间维度Z的遍历动画GIF。\n* **train_losses.log**：训练期间计算的所有（子）损失。\n* **test_losses.log**：训练结束后以评估模式运行时计算的所有（子）损失（不进行采样）。\n* **metrics.log**：[互信息差距](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.04942)指标和[轴对齐指标](#axis-alignment-metric)。仅当指定`--is-metric`时才会计算（较慢）。\n\n### 帮助\n```\n用法：main.py ...\n\nPyTorch实现与评估解耦变分自编码器及其指标。\n可选参数：\n  -h, --help            显示此帮助信息并退出\n\n通用选项：\n  name                  模型名称，用于存储或加载。\n  -L, --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}\n                        日志级别。（默认：info）\n  --no-progress-bar     禁用进度条。（默认：False）\n  --no-cuda             即使有CUDA设备也禁用CUDA训练。（默认：False）\n  -s, --seed SEED       随机种子。可以设置为`None`以获得随机行为。（默认：1234）\n\n训练特定选项：\n  --checkpoint-every CHECKPOINT_EVERY\n                        每隔n个epoch保存一次训练好的模型检查点。（默认：30）\n  -d, --dataset {mnist,fashion,dsprites,celeba,chairs}\n                        训练数据路径。（默认：mnist）\n  -x, --experiment {custom,debug,best_celeba,VAE_mnist,VAE_fashion,VAE_dsprites,VAE_celeba,VAE_chairs,betaH_mnist,betaH_fashion,betaH_dsprites,betaH_celeba,betaH_chairs,betaB_mnist,betaB_fashion,betaB_dsprites,betaB_celeba,betaB_chairs,factor_mnist,factor_fashion,factor_dsprites,factor_celeba,factor_chairs,btcvae_mnist,btcvae_fashion,btcvae_dsprites,btcvae_celeba,btcvae_chairs}\n                        预定义的实验任务。若非`custom`，则会覆盖其他部分参数。（默认：custom）\n  -e, --epochs EPOCHS   最大训练轮数。（默认：100）\n  -b, --batch-size BATCH_SIZE\n  
                      训练批次大小。（默认：64）\n  --lr LR               学习率。（默认：0.0005）\n\n模型特定选项：\n  -m, --model-type {Burgess}\n                        使用的编码器和解码器类型。（默认：Burgess）\n  -z, --latent-dim LATENT_DIM\n                        潜在变量的维度。（默认：10）\n  -l, --loss {VAE,betaH,betaB,factor,btcvae}\n                        使用的VAE损失函数类型。（默认：betaB）\n  -r, --rec-dist {bernoulli,laplace,gaussian}\n                        每个像素所使用的似然分布形式。（默认：伯努利）\n  -a, --reg-anneal REG_ANNEAL\n                        渐进式添加正则化项的退火步数。具体退火内容因损失而异。（默认：0）\n\nBetaH特有参数：\n  --betaH-B BETAH_B     KL散度项的权重（论文中的beta）。（默认：4）\n\nBetaB特有参数：\n  --betaB-initC BETAB_INITC\n                        初始退火容量。（默认：0）\n  --betaB-finC BETAB_FINC\n                        最终退火容量。（默认：25）\n  --betaB-G BETAB_G     KL散度项的权重（论文中的gamma）。（默认：1000）\n\nfactor VAE特有参数：\n  --factor-G FACTOR_G   TC项的权重（论文中的gamma）。（默认：6）\n  --lr-disc LR_DISC     判别器的学习率。（默认：5e-05）\n\nbeta-tcvae特有参数：\n  --btcvae-A BTCVAE_A   MI项的权重（论文中的alpha）。（默认：1）\n  --btcvae-G BTCVAE_G   维度级KL项的权重（论文中的gamma）。（默认：1）\n  --btcvae-B BTCVAE_B   TC项的权重（论文中的beta）。（默认：6）\n\n评估特定选项：\n  --is-eval-only        是否仅使用预先计算好的模型`name`进行评估。（默认：False）\n  --is-metrics          是否计算解耦指标。目前仅适用于`dsprites`数据集，因为它是唯一已知真实变化因素的数据集。（默认：False）\n  --no-test             是否不计算测试损失。（默认：False）\n  --eval-batchsize EVAL_BATCHSIZE\n                        评估时的批次大小。（默认：1000）\n```\n\n## 绘图\n\n使用 `python main_viz.py \u003Cmodel-name> \u003Cplot_types> \u003Cparam>` 可以利用预训练模型进行绘图。例如：\n\n```\npython main_viz.py btcvae_celeba_mini gif-traversals reconstruct-\n                        traverse -c 7 -r 6 -t 2 --is-posterior\n```\n\n这会将图表保存在模型目录 `results\u002F\u003Cmodel-name>\u002F` 中。所有实验生成的图表都位于各自的目录中（通过 `.\u002Fbin\u002Fplot_all.sh` 脚本创建）。\n\n### 帮助信息\n```\n用法: main_viz.py ...\n\n用于使用 `disvae` 的预训练模型进行绘图的命令行界面\n\n位置参数:\n  name                  模型名称，用于存储和加载。\n  {generate-samples,data-samples,reconstruct,traversals,reconstruct-traverse,gif-traversals,all}\n                        需要生成的所有图表类型。`generate-samples`: 
随机解码样本。`data-samples`: 数据集中的样本。`reconstruct`: 前半部分为原始样本，后半部分为对应的重建样本。`traversals`: 沿着最重要的几个潜在维度进行遍历，每列展示来自先验或后验分布的不同样本。`reconstruct-traverse`: 第一行是原始样本，第二行是重建样本，其余行是遍历结果。`gif-traversals`: 由多张动图组成的网格，行代表潜在维度，列代表示例，每张动图展示后验分布的遍历过程。`all`: 运行所有类型的图表。\n\n可选参数:\n  -h, --help            显示帮助信息并退出\n  -s, --seed SEED       随机种子。可以设置为 `None` 以获得随机行为。（默认值：None）\n  -r, --n-rows N_ROWS   可视化时使用的行数（如果适用）。（默认值：6）\n  -c, --n-cols N_COLS   可视化时使用的列数（如果适用）。（默认值：7）\n  -t, --max-traversal MAX_TRAVERSAL\n                        潜在变量遍历时的最大位移量。假设遍历是对称的。若 `m>=0.5`，则采用绝对值遍历；若 `m\u003C0.5`，则基于分布的百分比（分位数）进行遍历。例如，在先验分布中，分布为标准正态分布，因此 `m=0.45` 对应于绝对值 `1.645`，因为 `2m=90%` 的标准正态分布在 `-1.645` 到 `1.645` 之间。需要注意的是，在后验分布中，分布不再是标准正态分布。（默认值：2）\n  -i, --idcs IDCS [IDCS ...]\n                        放置在样本开头的图像索引列表。（默认值：空列表）\n  -u, --upsample-factor UPSAMPLE_FACTOR\n                        图像上采样的缩放因子（如果适用）。（默认值：1）\n  --is-show-loss        在图表中显示损失值（如果适用）。（默认值：False）\n  --is-posterior        使用后验分布而非先验分布进行遍历。（默认值：False）\n```\n\n### 示例\n\n以下是一些你可以生成的图表示例：\n\n* `python main_viz.py \u003Cmodel> reconstruct-traverse --is-show-loss --is-posterior` 第一行是原始样本，第二行是重建样本，其余行是潜在变量遍历结果。以 `btcvae_dsprites` 为例：\n\n    ![btcvae_dsprites reconstruct-traverse](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_1515905cc993.png)\n\n* `python main_viz.py \u003Cmodel> gif-traversals` 由多张动图组成的网格，行代表潜在维度，列代表示例，每张动图展示后验分布的遍历过程。以 `btcvae_celeba` 为例：\n\n    ![btcvae_celeba gif-traversals](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_4a0c07731ae6.gif)\n\n* 使用 `bin\u002Fplot_all.sh` 脚本生成的动图网格。网格的列对应不同的数据集（除 FashionMNIST 外），行对应不同的模型（顺序为：标准 VAE、β-VAE\u003Csub>H\u003C\u002Fsub>、β-VAE\u003Csub>B\u003C\u002Fsub>、FactorVAE、β-TCVAE）：\n\n    ![grid_posteriors](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_027567d3a036.gif)\n\n更多示例，请参阅预先定义的实验所生成的所有图表，它们都存放在各自的目录中（通过 `.\u002Fbin\u002Fplot_all.sh` 脚本创建）。\n\n## 
数据\n\n当前可用的数据集包括：\n- [MNIST](http:\u002F\u002Fyann.lecun.com\u002Fexdb\u002Fmnist\u002F)\n- [FashionMNIST](https:\u002F\u002Fgithub.com\u002Fzalandoresearch\u002Ffashion-mnist)\n- [3D Chairs](https:\u002F\u002Fwww.di.ens.fr\u002Fwillow\u002Fresearch\u002Fseeing3Dchairs)\n- [Celeba](http:\u002F\u002Fmmlab.ie.cuhk.edu.hk\u002Fprojects\u002FCelebA.html)\n- [2D Shapes \u002F Dsprites](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fdsprites-dataset\u002F)\n\n首次运行时，数据集将会被下载并存储在 `data` 目录下，供后续使用。下载可能需要一些时间，且如果下载链接发生变化，可能会导致无法继续下载。在这种情况下，您可以：\n\n1. 提交一个问题报告。\n2. 在 `utils\u002Fdatasets.py` 文件中修改您所需数据集的 URL（请记得提交一个 Pull Request :) ）。\n3. 手动下载数据并以相同文件名保存（不推荐）。\n\n## 我们的贡献\n\n除了复现上述论文之外，我们还提出并研究了以下内容：\n\n### 轴对齐度量指标\n\n由于定性评估具有主观性和耗时性，因此不适合用来可靠地比较不同模型。近期的一些论文采用了基于真实变化因素 **v** 和潜在维度 **z** 的定量解耦度量方法。其中，[互信息差距 (MIG)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.04942) 是一种颇具吸引力的信息论度量指标，因为它无需使用任何分类器。在 dSprites 数据集中，我们有 10 个潜在维度和 5 个生成因素，若要使 MIG 达到 1，就需要让其中 5 个潜在维度精确地编码这些真实的变化因素，而其余 5 个则与前 5 个无关。\n\n尽管长期来看我们希望使用类似 MIG 的指标，但目前大多数模型的得分并不理想，且难以明确其改进方向。因此，我们提出了轴对齐度量指标 AAM，该指标并不关注 **z** 编码了多少 **v** 的信息，而是关注每个 **v\u003Csub>k\u003C\u002Fsub>** 是否仅由单个 **z\u003Csub>j\u003C\u002Fsub>** 编码。例如，在 dSprites 数据集中，即使 **z** 只编码了形状 x 位置方差的 90%，只要这 90% 完全由单一潜在维度 **z\u003Csub>j\u003C\u002Fsub>** 编码，就可以达到 AAM = 1。这一指标有助于更好地理解各个模型的优势和不足。形式化定义如下：\n\n![轴对齐度量指标](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_65b4eca50369.png)\n\n其中下标 *(d)* 表示第 *d* 个次序统计量，而 *I\u003Csub>x\u003C\u002Fsub>* 则通过经验分布和分层抽样来估计（类似于 MIG）：\n\n![AAM 的互信息计算](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_2533661ae17d.png)\n\n### 单一模型对比\n\n该模型与所有损失函数解耦，因此在不修改损失函数的情况下，很容易对编码器\u002F解码器进行修改。我们仅使用单一模型，以便更客观地比较不同的损失函数。所使用的模型来自论文《理解 β-VAE 中的解耦》，其架构如下所示：\n\n![模型架构](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_ef4074d874e2.png)\n\n\n## 
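Losses Explanation\n\nAll the previous losses are special cases of the following loss:\n\n![Loss overview](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_readme_980688f77c48.png)\n\n1. **Index-code mutual information**: the mutual information between the latent variable **z** and the data variable **x**. There is debate in the literature about how to treat this term. From the information-bottleneck perspective it should be penalized. [InfoGAN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.03657), on the other hand, gets good results by increasing the mutual information (negative α). Finally, [Wasserstein Auto-Encoders](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.01558) drop this term entirely.\n\n2. **Total Correlation (TC)**: the KL divergence between the joint distribution of the latent variables and the product of their marginals, i.e., a measure of dependence between the latent dimensions. Increasing β forces the model to find statistically independent factors of variation in the data distribution.\n\n3. **Dimension-wise KL divergence**: the KL divergence between each dimension of the marginal posterior and the prior. This term keeps the learned latent space compact and close to the prior, which makes it possible to sample new examples.\n\nThe losses differ in how they estimate each of these terms and in the hyperparameters they use:\n\n* [**Standard VAE loss**](https:\u002F\u002Farxiv.org\u002Fabs\u002F1312.6114): α=β=ɣ=1. Each term is computed exactly via the closed-form KL divergence between the posterior and the prior. This is the tightest lower bound.\n* [**β-VAE\u003Csub>H\u003C\u002Fsub>**](https:\u002F\u002Fopenreview.net\u002Fpdf?id=Sy2fzU9gl): α=β=ɣ>1. Each term is also computed exactly in closed form. The loss simply adds a hyperparameter (β in the paper) in front of the KL divergence.\n* [**β-VAE\u003Csub>B\u003C\u002Fsub>**](https:\u002F\u002Farxiv.org\u002Fabs\u002F1804.03599): α=β=ɣ>1. Same as **β-VAE\u003Csub>H\u003C\u002Fsub>**, but the three terms are only penalized once they deviate from a capacity C that increases during training.\n* [**FactorVAE**](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.05983): α=ɣ=1, β>1. The KL divergence is computed exactly in closed form. A Total Correlation term, weighted by the paper's own ɣ hyperparameter, is added to the standard VAE loss. The total correlation is estimated with a classifier and the density-ratio trick. Because the TC term already appears once inside the KL divergence, the paper's ɣ corresponds to β−1 in our framework (i.e., β = ɣ+1). It is not the ɣ of our decomposition.\n* [**β-TCVAE**](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.04942): α=ɣ=1 (although these can be changed), β>1. Conceptually equivalent to FactorVAE, but each term is estimated separately using minibatch stratified sampling.\n\n## Citation\n\nIf you use any of the models implemented in this repository in academic work, please cite the corresponding paper (linked at the top of the README). To cite this specific implementation, you can use:\n\n```\n@misc{dubois2019dvae,\n  title        = {Disentangling VAE},\n  author       = {Dubois, Yann and Kastanos, Alexandros and Lines, Dave and Melman, Bart},\n  month        = {march},\n  year         = {2019},\n  howpublished = {\\url{http:\u002F\u002Fgithub.com\u002FYannDubs\u002Fdisentangling-vae\u002F}}\n}\n```","# disentangling-vae Quick Start Guide\n\n`disentangling-vae` is an open-source PyTorch project for studying and comparing disentanglement in variational autoencoders (VAEs). It implements five mainstream loss functions (standard VAE, β-VAE\u003Csub>H\u003C\u002Fsub>, β-VAE\u003Csub>B\u003C\u002Fsub>, FactorVAE, and β-TCVAE) and provides complete tooling for training, evaluation, and visualization.\n\n## 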
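Environment Setup\n\nBefore you start, make sure your development environment meets the following requirements:\n\n*   **Operating system**: Linux, macOS, or Windows\n*   **Python version**: >= 3.6 (3.8+ recommended)\n*   **Hardware**: both CPU and GPU (CUDA) training are supported\n*   **Core dependencies**: PyTorch, NumPy, Matplotlib, etc. (installed automatically via `requirements.txt`)\n\n> **Tips for faster downloads in mainland China**:\n> 1. Install the Python dependencies from a domestic mirror, such as the Tsinghua or Alibaba Cloud mirror, to speed up downloads.\n> 2. Datasets (e.g., MNIST, CelebA) are downloaded automatically on the first run. If the download from the official source fails, you can download the data manually into the `data` directory, or change the download links in `utils\u002Fdatasets.py`.\n\n## Installation\n\n1.  **Clone the repository**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002FYannDubs\u002Fdisentangling-vae.git\n    cd disentangling-vae\n    ```\n\n2.  **Install the dependencies**\n    Installing from a domestic mirror is recommended:\n    ```bash\n    pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n    ```\n\n## Basic Usage\n\n### 1. Train a model\n\nUse the `main.py` script to train or evaluate a model. You can specify the model type, dataset, loss function, and hyperparameters.\n\n**Example command**:\nThe following command trains a model named `btcvae_celeba_mini` on CelebA with the $\\beta$-TCVAE loss for 5 epochs (learning rate 0.001, batch size 256):\n\n```bash\npython main.py btcvae_celeba_mini -d celeba -l btcvae --lr 0.001 -b 256 -e 5\n```\n\n**Common arguments**:\n*   `-d`: dataset (`mnist`, `fashion`, `dsprites`, `celeba`, `chairs`)\n*   `-l`: loss function (`VAE`, `betaH`, `betaB`, `factor`, `btcvae`)\n*   `-e`: number of training epochs\n*   `-b`: batch size\n*   `--lr`: learning rate\n*   `-x`: use a predefined experiment configuration (overrides the other arguments), e.g. `-x best_celeba`\n\nAfter training, the results are saved in `results\u002F\u003Cmodel-name>\u002F`. They include the model weights (`model.pt`), the training logs, and GIFs of latent traversals.\n\n### 2. 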
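Visualize the results\n\nUse the `main_viz.py` script to visualize a trained model, e.g. with reconstructions and latent traversals.\n\n**Example command**:\nGenerate GIFs of latent traversals and a reconstruction-plus-traversal figure:\n\n```bash\npython main_viz.py btcvae_celeba_mini gif-traversals reconstruct-traverse -c 7 -r 6 -t 2 --is-posterior\n```\n\n**Common arguments**:\n*   `plot_types`: the plots to generate, e.g. `generate-samples`, `reconstruct`, `traversals`, `gif-traversals`, or `all`.\n*   `-r` \u002F `-c`: number of rows and columns in the output figures.\n*   `--is-posterior`: traverse the posterior distribution instead of the prior (default: prior).\n*   `--is-show-loss`: display the loss values on the figures.\n\nThe generated figures are saved in the corresponding `results\u002F\u003Cmodel-name>\u002F` directory.","A computer vision team is developing a generative-AI virtual fitting-room system. It needs a model that can independently control attributes such as pose, hairstyle, and clothing in images of people, in order to generate high-quality personalized outfit previews.\n\n### Without disentangling-vae\n- **Severely entangled features**: with a standard VAE, the latent variables are entangled, so trying to change a person's \"hairstyle\" often unintentionally distorts their \"facial expression\" or \"body pose\".\n- **Costly debugging**: the team cannot tell whether the problem lies in the model architecture or in the choice of loss function. It also lacks a unified framework for comparing disentanglement strategies such as $\\beta$-VAE and FactorVAE.\n- **No quantitative evaluation**: judging generated results by eye alone, the team cannot objectively measure how well features are separated with metrics such as the Mutual Information Gap (MIG), so it is unclear what to optimize.\n- **Hard to visualize**: without automated latent-space traversal tools, it is difficult to show non-technical product managers whether the model has actually learned to control features independently.\n\n### With disentangling-vae\n- **Precise control**: using loss functions such as $\\beta$-TCVAE or FactorVAE, the team separates pose, texture, and shape into independent latent dimensions. This enables precise edits such as \"change the clothes without changing the face\".\n- **Efficient strategy comparison**: with the built-in single-architecture comparison, the team quickly tests 5 different loss functions on the same dataset and settles on the $\\beta$-VAE$_B$ configuration as the best fit for fashion data.\n- **Quantitative, scientific iteration**: the MIG and axis-alignment metrics reported by the tool let the team monitor training progress in a data-driven way, improving feature disentanglement by 40%.\n- **Intuitive demos**: automatically generated latent-traversal GIFs clearly show the single semantic feature controlled by each dimension. This greatly reduces internal communication costs and speeds up product demo preparation.\n\nBy providing a standardized experimental framework and quantitative metrics for disentanglement, disentangling-vae turns intuition-driven, black-box tuning into a measurable and reproducible engineering process.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FYannDubs_disentangling-vae_1515905c.png","YannDubs","Yann Dubois","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FYannDubs_e45fa8a8.jpg","Building open AI.","Stanford PhD in AI","Stanford","yanndubois96@gmail.com","yanndubs","yanndubs.github.io","https:\u002F\u002Fgithub.com\u002FYannDubs",[83,87],{"name":84,"color":85,"percentage":86},"Python","#3572A5",96.5,{"name":88,"color":89,"percentage":90},"Shell","#89e051",3.5,841,148,"2026-02-22T13:40:56","NOASSERTION","Not specified","Not required (runs on both CPU and GPU); specific GPU model, VRAM size, and CUDA version not specified",{"notes":98,"python":99,"dependencies":100},"This tool is for studying disentanglement in variational autoencoders (VAEs). On the first run it automatically downloads the datasets (e.g., MNIST, CelebA, 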
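dSprites, etc.) and stores them in the data folder at the project root; if a download link stops working, the URL must be changed manually in the code. Training and evaluation can be configured flexibly via command-line arguments, with support for multiple loss functions and predefined experiments.","3.6+",[101,102],"torch (PyTorch)","Other dependencies listed in requirements.txt (the full list is not shown in the README)",[15,104,16,14],"Other",[106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122],"beta-vae","factor-vae","vae","variational-autoencoder","unsupervised-learning","celeba","dsprites","beta-tcvae","disentangled-representations","chairs-dataset","mnist","fashion-mnist","representation-learning","reproducible-research","deep-learning","pytorch","disentanglement",null,"2026-03-27T02:49:30.150509","2026-04-18T14:32:13.329432",[127,132,137,142,147,152,157],{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},39669,"How do I fix the gradient computation error caused by an \"inplace operation\" when running FactorVAE?","This error usually arises in the discriminator loss computation. Setting the `inplace` argument of `leaky_relu` to `False` may not help; what matters is that the gradients are computed correctly. The maintainer points out that the code detaches the second sample (`z_perm = _permute_dims(latent_sample2).detach()`). This ensures that gradients of the discriminator loss do not flow into the VAE, which keeps the gradient computation correct. This may slightly change the optimization dynamics (the discriminator tries to distinguish samples from the \"previous\" step), but it does not affect the final results. If you must resolve the version conflict without changing the logic, you may need to recompute the first sample to match the separate optimizer step. Usually, though, the existing detach is sufficient.","https:\u002F\u002Fgithub.com\u002FYannDubs\u002Fdisentangling-vae\u002Fissues\u002F62",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},39670,"Why does the total correlation loss (tc_loss) of Beta-TC VAE become negative? Shouldn't a KL divergence be non-negative?","The true KL divergence is always non-negative, but its estimate can be negative because it is computed with a stochastic approximation. The maintainer suggests increasing the batch size, which usually makes the estimate more accurate and reduces negative values. The project's training logs also show that even in the official results the tc_loss of Beta-TC VAE can be negative, whereas the tc_loss of FactorVAE is usually positive. This depends on the specific estimator (e.g., MSS vs. MWS) and the model settings.","https:\u002F\u002Fgithub.com\u002FYannDubs\u002Fdisentangling-vae\u002Fissues\u002F54",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},39671,"How do I fix the \"num_samples=0\" error that prevents the dataset from loading?","This error usually means that the custom dataset class does not report the number of samples correctly, or that the data path is misconfigured. The `dataset` argument is not a flag for passing a file path; it selects one of the predefined datasets. To use your own dataset, you must define a new class that inherits from PyTorch's `Dataset`. You can follow the existing CelebA implementation in the project (utils\u002Fdatasets.py) and make sure `__len__` returns a positive integer number of samples. If the data has already been downloaded, you can make the `download` method of your custom class simply `return` 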
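to skip the download step.","https:\u002F\u002Fgithub.com\u002FYannDubs\u002Fdisentangling-vae\u002Fissues\u002F51",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},39672,"How can I apply this project to my own dataset?","You need to create a dataset class that works with PyTorch's `DataLoader`. Specifically, define a class that returns the image data and, if needed, the corresponding labels. See the code example around line 213 of `utils\u002Fdatasets.py` for the expected return format. If your folders are organized as \"category - subcategory - images\", write logic that walks the directories and loads the images. Make sure that `__getitem__` returns tensors in the correct format and that `__len__` returns the total number of samples.","https:\u002F\u002Fgithub.com\u002FYannDubs\u002Fdisentangling-vae\u002Fissues\u002F73",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},39673,"Why is H_z (the entropy of the latent variables) estimated by sampling rather than computed by integration?","The goal is to estimate the entropy of each individual latent dimension $z_j$, not of the whole vector $z$. The density could in principle be integrated, but \"estimation by quantization\" becomes prohibitively expensive in high-dimensional spaces. The paper also notes that beyond the scalar case, estimating by sampling is the more practical approach. Analyzing the error of a single run is not meaningful because the estimate is a stochastic approximation. Instead, run many trials (e.g., about 10,000) and average them; the average will be very close to the true value of $H_z$.","https:\u002F\u002Fgithub.com\u002FYannDubs\u002Fdisentangling-vae\u002Fissues\u002F59",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},39674,"Training reports that it is using the CPU instead of the GPU even though a CUDA device is available. How do I enable GPU training?","By default, the code should use the GPU automatically when a CUDA device is detected. If the log shows `Training Device: cpu`, first check that your PyTorch installation supports CUDA (you can verify this with `torch.cuda.is_available()`) and that no other process is occupying the GPU. If the environment is fine but the CPU is still used, check whether the code explicitly sets the device. You can also select an available GPU ID with the `CUDA_VISIBLE_DEVICES` environment variable when running the command. Normally no extra arguments are needed: as long as PyTorch can see the GPU, the script uses it automatically.","https:\u002F\u002Fgithub.com\u002FYannDubs\u002Fdisentangling-vae\u002Fissues\u002F57",{"id":158,"question_zh":159,"answer_zh":160,"source_url":161},39675,"Can you share the PlotNeuralNet code used to generate the model architecture figure?","The maintainer confirmed that the architecture figure was generated with PlotNeuralNet. Unfortunately, the code for that specific figure has been deleted and cannot be shared. The maintainer recalls that the process was fairly straightforward: it mostly reused existing PlotNeuralNet blocks with a few tweaks. You can refer to the official PlotNeuralNet documentation and examples and build a similar VAE architecture diagram from its basic components.","https:\u002F\u002Fgithub.com\u002FYannDubs\u002Fdisentangling-vae\u002Fissues\u002F65",[]]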