[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-taesungp--swapping-autoencoder-pytorch":3,"tool-taesungp--swapping-autoencoder-pytorch":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 
是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":80,"owner_email":80,"owner_twitter":80,"owner_website":80,"owner_url":81,"languages":82,"stars":95,"forks":96,"last_commit_at":97,"license":98,"difficulty_score":99,"env_os":100,"env_gpu":101,"env_ram":100,"env_deps":102,"category_tags":116,"github_topics":80,"view_count":23,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":117,"updated_at":118,"faqs":119,"releases":155},3170,"taesungp\u002Fswapping-autoencoder-pytorch","swapping-autoencoder-pytorch","Official Implementation of Swapping Autoencoder for Deep Image Manipulation (NeurIPS 2020)","swapping-autoencoder-pytorch 是 NeurIPS 2020 论文《Swapping Autoencoder for Deep Image Manipulation》的官方 PyTorch 实现，由加州大学伯克利分校与 Adobe Research 联合开源。这款工具旨在解决深度图像编辑中“结构”与“纹理”难以独立控制的难题。它通过一种创新的交换自编码器架构，能够将输入图像编码为两个独立的潜在代码：一个是保留空间布局的结构代码，另一个是捕捉风格信息的纹理代码。\n\n借助这种分离机制，用户可以轻松地将一张图片的构图与另一张图片的纹理进行无缝融合，例如把教堂的建筑结构赋予森林的质感，同时保持生成图像的高度逼真与自然。其核心技术亮点在于引入了补丁共现判别器（patch co-occurrence discriminator），确保替换后的纹理在局部细节上与参考图一致，避免了传统方法中常见的伪影或不协调感。\n\nswapping-autoencoder-pytorch 主要适合人工智能研究人员、计算机视觉开发者以及需要高质量图像合成能力的数字艺术家使用。由于项目依赖自定义","swapping-autoencoder-pytorch 是 NeurIPS 2020 论文《Swapping Autoencoder for Deep Image Manipulation》的官方 PyTorch 
实现，由加州大学伯克利分校与 Adobe Research 联合开源。这款工具旨在解决深度图像编辑中“结构”与“纹理”难以独立控制的难题。它通过一种创新的交换自编码器架构，能够将输入图像编码为两个独立的潜在代码：一个是保留空间布局的结构代码，另一个是捕捉风格信息的纹理代码。\n\n借助这种分离机制，用户可以轻松地将一张图片的构图与另一张图片的纹理进行无缝融合，例如把教堂的建筑结构赋予森林的质感，同时保持生成图像的高度逼真与自然。其核心技术亮点在于引入了补丁共现判别器（patch co-occurrence discriminator），确保替换后的纹理在局部细节上与参考图一致，避免了传统方法中常见的伪影或不协调感。\n\nswapping-autoencoder-pytorch 主要适合人工智能研究人员、计算机视觉开发者以及需要高质量图像合成能力的数字艺术家使用。由于项目依赖自定义 CUDA 内核及特定的环境配置（如 PyTorch 1.7+ 和 CUDA 10.1+），它更偏向于具备一定深度学习工程基础的专业用户，而非普通大众。对于希望探索可控图像生成、风格迁移或进行相关算法研究的用户来说，这是一个极具参考价值的高质量开源项目。","# Swapping Autoencoder for Deep Image Manipulation\n\n[Taesung Park](http:\u002F\u002Ftaesung.me\u002F), [Jun-Yan Zhu](https:\u002F\u002Fwww.cs.cmu.edu\u002F~junyanz\u002F), [Oliver Wang](http:\u002F\u002Fwww.oliverwang.info\u002F), [Jingwan Lu](https:\u002F\u002Fresearch.adobe.com\u002Fperson\u002Fjingwan-lu\u002F), [Eli Shechtman](https:\u002F\u002Fresearch.adobe.com\u002Fperson\u002Feli-shechtman\u002F), [Alexei A. Efros](http:\u002F\u002Fwww.eecs.berkeley.edu\u002F~efros\u002F), [Richard Zhang](https:\u002F\u002Frichzhang.github.io\u002F)\n\nUC Berkeley and Adobe Research\n\nNeurIPS 2020\n\n![teaser](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_3946949bdeb1.jpg)\n\u003Cp float=\"left\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_703e505927a2.gif\" height=\"190\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_86fd5315d6fb.gif\" height=\"190\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_b6960842fd63.gif\" height=\"190\" \u002F>\n\u003C\u002Fp>\n\n### [Project page](https:\u002F\u002Ftaesung.me\u002FSwappingAutoencoder\u002F) |   [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.00653) | [3 Min Video](https:\u002F\u002Fyoutu.be\u002F0elW11wRNpg)\n\n\n## 
Overview\n\u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_f059ed3b71d1.jpg' width=\"1000px\"\u002F>\n\n**Swapping Autoencoder** consists of autoencoding (top) and swapping (bottom) operations.\n**Top**: An encoder E embeds an input (Notre-Dame) into two codes. The structure code is a tensor with spatial dimensions; the texture code is a 2048-dimensional vector. Decoding with generator G should produce a realistic image (enforced by discriminator D) matching the input (reconstruction loss).\n**Bottom**: Decoding with the texture code from a second image (Saint Basil's Cathedral) should look realistic (via D) and match the texture of that image, by training with a patch co-occurrence discriminator Dpatch that enforces that the output and reference patches look indistinguishable.\n\n## Installation \u002F Requirements\n\n- CUDA 10.1 or newer is required because the code uses a custom CUDA kernel of [StyleGAN2](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Fstylegan2\u002F), ported by [@rosinality](https:\u002F\u002Fgithub.com\u002Frosinality\u002Fstylegan2-pytorch)\n- The author used PyTorch 1.7.1 on Python 3.6\n- Install dependencies with `pip install dominate torchgeometry func-timeout tqdm matplotlib opencv_python lmdb numpy GPUtil Pillow scikit-learn visdom ninja`\n\n## Testing and Evaluation.\n\nWe provide the pretrained models and also several images that reproduce the figures of the paper. Please download and unzip them [here (2.1GB)](http:\u002F\u002Fefrosgans.eecs.berkeley.edu\u002FSwappingAutoencoder\u002Fswapping_autoencoder_models_and_test_images.zip) (Note: this is an http (not https) address, and you may need to paste in the link URL directly in the address bar for some browsers like Chrome, or download the archive using `wget`). 
The scripts assume that the checkpoints are at `.\u002Fcheckpoints\u002F`, and the test images at `.\u002Ftestphotos\u002F`, but they can be changed by modifying `--checkpoints_dir` and `--dataroot` options.\n\nUPDATE: The pretrained model for the AFHQ dataset was added. Please download the models and samples images [here (256MB)](http:\u002F\u002Fefrosgans.eecs.berkeley.edu\u002FSwappingAutoencoder\u002Fafhq_models_and_test_images.zip) (Note: again, you may need to paste in the link URL directly in the address bar).\n\n### Swapping and Interpolation of the mountain model using sample images\n\n\u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_97ff73418e2d.png' width=\"1000px\"\u002F>\n\nTo run simple swapping and interpolation, specify the two input reference images, change `input_structure_image` and `input_texture_image` fields of\n`experiments\u002Fmountain_pretrained_launcher.py`, and run\n```bash\npython -m experiments mountain_pretrained test simple_swapping\npython -m experiments mountain_pretrained test simple_interpolation\n```\n\nThe provided script, `opt.tag(\"simple_swapping\")` and `opt.tag(\"simple_interpolation\")` in particular of `experiments\u002Fmountain_pretrained_launcher.py`, invokes a terminal command that looks similar to the following one.\n\n```bash\npython test.py --evaluation_metrics simple_swapping \\\n--preprocess scale_shortside --load_size 512 \\\n--name mountain_pretrained  \\\n--input_structure_image [path_to_sample_image] \\\n--input_texture_image [path_to_sample_image] \\\n--texture_mix_alpha 0.0 0.25 0.5 0.75 1.0\n```\n\nIn other words, feel free to use this command if that feels more straightforward.\n\nThe output images are saved at `.\u002Fresults\u002Fmountain_pretrained\u002Fsimpleswapping\u002F`.\n\n### Texture Swapping\n\n\u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_ca5e0b650d60.jpg' 
width=\"1000px\"\u002F>\nOur Swapping Autoencoder learns to disentangle texture from structure for image editing tasks such as texture swapping.  Each row shows the result of combining the structure code of the leftmost image with the texture code of the top image.\n\nTo reproduce this image (Figure 4) as well as Figures 9 and 12 of the paper, run\nthe following command:\n```bash\n\n# Reads options from .\u002Fexperiments\u002Fchurch_pretrained_launcher.py\npython -m experiments church_pretrained test swapping_grid\n\n# Reads options from .\u002Fexperiments\u002Fbedroom_pretrained_launcher.py\npython -m experiments bedroom_pretrained test swapping_grid\n\n# Reads options from .\u002Fexperiments\u002Fmountain_pretrained_launcher.py\npython -m experiments mountain_pretrained test swapping_grid\n\n# Reads options from .\u002Fexperiments\u002Fffhq512_pretrained_launcher.py\npython -m experiments ffhq512_pretrained test swapping_grid\n```\n\nMake sure the `dataroot` and `checkpoints_dir` paths are correctly set in\nthe respective `.\u002Fexperiments\u002Fxx_pretrained_launcher.py` script.\n\n### Quantitative Evaluations\n\nTo perform quantitative evaluation such as FID in Table 1, Fig 5, and Table 2, we first need to prepare image pairs of input structure and texture references images.\n\nThe reference images are randomly selected from the val set of LSUN, FFHQ, and the Waterfalls dataset. The pairs of input structure and texture images should be located at `input_structure\u002F` and `input_style\u002F` directory, with the same file name. 
For example, `input_structure\u002F001.png` and `input_style\u002F001.png` will be loaded together for swapping.\n\nReplace the path to the test images at `dataroot=\".\u002Ftestphotos\u002Fchurch\u002Ffig5_tab2\u002F\"` field of the script `experiments\u002Fchurch_pretrained_launcher.py`, and run\n```bash\npython -m experiments church_pretrained test swapping_for_eval\npython -m experiments ffhq1024_pretrained test swapping_for_eval\n```\n\nThe results can be viewed at `.\u002Fresults` (that can be changed using `--result_dir` option).\n\nThe FID is then computed between the swapped images and the original structure images, using https:\u002F\u002Fgithub.com\u002Fmseitzer\u002Fpytorch-fid.\n\n## Model Training.\n\n### Datasets\n\n- *LSUN Church and Bedroom* datasets can be downloaded [here](https:\u002F\u002Fgithub.com\u002Ffyu\u002Flsun). Once downloaded and unzipped, the directories should contain `[category]_[train\u002Fval]_lmdb\u002F`.\n- [*FFHQ datasets*](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Fffhq-dataset) can be downloaded using this [link](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1WvlAIvuochQn_L_f9p3OdFdTiSLlnnhv\u002Fview?usp=sharing). This is the zip file of 70,000 images at 1024x1024 resolution. Unzip the files, and we will load the image files directly.\n- The *Flickr Mountains* dataset and the *Flickr Waterfall* dataset are not sharable due to license issues. But the images were scraped from [Mountains Anywhere](https:\u002F\u002Fflickr.com\u002Fgroups\u002F62119907@N00\u002F) and [Waterfalls Around the World](https:\u002F\u002Fflickr.com\u002Fgroups\u002F52241685729@N01\u002F), using the [Python wrapper for the Flickr API](https:\u002F\u002Fgithub.com\u002Falexis-mignon\u002Fpython-flickr-api). 
Please contact [Taesung Park](http:\u002F\u002Ftaesung.me\u002F) with the title \"Flickr Dataset for Swapping Autoencoder\" for more details.\n- *AFHQ dataset* can be downloaded [here](https:\u002F\u002Fgithub.com\u002Fclovaai\u002Fstargan-v2\u002Fblob\u002Fmaster\u002FREADME.md#animal-faces-hq-dataset-afhq). \n\n### Training Scripts\n\nThe training configurations are specified using the scripts in `experiments\u002F*_launcher.py`. Use the following commands to launch various trainings.\n\n```bash\n# Modify |dataroot| and |checkpoints_dir| at\n# experiments\u002F[church,bedroom,ffhq,mountain]_launcher.py\npython -m experiments church train church_default\npython -m experiments bedroom train bedroom_default\npython -m experiments ffhq train ffhq512_default\npython -m experiments ffhq train ffhq1024_default\n\n# By default, the script uses GPUtil to look at available GPUs\n# on the machine and sets appropriate GPU IDs. To specify a specific set of GPUs,\n# use the |--gpu| option. Be sure to also change the |num_gpus| option in the corresponding script.\npython -m experiments church train church_default --gpu 01234567\n\n```\n\nThe training progress can be monitored using `visdom` at the port number specified by `--display_port`. The default is https:\u002F\u002Flocalhost:2004. For reference, the training takes 14 days on LSUN Church 256px, using 4 V100 GPUs. \n\nAdditionally, a few swapping grids are generated using random samples of the training set.\nThey are saved as webpages at `[checkpoints_dir]\u002F[expr_name]\u002Fsnapshots\u002F`.\nThe frequency of the grid generation is controlled using `--evaluation_freq`.\n\nAll configurable parameters are printed at the beginning of training. These configurations are spread throughout the code in `def modify_commandline_options` of relevant classes, such as `models\u002Fswapping_autoencoder_model.py`, `util\u002Fiter_counter.py`, or `models\u002Fnetworks\u002Fencoder.py`. 
To change these configurations, simply modify the corresponding option in `opt.specify` of the training script.\n\nThe code for parsing options and configurations is in `experiments\u002F__init__.py, experiments\u002F__main__.py, experiments\u002Ftmux_launcher.py`.\n\n### Continuing training.\n\nThe training continues by default from the last checkpoint, because the `--continue_train` option is set to True by default.\nTo start from scratch, remove the checkpoint, or specify `continue_train=False` in the training script (e.g. `experiments\u002Fchurch_launcher.py`).\n\n## Code Structure (Main Functions)\n\n- `models\u002Fswapping_autoencoder_model.py`: The core file that defines the losses and produces visuals.\n- `optimizers\u002Fswapping_autoencoder_optimizer.py`: Defines the optimizers and the alternating training of the GAN.\n- `models\u002Fnetworks\u002F`: contains the model architectures `generator.py`, `discriminator.py`, `encoder.py`, `patch_discrimiantor.py`, `stylegan2_layers.py`.\n- `options\u002F__init__.py`: contains basic option flags. BUT many important flags are spread out over files, such as `swapping_autoencoder_model.py` or `generator.py`. When the program starts, these options are all parsed together. The best way to check the list of options in use is to run the training script and look at the console output of the configured options.\n- `util\u002Fiter_counter.py`: contains iteration counting.\n\n## Change Log\n\n- 4\u002F14\u002F2021: The configuration to train the pretrained model on the Mountains dataset had not been set correctly, and was updated accordingly. \n- 10\u002F14\u002F2021: The 256x256 pretrained model for the AFHQ dataset was added. Please use `experiments\u002Fafhq_pretrained_launcher.py`. 
\n\n## Bibtex\nIf you use this code for your research, please cite our paper:\n```\n@inproceedings{park2020swapping,\n  title={Swapping Autoencoder for Deep Image Manipulation},\n  author={Park, Taesung and Zhu, Jun-Yan and Wang, Oliver and Lu, Jingwan and Shechtman, Eli and Efros, Alexei A. and Zhang, Richard},\n  booktitle={Advances in Neural Information Processing Systems},\n  year={2020}\n}\n```\n## Acknowledgment\n\nThe StyleGAN2 layers heavily borrows (or rather, directly copies!) the PyTorch implementation of [@rosinality](https:\u002F\u002Fgithub.com\u002Frosinality\u002Fstylegan2-pytorch). We thank Nicholas Kolkin for the helpful discussion on the automated content and style evaluation, Jeongo Seo and Yoseob Kim for advice on the user interface, and William T. Peebles, Tongzhou Wang, and Yu Sun for the discussion on disentanglement.\n","# 用交换自编码器进行深度图像 manipulation\n\n[Taesung Park](http:\u002F\u002Ftaesung.me\u002F)、[Jun-Yan Zhu](https:\u002F\u002Fwww.cs.cmu.edu\u002F~junyanz\u002F)、[Oliver Wang](http:\u002F\u002Fwww.oliverwang.info\u002F)、[Jingwan Lu](https:\u002F\u002Fresearch.adobe.com\u002Fperson\u002Fjingwan-lu\u002F)、[Eli Shechtman](https:\u002F\u002Fresearch.adobe.com\u002Fperson\u002Feli-shechtman\u002F)、[Alexei A. 
Efros](http:\u002F\u002Fwww.eecs.berkeley.edu\u002F~efros\u002F)、[Richard Zhang](https:\u002F\u002Frichzhang.github.io\u002F)\n\n加州大学伯克利分校与 Adobe 研究院\n\nNeurIPS 2020\n\n![teaser](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_3946949bdeb1.jpg)\n\u003Cp float=\"left\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_703e505927a2.gif\" height=\"190\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_86fd5315d6fb.gif\" height=\"190\" \u002F>\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_b6960842fd63.gif\" height=\"190\" \u002F>\n\u003C\u002Fp>\n\n### [项目页面](https:\u002F\u002Ftaesung.me\u002FSwappingAutoencoder\u002F) |   [论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.00653) | [3 分钟视频](https:\u002F\u002Fyoutu.be\u002F0elW11wRNpg)\n\n\n## 概述\n\u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_f059ed3b71d1.jpg' width=\"1000px\"\u002F>\n\n**交换自编码器**由编码（顶部）和交换（底部）两部分组成。\n**顶部**：编码器 E 将输入图像（巴黎圣母院）编码为两个代码。结构代码是一个具有空间维度的张量；纹理代码是一个 2048 维向量。使用生成器 G 解码后应生成一张逼真的图像（由判别器 D 确保与输入匹配，即重建损失）。\n**底部**：使用来自第二张图像（圣巴西尔大教堂）的纹理代码解码时，应在判别器 D 的监督下看起来逼真，并且纹理与目标图像一致。为此，通过训练一个补丁共现判别器 Dpatch 来确保输出图像和参考图像的局部区域难以区分。\n\n## 安装 \u002F 要求\n\n- 需要 CUDA 10.1 或更高版本，因为该模型使用了 [StyleGAN2](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Fstylegan2\u002F) 的自定义 CUDA 内核，由 [@rosinality](https:\u002F\u002Fgithub.com\u002Frosinality\u002Fstylegan2-pytorch) 移植而来。\n- 作者在 Python 3.6 上使用了 PyTorch 1.7.1。\n- 使用 `pip install dominate torchgeometry func-timeout tqdm matplotlib opencv_python lmdb numpy GPUtil Pillow scikit-learn visdom ninja` 安装依赖项。\n\n## 测试与评估。\n\n我们提供了预训练好的模型以及用于复现论文中各图的若干示例图片。请从 [这里 
(2.1GB)](http:\u002F\u002Fefrosgans.eecs.berkeley.edu\u002FSwappingAutoencoder\u002Fswapping_autoencoder_models_and_test_images.zip) 下载并解压（注意：这是一个 http（而非 https）地址，某些浏览器如 Chrome 可能需要将链接直接粘贴到地址栏中，或者使用 `wget` 下载数据集）。脚本假设检查点位于 `.\u002Fcheckpoints\u002F`，测试图片位于 `.\u002Ftestphotos\u002F`，但可以通过修改 `--checkpoints_dir` 和 `--dataroot` 参数来更改路径。\n\n更新：新增了 AFHQ 数据集的预训练模型。请从 [这里 (256MB)](http:\u002F\u002Fefrosgans.eecs.berkeley.edu\u002FSwappingAutoencoder\u002Fafhq_models_and_test_images.zip) 下载模型和示例图片（同样，可能需要将链接直接粘贴到地址栏中）。\n\n### 使用示例图片对山景模型进行交换与插值\n\n\u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_97ff73418e2d.png' width=\"1000px\"\u002F>\n\n要运行简单的交换和插值操作，需指定两幅参考图像，修改 `experiments\u002Fmountain_pretrained_launcher.py` 中的 `input_structure_image` 和 `input_texture_image` 字段，然后运行：\n```bash\npython -m experiments mountain_pretrained test simple_swapping\npython -m experiments mountain_pretrained test simple_interpolation\n```\n\n提供的脚本，尤其是 `experiments\u002Fmountain_pretrained_launcher.py` 中的 `opt.tag(\"simple_swapping\")` 和 `opt.tag(\"simple_interpolation\")`，会调用类似于以下的终端命令：\n```bash\npython test.py --evaluation_metrics simple_swapping \\\n--preprocess scale_shortside --load_size 512 \\\n--name mountain_pretrained  \\\n--input_structure_image [样本图片路径] \\\n--input_texture_image [样本图片路径] \\\n--texture_mix_alpha 0.0 0.25 0.5 0.75 1.0\n```\n\n换句话说，如果觉得这样更直观，也可以直接使用此命令。\n\n输出图像将保存在 `.\u002Fresults\u002Fmountain_pretrained\u002Fsimpleswapping\u002F` 目录下。\n\n### 纹理交换\n\n\u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_readme_ca5e0b650d60.jpg' width=\"1000px\"\u002F>\n我们的交换自编码器能够学习将纹理与结构分离，从而实现诸如纹理交换之类的图像编辑任务。每一行展示了最左侧图像的结构代码与上方图像的纹理代码结合后的结果。\n\n要复现这张图（图 4）以及论文中的图 9 和图 12，可运行以下命令：\n```bash\n\n# 读取 .\u002Fexperiments\u002Fchurch_pretrained_launcher.py 中的选项\npython -m experiments church_pretrained test swapping_grid\n\n# 读取 
.\u002Fexperiments\u002Fbedroom_pretrained_launcher.py 中的选项\npython -m experiments bedroom_pretrained test swapping_grid\n\n# 读取 .\u002Fexperiments\u002Fmountain_pretrained_launcher.py 中的选项\npython -m experiments mountain_pretrained test swapping_grid\n\n# 读取 .\u002Fexperiments\u002Fffhq512_pretrained_launcher.py 中的选项\npython -m experiments ffhq512_pretrained test swapping_grid\n```\n\n请确保在相应的 `.\u002Fexperiments\u002Fxx_pretrained_launcher.py` 脚本中正确设置了 `dataroot` 和 `checkpoints_dir` 路径。\n\n### 定量评估\n\n要进行表 1、图 5 和表 2 中提到的 FID 等定量评估，首先需要准备输入结构和纹理参考图像的配对。\n\n这些参考图像随机选自 LSUN、FFHQ 和瀑布数据集的验证集。输入结构和纹理图像应分别放在 `input_structure\u002F` 和 `input_style\u002F` 目录中，且文件名相同。例如，`input_structure\u002F001.png` 和 `input_style\u002F001.png` 将被一起加载以进行交换。\n\n将脚本 `experiments\u002Fchurch_pretrained_launcher.py` 中的 `dataroot=\".\u002Ftestphotos\u002Fchurch\u002Ffig5_tab2\u002F\"` 字段替换为测试图片的路径，然后运行：\n```bash\npython -m experiments church_pretrained test swapping_for_eval\npython -m experiments ffhq1024_pretrained test swapping_for_eval\n```\n\n结果可在 `.\u002Fresults` 目录中查看（可通过 `--result_dir` 参数更改保存路径）。\n\n随后，使用 https:\u002F\u002Fgithub.com\u002Fmseitzer\u002Fpytorch-fid 计算交换后的图像与原始结构图像之间的 FID 值。\n\n## 模型训练。\n\n### 数据集\n\n- *LSUN 教堂和卧室* 数据集可以从 [这里](https:\u002F\u002Fgithub.com\u002Ffyu\u002Flsun) 下载。下载并解压后，目录应包含 `[category]_[train\u002Fval]_lmdb\u002F`。\n- [*FFHQ 数据集*](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Fffhq-dataset) 可以通过此 [链接](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1WvlAIvuochQn_L_f9p3OdFdTiSLlnnhv\u002Fview?usp=sharing) 下载。这是一个包含 70,000 张 1024x1024 分辨率图像的压缩文件。解压后，我们将直接加载这些图像文件。\n- *Flickr 山脉* 数据集和 *Flickr 瀑布* 数据集由于许可问题无法公开分享。不过，这些图片是从 [Mountains Anywhere](https:\u002F\u002Fflickr.com\u002Fgroups\u002F62119907@N00\u002F) 和 [Waterfalls Around the World](https:\u002F\u002Fflickr.com\u002Fgroups\u002F52241685729@N01\u002F) 收集而来，使用了 [Flickr API 的 Python 封装库](https:\u002F\u002Fgithub.com\u002Falexis-mignon\u002Fpython-flickr-api)。如需更多详情，请以“用于交换自编码器的 Flickr 数据集”为标题联系 
[Taesung Park](http:\u002F\u002Ftaesung.me\u002F)。\n- *AFHQ 数据集* 可以从 [这里](https:\u002F\u002Fgithub.com\u002Fclovaai\u002Fstargan-v2\u002Fblob\u002Fmaster\u002FREADME.md#animal-faces-hq-dataset-afhq) 下载。\n\n### 训练脚本\n\n训练配置通过 `experiments\u002F*_launcher.py` 中的脚本指定。可以使用以下命令启动各种训练：\n\n```bash\n# 修改 experiments\u002F[church,bedroom,ffhq,mountain]_launcher.py 中的 |dataroot| 和 |checkpoints_dir|\npython -m experiments church train church_default\npython -m experiments bedroom train bedroom_default\npython -m experiments ffhq train ffhq512_default\npython -m experiments ffhq train ffhq1024_default\n\n# 默认情况下，脚本会使用 GPUtil 检查机器上可用的 GPU，并设置合适的 GPU ID。若要指定特定的 GPU 集合，可使用 |--gpu| 选项。同时请务必在相应脚本中更改 |num_gpus| 选项。\npython -m experiments church train church_default --gpu 01234567\n```\n\n训练进度可以通过 `visdom` 在 `--display_port` 指定的端口上监控。默认端口是 https:\u002F\u002Flocalhost:2004。作为参考，在 LSUN 教堂 256px 数据集上，使用 4 块 V100 GPU 进行训练大约需要 14 天。\n\n此外，还会使用训练集的随机样本生成若干交换网格，并将其保存为网页文件，路径为 `[checkpoints_dir]\u002F[expr_name]\u002Fsnapshots\u002F`。网格生成的频率由 `--evaluation_freq` 控制。\n\n所有可配置参数都会在训练开始时打印出来。这些配置分散在相关类别的代码中，例如 `models\u002Fswapping_autoencoder_model.py`、`util\u002Fiter_counter.py` 或 `models\u002Fnetworks\u002Fencoder.py` 中的 `def modify_commandline_options` 函数内。若需更改这些配置，只需在训练脚本（如 `experiments\u002Fchurch_launcher.py`）中的 `opt.specify` 部分修改相应的选项即可。\n\n解析和配置相关的代码位于 `experiments\u002F__init__.py`、`experiments\u002F__main__.py` 和 `experiments\u002Ftmux_launcher.py` 中。\n\n### 继续训练\n\n默认情况下，训练会从最后一个检查点继续进行，因为 `--continue_train` 选项默认设置为 True。若想从头开始训练，可以删除检查点文件，或者在训练脚本中将 `continue_train` 设置为 False（例如在 `experiments\u002Fchurch_launcher.py` 中）。\n\n## 代码结构（主要功能）\n\n- `models\u002Fswapping_autoencoder_model.py`：定义损失函数并生成可视化结果的核心文件。\n- `optimizers\u002Fswapping_autoencoder_optimizer.py`：定义优化器以及 GAN 的交替训练过程。\n- `models\u002Fnetworks\u002F`：包含模型架构文件，如 `generator.py`、`discriminator.py`、`encoder.py`、`patch_discrimiantor.py` 和 `stylegan2_layers.py`。\n- `options\u002F__init__.py`：包含基础选项标志。然而，许多重要选项分散在其他文件中，例如 
`swapping_autoencoder_model.py` 或 `generator.py`。程序启动时，这些选项会被统一解析。查看已使用选项列表的最佳方式是运行训练脚本，并观察控制台输出的配置信息。\n- `util\u002Fiter_counter.py`：包含迭代计数功能。\n\n## 更改记录\n\n- 2021年4月14日：针对在山脉数据集上训练预训练模型的配置未正确设置，现已更新。\n- 2021年10月14日：新增了适用于 AFHQ 数据集的 256x256 预训练模型。请使用 `experiments\u002Fafhq_pretrained_launcher.py`。\n\n## BibTeX 引用\n\n如果您在研究中使用了本代码，请引用我们的论文：\n```\n@inproceedings{park2020swapping,\n  title={Swapping Autoencoder for Deep Image Manipulation},\n  author={Park, Taesung and Zhu, Jun-Yan and Wang, Oliver and Lu, Jingwan and Shechtman, Eli and Efros, Alexei A. and Zhang, Richard},\n  booktitle={Advances in Neural Information Processing Systems},\n  year={2020}\n}\n```\n\n## 致谢\n\nStyleGAN2 的部分层大量借鉴（甚至可以说是直接复制！）了 [@rosinality](https:\u002F\u002Fgithub.com\u002Frosinality\u002Fstylegan2-pytorch) 的 PyTorch 实现。我们感谢 Nicholas Kolkin 在自动化内容与风格评估方面的有益讨论，Jeongo Seo 和 Yoseob Kim 在用户界面设计上的建议，以及 William T. Peebles、Tongzhou Wang 和 Yu Sun 在解耦表示方面的交流与探讨。","# Swapping Autoencoder 快速上手指南\n\nSwapping Autoencoder 是一个用于深度图像操作的开源工具，能够将一张图像的**结构**（Structure）与另一张图像的**纹理\u002F风格**（Texture）进行解耦和交换。本项目基于 PyTorch 实现，支持图像风格迁移、纹理替换及插值生成。\n\n## 1. 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux (推荐) 或 macOS\n*   **GPU**: NVIDIA GPU (必须)，支持 CUDA 10.1 或更高版本\n    *   *注：该项目使用了 StyleGAN2 的自定义 CUDA 内核，必须依赖 GPU 运行。*\n*   **Python**: 推荐 Python 3.6 (作者测试环境为 PyTorch 1.7.1 + Python 3.6)\n*   **依赖库**:\n    *   PyTorch\n    *   dominate, torchgeometry, func-timeout, tqdm, matplotlib, opencv_python, lmdb, numpy, GPUtil, Pillow, scikit-learn, visdom, ninja\n\n> **国内加速建议**：\n> 安装 PyTorch 时，建议使用清华或中科大镜像源以加快下载速度。\n> 例如：`pip install torch torchvision -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n## 2. 
安装步骤\n\n### 第一步：安装基础依赖\n使用 pip 安装项目所需的所有 Python 包：\n\n```bash\npip install dominate torchgeometry func-timeout tqdm matplotlib opencv_python lmdb numpy GPUtil Pillow scikit-learn visdom ninja\n```\n\n### 第二步：下载预训练模型与测试图片\n项目提供了预训练模型（涵盖 Church, Bedroom, Mountain, FFHQ, AFHQ 等数据集）。\n\n**下载地址**：\n*   **通用模型与测试图 (2.1GB)**: [http:\u002F\u002Fefrosgans.eecs.berkeley.edu\u002FSwappingAutoencoder\u002Fswapping_autoencoder_models_and_test_images.zip](http:\u002F\u002Fefrosgans.eecs.berkeley.edu\u002FSwappingAutoencoder\u002Fswapping_autoencoder_models_and_test_images.zip)\n*   **AFHQ 动物数据集模型 (256MB)**: [http:\u002F\u002Fefrosgans.eecs.berkeley.edu\u002FSwappingAutoencoder\u002Fafhq_models_and_test_images.zip](http:\u002F\u002Fefrosgans.eecs.berkeley.edu\u002FSwappingAutoencoder\u002Fafhq_models_and_test_images.zip)\n\n*注意：部分浏览器可能无法直接点击 http 链接下载，请将链接复制粘贴到地址栏，或使用 `wget` 命令下载。*\n\n### 第三步：解压与目录配置\n下载完成后，解压文件。默认脚本假设目录结构如下（可根据需要修改）：\n*   模型检查点位于：`.\u002Fcheckpoints\u002F`\n*   测试图片位于：`.\u002Ftestphotos\u002F`\n\n如果路径不同，请在运行命令时通过 `--checkpoints_dir` 和 `--dataroot` 参数指定。\n\n## 3. 基本使用\n\n以下示例展示如何使用预训练的 \"Mountain\" 模型进行最简单的**纹理交换**和**插值**操作。\n\n### 场景一：简单纹理交换 (Simple Swapping)\n将一张图片的结构与另一张图片的纹理结合。\n\n1.  编辑配置文件 `experiments\u002Fmountain_pretrained_launcher.py`，修改以下字段指向你的图片路径：\n    *   `input_structure_image`: 提供结构的图片路径\n    *   `input_texture_image`: 提供纹理的图片路径\n\n2.  运行交换命令：\n```bash\npython -m experiments mountain_pretrained test simple_swapping\n```\n\n或者直接通过命令行参数指定（无需修改脚本）：\n```bash\npython test.py --evaluation_metrics simple_swapping \\\n--preprocess scale_shortside --load_size 512 \\\n--name mountain_pretrained  \\\n--input_structure_image [path_to_structure_image] \\\n--input_texture_image [path_to_texture_image] \\\n--texture_mix_alpha 0.0\n```\n\n结果将保存在：`.\u002Fresults\u002Fmountain_pretrained\u002Fsimpleswapping\u002F`\n\n### 场景二：纹理插值 (Simple Interpolation)\n在两张不同纹理之间生成平滑过渡的效果。\n\n1.  
同样配置好 `input_structure_image` 和 `input_texture_image`（通常需要两张不同的纹理图来观察插值效果，具体视脚本逻辑而定，此处沿用上述配置逻辑）。\n\n2.  运行插值命令：\n```bash\npython -m experiments mountain_pretrained test simple_interpolation\n```\n\n或直接使用命令行：\n```bash\npython test.py --evaluation_metrics simple_swapping \\\n--preprocess scale_shortside --load_size 512 \\\n--name mountain_pretrained  \\\n--input_structure_image [path_to_structure_image] \\\n--input_texture_image [path_to_texture_image] \\\n--texture_mix_alpha 0.0 0.25 0.5 0.75 1.0\n```\n*注：`--texture_mix_alpha` 参数定义了混合比例，多个数值将生成一系列过渡图片。*\n\n### 场景三：批量网格交换 (Swapping Grid)\n如果你想复现论文中的交换网格图（多张结构图 x 多张纹理图），可以使用以下命令（以 Church 数据集为例）：\n\n```bash\n# 确保 experiments\u002Fchurch_pretrained_launcher.py 中的路径配置正确\npython -m experiments church_pretrained test swapping_grid\n```\n支持的预设模型包括：`church`, `bedroom`, `mountain`, `ffhq512`, `afhq` 等。\n\n---\n*更多高级功能（如定量评估 FID、从头训练模型等）请参考原始 README 文档。*","某数字艺术工作室的设计师正在为一款奇幻游戏快速生成大量风格统一但场景各异的背景概念图。\n\n### 没有 swapping-autoencoder-pytorch 时\n- **手动修图效率极低**：设计师需在 Photoshop 中逐张抠图并手动融合不同照片的纹理与结构，耗时数小时才能产出一张合格草图。\n- **风格一致性难保证**：强行拼接不同来源的图片常导致光影冲突或纹理断裂，画面显得虚假且缺乏整体感。\n- **创意迭代成本高**：若想尝试“保留山脉轮廓但替换为火山岩质感”的多种方案，每次调整都需重新进行繁琐的后期处理。\n- **依赖高质量素材**：必须寻找视角、分辨率完全匹配的两张素材才能合成，极大限制了创意来源。\n\n### 使用 swapping-autoencoder-pytorch 后\n- **自动化结构纹理分离**：利用其编码器自动将参考图拆解为“结构码”和“纹理码”，一键即可将 A 图的构图与 B 图的材质完美融合。\n- **生成结果自然逼真**：通过补丁共现判别器（patch co-occurrence discriminator）确保合成区域的纹理过渡平滑，消除了人工拼接的违和感。\n- **极速探索创意变体**：只需更换输入的结构图或纹理图，几秒钟内即可批量生成数十种不同质感的场景方案，大幅加速决策流程。\n- **突破素材匹配限制**：不再强求源图片的视角一致，即使是用素描稿作为结构引导、实拍照片作为纹理参考，也能生成高质量图像。\n\nswapping-autoencoder-pytorch 通过将图像解耦为独立的结构与纹理空间，让深度图像编辑从繁琐的手工劳作转变为高效的参数化创作。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ftaesungp_swapping-autoencoder-pytorch_3946949b.jpg","taesungp","Taesung 
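Park","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ftaesungp_c2b0df39.jpg","https:\u002F\u002Ftaesung.me",null,"https:\u002F\u002Fgithub.com\u002Ftaesungp",[83,87,91],{"name":84,"color":85,"percentage":86},"Python","#3572A5",94.8,{"name":88,"color":89,"percentage":90},"Cuda","#3A4E3A",4.5,{"name":92,"color":93,"percentage":94},"C++","#f34b7d",0.7,527,87,"2026-04-03T06:02:04","NOASSERTION",4,"Not specified","An NVIDIA GPU is required, with support for CUDA 10.1 or later (the project uses custom CUDA kernels). The authors trained on 4 V100 GPUs; memory requirements depend on the resolution (512x512 or 1024x1024), so a GPU with a large amount of memory is recommended.",{"notes":103,"python":104,"dependencies":105},"This tool depends on the custom CUDA kernels of StyleGAN2 (ported by rosinality), so a compatible CUDA environment must be installed. Training takes a long time (for example, 14 days on 4x V100 for the LSUN Church dataset). By default, the test and training scripts use GPUtil to pick an available GPU automatically; you can also specify one manually. Some datasets (such as Flickr Mountains\u002FWaterfalls) cannot be downloaded directly because of licensing issues and must be requested from the authors. The pretrained models are served over HTTP links; to download them you may need to paste the link directly into the browser address bar or use wget.","3.6",[106,107,108,109,110,111,112,113,114,115],"torch==1.7.1","dominate","torchgeometry","func-timeout","tqdm","matplotlib","opencv_python","lmdb","numpy","GPUtil",[14],"2026-03-27T02:49:30.150509","2026-04-06T08:42:13.904866",[120,125,130,135,140,145,150],{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},14611,"Are artifacts in the training results caused by a batch size that is too small or by too few iterations?","Training for longer is recommended. Even past the default number of iterations, generation quality keeps improving as the iteration count grows. Artifacts in complex regions usually mean that training has not fully converged, and continuing to train often improves the results.","https:\u002F\u002Fgithub.com\u002Ftaesungp\u002Fswapping-autoencoder-pytorch\u002Fissues\u002F2",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},14605,"How do I run this project on Windows 10?","It can run on Windows 10, but the following changes are needed:\n1. In test.py, add `if __name__ == '__main__':` after the import statements and indent the rest of the code under it.\n2. After the imports in test.py, add `import os` and `os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\"`.\n3. Edit line 104 of `data\u002F__init__.py` and set the data loader's `num_workers` to 0.\n4. Run a command similar to the following:\n`python test.py --dataroot . 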
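--dataset_mode imagefolder --checkpoints_dir .\u002Fcheckpoints\u002F --num_gpus 1 --batch_size 1 --preprocess scale_shortside --load_size 512 --crop_size 512 --name mountain_pretrained --lambda_patch_R1 10.0 --result_dir .\u002Fresults\u002F --evaluation_metrics simple_swapping --input_structure_image .\u002Ftestphotos\u002Fmountain\u002Ffig12\u002Fstructure\u002Fimage.jpeg --input_texture_image .\u002Ftestphotos\u002Fmountain\u002Ffig12\u002Fstyle\u002Fimage.jpeg`","https:\u002F\u002Fgithub.com\u002Ftaesungp\u002Fswapping-autoencoder-pytorch\u002Fissues\u002F3",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},14606,"What should I do if the download link for the pretrained models does not open?","If clicking the download link does not work, copy the full URL and paste it into the browser address bar, for example: http:\u002F\u002Fefrosgans.eecs.berkeley.edu\u002FSwappingAutoencoder\u002Fswapping_autoencoder_models_and_test_images.zip. Clicking directly may be blocked by the browser, but copying and pasting usually downloads the file normally.","https:\u002F\u002Fgithub.com\u002Ftaesungp\u002Fswapping-autoencoder-pytorch\u002Fissues\u002F10",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},14607,"Why does the output layer of the generator not use a Sigmoid or Tanh activation?","The project follows the StyleGAN formulation and does not apply a nonlinear activation (such as Sigmoid or Tanh) at the RGB output to constrain the range. The model learns to constrain the range of its output values on its own, so omitting these activations makes no noticeable difference. Early experiments found that using them caused vanishing gradients; switching to N(0, 0.02) initialization fixed that, but leaving the activation out altogether is the better option.","https:\u002F\u002Fgithub.com\u002Ftaesungp\u002Fswapping-autoencoder-pytorch\u002Fissues\u002F22",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},14608,"How long does it take to train the pretrained model on the FFHQ dataset?","On a machine with 8 V100 GPUs, training the FFHQ model at 1024x1024 resolution takes about 2 weeks.","https:\u002F\u002Fgithub.com\u002Ftaesungp\u002Fswapping-autoencoder-pytorch\u002Fissues\u002F21",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},14609,"How can I convert the PyTorch model to TorchScript (jit.trace) for use in C++?","Because the forward method of BaseModel dispatches dynamically on a command argument, tracing the whole model directly fails. Trace the encoder (E) and the generator (G) separately instead, or create a wrapper function. For example:\n```python\ndef encode(image):\n    return model.E(image)\n# or\ndef encode(image):\n    return model(image, command=\"encode\")\n```\nThen apply `jit.trace` to the `encode` function. This bypasses the complex forward 
logic and gives you sub-modules that can be exported directly.","https:\u002F\u002Fgithub.com\u002Ftaesungp\u002Fswapping-autoencoder-pytorch\u002Fissues\u002F18",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},14610,"The paper says reflection padding is used to keep the texture code from encoding positional information. How does that work?","For ideal disentanglement, the structure code should carry location-specific information, while the texture code should carry only the overall texture distribution (the style) and should not exploit any positional information (for example, recognizing image borders or coordinates).\nWith zero padding, the convolutional layers learn where the image border is and can then infer coordinates by measuring the distance to it. Reflection padding avoids this. Another option is to use no padding at all, but that makes the output size of the convolutional layers hard to compute, especially when handling correspondences. For this reason, the residual blocks use reflection padding and the convolutional blocks use no padding.","https:\u002F\u002Fgithub.com\u002Ftaesungp\u002Fswapping-autoencoder-pytorch\u002Fissues\u002F4",[]]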