[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-eriklindernoren--PyTorch-GAN":3,"tool-eriklindernoren--PyTorch-GAN":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":79,"owner_website":82,"owner_url":83,"languages":84,"stars":93,"forks":94,"last_commit_at":95,"license":96,"difficulty_score":10,"env_os":97,"env_gpu":98,"env_ram":99,"env_deps":100,"category_tags":110,"github_topics":79,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":111,"updated_at":112,"faqs":113,"releases":144},1408,"eriklindernoren\u002FPyTorch-GAN","PyTorch-GAN","PyTorch implementations of Generative Adversarial Networks.","PyTorch-GAN 是一个汇集了多种生成对抗网络（GAN）变体的开源代码库，基于流行的 PyTorch 深度学习框架构建。它旨在帮助开发者和研究人员快速复现学术论文中提出的经典 GAN 模型，如 CycleGAN、StarGAN、Pix2Pix 以及条件生成对抗网络等，覆盖了从图像超分辨率、风格迁移到数据增强等多个应用场景。\n\n在深度学习领域，复现前沿论文往往面临代码缺失或架构复杂的挑战。PyTorch-GAN 通过提供清晰、模块化的参考实现，降低了学习门槛，让用户能更专注于理解 GAN 的核心思想与训练技巧，而非纠结于底层细节。虽然部分模型结构未完全照搬论文配置，但其重点在于准确传达算法原理，非常适合用于教学演示、算法验证及二次开发。\n\n该工具主要面向具备一定 Python 和深度学习基础的开发者、科研人员及高校学生。对于希望深入探索生成式 AI 技术，或需要快速搭建原型进行实验的用户来说，这是一个极具价值的起点。只需简单的安装步骤，即可运行示例代码并观察生成效果，助力用户高效开启生成模型的研究之旅。","\u003Cp align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_b0271d3d67e5.png\" width=\"480\"\\>\u003C\u002Fp>\n\n**This repository has gone stale as I unfortunately do not have the time to maintain it anymore. If you would like to continue the development of it as a collaborator send me an email at eriklindernoren@gmail.com.**\n\n## PyTorch-GAN\nCollection of PyTorch implementations of Generative Adversarial Network varieties presented in research papers. Model architectures will not always mirror the ones proposed in the papers, but I have chosen to focus on getting the core ideas covered instead of getting every layer configuration right. 
Contributions and suggestions of GANs to implement are very welcomed.\n\n\u003Cb>See also:\u003C\u002Fb> [Keras-GAN](https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FKeras-GAN)\n\n## Table of Contents\n  * [Installation](#installation)\n  * [Implementations](#implementations)\n    + [Auxiliary Classifier GAN](#auxiliary-classifier-gan)\n    + [Adversarial Autoencoder](#adversarial-autoencoder)\n    + [BEGAN](#began)\n    + [BicycleGAN](#bicyclegan)\n    + [Boundary-Seeking GAN](#boundary-seeking-gan)\n    + [Cluster GAN](#cluster-gan)\n    + [Conditional GAN](#conditional-gan)\n    + [Context-Conditional GAN](#context-conditional-gan)\n    + [Context Encoder](#context-encoder)\n    + [Coupled GAN](#coupled-gan)\n    + [CycleGAN](#cyclegan)\n    + [Deep Convolutional GAN](#deep-convolutional-gan)\n    + [DiscoGAN](#discogan)\n    + [DRAGAN](#dragan)\n    + [DualGAN](#dualgan)\n    + [Energy-Based GAN](#energy-based-gan)\n    + [Enhanced Super-Resolution GAN](#enhanced-super-resolution-gan)\n    + [GAN](#gan)\n    + [InfoGAN](#infogan)\n    + [Least Squares GAN](#least-squares-gan)\n    + [MUNIT](#munit)\n    + [Pix2Pix](#pix2pix)\n    + [PixelDA](#pixelda)\n    + [Relativistic GAN](#relativistic-gan)\n    + [Semi-Supervised GAN](#semi-supervised-gan)\n    + [Softmax GAN](#softmax-gan)\n    + [StarGAN](#stargan)\n    + [Super-Resolution GAN](#super-resolution-gan)\n    + [UNIT](#unit)\n    + [Wasserstein GAN](#wasserstein-gan)\n    + [Wasserstein GAN GP](#wasserstein-gan-gp)\n    + [Wasserstein GAN DIV](#wasserstein-gan-div)\n\n## Installation\n    $ git clone https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FPyTorch-GAN\n    $ cd PyTorch-GAN\u002F\n    $ sudo pip3 install -r requirements.txt\n\n## Implementations   \n### Auxiliary Classifier GAN\n_Auxiliary Classifier Generative Adversarial Network_\n\n#### Authors\nAugustus Odena, Christopher Olah, Jonathon Shlens\n\n#### Abstract\nSynthesizing high resolution photorealistic images has been a long-standing challenge in machine learning. In this paper we introduce new methods for the improved training of generative adversarial networks (GANs) for image synthesis. We construct a variant of GANs employing label conditioning that results in 128x128 resolution image samples exhibiting global coherence. We expand on previous work for image quality assessment to provide two new analyses for assessing the discriminability and diversity of samples from class-conditional image synthesis models. These analyses demonstrate that high resolution samples provide class information not present in low resolution samples. Across 1000 ImageNet classes, 128x128 samples are more than twice as discriminable as artificially resized 32x32 samples. 
In addition, 84.7% of the classes have samples exhibiting diversity comparable to real ImageNet data.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1610.09585) [[Code]](implementations\u002Facgan\u002Facgan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Facgan\u002F\n$ python3 acgan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_70e8e8c9d8cc.gif\" width=\"360\"\\>\n\u003C\u002Fp>\n\n### Adversarial Autoencoder\n_Adversarial Autoencoder_\n\n#### Authors\nAlireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey\n\n#### Abstract\nIn this paper, we propose the \"adversarial autoencoder\" (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution. Matching the aggregated posterior to the prior ensures that generating from any part of prior space results in meaningful samples. As a result, the decoder of the adversarial autoencoder learns a deep generative model that maps the imposed prior to the data distribution. We show how the adversarial autoencoder can be used in applications such as semi-supervised classification, disentangling style and content of images, unsupervised clustering, dimensionality reduction and data visualization. We performed experiments on MNIST, Street View House Numbers and Toronto Face datasets and show that adversarial autoencoders achieve competitive results in generative modeling and semi-supervised classification tasks.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.05644) [[Code]](implementations\u002Faae\u002Faae.py)\n\n#### Run Example\n```\n$ cd implementations\u002Faae\u002F\n$ python3 aae.py\n```\n\n### BEGAN\n_BEGAN: Boundary Equilibrium Generative Adversarial Networks_\n\n#### Authors\nDavid Berthelot, Thomas Schumm, Luke Metz\n\n#### Abstract\nWe propose a new equilibrium enforcing method paired with a loss derived from the Wasserstein distance for training auto-encoder based Generative Adversarial Networks. This method balances the generator and discriminator during training. Additionally, it provides a new approximate convergence measure, fast and stable training and high visual quality. We also derive a way of controlling the trade-off between image diversity and visual quality. We focus on the image generation task, setting a new milestone in visual quality, even at higher resolutions. This is achieved while using a relatively simple model architecture and a standard training procedure.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.10717) [[Code]](implementations\u002Fbegan\u002Fbegan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fbegan\u002F\n$ python3 began.py\n```\n\n### BicycleGAN\n_Toward Multimodal Image-to-Image Translation_\n\n#### Authors\nJun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman\n\n#### Abstract\nMany image-to-image translation problems are ambiguous, as a single input image may correspond to multiple possible outputs. In this work, we aim to model a *distribution* of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. 
A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results. We explore several variants of this approach by employing different training objectives, network architectures, and methods of injecting the latent code. Our proposed method encourages bijective consistency between the latent encoding and output modes. We present a systematic comparison of our method and other variants on both perceptual realism and diversity.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.11586) [[Code]](implementations\u002Fbicyclegan\u002Fbicyclegan.py)\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_7f9f24f3cae8.jpg\" width=\"800\"\\>\n\u003C\u002Fp>\n\n#### Run Example\n```\n$ cd data\u002F\n$ bash download_pix2pix_dataset.sh edges2shoes\n$ cd ..\u002Fimplementations\u002Fbicyclegan\u002F\n$ python3 bicyclegan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_026ee954b106.png\" width=\"480\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Various style translations by varying the latent code.\n\u003C\u002Fp>\n\n\n### Boundary-Seeking GAN\n_Boundary-Seeking Generative Adversarial Networks_\n\n#### Authors\nR Devon Hjelm, Athul Paul Jacob, Tong Che, Adam Trischler, Kyunghyun Cho, Yoshua Bengio\n\n#### Abstract\nGenerative adversarial networks (GANs) are a learning framework that rely on training a discriminator to estimate a measure of difference between a target and generated distributions. GANs, as normally formulated, rely on the generated samples being completely differentiable w.r.t. the generative parameters, and thus do not work for discrete data. We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator. The importance weights have a strong connection to the decision boundary of the discriminator, and we call our method boundary-seeking GANs (BGANs). We demonstrate the effectiveness of the proposed algorithm with discrete image and character-based natural language generation. In addition, the boundary-seeking objective extends to continuous data, which can be used to improve stability of training, and we demonstrate this on Celeba, Large-scale Scene Understanding (LSUN) bedrooms, and Imagenet without conditioning.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1702.08431) [[Code]](implementations\u002Fbgan\u002Fbgan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fbgan\u002F\n$ python3 bgan.py\n```\n\n### Cluster GAN\n_ClusterGAN: Latent Space Clustering in Generative Adversarial Networks_\n\n#### Authors\nSudipto Mukherjee, Himanshu Asnani, Eugene Lin, Sreeram Kannan\n\n#### Abstract\nGenerative Adversarial networks (GANs) have obtained remarkable success in many unsupervised learning tasks and\nunarguably, clustering is an important unsupervised learning problem. 
While one can potentially exploit the\nlatent-space back-projection in GANs to cluster, we demonstrate that the cluster structure is not retained in the\nGAN latent space.  In this paper, we propose ClusterGAN as a new mechanism for clustering using GANs. By sampling\nlatent variables from a mixture of one-hot encoded variables and continuous latent variables, coupled with an\ninverse network (which projects the data to the latent space) trained jointly with a clustering specific loss, we\nare able to achieve clustering in the latent space. Our results show a remarkable phenomenon that GANs can preserve\nlatent space interpolation across categories, even though the discriminator is never exposed to such vectors. We\ncompare our results with various clustering baselines and demonstrate superior performance on both synthetic and\nreal datasets.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.03627) [[Code]](implementations\u002Fcluster_gan\u002Fclustergan.py)\n\nCode based on a full PyTorch [[implementation]](https:\u002F\u002Fgithub.com\u002Fzhampel\u002FclusterGAN).\n\n#### Run Example\n```\n$ cd implementations\u002Fcluster_gan\u002F\n$ python3 clustergan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_04a97f36ef7b.gif\" width=\"360\"\\>\n\u003C\u002Fp>\n\n\n### Conditional GAN\n_Conditional Generative Adversarial Nets_\n\n#### Authors\nMehdi Mirza, Simon Osindero\n\n#### Abstract\nGenerative Adversarial Nets [8] were recently introduced as a novel way to train generative models. In this work we introduce the conditional version of generative adversarial nets, which can be constructed by simply feeding the data, y, we wish to condition on to both the generator and discriminator. We show that this model can generate MNIST digits conditioned on class labels. We also illustrate how this model could be used to learn a multi-modal model, and provide preliminary examples of an application to image tagging in which we demonstrate how this approach can generate descriptive tags which are not part of training labels.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1411.1784) [[Code]](implementations\u002Fcgan\u002Fcgan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fcgan\u002F\n$ python3 cgan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_32a80459f043.gif\" width=\"360\"\\>\n\u003C\u002Fp>\n\n### Context-Conditional GAN\n_Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks_\n\n#### Authors\nEmily Denton, Sam Gross, Rob Fergus\n\n#### Abstract\nWe introduce a simple semi-supervised learning approach for images based on in-painting using an adversarial loss. Images with random patches removed are presented to a generator whose task is to fill in the hole, based on the surrounding pixels. The in-painted images are then presented to a discriminator network that judges if they are real (unaltered training images) or not. This task acts as a regularizer for standard supervised training of the discriminator. Using our approach we are able to directly train large VGG-style networks in a semi-supervised fashion. 
We evaluate on STL-10 and PASCAL datasets, where our approach obtains performance comparable or superior to existing methods.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.06430) [[Code]](implementations\u002Fccgan\u002Fccgan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fccgan\u002F\n$ python3 ccgan.py\n```\n\n### Context Encoder\n_Context Encoders: Feature Learning by Inpainting_\n\n#### Authors\nDeepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros\n\n#### Abstract\nWe present an unsupervised visual feature learning algorithm driven by context-based pixel prediction. By analogy with auto-encoders, we propose Context Encoders -- a convolutional neural network trained to generate the contents of an arbitrary image region conditioned on its surroundings. In order to succeed at this task, context encoders need to both understand the content of the entire image, as well as produce a plausible hypothesis for the missing part(s). When training context encoders, we have experimented with both a standard pixel-wise reconstruction loss, as well as a reconstruction plus an adversarial loss. The latter produces much sharper results because it can better handle multiple modes in the output. We found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures. We quantitatively demonstrate the effectiveness of our learned features for CNN pre-training on classification, detection, and segmentation tasks. Furthermore, context encoders can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1604.07379) [[Code]](implementations\u002Fcontext_encoder\u002Fcontext_encoder.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fcontext_encoder\u002F\n\u003Cfollow steps at the top of context_encoder.py>\n$ python3 context_encoder.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_be629a861f89.png\" width=\"640\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Rows: Masked | Inpainted | Original | Masked | Inpainted | Original\n\u003C\u002Fp>\n\n### Coupled GAN\n_Coupled Generative Adversarial Networks_\n\n#### Authors\nMing-Yu Liu, Oncel Tuzel\n\n#### Abstract\nWe propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images. In contrast to the existing approaches, which require tuples of corresponding images in different domains in the training set, CoGAN can learn a joint distribution without any tuple of corresponding images. It can learn a joint distribution with just samples drawn from the marginal distributions. This is achieved by enforcing a weight-sharing constraint that limits the network capacity and favors a joint distribution solution over a product of marginal distributions one. We apply CoGAN to several joint distribution learning tasks, including learning a joint distribution of color and depth images, and learning a joint distribution of face images with different attributes. For each task it successfully learns the joint distribution without any tuple of corresponding images. 
We also demonstrate its applications to domain adaptation and image transformation.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.07536) [[Code]](implementations\u002Fcogan\u002Fcogan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fcogan\u002F\n$ python3 cogan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_3d3ec415fb93.gif\" width=\"360\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Generated MNIST and MNIST-M images\n\u003C\u002Fp>\n\n### CycleGAN\n_Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks_\n\n#### Authors\nJun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros\n\n#### Abstract\nImage-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G:X→Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F:Y→X and introduce a cycle consistency loss to push F(G(X))≈X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.10593) [[Code]](implementations\u002Fcyclegan\u002Fcyclegan.py)\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_fcac6d24856d.png\" width=\"640\"\\>\n\u003C\u002Fp>\n\n#### Run Example\n```\n$ cd data\u002F\n$ bash download_cyclegan_dataset.sh monet2photo\n$ cd ..\u002Fimplementations\u002Fcyclegan\u002F\n$ python3 cyclegan.py --dataset_name monet2photo\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_50907f776639.png\" width=\"900\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Monet to photo translations.\n\u003C\u002Fp>\n\n### Deep Convolutional GAN\n_Deep Convolutional Generative Adversarial Network_\n\n#### Authors\nAlec Radford, Luke Metz, Soumith Chintala\n\n#### Abstract\nIn recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. 
Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06434) [[Code]](implementations\u002Fdcgan\u002Fdcgan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fdcgan\u002F\n$ python3 dcgan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_1106b8957087.gif\" width=\"240\"\\>\n\u003C\u002Fp>\n\n### DiscoGAN\n_Learning to Discover Cross-Domain Relations with Generative Adversarial Networks_\n\n#### Authors\nTaeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, Jiwon Kim\n\n#### Abstract\nWhile humans easily recognize relations between data from different domains without any supervision, learning to automatically discover them is in general very challenging and needs many ground-truth pairs that illustrate the relations. To avoid costly pairing, we address the task of discovering cross-domain relations given unpaired data. We propose a method based on generative adversarial networks that learns to discover relations between different domains (DiscoGAN). Using the discovered relations, our proposed network successfully transfers style from one domain to another while preserving key attributes such as orientation and face identity.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.05192) [[Code]](implementations\u002Fdiscogan\u002Fdiscogan.py)\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_ae76c24abe01.png\" width=\"640\"\\>\n\u003C\u002Fp>\n\n#### Run Example\n```\n$ cd data\u002F\n$ bash download_pix2pix_dataset.sh edges2shoes\n$ cd ..\u002Fimplementations\u002Fdiscogan\u002F\n$ python3 discogan.py --dataset_name edges2shoes\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_73aad89bb03f.png\" width=\"480\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Rows from top to bottom: (1) Real image from domain A (2) Translated image from \u003Cbr>\n    domain A (3) Reconstructed image from domain A (4) Real image from domain B (5) \u003Cbr>\n    Translated image from domain B (6) Reconstructed image from domain B\n\u003C\u002Fp>\n\n### DRAGAN\n_On Convergence and Stability of GANs_\n\n#### Authors\nNaveen Kodali, Jacob Abernethy, James Hays, Zsolt Kira\n\n#### Abstract\nWe propose studying GAN training dynamics as regret minimization, which is in contrast to the popular view that there is consistent minimization of a divergence between real and generated distributions. We analyze the convergence of GAN training from this new point of view to understand why mode collapse happens. We hypothesize the existence of undesirable local equilibria in this non-convex game to be responsible for mode collapse. We observe that these local equilibria often exhibit sharp gradients of the discriminator function around some real data points. We demonstrate that these degenerate local equilibria can be avoided with a gradient penalty scheme called DRAGAN. 
We show that DRAGAN enables faster training, achieves improved stability with fewer mode collapses, and leads to generator networks with better modeling performance across a variety of architectures and objective functions.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.07215) [[Code]](implementations\u002Fdragan\u002Fdragan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fdragan\u002F\n$ python3 dragan.py\n```\n\n### DualGAN\n_DualGAN: Unsupervised Dual Learning for Image-to-Image Translation_\n\n#### Authors\nZili Yi, Hao Zhang, Ping Tan, Minglun Gong\n\n#### Abstract\nConditional Generative Adversarial Networks (GANs) for cross-domain image-to-image translation have made much progress recently. Depending on the task complexity, thousands to millions of labeled image pairs are needed to train a conditional GAN. However, human labeling is expensive, even impractical, and large quantities of data may not always be available. Inspired by dual learning from natural language translation, we develop a novel dual-GAN mechanism, which enables image translators to be trained from two sets of unlabeled images from two domains. In our architecture, the primal GAN learns to translate images from domain U to those in domain V, while the dual GAN learns to invert the task. The closed loop made by the primal and dual tasks allows images from either domain to be translated and then reconstructed. Hence a loss function that accounts for the reconstruction error of images can be used to train the translators. Experiments on multiple image translation tasks with unlabeled data show considerable performance gain of DualGAN over a single GAN. For some tasks, DualGAN can even achieve comparable or slightly better results than conditional GAN trained on fully labeled data.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.02510) [[Code]](implementations\u002Fdualgan\u002Fdualgan.py)\n\n\n#### Run Example\n```\n$ cd data\u002F\n$ bash download_pix2pix_dataset.sh facades\n$ cd ..\u002Fimplementations\u002Fdualgan\u002F\n$ python3 dualgan.py --dataset_name facades\n```\n\n### Energy-Based GAN\n_Energy-based Generative Adversarial Network_\n\n#### Authors\nJunbo Zhao, Michael Mathieu, Yann LeCun\n\n#### Abstract\nWe introduce the \"Energy-based Generative Adversarial Network\" model (EBGAN) which views the discriminator as an energy function that attributes low energies to the regions near the data manifold and higher energies to other regions. Similar to the probabilistic GANs, a generator is seen as being trained to produce contrastive samples with minimal energies, while the discriminator is trained to assign high energies to these generated samples. Viewing the discriminator as an energy function allows to use a wide variety of architectures and loss functionals in addition to the usual binary classifier with logistic output. Among them, we show one instantiation of EBGAN framework as using an auto-encoder architecture, with the energy being the reconstruction error, in place of the discriminator. We show that this form of EBGAN exhibits more stable behavior than regular GANs during training. 
We also show that a single-scale architecture can be trained to generate high-resolution images.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.03126) [[Code]](implementations\u002Febgan\u002Febgan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Febgan\u002F\n$ python3 ebgan.py\n```\n\n### Enhanced Super-Resolution GAN\n_ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks_\n\n#### Authors\nXintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang\n\n#### Abstract\nThe Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won the first place in the PIRM2018-SR Challenge. The code is available at [this https URL](https:\u002F\u002Fgithub.com\u002Fxinntao\u002FESRGAN).\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.00219) [[Code]](implementations\u002Fesrgan\u002Fesrgan.py)\n\n\n#### Run Example\n```\n$ cd implementations\u002Fesrgan\u002F\n\u003Cfollow steps at the top of esrgan.py>\n$ python3 esrgan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_bc4f696ff5c4.png\" width=\"320\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Nearest Neighbor Upsampling | ESRGAN\n\u003C\u002Fp>\n\n### GAN\n_Generative Adversarial Network_\n\n#### Authors\nIan J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio\n\n#### Abstract\nWe propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1\u002F2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. 
Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1406.2661) [[Code]](implementations\u002Fgan\u002Fgan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fgan\u002F\n$ python3 gan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_0013580eaac9.gif\" width=\"240\"\\>\n\u003C\u002Fp>\n\n### InfoGAN\n_InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets_\n\n#### Authors\nXi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel\n\n#### Abstract\nThis paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence\u002Fabsence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.03657) [[Code]](implementations\u002Finfogan\u002Finfogan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Finfogan\u002F\n$ python3 infogan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_cff97b99285b.gif\" width=\"360\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Result of varying categorical latent variable by column.\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_a776f5de29f3.png\" width=\"360\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Result of varying continuous latent variable by row.\n\u003C\u002Fp>\n\n### Least Squares GAN\n_Least Squares Generative Adversarial Networks_\n\n#### Authors\nXudong Mao, Qing Li, Haoran Xie, Raymond Y.K. Lau, Zhen Wang, Stephen Paul Smolley\n\n#### Abstract\nUnsupervised learning with generative adversarial networks (GANs) has proven hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs) which adopt the least squares loss function for the discriminator. We show that minimizing the objective function of LSGAN yields minimizing the Pearson χ2 divergence. There are two benefits of LSGANs over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. 
Second, LSGANs perform more stable during the learning process. We evaluate LSGANs on five scene datasets and the experimental results show that the images generated by LSGANs are of better quality than the ones generated by regular GANs. We also conduct two comparison experiments between LSGANs and regular GANs to illustrate the stability of LSGANs.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.04076) [[Code]](implementations\u002Flsgan\u002Flsgan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Flsgan\u002F\n$ python3 lsgan.py\n```\n\n\n### MUNIT\n_Multimodal Unsupervised Image-to-Image Translation_\n\n#### Authors\nXun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz\n\n#### Abstract\nUnsupervised image-to-image translation is an important and challenging problem in computer vision. Given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any pairs of corresponding images. While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs from a given source domain image. To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. We assume that the image representation can be decomposed into a content code that is domain-invariant, and a style code that captures domain-specific properties. To translate an image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain. We analyze the proposed framework and establish several theoretical results. Extensive experiments with comparisons to the state-of-the-art approaches further demonstrates the advantage of the proposed framework. Moreover, our framework allows users to control the style of translation outputs by providing an example style image. Code and pretrained models are available at [this https URL](https:\u002F\u002Fgithub.com\u002Fnvlabs\u002FMUNIT)\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1804.04732) [[Code]](implementations\u002Fmunit\u002Fmunit.py)\n\n#### Run Example\n```\n$ cd data\u002F\n$ bash download_pix2pix_dataset.sh edges2shoes\n$ cd ..\u002Fimplementations\u002Fmunit\u002F\n$ python3 munit.py --dataset_name edges2shoes\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_438374a647c3.png\" width=\"480\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Results by varying the style code.\n\u003C\u002Fp>\n\n### Pix2Pix\n_Image-to-Image Translation with Conditional Adversarial Networks_\n\n#### Authors\nPhillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros\n\n#### Abstract\nWe investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. 
Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.07004) [[Code]](implementations\u002Fpix2pix\u002Fpix2pix.py)\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_11179601e0f7.png\" width=\"640\"\\>\n\u003C\u002Fp>\n\n#### Run Example\n```\n$ cd data\u002F\n$ bash download_pix2pix_dataset.sh facades\n$ cd ..\u002Fimplementations\u002Fpix2pix\u002F\n$ python3 pix2pix.py --dataset_name facades\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_d1c590c13e37.png\" width=\"480\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Rows from top to bottom: (1) The condition for the generator (2) Generated image \u003Cbr>\n    based of condition (3) The true corresponding image to the condition\n\u003C\u002Fp>\n\n### PixelDA\n_Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks_\n\n#### Authors\nKonstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan\n\n#### Abstract\nCollecting well-annotated image datasets to train modern machine learning algorithms is prohibitively expensive for many tasks. One appealing alternative is rendering synthetic data where ground-truth annotations are generated automatically. Unfortunately, models trained purely on rendered images often fail to generalize to real images. To address this shortcoming, prior work introduced unsupervised domain adaptation algorithms that attempt to map representations between the two domains or learn to extract features that are domain-invariant. In this work, we present a new approach that learns, in an unsupervised manner, a transformation in the pixel space from one domain to the other. Our generative adversarial network (GAN)-based method adapts source-domain images to appear as if drawn from the target domain. Our approach not only produces plausible samples, but also outperforms the state-of-the-art on a number of unsupervised domain adaptation scenarios by large margins. Finally, we demonstrate that the adaptation process generalizes to object classes unseen during training.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.05424) [[Code]](implementations\u002Fpixelda\u002Fpixelda.py)\n\n#### MNIST to MNIST-M Classification\nTrains a classifier on images that have been translated from the source domain (MNIST) to the target domain (MNIST-M) using the annotations of the source domain images. The classification network is trained jointly with the generator network to optimize the generator for both providing a proper domain translation and also for preserving the semantics of the source domain image. The classification network trained on translated images is compared to the naive solution of training a classifier on MNIST and evaluating it on MNIST-M. 
The naive model manages a 55% classification accuracy on MNIST-M while the one trained during domain adaptation achieves a 95% classification accuracy.\n\n```\n$ cd implementations\u002Fpixelda\u002F\n$ python3 pixelda.py\n```  \n| Method       | Accuracy  |\n| ------------ |:---------:|\n| Naive        | 55%       |\n| PixelDA      | 95%       |\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_2d177bcddaa8.png\" width=\"480\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Rows from top to bottom: (1) Real images from MNIST (2) Translated images from \u003Cbr>\n    MNIST to MNIST-M (3) Examples of images from MNIST-M\n\u003C\u002Fp>\n\n### Relativistic GAN\n_The relativistic discriminator: a key element missing from standard GAN_\n\n#### Authors\nAlexia Jolicoeur-Martineau\n\n#### Abstract\nIn standard generative adversarial network (SGAN), the discriminator estimates the probability that the input data is real. The generator is trained to increase the probability that fake data is real. We argue that it should also simultaneously decrease the probability that real data is real because 1) this would account for a priori knowledge that half of the data in the mini-batch is fake, 2) this would be observed with divergence minimization, and 3) in optimal settings, SGAN would be equivalent to integral probability metric (IPM) GANs.\nWe show that this property can be induced by using a relativistic discriminator which estimate the probability that the given real data is more realistic than a randomly sampled fake data. We also present a variant in which the discriminator estimate the probability that the given real data is more realistic than fake data, on average. We generalize both approaches to non-standard GAN loss functions and we refer to them respectively as Relativistic GANs (RGANs) and Relativistic average GANs (RaGANs). We show that IPM-based GANs are a subset of RGANs which use the identity function.\nEmpirically, we observe that 1) RGANs and RaGANs are significantly more stable and generate higher quality data samples than their non-relativistic counterparts, 2) Standard RaGAN with gradient penalty generate data of better quality than WGAN-GP while only requiring a single discriminator update per generator update (reducing the time taken for reaching the state-of-the-art by 400%), and 3) RaGANs are able to generate plausible high resolutions images (256x256) from a very small sample (N=2011), while GAN and LSGAN cannot; these images are of significantly better quality than the ones generated by WGAN-GP and SGAN with spectral normalization.\n\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1807.00734) [[Code]](implementations\u002Frelativistic_gan\u002Frelativistic_gan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Frelativistic_gan\u002F\n$ python3 relativistic_gan.py                 # Relativistic Standard GAN\n$ python3 relativistic_gan.py --rel_avg_gan   # Relativistic Average GAN\n```\n\n### Semi-Supervised GAN\n_Semi-Supervised Generative Adversarial Network_\n\n#### Authors\nAugustus Odena\n\n#### Abstract\nWe extend Generative Adversarial Networks (GANs) to the semi-supervised context by forcing the discriminator network to output class labels. We train a generative model G and a discriminator D on a dataset with inputs belonging to one of N classes. 
At training time, D is made to predict which of N+1 classes the input belongs to, where an extra class is added to correspond to the outputs of G. We show that this method can be used to create a more data-efficient classifier and that it allows for generating higher quality samples than a regular GAN.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.01583) [[Code]](implementations\u002Fsgan\u002Fsgan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fsgan\u002F\n$ python3 sgan.py\n```\n\n### Softmax GAN\n_Softmax GAN_\n\n#### Authors\nMin Lin\n\n#### Abstract\nSoftmax GAN is a novel variant of Generative Adversarial Network (GAN). The key idea of Softmax GAN is to replace the classification loss in the original GAN with a softmax cross-entropy loss in the sample space of one single batch. In the adversarial learning of N real training samples and M generated samples, the target of discriminator training is to distribute all the probability mass to the real samples, each with probability 1\u002FM, and distribute zero probability to generated data. In the generator training phase, the target is to assign equal probability to all data points in the batch, each with probability 1\u002F(M+N). While the original GAN is closely related to Noise Contrastive Estimation (NCE), we show that Softmax GAN is the Importance Sampling version of GAN. We further demonstrate with experiments that this simple change stabilizes GAN training.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.06191) [[Code]](implementations\u002Fsoftmax_gan\u002Fsoftmax_gan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fsoftmax_gan\u002F\n$ python3 softmax_gan.py\n```\n\n### StarGAN\n_StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation_\n\n#### Authors\nYunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, Jaegul Choo\n\n#### Abstract\nRecent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models should be built independently for every pair of image domains. To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. Such a unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network. This leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain. 
We empirically demonstrate the effectiveness of our approach on a facial attribute transfer and a facial expression synthesis tasks.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.09020) [[Code]](implementations\u002Fstargan\u002Fstargan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fstargan\u002F\n\u003Cfollow steps at the top of stargan.py>\n$ python3 stargan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_02f3a7c38893.png\" width=\"640\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Original | Black Hair | Blonde Hair | Brown Hair | Gender Flip | Aged\n\u003C\u002Fp>\n\n### Super-Resolution GAN\n_Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network_\n\n#### Authors\nChristian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi\n\n#### Abstract\nDespite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. 
The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.04802) [[Code]](implementations\u002Fsrgan\u002Fsrgan.py)\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_e0f76b8e3740.png\" width=\"640\"\\>\n\u003C\u002Fp>\n\n#### Run Example\n```\n$ cd implementations\u002Fsrgan\u002F\n\u003Cfollow steps at the top of srgan.py>\n$ python3 srgan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_2cfbe968500a.png\" width=\"320\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    Nearest Neighbor Upsampling | SRGAN\n\u003C\u002Fp>\n\n### UNIT\n_Unsupervised Image-to-Image Translation Networks_\n\n#### Authors\nMing-Yu Liu, Thomas Breuel, Jan Kautz\n\n#### Abstract\nUnsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in individual domains. Since there exists an infinite set of joint distributions that can arrive the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumptions. To address the problem, we make a shared-latent space assumption and propose an unsupervised image-to-image translation framework based on Coupled GANs. We compare the proposed framework with competing approaches and present high quality image translation results on various challenging unsupervised image translation tasks, including street scene image translation, animal image translation, and face image translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on benchmark datasets. Code and additional results are available in this [https URL](https:\u002F\u002Fgithub.com\u002Fmingyuliutw\u002Funit).\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.00848) [[Code]](implementations\u002Funit\u002Funit.py)\n\n#### Run Example\n```\n$ cd data\u002F\n$ bash download_cyclegan_dataset.sh apple2orange\n$ cd implementations\u002Funit\u002F\n$ python3 unit.py --dataset_name apple2orange\n```\n\n### Wasserstein GAN\n_Wasserstein GAN_\n\n#### Authors\nMartin Arjovsky, Soumith Chintala, Léon Bottou\n\n#### Abstract\nWe introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to other distances between distributions.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1701.07875) [[Code]](implementations\u002Fwgan\u002Fwgan.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fwgan\u002F\n$ python3 wgan.py\n```\n\n### Wasserstein GAN GP\n_Improved Training of Wasserstein GANs_\n\n#### Authors\nIshaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville\n\n#### Abstract\nGenerative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. 
The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.00028) [[Code]](implementations\u002Fwgan_gp\u002Fwgan_gp.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fwgan_gp\u002F\n$ python3 wgan_gp.py\n```\n\n\u003Cp align="center">\n    \u003Cimg src="https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_7b406b4d81ec.gif" width="240"\\>\n\u003C\u002Fp>\n\n### Wasserstein GAN DIV\n_Wasserstein Divergence for GANs_\n\n#### Authors\nJiqing Wu, Zhiwu Huang, Janine Thoma, Dinesh Acharya, Luc Van Gool\n\n#### Abstract\nIn many domains of computer vision, generative adversarial networks (GANs) have achieved great success, among which the family of Wasserstein GANs (WGANs) is considered to be state-of-the-art due to the theoretical contributions and competitive qualitative performance. However, it is very challenging to approximate the k-Lipschitz constraint required by the Wasserstein-1 metric (W-met). In this paper, we propose a novel Wasserstein divergence (W-div), which is a relaxed version of W-met and does not require the k-Lipschitz constraint. As a concrete application, we introduce a Wasserstein divergence objective for GANs (WGAN-div), which can faithfully approximate W-div through optimization. Under various settings, including progressive growing training, we demonstrate the stability of the proposed WGAN-div owing to its theoretical and practical advantages over WGANs. 
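The gradient penalty mentioned above can be sketched as follows. This is a hedged, minimal version for orientation; the interpolation scheme and the two-sided form of the penalty follow the paper's description, while `implementations/wgan_gp/wgan_gp.py` remains the authoritative implementation in this repository.

```python
# Hedged sketch of the WGAN-GP gradient penalty: penalize the critic's gradient norm
# for deviating from 1 on random interpolates between real and generated samples.
import torch
from torch import autograd


def gradient_penalty(critic, real_samples, fake_samples):
    batch_size = real_samples.size(0)
    # Random interpolation coefficients, broadcast over the image dimensions.
    alpha = torch.rand(batch_size, 1, 1, 1, device=real_samples.device)
    interpolates = (alpha * real_samples + (1 - alpha) * fake_samples).requires_grad_(True)
    d_interpolates = critic(interpolates)
    gradients = autograd.grad(
        outputs=d_interpolates,
        inputs=interpolates,
        grad_outputs=torch.ones_like(d_interpolates),
        create_graph=True,
        retain_graph=True,
        only_inputs=True,
    )[0]
    gradients = gradients.view(batch_size, -1)
    # Two-sided penalty, averaged over the batch.
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()
```

In a training loop this term is added to the critic loss with a coefficient (the paper uses a penalty weight of 10).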
Also, we study the quantitative and visual performance of WGAN-div on standard image synthesis benchmarks, showing the superior performance of WGAN-div compared to the state-of-the-art methods.\n\n[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.01026) [[Code]](implementations\u002Fwgan_div\u002Fwgan_div.py)\n\n#### Run Example\n```\n$ cd implementations\u002Fwgan_div\u002F\n$ python3 wgan_div.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_e760b5a05b6f.png\" width=\"240\"\\>\n\u003C\u002Fp>\n","\u003Cp align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_b0271d3d67e5.png\" width=\"480\"\\>\u003C\u002Fp>\n\n**由于我已无暇继续维护此仓库，因此该仓库已进入“陈旧”状态。如果您希望以协作者的身份继续推进其开发，请发送邮件至 eriklindernoren@gmail.com。**\n\n## PyTorch-GAN\n本项目汇集了在研究论文中提出的各类生成对抗网络的 PyTorch 实现方案。模型架构未必完全与论文中所提出的方案一致，但我更倾向于专注于将核心思想完整地实现到位，而非追求每层配置的完美无缺。我们非常欢迎各位对 GAN 的实现方案提出宝贵的意见与建议。\n\n\u003Cb>另请参阅：\u003C\u002Fb> [Keras-GAN](https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FKeras-GAN)\n\n## 目录\n  * [安装](#installation)\n  * [实现方案](#implementations)\n    + [辅助分类器 GAN](#auxiliary-classifier-gan)\n    + [对抗自编码器](#adversarial-autoencoder)\n    + [BEGAN](#began)\n    + [自行车GAN](#bicyclegan)\n    + [边界求解 GAN](#boundary-seeking-gan)\n    + [聚类 GAN](#cluster-gan)\n    + [条件 GAN](#conditional-gan)\n    + [上下文条件 GAN](#context-conditional-gan)\n    + [上下文编码器](#context-encoder)\n    + [耦合 GAN](#coupled-gan)\n    + [CycleGAN](#cyclegan)\n    + [深度卷积 GAN](#deep-convolutional-gan)\n    + [DiscoGAN](#discogan)\n    + [DRAGAN](#dragan)\n    + [DualGAN](#dualgan)\n    + [基于能量的 GAN](#energy-based-gan)\n    + [增强超分辨率 GAN](#enhanced-super-resolution-gan)\n    + [GAN](#gan)\n    + [InfoGAN](#infogan)\n    + [最小二乘 GAN](#least-squares-gan)\n    + [MUNIT](#munit)\n    + [Pix2Pix](#pix2pix)\n    + [PixelDA](#pixelda)\n    + [相对论 GAN](#relativistic-gan)\n    + [半监督 GAN](#semi-supervised-gan)\n    + [Softmax GAN](#softmax-gan)\n    + [StarGAN](#stargan)\n    + [超分辨率 GAN](#super-resolution-gan)\n    + [UNIT](#unit)\n    + [Wasserstein GAN](#wasserstein-gan)\n    + [Wasserstein GAN GP](#wasserstein-gan-gp)\n    + [Wasserstein GAN DIV](#wasserstein-gan-div)\n\n## 安装\n    $ git clone https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FPyTorch-GAN\n    $ cd PyTorch-GAN\u002F\n    $ sudo pip3 install -r requirements.txt\n\n## 实现方案   \n### 辅助分类器 GAN\n_辅助分类器生成对抗网络_\n\n#### 作者\nAugustus Odena、Christopher Olah、Jonathon Shlens\n\n#### 摘要\n合成高分辨率的逼真图像一直是机器学习领域长期面临的挑战。在本文中，我们提出了用于改进生成对抗网络（GAN）图像合成训练的新方法。我们构建了一种采用标签条件化的 GAN 变体，该变体能够生成分辨率为 128x128 的图像样本，并展现出全局一致性。我们进一步拓展了以往关于图像质量评估的研究成果，为类条件图像合成模型的样本区分度和多样性提供了两种全新的分析方法。这些分析表明：高分辨率的样本能够提供低分辨率样本所不具备的类别信息。在 1000 个 ImageNet 类别中，分辨率为 128x128 的样本的区分度比人工缩放后的 32x32 样本高出两倍以上。此外，有 84.7% 的类别所生成的样本，在多样性方面与真实的 ImageNet 数据相当。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1610.09585) [[代码]](implementations\u002Facgan\u002Facgan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Facgan\u002F\n$ python3 acgan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_70e8e8c9d8cc.gif\" width=\"360\"\\>\n\u003C\u002Fp>\n\n### 对抗自编码器\n_对抗自编码器_\n\n#### 作者\nAlireza Makhzani、Jonathon Shlens、Navdeep Jaitly、Ian Goodfellow、Brendan Frey\n\n#### 
摘要\n在本文中，我们提出了“对抗自编码器”（AAE），这是一种概率自编码器，它利用近期提出的生成对抗网络（GAN）通过将自编码器隐藏代码向量的聚合后验分布与任意先验分布进行匹配，从而实现变分推理。将聚合后验分布与先验分布相匹配，可确保从先验空间的任意部分进行采样时，都能生成有意义的样本。由此，对抗自编码器的解码器得以学习一种深层生成模型，将设定的先验分布映射到数据分布上。我们展示了对抗自编码器如何应用于半监督分类、图像风格与内容的分离、无监督聚类、降维以及数据可视化等任务。我们在 MNIST、Street View 房屋编号以及多伦多人脸数据集上进行了实验，结果表明，对抗自编码器在生成建模和半监督分类任务中均取得了具有竞争力的性能。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.05644) [[代码]](implementations\u002Faae\u002Faae.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Faae\u002F\n$ python3 aae.py\n```\n\n### BEGAN\n_BEGAN：边界均衡生成对抗网络_\n\n#### 作者\nDavid Berthelot、Thomas Schumm、Luke Metz\n\n#### 摘要\n我们提出了一种新的平衡约束方法，并结合基于 Wasserstein 距离的损失函数来训练基于自编码器的生成对抗网络。该方法能够在训练过程中平衡生成器与判别器之间的关系。此外，它还提供了一种新的近似收敛指标，训练速度快且稳定，同时能获得高质量的视觉效果。我们还推导出一种方法，用于调节图像多样性和视觉质量之间的权衡。我们专注于图像生成任务，在更高分辨率下仍实现了视觉质量的新突破。这一成果的取得，得益于我们采用相对简单的模型架构和标准的训练流程。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.10717) [[代码]](implementations\u002Fbegan\u002Fbegan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fbegan\u002F\n$ python3 begun.py\n```\n\n### BicycleGAN\n_迈向多模态图像到图像的翻译_\n\n#### 作者\n朱俊彦、张理查德、迪帕克·帕塔克、特雷弗·达雷尔、阿列克谢·A·埃夫罗斯、奥利弗·王、伊莱·谢赫特曼\n\n#### 摘要\n在许多图像到图像的翻译任务中，存在多种模糊性：一张输入图像可能对应于多个可能的输出。在本研究中，我们旨在在条件生成建模的框架下，对可能输出的“分布”进行建模。这种映射的模糊性被提炼成一个低维的潜在向量，该向量可在测试时随机采样。生成器学习将给定的输入与这一潜在编码相结合，映射为输出。我们明确鼓励输出与潜在编码之间的连接具有可逆性。这有助于在训练过程中防止从潜在编码到输出的多对一映射——也就是所谓的模式坍塌问题——并产生更加多样化的结果。通过采用不同的训练目标、网络架构以及注入潜在编码的方法，我们探索了这一方法的多种变体。我们提出的方案鼓励潜在编码与输出模式之间实现双射的一致性。我们系统地比较了我们的方法与其他变体，在感知真实度和多样性两个方面进行了全面评估。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.11586) [[代码]](implementations\u002Fbicyclegan\u002Fbicyclegan.py)\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_7f9f24f3cae8.jpg\" width=\"800\"\\>\n\u003C\u002Fp>\n\n#### 运行示例\n```\n$ cd data\u002F\n$ bash download_pix2pix_dataset.sh edges2shoes\n$ cd ..\u002Fimplementations\u002Fbicyclegan\u002F\n$ python3 bicyclegan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_026ee954b106.png\" width=\"480\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    不同风格的图像转换，通过调整潜在编码来实现。\n\u003C\u002Fp>\n\n\n### 边界求解GAN\n_边界求解生成对抗网络_\n\n#### 作者\nR·德文·赫杰姆、阿图尔·保罗·雅各布、通·切、亚当·特里施勒、权贤秀、约书亚·本吉奥\n\n#### 摘要\n生成对抗网络（GAN）是一种学习框架，其核心在于通过训练判别器来估计目标分布与生成分布之间的差异度量。通常情况下，GAN 的设计要求生成样本对生成参数具有完全可微性，因此不适用于离散数据。我们提出了一种针对离散数据的 GAN 训练方法：利用判别器估算出的差异度量，为生成样本计算重要权重，从而为生成器的训练提供策略梯度。这些重要权重与判别器的决策边界有着密切联系，我们将其命名为“边界求解GAN”（BGAN）。我们通过离散图像和基于字符的自然语言生成任务，验证了所提算法的有效性。此外，边界求解的目标也适用于连续数据，这有助于提升训练的稳定性；我们已在 Celeba、大规模场景理解（LSUN）卧室以及无需条件约束的 ImageNet 数据集上进行了相关实验。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1702.08431) [[代码]](implementations\u002Fbgan\u002Fbgan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fbgan\u002F\n$ python3 bgan.py\n```\n\n### 群集GAN\n_群集GAN：生成对抗网络中的潜在空间聚类_\n\n#### 作者\n苏迪普托·穆克吉、希曼舒·阿斯纳尼、尤金·林、斯雷拉姆·坎南\n\n#### 摘要\n生成对抗网络（GAN）在众多无监督学习任务中取得了显著的成功，而聚类无疑是一个重要的无监督学习问题。尽管人们可以充分利用 GAN 中的潜在空间反向投影来进行聚类，但我们发现，GAN 的潜在空间中并不会保留聚类结构。在本文中，我们提出了群集GAN，作为一种利用 GAN 进行聚类的新机制。通过从一热编码变量与连续潜在变量的混合体中采样潜在变量，并结合一个反向网络（将数据投影至潜在空间），该网络与专门用于聚类的损失函数协同训练，我们成功实现了潜在空间中的聚类。我们的研究结果揭示了一个显著的现象：即使判别器从未接触过这些向量，GAN 仍能保留类别间的潜在空间插值。我们将我们的成果与多种聚类基线进行了对比，并证明了其在合成数据集和真实数据集上的优越性能。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.03627) [[代码]](implementations\u002Fcluster_gan\u002Fclustergan.py)\n\n基于完整 PyTorch 的代码[[实现]](https:\u002F\u002Fgithub.com\u002Fzhampel\u002FclusterGAN)。\n\n#### 运行示例\n```\n$ cd implementations\u002Fcluster_gan\u002F\n$ python3 
clustergan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_04a97f36ef7b.gif\" width=\"360\\\"\\>\n\u003C\u002Fp>\n\n\n### 条件GAN\n_条件生成对抗网络_\n\n#### 作者\n梅迪·米尔扎、西蒙·奥辛德罗\n\n#### 摘要\n生成对抗网络 [8] 最近被引入，作为一种训练生成模型的新方法。在本研究中，我们提出了条件生成对抗网络的版本，只需将我们希望对其进行条件约束的数据 y 一同输入生成器和判别器即可构建该模型。我们证明，该模型能够根据类别标签生成 MNIST 数字。我们还展示了如何利用该模型来学习多模态模型，并提供了初步的应用实例：在图像标注任务中，我们演示了该方法如何生成非训练标签中的描述性标签。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1411.1784) [[代码]](implementations\u002Fcgan\u002Fcgan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fcgan\u002F\n$ python3 cgan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_32a80459f043.gif\" width=\"360\\\"\\>\n\u003C\u002Fp>\n\n### 上下文条件GAN\n_带有上下文条件生成对抗网络的半监督学习_\n\n#### 作者\n艾米莉·丹顿、萨姆·格罗斯、罗布·费格斯\n\n#### 摘要\n我们提出了一种基于补全技术的简单半监督图像学习方法，该方法采用对抗损失进行训练。将随机缺失的图像块呈现给生成器，由生成器根据周围像素信息填补这些空缺。随后，补全后的图像会输入判别器网络，由判别器判断这些图像是真实的（未被篡改的训练图像）还是非真实图像。这一任务可作为判别器标准监督训练的正则化项。借助我们的方法，我们能够以半监督的方式直接训练大型 VGG 型网络。我们在 STL-10 和 PASCAL 数据集上进行了评估，我们的方法在性能上与现有方法相当，甚至更优。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.06430) [[代码]](implementations\u002Fccgan\u002Fccgan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fccgan\u002F\n$ python3 ccgan.py\n```\n\n### 上下文编码器\n_上下文编码器：基于上下文的像素预测特征学习_\n\n#### 作者\nDeepak Pathak、Philipp Krahenbuhl、Jeff Donahue、Trevor Darrell、Alexei A. Efros\n\n#### 摘要\n我们提出了一种基于上下文的像素预测的无监督视觉特征学习算法。借鉴自自动编码器的思路，我们提出了“上下文编码器”——一种卷积神经网络，其训练目标是根据图像周围环境的背景信息，生成任意图像区域的内容。要成功完成这一任务，上下文编码器不仅需要理解整张图像的内容，还需要对缺失的部分产生合理的假设。在训练上下文编码器时，我们同时尝试了标准的逐像素重建损失，以及重建损失与对抗性损失的组合。后者能够取得更为清晰的结果，因为它能更好地处理输出中的多种模式。研究发现，上下文编码器所学习的表征不仅捕捉了图像的外观特征，还深入刻画了视觉结构的语义信息。我们通过定量实验验证了所学特征在卷积神经网络预训练中的有效性，这些特征可用于分类、检测和分割等任务。此外，上下文编码器还可用于语义修补任务，既可以独立使用，也可以作为非参数化方法的初始化步骤。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1604.07379) [[代码]](implementations\u002Fcontext_encoder\u002Fcontext_encoder.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fcontext_encoder\u002F\n\u003C按照context_encoder.py文件顶部的步骤进行操作>\n$ python3 context_encoder.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_be629a861f89.png\" width=\"640\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    行：蒙版 | 修补后图像 | 原始图像 | 蒙版 | 修补后图像 | 原始图像\n\u003C\u002Fp>\n\n### 联合生成对抗网络\n_联合生成对抗网络_\n\n#### 作者\nMing-Yu Liu、Oncel Tuzel\n\n#### 摘要\n我们提出了一种联合生成对抗网络（CoGAN），用于学习多领域图像的联合分布。与现有方法不同，传统方法要求训练集中包含不同领域对应的图像对，而CoGAN则无需任何对应图像对，即可学习联合分布。它仅需从边缘分布中采样，便能学习到联合分布。实现这一目标的关键在于引入权重共享约束，该约束限制了网络的容量，并促使网络更倾向于选择联合分布的解，而非将边缘分布相乘的结果。我们已将CoGAN应用于多项联合分布学习任务，包括学习彩色图像与深度图像的联合分布，以及学习具有不同属性的人脸图像的联合分布。在每项任务中，CoGAN均能成功地学习到联合分布，而无需任何对应图像对。我们还展示了CoGAN在领域适应和图像变换等任务中的应用。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.07536) [[代码]](implementations\u002Fcogan\u002Fcogan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fcogan\u002F\n$ python3 cogan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_3d3ec415fb93.gif\" width=\"360\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    生成的MNIST和MNIST-M图像\n\u003C\u002Fp>\n\n### CycleGAN\n_利用循环一致性对抗网络实现无配对图像到图像的翻译_\n\n#### 作者\nJun-Yan Zhu、Taesung Park、Phillip Isola、Alexei A. 
Efros\n\n#### 摘要\n图像到图像的翻译是一种视觉与图形学领域的难题，其目标是借助一组配对的图像对，学习输入图像与输出图像之间的映射关系。然而，在许多实际应用中，往往无法获得配对的训练数据。为此，我们提出了一种在缺乏配对样本的情况下，学习将图像从源域X翻译至目标域Y的方法。我们的目标是学习一个映射G:X→Y，使得G(X)生成的图像分布与Y的分布在对抗性损失下难以区分。由于该映射的约束条件极为宽松，我们将其与逆映射F:Y→X结合，并引入循环一致性损失，以推动F(G(X))≈X（反之亦然）。我们在多个不存在配对训练数据的任务上展示了定性结果，包括风格迁移、物体变形、季节转换、照片增强等。通过与多种现有方法的定量对比，我们证明了本方法的优越性。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.10593) [[代码]](implementations\u002Fcyclegan\u002Fcyclegan.py)\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_fcac6d24856d.png\" width=\"640\"\\>\n\u003C\u002Fp>\n\n#### 运行示例\n```\n$ cd data\u002F\n$ bash download_cyclegan_dataset.sh monet2photo\n$ cd ..\u002Fimplementations\u002Fcyclegan\u002F\n$ python3 cyclegan.py --dataset_name monet2photo\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_50907f776639.png\" width=\"900\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    从Monet到照片的翻译结果。\n\u003C\u002Fp>\n\n### 深度卷积生成对抗网络\n_深度卷积生成对抗网络_\n\n#### 作者\nAlec Radford、Luke Metz、Soumith Chintala\n\n#### 摘要\n近年来，卷积神经网络（CNN）在计算机视觉应用中得到了广泛应用，而无监督学习在CNN中的应用却相对较少。在本研究中，我们希望弥合CNN在监督学习与无监督学习领域取得的成功之间的差距。我们提出了一类名为“深度卷积生成对抗网络”（DCGAN）的CNN，并对其架构进行了特定的约束设计，同时证明了DCGAN是无监督学习的有力候选者。通过对多种图像数据集的训练，我们获得了令人信服的证据：我们的深度卷积对抗网络在生成器和判别器中，都能从对象部件逐步学习到场景层次的表征体系。此外，我们还将所学特征应用于多种新任务，充分证明了其作为通用图像表征的适用性。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06434) [[代码]](implementations\u002Fdcgan\u002Fdcgan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fdcgan\u002F\n$ python3 dcgan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_1106b8957087.gif\" width=\"240\"\\>\n\u003C\u002Fp>\n\n### DiscoGAN\n_借助生成对抗网络学习跨域关系_\n\n#### 作者\n金泰洙、车文秀、金贤洙、李正权、金智媛\n\n#### 摘要\n人类在无需任何监督的情况下，能够轻松识别来自不同领域数据之间的关系；然而，要自动学习并发现这些关系，通常是一项极具挑战性的任务，且往往需要大量标注的真值对来揭示这些关系。为避免耗费高昂的成本进行人工配对，我们针对在无配对数据条件下发现跨域关系这一任务展开研究。我们提出了一种基于生成对抗网络的方法，该方法能够学习并发现不同领域之间的关系（DiscoGAN）。利用所发现的关系，我们的网络成功实现了从一个领域向另一个领域的风格迁移，同时保留了关键属性，例如方向和面部身份。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.05192) [[代码]](implementations\u002Fdiscogan\u002Fdiscogan.py)\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_ae76c24abe01.png\" width=\"640\"\\>\n\u003C\u002Fp>\n\n#### 运行示例\n```\n$ cd data\u002F\n$ bash download_pix2pix_dataset.sh edges2shoes\n$ cd ..\u002Fimplementations\u002Fdiscogan\u002F\n$ python3 discogan.py --dataset_name edges2shoes\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_73aad89bb03f.png\" width=\"480\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    从上到下依次为：(1) 来自域A的真实图像；(2) 来自域A的翻译后图像；(3) 来自域A的重建图像；(4) 来自域B的真实图像；(5) 来自域B的翻译后图像；(6) 来自域B的重建图像\n\u003C\u002Fp>\n\n### DRAGAN\n_生成对抗网络的收敛性与稳定性_\n\n#### 作者\n纳文·科达利、雅各布·阿伯内西、詹姆斯·海斯、佐尔特·基拉\n\n#### 摘要\n我们提出将生成对抗网络的训练动力学视为一种“遗憾最小化”问题来加以研究，这与主流观点——即生成对抗网络的训练目标始终是使真实分布与生成分布之间的差异最小化——形成了鲜明对比。我们从这一全新视角分析生成对抗网络训练的收敛性，以深入理解为何会出现模式坍缩现象。我们假设，在这一非凸博弈中，存在一些不良的局部均衡状态，而这些局部均衡状态正是导致模式坍缩的原因。我们观察到，这些局部均衡状态往往会在某些真实数据点附近表现出显著的判别器函数梯度。我们证明，通过一种名为DRAGAN的梯度惩罚方案，可以有效避免这些退化的局部均衡状态。研究表明，DRAGAN能够实现更快的训练速度，提升训练稳定性，减少模式坍缩的发生，并帮助生成器网络在多种架构和目标函数下获得更出色的建模性能。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.07215) [[代码]](implementations\u002Fdragan\u002Fdragan.py)\n\n#### 运行示例\n```\n$ cd 
implementations\u002Fdragan\u002F\n$ python3 dragan.py\n```\n\n### DualGAN\n_DualGAN：用于图像到图像翻译的无监督双流学习_\n\n#### 作者\n易子莉、张浩、谭平、公明伦\n\n#### 摘要\n条件生成对抗网络（GAN）在跨域图像到图像翻译任务中取得了显著进展。根据任务的复杂程度，训练一个条件GAN往往需要数千至数百万组带标签的图像对。然而，人工标注成本高昂，甚至不切实际，而且并非总是能够获取大量可用数据。受自然语言翻译中的双流学习启发，我们开发了一种全新的DualGAN机制，使图像翻译器能够从两个域的两组无标签图像中进行训练。在我们的架构中，原始GAN负责将域U的图像翻译成域V中的图像，而双流GAN则负责反向完成这一任务。通过原始任务与双流任务构成的闭环，任一域的图像均可被翻译并随后重建。因此，我们可以使用一个能兼顾图像重建误差的损失函数来训练翻译器。在多项无标签图像翻译任务上的实验表明，DualGAN在性能上相比单个GAN有了显著提升。在部分任务中，DualGAN甚至能够取得与完全标注数据训练的条件GAN相当甚至略胜一筹的成果。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.02510) [[代码]](implementations\u002Fdualgan\u002Fdualgan.py)\n\n\n#### 运行示例\n```\n$ cd data\u002F\n$ bash download_pix2pix_dataset.sh facades\n$ cd ..\u002Fimplementations\u002Fdualgan\u002F\n$ python3 dualgan.py --dataset_name facades\n```\n\n### 能量型生成对抗网络\n_能量型生成对抗网络_\n\n#### 作者\n赵俊博、迈克尔·马修、扬·勒库恩\n\n#### 摘要\n我们提出了“能量型生成对抗网络”模型（EBGAN），该模型将判别器视为一种能量函数：它会将低能量分配给接近数据流形的区域，而将高能量分配给其他区域。与概率生成对抗网络类似，生成器被视作旨在以最低的能量生成对比样本，而判别器则被训练为将高能量分配给这些生成的样本。将判别器视为能量函数，使得我们不仅能够采用常规的二分类器（输出为逻辑回归）作为判别器，还可以运用多种不同的架构和损失函数。其中，我们展示了一种EBGAN框架的具体实现方式：采用自编码器架构，将能量设为重建误差，以此替代判别器。我们发现，这种形式的EBGAN在训练过程中比普通生成对抗网络表现得更加稳定。此外，我们还证明，单尺度架构同样能够被训练以生成高分辨率图像。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.03126) [[代码]](implementations\u002Febgan\u002Febgan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Febgan\u002F\n$ python3 ebgan.py\n```\n\n### 增强型超分辨率GAN\n_ESRGAN：增强型超分辨率生成对抗网络_\n\n#### 作者\n王新涛、于珂、吴世翔、顾金金、刘一豪、董超、陈变更、乔宇、唐小欧\n\n#### 摘要\n超分辨率生成对抗网络（SRGAN）是一项开创性研究成果，能够在单张图像的超分辨率处理中生成逼真的纹理细节。然而，这些生成的细节往往伴随着令人不悦的伪影。为了进一步提升视觉质量，我们深入研究了SRGAN的三个关键组件——网络架构、对抗损失和感知损失，并对它们逐一进行优化，最终提出了增强型SRGAN（ESRGAN）。具体而言，我们引入了无批归一化的残差残差密集块（RRDB）作为基本的网络构建单元。此外，我们借鉴了相对论GAN的理念，让判别器预测相对真实度，而非绝对值。最后，我们通过使用激活前的特征来改进感知损失，从而为亮度一致性与纹理恢复提供更强大的监督支持。得益于这些改进，所提出的ESRGAN在视觉质量上始终优于SRGAN，且生成的纹理更加逼真自然，并在PIRM2018-SR挑战赛中荣获第一名。代码可在[此网址](https:\u002F\u002Fgithub.com\u002Fxinntao\u002FESRGAN)获取。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1809.00219) [[代码]](implementations\u002Fesrgan\u002Fesrgan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fesrgan\u002F\n\u003C按照esrgan.py顶部的步骤操作>\n$ python3 esrgan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_bc4f696ff5c4.png\" width=\"320\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    最近邻上采样 | ESRGAN\n\u003C\u002Fp>\n\n### GAN\n生成对抗网络_\n\n#### 作者\n伊恩·J·古德福洛、让·普盖-阿巴迪、梅迪·米尔扎、冰徐、大卫·沃德-法利、谢尔吉尔·奥扎尔、阿伦·库尔维尔、约书亚·本吉奥\n\n#### 摘要\n我们提出了一种全新的框架，通过对抗过程来估计生成模型。在此框架中，我们同时训练两个模型：一个用于捕捉数据分布的生成模型G，以及一个用于评估样本来自训练数据的概率、而非来自G的判别模型D。G的训练目标是最大化D出错的概率。这一框架对应于一个最小化极值的双人博弈。在G和D任意函数的范围内，唯一解存在：G能够恢复训练数据分布，而D在所有位置都等于1\u002F2。当G和D由多层感知机定义时，整个系统可以采用反向传播进行训练。无论是在训练阶段还是在生成样本阶段，均无需使用马尔可夫链或未展开的近似推理网络。实验结果通过定性和定量的方式对生成样本进行了验证，充分展示了该框架的潜力。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1406.2661) [[代码]](implementations\u002Fgan\u002Fgan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fgan\u002F\n$ python3 gan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_0013580eaac9.gif\" width=\"240\"\\>\n\u003C\u002Fp>\n\n### InfoGAN\nInfoGAN：通过信息最大化生成对抗网络实现可解释的表征学习_\n\n#### 作者\n陈曦、段妍、雷因·胡托夫、约翰·舒尔曼、伊利亚·苏茨克弗、彼得·阿比尔\n\n#### 
摘要\n本文介绍了InfoGAN，这是一种基于信息论的生成对抗网络扩展，能够以完全无监督的方式学习解耦的表征。InfoGAN是一种生成对抗网络，它还能够最大化潜在变量中一小部分与观测之间的互信息。我们推导出了一个可用于高效优化的互信息目标下界，并证明我们的训练过程可以被理解为Wake-Sleep算法的一种变体。具体而言，InfoGAN成功地将书写风格从MNIST数据集中的数字形状中分离出来，将姿势从3D渲染图像的光照中分离出来，还将背景中的数字与SVHN数据集中的中心数字区分开来。此外，它还在CelebA人脸数据集中发现了包括发型、眼镜的有无以及情绪在内的多种视觉概念。实验表明，InfoGAN能够学习到具有可解释性的表征，其性能与现有全监督方法所学得的表征相当。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.03657) [[代码]](implementations\u002Finfogan\u002Finfogan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Finfogan\u002F\n$ python3 infogan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_cff97b99285b.gif\" width=\"360\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    不同类别潜在变量按列变化的结果。\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_a776f5de29f3.png\" width=\"360\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    连续潜在变量按行变化的结果。\n\u003C\u002Fp>\n\n### 最小二乘生成对抗网络\n最小二乘生成对抗网络_\n\n#### 作者\n毛旭东、李青、谢浩然、刘瑞阳、王珍、斯蒂芬·保罗·斯莫利\n\n#### 摘要\n无监督学习中，生成对抗网络（GAN）已取得了巨大的成功。常规GAN通常将判别器假设为一个使用Sigmoid交叉熵损失函数的分类器。然而，我们发现，这种损失函数在学习过程中可能导致梯度消失问题。为了解决这一问题，我们在本文中提出了最小二乘生成对抗网络（LSGANs），并为判别器采用了最小二乘损失函数。我们证明，最小化LSGAN的目标函数等价于最小化皮尔逊χ²散度。与常规GAN相比，LSGANs具有两大优势：首先，LSGANs能够生成比常规GAN更高质量的图像；其次，LSGANs在学习过程中表现得更加稳定。我们在五个场景数据集上对LSGANs进行了评估，实验结果表明，LSGANs生成的图像质量优于常规GAN生成的图像。此外，我们还开展了两项对比实验，分别比较LSGANs与常规GAN，以凸显LSGANs的稳定性。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.04076) [[代码]](implementations\u002Flsgan\u002Flsgan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Flsgan\u002F\n$ python3 lsgan.py\n```\n\n### MUNIT\n多模态无监督图像到图像翻译\n\n#### 作者\n黄旭、刘明宇、塞尔日·贝隆吉、扬·考茨\n\n#### 摘要\n无监督图像到图像翻译是计算机视觉领域中一项既重要又极具挑战性的课题。在给定源域的一张图像时，我们的目标是在不接触任何对应图像对的情况下，学习目标域中相应图像的条件分布。尽管这种条件分布本质上具有多模态特性，但现有的方法往往做出了过于简化的假设，将其建模为一个确定性的单对一映射。因此，这些方法无法从给定的源域图像中生成多样化的输出。为了解决这一局限性，我们提出了一种多模态无监督图像到图像翻译（MUNIT）框架。我们假设图像表示可以被分解为：一种具有域不变性的内容编码，以及一种能够捕捉域特定属性的风格编码。为了将一张图像翻译至另一个域，我们将其中的内容编码与从目标域风格空间中随机采样的风格编码相结合。我们对所提出的框架进行了深入分析，并建立了若干理论结果。通过与当前最先进的方法进行广泛实验对比，进一步证明了该框架的优势。此外，我们的框架还允许用户通过提供示例风格图像来控制翻译输出的风格。代码和预训练模型均可在[此网址](https:\u002F\u002Fgithub.com\u002Fnvlabs\u002FMUNIT)获取。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1804.04732) [[代码]](implementations\u002Fmunit\u002Fmunit.py)\n\n#### 运行示例\n```\n$ cd data\u002F\n$ bash download_pix2pix_dataset.sh edges2shoes\n$ cd ..\u002Fimplementations\u002Fmunit\u002F\n$ python3 munit.py --dataset_name edges2shoes\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_438374a647c3.png\" width=\"480\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    不同风格编码下的结果表现。\n\u003C\u002Fp>\n\n### Pix2Pix\n无配对图像到图像翻译：结合条件对抗网络\n\n#### 作者\n菲利普·伊索拉、朱俊彦、周廷辉、阿列克谢·A·埃弗罗斯\n\n#### 摘要\n我们研究了条件对抗网络，将其作为解决图像到图像翻译问题的通用解决方案。这些网络不仅能够学习从输入图像到输出图像的映射关系，还能学习一种损失函数来训练该映射关系。这使得我们能够将这一通用方法应用于传统上需要采用截然不同的损失函数才能解决的问题。我们证明，这种方法在合成标签图中的照片、从边缘图中重建物体，以及为图像添加色彩等任务中均表现出色。事实上，自本文所附带的 Pix2Pix 软件发布以来，大量互联网用户（其中不乏艺术家）纷纷上传了自己使用我们系统所完成的实验成果，进一步印证了其广泛适用性以及易于上手的特点，无需对参数进行反复调整。作为社区的一员，我们不再需要手动设计映射函数；这项工作也表明，我们完全可以在不进行手工设计损失函数的情况下，依然取得令人满意的成果。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.07004) [[代码]](implementations\u002Fpix2pix\u002Fpix2pix.py)\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_11179601e0f7.png\" width=\"640\"\\>\n\u003C\u002Fp>\n\n#### 运行示例\n```\n$ cd data\u002F\n$ bash 
download_pix2pix_dataset.sh facades\n$ cd ..\u002Fimplementations\u002Fpix2pix\u002F\n$ python3 pix2pix.py --dataset_name facades\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_d1c590c13e37.png\" width=\"480\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    从上至下依次为：(1) 生成器的条件输入；(2) 基于该条件生成的图像；(3) 与该条件相对应的真实图像。\n\u003C\u002Fp>\n\n### PixelDA\n无监督像素级域适应：结合生成对抗网络\n\n#### 作者\n康斯坦蒂诺斯·布斯马利斯、内森·西尔伯曼、大卫·多汉、杜米特鲁·埃尔汉、迪利普·克里希南\n\n#### 摘要\n对于许多任务而言，收集标注详尽的图像数据集来训练现代机器学习算法的成本高得令人望而却步。一种颇具吸引力的替代方案是生成合成数据，其中真实标注信息可由系统自动生成。然而，不幸的是，仅基于渲染图像训练的模型往往难以泛化至真实图像。为了解决这一不足，先前的研究提出了无监督域适应算法，旨在实现两个域之间的表征映射，或学习提取具有域不变性的特征。在本研究中，我们提出了一种全新的方法，以无监督的方式学习从一个域到另一个域在像素空间中的变换。我们基于生成对抗网络（GAN）的方法，使源域图像呈现出仿佛源自目标域的样貌。我们的方法不仅能生成逼真的样本，而且在多项无监督域适应任务中，其性能远超当前最先进水平。最后，我们还证明，该适应过程能够推广至训练过程中未曾见过的对象类别。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.05424) [[代码]](implementations\u002Fpixelda\u002Fpixelda.py)\n\n#### MNIST 至 MNIST-M 分类\n利用源域（MNIST）的标注信息，对已从源域（MNIST）翻译至目标域（MNIST-M）的图像进行分类训练。分类网络与生成器网络共同训练，以优化生成器：一方面使其能够实现恰当的域翻译，另一方面又能保留源域图像的语义信息。我们对经过翻译处理的图像所训练的分类网络，与单纯在 MNIST 上训练分类器并将其应用于 MNIST-M 的朴素方案进行了对比。在 MNIST-M 上，朴素模型的分类准确率仅为 55%，而经过域适应训练的模型则达到了 95% 的分类准确率。\n\n```\n$ cd implementations\u002Fpixelda\u002F\n$ python3 pixelda.py\n```  \n| 方法       | 准确率  |\n| ------------ |:---------:|\n| 朴素方法      | 55%       |\n| PixelDA      | 95%       |\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_2d177bcddaa8.png\" width=\"480\\\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    从上至下依次为：(1) 来自 MNIST 的真实图像；(2) 从 MNIST 翻译至 MNIST-M 的图像；(3) MNIST-M 图像的示例。\n\u003C\u002Fp>\n\n### 相对论GAN\n_相对论判别器：标准GAN中缺失的关键要素_\n\n#### 作者\n亚历克西娅·乔利库尔-马蒂诺\n\n#### 摘要\n在标准生成对抗网络（SGAN）中，判别器用于估计输入数据为真实数据的概率。生成器则被训练以提高假数据为真实数据的概率。我们提出，判别器还应同时降低真实数据为真实数据的概率，原因如下：1）这能够基于先验知识，即小批量数据中有一半是假数据；2）通过最小化差异，这一目标可以被有效实现；3）在最优设置下，SGAN与积分概率度量（IPM）GAN无异。\n我们证明，可以通过使用相对论判别器来实现这一特性——该判别器能够估算给定真实数据相较于随机采样的假数据更具真实性的概率。此外，我们还提出了一种变体：在这种变体中，判别器会平均估算给定真实数据相较于假数据更具真实性的概率。我们将这两种方法推广至非标准GAN损失函数，并分别将其称为“相对论GAN”（RGAN）和“相对论平均GAN”（RaGAN）。我们发现，基于IPM的GAN是使用恒等函数的RGAN的子集。\n从实证研究来看，我们发现：1）与非相对论GAN相比，RGAN和RaGAN在稳定性上显著提升，且能生成更高质量的数据样本；2）采用梯度惩罚的普通RaGAN生成的数据质量优于WGAN-GP，且每轮生成器更新仅需一次判别器更新（使达到当前顶尖水平所需的时间缩短了400%）；3）RaGAN能够从极小的样本集（N=2011）中生成具有合理分辨率的高分辨率图像，而GAN和LSGAN则无法做到这一点；这些图像的质量远高于WGAN-GP以及采用频谱归一化的SGAN所生成的图像。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1807.00734) [[代码]](implementations\u002Frelativistic_gan\u002Frelativistic_gan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Frelativistic_gan\u002F\n$ python3 relativistic_gan.py                 # 相对论标准GAN\n$ python3 relativistic_gan.py --rel_avg_gan   # 相对论平均GAN\n```\n\n### 半监督GAN\n_半监督生成对抗网络_\n\n#### 作者\n奥古斯都·奥德纳\n\n#### 摘要\n我们将生成对抗网络（GAN）扩展至半监督场景，通过强制判别器网络输出类别标签来实现。我们针对一个包含N个类别的数据集，训练生成模型G和判别器D。在训练阶段，让D预测输入属于N+1个类别中的哪一个，其中额外增加一个类别，用于对应G的输出。我们证明，这种方法能够构建出更加高效的数据分类器，并且其生成的样本质量优于传统GAN。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.01583) [[代码]](implementations\u002Fsgan\u002Fsgan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fsgan\u002F\n$ python3 sgan.py\n```\n\n### Softmax GAN\n_Softmax GAN_\n\n#### 作者\n林敏\n\n#### 摘要\nSoftmax GAN是生成对抗网络（GAN）的一种全新变体。Softmax GAN的核心思想在于：将原始GAN中的分类损失替换为样本空间中单个批次的softmax交叉熵损失。在对N个真实训练样本和M个生成样本进行对抗学习的过程中，判别器的训练目标是将全部概率质量分配给真实样本，每个样本的概率为1\u002FM，并将零概率分配给生成数据。在生成器的训练阶段，目标则是为批次中的所有数据点赋予相等的概率，每个数据点的概率为1\u002FM+N。尽管原始GAN与噪声对比估计（NCE）密切相关，但我们证明，Softmax 
GAN是GAN的“重要性抽样”版本。实验进一步表明，这一简单的改动能够稳定GAN的训练过程。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.06191) [[代码]](implementations\u002Fsoftmax_gan\u002Fsoftmax_gan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fsoftmax_gan\u002F\n$ python3 softmax_gan.py\n```\n\n### StarGAN\n_StarGAN：面向多领域图像到图像翻译的统一生成对抗网络_\n\n#### 作者\n崔允济、崔珉宰、金文英、河正宇、金成勋、秋在吉\n\n#### 摘要\n近期的研究在两领域间的图像到图像翻译任务中取得了显著成果。然而，现有方法在处理超过两个领域时的可扩展性和鲁棒性仍显不足，因为对于每一对图像领域，都需要独立构建不同的模型。为了解决这一局限性，我们提出了StarGAN——一种新颖且可扩展的方法，只需使用一个单一模型，即可实现多领域间的图像到图像翻译。StarGAN这种统一的模型架构，使得多个不同领域的数据集能够在同一个网络中同步训练。因此，与现有模型相比，StarGAN生成的图像质量更为出色，同时还具备灵活地将输入图像翻译至任意目标领域的全新能力。我们在面部属性迁移和面部表情合成任务中，通过实验证明了该方法的有效性。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.09020) [[代码]](implementations\u002Fstargan\u002Fstargan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fstargan\u002F\n\u003C按照stargan.py顶部的步骤操作>\n$ python3 stargan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_02f3a7c38893.png\" width=\"640\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    原始 | 黑发 | 金发 | 棕发 | 性别反转 | 老年\n\u003C\u002Fp>\n\n### 超分辨率GAN\n_基于生成对抗网络的超现实单图像超分辨率_\n\n#### 作者\n克里斯蒂安·莱迪格、卢卡斯·泰斯、费伦茨·胡扎尔、何塞·卡巴列罗、安德鲁·坎宁安、亚历杭德罗·阿科斯塔、安德鲁·艾特肯、阿利汗·泰贾尼、约翰内斯·托茨、泽汉·王、温哲·施\n\n#### 摘要\n尽管借助更快、更深层的卷积神经网络，单图像超分辨率在精度和速度方面取得了突破性进展，但一个核心问题仍然未得到彻底解决：当我们将图像进行大规模上采样时，如何恢复更为精细的纹理细节？基于优化方法的超分辨率算法的行为主要取决于目标函数的选择。近年来的研究大多集中在最小化均方重建误差上。虽然这些方法所得到的估计值具有较高的信噪比，但在高频细节方面往往有所欠缺，且在感知层面并不令人满意——它们无法达到更高分辨率下所期望的高保真度。在本文中，我们提出了SRGAN——一种用于图像超分辨率（SR）的生成对抗网络（GAN）。据我们所知，这是首个能够为4倍上采样因子推断出照片级真实的自然图像的框架。为了实现这一目标，我们提出了一种感知损失函数，该函数由对抗损失和内容损失组成。对抗损失通过训练一个判别器网络，将我们的解决方案推向自然图像的流形；该判别器网络被设计成能够区分超分辨率图像与原始的照片级真实图像。此外，我们还采用了以感知相似度为导向的内容损失，而非像素空间中的相似度。我们的深度残差网络能够在公开基准数据集上，从高度下采样的图像中恢复出照片级真实的纹理。一项广泛的平均意见评分（MOS）测试表明，使用SRGAN在感知质量方面取得了显著提升。SRGAN所获得的MOS分数，相较于任何最先进的方法，都更接近于原始高分辨率图像的得分。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.04802) [[代码]](implementations\u002Fsrgan\u002Fsrgan.py)\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_e0f76b8e3740.png\" width=\"640\"\\>\n\u003C\u002Fp>\n\n#### 运行示例\n```\n$ cd implementations\u002Fsrgan\u002F\n\u003C按照srgan.py顶部的步骤操作>\n$ python3 srgan.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_2cfbe968500a.png\" width=\"320\"\\>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n    最近邻上采样 | SRGAN\n\u003C\u002Fp>\n\n### UNIT\n_无监督图像到图像翻译网络_\n\n#### 作者\n刘明宇、托马斯·布罗伊尔、扬·考茨\n\n#### 摘要\n无监督图像到图像翻译的目标是通过利用各个域的边缘分布中的图像，学习不同域之间图像的联合分布。由于存在无限多的联合分布可以达到给定的边缘分布，若不附加任何额外假设，仅凭边缘分布就无法推断出联合分布的任何信息。为了解决这一问题，我们提出了共享潜在空间的假设，并基于耦合GAN提出了一种无监督图像到图像翻译框架。我们将所提出的框架与现有方法进行了对比，并在多种具有挑战性的无监督图像翻译任务中取得了高质量的图像翻译结果，包括街景图像翻译、动物图像翻译以及人脸图像翻译。此外，我们还将所提出的框架应用于领域适应任务，并在基准数据集上实现了当前最优的性能。代码及更多实验结果可在此[https网址](https:\u002F\u002Fgithub.com\u002Fmingyuliutw\u002Funit)中获取。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.00848) [[代码]](implementations\u002Funit\u002Funit.py)\n\n#### 运行示例\n```\n$ cd data\u002F\n$ bash download_cyclegan_dataset.sh apple2orange\n$ cd implementations\u002Funit\u002F\n$ python3 unit.py --dataset_name apple2orange\n```\n\n### 韦斯特斯坦GAN\n_韦斯特斯坦GAN_\n\n#### 作者\n马丁·阿尔乔维西、索米特·钦塔拉、莱昂·博图\n\n#### 摘要\n我们介绍了一种名为WGAN的新算法，作为传统GAN训练的一种替代方案。在这一新模型中，我们证明了可以提升学习的稳定性，消除模式坍塌等问题，并提供有益的学习曲线，这些曲线对于调试和超参数搜索非常有帮助。此外，我们还证明了相应的优化问题本身是合理的，并通过大量理论研究，揭示了其与不同分布间距离之间的深刻联系。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1701.07875) 
[[代码]](implementations\u002Fwgan\u002Fwgan.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fwgan\u002F\n$ python3 wgan.py\n```\n\n### 韦斯特斯坦GAN GP\n_韦斯特斯坦GAN的改进训练_\n\n#### 作者\n伊尚·古拉贾尼、法鲁克·艾哈迈德、马丁·阿尔乔维西、文森特·杜穆兰、阿伦·库尔维尔\n\n#### 摘要\n生成对抗网络（GAN）是一种强大的生成模型，但其训练过程往往不稳定。最近提出的韦斯特斯坦GAN（WGAN）在推动GAN稳定训练方面取得了进展，但有时仍只能生成低质量样本，或难以收敛。我们发现，这些问题往往源于WGAN中采用的权重裁剪机制，该机制旨在对评价器施加Lipschitz约束，而这种约束可能会导致不良行为。为此，我们提出了一种替代权重裁剪的方法：对评价器的梯度与其输入的范数进行惩罚。我们的方法表现优于标准WGAN，并且能够在几乎无需调参的情况下，为多种GAN架构提供稳定的训练体验，包括101层ResNet以及针对离散数据的语言模型。此外，我们在CIFAR-10和LSUN卧室数据集上也取得了高质量的生成结果。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.00028) [[代码]](implementations\u002Fwgan_gp\u002Fwgan_gp.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fwgan_gp\u002F\n$ python3 wgan_gp.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_7b406b4d81ec.gif\" width=\"240\\\"\\>\n\u003C\u002Fp>\n\n### 沃瑟斯坦 GAN 分离\n_用于 GAN 的沃瑟斯坦散度_\n\n#### 作者\n吴季青、黄志武、珍妮·托马、迪内什·阿查里亚、卢克·范古尔\n\n#### 摘要\n在计算机视觉的诸多领域中，生成对抗网络（GAN）取得了巨大成功，其中沃瑟斯坦 GAN（WGAN）家族因其理论贡献和卓越的定性表现而被视为当前最前沿的技术。然而，要近似满足沃瑟斯坦 1 距离（W-met）所要求的 k- Lipschitz 约束却极具挑战性。在本文中，我们提出了一种全新的沃瑟斯坦散度（W-div），它是 W-met 的一种松弛版本，且无需满足 k-Lipschitz 约束。作为一项具体应用，我们引入了用于 GAN 的沃瑟斯坦散度目标函数（WGAN-div），该目标函数可通过优化方法忠实逼近 W-div。在多种训练场景下，包括渐进式增长训练，我们证明了所提出的 WGAN-div 具有良好的稳定性，其理论与实践优势均优于 WGAN。此外，我们还对 WGAN-div 在标准图像合成基准上的定量与可视化性能进行了研究，结果表明，WGAN-div 相较于当前最先进的方法，表现出更优异的性能。\n\n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.01026) [[代码]](implementations\u002Fwgan_div\u002Fwgan_div.py)\n\n#### 运行示例\n```\n$ cd implementations\u002Fwgan_div\u002F\n$ python3 wgan_div.py\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_readme_e760b5a05b6f.png\" width=\"240\"\\>\n\u003C\u002Fp>","# PyTorch-GAN 快速上手指南\n\nPyTorch-GAN 是一个汇集了多种生成对抗网络（GAN）变体的开源项目，所有模型均使用 PyTorch 实现。本项目侧重于复现论文中的核心思想，适合学习和研究各类 GAN 架构。\n\n> **注意**：原作者表示该仓库已不再积极维护。如需继续开发或协作，可联系原作者邮箱。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**：Linux \u002F macOS \u002F Windows\n*   **Python 版本**：Python 3.6+\n*   **深度学习框架**：PyTorch (将通过依赖自动安装)\n*   **其他依赖**：Git, pip3\n\n## 安装步骤\n\n### 1. 克隆项目代码\n使用 Git 将仓库克隆到本地：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FPyTorch-GAN\ncd PyTorch-GAN\u002F\n```\n\n> **国内加速建议**：如果从 GitHub 克隆速度较慢，可以使用 Gitee 镜像（如有）或通过代理加速，或者手动下载 ZIP 包解压。\n\n### 2. 安装依赖库\n项目所需的所有 Python 依赖包列在 `requirements.txt` 中。建议使用国内镜像源（如清华源或阿里源）以加快下载速度：\n\n```bash\n# 使用默认源安装\nsudo pip3 install -r requirements.txt\n\n# 推荐：使用清华镜像源加速安装\npip3 install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 基本使用\n\n本项目包含了数十种不同的 GAN 实现（如 ACGAN, CycleGAN, Pix2Pix 等）。每种模型都有独立的运行脚本。以下以 **Auxiliary Classifier GAN (ACGAN)** 为例演示最基础的运行流程。\n\n### 1. 进入模型目录\n切换到具体模型的实现文件夹：\n\n```bash\ncd implementations\u002Facgan\u002F\n```\n\n### 2. 运行训练脚本\n直接运行对应的 Python 脚本即可开始训练。脚本会自动处理数据下载（部分模型需要预先下载数据集，请参考各模型目录下的说明）并启动训练过程：\n\n```bash\npython3 acgan.py\n```\n\n### 3. 
查看其他模型\n您可以浏览 `implementations\u002F` 目录查看所有可用的模型列表。对于需要特定数据集的模型（如 BicycleGAN），通常需要先执行数据下载脚本，例如：\n\n```bash\n# 以 BicycleGAN 为例，先下载数据集\ncd data\u002F\nbash download_pix2pix_dataset.sh edges2shoes\n# 返回模型目录并运行\ncd ..\u002Fimplementations\u002Fbicyclegan\u002F\npython3 bicyclegan.py\n```\n\n训练完成后，生成的图像样本通常会保存在模型目录下的 `images\u002F` 或类似文件夹中。","一家时尚电商公司的算法团队正致力于开发自动化的“虚拟试衣”功能，需要将模特穿着普通服装的照片转换为展示特定设计款式的效果图。\n\n### 没有 PyTorch-GAN 时\n- **复现成本极高**：团队需从零阅读 CycleGAN 和 Pix2Pix 等论文的数学公式，手动搭建复杂的生成器与判别器架构，耗时数周且极易出错。\n- **训练难以收敛**：由于缺乏成熟的损失函数配置（如 WGAN-GP 的梯度惩罚），模型常出现模式崩溃，生成的衣物纹理模糊或颜色失真。\n- **数据配对受限**：面对无法获取严格“前后对比”配对数据的现实困境，传统监督学习方法束手无策，项目被迫停滞。\n- **调试黑盒化**：缺少可视化的基准参考，开发人员难以判断是网络结构问题还是超参数设置不当，排查效率极低。\n\n### 使用 PyTorch-GAN 后\n- **即插即用原型**：直接调用仓库中预置的 CycleGAN 和 Pix2Pix 实现代码，仅需替换数据集路径，半天内即可跑通基线模型。\n- **稳定高质量输出**：利用内置优化的训练策略和损失函数，生成的服装图像细节清晰、纹理自然，有效避免了训练过程中的震荡。\n- **解锁非配对训练**：借助成熟的非配对图像转换算法，成功利用网络上爬取的无序服装图进行训练，大幅降低了数据准备门槛。\n- **快速迭代验证**：以官方实现为“黄金标准”，团队可专注于调整业务特定的数据增强策略，迅速验证不同设计风格的迁移效果。\n\nPyTorch-GAN 将原本需要数月研究的学术算法转化为可立即执行的生产力代码，让团队能专注于业务逻辑创新而非底层架构重复造轮子。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Feriklindernoren_PyTorch-GAN_b0271d3d.png","eriklindernoren","Erik Linder-Norén","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Feriklindernoren_598153e0.jpg","ML engineer at Apple. Excited about machine learning, basketball and building things.",null,"Stockholm, Sweden","eriklindernoren@gmail.com","http:\u002F\u002Fwww.eriklindernoren.se","https:\u002F\u002Fgithub.com\u002Feriklindernoren",[85,89],{"name":86,"color":87,"percentage":88},"Python","#3572A5",99.6,{"name":90,"color":91,"percentage":92},"Shell","#89e051",0.4,17447,4090,"2026-04-03T12:56:14","MIT","Linux, macOS","未说明（基于 PyTorch，训练 GAN 通常建议使用 NVIDIA GPU，具体型号和显存取决于运行的具体模型）","未说明",{"notes":101,"python":102,"dependencies":103},"README 中未列出具体的 requirements.txt 内容，仅提供了安装命令。该仓库已停止维护（stale）。不同 GAN 实现（如 CycleGAN, StarGAN 等）对硬件资源需求差异巨大，运行前需查看具体子目录的代码或文档。部分示例需要预先下载数据集（如使用脚本 download_pix2pix_dataset.sh）。","3.x (通过命令 python3 和 pip3 推断)",[104,105,106,107,108,109],"torch","torchvision","numpy","matplotlib","scipy","Pillow",[14,13],"2026-03-27T02:49:30.150509","2026-04-06T06:53:09.056678",[114,119,124,129,134,139],{"id":115,"question_zh":116,"answer_zh":117,"source_url":118},6459,"运行 cyclegan.py 时出现 'ValueError: num_samples should be a positive integer value, but got num_samples=0' 错误怎么办？","该错误通常是因为数据集路径配置错误或数据文件夹为空。请检查以下几点：\n1. 确保数据集名称已正确添加到命令末尾，例如：`python cyclegan.py --dataset_name monet2photo`。\n2. 检查数据目录结构是否符合预期。如果是使用下载脚本，需确保执行了目录移动操作（将 trainA\u002FtrainB 移动到 train\u002FA\u002Ftrain\u002FB 等）。\n3. 
确认 `\u002Fdata\u002F你的数据集名称` 文件夹中确实存在图片数据，样本数量不能为零。","https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FPyTorch-GAN\u002Fissues\u002F99",{"id":120,"question_zh":121,"answer_zh":122,"source_url":123},6460,"在 WGAN-GP 实现中，是否需要对 real_samples 和 fake_samples 使用 .detach()？","在当前代码实现中，不使用 `.detach()` 也没有问题，因为 `real_samples` 和 `fake_samples` 并没有直接用于反向传播计算梯度。虽然加上 `.detach()` 不会造成负面影响，甚至可能让代码逻辑更清晰（避免不必要的梯度计算），但在该特定实现中，由于后续会清零梯度，因此不加也不会影响训练结果或导致错误。","https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FPyTorch-GAN\u002Fissues\u002F21",{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},6461,"如何修改代码以使用自定义数据集（如自己的照片）训练 WGAN-GP？","若要使用自定义图片文件夹训练，需修改数据加载部分的变换配置。如果遇到报错，尝试从 `transforms_` 列表中移除 `Image.BICUBIC`。示例代码如下：\n```python\nfolder_path = \".\u002Fmy_images\" # 你的图片文件夹路径\ntransforms_ = [\n    transforms.Resize(300),\n    # 移除 Image.BICUBIC,\n    transforms.CenterCrop(300),\n    transforms.ToTensor(),\n    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))\n]\ndataloader = DataLoader(ImageDataset(folder_path, transforms_=transforms_), batch_size=64, shuffle=True, num_workers=8)\n```\n确保文件夹中包含有效的图片文件（如 .png 或 .jpg）。","https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FPyTorch-GAN\u002Fissues\u002F9",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},6462,"训练判别器（Discriminator）时，可以将真实图片和生成图片拼接在一起打乱后输入吗？","理论上可以拼接并打乱，但这通常不是一个好主意。如果判别器中包含批归一化（BatchNorm）层，这种操作会导致统计信息计算混乱，从而破坏 GAN 的训练平衡，使判别器过快收敛。如果必须这样做，建议将所有的 BatchNorm 层替换为 GroupNorm (GN) 层，或者在不需要 BN 的任务中使用。通常情况下，分开推断真实图片和假图片是更稳妥的做法。","https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FPyTorch-GAN\u002Fissues\u002F135",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},6463,"CycleGAN 中的 Identity Loss（恒等损失）代码逻辑是否正确？为什么输入域 A 的图片要经过 G_BA？","代码逻辑是正确的。Identity Loss 的目的是确保生成器在输入已经是目标域风格时不做任何改变（即实现恒等映射 f(x)=x）。\n具体来说：\n- `G_BA` 的作用是将域 B 转换为域 A。\n- 当输入是来自域 A 的图片 (`real_A`) 时，它已经属于目标域 A。此时通过 `G_BA`，期望输出仍然是原图 `real_A`。\n- 如果输入域 A 的图片经过 `G_BA` 后发生了变化，说明生成器过度转换了图片，Identity Loss 会惩罚这种行为。\n参考论文：Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks。","https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FPyTorch-GAN\u002Fissues\u002F59",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},6464,"运行 InfoGAN 或其他模型时出现 'UserWarning: nn.Upsample is deprecated' 警告如何处理？","这是一个关于 PyTorch 版本更新的警告，`nn.Upsample` 已被弃用，建议使用 `nn.functional.interpolate`。虽然忽略该警告代码仍可正常运行，但为了消除警告，可以定义一个自定义的 `Upsample` 模块来替代：\n```python\nclass Upsample(nn.Module):\n    \"\"\" nn.Upsample is deprecated \"\"\"\n    def __init__(self, scale_factor, mode=\"nearest\"):\n        super(Upsample, self).__init__()\n        self.scale_factor = scale_factor\n        self.mode = mode\n\n    def forward(self, x):\n        x = F.interpolate(x, scale_factor=self.scale_factor, mode=self.mode)\n        return x\n```\n然后在模型定义中使用这个自定义类代替原有的 `nn.Upsample`。","https:\u002F\u002Fgithub.com\u002Feriklindernoren\u002FPyTorch-GAN\u002Fissues\u002F46",[]]