[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-sgrvinod--a-PyTorch-Tutorial-to-Super-Resolution":3,"tool-sgrvinod--a-PyTorch-Tutorial-to-Super-Resolution":61},[4,18,28,36,45,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":24,"last_commit_at":25,"category_tags":26,"status":17},9989,"n8n","n8n-io\u002Fn8n","n8n 是一款面向技术团队的公平代码（fair-code）工作流自动化平台，旨在让用户在享受低代码快速构建便利的同时，保留编写自定义代码的灵活性。它主要解决了传统自动化工具要么过于封闭难以扩展、要么完全依赖手写代码效率低下的痛点，帮助用户轻松连接 400 多种应用与服务，实现复杂业务流程的自动化。\n\nn8n 特别适合开发者、工程师以及具备一定技术背景的业务人员使用。其核心亮点在于“按需编码”：既可以通过直观的可视化界面拖拽节点搭建流程，也能随时插入 JavaScript 或 Python 代码、调用 npm 包来处理复杂逻辑。此外，n8n 原生集成了基于 LangChain 的 AI 能力，支持用户利用自有数据和模型构建智能体工作流。在部署方面，n8n 提供极高的自由度，支持完全自托管以保障数据隐私和控制权，也提供云端服务选项。凭借活跃的社区生态和数百个现成模板，n8n 让构建强大且可控的自动化系统变得简单高效。",184740,2,"2026-04-19T23:22:26",[16,14,13,15,27],"插件",{"id":29,"name":30,"github_repo":31,"description_zh":32,"stars":33,"difficulty_score":10,"last_commit_at":34,"category_tags":35,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":24,"last_commit_at":42,"category_tags":43,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",161147,"2026-04-19T23:31:47",[14,13,44],"语言模型",{"id":46,"name":47,"github_repo":48,"description_zh":49,"stars":50,"difficulty_score":24,"last_commit_at":51,"category_tags":52,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":24,"last_commit_at":59,"category_tags":60,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[27,13,15,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":75,"owner_location":76,"owner_email":77,"owner_twitter":75,"owner_website":75,"owner_url":78,"languages":79,"stars":84,"forks":85,"last_commit_at":86,"license":87,"difficulty_score":10,"env_os":88,"env_gpu":89,"env_ram":88,"env_deps":90,"category_tags":95,"github_topics":96,"view_count":24,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":104,"updated_at":105,"faqs":106,"releases":136},9918,"sgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution","a-PyTorch-Tutorial-to-Super-Resolution","Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network | a PyTorch Tutorial to Super-Resolution","a-PyTorch-Tutorial-to-Super-Resolution 是一个基于 PyTorch 框架的开源教程项目，旨在指导开发者构建能够逼真提升图像分辨率的深度学习模型。它主要解决低分辨率图像模糊、细节缺失的问题，通过算法“幻觉”出原本不存在的像素，将图像尺寸放大四倍（像素量增加 16 倍），同时保持惊人的真实感。\n\n该项目不仅复现了经典的 SRResNet 模型，更重点讲解了如何利用生成对抗网络（GAN）构建 SRGAN 模型。其独特的技术亮点在于引入了残差连接（Residual Connections）以优化深层网络训练，并通过对抗训练机制显著提升了生成图像的纹理细节和视觉自然度，超越了传统仅追求像素误差最小化的方法。\n\n这是一个极佳的学习资源，特别适合具备一定 PyTorch 和卷积神经网络基础的开发者与研究人员。如果你希望深入理解 GAN 的工作原理，或想亲手从零实现前沿的超分辨率算法，这个项目提供了从概念解析、代码实现到训练评估的完整路径。虽然普通用户也可利用其生成的模型处理图片，但其核心价值更偏向于技术教育与算法研究。","This is a **[PyTorch](https:\u002F\u002Fpytorch.org) Tutorial to Super-Resolution**.\n\nThis is also a tutorial for learning about **GANs** and how they work, regardless of intended task or application.\n\nThis is the fifth in [a series of tutorials](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002FDeep-Tutorials-for-PyTorch) I'm writing about _implementing_ cool models on your own with the amazing PyTorch library.\n\nBasic knowledge of PyTorch, convolutional neural networks is assumed.\n\nIf you're new to PyTorch, first read [Deep Learning with PyTorch: A 60 Minute Blitz](https:\u002F\u002Fpytorch.org\u002Ftutorials\u002Fbeginner\u002Fdeep_learning_60min_blitz.html) and [Learning PyTorch with Examples](https:\u002F\u002Fpytorch.org\u002Ftutorials\u002Fbeginner\u002Fpytorch_with_examples.html).\n\nQuestions, suggestions, or corrections can be posted as [issues](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fissues).\n\nI'm using `PyTorch 1.4` in `Python 3.6`.\n\n# Contents\n\n[***Objective***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#objective)\n\n[***Concepts***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#concepts)\n\n[***Overview***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#overview)\n\n[***Implementation***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#implementation)\n\n[***Training***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#training)\n\n[***Evaluation***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#evaluation)\n\n[***Inference***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#inference)\n\n[***Frequently Asked Questions***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#frequently-asked-questions)\n\n# Objective\n\n**To build a model that can realistically increase image resolution.**\n\nSuper-resolution (SR) models essentially hallucinate new pixels where previously there were none. In this tutorial, we will try to _quadruple_ the dimensions of an image i.e. increase the number of pixels by 16x!\n\nWe're going to be implementing [_Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network_](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.04802). It's not just that the results are very impressive... it's also a great introduction to GANs!\n\nWe will train the two models described in the paper — the SRResNet, and the SRGAN which greatly improves upon the former through adversarial training.  \n\nBefore you proceed, take a look at some examples generated from low-resolution images not seen during training. _Enhance!_\n\n---\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_44fe1d2fc77e.png\">\n\u003C\u002Fp>\n\n---\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_c2b8443a3162.png)\n\n---\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_bedd19429559.png)\n\n---\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_536f95c0cb31.png\">\n\u003C\u002Fp>\n\n---\n[A **video demo** for the SRGAN!](https:\u002F\u002Fyoutu.be\u002FsUhbIdSd6dc)\n\nSince YouTube's compression is likely reducing the video's quality, you can [download the original video file here](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F12OG-KawSFFs6Pah89V4a_Td-VcwMBE5i?usp=sharing) for best viewing.\n\n[![Click here to watch](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_7cb2dc7466fe.jpg)](https:\u002F\u002Fyoutu.be\u002FsUhbIdSd6dc)\n\nMake sure to watch in 1080p so that the 4x scaling is not downsampled to a lower value.\n\n---\n\nThere are large examples at the [end of the tutorial](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#large-examples).\n\n---\n\n# Concepts\n\n* **Super-Resolution**. duh.\n\n* **Residual Connections**. Introduced in the [seminal 2015 paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.03385), residual connections are shortcuts over one or many neural network layers that allow them to learn residual mappings – perturbations to the input that produce the desired output – instead of wholly learning the output itself. Adding these connections, across so-called residual \"blocks\", greatly increases the optimizability of very deep neural networks. \n  \n* **Generative Adversarial Network (GAN)**. From [another groundbreaking paper](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2014\u002Ffile\u002F5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf), GANs are a machine learning framework that pits two networks against each other, i.e., as adversaries. A generative model, called the Generator, seeks to produce some data – in this case, images of a higher resolution – that is identical in its distribution to the training data. A discriminating model, called the Discriminator, seeks to thwart its attempts at forgery by learning to tell real from fake. As either network grows more skilled, its predictions can be used to improve the other. Ultimately, we want the Generator's fictions to be indistinguishable from fact – at least to the human eye.\n\n* **Sub-Pixel Convolution**. An alternative to transposed convolutions commonly used for upscaling images, subpixel convolutions use regular convolutions on lower-resolution feature maps to create new pixels in the form of new image channels, which are then \"shuffled\" into a higher-resolution image. \n  \n* **Perceptual Loss**. This combines MSE-based content loss in a \"deep\" image space, as opposed to the usual RGB channel-space, and the adversarial loss, which allows the Generator to learn from the rulings of the Discriminator.\n\n# Overview\n\nIn this section, I will present an overview of this model. If you're already familiar with it, you can skip straight to the [Implementation](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#implementation) section or the commented code.\n\n### Image Upsampling Methods\n\nImage upsampling is basically the process of **artificially increasing its spatial resolution** – the number of pixels that represent the \"view\" contained in the image. \n\n**Upsampling an image is a very common application** – it's happening each time you pinch-zoom into an image on your phone or watch a 480p video on your 1080p monitor. You'd be right that there's no AI involved, and you can tell because the image will begin to appear blurry or blocky once you view it at a resolution greater than that it was encoded at. \n\nUnlike the neural super-resolution that we will attempt in this tutorial, common upsampling methods are not intended to produce high-fidelity estimations of what an image would look like at higher resolution. Rather, they are used because **images constantly need to be resampled in order to display them**. When you want an image to occupy a certain portion of a 1080p screen or be printed to fit A4-sized paper, for example, it'd be a hell of a coincidence if the native resolution of the monitor or printer matched the resolution of the image. While upsampling technically increases the resolution, it remains obvious that it is still effectively a low-resolution, low-detail image that is simply being viewed at a higher resolution, possibly with some smoothing or sharpening.\n\nIn fact, images upsampled with these methods can be used as a **proxy for the low-resolution image** to compare with their super-resolved versions both in the paper and in this tutorial. It would be impossible to display a low-resolution image at the same physical size (in inches, on your screen) as the super-resolved image without upsampling it in some way (or downsampling the super-resolved image, which is stupid). \n\nLet's take a look at some **common upsampling techniques**, shall we?\n\nAs a reference image, consider this awesome Samurai logo from *Cyberpunk 2077* [created by Reddit user \u002Fu\u002Fshapanga](https:\u002F\u002Freddit.com\u002Fr\u002Fcyberpunkgame\u002Fcomments\u002F8rnndi\u002Fi_remade_the_jacket_logo_from_the_trailer_feel\u002F), which I'm using here with their permission.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_a0d326857325.png\">\n\u003C\u002Fp>\n\nConsider the same image at quarter dimensions, or sixteen times fewer pixels.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_5448a97c614e.png\">\n\u003C\u002Fp>\n\nThe goal is to increase the number of pixels in this low-resolution image so it can be displayed at the same size as its high-resolution counterpart. \n\n#### Nearest Neighbour Upsampling\n\nThis is the simplest way to upsample an image and essentially amounts to stretching the image as-is. \n\nConsider a small image with a black diagonal line, with red on one side and gray on the other.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_a7256e34d58d.png\">\n\u003C\u002Fp>\n\nWe first create new, empty pixels between known pixels at the desired resolution.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_537748c3f37c.png\">\n\u003C\u002Fp>\n\nWe then assign each new pixel the **value of its nearest neighbor** whose value we _do_ know.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_140a98d26ec9.png\">\n\u003C\u002Fp>\n\nUpsampling the low-resolution Samurai image using nearest neighbor interpolation yields a result that appears blocky and contains jagged edges. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_16098bcea4cb.png\">\n\u003C\u002Fp>\n\n#### Bilinear \u002F Bicubic Upsampling\n\nHere too, we create empty pixels such that the image is at the target resolution.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_537748c3f37c.png\">\n\u003C\u002Fp>\n\nThese pixels must now be painted in. If we perform linear interpolation using the two closest known pixels (i.e., one on each side), it is **_bilinear_ upsampling**.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_5f23cf1284aa.png\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_211739388a7e.png\">\n\u003C\u002Fp>\n\nUpsampling the low-resolution Samurai image using bilinear interpolation yields a result that is smoother than what we achieved using nearest neighbor interpolation, because there is a more natural transition between pixels. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_34b924f2eb34.png\">\n\u003C\u002Fp>\n\nAlternatively, you can perform cubic interpolation using 4 known pixels (i.e., 2 on each side). This would be **_bicubic_ upsampling**. As you can imagine, the result is even smoother because we're using more data to perform the interpolation.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_da95b2510ad2.png\">\n\u003C\u002Fp>\n\n[This Wikimedia image](https:\u002F\u002Fcommons.wikimedia.org\u002Fwiki\u002FFile:Comparison_of_1D_and_2D_interpolation.svg) provides a nice snapshot of these interpolation methods.\n\nI would guess that if you're viewing a lower-resolution video on a higher-resolution screen – with the VLC media player, for example – you are seeing individual frames of the video upscaled using either bilinear or bicubic interpolation.\n\nThere are other, more advanced upsampling methods such as [Lanczos](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FLanczos_resampling), but my understanding of them is fairly limited. \n\n### Neural Super-Resolution\n\nIn contrast to more \"naive\" image upsampling, the goal of super-resolution *is* to **create high-resolution, high-fidelity, aesthetically pleasing, plausible images** from the low-resolution version. \n\nWhen an image is reduced to a lower resolution, finer details are irretrievably lost. Similarly, **upscaling to a higher resolution requires the _addition_ of new information**. \n\nAs a human, you may be able to visualize what an image might look like at a greater resolution – you might say to yourself, \"this blurry mess in this corner would resolve into individual strands of hair\", or \"that sand-coloured patch might actually be sand and would appear... granular\". To manually create such an image yourself, however, would require a certain level of artistry and would no doubt be extremely painstaking. The goal here, in this tutorial, would be to **train a neural network to perform this task**.\n\nA neural network trained for super-resolution might recognize, for instance, that the black diagonal line in our low-resolution patch from above would need to be reproduced as a smooth but sharp black diagonal in the upscaled image.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_24f9ac32153b.png\">\n\u003C\u002Fp>\n\nWhile neurally super-resolving an image may not be practical (or even necessary) for more mundane tasks, it is already being applied _today_. If you're playing a videogame with [NVIDIA DLSS](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FDeep_learning_super_sampling), for example, what's on your screen is being rendered (at lower cost) at a lower resolution and then neurally hallucinated into a larger but crisp image, as if you rendered it at this higher resolution in the first place. The day may not be far when your favorite video player will automatically upscale a movie to 4K as it plays on your humongous TV. \n\nAs stated at the beginning of this tutorial, we will be training two generative neural models – the **SRResNet** and the **SRGAN**. \n\nBoth networks will aim to _quadruple_ the dimensions of an image i.e. increase the number of pixels by 16x!\n\nThe low-resolution Samurai image super-resolved with the SRResNet is comparable in quality to the original high-resolution version.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_8b985fb2bf7b.png\">\n\u003C\u002Fp>\n\nAnd so is the low-resolution Samurai image super-resolved with the SRGAN.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_5e7f872f5824.png\">\n\u003C\u002Fp>\n\nWith the Samurai image, I'd say the SRResNet's result looks better than the SRGAN's. However, this might be because it's a relatively simple image with plain, solid colours – the SRResNet's weakness for producing overly smooth textures works to its advantage in this instance. \n\nIn terms of the ability to create photorealistic images with fine detail, the SRGAN greatly outperforms the SRResNet because of its adversarial training, as evidenced in the various examples peppered throughout this tutorial.\n\n### Residual (Skip) Connections\n\nGenerally, **deeper neural networks are more capable** – but only up to a point. It turns out that adding more layers will improve performance but after a certain threshold is reached, **performance will *degrade***. \n\nThis degradation is not caused by overfitting the training data – training metrics are affected as well. Nor is it caused by vanishing or exploding gradients, which you might expect with deep networks, because the problem persists despite normalizing initializations and layer outputs. \n\nTo address this relative unoptimizability of deeper neural networks, in a [seminal 2015 paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.03385), researchers introduced **_skip_ connections** – shortcuts that allow information to flow, unchanged, across an intervening operation. This information is added, element-wise, to the output of the operation.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_a40eebf9d12d.png\">\n\u003C\u002Fp>\n\nSuch a connection need not occur across a single layer. You can create a shortcut across a group of successive layers.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_b33c0c80f9a2.png\">\n\u003C\u002Fp>\n\nSkip connections allow intervening layers to **learn a residual mapping instead of learning the unreferenced, desired function in its entirety** – i.e., it would need to model only the changes that must be made to the input to produce the desired output. Thus, while the final result might be the same, what we want these layers to learn has been fundamentally changed.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_a0a74c97ae97.png\">\n\u003C\u002Fp>\n\n**Learning the residual mapping is significantly easier**. Consider, for example, the extreme case of having a group of non-linear layers learn the _identity_ mapping. While this may appear to be a simple task at first glance, its solution – i.e. the weights of these layers that linearly transform the input in such a way that applying a non-linear activation produces that same input – isn't obvious, and approximating it is not trivial. In contrast, the solution to learning its residual mapping, which is simply a _zero function_ (i.e., no changes to the input), *is* trivial – the weights must simply be driven to zero. \n\nIt turns out that this particular example _isn't_ as extreme as we think because **deeper layers in a network do learn something not completely unlike the identity function** because only small changes are made to the input by these layers. \n\nSkip connections allow you to train very deep networks and unlock significant performance gains. It is no surprise that they are used in both the SRResNet (aptly named the Super-Resolution _Residual_ Network) and the Generator of the SRGAN. In fact, you'd be hard-pressed to find a modern network without them. \n\n### Sub-Pixel Convolution\n\nHow is upscaling handled in CNNs? This isn't a task specific to super-resolution, but also to applications like semantic segmentation where the more \"global\" feature maps, which are by definition at a lower resolution, must be upsampled to the resolution you want to perform the segmentation at. \n\nA common approach is to **perform bilinear or bicubic upsampling to the target resolution, and _then_ apply convolutions** (which must be learned) to produce a better result. In fact, earlier networks for super-resolution did exactly this – upscale the low-resolution image at the very beginning of the network and then apply a series of convolutional layers in the high-resolution space to produce the final super-resolved image. \n\nAnother popular method is **transposed convolution**, which you may be familiar with, where whole convolutional kernels are applied to single pixels in the low-resolution image and the resulting multipixel patches are combined at the desired stride to produce the high-resolution image. It's basically **the reverse of the usual convolution process**.\n\n**Subpixel convolution** is an alternative approach that involves applying regular convolutions to the low-resolution image such that **the new pixels that we require are created in the form of additional channels**. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_7a118d21d362.png\">\n\u003C\u002Fp>\n\nIn other words, **if you want to upsample by a factor $s$**, the $s^2$ new pixels that must be created for each pixel in the low-resolution image are produced by the convolution operation in the form of **$s^2$ new channels** at that location. You may use any kernel size $k$ of your choice for this operation, and the low-resolution image can have any number of input channels $i$.\n\nThese channels are then rearranged to yield the high-resolution image, in a process called the **pixel shuffle**.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_f16d3fbc5774.png\">\n\u003C\u002Fp>\n\nIn the above example, there's only one output channel in the high-resolution image. **If you require $n$ output channels, simply create $n$ sets of $s^2$ channels**, which can be shuffled into $n$ sets of $s * s$ patches at each location. \n\nIn the rest of the tutorial, we will the pixel-shuffle operation as follows –\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_096ada5b5d1b.png\">\n\u003C\u002Fp>\n\nAs you can imagine, **performing convolutions in the low-resolution space is more efficient than doing so at a higher resolution**. Therefore, the subpixel convolution layer is often at the very end of the super-resolution network, *after* a series of convolutions have already been applied to the low-resolution image. \n\n### Minimizing Loss – a refresher\n\nLet's stop for a moment to examine _why_ we construct loss functions and minimize them. You probably already know all of this, but I think it would help to go over these concepts again because they are key to understanding how GANs are trained. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_b07a30de297f.png\">\n\u003C\u002Fp>\n\n- A **loss function** $L$ is basically a function that quantifies how _different_ the outputs of our network $N$ are from their desired values $D$. \n  \n- Our neural network's outputs $N(θ_N, I)$ are the outputs generated by the network with its current parameter set $θ_N$ when provided an input $I$.\n  \n- We say _desired_ values $D$, and not gold values or labels, because the values we desire are not necessarily the truth, as we will see later.\n  \n- The goal then would be to **minimize the loss function**, which we do by changing the network's parameters $θ_N$ in a way that drives its ouptuts $N(θ_N, I)$ towards the desired values $D$. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_ba154d6dfc18.png\">\n\u003C\u002Fp>\n\nKeep in mind that the change in the parameters $θ_N$ is not a consequence of minimizing the loss function $L$. Rather, the minimization of the loss function $L$ is a consequence of changing the parameters $θ_N$ in a particular way. Above, I say \"Minimizing $L$ *moves* $θ_N$...\" simply to indicate that *choosing* to minimize a certain loss function $L$ implies these particular changes to $θ_N$.\n\n_How_ the direction and magnitude of the changes to $θ_N$ are decided is secondary to this particular discussion, but in the interest of completeness – \n\n- Gradients of the loss function $L$ with respect to the parameters $θ_N$, i.e. $\\frac{∂L}{∂θ_N}$ are calculated, by propagating gradients back through the network using the chain rule of differentiation, in a process known as *backpropagation*.\n  \n- The parameters $θ_N$ are moved in a direction opposite to the gradients $\\frac{∂L}{∂θ_N}$ by a magnitude proportional to the magnitude of the gradients $\\frac{∂L}{∂θ_N}$ and a step size $lr$ known as the learning rate, thereby descending along the surface of the loss function, in a process known as *gradient descent*.\n\nTo conclude, the important takeaway here is that, for a network $N$ given an input $I$, by choosing a suitable loss function $L$ and desired values $D$, it is possible to manipulate all parameters $θ_N$ upstream of the loss function in a way that drives outputs of $N$ closer to $D$. \n\nDepending upon our requirements, we may choose to manipulate only a subset $θ_n$ of all parameters $θ_N$, by freezing the other parameters $θ_{N-n}$, thereby **training only a subnetwork $n$ in the whole network $N$**, in a way that drives outputs of the subnetwork $n$ in a way which, in turn, drives outputs of the whole network $N$ closer to desired values $D$. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_408e3c3fae03.png\">\n\u003C\u002Fp>\n\nYou may have already done this before in transfer learning applications – for instance, fine-tuning only the final layers $n$ of a large pretrained CNN or Transformer model $N$ to adapt it to a new task. We will do something similar later on, but in an entirely different context.\n\n### The Super-Resolution ResNet (SRResNet)\n\nThe SRResNet is a **fully convolutional network designed for 4x super-resolution**. As indicated in the name, it incorporates residual blocks with skip connections to increase the optimizability of the network despite its significant depth. \n\nThe SRResNet is trained and used as a standalone network, and as you will see soon, provides a **nice baseline for the SRGAN** – for both comparision and initialization.\n\n#### The SRResNet Architecture\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_cbc774407efa.png\">\n\u003C\u002Fp>\n\nThe SRResNet is composed of the following operations –\n\n- First, the low resolution image is convolved with a large kernel size $9\\times9$ and a stride of $1$, producing a feature map at the same resolution but with $64$ channels. A parametric *ReLU* (*PReLU*) activation is applied.\n  \n- This feature map is passed through $16$ **residual blocks**, each consisting of a convolution with a $3\\times3$ kernel and a stride of $1$, batch normalization and *PReLU* activation, another but similar convolution, and a second batch normalization. The resolution and number of channels are maintained in each convolutional layer.\n  \n- The result from the series of residual blocks is passed through a convolutional layer with a $3\\times3$ kernel and a stride of $1$, and batch normalized. The resolution and number of channels are maintained. In addition to the skip connections in each residual block (by definition), there is a larger skip connection arching across all residual blocks and this convolutional layer.\n  \n- $2$ **subpixel convolution blocks**, each upscaling dimensions by a factor of $2$ (followed by *PReLU* activation), produce a net 4x upscaling. The number of channels is maintained.\n  \n- Finally, a convolution with a large kernel size $9\\times9$ and a stride of $1$ is applied at this higher resolution, and the result is *Tanh*-activated to produce the **super-resolved image with RGB channels** in the range $[-1, 1]$.\n\nIf you're wondering about certain specific numbers above, don't worry. As is often the case, they were likely decided either empirically or for convenience by the authors and in the other works they referenced in their paper. \n\n#### The SRResNet Update\n\nTraining the SRResNet, like any network, is composed of a series of updates to its parameters. What might constitute such an update?\n\nOur training data will consist of high-resolution (gold) images, and their low-resolution counterparts which we create by 4x-downsampling them using bicubic interpolation. \n\nIn the forward pass, the SRResNet produces a **super-resolved image at 4x the dimensions of the low-resolution image** that was provided to it. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_08203634652d.png\">\n\u003C\u002Fp>\n\nWe use the **Mean-Squared Error (MSE) as the loss function** to compare the super-resolved image with this original, gold high-resolution image that was used to create the low-resolution image.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_092f5f0289dc.png\">\n\u003C\u002Fp>\n\nChoosing to minimize the MSE between the super-resolved and gold images means we will change the parameters of the SRResNet in a way that, if given the low-resolution image again, it will **create a super-resolved image that is closer in appearance to the original high-resolution version**. \n\nThe MSE loss is a type of ***content* loss**, because it is based purely on the contents of the predicted and target images. \n\nIn this specific case, we are considering their contents in the ***RGB space*** – we will discuss the significance of this soon.\n\n### The Super-Resolution Generative Adversarial Network (SRGAN)\n\nThe SRGAN consists of a **Generator** network and a **Discriminator** network. \n\nThe goal of the Generator is to learn to super-resolve an image realistically enough that the Discriminator, which is trained to identify telltale signs of such artificial origin, can no longer reliably tell the difference. \n\nBoth networks are **trained in tandem**. \n\nThe Generator learns not only by minimizing a content loss, as in the case of the SRResNet, but also by _spying_ on the Discriminator's methods. \n\nIf you're wondering, _we_ are the mole in the Discriminator's office! By providing the Generator access to the Discriminator's inner workings in the form of the gradients produced therein when backpropagating from its outputs, the Generator can adjust its own parameters in a way that alter the Discriminator's outputs in its favour. \n\nAnd as the Generator produces more realistic high-resolution images, we use these to train the Disciminator, improving its disciminating abilities.\n\n#### The Generator Architecture\n\nThe Generator is **identical to the SRResNet** in architecture. Well, why not? They perform the same function. This also allows us to use a trained SRResNet to initialize the Generator, which is a huge leg up. \n\n#### The Discriminator Architecture\n\nAs you might expect, the Discriminator is a convolutional network that functions as a **binary image classifier**.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_474d6d3c8346.png\">\n\u003C\u002Fp>\n\nIt is composed of the following operations –\n\n- The high-resolution image (of natural or artificial origin) is convolved with a large kernel size $9\\times9$ and a stride of $1$, producing a feature map at the same resolution but with $64$ channels. A leaky *ReLU* activation is applied.\n  \n- This feature map is passed through $7$ **convolutional blocks**, each consisting of a convolution with a $3\\times3$ kernel, batch normalization, and leaky *ReLU* activation. The number of channels is doubled in even-indexed blocks. Feature map dimensions are halved in odd-indexed blocks using a stride of $2$.\n  \n- The result from this series of convolutional blocks is flattened and linearly transformed into a vector of size $1024$, followed by leaky *ReLU* activation.\n  \n- A final linear transformation yields a single logit, which can be converted into a probability score using the *Sigmoid* activation function. This indicates the **probability of the original input being a natural (gold) image**.\n\n#### Interleaved Training\n\nFirst, let's describe how the Generator and Discriminator are trained in relation to each other. Which do we train first? \n\nWell, neither is fully trained well before the other – they are both trained *together*.\n\nTypically, any GAN is **trained in an interleaved fashion**, where the Generator and Discriminator are alternately trained for short periods of time.\n\nIn this particular paper, each component network is updated just once before making the switch.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_7ec23d03f780.png\">\n\u003C\u002Fp>\n\nIn other GAN implementations, you may notice there are $k$ updates to the Discriminator for every update to the Generator, where $k$ is a hyperparameter that can be tuned for best results. But often, $k=1$.\n\n#### The Discriminator Update\n\nIt's better to understand what constitutes an update to the Discriminator before getting to the Generator. There are no surprises here – it's exactly as you would expect. \n\nSince the Discriminator will learn to tell apart natural (gold) high-resolution images from those produced by Generator, it is provided both gold and super-resolved images with the corresponding labels ($HR$ vs $SR$) during training.\n\nFor example, in the forward pass, the Discriminator is provided with a gold high-resolution image and it produces a **probability score $P_{HR}$ for it being of natural origin**. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_96f700b14c67.png\">\n\u003C\u002Fp>\n\nWe desire the Discriminator to be able to correctly identify it as a gold image, and for $P_{HR}$ to be as high as possible. We therefore minimize the **binary cross-entropy loss** with the correct ($HR$) label.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_3c35c71d8a6f.png\">\n\u003C\u002Fp>\n\nChoosing to minimize this loss will change the parameters of the Discriminator in a way that, if given the gold high-resolution image again, it will **predict a higher probability $P_{HR}$ for it being of natural origin**. \n\nSimilarly, in the forward pass, the Discriminator is provided with the super-resolved image that the Generator (in its current state) created from the downsampled low-resolution version of the original high-resolution image, and the Discriminator produces a **probability score $P_{HR}$ for it being of natural origin**. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_04e5cbfa8975.png\">\n\u003C\u002Fp>\n\nWe desire the Discriminator to be able to correctly identify it as a super-resolved image, and for $P_{HR}$ to be as low as possible. We therefore minimize the **binary cross-entropy loss** with the correct ($SR$) label.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_c19364208956.png\">\n\u003C\u002Fp>\n\nChoosing to minimize this loss will change the parameters of the Discriminator in a way that, if given the super-resolved image again, it will **predict a lower probability $P_{HR}$ for it being of natural origin**. \n\nThe training of the Discriminator is fairly straightforward, and isn't any different from how you would expect to train any image classifier.\n\nNow, let's look at what constitutes an update to the Generator.\n\n#### A Better Content Loss\n\nThe **MSE-based content loss in the RGB space**, as used with the SRResNet, is a staple in the image generation business. \n\nBut it has its drawbacks – it **produces overly smooth images** without the fine detail that is required for photorealism. You may have already noticed this in the results of the SRResNet in the various examples in this tutorial. And it's easy to see why.\n\nWhen super-resolving a low-resolution patch or image, there are often multiple closely-related possibilities for the resulting high-resolution version. In other words, a small blurry patch in the low-resolution image can resolve itself into a manifold of high-resolution patches that would each be considered a valid result. \n\nImagine, for instance, that a low-resolution patch would need to produce a hatch pattern with blue diagonals with a specific spacing at a higher resolution in the RGB space. There are multiple possiblities for the exact positions of these diagonal lines.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_1cea36542b48.png\">\n\u003C\u002Fp>\n\nAny one of these would be considered a satisfying result. Indeed, the natural high-resolution image *will* contain one of them. \n\nBut a network trained with content loss in the RGB space, like the SRResNet, would be quite reluctant to produce such a result. Instead, it opts to produce something that is essentially the ***average* of the manifold of finely detailed high-resolution possibilities.** This, as you can imagine, contains little or no detail because they have all been averaged out! But it *is* a safe prediction because the natural or ground-truth patch it was trained with can be any one of these possibilities, and producing any *other* valid possibility would result in a very high MSE.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_72b9f3730b84.png\">\n\u003C\u002Fp>\n\nIn other words, the very fact that it is impossible to know from the low-resolution patch the *exact* RGB pixels in the ground-truth patch deters the network from creatively producing any equivalent pattern because there is a high risk of high MSE and a snowball's chance in hell of coincidentally producing the same pixels as the ground-truth patch. Instead, **an overly smooth \"averaged\" prediction will almost always have lower MSE!** \n\nIn the eyes of the model – remember, the model is *seeing* through the content loss in the RGB space – these many valid possibilities are not equivalent at all. The only valid prediction is producing *the* ground-truth RGB pixels, which are impossible to know exactly. To solve this problem, we need a way to make these many possibilities that are equivalent in our eyes *also* equivalent in the eyes of the model. \n\nIs there a way to ignore the precise configuration of RGB pixels in a patch or image and instead boil it down it to its basic essence or meaning? *Yes!* CNNs trained to classify images do exactly this – they produce \"deeper\" representations of the patch or image that describe its nature. It stands to reason that **patterns that are logically equivalent in the RGB space will produce similar representations when passed through a trained CNN.**\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_c9de0fce2aca.png\">\n\u003C\u002Fp>\n\nThis new \"deep\" representation space is much more suitable for calculating a content loss! Our super-resolution model no longer need fear being creative – producing a logical result with fine details that is not exactly the same as the ground-truth in RGB space will not be penalized. \n\n#### The Generator Update – part 1\n\nThe first component of the Generator update involves the ***content* loss**.\n\nAs we know, in the forward pass, the Generator produces a **super-resolved image at 4x the dimensions of the low-resolution image** that was provided to it. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_c92e95801152.png\">\n\u003C\u002Fp>\n\nFor the reasons described in the previous section, we will *not* be using MSE in RGB space as the content loss to compare the super-resolved image with the original, gold high-resolution image that was used to create the low-resolution image.\n\nInstead, we will pass both of these through a trained CNN, specifically the **VGG19 network** that has been pretrained on the Imagenet classification task. This network is **truncated at the $4$th convolution after the $5$th maxpooling layer.** \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_6899179549a1.png\">\n\u003C\u002Fp>\n\nWe use **MSE-based content loss in this VGG space** to compare, indirectly, the super-resolved image with the original, gold high-resolution image.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_010009813cb4.png\">\n\u003C\u002Fp>\n\nChoosing to minimize the MSE between the super-resolved and gold images in the VGG space means we will change the parameters of the generator in a way that, if given the same low-resolution image again, it will **create a super-resolved image that is closer in appearance to the original high-resolution version by virtue of being closer in appearance in the VGG space**, *without* being overly-smooth or unrealistic as in the case of the SRResNet. \n\n#### The Generator Update – part 2\n\nWhat's a GAN without adversarial training? *Not* a GAN, is what.\n\nThe use of a content loss is only one component of a Generator update, and while we will see improvements with the use of a VGG space instead of an RGB space, the biggest contributor to photorealistic super-resolution in the Generator as opposed to the SRResNet is still likely going to be the **adversarial loss**.\n\nHere, the super-resolved image is passed through the Discriminator (in its current state) to obtain a **probability score $P_{HR}$ for it being of natural origin**. \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_f0283407b238.png\">\n\u003C\u002Fp>\n\nThe Generator would obviously like the Discriminator to *not* realize that it is indeed *not* a natural image and for $P_{HR}$ to be as high as possible. How would we update the Generator in a way that increases $P_{HR}$? Note that our objective in this step is to train the Generator only – the Discriminator's weights are frozen.\n\nWe therefore **provide our *desired* label ($HR$) – the incorrect or misleading label –** to the binary cross-entropy loss function and use the resulting gradient information to update the Generator's weights!\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_b51e2b63eca4.png\">\n\u003C\u002Fp>\n\nChoosing to minimize the binary cross-entropy loss with the desired but *wrong* ($HR$) label means we will change the parameters of the Generator in a way that, if given the low-resolution image again, it will **create a super-resolved image that is closer in appearance and characteristics to the original high-resolution version such that it becomes more likely for the Discriminator to identify it as being of natural origin**. \n\nIn other words, from this loss formulation, we are using gradient information in the Discriminator – i.e. how the Discriminator's output $P_{HR}$ will respond to changes in the Discriminator's parameters – *not* to update the Discriminator's own parameters, *but rather* to acquire gradient information in the Generator via backpropagation – i.e. how the Discriminator's output $P_{HR}$ will respond to changes in the Generator's parameters – to make the necessary changes to the Generator!\n\n[Earlier](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#minimizing-loss--a-refresher) in the tutorial, we saw how we could minimize a loss function and move towards the desired output by updating only a subnetwork $n$ in a larger network $N$ by freezing the parameters of the subnetwork $N-n$. We are doing exactly the same here, with the Generator and Discriminator combining to form a supernetwork $N$, in which we are only updating the Generator $n$. No doubt the loss would be minimized to a greater extent if we also update the Discriminator $N-n$, but doing so would directly sabotage the Discriminator's discriminating abilities, which is counterproductive. \n\n#### Perceptual Loss\n\nSince the Generator learns from two types of losses – content loss and adversarial loss – we can combine them using a weighted average to represent what the authors of the paper call the ***perceptual* loss**.\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_1d165d9aa3cd.png\">\n\u003C\u002Fp>\n\nThis perceptual loss vastly improves upon the capabilities of the SRResNet, with the Generator able to produce photorealistic and finely-detailed images that are much more believable, as evidenced in user studies conducted in the paper! \n\n# Implementation\n\nThe sections below briefly describe the implementation.\n\nThey are meant to provide some context, but **details are best understood directly from the code**, which is quite heavily commented.\n\n### Dataset\n\n#### Description\n\nWhile the authors of the paper trained their models on a 350k-image subset of the ImageNet data, I simply used about 120k COCO images. They're a lot easier to obtain.\n\nAs in the paper, we test trained models on the Set5, Set14, and BSD100 datasets, which are commonly used benchmarks for the super-resolution task.\n\n#### Download\n\nYou'd need to download MSCOCO '14 [Training (13GB)](http:\u002F\u002Fimages.cocodataset.org\u002Fzips\u002Ftrain2014.zip) and [Validation (6GB)](http:\u002F\u002Fimages.cocodataset.org\u002Fzips\u002Fval2014.zip) images. \n\nYou can find download links to the Set5, Set14, BSD100 test datasets [here](https:\u002F\u002Fgithub.com\u002FXPixelGroup\u002FBasicSR\u002Fblob\u002Fmaster\u002Fdocs\u002FDatasetPreparation.md#common-image-sr-datasets). In the Google Drive link, navigate to the `Image Super-Resolution\u002FClassical` folder. Note that in the Set5 and Set14 zips, you will find multiple folders – the images you need are in the `original` folder. \n\nOrganize images in 5 separate folders – the training images in `train2014`, `val2014`, and the testing images in `BSDS100`, `Set5`, `Set14`.\n\n### Model Inputs and Targets\n\nThere are four inputs and targets. All input and target images are composed of RGB channels and are in the RGB space, unless otherwise noted.\n\n#### High-Resolution (HR) Images\n\nHigh-resolution images are random patches of size $96\\times96$ from the training images. HR images are used as targets to train the SRResNet and Generator of the SRGAN, and as inputs to the Discriminator of the SRGAN.\n\nWhen used as targets for the SRResNet, we will normalize these patches' contents to $[-1, 1]$, because this is the range in which MSE is calculated in the paper. Naturally, this means that super-resolved (SR) images must also be generated in $[-1, 1]$.\n\nWhen used as targets for the Generator of the SRGAN, we will normalize these patches' contents with the mean and standard deviation of ImageNet data, which can be found [here](https:\u002F\u002Fpytorch.org\u002Fvision\u002F0.12\u002Fmodels.html), because the HR image will be fed to an truncated ImageNet-pretrained VGG19 network for computing MSE in VGG space.\n\n```python\nmean = [0.485, 0.456, 0.406]\nstd = [0.229, 0.224, 0.225]\n```\n\nWhen used as inputs to the the Discriminator of the SRGAN, we will do the same. Naturally, this means that the Discriminator will accept inputs that have been ImageNet-normed.\n\nPyTorch follows the $NCHW$ convention, which means the channels dimension ($C$) will precede the size dimensions.\n\nTherefore, **HR images are `Float` tensors of dimensions $N\\times3\\times96\\times96$**, where $N$ is the batch size, and values are either in the $[-1, 1]$ range or ImageNet-normed.\n\n#### Low-Resolution (LR) Images\n\nLow-resolution versions of the HR images are produced by 4x bicubic downsampling. LR images are inputs to the SRResNet and Generator of the SRGAN.\n\n[Depending upon the library used for downsampling](https:\u002F\u002Fzuru.tech\u002Fblog\u002Fthe-dangers-behind-image-resizing#qualitative-results), we may need to perform antialiasing (i.e., prevent [aliasing](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FAliasing)) using a Gaussian blur as a low-pass filter before downsampling. Pillow's `resize` function, which we end up using, already incorporates antialiasing measures.\n\nIn the paper, LR images are scaled to $[0, 1]$, but we will instead normalize their contents with the mean and standard deviation of ImageNet data. Naturally, this means that inputs to our SRResNet and Generator must always be ImageNet-normed.\n\nTherefore, **LR images are `Float` tensors of dimensions $N\\times3\\times16\\times16$**, where $N$ is the batch size, and values are always ImageNet-normed.\n\n#### Super-Resolved (SR) Images\n\nSuper-resolved images are the intelligently upscaled versions of the LR images. SR images are outputs of the SRResNet and Generator of the SRGAN, to be compared with the target HR images, and also used as inputs to the Discriminator of the SRGAN.\n\nThe content loss when training the SRResNet is to be computed from RGB values in the $[-1, 1]$. SR images produced by the SRResNet are therefore in the same range, achieved with a final $\\tanh$ layer. \n\nSince the Generator of the SRGAN has the same architecture of the SRResNet, and is initially seeded with the trained SRResNet, the Generator's output will also be in $[-1, 1]$. \n\nSince the content loss when training the Generator is in VGG space, its SR images will need to be converted from $[-1, 1]$ to the ImageNet-normed space for input to the truncated VGG19. As mentioned earlier, the same is done with the HR images.\n\nTherefore, **SR images are `Float` tensors of dimensions $N\\times3\\times96\\times96$**, where $N$ is the batch size, and values are always in the range $[-1, 1]$.\n\n#### Discriminator Labels\n\nSince the Discriminator is a binary image classifier trained with both the SR and HR counterparts of each LR image, labels are $1$ or $0$ representing the $HR$ (natural origin) and $SR$ (artificial, Generator origin) labels respectively.\n\nDiscriminator labels are constructed manually during training – \n\n- a **`Long` tensor of dimensions $N$**, where $N$ is the batch size, filled with $1$s ($HR$) when training the Generator with the adversarial loss.\n\n- a **`Long` tensor of dimensions $2N$**, where $N$ is the batch size, filled with $N$ $1$s ($HR$) and $N$ $0$s ($SR$) when training the Discriminator with the $N$ HR and $N$ SR images respectively.\n\n### Data Pipeline\n\nData is divided into *training* and *test* splits. There is no *validation* split – we will simply use the hyperparameters described in the paper.\n\n#### Parse Raw Data\n\nSee `create_data_lists()` in [`utils.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Futils.py).\n\nThis parses the data downloaded and saves the following files –\n\n- `train_images.json`, a list containing **filepaths of all training images** (i.e. the images in `train2014` and `val2014` folders) that are above a specified minimum size.\n\n- `Set5_test_images.json`, `Set14_test_images.json`, `BSDS100_test_images.json`, each containing **filepaths of all test images** in the `Set5`, `Set14`, and `BSDS100` folders that are above a specified minimum size.\n\n#### Image Conversions\n\nSee `convert_image()` in [`utils.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Futils.py).\n\nWe will use a variety of normalizations, scaling, and representations for pixel values in RGB space –\n\n- As **Pillow (PIL)** images – images as read by the Pillow library in Python. RGB values are stored as integers in $[0, 255]$, which is how images are read from disk.\n  \n- As **floating values in $[0, 1]$**, which is used as an intermediate representation while converting from one representation to another.\n  \n- As **floating values in $[-1, 1]$**, which is how HR images are represented, SR images are produced, and in the case of the SRResNet, the medium in which the content loss is calculated.\n  \n- As **ImageNet-normed values**, which is how LR, SR, HR images are input to *any* model (SRResNet, Generator, Discriminator, or truncated VGG19).\n  \n- As [***y-channel***](https:\u002F\u002Fgithub.com\u002Fxinntao\u002FBasicSR\u002Fwiki\u002FColor-conversion-in-SR#rgbbgr----ycbcr), the luminance channel Y in the YCbCr color format, used to calculate the evaluation metrics Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM).\n\nTransformations from one form to another are accomplished by an intermediate transformation to $[0, 1]$.\n\n#### Image Transforms \n\nSee `ImageTransforms` in [`utils.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Futils.py).\n\nDuring training, **HR images are random fixed-size $96\\times96$ crops from training images** – one random crop per image per epoch. \n\nDuring evaluation, we take the **largest possible center-crop of each test image**, such that their dimensions are perfectly divisible by the scaling factor.\n\n**LR images are produced from HR images by 4x bicubic downsampling.** [Depending upon the library used for downsampling](https:\u002F\u002Fzuru.tech\u002Fblog\u002Fthe-dangers-behind-image-resizing#qualitative-results), we may need to perform antialiasing (i.e., prevent [aliasing](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FAliasing)) using a Gaussian blur as a low-pass filter before downsampling. Pillow's `resize` function, which we end up using, already incorporates antialiasing measures.\n\nHR images are converted to $[-1, 1]$ when training the SRResNet and ImageNet-normed when training the SRGAN. LR images are always ImageNet-normed.\n\n#### PyTorch Dataset\n\nSee `SRDataset` in [`datasets.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fdatasets.py).\n\nThis is a subclass of PyTorch [`Dataset`](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fmaster\u002Fdata.html#torch.utils.data.Dataset), used to **define our training and test datasets.** \n\nIt needs a `__len__` method defined, which returns the size of the dataset, and a `__getitem__` method which returns the LR and HR image-pair corresponding to the `i`th image in the training or test JSON file, after performing the image transformations described above.\n\n#### PyTorch DataLoader\n\nThe `Dataset` described above, `SRDataset`, will be used by a PyTorch [`DataLoader`](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fmaster\u002Fdata.html#torch.utils.data.DataLoader) in [`train_srresnet.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Ftrain_srresnet.py),[`train_srgan.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Ftrain_srgan.py), and [`eval.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Feval.py) to **create and feed batches of data to the models** for training or evaluation.\n\n### Convolutional Block\n\nSee `ConvolutionalBlock` in [`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py).\n\nThis is a custom layer consisting of a **2D convolution**, an optional **batch-normalization**, and an optional ***Tanh***, ***PReLU***, or **Leaky *ReLU* activation**, used as a fundamental building block in the SRResNet, Generator, and Discriminator networks.\n\n### Sub-Pixel Convolutional Block\n\nSee `SubPixelConvolutionalBlock` in [`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py).\n\nThis is a custom layer consisting of a **2D convolution to $s^2n$ channels**, where $s$ is the scaling factor, and $n$ is the desired output channels in the upscaled image, followed by a **PyTorch [`nn.PixelShuffle()`](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fgenerated\u002Ftorch.nn.PixelShuffle.html#torch.nn.PixelShuffle)**, used to perform upscaling in the SRResNet and Generator networks.\n\n### Residual Block\n\nSee `ResidualBlock` in [`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py).\n\nThis is a custom layer consisting of two convolutional blocks. The first convolutional block is *PReLU*-activated, and the second isn't activated at all. Batch normalization is performed in both. A **residual (skip) connection is applied** across the two convolutional blocks.\n\n### SRResNet\n\nSee `SRResNet` in [`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py).\n\nThis **constructs the SRResNet**, [as described](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#the-srresnet-architecture), using convolutional, residual, and sub-pixel convolutional blocks. \n\n### Generator\n\nSee `Generator` in [`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py).\n\nThe Generator of the SRGAN has the **same architecture as the SRResNet**, [as described](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#the-srresnet-architecture), and need not be constructed afresh.\n\n### Discriminator\n\nSee `Discriminator` in [`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py).\n\nThis **constructs the Discriminator of the SRGAN**, [as described](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#the-discriminator-architecture), using convolutional blocks and linear layers. \n\nAn *optional* [`nn.AdaptiveAvgPool2d`](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fgenerated\u002Ftorch.nn.AdaptiveAvgPool2d.html) maintains a fixed image size before it is flattened and passed to the linear layers – this is only required if we don't use the default $96\\times96$ HR\u002FSR image size during training.\n\n### Truncated VGG19\n\nSee `TruncatedVGG19` in [`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py).\n\nThis **truncates an ImageNet-pretrained VGG19 network**, [available in `torchvision`](https:\u002F\u002Fpytorch.org\u002Fvision\u002F0.12\u002Fmodels.html), such that its output is the \"feature map obtained by the $j$th convolution (after activation) before the $i$th maxpooling layer within the VGG19 network\", as described in the paper.\n\nAs the authors do, we will use $i=5$ and $j=4$.\n\n# Training\n\nBefore you begin, make sure to save the required data files for training and evaluation. To do this, run the contents of [`create_data_lists.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fcreate_data_lists.py) after pointing it to the training data folders `train2014`, `val2014`, and test data folders `Set5`, `Set14`, `BSDS100` folders after you [download the data](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#download) –\n\n`python create_data_lists.py`\n\n### Train the SRResNet\n\nSee [`train_srresnet.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Ftrain_srresnet.py).\n\nThe parameters for the SRResNet (and training it) are at the beginning of the file, so you can easily check or modify them should you need to.\n\nTo train the SRResNet **from scratch**, run this file –\n\n`python train_srresnet.py`\n\nTo resume training **from a checkpoint**, point to the corresponding file with the `checkpoint` parameter at the beginning of the code.\n\n### Train the SRGAN\n\nSee [`train_srgan.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Ftrain_srgan.py).\n\nYou can train the SRGAN only after training the SRResNet as the trained SRResNet checkpoint is used to initialize the SRGAN's Generator.\n\nThe parameters for the model (and training it) are at the beginning of the file, so you can easily check or modify them should you need to.\n\nTo train the SRGAN **from scratch**, run this file –\n\n`python train_srgan.py`\n\nTo resume training **from a checkpoint**, point to the corresponding file with the `checkpoint` parameter at the beginning of the code.\n\n### Remarks\n\nWe use the hyperparameters recommended in the paper.\n\nFor the SRResNet, we train using the Adam optimizer with a learning rate of $10^{-4}$ for $10^6$ iterations with a batch size of $16$.\n\nThe SRGAN is also trained with the Adam optimizer, with a learning rate of $10^{-4}$ for $10^5$ iterations and a learning rate of $10^{-5}$ for an *additional* $10^5$ iterations, with a batch size of $16$. \n\nI trained with a single RTX 2080Ti GPU. \n\n### Model Checkpoints\n\nYou can download my pretrained models [here](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F12OG-KawSFFs6Pah89V4a_Td-VcwMBE5i?usp=sharing).\n\nNote that these checkpoint should be [loaded directly with PyTorch](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fgenerated\u002Ftorch.load.html#torch.load) for evaluation or inference – see below.\n\n# Evaluation\n\nSee [`eval.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Feval.py).\n\nTo evaluate the chosen model, run the file –\n\n`python eval.py`\n\nThis will calculate the Peak Signal-to-Noise Ratio (**PSNR**) and Structural Similarity Index Measure (**SSIM**) evaluation metrics on the 3 test datasets for the chosen model.\n\nHere are my results (with the paper's results in parantheses):\n\n|              |      PSNR      |      SSIM      |       |      PSNR      |      SSIM      |       |      PSNR      |      SSIM      |\n| :----------: | :------------: | :------------: | :---: | :------------: | :------------: | :---: | :------------: | :------------: |\n| **SRResNet** | 31.927 (32.05) | 0.902 (0.9019) |       | 28.588 (28.49) | 0.799 (0.8184) |       | 27.587 (27.58) | 0.756 (0.7620) |\n|  **SRGAN**   | 29.719 (29.40) | 0.859 (0.8472) |       | 26.509 (26.02) | 0.729 (0.7397) |       | 25.531 (25.16) | 0.678 (0.6688) |\n|              |    **Set5**    |    **Set5**    |       |   **Set14**    |   **Set14**    |       |   **BSD100**   |   **BSD100**   |\n\nErm, huge grain of salt. The paper emphasizes repeatedly that PSNR and SSIM _aren't really_ an indication of the quality of super-resolved images. The less realistic and overly smooth SRResNet images score better than those from the SRGAN. This is why the authors of the paper conduct an opinion score test, which is obviously beyond our means here.\n\n# Inference\n\nSee [`super_resolve.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fsuper_resolve.py).\n\nMake sure to both the trained SRResNet and SRGAN checkpoints at the beginning of the code.\n\nRun the `visualize_sr()` function with your desired HR image to **visualize results in a grid**, with the original HR image, the bicubic-upsampled image (as a proxy for the LR version of this image), the super-resolved image from the SRResNet, and the super-resolved image from the SRGAN. \n\nThe examples at the beginning of this tutorial were generated using this function. Note that this does not upscale the chosen image, but rather **downscales and then super-resolves to compare with the original HR image**. You will need to modify the code if you wish to upscale the provided image directly, or perform any other function.\n\n**Be mindful of the size of the provided image.** The function provides a `halve` parameter, if you wish to create a new HR image at half the dimensions. This might be required if the original HR image is larger than your screen's size, making it impossible for you to experience the 4x super-resolution. \n\nFor instance, for a 2160p HR image, the LR image will be of 540p (2160p\u002F4) resolution. On a 1080p screen, you will essentially be looking at a comparison between a 540p LR image (in the form of its bicubically upscaled version) and 1080p SR\u002FHR images because your 1080p screen can only display the 2160p SR\u002FHR images at a downsampled 1080p. This is only an *apparent* rescaling of 2x. With `halve = True`, the HR\u002FSR images will be at 1080p and the LR image at 270p.\n\n### Large Examples\n\nThe images in the following examples (from [Cyberpunk 2077](https:\u002F\u002Fwww.cyberpunk.net\u002Fin\u002Fen\u002F)) are quite large. If you are viewing this page on a 1080p screen, you would need to **click on the image to view it at its actual size** to be able to effectively see the 4x super-resolution.\n\n\n\u003Cp align=\"center\">\n  \u003Ci>Click on image to view at full size.\u003C\u002Fi>\n\u003C\u002Fp>\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_edfeba208484.png)\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ci>Click on image to view at full size.\u003C\u002Fi>\n\u003C\u002Fp>\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_b38442e95e6a.png)\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ci>Click on image to view at full size.\u003C\u002Fi>\n\u003C\u002Fp>\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_71b6b5cf5139.png)\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ci>Click on image to view at full size.\u003C\u002Fi>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_cae74d015f18.png\">\n\u003C\u002Fp>\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ci>Click on image to view at full size.\u003C\u002Fi>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_dc67974f24a3.png\">\n\u003C\u002Fp>\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ci>Click on image to view at full size.\u003C\u002Fi>\n\u003C\u002Fp>\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_e414a01144a0.png)\n\n---\n\n# Frequently Asked Questions\n\nI will populate this section over time from common questions asked in the [*Issues*](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fissues) section of this repository.\n\n**Why are super-resolved (SR) images from the Generator passed through the Discriminator twice? Why not simply *reuse* the output of the Discriminator from the first time?**\n\nYes, we do discriminate SR images *twice* –\n\n- When training the Generator, we pass SR images through the Discriminator, and use the Discriminator's output in the adversarial loss function with the incorrect but desired $HR$ label.\n\n- When training the Discriminator, we pass SR images through the Discriminator, and use the Discriminator's output to calculate the binary cross entropy loss with the correct and desired $SR$ label.\n\nIn the first instance, our goal is to update the parameters $\\theta_G$ of the Generator using the gradients of the loss function with respect to $\\theta_G$. And indeed, the Generator *is* a part of the computational graph over which we backpropagate gradients. \n\nIn the second instance, our goal is to update only the parameters $\\theta_D$ of the Discriminator, which are *upstream* of $\\theta_G$ in the *backwards* direction as we backpropagate gradients.\n\nIn other words, it is not necessary to calculate the gradients of the loss function with respect to $\\theta_G$ when training the Discriminator, and there is *no* need for the Generator to be a part of the computational graph! Having it so would be expensive because backpropagation is expensive. Therefore, we *detach* the SR images from the computational graph in the second instance, causing it to become, essentially, an independent variable with no memory of the computational graph (i.e. the Generator) that led to its creation.\n\nThis is why we forward-propagate twice – once with the SR images a part of the full SRGAN computational graph, *requiring* backpropagation across the Generator, and once with the SR images detached from the Generator's computational graph, *preventing* backpropagation across the Generator. \n\nForward-propagating twice is *much* cheaper than backpropagating twice.\n\n**How does subpixel convolution compare with transposed convolution?**\n\nThey seem rather similar to me, and should be able to achieve similar results. \n\nThey can be mathematically equivalent if, for a desired upsampling factor $s$, and a kernel size $k$ used in the subpixel convolution, the kernel size for the transposed convolution is $sk$. The number of parameters in this case will also be the same – $ns^2 * i * k * k$ for the former and $n * i * sk * sk$ for the latter. \n\nHowever, there are indications from some people that subpixel convolution *is* superior in particular ways, although I do not understand why. See this [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1609.07009.pdf), this [repository](https:\u002F\u002Fgithub.com\u002Fatriumlts\u002Fsubpixel), and this [Reddit thread](https:\u002F\u002Fwww.reddit.com\u002Fr\u002FMachineLearning\u002Fcomments\u002Fn5ru8r\u002Fd_subpixel_convolutions_vs_transposed_convolutions\u002F). Perhaps the [original paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1609.05158.pdf) too.\n\nObviously, being mathematically equivalent does not mean they are optimizable or learnable or efficient in the same way, but if anyone can knows *why* subpixel convolution can yield superior results, please open an issue and let me know so I can add this information to this tutorial.\n","这是一个关于**[PyTorch](https:\u002F\u002Fpytorch.org) 超分辨率的教程**。\n\n同时，这也是一个学习**GAN**及其工作原理的教程，无论其具体任务或应用是什么。\n\n这是我在[一系列教程](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002FDeep-Tutorials-for-PyTorch)中的第五篇，旨在教你如何使用强大的 PyTorch 库自行实现各种酷炫的模型。\n\n本教程假定读者已具备 PyTorch 和卷积神经网络的基础知识。\n\n如果你是 PyTorch 的新手，请先阅读 [用 PyTorch 进行深度学习：60 分钟速成](https:\u002F\u002Fpytorch.org\u002Ftutorials\u002Fbeginner\u002Fdeep_learning_60min_blitz.html) 和 [通过示例学习 PyTorch](https:\u002F\u002Fpytorch.org\u002Ftutorials\u002Fbeginner\u002Fpytorch_with_examples.html)。\n\n如有任何问题、建议或更正，欢迎在[问题页面](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fissues)中提出。\n\n我使用的是 `PyTorch 1.4` 和 `Python 3.6`。\n\n# 目录\n\n[***目标***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#objective)\n\n[***概念***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#concepts)\n\n[***概述***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#overview)\n\n[***实现***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#implementation)\n\n[***训练***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#training)\n\n[***评估***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#evaluation)\n\n[***推理***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#inference)\n\n[***常见问题***](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#frequently-asked-questions)\n\n# 目标\n\n**构建一个能够真实地提高图像分辨率的模型。**\n\n超分辨率（SR）模型本质上是在原本没有像素的地方“幻化”出新的像素。在本教程中，我们将尝试将图像的尺寸**四倍放大**，即把像素数量增加到原来的16倍！\n\n我们将实现论文中的两种模型——SRResNet，以及在此基础上通过对抗训练进一步改进的 SRGAN。结果不仅非常惊艳，而且也是了解 GAN 的绝佳入门！\n\n我们将训练论文中提到的两个模型：SRResNet 和 SRGAN。在开始之前，不妨先看看一些由训练过程中未见过的低分辨率图像生成的示例。快来体验一下吧！\n\n---\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_44fe1d2fc77e.png\">\n\u003C\u002Fp>\n\n---\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_c2b8443a3162.png)\n\n---\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_bedd19429559.png)\n\n---\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_536f95c0cb31.png\">\n\u003C\u002Fp>\n\n---\n这是 SRGAN 的**视频演示**！[观看链接](https:\u002F\u002Fyoutu.be\u002FsUhbIdSd6dc)\n\n由于 YouTube 的压缩可能会降低视频质量，你可以从[这里下载原始视频文件](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F12OG-KawSFFs6Pah89V4a_Td-VcwMBE5i?usp=sharing)，以获得最佳观看效果。\n\n[![点击观看](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_7cb2dc7466fe.jpg)](https:\u002F\u002Fyoutu.be\u002FsUhbIdSd6dc)\n\n请务必以 1080p 分辨率观看，以免 4 倍缩放后的图像被下采样到较低分辨率。\n\n---\n\n教程的最后还有**大型示例**。[查看大型示例](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#large-examples)\n\n---\n\n# 概念\n\n* **超分辨率**。显而易见。\n\n* **残差连接**。残差连接最早出现在 2015 年的开创性论文中，它是一种跨越一层或多层神经网络的捷径，使网络能够学习输入的残差映射——即产生期望输出的扰动——而不是直接学习整个输出。通过在所谓的残差“块”中添加这些连接，可以显著提高深层神经网络的可优化性。\n\n* **生成对抗网络 (GAN)**。源自另一篇具有里程碑意义的论文，GAN 是一种机器学习框架，其中两个网络相互对抗。生成模型（Generator）试图生成与训练数据分布完全一致的数据——在本例中就是更高分辨率的图像；而判别模型（Discriminator）则试图通过学习区分真假来阻止这种伪造行为。随着双方技能的提升，它们可以通过彼此的预测来不断完善对方。最终的目标是让生成器所创造的“虚构”图像在人类眼中与真实图像无异。\n\n* **亚像素卷积**。这是一种替代转置卷积的图像上采样方法。亚像素卷积通过对低分辨率特征图进行常规卷积，生成新的像素，并将其作为新的图像通道，随后再将这些通道“打乱”组合成高分辨率图像。\n\n* **感知损失**。它结合了基于 MSE 的内容损失（在“深度”图像空间中计算，而非通常的 RGB 颜色空间）和对抗损失，从而使生成器能够从判别器的判断中学习。\n\n# 概述\n\n在这一部分，我将对本模型进行概述。如果你已经熟悉相关内容，可以直接跳转到[实现](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#implementation)部分或查看带注释的代码。\n\n### 图像上采样方法\n\n图像上采样本质上是**人为地提高其空间分辨率**——即增加用于表示图像中“视图”的像素数量的过程。\n\n**图像上采样是一种非常常见的应用**：每当你在手机上用双指缩放查看图片，或者在1080p显示器上播放480p视频时，都会发生上采样。你可能会认为这其中并没有人工智能的参与，而事实也的确如此——因为一旦你以高于原始编码分辨率的尺寸观看图像，它就会开始显得模糊或出现马赛克效应。\n\n与本教程中我们将尝试的神经网络超分辨率不同，常见的上采样方法并不旨在生成高保真度的、更高分辨率下图像应有的估计结果。相反，它们之所以被使用，是因为**图像在显示时经常需要重新采样**。例如，当你希望一张图片占据1080p屏幕上的特定区域，或者将其打印成A4纸大小时，显示器或打印机的原生分辨率恰好与图片分辨率一致的可能性微乎其微。尽管上采样在技术上确实提高了分辨率，但显而易见的是，它仍然只是以较高分辨率呈现的一张低分辨率、细节匮乏的图像，或许还经过了一些平滑或锐化处理。\n\n事实上，通过这些方法上采样的图像可以作为**低分辨率图像的替代品**，用来与超分辨率后的版本进行对比，无论是在论文中还是在本教程中都是如此。如果不以某种方式对低分辨率图像进行上采样（或者反过来对超分辨率图像进行下采样，这显然很荒谬），就无法将低分辨率图像以与超分辨率图像相同的物理尺寸（例如在屏幕上以英寸为单位）显示出来。\n\n那么，让我们来看看一些**常见的上采样技术**吧？\n\n作为参考图像，我们可以考虑这张来自《赛博朋克2077》的酷炫武士标志[由Reddit用户\u002Fu\u002Fshapanga创作](https:\u002F\u002Freddit.com\u002Fr\u002Fcyberpunkgame\u002Fcomments\u002F8rnndi\u002Fi_remade_the_jacket_logo_from_the_trailer_feel\u002F)，我在此已获得其许可并加以使用。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_a0d326857325.png\">\n\u003C\u002Fp>\n\n现在，我们再来看同一张图像缩小到四分之一尺寸的情况，也就是像素数量减少了16倍：\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_5448a97c614e.png\">\n\u003C\u002Fp>\n\n我们的目标是增加这张低分辨率图像的像素数量，使其能够以与高分辨率版本相同的尺寸显示出来。\n\n#### 最近邻上采样\n\n这是最简单的图像上采样方法，其实质就是直接按原样拉伸图像。\n\n假设有一张小图像，上面有一条黑色对角线，一侧是红色，另一侧是灰色：\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_a7256e34d58d.png\">\n\u003C\u002Fp>\n\n首先，我们在已知像素之间按照所需分辨率插入新的空白像素：\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_537748c3f37c.png\">\n\u003C\u002Fp>\n\n然后，我们将每个新像素的值设置为其**最近邻已知像素的值**：\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_140a98d26ec9.png\">\n\u003C\u002Fp>\n\n使用最近邻插值法对低分辨率的武士图像进行上采样后，得到的结果会显得块状明显，并且边缘会出现锯齿状：\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_16098bcea4cb.png\">\n\u003C\u002Fp>\n\n#### 双线性\u002F双三次上采样\n\n同样地，我们先插入空白像素，使图像达到目标分辨率：\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_537748c3f37c.png\">\n\u003C\u002Fp>\n\n接下来，我们需要为这些空白像素填充颜色。如果我们采用线性插值法，利用距离最近的两个已知像素（即每侧各一个）来计算新像素的值，这就称为**双线性上采样**：\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_5f23cf1284aa.png\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_211739388a7e.png\">\n\u003C\u002Fp>\n\n使用双线性插值法对低分辨率的武士图像进行上采样后，结果会比最近邻插值法更加平滑，这是因为像素之间的过渡更为自然：\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_34b924f2eb34.png\">\n\u003C\u002Fp>\n\n另外，你也可以采用三次插值法，利用四个已知像素（即每侧各两个）来计算新像素的值，这就是**双三次上采样**。正如你所想象的那样，由于使用了更多的数据来进行插值，最终结果会更加平滑：\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_da95b2510ad2.png\">\n\u003C\u002Fp>\n\n[这张维基媒体的示意图](https:\u002F\u002Fcommons.wikimedia.org\u002Fwiki\u002FFile:Comparison_of_1D_and_2D_interpolation.svg)很好地展示了这些插值方法。\n\n我猜想，如果你正在用VLC媒体播放器等工具，在高分辨率屏幕上观看低分辨率视频时，你所看到的画面其实就是通过双线性或双三次插值法进行上采样的每一帧。\n\n此外，还有一些更高级的上采样方法，比如[Lanczos插值法](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FLanczos_resampling)，不过我对它们的了解相对有限。\n\n### 神经网络超分辨率\n\n与较为“简单”的图像上采样不同，超分辨率的目标正是从低分辨率版本中**生成高分辨率、高保真度、美观且逼真的图像**。\n\n当一张图像被降为较低分辨率时，细节会不可逆转地丢失。同样地，**将图像放大到更高分辨率需要“添加”新的信息**。\n\n作为人类，你或许能够想象出图像在更高分辨率下会是什么样子——你可能会心想：“这个角落里的模糊一团，其实应该是每一根独立的发丝”，或者“那片沙色的区域很可能就是沙子，看起来应该带有颗粒感”。然而，若要手动完成这样的工作，不仅需要一定的艺术功底，还会耗费大量精力。而本教程的目标，就是**训练一个神经网络来完成这项任务**。\n\n例如，经过超分辨率训练的神经网络可以识别出，上方低分辨率图像中的黑色对角线，在放大后的图像中应当表现为一条平滑但清晰的黑色对角线。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_24f9ac32153b.png\">\n\u003C\u002Fp>\n\n尽管对于一些日常任务而言，使用神经网络进行超分辨率处理可能并不实用，甚至没有必要，但它已经在当今的实际应用中发挥作用。比如，当你玩支持[NVIDIA DLSS](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FDeep_learning_super_sampling)技术的电子游戏时，屏幕上显示的画面其实是以较低分辨率渲染的，随后通过神经网络“幻化”成更大、更清晰的图像，仿佛一开始就以高分辨率渲染一般。也许不久的将来，你最喜欢的视频播放器就能在你的巨幕电视上自动将影片提升至4K分辨率并播放。\n\n正如本教程开篇所述，我们将训练两种生成式神经网络模型——**SRResNet**和**SRGAN**。\n\n这两种网络都将把图像的尺寸扩大四倍，即像素数量增加16倍！\n\n使用SRResNet超分辨率处理后的低分辨率武士图，其质量已可与原始高分辨率版本相媲美。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_8b985fb2bf7b.png\">\n\u003C\u002Fp>\n\n而使用SRGAN处理后的低分辨率武士图也同样如此。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_5e7f872f5824.png\">\n\u003C\u002Fp>\n\n就这张武士图而言，我认为SRResNet的效果优于SRGAN。不过，这可能是因为它是一张相对简单的图像，色彩单一且纯色为主——SRResNet在生成过于光滑纹理方面的弱点，在这种情况下反而成了优势。\n\n而在生成具有精细细节的逼真图像方面，由于采用了对抗性训练方式，SRGAN的表现远远超过SRResNet，这一点在本教程中穿插的多个示例中都有所体现。\n\n### 残差（跳跃）连接\n\n通常来说，**更深的神经网络能力更强**，但这也存在一个限度。事实证明，增加层数确实能在一定程度上提升性能，然而一旦超过某个临界点，**性能反而会下降**。\n\n这种性能下降并非由过拟合训练数据引起——训练过程中的各项指标同样会受到影响。同时，它也不是由于深度网络常见的梯度消失或爆炸问题所致，因为即便对初始化和各层输出进行了归一化处理，这一现象依然存在。\n\n为了解决深层神经网络难以优化的问题，在2015年发表的一篇开创性论文中，研究人员提出了**跳跃连接**的概念——这些连接允许信息绕过中间的操作步骤，以未改变的形式直接传递过去，并将其与该操作的输出按元素相加。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_a40eebf9d12d.png\">\n\u003C\u002Fp>\n\n这种跳跃连接并不一定只跨越一层。你还可以创建跨多层的快捷路径。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_b33c0c80f9a2.png\">\n\u003C\u002Fp>\n\n借助跳跃连接，中间层无需学习完整的、未经参考的目标函数，而是学习**残差映射**——也就是说，它们只需建模输入到期望输出之间需要做出的改变即可。因此，尽管最终结果可能相同，但我们希望这些层学习的内容却发生了根本性的变化。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_a0a74c97ae97.png\">\n\u003C\u002Fp>\n\n**学习残差映射要容易得多**。以极端情况为例：让一组非线性层学习“恒等映射”。乍一看似乎很简单，但实际上，找到合适的权重组合——使输入经过线性变换后，再通过非线性激活仍能恢复原状——并非易事，近似求解也绝非轻而举。相比之下，学习其残差映射则要简单得多：只需将权重调整为零即可，因为此时对应的映射就是“零函数”，即不对输入做任何改变。\n\n事实上，上述例子并没有我们想象得那么极端——**网络中的深层层实际上学到的正是接近于恒等函数的行为**，因为这些层对输入所做的改动往往非常微小。\n\n跳跃连接使得训练极深的网络成为可能，并能显著提升性能。毫不奇怪，它们被广泛应用于SRResNet（顾名思义，即超分辨率残差网络）以及SRGAN的生成器中。实际上，如今几乎找不到没有跳跃连接的现代神经网络了。\n\n### 子像素卷积\n\n在卷积神经网络中，如何处理上采样呢？这不仅限于超分辨率任务，也适用于语义分割等应用：在这些应用中，更“全局”的特征图——按定义分辨率较低——必须被上采样到你希望进行分割的分辨率。\n\n一种常见方法是**先使用双线性或双三次插值上采样到目标分辨率，然后再应用需要学习的卷积层**，以获得更好的结果。事实上，早期的超分辨率网络正是这样做的：在网络的最开始就将低分辨率图像上采样，然后在高分辨率空间中应用一系列卷积层，最终生成超分辨率图像。\n\n另一种流行的方法是**转置卷积**，你可能已经熟悉它：将完整的卷积核应用于低分辨率图像中的单个像素，生成的多像素块按照设定的步长组合起来，从而得到高分辨率图像。这基本上就是**常规卷积过程的逆过程**。\n\n**子像素卷积**则是一种替代方案，它通过对低分辨率图像应用常规卷积，使得**所需的新增像素以额外通道的形式生成**。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_7a118d21d362.png\">\n\u003C\u002Fp>\n\n换句话说，**如果要将图像上采样 $s$ 倍**，那么对于低分辨率图像中的每个像素，需要生成的 $s^2$ 个新像素，就会通过卷积操作以该位置**$s^2$ 个新通道**的形式产生。你可以为这一操作选择任意大小的卷积核 $k$，而低分辨率图像可以具有任意数量的输入通道 $i$。\n\n随后，这些通道会被重新排列，形成高分辨率图像，这一过程称为**像素洗牌**。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_f16d3fbc5774.png\">\n\u003C\u002Fp>\n\n在上面的例子中，高分辨率图像只有一个输出通道。**如果你需要 $n$ 个输出通道，只需生成 $n$ 组各 $s^2$ 个通道**，然后在每个位置将它们洗牌成 $n$ 组 $s \\times s$ 的小块。\n\n在本教程的其余部分，我们将使用如下所示的像素洗牌操作：\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_096ada5b5d1b.png\">\n\u003C\u002Fp>\n\n正如你所想象的那样，**在低分辨率空间中进行卷积比在高分辨率空间中进行卷积更加高效**。因此，子像素卷积层通常位于超分辨率网络的末端，在对低分辨率图像应用了一系列卷积之后。\n\n### 最小化损失函数——回顾\n\n让我们暂时停下来，探讨一下我们为什么要构建损失函数并对其进行最小化。你可能已经了解这些内容，但我觉得再次回顾这些概念很有帮助，因为它们是理解 GAN 训练方式的关键。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_b07a30de297f.png\">\n\u003C\u002Fp>\n\n- **损失函数** $L$ 实际上是一个量化我们的网络 $N$ 的输出与期望值 $D$ 之间差异程度的函数。\n  \n- 我们的神经网络输出 $N(θ_N, I)$ 是指在网络当前参数集 $θ_N$ 下，给定输入 $I$ 时产生的输出。\n  \n- 我们称之为“期望值”$D$，而不是“真值”或“标签”，是因为我们期望的值并不一定是真实的，这一点我们稍后会看到。\n  \n- 因此，我们的目标就是**最小化损失函数**，而这可以通过调整网络参数 $θ_N$ 来实现，使网络输出 $N(θ_N, I)$ 更接近期望值 $D$。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_ba154d6dfc18.png\">\n\u003C\u002Fp>\n\n需要注意的是，参数 $θ_N$ 的变化并不是最小化损失函数 $L$ 的结果。相反，最小化损失函数 $L$ 是由于以特定方式改变参数 $θ_N$ 所导致的。我在上面提到“最小化 $L$ *会移动* $θ_N$……”只是为了表明，**选择最小化某个特定的损失函数 $L$，就意味着会对 $θ_N$ 进行这些特定的调整**。\n\n至于如何决定 $θ_N$ 变化的方向和幅度，在这里不是重点，不过为了完整起见：\n\n- 首先，通过链式法则沿网络反向传播梯度，计算损失函数 $L$ 对参数 $θ_N$ 的梯度，即 $\\frac{∂L}{∂θ_N}$，这一过程称为*反向传播*。\n  \n- 然后，根据梯度 $\\frac{∂L}{∂θ_N}$ 的大小以及一个称为学习率 $lr$ 的步长，沿着与梯度相反的方向移动参数 $θ_N$，从而沿着损失函数的曲面下降，这一过程称为*梯度下降*。\n\n总结来说，这里的重要启示是：对于给定输入 $I$ 的网络 $N$，通过选择合适的损失函数 $L$ 和期望值 $D$，我们可以调整损失函数上游的所有参数 $θ_N$，从而使网络输出更接近 $D$。\n\n根据具体需求，我们也可以只调整所有参数 $θ_N$ 中的一部分 $θ_n$，同时冻结其他参数 $θ_{N-n}$，从而**仅训练整个网络 $N$ 中的一个子网络 $n$**，使其输出的变化反过来促使整个网络 $N$ 的输出更接近期望值 $D$。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_408e3c3fae03.png\">\n\u003C\u002Fp>\n\n你可能已经在迁移学习的应用中这样做过——例如，只微调大型预训练 CNN 或 Transformer 模型 $N$ 的最后几层 $n$，以适应新的任务。我们稍后也会做类似的事情，只不过是在完全不同的背景下。\n\n### 超分辨率残差网络（SRResNet）\n\nSRResNet 是一种 **专为 4 倍超分辨率设计的全卷积网络**。正如其名所示，它采用了带有跳跃连接的残差块结构，即使网络深度较大，也能提高其可优化性。\n\nSRResNet 可以作为独立网络进行训练和使用，并且如您即将看到的那样，它为 SRGAN 提供了一个 **很好的基准模型**——既可用于比较，也可用于初始化。\n\n#### SRResNet 的架构\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_cbc774407efa.png\">\n\u003C\u002Fp>\n\nSRResNet 由以下操作组成：\n\n- 首先，将低分辨率图像与一个 $9\\times9$ 大小、步幅为 1 的卷积核进行卷积，生成与原图分辨率相同但通道数为 64 的特征图。随后应用参数化 ReLU（PReLU）激活函数。\n  \n- 接着，该特征图会依次通过 16 个 **残差块**，每个残差块包含两个 $3\\times3$ 卷积层（步幅均为 1）、批归一化和 PReLU 激活函数，以及第二个类似的卷积层和第二次批归一化。在每一层卷积中，特征图的分辨率和通道数都保持不变。\n  \n- 经过一系列残差块处理后的结果，再经过一个 $3\\times3$ 卷积层（步幅为 1）和批归一化处理，分辨率和通道数同样保持不变。除了每个残差块内部的跳跃连接之外，还有一个跨越所有残差块及该卷积层的更大规模的跳跃连接。\n  \n- 然后，通过 2 个 **亚像素卷积块**，每次将图像尺寸放大 2 倍（之后接 PReLU 激活），最终实现 4 倍的上采样。同时，通道数也保持不变。\n  \n- 最后，在更高分辨率下，再使用一个 $9\\times9$ 大小、步幅为 1 的卷积层进行卷积，并通过 Tanh 激活函数输出 **超分辨率图像**，其 RGB 通道值范围为 $[-1, 1]$。\n\n如果您对上述某些具体数值感到疑惑，请不必担心。通常情况下，这些数值很可能是作者及其论文中引用的其他研究工作根据经验或便利性而设定的。\n\n#### SRResNet 的更新过程\n\n与任何神经网络一样，SRResNet 的训练也是通过对其参数的一系列更新来完成的。那么，这样的更新具体包括哪些内容呢？\n\n我们的训练数据将由高分辨率（真实）图像及其对应的低分辨率版本组成，后者是通过对高分辨率图像进行 4 倍双三次插值下采样得到的。\n\n在前向传播过程中，SRResNet 会生成一张 **分辨率是输入低分辨率图像 4 倍的超分辨率图像**。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_08203634652d.png\">\n\u003C\u002Fp>\n\n我们使用 **均方误差（MSE）作为损失函数**，用来比较超分辨率图像与用于生成低分辨率图像的真实高分辨率图像。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_092f5f0289dc.png\">\n\u003C\u002Fp>\n\n选择最小化超分辨率图像与真实图像之间的 MSE，意味着我们会调整 SRResNet 的参数，使得当再次输入相同的低分辨率图像时，它能够 **生成更接近原始高分辨率图像外观的超分辨率图像**。\n\nMSE 损失属于一种 ***内容损失***，因为它仅基于预测图像和目标图像的内容进行计算。\n\n在本例中，我们关注的是它们在 ***RGB 空间*** 中的内容——这一点的重要性我们稍后会详细讨论。\n\n### 超分辨率生成对抗网络（SRGAN）\n\nSRGAN 由 **生成器网络** 和 **判别器网络** 组成。\n\n生成器的目标是学习如何以足够逼真的方式进行超分辨率重建，使得专门用于识别此类人工生成痕迹的判别器无法再可靠地区分真实图像和生成图像。\n\n这两部分网络是 **同时进行训练的**。\n\n生成器不仅像 SRResNet 那样通过最小化内容损失来进行学习，还会“窃听”判别器的工作机制。\n\n换句话说，我们就是潜伏在判别器办公室里的“内鬼”！通过让生成器获取判别器在反向传播其输出时所产生的梯度信息，生成器可以调整自身的参数，从而改变判别器的输出结果，使其对自己有利。\n\n随着生成器生成的高分辨率图像越来越逼真，我们也会用这些图像来训练判别器，不断提升其辨别能力。\n\n#### 生成器的架构\n\n从架构上看，生成器与 SRResNet **完全相同**。毕竟它们的功能是一样的。这也使得我们可以利用已经训练好的 SRResNet 来初始化生成器，从而获得巨大的优势。\n\n#### 判别器的架构\n\n正如你所料，判别器是一个卷积网络，其功能是作为一个 **二分类图像分类器**。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_474d6d3c8346.png\">\n\u003C\u002Fp>\n\n它的架构由以下步骤构成：\n\n- 将高分辨率图像（无论是自然图像还是人工图像）与一个 $9\\times9$ 大小、步幅为 1 的卷积核进行卷积，生成与原图分辨率相同但通道数为 64 的特征图。随后应用 Leaky ReLU 激活函数。\n  \n- 接着，该特征图会依次经过 7 个 **卷积块**，每个卷积块包含一个 $3\\times3$ 卷积层、批归一化和 Leaky ReLU 激活函数。其中，偶数编号的卷积块会使通道数翻倍；奇数编号的卷积块则通过步幅为 2 的卷积操作使特征图尺寸减半。\n  \n- 经过这一系列卷积块处理后的结果会被展平，并通过线性变换转化为一个 1024 维的向量，随后再次应用 Leaky ReLU 激活函数。\n  \n- 最后，再经过一次线性变换得到一个 logits 值，通过 Sigmoid 激活函数将其转换为概率分数，从而反映出 **输入图像为自然（真实）图像的概率**。\n\n#### 交替训练\n\n首先，让我们来描述一下生成器和判别器是如何相互配合进行训练的。我们应该先训练哪一个呢？\n\n其实，两者并不会一方完全训练好后再开始另一方的训练——它们是 **同时进行训练的**。\n\n通常来说，任何 GAN 都是以 **交替方式进行训练**的，即生成器和判别器会轮流进行短时间的训练。\n\n在这篇论文中，每个子网络只更新一次，然后就切换到另一个网络。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_7ec23d03f780.png\">\n\u003C\u002Fp>\n\n在其他 GAN 实现中，你可能会发现每更新一次生成器，就会有 $k$ 次更新判别器的过程，其中 $k$ 是一个可以根据最佳效果进行调整的超参数。不过，很多时候 $k$ 的值就是 1。\n\n#### 判别器的更新\n\n在讨论生成器之前，先了解判别器的更新方式会更有帮助。这里并没有什么意外——一切都如你所料。\n\n由于判别器需要学会区分真实的高分辨率图像（“黄金”图像）和由生成器生成的超分辨率图像，因此在训练过程中，它会同时接收到“黄金”图像和超分辨率图像，并附带相应的标签（$HR$ 表示真实图像，$SR$ 表示生成图像）。\n\n例如，在前向传播中，判别器会接收一张真实的高分辨率图像，并输出一个**概率分数 $P_{HR}$，表示该图像是自然生成的概率**。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_96f700b14c67.png\">\n\u003C\u002Fp>\n\n我们希望判别器能够正确地将其识别为“黄金”图像，并且 $P_{HR}$ 尽可能高。为此，我们会使用正确的标签（$HR$）来最小化**二元交叉熵损失**。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_3c35c71d8a6f.png\">\n\u003C\u002Fp>\n\n选择最小化这一损失，将调整判别器的参数，使得当再次输入这张“黄金”高分辨率图像时，它会**预测出更高的 $P_{HR}$，即认为该图像是自然生成的概率更高**。\n\n同样地，在前向传播中，判别器还会接收到由生成器根据原始高分辨率图像的下采样低分辨率版本生成的超分辨率图像，判别器会输出一个**概率分数 $P_{HR}$，表示该图像是自然生成的概率**。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_04e5cbfa8975.png\">\n\u003C\u002Fp>\n\n我们希望判别器能够正确地将其识别为超分辨率图像，并且 $P_{HR}$ 尽可能低。因此，我们会使用正确的标签（$SR$）来最小化**二元交叉熵损失**。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_c19364208956.png\">\n\u003C\u002Fp>\n\n选择最小化这一损失，将调整判别器的参数，使得当再次输入这张超分辨率图像时，它会**预测出更低的 $P_{HR}$，即认为该图像是自然生成的概率更低**。\n\n综上所述，判别器的训练过程相当直接，与训练任何图像分类器的方式并无不同。\n\n接下来，我们来看看生成器的更新内容。\n\n#### 更好的内容损失\n\n在 SRResNet 中使用的基于均方误差（MSE）的 RGB 空间内容损失，是图像生成领域中常用的手段。\n\n然而，它也存在一些缺点——它会导致生成的图像过于平滑，缺乏实现照片级逼真度所需的精细细节。你可能已经在本教程中的多个示例中观察到了 SRResNet 的这一问题。原因也很容易理解。\n\n在对低分辨率图像或区域进行超分辨率处理时，往往会有多种合理的高分辨率结果。换句话说，低分辨率图像中的一个小模糊区域，可以对应于许多不同的高分辨率区域，而每一种都可以被视为有效的结果。\n\n举个例子，假设低分辨率区域需要在高分辨率的 RGB 空间中以特定间距生成带有蓝色斜线的图案。那么这些斜线的确切位置就有多种可能性。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_1cea36542b48.png\">\n\u003C\u002Fp>\n\n其中任意一种都可以被认为是令人满意的结果。事实上，真实的高分辨率图像中确实会包含其中之一。\n\n但是，像 SRResNet 这样的网络如果使用 RGB 空间的内容损失进行训练，就会非常不愿意生成这样的结果。相反，它会选择生成一种本质上是所有这些精细高分辨率可能性的***平均值***。正如你所想象的那样，这种平均后的结果几乎没有任何细节，因为所有的细节都被平均掉了！不过，这种平均结果却是一种“安全”的预测，因为训练时使用的“真实”或“地面真值”图像可能是这几种可能性中的任意一种，而如果生成其他任何一种有效结果，则会导致非常高的 MSE。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_72b9f3730b84.png\">\n\u003C\u002Fp>\n\n换言之，正是由于无法从低分辨率图像准确推断出“地面真值”图像中精确的 RGB 像素值，网络才会避免创造性地生成任何等效的图案，因为这样不仅面临很高的 MSE 风险，而且几乎不可能偶然生成与“地面真值”完全一致的像素。相反，**一个过于平滑的“平均”预测通常会具有更低的 MSE！**\n\n在模型看来——请记住，模型是通过 RGB 空间的内容损失来“观察”图像的——这些众多的有效可能性其实并不等价。唯一有效的预测就是生成“地面真值”的 RGB 像素，而这些像素又是无法准确知道的。为了解决这个问题，我们需要找到一种方法，让在我们眼中等价的多种可能性，在模型看来也同样等价。\n\n有没有一种方法，可以忽略图像或区域中 RGB 像素的具体排列，而只关注其基本本质或意义呢？**有！** 训练用于图像分类的卷积神经网络正是这样做的——它们会生成图像或区域的“深层”表征，用来描述其本质特征。由此可以推断出：**在 RGB 空间中逻辑上等价的图案，经过训练好的 CNN 处理后，会产生相似的表征。**\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_c9de0fce2aca.png\">\n\u003C\u002Fp>\n\n这种新的“深层”表征空间非常适合用于计算内容损失！我们的超分辨率模型不再需要害怕发挥创造力——只要生成的图案在逻辑上合理、细节丰富，即使与“地面真值”的 RGB 图像不完全相同，也不会受到惩罚。\n\n#### 生成器的更新——第一部分\n\n生成器更新的第一步涉及***内容损失***。\n\n如我们所知，在前向传播中，生成器会根据输入的低分辨率图像生成一张**尺寸为原图 4 倍的超分辨率图像**。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_c92e95801152.png\">\n\u003C\u002Fp>\n\n鉴于上一节中提到的原因，我们将不再使用 RGB 空间中的 MSE 作为内容损失，来比较生成的超分辨率图像与用于生成低分辨率图像的原始“黄金”高分辨率图像。\n\n相反，我们会将这两张图像都输入到一个训练好的 CNN 中，具体来说是**在 ImageNet 分类任务上预训练过的 VGG19 网络**。这个网络被**截断至第 5 个最大池化层之后的第 4 层卷积层**。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_6899179549a1.png\">\n\u003C\u002Fp>\n\n然后，我们将在 VGG 空间中使用基于 MSE 的内容损失，间接比较超分辨率图像与原始“黄金”高分辨率图像。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_010009813cb4.png\">\n\u003C\u002Fp>\n\n选择在 VGG 特征空间中最小化超分辨率图像与真实高分辨率图像之间的均方误差，意味着我们将以这样的方式调整生成器的参数：当再次输入相同的低分辨率图像时，生成器会**生成一张在外观上更接近原始高分辨率版本的超分辨率图像——因为其在 VGG 特征空间中的表示也更加接近**，而不会像 SRResNet 那样过度平滑或显得不真实。\n\n#### 生成器更新——第二部分\n\n没有对抗训练的 GAN 算什么？答案是：它就不是 GAN。\n\n内容损失只是生成器更新的一个组成部分。尽管使用 VGG 特征空间而非 RGB 空间确实能带来改进，但相较于 SRResNet，使生成器实现照片级逼真超分辨率效果的最大推动力仍然很可能是**对抗损失**。\n\n在此步骤中，我们将生成的超分辨率图像输入到当前状态下的判别器中，以获得一个表示该图像为“自然图像”的**概率分数 $P_{HR}$**。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_f0283407b238.png\">\n\u003C\u002Fp>\n\n显然，生成器希望判别器无法识别出这并非自然图像，并使 $P_{HR}$ 尽可能高。那么，我们该如何更新生成器以提高 $P_{HR}$ 呢？请注意，在这一步中我们的目标仅是训练生成器——判别器的权重保持冻结。\n\n因此，我们**将我们期望的标签（即“自然图像”）——尽管这个标签是错误的或具有误导性——**提供给二元交叉熵损失函数，并利用由此产生的梯度信息来更新生成器的权重！\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_b51e2b63eca4.png\">\n\u003C\u002Fp>\n\n选择以期望的但错误的标签（“自然图像”）来最小化二元交叉熵损失，意味着我们将调整生成器的参数，使得当再次输入低分辨率图像时，生成器会**生成一张在外观和特征上更接近原始高分辨率版本的超分辨率图像，从而让判别器更有可能将其判定为自然图像**。\n\n换言之，通过这种损失函数的设计，我们利用的是判别器中的梯度信息——即判别器输出 $P_{HR}$ 如何响应判别器自身参数的变化——而不是直接用来更新判别器自身的参数，而是通过反向传播获取生成器中的梯度信息——即判别器输出 $P_{HR}$ 如何响应生成器参数的变化——进而对生成器进行必要的调整！\n\n在教程的[之前部分](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#minimizing-loss--a-refresher)，我们已经看到，可以通过冻结子网络 $N-n$ 的参数，只更新较大网络 $N$ 中的子网络 $n$ 来最小化损失并逐步逼近期望的输出。这里我们同样采用了类似的做法：生成器和判别器共同构成了一个“超网络”$N$，而我们只更新其中的生成器 $n$。诚然，如果同时更新判别器 $N-n$，损失可能会进一步降低，但这样做会直接削弱判别器的鉴别能力，反而得不偿失。\n\n#### 感知损失\n\n由于生成器同时从两种类型的损失——内容损失和对抗损失——中学习，我们可以将它们按权重平均结合起来，形成论文作者所称的***感知损失***。\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_1d165d9aa3cd.png\">\n\u003C\u002Fp>\n\n这种感知损失极大地提升了 SRResNet 的性能，使生成器能够生成更加逼真、细节丰富的图像，这一点也在论文中的人体实验研究中得到了证实！\n\n\n\n# 实现\n\n以下各节简要描述了实现过程。\n\n这些内容旨在提供一些背景信息，但**具体的细节最好直接参考代码**，因为代码中有非常详尽的注释。\n\n### 数据集\n\n#### 描述\n\n虽然论文的作者是在 ImageNet 数据集中一个包含 35 万张图像的子集上训练模型的，但我只使用了大约 12 万张 COCO 图像。这些图像更容易获取。\n\n与论文一致，我们会在 Set5、Set14 和 BSD100 数据集上测试训练好的模型，这些数据集是超分辨率任务中常用的基准测试集。\n\n#### 下载\n\n你需要下载 MSCOCO '14 的[训练集（13GB）](http:\u002F\u002Fimages.cocodataset.org\u002Fzips\u002Ftrain2014.zip)和[验证集（6GB）](http:\u002F\u002Fimages.cocodataset.org\u002Fzips\u002Fval2014.zip)图像。\n\nSet5、Set14 和 BSD100 测试数据集的下载链接可以在这里找到：[https:\u002F\u002Fgithub.com\u002FXPixelGroup\u002FBasicSR\u002Fblob\u002Fmaster\u002Fdocs\u002FDatasetPreparation.md#common-image-sr-datasets](https:\u002F\u002Fgithub.com\u002FXPixelGroup\u002FBasicSR\u002Fblob\u002Fmaster\u002Fdocs\u002FDatasetPreparation.md#common-image-sr-datasets)。在 Google Drive 的链接中，进入 `Image Super-Resolution\u002FClassical` 文件夹。请注意，在 Set5 和 Set14 的压缩包中，你会看到多个文件夹——你需要的图像位于 `original` 文件夹中。\n\n请将图像整理到五个独立的文件夹中：`train2014` 和 `val2014` 文件夹分别存放训练图像，而 `BSDS100`、`Set5` 和 `Set14` 文件夹则用于存放测试图像。\n\n### 模型输入与目标\n\n模型共有四类输入和目标。所有输入和目标图像均由 RGB 通道组成，且处于 RGB 颜色空间，除非另有说明。\n\n#### 高分辨率（HR）图像\n\n高分辨率图像为训练图像中随机裁剪的 $96\\times96$ 大小的图像块。HR 图像用作 SRResNet 和 SRGAN 生成器的目标，同时也作为 SRGAN 判别器的输入。\n\n当 HR 图像用作 SRResNet 的目标时，我们会将其像素值归一化至 $[-1, 1]$ 范围，因为论文中 MSE 损失正是在此范围内计算的。相应地，超分辨率（SR）图像也必须以 $[-1, 1]$ 范围生成。\n\n当 HR 图像用作 SRGAN 生成器的目标时，我们将使用 ImageNet 数据的均值和标准差对其进行归一化，相关参数如下所示：[此处](https:\u002F\u002Fpytorch.org\u002Fvision\u002F0.12\u002Fmodels.html) 可查阅。这是因为 HR 图像会输入到经过截断的、预训练的 ImageNet VGG19 网络中，用于在 VGG 特征空间中计算 MSE。\n\n```python\nmean = [0.485, 0.456, 0.406]\nstd = [0.229, 0.224, 0.225]\n```\n\n当 HR 图像用作 SRGAN 判别器的输入时，同样需要进行上述归一化处理。因此，判别器接收的输入也必须是经过 ImageNet 归一化的图像。\n\nPyTorch 使用 $NCHW$ 格式，即通道维度 ($C$) 位于尺寸维度之前。\n\n因此，**HR 图像是形状为 $N\\times3\\times96\\times96$ 的 `Float` 张量**，其中 $N$ 表示批次大小，像素值要么位于 $[-1, 1]$ 范围内，要么已按 ImageNet 标准归一化。\n\n#### 低分辨率（LR）图像\n\nLR 图像是通过对 HR 图像进行 4 倍双三次下采样得到的。LR 图像是 SRResNet 和 SRGAN 生成器的输入。\n\n[根据所使用的下采样库不同](https:\u002F\u002Fzuru.tech\u002Fblog\u002Fthe-dangers-behind-image-resizing#qualitative-results)，我们可能需要在下采样前使用高斯模糊作为低通滤波器来消除混叠效应（即防止 [混叠现象](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FAliasing)）。我们最终采用的 Pillow 库中的 `resize` 函数已经内置了抗混叠措施。\n\n论文中将 LR 图像缩放到 $[0, 1]$ 范围，但我们选择使用 ImageNet 数据的均值和标准差对其进行归一化。因此，我们的 SRResNet 和 SRGAN 生成器的输入始终需要按照 ImageNet 标准归一化。\n\n综上所述，**LR 图像是形状为 $N\\times3\\times16\\times16$ 的 `Float` 张量**，其中 $N$ 表示批次大小，像素值始终遵循 ImageNet 归一化标准。\n\n#### 超分辨率（SR）图像\n\n超分辨率图像为 LR 图像经智能上采样后的结果。SR 图像是 SRResNet 和 SRGAN 生成器的输出，用于与目标 HR 图像进行比较，并作为 SRGAN 判别器的输入。\n\n在训练 SRResNet 时，内容损失是基于 $[-1, 1]$ 范围内的 RGB 值计算的。因此，SRResNet 生成的 SR 图像也处于同一范围，这通过最后一层的 $\\tanh$ 激活函数实现。\n\n由于 SRGAN 生成器与 SRResNet 具有相同的架构，并且初始权重来自训练好的 SRResNet，因此生成器的输出也将保持在 $[-1, 1]$ 范围内。\n\n然而，在训练 SRGAN 生成器时，内容损失是在 VGG 特征空间中计算的，因此其生成的 SR 图像需要从 $[-1, 1]$ 范围转换为 ImageNet 归一化后的空间，以便输入到截断的 VGG19 网络中。这一点与 HR 图像的处理方式相同。\n\n综上所述，**SR 图像是形状为 $N\\times3\\times96\\times96$ 的 `Float` 张量**，其中 $N$ 表示批次大小，像素值始终位于 $[-1, 1]$ 范围内。\n\n#### 判别器标签\n\n由于判别器是一个二分类图像分类器，其训练数据同时包含每张 LR 图像对应的 SR 和 HR 对应图像，因此标签分别为 1 和 0，分别代表 HR（自然来源）和 SR（人工生成，来自生成器）。\n\n判别器标签在训练过程中手动构建：\n\n- 当使用对抗损失训练生成器时，构造一个 **形状为 $N$ 的 `Long` 张量**，其中 $N$ 为批次大小，全部填充为 1（HR）。\n- 当使用 HR 和 SR 图像分别训练判别器时，构造一个 **形状为 $2N$ 的 `Long` 张量**，其中 $N$ 为批次大小，前 $N$ 个元素为 1（HR），后 $N$ 个元素为 0（SR）。\n\n### 数据流水线\n\n数据被划分为*训练集*和*测试集*。没有*验证集*——我们将直接使用论文中描述的超参数。\n\n#### 解析原始数据\n\n参见[`utils.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Futils.py)中的`create_data_lists()`函数。\n\n该函数会解析下载的数据，并保存以下文件：\n\n- `train_images.json`：包含所有**训练图像的文件路径**（即`train2014`和`val2014`文件夹中的图像）的列表，这些图像的尺寸均大于指定的最小值。\n\n- `Set5_test_images.json`、`Set14_test_images.json`、`BSDS100_test_images.json`：分别包含`Set5`、`Set14`和`BSDS100`文件夹中所有**测试图像的文件路径**，且这些图像的尺寸均大于指定的最小值。\n\n#### 图像转换\n\n参见[`utils.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Futils.py)中的`convert_image()`函数。\n\n我们将对RGB空间中的像素值使用多种归一化、缩放和表示方法：\n\n- 作为**Pillow (PIL) 图像**：即由Python的Pillow库读取的图像。RGB值以$[0, 255]$范围内的整数形式存储，这也是从磁盘读取图像时的格式。\n\n- 作为**$[0, 1]$范围内的浮点值**，这在不同表示之间转换时用作中间表示。\n\n- 作为**$[-1, 1]$范围内的浮点值**，这是高分辨率图像的表示方式，也是生成超分辨率图像时使用的格式，并且在SRResNet中，内容损失正是基于这种表示来计算的。\n\n- 作为**ImageNet 归一化后的值**，这是低分辨率、超分辨率和高分辨率图像输入到任何模型（SRResNet、生成器、判别器或截断的VGG19）时所采用的格式。\n\n- 作为[***y通道***](https:\u002F\u002Fgithub.com\u002Fxinntao\u002FBasicSR\u002Fwiki\u002FColor-conversion-in-SR#rgbbgr----ycbcr)，即YCbCr色彩空间中的亮度通道Y，用于计算峰值信噪比（PSNR）和结构相似性指数（SSIM）等评估指标。\n\n从一种形式到另一种形式的转换，都是通过先转换为$[0, 1]$范围内的值来实现的。\n\n#### 图像变换\n\n参见[`utils.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Futils.py)中的`ImageTransforms`类。\n\n在训练过程中，**高分辨率图像是从训练图像中随机裁剪出的固定大小的$96\\times96$区域**——每张图像每个epoch随机裁剪一次。\n\n在评估过程中，我们则对每张测试图像进行**尽可能大的中心裁剪**，使得裁剪后的图像尺寸能够被缩放因子整除。\n\n**低分辨率图像则是通过对高分辨率图像进行4倍双三次下采样得到的。** [根据所使用的下采样库的不同](https:\u002F\u002Fzuru.tech\u002Fblog\u002Fthe-dangers-behind-image-resizing#qualitative-results)，我们可能需要在下采样之前使用高斯模糊作为低通滤波器来消除混叠现象（即防止[混叠](https:\u002F\u002Fen.wikipedia.org\u002Fwiki\u002FAliasing)）。而我们最终使用的Pillow库的`resize`函数已经内置了抗混叠措施。\n\n在训练SRResNet时，高分辨率图像会被转换为$[-1, 1]$范围内的值；而在训练SRGAN时，则会转换为ImageNet归一化的值。低分辨率图像始终是ImageNet归一化的。\n\n#### PyTorch 数据集\n\n参见[`datasets.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fdatasets.py)中的`SRDataset`类。\n\n这是一个PyTorch [`Dataset`](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fmaster\u002Fdata.html#torch.utils.data.Dataset)的子类，用于**定义我们的训练集和测试集。**\n\n它需要定义一个`__len__`方法，返回数据集的大小；以及一个`__getitem__`方法，根据上述图像变换步骤，在训练或测试JSON文件中找到第$i$张图像对应的低分辨率和高分辨率图像对并返回。\n\n#### PyTorch 数据加载器\n\n上述的`SRDataset`将由[`train_srresnet.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Ftrain_srresnet.py)、[`train_srgan.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Ftrain_srgan.py)和[`eval.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Feval.py)中的PyTorch [`DataLoader`](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fmaster\u002Fdata.html#torch.utils.data.DataLoader)使用，以**创建并为模型提供用于训练或评估的数据批次**。\n\n### 卷积块\n\n参见[`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py)中的`ConvolutionalBlock`类。\n\n这是一个自定义层，由**二维卷积**、可选的**批量归一化**以及可选的***Tanh***、***PReLU***或**Leaky *ReLU* 激活函数**组成，用作SRResNet、生成器和判别器网络中的基本构建模块。\n\n### 超像素卷积块\n\n参见[`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py)中的`SubPixelConvolutionalBlock`类。\n\n这是一个自定义层，由**一个将通道数扩展到$s^2n$的二维卷积**组成，其中$s$为缩放因子，$n$为上采样后图像所需的输出通道数，随后接一个**PyTorch 的[`nn.PixelShuffle()`](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fgenerated\u002Ftorch.nn.PixelShuffle.html#torch.nn.PixelShuffle)**，用于在SRResNet和生成器网络中执行上采样操作。\n\n### 残差块\n\n参见[`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py)中的`ResidualBlock`类。\n\n这是一个由两个卷积块组成的自定义层。第一个卷积块使用PReLU激活，第二个卷积块则不进行激活。两个卷积块都会进行批量归一化。此外，**两个卷积块之间还有一条残差（跳跃）连接**。\n\n### SRResNet\n\n参见[`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py)中的`SRResNet`类。\n\n该类**构建了SRResNet**，按照[文中描述](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#the-srresnet-architecture)，使用卷积块、残差块和超像素卷积块。\n\n### 生成器\n\n参见[`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py)中的`Generator`类。\n\nSRGAN的生成器具有与SRResNet**相同的架构**，按照[文中描述](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#the-srresnet-architecture)，因此无需重新构建。\n\n### 判别器\n\n请参阅 [`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py) 中的 `Discriminator`。\n\n该模块**构建了 SRGAN 的判别器**，如[文档所述](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#the-discriminator-architecture)，使用卷积块和线性层实现。\n\n一个*可选的* [`nn.AdaptiveAvgPool2d`](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fgenerated\u002Ftorch.nn.AdaptiveAvgPool2d.html) 在将特征图展平并传递给线性层之前，会保持图像尺寸固定——这仅在训练时未使用默认的 $96\\times96$ 高分辨率\u002F超分辨率图像尺寸时才需要。\n\n### 截断的 VGG19\n\n请参阅 [`models.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fmodels.py) 中的 `TruncatedVGG19`。\n\n该模块**截断了一个在 ImageNet 上预训练的 VGG19 网络**，该网络[可在 `torchvision` 中找到](https:\u002F\u002Fpytorch.org\u002Fvision\u002F0.12\u002Fmodels.html)，使其输出为论文中描述的“VGG19 网络中第 $i$ 个最大池化层之前的第 $j$ 个卷积层（激活后）所得到的特征图”。\n\n正如作者所做的那样，我们将使用 $i=5$ 和 $j=4$。\n\n# 训练\n\n在开始之前，请确保已保存用于训练和评估的必要数据文件。为此，在指向训练数据文件夹 `train2014`、`val2014` 以及测试数据文件夹 `Set5`、`Set14`、`BSDS100` 后，运行 [`create_data_lists.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fcreate_data_lists.py) 的内容——前提是您已经[下载了数据](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution#download）：\n\n`python create_data_lists.py`\n\n### 训练 SRResNet\n\n请参阅 [`train_srresnet.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Ftrain_srresnet.py)。\n\nSRResNet 的参数（以及训练设置）位于文件开头，因此您可以根据需要轻松查看或修改它们。\n\n要从头开始训练 SRResNet，请运行此文件：\n\n`python train_srresnet.py`\n\n若要从检查点继续训练，请在代码开头通过 `checkpoint` 参数指定相应的检查点文件。\n\n### 训练 SRGAN\n\n请参阅 [`train_srgan.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Ftrain_srgan.py)。\n\n只有在训练完 SRResNet 后才能训练 SRGAN，因为 SRGAN 的生成器会使用已训练好的 SRResNet 检查点进行初始化。\n\n模型的参数（以及训练设置）位于文件开头，因此您可以根据需要轻松查看或修改它们。\n\n要从头开始训练 SRGAN，请运行此文件：\n\n`python train_srgan.py`\n\n若要从检查点继续训练，请在代码开头通过 `checkpoint` 参数指定相应的检查点文件。\n\n### 备注\n\n我们使用了论文中推荐的超参数。\n\n对于 SRResNet，我们使用 Adam 优化器，学习率为 $10^{-4}$，训练 $10^6$ 次迭代，批量大小为 $16$。\n\nSRGAN 同样使用 Adam 优化器，前 $10^5$ 次迭代的学习率为 $10^{-4}$，随后的额外 $10^5$ 次迭代学习率为 $10^{-5}$，批量大小仍为 $16$。\n\n我是在单块 RTX 2080Ti GPU 上进行训练的。\n\n### 模型检查点\n\n您可以从[这里](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F12OG-KawSFFs6Pah89V4a_Td-VcwMBE5i?usp=sharing)下载我的预训练模型。\n\n请注意，这些检查点应[直接使用 PyTorch 加载](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fgenerated\u002Ftorch.load.html#torch.load)以进行评估或推理——详情见下文。\n\n# 评估\n\n请参阅 [`eval.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Feval.py)。\n\n要评估选定的模型，请运行该文件：\n\n`python eval.py`\n\n这将计算所选模型在三个测试数据集上的峰值信噪比（PSNR）和结构相似性指数（SSIM）评估指标。\n\n以下是我的结果（括号内为论文中的结果）：\n\n|              |      PSNR      |      SSIM      |       |      PSNR      |      SSIM      |       |      PSNR      |      SSIM      |\n| :----------: | :------------: | :------------: | :---: | :------------: | :------------: | :---: | :------------: | :------------: |\n| **SRResNet** | 31.927 (32.05) | 0.902 (0.9019) |       | 28.588 (28.49) | 0.799 (0.8184) |       | 27.587 (27.58) | 0.756 (0.7620) |\n|  **SRGAN**   | 29.719 (29.40) | 0.859 (0.8472) |       | 26.509 (26.02) | 0.729 (0.7397) |       | 25.531 (25.16) | 0.678 (0.6688) |\n|              |    **Set5**    |    **Set5**    |       |   **Set14**    |   **Set14**    |       |   **BSD100**   |   **BSD100**   |\n\n嗯，需要打个大大的折扣。论文反复强调，PSNR 和 SSIM 并不能真正反映超分辨率图像的质量。相比之下，SRResNet 生成的图像虽然更不真实、过于平滑，但其 PSNR 和 SSIM 得分却更高。这也是为什么论文作者进行了主观评分测试，而这种测试显然超出了我们的能力范围。\n\n# 推理\n\n请参阅 [`super_resolve.py`](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fblob\u002Fmaster\u002Fsuper_resolve.py)。\n\n请务必在代码开头加载训练好的 SRResNet 和 SRGAN 检查点。\n\n使用您希望处理的高分辨率图像调用 `visualize_sr()` 函数，即可**以网格形式可视化结果**，包括原始高分辨率图像、双三次插值上采样后的图像（作为该图像低分辨率版本的近似）、由 SRResNet 超分辨率后的图像以及由 SRGAN 超分辨率后的图像。\n\n本教程开头的示例就是使用该函数生成的。需要注意的是，该函数并不会对输入图像进行上采样，而是**先下采样再进行超分辨率，以便与原始高分辨率图像进行对比**。如果您希望直接对提供的图像进行上采样或其他操作，就需要修改代码。\n\n**请注意输入图像的尺寸。** 该函数提供了一个 `halve` 参数，允许您创建一个尺寸减半的新高分辨率图像。如果原始高分辨率图像的尺寸超过了您的屏幕分辨率，导致您无法体验 4 倍的超分辨率效果，那么这个功能就非常有用。例如，对于 2160p 的高分辨率图像，其低分辨率版本将是 540p（2160p\u002F4）。而在 1080p 的屏幕上，您实际上看到的是 540p 的低分辨率图像（以双三次插值上采样的形式）与 1080p 的超分辨率\u002F高分辨率图像之间的对比，因为您的 1080p 屏幕只能以 1080p 的分辨率显示 2160p 的超分辨率\u002F高分辨率图像。这仅仅是*视觉上的* 2 倍缩放效果。如果将 `halve` 设置为 `True`，那么高分辨率\u002F超分辨率图像将变为 1080p，而低分辨率图像则变为 270p。\n\n### 大尺寸示例\n\n以下示例中的图像（来自《赛博朋克2077》[https:\u002F\u002Fwww.cyberpunk.net\u002Fin\u002Fen\u002F]）非常大。如果您是在 1080p 分辨率的屏幕上查看此页面，那么您需要 **点击图片以查看其实际尺寸**，才能有效地观察到 4 倍超分辨率的效果。\n\n\n\u003Cp align=\"center\">\n  \u003Ci>点击图片以全尺寸查看。\u003C\u002Fi>\n\u003C\u002Fp>\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_edfeba208484.png)\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ci>点击图片以全尺寸查看。\u003C\u002Fi>\n\u003C\u002Fp>\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_b38442e95e6a.png)\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ci>点击图片以全尺寸查看。\u003C\u002Fi>\n\u003C\u002Fp>\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_71b6b5cf5139.png)\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ci>点击图片以全尺寸查看。\u003C\u002Fi>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_cae74d015f18.png\">\n\u003C\u002Fp>\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ci>点击图片以全尺寸查看。\u003C\u002Fi>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_dc67974f24a3.png\">\n\u003C\u002Fp>\n\n---\n\n\u003Cp align=\"center\">\n  \u003Ci>点击图片以全尺寸查看。\u003C\u002Fi>\n\u003C\u002Fp>\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_readme_e414a01144a0.png)\n\n---\n\n# 常见问题解答\n\n我将根据本仓库 [*Issues*](https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fissues) 标签下常见的提问，逐步完善本节内容。\n\n**为什么生成器输出的超分辨率图像要经过判别器两次？为什么不直接复用第一次经过判别器时的输出呢？**\n\n是的，我们确实会对超分辨率图像进行两次判别：\n\n- 在训练生成器时，我们将超分辨率图像输入判别器，并利用判别器的输出与错误但期望的 $HR$ 标签一起计算对抗损失函数。\n  \n- 在训练判别器时，我们同样将超分辨率图像输入判别器，但这次会使用判别器的输出来计算与正确且期望的 $SR$ 标签之间的二元交叉熵损失。\n\n在第一次过程中，我们的目标是通过损失函数对生成器参数 $\\theta_G$ 的梯度来更新 $\\theta_G$；而此时生成器确实是计算图的一部分，我们会在该计算图上执行反向传播。\n\n而在第二次过程中，我们的目标仅是更新判别器的参数 $\\theta_D$，这些参数在反向传播的方向上位于 $\\theta_G$ 的上游。\n\n换句话说，在训练判别器时，并不需要计算损失函数对 $\\theta_G$ 的梯度，生成器也无需参与计算图！如果让生成器继续参与计算图，将会增加不必要的开销，因为反向传播本身是非常耗时的。因此，在第二次前向传播中，我们会将超分辨率图像从计算图中分离出来，使其成为一个独立变量，不再保留任何关于生成它的计算图（即生成器）的记忆。\n\n这就是为什么我们要进行两次前向传播：一次是将超分辨率图像保留在完整的 SRGAN 计算图中，从而需要跨过生成器进行反向传播；另一次则是将超分辨率图像从生成器的计算图中分离出来，避免反向传播穿过生成器。显然，进行两次前向传播的成本远低于两次反向传播。\n\n**子像素卷积与转置卷积相比如何？**\n\n在我看来，两者非常相似，理论上应该能够达到相近的效果。\n\n如果对于给定的上采样因子 $s$ 和子像素卷积使用的卷积核大小 $k$，转置卷积的卷积核大小为 $sk$，那么这两种方法在数学上是可以等价的。在这种情况下，它们的参数量也会相同：前者为 $ns^2 * i * k * k$，后者为 $n * i * sk * sk$。\n\n不过，有一些研究表明，子像素卷积在某些方面确实更具优势，尽管我并不完全理解其中的原因。您可以参考这篇 [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1609.07009.pdf)、这个 [仓库](https:\u002F\u002Fgithub.com\u002Fatriumlts\u002Fsubpixel) 以及这个 [Reddit 帖子](https:\u002F\u002Fwww.reddit.com\u002Fr\u002FMachineLearning\u002Fcomments\u002Fn5ru8r\u002Fd_subpixel_convolutions_vs_transposed_convolutions\u002F)。也许原始的 [论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1609.05158.pdf) 也能提供一些线索。\n\n当然，数学上的等价性并不意味着它们在优化、学习能力或效率上完全一致。如果您了解子像素卷积为何能取得更好效果，请提交一个问题并告知我，我会将相关信息补充到本教程中。","# a-PyTorch-Tutorial-to-Super-Resolution 快速上手指南\n\n本指南旨在帮助开发者快速使用 PyTorch 复现基于生成对抗网络（GAN）的超分辨率模型（SRResNet 和 SRGAN），实现图像分辨率的 4 倍提升。\n\n## 1. 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux, macOS 或 Windows\n*   **Python 版本**: 推荐 `Python 3.6` 或更高版本\n*   **PyTorch 版本**: 教程基于 `PyTorch 1.4` 编写，建议使用 `1.4` 及以上兼容版本。\n*   **前置知识**: 具备基础的 PyTorch 使用和卷积神经网络（CNN）知识。\n\n**依赖安装建议：**\n国内用户推荐使用清华源或阿里源加速安装。\n\n```bash\n# 安装 PyTorch (以 CPU 版本为例，如需 GPU 请访问 pytorch.org 获取对应 CUDA 命令)\npip install torch==1.4.0 torchvision==0.5.0 -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n\n# 安装其他必要依赖 (根据项目 requirements.txt，通常包含 PIL, numpy 等)\npip install pillow numpy scipy -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 2. 安装步骤\n\n克隆项目仓库并进入目录：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution.git\ncd a-PyTorch-Tutorial-to-Super-Resolution\n```\n\n*注：本项目主要为教程代码，核心逻辑位于 `.py` 文件中，无需额外的 `setup.py` 安装步骤，确保上述依赖已安装即可运行。*\n\n## 3. 基本使用\n\n本教程的核心目标是训练一个模型，将低分辨率图像转换为高分辨率图像（放大 4 倍）。以下是简化的使用流程。\n\n### 3.1 数据准备\n您需要准备训练数据集（如 DIV2K 或 ImageNet）。将高分辨率图像放入指定文件夹，代码会自动将其下采样作为低分辨率输入。\n\n假设您已将高分辨率训练图片放置在 `data\u002FDIV2K_train_HR\u002F` 目录下。\n\n### 3.2 训练模型\n项目包含两个阶段的训练：先训练 SRResNet，再在此基础上训练 SRGAN。\n\n**第一步：训练 SRResNet (基础超分模型)**\n```bash\npython train.py --model_type SRResNet --data_path data\u002FDIV2K_train_HR\u002F\n```\n\n**第二步：训练 SRGAN (引入对抗损失，提升真实感)**\n*注意：通常需要加载预训练好的 SRResNet 权重作为初始化。*\n```bash\npython train.py --model_type SRGAN --data_path data\u002FDIV2K_train_HR\u002F --checkpoint checkpoints\u002FSRResNet_big.pth\n```\n\n### 3.3 推理与评估 (图像超分)\n训练完成后，使用以下命令对单张低分辨率图片进行超分辨率处理。\n\n```bash\n# 用法：python infer.py --image \u003C输入图片路径> --model \u003C模型权重路径> --output \u003C输出路径>\n\npython infer.py --image input\u002Fbaboon.png --model checkpoints\u002FSRGAN_big.pth --output output\u002Fbaboon_sr.png\n```\n\n执行后，`output\u002F` 目录下将生成分辨率提升 4 倍的高清图像。\n\n### 3.4 核心概念简述\n*   **SRResNet**: 基于残差连接的基础超分网络，优化像素级误差 (MSE)。\n*   **SRGAN**: 在 SRResNet 基础上引入生成对抗网络 (GAN)，通过判别器（Discriminator）和感知损失（Perceptual Loss）生成更具纹理细节、视觉上更真实的图像。\n*   **亚像素卷积 (Sub-Pixel Convolution)**: 用于高效的上采样操作，替代传统的转置卷积。","一家数字档案馆正在处理大量上世纪的低分辨率历史照片，急需将其修复为高清版本以供线上展览和学术研究。\n\n### 没有 a-PyTorch-Tutorial-to-Super-Resolution 时\n- **画质模糊失真**：传统插值算法（如双三次插值）仅能放大像素，导致图像边缘锯齿严重，细节涂抹感强，无法还原纹理。\n- **开发门槛极高**：团队若想自研基于生成对抗网络（GAN）的模型，需从零推导复杂的对抗训练逻辑和残差网络结构，耗时数月。\n- **缺乏真实感**：现有的开源方案多专注于提升数值指标（如 PSNR），生成的图像虽然清晰但显得过于平滑，丢失了老照片特有的颗粒质感。\n- **试错成本高昂**：在没有成熟教程指引的情况下，调整超分辨率模型的超参数如同“盲人摸象”，极易导致模型不收敛或产生伪影。\n\n### 使用 a-PyTorch-Tutorial-to-Super-Resolution 后\n- **照片级真实复原**：利用其内置的 SRGAN 模型，成功将低清图放大 4 倍，不仅清晰度提升，更通过“幻觉”机制合理补全了衣物纹理和面部细节。\n- **快速落地部署**：依托详细的 PyTorch 实现教程，团队在数天内便复现了论文模型，无需深究底层数学推导即可直接进行训练和推理。\n- **视觉自然逼真**：生成的图像摆脱了机械的平滑感，保留了符合人眼视觉习惯的自然噪点和锐度，完美契合历史照片的修复需求。\n- **流程透明可控**：教程清晰拆解了从数据预处理、残差块构建到对抗训练的全过程，让开发人员能轻松针对特定数据集进行微调优化。\n\na-PyTorch-Tutorial-to-Super-Resolution 将前沿的 GAN 超分技术转化为可执行的代码路径，让低质量图像的瞬间“高清重生”变得触手可及。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fsgrvinod_a-PyTorch-Tutorial-to-Super-Resolution_c2b8443a.png","sgrvinod","Sagar Vinodababu","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fsgrvinod_f8c3c1d8.jpg",null,"Calgary, Canada","sgrvinod@gmail.com","https:\u002F\u002Fgithub.com\u002Fsgrvinod",[80],{"name":81,"color":82,"percentage":83},"Python","#3572A5",100,707,127,"2026-04-14T04:29:52","MIT","未说明","未说明 (项目基于 PyTorch，通常支持 CUDA GPU 加速，但 README 未指定具体型号或显存要求)",{"notes":91,"python":92,"dependencies":93},"该项目是一个超分辨率（Super-Resolution）和生成对抗网络（GAN）的教学教程。作者明确声明使用 Python 3.6 和 PyTorch 1.4 版本。由于版本较老，现代环境可能需要调整依赖或使用兼容模式运行。教程假设用户具备 PyTorch 和卷积神经网络的基础知识。","3.6",[94],"PyTorch==1.4",[14,15],[97,98,99,100,101,102,103],"pytorch-tutorial","pytorch","super-resolution","srgan","generative-adversarial-network","gan","gans","2026-03-27T02:49:30.150509","2026-04-20T10:20:46.234021",[107,112,117,121,126,131],{"id":108,"question_zh":109,"answer_zh":110,"source_url":111},44514,"如何在 VGG19 特征空间下使用 MSE 损失训练 SRResNet？效果如何？","有用户尝试在 VGG 特征空间下使用 MSE 损失训练 SRResNet，发现生成的超分辨率图像会出现“规则的纹理图案”（regular texture like pattern），看起来不自然。这表明虽然 SRGAN 的改进部分源于在 VGG 空间计算感知损失，但直接将此策略应用于 SRResNet 可能会导致特定的伪影，而非预期的平滑或真实效果。","https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fissues\u002F15",{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},44515,"加载检查点时报错 'ModuleNotFoundError: No module named models' 如何解决？","该错误通常是因为 `models.py` 文件的位置不正确导致的。请确保 `models.py` 文件位于与加载脚本相同的目录下，或者位于 Python 能够识别的路径中。有用户确认将 `models.py` 移动到正确位置后解决了该问题。","https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fissues\u002F11",{"id":118,"question_zh":119,"answer_zh":120,"source_url":111},44516,"图像放大后出现棋盘格伪影（checkerboard artifacts）是什么原因？","这通常是在使用子像素卷积（subpixel convolution）或反卷积（deconvolution）进行图像放大时发生的现象。维护者指出这是棋盘格伪影，并推荐参考 Distill 出版社的文章《Deconvolution and Checkerboard Artifacts》以了解其产生机制及示例。",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},44517,"运行代码时提示找不到检查点文件 'checkpoint_srgan.pth.tar' 怎么办？","错误 'FileNotFoundError: [Errno 2] No such file or directory' 表明程序在当前目录下未找到指定的检查点文件。请确保已下载预训练模型文件，并将其命名为 `checkpoint_srgan.pth.tar`，且放置在代码运行的根目录（即脚本执行的当前目录 '.\u002F'）下。","https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fissues\u002F16",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},44518,"项目的损失函数定义在哪里？","虽然用户在 `models.py` 中未直接看到损失函数的定义，但在该教程架构中，损失函数通常在训练脚本（如 `train.py`）中定义，而不是在模型定义文件 `models.py` 中。`models.py` 主要负责定义网络结构（如 SRResNet, SRGAN），而优化目标和损失计算逻辑位于训练循环部分。","https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fissues\u002F8",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},44519,"如何生成 1x1 的数据？","该问题涉及数据生成的具体参数配置，但原始 Issue 中未提供具体的解决方案或代码片段。通常这需要修改数据加载器中的缩放因子（scale factor）或裁剪逻辑，建议查看项目中的数据预处理脚本（如 `datasets.py` 或相关数据准备文档）以调整输出尺寸。","https:\u002F\u002Fgithub.com\u002Fsgrvinod\u002Fa-PyTorch-Tutorial-to-Super-Resolution\u002Fissues\u002F10",[]]