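[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-ohayonguy--PMRF":3,"tool-ohayonguy--PMRF":62},[4,18,26,35,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw is a local AI assistant built for individuals, designed to give you a fully controllable intelligent companion on your own devices. Unlike traditional AI assistants, which are confined to a specific website or app, it connects directly to the messaging channels you use every day, including WeChat, WhatsApp, Telegram, Discord, iMessage, and dozens of other platforms. Whichever chat app you send a message from, OpenClaw responds immediately. It also supports voice interaction on macOS, iOS, and Android devices and provides a live canvas that renders content you can manipulate.\n\nThe tool mainly addresses users' needs for data privacy, fast responses, and an always-on experience. Because the AI is deployed locally, users get fast, private assistance without depending on cloud services, and their data stays under their own control. Its distinctive technical feature is a gateway architecture that separates the control plane from the core assistant, keeping cross-platform communication smooth and extensible.\n\nOpenClaw is well suited to tech enthusiasts and developers who want to build personalized workflows, as well as everyday users who value privacy and do not want to be locked into a single ecosystem. Anyone with basic terminal skills can deploy it through a simple command-line onboarding flow (macOS, Linux, and Windows WSL2 are supported). If you long for one that understands you",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","Development Framework","Image","Data Tools","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui is a browser-based interface built with Gradio that lets users easily run the Stable Diffusion image-generation models locally. The original models rely on the command line, are hard to get started with, and spread their features across separate tools. This interface brings the complex AI image-generation pipeline into one intuitive graphical platform.\n\nCasual creators who want to get started quickly, designers who need fine control over image details, and developers and researchers who want to explore the models' potential can all benefit. Its core strength is its very rich feature set. Besides basic modes such as text-to-image, image-to-image, inpainting, and outpainting, it offers advanced features such as attention adjustment, prompt matrices, negative prompts, and high-resolution fix. It also includes face-restoration tools such as GFPGAN and CodeFormer, supports several neural-network upscaling algorithms, and lets users extend its capabilities through an extension system. For devices with limited VRAM, stable-diffusion-webui also provides optimization options that make high-quality AI art creation accessible.",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI is a powerful, highly modular visual AI engine built for designing and executing complex Stable Diffusion image-generation pipelines. Instead of writing code, users work in an intuitive node-based flowchart interface and build custom generation pipelines by connecting functional modules.\n\nThis design addresses a key pain point of advanced AI 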
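image-generation workflows: complex configuration and limited flexibility. Without any programming background, users can freely combine models, tune parameters, and preview results in real time. This lets them handle everything from basic text-to-image to multi-step high-resolution refinement. ComfyUI is highly compatible. It runs on Windows, macOS, and Linux, supports a wide range of hardware including NVIDIA, AMD, Intel, and Apple Silicon, and was among the first to support frontier models such as SDXL, Flux, and SD3.\n\nComfyUI gives strong support to researchers and developers who want to explore the algorithms in depth, and to designers and experienced AI art enthusiasts seeking maximum creative freedom. Its modular architecture lets the community keep adding new features. This makes it one of the most flexible open-source diffusion-model tools, with one of the richest ecosystems, and helps users turn ideas into results efficiently.",109154,2,"2026-04-18T11:18:24",[14,15,13],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":32,"last_commit_at":41,"category_tags":42,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli is an open-source AI command-line tool from Google that brings the Gemini models directly into the user's terminal. For developers used to working on the command line, it offers the shortest path from prompt to model response, with AI assistance available without switching windows.\n\nThe tool mainly addresses the frequent context switching in development. Users can understand, generate, and debug code and automate operations tasks directly in the familiar terminal. gemini-cli handles tasks such as querying a large codebase, generating an app from a sketch, or running complex Git operations through natural-language instructions.\n\nIt is especially suited to software engineers, DevOps practitioners, and technical researchers. Its highlights include:\n- a context window of up to 1 million tokens, with strong reasoning ability;\n- built-in tools such as Google Search, file operations, and shell command execution;\n- support for MCP (Model Context Protocol), its most distinctive feature, which lets users add custom integrations and connect external capabilities such as image generation.\n\nIn addition, a personal Google account comes with a free usage quota, and the project is fully open source under the Apache 2.0 license. This makes it a strong assistant for terminal productivity.",100752,"2026-04-10T01:20:03",[43,13,15,14],"Plugin",{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch is an open-source, PyTorch-based educational project that guides users step by step through building a ChatGPT-like large language model (LLM) from scratch. It is the official code repository for the book of the same name and provides a complete hands-on path covering model development, pretraining, and fine-tuning.\n\nThe project tackles the black-box problem in learning about large models. Many developers can call off-the-shelf models but struggle to understand their internal architecture and training. By writing every line of the core code themselves, users gain a thorough grasp of key ideas such as the Transformer architecture and the attention mechanism, and come to understand how large models actually think. The project also includes code for loading large pretrained weights for fine-tuning, which helps users carry theory into practice.\n\nLLMs-from-scratch is especially suited to AI developers, researchers, and computer science students who want to understand the underlying principles. For practitioners who are not satisfied with merely calling 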
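APIs and want to explore how models are built in detail, it is an excellent learning resource. Its distinctive strength is its step-by-step teaching design. It breaks a complex system into clear steps, with detailed diagrams and examples, so that building a small but fully functional large model becomes achievable. Whether you want to solidify your theoretical foundations or prepare to develop larger models in the future",90106,"2026-04-06T11:19:32",[52,15,13,14],"Language Model",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam is an open-source tool for real-time face swapping and video generation. From a single still photo, users can swap faces in a live webcam feed or create deepfake videos with one click. It addresses the pain points of traditional face swapping, namely cumbersome workflows, very high hardware requirements, and no real-time preview, which makes high-quality digital content creation accessible.\n\nDevelopers and researchers can use it to explore the limits of the algorithms. Its minimal workflow (three steps: choose a face, choose a camera, start) also makes it suitable for everyday users, content creators, designers, and live streamers. Deep-Live-Cam supports uses such as customizing animated characters, swapping models in clothing showcases, and making fun short videos and livestream interactions.\n\nIts core technical strength is real-time processing. A Mouth Mask option preserves the user's original mouth movements for natural, accurate expressions, and a face-mapping feature can apply different faces to multiple subjects in the frame at the same time. The project also includes a built-in content filter that automatically blocks inappropriate material such as nudity and violence. It asks users to use the tool only with consent and clear labeling, which balances technical progress with ethical responsibility.",88924,"2026-04-06T03:28:53",[14,15,13,61],"Video",{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":78,"owner_email":79,"owner_twitter":80,"owner_website":81,"owner_url":82,"languages":83,"stars":92,"forks":93,"last_commit_at":94,"license":95,"difficulty_score":96,"env_os":97,"env_gpu":98,"env_ram":99,"env_deps":100,"category_tags":114,"github_topics":115,"view_count":32,"oss_zip_url":79,"oss_zip_packed_at":79,"status":17,"created_at":129,"updated_at":130,"faqs":131,"releases":162},9722,"ohayonguy\u002FPMRF","PMRF","[ICLR 2025] Official implementation of Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration","PMRF is an algorithm designed for photo-realistic image restoration. Its goal is to make the restored image look natural and realistic while keeping the mean squared error (MSE) between the restored image and the original clean image as low as possible. Traditional restoration methods struggle to look realistic and minimize numerical error at the same time. PMRF uses Posterior-Mean Rectified Flow to provably approximate the estimator that minimizes MSE under a perfect perceptual quality constraint. This reduces the loss of detail and the artifacts that are common in tasks such as deblurring and super-resolution.\n\nThe tool is especially suited to computer vision researchers, algorithm developers who need high-quality restoration, and professional designers who demand the highest image quality. As work accepted at ICLR 2025, PMRF's 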
innovation lies in combining a perfect perceptual quality constraint with the MSE-minimization objective, implemented with an efficient HDiT architecture. The project provides the complete official code and pretrained models. It also offers a Hugging Face online demo for quickly trying its face restoration, including on images that contain more than one face. PMRF offers a strong new option for high-fidelity image restoration, both in academic research and in real-world engineering.","\u003Cdiv align=\"center\">\n\n# Posterior-Mean Rectified Flow:\u003Cbr \u002F>Towards Minimum MSE Photo-Realistic Image Restoration\u003Cbr \u002F>(ICLR 2025)\n\n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.00418)] [[Project Page](https:\u002F\u002Fpmrf-ml.github.io\u002F)] [[Demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fohayonguy\u002FPMRF)]\n\n[Guy Ohayon](https:\u002F\u002Fohayonguy.github.io\u002F), [Tomer Michaeli](https:\u002F\u002Ftomer.net.technion.ac.il\u002F), [Michael Elad](https:\u002F\u002Felad.cs.technion.ac.il\u002F)\u003Cbr \u002F>\nTechnion—Israel Institute of Technology\n\n\u003C\u002Fdiv>\n\n> PMRF is a novel photo-realistic image restoration algorithm. It (provably) approximates the **optimal** estimator that minimizes the Mean Squared Error (MSE) under a perfect perceptual quality constraint.\n\n\u003Cdiv align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fohayonguy_PMRF_readme_4a41ad19ba60.png\" width=\"2000\">\n\u003C\u002Fdiv>\n\n---\n\n\u003Cdiv align=\"center\">\n\n[![license](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-red.svg)](https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002Fblob\u002Fmain\u002FLICENSE)\n[![torch](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPyTorch-2.3.1-DE3412)](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fpytorch)\n[![lightning](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLightning-2.3.3-8A2BE2)](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Fpytorch-lightning)\n[![Hugging 
Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-%F0%9F%A4%97%20Hugging%20Face-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fohayonguy\u002FPMRF)\n[![Hits](https:\u002F\u002Fhits.sh\u002Fgithub.com\u002Fohayonguy\u002FPMRF.svg?label=Visitors&color=30a704)](https:\u002F\u002Fhits.sh\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002F)\n\n\u003C\u002Fdiv>\n\n# 📈 Some results from our paper\n### CelebA-Test quantitative comparison\n\nRed, blue and green indicate the best, the second best and the third best scores, respectively.\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fohayonguy_PMRF_readme_16425dc27480.png\"\u002F>\n\n\n### WIDER-Test visual comparison\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fohayonguy_PMRF_readme_dc82796a5eaf.png\"\u002F>\n\n### WebPhoto-Test visual comparison\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fohayonguy_PMRF_readme_f810ddcd3fb7.png\"\u002F>\n\n# ⚙️ Installation\n**Note for Windows users:** *It appears that several Windows users have been unable to install the `natten` package, which is required in order to use the HDiT model architecture in PMRF. A solution that worked for several people is suggested [here](https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002Fissues\u002F8#issue-2581034421). If you couldn't solve this issue, you may train PMRF using a different architecture (e.g. 
UNet) and avoid using the `natten` package.*\n\nWe created a conda environment by running the following commands in exactly the given order (they are also listed in the `install.sh` file):\n\n```\nconda create -n pmrf python=3.10\nconda activate pmrf\nconda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia\nconda install lightning==2.3.3 -c conda-forge\npip install opencv-python==4.10.0.84 timm==1.0.8 wandb==0.17.5 lovely-tensors==0.1.16 torch-fidelity==0.3.0 einops==0.8.0 dctorch==0.1.2 torch-ema==0.3\npip install natten==0.17.1+torch230cu118 -f https:\u002F\u002Fshi-labs.com\u002Fnatten\u002Fwheels\npip install nvidia-cuda-nvcc-cu11\npip install basicsr==1.4.2\npip install git+https:\u002F\u002Fgithub.com\u002Ftoshas\u002Ftorch-fidelity.git\npip install lpips==0.1.4\npip install piq==0.8.0\npip install huggingface_hub==0.24.5\n```\n\n1. Note that the `natten` package is required for the HDiT architecture used by PMRF.\nMake sure to replace `natten==0.17.1+torch230cu118` with the build that matches the PyTorch and CUDA versions installed on your system.\nSee https:\u002F\u002Fshi-labs.com\u002Fnatten\u002F for the available versions.\n2. We installed `nvidia-cuda-nvcc-cu11` because otherwise `torch.compile` hung for some reason.\n`torch.compile` may work on your system without this package. Either way, you may simply skip\nthis package and\u002For remove all the `torch.compile` lines from our code.\n3. 
Due to a compatibility issue in `basicsr`, you will need to modify one of the files in this package.\nOpen `\u002Fpath\u002Fto\u002Fenv\u002Fpmrf\u002Flib\u002Fpython3.10\u002Fsite-packages\u002Fbasicsr\u002Fdata\u002Fdegradations.py`, where `\u002Fpath\u002Fto\u002Fenv` is the path\nwhere your conda installed the `pmrf` environment.\nThen, change the line\n```\nfrom torchvision.transforms.functional_tensor import rgb_to_grayscale\n```\nto\n```\nfrom torchvision.transforms.functional import rgb_to_grayscale\n```\n\n \n# ⬇️ Downloads\n## 🌐 Model checkpoints\nWe provide our blind face image restoration model checkpoint in [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fohayonguy\u002FPMRF_blind_face_image_restoration) and in [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1dfjZATcQ451uhvFH42tKnfMNHRkL6N_A?usp=sharing).\nThe checkpoints for section 5.2 in the paper (the controlled experiments) can be downloaded from [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1dfjZATcQ451uhvFH42tKnfMNHRkL6N_A?usp=sharing). Please keep the same folder structure as provided in Google Drive:\n\n```\ncheckpoints\u002F\n├── blind_face_restoration_pmrf.ckpt    # Checkpoint of our blind face image restoration model.\n├── swinir_restoration512_L1.pth    # Checkpoint of the SwinIR model trained by DifFace\n├── controlled_experiments\u002F     # Checkpoints for the controlled experiments\n│   ├── colorization_gaussian_noise_025\u002F\n│   │   ├── pmrf\u002F\n│   │   │   └── epoch=999-step=273000.ckpt\n│   │   ├── mmse\u002F\n│   │   │   └── epoch=999-step=273000.ckpt\n.   .   .\n.   .   .\n.   .   
.\n```\nTo evaluate the landmark distance (LMD in the paper) and the identity metric (Deg in the paper), you will also need to download the `resnet18_110.pth` and `alignment_WFLW_4HG.pth` checkpoints from the [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1k3RCSliF6PsujCMIdCD1hNM63EozlDIZ) of [VQFR](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FVQFR). Place these checkpoints in the `evaluation\u002Fmetrics_ckpt\u002F` folder.\n\n## 🌐 Test data sets for blind face image restoration\n1. Download WebPhoto-Test, LFW-Test, and CelebA-Test (HQ and LQ) from https:\u002F\u002Fxinntao.github.io\u002Fprojects\u002Fgfpgan.\n2. Download WIDER-Test from https:\u002F\u002Fshangchenzhou.com\u002Fprojects\u002FCodeFormer\u002F.\n3. Put these data sets anywhere on your system.\n\n\n# 🧑 Blind face image restoration (section 5.1 in the paper)\n## ⚡ Quick inference\nTo quickly use our model, we provide a [Hugging Face checkpoint](https:\u002F\u002Fhuggingface.co\u002Fohayonguy\u002FPMRF_blind_face_image_restoration), which is downloaded automatically. Simply run:\n```\npython inference.py \\\n--ckpt_path ohayonguy\u002FPMRF_blind_face_image_restoration \\\n--ckpt_path_is_huggingface \\\n--lq_data_path \u002Fpath\u002Fto\u002Flq\u002Fimages \\\n--output_dir \u002Fpath\u002Fto\u002Fresults\u002Fdir \\\n--batch_size 64 \\\n--num_flow_steps 25\n```\nAdjust `--num_flow_steps` as needed (this is the hyper-parameter `K` in our paper).\n\nYou may also provide a local model checkpoint, for example if you train your own PMRF model or if you wish to use our [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1dfjZATcQ451uhvFH42tKnfMNHRkL6N_A?usp=sharing) checkpoint instead of the Hugging Face one. 
Simply run:\n```\npython inference.py \\\n--ckpt_path .\u002Fcheckpoints\u002Fblind_face_restoration_pmrf.ckpt \\\n--lq_data_path \u002Fpath\u002Fto\u002Flq\u002Fimages \\\n--output_dir \u002Fpath\u002Fto\u002Fresults\u002Fdir \\\n--batch_size 64 \\\n--num_flow_steps 25\n```\nNote that our blind face image restoration model is trained on square, aligned face images. To restore general images containing faces (e.g., images with more than one face), you may use our [Hugging Face demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fohayonguy\u002FPMRF).\n\n## 🔬 Evaluation\n\n1. We downloaded the `resnet18_110.pth` and `alignment_WFLW_4HG.pth` checkpoints from the [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1k3RCSliF6PsujCMIdCD1hNM63EozlDIZ) of [VQFR](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FVQFR) and put them in the `evaluation\u002Fmetrics_ckpt\u002F` folder.\nTo evaluate the results on CelebA-Test, run:\n```\ncd evaluation\npython compute_metrics_blind.py \\\n--parent_ffhq_512_path \u002Fpath\u002Fto\u002Fparent\u002Fof\u002Fffhq512 \\\n--rec_path \u002Fpath\u002Fto\u002Fceleba-512-test\u002Frestored\u002Fimages \\\n--gt_path \u002Fpath\u002Fto\u002Fceleba-512-test\u002Fground-truth\u002Fimages\n```\nTo evaluate the results on the real-world data sets, run:\n```\ncd evaluation\npython compute_metrics_blind.py \\\n--parent_ffhq_512_path \u002Fpath\u002Fto\u002Fparent\u002Fof\u002Fffhq512 \\\n--rec_path \u002Fpath\u002Fto\u002Freal-world\u002Frestored\u002Fimages \\\n--mmse_rec_path \u002Fpath\u002Fto\u002Fmmse\u002Frestored\u002Fimages\n```\nThe `--mmse_rec_path` argument is optional. It lets you compute IndRMSE, an indicator of the true RMSE for real-world degraded images, which have no ground truth.\nThe MMSE reconstructions are saved automatically when you run `inference.py`, since the MMSE model\nis also included in the PMRF checkpoint.\n\n## 💻 Training\nIn the folder `scripts\u002F` we provide the 
training scripts we used for blind face image restoration and for training\nthe baseline models as well. If you want to run a script, you need to execute it in the root folder\n(where `train.py` is located). To train the model, you will need the FFHQ data set.\nWe downloaded the original FFHQ 1024x1024 data set and down-sampled the images to size 512x512 using bi-cubic down-sampling.\n\n1. Copy the `train_pmrf.sh` file (located in `scripts\u002Ftrain\u002Fblind_face_restoration`) to the root folder.\n2. Adjust the arguments `--train_data_root` and `--val_data_root` according to the location of the training and validation data in your system.\n3. The SwinIR model which was trained by [DifFace](https:\u002F\u002Fgithub.com\u002FzsyOAOA\u002FDifFace) is provided in the `checkpoints\u002F` folder. We downloaded it via\n```\nwget https:\u002F\u002Fgithub.com\u002FzsyOAOA\u002FDifFace\u002Freleases\u002Fdownload\u002FV1.0\u002Fswinir_restoration512_L1.pth\n```\n4. Adjust the argument `--mmse_model_ckpt_path` to the path of the SwinIR model.\n5. Adjust the arguments `--num_gpus` and `--num_workers` according to your system.\n6. Run the script `train_pmrf.sh` to train our model.\n\n\n# 👩‍🔬 Controlled experiments (section 5.2 in the paper)\nWe provide training and evaluation codes for the controlled experiments in our paper, where we compare PMRF with the following baseline methods:\n1. **Flow conditioned on Y**: A rectified flow model which is *conditioned* on the *input measurement*, and learns to flow from pure noise to the ground-truth data distribution.\n2. **Flow conditioned on the posterior mean predictor**: A rectified flow model which is *conditioned* on the *posterior mean prediction*, and learns to flow from pure noise to the ground-truth data distribution.\n3. **Flow from Y**: A rectified flow model which flows from the degraded measurement to the ground-truth data distribution.\n4. 
**Posterior mean predictor**: A model trained to minimize the MSE loss.\n\n## 🔬 Evaluation\nWe provide checkpoints for quick evaluation of PMRF and all the baseline methods.\n1. The evaluation is conducted on CelebA-Test images of size 256x256. To acquire such images, we downloaded the CelebA-Test (HQ) images from [GFPGAN](https:\u002F\u002Fxinntao.github.io\u002Fprojects\u002Fgfpgan) and down-sampled them to 256x256 using bi-cubic down-sampling.\n2. In `test.sh`, set `--test_data_root` to the path of the CelebA-Test 256x256 images, set `--degradation` to the type of degradation you wish to assess, and set `--ckpt_path` to the corresponding model checkpoint.\n3. Run `test.sh`.\n\nWe automatically save the reconstructed outputs, the degraded measurements, and the samples from the source distribution (the images from which the ODE solver begins).\nAfter running `test.sh`, you may evaluate the results via:\n\n```\ncd evaluation\npython compute_metrics_controlled_experiments.py \\\n--parent_ffhq_256_path \u002Fpath\u002Fto\u002Fparent\u002Fof\u002Fffhq256 \\\n--rec_path \u002Fpath\u002Fto\u002Frestored\u002Fimages \\\n--gt_path \u002Fpath\u002Fto\u002Fceleba-256-test\u002Fground-truth\u002Fimages\n```\n\n## 💻 Training\n\n* We trained our models on FFHQ 256x256. To acquire such images, we down-sampled the original FFHQ 1024x1024 images using bi-cubic down-sampling.\n* The training scripts of PMRF and of each baseline model are provided in the `scripts\u002Ftrain\u002Fcontrolled_experiments\u002F` folder.\n* To run any of these scripts, copy it to the root folder where `train.py` is located. Then adjust the `--degradation`, `--source_noise_std`, `--train_data_root` and `--val_data_root` arguments in each script. 
For denoising, we used `--source_noise_std 0.025`, and for the rest of the tasks we used `--source_noise_std 0.1`.\n* To run the `train_pmrf.sh` and `train_posterior_conditioned_on_mmse_model.sh` scripts, you first need to train the MMSE model via `train_mmse.sh`. Then, set the `--mmse_model_ckpt_path` argument to the path of the MMSE model's final checkpoint.\n\n\n## 📝 Citation\n    @inproceedings{\n        ohayon2025posteriormean,\n        title={Posterior-Mean Rectified Flow: Towards Minimum {MSE} Photo-Realistic Image Restoration},\n        author={Guy Ohayon and Tomer Michaeli and Michael Elad},\n        booktitle={The Thirteenth International Conference on Learning Representations},\n        year={2025},\n        url={https:\u002F\u002Fopenreview.net\u002Fforum?id=hPOt3yUXii}\n    }\n\n## 📋 License and acknowledgements\nThis project is released under the [MIT license](https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002Fblob\u002Fmain\u002FLICENSE).\n\nWe borrow code from [BasicSR](https:\u002F\u002Fgithub.com\u002FXPixelGroup\u002FBasicSR), [VQFR](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FVQFR), [DifFace](https:\u002F\u002Fgithub.com\u002FzsyOAOA\u002FDifFace), [k-diffusion](https:\u002F\u002Fgithub.com\u002Fcrowsonkb\u002Fk-diffusion), and [SwinIR](https:\u002F\u002Fgithub.com\u002FJingyunLiang\u002FSwinIR). 
We thank the authors of these repositories for their useful implementations.\n\n## 📧 Contact\nIf you have any questions or inquiries, please feel free to [contact me](mailto:guyoep@gmail.com).\n","\u003Cdiv align=\"center\">\n\n# Posterior-Mean Rectified Flow:\u003Cbr \u002F>Towards Minimum MSE Photo-Realistic Image Restoration\u003Cbr \u002F>(ICLR 2025)\n\n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.00418)] [[Project Page](https:\u002F\u002Fpmrf-ml.github.io\u002F)] [[Demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fohayonguy\u002FPMRF)]\n\n[Guy Ohayon](https:\u002F\u002Fohayonguy.github.io\u002F), [Tomer Michaeli](https:\u002F\u002Ftomer.net.technion.ac.il\u002F), [Michael Elad](https:\u002F\u002Felad.cs.technion.ac.il\u002F)\u003Cbr \u002F>\nTechnion – Israel Institute of Technology\n\n\u003C\u002Fdiv>\n\n> PMRF is a novel photo-realistic image restoration algorithm. It (provably) approximates the **optimal** estimator that minimizes the mean squared error (MSE) under a perfect perceptual quality constraint.\n\n\u003Cdiv align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fohayonguy_PMRF_readme_4a41ad19ba60.png\" width=\"2000\">\n\u003C\u002Fdiv>\n\n---\n\n\u003Cdiv align=\"center\">\n\n[![license](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-red.svg)](https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002Fblob\u002Fmain\u002FLICENSE)\n[![torch](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPyTorch-2.3.1-DE3412)](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fpytorch)\n[![lightning](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLightning-2.3.3-8A2BE2)](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Fpytorch-lightning)\n[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-%F0%9F%A4%97%20Hugging%20Face-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fohayonguy\u002FPMRF)\n[![Hits](https:\u002F\u002Fhits.sh\u002Fgithub.com\u002Fohayonguy\u002FPMRF.svg?label=Visitors&color=30a704)](https:\u002F\u002Fhits.sh\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002F)\n\n\u003C\u002Fdiv>\n\n# 📈 Some results from our paper\n### CelebA-Test quantitative comparison\n\nRed, blue, and green indicate the best, second-best, and third-best scores, respectively.\n\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fohayonguy_PMRF_readme_16425dc27480.png\"\u002F>\n\n\n### WIDER-Test visual comparison\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fohayonguy_PMRF_readme_dc82796a5eaf.png\"\u002F>\n\n### WebPhoto-Test visual comparison\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fohayonguy_PMRF_readme_f810ddcd3fb7.png\"\u002F>\n\n# ⚙️ Installation\n**Note for Windows users:** *Several Windows users appear to have been unable to install the `natten` package, which is required to use the HDiT model architecture in PMRF. A solution that worked for several people is suggested [here](https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002Fissues\u002F8#issue-2581034421). If you still cannot solve this issue, you may train PMRF with a different architecture (e.g., UNet) and avoid the `natten` package.*\n\nWe created a conda environment by running the following commands in exactly the given order (they are also listed in the `install.sh` file):\n\n```\nconda create -n pmrf python=3.10\nconda activate pmrf\nconda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia\nconda install lightning==2.3.3 -c conda-forge\npip install opencv-python==4.10.0.84 timm==1.0.8 wandb==0.17.5 lovely-tensors==0.1.16 torch-fidelity==0.3.0 einops==0.8.0 dctorch==0.1.2 torch-ema==0.3\npip install natten==0.17.1+torch230cu118 -f https:\u002F\u002Fshi-labs.com\u002Fnatten\u002Fwheels\npip install nvidia-cuda-nvcc-cu11\npip install basicsr==1.4.2\npip install git+https:\u002F\u002Fgithub.com\u002Ftoshas\u002Ftorch-fidelity.git\npip install lpips==0.1.4\npip install piq==0.8.0\npip install huggingface_hub==0.24.5\n```\n\n1. Note that the `natten` package is required for the HDiT architecture used by PMRF. Make sure to replace `natten==0.17.1+torch230cu118` with the build that matches the PyTorch and CUDA versions installed on your system. See https:\u002F\u002Fshi-labs.com\u002Fnatten\u002F for the available versions.\n2. We installed `nvidia-cuda-nvcc-cu11` because otherwise `torch.compile` hung for no apparent reason. `torch.compile` may work on your system without this package. Either way, you may simply skip this package and\u002For remove all the `torch.compile` lines from our code.\n3. 
Due to a compatibility issue in `basicsr`, you will need to modify one of the files in this package. Open `\u002Fpath\u002Fto\u002Fenv\u002Fpmrf\u002Flib\u002Fpython3.10\u002Fsite-packages\u002Fbasicsr\u002Fdata\u002Fdegradations.py`, where `\u002Fpath\u002Fto\u002Fenv` is the directory in which conda created the `pmrf` environment.\nThen change the line:\n```\nfrom torchvision.transforms.functional_tensor import rgb_to_grayscale\n```\nto:\n```\nfrom torchvision.transforms.functional import rgb_to_grayscale\n```\n\n \n# ⬇️ Downloads\n## 🌐 Model checkpoints\nWe provide our blind face image restoration model checkpoint on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fohayonguy\u002FPMRF_blind_face_image_restoration) and on [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1dfjZATcQ451uhvFH42tKnfMNHRkL6N_A?usp=sharing). The checkpoints for section 5.2 in the paper (the controlled experiments) can also be downloaded from [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1dfjZATcQ451uhvFH42tKnfMNHRkL6N_A?usp=sharing). Please keep the same folder structure as provided in Google Drive:\n\n```\ncheckpoints\u002F\n├── blind_face_restoration_pmrf.ckpt    # Checkpoint of our blind face image restoration model.\n├── swinir_restoration512_L1.pth    # Checkpoint of the SwinIR model trained by DifFace.\n├── controlled_experiments\u002F     # Checkpoints for the controlled experiments\n│   ├── colorization_gaussian_noise_025\u002F\n│   │   ├── pmrf\u002F\n│   │   │   └── epoch=999-step=273000.ckpt\n│   │   ├── mmse\u002F\n│   │   │   └── epoch=999-step=273000.ckpt\n.   .   .\n.   .   .\n.   .   .\n```\nTo evaluate the landmark distance (LMD in the paper) and the identity metric (Deg in the paper), you will also need to download the `resnet18_110.pth` and `alignment_WFLW_4HG.pth` checkpoints from the [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1k3RCSliF6PsujCMIdCD1hNM63EozlDIZ) of [VQFR](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FVQFR) and place them in the `evaluation\u002Fmetrics_ckpt\u002F` folder.\n\n## 🌐 Test data sets for blind face image restoration\n1. Download WebPhoto-Test, LFW-Test, and CelebA-Test (HQ and LQ) from https:\u002F\u002Fxinntao.github.io\u002Fprojects\u002Fgfpgan.\n2. Download WIDER-Test from https:\u002F\u002Fshangchenzhou.com\u002Fprojects\u002FCodeFormer\u002F.\n3. 
Put these data sets anywhere on your system.\n\n\n# 🧑 Blind face image restoration (section 5.1 in the paper)\n\n## ⚡ Quick inference\nTo quickly use our model, we provide a [Hugging Face checkpoint](https:\u002F\u002Fhuggingface.co\u002Fohayonguy\u002FPMRF_blind_face_image_restoration), which is downloaded automatically. Simply run:\n```\npython inference.py \\\n--ckpt_path ohayonguy\u002FPMRF_blind_face_image_restoration \\\n--ckpt_path_is_huggingface \\\n--lq_data_path \u002Fpath\u002Fto\u002Flq\u002Fimages \\\n--output_dir \u002Fpath\u002Fto\u002Fresults\u002Fdir \\\n--batch_size 64 \\\n--num_flow_steps 25\n```\nAdjust `--num_flow_steps` as needed (this is the hyper-parameter `K` in our paper).\n\nYou may also provide a local model checkpoint, for example if you train your own PMRF model or if you wish to use our [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1dfjZATcQ451uhvFH42tKnfMNHRkL6N_A?usp=sharing) checkpoint instead of the Hugging Face one. Simply run:\n```\npython inference.py \\\n--ckpt_path .\u002Fcheckpoints\u002Fblind_face_restoration_pmrf.ckpt \\\n--lq_data_path \u002Fpath\u002Fto\u002Flq\u002Fimages \\\n--output_dir \u002Fpath\u002Fto\u002Fresults\u002Fdir \\\n--batch_size 64 \\\n--num_flow_steps 25\n```\n\nNote that our blind face image restoration model is trained on square, aligned face images. To restore general images containing faces (e.g., images with more than one face), you may use our [Hugging Face demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fohayonguy\u002FPMRF).\n\n## 🔬 Evaluation\n\n1. 
We downloaded the `resnet18_110.pth` and `alignment_WFLW_4HG.pth` checkpoints from the [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1k3RCSliF6PsujCMIdCD1hNM63EozlDIZ) of [VQFR](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FVQFR) and put them in the `evaluation\u002Fmetrics_ckpt\u002F` folder.\n   To evaluate the results on CelebA-Test, run:\n   ```\n   cd evaluation\n   python compute_metrics_blind.py \\\n   --parent_ffhq_512_path \u002Fpath\u002Fto\u002Fparent\u002Fof\u002Fffhq512 \\\n   --rec_path \u002Fpath\u002Fto\u002Fceleba-512-test\u002Frestored\u002Fimages \\\n   --gt_path \u002Fpath\u002Fto\u002Fceleba-512-test\u002Fground-truth\u002Fimages\n   ```\n\n   To evaluate the results on the real-world data sets, run:\n   ```\n   cd evaluation\n   python compute_metrics_blind.py \\\n   --parent_ffhq_512_path \u002Fpath\u002Fto\u002Fparent\u002Fof\u002Fffhq512 \\\n   --rec_path \u002Fpath\u002Fto\u002Freal-world\u002Frestored\u002Fimages \\\n   --mmse_rec_path \u002Fpath\u002Fto\u002Fmmse\u002Frestored\u002Fimages\n   ```\n\n   The `--mmse_rec_path` argument is optional. It lets you compute IndRMSE, an indicator of the true RMSE for real-world degraded images, which have no ground truth. The MMSE reconstructions are saved automatically when you run `inference.py`, since the MMSE model is also included in the PMRF checkpoint.\n\n## 💻 Training\nIn the `scripts\u002F` folder we provide the training scripts we used for blind face image restoration and for the baseline models. To run a script, execute it from the root folder (where `train.py` is located). Training requires the FFHQ data set. We downloaded the original FFHQ 1024×1024 data set and down-sampled the images to 512×512 using bi-cubic down-sampling.\n\n1. Copy the `train_pmrf.sh` file (located in `scripts\u002Ftrain\u002Fblind_face_restoration`) to the root folder.\n2. Adjust the arguments `--train_data_root` and `--val_data_root` according to the location of the training and validation data on your system.\n3. The SwinIR model trained by [DifFace](https:\u002F\u002Fgithub.com\u002FzsyOAOA\u002FDifFace) is provided in the `checkpoints\u002F` folder. We downloaded it via:\n   ```\n   wget https:\u002F\u002Fgithub.com\u002FzsyOAOA\u002FDifFace\u002Freleases\u002Fdownload\u002FV1.0\u002Fswinir_restoration512_L1.pth\n   ```\n4. Set the argument `--mmse_model_ckpt_path` to the path of the SwinIR model.\n5. Adjust the arguments `--num_gpus` and `--num_workers` according to your system.\n6. Run the script `train_pmrf.sh` to train our model.\n\n# 👩‍🔬 Controlled experiments (section 5.2 in the paper)\nWe provide training and evaluation code for the controlled experiments in our paper, where we compare PMRF with the following baseline methods:\n1. **Flow conditioned on Y**: A rectified flow model that is *conditioned* on the *input measurement* and learns to flow from pure noise to the ground-truth data distribution.\n2. 
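**Flow conditioned on the posterior mean predictor**: A rectified flow model that is *conditioned* on the *posterior mean prediction* and learns to flow from pure noise to the ground-truth data distribution.\n3. **Flow from Y**: A rectified flow model that flows from the degraded measurement to the ground-truth data distribution.\n4. **Posterior mean predictor**: A model trained to minimize the MSE loss.\n\n## 🔬 Evaluation\nWe provide checkpoints for quick evaluation of PMRF and all the baseline methods.\n1. The evaluation is conducted on CelebA-Test images of size 256×256. To acquire such images, we downloaded the CelebA-Test (HQ) images from [GFPGAN](https:\u002F\u002Fxinntao.github.io\u002Fprojects\u002Fgfpgan) and down-sampled them to 256×256 using bi-cubic down-sampling.\n2. In `test.sh`, set `--test_data_root` to the path of the CelebA-Test 256×256 images, set `--degradation` to the type of degradation you wish to assess, and set `--ckpt_path` to the corresponding model checkpoint.\n3. Run `test.sh`.\n\nWe automatically save the reconstructed outputs, the degraded measurements, and the samples from the source distribution (the images from which the ODE solver begins). After running `test.sh`, you may evaluate the results via:\n\n```\ncd evaluation\npython compute_metrics_controlled_experiments.py \\\n--parent_ffhq_256_path \u002Fpath\u002Fto\u002Fparent\u002Fof\u002Fffhq256 \\\n--rec_path \u002Fpath\u002Fto\u002Frestored\u002Fimages \\\n--gt_path \u002Fpath\u002Fto\u002Fceleba-256-test\u002Fground-truth\u002Fimages\n```\n\n## 💻 Training\n\n* We trained our models on FFHQ 256×256. To acquire such images, we down-sampled the original FFHQ 1024×1024 images using bi-cubic down-sampling.\n* The training scripts of PMRF and of each baseline model are provided in the `scripts\u002Ftrain\u002Fcontrolled_experiments\u002F` folder.\n* To run any of these scripts, copy it to the root folder where `train.py` is located. Then adjust the `--degradation`, `--source_noise_std`, `--train_data_root` and `--val_data_root` arguments in each script. For denoising, we used `--source_noise_std 0.025`, and for the rest of the tasks we used `--source_noise_std 0.1`.\n* To run the `train_pmrf.sh` and `train_posterior_conditioned_on_mmse_model.sh` scripts, you first need to train the MMSE model via `train_mmse.sh`. Then, set the `--mmse_model_ckpt_path` argument to the path of the MMSE model's final checkpoint.\n\n## 📝 Citation\n    @inproceedings{\n        ohayon2025posteriormean,\n        title={Posterior-Mean Rectified Flow: Towards Minimum {MSE} Photo-Realistic Image Restoration},\n        author={Guy Ohayon and Tomer Michaeli and Michael Elad},\n        booktitle={The Thirteenth International Conference on Learning Representations},\n        year={2025},\n        url={https:\u002F\u002Fopenreview.net\u002Fforum?id=hPOt3yUXii}\n    }\n\n## 📋 License and acknowledgements\nThis project is released under the [MIT license](https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002Fblob\u002Fmain\u002FLICENSE).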
\n\nWe borrow code from [BasicSR](https:\u002F\u002Fgithub.com\u002FXPixelGroup\u002FBasicSR), [VQFR](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FVQFR), [DifFace](https:\u002F\u002Fgithub.com\u002FzsyOAOA\u002FDifFace), [k-diffusion](https:\u002F\u002Fgithub.com\u002Fcrowsonkb\u002Fk-diffusion), and [SwinIR](https:\u002F\u002Fgithub.com\u002FJingyunLiang\u002FSwinIR). We thank the authors of these repositories for their useful implementations.\n\n## 📧 Contact\nIf you have any questions or inquiries, please feel free to [contact me](mailto:guyoep@gmail.com).","# PMRF Quick Start Guide\n\nPMRF (Posterior-Mean Rectified Flow) is a novel photo-realistic image restoration algorithm that aims to minimize the mean squared error (MSE) under a perfect perceptual quality constraint. This guide helps you quickly set up the environment and run the blind face image restoration model.\n\n## 1. Prerequisites\n\n*   **Operating system**: Linux (Ubuntu) is recommended.\n    *   *Note for Windows users*: Installing the `natten` package may fail on Windows. If it does, consider training with a UNet architecture instead, or see the solution in the official Issue #8.\n*   **Hardware**: An NVIDIA GPU with CUDA 11.8 support.\n*   **Software dependencies**:\n    *   Conda (recommended for environment management)\n    *   Python 3.10\n    *   CUDA Toolkit 11.8\n\n## 2. Installation\n\nRun the following commands in order to create a Conda environment named `pmrf` and install the dependencies.\n\n### 2.1 Create the environment and install the core libraries\n\n```bash\nconda create -n pmrf python=3.10\nconda activate pmrf\nconda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia\nconda install lightning==2.3.3 -c conda-forge\n```\n\n### 2.2 Install the Python dependencies\n\n```bash\npip install opencv-python==4.10.0.84 timm==1.0.8 wandb==0.17.5 lovely-tensors==0.1.16 torch-fidelity==0.3.0 einops==0.8.0 dctorch==0.1.2 torch-ema==0.3\npip install nvidia-cuda-nvcc-cu11\npip install basicsr==1.4.2\npip install git+https:\u002F\u002Fgithub.com\u002Ftoshas\u002Ftorch-fidelity.git\npip install lpips==0.1.4\npip install piq==0.8.0\npip install huggingface_hub==0.24.5\n```\n\n### 2.3 Install NATTEN (critical step)\n\n`natten` is required by the HDiT architecture that PMRF uses. **Make sure to replace the version in the command below with the build that matches your actual PyTorch and CUDA versions.** The example below is for CUDA 11.8 + PyTorch 2.3:\n\n```bash\npip install natten==0.17.1+torch230cu118 -f https:\u002F\u002Fshi-labs.com\u002Fnatten\u002Fwheels\n```\n> For other versions, see: https:\u002F\u002Fshi-labs.com\u002Fnatten\u002F\n\n### 2.4 Fix a compatibility issue\n\nBecause of a compatibility issue in the `basicsr` package, you need to edit one file manually.\nLocate the file in your environment (e.g., 
`\u002Fpath\u002Fto\u002Fenv\u002Fpmrf\u002Flib\u002Fpython3.10\u002Fsite-packages\u002Fbasicsr\u002Fdata\u002Fdegradations.py`），打开该文件，将：\n\n```python\nfrom torchvision.transforms.functional_tensor import rgb_to_grayscale\n```\n\n修改为：\n\n```python\nfrom torchvision.transforms.functional import rgb_to_grayscale\n```\n\n## 3. 基本使用\n\nPMRF 提供了预训练的盲人脸图像修复模型。您可以直接通过 Hugging Face 自动下载权重并进行推理，无需手动下载检查点。\n\n### 3.1 快速推理\n\n准备好低质量 (LQ) 的人脸图像文件夹后，运行以下命令：\n\n```bash\npython inference.py \\\n--ckpt_path ohayonguy\u002FPMRF_blind_face_image_restoration \\\n--ckpt_path_is_huggingface \\\n--lq_data_path \u002Fpath\u002Fto\u002Flq\u002Fimages \\\n--output_dir \u002Fpath\u002Fto\u002Fresults\u002Fdir \\\n--batch_size 64 \\\n--num_flow_steps 25\n```\n\n**参数说明：**\n*   `--lq_data_path`: 输入的低质量图像目录路径。\n*   `--output_dir`: 修复结果保存目录。\n*   `--num_flow_steps`: 流步数（论文中的超参数 $K$），默认 25，可根据需求调整以平衡速度与质量。\n*   **注意**: 该预训练模型针对**正方形且已对齐**的人脸图像进行了优化。若要处理包含多张人脸或未对齐的通用照片，建议使用官方的 [Hugging Face Demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fohayonguy\u002FPMRF)。\n\n### 3.2 使用本地模型检查点\n\n如果您已手动下载了模型文件（例如从 Google Drive），可使用以下命令：\n\n```bash\npython inference.py \\\n--ckpt_path .\u002Fcheckpoints\u002Fblind_face_restoration_pmrf.ckpt \\\n--lq_data_path \u002Fpath\u002Fto\u002Flq\u002Fimages \\\n--output_dir \u002Fpath\u002Fto\u002Fresults\u002Fdir \\\n--batch_size 64 \\\n--num_flow_steps 25\n```","一家数字档案修复团队正在处理一批因年代久远而严重模糊、带有噪点且细节丢失的历史人像照片，目标是还原出既清晰又符合真实光影质感的高清图像。\n\n### 没有 PMRF 时\n- **细节过度平滑**：传统去噪算法为了降低误差，往往将人脸纹理当作噪声抹除，导致皮肤呈现不自然的“塑料感”或蜡像感。\n- **伪影与失真**：基于生成对抗网络（GAN）的方法虽然能生成纹理，但容易 hallucinate（幻觉化）出不存在的五官特征，破坏人物原本的真实面貌。\n- **画质与感知难以兼得**：团队被迫在“数学误差最小化（模糊但准确）”和“视觉感知好（清晰但可能造假）”之间做痛苦的二选一，无法同时满足档案记录的严谨性与观赏性。\n- **迭代调试成本高**：需要反复调整多组参数尝试平衡点，且不同退化程度的照片需要定制不同的处理流程，效率极低。\n\n### 使用 PMRF 后\n- **最优误差控制**：PMRF 通过数学证明逼近了最小均方误差（MSE）的最优估计器，在保留原始像素统计准确性的同时，有效去除了噪声。\n- **照片级真实还原**：利用后验均值整流流技术，恢复出的皮肤毛孔、发丝等高频细节自然逼真，彻底消除了传统方法的模糊感或虚假纹理。\n- **打破质量悖论**：完美解决了感知质量与保真度的冲突，输出结果既符合人眼对高清照片的审美期待，又严格忠实于原图的人物特征，无需妥协。\n- 
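**Unified, efficient workflow**: a single PMRF model adapts to images with different degrees of degradation, which greatly reduces manual intervention and parameter-tuning time.\n\nThe core value of PMRF is that it addresses the long-standing conflict between sharpness and faithfulness in image restoration on a theoretical footing. It gives professional photo restoration an approach that combines a mathematically optimal objective with visual realism.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fohayonguy_PMRF_4a41ad19.png","ohayonguy","Guy Ohayon","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fohayonguy_e5aaee2f.jpg","Postdoctoral Researcher","Flatiron Institute","United States",null,"guy__ohayon","guyohayon.com","https:\u002F\u002Fgithub.com\u002Fohayonguy",[84,88],{"name":85,"color":86,"percentage":87},"Python","#3572A5",93.8,{"name":89,"color":90,"percentage":91},"Shell","#89e051",6.2,746,43,"2026-04-16T23:10:02","MIT",4,"Linux, Windows, macOS","Requires an NVIDIA GPU. The install commands specify pytorch-cuda=11.8, so CUDA 11.8 support is needed. The HDiT architecture also requires a natten build that matches your PyTorch and CUDA versions.","Not specified",{"notes":101,"python":102,"dependencies":103},"Installing natten on Windows can be difficult. If it fails, use the UNet architecture, which avoids this dependency. After installing basicsr, edit its source file (degradations.py) by hand to fix a compatibility issue. If torch.compile hangs, install nvidia-cuda-nvcc-cu11 or remove the torch.compile calls from the code. Training for blind face restoration requires the FFHQ dataset.","3.10",[104,105,106,107,108,109,110,111,112,113],"pytorch==2.3.1","torchvision==0.18.1","lightning==2.3.3","opencv-python==4.10.0.84","timm==1.0.8","natten==0.17.1+torch230cu118","basicsr==1.4.2","lpips==0.1.4","piq==0.8.0","huggingface_hub==0.24.5",[15],[116,117,118,119,120,121,122,123,124,125,126,127,128],"blind-face-restoration","colorization","computer-vision","denoising","diffusion-models","flow-matching","generative-models","image-manipulation","image-processing","image-restoration","inpainting","inverse-problems","rectified-flow","2026-03-27T02:49:30.150509","2026-04-20T04:07:12.802861",[132,137,142,147,152,157],{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},43656,"Where are the model weights saved after running train_pmrf.sh?","The discussion in this issue mainly concerns a theoretical misunderstanding about 'sampling from the posterior mean'. The maintainer points out that the posterior mean is a function of the degraded input and cannot be sampled from directly, and recommends reading Section 3 of the paper for the theoretical explanation. The comments do not give the exact path where the weights are saved, so check the output and checkpoint settings in the training script.","https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002Fissues\u002F7",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},43657,"What is the design intuition behind the stratified_uniform timestep scheduler, and why is noise added?","The maintainer explains that PMRF does not work without the added noise. Even with a small batch size, the outputs should not remain blurry (like an MMSE estimate). If training or inference results are poor, make sure a small amount of noise is added to the MMSE output, as described in Algorithm 1 of the paper. This noise is required for both training and inference. It is also generally recommended to train flow models with a large batch size.","https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002Fissues\u002F15",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},43658,"How do I deploy the Gradio demo on Windows and fix the torchvision version error?","Errors on Windows are usually caused by an incompatible torchvision version. One user reported a `basicsr` import error with torch==2.4.0+cu124 and torchvision==0.19.0+cu124. This is likely the `torchvision.transforms.functional_tensor` import, which recent torchvision versions no longer provide. The one-line edit to `basicsr\u002Fdata\u002Fdegradations.py` in the installation guide fixes that import. First read the installation guide in the official README carefully (https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF?tab=readme-ov-file#%EF%B8%8F-installation). If the problem persists, try a different combination of torch and torchvision versions, or run on Linux to avoid Windows-specific compatibility problems such as natten's limited Windows support.","https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002Fissues\u002F8",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},43659,"How was the FFHQ dataset resized? Was PIL's bicubic filter used?","Yes. The author confirmed that FFHQ was downsampled with PIL's bicubic implementation, not with OpenCV as was previously reported. The code is:\n```python\nfrom PIL import Image\ndef downscale_image(input_path, output_path, size=(512, 512)):\n    with Image.open(input_path) as img:\n        img_resized = img.resize(size, Image.BICUBIC)\n        img_resized.save(output_path)\n```\nThe maintainer also noted that newer torchvision versions have fixed the aliasing issue and now match PIL's behavior. To reproduce the experimental results, however, it is recommended to use PIL explicitly.","https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002Fissues\u002F16",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},43660,"What should I do about a 'stack expects each tensor to be equal size' error or natten-related errors at runtime?","The two errors usually have different causes. 'stack expects each tensor to be equal size' is raised by `torch.stack` when the images in a batch have different dimensions, for example when a DataLoader batches samples. First check that all input images in a batch have the same size. natten-related errors usually mean that `natten` (Neighborhood Attention Extension) is not installed correctly or is incompatible with the system. Even when natten has been installed via pip, features such as `has_fused_na` may not work in some environments, notably Windows or AMD GPUs. The maintainer suggests checking that natten is installed correctly. For low-level compatibility problems, such as fused operations being unsupported on Windows, try running on Linux.","https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002Fissues\u002F13",{"id":158,"question_zh":159,"answer_zh":160,"source_url":161},43661,"Why are results from the local repository worse than those from the Hugging Face online demo?","The difference is usually caused by different preprocessing, particularly for unaligned face images. The online demo may include dedicated face alignment or other preprocessing that the repository's default inference code does not perform. Check whether your input images received the same preprocessing as in the demo, such as face detection, cropping, and alignment. For unaligned faces, find or implement the corresponding preprocessing code, and confirm that your inference parameters match the demo's.","https:\u002F\u002Fgithub.com\u002Fohayonguy\u002FPMRF\u002Fissues\u002F11",[]]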