[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-phillipi--pix2pix":3,"tool-phillipi--pix2pix":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",158594,2,"2026-04-16T23:34:05",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":80,"stars":101,"forks":102,"last_commit_at":103,"license":104,"difficulty_score":105,"env_os":106,"env_gpu":107,"env_ram":108,"env_deps":109,"category_tags":120,"github_topics":121,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":131,"updated_at":132,"faqs":133,"releases":162},8169,"phillipi\u002Fpix2pix","pix2pix","Image-to-image translation with conditional adversarial nets","pix2pix 是一款基于深度学习的图像转换工具，核心功能是将一种类型的输入图像自动翻译成对应的输出图像。它主要解决了传统方法难以处理的复杂图像映射问题，例如将语义标签图还原为真实街景、把黑白照片上色、将草线稿渲染为逼真建筑立面，或是把卫星地图转换为普通地图等。\n\n该工具的独特技术亮点在于采用了“条件生成对抗网络”（Conditional GANs）。与普通的生成模型不同，pix2pix 在训练过程中不仅让生成器学习创造图像，还引入判别器来评估生成结果与真实图像的匹配度，同时利用输入图像作为条件约束，从而确保生成的图像既逼真又严格符合输入内容的结构特征。官方数据显示，即使是只有几百张图片的小型数据集，也能在较短时间内训练出效果不错的模型。\n\npix2pix 非常适合人工智能研究人员、计算机视觉开发者以及需要定制化图像生成方案的技术团队使用。由于项目主要提供基于 Torch 和 PyTorch 
的代码实现，使用者需要具备一定的编程基础和深度学习环境配置能力。虽然普通用户无法直接通过图形界面操作，但设计师可以利用其训练好的模型或衍生应用，高效完成从概念草图到成品图的转化工作，极大地提升创作","pix2pix 是一款基于深度学习的图像转换工具，核心功能是将一种类型的输入图像自动翻译成对应的输出图像。它主要解决了传统方法难以处理的复杂图像映射问题，例如将语义标签图还原为真实街景、把黑白照片上色、将草线稿渲染为逼真建筑立面，或是把卫星地图转换为普通地图等。\n\n该工具的独特技术亮点在于采用了“条件生成对抗网络”（Conditional GANs）。与普通的生成模型不同，pix2pix 在训练过程中不仅让生成器学习创造图像，还引入判别器来评估生成结果与真实图像的匹配度，同时利用输入图像作为条件约束，从而确保生成的图像既逼真又严格符合输入内容的结构特征。官方数据显示，即使是只有几百张图片的小型数据集，也能在较短时间内训练出效果不错的模型。\n\npix2pix 非常适合人工智能研究人员、计算机视觉开发者以及需要定制化图像生成方案的技术团队使用。由于项目主要提供基于 Torch 和 PyTorch 的代码实现，使用者需要具备一定的编程基础和深度学习环境配置能力。虽然普通用户无法直接通过图形界面操作，但设计师可以利用其训练好的模型或衍生应用，高效完成从概念草图到成品图的转化工作，极大地提升创作效率。","\n# pix2pix\n[Project](https:\u002F\u002Fphillipi.github.io\u002Fpix2pix\u002F) | [Arxiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.07004) |\n[PyTorch](https:\u002F\u002Fgithub.com\u002Fjunyanz\u002Fpytorch-CycleGAN-and-pix2pix)\n\nTorch implementation for learning a mapping from input images to output images, for example:\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphillipi_pix2pix_readme_230f170a6c3b.jpg\" width=\"900px\"\u002F>\n\nImage-to-Image Translation with Conditional Adversarial Networks  \n [Phillip Isola](http:\u002F\u002Fweb.mit.edu\u002Fphillipi\u002F), [Jun-Yan Zhu](https:\u002F\u002Fwww.cs.cmu.edu\u002F~junyanz\u002F), [Tinghui Zhou](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~tinghuiz\u002F), [Alexei A. Efros](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~efros\u002F)   \n CVPR, 2017.\n\nOn some tasks, decent results can be obtained fairly quickly and on small datasets. For example, to learn to generate facades (example shown above), we trained on just 400 images for about 2 hours (on a single Pascal Titan X GPU). However, for harder problems it may be important to train on far larger datasets, and for many hours or even days.\n\n**Note**: Please check out our [PyTorch](https:\u002F\u002Fgithub.com\u002Fjunyanz\u002Fpytorch-CycleGAN-and-pix2pix) implementation for pix2pix and CycleGAN. 
The PyTorch version is under active development and can produce results comparable to or better than this Torch version.\n\n## Setup\n\n### Prerequisites\n- Linux or OSX\n- NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN may work with minimal modification, but untested)\n\n### Getting Started\n- Install torch and dependencies from https:\u002F\u002Fgithub.com\u002Ftorch\u002Fdistro\n- Install torch packages `nngraph` and `display`\n```bash\nluarocks install nngraph\nluarocks install https:\u002F\u002Fraw.githubusercontent.com\u002Fszym\u002Fdisplay\u002Fmaster\u002Fdisplay-scm-0.rockspec\n```\n- Clone this repo:\n```bash\ngit clone git@github.com:phillipi\u002Fpix2pix.git\ncd pix2pix\n```\n- Download the dataset (e.g., [CMP Facades](http:\u002F\u002Fcmp.felk.cvut.cz\u002F~tylecr1\u002Ffacade\u002F)):\n```bash\nbash .\u002Fdatasets\u002Fdownload_dataset.sh facades\n```\n- Train the model\n```bash\nDATA_ROOT=.\u002Fdatasets\u002Ffacades name=facades_generation which_direction=BtoA th train.lua\n```\n- (CPU only) The same training command without using a GPU or CUDNN. Setting the environment variables ```gpu=0 cudnn=0``` forces CPU only\n```bash\nDATA_ROOT=.\u002Fdatasets\u002Ffacades name=facades_generation which_direction=BtoA gpu=0 cudnn=0 batchSize=10 save_epoch_freq=5 th train.lua\n```\n- (Optionally) start the display server to view results as the model trains. 
( See [Display UI](#display-ui) for more details):\n```bash\nth -ldisplay.start 8000 0.0.0.0\n```\n\n- Finally, test the model:\n```bash\nDATA_ROOT=.\u002Fdatasets\u002Ffacades name=facades_generation which_direction=BtoA phase=val th test.lua\n```\nThe test results will be saved to an html file here: `.\u002Fresults\u002Ffacades_generation\u002Flatest_net_G_val\u002Findex.html`.\n\n## Train\n```bash\nDATA_ROOT=\u002Fpath\u002Fto\u002Fdata\u002F name=expt_name which_direction=AtoB th train.lua\n```\nSwitch `AtoB` to `BtoA` to train translation in opposite direction.\n\nModels are saved to `.\u002Fcheckpoints\u002Fexpt_name` (can be changed by passing `checkpoint_dir=your_dir` in train.lua).\n\nSee `opt` in train.lua for additional training options.\n\n## Test\n```bash\nDATA_ROOT=\u002Fpath\u002Fto\u002Fdata\u002F name=expt_name which_direction=AtoB phase=val th test.lua\n```\n\nThis will run the model named `expt_name` in direction `AtoB` on all images in `\u002Fpath\u002Fto\u002Fdata\u002Fval`.\n\nResult images, and a webpage to view them, are saved to `.\u002Fresults\u002Fexpt_name` (can be changed by passing `results_dir=your_dir` in test.lua).\n\nSee `opt` in test.lua for additional testing options.\n\n\n## Datasets\nDownload the datasets using the following script. Some of the datasets are collected by other researchers. Please cite their papers if you use the data.\n```bash\nbash .\u002Fdatasets\u002Fdownload_dataset.sh dataset_name\n```\n- `facades`: 400 images from [CMP Facades dataset](http:\u002F\u002Fcmp.felk.cvut.cz\u002F~tylecr1\u002Ffacade\u002F). [[Citation](datasets\u002Fbibtex\u002Ffacades.tex)]\n- `cityscapes`: 2975 images from the [Cityscapes training set](https:\u002F\u002Fwww.cityscapes-dataset.com\u002F).  
[[Citation](datasets\u002Fbibtex\u002Fcityscapes.tex)]\n- `maps`: 1096 training images scraped from Google Maps\n- `edges2shoes`: 50k training images from [UT Zappos50K dataset](http:\u002F\u002Fvision.cs.utexas.edu\u002Fprojects\u002Ffinegrained\u002Futzap50k\u002F). Edges are computed by [HED](https:\u002F\u002Fgithub.com\u002Fs9xie\u002Fhed) edge detector + post-processing.\n[[Citation](datasets\u002Fbibtex\u002Fshoes.tex)]\n- `edges2handbags`: 137K Amazon Handbag images from [iGAN project](https:\u002F\u002Fgithub.com\u002Fjunyanz\u002FiGAN). Edges are computed by [HED](https:\u002F\u002Fgithub.com\u002Fs9xie\u002Fhed) edge detector + post-processing. [[Citation](datasets\u002Fbibtex\u002Fhandbags.tex)]\n- `night2day`: around 20K natural scene images from  [Transient Attributes dataset](http:\u002F\u002Ftransattr.cs.brown.edu\u002F) [[Citation](datasets\u002Fbibtex\u002Ftransattr.tex)]. To train a `day2night` pix2pix model, you need to add `which_direction=BtoA`.\n\n## Models\nDownload the pre-trained models with the following script. 
You need to rename the model (e.g., `facades_label2image` to `\u002Fcheckpoints\u002Ffacades\u002Flatest_net_G.t7`) after the download has finished.\n```bash\nbash .\u002Fmodels\u002Fdownload_model.sh model_name\n```\n- `facades_label2image` (label -> facade): trained on the CMP Facades dataset.\n- `cityscapes_label2image` (label -> street scene): trained on the Cityscapes dataset.\n- `cityscapes_image2label` (street scene -> label): trained on the Cityscapes dataset.\n- `edges2shoes` (edge -> photo): trained on UT Zappos50K dataset.\n- `edges2handbags` (edge -> photo): trained on Amazon handbags images.\n- `day2night` (daytime scene -> nighttime scene): trained on around 100 [webcams](http:\u002F\u002Ftransattr.cs.brown.edu\u002F).\n\n## Setup Training and Test data\n### Generating Pairs\nWe provide a python script to generate training data in the form of pairs of images {A,B}, where A and B are two different depictions of the same underlying scene. For example, these might be pairs {label map, photo} or {bw image, color image}. Then we can learn to translate A to B or B to A:\n\nCreate folder `\u002Fpath\u002Fto\u002Fdata` with subfolders `A` and `B`. `A` and `B` should each have their own subfolders `train`, `val`, `test`, etc. In `\u002Fpath\u002Fto\u002Fdata\u002FA\u002Ftrain`, put training images in style A. In `\u002Fpath\u002Fto\u002Fdata\u002FB\u002Ftrain`, put the corresponding images in style B. 
Repeat same for other data splits (`val`, `test`, etc).\n\nCorresponding images in a pair {A,B} must be the same size and have the same filename, e.g., `\u002Fpath\u002Fto\u002Fdata\u002FA\u002Ftrain\u002F1.jpg` is considered to correspond to `\u002Fpath\u002Fto\u002Fdata\u002FB\u002Ftrain\u002F1.jpg`.\n\nOnce the data is formatted this way, call:\n```bash\npython scripts\u002Fcombine_A_and_B.py --fold_A \u002Fpath\u002Fto\u002Fdata\u002FA --fold_B \u002Fpath\u002Fto\u002Fdata\u002FB --fold_AB \u002Fpath\u002Fto\u002Fdata\n```\n\nThis will combine each pair of images (A,B) into a single image file, ready for training.\n\n### Notes on Colorization\nNo need to run `combine_A_and_B.py` for colorization. Instead, you need to prepare some natural images and set `preprocess=colorization` in the script. The program will automatically convert each RGB image into Lab color space, and create  `L -> ab` image pair during the training. Also set `input_nc=1` and `output_nc=2`.\n\n### Extracting Edges\nWe provide python and Matlab scripts to extract coarse edges from photos. Run `scripts\u002Fedges\u002Fbatch_hed.py` to compute [HED](https:\u002F\u002Fgithub.com\u002Fs9xie\u002Fhed) edges. Run `scripts\u002Fedges\u002FPostprocessHED.m` to simplify edges with additional post-processing steps. Check the code documentation for more details.\n\n### Evaluating Labels2Photos on Cityscapes\nWe provide scripts for running the evaluation of the Labels2Photos task on the Cityscapes **validation** set. We assume that you have installed `caffe` (and `pycaffe`) in your system. If not, see the [official website](http:\u002F\u002Fcaffe.berkeleyvision.org\u002Finstallation.html) for installation instructions. Once `caffe` is successfully installed, download the pre-trained FCN-8s semantic segmentation model (512MB) by running\n```bash\nbash .\u002Fscripts\u002Feval_cityscapes\u002Fdownload_fcn8s.sh\n```\nThen make sure `.\u002Fscripts\u002Feval_cityscapes\u002F` is in your system's python path. 
If not, run the following command to add it\n```bash\nexport PYTHONPATH=${PYTHONPATH}:.\u002Fscripts\u002Feval_cityscapes\u002F\n```\nNow you can run the following command to evaluate your predictions:\n```bash\npython .\u002Fscripts\u002Feval_cityscapes\u002Fevaluate.py --cityscapes_dir \u002Fpath\u002Fto\u002Foriginal\u002Fcityscapes\u002Fdataset\u002F --result_dir \u002Fpath\u002Fto\u002Fyour\u002Fpredictions\u002F --output_dir \u002Fpath\u002Fto\u002Foutput\u002Fdirectory\u002F\n```\nImages stored under `--result_dir` should contain your model predictions on the Cityscapes **validation** split, and have the original Cityscapes naming convention (e.g., `frankfurt_000001_038418_leftImg8bit.png`). The script will output a text file under `--output_dir` containing the metric.\n\n**Further notes**: Our pre-trained FCN model is **not** supposed to work on Cityscapes in the original resolution (1024x2048) as it was trained on 256x256 images that are then upsampled to 1024x2048 during training. The purpose of the resizing during training was to 1) keep the label maps in the original high resolution untouched and 2) avoid the need to change the standard FCN training code and the architecture for Cityscapes. During test time, you need to synthesize 256x256 results. Our test code will automatically upsample your results to 1024x2048 before feeding them to the pre-trained FCN model. The output is at 1024x2048 resolution and will be compared to 1024x2048 ground truth labels. You do not need to resize the ground truth labels. The best way to verify whether everything is correct is to reproduce the numbers for real images in the paper first. 
To do so, you need to resize the original\u002Freal Cityscapes images (**not** labels) to 256x256 and feed them to the evaluation code.\n\n\n## Display UI\nOptionally, for displaying images during training and testing, use the [display package](https:\u002F\u002Fgithub.com\u002Fszym\u002Fdisplay).\n\n- Install it with: `luarocks install https:\u002F\u002Fraw.githubusercontent.com\u002Fszym\u002Fdisplay\u002Fmaster\u002Fdisplay-scm-0.rockspec`\n- Then start the server with: `th -ldisplay.start`\n- Open this URL in your browser: [http:\u002F\u002Flocalhost:8000](http:\u002F\u002Flocalhost:8000)\n\nBy default, the server listens on localhost. Pass `0.0.0.0` to allow external connections on any interface:\n```bash\nth -ldisplay.start 8000 0.0.0.0\n```\nThen open `http:\u002F\u002F(hostname):(port)\u002F` in your browser to view the display remotely.\n\nL1 error is plotted to the display by default. Set the environment variable `display_plot` to a comma-separated list of values `errL1`, `errG` and `errD` to visualize the L1, generator, and discriminator errors, respectively. 
For example, to plot only the generator and discriminator errors to the display instead of the default L1 error, set `display_plot=\"errG,errD\"`.\n\n## Citation\nIf you use this code for your research, please cite our paper \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F1611.07004v1.pdf\">Image-to-Image Translation with Conditional Adversarial Networks\u003C\u002Fa>:\n\n```\n@article{pix2pix2017,\n  title={Image-to-Image Translation with Conditional Adversarial Networks},\n  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},\n  journal={CVPR},\n  year={2017}\n}\n```\n\n## Cat Paper Collection\nIf you love cats, and love reading cool graphics, vision, and learning papers, please check out the Cat Paper Collection:  \n[[Github]](https:\u002F\u002Fgithub.com\u002Fjunyanz\u002FCatPapers) [[Webpage]](https:\u002F\u002Fwww.cs.cmu.edu\u002F~junyanz\u002Fcat\u002Fcat_papers.html)\n\n## Acknowledgments\nCode borrows heavily from [DCGAN](https:\u002F\u002Fgithub.com\u002Fsoumith\u002Fdcgan.torch). The data loader is modified from [DCGAN](https:\u002F\u002Fgithub.com\u002Fsoumith\u002Fdcgan.torch) and [Context-Encoder](https:\u002F\u002Fgithub.com\u002Fpathak22\u002Fcontext-encoder).\n","# pix2pix\n[项目](https:\u002F\u002Fphillipi.github.io\u002Fpix2pix\u002F) | [Arxiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.07004) |\n[PyTorch](https:\u002F\u002Fgithub.com\u002Fjunyanz\u002Fpytorch-CycleGAN-and-pix2pix)\n\n用于学习从输入图像到输出图像映射的 Torch 实现，例如：\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphillipi_pix2pix_readme_230f170a6c3b.jpg\" width=\"900px\"\u002F>\n\n条件对抗网络下的图像到图像转换  \n [Phillip Isola](http:\u002F\u002Fweb.mit.edu\u002Fphillipi\u002F)、[Jun-Yan Zhu](https:\u002F\u002Fwww.cs.cmu.edu\u002F~junyanz\u002F)、[Tinghui Zhou](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~tinghuiz\u002F)、[Alexei A. 
Efros](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~efros\u002F)   \n CVPR, 2017。\n\n在某些任务上，使用较小的数据集即可较快地获得不错的效果。例如，要学习生成建筑立面（如上例所示），我们仅用400张图片训练了约2小时（在单个 Pascal Titan X GPU 上）。然而，对于更难的问题，可能需要使用更大的数据集，并且训练时间长达数小时甚至数天。\n\n**注**：请查看我们的 [PyTorch](https:\u002F\u002Fgithub.com\u002Fjunyanz\u002Fpytorch-CycleGAN-and-pix2pix) 实现，其中包含了 pix2pix 和 CycleGAN。PyTorch 版本目前仍在积极开发中，其效果可与或优于本 Torch 版本。\n\n## 设置\n\n### 前提条件\n- Linux 或 OSX\n- NVIDIA GPU + CUDA CuDNN（CPU 模式及无 CuDNN 的 CUDA 环境可能稍作修改后也能运行，但未经测试）\n\n### 开始使用\n- 从 https:\u002F\u002Fgithub.com\u002Ftorch\u002Fdistro 安装 torch 及其依赖项\n- 安装 torch 包 `nngraph` 和 `display`\n```bash\nluarocks install nngraph\nluarocks install https:\u002F\u002Fraw.githubusercontent.com\u002Fszym\u002Fdisplay\u002Fmaster\u002Fdisplay-scm-0.rockspec\n```\n- 克隆此仓库：\n```bash\ngit clone git@github.com:phillipi\u002Fpix2pix.git\ncd pix2pix\n```\n- 下载数据集（例如 [CMP Facades](http:\u002F\u002Fcmp.felk.cvut.cz\u002F~tylecr1\u002Ffacade\u002F)）：\n```bash\nbash .\u002Fdatasets\u002Fdownload_dataset.sh facades\n```\n- 训练模型\n```bash\nDATA_ROOT=.\u002Fdatasets\u002Ffacades name=facades_generation which_direction=BtoA th train.lua\n```\n- （仅限 CPU）不使用 GPU 或 CuDNN 的相同训练命令。设置环境变量 ```gpu=0 cudnn=0``` 强制仅使用 CPU\n```bash\nDATA_ROOT=.\u002Fdatasets\u002Ffacades name=facades_generation which_direction=BtoA gpu=0 cudnn=0 batchSize=10 save_epoch_freq=5 th train.lua\n```\n- （可选）启动显示服务器以在模型训练时查看结果。（有关详细信息，请参阅 [Display UI](#display-ui)）：\n```bash\nth -ldisplay.start 8000 0.0.0.0\n```\n\n- 最后，测试模型：\n```bash\nDATA_ROOT=.\u002Fdatasets\u002Ffacades name=facades_generation which_direction=BtoA phase=val th test.lua\n```\n测试结果将保存到此处的 HTML 文件中：`.\u002Fresults\u002Ffacades_generation\u002Flatest_net_G_val\u002Findex.html`。\n\n## 训练\n```bash\nDATA_ROOT=\u002Fpath\u002Fto\u002Fdata\u002F name=expt_name which_direction=AtoB th train.lua\n```\n将 `AtoB` 改为 `BtoA` 即可反向训练。\n\n模型将保存到 `.\u002Fcheckpoints\u002Fexpt_name` 目录下（可通过在 train.lua 中传递 `checkpoint_dir=your_dir` 来更改）。\n\n更多训练选项请参阅 
train.lua 中的 `opt` 部分。\n\n## 测试\n```bash\nDATA_ROOT=\u002Fpath\u002Fto\u002Fdata\u002F name=expt_name which_direction=AtoB phase=val th test.lua\n```\n\n这将在 `\u002Fpath\u002Fto\u002Fdata\u002Fval` 中的所有图像上运行名为 `expt_name` 的模型，方向为 `AtoB`。\n\n结果图像以及用于查看它们的网页将保存到 `.\u002Fresults\u002Fexpt_name` 目录下（可通过在 test.lua 中传递 `results_dir=your_dir` 来更改）。\n\n更多测试选项请参阅 test.lua 中的 `opt` 部分。\n\n\n## 数据集\n使用以下脚本下载数据集。部分数据集由其他研究人员收集，请在使用这些数据时引用他们的论文。\n```bash\nbash .\u002Fdatasets\u002Fdownload_dataset.sh dataset_name\n```\n- `facades`：来自 [CMP Facades 数据集](http:\u002F\u002Fcmp.felk.cvut.cz\u002F~tylecr1\u002Ffacade\u002F) 的 400 张图片。[[引用](datasets\u002Fbibtex\u002Ffacades.tex)]\n- `cityscapes`：来自 [Cityscapes 训练集](https:\u002F\u002Fwww.cityscapes-dataset.com\u002F) 的 2975 张图片。[[引用](datasets\u002Fbibtex\u002Fcityscapes.tex)]\n- `maps`：从 Google 地图抓取的 1096 张训练图片\n- `edges2shoes`：来自 [UT Zappos50K 数据集](http:\u002F\u002Fvision.cs.utexas.edu\u002Fprojects\u002Ffinegrained\u002Futzap50k\u002F) 的 5 万张训练图片。边缘由 [HED](https:\u002F\u002Fgithub.com\u002Fs9xie\u002Fhed) 边缘检测器结合后处理计算得出。\n[[引用](datasets\u002Fbibtex\u002Fshoes.tex)]\n- `edges2handbags`：来自 [iGAN 项目](https:\u002F\u002Fgithub.com\u002Fjunyanz\u002FiGAN) 的 13.7 万张亚马逊手袋图片。边缘同样由 [HED](https:\u002F\u002Fgithub.com\u002Fs9xie\u002Fhed) 边缘检测器结合后处理计算得出。[[引用](datasets\u002Fbibtex\u002Fhandbags.tex)]\n- `night2day`：约 2 万张来自 [Transient Attributes 数据集](http:\u002F\u002Ftransattr.cs.brown.edu\u002F) 的自然场景图片 [[引用](datasets\u002Fbibtex\u002Ftransattr.tex)]。若要训练 `day2night` 的 pix2pix 模型，需添加 `which_direction=BtoA`。\n\n## 模型\n使用以下脚本下载预训练模型。下载完成后，需重命名模型（例如将 `facades_label2image` 重命名为 `\u002Fcheckpoints\u002Ffacades\u002Flatest_net_G.t7`）。\n```bash\nbash .\u002Fmodels\u002Fdownload_model.sh model_name\n```\n- `facades_label2image`（标签 → 建筑立面）：基于 CMP Facades 数据集训练\n- `cityscapes_label2image`（标签 → 街景）：基于 Cityscapes 数据集训练\n- `cityscapes_image2label`（街景 → 标签）：基于 Cityscapes 数据集训练\n- `edges2shoes`（边缘 → 照片）：基于 UT Zappos50K 数据集训练\n- `edges2handbags`（边缘 → 
照片）：基于亚马逊手袋图片训练\n- `day2night`（白天场景 → 夜间场景）：基于约 100 个 [网络摄像头](http:\u002F\u002Ftransattr.cs.brown.edu\u002F) 训练\n\n## 准备训练和测试数据\n### 生成配对数据\n我们提供了一个 Python 脚本，用于生成成对的图像数据 {A,B}，其中 A 和 B 是同一场景的不同表现形式。例如，这些可能是 {标签图，照片} 或 {黑白图像，彩色图像} 的配对。然后我们可以学习将 A 转换为 B，或 B 转换为 A：\n\n创建文件夹 `\u002Fpath\u002Fto\u002Fdata`，并在其中建立子文件夹 `A` 和 `B`。`A` 和 `B` 应分别包含 `train`、`val`、`test` 等子文件夹。在 `\u002Fpath\u002Fto\u002Fdata\u002FA\u002Ftrain` 中放入 A 风格的训练图片，在 `\u002Fpath\u002Fto\u002Fdata\u002FB\u002Ftrain` 中放入对应的 B 风格图片。对其他数据划分（`val`、`test` 等）重复此操作。\n\n配对中的两张图片 A 和 B 必须具有相同的尺寸和文件名，例如 `\u002Fpath\u002Fto\u002Fdata\u002FA\u002Ftrain\u002F1.jpg` 被认为与 `\u002Fpath\u002Fto\u002Fdata\u002FB\u002Ftrain\u002F1.jpg` 对应。\n\n数据按上述方式整理好后，运行：\n```bash\npython scripts\u002Fcombine_A_and_B.py --fold_A \u002Fpath\u002Fto\u002Fdata\u002FA --fold_B \u002Fpath\u002Fto\u002Fdata\u002FB --fold_AB \u002Fpath\u002Fto\u002Fdata\n```\n\n这会将每对图像 (A,B) 合并为一个单独的图像文件，以便进行训练。\n\n### 关于色彩化的一些说明\n进行色彩化时无需运行 `combine_A_and_B.py`。相反，您需要准备一些自然图像，并在脚本中设置 `preprocess=colorization`。程序会自动将每张 RGB 图像转换为 Lab 色彩空间，并在训练过程中创建 `L → ab` 的图像对。同时请设置 `input_nc=1` 和 `output_nc=2`。\n\n### 提取边缘\n我们提供了 Python 和 Matlab 脚本，用于从照片中提取粗略的边缘。运行 `scripts\u002Fedges\u002Fbatch_hed.py` 来计算 [HED](https:\u002F\u002Fgithub.com\u002Fs9xie\u002Fhed) 边缘。运行 `scripts\u002Fedges\u002FPostprocessHED.m` 以通过额外的后处理步骤简化边缘。更多详情请参阅代码文档。\n\n### 在 Cityscapes 数据集上评估 Labels2Photos 任务\n我们提供了在 Cityscapes **验证** 集上运行 Labels2Photos 任务评估的脚本。我们假设您已经在系统中安装了 `caffe`（以及 `pycaffe`）。如果没有，请参阅 [官方网站](http:\u002F\u002Fcaffe.berkeleyvision.org\u002Finstallation.html) 获取安装说明。成功安装 `caffe` 后，通过运行以下命令下载预训练的 FCN-8s 语义分割模型（512MB）：\n```bash\nbash .\u002Fscripts\u002Feval_cityscapes\u002Fdownload_fcn8s.sh\n```\n然后确保 `.\u002Fscripts\u002Feval_cityscapes\u002F` 在您的系统 Python 路径中。如果不在，请运行以下命令将其添加：\n```bash\nexport PYTHONPATH=${PYTHONPATH}:.\u002Fscripts\u002Feval_cityscapes\u002F\n```\n现在您可以运行以下命令来评估您的预测结果：\n```bash\npython .\u002Fscripts\u002Feval_cityscapes\u002Fevaluate.py --cityscapes_dir 
\u002Fpath\u002Fto\u002Foriginal\u002Fcityscapes\u002Fdataset\u002F --result_dir \u002Fpath\u002Fto\u002Fyour\u002Fpredictions\u002F --output_dir \u002Fpath\u002Fto\u002Foutput\u002Fdirectory\u002F\n```\n存储在 `--result_dir` 下的图像应包含您模型在 Cityscapes **验证** 分割上的预测结果，并且遵循原始 Cityscapes 的命名规范（例如：`frankfurt_000001_038418_leftImg8bit.png`）。该脚本将在 `--output_dir` 下输出一个包含评估指标的文本文件。\n\n**进一步说明**：我们的预训练 FCN 模型 **不** 应用于原始分辨率（1024×2048）的 Cityscapes 数据集，因为它是在 256×256 的图像上训练的，这些图像在训练过程中被上采样到 1024×2048。训练时进行尺寸调整的目的在于：1) 保持标签图的原始高分辨率不变；2) 避免为 Cityscapes 数据集修改标准的 FCN 训练代码和网络架构。在测试时，您需要生成 256×256 的结果。我们的测试代码会自动将您的结果上采样到 1024×2048，然后再输入到预训练的 FCN 模型中。输出结果为 1024×2048 分辨率，并与 1024×2048 的真实标签进行比较。您无需对真实标签进行任何尺寸调整。验证一切是否正确的一个好方法是先复现论文中真实图像的评估结果。为此，您需要将原始\u002F真实的 Cityscapes 图像（**不是**标签）调整为 256×256，然后将其输入到评估代码中。\n\n## 显示界面\n可选地，在训练和测试过程中显示图像时，可以使用 [display 包](https:\u002F\u002Fgithub.com\u002Fszym\u002Fdisplay)。\n\n- 安装方法：`luarocks install https:\u002F\u002Fraw.githubusercontent.com\u002Fszym\u002Fdisplay\u002Fmaster\u002Fdisplay-scm-0.rockspec`\n- 启动服务器：`th -ldisplay.start`\n- 在浏览器中打开此网址：[http:\u002F\u002Flocalhost:8000](http:\u002F\u002Flocalhost:8000)\n\n默认情况下，服务器仅监听本地回环地址。若要允许外部连接到任意网络接口，可传递 `0.0.0.0` 参数：\n```bash\nth -ldisplay.start 8000 0.0.0.0\n```\n然后在浏览器中打开 `http:\u002F\u002F(hostname):(port)\u002F` 即可加载远程桌面。\n\n默认情况下，L1 误差会被绘制到显示界面上。您可以设置环境变量 `display_plot` 为逗号分隔的值列表 `errL1`, `errG` 和 `errD`，分别可视化 L1 误差、生成器误差和判别器误差。例如，如果您只想在显示界面上绘制生成器和判别器的误差，而不是默认的 L1 误差，可以设置 `display_plot=\"errG,errD\"`。\n\n## 引用\n如果您在研究中使用了本代码，请引用我们的论文 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F1611.07004v1.pdf\">基于条件对抗网络的图像到图像转换\u003C\u002Fa>：\n\n```\n@article{pix2pix2017,\n  title={Image-to-Image Translation with Conditional Adversarial Networks},\n  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},\n  journal={CVPR},\n  year={2017}\n}\n```\n\n## 猫论文合集\n如果您喜欢猫，并且热爱阅读关于图形学、视觉和机器学习的优秀论文，请查看猫论文合集：\n[[Github]](https:\u002F\u002Fgithub.com\u002Fjunyanz\u002FCatPapers) 
[[网页]](https:\u002F\u002Fwww.cs.cmu.edu\u002F~junyanz\u002Fcat\u002Fcat_papers.html)\n\n## 致谢\n本代码大量借鉴了 [DCGAN](https:\u002F\u002Fgithub.com\u002Fsoumith\u002Fdcgan.torch)。数据加载器则基于 [DCGAN](https:\u002F\u002Fgithub.com\u002Fsoumith\u002Fdcgan.torch) 和 [Context-Encoder](https:\u002F\u002Fgithub.com\u002Fpathak22\u002Fcontext-encoder) 进行了修改。","# pix2pix 快速上手指南\n\npix2pix 是一个基于条件生成对抗网络（cGAN）的图像到图像翻译工具，可用于将输入图像映射为输出图像（例如：素描转照片、标签图转街景、白天转黑夜等）。\n\n> **注意**：官方推荐使用 [PyTorch 版本](https:\u002F\u002Fgithub.com\u002Fjunyanz\u002Fpytorch-CycleGAN-and-pix2pix)，该版本维护更活跃且效果相当或更好。本指南基于原始的 Torch (Lua) 实现。\n\n## 1. 环境准备\n\n### 系统要求\n- **操作系统**：Linux 或 macOS\n- **硬件**：NVIDIA GPU + CUDA + CuDNN\n  - *注：CPU 模式也可运行（需设置参数），但未充分测试；无 CuDNN 的 CUDA 环境可能需要少量修改。*\n\n### 前置依赖\n- **Torch 框架**：需安装 Lua 版本的 Torch。\n- **LuaRocks 包管理器**：用于安装额外依赖。\n\n## 2. 安装步骤\n\n### 第一步：安装 Torch 核心\n访问 [Torch 官网](http:\u002F\u002Ftorch.ch\u002F) 或 GitHub 仓库安装基础环境：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftorch\u002Fdistro.git ~\u002Ftorch --recursive\ncd ~\u002Ftorch; bash install.sh  # 根据提示操作，完成后执行 source ~\u002F.bashrc 或 source ~\u002F.zshrc\n```\n\n### 第二步：安装必要包\n安装 `nngraph` 和 `display` 模块：\n```bash\nluarocks install nngraph\nluarocks install https:\u002F\u002Fraw.githubusercontent.com\u002Fszym\u002Fdisplay\u002Fmaster\u002Fdisplay-scm-0.rockspec\n```\n\n### 第三步：克隆项目代码\n```bash\ngit clone git@github.com:phillipi\u002Fpix2pix.git\ncd pix2pix\n```\n\n### 第四步：下载数据集\n使用内置脚本下载示例数据集（以建筑立面数据集 `facades` 为例）：\n```bash\nbash .\u002Fdatasets\u002Fdownload_dataset.sh facades\n```\n*国内用户若下载缓慢，可手动从 [CMP Facades 数据集](http:\u002F\u002Fcmp.felk.cvut.cz\u002F~tylecr1\u002Ffacade\u002F) 下载后放入 `datasets` 目录。*\n\n## 3. 
基本使用\n\n以下流程演示如何训练一个将“标签图”转换为“建筑照片”的模型（方向：BtoA）。\n\n### 训练模型\n运行以下命令开始训练（默认使用 GPU）：\n```bash\nDATA_ROOT=.\u002Fdatasets\u002Ffacades name=facades_generation which_direction=BtoA th train.lua\n```\n\n**可选：仅使用 CPU 训练**\n若无可用 GPU，添加环境变量强制使用 CPU：\n```bash\nDATA_ROOT=.\u002Fdatasets\u002Ffacades name=facades_generation which_direction=BtoA gpu=0 cudnn=0 batchSize=10 save_epoch_freq=5 th train.lua\n```\n\n**可选：可视化训练过程**\n在新终端窗口启动显示服务器，浏览器访问 `http:\u002F\u002Flocalhost:8000` 查看实时生成结果：\n```bash\nth -ldisplay.start 8000 0.0.0.0\n```\n\n### 测试模型\n训练完成后，使用以下命令在验证集上测试模型：\n```bash\nDATA_ROOT=.\u002Fdatasets\u002Ffacades name=facades_generation which_direction=BtoA phase=val th test.lua\n```\n\n测试结果将保存为 HTML 文件，路径如下：\n`.\u002Fresults\u002Ffacades_generation\u002Flatest_net_G_val\u002Findex.html`\n直接在浏览器打开该文件即可查看输入与生成的对比图。\n\n---\n**自定义数据提示**：\n若要使用自己的数据集，请准备成对图像（A 和 B 风格），分别放入 `data\u002FA` 和 `data\u002FB` 文件夹，确保文件名一致且尺寸相同，然后运行：\n```bash\npython scripts\u002Fcombine_A_and_B.py --fold_A \u002Fpath\u002Fto\u002Fdata\u002FA --fold_B \u002Fpath\u002Fto\u002Fdata\u002FB --fold_AB \u002Fpath\u002Fto\u002Fdata\n```\n之后将 `DATA_ROOT` 指向合并后的数据目录即可。","某城市规划院的设计师需要将大量手绘的建筑立面草图快速转化为逼真的实景效果图，以向客户展示改造方案。\n\n### 没有 pix2pix 时\n- **人工绘制耗时极长**：设计师需手动为每张草图上色、添加光影和材质纹理，单张图纸处理往往需要数小时甚至数天。\n- **风格难以统一**：不同设计师或同一设计师在不同时间绘制的效果图，在光照角度、色彩饱和度和细节表现上存在明显差异，导致方案集显得杂乱。\n- **修改成本高昂**：一旦客户提出调整建筑窗户样式或墙面材质，几乎需要重新绘制整张效果图，无法快速响应反馈。\n- **依赖高端渲染农场**：若使用传统 3D 建模渲染流程，需要构建精细模型并消耗大量算力资源，小团队难以承担硬件成本。\n\n### 使用 pix2pix 后\n- **秒级自动转化**：只需训练一次模型（如使用 400 张立面图数据训练约 2 小时），即可将新的手绘草图在几秒钟内自动转换为高保真实景图。\n- **输出风格高度一致**：模型学习到的映射关系确保了所有生成图像拥有统一的光照逻辑和材质质感，大幅提升方案书的专业度。\n- **即时迭代修改**：设计师仅需修改输入端的草图线条（如改变窗户位置），pix2pix 便能立即生成对应的更新版效果图，实现“所画即所得”。\n- **降低硬件门槛**：基于条件对抗生成网络，该工具在单块消费级显卡上即可高效运行，无需昂贵的渲染集群支持。\n\npix2pix 通过将繁琐的图像翻译过程自动化，让建筑师能从重复的绘图工作中解放出来，专注于创意设计与方案优化。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fphillipi_pix2pix_230f170a.jpg","phillipi","Phillip 
Isola","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fphillipi_e2a6f2ed.png",null,"phillip.isola@gmail.com","http:\u002F\u002Fweb.mit.edu\u002Fphillipi\u002F","https:\u002F\u002Fgithub.com\u002Fphillipi",[81,85,89,93,97],{"name":82,"color":83,"percentage":84},"Lua","#000080",72.6,{"name":86,"color":87,"percentage":88},"Python","#3572A5",17.9,{"name":90,"color":91,"percentage":92},"MATLAB","#e16737",6.3,{"name":94,"color":95,"percentage":96},"TeX","#3D6117",2.2,{"name":98,"color":99,"percentage":100},"Shell","#89e051",1,10624,1736,"2026-04-16T06:15:39","NOASSERTION",4,"Linux, macOS","需要 NVIDIA GPU + CUDA CuDNN（官方未测试纯 CPU 模式或无 CuDNN 的 CUDA 模式，虽提及可微调运行但不保证）；具体显存大小和 CUDA 版本未说明，但示例提到在 Pascal Titan X GPU 上训练。","未说明",{"notes":110,"python":111,"dependencies":112},"1. 该工具主要基于旧的 Lua Torch 框架，而非现代 Python PyTorch（README 强烈建议用户转向其 PyTorch 版本以获得更好支持和效果）。2. 需安装 LuaRocks 来管理 Lua 依赖包。3. 若需运行 Cityscapes 数据集的评估脚本，必须额外安装 Caffe 和 pycaffe。4. 数据准备阶段可能需要 Python 脚本处理图像配对或边缘提取（需 HED 边缘检测器）。5. 官方示例显示在单张 Pascal Titan X GPU 上训练小规模数据集约需 2 小时。","未说明 (主要基于 Lua\u002FTorch 框架，仅部分数据预处理脚本使用 Python)",[113,114,115,116,117,118,119],"Torch (Lua 版本)","nngraph","display (szym\u002Fdisplay)","CUDA","CuDNN","Caffe (仅用于 Cityscapes 评估脚本)","pycaffe",[14,35,15],[122,123,124,64,125,126,127,128,129,130],"computer-vision","computer-graphics","gan","dcgan","generative-adversarial-network","deep-learning","image-generation","image-manipulation","image-to-image-translation","2026-03-27T02:49:30.150509","2026-04-17T09:54:13.560452",[134,139,144,149,154,158],{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},36541,"在评估 Cityscapes 数据集时，为什么 FCN 模型输出的分割结果全为 0 或效果很差？","这通常是因为输入图像的尺寸问题。FCN 模型期望输入是特定尺寸（如 256x256）的图像。如果生成的图像尺寸不同（例如 256x512），直接在生成后调整大小再输入 FCN 可能会导致错误的结果。建议在生成阶段就使用正确的尺寸（如 256x256），或者确保预处理步骤（包括缩放和归一化）与训练该 FCN 模型时的设置完全一致。此外，PyTorch 模型与原始 Torch 
模型之间可能存在细微差异，导致分数略有不同。","https:\u002F\u002Fgithub.com\u002Fphillipi\u002Fpix2pix\u002Fissues\u002F148",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},36542,"训练 pix2pix 时遇到 'luajit: not enough memory' (内存不足) 错误怎么办？","可以通过减少批处理大小（batchSize）来缓解内存不足的问题。例如，将 batchSize 从默认值降低到 10 或更小。另外，如果 GPU 显存不足，可以尝试在 CPU 上运行训练（设置 gpu=0），虽然速度会慢很多，但可以解决内存溢出问题。命令示例：`DATA_ROOT=.\u002Fdatasets\u002Fxxx name=xxx which_direction=AtoB gpu=0 cudnn=0 batchSize=10 th train.lua`。","https:\u002F\u002Fgithub.com\u002Fphillipi\u002Fpix2pix\u002Fissues\u002F48",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},36543,"如何使用预训练的 Caffe FCN 模型评估 Cityscapes 数据集的真实图像（Ground Truth）？","直接使用原始高分辨率（如 1024x2048）的 Cityscapes 图像进行评估会导致非常低的分数（例如 mIoU 约为 0.05）。这是因为预训练的 FCN 模型是在 256x256 分辨率下训练或评估的。为了获得合理的评估结果（接近论文中的基准，如 mIoU ~0.21），必须先将输入图像（无论是真实图像还是生成图像）调整为 256x256 分辨率，然后再输入到 FCN 模型中进行评估。","https:\u002F\u002Fgithub.com\u002Fphillipi\u002Fpix2pix\u002Fissues\u002F116",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},36544,"在 Cityscapes 任务中，如何正确评估从照片到标签（photo->label）的生成结果？","评估 photo->label 任务时，需要确保输入给评估脚本的图像格式正确。常见错误包括直接对灰度级的标签图进行评估而未进行适当的编码转换，或者图像尺寸不匹配。如果遇到所有参数为零或报错的情况，请检查是否使用了正确的验证集图像（如 500 张 val 图像），并确认图像预处理流程（包括尺寸调整为 256x256）与评估脚本的要求一致。","https:\u002F\u002Fgithub.com\u002Fphillipi\u002Fpix2pix\u002Fissues\u002F112",{"id":155,"question_zh":156,"answer_zh":157,"source_url":138},36545,"为什么我在不同图像尺寸（如 128x256 vs 256x512）下评估得到的 FCN 分数差异巨大甚至高于真实值？","这是因为 FCN 评分对输入图像的分辨率非常敏感。实验表明，只有在生成时直接使用 256x256 尺寸的图像，或者在输入 FCN 之前严格将其重采样到 256x256，才能得到可比的分数。如果在生成时使用较大尺寸（如 256x512）然后强行缩放，可能会引入伪影或改变特征分布，导致评分异常（有时甚至虚高）。建议统一在 256x256 分辨率下进行生成和评估以保证结果的可复现性。",{"id":159,"question_zh":160,"answer_zh":161,"source_url":138},36546,"PyTorch 版本的 pix2pix 复现结果与原始论文（Torch 版本）有差异是正常的吗？","是的，这是正常现象。论文的原始结果是基于 Torch 框架训练的模型得出的。由于框架差异、随机种子、底层库实现细节等不同，PyTorch 版本复现的结果可能会有轻微差别，有时更好，有时稍差。只要数量级一致且趋势符合预期，通常不需要过度担心。",[]]