[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Fantasy-Studio--Paint-by-Example":3,"tool-Fantasy-Studio--Paint-by-Example":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":76,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":76,"owner_url":78,"languages":79,"stars":88,"forks":89,"last_commit_at":90,"license":91,"difficulty_score":10,"env_os":92,"env_gpu":93,"env_ram":92,"env_deps":94,"category_tags":101,"github_topics":102,"view_count":23,"oss_zip_url":76,"oss_zip_packed_at":76,"status":16,"created_at":112,"updated_at":113,"faqs":114,"releases":150},3536,"Fantasy-Studio\u002FPaint-by-Example","Paint-by-Example","Paint by Example: Exemplar-based Image Editing with Diffusion Models","Paint-by-Example 是一款基于扩散模型的图像编辑工具，它突破了传统“文字描述生成”的限制，实现了更精准的“以图修图”。用户只需提供一张参考图片（例如想要替换的物体样式）和一张待编辑的原图，Paint-by-Example 就能将参考图中的物体自然地融合到原图的指定位置，同时完美保持光影、透视和整体风格的一致性。\n\n该工具主要解决了现有 AI 编辑中难以精确控制物体外观细节的痛点。以往通过文字提示词往往难以准确描述复杂的纹理或特定造型，而 Paint-by-Example 通过自监督训练和信息瓶颈技术，有效避免了简单的“复制粘贴”造成的生硬拼接痕迹，确保编辑结果既逼真又自然。其核心亮点在于无需迭代优化，仅需单次前向推理即可完成高质量编辑，并支持任意形状的掩膜控制。\n\n这款工具非常适合设计师、数字艺术家以及需要精细图像合成的研究人员使用。对于希望将特定素材无缝融入场景的专业人士，或者探索基于样本图像生成技术的开发者来说，Paint-by-Example 提供了一个高效且可控的解决方案。目前项目已开源代码并提供在线演示，方便各类用户快速体验与二次开发。","# Paint by Example: Exemplar-based Image Editing with Diffusion 
Models\n![Teaser](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FFantasy-Studio_Paint-by-Example_readme_fa467ca5469a.png)\n### [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.13227) | [Huggingface Demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FFantasy-Studio\u002FPaint-by-Example) \n\u003C!-- \u003Cbr> -->\n[Binxin Yang](https:\u002F\u002Forcid.org\u002F0000-0003-4110-1986), [Shuyang Gu](http:\u002F\u002Fhome.ustc.edu.cn\u002F~gsy777\u002F), [Bo Zhang](https:\u002F\u002Fbo-zhang.me\u002F), [Ting Zhang](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fpeople\u002Ftinzhan\u002F), [Xuejin Chen](http:\u002F\u002Fstaff.ustc.edu.cn\u002F~xjchen99\u002F), [Xiaoyan Sun](http:\u002F\u002Fstaff.ustc.edu.cn\u002F~xysun720\u002F), [Dong Chen](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fpeople\u002Fdoch\u002F) and [Fang Wen](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fpeople\u002Ffangwen\u002F).\n\u003C!-- \u003Cbr> -->\n\n## Abstract\n>Language-guided image editing has achieved great success recently. In this paper, for the first time, we investigate exemplar-guided image editing for more precise control. We achieve this goal by leveraging self-supervised training to disentangle and re-organize the source image and the exemplar. However, the naive approach will cause obvious fusing artifacts. We carefully analyze it and propose an information bottleneck and strong augmentations to avoid the trivial solution of directly copying and pasting the exemplar image. Meanwhile, to ensure the controllability of the editing process, we design an arbitrary shape mask for the exemplar image and leverage the classifier-free guidance to increase the similarity to the exemplar image. The whole framework involves a single forward of the diffusion model without any iterative optimization. 
We demonstrate that our method achieves an impressive performance and enables controllable editing on in-the-wild images with high fidelity.\n>\n## News\n- *2023-11-28* The recent work Asymmetric VQGAN improves the preservation of details in non-masked regions. For comprehensive details, please refer to the associated [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.04632), [github]( https:\u002F\u002Fgithub.com\u002Fbuxiangzhiren\u002FAsymmetric_VQGAN).\n- *2023-05-13* Release code for quantitative results.\n- *2023-03-03* Release test benchmark.\n- *2023-02-23* Non-official 3rd party apps support by [ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002Fdamo\u002Fcv_stable-diffusion_paint-by-example\u002Fsummary) (the largest Model Community in Chinese).\n- *2022-12-07* Release a [Gradio](https:\u002F\u002Fgradio.app\u002F) demo on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FFantasy-Studio\u002FPaint-by-Example) Spaces.\n- *2022-11-29* Upload code.\n\n\n## Requirements\nA suitable [conda](https:\u002F\u002Fconda.io\u002F) environment named `Paint-by-Example` can be created\nand activated with:\n\n```\nconda env create -f environment.yaml\nconda activate Paint-by-Example\n```\n\n## Pretrained Model\nWe provide the checkpoint ([Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F15QzaTWsvZonJcXsNv-ilMRCYaQLhzR_i\u002Fview?usp=share_link) | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FFantasy-Studio\u002FPaint-by-Example\u002Fresolve\u002Fmain\u002Fmodel.ckpt)) that is trained on [Open-Images](https:\u002F\u002Fstorage.googleapis.com\u002Fopenimages\u002Fweb\u002Findex.html) for 40 epochs. By default, we assume that the pretrained model is downloaded and saved to the directory `checkpoints`.\n\n## Testing\n\nTo sample from our model, you can use `scripts\u002Finference.py`. 
For example, \n```\npython scripts\u002Finference.py \\\n--plms --outdir results \\\n--config configs\u002Fv1.yaml \\\n--ckpt checkpoints\u002Fmodel.ckpt \\\n--image_path examples\u002Fimage\u002Fexample_1.png \\\n--mask_path examples\u002Fmask\u002Fexample_1.png \\\n--reference_path examples\u002Freference\u002Fexample_1.jpg \\\n--seed 321 \\\n--scale 5\n```\nor simply run:\n```\nsh test.sh\n```\nVisualization of inputs and output:\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FFantasy-Studio_Paint-by-Example_readme_98896925c6b0.png)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FFantasy-Studio_Paint-by-Example_readme_61c0a1cfe0c4.png)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FFantasy-Studio_Paint-by-Example_readme_0296208edaf0.png)\n\n## Training\n\n### Data preparing\n- Download separate packed files of Open-Images dataset from [CVDF's site](https:\u002F\u002Fgithub.com\u002Fcvdfoundation\u002Fopen-images-dataset#download-images-with-bounding-boxes-annotations) and unzip them to the directory `dataset\u002Fopen-images\u002Fimages`.\n- Download bbox annotations of Open-Images dataset from [Open-Images official site](https:\u002F\u002Fstorage.googleapis.com\u002Fopenimages\u002Fweb\u002Fdownload_v7.html#download-manually) and save them to the directory `dataset\u002Fopen-images\u002Fannotations`.\n- Generate bbox annotations of each image in txt format.\n    ```\n    python scripts\u002Fread_bbox.py\n    ```\n\nThe data structure is like this:\n```\ndataset\n├── open-images\n│  ├── annotations\n│  │  ├── class-descriptions-boxable.csv\n│  │  ├── oidv6-train-annotations-bbox.csv\n│  │  ├── test-annotations-bbox.csv\n│  │  ├── validation-annotations-bbox.csv\n│  ├── images\n│  │  ├── train_0\n│  │  │  ├── xxx.jpg\n│  │  │  ├── ...\n│  │  ├── train_1\n│  │  ├── ...\n│  │  ├── validation\n│  │  ├── test\n│  ├── bbox\n│  │  ├── train_0\n│  │  │  ├── xxx.txt\n│  │  │  ├── ...\n│  │  ├── train_1\n│  │  ├── ...\n│  │  ├── 
validation\n│  │  ├── test\n```\n\n### Download the pretrained model of Stable Diffusion\nWe utilize the pretrained Stable Diffusion v1-4 as initialization, please download the pretrained models from [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FCompVis\u002Fstable-diffusion-v-1-4-original) and save the model to directory `pretrained_models`. Then run the following script to add zero-initialized weights for 5 additional input channels of the UNet (4 for the encoded masked-image and 1 for the mask itself).\n```\npython scripts\u002Fmodify_checkpoints.py\n```\n\n### Training Paint by Example\nTo train a new model on Open-Images, you can use `main.py`. For example,\n```\npython -u main.py \\\n--logdir models\u002FPaint-by-Example \\\n--pretrained_model pretrained_models\u002Fsd-v1-4-modified-9channel.ckpt \\\n--base configs\u002Fv1.yaml \\\n--scale_lr False\n```\nor simply run:\n```\nsh train.sh\n```\n\n## Test Benchmark\nWe build a test benchmark for quantitative analysis. Specifically, we manually select 3500 source images from MSCOCO validation set, each image contains only one bounding box. Then we manually retrieve a reference image patch from MSCOCO training set. The reference image usually shares a similar semantic with mask region to ensure the combination is reasonable. We named it as COCO Exemplar-based image Editing benchmark, abbreviated as COCOEE. This test benchmark can be downloaded from [Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F18wO_wSFF-GPNxWmO1bt6LdjubXcttqtO\u002Fview?usp=share_link).\n\n## Quantitative Results\nBy default, we assume that the COCOEE is downloaded and saved to the directory `test_bench`. To generate the results of test bench, you can use `scripts\u002Finference_test_bench.py`. 
For example, \n```\npython scripts\u002Finference_test_bench.py \\\n--plms \\\n--outdir results\u002Ftest_bench \\\n--config configs\u002Fv1.yaml \\\n--ckpt checkpoints\u002Fmodel.ckpt \\\n--scale 5\n```\nor simply run:\n```\nbash inference_test_bench.sh\n```\n### FID Score\nBy default, we assume that the test set of COCO2017 is downloaded and saved to the directory `dataset`.\nThe data structure is like this:\n```\ndataset\n├── coco\n│  ├── test2017\n│  │  ├── xxx.jpg\n│  │  ├── xxx.jpg\n│  │  ├── ...\n│  │  ├── xxx.jpg\n```\nThen convert the images into square images with 512 resolution.\n  ```\n  python scripts\u002Fcreate_square_gt_for_fid.py\n  ```\nTo calculate FID score, simply run:\n```\npython eval_tool\u002Ffid\u002Ffid_score.py --device cuda \\\ntest_bench\u002Ftest_set_GT \\\nresults\u002Ftest_bench\u002Fresults\n```\n### QS Score\nPlease download the model weights for QS score from [Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Ce2cSQ8UttxcEk03cjfJgaBwdhSPyuHI\u002Fview?usp=share_link) and save the model to directory `eval_tool\u002Fgmm`.\nTo calculate QS score, simply run:\n```\npython eval_tool\u002Fgmm\u002Fgmm_score_coco.py results\u002Ftest_bench\u002Fresults \\\n--gmm_path eval_tool\u002Fgmm\u002Fcoco2017_gmm_k20 \\\n--gpu 1\n```\n\n### CLIP Score\nTo calculate CLIP score, simply run:\n```\npython eval_tool\u002Fclip_score\u002Fregion_clip_score.py \\\n--result_dir results\u002Ftest_bench\u002Fresults\n```\n\n\n## Citing Paint by Example\n\n```\n@article{yang2022paint,\n  title={Paint by Example: Exemplar-based Image Editing with Diffusion Models},\n  author={Binxin Yang and Shuyang Gu and Bo Zhang and Ting Zhang and Xuejin Chen and Xiaoyan Sun and Dong Chen and Fang Wen},\n  journal={arXiv preprint arXiv:2211.13227},\n  year={2022}\n}\n```\n\n## Acknowledgements\n\nThis code borrows heavily from [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion). 
We also thank the contributors of [OpenAI's ADM codebase](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fguided-diffusion) and [https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fdenoising-diffusion-pytorch](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fdenoising-diffusion-pytorch).\n\n## Maintenance\n\nPlease open a GitHub issue for any help. If you have any questions regarding the technical details, feel free to contact us.\n\n## License\nThe codes and the pretrained model in this repository are under the CreativeML OpenRAIL M license as specified by the LICENSE file.\n\nThe test benchmark, COCOEE, belongs to the COCO Consortium and are licensed under a Creative Commons Attribution 4.0 License.\n","# 以示例作画：基于扩散模型的范例驱动图像编辑\n![预告图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FFantasy-Studio_Paint-by-Example_readme_fa467ca5469a.png)\n### [论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.13227) | [Hugging Face 演示](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FFantasy-Studio\u002FPaint-by-Example) \n\u003C!-- \u003Cbr> -->\n[Binxin Yang](https:\u002F\u002Forcid.org\u002F0000-0003-4110-1986), [Shuyang Gu](http:\u002F\u002Fhome.ustc.edu.cn\u002F~gsy777\u002F), [Bo Zhang](https:\u002F\u002Fbo-zhang.me\u002F), [Ting Zhang](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fpeople\u002Ftinzhan\u002F), [Xuejin Chen](http:\u002F\u002Fstaff.ustc.edu.cn\u002F~xjchen99\u002F), [Xiaoyan Sun](http:\u002F\u002Fstaff.ustc.edu.cn\u002F~xysun720\u002F), [Dong Chen](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fpeople\u002Fdoch\u002F) 和 [Fang Wen](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fpeople\u002Ffangwen\u002F)。\n\u003C!-- \u003Cbr> -->\n\n## 摘要\n> 
近年来，语言引导的图像编辑取得了巨大成功。在本文中，我们首次探索了范例引导的图像编辑，以实现更精确的控制。我们通过自监督训练来解耦并重新组织源图像和范例图像，从而达到这一目标。然而，简单的方法会导致明显的融合伪影。我们对此进行了仔细分析，并提出了信息瓶颈和强数据增强策略，以避免直接复制粘贴范例图像这种平凡解。同时，为了确保编辑过程的可控性，我们为范例图像设计了一个任意形状的掩码，并利用无分类器指导来提高与范例图像的相似度。整个框架只需一次扩散模型的前向传播，无需任何迭代优化。我们证明，我们的方法表现优异，能够在真实场景图像上实现高保真度的可控编辑。\n>\n## 新闻\n- *2023-11-28* 最新工作 Asymmetric VQGAN 改进了非掩码区域细节的保留效果。更多详细信息请参阅相关 [论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.04632) 和 [GitHub]( https:\u002F\u002Fgithub.com\u002Fbuxiangzhiren\u002FAsymmetric_VQGAN)。\n- *2023-05-13* 发布定量结果代码。\n- *2023-03-03* 发布测试基准。\n- *2023-02-23* 非官方第三方应用支持由 [ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002Fdamo\u002Fcv_stable-diffusion_paint-by-example\u002Fsummary)（中国最大的模型社区）提供。\n- *2022-12-07* 在 [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FFantasy-Studio\u002FPaint-by-Example) Spaces 上发布了一个 [Gradio](https:\u002F\u002Fgradio.app\u002F) 演示。\n- *2022-11-29* 上传代码。\n\n\n## 环境要求\n可以创建并激活一个名为 `Paint-by-Example` 的合适 [conda](https:\u002F\u002Fconda.io\u002F) 环境，命令如下：\n\n```\nconda env create -f environment.yaml\nconda activate Paint-by-Example\n```\n\n## 预训练模型\n我们提供了在 [Open-Images](https:\u002F\u002Fstorage.googleapis.com\u002Fopenimages\u002Fweb\u002Findex.html) 数据集上训练了 40 个 epoch 的检查点（[Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F15QzaTWsvZonJcXsNv-ilMRCYaQLhzR_i\u002Fview?usp=share_link) | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FFantasy-Studio\u002FPaint-by-Example\u002Fresolve\u002Fmain\u002Fmodel.ckpt)）。默认情况下，我们假设预训练模型已下载并保存到 `checkpoints` 目录。\n\n## 测试\n\n要从我们的模型中采样，可以使用 `scripts\u002Finference.py`。例如：\n```\npython scripts\u002Finference.py \\\n--plms --outdir results \\\n--config configs\u002Fv1.yaml \\\n--ckpt checkpoints\u002Fmodel.ckpt \\\n--image_path examples\u002Fimage\u002Fexample_1.png \\\n--mask_path examples\u002Fmask\u002Fexample_1.png \\\n--reference_path examples\u002Freference\u002Fexample_1.jpg \\\n--seed 321 \\\n--scale 5\n```\n或者直接运行：\n```\nsh 
test.sh\n```\n\n输入和输出的可视化效果如下：\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FFantasy-Studio_Paint-by-Example_readme_98896925c6b0.png)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FFantasy-Studio_Paint-by-Example_readme_61c0a1cfe0c4.png)\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FFantasy-Studio_Paint-by-Example_readme_0296208edaf0.png)\n\n## 训练\n\n### 数据准备\n- 从 [CVDF 官网](https:\u002F\u002Fgithub.com\u002Fcvdfoundation\u002Fopen-images-dataset#download-images-with-bounding-boxes-annotations) 下载 Open-Images 数据集的独立压缩包，并解压到 `dataset\u002Fopen-images\u002Fimages` 目录。\n- 从 [Open-Images 官网](https:\u002F\u002Fstorage.googleapis.com\u002Fopenimages\u002Fweb\u002Fdownload_v7.html#download-manually) 下载 Open-Images 数据集的边界框标注文件，保存到 `dataset\u002Fopen-images\u002Fannotations` 目录。\n- 将每张图像的边界框标注转换为 txt 格式。\n    ```\n    python scripts\u002Fread_bbox.py\n    ```\n\n数据结构如下：\n```\ndataset\n├── open-images\n│  ├── annotations\n│  │  ├── class-descriptions-boxable.csv\n│  │  ├── oidv6-train-annotations-bbox.csv\n│  │  ├── test-annotations-bbox.csv\n│  │  ├── validation-annotations-bbox.csv\n│  ├── images\n│  │  ├── train_0\n│  │  │  ├── xxx.jpg\n│  │  │  ├── ...\n│  │  ├── train_1\n│  │  ├── ...\n│  │  ├── validation\n│  │  ├── test\n│  ├── bbox\n│  │  ├── train_0\n│  │  │  ├── xxx.txt\n│  │  │  ├── ...\n│  │  ├── train_1\n│  │  ├── ...\n│  │  ├── validation\n│  │  ├── test\n```\n\n### 下载 Stable Diffusion 预训练模型\n我们使用 Stable Diffusion v1-4 作为初始化模型，请从 [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FCompVis\u002Fstable-diffusion-v-1-4-original) 下载预训练模型并保存到 `pretrained_models` 目录。然后运行以下脚本，为 UNet 的 5 个额外输入通道添加零初始化权重（4 个用于编码的掩码图像，1 个用于掩码本身）。\n```\npython scripts\u002Fmodify_checkpoints.py\n```\n\n### 训练 Paint by Example\n要在 Open-Images 数据集上训练新模型，可以使用 `main.py`。例如：\n```\npython -u main.py \\\n--logdir models\u002FPaint-by-Example \\\n--pretrained_model pretrained_models\u002Fsd-v1-4-modified-9channel.ckpt \\\n--base configs\u002Fv1.yaml 
\\\n--scale_lr False\n```\n或者直接运行：\n```\nsh train.sh\n```\n\n## 测试基准\n我们构建了一个用于定量分析的测试基准。具体来说，我们从 MSCOCO 验证集手动选取了 3500 张源图像，每张图像仅包含一个边界框。然后，我们又从 MSCOCO 训练集手动检索了一块参考图像区域。参考图像通常与掩码区域具有相似的语义，以确保组合合理。我们将其命名为 COCO 范例驱动图像编辑基准，简称 COCOEE。该测试基准可以从 [Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F18wO_wSFF-GPNxWmO1bt6LdjubXcttqtO\u002Fview?usp=share_link) 下载。\n\n## 定量结果\n默认情况下，我们假设 COCOEE 已下载并保存到 `test_bench` 目录。要生成测试基准的结果，可以使用 `scripts\u002Finference_test_bench.py`。例如：\n```\npython scripts\u002Finference_test_bench.py \\\n--plms \\\n--outdir results\u002Ftest_bench \\\n--config configs\u002Fv1.yaml \\\n--ckpt checkpoints\u002Fmodel.ckpt \\\n--scale 5\n```\n或者直接运行：\n```\nbash inference_test_bench.sh\n```\n\n### FID 分数\n默认情况下，我们假设 COCO2017 的测试集已下载并保存到 `dataset` 目录中。数据结构如下：\n```\ndataset\n├── coco\n│  ├── test2017\n│  │  ├── xxx.jpg\n│  │  ├── xxx.jpg\n│  │  ├── ...\n│  │  ├── xxx.jpg\n```\n然后将图像转换为分辨率为 512×512 的正方形图像。\n  ```\n  python scripts\u002Fcreate_square_gt_for_fid.py\n  ```\n要计算 FID 分数，只需运行：\n```\npython eval_tool\u002Ffid\u002Ffid_score.py --device cuda \\\ntest_bench\u002Ftest_set_GT \\\nresults\u002Ftest_bench\u002Fresults\n```\n### QS 分数\n请从 [Google Drive](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Ce2cSQ8UttxcEk03cjfJgaBwdhSPyuHI\u002Fview?usp=share_link) 下载用于计算 QS 分数的模型权重，并将模型保存到 `eval_tool\u002Fgmm` 目录中。要计算 QS 分数，只需运行：\n```\npython eval_tool\u002Fgmm\u002Fgmm_score_coco.py results\u002Ftest_bench\u002Fresults \\\n--gmm_path eval_tool\u002Fgmm\u002Fcoco2017_gmm_k20 \\\n--gpu 1\n```\n\n### CLIP 分数\n要计算 CLIP 分数，只需运行：\n```\npython eval_tool\u002Fclip_score\u002Fregion_clip_score.py \\\n--result_dir results\u002Ftest_bench\u002Fresults\n```\n\n\n## 引用 Paint by Example\n\n```\n@article{yang2022paint,\n  title={Paint by Example: Exemplar-based Image Editing with Diffusion Models},\n  author={Binxin Yang and Shuyang Gu and Bo Zhang and Ting Zhang and Xuejin Chen and Xiaoyan Sun and Dong Chen and Fang Wen},\n  journal={arXiv 
preprint arXiv:2211.13227},\n  year={2022}\n}\n```\n\n## 致谢\n\n本代码大量借鉴了 [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion)。我们还感谢 [OpenAI 的 ADM 代码库](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fguided-diffusion) 和 [https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fdenoising-diffusion-pytorch](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fdenoising-diffusion-pytorch) 的贡献者。\n\n## 维护\n如需帮助，请在 GitHub 上提交问题。如果您对技术细节有任何疑问，请随时与我们联系。\n\n## 许可证\n本仓库中的代码和预训练模型均采用 CreativeML OpenRAIL M 许可证，具体参见 LICENSE 文件。\n\n测试基准 COCOEE 归 COCO 联盟所有，并根据知识共享署名 4.0 许可证授权使用。","# Paint-by-Example 快速上手指南\n\nPaint-by-Example 是一个基于扩散模型的图像编辑工具，允许用户通过提供一张参考图（Exemplar），将参考图中的物体或纹理精确地融合到目标图像的指定区域中。\n\n## 环境准备\n\n*   **系统要求**：Linux 操作系统，推荐配备 NVIDIA GPU（显存建议 8GB 以上）。\n*   **前置依赖**：已安装 [Conda](https:\u002F\u002Fconda.io\u002F) 包管理器。\n*   **网络提示**：下载模型和依赖时若遇到网络问题，建议配置国内镜像源或使用代理加速。\n\n## 安装步骤\n\n1.  **克隆代码库**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002FFantasy-Studio\u002FPaint-by-Example.git\n    cd Paint-by-Example\n    ```\n\n2.  **创建并激活 Conda 环境**\n    使用项目提供的配置文件一键安装依赖：\n    ```bash\n    conda env create -f environment.yaml\n    conda activate Paint-by-Example\n    ```\n\n3.  **下载预训练模型**\n    将训练好的模型下载到 `checkpoints` 目录。你可以选择从 Hugging Face 下载，或者使用国内镜像（如 ModelScope）加速。\n\n    *   **方式一：Hugging Face (官方)**\n        ```bash\n        mkdir -p checkpoints\n        wget -O checkpoints\u002Fmodel.ckpt https:\u002F\u002Fhuggingface.co\u002FFantasy-Studio\u002FPaint-by-Example\u002Fresolve\u002Fmain\u002Fmodel.ckpt\n        ```\n\n    *   **方式二：ModelScope (国内加速推荐)**\n        如果 Hugging Face 下载缓慢，可前往 [ModelScope 模型页](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002Fdamo\u002Fcv_stable-diffusion_paint-by-example\u002Fsummary) 下载 `model.ckpt` 并手动放入 `checkpoints` 文件夹。\n\n## 基本使用\n\n准备好三张图片即可开始推理：\n1.  **原图 (`--image_path`)**：需要被编辑的背景图。\n2.  **掩码图 (`--mask_path`)**：黑白图片，白色区域表示需要被替换\u002F填充的区域。\n3.  
**参考图 (`--reference_path`)**：提供想要融入内容的示例图片。\n\n运行以下命令生成结果（结果将保存在 `results` 文件夹）：\n\n```bash\npython scripts\u002Finference.py \\\n--plms --outdir results \\\n--config configs\u002Fv1.yaml \\\n--ckpt checkpoints\u002Fmodel.ckpt \\\n--image_path examples\u002Fimage\u002Fexample_1.png \\\n--mask_path examples\u002Fmask\u002Fexample_1.png \\\n--reference_path examples\u002Freference\u002Fexample_1.jpg \\\n--seed 321 \\\n--scale 5\n```\n\n**参数说明：**\n*   `--scale`：引导尺度，数值越大生成的图像与参考图越相似（默认 5，可尝试 3-7 之间）。\n*   `--seed`：随机种子，用于复现结果。\n\n你也可以直接运行封装好的脚本进行测试：\n```bash\nsh test.sh\n```","一位电商设计师正在为新款运动鞋制作多场景营销海报，需要将鞋子从白底图中提取并自然融合到复杂的户外背景中。\n\n### 没有 Paint-by-Example 时\n- **融合生硬**：传统的复制粘贴或简单蒙版合成，导致鞋子边缘与背景光影、色调严重不符，看起来像“贴”上去的假图。\n- **细节丢失**：在使用风格迁移工具时，鞋子的品牌 Logo、纹理材质等关键特征容易被算法模糊化或错误重绘。\n- **调整繁琐**：为了匹配背景透视和光照，设计师需手动在 Photoshop 中进行耗时的调色、绘制阴影和高光，反复修改多次仍难完美。\n- **控制力弱**：难以精确指定仅替换特定区域而保持背景其他部分（如草地、天空）完全不变，常出现背景扭曲的伪影。\n\n### 使用 Paint-by-Example 后\n- **自然融合**：只需提供鞋子参考图和背景掩码，Paint-by-Example 利用扩散模型自动生成符合背景光照和透视的鞋子，边缘过渡极其自然。\n- **特征保真**：基于示例的编辑机制确保了鞋子的 Logo、织带纹理等细节被高精度保留，不会发生形变或模糊。\n- **一键生成**：无需手动修图，单次前向推理即可完成高质量合成，将原本数小时的精修工作缩短至几秒钟。\n- **精准可控**：通过任意形状的掩码严格限定编辑区域，背景中的草木、建筑等非目标区域毫发无损，彻底杜绝了背景伪影。\n\nPaint-by-Example 通过“以图改图”的范式，解决了传统合成中光影不协和细节丢失的难题，让高保真的创意编辑变得简单可控。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FFantasy-Studio_Paint-by-Example_98896925.png","Fantasy-Studio",null,"https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FFantasy-Studio_11f3866c.png","https:\u002F\u002Fgithub.com\u002FFantasy-Studio",[80,84],{"name":81,"color":82,"percentage":83},"Python","#3572A5",99.7,{"name":85,"color":86,"percentage":87},"Shell","#89e051",0.3,1249,113,"2026-04-01T05:52:39","NOASSERTION","未说明","需要 NVIDIA GPU (运行评估脚本时指定 --device cuda)，显存需求未说明 (基于 Stable Diffusion v1-4，建议 8GB+)，CUDA 版本未说明",{"notes":95,"python":92,"dependencies":96},"1. 推荐使用 conda 创建名为 'Paint-by-Example' 的环境 (需 environment.yaml 文件，但文中未提供具体内容)。\n2. 推理和训练基于 Stable Diffusion v1-4，需预先下载并修改模型权重以支持 9 通道输入。\n3. 
训练数据需手动准备 Open-Images 数据集并按特定目录结构整理标注文件。\n4. 定量评估需额外下载 COCOEE 基准测试集及 GMM 模型权重。",[97,98,99,100],"conda (环境管理)","Stable Diffusion v1-4 (基础模型)","Open-Images Dataset (训练数据)","Gradio (演示界面)",[14,13],[103,104,105,106,107,108,109,110,111],"computer-vision","deep-learning","diffusion-models","image-editing","image-generation","image-manipulation","pytorch","stable-diffusion","paint-by-example","2026-03-27T02:49:30.150509","2026-04-06T08:08:37.138885",[115,120,125,130,135,140,145],{"id":116,"question_zh":117,"answer_zh":118,"source_url":119},16204,"安装依赖时出现 'no found page' 错误，特别是针对 taming-transformers 库怎么办？","这通常是由于网络问题（如在中国大陆访问 GitHub 受限）导致的。解决方法有两种：1. 使用 VPN；2. 手动下载 taming-transformers 仓库到本地电脑，然后复制到服务器进行本地安装。具体命令可尝试：`pip install -e git+https:\u002F\u002Fgithub.com\u002FCompVis\u002Ftaming-transformers.git@master#egg=taming-transformers`，如果失败则采用手动下载源码后运行 `pip install -e .` 的方式。","https:\u002F\u002Fgithub.com\u002FFantasy-Studio\u002FPaint-by-Example\u002Fissues\u002F5",{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},16205,"为什么生成的图像是全黑的，或者输出为零张量？","生成全黑图像通常是因为下载的预训练模型文件（model.ckpt）损坏或不完整。解决方案是重新下载预训练模型文件（约 11.7G），并确保将其正确放置在项目的 `checkpoints` 目录下。维护者已确认之前的检查点文件存在故障，重新下载后即可正常工作。","https:\u002F\u002Fgithub.com\u002FFantasy-Studio\u002FPaint-by-Example\u002Fissues\u002F7",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},16206,"当蒙版（mask）尺寸过小时，生成效果不佳或无效的原因是什么？","小蒙版生成效果差的主要原因是训练数据中未包含足够多的小尺寸蒙版样本。临时解决方案是尝试增大 CFG guidance（分类器自由引导尺度）的值来改善生成结果。从根本上解决需要在训练阶段加入更多小蒙版数据进行训练。","https:\u002F\u002Fgithub.com\u002FFantasy-Studio\u002FPaint-by-Example\u002Fissues\u002F33",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},16207,"配置文件中设置 `cond_stage_trainable` 为 `true` 是否意味着微调了 CLIP 模型？","不是。虽然参数设置为 `true`，但这并不意味着微调了 CLIP 模型本身。该设置是为了训练 `cond_stage` 中的几个 MLP 层，这些层不属于 CLIP 模型架构，而是用于解码特征并注入到扩散过程中的额外层。CLIP 
模型的权重在训练中保持冻结。","https:\u002F\u002Fgithub.com\u002FFantasy-Studio\u002FPaint-by-Example\u002Fissues\u002F9",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},16208,"代码中的 `self.learnable_vector` 是如何被优化学习的？为什么看起来像高斯噪声？","`self.learnable_vector` 是通过优化器参数列表中包含该向量来进行更新的。它看起来像高斯噪声主要有两个原因：1. 它初始化为高斯分布；2. 由于学习率较小，其值在训练过程中偏离初始值不多。维护者曾承认代码清理过程中存在相关 Bug 并已修复，实际训练中该向量的梯度较小，但确实在更新。","https:\u002F\u002Fgithub.com\u002FFantasy-Studio\u002FPaint-by-Example\u002Fissues\u002F26",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},16209,"运行推理时报错 `TypeError: __init__() got an unexpected keyword argument 'u_cond_percent'` 如何解决？","该错误通常是由于依赖库版本不匹配导致的，特别是 `diffusers` 或相关 LDM 代码的版本问题。请检查并确保安装了与该项目兼容的特定版本的依赖库。如果问题依旧，建议参考项目最新的 `requirements.txt` 重新安装环境，或查看是否有代码更新修复了该参数传递问题。","https:\u002F\u002Fgithub.com\u002FFantasy-Studio\u002FPaint-by-Example\u002Fissues\u002F6",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},16210,"COCOEE 数据集是否已经发布？","是的，维护者已经发布了 COCOEE 数据集。用户可以直接在项目仓库或相关发布页面获取该数据集资源。","https:\u002F\u002Fgithub.com\u002FFantasy-Studio\u002FPaint-by-Example\u002Fissues\u002F19",[]]