[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Shilin-LU--TF-ICON":3,"tool-Shilin-LU--TF-ICON":64},[4,18,26,35,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,2,"2026-04-06T11:32:50",[14,15,13],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[43,15,13,14],"语言模型",{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth 
Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,52],"视频",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85013,"2026-04-06T11:09:19",[15,16,52,61,13,62,43,14,63],"插件","其他","音频",{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":98,"forks":99,"last_commit_at":100,"license":101,"difficulty_score":102,"env_os":103,"env_gpu":104,"env_ram":105,"env_deps":106,"category_tags":112,"github_topics":113,"view_count":32,"oss_zip_url":82,"oss_zip_packed_at":82,"status":17,"created_at":120,"updated_at":121,"faqs":122,"releases":148},5345,"Shilin-LU\u002FTF-ICON","TF-ICON","[ICCV 2023] \"TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition\" (Official Implementation)","TF-ICON 是一款基于扩散模型的开源图像合成工具，专为解决“跨域图像引导合成”任务而设计。它的核心功能是将用户提供的物体无缝融合到全新的视觉背景中，同时保持光影、风格的高度一致。\n\n传统方法在处理此类任务时，往往需要对预训练模型进行昂贵的微调或针对每个实例进行耗时优化，这不仅计算成本高，还可能破坏模型原有的丰富先验知识。TF-ICON 巧妙地解决了这一痛点，它无需任何额外训练、微调或迭代优化，即可直接利用现成的扩散模型（如 Stable Diffusion）实现高质量的图像合成。\n\n该工具的独特技术亮点在于引入了“特殊提示词（exceptional 
prompt）”机制。这一创新能有效辅助扩散模型将真实图像精准地反演为潜在表示，为后续的自然融合奠定基础。实验表明，在 CelebA-HQ、COCO 等多个数据集上，其反演效果超越了当前最先进的方法。\n\nTF-ICON 非常适合研究人员探索无训练图像编辑范式，也适用于开发者快速集成高效的图像合成功能。对于设计师而言，它是一个强大的辅助工具，能帮助快速创作出逼真的合成素材，大幅降低技术门槛和时间成本。","# TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition (ICCV 2023)\n\n## [\u003Ca href=\"https:\u002F\u002Fshilin-lu.github.io\u002Ftf-icon.github.io\u002F\" target=\"_blank\">Project Page\u003C\u002Fa>] [\u003Ca href=\"https:\u002F\u002Fentuedu-my.sharepoint.com\u002F:b:\u002Fg\u002Fpersonal\u002Fshilin002_e_ntu_edu_sg\u002FEWRDLuFDrs5Ll0KGuMtvtbUBhBZcSw2roKCo96iCWgpMZQ?e=rEv3As\" target=\"_blank\">Poster\u003C\u002Fa>]\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-TF--ICON-green.svg?style=plastic)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.12493) [![TI2I](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fbenchmarks-TF--ICON-blue.svg?style=plastic)](https:\u002F\u002Fentuedu-my.sharepoint.com\u002F:f:\u002Fg\u002Fpersonal\u002Fshilin002_e_ntu_edu_sg\u002FEmmCgLm_3OZCssqjaGdvjMwBCIvqfjsyphjqNs7g2DFzQQ?e=JSwOHY)\n\nOfficial implementation of [TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition](https:\u002F\u002Fshilin-lu.github.io\u002Ftf-icon.github.io\u002F).\n\n> **TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition**\u003Cbr>\n\u003C!-- > [Gwanghyun Kim](https:\u002F\u002Fgwang-kim.github.io\u002F), Taesung Kwon, [Jong Chul Ye](https:\u002F\u002Fbispl.weebly.com\u002Fprofessor.html) \u003Cbr> -->\n> Shilin Lu, Yanzhu Liu, and Adams Wai-Kin Kong \u003Cbr>\n> ICCV 2023\n>\n>**Abstract**: \u003Cbr>\nText-driven diffusion models have exhibited impressive generative capabilities, enabling various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. 
This task aims to seamlessly integrate user-provided objects into a specific visual context. Current diffusion-based methods often involve costly instance-based optimization or finetuning of pretrained models on customized datasets, which can potentially undermine their rich prior. In contrast, TF-ICON can leverage off-the-shelf diffusion models to perform cross-domain image-guided composition without requiring additional training, finetuning, or optimization. Moreover, we introduce the exceptional prompt, which contains no information, to facilitate text-driven diffusion models in accurately inverting real images into latent representations, forming the basis for compositing. Our experiments show that equipping Stable Diffusion with the exceptional prompt outperforms state-of-the-art inversion methods on various datasets (CelebA-HQ, COCO, and ImageNet), and that TF-ICON surpasses prior baselines in versatile visual domains.\n\n\u003C!-- ## [\u003Ca href=\"https:\u002F\u002Fpnp-diffusion.github.io\u002F\" target=\"_blank\">Project Page\u003C\u002Fa>] [\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMichalGeyer\u002Fpnp-diffusers\" target=\"_blank\">Diffusers Implementation\u003C\u002Fa>] -->\n\n\u003C!-- [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-PnP-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.12572) [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fhysts\u002FPnP-diffusion-features) \u003Ca href=\"https:\u002F\u002Freplicate.com\u002Farielreplicate\u002Fplug_and_play_image_translation\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_7dacf1cc5d87.png\">\u003C\u002Fa> [![TI2I](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fbenchmarks-TI2I-blue)](https:\u002F\u002Fwww.dropbox.com\u002Fsh\u002F8giw0uhfekft47h\u002FAAAF1frwakVsQocKczZZSX6La?dl=0) 
-->\n\n![teaser](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_12b70e62168d.png)\n\n---\n\n\u003C\u002Fdiv>\n\n![framework](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_81ff7001b6f7.png)\n\n\u003C!-- # Updates:\n\n**19\u002F06\u002F23** 🧨 Diffusers implementation of Plug-and-Play is available [here](https:\u002F\u002Fgithub.com\u002FMichalGeyer\u002Fpnp-diffusers). -->\n\n\u003C!-- ## TODO:\n- [ ] Diffusers support and pipeline integration\n- [ ] Gradio demo\n- [ ] Release TF-ICON Test Benchmark -->\n\n\n\u003C!-- ## Usage\n\n**To plug-and-play diffusion features, please follow these steps:**\n\n1. [Setup](#setup)\n2. [Feature extraction](#feature-extraction)\n3. [Running PnP](#running-pnp)\n4. [TI2I Benchmarks](#ti2i-benchmarks) -->\n\n---\n\n\u003C\u002Fdiv>\n\n## Contents\n  - [Setup](#setup)\n    - [Option 1: Using Conda](#option-1-using-conda)\n    - [Option 2: Using Pip with Virtual Environment](#option-2-using-pip-with-virtual-environment)\n    - [Option 3: Using Pip (Global Installation)](#option-3-using-pip-global-installation)\n    - [Downloading Stable-Diffusion Weights](#downloading-stable\\-diffusion-weights)\n  - [Running TF-ICON](#running-tf\\-icon)\n    - [Data Preparation](#data-preparation)\n    - [Image Composition](#image-composition)\n  - [TF-ICON Test Benchmark](#tf\\-icon-test-benchmark)\n  - [Additional Results](#additional-results)\n    - [Sketchy Painting](#sketchy-painting)\n    - [Oil Painting](#oil-painting)\n    - [Photorealism](#photorealism)\n    - [Cartoon](#cartoon)\n  - [Acknowledgments](#acknowledgments)\n  - [Citation](#citation)\n\n\n\u003Cbr>\n\n## Setup\n\nOur codebase is built on [Stable-Diffusion](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fstablediffusion)\nand has shared dependencies and model architecture. 
23 GB of VRAM is recommended (minimum 20 GB), though the actual requirement varies with the input samples.\n\n### Option 1: Using Conda\n\n```bash\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON.git\ncd TF-ICON\n\n# Create and activate the conda environment\nconda env create -f tf_icon_env.yaml\nconda activate tf-icon\n```\n\n### Option 2: Using Pip with Virtual Environment\n\n```bash\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON.git\ncd TF-ICON\n\n# Create and activate a virtual environment\npython -m venv venv\nsource venv\u002Fbin\u002Factivate  # On Windows: venv\\Scripts\\activate\n\n# Install the package and dependencies\npip install -e .\n\n# For development dependencies\n# pip install -e \".[dev]\"\n```\n\n### Option 3: Using Pip (Global Installation)\n\n```bash\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON.git\ncd TF-ICON\n\n# Install the package and dependencies\npip install -e .\n```\n\n**Note**: For Options 2 and 3, make sure compatible CUDA drivers are installed on your system. 
For optimal performance, CUDA 11.3 is recommended.\n\n### Downloading Stable-Diffusion Weights\n\nDownload the Stable Diffusion weights from [Stability AI on Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-diffusion-2-1-base\u002Fblob\u002Fmain\u002Fv2-1_512-ema-pruned.ckpt)\n(the `v2-1_512-ema-pruned.ckpt` file) and place the file in the `.\u002Fckpt` folder.\n\nAlternatively, the following commands download the weights and place them in the correct location:\n\n```bash\n# Create the ckpt directory if it doesn't exist\nmkdir -p ckpt\n\n# Download the model weights (using wget)\nwget -O ckpt\u002Fv2-1_512-ema-pruned.ckpt https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-diffusion-2-1-base\u002Fresolve\u002Fmain\u002Fv2-1_512-ema-pruned.ckpt\n\n# Alternative: Using curl\n# curl -L https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-diffusion-2-1-base\u002Fresolve\u002Fmain\u002Fv2-1_512-ema-pruned.ckpt -o ckpt\u002Fv2-1_512-ema-pruned.ckpt\n```\n\n## Running TF-ICON\n\n### Data Preparation\n\nSeveral input samples are available under the `.\u002Finputs` directory. Each sample consists of one background (bg), one foreground (fg), one segmentation mask for the foreground (fg_mask), and one user mask that denotes the desired composition location (mask_bg_fg). The input data is structured as follows:\n```\ninputs\n├── cross_domain\n│  ├── prompt1\n│  │  ├── bgxx.png\n│  │  ├── fgxx.png\n│  │  ├── fgxx_mask.png\n│  │  ├── mask_bg_fg.png\n│  ├── prompt2\n│  ├── ...\n├── same_domain\n│  ├── prompt1\n│  │  ├── bgxx.png\n│  │  ├── fgxx.png\n│  │  ├── fgxx_mask.png\n│  │  ├── mask_bg_fg.png\n│  ├── prompt2\n│  ├── ...\n```\n\nMore samples are available in the [TF-ICON Test Benchmark](#tf\\-icon-test-benchmark), or you can prepare your own. 
Note that the resolution of the input foreground image should not be too low.\n\n- Cross domain: the background and foreground images originate from different visual domains.\n- Same domain: both the background and foreground images belong to the photorealistic domain.\n\n### Image Composition\nTo run TF-ICON in 'cross_domain' mode, use the following command:\n\n```\npython scripts\u002Fmain_tf_icon.py  --ckpt ckpt\u002Fv2-1_512-ema-pruned.ckpt      \\\n                                --root .\u002Finputs\u002Fcross_domain      \\\n                                --domain 'cross'                  \\\n                                --dpm_steps 20                    \\\n                                --dpm_order 2                     \\\n                                --scale 5                         \\\n                                --tau_a 0.4                       \\\n                                --tau_b 0.8                       \\\n                                --outdir .\u002Foutputs                \\\n                                --gpu cuda:0                      \\\n                                --seed 3407\n```\n\nFor 'same_domain' mode, use the following command:\n```\npython scripts\u002Fmain_tf_icon.py  --ckpt ckpt\u002Fv2-1_512-ema-pruned.ckpt      \\\n                                --root .\u002Finputs\u002Fsame_domain       \\\n                                --domain 'same'                   \\\n                                --dpm_steps 20                    \\\n                                --dpm_order 2                     \\\n                                --scale 2.5                       \\\n                                --tau_a 0.4                       \\\n                                --tau_b 0.8                       \\\n                                --outdir .\u002Foutputs                \\\n                                --gpu cuda:0                      \\\n                                
--seed 3407\n```\n\n- `ckpt`: The path to the Stable Diffusion checkpoint.\n- `root`: The path to your input data.\n- `domain`: Set to 'cross' if the foreground and background come from different visual domains; otherwise set to 'same'.\n- `dpm_steps`: The number of diffusion sampling steps.\n- `dpm_order`: The order of the probability flow ODE solver.\n- `scale`: The classifier-free guidance (CFG) scale.\n- `tau_a`: The threshold for injecting composite self-attention maps.\n- `tau_b`: The threshold for preserving background.\n\n## TF-ICON Test Benchmark\n\nThe complete TF-ICON test benchmark is available in [this OneDrive folder](https:\u002F\u002Fentuedu-my.sharepoint.com\u002F:f:\u002Fg\u002Fpersonal\u002Fshilin002_e_ntu_edu_sg\u002FEmmCgLm_3OZCssqjaGdvjMwBCIvqfjsyphjqNs7g2DFzQQ?e=JSwOHY). If you find the benchmark useful for your research, please consider citing our paper.\n\n\n\u003C!-- You can find the **Wild-TI2I**, **ImageNetR-TI2I** and **ImageNetR-Fake-TI2I** benchmarks in [this dropbox folder](https:\u002F\u002Fwww.dropbox.com\u002Fsh\u002F8giw0uhfekft47h\u002FAAAF1frwakVsQocKczZZSX6La?dl=0). The translation prompts and all the necessary configs (e.g. seed, generation prompt, guidance image path) are provided in a yaml file in each benchmark folder. 
-->\n\n\n\n## Additional Results\n### Sketchy Painting\n![sketchy-comp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_1dca2379dda8.png)\n\n---\n\n\u003C\u002Fdiv>\n\n### Oil Painting\n![painting-comp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_a73ab310ad1a.png)\n\n---\n\n\u003C\u002Fdiv>\n\n### Photorealism\n![real-comp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_5e74425e3b85.png)\n\n---\n\n\u003C\u002Fdiv>\n\n### Cartoon\n![carton-comp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_f22fc07c2e30.png)\n\n---\n\n\u003C\u002Fdiv>\n\n## Acknowledgments\nOur work is standing on the shoulders of giants. We thank the following contributors that our code is based on: [Stable-Diffusion](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fstablediffusion) and [Prompt-to-Prompt](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fprompt-to-prompt).\n\n\n## Citation\nIf you find the repo useful, please consider citing:\n```\n@inproceedings{lu2023tf,\n  title={TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition},\n  author={Lu, Shilin and Liu, Yanzhu and Kong, Adams Wai-Kin},\n  booktitle={Proceedings of the IEEE\u002FCVF International Conference on Computer Vision},\n  pages={2294--2305},\n  year={2023}\n}\n```\n","# TF-ICON：基于扩散模型的无训练跨域图像合成（ICCV 2023）\n\n## [\u003Ca href=\"https:\u002F\u002Fshilin-lu.github.io\u002Ftf-icon.github.io\u002F\" target=\"_blank\">项目主页\u003C\u002Fa>] [\u003Ca href=\"https:\u002F\u002Fentuedu-my.sharepoint.com\u002F:b:\u002Fg\u002Fpersonal\u002Fshilin002_e_ntu_edu_sg\u002FEWRDLuFDrs5Ll0KGuMtvtbUBhBZcSw2roKCo96iCWgpMZQ?e=rEv3As\" target=\"_blank\">海报\u003C\u002Fa>]\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-TF--ICON-green.svg?style=plastic)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.12493) 
[![TI2I](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fbenchmarks-TF--ICON-blue.svg?style=plastic)](https:\u002F\u002Fentuedu-my.sharepoint.com\u002F:f:\u002Fg\u002Fpersonal\u002Fshilin002_e_ntu_edu_sg\u002FEmmCgLm_3OZCssqjaGdvjMwBCIvqfjsyphjqNs7g2DFzQQ?e=JSwOHY)\n\n[TF-ICON：基于扩散模型的无训练跨域图像合成](https:\u002F\u002Fshilin-lu.github.io\u002Ftf-icon.github.io\u002F)的官方实现。\n\n> **TF-ICON：基于扩散模型的无训练跨域图像合成**\u003Cbr>\n\u003C!-- > [金光贤](https:\u002F\u002Fgwang-kim.github.io\u002F)、权泰成、[叶宗哲](https:\u002F\u002Fbispl.weebly.com\u002Fprofessor.html) \u003Cbr> -->\n> 卢诗琳、刘彦竹和康伟健\u003Cbr>\n> ICCV 2023\n>\n>**摘要**: \u003Cbr>\n文本驱动的扩散模型展现出令人印象深刻的生成能力，能够支持多种图像编辑任务。本文提出TF-ICON，一种新颖的无训练图像合成框架，利用文本驱动的扩散模型进行跨域图像引导的合成。该任务旨在将用户提供的对象无缝融入特定的视觉场景中。目前基于扩散的方法通常需要昂贵的实例级优化或在定制数据集上对预训练模型进行微调，这可能会削弱模型丰富的先验知识。相比之下，TF-ICON可以直接使用现成的扩散模型进行跨域图像引导的合成，而无需额外的训练、微调或优化。此外，我们引入了一种特殊的提示词——“异常提示”，它不包含任何信息，能够帮助文本驱动的扩散模型准确地将真实图像反演为潜在表示，从而为图像合成奠定基础。实验表明，为Stable Diffusion模型配备“异常提示”后，在多个数据集（CelebA-HQ、COCO和ImageNet）上的逆向效果优于当前最先进的方法；同时，TF-ICON在多种视觉领域中的表现也超越了先前的基线方法。\n\n\u003C!-- ## [\u003Ca href=\"https:\u002F\u002Fpnp-diffusion.github.io\u002F\" target=\"_blank\">项目主页\u003C\u002Fa>] [\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FMichalGeyer\u002Fpnp-diffusers\" target=\"_blank\">Diffusers实现\u003C\u002Fa>] -->\n\n\u003C!-- [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-PnP-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.12572) [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fhysts\u002FPnP-diffusion-features) \u003Ca href=\"https:\u002F\u002Freplicate.com\u002Farielreplicate\u002Fplug_and_play_image_translation\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_7dacf1cc5d87.png\">\u003C\u002Fa> 
[![TI2I](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fbenchmarks-TI2I-blue)](https:\u002F\u002Fwww.dropbox.com\u002Fsh\u002F8giw0uhfekft47h\u002FAAAF1frwakVsQocKczZZSX6La?dl=0) -->\n\n![teaser](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_12b70e62168d.png)\n\n---\n\n\u003C\u002Fdiv>\n\n![framework](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_81ff7001b6f7.png)\n\n\u003C!-- # 更新：\n\n**19\u002F06\u002F23** 🧨 插拔式扩散功能的Diffusers实现现已发布[此处](https:\u002F\u002Fgithub.com\u002FMichalGeyer\u002Fpnp-diffusers)。 -->\n\n\u003C!-- ## 待办事项：\n- [ ] 支持Diffusers并集成流水线\n- [ ] Gradio演示\n- [ ] 发布TF-ICON测试基准 -->\n\n\n\u003C!-- ## 使用方法\n\n**要使用插拔式扩散功能，请按照以下步骤操作：**\n\n1. [设置](#setup)\n2. [特征提取](#feature-extraction)\n3. [运行PnP](#running-pnp)\n4. [TI2I基准测试](#ti2i-benchmarks) -->\n\n---\n\n\u003C\u002Fdiv>\n\n## 目录\n  - [设置](#setup)\n    - [选项1：使用Conda](#option-1-using-conda)\n    - [选项2：使用Pip与虚拟环境](#option-2-using-pip-with-virtual-environment)\n    - [选项3：使用Pip（全局安装）](#option-3-using-pip-global-installation)\n    - [下载Stable-Diffusion权重](#downloading-stable\\-diffusion-weights)\n  - [运行TF-ICON](#running-tf\\-icon)\n    - [数据准备](#data-preparation)\n    - [图像合成](#image-composition)\n  - [TF-ICON测试基准](#tf\\-icon-test-benchmark)\n  - [附加结果](#additional-results)\n    - [素描风格](#sketchy-painting)\n    - [油画风格](#oil-painting)\n    - [写实风格](#photorealism)\n    - [卡通风格](#cartoon)\n  - [致谢](#acknowledgments)\n  - [引用](#citation)\n\n\n\u003Cbr>\n\n## 设置\n\n我们的代码库基于[Stable-Diffusion](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fstablediffusion)，共享依赖项和模型架构。建议显存至少为23 GB，但具体需求可能因输入样本而异，最低要求为20 GB。\n\n### 选项1：使用Conda\n\n```bash\n# 克隆仓库\ngit clone https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON.git\ncd TF-ICON\n\n# 创建并激活Conda环境\nconda env create -f tf_icon_env.yaml\nconda activate tf-icon\n```\n\n### 选项2：使用Pip与虚拟环境\n\n```bash\n# 克隆仓库\ngit clone https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON.git\ncd TF-ICON\n\n# 
创建并激活虚拟环境\npython -m venv venv\nsource venv\u002Fbin\u002Factivate  # 在Windows上：venv\\Scripts\\activate\n\n# 安装包及依赖\npip install -e .\n\n# 如果需要开发依赖\n# pip install -e \".[dev]\"\n```\n\n### 选项3：使用Pip（全局安装）\n\n```bash\n# 克隆仓库\ngit clone https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON.git\ncd TF-ICON\n\n# 安装包及依赖\npip install -e .\n```\n\n**注意**：对于选项2和选项3，您需要确保系统已安装兼容的CUDA驱动程序。为了获得最佳性能，建议使用CUDA 11.3版本。\n\n### 下载Stable-Diffusion权重\n\n从[Hugging Face上的Stability AI](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-diffusion-2-1-base\u002Fblob\u002Fmain\u002Fv2-1_512-ema-pruned.ckpt)下载Stable Diffusion权重（下载`v2-1_512-ema-pruned.ckpt`文件），并将其放置在`.\u002Fckpt`文件夹下。\n\n或者，您也可以使用以下命令直接下载并将权重放置到正确的位置：\n\n```bash\n# 如果ckpt目录不存在，则创建\nmkdir -p ckpt\n\n# 下载模型权重（使用wget）\nwget -O ckpt\u002Fv2-1_512-ema-pruned.ckpt https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-diffusion-2-1-base\u002Fresolve\u002Fmain\u002Fv2-1_512-ema-pruned.ckpt\n\n# 或者使用curl\n# curl -L https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-diffusion-2-1-base\u002Fresolve\u002Fmain\u002Fv2-1_512-ema-pruned.ckpt -o ckpt\u002Fv2-1_512-ema-pruned.ckpt\n```\n\n## 运行TF-ICON\n\n### 数据准备\n\n`.\u002Finputs` 目录下提供了若干输入样本。每个样本包含一张背景图（bg）、一张前景图（fg）、一张用于前景的分割掩码（fg_mask）以及一张用于指定目标合成位置的用户掩码（mask_bg_fg）。输入数据结构如下：\n```\ninputs\n├── cross_domain\n│  ├── prompt1\n│  │  ├── bgxx.png\n│  │  ├── fgxx.png\n│  │  ├── fgxx_mask.png\n│  │  ├── mask_bg_fg.png\n│  ├── prompt2\n│  ├── ...\n├── same_domain\n│  ├── prompt1\n│  │  ├── bgxx.png\n│  │  ├── fgxx.png\n│  │  ├── fgxx_mask.png\n│  │  ├── mask_bg_fg.png\n│  ├── prompt2\n│  ├── ...\n```\n\n更多样本可在 [TF-ICON 测试基准](#tf\\-icon-test-benchmark) 中获取，您也可以自行定制。请注意，输入前景的分辨率不应过小。\n\n- 跨域：背景图和前景图来自不同的视觉领域。\n- 同域：背景图和前景图属于同一写实主义领域。\n\n### 图像合成\n要在“跨域”模式下执行 TF-ICON，请运行以下命令：\n\n```\npython scripts\u002Fmain_tf_icon.py  --ckpt ckpt\u002Fv2-1_512-ema-pruned.ckpt      \\\n                                --root .\u002Finputs\u002Fcross_domain      \\\n               
                 --domain 'cross'                  \\\n                                --dpm_steps 20                    \\\n                                --dpm_order 2                     \\\n                                --scale 5                         \\\n                                --tau_a 0.4                       \\\n                                --tau_b 0.8                       \\\n                                --outdir .\u002Foutputs                \\\n                                --gpu cuda:0                      \\\n                                --seed 3407\n```\n\n在“同域”模式下，请运行以下命令：\n```\npython scripts\u002Fmain_tf_icon.py  --ckpt ckpt\u002Fv2-1_512-ema-pruned.ckpt      \\\n                                --root .\u002Finputs\u002Fsame_domain       \\\n                                --domain 'same'                   \\\n                                --dpm_steps 20                    \\\n                                --dpm_order 2                     \\\n                                --scale 2.5                       \\\n                                --tau_a 0.4                       \\\n                                --tau_b 0.8                       \\\n                                --outdir .\u002Foutputs                \\\n                                --gpu cuda:0                      \\\n                                --seed 3407\n```\n\n- `ckpt`: Stable Diffusion 检查点的路径。\n- `root`: 输入数据的路径。\n- `domain`: 如果前景和背景来自不同视觉领域，则设置为 'cross'；否则设置为 'same'。\n- `dpm_steps`: 扩散采样步数。\n- `dpm_order`: 概率流 ODE 求解器的阶数。\n- `scale`: 无分类器引导（CFG）尺度。\n- `tau_a`: 注入复合自注意力图的阈值。\n- `tau_b`: 保留背景的阈值。\n\n## TF-ICON 测试基准\n\n完整的 TF-ICON 测试基准可在 [此 OneDrive 文件夹](https:\u002F\u002Fentuedu-my.sharepoint.com\u002F:f:\u002Fg\u002Fpersonal\u002Fshilin002_e_ntu_edu_sg\u002FEmmCgLm_3OZCssqjaGdvjMwBCIvqfjsyphjqNs7g2DFzQQ?e=JSwOHY) 中找到。如果您认为该基准对您的研究有帮助，请考虑引用它。\n\n\n\u003C!-- 您可以在 [此 Dropbox 
文件夹](https:\u002F\u002Fwww.dropbox.com\u002Fsh\u002F8giw0uhfekft47h\u002FAAAF1frwakVsQocKczZZSX6La?dl=0) 中找到 **Wild-TI2I**、**ImageNetR-TI2I** 和 **ImageNetR-Fake-TI2I** 基准。每个基准文件夹中都提供了一个 YAML 文件，其中包含了翻译提示以及所有必要的配置（例如种子、生成提示、引导图像路径）。 -->\n\n\n\n## 补充结果\n### 素描风格\n![sketchy-comp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_1dca2379dda8.png)\n\n---\n\n\u003C\u002Fdiv>\n\n### 油画风格\n![painting-comp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_a73ab310ad1a.png)\n\n---\n\n\u003C\u002Fdiv>\n\n### 写实风格\n![real-comp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_5e74425e3b85.png)\n\n---\n\n\u003C\u002Fdiv>\n\n### 卡通风格\n![carton-comp](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_readme_f22fc07c2e30.png)\n\n---\n\n\u003C\u002Fdiv>\n\n## 致谢\n我们的工作建立在前人的基础上。我们感谢以下项目为我们的代码提供了基础：[Stable-Diffusion](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fstablediffusion) 和 [Prompt-to-Prompt](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fprompt-to-prompt)。\n\n\n## 引用\n如果您觉得本仓库有用，请考虑引用：\n```\n@inproceedings{lu2023tf,\n  title={TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition},\n  author={Lu, Shilin and Liu, Yanzhu and Kong, Adams Wai-Kin},\n  booktitle={Proceedings of the IEEE\u002FCVF International Conference on Computer Vision},\n  pages={2294--2305},\n  year={2023}\n}\n```","# TF-ICON 快速上手指南\n\nTF-ICON 是一个基于扩散模型的**免训练**跨域图像合成框架。它能够将用户提供的物体无缝融合到特定的视觉背景中，支持跨域（如卡通融入实景）和同域合成，无需微调模型或进行昂贵的实例优化。\n\n## 1. 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux (推荐) 或 Windows\n*   **GPU**: NVIDIA 显卡，显存建议 **23 GB** 以上（最低需 20 GB，具体取决于输入分辨率）。\n*   **CUDA**: 推荐版本 **11.3**。\n*   **依赖管理**: 推荐使用 `Conda` 或 `Python venv`。\n\n## 2. 
安装步骤\n\n### 第一步：克隆代码库\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON.git\ncd TF-ICON\n```\n\n### 第二步：创建虚拟环境\n推荐使用 Conda（方式一），也可使用 pip + venv（方式二）。\n\n**方式一：使用 Conda (推荐)**\n```bash\nconda env create -f tf_icon_env.yaml\nconda activate tf-icon\n```\n\n**方式二：使用 Pip + Virtual Environment**\n```bash\npython -m venv venv\n# Linux\u002FMac:\nsource venv\u002Fbin\u002Factivate\n# Windows:\n# venv\\Scripts\\activate\n\npip install -e .\n```\n\n### 第三步：下载模型权重\n您需要下载 Stable Diffusion v2.1 的预训练权重文件 (`v2-1_512-ema-pruned.ckpt`) 并放入 `ckpt` 目录。\n\n**手动下载：**\n从 [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-diffusion-2-1-base\u002Fblob\u002Fmain\u002Fv2-1_512-ema-pruned.ckpt) 下载文件。\n\n**或使用命令行自动下载：**\n```bash\nmkdir -p ckpt\nwget -O ckpt\u002Fv2-1_512-ema-pruned.ckpt https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-diffusion-2-1-base\u002Fresolve\u002Fmain\u002Fv2-1_512-ema-pruned.ckpt\n```\n*(注：如果下载速度慢，可尝试配置国内镜像源或使用代理)*\n\n## 3. 基本使用\n\n### 数据准备\n将输入图片放置在 `.\u002Finputs` 目录下。每个合成任务需要四张图片：\n1.  **背景图** (`bgxx.png`)\n2.  **前景物体图** (`fgxx.png`)\n3.  **前景分割掩码** (`fgxx_mask.png`)\n4.  
**目标位置掩码** (`mask_bg_fg.png`，指定前景在背景中的放置位置)\n\n目录结构示例：\n```text\ninputs\n├── cross_domain (跨域合成示例)\n│   └── prompt1\n│       ├── bgxx.png\n│       ├── fgxx.png\n│       ├── fgxx_mask.png\n│       └── mask_bg_fg.png\n└── same_domain (同域合成示例)\n    └── ...\n```\n\n### 运行图像合成\n\n根据背景与前景是否属于同一视觉领域，选择以下命令运行：\n\n#### 场景 A：跨域合成 (Cross-Domain)\n适用于风格差异大的场景（例如：将真实照片物体融入油画背景）。\n\n```bash\npython scripts\u002Fmain_tf_icon.py  --ckpt ckpt\u002Fv2-1_512-ema-pruned.ckpt      \\\n                                --root .\u002Finputs\u002Fcross_domain      \\\n                                --domain 'cross'                  \\\n                                --dpm_steps 20                    \\\n                                --dpm_order 2                     \\\n                                --scale 5                         \\\n                                --tau_a 0.4                       \\\n                                --tau_b 0.8                       \\\n                                --outdir .\u002Foutputs                \\\n                                --gpu cuda:0                      \\\n                                --seed 3407\n```\n\n#### 场景 B：同域合成 (Same-Domain)\n适用于风格一致的场景（例如：真实物体融入真实背景）。注意 `--domain` 设为 `'same'` 且 `--scale` 调整为 `2.5`。\n\n```bash\npython scripts\u002Fmain_tf_icon.py  --ckpt ckpt\u002Fv2-1_512-ema-pruned.ckpt      \\\n                                --root .\u002Finputs\u002Fsame_domain       \\\n                                --domain 'same'                   \\\n                                --dpm_steps 20                    \\\n                                --dpm_order 2                     \\\n                                --scale 2.5                       \\\n                                --tau_a 0.4                       \\\n                                --tau_b 0.8                       \\\n                                --outdir .\u002Foutputs                \\\n                                --gpu cuda:0          
            \\\n                                --seed 3407\n```\n\n### 关键参数说明\n*   `--domain`: `'cross'` (跨域) 或 `'same'` (同域)。\n*   `--scale`: 分类器自由引导尺度 (CFG scale)，跨域推荐 5，同域推荐 2.5。\n*   `--tau_a`: 注入合成自注意力图的阈值。\n*   `--tau_b`: 保留背景的阈值。\n*   `--outdir`: 输出结果保存路径。\n\n运行完成后，合成结果将保存在 `.\u002Foutputs` 目录中。","某电商设计团队需要快速将新款运动鞋合成到雪山、沙滩等截然不同的营销背景图中，以制作多套广告素材。\n\n### 没有 TF-ICON 时\n- **训练成本高昂**：为了让模型理解特定鞋款的细节，往往需要对预训练模型进行微调或针对每个商品进行耗时的实例优化。\n- **跨域融合生硬**：直接将鞋子贴图到雪景或沙滩上时，光照、阴影和纹理风格严重不匹配，显得像拙劣的 PS 拼接。\n- **破坏原有质感**：传统的生成式编辑容易在合成过程中丢失商品原本的材质细节（如皮革纹理、鞋带结构），导致“货不对板”。\n- **迭代效率低下**：每更换一个背景或调整一种风格，都需要重新计算优化参数，设计师需等待数分钟甚至更久才能看到结果。\n\n### 使用 TF-ICON 后\n- **零训练即时可用**：直接利用现成的 Stable Diffusion 模型，无需任何微调或额外训练，即可实现跨域图像引导合成。\n- **光影自然统一**：TF-ICON 能自动根据目标背景（如雪山）的光照条件，重新渲染鞋子的阴影和高光，使合成效果天衣无缝。\n- **完美保留细节**：通过独特的“异常提示词”技术精准反转真实图像，确保鞋子的原始结构和材质特征在生成中毫发无损。\n- **工作流极速流转**：省去了繁琐的优化步骤，设计师可在几秒钟内切换多种场景风格，大幅缩短从创意到成稿的周期。\n\nTF-ICON 的核心价值在于打破了跨域合成的训练壁垒，让高质量的商品营销图生成变得像搭积木一样简单、快速且保真。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FShilin-LU_TF-ICON_12b70e62.png","Shilin-LU","Shilin","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FShilin-LU_cdd09c85.jpg","Ph.D. Student at NTU","Nanyang Technological University","Singapore","SHILIN002@e.ntu.edu.sg",null,"https:\u002F\u002Fshilin-lu.github.io\u002F","https:\u002F\u002Fgithub.com\u002FShilin-LU",[86,90,94],{"name":87,"color":88,"percentage":89},"Python","#3572A5",96.6,{"name":91,"color":92,"percentage":93},"Jupyter Notebook","#DA5B0B",3.2,{"name":95,"color":96,"percentage":97},"HTML","#e34c26",0.1,822,101,"2026-04-01T12:37:54","MIT",4,"Linux, macOS, Windows","必需 NVIDIA GPU，推荐显存 23GB（最低 20GB），推荐 CUDA 11.3","未说明",{"notes":107,"python":105,"dependencies":108},"1. 代码基于 Stable-Diffusion 构建，需手动下载 Stable Diffusion v2-1 模型权重文件 (v2-1_512-ema-pruned.ckpt) 并放入 .\u002Fckpt 目录。2. 支持三种环境安装方式：Conda、Pip 虚拟环境或全局 Pip 安装。3. 输入图像分辨率不宜过小，且需准备背景图、前景图及对应的掩码文件。4. 
跨域合成任务对显存要求较高，建议确保显存充足以避免溢出。",[109,110,111],"Stable-Diffusion (基于 stabilityai\u002Fstablediffusion)","torch","cuda 驱动",[15],[114,115,116,117,118,119],"image-composition","image-inversion","generative-ai","stable-diffusion","text-to-image","diffusion-model","2026-03-27T02:49:30.150509","2026-04-08T12:17:51.807234",[123,128,133,138,143],{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},24247,"如果想合成论文未提及的风格（如儿童绘本风格），应该怎么做？","可以尝试将 `domain` 参数简单地切换为 'cross' 来探索是否能获得更好的结果，虽然这不一定保证成功，但值得一试。","https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON\u002Fissues\u002F10",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},24246,"如何计算 CLIP_text 和 CLIP_image 指标？","CLIP_text 是在生成的合成图像与文本提示（text prompt）之间计算的。CLIP_image 则是根据用户的掩码（mask），在生成图像的裁剪区域与参考前景图像之间计算的。","https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON\u002Fissues\u002F27",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},24248,"运行代码时遇到 'get_learned_conditioning() takes 2 positional arguments but 3 were given' 错误怎么办？","这通常是由于依赖包版本不匹配导致的。建议严格遵循项目指南创建 conda 环境以确保 `diffusers` 和 `transformers` 版本正确。请删除现有的 tf-icon 环境，并按以下步骤重新创建：\n1. git clone https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON.git\n2. cd TF-ICON\n3. conda env create -f tf_icon_env.yaml\n4. conda activate tf-icon","https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON\u002Fissues\u002F16",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},24249,"关于附录 C 中“值注入”消融实验的代码实现细节是什么？","作者指出该部分的原始实现可能存在错误，且相关代码未保留。推测可能是错误地在某些时间步根据掩码结合了 x_t^fg 和 x_t^bg。建议用户尝试这种结合方式来复现类似的结果。","https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON\u002Fissues\u002F23",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},24250,"如何理解论文中提到的“具有相同行的矩阵乘以另一个矩阵，结果矩阵也具有相同的行”这一性质？","可以通过矩阵乘法示例理解：设矩阵 A 的两行均为 [a, b]，矩阵 B 为任意 2x2 矩阵。当 A 乘以 B 时，结果矩阵的每一行都是 A 的行向量与 B 的列向量的点积，由于 A 的行向量相同，计算出的结果行自然也完全相同。","https:\u002F\u002Fgithub.com\u002FShilin-LU\u002FTF-ICON\u002Fissues\u002F22",[]]