[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-AlonzoLeeeooo--awesome-text-to-image-studies":3,"tool-AlonzoLeeeooo--awesome-text-to-image-studies":62},[4,18,26,35,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108111,2,"2026-04-08T11:23:26",[14,15,13],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[43,15,13,14],"语言模型",{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 
是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,52],"视频",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":59,"last_commit_at":60,"category_tags":61,"status":17},5646,"opencv","opencv\u002Fopencv","OpenCV 是一个功能强大的开源计算机视觉库，被誉为机器视觉领域的“瑞士军刀”。它主要解决让计算机“看懂”图像和视频的核心难题，提供了从基础的图像读取、色彩转换、边缘检测，到复杂的人脸识别、物体追踪、3D 重建及深度学习模型部署等全方位算法支持。无论是处理静态图片还是分析实时视频流，OpenCV 都能高效完成特征提取与模式识别任务。\n\n这款工具特别适合计算机视觉开发者、人工智能研究人员以及机器人工程师使用。对于希望将视觉感知能力集成到应用中的软件工程师，或是需要快速验证算法原型的学术研究者，OpenCV 都是不可或缺的基础设施。虽然普通用户通常不会直接操作代码，但日常生活中使用的扫码支付、美颜相机和自动驾驶系统，背后往往都有它的身影。\n\nOpenCV 的独特亮点在于其卓越的性能与广泛的兼容性。它采用 C++ 编写以确保高速运算，同时提供 Python、Java 等多种语言接口，极大降低了开发门槛。库中内置了数千种优化算法，并支持跨平台运行，能够无缝对接各类硬件加速器。作为社区驱动的项目，OpenCV 拥有活跃的生态系统和丰富的学习资源，持续推动着视觉技术的前沿发展。",86988,1,"2026-04-08T16:06:22",[14,15],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":78,"owner_email":79,"owner_twitter":79,"owner_website":79,"owner_url":80,"languages":81,"stars":86,"forks":87,"last_commit_at":88,"license":89,"difficulty_score":59,"env_os":90,"env_gpu":91,"env_ram":91,"env_deps":92,"category_tags":95,"github_topics":96,"view_count":32,"oss_zip_url":79,"oss_zip_packed_at":79,"status":17,"created_at":102,"updated_at":103,"faqs":104,"releases":105},5964,"AlonzoLeeeooo\u002Fawesome-text-to-image-studies","awesome-text-to-image-studies","A collection of awesome text-to-image generation studies.","awesome-text-to-image-studies 是一个专注于“文生图”（Text-to-Image）生成技术的开源学术资源库。它系统性地整理了该领域海量的研究论文、数据集、工具包以及成熟的商业产品，旨在解决研究人员和开发者在面对快速迭代的 AI 绘图技术时，难以高效追踪最新进展、梳理技术脉络的痛点。\n\n这份资源库不仅按年份和顶级会议（如 CVPR、ICLR、AAAI）对文献进行了分类，还独具特色地开辟了“主题专题”板块，深入探讨了扩散模型与大语言模型（LLM）、Transformer 架构、联邦学习等前沿技术的融合应用。此外，它还涵盖了从基础生成到个性化定制、文本引导编辑等多个细分方向，甚至提供了相关代码和模型权重的链接。\n\nawesome-text-to-image-studies 非常适合 AI 领域的科研人员、算法工程师以及希望深入了解生成式人工智能底层逻辑的技术爱好者使用。对于想要把握行业风向、寻找研究灵感或复现经典算法的用户而言，这里提供了一个结构清晰、更新及时的一站式知识导航，帮助大家更轻松地探索文生图技术的无限可能。","\u003Cp align=\"center\">\n  \u003Ch1 align=\"center\">A Collection of Text-to-Image Generation Studies\u003C\u002Fh1>\n\nThis GitHub repository summarizes papers and resources related to the text-to-image (T2I) generation task. \n\n> [!NOTE]\n> This document serves as the `homepage` of the whole GitHub repo. Papers are summarized according to **different research directions, published years, and conferences.** \n> \n> [The `topics` section](topics\u002Ftopics.md) summarizes papers that are highly related to T2I generation according to different properties, e.g., prerequisites of T2I generation, diffusion models with other techniques (e.g., Diffusion Transformer, LLMs, Mamba, etc.), and diffusion models for other tasks. 
\n\nIf you have any suggestions about this repository, please feel free to [start a new issue](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002Fawesome-text-to-image-generation-studies\u002Fissues\u002Fnew) or [pull requests](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002Fawesome-text-to-image-generation-studies\u002Fpulls).\n\nRecent news of this GitHub repo is listed as follows.\n\n🔥 [Dec. 11th, 2025] Our paper titled [\"StableV2V: Stablizing Shape Consistency in Video-to-Video Editing\"](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F11272911) is accepted at TCSVT 2025!\n\n🔥 [Nov. 19th] We have released our latest paper titled [\"StableV2V: Stablizing Shape Consistency in Video-to-Video Editing\"](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.11045), with the corresponding [code](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002FStableV2V), [model weights](https:\u002F\u002Fhuggingface.co\u002FAlonzoLeeeooo\u002FStableV2V), and [a testing benchmark `DAVIS-Edit`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAlonzoLeeeooo\u002FDAVIS-Edit) open-sourced. Feel free to check them out from the links!\n\u003Cdetails> \u003Csummary> Click to see more information. \u003C\u002Fsummary>\n\n- [Apr. 26th] Update a new topic: **Diffusion Models Meet Federated Learning.** See [the `topics` section](topics\u002Ftopics.md) for more details!\n- [Mar. 28th] The official **AAAI 2024** paper list is released! Official version of PDFs and BibTeX references are updated accordingly.\n- [Mar. 21st] [The `topics` section](topics\u002Ftopics.md) has been updated. This section aims to offer **paper lists that are summarized according to other properties of diffusion models**, e.g., Diffusion Transformer-based methods, diffusion models for NLP, diffusion models integrated with LLMs, etc. The corresponding references of these papers are also included in `reference.bib`.\n- [Mar. 7th] All available **CVPR, ICLR, and AAAI 2024 papers and references** are updated.\n- [Mar. 
1st] Websites of [**the off-the-shelf text-to-image generation products**](#available-products) and [**toolkits**](#toolkits) are summarized.\n\n\u003C\u002Fdetails>\n\n\n\u003C!-- omit in toc -->\n# \u003Cspan id=\"contents\">Contents\u003C\u002Fspan>\n- [Products](#available-products)\n- [To-Do Lists](#to-do-lists)\n- [Papers](#papers)\n  - [Survey Papers](#survey-papers)\n  - [Text-to-Image Generation](#text-to-image-generation)\n    - [Year 2025](#text-year-2025)\n    - [Year 2024](#text-year-2024)\n    - [Year 2023](#text-year-2023)\n    - [Year 2022](#text-year-2022)\n    - [Year 2021](#text-year-2021)\n    - [Year 2020](#text-year-2020)\n  - [Conditional Text-to-Image Generation](#conditional-text-to-image-generation)\n    - [Year 2025](#conditional-year-2025)\n    - [Year 2024](#conditional-year-2024)\n    - [Year 2023](#conditional-year-2023)\n    - [Year 2022](#conditional-year-2022)\n  - [Personalized Text-to-Image Generation](#personalized-text-to-image-generation)\n    - [Year 2025](#personalized-year-2025)\n    - [Year 2024](#personalized-year-2024)\n    - [Year 2023](#personalized-year-2023)\n  - [Text-Guided Image Editing](#text-guided-image-editing) \n    - [Year 2025](#editing-year-2025)\n    - [Year 2024](#editing-year-2024)\n    - [Year 2023](#editing-year-2023)\n    - [Year 2022](#editing-year-2022)\n  - [Text Image Generation](#text-image-generation)\n    - [Year 2024](#gentext-year-2024)\n- [Datasets](#datasets)\n- [Toolkits](#toolkits)\n- [Q&A](#qa)\n- [References](#references)\n- [Star History](#star-history)\n- [WeChat Group](#wechat-group)\n\n\u003C!-- omit in toc -->\n# To-Do Lists\n- Published Papers on Conferences\n  - [ ] Update NeurIPS 2025 Papers\n  - [x] Update ICCV 2025 Papers\n  - [x] Update CVPR 2025 Papers\n  - [x] Update ICLR 2025 Papers\n  - [x] Update NeurIPS 2024 Papers\n  - [x] Update ECCV 2024 Papers\n  - [x] Update CVPR 2024 Papers\n    - [x] Update ⚠️ Papers and References\n    - [ ] Update arXiv References into the Official Version\n  - [x] Update AAAI 2024 Papers\n    - [x] Update ⚠️ Papers and References\n    - [x] Update arXiv References into the Official Version\n  - [x] Update ICLR 2024 Papers\n  - [x] Update NeurIPS 2023 Papers\n- Regular Maintenance of Preprint arXiv Papers and Missed Papers\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n# Products\n|Name|Year|Website|Specialties|\n|-|-|-|-|\n|Nano Image Art|2025|[link](https:\u002F\u002Fnanoimage.art\u002F)|Create stunning AI images — powered by Google's Nano Banana Pro for next-gen quality, smart editing, and intelligent prompts.|\n|Fast Image AI|2025|[link](https:\u002F\u002Ffastimage.ai\u002F)|Fast Image AI instantly transforms your photos into stunning styles like Ghibli, Sketch, and Pixar. 
Effortlessly control image elements and create amazing effects with just one click.|\n|Gempix2 (Nano Banana 2)|2025|[link](https:\u002F\u002Fgempix2.site)|Free AI image generation platform with text-to-image, AI editing, and video generation support|\n|Stable Diffusion 3|2024|[link](https:\u002F\u002Fstability.ai\u002Fnews\u002Fstable-diffusion-3)|Diffusion Transformer-based Stable Diffusion|\n|Stable Video|2024|[link](https:\u002F\u002Fwww.stablevideo.com\u002F)|High-quality high-resolution images|\n|DALL-E 3|2023|[link](https:\u002F\u002Fopenai.com\u002Fdall-e-3)|Collaborate with [ChatGPT](https:\u002F\u002Fchat.openai.com\u002F)|\n|Ideogram|2023|[link](https:\u002F\u002Fideogram.ai\u002Flogin)|Text images|\n|Playground|2023|[link](https:\u002F\u002Fplayground.com\u002F)|Aesthetic images|\n|HiDream.ai|2023|[link](https:\u002F\u002Fhidreamai.com\u002F#\u002F)|-|\n|Dashtoon|2023|[link](https:\u002F\u002Fdashtoon.com\u002F)|Text-to-Comic Generation|\n|WHEE|2023|[link](https:\u002F\u002Fwww.whee.com\u002F)|WHEE is an online AI generation tool, which can be applied for *T2I generation, I2I generation, SR, inpainting, outpainting, image variation, virtual try-on, etc.* |\n|Vega AI|2023|[link](https:\u002F\u002Fwww.vegaai.net\u002F)|Vega AI is an online AI generation tool, which can be applied for *T2I generation, I2I generation, SR, T2V generation, I2V generation, etc.*|\n|Wujie AI|2022|[link](https:\u002F\u002Fwww.wujieai.com\u002F)|The Chinese name is \"无界AI\", offering AIGC resources and online services|\n|Midjourney|2022|[link](https:\u002F\u002Fwww.midjourney.com\u002Fhome)|Powerful closed-source generation tool|\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n# Papers\n\n\u003C!-- omit in toc -->\n## Survey Papers\n- **Text-to-Image Generation**\n  - Year 2024\n    - **ACM Computing Surveys** \n      - Diffusion Models: A Comprehensive Survey of Methods and Applications [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.00796.pdf)\n  - Year 2023\n    - **TPAMI**\n      - Diffusion Models in Vision: A Survey [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.04747v2) [[Code]](https:\u002F\u002Fgithub.com\u002FCroitoruAlin\u002FDiffusion-Models-in-Vision-A-Survey)\n    - **arXiv**\n      - Text-to-image Diffusion Models in Generative AI: A Survey [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.07909)\n      - State of the Art on Diffusion Models for Visual Computing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.07204.pdf) \n  - Year 2022\n    - **arXiv**\n      - Efficient Diffusion Models for Vision: A Survey [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.09292)\n- **Conditional Text-to-Image Generation**\n  - Year 2024\n    - **arXiv**\n      - Controllable Generation with Text-to-Image Diffusion Models: A Survey [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04279)\n- **Text-Guided Image Editing**\n  - Year 2024\n    - **arXiv**\n      - Diffusion Model-Based Image Editing: A Survey [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.17525.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FSiatMMLab\u002FAwesome-Diffusion-Model-Based-Image-Editing-Methods)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n## Text-to-Image Generation\n- \u003Cspan id=\"text-year-2025\">**Year 2025**\u003C\u002Fspan>\n  - **CVPR**\n    - ***PreciseCam:*** Precise Camera Control for Text-to-Image Generation 
[[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.12910) [[Project]](https:\u002F\u002Fgraphics.unizar.es\u002Fprojects\u002FPreciseCam2024\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fedurnebernal\u002FPreciseCam)\n    - ***Type-R:*** Automatically Retouching Typos for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.18159) [[Code]](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002FType-R)\n    - ***Compass Control:*** Multi Object Orientation Control for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.06752)\n    - ***Generative Photography:*** Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.02168) [[Project]](https:\u002F\u002Fgenerative-photography.github.io\u002Fproject\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fpandayuanyu\u002Fgenerative-photography)\n    -  ***One-Way Ticket:*** Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fcvpr.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F32579) [[Code]](https:\u002F\u002Fgithub.com\u002Fsen-mao\u002FLoopfree)\n    - Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.15236)\n    - Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.03178)\n    - Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.18324) [[Project]](https:\u002F\u002Fbasim-azam.github.io\u002Fresponsiblediffusion\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fbasim-azam\u002Fresponsiblediffusion)\n    - Make It Count: Text-to-Image Generation with an Accurate Number of Objects [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.10210) [[Project]](https:\u002F\u002Fmake-it-count-paper.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FLitalby1\u002Fmake-it-count)\n    - ***MCCD:*** Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.02648)\n    - Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.12692)\n    - ***ShapeWords:*** Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.02912) [[Project]](https:\u002F\u002Flodurality.github.io\u002Fshapewords\u002F)\n    - ***SnapGen:*** Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.09619)\n    - Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.22168)\n    - ***Focus-N-Fix:*** Region-Aware Fine-Tuning for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.06481)\n    - ***SILMM:*** Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.05818)\n    - Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation 
[[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.12356)\n    - Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.18936)\n    - Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.16503) [[Code]](https:\u002F\u002Fgithub.com\u002FBomingmiao\u002FNoiseDiffusion)\n    - Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.07838)\n    - ***STEREO:*** A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.16807)\n    - Minority-Focused Text-to-Image Generation via Prompt Optimization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.16503) [[Code]](https:\u002F\u002Fgithub.com\u002Fsoobin-um\u002FMinorityPrompt)\n    - Scaling Down Text Encoders of Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.19897) [[Code]](https:\u002F\u002Fgithub.com\u002FLifuWang-66\u002FDistillT5)\n    - ⚠️ The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models [Paper]\n    - ⚠️ Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis [Paper]\n    - ⚠️ Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization [Paper]\n    - ⚠️ Multi-Group Proportional Representations for Text-to-Image Models [Paper]\n    - ⚠️ ***VODiff:*** Controlling Object Visibility Order in Text-to-Image Generation [Paper]\n  - **ICLR**\n    - Improving Long-Text Alignment for Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=2ZK8zyIt7o)\n    - Information Theoretic Text-to-Image Alignment [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=Ugs2W5XFFo)\n    - ***Meissonic:*** Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=GJsuYHhAga)\n    - ***PaRa:*** Personalizing Text-to-Image Diffusion via Parameter Rank Reduction [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=KZgo2YQbhc)\n    - ***Fluid:*** Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=jQP5o1VAVc)\n    - Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=3BhZCfJ73Y)\n    - Denoising Autoregressive Transformers for Scalable Text-to-Image Generation [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=amDkNPVWcn)\n    - Progressive Compositionality in Text-to-Image Generative Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=S85PP4xjFD)\n    - Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=hUdLs6TqZL)\n    - Measuring And Improving Engagement of Text-to-Image Generation Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=TmCcNuo03f)\n    - Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=ZRDhBwKs7l)\n    - Enhancing Compositional Text-to-Image Generation 
with Reliable Random Seeds [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=5BSlakturs)\n    - ***One-Prompt-One-Story:*** Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=cD1kl2QKv1)\n    - You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=T7bmHkwzS6)\n    - Rethinking Artistic Copyright Infringements In the Era Of Text-to-Image Generative Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=0OTVNEm9N4)\n    - Erasing Concept Combination from Text-to-Image Diffusion Model [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=OBjF5I4PWg)\n    - Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=1vggIT5vvj)\n    - ***TIGeR:*** Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=mr2icR6dpD)\n    - ***DGQ:*** Distribution-Aware Group Quantization for Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=ZyNEr7Xw5L)\n    - Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=LZfjxvqw0N)\n    - ***PT-T2I\u002FV:*** An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Image\u002FVideo-Task [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=lTrrnNdkOX)\n    - Revisiting text-to-image evaluation with Gecko: on metrics, prompts, and human rating [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=Im2neAMlre)\n    - ***SANA:*** Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=N8Oj1XhtYZ)\n    - Text-to-Image Rectified Flow as Plug-and-Play Priors [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=SzPZK856iI)\n    - Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=8jvVNPHtVJ)\n    - ***SAFREE:*** Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=hgTFotBRKl)\n    - ***IterComp:*** Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=4w99NAikOE)\n    - ***ScImage:*** How good are multimodal large language models at scientific text-to-image generation? 
[[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=ugyqNEOjoU)\n    - Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=HMVDiaWMwM)\n    - Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=NWb128pSCb)\n- \u003Cspan id=\"text-year-2024\">**Year 2024**\u003C\u002Fspan>\n  - **CVPR**\n    - ***DistriFusion:*** Distributed Parallel Inference for High-Resolution Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.19481.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fdistrifuser)\n    - ***InstanceDiffusion:*** Instance-level Control for Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.03290.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Ffrank-xwang\u002FInstanceDiffusion) [[Project]](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002Fprojects\u002FInstDiff\u002F)\n    - ***ECLIPSE:*** A Resource-Efficient Text-to-Image Prior for Image Generations [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.04655.pdf) [[Code]](https:\u002F\u002Feclipse-t2i.vercel.app\u002F) [[Project]](https:\u002F\u002Fgithub.com\u002Feclipse-t2i\u002Feclipse-inference) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FECLIPSE-Community\u002FECLIPSE-Kandinsky-v2.2)\n    - ***Instruct-Imagen:*** Image Generation with Multi-modal Instruction [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.01952.pdf)\n    - Learning Continuous 3D Words for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.08654.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fttchengab\u002Fcontinuous_3d_words_code\u002F)\n    - ***HanDiffuser:*** Text-to-Image Generation With Realistic Hand Appearances [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.01693.pdf)\n    - Rich Human Feedback for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.10240.pdf)\n    - ***MarkovGen:*** Structured Prediction for Efficient Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.10997.pdf)\n    - Customization Assistant for Text-to-image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.03045.pdf)\n    - ***ADI:*** Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.15841.pdf) [[Project]](https:\u002F\u002Fadi-t2i.github.io\u002FADI\u002F)\n    - ***UFOGen:*** You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.09257.pdf)\n    - Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.17216.pdf)\n    - ***Tailored Visions:*** Enhancing Text-to-Image Generation with Personalized Prompt Rewriting [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.08129.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fzzjchen\u002FTailored-Visions)\n    - ***CoDi:*** Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.01407.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Ffast-codi\u002FCoDi) [[Project]](https:\u002F\u002Ffast-codi.github.io\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FMKFMIKU\u002FCoDi)\n    - Arbitrary-Scale 
Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.10255.pdf)\n    - Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.05239)\n    -  ***ElasticDiffusion:*** Training-free Arbitrary Size Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.18822) [[Code]](https:\u002F\u002Fgithub.com\u002FMoayedHajiAli\u002FElasticDiffusion-official) [[Project]](https:\u002F\u002Felasticdiffusion.github.io\u002F) [[Demo]](https:\u002F\u002Freplicate.com\u002Fmoayedhajiali\u002Felasticdiffusion)\n    - ***CosmicMan:*** A Text-to-Image Foundation Model for Humans [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.01294) [[Code]](https:\u002F\u002Fgithub.com\u002Fcosmicman-cvpr2024\u002FCosmicMan) [[Project]](https:\u002F\u002Fcosmicman-cvpr2024.github.io\u002F)\n    - ***PanFusion:*** Taming Stable Diffusion for Text to 360° Panorama Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.07949) [[Code]](https:\u002F\u002Fgithub.com\u002Fchengzhag\u002FPanFusion) [[Project]](https:\u002F\u002Fchengzhag.github.io\u002Fpublication\u002Fpanfusion)\n    - ***Intelligent Grimm:*** Open-ended Visual Storytelling via Latent Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.00973) [[Code]](https:\u002F\u002Fgithub.com\u002Fhaoningwu3639\u002FStoryGen) [[Project]](https:\u002F\u002Fhaoningwu3639.github.io\u002FStoryGen_Webpage\u002F)\n    - On the Scalability of Diffusion-based Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.02883)\n    - ***MuLAn:*** A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.02790) [[Project]](https:\u002F\u002Fmulan-dataset.github.io\u002F) [[Dataset]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fmulan-dataset\u002Fv1.0)\n    - Learning Multi-dimensional Human Preference for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.14705)\n    - Dynamic Prompt Optimizing for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.04095)\n    - Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FMiao_Training_Diffusion_Models_Towards_Diverse_Image_Generation_with_Reinforcement_Learning_CVPR_2024_paper.pdf)\n    - Adversarial Text to Continuous Image Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FHaydarov_Adversarial_Text_to_Continuous_Image_Generation_CVPR_2024_paper.pdf) [[Project]](https:\u002F\u002Fkilichbek.github.io\u002Fwebpage\u002Fhypercgan\u002F) [[Video]](https:\u002F\u002Fkilichbek.github.io\u002Fwebpage\u002Fhypercgan\u002F#)\n    - ***EmoGen:*** Emotional Image Content Generation with Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_EmoGen_Emotional_Image_Content_Generation_with_Text-to-Image_Diffusion_Models_CVPR_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FJingyuanYY\u002FEmoGen)\n  - **ECCV**\n    - Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.07860) 
[[Code]](https:\u002F\u002Fgithub.com\u002FShihaoZhaoZSH\u002FLaVi-Bridge) [[Project]](https:\u002F\u002Fshihaozhaozsh.github.io\u002FLaVi-Bridge\u002F)\n    - Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.05352v1) [[Code]](https:\u002F\u002Fgithub.com\u002Fnini0919\u002FDiffPNG)\n    - Getting it Right: Improving Spatial Consistency in Text-to-Image Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.01197) [[Code]](https:\u002F\u002Fgithub.com\u002FSPRIGHT-T2I\u002FSPRIGHT) [[Project]](https:\u002F\u002Fspright-t2i.github.io\u002F)\n    - Navigating Text-to-Image Generative Bias across Indic Languages [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.00283v1) [[Project]](https:\u002F\u002Fiab-rubric.org\u002Fresources\u002Fother-databases\u002Findictti)\n    - Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.21032)\n    - The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.12579) [[Code]](https:\u002F\u002Fleo81005.github.io\u002FReality-and-Fantasy\u002F) [[Project]](https:\u002F\u002Fleo81005.github.io\u002FReality-and-Fantasy\u002F) [[Dataset]](https:\u002F\u002Fleo81005.github.io\u002FReality-and-Fantasy\u002F)\n    - Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.12383v1) [[Code]](https:\u002F\u002Fgithub.com\u002FCharlesGong12\u002FRECE)\n    - ***StyleTokenizer:*** Defining Image Style by a Single Instance for Controlling Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.02543) [[Code]](https:\u002F\u002Fgithub.com\u002Falipay\u002Fstyle-tokenizer)\n    - ***PEA-Diffusion:*** Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F08492.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FOPPO-Mente-Lab\u002FPEA-Diffusion)\n    - Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F11936.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fzdxdsw\u002Fskewed_relations_T2I)\n    - ***Parrot:*** Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F05562.pdf)\n    - Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F10495.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FShihaoZhaoZSH\u002FLaVi-Bridge) [[Project]](https:\u002F\u002Fshihaozhaozsh.github.io\u002FLaVi-Bridge\u002F)\n    - ***MobileDiffusion:*** Instant Text-to-Image Generation on Mobile Devices [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F07923.pdf)\n    - ***PixArt-Σ:*** Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation 
[[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04692) [[Code]](https:\u002F\u002Fgithub.com\u002FPixArt-alpha\u002FPixArt-sigma) [[Project]](https:\u002F\u002Fpixart-alpha.github.io\u002FPixArt-sigma-project\u002F)\n    - ***CogView3:*** Finer and Faster Text-to-Image Generation via Relay Diffusion [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.05121) [[Code]](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FCogView)\n  - **ICLR**\n    - Patched Denoising Diffusion Models For High-Resolution Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.01316.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fmlpc-ucsd\u002Fpatch-dm)\n    - ***Relay Diffusion:*** Unifying diffusion process across resolutions for image synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.03350.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FRelayDiffusion)\n    - ***SDXL:*** Improving Latent Diffusion Models for High-Resolution Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.01952.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fgenerative-models)\n    - Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.09048.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Ftomtom1103\u002Fcompose-and-conquer)\n    - ***PixArt-α:*** Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.00426.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FPixArt-alpha\u002FPixArt-alpha) [[Project]](https:\u002F\u002Fpixart-alpha.github.io\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FPixArt-alpha\u002FPixArt-alpha)\n  - **SIGGRAPH**\n    - ***RGB↔X:*** Image Decomposition and Synthesis Using Material- and Lighting-aware Diffusion Models [[Paper]](https:\u002F\u002Fzheng95z.github.io\u002Fassets\u002Ffiles\u002Fsig24-rgbx.pdf) [[Project]](https:\u002F\u002Fzheng95z.github.io\u002Fpublications\u002Frgbx24)\n  - **AAAI**\n    - Semantic-aware Data Augmentation for Text-to-image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.07951.pdf)\n    - Text-to-Image Generation for Abstract Concepts [[Paper]](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F28122)\n  - **arXiv**\n    - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.10210.pdf)\n    - ***RPG:*** Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.11708.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FYangLing0818\u002FRPG-DiffusionMaster)\n    - ***Playground v2.5:*** Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.17245.pdf) [[Code]](https:\u002F\u002Fhuggingface.co\u002Fplaygroundai\u002Fplayground-v2.5-1024px-aesthetic)\n    - ***ResAdapter:*** Domain Consistent Resolution Adapter for Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.02084.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fbytedance\u002Fres-adapter) [[Project]](https:\u002F\u002Fres-adapter.github.io\u002F)\n    - ***InstantID:*** Zero-shot Identity-Preserving Generation in Seconds [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.07519.pdf) 
[[Code]](https:\u002F\u002Fgithub.com\u002FInstantID\u002FInstantID) [[Project]](https:\u002F\u002Finstantid.github.io\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FInstantX\u002FInstantID)\n    - ***PIXART-δ:*** Fast and Controllable Image Generation with Latent Consistency Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.05252) [[Code]](https:\u002F\u002Fgithub.com\u002FPixArt-alpha\u002FPixArt-alpha?tab=readme-ov-file)\n    - ***ELLA:*** Equip Diffusion Models with LLM for Enhanced Semantic Alignment [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.05135) [[Code]](https:\u002F\u002Fgithub.com\u002FELLA-Diffusion\u002FELLA) [[Project]](https:\u002F\u002Fella-diffusion.github.io\u002F) \n    - ***Text2Street:*** Controllable Text-to-image Generation for Street Views [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.04504.pdf)\n    - ***LayerDiffuse:*** Transparent Image Layer Diffusion using Latent Transparency [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.17113) [[Code]](https:\u002F\u002Fgithub.com\u002Flayerdiffusion\u002FLayerDiffuse)\n    - ***SD3-Turbo:*** Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.12015.pdf)\n    - ***StreamMultiDiffusion:*** Real-Time Interactive Generation with Region-Based Semantic Control [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.09055) [[Code]](https:\u002F\u002Fgithub.com\u002Fironjr\u002FStreamMultiDiffusion)\n    - ***SVGDreamer:*** Text Guided SVG Generation with Diffusion Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.16476) [[Code]](https:\u002F\u002Fgithub.com\u002Fximinng\u002FSVGDreamer) [[Project]](https:\u002F\u002Fximinng.github.io\u002FSVGDreamer-project\u002F)\n    - ***PromptCharm:*** Text-to-Image Generation through Multi-modal Prompting and Refinement [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04014.pdf)\n    - ***YOSO:*** You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.12931) [[Code]](https:\u002F\u002Fgithub.com\u002Fmlpen\u002FYOSO)\n    - ***SingDiffusion:*** Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.08381) [[Code]](https:\u002F\u002Fgithub.com\u002FPangzeCheung\u002FSingDiffusion)\n    - ***CoMat:*** Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.03653) [[Code]](https:\u002F\u002Fgithub.com\u002FCaraJ7\u002FCoMat) [[Project]](https:\u002F\u002Fcaraj7.github.io\u002Fcomat\u002F)\n    - ***StoryDiffusion:*** Consistent Self-Attention for Long-Range Image and Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.01434) [[Code]](https:\u002F\u002Fgithub.com\u002FHVision-NKU\u002FStoryDiffusion) [[Project]](https:\u002F\u002Fstorydiffusion.github.io\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FYupengZhou\u002FStoryDiffusion)\n    - Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.12970) [[Project]](https:\u002F\u002Ffaceadapter.github.io\u002Fface-adapter.github.io\u002F)\n    - ***LinFusion:*** 1 GPU, 1 Minute, 16K Image [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.02097) 
[[Code]](https:\u002F\u002Fgithub.com\u002FHuage001\u002FLinFusion) [[Project]](https:\u002F\u002Flv-linfusion.github.io\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FHuage001\u002FLinFusion-SD-v1.5)\n    - ***OmniGen:*** Unified Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.11340) [[Code]](https:\u002F\u002Fgithub.com\u002FVectorSpaceLab\u002FOmniGen)\n    - ***CoMPaSS:*** Enhancing Spatial Understanding in Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.13195) [[Code]](https:\u002F\u002Fgithub.com\u002Fblurgyy\u002FCoMPaSS)\n  - **Others**\n    - ***Stable Cascade*** [[Blog]](https:\u002F\u002Fstability.ai\u002Fnews\u002Fintroducing-stable-cascade) [[Code]](https:\u002F\u002Fgithub.com\u002FStability-AI\u002FStableCascade)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n- \u003Cspan id=\"text-year-2023\">**Year 2023**\u003C\u002Fspan>\n  - **CVPR**\n    - ***GigaGAN:*** Scaling Up GANs for Text-to-Image Synthesis [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FKang_Scaling_Up_GANs_for_Text-to-Image_Synthesis_CVPR_2023_paper.pdf) [[Reproduced Code]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fgigagan-pytorch) [[Project]](https:\u002F\u002Fmingukkang.github.io\u002FGigaGAN\u002F) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ZjxtuDQkOPY&feature=youtu.be)\n    - ***ERNIE-ViLG 2.0:*** Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FFeng_ERNIE-ViLG_2.0_Improving_Text-to-Image_Diffusion_Model_With_Knowledge-Enhanced_Mixture-of-Denoising-Experts_CVPR_2023_paper.pdf)\n    - Shifted Diffusion for Text-to-image Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FZhou_Shifted_Diffusion_for_Text-to-Image_Generation_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fdrboog\u002FShifted_Diffusion)\n    - ***GALIP:*** Generative Adversarial CLIPs for Text-to-Image Synthesis [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FTao_GALIP_Generative_Adversarial_CLIPs_for_Text-to-Image_Synthesis_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Ftobran\u002FGALIP)\n    - ***Specialist Diffusion:*** Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models to Learn Any Unseen Style [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FLu_Specialist_Diffusion_Plug-and-Play_Sample-Efficient_Fine-Tuning_of_Text-to-Image_Diffusion_Models_To_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FSpecialist-Diffusion)\n    - Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FOtani_Toward_Verifiable_and_Reproducible_Human_Evaluation_for_Text-to-Image_Generation_CVPR_2023_paper.pdf)\n    - ***RIATIG:*** Reliable and Imperceptible Adversarial Text-to-Image Generation with Natural Prompts [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FLiu_RIATIG_Reliable_and_Imperceptible_Adversarial_Text-to-Image_Generation_With_Natural_Prompts_CVPR_2023_paper.pdf) 
[[Code]](https:\u002F\u002Fgithub.com\u002FWUSTL-CSPL\u002FRIATIG)\n    - Multi-Concept Customization of Text-to-Image Diffusion [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FKumari_Multi-Concept_Customization_of_Text-to-Image_Diffusion_CVPR_2023_paper.pdf) [[Project]](https:\u002F\u002Fwww.cs.cmu.edu\u002F~custom-diffusion\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fadobe-research\u002Fcustom-diffusion) \n  - **ICCV**\n    - ***DiffFit:*** Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FXie_DiffFit_Unlocking_Transferability_of_Large_Diffusion_Models_via_Simple_Parameter-efficient_ICCV_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fmkshing\u002FDiffFit-pytorch) [[Demo]](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fmkshing\u002Fdifffit-pytorch\u002Fblob\u002Fmain\u002Fscripts\u002Fdifffit_pytorch.ipynb)\n  - **NeurIPS**\n    - ***ImageReward:*** Learning and Evaluating Human Preferences for Text-to-Image Generation [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=JVzeOYEx6d) [[Code]](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FImageReward)\n    - ***RAPHAEL***: Text-to-Image Generation via Large Mixture of Diffusion Paths [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.18295) [[Project]](https:\u002F\u002Fraphael-painter.github.io\u002F)\n    - Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=AOKU4nRw1W) [[Code]](https:\u002F\u002Fgithub.com\u002FRoyiRa\u002FLinguistic-Binding-in-Diffusion-Models)\n    - ***DenseDiffusion:*** Dense Text-to-Image Generation with Attention Modulation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FKim_Dense_Text-to-Image_Generation_with_Attention_Modulation_ICCV_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fnaver-ai\u002Fdensediffusion)\n  - **ICLR**\n    - Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=PUIqjT4rzq7) [[Code]](https:\u002F\u002Fgithub.com\u002Fweixi-feng\u002FStructured-Diffusion-Guidance)\n  - **ICML**\n    - ***StyleGAN-T:*** Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Fsauer23a\u002Fsauer23a.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Fstylegan-t) [[Project]](https:\u002F\u002Fsites.google.com\u002Fview\u002Fstylegan-t\u002F) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=MMj8OTOUIok)\n    - ***Muse:*** Text-To-Image Generation via Masked Generative Transformers [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Fchang23b\u002Fchang23b.pdf) [[Reproduced Code]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fmuse-maskgit-pytorch) [[Project]](https:\u002F\u002Fmuse-icml.github.io\u002F)\n    - ***UniDiffusers:*** One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.06555) [[Code]](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002Funidiffuser)\n  - **ACM MM**\n    - ***SUR-adapter:*** Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.05189.pdf) 
[[Code]](https:\u002F\u002Fgithub.com\u002FQrange-group\u002FSUR-adapter)\n    - ***ControlStyle:*** Text-Driven Stylized Image Generation Using Diffusion Priors [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.05463.pdf)\n  - **SIGGRAPH**\n    - ***Attend-and-Excite:*** Attention-Based Semantic Guidance for Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2301.13826.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fyuval-alaluf\u002FAttend-and-Excite) [[Project]](https:\u002F\u002Fyuval-alaluf.github.io\u002FAttend-and-Excite\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAttendAndExcite\u002FAttend-and-Excite)\n  - **arXiv**\n    - ***P+:*** Extended Textual Conditioning in Text-to-Image Generation [[Paper]](https:\u002F\u002Fprompt-plus.github.io\u002Ffiles\u002FPromptPlus.pdf)\n    - ***SDXL-Turbo:*** Adversarial Diffusion Distillation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.17042.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fgenerative-models)\n    - ***Wuerstchen:*** An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.00637.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fdome272\u002FWuerstchen)\n    - ***StreamDiffusion:*** A Pipeline-level Solution for Real-time Interactive Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.12491.pdf) [[Project]](https:\u002F\u002Fgithub.com\u002Fcumulo-autumn\u002FStreamDiffusion)\n    - ***ParaDiffusion:*** Paragraph-to-Image Generation with Information-Enriched Diffusion Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.14284) [[Code]](https:\u002F\u002Fgithub.com\u002Fweijiawu\u002FParaDiffusion)\n  - **Others**\n    - ***DALL-E 3:*** Improving Image Generation with Better Captions [[Paper]](https:\u002F\u002Fcdn.openai.com\u002Fpapers\u002Fdall-e-3.pdf)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n- \u003Cspan id=\"text-year-2022\">**Year 2022**\u003C\u002Fspan>\n  - **CVPR**\n    - 🔥 ***Stable Diffusion:*** High-Resolution Image Synthesis With Latent Diffusion Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FRombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FCompVis\u002Flatent-diffusion) [[Project]](https:\u002F\u002Fommer-lab.com\u002Fresearch\u002Flatent-diffusion-models\u002F)\n    - Vector Quantized Diffusion Model for Text-to-Image Synthesis [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FGu_Vector_Quantized_Diffusion_Model_for_Text-to-Image_Synthesis_CVPR_2022_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fcientgu\u002FVQ-Diffusion)\n    - ***DF-GAN:*** A Simple and Effective Baseline for Text-to-Image Synthesis [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FTao_DF-GAN_A_Simple_and_Effective_Baseline_for_Text-to-Image_Synthesis_CVPR_2022_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Ftobran\u002FDF-GAN)\n    - ***LAFITE:*** Towards Language-Free Training for Text-to-Image Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FZhou_Towards_Language-Free_Training_for_Text-to-Image_Generation_CVPR_2022_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fdrboog\u002FLafite)\n    - Text-to-Image Synthesis 
based on Object-Guided Joint-Decoding Transformer [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FWu_Text-to-Image_Synthesis_Based_on_Object-Guided_Joint-Decoding_Transformer_CVPR_2022_paper.pdf)\n    - ***StyleT2I:*** Toward Compositional and High-Fidelity Text-to-Image Synthesis [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FLi_StyleT2I_Toward_Compositional_and_High-Fidelity_Text-to-Image_Synthesis_CVPR_2022_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fzhihengli-UR\u002FStyleT2I)\n  - **ECCV**\n    - ***Make-A-Scene:*** Scene-Based Text-to-Image Generation with Human Priors [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136750087.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FCasualGANPapers\u002FMake-A-Scene) [[Demo]](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1SPyQ-epTsAOAu8BEohUokN4-b5RM_TnE?usp=sharing)\n    - Trace Controlled Text to Image Generation [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136960058.pdf)\n    - Improved Masked Image Generation with Token-Critic [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136830070.pdf)\n    - ***VQGAN-CLIP:*** Open Domain Image Generation and Manipulation Using Natural Language [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136970088.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Fvqgan-clip)\n    - ***TISE:*** Bag of Metrics for Text-to-Image Synthesis Evaluation [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136960585.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FVinAIResearch\u002Ftise-toolbox)\n    - ***StoryDALL-E:*** Adapting Pretrained Text-to-image Transformers for Story Continuation [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136970070.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fadymaharana\u002Fstorydalle) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FECCV2022\u002Fstorydalle)\n  - **NeurIPS**\n    - ***CogView2:*** Faster and Better Text-to-Image Generation via Hierarchical Transformers [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=GkDbQb6qu_r) [[Code]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=GkDbQb6qu_r)\n    - ***Imagen:*** Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [[Paper]](https:\u002F\u002Fpapers.nips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002Fec795aeadae0b7d230fa35cbaf04c041-Paper-Conference.pdf) [[Reproduced Code]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fimagen-pytorch) [[Project]](https:\u002F\u002Fimagen.research.google\u002F) [[***Imagen 2***]](https:\u002F\u002Fdeepmind.google\u002Ftechnologies\u002Fimagen-2\u002F)\n  - **ACM MM**\n    - ***Adma-GAN:*** Attribute-Driven Memory Augmented GANs for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.14046.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FHsintien-Ng\u002FAdma-GAN)\n    - Background Layout Generation and Object Knowledge Transfer for Text-to-Image Generation [[Paper]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3503161.3548154)\n    - ***DSE-GAN:*** Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation 
[[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.01339.pdf)\n    - ***AtHom:*** Two Divergent Attentions Stimulated By Homomorphic Training in Text-to-Image Synthesis [[Paper]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3503161.3548159)\n  - **arXiv**\n    - ***DALLE-2:*** Hierarchical Text-Conditional Image Generation with CLIP Latents [[Paper]](https:\u002F\u002Fcdn.openai.com\u002Fpapers\u002Fdall-e-2.pdf)\n    - ***PITI:*** Pretraining is All You Need for Image-to-Image Translation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2205.12952.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FPITI-Synthesis\u002FPITI)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n- \u003Cspan id=\"text-year-2021\">**Year 2021**\u003C\u002Fspan>\n  - **ICCV**\n    -  ***DAE-GAN:*** Dynamic Aspect-aware GAN for Text-to-Image Synthesis [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fpapers\u002FRuan_DAE-GAN_Dynamic_Aspect-Aware_GAN_for_Text-to-Image_Synthesis_ICCV_2021_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fhiarsal\u002FDAE-GAN)\n  - **NeurIPS**\n    - ***CogView:*** Mastering Text-to-Image Generation via Transformers [[Paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2021\u002Ffile\u002Fa4d92e2cd541fca87e4620aba658316d-Paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FCogView) [[Demo]](https:\u002F\u002Fthudm.github.io\u002FCogView\u002Findex.html)\n    - ***UFC-BERT:*** Unifying Multi-Modal Controls for Conditional Image Synthesis [[Paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2021\u002Ffile\u002Fe46bc064f8e92ac2c404b9871b2a4ef2-Paper.pdf)\n  - **ICML**\n    - ***DALLE-1:*** Zero-Shot Text-to-Image Generation [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Framesh21a\u002Framesh21a.pdf) [[Reproduced Code]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002FDALLE-pytorch)\n   -  **ACM MM**\n      -  Cycle-Consistent Inverse GAN for Text-to-Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.01361.pdf)\n      -  ***R-GAN:*** Exploring Human-like Way for Reasonable Text-to-Image Synthesis via Generative Adversarial Networks [[Paper]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3474085.3475363)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n- \u003Cspan id=\"text-year-2020\">**Year 2020**\u003C\u002Fspan>\n  - **ACM MM**\n    - Text-to-Image Synthesis via Aesthetic Layout [[Paper]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3394171.3414357)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n## Conditional Text-to-Image Generation\n- \u003Cspan id=\"conditional-year-2025\">**Year 2025**\u003C\u002Fspan>\n  - **CVPR**\n    - Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.01515) [[Code]](https:\u002F\u002Fgithub.com\u002FZixuanWang0525\u002FDADG)\n  - **ICCV**\n    - ***UNO:*** A Universal Customization Method for Both Single and Multi‑Subject Conditioning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.02160) [[Project]](https:\u002F\u002Fbytedance.github.io\u002FUNO) [[Code]](https:\u002F\u002Fgithub.com\u002Fbytedance\u002FUNO)\n    - ***CoMPaSS:*** Enhancing Spatial Understanding in Text‑to‑Image Diffusion Models 
[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.13195) [[Project]](https:\u002F\u002Fcompass.blurgy.xyz) [[Code]](https:\u002F\u002Fgithub.com\u002Fblurgyy\u002FCoMPaSS)\n    - ***SP‑Ctrl:*** Rethink Sparse Signals for Pose‑Guided Text‑to‑Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.20983) [[Code]](https:\u002F\u002Fgithub.com\u002FDREAMXFAR\u002FSP-Ctrl)\n    - ***CompCon:*** Discovering Divergent Representations Between Text‑to‑Image Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.08940) [[Code]](https:\u002F\u002Fgithub.com\u002Fadobe-research\u002FCompCon)\n    - ***C2OT:*** The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow‑Based Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.10636) [[Project]](https:\u002F\u002Fhkchengrex.com\u002FC2OT) [[Code]](https:\u002F\u002Fgithub.com\u002Fhkchengrex\u002FC2OT)\n    - ***RAG‑Diffusion:*** Region‑Aware Text‑to‑Image Generation via Hard Binding and Soft Refinement [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.06558) [[Project]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FNJU\u002FRAG-Diffusion) [[Code]](https:\u002F\u002Fgithub.com\u002FNJU-PCALab\u002FRAG-Diffusion)\n    - ***CharaConsist:*** Fine‑Grained Consistent Character Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.11533) [[Project]](https:\u002F\u002Fmurray-wang.github.io\u002FCharaConsist) [[Code]](https:\u002F\u002Fgithub.com\u002FMurray-Wang\u002FCharaConsist)\n    - ***Shadow Director:*** Parametric Shadow Control for Portrait Generation in Text‑to‑Image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.21943) [[Project]](https:\u002F\u002Fhm-cai.com\u002FShadowDirector)\n    - ***ImageGen‑CoT:*** Enhancing Text‑to‑Image In‑Context Learning with Chain‑of‑Thought Reasoning [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.19312) [[Project]](https:\u002F\u002Fimagegen-cot.github.io)\n\n- \u003Cspan id=\"conditional-year-2024\">**Year 2024**\u003C\u002Fspan>\n  - **CVPR**\n    - ***PLACE:*** Adaptive Layout-Semantic Fusion for Semantic Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.01852.pdf)\n    - One-Shot Structure-Aware Stylized Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.17275.pdf)\n    - Grounded Text-to-Image Synthesis with Attention Refocusing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.05427.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FAttention-Refocusing\u002Fattention-refocusing) [[Project]](https:\u002F\u002Fattention-refocusing.github.io\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fattention-refocusing\u002FAttention-refocusing)\n    - Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.18078.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FYanzuoLu\u002FCFLD)\n    - ***DetDiffusion:*** Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.13304)\n    - ***CAN:*** Condition-Aware Neural Network for Controlled Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.01143.pdf)\n    - ***SceneDiffusion:*** Move Anything with Layered Scene Diffusion [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.07178)\n    - ***Zero-Painter:*** Training-Free Layout Control for Text-to-Image Synthesis 
[[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FOhanyan_Zero-Painter_Training-Free_Layout_Control_for_Text-to-Image_Synthesis_CVPR_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FZero-Painter)\n    - ***MIGC:*** Multi-Instance Generation Controller for Text-to-Image Synthesis [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhou_MIGC_Multi-Instance_Generation_Controller_for_Text-to-Image_Synthesis_CVPR_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Flimuloo\u002FMIGC) [[Project]](https:\u002F\u002Fmigcproject.github.io\u002F)\n    - ***FreeControl:*** Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FMo_FreeControl_Training-Free_Spatial_Control_of_Any_Text-to-Image_Diffusion_Model_with_CVPR_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fgenforce\u002Ffreecontrol) [[Project]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FMo_FreeControl_Training-Free_Spatial_Control_of_Any_Text-to-Image_Diffusion_Model_with_CVPR_2024_paper.pdf)\n  - **ECCV**\n    - ***PreciseControl:*** Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.05083) [[Code]](https:\u002F\u002Fgithub.com\u002Frishubhpar\u002FPreciseControl) [[Project]](https:\u002F\u002Frishubhpar.github.io\u002FPreciseControl.home\u002F)\n    - ***AnyControl:*** Create Your Artwork with Versatile Control on Text-to-Image Generation [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F01706.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002FAnyControl)\n  - **NeurIPS**\n    - ***Ctrl-X:*** Controlling Structure and Appearance for Text-To-Image Generation Without Guidance [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.07540) [[Code]](https:\u002F\u002Fgithub.com\u002Fgenforce\u002Fctrl-x) [[Project]](https:\u002F\u002Fgenforce.github.io\u002Fctrl-x\u002F)\n  - **ICLR**\n    - Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.06313.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fmuzishen\u002FPCDMs)\n  - **WACV**\n    - Training-Free Layout Control with Cross-Attention Guidance [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FWACV2024\u002Fpapers\u002FChen_Training-Free_Layout_Control_With_Cross-Attention_Guidance_WACV_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fsilent-chen\u002Flayout-guidance) [[Project]](https:\u002F\u002Fsilent-chen.github.io\u002Flayout-guidance\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fsilentchen\u002Flayout-guidance)\n  - **AAAI**\n    - ***SSMG:*** Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.10156.pdf)\n    - Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13921.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FOPPO-Mente-Lab\u002Fattention-mask-control)\n  - **arXiv**\n    - ***DEADiff:*** An Efficient Stylization Diffusion Model with Disentangled Representations [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.06951)\n  
  - ***InstantStyle:*** Free Lunch towards Style-Preserving in Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.02733.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FInstantStyle\u002FInstantStyle) [[Project]](https:\u002F\u002Finstantstyle.github.io\u002F)\n    - ***ControlNet++:*** Improving Conditional Controls with Efficient Consistency Feedback [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.07987.pdf) [[Project]](https:\u002F\u002Fliming-ai.github.io\u002FControlNet_Plus_Plus\u002F)\n    - ***Hunyuan-DiT:*** A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.08748) [[Code]](https:\u002F\u002Fgithub.com\u002FTencent\u002FHunyuanDiT) [[Project]](https:\u002F\u002Fdit.hunyuan.tencent.com\u002F)\n    - ***DialogGen:*** Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.08857) [[Code]](https:\u002F\u002Fgithub.com\u002FCentaurusalpha\u002FDialogGen) [[Project]](https:\u002F\u002Fhunyuan-dialoggen.github.io\u002F)\n    - ***ControlNeXt:*** Powerful and Efficient Control for Image and Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.06070) [[Code]](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FControlNeXt) [[Project]](https:\u002F\u002Fpbihao.github.io\u002Fprojects\u002Fcontrolnext\u002Findex.html)\n    - ***UniPortrait:*** A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.05939) [[Code]](https:\u002F\u002Fgithub.com\u002Fjunjiehe96\u002FUniPortrait) [[Project]](https:\u002F\u002Faigcdesigngroup.github.io\u002FUniPortrait-Page\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FJunjie96\u002FUniPortrait)\n    - ***OminiControl:*** Minimal and Universal Control for Diffusion Transformer [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.15098) [[Code]](https:\u002F\u002Fgithub.com\u002FYuanshi9815\u002FOminiControl) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FYuanshi\u002FOminiControl)\n    - ***UnZipLoRA:*** Separating Content and Style from a Single Image [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.04465) [[Project]](https:\u002F\u002Funziplora.github.io\u002F)\n    - ***CtrLoRA:*** An Extensible and Efficient Framework for Controllable Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.09400) [[Code]](https:\u002F\u002Fgithub.com\u002FxyfJASON\u002Fctrlora)\n    - Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.06558) [[Code]](https:\u002F\u002Fgithub.com\u002FNJU-PCALab\u002FRAG-Diffusion)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n- \u003Cspan id=\"conditional-year-2023\">**Year 2023**\u003C\u002Fspan>\n  - **CVPR**\n    - ***GLIGEN:*** Open-Set Grounded Text-to-Image Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FLi_GLIGEN_Open-Set_Grounded_Text-to-Image_Generation_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN) [[Project]](https:\u002F\u002Fgligen.github.io\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fgligen\u002Fdemo) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=-MCkU7IAGKs&feature=youtu.be)\n    - 
Autoregressive Image Generation using Residual Quantization [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FLee_Autoregressive_Image_Generation_Using_Residual_Quantization_CVPR_2022_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fkakaobrain\u002Frq-vae-transformer)\n    - ***SpaText:*** Spatio-Textual Representation for Controllable Image Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FAvrahami_SpaText_Spatio-Textual_Representation_for_Controllable_Image_Generation_CVPR_2023_paper.pdf) [[Project]](https:\u002F\u002Fomriavrahami.com\u002Fspatext\u002F) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=VlieNoCwHO4)\n    - Text to Image Generation with Semantic-Spatial Aware GAN [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FLiao_Text_to_Image_Generation_With_Semantic-Spatial_Aware_GAN_CVPR_2022_paper.pdf)\n    - ***ReCo:*** Region-Controlled Text-to-Image Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FYang_ReCo_Region-Controlled_Text-to-Image_Generation_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FReCo)\n    - ***LayoutDiffusion:*** Controllable Diffusion Model for\n    Layout-to-image Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FZheng_LayoutDiffusion_Controllable_Diffusion_Model_for_Layout-to-Image_Generation_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FZGCTroy\u002FLayoutDiffusion)\n  - **ICLR**\n    - ***Ctrl-U:*** Robust Conditional Image Generation via Uncertainty-aware Reward Modeling [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=eC2ICbECNM) [[Project]](https:\u002F\u002Fgrenoble-zhang.github.io\u002FCtrl-U-Page\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fgrenoble-zhang\u002FCtrl-U)\n  - **ICCV**\n    - ***ControlNet:*** Adding Conditional Control to Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FZhang_Adding_Conditional_Control_to_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet)\n    - ***SceneGenie:*** Scene Graph Guided Diffusion Models for Image Synthesis [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023W\u002FSG2RL\u002Fpapers\u002FFarshad_SceneGenie_Scene_Graph_Guided_Diffusion_Models_for_Image_Synthesis_ICCVW_2023_paper.pdf) [[Code]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023W\u002FSG2RL\u002Fpapers\u002FFarshad_SceneGenie_Scene_Graph_Guided_Diffusion_Models_for_Image_Synthesis_ICCVW_2023_paper.pdf)\n    - ***ZestGuide:*** Zero-Shot Spatial Layout Conditioning for Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FCouairon_Zero-Shot_Spatial_Layout_Conditioning_for_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf)\n  - **ICML**\n    - ***Composer:*** Creative and Controllable Image Synthesis with Composable Conditions [[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Fhuang23b\u002Fhuang23b.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002Fcomposer) [[Project]](https:\u002F\u002Fali-vilab.github.io\u002Fcomposer-page\u002F)\n    - ***MultiDiffusion:*** Fusing Diffusion Paths for Controlled Image Generation 
[[Paper]](https:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Fbar-tal23a\u002Fbar-tal23a.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fomerbt\u002FMultiDiffusion) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=D2Q0D1gIeqg) [[Project]](https:\u002F\u002Fmultidiffusion.github.io\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fweizmannscience\u002FMultiDiffusion)\n  - **SIGGRAPH** \n    - Sketch-Guided Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3588432.3591560) [[Reproduced Code]](https:\u002F\u002Fgithub.com\u002Fogkalu2\u002FSketch-Guided-Stable-Diffusion) [[Project]](https:\u002F\u002Fsketch-guided-diffusion.github.io\u002F)\n  - **NeurIPS**\n    - ***Uni-ControlNet:*** All-in-One Control to Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.16322.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FShihaoZhaoZSH\u002FUni-ControlNet) [[Project]](https:\u002F\u002Fshihaozhaozsh.github.io\u002Funicontrolnet\u002F)\n    - ***Prompt Diffusion:*** In-Context Learning Unlocked for Diffusion Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=6BZS2EAkns) [[Code]](https:\u002F\u002Fgithub.com\u002FZhendong-Wang\u002FPrompt-Diffusion) [[Project]](https:\u002F\u002Fzhendong-wang.github.io\u002Fprompt-diffusion.github.io\u002F)\n  - **WACV**\n    - More Control for Free! Image Synthesis with Semantic Diffusion Guidance [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FWACV2023\u002Fpapers\u002FLiu_More_Control_for_Free_Image_Synthesis_With_Semantic_Diffusion_Guidance_WACV_2023_paper.pdf)\n  - **ACM MM**\n    -  ***LayoutLLM-T2I:*** Eliciting Layout Guidance from LLM for Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.05095.pdf)\n  - **arXiv**\n    - ***T2I-Adapter:*** Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.08453.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FT2I-Adapter) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FTencentARC\u002FT2I-Adapter-SDXL)\n    - ***BLIP-Diffusion:*** Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.14720.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FLAVIS\u002Ftree\u002Fmain\u002Fprojects\u002Fblip-diffusion)\n    - Late-Constraint Diffusion Guidance for Controllable Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.11520) [[Code]](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002FLCDG)\n- \u003Cspan id=\"conditional-year-2022\">**Year 2022**\u003C\u002Fspan>\n  - **ICLR**\n    - ***SDEdit:*** Guided Image Synthesis and Editing with Stochastic Differential Equations [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=aBsCjcPu_tE) [[Code]](https:\u002F\u002Fgithub.com\u002Fermongroup\u002FSDEdit) [[Project]](https:\u002F\u002Fsde-image-editing.github.io\u002F)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n## Personalized Text-to-Image Generation\n- \u003Cspan id=\"personalized-year-2025\">**Year 2025**\u003C\u002Fspan>\n  - **CVPR**\n    - ***SerialGen:*** Personalized Image Generation by First Standardization Then Personalization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.01485) 
[[Project]](https:\u002F\u002Fserialgen.github.io\u002F)\n    - ***PatchDPO:*** Patch-level DPO for Finetuning-free Personalized Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.03177) [[Code]](https:\u002F\u002Fgithub.com\u002FhqhQAQ\u002FPatchDPO)\n    - ***DreamCache:*** Finetuning-Free Lightweight Personalized Image Generation via Feature Caching [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.17786v1)\n  - **ICCV**\n    - ***DrUM:*** Draw Your Mind: Personalized Generation via Condition‑Level Modeling in Text‑to‑Image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.03481) [[Code]](https:\u002F\u002Fgithub.com\u002FBurf\u002FDrUM)\n    - ***PersonaCraft:*** Personalized and Controllable Full‑Body Multi‑Human Scene Generation Using Occlusion‑Aware 3D‑Conditioned Diffusion [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.18068) [[Project]](https:\u002F\u002Fgwang-kim.github.io\u002Fpersona_craft\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fgwang-kim\u002FPersonaCraft)\n    - ***Steering Guidance:*** Steering Guidance for Personalized Text‑to‑Image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.00319)\n    - ***FreeCus:*** FreeCus: Free Lunch Subject‑Driven Customization in Diffusion Transformers [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.15249) [[Code]](https:\u002F\u002Fgithub.com\u002FMonalissaa\u002FFreeCus)\n    - ***PromptDresser:*** Improving the Quality and Controllability of Virtual Try‑On via Generative Textual Prompt and Prompt‑aware Mask [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.16978) [[Code]](https:\u002F\u002Fgithub.com\u002Frlawjdghek\u002FPromptDresser)\n    - ***DynamicID:*** Zero‑Shot Multi‑ID Image Personalization with Flexible Facial Editability [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.06505) [[Code]](https:\u002F\u002Fgithub.com\u002FByteCat-bot\u002FDynamicID)\n    - ***UniversalBooth:*** Model‑Agnostic Personalized Text‑to‑Image Generation\n    - ***ARBooth:*** Fine‑Tuning Visual Autoregressive Models for Subject‑Driven Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.02612) [[Project]](https:\u002F\u002Fjiwoogit.github.io\u002FARBooth\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fjiwoogit\u002FARBooth)\n    - ⚠️ ***ConceptSplit:*** Decoupled Multi‑Concept Personalization of Diffusion Models via Token‑wise Adaptation and Attention Disentanglement [[Code]](https:\u002F\u002Fgithub.com\u002FKU-VGI\u002FConceptSplit)\n    - ⚠️ ***ObjectMate:*** A Recurrence Prior for Object Insertion and Subject‑Driven Generation [[Project]](https:\u002F\u002Fobject-mate.com)\n  - **NeurIPS**\n    - ***MS-Diffusion:*** Multi-Subject Zero-Shot Image Personalized with Layout Guidance [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.07209) [[Project]](https:\u002F\u002Fms-diffusion.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FMS-Diffusion\u002FMS-Diffusion)\n    - ***ClassDiffusion:*** More Aligned Personalization Tuning with Explicit Class Guidance [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=iTm4H6N4aG) [[Project]](https:\u002F\u002Fclassdiffusion.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FRbrq03\u002FClassDiffusion)\n    - ***DreamBench++:*** A Human-Aligned Benchmark for Personalized Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.16855) [[Project]](https:\u002F\u002Fdreambenchplus.github.io\u002F)\n    - ***TweedieMix:*** Improving 
Multi-Concept Fusion for Diffusion-based Image\u002FVideo Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.05591) [[Code]](https:\u002F\u002Fgithub.com\u002FKwonGihyun\u002FTweedieMix)\n- \u003Cspan id=\"personalized-year-2024\">**Year 2024**\u003C\u002Fspan>\n  - **CVPR**\n    - Cross Initialization for Personalized Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.15905.pdf)\n    - When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.17461.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fcsxmli2016\u002Fw-plus-adapter) [[Project]](https:\u002F\u002Fcsxmli2016.github.io\u002Fprojects\u002Fw-plus-adapter\u002F)\n    - Style Aligned Image Generation via Shared Attention [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.02133.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fstyle-aligned) [[Project]](https:\u002F\u002Fstyle-aligned-gen.github.io\u002F)\n    - ***InstantBooth:*** Personalized Text-to-Image Generation without Test-Time Finetuning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.03411.pdf) [[Project]](https:\u002F\u002Fjshi31.github.io\u002FInstantBooth\u002F)\n    - High Fidelity Person-centric Subject-to-Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.10329.pdf)\n    - ***RealCustom:*** Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.00483.pdf) [[Project]](https:\u002F\u002Fcorleone-huang.github.io\u002Frealcustom\u002F)\n    - ***DisenDiff:*** Attention Calibration for Disentangled Text-to-Image Personalization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.18551) [[Code]](https:\u002F\u002Fgithub.com\u002FMonalissaa\u002FDisenDiff)\n    - ***FreeCustom:*** Tuning-Free Customized Image Generation for Multi-Concept Composition [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.13870v1) [[Code]](https:\u002F\u002Fgithub.com\u002Faim-uofa\u002FFreeCustom) [[Project]](https:\u002F\u002Faim-uofa.github.io\u002FFreeCustom\u002F)\n    - Personalized Residuals for Concept-Driven Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.12978)\n    - Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.01356)\n    - ***JeDi:*** Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZeng_JeDi_Joint-Image_Diffusion_Models_for_Finetuning-Free_Personalized_Text-to-Image_Generation_CVPR_2024_paper.pdf)\n    - Countering Personalized Text-to-Image Generation with Influence Watermarks [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLiu_Countering_Personalized_Text-to-Image_Generation_with_Influence_Watermarks_CVPR_2024_paper.pdf)\n    - ***PIA:*** Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhang_PIA_Your_Personalized_Image_Animator_via_Plug-and-Play_Modules_in_Text-to-Image_CVPR_2024_paper.pdf) [[Project]](https:\u002F\u002Fpi-animator.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002FPIA)\n    - ***SSR-Encoder:*** Encoding Selective Subject Representation for Subject-Driven 
Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhang_SSR-Encoder_Encoding_Selective_Subject_Representation_for_Subject-Driven_Generation_CVPR_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FXiaojiu-z\u002FSSR_Encoder)\n  - **ECCV**\n    - Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.16990) [[Project]](https:\u002F\u002Fomer11a.github.io\u002Fbounded-attention\u002F)\n    - Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning [[Paper]](http:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06642v1) [[Code]](https:\u002F\u002Fgithub.com\u002Fwfanyue\u002FDPG-T2I-Personalization)\n    - ***TIGC:*** Tuning-Free Image Customization with Image and Text Guidance [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.12658) [[Code]](https:\u002F\u002Fgithub.com\u002Fzrealli\u002FTIGIC) [[Project]](https:\u002F\u002Fzrealli.github.io\u002FTIGIC\u002F)\n    - ***MasterWeaver:*** Taming Editability and Face Identity for Personalized Text-to-Image Generation [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F06786.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fcsyxwei\u002FMasterWeaver) [[Project]](https:\u002F\u002Fmasterweaver.github.io\u002F)\n  - **NeurIPS**\n    - ***RectifID:*** Personalizing Rectified Flow with Anchored Classifier Guidance [[Paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2024\u002Ffile\u002Fafa58a5b6adc0845e0fd632132a64c39-Paper-Conference.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Ffeifeiobama\u002FRectifID)\n    - ***AttnDreamBooth:*** Towards Text-Aligned Personalized Image Generation [[Paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2024\u002Ffile\u002F465a13a95741fab2e912f98adb07df1d-Paper-Conference.pdf) [[Project]](https:\u002F\u002Fattndreambooth.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FlyuPang\u002FAttnDreamBooth)\n  - **AAAI**\n    - Decoupled Textual Embeddings for Customized Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.11826.pdf)\n  - **arXiv**\n    - ***FlashFace:*** Human Image Personalization with High-fidelity Identity Preservation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.17008) [[Code]](https:\u002F\u002Fgithub.com\u002Fjshilong\u002FFlashFace) [[Project]](https:\u002F\u002Fjshilong.github.io\u002Fflashface-page)\n    - ***MoMA:*** Multimodal LLM Adapter for Fast Personalized Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.05674)\n    - ***IDAdapter:*** Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.13535)\n    - ***CoRe:*** Context-Regularized Text Embedding Learning for Text-to-Image Personalization [[Paper]](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2408.15914)\n    - ***Imagine yourself:*** Tuning-Free Personalized Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.13346) [[Project]](https:\u002F\u002Fai.meta.com\u002Fresearch\u002Fpublications\u002Fimagine-yourself-tuning-free-personalized-image-generation\u002F)\n- \u003Cspan id=\"personalized-year-2023\">**Year 2023**\u003C\u002Fspan>\n  - **CVPR**\n    - ***Custom Diffusion:*** Multi-Concept Customization of Text-to-Image Diffusion 
[[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FKumari_Multi-Concept_Customization_of_Text-to-Image_Diffusion_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fadobe-research\u002Fcustom-diffusion) [[Project]](https:\u002F\u002Fwww.cs.cmu.edu\u002F~custom-diffusion\u002F)\n    - ***DreamBooth:*** Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FRuiz_DreamBooth_Fine_Tuning_Text-to-Image_Diffusion_Models_for_Subject-Driven_Generation_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fdreambooth) [[Project]](https:\u002F\u002Fdreambooth.github.io\u002F)\n  - **ICCV**\n    - ***ELITE:*** Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FWei_ELITE_Encoding_Visual_Concepts_into_Textual_Embeddings_for_Customized_Text-to-Image_ICCV_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fcsyxwei\u002FELITE)\n  - **ICLR**\n    - ***Textual Inversion:*** An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=NAQvF08TcyG) [[Code]](https:\u002F\u002Fgithub.com\u002Frinongal\u002Ftextual_inversion) [[Project]](https:\u002F\u002Ftextual-inversion.github.io\u002F)\n  - **SIGGRAPH**\n    - ***Break-A-Scene:*** Extracting Multiple Concepts from a Single Image [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.16311.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fbreak-a-scene)\n    - Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.12228.pdf) [[Project]](https:\u002F\u002Ftuning-encoder.github.io\u002F)\n    - ***LayerDiffusion:*** Layered Controlled Image Editing with Diffusion Models [[Paper]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3610543.3626172)\n  - **arXiv**\n    - ***DreamTuner:*** Single Image is Enough for Subject-Driven Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.13691.pdf) [[Project]](https:\u002F\u002Fdreamtuner-diffusion.github.io\u002F)\n    - ***PhotoMaker:*** Customizing Realistic Human Photos via Stacked ID Embedding [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.04461.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FPhotoMaker)\n    - ***IP-Adapter:*** Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.06721.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Ftencent-ailab\u002FIP-Adapter) [[Project]](https:\u002F\u002Fip-adapter.github.io\u002F)\n    - ***FastComposer:*** Tuning-Free Multi-Subject Image Generation with Localized Attention [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.10431.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Ffastcomposer)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n## Text-Guided Image Editing\n- \u003Cspan id=\"editing-year-2025\">**Year 2025**\u003C\u002Fspan>\n  - **CVPR**\n    - ***FDS:*** Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.19191)\n    - Reference-Based 3D-Aware Image Editing with 
Triplanes [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.03632)\n    - ***MoEdit:*** On Learning Quantity Perception for Multi-object Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.10112)\n    - ⚠️ FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation [Paper]\n  - **ICCV**\n    - ***In-Context Edit:*** Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.20690) [[Project]](https:\u002F\u002Friver-zhang.github.io\u002FICEdit-gh-pages\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FRiver-Zhang\u002FICEdit?tab=readme-ov-file)\n    - ***Dual‑Conditional Inversion:*** for Boosting Diffusion‑Based Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.02560)\n    - ***CAMILA:*** Context‑Aware Masking for Image Editing with Language Alignment [[Paper]](https:\u002F\u002Fneurips.cc\u002Fvirtual\u002F2025\u002Fposter\u002F119101)\n    - ***EditInfinity:*** Image Editing with Binary‑Quantized Generative Models [[Paper]](https:\u002F\u002Fneurips.cc\u002Fvirtual\u002F2025\u002Fposter\u002F115392)\n    - ***KRIS‑Bench:*** Benchmarking Knowledge‑Based Reasoning in Image Editing Systems [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.16707) [[Project]](https:\u002F\u002Fyongliang-wu.github.io\u002Fkris_bench_project_page\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fmercurystraw\u002FKris_Bench)\n    - ***LoongX:*** Neural-Driven Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.05397) [[Project]](https:\u002F\u002Floongx1.github.io) [[Code]](https:\u002F\u002Fgithub.com\u002FLanceZPF\u002Floongx)\n    - ***CREA:*** CREA: A Collaborative Multi‑Agent Framework for Creative Image Editing and Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.05306) [[Project]](https:\u002F\u002Fcrea-diffusion.github.io)\n    - ***IEAP:*** Image Editing As Programs with Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04158) [[Project]](https:\u002F\u002Fyujiahu1109.github.io\u002FIEAP\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FYujiaHu1109\u002FIEAP)\n  - **ICLR**\n    - Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=t9l63huPRt)\n    - Multi-Reward as Condition for Instruction-based Image Editing [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=9RFocgIccP) [[Code]](https:\u002F\u002Fgithub.com\u002Fbytedance\u002FMulti-Reward-Editing)\n    - ***HQ-Edit:*** A High-Quality Dataset for Instruction-based Image Editing [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=mZptYYttFj) [[Dataset]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUCSC-VLAA\u002FHQ-Edit) [[Code]](https:\u002F\u002Fgithub.com\u002FUCSC-VLAA\u002FHQ-Edit)\n    - ***CLIPDrag:*** Combining Text-based and Drag-based Instructions for Image Editing [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=2HjRezQ1nj) [[Code]](https:\u002F\u002Fgithub.com\u002FHKUST-LongGroup\u002FCLIPDrag)\n    - Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=Hu0FSOSEyS) [[Project]](https:\u002F\u002Frf-inversion.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FLituRout\u002FRF-Inversion)\n    - ***PostEdit:*** Posterior Sampling for Efficient Zero-Shot Image Editing 
[[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=J8YWCBPgx7) [[Code]](https:\u002F\u002Fgithub.com\u002FTFNTF\u002FPostEdit)\n    - ***OmniEdit:*** Building Image Editing Generalist Models Through Specialist Supervision [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=Hlm0cga0sv) [[Project]](https:\u002F\u002Ftiger-ai-lab.github.io\u002FOmniEdit\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FOmniEdit) [[Dataset]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTIGER-Lab\u002FOmniEdit-Filtered-1.2M)\n\n- \u003Cspan id=\"editing-year-2024\">**Year 2024**\u003C\u002Fspan>\n  - **CVPR**\n    - ***InfEdit:*** Inversion-Free Image Editing with Natural Language [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.04965.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fsled-group\u002FInfEdit) [[Project]](https:\u002F\u002Fsled-group.github.io\u002FInfEdit\u002F)\n    - Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.03431.pdf)\n    - Doubly Abductive Counterfactual Inference for Text-based Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.02981.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fxuesong39\u002FDAC)\n    - Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.10113.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fguoqincode\u002FFocus-on-Your-Instruction)\n    - Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.18608.pdf)\n    - ***DragDiffusion:*** Harnessing Diffusion Models for Interactive Point-based Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.14435.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FYujun-Shi\u002FDragDiffusion)\n    - ***DiffEditor:*** Boosting Accuracy and Flexibility on Diffusion-based Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.02583.pdf)\n    - ***FreeDrag:*** Feature Dragging for Reliable Point-based Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.04684.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FLPengYang\u002FFreeDrag)\n    - Text-Driven Image Editing via Learnable Regions [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.16432.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fyuanze-lin\u002FLearnable_Regions) [[Project]](https:\u002F\u002Fyuanze-lin.me\u002FLearnableRegions_page\u002F) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=FpMWRXFraK8&feature=youtu.be)\n    - ***LEDITS++:*** Limitless Image Editing using Text-to-Image Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.16711.pdf) [[Code]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fediting-images\u002Fledtisplusplus\u002Ftree\u002Fmain) [[Project]](https:\u002F\u002Fleditsplusplus-project.static.hf.space\u002Findex.html) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fediting-images\u002Fleditsplusplus)\n    - ***SmartEdit:*** Exploring Complex Instruction-based Image Editing with Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.06739.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FSmartEdit) [[Project]](https:\u002F\u002Fyuzhou914.github.io\u002FSmartEdit\u002F)\n    - ***Edit One for All:*** Interactive Batch Image Editing 
[[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.10219.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fthaoshibe\u002Fedit-one-for-all) [[Project]](https:\u002F\u002Fthaoshibe.github.io\u002Fedit-one-for-all\u002F)\n    - ***DiffMorpher:*** Unleashing the Capability of Diffusion Models for Image Morphing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.07409) [[Code]](https:\u002F\u002Fgithub.com\u002FKevin-thu\u002FDiffMorpher) [[Project]](https:\u002F\u002Fkevin-thu.github.io\u002FDiffMorpher_page\u002F) [[Demo]](https:\u002F\u002Fopenxlab.org.cn\u002Fapps\u002Fdetail\u002FKaiwenZhang\u002FDiffMorpher)\n    - ***TiNO-Edit:*** Timestep and Noise Optimization for Robust Diffusion-Based Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.11120) [[Code]](https:\u002F\u002Fgithub.com\u002FSherryXTChen\u002FTiNO-Edit)\n    - Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_Person_in_Place_Generating_Associative_Skeleton-Guidance_Maps_for_Human-Object_Interaction_CVPR_2024_paper.pdf) [[Project]](https:\u002F\u002Fyangchanghee.github.io\u002FPerson-in-Place_page\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FYangChangHee\u002FCVPR2024_Person-In-Place_RELEASE)\n    - Referring Image Editing: Object-level Image Editing via Referring Expressions [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLiu_Referring_Image_Editing_Object-level_Image_Editing_via_Referring_Expressions_CVPR_2024_paper.pdf)\n    - Prompt Augmentation for Self-supervised Text-guided Image Manipulation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FBodur_Prompt_Augmentation_for_Self-supervised_Text-guided_Image_Manipulation_CVPR_2024_paper.pdf)\n    - The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FBobkov_The_Devil_is_in_the_Details_StyleFeatureEditor_for_Detail-Rich_StyleGAN_CVPR_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FAIRI-Institute\u002FStyleFeatureEditor)\n  \n\n  - **ECCV**\n    - ***RegionDrag:*** Fast Region-Based Image Editing with Diffusion Models [[Paper]](http:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.18247v1) [[Code]](https:\u002F\u002Fgithub.com\u002FVisual-AI\u002FRegionDrag) [[Project]](https:\u002F\u002Fvisual-ai.github.io\u002Fregiondrag\u002F) [[Demo]](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1pnq9t_1zZ8yL_Oba20eBLVZLp3glniBR?usp=sharing)\n    - ***TurboEdit:*** Instant text-based image editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.08332v1) [[Project]](https:\u002F\u002Fbetterze.github.io\u002FTurboEdit\u002F)\n    - ***InstructGIE:*** Towards Generalizable Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.05018)\n    - ***StableDrag:*** Stable Dragging for Point-based Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04437)\n    - ***Eta Inversion:*** Designing an Optimal Eta Function for Diffusion-based Real Image Editing [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F02157.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Ffuriosa-ai\u002Feta-inversion) 
[[Project]](https:\u002F\u002Fgithub.com\u002Ffuriosa-ai\u002Feta-inversion)\n    - ***SwapAnything:*** Enabling Arbitrary Object Swapping in Personalized Image Editing [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F04768.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Feric-ai-lab\u002Fswap-anything) [[Project]](https:\u002F\u002Fswap-anything.github.io\u002F)\n    - ***Guide-and-Rescale:*** Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F08987.pdf)\n    - ***FreeDiff:*** Progressive Frequency Truncation for Image Editing with Diffusion Models [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F00759.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FThermal-Dynamics\u002FFreeDiff)\n    - Lazy Diffusion Transformer for Interactive Image Editing [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F03436.pdf) [[Project]](https:\u002F\u002Flazydiffusion.github.io\u002F)\n    - ***ByteEdit:*** Boost, Comply and Accelerate Generative Image Editing [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F00359.pdf) [[Project]](https:\u002F\u002Fbyte-edit.github.io\u002F)\n  - **ICLR**\n    - Guiding Instruction-based Image Editing via Multimodal Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.17102.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fapple\u002Fml-mgie) [[Project]](https:\u002F\u002Fmllm-ie.github.io\u002F)\n    - The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.01410.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FML-GSAI\u002FSDE-Drag) [[Project]](https:\u002F\u002Fml-gsai.github.io\u002FSDE-Drag-demo\u002F)\n    - ***Motion Guidance:*** Diffusion-Based Image Editing with Differentiable Motion Estimators [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.18085.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fdangeng\u002Fmotion_guidance) [[Project]](https:\u002F\u002Fdangeng.github.io\u002Fmotion_guidance\u002F)\n    - Object-Aware Inversion and Reassembly for Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.12149.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Faim-uofa\u002FOIR) [[Project]](https:\u002F\u002Faim-uofa.github.io\u002FOIR-Diffusion\u002F)\n    - ***Noise Map Guidance:*** Inversion with Spatial Context for Real Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.04625.pdf)\n  - **AAAI**\n    - Tuning-Free Inversion-Enhanced Control for Consistent Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.14611)\n    - ***BARET:*** Balanced Attention based Real image Editing driven by Target-text Inversion [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.05482)\n    - Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.17423)\n    - High-Fidelity Diffusion-based Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.15707)\n    - ***AdapEdit:*** Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.08019)\n    - ***TexFit:*** Text-Driven Fashion Image Editing with 
Diffusion Models [[Paper]](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F28885)\n  - **arXiv**\n    - ***An Item is Worth a Prompt:*** Versatile Image Editing with Disentangled Control [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04880.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FasFeng\u002Fd-edit)\n    - One-Dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.16145) [[Code]](https:\u002F\u002Fgithub.com\u002FCon6924\u002FSPM) [[Project]](https:\u002F\u002Flyumengyao.github.io\u002Fprojects\u002Fspm)\n    - ***EditWorld:*** Simulating World Dynamics for Instruction-Following Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.14785) [[Code]](https:\u002F\u002Fgithub.com\u002FYangLing0818\u002FEditWorld) [[Project]](https:\u002F\u002Fgithub.com\u002FYangLing0818\u002FEditWorld)\n    - ***ReasonPix2Pix:*** Instruction Reasoning Dataset for Advanced Image Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.11190)\n    - ***FlowEdit:*** Inversion-Free Text-Based Editing Using Pre-Trained Flow Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.08629) [[Code]](https:\u002F\u002Fgithub.com\u002Ffallenshock\u002FFlowEdit) [[Project]](https:\u002F\u002Fmatankleiner.github.io\u002Fflowedit\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffallenshock\u002FFlowEdit\u002F)\n- \u003Cspan id=\"editing-year-2023\">**Year 2023**\u003C\u002Fspan>\n  - **CVPR**\n    - Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FWu_Uncovering_the_Disentanglement_Capability_in_Text-to-Image_Diffusion_Models_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FUCSB-NLP-Chang\u002FDiffusionDisentanglement)\n    - ***SINE:*** SINgle Image Editing with Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FZhang_SINE_SINgle_Image_Editing_With_Text-to-Image_Diffusion_Models_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fzhang-zx\u002FSINE)\n    - ***Imagic:*** Text-Based Real Image Editing with Diffusion Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FKawar_Imagic_Text-Based_Real_Image_Editing_With_Diffusion_Models_CVPR_2023_paper.pdf)\n    - ***InstructPix2Pix:*** Learning to Follow Image Editing Instructions [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FBrooks_InstructPix2Pix_Learning_To_Follow_Image_Editing_Instructions_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Ftimothybrooks\u002Finstruct-pix2pix) [[Dataset]](https:\u002F\u002Finstruct-pix2pix.eecs.berkeley.edu\u002F) [[Project]](https:\u002F\u002Fwww.timothybrooks.com\u002Finstruct-pix2pix\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ftimbrooks\u002Finstruct-pix2pix)\n    - ***Null-text Inversion*** for Editing Real Images using Guided Diffusion Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FMokady_NULL-Text_Inversion_for_Editing_Real_Images_Using_Guided_Diffusion_Models_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fprompt-to-prompt\u002F#null-text-inversion-for-editing-real-images)\n  - **ICCV**\n    - ***MasaCtrl:*** 
Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FCao_MasaCtrl_Tuning-Free_Mutual_Self-Attention_Control_for_Consistent_Image_Synthesis_and_ICCV_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FMasaCtrl) [[Project]](https:\u002F\u002Fljzycmd.github.io\u002Fprojects\u002FMasaCtrl\u002F) [[Demo]](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1DZeQn2WvRBsNg4feS1bJrwWnIzw1zLJq?usp=sharing)\n    - Localizing Object-level Shape Variations with Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FPatashnik_Localizing_Object-Level_Shape_Variations_with_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Forpatashnik\u002Flocal-prompt-mixing) [[Project]](https:\u002F\u002Forpatashnik.github.io\u002Flocal-prompt-mixing\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Forpatashnik\u002Flocal-prompt-mixing)\n  - **ICLR**\n    - ***SDEdit:*** Guided Image Synthesis and Editing with Stochastic Differential Equations [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.01073.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fermongroup\u002FSDEdit) [[Project]](https:\u002F\u002Fsde-image-editing.github.io\u002F)\n- \u003Cspan id=\"editing-year-2022\">**Year 2022**\u003C\u002Fspan>\n  - **CVPR**\n    - ***DiffusionCLIP:*** Text-Guided Diffusion Models for Robust Image Manipulation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FKim_DiffusionCLIP_Text-Guided_Diffusion_Models_for_Robust_Image_Manipulation_CVPR_2022_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fgwang-kim\u002FDiffusionCLIP)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n## Text Image Generation\n- \u003Cspan id=\"gentext-year-2024\">**Year 2024**\u003C\u002Fspan>\n  - **arXiv**\n    - ***AnyText:*** Multilingual Visual Text Generation And Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.03054.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Ftyxsspa\u002FAnyText) [[Project]](https:\u002F\u002Fanytext.pics\u002F)\n  - **CVPR**\n    - ***SceneTextGen:*** Layout-Agnostic Scene Text Image Synthesis with Integrated Character-Level Diffusion and Contextual Consistency [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.01062v2)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n# Datasets\n- ***Microsoft COCO:*** Common Objects in Context [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1405.0312.pdf) [[Dataset]](https:\u002F\u002Fcocodataset.org\u002F#home)\n- ***Conceptual Captions:*** A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning [[Paper]](https:\u002F\u002Faclanthology.org\u002FP18-1238.pdf) [[Dataset]](https:\u002F\u002Fai.google.com\u002Fresearch\u002FConceptualCaptions\u002F)\n- ***LAION-5B:*** An Open Large-Scale Dataset for Training Next Generation Image-Text Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=M3Y74vmsMcY) [[Dataset]](https:\u002F\u002Flaion.ai\u002F)\n- ***PartiPrompts:*** Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2206.10789) 
[[Dataset]](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fparti) [[Project]](https:\u002F\u002Fsites.research.google\u002Fparti\u002F)\n\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n# Toolkits\n|Name|Website|Description|\n|-|-|-|\n|Stable Diffusion WebUI|[link](https:\u002F\u002Fgithub.com\u002FAUTOMATIC1111\u002Fstable-diffusion-webui)|Built on Gradio; deployed locally to run Stable Diffusion checkpoints, LoRA weights, ControlNet weights, etc.|\n|Stable Diffusion WebUI-forge|[link](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002Fstable-diffusion-webui-forge)|Built on Gradio; deployed locally to run Stable Diffusion checkpoints, LoRA weights, ControlNet weights, etc.|\n|Fooocus|[link](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FFooocus)|Built on Gradio; offline, open source, and free. \u003Cbr \u002F>Manual tweaking is not needed; users only need to focus on the prompts and images.|\n|ComfyUI|[link](https:\u002F\u002Fgithub.com\u002Fcomfyanonymous\u002FComfyUI)|Deployed locally to enable customized workflows with Stable Diffusion|\n|Civitai|[link](https:\u002F\u002Fcivitai.com\u002F)|Website hosting community-shared Stable Diffusion and LoRA checkpoints|\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# Q&A\n- **Q: In what order are conferences listed in this paper list?**\n  - This paper list is organized according to the following sequence:\n    - CVPR\n    - ICCV\n    - ECCV\n    - WACV\n    - NeurIPS\n    - ICLR\n    - ICML\n    - ACM MM\n    - SIGGRAPH\n    - AAAI\n    - arXiv\n    - Others\n- **Q: What does `Others` refer to?**\n  - Some studies (e.g., `Stable Cascade`) do not publish their technical reports on arXiv. Instead, they tend to post a blog on their official websites. The `Others` category refers to such studies.\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# References\n\nThe `reference.bib` file summarizes BibTeX references of up-to-date text-to-image generation papers, widely used datasets, and toolkits.\nBased on the original references, I have made the following modifications so that they render nicely in `LaTeX` manuscripts:\n- References are normally constructed in the form of `author-etal-year-nickname`. In particular, references of datasets and toolkits are constructed directly as `nickname`, e.g., `imagenet`.\n- In each reference, all names of conferences\u002Fjournals are converted into abbreviations, e.g., `Computer Vision and Pattern Recognition -> CVPR`.\n- The `url`, `doi`, `publisher`, `organization`, `editor`, and `series` fields are removed from all references.\n- The `pages` field is added to references where it is missing.\n- All paper titles are in title case. In addition, an extra pair of `{}` is added so that the title case is preserved in templates that would otherwise override it.\n\nIf you need other reference formats, you may refer to the original references by searching the paper names in [DBLP](https:\u002F\u002Fdblp.org\u002F) or [Google Scholar](https:\u002F\u002Fscholar.google.com\u002F).\n\n> [!NOTE]\n> Note that references in the `homepage` and [the `topic` section](topics\u002Ftopics.md) can be repeated in `reference.bib`. 
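\n> As a quick illustration of the conventions above, an entry would look roughly like the following (a hypothetical sketch; the key and field values here are illustrative rather than copied from `reference.bib`):\n> ```bibtex\n> % illustrative entry only; the key and field values are hypothetical\n> @inproceedings{rombach-etal-2022-ldm,\n>   author    = {Robin Rombach and Andreas Blattmann and others},\n>   title     = {{High-Resolution Image Synthesis with Latent Diffusion Models}},\n>   booktitle = {CVPR},\n>   pages     = {10684--10695},\n>   year      = {2022}\n> }\n> ```\n> 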
Personally, I recommend using `\"Ctrl+F\" \u002F \"Command+F\"` to search your desired `BibTeX` reference.\n\n [\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# Star History\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-text-to-image-studies_readme_06364993e832.png\" target=\"_blank\">\n        \u003Cimg width=\"500\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-text-to-image-studies_readme_06364993e832.png\" alt=\"Star History Chart\">\n    \u003C\u002Fa>\n\u003Cp>\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# WeChat Group\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"YOUR_OFFICIAL_WEBSITE_URL\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-text-to-image-studies_readme_bebe6963d95b.png\" alt=\"group\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n","\u003Cp align=\"center\">\n  \u003Ch1 align=\"center\">文生图生成研究合集\u003C\u002Fh1>\n\n本 GitHub 仓库汇总了与文本到图像（T2I）生成任务相关的论文和资源。\n\n> [!NOTE]\n> 本文档是整个 GitHub 仓库的`主页`。论文按照**不同的研究方向、发表年份和会议**进行总结。\n> \n> [“topics”章节](topics\u002Ftopics.md)根据不同的特性，总结了与 T2I 生成高度相关的论文，例如 T2I 生成的前提条件、结合其他技术的扩散模型（如 Diffusion Transformer、LLMs、Mamba 等），以及用于其他任务的扩散模型。\n\n如果您对本仓库有任何建议，请随时[发起新议题](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002Fawesome-text-to-image-generation-studies\u002Fissues\u002Fnew)或[提交 Pull Request](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002Fawesome-text-to-image-generation-studies\u002Fpulls)。\n\n本 GitHub 仓库的最新动态如下。\n\n🔥 [2025年12月11日] 我们的论文《StableV2V：视频到视频编辑中的形状一致性稳定化》已被 TCSVT 2025 接收！\n\n🔥 [11月19日] 我们发布了最新论文《StableV2V：视频到视频编辑中的形状一致性稳定化》，并开源了相应的[代码](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002FStableV2V)、[模型权重](https:\u002F\u002Fhuggingface.co\u002FAlonzoLeeeooo\u002FStableV2V)以及测试基准[DAVIS-Edit](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAlonzoLeeeooo\u002FDAVIS-Edit)。欢迎通过链接查看！\n\u003Cdetails> \u003Csummary> 点击查看更多信息。 \u003C\u002Fsummary>\n\n- [4月26日] 更新了一个新主题：**扩散模型与联邦学习的结合。** 更多详情请参见[“topics”章节](topics\u002Ftopics.md)！\n- [3月28日] 官方的**AAAI 2024**论文列表已发布！相应地更新了官方 PDF 版本和 BibTeX 参考文献。\n- [3月21日] [“topics”章节](topics\u002Ftopics.md)已更新。该章节旨在提供**按扩散模型其他特性分类的论文列表**，例如基于 Diffusion Transformer 的方法、用于自然语言处理的扩散模型、与 LLM 集成的扩散模型等。这些论文的参考文献也被汇总在 `reference.bib` 中。\n- [3月7日] 所有可用的**CVPR、ICLR 和 AAAI 2024 论文及参考文献**均已更新。\n- [3月1日] 总结了[**现成的文生图生成产品**](#available-products)和[**工具包**](#toolkits)的网站。\n\n\u003C\u002Fdetails>\n\n\n\u003C!-- omit in toc -->\n# \u003Cspan id=\"contents\">目录\u003C\u002Fspan>\n- [产品](#available-products)\n- [待办事项](#to-do-lists)\n- [论文](#papers)\n  - [综述论文](#survey-papers)\n  - [文本到图像生成](#text-to-image-generation)\n    - [2025年](#text-year-2025)\n    - [2024年](#text-year-2024)\n    - [2023年](#text-year-2023)\n    - [2022年](#text-year-2022)\n    - [2021年](#text-year-2021)\n    - [2020年](#text-year-2020)\n  - [条件文本到图像生成](#conditional-text-to-image-generation)\n    - [2025年](#conditional-year-2025)\n    - [2024年](#conditional-year-2024)\n    - [2023年](#conditional-year-2023)\n    - [2022年](#conditional-year-2022)\n  - [个性化文本到图像生成](#personalized-text-to-image-generation)\n    - [2025年](#personalized-year-2025)\n    - [2024年](#personalized-year-2024)\n    - [2023年](#personalized-year-2023)\n  - [文本引导的图像编辑](#text-guided-image-editing) 
\n    - [2025年](#editing-year-2025)\n    - [2024年](#editing-year-2024)\n    - [2023年](#editing-year-2023)\n    - [2022年](#editing-year-2022)\n  - [文本图像生成](#text-image-generation)\n    - [2024年](#gentext-year-2024)\n- [数据集](#datasets)\n- [工具包](#toolkits)\n- [问答](#qa)\n- [参考文献](#references)\n- [星标历史](#star-history)\n- [微信群](#wechat-group)\n\n\u003C!-- omit in toc -->\n# 待办事项\n- 会议发表论文\n  - [ ] 更新 NeurIPS 2025 论文\n  - [x] 更新 ICCV 2025 论文\n  - [x] 更新 CVPR 2025 论文\n  - [x] 更新 ICLR 2025 论文\n  - [x] 更新 NeurIPS 2024 论文\n  - [x] 更新 ECCV 2024 论文\n  - [x] 更新 CVPR 2024 论文\n    - [x] 更新 ⚠️ 论文和参考文献\n    - [ ] 将 arXiv 参考文献更新为官方版本\n  - [x] 更新 AAAI 2024 论文\n    - [x] 更新 ⚠️ 论文和参考文献\n    - [x] 将 arXiv 参考文献更新为官方版本\n  - [x] 更新 ICLR 2024 论文\n  - [x] 更新 NeurIPS 2023 论文\n- 定期维护预印本 arXiv 论文及遗漏论文\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n# 产品\n|名称|年份|网站|特色|\n|-|-|-|-|\n|Nano Image Art|2025|[link](https:\u002F\u002Fnanoimage.art\u002F)|创作惊艳的 AI 图像——由 Google 的 Nano Banana Pro 提供支持，实现下一代品质、智能编辑和智能化提示。|\n|Fast Image AI|2025|[link](https:\u002F\u002Ffastimage.ai\u002F)|Fast Image AI 可以立即将您的照片转换为吉卜力、素描和皮克斯等惊艳风格。只需点击一下，即可轻松控制图像元素并创造出令人惊叹的效果。|\n|Gempix2 (Nano Banana 2)|2025|[link](https:\u002F\u002Fgempix2.site)|免费的 AI 图像生成平台，支持文生图、AI 编辑和视频生成|\n|Stable Diffusion 3|2024|[link](https:\u002F\u002Fstability.ai\u002Fnews\u002Fstable-diffusion-3)|基于 Diffusion Transformer 的 Stable Diffusion|\n|Stable Video|2024|[link](https:\u002F\u002Fwww.stablevideo.com\u002F)|高质量高分辨率图像|\n|DALL-E 3|2023|[link](https:\u002F\u002Fopenai.com\u002Fdall-e-3)|可与 [ChatGPT](https:\u002F\u002Fchat.openai.com\u002F) 协作|\n|Ideogram|2023|[link](https:\u002F\u002Fideogram.ai\u002Flogin)|文本图像|\n|Playground|2023|[link](https:\u002F\u002Fplayground.com\u002F)|运动风格图像|\n|HiDream.ai|2023|[link](https:\u002F\u002Fhidreamai.com\u002F#\u002F)|-|\n|Dashtoon|2023|[link](https:\u002F\u002Fdashtoon.com\u002F)|文本转漫画生成|\n|WHEE|2023|[link](https:\u002F\u002Fwww.whee.com\u002F)|WHEE 是一款在线 AI 生成工具，可用于 *T2I 生成、I2I 生成、超分辨率、修复、扩展绘画、图像变体、虚拟试穿等*。|\n|Vega AI|2023|[link](https:\u002F\u002Fwww.vegaai.net\u002F)|Vega AI 是一款在线 AI 生成工具，可用于 *T2I 生成、I2I 生成、超分辨率、T2V 生成、I2V 生成等*。|\n|Wujie AI|2022|[link](https:\u002F\u002Fwww.wujieai.com\u002F)|中文名为“无界AI”，提供 AIGC 资源和在线服务|\n|Midjourney|2022|[link](https:\u002F\u002Fwww.midjourney.com\u002Fhome)|功能强大的闭源生成工具|\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n# 论文\n\n\u003C!-- omit in toc -->\n\n## 综述论文\n- **文本到图像生成**\n  - 2024年\n    - **ACM Computing Surveys** \n      - 扩散模型：方法与应用的全面综述 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.00796.pdf)\n  - 2023年\n    - **TPAMI**\n      - 视觉中的扩散模型：综述 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.04747v2) [[代码]](https:\u002F\u002Fgithub.com\u002FCroitoruAlin\u002FDiffusion-Models-in-Vision-A-Survey)\n    - **arXiv**\n      - 生成式AI中的文本到图像扩散模型：综述 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.07909)\n      - 视觉计算中扩散模型的最新进展 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.07204.pdf) \n  - 2022年\n    - **arXiv**\n      - 面向视觉的高效扩散模型：综述 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.09292)\n- **条件文本到图像生成**\n  - 2024年\n    - **arXiv**\n      - 基于文本到图像扩散模型的可控生成：综述 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04279)\n- **文本引导的图像编辑**\n  - 2024年\n    - **arXiv**\n      - 基于扩散模型的图像编辑：综述 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.17525.pdf) 
[[代码]](https:\u002F\u002Fgithub.com\u002FSiatMMLab\u002FAwesome-Diffusion-Model-Based-Image-Editing-Methods)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n\n## 文本到图像生成\n- \u003Cspan id=\"text-year-2025\">**2025年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***PreciseCam：*** 文本到图像生成中的精确相机控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.12910) [[项目]](https:\u002F\u002Fgraphics.unizar.es\u002Fprojects\u002FPreciseCam2024\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fedurnebernal\u002FPreciseCam)\n    - ***Type-R：*** 自动修复文本到图像生成中的错别字 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.18159) [[代码]](https:\u002F\u002Fgithub.com\u002FCyberAgentAILab\u002FType-R)\n    - ***Compass Control：*** 文本到图像生成中的多对象方向控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.06752)\n    - ***Generative Photography：*** 场景一致的相机控制用于逼真的文本到图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.02168) [[项目]](https:\u002F\u002Fgenerative-photography.github.io\u002Fproject\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fpandayuanyu\u002Fgenerative-photography)\n    - ***One-Way Ticket：*** 不依赖时间的统一编码器用于蒸馏文本到图像扩散模型 [[论文]](https:\u002F\u002Fcvpr.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F32579) [[代码]](https:\u002F\u002Fgithub.com\u002Fsen-mao\u002FLoopfree)\n    - 文本嵌入并非全部所需：利用文本自注意力图进行文本到图像语义对齐的注意力控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.15236)\n    - 向理解和量化文本到图像生成中的不确定性迈进 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.03178)\n    - 通过双空间多方面概念控制实现即插即用、可解释且负责任的文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.18324) [[项目]](https:\u002F\u002Fbasim-azam.github.io\u002Fresponsiblediffusion\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fbasim-azam\u002Fresponsiblediffusion)\n    - 精确计数：具有准确物体数量的文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.10210) [[项目]](https:\u002F\u002Fmake-it-count-paper.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FLitalby1\u002Fmake-it-count)\n    - ***MCCD：*** 基于多智能体协作的组合式扩散用于复杂文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.02648)\n    - 重新思考去偏训练：释放稳定扩散的潜力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.12692)\n    - ***ShapeWords：*** 使用三维形状感知提示引导文本到图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.02912) [[项目]](https:\u002F\u002Flodurality.github.io\u002Fshapewords\u002F)\n    - ***SnapGen：*** 通过高效架构和训练驯服移动端高分辨率文本到图像模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.09619)\n    - 无需训练的文本到图像合成中通过重定位注意力图优化空间传输 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.22168)\n    - ***Focus-N-Fix：*** 区域感知微调用于文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.06481)\n    - ***SILMM：*** 自我改进的大规模多模态模型用于组合式文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.05818)\n    - 无训练门控低秩适应用于文本到图像扩散模型的局部概念擦除 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.12356)\n    - 自交叉扩散指导用于相似主题的文本到图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.18936)\n    - 噪声扩散用于提升文本到图像合成中的语义忠实度 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.16503) [[代码]](https:\u002F\u002Fgithub.com\u002FBomingmiao\u002FNoiseDiffusion)\n    - 学习为文本到图像生成采样有效且多样化的提示 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.07838)\n    - ***STEREO：*** 一个两阶段框架，用于从文本到图像扩散模型中对抗性地擦除概念 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.16807)\n    - 通过提示优化实现面向少数群体的文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.16503) [[代码]](https:\u002F\u002Fgithub.com\u002Fsoobin-um\u002FMinorityPrompt)\n    - 缩小文本到图像扩散模型的文本编码器 
[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.19897) [[代码]](https:\u002F\u002Fgithub.com\u002FLifuWang-66\u002FDistillT5)\n    - ⚠️ “遗忘”的幻象：文本到图像扩散模型中机器遗忘的不稳定性 [论文]\n    - ⚠️ 探索大型语言模型与扩散Transformer在文本到图像合成中的深度融合 [论文]\n    - ⚠️ 检测并引导：通过指南令牌优化自我调节扩散模型以实现安全的文本到图像生成 [论文]\n    - ⚠️ 多群体比例代表性在文本到图像模型中的应用 [论文]\n    - ⚠️ ***VODiff：*** 控制文本到图像生成中对象的可见性顺序 [论文]\n  - **ICLR**\n    - 改善文本到图像扩散模型的长文本对齐 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=2ZK8zyIt7o)\n    - 信息论视角下的文本到图像对齐 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=Ugs2W5XFFo)\n    - ***Meissonic：*** 重振掩码生成式Transformer以实现高效的高分辨率文本到图像合成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=GJsuYHhAga)\n    - ***PaRa：*** 通过参数秩降低个性化文本到图像扩散 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=KZgo2YQbhc)\n    - ***Fluid：*** 使用连续标记扩展自回归文本到图像生成模型 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=jQP5o1VAVc)\n    - 并非所有提示都一样：基于提示修剪文本到图像扩散模型 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=3BhZCfJ73Y)\n    - 去噪自回归Transformer用于可扩展的文本到图像生成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=amDkNPVWcn)\n    - 文本到图像生成模型中的渐进式组合性 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=S85PP4xjFD)\n    - 挖掘你自己的秘密：扩散分类器分数用于文本到图像扩散模型的持续个性化 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=hUdLs6TqZL)\n    - 测量并改善文本到图像生成模型的参与度 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=TmCcNuo03f)\n    - 通过残差注意力门擦除文本到图像扩散模型中的概念组合 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=ZRDhBwKs7l)\n    - 使用可靠随机种子增强组合式文本到图像生成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=5BSlakturs)\n    - ***一提示一故事：*** 使用单一提示实现一致性文本到图像生成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=cD1kl2QKv1)\n    - 你只需采样一次：通过自合作扩散GAN驯服一步式文本到图像合成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=T7bmHkwzS6)\n    - 重新思考文本到图像生成时代下的艺术版权侵权问题 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=0OTVNEm9N4)\n    - 从文本到图像扩散模型中擦除概念组合 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=OBjF5I4PWg)\n    - 文本到图像生成模型中跨注意力头位置模式可以与人类视觉概念对齐 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=1vggIT5vvj)\n    - ***TIGeR：*** 利用大型多模态模型统一文本到图像生成与检索 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=mr2icR6dpD)\n    - ***DGQ：*** 分布感知的分组量化用于文本到图像扩散模型 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=ZyNEr7Xw5L)\n    - 无训练推测雅可比解码加速自回归文本到图像生成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=LZfjxvqw0N)\n    - ***PT-T2I\u002FV：*** 一种高效的代理标记化扩散Transformer用于文本到图像\u002F视频任务 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=lTrrnNdkOX)\n    - 重新审视文本到图像评估：关于指标、提示和人工评分 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=Im2neAMlre)\n    - ***SANA：*** 利用线性扩散Transformer实现高效高分辨率文本到图像合成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=N8Oj1XhtYZ)\n    - 文本到图像校正流作为即插即用先验 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=SzPZK856iI)\n    - 自动过滤人类反馈数据以对齐文本到图像扩散模型 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=8jvVNPHtVJ)\n    - ***SAFREE：*** 无训练且自适应的安全卫士，用于安全的文本到图像和视频生成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=hgTFotBRKl)\n    - ***IterComp：*** 迭代式组合意识反馈学习，来自模型库用于文本到图像生成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=4w99NAikOE)\n    - ***ScImage：*** 多模态大型语言模型在科学文本到图像生成方面的表现如何？ [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=ugyqNEOjoU)\n    - 引导得分身份蒸馏用于无数据的一步式文本到图像生成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=HMVDiaWMwM)\n    - 从因果视角评估文本到图像合成中的语义变异 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=NWb128pSCb)\n- \u003Cspan id=\"text-year-2024\">**2024年**\u003C\u002Fspan>\n  - **CVPR**\n    - 
***DistriFusion：*** 高分辨率扩散模型的分布式并行推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.19481.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fdistrifuser)\n    - ***InstanceDiffusion：*** 实例级控制用于图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.03290.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Ffrank-xwang\u002FInstanceDiffusion) [[项目]](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002Fprojects\u002FInstDiff\u002F)\n    - ***ECLIPSE：*** 一种资源高效的文本到图像先验用于图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.04655.pdf) [[代码]](https:\u002F\u002Feclipse-t2i.vercel.app\u002F) [[项目]](https:\u002F\u002Fgithub.com\u002Feclipse-t2i\u002Feclipse-inference) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FECLIPSE-Community\u002FECLIPSE-Kandinsky-v2.2)\n    - ***Instruct-Imagen：*** 多模态指令驱动的图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.01952.pdf)\n    - 学习连续3D词用于文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.08654.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fttchengab\u002Fcontinuous_3d_words_code\u002F)\n    - ***HanDiffuser：*** 具有逼真手部外观的文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.01693.pdf)\n    - 丰富的人类反馈用于文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.10240.pdf)\n    - ***MarkovGen：*** 结构化预测用于高效文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.10997.pdf)\n    - 文本到图像生成的定制助手 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.03045.pdf)\n    - ***ADI：*** 学习解耦标识符用于动作定制的文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.15841.pdf) [[项目]](https:\u002F\u002Fadi-t2i.github.io\u002FADI\u002F)\n    - ***UFOGen：*** 通过扩散GAN实现大规模单向文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.09257.pdf)\n    - 自我发现可解释的扩散潜在方向用于负责任的文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.17216.pdf)\n    - ***Tailored Visions：*** 通过个性化提示重写提升文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.08129.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fzzjchen\u002FTailored-Visions)\n    - ***CoDi：*** 条件扩散蒸馏用于更高保真度和更快的图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.01407.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Ffast-codi\u002FCoDi) [[项目]](https:\u002F\u002Ffast-codi.github.io\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FMKFMIKU\u002FCoDi)\n    - 使用潜在扩散模型和隐式神经解码器进行任意规模的图像生成和上采样 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.10255.pdf)\n    - 朝着在基于扩散的模型中有效使用以人为本先验以生成基于文本的人像迈进 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.05239)\n    - ***ElasticDiffusion：*** 无训练的任意尺寸图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.18822) [[代码]](https:\u002F\u002Fgithub.com\u002FMoayedHajiAli\u002FElasticDiffusion-official) [[项目]](https:\u002F\u002Felasticdiffusion.github.io\u002F) [[演示]](https:\u002F\u002Freplicate.com\u002Fmoayedhajiali\u002Felasticdiffusion)\n    - ***CosmicMan：*** 一个人类专用的文本到图像基础模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.01294) [[代码]](https:\u002F\u002Fgithub.com\u002Fcosmicman-cvpr2024\u002FCosmicMan) [[项目]](https:\u002F\u002Fcosmicman-cvpr2024.github.io\u002F)\n    - ***PanFusion：*** 驯服稳定扩散以生成360°全景图像 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.07949) [[代码]](https:\u002F\u002Fgithub.com\u002Fchengzhag\u002FPanFusion) [[项目]](https:\u002F\u002Fchengzhag.github.io\u002Fpublication\u002Fpanfusion)\n    - ***Intelligent Grimm：*** 基于潜在扩散模型的开放式视觉叙事 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.00973) [[代码]](https:\u002F\u002Fgithub.com\u002Fhaoningwu3639\u002FStoryGen) 
[[项目]](https:\u002F\u002Fhaoningwu3639.github.io\u002FStoryGen_Webpage\u002F)\n    - 关于基于扩散的文本到图像生成的可扩展性 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.02883)\n    - ***MuLAn：*** 一个多层次标注数据集用于可控文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.02790) [[项目]](https:\u002F\u002Fmulan-dataset.github.io\u002F) [[数据集]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fmulan-dataset\u002Fv1.0)\n    - 学习多维度的人类偏好用于文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.14705)\n    - 动态提示优化用于文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.04095)\n    - 通过强化学习训练扩散模型以实现多样化图像生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FMiao_Training_Diffusion_Models_Towards_Diverse_Image_Generation_with_Reinforcement_Learning_CVPR_2024_paper.pdf)\n    - 对抗性文本到连续图像生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FHaydarov_Adversarial_Text_to_Continuous_Image_Generation_CVPR_2024_paper.pdf) [[项目]](https:\u002F\u002Fkilichbek.github.io\u002Fwebpage\u002Fhypercgan\u002F) [[视频]](https:\u002F\u002Fkilichbek.github.io\u002Fwebpage\u002Fhypercgan\u002F#)\n    - ***EmoGen：*** 利用文本到图像扩散模型生成情感图像内容 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_EmoGen_Emotional_Image_Content_Generation_with_Text-to-Image_Diffusion_Models_CVPR_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FJingyuanYY\u002FEmoGen)\n  - **ECCV**\n    - 搭建不同语言模型和生成式视觉模型之间的桥梁以实现文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.07860) [[代码]](https:\u002F\u002Fgithub.com\u002FShihaoZhaoZSH\u002FLaVi-Bridge) [[项目]](https:\u002F\u002Fshihaozhaozsh.github.io\u002FLaVi-Bridge\u002F)\n    - 探索文本到图像扩散模型中的短语级对齐 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.05352v1) [[代码]](https:\u002F\u002Fgithub.com\u002Fnini0919\u002FDiffPNG)\n    - 把握正确方向：提升文本到图像模型的空间一致性 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.01197) [[代码]](https:\u002F\u002Fgithub.com\u002FSPRIGHT-T2I\u002FSPRIGHT) [[项目]](https:\u002F\u002Fspright-t2i.github.io\u002F)\n    - 跨印度语系导航文本到图像生成中的偏见 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.00283v1) [[项目]](https:\u002F\u002Fiab-rubric.org\u002Fresources\u002Fother-databases\u002Findictti)\n    - 通过人类反馈反演保护文本到图像扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.21032)\n    - 现实与幻想的构建：借助LLM辅助提示解读进行场景生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.12579) [[代码]](https:\u002F\u002Fleo81005.github.io\u002FReality-and-Fantasy\u002F) [[项目]](https:\u002F\u002Fleo81005.github.io\u002FReality-and-Fantasy\u002F) [[数据集]](https:\u002F\u002Fleo81005.github.io\u002FReality-and-Fantasy\u002F)\n    - 文本到图像扩散模型中可靠且高效的概念擦除 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.12383v1) [[代码]](https:\u002F\u002Fgithub.com\u002FCharlesGong12\u002FRECE)\n    - 探索文本到图像扩散模型中的短语级对齐 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.05352v1) [[代码]](https:\u002F\u002Fgithub.com\u002Fnini0919\u002FDiffPNG)\n    - ***StyleTokenizer：*** 通过单个实例定义图像风格以控制扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.02543) [[代码]](https:\u002F\u002Fgithub.com\u002Falipay\u002Fstyle-tokenizer)\n    - ***PEA-Diffusion：*** 在非英语文本到图像生成中具有知识蒸馏功能的参数高效适配器 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F08492.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FOPPO-Mente-Lab\u002FPEA-Diffusion)\n    - 现象空间中的偏差阻碍了文本到图像生成的泛化能力 
[[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F11936.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fzdxdsw\u002Fskewed_relations_T2I)\n    - ***Parrot：*** 用于文本到图像生成的帕累托最优多奖励强化学习框架 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F05562.pdf)\n    - 搭建不同语言模型和生成式视觉模型之间的桥梁以实现文本到图像生成 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F10495.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FShihaoZhaoZSH\u002FLaVi-Bridge) [[项目]](https:\u002F\u002Fshihaozhaozsh.github.io\u002FLaVi-Bridge\u002F)\n    - ***MobileDiffusion：*** 移动设备上的即时文本到图像生成 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F07923.pdf)\n    - ***PixArt-Σ：*** 从弱到强训练扩散Transformer以实现4K文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04692) [[代码]](https:\u002F\u002Fgithub.com\u002FPixArt-alpha\u002FPixArt-sigma) [[项目]](https:\u002F\u002Fpixart-alpha.github.io\u002FPixArt-sigma-project\u002F)\n    - ***CogView3：*** 通过接力扩散实现更精细、更快速的文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.05121) [[代码]](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FCogView)\n  - **ICLR**\n    - 修补后的去噪扩散模型用于高分辨率图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.01316.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fmlpc-ucsd\u002Fpatch-dm)\n    - ***Relay Diffusion：*** 统一跨分辨率的扩散过程以进行图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.03350.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FRelayDiffusion)\n    - ***SDXL：*** 改善潜伏扩散模型以用于高分辨率图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.01952.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fgenerative-models)\n    - 组合并征服：基于扩散的3D深度感知可组合图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.09048.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Ftomtom1103\u002Fcompose-and-conquer)\n    - ***PixArt-α：*** 快速训练扩散Transformer以实现照片级真实感的文本到图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.00426.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FPixArt-alpha\u002FPixArt-alpha) [[项目]](https:\u002F\u002Fpixart-alpha.github.io\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FPixArt-alpha\u002FPixArt-alpha)\n  - **SIGGRAPH**\n    - ***RGB↔X：*** 使用材料和光照感知的扩散模型进行图像分解和合成 [[论文]](https:\u002F\u002Fzheng95z.github.io\u002Fassets\u002Ffiles\u002Fsig24-rgbx.pdf) [[项目]](https:\u002F\u002Fzheng95z.github.io\u002Fpublications\u002Frgbx24)\n  - **AAAI**\n    - 语义感知的数据增强用于文本到图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.07951.pdf)\n    - 面向抽象概念的文本到图像生成 [[论文]](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F28122)\n  - **arXiv**\n    - 扩散模型的自博弈微调用于文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.10210.pdf)\n    - ***RPG：*** 掌握文本到图像扩散：利用多模态LLM进行重新描述、规划和生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.11708.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FYangLing0818\u002FRPG-DiffusionMaster)\n    - ***Playground v2.5：*** 三个见解以提升文本到图像生成的艺术品质 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.17245.pdf) [[代码]](https:\u002F\u002Fhuggingface.co\u002Fplaygroundai\u002Fplayground-v2.5-1024px-aesthetic)\n    - ***ResAdapter：*** 扩散模型的领域一致分辨率适配器 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.02084.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fbytedance\u002Fres-adapter) [[项目]](https:\u002F\u002Fres-adapter.github.io\u002F)\n    - ***InstantID：*** 零样本身份保留生成，几秒钟内完成 
[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.07519.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FInstantID\u002FInstantID) [[项目]](https:\u002F\u002Finstantid.github.io\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FInstantX\u002FInstantID)\n    - ***PIXART-δ：*** 快速且可控的图像生成，采用潜在一致性模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.05252) [[代码]](https:\u002F\u002Fgithub.com\u002FPixArt-alpha\u002FPixArt-alpha?tab=readme-ov-file)\n    - ***ELLA：*** 为扩散模型配备LLM以增强语义对齐 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.05135) [[代码]](https:\u002F\u002Fgithub.com\u002FELLA-Diffusion\u002FELLA) [[项目]](https:\u002F\u002Fella-diffusion.github.io\u002F)\n    - ***Text2Street：*** 可控的文本到图像生成用于街景 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.04504.pdf)\n    - ***LayerDiffuse：*** 使用潜在透明度进行透明图像层扩散 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.17113) [[代码]](https:\u002F\u002Fgithub.com\u002Flayerdiffusion\u002FLayerDiffuse)\n    - ***SD3-Turbo：*** 快速高分辨率图像合成，采用潜在对抗性扩散蒸馏 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.12015.pdf)\n    - ***StreamMultiDiffusion：*** 实时交互式生成，带有基于区域的语义控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.09055) [[代码]](https:\u002F\u002Fgithub.com\u002Fironjr\u002FStreamMultiDiffusion)\n    - ***SVGDreamer：*** 文本引导的SVG生成，使用扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.16476) [[代码]](https:\u002F\u002Fgithub.com\u002Fximinng\u002FSVGDreamer) [[项目]](https:\u002F\u002Fximinng.github.io\u002FSVGDreamer-project\u002F)\n    - ***PromptCharm：*** 通过多模态提示和精炼实现文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04014.pdf)\n    - ***YOSO：*** 你只需采样一次：通过自合作扩散GAN驯服一步式文本到图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.12931) [[代码]](https:\u002F\u002Fgithub.com\u002Fmlpen\u002FYOSO)\n    - ***SingDiffusion：*** 解决扩散模型中时间区间端点处的奇点 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.08381) [[代码]](https:\u002F\u002Fgithub.com\u002FPangzeCheung\u002FSingDiffusion)\n    - ***CoMat：*** 将文本到图像扩散模型与图像到文本的概念匹配对齐 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.03653) [[代码]](https:\u002F\u002Fgithub.com\u002FCaraJ7\u002FCoMat) [[项目]](https:\u002F\u002Fcaraj7.github.io\u002Fcomat\u002F)\n    - ***StoryDiffusion：*** 用于长距离图像和视频生成的一致自注意力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.01434) [[代码]](https:\u002F\u002Fgithub.com\u002FHVision-NKU\u002FStoryDiffusion) [[项目]](https:\u002F\u002Fstorydiffusion.github.io\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FYupengZhou\u002FStoryDiffusion)\n    - 面部适配器用于预训练扩散模型，具备精细的ID和属性控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.12970) [[项目]](https:\u002F\u002Ffaceadapter.github.io\u002Fface-adapter.github.io\u002F)\n    - ***LinFusion：*** 1 GPU，1分钟，16K张图像 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.02097) [[代码]](https:\u002F\u002Fgithub.com\u002FHuage001\u002FLinFusion) [[项目]](https:\u002F\u002Flv-linfusion.github.io\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FHuage001\u002FLinFusion-SD-v1.5)\n    - ***OmniGen：*** 统一图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.11340) [[代码]](https:\u002F\u002Fgithub.com\u002FVectorSpaceLab\u002FOmniGen)\n    - ***CoMPaSS：*** 增强文本到图像扩散模型中的空间理解 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.13195) [[代码]](https:\u002F\u002Fgithub.com\u002Fblurgyy\u002FCoMPaSS)\n  - **其他**\n    - ***Stable Cascade*** [[博客]](https:\u002F\u002Fstability.ai\u002Fnews\u002Fintroducing-stable-cascade) 
[[代码]](https:\u002F\u002Fgithub.com\u002FStability-AI\u002FStableCascade)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n- \u003Cspan id=\"text-year-2023\">**2023年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***GigaGAN：*** 扩展GAN用于文本到图像合成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FKang_Scaling_Up_GANs_for_Text-to-Image_Synthesis_CVPR_2023_paper.pdf) [[复现代码]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fgigagan-pytorch) [[项目]](https:\u002F\u002Fmingukkang.github.io\u002FGigaGAN\u002F) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ZjxtuDQkOPY&feature=youtu.be)\n    - ***ERNIE-ViLG 2.0：*** 基于知识增强的去噪专家混合模型改进文本到图像扩散模型 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FFeng_ERNIE-ViLG_2.0_Improving_Text-to-Image_Diffusion_Model_With_Knowledge-Enhanced_Mixture-of-Denoising-Experts_CVPR_2023_paper.pdf)\n    - 用于文本到图像生成的偏移扩散 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FZhou_Shifted_Diffusion_for_Text-to-Image_Generation_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fdrboog\u002FShifted_Diffusion)\n    - ***GALIP：*** 用于文本到图像合成的生成对抗CLIP [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FTao_GALIP_Generative_Adversarial_CLIPs_for_Text-to-Image_Synthesis_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Ftobran\u002FGALIP)\n    - ***Specialist Diffusion：*** 即插即用、样本高效的微调文本到图像扩散模型以学习任何未见风格 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FLu_Specialist_Diffusion_Plug-and-Play_Sample-Efficient_Fine-Tuning_of_Text-to-Image_Diffusion_Models_To_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FSpecialist-Diffusion)\n    - 面向文本到图像生成的可验证与可重复的人类评估 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FOtani_Toward_Verifiable_and_Reproducible_Human_Evaluation_for_Text-to-Image_Generation_CVPR_2023_paper.pdf)\n    - ***RIATIG：*** 使用自然提示进行可靠且难以察觉的对抗性文本到图像生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FLiu_RIATIG_Reliable_and_Imperceptible_Adversarial_Text-to-Image_Generation_With_Natural_Prompts_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FWUSTL-CSPL\u002FRIATIG)\n    - 文本到图像扩散的多概念定制 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FKumari_Multi-Concept_Customization_of_Text-to-Image_Diffusion_CVPR_2023_paper.pdf) [[项目]](https:\u002F\u002Fwww.cs.cmu.edu\u002F~custom-diffusion\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fadobe-research\u002Fcustom-diffusion) \n  - **ICCV**\n    - ***DiffFit：*** 通过简单的参数高效微调解锁大型扩散模型的迁移能力 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FXie_DiffFit_Unlocking_Transferability_of_Large_Diffusion_Models_via_Simple_Parameter-efficient_ICCV_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fmkshing\u002FDiffFit-pytorch) [[演示]](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fmkshing\u002Fdifffit-pytorch\u002Fblob\u002Fmain\u002Fscripts\u002Fdifffit_pytorch.ipynb)\n  - **NeurIPS**\n    - ***ImageReward：*** 学习与评估人类对文本到图像生成的偏好 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=JVzeOYEx6d) [[代码]](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FImageReward)\n    - ***RAPHAEL***：通过大规模扩散路径混合进行文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.18295) 
[[项目]](https:\u002F\u002Fraphael-painter.github.io\u002F)\n    - 扩散模型中的语言绑定：通过注意力图对齐增强属性对应关系 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=AOKU4nRw1W) [[代码]](https:\u002F\u002Fgithub.com\u002FRoyiRa\u002FLinguistic-Binding-in-Diffusion-Models)\n    - ***DenseDiffusion：*** 带有注意力调制的密集文本到图像生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FKim_Dense_Text-to-Image_Generation_with_Attention_Modulation_ICCV_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fnaver-ai\u002Fdensediffusion)\n  - **ICLR**\n    - 用于组合式文本到图像合成的免训练结构化扩散引导 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=PUIqjT4rzq7) [[代码]](https:\u002F\u002Fgithub.com\u002Fweixi-feng\u002FStructured-Diffusion-Guidance)\n  - **ICML**\n    - ***StyleGAN-T：*** 解锁GAN在快速大规模文本到图像合成中的潜力 [[论文]](https:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Fsauer23a\u002Fsauer23a.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fautonomousvision\u002Fstylegan-t) [[项目]](https:\u002F\u002Fsites.google.com\u002Fview\u002Fstylegan-t\u002F) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=MMj8OTOUIok)\n    - ***Muse：*** 通过掩码生成式Transformer进行文本到图像生成 [[论文]](https:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Fchang23b\u002Fchang23b.pdf) [[复现代码]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fmuse-maskgit-pytorch) [[项目]](https:\u002F\u002Fmuse-icml.github.io\u002F)\n    - ***UniDiffusers：*** 一个Transformer适用于大规模多模态扩散中的所有分布 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.06555) [[代码]](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002Funidiffuser)\n  - **ACM MM**\n    - ***SUR-adapter：*** 用大型语言模型增强文本到图像预训练扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.05189.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FQrange-group\u002FSUR-adapter)\n    - ***ControlStyle：*** 基于文本驱动的扩散先验进行风格化图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.05463.pdf)\n  - **SIGGRAPH**\n    - ***Attend-and-Excite：*** 基于注意力的语义引导用于文本到图像扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2301.13826.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fyuval-alaluf\u002FAttend-and-Excite) [[项目]](https:\u002F\u002Fyuval-alaluf.github.io\u002FAttend-and-Excite\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FAttendAndExcite\u002FAttend-and-Excite)\n  - **arXiv**\n    - ***P+：*** 文本到图像生成中的扩展文本条件化 [[论文]](https:\u002F\u002Fprompt-plus.github.io\u002Ffiles\u002FPromptPlus.pdf)\n    - ***SDXL-Turbo：*** 对抗性扩散蒸馏 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.17042.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fgenerative-models)\n    - ***Wuerstchen：*** 一种用于大规模文本到图像扩散模型的高效架构 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.00637.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fdome272\u002FWuerstchen)\n    - ***StreamDiffusion：*** 用于实时交互式生成的流水线级解决方案 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.12491.pdf) [[项目]](https:\u002F\u002Fgithub.com\u002Fcumulo-autumn\u002FStreamDiffusion)\n    - ***ParaDiffusion：*** 基于信息增强扩散模型的段落到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.14284) [[代码]](https:\u002F\u002Fgithub.com\u002Fweijiawu\u002FParaDiffusion)\n  - **其他**\n    - ***DALL-E 3：*** 通过更好的标题改进图像生成 [[论文]](https:\u002F\u002Fcdn.openai.com\u002Fpapers\u002Fdall-e-3.pdf)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n- \u003Cspan id=\"text-year-2022\">**2022年**\u003C\u002Fspan>\n  - **CVPR**\n    - 🔥 ***Stable Diffusion：*** 基于潜在扩散模型的高分辨率图像合成 
[[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FRombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FCompVis\u002Flatent-diffusion) [[项目]](https:\u002F\u002Fommer-lab.com\u002Fresearch\u002Flatent-diffusion-models\u002F)\n    - 用于文本到图像合成的向量量化扩散模型 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FGu_Vector_Quantized_Diffusion_Model_for_Text-to-Image_Synthesis_CVPR_2022_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fcientgu\u002FVQ-Diffusion)\n    - ***DF-GAN：*** 文本到图像合成的一个简单而有效的基线 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FTao_DF-GAN_A_Simple_and_Effective_Baseline_for_Text-to-Image_Synthesis_CVPR_2022_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Ftobran\u002FDF-GAN)\n    - ***LAFITE：*** 朝着无语言训练的文本到图像生成方向发展 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FZhou_Towards_Language-Free_Training_for_Text-to-Image_Generation_CVPR_2022_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fdrboog\u002FLafite)\n    - 基于对象引导联合解码Transformer的文本到图像合成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FWu_Text-to-Image_Synthesis_Based_on_Object-Guided_Joint-Decoding_Transformer_CVPR_2022_paper.pdf)\n    - ***StyleT2I：*** 朝着组合性和高保真度的文本到图像合成方向发展 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FLi_StyleT2I_Toward_Compositional_and_High-Fidelity_Text-to-Image_Synthesis_CVPR_2022_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fzhihengli-UR\u002FStyleT2I)\n  - **ECCV**\n    - ***Make-A-Scene：*** 基于场景和人类先验知识的文本到图像生成 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136750087.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FCasualGANPapers\u002FMake-A-Scene) [[演示]](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1SPyQ-epTsAOAu8BEohUokN4-b5RM_TnE?usp=sharing)\n    - 轨迹控制的文本到图像生成 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136960058.pdf)\n    - 使用Token-Critic改进的掩码图像生成 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136830070.pdf)\n    - ***VQGAN-CLIP：*** 利用自然语言进行开放域图像生成与操控 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136970088.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Fvqgan-clip)\n    - ***TISE：*** 用于文本到图像合成评估的指标集合 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136960585.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FVinAIResearch\u002Ftise-toolbox)\n    - ***StoryDALL-E：*** 适配预训练的文本到图像Transformer以进行故事续写 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136970070.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fadymaharana\u002Fstorydalle) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FECCV2022\u002Fstorydalle)\n  - **NeurIPS**\n    - ***CogView2：*** 通过层次化Transformer实现更快更好的文本到图像生成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=GkDbQb6qu_r) [[代码]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=GkDbQb6qu_r)\n    - ***Imagen：*** 具有深度语言理解能力的逼真文本到图像扩散模型 [[论文]](https:\u002F\u002Fpapers.nips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002Fec795aeadae0b7d230fa35cbaf04c041-Paper-Conference.pdf) 
[[复现代码]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fimagen-pytorch) [[项目]](https:\u002F\u002Fimagen.research.google\u002F) [[***Imagen 2***]](https:\u002F\u002Fdeepmind.google\u002Ftechnologies\u002Fimagen-2\u002F)\n  - **ACM MM**\n    - ***Adma-GAN：*** 基于属性驱动的记忆增强GAN用于文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.14046.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FHsintien-Ng\u002FAdma-GAN)\n    - 文本到图像生成中的背景布局生成与物体知识迁移 [[论文]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3503161.3548154)\n    - ***DSE-GAN：*** 用于文本到图像生成的动态语义演化生成对抗网络 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.01339.pdf)\n    - ***AtHom：*** 在文本到图像合成中由同态训练激发的两种发散注意力 [[论文]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1145\u002F3503161.3548159)\n  - **arXiv**\n    - ***DALLE-2：*** 基于CLIP潜在空间的层次化条件文本图像生成 [[论文]](https:\u002F\u002Fcdn.openai.com\u002Fpapers\u002Fdall-e-2.pdf)\n    - ***PITI：*** 图像到图像转换只需预训练即可 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2205.12952.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FPITI-Synthesis\u002FPITI)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n- \u003Cspan id=\"text-year-2021\">**2021年**\u003C\u002Fspan>\n  - **ICCV**\n    -  ***DAE-GAN：*** 用于文本到图像合成的动态宽高比感知GAN [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fpapers\u002FRuan_DAE-GAN_Dynamic_Aspect-Aware_GAN_for_Text-to-Image_Synthesis_ICCV_2021_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fhiarsal\u002FDAE-GAN)\n  - **NeurIPS**\n    - ***CogView：*** 通过Transformer掌握文本到图像生成 [[论文]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2021\u002Ffile\u002Fa4d92e2cd541fca87e4620aba658316d-Paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FCogView) [[演示]](https:\u002F\u002Fthudm.github.io\u002FCogView\u002Findex.html)\n    - ***UFC-BERT：*** 统一多模态控制以实现条件图像生成 [[论文]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2021\u002Ffile\u002Fe46bc064f8e92ac2c404b9871b2a4ef2-Paper.pdf)\n  - **ICML**\n    - ***DALLE-1：*** 零样本文本到图像生成 [[论文]](https:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Framesh21a\u002Framesh21a.pdf) [[复现代码]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002FDALLE-pytorch)\n   -  **ACM MM**\n      - 用于文本到图像合成的循环一致性逆向GAN [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.01361.pdf)\n      - ***R-GAN：*** 通过生成对抗网络探索类人方式实现合理的文本到图像合成 [[论文]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3474085.3475363)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n- \u003Cspan id=\"text-year-2020\">**2020年**\u003C\u002Fspan>\n  - **ACM MM**\n    - 基于美学布局的文本到图像合成 [[论文]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3394171.3414357)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n\n## 条件文本到图像生成\n- \u003Cspan id=\"conditional-year-2025\">**2025年**\u003C\u002Fspan>\n  - **CVPR**\n    - 用于模块化条件图像合成的无训练密集对齐扩散引导 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.01515) [[代码]](https:\u002F\u002Fgithub.com\u002FZixuanWang0525\u002FDADG)\n  - **ICCV**\n    - ***UNO：*** 一种适用于单主体和多主体条件的通用定制方法 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.02160) [[项目]](https:\u002F\u002Fbytedance.github.io\u002FUNO) [[代码]](https:\u002F\u002Fgithub.com\u002Fbytedance\u002FUNO)\n    - ***CoMPaSS：*** 增强文本到图像扩散模型中的空间理解能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.13195) [[项目]](https:\u002F\u002Fcompass.blurgy.xyz) [[代码]](https:\u002F\u002Fgithub.com\u002Fblurgyy\u002FCoMPaSS)\n    - 
***SP‑Ctrl：*** 重新思考用于姿态引导文本到图像生成的稀疏信号 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.20983) [[代码]](https:\u002F\u002Fgithub.com\u002FDREAMXFAR\u002FSP-Ctrl)\n    - ***CompCon：*** 发现文本到图像模型之间的差异性表征 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.08940) [[代码]](https:\u002F\u002Fgithub.com\u002Fadobe-research\u002FCompCon)\n    - ***C2OT：*** 条件的诅咒：分析并改进基于流的条件生成中的最优传输 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.10636) [[项目]](https:\u002F\u002Fhkchengrex.com\u002FC2OT) [[代码]](https:\u002F\u002Fgithub.com\u002Fhkchengrex\u002FC2OT)\n    - ***RAG‑Diffusion：*** 通过硬绑定与软细化实现区域感知的文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.06558) [[项目]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FNJU\u002FRAG-Diffusion) [[代码]](https:\u002F\u002Fgithub.com\u002FNJU-PCALab\u002FRAG-Diffusion)\n    - ***CharaConsist：*** 细粒度的一致性角色生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.11533) [[项目]](https:\u002F\u002Fmurray-wang.github.io\u002FCharaConsist) [[代码]](https:\u002F\u002Fgithub.com\u002FMurray-Wang\u002FCharaConsist)\n    - ***Shadow Director：*** 文本到图像扩散模型中人像生成的参数化阴影控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.21943) [[项目]](https:\u002F\u002Fhm-cai.com\u002FShadowDirector)\n    - ***ImageGen‑CoT：*** 利用思维链推理增强文本到图像的上下文学习能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.19312) [[项目]](https:\u002F\u002Fimagegen-cot.github.io)\n\n- \u003Cspan id=\"conditional-year-2024\">**2024年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***PLACE：*** 用于语义图像合成的自适应布局-语义融合 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.01852.pdf)\n    - 一次性结构感知风格化图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.17275.pdf)\n    - 基于注意力重聚焦的接地文本到图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.05427.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FAttention-Refocusing\u002Fattention-refocusing) [[项目]](https:\u002F\u002Fattention-refocusing.github.io\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fattention-refocusing\u002FAttention-refocusing)\n    - 用于姿态引导的人体图像合成的粗细结合潜在扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.18078.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FYanzuoLu\u002FCFLD)\n    - ***DetDiffusion：*** 协同生成与感知模型以增强数据生成与感知能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.13304)\n    - ***CAN：*** 用于可控图像生成的条件感知神经网络 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.01143.pdf)\n    - ***SceneDiffusion：*** 使用分层场景扩散任意移动物体 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.07178)\n    - ***Zero-Painter：*** 无需训练的文本到图像合成布局控制 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FOhanyan_Zero-Painter_Training-Free_Layout_Control_for_Text-to-Image_Synthesis_CVPR_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FZero-Painter)\n    - ***MIGC：*** 用于文本到图像合成的多实例生成控制器 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhou_MIGC_Multi-Instance_Generation_Controller_for_Text-to-Image_Synthesis_CVPR_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Flimuloo\u002FMIGC) [[项目]](https:\u002F\u002Fmigcproject.github.io\u002F)\n    - ***FreeControl：*** 对任何文本到图像扩散模型在任意条件下实现无需训练的空间控制 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FMo_FreeControl_Training-Free_Spatial_Control_of_Any_Text-to-Image_Diffusion_Model_with_CVPR_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fgenforce\u002Ffreecontrol) 
[[项目]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FMo_FreeControl_Training-Free_Spatial_Control_of_Any_Text-to-Image_Diffusion_Model_with_CVPR_2024_paper.pdf)\n  - **ECCV**\n    - ***PreciseControl：*** 通过细粒度属性控制提升文本到图像扩散模型性能 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.05083) [[代码]](https:\u002F\u002Fgithub.com\u002Frishubhpar\u002FPreciseControl) [[项目]](https:\u002F\u002Frishubhpar.github.io\u002FPreciseControl.home\u002F)\n    - ***AnyControl：*** 通过多功能控制创作你的艺术作品 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F01706.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002FAnyControl)\n  - **NeurIPS**\n    - ***Ctrl-X：*** 在无指导的情况下控制文本到图像生成的结构与外观 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.07540) [[代码]](https:\u002F\u002Fgithub.com\u002Fgenforce\u002Fctrl-x) [[项目]](https:\u002F\u002Fgenforce.github.io\u002Fctrl-x\u002F)\n  - **ICLR**\n    - 利用渐进式条件扩散模型推进姿态引导图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.06313.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fmuzishen\u002FPCDMs)\n  - **WACV**\n    - 基于交叉注意力引导的无需训练布局控制 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FWACV2024\u002Fpapers\u002FChen_Training-Free_Layout_Control_With_Cross-Attention_Guidance_WACV_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fsilent-chen\u002Flayout-guidance) [[项目]](https:\u002F\u002Fsilent-chen.github.io\u002Flayout-guidance\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fsilentchen\u002Flayout-guidance)\n  - **AAAI**\n    - ***SSMG：*** 基于空间-语义地图引导的扩散模型，用于自由格式布局到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.10156.pdf)\n    - 利用扩散模型的注意力图控制进行组合式文本到图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13921.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FOPPO-Mente-Lab\u002Fattention-mask-control)\n  - **arXiv**\n    - ***DEADiff：*** 具有解耦表示的高效风格化扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.06951)\n    - ***InstantStyle：*** 文本到图像生成中风格保留的免费午餐 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.02733.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FInstantStyle\u002FInstantStyle) [[项目]](https:\u002F\u002Finstantstyle.github.io\u002F)\n    - ***ControlNet++：*** 通过高效的连贯性反馈改进条件控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.07987.pdf) [[项目]](https:\u002F\u002Fliming-ai.github.io\u002FControlNet_Plus_Plus\u002F)\n    - ***Hunyuan-DiT：*** 一款功能强大的多分辨率扩散Transformer，具备精细的中文理解能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.08748) [[代码]](https:\u002F\u002Fgithub.com\u002FTencent\u002FHunyuanDiT) [[项目]](https:\u002F\u002Fdit.hunyuan.tencent.com\u002F)\n    - ***DialogGen：*** 多模态交互式对话系统，用于多轮次文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.08857) [[代码]](https:\u002F\u002Fgithub.com\u002FCentaurusalpha\u002FDialogGen) [[项目]](https:\u002F\u002Fhunyuan-dialoggen.github.io\u002F)\n    - ***ControlNeXt：*** 强大而高效的图像和视频生成控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.06070) [[代码]](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FControlNeXt) [[项目]](https:\u002F\u002Fpbihao.github.io\u002Fprojects\u002Fcontrolnext\u002Findex.html)\n    - ***UniPortrait：*** 一个统一框架，用于单人及多人图像的身份保留个性化 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.05939) [[代码]](https:\u002F\u002Fgithub.com\u002Fjunjiehe96\u002FUniPortrait) [[项目]](https:\u002F\u002Faigcdesigngroup.github.io\u002FUniPortrait-Page\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FJunjie96\u002FUniPortrait)\n    - ***OmniControl：*** 
针对扩散Transformer的极简且通用的控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.15098) [[代码]](https:\u002F\u002Fgithub.com\u002FYuanshi9815\u002FOminiControl) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FYuanshi\u002FOminiControl)\n    - ***UnZipLoRA：*** 从单张图片中分离内容与风格 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.04465) [[项目]](https:\u002F\u002Funziplora.github.io\u002F)\n    - ***CtrLoRA：*** 一个可扩展且高效的可控图像生成框架 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.09400) [[代码]](https:\u002F\u002Fgithub.com\u002FxyfJASON\u002Fctrlora)\n    - 基于硬绑定与软细化的区域感知文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.06558) [[代码]](https:\u002F\u002Fgithub.com\u002FNJU-PCALab\u002FRAG-Diffusion)\n\n\n\n\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n- \u003Cspan id=\"conditional-year-2023\">**2023年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***GLIGEN:*** 开放集接地文本到图像生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FLi_GLIGEN_Open-Set_Grounded_Text-to-Image_Generation_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN) [[项目]](https:\u002F\u002Fgligen.github.io\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fgligen\u002Fdemo) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=-MCkU7IAGKs&feature=youtu.be)\n    - 基于残差量化的自回归图像生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FLee_Autoregressive_Image_Generation_Using_Residual_Quantization_CVPR_2022_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fkakaobrain\u002Frq-vae-transformer)\n    - ***SpaText:*** 用于可控图像生成的时空文本表示 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FAvrahami_SpaText_Spatio-Textual_Representation_for_Controllable_Image_Generation_CVPR_2023_paper.pdf) [[项目]](https:\u002F\u002Fomriavrahami.com\u002Fspatext\u002F) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=VlieNoCwHO4)\n    - 具有语义-空间感知GAN的文本到图像生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FLiao_Text_to_Image_Generation_With_Semantic-Spatial_Aware_GAN_CVPR_2022_paper.pdf)\n    - ***ReCo:*** 区域控制的文本到图像生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FYang_ReCo_Region-Controlled_Text-to-Image_Generation_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FReCo)\n    - ***LayoutDiffusion:*** 用于布局到图像生成的可控扩散模型 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FZheng_LayoutDiffusion_Controllable_Diffusion_Model_for_Layout-to-Image_Generation_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FZGCTroy\u002FLayoutDiffusion)\n  - **ICLR**\n    - ***Ctrl-U:*** 通过不确定性感知奖励建模实现鲁棒的条件图像生成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=eC2ICbECNM) [[项目]](https:\u002F\u002Fgrenoble-zhang.github.io\u002FCtrl-U-Page\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fgrenoble-zhang\u002FCtrl-U)\n  - **ICCV**\n    - ***ControlNet:*** 为文本到图像扩散模型添加条件控制 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FZhang_Adding_Conditional_Control_to_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet)\n    - ***SceneGenie:*** 场景图引导的扩散模型用于图像合成 
[[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023W\u002FSG2RL\u002Fpapers\u002FFarshad_SceneGenie_Scene_Graph_Guided_Diffusion_Models_for_Image_Synthesis_ICCVW_2023_paper.pdf) [[代码]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023W\u002FSG2RL\u002Fpapers\u002FFarshad_SceneGenie_Scene_Graph_Guided_Diffusion_Models_for_Image_Synthesis_ICCVW_2023_paper.pdf)\n    - ***ZestGuide:*** 零样本的空间布局条件化用于文本到图像扩散模型 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FCouairon_Zero-Shot_Spatial_Layout_Conditioning_for_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf)\n  - **ICML**\n    - ***Composer:*** 具有可组合条件的创意和可控图像合成 [[论文]](https:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Fhuang23b\u002Fhuang23b.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002Fcomposer) [[项目]](https:\u002F\u002Fali-vilab.github.io\u002Fcomposer-page\u002F)\n    - ***MultiDiffusion:*** 融合扩散路径以实现可控图像生成 [[论文]](https:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Fbar-tal23a\u002Fbar-tal23a.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fomerbt\u002FMultiDiffusion) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=D2Q0D1gIeqg) [[项目]](https:\u002F\u002Fmultidiffusion.github.io\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fweizmannscience\u002FMultiDiffusion)\n  - **SIGGRAPH** \n    - 草图引导的文本到图像扩散模型 [[论文]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3588432.3591560) [[复现代码]](https:\u002F\u002Fgithub.com\u002Fogkalu2\u002FSketch-Guided-Stable-Diffusion) [[项目]](https:\u002F\u002Fsketch-guided-diffusion.github.io\u002F)\n  - **NeurIPS**\n    - ***Uni-ControlNet:*** 文本到图像扩散模型的一体化控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.16322.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FShihaoZhaoZSH\u002FUni-ControlNet) [[项目]](https:\u002F\u002Fshihaozhaozsh.github.io\u002Funicontrolnet\u002F)\n    - ***Prompt Diffusion:*** 扩散模型解锁上下文学习 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=6BZS2EAkns) [[代码]](https:\u002F\u002Fgithub.com\u002FZhendong-Wang\u002FPrompt-Diffusion) [[项目]](https:\u002F\u002Fzhendong-wang.github.io\u002Fprompt-diffusion.github.io\u002F)\n  - **WACV**\n    - 更多控制，免费！基于语义扩散指导的图像合成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FWACV2023\u002Fpapers\u002FLiu_More_Control_for_Free_Image_Synthesis_With_Semantic_Diffusion_Guidance_WACV_2023_paper.pdf)\n  - **ACM MM**\n    -  ***LayoutLLM-T2I:*** 从LLM中提取布局指导以进行文本到图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.05095.pdf)\n  - **arXiv**\n    - ***T2I-Adapter:*** 学习适配器以挖掘文本到图像扩散模型的更多可控能力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.08453.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FT2I-Adapter) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FTencentARC\u002FT2I-Adapter-SDXL)\n    - ***BLIP-Diffusion:*** 用于可控文本到图像生成和编辑的预训练主体表征 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.14720.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FLAVIS\u002Ftree\u002Fmain\u002Fprojects\u002Fblip-diffusion)\n    - 用于可控图像合成的晚期约束扩散指导 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.11520) [[代码]](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002FLCDG)\n- \u003Cspan id=\"conditional-year-2022\">**2022年**\u003C\u002Fspan>\n  - **ICLR**\n    - ***SDEdit:*** 基于随机微分方程的引导式图像合成与编辑 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=aBsCjcPu_tE) [[代码]](https:\u002F\u002Fgithub.com\u002Fermongroup\u002FSDEdit) 
[[项目]](https:\u002F\u002Fsde-image-editing.github.io\u002F)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n\n## 个性化文生图生成\n- \u003Cspan id=\"personalized-year-2025\">**2025年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***SerialGen:*** 先标准化再个性化的个性化图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.01485) [[项目]](https:\u002F\u002Fserialgen.github.io\u002F)\n    - ***PatchDPO:*** 无需微调的个性化图像生成的补丁级DPO [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.03177) [[代码]](https:\u002F\u002Fgithub.com\u002FhqhQAQ\u002FPatchDPO)\n    - ***DreamCache:*** 通过特征缓存实现无需微调的轻量级个性化图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.17786v1)\n  - **ICCV**\n    - ***DrUM:*** 捕捉你的思绪：基于文本到图像扩散模型中条件级别的建模进行个性化生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.03481) [[代码]](https:\u002F\u002Fgithub.com\u002FBurf\u002FDrUM)\n    - ***PersonaCraft:*** 利用遮挡感知的3D条件扩散模型实现个性化且可控的全身多人场景生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.18068) [[项目]](https:\u002F\u002Fgwang-kim.github.io\u002Fpersona_craft\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fgwang-kim\u002FPersonaCraft)\n    - ***Steering Guidance:*** 面向个性化文生图扩散模型的引导机制 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.00319)\n    - ***FreeCus:*** FreeCus：扩散Transformer中的免费午餐式主题驱动定制 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.15249) [[代码]](https:\u002F\u002Fgithub.com\u002FMonalissaa\u002FFreeCus)\n    - ***PromptDresser:*** 通过生成式文本提示和提示感知掩码提升虚拟试穿的质量与可控性 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.16978) [[代码]](https:\u002F\u002Fgithub.com\u002Frlawjdghek\u002FPromptDresser)\n    - ***DynamicID:*** 具有灵活面部编辑能力的零样本多身份图像个性化 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.06505) [[代码]](https:\u002F\u002Fgithub.com\u002FByteCat-bot\u002FDynamicID)\n    - ***UniversalBooth:*** 模型无关的个性化文生图生成\n    - ***ARBooth:*** 微调视觉自回归模型以实现主题驱动生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.02612) [[项目]](https:\u002F\u002Fjiwoogit.github.io\u002FARBooth\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fjiwoogit\u002FARBooth)\n    - ⚠️ ***ConceptSplit:*** 通过逐token适应和注意力解耦实现扩散模型的多概念解耦个性化 [[代码]](https:\u002F\u002Fgithub.com\u002FKU-VGI\u002FConceptSplit)\n    - ⚠️ ***ObjectMate:*** 用于对象插入和主题驱动生成的递归先验 [[项目]](https:\u002F\u002Fobject-mate.com)\n  - **NeurIPS**\n    - ***MS-Diffusion:*** 布局引导下的多主体零样本个性化图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.07209) [[项目]](https:\u002F\u002Fms-diffusion.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FMS-Diffusion\u002FMS-Diffusion)\n    - ***ClassDiffusion:*** 更加对齐的个性化调优，采用显式类别指导 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=iTm4H6N4aG) [[项目]](https:\u002F\u002Fclassdiffusion.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FRbrq03\u002FClassDiffusion)\n    - ***DreamBench++:*** 一个面向人类对齐的个性化图像生成基准测试 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.16855) [[项目]](https:\u002F\u002Fdreambenchplus.github.io\u002F)\n    - ***TweedieMix:*** 改进基于扩散的图像\u002F视频生成中的多概念融合 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.05591) [[代码]](https:\u002F\u002Fgithub.com\u002FKwonGihyun\u002FTweedieMix)\n- \u003Cspan id=\"personalized-year-2024\">**2024年**\u003C\u002Fspan>\n  - **CVPR**\n    - 个性化文生图的交叉初始化 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.15905.pdf)\n    - 当StyleGAN遇见Stable Diffusion：用于个性化图像生成的W+适配器 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.17461.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fcsxmli2016\u002Fw-plus-adapter) 
[[项目]](https:\u002F\u002Fcsxmli2016.github.io\u002Fprojects\u002Fw-plus-adapter\u002F)\n    - 通过共享注意力实现风格一致的图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.02133.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fstyle-aligned) [[项目]](https:\u002F\u002Fstyle-aligned-gen.github.io\u002F)\n    - ***InstantBooth:*** 无需测试时微调的个性化文生图生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.03411.pdf) [[项目]](https:\u002F\u002Fjshi31.github.io\u002FInstantBooth\u002F)\n    - 高保真的人像主题图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.10329.pdf)\n    - ***RealCustom:*** 缩小真实文本词汇范围，实现实时开放域文生图定制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.00483.pdf) [[项目]](https:\u002F\u002Fcorleone-huang.github.io\u002Frealcustom\u002F)\n    - ***DisenDiff:*** 用于解耦文生图个性化的注意力校准 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.18551) [[代码]](https:\u002F\u002Fgithub.com\u002FMonalissaa\u002FDisenDiff)\n    - ***FreeCustom:*** 无需调优即可实现多概念组合的定制化图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.13870v1) [[代码]](https:\u002F\u002Fgithub.com\u002Faim-uofa\u002FFreeCustom) [[项目]](https:\u002F\u002Faim-uofa.github.io\u002FFreeCustom\u002F)\n    - 用于概念驱动文生图的个性化残差 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.12978)\n    - 利用主体无关的指导改进主题驱动图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.01356)\n    - ***JeDi:*** 用于无需微调的个性化文生图生成的联合图像扩散模型 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZeng_JeDi_Joint-Image_Diffusion_Models_for_Finetuning-Free_Personalized_Text-to-Image_Generation_CVPR_2024_paper.pdf)\n    - 使用影响力水印对抗个性化文生图生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLiu_Countering_Personalized_Text-to-Image_Generation_with_Influence_Watermarks_CVPR_2024_paper.pdf)\n    - ***PIA:*** 通过文生图模型中的即插即用模块打造你的个性化图像动画师 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhang_PIA_Your_Personalized_Image_Animator_via_Plug-and-Play_Modules_in_Text-to-Image_CVPR_2024_paper.pdf) [[项目]](https:\u002F\u002Fpi-animator.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002FPIA)\n    - ***SSR-Encoder:*** 为主题驱动生成编码选择性主体表征 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhang_SSR-Encoder_Encoding_Selective_Subject_Representation_for_Subject-Driven_Generation_CVPR_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FXiaojiu-z\u002FSSR_Encoder)\n  - **ECCV**\n    - 做回自己：用于多主体文生图生成的受限注意力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.16990) [[项目]](https:\u002F\u002Fomer11a.github.io\u002Fbounded-attention\u002F)\n    - 强大而灵活：通过强化学习实现个性化文生图生成 [[论文]](http:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06642v1) [[代码]](https:\u002F\u002Fgithub.com\u002Fwfanyue\u002FDPG-T2I-Personalization)\n    - ***TIGC:*** 无需调优，仅凭图像和文本指导即可完成图像定制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.12658) [[代码]](https:\u002F\u002Fgithub.com\u002Fzrealli\u002FTIGIC) [[项目]](https:\u002F\u002Fzrealli.github.io\u002FTIGIC\u002F)\n    - ***MasterWeaver:*** 掌控可编辑性和人脸身份，实现个性化文生图生成 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F06786.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fcsyxwei\u002FMasterWeaver) [[项目]](https:\u002F\u002Fmasterweaver.github.io\u002F)\n  - **NeurIPS**\n    - ***RectifID:*** 基于锚定分类器指导的修正流个性化 [[论文]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2024\u002Ffile\u002Fafa58a5b6adc0845e0fd632132a64c39-Paper-Conference.pdf) 
[[代码]](https:\u002F\u002Fgithub.com\u002Ffeifeiobama\u002FRectifID)\n    - ***AttnDreamBooth:*** 朝着文本对齐的个性化图像生成迈进 [[论文]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2024\u002Ffile\u002F465a13a95741fab2e912f98adb07df1d-Paper-Conference.pdf) [[项目]](https:\u002F\u002Fattndreambooth.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FlyuPang\u002FAttnDreamBooth)\n  - **AAAI**\n    - 用于定制化图像生成的解耦文本嵌入 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.11826.pdf)\n  - **arXiv**\n    - ***FlashFace:*** 高保真地保留身份特征的人像个性化 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.17008) [[代码]](https:\u002F\u002Fgithub.com\u002Fjshilong\u002FFlashFace) [[项目]](https:\u002F\u002Fjshilong.github.io\u002Fflashface-page)\n    - ***MoMA:*** 多模态LLM适配器，用于快速个性化图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.05674)\n    - ***IDAdapter:*** 学习混合特征，实现文生图模型的无调优个性化 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.13535)\n    - ***CoRe:*** 面向文生图个性化的上下文正则化文本嵌入学习 [[论文]](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2408.15914)\n    - ***Imagine yourself:*** 无需调优的个性化图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.13346) [[项目]](https:\u002F\u002Fai.meta.com\u002Fresearch\u002Fpublications\u002Fimagine-yourself-tuning-free-personalized-image-generation\u002F)\n- \u003Cspan id=\"personalized-year-2023\">**2023年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***Custom Diffusion:*** 文生图扩散模型的多概念定制 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FKumari_Multi-Concept_Customization_of_Text-to-Image_Diffusion_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fadobe-research\u002Fcustom-diffusion) [[项目]](https:\u002F\u002Fwww.cs.cmu.edu\u002F~custom-diffusion\u002F)\n    - ***DreamBooth:*** 针对主题驱动生成对文生图扩散模型进行微调 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FRuiz_DreamBooth_Fine_Tuning_Text-to-Image_Diffusion_Models_for_Subject-Driven_Generation_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fdreambooth) [[项目]](https:\u002F\u002Fdreambooth.github.io\u002F)\n  - **ICCV**\n    - ***ELITE:*** 将视觉概念编码为文本嵌入，用于定制化文生图生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FWei_ELITE_Encoding_Visual_Concepts_into_Textual_Embeddings_for_Customized_Text-to-Image_ICCV_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fcsyxwei\u002FELITE)\n  - **ICLR**\n    - ***Textual Inversion:*** 一张图胜过千言万语：利用文本反转个性化文生图生成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=NAQvF08TcyG) [[代码]](https:\u002F\u002Fgithub.com\u002Frinongal\u002Ftextual_inversion) [[项目]](https:\u002F\u002Ftextual-inversion.github.io\u002F)\n  - **SIGGRAPH**\n    - ***Break-A-Scene:*** 从单张图像中提取多个概念 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.16311.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fbreak-a-scene)\n    - 基于编码器的领域调优，实现文生图模型的快速个性化 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.12228.pdf) [[项目]](https:\u002F\u002Ftuning-encoder.github.io\u002F)\n    - ***LayerDiffusion:*** 基于扩散模型的分层可控图像编辑 [[论文]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3610543.3626172)\n  - **arXiv**\n    - ***DreamTuner:*** 仅需一张图像即可实现主题驱动生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.13691.pdf) [[项目]](https:\u002F\u002Fdreamtuner-diffusion.github.io\u002F)\n    - ***PhotoMaker:*** 通过堆叠身份嵌入定制逼真的真人照片 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.04461.pdf) 
[[代码]](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FPhotoMaker)\n    - ***IP-Adapter:*** 文本兼容的图像提示适配器，用于文生图扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.06721.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Ftencent-ailab\u002FIP-Adapter) [[项目]](https:\u002F\u002Fip-adapter.github.io\u002F)\n    - ***FastComposer:*** 无需调优，通过局部注意力实现多主体图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.10431.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Ffastcomposer)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n\n\n## 文本引导的图像编辑\n- \u003Cspan id=\"editing-year-2025\">**2025年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***FDS:*** 面向文本引导潜在扩散图像编辑的频率感知去噪分数 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.19191)\n    - 基于参考的三平面3D感知图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.03632)\n    - ***MoEdit:*** 关于学习多对象图像编辑中的数量感知 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.10112)\n    - ⚠️ FeedEdit: 基于动态反馈调节的文本驱动图像编辑 [论文]\n  - **ICCV**\n    - ***In-Context Edit:*** 利用大规模扩散Transformer中的上下文生成实现指令式图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.20690) [[项目]](https:\u002F\u002Friver-zhang.github.io\u002FICEdit-gh-pages\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FRiver-Zhang\u002FICEdit?tab=readme-ov-file)\n    - ***双条件反演:*** 用于增强基于扩散的图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.02560)\n    - ***CAMILA:*** 具有语言对齐能力的上下文感知掩码技术用于图像编辑 [[论文]](https:\u002F\u002Fneurips.cc\u002Fvirtual\u002F2025\u002Fposter\u002F119101)\n    - ***EditInfinity:*** 基于二值量化生成模型的图像编辑 [[论文]](https:\u002F\u002Fneurips.cc\u002Fvirtual\u002F2025\u002Fposter\u002F115392)\n    - ***KRIS‑Bench:*** 图像编辑系统中基于知识推理的基准测试 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.16707) [[项目]](https:\u002F\u002Fyongliang-wu.github.io\u002Fkris_bench_project_page\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fmercurystraw\u002FKris_Bench)\n    - ***LoongX:*** 神经网络驱动的图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.05397) [[项目]](https:\u002F\u002Floongx1.github.io) [[代码]](https:\u002F\u002Fgithub.com\u002FLanceZPF\u002Floongx)\n    - ***CREA:*** CREA：一个用于创意图像编辑与生成的协作式多智能体框架 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.05306) [[项目]](https:\u002F\u002Fcrea-diffusion.github.io)\n    - ***IEAP:*** 使用扩散模型将图像编辑视为程序 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04158) [[项目]](https:\u002F\u002Fyujiahu1109.github.io\u002FIEAP\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FYujiaHu1109\u002FIEAP)\n  - **ICLR**\n    - 面向文本到图像扩散模型的闪电般快速图像反演与编辑 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=t9l63huPRt)\n    - 多奖励作为基于指令的图像编辑条件 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=9RFocgIccP) [[代码]](https:\u002F\u002Fgithub.com\u002Fbytedance\u002FMulti-Reward-Editing)\n    - ***HQ-Edit:*** 一个高质量的基于指令的图像编辑数据集 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=mZptYYttFj) [[数据集]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FUCSC-VLAA\u002FHQ-Edit) [[代码]](https:\u002F\u002Fgithub.com\u002FUCSC-VLAA\u002FHQ-Edit)\n    - ***CLIPDrag:*** 将基于文本和基于拖拽的指令结合用于图像编辑 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=2HjRezQ1nj) [[代码]](https:\u002F\u002Fgithub.com\u002FHKUST-LongGroup\u002FCLIPDrag)\n    - 使用修正随机微分方程进行语义图像反演与编辑 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=Hu0FSOSEyS) [[项目]](https:\u002F\u002Frf-inversion.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FLituRout\u002FRF-Inversion)\n    - ***PostEdit:*** 后验采样用于高效的零样本图像编辑 
[[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=J8YWCBPgx7) [[代码]](https:\u002F\u002Fgithub.com\u002FTFNTF\u002FPostEdit)\n    - ***OmniEdit:*** 通过专家监督构建图像编辑通用模型 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=Hlm0cga0sv) [[项目]](https:\u002F\u002Ftiger-ai-lab.github.io\u002FOmniEdit\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FOmniEdit) [[数据集]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTIGER-Lab\u002FOmniEdit-Filtered-1.2M)\n\n- \u003Cspan id=\"editing-year-2024\">**2024年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***InfEdit:*** 基于自然语言的无反演图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.04965.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fsled-group\u002FInfEdit) [[项目]](https:\u002F\u002Fsled-group.github.io\u002FInfEdit\u002F)\n    - 理解稳定扩散模型中的交叉注意力与自注意力在文本引导图像编辑中的作用 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.03431.pdf)\n    - 面向文本驱动图像编辑的双重溯因反事实推理 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.02981.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fxuesong39\u002FDAC)\n    - 聚焦你的指令：通过注意力调制实现细粒度多指令图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.10113.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fguoqincode\u002FFocus-on-Your-Instruction)\n    - 用于文本引导潜在扩散图像编辑的对比去噪分数 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.18608.pdf)\n    - ***DragDiffusion:*** 利用扩散模型实现交互式基于点的图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.14435.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FYujun-Shi\u002FDragDiffusion)\n    - ***DiffEditor:*** 提升基于扩散的图像编辑的准确性和灵活性 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.02583.pdf)\n    - ***FreeDrag:*** 基于特征拖拽的可靠点式图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.04684.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FLPengYang\u002FFreeDrag)\n    - 通过可学习区域进行文本驱动图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.16432.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fyuanze-lin\u002FLearnable_Regions) [[项目]](https:\u002F\u002Fyuanze-lin.me\u002FLearnableRegions_page\u002F) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=FpMWRXFraK8&feature=youtu.be)\n    - ***LEDITS++:*** 使用文生图模型实现无限可能的图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.16711.pdf) [[代码]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fediting-images\u002Fledtisplusplus\u002Ftree\u002Fmain) [[项目]](https:\u002F\u002Fleditsplusplus-project.static.hf.space\u002Findex.html) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fediting-images\u002Fleditsplusplus)\n    - ***SmartEdit:*** 探索大型语言模型支持下的复杂指令驱动图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.06739.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FSmartEdit) [[项目]](https:\u002F\u002Fyuzhou914.github.io\u002FSmartEdit\u002F)\n    - ***Edit One for All:*** 交互式批量图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.10219.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fthaoshibe\u002Fedit-one-for-all) [[项目]](https:\u002F\u002Fthaoshibe.github.io\u002Fedit-one-for-all\u002F)\n    - ***DiffMorpher:*** 挖掘扩散模型在图像变形中的潜力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.07409) [[代码]](https:\u002F\u002Fgithub.com\u002FKevin-thu\u002FDiffMorpher) [[项目]](https:\u002F\u002Fkevin-thu.github.io\u002FDiffMorpher_page\u002F) [[演示]](https:\u002F\u002Fopenxlab.org.cn\u002Fapps\u002Fdetail\u002FKaiwenZhang\u002FDiffMorpher)\n    - ***TiNO-Edit:*** 用于鲁棒扩散式图像编辑的时间步长与噪声优化 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.11120) [[代码]](https:\u002F\u002Fgithub.com\u002FSherryXTChen\u002FTiNO-Edit)\n    - 
人在其位：为人体-物体交互图像编辑生成关联骨架引导图 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_Person_in_Place_Generating_Associative_Skeleton-Guidance_Maps_for_Human-Object_Interaction_CVPR_2024_paper.pdf) [[项目]](https:\u002F\u002Fyangchanghee.github.io\u002FPerson-in-Place_page\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FYangChangHee\u002FCVPR2024_Person-In-Place_RELEASE)\n    - 引用式图像编辑：通过引用表达进行对象级图像编辑 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLiu_Referring_Image_Editing_Object-level_Image_Editing_via_Referring_Expressions_CVPR_2024_paper.pdf)\n    - 用于自监督文本引导图像操作的提示增强 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FBodur_Prompt_Augmentation_for_Self-supervised_Text-guided_Image_Manipulation_CVPR_2024_paper.pdf)\n    - 细节决定成败：StyleFeatureEditor用于细节丰富的StyleGAN反演及高质量图像编辑 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FBobkov_The_Devil_is_in_the_Details_StyleFeatureEditor_for_Detail-Rich_StyleGAN_CVPR_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FAIRI-Institute\u002FStyleFeatureEditor)\n  - **ECCV**\n    - ***RegionDrag:*** 基于扩散模型的快速区域图像编辑 [[论文]](http:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.18247v1) [[代码]](https:\u002F\u002Fgithub.com\u002FVisual-AI\u002FRegionDrag) [[项目]](https:\u002F\u002Fvisual-ai.github.io\u002Fregiondrag\u002F) [[演示]](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1pnq9t_1zZ8yL_Oba20eBLVZLp3glniBR?usp=sharing)\n    - ***TurboEdit:*** 即时文本驱动图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.08332v1) [[项目]](https:\u002F\u002Fbetterze.github.io\u002FTurboEdit\u002F)\n    - ***InstructGIE:*** 向通用化图像编辑迈进 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.05018)\n    - ***StableDrag:*** 基于点的图像编辑中的稳定拖拽 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04437)\n    - ***Eta Inversion:*** 为基于扩散的现实图像编辑设计最优的Eta函数 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F02157.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Ffuriosa-ai\u002Feta-inversion) [[项目]](https:\u002F\u002Fgithub.com\u002Ffuriosa-ai\u002Feta-inversion)\n    - ***SwapAnything:*** 实现个性化图像编辑中的任意对象替换 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F04768.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Feric-ai-lab\u002Fswap-anything) [[项目]](https:\u002F\u002Fswap-anything.github.io\u002F)\n    - ***Guide-and-Rescale:*** 用于高效无调优现实图像编辑的自引导机制 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F08987.pdf)\n    - ***FreeDiff:*** 基于扩散模型的图像编辑中的渐进式频率截断 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F00759.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FThermal-Dynamics\u002FFreeDiff)\n    - 用于交互式图像编辑的懒惰扩散Transformer [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F03436.pdf) [[项目]](https:\u002F\u002Flazydiffusion.github.io\u002F)\n    - ***ByteEdit:*** 提升、合规并加速生成式图像编辑 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F00359.pdf) [[项目]](https:\u002F\u002Fbyte-edit.github.io\u002F)\n  - **ICLR**\n    - 通过多模态大语言模型指导基于指令的图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.17102.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fapple\u002Fml-mgie) [[项目]](https:\u002F\u002Fmllm-ie.github.io\u002F)\n    - 随机性的恩赐：在通用扩散图像编辑中，SDE优于ODE 
[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.01410.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FML-GSAI\u002FSDE-Drag) [[项目]](https:\u002F\u002Fml-gsai.github.io\u002FSDE-Drag-demo\u002F)\n    - ***Motion Guidance:*** 基于扩散的图像编辑与可微运动估计器 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.18085.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fdangeng\u002Fmotion_guidance) [[项目]](https:\u002F\u002Fdangeng.github.io\u002Fmotion_guidance\u002F)\n    - 面向图像编辑的对象感知反演与重组 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.12149.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Faim-uofa\u002FOIR) [[项目]](https:\u002F\u002Faim-uofa.github.io\u002FOIR-Diffusion\u002F)\n    - ***Noise Map Guidance:*** 具有空间上下文的反演用于现实图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.04625.pdf)\n  - **AAAI**\n    - 无调优反演增强控制用于一致性图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.14611)\n    - ***BARET:*** 基于平衡注意力的真实图像编辑，由目标文本反演驱动 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.05482)\n    - 通过缓存支持的稀疏扩散推理加速文本到图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.17423)\n    - 高保真度的基于扩散的图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.15707)\n    - ***AdapEdit:*** 面向文本连续性敏感图像编辑的时空引导自适应编辑算法 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.08019)\n    - ***TexFit:*** 基于扩散模型的文本驱动时尚图像编辑 [[论文]](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F28885)\n  - **arXiv**\n    - ***一件物品胜过一条提示：*** 具有解耦控制的多功能图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04880.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FasFeng\u002Fd-edit)\n    - 一维适配器统领一切：概念、扩散模型与擦除应用 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.16145) [[代码]](https:\u002F\u002Fgithub.com\u002FCon6924\u002FSPM) [[项目]](https:\u002F\u002Flyumengyao.github.io\u002Fprojects\u002Fspm)\n    - ***EditWorld:*** 模拟世界动态以进行遵循指令的图像编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.14785) [[代码]](https:\u002F\u002Fgithub.com\u002FYangLing0818\u002FEditWorld) [[项目]](https:\u002F\u002Fgithub.com\u002FYangLing0818\u002FEditWorld)\n    - ***ReasonPix2Pix:*** 高级图像编辑的指令推理数据集 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.11190)\n    - ***FlowEdit:*** 使用预训练流模型的无反演文本编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.08629) [[代码]](https:\u002F\u002Fgithub.com\u002Ffallenshock\u002FFlowEdit) [[项目]](https:\u002F\u002Fmatankleiner.github.io\u002Fflowedit\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffallenshock\u002FFlowEdit\u002F)\n- \u003Cspan id=\"editing-year-2023\">**2023年**\u003C\u002Fspan>\n  - **CVPR**\n    - 揭示文本到图像扩散模型中的解耦能力 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FWu_Uncovering_the_Disentanglement_Capability_in_Text-to-Image_Diffusion_Models_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FUCSB-NLP-Chang\u002FDiffusionDisentanglement)\n    - ***SINE:*** 使用文本到图像扩散模型进行单张图像编辑 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FZhang_SINE_SINgle_Image_Editing_With_Text-to-Image_Diffusion_Models_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fzhang-zx\u002FSINE)\n    - ***Imagic:*** 基于文本的现实图像编辑，使用扩散模型 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FKawar_Imagic_Text-Based_Real_Image_Editing_With_Diffusion_Models_CVPR_2023_paper.pdf)\n    - ***InstructPix2Pix:*** 学习遵循图像编辑指令 
[[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FBrooks_InstructPix2Pix_Learning_To_Follow_Image_Editing_Instructions_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Ftimothybrooks\u002Finstruct-pix2pix) [[数据集]](https:\u002F\u002Finstruct-pix2pix.eecs.berkeley.edu\u002F) [[项目]](https:\u002F\u002Fwww.timothybrooks.com\u002Finstruct-pix2pix\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ftimbrooks\u002Finstruct-pix2pix)\n    - ***空文本反演***，用于借助引导扩散模型编辑真实图像 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FMokady_NULL-Text_Inversion_for_Editing_Real_Images_Using_Guided_Diffusion_Models_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fprompt-to-prompt\u002F#null-text-inversion-for-editing-real-images)\n  - **ICCV**\n    - ***MasaCtrl:*** 无调优的互惠自注意力控制，用于一致的图像合成与编辑 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FCao_MasaCtrl_Tuning-Free_Mutual_Self-Attention_Control_for_Consistent_Image_Synthesis_and_ICCV_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FMasaCtrl) [[项目]](https:\u002F\u002Fljzycmd.github.io\u002Fprojects\u002FMasaCtrl\u002F) [[演示]](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1DZeQn2WvRBsNg4feS1bJrwWnIzw1zLJq?usp=sharing)\n    - 使用文本到图像扩散模型定位对象级别的形状变化 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FPatashnik_Localizing_Object-Level_Shape_Variations_with_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Forpatashnik\u002Flocal-prompt-mixing) [[项目]](https:\u002F\u002Forpatashnik.github.io\u002Flocal-prompt-mixing\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Forpatashnik\u002Flocal-prompt-mixing)\n  - **ICLR**\n    - ***SDEdit:*** 借助随机微分方程进行引导的图像合成与编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.01073.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fermongroup\u002FSDEdit) [[项目]](https:\u002F\u002Fsde-image-editing.github.io\u002F)\n- \u003Cspan id=\"editing-year-2022\">**2022年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***DiffusionCLIP:*** 文本引导的扩散模型，用于鲁棒的图像操作 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FKim_DiffusionCLIP_Text-Guided_Diffusion_Models_for_Robust_Image_Manipulation_CVPR_2022_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fgwang-kim\u002FDiffusionCLIP)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n\n\n## 文本图像生成\n- \u003Cspan id=\"gentext-year-2024\">**2024年**\u003C\u002Fspan>\n  - **arXiv**\n    - ***AnyText:*** 多语言视觉文本生成与编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.03054.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Ftyxsspa\u002FAnyText) [[项目]](https:\u002F\u002Fanytext.pics\u002F)\n  - **CVPR**\n    - ***SceneTextGen:*** 布局无关的场景文本图像合成，结合字符级扩散模型与上下文一致性 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.01062v2)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n# 数据集\n- ***Microsoft COCO:*** 上下文中的常见物体 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1405.0312.pdf) [[数据集]](https:\u002F\u002Fcocodataset.org\u002F#home)\n- ***Conceptual Captions:*** 清洗过的、超义词化的图像替代文本数据集，用于自动图像字幕生成 [[论文]](https:\u002F\u002Faclanthology.org\u002FP18-1238.pdf) [[数据集]](https:\u002F\u002Fai.google.com\u002Fresearch\u002FConceptualCaptions\u002F)\n- ***LAION-5B:*** 一个开放的大规模数据集，用于训练下一代图文模型 
[[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=M3Y74vmsMcY) [[数据集]](https:\u002F\u002Flaion.ai\u002F)\n- ***PartiPrompts:*** 面向丰富内容文本到图像生成的自回归模型扩展 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2206.10789) [[数据集]](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fparti?tab=readme-ov-file) [[项目]](https:\u002F\u002Fsites.research.google\u002Fparti\u002F)\n\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n# 工具包\n|名称|官网|描述|\n|-|-|-|\n|Stable Diffusion WebUI|[link](https:\u002F\u002Fgithub.com\u002FAUTOMATIC1111\u002Fstable-diffusion-webui)|基于Gradio构建，本地部署以运行Stable Diffusion检查点、LoRA权重、ControlNet权重等。|\n|Stable Diffusion WebUI-forge|[link](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002Fstable-diffusion-webui-forge)|基于Gradio构建，本地部署以运行Stable Diffusion检查点、LoRA权重、ControlNet权重等。|\n|Fooocus|[link](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FFooocus)|基于Gradio构建，离线、开源且免费。\u003Cbr \u002F>无需手动调整，用户只需关注提示词和图像即可。|\n|ComfyUI|[link](https:\u002F\u002Fgithub.com\u002Fcomfyanonymous\u002FComfyUI)|本地部署，支持使用Stable Diffusion进行自定义工作流。|\n|Civitai|[link](https:\u002F\u002Fcivitai.com\u002F)|社区驱动的Stable Diffusion和LoRA检查点网站。|\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# 问答\n- **问：本文献列表的会议顺序是什么？**\n  - 本文献列表按照以下顺序排列：\n    - CVPR\n    - ICCV\n    - ECCV\n    - WACV\n    - NeurIPS\n    - ICLR\n    - ICML\n    - ACM MM\n    - SIGGRAPH\n    - AAAI\n    - arXiv\n    - 其他\n- **问：‘其他’指的是什么？**\n  - 一些研究（例如‘Stable Cascade’）并未在arXiv上发表技术报告，而是倾向于在其官方网站上撰写博客文章。‘其他’类别即指这类研究。\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# 参考文献\n\n`reference.bib`文件汇总了最新的文生图论文、常用数据集和工具包的BibTeX参考文献。基于原始参考文献，我进行了如下修改，以使它们在LaTeX文档中呈现得更加美观：\n- 参考文献通常采用“作者-etal-年份-昵称”的形式。特别是数据集和工具包的参考文献直接使用“昵称”，如“imagenet”。\n- 在每条参考文献中，所有会议或期刊名称均被转换为缩写，例如“Computer Vision and Pattern Recognition -> CVPR”。\n- 移除了所有参考文献中的`url`、`doi`、`publisher`、`organization`、`editor`和`series`字段。\n- 对于缺少页码的参考文献，补充了页码信息。\n- 所有论文标题均采用首字母大写格式，并额外添加了`{}`，以确保在某些特定模板中也能正确显示首字母大写。\n\n如果您对参考文献格式有其他需求，可以通过在[DBLP](https:\u002F\u002Fdblp.org\u002F)或[Google Scholar](https:\u002F\u002Fscholar.google.com\u002F)中搜索论文名称来参考原始文献。\n\n> [!NOTE]\n> 请注意，主页和[主题章节](topics\u002Ftopics.md)中的参考文献可能会在`reference.bib`中重复出现。个人建议使用“Ctrl+F”\u002F“Command+F”来查找您所需的BibTeX参考文献。\n\n [\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# 点赞历史\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-text-to-image-studies_readme_06364993e832.png\" target=\"_blank\">\n        \u003Cimg width=\"500\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-text-to-image-studies_readme_06364993e832.png\" alt=\"点赞历史图表\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# 微信群\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"YOUR_OFFICIAL_WEBSITE_URL\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-text-to-image-studies_readme_bebe6963d95b.png\" alt=\"群组\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)","# awesome-text-to-image-studies 快速上手指南\n\n`awesome-text-to-image-studies` 并非一个可直接运行的软件库或工具包，而是一个**文生图（Text-to-Image, 
T2I）领域的学术资源汇总仓库**。它主要收集了相关的论文、综述、数据集、开源代码链接以及在线产品列表。\n\n本指南将指导开发者如何高效利用该仓库获取最新的研究成果和代码资源。\n\n## 环境准备\n\n由于本仓库主要是文档和资源索引，**无需安装特定的运行时环境或依赖库**即可浏览内容。\n\n若您需要运行仓库中链接的具体论文代码，通常需要具备以下基础开发环境：\n\n*   **操作系统**: Linux (推荐 Ubuntu 20.04+), macOS, 或 Windows (WSL2)\n*   **编程语言**: Python 3.8+\n*   **深度学习框架**: PyTorch (大多数扩散模型基于此)\n*   **硬件加速**: NVIDIA GPU (推荐显存 8GB 以上，运行大模型建议 16GB+)\n*   **版本控制**: Git\n\n## 获取资源步骤\n\n### 1. 克隆仓库\n使用 Git 将资源列表下载到本地，以便离线查阅或追踪更新。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002Fawesome-text-to-image-studies.git\ncd awesome-text-to-image-studies\n```\n\n> **国内加速建议**：如果直接克隆速度较慢，可使用国内镜像源（如 Gitee 镜像，若有）或配置代理。\n> ```bash\n> # 示例：使用国内镜像加速（需确认具体镜像地址，此处为通用示意）\n> git clone https:\u002F\u002Fgitee.com\u002Fmirror\u002Fawesome-text-to-image-studies.git\n> ```\n\n### 2. 浏览分类资源\n进入目录后，主要通过阅读 `README.md` 或 `topics\u002Ftopics.md` 文件来查找目标资源。仓库内容按以下维度分类：\n*   **研究方向**: 文生图生成、条件生成、个性化生成、文本引导编辑等。\n*   **时间年份**: 涵盖 2020 年至 2025 年的最新论文。\n*   **顶会期刊**: CVPR, ICCV, ECCV, NeurIPS, ICLR, AAAI, TPAMI 等。\n\n## 基本使用流程\n\n本仓库的核心用法是\"**检索 -> 定位代码 -> 独立部署**\"。以下是典型的使用示例：\n\n### 场景：查找并运行一篇 2025 年的文生图论文代码\n\n**步骤 1：在列表中定位论文**\n打开 `README.md`，找到 `Text-to-Image Generation` -> `Year 2025` 部分。\n例如，找到论文 ***PreciseCam*** (Precise Camera Control for Text-to-Image Generation)。\n\n**步骤 2：获取官方代码链接**\n在 README 中点击该论文对应的 `[[Code]]` 链接（通常指向独立的 GitHub 仓库）。\n*   示例链接：`https:\u002F\u002Fgithub.com\u002Fedurnebernal\u002FPreciseCam`\n\n**步骤 3：克隆并安装具体项目**\n跳转到该项目的独立页面，按照其自身的 `README` 进行安装。通常流程如下：\n\n```bash\n# 1. 克隆具体项目的代码\ngit clone https:\u002F\u002Fgithub.com\u002Fedurnebernal\u002FPreciseCam.git\ncd PreciseCam\n\n# 2. 创建虚拟环境 (推荐)\nconda create -n precisecam python=3.9\nconda activate precisecam\n\n# 3. 安装依赖 (具体命令以该项目 requirements.txt 为准)\npip install -r requirements.txt\n\n# 4. 下载预训练模型 (根据项目指引下载权重文件)\n# ...\n\n# 5. 运行推理示例 (脚本名与参数以该项目 README 为准，以下仅为示意)\npython infer.py --prompt \"a photo of a cat\" --camera_angle 45\n```\n\n### 场景：查找综述文章\n若希望系统了解领域进展，可在 `Survey Papers` 章节直接点击 `[[Paper]]` 链接下载 PDF 阅读。\n*   例如：*Diffusion Models: A Comprehensive Survey of Methods and Applications*\n\n## 注意事项\n*   **时效性**: 仓库持续更新（如 2025 年会议论文），请定期 `git pull` 获取最新列表。\n*   **代码独立性**: 本仓库不提供统一的 `pip install` 命令，每个列出的研究项目都有独立的环境要求和安装脚本，请务必前往对应的项目主页查看详细说明。\n*   **数据集与工具**: 如需特定数据集或在线工具，可参考仓库中的 `Datasets` 和 `Products` 章节。","某高校计算机视觉实验室的研究生团队正致力于研发一种能精准控制人物姿态的文生图新算法，急需梳理最新的技术路线以确立创新点。\n\n### 没有 awesome-text-to-image-studies 时\n- **文献检索如大海捞针**：研究人员需在 arXiv、Google Scholar 等多个平台反复搜索关键词，难以区分哪些是真正的文生图核心论文，哪些只是边缘应用，耗时极长。\n- **技术演进脉络模糊**：面对 2020 年至 2025 年爆发的海量研究，团队难以快速理清从基础扩散模型到结合 LLM、Mamba 等新技术的演变逻辑，容易遗漏关键转折点。\n- **细分领域资源分散**：想要查找“个性化生成”或“文本引导编辑”等特定方向的论文及对应代码、数据集时，往往发现资源散落在不同仓库，缺乏统一入口。\n- **复现门槛高**：找到论文后，常因缺少官方代码链接、预训练模型权重或专用测试基准（如 DAVIS-Edit），导致算法复现和对比实验迟迟无法开展。\n\n### 使用 awesome-text-to-image-studies 后\n- **一站式获取权威清单**：团队直接查阅按年份（2020-2025）和会议（CVPR, ICCV 等）分类的论文列表，瞬间锁定近三年的核心研究成果，检索效率提升数倍。\n- **清晰把握技术前沿**：通过\"Topics\"板块，迅速掌握扩散模型与 Transformer、联邦学习等跨界融合的最新动态，快速定位到适合引入的创新技术组合。\n- **垂直领域精准导航**：利用“个性化生成”和“文本引导编辑”等细分目录，直接获取该方向下的所有相关论文、开源代码库及配套数据集，无需二次搜寻。\n- **复现链路完整闭环**：借助仓库提供的论文对应的代码地址、HuggingFace 模型权重及基准测试集，团队成员当天即可搭建环境并跑通基线模型，加速实验迭代。\n\nawesome-text-to-image-studies 将原本需要数周的文献调研与资源收集工作压缩至数小时，让研发团队能将宝贵精力集中于核心算法的创新与突破。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-text-to-image-studies_f52b33c2.png","AlonzoLeeeooo","USTC-liuchang","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FAlonzoLeeeooo_5e15b1aa.jpg","University of Science and Technology of China (USTC), Ph.D. 
Student, Computer Vision","University of Science and Technology of China","Hefei, China",null,"https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo",[82],{"name":83,"color":84,"percentage":85},"TeX","#3D6117",100,757,39,"2026-04-07T05:40:40","MIT","","未说明",{"notes":93,"python":91,"dependencies":94},"该仓库（awesome-text-to-image-studies）是一个论文、资源和工具集的汇总列表，本身不包含可执行的源代码或模型训练\u002F推理脚本，因此没有特定的运行环境需求。列表中提及的具体项目（如 StableV2V, PreciseCam 等）需参考其各自独立的仓库以获取环境配置信息。",[],[15],[97,98,99,100,101],"artificial-intelligence","diffusion-models","text-to-image","text-to-image-ai","text-to-image-diffusion","2026-03-27T02:49:30.150509","2026-04-10T02:46:42.922113",[],[]]