[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-ali-vilab--In-Context-LoRA":3,"tool-ali-vilab--In-Context-LoRA":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160784,2,"2026-04-19T11:32:54",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":73,"owner_company":75,"owner_location":75,"owner_email":75,"owner_twitter":75,"owner_website":75,"owner_url":76,"languages":75,"stars":77,"forks":78,"last_commit_at":79,"license":75,"difficulty_score":10,"env_os":80,"env_gpu":81,"env_ram":80,"env_deps":82,"category_tags":86,"github_topics":75,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":87,"updated_at":88,"faqs":89,"releases":122},9605,"ali-vilab\u002FIn-Context-LoRA","In-Context-LoRA","Official repository of In-Context LoRA for Diffusion Transformers","In-Context LoRA 是一个专为扩散变换器（Diffusion Transformers）设计的灵活生成式 AI 框架。它核心解决了传统模型在面对新任务时往往需要大量数据重新训练或微调的痛点，通过“上下文学习”机制，让用户仅需提供少量参考图像（如几张设计草图或角色照片），模型即可零样本（Zero-shot）理解并执行复杂的视觉生成任务。\n\n该技术特别适用于设计师、创意工作者以及 AI 研究人员。设计师可以利用它快速完成虚拟试衣、产品外观设计、视觉身份迁移或电影分镜生成等工作，大幅降低多任务适配的成本；研究人员则能借助其开源的训练配置和预训练模型，探索生成模型在泛化能力上的边界。目前社区已基于该技术开发了多种 ComfyUI 工作流，涵盖从服装换款到角色扮演等丰富场景。\n\nIn-Context LoRA 
的独特亮点在于其高效的适应性：无需针对每个新任务单独训练庞大的模型参数，而是通过动态调整注意力机制中的上下文令牌，实现“即插即用”的任务切换。这种设计不仅保留了高质量生成的特性，还显著提升了模型处理多样化现实世界设计任务的灵活性，是连接通用大模型与垂直应用场景的有力桥梁。","# In-Context LoRA (IC-LoRA)\n\n🔥 **Latest News!**\n\n- **[2024-12-17]** 🚀 We are excited to release **[IDEA-Bench](https:\u002F\u002Fali-vilab.github.io\u002FIDEA-Bench-Page\u002F)**, a comprehensive benchmark designed to assess the zero-shot task generalization abilities of generative models. The benchmark includes **100** real-world design tasks across **275** unique cases. Despite its general-purpose focus, the top-performing model, EMU2, achieves a score of only **6.81** out of 100, highlighting the current challenges in this domain. Explore the benchmark and challenge the limits of model performance!\n- **[2024-11-16]** 🌟 The community continues to innovate with IC-LoRA! Exciting projects include models, ComfyUI nodes and workflows for **Virtual Try-on, Product Design, Object Mitigation, Role Play**, and more. Explore their creations in **[Community Creations Using IC-LoRA](#community-creations-using-ic-lora)**. Huge thanks to all contributors for their incredible efforts!\n- **[2024-11-07]** 🚀 We have released **[10 pretrained models](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA)** for In-Context LoRA, covering diverse tasks such as Film Storyboard Generation, Visual Identity Design, and Visual Effects. See **[MODEL ZOO](#model-zoo)** for details. 
We also provide an [example workflow](.\u002Fworkflow\u002Ffilm-storyboard.json) for [ComfyUI](https:\u002F\u002Fgithub.com\u002Fcomfyanonymous\u002FComfyUI).\n- **[2024-11-01]** 📂 Data and training configurations for **[In-Context LoRA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.23775)** are now available!\n- **[2024-10-31]** 📜 Our latest paper, **[In-Context LoRA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.23775)**, introduces a flexible framework adaptable to a wide range of tasks.\n- **[2024-10-19]** 🎨 We released the paper **[Group Diffusion Transformers](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.15027)**, the predecessor of In-Context LoRA, offering zero-shot support for 30 visual generation tasks.\n- **[2024-4-18]** 💻 We release the **[code and models](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FFlashFace)** for **[FlashFace](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FFlashFace)**, a precursor to Group Diffusion Transformers, verifies attention token concatenation for customized generation scenarios.\n\nWelcome to the official repository of **In-Context LoRA for Diffusion Transformers** ([Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.23775) and [Project Page](https:\u002F\u002Fali-vilab.github.io\u002FIn-Context-LoRA-Page\u002F)).\n\n## Community Creations Using IC-LoRA\n\nWe are thrilled to showcase the community's innovative projects leveraging In-Context LoRA (IC-LoRA). If you have additional recommendations or projects to share, **please don't hesitate to send a [Pull Request](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FIn-Context-LoRA\u002Fpulls)!**\n\n| Project Name | Type                 | Supported Tasks                                                                 | Sample Results |\n|--------------|----------------------|---------------------------------------------------------------------------------|----------------|\n| 1. 
[Comfyui_Object_Migration](https:\u002F\u002Fgithub.com\u002FTTPlanetPig\u002FComfyui_Object_Migration) | ComfyUI Node & Workflow & LoRA Model         | Clothing Migration, Cartoon Clothing to Realism, and More     | ![Sample Result](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_cbf2aafb9f1c.png) |\n| 2. [Flux Simple Try On - In Context Lora](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F950111\u002Fflux-simple-try-on-in-context-lora) | LoRA Model & ComfyUI Workflow     | Virtual Try-on             | ![Sample Result](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_92f3994dcf7c.png) ![Sample Result](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_45b29ffc5827.jpeg) |\n| 3. [Flux In Context - visual identity Lora in Comfy](https:\u002F\u002Fcivitai.com\u002Farticles\u002F8779) | ComfyUI Workflow               | Visual Identity Transfer              | ![Sample Result](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_bd573fabf5ea.jpeg) |\n| 4. [Workflows Flux In Context Lora For Product Design](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F933018\u002Fworkflows-flux-in-context-lora-for-product-design) | ComfyUI Workflow               | Product Design, Role Play, and More              | ![Sample Result](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_602a49840b81.jpeg) |\n| 5. [Flux Product Design - In Context Lora](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F933026\u002Fflux-product-design-in-context-lora) | LoRA Model & ComfyUI Workflow               | Product Design              | ![Sample Result](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_6bece06e573a.jpeg) |\n| 6. 
[In Context lora + Character story generator + flux+ shichen](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F951357\u002Fin-context-lora-character-story-generator-flux-shichen) | ComfyUI Workflow               | Character Movie Story Generator              | ![Sample Result](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_57402590a7c5.jpeg) |\n| 7. [In- Context-Lora｜Cute 4koma 可爱四格漫画](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F947702\u002Fin-context-loracute-4koma) | LoRA Model & ComfyUI Workflow               | Comic Strip Generation              | ![Sample Result](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_d10c51b44cc8.jpeg) |\n| 8. [Creative Effects & Design LoRA Pack (In-Context LORA)](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F929592\u002Fcreative-effects-and-design-lora-pack-in-context-lora) | LoRA Model & ComfyUI Workflow               | Movie-Shot Generation and More              | ![Sample Result](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_69b6b6f37ada.jpeg) |\n\nWe extend our heartfelt thanks to all contributors for their exceptional work in advancing the IC-LoRA ecosystem.\n\n## Key Idea\n\nThe core concept of IC-LoRA is to **concatenate** both condition and target images into a single composite image while using **Natural Language** to define the task. 
This approach enables seamless adaptation to a wide range of applications.\n\n## Features\n\n- **Task-Agnostic Framework**: IC-LoRA serves as a general framework, but it requires task-specific fine-tuning for diverse applications.\n- **Customizable Image-Set Generation**: You can fine-tune text-to-image models to **generate image sets** with customizable intrinsic relationships.\n- **Condition on Image-Set**: You can also **condition the generation of a set of images on another set of images**, enabling a wide range of controllable generation applications.\n\nFor more detailed information and examples, please read our [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.23775) or visit our [Project Page](https:\u002F\u002Fali-vilab.github.io\u002FIn-Context-LoRA-Page\u002F).\n\n## Getting Started\n\nYou can directly use the open-source [AI-Toolkit](https:\u002F\u002Fgithub.com\u002Fostris\u002Fai-toolkit) to train IC-LoRA models. We have provided sample training data with a configuration file in this repo:\n\n- **Configuration File**: `config\u002Fmovie-shots.yml` (place it in the `config\u002F` directory of AI-Toolkit)\n- **Sample Training Data**: `data\u002Fmovie-shots.zip` (extract it to `data\u002Fmovie-shots` of AI-Toolkit)\n\nAfter installing the necessary dependencies and setting up AI-Toolkit, you can start training by running:\n\n```bash\npython run.py config\u002Fmovie-shots.yml\n```\n\nThe training runs on a single GPU with at least 24GB of memory (adjust the `resolution` parameter in `config\u002Fmovie-shots.yml` for different GPU memory limits). The training should complete in a few hours.\n\n## Prompt for Multi-Scene Image Captioning\n\nAs a reference, we provide an example prompt used to generate captions for multi-scene images:\n\n> *Create a short description of this three-scene image featuring movie shots, beginning with the prefix [MOVIE-SHOTS] for the entire caption, followed by an overall summary of the image. 
Each scene detail should flow within the same sentence, with specific markers [SCENE-1], [SCENE-2], [SCENE-3], indicating the start of each scene’s description. Name the role(s) with random name(s) if necessary, and wrap the name(s) with \"\u003C\" and \">\". Ensure the entire description is cohesive, flows as one sentence, and remains within 512 words.*\n\n## MODEL ZOO\n\nBelow lists 10 In-Context LoRA models and their recommend settings. We provide an [example workflow](.\u002Fworkflow\u002Ffilm-storyboard.json) for [ComfyUI](https:\u002F\u002Fgithub.com\u002Fcomfyanonymous\u002FComfyUI).\n\n| Task          | Model        | Recommend Settings | Example Prompt        |\n|---------------|-------------------|---------------------|---------------------------|\n| **1. Couple Profile Design** | [`couple-profile.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fcouple-profile.safetensors)   | `width: 2048, height: 1024` | `This two-part image portrays a couple of cartoon cats in detective attire; [LEFT] a black cat in a trench coat and fedora holds a magnifying glass and peers to the right, while [RIGHT] a white cat with a bow tie and matching hat raises an eyebrow in curiosity, creating a fun, noir-inspired scene against a dimly lit background.` |\n| **2. Film Storyboard**  | [`film-storyboard.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Ffilm-storyboard.safetensors) | `width: 1024, height: 1536`    | `[MOVIE-SHOTS] In a vibrant festival, [SCENE-1] we find \u003CLeo>, a shy boy, standing at the edge of a bustling carnival, eyes wide with awe at the colorful rides and laughter, [SCENE-2] transitioning to him reluctantly trying a daring game, his friends cheering him on, [SCENE-3] culminating in a triumphant moment as he wins a giant stuffed bear, his face beaming with pride as he holds it up for all to see.`  |\n| **3. 
Font Design** | [`font-design.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Ffont-design.safetensors)   | `width: 1792, height: 1216` | `The four-panel image showcases a playful bubble font in a vibrant pop-art style. [TOP-LEFT] displays \"Pop Candy\" in bright pink with a polka dot background; [TOP-RIGHT] shows \"Sweet Treat\" in purple, surrounded by candy illustrations; [BOTTOM-LEFT] has \"Yum!\" in a mix of bright colors; [BOTTOM-RIGHT] shows \"Delicious\" against a striped background, perfect for fun, kid-friendly products.` |\n| **4. Home Decoration** | [`home-decoration.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fhome-decoration.safetensors)      | `width: 1344, height: 1728` | `This four-panel image showcases a rustic living room with warm wood tones and cozy decor elements; [TOP-LEFT] features a large stone fireplace with wooden shelves filled with books and candles; [TOP-RIGHT] shows a vintage leather sofa draped in plaid blankets, complemented by a mix of textured cushions; [BOTTOM-LEFT] displays a corner with a wooden armchair beside a side table holding a steaming mug and a classic book; [BOTTOM-RIGHT] captures a cozy reading nook with a window seat, a soft fur throw, and decorative logs stacked neatly.` |\n| **5. 
Portrait Illustration** | [`portrait-illustration.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fportrait-illustration.safetensors)      | `width: 1152, height: 1088` | `This two-panel image presents a transformation from a realistic portrait to a playful illustration, capturing both detail and artistic flair; [LEFT] the photograph shows a woman standing in a bustling marketplace, wearing a wide-brimmed hat, a flowing bohemian dress, and a leather crossbody bag; [RIGHT] the illustration panel exaggerates her accessories and features, with the bohemian dress depicted in vibrant patterns and bold colors, while the background is simplified into abstract market stalls, giving the scene an animated and lively feel.` |\n| **6. Portrait Photography** | [`portrait-photography.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fportrait-photography.safetensors)      | `width: 1344, height: 1728` | `This [FOUR-PANEL] image illustrates a young artist's creative process in a bright and inspiring studio; [TOP-LEFT] she stands before a large canvas, brush in hand, adding vibrant colors to a partially completed painting, [TOP-RIGHT] she sits at a cluttered wooden table, sketching ideas in a notebook with various art supplies scattered around, [BOTTOM-LEFT] she takes a moment to step back and observe her work, adjusting her glasses thoughtfully, and [BOTTOM-RIGHT] she experiments with different textures by mixing paints directly on the palette, her focused expression showcasing her dedication to her craft.` |\n| **7. 
PPT Template** | [`ppt-templates.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fppt-templates.safetensors)      | `width: 1984, height: 1152` | `This four-panel image showcases a rustic-themed PowerPoint template for a culinary workshop; [TOP-LEFT] introduces \"Farm to Table Cooking\" in warm, earthy tones; [TOP-RIGHT] organizes workshop sections like \"Ingredients,\" \"Preparation,\" and \"Serving\"; [BOTTOM-LEFT] displays ingredient lists for seasonal produce; [BOTTOM-RIGHT] includes chef profiles with short bios.` |\n| **8. Sandstorm Visual Effect** | [`sandstorm-visual-effect.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fsandstorm-visual-effect.safetensors)      | `width: 1408, height: 1600` | `[SANDSTORM-PSA] This two-part image showcases the transformation of a cyclist through a sandstorm visual effect; [TOP] the upper panel features a cyclist in vibrant gear pedaling steadily on a clear, open road with a serene sky in the background, highlighting focus and determination, [BOTTOM] the lower panel transforms the scene as the cyclist becomes enveloped in a fierce sandstorm, with sand particles swirling intensely around the bike and rider against a stormy, darkened backdrop, emphasizing chaos and power.` |\n| **9. 
Sparklers Visual Effect** | [`sparklers-visual-effect.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fsparklers-visual-effect.safetensors)      | `width: 960, height: 1088` | `[REAL-SPARKLERS-OVERLAYS] The two-part image vividly illustrates a woodland proposal transformed by sparkler overlays; [TOP] the first panel depicts a man kneeling on one knee with an engagement ring before his partner in a forest clearing at dusk, with warm, natural lighting, [BOTTOM] while the second panel introduces glowing sparklers that form a heart shape around the couple, amplifying the romance and joy of the moment.` |\n| **10. Visual Identity Design** | [`visual-identity-design.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fvisual-identity-design.safetensors)      | `width: 1472, height: 1024` | `The two-panel image showcases the joyful identity of a produce brand, with the left panel showing a smiling pineapple graphic and the brand name “Fresh Tropic” in a fun, casual font on a light aqua background; [LEFT] while the right panel translates the design onto a reusable shopping tote with the pineapple logo in black, held by a person in a market setting, emphasizing the brand’s approachable and eco-friendly vibe.` |\n\n## License\n\nThis repository uses [FLUX](https:\u002F\u002Fgithub.com\u002Fblack-forest-labs\u002Fflux) as the base model. Users must comply with FLUX's license when using this code. Please refer to [FLUX's License](https:\u002F\u002Fgithub.com\u002Fblack-forest-labs\u002Fflux\u002Ftree\u002Fmain\u002Fmodel_licenses) for more details.\n\n**DISCLAIMER**: Please be aware that the training data provided in this repository may contain copyrighted material. The open-source data is intended for reference and educational purposes only. 
If you plan to use this data for commercial purposes, you are responsible for obtaining the necessary permissions and ensuring compliance with all applicable copyright laws and regulations.\n\n## Citation\n\nIf you find this work useful in your research, please consider citing:\n\n```bibtex\n@article{lhhuang2024iclora,\n  title={In-Context LoRA for Diffusion Transformers},\n  author={Huang, Lianghua and Wang, Wei and Wu, Zhi-Fan and Shi, Yupeng and Dou, Huanzhang and Liang, Chen and Feng, Yutong and Liu, Yu and Zhou, Jingren},\n  journal={arXiv preprint arxiv:2410.23775},\n  year={2024}\n}\n```\n\n```bibtex\n@article{lhhuang2024groupdiffusion,\n  title={Group Diffusion Transformers are Unsupervised Multitask Learners},\n  author={Huang, Lianghua and Wang, Wei and Wu, Zhi-Fan and Dou, Huanzhang and Shi, Yupeng and Feng, Yutong and Liang, Chen and Liu, Yu and Zhou, Jingren},\n  journal={arXiv preprint arxiv:2410.15027},\n  year={2024}\n}\n```\n","# 在上下文LoRA (IC-LoRA)\n\n🔥 **最新消息！**\n\n- **[2024-12-17]** 🚀 我们很高兴地发布了**[IDEA-Bench](https:\u002F\u002Fali-vilab.github.io\u002FIDEA-Bench-Page\u002F)**，这是一个全面的基准测试，旨在评估生成模型的零样本任务泛化能力。该基准包含**275个**独特案例中的**100个**真实世界设计任务。尽管其定位为通用型，表现最佳的模型EMU2在该基准上的得分也仅为100分中的**6.81分**，这凸显了当前该领域的挑战。快来探索这个基准，挑战模型性能的极限吧！\n- **[2024-11-16]** 🌟 社区持续以IC-LoRA进行创新！令人兴奋的项目包括用于**虚拟试穿、产品设计、物体迁移、角色扮演**等任务的模型、ComfyUI节点和工作流。请在**[使用IC-LoRA的社区创作](#community-creations-using-ic-lora)**中探索他们的作品。衷心感谢所有贡献者的卓越努力！\n- **[2024-11-07]** 🚀 我们发布了**[10个预训练模型](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA)**，适用于在上下文LoRA，涵盖电影分镜生成、视觉形象设计和视觉特效等多种任务。详情请参见**[模型库](#model-zoo)**。我们还提供了针对[ComfyUI](https:\u002F\u002Fgithub.com\u002Fcomfyanonymous\u002FComfyUI)的[示例工作流](.\u002Fworkflow\u002Ffilm-storyboard.json)。\n- **[2024-11-01]** 📂 **[在上下文LoRA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.23775)**的数据和训练配置现已开放下载！\n- **[2024-10-31]** 📜 我们的最新论文**[在上下文LoRA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.23775)**提出了一种灵活的框架，可适应多种任务。\n- **[2024-10-19]** 🎨 
我们发布了论文**[Group Diffusion Transformers](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.15027)**，这是在上下文LoRA的前身，能够支持30种视觉生成任务的零样本应用。\n- **[2024-4-18]** 💻 我们发布了**[代码和模型](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FFlashFace)**，用于**[FlashFace](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FFlashFace)**——Group Diffusion Transformers的早期版本——验证了注意力机制中token拼接技术在定制化生成场景中的应用。\n\n欢迎来到**扩散Transformer中的在上下文LoRA**官方仓库（[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.23775)和[项目页面](https:\u002F\u002Fali-vilab.github.io\u002FIn-Context-LoRA-Page\u002F)）。\n\n## 使用IC-LoRA的社区创作\n\n我们非常高兴地展示社区利用在上下文LoRA（IC-LoRA）所开发的创新项目。如果您有其他推荐或项目想要分享，请**随时提交[拉取请求](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FIn-Context-LoRA\u002Fpulls)**！\n\n| 项目名称 | 类型                 | 支持的任务                                                                 | 示例结果 |\n|----------|----------------------|------------------------------------------------------------------------------|----------|\n| 1. [Comfyui_Object_Migration](https:\u002F\u002Fgithub.com\u002FTTPlanetPig\u002FComfyui_Object_Migration) | ComfyUI节点 & 工作流 & LoRA模型         | 衣物迁移、卡通服装转写实风格等     | ![示例结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_cbf2aafb9f1c.png) |\n| 2. [Flux Simple Try On - In Context Lora](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F950111\u002Fflux-simple-try-on-in-context-lora) | LoRA模型 & ComfyUI工作流     | 虚拟试穿             | ![示例结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_92f3994dcf7c.png) ![示例结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_45b29ffc5827.jpeg) |\n| 3. [Flux In Context - visual identity Lora in Comfy](https:\u002F\u002Fcivitai.com\u002Farticles\u002F8779) | ComfyUI工作流               | 视觉形象迁移              | ![示例结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_bd573fabf5ea.jpeg) |\n| 4. 
[Workflows Flux In Context Lora For Product Design](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F933018\u002Fworkflows-flux-in-context-lora-for-product-design) | ComfyUI工作流               | 产品设计、角色扮演等              | ![示例结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_602a49840b81.jpeg) |\n| 5. [Flux Product Design - In Context Lora](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F933026\u002Fflux-product-design-in-context-lora) | LoRA模型 & ComfyUI工作流               | 产品设计              | ![示例结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_6bece06e573a.jpeg) |\n| 6. [In Context lora + Character story generator + flux+ shichen](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F951357\u002Fin-context-lora-character-story-generator-flux-shichen) | ComfyUI工作流               | 角色电影故事生成              | ![示例结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_57402590a7c5.jpeg) |\n| 7. [In- Context-Lora｜Cute 4koma 可爱四格漫画](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F947702\u002Fin-context-loracute-4koma) | LoRA模型 & ComfyUI工作流               | 四格漫画生成              | ![示例结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_d10c51b44cc8.jpeg) |\n| 8. 
[Creative Effects & Design LoRA Pack (In-Context LORA)](https:\u002F\u002Fcivitai.com\u002Fmodels\u002F929592\u002Fcreative-effects-and-design-lora-pack-in-context-lora) | LoRA模型 & ComfyUI工作流               | 电影镜头生成等              | ![示例结果](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_readme_69b6b6f37ada.jpeg) |\n\n我们向所有为推动IC-LoRA生态系统发展做出杰出贡献的伙伴致以最诚挚的谢意。\n\n## 核心理念\n\nIC-LoRA的核心思想是将条件图像和目标图像**拼接**成一张复合图像，并使用**自然语言**来定义任务。这种方法使得IC-LoRA能够无缝适配于各种应用场景。\n\n## 特性\n\n- **任务无关框架**：IC-LoRA是一个通用框架，但针对不同应用仍需进行特定任务的微调。\n- **可定制图像集生成**：您可以对文本到图像模型进行微调，以**生成具有自定义内在关系的图像集**。\n- **基于图像集的条件控制**：您还可以让一组图像的生成**依赖于另一组图像**，从而实现广泛的可控生成应用。\n\n如需更详细的信息和示例，请阅读我们的[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.23775)或访问我们的[项目页面](https:\u002F\u002Fali-vilab.github.io\u002FIn-Context-LoRA-Page\u002F)。\n\n## 快速入门\n\n您可以直接使用开源的[AI-Toolkit](https:\u002F\u002Fgithub.com\u002Fostris\u002Fai-toolkit)来训练IC-LoRA模型。我们在本仓库中提供了示例训练数据及配置文件：\n\n- **配置文件**：`config\u002Fmovie-shots.yml`（放置于AI-Toolkit的`config\u002F`目录下）\n- **示例训练数据**：`data\u002Fmovie-shots.zip`（解压至AI-Toolkit的`data\u002Fmovie-shots`目录）\n\n安装好必要的依赖并设置好AI-Toolkit后，您可以通过以下命令开始训练：\n\n```bash\npython run.py config\u002Fmovie-shots.yml\n```\n\n训练可在单块至少拥有24GB显存的GPU上运行（可根据不同GPU显存限制调整`config\u002Fmovie-shots.yml`中的`resolution`参数）。整个训练过程通常只需数小时即可完成。\n\n## 多场景图像字幕生成提示\n\n作为参考，我们提供一个用于生成多场景图像字幕的示例提示：\n\n> *为这张包含三个电影镜头的图像创作一段简短描述，整段字幕以[MOVIE-SHOTS]作为前缀，随后给出图像的整体概述。每个场景的细节应连贯地融入同一句话中，并使用特定标记[SCENE-1]、[SCENE-2]、[SCENE-3]标明各场景描述的起始位置。如有需要，可为角色随机命名，姓名用“\u003C”和“>”括起来。确保整段描述语义连贯、一气呵成，且字数不超过512字。*\n\n## 模型库\n\n以下是10个In-Context LoRA模型及其推荐设置。我们为[ComfyUI](https:\u002F\u002Fgithub.com\u002Fcomfyanonymous\u002FComfyUI)提供了一个[示例工作流](.\u002Fworkflow\u002Ffilm-storyboard.json)。\n\n| 任务          | 模型        | 推荐设置 | 示例提示        |\n|---------------|-------------------|---------------------|---------------------------|\n| **1. 
情侣形象设计** | [`couple-profile.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fcouple-profile.safetensors)   | `width: 2048, height: 1024` | `这张由两部分组成的图像描绘了一对穿着侦探装的卡通猫；[左]一只身穿风衣、头戴费多拉帽的黑猫手持放大镜向右凝视，而[右]一只系着蝴蝶结领结、戴着同色帽子的白猫则好奇地挑了挑眉毛，在昏暗的背景衬托下营造出一种趣味十足的黑色电影风格场景。` |\n| **2. 电影分镜头脚本**  | [`film-storyboard.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Ffilm-storyboard.safetensors) | `width: 1024, height: 1536`    | `[电影镜头] 在一场热闹的节日里，[场景1] 我们看到害羞的男孩\u003CLeo>站在熙熙攘攘的嘉年华边缘，眼睛因五彩缤纷的游乐设施和欢声笑语而睁得大大的；[场景2] 镜头切换到他勉强尝试一项惊险刺激的游戏，朋友们在一旁为他加油助威；[场景3] 最后是胜利的时刻——他赢得了一只巨大的毛绒熊，脸上洋溢着自豪的笑容，高高举起让所有人都能看到。`  |\n| **3. 字体设计** | [`font-design.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Ffont-design.safetensors)   | `width: 1792, height: 1216` | `这组四联图展示了一种充满活力的波普艺术风格的趣味气泡字体。[左上] 是用亮粉色书写的“Pop Candy”，背景点缀着波点图案；[右上] 是紫色的“Sweet Treat”，周围环绕着糖果插画；[左下] 是用多种鲜艳色彩拼接而成的“Yum!”；[右下] 则是在条纹背景上的“Delicious”，非常适合用于有趣、适合儿童的产品。` |\n| **4. 家居装饰** | [`home-decoration.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fhome-decoration.safetensors)      | `width: 1344, height: 1728` | `这组四联图展示了一个充满温暖木色调与舒适装饰元素的乡村风格客厅；[左上] 是一个巨大的石制壁炉，木架上摆满了书籍和蜡烛；[右上] 是一张铺着格子毯的复古皮质沙发，搭配各种质感丰富的靠垫；[左下] 是一个角落，摆放着一把木制扶手椅，旁边的小桌上放着一杯冒着热气的饮品和一本经典书籍；[右下] 则是一个舒适的阅读角，设有窗边座椅、柔软的毛皮披肩以及整齐堆放的装饰木柴。` |\n| **5. 肖像插画** | [`portrait-illustration.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fportrait-illustration.safetensors)      | `width: 1152, height: 1088` | `这张双联图呈现了一幅从写实肖像到趣味插画的转变过程，既保留了细节又富有艺术气息；[左] 是一张照片，拍摄的是一个女子站在热闹的集市中，头戴宽檐帽，身着飘逸的波西米亚长裙，斜挎着皮质挎包；[右] 则是一幅插画，夸张地表现了她的配饰和面部特征，波西米亚长裙被绘制成充满活力的图案和大胆的色彩，而背景则简化为抽象的市场摊位，使画面显得生动活泼。` |\n| **6. 
人物摄影** | [`portrait-photography.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fportrait-photography.safetensors)      | `width: 1344, height: 1728` | `这组[四联]图片描绘了一位年轻艺术家在明亮而富有灵感的工作室中的创作过程；[左上] 她站在一幅巨大的画布前，手持画笔为尚未完成的作品添上鲜艳的色彩；[右上] 她坐在一张杂乱的木桌前，在笔记本上勾勒创意，身边散落着各种美术用品；[左下] 她退后几步仔细端详自己的作品，若有所思地调整着眼镜；[右下] 她在调色板上直接混合颜料，尝试不同的肌理效果，专注的表情充分展现了对艺术的执着。` |\n| **7. PPT模板** | [`ppt-templates.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fppt-templates.safetensors)      | `width: 1984, height: 1152` | `这组四联图展示了一个以乡村为主题的烹饪工作坊PPT模板；[左上] 以温暖的大地色调介绍了“从农场到餐桌的烹饪”；[右上] 按照“食材”、“准备”和“上菜”等环节组织内容；[左下] 列出了时令农产品的清单；[右下] 包含了厨师简介短文。` |\n| **8. 沙尘暴视觉特效** | [`sandstorm-visual-effect.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fsandstorm-visual-effect.safetensors)      | `width: 1408, height: 1600` | `[沙尘暴警示] 这张双联图通过沙尘暴视觉特效展示了骑自行车者经历的变化；[上] 上半部分描绘了一位身穿鲜艳骑行服的骑手在晴朗开阔的公路上稳步前行，背景是宁静的天空，凸显了他的专注与决心；[下] 下半部分则将场景转变为狂暴的沙尘暴，沙粒在自行车和骑手周围剧烈翻滚，背景变得阴沉而混乱，强调了无序与力量。` |\n| **9. 魔术烟花视觉特效** | [`sparklers-visual-effect.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fsparklers-visual-effect.safetensors)      | `width: 960, height: 1088` | `[真实魔术烟花叠加] 这张双联图生动地展现了林间求婚仪式在魔术烟花叠加下的变化；[上] 第一幅画面描绘了一位男士在黄昏时分跪在森林空地上向伴侣求婚，背景光线温暖自然；[下] 第二幅画面则加入了闪耀的魔术烟花，围绕这对情侣组成了心形图案，进一步烘托出浪漫与喜悦的氛围。` |\n| **10. 
视觉识别设计** | [`visual-identity-design.safetensors`](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Fvisual-identity-design.safetensors)      | `width: 1472, height: 1024` | `这组双联图展示了某生鲜品牌欢快的品牌形象；[左] 第一幅画面是一个微笑的菠萝图形，品牌名“Fresh Tropic”以轻松随意的字体印在浅水蓝色背景上；[右] 第二幅画面则将这一设计应用到了一款可重复使用的购物袋上，黑色的菠萝标志由一位身处市场的消费者手持，突显了该品牌亲民且环保的理念。` |\n\n## 许可证\n\n本仓库以 [FLUX](https:\u002F\u002Fgithub.com\u002Fblack-forest-labs\u002Fflux) 作为基础模型。用户在使用本代码时，必须遵守 FLUX 的许可证条款。更多详情请参阅 [FLUX 的许可证](https:\u002F\u002Fgithub.com\u002Fblack-forest-labs\u002Fflux\u002Ftree\u002Fmain\u002Fmodel_licenses)。\n\n**免责声明**：请注意，本仓库提供的训练数据可能包含受版权保护的内容。这些开源数据仅用于参考和教育目的。如果您计划将这些数据用于商业用途，您有责任获得必要的许可，并确保遵守所有适用的版权法律法规。\n\n## 引用\n\n如果本工作对您的研究有所帮助，请考虑引用以下文献：\n\n```bibtex\n@article{lhhuang2024iclora,\n  title={In-Context LoRA for Diffusion Transformers},\n  author={Huang, Lianghua and Wang, Wei and Wu, Zhi-Fan and Shi, Yupeng and Dou, Huanzhang and Liang, Chen and Feng, Yutong and Liu, Yu and Zhou, Jingren},\n  journal={arXiv preprint arXiv:2410.23775},\n  year={2024}\n}\n```\n\n```bibtex\n@article{lhhuang2024groupdiffusion,\n  title={Group Diffusion Transformers are Unsupervised Multitask Learners},\n  author={Huang, Lianghua and Wang, Wei and Wu, Zhi-Fan and Dou, Huanzhang and Shi, Yupeng and Feng, Yutong and Liang, Chen and Liu, Yu and Zhou, Jingren},\n  journal={arXiv preprint arXiv:2410.15027},\n  year={2024}\n}\n```","# In-Context LoRA (IC-LoRA) 快速上手指南\n\nIn-Context LoRA 是一个基于扩散变换器（Diffusion Transformers）的灵活框架。其核心理念是将**条件图像**与**目标图像**拼接成一张复合图像，并利用**自然语言提示词**来定义任务关系。该工具适用于分镜生成、视觉识别设计、虚拟试穿等多种需要保持图像间内在联系的任务。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux (推荐 Ubuntu 20.04+)\n*   **GPU**: 单张显存至少 **24GB** 的 NVIDIA GPU (如 RTX 3090\u002F4090, A10\u002FA100)。\n    *   *注：若显存不足，需在配置文件中调整 `resolution` 参数。*\n*   **Python**: 3.10 或更高版本\n*   **前置依赖**:\n    *   [AI-Toolkit](https:\u002F\u002Fgithub.com\u002Fostris\u002Fai-toolkit)：官方推荐的训练工具库。\n    *   Git, pip, CUDA Toolkit (版本需与 PyTorch 匹配)。\n\n## 安装步骤\n\n### 1. 
克隆并设置 AI-Toolkit\n首先获取官方推荐的训练框架：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fostris\u002Fai-toolkit.git\ncd ai-toolkit\npip install -r requirements.txt\n```\n\n### 2. 获取 IC-LoRA 资源\n下载本项目的配置文件和示例数据，并将其放入 AI-Toolkit 的对应目录中。\n\n```bash\n# 假设已在 ai-toolkit 根目录下\n# 下载配置文件的示例 (实际使用时请从 In-Context-LoRA 仓库下载 config\u002Fmovie-shots.yml)\nwget https:\u002F\u002Fraw.githubusercontent.com\u002Fali-vilab\u002FIn-Context-LoRA\u002Fmain\u002Fconfig\u002Fmovie-shots.yml -P config\u002F\n\n# 下载示例数据 (实际使用时请从 In-Context-LoRA 仓库下载 data\u002Fmovie-shots.zip)\nwget https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fali-vilab\u002FIn-Context-LoRA-data\u002Fresolve\u002Fmain\u002Fmovie-shots.zip -P data\u002F\nunzip data\u002Fmovie-shots.zip -d data\u002F\n```\n\n> **提示**：如果下载速度较慢，建议使用国内镜像源或代理加速 HuggingFace 和 GitHub 的连接。\n\n## 基本使用\n\n完成环境配置后，您可以直接使用提供的配置文件启动训练。以下以“电影分镜生成”任务为例：\n\n### 1. 启动训练\n在 `ai-toolkit` 根目录下运行以下命令：\n\n```bash\npython run.py config\u002Fmovie-shots.yml\n```\n\n*   训练将在单张 24GB+ 显存的 GPU 上进行。\n*   预计训练时间为数小时（具体取决于数据集大小和硬件性能）。\n*   如需适配更小显存，请编辑 `config\u002Fmovie-shots.yml` 中的 `resolution` 参数。\n\n### 2. 构建提示词 (Prompt)\nIC-LoRA 依赖特定的提示词格式来描述多场景图像的关系。训练数据标注或推理时，请参考以下结构：\n\n> **通用模板逻辑**：\n> `[前缀] 整体描述，[标记 1] 场景 1 细节，[标记 2] 场景 2 细节...`\n\n**示例提示词（电影分镜）：**\n```text\n[MOVIE-SHOTS] In a vibrant festival, [SCENE-1] we find \u003CLeo>, a shy boy, standing at the edge of a bustling carnival, eyes wide with awe at the colorful rides and laughter, [SCENE-2] transitioning to him reluctantly trying a daring game, his friends cheering him on, [SCENE-3] culminating in a triumphant moment as he wins a giant stuffed bear, his face beaming with pride as he holds it up for all to see.\n```\n\n### 3. 
使用预训练模型 (推理)\n如果您不想从头训练，可以直接使用官方发布的 [Model Zoo](https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA) 中的预训练权重（`.safetensors` 文件）。\n\n*   **推荐工作流**：项目提供了适用于 [ComfyUI](https:\u002F\u002Fgithub.com\u002Fcomfyanonymous\u002FComfyUI) 的工作流文件（例如 `workflow\u002Ffilm-storyboard.json`）。\n*   **使用方法**：\n    1.  下载对应的 `.safetensors` 模型文件放入 ComfyUI 的 `models\u002Floras` 目录。\n    2.  导入提供的 JSON 工作流文件。\n    3.  加载模型并根据任务类型输入相应的提示词（参考 MODEL ZOO 中的示例 Prompt）。\n\n---\n*更多详细任务配置（如情侣头像设计、字体设计、家居装饰等）及参数设置，请参阅官方 Model Zoo 列表或原始论文。*","一家电商设计团队急需为新款羽绒服生成多套营销海报，要求模特穿着该服装在雪地、城市街头等不同场景中保持服装细节完全一致。\n\n### 没有 In-Context-LoRA 时\n- **重训成本高昂**：每更换一款新服装或一种新风格，都需要收集大量数据并重新训练专属 LoRA 模型，耗时数小时甚至数天。\n- **特征一致性差**：传统图生图或 ControlNet 方法难以在复杂背景变换中完美保留服装的纹理、Logo 和剪裁细节，常出现“变形”或“穿帮”。\n- **工作流割裂**：设计师需在多个插件间反复切换尝试，无法通过简单的参考图直接驱动生成，迭代效率极低。\n- **零样本能力弱**：面对从未见过的服装设计任务，通用模型往往无法理解指令，导致生成结果与实物严重不符。\n\n### 使用 In-Context-LoRA 后\n- **免训即时生效**：只需提供一张服装参考图作为上下文输入，In-Context-LoRA 即可在不进行任何额外训练的情况下，立即适配新任务。\n- **精准特征迁移**：利用扩散 Transformer 的注意力机制，将参考图中的服装细节完美“迁移”至不同姿态和背景中，确保像素级的一致性。\n- **流程极简统一**：在 ComfyUI 中通过单一节点加载预训练模型，设计师可像搭积木一样快速构建虚拟试衣、故事板生成等工作流。\n- **强大泛化能力**：凭借优秀的零样本泛化性，无论是卡通转写实还是跨风格产品设计，都能高质量完成，大幅拓展创意边界。\n\nIn-Context-LoRA 通过将“训练”转化为“推理时的上下文学习”，彻底打破了定制化生成的效率瓶颈，让高质量视觉创作变得像对话一样简单。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_In-Context-LoRA_cbf2aafb.png","ali-vilab","Alibaba TongYi Vision Intelligence Lab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fali-vilab_c2d93ee0.png",null,"https:\u002F\u002Fgithub.com\u002Fali-vilab",2068,95,"2026-04-17T20:14:02","未说明","需要单张 GPU，显存至少 24GB（可通过调整配置中的 resolution 参数适配不同显存限制），具体显卡型号和 CUDA 版本未说明",{"notes":83,"python":80,"dependencies":84},"该工具主要作为训练框架，需配合开源的 AI-Toolkit 使用。官方提供了示例配置文件和训练数据。训练通常在单张 24GB 显存的 GPU 上进行，若显存不足可调整配置文件中的分辨率参数。推理部分支持 ComfyUI 工作流。",[85],"AI-Toolkit 
(ostris\u002Fai-toolkit)",[15,14],"2026-03-27T02:49:30.150509","2026-04-20T04:06:06.088815",[90,95,100,104,109,113,118],{"id":91,"question_zh":92,"answer_zh":93,"source_url":94},43106,"项目的训练代码在哪里？与 ai-toolkit 有什么关系？","项目直接利用 `ai-toolkit` 进行模型训练，因此不需要额外的专用训练代码仓库。作者已开源了示例训练配置文件和数据。用户可以直接使用 ai-toolkit，参考项目文档中的“快速开始（Getting Started）”部分，使用提供的配置样例进行训练。","https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FIn-Context-LoRA\u002Fissues\u002F4",{"id":96,"question_zh":97,"answer_zh":98,"source_url":99},43107,"会开源模型的训练数据吗？","目前暂无计划开源这些模型的训练数据。项目主要开源的是训练\u002F推理的代码流程（基于 ai-toolkit）、示例配置以及部分任务的模型参数（权重）。","https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FIn-Context-LoRA\u002Fissues\u002F5",{"id":101,"question_zh":102,"answer_zh":103,"source_url":99},43102,"In-Context LoRA 的核心工作原理是什么？它与普通微调有何不同？","In-Context LoRA 可以被视为文本到图像生成的“预训练第二阶段”（尽管目前仍是特定任务微调）。就像大语言模型的 SFT 不需要新代码只需要新数据一样，In-Context LoRA 也不需要新代码，只需新数据。其核心思想是将条件图像和目标图像拼接成一张复合图像，然后使用自然语言定义任务。这是一个与任务无关的框架，旨在统一照片修饰、视觉特效、身份保持等多种可控生成任务。",{"id":105,"question_zh":106,"answer_zh":107,"source_url":108},43103,"如何使用 ComfyUI 进行推理和加载模型？","您可以直接使用 ComfyUI 进行推理：\n1. 将检查点文件（.safetensors）放入 `ComfyUI\u002Fmodels\u002Floras` 目录下。\n2. 
将提供的 workflow json 文件（如 \"workflow\u002Ffilm-storyboard.json\"）直接拖拽到 ComfyUI 界面即可加载工作流并运行。\n此外，项目已发布 10 个预训练模型和示例 ComfyUI 工作流。","https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FIn-Context-LoRA\u002Fissues\u002F7",{"id":110,"question_zh":111,"answer_zh":112,"source_url":108},43104,"如果不使用 ComfyUI，如何通过 Python 代码（diffusers）进行推理？","可以使用 diffusers 库进行推理，示例代码如下：\n```python\nfrom diffusers import FluxPipeline\nimport torch\n\n# 指定已下载的 IC-LoRA 权重文件与任务提示词\nyour_lora_checkpoint_file = \".\u002Ffilm-storyboard.safetensors\"\nprompt = \"[your prompt]\"\n\n# 以 bfloat16 精度加载 FLUX.1-dev 基础模型\npipe = FluxPipeline.from_pretrained(\"black-forest-labs\u002FFLUX.1-dev\", torch_dtype=torch.bfloat16)\n# 加载并启用 LoRA 权重\npipe.load_lora_weights(your_lora_checkpoint_file)\npipe = pipe.to(torch.device(\"cuda:0\"))\npipe.enable_lora()\n\n# 生成复合图像；分辨率应与所用 LoRA 的训练分辨率一致（film-storyboard 为 1024x1536）\nout = pipe(\n    prompt=prompt,\n    guidance_scale=3.5,\n    height=1536,\n    width=1024,\n    num_inference_steps=50,\n).images[0]\nout.save(\"image.png\")\n```",{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},43105,"如何实现图像条件（ImageCondition）生成？","在 ComfyUI 中，主要使用 `InpaintModelConditioning` 节点来处理图像条件生成。维护者表示将上传专门支持图像条件的 ComfyUI 工作流以供参考。对于尝试使用 SDEdit 失败的情况（如右侧留白），建议直接参考官方发布的包含正确节点配置的工作流文件。","https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FIn-Context-LoRA\u002Fissues\u002F8",{"id":119,"question_zh":120,"answer_zh":121,"source_url":108},43108,"如果遇到 Hugging Face 模型文件下载链接失效怎么办？","如果默认链接无法打开，可以尝试访问项目主页的文件列表直接下载。例如 `film-storyboard.safetensors` 的直接链接为：https:\u002F\u002Fhuggingface.co\u002Fali-vilab\u002FIn-Context-LoRA\u002Fblob\u002Fmain\u002Ffilm-storyboard.safetensors。维护者通常会修复失效的链接。",[]]