[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-lllyasviel--ControlNet":3,"tool-lllyasviel--ControlNet":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 
是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":76,"owner_url":79,"languages":80,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":10,"env_os":93,"env_gpu":94,"env_ram":93,"env_deps":95,"category_tags":99,"github_topics":76,"view_count":100,"oss_zip_url":76,"oss_zip_packed_at":76,"status":16,"created_at":101,"updated_at":102,"faqs":103,"releases":132},803,"lllyasviel\u002FControlNet","ControlNet","Let us control diffusion models!","ControlNet 是一款专为扩散模型设计的开源神经网络结构，旨在让用户通过添加额外条件来精确控制图像生成过程。对于 Stable Diffusion 而言，这意味着不再仅仅依赖文本提示，而是可以结合边缘检测、人体姿态或深度图等输入，实现对构图和细节的精准把控。\n\nControlNet 巧妙解决了微调大模型容易破坏原有能力的痛点。它采用“零卷积”技术，将网络权重复制为“锁定”和“可训练”两部分。训练初期零卷积输出为零，确保原模型不受干扰，仅需少量图像对即可学习新条件，且无需从头训练，保护了生产级模型的完整性。此外，它还支持低显存模式，显著降低了硬件门槛。\n\n无论是追求算法优化的开发者、进行实验的研究人员，还是需要精准素材的设计师，都能从 ControlNet 中受益。配合丰富的预训练模型和多种控制方式，它能帮助用户在个人设备上高效创作出符合预期的高质量图像，是提升 AI 绘图可控性的理想选择。","# News: A nightly version of ControlNet 1.1 is released!\n\n[ControlNet 1.1](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet-v1-1-nightly) is released. 
Those new models will be merged to this repo after we make sure that everything is good.\n\n# Below is ControlNet 1.0\n\nOfficial implementation of [Adding Conditional Control to Text-to-Image Diffusion Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.05543).\n\nControlNet is a neural network structure to control diffusion models by adding extra conditions.\n\n![img](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_77ab84967053.png)\n\nIt copies the weights of neural network blocks into a \"locked\" copy and a \"trainable\" copy. \n\nThe \"trainable\" one learns your condition. The \"locked\" one preserves your model. \n\nThanks to this, training with a small dataset of image pairs will not destroy the production-ready diffusion models.\n\nThe \"zero convolution\" is a 1×1 convolution with both weight and bias initialized as zeros. \n\nBefore training, all zero convolutions output zeros, and ControlNet will not cause any distortion.\n\nNo layer is trained from scratch. You are still fine-tuning. Your original model is safe. \n\nThis allows training on small-scale or even personal devices.\n\nThis is also friendly to merging\u002Freplacement\u002Foffsetting of models\u002Fweights\u002Fblocks\u002Flayers.\n\n### FAQ\n\n**Q:** But wait, if the weight of a conv layer is zero, the gradient will also be zero, and the network will not learn anything. Why does \"zero convolution\" work?\n\n**A:** This is not true. [See an explanation here](docs\u002Ffaq.md).\n\n# Stable Diffusion + ControlNet\n\nBy repeating the above simple structure 14 times, we can control stable diffusion in this way:\n\n![img](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_34c429e12ae6.png)\n\nIn this way, the ControlNet can **reuse** the SD encoder as a **deep, strong, robust, and powerful backbone** to learn diverse controls. 
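The zero-convolution idea described above can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions (the `zero_conv` helper is made up here; the repo's actual implementation is not shown in this document): a 1×1 convolution is just a per-pixel linear map over channels, so zero-initialized weights and bias make the trainable branch contribute exactly nothing before training.

```python
import numpy as np

def zero_conv(x, weight, bias):
    # 1x1 convolution = per-pixel linear map over channels.
    # x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)
    return np.tensordot(weight, x, axes=([1], [0])) + bias[:, None, None]

c_in, c_out = 3, 4
weight = np.zeros((c_out, c_in))  # zero-initialized weight
bias = np.zeros(c_out)            # zero-initialized bias
x = np.random.randn(c_in, 8, 8)

residual = zero_conv(x, weight, bias)
locked_output = np.random.randn(c_out, 8, 8)  # stand-in for a locked SD block

# Before any training step, the trainable branch contributes nothing,
# so the locked model's output is exactly preserved:
assert np.allclose(residual, 0.0)
assert np.allclose(locked_output + residual, locked_output)
```

Once training starts, gradients flow into `weight` and `bias` (the weights are zero, not their gradients), so the branch gradually learns to inject the condition without ever perturbing the locked copy at initialization.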
Much evidence (like [this](https:\u002F\u002Fjerryxu.net\u002FODISE\u002F) and [this](https:\u002F\u002Fvpd.ivg-research.xyz\u002F)) validates that the SD encoder is an excellent backbone.\n\nNote that the way we connect layers is computationally efficient. The original SD encoder does not need to store gradients (the locked original SD Encoder Blocks 1-4 and Middle). The required GPU memory is not much larger than that of the original SD, although many layers are added. Great!\n\n# Features & News\n\n2023\u002F04\u002F14 - We released [ControlNet 1.1](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet-v1-1-nightly). Those new models will be merged to this repo after we make sure that everything is good.\n\n2023\u002F03\u002F03 - We released a discussion - [Precomputed ControlNet: Speed up ControlNet by 45%, but is it necessary?](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F216)\n\n2023\u002F02\u002F26 - We released a blog - [Ablation Study: Why ControlNets use deep encoder? What if it was lighter? Or even an MLP?](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F188)\n\n2023\u002F02\u002F20 - Implementation for non-prompt mode released. See also [Guess Mode \u002F Non-Prompt Mode](#guess-anchor).\n\n2023\u002F02\u002F12 - Now you can play with any community model by [Transferring the ControlNet](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F12).\n\n2023\u002F02\u002F11 - [Low VRAM mode](docs\u002Flow_vram.md) is added. Please use this mode if you are using 8GB GPU(s) or if you want a larger batch size.\n\n# Production-Ready Pretrained Models\n\nFirst create a new conda environment\n\n    conda env create -f environment.yaml\n    conda activate control\n\nAll models and detectors can be downloaded from [our Hugging Face page](https:\u002F\u002Fhuggingface.co\u002Flllyasviel\u002FControlNet). 
Make sure that SD models are put in \"ControlNet\u002Fmodels\" and detectors are put in \"ControlNet\u002Fannotator\u002Fckpts\". Make sure that you download all necessary pretrained weights and detector models from that Hugging Face page, including HED edge detection model, Midas depth estimation model, Openpose, and so on. \n\nWe provide 9 Gradio apps with these models.\n\nAll test images can be found at the folder \"test_imgs\".\n\n## ControlNet with Canny Edge\n\nStable Diffusion 1.5 + ControlNet (using simple Canny edge detection)\n\n    python gradio_canny2image.py\n\nThe Gradio app also allows you to change the Canny edge thresholds. Just try it for more details.\n\nPrompt: \"bird\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_f695f4ab2239.png)\n\nPrompt: \"cute dog\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_b127716b363f.png)\n\n## ControlNet with M-LSD Lines\n\nStable Diffusion 1.5 + ControlNet (using simple M-LSD straight line detection)\n\n    python gradio_hough2image.py\n\nThe Gradio app also allows you to change the M-LSD thresholds. Just try it for more details.\n\nPrompt: \"room\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_132adf369dad.png)\n\nPrompt: \"building\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_11282cb86fa4.png)\n\n## ControlNet with HED Boundary\n\nStable Diffusion 1.5 + ControlNet (using soft HED Boundary)\n\n    python gradio_hed2image.py\n\nThe soft HED Boundary will preserve many details in input images, making this app suitable for recoloring and stylizing. 
Just try it for more details.\n\nPrompt: \"oil painting of handsome old man, masterpiece\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_d781a1fc3b06.png)\n\nPrompt: \"Cyberpunk robot\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_be0be2015594.png)\n\n## ControlNet with User Scribbles\n\nStable Diffusion 1.5 + ControlNet (using Scribbles)\n\n    python gradio_scribble2image.py\n\nNote that the UI is based on Gradio, and Gradio is somewhat difficult to customize. Right now you need to draw scribbles outside the UI (using your favorite drawing software, for example, MS Paint) and then import the scribble image to Gradio. \n\nPrompt: \"turtle\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_b86a4d892dc4.png)\n\nPrompt: \"hot air balloon\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_75fa8cddadcc.png)\n\n### Interactive Interface\n\nWe actually provide an interactive interface\n\n    python gradio_scribble2image_interactive.py\n\n~~However, because gradio is very [buggy](https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio\u002Fissues\u002F3166) and difficult to customize, right now, user need to first set canvas width and heights and then click \"Open drawing canvas\" to get a drawing area. Please do not upload image to that drawing canvas. Also, the drawing area is very small; it should be bigger. But I failed to find out how to make it larger. Again, gradio is really buggy.~~ (Now fixed, will update asap)\n\nThe below dog sketch is drawn by me. 
Perhaps we should draw a better dog for the showcase.\n\nPrompt: \"dog in a room\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_3c5dc5835c73.png)\n\n## ControlNet with Fake Scribbles\n\nStable Diffusion 1.5 + ControlNet (using fake scribbles)\n\n    python gradio_fake_scribble2image.py\n\nSometimes we are lazy, and we do not want to draw scribbles. This script uses exactly the same scribble-based model but uses a simple algorithm to synthesize scribbles from input images.\n\nPrompt: \"bag\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_92cc9f340bbe.png)\n\nPrompt: \"shose\" (Note that \"shose\" is a typo; it should be \"shoes\". But it still seems to work.)\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_f3cb9e07c47b.png)\n\n## ControlNet with Human Pose\n\nStable Diffusion 1.5 + ControlNet (using human pose)\n\n    python gradio_pose2image.py\n\nApparently, this model deserves a better UI to directly manipulate the pose skeleton. However, again, Gradio is somewhat difficult to customize. Right now you need to input an image, and then Openpose will detect the pose for you.\n\nPrompt: \"Chief in the kitchen\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_74605b8767a7.png)\n\nPrompt: \"An astronaut on the moon\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_0b528cbaea83.png)\n\n## ControlNet with Semantic Segmentation\n\nStable Diffusion 1.5 + ControlNet (using semantic segmentation)\n\n    python gradio_seg2image.py\n\nThis model uses ADE20K's segmentation protocol. Again, this model deserves a better UI to directly draw the segmentations. However, again, Gradio is somewhat difficult to customize. Right now you need to input an image, and then a model called Uniformer will detect the segmentations for you. 
Just try it for more details.\n\nPrompt: \"House\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_2f5602988db6.png)\n\nPrompt: \"River\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_f716ef53c2b5.png)\n\n## ControlNet with Depth\n\nStable Diffusion 1.5 + ControlNet (using depth map)\n\n    python gradio_depth2image.py\n\nGreat! Now SD 1.5 also has depth control. FINALLY. So many possibilities (considering SD1.5 has many more community models than SD2).\n\nNote that, unlike Stability's model, the ControlNet receives the full 512×512 depth map, rather than a 64×64 one. Stability's SD2 depth model uses 64×64 depth maps. This means that the ControlNet will preserve more details in the depth map.\n\nThis is always a strength because if users do not want to preserve more details, they can simply use another SD to post-process an i2i. But if they want to preserve more details, ControlNet becomes their only choice. Again, SD2 uses 64×64 depth, we use 512×512.\n\nPrompt: \"Stormtrooper's lecture\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_845780c70c6a.png)\n\n## ControlNet with Normal Map\n\nStable Diffusion 1.5 + ControlNet (using normal map)\n\n    python gradio_normal2image.py\n\nThis model uses a normal map. Right now in the app, the normal is computed from the Midas depth map and a user threshold (to determine how much of the area is background, with the identity normal facing the viewer; tune the \"Normal background threshold\" in the Gradio app to get a feeling for it).\n\nPrompt: \"Cute toy\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_8e3f03e5f9c3.png)\n\nPrompt: \"Plaster statue of Abraham Lincoln\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_e8b02b77436a.png)\n\nCompared to the depth model, this model seems to be a bit better at preserving the geometry. 
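The normals-from-depth step used by the normal-map app can be sketched roughly as follows. This is an illustrative NumPy approximation, not the repo's annotator code; the `depth_to_normal` helper and the default threshold are hypothetical. Image-space depth gradients give the surface slope, and pixels whose depth falls below the background threshold get the identity normal facing the viewer.

```python
import numpy as np

def depth_to_normal(depth, bg_threshold=0.4):
    # Hypothetical sketch: derive a normal map from a depth map via
    # image-space gradients; background pixels get the identity normal.
    dzdy, dzdx = np.gradient(depth)
    normal = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    normal /= np.linalg.norm(normal, axis=-1, keepdims=True)
    normal[depth < bg_threshold] = np.array([0.0, 0.0, 1.0])  # face the viewer
    return normal

depth = np.random.rand(16, 16)
n = depth_to_normal(depth)
assert n.shape == (16, 16, 3)
# every pixel carries a unit normal
assert np.allclose(np.linalg.norm(n, axis=-1), 1.0)
```

Raising the background threshold marks more of the image as flat background, which is roughly what the "Normal background threshold" slider in the Gradio app controls.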
This is intuitive: minor details are not salient in depth maps, but are salient in normal maps. Below is the depth result with the same inputs. You can see that the hairstyle of the man in the input image is modified by the depth model, but preserved by the normal model. \n\nPrompt: \"Plaster statue of Abraham Lincoln\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_81098c31acb7.png)\n\n## ControlNet with Anime Line Drawing\n\nWe also trained a relatively simple ControlNet for anime line drawings. This tool may be useful for artistic creations. (Although the image details in the results are a bit modified, since it still diffuses latent images.)\n\nThis model is not available right now. We need to evaluate the potential risks before releasing this model. Nevertheless, you may be interested in [transferring the ControlNet to any community model](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F12).\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_907ba19cbeb7.png)\n\n\u003Ca id=\"guess-anchor\">\u003C\u002Fa>\n\n# Guess Mode \u002F Non-Prompt Mode\n\nThe \"guess mode\" (also called non-prompt mode) will completely unleash all the power of the very powerful ControlNet encoder. \n\nSee also the blog - [Ablation Study: Why ControlNets use deep encoder? What if it was lighter? Or even an MLP?](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F188)\n\nYou need to manually check the \"Guess Mode\" toggle to enable this mode.\n\nIn this mode, the ControlNet encoder will try its best to recognize the content of the input control map, like depth maps, edge maps, scribbles, etc., even if you remove all prompts.\n\n**Let's have fun with some very challenging experimental settings!**\n\n**No prompts. No \"positive\" prompts. No \"negative\" prompts. No extra caption detector. 
One single diffusion loop.**\n\nFor this mode, we recommend using 50 steps and a guidance scale between 3 and 5.\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_9d9532f221f6.png)\n\nNo prompts:\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_5ed12ae86609.png)\n\nNote that the below example is 768×768. No prompts. No \"positive\" prompts. No \"negative\" prompts.\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_79a38e2cfcc5.png)\n\nBy tuning the parameters, you can get some very interesting results like the ones below:\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_753f5b45ea9c.png)\n\nBecause no prompt is available, the ControlNet encoder will \"guess\" what is in the control map. Sometimes the guess result is really interesting. Because the diffusion algorithm can essentially give multiple results, the ControlNet seems able to give multiple guesses, like this:\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_0a94e85637fb.png)\n\nWithout a prompt, HED seems good at generating images that look like paintings when the control strength is relatively low:\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_fdec5347a818.png)\n\nGuess Mode is also supported in the [WebUI Plugin](https:\u002F\u002Fgithub.com\u002FMikubill\u002Fsd-webui-controlnet):\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_5586f828f707.png)\n\nNo prompts. Default WebUI parameters. Pure random results with the seed being 12345. Standard SD1.5. 
Input scribble is in \"test_imgs\" folder to reproduce.\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_b0575be8cf12.png)\n\nBelow is another challenging example:\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_8dd3d039b715.png)\n\nNo prompts. Default WebUI parameters. Pure random results with the seed being 12345. Standard SD1.5. Input scribble is in \"test_imgs\" folder to reproduce.\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_388b4a5312f6.png)\n\nNote that in the guess mode, you will still be able to input prompts. The only difference is that the model will \"try harder\" to guess what is in the control map even if you do not provide the prompt. Just try it yourself!\n\nBesides, if you write some scripts (like BLIP) to generate image captions from the \"guess mode\" images, and then use the generated captions as prompts to diffuse again, you will get a SOTA pipeline for fully automatic conditional image generating.\n\n# Combining Multiple ControlNets\n\nControlNets are composable: more than one ControlNet can be easily composed to multi-condition control.\n\nRight now this feature is in experimental stage in the [Mikubill' A1111 Webui Plugin](https:\u002F\u002Fgithub.com\u002FMikubill\u002Fsd-webui-controlnet):\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_b4ce65e32c36.png)\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_648417aeb592.png)\n\nAs long as the models are controlling the same SD, the \"boundary\" between different research projects does not even exist. 
This plugin also allows different methods to work together!\n\n# Use ControlNet in Any Community Model (SD1.X)\n\nThis is an experimental feature.\n\n[See the steps here](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F12).\n\nOr you may want to use [Mikubill's A1111 WebUI Plugin](https:\u002F\u002Fgithub.com\u002FMikubill\u002Fsd-webui-controlnet), which is plug-and-play and does not need manual merging.\n\n# Annotate Your Own Data\n\nWe provide simple Python scripts to process images.\n\n[See a Gradio example here](docs\u002Fannotator.md).\n\n# Train with Your Own Data\n\nTraining a ControlNet is as easy as (or even easier than) training a simple pix2pix. \n\n[See the steps here](docs\u002Ftrain.md).\n\n# Related Resources\n\nSpecial thanks to the great project - [Mikubill's A1111 WebUI Plugin](https:\u002F\u002Fgithub.com\u002FMikubill\u002Fsd-webui-controlnet)!\n\nWe also thank Hysts for making the [Hugging Face Space](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fhysts\u002FControlNet) as well as more than 65 models in that amazing [Colab list](https:\u002F\u002Fgithub.com\u002Fcamenduru\u002Fcontrolnet-colab)! 
\n\nThank haofanwang for making [ControlNet-for-Diffusers](https:\u002F\u002Fgithub.com\u002Fhaofanwang\u002FControlNet-for-Diffusers)!\n\nWe also thank all authors for making Controlnet DEMOs, including but not limited to [fffiloni](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffffiloni\u002FControlNet-Video), [other-model](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fhysts\u002FControlNet-with-other-models), [ThereforeGames](https:\u002F\u002Fgithub.com\u002FAUTOMATIC1111\u002Fstable-diffusion-webui\u002Fdiscussions\u002F7784), [RamAnanth1](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FRamAnanth1\u002FControlNet), etc!\n\nBesides, you may also want to read these amazing related works:\n\n[Composer: Creative and Controllable Image Synthesis with Composable Conditions](https:\u002F\u002Fgithub.com\u002Fdamo-vilab\u002Fcomposer): A much bigger model to control diffusion!\n\n[T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FT2I-Adapter): A much smaller model to control stable diffusion!\n\n[ControlLoRA: A Light Neural Network To Control Stable Diffusion Spatial Information](https:\u002F\u002Fgithub.com\u002FHighCWu\u002FControlLoRA): Implement Controlnet using LORA!\n\nAnd these amazing recent projects: [InstructPix2Pix Learning to Follow Image Editing Instructions](https:\u002F\u002Fwww.timothybrooks.com\u002Finstruct-pix2pix), [Pix2pix-zero: Zero-shot Image-to-Image Translation](https:\u002F\u002Fgithub.com\u002Fpix2pixzero\u002Fpix2pix-zero), [Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation](https:\u002F\u002Fgithub.com\u002FMichalGeyer\u002Fplug-and-play), [MaskSketch: Unpaired Structure-guided Masked Image Generation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.05496), [SEGA: Instructing Diffusion using Semantic Dimensions](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.12247), [Universal Guidance for Diffusion 
Models](https:\u002F\u002Fgithub.com\u002Farpitbansal297\u002FUniversal-Guided-Diffusion), [Region-Aware Diffusion for Zero-shot Text-driven Image Editing](https:\u002F\u002Fgithub.com\u002Fhaha-lisa\u002FRDM-Region-Aware-Diffusion-Model), [Domain Expansion of Image Generators](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.05225), [Image Mixer](https:\u002F\u002Ftwitter.com\u002FLambdaAPI\u002Fstatus\u002F1626327289288957956), [MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation](https:\u002F\u002Fmultidiffusion.github.io\u002F)\n\n# Citation\n\n    @misc{zhang2023adding,\n      title={Adding Conditional Control to Text-to-Image Diffusion Models},\n      author={Lvmin Zhang and Anyi Rao and Maneesh Agrawala},\n      booktitle={IEEE International Conference on Computer Vision (ICCV)},\n      year={2023},\n    }\n\n[Arxiv Link](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.05543)\n\n[Supplementary Materials](https:\u002F\u002Flllyasviel.github.io\u002Fmisc\u002F202309\u002Fcnet_supp.pdf)\n","# 新闻：ControlNet (控制网络) 1.1 的夜间版本已发布！\n\n[ControlNet 1.1](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet-v1-1-nightly) 已发布。在我们确认一切正常后，这些新模型将合并到此仓库中。\n\n# 以下是 ControlNet 1.0\n\n[向文本到图像扩散模型 (Diffusion Models) 添加条件控制](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.05543) 的官方实现。\n\nControlNet 是一种神经网络 (Neural Network) 结构，通过添加额外条件来控制扩散模型。\n\n![img](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_77ab84967053.png)\n\n它将神经网络块的权重复制到一个“锁定”副本和一个“可训练”副本中。 \n\n“可训练”的那个学习你的条件。“锁定”的那个保留你的模型。 \n\n得益于这一点，使用少量图像对数据集进行训练不会破坏生产就绪的扩散模型。\n\n“零卷积 (Zero Convolution)”是权重和偏置均初始化为零的 1×1 卷积。 \n\n在训练之前，所有零卷积输出为零，ControlNet 不会引起任何失真。\n\n没有任何层是从头开始训练的。你仍然是在微调 (Fine-tuning)。你的原始模型是安全的。 \n\n这使得在小型甚至个人设备上训练成为可能。\n\n这也便于模型\u002F权重\u002F块\u002F层的合并\u002F替换\u002F偏移操作。\n\n### 常见问题解答 (FAQ)\n\n**问：** 等等，如果卷积层的权重为零，梯度 (Gradient) 也会为零，网络将无法学习任何东西。为什么“零卷积”有效？\n\n**答：** 并非如此。[在此查看解释](docs\u002Ffaq.md)。\n\n# 稳定扩散 (Stable Diffusion) + 
ControlNet\n\n通过重复上述简单结构 14 次，我们可以这样控制稳定扩散：\n\n![img](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_34c429e12ae6.png)\n\n通过这种方式，ControlNet 可以**重用**SD 编码器 (Encoder) 作为**深度、强大、鲁棒且功能强大的骨干 (Backbone)** 来学习多样化的控制。许多证据（如 [此](https:\u002F\u002Fjerryxu.net\u002FODISE\u002F) 和 [此](https:\u002F\u002Fvpd.ivg-research.xyz\u002F)）验证了 SD 编码器是一个优秀的骨干。\n\n请注意，我们连接层的方式计算效率高。原始 SD 编码器不需要存储梯度（锁定的原始 SD 编码器块 1234 和中层）。尽管添加了多层，但所需的 GPU 显存 (GPU Memory) 并不比原始 SD 大多少。太棒了！\n\n# 功能与新闻\n\n2023\u002F0\u002F14 - 我们发布了 [ControlNet 1.1](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet-v1-1-nightly)。在我们确认一切正常后，这些新模型将合并到此仓库中。\n\n2023\u002F03\u002F03 - 我们发布了一个讨论主题 - [预计算 ControlNet：将 ControlNet 速度提升 45%，但这有必要吗？](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F216)\n\n2023\u002F02\u002F26 - 我们发布了一篇博客 - [消融研究 (Ablation Study)：为什么 ControlNet 使用深度编码器？如果它更轻呢？或者甚至是多层感知机 (MLP)？](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F188)\n\n2023\u002F02\u002F20 - 非提示模式的实现已发布。另见 [猜测模式\u002F非提示模式](#guess-anchor)。\n\n2023\u002F02\u002F12 - 现在您可以通过 [转移 ControlNet](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F12) 来尝试使用任何社区模型。\n\n2023\u002F02\u002F11 - 添加了 [低显存 (VRAM) 模式](docs\u002Flow_vram.md)。如果您使用 8GB GPU 或想要更大的批次大小 (Batch Size)，请使用此模式。\n\n# 生产就绪的预训练模型\n\n首先创建一个新的 Conda 环境\n\n    conda env create -f environment.yaml\n    conda activate control\n\n所有模型和检测器 (Detectors) 均可从 [我们的 Hugging Face 页面](https:\u002F\u002Fhuggingface.co\u002Flllyasviel\u002FControlNet) 下载。请确保 SD 模型放在 \"ControlNet\u002Fmodels\" 中，检测器放在 \"ControlNet\u002Fannotator\u002Fckpts\" 中。请确保从该 Hugging Face 页面下载所有必要的预训练权重和检测器模型，包括 HED 边缘检测模型、Midas 深度估计模型、Openpose 等。 \n\n我们提供了 9 个带有这些模型的 Gradio 应用程序。\n\n所有测试图片都可以在 \"test_imgs\" 文件夹中找到。\n\n## 带有 Canny 边缘的 ControlNet\n\nStable Diffusion 1.5 + ControlNet（使用简单的 Canny 边缘检测）\n\n    python gradio_canny2image.py\n\nGradio 应用程序还允许您更改 Canny 
边缘阈值。只需尝试一下以获取更多详细信息。\n\nPrompt: \"bird\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_f695f4ab2239.png)\n\nPrompt: \"cute dog\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_b127716b363f.png)\n\n## 带有 M-LSD 线条的 ControlNet\n\nStable Diffusion 1.5 + ControlNet（使用简单的 M-LSD 直线检测）\n\n    python gradio_hough2image.py\n\nGradio 应用程序还允许您更改 M-LSD 阈值。只需尝试一下以获取更多详细信息。\n\nPrompt: \"room\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_132adf369dad.png)\n\nPrompt: \"building\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_11282cb86fa4.png)\n\n## 带有 HED 边界的 ControlNet\n\nStable Diffusion 1.5 + ControlNet（使用软 HED 边界）\n\n    python gradio_hed2image.py\n\n软 HED 边界将保留输入图像中的许多细节，使此应用程序适合重新着色和风格化。只需尝试一下以获取更多详细信息。\n\nPrompt: \"oil painting of handsome old man, masterpiece\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_d781a1fc3b06.png)\n\nPrompt: \"Cyberpunk robot\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_be0be2015594.png)\n\n## 带有用户涂鸦的 ControlNet\n\nStable Diffusion 1.5 + ControlNet（使用涂鸦）\n\n    python gradio_scribble2image.py\n\n请注意，UI 基于 Gradio，而 Gradio 在某些方面难以自定义。目前，您需要在外部的 UI 之外绘制涂鸦（使用您喜欢的绘图软件，例如 MS Paint），然后将涂鸦图像导入 Gradio。 \n\nPrompt: \"turtle\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_b86a4d892dc4.png)\n\nPrompt: \"hot air balloon\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_75fa8cddadcc.png)\n\n### 交互式界面\n\n我们实际上提供了一个交互式界面\n\n    python gradio_scribble2image_interactive.py\n\n~~然而，由于 gradio 非常 [有 bug](https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio\u002Fissues\u002F3166) 且难以自定义，目前，用户需要先设置画布宽度和高度，然后点击“打开绘图画布”以获得绘图区域。请勿将图像上传到该绘图画布。此外，绘图区域非常小；应该更大。但我无法找出如何使其更大。再次强调，gradio 确实有很多 
bug。~~（现已修复，将尽快更新）\n\n下面的狗草图是我画的。也许我们应该画一只更好的狗来展示。\n\nPrompt: \"dog in a room\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_3c5dc5835c73.png)\n\n## 带有假涂鸦的 ControlNet\n\nStable Diffusion 1.5 + ControlNet（使用假涂鸦）\n\n    python gradio_fake_scribble2image.py\n\n有时我们很懒，不想画涂鸦。这个脚本使用完全相同的基于涂鸦的模型，但使用简单的算法从输入图像合成涂鸦。\n\nPrompt: \"bag\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_92cc9f340bbe.png)\n\nPrompt: \"shose\"（注意：\"shose\" 是个拼写错误；应该是\"shoes\"。但它似乎仍然有效。）\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_f3cb9e07c47b.png)\n\n## ControlNet with Human Pose\n\nStable Diffusion 1.5 + ControlNet（使用人体姿态）\n\n    python gradio_pose2image.py\n\n显然，这个模型值得拥有一个更好的用户界面来直接操作姿态骨架。然而，Gradio 在某种程度上难以定制。目前你需要输入一张图片，然后 Openpose 会为你检测姿态。\n\nPrompt: \"Chief in the kitchen\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_74605b8767a7.png)\n\nPrompt: \"An astronaut on the moon\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_0b528cbaea83.png)\n\n## ControlNet with Semantic Segmentation\n\nStable Diffusion 1.5 + ControlNet（使用语义分割）\n\n    python gradio_seg2image.py\n\n该模型使用 ADE20K 的分割协议。同样，这个模型值得拥有一个更好的用户界面来直接绘制分割区域。然而，Gradio 在某种程度上难以定制。目前你需要输入一张图片，然后一个名为 Uniformer 的模型会为你检测分割结果。尝试一下以了解更多详情。\n\nPrompt: \"House\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_2f5602988db6.png)\n\nPrompt: \"River\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_f716ef53c2b5.png)\n\n## ControlNet with Depth\n\nStable Diffusion 1.5 + ControlNet（使用深度图）\n\n    python gradio_depth2image.py\n\n太好了！现在 SD 1.5 也支持深度控制。终于如此。可能性无穷无尽（考虑到 SD1.5 拥有比 SD2 多得多的社区模型）。\n\n请注意，与 Stability 的模型不同，ControlNet 接收完整的 512×512 深度图，而不是 64×64 深度图。请注意，Stability 的 SD2 深度模型使用 64*64 深度图。这意味着 ControlNet 将在深度图中保留更多细节。\n\n这始终是一个优势，因为如果用户不想保留更多细节，他们可以直接使用另一个 SD 
对图像到图像 (i2i) 进行后处理。但如果他们想要保留更多细节，ControlNet 就成了唯一的选择。再次强调：SD2 使用 64×64 深度图，而我们使用 512×512。\n\nPrompt: \"Stormtrooper's lecture\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_845780c70c6a.png)\n\n## 带有法线图的 ControlNet\n\nStable Diffusion 1.5 + ControlNet（使用法线图）\n\n    python gradio_normal2image.py\n\n该模型使用法线图。目前在应用中，法线是根据 MiDaS 深度图和一个用户阈值计算得出的（该阈值用于判断哪些区域属于背景、其法线应正对观察者；可调整 Gradio 应用中的“法线背景阈值”来体会效果）。\n\nPrompt: \"Cute toy\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_8e3f03e5f9c3.png)\n\nPrompt: \"Plaster statue of Abraham Lincoln\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_e8b02b77436a.png)\n\n与深度模型相比，该模型似乎更擅长保留几何结构。这很直观：细微细节在深度图中并不显著，但在法线图中却很显著。下面是相同输入的深度模型结果。可以看到，输入图像中男子的发型被深度模型改变了，却被法线模型保留了下来。\n\nPrompt: \"Plaster statue of Abraham Lincoln\"\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_81098c31acb7.png)\n\n## 带有动漫线稿的 ControlNet\n\n我们还训练了一个相对简单的动漫线稿 ControlNet。该工具可能对艺术创作有用。（不过由于扩散仍在潜空间中进行，结果中的图像细节会略有改动。）\n\n该模型目前不可用。我们需要在发布此模型之前评估潜在风险。尽管如此，你可能会对 [将 ControlNet 迁移到任何社区模型](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F12) 感兴趣。\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_907ba19cbeb7.png)\n\n\u003Ca id=\"guess-anchor\">\u003C\u002Fa>\n\n# 猜测模式 (Guess Mode) \u002F 无提示词模式\n\n“猜测模式”（或称无提示词模式）将完全释放 ControlNet 强大编码器的全部能力。\n\n另请参阅博客 - [消融研究：为什么 ControlNet 使用深层编码器？更轻量的编码器如何？甚至用 MLP 呢？](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F188)\n\n你需要手动勾选“猜测模式”开关来启用此模式。\n\n在此模式下，即使你移除了所有提示词，ControlNet 编码器也会尽力识别输入控制图（如深度图、边缘图、涂鸦等）的内容。\n\n**让我们在一些极具挑战性的实验设置中尽情玩耍！**\n\n**无提示词。无“正向”提示词。无“负向”提示词。无额外的标题检测器。仅单次扩散循环。**\n\n对于此模式，我们建议使用 50 步，引导比例 (guidance scale) 在 3 到 5 
之间。\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_9d9532f221f6.png)\n\n无提示词：\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_5ed12ae86609.png)\n\n请注意，以下示例为 768×768。无提示词。无“正向”提示词。无“负向”提示词。\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_79a38e2cfcc5.png)\n\n通过调整参数，你可以获得一些非常有趣的结果，如下所示：\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_753f5b45ea9c.png)\n\n由于没有可用的提示词，ControlNet 编码器会“猜测”控制图中的内容，有时猜测结果相当有趣。由于扩散算法本身就能给出多种结果，ControlNet 似乎也能给出多个不同的猜测，就像这样：\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_0a94e85637fb.png)\n\n在没有提示词且控制强度相对较低时，HED 似乎擅长生成看起来像绘画的图像：\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_fdec5347a818.png)\n\n[WebUI 插件](https:\u002F\u002Fgithub.com\u002FMikubill\u002Fsd-webui-controlnet) 中同样支持猜测模式：\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_5586f828f707.png)\n\n无提示词。默认 WebUI 参数。纯随机结果，种子 (seed) 为 12345。标准 SD1.5。输入涂鸦位于 \"test_imgs\" 文件夹中，可用于复现。\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_b0575be8cf12.png)\n\n以下是另一个具有挑战性的示例：\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_8dd3d039b715.png)\n\n无提示词。默认 WebUI 参数。纯随机结果，种子 (seed) 为 12345。标准 SD1.5。输入涂鸦位于 \"test_imgs\" 文件夹中，可用于复现。\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_388b4a5312f6.png)\n\n请注意，在猜测模式下你仍然可以输入提示词。唯一的区别是：即使你不提供提示词，模型也会“更努力”地猜测控制图中的内容。亲自试一试吧！\n\n此外，如果你编写脚本（例如调用 BLIP）为“猜测模式”生成的图像生成标题，再把该标题作为提示词进行第二次扩散，就能得到一个全自动条件图像生成的 SOTA（最先进）流程。\n\n# 组合多个 ControlNet\n\nControlNet 是可组合的：多个 ControlNet 可以轻松组合，以实现多条件控制。\n\n目前，此功能处于 [Mikubill 的 A1111 WebUI 插件](https:\u002F\u002Fgithub.com\u002FMikubill\u002Fsd-webui-controlnet) 
的实验阶段：\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_b4ce65e32c36.png)\n\n![p](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_readme_648417aeb592.png)\n\n只要各个模型控制的是同一个 SD，不同研究项目之间甚至不存在“界限”。此插件还能让不同的方法协同工作！\n\n# 在任意社区模型中使用 ControlNet (SD1.X)\n\n这是一个实验性功能。\n\n[在此处查看步骤](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F12)。\n\n或者你也可以使用 [Mikubill 的 A1111 WebUI 插件](https:\u002F\u002Fgithub.com\u002FMikubill\u002Fsd-webui-controlnet)，它即插即用，无需手动合并模型。\n\n# 标注你自己的数据\n\n我们提供了简单的 Python 脚本来处理图像。\n\n[在此处查看 gradio 示例](docs\u002Fannotator.md)。\n\n# 使用你自己的数据进行训练\n\n训练一个 ControlNet（一种用于控制扩散模型生成的条件网络）就像训练一个简单的 pix2pix（图像到图像转换模型）一样容易，甚至更容易。\n\n[在此处查看步骤](docs\u002Ftrain.md)。\n\n# 相关资源\n\n特别感谢伟大的 [Mikubill 的 A1111 WebUI 插件](https:\u002F\u002Fgithub.com\u002FMikubill\u002Fsd-webui-controlnet) 项目！\n\n我们也感谢 Hysts 创建的 [Hugging Face Space](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fhysts\u002FControlNet)（托管机器学习应用的在线平台），以及收录了 65 多个模型的惊艳 [Colab 列表](https:\u002F\u002Fgithub.com\u002Fcamenduru\u002Fcontrolnet-colab)（Google 云端 Jupyter Notebook 服务）！\n\n感谢 haofanwang 制作了 [ControlNet-for-Diffusers](https:\u002F\u002Fgithub.com\u002Fhaofanwang\u002FControlNet-for-Diffusers)（基于 Hugging Face Diffusers 库的实现）！\n\n我们也感谢所有制作 ControlNet 演示的作者，包括但不限于 [fffiloni](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffffiloni\u002FControlNet-Video), [other-model](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fhysts\u002FControlNet-with-other-models), [ThereforeGames](https:\u002F\u002Fgithub.com\u002FAUTOMATIC1111\u002Fstable-diffusion-webui\u002Fdiscussions\u002F7784), [RamAnanth1](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FRamAnanth1\u002FControlNet) 等！\n\n此外，你可能还想阅读这些出色的相关工作：\n\n[Composer: 使用可组合条件进行创意且可控的图像合成](https:\u002F\u002Fgithub.com\u002Fdamo-vilab\u002Fcomposer)：用一个更大的模型来控制扩散过程！\n\n[T2I-Adapter: 
学习适配器以挖掘文本到图像扩散模型的更多可控能力](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FT2I-Adapter)：用一个更小的模型来控制 Stable Diffusion（一种流行的文本生成图像扩散模型）！\n\n[ControlLoRA: 一个控制 Stable Diffusion 空间信息的轻量级神经网络](https:\u002F\u002Fgithub.com\u002FHighCWu\u002FControlLoRA)：使用 LoRA（低秩适应）实现 ControlNet！\n\n以及这些出色的最新项目：[InstructPix2Pix: 学习遵循图像编辑指令](https:\u002F\u002Fwww.timothybrooks.com\u002Finstruct-pix2pix), [Pix2pix-zero: 零样本图像到图像翻译](https:\u002F\u002Fgithub.com\u002Fpix2pixzero\u002Fpix2pix-zero), [Plug-and-Play Diffusion Features: 用于文本驱动图像到图像翻译的即插即用扩散特征](https:\u002F\u002Fgithub.com\u002FMichalGeyer\u002Fplug-and-play), [MaskSketch: 非配对结构引导的掩码图像生成](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.05496), [SEGA: 使用语义维度引导扩散](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.12247), [Universal Guidance for Diffusion Models: 扩散模型的通用引导](https:\u002F\u002Fgithub.com\u002Farpitbansal297\u002FUniversal-Guided-Diffusion), [Region-Aware Diffusion: 用于零样本文本驱动图像编辑的区域感知扩散](https:\u002F\u002Fgithub.com\u002Fhaha-lisa\u002FRDM-Region-Aware-Diffusion-Model), [Domain Expansion of Image Generators: 图像生成器的领域扩展](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.05225), [Image Mixer: 图像混合器](https:\u002F\u002Ftwitter.com\u002FLambdaAPI\u002Fstatus\u002F1626327289288957956), [MultiDiffusion: 融合扩散路径以实现受控图像生成](https:\u002F\u002Fmultidiffusion.github.io\u002F)\n\n# 引用\n\n    @misc{zhang2023adding,\n      title={Adding Conditional Control to Text-to-Image Diffusion Models},\n      author={Lvmin Zhang and Anyi Rao and Maneesh Agrawala},\n      booktitle={IEEE International Conference on Computer Vision (ICCV)},\n      year={2023},\n    }\n\n[arXiv 链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.05543)\n\n[补充材料](https:\u002F\u002Flllyasviel.github.io\u002Fmisc\u002F202309\u002Fcnet_supp.pdf)","# ControlNet 快速上手指南\n\nControlNet 是一种通过添加额外条件来控制扩散模型的神经网络结构。它允许在保持原始模型权重的同时，学习新的控制条件（如边缘、深度、姿态等），从而实现对 Stable Diffusion 生成过程的精确控制。\n\n## 环境准备\n\n*   **系统要求**：建议使用 NVIDIA GPU。若显存小于 8GB，请启用低显存模式（Low VRAM mode）。\n*   **前置依赖**：Python 环境及 
Conda 包管理器。\n*   **代码仓库**：确保已克隆项目代码到本地。\n\n## 安装步骤\n\n1.  **创建并激活 Conda 环境**\n    在项目根目录下运行以下命令创建环境并激活：\n    ```bash\n    conda env create -f environment.yaml\n    conda activate control\n    ```\n\n2.  **下载预训练模型与检测器**\n    所有模型和检测器需从官方 Hugging Face 页面下载：[lllyasviel\u002FControlNet](https:\u002F\u002Fhuggingface.co\u002Flllyasviel\u002FControlNet)。\n    \n    下载后请按照以下目录结构放置文件：\n    *   **Stable Diffusion 模型**：放入 `ControlNet\u002Fmodels` 文件夹。\n    *   **检测器模型**（如 HED 边缘检测、Midas 深度估计、Openpose 等）：放入 `ControlNet\u002Fannotator\u002Fckpts` 文件夹。\n\n## 基本使用\n\n项目提供了多个基于 Gradio 的交互应用，对应不同的控制类型。启动任一脚本即可在浏览器中打开 Web 界面。\n\n### 1. 边缘控制 (Canny Edge)\n这是最基础的控制方式，适用于根据草图生成图像。\n```bash\npython gradio_canny2image.py\n```\n*操作提示*：在界面中可调整 Canny 边缘检测阈值。\n*示例 Prompt*：\"bird\", \"cute dog\"\n\n### 2. 其他控制模式\n根据需求选择对应的脚本启动：\n*   **直线检测 (M-LSD)**：`python gradio_hough2image.py`\n*   **软边界 (HED Boundary)**：`python gradio_hed2image.py`（适合重绘和风格化）\n*   **用户涂鸦 (Scribbles)**：`python gradio_scribble2image.py`\n*   **人体姿态 (Human Pose)**：`python gradio_pose2image.py`（需输入图片自动检测姿态）\n*   **语义分割 (Semantic Segmentation)**：`python gradio_seg2image.py`\n*   **深度图 (Depth)**：`python gradio_depth2image.py`（支持 512×512 高分辨率深度图）\n*   **法线图 (Normal Map)**：`python gradio_normal2image.py`\n\n### 3. 
猜测模式 (Guess Mode)\n该模式完全释放 ControlNet 编码器的能力，无需文本提示词即可识别控制图中的内容。\n*   **开启方式**：在 UI 中手动勾选 \"Guess Mode\" 开关。\n*   **推荐参数**：步数 50，引导系数 (Guidance Scale) 3-5。\n*   **适用场景**：无提示词生成、探索性创作。\n\n> **注意**：ControlNet 1.1 夜间版本已发布，新模型将在验证无误后合并至主仓库。","独立游戏美术师小张负责主角的多套皮肤设计，要求在不同配色下严格保持同一动作帧。\n\n### 没有 ControlNet 时\n- 仅靠文本提示词很难精确控制人物关节角度，生成的动作经常扭曲或比例失调。\n- 为了匹配特定姿势，需要反复重绘数十次，不仅浪费显卡资源还严重拖慢进度。\n- 每次生成的背景透视和角色位置随机飘移，导致多张素材无法拼合成连续动画。\n- 想要微调局部细节（如手部握剑姿势）几乎不可能，只能重新训练模型或手动修图。\n\n### 使用 ControlNet 后\n- 通过 OpenPose 或 Canny 边缘检测，直接锁定参考图的骨架结构与轮廓线条。\n- 固定姿势约束后，仅需调整服装风格提示词，即可批量产出符合要求的皮肤变体。\n- 生成成功率大幅提升，原本需要半天调试的素材现在半小时即可完成。\n- 确保所有角色立绘在构图、透视和动态上一致，完美适配游戏引擎的动画绑定需求。\n\nControlNet 将不确定的扩散模型转化为可精准操控的生产力工具，彻底解决了创意落地中的构图失控难题。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flllyasviel_ControlNet_f695f4ab.png","lllyasviel",null,"https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Flllyasviel_92d612b9.jpg","Lvmin Zhang (Lyumin Zhang)\r\n\r\n","https:\u002F\u002Fgithub.com\u002Flllyasviel",[81,85],{"name":82,"color":83,"percentage":84},"Python","#3572A5",100,{"name":86,"color":87,"percentage":88},"Shell","#89e051",0,33786,3002,"2026-04-05T09:48:35","Apache-2.0","未说明","需要 GPU，建议 8GB 显存（支持 Low VRAM 模式）",{"notes":96,"python":93,"dependencies":97},"需使用 conda 创建环境并加载 environment.yaml；模型及检测器需从 Hugging Face 页面下载；若显存不足可使用 Low VRAM 模式；部分功能依赖 Gradio 交互界面。",[98],"gradio",[14],31,"2026-03-27T02:49:30.150509","2026-04-06T05:17:10.404659",[104,109,114,119,123,128],{"id":105,"question_zh":106,"answer_zh":107,"source_url":108},3455,"为什么生成的图片质量模糊？","本项目默认使用 DDIM 作为基线。若要获得工业级清晰度，请使用 WebUI 插件，它支持高分辨率修复（high res fix）。","https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fissues\u002F264",{"id":110,"question_zh":111,"answer_zh":112,"source_url":113},3456,"推理过程卡死或无错误信息如何解决？","这通常与 Gradio 版本冲突有关。尝试安装特定版本：`pip install gradio==3.40.1`。最新版 4.29.0 
可能存在兼容性问题导致无法工作。","https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fissues\u002F492",{"id":115,"question_zh":116,"answer_zh":117,"source_url":118},3457,"如何正确安装 xformers 以优化性能？","安装脚本会自动检测环境。Linux 用户可通过 pip 直接安装。Windows 用户需注意 Python 版本，部分版本（如 3.10）可能不支持自动安装，需手动构建。","https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fissues\u002F6",{"id":120,"question_zh":121,"answer_zh":122,"source_url":118},3458,"影响生成速度的关键因素有哪些？","除了 xformers，批处理大小（Batch size）和随机种子（random seed）对当前性能影响更大。建议使用宏来测试不同种子的图像生成效果。",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},3459,"如何在命令行中使用 T2I Adapter 模块？","HuggingFace 仓库中可能找不到对应的 `.bin` 或 `.json` 文件。建议查阅 WebUI 扩展代码，因为核心库可能不包含这些模型文件，需配合扩展使用。","https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fissues\u002F570",{"id":129,"question_zh":130,"answer_zh":131,"source_url":127},3460,"StableDiffusionControlNetPipeline 是否原生支持 ControlNet？","不支持。AUTOMATIC1111 的 WebUI 不原生支持 ControlNet，必须安装扩展（extension）。在 Diffusers 中可能需要自定义社区管道才能使用。",[]]
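针对上文 FAQ 中提到的 Gradio 版本兼容性问题（issue #492 报告 `gradio==3.40.1` 可正常工作，而 4.x 版本如 4.29.0 可能导致推理卡死），下面给出一个启动脚本前自检版本号的小草稿。这只是一个假设性示例：`parse_version` 与 `gradio_is_compatible` 均为本文虚构的辅助函数，"4.0.0 及以上视为不兼容"的阈值也仅基于上述 issue 的经验报告，并非项目官方逻辑。

```python
# 假设性示例：在运行 gradio_*2image.py 前粗略检查已安装的 Gradio 版本。
# 依据上文 FAQ（issue #492）：3.40.1 可正常工作，4.x（如 4.29.0）可能卡死。
# 以下两个辅助函数均为本文虚构，仅作演示。

def parse_version(version: str) -> tuple:
    """把 "3.40.1" 这样的版本号解析为 (3, 40, 1)，便于元组比较。"""
    parts = []
    for piece in version.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts[:3])


def gradio_is_compatible(installed: str) -> bool:
    """按上述 issue 的经验规则：4.0.0 及以上版本视为不兼容。"""
    return parse_version(installed) < (4, 0, 0)


if __name__ == "__main__":
    for v in ("3.40.1", "4.29.0"):
        status = "可用" if gradio_is_compatible(v) else "可能卡死，建议降级"
        print(f"gradio {v}: {status}")
```

实际使用时，可在 `conda activate control` 后用 `pip show gradio` 查看已安装版本，再据检查结果决定是否执行 FAQ 中建议的 `pip install gradio==3.40.1`。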