[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-QwenLM--Qwen-Image-Layered":3,"tool-QwenLM--Qwen-Image-Layered":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 
是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":10,"env_os":93,"env_gpu":94,"env_ram":93,"env_deps":95,"category_tags":104,"github_topics":79,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":105,"updated_at":106,"faqs":107,"releases":108},2136,"QwenLM\u002FQwen-Image-Layered","Qwen-Image-Layered","Qwen-Image-Layered: Layered Decomposition for Inherent Editablity","Qwen-Image-Layered 是一款能够将单张图像智能拆解为多个独立 RGBA 图层的 AI 模型。它主要解决了传统图片编辑中“牵一发而动全身”的痛点：在普通图片中修改某个元素往往会影响背景或其他物体，而 Qwen-Image-Layered 通过物理隔离语义或结构组件，让每个图层都能被单独操控。这意味着用户可以无损地对图中的物体进行移动、缩放、重新着色或删除，且不会破坏画面其他部分的完整性，实现了真正的高保真编辑。\n\n这款工具特别适合设计师、插画师以及需要精细图像处理的内容创作者，同时也为计算机视觉研究人员提供了宝贵的分层数据生成能力。其核心技术亮点在于“分层分解”机制，能将复杂的平面图像还原为具有内在可编辑性的多层结构。虽然它支持文本提示来描述整体画面内容（包括被遮挡的部分），但其最强项在于将现有图片转化为可编辑的多层素材，而非从零生成。目前，Qwen-Image-Layered 已开源模型权重并提供在线演示，用户只需简单的代码调用或通过 Web 界面，即可轻松体验将静态图片变为灵活素材的过程，极大提升了后期制作的效率与自由度。","\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002Fqwen-image-layered-logo.png\" 
width=\"800\"\u002F>\n\u003Cp> \n\u003Cp align=\"center\">&nbsp&nbsp🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Layered\">HuggingFace\u003C\u002Fa>&nbsp&nbsp | &nbsp&nbsp🤖 \u003Ca href=\"https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FQwen\u002FQwen-Image-Layered\">ModelScope\u003C\u002Fa>&nbsp&nbsp | &nbsp&nbsp 📑 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.15603\">Research Paper\u003C\u002Fa> &nbsp&nbsp | &nbsp&nbsp 📑 \u003Ca href=\"https:\u002F\u002Fqwen.ai\u002Fblog?id=qwen-image-layered\">Blog\u003C\u002Fa> &nbsp&nbsp | &nbsp&nbsp 🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FQwen\u002FQwen-Image-Layered\">Demo\u003C\u002Fa> &nbsp&nbsp \n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002Flayered.JPG\" width=\"1024\"\u002F>\n\u003Cp>\n\n## Introduction\nWe are excited to introduce **Qwen-Image-Layered**, a model capable of decomposing an image into multiple RGBA layers. This layered representation unlocks **inherent editability**: each layer can be independently manipulated without affecting other content. Such a layered representation also naturally supports **high-fidelity elementary operations**, such as resizing, repositioning, and recoloring. 
By physically isolating semantic or structural components into distinct layers, our approach enables high-fidelity and consistent editing.\n\n\n[![Qwen Image Layered](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FQwenLM_Qwen-Image-Layered_readme_7c3ee4e69d19.jpg)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=OVhmiBrsziQ)\n\n\n## News\n- 2025.12.22: You can try Qwen-Image-Layered on [Huggingface Spaces](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FQwen\u002FQwen-Image-Layered) and [Modelscope Studio](https:\u002F\u002Fmodelscope.cn\u002Fstudios\u002FQwen\u002FQwen-Image-Layered).\n- 2025.12.19: We released Qwen-Image-Layered weights! Check at [Huggingface](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Layered) and [ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FQwen\u002FQwen-Image-Layered)!\n- 2025.12.19: We released Qwen-Image-Layered! Check our [Blog](https:\u002F\u002Fqwen.ai\u002Fblog?id=qwen-image-layered) for more details!\n- 2025.12.18: We released our [Research Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.15603) on Arxiv!\n\n> [!NOTE]\n> - The text prompt is intended to describe the overall content of the input image—including elements that may be partially occluded (e.g., you may specify the text hidden behind a foreground object). It is not designed to control the semantic content of individual layers explicitly.\n> - The released weights are specifically fine-tuned for the image-to-multi-RGBA decomposition task. As a result, while the model supports text-conditioned inference, its performance on text-to-multi-RGBA generation is limited.\n\n## Quick Start\n\n1. Make sure your transformers>=4.51.3 (Supporting Qwen2.5-VL)\n\n2. 
Install the latest version of diffusers\n```\npip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers\npip install python-pptx\npip install psd-tools\n```\n\n\n```python\nfrom diffusers import QwenImageLayeredPipeline\nimport torch\nfrom PIL import Image\n\npipeline = QwenImageLayeredPipeline.from_pretrained(\"Qwen\u002FQwen-Image-Layered\")\npipeline = pipeline.to(\"cuda\", torch.bfloat16)\npipeline.set_progress_bar_config(disable=None)\n\nimage = Image.open(\"asserts\u002Ftest_images\u002F1.png\").convert(\"RGBA\")\ninputs = {\n    \"image\": image,\n    \"generator\": torch.Generator(device='cuda').manual_seed(777),\n    \"true_cfg_scale\": 4.0,\n    \"negative_prompt\": \" \",\n    \"num_inference_steps\": 50,\n    \"num_images_per_prompt\": 1,\n    \"layers\": 4,\n    \"resolution\": 640,      # Using different bucket (640, 1024) to determine the resolution. For this version, 640 is recommended\n    \"cfg_normalize\": True,  # Whether enable cfg normalization.\n    \"use_en_prompt\": True,  # Automatic caption language if user does not provide caption\n}\n\nwith torch.inference_mode():\n    output = pipeline(**inputs)\n    output_image = output.images[0]\n\nfor i, image in enumerate(output_image):\n    image.save(f\"{i}.png\")\n```\n\n## Deploy Qwen-Image-Layered\nThe following scripts will start a Gradio-based web interface where you can decompose an image and export the layers into pptx, zip, and psd files, where you can edit and move these layers flexibly.\n```bash\npython src\u002Fapp.py\n```\n\nAfter decomposition, you may want to edit specific layers. The following scripts will launch a Gradio-based web interface where you can edit images with transparency using Qwen-Image-Edit.\n```bash\npython src\u002Ftool\u002Fedit_rgba_image.py\n```\n\nAfter editing the individual decomposed layers, you can use the following script to combine them into a new image. 
Remember to upload the layers in order, from the bottom layer to the top.\n```bash\npython src\u002Ftool\u002Fcombine_layers.py\n```\n\n### vLLM-Omni\n[vLLM-Omni](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni) now supports Qwen-Image-Layered. See the [recipes](https:\u002F\u002Fdocs.vllm.ai\u002Fprojects\u002Frecipes\u002Fen\u002Flatest\u002FQwen\u002FQwen-Image.html) for up-to-date details.\n\n## Showcase\n### Layered Decomposition in Application\nGiven an image, Qwen-Image-Layered can decompose it into several RGBA layers:\n![Example Image](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片1.JPG)\n\nAfter decomposition, edits are applied exclusively to the target layer, physically isolating it from the rest of the content, and thereby fundamentally ensuring consistency across edits. \n\nFor example, we can recolor the first layer and keep all other content untouched:\n![Example Image](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片2.JPG)\n\nWe can also change the subject of the second layer from a girl to a boy (the target layer is edited using Qwen-Image-Edit):\n![Example Image](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片3.JPG)\n\nHere, we revise the text to \"Qwen-Image\" (the target layer is edited using Qwen-Image-Edit):\n![Example Image](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片4.JPG)\n\nFurthermore, the layered structure naturally supports elementary operations. 
For example, we can delete unwanted objects cleanly:\n![Example Image](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片5.JPG)\n\nWe can also resize an object without distortion:\n![Example Image](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片6.JPG)\n\nAfter layer decomposition, we can move objects freely within the canvas:\n![Example Image](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片7.JPG)\n\n### Flexible and Iterative Decomposition\nQwen-Image-Layered is not limited to a fixed number of layers. The model supports variable-layer decomposition. For example, we can decompose an image into either 3 or 8 layers as needed:\n\n![Example Image](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片8.JPG)\n\nMoreover, decomposition can be applied recursively: any layer can itself be further decomposed, enabling infinite decomposition. \n\n![Example Image](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片9.JPG)\n\n\n## License Agreement\n\nQwen-Image-Layered is licensed under Apache 2.0. \n\n## Citation\n\nWe kindly encourage citation of our work if you find it useful.\n\n```bibtex\n@misc{yin2025qwenimagelayered,\n      title={Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition}, \n      author={Shengming Yin, Zekai Zhang, Zecheng Tang, Kaiyuan Gao, Xiao Xu, Kun Yan, Jiahao Li, Yilei Chen, Yuxiang Chen, Heung-Yeung Shum, Lionel M. 
Ni, Jingren Zhou, Junyang Lin, Chenfei Wu},\n      year={2025},\n      eprint={2512.15603},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.15603}, \n}\n```\n","\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002Fqwen-image-layered-logo.png\" width=\"800\"\u002F>\n\u003Cp> \n\u003Cp align=\"center\">&nbsp&nbsp🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Layered\">HuggingFace\u003C\u002Fa>&nbsp&nbsp | &nbsp&nbsp🤖 \u003Ca href=\"https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FQwen\u002FQwen-Image-Layered\">ModelScope\u003C\u002Fa>&nbsp&nbsp | &nbsp&nbsp 📑 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.15603\">研究论文\u003C\u002Fa> &nbsp&nbsp | &nbsp&nbsp 📑 \u003Ca href=\"https:\u002F\u002Fqwen.ai\u002Fblog?id=qwen-image-layered\">博客\u003C\u002Fa> &nbsp&nbsp | &nbsp&nbsp 🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FQwen\u002FQwen-Image-Layered\">演示\u003C\u002Fa> &nbsp&nbsp \n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002Flayered.JPG\" width=\"1024\"\u002F>\n\u003Cp>\n\n## 简介\n我们非常高兴地推出**Qwen-Image-Layered**，这是一款能够将图像分解为多个RGBA图层的模型。这种分层表示赋予了图像**内在的可编辑性**：每个图层都可以独立操作，而不会影响其他内容。同时，这种分层表示也自然支持**高保真度的基本操作**，例如调整大小、重新定位和重新着色。通过将语义或结构组件物理上隔离到不同的层中，我们的方法实现了高质量且一致的编辑效果。\n\n\n[![Qwen Image Layered](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FQwenLM_Qwen-Image-Layered_readme_7c3ee4e69d19.jpg)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=OVhmiBrsziQ)\n\n\n## 最新消息\n- 2025年12月22日：您可以在[Huggingface Spaces](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FQwen\u002FQwen-Image-Layered)和[Modelscope Studio](https:\u002F\u002Fmodelscope.cn\u002Fstudios\u002FQwen\u002FQwen-Image-Layered)上试用Qwen-Image-Layered。\n- 
2025年12月19日：我们发布了Qwen-Image-Layered的权重！请访问[Huggingface](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Layered)和[ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FQwen\u002FQwen-Image-Layered)查看！\n- 2025年12月19日：我们正式发布了Qwen-Image-Layered！更多详情请参阅我们的[博客](https:\u002F\u002Fqwen.ai\u002Fblog?id=qwen-image-layered)！\n- 2025年12月18日：我们在Arxiv上发表了我们的[研究论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.15603)！\n\n> [!注意]\n> - 文本提示旨在描述输入图像的整体内容——包括可能被部分遮挡的元素（例如，您可以指定前景物体后面隐藏的文字）。它并非用于显式控制单个层的语义内容。\n> - 发布的权重是专门为图像到多RGBA图层分解任务进行微调的。因此，尽管该模型支持文本条件下的推理，但在文本到多RGBA图层生成方面的表现较为有限。\n\n## 快速入门\n\n1. 确保您的transformers版本≥4.51.3（支持Qwen2.5-VL）。\n\n2. 安装最新版本的diffusers：\n```\npip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers\npip install python-pptx\npip install psd-tools\n```\n\n\n```python\nfrom diffusers import QwenImageLayeredPipeline\nimport torch\nfrom PIL import Image\n\npipeline = QwenImageLayeredPipeline.from_pretrained(\"Qwen\u002FQwen-Image-Layered\")\npipeline = pipeline.to(\"cuda\", torch.bfloat16)\npipeline.set_progress_bar_config(disable=None)\n\nimage = Image.open(\"asserts\u002Ftest_images\u002F1.png\").convert(\"RGBA\")\ninputs = {\n    \"image\": image,\n    \"generator\": torch.Generator(device='cuda').manual_seed(777),\n    \"true_cfg_scale\": 4.0,\n    \"negative_prompt\": \" \",\n    \"num_inference_steps\": 50,\n    \"num_images_per_prompt\": 1,\n    \"layers\": 4,\n    \"resolution\": 640,      # 使用不同的分辨率桶（640、1024）来确定分辨率。对于此版本，建议使用640\n    \"cfg_normalize\": True,  # 是否启用cfg归一化。\n    \"use_en_prompt\": True,  # 如果用户未提供说明文字，则自动使用英文说明\n}\n\nwith torch.inference_mode():\n    output = pipeline(**inputs)\n    output_image = output.images[0]\n\nfor i, image in enumerate(output_image):\n    image.save(f\"{i}.png\")\n```\n\n## 部署Qwen-Image-Layered\n以下脚本将启动一个基于Gradio的Web界面，您可以在其中对图像进行分层分解，并将各层导出为pptx、zip和psd文件，以便灵活编辑和移动这些层。\n```bash\npython 
src\u002Fapp.py\n```\n\n分解完成后，您可能希望编辑特定的图层。以下脚本将启动一个基于Gradio的Web界面，您可以在其中使用Qwen-Image-Edit对具有透明度的图像进行编辑。\n```bash\npython src\u002Ftool\u002Fedit_rgba_image.py\n```\n\n在对各个分解后的层进行编辑后，您可以使用以下脚本将其合并成一张新的图像。请务必按照顺序上传图层——从底层到顶层。\n```bash\npython src\u002Ftool\u002Fcombine_layers.py\n```\n\n### vLLM-Omni\n[vLLM-Omni](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm-omni)现已支持Qwen-Image-Layered。有关最新详情，请参阅[配方](https:\u002F\u002Fdocs.vllm.ai\u002Fprojects\u002Frecipes\u002Fen\u002Flatest\u002FQwen\u002FQwen-Image.html)。\n\n## 展示\n### 分层分解的应用\n给定一张图像，Qwen-Image-Layered可以将其分解为若干个RGBA图层：\n![示例图像](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片1.JPG)\n\n分解后，编辑仅作用于目标层，使其与其他内容物理隔离，从而从根本上保证了编辑的一致性。\n\n例如，我们可以为第一层重新着色，而保持其他内容不变：\n![示例图像](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片2.JPG)\n\n我们还可以将第二层中的女孩替换为男孩（目标层使用Qwen-Image-Edit进行编辑）：\n![示例图像](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片3.JPG)\n\n在这里，我们将文字修改为“Qwen-Image”（目标层使用Qwen-Image-Edit进行编辑）：\n![示例图像](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片4.JPG)\n\n此外，分层结构还自然支持基本操作。例如，我们可以干净利落地删除不需要的对象：\n![示例图像](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片5.JPG)\n\n我们也可以在不发生变形的情况下调整对象的大小：\n![示例图像](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片6.JPG)\n\n在完成图层分解后，我们可以自由地在画布内移动对象：\n![示例图像](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片7.JPG)\n\n### 
灵活且迭代的分解\nQwen-Image-Layered并不局限于固定的图层数量。该模型支持可变数量的图层分解。例如，我们可以根据需要将图像分解为3层或8层：\n\n![示例图像](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片8.JPG)\n\n此外，分解还可以递归进行：任何一层都可以进一步分解，从而实现无限次分解。\n\n![示例图像](https:\u002F\u002Fqianwen-res.oss-cn-beijing.aliyuncs.com\u002FQwen-Image\u002Flayered\u002F幻灯片9.JPG)\n\n\n## 许可协议\n\nQwen-Image-Layered采用Apache 2.0许可证授权。\n\n## 引用\n\n如果您觉得我们的工作有用，我们诚挚地鼓励您引用我们的研究成果。\n\n```bibtex\n@misc{yin2025qwenimagelayered,\n      title={Qwen-Image-Layered：通过图层分解实现内在可编辑性}, \n      author={尹盛明, 张泽凯, 唐哲成, 高凯元, 徐晓, 严坤, 李嘉豪, 陈一磊, 陈宇翔, 沈向洋, 尼礼伦, 周景仁, 林俊阳, 吴晨飞},\n      year={2025},\n      eprint={2512.15603},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.15603}, \n}\n```","# Qwen-Image-Layered 快速上手指南\n\nQwen-Image-Layered 是一款能够将图像分解为多个 RGBA 图层的 AI 模型。通过这种分层表示，用户可以独立编辑每个图层（如调整大小、重新定位、重新着色），而不会影响其他内容，从而实现高保真且一致的图像编辑。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux \u002F Windows \u002F macOS\n*   **Python**: 建议 Python 3.10+\n*   **GPU**: 推荐使用支持 CUDA 的 NVIDIA 显卡（显存建议 16GB 以上以获得最佳体验）\n*   **核心依赖**:\n    *   `transformers` >= 4.51.3 (需支持 Qwen2.5-VL)\n    *   `diffusers` (最新开发版)\n    *   `torch` (支持 bfloat16)\n\n## 安装步骤\n\n请依次执行以下命令安装必要的依赖库。为了获取最新功能，`diffusers` 建议直接从 GitHub 安装源码版本。\n\n```bash\n# 升级 transformers 以支持 Qwen2.5-VL\npip install -U \"transformers>=4.51.3\"\n\n# 安装最新版 diffusers\npip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers\n\n# 安装辅助工具库（用于导出 PPTX 和 PSD 文件）\npip install python-pptx\npip install psd-tools\n```\n\n> **提示**：国内用户若下载速度较慢，可添加 `-i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple` 参数使用清华镜像源。\n\n## 基本使用\n\n以下是最简单的 Python 代码示例，展示如何加载模型并将一张图片分解为 4 个 RGBA 图层。\n\n```python\nfrom diffusers import QwenImageLayeredPipeline\nimport torch\nfrom PIL import Image\n\n# 1. 
加载预训练模型\npipeline = QwenImageLayeredPipeline.from_pretrained(\"Qwen\u002FQwen-Image-Layered\")\n# 移动到 GPU 并使用 bfloat16 精度以节省显存并加速推理\npipeline = pipeline.to(\"cuda\", torch.bfloat16)\npipeline.set_progress_bar_config(disable=None)\n\n# 2. 准备输入图像\n# 请替换为您本地的图片路径\nimage = Image.open(\"asserts\u002Ftest_images\u002F1.png\").convert(\"RGBA\")\n\n# 3. 配置推理参数\ninputs = {\n    \"image\": image,\n    \"generator\": torch.Generator(device='cuda').manual_seed(777), # 固定随机种子\n    \"true_cfg_scale\": 4.0,\n    \"negative_prompt\": \" \",\n    \"num_inference_steps\": 50,       # 推理步数\n    \"num_images_per_prompt\": 1,\n    \"layers\": 4,                     # 期望分解的图层数量\n    \"resolution\": 640,               # 分辨率桶 (推荐 640 或 1024)\n    \"cfg_normalize\": True,           # 是否启用 CFG 归一化\n    \"use_en_prompt\": True,           # 若未提供描述，自动使用英文生成描述\n}\n\n# 4. 执行分解\nwith torch.inference_mode():\n    output = pipeline(**inputs)\n    output_image = output.images[0]\n\n# 5. 保存结果\n# 输出的图层将按顺序保存为独立的 PNG 文件\nfor i, image in enumerate(output_image):\n    image.save(f\"layer_{i}.png\")\n    print(f\"Layer {i} saved.\")\n```\n\n### 关键参数说明\n*   **`layers`**: 指定希望将图像分解为多少个图层（支持动态调整，如 3 层或 8 层）。\n*   **`resolution`**: 当前版本推荐使用 `640`，也可尝试 `1024`。\n*   **文本提示**: 该模型主要用于“图到多层”分解。虽然支持文本条件推理，但目前的权重主要针对分解任务微调，文本生成能力有限。文本提示主要用于描述整体内容（包括被遮挡部分），而非控制单个图层的具体语义。\n\n### 进阶部署\n如果您希望启动一个 Web 界面进行交互式分解、编辑或合并图层，可以使用项目自带的 Gradio 脚本：\n\n*   **启动分解与导出界面** (支持导出 pptx, zip, psd):\n    ```bash\n    python src\u002Fapp.py\n    ```\n*   **启动图层编辑界面** (基于 Qwen-Image-Edit):\n    ```bash\n    python src\u002Ftool\u002Fedit_rgba_image.py\n    ```\n*   **启动图层合并工具**:\n    ```bash\n    python src\u002Ftool\u002Fcombine_layers.py\n    ```","电商运营设计师小李需要为“双 11\"大促快速制作多套商品海报，要求同一商品在不同背景、尺寸和配色方案中灵活切换。\n\n### 没有 Qwen-Image-Layered 时\n- **抠图耗时且边缘粗糙**：面对复杂的商品图（如透明玻璃瓶或毛绒玩具），手动抠图需花费数小时，且发丝或透明边缘难以处理完美。\n- **修改牵一发而动全身**：一旦客户想调整商品位置或大小，重新合成背景时往往需要返工重做，无法直接移动主体。\n- **多版本适配效率低**：为适应手机端、PC 端及线下大屏等不同分辨率，每次都需要重新调整图层并导出，极易出错。\n- 
**色彩调整受限**：若想尝试商品换色（如将红色包装改为蓝色），必须依赖原始设计工程文件，若只有扁平化图片则几乎无法实现。\n\n### 使用 Qwen-Image-Layered 后\n- **一键自动分层解耦**：Qwen-Image-Layered 能瞬间将整张商品图分解为多个独立的 RGBA 图层，自动分离前景商品、阴影及背景，边缘处理自然精准。\n- **元素独立自由编辑**：生成的每个图层均可单独操作，小李可直接拖拽商品位置、无损缩放大小，完全不影响其他视觉元素。\n- **高保真多端适配**：基于分层结构，可快速重组画面以适应不同分辨率需求，确保商品在任何尺寸下都保持清晰且构图合理。\n- **无损换色与重构**：利用分层特性，直接对特定图层进行重着色（Recoloring）或替换背景，无需原始工程文件即可实现“换装”效果。\n\nQwen-Image-Layered 通过将静态图像转化为可独立操控的分层资产，彻底打破了传统修图的线性流程，让创意修改变得像搭积木一样简单高效。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FQwenLM_Qwen-Image-Layered_7c3ee4e6.jpg","QwenLM","Qwen","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FQwenLM_4756c6c9.png","Alibaba Cloud's general-purpose AI models",null,"qianwen_opensource@alibabacloud.com","Alibaba_Qwen","https:\u002F\u002Fqwen.ai\u002F","https:\u002F\u002Fgithub.com\u002FQwenLM",[85],{"name":86,"color":87,"percentage":88},"Python","#3572A5",100,1750,136,"2026-04-04T19:55:40","Apache-2.0","未说明","必需 NVIDIA GPU (代码示例使用 'cuda')，显存需求未明确说明 (建议 8GB+ 以运行 bfloat16)，CUDA 版本未说明",{"notes":96,"python":93,"dependencies":97},"模型权重针对图像到多 RGBA 分解任务微调，文本生成能力有限；推荐使用分辨率 640；支持通过 vLLM-Omni 部署；运行示例代码需将管道移至 cuda 并使用 bfloat16 精度。",[98,99,100,101,102,103],"transformers>=4.51.3","diffusers (最新版)","torch","python-pptx","psd-tools","PIL (Pillow)",[14,37],"2026-03-27T02:49:30.150509","2026-04-06T05:37:44.291816",[],[]]