[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-1038lab--ComfyUI-QwenVL":3,"tool-1038lab--ComfyUI-QwenVL":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",149489,2,"2026-04-10T11:32:46",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":75,"owner_location":75,"owner_email":75,"owner_twitter":75,"owner_website":75,"owner_url":76,"languages":77,"stars":86,"forks":87,"last_commit_at":88,"license":89,"difficulty_score":10,"env_os":90,"env_gpu":91,"env_ram":90,"env_deps":92,"category_tags":99,"github_topics":100,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":105,"updated_at":106,"faqs":107,"releases":138},6292,"1038lab\u002FComfyUI-QwenVL","ComfyUI-QwenVL","ComfyUI-QwenVL custom node: Integrates the Qwen-VL series, including Qwen2.5-VL and the latest Qwen3-VL, with GGUF support for advanced multimodal AI in text generation, image understanding, and video analysis.","ComfyUI-QwenVL 是一款专为 ComfyUI 设计的自定义节点，旨在将阿里云强大的 Qwen-VL 系列视觉语言模型（包括最新的 Qwen3-VL 和 Qwen2.5-VL）无缝集成到您的工作流中。它主要解决了在本地可视化界面中高效部署多模态 AI 的难题，让用户能够轻松实现图像理解、视频帧序列分析以及高质量文本生成，无需编写复杂代码。\n\n这款工具非常适合希望拓展 ComfyUI 功能的设计师、AI 
爱好者以及需要快速验证多模态应用的研究人员。无论是构建智能图文助手还是进行视频内容分析，它都能提供灵活的支持。其技术亮点在于广泛的兼容性与性能优化：不仅支持标准的 Hugging Face 模型，还引入了 GGUF 后端以大幅降低显存占用；具备智能量化功能（4-bit\u002F8-bit\u002FFP16），可根据硬件自动调整；最新版本更加入了 SageAttention 加速技术和针对特定 GPU 架构的内核优化，显著提升了推理速度与稳定性。此外，它还提供了从简易到高级的多种节点模式及预设提示词系统，兼顾了新手上手的便捷性与专家用户对细节的掌控需求。","# **QwenVL for ComfyUI**\n\nThe ComfyUI-QwenVL custom node integrates the powerful Qwen-VL series of vision-language models (LVLMs) from Alibaba Cloud, including the latest Qwen3-VL and Qwen2.5-VL, plus GGUF backends and text-only Qwen3 support. This advanced node enables seamless multimodal AI capabilities within your ComfyUI workflows, allowing for efficient text generation, image understanding, and video analysis.\n\n![QwenVL_V1.1.0](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F1038lab_ComfyUI-QwenVL_readme_d7baa5edfaa3.png)\n\n## **📰 News & Updates**\n* **2026\u002F02\u002F08**: **v2.1.1**  Fixed compatibility for  Transformers 4.x and 5.x [[Update](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fupdate.md#version-211-20260208)]\n\n* **2026\u002F02\u002F05**: **v2.1.0** Added SageAttention support with per-GPU architecture optimization, improved FP8 model handling, and automatic attention mode selection. [[Update](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fupdate.md#version-210-20260205)]\n  * **SageAttention Support**: New attention mode with per-GPU optimized kernels (SM80, SM89, SM90, SM120)\n  * **Improved FP8 Handling**: Better support for pre-quantized FP8 models with automatic SDPA fallback\n  * **Smart Attention Selection**: Auto mode now tries Sage → Flash → SDPA for optimal performance\n  * **Progress Bar**: Added ComfyUI progress bar for model loading and generation stages\n  * **Better Memory Management**: Improved cache clearing when changing attention modes or quantization\n* **2025\u002F12\u002F22**: **v2.0.0** Added GGUF supported nodes and Prompt Enhancer nodes. 
[[Update](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fupdate.md#version-200-20251222)]\n> [!IMPORTANT]  \n> Install llama-cpp-python before running GGUF nodes [instruction](docs\u002FLLAMA_CPP_PYTHON_VISION_INSTALL.md)\n> \n![600346260_122188475918461193_3763807942053883496_n](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F1038lab_ComfyUI-QwenVL_readme_f385399d2244.png)\n* **2025\u002F11\u002F10**: **v1.1.0** Runtime overhaul with attention-mode selector, flash-attn auto detection, smarter caching, and quantization\u002Ftorch.compile controls in both nodes. [[Update](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fupdate.md#version-110-20251110)]\n* **2025\u002F10\u002F31**: **v1.0.4** Custom Models Supported [[Update](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fupdate.md#version-104-20251031)]\n* **2025\u002F10\u002F22**: **v1.0.3** Models list updated [[Update](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fupdate.md#version-103-20251022)]\n* **2025\u002F10\u002F17**: **v1.0.0** Initial Release  \n  * Support for Qwen3-VL and Qwen2.5-VL series models.  \n  * Automatic model downloading from Hugging Face.  \n  * On-the-fly quantization (4-bit, 8-bit, FP16).  \n  * Preset and Custom Prompt system for flexible and easy use.  \n  * **Includes both a standard and an advanced node** for users of all levels.  \n  * Hardware-aware safeguards for FP8 model compatibility.  \n  * Image and Video (frame sequence) input support.  \n  * \"Keep Model Loaded\" option for improved performance on sequential runs.  
\n  * **Seed parameter** for reproducible generation.\n\n[![QwenVL_V1.0.0r](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F1038lab_ComfyUI-QwenVL_readme_5bb47e60f14b.jpg)](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fexample_workflows\u002FQWenVL.json)\n\n## **✨ Features**\n\n* **Standard & Advanced Nodes**: Includes a simple QwenVL node for quick use and a QwenVL (Advanced) node with fine-grained control over generation.  \n* **Prompt Enhancers**: Dedicated text-only prompt enhancers for both HF and GGUF backends.  \n* **Preset & Custom Prompts**: Choose from a list of convenient preset prompts or write your own for full control.  \n* **Multi-Model Support**: Easily switch between various official Qwen-VL models.  \n* **Automatic Model Download**: Models are downloaded automatically on first use.  \n* **Smart Quantization**: Balance VRAM and performance with 4-bit, 8-bit, and FP16 options.  \n* **Hardware-Aware**: Automatically detects GPU capabilities and prevents errors with incompatible models (e.g., FP8).  \n* **Reproducible Generation**: Use the seed parameter to get consistent outputs.  \n* **Memory Management**: \"Keep Model Loaded\" option to retain the model in VRAM for faster processing.  \n* **Image & Video Support**: Accepts both single images and video frame sequences as input.  \n* **Robust Error Handling**: Provides clear error messages for hardware or memory issues.  \n* **Clean Console Output**: Minimal and informative console logs during operation.\n* **SageAttention Support**: GPU-optimized attention mechanism with per-architecture kernels (Ampere, Ada, Hopper, Blackwell).\n* **Progress Bar**: Visual feedback during model loading and generation stages.\n* **Intelligent Cache Management**: Automatically clears VRAM when changing attention modes or quantization settings.\n\n## **🚀 Installation**\n\n1. 
Clone this repository to your ComfyUI\u002Fcustom\\_nodes directory:  \n   ```\n   cd ComfyUI\u002Fcustom\\_nodes  \n   git clone https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL.git\n   ```\n2. Install the required dependencies:  \n   ```\n   cd ComfyUI\u002Fcustom\\_nodes\u002FComfyUI-QwenVL  \n   pip install \\-r requirements.txt\n   ```\n\n3. Restart ComfyUI.\n\n### **Optional: SageAttention Support**\nFor optimal performance on supported GPUs, install SageAttention:\n```\npip install sageattention\n```\n\n## **🧭 Node Overview**\n\n### **Transformers (HF) Nodes**\n- **QwenVL**: Quick vision-language inference (image\u002Fvideo + preset\u002Fcustom prompts).  \n- **QwenVL (Advanced)**: Full control over sampling, device, and performance settings.  \n- **QwenVL Prompt Enhancer**: Text-only prompt enhancement (supports both Qwen3 text models and QwenVL models in text mode).  \n\n### **GGUF (llama.cpp) Nodes**\n- **QwenVL (GGUF)**: GGUF vision-language inference.  \n- **QwenVL (GGUF Advanced)**: Extended GGUF controls (context, GPU layers, etc.).  \n- **QwenVL Prompt Enhancer (GGUF)**: GGUF text-only prompt enhancement.  \n\n## **🧩 GGUF Nodes (llama.cpp backend)**\n\nThis repo includes **GGUF** nodes powered by `llama-cpp-python` (separate from the Transformers-based nodes).\n\n- **Nodes**: `QwenVL (GGUF)`, `QwenVL (GGUF Advanced)`, `QwenVL Prompt Enhancer (GGUF)`\n- **Model folder** (default): `ComfyUI\u002Fmodels\u002Fllm\u002FGGUF\u002F` (configurable via `gguf_models.json`)\n- **Vision requirement**: install a vision-capable `llama-cpp-python` wheel that provides `Qwen3VLChatHandler` \u002F `Qwen25VLChatHandler`  \n  See [docs\u002FLLAMA_CPP_PYTHON_VISION_INSTALL.md](docs\u002FLLAMA_CPP_PYTHON_VISION_INSTALL.md)\n\n## **🗂️ Config Files**\n\n- **HF models**: `hf_models.json`  \n  - `hf_vl_models`: vision-language models (used by QwenVL nodes).  \n  - `hf_text_models`: text-only models (used by Prompt Enhancer).  
\n- **GGUF models**: `gguf_models.json`  \n- **System prompts**: `AILab_System_Prompts.json` (includes both VL prompts and prompt-enhancer styles).  \n\n## **📥 Download Models**\n\nThe models will be automatically downloaded on first use. If you prefer to download them manually, place them in the ComfyUI\u002Fmodels\u002FLLM\u002FQwen-VL\u002F directory.\n\n### **HF Vision Models (Qwen-VL)**\n| Model | Link |\n| :---- | :---- |\n| Qwen3-VL-2B-Instruct | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-2B-Instruct) |\n| Qwen3-VL-2B-Thinking | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-2B-Thinking) |\n| Qwen3-VL-2B-Instruct-FP8 | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-2B-Instruct-FP8) |\n| Qwen3-VL-2B-Thinking-FP8 | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-2B-Thinking-FP8) |\n| Qwen3-VL-4B-Instruct | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-4B-Instruct) |\n| Qwen3-VL-4B-Thinking | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-4B-Thinking) |\n| Qwen3-VL-4B-Instruct-FP8 | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-4B-Instruct-FP8) |\n| Qwen3-VL-4B-Thinking-FP8 | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-4B-Thinking-FP8) |\n| Qwen3-VL-8B-Instruct | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Instruct) |\n| Qwen3-VL-8B-Thinking | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Thinking) |\n| Qwen3-VL-8B-Instruct-FP8 | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Instruct-FP8) |\n| Qwen3-VL-8B-Thinking-FP8 | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Thinking-FP8) |\n| Qwen3-VL-32B-Instruct | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-32B-Instruct) |\n| Qwen3-VL-32B-Thinking | 
[Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-32B-Thinking) |\n| Qwen3-VL-32B-Instruct-FP8 | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-32B-Instruct-FP8) |\n| Qwen3-VL-32B-Thinking-FP8 | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-32B-Thinking-FP8) |\n| Qwen2.5-VL-3B-Instruct | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen2.5-VL-3B-Instruct) |\n| Qwen2.5-VL-7B-Instruct | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen2.5-VL-7B-Instruct) |\n\n### **HF Text Models (Qwen3)**\n| Model | Link |\n| :---- | :---- |\n| Qwen3-0.6B | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-0.6B) |\n| Qwen3-4B-Instruct-2507 | [Download](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-4B-Instruct-2507) |\n| qwen3-4b-Z-Image-Engineer | [Download](https:\u002F\u002Fhuggingface.co\u002FBennyDaBall\u002Fqwen3-4b-Z-Image-Engineer) |\n\n### **GGUF Models (Manual Download)**\n| Group | Model | Repo | Alt Repo | Model Files | MMProj |\n| :-- | :-- | :-- | :-- | :-- | :-- |\n| Qwen text (GGUF) | Qwen3-4B-GGUF | [Qwen\u002FQwen3-4B-GGUF](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-4B-GGUF) |  | Qwen3-4B-Q4_K_M.gguf, Qwen3-4B-Q5_0.gguf, Qwen3-4B-Q5_K_M.gguf, Qwen3-4B-Q6_K.gguf, Qwen3-4B-Q8_0.gguf |  |\n| Qwen-VL (GGUF) | Qwen3-VL-4B-Instruct-GGUF | [Qwen\u002FQwen3-VL-4B-Instruct-GGUF](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-4B-Instruct-GGUF) |  | Qwen3VL-4B-Instruct-F16.gguf, Qwen3VL-4B-Instruct-Q4_K_M.gguf, Qwen3VL-4B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-4B-Instruct-F16.gguf |\n| Qwen-VL (GGUF) | Qwen3-VL-8B-Instruct-GGUF | [Qwen\u002FQwen3-VL-8B-Instruct-GGUF](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Instruct-GGUF) |  | Qwen3VL-8B-Instruct-F16.gguf, Qwen3VL-8B-Instruct-Q4_K_M.gguf, Qwen3VL-8B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-8B-Instruct-F16.gguf |\n| Qwen-VL (GGUF) | Qwen3-VL-4B-Thinking-GGUF | 
[Qwen\u002FQwen3-VL-4B-Thinking-GGUF](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-4B-Thinking-GGUF) |  | Qwen3VL-4B-Thinking-F16.gguf, Qwen3VL-4B-Thinking-Q4_K_M.gguf, Qwen3VL-4B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-4B-Thinking-F16.gguf |\n| Qwen-VL (GGUF) | Qwen3-VL-8B-Thinking-GGUF | [Qwen\u002FQwen3-VL-8B-Thinking-GGUF](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Thinking-GGUF) |  | Qwen3VL-8B-Thinking-F16.gguf, Qwen3VL-8B-Thinking-Q4_K_M.gguf, Qwen3VL-8B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-8B-Thinking-F16.gguf |\n\n## **📖 Usage**\n\n### **Basic Usage**\n\n1. Add the **\"QwenVL\"** node from the 🧪AILab\u002FQwenVL category.  \n2. Select the **model\\_name** you wish to use.  \n3. Connect an image or video (image sequence) source to the node.  \n4. Write your prompt using the preset or custom field.  \n5. Run the workflow.\n\n### **Advanced Usage**\n\nFor more control, use the **\"QwenVL (Advanced)\"** node. This gives you access to detailed generation parameters like temperature, top\\_p, beam search, and device selection.\n\n## **⚙️ Parameters**\n\n| Parameter | Description | Default | Range | Node(s) |\n| :---- | :---- | :---- | :---- | :---- |\n| **model\\_name** | The Qwen-VL model to use. | Qwen3-VL-4B-Instruct | \\- | Standard & Advanced |\n| **quantization** | On-the-fly quantization. Ignored for pre-quantized models (e.g., FP8). | 8-bit (Balanced) | 4-bit, 8-bit, None | Standard & Advanced |\n| **attention\\_mode** | Attention mechanism: auto (Sage→Flash→SDPA), sage, flash\\_attention\\_2, sdpa | auto | auto, sage, flash\\_attention\\_2, sdpa | Standard & Advanced |\n| **preset\\_prompt** | A selection of pre-defined prompts for common tasks. | \"Describe this...\" | Any text | Standard & Advanced |\n| **custom\\_prompt** | Overrides the preset prompt if provided. |  | Any text | Standard & Advanced |\n| **max\\_tokens** | Maximum number of new tokens to generate. 
| 1024 | 64-2048 | Standard & Advanced |\n| **keep\\_model\\_loaded** | Keep the model in VRAM for faster subsequent runs. | True | True\u002FFalse | Standard & Advanced |\n| **seed** | A seed for reproducible results. | 1 | 1 \\- 2^64-1 | Standard & Advanced |\n| **temperature** | Controls randomness. Higher values \\= more creative. (Used when num\\_beams is 1). | 0.6 | 0.1-1.0 | Advanced Only |\n| **top\\_p** | Nucleus sampling threshold. (Used when num\\_beams is 1). | 0.9 | 0.0-1.0 | Advanced Only |\n| **num\\_beams** | Number of beams for beam search. \\> 1 disables temperature\u002Ftop\\_p sampling. | 1 | 1-10 | Advanced Only |\n| **repetition\\_penalty** | Discourages repeating tokens. | 1.2 | 0.0-2.0 | Advanced Only |\n| **frame\\_count** | Number of frames to sample from the video input. | 16 | 1-64 | Advanced Only |\n| **device** | Override automatic device selection. | auto | auto, cuda, cpu | Advanced Only |\n| **use\\_torch\\_compile** | Enable torch.compile optimization for faster inference. 
| False | True\u002FFalse | Advanced Only |\n\n### **💡 Quantization Options**\n\n| Mode | Precision | Memory Usage | Speed | Quality | Recommended For |\n| :---- | :---- | :---- | :---- | :---- | :---- |\n| None (FP16) | 16-bit Float | High | Fastest | Best | High VRAM GPUs (16GB+) |\n| 8-bit (Balanced) | 8-bit Integer | Medium | Fast | Very Good | Balanced performance (8GB+) |\n| 4-bit (VRAM-friendly) | 4-bit Integer | Low | Slower\\* | Good | Low VRAM GPUs (\u003C8GB) |\n\n\\* **Note on 4-bit Speed**: 4-bit quantization significantly reduces VRAM usage but may result in slower performance on some systems due to the computational overhead of real-time dequantization.\n\n### **🎯 Attention Mode Guide**\n\n| Mode | Description | Best For |\n| :---- | :---- | :---- |\n| **auto** | Automatically selects best available: Sage → Flash → SDPA | Most users (recommended) |\n| **sage** | SageAttention with GPU-optimized kernels | Speed on modern GPUs (RTX 40 series, Hopper, Blackwell) |\n| **flash\\_attention\\_2** | Flash Attention 2 | Speed when Sage unavailable |\n| **sdpa** | PyTorch SDPA (default) | Compatibility, FP8\u002FBitsAndBytes models |\n\n**Note**: FP8 models and BitsAndBytes quantization automatically use SDPA regardless of selection.\n\n### **🤔 Setting Tips**\n\n| Setting | Recommendation |\n| :---- | :---- |\n| **Model Choice** | For most users, Qwen3-VL-4B-Instruct is a great starting point. If you have a 40-series GPU, try the \\-FP8 version for better performance. |\n| **Memory Mode** | Keep keep\\_model\\_loaded enabled (True) for the best performance if you plan to run the node multiple times. Disable it only if you are running out of VRAM for other nodes. |\n| **Quantization** | Start with the default 8-bit. If you have plenty of VRAM (>16GB), switch to None (FP16) for the best speed and quality. If you are low on VRAM, use 4-bit. |\n| **Attention Mode** | Use \"auto\" for best performance. SageAttention provides fastest inference on supported GPUs. 
|\n| **Performance** | The first time a model is loaded with a specific quantization, it may be slow. Subsequent runs (with keep\\_model\\_loaded enabled) will be much faster. |\n\n## **🧠 About Model**\n\nThis node utilizes the Qwen-VL series of models, developed by the Qwen Team at Alibaba Cloud. These are powerful, open-source large vision-language models (LVLMs) designed to understand and process both visual and textual information, making them ideal for tasks like detailed image and video description.\n\n## **🗺️ Roadmap**\n\n### **✅ Completed (v2.1.0)**\n\n* ✅ SageAttention support with per-GPU architecture optimization\n* ✅ Improved FP8 model handling with automatic SDPA fallback\n* ✅ Smart attention selection (auto: Sage → Flash → SDPA)\n* ✅ Progress bar for model loading and generation\n* ✅ Better memory management and cache clearing\n\n### **✅ Completed (v2.0.0)**\n\n* ✅ GGUF model support via llama.cpp backend\n* ✅ Prompt Enhancer nodes for text-only optimization\n\n### **✅ Completed (v1.0.0)**\n\n* ✅ Support for Qwen3-VL and Qwen2.5-VL models.  \n* ✅ Automatic model downloading and management.  \n* ✅ On-the-fly 4-bit, 8-bit, and FP16 quantization.  \n* ✅ Hardware compatibility checks for FP8 models.  \n* ✅ Image and Video (frame sequence) input support.\n\n\n## **🙏 Credits**\n\n* **Qwen Team**: [Alibaba Cloud](https:\u002F\u002Fgithub.com\u002FQwenLM) \\- For developing and open-sourcing the powerful Qwen-VL models.  \n* **ComfyUI**: [comfyanonymous](https:\u002F\u002Fgithub.com\u002Fcomfyanonymous\u002FComfyUI) \\- For the incredible and extensible ComfyUI platform.  \n* **llama-cpp-python**: [JamePeng\u002Fllama-cpp-python](https:\u002F\u002Fgithub.com\u002FJamePeng\u002Fllama-cpp-python) \\- GGUF backend with vision support used by the GGUF nodes.  
\n* **SageAttention**: [SageAttention](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FSageAttention) \\- Efficient attention implementation with GPU-optimized kernels.\n* **ComfyUI Integration**: [1038lab](https:\u002F\u002Fgithub.com\u002F1038lab) \\- Developer of this custom node.\n\n## **📜 License**\n\nThis repository's code is released under the [GPL-3.0 License](LICENSE).\n","# **QwenVL for ComfyUI**\n\nComfyUI-QwenVL 自定义节点集成了阿里云强大的 Qwen-VL 系列视觉语言模型（LVLM），包括最新的 Qwen3-VL 和 Qwen2.5-VL，同时还支持 GGUF 后端以及纯文本的 Qwen3 模型。这一先进的节点能够在您的 ComfyUI 工作流中实现无缝的多模态 AI 功能，从而高效地进行文本生成、图像理解与视频分析。\n\n![QwenVL_V1.1.0](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F1038lab_ComfyUI-QwenVL_readme_d7baa5edfaa3.png)\n\n## **📰 新闻与更新**\n* **2026年2月8日**：**v2.1.1** 修复了与 Transformers 4.x 和 5.x 的兼容性问题 [[更新](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fupdate.md#version-211-20260208)]\n\n* **2026年2月5日**：**v2.1.0** 增加了 SageAttention 支持，并针对每种 GPU 架构进行了优化；改进了 FP8 模型的处理方式，实现了注意力机制模式的自动选择。[[更新](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fupdate.md#version-210-20260205)]\n  * **SageAttention 支持**：全新的注意力机制模式，配备针对不同 GPU 架构优化的内核（SM80、SM89、SM90、SM120）\n  * **FP8 处理改进**：更好地支持预量化 FP8 模型，并可自动回退到 SDPA\n  * **智能注意力选择**：自动模式会依次尝试 Sage → Flash → SDPA，以获得最佳性能\n  * **进度条**：为模型加载和生成阶段添加了 ComfyUI 进度条\n  * **更优的内存管理**：在切换注意力模式或量化设置时，改进了缓存清理机制\n* **2025年12月22日**：**v2.0.0** 新增了 GGUF 支持节点和提示增强节点。[[更新](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fupdate.md#version-200-20251222)]\n> [!重要]  \n> 在运行 GGUF 节点之前，请先安装 llama-cpp-python [安装说明](docs\u002FLLAMA_CPP_PYTHON_VISION_INSTALL.md)\n> \n![600346260_122188475918461193_3763807942053883496_n](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F1038lab_ComfyUI-QwenVL_readme_f385399d2244.png)\n* **2025年11月10日**：**v1.1.0** 对运行时进行了全面重构，加入了注意力模式选择器、flash-attn 自动检测功能、更智能的缓存管理以及在两个节点中均可使用的量化和 torch.compile 
控制选项。[[更新](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fupdate.md#version-110-20251110)]\n* **2025年10月31日**：**v1.0.4** 支持自定义模型 [[更新](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fupdate.md#version-104-20251031)]\n* **2025年10月22日**：**v1.0.3** 更新了模型列表 [[更新](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fupdate.md#version-103-20251022)]\n* **2025年10月17日**：**v1.0.0** 初始发布  \n  * 支持 Qwen3-VL 和 Qwen2.5-VL 系列模型。  \n  * 可从 Hugging Face 自动下载模型。  \n  * 支持即时量化（4-bit、8-bit、FP16）。  \n  * 提供预设和自定义提示系统，使用灵活便捷。  \n  * **包含标准节点和高级节点**，适合各水平用户。  \n  * 针对硬件特性提供保护措施，确保 FP8 模型的兼容性。  \n  * 支持图像和视频（帧序列）输入。  \n  * 提供“保持模型加载”选项，以提升连续运行时的性能。  \n  * 提供 **种子参数**，便于生成结果的重复性。\n\n[![QwenVL_V1.0.0r](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F1038lab_ComfyUI-QwenVL_readme_5bb47e60f14b.jpg)](https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fblob\u002Fmain\u002Fexample_workflows\u002FQWenVL.json)\n\n## **✨ 功能特性**\n\n* **标准与高级节点**：包含一个用于快速使用的简单 QwenVL 节点，以及一个具备精细生成控制能力的 QwenVL（高级）节点。  \n* **提示增强器**：专为 HF 和 GGUF 后端设计的纯文本提示增强器。  \n* **预设与自定义提示**：您可以从便捷的预设提示列表中选择，也可以自行编写提示，以实现完全控制。  \n* **多模型支持**：轻松切换不同的官方 Qwen-VL 模型。  \n* **自动模型下载**：首次使用时会自动下载所需模型。  \n* **智能量化**：通过 4-bit、8-bit 和 FP16 选项，在显存占用与性能之间取得平衡。  \n* **硬件感知**：自动检测 GPU 性能，并防止使用不兼容模型时出现错误（例如 FP8）。  \n* **可重复生成**：使用种子参数可获得一致的输出。  \n* **内存管理**：提供“保持模型加载”选项，将模型常驻显存以加快处理速度。  \n* **图像与视频支持**：既可接受单张图像输入，也可接受视频帧序列作为输入。  \n* **健壮的错误处理**：针对硬件或内存问题提供清晰的错误信息。  \n* **简洁的控制台输出**：运行过程中仅显示最少且富有信息量的日志。  \n* **SageAttention 支持**：基于 GPU 优化的注意力机制，配备针对不同架构的专用内核（Ampere、Ada、Hopper、Blackwell）。  \n* **进度条**：在模型加载和生成阶段提供可视化反馈。  \n* **智能缓存管理**：在切换注意力模式或量化设置时自动释放显存。\n\n## **🚀 安装步骤**\n\n1. 将本仓库克隆到您的 ComfyUI\u002Fcustom_nodes 目录下：  \n   ```\n   cd ComfyUI\u002Fcustom_nodes  \n   git clone https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL.git\n   ```\n2. 
安装所需的依赖项：  \n   ```\n   cd ComfyUI\u002Fcustom_nodes\u002FComfyUI-QwenVL  \n   pip install -r requirements.txt\n   ```\n\n3. 重启 ComfyUI。\n\n### **可选：SageAttention 支持**\n为了在支持的 GPU 上获得最佳性能，您需要安装 SageAttention：\n```\npip install sageattention\n```\n\n## **🧭 节点概览**\n\n### **Transformers（HF）节点**\n- **QwenVL**：快速的视觉语言推理（图像\u002F视频 + 预设\u002F自定义提示）。  \n- **QwenVL（高级）**：可全面控制采样、设备及性能设置。  \n- **QwenVL 提示增强器**：纯文本提示增强（同时支持 Qwen3 文本模型和 QwenVL 模型的文本模式）。  \n\n### **GGUF（llama.cpp）节点**\n- **QwenVL（GGUF）**：基于 GGUF 的视觉语言推理。  \n- **QwenVL（GGUF 高级）**：扩展的 GGUF 控制选项（上下文长度、GPU 层数等）。  \n- **QwenVL 提示增强器（GGUF）**：GGUF 版本的纯文本提示增强。\n\n## **🧩 GGUF 节点（llama.cpp 后端）**\n\n本仓库包含由 `llama-cpp-python` 提供支持的 **GGUF** 节点（与基于 Transformers 的节点分开）。\n\n- **节点**：`QwenVL（GGUF）`、`QwenVL（GGUF 高级）`、`QwenVL 提示增强器（GGUF）`\n- **模型文件夹**（默认路径）：`ComfyUI\u002Fmodels\u002Fllm\u002FGGUF\u002F`（可通过 `gguf_models.json` 进行配置）\n- **视觉要求**：需安装具备视觉功能的 `llama-cpp-python` 轮子，该轮子应提供 `Qwen3VLChatHandler` 或 `Qwen25VLChatHandler`  \n  请参阅 [docs\u002FLLAMA_CPP_PYTHON_VISION_INSTALL.md](docs\u002FLLAMA_CPP_PYTHON_VISION_INSTALL.md)\n\n## **🗂️ 配置文件**\n\n- **HF 模型**：`hf_models.json`  \n  - `hf_vl_models`：视觉语言模型（供 QwenVL 节点使用）。  \n  - `hf_text_models`：纯文本模型（供提示增强器使用）。  \n- **GGUF 模型**：`gguf_models.json`  \n- **系统提示**：`AILab_System_Prompts.json`（包含 VL 提示及提示增强风格）。\n\n## **📥 下载模型**\n\n首次使用时，模型将自动下载。如果您希望手动下载，请将其放置在 ComfyUI\u002Fmodels\u002FLLM\u002FQwen-VL\u002F 目录下。\n\n### **HF 视觉模型（Qwen-VL）**\n| 模型 | 链接 |\n| :---- | :---- |\n| Qwen3-VL-2B-Instruct | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-2B-Instruct) |\n| Qwen3-VL-2B-Thinking | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-2B-Thinking) |\n| Qwen3-VL-2B-Instruct-FP8 | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-2B-Instruct-FP8) |\n| Qwen3-VL-2B-Thinking-FP8 | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-2B-Thinking-FP8) |\n| Qwen3-VL-4B-Instruct | 
[下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-4B-Instruct) |\n| Qwen3-VL-4B-Thinking | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-4B-Thinking) |\n| Qwen3-VL-4B-Instruct-FP8 | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-4B-Instruct-FP8) |\n| Qwen3-VL-4B-Thinking-FP8 | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-4B-Thinking-FP8) |\n| Qwen3-VL-8B-Instruct | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Instruct) |\n| Qwen3-VL-8B-Thinking | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Thinking) |\n| Qwen3-VL-8B-Instruct-FP8 | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Instruct-FP8) |\n| Qwen3-VL-8B-Thinking-FP8 | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Thinking-FP8) |\n| Qwen3-VL-32B-Instruct | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-32B-Instruct) |\n| Qwen3-VL-32B-Thinking | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-32B-Thinking) |\n| Qwen3-VL-32B-Instruct-FP8 | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-32B-Instruct-FP8) |\n| Qwen3-VL-32B-Thinking-FP8 | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-32B-Thinking-FP8) |\n| Qwen2.5-VL-3B-Instruct | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen2.5-VL-3B-Instruct) |\n| Qwen2.5-VL-7B-Instruct | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen2.5-VL-7B-Instruct) |\n\n### **HF 文本模型（Qwen3）**\n| 模型 | 链接 |\n| :---- | :---- |\n| Qwen3-0.6B | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-0.6B) |\n| Qwen3-4B-Instruct-2507 | [下载](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-4B-Instruct-2507) |\n| qwen3-4b-Z-Image-Engineer | [下载](https:\u002F\u002Fhuggingface.co\u002FBennyDaBall\u002Fqwen3-4b-Z-Image-Engineer) |\n\n### **GGUF 模型（手动下载）**\n| 组别 | 模型 | 仓库 | 替代仓库 | 模型文件 | MMProj |\n| :-- | :-- | :-- | :-- | :-- | :-- |\n| Qwen 文本（GGUF） | Qwen3-4B-GGUF | 
[Qwen\u002FQwen3-4B-GGUF](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-4B-GGUF) |  | Qwen3-4B-Q4_K_M.gguf, Qwen3-4B-Q5_0.gguf, Qwen3-4B-Q5_K_M.gguf, Qwen3-4B-Q6_K.gguf, Qwen3-4B-Q8_0.gguf |  |\n| Qwen-VL（GGUF） | Qwen3-VL-4B-Instruct-GGUF | [Qwen\u002FQwen3-VL-4B-Instruct-GGUF](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-4B-Instruct-GGUF) |  | Qwen3VL-4B-Instruct-F16.gguf, Qwen3VL-4B-Instruct-Q4_K_M.gguf, Qwen3VL-4B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-4B-Instruct-F16.gguf |\n| Qwen-VL（GGUF） | Qwen3-VL-8B-Instruct-GGUF | [Qwen\u002FQwen3-VL-8B-Instruct-GGUF](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Instruct-GGUF) |  | Qwen3VL-8B-Instruct-F16.gguf, Qwen3VL-8B-Instruct-Q4_K_M.gguf, Qwen3VL-8B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-8B-Instruct-F16.gguf |\n| Qwen-VL（GGUF） | Qwen3-VL-4B-Thinking-GGUF | [Qwen\u002FQwen3-VL-4B-Thinking-GGUF](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-4B-Thinking-GGUF) |  | Qwen3VL-4B-Thinking-F16.gguf, Qwen3VL-4B-Thinking-Q4_K_M.gguf, Qwen3VL-4B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-4B-Thinking-F16.gguf |\n| Qwen-VL（GGUF） | Qwen3-VL-8B-Thinking-GGUF | [Qwen\u002FQwen3-VL-8B-Thinking-GGUF](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-8B-Thinking-GGUF) |  | Qwen3VL-8B-Thinking-F16.gguf, Qwen3VL-8B-Thinking-Q4_K_M.gguf, Qwen3VL-8B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-8B-Thinking-F16.gguf |\n\n## **📖 使用方法**\n\n### **基本用法**\n\n1. 从 🧪AILab\u002FQwenVL 类别中添加 **“QwenVL”** 节点。  \n2. 选择您想要使用的 **model_name**。  \n3. 将图像或视频（图像序列）源连接到节点。  \n4. 使用预设或自定义字段编写您的提示。  \n5. 
运行工作流。\n\n### **高级用法**\n\n为了获得更多的控制，可以使用 **“QwenVL (Advanced)”** 节点。这使您可以访问详细的生成参数，如温度、top_p、束搜索和设备选择。\n\n## **⚙️ 参数**\n\n| 参数 | 描述 | 默认值 | 范围 | 节点 |\n| :---- | :---- | :---- | :---- | :---- |\n| **model_name** | 要使用的 Qwen-VL 模型。 | Qwen3-VL-4B-Instruct | \\- | 标准版与高级版 |\n| **quantization** | 即时量化。对于预先量化的模型（例如 FP8）则忽略此选项。 | 8 位（平衡） | 4 位、8 位、无 | 标准版与高级版 |\n| **attention_mode** | 注意力机制：自动（Sage→Flash→SDPA）、sage、flash_attention_2、sdpa | 自动 | 自动、sage、flash_attention_2、sdpa | 标准版与高级版 |\n| **preset_prompt** | 常见任务的预定义提示选择。 | “描述一下这个……” | 任意文本 | 标准版与高级版 |\n| **custom_prompt** | 如果提供，则会覆盖预设提示。 |  | 任意文本 | 标准版与高级版 |\n| **max_tokens** | 最大生成新标记数。 | 1024 | 64-2048 | 标准版与高级版 |\n| **keep_model_loaded** | 将模型保留在 VRAM 中，以便后续运行更快。 | 真 | 真\u002F假 | 标准版与高级版 |\n| **seed** | 用于可重复结果的种子。 | 1 | 1 - 2^64-1 | 标准版与高级版 |\n| **temperature** | 控制随机性。数值越高，越具创造性。（当 num_beams 为 1 时使用）。 | 0.6 | 0.1-1.0 | 仅高级版 |\n| **top_p** | 核采样阈值。（当 num_beams 为 1 时使用）。 | 0.9 | 0.0-1.0 | 仅高级版 |\n| **num_beams** | 束搜索的束数。大于 1 会禁用温度\u002Ftop_p 采样。 | 1 | 1-10 | 仅高级版 |\n| **repetition_penalty** | 不鼓励重复标记。 | 1.2 | 0.0-2.0 | 仅高级版 |\n| **frame_count** | 从视频输入中采样的帧数。 | 16 | 1-64 | 仅高级版 |\n| **device** | 覆盖自动设备选择。 | 自动 | 自动、cuda、cpu | 仅高级版 |\n| **use_torch_compile** | 启用 torch.compile 优化以加快推理速度。 | 假 | 真\u002F假 | 仅高级版 |\n\n### **💡 量化选项**\n\n| 模式 | 精度 | 内存占用 | 速度 | 质量 | 推荐场景 |\n| :---- | :---- | :---- | :---- | :---- | :---- |\n| 无（FP16） | 16 位浮点 | 高 | 最快 | 最佳 | 高 VRAM 显卡（16GB+） |\n| 8 位（平衡） | 8 位整数 | 中 | 快 | 非常好 | 平衡性能（8GB+） |\n| 4 位（节省 VRAM） | 4 位整数 | 低 | 较慢\\* | 良好 | 低 VRAM 显卡（\u003C8GB） |\n\n\\* **关于 4 位速度的说明**：4 位量化显著减少了 VRAM 的使用，但由于实时反量化带来的计算开销，在某些系统上可能会导致性能下降。\n\n### **🎯 注意力模式指南**\n\n| 模式 | 描述 | 适用场景 |\n| :---- | :---- | :---- |\n| **auto** | 自动选择最佳可用模式：Sage → Flash → SDPA | 大多数用户（推荐） |\n| **sage** | 基于 GPU 优化内核的 SageAttention | 在现代 GPU（RTX 40 系列、Hopper、Blackwell）上速度更快 |\n| **flash\\_attention\\_2** | Flash Attention 2 | 当 Sage 不可用时提供速度优势 |\n| **sdpa** | PyTorch SDPA（默认） | 兼容性好，适用于 FP8 和 BitsAndBytes 模型 
|\n\n**注意**：无论选择哪种模式，FP8 模型和 BitsAndBytes 量化都会自动使用 SDPA。\n\n### **🤔 设置建议**\n\n| 设置 | 建议 |\n| :---- | :---- |\n| **模型选择** | 对于大多数用户来说，Qwen3-VL-4B-Instruct 是一个很好的起点。如果你有 40 系列 GPU，可以尝试 \\-FP8 版本以获得更好的性能。 |\n| **内存模式** | 如果计划多次运行该节点，建议保持 keep\\_model\\_loaded 开启（True），以获得最佳性能。仅在其他节点内存不足时才关闭它。 |\n| **量化** | 首先使用默认的 8 位量化。如果显存充足（>16GB），可切换到无量化（FP16），以获得最快的速度和最佳质量。若显存紧张，则使用 4 位量化。 |\n| **注意力模式** | 使用“auto”模式以获得最佳性能。在支持的 GPU 上，SageAttention 能提供最快的推理速度。 |\n| **性能** | 第一次加载特定量化级别的模型时，可能会比较慢。但后续运行（保持 keep\\_model\\_loaded 开启）会快得多。 |\n\n## **🧠 关于模型**\n\n该节点使用由阿里云通义实验室团队开发的 Qwen-VL 系列模型。这些是功能强大的开源大型视觉语言模型（LVLM），旨在理解和处理视觉与文本信息，非常适合用于详细描述图像和视频等任务。\n\n## **🗺️ 路线图**\n\n### **✅ 已完成（v2.1.0）**\n\n* ✅ 支持针对各 GPU 架构分别优化的 SageAttention\n* ✅ 改进了 FP8 模型的处理，自动回退到 SDPA\n* ✅ 智能注意力选择（auto：Sage → Flash → SDPA）\n* ✅ 模型加载和生成进度条\n* ✅ 更好的内存管理和缓存清理\n\n### **✅ 已完成（v2.0.0）**\n\n* ✅ 通过 llama.cpp 后端支持 GGUF 模型\n* ✅ 文本优化增强节点\n\n### **✅ 已完成（v1.0.0）**\n\n* ✅ 支持 Qwen3-VL 和 Qwen2.5-VL 模型。\n* ✅ 自动下载和管理模型。\n* ✅ 实时进行 4 位、8 位和 FP16 量化。\n* ✅ 对 FP8 模型进行硬件兼容性检查。\n* ✅ 支持图像和视频（帧序列）输入。\n\n## **🙏 致谢**\n\n* **Qwen 团队**：[阿里云](https:\u002F\u002Fgithub.com\u002FQwenLM) —— 感谢他们开发并开源了强大的 Qwen-VL 模型。\n* **ComfyUI**：[comfyanonymous](https:\u002F\u002Fgithub.com\u002Fcomfyanonymous\u002FComfyUI) —— 感谢其强大且可扩展的 ComfyUI 平台。\n* **llama-cpp-python**：[JamePeng\u002Fllama-cpp-python](https:\u002F\u002Fgithub.com\u002FJamePeng\u002Fllama-cpp-python) —— 提供了 GGUF 节点使用的具有视觉支持的 GGUF 后端。\n* **SageAttention**：[SageAttention](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FSageAttention) —— 提供了高效的注意力实现及 GPU 优化的内核。\n* **ComfyUI 集成**：[1038lab](https:\u002F\u002Fgithub.com\u002F1038lab) —— 该自定义节点的开发者。\n\n## **📜 许可证**\n\n本仓库的代码采用 [GPL-3.0 许可证](LICENSE) 发布。","# ComfyUI-QwenVL 快速上手指南\n\nComfyUI-QwenVL 是专为 ComfyUI 设计的自定义节点，集成了阿里云强大的 Qwen-VL 系列视觉语言模型（包括最新的 Qwen3-VL、Qwen2.5-VL），支持图像理解、视频分析及文本生成。该工具提供标准与高级两种节点模式，并支持 GGUF 量化后端，能够灵活适配不同显存配置。\n\n## 环境准备\n\n在开始之前，请确保满足以下系统要求：\n\n*   **操作系统**: Windows, Linux 或 macOS\n*   **Python**: 建议 Python 3.10 或更高版本\n*   
**ComfyUI**: 已安装并可正常运行的最新稳定版 ComfyUI\n*   **GPU**: 推荐 NVIDIA GPU (支持 CUDA)，显存建议 8GB 以上（根据模型大小而定，小模型如 2B\u002F4B 可在较低显存运行）\n*   **依赖库**:\n    *   `transformers` (自动安装)\n    *   `torch` (需匹配你的 CUDA 版本)\n    *   **可选 (高性能)**: 若使用 SageAttention 加速，需额外安装 `sageattention`\n    *   **可选 (GGUF 模式)**: 若使用 GGUF 节点，需预先安装支持 Vision 的 `llama-cpp-python`\n\n## 安装步骤\n\n### 1. 克隆仓库\n进入 ComfyUI 的 `custom_nodes` 目录并克隆本插件：\n\n```bash\ncd ComfyUI\u002Fcustom_nodes\ngit clone https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL.git\n```\n\n> **国内加速提示**：如果 GitHub 连接缓慢，可使用镜像源：\n> `git clone https:\u002F\u002Fghp.ci\u002Fhttps:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL.git`\n\n### 2. 安装依赖\n进入插件目录并安装所需 Python 包：\n\n```bash\ncd ComfyUI\u002Fcustom_nodes\u002FComfyUI-QwenVL\npip install -r requirements.txt\n```\n\n> **国内加速提示**：建议使用国内镜像源加速 pip 安装：\n> `pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n### 3. (可选) 安装高性能组件\n*   **SageAttention 加速** (推荐 NVIDIA Ampere\u002FAda\u002FHopper 架构用户):\n    ```bash\n    pip install sageattention\n    ```\n*   **GGUF 支持** (如需使用 GGUF 节点):\n    请参考项目文档 `docs\u002FLLAMA_CPP_PYTHON_VISION_INSTALL.md` 安装带有 `Qwen3VLChatHandler` 支持的 `llama-cpp-python` 版本。\n\n### 4. 重启 ComfyUI\n安装完成后，完全重启 ComfyUI 以加载新节点。\n\n## 基本使用\n\n以下是使用 **标准节点** 进行图像描述的最简工作流：\n\n1.  **添加节点**:\n    在 ComfyUI 右键菜单中，找到 `🧪AILab\u002FQwenVL` 分类，选择 **`QwenVL`** 节点添加到画布。\n\n2.  **连接输入**:\n    *   将 **图像加载器 (Load Image)** 的输出连接到 `QwenVL` 节点的 `image` 输入端。\n    *   (可选) 支持视频帧序列输入。\n\n3.  **配置参数**:\n    *   **model_name**: 选择要使用的模型（例如 `Qwen3-VL-4B-Instruct`）。首次运行时会自动从 Hugging Face 下载模型。\n    *   **preset_prompt**: 选择预设提示词（如 \"Describe this image in detail\"），或在 **custom_prompt** 中输入自定义指令。\n    *   **quantization**: 根据显存选择量化等级（默认 `8-bit`，显存紧张可选 `4-bit`）。\n\n4.  
**运行工作流**:\n    点击 \"Queue Prompt\" 运行。节点将输出模型生成的文本描述。\n\n### 进阶提示\n*   **模型存储**: 自动下载的模型默认存储在 `ComfyUI\u002Fmodels\u002FLLM\u002FQwen-VL\u002F`。如需手动下载，可从 Hugging Face 获取后放入该目录。\n*   **保持模型加载**: 启用 `keep_model_loaded` 选项可避免重复加载模型，显著提升连续生成的速度。\n*   **高级控制**: 如需调整 Temperature、Top_P 或指定 Attention 模式，请使用 **`QwenVL (Advanced)`** 节点。","一位电商运营设计师需要快速处理数百张新品服装图，既要提取详细的材质与款式描述用于上架，又要基于这些特征生成多风格的营销海报。\n\n### 没有 ComfyUI-QwenVL 时\n- **流程割裂效率低**：必须先用独立的 OCR 工具或人工手动记录图片中的文字标签和面料信息，再复制到文生图节点，无法在 ComfyUI 内部形成闭环。\n- **视频分析能力缺失**：面对动态走秀视频素材，只能逐帧截图后盲目猜测动作细节，缺乏对连续帧语义的精准理解，导致生成的提示词空洞。\n- **显存管理困难**：尝试加载大型多模态模型时，常因缺乏智能量化（如 FP8\u002F4-bit）和显存清理机制，导致本地显卡直接爆显存崩溃。\n- **工作流复用性差**：每次更换模型或调整参数都需要重新编写复杂的脚本代码，难以通过可视化节点灵活切换 Qwen2.5-VL 或 Qwen3-VL 等不同版本。\n\n### 使用 ComfyUI-QwenVL 后\n- **端到端自动化**：直接将服装图或视频帧序列输入节点，利用内置的 Qwen3-VL 模型自动输出包含“真丝质感”、“法式剪裁”等细节的结构化提示词，无缝对接下游生图节点。\n- **深度视频理解**：借助对视频帧序列的分析能力，精准捕捉模特转身、裙摆飘动等动态特征，自动生成极具画面感的动态营销文案。\n- **硬件友好运行**：开启 GGUF 后端与智能量化选项，自动匹配 SageAttention 加速内核，在消费级显卡上也能流畅运行大参数模型而不爆显存。\n- **灵活可视调控**：通过预设提示词模板和高级节点控件，无需写代码即可一键切换模型版本或微调生成策略，大幅降低多模态工作流的搭建门槛。\n\nComfyUI-QwenVL 将复杂的多模态理解能力转化为可视化的标准组件，让设计师能在单一工作流中实现从“看图理解”到“创意生成”的无缝飞跃。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F1038lab_ComfyUI-QwenVL_d7baa5ed.png","1038lab","AI Lab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002F1038lab_8aa634bc.jpg",null,"https:\u002F\u002Fgithub.com\u002F1038lab",[78,82],{"name":79,"color":80,"percentage":81},"Python","#3572A5",98.6,{"name":83,"color":84,"percentage":85},"JavaScript","#f1e05a",1.4,717,101,"2026-04-09T10:29:41","GPL-3.0","未说明","需要 NVIDIA GPU 以获得最佳性能（支持 SageAttention 优化，架构包括 SM80, SM89, SM90, SM120）；显存需求取决于模型大小及量化设置（支持 4-bit, 8-bit, FP16, FP8）；具备硬件感知保护机制以防止不兼容的 FP8 模型报错。",{"notes":93,"python":90,"dependencies":94},"该工具集成阿里云 Qwen-VL 系列（含 Qwen3-VL, Qwen2.5-VL）及纯文本 Qwen3 模型。支持 Transformers (HF) 和 GGUF (llama.cpp) 两种后端。若使用 GGUF 节点，必须预先安装支持视觉的 llama-cpp-python 版本。支持自动下载模型，也可手动放置于指定目录。提供智能注意力模式选择（Sage -> Flash -> 
SDPA）和多种量化选项以平衡显存与性能。支持图像和视频帧序列输入。",[95,96,97,98],"torch","transformers (4.x\u002F5.x)","sageattention (可选)","llama-cpp-python (GGUF 节点必需，需包含视觉处理能力)",[15,35],[101,102,103,104],"comfyui","customnodes","qwen-vl","qwen3-vl","2026-03-27T02:49:30.150509","2026-04-10T22:43:51.491694",[108,113,118,123,128,133],{"id":109,"question_zh":110,"answer_zh":111,"source_url":112},28480,"GGUF 模型处理完后显存（VRAM）未释放怎么办？","这是旧版本中存在的内存泄漏问题。维护者已在最新版本中修复了从 llama-cpp-python 继承的许多内存泄漏和管理问题。特别是 v0.3.27 版本修复了 CLIP 模型释放的问题。强烈建议升级到 v0.3.27 或更高版本，以获得更好的多轮加载\u002F卸载会话体验，无需重启 ComfyUI。","https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fissues\u002F104",{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},28481,"如何在本地路径配置自定义模型（custom_models.json）？","你需要修改 `custom_models.json` 文件（注意不要使用带 `_sample` 后缀的示例文件）。在 Windows 上，配置格式如下：\n{\n  \"hf_models\": {\n    \"你的模型名称\": {\n      \"repo_id\": \"D:\u002F你的\u002F本地\u002F模型\u002F路径\",\n      \"default\": false,\n      \"quantized\": false,\n      \"vram_requirement\": {\n        \"4bit\": 4,\n        \"8bit\": 6,\n        \"full\": 10\n      }\n    }\n  }\n}\n注意：`repo_id` 应指向包含模型文件的文件夹路径，而不是具体的 .safetensors 文件名。确保路径分隔符正确（推荐使用正斜杠 \u002F 或双反斜杠 \\\\）。","https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fissues\u002F73",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},28482,"Windows 用户为什么无法使用 Flash-Attention？","早期版本为了兼容性在 Windows 上强制禁用了 Flash-Attention。维护者已在最新代码中应用了快速修复，恢复了 Windows 用户的 Flash-Attention 支持。请拉取主分支（main）的最新代码并更新插件。如果更新后仍未启用，请提供环境详情和日志以便进一步排查。","https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fissues\u002F102",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},28483,"Qwen3-VL 是否支持 GGUF 量化模型？","是的，v2.0.0 版本已正式包含对 GGUF 量化模型的支持，并实现了 ComfyUI 额外的模型 YAML 路径配置。请更新到最新版本即可使用轻量级且优化的 GGUF 设置。对于不稳定的 VLM 切换导致的崩溃问题，建议使用子进程（subprocess）方式来运行，以防止 ComfyUI 
主程序崩溃。","https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fissues\u002F32",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},28484,"遇到 'dict' object has no attribute 'model_type' 错误如何解决？","该错误通常由依赖项版本不匹配或未正确安装引起。请尝试以下步骤：\n1. 进入 `ComfyUI-QwenVL` 插件目录。\n2. 运行命令 `pip install -r requirements.txt` 以确保所有依赖项（特别是 transformers 库）已正确安装且版本兼容。\n3. 如果使用的是 ComfyUI 夜间版（nightly），尝试切换回稳定版（如 0.3.75）测试。\n4. 确认已删除并重新下载了正确的模型文件。","https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fissues\u002F50",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},28485,"QwenVL 模型生成速度过慢是什么原因？","生成速度慢可能与硬件配置、模型量化等级或 PyTorch\u002FCUDA 版本有关。有用户反馈在 RTX 4070Ti 上使用 Q4_K_M 量化模型、Torch 2.9.1 和 cu130 时，冷启动小于 6 秒，热启动小于 3 秒，速度正常。如果遇到极慢的情况（如数百秒），请检查：\n1. 是否使用了量化版本（如 GGUF Q4_K_M）而非全精度模型。\n2. PyTorch 和 CUDA 版本是否与显卡驱动兼容。\n3. 显存是否充足，避免发生交换导致速度下降。","https:\u002F\u002Fgithub.com\u002F1038lab\u002FComfyUI-QwenVL\u002Fissues\u002F18",[]]