[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-numz--ComfyUI-SeedVR2_VideoUpscaler":3,"tool-numz--ComfyUI-SeedVR2_VideoUpscaler":62},[4,18,26,36,46,54],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",158594,2,"2026-04-16T23:34:05",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":42,"last_commit_at":43,"category_tags":44,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,45],"插件",{"id":47,"name":48,"github_repo":49,"description_zh":50,"stars":51,"difficulty_score":32,"last_commit_at":52,"category_tags":53,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 
都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":55,"name":56,"github_repo":57,"description_zh":58,"stars":59,"difficulty_score":32,"last_commit_at":60,"category_tags":61,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[45,13,15,14],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":77,"owner_website":77,"owner_url":78,"languages":79,"stars":84,"forks":85,"last_commit_at":86,"license":87,"difficulty_score":10,"env_os":88,"env_gpu":89,"env_ram":90,"env_deps":91,"category_tags":103,"github_topics":105,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":111,"updated_at":112,"faqs":113,"releases":143},8175,"numz\u002FComfyUI-SeedVR2_VideoUpscaler","ComfyUI-SeedVR2_VideoUpscaler","Official SeedVR2 Video Upscaler for ComfyUI","ComfyUI-SeedVR2_VideoUpscaler 是字节跳动 SeedVR2 模型在 ComfyUI 平台上的官方实现，专为高质量视频与图像超分辨率放大而设计。它能有效解决低分辨率素材模糊、细节丢失的问题，通过先进的 AI 算法重建纹理，显著提升画面清晰度，同时支持多显卡并行处理以加速渲染。\n\n这款工具特别适合熟悉 ComfyUI 工作流的创作者、视频后期设计师以及需要批量处理高清内容的开发者使用。其独特亮点在于不仅作为节点嵌入 ComfyUI，还能作为独立命令行工具运行，灵活适配不同硬件环境。最新版本进一步优化了跨平台兼容性，包括苹果 Silicon 芯片的内存管理、10-bit 色彩深度输出以减少色彩断层，以及对 GGUF 
量化模型的支持，大幅降低了显存占用并提升了推理效率。此外，它还增强了安全性，防止恶意模型文件执行代码。无论是修复老视频还是提升生成式 AI 产出画质，ComfyUI-SeedVR2_VideoUpscaler 都提供了一个稳定且高效的解决方案。","# ComfyUI-SeedVR2_VideoUpscaler\n\n[![View Code](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F📂_View_Code-GitHub-181717?style=for-the-badge&logo=github)](https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler)\n\nOfficial release of [SeedVR2](https:\u002F\u002Fgithub.com\u002FByteDance-Seed\u002FSeedVR) for ComfyUI that enables high-quality video and image upscaling.\n\nCan run as **Multi-GPU standalone CLI** too, see [🖥️ Run as Standalone](#-run-as-standalone-cli) section.\n\n[![SeedVR2 v2.5 Deep Dive Tutorial](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_c1df29b1d2ae.jpg)](https:\u002F\u002Fyoutu.be\u002FMBtWYXq_r60)\n\n![Usage Example](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_3c9839b686c7.png)\n\n![Usage Example](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_04a1bbdc12cf.png)\n\n## 📋 Quick Access\n\n- [🆙 Future Work](#-future-work)\n- [🚀 Release Notes](#-release-notes)\n- [🎯 Features](#-features)\n- [🔧 Requirements](#-requirements)\n- [📦 Installation](#-installation)\n- [📖 Usage](#-usage)\n- [🖥️ Run as Standalone](#️-run-as-standalone-cli)\n- [⚠️ Limitations](#️-limitations)\n- [🤝 Contributing](#-contributing)\n- [🙏 Credits](#-credits)\n- [📜 License](#-license)\n\n## 🆙 Future Work\n\nWe're actively working on improvements and new features. 
To stay informed:\n\n- **📌 Track Active Development**: Visit [Issues](https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fissues) to see active development, report bugs, and request new features\n- **💬 Join the Community**: Learn from others, share your workflows, and get help in the [Discussions](https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fdiscussions)\n- **🔮 Next Model Survey**: We're looking for community input on the next open-source super-powerful generic restoration model. Share your suggestions in [Issue #164](https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fissues\u002F164)\n\n## 🚀 Release Notes\n\n**2025.12.24 - Version 2.5.24**\n\n- **🍎 Fix: MPS memory leak regression** - Restored MPS cache clearing after VAE encode\u002Fdecode operations that was accidentally removed during code cleanup in v2.5.23\n\n**2025.12.24 - Version 2.5.23**\n\n- **🔒 Security: Prevent code execution in model loading** - Added protection against malicious .pth files by restricting deserialization to tensors only\n- **🎥 Fix: FFmpeg video writer reliability** - Resolved ffmpeg process hanging issues by redirecting stderr and adding buffer flush, with improved error messages for debugging *(thanks [@thehhmdb](https:\u002F\u002Fgithub.com\u002Fthehhmdb))*\n- **⚡ Fix: GGUF VAE model support** - Enabled automatic weight dequantization for convolution operations, making GGUF-quantized VAE models fully functional *(thanks [@naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1))*\n- **🛡️ Fix: VAE slicing edge cases** - Protected against division by zero crashes when using small split sizes with high temporal downsampling *(thanks [@naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1))*\n- **🎨 Fix: LAB color transfer precision** - Resolved dtype mismatch errors during video upscaling by ensuring consistent float types before matrix operations\n- **🔧 Fix: PyTorch 2.9+ compatibility** - Extended Conv3d memory 
workaround to all PyTorch 2.9+ versions, fixing 3x VRAM usage on newer PyTorch releases\n- **📦 Fix: Bitsandbytes compatibility** - Added ValueError exception handling for Intel Gaudi version detection failures on non-Gaudi systems\n- **🍎 MPS: Memory optimization** - Reduced memory usage during encode\u002Fdecode operations on Apple Silicon *(thanks [@s-cerevisiae](https:\u002F\u002Fgithub.com\u002Fs-cerevisiae))*\n\n\n**2025.12.13 - Version 2.5.22**\n\n- **🎬 CLI: FFmpeg video backend with 10-bit support** - New `--video_backend ffmpeg` and `--10bit` flags enable x265 encoding with 10-bit color depth, reducing banding artifacts in gradients compared to 8-bit OpenCV output *(based on PR by [@thehhmdb](https:\u002F\u002Fgithub.com\u002Fthehhmdb) - thank you!)*\n- **🍎 Fix: MPS bicubic upscaling compatibility** - Added CPU fallback for bicubic+antialias interpolation on PyTorch versions before 2.8.0, resolving RGBA alpha upscaling errors on Apple Silicon\n- **⚡ Fix: Cross-platform histogram matching** - Replaced scatter_ operation with argsort+index_select for improved reliability across CUDA, ROCm, and MPS backends\n- **🧹 MPS: Remove sync overhead** - Reverted unnecessary `torch.mps.synchronize()` calls introduced in v2.5.21 for consistent behavior with CUDA pipeline\n\n**2025.12.12 - Version 2.5.21**\n\n- **🛠️ Fix: GGUF dequantization error on MPS** - Resolved shape mismatch error introduced in 2.5.20 by skipping GGUF quantized buffers in precision conversion - these must remain in packed format for on-the-fly dequantization during inference\n- **🍎 MPS: Eliminate CPU sync overhead** - Skip unnecessary CPU tensor offload on Apple Silicon unified memory architecture, preventing sync stalls that caused slowdowns. 
Input images and output video now stay on MPS device throughout the pipeline\n- **⚡ MPS: Preload text embeddings** - Load text embeddings before Phase 1 encoding to avoid sync stall at Phase 2 start, improving timing accuracy and throughput\n- **🧹 MPS: Optimized model cleanup** - Skip redundant CPU movement before model deletion on unified memory\n\n**2025.12.12 - Version 2.5.20**\n\n- **⚡ Expanded attention backends** - Full support for Flash Attention 2 (Ampere+), Flash Attention 3 (Hopper+), SageAttention 2, and SageAttention 3 (Blackwell\u002FRTX 50xx), with automatic fallback chains to PyTorch SDPA when unavailable *(based on PR by [@naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1) - thank you!)*\n- **🍎 macOS\u002FApple Silicon compatibility** - Replaced MPS autocast with explicit dtype conversion throughout VAE and DiT pipelines, resolving hangs and crashes on M-series Macs. BlockSwap now auto-disables with warning (unified memory makes it meaningless)\n- **🛡️ Flash Attention graceful fallback** - Added compatibility shims for corrupted or partially installed flash_attn\u002Fxformers DLLs, preventing startup crashes\n- **🛡️ AMD ROCm: bitsandbytes conflict fix** - Prevent kernel registration errors when diffusers attempts to re-import broken bitsandbytes installations\n- **📦 ComfyUI Manager: macOS classifier fix** - Removed NVIDIA CUDA classifier causing false \"GPU not supported\" warnings on macOS\n- **📚 Documentation updates** - Updated README with attention backend details, BlockSwap macOS notes, and clarified model caching descriptions\n\n**2025.12.10 - Version 2.5.19**\n\n- **🎨 New header logo design** - Refreshed ASCII art banner *(thanks [@naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1))*\n- **🧹 Remove dead flash attention wrapper** - Removed legacy code from FP8CompatibleDiT; FlashAttentionVarlen already handles backend switching via its `attention_mode` attribute\n- **🛡️ Fix graceful fallback from flash-attn** - Add compatibility shims for 
corrupted flash_attn\u002Fxformers DLLs, preventing startup crashes when CUDA extensions are broken\n- **📊 Improved VRAM tracking** - Separate allocated vs reserved memory tracking, Windows-only overflow detection (WDDM paging behavior)\n- **♻️ Centralize backend detection** - Unified `is_mps_available()`, `is_cuda_available()`, `get_gpu_backend()` helpers across codebase\n- **🔄 Revert 2.5.14 VRAM limit enforcement** - Removed `set_per_process_memory_fraction` call; Overflow detection and warnings remain.\n\n**2025.12.09 - Version 2.5.18**\n\n- **🚀 CLI: Streaming mode for long videos** - New `--chunk_size` flag processes videos in memory-bounded chunks, enabling arbitrarily long videos without RAM limits. Works with model caching (`--cache_dit`\u002F`--cache_vae`) for chunk-to-chunk reuse *(inspired by [disk02](https:\u002F\u002Fgithub.com\u002Fdisk02) PR contribution)*\n- **⚡ CLI: Multi-GPU streaming** - Each GPU now streams its segment internally with independent model caching, improving memory efficiency and enabling `--temporal_overlap` blending at GPU boundaries\n- **🔧 CLI: Fix large video MemoryError** - Shared memory transfer replaces numpy pickling, preventing crashes on high-resolution\u002Flong video outputs *(inspired by  [FurkanGozukara](https:\u002F\u002Fgithub.com\u002FFurkanGozukara) PR contribution)*\n\n**2025.12.05 - Version 2.5.17**\n\n- **🔧 Fix: Older GPU compatibility (GTX 970, etc.)** - Runtime bf16 CUBLAS probe replaces compute capability heuristics, correctly detecting unsupported GPUs without affecting RTX 20XX\n\n**2025.12.05 - Version 2.5.16**\n\n- **🔧 Fix: Older GPU compatibility (GTX 970, etc.)** - Automatic fallback for GPUs without bfloat16 support\n- **🐛 Fix: Quality regression** - Reverted bfloat16 detection that was causing artifact issues\n- **📋 Debug: Environment info display** - Shows system info in debug mode to help with issue reporting\n- **📚 Docs: Simplified contribution workflow** - Streamlined to main branch 
only\n\n**2025.12.03 - Version 2.5.15**\n\n- **🍎 Fix: MPS compatibility** - Disable antialias for MPS tensors and fix bfloat16 arange issues\n- **⚡ Fix: Autocast device type** - Use proper device type attribute to prevent autocast errors\n- **📊 Memory: Accurate VRAM tracking** - Use max_memory_reserved for more precise peak reporting\n- **🔧 Fix: Triton compatibility** - Add shim for bitsandbytes 0.45+ \u002F triton 3.0+ (fixes PyTorch 2.7 installation errors)\n\n**2025.12.01 - Version 2.5.14**\n\n- **🍎 Fix: MPS device comparison** - Normalize device strings to prevent unnecessary tensor movements\n- **📊 Memory: VRAM swap detection** - Peak stats now show GPU+swap breakdown when overflow occurs, with warning when swap detected\n- **🛡️ Memory: Enforce physical VRAM limit** - PyTorch now OOMs instead of silently swapping to shared memory (prevents extreme slowdowns on Windows)\n\n**2025.11.30 - Version 2.5.13**\n\n- **🔧 Fix: PyTorch 2.7+ triton import error** - Resolved installation crash caused by triton.ops import chain on newer triton versions\n- **💾 Fix: OOM on float32 conversion for long videos** - Graceful fallback to native dtype when insufficient memory for float32 conversion\n- **🍎 Fix: CLI watermark error on macOS** - Resolved MPS-related watermark processing crash on Apple Silicon\n\n**2025.11.28 - Version 2.5.12**\n\n- **🐛 Fix: Color artifacts regression** - Reverted in-place tensor operations in video transform pipeline that caused color artifacts on some images\n\n**2025.11.28 - Version 2.5.11**\n\n- **⚡ Feature: CUDNN attention backend** - Added support for PyTorch 2.3+ CUDNN_ATTENTION backend with automatic fallback for older versions (thanks @eadwu)\n- **💾 Fix: Memory spike for long videos** - VAE decode now streams directly to pre-allocated tensor, eliminating OOM errors during long video processing\n- **🎨 Fix: LAB color correction artifacts** - Resolved tile boundary artifacts using wavelet reconstruction preprocessing\n- **🎨 Fix: Color reference 
misalignment** - Fixed color correction frame alignment with temporal overlap\n- **🍎 Fix: MPS detection reliability** - Switched to canonical `torch.backends.mps.is_available()` API for consistent Apple Silicon detection\n- **🖥️ Fix: Mac subprocess error** - CLI now uses direct processing on Mac to avoid MPS allocator failures in child processes\n- **🖥️ Fix: Multi-GPU device assignment** - CUDA_VISIBLE_DEVICES now set before spawn for proper worker inheritance\n- **📊 Fix: BlockSwap logging** - Now shows effective\u002Ftotal blocks (e.g., 32\u002F32) instead of raw requested value\n- **🔧 Feature: Auto bfloat16 detection** - Automatically detects bfloat16 support to prevent CUBLAS errors on older GPUs\n- **📊 Feature: Peak RAM tracking** - Added RAM usage alongside VRAM in debug summary\n- **⚡ Performance: In-place tensor ops** - Reduced memory allocation overhead with in-place operations throughout pipeline\n- **📖 Docs: Multi-GPU clarification** - Clarified frame-level parallelism behavior expectations for multi-GPU setups\n\n**2025.11.13 - Version 2.5.10**\n\n- **🎯 Fix: Deterministic generation** - Identical images with the same seed now produce identical results across different sessions and batch positions\n- **🔧 Fix: Model caching with BlockSwap** - Resolved issue where cached DiT models wouldn't properly reload when VAE caching state changed\n- **💾 Fix: Runner caching optimization** - Runner templates now correctly cache whenever both DiT and VAE are cached, regardless of caching order\n- **📁 Fix: Case-insensitive model paths** - Extra model paths in YAML config now work regardless of case (seedvr2, SEEDVR2, SeedVR2, etc.)\n- **🐛 Fix: High resolution tile debug crash** - Fixed \"NoneType has no attribute log\" error when using maximum resolution with VAE tiling\n- **📊 Fix: Temporal overlap logging** - Corrected frame count reporting when temporal overlap is automatically adjusted\n- **🔍 Feature: Enhanced model path debugging** - Added detailed logging to help 
troubleshoot model loading issues (visible in debug mode)\n\n**2025.11.12 - Version 2.5.9**\n\n- **🐛 Fix: Tile debug visualization crash** - Fixed OpenCV error when using VAE tile debug mode on certain systems.\n- **🍎 Fix: macOS MPS loading error** - Added automatic CPU fallback for MPS allocator issues on certain PyTorch\u002FmacOS versions.\n- **🖥️ Fix: Windows log buffering** - Added flush to print statements for real-time log visibility in ComfyUI on Windows\n- **📦 Fix: ComfyUI Registry logo** - Updated icon URL to display properly in ComfyUI node registry\n- **ℹ️ Feature: Version display** - Added version number to node name and CLI\u002FComfyUI header for better tracking\n- **💝 Feature: GitHub Sponsors** - Added sponsor button to support project development. Thank you everyone for your support!\n- **📜 License: Apache 2.0** - Reverted License from MIT to Apache 2.0 to match ByteDance Seed project\n\n**2025.11.10 - Version 2.5.8**\n\n- **🐛 Fix (CLI): Windows batch processing duplicate files** - Fixed CLI batch mode processing each file twice on Windows due to case-insensitive filesystem. Improved directory scanning performance by 2-3x\n- **📁 Fix(CLI): Output folder location** - Output files now created in sensible locations: batch mode creates `{folder_name}_upscaled\u002F` sibling folder with original filenames preserved; single file mode adds `_upscaled` suffix in same directory. 
All logs now show absolute paths for clarity\n- **🎨 Fix(CLI): RGBA alpha channel support** - PNG images with transparency are now properly detected and preserved through the upscaling pipeline, matching ComfyUI behavior\n\n**2025.11.10 - Version 2.5.7**\n\n- **🔧 Fix: Conv3d workaround compatibility** - Enhanced platform detection and added graceful fallback to prevent errors on PyTorch dev builds and AMD ROCm systems\n\n**2025.11.09 - Version 2.5.6**\n\n- 🎨 **Fix: Restored natural look for 7b model** - Corrected torch.compile optimization that was causing overly plastic\u002F high-specular appearance in upscaled videos with 7b model.\n\n- 💾 **Memory: Fixed RAM leak for long videos** - On-demand reconstruction with lightweight batch indices instead of storing full transformed videos, fixed release_tensor_memory to handle CPU\u002FCUDA\u002FMPS consistently, and refactored batch processing helpers\n\n**2025.11.08 - Version 2.5.4**\n\n- 🎨 **Fix: AdaIN color correction** - Replace `.view()` with `.reshape()` to handle non-contiguous tensors after spatial padding, resolving \"view size is not compatible with input tensor's size and stride\" error\n- 🔴 **Fix: AMD ROCm compatibility** - Add cuDNN availability check in Conv3d workaround to prevent \"ATen not compiled with cuDNN support\" error on ROCm systems (AMD GPUs on Windows\u002FLinux)\n\n**2025.11.08 - Version 2.5.3**\n\n- 🍎 **Fix: Apple Silicon MPS device handling** - Corrected MPS device enumeration to use `\"mps\"` instead of `\"mps:0\"`, resolving invalid device errors on M-series Macs\n- 🪟 **Fix: torch.mps AttributeError on Windows** - Add defensive checks for `torch.mps.is_available()` to handle PyTorch versions where the method doesn't exist on non-Mac platforms\n\n**2025.11.07 - Version 2.5.0** 🎉\n\n⚠️ **BREAKING CHANGE**: This is a major update requiring workflow recreation. All nodes and CLI parameters have been redesigned for better usability and consistency. 
Watch the latest video from [AInVFX](https:\u002F\u002Fwww.youtube.com\u002F@AInVFX) for a deep dive and check out the [usage](#-usage) section.\n\n**📦 Official Release**: Now available on main branch with ComfyUI Manager support for easy installation and automatic version tracking. Updated dependencies and local imports prevent conflicts with other ComfyUI custom nodes.\n\n### 🎨 ComfyUI Improvements\n\n- **Four-Node Modular Architecture**: Split into dedicated nodes for DiT model, VAE model, torch.compile settings, and main upscaler for granular control\n- **Global Model Cache**: Models now shared across multiple upscaler instances with automatic config updates - no more redundant loading\n- **ComfyUI V3 Migration**: Full compatibility with ComfyUI V3 stateless node design\n- **RGBA Support**: Native alpha channel processing with edge-guided upscaling for clean transparency\n- **Improved Memory Management**: Streaming architecture prevents VRAM spikes regardless of video length\n- **Flexible Resolution Support**: Upscale to any resolution divisible by 2 with lossless padding approach (replaced restrictive cropping)\n- **Enhanced Parameters**: Added `uniform_batch_size`, `temporal_overlap`, `prepend_frames`, and `max_resolution` for better control\n\n### 🖥️ CLI Enhancements\n\n- **Batch Directory Processing**: Process entire folders of videos\u002Fimages with model caching for efficiency\n- **Single Image Support**: Direct image upscaling without video conversion\n- **Smart Output Detection**: Auto-detects output format (MP4\u002FPNG) based on input type\n- **Enhanced Multi-GPU**: Improved workload distribution with temporal overlap blending\n- **Unified Parameters**: CLI and ComfyUI now use identical parameter names for consistency\n- **Better UX**: Auto-display help, validation improvements, progress tracking, and cleaner output\n\n### ⚡ Performance & Optimization\n\n- **torch.compile Support**: 20-40% DiT speedup and 15-25% VAE speedup with full graph 
compilation\n- **Optimized BlockSwap**: Adaptive memory clearing (5% threshold), separate I\u002FO component handling, reduced overhead\n- **Enhanced VAE Tiling**: Tensor offload support for accumulation buffers, separate encode\u002Fdecode configuration\n- **Native Dtype Pipeline**: Eliminated unnecessary conversions, maintains bfloat16 precision throughout for speed and quality\n- **Optimized Tensor Operations**: Replaced einops rearrange with native PyTorch ops for 2-5x faster transforms\n\n### 🎯 Quality Improvements\n\n- **LAB Color Correction**: New perceptual color transfer method with superior color accuracy (now default)\n- **Additional Color Methods**: HSV saturation matching, wavelet adaptive, and hybrid approaches\n- **Deterministic Generation**: Seed-based reproducibility with phase-specific seeding strategy\n- **Better Temporal Consistency**: Hann window blending for smooth transitions between batches\n\n### 💾 Memory Management\n\n- **Smarter Offloading**: Independent device configuration for DiT, VAE, and tensors (CPU\u002FGPU\u002Fnone)\n- **Four-Phase Pipeline**: Completes each phase (encode→upscale→decode→postprocess) for all batches before moving to next, minimizing model swaps\n- **Better Cleanup**: Phase-specific resource management with proper tensor memory release\n- **Peak VRAM Tracking**: Per-phase memory monitoring with summary display\n\n### 🔧 Technical Improvements\n\n- **GGUF Quantization Support**: Added full GGUF support for 4-bit\u002F8-bit inference on low-VRAM systems\n- **Improved GGUF Handling**: Fixed VRAM leaks, torch.compile compatibility, non-persistent buffers\n- **Apple Silicon Support**: Full MPS (Metal Performance Shaders) support for Apple Silicon Macs\n- **AMD ROCm Compatibility**: Conditional FSDP imports for PyTorch ROCm 7+ support\n- **Conv3d Memory Workaround**: Fixes PyTorch 2.9+ cuDNN memory bug (3x usage reduction)\n- **Flash Attention Optional**: Graceful fallback to SDPA when flash-attn unavailable\n\n### 📚 Code 
Quality\n\n- **Modular Architecture**: Split monolithic files into focused modules (generation_phases, model_configuration, etc.)\n- **Comprehensive Documentation**: Extensive docstrings with type hints across all modules\n- **Better Error Handling**: Early validation, clear error messages, installation instructions\n- **Consistent Logging**: Unified indentation, better categorization, concise messages\n\n**2025.08.07**\n\n- 🎯 **Unified Debug System**: New structured logging with categories, timers, and memory tracking. `enable_debug` now available on main node\n- ⚡ **Smart FP8 Optimization**: FP8 models now keep native FP8 storage, converting to BFloat16 only for arithmetic - faster and more memory efficient than FP16\n- 📦 **Model Registry**: Multi-repo support (numz\u002F & AInVFX\u002F), auto-discovery of user models, added mixed FP8 variants to fix 7B artifacts\n- 💾 **Model Caching**: `cache_model` moved to main node, fixed memory leaks with proper RoPE\u002Fwrapper cleanup\n- 🧹 **Code Cleanup**: New modular structure (`constants.py`, `model_registry.py`, `debug.py`), removed legacy code\n- 🚀 **Performance**: Better memory management with `torch.cuda.ipc_collect()`, improved RoPE handling\n\n**2025.07.17**\n\n- 🛠️ Add 7B sharp Models: add 2 new 7B models with sharpen output\n\n**2025.07.11**\n\n- 🎬 Complete tutorial released: Adrien from [AInVFX](https:\u002F\u002Fwww.youtube.com\u002F@AInVFX) created an in-depth ComfyUI SeedVR2 guide covering everything from basic setup to advanced BlockSwap techniques for running on consumer GPUs. Perfect for understanding memory optimization and upscaling of image sequences with alpha channel! 
[Watch the tutorial](#-usage)\n\n**2025.07.09**\n\n- 🛠️ BlockSwap Integration: Big thanks to [Adrien Toupet](https:\u002F\u002Fgithub.com\u002Fadrientoupet) from [AInVFX](https:\u002F\u002Fwww.youtube.com\u002F@AInVFX) for this :), useful for low VRAM users (see [usage](#-usage) section)\n\n**2025.07.03**\n\n- 🛠️ Can run in **standalone mode** with **Multi-GPU**, see [🖥️ Run as Standalone](#run-as-standalone-cli)\n\n**2025.06.30**\n\n- 🚀 Faster processing with lower VRAM usage\n- 🛠️ Fixed memory leak on 3B models\n- ❌ Can now interrupt the process if needed\n- ✅ Refactored the code for better sharing with the community, feel free to propose pull requests\n- 🛠️ Removed flash attention dependency (thanks to [luke2642](https:\u002F\u002Fgithub.com\u002FLuke2642)!)\n\n**2025.06.24**\n\n- 🚀 Processing sped up by up to 4x\n\n**2025.06.22**\n\n- 💪 FP8 compatibility!\n- 🚀 Sped up the whole process\n- 🚀 Lower VRAM consumption (still high: batch_size=1 is the maximum on an RTX 4090, I'm trying to fix that)\n- 🛠️ Better benchmark coming soon\n\n**2025.06.20**\n\n- 🛠️ Initial push\n\n## 🎯 Features\n\n### Core Capabilities\n- **High-Quality Diffusion-Based Upscaling**: One-step diffusion model for video and image enhancement\n- **Temporal Consistency**: Maintains coherence across video frames with configurable batch processing\n- **Multi-Format Support**: Handles RGB and RGBA (alpha channel) for both videos and images\n- **Any Video Length**: Suitable for any video length\n\n### Model Support\n- **Multiple Model Variants**: 3B and 7B parameter models with different precision options\n- **FP16, FP8, and GGUF Quantization**: Choose between full precision (FP16), mixed precision (FP8), or heavily quantized GGUF models for different VRAM requirements\n- **Automatic Model Downloads**: Models are automatically downloaded from HuggingFace on first use\n\n### Memory Optimization\n- **BlockSwap Technology**: Dynamically swap transformer blocks between GPU and CPU memory to run large models on limited VRAM\n- 
**VAE Tiling**: Process large resolutions with tiled encoding\u002Fdecoding to reduce VRAM usage\n- **Intelligent Offloading**: Offload models and intermediate tensors to CPU or secondary GPUs between processing phases\n- **GGUF Quantization Support**: Run models with 4-bit or 8-bit quantization for extreme VRAM savings\n\n### Performance Features\n- **torch.compile Integration**: Optional 20-40% DiT speedup and 15-25% VAE speedup with PyTorch 2.0+ compilation\n- **Multi-GPU CLI**: Distribute workload across multiple GPUs with automatic temporal overlap blending\n- **Model Caching**: Keep models loaded between generations for single-GPU directory processing or multi-GPU streaming\n- **Flexible Attention Backends**: Choose between PyTorch SDPA (stable, always available), Flash Attention 2\u002F3, or SageAttention 2\u002F3 for faster computation on supported hardware\n\n### Quality Control\n- **Advanced Color Correction**: Five methods including LAB (recommended for highest fidelity), wavelet, wavelet adaptive, HSV, and AdaIN\n- **Noise Injection Controls**: Fine-tune input and latent noise scales for artifact reduction at high resolutions\n- **Configurable Resolution Limits**: Set target and maximum resolutions with automatic aspect ratio preservation\n\n### Workflow Features\n- **ComfyUI Integration**: Four dedicated nodes for complete control over the upscaling pipeline\n- **Standalone CLI**: Command-line interface for batch processing and automation\n- **Debug Logging**: Comprehensive debug mode with memory tracking, timing information, and processing details\n- **Progress Reporting**: Real-time progress updates during processing\n\n## 🔧 Requirements\n\n### Hardware\n\nWith the current optimizations (tiling, BlockSwap, GGUF quantization), SeedVR2 can run on a wide range of hardware:\n\n- **Minimal VRAM** (8GB or less): Use GGUF Q4_K_M models with BlockSwap and VAE tiling enabled\n- **Moderate VRAM** (12-16GB): Use FP8 models with BlockSwap or VAE tiling as 
needed\n- **High VRAM** (24GB+): Use FP16 models for best quality and speed without memory optimizations\n\n### Software\n\n- **ComfyUI**: Latest version recommended\n- **Python**: 3.12+ (Python 3.12 and 3.13 tested and recommended)\n- **PyTorch**: 2.0+ for torch.compile support (optional but recommended)\n- **Triton**: Required for torch.compile with inductor backend (optional)\n- **Flash Attention \u002F SageAttention**: Flash Attention 2 (Ampere+), Flash Attention 3 (Hopper+), SageAttention 2 or SageAttention 3 (Blackwell) provide faster attention computation on supported hardware (optional, falls back to PyTorch SDPA)\n\n## 📦 Installation\n\n### Option 1: ComfyUI Manager (Recommended)\n\n1. Open ComfyUI Manager in your ComfyUI interface\n2. Click \"Custom Nodes Manager\"\n3. Search for \"ComfyUI-SeedVR2_VideoUpscaler\"\n4. Click \"Install\" and restart ComfyUI\n\n**Registry Link**: [ComfyUI Registry - SeedVR2 Video Upscaler](https:\u002F\u002Fregistry.comfy.org\u002Fnodes\u002Fseedvr2_videoupscaler)\n\n### Option 2: Manual Installation\n\n1. **Clone the repository** into your ComfyUI custom nodes directory:\n```bash\ncd ComfyUI\ngit clone https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler.git custom_nodes\u002Fseedvr2_videoupscaler\n```\n\n2. **Install dependencies using standalone Python**:\n```bash\n# Install requirements (from same ComfyUI directory)\n# Windows:\n.venv\\Scripts\\python.exe -m pip install -r custom_nodes\\seedvr2_videoupscaler\\requirements.txt\n# Linux\u002FmacOS:\n.venv\u002Fbin\u002Fpython -m pip install -r custom_nodes\u002Fseedvr2_videoupscaler\u002Frequirements.txt\n```\n\n3. 
**Restart ComfyUI**\n\n### Model Installation\n\nModels will be **automatically downloaded** on first use and saved to `ComfyUI\u002Fmodels\u002FSEEDVR2`.\n\nYou can also manually download models from:\n- Main models available at [numz\u002FSeedVR2_comfyUI](https:\u002F\u002Fhuggingface.co\u002Fnumz\u002FSeedVR2_comfyUI\u002Ftree\u002Fmain) and [AInVFX\u002FSeedVR2_comfyUI](https:\u002F\u002Fhuggingface.co\u002FAInVFX\u002FSeedVR2_comfyUI\u002Ftree\u002Fmain)\n- Additional GGUF models available at [cmeka\u002FSeedVR2-GGUF](https:\u002F\u002Fhuggingface.co\u002Fcmeka\u002FSeedVR2-GGUF\u002Ftree\u002Fmain)\n\n## 📖 Usage\n\n### 🎬 Video Tutorials\n\n#### Latest Version Deep Dive (Recommended)\n\nComplete walkthrough of version 2.5 by Adrien from [AInVFX](https:\u002F\u002Fwww.youtube.com\u002F@AInVFX), covering the new 4-node architecture, GGUF support, memory optimizations, and production workflows:\n\n[![SeedVR2 v2.5 Deep Dive Tutorial](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_c1df29b1d2ae.jpg)](https:\u002F\u002Fyoutu.be\u002FMBtWYXq_r60)\n\nThis comprehensive tutorial covers:\n- Installing v2.5 through ComfyUI Manager and troubleshooting conflicts\n- Understanding the new 4-node modular architecture and why we rebuilt it\n- Running 7B models on 8GB VRAM with GGUF quantization\n- Configuring BlockSwap, VAE tiling, and torch.compile for your hardware\n- Image and video upscaling workflows with alpha channel support\n- CLI for batch processing and multi-GPU rendering\n- Memory optimization strategies for different VRAM levels\n- Real production tips and the critical batch_size formula (4n+1)\n\n#### Previous Version Tutorial\n\nFor reference, here's the original tutorial covering the initial release:\n\n[![SeedVR2 Deep Dive Tutorial](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_f63bdd371d3a.jpg)](https:\u002F\u002Fyoutu.be\u002FI0sl45GMqNg)\n\n*Note: This tutorial 
covers the previous single-node architecture. While the UI has changed significantly in v2.5, the core concepts about BlockSwap and memory management remain valuable.*\n\n### Node Setup\n\nSeedVR2 uses a modular node architecture with four specialized nodes:\n\n#### 1. SeedVR2 (Down)Load DiT Model\n\n![SeedVR2 (Down)Load DiT Model](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_32d7085bcfe1.png)\n\nConfigure the DiT (Diffusion Transformer) model for video upscaling.\n\n**Parameters:**\n\n- **model**: Choose your DiT model\n  - **3B Models**: Faster, lower VRAM requirements\n    - `seedvr2_ema_3b_fp16.safetensors`: FP16 (best quality)\n    - `seedvr2_ema_3b_fp8_e4m3fn.safetensors`: FP8 8-bit (good quality)\n    - `seedvr2_ema_3b-Q4_K_M.gguf`: GGUF 4-bit quantized (acceptable quality)\n    - `seedvr2_ema_3b-Q8_0.gguf`: GGUF 8-bit quantized (good quality)\n  - **7B Models**: Higher quality, higher VRAM requirements\n    - `seedvr2_ema_7b_fp16.safetensors`: FP16 (best quality)\n    - `seedvr2_ema_7b_fp8_e4m3fn_mixed_block35_fp16.safetensors`: FP8 with last block in FP16 to reduce artifacts (good quality)\n    - `seedvr2_ema_7b-Q4_K_M.gguf`: GGUF 4-bit quantized (acceptable quality)\n    - `seedvr2_ema_7b_sharp_*`: Sharp variants for enhanced detail\n\n- **device**: GPU device for DiT inference (e.g., `cuda:0`)\n\n- **offload_device**: Device to offload DiT model when not actively processing\n  - `none`: Keep model on inference device (fastest, highest VRAM)\n  - `cpu`: Offload to system RAM (reduces VRAM)\n  - `cuda:X`: Offload to another GPU (good balance if available)\n\n- **cache_model**: Keep DiT model loaded on offload_device between workflow runs\n  - Useful for batch processing to avoid repeated loading\n  - Requires offload_device to be set\n\n- **blocks_to_swap**: BlockSwap memory optimization\n  - `0`: Disabled (default)\n  - `1-32`: Number of transformer blocks to swap for 3B model\n  - `1-36`: Number of 
transformer blocks to swap for 7B model\n  - Higher values = more VRAM savings but slower processing\n  - Requires offload_device to be set and different from device\n\n- **swap_io_components**: Offload input\u002Foutput embeddings and normalization layers\n  - Additional VRAM savings when combined with blocks_to_swap\n  - Requires offload_device to be set and different from device\n\n- **attention_mode**: Attention computation backend\n  - `sdpa`: PyTorch scaled_dot_product_attention (default, always available)\n  - `flash_attn_2`: Flash Attention 2 (Ampere+, requires flash-attn package)\n  - `flash_attn_3`: Flash Attention 3 (Hopper+, requires flash-attn with FA3 support)\n  - `sageattn_2`: SageAttention 2 (requires sageattention package)\n  - `sageattn_3`: SageAttention 3 (Blackwell\u002FRTX 50xx, requires sageattn3 package)\n\n- **torch_compile_args**: Connect to SeedVR2 Torch Compile Settings node for 20-40% speedup\n\n**BlockSwap Explained:**\n\nBlockSwap enables running large models on GPUs with limited VRAM by dynamically swapping transformer blocks between GPU and CPU memory during inference.\n\n> **Note:** BlockSwap is not available on macOS. Apple Silicon Macs use unified memory architecture where GPU and CPU share the same memory pool, making BlockSwap meaningless. The option will be automatically disabled with a warning if requested on macOS.\n\nHere's how it works:\n\n- **What it does**: Keeps only the currently-needed transformer blocks on the GPU, while storing the rest on CPU or another device\n- **When to use it**: When you get OOM (Out of Memory) errors during the upscaling phase\n- **How to configure**:\n  1. Set `offload_device` to `cpu` or another GPU\n  2. Start with `blocks_to_swap=16` (half the blocks)\n  3. If still getting OOM, increase to 24 or 32 (3B) \u002F 36 (7B)\n  4. Enable `swap_io_components` for maximum VRAM savings\n  5. 
If you have plenty of VRAM, decrease or set to 0 for faster processing\n\n**Example Configuration for Low VRAM (8GB)**:\n- model: `seedvr2_ema_3b-Q8_0.gguf`\n- device: `cuda:0`\n- offload_device: `cpu`\n- blocks_to_swap: `32`\n- swap_io_components: `True`\n\n#### 2. SeedVR2 (Down)Load VAE Model\n\n![SeedVR2 (Down)Load VAE Model](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_0e32b40d059a.png)\n\nConfigure the VAE (Variational Autoencoder) model for encoding\u002Fdecoding video frames.\n\n**Parameters:**\n\n- **model**: VAE model selection\n  - `ema_vae_fp16.safetensors`: Default and recommended\n\n- **device**: GPU device for VAE inference (e.g., `cuda:0`)\n\n- **offload_device**: Device to offload VAE model when not actively processing\n  - `none`: Keep model on inference device (default, fastest)\n  - `cpu`: Offload to system RAM (reduces VRAM)\n  - `cuda:X`: Offload to another GPU (good balance if available)\n\n- **cache_model**: Keep VAE model loaded on offload_device between workflow runs\n  - Requires offload_device to be set\n\n- **encode_tiled**: Enable tiled encoding to reduce VRAM usage during encoding phase\n  - Enable if you see OOM errors during the \"Encoding\" phase in debug logs\n\n- **encode_tile_size**: Encoding tile size in pixels (default: 1024)\n  - Applied to both height and width\n  - Lower values reduce VRAM but may increase processing time\n\n- **encode_tile_overlap**: Encoding tile overlap in pixels (default: 128)\n  - Reduces visible seams between tiles\n\n- **decode_tiled**: Enable tiled decoding to reduce VRAM usage during decoding phase\n  - Enable if you see OOM errors during the \"Decoding\" phase in debug logs\n\n- **decode_tile_size**: Decoding tile size in pixels (default: 1024)\n\n- **decode_tile_overlap**: Decoding tile overlap in pixels (default: 128)\n\n- **torch_compile_args**: Connect to SeedVR2 Torch Compile Settings node for 15-25% speedup\n\n**VAE Tiling Explained:**\n\nVAE 
tiling processes large resolutions in smaller tiles to reduce VRAM requirements. Here's how to use it:\n\n1. **Run without tiling first** and monitor the debug logs (enable `enable_debug` on main node)\n2. **If OOM during \"Encoding\" phase**:\n   - Enable `encode_tiled`\n   - If still OOM, reduce `encode_tile_size` (try 768, 512, etc.)\n3. **If OOM during \"Decoding\" phase**:\n   - Enable `decode_tiled`\n   - If still OOM, reduce `decode_tile_size`\n4. **Adjust overlap** (default 128) if you see visible seams in output (increase it) or processing times are too slow (decrease it).\n\n**Example Configuration for High Resolution (4K)**:\n- encode_tiled: `True`\n- encode_tile_size: `1024`\n- encode_tile_overlap: `128`\n- decode_tiled: `True`\n- decode_tile_size: `1024`\n- decode_tile_overlap: `128`\n\n#### 3. SeedVR2 Torch Compile Settings (Optional)\n\n![SeedVR2 Torch Compile Settings](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_19daf8c081be.png)\n\nConfigure torch.compile optimization for 20-40% DiT speedup and 15-25% VAE speedup.\n\n**Requirements:**\n- PyTorch 2.0+\n- Triton (for inductor backend)\n\n**Parameters:**\n\n- **backend**: Compilation backend\n  - `inductor`: Full optimization with Triton kernel generation and fusion (recommended)\n  - `cudagraphs`: Lightweight wrapper using CUDA graphs, no kernel optimization\n\n- **mode**: Optimization level (compilation time vs runtime performance)\n  - `default`: Fast compilation with good speedup (recommended for development)\n  - `reduce-overhead`: Lower overhead, optimized for smaller models\n  - `max-autotune`: Slowest compilation, best runtime performance (recommended for production)\n  - `max-autotune-no-cudagraphs`: Like max-autotune but without CUDA graphs\n\n- **fullgraph**: Compile entire model as single graph without breaks\n  - `False`: Allow graph breaks for better compatibility (default, recommended)\n  - `True`: Enforce no breaks for maximum 
optimization (may fail with dynamic shapes)\n\n- **dynamic**: Handle varying input shapes without recompilation\n  - `False`: Specialize for exact input shapes (default)\n  - `True`: Create dynamic kernels that adapt to shape variations (enable when processing different resolutions or batch sizes)\n\n- **dynamo_cache_size_limit**: Max cached compiled versions per function (default: 64)\n  - Higher = more memory, lower = more recompilation\n\n- **dynamo_recompile_limit**: Max recompilation attempts before falling back to eager mode (default: 128)\n  - Safety limit to prevent compilation loops\n\n**Usage:**\n1. Add this node to your workflow\n2. Connect its output to the `torch_compile_args` input of DiT and\u002For VAE loader nodes\n3. First run will be slow (compilation), subsequent runs will be much faster\n\n**When to use:**\n- torch.compile only makes sense when processing **multiple batches, long videos, or many tiles**\n- For single images or short clips, the compilation time outweighs the speed improvement\n- Best suited for batch processing workflows or long videos\n\n**Recommended Settings:**\n- For development\u002Ftesting: `mode=default`, `backend=inductor`, `fullgraph=False`\n- For production: `mode=max-autotune`, `backend=inductor`, `fullgraph=False`\n\n#### 4. 
SeedVR2 Video Upscaler (Main Node)\n\n![SeedVR2 Video Upscaler](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_5bad7d99fbb9.png)\n\nMain upscaling node that processes video frames using DiT and VAE models.\n\n**Required Inputs:**\n\n- **image**: Input video frames as image batch (RGB or RGBA format)\n- **dit**: DiT model configuration from SeedVR2 (Down)Load DiT Model node\n- **vae**: VAE model configuration from SeedVR2 (Down)Load VAE Model node\n\n**Parameters:**\n\n- **seed**: Random seed for reproducible generation (default: 42)\n  - Same seed with same inputs produces identical output\n\n- **resolution**: Target resolution for shortest edge in pixels (default: 1080)\n  - Maintains aspect ratio automatically\n\n- **max_resolution**: Maximum resolution for any edge (default: 0 = no limit)\n  - Automatically scales down if exceeded to prevent OOM\n\n- **batch_size**: Frames per batch (default: 5)\n  - **CRITICAL REQUIREMENT**: Must follow the **4n+1 formula** (1, 5, 9, 13, 17, 21, 25, ...)\n  - **Why this matters**: The model uses these frames for temporal consistency calculations\n  - **Minimum 5 for temporal consistency**: Use 1 only for single images or when temporal consistency isn't needed\n  - **Match shot length ideally**: For best results, set batch_size to match your shot length (e.g., batch_size=21 for a 20-frame shot)\n  - **VRAM impact**: Higher batch_size = better quality and speed but requires more VRAM\n  - **If you get OOM with batch_size=5**: Try optimization techniques first (model offloading, BlockSwap, GGUF models...) 
before reducing batch_size or input resolution, as these directly impact quality\n\n- **uniform_batch_size**: Pad the final batch to full size (default: False)\n  - Pads the final batch to match `batch_size` for uniform processing\n  - Prevents temporal artifacts when the last batch is significantly smaller than others\n  - Example: 45 frames with `batch_size=33` creates [33, 33] instead of [33, 12]\n  - Recommended when using large batch sizes and video length is not a multiple of `batch_size`\n  - Increases VRAM usage slightly but ensures consistent temporal coherence across all batches\n\n- **temporal_overlap**: Overlapping frames between batches (default: 0)\n  - Used for blending between batches to reduce temporal artifacts\n  - Range: 0-16 frames\n\n- **prepend_frames**: Frames to prepend (default: 0)\n  - Prepends reversed frames to reduce artifacts at video start\n  - Automatically removed after processing\n  - Range: 0-32 frames\n\n- **color_correction**: Color correction method (default: \"wavelet\")\n  - **`lab`**: Full perceptual color matching with detail preservation (recommended for highest fidelity to original)\n  - **`wavelet`**: Frequency-based natural colors, preserves details well\n  - **`wavelet_adaptive`**: Wavelet base + targeted saturation correction\n  - **`hsv`**: Hue-conditional saturation matching\n  - **`adain`**: Statistical style transfer\n  - **`none`**: No color correction\n\n- **input_noise_scale**: Input noise injection scale 0.0-1.0 (default: 0.0)\n  - Adds noise to input frames to reduce artifacts at very high resolutions\n  - Try 0.1-0.3 if you see artifacts with high output resolutions\n\n- **latent_noise_scale**: Latent space noise scale 0.0-1.0 (default: 0.0)\n  - Adds noise during diffusion process, can soften excessive detail\n  - Use if `input_noise_scale` doesn't help, try 0.05-0.15\n\n- **offload_device**: Device for storing intermediate tensors between processing phases (default: \"cpu\")\n  - `none`: Keep all tensors on inference device (fastest but highest 
VRAM)\n  - `cpu`: Offload to system RAM (recommended for long videos, slower transfers)\n  - `cuda:X`: Offload to another GPU (good balance if available, faster than CPU)\n\n- **enable_debug**: Enable detailed debug logging (default: False)\n  - Shows memory usage, timing information, and processing details\n  - **Highly recommended** for troubleshooting OOM issues\n\n**Output:**\n- Upscaled video frames with color correction applied\n- Format (RGB\u002FRGBA) matches input\n- Range [0, 1] normalized for ComfyUI compatibility\n\n### Typical Workflow Setup\n\n**Basic Workflow (High VRAM - 24GB+)**:\n```\nLoad Video Frames\n    ↓\nSeedVR2 Load DiT Model\n  ├─ model: seedvr2_ema_3b_fp16.safetensors\n  └─ device: cuda:0\n    ↓\nSeedVR2 Load VAE Model\n  ├─ model: ema_vae_fp16.safetensors\n  └─ device: cuda:0\n    ↓\nSeedVR2 Video Upscaler\n  ├─ batch_size: 21\n  └─ resolution: 1080\n    ↓\nSave Video\u002FFrames\n```\n\n**Low VRAM Workflow (8-12GB)**:\n```\nLoad Video Frames\n    ↓\nSeedVR2 Load DiT Model\n  ├─ model: seedvr2_ema_3b-Q8_0.gguf\n  ├─ device: cuda:0\n  ├─ offload_device: cpu\n  ├─ blocks_to_swap: 32\n  └─ swap_io_components: True\n    ↓\nSeedVR2 Load VAE Model\n  ├─ model: ema_vae_fp16.safetensors\n  ├─ device: cuda:0\n  ├─ encode_tiled: True\n  └─ decode_tiled: True\n    ↓\nSeedVR2 Video Upscaler\n  ├─ batch_size: 5\n  └─ resolution: 720\n    ↓\nSave Video\u002FFrames\n```\n\n**High Performance Workflow (24GB+ with torch.compile)**:\n```\nLoad Video Frames\n    ↓\nSeedVR2 Torch Compile Settings\n  ├─ mode: max-autotune\n  └─ backend: inductor\n    ↓\nSeedVR2 Load DiT Model\n  ├─ model: seedvr2_ema_7b_sharp_fp16.safetensors\n  ├─ device: cuda:0\n  └─ torch_compile_args: connected\n    ↓\nSeedVR2 Load VAE Model\n  ├─ model: ema_vae_fp16.safetensors\n  ├─ device: cuda:0\n  └─ torch_compile_args: connected\n    ↓\nSeedVR2 Video Upscaler\n  ├─ batch_size: 81\n  └─ resolution: 1080\n    ↓\nSave Video\u002FFrames\n```\n\n## 🖥️ Run as Standalone (CLI)\n\nThe 
standalone CLI provides powerful batch processing capabilities with multi-GPU support and sophisticated optimization options.\n\n### Prerequisites\n\nChoose the appropriate setup based on your installation:\n\n#### Option 1: Already Have ComfyUI with SeedVR2 Installed\n\nIf you've already installed SeedVR2 as part of ComfyUI (via [ComfyUI installation](#-installation)), you can use the CLI directly:\n\n```bash\n# Navigate to your ComfyUI directory\ncd ComfyUI\n\n# Run the CLI using standalone Python (display help message)\n# Windows:\n.venv\\Scripts\\python.exe custom_nodes\\seedvr2_videoupscaler\\inference_cli.py --help\n# Linux\u002FmacOS:\n.venv\u002Fbin\u002Fpython custom_nodes\u002Fseedvr2_videoupscaler\u002Finference_cli.py --help\n```\n\n**Skip to [Command Line Usage](#command-line-usage) below.**\n\n#### Option 2: Standalone Installation (Without ComfyUI)\n\nIf you want to use the CLI without ComfyUI installation, follow these steps:\n\n1. **Install [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002Fgetting-started\u002Finstallation\u002F)** (modern Python package manager):\n```bash\n# Windows\npowershell -ExecutionPolicy ByPass -c \"irm https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.ps1 | iex\"\n\n# macOS and Linux\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n```\n\n2. **Clone the repository**:\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler.git seedvr2_videoupscaler\ncd seedvr2_videoupscaler\n```\n\n3. 
**Create virtual environment and install dependencies**:\n```bash\n# Create virtual environment with Python 3.13\nuv venv --python 3.13\n\n# Activate virtual environment\n# Windows:\n.venv\\Scripts\\activate\n# Linux\u002FmacOS:\nsource .venv\u002Fbin\u002Factivate\n\n# Install PyTorch with CUDA support\n# Check command line based on your environment: https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F\nuv pip install --pre torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fnightly\u002Fcu130\n\n# Install SeedVR2 requirements\nuv pip install -r requirements.txt\n\n# Run the CLI (display help message)\n# Windows:\n.venv\\Scripts\\python.exe inference_cli.py --help\n# Linux\u002FmacOS:\n.venv\u002Fbin\u002Fpython inference_cli.py --help\n```\n\n### Command Line Usage\n\nThe CLI provides comprehensive options for single-GPU, multi-GPU, and batch processing workflows.\n\n**Basic Usage Examples:**\n\n```bash\n# Basic image upscaling\npython inference_cli.py image.jpg\n\n# Basic video upscaling with temporal consistency\npython inference_cli.py video.mp4 --resolution 720 --batch_size 33\n\n# Streaming mode for long videos (memory-efficient) with 10-bit video output (requires FFMPEG)\n# Processes video in chunks of 330 frames to avoid loading entire video into RAM\n# Use --temporal_overlap to ensure smooth transitions between chunks\npython inference_cli.py long_video.mp4 \\\n    --resolution 1080 \\\n    --batch_size 33 \\\n    --chunk_size 330 \\\n    --temporal_overlap 3 \\\n    --video_backend ffmpeg \\\n    --10bit\n\n# Multi-GPU processing with temporal overlap\npython inference_cli.py video.mp4 \\\n    --cuda_device 0,1 \\\n    --resolution 1080 \\\n    --batch_size 81 \\\n    --uniform_batch_size \\\n    --temporal_overlap 3 \\\n    --prepend_frames 4\n\n# Memory-optimized for low VRAM (8GB)\npython inference_cli.py image.png \\\n    --dit_model seedvr2_ema_3b-Q8_0.gguf \\\n    --resolution 1080 \\\n    
--blocks_to_swap 32 \\\n    --swap_io_components \\\n    --dit_offload_device cpu \\\n    --vae_offload_device cpu\n\n# High resolution with VAE tiling\npython inference_cli.py video.mp4 \\\n    --resolution 1440 \\\n    --batch_size 33 \\\n    --uniform_batch_size \\\n    --temporal_overlap 3 \\\n    --vae_encode_tiled \\\n    --vae_decode_tiled\n\n# Batch directory processing with model caching\npython inference_cli.py media_folder\u002F \\\n    --output processed\u002F \\\n    --cuda_device 0 \\\n    --cache_dit \\\n    --cache_vae \\\n    --dit_offload_device cpu \\\n    --vae_offload_device cpu \\\n    --resolution 1080 \\\n    --max_resolution 1920\n```\n\n### Command Line Arguments\n\n**Input\u002FOutput:**\n- `\u003Cinput>`: Input file (.mp4, .avi, .png, .jpg, etc.) or directory\n- `--output`: Output path (default: auto-generated in 'output\u002F' directory)\n- `--output_format`: Output format: 'mp4' (video) or 'png' (image sequence). Default: auto-detect from input type\n- `--video_backend`: Video encoder backend: 'opencv' (default) or 'ffmpeg' (requires ffmpeg in PATH)\n- `--10bit`: Save 10-bit video with x265 codec and yuv420p10le pixel format (reduces banding in gradients). Without this flag, ffmpeg uses x264 (yuv420p) for maximum compatibility. Requires --video_backend ffmpeg\n- `--model_dir`: Model directory (default: .\u002Fmodels\u002FSEEDVR2)\n\n**Model Selection:**\n- `--dit_model`: DiT model to use. Options: 3B\u002F7B with fp16\u002Ffp8\u002FGGUF variants (default: 3B FP8)\n\n**Processing Parameters:**\n- `--resolution`: Target short-side resolution in pixels (default: 1080)\n- `--max_resolution`: Maximum resolution for any edge. Scales down if exceeded. 0 = no limit (default: 0)\n- `--batch_size`: Frames per batch (must follow 4n+1: 1, 5, 9, 13, 17, 21...). 
Ideally matches shot length for best temporal consistency (default: 5)\n- `--seed`: Random seed for reproducibility (default: 42)\n- `--skip_first_frames`: Skip N initial frames (default: 0)\n- `--load_cap`: Maximum total frames to load from video. 0 = load all (default: 0)\n- `--chunk_size`: Frames per chunk for streaming mode. When > 0, processes video in memory-bounded chunks of N frames, writing each chunk before loading the next. Essential for long videos that would otherwise exceed RAM. Use with `--temporal_overlap` for seamless chunk transitions. 0 = load all frames at once (default: 0)\n- `--prepend_frames`: Prepend N reversed frames to reduce start artifacts (auto-removed) (default: 0)\n- `--temporal_overlap`: Frames to overlap between batches\u002FGPUs for smooth blending (default: 0)\n\n**Quality Control:**\n- `--color_correction`: Color correction method: 'lab' (perceptual, recommended), 'wavelet', 'wavelet_adaptive', 'hsv', 'adain', or 'none' (default: lab)\n- `--input_noise_scale`: Input noise injection scale (0.0-1.0). Reduces artifacts at high resolutions (default: 0.0)\n- `--latent_noise_scale`: Latent space noise scale (0.0-1.0). Softens details if needed (default: 0.0)\n\n**Memory Management:**\n- `--dit_offload_device`: Device to offload DiT model: 'none' (keep on GPU), 'cpu', or 'cuda:X' (default: none)\n- `--vae_offload_device`: Device to offload VAE model: 'none', 'cpu', or 'cuda:X' (default: none)\n- `--blocks_to_swap`: Number of transformer blocks to swap (0=disabled, 3B: 0-32, 7B: 0-36). Requires dit_offload_device (default: 0). Not available on macOS.\n- `--swap_io_components`: Offload I\u002FO components for additional VRAM savings. Requires dit_offload_device. 
Not available on macOS.\n\n**VAE Tiling:**\n- `--vae_encode_tiled`: Enable VAE encode tiling to reduce VRAM during encoding\n- `--vae_encode_tile_size`: VAE encode tile size in pixels (default: 1024)\n- `--vae_encode_tile_overlap`: VAE encode tile overlap in pixels (default: 128)\n- `--vae_decode_tiled`: Enable VAE decode tiling to reduce VRAM during decoding\n- `--vae_decode_tile_size`: VAE decode tile size in pixels (default: 1024)\n- `--vae_decode_tile_overlap`: VAE decode tile overlap in pixels (default: 128)\n- `--tile_debug`: Visualize tiles: 'false' (default), 'encode', or 'decode'\n\n**Performance Optimization:**\n- `--allow_vram_overflow`: Allow VRAM overflow to system RAM. Prevents OOM but may cause severe slowdown\n- `--attention_mode`: Attention backend: 'sdpa' (default), 'flash_attn_2' (Ampere+), 'flash_attn_3' (Hopper+), 'sageattn_2', or 'sageattn_3' (Blackwell)\n- `--compile_dit`: Enable torch.compile for DiT model (20-40% speedup, requires PyTorch 2.0+ and Triton)\n- `--compile_vae`: Enable torch.compile for VAE model (15-25% speedup, requires PyTorch 2.0+ and Triton)\n- `--compile_backend`: Compilation backend: 'inductor' (full optimization) or 'cudagraphs' (lightweight) (default: inductor)\n- `--compile_mode`: Optimization level: 'default', 'reduce-overhead', 'max-autotune', 'max-autotune-no-cudagraphs' (default: default)\n- `--compile_fullgraph`: Compile entire model as single graph (faster but less flexible) (default: False)\n- `--compile_dynamic`: Handle varying input shapes without recompilation (default: False)\n- `--compile_dynamo_cache_size_limit`: Max cached compiled versions per function (default: 64)\n- `--compile_dynamo_recompile_limit`: Max recompilation attempts before fallback (default: 128)\n\n**Model Caching (batch processing):**\n- `--cache_dit`: Keep DiT model in memory between generations. Works with single-GPU directory processing or multi-GPU streaming (`--chunk_size`). 
Requires `--dit_offload_device`\n- `--cache_vae`: Keep VAE model in memory between generations. Works with single-GPU directory processing or multi-GPU streaming (`--chunk_size`). Requires `--vae_offload_device`\n\n**Multi-GPU:**\n- `--cuda_device`: CUDA device id(s). Single id (e.g., '0') or comma-separated list '0,1' for multi-GPU\n\n**Debugging:**\n- `--debug`: Enable verbose debug logging\n\n### Multi-GPU Processing Explained\n\nThe CLI's multi-GPU mode uses **frame-level parallelism**: the video is split into chunks and each GPU processes its chunk independently through all 4 phases (encode → upscale → decode → postprocess). This is ideal for long videos where you want to reduce total processing time by dividing the workload.\n\n**How it works:**\n1. Video frames are split evenly across GPUs (e.g., 100 frames on 2 GPUs → 50 frames each)\n2. Each GPU loads its own copy of the models and processes its chunk independently\n3. When `--temporal_overlap` is set, chunks include overlapping frames for seamless blending\n4. 
Results are concatenated (and blended at overlap regions) into the final video\n\n**Example for 100 frames on 2 GPUs with temporal_overlap=4:**\n```\nGPU 0: Frames 0-53 (50 base + 4 overlap at end, processed as independent video)\nGPU 1: Frames 50-99 (50 frames, 4 overlap at start, processed as independent video)\nResult: Frames 0-99 with smooth blending at the transition point\n```\n\n**Important considerations:**\n- Each GPU processes its chunk as a separate video with its own batch splitting\n- `batch_size` controls batching *within* each GPU's chunk, not across GPUs\n- For short videos (\u003C 100 frames), single GPU is often more efficient due to model loading overhead\n- Multi-GPU doubles VRAM usage (each GPU loads full models) but roughly halves processing time\n\n**When to use multi-GPU:**\n- Long videos (100+ frames) where splitting provides significant time savings\n- When you have multiple GPUs with sufficient VRAM each\n\n**When to use single GPU:**\n- Short videos where model loading overhead outweighs parallel gains\n- When you want all frames processed together for maximum temporal coherence\n\n**Best practices:**\n- Set `--temporal_overlap` to 2-4 frames for smooth blending between GPU chunks\n- Higher overlap = smoother transitions but more redundant processing\n- Use `--prepend_frames` to reduce artifacts at video start\n- For optimal quality on short videos, use single GPU with `batch_size` matching your shot length\n\n## ⚠️ Limitations\n\n### Model Limitations\n\n**Batch Size Constraint**: The model requires batch_size to follow the **4n+1 formula** (1, 5, 9, 13, 17, 21, 25, ...) due to temporal consistency architecture. All frames in a batch are processed together for temporal coherence, then batches can be blended using temporal_overlap. 
Ideally, set batch_size to match your shot length for optimal quality.\n\n### Performance Considerations\n\n**VAE Bottleneck**: Even with optimized DiT upscaling (BlockSwap, GGUF, torch.compile), the VAE encoding\u002Fdecoding stages can be the bottleneck, especially for high resolutions. The VAE is slow. Use large batch_size to mitigate this.\n\n**VRAM Usage**: While the integration now supports low VRAM systems (8GB or less with proper optimization), VRAM usage varies based on:\n- Input\u002Foutput resolution (larger = more VRAM)\n- Batch size (higher = more VRAM but better temporal consistency and speed)\n- Model choice (FP16 > FP8 > GGUF in VRAM usage)\n- Optimization settings (BlockSwap, VAE tiling significantly reduce VRAM)\n\n**Speed**: Processing speed depends on:\n- GPU capabilities (compute performance, VRAM bandwidth, and architecture generation)\n- Model size (3B faster than 7B)\n- Batch size (larger batch sizes are faster per frame due to better GPU utilization)\n- Optimization settings (torch.compile provides significant speedup)\n- Resolution (higher resolutions are slower)\n\n### Best Practices\n\n1. **Start with debug enabled** to understand where VRAM is being used\n2. **For OOM errors during encoding**: Enable VAE encode tiling and reduce tile size\n3. **For OOM errors during upscaling**: Enable BlockSwap and increase blocks_to_swap\n4. **For OOM errors during decoding**: Enable VAE decode tiling and reduce tile size\n   - **If still getting OOM after trying all above**: Reduce batch_size or resolution\n5. **For best quality**: Use higher batch_size matching your shot length, FP16 models, and LAB color correction\n6. **For speed**: Use FP8\u002FGGUF models, enable torch.compile, and use Flash Attention if available\n7. **Test settings with a short clip first** before processing long videos\n\n## 🤝 Contributing\n\nContributions are welcome! 
We value community input and improvements.\n\nFor detailed contribution guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md).\n\n**Quick Start:**\n\n1. Fork the repository\n2. Create your feature branch (`git checkout -b feature\u002FAmazingFeature`)\n3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the branch (`git push origin feature\u002FAmazingFeature`)\n5. Open a Pull Request to the **main** branch\n\n**Get Help:**\n- YouTube: [AInVFX Channel](https:\u002F\u002Fwww.youtube.com\u002F@AInVFX)\n- GitHub [Issues](https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fissues): For bug reports and feature requests\n- GitHub [Discussions](https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fdiscussions): For questions and community support\n- Discord: adrientoupet & NumZ#7184\n\n## 🙏 Credits\n\nThis ComfyUI implementation is a collaborative project by **[NumZ](https:\u002F\u002Fgithub.com\u002Fnumz)** and **[AInVFX](https:\u002F\u002Fwww.youtube.com\u002F@AInVFX)** (Adrien Toupet), based on the original [SeedVR2](https:\u002F\u002Fgithub.com\u002FByteDance-Seed\u002FSeedVR) by ByteDance Seed Team.\n\nSpecial thanks to our community contributors including [naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1), [thehhmdb](https:\u002F\u002Fgithub.com\u002Fthehhmdb), [s-cerevisiae](https:\u002F\u002Fgithub.com\u002Fs-cerevisiae), [benjaminherb](https:\u002F\u002Fgithub.com\u002Fbenjaminherb), [cmeka](https:\u002F\u002Fgithub.com\u002Fcmeka), [FurkanGozukara](https:\u002F\u002Fgithub.com\u002FFurkanGozukara), [JohnAlcatraz](https:\u002F\u002Fgithub.com\u002FJohnAlcatraz), [lihaoyun6](https:\u002F\u002Fgithub.com\u002Flihaoyun6), [Luchuanzhao](https:\u002F\u002Fgithub.com\u002FLuchuanzhao), [Luke2642](https:\u002F\u002Fgithub.com\u002FLuke2642), [proxyid](https:\u002F\u002Fgithub.com\u002Fproxyid), [q5sys](https:\u002F\u002Fgithub.com\u002Fq5sys), and many others for their improvements, bug fixes, 
and testing.\n\n## 📜 License\n\nThe code in this repository is released under the Apache 2.0 license as found in the [LICENSE](LICENSE) file.","# ComfyUI-SeedVR2_视频超分辨率\n\n[![查看代码](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F📂_View_Code-GitHub-181717?style=for-the-badge&logo=github)](https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler)\n\n这是为 ComfyUI 官方发布的 [SeedVR2](https:\u002F\u002Fgithub.com\u002FByteDance-Seed\u002FSeedVR) 插件，可实现高质量的视频和图像超分辨率。\n\n它也可以作为 **多 GPU 独立 CLI** 运行，详情请参阅 [🖥️ 作为独立 CLI 运行](#-run-as-standalone-cli) 部分。\n\n[![SeedVR2 v2.5 深度解析教程](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_c1df29b1d2ae.jpg)](https:\u002F\u002Fyoutu.be\u002FMBtWYXq_r60)\n\n![使用示例](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_3c9839b686c7.png)\n\n![使用示例](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_04a1bbdc12cf.png)\n\n## 📋 快速访问\n\n- [🆙 未来工作](#-future-work)\n- [🚀 发布说明](#-release-notes)\n- [🎯 特性](#-features)\n- [🔧 要求](#-requirements)\n- [📦 安装](#-installation)\n- [📖 使用](#-usage)\n- [🖥️ 作为独立 CLI 运行](#️-run-as-standalone-cli)\n- [⚠️ 限制](#️-limitations)\n- [🤝 贡献](#-contributing)\n- [🙏 致谢](#-credits)\n- [📜 许可证](#-license)\n\n## 🆙 未来工作\n\n我们正在积极开发改进和新功能。如需了解最新动态：\n\n- **📌 跟踪开发进展**：访问 [Issues](https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fissues) 查看当前开发情况、报告问题或提出新功能需求。\n- **💬 加入社区**：在 [Discussions](https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fdiscussions) 中向他人学习、分享你的工作流并获得帮助。\n- **🔮 下一代模型调查**：我们正在征求社区意见，以确定下一个开源的超强通用修复模型。请在 [Issue #164](https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fissues\u002F164) 中提出你的建议。\n\n## 🚀 发布说明\n\n**2025年12月24日 - 版本 2.5.24**\n\n- **🍎 修复：MPS 内存泄漏回归问题** - 恢复了在 VAE 编码\u002F解码操作后清除 MPS 缓存的功能，该功能在 v2.5.23 的代码清理过程中被意外移除。\n\n**2025年12月24日 - 版本 2.5.23**\n\n- **🔒 安全性：防止模型加载时执行代码** - 
通过仅允许反序列化张量来增加对恶意 .pth 文件的防护。\n- **🎥 修复：FFmpeg 视频写入器可靠性问题** - 通过重定向 stderr 并添加缓冲区刷新解决了 ffmpeg 进程挂起的问题，并改进了错误信息以便调试 *(感谢 [@thehhmdb](https:\u002F\u002Fgithub.com\u002Fthehhmdb))*\n- **⚡ 修复：GGUF VAE 模型支持** - 启用了卷积运算的自动权重去量化功能，使 GGUF 量化的 VAE 模型完全可用 *(感谢 [@naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1))*\n- **🛡️ 修复：VAE 切片边缘情况** - 在使用小分割尺寸和高时间下采样时，防止了因除以零而导致的崩溃 *(感谢 [@naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1))*\n- **🎨 修复：LAB 颜色转换精度问题** - 通过确保矩阵运算前浮点类型一致，解决了视频超分辨率过程中的 dtype 不匹配错误。\n- **🔧 修复：PyTorch 2.9+ 兼容性问题** - 将 Conv3d 内存绕过方案扩展到所有 PyTorch 2.9+ 版本，修复了较新版本 PyTorch 上 VRAM 使用量增加两倍的问题。\n- **📦 修复：Bitsandbytes 兼容性问题** - 添加了针对非 Gaudi 系统上 Intel Gaudi 版本检测失败的 ValueError 异常处理。\n- **🍎 MPS：内存优化** - 减少了 Apple Silicon 上编码\u002F解码操作期间的内存占用 *(感谢 [@s-cerevisiae](https:\u002F\u002Fgithub.com\u002Fs-cerevisiae))*\n\n\n**2025年12月13日 - 版本 2.5.22**\n\n- **🎬 CLI：支持 10 位的 FFmpeg 视频后端** - 新增 `--video_backend ffmpeg` 和 `--10bit` 标志，启用 x265 编码并支持 10 位色深，与 8 位 OpenCV 输出相比，可减少渐变中的条带伪影 *(基于 [@thehhmdb](https:\u002F\u002Fgithub.com\u002Fthehhmdb) 的 PR - 感谢！)*\n- **🍎 修复：MPS 双三次插值兼容性问题** - 在 PyTorch 2.8.0 之前的版本中，增加了双三次+抗锯齿插值的 CPU 备用方案，解决了 Apple Silicon 上 RGBA 透明度插值时出现的错误。\n- **⚡ 修复：跨平台直方图匹配问题** - 用 argsort+index_select 替代 scatter_ 操作，提高了在 CUDA、ROCm 和 MPS 后端上的可靠性。\n- **🧹 MPS：移除同步开销** - 恢复了在 v2.5.21 中引入的不必要的 `torch.mps.synchronize()` 调用，以保持与 CUDA 流水线的一致行为。\n\n**2025年12月12日 - 版本 2.5.21**\n\n- **🛠️ 修复：MPS 上 GGUF 去量化错误** - 解决了在 2.5.20 中引入的形状不匹配错误，原因是跳过了 GGUF 量化缓冲区进行精度转换——这些缓冲区必须保持打包格式，以便在推理过程中进行即时去量化。\n- **🍎 MPS：消除 CPU 同步开销** - 在 Apple Silicon 统一内存架构上，跳过不必要的 CPU 张量卸载操作，避免了导致速度下降的同步停滞。输入图像和输出视频现在在整个流程中都保留在 MPS 设备上。\n- **⚡ MPS：预加载文本嵌入** - 在第一阶段编码之前加载文本嵌入，以避免第二阶段开始时的同步停滞，从而提高时间准确性和吞吐量。\n- **🧹 MPS：优化模型清理** - 在统一内存上删除模型前，跳过冗余的 CPU 移动操作。\n\n**2025年12月12日 - 版本 2.5.20**\n\n- **⚡ 扩展注意力机制后端** - 完全支持 Flash Attention 2 (Ampere+)、Flash Attention 3 (Hopper+)、SageAttention 2 和 SageAttention 3 (Blackwell\u002FRTX 50xx)，并在不可用时自动回退到 PyTorch SDPA *(基于 [@naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1) 的 PR - 
感谢！)*\n- **🍎 macOS\u002FApple Silicon 兼容性** - 在整个 VAE 和 DiT 流程中，将 MPS 自动混合精度替换为显式的数据类型转换，解决了 M 系列 Mac 上的卡顿和崩溃问题。BlockSwap 现在会自动禁用并发出警告（由于统一内存的存在，它已无意义）。\n- **🛡️ Flash Attention 优雅回退** - 添加了针对损坏或部分安装的 flash_attn\u002Fxformers DLL 的兼容性适配层，防止启动时崩溃。\n- **🛡️ AMD ROCm：解决 bitsandbytes 冲突问题** - 防止了当 diffusers 尝试重新导入损坏的 bitsandbytes 安装时出现的内核注册错误。\n- **📦 ComfyUI Manager：macOS 分类器修复** - 移除了导致 macOS 上出现“不支持 GPU”虚假警告的 NVIDIA CUDA 分类器。\n- **📚 文档更新** - 更新了 README，加入了注意力机制后端的详细信息、BlockSwap 的 macOS 注意事项以及更清晰的模型缓存描述。\n\n**2025年12月10日 - 版本 2.5.19**\n\n- **🎨 新的页眉 logo 设计** - 更新了 ASCII 艺术横幅 *(感谢 [@naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1))*\n- **🧹 移除已废弃的 FlashAttention 包装器** - 从 FP8CompatibleDiT 中移除了遗留代码；FlashAttentionVarlen 已通过其 `attention_mode` 属性自动处理后端切换\n- **🛡️ 修复 FlashAttention 的优雅降级机制** - 添加了针对损坏的 flash_attn\u002Fxformers DLL 文件的兼容性适配层，防止 CUDA 扩展损坏时启动崩溃\n- **📊 改进显存跟踪功能** - 分离已分配与预留内存的跟踪，仅在 Windows 上实现溢出检测（WDDM 分页行为）\n- **♻️ 统一后端检测逻辑** - 在整个代码库中统一了 `is_mps_available()`、`is_cuda_available()` 和 `get_gpu_backend()` 辅助函数\n- **🔄 恢复 2.5.14 版本的显存限制强制执行** - 移除了 `set_per_process_memory_fraction` 调用；溢出检测和警告功能仍保留。\n\n**2025年12月9日 - 版本 2.5.18**\n\n- **🚀 CLI：长视频流式处理模式** - 新增 `--chunk_size` 标志，以受内存限制的分块方式处理视频，从而支持任意长度的视频而无 RAM 限制。配合模型缓存功能（`--cache_dit`\u002F`--cache_vae`），可在不同分块间重复使用缓存数据 *(灵感来自 [disk02](https:\u002F\u002Fgithub.com\u002Fdisk02) 的 PR 贡献)*\n- **⚡ CLI：多 GPU 流式处理** - 现在每个 GPU 都会独立进行内部分段处理，并采用独立的模型缓存策略，从而提升内存效率，并支持在 GPU 边界处启用 `--temporal_overlap` 混合功能\n- **🔧 CLI：修复大视频内存错误** - 使用共享内存传输替代 NumPy 序列化，避免在高分辨率或长视频输出时发生崩溃 *(灵感来自 [FurkanGozukara](https:\u002F\u002Fgithub.com\u002FFurkanGozukara) 的 PR 贡献)*\n\n**2025年12月5日 - 版本 2.5.17**\n\n- **🔧 修复：旧款 GPU 兼容性（如 GTX 970 等）** - 运行时 bf16 CUBLAS 探测取代了计算能力启发式方法，能够正确检测不支持的 GPU，同时不影响 RTX 20XX 系列\n\n**2025年12月5日 - 版本 2.5.16**\n\n- **🔧 修复：旧款 GPU 兼容性（如 GTX 970 等）** - 自动回退至不支持 bfloat16 的 GPU\n- **🐛 修复：质量回归问题** - 恢复了导致伪影问题的 bfloat16 检测逻辑\n- **📋 调试：环境信息显示** - 在调试模式下显示系统信息，便于问题报告\n- **📚 文档：简化贡献流程** - 流程现已精简为仅向主分支提交\n\n**2025年12月3日 - 版本 
2.5.15**\n\n- **🍎 修复：MPS 兼容性** - 禁用 MPS 张量的抗锯齿功能，并修复 bfloat16 arange 的问题\n- **⚡ 修复：自动混合精度设备类型** - 使用正确的设备类型属性，防止自动混合精度错误\n- **📊 内存：精确显存跟踪** - 使用 max_memory_reserved 提供更精准的峰值报告\n- **🔧 修复：Triton 兼容性** - 为 bitsandbytes 0.45+ 和 triton 3.0+ 添加适配层，解决 PyTorch 2.7 安装错误\n\n**2025年12月1日 - 版本 2.5.14**\n\n- **🍎 修复：MPS 设备比较** - 规范设备字符串，防止不必要的张量移动\n- **📊 内存：显存交换检测** - 峰值统计现在会显示 GPU 和交换内存的细分数据，当发生溢出时还会发出警告\n- **🛡️ 内存：强制执行物理显存上限** - PyTorch 现在会在超出显存限制时直接抛出 OOM 错误，而不是默默切换到共享内存（可防止 Windows 系统上的极端性能下降）\n  \n**2025年11月30日 - 版本 2.5.13**\n\n- **🔧 修复：PyTorch 2.7+ Triton 导入错误** - 解决了因较新版本 Triton 中 triton.ops 导入链导致的安装崩溃问题\n- **💾 修复：长视频转 float32 时的 OOM 错误** - 当内存不足以进行 float32 转换时，将优雅地回退到原生数据类型\n- **🍎 修复：macOS 上 CLI 水印错误** - 解决了 Apple Silicon 上 MPS 相关水印处理崩溃的问题\n\n**2025年11月28日 - 版本 2.5.12**\n\n- **🐛 修复：颜色伪影回归问题** - 恢复了视频转换流水线中的原位张量操作，该操作曾导致部分图像出现颜色伪影\n\n**2025年11月28日 - 版本 2.5.11**\n\n- **⚡ 功能：CUDNN 注意力后端** - 添加了对 PyTorch 2.3+ CUDNN_ATTENTION 后端的支持，并为旧版本提供自动回退机制（感谢 @eadwu）\n- **💾 修复：长视频内存激增问题** - VAE 解码现在直接流式写入预分配的张量，从而消除长视频处理过程中的 OOM 错误\n- **🎨 修复：LAB 颜色校正伪影** - 使用小波重建预处理解决了瓦片边界伪影问题\n- **🎨 修复：颜色参考错位** - 修正了颜色校正帧与时间重叠之间的对齐问题\n- **🍎 修复：MPS 检测可靠性** - 切换到规范的 `torch.backends.mps.is_available()` API，以确保一致的 Apple Silicon 检测结果\n- **🖥️ 修复：Mac 子进程错误** - CLI 现在在 Mac 上直接运行，避免子进程中 MPS 分配器失败的情况\n- **🖥️ 修复：多 GPU 设备分配** - 现在会在进程创建前设置 CUDA_VISIBLE_DEVICES，以确保工作进程正确继承设备\n- **📊 修复：BlockSwap 日志记录** - 现在显示有效\u002F总块数（例如 32\u002F32），而非原始请求值\n- **🔧 功能：自动 bfloat16 检测** - 自动检测 bfloat16 支持情况，以防止旧款 GPU 上出现 CUBLAS 错误\n- **📊 功能：峰值内存跟踪** - 在调试摘要中增加了内存使用情况，与显存一同显示\n- **⚡ 性能：原位张量操作** - 通过在整个流水线中使用原位操作，减少了内存分配开销\n- **📖 文档：多 GPU 说明** - 明确了多 GPU 设置下帧级并行性的预期行为\n\n**2025年11月13日 - 版本 2.5.10**\n\n- **🎯 修复：确定性生成** - 使用相同种子生成的图像，在不同会话和批次位置上都会产生完全一致的结果\n- **🔧 修复：带 BlockSwap 的模型缓存** - 解决了当 VAE 缓存状态发生变化时，已缓存的 DiT 模型无法正确重新加载的问题\n- **💾 修复：Runner 缓存优化** - Runner 模板现在无论缓存顺序如何，只要 DiT 和 VAE 都被缓存，就会正确缓存\n- **📁 修复：模型路径大小写不敏感** - YAML 配置中的额外模型路径现在无论大小写如何都能正常工作（seedvr2、SEEDVR2、SeedVR2 等）\n- **🐛 修复：高分辨率瓦片调试崩溃** - 修复了在使用最大分辨率且启用 VAE 瓦片化时出现的 
“NoneType 对象没有 log 属性” 错误\n- **📊 修复：时间重叠日志记录** - 当时间重叠自动调整时，修正了帧数报告问题\n- **🔍 功能：增强模型路径调试** - 添加了详细日志记录，帮助排查模型加载问题（可在调试模式下查看）\n\n**2025年11月12日 - 版本 2.5.9**\n\n- **🐛 修复：瓦片调试可视化崩溃** - 修复了在某些系统上使用 VAE 瓦片调试模式时出现的 OpenCV 错误。\n- **🍎 修复：macOS MPS 加载错误** - 在某些 PyTorch\u002FmacOS 版本中，针对 MPS 分配器问题添加了自动回退到 CPU 的机制。\n- **🖥️ 修复：Windows 日志缓冲** - 在 Windows 上的 ComfyUI 中，为 print 语句添加了刷新功能，以实现实时日志可见性。\n- **📦 修复：ComfyUI 注册表 logo** - 更新了图标 URL，使其在 ComfyUI 节点注册表中正常显示。\n- **ℹ️ 功能：版本显示** - 在节点名称以及 CLI\u002FComfyUI 头部添加了版本号，以便更好地追踪版本。\n- **💝 功能：GitHub 赞助** - 添加了赞助按钮，用于支持项目开发。感谢大家的支持！\n- **📜 许可证：Apache 2.0** - 将许可证从 MIT 重新改为 Apache 2.0，以与字节跳动 Seed 项目保持一致。\n\n**2025年11月10日 - 版本 2.5.8**\n\n- **🐛 修复（CLI）：Windows 批处理重复文件问题** - 修复了由于 Windows 文件系统不区分大小写导致 CLI 批处理模式下每个文件被处理两次的问题。同时将目录扫描性能提升了 2-3 倍。\n- **📁 修复（CLI）：输出文件夹位置** - 现在输出文件会创建在更合理的路径下：批处理模式会在原文件所在目录下创建一个同级的 `{folder_name}_upscaled\u002F` 文件夹，并保留原始文件名；单文件模式则会在同一目录下添加 `_upscaled` 后缀。所有日志现在都会显示绝对路径，以提高清晰度。\n- **🎨 修复（CLI）：RGBA 透明通道支持** - 现在带有透明度的 PNG 图像会被正确检测并保留在超分辨率处理流程中，与 ComfyUI 的行为一致。\n\n**2025年11月10日 - 版本 2.5.7**\n\n- **🔧 修复：Conv3d 替代方案兼容性** - 改进了平台检测，并添加了优雅的回退机制，以防止在 PyTorch 开发版和 AMD ROCm 系统上出现错误。\n\n**2025年11月9日 - 版本 2.5.6**\n\n- 🎨 **修复：恢复 7b 模型的自然外观** - 修正了 torch.compile 优化导致的过度塑料感和高光泽度问题，该问题曾出现在使用 7b 模型进行超分辨率处理的视频中。\n- 💾 **内存：修复长视频的内存泄漏问题** - 采用按需重建的方式，使用轻量级的批次索引代替存储完整的变换后视频；修复了 release_tensor_memory 函数，使其能够一致地处理 CPU\u002FCUDA\u002FMPS 内存释放问题，并重构了批次处理辅助函数。\n\n**2025年11月8日 - 版本 2.5.4**\n\n- 🎨 **修复：AdaIN 颜色校正** - 将 `.view()` 替换为 `.reshape()`，以处理空间填充后的非连续张量，从而解决“视图大小与输入张量的大小和步幅不兼容”的错误。\n- 🔴 **修复：AMD ROCm 兼容性** - 在 Conv3d 替代方案中添加了 cuDNN 可用性检查，以防止在 ROCm 系统（AMD GPU 在 Windows\u002FLinux 上）上出现“ATen 未编译 cuDNN 支持”的错误。\n\n**2025年11月8日 - 版本 2.5.3**\n\n- 🍎 **修复：Apple Silicon MPS 设备处理** - 修正了 MPS 设备枚举逻辑，使用 `\"mps\"` 而不是 `\"mps:0\"`，从而解决了 M 系列 Mac 上的无效设备错误。\n- 🪟 **修复：Windows 上的 torch.mps AttributeError** - 添加了对 `torch.mps.is_available()` 的防御性检查，以应对在非 Mac 平台上该方法不存在的 PyTorch 版本。\n\n**2025年11月7日 - 版本 2.5.0** 🎉\n\n⚠️ **重大变更**：这是一个需要重新创建工作流的重大更新。所有节点和 
CLI 参数都经过重新设计，以提升易用性和一致性。请观看 [AInVFX](https:\u002F\u002Fwww.youtube.com\u002F@AInVFX) 的最新视频，深入了解此次更新，并查看[使用说明](#-usage)部分。\n\n**📦 官方发布**：现已在主分支上线，并支持 ComfyUI Manager，方便用户安装及自动跟踪版本。更新的依赖项和本地导入避免了与其他 ComfyUI 自定义节点的冲突。\n\n\n\n### 🎨 ComfyUI 改进\n\n- **四节点模块化架构**：拆分为专门的 DiT 模型、VAE 模型、torch.compile 设置以及主超分辨率节点，实现更精细的控制。\n- **全局模型缓存**：模型现在可在多个超分辨率实例之间共享，并自动更新配置，无需重复加载。\n- **ComfyUI V3 迁移**：完全兼容 ComfyUI V3 的无状态节点设计。\n- **RGBA 支持**：原生支持透明通道处理，并通过边缘引导的超分辨率技术实现干净的透明效果。\n- **改进的内存管理**：流式架构可防止无论视频长度如何，VRAM 都不会出现峰值。\n- **灵活的分辨率支持**：可将视频或图像放大到任何能被 2 整除的分辨率，采用无损填充方式（取代了限制性的裁剪操作）。\n- **增强的参数设置**：新增了 `uniform_batch_size`、`temporal_overlap`、`prepend_frames` 和 `max_resolution` 等参数，以提供更好的控制。\n\n### 🖥️ CLI 增强\n\n- **批量目录处理**：高效地处理包含视频\u002F图片的整个文件夹，同时利用模型缓存提升效率。\n- **单张图片支持**：直接对图片进行超分辨率处理，无需转换为视频。\n- **智能输出检测**：根据输入类型自动检测输出格式（MP4\u002FPNG）。\n- **多 GPU 性能提升**：通过时间重叠混合技术，进一步优化了负载分配。\n- **统一参数**：CLI 和 ComfyUI 现在使用相同的参数名称，以确保一致性。\n- **更好的用户体验**：自动显示帮助信息、改进验证机制、跟踪进度，并使输出更加整洁。\n\n### ⚡ 性能与优化\n\n- **torch.compile 支持** - 使用完整图编译后，DiT 速度提升 20-40%，VAE 速度提升 15-25%。\n- **优化的 BlockSwap** - 采用自适应内存清理（阈值为 5%），分离 I\u002FO 组件处理，降低开销。\n- **增强的 VAE 瓦片处理** - 支持张量卸载至累积缓冲区，并可单独配置编码和解码过程。\n- **原生数据类型流水线** - 消除了不必要的类型转换，在整个流程中保持 bfloat16 精度，以兼顾速度和质量。\n- **优化的张量操作** - 用原生 PyTorch 操作替换了 einops 的重新排列，使转换速度提升了 2-5 倍。\n\n### 🎯 质量提升\n\n- **LAB 颜色校正** - 引入新的感知色彩转移方法，具有更出色的色彩准确性（现为默认设置）。\n- **额外的颜色方法** - 包括 HSV 饱和度匹配、小波自适应以及混合方法。\n- **确定性生成** - 基于种子的可重复性，采用特定阶段的种子策略。\n- **更好的时间一致性** - 使用汉宁窗混合技术，使不同批次之间的过渡更加平滑。\n\n### 💾 内存管理\n\n- **更智能的卸载** - 对 DiT、VAE 和张量分别进行独立的设备配置（CPU\u002FGPU\u002F无）。\n- **四阶段流水线** - 每个批次的所有阶段（编码→超分辨率→解码→后处理）完成后才会进入下一阶段，从而最大限度地减少模型切换。\n- **更好的清理** - 根据阶段进行资源管理，并正确释放张量内存。\n- **峰值 VRAM 跟踪** - 对每个阶段的内存使用情况进行监控，并提供汇总显示。\n\n### 🔧 技术改进\n\n- **GGUF量化支持**：新增对低显存设备上4位\u002F8位推理的完整GGUF支持\n- **GGUF处理优化**：修复了显存泄漏问题，增强了与torch.compile的兼容性，并解决了非持久化缓冲区问题。\n- **Apple Silicon支持**：全面支持Apple Silicon Mac上的MPS（Metal Performance Shaders）。\n- **AMD ROCm兼容性**：为PyTorch ROCm 7+版本添加了条件性的FSDP导入支持。\n- 
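**Conv3d memory workaround**: Fixed the cuDNN memory issue on PyTorch 2.9+, cutting memory usage to one third.\n- **Flash Attention optional**: Gracefully falls back to SDPA when Flash Attention is unavailable.\n\n### 📚 Code Quality\n\n- **Modular architecture**: Split the monolithic file into focused modules (generation_phases, model_configuration, etc.).\n- **Comprehensive documentation**: All modules have thorough docstrings and type hints.\n- **Better error handling**: Early input validation with clear error messages and installation instructions.\n- **Consistent logging**: Unified indentation, clearer categories, and more concise log messages.\n\n**August 7, 2025**\n\n- 🎯 **Unified debug system**: New structured logging with categories, timers, and memory tracking. The main node now has an `enable_debug` option.\n- ⚡ **Smart FP8 optimization**: FP8 models now keep native FP8 storage and convert to BFloat16 only for arithmetic, which is faster and more memory-efficient than FP16.\n- 📦 **Model registry**: Multi-repository support (numz\u002F & AInVFX\u002F), automatic discovery of user models, and new mixed FP8 variants that fix artifacts in the 7B models.\n- 💾 **Model caching**: `cache_model` moved to the main node, and memory leaks were fixed with proper RoPE and wrapper cleanup.\n- 🧹 **Code cleanup**: New modular structure (constants.py, model_registry.py, debug.py) and legacy code removed.\n- 🚀 **Performance**: Better VRAM management via `torch.cuda.ipc_collect()` and improved RoPE handling.\n\n**July 17, 2025**\n\n- 🛠️ New 7B sharp models: Added two 7B models that produce sharper output.\n\n**July 11, 2025**\n\n- 🎬 Complete tutorial released: Adrien from [AInVFX](https:\u002F\u002Fwww.youtube.com\u002F@AInVFX) created an in-depth ComfyUI SeedVR2 guide covering everything from basic setup to advanced BlockSwap techniques for running on consumer GPUs. It is a good way to understand memory optimization and upscaling image sequences with alpha channels! [Watch the tutorial](#-usage)\n\n**July 9, 2025**\n\n- 🛠️ BlockSwap integration: Big thanks to [Adrien Toupet](https:\u002F\u002Fgithub.com\u002Fadrientoupet) from [AInVFX](https:\u002F\u002Fwww.youtube.com\u002F@AInVFX); this feature is very useful for users with limited VRAM (see the [usage](#-usage) section).\n\n**July 3, 2025**\n\n- 🛠️ Can run in **standalone mode** with **multi-GPU** acceleration; see [🖥️ Run as Standalone](#run-as-standalone-cli).\n\n**June 30, 2025**\n\n- 🚀 Faster processing and lower VRAM usage.\n- 🛠️ Fixed a memory leak in the 3B models.\n- ❌ The process can now be interrupted when needed.\n- ✅ Refactored the code for easier sharing with the community; pull requests are welcome.\n- 🛠️ Removed the Flash Attention dependency (thanks to [luke2642](https:\u002F\u002Fgithub.com\u002FLuke2642)!!).\n\n**June 24, 2025**\n\n- 🚀 Up to 4x faster processing.\n\n**June 22, 2025**\n\n- 💪 FP8 compatibility!\n- 🚀 Speedups across the whole pipeline.\n- 🚀 Lower VRAM consumption (usage remains high on an RTX 4090, so use batch_size=1; I'm working on this).\n- 🛠️ Better benchmarks coming soon.\n\n**June 20, 2025**\n\n- 🛠️ Initial push.\n\n## 🎯 Features\n\n### Core Capabilities\n- **High-quality diffusion-based upscaling**: One-step diffusion model for video and image enhancement.\n- **Temporal consistency**: Maintains coherence between video frames through configurable batch processing.\n- **Multi-format support**: Handles RGB and RGBA (alpha channel) video and images.\n- **Any video length**: Works with videos of any length.\n\n### Model Support\n- **Multiple model variants**: 3B and 7B parameter models at different precisions.\n- 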
**FP16、FP8和GGUF量化**：根据显存需求，可选择全精度（FP16）、混合精度（FP8）或高度量化的GGUF模型。\n- **自动模型下载**：首次使用时会自动从HuggingFace下载所需模型。\n\n### 内存优化\n- **BlockSwap技术**：在GPU和CPU内存之间动态交换Transformer块，从而在有限显存条件下运行大型模型。\n- **VAE分块处理**：通过分块编码\u002F解码大分辨率图像来降低显存占用。\n- **智能卸载**：在不同处理阶段间将模型和中间张量卸载到CPU或辅助GPU上。\n- **GGUF量化支持**：使用4位或8位量化模型以极大节省显存。\n\n### 性能特性\n- **torch.compile集成**：启用PyTorch 2.0+编译后，DiT速度可提升20-40%，VAE速度可提升15-25%。\n- **多GPU命令行界面**：通过自动的时间重叠混合，将工作负载分配到多个GPU上。\n- **模型缓存**：在单GPU目录处理或多GPU流式传输中，可将模型保留在内存中以供后续使用。\n- **灵活的注意力后端**：可根据硬件支持情况，选择PyTorch SDPA（稳定且始终可用）、Flash Attention 2\u002F3或SageAttention 2\u002F3以获得更快的计算速度。\n\n### 质量控制\n- **高级色彩校正**：提供五种方法，包括LAB（推荐用于最高保真度）、小波变换、自适应小波变换、HSV和AdaIN。\n- **噪声注入控制**：可微调输入和潜在噪声比例，以减少高分辨率下的伪影。\n- **可配置的分辨率限制**：设置目标和最大分辨率，并自动保持宽高比。\n\n### 工作流程特性\n- **ComfyUI集成**：提供四个专用节点，实现对超分辨率流程的完全控制。\n- **独立命令行界面**：用于批量处理和自动化任务。\n- **调试日志记录**：提供全面的调试模式，包含内存跟踪、时间信息和处理细节。\n- **进度报告**：在处理过程中实时显示进度。\n\n## 🔧 系统要求\n\n### 硬件\n\n借助当前的优化技术（分块处理、BlockSwap、GGUF量化），SeedVR2可在多种硬件上运行：\n\n- **最低显存**（8GB或以下）：使用开启BlockSwap和VAE分块处理的GGUF Q4_K_M模型。\n- **中等显存**（12-16GB）：根据需要使用带BlockSwap或VAE分块处理的FP8模型。\n- **高显存**（24GB以上）：使用FP16模型以获得最佳质量和速度，无需额外的内存优化措施。\n\n### 软件\n\n- **ComfyUI**：建议使用最新版本。\n- **Python**：3.12及以上版本（经测试并推荐使用Python 3.12和3.13）。\n- **PyTorch**：2.0及以上版本以支持torch.compile（可选但推荐）。\n- **Triton**：使用inductor后端进行torch.compile时必需（可选）。\n- **Flash Attention \u002F SageAttention**：Flash Attention 2（Ampere及以上）、Flash Attention 3（Hopper及以上）、SageAttention 2或SageAttention 3（Blackwell）可在支持的硬件上提供更快的注意力计算（可选，若不可用则回退到PyTorch SDPA）。\n\n### 方法一：ComfyUI 管理器（推荐）\n\n1. 在 ComfyUI 界面中打开 ComfyUI 管理器。\n2. 点击“自定义节点管理器”。\n3. 搜索“ComfyUI-SeedVR2_VideoUpscaler”。\n4. 点击“安装”，然后重启 ComfyUI。\n\n**注册表链接**：[ComfyUI 注册表 - SeedVR2 视频超分辨率插件](https:\u002F\u002Fregistry.comfy.org\u002Fnodes\u002Fseedvr2_videoupscaler)\n\n### 方法二：手动安装\n\n1. **克隆仓库**到你的 ComfyUI 自定义节点目录：\n```bash\ncd ComfyUI\ngit clone https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler.git custom_nodes\u002Fseedvr2_videoupscaler\n```\n\n2. 
**使用独立 Python 安装依赖**：\n```bash\n# 安装所需依赖（从同一 ComfyUI 目录运行）\n# Windows:\n.venv\\Scripts\\python.exe -m pip install -r custom_nodes\\seedvr2_videoupscaler\\requirements.txt\n# Linux\u002FmacOS:\n.venv\u002Fbin\u002Fpython -m pip install -r custom_nodes\u002Fseedvr2_videoupscaler\u002Frequirements.txt\n```\n\n3. **重启 ComfyUI**\n\n### 模型安装\n\n模型将在首次使用时**自动下载**，并保存到 `ComfyUI\u002Fmodels\u002FSEEDVR2` 目录下。\n\n你也可以手动从以下地址下载模型：\n- 主要模型可在 [numz\u002FSeedVR2_comfyUI](https:\u002F\u002Fhuggingface.co\u002Fnumz\u002FSeedVR2_comfyUI\u002Ftree\u002Fmain) 和 [AInVFX\u002FSeedVR2_comfyUI](https:\u002F\u002Fhuggingface.co\u002FAInVFX\u002FSeedVR2_comfyUI\u002Ftree\u002Fmain) 获取。\n- 额外的 GGUF 模型可在 [cmeka\u002FSeedVR2-GGUF](https:\u002F\u002Fhuggingface.co\u002Fcmeka\u002FSeedVR2-GGUF\u002Ftree\u002Fmain) 下载。\n\n## 📖 使用说明\n\n### 🎬 视频教程\n\n#### 最新版本深度解析（推荐）\n\n由 AInVFX 的 Adrien 制作的 v2.5 完整讲解视频，涵盖了全新的 4 节点架构、GGUF 支持、内存优化以及生产工作流等内容：\n\n[![SeedVR2 v2.5 深度解析教程](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_c1df29b1d2ae.jpg)](https:\u002F\u002Fyoutu.be\u002FMBtWYXq_r60)\n\n本教程详细介绍了：\n- 如何通过 ComfyUI 管理器安装 v2.5 并解决冲突问题\n- 新的 4 节点模块化架构及其重构原因\n- 使用 GGUF 量化在 8GB 显存上运行 7B 模型\n- 根据硬件配置设置 BlockSwap、VAE 平铺及 torch.compile\n- 带有 Alpha 通道的图像和视频超分辨率工作流\n- 用于批量处理和多 GPU 渲染的命令行工具\n- 针对不同显存水平的内存优化策略\n- 实际生产中的技巧以及关键的 batch_size 公式（4n+1）\n\n#### 旧版本教程\n\n作为参考，这里提供最初的发布版教程：\n\n[![SeedVR2 深度解析教程](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_f63bdd371d3a.jpg)](https:\u002F\u002Fyoutu.be\u002FI0sl45GMqNg)\n\n*注：该教程介绍的是之前的单节点架构。虽然 v2.5 的界面已大幅变化，但关于 BlockSwap 和内存管理的核心概念仍然很有价值。*\n\n### 节点设置\n\nSeedVR2 采用模块化节点架构，包含四个专用节点：\n\n#### 1. 
SeedVR2 (下载)DiT 模型\n\n![SeedVR2 (下载)DiT 模型](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_32d7085bcfe1.png)\n\n用于配置用于视频超分辨率的 DiT（扩散 Transformer）模型。\n\n**参数说明：**\n\n- **model**：选择你的 DiT 模型\n  - **3B 模型**：速度较快，显存需求较低\n    - `seedvr2_ema_3b_fp16.safetensors`：FP16（最佳质量）\n    - `seedvr2_ema_3b_fp8_e4m3fn.safetensors`：FP8 8 位（良好质量）\n    - `seedvr2_ema_3b-Q4_K_M.gguf`：GGUF 4 位量化（可接受质量）\n    - `seedvr2_ema_3b-Q8_0.gguf`：GGUF 8 位量化（良好质量）\n  - **7B 模型**：质量更高，显存需求更大\n    - `seedvr2_ema_7b_fp16.safetensors`：FP16（最佳质量）\n    - `seedvr2_ema_7b_fp8_e4m3fn_mixed_block35_fp16.safetensors`：FP8，最后一层用 FP16 减少伪影（良好质量）\n    - `seedvr2_ema_7b-Q4_K_M.gguf`：GGUF 4 位量化（可接受质量）\n    - `seedvr2_ema_7b_sharp_*`：锐化变体，增强细节\n\n- **device**：执行 DiT 推理的 GPU 设备（例如 `cuda:0`）\n\n- **offload_device**：在不活跃时卸载 DiT 模型的设备\n  - `none`：将模型保留在推理设备上（最快，占用显存最多）\n  - `cpu`：卸载到系统内存（减少显存占用）\n  - `cuda:X`：卸载到另一块 GPU（如果有条件，效果较好）\n\n- **cache_model**：在工作流之间将 DiT 模型保留在 offload_device 上\n  - 对于批量处理非常有用，可避免重复加载\n  - 需要同时设置 offload_device\n\n- **blocks_to_swap**：启用 BlockSwap 内存优化功能\n  - `0`：禁用（默认）\n  - `1-32`：适用于 3B 模型的变换块数量\n  - `1-36`：适用于 7B 模型的变换块数量\n  - 数值越高，节省的显存越多，但处理速度会变慢\n  - 需要设置 offload_device，且不能与 device 相同\n\n- **swap_io_components**：卸载输入输出嵌入层和归一化层\n  - 与 blocks_to_swap 结合使用可进一步节省显存\n  - 需要设置 offload_device，且不能与 device 相同\n\n- **attention_mode**：注意力计算后端\n  - `sdpa`：PyTorch scaled_dot_product_attention（默认，始终可用）\n  - `flash_attn_2`：Flash Attention 2（Ampere+，需安装 flash-attn 包）\n  - `flash_attn_3`：Flash Attention 3（Hopper+，需安装支持 FA3 的 flash-attn 包）\n  - `sageattn_2`：SageAttention 2（需安装 sageattention 包）\n  - `sageattn_3`：SageAttention 3（Blackwell\u002FRTX 50xx，需安装 sageattn3 包）\n\n- **torch_compile_args**：连接到 SeedVR2 Torch Compile Settings 节点，以获得 20-40% 的速度提升。\n\n**BlockSwap 解释：**\n\nBlockSwap 可以让显存有限的 GPU 运行大型模型，它会在推理过程中动态地在 GPU 和 CPU 内存之间交换 Transformer 块。\n\n> **注意**：BlockSwap 在 macOS 上不可用。Apple Silicon Mac 采用统一内存架构，GPU 和 CPU 共享同一内存池，因此 BlockSwap 没有意义。如果在 macOS 
上尝试启用此选项，系统会自动禁用并发出警告。\n\n其工作原理如下：\n- **作用**：只将当前需要的 Transformer 块保留在 GPU 上，其余部分存储在 CPU 或其他设备上。\n- **适用场景**：在超分辨率阶段遇到 OOM（内存不足）错误时。\n- **配置方法**：\n  1. 将 `offload_device` 设置为 `cpu` 或另一块 GPU。\n  2. 初始设置 `blocks_to_swap=16`（即一半的块数）。\n  3. 如果仍出现 OOM 错误，则增加到 24 或 32（3B）\u002F36（7B）。\n  4. 启用 `swap_io_components` 以最大限度地节省显存。\n  5. 如果显存充足，可以降低数值或直接设为 0，以提高处理速度。\n\n**低显存（8GB）配置示例**：\n- model：`seedvr2_ema_3b-Q8_0.gguf`\n- device：`cuda:0`\n- offload_device：`cpu`\n- blocks_to_swap：`32`\n- swap_io_components：`True`\n\n#### 2. SeedVR2 (下载)VAE 模型\n\n![SeedVR2 (下载)VAE 模型](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_0e32b40d059a.png)\n\n用于配置 VAE（变分自编码器）模型，以进行视频帧的编码和解码。\n\n**参数说明：**\n\n- **model**：选择 VAE 模型\n  - `ema_vae_fp16.safetensors`：默认且推荐\n\n- **device**：执行 VAE 推理的 GPU 设备（例如 `cuda:0`）\n\n- **offload_device**: 在不主动处理时，将 VAE 模型卸载到的设备\n  - `none`: 将模型保留在推理设备上（默认，速度最快）\n  - `cpu`: 卸载到系统内存（减少显存占用）\n  - `cuda:X`: 卸载到另一块 GPU（如果有其他 GPU 可用，则平衡较好）\n\n- **cache_model**: 在工作流运行之间，将 VAE 模型保留在 offload_device 上\n  - 需要先设置 offload_device\n\n- **encode_tiled**: 启用分块编码以减少编码阶段的显存使用\n  - 如果在调试日志中“编码”阶段出现 OOM 错误，请启用\n\n- **encode_tile_size**: 编码分块大小，单位为像素（默认：1024）\n  - 同时应用于高度和宽度\n  - 值越小，显存占用越低，但处理时间可能会增加\n\n- **encode_tile_overlap**: 编码分块重叠区域，单位为像素（默认：128）\n  - 减少分块之间的可见接缝\n\n- **decode_tiled**: 启用分块解码以减少解码阶段的显存使用\n  - 如果在调试日志中“解码”阶段出现 OOM 错误，请启用\n\n- **decode_tile_size**: 解码分块大小，单位为像素（默认：1024）\n\n- **decode_tile_overlap**: 解码分块重叠区域，单位为像素（默认：128）\n\n- **torch_compile_args**: 连接到 SeedVR2 Torch Compile Settings 节点，可获得 15-25% 的速度提升\n\n**VAE 分块说明：**\n\nVAE 分块技术通过将大分辨率图像分割成较小的块来处理，从而降低显存需求。使用方法如下：\n\n1. **先不启用分块**运行，并监控调试日志（在主节点上启用 `enable_debug`）。\n2. **如果“编码”阶段出现 OOM**：\n   - 启用 `encode_tiled`。\n   - 如果仍然 OOM，减小 `encode_tile_size`（例如尝试 768、512 等）。\n3. **如果“解码”阶段出现 OOM**：\n   - 启用 `decode_tiled`。\n   - 如果仍然 OOM，减小 `decode_tile_size`。\n4. 
**调整重叠区域**（默认 128），如果输出中出现明显的接缝，则增大；如果处理时间过长，则减小。\n\n**高分辨率（4K）示例配置：**\n- encode_tiled: `True`\n- encode_tile_size: `1024`\n- encode_tile_overlap: `128`\n- decode_tiled: `True`\n- decode_tile_size: `1024`\n- decode_tile_overlap: `128`\n\n#### 3. SeedVR2 Torch Compile 设置（可选）\n\n![SeedVR2 Torch Compile 设置](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_19daf8c081be.png)\n\n配置 torch.compile 优化，可使 DiT 加速 20-40%，VAE 加速 15-25%。\n\n**要求：**\n- PyTorch 2.0+\n- Triton（用于 inductor 后端）\n\n**参数：**\n\n- **backend**: 编译后端\n  - `inductor`: 使用 Triton 生成内核并进行融合的完整优化（推荐）\n  - `cudagraphs`: 使用 CUDA 图的轻量级封装，无内核优化\n\n- **mode**: 优化级别（编译时间 vs 运行时性能）\n  - `default`: 快速编译且有良好加速效果（开发阶段推荐）\n  - `reduce-overhead`: 开销更低，针对小型模型优化\n  - `max-autotune`: 编译最慢，运行时性能最佳（生产环境推荐）\n  - `max-autotune-no-cudagraphs`: 类似 max-autotune，但不使用 CUDA 图\n\n- **fullgraph**: 将整个模型编译为单个图，不允许中断\n  - `False`: 允许图中断以提高兼容性（默认，推荐）\n  - `True`: 强制不允许中断以实现最大优化（可能因动态形状而失败）\n\n- **dynamic**: 处理不同输入形状而不重新编译\n  - `False`: 针对精确输入形状进行优化（默认）\n  - `True`: 创建可适应形状变化的动态内核（在处理不同分辨率或批量大小时启用）\n\n- **dynamo_cache_size_limit**: 每个函数最多缓存的编译版本数（默认：64）\n  - 数值越高，占用内存越多；数值越低，重新编译次数越多。\n\n- **dynamo_recompile_limit**: 在回退到 eager 模式之前的最大重新编译次数（默认：128）\n  - 安全限制，防止编译循环。\n\n**使用方法：**\n1. 将此节点添加到您的工作流中。\n2. 将其输出连接到 DiT 和\u002F或 VAE 加载节点的 `torch_compile_args` 输入。\n3. 第一次运行会较慢（因为需要编译），后续运行会快得多。\n\n**适用场景：**\n- torch.compile 只有在处理 **多批次、长视频或大量分块** 时才有意义。\n- 对于单张图片或短片段，编译时间会超过速度提升带来的收益。\n- 最适合批量处理工作流或长视频。\n\n**推荐设置：**\n- 开发\u002F测试阶段：`mode=default`，`backend=inductor`，`fullgraph=False`。\n- 生产阶段：`mode=max-autotune`，`backend=inductor`，`fullgraph=False`。\n\n#### 4. 
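SeedVR2 Video Upscaler (Main Node)\n\n![SeedVR2 Video Upscaler](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_readme_5bad7d99fbb9.png)\n\nThe main upscaling node, which processes video frames using the DiT and VAE models.\n\n**Required inputs:**\n\n- **image**: Input video frames as an image batch (RGB or RGBA).\n- **dit**: DiT model configuration from the SeedVR2 (Down)Load DiT Model node.\n- **vae**: VAE model configuration from the SeedVR2 (Down)Load VAE Model node.\n\n**Parameters:**\n\n- **seed**: Random seed for reproducible results (default: 42).\n  - The same seed and input produce identical output.\n\n- **resolution**: Target resolution for the shortest edge, in pixels (default: 1080).\n  - Aspect ratio is preserved automatically.\n\n- **max_resolution**: Maximum resolution for any edge (default: 0 = no limit).\n  - Automatically scales down if exceeded, to prevent OOM.\n\n- **batch_size**: Frames per batch (default: 5).\n  - **Critical requirement**: Must follow the **4n+1 formula** (1, 5, 9, 13, 17, 21, 25, …).\n  - **Why**: The model uses these frames for temporal consistency.\n  - **Minimum of 5 for temporal consistency**: Use 1 only for single images or when temporal consistency is not needed.\n  - **Ideally match shot length**: For best results, set batch_size to match the shot length (e.g., batch_size=21 for a 20-frame shot).\n  - **VRAM impact**: A larger batch_size improves quality and speed but needs more VRAM.\n  - **If you get OOM at batch_size=5**: Try the optimizations first (model offloading, BlockSwap, GGUF models, etc.) before lowering batch_size or input resolution, since those directly affect quality.\n\n- **uniform_batch_size** (default: False):\n  - Pads the final batch to `batch_size` for uniform processing.\n  - Prevents temporal artifacts when the last batch is much smaller than the others.\n  - Example: 45 frames with `batch_size=33` produces [33, 33] instead of [33, 12].\n  - Recommended when using a large batch_size and the video length is not a multiple of `batch_size`.\n  - Slightly increases VRAM usage but keeps temporal coherence consistent across all batches.\n\n- **temporal_overlap**: Number of overlapping frames between batches (default: 0).\n  - Used for blending between batches to reduce temporal artifacts.\n  - Range: 0-16 frames.\n\n- **prepend_frames**: Number of frames to prepend (default: 0).\n  - Prepends reversed frames at the start of the video to reduce artifacts at the beginning.\n  - Automatically removed after processing.\n  - Range: 0-32 frames.\n\n- **color_correction**: Color correction method (default: \"lab\")\n  - **`lab`**: Full perceptual color matching that preserves detail (recommended for the highest fidelity to the original)\n  - **`wavelet`**: Frequency-based natural colors with good detail preservation\n  - **`wavelet_adaptive`**: Wavelet base with targeted saturation correction\n  - **`hsv`**: Hue-conditional saturation matching\n  - **`adain`**: Statistical style transfer\n  - **`none`**: No color correction\n\n- **input_noise_scale**: Input noise injection scale, 0.0–1.0 (default: 0.0)\n  - Adds noise to the input frames to reduce artifacts at very high resolutions\n  - Try 0.1–0.3 if you see artifacts at high output resolutions\n\n- **latent_noise_scale**: Latent-space noise scale, 0.0–1.0 (default: 0.0)\n  - Adds noise during diffusion, which can soften excessive detail\n  - Try 0.05–0.15 if input_noise does not help\n\n- **offload_device**: Device for storing intermediate tensors between processing phases (default: \"cpu\")\n  - 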
`none`: 所有张量保留在推理设备上（速度最快，但显存占用最高）\n  - `cpu`: 将张量卸载到系统内存（推荐用于长视频，传输速度较慢）\n  - `cuda:X`: 将张量卸载到另一块 GPU 上（如果有可用资源，效果较好，比 CPU 快）\n\n- **enable_debug**: 启用详细调试日志记录（默认值为 False）\n  - 显示内存使用情况、计时信息和处理细节\n  - **强烈建议**启用此选项以排查 OOM 问题\n\n**输出：**\n- 应用了色彩校正的超分辨率视频帧\n- 格式（RGB\u002FRGBA）与输入一致\n- 取值范围 [0, 1] 归一化，以兼容 ComfyUI\n\n\n\n### 典型工作流设置\n\n**基础工作流（高显存 – 24GB+）**:\n```\n加载视频帧\n    ↓\nSeedVR2 加载 DiT 模型\n  ├─ model: seedvr2_ema_3b_fp16.safetensors\n  └─ device: cuda:0\n    ↓\nSeedVR2 加载 VAE 模型\n  ├─ model: ema_vae_fp16.safetensors\n  └─ device: cuda:0\n    ↓\nSeedVR2 视频超分辨率器\n  ├─ batch_size: 21\n  └─ resolution: 1080\n    ↓\n保存视频\u002F帧\n```\n\n**低显存工作流（8–12GB）**:\n```\n加载视频帧\n    ↓\nSeedVR2 加载 DiT 模型\n  ├─ model: seedvr2_ema_3b-Q8_0.gguf\n  ├─ device: cuda:0\n  ├─ offload_device: cpu\n  ├─ blocks_to_swap: 32\n  └─ swap_io_components: True\n    ↓\nSeedVR2 加载 VAE 模型\n  ├─ model: ema_vae_fp16.safetensors\n  ├─ device: cuda:0\n  ├─ encode_tiled: True\n  └─ decode_tiled: True\n    ↓\nSeedVR2 视频超分辨率器\n  ├─ batch_size: 5\n  └─ resolution: 720\n    ↓\n保存视频\u002F帧\n```\n\n**高性能工作流（24GB+，配合 torch.compile）**:\n```\n加载视频帧\n    ↓\nSeedVR2 Torch Compile 设置\n  ├─ mode: max-autotune\n  └─ backend: inductor\n    ↓\nSeedVR2 加载 DiT 模型\n  ├─ model: seedvr2_ema_7b_sharp_fp16.safetensors\n  ├─ device: cuda:0\n  └─ torch_compile_args: connected\n    ↓\nSeedVR2 加载 VAE 模型\n  ├─ model: ema_vae_fp16.safetensors\n  ├─ device: cuda:0\n  └─ torch_compile_args: connected\n    ↓\nSeedVR2 视频超分辨率器\n  ├─ batch_size: 81\n  └─ resolution: 1080\n    ↓\n保存视频\u002F帧\n```\n\n## 🖥️ 独立运行（CLI）\n\n独立 CLI 提供强大的批处理能力，支持多 GPU，并具备复杂的优化选项。\n\n### 前置条件\n\n根据您的安装选择合适的设置：\n\n#### 选项 1：已安装 ComfyUI 和 SeedVR2\n\n如果您已经通过 [ComfyUI 安装指南](#-installation) 将 SeedVR2 作为 ComfyUI 的一部分安装好了，可以直接使用 CLI：\n\n```bash\n# 进入您的 ComfyUI 目录\ncd ComfyUI\n\n# 使用独立 Python 运行 CLI（显示帮助信息）\n# Windows:\n.venv\\Scripts\\python.exe custom_nodes\\seedvr2_videoupscaler\\inference_cli.py --help\n# Linux\u002FmacOS:\n.venv\u002Fbin\u002Fpython 
custom_nodes\u002Fseedvr2_videoupscaler\u002Finference_cli.py --help\n```\n\n**请跳至下方的 [命令行使用说明](#command-line-usage)。**\n\n#### 选项 2：独立安装（无需 ComfyUI）\n\n如果您希望在不安装 ComfyUI 的情况下使用 CLI，请按照以下步骤操作：\n\n1. **安装 [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002Fgetting-started\u002Finstallation\u002F)**（现代 Python 包管理工具）：\n```bash\n# Windows\npowershell -ExecutionPolicy ByPass -c \"irm https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.ps1 | iex\"\n\n# macOS 和 Linux\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n```\n\n2. **克隆仓库**：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler.git seedvr2_videoupscaler\ncd seedvr2_videoupscaler\n```\n\n3. **创建虚拟环境并安装依赖**：\n```bash\n# 创建 Python 3.13 的虚拟环境\nuv venv --python 3.13\n\n# 激活虚拟环境\n# Windows:\n.venv\\Scripts\\activate\n# Linux\u002FmacOS:\nsource .venv\u002Fbin\u002Factivate\n\n# 安装支持 CUDA 的 PyTorch\n# 根据您的环境查看相应命令：https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F\nuv pip install --pre torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fnightly\u002Fcu130\n\n# 安装 SeedVR2 的依赖\nuv pip install -r requirements.txt\n\n# 运行 CLI（显示帮助信息）\n# Windows:\n.venv\\Scripts\\python.exe inference_cli.py --help\n# Linux\u002FmacOS:\n.venv\u002Fbin\u002Fpython inference_cli.py --help\n```\n\n### 命令行使用说明\n\nCLI 提供了针对单 GPU、多 GPU 以及批处理工作流的全面选项。\n\n**基本使用示例：**\n\n```bash\n# 基本图像超分辨率\npython inference_cli.py image.jpg\n\n# 基本视频超分辨率，带时间一致性\npython inference_cli.py video.mp4 --resolution 720 --batch_size 33\n\n# 长视频流式处理模式（内存高效），输出 10 位视频（需 FFMPEG）\n# 每次处理 330 帧，避免将整个视频加载到内存中\n# 使用 --temporal_overlap 确保各片段之间的平滑过渡\npython inference_cli.py long_video.mp4 \\\n    --resolution 1080 \\\n    --batch_size 33 \\\n    --chunk_size 330 \\\n    --temporal_overlap 3 \\\n    --video_backend ffmpeg \\\n    --10bit\n\n# 多 GPU 处理，带时间重叠\npython inference_cli.py video.mp4 \\\n    --cuda_device 0,1 \\\n    --resolution 1080 \\\n    --batch_size 81 \\\n  
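  --uniform_batch_size \\\n    --temporal_overlap 3 \\\n    --prepend_frames 4\n\n# Low-VRAM optimization (8GB)\npython inference_cli.py image.png \\\n    --dit_model seedvr2_ema_3b-Q8_0.gguf \\\n    --resolution 1080 \\\n    --blocks_to_swap 32 \\\n    --swap_io_components \\\n    --dit_offload_device cpu \\\n    --vae_offload_device cpu\n\n# High resolution with VAE tiling (batch_size follows the 4n+1 rule)\npython inference_cli.py video.mp4 \\\n    --resolution 1440 \\\n    --batch_size 33 \\\n    --uniform_batch_size \\\n    --temporal_overlap 3 \\\n    --vae_encode_tiled \\\n    --vae_decode_tiled\n\n# Batch directory processing with model caching\npython inference_cli.py media_folder\u002F \\\n    --output processed\u002F \\\n    --cuda_device 0 \\\n    --cache_dit \\\n    --cache_vae \\\n    --dit_offload_device cpu \\\n    --vae_offload_device cpu \\\n    --resolution 1080 \\\n    --max_resolution 1920\n```\n\n### Command Line Arguments\n\n**Input\u002FOutput:**\n- `\u003Cinput>`: Input file (.mp4, .avi, .png, .jpg, etc.) or directory\n- `--output`: Output path (default: auto-generated in the 'output\u002F' directory)\n- `--output_format`: Output format: 'mp4' (video) or 'png' (image sequence). Default: auto-detected from the input type\n- `--video_backend`: Video encoding backend: 'opencv' (default) or 'ffmpeg' (requires ffmpeg in PATH)\n- `--10bit`: Save 10-bit video using the x265 encoder and the yuv420p10le pixel format (reduces banding in gradients). Without this flag, ffmpeg uses x264 (yuv420p) for maximum compatibility. Requires --video_backend ffmpeg\n- `--model_dir`: Model directory (default: .\u002Fmodels\u002FSEEDVR2)\n\n**Model Selection:**\n- `--dit_model`: DiT model to use. Options include 3B\u002F7B in fp16\u002Ffp8\u002FGGUF variants (default: 3B FP8)\n\n**Processing Parameters:**\n- `--resolution`: Target shortest-edge resolution in pixels (default: 1080)\n- `--max_resolution`: Maximum resolution for any edge. Scales down if exceeded. 0 = no limit (default: 0)\n- `--batch_size`: Frames per batch (must follow the 4n+1 rule: 1, 5, 9, 13, 17, 21…). Ideally matches the shot length for best temporal consistency (default: 5)\n- `--seed`: Random seed for reproducibility (default: 42)\n- `--skip_first_frames`: Skip the first N frames (default: 0)\n- `--load_cap`: Maximum total number of frames to load from the video. 0 = load all (default: 0)\n- `--chunk_size`: Frames per chunk in streaming mode. When greater than 0, the video is processed in memory-bounded chunks of N frames, with each chunk written out before the next is loaded. This is essential for long videos that may exceed available memory. Use with `--temporal_overlap` for smooth transitions between chunks. 0 = load all frames at once (default: 0)\n- `--prepend_frames`: Prepend N reversed frames at the start of the video to reduce artifacts at the beginning (removed automatically after processing) (default: 0)\n- `--temporal_overlap`: Number of overlapping frames between batches or GPUs for smooth blending (default: 0)\n\n**Quality Control:**\n- 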
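`--color_correction`: Color correction method: 'lab' (perceptual, recommended), 'wavelet', 'wavelet_adaptive', 'hsv', 'adain', or 'none' (default: lab)\n- `--input_noise_scale`: Input noise injection strength (0.0–1.0). Can reduce artifacts at high resolutions (default: 0.0)\n- `--latent_noise_scale`: Latent-space noise strength (0.0–1.0). Softens details when needed (default: 0.0)\n\n**Memory Management:**\n- `--dit_offload_device`: Device to offload the DiT model to: 'none' (keep on GPU), 'cpu', or 'cuda:X' (default: none)\n- `--vae_offload_device`: Device to offload the VAE model to: 'none', 'cpu', or 'cuda:X' (default: none)\n- `--blocks_to_swap`: Number of transformer blocks to swap (0=disabled, 3B: 0–32, 7B: 0–36). Requires dit_offload_device (default: 0). Not available on macOS.\n- `--swap_io_components`: Offload I\u002FO components for additional VRAM savings. Requires dit_offload_device. Not available on macOS.\n\n**VAE Tiling:**\n- `--vae_encode_tiled`: Enable tiled VAE encoding to reduce VRAM usage during encoding\n- `--vae_encode_tile_size`: VAE encode tile size in pixels (default: 1024)\n- `--vae_encode_tile_overlap`: Overlap between VAE encode tiles in pixels (default: 128)\n- `--vae_decode_tiled`: Enable tiled VAE decoding to reduce VRAM usage during decoding\n- `--vae_decode_tile_size`: VAE decode tile size in pixels (default: 1024)\n- `--vae_decode_tile_overlap`: Overlap between VAE decode tiles in pixels (default: 128)\n- `--tile_debug`: Visualize tiles: 'false' (default), 'encode', or 'decode'\n\n**Performance Optimization:**\n- `--allow_vram_overflow`: Allow VRAM to overflow into system memory. Prevents OOM errors but may cause severe slowdowns\n- `--attention_mode`: Attention backend: 'sdpa' (default), 'flash_attn_2' (Ampere+), 'flash_attn_3' (Hopper+), 'sageattn_2', or 'sageattn_3' (Blackwell)\n- `--compile_dit`: Enable torch.compile for the DiT model (20–40% speedup; requires PyTorch 2.0+ and Triton)\n- `--compile_vae`: Enable torch.compile for the VAE model (15–25% speedup; requires PyTorch 2.0+ and Triton)\n- `--compile_backend`: Compilation backend: 'inductor' (full optimization) or 'cudagraphs' (lightweight) (default: inductor)\n- `--compile_mode`: Optimization level: 'default', 'reduce-overhead', 'max-autotune', 'max-autotune-no-cudagraphs' (default: default)\n- `--compile_fullgraph`: Compile the entire model as a single graph (faster but less flexible) (default: False)\n- `--compile_dynamic`: Handle varying input shapes without recompilation (default: False)\n- `--compile_dynamo_cache_size_limit`: Maximum number of compiled versions cached per function (default: 64)\n- `--compile_dynamo_recompile_limit`: Maximum number of recompilations before falling back (default: 128)\n\n**Model Caching (batch processing):**\n- `--cache_dit`: Keep the DiT model in memory between generations. Works with single-GPU directory processing or multi-GPU streaming (`--chunk_size`). Requires `--dit_offload_device`\n- `--cache_vae`: Keep the VAE model in memory between generations. Works with single-GPU directory processing or multi-GPU streaming (`--chunk_size`). 需配合 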
`--vae_offload_device` 使用\n\n**多 GPU：**\n- `--cuda_device`：CUDA 设备 ID。可以是单个 ID（如 '0'）或逗号分隔的列表（如 '0,1'）以实现多 GPU 处理\n\n**调试：**\n- `--debug`：启用详细调试日志记录\n\n### 多GPU处理详解\n\nCLI的多GPU模式采用**帧级并行**：视频被分割成多个片段，每个GPU独立地对各自负责的片段执行全部4个阶段（编码→超分→解码→后处理）。这种方式非常适合长视频，通过分配工作负载来缩短总处理时间。\n\n**工作原理：**\n1. 视频帧均匀分配到各个GPU上（例如，100帧分配给2个GPU，则每个GPU处理50帧）。\n2. 每个GPU加载自己的模型副本，并独立处理其负责的片段。\n3. 当启用`--temporal_overlap`时，片段之间会包含重叠帧，以实现无缝拼接。\n4. 最终将各GPU的结果拼接起来（并在重叠区域进行混合），生成完整的视频。\n\n**示例：100帧分配给2个GPU，temporal_overlap设置为4：**\n```\nGPU 0：帧0-53（50帧基础部分 + 结尾4帧重叠，作为独立视频处理）\nGPU 1：帧50-99（50帧，开头4帧重叠，作为独立视频处理）\n结果：帧0-99，在过渡处平滑融合\n```\n\n**重要注意事项：**\n- 每个GPU将其负责的片段当作一个单独的视频来处理，且各自进行批处理划分。\n- `batch_size`仅控制单个GPU内部的批处理大小，不会跨GPU生效。\n- 对于短视频（少于100帧），由于模型加载开销较大，通常使用单GPU更为高效。\n- 多GPU模式会使显存占用翻倍（每个GPU都会加载完整模型），但大致可将处理时间减半。\n\n**何时使用多GPU：**\n- 长视频（100帧以上），分割处理能显著节省时间。\n- 当你有多块显存充足的GPU时。\n\n**何时使用单GPU：**\n- 短视频：模型加载开销可能超过并行化带来的收益。\n- 需要所有帧一起处理以保证最佳的时间一致性时。\n\n**最佳实践：**\n- 设置`--temporal_overlap`为2-4帧，以确保GPU片段之间的平滑过渡。\n- 重叠帧越多，过渡越平滑，但也会增加冗余计算量。\n- 使用`--prepend_frames`减少视频开头的伪影。\n- 对于短视频，建议使用单GPU，并将`batch_size`设置为与镜头长度一致，以获得最佳质量。\n\n## ⚠️ 限制\n\n### 模型限制\n\n**批处理大小约束**：由于时间一致性架构的设计要求，模型的`batch_size`必须遵循**4n+1公式**（1, 5, 9, 13, 17, 21, 25, ...）。同一批次内的所有帧会同时处理以保持时间一致性，随后可通过`temporal_overlap`参数对不同批次进行混合。理想情况下，应将`batch_size`设置为与镜头长度匹配，以获得最佳质量。\n\n### 性能考虑\n\n**VAE瓶颈**：即使采用了优化后的DiT超分技术（BlockSwap、GGUF、torch.compile），VAE的编码\u002F解码阶段仍可能成为性能瓶颈，尤其是在高分辨率下。VAE本身速度较慢，可以通过增大`batch_size`来缓解这一问题。\n\n**显存占用**：尽管当前集成已支持低显存系统（8GB或更少，经过适当优化即可运行），但显存使用量会因以下因素而异：\n- 输入\u002F输出分辨率（越高，显存需求越大）。\n- 批处理大小（越大，显存需求越高，但时间一致性和速度更好）。\n- 模型选择（FP16 > FP8 > GGUF，显存占用依次递减）。\n- 优化设置（BlockSwap和VAE分块技术可显著降低显存使用）。\n\n**速度**：处理速度取决于：\n- GPU性能（计算能力、显存带宽及架构世代）。\n- 模型规模（3B比7B更快）。\n- 批处理大小（较大的批处理可以提高每帧处理效率，从而提升整体速度）。\n- 优化设置（torch.compile可带来显著加速）。\n- 分辨率（分辨率越高，处理速度越慢）。\n\n### 最佳实践\n\n1. **开启调试模式**，以便了解显存的具体使用情况。\n2. **编码阶段出现OOM错误时**：启用VAE编码分块功能，并减小分块尺寸。\n3. **超分阶段出现OOM错误时**：启用BlockSwap功能，并增加`blocks_to_swap`参数值。\n4. 
**解码阶段出现OOM错误时**：启用VAE解码分块功能，并减小分块尺寸。\n   - **如果尝试上述方法后仍出现OOM错误**：请降低`batch_size`或分辨率。\n5. **追求最佳质量时**：使用较高的`batch_size`（与镜头长度匹配）、FP16模型以及LAB色彩校正。\n6. **追求速度时**：使用FP8\u002FGGUF模型，启用torch.compile，并在可用时使用Flash Attention。\n7. **在处理长视频之前**，先用一段短片测试配置。\n\n## 🤝 贡献\n\n我们欢迎任何形式的贡献，也非常重视社区的意见和改进。\n\n详细的贡献指南请参阅[CONTRIBUTING.md](CONTRIBUTING.md)。\n\n**快速入门：**\n\n1. Fork 本仓库。\n2. 创建你的功能分支（`git checkout -b feature\u002FAmazingFeature`）。\n3. 提交更改（`git commit -m 'Add some AmazingFeature'`）。\n4. 推送到分支（`git push origin feature\u002FAmazingFeature`）。\n5. 向**main**分支发起拉取请求。\n\n**获取帮助：**\n- YouTube：[AInVFX频道](https:\u002F\u002Fwww.youtube.com\u002F@AInVFX)\n- GitHub [Issues](https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fissues)：用于报告问题和提出功能需求。\n- GitHub [Discussions](https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fdiscussions)：用于提问和社区支持。\n- Discord：adrientoupet & NumZ#7184\n\n## 🙏 致谢\n\n本ComfyUI实现是由**NumZ**（[GitHub链接](https:\u002F\u002Fgithub.com\u002Fnumz)）和**AInVFX**（Adrien Toupet，[YouTube频道](https:\u002F\u002Fwww.youtube.com\u002F@AInVFX)）合作完成的，基于字节跳动Seed团队的原始[SeedVR2](https:\u002F\u002Fgithub.com\u002FByteDance-Seed\u002FSeedVR)项目。\n\n特别感谢我们的社区贡献者，包括[naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1)、[thehhmdb](https:\u002F\u002Fgithub.com\u002Fthehhmdb)、[s-cerevisiae](https:\u002F\u002Fgithub.com\u002Fs-cerevisiae)、[benjaminherb](https:\u002F\u002Fgithub.com\u002Fbenjaminherb)、[cmeka](https:\u002F\u002Fgithub.com\u002Fcmeka)、[FurkanGozukara](https:\u002F\u002Fgithub.com\u002FFurkanGozukara)、[JohnAlcatraz](https:\u002F\u002Fgithub.com\u002FJohnAlcatraz)、[lihaoyun6](https:\u002F\u002Fgithub.com\u002Flihaoyun6)、[Luchuanzhao](https:\u002F\u002Fgithub.com\u002FLuchuanzhao)、[Luke2642](https:\u002F\u002Fgithub.com\u002FLuke2642)、[proxyid](https:\u002F\u002Fgithub.com\u002Fproxyid)、[q5sys](https:\u002F\u002Fgithub.com\u002Fq5sys)，以及其他众多贡献者，感谢他们提供的改进、Bug修复和测试支持。\n\n## 📜 许可证\n\n本仓库中的代码根据Apache 
2.0许可证发布，详细信息请参阅[LICENSE](LICENSE)文件。","# ComfyUI-SeedVR2_VideoUpscaler 快速上手指南\n\nComfyUI-SeedVR2_VideoUpscaler 是字节跳动 SeedVR2 模型在 ComfyUI 中的官方实现，专为高质量视频和图像超分辨率放大设计。支持多 GPU 独立运行及流式处理长视频。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: Windows, Linux, macOS (Apple Silicon M 系列芯片已优化支持)\n- **Python**: 3.10 - 3.12\n- **GPU**: \n  - NVIDIA: 推荐 RTX 20 系列及以上（支持 bf16 更佳），旧款显卡（如 GTX 970）会自动降级兼容。\n  - AMD: 支持 ROCm。\n  - Apple: 支持 MPS 加速。\n- **显存**: 建议 8GB 以上，长视频或高分辨率处理建议 16GB+（支持分块流式处理以降低显存需求）。\n\n### 前置依赖\n确保已安装以下基础环境：\n- **ComfyUI**: 最新稳定版\n- **PyTorch**: 2.3+ (推荐 2.5+)\n- **FFmpeg**: 用于视频编码输出（必装，支持 10-bit x265 编码）\n- **Git**\n\n> **国内加速建议**：\n> - 安装 PyTorch 时推荐使用清华或中科大镜像源。\n> - 克隆仓库时使用 `gitee` 镜像或配置 git proxy。\n\n## 安装步骤\n\n### 方法一：通过 ComfyUI Manager 安装（推荐）\n1. 打开 ComfyUI，点击右侧菜单的 **Manager**。\n2. 选择 **Install Custom Nodes**。\n3. 搜索 `SeedVR2` 或 `ComfyUI-SeedVR2_VideoUpscaler`。\n4. 点击 **Install** 并重启 ComfyUI。\n\n### 方法二：手动命令行安装\n进入 ComfyUI 的 `custom_nodes` 目录，执行以下命令：\n\n```bash\ncd ComfyUI\u002Fcustom_nodes\ngit clone https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler.git\ncd ComfyUI-SeedVR2_VideoUpscaler\npip install -r requirements.txt\n```\n\n> **国内用户加速命令**：\n> ```bash\n> git clone https:\u002F\u002Fgitee.com\u002Fmirrors\u002FComfyUI-SeedVR2_VideoUpscaler.git\n> pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n*注：若需独立 CLI 模式运行，请确保全局环境变量中已包含 `ffmpeg`。*\n\n## 基本使用\n\n### 1. ComfyUI 工作流使用\n启动 ComfyUI 后，加载预设工作流或新建节点：\n\n1. **添加节点**：在节点菜单中找到 `SeedVR2` 分类，双击添加 `SeedVR2 Video Upscaler` 节点。\n2. **连接输入**：\n   - 将视频加载节点（如 `Load Video`）的输出连接到 upscale 节点的 `image` 或 `video` 输入端。\n   - 可选连接 `reference_image` 用于色彩参考。\n3. **参数设置**：\n   - `upscale_factor`: 设置放大倍数（如 2x, 4x）。\n   - `denoise_strength`: 去噪强度（默认 0.35，过高可能导致细节丢失）。\n   - `tile_size`: 显存不足时减小此值以启用分块处理。\n4. **运行**：点击 \"Queue Prompt\" 开始处理，输出将自动保存至 `output` 文件夹。\n\n### 2. 
独立命令行 (CLI) 模式\n无需打开 ComfyUI 界面，直接通过终端处理视频，支持多 GPU 和长视频流式处理。\n\n**基础命令示例**：\n```bash\npython inference_cli.py .\u002Finput_video.mp4 --output .\u002Foutput_video.mp4 --resolution 1080\n```\n\n**高级用法（多 GPU + 10-bit 输出 + 分块处理长视频）**：\n```bash\npython inference_cli.py .\u002Flong_video.mp4 \\\n  --output .\u002Fupscaled_10bit.mp4 \\\n  --resolution 1080 \\\n  --cuda_device 0,1 \\\n  --chunk_size 16 \\\n  --temporal_overlap 3 \\\n  --video_backend ffmpeg \\\n  --10bit \\\n  --cache_dit \\\n  --cache_vae \\\n  --dit_offload_device cpu \\\n  --vae_offload_device cpu\n```\n\n**关键参数说明**：\n- `--cuda_device`: 指定使用的 GPU ID，多卡用逗号分隔（如 `0,1` 表示双卡）。\n- `--resolution`: 目标短边分辨率（像素），默认 1080。\n- `--chunk_size`: 流式处理时每块的帧数，用于超长视频避免内存溢出；建议与 `--temporal_overlap` 配合使用。\n- `--video_backend ffmpeg`: 启用 FFmpeg 后端；配合 `--10bit` 可输出 10-bit x265 视频，减少渐变色带。\n- `--cache_dit` \u002F `--cache_vae`: 在多段处理之间将模型保留在内存中，需分别配合 `--dit_offload_device` 和 `--vae_offload_device` 使用。\n\n处理完成后，高清视频将保存在指定的输出路径中。","一位独立纪录片创作者需要将十年前用老式摄像机拍摄的 480p 家庭影像修复为 4K 分辨率，以便在现代高清电视上播放并存档。\n\n### 没有 ComfyUI-SeedVR2_VideoUpscaler 时\n- **画质模糊且细节丢失**：传统双三次插值放大导致画面边缘锯齿严重，人物面部纹理完全糊成一片，缺乏真实感。\n- **色彩断层明显**：在天空或渐变背景中出现明显的色带（Banding），尤其是在导出为 8 位视频时，视觉体验极不自然。\n- **处理流程繁琐易崩**：尝试组合多个 AI 节点进行分帧处理时，常因显存优化不足导致进程挂起或崩溃，尤其在长视频渲染中频繁中断。\n- **动态闪烁问题**：帧与帧之间的修复风格不一致，导致视频播放时出现令人头晕的闪烁噪点，后期逐帧手动修正耗时巨大。\n\n### 使用 ComfyUI-SeedVR2_VideoUpscaler 后\n- **超清细节重建**：利用 SeedVR2 模型强大的生成能力，不仅将分辨率提升至 4K，还智能还原了皮肤纹理和衣物褶皱等高频细节。\n- **平滑渐变色彩**：内置支持 10-bit FFmpeg 后端编码，有效消除了色彩断层，使天空和阴影过渡如电影般平滑自然。\n- **稳定高效的多卡渲染**：依托其优化的多 GPU 独立运行模式和显存切片技术，轻松处理长视频序列，不再出现进程假死或显存溢出。\n- **时序一致性保障**：算法专门针对视频时序优化，确保相邻帧修复结果高度连贯，彻底解决了画面闪烁问题，无需额外去闪处理。\n\nComfyUI-SeedVR2_VideoUpscaler 将原本需要数天人工精修的老旧视频复活工作，缩短为几小时的一键自动化流程，同时达到了广播级的画质标准。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnumz_ComfyUI-SeedVR2_VideoUpscaler_c1df29b1.jpg","numz","NumZ","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fnumz_d17be211.png","Passionate about AI, I develop projects and share my work on GitHub.",null,"https:\u002F\u002Fgithub.com\u002Fnumz",[80],{"name":81,"color":82,"percentage":83},"Python","#3572A5",100,2349,172,"2026-04-16T05:29:57","Apache-2.0","Linux, macOS, Windows","支持 NVIDIA GPU (需 CUDA，支持 Flash 
Attention 2\u002F3, SageAttention, CUDNN_ATTENTION)，AMD GPU (ROCm)，Intel Gaudi，以及 Apple Silicon (MPS)。旧款 GPU (如 GTX 970) 可自动回退至不支持 bfloat16 的模式运行。显存需求未明确说明，但具备流式处理和多 GPU 支持以优化长视频处理的显存占用。","未说明 (支持通过 --chunk_size 标志进行分块流式处理，以突破内存限制处理任意长度视频)",{"notes":92,"python":93,"dependencies":94},"该工具不仅可作为 ComfyUI 插件运行，也支持作为独立的多 GPU CLI 工具使用。针对 Apple Silicon (macOS) 进行了深度优化，统一内存架构下会自动禁用 BlockSwap 并优化同步开销。支持 GGUF 量化模型和多种注意力后端自动回退机制。处理长视频时建议使用分块流式模式 (--chunk_size) 以避免内存溢出。","未说明 (提及兼容 PyTorch 2.3 至 2.9+ 版本)",[95,96,97,98,99,100,101,102],"torch>=2.3","ffmpeg","flash_attn (可选)","sageattention (可选)","bitsandbytes (可选)","triton","xformers (可选)","opencv-python (可选，默认后端)",[14,104,15,13],"视频",[106,107,108,109,110],"ai","comfyui","comfyui-nodes","upscaler","video-processing","2026-03-27T02:49:30.150509","2026-04-17T09:52:34.912631",[114,119,124,129,134,139],{"id":115,"question_zh":116,"answer_zh":117,"source_url":118},36574,"处理长视频时遇到\"DefaultCPUAllocator: not enough memory\"内存不足错误怎么办？","该问题通常出现在处理长视频的后处理阶段，尝试分配过大内存导致崩溃。维护者已在 v2.5.11 及更高版本中修复了此问题。请更新到最新版本（v2.5.11+）并重新测试。如果问题仍然存在，请确保您的系统有足够的可用内存，或尝试减小视频分辨率\u002F批次大小。","https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fissues\u002F130",{"id":120,"question_zh":121,"answer_zh":122,"source_url":123},36575,"安装 SeedVR2 后启动 ComfyUI 出现黑屏窗口或 Matplotlib 相关错误如何解决？","这是由于 ComfyUI Windows 嵌入式环境中 Matplotlib 安装损坏或不完整导致的。Diffusers 库在启动时会检查可选依赖项，若检测到损坏的 Matplotlib 会报错。解决方案：\n1. 更新到最新版本的 SeedVR2（通过 ComfyUI Manager），维护者已将 matplotlib 添加到 requirements.txt 中自动修复；\n2. 如果仍存在问题，可尝试手动完全卸载 matplotlib 后重新安装，或直接使用手动安装方式（克隆仓库并手动安装依赖）替代 Windows 安装器。","https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fissues\u002F122",{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},36576,"7B 模型（FP16\u002FQ8）在场景过渡时出现锐化伪影，但 3B 模型正常，如何解决？","该问题可能是由于使用了旧版本或非主分支的代码导致。请按以下步骤排查：\n1. 进入 seedvr2 自定义节点文件夹，打开 README.md 文件；\n2. 查看\"## 🚀 Updates\"部分，确认版本号是否为\"2025.11.09 - Version 2.5.6\"或更高；\n3. 
如果版本过旧，请在 ComfyUI Manager 中强制更新节点，或删除文件夹后从 GitHub 主分支手动下载最新版本；\n4. 更新后重启 ComfyUI 并重新运行工作流。维护者确认最新版已解决此锐化问题。","https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fissues\u002F249",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},36577,"在 macOS (MPS) 上 VAE 解码时 RAM 占用异常高（如 52GB）甚至 OOM 怎么办？","这是内存泄漏问题，已在 v2.5.23\u002Fv2.5.24 版本中修复。正常情况下编码\u002F解码步骤不应超过 4GB 内存。解决方案：\n1. 立即更新到 v2.5.24 或更高版本；\n2. 如果暂时无法更新，可尝试降低图像分辨率或批次大小；\n3. 对于高级用户，可调整内部缓存清理逻辑（但推荐直接更新）。维护者确认新版已优化动态缓存清空逻辑，平衡内存使用与性能。","https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fissues\u002F363",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},36578,"是否支持 Tile-VAE 和渐进式聚合采样以生成更长的视频？","是的，该功能已在最新版本中发布。Tile-VAE 特别适用于生成长视频，可有效降低显存占用。使用方法：\n1. 通过 ComfyUI Manager 更新到最新版 SeedVR2；\n2. 参考官方教程视频了解完整工作流程：https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=MBtWYXq_r60；\n3. 注意：解码时的显存消耗通常会高于编码，因为放大后的图像更大。新版本已支持为编码和解码分别设置不同的 Tile 参数。","https:\u002F\u002Fgithub.com\u002Fnumz\u002FComfyUI-SeedVR2_VideoUpscaler\u002Fissues\u002F18",{"id":140,"question_zh":141,"answer_zh":142,"source_url":128},36579,"如何正确更新 SeedVR2 以确保获得最新修复和功能？","推荐通过以下方式更新：\n1. 首选：在 ComfyUI Manager 中搜索\"SeedVR2\"并点击更新按钮；\n2. 如果 Manager 更新失败（如回退到 nightly 版本），可手动删除 custom_nodes 中的 seedvr2 文件夹，然后从 GitHub 主分支克隆最新代码；\n3. 更新后务必重启 ComfyUI；\n4. 
验证版本：进入节点文件夹查看 README.md 的更新日志，确认版本号符合预期（如 v2.5.x）。如遇持续问题，可参考官方教程检查工作流配置。",[144,149,154,159,164,169,174,179,184,189,194,199,204,209,214,219],{"id":145,"version":146,"summary_zh":147,"released_at":148},289355,"v2.5.23","- 🔒 安全性：防止模型加载时执行代码 - 通过仅允许反序列化张量，新增对恶意 .pth 文件的防护\n- 🎥 修复：FFmpeg 视频写入器可靠性 - 通过重定向 stderr 并添加缓冲区刷新，解决了 FFmpeg 进程卡死的问题，并改进了错误信息以方便调试（感谢 [@thehhmdb](https:\u002F\u002Fgithub.com\u002Fthehhmdb)）\n- ⚡ 修复：GGUF VAE 模型支持 - 启用了卷积运算的自动权重去量化，使 GGUF 量化的 VAE 模型完全可用（感谢 [@naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1)）\n- 🛡️ 修复：VAE 切片边界情况 - 在使用较小切片大小并配合高时间下采样时，防止了除零崩溃（感谢 [@naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1)）\n- 🎨 修复：LAB 颜色迁移精度 - 通过在矩阵运算前确保浮点类型一致，解决了视频超分辨率过程中因数据类型不匹配导致的错误\n- 🔧 修复：PyTorch 2.9+ 兼容性 - 将 Conv3d 内存绕过方案扩展至所有 PyTorch 2.9+ 版本，修复了新版本 PyTorch 下 VRAM 使用量增至三倍的问题\n- 📦 修复：Bitsandbytes 兼容性 - 为非 Gaudi 系统上 Intel Gaudi 版本检测失败的情况新增 ValueError 异常处理\n- 🍎 MPS：内存优化 - 在 Apple Silicon 上的编码\u002F解码过程中降低了内存占用（感谢 [@s-cerevisiae](https:\u002F\u002Fgithub.com\u002Fs-cerevisiae)）","2025-12-24T02:16:24",{"id":150,"version":151,"summary_zh":152,"released_at":153},289356,"v2.5.22","- 🎬 CLI：支持10位色深的FFmpeg视频后端 — 新增 --video_backend ffmpeg 和 --10bit 参数，可启用10位色深的x265编码，相比8位OpenCV输出，能有效减少渐变中的条带伪影（基于 [@thehhmdb](https:\u002F\u002Fgithub.com\u002Fthehhmdb) 的PR — 感谢！）\n- 🍎 修复：MPS双三次上采样兼容性 — 在PyTorch 2.8.0之前版本中添加了CPU回退方案，用于双三次+抗锯齿插值，从而解决了Apple Silicon平台上RGBA透明度通道上采样时出现的错误。\n- ⚡ 修复：跨平台直方图匹配 — 将 scatter_ 操作替换为 argsort+index_select，以提升在CUDA、ROCm和MPS后端上的可靠性。\n- 🧹 MPS：移除同步开销 — 回滚了v2.5.21中引入的不必要的 torch.mps.synchronize() 调用，以确保与CUDA流水线行为的一致性。","2025-12-13T05:38:02",{"id":155,"version":156,"summary_zh":157,"released_at":158},289357,"v2.5.21","- 🛠️ 修复：MPS 上的 GGUF 去量化错误 —— 解决了 2.5.20 版本中因在精度转换时跳过 GGUF 量化缓冲区而引入的形状不匹配错误。这些缓冲区必须保持打包格式，以便在推理过程中进行实时去量化。\n- 🍎 MPS：消除 CPU 同步开销 —— 在 Apple Silicon 统一内存架构上跳过不必要的 CPU 张量卸载操作，避免因同步阻塞导致的性能下降。输入图像和输出视频现在在整个处理流程中始终保留在 MPS 设备上。\n- ⚡ MPS：预加载文本嵌入 —— 在 Phase 1 编码之前加载文本嵌入，以避免在 Phase 2 开始时出现同步阻塞，从而提升时间精度和吞吐量。\n- 🧹 MPS：优化模型清理 —— 在统一内存架构上删除模型前，跳过冗余的 CPU 
数据移动操作。","2025-12-12T16:28:46",{"id":160,"version":161,"summary_zh":162,"released_at":163},289358,"v2.5.20","- ⚡ 扩展的注意力后端 —— 完全支持 Flash Attention 2（Ampere 及以上架构）、Flash Attention 3（Hopper 及以上架构）、SageAttention 2 和 SageAttention 3（Blackwell\u002FRTX 50xx），并在不可用时自动回退到 PyTorch SDPA（基于 [@naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1) 的 PR —— 感谢！）\n- 🍎 macOS\u002FApple Silicon 兼容性 —— 将 VAE 和 DiT 流水线中的 MPS 自动混合精度替换为显式的数据类型转换，解决了 M 系列 Mac 上的卡顿和崩溃问题。BlockSwap 现在会自动禁用并发出警告（由于统一内存的存在，该功能已无意义）。\n- 🛡️ Flash Attention 优雅回退 —— 添加了针对损坏或部分安装的 flash_attn\u002Fxformers DLL 的兼容性适配层，避免启动时崩溃。\n- 🛡️ AMD ROCm：修复 bitsandbytes 冲突 —— 阻止 diffusers 在尝试重新导入损坏的 bitsandbytes 安装时出现内核注册错误。\n- 📦 ComfyUI Manager：修复 macOS 分类器问题 —— 移除了导致 macOS 上出现虚假“不支持 GPU”警告的 NVIDIA CUDA 分类器。\n- 📚 文档更新 —— 更新了 README，增加了注意力后端的详细信息、BlockSwap 在 macOS 上的注意事项，并更清晰地说明了模型缓存的相关描述。","2025-12-12T05:48:33",{"id":165,"version":166,"summary_zh":167,"released_at":168},289359,"v2.5.19","- 🎨 新版页眉Logo设计 - 更新了ASCII艺术横幅（感谢[@naxci1](https:\u002F\u002Fgithub.com\u002Fnaxci1)）\n- 🧹 移除已废弃的Flash Attention封装 - 从FP8CompatibleDiT中移除了遗留代码；FlashAttentionVarlen现已通过attention_mode属性自动处理后端切换\n- 🛡️ 修复Flash Attention的优雅降级机制 - 添加了针对损坏的flash_attn\u002Fxformers DLL的兼容性适配层，防止CUDA扩展损坏时导致启动崩溃\n- 📊 改进显存监控 - 区分已分配内存与预留内存的监控，并在Windows平台上新增溢出检测功能（基于WDDM分页行为）\n- ♻️ 中心化后端检测逻辑 - 在整个代码库中统一了is_mps_available()、is_cuda_available()以及get_gpu_backend()辅助函数\n- 🔄 恢复2.5.14版本的显存限制策略 - 移除了set_per_process_memory_fraction调用；溢出检测及警告功能仍保留。","2025-12-10T06:56:39",{"id":170,"version":171,"summary_zh":172,"released_at":173},289360,"v2.5.18","- 🚀 命令行界面：长视频流式处理模式 - 新增 --chunk_size 标志，按内存受限的分块方式处理视频，实现无 RAM 限制的任意长视频播放。配合模型缓存功能（--cache_dit\u002F--cache_vae），可在不同分块间复用计算结果（受 [disk02](https:\u002F\u002Fgithub.com\u002Fdisk02) 的 PR 贡献启发）。\n- ⚡ 命令行界面：多 GPU 流式处理 - 现在每张 GPU 都会独立进行内部片段的流式处理，并配备独立的模型缓存，从而提升显存利用率，并支持在 GPU 边界处使用 --temporal_overlap 进行时序重叠融合。\n- 🔧 命令行界面：修复大视频内存溢出问题 - 采用共享内存传输替代 NumPy 序列化，有效避免高分辨率或超长视频输出时的崩溃问题（受 [FurkanGozukara](https:\u002F\u002Fgithub.com\u002FFurkanGozukara) 的 
PR 贡献启发）。","2025-12-09T06:05:15",{"id":175,"version":176,"summary_zh":177,"released_at":178},289361,"v2.5.17","- 🔧 修复：旧版 GPU 兼容性（如 GTX 970 等） - 运行时 bf16 CUBLAS 探测取代了计算能力启发式方法，能够正确检测不支持的 GPU，同时不会影响 RTX 20XX 系列。","2025-12-06T02:07:53",{"id":180,"version":181,"summary_zh":182,"released_at":183},289362,"v2.5.16","- 🔧 修复：旧版显卡兼容性（如 GTX 970 等）——对于不支持 bfloat16 的显卡自动回退  \n- 🐛 修复：画质下降问题——已回滚导致伪影问题的 bfloat16 检测逻辑  \n- 📋 调试：环境信息显示——在调试模式下显示系统信息，便于问题上报  \n- 📚 文档：简化贡献流程——现已统一为仅向 main 分支提交更改","2025-12-05T20:53:02",{"id":185,"version":186,"summary_zh":187,"released_at":188},289363,"v2.5.15","- 🍎 修复：MPS 兼容性——为 MPS 张量禁用抗锯齿，并修复 bfloat16 arange 的问题\n- ⚡ 修复：自动混合精度的设备类型——使用正确的设备类型属性，以防止自动混合精度错误\n- 📊 内存：精确的显存跟踪——使用 max_memory_reserved 来提供更准确的峰值报告\n- 🔧 修复：Triton 兼容性——为 bitsandbytes 0.45+ 和 Triton 3.0+ 添加适配层（修复 PyTorch 2.7 安装错误）","2025-12-03T18:17:14",{"id":190,"version":191,"summary_zh":192,"released_at":193},289364,"v2.5.14","- 🍎 修复：MPS 设备比较 —— 规范化设备字符串，以避免不必要的张量移动\n- 📊 内存：检测 VRAM 交换 —— 当发生溢出时，峰值统计信息会显示 GPU 和交换内存的细分数据，并在检测到交换时发出警告\n- 🛡️ 内存：强制限制物理 VRAM 使用 —— PyTorch 现在会在显存不足时直接抛出 OOM 错误，而不是静默地切换到共享内存（可防止 Windows 系统上的性能极度下降）","2025-12-01T05:34:29",{"id":195,"version":196,"summary_zh":197,"released_at":198},289365,"v2.5.13","- 🔧 Fix: PyTorch 2.7+ triton import error - Resolved installation crash caused by triton.ops import chain on newer triton versions\r\n- 💾 Fix: OOM on float32 conversion for long videos - Graceful fallback to native dtype when insufficient memory for float32 conversion\r\n- 🍎 Fix: CLI watermark error on macOS - Resolved MPS-related watermark processing crash on Apple Silicon","2025-11-30T14:04:42",{"id":200,"version":201,"summary_zh":202,"released_at":203},289366,"v2.5.12","- 🐛 Fix: Color artifacts regression - Reverted in-place tensor operations in video transform pipeline that caused color artifacts on some images","2025-11-28T22:52:29",{"id":205,"version":206,"summary_zh":207,"released_at":208},289367,"v2.5.11","- ⚡ Feature: CUDNN attention backend - Added support for 
PyTorch 2.3+ CUDNN_ATTENTION backend with automatic fallback for older versions (thanks @eadwu)\r\n- 💾 Fix: Memory spike for long videos - VAE decode now streams directly to pre-allocated tensor, eliminating OOM errors during long video processing\r\n- 🎨 Fix: LAB color correction artifacts - Resolved tile boundary artifacts using wavelet reconstruction preprocessing\r\n- 🎨 Fix: Color reference misalignment - Fixed color correction frame alignment with temporal overlap\r\n- 🍎 Fix: MPS detection reliability - Switched to canonical torch.backends.mps.is_available() API for consistent Apple Silicon detection\r\n- 🖥️ Fix: Mac subprocess error - CLI now uses direct processing on Mac to avoid MPS allocator failures in child processes\r\n- 🖥️ Fix: Multi-GPU device assignment - CUDA_VISIBLE_DEVICES now set before spawn for proper worker inheritance\r\n- 📊 Fix: BlockSwap logging - Now shows effective\u002Ftotal blocks (e.g., 32\u002F32) instead of raw requested value\r\n- 🔧 Feature: Auto bfloat16 detection - Automatically detects bfloat16 support to prevent CUBLAS errors on older GPUs\r\n- 📊 Feature: Peak RAM tracking - Added RAM usage alongside VRAM in debug summary\r\n- ⚡ Performance: In-place tensor ops - Reduced memory allocation overhead with in-place operations throughout pipeline\r\n- 📖 Docs: Multi-GPU clarification - Clarified frame-level parallelism behavior expectations for multi-GPU setups","2025-11-28T22:11:29",{"id":210,"version":211,"summary_zh":212,"released_at":213},289368,"v2.5.10","- 🎯 Fix: Deterministic generation - Identical images with the same seed now produce identical results across different sessions and batch positions\r\n- 🔧 Fix: Model caching with BlockSwap - Resolved issue where cached DiT models wouldn't properly reload when VAE caching state changed\r\n- 💾 Fix: Runner caching optimization - Runner templates now correctly cache whenever both DiT and VAE are cached, regardless of caching order\r\n- 📁 Fix: Case-insensitive model paths - Extra 
model paths in YAML config now work regardless of case (seedvr2, SEEDVR2, SeedVR2, etc.)\r\n- 🐛 Fix: High resolution tile debug crash - Fixed \"NoneType has no attribute log\" error when using maximum resolution with VAE tiling\r\n- 📊 Fix: Temporal overlap logging - Corrected frame count reporting when temporal overlap is automatically adjusted\r\n- 🔍 Feature: Enhanced model path debugging - Added detailed logging to help troubleshoot model loading issues (visible in debug mode)","2025-11-13T17:04:52",{"id":215,"version":216,"summary_zh":217,"released_at":218},289369,"v2.5.9","- 🐛 Fix: Tile debug visualization crash - Fixed OpenCV error when using VAE tile debug mode on certain systems.\r\n- 🍎 Fix: macOS MPS loading error - Added automatic CPU fallback for MPS allocator issues on certain PyTorch\u002FmacOS versions.\r\n- 🖥️ Fix: Windows log buffering - Added flush to print statements for real-time log visibility in ComfyUI on Windows\r\n- 📦 Fix: ComfyUI Registry logo - Updated icon URL to display properly in ComfyUI node registry\r\n- ℹ️ Feature: Version display - Added version number to node name and CLI\u002FComfyUI header for better tracking\r\n- 💝 Feature: GitHub Sponsors - Added sponsor button to support project development. Thank you everyone for your support!\r\n- 📜 License: Apache 2.0 - Reverted License from MIT to Apache 2.0 to match ByteDance Seed project","2025-11-12T19:40:53",{"id":220,"version":221,"summary_zh":222,"released_at":223},289370,"v2.5.8","- 🐛 Fix (CLI): Windows batch processing duplicate files - Fixed CLI batch mode processing each file twice on Windows due to case-insensitive filesystem. Improved directory scanning performance by 2-3x\r\n- 📁 Fix(CLI): Output folder location - Output files now created in sensible locations: batch mode creates {folder_name}_upscaled\u002F sibling folder with original filenames preserved; single file mode adds _upscaled suffix in same directory. 
All logs now show absolute paths for clarity\r\n- 🎨 Fix(CLI): RGBA alpha channel support - PNG images with transparency are now properly detected and preserved through the upscaling pipeline, matching ComfyUI behavior","2025-11-11T14:44:17"]