[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-ModelTC--LightX2V":3,"tool-ModelTC--LightX2V":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 
既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":75,"owner_avatar_url":76,"owner_bio":77,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":78,"owner_url":79,"languages":80,"stars":112,"forks":113,"last_commit_at":114,"license":115,"difficulty_score":10,"env_os":116,"env_gpu":117,"env_ram":118,"env_deps":119,"category_tags":126,"github_topics":127,"view_count":23,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":132,"updated_at":133,"faqs":134,"releases":175},2749,"ModelTC\u002FLightX2V","LightX2V","Light Image Video Generation Inference Framework","LightX2V 是一款先进且轻量级的图像与视频生成推理框架，旨在为用户提供高效、高性能的多模态内容合成方案。它成功解决了传统生成模型部署复杂、推理速度慢以及难以统一支持多种任务格式的痛点，将文本生成视频（T2V）、图生视频（I2V）、文生图（T2I）及图像编辑等功能整合于同一平台，实现了从不同输入模态到视觉输出的无缝转换。\n\n该工具非常适合 AI 开发者、研究人员以及需要快速构建生成式应用的技术团队使用。其核心亮点在于卓越的工程优化能力：不仅支持 FP8 和 NVFP4 等前沿量化技术以大幅降低显存占用并提升推理速度，还创新性地引入了基于强化学习的 GenRL 框架，显著提升了生成内容的审美质量与动作连贯性。此外，LightX2V 具备灵活的部署特性，兼容 Intel AIPC 硬件，并支持基于 Mooncake 的解耦部署架构，能够轻松适应从本地开发到大规模集群的各种应用场景。无论是进行算法研究还是落地实际产品，LightX2V 都能提供稳定且强大的底层支持。","\u003Cdiv align=\"center\" style=\"font-family: charter;\">\n  \u003Ch1>⚡️ LightX2V:\u003Cbr> Light Video Generation Inference Framework\u003C\u002Fh1>\n\n\u003Cimg alt=\"logo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FModelTC_LightX2V_readme_ff9a58abf934.png\" width=75%>\u003C\u002Fimg>\n\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002FModelTC\u002Flightx2v)\n[![Doc](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-English-99cc2)](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest)\n[![Doc](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F文档-中文-99cc2)](https:\u002F\u002Flightx2v-zhcn.readthedocs.io\u002Fzh-cn\u002Flatest)\n[![Papers](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F论文集-中文-99cc2)](https:\u002F\u002Flightx2v-papers-zhcn.readthedocs.io\u002Fzh-cn\u002Flatest)\n[![Docker](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocker-2496ED?style=flat&logo=docker&logoColor=white)](https:\u002F\u002Fhub.docker.com\u002Fr\u002Flightx2v\u002Flightx2v\u002Ftags)\n\n**\\[ English | [中文](README_zh.md) \\]**\n\n\u003C\u002Fdiv>\n\n--------------------------------------------------------------------------------\n\n**LightX2V** is an advanced lightweight image\u002Fvideo generation inference framework engineered to deliver efficient, high-performance 
image\u002Fvideo synthesis solutions. This unified platform integrates multiple state-of-the-art image\u002Fvideo generation techniques, supporting diverse generation tasks including text-to-video (T2V), image-to-video (I2V), text-to-image (T2I), and image editing (I2I). **X2V represents the transformation of different input modalities (X, such as text or images) into visual output (Vision)**.\n\n> 🌐 **Try it online now!** Experience LightX2V without installation: **[LightX2V Online Service](https:\u002F\u002Fx2v.light-ai.top\u002Flogin)** - Free, lightweight, and fast AI digital human video generation platform.\n\n> 🎉 **NEW: GenRL is here!** Check out our new **[GenRL Framework](https:\u002F\u002Fgithub.com\u002FModelTC\u002FGenRL)** for training visual generation models with reinforcement learning! High-performance RL-trained checkpoints are now available on **[HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Flightx2v\u002Fgenrl)**.\n\n> 👋 **Join our WeChat group! LightX2V Robot WeChat ID: random42seed**\n\n## 🧾 Community Code Contribution Guidelines\n\nBefore submitting, please ensure that your code is formatted according to the project standard. You can run the following commands to keep the project's code formatting consistent.\n\n```bash\npip install ruff pre-commit\npre-commit run --all-files\n```\n\nIn addition to the LightX2V team, we have also received contributions from community developers, including but not limited to:\n\n- [zhtshr](https:\u002F\u002Fgithub.com\u002Fzhtshr)\n- [triple-Mu](https:\u002F\u002Fgithub.com\u002Ftriple-Mu)\n- [vivienfanghuagood](https:\u002F\u002Fgithub.com\u002Fvivienfanghuagood)\n- [yeahdongcn](https:\u002F\u002Fgithub.com\u002Fyeahdongcn)\n- [kikidouloveme79](https:\u002F\u002Fgithub.com\u002Fkikidouloveme79)\n- [ziyanxzy](https:\u002F\u002Fgithub.com\u002Fziyanxzy)\n\n## :fire: Latest News\n\n- **March 5, 2026:** 🚀 We now support deployment on Intel AIPC PTL. Thanks to the Intel team.\n\n- **March 5, 2026:** 🚀 We now support disaggregated deployment based on [Mooncake](https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake). More improvements and documentation for disaggregated deployment are in progress. Thanks to the Mooncake team for their help!\n\n- **February 27, 2026:** 🚀 We now support FP8 and NVFP4 quantization for autoregressive video generation models ([Self Forcing](https:\u002F\u002Fgithub.com\u002Fguandeh17\u002FSelf-Forcing))! You can find the quantized models here: **[Self-Forcing-FP8](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FSelf-Forcing-FP8), [Self-Forcing-NVFP4](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FSelf-Forcing-NVFP4)**.\n\n- **February 11, 2026:** 🎉 We are excited to announce **[GenRL](https:\u002F\u002Fgithub.com\u002FModelTC\u002FGenRL)** - a scalable reinforcement learning framework for visual generation! GenRL enables training diffusion\u002Fflow models with multi-reward optimization (HPSv3, VideoAlign, etc.) using the GRPO algorithm. We've released high-performance LoRA checkpoints trained with a multi-node, multi-GPU setup, demonstrating significant improvements in aesthetic quality, motion coherence, and text-video alignment. Check out our [model collection](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Flightx2v\u002Fgenrl) on HuggingFace! 
Give us a ⭐ if you find it useful!\n\n- **January 20, 2026:** 🚀 We support the [LTX-2](https:\u002F\u002Fhuggingface.co\u002FLightricks\u002FLTX-2) audio-video generation model, featuring CFG parallelism, block-level offload, and FP8 per-tensor quantization. Usage examples can be found in [examples\u002Fltx2](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fexamples\u002Fltx2) and [scripts\u002Fltx2](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fscripts\u002Fltx2).\n\n- **January 6, 2026:** 🚀 We updated the 8-step CFG\u002Fstep-distilled models for [Qwen-Image-2512](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-2512) and [Qwen\u002FQwen-Image-Edit-2511](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Edit-2511). You can download the corresponding weights from [Qwen-Image-Edit-2511-Lightning](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FQwen-Image-Edit-2511-Lightning) and [Qwen-Image-2512-Lightning](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FQwen-Image-2512-Lightning) for use. Usage tutorials can be found [here](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fexamples\u002Fqwen_image).\n\n- **January 6, 2026:** 🚀 Supported deployment on Enflame S60 (GCU).\n\n- **December 31, 2025:** 🚀 We have supported the [Qwen-Image-2512](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-2512) text-to-image model since Day 0. Our [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FQwen-Image-2512-Lightning) has been updated with CFG \u002F step-distilled LoRA. Usage examples can be found [here](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fexamples\u002Fqwen_image).\n\n- **December 27, 2025:** 🚀 Supported deployment on MThreads MUSA.\n\n- **December 25, 2025:** 🚀 Supported deployment on AMD ROCm and Ascend 910B.\n\n- **December 23, 2025:** 🚀 We have supported the [Qwen-Image-Edit-2511](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Edit-2511) image editing model since Day 0. On a single H100 GPU, LightX2V delivers approximately 1.4× speedup. We support CFG parallelism, Ulysses parallelism, and efficient offloading technologies. Our [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FQwen-Image-Edit-2511-Lightning) has been updated with CFG \u002F step-distilled LoRA and FP8 weights. Usage examples can be found [here](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fexamples\u002Fqwen_image). Combined with LightX2V, 4-step CFG \u002F step distillation, and the FP8 model, the maximum acceleration can reach approximately 42×. Feel free to try the [LightX2V Online Service](https:\u002F\u002Fx2v.light-ai.top\u002Flogin) with *Image to Image* and the *Qwen-Image-Edit-2511* model.\n\n- **December 22, 2025:** 🚀 Added **Wan2.1 NVFP4 quantization-aware 4-step distilled models**; weights are available on HuggingFace: [Wan-NVFP4](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan-NVFP4).\n\n- **December 15, 2025:** 🚀 Supported deployment on Hygon DCU.\n\n- **December 4, 2025:** 🚀 Supported GGUF format model inference & deployment on Cambricon MLU590\u002FMetaX C500.\n\n- **November 24, 2025:** 🚀 We released 4-step distilled models for HunyuanVideo-1.5! These models enable **ultra-fast 4-step inference** without CFG requirements, achieving approximately **25x speedup** compared to standard 50-step inference. 
Both base and FP8 quantized versions are now available: [Hy1.5-Distill-Models](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FHy1.5-Distill-Models).\n\n- **November 21, 2025:** 🚀 We support the [HunyuanVideo-1.5](https:\u002F\u002Fhuggingface.co\u002Ftencent\u002FHunyuanVideo-1.5) video generation model since Day 0. With the same number of GPUs, LightX2V can achieve a speed improvement of over 2 times and supports deployment on GPUs with lower memory (such as the 24GB RTX 4090). It also supports CFG\u002FUlysses parallelism, efficient offloading, TeaCache\u002FMagCache technologies, and more. We will soon update more models on our [HuggingFace page](https:\u002F\u002Fhuggingface.co\u002Flightx2v), including step distillation, VAE distillation, and other related models. Quantized models and lightweight VAE models are now available: [Hy1.5-Quantized-Models](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FHy1.5-Quantized-Models) for quantized inference, and [LightTAE for HunyuanVideo-1.5](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FAutoencoders\u002Fblob\u002Fmain\u002Flighttaehy1_5.safetensors) for fast VAE decoding. Refer to [this](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fscripts\u002Fhunyuan_video_15) for usage tutorials, or check out the [examples directory](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fexamples) for code examples.\n\n\n## 🏆 Performance Benchmarks (Updated on 2025.12.01)\n\n### 📊 Cross-Framework Performance Comparison (H100)\n\n| Framework | GPUs | Step Time | Speedup |\n|-----------|---------|---------|---------|\n| Diffusers | 1 | 9.77s\u002Fit | 1x |\n| xDiT | 1 | 8.93s\u002Fit | 1.1x |\n| FastVideo | 1 | 7.35s\u002Fit | 1.3x |\n| SGL-Diffusion | 1 | 6.13s\u002Fit | 1.6x |\n| **LightX2V** | 1 | **5.18s\u002Fit** | **1.9x** 🚀 |\n| FastVideo | 8 | 2.94s\u002Fit | 1x |\n| xDiT | 8 | 2.70s\u002Fit | 1.1x |\n| SGL-Diffusion | 8 | 1.19s\u002Fit | 2.5x |\n| **LightX2V** | 8 | **0.75s\u002Fit** | **3.9x** 🚀 |\n\n### 📊 Cross-Framework Performance Comparison (RTX 4090D)\n\n| Framework | GPUs | Step Time | Speedup |\n|-----------|---------|---------|---------|\n| Diffusers | 1 | 30.50s\u002Fit | 1x |\n| FastVideo | 1 | 22.66s\u002Fit | 1.3x |\n| xDiT | 1 | OOM | OOM |\n| SGL-Diffusion | 1 | OOM | OOM |\n| **LightX2V** | 1 | **20.26s\u002Fit** | **1.5x** 🚀 |\n| FastVideo | 8 | 15.48s\u002Fit | 1x |\n| xDiT | 8 | OOM | OOM |\n| SGL-Diffusion | 8 | OOM | OOM |\n| **LightX2V** | 8 | **4.75s\u002Fit** | **3.3x** 🚀 |\n\n### 📊 LightX2V Performance Comparison\n\n| Framework | GPU | Configuration | Step Time | Speedup |\n|-----------|-----|---------------|-----------|---------------|\n| **LightX2V** | H100 | 8 GPUs + cfg | 0.75s\u002Fit | 1x |\n| **LightX2V** | H100 | 8 GPUs + no cfg | 0.39s\u002Fit | 1.9x |\n| **LightX2V** | H100 | **8 GPUs + no cfg + fp8** | **0.35s\u002Fit** | **2.1x** 🚀 |\n| **LightX2V** | 4090D | 8 GPUs + cfg | 4.75s\u002Fit | 1x |\n| **LightX2V** | 4090D | 8 GPUs + no cfg | 3.13s\u002Fit | 1.5x |\n| **LightX2V** | 4090D | **8 GPUs + no cfg + fp8** | **2.35s\u002Fit** | **2.0x** 🚀 |\n\n**Note**: All the above performance data were tested on Wan2.1-I2V-14B-480P(40 steps, 81 frames). 
In addition, we also provide 4-step distilled models on the [HuggingFace page](https:\u002F\u002Fhuggingface.co\u002Flightx2v).\n\n\n## 💡 Quick Start\n\nFor comprehensive usage instructions, please refer to our documentation: **[English Docs](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002F) | [中文文档](https:\u002F\u002Flightx2v-zhcn.readthedocs.io\u002Fzh-cn\u002Flatest\u002F)**\n\n**We highly recommend using the Docker environment, as it is the simplest and fastest way to set up the environment. For details, please refer to the Quick Start section in the documentation.**\n\n### Installation from Git\n```bash\npip install -v git+https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V.git\n```\n\n### Building from Source\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V.git\ncd LightX2V\nuv pip install -v . # pip install -v .\n```\n\n### (Optional) Install Attention\u002FQuantize Operators\nFor attention operators installation, please refer to our documentation: **[English Docs](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started\u002Fquickstart.html#step-4-install-attention-operators) | [中文文档](https:\u002F\u002Flightx2v-zhcn.readthedocs.io\u002Fzh-cn\u002Flatest\u002Fgetting_started\u002Fquickstart.html#id9)**\n\n### Usage Example\n\n```python\n# examples\u002Fwan\u002Fwan_i2v.py\n\"\"\"\nWan2.2 image-to-video generation example.\nThis example demonstrates how to use LightX2V with Wan2.2 model for I2V generation.\n\"\"\"\n\nfrom lightx2v import LightX2VPipeline\n\n# Initialize pipeline for Wan2.2 I2V task\n# For wan2.1, use model_cls=\"wan2.1\"\npipe = LightX2VPipeline(\n    model_path=\"\u002Fpath\u002Fto\u002FWan2.2-I2V-A14B\",\n    model_cls=\"wan2.2_moe\",\n    task=\"i2v\",\n)\n\n# Alternative: create generator from config JSON file\n# pipe.create_generator(\n#     config_json=\"configs\u002Fwan22\u002Fwan_moe_i2v.json\"\n# )\n\n# Enable offloading to significantly reduce VRAM usage with minimal speed impact\n# Suitable for RTX 30\u002F40\u002F50 consumer GPUs\npipe.enable_offload(\n    cpu_offload=True,\n    offload_granularity=\"block\",  # For Wan models, supports both \"block\" and \"phase\"\n    text_encoder_offload=True,\n    image_encoder_offload=False,\n    vae_offload=False,\n)\n\n# Create generator manually with specified parameters\npipe.create_generator(\n    attn_mode=\"sage_attn2\",\n    infer_steps=40,\n    height=480,  # Can be set to 720 for higher resolution\n    width=832,  # Can be set to 1280 for higher resolution\n    num_frames=81,\n    guidance_scale=[3.5, 3.5],  # For wan2.1, guidance_scale is a scalar (e.g., 5.0)\n    sample_shift=5.0,\n)\n\n# Generation parameters\nseed = 42\nprompt = \"Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. 
A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside.\"\nnegative_prompt = \"镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走\"\nimage_path=\"\u002Fpath\u002Fto\u002Fimg_0.jpg\"\nsave_result_path = \"\u002Fpath\u002Fto\u002Fsave_results\u002Foutput.mp4\"\n\n# Generate video\npipe.generate(\n    seed=seed,\n    image_path=image_path,\n    prompt=prompt,\n    negative_prompt=negative_prompt,\n    save_result_path=save_result_path,\n)\n```\n\n**NVFP4 (quantization-aware 4-step) resources**\n- Inference examples: `examples\u002Fwan\u002Fwan_i2v_nvfp4.py` (I2V) and `examples\u002Fwan\u002Fwan_t2v_nvfp4.py` (T2V).\n- NVFP4 operator build\u002Finstall guide: see `lightx2v_kernel\u002FREADME.md`.\n\n> 💡 **More Examples**: For more usage examples including quantization, offloading, caching, and other advanced configurations, please refer to the [examples directory](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fexamples).\n\n\n\n## 🤖 Supported Model Ecosystem\n\n### Official Open-Source Models\n- ✅ [LTX-2](https:\u002F\u002Fhuggingface.co\u002FLightricks\u002FLTX-2)\n- ✅ [HunyuanVideo-1.5](https:\u002F\u002Fhuggingface.co\u002Ftencent\u002FHunyuanVideo-1.5)\n- ✅ [Wan2.1 & Wan2.2](https:\u002F\u002Fhuggingface.co\u002FWan-AI\u002F)\n- ✅ [Qwen-Image](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image)\n- ✅ [Qwen-Image-Edit](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FQwen\u002FQwen-Image-Edit)\n- ✅ [Qwen-Image-Edit-2509](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Edit-2509)\n- ✅ [Qwen-Image-Edit-2511](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Edit-2511)\n\n### Quantized and Distilled Models\u002FLoRAs (**🚀 Recommended: 4-step inference**)\n- ✅ [Wan2.1-Distill-Models](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan2.1-Distill-Models)\n- ✅ [Wan2.2-Distill-Models](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan2.2-Distill-Models)\n- ✅ [Wan2.1-Distill-Loras](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan2.1-Distill-Loras)\n- ✅ [Wan2.2-Distill-Loras](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan2.2-Distill-Loras)\n- ✅ [Wan2.1-Distill-NVFP4](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan-NVFP4)\n- ✅ [Qwen-Image-Edit-2511-Lightning](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FQwen-Image-Edit-2511-Lightning)\n\n### Lightweight Autoencoder Models (**🚀 Recommended: fast inference & low memory usage**)\n- ✅ [Autoencoders](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FAutoencoders)\n\n### Autoregressive Models\n- ✅ [Wan2.1-T2V-CausVid](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan2.1-T2V-14B-CausVid)\n- ✅ [Self-Forcing](https:\u002F\u002Fgithub.com\u002Fguandeh17\u002FSelf-Forcing)\n- ✅ [Matrix-Game-2.0](https:\u002F\u002Fhuggingface.co\u002FSkywork\u002FMatrix-Game-2.0)\n\n🔔 Follow our [HuggingFace page](https:\u002F\u002Fhuggingface.co\u002Flightx2v) for the latest model releases from our team.\n\n💡 Refer to the [Model Structure Documentation](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started\u002Fmodel_structure.html) to quickly get started with LightX2V\n\n## 🚀 Frontend Interfaces\n\nWe provide multiple frontend interface deployment options:\n\n- **🎨 Gradio Interface**: Clean and user-friendly web interface, perfect for quick experience and prototyping\n  - 📖 [Gradio Deployment 
Guide](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Fdeploy_gradio.html)\n- **🎯 ComfyUI Interface**: Powerful node-based workflow interface, supporting complex video generation tasks\n  - 📖 [ComfyUI Deployment Guide](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Fdeploy_comfyui.html)\n- **🚀 Windows One-Click Deployment**: Convenient deployment solution designed for Windows users, featuring automatic environment configuration and intelligent parameter optimization\n  - 📖 [Windows One-Click Deployment Guide](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Fdeploy_local_windows.html)\n\n**💡 Recommended Solutions**:\n- **First-time Users**: We recommend the Windows one-click deployment solution\n- **Advanced Users**: We recommend the ComfyUI interface for more customization options\n- **Quick Experience**: The Gradio interface provides the most intuitive operation experience\n\n## 🚀 Core Features\n\n### 🎯 **Ultimate Performance Optimization**\n- **🔥 SOTA Inference Speed**: Achieve **~20x** acceleration via step distillation and system optimization (single GPU)\n- **⚡️ Revolutionary 4-Step Distillation**: Compress original 40-50 step inference to just 4 steps without CFG requirements\n- **🛠️ Advanced Operator Support**: Integrated with cutting-edge operators including [Sage Attention](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FSageAttention), [Flash Attention](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention), [Radial Attention](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fradial-attention), [q8-kernel](https:\u002F\u002Fgithub.com\u002FKONAKONA666\u002Fq8_kernels), [sgl-kernel](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang\u002Ftree\u002Fmain\u002Fsgl-kernel), [vllm](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm)\n\n### 💾 **Resource-Efficient Deployment**\n- **💡 Breaking Hardware Barriers**: Run 14B models for 480P\u002F720P video generation with only **8GB VRAM + 16GB RAM**\n- **🔧 Intelligent Parameter Offloading**: Advanced disk-CPU-GPU three-tier offloading architecture with phase\u002Fblock-level granular management\n- **⚙️ Comprehensive Quantization**: Support for `w8a8-int8`, `w8a8-fp8`, `w4a4-nvfp4` and other quantization strategies\n\n### 🎨 **Rich Feature Ecosystem**\n- **📈 Smart Feature Caching**: Intelligent caching mechanisms to eliminate redundant computations\n- **🔄 Parallel Inference**: Multi-GPU parallel processing for enhanced performance\n- **📱 Flexible Deployment Options**: Support for Gradio, service deployment, ComfyUI and other deployment methods\n- **🎛️ Dynamic Resolution Inference**: Adaptive resolution adjustment for optimal generation quality\n- **🎞️ Video Frame Interpolation**: RIFE-based frame interpolation for smooth frame rate enhancement\n\n\n## 📚 Technical Documentation\n\n### 📖 **Method Tutorials**\n- [Model Quantization](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fquantization.html) - Comprehensive guide to quantization strategies\n- [Feature Caching](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fcache.html) - Intelligent caching mechanisms\n- [Attention Mechanisms](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fattention.html) - State-of-the-art attention operators\n- [Parameter 
Offloading](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Foffload.html) - Three-tier storage architecture\n- [Parallel Inference](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fparallel.html) - Multi-GPU acceleration strategies\n- [Changing Resolution Inference](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fchanging_resolution.html) - U-shaped resolution strategy\n- [Step Distillation](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fstep_distill.html) - 4-step inference technology\n- [Video Frame Interpolation](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fvideo_frame_interpolation.html) - Based on the RIFE technology\n\n### 🛠️ **Deployment Guides**\n- [Low-Resource Deployment](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Ffor_low_resource.html) - Optimized 8GB VRAM solutions\n- [Low-Latency Deployment](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Ffor_low_latency.html) - Ultra-fast inference optimization\n- [Gradio Deployment](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Fdeploy_gradio.html) - Web interface setup\n- [Service Deployment](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Fdeploy_service.html) - Production API service deployment\n- [LoRA Model Deployment](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Flora_deploy.html) - Flexible LoRA deployment\n\n## 🤝 Acknowledgments\n\nWe sincerely thank all the model repositories and research communities that inspired and advanced the development of LightX2V. This framework is built on the collective efforts of the open-source community. 
It includes but is not limited to:\n\n- [Tencent-Hunyuan](https:\u002F\u002Fgithub.com\u002FTencent-Hunyuan)\n- [Wan-Video](https:\u002F\u002Fgithub.com\u002FWan-Video)\n- [Qwen-Image](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen-Image)\n- [LightLLM](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightLLM)\n- [sglang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang)\n- [vllm](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm)\n- [flash-attention](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention)\n- [SageAttention](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FSageAttention)\n- [flashinfer](https:\u002F\u002Fgithub.com\u002Fflashinfer-ai\u002Fflashinfer)\n- [MagiAttention](https:\u002F\u002Fgithub.com\u002FSandAI-org\u002FMagiAttention)\n- [radial-attention](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fradial-attention)\n- [xDiT](https:\u002F\u002Fgithub.com\u002Fxdit-project\u002FxDiT)\n- [FastVideo](https:\u002F\u002Fgithub.com\u002Fhao-ai-lab\u002FFastVideo)\n- [Mooncake](https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake)\n\n## 🌟 Star History\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FModelTC_LightX2V_readme_b5b0dcc400b5.png)](https:\u002F\u002Fstar-history.com\u002F#ModelTC\u002Flightx2v&Timeline)\n\n## ✏️ Citation\n\nIf you find LightX2V useful in your research, please consider citing our work:\n\n```bibtex\n@misc{lightx2v,\n author = {LightX2V Contributors},\n title = {LightX2V: Light Video Generation Inference Framework},\n year = {2025},\n publisher = {GitHub},\n journal = {GitHub repository},\n howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FModelTC\u002Flightx2v}},\n}\n```\n\n## 📞 Contact & Support\n\nFor questions, suggestions, or support, please feel free to reach out through:\n- 🐛 [GitHub Issues](https:\u002F\u002Fgithub.com\u002FModelTC\u002Flightx2v\u002Fissues) - Bug reports and feature requests\n\n---\n\n\u003Cdiv align=\"center\">\nBuilt with ❤️ by the LightX2V team\n\u003C\u002Fdiv>\n","\u003Cdiv align=\"center\" style=\"font-family: charter;\">\n  \u003Ch1>⚡️ LightX2V:\u003Cbr> 轻量级视频生成推理框架\u003C\u002Fh1>\n\n\u003Cimg alt=\"logo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FModelTC_LightX2V_readme_ff9a58abf934.png\" width=75%>\u003C\u002Fimg>\n\n[![许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n[![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002FModelTC\u002Flightx2v)\n[![文档（英文）](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-English-99cc2.svg)](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest)\n[![文档（中文）](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F文档-中文-99cc2.svg)](https:\u002F\u002Flightx2v-zhcn.readthedocs.io\u002Fzh-cn\u002Flatest)\n[![论文集（中文）](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F论文集-中文-99cc2.svg)](https:\u002F\u002Flightx2v-papers-zhcn.readthedocs.io\u002Fzh-cn\u002Flatest)\n[![Docker](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocker-2496ED?style=flat&logo=docker&logoColor=white)](https:\u002F\u002Fhub.docker.com\u002Fr\u002Flightx2v\u002Flightx2v\u002Ftags)\n\n**\\[ 英文 | [中文](README_zh.md) \\]**\n\n\u003C\u002Fdiv>\n\n--------------------------------------------------------------------------------\n\n**LightX2V** 是一款先进的轻量级图像\u002F视频生成推理框架，旨在提供高效、高性能的图像\u002F视频合成解决方案。该统一平台集成了多种最先进的图像\u002F视频生成技术，支持多样化的生成任务，包括文本到视频（T2V）、图像到视频（I2V）、文本到图像（T2I）以及图像编辑（I2I）。**X2V 
代表将不同输入模态（X，如文本或图像）转换为视觉输出（Vision）的过程**。\n\n> 🌐 **立即在线体验！** 无需安装即可体验 LightX2V：**[LightX2V 在线服务](https:\u002F\u002Fx2v.light-ai.top\u002Flogin)** - 免费、轻量且快速的 AI 数字人视频生成平台。\n\n> 🎉 **全新发布：GenRL 来了！** 欢迎查看我们全新的 **[GenRL 框架](https:\u002F\u002Fgithub.com\u002FModelTC\u002FGenRL)**，用于通过强化学习训练视觉生成模型！高性能 RL 训练检查点现已在 **[HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Flightx2v\u002Fgenrl)** 上发布。\n\n> 👋 **加入我们的微信交流群！LightX2V 机器人微信号：random42seed**\n\n## 🧾 社区代码贡献指南\n\n在提交代码之前，请确保代码格式符合项目标准。您可以通过以下命令来保证项目代码格式的一致性：\n\n```bash\npip install ruff pre-commit\npre-commit run --all-files\n```\n\n除了 LightX2V 团队的贡献外，我们还收到了一些社区开发者的贡献，其中包括但不限于：\n\n- [zhtshr](https:\u002F\u002Fgithub.com\u002Fzhtshr)\n- [triple-Mu](https:\u002F\u002Fgithub.com\u002Ftriple-Mu)\n- [vivienfanghuagood](https:\u002F\u002Fgithub.com\u002Fvivienfanghuagood)\n- [yeahdongcn](https:\u002F\u002Fgithub.com\u002Fyeahdongcn)\n- [kikidouloveme79](https:\u002F\u002Fgithub.com\u002Fkikidouloveme79)\n- [ziyanxzy](https:\u002F\u002Fgithub.com\u002Fziyanxzy)\n\n## :fire: 最新消息\n\n- **2026年3月5日:** 🚀 现已支持在Intel AIPC PTL上部署。感谢Intel团队！\n\n- **2026年3月5日:** 🚀 现已支持基于[Mooncake](https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake)的分布式部署。更多关于分布式部署的改进和文档正在开发中。感谢Mooncake团队的帮助！\n\n- **2026年2月27日:** 🚀 现已支持自回归视频生成模型（[Self Forcing](https:\u002F\u002Fgithub.com\u002Fguandeh17\u002FSelf-Forcing)）的FP8和NVFP4量化！您可以在以下链接找到量化后的模型：**[Self-Forcing-FP8](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FSelf-Forcing-FP8), [Self-Forcing-NVFP4](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FSelf-Forcing-NVFP4)**。\n\n- **2026年2月11日:** 🎉 我们很高兴地宣布推出**[GenRL](https:\u002F\u002Fgithub.com\u002FModelTC\u002FGenRL)**——一个用于视觉生成的可扩展强化学习框架！GenRL支持使用GRPO算法，通过多奖励优化（HPSv3、VideoAlign等）训练扩散\u002F流模型。我们发布了在多节点多GPU环境下训练的高性能LoRA检查点，展示了在美学质量、运动连贯性和文本与视频对齐方面的显著提升。请查看我们在HuggingFace上的[模型合集](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Flightx2v\u002Fgenrl)！如果您觉得有用，请给我们点个赞⭐！\n\n- **2026年1月20日:** 🚀 我们支持[LTX-2](https:\u002F\u002Fhuggingface.co\u002FLightricks\u002FLTX-2)音视频生成模型，该模型具备CFG并行、块级卸载以及每张张量FP8量化等功能。使用示例可在[examples\u002Fltx2](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fexamples\u002Fltx2)和[scripts\u002Fltx2](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fscripts\u002Fltx2)中找到。\n\n- **2026年1月6日:** 🚀 我们更新了针对[Qwen-Image-2512](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-2512)和[Qwen\u002FQwen-Image-Edit-2511](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Edit-2511)的8步CFG\u002F步骤蒸馏模型。您可以从[Qwen-Image-Edit-2511-Lightning](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FQwen-Image-Edit-2511-Lightning)和[Qwen-Image-2512-Lightning](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FQwen-Image-2512-Lightning)下载相应权重以供使用。使用教程可在[这里](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fexamples\u002Fqwen_image)找到。\n\n- **2026年1月6日:** 🚀 支持在Enflame S60（GCU）上部署。\n\n- **2025年12月31日:** 🚀 自第一天起，我们就支持[Qwen-Image-2512](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-2512)文本到图像模型。我们的[HuggingFace](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FQwen-Image-2512-Lightning)已更新为CFG\u002F步骤蒸馏的LoRA版本。使用示例可在[这里](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fexamples\u002Fqwen_image)找到。\n\n- **2025年12月27日:** 🚀 支持在MThreads MUSA上部署。\n\n- **2025年12月25日:** 🚀 支持在AMD ROCm和Ascend 910B上部署。\n\n- **2025年12月23日:** 🚀 
自第一天起，我们就支持[Qwen-Image-Edit-2511](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Edit-2511)图像编辑模型。在单张H100 GPU上，LightX2V可带来约1.4倍的速度提升。我们支持CFG并行、Ulysses并行以及高效的卸载技术。我们的[HuggingFace](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FQwen-Image-Edit-2511-Lightning)已更新为CFG\u002F步骤蒸馏的LoRA和FP8权重。使用示例可在[这里](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fexamples\u002Fqwen_image)找到。结合LightX2V、4步CFG\u002F步骤蒸馏以及FP8模型，最高加速可达约42倍。欢迎试用[LightX2V在线服务](https:\u002F\u002Fx2v.light-ai.top\u002Flogin)，体验“图片转图片”和“Qwen-Image-Edit-2511”模型。\n\n- **2025年12月22日:** 🚀 新增**Wan2.1 NVFP4量化感知的4步蒸馏模型**；权重已在HuggingFace上发布：[Wan-NVFP4](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan-NVFP4)。\n\n- **2025年12月15日:** 🚀 支持在Hygon DCU上部署。\n\n- **2025年12月4日:** 🚀 支持GGUF格式模型在Cambricon MLU590\u002FMetaX C500上的推理与部署。\n\n- **2025年11月24日:** 🚀 我们发布了HunyuanVideo-1.5的4步蒸馏模型！这些模型无需CFG即可实现**超快速4步推理**，相比标准50步推理，速度提升约**25倍**。基础版和FP8量化版现已上线：[Hy1.5-Distill-Models](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FHy1.5-Distill-Models)。\n\n- **2025年11月21日:** 🚀 自第一天起，我们就支持[Tencent的HunyuanVideo-1.5](https:\u002F\u002Fhuggingface.co\u002Ftencent\u002FHunyuanVideo-1.5)视频生成模型。在相同数量的GPU下，LightX2V可将速度提升至2倍以上，并支持在显存较低的GPU（如24GB RTX 4090）上部署。它还支持CFG\u002FUlysses并行、高效卸载、TeaCache\u002FMagCache等技术。我们将在不久的将来更新更多模型至我们的[HuggingFace页面](https:\u002F\u002Fhuggingface.co\u002Flightx2v)，包括步骤蒸馏、VAE蒸馏等相关模型。量化模型和轻量级VAE模型现已上线：[Hy1.5-Quantized-Models](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FHy1.5-Quantized-Models)用于量化推理，而[LightTAE for HunyuanVideo-1.5](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FAutoencoders\u002Fblob\u002Fmain\u002Flighttaehy1_5.safetensors)则用于快速VAE解码。使用教程请参考[这里](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fscripts\u002Fhunyuan_video_15)，或查看[示例目录](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fexamples)获取代码示例。\n\n\n## 🏆 性能基准测试（更新于2025年12月1日）\n\n### 📊 跨框架性能对比（H100）\n\n| 框架 | GPU数量 | 每步耗时 | 加速比 |\n|-----------|---------|---------|---------|\n| Diffusers | 1 | 9.77秒\u002F步 | 1x |\n| xDiT | 1 | 8.93秒\u002F步 | 1.1x |\n| FastVideo | 1 | 7.35秒\u002F步 | 1.3x |\n| SGL-Diffusion | 1 | 6.13秒\u002F步 | 1.6x |\n| **LightX2V** | 1 | **5.18秒\u002F步** | **1.9x** 🚀 |\n| FastVideo | 8 | 2.94秒\u002F步 | 1x |\n| xDiT | 8 | 2.70秒\u002F步 | 1.1x |\n| SGL-Diffusion | 8 | 1.19秒\u002F步 | 2.5x |\n| **LightX2V** | 8 | **0.75秒\u002F步** | **3.9x** 🚀 |\n\n### 📊 跨框架性能对比（RTX 4090D）\n\n| 框架 | GPU数量 | 每步耗时 | 加速比 |\n|-----------|---------|---------|---------|\n| Diffusers | 1 | 30.50秒\u002F步 | 1x |\n| FastVideo | 1 | 22.66秒\u002F步 | 1.3x |\n| xDiT | 1 | OOM | OOM |\n| SGL-Diffusion | 1 | OOM | OOM |\n| **LightX2V** | 1 | **20.26秒\u002F步** | **1.5x** 🚀 |\n| FastVideo | 8 | 15.48秒\u002F步 | 1x |\n| xDiT | 8 | OOM | OOM |\n| SGL-Diffusion | 8 | OOM | OOM |\n| **LightX2V** | 8 | **4.75秒\u002F步** | **3.3x** 🚀 |\n\n### 📊 LightX2V 性能对比\n\n| 框架 | GPU | 配置 | 步骤时间 | 加速比 |\n|-----------|-----|---------------|-----------|---------------|\n| **LightX2V** | H100 | 8 GPUs + cfg | 0.75s\u002Fit | 1x |\n| **LightX2V** | H100 | 8 GPUs + no cfg | 0.39s\u002Fit | 1.9x |\n| **LightX2V** | H100 | **8 GPUs + no cfg + fp8** | **0.35s\u002Fit** | **2.1x** 🚀 |\n| **LightX2V** | 4090D | 8 GPUs + cfg | 4.75s\u002Fit | 1x |\n| **LightX2V** | 4090D | 8 GPUs + no cfg | 3.13s\u002Fit | 1.5x |\n| **LightX2V** | 4090D | **8 GPUs + no cfg + fp8** | **2.35s\u002Fit** | **2.0x** 🚀 |\n\n**注**: 以上所有性能数据均在 Wan2.1-I2V-14B-480P（40 步，81 帧）上测试。此外，我们还在 [HuggingFace 
页面](https:\u002F\u002Fhuggingface.co\u002Flightx2v) 上提供了 4 步蒸馏模型。\n\n## 💡 快速入门\n\n有关完整的使用说明，请参阅我们的文档：**[英文文档](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002F) | [中文文档](https:\u002F\u002Flightx2v-zhcn.readthedocs.io\u002Fzh-cn\u002Flatest\u002F)**\n\n**我们强烈建议使用 Docker 环境，因为这是设置环境最简单、最快捷的方式。详情请参阅文档中的快速入门部分。**\n\n### 从 Git 安装\n```bash\npip install -v git+https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V.git\n```\n\n### 从源码构建\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V.git\ncd LightX2V\nuv pip install -v . # pip install -v .\n```\n\n### （可选）安装注意力\u002F量化算子\n关于注意力算子的安装，请参阅我们的文档：**[英文文档](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started\u002Fquickstart.html#step-4-install-attention-operators) | [中文文档](https:\u002F\u002Flightx2v-zhcn.readthedocs.io\u002Fzh-cn\u002Flatest\u002Fgetting_started\u002Fquickstart.html#id9)**\n\n### 使用示例\n\n```python\n# examples\u002Fwan\u002Fwan_i2v.py\n\"\"\"\nWan2.2 图像转视频生成示例。\n本示例演示如何使用 LightX2V 和 Wan2.2 模型进行 I2V 生成。\n\"\"\"\n\nfrom lightx2v import LightX2VPipeline\n\n# 初始化用于 Wan2.2 I2V 任务的管道\n# 对于 wan2.1，使用 model_cls=\"wan2.1\"\npipe = LightX2VPipeline(\n    model_path=\"\u002Fpath\u002Fto\u002FWan2.2-I2V-A14B\",\n    model_cls=\"wan2.2_moe\",\n    task=\"i2v\",\n)\n\n# 另一种方式：从配置 JSON 文件创建生成器\n# pipe.create_generator(\n#     config_json=\"configs\u002Fwan22\u002Fwan_moe_i2v.json\"\n# )\n\n# 启用卸载功能，以显著减少显存占用，同时对速度影响较小。\n# 适用于 RTX 30\u002F40\u002F50 消费级 GPU\npipe.enable_offload(\n    cpu_offload=True,\n    offload_granularity=\"block\",  # 对于 Wan 模型，支持 \"block\" 和 \"phase\"\n    text_encoder_offload=True,\n    image_encoder_offload=False,\n    vae_offload=False,\n)\n\n# 使用指定参数手动创建生成器\npipe.create_generator(\n    attn_mode=\"sage_attn2\",\n    infer_steps=40,\n    height=480,  # 可设置为 720 以获得更高分辨率\n    width=832,  # 可设置为 1280 以获得更高分辨率\n    num_frames=81,\n    guidance_scale=[3.5, 3.5],  # 对于 wan2.1，guidance_scale 是一个标量（例如 5.0）\n    sample_shift=5.0,\n)\n\n# 生成参数\nseed = 42\nprompt = \"夏日海滩度假风格，一只戴着太阳镜的白猫坐在冲浪板上。这只毛茸茸的小猫神情放松地直视着镜头。背景是模糊的海滩景色，清澈见底的海水、远处的青山以及点缀着白云的蓝天。猫咪的姿态自然放松，仿佛正在享受海风和温暖的阳光。特写镜头突出了猫咪的精致细节和海滨的清新氛围。\"\nnegative_prompt = \"镜头晃动，色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走\"\nimage_path=\"\u002Fpath\u002Fto\u002Fimg_0.jpg\"\nsave_result_path = \"\u002Fpath\u002Fto\u002Fsave_results\u002Foutput.mp4\"\n\n# 生成视频\npipe.generate(\n    seed=seed,\n    image_path=image_path,\n    prompt=prompt,\n    negative_prompt=negative_prompt,\n    save_result_path=save_result_path,\n)\n```\n\n**NVFP4（量化感知 4 步）资源**\n- 推理示例：`examples\u002Fwan\u002Fwan_i2v_nvfp4.py`（I2V）和 `examples\u002Fwan\u002Fwan_t2v_nvfp4.py`（T2V）。\n- NVFP4 算子构建\u002F安装指南：请参阅 `lightx2v_kernel\u002FREADME.md`。\n\n> 💡 **更多示例**：有关量化、卸载、缓存及其他高级配置的更多使用示例，请参阅 [examples 目录](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Ftree\u002Fmain\u002Fexamples)。\n\n\n\n## 🤖 支持的模型生态\n\n### 官方开源模型\n- ✅ [LTX-2](https:\u002F\u002Fhuggingface.co\u002FLightricks\u002FLTX-2)\n- ✅ [HunyuanVideo-1.5](https:\u002F\u002Fhuggingface.co\u002Ftencent\u002FHunyuanVideo-1.5)\n- ✅ [Wan2.1 & Wan2.2](https:\u002F\u002Fhuggingface.co\u002FWan-AI\u002F)\n- ✅ [Qwen-Image](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image)\n- ✅ [Qwen-Image-Edit](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FQwen\u002FQwen-Image-Edit)\n- ✅ [Qwen-Image-Edit-2509](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Edit-2509)\n- ✅ 
[Qwen-Image-Edit-2511](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen-Image-Edit-2511)\n\n### 量化与蒸馏模型\u002FLoRA（**🚀 推荐：4 步推理**）\n- ✅ [Wan2.1-Distill-Models](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan2.1-Distill-Models)\n- ✅ [Wan2.2-Distill-Models](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan2.2-Distill-Models)\n- ✅ [Wan2.1-Distill-Loras](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan2.1-Distill-Loras)\n- ✅ [Wan2.2-Distill-Loras](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan2.2-Distill-Loras)\n- ✅ [Wan2.1-Distill-NVFP4](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan-NVFP4)\n- ✅ [Qwen-Image-Edit-2511-Lightning](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FQwen-Image-Edit-2511-Lightning)\n\n### 轻量级自编码器模型（**🚀 推荐：快速推理 & 低内存占用**）\n- ✅ [Autoencoders](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FAutoencoders)\n\n### 自回归模型\n- ✅ [Wan2.1-T2V-CausVid](https:\u002F\u002Fhuggingface.co\u002Flightx2v\u002FWan2.1-T2V-14B-CausVid)\n- ✅ [Self-Forcing](https:\u002F\u002Fgithub.com\u002Fguandeh17\u002FSelf-Forcing)\n- ✅ [Matrix-Game-2.0](https:\u002F\u002Fhuggingface.co\u002FSkywork\u002FMatrix-Game-2.0)\n\n🔔 关注我们的 [HuggingFace 页面](https:\u002F\u002Fhuggingface.co\u002Flightx2v)，了解我们团队发布的最新模型。\n\n💡 请参阅 [模型结构文档](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started\u002Fmodel_structure.html)，以便快速上手 LightX2V。\n\n## 🚀 前端界面\n\n我们提供了多种前端界面部署选项：\n\n- **🎨 Gradio 界面**: 简洁友好的网页界面，非常适合快速体验和原型开发\n  - 📖 [Gradio 部署指南](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Fdeploy_gradio.html)\n- **🎯 ComfyUI 界面**: 强大的节点式工作流界面，支持复杂的视频生成任务\n  - 📖 [ComfyUI 部署指南](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Fdeploy_comfyui.html)\n- **🚀 Windows 一键部署**: 专为 Windows 用户设计的便捷部署方案，具备自动环境配置和智能参数优化功能\n  - 📖 [Windows 一键部署指南](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Fdeploy_local_windows.html)\n\n**💡 推荐方案**：\n- **初次使用者**: 建议使用 Windows 一键部署方案\n- **高级用户**: 建议使用 ComfyUI 界面以获得更多自定义选项\n- **快速体验**: Gradio 界面提供最直观的操作体验\n\n## 🚀 核心特性\n\n### 🎯 **极致性能优化**\n- **🔥 SOTA 推理速度**: 通过步骤蒸馏与系统优化实现 **~20倍** 加速（单 GPU）\n- **⚡️ 革命性 4 步骤蒸馏**: 在无需 CFG 的情况下，将原本 40–50 步的推理压缩至仅 4 步\n- **🛠️ 先进算子支持**: 集成前沿算子，包括 [Sage Attention](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FSageAttention)、[Flash Attention](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention)、[Radial Attention](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fradial-attention)、[q8-kernel](https:\u002F\u002Fgithub.com\u002FKONAKONA666\u002Fq8_kernels)、[sgl-kernel](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang\u002Ftree\u002Fmain\u002Fsgl-kernel)、[vllm](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm)\n\n### 💾 **资源高效部署**\n- **💡 打破硬件限制**: 仅需 **8GB VRAM + 16GB RAM** 即可运行 14B 参数模型进行 480P\u002F720P 视频生成\n- **🔧 智能参数卸载**: 先进的磁盘-CPU-GPU 三层卸载架构，支持阶段\u002F块级精细化管理\n- **⚙️ 全面量化支持**: 支持 `w8a8-int8`、`w8a8-fp8`、`w4a4-nvfp4` 等多种量化策略\n\n### 🎨 **丰富功能生态**\n- **📈 智能特征缓存**: 智能缓存机制，消除冗余计算\n- **🔄 并行推理**: 多 GPU 并行处理，提升性能\n- **📱 灵活部署选项**: 支持 Gradio、服务部署、ComfyUI 等多种部署方式\n- **🎛️ 动态分辨率推理**: 自适应分辨率调整，优化生成质量\n- **🎞️ 视频帧插值**: 基于 RIFE 的帧插值技术，平滑提升帧率\n\n\n## 📚 技术文档\n\n### 📖 **方法教程**\n- [模型量化](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fquantization.html) - 量化策略全面指南\n- [特征缓存](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fcache.html) - 智能缓存机制\n- 
[注意力机制](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fattention.html) - 最先进的注意力算子\n- [参数卸载](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Foffload.html) - 三层存储架构\n- [并行推理](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fparallel.html) - 多 GPU 加速策略\n- [变分辨率推理](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fchanging_resolution.html) - U型分辨率策略\n- [步骤蒸馏](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fstep_distill.html) - 4 步骤推理技术\n- [视频帧插值](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fmethod_tutorials\u002Fvideo_frame_interpolation.html) - 基于 RIFE 技术\n\n### 🛠️ **部署指南**\n- [低资源部署](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Ffor_low_resource.html) - 优化后的 8GB VRAM 解决方案\n- [低延迟部署](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Ffor_low_latency.html) - 超快速推理优化\n- [Gradio 部署](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Fdeploy_gradio.html) - Web 界面搭建\n- [服务部署](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Fdeploy_service.html) - 生产级 API 服务部署\n- [Lora 模型部署](https:\u002F\u002Flightx2v-en.readthedocs.io\u002Fen\u002Flatest\u002Fdeploy_guides\u002Flora_deploy.html) - 灵活的 Lora 部署\n\n## 🤝 致谢\n\n我们衷心感谢所有启发并推动 LightX2V 发展的模型仓库和研究社区。本框架建立在开源社区的共同努力之上，其中包括但不限于：\n\n- [Tencent-Hunyuan](https:\u002F\u002Fgithub.com\u002FTencent-Hunyuan)\n- [Wan-Video](https:\u002F\u002Fgithub.com\u002FWan-Video)\n- [Qwen-Image](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen-Image)\n- [LightLLM](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightLLM)\n- [sglang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang)\n- [vllm](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm)\n- [flash-attention](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention)\n- [SageAttention](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FSageAttention)\n- [flashinfer](https:\u002F\u002Fgithub.com\u002Fflashinfer-ai\u002Fflashinfer)\n- [MagiAttention](https:\u002F\u002Fgithub.com\u002FSandAI-org\u002FMagiAttention)\n- [radial-attention](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fradial-attention)\n- [xDiT](https:\u002F\u002Fgithub.com\u002Fxdit-project\u002FxDiT)\n- [FastVideo](https:\u002F\u002Fgithub.com\u002Fhao-ai-lab\u002FFastVideo)\n- [Mooncake](https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake)\n\n## 🌟 星标历史\n\n[![星标历史图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FModelTC_LightX2V_readme_b5b0dcc400b5.png)](https:\u002F\u002Fstar-history.com\u002F#ModelTC\u002Flightx2v&Timeline)\n\n## ✏️ 引用\n\n如果您在研究中使用了 LightX2V，请考虑引用我们的工作：\n\n```bibtex\n@misc{lightx2v,\n author = {LightX2V Contributors},\n title = {LightX2V: 轻量级视频生成推理框架},\n year = {2025},\n publisher = {GitHub},\n journal = {GitHub 仓库},\n howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FModelTC\u002Flightx2v}},\n}\n```\n\n## 📞 联系与支持\n\n如有任何问题、建议或需要支持，请随时通过以下方式联系我们：\n- 🐛 [GitHub Issues](https:\u002F\u002Fgithub.com\u002FModelTC\u002Flightx2v\u002Fissues) - 用于提交 Bug 和功能请求\n\n---\n\n\u003Cdiv align=\"center\">\n由 LightX2V 团队用心打造\n\u003C\u002Fdiv>","# LightX2V 快速上手指南\n\nLightX2V 是一个先进的高性能图像\u002F视频生成推理框架，支持文生视频 (T2V)、图生视频 (I2V)、文生图 (T2I) 及图像编辑 (I2I) 等多种任务。该框架通过集成 CFG 并行、块级卸载 (Offload)、FP8\u002FNVFP4 
量化及步数蒸馏等技术，显著提升了生成速度并降低了显存需求。\n\n## 1. 环境准备\n\n### 系统要求\n*   **操作系统**: Linux (推荐 Ubuntu 20.04+)\n*   **GPU**: NVIDIA GPU (支持 H100, A100, RTX 4090 等)，或国产算力卡 (华为 Ascend 910B, 海光 DCU, 摩尔线程 MUSA, 天数智芯 GCU 等)。\n*   **Python**: 3.8 - 3.11\n*   **CUDA**: 建议 12.1+ (根据具体显卡驱动版本调整)\n\n### 前置依赖\n确保已安装 `git` 和 `pip` (或 `uv` 以加速安装)。\n> **提示**：国内开发者建议使用国内镜像源加速 Python 包下载。\n\n## 2. 安装步骤\n\n推荐优先使用 **Docker** 部署（最简单且环境隔离最好），若需源码安装请参考以下方式。\n\n### 方式一：Docker 部署（推荐）\n直接从 Docker Hub 拉取预构建镜像，避免环境配置冲突：\n```bash\ndocker pull lightx2v\u002Flightx2v:latest\n# 启动容器示例（需映射模型路径和 GPU）\ndocker run --gpus all -it -v \u002Fpath\u002Fto\u002Fmodels:\u002Fmodels lightx2v\u002Flightx2v:latest\n```\n\n### 方式二：从 Git 直接安装\n```bash\npip install -v git+https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V.git\n# 国内加速建议：\n# pip install -v git+https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V.git -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 方式三：源码构建安装\n适合需要修改代码或开发贡献的用户：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V.git\ncd LightX2V\n# 推荐使用 uv 加速安装，也可使用 pip\nuv pip install -v . \n# 或\npip install -v . -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n> **注意**：如需启用特定的注意力算子优化或量化功能，请参考官方文档安装额外的算子库。\n\n## 3. 基本使用\n\n以下是一个基于 **Wan2.2** 模型的图生视频 (I2V) 最小化使用示例。该示例展示了如何初始化管道并开启显存卸载以适应消费级显卡。\n\n```python\n# examples\u002Fwan\u002Fwan_i2v.py\n\"\"\"\nWan2.2 image-to-video generation example.\nThis example demonstrates how to use LightX2V with Wan2.2 model for I2V generation.\n\"\"\"\n\nfrom lightx2v import LightX2VPipeline\n\n# 初始化 Wan2.2 I2V 任务管道\n# 若使用 wan2.1 模型，请将 model_cls 设置为 \"wan2.1\"\npipe = LightX2VPipeline(\n    model_path=\"\u002Fpath\u002Fto\u002FWan2.2-I2V-A14B\",\n    model_cls=\"wan2.2_moe\",\n    task=\"i2v\",\n)\n\n# 可选：通过 JSON 配置文件创建生成器\n# pipe.create_generator(\n#     config_json=\"configs\u002Fwan22\u002Fwan_moe_i2v.json\"\n# )\n\n# 启用卸载功能 (Offload)\n# 显著降低显存占用，轻微影响速度，非常适合 RTX 30\u002F40\u002F50 系列消费级显卡\npipe.enable_offload(\n    cpu_offload=True,\n    offload_granularity=\"block\",  # Wan 模型支持 \"block\" 和 \"phase\"\n    text_encoder_offload=True,\n)\n\n# 创建生成器\n# 请根据实际模型要求调整 infer_steps、guidance_scale 等参数\npipe.create_generator(\n    infer_steps=4,  # 若使用蒸馏模型可设为 4 步，否则通常为 30-50 步\n    guidance_scale=[1.0, 1.0],  # 蒸馏模型通常不需要 CFG，可设为 1.0；wan2.1 为标量（例如 5.0）\n    height=480,\n    width=832,\n    num_frames=81,\n)\n\n# 执行生成，结果将保存到 save_result_path\npipe.generate(\n    seed=42,\n    prompt=\"A cat walking on the street\",\n    image_path=\"input_image.jpg\",\n    save_result_path=\"output.mp4\",\n)\n```\n\n### 关键优化提示\n*   **极速推理**：若使用 HuggingFace 上提供的 **4-step distilled models** (如 `Hy1.5-Distill-Models` 或 `Wan-NVFP4`)，可将 `infer_steps` 设为 4，实现约 25 倍以上的加速。\n*   **低显存方案**：在单张 24GB 显存显卡 (如 RTX 4090) 上运行大模型时，务必调用 `pipe.enable_offload()` 并配合 FP8 量化权重。","某电商营销团队需要在"双 11"大促前，快速为数百款新品生成带有动态展示效果的短视频广告，以投放到社交媒体平台。\n\n### 没有 LightX2V 时\n- **部署门槛高**：团队需手动配置复杂的深度学习环境，不同视频生成模型（如 T2V、I2V）依赖冲突频繁，耗费数天调试才能跑通 Demo。\n- **推理速度慢**：传统框架在生成高清视频时显存占用巨大，单张显卡一次只能处理极短片段，批量生成数百个视频需排队数周。\n- **功能割裂严重**：文生视频、图生视频和图像编辑需要切换不同的代码库和接口，工作流无法统一，开发人员难以快速迭代创意。\n- **硬件成本高昂**：由于缺乏量化支持，必须租用昂贵的顶级 GPU 集群，导致营销预算大量消耗在算力租赁上。\n\n### 使用 LightX2V 后\n- **一键统一部署**：LightX2V 提供统一的推理框架，内置 Docker 镜像，团队仅需一条命令即可集成文本转视频、图像转视频等多种任务，环境搭建缩短至小时级。\n- **极速高效推理**：借助 FP8 和 NVFP4 量化技术，LightX2V 大幅降低显存需求并提升吞吐率，原本需一周的批量生成任务现在仅需两天即可完成。\n- **全流程标准化**：通过统一的 API 接口，开发人员在一个平台上即可灵活切换文生视频、图生视频及图像编辑模式，创意验证周期从几天压缩到几小时。\n- **低成本落地**：得益于轻量化设计，LightX2V 能在消费级显卡甚至 Intel AIPC 上流畅运行，使算力成本直接降低 60% 以上。\n\nLightX2V 
通过统一的轻量化架构与先进的量化技术，将视频生成的工程门槛与算力成本双重击穿，让高质量动态内容创作真正变得普惠且高效。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FModelTC_LightX2V_ff9a58ab.png","ModelTC","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FModelTC_2da72b12.png","https:\u002F\u002Flight-ai.top\u002F",null,"https:\u002F\u002Fgithub.com\u002FModelTC",[81,85,89,93,96,100,104,108],{"name":82,"color":83,"percentage":84},"Python","#3572A5",90.7,{"name":86,"color":87,"percentage":88},"Shell","#89e051",4.8,{"name":90,"color":91,"percentage":92},"Cuda","#3A4E3A",1.9,{"name":94,"color":95,"percentage":92},"C++","#f34b7d",{"name":97,"color":98,"percentage":99},"Batchfile","#C1F12E",0.4,{"name":101,"color":102,"percentage":103},"CMake","#DA3434",0.2,{"name":105,"color":106,"percentage":107},"Dockerfile","#384d54",0.1,{"name":109,"color":110,"percentage":111},"C","#555555",0,2133,177,"2026-04-03T09:54:57","Apache-2.0","Linux","必需。支持 NVIDIA GPU (如 H100, RTX 4090D\u002F30\u002F40\u002F50 系列，显存低至 24GB 可运行部分模型); 同时支持国产加速卡：华为 Ascend 910B、海光 DCU、摩尔线程 MUSA、天数智芯 Enflame S60 (GCU)、寒武纪 MLU590、MetaX C500 以及 Intel AIPC PTL。支持 FP8\u002FNVFP4 量化以降低显存需求。","未说明",{"notes":120,"python":121,"dependencies":122},"强烈建议使用 Docker 环境部署以简化配置。支持多种并行策略（CFG parallelism, Ulysses parallelism）和高效卸载技术（block-level offload）以优化显存和速度。支持 GGUF 格式模型推理。针对特定模型（如 Wan2.1, HunyuanVideo-1.5）提供了蒸馏版和量化版权重以实现超快推理。不同硬件后端可能需要安装特定的注意力算子或量化算子（详见文档）。","未说明 (推荐使用 uv 或 pip 安装)",[123,124,125],"torch","ruff","pre-commit",[14,35],[128,129,130,131],"video-generation","wan-video","diffusion-models","auto-regressive-diffusion-model","2026-03-27T02:49:30.150509","2026-04-06T08:27:35.161418",[135,140,145,150,155,160,165,170],{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},12726,"运行 Wan2.2 Distill I2V 模型时输出视频全是高斯噪声，如何解决？","该问题通常由代码版本过旧或 SageAttention 兼容性问题引起。解决方案如下：\n1. 确保拉取的是最新代码，避免使用国内镜像源（可能同步滞后），建议直接执行：`git clone https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V.git`。\n2. 尝试安装作者修改版的 SageAttention，编译并安装：\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002FModelTC\u002FSageAttention\n   cd SageAttention\n   export EXT_PARALLEL=4 NVCC_APPEND_FLAGS=\"--threads 8\" MAX_JOBS=32\n   python setup.py install\n   ```\n3. 
更新代码后重新按照教程配置环境即可解决。","https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Fissues\u002F630",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},12727,"如何正确加载包含子目录（如 fp8）的蒸馏模型，避免加载错误的主目录文件？","当模型目录结构包含子文件夹（例如 `Wan2.1-I2V-14B-480P-LightX2V\u002Ffp8\u002F`）时，代码默认可能加载主目录下的第一个文件导致错误。解决方法是在配置文件（JSON）中显式指定具体的模型路径参数。例如，将 `dit_quantized_ckpt` 参数直接设置为子目录路径：`\"dit_quantized_ckpt\": \"Wan2.1-I2V-14B-480P-LightX2V\u002Ffp8\"`。理论上所有场景均可在配置文件中单独指定模型路径。","https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Fissues\u002F193",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},12728,"Wan2.2-I2V-MoE-Distill 模型运行时提示缺少 \"in_dim\" 参数或配置错误怎么办？","请参照官方提供的完整配置文件进行修改。具体可参考仓库中的示例配置：`configs\u002Fwan22\u002Fwan_moe_i2v_distil_with_lora.json`。此外，确保 T5、CLIP 和 VAE 模型文件已正确下载，这些文件可在官方模型库（如 HuggingFace 上的 `Wan-AI\u002FWan2.2-I2V-A14B`）中找到。检查脚本是否遗漏了必要的 LoRA 配置部分，根据示例补全 `lora_configs` 字段。","https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Fissues\u002F414",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},12729,"如何将训练好的 LoRA 模型合并到源模型中以生成独立模型文件？","可以使用项目提供的转换脚本进行离线合并。命令示例如下：\n```bash\npython LightX2V\u002Ftools\u002Fconvert\u002Fconverter.py --source \u002Fpath\u002Fto\u002Fbase_model\u002F --output \u002Fpath\u002Fto\u002Foutput --output_ext .safetensors --output_name merged_model --model_type wan_dit --lora_path \u002Fpath\u002Fto\u002Flora.safetensors --lora_strength 1.0 --single_file\n```\n该脚本支持将 Diffsynth-Studio 等训练的 LoRA 模型融合到源模型上，生成单一的 `.safetensors` 文件。","https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Fissues\u002F403",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},12730,"Wan2.2 模型使用 LightX2V 的 LoRA 后效果变差（提示词遵循度低、动作丢失），该如何处理？","目前社区反馈显示，在 Wan2.2 T2V 任务中使用 LightX2V 的 LoRA 可能会导致提示词遵循度下降和动态效果丢失。多位用户测试发现，不使用 LoRA 并在 30 步（steps）下进行推理，能获得更好的生成结果。建议暂时移除 LoRA 加载配置，直接运行基础模型以获取更稳定的视频质量。","https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Fissues\u002F171",{"id":161,"question_zh":162,"answer_zh":163,"source_url":164},12731,"LightX2V 是否支持 FP4 量化以适配 RTX 5090 等新显卡？","截至目前，LightX2V 主要支持 FP8 量化，尚未正式支持 FP4 量化。虽然用户希望利用 RTX 5090 等显卡的 FP4 算力，但项目维护者暂未发布相关计划。建议持续关注官方更新或使用现有的 FP8 量化方案以获得较好的性能平衡。","https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Fissues\u002F605",{"id":166,"question_zh":167,"answer_zh":168,"source_url":169},12732,"是否有针对 Wan2.2 的步数蒸馏（Step Distill）模型计划？","维护者表示正在开发中。对于需要高性能推理的用户，可以参考相关的轻量级模型仓库（如 `ModelTC\u002FWan2.2-Lightning`）。同时，社区建议关注 ComfyUI 或 SwarmUI 等前端框架的集成进展，未来可能会有自定义节点支持 LightX2V 的推理加速功能。","https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Fissues\u002F173",{"id":171,"question_zh":172,"answer_zh":173,"source_url":174},12733,"I2V 任务是否支持改变分辨率？","关于 I2V 的分辨率调整支持，官方文档已进行更新说明。需要注意的是，LightX2V 的推理步数（steps）默认是从 1 开始计数的，用户在配置分辨率和相关参数时需留意此设定，具体细节请查阅最新的项目文档。","https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Fissues\u002F136",[]]