[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-ali-vilab--UniAnimate-DiT":3,"tool-ali-vilab--UniAnimate-DiT":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":76,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":78,"owner_url":79,"languages":80,"stars":85,"forks":86,"last_commit_at":87,"license":78,"difficulty_score":10,"env_os":88,"env_gpu":89,"env_ram":88,"env_deps":90,"category_tags":104,"github_topics":105,"view_count":10,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":109,"updated_at":110,"faqs":111,"releases":145},811,"ali-vilab\u002FUniAnimate-DiT","UniAnimate-DiT","UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer","UniAnimate-DiT 是一款专注于人像动画生成的开源 AI 模型，它能将单张静态人物照片转化为连贯的视频片段。通过结合先进的视频扩散 Transformer 架构，UniAnimate-DiT 有效解决了传统技术在生成过程中容易出现的角色特征漂移或动作不自然的问题，确保人物在动态变化中保持高一致性。\n\nUniAnimate-DiT 主要面向 AI 开发者、算法研究人员以及数字内容创作者。它基于强大的 Wan2.1-14B-I2V 模型构建，并依托 DiffSynth-Studio 框架，提供了完整的训练与推理代码。技术上，UniAnimate-DiT 表现尤为出色：它不仅支持多卡并行推理，还引入了 teacache 加速技术，能将推理速度提升约 4 倍。例如在单张 A800 显卡上，生成 5 秒 480p 视频仅需 3 分钟左右。对于追求高效且高质量人像动画效果的用户来说，UniAnimate-DiT 提供了一个极具潜力的技术底座，助力创意落地。","# UniAnimate-DiT \n\nAn expanded version of [UniAnimate](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.01188) based on [Wan2.1](https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.1)\n\nUniAnimate-DiT is based on a state-of-the-art DiT-based Wan2.1-14B-I2V model for consistent human image animation. This codebase is built upon [DiffSynth-Studio](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FDiffSynth-Studio), thanks for the nice open-sourced project.\n\n\u003Cdiv align=\"center\">\n\n\u003Cp align=\"center\">\n  \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_UniAnimate-DiT_readme_e1b15f346453.png' width='784'>\n\n  Overview of the proposed UniAnimate-DiT\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n\n## 🔥 News \n- **[2025\u002F04\u002F21]** 🔥 We support Unified Sequence Parallel (USP) for multi-GPUs inference.\n- **[2025\u002F04\u002F18]** 🔥🔥🔥 **We support teacache for both short video generation and long video generation, which can achieve about 4 times inference acceleration.** Now, it costs ~3 minutes to generate 5s 480p videos and ~13 minutes to generate 5s 720p videos on one A800 GPU. 
You can use teacache to select seed and disenable teacache for ideal results.\n- **[2025\u002F04\u002F18]** 🔥 We support teacache, which can achieve about 4 times inference acceleration. It may have a slight impact on performance, and you can use teacache to select the seed. Long video generation does not currently support teacache acceleration, but we are working hard to overcome this.\n- **[2025\u002F04\u002F16]** 🔥 The technical report is avaliable on [ArXiv](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.11289).\n- **[2025\u002F04\u002F15]** 🔥🔥🔥 We released the training and inference code of UniAnimate-DiT based on [UniAnimate](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate) and [Wan2.1](https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.1). The technical report will be avaliable soon.\n\n\n##  Demo cases\n\u003Ctable>\n\u003Ccenter>\n\u003Ctr>\n    \u003C!-- \u003Ctd width=25% style=\"border: none\"> -->\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F9671e4e1-edf4-4352-af1e-6743aff4e9f0\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fc3cf5dc6-19d2-4865-92b8-b687b4e7a901\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n\n\u003Ctable>\n\u003Ccenter>\n\u003Ctr>\n    \u003C!-- \u003Ctd width=25% style=\"border: none\"> -->\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fbd8a9dba-33b0-432f-8ae4-911d7044eb28\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F79601ec8-ed35-4542-9bb3-777085c6a4a0\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n\u003Ctable>\n\u003Ccenter>\n\u003Ctr>\n    \u003C!-- \u003Ctd width=25% style=\"border: none\"> -->\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F83ae10c3-9828-4eed-95db-f4e3265924b9\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa6838591-4ed1-436e-b016-0c4d3864d92e\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n\n\u003Ctable>\n\u003Ccenter>\n\u003Ctr>\n    \u003C!-- \u003Ctd width=25% style=\"border: none\"> -->\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F9e2d75d3-8b1e-4cbb-91a5-dacf99c18261\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F32104e1a-4f20-4070-a458-73d9e9401013\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n\n\u003Ctable>\n\u003Ccenter>\n\u003Ctr>\n    \u003C!-- \u003Ctd width=25% 
style=\"border: none\"> -->\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe7ae8deb-26e2-4452-844c-a8a043dd9846\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7f96e347-617f-4c78-bc59-a2bcef9f8080\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n## Getting Started with UniAnimate-DiT\n\n\n### (1) Installation\n\nBefore using this model, please create the conda environment and install DiffSynth-Studio from **source code**.\n\n```shell\nconda create -n UniAnimate-DiT python=3.9.21\n# or conda create -n UniAnimate-DiT python=3.10.16 # Python>=3.10 is required for Unified Sequence Parallel (USP)\nconda activate UniAnimate-DiT\n\n# CUDA 11.8\npip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n# CUDA 12.1\npip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n# CUDA 12.4\npip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\n\ngit clone https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT.git\ncd UniAnimate-DiT\npip install -e .\n```\n\nUniAnimate-DiT supports multiple Attention implementations. If you have installed any of the following Attention implementations, they will be enabled based on priority.\n\n* [Flash Attention 3](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention)\n* [Flash Attention 2](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention)\n* [Sage Attention](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FSageAttention)\n* [torch SDPA](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fgenerated\u002Ftorch.nn.functional.scaled_dot_product_attention.html) (default. 
`torch>=2.5.0` is recommended.)\n\n## Inference\n\n\n### (2) Download the pretrained checkpoints\n\n(i) Download Wan2.1-14B-I2V-720P models using huggingface-cli:\n```\npip install \"huggingface_hub[cli]\"\nhuggingface-cli download Wan-AI\u002FWan2.1-I2V-14B-720P --local-dir .\u002FWan2.1-I2V-14B-720P\n```\n\nOr download Wan2.1-14B-I2V-720P models using modelscope-cli:\n```\npip install modelscope\nmodelscope download Wan-AI\u002FWan2.1-I2V-14B-720P --local_dir .\u002FWan2.1-I2V-14B-720P\n```\n\n\n(ii) Download pretrained UniAnimate-DiT models (only include the weights of lora and additional learnable modules):\n```\npip install modelscope\nmodelscope download xiaolaowx\u002FUniAnimate-DiT --local_dir .\u002Fcheckpoints\n```\n\nOr download UniAnimate-DiT models using huggingface-cli:\n```\npip install \"huggingface_hub[cli]\"\nhuggingface-cli download ZheWang123\u002FUniAnimate-DiT --local-dir .\u002Fcheckpoints\n```\n\n(iii) Finally, the model weights will be organized in `.\u002Fcheckpoints\u002F` as follows:\n```\n.\u002Fcheckpoints\u002F\n|---- dw-ll_ucoco_384.onnx\n|---- UniAnimate-Wan2.1-14B-Lora-12000.ckpt\n└---- yolox_l.onnx\n```\n\n\n\n### (3) Pose alignment \n\nRescale the target pose sequence to match the pose of the reference image (you can also install `pip install onnxruntime-gpu==1.18.1` for faster extraction on GPU.):\n```\n# reference image 1\npython run_align_pose.py  --ref_name data\u002Fimages\u002FWOMEN-Blouses_Shirts-id_00004955-01_4_full.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002FWOMEN-Blouses_Shirts-id_00004955-01_4_full \n\n# reference image 2\npython run_align_pose.py  --ref_name data\u002Fimages\u002Fmusk.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002Fmusk \n\n# reference image 3\npython run_align_pose.py  --ref_name data\u002Fimages\u002FWOMEN-Blouses_Shirts-id_00005125-03_4_full.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002FWOMEN-Blouses_Shirts-id_00005125-03_4_full\n\n# reference image 4\npython run_align_pose.py  --ref_name data\u002Fimages\u002FIMG_20240514_104337.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002FIMG_20240514_104337\n\n# reference image 5\npython run_align_pose.py  --ref_name data\u002Fimages\u002F10.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002F10\n\n# reference image 6\npython run_align_pose.py  --ref_name data\u002Fimages\u002Ftaiyi2.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002Ftaiyi2\n```\nThe processed target pose for demo videos will be in ```data\u002Fsaved_pose```. `--ref_name` denotes the path of reference image, `--source_video_paths` provides the source poses, `--saved_pose_dir` means the path of processed target poses.\n\n\n### (4) Run UniAnimate-DiT-14B to generate 480P videos\n\n```\nCUDA_VISIBLE_DEVICES=\"0\" python examples\u002Funianimate_wan\u002Finference_unianimate_wan_480p.py\n```\nAbout 23G GPU memory is needed. After this, 81-frame video clips with 832x480 (hight x width) resolution will be generated under the `.\u002Foutputs` folder.\n\n- **Tips**: you can also set `cfg_scale=1.0` to save inference time, which disables classifier-free guidance and can double the speed with minimal performance impact. 
https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT\u002Fblob\u002Fc2c7019dbb081464271d470d750b7693ade10dd8\u002Fexamples\u002Funianimate_wan\u002Finference_unianimate_wan_480p.py#L223-L224\n\n- **Tips**: you can set `num_persistent_param_in_dit` to a small number to reduce the VRAM required.\n\n|`torch_dtype`|`num_persistent_param_in_dit`|Speed|Required VRAM|Default Setting|\n|-|-|-|-|-|\n|torch.bfloat16|7*10**9 (7B)|20.5s\u002Fit|23G|yes|\n|torch.bfloat16|0|23.0s\u002Fit|14G||\n\n- **Tips**: you can set `use_teacache=True` to enable teacache, which can achieve about 4 times inference acceleration. It may have a slight impact on performance, and you can also use teacache to select the seed. \n\nIf you have multiple GPUs for inference, we also support Unified Sequence Parallel (USP). Note that Python>=3.10 is required for USP:\n\n```\npip install xfuser\ntorchrun --standalone --nproc_per_node=4 examples\u002Funianimate_wan\u002Finference_unianimate_wan_480p_usp.py\n```\n\nFor long video generation, run the following command; the tips above also apply:\n\n```\nCUDA_VISIBLE_DEVICES=\"0\" python examples\u002Funianimate_wan\u002Finference_unianimate_wan_long_video_480p.py\n```\n\n### (5) Run UniAnimate-DiT-14B to generate 720P videos\n\n```\nCUDA_VISIBLE_DEVICES=\"0\" python examples\u002Funianimate_wan\u002Finference_unianimate_wan_720p.py\n```\nAbout 36G GPU memory is needed. After this, 81-frame video clips with 1280x720 resolution will be generated.\n\n- **Tips**: you can also set `cfg_scale=1.0` to save inference time, which disables classifier-free guidance and can double the speed with minimal performance impact. https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT\u002Fblob\u002Fc37c996740cb9584edbdf3b4db2fa9eb47526e30\u002Fexamples\u002Funianimate_wan\u002Finference_unianimate_wan_720p.py#L224-L225\n\n- **Tips**: you can set `num_persistent_param_in_dit` to a small number to reduce the VRAM required.\n\n|`torch_dtype`|`num_persistent_param_in_dit`|Speed|Required VRAM|Default Setting|\n|-|-|-|-|-|\n|torch.bfloat16|7*10**9 (7B)|20.5s\u002Fit|36G|yes|\n|torch.bfloat16|0|23.0s\u002Fit|26G||\n\n- **Tips**: you can set `use_teacache=True` to enable teacache, which can achieve about 4 times inference acceleration. It may have a slight impact on performance, and you can also use teacache to select the seed. \n\n\n**Note**: Even though our model was trained on 832x480 resolution, we observed that direct inference at 1280x720 usually works and produces satisfactory results. \n\n\nFor long video generation, run the following command; the tips above also apply:\n\n```\nCUDA_VISIBLE_DEVICES=\"0\" python examples\u002Funianimate_wan\u002Finference_unianimate_wan_long_video_720p.py\n```\n\n**Note**: We find that using teacache for 720P long video generation may lead to an inconsistent background. We are still working on this. You can use teacache to select a random seed, then disable teacache for ideal results.\n\n## Train\n\nWe support training UniAnimate-DiT on your own dataset. \n\n### Step 1: Install additional packages\n\n```\npip install peft lightning pandas\n# deepspeed for multiple GPUs\npip install -U deepspeed\n```\n\n### Step 2: Prepare your dataset\n\nIn order to speed up training, we preprocessed the videos in advance, extracting the video frames and corresponding Dwpose, and packaged them with the pickle package. 
You need to manage the training data as follows:\n\n```\ndata\u002Fexample_dataset\u002F\n└── TikTok\n    └── 00001_mp4\n      ├── dw_pose_with_foot_wo_face.pkl # packaged Dwpose\n      └── frame_data.pkl # packaged frames\n```\n\nWe encourage adding large amounts of data when finetuning the model to get better results. Experimental results show that about 1000 training videos are enough to finetune a good human image animation model. Please refer to the `prepare_training_data.py` file for more details about the packaged Dwpose\u002Fframes.\n\n### Step 3: Train\n\nFor convenience, we do not pre-process the VAE features; instead, VAE pre-processing and DiT model training are combined in one training script, which also facilitates data augmentation to improve performance. You can also choose to extract the VAE features first and then run the subsequent DiT model training. \n\n\nLoRA training (one A100 GPU):\n\n```shell\nCUDA_VISIBLE_DEVICES=\"0\" python examples\u002Funianimate_wan\u002Ftrain_unianimate_wan.py \\\n   --task train  \\\n   --train_architecture lora \\\n   --lora_rank 64 --lora_alpha 64  \\\n   --dataset_path data\u002Fexample_dataset   \\\n   --output_path .\u002Fmodels_out_one_GPU   \\\n   --dit_path \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00001-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00002-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00003-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00004-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00005-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00006-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00007-of-00007.safetensors\"    \\\n   --max_epochs 10   --learning_rate 1e-4   \\\n   --accumulate_grad_batches 1   \\\n   --use_gradient_checkpointing --image_encoder_path \".\u002FWan2.1-I2V-14B-720P\u002Fmodels_clip_open-clip-xlm-roberta-large-vit-huge-14.pth\"  --use_gradient_checkpointing_offload \n```\n\n\nLoRA training (multi-GPU, based on `DeepSpeed`):\n\n```shell\nCUDA_VISIBLE_DEVICES=\"0,1,2,3\" python examples\u002Funianimate_wan\u002Ftrain_unianimate_wan.py  \\\n   --task train   --train_architecture lora \\\n   --lora_rank 128 --lora_alpha 128  \\\n   --dataset_path data\u002Fexample_dataset   \\\n   --output_path .\u002Fmodels_out   --dit_path \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00001-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00002-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00003-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00004-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00005-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00006-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00007-of-00007.safetensors\"     \\\n   --max_epochs 10   --learning_rate 1e-4   \\\n   --accumulate_grad_batches 1   \\\n   --use_gradient_checkpointing \\\n   --image_encoder_path \".\u002FWan2.1-I2V-14B-720P\u002Fmodels_clip_open-clip-xlm-roberta-large-vit-huge-14.pth\" \\\n   --use_gradient_checkpointing_offload \\\n   --training_strategy \"deepspeed_stage_2\" \n```\n\n\nYou can also finetune our trained model by setting `--pretrained_lora_path=\".\u002Fcheckpoints\u002FUniAnimate-Wan2.1-14B-Lora-12000.ckpt\"`.\n\n### Step 4: Test\n\nTest the LoRA finetuned model trained on one 
GPU:\n\n```python\nimport torch\nfrom diffsynth import ModelManager, WanVideoPipeline, save_video, VideoData, WanUniAnimateVideoPipeline\n\n\n# Load models\nmodel_manager = ModelManager(device=\"cpu\")\nmodel_manager.load_models(\n    [\".\u002FWan2.1-I2V-14B-720P\u002Fmodels_clip_open-clip-xlm-roberta-large-vit-huge-14.pth\"],\n    torch_dtype=torch.float32, # Image Encoder is loaded with float32\n)\nmodel_manager.load_models(\n    [\n        [\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00001-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00002-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00003-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00004-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00005-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00006-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00007-of-00007.safetensors\",\n        ],\n        \".\u002FWan2.1-I2V-14B-720P\u002Fmodels_t5_umt5-xxl-enc-bf16.pth\",\n        \".\u002FWan2.1-I2V-14B-720P\u002FWan2.1_VAE.pth\",\n    ],\n    torch_dtype=torch.bfloat16, \n)\n\nmodel_manager.load_lora_v2(\"models\u002Flightning_logs\u002Fversion_1\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\", lora_alpha=1.0)\n\n...\n...\n```\n\nTo test the LoRA finetuned model trained on multiple GPUs with DeepSpeed, first run `python zero_to_fp32.py . output_dir\u002F --safe_serialization` to convert the .pt files to .safetensors files. Note that `zero_to_fp32.py` is an automatically generated file that can be found in the checkpoint folder after training with DeepSpeed on multiple GPUs. 
And then run:\n\n```python\nimport torch\nfrom diffsynth import ModelManager, WanVideoPipeline, save_video, VideoData, WanUniAnimateVideoPipeline\n\n\n# Load models\nmodel_manager = ModelManager(device=\"cpu\")\nmodel_manager.load_models(\n    [\".\u002FWan2.1-I2V-14B-720P\u002Fmodels_clip_open-clip-xlm-roberta-large-vit-huge-14.pth\"],\n    torch_dtype=torch.float32, # Image Encoder is loaded with float32\n)\nmodel_manager.load_models(\n    [\n        [\n            \n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00001-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00002-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00003-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00004-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00005-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00006-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00007-of-00007.safetensors\",\n\n        ],\n        \".\u002FWan2.1-I2V-14B-720P\u002Fmodels_t5_umt5-xxl-enc-bf16.pth\",\n        \".\u002FWan2.1-I2V-14B-720P\u002FWan2.1_VAE.pth\",\n    ],\n    torch_dtype=torch.bfloat16, \n)\n\nmodel_manager.load_lora_v2([\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00001-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00002-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00003-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00004-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00005-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00006-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00007-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00008-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00009-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00010-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00011-of-00011.safetensors\",\n            ], lora_alpha=1.0)\n\n...\n...\n```\n\n\n## Citation\n\nIf you find this codebase useful for your research, please cite the following paper:\n\n```\n@article{wang2025unianimate,\n      title={UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation},\n      author={Wang, Xiang and 
Zhang, Shiwei and Gao, Changxin and Wang, Jiayu and Zhou, Xiaoqiang and Zhang, Yingya and Yan, Luxin and Sang, Nong},\n      journal={Science China Information Sciences},\n      year={2025}\n}\n```\n\n\n## Disclaimer\n\nThis project is intended for academic research, and we explicitly disclaim any responsibility for user-generated content. Users are solely liable for their actions while using the generative model. The project contributors have no legal affiliation with, nor accountability for, users' behaviors. It is imperative to use the generative model responsibly, adhering to both ethical and legal standards.\n","# UniAnimate-DiT \n\n基于 [Wan2.1](https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.1) 的 [UniAnimate](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.01188) 扩展版本\n\nUniAnimate-DiT 基于最先进的基于 DiT（Diffusion Transformer，扩散变换器）的 Wan2.1-14B-I2V 模型，用于一致的人像动画生成。本代码库构建于 [DiffSynth-Studio](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FDiffSynth-Studio) 之上，感谢这个优秀的开源项目。\n\n\u003Cdiv align=\"center\">\n\n\u003Cp align=\"center\">\n  \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_UniAnimate-DiT_readme_e1b15f346453.png' width='784'>\n\n  所提出的 UniAnimate-DiT 概述\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n\n## 🔥 新闻 \n- **[2025\u002F04\u002F21]** 🔥 我们支持多卡推理的统一序列并行 (USP)。\n- **[2025\u002F04\u002F18]** 🔥🔥🔥 **我们支持 teacache 用于短视频生成和长视频生成，可实现约 4 倍的推理加速。** 现在，在一块 A800 GPU 上生成 5 秒 480p 视频耗时约 3 分钟，生成 5 秒 720p 视频耗时约 13 分钟。您可以使用 teacache 选择种子，并禁用 teacache 以获得理想结果。\n- **[2025\u002F04\u002F18]** 🔥 我们支持 teacache，可实现约 4 倍的推理加速。这可能会对性能产生轻微影响，您可以使用 teacache 来筛选种子。长视频生成目前尚不支持 teacache 加速，但我们正在努力克服这一问题。\n- **[2025\u002F04\u002F16]** 🔥 技术报告已在 [ArXiv](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.11289) 发布。\n- **[2025\u002F04\u002F15]** 🔥🔥🔥 我们发布了基于 [UniAnimate](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate) 和 [Wan2.1](https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.1) 的 UniAnimate-DiT 训练和推理代码。技术报告即将发布。\n\n\n## 演示案例\n\u003Ctable>\n\u003Ccenter>\n\u003Ctr>\n    \u003C!-- \u003Ctd width=25% style=\"border: none\"> -->\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F9671e4e1-edf4-4352-af1e-6743aff4e9f0\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fc3cf5dc6-19d2-4865-92b8-b687b4e7a901\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n\n\u003Ctable>\n\u003Ccenter>\n\u003Ctr>\n    \u003C!-- \u003Ctd width=25% style=\"border: none\"> -->\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fbd8a9dba-33b0-432f-8ae4-911d7044eb28\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F79601ec8-ed35-4542-9bb3-777085c6a4a0\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n\u003Ctable>\n\u003Ccenter>\n\u003Ctr>\n    \u003C!-- \u003Ctd width=25% style=\"border: none\"> -->\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop 
src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F83ae10c3-9828-4eed-95db-f4e3265924b9\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa6838591-4ed1-436e-b016-0c4d3864d92e\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n\n\u003Ctable>\n\u003Ccenter>\n\u003Ctr>\n    \u003C!-- \u003Ctd width=25% style=\"border: none\"> -->\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F9e2d75d3-8b1e-4cbb-91a5-dacf99c18261\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F32104e1a-4f20-4070-a458-73d9e9401013\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n\n\u003Ctable>\n\u003Ccenter>\n\u003Ctr>\n    \u003C!-- \u003Ctd width=25% style=\"border: none\"> -->\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe7ae8deb-26e2-4452-844c-a8a043dd9846\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd >\u003Ccenter>\n        \u003Cvideo height=\"260\" controls autoplay loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7f96e347-617f-4c78-bc59-a2bcef9f8080\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n## 开始使用 UniAnimate-DiT\n\n\n### (1) 安装\n\n在使用此模型之前，请创建 conda 环境并从**源代码**安装 DiffSynth-Studio。\n\n```shell\nconda create -n UniAnimate-DiT python=3.9.21\n# 或者 conda create -n UniAnimate-DiT python=3.10.16 # 统一序列并行 (USP) 需要 Python>=3.10\nconda activate UniAnimate-DiT\n\n# CUDA 11.8\npip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n# CUDA 12.1\npip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n# CUDA 12.4\npip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\n\ngit clone https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT.git\ncd UniAnimate-DiT\npip install -e .\n```\n\nUniAnimate-DiT 支持多种注意力机制实现。如果您已安装以下任何一种注意力机制实现，它们将根据优先级启用。\n\n* [Flash Attention 3](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention)\n* [Flash Attention 2](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention)\n* [Sage Attention](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FSageAttention)\n* [torch SDPA](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fgenerated\u002Ftorch.nn.functional.scaled_dot_product_attention.html) (默认。推荐 `torch>=2.5.0`。)\n\n## 推理\n\n\n### (2) 下载预训练检查点\n\n(i) 使用 huggingface-cli 下载 Wan2.1-14B-I2V-720P 模型：\n```\npip install \"huggingface_hub[cli]\"\nhuggingface-cli download Wan-AI\u002FWan2.1-I2V-14B-720P --local-dir .\u002FWan2.1-I2V-14B-720P\n```\n\n或使用 modelscope-cli 下载 Wan2.1-14B-I2V-720P 模型：\n```\npip install modelscope\nmodelscope download Wan-AI\u002FWan2.1-I2V-14B-720P --local_dir .\u002FWan2.1-I2V-14B-720P\n```\n\n\n(ii) 下载预训练的 UniAnimate-DiT 模型（仅包含 
LoRA 和额外可学习模块的权重）：\n```\npip install modelscope\nmodelscope download xiaolaowx\u002FUniAnimate-DiT --local_dir .\u002Fcheckpoints\n```\n\n或使用 huggingface-cli 下载 UniAnimate-DiT 模型：\n```\npip install \"huggingface_hub[cli]\"\nhuggingface-cli download ZheWang123\u002FUniAnimate-DiT --local-dir .\u002Fcheckpoints\n```\n\n(iii) 最后，模型权重将按如下方式组织在 `.\u002Fcheckpoints\u002F` 中：\n```\n.\u002Fcheckpoints\u002F\n|---- dw-ll_ucoco_384.onnx\n|---- UniAnimate-Wan2.1-14B-Lora-12000.ckpt\n└---- yolox_l.onnx\n```\n\n\n\n### (3) 姿态对齐 \n\n调整目标姿态序列的比例以匹配参考图像的姿态（您也可以安装 `pip install onnxruntime-gpu==1.18.1` 以便在 GPU 上更快地提取姿态）：\n```\n# 参考图像 1\npython run_align_pose.py  --ref_name data\u002Fimages\u002FWOMEN-Blouses_Shirts-id_00004955-01_4_full.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002FWOMEN-Blouses_Shirts-id_00004955-01_4_full\n```\n\n```\n# reference image 2\npython run_align_pose.py  --ref_name data\u002Fimages\u002Fmusk.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002Fmusk \n\n# reference image 3\npython run_align_pose.py  --ref_name data\u002Fimages\u002FWOMEN-Blouses_Shirts-id_00005125-03_4_full.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002FWOMEN-Blouses_Shirts-id_00005125-03_4_full\n\n# reference image 4\npython run_align_pose.py  --ref_name data\u002Fimages\u002FIMG_20240514_104337.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002FIMG_20240514_104337\n\n# reference image 5\npython run_align_pose.py  --ref_name data\u002Fimages\u002F10.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002F10\n\n# reference image 6\npython run_align_pose.py  --ref_name data\u002Fimages\u002Ftaiyi2.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002Ftaiyi2\n```\n演示视频的处理后目标姿态将保存在 ```data\u002Fsaved_pose``` 中。`--ref_name` 表示参考图像的路径，`--source_video_paths` 提供源姿态，`--saved_pose_dir` 表示处理后目标姿态的路径。\n\n### (4) 运行 UniAnimate-DiT-14B 生成 480P 视频\n\n```\nCUDA_VISIBLE_DEVICES=\"0\" python examples\u002Funianimate_wan\u002Finference_unianimate_wan_480p.py\n```\n大约需要 23G GPU 显存。之后，将在 `.\u002Foutputs` 文件夹下生成 81 帧、分辨率为 832x480（高 x 宽）的视频片段。\n\n- **提示**：你也可以设置 `cfg_scale=1.0` 以节省推理时间，这将禁用无分类器引导（classifier-free guidance），在性能影响最小的情况下可将速度提高一倍。https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT\u002Fblob\u002Fc2c7019dbb081464271d470d750b7693ade10dd8\u002Fexamples\u002Funianimate_wan\u002Finference_unianimate_wan_480p.py#L223-L224\n\n- **提示**：你可以将 `num_persistent_param_in_dit` 设置为较小的数字以减少所需的 VRAM。\n\n|`torch_dtype`|`num_persistent_param_in_dit`|速度 | 所需显存 | 默认设置|\n|-|-|-|-|-|\n|torch.bfloat16|7*10**9 (7B)|20.5s\u002Fit|23G|yes|\n|torch.bfloat16|0|23.0s\u002Fit|14G||\n\n- **提示**：你可以设置 `use_teacache=True` 以启用 teacache（一种推理缓存加速技术），这可以实现约 4 倍的推理加速。它可能会对性能产生轻微影响，你也可以使用 teacache 来选择种子。 \n\n如果你有很多 GPU 用于推理，我们也支持统一序列并行（Unified Sequence Parallel (USP)），注意统一序列并行（USP）需要 python>=3.10：\n\n```\npip install xfuser\ntorchrun --standalone --nproc_per_node=4 examples\u002Funianimate_wan\u002Finference_unianimate_wan_480p_usp.py\n```\n\n对于长视频生成，运行以下命令，上述提示也可以由你自己使用：\n\n```\nCUDA_VISIBLE_DEVICES=\"0\" python examples\u002Funianimate_wan\u002Finference_unianimate_wan_long_video_480p.py\n```\n\n### (5) 运行 UniAnimate-DiT-14B 生成 720P 视频\n\n```\nCUDA_VISIBLE_DEVICES=\"0\" python 
examples\u002Funianimate_wan\u002Finference_unianimate_wan_720p.py\n```\n大约需要 36G GPU 显存。之后，将生成 81 帧、分辨率为 1280x720 的视频片段。\n\n- **提示**：你也可以设置 `cfg_scale=1.0` 以节省推理时间，这将禁用无分类器引导（classifier-free guidance），在性能影响最小的情况下可将速度提高一倍。https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT\u002Fblob\u002Fc37c996740cb9584edbdf3b4db2fa9eb47526e30\u002Fexamples\u002Funianimate_wan\u002Finference_unianimate_wan_720p.py#L224-L225\n\n- **提示**：你可以将 `num_persistent_param_in_dit` 设置为较小的数字以减少所需的 VRAM。\n\n|`torch_dtype`|`num_persistent_param_in_dit`|速度 | 所需显存 | 默认设置|\n|-|-|-|-|-|\n|torch.bfloat16|7*10**9 (7B)|20.5s\u002Fit|36G|yes|\n|torch.bfloat16|0|23.0s\u002Fit|26G||\n\n- **提示**：你可以设置 `use_teacache=True` 以启用 teacache（一种推理缓存加速技术），这可以实现约 4 倍的推理加速。它可能会对性能产生轻微影响，你也可以使用 teacache 来选择种子。 \n\n\n**注意**：尽管我们的模型是在 832x480 分辨率上训练的，但我们观察到直接在 1280x720 上进行推理通常是允许的，并且会产生令人满意的结果。 \n\n\n对于长视频生成，运行以下命令，上述提示也可以由你自己使用：\n\n```\nCUDA_VISIBLE_DEVICES=\"0\" python examples\u002Funianimate_wan\u002Finference_unianimate_wan_long_video_720p.py\n```\n\n**注意**：我们发现对 720P 长视频生成使用 teacache 可能导致背景不一致。我们仍在处理这个问题。你可以使用 teacache 来选择随机种子并禁用 teacache 以获得理想结果。\n\n## 训练\n\n我们支持在我们的数据集上训练 UniAnimate-DiT。 \n\n### 步骤 1：安装额外包\n\n```\npip install peft lightning pandas\n# deepspeed for multiple GPUs\npip install -U deepspeed\n```\n\n### 步骤 2：准备数据集\n\n为了加快训练速度，我们预先对视频进行了预处理，提取了视频帧和对应的 Dwpose（姿态估计模型），并使用 pickle（Python 序列化包）进行了打包。你需要按以下方式管理训练数据：\n\n```\ndata\u002Fexample_dataset\u002F\n└── TikTok\n    └── 00001_mp4\n      ├── dw_pose_with_foot_wo_face.pkl # packaged Dwpose\n      └── frame_data.pkl # packaged frames\n```\n\n我们鼓励添加大量数据来微调（finetune）模型以获得更好的结果。实验结果表明，大约 1000 个训练视频可以微调出一个良好的人像动画模型。有关打包后的 Dwpose\u002F帧的更多详细信息，请参阅 `prepare_training_data.py` 文件。\n\n### 步骤 3：训练\n\n为了方便起见，我们不对 VAE（变分自编码器）特征进行预处理，而是将 VAE 预处理和 DiT（扩散 Transformer）模型训练整合在一个训练脚本中，同时也支持数据增强以提升性能。您也可以先提取 VAE 特征，然后再进行后续的 DiT 模型训练。\n\nLoRA（低秩适应）训练（单张 A100 GPU）：\n\n```shell\nCUDA_VISIBLE_DEVICES=\"0\" python examples\u002Funianimate_wan\u002Ftrain_unianimate_wan.py \\\n   --task train  \\\n   --train_architecture lora \\\n   --lora_rank 64 --lora_alpha 64  \\\n   --dataset_path data\u002Fexample_dataset   \\\n   --output_path .\u002Fmodels_out_one_GPU   \\\n   --dit_path \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00001-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00002-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00003-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00004-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00005-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00006-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00007-of-00007.safetensors\"    \\\n   --max_epochs 10   --learning_rate 1e-4   \\\n   --accumulate_grad_batches 1   \\\n   --use_gradient_checkpointing --image_encoder_path \".\u002FWan2.1-I2V-14B-720P\u002Fmodels_clip_open-clip-xlm-roberta-large-vit-huge-14.pth\"  --use_gradient_checkpointing_offload \n```\n\n\nLoRA 训练（多 GPU，基于 `DeepSpeed`）：\n\n```shell\nCUDA_VISIBLE_DEVICES=\"0,1,2,3\" python examples\u002Funianimate_wan\u002Ftrain_unianimate_wan.py  \\\n   --task train   --train_architecture lora \\\n   --lora_rank 128 --lora_alpha 128  \\\n   --dataset_path data\u002Fexample_dataset   \\\n   --output_path .\u002Fmodels_out   --dit_path 
\".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00001-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00002-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00003-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00004-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00005-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00006-of-00007.safetensors,.\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00007-of-00007.safetensors\"     \\\n   --max_epochs 10   --learning_rate 1e-4   \\\n   --accumulate_grad_batches 1   \\\n   --use_gradient_checkpointing \\\n   --image_encoder_path \".\u002FWan2.1-I2V-14B-720P\u002Fmodels_clip_open-clip-xlm-roberta-large-vit-huge-14.pth\" \\\n   --use_gradient_checkpointing_offload \\\n   --training_strategy \"deepspeed_stage_2\" \n```\n\n\n您也可以通过设置 `--pretrained_lora_path=\".\u002Fcheckpoints\u002FUniAnimate-Wan2.1-14B-Lora-12000.ckpt\"` 来微调我们训练好的模型。\n\n### 步骤 4：测试\n\n测试在单 GPU 上训练的 LoRA 微调模型：\n\n```python\nimport torch\nfrom diffsynth import ModelManager, WanVideoPipeline, save_video, VideoData, WanUniAnimateVideoPipeline\n\n\n# Load models\nmodel_manager = ModelManager(device=\"cpu\")\nmodel_manager.load_models(\n    [\".\u002FWan2.1-I2V-14B-720P\u002Fmodels_clip_open-clip-xlm-roberta-large-vit-huge-14.pth\"],\n    torch_dtype=torch.float32, # Image Encoder is loaded with float32\n)\nmodel_manager.load_models(\n    [\n        [\n            \n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00001-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00002-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00003-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00004-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00005-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00006-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00007-of-00007.safetensors\",\n\n        ],\n        \".\u002FWan2.1-I2V-14B-720P\u002Fmodels_t5_umt5-xxl-enc-bf16.pth\",\n        \".\u002FWan2.1-I2V-14B-720P\u002FWan2.1_VAE.pth\",\n    ],\n    torch_dtype=torch.bfloat16, \n)\n\nmodel_manager.load_lora_v2(\"models\u002Flightning_logs\u002Fversion_1\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\", lora_alpha=1.0)\n\n...\n...\n```\n\n测试基于 DeepSpeed 在多 GPU 上训练的 LoRA 微调模型，首先您需要运行 `python zero_to_fp32.py . 
output_dir\u002F --safe_serialization` 将 .pt 文件转换为 .safetensors 文件。请注意，`zero_to_fp32.py` 是一个自动生成的文件，可以在使用 DeepSpeed 在多 GPU 上进行训练后的 checkpoint 文件夹中找到。然后运行：\n\n```python\nimport torch\nfrom diffsynth import ModelManager, WanVideoPipeline, save_video, VideoData, WanUniAnimateVideoPipeline\n\n\n# Load models\nmodel_manager = ModelManager(device=\"cpu\")\nmodel_manager.load_models(\n    [\".\u002FWan2.1-I2V-14B-720P\u002Fmodels_clip_open-clip-xlm-roberta-large-vit-huge-14.pth\"],\n    torch_dtype=torch.float32, # Image Encoder is loaded with float32\n)\nmodel_manager.load_models(\n    [\n        [\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00001-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00002-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00003-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00004-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00005-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00006-of-00007.safetensors\",\n            \".\u002FWan2.1-I2V-14B-720P\u002Fdiffusion_pytorch_model-00007-of-00007.safetensors\",\n        ],\n        \".\u002FWan2.1-I2V-14B-720P\u002Fmodels_t5_umt5-xxl-enc-bf16.pth\",\n        \".\u002FWan2.1-I2V-14B-720P\u002FWan2.1_VAE.pth\",\n    ],\n    torch_dtype=torch.bfloat16, \n)\n\nmodel_manager.load_lora_v2([\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00001-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00002-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00003-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00004-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00005-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00006-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00007-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00008-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00009-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00010-of-00011.safetensors\",\n            \".\u002Fmodels\u002Flightning_logs\u002Fversion_0\u002Fcheckpoints\u002Fepoch=0-step=500.ckpt\u002Foutput_dir\u002Fmodel-00011-of-00011.safetensors\",\n            ], lora_alpha=1.0)\n\n...\n...\n```\n\n\n## 引用\n\n如果您发现此代码库对您的研究有用，请引用以下论文：\n\n```\n@article{wang2025unianimate,\n      title={UniAnimate: Taming Unified Video Diffusion 
Models for Consistent Human Image Animation},\n      author={Wang, Xiang and Zhang, Shiwei and Gao, Changxin and Wang, Jiayu and Zhou, Xiaoqiang and Zhang, Yingya and Yan, Luxin and Sang, Nong},\n      journal={Science China Information Sciences},\n      year={2025}\n}\n```\n\n\n## 免责声明\n\n本项目旨在用于学术研究，我们明确声明不对用户生成内容承担任何责任。用户使用生成模型时的行为由其自行负责。项目贡献者与用户行为无法律关联，也不对其行为负责。负责任地使用生成模型至关重要，需遵守道德和法律标准。","# UniAnimate-DiT 快速上手指南\n\nUniAnimate-DiT 是基于 Wan2.1-14B-I2V 模型构建的先进 DiT 架构，用于生成一致的人像动画。本工具支持多卡推理加速（USP）及 TeaCache 加速技术。\n\n## 环境准备\n\n*   **Python 版本**: 推荐 Python 3.9 或 3.10+（若使用 Unified Sequence Parallel 需 Python >= 3.10）。\n*   **CUDA 版本**: 支持 CUDA 11.8 \u002F 12.1 \u002F 12.4。\n*   **硬件要求**:\n    *   480P 生成：约 23GB 显存。\n    *   720P 生成：约 36GB 显存。\n    *   推荐使用 A800 或同级别高性能 GPU。\n*   **依赖库**: Flash Attention 2\u002F3、Sage Attention 或 torch SDPA（默认）。\n\n## 安装步骤\n\n### 1. 创建虚拟环境并安装 PyTorch\n\n```shell\nconda create -n UniAnimate-DiT python=3.9.21\n# 或使用 Python 3.10+ 以支持多卡并行\n# conda create -n UniAnimate-DiT python=3.10.16 \nconda activate UniAnimate-DiT\n\n# 根据实际 CUDA 版本选择以下命令之一\n# CUDA 11.8\npip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n# CUDA 12.1\npip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n# CUDA 12.4\npip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\n```\n\n### 2. 克隆代码库并安装项目依赖\n\n```shell\ngit clone https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT.git\ncd UniAnimate-DiT\npip install -e .\n```\n\n## 基本使用\n\n### 1. 下载预训练模型\n\n建议优先使用国内镜像源（ModelScope）加速下载。\n\n**(i) 下载 Wan2.1-14B-I2V-720P 基础模型**\n```shell\npip install modelscope\nmodelscope download Wan-AI\u002FWan2.1-I2V-14B-720P --local_dir .\u002FWan2.1-I2V-14B-720P\n```\n\n**(ii) 下载 UniAnimate-DiT LoRA 权重**\n```shell\npip install modelscope\nmodelscope download xiaolaowx\u002FUniAnimate-DiT --local_dir .\u002Fcheckpoints\n```\n\n确保目录结构如下：\n```\n.\u002Fcheckpoints\u002F\n|---- dw-ll_ucoco_384.onnx\n|---- UniAnimate-Wan2.1-14B-Lora-12000.ckpt\n└---- yolox_l.onnx\n```\n\n### 2. 姿态对齐\n\n将目标视频的姿态序列与参考图像对齐。\n\n```shell\n# 示例命令\npython run_align_pose.py  --ref_name data\u002Fimages\u002FWOMEN-Blouses_Shirts-id_00004955-01_4_full.jpg --source_video_paths data\u002Fvideos\u002Fsource_video.mp4 --saved_pose_dir data\u002Fsaved_pose\u002FWOMEN-Blouses_Shirts-id_00004955-01_4_full\n```\n*参数说明*: `--ref_name` 为参考图路径，`--source_video_paths` 为源视频路径，`--saved_pose_dir` 为输出姿态保存路径。\n\n### 3. 
生成视频\n\n#### 生成 480P 视频\n```shell\nCUDA_VISIBLE_DEVICES=\"0\" python examples\u002Funianimate_wan\u002Finference_unianimate_wan_480p.py\n```\n*   **显存优化**: 设置 `num_persistent_param_in_dit=0` 可将显存需求降至 14GB。\n*   **加速技巧**: 设置 `use_teacache=True` 可获得约 4 倍推理加速；设置 `cfg_scale=1.0` 可禁用分类器引导以进一步提升速度。\n\n#### 生成 720P 视频\n```shell\nCUDA_VISIBLE_DEVICES=\"0\" python examples\u002Funianimate_wan\u002Finference_unianimate_wan_720p.py\n```\n*   注意：虽然模型在 480P 下训练，但直接推理 720P 通常也能获得满意结果。\n\n#### 长视频生成\n```shell\nCUDA_VISIBLE_DEVICES=\"0\" python examples\u002Funianimate_wan\u002Finference_unianimate_wan_long_video_480p.py\n```\n\n#### 多卡并行推理 (USP)\n需要 Python >= 3.10 并安装 `xfuser`：\n```shell\npip install xfuser\ntorchrun --standalone --nproc_per_node=4 examples\u002Funianimate_wan\u002Finference_unianimate_wan_480p_usp.py\n```","某独立游戏工作室急需为角色概念图制作高质量宣传动画，却受限于高昂的动作捕捉成本与复杂的后期流程，项目进度严重滞后。\n\n### 没有 UniAnimate-DiT 时\n- 传统动捕设备昂贵且需专业场地，小团队难以承担预算，导致项目搁置。\n- 通用 AI 视频工具生成的角色动作僵硬，面部细节频繁失真，无法满足商业标准。\n- 长视频渲染耗时极长，一次调整动作往往需要等待数小时，严重拖慢开发节奏。\n- 多显卡协同效率低，无法快速并行测试多种角色表现方案，试错成本过高。\n\n### 使用 UniAnimate-DiT 后\n- UniAnimate-DiT 直接驱动静态图片生成流畅自然的人体动作，彻底省去实拍环节。\n- 基于 Wan2.1 大模型架构，确保人物身份在多帧视频中高度一致，杜绝形象崩坏。\n- 引入 Teacache 加速技术，5 秒高清视频生成时间压缩至十几分钟，极大提升迭代速度。\n- 支持多卡并行推理，团队可同时运行多个任务，灵活调整动作幅度与风格。\n\n通过统一序列并行与加速技术，UniAnimate-DiT 实现了低成本、高保真的人物动态化生产。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fali-vilab_UniAnimate-DiT_3fed7442.png","ali-vilab","Alibaba TongYi Vision Intelligence Lab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fali-vilab_c2d93ee0.png",null,"https:\u002F\u002Fgithub.com\u002Fali-vilab",[81],{"name":82,"color":83,"percentage":84},"Python","#3572A5",100,842,56,"2026-03-27T13:46:45","未说明","需要 NVIDIA GPU，显存 14GB+ (优化模式) \u002F 23GB+ (默认 480P) \u002F 36GB+ (720P)，支持 CUDA 11.8\u002F12.1\u002F12.4",{"notes":91,"python":92,"dependencies":93},"需从源码安装 DiffSynth-Studio；首次运行需下载 Wan2.1 及 UniAnimate-DiT 模型权重；推理前需先运行姿态对齐脚本；支持 teacache 加速（约 4 倍）；多卡并行需 Python>=3.10 并安装 xfuser；训练数据需预处理为 pickle 格式","3.9+ (推荐 3.10+ 以支持 USP)",[94,95,96,97,98,99,100,101,102,103],"torch==2.5.0","torchvision==0.20.0","torchaudio==2.5.0","DiffSynth-Studio","huggingface_hub","modelscope","xfuser","peft","deepspeed","pandas",[26,52,14],[106,107,108],"human-image-animation","video-diffusion-transformers","video-generation","2026-03-27T02:49:30.150509","2026-04-06T08:45:48.313679",[112,117,122,126,131,135,140],{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},3500,"姿态提取分辨率与模型推理分辨率不一致会影响效果吗？","训练过程中采用了空间随机裁剪（spatially random-crop）的数据增强策略，因此模型能够容忍这种分辨率差异。如果您发现结果不佳，可以尝试在原始分辨率下提取姿态。您可以自行实验验证，如果有更好的结果，可以将模型上传到仓库。","https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT\u002Fissues\u002F2",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},3501,"Windows 系统加载模型时 RAM 溢出且电脑冻结如何解决？","检查虚拟内存设置。将 pagefile.sys 配置到系统盘（SSD\u002FNVMe）而非慢速硬盘。例如设置为 80GB。模型首先会加载到 RAM\u002F虚拟内存，之后才会切换到 GPU\u002FVRAM。","https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT\u002Fissues\u002F25",{"id":123,"question_zh":124,"answer_zh":125,"source_url":121},3502,"推理脚本中如何设置设备以解决 VRAM 不足或系统内存溢出错误？","在 inference_unianimate_wan_480p.py 或 inference_unianimate_wan_720p.py 中，找到第 X 行左右的 `model_manager = ModelManager(device=\"cpu\")`，将其修改为 `device=\"cuda\"`。这会增加 VRAM 使用率，但能解决因系统 RAM 超限导致的 KILLED ERROR。",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},3503,"如何处理源身份和目标身份之间不同的骨架长度对齐问题？","可以使用代码库中的 `run_align_pose.py` 
脚本来进行姿态对齐处理。该脚本负责处理不同骨架长度的归一化或重映射。","https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT\u002Fissues\u002F26",{"id":132,"question_zh":133,"answer_zh":134,"source_url":130},3504,"为什么演示视频中关键点颜色与代码定义的颜色不一致（如手部显示为红色而非蓝色）？","这是由于图像格式转换导致的。具体原因是 RGB 和 BGR 颜色空间的转换问题，并非代码逻辑错误。",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},3505,"训练过程中保存的 31GB `.pt` 文件是什么？是否可以删除？","这些文件包含 wan2.1 的权重，是 DeepSpeed 的一个 Bug。建议将 .pt 文件转换为 .safetensors 格式，或者在训练完成后自行保存可学习的权重，然后移除原始的 .pt 文件。","https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT\u002Fissues\u002F13",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},3506,"训练数据准备时使用了什么样的 Prompt？","所有用于训练的视频 Prompt 均统一设置为 \"A person is dancing\"。","https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT\u002Fissues\u002F19",[]]