[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-AlonzoLeeeooo--awesome-video-generation":3,"tool-AlonzoLeeeooo--awesome-video-generation":64},[4,17,26,35,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":10,"last_commit_at":32,"category_tags":33,"status":16},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[13,14,15,34],"视频",{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,43,34,44,15,45,46,13,47],"数据工具","插件","其他","语言模型","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 
provides a visual workflow orchestration interface and flexible APIs, lowering the barrier to entry for users without an algorithms background while also meeting professional developers' needs for deep customization. As an open-source project under the Apache 2.0 license, it is becoming an important bridge between general-purpose large models and proprietary industry knowledge.",77062,"2026-04-04T04:44:48",[15,14,13,46,45],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR is a high-performance open-source optical character recognition toolkit built on Baidu's PaddlePaddle framework. Its core capability is extracting the text from images, PDFs, and other documents and converting it into machine-readable structured data, so that machines can genuinely understand mixed text and image content.\n\nFaced with massive volumes of paper or electronic documents, PaddleOCR solves the problems of slow manual entry and costly digitization. In AI applications in particular, it acts as a bridge between images and large language models (LLMs), turning visual information directly into text input and enabling scenarios such as intelligent question answering and document analysis.\n\nPaddleOCR suits developers, algorithm researchers, and ordinary users with document automation needs. Its technical advantages are clear: it recognizes more than 100 languages, runs on Windows, Linux, macOS, and other systems, and flexibly adapts to CPU, GPU, NPU, and other hardware. As a lightweight open-source project with an active community, PaddleOCR supports both rapid integration and cutting-edge vision-language research, making it an ideal choice for text recognition tasks.",74939,"2026-04-05T23:16:38",[46,14,13,45],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":81,"owner_website":81,"owner_url":82,"languages":83,"stars":88,"forks":89,"last_commit_at":90,"license":91,"difficulty_score":92,"env_os":93,"env_gpu":94,"env_ram":94,"env_deps":95,"category_tags":98,"github_topics":99,"view_count":23,"oss_zip_url":81,"oss_zip_packed_at":81,"status":16,"created_at":105,"updated_at":106,"faqs":107,"releases":108},4125,"AlonzoLeeeooo\u002Fawesome-video-generation","awesome-video-generation","A collection of awesome video generation studies.","awesome-video-generation is an open-source resource collection focused on the field of video generation, designed to give researchers and developers one-stop navigation of the academic frontier. With video generation technology evolving by the day and papers appearing in a constant stream, it effectively solves the pain points of scattered information and the difficulty of tracking the latest progress.\n\nThe repository systematically organizes research across many sub-directions, from text-to-video, image-to-video, and personalized video generation to video editing, audio-to-video generation, and human image animation. Its distinctive strengths are strong timeliness and structured classification: papers are carefully archived by year (covering 2021 through the upcoming 2026 venues) and by top conference (such as CVPR, NeurIPS, and ICLR), with links to official code, model weights, and related datasets updated promptly. The maintainers also share their team's latest breakthrough work, for example on shape-consistent video editing.\n\nawesome-video-generation is especially suitable for AI researchers, algorithm engineers, and university students. For practitioners hoping to grasp the field's dynamics quickly, reproduce classic algorithms, or find fresh inspiration, it is an indispensable professional knowledge base that helps users connect theory with engineering practice and accelerates the exploration and application of video generation technology.","\u003Cp align=\"center\">\n  \u003Ch1 align=\"center\">A Collection of Video Generation Studies\u003C\u002Fh1>\n\nThis GitHub repository summarizes papers and resources related to the video generation task. \n\nIf you have any suggestions about this repository, please feel free to [start a new issue](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002Fawesome-video-generation\u002Fissues\u002Fnew) or [pull requests](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002Fawesome-video-generation\u002Fpulls).\n\nRecent news of this GitHub repo is listed as follows.\n\n🔥 [Dec. 11th, 2025] Our paper titled [\"StableV2V: Stablizing Shape Consistency in Video-to-Video Editing\"](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F11272911) is accepted at TCSVT 2025!\n\n🔥 [Nov. 19th] We have released our latest paper titled [\"StableV2V: Stablizing Shape Consistency in Video-to-Video Editing\"](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.11045), with the corresponding [code](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002FStableV2V), [model weights](https:\u002F\u002Fhuggingface.co\u002FAlonzoLeeeooo\u002FStableV2V), and [a testing benchmark `DAVIS-Edit`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAlonzoLeeeooo\u002FDAVIS-Edit) open-sourced. Feel free to check them out via the links!\n\u003Cdetails> \u003Csummary> Click to see more information. 
\u003C\u002Fsummary>\n\n- [May 13th, 2025] Update a new sub-task named [Human Image Animation](#human-image-animation). All **CVPR 2025** papers and references are updated.\n- [Jun. 17th] All **NeurIPS 2023** papers and references are updated.\n- [Apr. 26th] Update a new direction: [Personalized Video Generation](#personalized-video-generation).\n- [Mar. 28th] The official **AAAI 2024** paper list is released! Official versions of PDFs and BibTeX references are updated accordingly.\n\u003C\u002Fdetails>\n\n\u003C!-- omit in toc -->\n# \u003Cspan id=\"contents\">Contents\u003C\u002Fspan>\n- [To-Do Lists](#to-do-lists)\n- [Products](#products)\n- [Papers](#papers)\n  - [Survey Papers](#survey-papers)\n  - [Text-to-Video Generation](#text-to-video-generation)\n    - [Year 2026](#text-year-2026)\n    - [Year 2025](#text-year-2025)\n    - [Year 2024](#text-year-2024)\n    - [Year 2023](#text-year-2023)\n    - [Year 2022](#text-year-2022)\n    - [Year 2021](#text-year-2021)\n  - [Image-to-Video Generation](#image-to-video-generation)\n    - [Year 2025](#image-year-2025)\n    - [Year 2024](#image-year-2024)\n    - [Year 2023](#image-year-2023)\n    - [Year 2022](#image-year-2022)\n    - [Year 2021](#image-year-2021)\n  - [Personalized Video Generation](#personalized-video-generation)\n    - [Year 2024](#personalized-year-2024)\n    - [Year 2023](#personalized-year-2023)\n  - [Video Editing](#video-editing)\n    - [Year 2025](#editing-year-2025)\n    - [Year 2024](#editing-year-2024)\n    - [Year 2023](#editing-year-2023)\n  - [Audio-to-Video Generation](#audio-to-video-generation)\n    - [Year 2024](#audio-year-2024)\n    - [Year 2023](#audio-year-2023)\n  - [Human Image Animation](#human-image-animation)\n    - [Year 2026](#human-year-2026)\n    - [Year 2025](#human-year-2025)\n    - [Year 2024](#human-year-2024)\n- [Datasets](#datasets)\n- [Q&A](#qa)\n- [References](#references)\n- [Star History](#star-history)\n- [WeChat Group](#wechat-group)\n\n\u003C!-- omit in toc -->\n# To-Do Lists\n- Latest Papers\n  - [ ] Update NeurIPS 2025 Papers\n  - [ ] Update ICCV 2025 Papers\n  - [x] Update CVPR 2025 Papers\n  - [x] Update ICLR 2025 Papers\n  - [x] Update NeurIPS 2024 Papers\n  - [x] Update ECCV 2024 Papers\n  - [x] Update CVPR 2024 Papers\n    - [x] Update PDFs and References of ⚠️ Papers\n    - [ ] Update Published Versions of References\n  - [x] Update AAAI 2024 Papers\n    - [x] Update PDFs and References of ⚠️ Papers\n    - [x] Update Published Versions of References\n  - [x] Update ICLR 2024 Papers\n  - [x] Update NeurIPS 2023 Papers\n- Previously Published Papers\n  - [x] Update Previous CVPR papers\n  - [x] Update Previous ICCV papers\n  - [x] Update Previous ECCV papers\n  - [x] Update Previous NeurIPS papers\n  - [x] Update Previous ICLR papers\n  - [x] Update Previous AAAI papers\n  - [x] Update Previous ACM MM papers\n- Regular Maintenance of Preprint arXiv Papers and Missed Papers\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# Products\n\n|Name|Organization|Year|Research 
Paper|Website|Specialties|\n|-|-|-|-|-|-|\n|Sora|OpenAI|2024|[link](https:\u002F\u002Fopenai.com\u002Fresearch\u002Fvideo-generation-models-as-world-simulators)|[link](https:\u002F\u002Fopenai.com\u002Fsora)|-|\n|Lumiere|Google|2024|[link](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.12945)|[link](https:\u002F\u002Flumiere-video.github.io\u002F)|-|\n|VideoPoet|Google|2023|-|[link](https:\u002F\u002Fsites.research.google\u002Fvideopoet\u002F)|-|\n|W.A.L.T|Google|2023|[link](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.06662.pdf)|[link](https:\u002F\u002Fwalt-video-diffusion.github.io\u002F)|-|\n|Gen-2|Runway|2023|-|[link](https:\u002F\u002Fresearch.runwayml.com\u002Fgen2)|-|\n|Gen-1|Runway|2023|-|[link](https:\u002F\u002Fresearch.runwayml.com\u002Fgen1)|-|\n|Animate Anyone|Alibaba|2023|[link](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.17117.pdf)|[link](https:\u002F\u002Fhumanaigc.github.io\u002Fanimate-anyone\u002F)|-|\n|Outfit Anyone|Alibaba|2023|-|[link](https:\u002F\u002Foutfitanyone.app\u002F)|-|\n|Stable Video|StabilityAI|2023|[link](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.15127.pdf)|[link](https:\u002F\u002Fwww.stablevideo.com\u002F)|-|\n|Pixeling|HiDream.ai|2023|-|[link](https:\u002F\u002Fhidreamai.com\u002F#\u002F)|-|\n|DomoAI|DomoAI|2023|-|[link](https:\u002F\u002Fdomoai.app\u002F)|-|\n|Emu|Meta|2023|[link](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.10709)|[link](https:\u002F\u002Femu-video.metademolab.com\u002F)|-|\n|Genmo|Genmo|2023|-|[link](https:\u002F\u002Fwww.genmo.ai\u002F)|-|\n|NeverEnds|NeverEnds|2023|-|[link](https:\u002F\u002Fneverends.life\u002F)|-|\n|Moonvalley|Moonvalley|2023|-|[link](https:\u002F\u002Fmoonvalley.ai\u002F)|-|\n|Morph Studio|Morph|2023|-|[link](https:\u002F\u002Fwww.morphstudio.com\u002F)|-|\n|Pika|Pika|2023|-|[link](https:\u002F\u002Fpika.art\u002F)|-|\n|PixelDance|ByteDance|2023|[link](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.10982)|[link](https:\u002F\u002Fmakepixelsdance.github.io\u002F)|-|\n\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# Papers\n\n\u003C!-- omit in toc -->\n## Survey Papers\n- \u003Cspan id=\"survey-year-2024\">**Year 2024**\u003C\u002Fspan>\n  - **arXiv**\n    - Video Diffusion Models: A Survey [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.03150.pdf)\n- \u003Cspan id=\"survey-year-2023\">**Year 2023**\u003C\u002Fspan>\n  - **arXiv**\n    - A Survey on Video Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10647.pdf)\n\n\u003C!-- omit in toc -->\n## Text-to-Video Generation\n- \u003Cspan id=\"text-year-2026\">**Year 2026**\u003C\u002Fspan>\n  - **AAAI**\n    - Minute-Long Videos with Dual Parallelisms [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.21070) [[Project]](https:\u002F\u002Fdualparal-project.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FDualParal-Project\u002FDualParal)\n- \u003Cspan id=\"text-year-2025\">**Year 2025**\u003C\u002Fspan>\n  - **CVPR**\n    - ***AIGV-Assessor:*** Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.17221) [[Code]](https:\u002F\u002Fgithub.com\u002Fwangjiarui153\u002FAIGV-Assessor)\n    - ***RAPO:*** The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.11739) [[Project]](https:\u002F\u002Fwhynothaha.github.io\u002FPrompt_optimizer\u002FRAPO.html) 
[[Code]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FRAPO)\n    - ***ByTheWay:*** Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.06241) [[Code]](https:\u002F\u002Fgithub.com\u002FBujiazi\u002FByTheWay)\n    - ***ConsisID:*** Identity-Preserving Text-to-Video Generation by Frequency Decomposition [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.17440) [[Code]](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FConsisID) [[Project]](https:\u002F\u002Fpku-yuangroup.github.io\u002FConsisID\u002F)\n    - ***EIDT-V:*** Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.06861) [[Code]](https:\u002F\u002Fgithub.com\u002Fdjagpal02\u002FEIDT-V) [[Project]](https:\u002F\u002Fdjagpal02.github.io\u002FEIDT-V\u002F)\n    - ***TransPixeler:*** Advancing Text-to-Video Generation with Transparency [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.03006) [[Project]](https:\u002F\u002Fwileewang.github.io\u002FTransPixar\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fwileewang\u002FTransPixeler)\n    - ***PhyT2V:*** LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.00596) [[Code]](https:\u002F\u002Fgithub.com\u002Fpittisl\u002FPhyT2V)\n    - ***InstanceCap:*** Improving Text-to-Video Generation via Instance-aware Structured Caption [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.09283) [[Code]](https:\u002F\u002Fgithub.com\u002FNJU-PCALab\u002FInstanceCap)\n    - ***BlobGEN-Vid:*** Compositional Text-to-Video Generation with Blob Video Representations [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.07647) [[Project]](https:\u002F\u002Fblobgen-vid2.github.io\u002F)\n    - ***LinGen:*** Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.09856) [[Project]](https:\u002F\u002Flineargen.github.io\u002F)\n    - ⚠️ Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis\n  - **ICCV**\n    - Unified Video Generation via Next-Set Prediction in Continuous Domain [[Paper]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F72)\n  - **NeurIPS**\n    - ***Stable Cinemetrics:*** Structured Taxonomy and Evaluation for Professional Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2509.26555) [[Project]](https:\u002F\u002Fstable-cinemetrics.github.io\u002F)\n  - **ICLR**\n    - ***OpenVid-1M:*** A Large-Scale High-Quality Dataset for Text-to-video Generation [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=j7kdXSrISM) [[Project]](https:\u002F\u002Fnju-pcalab.github.io\u002Fprojects\u002Fopenvid\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FNJU-PCALab\u002FOpenVid-1M) [[Dataset]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnkp37\u002FOpenVid-1M)\n    - ***CogVideoX:*** Text-to-Video Diffusion Models with An Expert 
Transformer [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=LQzN6TRFg9)\n    - Pyramidal Flow Matching for Efficient Video Generative Modeling [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=66NzcRQuOq) [[Project]](https:\u002F\u002Fpyramid-flow.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fjy0205\u002FPyramid-Flow)\n  - **arXiv**\n    - Stable Video Infinity: Infinite-Length Video Generation with Error Recycling [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.09212) [[Project]](https:\u002F\u002Fstable-video-infinity.github.io\u002Fhomepage\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fvita-epfl\u002FStable-Video-Infinity?tab=readme-ov-file) [[Video (YouTube)]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=p71Wp1FuqTw) [[Video (Bilibili)]](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Zd4UzpE6G\u002F?pop_share=1)\n    - ***FEAT:*** Full-Dimensional Efficient Attention Transformer for Medical Video Generation [[Paper]](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2506.04956) [[Code]](https:\u002F\u002Fgithub.com\u002FYaziwel\u002FFEAT)\n- \u003Cspan id=\"text-year-2024\">**Year 2024**\u003C\u002Fspan>\n  - **CVPR**\n    - ***Vlogger:*** Make Your Dream A Vlog [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.09414.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FVlogger)\n    - ***Make Pixels Dance:*** High-Dynamic Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.10982.pdf) [[Project]](https:\u002F\u002Fmakepixelsdance.github.io\u002F) [[Demo]](https:\u002F\u002Fmakepixelsdance.github.io\u002Fdemo.html)\n    - ***VGen:*** Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.04483) [[Code]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FVGen) [[Project]](https:\u002F\u002Fhigen-t2v.github.io\u002F)\n    - ***GenTron:*** Delving Deep into Diffusion Transformers for Image and Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.04557) [[Project]](https:\u002F\u002Fwww.shoufachen.com\u002Fgentron_website\u002F)\n    - ***SimDA:*** Simple Diffusion Adapter for Efficient Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.09710.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FChenHsing\u002FSimDA) [[Project]](https:\u002F\u002Fchenhsing.github.io\u002FSimDA\u002F)\n    - ***MicroCinema:*** A Divide-and-Conquer Approach for Text-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.18829) [[Project]](https:\u002F\u002Fwangyanhui666.github.io\u002FMicroCinema.github.io\u002F) [[Video]](https:\u002F\u002Fyoutube.com\u002Fshorts\u002FH7O-Ku_lqPA)\n    - ***Generative Rendering:*** Controllable 4D-Guided Video Generation with 2D Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.01409) [[Project]](https:\u002F\u002Fprimecai.github.io\u002Fgenerative_rendering\u002F)\n    - ***PEEKABOO:*** Interactive Video Generation via Masked-Diffusion [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.07509) [[Code]](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPeekaboo) [[Project]](https:\u002F\u002Fjinga-lala.github.io\u002Fprojects\u002FPeekaboo\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fanshuln\u002Fpeekaboo-demo)\n    - ***EvalCrafter:*** Benchmarking and Evaluating Large Video Generation Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.11440) 
[[Code]](https:\u002F\u002Fgithub.com\u002FEvalCrafter\u002FEvalCrafter) [[Project]](https:\u002F\u002Fevalcrafter.github.io\u002F)\n    - A Recipe for Scaling up Text-to-Video Generation with Text-free Videos [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.15770) [[Code]](https:\u002F\u002Fgithub.com\u002Fdamo-vilab\u002Fi2vgen-xl) [[Project]](https:\u002F\u002Ftf-t2v.github.io\u002F)\n    - ***BIVDiff:*** A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.02813) [[Project]](https:\u002F\u002Fbivdiff.github.io\u002F)\n    - ***Snap Video:*** Scaled Spatiotemporal Transformers for Text-to-Video Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.14797) [[Project]](https:\u002F\u002Fsnap-research.github.io\u002Fsnapvideo\u002Fvideo_ldm.html)\n    - ***Animate Anyone:*** Consistent and Controllable Image-to-Video Synthesis for Character Animation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.17117.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FHumanAIGC\u002FAnimateAnyone) [[Project]](https:\u002F\u002Fhumanaigc.github.io\u002Fanimate-anyone\u002F)\n    - ***MotionDirector:*** Motion Customization of Text-to-Video Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.08465) [[Code]](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FMotionDirector)\n    - Hierarchical Patch Diffusion Models for High-Resolution Video Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FSkorokhodov_Hierarchical_Patch_Diffusion_Models_for_High-Resolution_Video_Generation_CVPR_2024_paper.pdf) [[Project]](https:\u002F\u002Fsnap-research.github.io\u002Fhpdm\u002F)\n    - ***DiffPerformer:*** Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWang_DiffPerformer_Iterative_Learning_of_Consistent_Latent_Guidance_for_Diffusion-based_Human_CVPR_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Faipixel\u002F)\n    - Grid Diffusion Models for Text-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.00234.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Ftaegyeong-lee\u002FGrid-Diffusion-Models-for-Text-to-Video-Generation) [[Video]](https:\u002F\u002Ftaegyeong-lee.github.io\u002Ftext2video)\n  - **ECCV**\n    - ***Emu Video:*** Factorizing Text-to-Video Generation by Explicit Image Conditioning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.10709.pdf) [[Project]](https:\u002F\u002Fai.meta.com\u002Fblog\u002Femu-text-to-video-generation-image-editing-research\u002F)\n    - ***W.A.L.T.:*** Photorealistic Video Generation with Diffusion Models [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F10270.pdf) [[Project]](https:\u002F\u002Fwalt-video-diffusion.github.io\u002F)\n    - ***MoVideo:*** Motion-Aware Video Generation with Diffusion Models [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F06030.pdf)\n    - ***DrivingDiffusion:*** Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F10097.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fshalfun\u002FDrivingDiffusion) 
[[Project]](https:\u002F\u002Fdrivingdiffusion.github.io\u002F)\n    - ***MagDiff:*** Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F02738.pdf)\n    - ***HARIVO:*** Harnessing Text-to-Image Models for Video Generation [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F06938.pdf) [[Project]](https:\u002F\u002Fkwonminki.github.io\u002FHARIVO\u002F)\n    - ***MEVG:*** Multi-event Video Generation with Text-to-Video Models [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F06012.pdf) [[Project]](https:\u002F\u002Fkuai-lab.github.io\u002Feccv2024mevg\u002F)\n  - **NeurIPS**\n    - Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning [[Paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2024\u002Ffile\u002F81f19c0e9f3e06c831630ab6662fd8ea-Paper-Conference.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FPR-Ryan\u002FDEMO)\n  - **ICML**\n    - ***Video-LaVIT:*** Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=S9lk6dk4LL) [[Project]](https:\u002F\u002Fvideo-lavit.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fjy0205\u002FLaVIT)\n  - **ICLR**\n    - ***VDT:*** General-purpose Video Diffusion Transformers via Mask Modeling [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13311.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FRERV\u002FVDT) [[Project]](https:\u002F\u002Fvdt-2023.github.io\u002F)\n    - ***VersVideo:*** Leveraging Enhanced Temporal Diffusion Models for Versatile Video Generation [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=K9sVJ17zvB)\n  - **AAAI**\n    - ***Follow Your Pose:*** Pose-Guided Text-to-Video Generation using Pose-Free Videos [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.01186) [[Code]](https:\u002F\u002Fgithub.com\u002Fmayuelala\u002FFollowYourPose) [[Project]](https:\u002F\u002Ffollow-your-pose.github.io\u002F)\n    - ***E2HQV:*** High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.08117)\n    - ***ConditionVideo:*** Training-Free Condition-Guided Text-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.07697) [[Code]](https:\u002F\u002Fgithub.com\u002Fpengbo807\u002FConditionVideo) [[Project]](https:\u002F\u002Fpengbo807.github.io\u002Fconditionvideo-website\u002F)\n    - ***F3-Pruning:*** A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.03459)\n  - **arXiv**\n    - ***Lumiere:*** A Space-Time Diffusion Model for Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.12945.pdf) [[Project]](https:\u002F\u002Flumiere-video.github.io\u002F)\n    - ***Boximator:*** Generating Rich and Controllable Motions for Video Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.01566.pdf) [[Project]](https:\u002F\u002Fboximator.github.io\u002F) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=reto_TYsYyQ)\n    - World Model on Million-Length Video And Language With RingAttention [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.08268) 
[[Code]](https:\u002F\u002Fgithub.com\u002FLargeWorldModel\u002FLWM) [[Project]](https:\u002F\u002Flargeworldmodel.github.io\u002F)\n    - ***Direct-a-Video:*** Customized Video Generation with User-Directed Camera Movement and Object Motion [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.03162.pdf) [[Project]](https:\u002F\u002Fdirect-a-video.github.io\u002F)\n    - ***WorldDreamer:*** Towards General World Models for Video Generation via Predicting Masked Tokens [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.09985) [[Code]](https:\u002F\u002Fgithub.com\u002FJeffWang987\u002FWorldDreamer) [[Project]](https:\u002F\u002Fworld-dreamer.github.io\u002F)\n    - ***MagicVideo-V2:*** Multi-Stage High-Aesthetic Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.04468.pdf) [[Project]](https:\u002F\u002Fmagicvideov2.github.io\u002F)\n    - ***Latte:*** Latent Diffusion Transformer for Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.03048) [[Code]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FLatte) [[Project]](https:\u002F\u002Fmaxin-cn.github.io\u002Flatte_project)\n    - ***Mora:*** Enabling Generalist Video Generation via A Multi-Agent Framework [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.13248) [[Code]](https:\u002F\u002Fgithub.com\u002Flichao-sun\u002FMora)\n    - ***StreamingT2V:*** Consistent, Dynamic, and Extendable Long Video Generation from Text [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.14773) [[Code]](https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FStreamingT2V) [[Project]](https:\u002F\u002Fstreamingt2v.github.io\u002F) [[Video]](https:\u002F\u002Ftwitter.com\u002Fi\u002Fstatus\u002F1770909673463390414)\n    - ***VIDiff:*** Translating Videos via Multi-Modal Instructions with Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.18837)\n    - ***StoryDiffusion:*** Consistent Self-Attention for Long-Range Image and Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.01434) [[Code]](https:\u002F\u002Fgithub.com\u002FHVision-NKU\u002FStoryDiffusion) [[Project]](https:\u002F\u002Fstorydiffusion.github.io\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FYupengZhou\u002FStoryDiffusion)\n    - ***Ctrl-Adapter:*** An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.09967) [[Code]](https:\u002F\u002Fgithub.com\u002FHL-hanlin\u002FCtrl-Adapter) [[Project]](https:\u002F\u002Fctrl-adapter.github.io\u002F)\n    - ***ControlNeXt:*** Powerful and Efficient Control for Image and Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.06070) [[Code]](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FControlNeXt) [[Project]](https:\u002F\u002Fpbihao.github.io\u002Fprojects\u002Fcontrolnext\u002Findex.html)\n    - ***FancyVideo:*** Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.08189v1) [[Project]](https:\u002F\u002Ffancyvideo.github.io\u002F)\n    - ***Factorized-Dreamer:*** Training A High-Quality Video Generator with Limited and Low-Quality Data [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.10119v1) [[Code]](https:\u002F\u002Fgithub.com\u002Fyangxy\u002FFactorized-Dreamer\u002F)\n    - Fine-gained Zero-shot Video Sampling [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.21475) 
[[Project]](https:\u002F\u002Fdensechen.github.io\u002Fzss\u002F)\n    - ***ReconX:*** Reconstruct Any Scene from Sparse Views with Video Diffusion Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.16767) [[Code]](https:\u002F\u002Fgithub.com\u002Fliuff19\u002FReconX) [[Project]](https:\u002F\u002Fliuff19.github.io\u002FReconX\u002F) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=UuL2nP5rJcI)\n    - ***ConFiner:*** Training-free Long Video Generation with Chain of Diffusion Model Experts [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.13423) [[Code]](https:\u002F\u002Fgithub.com\u002FConfiner2025\u002FConfiner2025)\n    - ***3DTrajMaster:*** Mastering 3D Trajectory for Multi-Entity Motion in Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.07759) [[Code]](https:\u002F\u002Fgithub.com\u002FKwaiVGI\u002F3DTrajMaster) [[Project]](https:\u002F\u002Ffuxiao0719.github.io\u002Fprojects\u002F3dtrajmaster\u002F)\n    - ***DiTCtrl:*** Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.18597) [[Code]](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FDiTCtrl) [[Project]](https:\u002F\u002Fonevfall.github.io\u002Fproject_page\u002Fditctrl\u002F)\n  - **Others**\n    - ***Sora:*** Video Generation Models as World Simulators [[Paper]](https:\u002F\u002Fopenai.com\u002Fresearch\u002Fvideo-generation-models-as-world-simulators)\n- \u003Cspan id=\"text-year-2023\">**Year 2023**\u003C\u002Fspan>\n  - **CVPR**\n    - ***Align your Latents:*** High-resolution Video Synthesis with Latent Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.08818.pdf) [[Project]](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Ftoronto-ai\u002FVideoLDM\u002F) [[Reproduced code]](https:\u002F\u002Fgithub.com\u002Fsrpkdyy\u002FVideoLDM)\n    - Video Probabilistic Diffusion Models in Projected Latent Space [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FYu_Video_Probabilistic_Diffusion_Models_in_Projected_Latent_Space_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fsihyun-yu\u002FPVDM)\n  - **ICCV**\n    - ***Text2Video-Zero:*** Text-to-image Diffusion Models are Zero-shot Video Generators [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FKhachatryan_Text2Video-Zero_Text-to-Image_Diffusion_Models_are_Zero-Shot_Video_Generators_ICCV_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FText2Video-Zero) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FPAIR\u002FText2Video-Zero) [[Project]](https:\u002F\u002Ftext2video-zero.github.io\u002F)\n    - Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FGe_Preserve_Your_Own_Correlation_A_Noise_Prior_for_Video_Diffusion_ICCV_2023_paper.pdf) [[Project]](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fdir\u002Fpyoco\u002F)\n    - ***Gen-1:*** Structure and Content-guided Video Synthesis with Diffusion Models 
[[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FEsser_Structure_and_Content-Guided_Video_Synthesis_with_Diffusion_Models_ICCV_2023_paper.pdf) [[Project]](https:\u002F\u002Fresearch.runwayml.com\u002Fgen1)\n  - **NeurIPS**\n    - Video Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.03458.pdf) [[Project]](https:\u002F\u002Fvideo-diffusion.github.io\u002F)\n    - Learning Universal Policies via Text-Guided Video Generation [[Paper]](https:\u002F\u002Fpapers.nips.cc\u002Fpaper_files\u002Fpaper\u002F2023\u002Ffile\u002F1d5b9233ad716a43be5c0d3023cb82d0-Paper-Conference.pdf) [[Project]](https:\u002F\u002Funiversal-policy.github.io\u002Funipi\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fflow-diffusion\u002FAVDC)\n    - ***VideoComposer:*** Compositional Video Synthesis with Motion Controllability [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.02018.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002Fvideocomposer) [[Project]](https:\u002F\u002Fvideocomposer.github.io\u002F)\n  - **ICLR**\n    - ***CogVideo:*** Large-scale Pretraining for Text-to-video Generation via Transformers [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=rB6TpjAuSRy) [[Code]](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FCogVideo) [[Demo]](https:\u002F\u002Fmodels.aminer.cn\u002Fcogvideo\u002F)\n    - ***Make-A-Video:*** Text-to-video Generation without Text-video Data [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.14792.pdf) [[Project]](https:\u002F\u002Fmakeavideo.studio\u002F) [[Reproduced code]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fmake-a-video-pytorch)\n    - ***Phenaki:*** Variable Length Video Generation From Open Domain Textual Description [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf\u002Ffe8e106a2746992c9c2e658bdc8cb9c89cc5a39a.pdf) [[Reproduced Code]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fphenaki-pytorch)\n  - **arXiv**\n    - ***Control-A-Video:*** Controllable Text-to-video Generation with Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13840.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FWeifeng-Chen\u002Fcontrol-a-video) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fwf-genius\u002FControl-A-Video) [[Project]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13840.pdf)\n    - ***ControlVideo:*** Training-free Controllable Text-to-video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13077.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FYBYBZhang\u002FControlVideo)\n    - ***Imagen Video:*** High Definition Video Generation with Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.02303.pdf) \n    - ***Latent-Shift:*** Latent Diffusion with Temporal Shift for Efficient Text-to-video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.08477.pdf) [[Project]](https:\u002F\u002Flatent-shift.github.io\u002F)\n    - ***LAVIE:*** High-quality Video Generation with Cascaded Latent Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.15103.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FLaVie) [[Project]](https:\u002F\u002Fvchitect.github.io\u002FLaVie-project\u002F)\n    - ***Show-1:*** Marrying Pixel and Latent Diffusion Models for Text-to-video Generation [[Paper]](https:\u002F\u002Fshowlab.github.io\u002FShow-1\u002Fassets\u002FShow-1.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FShow-1) 
[[Project]](https:\u002F\u002Fshowlab.github.io\u002FShow-1\u002F)\n    - ***Stable Video Diffusion:*** Scaling Latent Video Diffusion Models to Large Datasets [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.15127.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fgenerative-models) [[Project]](https:\u002F\u002Fstability.ai\u002Fnews\u002Fstable-video-diffusion-open-ai-video-model)\n    - ***VideoFactory:*** Swap Attention in Spatiotemporal Diffusions for Text-to-video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.10874.pdf) [[Dataset]](https:\u002F\u002Fgithub.com\u002Fdaooshee\u002FHD-VG-130M)\n    - ***VideoGen:*** A Reference-guided Latent Diffusion Approach for High Definition Text-to-video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.00398.pdf) [[Code]](https:\u002F\u002Fvideogen.github.io\u002FVideoGen\u002F)\n    - ***InstructVideo:*** Instructing Video Diffusion Models with Human Feedback [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.12490.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002Fi2vgen-xl\u002Fblob\u002Fmain\u002Fdoc\u002FInstructVideo.md) [[Project]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.12490.pdf)\n    - ***SEINE:*** Short-to-Long Video Diffusion Model for Generative Transition and Prediction [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.20700.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FSEINE) [[Project]](https:\u002F\u002Fvchitect.github.io\u002FSEINE-project\u002F)\n    - ***VideoLCM:*** Video Latent Consistency Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.09109.pdf)\n    - ModelScope Text-to-Video Technical Report [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.06571.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FExponentialML\u002FText-To-Video-Finetuning)\n    - ***LAMP:*** Learn A Motion Pattern for Few-Shot-Based Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10769) [[Code]](https:\u002F\u002Frq-wu.github.io\u002Fprojects\u002FLAMP) [[Project]](https:\u002F\u002Frq-wu.github.io\u002Fprojects\u002FLAMP)\n    - ***STG:*** Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.18664) [[Code]](https:\u002F\u002Fgithub.com\u002Fjunhahyung\u002FSTGuidance) [[Project]](https:\u002F\u002Fjunhahyung.github.io\u002FSTGuidance\u002F)\n    - ***Motion-Zero:*** Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.10150) [[Project]](https:\u002F\u002Flitaoguo.github.io\u002FMotionZero.github.io\u002F)\n    - ***NOVA:*** Autoregressive Video Generation without Vector Quantization [[Paper]](https:\u002F\u002Fbitterdhg.github.io\u002FNOVA_page\u002Fpaper\u002F2412.14169v1.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fbaaivision\u002FNOVA) [[Project]](https:\u002F\u002Fbitterdhg.github.io\u002FNOVA_page\u002F)\n- \u003Cspan id=\"text-year-2022\">**Year 2022**\u003C\u002Fspan>\n  - **CVPR**\n    - ***Show Me What and Tell Me How:*** Video Synthesis via Multimodal Conditioning [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FHan_Show_Me_What_and_Tell_Me_How_Video_Synthesis_via_CVPR_2022_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fsnap-research\u002FMMVID) [[Dataset]](https:\u002F\u002Fgithub.com\u002Fsnap-research\u002FMMVID\u002Fblob\u002Fmain\u002Fmm_vox_celeb\u002FREADME.md)\n- 
\u003Cspan id=\"text-year-2021\">**Year 2021**\u003C\u002Fspan>\n  - **arXiv**\n      -  ***VideoGPT:*** Video Generation using VQ-VAE and Transformers [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2104.10157.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fwilson1yan\u002FVideoGPT) [[Project]](https:\u002F\u002Fwilson1yan.github.io\u002Fvideogpt\u002Findex.html)\n      -  ***MagicVideo:*** Efficient Video Generation With Latent Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.11018)\n      -  ***EasyAnimate:*** A High-Performance Long Video Generation Method based on Transformer Architecture [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.18991) [[Code]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.18991)\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n## Image-to-Video Generation\n- \u003Cspan id=\"image-year-2025\">**Year 2025**\u003C\u002Fspan>\n  - **CVPR**\n    - ***MotionStone:*** Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.05848)\n    - ***MotionPro:*** A Precise Motion Controller for Image-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.00948) [[Code]](https:\u002F\u002Fgithub.com\u002FHiDream-ai\u002FMotionPro)\n    - ***Through-The-Mask:*** Mask-based Motion Trajectories for Image-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.03059)\n    - Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.00948)\n  - **ICCV**\n    - ***AnyI2V:*** Animating Any Conditional Image with Motion Control [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.02857) [[Project]](https:\u002F\u002Fhenghuiding.com\u002FAnyI2V\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FFudanCVL\u002FAnyI2V)\n    - ***Versatile Transition Generation:*** Versatile Transition Generation with Image-to-Video Diffusion [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.01698)\n    - ***TIP-I2V:*** A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.04709) [[Project]](https:\u002F\u002Ftip-i2v.github.io) [[Code]](https:\u002F\u002Fgithub.com\u002FWangWenhao0716\u002FTIP-I2V)\n  - **ICLR**\n    - ***SG-I2V:*** Self-Guided Trajectory Control in Image-to-Video Generation [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=uQjySppU9x) [[Project]](https:\u002F\u002Fkmcode1.github.io\u002FProjects\u002FSG-I2V\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FKmcode1\u002FSG-I2V)\n    - Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=ykD8a9gJvy)\n    - Pyramidal Flow Matching for Efficient Video Generative Modeling [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=66NzcRQuOq) [[Project]](https:\u002F\u002Fpyramid-flow.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fjy0205\u002FPyramid-Flow)\n- \u003Cspan id=\"image-year-2024\">**Year 2024**\u003C\u002Fspan>\n  - **CVPR**\n    - ***VideoBooth:*** Diffusion-based Video Generation with Image Prompts [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.00777) [[Code]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FVideoBooth) 
[[Project]](https:\u002F\u002Fvchitect.github.io\u002FVideoBooth-project\u002F) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=10DxH1JETzI)\n  - **ECCV**\n    - Rethinking Image-to-Video Adaptation: An Object-centric Perspective [[Paper]](http:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06871v1)\n    - ***PhysGen:*** Rigid-Body Physics-Grounded Image-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.18964) [[Code]](https:\u002F\u002Fgithub.com\u002Fstevenlsw\u002Fphysgen) [[Project]](https:\u002F\u002Fstevenlsw.github.io\u002Fphysgen\u002F)\n    - ***MOFA-Video:*** Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.20222) [[Code]](https:\u002F\u002Fgithub.com\u002FMyNiuuu\u002FMOFA-Video) [[Project]](https:\u002F\u002Fmyniuuu.github.io\u002FMOFA_Video\u002F)\n  - **AAAI**\n    - Decouple Content and Motion for Conditional Image-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.14294)\n  - **NeurIPS**\n    - Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model [[Paper]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2024\u002Ffile\u002F35cb54b887e7aafe74829677cce6c5c6-Paper-Conference.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002Fcond-image-leakage)\n  - **ICML**\n    - ***Video-LaVIT:*** Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=S9lk6dk4LL) [[Project]](https:\u002F\u002Fvideo-lavit.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fjy0205\u002FLaVIT)\n  - **arXiv**\n    -  ***ConsistI2V:*** Enhancing Visual Consistency for Image-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.04324.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FConsistI2V) [[Project]](https:\u002F\u002Ftiger-ai-lab.github.io\u002FConsistI2V\u002F)\n    - ***I2V-Adapter:*** A General Image-to-Video Adapter for Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.16693.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FI2V-Adapter\u002FI2V-Adapter-repo)\n    - ***Follow-Your-Click:*** Open-domain Regional Image Animation via Short Prompts [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.08268) [[Code]](https:\u002F\u002Fgithub.com\u002Fmayuelala\u002FFollowYourClick) [[Project]](https:\u002F\u002Ffollow-your-click.github.io\u002F)\n    - ***AtomoVideo:*** High Fidelity Image-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.01800.pdf) [[Project]](https:\u002F\u002Fatomo-video.github.io\u002F) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fembed\u002F36JIlk-U-vQ)\n    - ***Pix2Gif:*** Motion-Guided Diffusion for GIF Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04634) [[Code]](https:\u002F\u002Fhiteshk03.github.io\u002FPix2Gif\u002F) [[Project]](https:\u002F\u002Fhiteshk03.github.io\u002FPix2Gif\u002F)\n    - ***ID-Animator:*** Zero-Shot Identity-Preserving Human Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.15275.pdf) [[Code]](https:\u002F\u002Fid-animator.github.io\u002F) [[Project]](https:\u002F\u002Fid-animator.github.io\u002F)\n    - Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.02827) 
[[Project]](https:\u002F\u002Fnoise-rectification.github.io\u002F)\n    - ***MegActor-Σ:*** Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.14975) [[Code]](https:\u002F\u002Fgithub.com\u002Fmegvii-research\u002Fmegactor)\n    - ***LeviTor:*** 3D Trajectory Oriented Image-to-Video Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.15214) [[Code]](https:\u002F\u002Fgithub.com\u002Fqiuyu96\u002FLeviTor) [[Project]](https:\u002F\u002Fppetrichor.github.io\u002Flevitor.github.io\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fhlwang06\u002FLeviTor\u002Ftree\u002Fmain)\n\n- \u003Cspan id=\"image-year-2023\">**Year 2023**\u003C\u002Fspan>\n  - **CVPR**\n    - Conditional Image-to-Video Generation with Latent Flow Diffusion Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FNi_Conditional_Image-to-Video_Generation_With_Latent_Flow_Diffusion_Models_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fnihaomiao\u002FCVPR23_LFDM)\n  - **arXiv**\n    - ***I2VGen-XL:*** High-quality Image-to-video Synthesis via Cascaded Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.04145.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002Fi2vgen-xl) [[Project]](https:\u002F\u002Fi2vgen-xl.github.io\u002F)\n    - ***DreamVideo:*** High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.03018) [[Code]](https:\u002F\u002Fgithub.com\u002Fanonymous0769\u002FDreamVideo) [[Project]](https:\u002F\u002Fanonymous0769.github.io\u002FDreamVideo\u002F)\n    - ***DynamiCrafter:*** Animating Open-domain Images with Video Diffusion Priors [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.12190) [[Project]](https:\u002F\u002Fdoubiiu.github.io\u002Fprojects\u002FDynamiCrafter\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FDoubiiu\u002FDynamiCrafter) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0NfmIsNAg-g) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FDoubiiu\u002FDynamiCrafter)\n    - ***AnimateDiff:*** Animate Your Personalized Text-to-image Diffusion Models without Specific Tuning [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=Fx2SbBgcte) [[Project]](https:\u002F\u002Fanimatediff.github.io\u002F)\n- \u003Cspan id=\"image-year-2022\">**Year 2022**\u003C\u002Fspan>\n  - **CVPR**    \n    - ***Make It Move:*** Controllable Image-to-Video Generation with Text Descriptions [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FHu_Make_It_Move_Controllable_Image-to-Video_Generation_With_Text_Descriptions_CVPR_2022_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FYouncy-Hu\u002FMAGE)\n- \u003Cspan id=\"image-year-2021\">**Year 2021**\u003C\u002Fspan>\n  - **ICCV**\n    - ***Click to Move:*** Controlling Video Generation with Sparse Motion [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fpapers\u002FArdino_Click_To_Move_Controlling_Video_Generation_With_Sparse_Motion_ICCV_2021_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FPierfrancescoArdino\u002FC2M)\n\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n## Audio-to-Video Generation\n- \u003Cspan id=\"audio-year-2024\">**Year 2024**\u003C\u002Fspan>  \n  - **AAAI**\n    - Diverse and Aligned Audio-to-Video 
Generation via Text-to-Video Model Adaptation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.16429) [[Code]](https:\u002F\u002Fgithub.com\u002Fguyyariv\u002FTempoTokens)\n- \u003Cspan id=\"audio-year-2023\">**Year 2023**\u003C\u002Fspan>\n  - **CVPR**\n    - ***MM-Diffusion:*** Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FRuan_MM-Diffusion_Learning_Multi-Modal_Diffusion_Models_for_Joint_Audio_and_Video_CVPR_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fresearchmm\u002FMM-Diffusion)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n## Personalized Video Generation\n- \u003Cspan id=\"personalized-year-2024\">**Year 2024**\u003C\u002Fspan>\n  - **CVPR**\n    - High-fidelity Person-centric Subject-to-Image Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.10329) [[Code]](https:\u002F\u002Fgithub.com\u002FCodeGoat24\u002FFace-diffuser)\n  - **ICCV**\n    - ***Magic Mirror:*** ID-Preserved Video Generation in Video Diffusion Transformers [[Paper]](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2501.03931) [[Project]](https:\u002F\u002Fjulianjuaner.github.io\u002Fprojects\u002FMagicMirror\u002Findex.html) [[Code]](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FMagicMirror)\n    - ***PersonalVideo:*** High ID-Fidelity Video Customization without Dynamic and Semantic Degradation [[Paper]](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2411.17048) [[Project]](https:\u002F\u002Fpersonalvideo.github.io) [[Code]](https:\u002F\u002Fgithub.com\u002FEchoPluto\u002FPersonalVideo)\n    - ***MagicID:*** Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization [[Paper]](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2503.12689) [[Project]](https:\u002F\u002Fechopluto.github.io\u002FMagicID-project) [[Code]](https:\u002F\u002Fgithub.com\u002FEchoPluto\u002FMagicID)\n    - ***DreamRelation:*** Relation-Centric Video Customization [[Paper]](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2503.07602) [[Project]](https:\u002F\u002Fdreamrelation.github.io)\n    - ⚠️ ***PERSONA:*** Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image\n  - **ECCV**\n    - ***PoseCrafter:*** One-Shot Personalized Video Synthesis Following Flexible Pose Control [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F06050.pdf) [[Project]](https:\u002F\u002Fml-gsai.github.io\u002FPoseCrafter-demo\u002F)\n  - **arXiv**\n    - ***Magic-Me:*** Identity-Specific Video Customized Diffusion [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.09368) [[Code]](https:\u002F\u002Fgithub.com\u002FZhen-Dong\u002FMagic-Me) [[Project]](https:\u002F\u002Fmagic-me-webpage.github.io\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FvisionMaze\u002FMagic-Me)\n    - ***ReVideo:*** Remake a Video with Motion and Content Control [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.13865) [[Code]](https:\u002F\u002Fgithub.com\u002FMC-E\u002FReVideo) [[Project]](https:\u002F\u002Fmc-e.github.io\u002Fproject\u002FReVideo\u002F)\n    - ***ConceptMaster:*** Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.04698) 
[[Project]](https:\u002F\u002Fyuzhou914.github.io\u002FConceptMaster\u002F)\n- \u003Cspan id=\"personalized-year-2023\">**Year 2023**\u003C\u002Fspan>\n  - **arXiv**\n    - ***FastComposer:*** Tuning-Free Multi-Subject Image Generation with Localized Attention [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.10431) [[Code]](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Ffastcomposer?tab=readme-ov-file) [[Demo]](https:\u002F\u002Ffastcomposer.hanlab.ai\u002F)\n    - ***Make-Your-Video:*** Customized Video Generation Using Textual and Structural Guidance [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.00943) [[Project]](https:\u002F\u002Fdoubiiu.github.io\u002Fprojects\u002FMake-Your-Video\u002F)\n    - ***DreamVideo-2:*** Zero-Shot Subject-Driven Video Customization with Precise Motion Control [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.13830) [[Project]](https:\u002F\u002Fdreamvideo2.github.io\u002F)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n## Video Editing\n- \u003Cspan id=\"editing-year-2025\">**Year 2025**\u003C\u002Fspan>\n  - **CVPR**\n    - ***VideoDirector:*** Precise Video Editing via Text-to-Video Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.17592) [[Code]](https:\u002F\u002Fgithub.com\u002FYukun66\u002FVideo_Director)\n    - ***VideoMage:*** Multi-Subject and Motion Customization of Text-to-Video Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.21781) [[Project]](https:\u002F\u002Fjasper0314-huang.github.io\u002Fvideomage-customization\u002F)\n    - Visual Prompting for One-shot Controllable Video Editing without Inversion [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.14335) [[Project]](https:\u002F\u002Fwww.zhengbozhang.com\u002F)\n    - ***SketchVideo:*** Sketch-based Video Generation and Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.23284) [[Code]](https:\u002F\u002Fgithub.com\u002FIGLICT\u002FSketchVideo) [[Project]](http:\u002F\u002Fgeometrylearning.com\u002FSketchVideo\u002F)\n    - ***h-Edit:*** Effective and Flexible Diffusion-Based Editing via Doob's h-Transform [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.02187) [[Code]](https:\u002F\u002Fgithub.com\u002Fnktoan\u002Fh-edit) [[Project]](https:\u002F\u002Fnktoan.github.io\u002Fh-Edit-cvpr25\u002F)\n    - ***ObjectMover:*** Generative Object Movement with Video Prior [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.08037) [[Project]](https:\u002F\u002Fxinyu-andy.github.io\u002FObjMover\u002F)\n    - ***MatAnyone:*** Stable Video Matting with Consistent Memory Propagation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.14677) [[Code]](https:\u002F\u002Fgithub.com\u002Fpq-yang\u002FMatAnyone) [[Project]](https:\u002F\u002Fpq-yang.github.io\u002Fprojects\u002FMatAnyone\u002F)\n    - ***StyleMaster:*** Stylize Your Video with Artistic Generation and Translation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.07744) [[Code]](https:\u002F\u002Fgithub.com\u002FKwaiVGI\u002FStyleMaster) [[Project]](https:\u002F\u002Fzixuan-ye.github.io\u002Fstylemaster\u002F)\n    - ***AudCast:*** Audio-Driven Human Video Generation by Cascaded Diffusion Transformers [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.19824) [[Project]](https:\u002F\u002Fguanjz20.github.io\u002Fprojects\u002FAudCast\u002F)\n    - ⚠️ ***FADE:*** Frequency-Aware Diffusion Model Factorization for Video Editing 
[[Code]](https:\u002F\u002Fgithub.com\u002FEternalEvan\u002FFADE)\n    - ⚠️ Align-A-Video: Deterministic Reward Tuning of Image Diffusion Models for Consistent Video Editing\n    - ⚠️ Unity in Diversity: Video Editing via Gradient-Latent Purification\n  - **ICCV**\n    - ***VACE:*** All-in-One Video Creation and Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.07598) [[Project]](https:\u002F\u002Fali-vilab.github.io\u002FVACE-Page\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FVACE)\n    - ***Reangle-A-Video:*** 4D Video Generation as Video-to-Video Translation [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.09151) [[Project]](https:\u002F\u002Fanony1anony2.github.io\u002F)\n    - ***DIVE:*** Taming DINO for Subject-Driven Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.03347) [[Project]](https:\u002F\u002Fdino-video-editing.github.io\u002F)\n    - ***DynamicFace:*** High-Quality and Consistent Face Swapping for Image and Video using Composable 3D Facial Priors [[Paper]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F1368) [[Project]](https:\u002F\u002Fdynamic-face.github.io\u002F)\n    - ***QK-Edit:*** Revisiting Attention-based Injection in MM-DiT for Image and Video Editing [[Paper]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F215)\n    - ***Teleportraits:*** Training-Free People Insertion into Any Scene [[Paper]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F809)\n  - **ICLR**\n    - ***VideoGrain:*** Modulating Space-Time Attention for Multi-Grained Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.17258) [[Project]](https:\u002F\u002Fknightyxp.github.io\u002FVideoGrain_project_page\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fknightyxp\u002FVideoGrain)\n- \u003Cspan id="editing-year-2024">**Year 2024**\u003C\u002Fspan>\n  - **CVPR**\n    - ***VMC:*** Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.00845) [[Code]](https:\u002F\u002Fgithub.com\u002FHyeonHo99\u002FVideo-Motion-Customization) [[Project]](https:\u002F\u002Fvideo-motion-customization.github.io\u002F)\n    - ***Fairy:*** Fast Parallelized Instruction-Guided Video-to-Video Synthesis [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.13834) [[Project]](https:\u002F\u002Ffairy-video2video.github.io\u002F)\n    - ***CCEdit:*** Creative and Controllable Video Editing via Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.16496) [[Code]](https:\u002F\u002Fgithub.com\u002FRuoyuFeng\u002FCCEdit) [[Project]](https:\u002F\u002Fruoyufeng.github.io\u002FCCEdit.github.io\u002F) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=UQw4jq-igN4)\n    - ***DynVideo-E:*** Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10624) [[Project]](https:\u002F\u002Fshowlab.github.io\u002FDynVideo-E\u002F) [[Video]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=xiRH4Q6B3Yk)\n    - ***Video-P2P:*** Video Editing with Cross-attention Control [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.04761) [[Code]](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FVideo-P2P) [[Project]](https:\u002F\u002Fvideo-p2p.github.io\u002F)\n    - A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video 
Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.05856) [[Code]](https:\u002F\u002Fgithub.com\u002FSTEM-Inv\u002Fstem-inv) [[Project]](https:\u002F\u002Fstem-inv.github.io\u002Fpage\u002F)\n    - ***MaskINT:*** Video Editing via Interpolative Non-autoregressive Masked Transformers [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.12468) [[Code]](https:\u002F\u002Fmaskint.github.io\u002F) [[Project]](https:\u002F\u002Fmaskint.github.io\u002F)\n    - ***VidToMe:*** Video Token Merging for Zero-Shot Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.10656) [[Code]](https:\u002F\u002Fgithub.com\u002Flixirui142\u002FVidToMe) [[Project]](https:\u002F\u002Fvidtome-diffusion.github.io\u002F) [[Video]](https:\u002F\u002Fyoutu.be\u002FcZPtwcRepNY)\n    - Towards Language-Driven Video Inpainting via Multimodal Large Language Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.10226.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fjianzongwu\u002FLanguage-Driven-Video-Inpainting) [[Project]](https:\u002F\u002Fjianzongwu.github.io\u002Fprojects\u002Frovi\u002F) [[Dataset]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fjianzongwu\u002Frovi)\n    - ***AVID:*** Any-Length Video Inpainting with Diffusion Model [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.03816.pdf) [[Code]](https:\u002F\u002Fzhang-zx.github.io\u002FAVID\u002F) [[Project]](https:\u002F\u002Fzhang-zx.github.io\u002FAVID\u002F)\n    - ***CAMEL:*** CAusal Motion Enhancement tailored for Lifting Text-driven Video Editing [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhang_CAMEL_CAusal_Motion_Enhancement_Tailored_for_Lifting_Text-driven_Video_Editing_CVPR_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fzhangguiwei610\u002FCAMEL)\n    - Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYatim_Space-Time_Diffusion_Features_for_Zero-Shot_Text-Driven_Motion_Transfer_CVPR_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fdiffusion-motion-transfer\u002Fdiffusion-motion-transfer) [[Project]](https:\u002F\u002Fdiffusion-motion-transfer.github.io\u002F)\n    - ***FRESCO:*** Spatial-Temporal Correspondence for Zero-Shot Video Translation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_FRESCO_Spatial-Temporal_Correspondence_for_Zero-Shot_Video_Translation_CVPR_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fwilliamyang1991\u002FFRESCO) [[Project]](https:\u002F\u002Fwww.mmlab-ntu.com\u002Fproject\u002Ffresco\u002F)\n    - ***MotionEditor:*** Editing Video Motion via Content-Aware Diffusion [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FTu_MotionEditor_Editing_Video_Motion_via_Content-Aware_Diffusion_CVPR_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FFrancis-Rings\u002FMotionEditor) [[Project]](https:\u002F\u002Ffrancis-rings.github.io\u002FMotionEditor\u002F)\n  - **ECCV**\n    - ***DragVideo:*** Interactive Drag-style Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.02216)\n    - Video Editing via Factorized Diffusion Distillation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.09334)\n    - ***OCD:*** Object-Centric Diffusion for Efficient Video Editing 
[[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F07396.pdf) [[Project]](https:\u002F\u002Fqualcomm-ai-research.github.io\u002Fobject-centric-diffusion\u002F)\n    - ***DreamMotion:*** Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.12002) [[Project]](https:\u002F\u002Fhyeonho99.github.io\u002Fdreammotion\u002F)\n    - ***WAVE:*** Warping DDIM Inversion Features for Zero-shot Text-to-Video Editing [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F09682.pdf) [[Project]](https:\u002F\u002Free1s.github.io\u002Fwave\u002F)\n    - ***DeCo:*** Decoupled Human-Centered Diffusion Video Editing with Motion Consistency [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F06071.pdf)\n    - ***SAVE:*** Protagonist Diversification with Structure Agnostic Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.02503) [[Code]](https:\u002F\u002Fgithub.com\u002Fldynx\u002FSAVE)\n    - ***Videoshop:*** Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F01890.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fsfanxiang\u002Fvideoshop) [[Project]](https:\u002F\u002Fvideoshop-editing.github.io\u002F)\n  - **ICLR**\n    - ***Ground-A-Video:*** Zero-shot Grounded Video Editing using Text-to-image Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.01107) [[Code]](https:\u002F\u002Fgithub.com\u002FGround-A-Video\u002FGround-A-Video) [[Project]](https:\u002F\u002Fground-a-video.github.io\u002F)\n    - ***TokenFlow:*** Consistent Diffusion Features for Consistent Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.10373) [[Code]](https:\u002F\u002Fgithub.com\u002Fomerbt\u002FTokenFlow) [[Project]](https:\u002F\u002Fdiffusion-tokenflow.github.io\u002F)\n    - Consistent Video-to-Video Transfer Using Synthetic Dataset [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=IoKRezZMxF) [[Code]](https:\u002F\u002Fgithub.com\u002Famazon-science\u002Finstruct-video-to-video\u002Ftree\u002Fmain)\n    - ***FLATTEN:*** Optical FLow-guided ATTENtion for Consistent Text-to-Video Editing [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=JgqftqZQZ7) [[Code]](https:\u002F\u002Fgithub.com\u002Fyrcong\u002Fflatten) [[Project]](https:\u002F\u002Fflatten-video-editing.github.io\u002F)\n  - **SIGGRAPH**\n    - ***MotionCtrl:*** A Unified and Flexible Motion Controller for Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.03641.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FMotionCtrl) [[Project]](https:\u002F\u002Fwzhouxiff.github.io\u002Fprojects\u002FMotionCtrl\u002F) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FTencentARC\u002FMotionCtrl)\n  - **arXiv**\n    - Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.15249) [[Code]](https:\u002F\u002Fgithub.com\u002Fgeonyeong-park\u002FSpectral-Motion-Alignment) [[Project]](https:\u002F\u002Fgeonyeong-park.github.io\u002Fspectral-motion-alignment\u002F)\n    - ***UniEdit:*** A Unified Tuning-Free Framework for Video Motion and Appearance Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.13185) 
[[Code]](https:\u002F\u002Fgithub.com\u002FJianhongBai\u002FUniEdit) [[Project]](https:\u002F\u002Fjianhongbai.github.io\u002FUniEdit\u002F)\n    - ***DragAnything:*** Motion Control for Anything using Entity Representation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.07420.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FDragAnything) [[Project]](https:\u002F\u002Fweijiawu.github.io\u002Fdraganything_page\u002F)\n    - ***AnyV2V:*** A Plug-and-Play Framework for Any Video-to-Video Editing Tasks [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.14468) [[Code]](https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FAnyV2V) [[Project]](https:\u002F\u002Ftiger-ai-lab.github.io\u002FAnyV2V\u002F)\n    - ***CoCoCo:*** Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.12035) [[Code]](https:\u002F\u002Fgithub.com\u002Fzibojia\u002FCOCOCO) [[Project]](https:\u002F\u002Fcococozibojia.github.io\u002F)\n    - ***VASE:*** Object-Centric Appearance and Shape Manipulation of Real Videos [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.02473)\n    - ***StableV2V:*** Stablizing Shape Consistency in Video-to-Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.11045) [[Code]](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002FStableV2V) [[Project]](https:\u002F\u002Falonzoleeeooo.github.io\u002FStableV2V) [[Dataset]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAlonzoLeeeooo\u002FDAVIS-Edit)\n    - Motion Inversion for Video Customization [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.20193) [[Code]](https:\u002F\u002Fgithub.com\u002FEnVision-Research\u002FMotionInversion) [[Demo]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fziyangmai\u002FMotionInversion)\n    - ***VideoAnydoor:*** High-fidelity Video Object Insertion with Precise Motion Control [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.01427) [[Project]](https:\u002F\u002Fvideoanydoor.github.io\u002F)\n- \u003Cspan id="editing-year-2023">**Year 2023**\u003C\u002Fspan>\n  - **CVPR**\n    - Shape-aware Text-driven Layered Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2301.13173) [[Code]](https:\u002F\u002Fgithub.com\u002Ftext-video-edit\u002Fshape-aware-text-driven-layered-video-editing-release) [[Project]](https:\u002F\u002Ftext-video-edit.github.io\u002F)\n  - **ICCV**\n    - ***StableVideo:*** Text-driven Consistency-aware Diffusion Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.09592) [[Code]](https:\u002F\u002Fgithub.com\u002Frese1f\u002FStableVideo)\n    - ***Pix2Video:*** Video Editing using Image Diffusion [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.12688) [[Code]](https:\u002F\u002Fgithub.com\u002Fduyguceylan\u002Fpix2video)\n    - ***Tune-A-Video:*** One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FWu_Tune-A-Video_One-Shot_Tuning_of_Image_Diffusion_Models_for_Text-to-Video_Generation_ICCV_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FTune-A-Video) [[Project]](https:\u002F\u002Ftuneavideo.github.io\u002F)\n  - **NeurIPS**\n    - Towards Consistent Video Editing with Text-to-Image Diffusion Models [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=RNVwm4BzXO)\n  - **SIGGRAPH**\n    - 
***Rerender A Video:*** Zero-Shot Text-Guided Video-to-Video Translation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.07954) [[Code]](https:\u002F\u002Fgithub.com\u002Fwilliamyang1991\u002FRerender_A_Video) [[Project]](https:\u002F\u002Fwww.mmlab-ntu.com\u002Fproject\u002Frerender\u002F)\n  - **arXiv**\n    - ***Style-A-Video:*** Agile Diffusion for Arbitrary Text-based Video Style Transfer [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.05464.pdf)\n    - ***SAVE:*** Spectral-Shift-Aware Adaptation of Image Diffusion Models for Text-guided Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.18670) [[Code]](https:\u002F\u002Fgithub.com\u002Fnazmul-karim170\u002FSAVE-Text2Video-Diffusion) [[Project]](https:\u002F\u002Fsave-textguidedvideoediting.github.io\u002F)\n    - ***MagicProp:*** Diffusion-based Video Editing via Motion-aware Appearance Propagation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.00908)\n- \u003Cspan id="editing-year-2022">**Year 2022**\u003C\u002Fspan>\n  - **ECCV**\n    - ***Text2LIVE:*** Text-Driven Layered Image and Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.02491) [[Code]](https:\u002F\u002Fgithub.com\u002Fomerbt\u002FText2LIVE) [[Project]](https:\u002F\u002Ftext2live.github.io\u002F)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n## Human Image Animation\n- \u003Cspan id="human-year-2026">**Year 2026**\u003C\u002Fspan>\n  - **arXiv**\n    - ***Hand2World:*** Autoregressive Egocentric Interaction Generation via Free-Space Hand Gestures [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.09600) [[Project]](https:\u002F\u002Fhand2world.github.io\u002F)\n- \u003Cspan id="human-year-2025">**Year 2025**\u003C\u002Fspan>\n  - **CVPR**\n    - ***X-Dyna:*** Expressive Dynamic Human Image Animation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.10021) [[Code]](https:\u002F\u002Fgithub.com\u002Fbytedance\u002FX-Dyna)\n    - ***StableAnimator:*** High-Quality Identity-Preserving Human Image Animation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.17697) [[Project]](https:\u002F\u002Ffrancis-rings.github.io\u002FStableAnimator\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FFrancis-Rings\u002FStableAnimator)\n  - **ICCV**\n    - ***DreamActor-M1:*** Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance [[Paper]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F288) [[Project]](https:\u002F\u002Fgrisoon.github.io\u002FDreamActor-M1\u002F)\n    - ***Animate Anyone 2:*** High-Fidelity Character Image Animation with Environment Affordance [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.06145) [[Project]](https:\u002F\u002Fhumanaigc.github.io\u002Fanimate-anyone-2\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FHumanAIGC\u002Fanimate-anyone-2)\n    - Multi-identity Human Image Animation with Structural Video Diffusion [[Paper]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F638)\n    - ***OmniHuman-1:*** Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models [[Paper]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F2201) [[Project]](https:\u002F\u002Fomnihuman-lab.github.io\u002F)\n    - ***AdaHuman:*** Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion 
[[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.24877) [[Project]](https:\u002F\u002Fnvlabs.github.io\u002FAdaHuman\u002F) \n    - ***Ponimator:*** Unfolding Interactive Pose for Versatile Human-human Interaction Animation [[Paper]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F1453)\n  - **ICLR**\n    - ***Animate-X:*** Universal Character Image Animation with Enhanced Motion Representation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.10306) [[Project]](https:\u002F\u002Flucaria-academy.github.io\u002FAnimate-X\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fanimate-x)\n  - **arXiv**\n    - ***EgoControl:*** Controllable Egocentric Video Generation via 3D Full-Body Poses [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.18173) [[Project]](https:\u002F\u002Fcvg-bonn.github.io\u002FEgoControl\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FCVG-Bonn\u002FEgoControl) \n    - ***UniAnimate-DiT:*** Human Image Animation with Large-Scale Video Diffusion Transformer [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.11289) [[Project]](https:\u002F\u002Funianimate.github.io\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT)\n    - ***DreamActor-M1:*** Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.01724) [[Project]](https:\u002F\u002Fdreamactor-m1.com\u002F)\n    - ***Animate Anyone 2:*** High-Fidelity Character Image Animation with Environment Affordance [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.06145) [[Project]](https:\u002F\u002Fhumanaigc.github.io\u002Fanimate-anyone-2\u002F)\n- \u003Cspan id="human-year-2024">**Year 2024**\u003C\u002Fspan>\n  - **CVPR**\n    - ***MotionFollower:*** Editing Video Motion via Lightweight Score-Guided Diffusion [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.20325) [[Project]](https:\u002F\u002Ffrancis-rings.github.io\u002FMotionFollower\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FFrancis-Rings\u002FMotionFollower)\n    - ***MotionEditor:*** Editing Video Motion via Content-Aware Diffusion [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FTu_MotionEditor_Editing_Video_Motion_via_Content-Aware_Diffusion_CVPR_2024_paper.pdf) [[Project]](https:\u002F\u002Ffrancis-rings.github.io\u002FMotionEditor\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FFrancis-Rings\u002FMotionEditor)\n  - **ICLR**\n    - ***DisPose:*** Disentangling Pose Guidance for Controllable Human Image Animation [[Paper]](https:\u002F\u002Fopenreview.net\u002Fforum?id=AumOa10MKG) [[Project]](https:\u002F\u002Flihxxx.github.io\u002FDisPose\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Flihxxx\u002FDisPose)\n  - **arXiv**\n    - ***MikuDance:*** Animating Character Art with Mixed Motion Dynamics [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.08656) [[Project]](https:\u002F\u002Fkebii.github.io\u002FMikuDance\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FKebii\u002FMikuDance)\n    - ***MimicMotion:*** High Quality Human Image Animation using Regional Supervision and Motion Blur Condition [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.19680) [[Code]](https:\u002F\u002Fgithub.com\u002Ftencent\u002FMimicMotion) [[Project]](https:\u002F\u002Ftencent.github.io\u002FMimicMotion\u002F)\n    - ***VividPose:*** Advancing Stable Video Diffusion for Realistic Human Image Animation 
[[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.18156) [[Project]](https:\u002F\u002Fkelu007.github.io\u002Fvivid-pose\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FKelu007\u002FVividPose)\n    - ***MIMO:*** Controllable Character Video Synthesis with Spatial Decomposed Modeling [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.16160) [[Project]](https:\u002F\u002Fmenyifang.github.io\u002Fprojects\u002FMIMO\u002Findex.html) [[Code]](https:\u002F\u002Fgithub.com\u002Fmenyifang\u002FMIMO)\n    - ***DynamiCtrl:*** Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.21246) [[Project]](https:\u002F\u002Fgulucaptain.github.io\u002FDynamiCtrl\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fgulucaptain\u002FDynamiCtrl)\n    - ***HumanDiT:*** Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.04847) [[Project]](https:\u002F\u002Fagnjason.github.io\u002FHumanDiT-page\u002F)\n    - Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.16393) [[Project]](https:\u002F\u002Fliujl09.github.io\u002Fhumanvideo_movingbackground\u002F)\n    - ***DreamDance:*** Animating Human Images by Enriching 3D Geometry Cues from 2D Poses [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.00397) [[Project]](https:\u002F\u002Fliujl09.github.io\u002Fhumanvideo_movingbackground\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FDreamDance)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n# Datasets\n- [arXiv 2012] ***UCF101:*** A Dataset of 101 Human Actions Classes From Videos in The Wild [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1212.0402.pdf) [[Dataset]](https:\u002F\u002Fwww.crcv.ucf.edu\u002Fdata\u002FUCF101.php)\n- [arXiv 2017] ***DAVIS:*** The 2017 DAVIS Challenge on Video Object Segmentation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1704.00675.pdf) [[Dataset]](https:\u002F\u002Fdavischallenge.org\u002F)\n- [ICCV 2019] ***FaceForensics++:*** Learning to Detect Manipulated Facial Images [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_ICCV_2019\u002Fpapers\u002FRossler_FaceForensics_Learning_to_Detect_Manipulated_Facial_Images_ICCV_2019_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fondyari\u002FFaceForensics)\n- [NeurIPS 2019] ***TaiChi-HD:*** First Order Motion Model for Image Animation [[Paper]](https:\u002F\u002Fpapers.nips.cc\u002Fpaper_files\u002Fpaper\u002F2019\u002Ffile\u002F31c0b36aef265d9221af80872ceb62f9-Paper.pdf) [[Dataset]](https:\u002F\u002Fgithub.com\u002FAliaksandrSiarohin\u002Ffirst-order-model)\n- [ECCV 2020] ***SkyTimeLapse:*** DTVNet: Dynamic Time-lapse Video Generation via Single Still Image [[Paper]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2020\u002Fpapers_ECCV\u002Fpapers\u002F123500290.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fzhangzjn\u002FDTVNet?tab=readme-ov-file)\n- [ICCV 2021] ***WebVid-10M:*** Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2104.00650.pdf) [[Dataset]](https:\u002F\u002Fmaxbain.com\u002Fwebvid-dataset\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002Fm-bain\u002Fwebvid) 
[[Project]](https:\u002F\u002Fwww.robots.ox.ac.uk\u002F~vgg\u002Fresearch\u002Ffrozen-in-time\u002F)\n- [ECCV 2022] ***ROS:*** Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.02393.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fmetadriverse\u002FACO) [[Dataset]](https:\u002F\u002Fmycuhk-my.sharepoint.com\u002Fpersonal\u002F1155165194_link_cuhk_edu_hk\u002F_layouts\u002F15\u002Fonedrive.aspx?id=%2Fpersonal%2F1155165194%5Flink%5Fcuhk%5Fedu%5Fhk%2FDocuments%2Fytb%5Fdriving%5Fvideos&ga=1)\n- [arXiv 2023] ***HD-VG-130M:*** VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.10874.pdf) [[Dataset]](https:\u002F\u002Fgithub.com\u002Fdaooshee\u002FHD-VG-130M)\n- [NeurIPS 2023] ***FETV:*** A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation [[Paper]](https:\u002F\u002Fpapers.nips.cc\u002Fpaper_files\u002Fpaper\u002F2023\u002Ffile\u002Fc481049f7410f38e788f67c171c64ad5-Paper-Datasets_and_Benchmarks.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002Fllyx97\u002FFETV)\n- [ICLR 2024] ***InternVid:*** A Large-scale Video-Text Dataset for Multimodal Understanding and Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.06942) [[Dataset]](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FInternVideo\u002Ftree\u002Fmain\u002FData\u002FInternVid)\n- [CVPR 2024] ***Panda-70M:*** Captioning 70M Videos with Multiple Cross-Modality Teachers [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.19479.pdf) [[Dataset]](https:\u002F\u002Fgithub.com\u002Fsnap-research\u002FPanda-70M) [[Project]](https:\u002F\u002Fsnap-research.github.io\u002FPanda-70M)\n- [arXiv 2024] ***VidProM:*** A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.06098.pdf) [[Dataset]](https:\u002F\u002Fgithub.com\u002FWangWenhao0716\u002FVidProM)\n- [CVPR 2025] ***HOIGen-1M:*** A Large-scale Dataset for Human-Object Interaction Video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.23715) [[Dataset]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FHOIGen\u002FHOIGen-1M)\n- [CVPR 2025] ***VEU-Bench:*** Towards Comprehensive Understanding of Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.17828)\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# Evaluation Metrics\n- [CVPR 2025] ***T2V-CompBench:*** A Comprehensive Benchmark for Compositional Text-to-video Generation [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.14505) [[Project]](https:\u002F\u002Fgithub.com\u002FKaiyueSun98\u002FT2V-CompBench\u002Ftree\u002FV2) [[Code]](https:\u002F\u002Fgithub.com\u002FKaiyueSun98\u002FT2V-CompBench\u002Ftree\u002FV2)\n- [arXiv 2024] ***DAVIS-Edit:*** Stablizing Shape Consistency in Video-to-Video Editing [[Paper]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.11045) 
[[Project]](https:\u002F\u002Falonzoleeeooo.github.io\u002FStableV2V\u002F) [[Code]](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002FStableV2V)\n- [CVPR 2024] ***VBench:*** Comprehensive Benchmark Suite for Video Generative Models [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FHuang_VBench_Comprehensive_Benchmark_Suite_for_Video_Generative_Models_CVPR_2024_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FVBench)\n- [ICCV 2023] ***DOVER:*** Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives [[Paper]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FWu_Exploring_Video_Quality_Assessment_on_User_Generated_Contents_from_Aesthetic_ICCV_2023_paper.pdf) [[Code]](https:\u002F\u002Fgithub.com\u002FVQAssessment\u002FDOVER)\n- [ICLR 2019] ***FVD:*** A New Metric for Video Generation [[Paper]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=rylgEULtdN) [[Code]](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Fblob\u002Fmaster\u002Ffrechet_video_distance\u002Ffrechet_video_distance.py)\n\n\u003C!-- omit in toc -->\n# Q&A\n- **Q: What is the conference ordering of this paper list?**\n  - This paper list is organized according to the following sequence:\n    - CVPR\n    - ICCV\n    - ECCV\n    - NeurIPS\n    - ICLR\n    - AAAI\n    - ACM MM\n    - SIGGRAPH\n    - arXiv\n    - Others\n- **Q: What does `Others` refer to?**\n  - Some studies (e.g., `Sora`) do not publish their technical reports on arXiv. Instead, they tend to publish blog posts on their official websites. The `Others` category refers to such studies.\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# References\n\nThe `reference.bib` file summarizes BibTeX references of up-to-date video generation papers, widely used datasets, and toolkits.\nBased on the original references, I have made the following modifications so that they render nicely in `LaTeX` manuscripts:\n- References are normally constructed in the form of `author-etal-year-nickname`. In particular, references of datasets and toolkits are named directly after the `nickname`, e.g., `imagenet`.\n- In each reference, all names of conferences\u002Fjournals are converted into abbreviations, e.g., `Computer Vision and Pattern Recognition -> CVPR`.\n- The `url`, `doi`, `publisher`, `organization`, `editor`, and `series` fields are removed from all references.\n- The `pages` field of each reference is added if it is missing.\n- All paper titles are in title case. Additionally, each title is wrapped in an extra pair of `{}` so that the title case is preserved in templates that would otherwise lowercase it. 
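\n\nFor illustration, here is a hypothetical entry following the conventions above (the key, authors, and page numbers are made up for this example, not copied from `reference.bib`):\n\n```bibtex\n@inproceedings{li-etal-2024-nickname,\n  title={{A Hypothetical Paper Title in Title Case}},\n  author={Li, San and Wang, Wu},\n  booktitle={CVPR},\n  pages={1--10},\n  year={2024}\n}\n```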
\n\nIf you need the references in another format, you may retrieve the original entries by searching the paper titles in [DBLP](https:\u002F\u002Fdblp.org\u002F) or [Google Scholar](https:\u002F\u002Fscholar.google.com\u002F).\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# Star History\n\n\u003Cp align="center">\n    \u003Ca href="https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-video-generation_readme_156478355718.png" target="_blank">\n        \u003Cimg width="500" src="https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-video-generation_readme_156478355718.png" alt="Star History Chart">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# WeChat Group\n\n\u003Cdiv align="center">\n  \u003Cimg src="https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-video-generation_readme_bebe6963d95b.png" alt="group">\n\u003C\u002Fdiv>\n\n[\u003Cu>\u003Csmall>\u003C🎯Back to Top>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n","\u003Cp align="center">\n  \u003Ch1 align="center">视频生成研究合集\u003C\u002Fh1>\n\n本 GitHub 仓库汇总了与视频生成任务相关的论文和资源。\n\n如果您对本仓库有任何建议，欢迎随时[新建议题](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002Fawesome-video-generation\u002Fissues\u002Fnew)或提交[拉取请求](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002Fawesome-video-generation\u002Fpulls)。\n\n本 GitHub 仓库的最新动态如下。\n\n🔥 [2025年12月11日] 我们题为《StableV2V：稳定视频到视频编辑中的形状一致性》的论文已被 TCSVT 2025 接收！\n\n🔥 [11月19日] 我们发布了最新论文《StableV2V：稳定视频到视频编辑中的形状一致性》（[arXiv 链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.11045)），并同步开源了对应的[代码](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002FStableV2V)、[模型权重](https:\u002F\u002Fhuggingface.co\u002FAlonzoLeeeooo\u002FStableV2V)以及用于测试的基准数据集[`DAVIS-Edit`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAlonzoLeeeooo\u002FDAVIS-Edit)。欢迎通过链接查看！\n\u003Cdetails> \u003Csummary> 点击查看更多信息。 \u003C\u002Fsummary>\n\n- [2025年5月13日] 更新了一个名为[人物图像动画](#human-image-animation)的新子任务。所有**CVPR 2025**论文及参考文献均已更新。\n- [6月17日] 所有**NeurIPS 2023**论文及参考文献均已更新。\n- [4月26日] 新增一个方向：[个性化视频生成](#personalized-video-generation)。\n- [3月28日] 官方**AAAI 2024**论文列表已发布！相应地更新了官方 PDF 版本及 BibTeX 参考文献。\n\u003C\u002Fdetails>\n\n\u003C!-- omit in toc -->\n# \u003Cspan id="contents">目录\u003C\u002Fspan>\n- [待办事项](#to-do-lists)\n- [产品](#products)\n- [论文](#papers)\n  - [综述论文](#survey-papers)\n  - [文本到视频生成](#text-to-video-generation)\n    - [2026年](#text-year-2026)\n    - [2025年](#text-year-2025)\n    - [2024年](#text-year-2024)\n    - [2023年](#text-year-2023)\n    - [2022年](#text-year-2022)\n    - [2021年](#text-year-2021)\n  - [图像到视频生成](#image-to-video-generation)\n    - [2025年](#image-year-2025)\n    - [2024年](#image-year-2024)\n    - [2023年](#image-year-2023)\n    - [2022年](#image-year-2022)\n  - [个性化视频生成](#personalized-video-generation)\n    - [2025年](#personalized-year-2025)\n    - [2024年](#personalized-year-2024)\n    - [2023年](#personalized-year-2023)\n  - [视频编辑](#video-editing)\n    - [2025年](#editing-year-2025)\n    - [2024年](#editing-year-2024)\n    - [2023年](#editing-year-2023)\n    - [2022年](#editing-year-2022)\n  - [音频到视频生成](#audio-to-video-generation)\n    - [2024年](#audio-year-2024)\n    - [2023年](#audio-year-2023)\n  - [人物图像动画](#human-image-animation)\n    - [2026年](#human-year-2026)\n    - [2025年](#human-year-2025)\n    - [2024年](#human-year-2024)\n- [数据集](#datasets)\n- [问答](#qa)\n- [参考文献](#references)\n- 
[星标历史](#star-history)\n- [微信群](#wechat-group)\n\n\u003C!-- omit in toc -->\n# 待办事项\n- 最新论文\n  - [ ] 更新 NeurIPS 2025 论文\n  - [ ] 更新 ICCV 2025 论文\n  - [x] 更新 CVPR 2025 论文\n  - [x] 更新 ICLR 2025 论文\n  - [x] 更新 NeurIPS 2024 论文\n  - [x] 更新 ECCV 2024 论文\n  - [x] 更新 CVPR 2024 论文\n    - [x] 更新 ⚠️ 论文的 PDF 和参考文献\n    - [ ] 更新参考文献的正式版本\n  - [x] 更新 AAAI 2024 论文\n    - [x] 更新 ⚠️ 论文的 PDF 和参考文献\n    - [x] 更新参考文献的正式版本\n  - [x] 更新 ICLR 2024 论文\n  - [x] 更新 NeurIPS 2023 论文\n- 已发表论文\n  - [x] 更新之前的 CVPR 论文\n  - [x] 更新之前的 ICCV 论文\n  - [x] 更新之前的 ECCV 论文\n  - [x] 更新之前的 NeurIPS 论文\n  - [x] 更新之前的 ICLR 论文\n  - [x] 更新之前的 AAAI 论文\n  - [x] 更新之前的 ACM MM 论文\n- 定期维护预印本 arXiv 论文及遗漏论文\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# 产品\n\n|名称|机构|年份|研究论文|官网|特色|\n|-|-|-|-|-|-|\n|Sora|OpenAI|2024|[链接](https:\u002F\u002Fopenai.com\u002Fresearch\u002Fvideo-generation-models-as-world-simulators)|[链接](https:\u002F\u002Fopenai.com\u002Fsora)|-|\n|Lumiere|Google|2024|[链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.12945)|[链接](https:\u002F\u002Flumiere-video.github.io\u002F)|-|\n|VideoPoet|Google|2023|-|[链接](https:\u002F\u002Fsites.research.google\u002Fvideopoet\u002F)|-|\n|W.A.L.T|Google|2023|[链接](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.06662.pdf)|[链接](https:\u002F\u002Fwalt-video-diffusion.github.io\u002F)|-|\n|Gen-2|Runway|2023|-|[链接](https:\u002F\u002Fresearch.runwayml.com\u002Fgen2)|-|\n|Gen-1|Runway|2023|-|[链接](https:\u002F\u002Fresearch.runwayml.com\u002Fgen1)|-|\n|Animate Anyone|Alibaba|2023|[链接](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.17117.pdf)|[链接](https:\u002F\u002Fhumanaigc.github.io\u002Fanimate-anyone\u002F)|-|\n|Outfit Anyone|Alibaba|2023|-|[链接](https:\u002F\u002Foutfitanyone.app\u002F)|-|\n|Stable Video|StabilityAI|2023|[链接](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.15127.pdf)|[链接](https:\u002F\u002Fwww.stablevideo.com\u002F)|-|\n|Pixeling|HiDream.ai|2023|-|[链接](https:\u002F\u002Fhidreamai.com\u002F#\u002F)|-|\n|DomoAI|DomoAI|2023|-|[链接](https:\u002F\u002Fdomoai.app\u002F)|-|\n|Emu|Meta|2023|[链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.10709)|[链接](https:\u002F\u002Femu-video.metademolab.com\u002F)|-|\n|Genmo|Genmo|2023|-|[链接](https:\u002F\u002Fwww.genmo.ai\u002F)|-|\n|NeverEnds|NeverEnds|2023|-|[链接](https:\u002F\u002Fneverends.life\u002F)|-|\n|Moonvalley|Moonvalley|2023|-|[链接](https:\u002F\u002Fmoonvalley.ai\u002F)|-|\n|Morph Studio|Morph|2023|-|[链接](https:\u002F\u002Fwww.morphstudio.com\u002F)|-|\n|Pika|Pika|2023|-|[链接](https:\u002F\u002Fpika.art\u002F)|-|\n|PixelDance|ByteDance|2023|[链接](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.10982)|[链接](https:\u002F\u002Fmakepixelsdance.github.io\u002F)|-|\n\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# 论文\n\n\u003C!-- omit in toc -->\n## 综述论文\n- \u003Cspan id="survey-year-2024">**2024年**\u003C\u002Fspan>\n  - **arXiv**\n    - 视频扩散模型：综述 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.03150.pdf)\n- \u003Cspan id="survey-year-2023">**2023年**\u003C\u002Fspan>\n  - **arXiv**\n    - 视频扩散模型综述 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10647.pdf)\n\n\u003C!-- omit in toc -->\n\n## 文本到视频生成\n- \u003Cspan id="text-year-2026">**2026年**\u003C\u002Fspan>\n  - **AAAI**\n    - 具有双重并行性的分钟级视频 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.21070) [[项目]](https:\u002F\u002Fdualparal-project.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FDualParal-Project\u002FDualParal)\n- \u003Cspan id="text-year-2025">**2025年**\u003C\u002Fspan>\n 
 - **CVPR**\n    - ***AIGV-Assessor:*** 使用多模态大模型对文本到视频生成的感知质量进行基准测试与评估 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.17221) [[代码]](https:\u002F\u002Fgithub.com\u002Fwangjiarui153\u002FAIGV-Assessor)\n    - ***RAPO:*** 魔鬼藏在提示词里：用于文本到视频生成的检索增强提示优化 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.11739) [[项目]](https:\u002F\u002Fwhynothaha.github.io\u002FPrompt_optimizer\u002FRAPO.html) [[代码]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FRAPO)\n    - ***ByTheWay:*** 在无需训练的情况下提升文本到视频生成模型的质量 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.06241) [[代码]](https:\u002F\u002Fgithub.com\u002FBujiazi\u002FByTheWay)\n    - ***ConsisID:*** 基于频率分解的身份保持型文本到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.17440) [[代码]](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FConsisID) [[项目]](https:\u002F\u002Fpku-yuangroup.github.io\u002FConsisID\u002F)\n    - ***EIDT-V:*** 利用扩散轨迹中的交集实现模型无关、零样本、无需训练的文本到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.06861) [[代码]](https:\u002F\u002Fgithub.com\u002Fdjagpal02\u002FEIDT-V) [[项目]](https:\u002F\u002Fdjagpal02.github.io\u002FEIDT-V\u002F)\n    - ***TransPixeler:*** 以透明度推动文本到视频生成技术的发展 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.03006) [[项目]](https:\u002F\u002Fwileewang.github.io\u002FTransPixar\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fwileewang\u002FTransPixeler)\n    - ***PhyT2V:*** 基于物理约束的文本到视频生成中由大语言模型引导的迭代自我精炼 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.00596) [[代码]](https:\u002F\u002Fgithub.com\u002Fpittisl\u002FPhyT2V)\n    - ***InstanceCap:*** 通过实例感知的结构化描述改进文本到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.09283) [[代码]](https:\u002F\u002Fgithub.com\u002FNJU-PCALab\u002FInstanceCap)\n    - ***BlobGEN-Vid:*** 基于块状视频表示的组合式文本到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.07647) [[项目]](https:\u002F\u002Fblobgen-vid2.github.io\u002F)\n    - ***LinGen:*** 向具有线性计算复杂度的高分辨率分钟级文本到视频生成迈进 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.09856) [[项目]](https:\u002F\u002Flineargen.github.io\u002F)\n    - ⚠️ 高质量视频合成的图文模型封装组合\n  - **ICCV**\n    - 通过连续域中的下一组预测实现统一视频生成 [[论文]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F72)\n  - **NeurIPS**\n    - ***Stable Cinemetrics:*** 面向专业视频生成的结构化分类与评估 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2509.26555) [[项目]](https:\u002F\u002Fstable-cinemetrics.github.io\u002F)\n  - **ICLR**\n    - ***OpenVid-1M:*** 大规模高质量文本到视频生成数据集 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=j7kdXSrISM) [[项目]](https:\u002F\u002Fnju-pcalab.github.io\u002Fprojects\u002Fopenvid\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FNJU-PCALab\u002FOpenVid-1M) [[数据集]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnkp37\u002FOpenVid-1M)\n    - ***CogVideoX:*** 具有专家Transformer的文本到视频扩散模型 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=LQzN6TRFg9)\n    - 用于高效视频生成建模的金字塔流匹配 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=66NzcRQuOq) [[项目]](https:\u002F\u002Fpyramid-flow.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fjy0205\u002FPyramid-Flow)\n  - **arXiv**\n    - 稳定视频无限：通过错误回收实现无限长度视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.09212) [[项目]](https:\u002F\u002Fstable-video-infinity.github.io\u002Fhomepage\u002F) 
[[代码]](https:\u002F\u002Fgithub.com\u002Fvita-epfl\u002FStable-Video-Infinity?tab=readme-ov-file) [[视频（YouTube）]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=p71Wp1FuqTw) [[视频（Bilibili）]](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Zd4UzpE6G\u002F?pop_share=1)\n    - ***FEAT:*** 全维度高效注意力Transformer用于医疗视频生成 [[论文]](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2506.04956) [[代码]](https:\u002F\u002Fgithub.com\u002FYaziwel\u002FFEAT)\n- \u003Cspan id="text-year-2024">**2024年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***Vlogger:*** 把你的梦想变成vlog [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.09414.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FVlogger)\n    - ***Make Pixels Dance:*** 高动态视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.10982.pdf) [[项目]](https:\u002F\u002Fmakepixelsdance.github.io\u002F) [[演示]](https:\u002F\u002Fmakepixelsdance.github.io\u002Fdemo.html)\n    - ***VGen:*** 用于文本到视频生成的分层时空解耦 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.04483) [[代码]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FVGen) [[项目]](https:\u002F\u002Fhigen-t2v.github.io\u002F)\n    - ***GenTron:*** 深入探索用于图像和视频生成的扩散Transformer [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.04557) [[项目]](https:\u002F\u002Fwww.shoufachen.com\u002Fgentron_website\u002F)\n    - ***SimDA:*** 用于高效视频生成的简单扩散适配器 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.09710.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FChenHsing\u002FSimDA) [[项目]](https:\u002F\u002Fchenhsing.github.io\u002FSimDA\u002F)\n    - ***MicroCinema:*** 一种用于文本到视频生成的分治方法 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.18829) [[项目]](https:\u002F\u002Fwangyanhui666.github.io\u002FMicroCinema.github.io\u002F) [[视频]](https:\u002F\u002Fyoutube.com\u002Fshorts\u002FH7O-Ku_lqPA)\n    - ***Generative Rendering:*** 基于2D扩散模型的可控4D引导视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.01409) [[项目]](https:\u002F\u002Fprimecai.github.io\u002Fgenerative_rendering\u002F)\n    - ***PEEKABOO:*** 基于掩码扩散的交互式视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.07509) [[代码]](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPeekaboo) [[项目]](https:\u002F\u002Fjinga-lala.github.io\u002Fprojects\u002FPeekaboo\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fanshuln\u002Fpeekaboo-demo)\n    - ***EvalCrafter:*** 大型视频生成模型的基准测试与评估 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.11440) [[代码]](https:\u002F\u002Fgithub.com\u002FEvalCrafter\u002FEvalCrafter) [[项目]](https:\u002F\u002Fevalcrafter.github.io\u002F)\n    - 利用无文本视频扩展文本到视频生成的配方 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.15770) [[代码]](https:\u002F\u002Fgithub.com\u002Fdamo-vilab\u002Fi2vgen-xl) [[项目]](https:\u002F\u002Ftf-t2v.github.io\u002F)\n    - ***BIVDiff:*** 一种无需训练的框架，通过连接图像和视频扩散模型实现通用视频合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.02813) [[项目]](https:\u002F\u002Fbivdiff.github.io\u002F)\n    - ***Snap Video:*** 用于文本到视频合成的规模化时空Transformer [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.14797) [[项目]](https:\u002F\u002Fsnap-research.github.io\u002Fsnapvideo\u002Fvideo_ldm.html)\n    - ***Animate Anyone:*** 一致且可控的图像到视频合成，用于角色动画 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.17117.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FHumanAIGC\u002FAnimateAnyone) [[项目]](https:\u002F\u002Fhumanaigc.github.io\u002Fanimate-anyone\u002F)\n    - ***MotionDirector:*** 文本到视频扩散模型的运动自定义 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.08465) [[代码]](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FMotionDirector)\n    - 用于高分辨率视频生成的分层补丁扩散模型 
[[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FSkorokhodov_Hierarchical_Patch_Diffusion_Models_for_High-Resolution_Video_Generation_CVPR_2024_paper.pdf) [[项目]](https:\u002F\u002Fsnap-research.github.io\u002Fhpdm\u002F)\n    - ***DiffPerformer:*** 扩散式人体视频生成中一致潜在引导的迭代学习 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWang_DiffPerformer_Iterative_Learning_of_Consistent_Latent_Guidance_for_Diffusion-based_Human_CVPR_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Faipixel\u002F)\n    - 用于文本到视频生成的网格扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.00234.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Ftaegyeong-lee\u002FGrid-Diffusion-Models-for-Text-to-Video-Generation) [[视频]](https:\u002F\u002Ftaegyeong-lee.github.io\u002Ftext2video)\n  - **ECCV**\n    - ***Emu Video:*** 通过显式图像条件化分解文本到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.10709.pdf) [[项目]](https:\u002F\u002Fai.meta.com\u002Fblog\u002Femu-text-to-video-generation-image-editing-research\u002F)\n    - ***W.A.L.T.:*** 基于扩散模型的真实感视频生成 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F10270.pdf) [[项目]](https:\u002F\u002Fwalt-video-diffusion.github.io\u002F)\n    - ***MoVideo:*** 基于扩散模型的运动感知视频生成 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F06030.pdf)\n    - ***DrivingDiffusion:*** 基于潜在扩散模型的布局引导多视角驾驶场景视频生成 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F10097.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fshalfun\u002FDrivingDiffusion) [[项目]](https:\u002F\u002Fdrivingdiffusion.github.io\u002F)\n    - ***MagDiff:*** 多重对齐扩散用于高保真视频生成和编辑 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F02738.pdf)\n    - ***HARIVO:*** 利用文本到图像模型进行视频生成 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F06938.pdf) [[项目]](https:\u002F\u002Fkwonminki.github.io\u002FHARIVO\u002F)\n    - ***MEVG:*** 使用文本到视频模型生成多事件视频 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F06012.pdf) [[项目]](https:\u002F\u002Fkuai-lab.github.io\u002Feccv2024mevg\u002F)\n  - **NeurIPS**\n    - 通过分解编码和条件化提升文本到视频生成中的运动表现 [[论文]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2024\u002Ffile\u002F81f19c0e9f3e06c831630ab6662fd8ea-Paper-Conference.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FPR-Ryan\u002FDEMO)\n  - **ICML**\n    - ***Video-LaVIT:*** 统一的视频-语言预训练，采用解耦的视觉-运动标记法 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=S9lk6dk4LL) [[项目]](https:\u002F\u002Fvideo-lavit.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fjy0205\u002FLaVIT)\n  - **ICLR**\n    - ***VDT:*** 基于掩码建模的通用视频扩散Transformer [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13311.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FRERV\u002FVDT) [[项目]](https:\u002F\u002Fvdt-2023.github.io\u002F)\n    - ***VersVideo:*** 利用增强的时序扩散模型实现多功能视频生成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=K9sVJ17zvB)\n  - **AAAI**\n    - ***Follow-Your-Pose:*** 使用无姿态视频进行姿势引导的文本到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.01186) [[代码]](https:\u002F\u002Fgithub.com\u002Fmayuelala\u002FFollowYourPose) [[项目]](https:\u002F\u002Ffollow-your-pose.github.io\u002F)\n    - ***E2HQV:*** 基于事件相机的高质量视频生成，采用理论启发的模型辅助深度学习 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.08117)\n    - ***ConditionVideo:*** 
无需训练的条件引导文本到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.07697) [[代码]](https:\u002F\u002Fgithub.com\u002Fpengbo807\u002FConditionVideo) [[项目]](https:\u002F\u002Fpengbo807.github.io\u002Fconditionvideo-website\u002F)\n    - ***F3-Pruning:*** 一种无需训练且通用的修剪策略，旨在实现更快更精细的文本到视频合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.03459)\n  - **arXiv**\n    - ***Lumiere:*** 一种时空扩散模型用于视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.12945.pdf) [[项目]](https:\u002F\u002Flumiere-video.github.io\u002F)\n    - ***Boximator:*** 为视频合成生成丰富且可控的运动 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.01566.pdf) [[项目]](https:\u002F\u002Fboximator.github.io\u002F) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=reto_TYsYyQ)\n    - 带有环形注意力的大规模视频和语言世界模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.08268) [[代码]](https:\u002F\u002Fgithub.com\u002FLargeWorldModel\u002FLWM) [[项目]](https:\u002F\u002Flargeworldmodel.github.io\u002F)\n    - ***Direct-a-Video:*** 根据用户指令的摄像机移动和物体运动定制视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.03162.pdf) [[项目]](https:\u002F\u002Fdirect-a-video.github.io\u002F)\n    - ***WorldDreamer:*** 通过预测掩码标记迈向通用视频生成世界模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.09985) [[代码]](https:\u002F\u002Fgithub.com\u002FJeffWang987\u002FWorldDreamer) [[项目]](https:\u002F\u002Fworld-dreamer.github.io\u002F)\n    - ***MagicVideo-V2:*** 多阶段高审美价值的视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.04468.pdf) [[项目]](https:\u002F\u002Fmagicvideov2.github.io\u002F)\n    - ***Latte:*** 用于视频生成的潜在扩散Transformer [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.03048) [[代码]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FLatte) [[项目]](https:\u002F\u002Fmaxin-cn.github.io\u002Flatte_project)\n    - ***Mora:*** 通过多智能体框架实现通用视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.13248) [[代码]](https:\u002F\u002Fgithub.com\u002Flichao-sun\u002FMora)\n    - ***StreamingT2V:*** 从文本持续生成连贯、动态且可扩展的长视频 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.14773) [[代码]](https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FStreamingT2V) [[项目]](https:\u002F\u002Fstreamingt2v.github.io\u002F) [[视频]](https:\u002F\u002Ftwitter.com\u002Fi\u002Fstatus\u002F1770909673463390414)\n    - ***VIDiff:*** 基于多模态指令通过扩散模型翻译视频 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.18837)\n    - ***StoryDiffusion:*** 用于长距离图像和视频生成的一致性自注意力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.01434) [[代码]](https:\u002F\u002Fgithub.com\u002FHVision-NKU\u002FStoryDiffusion) [[项目]](https:\u002F\u002Fstorydiffusion.github.io\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FYupengZhou\u002FStoryDiffusion)\n    - ***Ctrl-Adapter:*** 一个高效且多功能的框架，用于将各种控制适配到任何扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.09967) [[代码]](https:\u002F\u002Fgithub.com\u002FHL-hanlin\u002FCtrl-Adapter) [[项目]](https:\u002F\u002Fctrl-adapter.github.io\u002F)\n    - ***ControlNeXt:*** 强大而高效的图像和视频生成控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.06070) [[代码]](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FControlNeXt) [[项目]](https:\u002F\u002Fpbihao.github.io\u002Fprojects\u002Fcontrolnext\u002Findex.html)\n    - ***FancyVideo:*** 通过跨帧文本指导迈向动态且一致的视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.08189v1) [[项目]](https:\u002F\u002Ffancyvideo.github.io\u002F)\n    - ***Factorized-Dreamer:*** 用有限且低质量的数据训练高质量视频生成器 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.10119v1) [[代码]](https:\u002F\u002Fgithub.com\u002Fyangxy\u002FFactorized-Dreamer\u002F)\n    - 
精细的零样本视频采样 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.21475) [[项目]](https:\u002F\u002Fdensechen.github.io\u002Fzss\u002F)\n    - ***ReconX:*** 用视频扩散模型从稀疏视图重建任何场景 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.16767) [[代码]](https:\u002F\u002Fgithub.com\u002Fliuff19\u002FReconX) [[项目]](https:\u002F\u002Fliuff19.github.io\u002FReconX\u002F) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=UuL2nP5rJcI)\n    - ***ConFiner:*** 无需训练的链式扩散模型专家网络长视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.13423) [[代码]](https:\u002F\u002Fgithub.com\u002FConfiner2025\u002FConfiner2025)\n    - ***3DTrajMaster:*** 掌握视频生成中多实体运动的3D轨迹 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.07759) [[代码]](https:\u002F\u002Fgithub.com\u002FKwaiVGI\u002F3DTrajMaster) [[项目]](https:\u002F\u002Ffuxiao0719.github.io\u002Fprojects\u002F3dtrajmaster\u002F)\n    - ***DiTCtrl:*** 探索多模态扩散Transformer中的注意力控制，以实现无需调优的多提示长视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.18597) [[代码]](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FDiTCtrl) [[项目]](https:\u002F\u002Fonevfall.github.io\u002Fproject_page\u002Fditctrl\u002F)\n    - ***EasyAnimate:*** 一种基于Transformer架构的高性能长视频生成方法 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.18991.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Faigc-apps\u002FEasyAnimate)\n  - **其他**\n    - ***Sora:*** 视频生成模型作为世界模拟器 [[论文]](https:\u002F\u002Fopenai.com\u002Fresearch\u002Fvideo-generation-models-as-world-simulators)\n- \u003Cspan id="text-year-2023">**2023年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***Align your Latents:*** 使用潜在扩散模型进行高分辨率视频合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.08818.pdf) [[项目]](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Ftoronto-ai\u002FVideoLDM\u002F) [[复现代码]](https:\u002F\u002Fgithub.com\u002Fsrpkdyy\u002FVideoLDM)\n    - 投影潜在空间中的视频概率扩散模型 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FYu_Video_Probabilistic_Diffusion_Models_in_Projected_Latent_Space_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fsihyun-yu\u002FPVDM)\n  - **ICCV**\n    - ***Text2Video-Zero:*** 文本到图像扩散模型是零样本视频生成器 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FKhachatryan_Text2Video-Zero_Text-to-Image_Diffusion_Models_are_Zero-Shot_Video_Generators_ICCV_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FText2Video-Zero) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FPAIR\u002FText2Video-Zero) [[项目]](https:\u002F\u002Ftext2video-zero.github.io\u002F)\n    - 保留你自己的相关性：视频扩散模型的噪声先验 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FGe_Preserve_Your_Own_Correlation_A_Noise_Prior_for_Video_Diffusion_ICCV_2023_paper.pdf) [[项目]](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fdir\u002Fpyoco\u002F)\n    - ***Gen-1:*** 基于结构和内容引导的视频合成，使用扩散模型 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FEsser_Structure_and_Content-Guided_Video_Synthesis_with_Diffusion_Models_ICCV_2023_paper.pdf) [[项目]](https:\u002F\u002Fresearch.runwayml.com\u002Fgen1)\n  - **NeurIPS**\n    - 视频扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.03458.pdf) [[项目]](https:\u002F\u002Fvideo-diffusion.github.io\u002F)\n    - 通过文本引导的视频生成学习通用策略 [[论文]](https:\u002F\u002Fpapers.nips.cc\u002Fpaper_files\u002Fpaper\u002F2023\u002Ffile\u002F1d5b9233ad716a43be5c0d3023cb82d0-Paper-Conference.pdf) [[项目]](https:\u002F\u002Funiversal-policy.github.io\u002Funipi\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fflow-diffusion\u002FAVDC)\n    - ***VideoComposer:*** 具有运动可控性的组合式视频合成 
[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.02018.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002Fvideocomposer) [[项目]](https:\u002F\u002Fvideocomposer.github.io\u002F)\n  - **ICLR**\n    - ***CogVideo:*** 基于Transformer的大规模文本到视频生成预训练 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=rB6TpjAuSRy) [[代码]](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FCogVideo) [[演示]](https:\u002F\u002Fmodels.aminer.cn\u002Fcogvideo\u002F)\n    - ***Make-A-Video:*** 无需文本视频数据的文本到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.14792.pdf) [[项目]](https:\u002F\u002Fmakeavideo.studio\u002F) [[复现代码]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fmake-a-video-pytorch)\n    - ***Phenaki:*** 来自开放领域文本描述的可变长度视频生成 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf\u002Ffe8e106a2746992c9c2e658bdc8cb9c89cc5a39a.pdf) [[复现代码]](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fphenaki-pytorch)\n  - **arXiv**\n    - ***Control-A-Video:*** 可控的文本到视频生成，使用扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13840.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FWeifeng-Chen\u002Fcontrol-a-video) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fwf-genius\u002FControl-A-Video)\n    - ***ControlVideo:*** 无需训练的可控文本到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13077.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FYBYBZhang\u002FControlVideo)\n    - ***Imagen Video:*** 高清视频生成，使用扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.02303.pdf)\n    - ***Latent-Shift:*** 带有时序偏移的潜在扩散，用于高效文本到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.08477.pdf) [[项目]](https:\u002F\u002Flatent-shift.github.io\u002F)\n    - ***LAVIE:*** 高品质视频生成，采用级联潜在扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.15103.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FLaVie) [[项目]](https:\u002F\u002Fvchitect.github.io\u002FLaVie-project\u002F)\n    - ***Show-1:*** 将像素和潜在扩散模型结合用于文本到视频生成 [[论文]](https:\u002F\u002Fshowlab.github.io\u002FShow-1\u002Fassets\u002FShow-1.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FShow-1) [[项目]](https:\u002F\u002Fshowlab.github.io\u002FShow-1\u002F)\n    - ***Stable Video Diffusion:*** 将潜在视频扩散模型扩展到大型数据集 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.15127.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fgenerative-models) [[项目]](https:\u002F\u002Fstability.ai\u002Fnews\u002Fstable-video-diffusion-open-ai-video-model)\n    - ***VideoFactory:*** 在时空扩散中交换注意力以实现文本到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.10874.pdf) [[数据集]](https:\u002F\u002Fgithub.com\u002Fdaooshee\u002FHD-VG-130M)\n    - ***VideoGen:*** 一种基于参考的潜在扩散方法，用于高清文本到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.00398.pdf) [[代码]](https:\u002F\u002Fvideogen.github.io\u002FVideoGen\u002F)\n    - ***InstructVideo:*** 利用人类反馈指导视频扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.12490.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002Fi2vgen-xl\u002Fblob\u002Fmain\u002Fdoc\u002FInstructVideo.md)\n    - ***SEINE:*** 用于生成转换和预测的短至长视频扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.20700.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FSEINE) [[项目]](https:\u002F\u002Fvchitect.github.io\u002FSEINE-project\u002F)\n    - ***VideoLCM:*** 视频潜在一致性模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.09109.pdf)\n    - ModelScope文本到视频技术报告 
[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.06571.pdf) [[复现代码]](https:\u002F\u002Fgithub.com\u002FExponentialML\u002FText-To-Video-Finetuning)\n    - ***LAMP:*** 学习少量样本视频生成的运动模式 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10769) [[项目]](https:\u002F\u002Frq-wu.github.io\u002Fprojects\u002FLAMP)\n    - ***STG:*** 时空跳过引导，用于增强视频扩散采样 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.18664) [[代码]](https:\u002F\u002Fgithub.com\u002Fjunhahyung\u002FSTGuidance) [[项目]](https:\u002F\u002Fjunhahyung.github.io\u002FSTGuidance\u002F)\n    - ***Motion-Zero:*** 用于扩散式视频生成的零样本移动对象控制框架 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.10150) [[项目]](https:\u002F\u002Flitaoguo.github.io\u002FMotionZero.github.io\u002F)\n    - ***NOVA:*** 无需向量量化的自回归视频生成 [[论文]](https:\u002F\u002Fbitterdhg.github.io\u002FNOVA_page\u002Fpaper\u002F2412.14169v1.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fbaaivision\u002FNOVA) [[项目]](https:\u002F\u002Fbitterdhg.github.io\u002FNOVA_page\u002F)\n- \u003Cspan id=\"text-year-2022\">**2022年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***给我看什么，告诉我怎么做:*** 基于多模态条件化的视频合成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FHan_Show_Me_What_and_Tell_Me_How_Video_Synthesis_via_CVPR_2022_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fsnap-research\u002FMMVID) [[数据集]](https:\u002F\u002Fgithub.com\u002Fsnap-research\u002FMMVID\u002Fblob\u002Fmain\u002Fmm_vox_celeb\u002FREADME.md)\n- \u003Cspan id=\"text-year-2021\">**2021年**\u003C\u002Fspan>\n  - **arXiv**\n    - ***VideoGPT:*** 使用VQ-VAE和Transformer进行视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2104.10157.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fwilson1yan\u002FVideoGPT) [[项目]](https:\u002F\u002Fwilson1yan.github.io\u002Fvideogpt\u002Findex.html)\n    - ***MagicVideo:*** 使用潜在扩散模型进行高效视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.11018)\n    - ***EasyAnimate:*** 一种基于Transformer架构的高性能长视频生成方法 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.18991.pdf)\n\n\u003C!-- omit in toc -->\n\n\n## 图像转视频生成\n- \u003Cspan id=\"image-year-2025\">**2025年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***MotionStone:*** 基于扩散Transformer的解耦运动强度调制用于图像转视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.05848)\n    - ***MotionPro:*** 一种用于图像转视频生成的精确运动控制器 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.00948) [[代码]](https:\u002F\u002Fgithub.com\u002FHiDream-ai\u002FMotionPro)\n    - ***Through-The-Mask:*** 基于掩码的运动轨迹用于图像转视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.03059)\n    - 图像转视频生成模型的外推与解耦：运动建模比你想象的更容易 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.00948)\n  - **ICCV**\n    - ***AnyI2V:*** 带有运动控制的任意条件图像动画化 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.02857) [[项目]](https:\u002F\u002Fhenghuiding.com\u002FAnyI2V\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FFudanCVL\u002FAnyI2V)\n    - ***Versatile Transition Generation:*** 基于图像转视频扩散的多功能过渡生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.01698)\n    - ***TIP-I2V:*** 用于图像转视频生成的百万规模真实文本和图像提示数据集 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.04709) [[项目]](https:\u002F\u002Ftip-i2v.github.io) [[代码]](https:\u002F\u002Fgithub.com\u002FWangWenhao0716\u002FTIP-I2V)\n  - **ICLR**\n    - ***SG-I2V:*** 图像转视频生成中的自引导轨迹控制 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=uQjySppU9x) [[项目]](https:\u002F\u002Fkmcode1.github.io\u002FProjects\u002FSG-I2V\u002F) 
[[代码]](https:\u002F\u002Fgithub.com\u002FKmcode1\u002FSG-I2V)\n    - 生成式中间帧：将图像转视频模型适配用于关键帧插值 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=ykD8a9gJvy)\n    - 金字塔流匹配用于高效的视频生成建模 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=66NzcRQuOq) [[项目]](https:\u002F\u002Fpyramid-flow.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fjy0205\u002FPyramid-Flow)\n- \u003Cspan id=\"image-year-2024\">**2024年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***VideoBooth:*** 基于扩散的图像提示视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.00777) [[代码]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FVideoBooth) [[项目]](https:\u002F\u002Fvchitect.github.io\u002FVideoBooth-project\u002F) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=10DxH1JETzI)\n  - **ECCV**\n    - 重新思考图像转视频适应：以对象为中心的视角 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06871v1)\n    - ***PhysGen:*** 基于刚体物理的图像转视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.18964) [[代码]](https:\u002F\u002Fgithub.com\u002Fstevenlsw\u002Fphysgen) [[项目]](https:\u002F\u002Fstevenlsw.github.io\u002Fphysgen\u002F)\n    - ***MOFA-Video:*** 通过冻结图像转视频扩散模型中的生成式运动场适配实现可控图像动画 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.20222) [[代码]](https:\u002F\u002Fgithub.com\u002FMyNiuuu\u002FMOFA-Video) [[项目]](https:\u002F\u002Fmyniuuu.github.io\u002FMOFA_Video\u002F)\n  - **NeurIPS**\n    - 识别并解决图像转视频扩散模型中的条件图像泄露 [[论文]](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2024\u002Ffile\u002F35cb54b887e7aafe74829677cce6c5c6-Paper-Conference.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002Fcond-image-leakage)\n  - **AAAI**\n    - 为条件图像转视频生成解耦内容与运动 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.14294)\n  - **ICML**\n    - ***Video-LaVIT:*** 基于解耦视觉-运动标记的统一视频-语言预训练 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=S9lk6dk4LL) [[项目]](https:\u002F\u002Fvideo-lavit.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fjy0205\u002FLaVIT)\n  - **arXiv**\n    - ***ConsistI2V:*** 提升图像转视频生成的视觉一致性 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.04324.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FConsistI2V) [[项目]](https:\u002F\u002Ftiger-ai-lab.github.io\u002FConsistI2V\u002F)\n    - ***I2V-Adapter:*** 一种适用于扩散模型的通用图像转视频适配器 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.16693.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FI2V-Adapter\u002FI2V-Adapter-repo)\n    - ***Follow-Your-Click:*** 基于简短提示的开放域区域图像动画 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.08268) [[代码]](https:\u002F\u002Fgithub.com\u002Fmayuelala\u002FFollowYourClick) [[项目]](https:\u002F\u002Ffollow-your-click.github.io\u002F)\n    - ***AtomoVideo:*** 高保真图像转视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.01800.pdf) [[项目]](https:\u002F\u002Fatomo-video.github.io\u002F) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fembed\u002F36JIlk-U-vQ)\n    - ***Pix2Gif:*** 基于运动引导的扩散用于GIF生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04634) [[项目]](https:\u002F\u002Fhiteshk03.github.io\u002FPix2Gif\u002F)\n    - ***ID-Animator:*** 零样本身份保持的人类视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.15275.pdf) [[项目]](https:\u002F\u002Fid-animator.github.io\u002F)\n    - 无调优的噪声校正用于高保真图像转视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.02827) [[项目]](https:\u002F\u002Fnoise-rectification.github.io\u002F)\n    - ***MegActor-Σ:*** 利用扩散Transformer解锁肖像动画中的灵活混合模态控制 
[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.14975) [[代码]](https:\u002F\u002Fgithub.com\u002Fmegvii-research\u002Fmegactor)\n    - ***LeviTor:*** 基于3D轨迹导向的图像转视频合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.15214) [[代码]](https:\u002F\u002Fgithub.com\u002Fqiuyu96\u002FLeviTor) [[项目]](https:\u002F\u002Fppetrichor.github.io\u002Flevitor.github.io\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fhlwang06\u002FLeviTor\u002Ftree\u002Fmain)\n\n- \u003Cspan id=\"image-year-2023\">**2023年**\u003C\u002Fspan>\n  - **CVPR**\n    - 基于潜流扩散模型的条件图像到视频生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FNi_Conditional_Image-to-Video_Generation_With_Latent_Flow_Diffusion_Models_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fnihaomiao\u002FCVPR23_LFDM)\n  - **arXiv**\n    - ***I2VGen-XL:*** 通过级联扩散模型实现高质量图像到视频合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.04145.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002Fi2vgen-xl) [[项目]](https:\u002F\u002Fi2vgen-xl.github.io\u002F)\n    - ***DreamVideo:*** 具有图像保留和文本指导的高保真图像到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.03018) [[代码]](https:\u002F\u002Fgithub.com\u002Fanonymous0769\u002FDreamVideo) [[项目]](https:\u002F\u002Fanonymous0769.github.io\u002FDreamVideo\u002F)\n    - ***DynamiCrafter:*** 利用视频扩散先验为开放域图像添加动画 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.12190) [[项目]](https:\u002F\u002Fdoubiiu.github.io\u002Fprojects\u002FDynamiCrafter\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FDoubiiu\u002FDynamiCrafter) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=0NfmIsNAg-g) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FDoubiiu\u002FDynamiCrafter)\n    - ***AnimateDiff:*** 无需特定微调即可为您的个性化文生图扩散模型添加动画 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=Fx2SbBgcte) [[项目]](https:\u002F\u002Fanimatediff.github.io\u002F)\n- \u003Cspan id=\"image-year-2022\">**2022年**\u003C\u002Fspan>\n  - **CVPR**    \n    - ***Make It Move:*** 基于文本描述的可控图像到视频生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FHu_Make_It_Move_Controllable_Image-to-Video_Generation_With_Text_Descriptions_CVPR_2022_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FYouncy-Hu\u002FMAGE)\n- \u003Cspan id=\"image-year-2021\">**2021年**\u003C\u002Fspan>\n  - **ICCV**\n    - ***Click to Move:*** 使用稀疏运动控制视频生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fpapers\u002FArdino_Click_To_Move_Controlling_Video_Generation_With_Sparse_Motion_ICCV_2021_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FPierfrancescoArdino\u002FC2M)\n\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n\n\n## 音频到视频生成\n- \u003Cspan id=\"audio-year-2024\">**2024年**\u003C\u002Fspan>  \n  - **AAAI**\n    - 通过文生视频模型适配实现多样化且对齐的音频到视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.16429) [[代码]](https:\u002F\u002Fgithub.com\u002Fguyyariv\u002FTempoTokens)\n- \u003Cspan id=\"audio-year-2023\">**2023年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***MM-Diffusion:*** 学习用于联合音频和视频生成的多模态扩散模型 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FRuan_MM-Diffusion_Learning_Multi-Modal_Diffusion_Models_for_Joint_Audio_and_Video_CVPR_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fresearchmm\u002FMM-Diffusion)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n## 个性化视频生成\n- \u003Cspan 
id=\"personalized-year-2024\">**2024年**\u003C\u002Fspan>\n  - **CVPR**\n    - 高保真的人像主体到图像合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.10329) [[代码]](https:\u002F\u002Fgithub.com\u002FCodeGoat24\u002FFace-diffuser)\n  - **ICCV**\n    - - ***Magic Mirror:*** 魔法镜：在视频扩散变换器中实现身份保留的视频生成 [[论文]](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2501.03931) [[项目]](https:\u002F\u002Fjulianjuaner.github.io\u002Fprojects\u002FMagicMirror\u002Findex.html) [[代码]](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FMagicMirror)\n    - ***PersonalVideo:*** PersonalVideo：在不降低动态性和语义性的情况下进行高身份保真度的视频定制 [[论文]](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2411.17048) [[项目]](https:\u002F\u002Fpersonalvideo.github.io) [[代码]](https:\u002F\u002Fgithub.com\u002FEchoPluto\u002FPersonalVideo)\n    - ***MagicID:*** MagicID：用于保持身份一致性和动态性的混合偏好优化视频定制 [[论文]](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2503.12689) [[项目]](https:\u002F\u002Fechopluto.github.io\u002FMagicID-project) [[代码]](https:\u002F\u002Fgithub.com\u002FEchoPluto\u002FMagicID)\n    - ***DreamRelation:*** 梦境关系：以关系为中心的视频定制 [[论文]](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2503.07602) [[项目]](https:\u002F\u002Fdreamrelation.github.io)\n    - ⚠️ ***PERSONA:*** PERSONA：基于单张图像，通过姿态驱动变形生成的个性化全身3D虚拟形象\n  - **ECCV**\n    - ***PoseCrafter:*** 一次性实现灵活姿态控制的个性化视频合成 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F06050.pdf) [[项目]](https:\u002F\u002Fml-gsai.github.io\u002FPoseCrafter-demo\u002F)\n  - **arXiv**\n    - ***Magic-Me:*** 基于身份特异性的视频定制扩散 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.09368) [[代码]](https:\u002F\u002Fgithub.com\u002FZhen-Dong\u002FMagic-Me) [[项目]](https:\u002F\u002Fmagic-me-webpage.github.io\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FvisionMaze\u002FMagic-Me)\n    - ***ReVideo:*** 带有运动和内容控制的视频重制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.13865) [[代码]](https:\u002F\u002Fgithub.com\u002FMC-E\u002FReVideo) [[项目]](https:\u002F\u002Fmc-e.github.io\u002Fproject\u002FReVideo\u002F)\n    - ***ConceptMaster:*** 在扩散变换器模型上进行多概念视频定制，无需测试时微调 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.04698) [[项目]](https:\u002F\u002Fyuzhou914.github.io\u002FConceptMaster\u002F)\n- \u003Cspan id=\"personalized-year-2023\">**2023年**\u003C\u002Fspan>\n  - **arXiv**\n    - ***FastComposer:*** 无需微调的局部注意力多主体图像生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.10431) [[代码]](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Ffastcomposer?tab=readme-ov-file) [[演示]](https:\u002F\u002Ffastcomposer.hanlab.ai\u002F)\n    - ***Make-Your-Video:*** 利用文本和结构引导进行定制化视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.00943) [[项目]](https:\u002F\u002Fdoubiiu.github.io\u002Fprojects\u002FMake-Your-Video\u002F)\n    - ***DreamVideo-2:*** 零样本主体驱动视频定制，具备精确的运动控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.13830) [[项目]](https:\u002F\u002Fdreamvideo2.github.io\u002F)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n\n## 视频编辑\n- \u003Cspan id=\"editing-year-2025\">**2025年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***VideoDirector:*** 基于文本到视频模型的精准视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.17592) [[代码]](https:\u002F\u002Fgithub.com\u002FYukun66\u002FVideo_Director)\n    - ***VideoMage:*** 文本到视频扩散模型的多主体与运动定制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.21781) [[项目]](https:\u002F\u002Fjasper0314-huang.github.io\u002Fvideomage-customization\u002F)\n    - 无需反演的一次性可控视频编辑中的视觉提示 
[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.14335) [[项目]](https:\u002F\u002Fwww.zhengbozhang.com\u002F)\n    - ***SketchVideo:*** 基于草图的视频生成与编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.23284) [[代码]](https:\u002F\u002Fgithub.com\u002FIGLICT\u002FSketchVideo) [[项目]](http:\u002F\u002Fgeometrylearning.com\u002FSketchVideo\u002F)\n    - ***h-Edit:*** 通过 Doob h 变换实现高效灵活的基于扩散的编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.02187) [[代码]](https:\u002F\u002Fgithub.com\u002Fnktoan\u002Fh-edit) [[项目]](https:\u002F\u002Fnktoan.github.io\u002Fh-Edit-cvpr25\u002F)\n    - ***ObjectMover:*** 基于视频先验的生成式物体运动 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.08037) [[项目]](https:\u002F\u002Fxinyu-andy.github.io\u002FObjMover\u002F)\n    - ***MatAnyone:*** 具有一致内存传播的稳定视频抠像 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.14677) [[代码]](https:\u002F\u002Fgithub.com\u002Fpq-yang\u002FMatAnyone) [[项目]](https:\u002F\u002Fpq-yang.github.io\u002Fprojects\u002FMatAnyone\u002F)\n    - ***StyleMaster:*** 借助艺术化生成与转换实现视频风格化 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.07744) [[代码]](https:\u002F\u002Fgithub.com\u002FKwaiVGI\u002FStyleMaster) [[项目]](https:\u002F\u002Fzixuan-ye.github.io\u002Fstylemaster\u002F)\n    - ***AudCast:*** 基于级联扩散Transformer的音频驱动人体视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.19824) [[项目]](https:\u002F\u002Fguanjz20.github.io\u002Fprojects\u002FAudCast\u002F)\n    - ⚠️ ***FADE:*** 面向视频编辑的频率感知扩散模型因子分解 [[代码]](https:\u002F\u002Fgithub.com\u002FEternalEvan\u002FFADE)\n    - ⚠️ Align-A-Video: 用于一致视频编辑的图像扩散模型确定性奖励调优\n    - ⚠️ 多样中的统一：通过梯度-潜在净化进行视频编辑\n  - **ICCV**\n    - ***VACE:*** 多合一视频创作与编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.07598) [[项目]](https:\u002F\u002Fali-vilab.github.io\u002FVACE-Page\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FVACE)\n    - ***Reangle-A-Video:*** 作为视频到视频翻译的4D视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.09151) [[项目]](https:\u002F\u002Fanony1anony2.github.io\u002F)\n    - ***DIVE:*** 利用DINO驾驭主体驱动的视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.03347) [[项目]](https:\u002F\u002Fdino-video-editing.github.io\u002F)\n    - ***DynamicFace:*** 使用可组合3D面部先验实现高质量且一致的图像与视频人脸交换 [[论文]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F1368) [[项目]](https:\u002F\u002Fdynamic-face.github.io\u002F)\n    - ***QK-Edit:*** 重新审视MM-DiT中基于注意力的注入，用于图像和视频编辑 [[论文]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F215)\n    - ***Teleportraits:*** 无需训练即可将人物插入任何场景 [[论文]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F809)\n  - **ICLR**\n    - ***VideoGrain:*** 调制时空注意力以实现多粒度视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.17258) [[项目]](https:\u002F\u002Fknightyxp.github.io\u002FVideoGrain_project_page\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fknightyxp\u002FVideoGrain)\n- \u003Cspan id=\"editing-year-2024\">**2024年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***VMC:*** 基于文本到视频扩散模型的时间注意力适配进行视频运动定制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.00845) [[代码]](https:\u002F\u002Fgithub.com\u002FHyeonHo99\u002FVideo-Motion-Customization) [[项目]](https:\u002F\u002Fvideo-motion-customization.github.io\u002F)\n    - ***Fairy:*** 快速并行化的指令引导视频到视频合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.13834) [[项目]](https:\u002F\u002Ffairy-video2video.github.io\u002F)\n    - ***CCEdit:*** 基于扩散模型的创意且可控视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.16496) 
[[代码]](https:\u002F\u002Fgithub.com\u002FRuoyuFeng\u002FCCEdit) [[项目]](https:\u002F\u002Fruoyufeng.github.io\u002FCCEdit.github.io\u002F) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=UQw4jq-igN4)\n    - ***DynVideo-E:*** 利用动态NeRF进行大规模运动与视角变化下以人为中心的视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10624) [[项目]](https:\u002F\u002Fshowlab.github.io\u002FDynVideo-E\u002F) [[视频]](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=xiRH4Q6B3Yk)\n    - ***Video-P2P:*** 带有交叉注意力控制的视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.04761) [[代码]](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FVideo-P2P) [[项目]](https:\u002F\u002Fvideo-p2p.github.io\u002F)\n    - 一段视频胜过256个基向量：用于零样本视频编辑的空间-时间期望最大化反演 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.05856) [[代码]](https:\u002F\u002Fgithub.com\u002FSTEM-Inv\u002Fstem-inv) [[项目]](https:\u002F\u002Fstem-inv.github.io\u002Fpage\u002F)\n    - ***MaskINT:*** 基于插值非自回归掩码Transformer的视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.12468) [[项目]](https:\u002F\u002Fmaskint.github.io\u002F)\n    - ***VidToMe:*** 用于零样本视频编辑的视频标记合并 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.10656) [[代码]](https:\u002F\u002Fgithub.com\u002Flixirui142\u002FVidToMe) [[项目]](https:\u002F\u002Fvidtome-diffusion.github.io\u002F) [[视频]](https:\u002F\u002Fyoutu.be\u002FcZPtwcRepNY)\n    - 通过多模态大型语言模型实现语言驱动的视频修复 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.10226.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fjianzongwu\u002FLanguage-Driven-Video-Inpainting) [[项目]](https:\u002F\u002Fjianzongwu.github.io\u002Fprojects\u002Frovi\u002F) [[数据集]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fjianzongwu\u002Frovi)\n    - ***AVID:*** 基于扩散模型的任意长度视频修复 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.03816.pdf) [[项目]](https:\u002F\u002Fzhang-zx.github.io\u002FAVID\u002F)\n    - ***CAMEL:*** 专为提升文本驱动视频编辑而设计的因果运动增强 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhang_CAMEL_CAusal_Motion_Enhancement_Tailored_for_Lifting_Text-driven_Video_Editing_CVPR_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fzhangguiwei610\u002FCAMEL)\n    - 用于零样本文本驱动运动迁移的时空扩散特征 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYatim_Space-Time_Diffusion_Features_for_Zero-Shot_Text-Driven_Motion_Transfer_CVPR_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fdiffusion-motion-transfer\u002Fdiffusion-motion-transfer) [[项目]](https:\u002F\u002Fdiffusion-motion-transfer.github.io\u002F)\n    - ***FRESCO:*** 用于零样本视频翻译的空间-时间对应 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_FRESCO_Spatial-Temporal_Correspondence_for_Zero-Shot_Video_Translation_CVPR_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fwilliamyang1991\u002FFRESCO) [[项目]](https:\u002F\u002Fwww.mmlab-ntu.com\u002Fproject\u002Ffresco\u002F)\n    - ***MotionEditor:*** 基于内容感知扩散的视频运动编辑 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FTu_MotionEditor_Editing_Video_Motion_via_Content-Aware_Diffusion_CVPR_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FFrancis-Rings\u002FMotionEditor) [[项目]](https:\u002F\u002Ffrancis-rings.github.io\u002FMotionEditor\u002F)\n  - **ECCV**\n    - ***DragVideo:*** 交互式的拖拽式视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.02216)\n    - 基于因子化扩散蒸馏的视频编辑 
[[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.09334)\n    - ***OCD:*** 以对象为中心的扩散用于高效视频编辑 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F07396.pdf) [[项目]](https:\u002F\u002Fqualcomm-ai-research.github.io\u002Fobject-centric-diffusion\u002F)\n    - ***DreamMotion:*** 用于零样本视频编辑的时空自相似性评分蒸馏 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.12002) [[项目]](https:\u002F\u002Fhyeonho99.github.io\u002Fdreammotion\u002F)\n    - ***WAVE:*** 用于零样本文本到视频编辑的扭曲DDIM反演特征 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F09682.pdf) [[项目]](https:\u002F\u002Free1s.github.io\u002Fwave\u002F)\n    - ***DeCo:*** 带有运动一致性的解耦人文中心扩散视频编辑 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F06071.pdf)\n    - ***SAVE:*** 主角多样化与结构无关的视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.02503) [[代码]](https:\u002F\u002Fgithub.com\u002Fldynx\u002FSAVE)\n    - ***Videoshop:*** 基于噪声外推扩散反演的局部语义视频编辑 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2024\u002Fpapers_ECCV\u002Fpapers\u002F01890.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fsfanxiang\u002Fvideoshop) [[项目]](https:\u002F\u002Fvideoshop-editing.github.io\u002F)\n  - **ICLR**\n    - ***Ground-A-Video:*** 使用文本到图像扩散模型进行零样本接地视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.01107) [[代码]](https:\u002F\u002Fgithub.com\u002FGround-A-Video\u002FGround-A-Video) [[项目]](https:\u002F\u002Fground-a-video.github.io\u002F)\n    - ***TokenFlow:*** 用于一致视频编辑的一致扩散特征 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.10373) [[代码]](https:\u002F\u002Fgithub.com\u002Fomerbt\u002FTokenFlow) [[项目]](https:\u002F\u002Fdiffusion-tokenflow.github.io\u002F)\n    - 使用合成数据集进行一致的视频到视频转移 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=IoKRezZMxF) [[代码]](https:\u002F\u002Fgithub.com\u002Famazon-science\u002Finstruct-video-to-video\u002Ftree\u002Fmain)\n    - ***FLATTEN:*** 基于光流引导的注意力，用于一致的文本到视频编辑 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=JgqftqZQZ7) [[代码]](https:\u002F\u002Fgithub.com\u002Fyrcong\u002Fflatten) [[项目]](https:\u002F\u002Fflatten-video-editing.github.io\u002F)\n  - **SIGGRAPH**\n    - ***MotionCtrl:*** 用于视频生成的统一且灵活的运动控制器 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.03641.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FTencentARC\u002FMotionCtrl) [[项目]](https:\u002F\u002Fwzhouxiff.github.io\u002Fprojects\u002FMotionCtrl\u002F) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FTencentARC\u002FMotionCtrl)\n  - **arXiv**\n    - 基于扩散模型的视频运动迁移中的光谱运动对齐 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.15249) [[代码]](https:\u002F\u002Fgithub.com\u002Fgeonyeong-park\u002FSpectral-Motion-Alignment) [[项目]](https:\u002F\u002Fgeonyeong-park.github.io\u002Fspectral-motion-alignment\u002F)\n    - ***UniEdit:*** 一个统一且无需调优的视频运动和外观编辑框架 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.13185) [[代码]](https:\u002F\u002Fgithub.com\u002FJianhongBai\u002FUniEdit) [[项目]](https:\u002F\u002Fjianhongbai.github.io\u002FUniEdit\u002F)\n    - ***DragAnything:*** 使用实体表示进行任何事物的运动控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.07420.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FDragAnything) [[项目]](https:\u002F\u002Fweijiawu.github.io\u002Fdraganything_page\u002F)\n    - ***AnyV2V:*** 一个即插即用的框架，适用于任何视频到视频编辑任务 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.14468) [[代码]](https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FAnyV2V) 
[[项目]](https:\u002F\u002Ftiger-ai-lab.github.io\u002FAnyV2V\u002F)\n    - ***CoCoCo:*** 改善文本引导的视频修复，以提高一致性、可控性和兼容性 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.12035) [[代码]](https:\u002F\u002Fgithub.com\u002Fzibojia\u002FCOCOCO) [[项目]](https:\u002F\u002Fcococozibojia.github.io\u002F)\n    - ***VASE:*** 真实视频中以对象为中心的外观和形状操控 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.02473)\n    - ***StableV2V:*** 在视频到视频编辑中稳定形状一致性 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.11045) [[代码]](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002FStableV2V) [[项目]](https:\u002F\u002Falonzoleeeooo.github.io\u002FStableV2V) [[数据集]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAlonzoLeeeooo\u002FDAVIS-Edit)\n    - 用于视频定制的运动反演 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.20193) [[代码]](https:\u002F\u002Fgithub.com\u002FEnVision-Research\u002FMotionInversion) [[演示]](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fziyangmai\u002FMotionInversion)\n    - ***VideoAnydoor:*** 高保真视频对象插入，具有精确的运动控制 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.01427) [[项目]](https:\u002F\u002Fvideoanydoor.github.io\u002F)\n- \u003Cspan id=\"editing-year-2023\">**2023年**\u003C\u002Fspan>\n  - **CVPR**\n    - 形状感知的文本驱动分层视频编辑 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2022\u002Fpapers_ECCV\u002Fpapers\u002F136750705.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Ftext-video-edit\u002Fshape-aware-text-driven-layered-video-editing-release) [[项目]](https:\u002F\u002Ftext-video-edit.github.io\u002F)\n  - **ICCV**\n    - ***StableVideo*** 基于分层表示和图像扩散的视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.09592) [[代码]](https:\u002F\u002Fgithub.com\u002Frese1f\u002FStableVideo)\n    - ***Pix2Video:*** 基于图像扩散的视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.12688) [[代码]](https:\u002F\u002Fgithub.com\u002Fduyguceylan\u002Fpix2video)\n    - ***Tune-A-Video:*** 一次性的图像扩散模型调优，用于文本到视频生成 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FWu_Tune-A-Video_One-Shot_Tuning_of_Image_Diffusion_Models_for_Text-to-Video_Generation_ICCV_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FTune-A-Video) [[项目]](https:\u002F\u002Ftuneavideo.github.io\u002F)\n  - **NeurIPS**\n    - 朝着使用文本到图像扩散模型的一致视频编辑方向前进 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=RNVwm4BzXO)\n  - **SIGGRAPH**\n    - ***Rerender A Video:*** 零样本文本引导的视频到视频翻译 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.07954) [[代码]](https:\u002F\u002Fgithub.com\u002Fwilliamyang1991\u002FRerender_A_Video) [[项目]](https:\u002F\u002Fwww.mmlab-ntu.com\u002Fproject\u002Frerender\u002F)\n  - **arXiv**\n    - ***Style-A-Video:*** 敏捷扩散用于任意基于文本的视频风格转换 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.05464.pdf)\n    - ***SAVE:*** 光谱偏移感知的图像扩散模型适应，用于文本引导的视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.18670) [[代码]](https:\u002F\u002Fgithub.com\u002Fnazmul-karim170\u002FSAVE-Text2Video-Diffusion) [[项目]](https:\u002F\u002Fsave-textguidedvideoediting.github.io\u002F)\n    - ***MagicProp:*** 基于扩散的视频编辑，通过运动感知的外观传播 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.00908)\n- \u003Cspan id=\"editing-year-2022\">**2022年**\u003C\u002Fspan>\n  - **ECCV**\n    - ***Text2LIVE:*** 文本驱动的分层图像和视频编辑 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.02491) [[代码]](https:\u002F\u002Fgithub.com\u002Fomerbt\u002FText2LIVE) [[项目]](https:\u002F\u002Ftext2live.github.io\u002F)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc 
-->\n\n\n# 人物图像动画\n- \u003Cspan id=\"human-year-2026\">**2026年**\u003C\u002Fspan>\n  - **arXiv**\n    - ***Hand2World:*** 基于自由空间手势的自回归视角交互生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.09600) [[项目]](https:\u002F\u002Fhand2world.github.io\u002F)\n- \u003Cspan id=\"human-year-2025\">**2025年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***X-Dyna:*** 富有表现力的动态人物图像动画 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.10021) [[代码]](https:\u002F\u002Fgithub.com\u002Fbytedance\u002FX-Dyna)\n    - ***StableAnimator:*** 高质量、保持身份一致的人物图像动画 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.17697) [[项目]](https:\u002F\u002Ffrancis-rings.github.io\u002FStableAnimator\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FFrancis-Rings\u002FStableAnimator)\n  - **ICCV**\n    - ***DreamActor-M1:*** 具有混合引导的整体性、表现力强且鲁棒的人物图像动画 [[论文]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F288) [[项目]](https:\u002F\u002Fgrisoon.github.io\u002FDreamActor-M1\u002F)\n    - ***Animate Anyone 2:*** 具备环境可供性（affordance）的高保真角色图像动画 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.06145) [[项目]](https:\u002F\u002Fhumanaigc.github.io\u002Fanimate-anyone-2\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FHumanAIGC\u002Fanimate-anyone-2)\n    - 基于结构化视频扩散的多身份人物图像动画 [[论文]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F638)\n    - ***OmniHuman-1:*** 重新思考单阶段条件驱动的人像动画模型的扩展 [[论文]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F2201) [[项目]](https:\u002F\u002Fomnihuman-lab.github.io\u002F)\n    - ***AdaHuman:*** 可动画化的精细3D人体生成，采用组合式多视角扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.24877) [[项目]](https:\u002F\u002Fnvlabs.github.io\u002FAdaHuman\u002F)\n    - ***Ponimator:*** 展开交互姿态以实现多样化的人-人交互动画 [[论文]](https:\u002F\u002Ficcv.thecvf.com\u002Fvirtual\u002F2025\u002Fposter\u002F1453)\n  - **ICLR**\n    - ***Animate-X:*** 具有增强运动表示的通用角色图像动画 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.10306) [[项目]](https:\u002F\u002Flucaria-academy.github.io\u002FAnimate-X\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fanimate-x)\n  - **arXiv**\n    - ***EgoControl:*** 通过3D全身姿态实现可控的第一人称视角视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.18173) [[项目]](https:\u002F\u002Fcvg-bonn.github.io\u002FEgoControl\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FCVG-Bonn\u002FEgoControl)\n    - ***UniAnimate-DiT:*** 大规模视频扩散Transformer驱动的人像动画 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.11289) [[项目]](https:\u002F\u002Funianimate.github.io\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fali-vilab\u002FUniAnimate-DiT)\n- \u003Cspan id=\"human-year-2024\">**2024年**\u003C\u002Fspan>\n  - **CVPR**\n    - ***MotionFollower:*** 通过轻量级分数引导扩散编辑视频运动 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.20325) [[项目]](https:\u002F\u002Ffrancis-rings.github.io\u002FMotionFollower\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FFrancis-Rings\u002FMotionFollower)\n    - ***MotionEditor:*** 基于内容感知扩散编辑视频运动 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FTu_MotionEditor_Editing_Video_Motion_via_Content-Aware_Diffusion_CVPR_2024_paper.pdf) 
[[项目]](https:\u002F\u002Ffrancis-rings.github.io\u002FMotionEditor\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FFrancis-Rings\u002FMotionEditor)\n  - **ICLR**\n    - ***DisPose:*** 解耦姿势引导以实现可控的人像动画 [[论文]](https:\u002F\u002Fopenreview.net\u002Fforum?id=AumOa10MKG) [[项目]](https:\u002F\u002Flihxxx.github.io\u002FDisPose\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Flihxxx\u002FDisPose)\n  - **arXiv**\n    - ***MikuDance:*** 使用混合运动动力学为角色艺术制作动画 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.08656) [[项目]](https:\u002F\u002Fkebii.github.io\u002FMikuDance\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FKebii\u002FMikuDance)\n    - ***MimicMotion:*** 利用区域监督和运动模糊条件实现高质量的人像动画 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.19680) [[代码]](https:\u002F\u002Fgithub.com\u002Ftencent\u002FMimicMotion) [[项目]](https:\u002F\u002Ftencent.github.io\u002FMimicMotion\u002F)\n    - ***VividPose:*** 推进稳定视频扩散技术，用于逼真的人像动画 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.18156) [[项目]](https:\u002F\u002Fkelu007.github.io\u002Fvivid-pose\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FKelu007\u002FVividPose)\n    - ***MIMO:*** 基于空间分解建模的可控角色视频合成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.16160) [[项目]](https:\u002F\u002Fmenyifang.github.io\u002Fprojects\u002FMIMO\u002Findex.html) [[代码]](https:\u002F\u002Fgithub.com\u002Fmenyifang\u002FMIMO)\n    - ***DynamiCtrl:*** 重新思考高质量人像动画的基本结构及文本的作用 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.21246) [[项目]](https:\u002F\u002Fgulucaptain.github.io\u002FDynamiCtrl\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fgulucaptain\u002FDynamiCtrl)\n    - ***HumanDiT:*** 基于姿势引导的扩散Transformer，用于长时序人体运动视频生成 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.04847) [[项目]](https:\u002F\u002Fagnjason.github.io\u002FHumanDiT-page\u002F)\n    - 解耦前景与背景运动以提升人像视频生成的真实感 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.16393) [[项目]](https:\u002F\u002Fliujl09.github.io\u002Fhumanvideo_movingbackground\u002F)\n    - ***DreamDance:*** 通过丰富从2D姿态中提取的3D几何线索来动画化人像 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.00397) [[代码]](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FDreamDance)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\n\u003C!-- omit in toc -->\n\n# 数据集\n- [arXiv 2012] ***UCF101:*** 一个包含101个动作类别的野外视频数据集 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1212.0402.pdf) [[数据集]](https:\u002F\u002Fwww.crcv.ucf.edu\u002Fdata\u002FUCF101.php)\n- [arXiv 2017] ***DAVIS:*** 2017年视频目标分割挑战赛 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1704.00675.pdf) [[数据集]](https:\u002F\u002Fdavischallenge.org\u002F)\n- [ICCV 2019] ***FaceForensics++:*** 学习检测被篡改的人脸图像 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_ICCV_2019\u002Fpapers\u002FRossler_FaceForensics_Learning_to_Detect_Manipulated_Facial_Images_ICCV_2019_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fondyari\u002FFaceForensics)\n- [NeurIPS 2019] ***TaiChi-HD:*** 基于一阶运动模型的图像动画 [[论文]](https:\u002F\u002Fpapers.nips.cc\u002Fpaper_files\u002Fpaper\u002F2019\u002Ffile\u002F31c0b36aef265d9221af80872ceb62f9-Paper.pdf) [[数据集]](https:\u002F\u002Fgithub.com\u002FAliaksandrSiarohin\u002Ffirst-order-model)\n- [ECCV 2020] ***SkyTimeLapse:*** DTVNet：通过单张静态图像生成动态延时视频 [[论文]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2020\u002Fpapers_ECCV\u002Fpapers\u002F123500290.pdf) 
[[代码]](https:\u002F\u002Fgithub.com\u002Fzhangzjn\u002FDTVNet?tab=readme-ov-file)\n- [ICCV 2021] ***WebVid-10M:*** 冻结时间：用于端到端检索的联合视频和图像编码器 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fpapers\u002FBain_Frozen_in_Time_A_Joint_Video_and_Image_Encoder_for_ICCV_2021_paper.pdf) [[数据集]](https:\u002F\u002Fmaxbain.com\u002Fwebvid-dataset\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002Fm-bain\u002Fwebvid) [[项目]](https:\u002F\u002Fwww.robots.ox.ac.uk\u002F~vgg\u002Fresearch\u002Ffrozen-in-time\u002F)\n- [ECCV 2022] ***ROS:*** 通过观看YouTube视频学习驾驶：基于动作条件的对比策略预训练 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.02393.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fmetadriverse\u002FACO) [[数据集]](https:\u002F\u002Fmycuhk-my.sharepoint.com\u002Fpersonal\u002F1155165194_link_cuhk_edu_hk\u002F_layouts\u002F15\u002Fonedrive.aspx?id=%2Fpersonal%2F1155165194%5Flink%5Fcuhk%5Fedu%5Fhk%2FDocuments%2Fytb%5Fdriving%5Fvideos&ga=1)\n- [arXiv 2023] ***HD-VG-130M:*** VideoFactory：用于文本到视频生成的时空扩散中的交换注意力 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.10874.pdf) [[数据集]](https:\u002F\u002Fgithub.com\u002Fdaooshee\u002FHD-VG-130M)\n- [NeurIPS 2023] ***FETV:*** 一个用于细粒度评估开放域文本到视频生成的基准 [[论文]](https:\u002F\u002Fpapers.nips.cc\u002Fpaper_files\u002Fpaper\u002F2023\u002Ffile\u002Fc481049f7410f38e788f67c171c64ad5-Paper-Datasets_and_Benchmarks.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002Fllyx97\u002FFETV)\n- [ICLR 2024] ***InternVid:*** 一个用于多模态理解和生成的大规模视频-文本数据集 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.06942) [[数据集]](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FInternVideo\u002Ftree\u002Fmain\u002FData\u002FInternVid)\n- [CVPR 2024] ***Panda-70M:*** 使用多模态教师为7000万段视频添加字幕 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.19479.pdf) [[数据集]](https:\u002F\u002Fgithub.com\u002Fsnap-research\u002FPanda-70M) [[项目]](https:\u002F\u002Fsnap-research.github.io\u002FPanda-70M)\n- [arXiv 2024] ***VidProM:*** 一个百万级的真实提示图库数据集，用于文本到视频扩散模型 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.06098.pdf) [[数据集]](https:\u002F\u002Fgithub.com\u002FWangWenhao0716\u002FVidProM)\n- [CVPR 2025] ***HOIGen-1M:*** 一个用于人-物体交互视频生成的大规模数据集 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.23715) [[数据集]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FHOIGen\u002FHOIGen-1M)\n- [CVPR 2025] ***VEU-Bench:*** 朝着全面理解视频编辑的方向前进 [[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.17828)\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# 评价指标\n- [CVPR 2025] ***T2V-CompBench:*** 一个用于组合式文本到视频生成的综合基准 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.14505) [[代码]](https:\u002F\u002Fgithub.com\u002FKaiyueSun98\u002FT2V-CompBench\u002Ftree\u002FV2)\n- [arXiv 2024] ***DAVIS-Edit:*** 在视频到视频编辑中稳定形状一致性 [[论文]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.11045) [[项目]](https:\u002F\u002Falonzoleeeooo.github.io\u002FStableV2V\u002F) [[代码]](https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002FStableV2V)\n- [CVPR 2024] ***VBench:*** 视频生成模型的综合基准测试套件 
[[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FHuang_VBench_Comprehensive_Benchmark_Suite_for_Video_Generative_Models_CVPR_2024_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FVchitect\u002FVBench)\n- [ICCV 2023] ***DOVER:*** 从美学和技术角度探索用户生成内容的视频质量评估 [[论文]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FWu_Exploring_Video_Quality_Assessment_on_User_Generated_Contents_from_Aesthetic_ICCV_2023_paper.pdf) [[代码]](https:\u002F\u002Fgithub.com\u002FVQAssessment\u002FDOVER)\n- [ICLR 2019] ***FVD:*** 一种新的视频生成指标 [[论文]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=rylgEULtdN) [[代码]](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Fblob\u002Fmaster\u002Ffrechet_video_distance\u002Ffrechet_video_distance.py)\n\n\u003C!-- omit in toc -->\n# 问答\n- **问：这篇论文列表的会议顺序是什么？**\n  - 这篇论文列表按照以下顺序排列：\n    - CVPR\n    - ICCV\n    - ECCV\n    - NeurIPS\n    - ICLR\n    - AAAI\n    - ACM MM\n    - SIGGRAPH\n    - arXiv\n    - 其他\n- **问：这里的“其他”指的是什么？**\n  - 一些研究（例如“Sora”）并未在arXiv上发表技术报告，而是倾向于在其官方网站上撰写博客文章。“其他”类别指的就是这类研究。\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n\n# 参考文献\n\n`reference.bib` 文件汇总了最新视频生成相关论文、常用数据集和工具库的 BibTeX 格式参考文献。基于原始参考文献，我进行了以下修改，以使它们在 LaTeX 文档中显示得更加美观：\n- 参考文献通常采用 `作者-etal-年份-昵称` 的形式构建。特别地，数据集和工具库的参考文献直接使用 `昵称` 构建，例如 `imagenet`。\n- 在每条参考文献中，所有会议或期刊名称均被转换为缩写，例如 `Computer Vision and Pattern Recognition -> CVPR`。\n- 移除了所有参考文献中的 `url`、`doi`、`publisher`、`organization`、`editor` 和 `series` 字段。\n- 如果缺少页码 (`pages`)，则为其添加页码。\n- 所有论文标题均采用首字母大写格式，并额外添加了一对 `{}`，以确保在某些特定模板中也能正确显示首字母大写。\n\n如果您对参考文献格式有其他需求，可以通过在 [DBLP](https:\u002F\u002Fdblp.org\u002F) 或 [Google Scholar](https:\u002F\u002Fscholar.google.com\u002F) 中搜索论文名称来参考其原始参考文献。\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# 点赞历史\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-video-generation_readme_156478355718.png\" target=\"_blank\">\n        \u003Cimg width=\"500\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-video-generation_readme_156478355718.png\" alt=\"点赞历史图表\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)\n\n\u003C!-- omit in toc -->\n# 微信群\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-video-generation_readme_bebe6963d95b.png\" alt=\"群组\">\n\u003C\u002Fdiv>\n\n[\u003Cu>\u003Csmall>\u003C🎯返回顶部>\u003C\u002Fsmall>\u003C\u002Fu>](#contents)","# awesome-video-generation 快速上手指南\n\n`awesome-video-generation` 是一个汇总视频生成领域论文、开源代码、数据集及相关产品的精选资源库。它本身不是一个单一的可执行软件，而是一个指向各类前沿模型（如 Sora, Stable Video Diffusion, Animate Anyone 等）的导航索引。\n\n本指南将指导你如何利用该资源库找到目标项目，并以其中提到的热门开源项目（如 `StableV2V` 或通用扩散模型）为例，演示标准的安装与运行流程。\n\n## 环境准备\n\n在开始之前，请确保你的开发环境满足以下要求，以支持大多数视频生成模型的运行：\n\n*   **操作系统**: Linux (推荐 Ubuntu 20.04\u002F22.04) 或 macOS。Windows 用户建议使用 WSL2。\n*   **GPU**: 推荐使用 NVIDIA GPU，显存至少 16GB（运行高分辨率或长视频生成建议 24GB+）。\n*   **CUDA**: 版本 11.8 或 12.1+（根据具体模型要求）。\n*   **Python**: 3.9 - 3.11。\n*   **包管理器**: `pip` 或 `conda`。\n*   **Git**: 用于克隆仓库。\n\n**前置依赖检查：**\n```bash\npython --version\nnvidia-smi  # 检查 GPU 驱动及 CUDA 版本\ngit --version\n```\n\n## 安装步骤\n\n由于该仓库是资源列表，你需要先克隆仓库获取最新论文和代码链接，然后选择具体的子项目进行安装。以下以仓库中重点推荐的 **StableV2V** (Video-to-Video Editing) 为例进行演示。
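\n\n在进入具体安装之前，这里先补充一个可选的小技巧：完成下方步骤 1 的克隆后，可以直接用 `grep` 在本地的资源列表中检索目标论文或细分方向，快速拿到对应的代码与项目链接。以下是一个最小示例（假设仓库已按步骤 1 克隆至当前目录，其中的关键词仅作演示，可替换为任意条目名称）：\n\n```bash\n# 假设：已按下方步骤 1 克隆本仓库，目录名为 awesome-video-generation\ncd awesome-video-generation\n\n# 按论文或项目名检索条目（-i 忽略大小写，-n 显示行号），关键词仅为示例\ngrep -in stablev2v README.md\n\n# 按细分方向检索小节标题，快速定位对应板块\ngrep -in image-to-video README.md\n```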
\n\n### 1. 克隆资源仓库\n首先获取最新的资源列表和更新动态：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002Fawesome-video-generation.git\ncd awesome-video-generation\n```\n\n### 2. 安装具体项目 (以 StableV2V 为例)\n根据 README 中的 \"Recent news\" 或 \"Papers\" 部分找到目标项目的代码链接。\n\n**步骤 A: 克隆项目代码**\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo\u002FStableV2V.git\ncd StableV2V\n```\n\n**步骤 B: 创建虚拟环境并安装依赖**\n推荐使用 Conda 管理环境。\n\n```bash\n# 创建环境\nconda create -n stablev2v python=3.10 -y\nconda activate stablev2v\n\n# 安装 PyTorch (根据官方文档选择对应 CUDA 版本，此处以 CUDA 11.8 为例，使用官方索引)\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n\n# 安装项目依赖 (国内用户可追加 -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple 使用清华源加速)\npip install -r requirements.txt\n```\n\n**步骤 C: 下载模型权重**\n该项目权重托管在 Hugging Face。国内访问较慢时，建议使用镜像站或代理。\n\n```bash\n# 安装 huggingface-cli (如果未安装)\npip install huggingface_hub\n\n# 下载模型权重 (示例命令，具体参照项目 README)\nhuggingface-cli download AlonzoLeeeooo\u002FStableV2V --local-dir .\u002Fweights\n```\n\n## 基本使用\n\n安装完成后，你可以参考具体项目的脚本进行推理。以下是基于典型视频生成项目的通用使用范式。\n\n### 1. 准备输入数据\n确保你拥有输入视频或图像，并将其放置在项目指定的目录中（例如 `inputs\u002F`）。\n\n### 2. 运行推理脚本\n大多数项目提供 `inference.py` 或类似的入口脚本。以下是一个典型的命令行示例：\n\n```bash\npython inference.py \\\n    --config configs\u002Fstablev2v.yaml \\\n    --input_path inputs\u002Fmy_video.mp4 \\\n    --output_dir outputs \\\n    --prompt \"A cinematic shot of a cat walking in the rain\" \\\n    --seed 42\n```\n\n**参数说明：**\n*   `--config`: 模型配置文件路径。\n*   `--input_path`: 输入视频或图像路径。\n*   `--prompt`: 文本提示词（针对文生视频或编辑任务）。\n*   `--output_dir`: 生成结果的保存目录。\n\n### 3. 查看结果\n生成完成后，视频文件通常保存在 `outputs` 文件夹中。你可以使用任何视频播放器查看结果，或使用 FFmpeg 进行后续处理：\n\n```bash\nffplay outputs\u002Fresult_001.mp4\n```\n\n> **提示**：对于 `awesome-video-generation` 列表中其他项目（如 Text-to-Video 的 `CogVideoX` 或 `OpenVid`），请进入对应项目的 GitHub 页面，遵循其特定的 `README.md` 中的指令进行操作，流程通常与上述步骤类似。","某广告公司的 AI 研发小组正紧急为一家运动品牌开发“静态海报转动态短视频”的功能，需要在两周内复现业界最新的视频生成效果以验证商业可行性。\n\n### 没有 awesome-video-generation 时\n- **文献检索如大海捞针**：团队成员需在 arXiv、Google Scholar 等多个平台分散搜索，难以区分哪些是仅停留在理论阶段的论文，哪些已有开源代码可供快速验证。\n- **技术选型盲目低效**：面对“图像转视频”、“个性化生成”等细分方向，缺乏系统的分类指引，导致团队在过时的模型上浪费了大量算力资源进行无效尝试。\n- **复现门槛极高**：找不到配套的预训练权重或标准测试数据集（如 DAVIS-Edit），研究人员需自行清洗数据并从头训练，项目进度严重滞后。\n- **前沿动态掌握滞后**：无法及时获取如 CVPR 2025 或 NeurIPS 2024 等顶会的最新录用论文，导致技术方案可能在上马时已落后于行业半年。\n\n### 使用 awesome-video-generation 后\n- **一站式资源聚合**：直接通过目录定位到\"Image-to-Video\"或\"Human Image Animation\"板块，快速获取按年份排序的精选论文及对应的 GitHub 代码链接。\n- **精准锁定可用方案**：利用仓库中标注的“已开源代码”和“模型权重”信息，团队迅速锁定了 StableV2V 等成熟模型，将环境搭建时间从数天缩短至数小时。\n- **基准测试开箱即用**：直接下载仓库推荐的标准化数据集和评测基准，无需自行构建测试集，确保了实验结果的可比性和权威性。\n- **同步全球最新进展**：通过定期更新的顶会论文列表（如 AAAI 2024、ICLR 2025），团队立即引入了最新的形状一致性编辑技术，显著提升了输出视频的稳定性。\n\nawesome-video-generation 将原本需要数周的文献调研与资源整理工作压缩至一天，让研发团队能专注于核心算法的优化与业务落地。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FAlonzoLeeeooo_awesome-video-generation_0d82b866.png","AlonzoLeeeooo","USTC-liuchang","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FAlonzoLeeeooo_5e15b1aa.jpg","University of Science and Technology of China (USTC), Ph.D. 
Student, Computer Vision","University of Science and Technology of China","Hefei, China",null,"https:\u002F\u002Fgithub.com\u002FAlonzoLeeeooo",[84],{"name":85,"color":86,"percentage":87},"TeX","#3D6117",100,757,39,"2026-03-31T13:16:50","MIT",1,"","未说明",{"notes":96,"python":94,"dependencies":97},"该仓库（awesome-video-generation）是一个视频生成领域的论文和资源汇总列表，本身不是一个可运行的软件工具，因此 README 中未包含具体的操作系统、硬件配置或依赖库要求。列表中提到的具体项目（如 StableV2V, CogVideoX 等）需前往其各自的代码仓库查看运行环境需求。",[],[14,34],[100,101,102,103,104],"text-to-video-generation","video-generation","diffusion-models","generative-adversarial-networks","paper-list","2026-03-27T02:49:30.150509","2026-04-06T12:11:04.550718",[],[]]