[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-antgroup--echomimic_v2":3,"tool-antgroup--echomimic_v2":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 
是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":75,"owner_website":80,"owner_url":81,"languages":82,"stars":95,"forks":96,"last_commit_at":97,"license":98,"difficulty_score":10,"env_os":99,"env_gpu":100,"env_ram":101,"env_deps":102,"category_tags":113,"github_topics":114,"view_count":123,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":124,"updated_at":125,"faqs":126,"releases":167},2274,"antgroup\u002Fechomimic_v2","echomimic_v2","[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation","EchoMimicV2 是一款由蚂蚁集团开源的半身人物动画生成工具，旨在通过简单的输入驱动高质量的人物视频创作。它主要解决了传统数字人动画制作流程复杂、对全身动作控制难度大以及生成效率低的问题，让用户仅需一张参考图片和一段驱动视频（或音频），即可生成表情自然、动作流畅的半身人物影像。\n\n这款工具特别适合研究人员探索多模态动画技术，也面向开发者进行二次开发，同时其提供的 GradioUI 和 ComfyUI 接口让设计师和普通创作者也能轻松上手，快速制作虚拟主播、教育视频或娱乐内容。\n\nEchoMimicV2 的技术亮点在于其“简化”与“高效”的设计理念。作为 CVPR 2025 的收录成果，它在保持画面惊艳度的同时，大幅优化了推理速度。特别是加速版本，将视频生成时间从约 7 分钟缩短至 50 秒左右，效率提升近 9 倍，真正实现了“一分钟生成视频”。此外，它还支持参考图与姿态的自动对齐，降低了用户预处理数据的门槛，让人物动画创作变得更加简单快捷。","\u003Ch1 align='center'>EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation\u003C\u002Fh1>\r\n\r\n\u003Cdiv align='center'>\r\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002Fmengrang' 
target='_blank'>Rang Meng\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>&emsp;\r\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002F' target='_blank'>Xingyu Zhang\u003C\u002Fa>&emsp;\r\n    \u003Ca href='https:\u002F\u002Flymhust.github.io\u002F' target='_blank'>Yuming Li\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>&emsp;\r\n    \u003Ca href='https:\u002F\u002Fopenreview.net\u002Fprofile?id=~Chenguang_Ma3' target='_blank'>Chenguang Ma\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>\r\n\u003C\u002Fdiv>\r\n\r\n\u003Cdiv align='center'>\r\nTerminal Technology Department, Alipay, Ant Group.\r\n\u003C\u002Fdiv>\r\n\r\n\u003Cp align='center'>\r\n    \u003Csup>1\u003C\u002Fsup>Core Contributor&emsp;\r\n    \u003Csup>2\u003C\u002Fsup>Corresponding Authors\r\n\u003C\u002Fp>\r\n\u003Cdiv align='center'>\r\n    \u003Ca href='https:\u002F\u002Fantgroup.github.io\u002Fai\u002Fechomimic_v2\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-blue'>\u003C\u002Fa>\r\n    \u003Ca href='https:\u002F\u002Fhuggingface.co\u002FBadToBest\u002FEchoMimicV2'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20HuggingFace-Model-yellow'>\u003C\u002Fa>\r\n    \u003C!--\u003Ca href='https:\u002F\u002Fantgroup.github.io\u002Fai\u002Fechomimic_v2\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20HuggingFace-Demo-yellow'>\u003C\u002Fa>-->\r\n    \u003Ca href='https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FBadToBest\u002FEchoMimicV2'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModelScope-Model-purple'>\u003C\u002Fa>\r\n    \u003C!--\u003Ca href='https:\u002F\u002Fantgroup.github.io\u002Fai\u002Fechomimic_v2\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModelScope-Demo-purple'>\u003C\u002Fa>-->\r\n    \u003Ca href='https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.10061'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Arxiv-red'>\u003C\u002Fa>\r\n    
\u003Ca href='https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2025\u002Fpapers\u002FMeng_EchoMimicV2_Towards_Striking_Simplified_and_Semi-Body_Human_Animation_CVPR_2025_paper.pdf'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-CVPR2025-blue'>\u003C\u002Fa>\r\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fblob\u002Fmain\u002Fassets\u002Fhalfbody_demo\u002Fwechat_group.png'>\u003Cimg src='https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fwechat.svg'>\u003C\u002Fa>\r\n\u003C\u002Fdiv>\r\n\u003Cdiv align='center'>\r\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fdiscussions\u002F53'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEnglish-Common Problems-orange'>\u003C\u002Fa>\r\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fdiscussions\u002F40'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F中文版-常见问题汇总-orange'>\u003C\u002Fa>\r\n\u003C\u002Fdiv>\r\n\r\n## &#x1F680; EchoMimic Series\r\n* EchoMimicV1: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning. [GitHub](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic)\r\n* EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation. [GitHub](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2)\r\n* EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation. 
[GitHub](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v3)\r\n\r\n## &#x1F4E3; Updates\r\n* [2025.08.09] 🔥🔥 We have updated EchoMimicV3 and released the code.\r\n* [2025.02.27] 🔥 EchoMimicV2 has been accepted to CVPR 2025.\r\n* [2025.01.16] 🔥 Please check out the [discussions](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fdiscussions) to learn how to get started with EchoMimicV2.\r\n* [2025.01.16] 🚀🔥 [GradioUI for Accelerated EchoMimicV2](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fblob\u002Fmain\u002Fapp_acc.py) is now available.\r\n* [2025.01.03] 🚀🔥 **One Minute is All You Need to Generate Video**. [Accelerated EchoMimicV2](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fblob\u002Fmain\u002Finfer_acc.py) is released. The inference speed can be improved by 9x (from ~7mins\u002F120frames to ~50s\u002F120frames on A100 GPU).\r\n* [2024.12.16] 🔥 [RefImg-Pose Alignment Demo](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fblob\u002Fmain\u002Fdemo.ipynb) is now available. It covers aligning the reference image, extracting the pose from the driving video, and generating the video.\r\n* [2024.11.27] 🔥 [Installation tutorial](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=2ab6U1-nVTQ) is now available. Thanks [AiMotionStudio](https:\u002F\u002Fwww.youtube.com\u002F@AiMotionStudio) for the contribution.\r\n* [2024.11.22] 🔥 [GradioUI](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fblob\u002Fmain\u002Fapp.py) is now available. Thanks @gluttony-10 for the contribution.\r\n* [2024.11.22] 🔥 [ComfyUI](https:\u002F\u002Fgithub.com\u002Fsmthemex\u002FComfyUI_EchoMimic) is now available. 
Thanks @smthemex for the contribution.\r\n* [2024.11.21] 🔥 We release the EMTD dataset list and processing scripts.\r\n* [2024.11.21] 🔥 We release our [EchoMimicV2](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2) codes and models.\r\n* [2024.11.15] 🔥 Our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.10061) is in public on arxiv.\r\n\r\n## &#x1F305; Gallery\r\n### Introduction\r\n\u003Ctable class=\"center\">\r\n\u003Ctr>\r\n    \u003Ctd width=50% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff544dfc0-7d1a-4c2c-83c0-608f28ffda25\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=50% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7f626b65-725c-4158-a96b-062539874c63\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n\u003C\u002Ftr>\r\n\u003C\u002Ftable>\r\n\r\n### English Driven Audio\r\n\u003Ctable class=\"center\">\r\n\u003Ctr>\r\n    \u003Ctd width=100% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F3d5ac52c-62e4-41bc-8b27-96f005bbd781\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n\u003C\u002Ftr>\r\n\u003C\u002Ftable>\r\n\u003Ctable class=\"center\">\r\n\u003Ctr>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe8dd6919-665e-4343-931f-54c93dc49a7d\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F2a377391-a0d3-4a9d-8dde-cc59006e7e5b\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n  
      \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F462bf3bb-0af2-43e2-a2dc-559e79953f3c\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n\u003C\u002Ftr>\r\n\u003Ctr>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F0e988e7f-6346-4b54-9061-9cfc7a80e9c8\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F56f739bd-afbf-4ed3-ab15-73a811c1bc46\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1b2f7827-111d-4fc0-a773-e1731bba285d\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n\u003C\u002Ftr>\r\n\u003Ctr>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa76b6cc8-89b9-4f7e-b1ce-c85a657b6dc7\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fbf03b407-5033-4a30-aa59-b8680a515181\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff98b3985-572c-499f-ae1a-1b9befe3086f\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n\u003C\u002Ftr>\r\n\u003C\u002Ftable>\r\n\r\n### Chinese Driven Audio\r\n\u003Ctable class=\"center\">\r\n\u003Ctr>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        
\u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa940a332-2fd1-48e7-b3c4-f88f63fd1c9d\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F8f185829-c67f-45f4-846c-fcbe012c3acf\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa49ab9be-f17b-41c5-96dd-20dc8d759b45\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n\u003C\u002Ftr>\r\n\u003Ctr>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1136ec68-a13c-4ee7-ab31-5621530bf9df\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ffc16d512-8806-4662-ae07-8fcf45c75a83\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff8559cd1-f555-4781-9251-dfcef10b5b01\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n\u003C\u002Ftr>\r\n\u003Ctr>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fc7473e3a-ab51-4ad5-be96-6c4691fc0c6e\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop 
src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fca69eac0-5126-41ee-8cac-c9722004d771\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n    \u003Ctd width=30% style=\"border: none\">\r\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe66f1712-b66d-46b5-8bbd-811fbcfea4fd\" muted=\"false\">\u003C\u002Fvideo>\r\n    \u003C\u002Ftd>\r\n\u003C\u002Ftr>\r\n\u003C\u002Ftable>\r\n\r\n## ⚒️ Automatic Installation\r\n### Download the Codes\r\n\r\n```bash\r\n  git clone https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\r\n  cd echomimic_v2\r\n```\r\n### Automatic Setup\r\n- CUDA >= 11.7, Python == 3.10\r\n\r\n```bash\r\n   sh linux_setup.sh\r\n```\r\n## ⚒️ Manual Installation\r\n### Download the Codes\r\n\r\n```bash\r\n  git clone https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\r\n  cd echomimic_v2\r\n```\r\n### Python Environment Setup\r\n\r\n- Tested System Environment: Centos 7.2\u002FUbuntu 22.04, Cuda >= 11.7\r\n- Tested GPUs: A100(80G) \u002F RTX4090D (24G) \u002F V100(16G)\r\n- Tested Python Version: 3.8 \u002F 3.10 \u002F 3.11\r\n\r\nCreate conda environment (Recommended):\r\n\r\n```bash\r\n  conda create -n echomimic python=3.10\r\n  conda activate echomimic\r\n```\r\n\r\nInstall packages with `pip`\r\n```bash\r\n  pip install pip -U\r\n  pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 xformers==0.0.28.post3 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\r\n  pip install torchao --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fnightly\u002Fcu124\r\n  pip install -r requirements.txt\r\n  pip install --no-deps facenet_pytorch==2.6.0\r\n```\r\n\r\n### Download ffmpeg-static\r\nDownload and decompress [ffmpeg-static](https:\u002F\u002Fwww.johnvansickle.com\u002Fffmpeg\u002Fold-releases\u002Fffmpeg-4.4-amd64-static.tar.xz), then\r\n```\r\nexport 
FFMPEG_PATH=\u002Fpath\u002Fto\u002Fffmpeg-4.4-amd64-static\r\n```\r\n\r\n### Download pretrained weights\r\n\r\n```shell\r\ngit lfs install\r\ngit clone https:\u002F\u002Fhuggingface.co\u002FBadToBest\u002FEchoMimicV2 pretrained_weights\r\n```\r\n\r\nThe **pretrained_weights** directory is organized as follows.\r\n\r\n```\r\n.\u002Fpretrained_weights\u002F\r\n├── denoising_unet.pth\r\n├── reference_unet.pth\r\n├── motion_module.pth\r\n├── pose_encoder.pth\r\n├── sd-vae-ft-mse\r\n│   └── ...\r\n└── audio_processor\r\n    └── tiny.pt\r\n```\r\n\r\nHere, **denoising_unet.pth** \u002F **reference_unet.pth** \u002F **motion_module.pth** \u002F **pose_encoder.pth** are the main checkpoints of **EchoMimic**. The other models in this hub can also be downloaded from their original hubs, thanks to their authors' brilliant work:\r\n- [sd-vae-ft-mse](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fsd-vae-ft-mse)\r\n- [audio_processor(whisper)](https:\u002F\u002Fopenaipublic.azureedge.net\u002Fmain\u002Fwhisper\u002Fmodels\u002F65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9\u002Ftiny.pt)\r\n\r\n### Inference on Demo \r\nRun the Gradio app:\r\n```bash\r\npython app.py\r\n```\r\nRun the Python inference script:\r\n```bash\r\npython infer.py --config='.\u002Fconfigs\u002Fprompts\u002Finfer.yaml'\r\n```\r\n\r\nRun the Python inference script for the accelerated version. 
Make sure to check out the configuration for accelerated inference:\r\n```bash\r\npython infer_acc.py --config='.\u002Fconfigs\u002Fprompts\u002Finfer_acc.yaml'\r\n```\r\n\r\n### EMTD Dataset\r\nDownload dataset:\r\n```bash\r\npython .\u002FEMTD_dataset\u002Fdownload.py\r\n```\r\nSlice dataset:\r\n```bash\r\nbash .\u002FEMTD_dataset\u002Fslice.sh\r\n```\r\nProcess dataset:\r\n```bash\r\npython .\u002FEMTD_dataset\u002Fpreprocess.py\r\n```\r\nMake sure to check out the [discussions](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fdiscussions) to learn how to start the inference.\r\n\r\n## 📝 Release Plans\r\n\r\n|  Status  | Milestone                                                                | ETA |\r\n|:--------:|:-------------------------------------------------------------------------|:--:|\r\n|    ✅    | Inference source code of EchoMimicV2 released on GitHub   | 21st Nov, 2024 |\r\n|    ✅    | Pretrained models trained on English and Mandarin Chinese on HuggingFace | 21st Nov, 2024 |\r\n|    ✅    | Pretrained models trained on English and Mandarin Chinese on ModelScope   | 21st Nov, 2024 |\r\n|    ✅    | EMTD dataset list and processing scripts                | 21st Nov, 2024 |\r\n|    ✅    | Jupyter demo with pose and reference image alignment                | 16th Dec, 2024 |\r\n|    ✅    | Accelerated models                                        | 3rd Jan, 2025 |\r\n|    🚀    | Online Demo on ModelScope to be released            | TBD |\r\n|    🚀    | Online Demo on HuggingFace to be released         | TBD |\r\n\r\n## ⚖️ Disclaimer\r\nThis project is intended for academic research, and we explicitly disclaim any responsibility for user-generated content. Users are solely liable for their actions while using the generative model. The project contributors have no legal affiliation with, nor accountability for, users' behaviors. 
It is imperative to use the generative model responsibly, adhering to both ethical and legal standards.\r\n\r\n## 🙏🏻 Acknowledgements\r\n\r\nWe would like to thank the contributors to the [MimicMotion](https:\u002F\u002Fgithub.com\u002FTencent\u002FMimicMotion) and [Moore-AnimateAnyone](https:\u002F\u002Fgithub.com\u002FMooreThreads\u002FMoore-AnimateAnyone) repositories for their open research and exploration. \r\n\r\nWe are also grateful to [CyberHost](https:\u002F\u002Fcyberhost.github.io\u002F) and [Vlogger](https:\u002F\u002Fenriccorona.github.io\u002Fvlogger\u002F) for their outstanding work in the area of audio-driven human animation.\r\n\r\nIf we have missed any open-source projects or related articles, we would be glad to add an acknowledgement of that work right away.\r\n\r\n## &#x1F4D2; Citation\r\n\r\nIf you find our work useful for your research, please consider citing the paper:\r\n\r\n```\r\n@article{meng2024echomimicv2,\r\n  title={EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation},\r\n  author={Meng, Rang and Zhang, Xingyu and Li, Yuming and Ma, Chenguang},\r\n  journal={arXiv preprint arXiv:2411.10061},\r\n  year={2024}\r\n}\r\n@article{meng2025echomimicv3,\r\n  title={EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation},\r\n  author={Meng, Rang and Wang, Yan and Wu, Weipeng and Zheng, Ruobing and Li, Yuming and Ma, Chenguang},\r\n  journal={arXiv preprint arXiv:2507.03905},\r\n  year={2025}\r\n}\r\n@article{meng2026echotorrent,\r\n  title={EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation},\r\n  author={Meng, Rang and Wu, Weipeng and Yin, Yingjie and Li, Yuming and Ma, Chenguang},\r\n  journal={arXiv preprint arXiv:2602.13669},\r\n  year={2026}\r\n}\r\n```\r\n\r\n## &#x1F31F; Star History\r\n[![Star History 
Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fantgroup_echomimic_v2_readme_454e69122853.png)](https:\u002F\u002Fstar-history.com\u002F#antgroup\u002Fechomimic_v2&Date)\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n","\u003Ch1 align='center'>EchoMimicV2：迈向引人注目、简化且半身的人体动画\u003C\u002Fh1>\n\n\u003Cdiv align='center'>\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002Fmengrang' target='_blank'>Rang Meng\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002F' target='_blank'>Xingyu Zhang\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Flymhust.github.io\u002F' target='_blank'>Yuming Li\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Fopenreview.net\u002Fprofile?id=~Chenguang_Ma3' target='_blank'>Chenguang Ma\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>\n\u003C\u002Fdiv>\n\n\u003Cdiv align='center'>\n终端技术部，支付宝，蚂蚁集团。\n\u003C\u002Fdiv>\n\n\u003Cp align='center'>\n    \u003Csup>1\u003C\u002Fsup>核心贡献者&emsp;\n    \u003Csup>2\u003C\u002Fsup>通讯作者\n\u003C\u002Fp>\n\u003Cdiv align='center'>\n    \u003Ca href='https:\u002F\u002Fantgroup.github.io\u002Fai\u002Fechomimic_v2\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-blue'>\u003C\u002Fa>\n    \u003Ca href='https:\u002F\u002Fhuggingface.co\u002FBadToBest\u002FEchoMimicV2'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20HuggingFace-Model-yellow'>\u003C\u002Fa>\n    \u003C!--\u003Ca href='https:\u002F\u002Fantgroup.github.io\u002Fai\u002Fechomimic_v2\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20HuggingFace-Demo-yellow'>\u003C\u002Fa>-->\n    \u003Ca href='https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FBadToBest\u002FEchoMimicV2'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModelScope-Model-purple'>\u003C\u002Fa>\n    \u003C!--\u003Ca href='https:\u002F\u002Fantgroup.github.io\u002Fai\u002Fechomimic_v2\u002F'>\u003Cimg 
src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModelScope-Demo-purple'>\u003C\u002Fa>-->\n    \u003Ca href='https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.10061'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Arxiv-red'>\u003C\u002Fa>\n    \u003Ca href='https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2025\u002Fpapers\u002FMeng_EchoMimicV2_Towards_Striking_Simplified_and_Semi-Body_Human_Animation_CVPR_2025_paper.pdf'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-CVPR2025-blue'>\u003C\u002Fa>\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fblob\u002Fmain\u002Fassets\u002Fhalfbody_demo\u002Fwechat_group.png'>\u003Cimg src='https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fwechat.svg'>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003Cdiv align='center'>\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fdiscussions\u002F53'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEnglish-Common Problems-orange'>\u003C\u002Fa>\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fdiscussions\u002F40'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F中文版-常见问题汇总-orange'>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n## &#x1F680; EchoMimic系列\n* EchoMimicV1：通过可编辑的地标条件生成逼真的音频驱动肖像动画。[GitHub](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic)\n* EchoMimicV2：迈向引人注目、简化且半身的人体动画。[GitHub](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2)\n* EchoMimicV3：13亿参数足以实现统一的多模态和多任务人体动画。[GitHub](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v3)\n\n## &#x1F4E3; 更新\n* [2025.08.09] 🔥🔥 我们更新了EchoMimicV3并发布了代码。\n* [2025.02.27] 🔥 EchoMimicV2已被CVPR 2025接收。\n* [2025.01.16] 🔥 请查看[讨论](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fdiscussions)，了解如何开始使用EchoMimicV2。\n* [2025.01.16] 🚀🔥 
[用于加速EchoMimicV2的GradioUI](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fblob\u002Fmain\u002Fapp_acc.py)现已可用。\n* [2025.01.03] 🚀🔥 **只需一分钟即可生成视频**。[加速的EchoMimicV2](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fblob\u002Fmain\u002Finfer_acc.py)已发布。在A100 GPU上，推理速度可提升9倍（从约7分钟\u002F120帧缩短至约50秒\u002F120帧）。\n* [2024.12.16] 🔥 [RefImg-Pose对齐演示](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fblob\u002Fmain\u002Fdemo.ipynb)现已可用，其中包括对参考图像进行对齐、从驱动视频中提取姿态以及生成视频。\n* [2024.11.27] 🔥 [安装教程](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=2ab6U1-nVTQ)现已可用。感谢[AiMotionStudio](https:\u002F\u002Fwww.youtube.com\u002F@AiMotionStudio)的贡献。\n* [2024.11.22] 🔥 [GradioUI](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fblob\u002Fmain\u002Fapp.py)现已可用。感谢@gluttony-10的贡献。\n* [2024.11.22] 🔥 [ComfyUI](https:\u002F\u002Fgithub.com\u002Fsmthemex\u002FComfyUI_EchoMimic)现已可用。感谢@smthemex的贡献。\n* [2024.11.21] 🔥 我们发布了EMTD数据集列表及处理脚本。\n* [2024.11.21] 🔥 我们发布了我们的[EchoMimicV2](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2)代码和模型。\n* [2024.11.15] 🔥 我们的[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.10061)已在arXiv上公开。\n\n## &#x1F305; 画廊\n### 简介\n\u003Ctable class=\"center\">\n\u003Ctr>\n    \u003Ctd width=50% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff544dfc0-7d1a-4c2c-83c0-608f28ffda25\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=50% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7f626b65-725c-4158-a96b-062539874c63\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n### 英语驱动音频\n\u003Ctable class=\"center\">\n\u003Ctr>\n    \u003Ctd width=100% style=\"border: none\">\n        \u003Cvideo controls loop 
src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F3d5ac52c-62e4-41bc-8b27-96f005bbd781\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\u003Ctable class=\"center\">\n\u003Ctr>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe8dd6919-665e-4343-931f-54c93dc49a7d\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F2a377391-a0d3-4a9d-8dde-cc59006e7e5b\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F462bf3bb-0af2-43e2-a2dc-559e79953f3c\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F0e988e7f-6346-4b54-9061-9cfc7a80e9c8\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F56f739bd-afbf-4ed3-ab15-73a811c1bc46\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1b2f7827-111d-4fc0-a773-e1731bba285d\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop 
src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa76b6cc8-89b9-4f7e-b1ce-c85a657b6dc7\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fbf03b407-5033-4a30-aa59-b8680a515181\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff98b3985-572c-499f-ae1a-1b9befe3086f\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n### 中文驱动音频\n\u003Ctable class=\"center\">\n\u003Ctr>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa940a332-2fd1-48e7-b3c4-f88f63fd1c9d\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F8f185829-c67f-45f4-846c-fcbe012c3acf\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa49ab9be-f17b-41c5-96dd-20dc8d759b45\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1136ec68-a13c-4ee7-ab31-5621530bf9df\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ffc16d512-8806-4662-ae07-8fcf45c75a83\" 
muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff8559cd1-f555-4781-9251-dfcef10b5b01\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fc7473e3a-ab51-4ad5-be96-6c4691fc0c6e\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fca69eac0-5126-41ee-8cac-c9722004d771\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n    \u003Ctd width=30% style=\"border: none\">\n        \u003Cvideo controls loop src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe66f1712-b66d-46b5-8bbd-811fbcfea4fd\" muted=\"false\">\u003C\u002Fvideo>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n## ⚒️ Automatic Installation\n### Download the Code\n\n```bash\n  git clone https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\n  cd echomimic_v2\n```\n### Automatic Setup\n- CUDA >= 11.7, Python == 3.10\n\n```bash\n   sh linux_setup.sh\n```\n## ⚒️ Manual Installation\n### Download the Code\n\n```bash\n  git clone https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\n  cd echomimic_v2\n```\n### Python Environment Setup\n\n- Tested system environment: CentOS 7.2\u002FUbuntu 22.04, CUDA >= 11.7\n- Tested GPUs: A100 (80G) \u002F RTX4090D (24G) \u002F V100 (16G)\n- Tested Python versions: 3.8 \u002F 3.10 \u002F 3.11\n\nCreate a conda environment (recommended):\n\n```bash\n  conda create -n echomimic python=3.10\n  conda activate echomimic\n```\n\nInstall packages with `pip`:\n```bash\n  pip install pip -U\n  pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 xformers==0.0.28.post3 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\n  pip install torchao --index-url 
https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fnightly\u002Fcu124\n  pip install -r requirements.txt\n  pip install --no-deps facenet_pytorch==2.6.0\n```\n\n### Download ffmpeg-static\nDownload and extract [ffmpeg-static](https:\u002F\u002Fwww.johnvansickle.com\u002Fffmpeg\u002Fold-releases\u002Fffmpeg-4.4-amd64-static.tar.xz), then\n```bash\nexport FFMPEG_PATH=\u002Fpath\u002Fto\u002Fffmpeg-4.4-amd64-static\n```\n\n### Download Pretrained Weights\n\n```shell\ngit lfs install\ngit clone https:\u002F\u002Fhuggingface.co\u002FBadToBest\u002FEchoMimicV2 pretrained_weights\n```\n\nThe **pretrained_weights** directory is organized as follows:\n\n```none\n.\u002Fpretrained_weights\u002F\n├── denoising_unet.pth\n├── reference_unet.pth\n├── motion_module.pth\n├── pose_encoder.pth\n├── sd-vae-ft-mse\n│   └── ...\n└── audio_processor\n    └── tiny.pt\n```\n\nHere, **denoising_unet.pth** \u002F **reference_unet.pth** \u002F **motion_module.pth** \u002F **pose_encoder.pth** are the main checkpoints of **EchoMimic**. The other models in this repository can also be downloaded from their original repositories, with thanks to their authors for the excellent work:\n- [sd-vae-ft-mse](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fsd-vae-ft-mse)\n- [audio_processor(whisper)](https:\u002F\u002Fopenaipublic.azureedge.net\u002Fmain\u002Fwhisper\u002Fmodels\u002F65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9\u002Ftiny.pt)\n\n### Demo Inference\nRun Gradio:\n```bash\npython app.py\n```\nRun the Python inference script:\n```bash\npython infer.py --config='.\u002Fconfigs\u002Fprompts\u002Finfer.yaml'\n```\n\nRun the accelerated Python inference script. Be sure to review the configuration for accelerated inference first:\n```bash\npython infer_acc.py --config='.\u002Fconfigs\u002Fprompts\u002Finfer_acc.yaml'\n```\n\n### EMTD Dataset\nDownload the dataset:\n```bash\npython .\u002FEMTD_dataset\u002Fdownload.py\n```\nSlice the dataset:\n```bash\nbash .\u002FEMTD_dataset\u002Fslice.sh\n```\nPreprocess the dataset:\n```bash\npython .\u002FEMTD_dataset\u002Fpreprocess.py\n```\n\nBe sure to check the [Discussions](https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fdiscussions) to learn how to get started with inference.\n\n## 📝 Release Plan\n\n| Status | Milestone                                                    | ETA 
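|\n|:--------:|:-------------------------------------------------------------|:--:|\n|    ✅    | EchoMimicV2 inference source code publicly released on GitHub | Nov 21, 2024 |\n|    ✅    | Pretrained models for English and Mandarin Chinese released on HuggingFace | Nov 21, 2024 |\n|    ✅    | Pretrained models for English and Mandarin Chinese released on ModelScope | Nov 21, 2024 |\n|    ✅    | EMTD dataset list and processing scripts                     | Nov 21, 2024 |\n|    ✅    | Jupyter demo with pose and reference image alignment         | Dec 16, 2024 |\n|    ✅    | Accelerated models                                           | Jan 3, 2025 |\n|    🚀    | Online demo on ModelScope                                    | TBD |\n|    🚀    | Online demo on HuggingFace                                   | TBD |\n\n## ⚖️ Disclaimer\nThis project is intended for academic research only, and we explicitly disclaim any responsibility for user-generated content. Users are solely responsible for their own actions when using the generative model. The project contributors have no legal affiliation with, and accept no liability for, the actions of users. Use the generative model responsibly and in compliance with ethical and legal standards.\n\n## 🙏🏻 Acknowledgements\n\nWe sincerely thank the contributors to the [MimicMotion](https:\u002F\u002Fgithub.com\u002FTencent\u002FMimicMotion) and [Moore-AnimateAnyone](https:\u002F\u002Fgithub.com\u002FMooreThreads\u002FMoore-AnimateAnyone) projects for their open research and exploration.\n\nWe are also grateful to [CyberHost](https:\u002F\u002Fcyberhost.github.io\u002F) and [Vlogger](https:\u002F\u002Fenriccorona.github.io\u002Fvlogger\u002F) for their outstanding work on audio-driven human animation.\n\nIf we have missed any open-source projects or related articles, we will add the acknowledgement promptly.\n\n## &#x1F4D2; Citation\n\nIf you find our work helpful for your research, please consider citing the following papers:\n\n``` \n@article{meng2024echomimicv2,\n  title={EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation},\n  author={Meng, Rang and Zhang, Xingyu and Li, Yuming and Ma, Chenguang},\n  journal={arXiv preprint arXiv:2411.10061},\n  year={2024}\n}\n@article{meng2025echomimicv3,\n  title={Echomimicv3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation},\n  author={Meng, Rang and Wang, Yan and Wu, Weipeng and Zheng, Ruobing and Li, Yuming and Ma, Chenguang},\n  journal={arXiv preprint arXiv:2507.03905},\n  year={2025}\n}\n@article{meng2026echotorrent,\n  title={EchoTorrent: Towards Fast, Sustained, and Streaming Multi-Modal Video Generation},\n  author={Meng, Rang and Wu, Weipeng and Yin, Yingjie and Li, Yuming and Ma, Chenguang},\n  journal={arXiv preprint arXiv:2602.13669},\n  year={2026}\n}\n``` \n\n## &#x1F31F; Star History\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fantgroup_echomimic_v2_readme_454e69122853.png)](https:\u002F\u002Fstar-history.com\u002F#antgroup\u002Fechomimic_v2&Date)","# 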
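EchoMimicV2 Quick Start Guide\n\nEchoMimicV2 is an open-source half-body human animation tool from Ant Group. It generates realistic talking videos from a reference image and driving audio (or video poses). This guide helps you deploy and run the model locally.\n\n## 1. Environment Preparation\n\nBefore you begin, make sure your environment meets the following requirements:\n\n*   **Operating system**: Linux (Ubuntu 22.04 or CentOS 7.2 recommended)\n*   **GPU**: NVIDIA GPU, 16 GB or more of VRAM recommended (tested on: A100 80G, RTX4090D 24G, V100 16G)\n*   **CUDA version**: >= 11.7 (12.4 recommended to match the prebuilt packages)\n*   **Python version**: 3.8 \u002F 3.10 \u002F 3.11 (3.10 recommended)\n*   **Other dependencies**: `git`, `git-lfs`, `ffmpeg`\n\n## 2. Installation\n\n### 2.1 Clone the Repository\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\ncd echomimic_v2\n```\n\n### 2.2 Create a Virtual Environment\n\nConda is recommended for managing the environment:\n\n```bash\nconda create -n echomimic python=3.10\nconda activate echomimic\n```\n\n### 2.3 Install Dependencies\n\nInstall PyTorch and the related components in exactly the following order (built for CUDA 12.4):\n\n```bash\npip install pip -U\npip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 xformers==0.0.28.post3 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\npip install torchao --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fnightly\u002Fcu124\npip install -r requirements.txt\npip install --no-deps facenet_pytorch==2.6.0\n```\n\n> **Note**: If downloads are slow, you can replace `--index-url` with a mirror in mainland China (such as the Tsinghua mirror), but make sure the PyTorch version strictly matches the CUDA version. If you run into compatibility problems, prefer the official index.\n\n### 2.4 Configure FFmpeg\n\nDownload a statically compiled FFmpeg build and set the environment variable:\n\n1.  Download and extract [ffmpeg-4.4-amd64-static.tar.xz](https:\u002F\u002Fwww.johnvansickle.com\u002Fffmpeg\u002Fold-releases\u002Fffmpeg-4.4-amd64-static.tar.xz).\n2.  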
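Set the environment variable (replace the path with your actual extraction path):\n\n```bash\nexport FFMPEG_PATH=\u002Fpath\u002Fto\u002Fffmpeg-4.4-amd64-static\n```\n\n### 2.5 Download the Pretrained Models\n\nUse `git-lfs` to download the model weights from HuggingFace:\n\n```bash\ngit lfs install\ngit clone https:\u002F\u002Fhuggingface.co\u002FBadToBest\u002FEchoMimicV2 pretrained_weights\n```\n\nAfter the download completes, the `pretrained_weights` directory should contain the following core files:\n*   `denoising_unet.pth`\n*   `reference_unet.pth`\n*   `motion_module.pth`\n*   `pose_encoder.pth`\n*   `sd-vae-ft-mse` (directory)\n*   `audio_processor\u002Ftiny.pt`\n\n> **Tip for users in mainland China**: If HuggingFace is hard to reach, you can download from the ModelScope mirror instead:\n> ```bash\n> git clone https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FBadToBest\u002FEchoMimicV2 pretrained_weights\n> ```\n\n## 3. Basic Usage\n\nOnce installation is complete, you can run the model in two ways: through the web interface or through the command-line scripts.\n\n### Option 1: Launch the Web Interface (Recommended)\n\nLaunch the Gradio interface to upload images and audio interactively:\n\n```bash\npython app.py\n```\n\nAfter launch, open the local address shown in the terminal in your browser (usually `http:\u002F\u002F127.0.0.1:7860`), then follow the on-screen prompts to upload a reference image and driving audio to generate a video.\n\n### Option 2: Command-Line Inference\n\n#### Standard Inference\nRun inference with the default configuration file:\n\n```bash\npython infer.py --config='.\u002Fconfigs\u002Fprompts\u002Finfer.yaml'\n```\n\n#### Accelerated Inference (Recommended)\nThe new release provides an accelerated script that speeds up inference by about 9x (on an A100, from about 7 minutes down to about 50 seconds per 120 frames):\n\n```bash\npython infer_acc.py --config='.\u002Fconfigs\u002Fprompts\u002Finfer_acc.yaml'\n```\n\n> **Tip**: Before using the command line, edit the corresponding `.yaml` configuration file and set `ref_image_path` (reference image path) and `driving_audio_path` (driving audio path) to your actual local file paths.","A short-video MCN agency urgently needs to batch-produce 'half-body HD talking-head' videos for its educational creators so that it can respond quickly to trending topics.\n\n### Without echomimic_v2\n- **Stiff, limited motion**: Traditional digital-human solutions drive only facial expressions, so the creator's body stays motionless while speaking and the footage lacks realism and impact.\n- **Cumbersome production pipeline**: Face repainting, body animation compositing, and post-production alignment must each be done separately, so a single video takes hours to produce, which is too slow for breaking topics.\n- **High hardware barrier**: Generating high-quality video relies on multi-GPU parallel inference, and a single render takes more than 7 minutes, which seriously slows content output.\n- **Hard-to-control poses**: It is difficult to accurately reproduce the creator's characteristic gestures and body language from a reference video, so the generated videos lack a recognizable personal style.\n\n### With echomimic_v2\n- **Natural half-body motion**: echomimic_v2 supports high-fidelity half-body driving. It reproduces lip movements accurately and also generates natural head motion and hand gestures in sync, making the character lifelike.\n- **Efficient end-to-end generation**: With just one reference image and one driving video, it generates the complete animation in one step, consolidating previously separate production stages and greatly shortening the workflow.\n- **A leap in inference speed**: With the accelerated inference script, generating a 120-frame video on a single A100 takes only about 50 seconds, nearly a 9x speedup, which enables minute-level video delivery.\n- **Well-aligned poses**: The built-in RefImg-Pose alignment mechanism automatically calibrates pose differences between the reference image and the driving video, so the creator's signature body movements are faithfully reproduced.\n\nechomimic_v2 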
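streamlines the workflow and dramatically speeds up generation, making expressive half-body digital-human video creation as simple and efficient as editing a document.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fantgroup_echomimic_v2_50fef8e4.png","antgroup","Ant Group","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fantgroup_0417229b.jpg","Make it easy to do business anywhere.",null,"https:\u002F\u002Fwww.antgroup.com","https:\u002F\u002Fgithub.com\u002Fantgroup",[83,87,91],{"name":84,"color":85,"percentage":86},"Python","#3572A5",94.7,{"name":88,"color":89,"percentage":90},"Jupyter Notebook","#DA5B0B",4.7,{"name":92,"color":93,"percentage":94},"Shell","#89e051",0.6,4525,533,"2026-04-03T08:45:39","Apache-2.0","Linux","NVIDIA GPU required; tested models include A100 (80G), RTX4090D (24G), V100 (16G); requires CUDA >= 11.7","Not specified",{"notes":103,"python":104,"dependencies":105},"The official setup script supports Linux only (CentOS 7.2\u002FUbuntu 22.04). You need to download ffmpeg-static manually and set its environment variable. The model weights must be downloaded from HuggingFace or ModelScope and include core components such as the denoising UNet, the reference UNet, and the motion module. An accelerated inference version is provided that speeds up generation by about 9x on an A100.","3.8 \u002F 3.10 \u002F 3.11 (3.10 recommended)",[106,107,108,109,110,111,112],"torch==2.5.1","torchvision==0.20.1","torchaudio==2.5.1","xformers==0.0.28.post3","torchao","facenet_pytorch==2.6.0","ffmpeg-static",[14,35,39],[115,116,117,118,119,120,121,122],"audio-driven-portrait-animations","audio-driven-talking-face","human-animation","talking-face-generation","talking-head","audio-driven-body-animation","cvpr2025","video-generation",7,"2026-03-27T02:49:30.150509","2026-04-06T05:27:29.809990",[127,132,137,142,147,152,157,162],{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},10440,"Is deployment on Windows supported? What are the GPU requirements?","Windows is supported. Users can deploy directly with the one-click installer package. As for VRAM, at least 13 GB of dedicated VRAM is recommended; 8 GB or 10 GB may be insufficient to run, or may make generation extremely slow because shared memory is used. Deployment video tutorials and documentation are available on Bilibili and in Feishu Docs.","https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fissues\u002F3",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},10441,"What should I do if the generated video flickers or is very noisy?","Video flicker is usually related to the generation parameters or the input image. See the document on noise in generated videos in the official Discussions and adjust accordingly. Some users tried green-screen images and still saw flicker, so that is not a general fix. Check that you used the recommended alignment code to adjust real-person photos, and make sure the inference steps and CFG 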
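parameters are set to reasonable values.","https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fissues\u002F73",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},10442,"What should I do if an error at runtime says 'diffusion_pytorch_model.safetensors' cannot be found?","This error usually does not affect the final inference. At runtime the program loads the weights from 'pretrained_weights\u002Freference_unet.pth', so even if it reports a missing safetensors file, video generation usually works as long as the core weight files exist. If it really does not run, check that the model download is complete or see the related discussion in Issue #85.","https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fissues\u002F42",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},10443,"What should I do if the triton module cannot be installed on Windows and causes an error?","The triton module is indeed hard to install on Windows, but some users report that the program runs without it (the web page starts), only with significantly slower generation (for example, one video may take 20 minutes with 16 GB of VRAM). If you must install it, try downloading the source from the official site and building the latest version, or ignore the error for now and run the test examples directly.","https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fissues\u002F39",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},10444,"How can I reduce VRAM usage or run on a low-VRAM GPU?","EchoMimicV2 has high VRAM requirements, roughly 13 GB of dedicated VRAM. With 8 GB or 10 GB it usually cannot run properly, or generation takes extremely long because shared memory is used. The maintainers have not yet provided CPU offload or other specific techniques for substantially reducing VRAM usage; consider upgrading your GPU or using a high-VRAM cloud instance.","https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fissues\u002F6",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},10445,"Why is inference no faster even though I am using the accelerated model?","The lack of speedup is usually because the acceleration pipeline is not configured correctly. Downloading the accelerated model alone is not enough. You must: 1. use the dedicated accelerated inference script (such as infer_acc.py) rather than the regular UI; 2. 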
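manually change the step (number of inference steps) and cfg parameters in the UI or in code to the low values that accelerated mode expects (the default values are slow and give poor results). If you only use the default UI without adjusting these parameters, the acceleration has no effect.","https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fissues\u002F118",{"id":158,"question_zh":159,"answer_zh":160,"source_url":161},10446,"How can I generate videos from my own images while avoiding head distortion or instability?","When using a custom image, first preprocess and adjust the real-person photo with the official alignment code (see demo.ipynb). Using an unaligned image directly may lead to an unstable head or a blurry picture. There is currently no clear evidence that the model must be fine-tuned again; correct preprocessing and alignment are the key.","https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fissues\u002F35",{"id":163,"question_zh":164,"answer_zh":165,"source_url":166},10447,"What does each pose in the pose files represent? Is there any documentation?","The maintainers currently provide some preset pose files in demo.ipynb. There is not yet detailed written documentation of which motion each pose corresponds to (for example, subtle-motion poses suited to male subjects). A demo that extracts poses directly from video, along with more playful gestures, is planned. Users can inspect the coordinate data in the npy files to infer the poses, or wait for an official update.","https:\u002F\u002Fgithub.com\u002Fantgroup\u002Fechomimic_v2\u002Fissues\u002F75",[]]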