[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Picsart-AI-Research--Text2Video-Zero":3,"tool-Picsart-AI-Research--Text2Video-Zero":65},[4,18,32,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,3,"2026-04-06T03:28:53",[13,14,15,16],"开发框架","图像","Agent","视频","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":24,"last_commit_at":25,"category_tags":26,"status":17},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",85052,2,"2026-04-08T11:03:08",[14,27,16,28,15,29,30,13,31],"数据工具","插件","其他","语言模型","音频",{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":10,"last_commit_at":38,"category_tags":39,"status":17},3833,"MoneyPrinterTurbo","harry0703\u002FMoneyPrinterTurbo","MoneyPrinterTurbo 是一款利用 AI 
大模型技术，帮助用户一键生成高清短视频的开源工具。只需输入一个视频主题或关键词，它就能全自动完成从文案创作、素材匹配、字幕合成到背景音乐搭配的全过程，最终输出完整的竖屏或横屏短视频。\n\n这款工具主要解决了传统视频制作流程繁琐、门槛高以及素材版权复杂等痛点。无论是需要快速产出内容的自媒体创作者，还是希望尝试视频生成的普通用户，无需具备专业的剪辑技能或昂贵的硬件配置（普通电脑即可运行），都能轻松上手。同时，其清晰的 MVC 架构和对多种主流大模型（如 DeepSeek、Moonshot、通义千问等）的广泛支持，也使其成为开发者进行二次开发或技术研究的理想底座。\n\nMoneyPrinterTurbo 的独特亮点在于其高度的灵活性与本地化友好性。它不仅支持中英文双语及多种语音合成，允许用户精细调整字幕样式和画面比例，还特别优化了国内网络环境下的模型接入方案，让用户无需依赖 VPN 即可使用高性能国产大模型。此外，工具提供批量生成模式，可一次性产出多个版本供用户择优，极大地提升了内容创作的效率与质量。",54991,"2026-04-05T12:23:02",[13,30,15,16,14],{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":24,"last_commit_at":46,"category_tags":47,"status":17},2179,"oh-my-openagent","code-yeongyu\u002Foh-my-openagent","oh-my-openagent（简称 omo）是一款强大的开源智能体编排框架，前身名为 oh-my-opencode。它致力于打破单一模型供应商的生态壁垒，解决开发者在构建 AI 应用时面临的“厂商锁定”难题。不同于仅依赖特定模型的封闭方案，omo 倡导开放市场理念，支持灵活调度多种主流大模型：利用 Claude、Kimi 或 GLM 进行任务编排，调用 GPT 处理复杂推理，借助 Minimax 提升响应速度，或发挥 Gemini 的创意优势。\n\n这款工具特别适合希望摆脱平台限制、追求极致性能与成本平衡的开发者及研究人员使用。通过统一接口，用户可以轻松组合不同模型的长处，构建更高效、更具适应性的智能体系统。其独特的技术亮点在于“全模型兼容”架构，让用户不再受制于某一家公司的策略变动或定价调整，真正实现对前沿模型资源的自由驾驭。无论是构建自动化编码助手，还是开发多步骤任务处理流程，oh-my-openagent 都能提供灵活且稳健的基础设施支持，助力用户在快速演进的 AI 生态中保持技术主动权。",50011,"2026-04-09T23:35:26",[16,30,13,14,15],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":17},5295,"tabby","TabbyML\u002Ftabby","Tabby 是一款可私有化部署的开源 AI 编程助手，旨在为开发团队提供 GitHub Copilot 的安全替代方案。它核心解决了代码辅助过程中的数据隐私顾虑与云端依赖问题，让企业能够在完全掌控数据的前提下享受智能代码补全、聊天问答及上下文理解带来的效率提升。\n\n这款工具特别适合注重代码安全的企业开发团队、希望本地化运行大模型的科研机构，以及拥有消费级显卡的个人开发者。Tabby 的最大亮点在于其“开箱即用”的自包含架构，无需配置复杂的数据库或依赖云服务即可快速启动。同时，它对硬件十分友好，支持在普通的消费级 GPU 上流畅运行，大幅降低了部署门槛。此外，Tabby 提供了标准的 OpenAPI 接口，能轻松集成到现有的云 IDE 或内部开发流程中，并支持通过 REST API 接入自定义文档以增强知识上下文。从代码自动补全到基于 Git 仓库的智能问答，Tabby 
致力于成为开发者身边懂业务、守安全的智能伙伴。",33308,"2026-04-07T20:23:18",[13,30,15,14,16],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":62,"last_commit_at":63,"category_tags":64,"status":17},2483,"onlook","onlook-dev\u002Fonlook","Onlook 是一款专为设计师打造的开源 AI 优先设计工具，被誉为“设计师版的 Cursor”。它旨在打破设计与开发之间的壁垒，让用户能够以可视化的方式直接构建、样式化和编辑 React 应用。通过 Onlook，用户无需深入编写复杂代码，即可在类似 Figma 的直观界面中完成网页原型的搭建与调整，并实时预览最终效果。\n\n这款工具主要解决了传统工作流中设计稿到代码转换效率低、沟通成本高的问题。以往，设计师使用 Figma 等工具完成设计后，需要开发人员手动将其转化为代码，过程繁琐且容易出错。Onlook 允许用户直接在浏览器 DOM 中进行可视化编辑，底层自动生成基于 Next.js 和 TailwindCSS 的高质量代码，实现了“所见即所得”的开发体验。它不仅支持从文本或图像快速生成应用，还具备分支管理、资源管理及一键部署等功能，极大地简化了从创意到成品的流程。\n\nOnlook 特别适合前端开发者、UI\u002FUX 设计师以及希望快速验证产品创意的独立开发者使用。对于设计师而言，它降低了参与前端开发的门槛；对于开发者来说，它提供了一个高效的视觉化调试和原型构建环境。其核心技术亮点在于",25006,4,"2026-04-03T01:50:49",[15,14,16,13],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":71,"readme_en":72,"readme_zh":73,"quickstart_zh":74,"use_case_zh":75,"hero_image_url":76,"owner_login":77,"owner_name":78,"owner_avatar_url":79,"owner_bio":80,"owner_company":81,"owner_location":81,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":85,"stars":97,"forks":98,"last_commit_at":99,"license":100,"difficulty_score":10,"env_os":101,"env_gpu":102,"env_ram":101,"env_deps":103,"category_tags":114,"github_topics":115,"view_count":24,"oss_zip_url":81,"oss_zip_packed_at":81,"status":17,"created_at":118,"updated_at":119,"faqs":120,"releases":150},6170,"Picsart-AI-Research\u002FText2Video-Zero","Text2Video-Zero","[ICCV 2023 Oral] Text-to-Image Diffusion Models are Zero-Shot Video Generators","Text2Video-Zero 是一款创新的开源 AI 工具，能够直接将文本描述转化为连贯的视频内容，甚至支持根据指令编辑现有视频。它的核心突破在于“零样本”（Zero-Shot）能力：无需针对视频数据进行额外的昂贵训练，即可直接利用现有的文本生成图像模型（如 Stable Diffusion）来生成视频。这有效解决了传统视频生成模型依赖大量特定数据集训练、成本高且灵活性差的痛点，同时确保了生成画面在时间维度上的流畅与一致。\n\n除了基础的文生视频，Text2Video-Zero 还展现了强大的可控性。用户不仅可以输入文字，还能结合姿态、边缘轮廓或深度信息来精准引导视频动作与结构，实现了类似“视频版 Instruct-Pix2Pix\"的指令式编辑功能。技术层面，它集成了令牌合并（Token 
Merging）等优化手段，显著降低了显存需求，使更多设备能够运行。\n\n这款工具非常适合 AI 研究人员探索多模态生成机制，也适合开发者快速构建视频应用原型。对于设计师和内容创作者而言，它提供了一个低门槛、高自由度的创意实验平台，让将脑海中的动态场景瞬间可视化成为可能。随着其被集成到主流 Diffus","Text2Video-Zero 是一款创新的开源 AI 工具，能够直接将文本描述转化为连贯的视频内容，甚至支持根据指令编辑现有视频。它的核心突破在于“零样本”（Zero-Shot）能力：无需针对视频数据进行额外的昂贵训练，即可直接利用现有的文本生成图像模型（如 Stable Diffusion）来生成视频。这有效解决了传统视频生成模型依赖大量特定数据集训练、成本高且灵活性差的痛点，同时确保了生成画面在时间维度上的流畅与一致。\n\n除了基础的文生视频，Text2Video-Zero 还展现了强大的可控性。用户不仅可以输入文字，还能结合姿态、边缘轮廓或深度信息来精准引导视频动作与结构，实现了类似“视频版 Instruct-Pix2Pix\"的指令式编辑功能。技术层面，它集成了令牌合并（Token Merging）等优化手段，显著降低了显存需求，使更多设备能够运行。\n\n这款工具非常适合 AI 研究人员探索多模态生成机制，也适合开发者快速构建视频应用原型。对于设计师和内容创作者而言，它提供了一个低门槛、高自由度的创意实验平台，让将脑海中的动态场景瞬间可视化成为可能。随着其被集成到主流 Diffusers 库中，Text2Video-Zero 正成为连接静态图像生成与动态视频创作的重要桥梁。","\n\n\n# Text2Video-Zero\n\nThis repository is the official implementation of [Text2Video-Zero](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13439).\n\n\n**[Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13439)**\n\u003C\u002Fbr>\nLevon Khachatryan,\nAndranik Movsisyan,\nVahram Tadevosyan,\nRoberto Henschel,\n[Zhangyang Wang](https:\u002F\u002Fwww.ece.utexas.edu\u002Fpeople\u002Ffaculty\u002Fatlas-wang), Shant Navasardyan, [Humphrey Shi](https:\u002F\u002Fwww.humphreyshi.com)\n\u003C\u002Fbr>\n\n[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13439) | [Video](https:\u002F\u002Fwww.dropbox.com\u002Fs\u002Fuv90mi2z598olsq\u002FText2Video-Zero.MP4?dl=0) | [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FPAIR\u002FText2Video-Zero) | [Project](https:\u002F\u002Ftext2video-zero.github.io\u002F)\n\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_965beccb9580.png\" width=\"800px\"\u002F>  \n\u003Cbr>\n\u003Cem>Our method Text2Video-Zero enables zero-shot video generation using (i) 
a textual prompt (see rows 1, 2),  (ii) a prompt combined with guidance from poses or edges (see lower right), and  (iii)  Video Instruct-Pix2Pix, i.e., instruction-guided video editing (see lower left). \n    Results are temporally consistent and follow closely the guidance and textual prompts.\u003C\u002Fem>\n\u003C\u002Fp>\n\n## News\n\n* [03\u002F23\u002F2023] Paper [Text2Video-Zero](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13439) released!\n* [03\u002F25\u002F2023] The [first version](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FPAIR\u002FText2Video-Zero) of our huggingface demo (containing `zero-shot text-to-video generation` and  `Video Instruct Pix2Pix`) released!\n* [03\u002F27\u002F2023] The [full version](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FPAIR\u002FText2Video-Zero) of our huggingface demo released! Now also included: `text and pose conditional video generation`, `text and edge conditional video generation`, and \n`text, edge and dreambooth conditional video generation`.\n* [03\u002F28\u002F2023] Code for all our generation methods released! We added a new low-memory setup. Minimum required GPU VRAM is currently **12 GB**. It will be further reduced in the upcoming releases. \n* [03\u002F29\u002F2023] Improved [Huggingface demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FPAIR\u002FText2Video-Zero)! (i) For text-to-video generation, **any base model for stable diffusion** and **any dreambooth model** hosted on huggingface can now be loaded! (ii) We improved the quality of Video Instruct-Pix2Pix. (iii) We added two longer examples for Video Instruct-Pix2Pix.   \n* [03\u002F30\u002F2023] New code released! It includes all improvements of our latest huggingface iteration. See the news update from `03\u002F29\u002F2023`. In addition, generated videos (text-to-video) can have **arbitrary length**. \n* [04\u002F06\u002F2023] We integrated [Token Merging](https:\u002F\u002Fgithub.com\u002Fdbolya\u002Ftomesd) into our code. 
When the highest compression is used and the chunk size is set to `2`, our code can run with **less than 7 GB VRAM**.  \n* [04\u002F11\u002F2023] New code and Huggingface demo released! We integrated **depth control**, based on [MiDaS](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.01341.pdf).\n* [04\u002F13\u002F2023] Our method has been integrated into 🧨 [Diffusers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fapi\u002Fpipelines\u002Ftext_to_video_zero)!\n\n## Contribute\nWe are on a journey to democratize AI and empower the creativity of everyone, and we believe Text2Video-Zero is a great research direction to unleash the zero-shot video generation and editing capacity of the amazing text-to-image models!\n\nTo achieve this goal, all contributions are welcome. Please check out these external implementations and extensions of Text2Video-Zero. We thank the authors for their efforts and contributions:\n* https:\u002F\u002Fgithub.com\u002FJiauZhang\u002FText2Video-Zero\n* https:\u002F\u002Fgithub.com\u002Fcamenduru\u002Ftext2video-zero-colab\n* https:\u002F\u002Fgithub.com\u002FSHI-Labs\u002FText2Video-Zero-sd-webui\n\n\n\n\n\n\n## Setup\n\n\n\n1. Clone this repository and enter:\n\n``` shell\ngit clone https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FText2Video-Zero.git\ncd Text2Video-Zero\u002F\n```\n2. 
Install requirements using Python 3.9 and CUDA >= 11.6\n``` shell\nvirtualenv --system-site-packages -p python3.9 venv\nsource venv\u002Fbin\u002Factivate\npip install -r requirements.txt\n```\n\n\n\n\n--- \n\n\n\n## Inference API\n\n\nTo run inferences create an instance of `Model` class\n\n``` python\nimport torch\nfrom model import Model\n\nmodel = Model(device = \"cuda\", dtype = torch.float16)\n```\n\n\n---\n\n\n### Text-To-Video\nTo directly call our text-to-video generator, run this python command which stores the result in `tmp\u002Ftext2video\u002FA_horse_galloping_on_a_street.mp4` :\n``` python\nprompt = \"A horse galloping on a street\"\nparams = {\"t0\": 44, \"t1\": 47 , \"motion_field_strength_x\" : 12, \"motion_field_strength_y\" : 12, \"video_length\": 8}\n\nout_path, fps = f\".\u002Ftext2video_{prompt.replace(' ','_')}.mp4\", 4\nmodel.process_text2video(prompt, fps = fps, path = out_path, **params)\n```\n\nTo use a different stable diffusion base model run this python command:\n``` python\nfrom hf_utils import get_model_list\nmodel_list = get_model_list()\nfor idx, name in enumerate(model_list):\n  print(idx, name)\nidx = int(input(\"Select the model by the listed number: \")) # select the model of your choice\nmodel.process_text2video(prompt, model_name = model_list[idx], fps = fps, path = out_path, **params)\n```\n\n\n#### Hyperparameters (Optional)\n\nYou can define the following hyperparameters:\n* **Motion field strength**:   `motion_field_strength_x` = $\\delta_x$  and `motion_field_strength_y` = $\\delta_y$ (see our paper, Sect. 3.3.1). Default: `motion_field_strength_x=motion_field_strength_y= 12`.\n* $T$ and $T'$ (see our paper, Sect. 3.3.1). Define values `t0` and `t1` in the range `{0,...,50}`. Default: `t0=44`, `t1=47` (DDIM steps). Corresponds to timesteps `881` and `941`, respectively. \n* **Video length**: Define the number of frames `video_length` to be generated. 
Default: `video_length=8`.\n\n\n---\n\n\n### Text-To-Video with Pose Control\nTo directly call our text-to-video generator with pose control, run this python command:\n``` python\nprompt = 'an astronaut dancing in outer space'\nmotion_path = '__assets__\u002Fposes_skeleton_gifs\u002Fdance1_corr.mp4'\nout_path = f\".\u002Ftext2video_pose_guidance_{prompt.replace(' ','_')}.gif\"\nmodel.process_controlnet_pose(motion_path, prompt=prompt, save_path=out_path)\n```\n\n\n---\n\n\n\n### Text-To-Video with Edge Control\nTo directly call our text-to-video generator with edge control, run this python command:\n``` python\nprompt = 'oil painting of a deer, a high-quality, detailed, and professional photo'\nvideo_path = '__assets__\u002Fcanny_videos_mp4\u002Fdeer.mp4'\nout_path = f'.\u002Ftext2video_edge_guidance_{prompt}.mp4'\nmodel.process_controlnet_canny(video_path, prompt=prompt, save_path=out_path)\n```\n\n#### Hyperparameters\n\nYou can define the following hyperparameters for Canny edge detection:\n* **low threshold**. Define value `low_threshold` in the range $(0, 255)$. Default: `low_threshold=100`.\n* **high threshold**. Define value `high_threshold` in the range $(0, 255)$. Default: `high_threshold=200`. Make sure that `high_threshold` > `low_threshold`.\n\nYou can give hyperparameters as arguments to `model.process_controlnet_canny`\n\n\n---\n\n\n### Text-To-Video with Edge Guidance and Dreambooth specialization\nLoad a dreambooth model then proceed as described in `Text-To-Video with Edge Guidance`\n``` python\n\nprompt = 'your prompt'\nvideo_path = 'path\u002Fto\u002Fyour\u002Fvideo'\ndreambooth_model_path = 'path\u002Fto\u002Fyour\u002Fdreambooth\u002Fmodel'\nout_path = f'.\u002Ftext2video_edge_db_{prompt}.gif'\nmodel.process_controlnet_canny_db(dreambooth_model_path, video_path, prompt=prompt, save_path=out_path)\n```\n\nThe value `video_path` can be the path to a `mp4` file. 
To use one of the example videos provided, set `video_path=\"woman1\"`, `video_path=\"woman2\"`, `video_path=\"woman3\"`, or `video_path=\"man1\"`. \n \n\nThe value `dreambooth_model_path` can either be a link to a diffuser model file, or the name of one of the dreambooth models provided. To this end, set `dreambooth_model_path = \"Anime DB\"`, `dreambooth_model_path = \"Avatar DB\"`, `dreambooth_model_path = \"GTA-5 DB\"`, or `dreambooth_model_path = \"Arcane DB\"`.  The corresponding keywords are: `1girl` (for `Anime DB`), `arcane style` (for `Arcane DB`) `avatar style` (for `Avatar DB`) and `gtav style`  (for `GTA-5 DB`).\n\n\n#### Custom Dreambooth Models\n\n\nTo load custom Dreambooth models, [transfer](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F12) control to the custom model and  [convert](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers\u002Fblob\u002Fmain\u002Fscripts\u002Fconvert_original_stable_diffusion_to_diffusers.py) it to diffuser format. Then, the value of `dreambooth_model_path` must link to the folder containing the diffuser file. Dreambooth models can be obtained, for instance, from [CIVITAI](https:\u002F\u002Fcivitai.com). 
\n\n\n\n---\n\n\n\n### Video Instruct-Pix2Pix\n\nTo perform pix2pix video editing, run this python command:\n``` python\nprompt = 'make it Van Gogh Starry Night'\nvideo_path = '__assets__\u002Fpix2pix video\u002Fcamel.mp4'\nout_path = f'.\u002Fvideo_instruct_pix2pix_{prompt}.mp4'\nmodel.process_pix2pix(video_path, prompt=prompt, save_path=out_path)\n```\n\n\n---\n\n\n### Text-To-Video with Depth Control\n\nTo directly call our text-to-video generator with depth control, run this python command:\n``` python\nprompt = 'oil painting of a deer, a high-quality, detailed, and professional photo'\nvideo_path = '__assets__\u002Fdepth_videos\u002Fdeer.mp4'\nout_path = f'.\u002Ftext2video_depth_control_{prompt}.mp4'\nmodel.process_controlnet_depth(video_path, prompt=prompt, save_path=out_path)\n```\n\n\n\n---\n\n\n\n\n### Low Memory Inference\nEach of the interfaces introduced above can be run in a low-memory setup. In the minimal setup, a GPU with **12 GB VRAM** is sufficient. \n\nTo reduce the memory usage, add `chunk_size=k` as an additional parameter when calling one of the inference APIs defined above. The integer value `k` must be in the range `{2,...,video_length}`. It defines the number of frames that are processed at once (without any loss in quality). The lower the value, the less memory is needed.\n\nWhen using the gradio app, set `chunk_size` in the `Advanced options`. \n\nThanks to the great work of [Token Merging](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17604), the memory usage can be further reduced. It can be configured using the  `merging_ratio` parameter with values in `[0,1]`. The higher the value, the more compression is applied (leading to faster inference and lower memory requirements). Be aware that values that are too high will decrease the image quality. 
\n\n \nWe plan to continue optimizing our code to enable even lower memory consumption.\n\n---\n\n\n### Ablation Study\nTo replicate the ablation study, add additional parameters when calling the above defined inference APIs.\n*  To deactivate `cross-frame attention`: Add `use_cf_attn=False` to the parameter list.\n* To deactivate enriching latent codes with `motion dynamics`: Add `use_motion_field=False` to the parameter list.\n\n\nNote: Adding `smooth_bg=True` activates background smoothing. However, our  code does not include the salient object detector necessary to run that code.\n\n\n\n\n---\n\n\n\n## Inference using Gradio\n\n\n\u003Cdetails closed>\n\u003Csummary>Click to see details.\u003C\u002Fsummary>\n\nFrom the project root folder, run this shell command:\n``` shell\npython app.py\n```\n\nThen access the app [locally](http:\u002F\u002F127.0.0.1:7860) with a browser.\n\nTo access the app remotely, run this shell command:\n``` shell\npython app.py --public_access\n```\nFor security information about public access we refer to the documentation of [gradio](https:\u002F\u002Fgradio.app\u002Fsharing-your-app\u002F#security-and-file-access).\n\n\u003C\u002Fdetails>\n\n\n\n---  \n\n\n\n## Results\n\n### Text-To-Video\n\u003Ctable class=\"center\">\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_4b87723c660a.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_99e580a8dd8d.gif\">\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_b0b1c2903188.gif\">\u003C\u002Ftd>              \n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_bdb44132e866.gif\">\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% 
align=\"center\">\"A cat is running on the grass\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"A panda is playing guitar on times square\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"A man is running in the snow\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"An astronaut is skiing down the hill\"\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_42df88778153.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_dce97709532f.gif\">\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_43a9ee4dbf93.gif\">\u003C\u002Ftd>              \n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_fd8c0a3e19e6.gif\">\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">\"A panda surfing on a wakeboard\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"A bear dancing on times square\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"A man is riding a bicycle in the sunshine\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"A horse galloping on a street\"\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_40ff7e59b736.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_e112b3244479.gif\">\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_c6639037a971.gif\">\u003C\u002Ftd>              \n  \u003Ctd>\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_f977bd72a1e6.gif\">\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">\"A tiger walking alone down the street\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"A panda surfing on a wakeboard\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"A horse galloping on a street\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"A cute cat running in a beautiful meadow\"\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_d8232cfd8e1d.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_6b1163906fea.gif\">\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_f126ee84b014.gif\">\u003C\u002Ftd>              \n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_096df1987026.gif\">\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">\"A horse galloping on a street\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"A panda walking alone down the street\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"A dog is walking down the street\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"An astronaut is waving his hands on the moon\"\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\n\u003C\u002Ftable>\n\n### Text-To-Video with Pose Guidance\n\n\n\u003Ctable class=\"center\">\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_71d3c9ac7cef.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_c23804a1f11d.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_eb3fd8568b31.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_da2a318f4140.gif\" raw=true>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">\"A bear dancing on the concrete\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"An alien dancing under a flying saucer\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"A panda dancing in Antarctica\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"An astronaut dancing in the outer space\"\u003C\u002Ftd>\n\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n### Text-To-Video with Edge Guidance\n\n\n\n\u003Ctable class=\"center\">\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_42783818cc72.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_7b964948a1cc.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_7fb728d45a4d.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_fe53a5084794.gif\" raw=true>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">\"White butterfly\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"Beautiful girl\"\u003C\u002Ftd>\n    \u003Ctd width=25% align=\"center\">\"A jellyfish\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"beautiful girl 
halloween style\"\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_033796125b4d.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_5bc5484ab5a3.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_c90b81695cbe.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_7fe00eea6796.gif\" raw=true>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">\"Wild fox is walking\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"Oil painting of a beautiful girl close-up\"\u003C\u002Ftd>\n    \u003Ctd width=25% align=\"center\">\"A santa claus\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"A deer\"\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003C\u002Ftable>\n\n\n### Text-To-Video with Edge Guidance and Dreambooth specialization\n\n\n\n\n\u003Ctable class=\"center\">\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_f7e89612ffab.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_c4066393036a.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_e84ef5ac7e20.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_f82760a80d2b.gif\" raw=true>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">\"anime 
style\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"arcane style\"\u003C\u002Ftd>\n    \u003Ctd width=25% align=\"center\">\"gta-5 man\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"avatar style\"\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003C\u002Ftable>\n\n\n## Video Instruct Pix2Pix\n\n\u003Ctable class=\"center\">\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_7bfeb293418e.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_22a00e1490b5.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_2b4f0ba65133.gif\" raw=true>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">\"Replace man with chimpanze\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"Make it Van Gogh Starry Night style\"\u003C\u002Ftd>\n    \u003Ctd width=25% align=\"center\">\"Make it Picasso style\"\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_fc8239fbeebb.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_7436fb9d7956.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_f7624ce30115.gif\" raw=true>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">\"Make it Expressionism style\"\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">\"Make it night\"\u003C\u002Ftd>\n    \u003Ctd width=25% align=\"center\">\"Make it autumn\"\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n## Related 
Links \n\n* [High-Resolution Image Synthesis with Latent Diffusion Models (a.k.a. LDM & Stable Diffusion)](https:\u002F\u002Fommer-lab.com\u002Fresearch\u002Flatent-diffusion-models\u002F)\n* [InstructPix2Pix: Learning to Follow Image Editing Instructions](https:\u002F\u002Fwww.timothybrooks.com\u002Finstruct-pix2pix\u002F)\n* [Adding Conditional Control to Text-to-Image Diffusion Models (a.k.a ControlNet)](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet)\n* [Diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers)\n* [Token Merging for Stable Diffusion](https:\u002F\u002Fgithub.com\u002Fdbolya\u002Ftomesd)\n\n## License\nOur code is published under the CreativeML Open RAIL-M license. The license provided in this repository applies to all additions and contributions we make upon the original stable diffusion code. The original stable diffusion code is under the CreativeML Open RAIL-M license, which can be found [here](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion\u002Fblob\u002Fmain\u002FLICENSE).\n\n\n## BibTeX\nIf you use our work in your research, please cite our publication:\n```\n@article{text2video-zero,\n    title={Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators},\n    author={Khachatryan, Levon and Movsisyan, Andranik and Tadevosyan, Vahram and Henschel, Roberto and Wang, Zhangyang and Navasardyan, Shant and Shi, Humphrey},\n    journal={arXiv preprint arXiv:2303.13439},\n    year={2023}\n}\n```\n\n\n\n## Alternative ways to use Text2Video-Zero\n\nText2Video-Zero can alternatively be used via \n\n* 🧨 [Diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers) Library.\n\n\u003Cdetails closed>\n\u003Csummary>Click to see details.\u003C\u002Fsummary>\n\n\n\n### Text2Video-Zero in 🧨 Diffusers Library\n\nText2Video-Zero is [available](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fapi\u002Fpipelines\u002Ftext_to_video_zero) in 🧨 Diffusers, starting from version 
`0.15.0`! \n\n\n\n[Diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers) can be installed using the following command:\n\n\n``` shell\nvirtualenv --system-site-packages -p python3.9 venv\nsource venv\u002Fbin\u002Factivate\npip install diffusers torch imageio\n```\n\n\nTo generate a video from a text prompt, run the following command:\n\n``` python\nimport torch\nimport imageio\nfrom diffusers import TextToVideoZeroPipeline\n\n# load stable diffusion model weights\nmodel_id = \"runwayml\u002Fstable-diffusion-v1-5\"\n\n# create a TextToVideoZero pipeline\npipe = TextToVideoZeroPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to(\"cuda\")\n\n# define the text prompt\nprompt = \"A panda is playing guitar on times square\"\n\n# generate the video using our pipeline\nresult = pipe(prompt=prompt).images\nresult = [(r * 255).astype(\"uint8\") for r in result]\n\n# save the resulting image\nimageio.mimsave(\"video.mp4\", result, fps=4)\n```\n\n\nFor more information, including how to run `text and pose conditional video generation`, `text and edge conditional video generation` and `text and edge and dreambooth conditional video generation`, please check the [documentation](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fapi\u002Fpipelines\u002Ftext_to_video_zero).  
\n\n\n\n\u003C\u002Fdetails>\n\n","# Text2Video-Zero\n\n本仓库是 [Text2Video-Zero](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13439) 的官方实现。\n\n\n**[Text2Video-Zero：文本到图像扩散模型即为零样本视频生成器](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13439)**\n\u003C\u002Fbr>\nLevon Khachatryan,\nAndranik Movsisyan,\nVahram Tadevosyan,\nRoberto Henschel,\n[Zhangyang Wang](https:\u002F\u002Fwww.ece.utexas.edu\u002Fpeople\u002Ffaculty\u002Fatlas-wang), Shant Navasardyan, [Humphrey Shi](https:\u002F\u002Fwww.humphreyshi.com)\n\u003C\u002Fbr>\n\n[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13439) | [视频](https:\u002F\u002Fwww.dropbox.com\u002Fs\u002Fuv90mi2z598olsq\u002FText2Video-Zero.MP4?dl=0) | [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FPAIR\u002FText2Video-Zero) | [项目](https:\u002F\u002Ftext2video-zero.github.io\u002F)\n\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_965beccb9580.png\" width=\"800px\"\u002F>  \n\u003Cbr>\n\u003Cem>我们的方法 Text2Video-Zero 能够实现零样本视频生成，支持 (i) 文本提示（见第1、2行），(ii) 结合姿态或边缘引导的提示（见右下角），以及 (iii) Video Instruct-Pix2Pix，即指令引导的视频编辑（见左下角）。生成结果在时间上保持一致，并紧密遵循引导和文本提示。\u003C\u002Fem>\n\u003C\u002Fp>\n\n## 最新消息\n\n* [2023年3月23日] 论文 [Text2Video-Zero](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13439) 发布！\n* [2023年3月25日] 我们的 Hugging Face 演示 [第一个版本](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FPAIR\u002FText2Video-Zero)（包含 `零样本文本到视频生成` 和 `Video Instruct Pix2Pix`）发布！\n* [2023年3月27日] 我们的 Hugging Face 演示 [完整版本](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FPAIR\u002FText2Video-Zero) 发布！现在还新增了：`文本与姿态条件下的视频生成`、`文本与边缘条件下的视频生成`，以及 `文本、边缘和 DreamBooth 条件下的视频生成`。\n* [2023年3月28日] 我们所有生成方法的代码均已发布！我们新增了一种低显存配置。目前所需的最低 GPU 显存为 **12 GB**。未来版本将进一步降低这一要求。\n* [2023年3月29日] 优化后的 [Hugging Face 
演示](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FPAIR\u002FText2Video-Zero) 上线！(i) 对于文本到视频生成，现在可以加载 Hugging Face 上托管的 **任意 Stable Diffusion 基础模型** 和 **任意 DreamBooth 模型**！(ii) 我们提升了 Video Instruct-Pix2Pix 的质量。(iii) 我们新增了两个更长的 Video Instruct-Pix2Pix 示例。\n* [2023年3月30日] 新版代码发布！包含了我们最新 Hugging Face 版本的所有改进。详情请参阅 3月29日的更新内容。此外，生成的文本到视频还可以具有 **任意长度**。\n* [2023年4月6日] 我们将 [Token Merging](https:\u002F\u002Fgithub.com\u002Fdbolya\u002Ftomesd) 集成到了代码中。当使用最高压缩率并将 chunk size 设置为 `2` 时，我们的代码可以在 **低于 7 GB 显存** 的情况下运行。\n* [2023年4月11日] 新版代码和 Hugging Face 演示发布！我们集成了基于 [MiDaS](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.01341.pdf) 的 **深度控制**。\n* [2023年4月13日] 我们的算法已被集成到 🧨 [Diffusers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fapi\u002Fpipelines\u002Ftext_to_video_zero) 中！\n\n## 贡献\n我们致力于推动 AI 民主化，激发每个人的创造力，而我们认为 Text2Video-Zero 是一个极具潜力的研究方向，能够释放优秀文本到图像模型在零样本视频生成和编辑方面的巨大潜能！\n\n为了实现这一目标，我们欢迎所有贡献。请查看这些 Text2Video-Zero 的外部实现和扩展。感谢各位作者的努力与贡献：\n* https:\u002F\u002Fgithub.com\u002FJiauZhang\u002FText2Video-Zero\n* https:\u002F\u002Fgithub.com\u002Fcamenduru\u002Ftext2video-zero-colab\n* https:\u002F\u002Fgithub.com\u002FSHI-Labs\u002FText2Video-Zero-sd-webui\n\n\n\n\n\n\n## 环境搭建\n\n\n\n1. 克隆本仓库并进入：\n\n``` shell\ngit clone https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FText2Video-Zero.git\ncd Text2Video-Zero\u002F\n```\n2. 
使用 Python 3.9 和 CUDA >= 11.6 安装依赖项：\n``` shell\nvirtualenv --system-site-packages -p python3.9 venv\nsource venv\u002Fbin\u002Factivate\npip install -r requirements.txt\n```\n\n\n\n\n--- \n\n\n\n## 推理 API\n\n\n要进行推理，需创建 `Model` 类的实例：\n\n``` python\nimport torch\nfrom model import Model\n\n# 在 GPU 上以 float16 精度创建模型实例\nmodel = Model(device=\"cuda\", dtype=torch.float16)\n```\n\n\n---\n\n\n### 文本到视频\n要直接调用我们的文本到视频生成器，运行以下 Python 命令，结果将保存至 `.\u002Ftext2video_A_horse_galloping_on_a_street.mp4`：\n``` python\nprompt = \"A horse galloping on a street\"\nparams = {\"t0\": 44, \"t1\": 47, \"motion_field_strength_x\": 12, \"motion_field_strength_y\": 12, \"video_length\": 8}\n\n# 输出路径由提示词生成，空格替换为下划线\nout_path, fps = f\".\u002Ftext2video_{prompt.replace(' ','_')}.mp4\", 4\nmodel.process_text2video(prompt, fps=fps, path=out_path, **params)\n```\n\n若要使用不同的 Stable Diffusion 基础模型，运行以下 Python 命令：\n``` python\nfrom hf_utils import get_model_list\n\n# 列出可用的基础模型及其编号\nmodel_list = get_model_list()\nfor idx, name in enumerate(model_list):\n  print(idx, name)\nidx = int(input(\"Select the model by the listed number: \")) # 选择您心仪的模型\nmodel.process_text2video(prompt, model_name=model_list[idx], fps=fps, path=out_path, **params)\n```\n\n\n#### 可选超参数\n\n您可以定义以下超参数：\n* **运动场强度**：`motion_field_strength_x` = $\\delta_x$ 和 `motion_field_strength_y` = $\\delta_y$（详见我们的论文第 3.3.1 节）。默认值为：`motion_field_strength_x=motion_field_strength_y=12`。\n* $T$ 和 $T'$（详见我们的论文第 3.3.1 节）。可设置 `t0` 和 `t1` 的值，范围为 `{0,...,50}`。默认值为：`t0=44`, `t1=47`（DDIM 步骤），分别对应于时间步 `881` 和 `941`。\n* **视频长度**：定义要生成的帧数 `video_length`。默认值为：`video_length=8`。\n\n\n---\n\n\n### 带姿态控制的文本到视频\n要直接调用我们的带姿态控制的文本到视频生成器，运行以下 Python 命令：\n``` python\nprompt = 'an astronaut dancing in outer space'\nmotion_path = '__assets__\u002Fposes_skeleton_gifs\u002Fdance1_corr.mp4'\nout_path = f\".\u002Ftext2video_pose_guidance_{prompt.replace(' ','_')}.gif\"\nmodel.process_controlnet_pose(motion_path, prompt=prompt, save_path=out_path)\n```\n\n\n---\n\n### 带边缘控制的文本转视频\n要直接调用我们的带边缘控制的文本转视频生成器，请运行以下 Python 
命令：\n``` python\nprompt = 'oil painting of a deer, a high-quality, detailed, and professional photo'\nvideo_path = '__assets__\u002Fcanny_videos_mp4\u002Fdeer.mp4'\nout_path = f'.\u002Ftext2video_edge_guidance_{prompt}.mp4'\nmodel.process_controlnet_canny(video_path, prompt=prompt, save_path=out_path)\n```\n\n#### 超参数\n\n您可以为 Canny 边缘检测定义以下超参数：\n* **低阈值**。定义 `low_threshold` 的值，范围为 $(0, 255)$。默认值：`low_threshold=100`。\n* **高阈值**。定义 `high_threshold` 的值，范围为 $(0, 255)$。默认值：`high_threshold=200`。请确保 `high_threshold` > `low_threshold`。\n\n您可以将这些超参数作为参数传递给 `model.process_controlnet_canny`。\n\n\n---\n\n\n### 带边缘引导和 Dreambooth 专精的文本转视频\n加载一个 Dreambooth 模型，然后按照“带边缘引导的文本转视频”中的说明进行操作。\n``` python\n\nprompt = '您的提示词'\nvideo_path = '您视频的路径'\ndreambooth_model_path = '您 Dreambooth 模型的路径'\nout_path = f'.\u002Ftext2video_edge_db_{prompt}.gif'\nmodel.process_controlnet_canny_db(dreambooth_model_path, video_path, prompt=prompt, save_path=out_path)\n```\n\n`video_path` 的值可以是 `mp4` 文件的路径。如果要使用提供的示例视频之一，可设置为 `video_path=\"woman1\"`、`video_path=\"woman2\"`、`video_path=\"woman3\"` 或 `video_path=\"man1\"`。\n\n\n`dreambooth_model_path` 的值可以是 diffusers 格式模型文件的链接，也可以是提供的 Dreambooth 模型之一的名称。为此，可设置为 `dreambooth_model_path = \"Anime DB\"`、`dreambooth_model_path = \"Avatar DB\"`、`dreambooth_model_path = \"GTA-5 DB\"` 或 `dreambooth_model_path = \"Arcane DB\"`。对应的关键词分别为：`1girl`（用于 `Anime DB`）、`arcane style`（用于 `Arcane DB`）、`avatar style`（用于 `Avatar DB`）以及 `gtav style`（用于 `GTA-5 DB`）。\n\n\n#### 自定义 Dreambooth 模型\n\n\n要加载自定义 Dreambooth 模型，需将 ControlNet 的控制能力[迁移](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet\u002Fdiscussions\u002F12)到自定义模型，并将其[转换](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers\u002Fblob\u002Fmain\u002Fscripts\u002Fconvert_original_stable_diffusion_to_diffusers.py)为 diffusers 格式。随后，`dreambooth_model_path` 的值必须指向包含 diffusers 文件的文件夹。Dreambooth 模型可以从例如 [CIVITAI](https:\u002F\u002Fcivitai.com) 获取。\n\n\n\n---\n\n\n\n### 视频指令-Pix2Pix\n\n要进行 Pix2Pix 视频编辑，请运行以下 Python 命令：\n``` python\nprompt = 'make it Van Gogh Starry Night'\nvideo_path = '__assets__\u002Fpix2pix 
video\u002Fcamel.mp4'\nout_path = f'.\u002Fvideo_instruct_pix2pix_{prompt}.mp4'\nmodel.process_pix2pix(video_path, prompt=prompt, save_path=out_path)\n```\n\n\n---\n\n\n### 带深度控制的文本转视频\n要直接调用我们的带深度控制的文本转视频生成器，请运行以下 Python 命令：\n``` python\nprompt = 'oil painting of a deer, a high-quality, detailed, and professional photo'\nvideo_path = '__assets__\u002Fdepth_videos\u002Fdeer.mp4'\nout_path = f'.\u002Ftext2video_depth_control_{prompt}.mp4'\nmodel.process_controlnet_depth(video_path, prompt=prompt, save_path=out_path)\n```\n\n\n\n---\n\n\n\n\n### 低内存推理\n上述每种接口都可以在低内存环境下运行。在最低配置下，只需一块拥有 **12 GB VRAM** 的 GPU 即可。\n\n为了减少内存使用量，在调用上述推理 API 时，可添加 `chunk_size=k` 作为额外参数。整数 `k` 的取值范围为 `{2,...,video_length}`。它定义了每次处理的帧数（不会影响画质）。数值越小，所需的内存就越少。\n\n当使用 Gradio 应用程序时，可在“高级选项”中设置 `chunk_size`。\n\n得益于 [Token Merging](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17604) 的出色工作，内存使用量还可以进一步降低。可以通过 `merging_ratio` 参数进行配置，其取值范围为 `[0,1]`。数值越高，压缩程度越大（从而加快推理速度并减少内存需求）。请注意，过高的数值会降低图像质量。 \n\n我们计划继续优化代码，以实现更低的内存消耗。\n\n---\n\n\n### 消融研究\n要复现消融研究，在调用上述推理 API 时，需添加额外的参数。\n* 若要禁用“跨帧注意力”：在参数列表中添加 `use_cf_attn=False`。\n* 若要禁用以“运动动态”丰富潜在编码的步骤：在参数列表中添加 `use_motion_field=False`。\n\n\n注意：添加 `smooth_bg=True` 可激活背景平滑功能。然而，我们的代码并不包含运行该功能所必需的显著物体检测器。\n\n\n\n\n---\n\n\n\n## 使用 Gradio 进行推理\n\n\n\u003Cdetails closed>\n\u003Csummary>点击查看详细信息。\u003C\u002Fsummary>\n\n从项目根目录运行以下 Shell 命令：\n``` shell\npython app.py\n```\n\n然后使用浏览器在本地访问应用 [http:\u002F\u002F127.0.0.1:7860](http:\u002F\u002F127.0.0.1:7860)。\n\n若要远程访问应用，运行以下 Shell 命令：\n``` shell\npython app.py --public_access\n```\n有关公共访问的安全信息，请参阅 [Gradio](https:\u002F\u002Fgradio.app\u002Fsharing-your-app\u002F#security-and-file-access) 的文档。\n\n\u003C\u002Fdetails>\n\n\n\n---  \n\n\n\n## 结果\n\n### 文本转视频\n\u003Ctable class=\"center\">\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_4b87723c660a.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_99e580a8dd8d.gif\">\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_b0b1c2903188.gif\">\u003C\u002Ftd>              \n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_bdb44132e866.gif\">\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">“一只猫正在草地上奔跑”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一只熊猫在时代广场弹吉他”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一名男子正在雪地里跑步”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一名宇航员正滑雪下山”\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_42df88778153.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_dce97709532f.gif\">\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_43a9ee4dbf93.gif\">\u003C\u002Ftd>              \n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_fd8c0a3e19e6.gif\">\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">“一只熊猫在尾波板上冲浪”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一只熊在时代广场跳舞”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一名男子在阳光下骑自行车”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一匹马正在街道上飞奔”\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_40ff7e59b736.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_e112b3244479.gif\">\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_c6639037a971.gif\">\u003C\u002Ftd>              \n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_f977bd72a1e6.gif\">\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">“一只老虎独自走在街道上”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一只熊猫在尾波板上冲浪”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一匹马正在街道上飞奔”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一只可爱的小猫在美丽的草地上奔跑”\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_d8232cfd8e1d.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_6b1163906fea.gif\">\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_f126ee84b014.gif\">\u003C\u002Ftd>              \n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_096df1987026.gif\">\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">“一匹马正在街道上飞奔”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一只熊猫独自走在街道上”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一只狗正在街上散步”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一名宇航员正在月球上挥手”\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\n\u003C\u002Ftable>\n\n### 带姿态引导的文本转视频\n\n\n\u003Ctable class=\"center\">\n\u003Ctr>\n  \u003Ctd>\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_71d3c9ac7cef.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_c23804a1f11d.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_eb3fd8568b31.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_da2a318f4140.gif\" raw=true>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">“一只熊在水泥地上跳舞”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一个外星人在飞碟下跳舞”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一只熊猫在南极洲跳舞”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一名宇航员在太空中跳舞”\u003C\u002Ftd>\n\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n### 带边缘引导的文本转视频\n\n\n\n\u003Ctable class=\"center\">\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_42783818cc72.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_7b964948a1cc.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_7fb728d45a4d.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_fe53a5084794.gif\" raw=true>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">“白蝴蝶”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“美丽女孩”\u003C\u002Ftd>\n    \u003Ctd width=25% align=\"center\">“一只水母”\u003C\u002Ftd>\n  \u003Ctd width=25% 
align=\"center\">“万圣节风格的美丽女孩”\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_033796125b4d.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_5bc5484ab5a3.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_c90b81695cbe.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_7fe00eea6796.gif\" raw=true>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">“一只野生狐狸正在行走”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一幅美丽女孩的油画特写”\u003C\u002Ftd>\n    \u003Ctd width=25% align=\"center\">“一位圣诞老人”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“一只鹿”\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003C\u002Ftable>\n\n\n### 带边缘引导和 Dreambooth 微调的文本转视频\n\n\n\n\n\u003Ctable class=\"center\">\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_f7e89612ffab.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_c4066393036a.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_e84ef5ac7e20.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_f82760a80d2b.gif\" raw=true>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">“动漫风格”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“Arcane 风格”\u003C\u002Ftd>\n    
\u003Ctd width=25% align=\"center\">“GTA-5风男子”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“Avatar 风格”\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003C\u002Ftable>\n\n\n## 视频指令Pix2Pix\n\n\u003Ctable class=\"center\">\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_7bfeb293418e.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_22a00e1490b5.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_2b4f0ba65133.gif\" raw=true>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">“把男人换成黑猩猩”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“让它变成梵高《星夜》风格”\u003C\u002Ftd>\n    \u003Ctd width=25% align=\"center\">“让它变成毕加索风格”\u003C\u002Ftd>\n\u003C\u002Ftr>\n\n\u003Ctr>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_fc8239fbeebb.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_7436fb9d7956.gif\" raw=true>\u003C\u002Ftd>\n  \u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_readme_f7624ce30115.gif\" raw=true>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n  \u003Ctd width=25% align=\"center\">“让它变成表现主义风格”\u003C\u002Ftd>\n  \u003Ctd width=25% align=\"center\">“让它变成夜晚场景”\u003C\u002Ftd>\n    \u003Ctd width=25% align=\"center\">“让它变成秋天景象”\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n## 相关链接\n\n* [基于潜在扩散模型的高分辨率图像合成（又称 LDM 和 Stable Diffusion）](https:\u002F\u002Fommer-lab.com\u002Fresearch\u002Flatent-diffusion-models\u002F)\n* 
[InstructPix2Pix：学习遵循图像编辑指令](https:\u002F\u002Fwww.timothybrooks.com\u002Finstruct-pix2pix\u002F)\n* [为文本到图像扩散模型添加条件控制（又称 ControlNet）](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FControlNet)\n* [Diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers)\n* [Stable Diffusion 的 Token 合并](https:\u002F\u002Fgithub.com\u002Fdbolya\u002Ftomesd)\n\n## 许可证\n我们的代码采用 CreativeML Open RAIL-M 许可证发布。本仓库中的许可证适用于我们在原始 Stable Diffusion 代码基础上所做的所有新增和贡献。原始 Stable Diffusion 代码同样采用 CreativeML Open RAIL-M 许可证，该许可证可在[此处](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion\u002Fblob\u002Fmain\u002FLICENSE)找到。\n\n## BibTeX\n如果您在研究中使用了我们的工作，请引用我们的论文：\n```\n@article{text2video-zero,\n    title={Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators},\n    author={Khachatryan, Levon and Movsisyan, Andranik and Tadevosyan, Vahram and Henschel, Roberto and Wang, Zhangyang and Navasardyan, Shant and Shi, Humphrey},\n    journal={arXiv preprint arXiv:2303.13439},\n    year={2023}\n}\n```\n\n\n\n## 使用 Text2Video-Zero 的其他方式\n\n您也可以通过以下方式使用 Text2Video-Zero：\n\n* 🧨 [Diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers) 库。\n\n\u003Cdetails closed>\n\u003Csummary>点击查看详细信息。\u003C\u002Fsummary>\n\n\n\n### 🧨 Diffusers 库中的 Text2Video-Zero\n\n自 `0.15.0` 版本起，Text2Video-Zero 已在 🧨 Diffusers 中[可用](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fapi\u002Fpipelines\u002Ftext_to_video_zero)！\n\n\n\n您可以使用以下命令安装 Diffusers：\n\n``` shell\nvirtualenv --system-site-packages -p python3.9 venv\nsource venv\u002Fbin\u002Factivate\npip install diffusers torch imageio\n```\n\n\n要根据文本提示生成视频，请运行以下 Python 代码：\n\n``` python\nimport torch\nimport imageio\nfrom diffusers import TextToVideoZeroPipeline\n\n# 加载 Stable Diffusion 模型权重\nmodel_id = \"runwayml\u002Fstable-diffusion-v1-5\"\n\n# 创建 TextToVideoZero 流水线\npipe = TextToVideoZeroPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to(\"cuda\")\n\n# 定义文本提示\nprompt = \"A panda is playing guitar on times square\"\n\n# 使用我们的流水线生成视频\nresult = 
pipe(prompt=prompt).images\nresult = [(r * 255).astype(\"uint8\") for r in result]\n\n# 保存生成的视频\nimageio.mimsave(\"video.mp4\", result, fps=4)\n```\n\n\n如需更多信息，包括如何进行“文本与姿态条件下的视频生成”、“文本与边缘条件下的视频生成”以及“文本、边缘和 DreamBooth 条件下的视频生成”，请参阅[文档](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fapi\u002Fpipelines\u002Ftext_to_video_zero)。  \n\n\n\n\u003C\u002Fdetails>","# Text2Video-Zero 快速上手指南\n\nText2Video-Zero 是一个无需训练（Zero-Shot）的视频生成工具，它利用现有的文本到图像扩散模型（如 Stable Diffusion），通过提示词、姿态、边缘或深度控制来生成时间一致的视频，甚至支持指令引导的视频编辑。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux (推荐) 或 macOS\n*   **Python 版本**: 3.9\n*   **CUDA 版本**: >= 11.6\n*   **GPU 显存**: \n    *   标准模式：建议 12 GB 及以上\n    *   低显存模式（开启 Token Merging）：可低至 7 GB\n*   **Git**: 已安装并配置\n\n## 安装步骤\n\n1.  **克隆仓库**\n    ```shell\n    git clone https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FText2Video-Zero.git\n    cd Text2Video-Zero\u002F\n    ```\n\n2.  **创建虚拟环境并安装依赖**\n    建议使用 `virtualenv` 隔离环境。\n    ```shell\n    virtualenv --system-site-packages -p python3.9 venv\n    source venv\u002Fbin\u002Factivate\n    pip install -r requirements.txt\n    ```\n    > **提示**：国内用户若下载依赖较慢，可添加清华源加速：\n    > `pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n## 基本使用\n\n### 1. 初始化模型\n在 Python 脚本中导入并初始化模型实例。\n\n```python\nimport torch\nfrom model import Model\n\n# 初始化模型，使用 CUDA 加速和 float16 精度以节省显存\nmodel = Model(device=\"cuda\", dtype=torch.float16)\n```\n\n### 2. 
文生视频 (Text-to-Video)\n这是最基础的用法，仅需输入文本提示词即可生成视频。\n\n```python\nprompt = \"A horse galloping on a street\"\nparams = {\n    \"t0\": 44, \n    \"t1\": 47, \n    \"motion_field_strength_x\": 12, \n    \"motion_field_strength_y\": 12, \n    \"video_length\": 8\n}\n\n# 生成视频并保存\nout_path = f\".\u002Ftext2video_{prompt.replace(' ','_')}.mp4\"\nfps = 4\nmodel.process_text2video(prompt, fps=fps, path=out_path, **params)\n```\n\n**关键参数说明：**\n*   `video_length`: 生成视频的帧数（默认 8）。\n*   `motion_field_strength_x\u002Fy`: 运动场强度，控制画面动态程度（默认均为 12）。\n*   `t0`, `t1`: 去噪步数范围，通常在 0-50 之间（默认 44, 47）。\n\n### 3. 低显存模式 (可选)\n如果您的 GPU 显存较小（\u003C12GB），可以在调用上述 API 时增加 `chunk_size` 参数，将视频分块处理以降低显存占用。\n\n```python\n# 设置 chunk_size 为 2 或更大，数值越小显存占用越低，但速度可能变慢\nmodel.process_text2video(prompt, fps=fps, path=out_path, chunk_size=2, **params)\n```\n\n### 4. 启动 Web 界面 (Gradio)\n如果您更喜欢图形化界面操作，可以运行自带的 Gradio 应用。\n\n```shell\npython app.py\n```\n运行后在浏览器访问 `http:\u002F\u002F127.0.0.1:7860` 即可使用。支持文本生成、姿态控制、边缘控制及视频编辑等多种模式。","某独立游戏开发者需要为一款复古像素风格的游戏快速生成一段“角色在雨中奔跑”的过场动画，以测试剧情节奏。\n\n### 没有 Text2Video-Zero 时\n- **高昂的训练成本**：若要生成特定风格的视频，通常需收集大量该风格的视频数据并重新训练模型，耗时数天且需要多张高端显卡。\n- **画面闪烁严重**：直接使用传统的逐帧图像生成工具，会导致每一帧之间缺乏关联，角色动作和背景出现严重的抖动与不连贯。\n- **编辑灵活性差**：若想调整角色的奔跑姿势或雨势大小，往往需要重新生成整个序列，无法基于现有画面进行局部指令修改。\n- **显存门槛极高**：现有的视频生成方案对显存要求苛刻，普通开发者的单卡工作站（如 12GB 以下）难以运行，限制了创意验证的速度。\n\n### 使用 Text2Video-Zero 后\n- **零样本即时生成**：直接利用已有的文本到图像扩散模型，无需任何额外训练或视频数据集，输入提示词即可生成符合像素风格的连贯视频。\n- **时序高度一致**：通过其特有的运动引导机制，生成的视频中角色奔跑动作流畅自然，背景雨滴下落稳定，彻底消除了画面闪烁问题。\n- **指令化精准编辑**：利用 Video Instruct-Pix2Pix 功能，只需输入“让雨下得更大”或“改变奔跑方向”等指令，即可在保留原视频结构的基础上完成编辑。\n- **低显存友好部署**：结合 Token Merging 技术，即使在显存小于 7GB 的消费级显卡上也能流畅运行，让独立开发者能在本地轻松迭代创意。\n\nText2Video-Zero 将视频生成的门槛从“专业实验室”拉低至“个人工作台”，让创作者能以最少的资源实现从零到一的动态视觉构思。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FPicsart-AI-Research_Text2Video-Zero_965beccb.png","Picsart-AI-Research","Picsart AI Research 
(PAIR)","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FPicsart-AI-Research_ff59824a.png","",null,"PicsartAI","http:\u002F\u002Ftwitter.com\u002Fpicsartai","https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research",[86,90,94],{"name":87,"color":88,"percentage":89},"Python","#3572A5",100,{"name":91,"color":92,"percentage":93},"Shell","#89e051",0,{"name":95,"color":96,"percentage":93},"CSS","#663399",4248,387,"2026-04-05T23:16:49","NOASSERTION","未说明","必需 NVIDIA GPU。最低显存需求：标准模式需 12GB；启用 Token Merging (merging_ratio) 且 chunk_size=2 时可降至 7GB 以下。CUDA 版本需 >= 11.6。",{"notes":104,"python":105,"dependencies":106},"1. 项目已集成到 Hugging Face Diffusers 库中。2. 支持低内存推理模式，可通过设置 chunk_size 参数（范围 2 到视频长度）减少显存占用。3. 集成了 Token Merging 技术，通过 merging_ratio 参数进一步压缩显存，但过高值会降低画质。4. 支持加载 Hugging Face 上的任意 Stable Diffusion 底模和 Dreambooth 模型。5. 功能包括零样本文本生成视频、姿态\u002F边缘\u002F深度控制生成、以及视频指令编辑 (Video Instruct-Pix2Pix)。","3.9",[107,108,109,110,111,112,113],"torch","diffusers","transformers","accelerate","gradio","opencv-python","tomesd",[16],[116,117],"video-editing","video-generation","2026-03-27T02:49:30.150509","2026-04-10T18:55:17.162280",[121,126,131,136,141,145],{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},27943,"遇到 'RuntimeError: Device type CUDA is not supported for torch.Generator()' 错误怎么办？","这通常是因为 PyTorch 安装不正确或版本不兼容。解决方法是卸载当前的 torch 并重新安装指定 CUDA 版本的 torch。执行以下命令：\n1. pip uninstall torch\n2. 
pip install torch==1.13.1 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu117\n请确保安装的 torch 版本与你的 CUDA 环境匹配。","https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FText2Video-Zero\u002Fissues\u002F25",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},27944,"运行时报错 'CUBLAS_STATUS_INVALID_VALUE' 如何解决？","该错误通常由 cuDNN 运行时版本不兼容引起。尝试重新安装 cuDNN，例如将版本从 8.3.3 升级到 8.5.0。确保安装的 cuDNN 版本与你的 CUDA 版本（如 11.8）相匹配。","https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FText2Video-Zero\u002Fissues\u002F22",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},27945,"项目的最低显存（VRAM）要求是多少？1080Ti 或 RTX 3060 能运行吗？","最新代码已经优化，可以在小于 7GB 显存的显卡上运行。因此，拥有 11GB 显存的 GTX 1080Ti 或 12GB 显存的 RTX 3060 理论上可以运行。如果遇到显存不足（OutOfMemoryError），请确保使用的是最新版本的代码。","https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FText2Video-Zero\u002Fissues\u002F6",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},27946,"如何防止程序启动时自动创建公共链接共享我的机器？","默认情况下代码可能开启了共享功能。为了安全起见，你可以修改启动代码，将 share 参数设置为 False。\n找到类似以下的代码行：\n_,_,link = demo.queue(api_open=False).launch(file_directories=['temporal'], share=True)\n将其修改为：\n_,_,link = demo.queue(api_open=False).launch(file_directories=['temporal'], share=False)\n维护者已更新代码，默认不再创建公共链接。","https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FText2Video-Zero\u002Fissues\u002F19",{"id":142,"question_zh":143,"answer_zh":144,"source_url":140},27947,"安装依赖时遇到 opencv-contrib-python 版本问题或安装失败怎么办？","在 requirements.txt 文件中，可能需要调整 opencv-contrib-python 的版本。尝试将版本号从 4.3.0.36 修改为 4.4.0.46，然后重新运行 pip install -r requirements.txt 进行安装。建议使用 Python 3.9 虚拟环境进行安装。",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},27948,"如何在免费的 Google Colab (T4 GPU) 上运行该项目？","该项目可以在免费的 Colab T4 GPU 上运行。社区成员已经制作了专门的 Colab 笔记本供使用。如果在 Colab 上遇到时序一致性问题，可能是由 xformers 引起的；禁用 xformers 可能解决问题，但这可能导致模型在免费 T4 显存限制下无法加载。","https:\u002F\u002Fgithub.com\u002FPicsart-AI-Research\u002FText2Video-Zero\u002Fissues\u002F13",[]]