[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-nvidia-cosmos--cosmos-predict2.5":3,"tool-nvidia-cosmos--cosmos-predict2.5":65},[4,23,32,40,49,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":22},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,2,"2026-04-05T10:45:23",[13,14,15,16,17,18,19,20,21],"图像","数据工具","视频","插件","Agent","其他","语言模型","开发框架","音频","ready",{"id":24,"name":25,"github_repo":26,"description_zh":27,"stars":28,"difficulty_score":29,"last_commit_at":30,"category_tags":31,"status":22},3833,"MoneyPrinterTurbo","harry0703\u002FMoneyPrinterTurbo","MoneyPrinterTurbo 是一款利用 AI 大模型技术，帮助用户一键生成高清短视频的开源工具。只需输入一个视频主题或关键词，它就能全自动完成从文案创作、素材匹配、字幕合成到背景音乐搭配的全过程，最终输出完整的竖屏或横屏短视频。\n\n这款工具主要解决了传统视频制作流程繁琐、门槛高以及素材版权复杂等痛点。无论是需要快速产出内容的自媒体创作者，还是希望尝试视频生成的普通用户，无需具备专业的剪辑技能或昂贵的硬件配置（普通电脑即可运行），都能轻松上手。同时，其清晰的 MVC 架构和对多种主流大模型（如 DeepSeek、Moonshot、通义千问等）的广泛支持，也使其成为开发者进行二次开发或技术研究的理想底座。\n\nMoneyPrinterTurbo 的独特亮点在于其高度的灵活性与本地化友好性。它不仅支持中英文双语及多种语音合成，允许用户精细调整字幕样式和画面比例，还特别优化了国内网络环境下的模型接入方案，让用户无需依赖 VPN 即可使用高性能国产大模型。此外，工具提供批量生成模式，可一次性产出多个版本供用户择优，极大地提升了内容创作的效率与质量。",54991,3,"2026-04-05T12:23:02",[20,19,17,15,13],{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":10,"last_commit_at":38,"category_tags":39,"status":22},2179,"oh-my-openagent","code-yeongyu\u002Foh-my-openagent","oh-my-openagent（简称 omo）是一款强大的开源智能体编排框架，前身名为 oh-my-opencode。它致力于打破单一模型供应商的生态壁垒，解决开发者在构建 AI 应用时面临的“厂商锁定”难题。不同于仅依赖特定模型的封闭方案，omo 倡导开放市场理念，支持灵活调度多种主流大模型：利用 Claude、Kimi 或 GLM 进行任务编排，调用 GPT 处理复杂推理，借助 Minimax 提升响应速度，或发挥 Gemini 的创意优势。\n\n这款工具特别适合希望摆脱平台限制、追求极致性能与成本平衡的开发者及研究人员使用。通过统一接口，用户可以轻松组合不同模型的长处，构建更高效、更具适应性的智能体系统。其独特的技术亮点在于“全模型兼容”架构，让用户不再受制于某一家公司的策略变动或定价调整，真正实现对前沿模型资源的自由驾驭。无论是构建自动化编码助手，还是开发多步骤任务处理流程，oh-my-openagent 都能提供灵活且稳健的基础设施支持，助力用户在快速演进的 AI 生态中保持技术主动权。",48371,"2026-04-05T11:36:18",[15,19,20,13,17],{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":46,"last_commit_at":47,"category_tags":48,"status":22},2483,"onlook","onlook-dev\u002Fonlook","Onlook 是一款专为设计师打造的开源 AI 优先设计工具，被誉为“设计师版的 Cursor”。它旨在打破设计与开发之间的壁垒，让用户能够以可视化的方式直接构建、样式化和编辑 React 应用。通过 Onlook，用户无需深入编写复杂代码，即可在类似 Figma 的直观界面中完成网页原型的搭建与调整，并实时预览最终效果。\n\n这款工具主要解决了传统工作流中设计稿到代码转换效率低、沟通成本高的问题。以往，设计师使用 Figma 等工具完成设计后，需要开发人员手动将其转化为代码，过程繁琐且容易出错。Onlook 允许用户直接在浏览器 DOM 中进行可视化编辑，底层自动生成基于 Next.js 和 TailwindCSS 的高质量代码，实现了“所见即所得”的开发体验。它不仅支持从文本或图像快速生成应用，还具备分支管理、资源管理及一键部署等功能，极大地简化了从创意到成品的流程。\n\nOnlook 特别适合前端开发者、UI\u002FUX 设计师以及希望快速验证产品创意的独立开发者使用。对于设计师而言，它降低了参与前端开发的门槛；对于开发者来说，它提供了一个高效的视觉化调试和原型构建环境。其核心技术亮点在于",25006,4,"2026-04-03T01:50:49",[17,13,15,20],{"id":50,"name":51,"github_repo":52,"description_zh":53,"stars":54,"difficulty_score":10,"last_commit_at":55,"category_tags":56,"status":22},3795,"serena","oraios\u002Fserena","Serena 是一款专为编程智能体（Coding Agent）打造的强大工具包，被誉为“智能体的集成开发环境（IDE）”。它通过模型上下文协议（MCP）与各类大语言模型及客户端无缝集成，旨在解决传统 AI 在复杂代码库中因依赖行号或简单文本搜索而导致的效率低下和准确性不足的问题。\n\n与传统方法不同，Serena 采用“智能体优先”的设计理念，提供基于语义的代码检索、编辑和重构能力。它能像资深开发者使用 IDE 一样，深入理解代码的符号层级和关联结构，从而让智能体在大型项目中运行得更快、更稳、更可靠。无论是终端用户（如 Claude Code）、IDE 
# nvidia-cosmos/cosmos-predict2.5

Cosmos-Predict2.5 is the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.

Cosmos-Predict2.5 predicts how the world will evolve from current video input and emits the result as video. It helps AI systems internalize how the physical world behaves so they can simulate and anticipate subsequent states more accurately, which matters for autonomous driving, robot control, and intelligent video analytics: anywhere a system interacts with a real environment. It addresses the lack of long-horizon prediction and physical consistency that conventional AI shows in complex dynamic environments, letting machines not only see but foresee. The model targets engineers and researchers working on physical AI, particularly in robotics, autonomous driving, and embodied intelligence. Technically, it builds on a Rectified Flow architecture, supports the Diffusers interface, and introduces multiview cross-attention, action-conditioned generation, and sliding-window long-video generation; with the companion Cosmos Cookbook, users can customize it efficiently through distillation or LoRA fine-tuning and deploy it to real systems quickly.

<p align="center">
    <img src="https://oss.gittoolsai.com/images/nvidia-cosmos_cosmos-predict2.5_readme_f7a4af299ff1.png" width="274" alt="NVIDIA Cosmos"/>
</p>

<p align="center">
  <a href="https://www.nvidia.com/en-us/ai/cosmos">Product Website</a>&nbsp; | 🤗 <a href="https://huggingface.co/collections/nvidia/cosmos-predict25-68bb63255f2fc206c5e5b346">Hugging Face</a>&nbsp; | <a href="https://arxiv.org/abs/2511.00062">Paper</a>&nbsp; | <a href="https://research.nvidia.com/labs/dir/cosmos-predict2.5">Paper Website</a> | <a href="https://github.com/nvidia-cosmos/cosmos-cookbook">Cosmos Cookbook</a>
</p>

NVIDIA Cosmos™ is a platform purpose-built for physical AI, featuring state-of-the-art generative world foundation models (WFMs), robust guardrails, and an accelerated data processing and curation pipeline. Designed specifically for real-world systems, Cosmos enables developers to rapidly advance physical AI applications such as autonomous vehicles (AVs), robots, and video analytics AI agents.

Cosmos World Foundation Models come in three model types, all of which can be customized in post-training: [cosmos-predict](https://github.com/nvidia-cosmos/cosmos-predict2.5), [cosmos-transfer](https://github.com/nvidia-cosmos/cosmos-transfer2.5), and [cosmos-reason](https://github.com/nvidia-cosmos/cosmos-reason1).
## News!

* [February 23, 2026] Released the Predict2.5 /Robot/Action-Cond distillation [guide](https://github.com/nvidia-cosmos/cosmos-predict2.5/blob/main/docs/post-training_video2world_action.md#4-distillation) and the Predict2.5 /Robot/Policy [models](https://huggingface.co/nvidia/Cosmos-Predict2.5-2B/tree/main/robot/policy) (RoboCasa, Libero), with an inference and post-training [recipe](https://nvidia-cosmos.github.io/cosmos-cookbook/recipes/post_training/predict2/cosmos_policy/post_training.html) in cosmos-cookbook.
* [December 19, 2025] Released Cosmos-Predict2.5-2B Diffusers support via [Hugging Face](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cosmos#diffusers.Cosmos2_5_PredictBasePipeline), a Cosmos-Predict2.5-2B Text2World distilled checkpoint on [Hugging Face](https://huggingface.co/nvidia/Cosmos-Predict2.5-2B/tree/main/base/distilled), and a distillation [guide](docs/distillation.md).
* [December 5, 2025] Released Cosmos-Predict2.5-14B [base models](https://huggingface.co/nvidia/Cosmos-Predict2.5-14B), [inference](docs/inference.md), and [post-training for DreamGen](docs/post-training_video2world_gr00t.md). Also added the Cosmos-Predict2.5-2B robot/multiview-agibot [model](https://huggingface.co/nvidia/Cosmos-Predict2.5-2B/tree/main/robot/multiview-agibot) and its [inference guide](docs/inference_robot_multiview-agibot.md).
* [November 25, 2025] Added Blackwell + ARM inference support, along with fixes for the help menu and CLI overrides, improved guardrail offloading, and LFS enablement for large assets.
* [November 11, 2025] Refactored the Cosmos-Predict2.5-2B Auto/Multiview code, updated the Auto/Multiview checkpoints on Hugging Face, and added inference example notebooks under examples/notebook/ to make testing and onboarding easier.
* [November 8, 2025] Added a new pedagogical [README](docs/rectified-flow.md) in docs/ detailing the Rectified Flow formulation and its integration with the UniPC solver (see the sketch after this list).
* [November 7, 2025] Released support for DMD2 distillation for model compression, an autoregressive sliding-window generation mode for generating longer videos, and a new multiview cross-attention module. We improved inference examples and documentation, upgraded dependencies to improve support for Blackwell, and made various infrastructure improvements.
* [October 28, 2025] Added the [Cosmos Cookbook](https://github.com/nvidia-cosmos/cosmos-cookbook), a collection of step-by-step recipes and post-training scripts to quickly build, customize, and deploy NVIDIA's Cosmos world foundation models for robotics and autonomous systems.
* [October 28, 2025] Fixed an action-conditioned inference bug, improved LoRA post-training and unified it across text2world, image2world, and video2world, sped up tokenization with CP + torch.compile for Transfer2, updated guardrails, added multi-storage support, and introduced the cosmos-oss package.
* [October 21, 2025] Added LoRA (Low-Rank Adaptation) post-training for both [Video2World and Text2World](docs/post-training_cosmos_nemo_assets_lora.md), and the gr00t-dreams dataset for post-training. Also updated the Docker base image version and Gradio-related documentation.
* [October 14, 2025] Released the Cosmos-Predict2.5 robot/action-cond [Inference Guide](docs/inference_robot_action_cond.md) and [Post-Training Guide](docs/post-training_video2world_action.md). Also released [Auto Multiview Post-Training](docs/post-training_multiview.md).
* [October 6, 2025] Released [Cosmos-Predict2.5](https://github.com/nvidia-cosmos/cosmos-predict2.5) and [Cosmos-Transfer2.5](https://github.com/nvidia-cosmos/cosmos-transfer2.5) - the next generation of our world simulation models!
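The Rectified Flow note referenced above is pedagogical, so a toy version of the formulation may help orient readers: the model learns a velocity field along the straight path between noise and data, and sampling integrates that field from t=0 to t=1. The following is a minimal sketch of the idea, not the repository's implementation; `ToyVelocity` and all shapes are illustrative, and plain Euler integration stands in for the UniPC solver the docs pair with the formulation.

```python
import torch

class ToyVelocity(torch.nn.Module):
    """Toy velocity network v_theta(x_t, t) for a rectified-flow sketch."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 128),
            torch.nn.SiLU(),
            torch.nn.Linear(128, dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Condition on time by concatenating t as an extra input feature.
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

def rectified_flow_loss(model: ToyVelocity, x1: torch.Tensor) -> torch.Tensor:
    """Training objective: along the straight interpolant
    x_t = (1 - t) * x0 + t * x1, regress v_theta(x_t, t) onto (x1 - x0)."""
    x0 = torch.randn_like(x1)          # noise endpoint
    t = torch.rand(x1.shape[0])        # uniform time in [0, 1]
    x_t = (1 - t[:, None]) * x0 + t[:, None] * x1
    return torch.mean((model(x_t, t) - (x1 - x0)) ** 2)

@torch.no_grad()
def sample(model: ToyVelocity, dim: int, steps: int = 20) -> torch.Tensor:
    """Euler integration of dx/dt = v_theta(x, t) from noise (t=0) to data (t=1)."""
    x = torch.randn(1, dim)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt)
        x = x + dt * model(x, t)
    return x
```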
## Cosmos-Predict2.5

We introduce Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video. Cosmos-Predict2.5 is a flow-based model that unifies Text2World, Image2World, and Video2World in a single model and uses Cosmos-Reason1, a Physical AI reasoning vision language model (VLM), as the text encoder. Cosmos-Predict2.5 significantly improves upon Cosmos-Predict1 in both quality and prompt alignment.

### Image2World

<details><summary>Input prompt</summary>
A nighttime city bus terminal gradually shifts from stillness to subtle movement. At first, multiple double-decker buses are parked under the glow of overhead lights, with a central bus labeled '87D' facing forward and stationary. As the video progresses, the bus in the middle moves ahead slowly, its headlights brightening the surrounding area and casting reflections onto adjacent vehicles. The motion creates space in the lineup, signaling activity within the otherwise quiet station. It then comes to a smooth stop, resuming its position in line. Overhead signage in Chinese characters remains illuminated, enhancing the vibrant, urban night scene.
</details>

| Input image | Output video |
| --- | --- |
| <img src="https://oss.gittoolsai.com/images/nvidia-cosmos_cosmos-predict2.5_readme_eecf2e497063.png" width="500" alt="Input image"> | <video src="https://github.com/user-attachments/assets/a233567b-9eb4-405a-ab36-c0bf902d2988" width="500" alt="Output video" controls></video> |

### Video2World

<details><summary>Input prompt</summary>
A robotic arm, primarily white with black joints and cables, is shown in a clean, modern indoor setting with a white tabletop. The arm, equipped with a gripper holding a small, light green pitcher, is positioned above a clear glass containing a reddish-brown liquid and a spoon. The robotic arm is in the process of pouring a transparent liquid into the glass. To the left of the pitcher, there is an opened jar with a similar reddish-brown substance visible through its transparent body. In the background, a vase with white flowers and a brown couch are partially visible, adding to the contemporary ambiance. The lighting is bright, casting soft shadows on the table. The robotic arm's movements are smooth and controlled, demonstrating precision in its task. As the video progresses, the robotic arm completes the pour, leaving the glass half-filled with the reddish-brown liquid. The jar remains untouched throughout the sequence, and the spoon inside the glass remains stationary. The other robotic arm on the right side also stays stationary throughout the video. The final frame captures the robotic arm with the pitcher finishing the pour, with the glass now filled to a higher level, while the pitcher is slightly tilted but still held securely by the gripper.
</details>

| Input video | Output video |
| --- | --- |
| <video src="https://github.com/user-attachments/assets/ddca366e-b30f-44bb-9def-b4a8386d8d23" width="500" alt="Input video" controls></video> | <video src="https://github.com/user-attachments/assets/62c0800d-036a-4dbc-b0a6-199ee25d8e31" width="500" alt="Output video" controls></video> |
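Video2World conditions on an input clip; the November 7 news item above additionally mentions an autoregressive sliding-window mode for longer videos. The repository's implementation is not reproduced here, but the general pattern such modes follow is to condition each new chunk on the tail frames of the previous one. A schematic sketch under that assumption; `generate_chunk` is a hypothetical stand-in for a model call.

```python
from typing import Callable, List

def sliding_window_generate(
    generate_chunk: Callable[[List, int], List],  # hypothetical model call:
                                                  # (context_frames, n_new) -> new frames
    first_context: List,
    total_frames: int,
    chunk_size: int = 16,
    overlap: int = 4,
) -> List:
    """Autoregressive long-video generation, schematically: each chunk is
    conditioned on the last `overlap` frames of what has been generated so
    far, so motion stays continuous across windows."""
    frames = list(first_context)
    while len(frames) < total_frames:
        context = frames[-overlap:]            # the sliding conditioning window
        frames.extend(generate_chunk(context, chunk_size))
    return frames[:total_frames]
```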
## Cosmos-Predict2.5 Model Family

The fundamental capability of our world simulation models, the Cosmos-Predict family, is predicting future world states in video form from multimodal inputs. We have open-sourced both pre-trained foundation models and post-trained models accelerating multiple domains. Please check back as we continue to add more specialized models and capabilities to the Predict family!

[**Cosmos-Predict2.5**](docs/inference.md): Base [2B checkpoints](https://huggingface.co/nvidia/Cosmos-Predict2.5-2B/tree/main/base) and [14B checkpoints](https://huggingface.co/nvidia/Cosmos-Predict2.5-14B/tree/main/base), trained from the ground up for Physical AI and robotics.

[**Cosmos-Predict2.5/auto/multiview**](docs/inference_auto_multiview.md): Specialized [checkpoints](https://huggingface.co/nvidia/Cosmos-Predict2.5-2B/tree/main/auto/multiview), post-trained for Autonomous Vehicle applications.

| Model Name | Capability | Input |
| --- | --- | --- |
| [**Cosmos-Predict2.5 base**](docs/inference.md) | | |
| Cosmos-Predict2.5-2B/pre-trained | pre-trained base | text + image or video |
| Cosmos-Predict2.5-2B/post-trained | post-trained base | text + image or video |
| Cosmos-Predict2.5-2B/distilled | distilled base | text |
| Cosmos-Predict2.5-14B/pre-trained | pre-trained base | text + image or video |
| Cosmos-Predict2.5-14B/post-trained | post-trained base | text + image or video |
| [**Cosmos-Predict2.5 auto**](docs/inference_auto_multiview.md) | | |
| Cosmos-Predict2.5-2B/auto/multiview | driving, 7-camera view | text + image or video |
| [**Cosmos-Predict2.5-2B robot**](docs/inference_robot_action_cond.md) | | |
| Cosmos-Predict2.5-2B/robot/action-cond | robotic, action-conditioned | action |
| Cosmos-Predict2.5-2B/robot/multiview-agibot | robotic, AgiBot data, 3-camera view | text + image |
| Cosmos-Predict2.5-2B/robot/policy | post-trained on Libero and RoboCasa | action + image |
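The checkpoints in the table live as subfolders of the Hugging Face repos linked above, so a single variant can be fetched without pulling the whole repo. A sketch using the standard `huggingface_hub` API; the `allow_patterns` value assumes the `base/` folder layout shown in the links.

```python
from huggingface_hub import snapshot_download

# Fetch only the 2B base checkpoints; the repo also hosts auto/ and robot/
# variants as sibling subfolders, so adjust allow_patterns accordingly.
local_dir = snapshot_download(
    repo_id="nvidia/Cosmos-Predict2.5-2B",
    allow_patterns=["base/*"],
)
print("checkpoints downloaded to:", local_dir)
```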
## User Guide

* [Setup Guide](docs/setup.md)
* [Troubleshooting](docs/troubleshooting.md)
* [Inference](docs/inference.md)
  * [Auto Multiview](docs/inference_auto_multiview.md)
  * [Robot Action-Conditioned](docs/inference_robot_action_cond.md)
  * [Robot Multiview-Agibot](docs/inference_robot_multiview-agibot.md)
  * [Robot Policy](https://nvidia-cosmos.github.io/cosmos-cookbook/recipes/post_training/predict2/cosmos_policy/post_training.html)
* [Diffusers Inference Guide](docs/diffusers_inference.md)
* [Post-Training](docs/post-training.md) (several recipes use LoRA; see the sketch after this list)
  * [Video2World Cosmos-NeMo-Assets](docs/post-training_video2world_cosmos_nemo_assets.md)
  * [Video2World DreamGen Bench](docs/post-training_video2world_gr00t.md)
  * [Auto Multiview](docs/post-training_multiview.md)
  * [Robot Action-Conditioned](docs/post-training_video2world_action.md)
  * [Robot Policy](https://nvidia-cosmos.github.io/cosmos-cookbook/recipes/post_training/predict2/cosmos_policy/post_training.html)
* [Distillation](docs/distillation.md)
  * [Robot Action-Conditioned](https://github.com/nvidia-cosmos/cosmos-predict2.5/blob/main/docs/post-training_video2world_action.md#4-distillation)
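Several of the post-training guides above use LoRA (Low-Rank Adaptation). As a generic illustration of the mechanism rather than the project's training code: the pretrained weight is frozen and a trainable low-rank update is added on top, so only a small fraction of parameters is trained.

```python
import torch

class LoRALinear(torch.nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B(A x). Only A and B receive gradients."""
    def __init__(self, base: torch.nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)              # freeze the pretrained weight
        self.A = torch.nn.Linear(base.in_features, r, bias=False)
        self.B = torch.nn.Linear(r, base.out_features, bias=False)
        torch.nn.init.zeros_(self.B.weight)      # update starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.B(self.A(x))
```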
## Contributing

We thrive on community collaboration! [NVIDIA-Cosmos](https://github.com/nvidia-cosmos/) wouldn't be where it is without contributions from developers like you. Check out our [Contributing Guide](CONTRIBUTING.md) to get started, and share your feedback through issues.

Big thanks 🙏 to everyone helping us push the boundaries of open-source physical AI!

## License and Contact

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

NVIDIA Cosmos source code is released under the [Apache 2 License](https://www.apache.org/licenses/LICENSE-2.0).

NVIDIA Cosmos models are released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license). For a custom license, please contact [cosmos-license@nvidia.com](mailto:cosmos-license@nvidia.com).
# cosmos-predict2.5 Quickstart Guide

## Environment

- **OS**: Linux (Ubuntu 20.04/22.04 recommended)
- **GPU**: NVIDIA GPU (H100/A100/L4 or a newer architecture recommended; Blackwell supported)
- **Driver and CUDA**:
  - NVIDIA driver ≥ 550
  - CUDA ≥ 12.3
- **Python**: ≥ 3.10
- **Other dependencies**:
  - Git LFS (for downloading large model files)
  - Docker (optional, for containerized deployment)

> 💡 Users in mainland China may want to configure a Hugging Face mirror (e.g., `hf-mirror.com`) to speed up model downloads.

## Installation

1. Clone the repository and set up Git LFS:

```bash
git clone https://github.com/nvidia-cosmos/cosmos-predict2.5.git
cd cosmos-predict2.5
git lfs install
```

2. Create and activate a Python virtual environment:

```bash
python -m venv venv
source venv/bin/activate
```

3. Install dependencies (a regional PyPI mirror can speed this up):

```bash
pip install --upgrade pip
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```

4. (Optional) Install Diffusers support (for the Hugging Face pipeline):

```bash
pip install "diffusers>=0.30.0"
```

## Basic Usage

The following example generates a Text-to-World video with the **Cosmos-Predict2.5-2B base model**, using the distilled checkpoint, which needs only a text input:

```python
from diffusers import Cosmos2_5_PredictBasePipeline
import torch

# Load the model (downloaded automatically from Hugging Face on first run)
pipe = Cosmos2_5_PredictBasePipeline.from_pretrained(
    "nvidia/Cosmos-Predict2.5-2B",
    subfolder="base/distilled",
    torch_dtype=torch.float16,
).to("cuda")

# Generate a video
prompt = "A robotic arm pouring liquid into a glass on a white table."
video = pipe(prompt, num_frames=16, height=256, width=256).frames[0]

# Save as a GIF (requires imageio and imageio[ffmpeg])
import imageio
imageio.mimsave("output.gif", video, fps=8)
```

> 📌 Tips:
> - The first run downloads several GB of model weights, so make sure the network is stable. Users in mainland China can set `HF_ENDPOINT=https://hf-mirror.com` to speed this up.
> - For other modes (Image2World, Video2World, robot policy inference, and more), see the official docs: the [Inference Guide](docs/inference.md) and the [Cosmos Cookbook](https://github.com/nvidia-cosmos/cosmos-cookbook).
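The GIF export above is handy for quick inspection, but MP4 files are much smaller for longer clips. A variant that writes the same frames through imageio's ffmpeg backend; it assumes `imageio[ffmpeg]` is installed and reuses the `video` frame list from the example above:

```python
import numpy as np
import imageio

# `video` is the frame list returned by the pipeline call in the example
# above; convert frames to arrays and write them as an MP4 instead of a GIF.
frames = [np.asarray(f) for f in video]
imageio.mimsave("output.mp4", frames, fps=8)
```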
## Use Case

An autonomous-driving startup is building a behavior-prediction module for urban roads, to improve how its vehicles anticipate the future trajectories of pedestrians, cyclists, and other vehicles at complex intersections.

### Without cosmos-predict2.5

- Relied on hand-written physics rules plus a simple LSTM; sudden behaviors (e.g., a pedestrian abruptly crossing) were modeled poorly and predicted video was badly distorted.
- Large numbers of simulation scenarios had to be built by hand for training: expensive, slow, and with poor coverage.
- Weak multi-agent interaction modeling; the joint future state of multiple traffic participants could not be predicted together.
- Model output was only trajectory coordinates, with no intuitive visualization, so debugging and validation were slow.
- New cities or unusual weather required re-collecting data and re-tuning the whole system; generalization was poor.

### With cosmos-predict2.5

- The world foundation model generates high-fidelity future video directly, faithfully reproducing complex dynamics such as lane-changing e-bikes and evading pedestrians.
- Strong zero-shot generalization: 10 seconds of real footage yields a multimodal 5-second prediction, greatly reducing reliance on simulation.
- Built-in multiview and multi-agent modeling outputs the joint evolution of all traffic participants in one pass.
- Results are visual video streams, so engineers can judge prediction plausibility at a glance and iterate faster.
- The lightweight distillation recipes in the Cosmos Cookbook adapt the model quickly to rain, fog, or new cities, cutting deployment time by 70%.

cosmos-predict2.5 upgrades future-state prediction from abstract coordinates to interpretable, generalizable, high-fidelity visual rollouts, substantially improving an autonomous system's environmental understanding and decision robustness.

## Project Info

- **Owner**: [NVIDIA Cosmos](https://github.com/nvidia-cosmos), "a world foundation model platform for accelerating the development of physical AI systems." Contact: mharrim@nvidia.com · [Website](https://www.nvidia.com/en-us/ai/cosmos/)
- **Stars / forks**: 1,038 / 127 · **License**: Apache-2.0 · **Last commit**: 2026-04-05
- **Languages**: Python 99.4%, Shell 0.4%, Dockerfile 0.1%, Just 0.1%
- **Environment**: Linux; NVIDIA GPU required (Blackwell supported), at least 8 GB VRAM (24 GB+ recommended for the 14B model), CUDA 11.7+; RAM requirements not specified
- **Key dependencies**: torch, diffusers, transformers, accelerate, cosmos-oss, nvidia-cosmos-cookbook
- **Notes**: The project depends on NVIDIA's proprietary stack and supports Docker deployment. Git LFS is needed to download large model files; some features (e.g., Blackwell + ARM inference) need specific hardware. The recipes in cosmos-cookbook are the recommended path for environment setup and model deployment.
- **Category**: Video · **Topics**: foundational-models, video-generation, world-models

## FAQ

**Inference fails on an RTX 5090 — how do I deal with running out of VRAM?**
The project has added model offloading: the text encoder, tokenizer, and diffusion model can be loaded on demand and offloaded to CPU. The feature ships in the next release and supports efficient runs on 32 GB GPUs such as the RTX 5090. You can also pass `--downcast_text_encoder` to lower precision (e.g., bfloat16) and save memory. For the Diffusers route, see the sketch after this FAQ section. ([issue #2](https://github.com/nvidia-cosmos/cosmos-predict2.5/issues/2))

**LoRA fine-tuning hangs at the sampling step — what should I do?**
This is likely a logging-configuration issue. Pull the latest main branch and use the following training command:

```bash
torchrun --nproc_per_node=8 scripts/train.py --config=cosmos_predict2/_src/predict2/configs/video2world/config.py -- experiment=predict2_lora_training_2b_cosmos_nemo_assets_txt
```

Note that logs are emitted every 100 steps by default, so output may pause after validation while training continues normally. ([issue #97](https://github.com/nvidia-cosmos/cosmos-predict2.5/issues/97))

**Why can't I reproduce the action-conditioned model's results on the Bridge dataset?**
The default checkpoint changed: in v1.5.0 the action-conditioned model's DEFAULT_CHECKPOINT moved from `post-trained` to `pre-trained` (i.e., `Cosmos-Predict2.5-2B/base/pre-trained`). Fine-tune from the updated pre-trained checkpoint, or see the fix in PR #133. With the correct checkpoint, results with good action controllability are reproducible. ([issue #119](https://github.com/nvidia-cosmos/cosmos-predict2.5/issues/119))

**Enabling a validation set during post-training raises an error — how do I fix it?**
The issue lies in the post-training validation data-loading logic: Cosmos-Predict2 loads the same validation set fine, but the Predict 2.5 post-training setup has a compatibility problem. Check that the `dataloader_val` structure in your config matches the training set, and that `trainer.run_validation=True` and `trainer.validation_iter` are set sensibly. There is no complete official fix yet; temporarily disabling validation or waiting for a later release are the current workarounds. ([issue #30](https://github.com/nvidia-cosmos/cosmos-predict2.5/issues/30))

**Why do logs disappear after validation during LoRA training?**
The logging interval is set to 100 iterations. After a validation step, the console stays quiet until the next logging step is reached, but training continues in the background. Lower the logging interval in the config, or watch the loss, to confirm training is progressing. ([issue #97](https://github.com/nvidia-cosmos/cosmos-predict2.5/issues/97))

**After action-conditioned training, generated videos differ semantically from the real ones — why?**
Most likely the wrong pretrained checkpoint was used. Earlier versions defaulted to the post-trained checkpoint, while v1.5.0 switched to the pre-trained one. Fine-tuning from the pre-trained checkpoint lets the model learn the action-to-video correspondence correctly and generate semantically consistent videos. Confirm your checkpoint path points at `Cosmos-Predict2.5-2B/base/pre-trained`. ([issue #119](https://github.com/nvidia-cosmos/cosmos-predict2.5/issues/119))
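For the Diffusers route from the quickstart, the VRAM pressure described in the first FAQ entry can also be eased with the standard pipeline offloading helpers. These are generic diffusers APIs, not the repository's own offload feature:

```python
import torch
from diffusers import Cosmos2_5_PredictBasePipeline

# Lower-precision weights reduce VRAM; bfloat16 matches the FAQ's suggestion.
pipe = Cosmos2_5_PredictBasePipeline.from_pretrained(
    "nvidia/Cosmos-Predict2.5-2B",
    subfolder="base/distilled",
    torch_dtype=torch.bfloat16,
)
# Move each submodule to the GPU only while it runs, then back to CPU;
# trades speed for memory. Do not also call .to("cuda") when using this.
pipe.enable_model_cpu_offload()
# Then call pipe(...) as in the quickstart example.
```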
## Releases

* **v1.5.1** (2026-04-03): Fix bug with latest conditional frames.
* **v1.5.0** (2026-02-24): Predict2.5 /Robot/Action-Cond distillation guide [published](https://github.com/nvidia-cosmos/cosmos-predict2.5/blob/main/docs/post-training_video2world_action.md#4-distillation); generate robot actions through the Predict2.5 /Robot/Policy [models](https://huggingface.co/nvidia/Cosmos-Predict2.5-2B/tree/main/robot/policy) (RoboCasa, Libero), with an inference and post-training [recipe](https://nvidia-cosmos.github.io/cosmos-cookbook/recipes/post_training/predict2/cosmos_policy/post_training.html) in cosmos-cookbook; Predict2.5 /Robot/Action-Cond post-training bug fix.
* **v1.4.2** (2026-01-27): Add example script using Diffusers; break up and simplify checkpoint_db.py; fix checkpoint paths for robot multiview and action-conditioned.
* **v1.4.1** (2025-12-19): Adds code, checkpoints, and documentation for Predict2-Distill (docs/distillation.md); Predict2.5 2B/14B now run via HF Diffusers.
* **v1.4.0** (2025-12-05): New checkpoints: Predict2.5-2B robot/multiview-agibot model and inference; Predict2.5 14B base models, inference, and post-training.
* **v1.3.3** (2025-11-26): Fix help menu and CLI override args; guardrail offload fix; Blackwell + ARM inference support; enable LFS for assets.
* **v1.3.2** (2025-11-14): Multiview refactoring and updated checkpoints; added inference example notebooks in examples/notebook/.
* **v1.3.1** (2025-11-07): Support for DMD2 distillation for model compression, autoregressive sliding-window generation for longer videos, and a new multiview cross-attention module; improved inference examples and documentation; upgraded dependencies for better Blackwell support; various infrastructure improvements.
* **v1.3.0** (2025-10-28): Bugfix for action-conditioned inference; updated guardrails; added the cosmos-oss package; joint text2world, image2world, and video2world LoRA post-training with improved LoRA initialization order; pyrefly annotations; easyio multi-storage-client backend; internal package reorganization; tokenizer speed-up with CP and torch.compile for transfer2.
* **v1.2.0** (2025-10-21): Bump Docker base image version; add pyrefly and annotations; bump package version to 1.2.0; add the gr00t_dreams dataset.