Awesome-World-Models

License: BSD-3-Clause

AI summary (generated automatically; for reference only)

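Awesome-World-Models is an open-source resource collection for the AI community. It systematically organizes leading academic papers, code implementations, and technical blogs related to "world models". It focuses on three core areas: general video generation, embodied AI, and autonomous driving. It aims to address the field's scattered research, fuzzy definitions, and reproduction difficulties by giving developers a single point of navigation.

Both researchers tracking the latest developments and algorithm engineers looking for high-quality baseline models can benefit from it. The collection includes classics such as the foundational 2018 paper "World Models" and is continually updated with recent technical reports and open-source projects, including OpenWorldLib, DreamDojo, and SIMA 2. Its distinguishing feature is coverage that runs from theoretical foundations to practical benchmarks, spanning directions such as causal modeling, physics alignment, and large-scale learning from human manipulation. A clear categorized index helps users locate the resources they need and lowers the barrier to exploring world-model techniques, making it a useful aid for work on robot perception and decision-making.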

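Use case

The algorithm team at an autonomous-driving startup is developing an end-to-end driving system based on world models and needs to sift a large body of literature for cutting-edge approaches suitable for deployment on real vehicles.

Without Awesome-World-Models

Inefficient search: researchers have to switch back and forth among arXiv, GitHub, and conference websites, manually combining keywords such as "World Model" and "Autonomous Driving"; even after several days, complete coverage is hard to guarantee.
Hard-to-reproduce code: papers found this way often lack a link to official code, or the repository has been archived, so the algorithm cannot be validated quickly in real driving scenarios.
Narrow technical view: transferable techniques from neighboring areas (such as embodied AI or general video generation) are easy to miss, along with the chance to use recent work such as DreamDojo or SIMA 2 to improve perception and prediction modules.
Inconsistent benchmarking: without a unified list of evaluation standards, the team struggles to compare models side by side on core metrics such as dynamic obstacle prediction.

With Awesome-World-Models

One-stop aggregation: the team goes straight to the "World Models for Autonomous Driving" section via the category index and, within minutes, has a full list ranging from foundational theory to recent policy models such as GigaWorld-Policy.
Clear path to deployment: entries link to the paper and, where available, the code repository and project page, so engineers can quickly clone open-source projects such as lingbot-va for local debugging and fine-tuning.
Cross-domain inspiration: by browsing the "Embodied AI" and "General Video Generation" sections, the team borrowed physics-alignment techniques and improved the vehicle's ability to reason about complex road conditions.
Standardized evaluation: using the "Benchmarks & Evaluation" section, the team quickly set up a testing process aligned with common benchmarks, putting model selection on firmer footing.

Awesome-World-Models weaves scattered research threads into a navigable knowledge map, shortening the path from theoretical exploration to engineering deployment.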

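Runtime requirements

GPU: not specified
Memory: not specified
Python: not specified

Dependencies (notes): This repository (Awesome-World-Models) is a curated list that mainly collects links to papers, blogs, and technical reports on world models in robotics, autonomous driving, embodied AI, and related areas. It is not an executable software tool or code library, so it has no runtime requirements of its own (operating system, GPU, memory, Python version, or dependencies). To run a specific model mentioned in the list (such as Cosmos, HunyuanWorld, or GAIA-2), consult that model's own code repository for its installation and runtime requirements.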

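Quick Start

Awesome World Models for Robotics

This repository provides a curated list of papers on world models for general video generation, embodied AI, and autonomous driving. The template is adapted from Awesome-LLM-Robotics and Awesome-World-Model.

Contributions are welcome! Feel free to submit a pull request or contact us by email to add new papers.

If you find this repository useful, please consider citing it and giving the list a star ⭐. Feel free to share it with others!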
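
For contributors who want to check a new entry before opening a pull request, or filter the lists for entries that ship code, the following is a minimal sketch. It assumes the loose entry shape seen in most of the lists in this document (`Name: "Title", venue. [Paper] [Code]`); that shape is inferred from the entries, not an official specification, and the names `Entry`, `parse_entry`, `with_code`, and `ENTRY_RE` are hypothetical helpers introduced only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed entry shape, inferred from the lists in this document (not an official spec):
#   Name: "Title", venue. [Paper] [Website] [Code]
# The leading `Name:` part is optional.
ENTRY_RE = re.compile(
    r'^(?:(?P<name>[^:"]+):\s*)?'        # optional short name before the first colon
    r'"(?P<title>.+)"\s*,\s*'            # quoted title, followed by a comma
    r'(?P<venue>[^\[]*?)\.?\s*'          # venue/date text, up to the link labels
    r'(?P<links>(?:\[[^\]]+\]\s*)+)$'    # one or more [Label] tags
)


@dataclass
class Entry:
    name: Optional[str]
    title: str
    venue: str
    links: List[str]  # lower-cased labels, e.g. ["paper", "code"]


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one list line; return None if it does not fit the assumed shape."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    name = m["name"].strip() if m["name"] else None
    labels = [label.lower() for label in re.findall(r"\[([^\]]+)\]", m["links"])]
    return Entry(name, m["title"], m["venue"].strip(), labels)


def with_code(lines):
    """Keep only the parsed entries that advertise a [Code] link."""
    entries = (parse_entry(line) for line in lines)
    return [e for e in entries if e is not None and "code" in e.links]
```

Lines that do not match (for example, blog entries written as `Name, Title. [Blog]`) simply return `None`, so the helper can be run over a whole section without special-casing headings or blank lines.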

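Foundational Paper on World Models

World Models, NeurIPS 2018 Oral. [Paper] [Website]

Blogs or Technical Reports

OpenWorldLib, OpenWorldLib: A Unified Codebase and Definition of Advanced World Models. [Paper]
ABot-PhysWorld, ABot-PhysWorld: A Physics-Aligned Interactive World Foundation Model for Robotic Manipulation. [Paper]
GigaWorld-Policy, GigaWorld-Policy: An Efficient Action-Centered World-Action Model. [Paper]
GigaBrain-0.5M*, GigaBrain-0.5M*: A VLA Trained with World Model-Based Reinforcement Learning. [Paper] [Website]
ALIVE, ALIVE: Animate Your World with Lifelike Audio-Video Generation. [Paper] [Website]
DreamDojo, DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos. [Paper] [Website]
lingbot-va, Causal World Modeling for Robot Control. [Paper] [Website] [Code]
lingbot-world, Advancing Open-Source World Models. [Paper] [Website] [Code]
TARS, World In Your Hands: A Large-Scale and Open-Source Ecosystem for Learning Human-Centric Manipulation in the Wild. [Paper] [Website]
SIMA 2, SIMA 2: A Generalist Embodied Agent for Virtual Worlds. [Paper]
SimWorld, SimWorld: An Open-Ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds. [Paper] [Website]
Hunyuan-GameCraft-2, Hunyuan-GameCraft-2: Instruction-Following Interactive Game World Model. [Paper] [Website]
GigaWorld-0, GigaWorld-0: World Models as Data Engine to Empower Embodied AI. [Paper] [Website]
PAN, PAN: A World Model for General, Interactable, and Long-Horizon World Simulation. [Paper]
Cosmos-Predict2.5, World Simulation with Video Foundation Models for Physical AI. [Paper] [Code]
Emu3.5, Emu3.5: Native Multimodal Models are World Learners. [Paper] [Website] [Code]
ODesign, ODesign: A World Model for Biomolecular Interaction Design. [Paper] [Website]
GigaBrain-0, GigaBrain-0: A World Model-Powered Vision-Language-Action Model. [Paper] [Website]
CWM, CWM: An Open-Weights LLM for Research on Code Generation with World Models. [Paper] [Website] [Code]
WoW, WoW: Towards a World-Omniscient World Model Through Embodied Interaction. [Paper] [Website]
Matrix-Game 2.0, Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model. [Paper] [Website]
Matrix-3D, Matrix-3D: Omnidirectional Explorable 3D World Generation. [Paper] [Website]
HunyuanWorld 1.0, HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels. [Paper] [Website] [Code]
What Does It Mean for a Neural Network to Learn a "World Model"? [Paper]
Matrix-Game, Matrix-Game: Interactive World Foundation Model. [Paper] [Code]
Cosmos-Drive-Dreams, Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models. [Paper] [Website]
GAIA-2, GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving. [Paper] [Website]
Cosmos, Cosmos World Foundation Model Platform for Physical AI. [Paper] [Website] [Code]
1X Technologies, 1X World Model. [Blog]
Runway, Introducing General World Models. [Blog]
Wayve, Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy. [Paper] [Blog]
Yann LeCun, A Path Towards Autonomous Machine Intelligence. [Paper]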

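Surveys

"Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms", arXiv 2026.03. [Paper]
"From Digital Twins to World Models: Opportunities, Challenges, and Applications for Mobile Edge General Intelligence", arXiv 2026.03. [Paper]
"The Trinity of Consistency as a Defining Principle for General World Models", arXiv 2026.02. [Paper] [Code]
"A Mechanistic View on Video Generation as World Models: State and Dynamics", arXiv 2026.01. [Paper]
"From Generative Engines to Actionable Simulators: The Imperative of Physical Grounding in World Models", arXiv 2026.01. [Paper]
"Modeling the Mental World for Embodied AI: A Comprehensive Review", arXiv 2026.01. [Paper]
"Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models", arXiv 2026.01. [Paper]
"Beyond World Models: Rethinking Understanding in AI Models", AAAI 2026. [Paper]
"Simulating the Visual World with Artificial Intelligence: A Roadmap", arXiv 2025.11. [Paper] [Website] [Code]
"A Step Toward World Models: A Survey on Robotic Manipulation", arXiv 2025.11. [Paper]
"World Models Should Prioritize the Unification of Physical and Social Dynamics", NeurIPS 2025. [Paper] [Website]
"From Masks to Worlds: A Hitchhiker's Guide to World Models", arXiv 2025.10. [Paper] [Website]
"A Comprehensive Survey on World Models for Embodied AI", arXiv 2025.10. [Paper] [Website]
"The Safety Challenge of World Models for Embodied AI Agents: A Review", arXiv 2025.10. [Paper]
"Embodied AI: From LLMs to World Models", IEEE CASM. [Paper]
"3D and 4D World Modeling: A Survey", arXiv 2025.09. [Paper]
"Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges", arXiv 2025.08. [Paper]
"A Survey: Learning Embodied Intelligence from Physical Simulators and World Models", arXiv 2025.07. [Paper] [Code]
"Embodied AI Agents: Modeling the World", arXiv 2025.06. [Paper]
"From 2D to 3D Cognition: A Brief Survey of General World Models", arXiv 2025.06. [Paper]
"A Survey of World Models Grounded in Acoustic Physical Information", arXiv 2025.06. [Paper]
"Exploring the Evolution of Physics Cognition in Video Generation: A Survey", arXiv 2025.03. [Paper] [Code]
"World Models in Artificial Intelligence: Sensing, Learning, and Reasoning Like a Child", arXiv 2025.03. [Paper]
"Simulating the Real World: A Unified Survey of Multimodal Generative Models", arXiv 2025.03. [Paper] [Code]
"Four Principles for Physically Interpretable World Models", arXiv 2025.03. [Paper]
"The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey", arXiv 2025.02. [Paper] [Code]
"A Survey of World Models for Autonomous Driving", TPAMI. [Paper]
"Understanding World or Predicting Future? A Comprehensive Survey of World Models", arXiv 2024.11. [Paper]
"World Models: The Safety Perspective", ISSRE WDMD. [Paper]
"Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey", arXiv 2024.11. [Paper]
"From Efficient Multimodal Models to World Models: A Survey", arXiv 2024.07. [Paper]
"Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI", arXiv 2024.07. [Paper] [Code]
"Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond", arXiv 2024.05. [Paper] [Code]
"World Models for Autonomous Driving: An Initial Survey", TIV. [Paper]
"A Survey on Multimodal Large Language Models for Autonomous Driving", WACVW 2024. [Paper] [Code]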

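Benchmarks & Evaluation

"World Reasoning Arena", arxiv 2026.03. [Paper] [Code]
Omni-WorldBench: "Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models", arxiv 2026.03. [Paper]
"Out of Sight, Out of Mind? Evaluating State Evolution in Video World Models", arxiv 2026.03. [Paper] [Website]
MicroVerse: "MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation", ICLR 2026. [Paper] [Code]
WorldArena: "WorldArena: A Unified Benchmark for Evaluating Perception and Functional Utility of Embodied World Models", arXiv 2026.02. [Paper] [Code]
MIND: "MIND: Benchmarking Memory Consistency and Action Control in World Models", arXiv 2026.02. [Paper] [Code]
WoW-bench: "World of Workflows: A Benchmark for Bringing World Models to Enterprise Systems", arXiv 2026.01. [Paper]
WorldBench: "WorldBench: Disambiguating Physics for Diagnostic Evaluation of World Models", arXiv 2026.01. [Paper] [Website]
PhysicsMind: "PhysicsMind: Sim and Real Mechanics Benchmarking for Physical Reasoning and Prediction in Foundational VLMs and World Models", arXiv 2026.01. [Paper]
RBench: "Rethinking Video Generation Model for the Embodied World", arXiv 2026.01. [Paper] [Website] [Code]
Wow, wo, val!: "Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test", arXiv 2026.01. [Paper]
DrivingGen: "DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving", arXiv 2026.01. [Paper] [Website]
"A Unified Definition of Hallucination, Or: It's the World Model, Stupid", arXiv 2025.12. [Paper]
"Active Intelligence in Video Avatars via Closed-Loop World Modeling", arXiv 2025.12. [Paper] [Code]
MobileWorldBench: "MobileWorldBench: Towards Semantic World Modeling for Mobile Agents", arXiv 2025.12. [Paper] [Code]
WorldLens: "WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World", arXiv 2025.12. [Paper] [Website]
"Evaluating Gemini Robotics Policies in a Veo World Simulator", arXiv 2025.12. [Paper]
On Memory: "On Memory: A Comparison of Memory Mechanisms in World Models", World Modeling Workshop 2026. [Paper]
SmallWorlds: "SmallWorlds: Assessing Dynamics Understanding of World Models in Isolated Environments", arXiv 2025.11. [Paper]
4DWorldBench: "4DWorldBench: A Comprehensive Evaluation Framework for 3D/4D World Generation Models", arXiv 2025.11. [Paper]
Target-Bench: "Target-Bench: Can World Models Achieve Mapless Path Planning with Semantic Targets?", arXiv 2025.11. [Paper]
PragWorld: "PragWorld: A Benchmark Evaluating LLMs' Local World Model under Minimal Linguistic Alterations and Conversational Dynamics", AAAI 2026. [Paper]
"Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark", arXiv 2025.11. [Paper] [Code]
"Scalable Policy Evaluation with Video World Models", arXiv 2025.11. [Paper]
"Expert Evaluation of LLM World Models: A High-Temperature Superconductivity Case Study", ICML 2025 workshop on assessing world models and current explorations in AI. [Paper]
"Benchmarking World-Model Learning", arXiv 2025.10. [Paper]
LikePhys: "LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference", ICLR 2026. [Paper] [Website]
World-in-World: "World-in-World: World Models in a Closed-Loop World", arXiv 2025.10. [Paper] [Website]
VideoVerse: "VideoVerse: How Far is Your T2V Generator from a World Model?", arXiv 2025.10. [Paper]
OmniWorld: "OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling", arXiv 2025.09. [Paper] [Website]
"Beyond Simulation: Benchmarking World Models for Planning and Causality in Autonomous Driving", ICRA 2025. [Paper]
WM-ABench: "Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation", ACL 2025 (Findings). [Paper] [Website]
UNIVERSE: "Adapting Vision-Language Models for Evaluating World Models", arxiv 2025.06. [Paper]
WorldPrediction: "WorldPrediction: A Benchmark for High-Level World Modeling and Long-Horizon Procedural Planning", arxiv 2025.06. [Paper]
"Toward Memory-Aided World Models: Benchmarking via Spatial Consistency", arxiv 2025.05. [Paper] [Dataset] [Code]
SimWorld: "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model", arxiv 2025.05. [Paper] [Code]
EWMBench: "EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models", arxiv 2025.05. [Paper] [Code]
"Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments", arxiv 2025.03. [Paper]
WorldModelBench: "WorldModelBench: Judging Video Generation Models as World Models", CVPR 2025. [Paper] [Website]
Text2World: "Text2World: Benchmarking Large Language Models for Symbolic World Model Generation", arxiv 2025.02. [Paper] [Website]
ACT-Bench: "ACT-Bench: Towards Action Controllable World Models for Autonomous Driving", arxiv 2024.12. [Paper]
WorldSimBench: "WorldSimBench: Towards Video Generation Models as World Simulators", arxiv 2024.10. [Paper] [Website]
EVA: "EVA: An Embodied World Model for Future Video Anticipation", ICML 2025. [Paper] [Website]
AeroVerse: "AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models", arxiv 2024.08. [Paper]
CityBench: "CityBench: Evaluating the Capabilities of Large Language Model as World Model", arXiv 2024.06. [Paper] [Code]
"Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models", NeurIPS 2023. [Paper]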

General World Models

InCoder-32B-Thinking: "InCoder-32B-Thinking: Industrial Code World Model for Thinking", arxiv 2026.04. [Paper]
Learn2Fold: "Learn2Fold: Structured Origami Generation with World Model Planning", arxiv 2026.03. [Paper]
WorldFlow3D: "WorldFlow3D: Flowing Through 3D Distributions for Unbounded World Generation", arxiv 2026.03. [Paper] [Website]
LOME: "LOME: Learning Human-Object Manipulation with Action-Conditioned Egocentric World Model", arxiv 2026.03. [Paper]
VGGRPO: "VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward", arxiv 2026.03. [Paper] [Website]
PiJEPA: "Policy-Guided World Model Planning for Language-Conditioned Visual Navigation", arxiv 2026.03. [Paper]
"Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models", arxiv 2026.03. [Paper]
Lingshu-Cell: "Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells", arxiv 2026.03. [Paper]
AI-Supervisor: "AI-Supervisor: Autonomous AI Research Supervision via a Persistent Research World Model", arxiv 2026.03. [Paper]
WildWorld: "WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG", arxiv 2026.03. [Paper] [Website] [Code]
"Model Predictive Control with Differentiable World Models for Offline Reinforcement Learning", arxiv 2026.03. [Paper]
WorldCache: "WorldCache: Content-Aware Caching for Accelerated Video World Models", arxiv 2026.03. [Paper] [Website]
"From Part to Whole: 3D Generative World Model with an Adaptive Structural Hierarchy", ICME 2026. [Paper]
EgoForge: "EgoForge: Goal-Directed Egocentric World Simulator", arxiv 2026.03. [Paper]
"Structured Latent Dynamics in Wireless CSI via Homomorphic World Models", IEEE ICC. [Paper]
WorldAgents: "WorldAgents: Can Foundation Image Models be Agents for 3D World Models?", arxiv 2026.03. [Paper] [Website]
R2-Dreamer: "R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation", ICLR 2026. [Paper] [Code]
StereoWorld: "Stereo World Model: Camera-Guided Stereo Video Generation", arxiv 2026.03. [Paper] [Website]
MosaicMem: "MosaicMem: Hybrid Spatial Memory for Controllable Video World Models", arxiv 2026.03. [Paper] [Website]
WorldCam: "WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation", arxiv 2026.03. [Paper] [Website]
SWM: "Grounding World Simulation Models in a Real-World Metropolis", arxiv 2026.03. [Paper] [Website]
NavThinker: "NavThinker: Action-Conditioned World Models for Coupled Prediction and Planning in Social Navigation", arxiv 2026.03. [Paper] [Website]
EyeWorld: "EyeWorld: A Generative World Model of Ocular State and Dynamics", arxiv 2026.03. [Paper]
CtrlAttack: "CtrlAttack: A Unified Attack on World-Model Control in Diffusion Models", arxiv 2026.03. [Paper]
SAW: "SAW: Toward a Surgical Action World Model via Controllable and Scalable Video Generation", arxiv 2026.03. [Paper]
VGGT-World: "VGGT-World: Transforming VGGT into an Autoregressive Geometry World Model", arxiv 2026.03. [Paper]
ARROW: "ARROW: Augmented Replay for RObust World models", arxiv 2026.03. [Paper]
RAE-NWM: "RAE-NWM: Navigation World Model in Dense Visual Representation Space", arxiv 2026.03. [Paper] [Code]
SPIRAL: "SPIRAL: A Closed-Loop Framework for Self-Improving Action World Models via Reflective Planning Agents", arxiv 2026.03. [Paper]
MWM: "MWM: Mobile World Models for Action-Conditioned Consistent Prediction", arxiv 2026.03. [Paper] [Website] [Code]
Brain-WM: "Brain-WM: Brain Glioblastoma World Model", arxiv 2026.03. [Paper] [Code]
DreamSAC: "DreamSAC: Learning Hamiltonian World Models via Symmetry Exploration", arxiv 2026.03. [Paper]
LiveWorld: "LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models", arxiv 2026.03. [Paper] [Website]
"What if? Emulative Simulation with World Models for Situated Reasoning", arxiv 2026.03. [Paper]
WorldCache: "WorldCache: Accelerating World Models for Free via Heterogeneous Token Caching", arxiv 2026.03. [Paper] [Website]
"Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model", CVPR 2026. [Paper]
"Beyond Pixel Histories: World Models with Persistent 3D State", arxiv 2026.03. [Paper] [Website]
"Contextual Latent World Models for Offline Meta Reinforcement Learning", arxiv 2026.03. [Paper]
"Next Embedding Prediction Makes World Models Stronger", arxiv 2026.03. [Paper]
COMBAT: "COMBAT: Conditional World Models for Behavioral Agent Training", arxiv 2026.03. [Paper]
DreamWorld: "DreamWorld: Unified World Modeling in Video Generation", arxiv 2026.03. [Paper] [Code]
MetaOthello: "MetaOthello: A Controlled Study of Multiple World Models in Transformers", arxiv 2026.02. [Paper]
GeoWorld: "GeoWorld: Geometric World Models", CVPR 2026. [Paper] [Website]
UCM: "UCM: Unifying Camera Control and Memory with Time-aware Positional Encoding Warping for World Models", arxiv 2026.02. [Paper] [Website]
"Code World Models for Parameter Control in Evolutionary Algorithms", arxiv 2026.02. [Paper]
Solaris: "Solaris: Building a Multiplayer Video World Model in Minecraft", arxiv 2026.02. [Paper] [Website]
"MRI Contrast Enhancement Kinetics World Model", CVPR 2026. [Paper]
"Neural Fields as World Models", arXiv 2026.02. [Paper]
"Learning Invariant Visual Representations for Planning with Joint-Embedding Predictive World Models", arXiv 2026.02. [Paper]
Generated Reality: "Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control", arXiv 2026.02. [Paper]
VLM-DEWM: "VLM-DEWM: Dynamic External World Model for Verifiable and Resilient Vision-Language Planning in Manufacturing", arXiv 2026.02. [Paper]
"World-Model-Augmented Web Agents with Action Correction", arXiv 2026.02. [Paper]
"Cold-Start Personalization via Training-Free Priors from Structured World Models", arXiv 2026.02. [Paper] [Code]
"World Models for Policy Refinement in StarCraft II", arXiv 2026.02. [Paper]
WebWorld: "WebWorld: A Large-Scale World Model for Web Agent Training", arXiv 2026.02. [Paper]
WIMLE: "WIMLE: Uncertainty-Aware World Models with IMLE for Sample-Efficient Continuous Control", ICLR 2026. [Paper]
Causal-JEPA: "Causal-JEPA: Learning World Models through Object-Level Latent Interventions", arXiv 2026.02. [Paper] [Website] [Code]
Olaf-World: "Olaf-World: Orienting Latent Actions for Video World Modeling", arXiv 2026.02. [Paper] [Website] [Code]
Agent World Model: "Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning", arXiv 2026.02. [Paper] [Code]
WorldCompass: "WorldCompass: Reinforcement Learning for Long-Horizon World Models", arXiv 2026.02. [Paper] [Website]
"Horizon Imagination: Efficient On-Policy Training in Diffusion World Models", ICLR 2026. [Paper] [Code]
"Geometry-Aware Rotary Position Embedding for Consistent Video World Model", arXiv 2026.02. [Paper]
"Debugging code world models", arXiv 2026.02. [Paper]
"Cross-View World Models", arXiv 2026.02. [Paper]
"Interpreting Physics in Video World Models", arXiv 2026.02. [Paper]
"Neural Sabermetrics with World Model: Play-by-play Predictive Modeling with Large Language Model", arXiv 2026.02. [Paper]
"From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers", arXiv 2026.02. [Paper]
"Self-Improving World Modelling with Latent Actions", arXiv 2026.02. [Paper]
"Reinforcement World Model Learning for LLM-based Agents", arXiv 2026.02. [Paper]
LIVE: "LIVE: Long-horizon Interactive Video World Modeling", arXiv 2026.02. [Paper] [Website]
EHRWorld: "EHRWorld: A Patient-Centric Medical World Model for Long-Horizon Clinical Trajectories", arXiv 2026.02. [Paper]
"Joint Learning of Hierarchical Neural Options and Abstract World Model", arXiv 2026.02. [Paper]
"Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments", ICLR 2026. [Paper] [Code]
"The Patient is not a Moving Document: A World Model Training Paradigm for Longitudinal EHR", arXiv 2026.01. [Paper]
PathWise: "PathWise: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs", arXiv 2026.01. [Paper]
"From Observations to Events: Event-Aware World Model for Reinforcement Learning", arXiv 2026.01. [Paper]
NuiWorld: "NuiWorld: Exploring a Scalable Framework for End-to-End Controllable World Generation", arXiv 2026.01. [Paper]
"“Just in Time” World Modeling Supports Human Planning and Reasoning", arXiv 2026.01. [Paper]
Action Shapley: "Action Shapley: A Training Data Selection Metric for World Model in Reinforcement Learning", arXiv 2026.01. [Paper]
"Inference-time Physics Alignment of Video Generative Models with Latent World Models", CVPR 2026. [Paper] [Code]
Imagine-then-Plan: "Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models", arXiv 2026.01. [Paper]
Puzzle it Out: "Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning", arXiv 2026.01. [Paper]
"Object-Centric World Models Meet Monte Carlo Tree Search", arXiv 2026.01. [Paper]
"Learning Latent Action World Models In The Wild", arXiv 2026.01. [Paper]
VerseCrafter: "VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control", arXiv 2026.01. [Paper] [Website]
"Choreographing a World of Dynamic Objects", arXiv 2026.01. [Paper] [Website]
MobileDreamer: "MobileDreamer: Generative Sketch World Model for GUI Agent", arXiv 2026.01. [Paper]
"Current Agents Fail to Leverage World Model as Tool for Foresight", arXiv 2026.01. [Paper]
"Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments", arXiv 2026.01. [Paper] [Website]
"Value-guided action planning with JEPA world models", ICLR 2026 World Modeling Workshop. [Paper]
NeoVerse: "NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos", arXiv 2026.01. [Paper] [Website]
TeleWorld: "TeleWorld: Towards Dynamic Multimodal Synthesis with a 4D World Model", arXiv 2026.01. [Paper]
"World model inspired sarcasm reasoning with large language model agents", arXiv 2025.12. [Paper]
LEWM: "Large Emotional World Model", arXiv 2025.12. [Paper]
WWM: "Web World Models", arXiv 2025.12. [Paper] [Website]
SurgWorld: "SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling", arXiv 2025.12. [Paper]
Agent2World: "Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback", arXiv 2025.12. [Paper] [Website]
Yume-1.5: "Yume-1.5: A Text-Controlled Interactive World Generation Model", arXiv 2025.12. [Paper] [Website] [Code]
"Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space", arXiv 2025.12. [Paper]
"From Word to World: Can Large Language Models be Implicit Text-based World Models?", arXiv 2025.12. [Paper]
"Dexterous World Models", arXiv 2025.12. [Paper] [Website]
PhysFire-WM: "PhysFire-WM: A Physics-Informed World Model for Emulating Fire Spread Dynamics", arxiv 2025.12. [Paper]
WorldPlay: "WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling", arxiv 2025.12. [Paper] [Website]
"The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces", arxiv 2025.12. [Paper]
LongVie 2: "LongVie 2: Multimodal Controllable Ultra-Long Video World Model", arxiv 2025.12. [Paper] [Website]
VFMF: "VFMF: World Modeling by Forecasting Vision Foundation Model Features", arxiv 2025.12. [Paper] [Website] [Code]
VDAWorld: "VDAWorld: World Modelling via VLM-Directed Abstraction and Simulation", arxiv 2025.12. [Paper] [Website]
"Closing the Train-Test Gap in World Models for Gradient-Based Planning", arxiv 2025.12. [Paper]
WonderZoom: "WonderZoom: Multi-Scale 3D World Generation", arxiv 2025.12. [Paper] [Website]
Astra: "Astra: General Interactive World Model with Autoregressive Denoising", arxiv 2025.12. [Paper] [Website] [Code]
Visionary: "Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform", arxiv 2025.12. [Paper] [Website]
CLARITY: "CLARITY: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space", arxiv 2025.12. [Paper]
UnityVideo: "UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation", arxiv 2025.12. [Paper] [Website] [Code]
"Speech World Model: Causal State-Action Planning with Explicit Reasoning for Speech", arxiv 2025.12. [Paper]
"Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling", arxiv 2025.12. [Paper] [Code]
ProPhy: "ProPhy: Progressive Physical Alignment for Dynamic World Simulation", arxiv 2025.12. [Paper]
BiTAgent: "BiTAgent: A Task-Aware Modular Framework for Bidirectional Coupling between Multimodal Large Language Models and World Models", arxiv 2025.12. [Paper]
RELIC: "RELIC: Interactive Video World Model with Long-Horizon Memory", arxiv 2025.12. [Paper] [Website]
"Better World Models Can Lead to Better Post-Training Performance", arxiv 2025.12. [Paper]
SeeU: "SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation", arxiv 2025.12. [Paper] [Website]
DynamicVerse: "DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling", arxiv 2025.12. [Paper]
IC-World: "IC-World: In-Context Generation for Shared World Modeling", arxiv 2025.12. [Paper] [Code]
WorldPack: "WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling", arxiv 2025.12. [Paper]
GrndCtrl: "GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment", arxiv 2025.12. [Paper]
ChronosObserver: "ChronosObserver: Taming 4D World with Hyperspace Diffusion Sampling", arxiv 2025.12. [Paper]
AVWM: "Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound", arxiv 2025.12. [Paper]
VCWorld: "VCWorld: A Biological World Model for Virtual Cell Simulation", arxiv 2025.12. [Paper] [Code]
VISTAv2: "VISTAv2: World Imagination for Indoor Vision-and-Language Navigation", arxiv 2025.12. [Paper] [Website]
Captain Safari: "Captain Safari: A World Engine", arxiv 2025.11. [Paper] [Website]
WorldWander: "WorldWander: Bridging Egocentric and Exocentric Worlds in Video Generation", arxiv 2025.11. [Paper] [Code]
Inferix: "Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation", arxiv 2025.11. [Paper] [Code]
MagicWorld: "MagicWorld: Towards Long-Horizon Stability for Interactive Video World Exploration". [Paper] [Website] [Code]
"Counterfactual World Models via Digital Twin-conditioned Video Diffusion", arxiv 2025.11. [Paper]
WorldGen: "WorldGen: From Text to Traversable and Interactive 3D Worlds", arxiv 2025.11. [Paper] [Website]
X-WIN: "X-WIN: Building Chest Radiograph World Model via Predictive Sensing", arxiv 2025.11. [Paper]
"Object-Centric World Models for Causality-Aware Reinforcement Learning", AAAI 2026. [Paper]
"Latent-Space Autoregressive World Model for Efficient and Robust Image-Goal Navigation", arxiv 2025.11. [Paper]
Dynamic Sparsity: "Dynamic Sparsity: Challenging Common Sparsity Assumptions for Learning World Models in Robotic Reinforcement Learning Benchmarks", AAAI 2026. [Paper]
MrCoM: "MrCoM: A Meta-Regularized World-Model Generalizing Across Multi-Scenarios", AAAI 2026. [Paper]
"Next-Latent Prediction Transformers Learn Compact World Models", arxiv 2025.11. [Paper]
DR. WELL: "DR. WELL: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration", NeurIPS 2025 Workshop: Bridging Language, Agent, and World Models for Reasoning and Planning (LAW). [Paper] [Website]
"How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment", arxiv 2025.11. [Paper]
"From Pixels to Cooperation Multi Agent Reinforcement Learning based on Multimodal World Models", arxiv 2025.11. [Paper]
"Bootstrap Off-policy with World Model", NeurIPS 2025. [Paper]
"Clone Deterministic 3D Worlds with Geometrically-Regularized World Models", arxiv 2025.10. [Paper]
"Semantic Communications with World Models", arxiv 2025.10. [Paper]
TRELLISWorld: "TRELLISWorld: Training-Free World Generation from Object Generators", arxiv 2025.10. [Paper]
WorldGrow: "WorldGrow: Generating Infinite 3D World", arxiv 2025.10. [Paper] [Code]
PhysWorld: "PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis", arxiv 2025.10. [Paper]
"How Hard is it to Confuse a World Model?", arxiv 2025.10. [Paper]
"Social World Model-Augmented Mechanism Design Policy Learning", NeurIPS 2025. [Paper]
VAGEN: "VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents", NeurIPS 2025. [Paper] [Website]
Cosmos-Surg-dVRK: "Cosmos-Surg-dVRK: World Foundation Model-based Automated Online Evaluation of Surgical Robot Policy Learning", arxiv 2025.10. [Paper]
"Zero-shot World Models via Search in Memory", arxiv 2025.10. [Paper]
"Vector Quantization in the Brain: Grid-like Codes in World Models", NeurIPS 2025. [Paper]
Terra: "Terra: Explorable Native 3D World Model with Point Latents", arxiv 2025.10. [Paper] [Website]
Deep SPI: "Deep SPI: Safe Policy Improvement via World Models", arxiv 2025.10. [Paper]
"One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration", arxiv 2025.10. [Paper] [Code]
R-WoM: "R-WoM: Retrieval-augmented World Model For Computer-use Agents", arxiv 2025.10. [Paper]
WorldMirror: "WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting", arxiv 2025.10. [Paper]
Unified World Models: "Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation", arxiv 2025.10. [Paper] [Code]
"Code World Models for General Game Playing", arxiv 2025.10. [Paper]
MorphoSim: "MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator", arxiv 2025.10. [Paper] [Code]
ChronoEdit: "ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation", arxiv 2025.10. [Paper] [Website]
SFP: "Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models", arxiv 2025.10. [Paper]
EvoWorld: "EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory", arxiv 2025.10. [Paper] [Code]
"World Model for AI Autonomous Navigation in Mechanical Thrombectomy", MICCAI 2025. Lecture Notes in Computer Science. [Paper]
DyMoDreamer: "DyMoDreamer: World Modeling with Dynamic Modulation", NeurIPS 2025. [Paper] [Code]
Dreamer4: "Training Agents Inside of Scalable World Models", arxiv 2025.09. [Paper] [Website]
"Reinforcement Learning with Inverse Rewards for World Model Post-training", arxiv 2025.09. [Paper]
"Context and Diversity Matter: The Emergence of In-Context Learning in World Models", arxiv 2025.09. [Paper]
FantasyWorld: "FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction", arxiv 2025.09. [Paper]
"Remote Sensing-Oriented World Model", arxiv 2025.09. [Paper]
"World Modeling with Probabilistic Structure Integration", arxiv 2025.09. [Paper]
"One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning", arxiv 2025.09. [Paper] [Code]
LatticeWorld: "LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation", arxiv 2025.09. [Paper]
"Planning with Reasoning using Vision Language World Model", arxiv 2025.09. [Paper]
"Social World Models", arxiv 2025.09. [Paper]
"Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization", arxiv 2025.08. [Paper]
HERO: "HERO: Hierarchical Extrapolation and Refresh for Efficient World Models", arxiv 2025.08. [Paper]
"Scalable RF Simulation in Generative 4D Worlds", arxiv 2025.08. [Paper]
"Finite Automata Extraction: Low-data World Model Learning as Programs from Gameplay Video", arxiv 2025.08. [Paper]
"Visuomotor Grasping with World Models for Surgical Robots", arxiv 2025.08. [Paper]
"In-Context Reinforcement Learning via Communicative World Models", arxiv 2025.08. [Paper] [Code]
PIGDreamer: "PIGDreamer: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning", ICML 2025. [Paper]
SimuRA: "SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model", arxiv 2025.07. [Paper]
"Back to the Features: DINO as a Foundation for Video World Models", arxiv 2025.07. [Paper]
Yume: "Yume: An Interactive World Generation Model", arxiv 2025.07. [Paper] [Website] [Code]
"LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning", arxiv 2025.07. [Paper]
"Safety Certification in the Latent space using Control Barrier Functions and World Models", arxiv 2025.07. [Paper]
"Assessing adaptive world models in machines with novel games", arxiv 2025.07. [Paper]
"Graph World Model", ICML 2025. [Paper] [Website]
MobiWorld: "MobiWorld: World Models for Mobile Wireless Network", arxiv 2025.07. [Paper]
"Continual Reinforcement Learning by Planning with Online World Models", ICML 2025 Spotlight. [Paper]
AirScape: "AirScape: An Aerial Generative World Model with Motion Controllability", arxiv 2025.07. [Paper] [Website]
Geometry Forcing: "Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling", arxiv 2025.07. [Paper] [Website]
Martian World Models: "Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions", arxiv 2025.07. [Paper] [Website]
"What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models", ICML 2025. [Paper]
"Critiques of World Models", arxiv 2025.07. [Paper]
"When do World Models Successfully Learn Dynamical Systems?", arxiv 2025.07. [Paper]
WebSynthesis: "WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis", arxiv 2025.07. [Paper]
"Accurate and Efficient World Modeling with Masked Latent Transformers", arxiv 2025.07. [Paper]
Dyn-O: "Dyn-O: Building Structured World Models with Object-Centric Representations", arxiv 2025.07. [Paper]
NavMorph: "NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments", ICCV 2025. [Paper] [Code]
"A “Good” Regulator May Provide a World Model for Intelligent Systems", arxiv 2025.06. [Paper]
Xray2Xray: "Xray2Xray: World Model from Chest X-rays with Volumetric Context", arxiv 2025.06. [Paper]
MATWM: "Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning", arxiv 2025.06. [Paper]
"Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework", arxiv 2025.06. [Paper]
"Efficient Generation of Diverse Cooperative Agents with World Models", arxiv 2025.06. [Paper]
WorldLLM: "WorldLLM: Improving LLMs' world modeling using curiosity-driven theory-making", arxiv 2025.06. [Paper]
"LLMs as World Models: Data-Driven and Human-Centered Pre-Event Simulation for Disaster Impact Assessment", arxiv 2025.06. [Paper]
"Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models", arxiv 2025.06. [Paper]
"Video World Models with Long-term Spatial Memory", arxiv 2025.06. [Paper] [Website]
DSG-World: "DSG-World: Learning a 3D Gaussian World Model from Dual State Videos", arxiv 2025.06. [Paper]
"Safe Planning and Policy Optimization via World Model Learning", arxiv 2025.06. [Paper]
FOLIAGE: "FOLIAGE: Towards Physical Intelligence World Models Via Unbounded Surface Evolution", arxiv 2025.06. [Paper]
"Linear Spatial World Models Emerge in Large Language Models", arxiv 2025.06. [Paper] [Code]
Simple, Good, Fast: "Simple, Good, Fast: Self-Supervised World Models Free of Baggage", ICLR 2025. [Paper] [Code]
Medical World Model: "Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning", arxiv 2025.06. [Paper]
"General agents need world models", ICML 2025. [Paper]
"Learning Abstract World Models with a Group-Structured Latent Space", arxiv 2025.06. [Paper]
DeepVerse: "DeepVerse: 4D Autoregressive Video Generation as a World Model", arxiv 2025.06. [Paper]
"World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks", arxiv 2025.06. [Paper]
Dyna-Think: "Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents", arxiv 2025.06. [Paper]
StateSpaceDiffuser: "StateSpaceDiffuser: Bringing Long Context to Diffusion World Models", arxiv 2025.05. [Paper]
"Learning World Models for Interactive Video Generation", arxiv 2025.05. [Paper]
"Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective", arxiv 2025.05. [Paper]
"Long-Context State-Space Video World Models", arxiv 2025.05. [Paper] [Website]
"Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach", arxiv 2025.05. [Paper]
"World Models as Reference Trajectories for Rapid Motor Adaptation", arxiv 2025.05. [Paper]
"Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning", arxiv 2025.05. [Paper]
"Building spatial world models from sparse transitional episodic memories", arxiv 2025.05. [Paper]
PoE-World: "PoE-World: Compositional World Modeling with Products of Programmatic Experts", arxiv 2025.05. [Paper] [Website]
"Explainable Reinforcement Learning Agents Using World Models", arxiv 2025.05. [Paper]
seq-JEPA: "seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models", arxiv 2025.05. [Paper]
"Coupled Distributional Random Expert Distillation for World Model Online Imitation Learning", arxiv 2025.05. [Paper]
"Learning Local Causal World Models with State Space Models and Attention", arxiv 2025.05. [Paper]
WebEvolver: "WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model", arxiv 2025.04. [Paper]
WALL-E 2.0: "WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents", arxiv 2025.04. [Paper] [Code]
ViMo: "ViMo: A Generative Visual GUI World Model for App Agent", arxiv 2025.04. [Paper]
"Simulating Before Planning: Constructing Intrinsic User World Model for User-Tailored Dialogue Policy Planning", SIGIR 2025. [Paper]
CheXWorld: "CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning", CVPR 2025. [Paper] [Code]
EchoWorld: "EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance", CVPR 2025. [Paper] [Code]
"Adapting a World Model for Trajectory Following in a 3D Game", ICLR 2025 Workshop on World Models. [Paper]
MineWorld: "MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft", arXiv 2025.04. [Paper] [Website]
MoSim: "Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning", CVPR 2025. [Paper]
"Improving World Models using Deep Supervision with Linear Probes", ICLR 2025 Workshop on World Models. [Paper]
"Decentralized Collective World Model for Emergent Communication and Coordination", arXiv 2025.04. [Paper]
"Adapting World Models with Latent-State Dynamics Residuals", arXiv 2025.04. [Paper]
"Can Test-Time Scaling Improve World Foundation Model?", arXiv 2025.03. [Paper] [Code]
"Synthesizing world models for bilevel planning", arXiv 2025.03. [Paper]
"Long-context autoregressive video modeling with next-frame prediction", arXiv 2025.03. [Paper] [Code] [Website]
Aether: "Aether: Geometric-Aware Unified World Modeling", arXiv 2025.03. [Paper] [Website]
FUSDREAMER: "FUSDREAMER: Label-efficient Remote Sensing World Model for Multimodal Data Classification", arXiv 2025.03. [Paper] [Website]
"Inter-environmental world modeling for continuous and compositional dynamics", arXiv 2025.03. [Paper]
Disentangled World Models: "Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning", arXiv 2025.03. [Paper]
"Revisiting the Othello World Model Hypothesis", ICLR World Models Workshop. [Paper]
"Learning Transformer-based World Models with Contrastive Predictive Coding", arXiv 2025.03. [Paper]
"Surgical Vision World Model", arXiv 2025.03. [Paper]
"World Models for Anomaly Detection during Model-Based Reinforcement Learning Inference", arXiv 2025.03. [Paper]
WMNav: "WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation", arXiv 2025.03. [Paper] [Website]
SENSEI: "SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models", arXiv 2025.03. [Paper] [Website]
"Learning Actionable World Models for Industrial Process Control", arXiv 2025.03. [Paper]
"Implementing Spiking World Model with Multi-Compartment Neurons for Model-based Reinforcement Learning", arXiv 2025.03. [Paper]
"Discrete Codebook World Models for Continuous Control", ICLR 2025. [Paper]
Multimodal Dreaming: "Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning", arXiv 2025.02. [Paper]
"Generalist World Model Pre-Training for Efficient Reinforcement Learning", arXiv 2025.02. [Paper]
"Learning To Explore With Predictive World Model Via Self-Supervised Learning", arXiv 2025.02. [Paper]
M^3: "M^3: A Modular World Model over Streams of Tokens", arXiv 2025.02. [Paper]
"When do neural networks learn world models?", arXiv 2025.02. [Paper]
"Pre-Trained Video Generative Models as World Simulators", arXiv 2025.02. [Paper]
DMWM: "DMWM: Dual-Mind World Model with Long-Term Imagination", arXiv 2025.02. [Paper]
EvoAgent: "EvoAgent: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks", arXiv 2025.02. [Paper]
"Acquisition through My Eyes and Steps: A Joint Predictive Agent Model in Egocentric Worlds", arXiv 2025.02. [Paper]
"Generating Symbolic World Models via Test-time Scaling of Large Language Models", arXiv 2025.02. [Paper] [Website]
"Improving Transformer World Models for Data-Efficient RL", arXiv 2025.02. [Paper]
"Trajectory World Models for Heterogeneous Environments", arXiv 2025.02. [Paper]
"Enhancing Memory and Imagination Consistency in Diffusion-based World Models via Linear-Time Sequence Modeling", arXiv 2025.02. [Paper]
"Objects matter: object-centric world models improve reinforcement learning in visually complex environments", arXiv 2025.01. [Paper]
GLAM: "GLAM: Global-Local Variation Awareness in Mamba-based World Model", arXiv 2025.01. [Paper]
GAWM: "GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning", arXiv 2025.01. [Paper]
"Generative Emergent Communication: Large Language Model is a Collective World Model", arXiv 2025.01. [Paper]
"Towards Unraveling and Improving Generalization in World Models", arXiv 2025.01. [Paper]
"Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction", arXiv 2024.12. [Paper]
"Transformers Use Causal World Models in Maze-Solving Tasks", arXiv 2024.12. [Paper]
"Causal World Representation in the GPT Model", NIPS 2024 Workshop. [Paper]
Owl-1: "Owl-1: Omni World Model for Consistent Long Video Generation", arXiv 2024.12. [Paper]
"Navigation World Models", arXiv 2024.12. [Paper] [Website]
"Evaluating World Models with LLM for Decision Making", arXiv 2024.11. [Paper]
LLMPhy: "LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models", arXiv 2024.11. [Paper]
WebDreamer: "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents", arXiv 2024.11. [Paper] [Code]
"Scaling Laws for Pre-training Agents and World Models", arXiv 2024.11. [Paper]
DINO-WM: "DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning", arXiv 2024.11. [Paper] [Website]
"Learning World Models for Unconstrained Goal Navigation", NIPS 2024. [Paper]
"How Far is Video Generation from World Model: A Physical Law Perspective", arXiv 2024.11. [Paper] [Website] [Code]
Adaptive World Models: "Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity", NIPS 2024 Workshop Adaptive Foundation Models. [Paper]
LLMCWM: "Language Agents Meet Causality -- Bridging LLMs and Causal World Models", arXiv 2024.10. [Paper] [Code]
"Reward-free World Models for Online Imitation Learning", arXiv 2024.10. [Paper]
"Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation", arXiv 2024.10. [Paper]
AVID: "AVID: Adapting Video Diffusion Models to World Models", arXiv 2024.10. [Paper] [Code]
SMAC: "Grounded Answers for Multi-agent Decision-making Problem through Generative World Model", NeurIPS 2024. [Paper]
OSWM: "One-shot World Models Using a Transformer Trained on a Synthetic Prior", arXiv 2024.09. [Paper]
"Making Large Language Models into World Models with Precondition and Effect Knowledge", arXiv 2024.09. [Paper]
"Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction", arXiv 2024.08. [Paper]
MoReFree: "World Models Increase Autonomy in Reinforcement Learning", arXiv 2024.08. [Paper] [Project]
UrbanWorld: "UrbanWorld: An Urban World Model for 3D City Generation", arXiv 2024.07. [Paper]
PWM: "PWM: Policy Learning with Large World Models", arXiv 2024.07. [Paper] [Code]
"Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling", arXiv 2024.07. [Paper]
GenRL: "GenRL: Multimodal foundation world models for generalist embodied agents", arXiv 2024.06. [Paper] [Code]
DLLM: "World Models with Hints of Large Language Models for Goal Achieving", arXiv 2024.06. [Paper]
"Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model", arXiv 2024.06. [Paper]
CoDreamer: "CoDreamer: Communication-Based Decentralised World Models", arXiv 2024.06. [Paper]
Pandora: "Pandora: Towards General World Model with Natural Language Actions and Video States", arXiv 2024.06. [Paper] [Code]
EBWM: "Cognitively Inspired Energy-Based World Models", arXiv 2024.06. [Paper]
"Evaluating the World Model Implicit in a Generative Model", arXiv 2024.06. [Paper] [Code]
"Transformers and Slot Encoding for Sample Efficient Physical World Modelling", arXiv 2024.05. [Paper] [Code]
Puppeteer: "Hierarchical World Models as Visual Whole-Body Humanoid Controllers", arXiv 2024.05. [Paper] [Code]
BWArea Model: "BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation", arXiv 2024.05. [Paper]
WKM: "Agent Planning with World Knowledge Model", arXiv 2024.05. [Paper] [Code]
Diamond: "Diffusion for World Modeling: Visual Details Matter in Atari", arXiv 2024.05. [Paper] [Code]
"Compete and Compose: Learning Independent Mechanisms for Modular World Models", arXiv 2024.04. [Paper]
"Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization", arXiv 2024.03. [Paper] [Code]
V-JEPA: "V-JEPA: Video Joint Embedding Predictive Architecture", Meta AI. [Blog] [Paper] [Code]
IWM: "Learning and Leveraging World Models in Visual Representation Learning", Meta AI. [Paper]
Genie: "Genie: Generative Interactive Environments", DeepMind. [Paper] [Blog]
Sora: "Video generation models as world simulators", OpenAI. [Technical report]
LWM: "World Model on Million-Length Video And Language With RingAttention", arXiv 2024.02. [Paper] [Code]
"Planning with an Ensemble of World Models", OpenReview. [Paper]
WorldDreamer: "WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens", arXiv 2024.01. [Paper] [Code]
CWM: "Understanding Physical Dynamics with Counterfactual World Modeling", ECCV 2024. [Paper] [Code]
Δ-IRIS: "Efficient World Models with Context-Aware Tokenization", ICML 2024. [Paper] [Code]
LLM-Sim: "Can Language Models Serve as Text-Based World Simulators?", ACL. [Paper] [Code]
AD3: "AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors", ICML 2024. [Paper]
MAMBA: "MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning", ICLR 2024. [Paper] [Code]
R2I: "Mastering Memory Tasks with World Models", ICLR 2024. [Paper] [Website] [Code]
HarmonyDream: "HarmonyDream: Task Harmonization Inside World Models", ICML 2024. [Paper] [Code]
REM: "Improving Token-Based World Models with Parallel Observation Prediction", ICML 2024. [Paper] [Code]
"Do Transformer World Models Give Better Policy Gradients?", ICML 2024. [Paper]
DreamSmooth: "DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing", ICLR 2024. [Paper]
TD-MPC2: "TD-MPC2: Scalable, Robust World Models for Continuous Control", ICLR 2024. [Paper] [Torch Code]
Hieros: "Hieros: Hierarchical Imagination on Structured State Space Sequence World Models", ICML 2024. [Paper]
CoWorld: "Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning", NeurIPS 2024. [Paper]

World Models for Embodied AI

JailWAM: "JailWAM: Jailbreaking World Action Models in Robot Control", arxiv 2026.04. [Paper]
"Hierarchical Planning with Latent World Models", arxiv 2026.04. [Paper]
"World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry", arxiv 2026.04. [Paper] [Website]
EgoSim: "EgoSim: Egocentric World Simulator for Embodied Interaction Generation", arxiv 2026.04. [Paper] [Website]
"Enhancing Policy Learning with World-Action Model", arxiv 2026.03. [Paper]
"Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning", arxiv 2026.03. [Paper]
ThinkJEPA: "ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model", arxiv 2026.03. [Paper]
"Do World Action Models Generalize Better than VLAs? A Robustness Study", arxiv 2026.03. [Paper]
DDP: "Dreaming the Unseen: World Model-regularized Diffusion Policy for Out-of-Distribution Robustness", arxiv 2026.03. [Paper]
OmniVTA: "OmniVTA: Visuo-Tactile World Modeling for Contact-Rich Robotic Manipulation", arxiv 2026.03. [Paper] [Website]
EVA: "EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards", arxiv 2026.03. [Paper] [Website]
DreamPlan: "DreamPlan: Efficient Reinforcement Fine-Tuning of Vision-Language Planners via Video World Models", arxiv 2026.03. [Paper] [Website]
Kinema4D: "Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation", arxiv 2026.03. [Paper] [Website]
"Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation", arxiv 2026.03. [Paper] [Website]
WestWorld: "WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotic Systems", arxiv 2026.03. [Paper] [Website]
"Building Explicit World Model for Zero-Shot Open-World Object Manipulation", arxiv 2026.03. [Paper] [Website]
"Egocentric World Model for Photorealistic Hand-Object Interaction Synthesis", arxiv 2026.03. [Paper] [Website]
RoboStereo: "RoboStereo: Dual-Tower 4D Embodied World Models for Unified Policy Optimization", arxiv 2026.03. [Paper]
ResWM: "ResWM: Residual-Action World Model for Visual RL", arxiv 2026.03. [Paper]
PlayWorld: "PlayWorld: Learning Robot World Models from Autonomous Play", arxiv 2026.03. [Paper] [Website]
MetaWorld-X: "MetaWorld-X: Hierarchical World Modeling via VLM-Orchestrated Experts for Humanoid Loco-Manipulation", arxiv 2026.03. [Paper] [Website]
"Interactive World Simulator for Robot Policy Training and Evaluation", arxiv 2026.03. [Paper] [Website]
"Foundational World Models Accurately Detect Bimanual Manipulator Failures", ICRA 2026. [Paper]
LPWM: "Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling", ICLR 2026. [Paper] [Website]
AdaWorldPolicy: "AdaWorldPolicy: World-Model-Driven Diffusion Policy with Online Adaptive Learning for Robotic Manipulation", arXiv 2026.02. [Paper] [Website]
FRAPPE: "FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment", arXiv 2026.02. [Paper] [Website]
"Learning to unfold cloth: Scaling up world models to deformable object manipulation", arXiv 2026.02. [Paper]
"World Model Failure Classification and Anomaly Detection for Autonomous Inspection", arXiv 2026.02. [Paper] [Website]
DreamZero: "World Action Models are Zero-shot Policies", arXiv 2026.02. [Paper] [Website]
WoVR: "WoVR: World Models as Reliable Simulators for Post-Training VLA Policies with RL", arXiv 2026.02. [Paper]
"Visual Foresight for Robotic Stow: A Diffusion-Based World Model from Sparse Snapshots", arXiv 2026.02. [Paper]
VLAW: "VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model", arXiv 2026.02. [Paper] [Website]
HAIC: "HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model", arXiv 2026.02. [Paper] [Website]
H-WM: "H-WM: Robotic Task and Motion Planning Guided by Hierarchical World Model", arXiv 2026.02. [Paper]
RISE: "RISE: Self-Improving Robot Policy with Compositional World Model", arXiv 2026.02. [Paper] [Website]
"ContactGaussian-WM: Learning Physics-Grounded World Model from Videos", arXiv 2026.02. [Paper]
"Scaling World Model for Hierarchical Manipulation Policies", arXiv 2026.02. [Paper] [Website]
Say, Dream, and Act: "Say, Dream, and Act: Learning Video World Models for Instruction-Driven Robot Manipulation", arXiv 2026.02. [Paper]
"Affordances Enable Partial World Modeling with LLMs", arXiv 2026.02. [Paper]
VLA-JEPA: "VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model", arXiv 2026.02. [Paper]
MVISTA-4D: "MVISTA-4D: View-Consistent 4D World Model with Test-Time Action Inference for Robotic Manipulation", arXiv 2026.02. [Paper]
Hand2World: "Hand2World: Autoregressive Egocentric Interaction Generation via Free-Space Hand Gestures", arXiv 2026.02. [Paper] [Website]
World-VLA-Loop: "World-VLA-Loop: Closed-Loop Learning of Video World Model and VLA Policy", arXiv 2026.02. [Paper] [Website]
"Coupled Local and Global World Models for Efficient First Order RL", arXiv 2026.02. [Paper]
"Visuo-Tactile World Models", arXiv 2026.02. [Paper]
BridgeV2W: "BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment Masks", arXiv 2026.02. [Paper] [Website]
World-Gymnast: "World-Gymnast: Training Robots with Reinforcement Learning in a World Model", arXiv 2026.02. [Paper] [Website]
MetaWorld: "MetaWorld: Skill Transfer and Composition in a Hierarchical World Model for Grounding High-Level Instructions", arXiv 2026.01. [Paper] [Code]
"Walk through Paintings: Egocentric World Models from Internet Priors", arXiv 2026.01. [Paper]
"Aligning Agentic World Models via Knowledgeable Experience Learning", arXiv 2026.01. [Paper]
ReWorld: "ReWorld: Multi-Dimensional Reward Modeling for Embodied World Models", arXiv 2026.01. [Paper]
"An Efficient and Multi-Modal Navigation System with One-Step World Model", arXiv 2026.01. [Paper]
PointWorld: "PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation", arXiv 2026.01. [Paper] [Website]
Dream2Flow: "Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow", arXiv 2025.12. [Paper] [Website]
"What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?", arXiv 2025.12. [Paper] [Code]
Act2Goal: "Act2Goal: From World Model To General Goal-conditioned Policy", arXiv 2025.12. [Paper] [Website]
AstraNav-World: "AstraNav-World: World Model for Foresight Control and Consistency", arXiv 2025.12. [Paper]
ChronoDreamer: "ChronoDreamer: Action-Conditioned World Model as an Online Simulator for Robotic Planning", arXiv 2025.12. [Paper]
STORM: "STORM: Search-Guided Generative World Models for Robotic Manipulation", arXiv 2025.12. [Paper]
"World Models Can Leverage Human Videos for Dexterous Manipulation", arXiv 2025.12. [Paper]
"Latent Action World Models for Control with Unlabeled Trajectories", arXiv 2025.12. [Paper]
PRISM-WM: "Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems", arXiv 2025.12. [Paper]
"Learning Robot Manipulation from Audio World Models", arXiv 2025.12. [Paper]
"Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model", arXiv 2025.12. [Paper] [Website]
"World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty", arXiv 2025.12. [Paper]
"Real-World Robot Control by Deep Active Inference With a Temporally Hierarchical World Model", IEEE Robotics and Automation Letters. [Paper]
"Seeing through Imagination: Learning Scene Geometry via Implicit Spatial World Modeling", arXiv 2025.12. [Paper]
IGen: "IGen: Scalable Data Generation for Robot Learning from Open-World Images", arXiv 2025.12. [Paper] [Website]
NavForesee: "NavForesee: A Unified Vision-Language World Model for Hierarchical Planning and Dual-Horizon Navigation Prediction", arXiv 2025.12. [Paper]
TraceGen: "TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos", arXiv 2025.11. [Paper] [Website]
ENACT: "ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction", arXiv 2025.11. [Paper] [Website] [Code]
"Learning Massively Multitask World Models for Continuous Control", arXiv 2025.11. [Paper] [Website]
UNeMo: "UNeMo: Collaborative Visual-Language Reasoning and Navigation via a Multimodal World Model", arXiv 2025.11. [Paper]
"MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Cultural Learning", NeurIPS 2025. [Paper]
"Towards High-Consistency Embodied World Model with Multi-View Trajectory Videos", arXiv 2025.11. [Paper]
WMPO: "WMPO: World Model-based Policy Optimization for Vision-Language-Action Models", arXiv 2025.11. [Paper] [Website]
"Robot Learning from a Physical World Model", arXiv 2025.11. [Paper] [Website]
"When Object-Centric World Models Meet Policy Learning: From Pixels to Policies, and Where It Breaks", arXiv 2025.11. [Paper]
WorldPlanner: "WorldPlanner: Monte Carlo Tree Search and MPC with Action-Conditioned Visual World Models", arXiv 2025.11. [Paper]
"Learning Interactive World Model for Object-Centric Reinforcement Learning", NIPS 2025. [Paper]
"Scaling Cross-Embodiment World Models for Dexterous Manipulation", arXiv 2025.11. [Paper]
"Co-Evolving Latent Action World Models", arXiv 2025.10. [Paper]
"Deductive Chain-of-Thought Augmented Socially-aware Robot Navigation World Model", arXiv 2025.10. [Paper] [Website]
"Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation", arXiv 2025.10. [Paper]
ProTerrain: "ProTerrain: Probabilistic Physics-Informed Rough Terrain World Modeling", arXiv 2025.10. [Paper]
"Ego-Vision World Model for Humanoid Contact Planning", arXiv 2025.10. [Paper] [Website]
Ctrl-World: "Ctrl-World: A Controllable Generative World Model for Robot Manipulation", arXiv 2025.10. [Paper] [Website] [Code]
iMoWM: "iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation", arXiv 2025.10. [Paper] [Website]
WristWorld: "WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation", arXiv 2025.10. [Paper]
"A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models", arXiv 2025.10. [Paper]
"Kinodynamic Motion Planning for Mobile Robot Navigation across Inconsistent World Models", RSS 2025 Workshop on Resilient Off-road Autonomous Robotics (ROAR). [Paper]
EMMA: "EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer", arXiv 2025.09. [Paper]
LongScape: "LongScape: Advancing Long-Horizon Embodied World Models with Context-Aware MoE", arXiv 2025.09. [Paper]
KeyWorld: "KeyWorld: Key Frame Reasoning Enables Effective and Efficient World Models", arXiv 2025.09. [Paper]
DAWM: "DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions", ICML 2025 Workshop. [Paper]
World4RL: "World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation", arXiv 2025.09. [Paper]
SAMPO: "SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models", arXiv 2025.09. [Paper]
PhysicalAgent: "PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models", arXiv 2025.09. [Paper]
"Empowering Multi-Robot Cooperation via Sequential World Models", arXiv 2025.09. [Paper]
"World Model Implanting for Test-time Adaptation of Embodied Agents", ICML 2025. [Paper]
"Learning Primitive Embodied World Models: Towards Scalable Robotic Learning", arxiv 2025.08. [Paper] [Website]
GWM: "GWM: Towards Scalable Gaussian World Models for Robotic Manipulation", ICCV 2025. [Paper] [Website]
"Imaginative World Modeling with Scene Graphs for Embodied Agent Navigation", arxiv 2025.08. [Paper]
"Bounding Distributional Shifts in World Modeling through Novelty Detection", arxiv 2025.08. [Paper]
Genie Envisioner: "Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation", arxiv 2025.08. [Paper] [Website]
DiWA: "DiWA: Diffusion Policy Adaptation with World Models", CoRL 2025. [Paper] [Code]
CoEx: "CoEx -- Co-evolving World-model and Exploration", arxiv 2025.07. [Paper]
"Latent Policy Steering with Embodiment-Agnostic Pretrained World Models", arxiv 2025.07. [Paper]
MindJourney: "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning", arxiv 2025.07. [Paper] [Website]
FOUNDER: "FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making", ICML 2025. [Paper] [Website]
EmbodieDreamer: "EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling", arxiv 2025.07. [Paper] [Website]
World4Omni: "World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation", arxiv 2025.06. [Paper] [Website]
RoboScape: "RoboScape: Physics-informed Embodied World Model", arxiv 2025.06. [Paper] [Code]
ParticleFormer: "ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation", arxiv 2025.06. [Paper] [Website]
ManiGaussian++: "ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model", arxiv 2025.06. [Paper] [Code]
ReOI: "Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control", arxiv 2025.06. [Paper]
GAF: "GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation", arxiv 2025.06. [Paper] [Website]
"Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins", RSS 2025. [Paper] [Website]
V-JEPA 2 and V-JEPA 2-AC: "V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning", arxiv 2025.06. [Paper] [Website] [Code]
"Time-Aware World Model for Adaptive Prediction and Control", ICML 2025. [Paper]
3DFlowAction: "3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model", arxiv 2025.06. [Paper]
ORV: "ORV: 4D Occupancy-centric Robot Video Generation", arxiv 2025.06. [Paper] [Code] [Website]
WoMAP: "WoMAP: World Models For Embodied Open-Vocabulary Object Localization", arxiv 2025.06. [Paper]
"Sparse Imagination for Efficient Visual World Model Planning", arxiv 2025.06. [Paper]
Humanoid World Models: "Humanoid World Models: Open World Foundation Models for Humanoid Robotics", arxiv 2025.06. [Paper]
"Evaluating Robot Policies in a World Model", arxiv 2025.06. [Paper] [Website]
OSVI-WM: "OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation", arxiv 2025.05. [Paper]
WorldEval: "WorldEval: World Model as Real-World Robot Policies Evaluator", arxiv 2025.05. [Paper] [Website]
"Consistent World Models via Foresight Diffusion", arxiv 2025.05. [Paper]
Vid2World: "Vid2World: Crafting Video Diffusion Models to Interactive World Models", arXiv 2025.05. [Paper] [Website]
RLVR-World: "RLVR-World: Training World Models with Reinforcement Learning", arXiv 2025.05. [Paper] [Website] [Code]
LaDi-WM: "LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation", arXiv 2025.05. [Paper]
FlowDreamer: "FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation", arXiv 2025.05. [Paper] [Website]
"Occupancy World Model for Robots", arXiv 2025.05. [Paper]
"Learning 3D Persistent Embodied World Models", arXiv 2025.05. [Paper]
TesserAct: "TesserAct: Learning 4D Embodied World Models", arXiv 2025.04. [Paper] [Website]
PIN-WM: "PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation", arXiv 2025.04. [Paper]
"Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator", arXiv 2025.04. [Paper]
ManipDreamer: "ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance", arXiv 2025.04. [Paper]
UWM: "Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets", arXiv 2025.04. [Paper] [Website]
"Perspective-Shifted Neuro-Symbolic World Models: A Framework for Socially-Aware Robot Navigation", arXiv 2025.03. [Paper]
AdaWorld: "AdaWorld: Learning Adaptable World Models with Latent Actions", arXiv 2025.03. [Paper] [Website]
DyWA: "DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation", arXiv 2025.03. [Paper] [Website]
"Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks", arXiv 2025.03. [Paper] [Website]
"World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning", arXiv 2025.03. [Paper]
LUMOS: "LUMOS: Language-Conditioned Imitation Learning with World Models", ICRA 2025. [Paper] [Website]
"Object-Centric World Model for Language-Guided Manipulation", arXiv 2025.03. [Paper]
DEMO^3: "Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning", arXiv 2025.03. [Paper] [Website]
"Accelerating Model-Based Reinforcement Learning with State-Space World Models", arXiv 2025.02. [Paper]
"Learning Humanoid Locomotion with World Model Reconstruction", arXiv 2025.02. [Paper]
"Strengthening Generative Robot Policies through Predictive World Modeling", arXiv 2025.02. [Paper] [Website]
Robotic World Model: "Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics", arXiv 2025.01. [Paper]
RoboHorizon: "RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation", arXiv 2025.01. [Paper]
Dream to Manipulate: "Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination", arXiv 2024.12. [Paper] [Website]
WHALE: "WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making", arXiv 2024.11. [Paper]
VisualPredicator: "VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning", arXiv 2024.10. [Paper]
"Multi-Task Interactive Robot Fleet Learning with Visual World Models", CoRL 2024. [Paper] [Code]
X-MOBILITY: "X-MOBILITY: End-To-End Generalizable Navigation via World Modeling", arXiv 2024.10. [Paper]
PIVOT-R: "PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation", NeurIPS 2024. [Paper]
GLIMO: "Grounding Large Language Models In Embodied Environment With Imperfect World Models", arXiv 2024.10. [Paper]
EVA: "EVA: An Embodied World Model for Future Video Anticipation", arxiv 2024.10. [Paper] [Website]
PreLAR: "PreLAR: World Model Pre-training with Learnable Action Representation", ECCV 2024. [Paper] [Code]
WMP: "World Model-based Perception for Visual Legged Locomotion", arXiv 2024.09. [Paper] [Project]
R-AIF: "R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models", arXiv 2024.09. [Paper]
"Representing Positional Information in Generative World Models for Object Manipulation", arXiv 2024.09. [Paper]
DexSim2Real$^2$: "DexSim2Real$^2$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation", arXiv 2024.09. [Paper]
DWL: "Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning", RSS 2024 (Best Paper Award Finalist). [Paper]
"Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics", arXiv 2024.06. [Paper] [Website]
HRSSM: "Learning Latent Dynamic Robust Representations for World Models", ICML 2024. [Paper] [Code]
RoboDreamer: "RoboDreamer: Learning Compositional World Models for Robot Imagination", ICML 2024. [Paper] [Code]
COMBO: "COMBO: Compositional World Models for Embodied Multi-Agent Cooperation", ECCV 2024. [Paper] [Website] [Code]
ManiGaussian: "ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation", arXiv 2024.03. [Paper] [Code]

World Models for VLA

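DIAL: "DIAL: Decoupling Intent and Action via Latent World Modeling for End-to-End VLA", arXiv 2026.03. [Paper] [Website]
"Practical World Model-based Reinforcement Learning for Vision-Language-Action Models", arXiv 2026.03. [Paper]
"Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds", arXiv 2026.03. [Paper]
Fast-WAM: "Fast-WAM: Do World Action Models Need Test-time Future Imagination?", arXiv 2026.03. [Paper] [Website]
StructVLA: "Beyond Dense Futures: World Models as Structured Planners for Robotic Manipulation", arXiv 2026.03. [Paper]
World2Act: "World2Act: Latent Action Post-Training via Skill-Compositional World Models", arXiv 2026.03. [Paper] [Website]
AtomVLA: "AtomVLA: Scalable Post-Training for Robotic Manipulation via Predictive Latent World Models", arXiv 2026.03. [Paper]
Chain of World: "Chain of World: World Model Thinking in Latent Motion", CVPR 2026. [Paper] [Website]
"Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Model for Robotic Manipulation", arXiv 2026.03. [Paper]
World Guidance: "World Guidance: World Modeling in Condition Space for Action Generation", arXiv 2026.02. [Paper] [Website]
SC-VLA: "Self-Correcting VLA: Online Action Refinement via Sparse World Imagination", arXiv 2026.02. [Paper] [Website]
Motus: "Motus: A Unified Latent Action World Model", arXiv 2025.12. [Paper] [Website] [Code]
RoboScape-R: "RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL", arXiv 2025.12. [Paper]
AdaPower: "AdaPower: Specializing World Foundation Models for Predictive Manipulation", arXiv 2025.12. [Paper]
RynnVLA-002: "RynnVLA-002: A Unified Vision-Language-Action and World Model", arXiv 2025.11. [Paper] [Code]
NORA-1.5: "NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards", arXiv 2025.11. [Paper] [Website] [Code]
"Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model", arXiv 2025.10. [Paper]
VLA-RFT: "VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators", arXiv 2025.10. [Paper]
World-Env: "World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training", arXiv 2025.09. [Paper]
MoWM: "MoWM: Mixture-of-World-Models for Embodied Planning via Latent-to-Pixel Feature Modulation", arXiv 2025.09. [Paper]
LAWM: "Latent Action Pretraining Through World Modeling", arXiv 2025.09. [Paper] [Code]
PAR: "Physical Autoregressive Model for Robotic Manipulation without Action Pretraining", arXiv 2025.08. [Paper] [Website]
DreamVLA: "DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge", arXiv 2025.07. [Paper] [Code] [Website]
WorldVLA: "WorldVLA: Towards Autoregressive Action World Model", arXiv 2025.06. [Paper] [Code]
UniVLA: "UniVLA: Unified Vision-Language-Action Model", arXiv 2025.06. [Paper] [Code]
MinD: "MinD: Unified Visual Imagination and Control via Hierarchical World Models", arXiv 2025.06. [Paper] [Website]
FLARE: "FLARE: Robot Learning with Implicit World Modeling", arXiv 2025.05. [Paper] [Code] [Website]
DreamGen: "DreamGen: Unlocking Generalization in Robot Learning through Video World Models", arXiv 2025.06. [Paper] [Code]
CoT-VLA: "CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models", CVPR 2025. [Paper]
UP-VLA: "UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent", ICML 2025. [Paper] [Code]
3D-VLA: "3D-VLA: A 3D Vision-Language-Action Generative World Model", ICML 2024. [Paper]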

World Models for Visual Understanding

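DILLO: "Describe Then Act: Proactive Agent Steering via Distilled Language-Action World Models", arXiv 2026.03. [Paper]
WorldVLM: "WorldVLM: Combining World Model Forecasting and Vision-Language Reasoning", arXiv 2026.03. [Paper]
"When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning", arXiv 2026.01. [Paper] [Website]
"Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", arXiv 2026.01. [Paper] [Website]
"Semantic World Models", arXiv 2025.10. [Paper] [Website]
DyVA: "Can World Models Benefit VLMs for World Dynamics?", arXiv 2025.10. [Paper] [Website]
"Video Models Are Zero-Shot Learners and Reasoners", arXiv 2025.09. [Paper]
"From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models", arXiv 2025.06. [Paper]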

World Models for Autonomous Driving

Refer to https://github.com/LMD0311/Awesome-World-Model

DeltaWorld: "A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens", CVPR 2026. [Paper] [Code]
DriveDreamer-Policy: "DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning", arXiv 2026.04. [Paper] [Website]
DLWM: "DLWM: Dual Latent World Models enable Holistic Gaussian-centric Pre-training in Autonomous Driving", CVPR 2026. [Paper]
AutoWorld: "AutoWorld: Scaling Multi-Agent Traffic Simulation with Self-Supervised World Models", arXiv 2026.03. [Paper]
OccSim: "OccSim: Multi-kilometer Simulation with Long-horizon Occupancy World Models", arXiv 2026.03. [Paper]
Uni-World VLA: "Uni-World VLA: Interleaved World Modeling and Planning for Autonomous Driving", arXiv 2026.03. [Paper]
DreamerAD: "DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving", arXiv 2026.03. [Paper]
Latent-WAM: "Latent-WAM: Latent World Action Modeling for End-to-End Autonomous Driving", arXiv 2026.03. [Paper]
"Toward Physically Consistent Driving Video World Models under Challenging Trajectories", arXiv 2026.03. [Paper] [Website]
CounterScene: "CounterScene: Counterfactual Causal Reasoning in Generative World Models for Safety-Critical Closed-Loop Evaluation", arXiv 2026.03. [Paper]
X-World: "X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving", arXiv 2026.03. [Paper]
DynFlowDrive: "DynFlowDrive: Flow-Based Dynamic World Modeling for Autonomous Driving", arXiv 2026.03. [Paper] [Code]
Enactor: "Enactor: From Traffic Simulators to Surrogate World Models", arXiv 2026.03. [Paper]
VectorWorld: "VectorWorld: Efficient Streaming World Model via Diffusion Flow on Vector Graphs", arXiv 2026.03. [Paper] [Code]
"Bridging Scene Generation and Planning: Driving with World Model via Unifying Vision and Motion Representation", arXiv 2026.03. [Paper] [Code]
"Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges", arXiv 2026.03. [Paper]
"Kinematics-Aware Latent World Models for Data-Efficient Autonomous Driving", arXiv 2026.03. [Paper]
ShareVerse: "ShareVerse: Multi-Agent Consistent Video Generation for Shared World Modeling", arXiv 2026.03. [Paper]
"Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving", arXiv 2026.02. [Paper]
RAYNOVA: "RAYNOVA: Scale-Temporal Autoregressive World Modeling in Ray Space", CVPR 2026. [Paper] [Website]
"When World Models Dream Wrong: Physical-Conditioned Adversarial Attacks against World Models", arXiv 2026.02. [Paper]
"Factored Latent Action World Models", arXiv 2026.02. [Paper]
ResWorld: "ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving", arXiv 2026.02. [Paper] [Code]
DriveWorld-VLA: "DriveWorld-VLA: Unified Latent-Space World Modeling with Vision-Language-Action for Autonomous Driving", arXiv 2026.02. [Paper] [Code]
"Safe Urban Traffic Control via Uncertainty-Aware Conformal Prediction and World-Model Reinforcement Learning", arXiv 2026.02. [Paper]
InstaDrive: "InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation", arXiv 2026.02. [Paper] [Website]
ConsisDrive: "ConsisDrive: Identity-Preserving Driving World Models for Video Generation by Instance Mask", arXiv 2026.02. [Paper] [Website]
MAD: "MAD: Motion Appearance Decoupling for efficient Driving World Models", arXiv 2026.01. [Paper] [Website]
UniDrive-WM: "UniDrive-WM: Unified Understanding, Planning and Generation World Model For Autonomous Driving", arXiv 2026.01. [Paper] [Website]
DriveLaW: "DriveLaW: Unifying Planning and Video Generation in a Latent Driving World", arXiv 2025.12. [Paper]
GaussianDWM: "GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation", arXiv 2025.12. [Paper] [Code]
WorldRFT: "WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving", AAAI 2026. [Paper]
InDRiVE: "InDRiVE: Reward-Free World-Model Pretraining for Autonomous Driving via Latent Disagreement", arXiv 2025.12. [Paper]
GenieDrive: "GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation", arXiv 2025.12. [Paper] [Website]
FutureX: "FutureX: Enhance End-to-End Autonomous Driving via Latent Chain-of-Thought World Model", arXiv 2025.12. [Paper]
"Latent Chain-of-Thought World Modeling for End-to-End Driving", arXiv 2025.12. [Paper]
MindDrive: "MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving", arXiv 2025.12. [Paper]
"Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles", arXiv 2025.12. [Paper]
U4D: "U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences", arXiv 2025.12. [Paper]
"Vehicle Dynamics Embedded World Models for Autonomous Driving", arXiv 2025.12. [Paper]
"World Model Robustness via Surprise Recognition", arXiv 2025.12. [Paper]
SparseWorld-TC: "SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model", arXiv 2025.11. [Paper]
AD-R1: "AD-R1: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models", arXiv 2025.11. [Paper]
Map-World: "Map-World: Masked Action planning and Path-Integral World Model for Autonomous Driving", arXiv 2025.11. [Paper]
WPT: "WPT: World-to-Policy Transfer via Online World Model Distillation", arXiv 2025.11. [Paper]
Percept-WAM: "Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving", arXiv 2025.11. [Paper]
Thinking Ahead: "Thinking Ahead: Foresight Intelligence in MLLMs and World Models", arXiv 2025.11. [Paper]
LiSTAR: "LiSTAR: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving", arXiv 2025.11. [Paper] [Website]
"Dual-Mind World Models: A General Framework for Learning in Dynamic Wireless Networks", arXiv 2025.10. [Paper]
"Addressing Corner Cases in Autonomous Driving: A World Model-based Approach with Mixture of Experts and LLMs", arXiv 2025.10. [Paper]
"From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction", NeurIPS 2025. [Paper] [Code]
"Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks", arXiv 2025.10. [Paper] [Website]
OmniNWM: "OmniNWM: Omniscient Driving Navigation World Models", arXiv 2025.10. [Paper] [Website]
SparseWorld: "SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries", arXiv 2025.10. [Paper] [Code]
"Vision-Centric 4D Occupancy Forecasting and Planning via Implicit Residual World Models", arXiv 2025.10. [Paper]
DriveVLA-W0: "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving", arXiv 2025.10. [Paper] [Code]
CoIRL-AD: "CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving", arXiv 2025.10. [Paper] [Code]
TeraSim-World: "TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving", arXiv 2025.09. [Paper] [Website]
"Enhancing Physical Consistency in Lightweight World Models", arXiv 2025.09. [Paper]
OccTENS: "OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction", arXiv 2025.09. [Paper]
IRL-VLA: "IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model", arXiv 2025.08. [Paper] [Website] [Code]
LiDARCrafter: "LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences", arXiv 2025.08. [Paper] [Website] [Code]
FASTopoWM: "FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models", arXiv 2025.07. [Paper] [Code]
Orbis: "Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models", arXiv 2025.07. [Paper] [Code]
"World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving", arXiv 2025.07. [Paper]
NRSeg: "NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models", arXiv 2025.07. [Paper] [Code]
World4Drive: "World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model", ICCV 2025. [Paper] [Code]
Epona: "Epona: Autoregressive Diffusion World Model for Autonomous Driving", ICCV 2025. [Paper] [Code]
"Towards foundational LiDAR world models with efficient latent flow matching", arXiv 2025.06. [Paper]
SceneDiffuser++: "SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model", CVPR 2025. [Paper]
COME: "COME: Adding Scene-Centric Forecasting Control to Occupancy World Model", arXiv 2025.06. [Paper] [Code]
STAGE: "STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation", arXiv 2025.06. [Paper]
ReSim: "ReSim: Reliable World Simulation for Autonomous Driving", arXiv 2025.06. [Paper] [Code] [Project Page]
"Ego-centric Learning of Communicative World Models for Autonomous Driving", arXiv 2025.06. [Paper]
Dreamland: "Dreamland: Controllable World Creation with Simulator and Generative Models", arXiv 2025.06. [Paper] [Project Page]
LongDWM: "LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model", arXiv 2025.06. [Paper] [Project Page]
GeoDrive: "GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control", arXiv 2025.05. [Paper] [Code]
FutureSightDrive: "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving", NeurIPS 2025. [Paper] [Code]
Raw2Drive: "Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)", arXiv 2025.05. [Paper]
VL-SAFE: "VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving", arXiv 2025.05. [Paper] [Project Page]
PosePilot: "PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth", arXiv 2025.05. [Paper]
"World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks", arXiv 2025.05. [Paper]
"Learning to Drive from a World Model", arXiv 2025.04. [Paper]
DriVerse: "DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment", arXiv 2025.04. [Paper]
"End-to-End Driving with Online Trajectory Evaluation via BEV World Model", arXiv 2025.04. [Paper] [Code]
"Knowledge Graphs as World Models for Semantic Material-Aware Obstacle Handling in Autonomous Vehicles", arXiv 2025.03. [Paper]
MiLA: "MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving", arXiv 2025.03. [Paper] [Project Page]
SimWorld: "SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model", arXiv 2025.03. [Paper] [Project Page]
UniFuture: "Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception", arXiv 2025.03. [Paper] [Project Page]
EOT-WM: "Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space", arXiv 2025.03. [Paper]
"Temporal Triplane Transformers as Occupancy World Models", arXiv 2025.03. [Paper]
InDRiVE: "InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model", arXiv 2025.02. [Paper]
MaskGWM: "MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction", arXiv 2025.02. [Paper]
Dream to Drive: "Dream to Drive: Model-Based Vehicle Control Using Analytic World Models", arXiv 2025.02. [Paper]
"Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving", ICLR 2025. [Paper]
"Dream to Drive with Predictive Individual World Model", IEEE TIV. [Paper] [Code]
HERMES: "HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation", arXiv 2025.01. [Paper]
AdaWM: "AdaWM: Adaptive World Model based Planning for Autonomous Driving", ICLR 2025. [Paper]
AD-L-JEPA: "AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data", arXiv 2025.01. [Paper]
DrivingWorld: "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT", arXiv 2024.12. [Paper] [Code] [Project Page]
DrivingGPT: "DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers", arXiv 2024.12. [Paper] [Project Page]
"An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training", arXiv 2024.12. [Paper]
GEM: "GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control", arXiv 2024.12. [Paper] [Project Page]
GaussianWorld: "GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction", arXiv 2024.12. [Paper] [Code]
Doe-1: "Doe-1: Closed-Loop Autonomous Driving with Large World Model", arXiv 2024.12. [Paper] [Project Page] [Code]
"Physical Informed Driving World Model", arXiv 2024.12. [Paper] [Project Page]
InfiniCube: "InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models", arXiv 2024.12. [Paper] [Project Page]
InfinityDrive: "InfinityDrive: Breaking Time Limits in Driving World Models", arXiv 2024.12. [Paper] [Project Page]
ReconDreamer: "ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration", arXiv 2024.11. [Paper] [Project Page]
Imagine-2-Drive: "Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles", ICRA 2025. [Paper] [Project Page]
DynamicCity: "DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes", ICLR 2025 Spotlight. [Paper] [Project Page] [Code]
DriveDreamer4D: "World Models Are Effective Data Machines for 4D Driving Scene Representation", arXiv 2024.10. [Paper] [Project Page]
DOME: "Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model", arXiv 2024.10. [Paper] [Project Page]
SSR: "Does End-to-End Autonomous Driving Really Need Perception Tasks?", arXiv 2024.09. [Paper] [Code]
"Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models", arXiv 2024.09. [Paper]
LatentDriver: "Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving", arXiv 2024.09. [Paper] [Code]
RenderWorld: "World Model with Self-Supervised 3D Label", arXiv 2024.09. [Paper]
OccLLaMA: "An Occupancy-Language-Action Generative World Model for Autonomous Driving", arXiv 2024.09. [Paper]
DriveGenVLM: "Real-world Video Generation for Vision Language Model based Autonomous Driving", arXiv 2024.08. [Paper]
Drive-OccWorld: "Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving", arXiv 2024.08. [Paper]
CarFormer: "Self-Driving with Learned Object-Centric Representations", ECCV 2024. [Paper] [Code]
BEVWorld: "A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space", arXiv 2024.07. [Paper] [Code]
TOKEN: "Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving", arXiv 2024.07. [Paper]
UMAD: "Unsupervised Mask-Level Anomaly Detection for Autonomous Driving", arXiv 2024.06. [Paper]
SimGen: "Simulator-conditioned Driving Scene Generation", arXiv 2024.06. [Paper] [Code]
AdaptiveDriver: "Planning with Adaptive World Models for Autonomous Driving", arXiv 2024.06. [Paper] [Code]
UnO: "Unsupervised Occupancy Fields for Perception and Forecasting", CVPR 2024. [Paper] [Code]
LAW: "Enhancing End-to-End Autonomous Driving with Latent World Model", arXiv 2024.06. [Paper] [Code]
Delphi: "Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation", arXiv 2024.06. [Paper] [Code]
OccSora: "4D Occupancy Generation Models as World Simulators for Autonomous Driving", arXiv 2024.05. [Paper] [Code]
MagicDrive3D: "Controllable 3D Generation for Any-View Rendering in Street Scenes", arXiv 2024.05. [Paper] [Code]
Vista: "A Generalizable Driving World Model with High Fidelity and Versatile Controllability", NeurIPS 2024. [Paper] [Code]
CarDreamer: "Open-Source Learning Platform for World Model based Autonomous Driving", arXiv 2024.05. [Paper] [Code]
DriveSim: "Probing Multimodal LLMs as World Models for Driving", arXiv 2024.05. [Paper] [Code]
DriveWorld: "4D Pre-trained Scene Understanding via World Models for Autonomous Driving", CVPR 2024. [Paper]
LidarDM: "Generative LiDAR Simulation in a Generated World", arXiv 2024.04. [Paper] [Code]
SubjectDrive: "Scaling Generative Data in Autonomous Driving via Subject Control", arXiv 2024.03. [Paper] [Project]
DriveDreamer-2: "LLM-Enhanced World Models for Diverse Driving Video Generation", arXiv 2024.03. [Paper] [Code]
Think2Drive: "Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving", ECCV 2024. [Paper]
MARL-CCE: "Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model", ECCV 2024. [Paper] [Code]
GenAD: "Generalized Predictive Model for Autonomous Driving", CVPR 2024. [Paper] [Data]
GenAD: "Generative End-to-End Autonomous Driving", ECCV 2024. [Paper] [Code]
NeMo: "Neural Volumetric World Models for Autonomous Driving", ECCV 2024. [Paper]
ViDAR: "Visual Point Cloud Forecasting enables Scalable Autonomous Driving", CVPR 2024. [Paper] [Code]
Drive-WM: "Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving", CVPR 2024. [Paper] [Code]
Cam4DOCC: "Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications", CVPR 2024. [Paper] [Code]
Panacea: "Panoramic and Controllable Video Generation for Autonomous Driving", CVPR 2024. [Paper] [Code]
OccWorld: "Learning a 3D Occupancy World Model for Autonomous Driving", ECCV 2024. [Paper] [Code]
Copilot4D: "Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion", ICLR 2024. [Paper]
DrivingDiffusion: "Layout-Guided multi-view driving scene video generation with latent diffusion model", ECCV 2024. [Paper] [Code]
SafeDreamer: "Safe Reinforcement Learning with World Models", ICLR 2024. [Paper] [Code]
MagicDrive: "Street View Generation with Diverse 3D Geometry Control", ICLR 2024. [Paper] [Code]
DriveDreamer: "Towards Real-world-driven World Models for Autonomous Driving", ECCV 2024. [Paper] [Code]
SEM2: "Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model", TITS. [Paper]

Citation

If you find this repository useful, please consider citing this list:

@misc{leo2024worldmodelspaperslist,
    title = {Awesome-World-Models},
    author = {Leo Fan},
    journal = {GitHub repository},
    url = {https://github.com/leofan90/Awesome-World-Models},
    year = {2024},
}

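Awesome-World-Models Quick Start Guide

Overview: Awesome-World-Models is not an installable library or toolkit; it is a curated list of papers and resources. It collects recent research, technical reports, surveys, and benchmarks on world models for general video generation, embodied AI, and autonomous driving.

This guide shows developers how to use the list to find a model quickly and then obtain that project's code and environment setup instructions.

1. Environment Setup

Because this project is a resource index, there is nothing to install for the list itself. Prepare an environment for whichever model on the list interests you (for example Cosmos, GAIA-2, or HunyuanWorld).

General system requirements (based on mainstream world model projects)

Most of the listed world model projects have demanding hardware requirements. A reasonable baseline is:

Operating system: Linux (Ubuntu 20.04/22.04 recommended) or macOS (supported by some lightweight models).
GPU: NVIDIA GPU with at least 16 GB of VRAM (large generative models typically need 24 GB or more, or multiple GPUs).
Driver: NVIDIA driver 535 or newer, with CUDA 11.8 or newer (for example 12.1).
Base software:
- Python >= 3.9 (the exact version depends on the target project)
- Git
- Conda or Mamba (recommended for environment management)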
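
The baseline above can be checked with a short script before cloning anything. The following is a minimal sketch, not part of the repository: the function names and the 16 GB default threshold are illustrative assumptions taken from the list above, and the GPU check relies on the `nvidia-smi` command being on the PATH.

```python
import shutil
import subprocess
import sys


def parse_gpu_memory_mib(nvidia_smi_output: str) -> list:
    """Parse one total-memory value (MiB) per line, skipping non-numeric lines."""
    values = []
    for line in nvidia_smi_output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def query_gpu_memory_mib() -> list:
    """Return total memory per GPU in MiB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_memory_mib(result.stdout)


def check_baseline(min_python=(3, 9), min_vram_gib=16.0) -> dict:
    """Compare the local machine against the (assumed) baseline requirements."""
    gpus = query_gpu_memory_mib()
    return {
        "python_ok": sys.version_info >= min_python,
        "git_found": shutil.which("git") is not None,
        "conda_or_mamba_found": any(shutil.which(t) is not None for t in ("conda", "mamba")),
        "gpu_vram_ok": any(mib / 1024 >= min_vram_gib for mib in gpus),
    }


def print_report() -> None:
    """Print one yes/no line per check."""
    for name, ok in check_baseline().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Calling `print_report()` gives a quick overview; a failed check only means the machine differs from this generic baseline, and the target project's own README remains the authority on what it needs.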

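Prerequisites

Before cloning a specific model repository, make sure the basic tools are installed:

# Update the package index
sudo apt update

# Install Git and basic build tools
sudo apt install -y git build-essential wget curl

# Install Miniconda (if not already installed)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh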

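2. Finding Resources and Installing a Specific Model

There are two steps: browse the list to find the target project, then go to that project's repository to install it.

Step 1: Browse and choose

Open the Awesome-World-Models repository page and look for the model you need under the Blog or Technical Report or General World Models categories.

For example, if you are interested in NVIDIA's physical AI platform, find Cosmos.
If you are interested in Tencent's 3D world generation, find HunyuanWorld 1.0.

Step 2: Clone and install (using Cosmos as an example)

Click the [Code] link in the list to open the corresponding GitHub repository. A generic installation flow looks like this:

# 1. Clone the target project (NVIDIA Cosmos is used as the example here)
git clone https://github.com/NVIDIA/Cosmos.git
cd Cosmos

# 2. Create a virtual environment (conda recommended)
conda create -n cosmos-env python=3.10 -y
conda activate cosmos-env

# 3. Install PyTorch (choose the CUDA build that the project's documentation requires)
# Users in mainland China can use the Tsinghua mirror for faster downloads
pip install torch torchvision torchaudio --index-url https://pypi.tuna.tsinghua.edu.cn/simple

# 4. Install project dependencies
# Note: the dependency file differs between projects (requirements.txt, setup.py, pyproject.toml)
pip install -r requirements.txt
# or
pip install -e .

Tip: for list entries with a [Website] link, visit the project website first; it usually has a more detailed demo and project-specific installation instructions.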
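
Because the dependency file differs between projects, the choice between the two pip commands in step 4 can be made by looking at what the cloned repository contains. The sketch below is an illustrative helper, not something the list or any listed project provides; the function name and the priority order (requirements.txt first) are assumptions that mirror the order used in the steps above.

```python
from pathlib import Path


def suggest_install_command(repo_dir: str) -> list:
    """Suggest a pip command based on which dependency file the repo contains.

    Returns an empty list when none of the common files is present, in which
    case the project's README is the only reliable source.
    """
    repo = Path(repo_dir)
    if (repo / "requirements.txt").is_file():
        return ["pip", "install", "-r", "requirements.txt"]
    if (repo / "pyproject.toml").is_file() or (repo / "setup.py").is_file():
        # Editable install of the project itself
        return ["pip", "install", "-e", "."]
    return []
```

For example, `suggest_install_command("Cosmos")` inspects the cloned directory and returns the argument list to run from inside it. Some projects ship both files or use a different mechanism entirely, so treat the result as a starting point only.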

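3. Basic Usage

Each model has its own input and output interface, so the following is only a generic usage pattern for a typical world model (video generation or prediction). Defer to the README or examples folder of the specific project.

Typical workflow

Prepare the input data: usually an initial frame image, a text prompt, or an action sequence.
Load the model weights: the first run usually downloads them automatically; otherwise download them manually from HuggingFace.
Run inference: run the inference script the project provides.

Usage example (pseudocode / generic commands)

Assume you have installed a project, here called awesome-model, into the cosmos-env environment created above:

# Activate the environment
conda activate cosmos-env

# Run the inference script (see the project's documentation for the actual parameters)
# Example: generate a world-simulation video from a text prompt
python infer.py \
    --prompt "A robot arm picking up a cube on a table" \
    --output_dir ./results \
    --num_frames 128 \
    --resolution 720p

# List the generated videos
ls ./results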
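
To make the command above concrete, here is a sketch of how a script like `infer.py` might declare those flags with `argparse`. It is entirely hypothetical: no listed project is known to use this exact interface, the flag names are copied from the generic example, and the defaults and the resolution choices are assumptions. The model call is replaced by a print statement.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags used in the generic example (hypothetical interface)."""
    parser = argparse.ArgumentParser(
        description="Hypothetical world-model inference entry point"
    )
    parser.add_argument("--prompt", required=True,
                        help="Text prompt describing the scene to simulate")
    parser.add_argument("--output_dir", default="./results",
                        help="Directory where generated videos are written")
    parser.add_argument("--num_frames", type=int, default=128,
                        help="Number of frames to generate")
    parser.add_argument("--resolution", default="720p",
                        choices=["480p", "720p", "1080p"],
                        help="Output resolution preset (assumed set of presets)")
    return parser


def main(argv=None) -> argparse.Namespace:
    """Parse arguments and report what would be generated."""
    args = build_parser().parse_args(argv)
    # A real project would load model weights and run generation here.
    print(f"Would generate {args.num_frames} frames at {args.resolution} "
          f"into {args.output_dir} for prompt: {args.prompt!r}")
    return args
```

Calling `main(["--prompt", "A robot arm picking up a cube on a table"])` exercises the defaults. Real projects often take a config file or a different set of flags, so check `--help` on the actual script.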

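Using the list for academic research

If you mainly want to survey the literature rather than run code:

Clone this repository to browse the latest papers locally:

git clone https://github.com/leofan90/Awesome-World-Models.git
cd Awesome-World-Models

Read the categorized links in README.md and click [Paper] to open the arXiv page and download the PDF.
Start with the Surveys section; survey papers are a quick way to build an overview of the field.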
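
For a literature survey it can help to turn the list entries into structured records, for example to keep only entries that link to code. The sketch below assumes entries in the plain-text shape shown on this page, `Name: "Title", Venue. [Paper] [Code]`; the raw README.md may wrap the same fields in Markdown bullets and link syntax, which this regular expression does not handle. All names in the code are illustrative.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

ENTRY_RE = re.compile(
    r'^(?:(?P<name>[^"]+?):\s*)?'      # optional short name before the title
    r'"(?P<title>.+)"'                 # quoted paper title
    r'[,\s]*(?P<venue>[^\[]*?)\.?\s*'  # venue and date, up to the first tag
    r'(?P<tags>(?:\[[^\]]+\]\s*)*)$'   # trailing tags such as [Paper] [Code]
)


@dataclass
class Entry:
    name: str
    title: str
    venue: str
    links: List[str]


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one list line; return None for headings and other text."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return Entry(
        name=(match.group("name") or "").strip(),
        title=match.group("title"),
        venue=match.group("venue").strip(),
        links=re.findall(r"\[([^\]]+)\]", match.group("tags")),
    )


def entries_with_code(lines: List[str]) -> List[Entry]:
    """Keep only the entries that carry a [Code] tag."""
    parsed = (parse_entry(line) for line in lines)
    return [entry for entry in parsed if entry is not None and "Code" in entry.links]
```

Passing a handful of lines copied from the list to `entries_with_code` returns the subset that can be cloned and run, which is usually the first filter when looking for a baseline.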

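Notes:

Some papers in the list are labeled with dates such as arXiv 2026 that may lie in the future; these may be pre-release versions or placeholders, so treat the actual arXiv page as authoritative.
Access to HuggingFace or GitHub from mainland China can be slow; consider configuring a mirror in .bashrc or using a proxy.
World model training and inference are resource-intensive; for a first try, use the official Colab notebook or online demo, if one exists.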
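
As an alternative to editing .bashrc, the mirror setting can be applied only to the process that downloads weights. The sketch below builds an environment with the `HF_ENDPOINT` variable, which the `huggingface_hub` library reads to decide which host to contact. The default URL is an assumption (a commonly used community mirror, not an official HuggingFace service); substitute whichever mirror you trust.

```python
import os


def with_hf_mirror(env=None, endpoint="https://hf-mirror.com"):
    """Return a copy of the environment with HF_ENDPOINT set to a mirror.

    The original mapping (or os.environ when env is None) is left untouched.
    """
    new_env = dict(os.environ if env is None else env)
    new_env["HF_ENDPOINT"] = endpoint
    return new_env
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching a project's download or inference script. The variable needs to be in place before the library is imported in that process, which is why the sketch sets it on the child process instead of changing it mid-run.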
