[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-LMD0311--Awesome-World-Model":3,"tool-LMD0311--Awesome-World-Model":64},[4,17,27,35,44,52],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":10,"last_commit_at":41,"category_tags":42,"status":16},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[13,14,15,43],"视频",{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":23,"last_commit_at":50,"category_tags":51,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 
Collect some World Model papers for Autonomous Driving (and Robotics, etc.).

Awesome-World-Model is an open-source project focused on autonomous driving and robotics that systematically collects, tracks, and benchmarks the latest academic papers on world models. A world model is a predictive program that simulates how the physical world evolves in response to an agent's behavior, and it is a key technical foundation for safe, reliable, general-purpose decision-making.

The project addresses two pain points in this area: scattered research and the lack of a unified reference point. By curating paper lists that cover core capabilities such as perception, instruction following, controllability, and future prediction, it gives researchers one-stop navigation of the literature and serves as a living supplement to its companion survey. It also links related workshops and challenges at top venues such as CVPR (e.g. OpenDriveLab), connecting academic results with practical evaluation.

Awesome-World-Model is well suited to AI researchers, autonomous-driving engineers, and university students and faculty, whether the goal is to quickly grasp frontier trends or to find concrete baseline methods for comparison experiments. Beyond a detailed literature index, it maintains an open community-contribution mechanism that encourages developers worldwide to improve the list together and to push the standardization of world-model technology in embodied AI.

# Awesome World Models for Autonomous Driving

[![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome) [![arXiv](https://img.shields.io/badge/Arxiv-2502.10498-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2502.10498)

This repo records, tracks, and benchmarks recent World Model methods (for Autonomous Driving or Robotics), as a supplement to our [**survey**](https://arxiv.org/abs/2502.10498).

If you notice papers we have missed, **feel free to [*create pull requests*](https://github.com/LMD0311/Awesome-World-Model/blob/main/ContributionGuidelines.md) or [*open issues*](https://github.com/LMD0311/Awesome-World-Model/issues/new)**. Contributions in any form that make this list more comprehensive are welcome. 📣📣📣

If you find this repository useful, please consider **giving us a star** 🌟 and a [**citation**](https://github.com/LMD0311/Awesome-World-Model#citation).

## 📚 Citation
If you find this repository useful in your research, please kindly consider giving a star ⭐ and a citation:
```bibtex
@article{tu2025drivingworldmodel,
  title={The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey},
  author={Tu, Sifan and Zhou, Xin and Liang, Dingkang and Jiang, Xingyu and Zhang, Yumeng and Li, Xiaofan and Bai, Xiang},
  journal={arXiv preprint arXiv:2502.10498},
  year={2025}
}

@inproceedings{zhou2025hermes,
  title={HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation},
  author={Zhou, Xin and Liang, Dingkang and Tu, Sifan and Chen, Xiwu and Ding, Yikang and Zhang, Dingyuan and Tan, Feiyang and Zhao, Hengshuang and Bai, Xiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}

@inproceedings{liang2025UniFuture,
  title={UniFuture: A 4D Driving World Model for Future Generation and Perception},
  author={Liang, Dingkang and Zhang, Dingyuan and Zhou, Xin and Tu, Sifan and Feng, Tianrui and Li, Xiaofan and Zhang, Yumeng and Du, Mingyang and Tan, Xiao and Bai, Xiang},
  booktitle={Proceedings of the IEEE International Conference on Robotics and Automation},
  year={2026}
}

@article{chen2026out,
  title={Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models},
  author={Chen, Kaijin and Liang, Dingkang and Zhou, Xin and Ding, Yikang and Liu, Xiaoqiang and Wan, Pengfei and Bai, Xiang},
  journal={arXiv preprint arXiv:2603.25716},
  year={2026}
}
```

## Workshop & Challenge

- [`CVPR 25 Workshop & Challenge | OpenDriveLab`](https://opendrivelab.com/challenge25/#1x-wm) Track: World Model.
> A world model is a computer program that can imagine how the world evolves in response to an agent's behavior. It has the potential to solve general-purpose simulation and evaluation, enabling robots that are safe, reliable, and intelligent in a wide variety of scenarios.
- [`World Model Bench @ CVPR'25`](https://worldmodelbench.github.io/) WorldModelBench: The 1st Workshop on Benchmarking World Models
> World models refer to predictive models of physical phenomena in the world surrounding us. These models are fundamental for Physical AI agents, enabling crucial capabilities such as decision-making, planning, and counterfactual analysis. Effective world models must integrate several key components, including perception, instruction following, controllability, physical plausibility, and future prediction.
- [`CVPR 24 Workshop & Challenge | OpenDriveLab`](https://opendrivelab.com/challenge24/#predictive_world_model) Track #4: Predictive World Model.
- [`CVPR 23 Workshop on Autonomous Driving`](https://cvpr23.wad.vision/) CHALLENGE 3: ARGOVERSE CHALLENGES, [3D Occupancy Forecasting](https://eval.ai/web/challenges/challenge-page/1977/overview) using the [Argoverse 2 Sensor Dataset](https://www.argoverse.org/av2.html#sensor-link). Predict the spacetime occupancy of the world for the next 3 seconds.
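
The challenge descriptions above all assume the same basic interface: encode the current observation into a state, predict how that state evolves under a sequence of ego actions, and decode the prediction back into an observable quantity (future frames or occupancy). The sketch below is a minimal, hypothetical illustration of that loop; the module names (`Encoder`, `Dynamics`, `Decoder` analogues), tensor shapes, and network choices are assumptions for exposition only, not the design of any paper in this list.

```python
# Minimal sketch of the world-model interface implied above (an assumed toy design,
# not any specific method): encode an observation, roll the latent state forward
# under planned actions, and decode each predicted state back to an observable.
import torch
import torch.nn as nn


class TinyWorldModel(nn.Module):
    def __init__(self, obs_dim=256, act_dim=2, latent_dim=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)     # observation -> latent state
        self.dynamics = nn.GRUCell(act_dim, latent_dim)   # (action, state) -> next state
        self.decoder = nn.Linear(latent_dim, obs_dim)     # latent state -> predicted observation

    def rollout(self, obs, actions):
        """Predict future observations for a sequence of planned actions.

        obs:     (B, obs_dim)     current observation features
        actions: (B, T, act_dim)  planned ego actions for T future steps
        returns: (B, T, obs_dim)  one predicted observation per step
        """
        state = torch.tanh(self.encoder(obs))
        preds = []
        for t in range(actions.shape[1]):
            state = self.dynamics(actions[:, t], state)   # imagine one step forward
            preds.append(self.decoder(state))
        return torch.stack(preds, dim=1)


if __name__ == "__main__":
    wm = TinyWorldModel()
    obs = torch.randn(4, 256)            # batch of 4 encoded observations
    plan = torch.randn(4, 12, 2)         # 12 future steps of (steer, accel) actions
    print(wm.rollout(obs, plan).shape)   # torch.Size([4, 12, 256])
```
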
## Papers

### World model original paper

- Using Occupancy Grids for Mobile Robot Perception and Navigation [[paper](http://www.sci.brooklyn.cuny.edu/~parsons/courses/3415-fall-2011/papers/elfes.pdf)]

### Technical blog or video

- **`Yann LeCun`**: A Path Towards Autonomous Machine Intelligence [[paper](https://openreview.net/pdf?id=BZ5a1r-kVsf)] [[Video](https://www.youtube.com/watch?v=OKkEdTchsiE)]
- **`ICCV'25 workshop`** Keynote - Ashok Elluswamy, Tesla [[Video](https://www.bilibili.com/video/BV1oasHzTEe3/?vd_source=9ef518a6c349809d9fa8ab9427bd8b2c)]
- **`CVPR'23 workshop`** Keynote - Ashok Elluswamy, Tesla [[Video](https://www.youtube.com/watch?v=6x-Xb_uT7ts)]
- **`Wayve`** Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy [[blog](https://wayve.ai/thinking/introducing-gaia1/)]
  > World models are the basis for the ability to predict what might happen next, which is fundamentally important for autonomous driving. They can act as a learned simulator, or a mental “what if” thought experiment for model-based reinforcement learning (RL) or planning. By incorporating world models into our driving models, we can enable them to understand human decisions better and ultimately generalise to more real-world situations.
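
The Wayve post above describes the planning use of a world model: treat it as a learned simulator and run "what if" rollouts over candidate action sequences. Below is a minimal random-shooting planner sketch under that framing; the `rollout_fn` and `score_fn` callables, the sample counts, and the action dimensions are illustrative assumptions, not the method of Wayve or of any paper listed here.

```python
# Sketch of "what if" planning with a learned world model (illustrative only):
# sample candidate action sequences, imagine their outcomes with the model,
# then execute the first action of the best-scoring sequence.
import torch


def plan_with_world_model(rollout_fn, score_fn, obs, horizon=12, act_dim=2, n_candidates=128):
    """Random-shooting planner.

    rollout_fn(obs, actions) -> predicted futures, e.g. (N, horizon, obs_dim)
    score_fn(futures)        -> per-candidate return estimate, shape (N,)
    obs                      -> current observation features, shape (obs_dim,)
    """
    candidates = torch.randn(n_candidates, horizon, act_dim)          # sample candidate plans
    futures = rollout_fn(obs.expand(n_candidates, -1), candidates)    # imagine their outcomes
    returns = score_fn(futures)                                       # evaluate each imagined future
    best = returns.argmax()
    return candidates[best, 0]                                        # first action of the best plan


if __name__ == "__main__":
    # Dummy world model and reward, just to make the sketch executable.
    def dummy_rollout(obs, actions):
        return obs.unsqueeze(1) + actions.cumsum(dim=1).sum(-1, keepdim=True)

    def dummy_score(futures):
        return -futures.abs().mean(dim=(1, 2))   # prefer futures that stay near zero

    action = plan_with_world_model(dummy_rollout, dummy_score, torch.zeros(256))
    print(action.shape)   # torch.Size([2])
```
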
### Survey
- The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey. **`arXiv 25.02`** [[Paper](https://arxiv.org/abs/2502.10498)]
- Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI. **`TMECH 25`** [[Paper](https://arxiv.org/abs/2407.06886)] [[Code](https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List)]
- A Survey on Future Physical World Generation for Autonomous Driving. **`MMAsia 25`** [[Paper](https://dl.acm.org/doi/full/10.1145/3769748.3773345)]
- A survey on multimodal large language models for autonomous driving. **`WACVW 24`** [[Paper](https://arxiv.org/abs/2311.12320)] [[Code](https://github.com/IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving)]
- World Models: The Safety Perspective. **`ISSREW`** [[Paper](https://arxiv.org/abs/2411.07690)]
- Progressive Robustness-Aware World Models in Autonomous Driving: A Review and Outlook. **`techrXiv 25.11`** [[Paper](https://doi.org/10.36227/techrxiv.176523308.84756413/v1)] [[Project](https://github.com/MoyangSensei/AwesomeRobustDWM)]
- A Survey of Unified Multimodal Understanding and Generation: Advances and Challenges. **`techrXiv 25.11`** [[Paper](https://www.techrxiv.org/doi/full/10.36227/techrxiv.176289261.16802577)]
- Simulating the Visual World with Artificial Intelligence: A Roadmap. **`arXiv 25.11`** [[Paper](https://arxiv.org/abs/2511.08585)] [[Project](https://world-model-roadmap.github.io/)]
- A Step Toward World Models: A Survey on Robotic Manipulation. **`arXiv 25.11`** [[Paper](https://arxiv.org/abs/2511.02097)]
- A Comprehensive Survey on World Models for Embodied AI. **`arXiv 25.10`** [[Paper](https://arxiv.org/abs/2510.16732)] [[Project](https://github.com/Li-Zn-H/AwesomeWorldModels)]
- The Safety Challenge of World Models for Embodied AI Agents: A Review. **`arXiv 25.10`** [[Paper](https://arxiv.org/abs/2510.05865)]
- A Survey on World Models Grounded in Acoustic Physical Information. **`arXiv 25.09`** [[Paper](https://arxiv.org/abs/2506.13833)]
- 3D and 4D World Modeling: A Survey. **`arXiv 25.09`** [[Paper](https://arxiv.org/abs/2509.07996)] [[Code](https://github.com/worldbench/survey)]
- A Survey of Embodied World Models. **`25.09`** [[Paper](https://www.researchgate.net/publication/395713824_A_Survey_of_Embodied_World_Models)]
- One Flight Over the Gap: A Survey from Perspective to Panoramic Vision. **`arXiv 25.09`** [[Paper](https://arxiv.org/abs/2509.04444)] [[Page](https://insta360-research-team.github.io/Survey-of-Panorama/)]
- Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges. **`arXiv 25.08`** [[Paper](https://arxiv.org/abs/2508.09561)]
- A Survey: Learning Embodied Intelligence from Physical Simulators and World Models. **`arXiv 25.07`** [[Paper](https://arxiv.org/abs/2507.00917)]
- From 2D to 3D Cognition: A Brief Survey of General World Models. **`arXiv 25.06`** [[Paper](https://arxiv.org/abs/2506.20134)]
- World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks. **`arXiv 25.05`** [[Paper](https://arxiv.org/abs/2506.00417)]
- Exploring the Evolution of Physics Cognition in Video Generation: A Survey. **`arXiv 25.03`** [[Paper](https://arxiv.org/abs/2503.21765)] [[Code](https://github.com/minnie-lin/Awesome-Physics-Cognition-based-Video-Generation)]
- A Survey of World Models for Autonomous Driving. **`arXiv 25.01`** [[Paper](https://arxiv.org/abs/2501.11260)]
- Generative Physical AI in Vision: A Survey. **`arXiv 25.01`** [[Paper](https://arxiv.org/abs/2501.10928)] [[Code](https://github.com/BestJunYu/Awesome-Physics-aware-Generation)]
- Understanding World or Predicting Future? A Comprehensive Survey of World Models. **`arXiv 24.11`** [[Paper](https://arxiv.org/abs/2411.14499)]
- Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey. **`arXiv 24.11`** [[Paper](https://arxiv.org/abs/2411.02914)]
- Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond. **`arXiv 24.5`** [[Paper](https://arxiv.org/abs/2405.03520)] [[Code](https://github.com/GigaAI-research/General-World-Models-Survey)]
- World Models for Autonomous Driving: An Initial Survey. **`arXiv 24.3`** [[Paper](https://arxiv.org/abs/2403.02622)]

### 2026
- [**UniFuture**] UniFuture: A 4D Driving World Model for Future Generation and Perception. **`ICRA 26`** [[Paper](https://arxiv.org/abs/2503.13587)] [[Code](https://github.com/dk-liang/UniFuture)] [[Project](https://dk-liang.github.io/UniFuture/)]
- **RAYNOVA**: Scale-Temporal Autoregressive World Modeling in Ray Space.
**`CVPR 26`** [[Paper](https://arxiv.org/abs/2602.20685)] [[Project](https://raynova-ai.github.io/)]
- **WAM-Flow**: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving. **`CVPR 26`** [[Paper](https://arxiv.org/abs/2512.06112)] [[Code](https://github.com/fudan-generative-vision/WAM-Flow)]
- **ResWorld**: Temporal Residual World Model for End-to-End Autonomous Driving. **`ICLR 26`** [[Paper](https://arxiv.org/abs/2602.10884)] [[Code](https://github.com/mengtan00/ResWorld.git)]
- **WorldRFT**: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving. **`AAAI 26`** [[Paper](https://arxiv.org/abs/2512.19133)]
- **X-World**: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.19979)]
- **Vega**: Learning to Drive with Natural Language Instructions. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.25741)] [[Code](https://github.com/zuosc19/Vega)]
- **DCARL**: A Divide-and-Conquer Framework for Autoregressive Long-Trajectory Video Generation. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.24835)] [[Project](https://junyiouy.github.io/projects/dcarl)]
- **DreamerAD**: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.24587)]
- **Latent-WAM**: Latent World Action Modeling for End-to-End Autonomous Driving. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.24581)]
- Toward Physically Consistent Driving Video World Models under Challenging Trajectories. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.24506)] [[Project](https://wm-research.github.io/PhyGenesis/)]
- **FAR-Drive**: Frame-AutoRegressive Video Generation in Closed-Loop Autonomous Driving. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.14938)]
- **WorldVLM**: Combining World Model Forecasting and Vision-Language Reasoning. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.14497)]
- [**WorldDrive**] Bridging Scene Generation and Planning: Driving with World Model via Unifying Vision and Motion Representation. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.14948)] [[Code](https://github.com/TabGuigui/WorldDrive)]
- **DynVLA**: Learning World Dynamics for Action Reasoning in Autonomous Driving. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.11041)]
- Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.09086)]
- **SAMoE-VLA**: A Scene Adaptive Mixture-of-Experts Vision-Language-Action Model for Autonomous Driving. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.08113)]
- Kinematics-Aware Latent World Models for Data-Efficient Autonomous Driving. **`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.07264)]
- **ShareVerse**: Multi-Agent Consistent Video Generation for Shared World Modeling.
**`arXiv 26.3`** [[Paper](https://arxiv.org/abs/2603.02697)]
- Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving. **`arXiv 26.2`** [[Paper](https://arxiv.org/abs/2602.23259)]
- **UniDrive-WM**: Unified Understanding, Planning and Generation World Model For Autonomous Driving. **`arXiv 26.1`** [[Paper](https://arxiv.org/abs/2601.04453)] [[Project](https://unidrive-wm.github.io/UniDrive-WM)]
- **MAD**: Motion Appearance Decoupling for efficient Driving World Models. **`arXiv 26.1`** [[Paper](https://arxiv.org/abs/2601.09452)] [[Project](https://vita-epfl.github.io/MAD-World-Model/)]
- A Mechanistic View on Video Generation as World Models: State and Dynamics. **`arXiv 26.1`** [[Paper](https://arxiv.org/abs/2601.17067)]
- **Drive-JEPA**: Video JEPA Meets Multimodal Trajectory Distillation for End-to-End Driving. **`arXiv 26.1`** [[Paper](https://arxiv.org/abs/2601.22032)]
- **DrivingGen**: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving. **`arXiv 26.1`** [[Paper](https://arxiv.org/abs/2601.01528)] [[Project](https://drivinggen-bench.github.io/)]

### 2025
- **HERMES**: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation. **`ICCV 25`** [[Paper](https://arxiv.org/abs/2501.14729)] [[Code](https://github.com/LMD0311/HERMES)] [[Project](https://lmd0311.github.io/HERMES/)]
- [**FSDrive**] FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving. **`NeurIPS 25`** [[Paper](https://arxiv.org/abs/2505.17685)] [[Code](https://github.com/MIV-XJTU/FSDrive)]
- **DINO-Foresight**: Looking into the Future with DINO. **`NeurIPS 25`** [[Paper](https://arxiv.org/abs/2412.11673)] [[Code](https://github.com/Sta8is/DINO-Foresight)]
- **From Forecasting to Planning**: Policy World Model for Collaborative State-Action Prediction. **`NeurIPS 25`** [[Paper](https://arxiv.org/abs/2510.19654)] [[Code](https://github.com/6550Zhao/Policy-World-Model)]
- **InfiniCube**: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models. **`ICCV 25`** [[Paper](https://arxiv.org/abs/2412.03934)] [[Project](https://research.nvidia.com/labs/toronto-ai/infinicube/)]
- **DiST-4D**: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation. **`ICCV 25`** [[Paper](https://arxiv.org/abs/2503.15208)] [[Project](https://royalmelon0505.github.io/DiST-4D/)]
- **Epona**: Autoregressive Diffusion World Model for Autonomous Driving. **`ICCV 25`** [[Paper](https://arxiv.org/abs/2506.24113)] [[Code](https://github.com/Kevin-thu/Epona/)]
- **UniOcc**: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving. **`ICCV 25`** [[Paper](https://arxiv.org/abs/2503.24381)] [[Code](https://uniocc.github.io/)]
- **DriVerse**: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment.
**`ACM MM 25`** [[Paper](https://arxiv.org/abs/2504.19614)] [[Code](https://github.com/shalfun/DriVerse)]
- **OmniGen**: Unified Multimodal Sensor Generation for Autonomous Driving. **`ACM MM 25`** [[Paper](https://arxiv.org/abs/2512.14225)]
- **World4Drive**: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model. **`ICCV 25`** [[Paper](https://arxiv.org/abs/2507.00603)]
- [**PIWM**] Dream to Drive with Predictive Individual World Model. **`TIV 25`** [[Paper](https://arxiv.org/abs/2501.16733)] [[Code](https://github.com/gaoyinfeng/PIWM)]
- **DriveDreamer4D**: World Models Are Effective Data Machines for 4D Driving Scene Representation. **`CVPR 25`** [[Paper](https://arxiv.org/abs/2410.13571)] [[Project Page](https://drivedreamer4d.github.io/)]
- **GaussianWorld**: Gaussian World Model for Streaming 3D Occupancy Prediction. **`CVPR 25`** [[Paper](https://arxiv.org/abs/2412.10373)] [[Code](https://github.com/zuosc19/GaussianWorld)]
- **ReconDreamer**: Crafting World Models for Driving Scene Reconstruction via Online Restoration. **`CVPR 25`** [[Paper](https://arxiv.org/abs/2411.19548)] [[Code](https://github.com/GigaAI-research/ReconDreamer)]
- **FUTURIST**: Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers. **`CVPR 25`** [[Paper](https://arxiv.org/abs/2501.08303)] [[Code](https://github.com/Sta8is/FUTURIST)]
- **MaskGWM**: A Generalizable Driving World Model with Video Mask Reconstruction. **`CVPR 25`** [[Paper](https://arxiv.org/abs/2502.11663)] [[Code](https://github.com/SenseTime-FVG/OpenDWM)]
- **UniScene**: Unified Occupancy-centric Driving Scene Generation. **`CVPR 25`** [[Paper](https://arxiv.org/abs/2412.05435)] [[Project](https://arlo0o.github.io/uniscene/)]
- **DrivingGPT**: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers. **`CVPR 25`** [[Paper](https://arxiv.org/abs/2412.18607)] [[Project](https://rogerchern.github.io/DrivingGPT/)]
- **GEM**: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control. **`CVPR 25`** [[Paper](https://arxiv.org/abs/2412.11198)] [[Project](https://vita-epfl.github.io/GEM.github.io/)]
- [**UMGen**] Generating Multimodal Driving Scenes via Next-Scene Prediction. **`CVPR 25`** [[Paper](https://arxiv.org/abs/2503.14945)] [[Project](https://yanhaowu.github.io/UMGen/)] [[Code](https://github.com/YanhaoWu/UMGen/)]
- **DIO**: Decomposable Implicit 4D Occupancy-Flow World Model. **`CVPR 25`** [[Paper](https://openaccess.thecvf.com/content/CVPR2025/html/Diehl_DIO_Decomposable_Implicit_4D_Occupancy-Flow_World_Model_CVPR_2025_paper.html)]
- **SceneDiffuser++**: City-Scale Traffic Simulation via a Generative World Model.
**`CVPR 25`** [[Paper](https://openaccess.thecvf.com/content/CVPR2025/html/Tan_SceneDiffuser_City-Scale_Traffic_Simulation_via_a_Generative_World_Model_CVPR_2025_paper.html)]
- **DynamicCity**: Large-Scale LiDAR Generation from Dynamic Scenes. **`ICLR 25`** [[Paper](https://arxiv.org/abs/2410.18084)] [[Code](https://github.com/3DTopia/DynamicCity)]
- **AdaWM**: Adaptive World Model based Planning for Autonomous Driving. **`ICLR 25`** [[Paper](https://arxiv.org/abs/2501.13072)]
- **OccProphet**: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework. **`ICLR 25`** [[Paper](https://arxiv.org/abs/2502.15180)] [[Code](https://github.com/JLChen-C/OccProphet)]
- [**PreWorld**] Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving. **`ICLR 25`** [[Paper](https://arxiv.org/abs/2502.07309)] [[Code](https://github.com/getterupper/PreWorld)]
- [**SSR**] Does End-to-End Autonomous Driving Really Need Perception Tasks? **`ICLR 25`** [[Paper](https://arxiv.org/abs/2409.18341)] [[Code](https://github.com/PeidongLi/SSR)]
- **Occ-LLM**: Enhancing Autonomous Driving with Occupancy-Based Large Language Models. **`ICRA 25`** [[Paper](https://arxiv.org/abs/2502.06419)]
- **STAGE**: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation. **`IROS 25`** [[Paper](https://arxiv.org/abs/2506.13138)] [[Project](https://4dvlab.github.io/STAGE/)]
- **Drive&Gen**: Co-Evaluating End-to-End Driving and Video Generation Models. **`IROS 25`** [[Paper](https://arxiv.org/abs/2510.06209)]
- Learning to Generate 4D LiDAR Sequences. **`ICCVW 25`** [[Paper](https://arxiv.org/abs/2509.11959)]
- World model-based end-to-end scene generation for accident anticipation in autonomous driving. **`Communications Engineering 25`** [[Paper](https://www.nature.com/articles/s44172-025-00474-7)]
- World Models for Autonomous Navigation of Terrestrial Robots from LIDAR Observations. **`JIFS 25`** [[Paper](https://arxiv.org/abs/2512.03429)]
- **GaussianDWM**: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation. **`arXiv 25.12`** [[Paper](https://arxiv.org/abs/2512.23180)] [[Code](https://github.com/dtc111111/GaussianDWM)]
- **DriveLaW**: Unifying Planning and Video Generation in a Latent Driving World. **`arXiv 25.12`** [[Paper](https://arxiv.org/abs/2512.23421)]
- **InDRiVE**: Reward-Free World-Model Pretraining for Autonomous Driving via Latent Disagreement. **`arXiv 25.12`** [[Paper](https://arxiv.org/abs/2512.18850)]
- Latent Chain-of-Thought World Modeling for End-to-End Driving. **`arXiv 25.12`** [[Paper](https://arxiv.org/abs/2512.10226)]
- **GenieDrive**: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation. **`arXiv 25.12`** [[Paper](https://arxiv.org/abs/2512.12751)] [[Project](https://huster-yzy.github.io/geniedrive_project_page/)]
- **WorldLens**: Full-Spectrum Evaluations of Driving World Models in Real World.
**`arXiv 25.12`** [[Paper](https://arxiv.org/abs/2512.10958)] [[Project](https://worldbench.github.io/worldlens)]
- **UniUGP**: Unifying Understanding, Generation, and Planning For End-to-end Autonomous Driving. **`arXiv 25.12`** [[Paper](https://arxiv.org/abs/2512.09864)] [[Project](https://seed-uniugp.github.io/)]
- **MindDrive**: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving. **`arXiv 25.12`** [[Paper](https://arxiv.org/abs/2512.04441)]
- **U4D**: Uncertainty-Aware 4D World Modeling from LiDAR Sequences. **`arXiv 25.12`** [[Paper](https://arxiv.org/abs/2512.02982)]
- **RadarGen**: Automotive Radar Point Cloud Generation from Cameras. **`arXiv 25.12`** [[Paper](https://arxiv.org/abs/2512.17897)] [[Project](https://radargen.github.io/)]
- **Think Before You Drive**: World Model-Inspired Multimodal Grounding for Autonomous Vehicles. **`arXiv 25.12`** [[Paper](https://arxiv.org/abs/2512.03454)]
- Vehicle Dynamics Embedded World Models for Autonomous Driving. **`arXiv 25.12`** [[Paper](https://arxiv.org/abs/2512.02417)]
- **LiSTAR**: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving. **`arXiv 25.11`** [[Paper](https://arxiv.org/abs/2511.16049)] [[Project](https://ocean-luna.github.io/LiSTAR.github.io/)]
- **OpenTwinMap**: An Open-Source Digital Twin Generator for Urban Autonomous Driving. **`arXiv 25.11`** [[Paper](https://arxiv.org/abs/2511.21925)]
- **SparseWorld-TC**: Trajectory-Conditioned Sparse Occupancy World Model. **`arXiv 25.11`** [[Paper](https://arxiv.org/abs/2511.22039)]
- **LaGen**: Towards Autoregressive LiDAR Scene Generation. **`arXiv 25.11`** [[Paper](https://arxiv.org/abs/2511.21256)]
- **AD-R1**: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models. **`arXiv 25.11`** [[Paper](https://arxiv.org/abs/2511.20325)]
- **CorrectAD**: A Self-Correcting Agentic System to Improve End-to-end Planning in Autonomous Driving. **`arXiv 25.11`** [[Paper](https://arxiv.org/abs/2511.13297)]
- [**UniScenev2**] Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method. **`arXiv 25.10`** [[Paper](https://arxiv.org/abs/2510.22973)]
- Vision-Centric 4D Occupancy Forecasting and Planning via Implicit Residual World Models. **`arXiv 25.10`** [[Paper](https://arxiv.org/abs/2510.16729)]
- **SparseWorld**: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries. **`arXiv 25.10`** [[Paper](https://arxiv.org/abs/2510.17482)] [[Code](https://github.com/MSunDYY/SparseWorld)]
- **OmniNWM**: Omniscient Driving Navigation World Models. **`arXiv 25.10`** [[Paper](https://arxiv.org/abs/2510.18313)] [[Project](https://arlo0o.github.io/OmniNWM/)]
- [**ORAD-3D**] Advancing Off-Road Autonomous Driving: The Large-Scale ORAD-3D Dataset and Comprehensive Benchmarks.
**`arXiv 25.10`** [[Paper](https://arxiv.org/abs/2510.16500)] [[Code](https://github.com/chaytonmin/ORAD-3D)]
- [**Dream4Drive**] Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks. **`arXiv 25.10`** [[Paper](https://arxiv.org/abs/2510.19195)] [[Project](https://wm-research.github.io/Dream4Drive/)]
- **DriveVLA-W0**: World Models Amplify Data Scaling Law in Autonomous Driving. **`arXiv 25.10`** [[Paper](https://arxiv.org/abs/2510.12796)]
- **CoIRL-AD**: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving. **`arXiv 25.10`** [[Paper](https://arxiv.org/abs/2510.12560)]
- **CVD-STORM**: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving. **`arXiv 25.10`** [[Paper](https://arxiv.org/abs/2510.07944)]
- [**PhiGensis**] 4D Driving Scene Generation With Stereo Forcing. **`arXiv 25.9`** [[Paper](https://arxiv.org/abs/2509.20251)] [[Project](https://jiangxb98.github.io/PhiGensis/)]
- **TeraSim-World**: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving. **`arXiv 25.9`** [[Paper](https://arxiv.org/abs/2509.13164)]
- **OccTENS**: 3D Occupancy World Model via Temporal Next-Scale Prediction. **`arXiv 25.9`** [[Paper](https://arxiv.org/abs/2509.03887)]
- [**G^2Editor**] Realistic and Controllable 3D Gaussian-Guided Object Editing for Driving Video Generation. **`arXiv 25.8`** [[Paper](https://arxiv.org/abs/2508.20471)]
- **LSD-3D**: Large-Scale 3D Driving Scene Generation with Geometry Grounding. **`arXiv 25.8`** [[Paper](https://arxiv.org/abs/2508.19204)] [[Project](https://princeton-computational-imaging.github.io/LSD-3D/)]
- Seeing Clearly, Forgetting Deeply: Revisiting Fine-Tuned Video Generators for Driving Simulation. **`arXiv 25.8`** [[Paper](https://arxiv.org/abs/2508.16512)]
- **MoVieDrive**: Multi-Modal Multi-View Urban Scene Video Generation. **`arXiv 25.8`** [[Paper](https://arxiv.org/abs/2508.14327)]
- **ImagiDrive**: A Unified Imagination-and-Planning Framework for Autonomous Driving. **`arXiv 25.8`** [[Paper](https://arxiv.org/abs/2508.11428)] [[Code](https://github.com/fudan-zvg/ImagiDrive)]
- **LiDARCrafter**: Dynamic 4D World Modeling from LiDAR Sequences. **`arXiv 25.8`** [[Paper](https://arxiv.org/abs/2508.03692)] [[Project](https://lidarcrafter.github.io/)]
- **FASTopoWM**: Fast-Slow Lane Segment Topology Reasoning with Latent World Models. **`arXiv 25.7`** [[Paper](https://arxiv.org/abs/2507.23325)]
- World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving. **`arXiv 25.7`** [[Paper](https://arxiv.org/abs/2507.12762)]
- **Orbis**: Overcoming Challenges of Long-Horizon Prediction in Driving World Models. **`arXiv 25.7`** [[Paper](https://arxiv.org/abs/2507.13162)] [[Code](https://lmb-freiburg.github.io/orbis.github.io/)]
- **I2-World**: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting.
**`arXiv 25.7`** [[Paper](https://arxiv.org/abs/2507.09144)] [[Code](https://github.com/lzzzzzm/II-World)]
- **NRSeg**: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models. **`arXiv 25.7`** [[Paper](https://arxiv.org/abs/2507.04002)] [[Code](https://github.com/lynn-yu/NRSeg)]
- Towards foundational LiDAR world models with efficient latent flow matching. **`arXiv 25.6`** [[Paper](https://arxiv.org/abs/2506.23434)]
- **ReSim**: Reliable World Simulation for Autonomous Driving. **`arXiv 25.6`** [[Paper](https://arxiv.org/abs/2506.09981)] [[Project](https://opendrivelab.com/ReSim)]
- **Cosmos-Drive-Dreams**: Scalable Synthetic Driving Data Generation with World Foundation Models. **`arXiv 25.6`** **`NVIDIA`** [[Paper](https://arxiv.org/abs/2506.09042)] [[Project](https://research.nvidia.com/labs/toronto-ai/cosmos_drive_dreams/)]
- **Dreamland**: Controllable World Creation with Simulator and Generative Models. **`arXiv 25.6`** [[Paper](https://arxiv.org/abs/2506.08006)] [[Project](https://metadriverse.github.io/dreamland/)]
- **LongDWM**: Cross-Granularity Distillation for Building a Long-Term Driving World Model. **`arXiv 25.6`** [[Paper](https://arxiv.org/abs/2506.01546)] [[Code](https://wang-xiaodong1899.github.io/longdwm/)]
- **ProphetDWM**: A Driving World Model for Rolling Out Future Actions and Videos. **`arXiv 25.5`** [[Paper](https://arxiv.org/abs/2505.18650)]
- **GeoDrive**: 3D Geometry-Informed Driving World Model with Precise Action Control. **`arXiv 25.5`** [[Paper](https://arxiv.org/abs/2505.22421)] [[Code](https://github.com/antonioo-c/GeoDrive)]
- **DriveX**: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving. **`arXiv 25.5`** [[Paper](https://arxiv.org/abs/2505.19239)]
- **VL-SAFE**: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving. **`arXiv 25.5`** [[Paper](https://arxiv.org/abs/2505.16377)] [[Project](https://ys-qu.github.io/vlsafe-website/)]
- **Raw2Drive**: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2). **`arXiv 25.5`** [[Paper](https://arxiv.org/abs/2505.16394)]
- [**RAMBLE**] From Imitation to Exploration: End-to-end Autonomous Driving based on World Model. **`arXiv 25.4`** [[Paper](https://arxiv.org/abs/2410.02253)] [[Code](https://github.com/SCP-CN-001/ramble)]
- **DiVE**: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer. **`arXiv 25.4`** [[Paper](https://arxiv.org/abs/2504.18576)]
- [**WoTE**] End-to-End Driving with Online Trajectory Evaluation via BEV World Model. **`ICCV 25`** [[Paper](https://arxiv.org/abs/2504.01941)] [[Code](https://github.com/liyingyanUCAS/WoTE)]
- **MagicDrive-V2**: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control.
**`arXiv 25.3`** [[Paper](https://arxiv.org/abs/2411.13807)] [[Project](https://gaoruiyuan.com/magicdrive-v2/)]
- **CoGen**: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving. **`arXiv 25.3`** [[Paper](https://arxiv.org/abs/2503.22231)]
- **GAIA-2**: A Controllable Multi-View Generative World Model for Autonomous Driving. **`arXiv 25.3`** [[Paper](https://arxiv.org/abs/2503.20523)]
- **Semi-SD**: Semi-Supervised Metric Depth Estimation via Surrounding Cameras for Autonomous Driving. **`arXiv 25.3`** [[Paper](https://arxiv.org/abs/2503.19713)] [[Code](https://github.com/xieyuser/Semi-SD)]
- **MiLA**: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving. **`arXiv 25.3`** [[Paper](https://arxiv.org/abs/2503.15875)] [[Project](https://xiaomi-mlab.github.io/mila.github.io/)]
- **SimWorld**: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model. **`arXiv 25.3`** [[Paper](https://arxiv.org/abs/2503.13952)] [[Code](https://github.com/Li-Zn-H/SimWorld)]
- [**EOT-WM**] Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space. **`arXiv 25.3`** [[Paper](https://arxiv.org/abs/2503.09215)]
- [**T^3Former**] Temporal Triplane Transformers as Occupancy World Models. **`arXiv 25.3`** [[Paper](https://arxiv.org/abs/2503.07338)]
- **AVD2**: Accident Video Diffusion for Accident Video Description. **`arXiv 25.3`** [[Paper](https://arxiv.org/abs/2502.14801)] [[Project](https://an-answer-tree.github.io/)]
- **VaViM and VaVAM**: Autonomous Driving through Video Generative Modeling. **`arXiv 25.2`** [[Paper](https://arxiv.org/abs/2502.15672)] [[Code](https://github.com/valeoai/VideoActionModel)]
- **Dream to Drive**: Model-Based Vehicle Control Using Analytic World Models. **`arXiv 25.2`** [[Paper](https://arxiv.org/abs/2502.10012)]
- **AD-L-JEPA**: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data. **`arXiv 25.1`** [[Paper](https://arxiv.org/abs/2501.04969)] [[Code](https://github.com/HaoranZhuExplorer/AD-L-JEPA-Release)]

### 2024
- [**SEM2**] Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model. **`TITS`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10538211/)]
- **Vista**: A Generalizable Driving World Model with High Fidelity and Versatile Controllability. **`NeurIPS 24`** [[Paper](https://arxiv.org/abs/2405.17398)] [[Code](https://github.com/OpenDriveLab/Vista)]
- **SceneDiffuser**: Efficient and Controllable Driving Simulation Initialization and Rollout. **`NeurIPS 24`** [[Paper](https://arxiv.org/abs/2412.12129)]
- **DrivingDojo Dataset**: Advancing Interactive and Knowledge-Enriched Driving World Model.
**`NeurIPS 24`** [[Paper](https://arxiv.org/abs/2410.10738)] [[Project](https://drivingdojo.github.io/)]
- **Think2Drive**: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving. **`ECCV 24`** [[Paper](https://arxiv.org/abs/2402.16720)]
- [**MARL-CCE**] Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model. **`ECCV 24`** [[Paper](https://www.ecva.net/papers/eccv_24/papers_ECCV/papers/05085.pdf)] [[Code](https://github.com/qiaoguanren/MARL-CCE)]
- **DriveDreamer**: Towards Real-world-driven World Models for Autonomous Driving. **`ECCV 24`** [[Paper](https://arxiv.org/abs/2309.09777)] [[Code](https://github.com/JeffWang987/DriveDreamer)]
- **OccWorld**: Learning a 3D Occupancy World Model for Autonomous Driving. **`ECCV 24`** [[Paper](https://arxiv.org/abs/2311.16038)] [[Code](https://github.com/wzzheng/OccWorld)]
- [**NeMo**] Neural Volumetric World Models for Autonomous Driving. **`ECCV 24`** [[Paper](https://www.ecva.net/papers/eccv_24/papers_ECCV/papers/02571.pdf)]
- **CarFormer**: Self-Driving with Learned Object-Centric Representations. **`ECCV 24`** [[Paper](https://arxiv.org/abs/2407.15843)] [[Code](https://kuis-ai.github.io/CarFormer/)]
- [**GUMP**] Solving Motion Planning Tasks with a Scalable Generative Model. **`ECCV 24`** [[Paper](https://arxiv.org/abs/2407.02797)] [[Code](https://github.com/HorizonRobotics/GUMP/)]
- **WoVoGen**: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation. **`ECCV 24`** [[Paper](https://arxiv.org/abs/2312.02934)] [[Code](https://github.com/fudan-zvg/WoVoGen)]
- **DrivingDiffusion**: Layout-Guided multi-view driving scene video generation with latent diffusion model. **`ECCV 24`** [[Paper](https://arxiv.org/abs/2310.07771)] [[Code](https://github.com/shalfun/DrivingDiffusion)]
- **3D-VLA**: A 3D Vision-Language-Action Generative World Model. **`ICML 24`** [[Paper](https://arxiv.org/abs/2403.09631)]
- [**ViDAR**] Visual Point Cloud Forecasting enables Scalable Autonomous Driving. **`CVPR 24`** [[Paper](https://arxiv.org/abs/2312.17655)] [[Code](https://github.com/OpenDriveLab/ViDAR)]
- [**GenAD**] Generalized Predictive Model for Autonomous Driving. **`CVPR 24`** [[Paper](https://arxiv.org/abs/2403.09630)] [[Data](https://github.com/OpenDriveLab/DriveAGI?tab=readme-ov-file#genad-dataset-opendv-youtube)]
- **Cam4DOCC**: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications. **`CVPR 24`** [[Paper](https://arxiv.org/abs/2311.17663)] [[Code](https://github.com/haomo-ai/Cam4DOcc)]
- [**Drive-WM**] Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving.
**`CVPR 24`** [[Paper](https://arxiv.org/abs/2311.17918)] [[Code](https://github.com/BraveGroup/Drive-WM)]
- **DriveWorld**: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving. **`CVPR 24`** [[Paper](https://arxiv.org/abs/2405.04390)]
- **Panacea**: Panoramic and Controllable Video Generation for Autonomous Driving. **`CVPR 24`** [[Paper](https://arxiv.org/abs/2311.16813)] [[Code](https://panacea-ad.github.io/)]
- **UnO**: Unsupervised Occupancy Fields for Perception and Forecasting. **`CVPR 24`** [[Paper](https://arxiv.org/abs/2406.08691)] [[Code](https://waabi.ai/research/uno)]
- **MagicDrive**: Street View Generation with Diverse 3D Geometry Control. **`ICLR 24`** [[Paper](https://arxiv.org/abs/2310.02601)] [[Code](https://github.com/cure-lab/MagicDrive)]
- **Copilot4D**: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion. **`ICLR 24`** [[Paper](https://arxiv.org/abs/2311.01017)]
- **SafeDreamer**: Safe Reinforcement Learning with World Models. **`ICLR 24`** [[Paper](https://openreview.net/forum?id=tsE5HLYtYg)] [[Code](https://github.com/PKU-Alignment/SafeDreamer)]
- **DrivingWorld**: Constructing World Model for Autonomous Driving via Video GPT. **`arXiv 24.12`** [[Paper](https://arxiv.org/abs/2412.19505)] [[Code](https://github.com/YvanYin/DrivingWorld)]
- An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training. **`arXiv 24.12`** [[Paper](https://arxiv.org/abs/2412.13772)]
- **Doe-1**: Closed-Loop Autonomous Driving with Large World Model. **`arXiv 24.12`** [[Paper](https://arxiv.org/abs/2412.09627)] [[Code](https://github.com/wzzheng/Doe)]
- [**DrivePhysica**] Physical Informed Driving World Model. **`arXiv 24.12`** [[Paper](https://arxiv.org/abs/2412.08410)] [[Code](https://metadrivescape.github.io/papers_project/DrivePhysica/page.html)]
- **Terra** **ACT-Bench**: Towards Action Controllable World Models for Autonomous Driving. **`arXiv 24.12`** [[Paper](https://arxiv.org/abs/2412.05337)] [[Code](https://github.com/turingmotors/ACT-Bench)] [[Project](https://turingmotors.github.io/actbench/)] [[Hugging Face](https://huggingface.co/turing-motors/Terra)]
- **UniMLVG**: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving. **`arXiv 24.12`** [[Paper](https://arxiv.org/abs/2412.04842)] [[Project](https://sensetime-fvg.github.io/UniMLVG/)] [[Code](https://github.com/SenseTime-FVG/OpenDWM)]
- **HoloDrive**: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving. **`arXiv 24.12`** [[Paper](https://arxiv.org/abs/2412.01407)]
- **InfinityDrive**: Breaking Time Limits in Driving World Models. **`arXiv 24.12`** [[Paper](https://arxiv.org/abs/2412.01522)] [[Project Page](https://metadrivescape.github.io/papers_project/InfinityDrive/page.html)]
- Generating Out-Of-Distribution Scenarios Using Language Models.
**`arXiv 24.11`** [[Paper](https://arxiv.org/abs/2411.16554)]
- **Imagine-2-Drive**: High-Fidelity World Modeling in CARLA for Autonomous Vehicles. **`arXiv 24.11`** [[Paper](https://arxiv.org/abs/2411.10171)] [[Project Page](https://anantagrg.github.io/Imagine-2-Drive.github.io/)]
- **WorldSimBench**: Towards Video Generation Models as World Simulator. **`arXiv 24.10`** [[Paper](https://arxiv.org/abs/2410.18072)] [[Project Page](https://iranqin.github.io/WorldSimBench.github.io/)]
- **DOME**: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model. **`arXiv 24.10`** [[Paper](https://arxiv.org/abs/2410.10429)] [[Project Page](https://gusongen.github.io/DOME)]
- **OCCVAR**: Scalable 4D Occupancy Prediction via Next-Scale Prediction. **`OpenReview`** [[Paper](https://openreview.net/forum?id=X2HnTFsFm8)]
- Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models. **`arXiv 24.9`** [[Paper](https://arxiv.org/abs/2409.16663)]
- [**LatentDriver**] Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving. **`arXiv 24.9`** [[Paper](https://arxiv.org/abs/2409.15730)] [[Code](https://github.com/Sephirex-X/LatentDriver)]
- **RenderWorld**: World Model with Self-Supervised 3D Label. **`arXiv 24.9`** [[Paper](https://arxiv.org/abs/2409.11356)]
- **OccLLaMA**: An Occupancy-Language-Action Generative World Model for Autonomous Driving. **`arXiv 24.9`** [[Paper](https://arxiv.org/abs/2409.03272)]
- **DriveGenVLM**: Real-world Video Generation for Vision Language Model based Autonomous Driving. **`arXiv 24.8`** [[Paper](https://arxiv.org/abs/2408.16647)]
- [**Drive-OccWorld**] Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving. **`arXiv 24.8`** [[Paper](https://arxiv.org/abs/2408.14197)]
- **BEVWorld**: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space. **`arXiv 24.7`** [[Paper](https://arxiv.org/abs/2407.05679)] [[Code](https://github.com/zympsyche/BevWorld)]
- [**TOKEN**] Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving. **`arXiv 24.7`** [[Paper](https://arxiv.org/abs/2407.00959)]
- **UMAD**: Unsupervised Mask-Level Anomaly Detection for Autonomous Driving. **`arXiv 24.6`** [[Paper](https://arxiv.org/abs/2406.06370)]
- **SimGen**: Simulator-conditioned Driving Scene Generation. **`arXiv 24.6`** [[Paper](https://arxiv.org/abs/2406.09386)] [[Code](https://metadriverse.github.io/simgen/)]
- [**AdaptiveDriver**] Planning with Adaptive World Models for Autonomous Driving. **`arXiv 24.6`** [[Paper](https://arxiv.org/abs/2406.10714)] [[Code](https://arunbalajeev.github.io/world_models_planning/world_model_paper.html)]
- [**LAW**] Enhancing End-to-End Autonomous Driving with Latent World Model.
**`arXiv 24.6`** [[Paper](https://arxiv.org/abs/2406.08481)] [[Code](https://github.com/BraveGroup/LAW)]
- [**Delphi**] Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation. **`arXiv 24.6`** [[Paper](https://arxiv.org/abs/2406.01349)] [[Code](https://github.com/westlake-autolab/Delphi)]
- **OccSora**: 4D Occupancy Generation Models as World Simulators for Autonomous Driving. **`arXiv 24.5`** [[Paper](https://arxiv.org/abs/2405.20337)] [[Code](https://github.com/wzzheng/OccSora)]
- **MagicDrive3D**: Controllable 3D Generation for Any-View Rendering in Street Scenes. **`arXiv 24.5`** [[Paper](https://arxiv.org/abs/2405.14475)] [[Code](https://gaoruiyuan.com/magicdrive3d/)]
- **CarDreamer**: Open-Source Learning Platform for World Model based Autonomous Driving. **`arXiv 24.5`** [[Paper](https://arxiv.org/abs/2405.09111)] [[Code](https://github.com/ucd-dare/CarDreamer)]
- [**DriveSim**] Probing Multimodal LLMs as World Models for Driving. **`arXiv 24.5`** [[Paper](https://arxiv.org/abs/2405.05956)] [[Code](https://github.com/sreeramsa/DriveSim)]
- **LidarDM**: Generative LiDAR Simulation in a Generated World. **`arXiv 24.4`** [[Paper](https://arxiv.org/abs/2404.02903)] [[Code](https://github.com/vzyrianov/lidardm)]
- **SubjectDrive**: Scaling Generative Data in Autonomous Driving via Subject Control. **`arXiv 24.3`** [[Paper](https://arxiv.org/abs/2403.19438)] [[Project](https://subjectdrive.github.io/)]
- **DriveDreamer-2**: LLM-Enhanced World Models for Diverse Driving Video Generation. **`arXiv 24.3`** [[Paper](https://arxiv.org/abs/2403.06845)] [[Code](https://drivedreamer2.github.io/)]

### 2023

- **TrafficBots**: Towards World Models for Autonomous Driving Simulation and Motion Prediction. **`ICRA 23`** [[Paper](https://arxiv.org/abs/2303.04116)] [[Code](https://github.com/zhejz/TrafficBots)]
- [**CTT**] Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent. **`arXiv 23.11`** [[Paper](https://arxiv.org/abs/2311.18307)]
- **MUVO**: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations. **`arXiv 23.11`** [[Paper](https://arxiv.org/abs/2311.11762)]
- **GAIA-1**: A Generative World Model for Autonomous Driving. **`arXiv 23.9`** [[Paper](https://arxiv.org/abs/2309.17080)]
- **ADriver-I**: A General World Model for Autonomous Driving. **`arXiv 23.9`** [[Paper](https://arxiv.org/abs/2311.13549)]
- **UniWorld**: Autonomous Driving Pre-training via World Models. **`arXiv 23.8`** [[Paper](https://arxiv.org/abs/2308.07234)] [[Code](https://github.com/chaytonmin/UniWorld)]

### 2022

- [**MILE**] Model-Based Imitation Learning for Urban Driving.
**`NeurIPS 22`** [[Paper](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Fhash\u002F827cb489449ea216e4a257c47e407d18-Abstract-Conference.html)] [[Code](https:\u002F\u002Fgithub.com\u002Fwayveai\u002Fmile)]\n- **Symphony**: Learning Realistic and Diverse Agents for Autonomous Driving Simulation. **`ICRA 22`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.03195)] \n- Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving. **`IROS 22`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.09539)]\n\n## Other World Model Papers\n### 2026\n- Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model. **`CVPR 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.05438)]\n- **GeoWorld**: Geometric World Models. **`CVPR 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.23058)] [[Project](https:\u002F\u002Fsteve-zeyu-zhang.github.io\u002FGeoWorld)]\n- [**EAWM**] From Observations to Events: Event-Aware World Model for Reinforcement Learning. **`ICLR 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.19336)] [[Code](https:\u002F\u002Fgithub.com\u002FMarquisDarwin\u002FEAWM)]\n- **R2-Dreamer**: Redundancy-Reduced World Models without Decoders or Augmentation. **`ICLR 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.18202)] [[Code](https:\u002F\u002Fgithub.com\u002FNM512\u002Fr2dreamer)]\n- **NeuroHex**: Highly-Efficient Hex Coordinate System for Creating World Models to Enable Adaptive AI. **`NICE 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.00376)]\n- Foundation World Models for Agents that Learn, Verify, and Adapt Reliably Beyond Static Environments. **`AAMAS 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.23997)]\n- Probabilistic Dreaming for World Models. **`ICLRW 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.04715)]\n- From Part to Whole: 3D Generative World Model with an Adaptive Structural Hierarchy. **`ICME 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.21557)]\n- Value-guided action planning with JEPA world models. **`World Modeling Workshop 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.00844)]\n- Self-Supervised Multi-Modal World Model with 4D Space-Time Embedding. **`World Modeling Workshop 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.07039)] [[Project](https:\u002F\u002Fgithub.com\u002Flegel\u002Fdeepearth)]\n- Explicit World Models for Reliable Human-Robot Collaboration. **`AAAIW 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.01705)]\n- [**HyDRA**] Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2603.25716)] [[Code](https:\u002F\u002Fgithub.com\u002FH-EmbodVis\u002FHyDRA)] [[Project](https:\u002F\u002Fkj-chen666.github.io\u002FHybrid-Memory-in-Video-World-Models\u002F)]\n- Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.25685)]\n- **MMaDA-VLA**: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.25406)]\n- **ABot-PhysWorld**: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment. 
**`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.23376)]\n- **Describe-Then-Act**: Proactive Agent Steering via Distilled Language-Action World Models. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.23149)]\n- Model Predictive Control with Differentiable World Models for Offline Reinforcement Learning. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.22430)]\n- **WorldCache**: Content-Aware Caching for Accelerated Video World Models. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.22286)] [[Code](https:\u002F\u002Fumair1221.github.io\u002FWorld-Cache\u002F)]\n- **ThinkJEPA**: Empowering Latent World Models with Large Vision-Language Reasoning Model. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.22281)]\n- **Omni-WorldBench**: Towards a Comprehensive Interaction-Centric Evaluation for World Models. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.22212)]\n- Do World Action Models Generalize Better than VLAs? A Robustness Study. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.22078)]\n- **InSpatio-WorldFM**: An Open-Source Real-Time Generative Frame Model. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.11911)] [[Project](https:\u002F\u002Finspatio.github.io\u002Fworldfm\u002F)]\n- [**VEGA-3D**] Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.19235)] [[Code](https:\u002F\u002Fgithub.com\u002FH-EmbodVis\u002FVEGA-3D)]\n- **AcceRL**: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.18464)]\n- **EVA**: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.17808)] [[Project](https:\u002F\u002Feva-project-page.github.io\u002F)]\n- **Stereo World Model**: Camera-Guided Stereo Video Generation. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.17375)] [[Project](https:\u002F\u002Fsunyangtian.github.io\u002FStereoWorld-web\u002F)]\n- **GigaWorld-Policy**: An Efficient Action-Centered World--Action Model. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.17240)]\n- **MosaicMem**: Hybrid Spatial Memory for Controllable Video World Models. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.17117)] [[Project](https:\u002F\u002Fmosaicmem.github.io\u002Fmosaicmem\u002F)]\n- **DreamPlan**: Efficient Reinforcement Fine-Tuning of Vision-Language Planners via Video World Models. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.16860)] [[Project](https:\u002F\u002Fpsi-lab.ai\u002FDreamPlan\u002F)]\n- **Simulation Distillation**: Pretraining World Models in Simulation for Rapid Real-World Adaptation. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.15759)] [[Project](https:\u002F\u002Fsim-dist.github.io\u002F)]\n- **ResWM**: Residual-Action World Model for Visual RL. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.11110)]\n- **World2Act**: Latent Action Post-Training via Skill-Compositional World Models. 
**`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.10422)] [[Project](https:\u002F\u002Fwm2act.github.io\u002F)]\n- **RAE-NWM**: Navigation World Model in Dense Visual Representation Space. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.09241)] [[Code](https:\u002F\u002Fgithub.com\u002F20robo\u002Fraenwm)]\n- **MWM**: Mobile World Models for Action-Conditioned Consistent Prediction. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.07799)]\n- **DreamSAC**: Learning Hamiltonian World Models via Symmetry Exploration. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.07545)]\n- **LiveWorld**: Simulating Out-of-Sight Dynamics in Generative Video World Models. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.07145)]\n- **WorldCache**: Accelerating World Models for Free via Heterogeneous Token Caching. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.06331)] [[Project](https:\u002F\u002Fgithub.com\u002FFofGofx\u002FWorldCache)]\n- World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.04317)]\n- Beyond Pixel Histories: World Models with Persistent 3D State. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.03482)]\n- **DreamWorld**: Unified World Modeling in Video Generation. **`arXiv 26.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.00466)]\n- **MetaOthello**: A Controlled Study of Multiple World Models in Transformers. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.23164)]\n- The Trinity of Consistency as a Defining Principle for General World Models. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.23152)]\n- **UCM**: Unifying Camera Control and Memory with Time-aware Positional Encoding Warping for World Models. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.22960)] [[Project](https:\u002F\u002Fhumanaigc.github.io\u002Fucm-webpage\u002F)]\n- **CWM**: Contrastive World Models for Action Feasibility Learning in Embodied Agent Pipelines. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.22452)]\n- **Solaris**: Building a Multiplayer Video World Model in Minecraft. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.22208)] [[Project](https:\u002F\u002Fsolaris-wm.github.io\u002F)]\n- When World Models Dream Wrong: Physical-Conditioned Adversarial Attacks against World Models. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.18739)]\n- Learning Invariant Visual Representations for Planning with Joint-Embedding Predictive World Models. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.18639)]\n- Factored Latent Action World Models. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.16229)]\n- [**DreamZero**] World Action Models are Zero-shot Policies. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.15922)] [[Project](https:\u002F\u002Fdreamzero0.github.io\u002F)]\n- **VLM-DEWM**: Dynamic External World Model for Verifiable and Resilient Vision-Language Planning in Manufacturing. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.15549)]\n- Self-Supervised JEPA-based World Models for LiDAR Occupancy Completion and Forecasting. 
**`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.12540)]\n- GigaBrain-0.5M: a VLA That Learns From World Model-Based Reinforcement Learning. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.12099)] [[Project](https:\u002F\u002Fgigabrain05m.github.io\u002F)]\n- **VLAW**: Iterative Co-Improvement of Vision-Language-Action Policy and World Model. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.12063)] [[Project](https:\u002F\u002Fsites.google.com\u002Fview\u002Fvlaw-arxiv)]\n- Scaling World Model for Hierarchical Manipulation Policies. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.10983)] [[Project](https:\u002F\u002Fvista-wm.github.io\u002F)]\n- Say, Dream, and Act: Learning Video World Models for Instruction-Driven Robot Manipulation. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.10717)]\n- **Olaf-World**: Orienting Latent Actions for Video World Modeling. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.10104)] [[Project](https:\u002F\u002Fshowlab.github.io\u002FOlaf-World\u002F)]\n- **VLA-JEPA**: Enhancing Vision-Language-Action Model with Latent World Model. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.10098)]\n- **Agent World Model**: Infinity Synthetic Environments for Agentic Reinforcement Learning. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.10090)] [[Code](https:\u002F\u002Fgithub.com\u002FSnowflake-Labs\u002Fagent-world-model)]\n- **MVISTA-4D**: View-Consistent 4D World Model with Test-Time Action Inference for Robotic Manipulation. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.09878)]\n- **Hand2World**: Autoregressive Egocentric Interaction Generation via Free-Space Hand Gestures. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.09600)] [[Project](https:\u002F\u002Fhand2world.github.io\u002F)]\n- **WorldArena**: A Unified Benchmark for Evaluating Perception and Functional Utility of Embodied World Models. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.08971)]\n- **MIND**: Benchmarking Memory Consistency and Action Control in World Models. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.08025)] [[Code](https:\u002F\u002Fgithub.com\u002FCSU-JPG\u002FMIND)]\n- Cross-View World Models. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.07277)]\n- Interpreting Physics in Video World Models. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.07050)]\n- **DreamDojo**: A Generalist Robot World Model from Large-Scale Human Videos. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.06949)] [[Project](https:\u002F\u002Fdreamdojo-world.github.io\u002F)]\n- **World-VLA-Loop**: Closed-Loop Learning of Video World Model and VLA Policy. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.06508)] [[Project](https:\u002F\u002Fshowlab.github.io\u002FWorld-VLA-Loop\u002F)]\n- Self-Improving World Modelling with Latent Actions. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.06130)]\n- **BridgeV2W**: Bridging Video Generation Models to Embodied World Models via Embodiment Masks. **`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.03793)] [[Project](https:\u002F\u002Fbridgev2w.github.io\u002F)]\n- **LIVE**: Long-horizon Interactive Video World Modeling. 
**`arXiv 26.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.03747)]\n- [**Lingbot-World**] Advancing Open-source World Models. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.20540)] [[Code](https:\u002F\u002Fgithub.com\u002Frobbyant\u002Flingbot-world)]\n- [**Lingbot-VA**] Causal World Modeling for Robot Control. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.21998)] [[Code](https:\u002F\u002Fgithub.com\u002Frobbyant\u002Flingbot-va)]\n- **PathWise**: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.20539)]\n- **WorldBench**: Disambiguating Physics for Diagnostic Evaluation of World Models. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.21282)] [[Project](https:\u002F\u002Fworld-bench.github.io\u002F)]\n- Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.19834)] [[Project](https:\u002F\u002Fthuml.github.io\u002FReasoning-Visual-World)]\n- **PhysicsMind**: Sim and Real Mechanics Benchmarking for Physical Reasoning and Prediction in Foundational VLMs and World Models. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.16007)]\n- **Boltzmann-GPT**: Bridging Energy-Based World Models and Language Generation. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.17094)]\n- **MetaWorld**: Skill Transfer and Composition in a Hierarchical World Model for Grounding High-Level Instructions. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.17507)] [[Project](https:\u002F\u002Fanonymous.4open.science\u002Fr\u002Fmetaworld-2BF4\u002F)]\n- Aligning Agentic World Models via Knowledgeable Experience Learning. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.13247)]\n- **VJEPA**: Variational Joint Embedding Predictive Architectures as Probabilistic World Models. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.14354)]\n- Walk through Paintings: Egocentric World Models from Internet Priors. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.15284)]\n- From Generative Engines to Actionable Simulators: The Imperative of Physical Grounding in World Models. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.15533)]\n- An Efficient and Multi-Modal Navigation System with One-Step World Model. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.12277)]\n- **ReWorld**: Multi-Dimensional Reward Modeling for Embodied World Models. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.12428)]\n- Action Shapley: A Training Data Selection Metric for World Model in Reinforcement Learning. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.10905)]\n- Inference-time Physics Alignment of Video Generative Models with Latent World Models. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.10553)]\n- Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.08955)]\n- Semantic Belief-State World Model for 3D Human Motion Prediction. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.03517)]\n- **PointWorld**: Scaling 3D World Models for In-The-Wild Robotic Manipulation. 
**`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.03782)] [[Project](https:\u002F\u002Fpoint-world.github.io\u002F)]\n- Current Agents Fail to Leverage World Model as Tool for Foresight. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.03905)]\n- **MobileDreamer**: Generative Sketch World Model for GUI Agent. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.04035)]\n- Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.04137)]\n- **VerseCrafter**: Dynamic Realistic Video World Model with 4D Geometric Control. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.05138)] [[Project](https:\u002F\u002Fsixiaozheng.github.io\u002FVerseCrafter_page\u002F)]\n- Learning Latent Action World Models In The Wild. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.05230)]\n- Object-Centric World Models Meet Monte Carlo Tree Search. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.06604)]\n- Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.07463)]\n- A formal theory on problem space as a semantic world model in systems engineering. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.00755)]\n- Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.01075)] [[Project](https:\u002F\u002Fflowequivariantworldmodels.github.io\u002F)]\n- **NeoVerse**: Enhancing 4D World Model with in-the-wild Monocular Videos. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.00393)] [[Project](https:\u002F\u002Fneoverse-4d.github.io\u002F)]\n- What Drives Success in Physical Planning with Joint-Embedding Predictive World Models? **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.24497)]\n- **AlignUSER**: Human-Aligned LLM Agents via World Models for Recommender System Evaluation. **`arXiv 26.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.00930)]\n\n### 2025\n- [**DreamerV3**] Mastering Diverse Domains through World Models. **`Nature`** [[Paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-025-08744-2)] [[JAX Code](https:\u002F\u002Fgithub.com\u002Fdanijar\u002Fdreamerv3)]\n- **3D4D**: An Interactive, Editable, 4D World Model via 3D Video Generation.  **`AAAI 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.08536)]\n- Object-Centric World Models for Causality-Aware Reinforcement Learning.  **`AAAI 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.14262)]\n- Foundation Models as World Models: A Foundational Study in Text-Based GridWorlds.  **`NeurIPSW 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.15915)]\n- Language-conditioned world model improves policy generalization by reading environmental descriptions. **`NeurIPSW 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.22904)]\n- **NavMorph**: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments.  
**`ICCV 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23468)] [[Code](https:\u002F\u002Fgithub.com\u002FFeliciaxyao\u002FNavMorph)]\n- **GWM**: Towards Scalable Gaussian World Models for Robotic Manipulation **`ICCV 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.17600)] [[Project](https:\u002F\u002Fgaussian-world-model.github.io\u002F)]\n- **FOUNDER**: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making. **`ICML 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.12496)] [[Project](https:\u002F\u002Fsites.google.com\u002Fview\u002Ffounder-rl)]\n- General agents need world models.  **`ICML 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01622)]\n- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models. **`ICML 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.06952)]\n- Continual Reinforcement Learning by Planning with Online World Models. **`ICML 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.09177)]\n- **PIGDreamer**: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning. **`ICML 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.02159)]\n- [**NWM**] Navigation World Models.  **`CVPR 25 Best Paper Honorable Mention`** **`Yann LeCun`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.03572)] [[Project](https:\u002F\u002Fwww.amirbar.net\u002Fnwm\u002F)]\n- [**PrediCIR**] Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval. **`CVPR 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.17109)] [[Code](https:\u002F\u002Fgithub.com\u002FPter61\u002Fpredicir)]\n- [**MoSim**] Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning. **`CVPR 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.07095)]\n- **CoT-VLA**: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models.  **`CVPR 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.22020)] [[Project](https:\u002F\u002Fcot-vla.github.io\u002F)]\n- **EchoWorld**: Learning Motion-Aware World Models for Echocardiography Probe Guidance. **`CVPR 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.13065)] [[Code](https:\u002F\u002Fgithub.com\u002FLeapLabTHU\u002FEchoWorld)]\n- **DiWA**: Diffusion Policy Adaptation with World Models. **`CoRL 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.03645)] [[Project](https:\u002F\u002Fdiwa.cs.uni-freiburg.de\u002F)]\n- **Simulating Before Planning**: Constructing Intrinsic User World Model for User-Tailored Dialogue Policy Planning. **`SIGIR 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.13643)] \n- **LS-Imagine**: Open-World Reinforcement Learning over Long Short-Term Imagination. **`ICLR 25 Oral`** [[Paper](https:\u002F\u002Fopenreview.net\u002Fpdf?id=vzItLaEoDa)] [[Code](https:\u002F\u002Fgithub.com\u002Fqiwang067\u002FLS-Imagine)]\n- **DC-MPC**: Discrete Codebook World Models for Continuous Control.  **`ICLR 25`** [[Paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=lfRYzd8ady)] [[Code](https:\u002F\u002Fgithub.com\u002Faidanscannell\u002Fdcmpc)]\n- [**SGF**] Simple, Good, Fast: Self-Supervised World Models Free of Baggage.  
**`ICLR 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.02612)] [[Code](https:\u002F\u002Fgithub.com\u002Fjrobine\u002Fsgf)]\n- **ManiGaussian++**: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model. **`IROS 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.19842)] [[Code](https:\u002F\u002Fgithub.com\u002FApril-Yz\u002FManiGaussian_Bimanual)]\n- **SCMA**: Self-Consistent Model-based Adaptation for Visual Reinforcement Learning. **`IJCAI 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.09923)]\n- **Surfer**: A World Model-Based Framework for Vision-Language Robot Manipulation. **`TNNLS 25`** [[Paper](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F11152367)]\n- Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling. **`World Modeling Workshop 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.05809)] [[Code](https:\u002F\u002Fgithub.com\u002Fchandar-lab\u002Fvisa-for-mindjourney)]\n- **On Memory**: A comparison of memory mechanisms in world models. **`World Modeling Workshop 26`** [[Paper](https:\u002F\u002Fwww.arxiv.org\u002Fabs\u002F2512.06983)]\n- **Zero-Splat TeleAssist**: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation. **`ICRAW 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08271)]\n- **Act2Goal**: From World Model To General Goal-conditioned Policy. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.23541)]\n- Web World Models. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.23676)]\n- [**LEWM**] Large Emotional World Model. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.24149)]\n- World model inspired sarcasm reasoning with large language model agents. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.24329)]\n- **TeleWorld**: Towards Dynamic Multimodal Synthesis with a 4D World Model. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.00051)]\n- Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.21887)]\n- **Yume-1.5**: A Text-Controlled Interactive World Generation Model. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.22096)]\n- [**ORCA**] Active Intelligence in Video Avatars via Closed-loop World Modeling. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.20615)] [[Project](https:\u002F\u002Fxuanhuahe.github.io\u002FORCA\u002F)]\n- **From Word to World**: Can Large Language Models be Implicit Text-based World Models?. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.18832)]\n- A Unified Definition of Hallucination, Or: It's the World Model, Stupid. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.21577)]\n- **AstraNav-World**: World Model for Foresight Control and Consistency. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.21714)]\n- **ChronoDreamer**: Action-Conditioned World Model as an Online Simulator for Robotic Planning. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.18619)]\n- **STORM**: Search-Guided Generative World Models for Robotic Manipulation. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.18477)]\n- Dexterous World Models. 
**`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.17907)] [[Project](http:\u002F\u002Fsnuvclab.github.io\u002Fdwm)]\n- **WorldPlay**: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.14614)] [[Project](https:\u002F\u002F3d-models.hunyuan.tencent.com\u002Fworld\u002F)]\n- **Motus**: A Unified Latent Action World Model. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.13030)]\n- **LongVie 2**: Multimodal Controllable Ultra-Long Video World Model. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.13604)]\n- World Models Can Leverage Human Videos for Dexterous Manipulation. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.13644)]\n- World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.12548)]\n- **VFMF**: World Modeling by Forecasting Vision Foundation Model Features. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.11225)] [[Code](https:\u002F\u002Fgithub.com\u002Fgboduljak\u002Fvfmf)]\n- **VDAWorld**: World Modelling via VLM-Directed Abstraction and Simulation. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.11061)] [[Project](https:\u002F\u002Ffelixomahony.github.io\u002Fvdaworld\u002F)]\n- The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.13821)]\n- **KAN-Dreamer**: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.07437)]\n- **CLARITY**: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08029)]\n- Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08188)] [[Project](https:\u002F\u002Fembodied-tree-of-thoughts.github.io\u002F)]\n- Deterministic World Models for Verification of Closed-loop Vision-based Systems. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08991)]\n- Closing the Train-Test Gap in World Models for Gradient-Based Planning. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.09929)]\n- Latent Action World Models for Control with Unlabeled Trajectories. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.10016)]\n- Evaluating Gemini Robotics Policies in a Veo World Simulator. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.10675)]\n- **Astra**: General Interactive World Model with Autoregressive Denoising. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08931)] [[Code](https:\u002F\u002Fgithub.com\u002FEternalEvan\u002FAstra)]\n- **Visionary**: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08478)]\n- Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08411)]\n- Learning Robot Manipulation from Audio World Models. 
**`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08405)]\n- **FieldSeer I**: Physics-Guided World Models for Long-Horizon Electromagnetic Dynamics under Partial Observability. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.05361)]\n- World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.05927)]\n- **Speech World Model**: Causal State-Action Planning with Explicit Reasoning for Speech. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.05933)]\n- **BiTAgent**: A Task-Aware Modular Framework for Bidirectional Coupling between Multimodal Large Language Models and World Models. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.04513)]\n- **AdaPower**: Specializing World Foundation Models for Predictive Manipulation. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.03538)]\n- **RoboScape-R**: Unified Reward-Observation World Models for Generalizable Robotics Training via RL. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.03556)]\n- **RELIC**: Interactive Video World Model with Long-Horizon Memory. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.04040)]\n- **Audio-Visual World Models**: Towards Multisensory Imagination in Sight and Sound. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.00883)]\n- Better World Models Can Lead to Better Post-Training Performance. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.03400)]\n- **VCWorld**: A Biological World Model for Virtual Cell Simulation. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.00306)]\n- **NavForesee**: A Unified Vision-Language World Model for Hierarchical Planning and Dual-Horizon Navigation Prediction. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.01550)]\n- **GrndCtrl**: Grounding World Models via Self-Supervised Reward Alignment. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.01952)]\n- **The brain-AI convergence**: Predictive and generative world models for general-purpose computation. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.02419)]\n- **WorldPack**: Compressed Memory Improves Spatial Consistency in Video World Modeling. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.02473)]\n- **VISTAv2**: World Imagination for Indoor Vision-and-Language Navigation. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.00041)]\n- **Hunyuan-GameCraft-2**: Instruction-following Interactive Game World Model. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.23429)] [[Project](https:\u002F\u002Fhunyuan-gamecraft-2.github.io\u002F)]\n- **SmallWorlds**: Assessing Dynamics Understanding of World Models in Isolated Environments. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.23465)]\n- **Thinking by Doing**: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.23476)]\n- **TraceGen**: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos. 
**`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.21690)]\n- **GigaWorld-0**: World Models as Data Engine to Empower Embodied AI. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.19861)]\n- **4DWorldBench**: A Comprehensive Evaluation Framework for 3D\u002F4D World Generation Models. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.19836)]\n- **Thinking Ahead**: Foresight Intelligence in MLLMs and World Models. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.18735)]\n- Counterfactual World Models via Digital Twin-conditioned Video Diffusion. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.17481)]\n- **RynnVLA-002**: A Unified Vision-Language-Action and World Model. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.17502)]\n- **Beyond Generative AI**: World Models for Clinical Prediction, Counterfactuals, and Planning. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.16333)]\n- **X-WIN**: Building Chest Radiograph World Model via Predictive Sensing. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.14918)]\n- **IPR-1**: Interactive Physical Reasoner. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.15407)]\n- **NORA-1.5**: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.14659)] [[Code](https:\u002F\u002Fdeclare-lab.github.io\u002Fnora-1.5)]\n- Towards High-Consistency Embodied World Model with Multi-View Trajectory Videos. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.12882)]\n- **PragWorld**: A Benchmark Evaluating LLMs' Local World Model under Minimal Linguistic Alterations and Conversational Dynamics. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.13021)]\n- Latent-Space Autoregressive World Model for Efficient and Robust Image-Goal Navigation. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.11011)]\n- Scalable Policy Evaluation with Video World Models. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.11520)]\n- **WMPO**: World Model-based Policy Optimization for Vision-Language-Action Models. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.09515)]\n- **ViPRA**: Video Prediction for Robot Actions. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.07732)]\n- **Dynamic Sparsity**: Challenging Common Sparsity Assumptions for Learning World Models in Robotic Reinforcement Learning Benchmarks. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.08086)]\n- **LLM-as-a-Judge**: Toward World Models for Slate Recommendation Systems. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.04541)]\n- **DR. WELL**: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.04646)]\n- **WorldPlanner**: Monte Carlo Tree Search and MPC with Action-Conditioned Visual World Models. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.03077)]\n- Natural Building Blocks for Structured World Models: Theory, Evidence, and Scaling. 
**`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.02091)]\n- Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.02748)]\n- How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.01775)]\n- Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.27607)]\n- Co-Evolving Latent Action World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.26433)]\n- **Emu3.5**: Native Multimodal Models are World Learners. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.26583)]\n- Clone Deterministic 3D Worlds with Geometrically-Regularized World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.26782)]\n- Semantic Communications with World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.24785)]\n- Dual-Mind World Models: A General Framework for Learning in Dynamic Wireless Networks. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.24546)]\n- Deductive Chain-of-Thought Augmented Socially-aware Robot Navigation World Model. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.23509)]\n- Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.23258)]\n- Vector Quantization in the Brain: Grid-like Codes in World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16039)]\n- Zero-shot World Models via Search in Memory. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16123)]\n- **VAGEN**: Reinforcing World Model Reasoning for Multi-Turn VLM Agents. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16907)] [[Project](https:\u002F\u002Fvagen-ai.github.io\u002F)]\n- **World-in-World**: World Models in a Closed-Loop World. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.18135)]\n- Higher Embedding Dimension Creates a Stronger World Model for a Simple Sorting Task. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.18315)]\n- Social World Model-Augmented Mechanism Design Policy Learning. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19270)]\n- **ProTerrain**: Probabilistic Physics-Informed Rough Terrain World Modeling. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19364)]\n- **GigaBrain-0**: A World Model-Powered Vision-Language-Action Model. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19430)] [[Project](https:\u002F\u002Fgigabrain0.github.io\u002F)]\n- Benchmarking World-Model Learning. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19788)]\n- Semantic World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19818)] [[Project](https:\u002F\u002Fweirdlabuw.github.io\u002Fswm)]\n- World Models Should Prioritize the Unification of Physical and Social Dynamics. 
**`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21219)]\n- **From Masks to Worlds**: A Hitchhiker's Guide to World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.20668)]\n- Rethinking the Simulation vs. Rendering Dichotomy: No Free Lunch in Spatial World Modelling. **`NeurIPSW 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.20835)]\n- How Hard is it to Confuse a World Model? **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21232)]\n- **DreamerV3-XP**: Optimizing exploration through uncertainty estimation. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21418)]\n- **PhysWorld**: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21447)]\n- **Terra**: Explorable Native 3D World Model with Point Latents. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.14977)] [[Project](https:\u002F\u002Fhuang-yh.github.io\u002Fterra\u002F)]\n- **R-WoM**: Retrieval-augmented World Model For Computer-use Agents. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.11892)]\n- **One Life to Learn**: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.12088)] [[Project](https:\u002F\u002Fonelife-worldmodel.github.io\u002F)]\n- **Deep SPI**: Safe Policy Improvement via World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.12312)]\n- **DREAMer-VXS**: A Latent World Model for Sample-Efficient AGV Exploration in Stochastic, Unobserved Environments. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.00005)]\n- Ego-Vision World Model for Humanoid Contact Planning. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.11682)] [[Project](https:\u002F\u002Fego-vcp.github.io\u002F)]\n- **Unified World Models**: Memory-Augmented Planning and Foresight for Visual Navigation. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.08713)]\n- What You Don't Know Can Hurt You: How Well do Latent Safety Filters Understand Partially Observable Safety Constraints? **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.06492)]\n- Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.07092)]\n- **WristWorld**: Generating Wrist-Views via 4D World Models for Robotic Manipulation. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.07313)]\n- **Ctrl-World**: A Controllable Generative World Model for Robot Manipulation. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.10125)]\n- Active Confusion Expression in Large Language Models: Leveraging World Models toward Better Social Reasoning. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.07974)]\n- **VideoVerse**: How Far is Your T2V Generator from a World Model? **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.08398)]\n- Internal World Models as Imagination Networks in Cognitive Agents. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.04391)]\n- Code World Models for General Game Playing. 
**`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.04542)]\n- Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.04020)]\n- **MorphoSim**: An Interactive, Controllable, and Editable Language-guided 4D World Simulator. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.04390)] [[Code](https:\u002F\u002Fgithub.com\u002Feric-ai-lab\u002FMorph4D)]\n- Bridging the Gap Between Multimodal Foundation Models and World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.03727)]\n- **Memory Forcing**: Spatio-Temporal Memory for Consistent Scene Generation on Minecraft. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.03198)] [[Project](https:\u002F\u002Fjunchao-cs.github.io\u002FMemoryForcing-demo\u002F)]\n- A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.02538)]\n- **CWM**: An Open-Weights LLM for Research on Code Generation with World Models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.02387)]\n- **FantasyWorld**: Geometry-Consistent World Modeling via Unified Video and 3D Prediction. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.21657)]\n- **LongScape**: Advancing Long-Horizon Embodied World Models with Context-Aware MoE. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.21790)] [[Code](https:\u002F\u002Fgithub.com\u002Ftsinghua-fib-lab\u002FLongscape)]\n- **LongLive**: Real-time Interactive Long Video Generation. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.22622)] [[Code](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FLongLive)]\n- **MoWM**: Mixture-of-World-Models for Embodied Planning via Latent-to-Pixel Feature Modulation. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.21797)]\n- **Context and Diversity Matter**: The Emergence of In-Context Learning in World Models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.22353)]\n- **WoW**: Towards a World omniscient World model Through Embodied Interaction. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.22642)]\n- **KeyWorld**: Key Frame Reasoning Enables Effective and Efficient World Models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.21027)] [[Code](https:\u002F\u002Fanonymous.4open.science\u002Fr\u002FKeyworld-E43D)]\n- [**Veo 3**] Video models are zero-shot learners and reasoners. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.20328)] [[Project](https:\u002F\u002Fvideo-zero-shot.github.io\u002F)]\n- **World4RL**: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.19080)] [[Project](https:\u002F\u002Fworld4rl.github.io\u002F)]\n- Remote Sensing-Oriented World Model. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.17808)]\n- **SAMPO**: Scale-wise Autoregression with Motion PrOmpt for generative world models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.15536)]\n- [**PIWM**] Enhancing Physical Consistency in Lightweight World Models. 
**`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.12437)] [[Project](https:\u002F\u002Fphysics-wm.github.io\u002F)]\n- **LLM-JEPA**: Large Language Models Meet Joint Embedding Predictive Architectures. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.14252v1)] [[Code](https:\u002F\u002Fgithub.com\u002Frbalestr-lab\u002Fllm-jepa)]\n- **PhysicalAgent**: Towards General Cognitive Robotics with Foundation World Models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.13903)]\n- **OmniWorld**: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.12201)] [[Project](https:\u002F\u002Fyangzhou24.github.io\u002FOmniWorld\u002F)]\n- **UnifoLM-WMA-0**: A World-Model-Action (WMA) Framework under UnifoLM Family. **`Unitree`** [[Code](https:\u002F\u002Fgithub.com\u002Funitreerobotics\u002Funifolm-world-model-action)]\n- **One Model for All Tasks**: Leveraging Efficient World Models in Multi-Task Planning. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.07945)]\n- Language-Driven Hierarchical Task Structures as Explicit World Models for Multi-Agent Learning. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.04731)]\n- **LatticeWorld**: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.05263)] [[Demo](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8VWZXpERR18&feature=youtu.be)]\n- Design and Optimization of Reinforcement Learning-Based Agents in Text-Based Games. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03479)]\n- **CausalARC**: Abstract Reasoning with Causal World Models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03636)]\n- Planning with Reasoning using Vision Language World Model. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.02722)]\n- Learning an Adversarial World Model for Automated Curriculum Generation in MARL. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03771)]\n- World Model Implanting for Test-time Adaptation of Embodied Agents. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03956)]\n- Social World Models. **`arXiv 25.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.00559)]\n- [**PEWM**] Learning Primitive Embodied World Models: Towards Scalable Robotic Learning. **`arXiv 25.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.20840)]\n- [**DALI**] Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization. **`arXiv 25.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.20294)]\n- **HERO**: Hierarchical Extrapolation and Refresh for Efficient World Models. **`arXiv 25.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.17588)]\n- **Matrix-Game 2.0**: An Open-Source, Real-Time, and Streaming Interactive World Model. **`arXiv 25.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.13009)] [[Code](https:\u002F\u002Fgithub.com\u002FSkyworkAI\u002FMatrix-Game\u002Ftree\u002Fmain\u002FMatrix-Game-2)]\n- Visuomotor Grasping with World Models for Surgical Robots. **`arXiv 25.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.11200)]\n- **Genie 3**: A new frontier for world models. 
**`Google DeepMind`** [[Blog](https:\u002F\u002Fdeepmind.google\u002Fdiscover\u002Fblog\u002Fgenie-3-a-new-frontier-for-world-models\u002F)]\n- **SimuRA**: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.23773)]\n- **CoEx** -- Co-evolving World-model and Exploration. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.22281)]\n- What Does it Mean for a Neural Network to Learn a \"World Model\"? **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.21513)]\n- **Back to the Features**: DINO as a Foundation for Video World Models. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.19468)]\n- **HunyuanWorld 1.0**: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels. **`25.7`** [[Paper](https:\u002F\u002F3d-models.hunyuan.tencent.com\u002Fworld\u002FHY_World_1_technical_report.pdf)] [[Code](https:\u002F\u002Fgithub.com\u002FTencent-Hunyuan\u002FHunyuanWorld-1.0)]\n- **Yume**: An Interactive World Generation Model. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.17744)] [[Code](https:\u002F\u002Fgithub.com\u002Fstdstu12\u002FYUME)]\n- LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.15521)]\n- **MindJourney**: Test-Time Scaling with World Models for Spatial Reasoning. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.12508)] [[Project](https:\u002F\u002Fumass-embodied-agi.github.io\u002FMindJourney\u002F)]\n- Latent Policy Steering with Embodiment-Agnostic Pretrained World Models. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.13340)]\n- **MobiWorld**: World Models for Mobile Wireless Network. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.09462)]\n- [**GWM**] Graph World Model. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.10539)] [[Code](https:\u002F\u002Fgithub.com\u002Fulab-uiuc\u002FGWM)]\n- **From Curiosity to Competence**: How World Models Interact with the Dynamics of Exploration. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.08210)]\n- **Martian World Models**: Controllable Video Synthesis with Physically Accurate 3D Reconstructions. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.07978)] [[Project](https:\u002F\u002Fmarsgenai.github.io\u002F)]\n- **Sekai**: A Video Dataset towards World Exploration. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.15675)] [[Project](https:\u002F\u002Flixsp11.github.io\u002Fsekai-project\u002F)]\n- **Dyn-O**: Building Structured World Models with Object-Centric Representations. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.03298)]\n- Critiques of World Models. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.05169)]\n- [**PEVA**] Whole-Body Conditioned Egocentric Video Prediction. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.21552)] [[Project](https:\u002F\u002Fdannytran123.github.io\u002FPEVA\u002F)]\n- **World4Omni**: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation. 
**`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23919)]\n- **ParticleFormer**: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23126)] [[Project](https:\u002F\u002Fparticleformer.github.io\u002F)]\n- **RoboScape**: Physics-informed Embodied World Model. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23135)] [[Code](https:\u002F\u002Fgithub.com\u002Ftsinghua-fib-lab\u002FRoboScape)]\n- **Embodied AI Agents**: Modeling the World. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.22355)]\n- A \"Good\" Regulator May Provide a World Model for Intelligent Systems. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23032)]\n- **WorldVLA**: Towards Autoregressive Action World Model. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.21539)] [[Code](https:\u002F\u002Fgithub.com\u002Falibaba-damo-academy\u002FWorldVLA)]\n- **MinD**: Unified Visual Imagination and Control via Hierarchical World Models. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.18897)]\n- Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.18537)]\n- [**UNIVERSE**] Adapting Vision-Language Models for Evaluating World Models. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.17967)]\n- **TransDreamerV3**: Implanting Transformer In DreamerV3. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.17103)]\n- Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.16565)]\n- Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.16584)]\n- **GAF**: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.14135)] [[Project](https:\u002F\u002Fchaiying1.github.io\u002FGAF.github.io\u002Fproject_page\u002F)]\n- [**UniVLA**] Unified Vision-Language-Action Model. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.19850)]\n- **Xray2Xray**: World Model from Chest X-rays with Volumetric Context. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.19055)]\n- **PlayerOne**: Egocentric World Simulator. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.09995)] [[Project](https:\u002F\u002Fplayerone-hku.github.io\u002F)]\n- **V-JEPA 2**: Self-Supervised Video Models Enable Understanding, Prediction and Planning. **`arXiv 25.6`** **`Yann LeCun`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.09985)] [[Project](https:\u002F\u002Fai.meta.com\u002Fvjepa\u002F)]\n- [**TAWM**] Time-Aware World Model for Adaptive Prediction and Control. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.08441)] [[Code](https:\u002F\u002Fgithub.com\u002Fanh-nn01\u002FTime-Aware-World-Model)]\n- [**XPM-WM**] Efficient Generation of Diverse Cooperative Agents with World Models. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.07450)]\n- Video World Models with Long-term Spatial Memory. 
**`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.05284)] [[Project](https:\u002F\u002Fspmem.github.io\u002F)]\n- **DSG-World**: Learning a 3D Gaussian World Model from Dual State Videos. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.05217)]\n- Safe Planning and Policy Optimization via World Model Learning. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04828)]\n- **3DFlowAction**: Learning Cross-Embodiment Manipulation from 3D Flow World Model. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.06199)] [[Code](https:\u002F\u002Fgithub.com\u002FHoyyyaard\u002F3DFlowAction\u002F)]\n- Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.06006)]\n- **ORV**: 4D Occupancy-centric Robot Video Generation. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.03079)] [[Project](https:\u002F\u002Forangesodahub.github.io\u002FORV\u002F)]\n- **DeepVerse**: 4D Autoregressive Video Generation as a World Model. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01103)] [[Project](https:\u002F\u002Fsotamak1r.github.io\u002Fdeepverse\u002F)]\n- Sparse Imagination for Efficient Visual World Model Planning. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01392)]\n- Learning Abstract World Models with a Group-Structured Latent Space. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01529)]\n- **Voyager**: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04225)] [[Project](https:\u002F\u002Fvoyager-world.github.io\u002F)]\n- **WoMAP**: World Models For Embodied Open-Vocabulary Object Localization. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01600)]\n- [**LoopNav**] Toward Memory-Aided World Models: Benchmarking via Spatial Consistency. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22976)] [[Code](https:\u002F\u002Fgithub.com\u002FKevin-lkw\u002FLoopNav)] [[Data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FkevinLian\u002FLoopNav)]\n- Long-Context State-Space Video World Models. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.20171)]\n- **Dyna-Think**: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.00320)]\n- [**WPE**] Evaluating Robot Policies in a World Model. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.00613)] [[Demo](https:\u002F\u002Fworld-model-eval.github.io\u002F)]\n- **StateSpaceDiffuser**: Bringing Long Context to Diffusion World Models. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22246)]\n- [**VRAG**] Learning World Models for Interactive Video Generation. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.21996)]\n- **JEDI**: Latent End-to-end Diffusion Mitigates Agent-Human Performance Asymmetry in Model-Based Reinforcement Learning. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.19698)]\n- [**FPWC**] Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach. 
**`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.16422)]\n- [**ForeDiff**] Consistent World Models via Foresight Diffusion. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.16474)]\n- **FLARE**: Robot Learning with Implicit World Modeling. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15659)] [[Project](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fgear\u002Fflare)]\n- [**RWM**] World Models as Reference Trajectories for Rapid Motor Adaptation. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15589)]\n- **RLVR-World**: Training World Models with Reinforcement Learning. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.13934)] [[Project](https:\u002F\u002Fthuml.github.io\u002FRLVR-World\u002F)]\n- **Vid2World**: Crafting Video Diffusion Models to Interactive World Models. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.14357)] [[Project](https:\u002F\u002Fknightnemo.github.io\u002Fvid2world\u002F)]\n- **Causal Cartographer**: From Mapping to Reasoning Over Counterfactual Worlds. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.14396)]\n- **EWMBench**: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.09694)] [[Data&Code](https:\u002F\u002Fgithub.com\u002FAgibotTech\u002FEWMBench)]\n- **FlowDreamer**: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.10075)] [[Project](https:\u002F\u002Fsharinka0715.github.io\u002FFlowDreamer\u002F)]\n- [**RoboOccWorld**] Occupancy World Model for Robots. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.05512)]\n- **seq-JEPA**: Autoregressive Predictive Learning of Invariant-Equivariant World Models. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.03176)]\n- **TesserAct**: Learning 4D Embodied World Models. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.20995)] [[Project](https:\u002F\u002Ftesseractworld.github.io\u002F)]\n- **ManipDreamer**: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.16464)]\n- [**RWM-O**] Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.16680)]\n- **PIN-WM**: Learning Physics-INformed World Models for Non-Prehensile Manipulation. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.16693)]\n- Adapting a World Model for Trajectory Following in a 3D Game. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.12299)]\n- Embodied World Models Emerge from Navigational Task in Open-Ended Environments. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.11419)]\n- **MineWorld**: a Real-Time and Open-Source Interactive World Model on Minecraft. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.08388)] [[Code](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FMineWorld)]\n- [**UWM**] Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets. 
**`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.02792)] [[Code](https:\u002F\u002Fgithub.com\u002FWEIRDLabUW\u002Funified-world-model)]\n- Synthesizing world models for bilevel planning. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.20124)]\n- **Aether**: Geometric-Aware Unified World Modeling. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.18945)] [[Project](https:\u002F\u002Faether-world.github.io\u002F)]\n- [**MaaG**] Model as a Game: On Numerical and Spatial Consistency for Generative Games. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.21172)]\n- **DyWA**: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation.  **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.16806)] [[Project](https:\u002F\u002Fpku-epic.github.io\u002FDyWA\u002F)]\n- **Cosmos-Transfer1** **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.14492)] [[Code](https:\u002F\u002Fgithub.com\u002Fnvidia-cosmos\u002Fcosmos-transfer1)]\n- Meta-Reinforcement Learning with Discrete World Models for Adaptive Load Balancing. **`ACMSE 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.08872)]\n- [**FAR**] Long-Context Autoregressive Video Modeling with Next-Frame Prediction. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.19325)] [[Project](https:\u002F\u002Ffarlongctx.github.io\u002F)] [[Code](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FFAR)]\n- **LUMOS**: Language-Conditioned Imitation Learning with World Models. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.10370)] [[Project](http:\u002F\u002Flumos.cs.uni-freiburg.de\u002F)]\n- **World Modeling Makes a Better Planner**: Dual Preference Optimization for Embodied Task Planning. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.10480)]\n- [**WLA**] Inter-environmental world modeling for continuous and compositional dynamics. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.09911)]\n- **Disentangled World Models**: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.08751)]\n- **WMNav**: Integrating Vision-Language Models into World Models for Object Goal Navigation. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.02247)] [[Code](https:\u002F\u002Fgithub.com\u002FB0B8K1ng\u002FWMNavigation)]\n- Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.08122)]\n- **WorldModelBench**: Judging Video Generation Models As World Models. **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.20694)] [[Project](https:\u002F\u002Fworldmodelbench-team.github.io\u002F)]\n- **Multimodal Dreaming**: A Global Workspace Approach to World Model-Based Reinforcement Learning. **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.21142)]\n- Learning To Explore With Predictive World Model Via Self-Supervised Learning. **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.13200)]\n- **Text2World**: Benchmarking Large Language Models for Symbolic World Model Generation.  
**`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.13092)] [[Project](https:\u002F\u002Ftext-to-world.github.io\u002F)]\n- **M^3**: A Modular World Model over Streams of Tokens. **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.11537)] [[Code](https:\u002F\u002Fgithub.com\u002Fleor-c\u002FM3)]\n- When do Neural Networks Learn World Models? **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09297)]\n- [**DWS**] Pre-Trained Video Generative Models as World Simulators. **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.07825)]\n- **DMWM**: Dual-Mind World Model with Long-Term Imagination. **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.07591)]\n- **EvoAgent**: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks. **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.05907)]\n- Generating Symbolic World Models via Test-time Scaling of Large Language Models. **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.04728)]\n- [**HMA**] Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression. **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.04296)] [[Code](https:\u002F\u002Fgithub.com\u002Fliruiw\u002FHMA)] [[Project](https:\u002F\u002Fliruiw.github.io\u002Fhma\u002F)]\n- **UP-VLA**: A Unified Understanding and Prediction Model for Embodied Agent. **`arXiv 25.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.18867)]\n- **GLAM**: Global-Local Variation Awareness in Mamba-based World Model. **`arXiv 25.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.11949)] [[Code](https:\u002F\u002Fgithub.com\u002FGLAM25\u002Fglam)]\n- **Robotic World Model**: A Neural Network Simulator for Robust Policy Optimization in Robotics. **`arXiv 25.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.10100)]\n- **GAWM**: Global-Aware World Model for Multi-Agent Reinforcement Learning. **`arXiv 25.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.10116)]\n- **RoboHorizon**: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation. **`arXiv 25.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.06605)]\n- **EnerVerse**: Envisioning Embodied Future Space for Robotics Manipulation. **`AgiBot`** **`arXiv 25.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.01895)] [[Website](https:\u002F\u002Fsites.google.com\u002Fview\u002Fenerverse)]\n- **Cosmos** World Foundation Model Platform for Physical AI. **`NVIDIA`** **`arXiv 25.1`** [[Paper](https:\u002F\u002Fd1qx31qr3h6wln.cloudfront.net\u002Fpublications\u002FNVIDIA%20Cosmos_4.pdf)] [[Code](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FCosmos)]\n### 2024\n- [**SMAC**] Grounded Answers for Multi-agent Decision-making Problem through Generative World Model. **`NeurIPS 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.02664)]\n- [**CoWorld**] Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning. **`NeurIPS 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.15260)] [[Website](https:\u002F\u002Fqiwang067.github.io\u002Fcoworld)] [[Torch Code](https:\u002F\u002Fgithub.com\u002Fqiwang067\u002FCoWorld)]\n- [**Diamond**] Diffusion for World Modeling: Visual Details Matter in Atari. 
**`NeurIPS 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.12399)] [[Code](https:\u002F\u002Fgithub.com\u002Feloialonso\u002Fdiamond)]\n- **PIVOT-R**: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation. **`NeurIPS 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.10394)]\n- [**MUN**] Learning World Models for Unconstrained Goal Navigation. **`NeurIPS 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.02446)] [[Code](https:\u002F\u002Fgithub.com\u002FRU-Automated-Reasoning-Group\u002FMUN)]\n- **VidMan**: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation. **`NeurIPS 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.09153)]\n- **Adaptive World Models**: Learning Behaviors by Latent Imagination Under Non-Stationarity. **`NeurIPSW 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.01342)]\n- Emergence of Implicit World Models from Mortal Agents. **`NeurIPSW 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.12304)]\n- Causal World Representation in the GPT Model. **`NeurIPSW 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.07446)]\n- **PreLAR**: World Model Pre-training with Learnable Action Representation. **`ECCV 24`** [[Paper](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_24\u002Fpapers_ECCV\u002Fpapers\u002F03363.pdf)] [[Code](https:\u002F\u002Fgithub.com\u002Fzhanglixuan0720\u002FPreLAR)]\n- [**CWM**] Understanding Physical Dynamics with Counterfactual World Modeling. **`ECCV 24`** [[Paper](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_24\u002Fpapers_ECCV\u002Fpapers\u002F03523.pdf)] [[Code](https:\u002F\u002Fneuroailab.github.io\u002Fcwm-physics\u002F)]\n- **ManiGaussian**: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. **`ECCV 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.08321)] [[Code](https:\u002F\u002Fgithub.com\u002FGuanxingLu\u002FManiGaussian)]\n- [**DWL**] Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning. **`RSS 24 (Best Paper Award Finalist)`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.14472)]\n- [**LLM-Sim**] Can Language Models Serve as Text-Based World Simulators? **`ACL 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.06485)] [[Code](https:\u002F\u002Fgithub.com\u002Fcognitiveailab\u002FGPT-simulator)]\n- **RoboDreamer**: Learning Compositional World Models for Robot Imagination. **`ICML 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.12377)] [[Code](https:\u002F\u002Frobovideo.github.io\u002F)]\n- [**Δ-IRIS**] Efficient World Models with Context-Aware Tokenization. **`ICML 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.19320)] [[Code](https:\u002F\u002Fgithub.com\u002Fvmicheli\u002Fdelta-iris)]\n- **AD3**: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors. **`ICML 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.09976)]\n- **Hieros**: Hierarchical Imagination on Structured State Space Sequence World Models. 
**`ICML 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05167)]\n- [**HRSSM**] Learning Latent Dynamic Robust Representations for World Models. **`ICML 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.06263)] [[Code](https:\u002F\u002Fgithub.com\u002Fbit1029public\u002FHRSSM)]\n- **HarmonyDream**: Task Harmonization Inside World Models. **`ICML 24`** [[Paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=x0yIaw2fgk)] [[Code](https:\u002F\u002Fgithub.com\u002Fthuml\u002FHarmonyDream)]\n- [**REM**] Improving Token-Based World Models with Parallel Observation Prediction. **`ICML 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05643)] [[Code](https:\u002F\u002Fgithub.com\u002Fleor-c\u002FREM)]\n- Do Transformer World Models Give Better Policy Gradients? **`ICML 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05290)]\n- **TD-MPC2**: Scalable, Robust World Models for Continuous Control. **`ICLR 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.16828)] [[Torch Code](https:\u002F\u002Fgithub.com\u002Fnicklashansen\u002Ftdmpc2)]\n- **DreamSmooth**: Improving Model-based Reinforcement Learning via Reward Smoothing. **`ICLR 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.01450)]\n- [**R2I**] Mastering Memory Tasks with World Models. **`ICLR 24`** [[Paper](http:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04253)] [[JAX Code](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002FViDAR)]\n- **MAMBA**: an Effective World Model Approach for Meta-Reinforcement Learning. **`ICLR 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.09859)] [[Code](https:\u002F\u002Fgithub.com\u002Fzoharri\u002Fmamba)]\n- Multi-Task Interactive Robot Fleet Learning with Visual World Models. **`CoRL 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.22689)] [[Code](https:\u002F\u002Fut-austin-rpl.github.io\u002Fsirius-fleet\u002F)]\n- **Generative Emergent Communication**: Large Language Model is a Collective World Model. **`arXiv 24.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.00226)]\n- Towards Unraveling and Improving Generalization in World Models. **`arXiv 24.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.00195)]\n- **Towards Physically Interpretable World Models**: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction. **`arXiv 24.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.12870)]\n- **Dream to Manipulate**: Compositional World Models Empowering Robot Imitation Learning with Imagination. **`arXiv 24.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.14957)] [[Project](https:\u002F\u002Fleobarcellona.github.io\u002FDreamToManipulate\u002F)]\n- Transformers Use Causal World Models in Maze-Solving Tasks. **`arXiv 24.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.11867)]\n- **Owl-1**: Omni World Model for Consistent Long Video Generation. **`arXiv 24.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.09600)] [[Code](https:\u002F\u002Fgithub.com\u002Fhuang-yh\u002FOwl)]\n- **StoryWeaver**: A Unified World Model for Knowledge-Enhanced Story Character Customization. **`arXiv 24.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.07375)] [[Code](https:\u002F\u002Fgithub.com\u002FAria-Zhangjl\u002FStoryWeaver)]\n- **SimuDICE**: Offline Policy Optimization Through World Model Updates and DICE Estimation. 
**`BNAIC 24`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.06486)]\n- Bounded Exploration with World Model Uncertainty in Soft Actor-Critic Reinforcement Learning Algorithm. **`arXiv 24.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.06139)]\n- **Genie 2**: A large-scale foundation world model.  **`24.12`** **`Google DeepMind`** [[Blog](https:\u002F\u002Fdeepmind.google\u002Fdiscover\u002Fblog\u002Fgenie-2-a-large-scale-foundation-world-model\u002F)]\n- **The Matrix**: Infinite-Horizon World Generation with Real-Time Moving Control.  **`arXiv 24.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.03568)] [[Project](https:\u002F\u002Fthematrix1999.github.io\u002F)]\n- **Motion Prompting**: Controlling Video Generation with Motion Trajectories.  **`arXiv 24.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.02700)] [[Project](https:\u002F\u002Fmotion-prompting.github.io\u002F)]\n- Generative World Explorer. **`arXiv 24.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.11844)] [[Project](https:\u002F\u002Fgenerative-world-explorer.github.io\u002F)]\n- [**WebDreamer**] Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents. **`arXiv 24.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.06559)] [[Code](https:\u002F\u002Fgithub.com\u002FOSU-NLP-Group\u002FWebDreamer)]\n- **WHALE**: Towards Generalizable and Scalable World Models for Embodied Decision-making. **`arXiv 24.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.05619)]\n- **DINO-WM**: World Models on Pre-trained Visual Features enable Zero-shot Planning. **`arXiv 24.11`** **`Yann LeCun`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.04983)]\n- Scaling Laws for Pre-training Agents and World Models. **`arXiv 24.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.04434)]\n- [**Phyworld**] How Far is Video Generation from World Model: A Physical Law Perspective. **`arXiv 24.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.02385)] [[Project](https:\u002F\u002Fphyworld.github.io\u002F)]\n- **IGOR**: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI. **`arXiv 24.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.00785)] [[Project](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fproject\u002Figor-image-goal-representations\u002F)]\n- **EVA**: An Embodied World Model for Future Video Anticipation. **`arXiv 24.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.15461)] \n- **VisualPredicator**: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning. **`arXiv 24.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.23156)] \n- [**LLMCWM**] Language Agents Meet Causality -- Bridging LLMs and Causal World Models. **`arXiv 24.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.19923)] [[Code](https:\u002F\u002Fgithub.com\u002Fj0hngou\u002FLLMCWM\u002F)]\n- Reward-free World Models for Online Imitation Learning. **`arXiv 24.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.14081)]\n- **Web Agents with World Models**: Learning and Leveraging Environment Dynamics in Web Navigation. **`arXiv 24.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.13232)]\n- [**GLIMO**] Grounding Large Language Models In Embodied Environment With Imperfect World Models. 
**`arXiv 24.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.02742)]\n- **AVID**: Adapting Video Diffusion Models to World Models. **`arXiv 24.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.12822)] [[Code](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fcausica\u002Ftree\u002Fmain\u002Fresearch_experiments\u002Favid)]\n- [**WMP**] World Model-based Perception for Visual Legged Locomotion. **`arXiv 24.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.16784)] [[Project](https:\u002F\u002Fwmp-loco.github.io\u002F)]\n- [**OSWM**] One-shot World Models Using a Transformer Trained on a Synthetic Prior. **`arXiv 24.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.14084)]\n- **R-AIF**: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models. **`arXiv 24.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.14216)]\n- Representing Positional Information in Generative World Models for Object Manipulation. **`arXiv 24.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.12005)]\n- Making Large Language Models into World Models with Precondition and Effect Knowledge. **`arXiv 24.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.12278)]\n- **DexSim2Real$^2$**: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation. **`arXiv 24.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.08750)]\n- Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction. **`arXiv 24.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.11816)]\n- [**MoReFree**] World Models Increase Autonomy in Reinforcement Learning. **`arXiv 24.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.09807)] [[Project](https:\u002F\u002Fsites.google.com\u002Fview\u002Fmorefree)]\n- **UrbanWorld**: An Urban World Model for 3D City Generation. **`arXiv 24.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.11965)]\n- **PWM**: Policy Learning with Large World Models. **`arXiv 24.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.02466)] [[Code](https:\u002F\u002Fwww.imgeorgiev.com\u002Fpwm\u002F)]\n- **Predicting vs. Acting**: A Trade-off Between World Modeling & Agent Modeling. **`arXiv 24.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.02446)]\n- [**GenRL**] Multimodal foundation world models for generalist embodied agents. **`arXiv 24.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.18043)] [[Code](https:\u002F\u002Fgithub.com\u002Fmazpie\u002Fgenrl)]\n- [**DLLM**] World Models with Hints of Large Language Models for Goal Achieving. **`arXiv 24.6`** [[Paper](http:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.07381)]\n- Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model. **`arXiv 24.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.15275)]\n- **CityBench**: Evaluating the Capabilities of Large Language Model as World Model. **`arXiv 24.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.13945)] [[Code](https:\u002F\u002Fgithub.com\u002Ftsinghua-fib-lab\u002FCityBench)]\n- **CoDreamer**: Communication-Based Decentralised World Models. **`arXiv 24.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.13600)]\n- [**EBWM**] Cognitively Inspired Energy-Based World Models. **`arXiv 24.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.08862)]\n- Evaluating the World Model Implicit in a Generative Model. 
**`arXiv 24.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.03689)] [[Code](https:\u002F\u002Fgithub.com\u002Fmazpie\u002Fgenrl)]\n- Transformers and Slot Encoding for Sample Efficient Physical World Modelling. **`arXiv 24.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.20180)] [[Code](https:\u002F\u002Fgithub.com\u002Ftorchipeppo\u002Ftransformers-and-slot-encoding-for-wm)]\n- [**Puppeteer**] Hierarchical World Models as Visual Whole-Body Humanoid Controllers. **`arXiv 24.5`** **`Yann LeCun`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.18418)] [[Code](https:\u002F\u002Fnicklashansen.com\u002Frlpuppeteer)]\n- **BWArea Model**: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation. **`arXiv 24.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.17039)]\n- **Pandora**: Towards General World Model with Natural Language Actions and Video States. [[Paper](https:\u002F\u002Fworld-model.maitrix.org\u002Fassets\u002Fpandora.pdf)] [[Code](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002FPandora)]\n- [**WKM**] Agent Planning with World Knowledge Model. **`arXiv 24.5`**  [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.14205)] [[Code](https:\u002F\u002Fgithub.com\u002Fzjunlp\u002FWKM)]\n- **Newton**™ – a first-of-its-kind foundation model for understanding the physical world. **`Archetype AI`** [[Blog](https:\u002F\u002Fwww.archetypeai.io\u002Fblog\u002Fintroducing-archetype-ai---understand-the-real-world-in-real-time)]\n- **Compete and Compose**: Learning Independent Mechanisms for Modular World Models. **`arXiv 24.4`**  [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.15109)]\n- **MagicTime**: Time-lapse Video Generation Models as Metamorphic Simulators. **`arXiv 24.4`**  [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.05014)] [[Code](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FMagicTime)]\n- **Dreaming of Many Worlds**: Learning Contextual World Models Aids Zero-Shot Generalization. **`arXiv 24.3`**  [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.10967)] [[Code](https:\u002F\u002Fgithub.com\u002Fsai-prasanna\u002Fdreaming_of_many_worlds)]\n- **V-JEPA**: Video Joint Embedding Predictive Architecture. **`Meta AI`** **`Yann LeCun`** [[Blog](https:\u002F\u002Fai.meta.com\u002Fblog\u002Fv-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture\u002F)] [[Paper](https:\u002F\u002Fai.meta.com\u002Fresearch\u002Fpublications\u002Frevisiting-feature-prediction-for-learning-visual-representations-from-video\u002F)] [[Code](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fjepa)]\n- [**IWM**] Learning and Leveraging World Models in Visual Representation Learning. **`Meta AI`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.00504)] \n- **Genie**: Generative Interactive Environments. **`DeepMind`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.15391)] [[Blog](https:\u002F\u002Fsites.google.com\u002Fview\u002Fgenie-24\u002Fhome)]\n- [**Sora**] Video generation models as world simulators. **`OpenAI`** [[Technical report](https:\u002F\u002Fopenai.com\u002Fresearch\u002Fvideo-generation-models-as-world-simulators)]\n- [**LWM**] World Model on Million-Length Video And Language With RingAttention. **`arXiv 24.2`**  [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.08268)] [[Code](https:\u002F\u002Fgithub.com\u002FLargeWorldModel\u002FLWM)]\n- Planning with an Ensemble of World Models. 
**`OpenReview`** [[Paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=cvGdPXaydP)]\n- **WorldDreamer**: Towards General World Models for Video Generation via Predicting Masked Tokens. **`arXiv 24.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.09985)] [[Code](https:\u002F\u002Fgithub.com\u002FJeffWang987\u002FWorldDreamer)]\n\n### 2023\n- [**IRIS**] Transformers are Sample Efficient World Models. **`ICLR 23 Oral`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.00588)] [[Torch Code](https:\u002F\u002Fgithub.com\u002Feloialonso\u002Firis)]\n- **STORM**: Efficient Stochastic Transformer based World Models for Reinforcement Learning. **`NIPS 23`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.09615)] [[Torch Code](https:\u002F\u002Fgithub.com\u002Fweipu-zhang\u002FSTORM)]\n- [**TWM**] Transformer-based World Models Are Happy with 100k Interactions. **`ICLR 23`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.07109)] [[Torch Code](https:\u002F\u002Fgithub.com\u002Fjrobine\u002Ftwm)]\n- **FOCUS**: Object-Centric World Models for Robotics Manipulation. **`arXiv 23.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.02427)] [[Code](https:\u002F\u002Fgithub.com\u002FStefanoFerraro\u002FFOCUS)]\n- [**Dynalang**] Learning to Model the World with Language. **`arXiv 23.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.01399)] [[Code](https:\u002F\u002Fgithub.com\u002Fjlin816\u002Fdynalang)]\n- [**TAD**] Task Aware Dreamer for Task Generalization in Reinforcement Learning. **`arXiv 23.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.05092)]\n### 2022\n- [**TD-MPC**] Temporal Difference Learning for Model Predictive Control. **`ICML 22`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.04955)] [[Code](https:\u002F\u002Fgithub.com\u002Fnicklashansen\u002Ftdmpc)]\n- **DreamerPro**: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations. **`ICML 22`** [[Paper](https:\u002F\u002Fproceedings.mlr.press\u002Fv162\u002Fdeng22a\u002Fdeng22a.pdf)] [[Code](https:\u002F\u002Fgithub.com\u002Ffdeng18\u002Fdreamer-pro)]\n- **DayDreamer**: World Models for Physical Robot Learning. **`CoRL 22`** [[Paper](https:\u002F\u002Fproceedings.mlr.press\u002Fv205\u002Fwu23c\u002Fwu23c.pdf)] [[Code](https:\u002F\u002Fgithub.com\u002Fdanijar\u002Fdaydreamer)]\n- Deep Hierarchical Planning from Pixels. **`NIPS 22`** [[Paper](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002Fa766f56d2da42cae20b5652970ec04ef-Paper-Conference.pdf)] [[Code](https:\u002F\u002Fgithub.com\u002Fdanijar\u002Fdirector)]\n- **Iso-Dream**: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models. **`NIPS 22 Spotlight`** [[Paper](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002F9316769afaaeeaad42a9e3633b14e801-Paper-Conference.pdf)] [[Code](https:\u002F\u002Fgithub.com\u002Fpanmt\u002FIso-Dream)]\n- **DreamingV2**: Reinforcement Learning with Discrete World Models without Reconstruction. **`arXiv 22.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.00494)] \n### 2021\n- [**DreamerV2**] Mastering Atari with Discrete World Models. 
**`ICLR 21`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.02193)] [[TF Code](https:\u002F\u002Fgithub.com\u002Fdanijar\u002Fdreamerv2)] [[Torch Code](https:\u002F\u002Fgithub.com\u002Fjsikyoon\u002Fdreamer-torch)]\n- **Dreaming**: Model-based Reinforcement Learning by Latent Imagination without Reconstruction. **`ICRA 21`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.14535)]\n### 2020\n- [**DreamerV1**] Dream to Control: Learning Behaviors by Latent Imagination. **`ICLR 20`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1912.01603)] [[TF Code](https:\u002F\u002Fgithub.com\u002Fdanijar\u002Fdreamer)] [[Torch Code](https:\u002F\u002Fgithub.com\u002Fjuliusfrost\u002Fdreamer-pytorch)]\n- [**Plan2Explore**] Planning to Explore via Self-Supervised World Models. **`ICML 20`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.05960)] [[TF Code](https:\u002F\u002Fgithub.com\u002Framanans1\u002Fplan2explore)] [[Torch Code](https:\u002F\u002Fgithub.com\u002Fyusukeurakami\u002Fplan2explore-pytorch)]\n\n### 2018\n* World Models. **`NIPS 2018 Oral`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.10122)]\n","# 用于自动驾驶的优秀世界模型\n\n[![Awesome](https:\u002F\u002Fcdn.rawgit.com\u002Fsindresorhus\u002Fawesome\u002Fd7305f38d29fed78fa85652e3a63e154dd8e8829\u002Fmedia\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fsindresorhus\u002Fawesome) [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2502.10498-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.10498)\n\n本仓库用于记录、跟踪和基准测试近年来多种世界模型（适用于自动驾驶或机器人）方法，作为我们[**综述**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.10498)的补充。\n\n如果您发现有遗漏的论文，请随时[*创建拉取请求*](https:\u002F\u002Fgithub.com\u002FLMD0311\u002FAwesome-World-Model\u002Fblob\u002Fmain\u002FContributionGuidelines.md)或[*提交问题*](https:\u002F\u002Fgithub.com\u002FLMD0311\u002FAwesome-World-Model\u002Fissues\u002Fnew)。欢迎任何形式的贡献，以使此列表更加全面。📣📣📣\n\n如果您觉得本仓库有用，请考虑为我们点个赞🌟并进行[*引用*](https:\u002F\u002Fgithub.com\u002FLMD0311\u002FAwesome-World-Model#citation)。\n\n## 📚 引用\n如果您在研究中使用了本仓库，请不吝点赞⭐并引用如下：\n```bibtex\n@article{tu2025drivingworldmodel,\n  title={世界模型在塑造自动驾驶中的作用：综合综述}, \n  author={Tu, Sifan and Zhou, Xin and Liang, Dingkang and Jiang, Xingyu and Zhang, Yumeng and Li, Xiaofan and Bai, Xiang},\n  journal={arXiv预印本 arXiv:2502.10498},\n  year={2025}\n}\n\n@inproceedings{zhou2025hermes,\n  title={HERMES：一种用于同时进行3D场景理解和生成的统一自动驾驶世界模型},\n  author={Zhou, Xin and Liang, Dingkang and Tu, Sifan and Chen, Xiwu and Ding, Yikang and Zhang, Dingyuan and Tan, Feiyang and Zhao, Hengshuang and Bai, Xiang},\n  booktitle={IEEE\u002FCVF国际计算机视觉会议论文集},\n  year={2025}\n}\n\n@inproceedings{liang2025UniFuture,\n  title={UniFuture：一种用于未来生成与感知的4D驾驶世界模型},\n  author={Liang, Dingkang and Zhang, Dingyuan and Zhou, Xin and Tu, Sifan and Feng, Tianrui and Li, Xiaofan and Zhang, Yumeng and Du, Mingyang and Tan, Xiao and Bai, Xiang},\n  booktitle={IEEE国际机器人与自动化会议论文集},\n  year={2026}\n}\n\n@article{chen2026out,\n  title={眼不见心仍念：动态视频世界模型的混合记忆},\n  author={Chen, Kaijin and Liang, Dingkang and Zhou, Xin and Ding, Yikang and Liu, Xiaoqiang and Wan, Pengfei and Bai, Xiang},\n  journal={arXiv预印本 arXiv:2603.25716},\n  year={2026}\n}\n```\n\n## 研讨会与挑战赛\n\n- [`CVPR 25研讨会与挑战赛 | OpenDriveLab`](https:\u002F\u002Fopendrivelab.com\u002Fchallenge25\u002F#1x-wm) 赛道：世界模型。\n> 世界模型是一种能够模拟智能体行为对环境影响的计算机程序。它有望解决通用仿真与评估问题，从而在各种场景下实现安全、可靠且智能的机器人应用。\n- [`World Model Bench @ CVPR'25`](https:\u002F\u002Fworldmodelbench.github.io\u002F) WorldModelBench：首届世界模型基准测试研讨会\n> 
世界模型是指对我们周围物理现象的预测性模型。这类模型是物理AI智能体的基础，可赋予其决策、规划及反事实分析等关键能力。有效的世界模型需整合感知、指令执行、可控性、物理合理性以及未来预测等多个核心要素。\n- [`CVPR 24研讨会与挑战赛 | OpenDriveLab`](https:\u002F\u002Fopendrivelab.com\u002Fchallenge24\u002F#predictive_world_model) 第4赛道：预测型世界模型。\n- [`CVPR 23自动驾驶研讨会`](https:\u002F\u002Fcvpr23.wad.vision\u002F) 挑战赛3：ARGOVERSE挑战，基于[Argoverse 2传感器数据集](https:\u002F\u002Fwww.argoverse.org\u002Fav2.html#sensor-link)的[3D占用预测](https:\u002F\u002Feval.ai\u002Fweb\u002Fchallenges\u002Fchallenge-page\u002F1977\u002Foverview)，预测未来3秒内世界的时空占用情况。\n\n## 论文\n\n### 世界模型原始论文\n\n- 使用占用栅格进行移动机器人感知与导航 [[论文](http:\u002F\u002Fwww.sci.brooklyn.cuny.edu\u002F~parsons\u002Fcourses\u002F3415-fall-2011\u002Fpapers\u002Felfes.pdf)]\n\n### 技术博客或视频\n\n- **`Yann LeCun`**：迈向自主机器智能之路 [[论文](https:\u002F\u002Fopenreview.net\u002Fpdf?id=BZ5a1r-kVsf)] [[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=OKkEdTchsiE)]\n- **`ICCV'25研讨会`** 主题演讲——特斯拉Ashok Elluswamy [[视频](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1oasHzTEe3\u002F?vd_source=9ef518a6c349809d9fa8ab9427bd8b2c)]\n- **`CVPR'23研讨会`** 主题演讲——特斯拉Ashok Elluswamy [[视频](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=6x-Xb_uT7ts)]\n- **`Wayve`** 推出GAIA-1：面向自主驾驶的尖端生成式AI模型 [[博客](https:\u002F\u002Fwayve.ai\u002Fthinking\u002Fintroducing-gaia1\u002F)] \n  > 世界模型是预测未来可能发生事件的基础，这对自动驾驶至关重要。它们可以充当学习型模拟器，或为基于模型的强化学习（RL）与规划提供“假设性”思维实验。通过将世界模型融入我们的驾驶模型，我们可以使其更好地理解人类决策，并最终推广到更多实际场景中。\n\n### 调查研究\n- 世界模型在塑造自动驾驶中的作用：综合调查。**`arXiv 25.02`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.10498)]\n- 将网络空间与物理世界对齐：具身智能的综合调查。**`TMECH 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.06886)] [[代码](https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List)]\n- 面向自动驾驶的未来物理世界生成综述。**`MMAsia 25`** [[论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Ffull\u002F10.1145\u002F3769748.3773345)]\n- 面向自动驾驶的多模态大型语言模型综述。**`WACVW 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.12320)] [[代码](https:\u002F\u002Fgithub.com\u002FIrohXu\u002FAwesome-Multimodal-LLM-Autonomous-Driving)]\n- 世界模型：安全视角。**`ISSREW`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.07690)]\n- 自动驾驶中渐进式的鲁棒感知型世界模型：回顾与展望。**`techrXiv 25.11`** [[论文](https:\u002F\u002Fdoi.org\u002F10.36227\u002Ftechrxiv.176523308.84756413\u002Fv1)] [[项目](https:\u002F\u002Fgithub.com\u002FMoyangSensei\u002FAwesomeRobustDWM)]\n- 统一的多模态理解与生成综述：进展与挑战。**`techrXiv 25.11`** [[论文](https:\u002F\u002Fwww.techrxiv.org\u002Fdoi\u002Ffull\u002F10.36227\u002Ftechrxiv.176289261.16802577)]\n- 利用人工智能模拟视觉世界：路线图。**`arXiv 25.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.08585)] [[项目](https:\u002F\u002Fworld-model-roadmap.github.io\u002F)]\n- 通往世界模型之路：机器人操作综述。**`arXiv 25.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.02097)]\n- 面向具身智能的世界模型综合调查。**`arXiv 25.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16732)] [[项目](https:\u002F\u002Fgithub.com\u002FLi-Zn-H\u002FAwesomeWorldModels)]\n- 具身智能代理中世界模型的安全挑战：综述。**`arXiv 25.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.05865)]\n- 基于声学物理信息的世界模型综述。**`arXiv 25.09`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.13833)]\n- 3D与4D世界建模：综述。**`arXiv 25.09`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.07996)] [[代码](https:\u002F\u002Fgithub.com\u002Fworldbench\u002Fsurvey)]\n- 具身世界模型综述。**`25.09`** [[论文](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F395713824_A_Survey_of_Embodied_World_Models)]\n- 跨越鸿沟的一次飞跃：从透视到全景视觉的综述。**`arXiv 25.09`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.04444)] 
[[页面](https:\u002F\u002Finsta360-research-team.github.io\u002FSurvey-of-Panorama\u002F)]\n- 通过世界模型和代理式AI实现边缘通用智能：基础、解决方案与挑战。**`arXiv 25.08`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.09561)]\n- 综述：从物理模拟器和世界模型中学习具身智能。**`arXiv 25.07`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.00917)]\n- 从2D到3D认知：通用世界模型简要综述。**`arXiv 25.06`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.20134)]\n- 面向认知代理的世界模型：变革未来网络中的边缘智能。**`arXiv 25.05`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.00417)]\n- 探索视频生成中物理认知的演化：综述。**`arXiv 25.03`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.21765)] [[代码](https:\u002F\u002Fgithub.com\u002Fminnie-lin\u002FAwesome-Physics-Cognition-based-Video-Generation)]\n- 面向自动驾驶的世界模型综述。**`arXiv 25.01`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.11260)]\n- 视觉中的生成式物理AI：综述。**`arXiv 25.01`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.10928)] [[代码](https:\u002F\u002Fgithub.com\u002FBestJunYu\u002FAwesome-Physics-aware-Generation)]\n- 理解世界还是预测未来？世界模型综合调查。**`arXiv 24.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.14499)]\n- 探讨自动驾驶中视频生成与世界模型之间的相互作用：综述。**`arXiv 24.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.02914)]\n- Sora是世界模拟器吗？通用世界模型及更广泛领域的综合调查。**`arXiv 24.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.03520)] [[代码](https:\u002F\u002Fgithub.com\u002FGigaAI-research\u002FGeneral-World-Models-Survey)]\n- 面向自动驾驶的世界模型：初步调查。**`arXiv 24.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.02622)]\n\n### 2026年\n- [**UniFuture**] UniFuture：面向下一代生成与感知的4D驾驶世界模型。**`ICRA 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.13587)] [[代码](https:\u002F\u002Fgithub.com\u002Fdk-liang\u002FUniFuture)] [[项目](https:\u002F\u002Fdk-liang.github.io\u002FUniFuture\u002F)]\n- **RAYNOVA**：光线空间中的尺度-时间自回归世界建模。**`CVPR 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.20685)] [[项目](https:\u002F\u002Fraynova-ai.github.io\u002F)]\n- **WAM-Flow**：基于离散流匹配的并行粗细结合运动规划，用于自动驾驶。**`CVPR 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.06112)] [[代码](https:\u002F\u002Fgithub.com\u002Ffudan-generative-vision\u002FWAM-Flow)]\n- **ResWorld**：用于端到端自动驾驶的时间残差世界模型。**`ICLR 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.10884)] [[代码](https:\u002F\u002Fgithub.com\u002Fmengtan00\u002FResWorld.git)]\n- **WorldRFT**：结合强化学习微调的潜在世界模型规划，用于自动驾驶。**`AAAI 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.19133)]\n- **X-World**：可控制的以自我为中心多摄像头世界模型，用于可扩展的端到端驾驶。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.19979)]\n- **Vega**：通过自然语言指令学习驾驶。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.25741)] [[代码](https:\u002F\u002Fgithub.com\u002Fzuosc19\u002FVega)]\n- **DCARL**：一种用于自回归长轨迹视频生成的分治框架。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.24835)] [[项目](https:\u002F\u002Fjunyiouy.github.io\u002Fprojects\u002Fdcarl)]\n- **DreamerAD**：通过潜在世界模型实现自动驾驶的高效强化学习。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.24587)]\n- **Latent-WAM**：用于端到端自动驾驶的潜在世界动作建模。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.24581)]\n- 面向挑战性轨迹下的物理一致性驾驶视频世界模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.24506)] [[项目](https:\u002F\u002Fwm-research.github.io\u002FPhyGenesis\u002F)]\n- **FAR-Drive**：闭环自动驾驶中的帧级自回归视频生成。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.14938)]\n- **WorldVLM**：结合世界模型预测与视觉-语言推理。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.14497)]\n- 
[**WorldDrive**] 桥接场景生成与规划：通过统一视觉与运动表征的世界模型进行驾驶。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.14948)] [[代码](https:\u002F\u002Fgithub.com\u002FTabGuigui\u002FWorldDrive)]\n- **DynVLA**：用于自动驾驶中行动推理的世界动力学学习。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.11041)]\n- 自动驾驶用潜在世界模型：统一分类、评估框架及开放挑战。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.09086)]\n- **SAMoE-VLA**：一种面向自动驾驶的场景自适应专家混合视觉-语言-行动模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.08113)]\n- 考虑运动学的潜在世界模型，用于数据高效的自动驾驶。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.07264)]\n- **ShareVerse**：用于共享世界建模的多智能体一致性视频生成。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.02697)]\n- 风险感知的世界模型预测控制，用于可泛化的端到端自动驾驶。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.23259)]\n- **UniDrive-WM**：用于自动驾驶的统一理解、规划与生成世界模型。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.04453)] [[项目](https:\u002F\u002Funidrive-wm.github.io\u002FUniDrive-WM)]\n- **MAD**：用于高效驾驶世界模型的运动与外观解耦。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.09452)] [[项目](https:\u002F\u002Fvita-epfl.github.io\u002FMAD-World-Model\u002F)]\n- 从机制视角看作为世界模型的视频生成：状态与动力学。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.17067)]\n- **Drive-JEPA**：视频JEPA结合多模态轨迹蒸馏，用于端到端驾驶。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.22032)]\n- **DrivingGen**：自动驾驶中生成式视频世界模型的综合基准测试。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.01528)] [[项目](https:\u002F\u002Fdrivinggen-bench.github.io\u002F)]\n\n### 2025年\n- **HERMES**: 用于同时进行3D场景理解和生成的统一自动驾驶世界模型。**`ICCV 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.14729)] [[代码](https:\u002F\u002Fgithub.com\u002FLMD0311\u002FHERMES)] [[项目](https:\u002F\u002Flmd0311.github.io\u002FHERMES\u002F)]\n- [**FSDrive**] FutureSightDrive：基于时空思维链的视觉化自动驾驶方法。**`NeurIPS 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.17685)] [[代码](https:\u002F\u002Fgithub.com\u002FMIV-XJTU\u002FFSDrive)]\n- **DINO-Foresight**: 利用DINO模型展望未来。**`NeurIPS 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.11673)] [[代码](https:\u002F\u002Fgithub.com\u002FSta8is\u002FDINO-Foresight)]\n- **从预测到规划**: 用于协同状态-动作预测的策略世界模型。**`NeurIPS 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19654)] [[代码](https:\u002F\u002Fgithub.com\u002F6550Zhao\u002FPolicy-World-Model)]\n- **InfiniCube**: 基于世界引导视频模型的无界且可控的动态3D驾驶场景生成。**`ICCV 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.03934)] [[项目](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Ftoronto-ai\u002Finfinicube\u002F)]\n- **DiST-4D**: 基于度量深度的解耦时空扩散模型，用于4D驾驶场景生成。**`ICCV 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.15208)] [[项目](https:\u002F\u002Froyalmelon0505.github.io\u002FDiST-4D\u002F)]\n- **Epona**: 用于自动驾驶的自回归扩散世界模型。**`ICCV 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.24113)] [[代码](https:\u002F\u002Fgithub.com\u002FKevin-thu\u002FEpona\u002F)]\n- **UniOcc**: 自动驾驶中占用预测与预报的统一基准。**`ICCV 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.24381)] [[代码](https:\u002F\u002Funiocc.github.io\u002F)]\n- **DriVerse**: 通过多模态轨迹提示和运动对齐实现驾驶模拟的导航世界模型。**`ACM MM 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.19614)] [[代码](https:\u002F\u002Fgithub.com\u002Fshalfun\u002FDriVerse)]\n- **OmniGen**: 自动驾驶中的统一多模态传感器生成。**`ACM MM 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.14225)]\n- **World4Drive**: 基于意图感知的物理潜在世界模型实现端到端自动驾驶。**`ICCV 25`** 
[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.00603)]\n- [**PIWM**] 通过预测性个体世界模型实现“梦想成真”的驾驶。**`TIV 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.16733)] [[代码](https:\u002F\u002Fgithub.com\u002Fgaoyinfeng\u002FPIWM)]\n- **DriveDreamer4D**: 世界模型是高效的4D驾驶场景表示数据生成器。**`CVPR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.13571)] [[项目页](https:\u002F\u002Fdrivedreamer4d.github.io\u002F)]\n- **GaussianWorld**: 用于流式3D占用预测的高斯世界模型。**`CVPR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.10373)] [[代码](https:\u002F\u002Fgithub.com\u002Fzuosc19\u002FGaussianWorld)]\n- **ReconDreamer**: 通过在线修复技术构建用于驾驶场景重建的世界模型。**`CVPR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.19548)] [[代码](https:\u002F\u002Fgithub.com\u002FGigaAI-research\u002FReconDreamer)]\n- **FUTURIST**: 通过多模态视觉序列Transformer推进语义未来预测。**`CVPR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.08303)] [[代码](https:\u002F\u002Fgithub.com\u002FSta8is\u002FFUTURIST)]\n- **MaskGWM**: 具有视频掩码重建功能的可泛化驾驶世界模型。**`CVPR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.11663)] [[代码](https:\u002F\u002Fgithub.com\u002FSenseTime-FVG\u002FOpenDWM)]\n- **UniScene**: 统一的以占用为中心的驾驶场景生成。**`CVPR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.05435)] [[项目](https:\u002F\u002Farlo0o.github.io\u002Funiscene\u002F)]\n- **DrivingGPT**: 通过多模态自回归Transformer统一驾驶世界建模与规划。**`CVPR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.18607)] [[项目](https:\u002F\u002Frogerchern.github.io\u002FDrivingGPT\u002F)]\n- **GEM**: 一种可泛化的自我视角多模态世界模型，用于精细控制自我运动、物体动力学和场景构成。**`CVPR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.11198)] [[项目](https:\u002F\u002Fvita-epfl.github.io\u002FGEM.github.io\u002F)]\n- [**UMGen**] 通过下一场景预测生成多模态驾驶场景。**`CVPR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.14945)] [[项目](https:\u002F\u002Fyanhaowu.github.io\u002FUMGen\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002FYanhaoWu\u002FUMGen\u002F)]\n- **DIO**: 可分解的隐式4D占用-流世界模型。**`CVPR 25`** [[论文](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2025\u002Fhtml\u002FDiehl_DIO_Decomposable_Implicit_4D_Occupancy-Flow_World_Model_CVPR_2025_paper.html)]\n- **SceneDiffuser++**: 基于生成式世界模型的城市级交通仿真。**`CVPR 25`** [[论文](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2025\u002Fhtml\u002FTan_SceneDiffuser_City-Scale_Traffic_Simulation_via_a_Generative_World_Model_CVPR_2025_paper.html)]\n- **DynamicCity**: 从动态场景中大规模生成LiDAR点云。**`ICLR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.18084)] [[代码](https:\u002F\u002Fgithub.com\u002F3DTopia\u002FDynamicCity)]\n- **AdaWM**: 基于自适应世界模型的自动驾驶规划。**`ICLR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.13072)]\n- **OccProphet**: 采用观察者-预测者-精炼者框架提升纯相机4D占用预测的效率极限。**`ICLR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.15180)] [[代码](https:\u002F\u002Fgithub.com\u002FJLChen-C\u002FOccProphet)]\n- [**PreWorld**] 半监督视觉中心的3D占用世界模型，用于自动驾驶。**`ICLR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.07309)] [[代码](https:\u002F\u002Fgithub.com\u002Fgetterupper\u002FPreWorld)]\n- [**SSR**] 端到端自动驾驶是否真的需要感知任务？**`ICLR 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.18341)] [[代码](https:\u002F\u002Fgithub.com\u002FPeidongLi\u002FSSR)]\n- **Occ-LLM**: 利用基于占用的大语言模型增强自动驾驶。**`ICRA 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.06419)]\n- **STAGE**: 以流为中心的生成式世界模型，用于长时程驾驶场景仿真。**`IROS 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.13138)] [[项目](https:\u002F\u002F4dvlab.github.io\u002FSTAGE\u002F)]\n- **Drive&Gen**: 
同步评估端到端驾驶与视频生成模型。**`IROS 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.06209)]\n- 学习生成4D LiDAR序列。**`ICCVW 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.11959)]\n- 基于世界模型的端到端场景生成，用于自动驾驶中的事故预警。**`Communications Engineering 25`** [[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs44172-025-00474-7)]\n- 基于LiDAR观测的地面机器人自主导航世界模型。**`JIFS 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.03429)]\n- **GaussianDWM**: 3D Gaussian驾驶世界模型，用于统一场景理解和多模态生成。**`arXiv 25.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.23180)] [[代码](https:\u002F\u002Fgithub.com\u002Fdtc111111\u002FGaussianDWM)]\n- **DriveLaW**: 在潜在驾驶世界中统一规划与视频生成。**`arXiv 25.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.23421)]\n- **InDRiVE**: 基于潜在分歧的自动驾驶免奖励世界模型预训练。**`arXiv 25.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.18850)]\n- 面向端到端驾驶的潜在思维链世界建模。**`arXiv 25.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.10226)]\n- **GenieDrive**: 朝着物理感知型驾驶世界模型迈进，以4D占用指导视频生成。**`arXiv 25.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.12751)] [[项目](https:\u002F\u002Fhuster-yzy.github.io\u002Fgeniedrive_project_page\u002F)]\n- **WorldLens**: 在真实世界中对驾驶世界模型进行全面评估。**`arXiv 25.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.10958)] [[项目](https:\u002F\u002Fworldbench.github.io\u002Fworldlens)]\n- **UniUGP**: 统一理解、生成和规划，实现端到端自动驾驶。**`arXiv 25.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.09864)] [[项目](https:\u002F\u002Fseed-uniugp.github.io\u002F)]\n- **MindDrive**: 一个整合世界模型和视觉-语言模型的全栈框架，用于端到端自动驾驶。**`arXiv 25.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.04441)]\n- **U4D**: 基于LiDAR序列的不确定性感知4D世界建模。**`arXiv 25.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.02982)]\n- **RadarGen**: 由摄像头生成汽车雷达点云。**`arXiv 25.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.17897)] [[项目](https:\u002F\u002Fradargen.github.io\u002F)]\n- **先思考再驾驶**: 受世界模型启发的多模态接地方法，用于自动驾驶车辆。**`arXiv 25.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.03454)]\n- 车辆动力学嵌入式世界模型，用于自动驾驶。**`arXiv 25.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.02417)]\n- **LiSTAR**: 以光线为中心的世界模型，用于自动驾驶中的4D LiDAR序列。**`arXiv 25.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.16049)] [[项目](https:\u002F\u002Focean-luna.github.io\u002FLiSTAR.github.io\u002F)]\n- **OpenTwinMap**: 用于城市自动驾驶的开源数字孪生生成器。**`arXiv 25.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.21925)]\n- **SparseWorld-TC**: 轨迹条件下的稀疏占用世界模型。**`arXiv 25.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.22039)]\n- **LaGen**: 朝着自回归LiDAR场景生成迈进。**`arXiv 25.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.21256)]\n- **AD-R1**: 基于公正世界模型的闭环强化学习，用于实现端到端自动驾驶。**`arXiv 25.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.20325)]\n- **CorrectAD**: 一种自我修正的代理系统，用于改善自动驾驶中的端到端规划。**`arXiv 25.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.13297)]\n- [**UniScenev2**] 扩大规模的以占用为中心的驾驶场景生成：数据集与方法。**`arXiv 25.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.22973)]\n- 基于隐式残差世界模型的视觉中心4D占用预测与规划。**`arXiv 25.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16729)]\n- **SparseWorld**: 一种灵活、适应性强且高效的4D占用世界模型，由稀疏和动态查询驱动。**`arXiv 25.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.17482)] [[代码](https:\u002F\u002Fgithub.com\u002FMSunDYY\u002FSparseWorld)]\n- **OmniNWM**: 全知全能的驾驶导航世界模型。**`arXiv 25.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.18313)] [[项目](https:\u002F\u002Farlo0o.github.io\u002FOmniNWM\u002F)]\n- 
[**ORAD-3D**] 推进越野自动驾驶：大型ORAD-3D数据集及全面基准测试。**`arXiv 25.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16500)] [[代码](https:\u002F\u002Fgithub.com\u002Fchaytonmin\u002FORAD-3D)]\n- [**Dream4Drive**] 将驾驶世界模型重新定义为感知任务的合成数据生成器。**`arXiv 25.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19195)] [[项目](https:\u002F\u002Fwm-research.github.io\u002FDream4Drive\u002F)]\n- **DriveVLA-W0**: 世界模型放大了自动驾驶中的数据规模法则。**`arXiv 25.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.12796)]\n- **CoIRL-AD**: 在潜在世界模型中进行协作-竞争式的模仿-强化学习，用于自动驾驶。**`arXiv 25.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.12560)]\n- **CVD-STORM**: 基于空间-时间重建模型的跨视图视频扩散，用于自动驾驶。**`arXiv 25.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.07944)]\n- [**PhiGensis**] 基于立体强制的4D驾驶场景生成。**`arXiv 25.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.20251)] [[项目](https:\u002F\u002Fjiangxb98.github.io\u002FPhiGensis\u002F)]\n- **TeraSim-World**: 全球范围内的安全关键数据合成，用于端到端自动驾驶。**`arXiv 25.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.13164)]\n- **OccTENS**: 基于时间尺度下一次预测的3D占用世界模型。**`arXiv 25.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03887)]\n- [**G^2Editor**] 现实且可控的3D高斯引导对象编辑，用于驾驶视频生成。**`arXiv 25.8`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.20471)]\n- **LSD-3D**: 大规模3D驾驶场景生成，结合几何接地。**`arXiv 25.8`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.19204)] [[项目](https:\u002F\u002Fprinceton-computational-imaging.github.io\u002FLSD-3D\u002F)]\n- 清晰地看，深刻地忘却：重新审视用于驾驶模拟的微调视频生成器。**`arXiv 25.8`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.16512)]\n- **MoVieDrive**: 多模态多视角城市场景视频生成。**`arXiv 25.8`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.14327)]\n- **ImagiDrive**: 一个统一的想象与规划框架，用于自动驾驶。**`arXiv 25.8`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.11428)] [[代码](https:\u002F\u002Fgithub.com\u002Ffudan-zvg\u002FImagiDrive)]\n- **LiDARCrafter**: 从LiDAR序列中进行动态4D世界建模。**`arXiv 25.8`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.03692)] [[项目](https:\u002F\u002Flidarcrafter.github.io\u002F)]\n- **FASTopoWM**: 基于潜在世界模型的快慢车道段拓扑推理。**`arXiv 25.7`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.23325)]\n- 基于世界模型的端到端场景生成，用于自动驾驶中的事故预警。**`arXiv 25.7`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.12762)]\n- **Orbis**: 克服驾驶世界模型中长时程预测的挑战。**`arXiv 25.7`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.13162)] [[代码](https:\u002F\u002Flmb-freiburg.github.io\u002Forbis.github.io\u002F)]\n- **I2 -World**: 内部-交互标记化，用于高效动态4D场景预测。**`arXiv 25.7`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.09144)] [[代码](https:\u002F\u002Fgithub.com\u002Flzzzzzm\u002FII-World)]\n- **NRSeg**: 基于驾驶世界模型的BEV语义分割噪声鲁棒学习。**`arXiv 25.7`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.04002)] [[代码](https:\u002F\u002Fgithub.com\u002Flynn-yu\u002FNRSeg)]\n- 朝着高效潜在流匹配的底层LiDAR世界模型迈进。**`arXiv 25.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23434)]\n- **ReSim**: 可靠的自动驾驶世界模拟。**`arXiv 25.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.09981)] [[项目](https:\u002F\u002Fopendrivelab.com\u002FReSim)]\n- **Cosmos-Drive-Dreams**: 基于世界基础模型的大规模合成驾驶数据生成。**`arXiv 25.6`** **`NVIDIA`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.09042)] [[项目](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Ftoronto-ai\u002Fcosmos_drive_dreams\u002F)]\n- **Dreamland**: 利用模拟器和生成模型进行可控的世界创造。**`arXiv 25.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.08006)] [[项目](https:\u002F\u002Fmetadriverse.github.io\u002Fdreamland\u002F)]\n- **LongDWM**: 
跨粒度蒸馏，用于构建长期驾驶世界模型。**`arXiv 25.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01546)] [[代码](https:\u002F\u002Fwang-xiaodong1899.github.io\u002Flongdwm\u002F)]\n- **ProphetDWM**: 用于滚动发布未来行动和视频的驾驶世界模型。**`arXiv 25.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.18650)]\n- **GeoDrive**: 基于3D几何信息的驾驶世界模型，具备精确的动作控制能力。**`arXiv 25.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22421)] [[代码](https:\u002F\u002Fgithub.com\u002Fantonioo-c\u002FGeoDrive)]\n- **DriveX**: 全景建模，用于学习可泛化的自动驾驶世界知识。**`arXiv 25.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.19239)]\n- **VL-SAFE**: 基于视觉-语言指导的安全意识强化学习，结合世界模型应用于自动驾驶。**`arXiv 25.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.16377)] [[项目](https:\u002F\u002Fys-qu.github.io\u002Fvlsafe-website\u002F)]\n- **Raw2Drive**: 基于对齐世界模型的强化学习，用于CARLA v2中的端到端自动驾驶。**`arXiv 25.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.16394)]\n- [**RAMBLE**] 从模仿到探索：基于世界模型的端到端自动驾驶。**`arXiv 25.4`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.02253)] [[代码](https:\u002F\u002Fgithub.com\u002FSCP-CN-001\u002Framble)]\n- **DiVE**: 基于视频扩散Transformer的高效多视角驾驶场景生成。**`arXiv 25.4`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.18576)]\n- [**WoTE**] 基于BEV世界模型，在线轨迹评估实现端到端驾驶。**`ICCV 25`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.01941)] [[代码](https:\u002F\u002Fgithub.com\u002FliyingyanUCAS\u002FWoTE)]\n- **MagicDrive-V2**: 高分辨率长视频生成，用于自动驾驶，并具备适应性控制。**`arXiv 25.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.13807)] [[项目](https:\u002F\u002Fgaoruiyuan.com\u002Fmagicdrive-v2\u002F)]\n- **CoGen**: 基于适应性条件的3D一致视频生成，用于自动驾驶。**`arXiv 25.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.22231)]\n- **GAIA-2**: 一种可控的多视角生成式世界模型，用于自动驾驶。**`arXiv 25.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.20523)]\n- **Semi-SD**: 基于周围摄像头的半监督度量深度估计，用于自动驾驶。**`arXiv 25.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.19713)] [[代码](https:\u002F\u002Fgithub.com\u002Fxieyuser\u002FSemi-SD)]\n- **MiLA**: 多视角高保真度长期视频生成世界模型，用于自动驾驶。**`arXiv 25.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.15875)] [[项目](https:\u002F\u002Fxiaomi-mlab.github.io\u002Fmila.github.io\u002F)]\n- **SimWorld**: 基于世界模型的模拟器条件场景生成统一基准。**`arXiv 25.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.13952)] [[代码](https:\u002F\u002Fgithub.com\u002FLi-Zn-H\u002FSimWorld)]\n- [**EOT-WM**] 其他车辆轨迹同样重要：驾驶世界模型将自我与其他车辆轨迹统一在视频潜在空间中。**`arXiv 25.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.09215)]\n- [**T^3Former**] 时间三平面Transformer作为占用世界模型。**`arXiv 25.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.07338)]\n- **AVD2**: 事故视频扩散，用于事故视频描述。**`arXiv 25.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.14801)] [[项目](https:\u002F\u002Fan-answer-tree.github.io\u002F)]\n- **VaViM和VaVAM**: 通过视频生成建模实现自动驾驶。**`arXiv 25.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.15672)] [[代码](https:\u002F\u002Fgithub.com\u002Fvaleoai\u002FVideoActionModel)]\n- **梦想成真**: 基于解析世界模型的车辆控制。**`arXiv 25.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.10012)]\n- **AD-L-JEPA**: 基于联合嵌入预测架构的自监督空间世界模型，用于LiDAR数据驱动的自动驾驶。**`arXiv 25.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.04969)] [[代码](https:\u002F\u002Fgithub.com\u002FHaoranZhuExplorer\u002FAD-L-JEPA-Release)]\n\n### 2024\n- [**SEM2**] 通过语义掩码世界模型提升端到端城市自动驾驶的样本效率与鲁棒性。**`TITS`** [[论文](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10538211\u002F)]\n- **Vista**: 具有高保真度和多样化可控性的可泛化驾驶世界模型。**`NeurIPS 24`** 
[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.17398)] [[代码](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002FVista)]\n- **SceneDiffuser**: 高效且可控的驾驶场景仿真初始化与推演。**`NeurIPS 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.12129)]\n- **DrivingDojo 数据集**: 推动交互式、知识增强型驾驶世界模型的发展。**`NeurIPS 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.10738)] [[项目](https:\u002F\u002Fdrivingdojo.github.io\u002F)]\n- **Think2Drive**: 基于潜在世界模型思考的高效强化学习，用于准现实自动驾驶。**`ECCV 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.16720)]\n- [**MARL-CCE**] 在生成式世界模型下建模自动驾驶中的竞争行为。**`ECCV 24`** [[论文](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_24\u002Fpapers_ECCV\u002Fpapers\u002F05085.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Fqiaoguanren\u002FMARL-CCE)]\n- **DriveDreamer**: 朝着由真实世界驱动的自动驾驶世界模型迈进。**`ECCV 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.09777)] [[代码](https:\u002F\u002Fgithub.com\u002FJeffWang987\u002FDriveDreamer)]\n- **OccWorld**: 学习用于自动驾驶的三维占用世界模型。**`ECCV 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.16038)] [[代码](https:\u002F\u002Fgithub.com\u002Fwzzheng\u002FOccWorld)]\n- [**NeMo**] 用于自动驾驶的神经体积世界模型。**`ECCV 24`** [[论文](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_24\u002Fpapers_ECCV\u002Fpapers\u002F02571.pdf)]\n- **CarFormer**: 基于学习到的对象中心表征的自动驾驶。**`ECCV 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.15843)] [[代码](https:\u002F\u002Fkuis-ai.github.io\u002FCarFormer\u002F)]\n- [**GUMP**] 使用可扩展生成模型解决运动规划任务。**`ECCV 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.02797)] [[代码](https:\u002F\u002Fgithub.com\u002FHorizonRobotics\u002FGUMP\u002F)]\n- **WoVoGen**: 具备世界体积感知的扩散模型，用于可控的多摄像头驾驶场景生成。**`ECCV 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.02934)] [[代码](https:\u002F\u002Fgithub.com\u002Ffudan-zvg\u002FWoVoGen)]\n- **DrivingDiffusion**: 基于潜在扩散模型的布局引导型多视角驾驶场景视频生成。**`ECCV 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.07771)] [[代码](https:\u002F\u002Fgithub.com\u002Fshalfun\u002FDrivingDiffusion)]\n- **3D-VLA**: 一种3D视觉-语言-动作生成式世界模型。**`ICML 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.09631)]\n- [**ViDAR**] 视觉点云预测实现可扩展自动驾驶。**`CVPR 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.17655)] [[代码](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002FViDAR)]\n- [**GenAD**] 自动驾驶的通用预测模型。**`CVPR 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.09630)] [[数据](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002FDriveAGI?tab=readme-ov-file#genad-dataset-opendv-youtube)]\n- **Cam4DOCC**: 自动驾驶应用中仅基于摄像头的4D占用预测基准测试。**`CVPR 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.17663)] [[代码](https:\u002F\u002Fgithub.com\u002Fhaomo-ai\u002FCam4DOcc)]\n- [**Drive-WM**] 驾驶向未来：基于世界模型的多视角视觉预测与规划，用于自动驾驶。**`CVPR 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.17918)] [[代码](https:\u002F\u002Fgithub.com\u002FBraveGroup\u002FDrive-WM)]\n- **DriveWorld**: 通过世界模型进行4D预训练的场景理解，用于自动驾驶。**`CVPR 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.04390)]\n- **Panacea**: 用于自动驾驶的全景式可控视频生成。**`CVPR 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.16813)] [[代码](https:\u002F\u002Fpanacea-ad.github.io\u002F)]\n- **UnO**: 用于感知与预测的无监督占用场。**`CVPR 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.08691)] [[代码](https:\u002F\u002Fwaabi.ai\u002Fresearch\u002Funo)]\n- **MagicDrive**: 具有多样化3D几何控制的街景生成。**`ICLR 24`** 
[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.02601)] [[代码](https:\u002F\u002Fgithub.com\u002Fcure-lab\u002FMagicDrive)]\n- **Copilot4D**: 通过离散扩散学习用于自动驾驶的无监督世界模型。**`ICLR 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.01017)]\n- **SafeDreamer**: 基于世界模型的安全强化学习。**`ICLR 24`** [[论文](https:\u002F\u002Fopenreview.net\u002Fforum?id=tsE5HLYtYg)] [[代码](https:\u002F\u002Fgithub.com\u002FPKU-Alignment\u002FSafeDreamer)]\n- **DrivingWorld**: 通过视频GPT构建自动驾驶世界模型。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.19505)] [[代码](https:\u002F\u002Fgithub.com\u002FYvanYin\u002FDrivingWorld)]\n- 一种通过解耦动态流与图像辅助训练的高效占用世界模型。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.13772)]\n- **Doe-1**: 基于大型世界模型的闭环自动驾驶。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.09627)] [[代码](https:\u002F\u002Fgithub.com\u002Fwzzheng\u002FDoe)]\n- [**DrivePhysica**] 物理信息驱动的驾驶世界模型。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.08410)] [[代码](https:\u002F\u002Fmetadrivescape.github.io\u002Fpapers_project\u002FDrivePhysica\u002Fpage.html)]\n- **Terra ACT-Bench**: 朝着行动可控的世界模型迈进，用于自动驾驶。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.05337)] [[代码](https:\u002F\u002Fgithub.com\u002Fturingmotors\u002FACT-Bench)] [[项目](https:\u002F\u002Fturingmotors.github.io\u002Factbench\u002F)] [[Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fturing-motors\u002FTerra)]\n- **UniMLVG**: 用于自动驾驶的具有全面控制能力的多视角长视频生成统一框架。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.04842)] [[项目](https:\u002F\u002Fsensetime-fvg.github.io\u002FUniMLVG\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002FSenseTime-FVG\u002FOpenDWM)]\n- **HoloDrive**: 用于自动驾驶的整体式2D-3D多模态街景生成。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.01407)]\n- **InfinityDrive**: 打破驾驶世界模型的时间限制。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.01522)] [[项目页](https:\u002F\u002Fmetadrivescape.github.io\u002Fpapers_project\u002FInfinityDrive\u002Fpage.html)]\n- 使用语言模型生成分布外场景。**`arXiv 24.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.16554)]\n- **Imagine-2-Drive**: 在CARLA中为自动驾驶车辆进行高保真度世界建模。**`arXiv 24.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.10171)] [[项目页](https:\u002F\u002Fanantagrg.github.io\u002FImagine-2-Drive.github.io\u002F)]\n- **WorldSimBench**: 朝着以视频生成模型作为世界模拟器的方向发展。**`arXiv 24.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.18072)] [[项目页](https:\u002F\u002Firanqin.github.io\u002FWorldSimBench.github.io\u002F)]\n- **DOME**: 将扩散模型驯服为高保真度的可控占用世界模型。**`arXiv 24.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.10429)] [[项目页](https:\u002F\u002Fgusongen.github.io\u002FDOME)]\n- **OCCVAR**: 通过次规模预测实现可扩展的4D占用预测。**`OpenReview`** [[论文](https:\u002F\u002Fopenreview.net\u002Fforum?id=X2HnTFsFm8)]\n- 利用潜在空间生成式世界模型缓解自动驾驶模仿学习中的协变量偏移。**`arXiv 24.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.16663)]\n- [**LatentDriver**] 在自动驾驶中从潜在世界模型学习多重概率决策。**`arXiv 24.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.15730)] [[代码](https:\u002F\u002Fgithub.com\u002FSephirex-X\u002FLatentDriver)]\n- **RenderWorld**: 具有自监督3D标签的世界模型。**`arXiv 24.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.11356)]\n- **OccLLaMA**: 一种用于自动驾驶的占用-语言-动作生成式世界模型。**`arXiv 24.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.03272)]\n- **DriveGenVLM**: 基于视觉语言模型的自动驾驶的真实世界视频生成。**`arXiv 24.8`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.16647)]\n- [**Drive-OccWorld**] 
在占用世界中行驶：基于世界模型的以视觉为中心的4D占用预测与规划，用于自动驾驶。**`arXiv 24.8`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.14197)]\n- **BEVWorld**: 通过统一的BEV潜在空间构建的用于自动驾驶的多模态世界模型。**`arXiv 24.7`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.05679)] [[代码](https:\u002F\u002Fgithub.com\u002Fzympsyche\u002FBevWorld)]\n- [**TOKEN**] 将世界分词为对象级知识，以应对自动驾驶中的长尾事件。**`arXiv 24.7`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.00959)]\n- **UMAD**: 用于自动驾驶的无监督掩码级异常检测。**`arXiv 24.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.06370)]\n- **SimGen**: 模拟器条件下的驾驶场景生成。**`arXiv 24.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.09386)] [[代码](https:\u002F\u002Fmetadriverse.github.io\u002Fsimgen\u002F)]\n- [**AdaptiveDriver**] 基于自适应世界模型进行自动驾驶规划。**`arXiv 24.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.10714)] [[代码](https:\u002F\u002Farunbalajeev.github.io\u002Fworld_models_planning\u002Fworld_model_paper.html)]\n- [**LAW**] 利用潜在世界模型提升端到端自动驾驶性能。**`arXiv 24.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.08481)] [[代码](https:\u002F\u002Fgithub.com\u002FBraveGroup\u002FLAW)]\n- [**Delphi**] 通过可控的长视频生成释放端到端自动驾驶的泛化能力。**`arXiv 24.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.01349)] [[代码](https:\u002F\u002Fgithub.com\u002Fwestlake-autolab\u002FDelphi)]\n- **OccSora**: 作为自动驾驶世界模拟器的4D占用生成模型。**`arXiv 24.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.20337)] [[代码](https:\u002F\u002Fgithub.com\u002Fwzzheng\u002FOccSora)]\n- **MagicDrive3D**: 用于街景任意视角渲染的可控3D生成。**`arXiv 24.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.14475)] [[代码](https:\u002F\u002Fgaoruiyuan.com\u002Fmagicdrive3d\u002F)]\n- **CarDreamer**: 基于世界模型的自动驾驶开源学习平台。**`arXiv 24.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.09111)] [[代码](https:\u002F\u002Fgithub.com\u002Fucd-dare\u002FCarDreamer)]\n- [**DriveSim**] 探索将多模态大语言模型用作驾驶世界模型。**`arXiv 24.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.05956)] [[代码](https:\u002F\u002Fgithub.com\u002Fsreeramsa\u002FDriveSim)]\n- **LidarDM**: 在生成的世界中进行生成式激光雷达仿真。**`arXiv 24.4`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.02903)] [[代码](https:\u002F\u002Fgithub.com\u002Fvzyrianov\u002Flidardm)]\n- **SubjectDrive**: 通过主体控制在自动驾驶中扩展生成数据。**`arXiv 24.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.19438)] [[项目](https:\u002F\u002Fsubjectdrive.github.io\u002F)]\n- **DriveDreamer-2**: 基于大语言模型增强的世界模型，用于多样化的驾驶视频生成。**`arXiv 24.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.06845)] [[代码](https:\u002F\u002Fdrivedreamer2.github.io\u002F)]\n\n### 2023年\n\n- **TrafficBots**: 面向自动驾驶仿真与运动预测的世界模型。**`ICRA 23`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.04116)] [[代码](https:\u002F\u002Fgithub.com\u002Fzhejz\u002FTrafficBots)]\n- [**CTT**] 分类交通Transformer：基于标记化潜在空间的可解释且多样化的行为预测。**`arXiv 23.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.18307)]\n- **MUVO**: 基于几何表示的多模态生成式世界模型，用于自动驾驶。**`arXiv 23.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.11762)]\n- **GAIA-1**: 用于自动驾驶的生成式世界模型。**`arXiv 23.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.17080)]\n- **ADriver-I**: 一种通用的自动驾驶世界模型。**`arXiv 23.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.13549)]\n- **UniWorld**: 基于世界模型的自动驾驶预训练。**`arXiv 23.8`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.07234)] [[代码](https:\u002F\u002Fgithub.com\u002Fchaytonmin\u002FUniWorld)]\n\n### 2022年\n\n- [**MILE**] 面向城市驾驶的基于模型的模仿学习。**`NeurIPS 22`** 
[[论文](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Fhash\u002F827cb489449ea216e4a257c47e407d18-Abstract-Conference.html)] [[代码](https:\u002F\u002Fgithub.com\u002Fwayveai\u002Fmile)]\n- **Symphony**: 学习用于自动驾驶仿真的真实且多样化智能体。**`ICRA 22`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.03195)]\n- 自动驾驶规划中的层次化基于模型的模仿学习。**`IROS 22`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.09539)]\n\n## 其他世界模型论文\n\n### 2026年\n- 8个Token中的规划：用于潜在世界模型的紧凑离散分词器。**`CVPR 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.05438)]\n- **GeoWorld**：几何世界模型。**`CVPR 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.23058)] [[项目](https:\u002F\u002Fsteve-zeyu-zhang.github.io\u002FGeoWorld)]\n- [**EAWM**] 从观测到事件：面向强化学习的事件感知世界模型。**`ICLR 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.19336)] [[代码](https:\u002F\u002Fgithub.com\u002FMarquisDarwin\u002FEAWM)]\n- **R2-Dreamer**：无需解码器或数据增强的降冗余世界模型。**`ICLR 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.18202)] [[代码](https:\u002F\u002Fgithub.com\u002FNM512\u002Fr2dreamer)]\n- **NeuroHex**：用于构建世界模型以实现自适应AI的高度高效的六边形坐标系。**`NICE 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.00376)]\n- 面向在静态环境之外可靠学习、验证和适应的智能体的基础世界模型。**`AAMAS 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.23997)]\n- 世界模型中的概率性梦境生成。**`ICLRW 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.04715)]\n- 由局部到整体：具有自适应结构层次的3D生成式世界模型。**`ICME 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.21557)]\n- 基于JEPA世界模型的价值引导行动规划。**`世界建模研讨会 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.00844)]\n- 自监督多模态世界模型，带有4D时空嵌入。**`世界建模研讨会 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.07039)] [[项目](https:\u002F\u002Fgithub.com\u002Flegel\u002Fdeepearth)]\n- 用于可靠人机协作的显式世界模型。**`AAAIW 26`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.01705)]\n- [**HyDRA**] 眼不见心不烦：动态视频世界模型的混合记忆。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2603.25716)] [[代码](https:\u002F\u002Fgithub.com\u002FH-EmbodVis\u002FHyDRA)] [[项目](https:\u002F\u002Fkj-chen666.github.io\u002FHybrid-Memory-in-Video-World-Models\u002F)]\n- 持久化机器人世界模型：通过强化学习稳定多步回放缓冲区。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.25685)]\n- **MMaDA-VLA**：具有统一多模态指令与生成能力的大规模扩散视觉-语言-动作模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.25406)]\n- **ABot-PhysWorld**：面向物理对齐的机器人操作任务的交互式世界基础模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.23376)]\n- **Describe-Then-Act**：通过蒸馏的语言-动作世界模型进行主动式智能体引导。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.23149)]\n- 基于可微世界模型的模型预测控制，用于离线强化学习。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.22430)]\n- **WorldCache**：面向加速视频世界模型的内容感知缓存。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.22286)] [[代码](https:\u002F\u002Fumair1221.github.io\u002FWorld-Cache\u002F)]\n- **ThinkJEPA**：利用大型视觉-语言推理模型增强潜在世界模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.22281)]\n- **Omni-WorldBench**：迈向全面的以交互为中心的世界模型评估。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.22212)]\n- 世界动作模型是否比VLA更具泛化能力？一项鲁棒性研究。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.22078)]\n- **InSpatio-WorldFM**：一个开源的实时生成式帧模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.11911)] [[项目](https:\u002F\u002Finspatio.github.io\u002Fworldfm\u002F)]\n- [**VEGA-3D**] 生成模型懂得空间：释放隐式3D先验知识以理解场景。**`arXiv 26.3`** 
[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.19235)] [[代码](https:\u002F\u002Fgithub.com\u002FH-EmbodVis\u002FVEGA-3D)]\n- **AcceRL**：一个面向视觉-语言-动作模型的分布式异步强化学习与世界模型框架。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.18464)]\n- **EVA**：通过逆动力学奖励将视频世界模型与可执行的机器人动作对齐。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.17808)] [[项目](https:\u002F\u002Feva-project-page.github.io\u002F)]\n- **立体世界模型**：相机引导的立体视频生成。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.17375)] [[项目](https:\u002F\u002Fsunyangtian.github.io\u002FStereoWorld-web\u002F)]\n- **GigaWorld-Policy**：一种高效的动作中心世界—动作模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.17240)]\n- **MosaicMem**：用于可控视频世界模型的混合空间记忆。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.17117)] [[项目](https:\u002F\u002Fmosaicmem.github.io\u002Fmosaicmem\u002F)]\n- **DreamPlan**：通过视频世界模型高效地对视觉-语言规划器进行强化学习微调。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.16860)] [[项目](https:\u002F\u002Fpsi-lab.ai\u002FDreamPlan\u002F)]\n- **仿真蒸馏**：在仿真环境中预训练世界模型，以快速适应真实世界。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.15759)] [[项目](https:\u002F\u002Fsim-dist.github.io\u002F)]\n- **ResWM**：用于视觉强化学习的残差动作世界模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.11110)]\n- **World2Act**：通过技能组合型世界模型进行潜在动作的后训练。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.10422)] [[项目](https:\u002F\u002Fwm2act.github.io\u002F)]\n- **RAE-NWM**：密集视觉表征空间中的导航世界模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.09241)] [[代码](https:\u002F\u002Fgithub.com\u002F20robo\u002Fraenwm)]\n- **MWM**：面向动作条件一致预测的移动世界模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.07799)]\n- **DreamSAC**：通过探索对称性学习哈密顿世界模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.07545)]\n- **LiveWorld**：在生成式视频世界模型中模拟不可见的动力学。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.07145)]\n- **WorldCache**：通过异构标记缓存免费加速世界模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.06331)] [[项目](https:\u002F\u002Fgithub.com\u002FFofGofx\u002FWorldCache)]\n- 无需世界模型即可获取世界属性：从静态词嵌入中的共现统计中恢复时空结构。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.04317)]\n- 超越像素历史：具有持久3D状态的世界模型。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.03482)]\n- **DreamWorld**：视频生成中的统一世界建模。**`arXiv 26.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.00466)]\n- **MetaOthello**：Transformer中多种世界模型的对照研究。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.23164)]\n- 一致性三元组作为通用世界模型的定义原则。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.23152)]\n- **UCM**：通过时间感知的位置编码扭曲，将相机控制与记忆统一起来，用于世界模型。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.22960)] [[项目](https:\u002F\u002Fhumanaigc.github.io\u002Fucm-webpage\u002F)]\n- **CWM**：用于具身智能体流水线中动作可行性学习的对比世界模型。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.22452)]\n- **Solaris**：在Minecraft中构建多人视频世界模型。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.22208)] [[项目](https:\u002F\u002Fsolaris-wm.github.io\u002F)]\n- 当世界模型梦错了时：针对世界模型的物理条件对抗攻击。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.18739)]\n- 学习不变的视觉表征，用于结合嵌入的预测性世界模型进行规划。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.18639)]\n- 因子分解的潜在动作世界模型。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.16229)]\n- 
[**DreamZero**] 世界动作模型就是零样本策略。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.15922)] [[项目](https:\u002F\u002Fdreamzero0.github.io\u002F)]\n- **VLM-DEWM**：用于制造业中可验证且稳健的视觉-语言规划的动态外部世界模型。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.15549)]\n- 自监督的基于JEPA的世界模型，用于LiDAR占用率补全与预测。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.12540)]\n- GigaBrain-0.5M：一款基于世界模型强化学习的VLA。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.12099)] [[项目](https:\u002F\u002Fgigabrain05m.github.io\u002F)]\n- **VLAW**：视觉-语言-动作策略与世界模型的迭代协同改进。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.12063)] [[项目](https:\u002F\u002Fsites.google.com\u002Fview\u002Fvlaw-arxiv)]\n- 为层级式操控策略扩展世界模型。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.10983)] [[项目](https:\u002F\u002Fvista-wm.github.io\u002F)]\n- 说、梦、做：学习用于指令驱动型机器人操作的视频世界模型。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.10717)]\n- **Olaf-World**：为视频世界建模定向潜在动作。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.10104)] [[项目](https:\u002F\u002Fshowlab.github.io\u002FOlaf-World\u002F)]\n- **VLA-JEPA**：用潜在世界模型增强视觉-语言-动作模型。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.10098)]\n- **Agent World Model**：面向代理式强化学习的无限合成环境。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.10090)] [[代码](https:\u002F\u002Fgithub.com\u002FSnowflake-Labs\u002Fagent-world-model)]\n- **MVISTA-4D**：视图一致的4D世界模型，可在测试时推断动作，用于机器人操作。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.09878)]\n- **Hand2World**：通过自由空间手势自回归地生成第一人称交互。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.09600)] [[项目](https:\u002F\u002Fhand2world.github.io\u002F)]\n- **WorldArena**：一个用于评估具身世界模型感知能力和功能效用的统一基准。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.08971)]\n- **MIND**：评估世界模型中的内存一致性与动作控制。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.08025)] [[代码](https:\u002F\u002Fgithub.com\u002FCSU-JPG\u002FMIND)]\n- 跨视角世界模型。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.07277)]\n- 在视频世界模型中解释物理规律。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.07050)]\n- **DreamDojo**：来自大规模人类视频的通用机器人世界模型。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.06949)] [[项目](https:\u002F\u002Fdreamdojo-world.github.io\u002F)]\n- **World-VLA-Loop**：视频世界模型与VLA策略的闭环学习。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.06508)] [[项目](https:\u002F\u002Fshowlab.github.io\u002FWorld-VLA-Loop\u002F)]\n- 利用潜在动作进行自我改进的世界建模。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.06130)]\n- **BridgeV2W**：通过具身掩码将视频生成模型与具身世界模型连接起来。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.03793)] [[项目](https:\u002F\u002Fbridgev2w.github.io\u002F)]\n- **LIVE**：长时程交互式视频世界建模。**`arXiv 26.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.03747)]\n- [**Lingbot-World**] 推进开源世界模型。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.20540)] [[代码](https:\u002F\u002Fgithub.com\u002Frobbyant\u002Flingbot-world)]\n- [**Lingbot-VA**] 用于机器人控制的因果世界建模。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.21998)] [[代码](https:\u002F\u002Fgithub.com\u002Frobbyant\u002Flingbot-va)]\n- **PathWise**：通过世界模型规划，借助自我进化LLM实现自动化启发式设计。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.20539)]\n- **WorldBench**：为诊断评估世界模型而消除物理歧义。**`arXiv 26.1`** 
[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.21282)] [[项目](https:\u002F\u002Fworld-bench.github.io\u002F)]\n- 视觉生成通过多模态世界模型解锁类人推理能力。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.19834)] [[项目](https:\u002F\u002Fthuml.github.io\u002FReasoning-Visual-World)]\n- **PhysicsMind**：为底层VLM和世界模型中的物理推理与预测提供仿真与真实力学基准测试。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.16007)]\n- **Boltzmann-GPT**：连接基于能量的世界模型与语言生成。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.17094)]\n- **MetaWorld**：在高层指令接地方面的技能迁移与组合，通过层级式世界模型实现。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.17507)] [[项目](https:\u002F\u002Fanonymous.4open.science\u002Fr\u002Fmetaworld-2BF4\u002F)]\n- 通过知识丰富的经验学习来对齐代理式世界模型。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.13247)]\n- **VJEPA**：变分联合嵌入预测架构作为概率性世界模型。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.14354)]\n- 穿行于画作之中：来自互联网先验知识的第一人称世界模型。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.15284)]\n- 从生成引擎到可行动的模拟器：世界模型中物理接地的重要性。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.15533)]\n- 一种高效且多模态的单步世界模型导航系统。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.12277)]\n- **ReWorld**：面向具身世界模型的多维奖励建模。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.12428)]\n- 动作夏普利值：用于强化学习中世界模型的训练数据选择指标。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.10905)]\n- 在推断时将视频生成模型与潜在世界模型的物理特性对齐。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.10553)]\n- 先想象再规划：智能体通过世界模型进行自适应前瞻学习。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.08955)]\n- 用于3D人体运动预测的语义信念状态世界模型。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.03517)]\n- **PointWorld**：为野外机器人操作扩展3D世界模型。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.03782)] [[项目](https:\u002F\u002Fpoint-world.github.io\u002F)]\n- 当前智能体未能将世界模型用作预见未来的工具。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.03905)]\n- MobileDreamer：面向GUI智能体的生成式草图世界模型。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.04035)]\n- 哇，哇，哇！一场全面的具身世界模型评估图灵测试。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.04137)]\n- **VerseCrafter**：具有4D几何控制的动态逼真视频世界模型。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.05138)] [[项目](https:\u002F\u002Fsixiaozheng.github.io\u002FVerseCrafter_page\u002F)]\n- 在野外学习潜在动作世界模型。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.05230)]\n- 以对象为中心的世界模型与蒙特卡洛树搜索相遇。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.06604)]\n- 解开谜题：面向离线多智能体强化学习的局部到全局世界模型。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.07463)]\n- 关于问题空间作为系统工程中语义世界模型的形式化理论。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.00755)]\n- 流等变世界模型：用于部分可观测动态环境的记忆。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.01075)] [[项目](https:\u002F\u002Fflowequivariantworldmodels.github.io\u002F)]\n- **NeoVerse**：利用野外单目视频增强4D世界模型。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.00393)] [[项目](https:\u002F\u002Fneoverse-4d.github.io\u002F)]\n- 是什么驱动了结合嵌入的预测性世界模型在物理规划中的成功？**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.24497)]\n- **AlignUSER**：通过世界模型使LLM智能体与人类对齐，用于推荐系统评估。**`arXiv 26.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.00930)]\n\n### 2025\n- [**DreamerV3**] Mastering Diverse Domains through World 
Models. **`Nature`** [[Paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41586-025-08744-2)] [[JAX Code](https:\u002F\u002Fgithub.com\u002Fdanijar\u002Fdreamerv3)]\n- **3D4D**: An Interactive, Editable, 4D World Model via 3D Video Generation.  **`AAAI 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.08536)]\n- Object-Centric World Models for Causality-Aware Reinforcement Learning.  **`AAAI 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.14262)]\n- Foundation Models as World Models: A Foundational Study in Text-Based GridWorlds.  **`NeurIPSW 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.15915)]\n- Language-conditioned world model improves policy generalization by reading environmental descriptions. **`NeurIPSW 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.22904)]\n- **NavMorph**: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments.  **`ICCV 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23468)] [[Code](https:\u002F\u002Fgithub.com\u002FFeliciaxyao\u002FNavMorph)]\n- **GWM**: Towards Scalable Gaussian World Models for Robotic Manipulation **`ICCV 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.17600)] [[Project](https:\u002F\u002Fgaussian-world-model.github.io\u002F)]\n- **FOUNDER**: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making. **`ICML 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.12496)] [[Project](https:\u002F\u002Fsites.google.com\u002Fview\u002Ffounder-rl)]\n- General agents need world models.  **`ICML 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01622)]\n- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models. **`ICML 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.06952)]\n- Continual Reinforcement Learning by Planning with Online World Models. **`ICML 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.09177)]\n- **PIGDreamer**: Privileged Information Guided World Models for Safe Partially Observable Reinforcement Learning. **`ICML 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.02159)]\n- [**NWM**] Navigation World Models.  **`CVPR 25 Best Paper Honorable Mention`** **`Yann LeCun`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.03572)] [[Project](https:\u002F\u002Fwww.amirbar.net\u002Fnwm\u002F)]\n- [**PrediCIR**] Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval. **`CVPR 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.17109)] [[Code](https:\u002F\u002Fgithub.com\u002FPter61\u002Fpredicir)]\n- [**MoSim**] Neural Motion Simulator: Pushing the Limit of World Models in Reinforcement Learning. **`CVPR 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.07095)]\n- **CoT-VLA**: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models.  **`CVPR 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.22020)] [[Project](https:\u002F\u002Fcot-vla.github.io\u002F)]\n- **EchoWorld**: Learning Motion-Aware World Models for Echocardiography Probe Guidance. **`CVPR 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.13065)] [[Code](https:\u002F\u002Fgithub.com\u002FLeapLabTHU\u002FEchoWorld)]\n- **DiWA**: Diffusion Policy Adaptation with World Models. 
**`CoRL 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.03645)] [[Project](https:\u002F\u002Fdiwa.cs.uni-freiburg.de\u002F)]\n- **Simulating Before Planning**: Constructing Intrinsic User World Model for User-Tailored Dialogue Policy Planning. **`SIGIR 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.13643)] \n- **LS-Imagine**: Open-World Reinforcement Learning over Long Short-Term Imagination. **`ICLR 25 Oral`** [[Paper](https:\u002F\u002Fopenreview.net\u002Fpdf?id=vzItLaEoDa)] [[Code](https:\u002F\u002Fgithub.com\u002Fqiwang067\u002FLS-Imagine)]\n- **DC-MPC**: Discrete Codebook World Models for Continuous Control.  **`ICLR 25`** [[Paper](https:\u002F\u002Fopenreview.net\u002Fforum?id=lfRYzd8ady)] [[Code](https:\u002F\u002Fgithub.com\u002Faidanscannell\u002Fdcmpc)]\n- [**SGF**] Simple, Good, Fast: Self-Supervised World Models Free of Baggage.  **`ICLR 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.02612)] [[Code](https:\u002F\u002Fgithub.com\u002Fjrobine\u002Fsgf)]\n- **ManiGaussian++**: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model. **`IROS 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.19842)] [[Code](https:\u002F\u002Fgithub.com\u002FApril-Yz\u002FManiGaussian_Bimanual)]\n- **SCMA**: Self-Consistent Model-based Adaptation for Visual Reinforcement Learning. **`IJCAI 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.09923)]\n- **Surfer**: A World Model-Based Framework for Vision-Language Robot Manipulation. **`TNNLS 25`** [[Paper](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F11152367)]\n- Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling. **`World Modeling Workshop 26`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.05809)] [[Code](https:\u002F\u002Fgithub.com\u002Fchandar-lab\u002Fvisa-for-mindjourney)]\n- **On Memory**: A comparison of memory mechanisms in world models. **`World Modeling Workshop 26`** [[Paper](https:\u002F\u002Fwww.arxiv.org\u002Fabs\u002F2512.06983)]\n- **Zero-Splat TeleAssist**: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation. **`ICRAW 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08271)]\n- **Act2Goal**: From World Model To General Goal-conditioned Policy. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.23541)]\n- Web World Models. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.23676)]\n- [**LEWM**] Large Emotional World Model. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.24149)]\n- World model inspired sarcasm reasoning with large language model agents. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.24329)]\n- **TeleWorld**: Towards Dynamic Multimodal Synthesis with a 4D World Model. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.00051)]\n- Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.21887)]\n- **Yume-1.5**: A Text-Controlled Interactive World Generation Model. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.22096)]\n- [**ORCA**] Active Intelligence in Video Avatars via Closed-loop World Modeling. 
**`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.20615)] [[Project](https:\u002F\u002Fxuanhuahe.github.io\u002FORCA\u002F)]\n- **From Word to World**: Can Large Language Models be Implicit Text-based World Models? **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.18832)]\n- A Unified Definition of Hallucination, Or: It's the World Model, Stupid. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.21577)]\n- **AstraNav-World**: World Model for Foresight Control and Consistency. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.21714)]\n- **ChronoDreamer**: Action-Conditioned World Model as an Online Simulator for Robotic Planning. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.18619)]\n- **STORM**: Search-Guided Generative World Models for Robotic Manipulation. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.18477)]\n- Dexterous World Models. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.17907)] [[Project](http:\u002F\u002Fsnuvclab.github.io\u002Fdwm)]\n- **WorldPlay**: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.14614)] [[Project](https:\u002F\u002F3d-models.hunyuan.tencent.com\u002Fworld\u002F)]\n- **Motus**: A Unified Latent Action World Model. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.13030)]\n- **LongVie 2**: Multimodal Controllable Ultra-Long Video World Model. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.13604)]\n- World Models Can Leverage Human Videos for Dexterous Manipulation. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.13644)]\n- World Models Unlock Optimal Foraging Strategies in Reinforcement Learning Agents. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.12548)]\n- **VFMF**: World Modeling by Forecasting Vision Foundation Model Features. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.11225)] [[Code](https:\u002F\u002Fgithub.com\u002Fgboduljak\u002Fvfmf)]\n- **VDAWorld**: World Modelling via VLM-Directed Abstraction and Simulation. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.11061)] [[Project](https:\u002F\u002Ffelixomahony.github.io\u002Fvdaworld\u002F)]\n- The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.13821)]\n- **KAN-Dreamer**: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.07437)]\n- **CLARITY**: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08029)]\n- Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08188)] [[Project](https:\u002F\u002Fembodied-tree-of-thoughts.github.io\u002F)]\n- Deterministic World Models for Verification of Closed-loop Vision-based Systems. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08991)]\n- Closing the Train-Test Gap in World Models for Gradient-Based Planning. 
**`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.09929)]\n- Latent Action World Models for Control with Unlabeled Trajectories. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.10016)]\n- Evaluating Gemini Robotics Policies in a Veo World Simulator. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.10675)]\n- **Astra**: General Interactive World Model with Autoregressive Denoising. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08931)] [[Code](https:\u002F\u002Fgithub.com\u002FEternalEvan\u002FAstra)]\n- **Visionary**: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08478)]\n- Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08411)]\n- Learning Robot Manipulation from Audio World Models. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.08405)]\n- **FieldSeer I**: Physics-Guided World Models for Long-Horizon Electromagnetic Dynamics under Partial Observability. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.05361)]\n- World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.05927)]\n- **Speech World Model**: Causal State-Action Planning with Explicit Reasoning for Speech. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.05933)]\n- **BiTAgent**: A Task-Aware Modular Framework for Bidirectional Coupling between Multimodal Large Language Models and World Models. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.04513)]\n- **AdaPower**: Specializing World Foundation Models for Predictive Manipulation. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.03538)]\n- **RoboScape-R**: Unified Reward-Observation World Models for Generalizable Robotics Training via RL. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.03556)]\n- **RELIC**: Interactive Video World Model with Long-Horizon Memory. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.04040)]\n- **Audio-Visual World Models**: Towards Multisensory Imagination in Sight and Sound. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.00883)]\n- Better World Models Can Lead to Better Post-Training Performance. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.03400)]\n- **VCWorld**: A Biological World Model for Virtual Cell Simulation. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.00306)]\n- **NavForesee**: A Unified Vision-Language World Model for Hierarchical Planning and Dual-Horizon Navigation Prediction. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.01550)]\n- **GrndCtrl**: Grounding World Models via Self-Supervised Reward Alignment. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.01952)]\n- **The brain-AI convergence**: Predictive and generative world models for general-purpose computation. **`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.02419)]\n- **WorldPack**: Compressed Memory Improves Spatial Consistency in Video World Modeling. 
**`arXiv 25.12`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.02473)]\n- **VISTAv2**: World Imagination for Indoor Vision-and-Language Navigation. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.00041)]\n- **Hunyuan-GameCraft-2**: Instruction-following Interactive Game World Model. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.23429)] [[Project](https:\u002F\u002Fhunyuan-gamecraft-2.github.io\u002F)]\n- **SmallWorlds**: Assessing Dynamics Understanding of World Models in Isolated Environments. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.23465)]\n- **Thinking by Doing**: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.23476)]\n- **TraceGen**: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.21690)]\n- **GigaWorld-0**: World Models as Data Engine to Empower Embodied AI. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.19861)]\n- **4DWorldBench**: A Comprehensive Evaluation Framework for 3D\u002F4D World Generation Models. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.19836)]\n- **Thinking Ahead**: Foresight Intelligence in MLLMs and World Models. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.18735)]\n- Counterfactual World Models via Digital Twin-conditioned Video Diffusion. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.17481)]\n- **RynnVLA-002**: A Unified Vision-Language-Action and World Model. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.17502)]\n- **Beyond Generative AI**: World Models for Clinical Prediction, Counterfactuals, and Planning. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.16333)]\n- **X-WIN**: Building Chest Radiograph World Model via Predictive Sensing. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.14918)]\n- **IPR-1**: Interactive Physical Reasoner. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.15407)]\n- **NORA-1.5**: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.14659)] [[Code](https:\u002F\u002Fdeclare-lab.github.io\u002Fnora-1.5)]\n- Towards High-Consistency Embodied World Model with Multi-View Trajectory Videos. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.12882)]\n- **PragWorld**: A Benchmark Evaluating LLMs' Local World Model under Minimal Linguistic Alterations and Conversational Dynamics. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.13021)]\n- Latent-Space Autoregressive World Model for Efficient and Robust Image-Goal Navigation. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.11011)]\n- Scalable Policy Evaluation with Video World Models. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.11520)]\n- **WMPO**: World Model-based Policy Optimization for Vision-Language-Action Models. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.09515)]\n- **ViPRA**: Video Prediction for Robot Actions. 
**`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.07732)]\n- **Dynamic Sparsity**: Challenging Common Sparsity Assumptions for Learning World Models in Robotic Reinforcement Learning Benchmarks. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.08086)]\n- **LLM-as-a-Judge**: Toward World Models for Slate Recommendation Systems. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.04541)]\n- **DR. WELL**: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.04646)]\n- **WorldPlanner**: Monte Carlo Tree Search and MPC with Action-Conditioned Visual World Models. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.03077)]\n- Natural Building Blocks for Structured World Models: Theory, Evidence, and Scaling. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.02091)]\n- Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.02748)]\n- How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.01775)]\n- Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model. **`arXiv 25.11`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.27607)]\n- Co-Evolving Latent Action World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.26433)]\n- **Emu3.5**: Native Multimodal Models are World Learners. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.26583)]\n- Clone Deterministic 3D Worlds with Geometrically-Regularized World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.26782)]\n- Semantic Communications with World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.24785)]\n- Dual-Mind World Models: A General Framework for Learning in Dynamic Wireless Networks. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.24546)]\n- Deductive Chain-of-Thought Augmented Socially-aware Robot Navigation World Model. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.23509)]\n- Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.23258)]\n- Vector Quantization in the Brain: Grid-like Codes in World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16039)]\n- Zero-shot World Models via Search in Memory. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16123)]\n- **VAGEN**: Reinforcing World Model Reasoning for Multi-Turn VLM Agents. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16907)] [[Project](https:\u002F\u002Fvagen-ai.github.io\u002F)]\n- **World-in-World**: World Models in a Closed-Loop World. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.18135)]\n- Higher Embedding Dimension Creates a Stronger World Model for a Simple Sorting Task. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.18315)]\n- Social World Model-Augmented Mechanism Design Policy Learning. 
**`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19270)]\n- **ProTerrain**: Probabilistic Physics-Informed Rough Terrain World Modeling. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19364)]\n- **GigaBrain-0**: A World Model-Powered Vision-Language-Action Model. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19430)] [[Project](https:\u002F\u002Fgigabrain0.github.io\u002F)]\n- Benchmarking World-Model Learning. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19788)]\n- Semantic World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.19818)] [[Project](https:\u002F\u002Fweirdlabuw.github.io\u002Fswm)]\n- World Models Should Prioritize the Unification of Physical and Social Dynamics. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21219)]\n- **From Masks to Worlds**: A Hitchhiker's Guide to World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.20668)]\n- Rethinking the Simulation vs. Rendering Dichotomy: No Free Lunch in Spatial World Modelling. **`NeurIPSW 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.20835)]\n- How Hard is it to Confuse a World Model? **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21232)]\n- **DreamerV3-XP**: Optimizing exploration through uncertainty estimation. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21418)]\n- **PhysWorld**: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.21447)]\n- **Terra**: Explorable Native 3D World Model with Point Latents. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.14977)] [[Project](https:\u002F\u002Fhuang-yh.github.io\u002Fterra\u002F)]\n- **R-WoM**: Retrieval-augmented World Model For Computer-use Agents. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.11892)]\n- **One Life to Learn**: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.12088)] [[Project](https:\u002F\u002Fonelife-worldmodel.github.io\u002F)]\n- **Deep SPI**: Safe Policy Improvement via World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.12312)]\n- **DREAMer-VXS**: A Latent World Model for Sample-Efficient AGV Exploration in Stochastic, Unobserved Environments. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.00005)]\n- Ego-Vision World Model for Humanoid Contact Planning. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.11682)] [[Project](https:\u002F\u002Fego-vcp.github.io\u002F)]\n- **Unified World Models**: Memory-Augmented Planning and Foresight for Visual Navigation. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.08713)]\n- What You Don't Know Can Hurt You: How Well do Latent Safety Filters Understand Partially Observable Safety Constraints? **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.06492)]\n- Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.07092)]\n- **WristWorld**: Generating Wrist-Views via 4D World Models for Robotic Manipulation. 
**`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.07313)]\n- **Ctrl-World**: A Controllable Generative World Model for Robot Manipulation. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.10125)]\n- Active Confusion Expression in Large Language Models: Leveraging World Models toward Better Social Reasoning. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.07974)]\n- **VideoVerse**: How Far is Your T2V Generator from a World Model? **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.08398)]\n- Internal World Models as Imagination Networks in Cognitive Agents. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.04391)]\n- Code World Models for General Game Playing. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.04542)]\n- Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.04020)]\n- **MorphoSim**: An Interactive, Controllable, and Editable Language-guided 4D World Simulator. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.04390)] [[Code](https:\u002F\u002Fgithub.com\u002Feric-ai-lab\u002FMorph4D)]\n- Bridging the Gap Between Multimodal Foundation Models and World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.03727)]\n- **Memory Forcing**: Spatio-Temporal Memory for Consistent Scene Generation on Minecraft. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.03198)] [[Project](https:\u002F\u002Fjunchao-cs.github.io\u002FMemoryForcing-demo\u002F)]\n- A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models. **`arXiv 25.10`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.02538)]\n- **CWM**: An Open-Weights LLM for Research on Code Generation with World Models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.02387)]\n- **FantasyWorld**: Geometry-Consistent World Modeling via Unified Video and 3D Prediction. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.21657)]\n- **LongScape**: Advancing Long-Horizon Embodied World Models with Context-Aware MoE. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.21790)] [[Code](https:\u002F\u002Fgithub.com\u002Ftsinghua-fib-lab\u002FLongscape)]\n- **LongLive**: Real-time Interactive Long Video Generation. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.22622)] [[Code](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FLongLive)]\n- **MoWM**: Mixture-of-World-Models for Embodied Planning via Latent-to-Pixel Feature Modulation. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.21797)]\n- **Context and Diversity Matter**: The Emergence of In-Context Learning in World Models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.22353)]\n- **WoW**: Towards a World omniscient World model Through Embodied Interaction. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.22642)]\n- **KeyWorld**: Key Frame Reasoning Enables Effective and Efficient World Models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.21027)] [[Code](https:\u002F\u002Fanonymous.4open.science\u002Fr\u002FKeyworld-E43D)]\n- [**Veo 3**] Video models are zero-shot learners and reasoners. 
**`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.20328)] [[Project](https:\u002F\u002Fvideo-zero-shot.github.io\u002F)]\n- **World4RL**: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.19080)] [[Project](https:\u002F\u002Fworld4rl.github.io\u002F)]\n- Remote Sensing-Oriented World Model. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.17808)]\n- **SAMPO**: Scale-wise Autoregression with Motion PrOmpt for generative world models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.15536)]\n- [**PIWM**] Enhancing Physical Consistency in Lightweight World Models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.12437)] [[Project](https:\u002F\u002Fphysics-wm.github.io\u002F)]\n- **LLM-JEPA**: Large Language Models Meet Joint Embedding Predictive Architectures. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.14252v1)] [[Code](https:\u002F\u002Fgithub.com\u002Frbalestr-lab\u002Fllm-jepa)]\n- **PhysicalAgent**: Towards General Cognitive Robotics with Foundation World Models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.13903)]\n- **OmniWorld**: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.12201)] [[Project](https:\u002F\u002Fyangzhou24.github.io\u002FOmniWorld\u002F)]\n- **UnifoLM-WMA-0**: A World-Model-Action (WMA) Framework under UnifoLM Family. **`Unitree`** [[Code](https:\u002F\u002Fgithub.com\u002Funitreerobotics\u002Funifolm-world-model-action)]\n- **One Model for All Tasks**: Leveraging Efficient World Models in Multi-Task Planning. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.07945)]\n- Language-Driven Hierarchical Task Structures as Explicit World Models for Multi-Agent Learning. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.04731)]\n- **LatticeWorld**: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.05263)] [[Demo](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=8VWZXpERR18&feature=youtu.be)]\n- Design and Optimization of Reinforcement Learning-Based Agents in Text-Based Games. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03479)]\n- **CausalARC**: Abstract Reasoning with Causal World Models. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03636)]\n- Planning with Reasoning using Vision Language World Model. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.02722)]\n- Learning an Adversarial World Model for Automated Curriculum Generation in MARL. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03771)]\n- World Model Implanting for Test-time Adaptation of Embodied Agents. **`arXiv 25.9`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.03956)]\n- Social World Models. **`arXiv 25.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.00559)]\n- [**PEWM**] Learning Primitive Embodied World Models: Towards Scalable Robotic Learning. **`arXiv 25.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.20840)]\n- [**DALI**] Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization. 
**`arXiv 25.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.20294)]\n- **HERO**: Hierarchical Extrapolation and Refresh for Efficient World Models. **`arXiv 25.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.17588)]\n- **Matrix-Game 2.0**: An Open-Source, Real-Time, and Streaming Interactive World Model. **`arXiv 25.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.13009)] [[Code](https:\u002F\u002Fgithub.com\u002FSkyworkAI\u002FMatrix-Game\u002Ftree\u002Fmain\u002FMatrix-Game-2)]\n- Visuomotor Grasping with World Models for Surgical Robots. **`arXiv 25.8`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.11200)]\n- **Genie 3**: A new frontier for world models. **`Google DeepMind`** [[Blog](https:\u002F\u002Fdeepmind.google\u002Fdiscover\u002Fblog\u002Fgenie-3-a-new-frontier-for-world-models\u002F)]\n- **SimuRA**: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.23773)]\n- **CoEx** -- Co-evolving World-model and Exploration. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.22281)]\n- What Does it Mean for a Neural Network to Learn a \"World Model\"? **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.21513)]\n- **Back to the Features**: DINO as a Foundation for Video World Models. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.19468)]\n- **HunyuanWorld 1.0**: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels. **`25.7`** [[Paper](https:\u002F\u002F3d-models.hunyuan.tencent.com\u002Fworld\u002FHY_World_1_technical_report.pdf)] [[Code](https:\u002F\u002Fgithub.com\u002FTencent-Hunyuan\u002FHunyuanWorld-1.0)]\n- **Yume**: An Interactive World Generation Model. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.17744)] [[Code](https:\u002F\u002Fgithub.com\u002Fstdstu12\u002FYUME)]\n- LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.15521)]\n- **MindJourney**: Test-Time Scaling with World Models for Spatial Reasoning. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.12508)] [[Project](https:\u002F\u002Fumass-embodied-agi.github.io\u002FMindJourney\u002F)]\n- Latent Policy Steering with Embodiment-Agnostic Pretrained World Models. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.13340)]\n- **MobiWorld**: World Models for Mobile Wireless Network. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.09462)]\n- [**GWM**] Graph World Model. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.10539)] [[Code](https:\u002F\u002Fgithub.com\u002Fulab-uiuc\u002FGWM)]\n- **From Curiosity to Competence**: How World Models Interact with the Dynamics of Exploration. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.08210)]\n- **Martian World Models**: Controllable Video Synthesis with Physically Accurate 3D Reconstructions. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.07978)] [[Project](https:\u002F\u002Fmarsgenai.github.io\u002F)]\n- **Sekai**: A Video Dataset towards World Exploration. 
**`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.15675)] [[Project](https:\u002F\u002Flixsp11.github.io\u002Fsekai-project\u002F)]\n- **Dyn-O**: Building Structured World Models with Object-Centric Representations. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.03298)]\n- Critiques of World Models. **`arXiv 25.7`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.05169)]\n- [**PEVA**] Whole-Body Conditioned Egocentric Video Prediction. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.21552)] [[Project](https:\u002F\u002Fdannytran123.github.io\u002FPEVA\u002F)]\n- **World4Omni**: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23919)]\n- **ParticleFormer**: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23126)] [[Project](https:\u002F\u002Fparticleformer.github.io\u002F)]\n- **RoboScape**: Physics-informed Embodied World Model. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23135)] [[Code](https:\u002F\u002Fgithub.com\u002Ftsinghua-fib-lab\u002FRoboScape)]\n- **Embodied AI Agents**: Modeling the World. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.22355)]\n- A \"Good\" Regulator May Provide a World Model for Intelligent Systems. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.23032)]\n- **WorldVLA**: Towards Autoregressive Action World Model. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.21539)] [[Code](https:\u002F\u002Fgithub.com\u002Falibaba-damo-academy\u002FWorldVLA)]\n- **MinD**: Unified Visual Imagination and Control via Hierarchical World Models. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.18897)]\n- Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.18537)]\n- [**UNIVERSE**] Adapting Vision-Language Models for Evaluating World Models. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.17967)]\n- **TransDreamerV3**: Implanting Transformer In DreamerV3. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.17103)]\n- Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.16565)]\n- Measuring (a Sufficient) World Model in LLMs: A Variance Decomposition Framework. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.16584)]\n- **GAF**: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.14135)] [[Project](https:\u002F\u002Fchaiying1.github.io\u002FGAF.github.io\u002Fproject_page\u002F)]\n- [**UniVLA**] Unified Vision-Language-Action Model. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.19850)]\n- **Xray2Xray**: World Model from Chest X-rays with Volumetric Context. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.19055)]\n- **PlayerOne**: Egocentric World Simulator. 
**`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.09995)] [[Project](https:\u002F\u002Fplayerone-hku.github.io\u002F)]\n- **V-JEPA 2**: Self-Supervised Video Models Enable Understanding, Prediction and Planning. **`arXiv 25.6`** **`Yann LeCun`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.09985)] [[Project](https:\u002F\u002Fai.meta.com\u002Fvjepa\u002F)]\n- [**TAWM**] Time-Aware World Model for Adaptive Prediction and Control. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.08441)] [[Code](https:\u002F\u002Fgithub.com\u002Fanh-nn01\u002FTime-Aware-World-Model)]\n- [**XPM-WM**] Efficient Generation of Diverse Cooperative Agents with World Models. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.07450)]\n- Video World Models with Long-term Spatial Memory. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.05284)] [[Project](https:\u002F\u002Fspmem.github.io\u002F)]\n- **DSG-World**: Learning a 3D Gaussian World Model from Dual State Videos. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.05217)]\n- Safe Planning and Policy Optimization via World Model Learning. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04828)]\n- **3DFlowAction**: Learning Cross-Embodiment Manipulation from 3D Flow World Model. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.06199)] [[Code](https:\u002F\u002Fgithub.com\u002FHoyyyaard\u002F3DFlowAction\u002F)]\n- Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.06006)]\n- **ORV**: 4D Occupancy-centric Robot Video Generation. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.03079)] [[Project](https:\u002F\u002Forangesodahub.github.io\u002FORV\u002F)]\n- **DeepVerse**: 4D Autoregressive Video Generation as a World Model. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01103)] [[Project](https:\u002F\u002Fsotamak1r.github.io\u002Fdeepverse\u002F)]\n- Sparse Imagination for Efficient Visual World Model Planning. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01392)]\n- Learning Abstract World Models with a Group-Structured Latent Space. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01529)]\n- **Voyager**: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04225)] [[Project](https:\u002F\u002Fvoyager-world.github.io\u002F)]\n- **WoMAP**: World Models For Embodied Open-Vocabulary Object Localization. **`arXiv 25.6`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01600)]\n- [**LoopNav**] Toward Memory-Aided World Models: Benchmarking via Spatial Consistency. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22976)] [[Code](https:\u002F\u002Fgithub.com\u002FKevin-lkw\u002FLoopNav)] [[Data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FkevinLian\u002FLoopNav)]\n- Long-Context State-Space Video World Models. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.20171)]\n- **Dyna-Think**: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.00320)]\n- [**WPE**] Evaluating Robot Policies in a World Model. 
**`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.00613)] [[Demo](https:\u002F\u002Fworld-model-eval.github.io\u002F)]\n- **StateSpaceDiffuser**: Bringing Long Context to Diffusion World Models. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22246)]\n- [**VRAG**] Learning World Models for Interactive Video Generation. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.21996)]\n- **JEDI**: Latent End-to-end Diffusion Mitigates Agent-Human Performance Asymmetry in Model-Based Reinforcement Learning. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.19698)]\n- [**FPWC**] Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.16422)]\n- [**ForeDiff**] Consistent World Models via Foresight Diffusion. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.16474)]\n- **FLARE**: Robot Learning with Implicit World Modeling. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15659)] [[Project](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fgear\u002Fflare)]\n- [**RWM**] World Models as Reference Trajectories for Rapid Motor Adaptation. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15589)]\n- **RLVR-World**: Training World Models with Reinforcement Learning. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.13934)] [[Project](https:\u002F\u002Fthuml.github.io\u002FRLVR-World\u002F)]\n- **Vid2World**: Crafting Video Diffusion Models to Interactive World Models. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.14357)] [[Project](https:\u002F\u002Fknightnemo.github.io\u002Fvid2world\u002F)]\n- **Causal Cartographer**: From Mapping to Reasoning Over Counterfactual Worlds. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.14396)]\n- **EWMBench**: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.09694)] [[Data&Code](https:\u002F\u002Fgithub.com\u002FAgibotTech\u002FEWMBench)]\n- **FlowDreamer**: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.10075)] [[Project](https:\u002F\u002Fsharinka0715.github.io\u002FFlowDreamer\u002F)]\n- [**RoboOccWorld**] Occupancy World Model for Robots. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.05512)]\n- **seq-JEPA**: Autoregressive Predictive Learning of Invariant-Equivariant World Models. **`arXiv 25.5`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.03176)]\n- **TesserAct**: Learning 4D Embodied World Models. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.20995)] [[Project](https:\u002F\u002Ftesseractworld.github.io\u002F)]\n- **ManipDreamer**: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.16464)]\n- [**RWM-O**] Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.16680)]\n- **PIN-WM**: Learning Physics-INformed World Models for Non-Prehensile Manipulation. 
**`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.16693)]\n- Adapting a World Model for Trajectory Following in a 3D Game. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.12299)]\n- Embodied World Models Emerge from Navigational Task in Open-Ended Environments. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.11419)]\n- **MineWorld**: a Real-Time and Open-Source Interactive World Model on Minecraft. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.08388)] [[Code](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FMineWorld)]\n- [**UWM**] Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets. **`arXiv 25.4`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.02792)] [[Code](https:\u002F\u002Fgithub.com\u002FWEIRDLabUW\u002Funified-world-model)]\n- Synthesizing world models for bilevel planning. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.20124)]\n- **Aether**: Geometric-Aware Unified World Modeling. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.18945)] [[Project](https:\u002F\u002Faether-world.github.io\u002F)]\n- [**MaaG**] Model as a Game: On Numerical and Spatial Consistency for Generative Games. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.21172)]\n- **DyWA**: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation.  **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.16806)] [[Project](https:\u002F\u002Fpku-epic.github.io\u002FDyWA\u002F)]\n- **Cosmos-Transfer1** **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.14492)] [[Code](https:\u002F\u002Fgithub.com\u002Fnvidia-cosmos\u002Fcosmos-transfer1)]\n- Meta-Reinforcement Learning with Discrete World Models for Adaptive Load Balancing. **`ACMSE 25`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.08872)]\n- [**FAR**] Long-Context Autoregressive Video Modeling with Next-Frame Prediction. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.19325)] [[Project](https:\u002F\u002Ffarlongctx.github.io\u002F)] [[Code](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FFAR)]\n- **LUMOS**: Language-Conditioned Imitation Learning with World Models. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.10370)] [[Project](http:\u002F\u002Flumos.cs.uni-freiburg.de\u002F)]\n- **World Modeling Makes a Better Planner**: Dual Preference Optimization for Embodied Task Planning. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.10480)]\n- [**WLA**] Inter-environmental world modeling for continuous and compositional dynamics. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.09911)]\n- **Disentangled World Models**: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.08751)]\n- **WMNav**: Integrating Vision-Language Models into World Models for Object Goal Navigation. **`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.02247)] [[Code](https:\u002F\u002Fgithub.com\u002FB0B8K1ng\u002FWMNavigation)]\n- Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments. 
**`arXiv 25.3`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.08122)]\n- **WorldModelBench**: Judging Video Generation Models As World Models. **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.20694)] [[Project](https:\u002F\u002Fworldmodelbench-team.github.io\u002F)]\n- **Multimodal Dreaming**: A Global Workspace Approach to World Model-Based Reinforcement Learning. **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.21142)]\n- Learning To Explore With Predictive World Model Via Self-Supervised Learning. **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.13200)]\n- **Text2World**: Benchmarking Large Language Models for Symbolic World Model Generation.  **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.13092)] [[Project](https:\u002F\u002Ftext-to-world.github.io\u002F)]\n- **M^3** : A Modular World Model over Streams of Tokens.  **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.11537)]  [[Code](https:\u002F\u002Fgithub.com\u002Fleor-c\u002FM3)]\n- When do Neural Networks Learn World Models?.  **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09297)]\n- [**DWS**] Pre-Trained Video Generative Models as World Simulators.  **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.07825)]\n- **DMWM**: Dual-Mind World Model with Long-Term Imagination.  **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.07591)]\n- **EvoAgent**: Agent Autonomous Evolution with Continual World Model for Long-Horizon Tasks.  **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.05907)]\n- Generating Symbolic World Models via Test-time Scaling of Large Language Models.  **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.04728)]\n- [**HMA**] Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression.  **`arXiv 25.2`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.04296)] [[Code](https:\u002F\u002Fgithub.com\u002Fliruiw\u002FHMA)] [[Project](https:\u002F\u002Fliruiw.github.io\u002Fhma\u002F)]\n- **UP-VLA**: A Unified Understanding and Prediction Model for Embodied Agent.  **`arXiv 25.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.18867)]\n- **GLAM**: Global-Local Variation Awareness in Mamba-based World Model.  **`arXiv 25.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.11949)] [[Code](https:\u002F\u002Fgithub.com\u002FGLAM25\u002Fglam)]\n- **Robotic World Model**: A Neural Network Simulator for Robust Policy Optimization in Robotics.  **`arXiv 25.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.10100)]\n- **GAWM**: Global-Aware World Model for Multi-Agent Reinforcement Learning.  **`arXiv 25.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.10116)]\n- **RoboHorizon**: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation.  **`arXiv 25.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.06605)]\n- **EnerVerse**: Envisioning Embodied Future Space for Robotics Manipulation. **`AgiBot`**  **`arXiv 25.1`** [[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.01895)] [[Website](https:\u002F\u002Fsites.google.com\u002Fview\u002Fenerverse)]\n- **Cosmos** World Foundation Model Platform for Physical AI. 
**`NVIDIA`** **`arXiv 25.1`** [[Paper](https:\u002F\u002Fd1qx31qr3h6wln.cloudfront.net\u002Fpublications\u002FNVIDIA%20Cosmos_4.pdf)] [[Code](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FCosmos)]\n\n### 2024\n- [**SMAC**] 基于生成式世界模型的多智能体决策问题的可信答案。**`NeurIPS 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.02664)]\n- [**CoWorld**] 将离线强化学习在线化：用于离线视觉强化学习的协作式世界模型。**`NeurIPS 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.15260)] [[官网](https:\u002F\u002Fqiwang067.github.io\u002Fcoworld)] [[PyTorch代码](https:\u002F\u002Fgithub.com\u002Fqiwang067\u002FCoWorld)]\n- [**Diamond**] 用于世界建模的扩散模型：Atari游戏中的视觉细节至关重要。**`NeurIPS 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.12399)] [[代码](https:\u002F\u002Fgithub.com\u002Feloialonso\u002Fdiamond)]\n- **PIVOT-R**：面向机器人操作的基于基元的航点感知世界模型。**`NeurIPS 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.10394)]\n- [**MUN**] 用于无约束目标导航的世界模型学习。**`NeurIPS 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.02446)] [[代码](https:\u002F\u002Fgithub.com\u002FRU-Automated-Reasoning-Group\u002FMUN)]\n- **VidMan**：利用视频扩散模型中的隐式动力学实现高效的机器人操作。**`NeurIPS 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.09153)]\n- **自适应世界模型**：在非平稳环境下通过潜在想象学习行为。**`NeurIPSW 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.01342)]\n- 来自有限寿命智能体的隐式世界模型涌现。**`NeurIPSW 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.12304)]\n- GPT模型中的因果世界表征。**`NeurIPSW 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.07446)]\n- **PreLAR**：基于可学习动作表示的世界模型预训练。**`ECCV 24`** [[论文](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_24\u002Fpapers_ECCV\u002Fpapers\u002F03363.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Fzhanglixuan0720\u002FPreLAR)]\n- [**CWM**] 利用反事实世界建模理解物理动力学。**`ECCV 24`** [[论文](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_24\u002Fpapers_ECCV\u002Fpapers\u002F03523.pdf)] [[代码](https:\u002F\u002Fneuroailab.github.io\u002Fcwm-physics\u002F)]\n- **ManiGaussian**：用于多任务机器人操作的动态高斯泼溅技术。**`ECCV 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.08321)] [[代码](https:\u002F\u002Fgithub.com\u002FGuanxingLu\u002FManiGaussian)]\n- [**DWL**] 推进人形机器人行走：通过去噪世界模型学习掌握复杂地形。**`RSS 24（最佳论文奖决赛入围）`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.14472)]\n- [**LLM-Sim**] 语言模型能否作为基于文本的世界模拟器？**`ACL`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.06485)] [[代码](https:\u002F\u002Fgithub.com\u002Fcognitiveailab\u002FGPT-simulator)]\n- **RoboDreamer**：为机器人想象力学习组合式世界模型。**`ICML 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.12377)] [[代码](https:\u002F\u002Frobovideo.github.io\u002F)]\n- [**Δ-IRIS**] 基于上下文感知分词的高效世界模型。**`ICML 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.19320)] [[代码](https:\u002F\u002Fgithub.com\u002Fvmicheli\u002Fdelta-iris)]\n- **AD3**：隐式动作是世界模型区分多样化视觉干扰的关键。**`ICML 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.09976)]\n- **Hieros**：基于结构化状态空间序列的世界模型的层次化想象。**`ICML 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05167)]\n- [**HRSSM**] 学习用于世界模型的鲁棒潜在动态表征。**`ICML 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.06263)] [[代码](https:\u002F\u002Fgithub.com\u002Fbit1029public\u002FHRSSM)]\n- **HarmonyDream**：世界模型内部的任务协调。**`ICML 24`** [[论文](https:\u002F\u002Fopenreview.net\u002Fforum?id=x0yIaw2fgk)] [[代码](https:\u002F\u002Fgithub.com\u002Fthuml\u002FHarmonyDream)]\n- [**REM**] 通过并行观测预测改进基于标记的世界模型。**`ICML 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05643)] [[代码](https:\u002F\u002Fgithub.com\u002Fleor-c\u002FREM)]\n- 变压器世界模型是否能提供更好的策略梯度？**`ICML 
24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05290)]\n- **TD-MPC2**：适用于连续控制的可扩展、鲁棒世界模型。**`ICLR 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.16828)] [[PyTorch代码](https:\u002F\u002Fgithub.com\u002Fnicklashansen\u002Ftdmpc2)]\n- **DreamSmooth**：通过奖励平滑改进基于模型的强化学习。**`ICLR 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.01450)]\n- [**R2I**] 利用世界模型掌握记忆任务。**`ICLR 24`** [[论文](http:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.04253)] [[JAX代码](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002FViDAR)]\n- **MAMBA**：一种用于元强化学习的有效世界模型方法。**`ICLR 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.09859)] [[代码](https:\u002F\u002Fgithub.com\u002Fzoharri\u002Fmamba)]\n- 基于视觉世界模型的多任务交互式机器人舰队学习。**`CoRL 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.22689)] [[代码](https:\u002F\u002Fut-austin-rpl.github.io\u002Fsirius-fleet\u002F)]\n- **生成式涌现通信**：大型语言模型是一种集体世界模型。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.00226)]\n- 朝着揭示和提升世界模型泛化能力的方向。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.00195)]\n- **迈向物理可解释的世界模型**：用于视觉轨迹预测的有意义的弱监督表征。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.12870)]\n- **梦想操控**：组合式世界模型赋能机器人模仿学习的想象力。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.14957)] [[项目](https:\u002F\u002Fleobarcellona.github.io\u002FDreamToManipulate\u002F)]\n- 变压器在迷宫求解任务中使用因果世界模型。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.11867)]\n- **Owl-1**：用于一致长视频生成的全能世界模型。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.09600)] [[代码](https:\u002F\u002Fgithub.com\u002Fhuang-yh\u002FOwl)]\n- **StoryWeaver**：用于知识增强型故事角色定制的统一世界模型。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.07375)] [[代码](https:\u002F\u002Fgithub.com\u002FAria-Zhangjl\u002FStoryWeaver)]\n- **SimuDICE**：通过世界模型更新和DICE估计进行离线策略优化。**`BNAIC 24`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.06486)]\n- 在软演员-评论家强化学习算法中利用世界模型不确定性进行有界探索。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.06139)]\n- **Genie 2**：一个大规模的基础世界模型。**`24.12`** **`Google DeepMind`** [[博客](https:\u002F\u002Fdeepmind.google\u002Fdiscover\u002Fblog\u002Fgenie-2-a-large-scale-foundation-world-model\u002F)]\n- **矩阵**：具有实时运动控制的无限时域世界生成。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.03568)] [[项目](https:\u002F\u002Fthematrix1999.github.io\u002F)]\n- **运动提示**：通过运动轨迹控制视频生成。**`arXiv 24.12`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.02700)] [[项目](https:\u002F\u002Fmotion-prompting.github.io\u002F)]\n- 生成式世界探索者。**`arXiv 24.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.11844)] [[项目](https:\u002F\u002Fgenerative-world-explorer.github.io\u002F)]\n- [**WebDreamer**] 您的语言模型是否暗中充当互联网的世界模型？基于模型的网络智能体规划。**`arXiv 24.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.06559)] [[代码](https:\u002F\u002Fgithub.com\u002FOSU-NLP-Group\u002FWebDreamer)]\n- **WHALE**：迈向具身决策的通用且可扩展的世界模型。**`arXiv 24.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.05619)]\n- **DINO-WM**：基于预训练视觉特征的世界模型支持零样本规划。**`arXiv 24.11`** **`Yann LeCun`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.04983)]\n- 预训练智能体和世界模型的规模法则。**`arXiv 24.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.04434)]\n- [**Phyworld**] 视频生成距离世界模型还有多远：从物理定律角度看。**`arXiv 24.11`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.02385)] [[项目](https:\u002F\u002Fphyworld.github.io\u002F)]\n- **IGOR**：图像-目标表征是具身人工智能中基础模型的原子控制单元。**`arXiv 24.10`** 
[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.00785)] [[项目](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Fproject\u002Figor-image-goal-representations\u002F)]\n- **EVA**：用于未来视频预测的具身世界模型。**`arXiv 24.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.15461)]\n- **VisualPredicator**：利用神经符号谓词学习抽象世界模型，用于机器人规划。**`arXiv 24.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.23156)]\n- [**LLMCWM**] 语言智能体与因果关系——连接LLM和因果世界模型。**`arXiv 24.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.19923)] [[代码](https:\u002F\u002Fgithub.com\u002Fj0hngou\u002FLLMCWM\u002F)]\n- 用于在线模仿学习的免奖励世界模型。**`arXiv 24.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.14081)]\n- **带有世界模型的网络智能体**：在网络导航中学习和利用环境动力学。**`arXiv 24.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.13232)]\n- [**GLIMO**] 将大型语言模型嵌入到具有不完美世界模型的具身环境中。**`arXiv 24.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.02742)]\n- **AVID**：将视频扩散模型适配为世界模型。**`arXiv 24.10`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.12822)] [[代码](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fcausica\u002Ftree\u002Fmain\u002Fresearch_experiments\u002Favid)]\n- [**WMP**] 基于世界模型的视觉足式运动感知。**`arXiv 24.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.16784)] [[项目](https:\u002F\u002Fwmp-loco.github.io\u002F)]\n- [**OSWM**] 使用在合成先验上训练的变压器进行一次性世界模型构建。**`arXiv 24.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.14084)]\n- **R-AIF**：利用主动推理和世界模型从像素中解决稀疏奖励的机器人任务。**`arXiv 24.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.14216)]\n- 在生成式世界模型中表示位置信息以进行物体操作。**`arXiv 24.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.12005)]\n- 通过前提和效果知识将大型语言模型转化为世界模型。**`arXiv 24.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.12278)]\n- **DexSim2Real$^2$**：为精确的关节型物体灵巧操作构建显式世界模型。**`arXiv 24.9`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.08750)]\n- 基于以对象为中心的抽象进行高效探索和判别性世界模型学习。**`arXiv 24.8`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.11816)]\n- [**MoReFree**] 世界模型提升强化学习的自主性。**`arXiv 24.8`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.09807)] [[项目](https:\u002F\u002Fsites.google.com\u002Fview\u002Fmorefree)]\n- **UrbanWorld**：用于3D城市生成的城市世界模型。**`arXiv 24.7`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.11965)]\n- **PWM**：利用大型世界模型进行策略学习。**`arXiv 24.7`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.02466)] [[代码](https:\u002F\u002Fwww.imgeorgiev.com\u002Fpwm\u002F)]\n- **预测 vs. 
行动**：世界建模与智能体建模之间的权衡。**`arXiv 24.7`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.02446)]\n- [**GenRL**] 多模态基础世界模型，用于通用具身智能体。**`arXiv 24.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.18043)] [[代码](https:\u002F\u002Fgithub.com\u002Fmazpie\u002Fgenrl)]\n- [**DLLM**] 带有大型语言模型暗示的世界模型，用于目标达成。**`arXiv 24.6`** [[论文](http:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.07381)]\n- 语言模型的认知地图：通过口头表征世界模型实现最优规划。**`arXiv 24.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.15275)]\n- **CityBench**：评估大型语言模型作为世界模型的能力。**`arXiv 24.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.13945)] [[代码](https:\u002F\u002Fgithub.com\u002Ftsinghua-fib-lab\u002FCityBench)]\n- **CoDreamer**：基于沟通的去中心化世界模型。**`arXiv 24.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.13600)]\n- [**EBWM**] 受认知启发的能量驱动世界模型。**`arXiv 24.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.08862)]\n- 评估生成模型中隐含的世界模型。**`arXiv 24.6`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.03689)] [[代码](https:\u002F\u002Fgithub.com\u002Fmazpie\u002Fgenrl)]\n- 变压器和槽位编码用于高效物理世界建模。**`arXiv 24.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.20180)] [[代码](https:\u002F\u002Fgithub.com\u002Ftorchipeppo\u002Ftransformers-and-slot-encoding-for-wm)]\n- [**Puppeteer**] 层次化世界模型作为视觉全身人形控制器。**`arXiv 24.5`** **`Yann LeCun`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.18418)] [[代码](https:\u002F\u002Fnicklashansen.com\u002Frlpuppeteer)]\n- **BWArea Model**：学习可控语言生成的世界模型、逆动力学和策略。**`arXiv 24.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.17039)]\n- **潘多拉**：迈向具有自然语言动作和视频状态的通用世界模型。[[论文](https:\u002F\u002Fworld-model.maitrix.org\u002Fassets\u002Fpandora.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Fmaitrix-org\u002FPandora)]\n- [**WKM**] 基于世界知识模型的智能体规划。**`arXiv 24.5`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.14205)] [[代码](https:\u002F\u002Fgithub.com\u002Fzjunlp\u002FWKM)]\n- **牛顿**™——首个用于理解物理世界的奠基模型。**`Archetype AI`** [[博客](https:\u002F\u002Fwww.archetypeai.io\u002Fblog\u002Fintroducing-archetype-ai---understand-the-real-world-in-real-time)]\n- **竞争与组合**：学习模块化世界模型的独立机制。**`arXiv 24.4`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.15109)]\n- **MagicTime**：延时视频生成模型作为变形模拟器。**`arXiv 24.4`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.05014)] [[代码](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FMagicTime)]\n- **梦想多重世界**：学习上下文世界模型有助于零样本泛化。**`arXiv 24.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.10967)] [[代码](https:\u002F\u002Fgithub.com\u002Fsai-prasanna\u002Fdreaming_of_many_worlds)]\n- **V-JEPA**：视频联合嵌入预测架构。**`Meta AI`** **`Yann LeCun`** [[博客](https:\u002F\u002Fai.meta.com\u002Fblog\u002Fv-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture\u002F)] [[论文](https:\u002F\u002Fai.meta.com\u002Fresearch\u002Fpublications\u002Frevisiting-feature-prediction-for-learning-visual-representations-from-video\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fjepa)]\n- [**IWM**] 在视觉表征学习中学习和利用世界模型。**`Meta AI`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.00504)]\n- **Genie**：生成式互动环境。**`DeepMind`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.15391)] [[博客](https:\u002F\u002Fsites.google.com\u002Fview\u002Fgenie-24\u002Fhome)]\n- [**Sora**] 视频生成模型作为世界模拟器。**`OpenAI`** [[技术报告](https:\u002F\u002Fopenai.com\u002Fresearch\u002Fvideo-generation-models-as-world-simulators)]\n- [**LWM**] 基于百万长度视频和语言、采用RingAttention的世界模型。**`arXiv 24.2`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.08268)] 
[[代码](https:\u002F\u002Fgithub.com\u002FLargeWorldModel\u002FLWM)]\n- 使用世界模型集成进行规划。**`OpenReview`** [[论文](https:\u002F\u002Fopenreview.net\u002Fforum?id=cvGdPXaydP)]\n- **WorldDreamer**：通过预测掩码标记，迈向用于视频生成的通用世界模型。**`arXiv 24.1`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.09985)] [[代码](https:\u002F\u002Fgithub.com\u002FJeffWang987\u002FWorldDreamer)]\n\n### 2023\n- [**IRIS**] Transformer 是样本高效的世界模型。**`ICLR 23 口头报告`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.00588)] [[PyTorch 代码](https:\u002F\u002Fgithub.com\u002Feloialonso\u002Firis)]\n- **STORM**: 基于随机 Transformer 的强化学习高效世界模型。**`NIPS 23`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.09615)] [[PyTorch 代码](https:\u002F\u002Fgithub.com\u002Fweipu-zhang\u002FSTORM)]\n- [**TWM**] 基于 Transformer 的世界模型仅需 10 万次交互即可取得良好效果。**`ICLR 23`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.07109)] [[PyTorch 代码](https:\u002F\u002Fgithub.com\u002Fjrobine\u002Ftwm)]\n- **FOCUS**: 面向机器人操作的对象中心世界模型。**`arXiv 23.7`** [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.02427)] [[代码](https:\u002F\u002Fgithub.com\u002FStefanoFerraro\u002FFOCUS)]\n- [**Dynalang**] 利用语言学习世界建模。**`arXiv 23.8`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.01399)] [[代码](https:\u002F\u002Fgithub.com\u002Fjlin816\u002Fdynalang)]\n- [**TAD**] 面向强化学习任务泛化的任务感知 Dreamer。**`arXiv 23.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.05092)]\n### 2022\n- [**TD-MPC**] 用于模型预测控制的时序差分学习。**`ICML 22`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.04955)][[代码](https:\u002F\u002Fgithub.com\u002Fnicklashansen\u002Ftdmpc)]\n- **DreamerPro**: 基于原型表征的无重建的基于模型的强化学习。**`ICML 22`** [[论文](https:\u002F\u002Fproceedings.mlr.press\u002Fv162\u002Fdeng22a\u002Fdeng22a.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Ffdeng18\u002Fdreamer-pro)]\n- **DayDreamer**: 用于物理机器人学习的世界模型。**`CoRL 22`** [[论文](https:\u002F\u002Fproceedings.mlr.press\u002Fv205\u002Fwu23c\u002Fwu23c.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Fdanijar\u002Fdaydreamer)]\n- 从像素中进行深度层次规划。**`NIPS 22`** [[论文](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002Fa766f56d2da42cae20b5652970ec04ef-Paper-Conference.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Fdanijar\u002Fdirector)]\n- **Iso-Dream**: 在世界模型中隔离并利用不可控的视觉动态。**`NIPS 22 Spotlight`** [[论文](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002F9316769afaaeeaad42a9e3633b14e801-Paper-Conference.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Fpanmt\u002FIso-Dream)]\n- **DreamingV2**: 无需重建的离散世界模型强化学习。**`arXiv 22.3`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.00494)]\n### 2021\n- [**DreamerV2**] 使用离散世界模型掌握 Atari 游戏。**`ICLR 21`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.02193)] [[TensorFlow 代码](https:\u002F\u002Fgithub.com\u002Fdanijar\u002Fdreamerv2)] [[PyTorch 代码](https:\u002F\u002Fgithub.com\u002Fjsikyoon\u002Fdreamer-torch)]\n- **Dreaming**: 通过潜在想象实现无重建的基于模型的强化学习。**`ICRA 21`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.14535)]\n### 2020\n- [**DreamerV1**] 梦想即控制：通过潜在想象学习行为。**`ICLR 20`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1912.01603)] [[TensorFlow 代码](https:\u002F\u002Fgithub.com\u002Fdanijar\u002Fdreamer)] [[PyTorch 代码](https:\u002F\u002Fgithub.com\u002Fjuliusfrost\u002Fdreamer-pytorch)]\n- [**Plan2Explore**] 通过自监督世界模型进行探索式规划。**`ICML 20`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.05960)] [[TensorFlow 代码](https:\u002F\u002Fgithub.com\u002Framanans1\u002Fplan2explore)] [[PyTorch 
代码](https:\u002F\u002Fgithub.com\u002Fyusukeurakami\u002Fplan2explore-pytorch)]\n\n### 2018\n* 环境模型。**`NIPS 2018 口头报告`** [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.10122)]","# Awesome-World-Model 快速上手指南\n\n**Awesome-World-Model** 并非一个可直接运行的单一软件库，而是一个专注于**自动驾驶与世界模型（World Models）**领域的开源论文、代码库、综述及基准测试的精选合集。本指南将帮助开发者快速利用该资源追踪前沿技术并获取相关模型的代码。\n\n## 环境准备\n\n由于本仓库主要收录的是不同研究团队的独立项目，因此没有统一的系统要求。在运行具体模型前，请确保满足以下通用前置条件：\n\n*   **操作系统**：推荐 Linux (Ubuntu 18.04\u002F20.04\u002F22.04) 或 macOS。Windows 用户建议使用 WSL2。\n*   **Python 版本**：建议 Python 3.8 - 3.10（具体取决于目标模型的要求）。\n*   **硬件要求**：大多数世界模型训练和推理需要 NVIDIA GPU (推荐显存 ≥ 16GB，如 RTX 3090\u002F4090 或 A100)。\n*   **基础依赖**：\n    *   Git\n    *   CUDA Toolkit (版本需与 PyTorch 匹配)\n    *   Conda 或 Mamba (推荐用于管理不同项目的虚拟环境)\n\n## 安装步骤\n\n本仓库本身无需“安装”，只需克隆即可浏览列表。若要运行其中收录的具体模型（如 `UniFuture`, `HERMES` 等），请按以下步骤操作：\n\n### 1. 克隆本仓库\n获取最新的论文列表和代码链接：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FLMD0311\u002FAwesome-World-Model.git\ncd Awesome-World-Model\n```\n\n### 2. 选择并克隆目标模型\n在 `README.md` 的 **Papers** 部分找到你感兴趣的模型（例如 **UniFuture**），点击其 **Code** 链接进入对应仓库。以 `UniFuture` 为例：\n\n```bash\n# 示例：克隆 UniFuture 项目\ngit clone https:\u002F\u002Fgithub.com\u002Fdk-liang\u002FUniFuture.git\ncd UniFuture\n```\n\n### 3. 配置虚拟环境与依赖\n**注意**：每个子项目都有独立的 `requirements.txt` 或 `environment.yml`，请务必在该项目目录下安装。\n\n```bash\n# 创建虚拟环境 (以 conda 为例)\nconda create -n world_model python=3.9\nconda activate world_model\n\n# 安装 PyTorch (推荐使用国内镜像源加速)\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n\n# 安装项目特定依赖\npip install -r requirements.txt\n```\n\n> **💡 国内加速建议**：\n> 在安装依赖时，若遇到网络缓慢，可临时使用清华或阿里镜像源：\n> ```bash\n> pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 基本使用\n\n由于这是一个论文合集，\"使用\"通常指复现某个具体模型。以下是基于列表中典型项目（如 **UniFuture** 或 **HERMES**）的通用使用流程：\n\n### 1. 数据准备\n大多数自动驾驶世界模型需要特定的数据集（如 nuScenes, Argoverse 2）。\n*   查阅目标项目 README 中的 \"Data Preparation\" 章节。\n*   下载数据集并整理到指定目录结构。\n\n### 2. 模型推理 (Inference)\n大多数项目提供预训练权重。以下是一个典型的推理命令示例（具体参数请参考各子项目文档）：\n\n```bash\n# 示例：运行 UniFuture 进行未来场景生成\npython tools\u002Finfer.py \\\n    --config configs\u002Funifuture_nuscenes.py \\\n    --checkpoint checkpoints\u002Funifuture.pth \\\n    --input_data data\u002Fnuscenes\u002Fsamples \\\n    --output_dir outputs\u002Fpredictions\n```\n\n### 3. 模型训练 (Training)\n若需从头训练或微调，通常使用如下命令：\n\n```bash\n# 示例：启动分布式训练\npython -m torch.distributed.launch --nproc_per_node=4 train.py \\\n    --config configs\u002Ftrain_config.py \\\n    --data_root \u002Fpath\u002Fto\u002Fdataset\n```\n\n### 4. 
查阅最新论文\n若你想了解最新的技术动态而非立即运行代码，可直接访问仓库中的论文链接：\n*   **综述类**：查看 `Survey` 章节下的 arXiv 链接（如 *The Role of World Models in Shaping Autonomous Driving*）。\n*   **最新模型**：查看 `2026` 或 `2025` 章节，直接点击 **Paper** 阅读算法细节，点击 **Project** 查看演示视频。\n\n---\n**提示**：发现遗漏的优秀论文或代码？欢迎通过仓库的 [Issues](https:\u002F\u002Fgithub.com\u002FLMD0311\u002FAwesome-World-Model\u002Fissues\u002Fnew) 或 [Pull Requests](https:\u002F\u002Fgithub.com\u002FLMD0311\u002FAwesome-World-Model\u002Fblob\u002Fmain\u002FContributionGuidelines.md) 进行贡献。","某自动驾驶初创公司的算法团队正在研发新一代端到端驾驶系统，急需评估并集成最新的世界模型（World Model）以提升车辆在复杂路况下的预测与规划能力。\n\n### 没有 Awesome-World-Model 时\n- **文献检索效率低下**：研究人员需在 arXiv、Google Scholar 等多个平台手动筛选海量论文，难以区分哪些是真正针对自动驾驶场景的世界模型，耗时数周仍可能遗漏关键成果。\n- **基准对比困难**：由于缺乏统一的评测标准和数据集链接，团队无法快速复现不同模型的性能，导致技术选型主要依靠直觉而非数据支撑。\n- **前沿动态滞后**：社区最新的研讨会（如 CVPR Workshop）和挑战赛信息分散，团队容易错过像\"3D 占用预测”或\"4D 未来生成”等突破性方向，研发路线存在盲区。\n- **复现门槛高**：许多论文未公开代码或缺乏清晰的实现指引，工程师在尝试复现时常常陷入环境配置和数据处理的黑洞，严重拖慢迭代进度。\n\n### 使用 Awesome-World-Model 后\n- **一站式资源聚合**：团队直接通过该清单获取了经过筛选的自动驾驶世界模型论文库，包括 HERMES、UniFuture 等 SOTA 方法，将调研周期从数周缩短至两天。\n- **标准化评测参考**：借助列表中整理的 Benchmark 和相关挑战赛（如 World Model Bench），团队迅速建立了内部评估体系，量化对比了各模型在物理合理性和未来预测上的表现。\n- **紧跟学术前沿**：通过追踪列表更新的研讨会和挑战赛信息，团队及时引入了“混合记忆动态视频模型”等新思路，优化了长尾场景下的决策逻辑。\n- **加速工程落地**：清单提供的开源代码链接和引用指南，帮助工程师快速跑通基线模型，并将重点从“找代码”转移到“改架构”，显著提升了研发效能。\n\nAwesome-World-Model 不仅是一个论文列表，更是连接学术前沿与工业落地的桥梁，让自动驾驶团队能在瞬息万变的技术浪潮中精准导航、高效迭代。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FLMD0311_Awesome-World-Model_6c5f9378.png","LMD0311","Xin Zhou","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FLMD0311_1faf40a8.jpg",null,"Huazhong University of Science & Technology","Wuhan, Hubei Province, China","THELMDOFZHOUXIN","https:\u002F\u002Fgithub.com\u002FLMD0311",1954,77,"2026-04-05T22:36:58",1,"","未说明",{"notes":90,"python":88,"dependencies":91},"该仓库（Awesome-World-Model）是一个用于记录、跟踪和基准测试自动驾驶及机器人领域世界模型（World Models）的论文和资源列表（Survey\u002FAwesome List），本身不是一个可执行的软件工具或代码库，因此 README 中未包含具体的运行环境、硬件需求或依赖库信息。如需运行列表中提及的具体模型（如 UniFuture, HERMES 等），请访问各模型对应的独立项目链接查看其具体环境要求。",[],[15,14,13,62],[94,95,96,97,98,99,100,101,102,103],"autonomous-driving","autonomous-vehicles","computer-vision","world-model","future-predict","artificial-intelligence","artificial-intelligence-algorithms","awesome","deep-learning","robotics","2026-03-27T02:49:30.150509","2026-04-06T14:05:08.320115",[],[]]