[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-HCPLab-SYSU--Embodied_AI_Paper_List":3,"tool-HCPLab-SYSU--Embodied_AI_Paper_List":64},[4,17,27,35,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,43,44,45,15,46,26,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,46],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[26,14,13,46],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":79,"stars":82,"forks":83,"last_commit_at":84,"license":79,"difficulty_score":85,"env_os":86,"env_gpu":87,"env_ram":87,"env_deps":88,"category_tags":91,"github_topics":92,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":103,"updated_at":104,"faqs":105,"releases":135},2137,"HCPLab-SYSU\u002FEmbodied_AI_Paper_List","Embodied_AI_Paper_List","[Embodied-AI-Survey-2025] Paper List and Resource Repository for Embodied AI","Embodied_AI_Paper_List 是由中山大学 HCPLab 与鹏城实验室联合维护的具身智能（Embodied AI）领域论文与资源汇总库。该项目旨在解决具身智能研究方向分散、文献更新迅速导致研究者难以全面追踪前沿进展的痛点，通过系统化的整理，为社区提供一站式的知识导航。\n\n资源库内容涵盖具身机器人、仿真模拟器、感知、交互、智能体架构以及虚实迁移等核心议题，并特别关注多模态大模型与世界模型在其中的应用。除了按时间顺序持续更新的精选论文列表外，项目还配套了发表在 IEEE\u002FASME Transactions on Mechatronics 上的综述文章，深入剖析了现有方法的范式、数据集及未来挑战。\n\n该工具非常适合人工智能研究人员、高校师生以及从事智能机器人开发的工程师使用。无论是希望快速入门的新手，还是需要把握最新技术动态的资深专家，都能从中高效获取高质量参考文献和项目代码。其独特的亮点在于不仅提供清单，更通过专业的分类体系和定期的维护更新，帮助开发者理清从理论到落地的完整技术脉络，是探索具身通用智能不可或缺的参考指南。","\u003Cbr>\n\u003Cp align=\"center\">\n\u003Ch1 align=\"center\">\u003Cstrong>Paper List  and Resource Repository for Embodied AI\u003C\u002Fstrong>\u003C\u002Fh1>\n  \u003Cp align=\"center\">\n    \u003Ca href='https:\u002F\u002Fwww.sysu-hcp.net\u002F' target='_blank'>HCPLab\u003C\u002Fa>&emsp;\n    \u003Cbr>\n    SYSU HCP Lab and Pengcheng Laboratory\n    \u003Cbr>\n  \u003C\u002Fp>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHCPLab-SYSU_Embodied_AI_Paper_List_readme_1a5768392322.jpg\" width=\"250\">\n\u003C\u002Fp>\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2407.06886-orange)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.06886)\n[![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-%F0%9F%93%96-yellow)](https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List\u002Fblob\u002Fmain\u002FEmbodiedAI_Review.pdf)\n[![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-%F0%9F%9A%80-pink)](https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List)\n\n#### We appreciate any useful suggestions for improvement of this paper list or survey from peers. Please raise issues or send an email to **liuy856@mail.sysu.edu.cn** and **chen867820261@gmail.com**. Thanks for your cooperation! We also welcome your pull requests for this project!\n\n![Teaser](teaser.png \"demo\")\n\n[**Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI, IEEE\u002FASME Transactions on Mechatronics 2025**](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06886)    \n  [Yang Liu](https:\u002F\u002Fyangliu9208.github.io), Weixing Chen, Yongjie Bai, [Xiaodan Liang](https:\u002F\u002Flemondan.github.io), [Guanbin Li](http:\u002F\u002Fguanbinli.com\u002F), [Wen Gao](https:\u002F\u002Fidm.pku.edu.cn\u002Finfo\u002F1017\u002F1041.htm), [Liang Lin](http:\u002F\u002Fwww.linliang.net\u002F)     \n\n\u003Cp align=\"center\">\n\u003Cimg src=\".\u002FSurvey.png\" width=\"800\">\n\u003C\u002Fp>  \n\n## 🏠 About\n\nEmbodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications (e.g., intelligent mechatronics systems, smart manufacturing) that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities, making them a promising architecture for embodied agents. In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI. Our analysis firstly navigates through the forefront of representative works of embodied robots and simulators, to fully understand the research focuses and their limitations. Then, we analyze four main research targets: 1) embodied perception, 2) embodied interaction, 3) embodied agent, and 4) sim-to-real adaptation, covering state-of-the-art methods, essential paradigms, and comprehensive datasets. Additionally, we explore the complexities of MLMs in virtual and real embodied agents, highlighting their significance in facilitating interactions in digital and physical environments.  Finally, we summarize the challenges and limitations of embodied AI and discuss potential future directions. We hope this survey will serve as a foundational reference for the research community. \n\n## :collision: Update Log \n* [2026.03.11] Updated paper list with latest 2025-2026 papers across all categories!\n* [2025.05.27] Our Embodied AI Survey paper is accepted by IEEE\u002FASME Transactions on Mechatronics!\n* [2024.09.08] We are constantly updating the Dataset section!\n* [2024.08.31] We added the Datasets section and classified the useful projects!\n* [2024.08.19] To make readers focus on newest works, we have arranged papers in chronological order!   \n* [2024.08.02] We regularly update the project weekly!   \n* [2024.07.29] We have updated the project!   \n* [2024.07.22] We have updated the paper list and other useful embodied projects!   \n* [2024.07.10] We release the first version of the survey on Embodied AI [PDF](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06886)!\n* [2024.07.10] We release the first version of the paper list for Embodied AI. This page is continually updating!\n\n\n\n## \u003Ca id=\"table-of-contents\">📚 Table of Contents \u003C\u002Fa>\n\n- [Books & Surveys](#books-surveys)\n- [Embodied Simulators](#simulators)\n- [Embodied Perception](#perception)\n- [Embodied Interaction](#interaction)\n- [Embodied Agent](#agent)\n- [Sim-to-Real Adaptation](#sim-to-real)\n- [Datasets](#datasets)\n\n## \u003Ca id=\"books-surveys\"> Books & Surveys \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa> \n\n* **Self-evolving Embodied AI**, arXiv:2602.04411, 2026       \nTongtong Feng, Xin Wang, Wenwu Zhu.        \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2602.04411)]\n\n* **Towards Robust and Secure Embodied AI: A Survey on Vulnerabilities and Attacks**, arXiv:2502.13175, 2025       \nWenpeng Xing, Minghao Li, Mohan Li, Meng Han.        \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.13175)]\n\n* **From Screens to Scenes: A Survey of Embodied AI in Healthcare**, arXiv:2501.07468, 2025       \nYihao Liu, Xu Cao, Tingting Chen, Yankai Jiang, Junjie You, Minghua Wu, Xiaosong Wang, Mengling Feng, Yaochu Jin, Jintai Chen.        \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.07468)]\n\n* **Semantic Mapping in Indoor Embodied AI -- A Survey**, arXiv:2501.05750, 2025       \nSonia Raychaudhuri, Angel X. Chang.        \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.05750)]\n\n* **A Comprehensive Survey on World Models for Embodied AI**, arXiv:2510.16732, 2025       \nXinqing Li, Xin He, Le Zhang, Min Wu, Xiaoli Li, Yun Liu.        \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.16732)]\n\n* **Generative Artificial Intelligence in Robotic Manipulation: A Survey**, arXiv:2503.03464, 2025       \nKun Zhang, Peng Yun, Jun Cen, Junhao Cai, Didi Zhu, Hangjie Yuan, Chao Zhao, Tao Feng, Michael Yu Wang, Qifeng Chen, Jia Pan, Wei Zhang, Bo Yang, Hua Chen.        \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.03464)]\n\n* **Dexterous Manipulation through Imitation Learning: A Survey**, arXiv:2504.03515, 2025       \nShan An, Ziyu Meng, Chao Tang, Yuning Zhou, Tengyu Liu, Fangqiang Ding, Shufang Zhang, Yao Mu, Ran Song, Wei Zhang, Zeng-Guang Hou, Hong Zhang.        \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.03515)]\n\n* **Humanoid Robots and Humanoid AI: Review, Perspectives and Directions**, arXiv:2405.15775, 2025       \nLongbing Cao.        \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.15775)]\n\n* **A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI**, arXiv:2505.01458, 2025       \nLik Hang Kenny Wong, Xueyang Kang, Kaixin Bai, Jianwei Zhang.        \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.01458)]\n\n* **Multimodal Large Models: The New Paradigm of Artificial General Intelligence**, Publishing House of Electronics Industry (PHE), 2024       \nYang Liu, Liang Lin             \n[[Page](https:\u002F\u002Fhcplab-sysu.github.io\u002FBook-of-MLM\u002F)]      \n\n* **Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI**, arXiv:2407.06886, 2024       \nYang Liu, Weixing Chen, Yongjie Bai, Guanbin Li, Wen Gao, Liang Lin.        \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06886)]    \n\n* **All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents**, arXiv:2408.10899, 2024      \nZhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang           \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.10899)][[Project](https:\u002F\u002Fimaei.github.io\u002Fproject_pages\u002Fario\u002F)]\n\n* **Embodied intelligence toward future smart manufacturing in the era of AI foundation model**, IEEE\u002FASME Transactions on Mechatronics, 2024         \nLei Ren, Jiabao Dong, Shuai Liu, Lin Zhang, and Lihui Wang.         \n[[Paper](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10697107)]\n\n* **A Survey of Embodied Learning for Object-Centric Robotic Manipulation**, arXiv:2408.11537, 2024   \nYing Zheng, Lei Yao, Yuejiao Su, Yi Zhang, Yi Wang, Sicheng Zhao, Yiyi Zhang, Lap-Pui Chau    \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.11537)]\n\n* **Teleoperation of Humanoid Robots: A Survey**, IEEE Transactions on Robotics, 2024       \nKourosh Darvish, Luigi Penco, Joao Ramos, Rafael Cisneros, Jerry Pratt, Eiichi Yoshida, Serena Ivaldi, Daniele Pucci.        \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2301.04317)]\n\n* **A Survey on Vision-Language-Action Models for Embodied AI**, arXiv:2405.14093, 2024   \nYueen Ma, Zixing Song, Yuzheng Zhuang, Jianye Hao, Irwin King    \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.14093)]\n\n* **Towards Generalist Robot Learning from Internet Video: A Survey**, arXiv:2404.19664, 2024   \nMcCarthy, Robert, Daniel CH Tan, Dominik Schmidt, Fernando Acero, Nathan Herr, Yilun Du, Thomas G. Thuruthel, and Zhibin Li.  \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.19664)]\n\n* **A Survey on Robotics with Foundation Models: toward Embodied AI**, arXiv:2402.02385, 2024    \nZhiyuan Xu, Kun Wu, Junjie Wen, Jinming Li, Ning Liu, Zhengping Che, and Jian Tang.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.02385)]     \n\n* **Toward general-purpose robots via foundation models: A survey and meta-analysis**, Machines, 2023   \nYafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Shibo Zhao, Yu Quan Chong, Chen Wang, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Zsolt Kira, Fei Xia, Yonatan Bisk.            \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.08782)]    \n\n* **Deformable Object Manipulation in Caregiving Scenarios: A Review**, Machines, 2023   \nLiman Wang, Jihong Zhu.  \n[[Paper]https:\u002F\u002Fwww.mdpi.com\u002F2075-1702\u002F11\u002F11\u002F1013]\n\n* **A survey of embodied ai: From simulators to research tasks**, IEEE Transactions on Emerging Topics in Computational Intelligence, 2022    \nJiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, Cheston Tan    \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.04918)]    \n\n* **The development of embodied cognition: Six lessons from babies**, Artificial life, 2005    \nLinda Smith, Michael Gasser    \n[[Paper](https:\u002F\u002Fcogdev.sitehost.iu.edu\u002Flabwork\u002F6_lessons.pdf)]    \n\n* **Embodied artificial intelligence: Trends and challenges**, Lecture notes in computer science, 2004    \nRolf Pfeifer, Fumiya Iida   \n[[Paper](https:\u002F\u002Fpeople.csail.mit.edu\u002Fiida\u002Fpapers\u002FPfeiferIidaEAIDags.pdf)]     \n\n## \u003Ca id=\"simulators\"> Embodied Simulators \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa>\n### General Simulator\n\n* **Design and use paradigms for gazebo, an open-source multi-robot simulator**, IROS, 2004        \nKoenig, Nathan, Andrew, Howard.      \n[[page](https:\u002F\u002Fciteseerx.ist.psu.edu\u002Fdocument?repid=rep1&type=pdf&doi=79f91c1c95271a075b91e9fdca43d6c31e4cbe17)]\n\n* **Nvidia isaac sim: Robotics simulation and synthetic data**, NVIDIA, 2023    \n[[page](https:\u002F\u002Fdeveloper.nvidia.com\u002Fisaac\u002Fsim)]    \n\n* **Aerial Gym -- Isaac Gym Simulator for Aerial Robots**, ArXiv, 2023    \nMihir Kulkarni and Theodor J. L. Forgaard and Kostas Alexis.     \n[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16510)]     \n\n* **Webots: open-source robot simulator**, 2018      \nCyberbotics      \n[[page](https:\u002F\u002Fcyberbotics.com\u002Fdoc\u002Freference\u002Findex), [code](https:\u002F\u002Fgithub.com\u002Fcyberbotics\u002Fwebots)]     \n\n* **Unity: A general platform for intelligent agents**, ArXiv, 2020    \nJuliani, Arthur, Vincent-Pierre, Berges, Ervin, Teng, Andrew, Cohen, Jonathan, Harper, Chris, Elion, Chris, Goy, Yuan, Gao, Hunter, Henry, Marwan, Mattar, Danny, Lange.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1809.02627)]    \n\n* **AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles**, Field and Service Robotics, 2017    \nShital Shah, , Debadeepta Dey, Chris Lovett, Ashish Kapoor.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1705.05065.pdf%20http:\u002F\u002Farxiv.org\u002Fabs\u002F1705.05065)]    \n\n* **Pybullet, a python module for physics simulation for games, robotics and machine learning**, 2016     \nCoumans, Erwin, Yunfei, Bai.     \n[[page](https:\u002F\u002Fgithub.com\u002Fbulletphysics\u002Fbullet3)]       \n\n* **V-REP: A versatile and scalable robot simulation framework**, IROS, 2013      \nRohmer, Eric, Surya PN, Singh, Marc, Freese.     \n[[page](https:\u002F\u002Fcoppeliarobotics.com\u002FcoppeliaSim_v-rep_iros2013.pdf)]     \n\n* **MuJoCo: A physics engine for model-based control**, IROS, 2012    \nTodorov, Emanuel, Tom, Erez, Yuval, Tassa.      \n[[page](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F6386109\u002F), [code](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Fmujoco)]     \n\n* **Modular open robots simulation engine: Morse**, ICRA, 2011       \nEcheverria, Gilberto and Lassabe, Nicolas and Degroote, Arnaud and Lemaignan, S{\\'e}verin     \n[[page](https:\u002F\u002Fwww.openrobots.org\u002Fmorse\u002Fmaterial\u002Fmedia\u002Fpdf\u002Fpaper-icra.pdf)]    \n\n\n### Real-Scene Based Simulators\n\n* **RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning**, arXiv, 2025  \nHaoran Geng, Feishi Wang, Songlin Wei, Yuyang Li, Bangjun Wang, Boshi An, Charlie Tianyue Cheng, Haozhe Lou, Peihao Li, Yen-Jen Wang, Yutong Liang, Dylan Goetting, Chaoyi Xu, Haozhe Chen, Yuxi Qian, Yiran Geng, Jiageng Mao, Weikang Wan, Mingtong Zhang, Jiangran Lyu, Siheng Zhao, Jiazhao Zhang, Jialiang Zhang, Chengyang Zhao, Haoran Lu, Yufei Ding, Ran Gong, Yuran Wang, Yuxuan Kuang, Ruihai Wu, Baoxiong Jia, Carlo Sferrazza, Hao Dong, Siyuan Huang, Yue Wang, Jitendra Malik, Pieter Abbeel.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.18904)]\n\n* **Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning**, arXiv, 2025  \nMayank Mittal, Pascal Roth, James Tigue, Antoine Richard, Octi Zhang, Peter Du, Antonio Serrano-Muñoz, Xinjie Yao, René Zurbrügg, Nikita Rudin, Lukasz Wawrzyniak, Milad Rakhsha, Alain Denzler, Eric Heiden, Ales Borovicka, Ossama Ahmed, Iretiayo Akinola, Abrar Anwar, Mark T. Carlson, Ji Yuan Feng, Animesh Garg.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2511.04831)]\n\n* **InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction**, arxiv, 2024  \nPengzhen Ren, Min Li, Zhen Luo, Xinshuai Song, Ziwei Chen, Weijia Liufu, Yixuan Yang, Hao Zheng, Rongtao Xu, Zitong Huang, Tongsheng Ding, Luyang Xie, Kaidong Zhang, Changfei Fu, Yang Liu, Liang Lin, Feng Zheng, Xiaodan Liang.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.05789)]\n\n\n* **ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI**, arxiv, 2024  \nStone Tao, Fanbo Xiang, Arth Shukla, Yuzhe Qin, Xander Hinrichsen, Xiaodi Yuan, Chen Bao, Xinsong Lin, Yulin Liu, Tse-kai Chan, Yuan Gao, Xuanlin Li, Tongzhou Mu, Nan Xiao, Arnav Gurha, Zhiao Huang, Roberto Calandra, Rui Chen, Shan Luo, Hao Su.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.00425)]\n\n* **PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI**, arxiv, 2024  \nYang, Yandan, Baoxiong, Jia, Peiyuan, Zhi, Siyuan, Huang.   \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_PhyScene_Physically_Interactable_3D_Scene_Synthesis_for_Embodied_AI_CVPR_2024_paper.pdf)]\n\n* **Holodeck: Language Guided Generation of 3D Embodied AI Environments**, CVPR, 2024  \nYue Yang, , Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark.   \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_Holodeck_Language_Guided_Generation_of_3D_Embodied_AI_Environments_CVPR_2024_paper.pdf)]\n\n* **RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation**, arXiv, 2023  \nWang, Yufei, Zhou, Xian, Feng, Chen, Tsun-Hsuan, Wang, Yian, Wang, Katerina, Fragkiadaki, Zackory, Erickson, David, Held, Chuang, Gan.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.01455)]\n\n* **ProcTHOR: Large-Scale Embodied AI Using Procedural Generation**, NeurIPS, 2022  \nDeitke, VanderBilt, Herrasti, Weihs, Salvador, Ehsani, Han, Kolve, Farhadi, Kembhavi, Mottaghi   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2206.06994)]\n\n* **ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation**, NeurIPS, 2021  \nGan, Chuang, J., Schwartz, Seth, Alter, Martin, Schrimpf, James, Traer, JulianDe, Freitas, Jonas, Kubilius, Abhishek, Bhandwaldar, Nick, Haber, Megumi, Sano, Kuno, Kim, Elias, Wang, Damian, Mrowca, Michael, Lingelbach, Aidan, Curtis, KevinT., Feigelis, DavidM., Bear, Dan, Gutfreund, DavidD., Cox, JamesJ., DiCarlo, JoshH., McDermott, JoshuaB., Tenenbaum, Daniel, Yamins.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.04954)]\n\n* **iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes**, IROS, 2021  \nShen, Bokui, Fei, Xia, Chengshu, Li, Roberto, Martín-Martín, Linxi, Fan, Guanzhi, Wang, Claudia, Pérez-D’Arpino, Shyamal, Buch, Sanjana, Srivastava, Lyne, Tchapmi, Micael, Tchapmi, Kent, Vainio, Josiah, Wong, Li, Fei-Fei, Silvio, Savarese.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2012.02924)]\n\n* **SAPIEN: A SimulAted Part-Based Interactive ENvironment**, CVPR, 2020  \nXiang, Fanbo, Yuzhe, Qin, Kaichun, Mo, Yikuan, Xia, Hao, Zhu, Fangchen, Liu, Minghua, Liu, Hanxiao, Jiang, Yifu, Yuan, He, Wang, Li, Yi, Angel X., Chang, Leonidas J., Guibas, Hao, Su.   \n[[page](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2020\u002Fpapers\u002FXiang_SAPIEN_A_SimulAted_Part-Based_Interactive_ENvironment_CVPR_2020_paper.pdf)]\n\n* **Habitat: A Platform for Embodied AI Research**, ICCV, 2019  \nSavva, Manolis, Abhishek, Kadian, Oleksandr, Maksymets, Yili, Zhao, Erik, Wĳmans, Bhavana, Jain, Julian, Straub, Jia, Liu, Vladlen, Koltun, Jitendra, Malik, Devi, Parikh, Dhruv, Batra.   \n[[page](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_ICCV_2019\u002Fpapers\u002FSavva_Habitat_A_Platform_for_Embodied_AI_Research_ICCV_2019_paper.pdf)]\n\n* **VirtualHome: Simulating Household Activities Via Programs**, CVPR, 2018  \nPuig, Xavier, Kevin, Ra, Marko, Boben, Jiaman, Li, Tingwu, Wang, Sanja, Fidler, Antonio, Torralba.   \n[[page](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2018\u002Fpapers\u002FPuig_VirtualHome_Simulating_Household_CVPR_2018_paper.pdf)]\n\n* **Matterport3D: Learning from RGB-D Data in Indoor Environments**, 3DV, 2017  \nChang, Angel, Angela, Dai, Thomas, Funkhouser, Maciej, Halber, Matthias, Niebner, Manolis, Savva, Shuran, Song, Andy, Zeng, Yinda, Zhang.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1709.06158)]\n\n* **AI2-THOR: An Interactive 3D Environment for Visual AI**. arXiv, 2017  \nKolve, Eric, Roozbeh, Mottaghi, Daniel, Gordon, Yuke, Zhu, Abhinav, Gupta, Ali, Farhadi.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.05474)]\n\n## \u003Ca id=\"perception\">  Embodied Perception \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa>\n### Active Visual Exploration\n* **Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection**, Arxiv, 2025.  \nJuil Koo*, Daehyeon Choi*, Sangwoo Youn*, Phillip Y. Lee, Minhyuk Sung.  \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.13250)]\n\n* **ActiveGAMER: Active GAussian Mapping through Efficient Rendering**, CVPR, 2025.  \nLiyan Chen, Huangying Zhan, Kevin Chen, Xiangyu Xu, Qingan Yan, Changjiang Cai, Yi Xu.  \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.06897)]\n\n* **ActiveGS: Active Scene Reconstruction Using Gaussian Splatting**, RA-L, 2025.  \nLiren Jin, Xingguang Zhong, Yue Pan, Jens Behley, Cyrill Stachniss, Marija Popović.  \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.17769)]\n\n* **RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics**, arxiv, 2025.  \nEnshen Zhou, Cheng Chi, Yibo Li, Jingkun An, Jiayuan Zhang, Shanyu Rong, Yi Han, Yuheng Ji, Mengzhen Liu, Pengwei Wang, Zhongyuan Wang, Lu Sheng, Shanghang Zhang.  \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.13660)] [[Project](https:\u002F\u002Fzhoues.github.io\u002FRoboTracer\u002F)]\n\n* **RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics**, arxiv, 2025.  \nEnshen Zhou, Jingkun An, Cheng Chi, Yi Han, Shanyu Rong, Chi Zhang, Pengwei Wang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, Shanghang Zhang.  \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04308)] [[Project](https:\u002F\u002Fzhoues.github.io\u002FRoboRefer\u002F)]\n\n* **3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians**, arxiv, 2025.     \nZeming Wei, Junyi Lin, Yang Liu, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.11218)] [[Project](https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002F3DAffordSplat)]    \n\n* **Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection**, CVPR, 2025.  \nEnshen Zhou, Qi Su, Cheng Chi, Zhizheng Zhang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, He Wang.  \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.04455)] [[Project](https:\u002F\u002Fzhoues.github.io\u002FCode-as-Monitor\u002F)]\n\n* **SnapMem: Snapshot-based 3D Scene Memory for Embodied Exploration and Reasoning**, arxiv, 2024.    \nYuncong Yang, Han Yang, Jiachen Zhou, Peihao Chen, Hongxin Zhang, Yilun Du, Chuang Gan.      \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.17735)]\n\n* **AIR-Embodied: An Efficient Active 3DGS-based Interaction and Reconstruction Framework with Embodied Large Language Model**, arxiv, 2024.    \nZhenghao Qi, Shenghai Yuan, Fen Liu, Haozhi Cao, Tianchen Deng, Jianfei Yang, Lihua Xie.      \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.16019)]\n\n* **BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation**, CVPR, 2024.    \nYunhao Ge, Yihe Tang, Jiashu Xu, Cem Gokmen, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang, Ruohan Zhang, Li Fei-Fei, Jiajun Wu.         \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FGe_BEHAVIOR_Vision_Suite_Customizable_Dataset_Generation_via_Simulation_CVPR_2024_paper.pdf)]\n\n* **Coarse-to-Fine Detection of Multiple Seams for Robotic Welding**, arxiv, 2024.    \nPengkun Wei, Shuo Cheng, Dayou Li, Ran Song, Yipeng Zhang, Wei Zhang.      \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.10710)]      \n\n* **Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception**, CVPR, 2024.    \nFan, Lei, Mingfu, Liang, Yunxuan, Li, Gang, Hua, Ying, Wu.      \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FFan_Evidential_Active_Recognition_Intelligent_and_Prudent_Open-World_Embodied_Perception_CVPR_2024_paper.pdf)]\n\n* **SpatialBot: Precise Spatial Understanding with Vision Language Models**, arxiv, 2024.    \nWenxiao Cai, Yaroslav Ponomarenko, Jianhao Yuan, Xiaoqi Li, Wankou Yang, Hao Dong, Bo Zhao.      \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.13642)]\n\n* **Embodied Uncertainty-Aware Object Segmentations**, IROS, 2024.      \nXiaolin Fang, Leslie Pack Kaelbling, Tom ́as Lozano-P ́erez.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.04760)]\n\n* **Point Transformer V3: Simpler Faster Stronger**, CVPR, 2024.\nWu, Xiaoyang, Li, Jiang, Peng-Shuai, Wang, Zhijian, Liu, Xihui, Liu, Yu, Qiao, Wanli, Ouyang, Tong, He, Hengshuang, Zhao.    \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWu_Point_Transformer_V3_Simpler_Faster_Stronger_CVPR_2024_paper.pdf)]    \n\n* **PointMamba: A Simple State Space Model for Point Cloud Analysis**, arXiv, 2024.   \nLiang, Dingkang, Xin, Zhou, Xinyu, Wang, Xingkui, Zhu, Wei, Xu, Zhikang, Zou, Xiaoqing, Ye, Xiang, Bai.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.10739)]    \n\n* **Point Could Mamba: Point Cloud Learning via State Space Model**, arXiv, 2024.    \nZhang, Tao, Xiangtai, Li, Haobo, Yuan, Shunping, Ji, Shuicheng, Yan.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.00762)]    \n\n* **Mamba3d: Enhancing local features for 3d point cloud analysis via state space model**, arXiv, 2024.   \nHan, Xu, Yuan, Tang, Zhaoxuan, Wang, Xianzhi, Li.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.14966)]\n\n* **Gs-slam: Dense visual slam with 3d gaussian splatting**, CVPR, 2024.    \nYan, Chi, Delin, Qu, Dan, Xu, Bin, Zhao, Zhigang, Wang, Dong, Wang, Xuelong, Li.      \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYan_GS-SLAM_Dense_Visual_SLAM_with_3D_Gaussian_Splatting_CVPR_2024_paper.pdf)    \n\n* **GOReloc: Graph-based Object-Level Relocalization for Visual SLAM**, IEEE RAL, 2024.    \nYutong Wang, Chaoyang Jiang, Xieyuanli Chen.      \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.07917)\n\n* **Embodiedscan: A holistic multi-modal 3d perception suite towards embodied ai** CVPR, 2024.    \nWang, Tai, Xiaohan, Mao, Chenming, Zhu, Runsen, Xu, Ruiyuan, Lyu, Peisen, Li, Xiao, Chen, Wenwei, Zhang, Kai, Chen, Tianfan, Xue, others.      \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWang_EmbodiedScan_A_Holistic_Multi-Modal_3D_Perception_Suite_Towards_Embodied_AI_CVPR_2024_paper.pdf)]\n\n* **Neu-nbv: Next best view planning using uncertainty estimation in image-based neural rendering**, IROS, 2023.    \nJin, Liren, Xieyuanli, Chen, Julius, Rückin, Marija, Popovi\\'c.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.01284)]    \n\n* **Off-policy evaluation with online adaptation for robot exploration in challenging environments**, IEEE Robotics and Automation Letters, 2023.   \nHu, Yafei, Junyi, Geng, Chen, Wang, John, Keller, Sebastian, Scherer.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.03140)]    \n\n* **OVD-SLAM: An online visual SLAM for dynamic environments**, IEEE Sensors Journal, 2023.    \nHe, Jiaming, Mingrui, Li, Yangyang, Wang, Hongyu, Wang.     \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10113832)    \n\n* **Transferring implicit knowledge of non-visual object properties across heterogeneous robot morphologies**, ICRA, 2023.    \nTatiya, Gyan, Jonathan, Francis, Jivko, Sinapov.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.06890)]\n\n* **Swin3d: A pretrained transformer backbone for 3d indoor scene understanding**, arXiv, 2023.   \nYang, Yu-Qi, Yu-Xiao, Guo, Jian-Yu, Xiong, Yang, Liu, Hao, Pan, Peng-Shuai, Wang, Xin, Tong, Baining, Guo.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.06906)]\n\n* **Point transformer v2: Grouped vector attention and partition-based pooling**, NeurIPS, 2022.   \nWu, Xiaoyang, Yixing, Lao, Li, Jiang, Xihui, Liu, Hengshuang, Zhao.     \n[[page](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002Fd78ece6613953f46501b958b7bb4582f-Paper-Conference.pdf)]\n\n* **Rethinking network design and local geometry in point cloud: A simple residual MLP framework**, arXiv, 2022.\nMa, Xu, Can, Qin, Haoxuan, You, Haoxi, Ran, Yun, Fu. \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2202.07123)]    \n\n* **So-slam: Semantic object slam with scale proportional and symmetrical texture constraints**. IEEE Robotics and Automation Letters 7. 2(2022): 4008–4015.  \nLiao, Ziwei, Yutong, Hu, Jiadong, Zhang, Xianyu, Qi, Xiaoyu, Zhang, Wei, Wang.   \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F9705562)\n\n* **SG-SLAM: A real-time RGB-D visual SLAM toward dynamic scenes with semantic and geometric information**, IEEE Transactions on Instrumentation and Measurement 72. (2022): 1–12.      \nCheng, Shuhong, Changhe, Sun, Shĳun, Zhang, Dianfan, Zhang.    \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F9978699)   \n\n* **Point transformer**, ICCV, 2021.\nZhao, Hengshuang, Li, Jiang, Jiaya, Jia, Philip HS, Torr, Vladlen, Koltun.     \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fpapers\u002FZhao_Point_Transformer_ICCV_2021_paper.pdf)]    \n\n* **Pointpillars: Fast encoders for object detection from point clouds**, CVPR, 2019.    \nLang, Alex H, Sourabh, Vora, Holger, Caesar, Lubing, Zhou, Jiong, Yang, Oscar, Beijbom.     \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2019\u002Fpapers\u002FLang_PointPillars_Fast_Encoders_for_Object_Detection_From_Point_Clouds_CVPR_2019_paper.pdf)    \n\n* **4d spatio-temporal convnets: Minkowski convolutional neural networks**, CVPR, 2019.    \nChoy, Christopher, JunYoung, Gwak, Silvio, Savarese.    \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2019\u002Fpapers\u002FChoy_4D_Spatio-Temporal_ConvNets_Minkowski_Convolutional_Neural_Networks_CVPR_2019_paper.pdf)\n\n* **Cubeslam: Monocular 3-d object slam**, IEEE T-RO 35. 4(2019): 925–938  \nYang, Shichao, Sebastian, Scherer.  \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8708251)\n\n* **Hierarchical topic model based object association for semantic SLAM**, IEEE T-VCG 25. 11(2019): 3052–3062  \nZhang, Jianhua, Mengping, Gui, Qichao, Wang, Ruyu, Liu, Junzhe, Xu, Shengyong, Chen.   \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8794595)\n\n* **DS-SLAM: A semantic visual SLAM towards dynamic environments**, IROS, 2018   \nYu, Chao, Zuxin, Liu, Xin-Jun, Liu, Fugui, Xie, Yi, Yang, Qi, Wei, Qiao, Fei.   \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8593691)\n\n* **DynaSLAM: Tracking, mapping, and inpainting in dynamic scenes**, IEEE Robotics and Automation Letters 3. 4(2018): 4076–4083     \nBescos, Berta, José M, Facil, Javier, Civera, José, Neira.   \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8421015)\n\n* **Quadricslam: Dual quadrics from object detections as landmarks in object-oriented slam**, IEEE Robotics and Automation Letters 4. 1(2018): 1–8.  \nNicholson, Lachlan, Michael, Milford, Niko, Sünderhauf.   \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8440105)\n\n* **3d semantic segmentation with submanifold sparse convolutional networks**, CVPR, 2018.    \nGraham, Benjamin, Martin, Engelcke, Laurens, Van Der Maaten.     \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2018\u002Fpapers\u002FGraham_3D_Semantic_Segmentation_CVPR_2018_paper.pdf)]\n\n* **Learning to look around: Intelligently exploring unseen environments for unknown tasks**, CVPR, 2018.   \nJayaraman, Dinesh, Kristen, Grauman.    \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2018\u002Fpapers\u002FJayaraman_Learning_to_Look_CVPR_2018_paper.pdf)]    \n\n* **Multi-view 3d object detection network for autonomous driving**, CVPR, 2017.    \nChen, Xiaozhi, Huimin, Ma, Ji, Wan, Bo, Li, Tian, Xia.     \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2017\u002Fpapers\u002FChen_Multi-View_3D_Object_CVPR_2017_paper.pdf)    \n\n* **Semantic scene completion from a single depth image**, CVPR, 2017.    \nSong, Shuran, Fisher, Yu, Andy, Zeng, Angel X, Chang, Manolis, Savva, Thomas, Funkhouser.     \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2017\u002Fpapers\u002FSong_Semantic_Scene_Completion_CVPR_2017_paper.pdf)    \n\n* **Pointnet: Deep learning on point sets for 3d classification and segmentation**, CVPR, 2017.    \nQi, Charles R, Hao, Su, Kaichun, Mo, Leonidas J, Guibas.     \n[[page](Pointnet: Deep learning on point sets for 3d classification and segmentation)]    \n\n* **Pointnet++: Deep hierarchical feature learning on point sets in a metric space**, NeurIPS, 2017.    \nQi, Charles Ruizhongtai, Li, Yi, Hao, Su, Leonidas J, Guibas.     \n[[page](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2017\u002Ffile\u002Fd8bf84be3800d12f74d8b05e9b89836f-Paper.pdf)]\n\n* **The curious robot: Learning visual representations via physical interactions**, ECCV, 2016.   \nPinto, Lerrel, Dhiraj, Gandhi, Yuanfeng, Han, Yong-Lae, Park, Abhinav, Gupta.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1604.01360)]    \n\n* **Multi-view convolutional neural networks for 3d shape recognition**, ICCV, 2015.    \nSu, Hang, Subhransu, Maji, Evangelos, Kalogerakis, Erik, Learned-Miller.     \n[[page]](https:\u002F\u002Fwww.cv-foundation.org\u002Fopenaccess\u002Fcontent_iccv_2015\u002Fpapers\u002FSu_Multi-View_Convolutional_Neural_ICCV_2015_paper.pdf)    \n\n* **Voxnet: A 3d convolutional neural network for real-time object recognition**, IROS, 2015.    \nMaturana, Daniel, Sebastian, Scherer.     \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F7353481)    \n\n* **ORB-SLAM: a versatile and accurate monocular SLAM system** IEEE T-RO 31. 5(2015): 1147–1163  \nMur-Artal, Raul, Jose Maria Martinez, Montiel, Juan D, Tardos.   \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F7219438\u002F)\n\n* **LSD-SLAM: Large-scale direct monocular SLAM**, ECCV, 2014  \nEngel, Jakob, Thomas, Schops, Daniel, Cremers.  \n[[page]](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-319-10605-2_54)\n\n* **Slam++: Simultaneous localisation and mapping at the level of objects**, CVPR, 2013  \nSalas-Moreno, Renato F, Richard A, Newcombe, Hauke, Strasdat, Paul HJ, Kelly, Andrew J, Davison.   \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2013\u002Fpapers\u002FSalas-Moreno_SLAM_Simultaneous_Localisation_2013_CVPR_paper.pdf)\n\n* **DTAM: Dense tracking and mapping in real-time**, ICCV, 2011  \nNewcombe, Richard A, Steven J, Lovegrove, Andrew J, Davison.  \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F6126513\u002F)\n\n* **MonoSLAM: Real-time single camera SLAM**, IEEE T-PAMI, 2007.  \nDavison, Andrew J, Ian D, Reid, Nicholas D, Molton, Olivier, Stasse.   \n[[page]](http:\u002F\u002Fwww.doc.ic.ac.uk\u002F~ajd\u002FPublications\u002Fdavison_etal_pami2007.pdf)\n\n* **A multi-state constraint Kalman filter for vision-aided inertial navigation**, IROS, 2007  \nMourikis, Anastasios I, Stergios I, Roumeliotis.   \n[[page]](https:\u002F\u002Fintra.engr.ucr.edu\u002F~mourikis\u002Ftech_reports\u002FTR_MSCKF.pdf)\n\n* **Parallel tracking and mapping for small AR workspaces**, ISMAR, 2007  \nKlein, Georg, David, Murray.   \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F4538852\u002F)\n\n### 3D Visual Perception and Grounding\n* **ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding**, CVPR, 2025  \nZhenyang Liu, Yikai Wang, Sixiao Zheng, Tongying Pan, Longfei Liang, Yanwei Fu, Xiangyang Xue.  \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.23297)\n\n* **ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding**, arXiv, 2025  \nAustin T. Wang, ZeMing Gong, Angel X. Chang.  \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.01366)\n\n* **UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation**, ICRA, 2025\nYihe Tang, Wenlong Huang, Yingke Wang, Chengshu Li, Roy Yuan, Ruohan Zhang, Jiajun Wu, Li Fei-Fei  \n[[page]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=an953WOpo2)\n\n* **Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions**, arxiv, 2025  \nHe Zhu, Quyu Kong, Kechun Xu, Xunlong Xia, Bing Deng, Jieping Ye, Rong Xiong, Yue Wang  \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.04744)  \n\n* **3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds**, arxiv, 2025  \nHengshuo Chu, Xiang Deng, Qi Lv, Xiaoyang Chen, Yinchuan Li, Jianye Hao, Liqiang Nie  \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.20041)  \n\n* **SeqAfford: Sequential 3D affordance reasoning via Multimodal Large Language Model**, CVPR, 2025  \nHanqing Wang, Chunlin Yu, Haoyang Luo, Jingyi Yu, Ye Shi, Jingya Wang  \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.01550) \n\n* **GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency**, CVPR, 2025  \nDongyue Lu, Lingdong Kong, Tianxin Huang, Gim Hee Lee  \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.09511)  \n\n* **GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding**, arxiv, 2024  \nYawen Shao, Wei Zhai, Yuhang Yang, Hongchen Luo, Yang Cao, Zheng-Jun Zha, CVPR, 2025 \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.19626)  \n\n* **LASO: Language-guided affordance segmentation on 3d object**, CVPR, 2024  \nYicong Li, Na Zhao, Junbin Xiao, Chun Feng, Xiang Wang, Tat-seng Chua  \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLi_LASO_Language-guided_Affordance_Segmentation_on_3D_Object_CVPR_2024_paper.pdf)  \n\n* **SceneFun3D: fine-grained functionality and affordance understanding in 3D scenes**, CVPR, 2024  \nAlexandros Delitzas, Ayca Takmaz, Federico Tombari, Robert Sumner, Marc Pollefeys, Francis Engelmann  \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fhtml\u002FDelitzas_SceneFun3D_Fine-Grained_Functionality_and_Affordance_Understanding_in_3D_Scenes_CVPR_2024_paper.html)  \n\n* **Language-conditioned affordance-pose detection in 3d point clouds**, ICRA, 2024  \nToan Nguyen, Minh Nhat Vu, Baoru Huang, Tuan Van Vo, Vy Truong, Ngan Le, Thieu Vo, Bac Le, Anh Nguyen  \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.10911)   \n\n* **DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering**, CVPR, 2025        \nJingzhou Luo, Yang Liu, Weixing Chen, Zhen Li, Yaowei Wang, Guanbin Li, Liang Lin                                   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.03190)[project](https:\u002F\u002Fgithub.com\u002FLZ-CH\u002FDSPNet)    \n\n* **Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding**, arxiv, 2024        \nXianqiang Gao, Pingrui Zhang, Delin Qu, Dong Wang, Zhigang Wang, Yan Ding, Bin Zhao, Xuelong Li                               \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.13024)\n\n* **EmbodiedSAM: Online Segment Any 3D Thing in Real Time**, arxiv, 2024        \nXiuwei Xu, Huangxing Chen, Linqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu                          \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.11811)\n\n* **OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding**, arxiv, 2024        \nYoujun Zhao, Jiaying Lin, Shuquan Ye, Qianshi Pang, Rynson W.H. Lau                           \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.11030)\n\n* **LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image**, arxiv, 2024       \nFan Yang, Sicheng Zhao, Yanhao Zhang, Haoxiang Chen, Hui Chen, Wenbo Tang, Haonan Lu, Pengfei Xu, Zhenyu Yang, Jungong Han, Guiguang Ding                      \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.07422)\n\n* **MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations**, arxiv, 2024       \nRuiyuan Lyu, Tai Wang, Jingli Lin, Shuai Yang, Xiaohan Mao, Yilun Chen, Runsen Xu, Haifeng Huang, Chenming Zhu, Dahua Lin, Jiangmiao Pang                   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.09401)\n\n* **ShapeLLM: Universal 3D Object Understanding for Embodied Interaction**, arxiv, 2024          \nZekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, He Wang, Li Yi, Kaisheng Ma      \n[[page]](https:\u002F\u002Fqizekun.github.io\u002Fshapellm\u002F)\n\n* **LEO: An Embodied Generalist Agent in 3D World**, ICML, 2024      \nJiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, and Siyuan Huang   \n[[page]](https:\u002F\u002Fembodied-generalist.github.io\u002F)    \n\n* **SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding**, ECCV, 2024    \nBaoxiong Jia, Yixin Chen, Huangyue Yu, Yan Wang, Xuesong Niu, Tengyu Liu, Qing Li, and Siyuan Huang    \n[[page]](https:\u002F\u002Fscene-verse.github.io\u002F)    \n\n* **PQ3D: Unifying 3D Vision-Language Understanding via Promptable Queries**, ECCV, 2024     \nZiyu Zhu, Zhuofan Zhang, Xiaojian Ma, Xuesong Niu, Yixin Chen, Baoxiong Jia, Zhidong Deng, Siyuan Huang, and Qing Li    \n[[page]](https:\u002F\u002F3d-vista.github.io\u002F)\n\n* **MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World**, CVPR, 2024     \nYining Hong, Zishuo Zheng, Peihao Chen, Yian Wang, Junyan Li, Chuang Gan     \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FHong_MultiPLY_A_Multisensory_Object-Centric_Embodied_Large_Language_Model_in_3D_CVPR_2024_paper.pdf)\n\n* **MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**, CVPR, 2024     \nYiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao        \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FQin_MP5_A_Multi-modal_Open-ended_Embodied_System_in_Minecraft_via_Active_CVPR_2024_paper.pdf)\n\n* **MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation**, CVPR, 2024     \nMi Yan, Jiazhao Zhang, Yan Zhu, He Wang            \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.07745)\n\n* **TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding**, CVPR, 2024     \nYun Liu, Haolin Yang, Xu Si, Ling Liu, Zipeng Li, Yuxiang Zhang, Yebin Liu, Li Yi                \n[[page]](https:\u002F\u002Ftaco2024.github.io\u002F)\n\n* **EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding**, CVPR, 2023   \nWu, Yanmin and Cheng, Xinhua and Zhang, Renrui and Cheng, Zesen and Zhang, Jian   \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FWu_EDA_Explicit_Text-Decoupling_and_Dense_Alignment_for_3D_Visual_Grounding_CVPR_2023_paper.pdf)   \n\n* **Affordpose: A large-scale dataset of hand-object interactions with affordance-driven hand pose**, ICCV, 2023  \nJuntao Jian, Xiuping Liu, Manyi Li, Ruizhen Hu, Jian Liu  \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fhtml\u002FJian_AffordPose_A_Large-Scale_Dataset_of_Hand-Object_Interactions_with_Affordance-Driven_Hand_ICCV_2023_paper.html)\n\n* **Grounding 3d object affordance from 2d interactions in images**, ICCV, 2023  \nYuhang Yang, Wei Zhai, Hongchen Luo, Yang Cao, Jiebo Luo, Zheng-Jun Zha  \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fhtml\u002FYang_Grounding_3D_Object_Affordance_from_2D_Interactions_in_Images_ICCV_2023_paper.html)  \n\n* **3d-vista: Pre-trained transformer for 3d vision and text alignment**, ICCV, 2023       \nZiyu Zhu, Xiaojian Ma, Yixin Chen, Zhidong Deng, Siyuan Huang, and Qing Li      \n[[page]](https:\u002F\u002F3d-vista.github.io\u002F)    \n\n* **LeaF: Learning Frames for 4D Point Cloud Sequence Understanding**, ICCV, 2023       \nYunze Liu, Junyu Chen, Zekai Zhang, Li Yi        \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FLiu_LeaF_Learning_Frames_for_4D_Point_Cloud_Sequence_Understanding_ICCV_2023_paper.pdf)\n\n* **SQA3D: Situated Question Answering in 3D Scenes**, ICLR, 2023    \nXiaojian Ma, Silong Yong, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, and Siyuan Huang    \n[[page]]([https:\u002F\u002Fsqa3d.github.io\u002F)\n\n* **LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent**, arXix, 2023   \nYang, Jianing and Chen, Xuweiyi and Qian, Shengyi and Madaan, Nikhil and Iyengar, Madhavan and Fouhey, David F and Chai, Joyce   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.12311)   \n\n* **Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding**, arXix, 2023   \nYuan, Zhihao and Ren, Jinke and Feng, Chun-Mei and Zhao, Hengshuang and Cui, Shuguang and Li, Zhen   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.15383)\n\n* **Multi-view transformer for 3D visual grounding**, CVPR, 2022   \nHuang, Shijia and Chen, Yilun and Jia, Jiaya and Wang, Liwei   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.02174)    \n\n* **Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding**, CVPR, 2022   \nBakr, Eslam and Alsaedy, Yasmeen and Elhoseiny, Mohamed   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.14241)   \n\n* **3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection**, CVPR, 2022   \nLuo, Junyu and Fu, Jiahui and Kong, Xianghao and Gao, Chen and Ren, Haibing and Shen, Hao and Xia, Huaxia and Liu, Si   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.06272)    \n\n* **Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds**, ECCV, 2022   \nJain, Ayush and Gkanatsios, Nikolaos and Mediratta, Ishita and Fragkiadaki, Katerina   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2112.08879)   \n\n* **3d affordancenet: A benchmark for visual object affordance understanding**, CVPR, 2021  \nShengheng Deng, Xun Xu, Chaozheng Wu, Ke Chen, Kui Jia  \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2021\u002Fhtml\u002FDeng_3D_AffordanceNet_A_Benchmark_for_Visual_Object_Affordance_Understanding_CVPR_2021_paper.html)\n\n* **Text-guided graph neural networks for referring 3D instance segmentation**, AAAI, 2021   \nHuang, Pin-Hao and Lee, Han-Hung and Chen, Hwann-Tzong and Liu, Tyng-Luh   \n[[page]](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F16253\u002F16060)   \n\n* **InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring**, ICCV, 2021   \nYuan, Zhihao and Yan, Xu and Liao, Yinghong and Zhang, Ruimao and Wang, Sheng and Li, Zhen and Cui, Shuguang   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.01128)   \n\n* **Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud**, CVPR, 2021   \nFeng, Mingtao and Li, Zhen and Li, Qi and Zhang, Liang and Zhang, XiangDong and Zhu, Guangming and Zhang, Hui and Wang, Yaonan and Mian, Ajmal   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.16381)   \n\n* **SAT: 2D Semantics Assisted Training for 3D Visual Grounding**, CVPR, 2021   \nYang, Zhengyuan and Zhang, Songyang and Wang, Liwei and Luo, Jiebo   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2105.11450)   \n\n* **LanguageRefer: Spatiallanguage model for 3D visual grounding**, CVPR, 2021   \nRoh, Junha and Desingh, Karthik and Farhadi, Ali and Fox, Dieter   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2107.03438)   \n\n* **3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds**, ICCV, 2021    \nZhao, Lichen and Cai, Daigang and Sheng, Lu and Xu, Dong    \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fpapers\u002FZhao_3DVG-Transformer_Relation_Modeling_for_Visual_Grounding_on_Point_Clouds_ICCV_2021_paper.pdf)    \n\n* **TransRefer3D: Entity-and-relation aware transformer for fine-grained 3D visual grounding**, CVPR, 2021    \nHe, Dailan and Zhao, Yusheng and Luo, Junyu and Hui, Tianrui and Huang, Shaofei and Zhang, Aixi and Liu, Si\n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.02388)   \n\n* **ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language**, ECCV, 2020    \nChen, Dave Zhenyu and Chang, Angel X and Nie{\\ss}ner, Matthias    \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1912.08830)    \n\n* **ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes**, ECCV, 2020   \nAchlioptas, Panos and Abdelreheem, Ahmed and Xia, Fei and Elhoseiny, Mohamed and Guibas, Leonidas   \n[[page]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2020\u002Fpapers_ECCV\u002Fpapers\u002F123460409.pdf)   \n\n\n### Visual Language Navigation\n\n* **WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation**, IROS, 2025.       \nDujun Nie, Xianda Guo, Yiqun Duan, Ruijun Zhang, Long Chen.             \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.02247)] [[Project](https:\u002F\u002Fb0b8k1ng.github.io\u002FWMNav\u002F)]\n\n* **SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation**, IROS, 2025.       \nXiangyu Shi, Zerui Li, Wenqi Lyu, Jiatong Xia, Feras Dayoub, Yanyuan Qiao, Qi Wu.             \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.10069)]\n\n* **EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents**, arXiv, 2025.       \nRui Yang, Hanyang Chen, Junyu Zhang, Mark Zhao, Cheng Qian, Kangrui Wang, Qineng Wang, Teja Venkat Koripella, Marziyeh Movahedi, Manling Li, Heng Ji, Huan Zhang, Tong Zhang.             \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09560)] [[Project](https:\u002F\u002Fembodiedbench.github.io)]\n\n* **MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation**, arXiv, 2025.       \nLingfeng Zhang, Xiaoshuai Hao, Qinwen Xu, Qiang Zhang, Xinyao Zhang, Pengwei Wang, Jing Zhang, Zhongyuan Wang, Shanghang Zhang, Renjing Xu.             \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.13451)]\n\n* **Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method**, CVPR, 2025.       \nXinshuai Song, Weixing Chen, Yang Liu, Weikai Chen, Guanbin Li, Liang Lin.             \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.09082)][project](https:\u002F\u002Fhcplab-sysu.github.io\u002FLH-VLN\u002F)   \n\n* **DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects**, arxiv, 2024.     \nZhaowei Wang, Hongming Zhang, Tianqing Fang, Ye Tian, Yue Yang, Kaixin Ma, Xiaoman Pan, Yangqiu Song, Dong Yu.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.02730)] [[Project](https:\u002F\u002Fzhaowei-wang-nlp.github.io\u002Fdivscene-project-page\u002F)]\n\n* **MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation**, ACL, 2024.       \nJiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong.             \n[[page](https:\u002F\u002Fchen-judge.github.io\u002FMapGPT\u002F)]\n\n* **NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning**, ArXiv, 2024.      \nBingqian Lin, Yunshuang Nie, Ziming Wei, Jiaqi Chen, Shikui Ma, Jianhua Han, Hang Xu, Xiaojun Chang, Xiaodan Liang.             \n[[page](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.07376)]       \n\n* **OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model**, ArXiv, 2024.      \nJunming Wang, Dong Huang, Xiuxian Guan, Zekai Sun, Tianxiang Shen, Fangming Liu, Heming Cui.         \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.10618)]\n\n* **CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving**, ArXiv, 2024.      \nHidehisa Arai, Keita Miwa, Kento Sasaki, Yu Yamaguchi, Kohei Watanabe, Shunsuke Aoki, Issei Yamamoto.         \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.10845)]\n\n* **FLAME: Learning to Navigate with Multimodal LLM in Urban Environments**, ArXiv, 2024.      \nYunzhe Xu, Yiyuan Pan, Zhe Liu, Hesheng Wang.       \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.11051)]\n\n* **Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation**, ArXiv, 2024.      \nJiaqi Chen, Bingqian Lin, Xinmin Liu, Xiaodan Liang, Kwan-Yee K Wong.         \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.05890)]\n\n* **Embodied Instruction Following in Unknown Environments**, ArXiv, 2024.      \nWu, Wang, Xu, Lu, Yan.       \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.11818)]       \n\n* **DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control**, arxiv, 2024.                \nXinyu Xu, Shengcheng Luo, Yanchao Yang, Yong-Lu Li, Cewu Lu.              \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.14758)\n\n* **NOLO: Navigate Only Look Once**, arxiv, 2024.                \nBohan Zhou, Jiangxing Wang, Zongqing Lu.              \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.01384)\n\n* **Towards Learning a Generalist Model for Embodied Navigation**, CVPR, 2024.    \nDuo Zheng, , Shijia Huang, Lin Zhao, Yiwu Zhong, Liwei Wang.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.02010)]     \n\n* **Fast-Slow Test-time Adaptation for Online Vision-and-Language Navigation** ICML, 2024.    \nJunyu Gao, , Xuan Yao, Changsheng Xu.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.13209)]   \n\n* **Discuss before moving: Visual language navigation via multi-expert discussions**, ICRA, 2024.   \nLong, Yuxing, Xiaoqi, Li, Wenzhe, Cai, Hao, Dong.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.11382)]    \n\n* **Vision-and-Language Navigation via Causal Learning**, CVPR, 2024.   \nLiuyi Wang, Qijun Chen.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.10241)]   \n\n* **Volumetric Environment Representation for Vision-Language Navigation**, CVPR, 2024.   \nRui Liu, Yi Yang.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.14158)]    \n\n* **Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation**, CVPR 2024.   \nWang, Zihan, Xiangyang, Li, Jiahao, Yang, Yeqi, Liu, Junjie, Hu, Ming, Jiang, Shuqiang, Jiang. \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.01943)]    \n\n* **Bridging zero-shot object navigation and foundation models through pixel-guided navigation skill** ICRA, 2024.           \nWenzhe Cai, Siyuan Huang, Guangran Cheng, Yuxing Long, Peng Gao, Changyin Sun, and Hao Dong.       \n[[page]](https:\u002F\u002Fgithub.com\u002Fwzcai99\u002FPixel-Navigator)      \n\n* **OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation**, CVPR, 2024.              \nGanlong Zhao, Guanbin Li, Weikai Chen, Yizhou Yu.           \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhao_OVER-NAV_Elevating_Iterative_Vision-and-Language_Navigation_with_Open-Vocabulary_Detection_and_StructurEd_CVPR_2024_paper.pdf)     \n\n* **RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation**, CVPR, 2024.                \nZeyuan Yang, Jiageng Liu, Peihao Chen, Anoop Cherian, Tim K. Marks, Jonathan Le Roux, Chuang Gan.       \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_RILA_Reflective_and_Imaginative_Language_Agent_for_Zero-Shot_Semantic_Audio-Visual_CVPR_2024_paper.pdf)   \n\n* **Towards Learning a Generalist Model for Embodied Navigation**, CVPR, 2024.                \nDuo Zheng, Shijia Huang, Lin Zhao, Yiwu Zhong, Liwei Wang.       \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZheng_Towards_Learning_a_Generalist_Model_for_Embodied_Navigation_CVPR_2024_paper.pdf)\n\n* **Vision-and-Language Navigation via Causal Learning**, CVPR, 2024.                \nLiuyi Wang, Zongtao He, Ronghao Dang, Mengjiao Shen, Chengju Liu, Qijun Chen.        \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWang_Vision-and-Language_Navigation_via_Causal_Learning_CVPR_2024_paper.pdf)\n\n* **Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation**, CVPR, 2024.                \nXiaohan Lei, Min Wang, Wengang Zhou, Li Li, Houqiang Li.     \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLei_Instance-aware_Exploration-Verification-Exploitation_for_Instance_ImageGoal_Navigation_CVPR_2024_paper.pdf)\n\n* **Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation**, CVPR, 2024.                \nMukul Khanna, Yongsen Mao, Hanxiao Jiang, Sanjay Haresh, Brennan Shacklett, Dhruv Batra, Alexander Clegg, Eric Undersander, Angel X. Chang, Manolis Savva.     \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FKhanna_Habitat_Synthetic_Scenes_Dataset_HSSD-200_An_Analysis_of_3D_Scene_CVPR_2024_paper.pdf)\n\n* **SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System**, CVPR, 2024.                \nYunfei Fan, Tianyu Zhao, Guidong Wang.     \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FFan_SchurVINS_Schur_Complement-Based_Lightweight_Visual_Inertial_Navigation_System_CVPR_2024_paper.pdf)\n\n* **SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World**, CVPR, 2024.                \nKiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi.  \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FEhsani_SPOC_Imitating_Shortest_Paths_in_Simulation_Enables_Effective_Navigation_and_CVPR_2024_paper.pdf)\n\n* **Volumetric Environment Representation for Vision-Language Navigation**, CVPR, 2024.                \nRui Liu, Wenguan Wang, Yi Yang.     \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLiu_Volumetric_Environment_Representation_for_Vision-Language_Navigation_CVPR_2024_paper.pdf)\n\n* **GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation**, CVPR, 2024.                \nXiaohan Wang, Yuehu Liu, Xinhang Song, Yuyi Liu, Sixian Zhang, Shuqiang Jiang.        \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FKhanna_GOAT-Bench_A_Benchmark_for_Multi-Modal_Lifelong_Navigation_CVPR_2024_paper.pdf)\n\n* **An Interactive Navigation Method with Effect-oriented Affordance**, CVPR, 2024.                \nXiaohan Wang, Yuehu Liu, Xinhang Song, Yuyi Liu, Sixian Zhang, Shuqiang Jiang.      \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWang_An_Interactive_Navigation_Method_with_Effect-oriented_Affordance_CVPR_2024_paper.pdf)\n\n* **Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation**, CVPR, 2024.                \nSixian Zhang, Xinyao Yu, Xinhang Song, Xiaohan Wang, Shuqiang Jiang.         \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhang_Imagine_Before_Go_Self-Supervised_Generative_Map_for_Object_Goal_Navigation_CVPR_2024_paper.pdf)\n\n* **MemoNav: Working Memory Model for Visual Navigation**, CVPR, 2024.                \nHongxin Li, Zeyu Wang, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang.         \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLi_MemoNav_Working_Memory_Model_for_Visual_Navigation_CVPR_2024_paper.pdf)\n\n* **Versatile Navigation Under Partial Observability via Value-guided Diffusion Policy**, CVPR, 2024.                \nGengyu Zhang, Hao Tang, Yan Yan.         \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhang_Versatile_Navigation_Under_Partial_Observability_via_Value-guided_Diffusion_Policy_CVPR_2024_paper.pdf)\n\n* **Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation**, CVPR, 2024.                \nZihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang, Shuqiang Jiang.    \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWang_Lookahead_Exploration_with_Neural_Radiance_Representation_for_Continuous_Vision-Language_Navigation_CVPR_2024_paper.pdf)\n\n* **SPIN: Simultaneous Perception Interaction and Navigation**, CVPR, 2024.                \nShagun Uppal, Ananye Agarwal, Haoyu Xiong, Kenneth Shaw, Deepak Pathak.    \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FUppal_SPIN_Simultaneous_Perception_Interaction_and_Navigation_CVPR_2024_paper.pdf)\n\n* **Correctable Landmark Discovery via Large Models for Vision-Language Navigation**, TPAMI, 2024.              \nBingqian Lin, Yunshuang Nie, Ziming Wei, Yi Zhu, Hang Xu, Shikui Ma, Jianzhuang Liu, Xiaodan Liang.           \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10543121)\n\n* **ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments**, IEEE T-PAMI, 2024.   \nAn, Dong, Hanqing, Wang, Wenguan, Wang, Zun, Wang, Yan, Huang, Keji, He, Liang, Wang. \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.03047)]\n\n* **NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation**, RSS, 2024.   \nJiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.15852)]\n\n* **March in Chat: Interactive Prompting for Remote Embodied Referring Expression**, ICCV, 2023.   \nQiao, Yanyuan, Yuankai, Qi, Zheng, Yu, Jing, Liu, Qi, Wu.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.10141)]     \n\n* **Multi-level compositional reasoning for interactive instruction following**, AAAI, 2023.   \nBhambri, Suvaansh, Byeonghwi, Kim, Jonghyun, Choi.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.09387)]   \n\n* **Vision and Language Navigation in the Real World via Online Visual Language Mapping**, ArXiv, 2023.   \nChengguang Xu, , Hieu T. Nguyen, Christopher Amato, Lawson L.S. Wong. \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10822)]    \n\n* **Towards Deviation-robust Agent Navigation via Perturbation-aware Contrastive Learning**, TPAMI, 2023.              \nBingqian Lin, Yanxin Long, Yi Zhu, Fengda Zhu, Xiaodan Liang , Qixiang Ye, Liang Lin.           \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10120966\u002F)\n\n* **Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation**, NIPS, 2023.   \nWang, Chen, Li, Wu, Dong.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.08138)]\n\n* **HomeRobot: Open-Vocabulary Mobile Manipulation**, NIPS, 2023.   \nYenamandra, Sriram, Arun, Ramachandran, Karmesh, Yadav, Austin, Wang, Mukul, Khanna, Theophile, Gervet, Tsung-Yen, Yang, Vidhi, Jain, AlexanderWilliam, Clegg, John, Turner, Zsolt, Kira, Manolis, Savva, Angel, Chang, DevendraSingh, Chaplot, Dhruv, Batra, Roozbeh, Mottaghi, Yonatan, Bisk, Chris, Paxton.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.11565)]    \n\n* **Behavior-1k: A benchmark for embodied ai with 1,000 everyday activities and realistic simulation**, Conference on Robot Learning. 2023.    \nLi, Chengshu, Ruohan, Zhang, Josiah, Wong, Cem, Gokmen, Sanjana, Srivastava, Roberto, Mart\\in-Mart\\'\\in, Chen, Wang, Gabrael, Levine, Michael, Lingelbach, Jiankai, Sun, others.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.09227)]\n\n* **DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following**, arXiv, 2022.   \nGao, Xiaofeng, Qiaozi, Gao, Ran, Gong, Kaixiang, Lin, Govind, Thattai, GauravS., Sukhatme.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2202.13330)]   \n\n* **HOP: History-and-Order Aware Pretraining for Vision-and-Language Navigation**, CVPR, 2022.       \nQiao, Yanyuan, Yuankai, Qi, Yicong, Hong, Zheng, Yu, Peng, Wang, Qi, Wu.        \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.11591)]      \n\n* **Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation**, CVPR, 2022.    \nHong, Yicong, Zun, Wang, Qi, Wu, Stephen, Gould.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.02764)]   \n\n* **FILM: Following Instructions in Language with Modular Methods**, ICLR, 2022.   \nSo Yeon Min, , Devendra Singh Chaplot, Pradeep Kumar Ravikumar, Yonatan Bisk, Ruslan Salakhutdinov.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.07342)]   \n\n* **LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action**, Conference on Robot Learning. 2022.   \nDhruv Shah, , Blazej Osinski, Brian Ichter, Sergey Levine.      \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2207.04429)]\n\n* **SOON: Scenario Oriented Object Navigation with Graph-based Exploration**, CVPR, 2021.      \nZhu, Fengda, Xiwen, Liang, Yi, Zhu, Qizhi, Yu, Xiaojun, Chang, Xiaodan, Liang.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.17138)]   \n\n* **Vision-Language Navigation Policy Learning and Adaptation**, IEEE T-PAMI 43. 12(2021): 4205-4216.    \nWang, Xin, Qiuyuan, Huang, Asli, Celikyilmaz, Jianfeng, Gao, Dinghan, Shen, Yuan-Fang, Wang, William Yang, Wang, Lei, Zhang.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002Fhttps:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F8986691)]   \n\n* **Neighbor-view enhanced model for vision and language navigation**, MM, 2021.   \nAn, Dong, Yuankai, Qi, Yan, Huang, Qi, Wu, Liang, Wang, Tieniu, Tan.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2107.07201)]   \n\n* **Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments**, ECCV, 2020.         \nKrantz, Jacob and Wijmans, Erik and Majumdar, Arjun and Batra, Dhruv and Lee, Stefan.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2004.02857)]\n\n* **REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments**, CVPR, 2020.   \nQi, Yuankai, Qi, Wu, Peter, Anderson, Xin, Wang, William Yang, Wang, Chunhua, Shen, Anton, Hengel.        \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.10151)]      \n\n* **ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks**, CVPR, 2020.    \nShridhar, Mohit, Jesse, Thomason, Daniel, Gordon, Yonatan, Bisk, Winson, Han, Roozbeh, Mottaghi, Luke, Zettlemoyer, Dieter, Fox.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1912.01734)]   \n\n* **Vision-and-dialog navigation**, Conference on Robot Learning. 2020.   \nThomason, Jesse, Michael, Murray, Maya, Cakmak, Luke, Zettlemoyer.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.04957)]   \n\n* **Language and visual entity relationship graph for agent navigation**, NeurIPS, 2020.   \nHong, Yicong, Cristian, Rodriguez, Yuankai, Qi, Qi, Wu, Stephen, Gould.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.09304)]   \n\n* **Language-Guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning**, IEEE T-CSVT 31. (2020): 3469-3481.    \nWeixia Zhang, , Chao Ma, Qi Wu, Xiaokang Yang.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2011.10972)]   \n\n* **Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation**, ACL, 2019.   \nJain, Vihan, Gabriel, Magalhaes, Alexander, Ku, Ashish, Vaswani, Eugene, Ie, Jason, Baldridge.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.12255)]    \n\n* **TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments**, CVPR, 2019.   \nChen, Howard, Alane, Suhr, Dipendra, Misra, Noah, Snavely, Yoav, Artzi.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1811.12354)]\n\n* **Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments**, CVPR, 2018.   \nAnderson, Peter, Qi, Wu, Damien, Teney, Jake, Bruce, Mark, Johnson, Niko, Sunderhauf, Ian, Reid, Stephen, Gould, Anton, Hengel.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.07280)]    \n\n* **Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation**, ECCV, 2018.   \nXin Eric Wang, , Wenhan Xiong, Hongmin Wang, William Yang Wang.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.07729)]     \n\n### Non-Visual Perception: Tactile\n\n* **Sensor-Invariant Tactile Representation (SITR)**, ICLR, 2025.    \nHarsh Gupta, Yuchen Mo, Shengmiao Jin, Wenzhen Yuan.    \n[[page](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.19638)]\n\n* **Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation**, RSS, 2025.    \nHan Xue, Jieji Ren, Wendi Chen, Gu Zhang, Yuan Fang, Guoying Gu, Huazhe Xu, Cewu Lu.    \n[[page](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.02881)]\n\n* **3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing**, CoRL, 2024.    \nBinghao Huang, Yixuan Wang, Xinyi Yang, Yiyue Luo, Yunzhu Li.    \n[[page](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.24091)]\n\n* **TacSL: A Library for Visuotactile Sensor Simulation and Learning**, IEEE TRO, 2025.    \nIretiayo Akinola, Jie Xu, Jan Carius, Dieter Fox, Yashraj Narang.    \n[[page](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.06506)]\n\n* **When Vision Meets Touch: A Contemporary Review for Visuotactile Sensors from the Signal Processing Perspective**, Arxiv, 2024.    \nLi, Shoujie and Wang, Zihan and Wu, Changsheng and Li, Xiang and Luo, Shan and Fang, Bin and Sun, Fuchun and Zhang, Xiao-Ping and Ding, Wenbo.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.12226)]\n\n* **Enhancing Generalizable 6D Pose Tracking of an In-Hand Object with Tactile Sensing**, RA-L, 2024.    \nYun Liu, Xiaomeng Xu, Weihang Chen, Haocheng Yuan, He Wang, Jing Xu, Rui Chen, Li Yi.       \n[[page](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.04026)]\n\n* **Learning visuotactile skills with two multifingered hands**, ArXiv, 2024.   \nLin, Toru and Zhang, Yu and Li, Qiyang and Qi, Haozhi and Yi, Brent and Levine, Sergey and Malik, Jitendra.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.16823)]   \n\n* **Binding touch to everything: Learning unified multimodal tactile representations**, CVPR, 2024.   \nYang, Fengyu and Feng, Chao and Chen, Ziyang and Park, Hyoungseob and Wang, Daniel and Dou, Yiming and Zeng, Ziyao and Chen, Xien and Gangopadhyay, Rit and Owens, Andrew and others.   \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_Binding_Touch_to_Everything_Learning_Unified_Multimodal_Tactile_Representations_CVPR_2024_paper.pdf)]\n\n* **Bioinspired sensors and applications in intelligent robots: a review**, Robotic Intelligence and Automation, 2024.    \nZhou, Yanmin and Yan, Zheng and Yang, Ye and Wang, Zhipeng and Lu, Ping and Yuan, Philip F and He, Bin.   \n[[page](https:\u002F\u002Fwww.emerald.com\u002Finsight\u002Fcontent\u002Fdoi\u002F10.1108\u002FRIA-07-2023-0088\u002Ffull\u002Fpdf)]   \n\n* **Give Me a Sign: Using Data Gloves for Static Hand-Shape Recognition**, Sensors, 2023.   \nAchenbach, Philipp and Laux, Sebastian and Purdack, Dennis and Müller, Philipp Niklas and Göbel, Stefan.   \n[[page](https:\u002F\u002Fwww.mdpi.com\u002F1424-8220\u002F23\u002F24\u002F9847\u002Fpdf)]   \n\n* **Semantics-aware adaptive knowledge distillation for sensor-to-vision action recognition**, IEEE Transactions on Image Processing, 2021.   \nLiu, Yang and Wang, Keze and Li, Guanbin and Lin, Liang.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2009.00210)]   \n\n* **Hand movements: A window into haptic object recognition**, Cognitive psychology, 1987.    \nLederman, Susan J and Klatzky, Roberta L.   \n[[page](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002F0010028587900089)]   \n\n* **Force and tactile sensing**, Springer Handbook of Robotics, 2016.   \nCutkosky, Mark R and Howe, Robert D and Provancher, William R.   \n[[page](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-319-32552-1_28)]\n\n* **Haptic perception: A tutorial**, Attention, Perception, & Psychophysics, 2009.   \nLederman, Susan J and Klatzky, Roberta L.   \n[[page](https:\u002F\u002Flink.springer.com\u002Fcontent\u002Fpdf\u002F10.3758\u002FAPP.71.7.1439.pdf)]   \n\n* **Flexible tactile sensing based on piezoresistive composites: A review**, Sensors, 2014.   \nStassi, Stefano and Cauda, Valentina and Canavese, Giancarlo and Pirri, Candido Fabrizio.   \n[[page](https:\u002F\u002Fwww.mdpi.com\u002F1424-8220\u002F14\u002F3\u002F5296.)]   \n\n* **Tactile sensing in dexterous robot hands**, Robotics and Autonomous Systems, 2015.    \nKappassov, Zhanat and Corrales, Juan-Antonio and Perdereau, Véronique.   \n[[page](https:\u002F\u002Fuca.hal.science\u002Fhal-01680649\u002Fdocument)]   \n\n* **GelLink: A Compact Multi-phalanx Finger with Vision-based Tactile Sensing and Proprioception**, arXiv, 2024.   \nMa, Yuxiang and Adelson, Edward.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.14887)]\n\n* **A Touch, Vision, and Language Dataset for Multimodal Alignment**, ArXiv, 2024.   \nFu, Letian and Datta, Gaurav and Huang, Huang and Panitch, William Chung-Ho and Drake, Jaimyn and Ortiz, Joseph and Mukadam, Mustafa and Lambeta, Mike and Calandra, Roberto and Goldberg, Ken.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.13232)]\n\n* **Large-scale actionless video pre-training via discrete diffusion for efficient policy learning**, ArXiv, 2024.   \nHe, Haoran and Bai, Chenjia and Pan, Ling and Zhang, Weinan and Zhao, Bin and Li, Xuelong.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.14407)]\n\n* **Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces**, ArXiv, 2024.   \nComi, Mauro and Tonioni, Alessio and Yang, Max and Tremblay, Jonathan and Blukis, Valts and Lin, Yijiong and Lepora, Nathan F and Aitchison, Laurence.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.20275)]   \n\n* **Tactile-augmented radiance fields**, CVPR, 2024.      \nDou, Yiming and Yang, Fengyu and Liu, Yi and Loquercio, Antonio and Owens, Andrew.    \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FDou_Tactile-Augmented_Radiance_Fields_CVPR_2024_paper.pdf)]       \n\n* **AnyRotate: Gravity-Invariant In-Hand Object Rotation with Sim-to-Real Touch**, ArXiv, 2024.      \nYang, Max and Lu, Chenghua and Church, Alex and Lin, Yijiong and Ford, Chris and Li, Haoran and Psomopoulou, Efi and Barton, David AW and Lepora, Nathan F.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.07391)]      \n\n* **Feature-level Sim2Real Regression of Tactile Images for Robot Manipulation**, ICRA ViTac, 2024.    \nDuan, Boyi and Qian, Kun and Zhao, Yongqiang and Zhang, Dongyuan and Luo, Shan.    \n[[page](https:\u002F\u002Fshanluo.github.io\u002FViTacWorkshops\u002Fcontent\u002FViTac2024_Paper_09.pdf)]   \n\n* **MAE4GM: Visuo-Tactile Learning for Property Estimation of Granular Material using Multimodal Autoencoder**,ICRA ViTac, 2024.    \nZhang, Zeqing and Zheng, Guangze and Ji, Xuebo and Chen, Guanqi and Jia, Ruixing and Chen, Wentao and Chen, Guanhua and Zhang, Liangjun and Pan, Jia.    \n[[page](https:\u002F\u002Fshanluo.github.io\u002FViTacWorkshops\u002Fcontent\u002FViTac2024_Paper_01.pdf)]\n\n* **Octopi: Object Property Reasoning with Large Tactile-Language Models**, arXiv preprint arXiv:2405.02794, 2024.     \nYu, Samson and Lin, Kelvin and Xiao, Anxing and Duan, Jiafei and Soh, Harold.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.02794)]\n\n* **9dtact: A compact vision-based tactile sensor for accurate 3D shape reconstruction and generalizable 6D force estimation**, IEEE Robotics and Automation Letters, 2023.   \nLin, Changyi and Zhang, Han and Xu, Jikai and Wu, Lei and Xu, Huazhe.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.14277)]\n\n* **Allsight: A low-cost and high-resolution round tactile sensor with zero-shot learning capability**, IEEE Robotics and Automation Letters, 2023.   \nAzulay, Osher and Curtis, Nimrod and Sokolovsky, Rotem and Levitski, Guy and Slomovik, Daniel and Lilling, Guy and Sintov, Avishai.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.02928)]   \n\n* **Vistac towards a unified multi-modal sensing finger for robotic manipulation**, IEEE Sensors Journal, 2023.   \nAthar, Sheeraz and Patel, Gaurav and Xu, Zhengtong and Qiu, Qiang and She, Yu.   \n[[page](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10242327\u002F)]   \n\n* **Midastouch: Monte-carlo inference over distributions across sliding touch**, CoRL, 2023.   \nSuresh, Sudharshan and Si, Zilin and Anderson, Stuart and Kaess, Michael and Mukadam, Mustafa.   \n[[page](https:\u002F\u002Fproceedings.mlr.press\u002Fv205\u002Fsuresh23a\u002Fsuresh23a.pdf)]    \n\n* **The objectfolder benchmark: Multisensory learning with neural and real objects**, CVPR, 2023.   \nGao, Ruohan and Dou, Yiming and Li, Hao and Agarwal, Tanmay and Bohg, Jeannette and Li, Yunzhu and Fei-Fei, Li and Wu, Jiajun.\n[[page](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FGao_The_ObjectFolder_Benchmark_Multisensory_Learning_With_Neural_and_Real_Objects_CVPR_2023_paper.pdf)]\n\n* **Imagebind: One embedding space to bind them all**, CVPR, 2023.   \nGirdhar, Rohit and El-Nouby, Alaaeldin and Liu, Zhuang and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan.   \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FGirdhar_ImageBind_One_Embedding_Space_To_Bind_Them_All_CVPR_2023_paper.pdf)]\n\n* **Touching a nerf: Leveraging neural radiance fields for tactile sensory data generation**, Conference on Robot Learning, pp. 1618-1628, 2023.    \nZhong, Shaohong and Albini, Alessandro and Jones, Oiwi Parker and Maiolino, Perla and Posner, Ingmar.   \n[[page](https:\u002F\u002Fproceedings.mlr.press\u002Fv205\u002Fzhong23a\u002Fzhong23a.pdf)]    \n\n* **Learning to read braille: Bridging the tactile reality gap with diffusion models**, ArXiv, 2023.   \nHiguera, Carolina and Boots, Byron and Mukadam, Mustafa.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.01182)]   \n\n* **Generating visual scenes from touch**, CVPR, 2023.   \nYang, Fengyu and Zhang, Jiacheng and Owens, Andrew.   \n[[page](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FYang_Generating_Visual_Scenes_from_Touch_ICCV_2023_paper.pdf)]\n\n* **Dtact: A vision-based tactile sensor that measures high-resolution 3D geometry directly from darkness**, ICRA, 2023.   \nLin, Changyi and Lin, Ziqi and Wang, Shaoxiong and Xu, Huazhe.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.13916)]\n\n* **In-hand pose estimation using hand-mounted RGB cameras and visuotactile sensors**, IEEE Access, 2023.    \nGao, Yuan and Matsuoka, Shogo and Wan, Weiwei and Kiyokawa, Takuya and Koyama, Keisuke and Harada, Kensuke.   \n[[page](https:\u002F\u002Fieeexplore.ieee.org\u002Fiel7\u002F6287639\u002F6514899\u002F10043666.pdf)]   \n\n* **Collision-aware in-hand 6D object pose estimation using multiple vision-based tactile sensors**, ICRA, 2023.    \nCaddeo, Gabriele M and Piga, Nicola A and Bottarel, Fabrizio and Natale, Lorenzo.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2301.13667)]\n\n* **Implicit neural representation for 3D shape reconstruction using vision-based tactile sensing**, ArXiv, 2023.    \nComi, Mauro and Church, Alex and Li, Kejie and Aitchison, Laurence and Lepora, Nathan F.    \n[[page](https:\u002F\u002Fshanluo.github.io\u002FViTacWorkshops\u002Fcontent\u002FViTac2023_Paper_06.pdf)]    \n\n* **Sliding touch-based exploration for modeling unknown object shape with multi-fingered hands**, IROS, 2023.   \nChen, Yiting and Tekden, Ahmet Ercan and Deisenroth, Marc Peter and Bekiroglu, Yasemin.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.00576)]   \n\n* **General In-hand Object Rotation with Vision and Touch**, CoRL, 2023.          \nQi, Haozhi and Yi, Brent and Suresh, Sudharshan and Lambeta, Mike and Ma, Yi and Calandra, Roberto and Malik, Jitendra.        \n[[page](https:\u002F\u002Fproceedings.mlr.press\u002Fv229\u002Fqi23a\u002Fqi23a.pdf)]          \n\n* **Sim-to-Real Model-Based and Model-Free Deep Reinforcement Learning for Tactile Pushing**, IEEE Robotics and Automation Letters, 2023.   \nYang, Max and Lin, Yijiong and Church, Alex and Lloyd, John and Zhang, Dandan and Barton, David AW and Lepora, Nathan F.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.14272)]      \n\n* **Unsupervised adversarial domain adaptation for sim-to-real transfer of tactile images**, IEEE Transactions on Instrumentation and Measurement, 2023.       \nJing, Xingshuo and Qian, Kun and Jianu, Tudor and Luo, Shan.      \n[[page](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10106009\u002F)]\n\n* **Learn from incomplete tactile data: Tactile representation learning with masked autoencoders**, IROS, 2023.    \nCao, Guanqun and Jiang, Jiaqi and Bollegala, Danushka and Luo, Shan.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.07358)]\n\n* **Dexterity from touch: Self-supervised pre-training of tactile representations with robotic play**, ArXiv, 2023.    \nGuzey, Irmak and Evans, Ben and Chintala, Soumith and Pinto, Lerrel.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.12076)]    \n\n* **Gelslim 3.0: High-resolution measurement of shape, force and slip in a compact tactile-sensing finger**, ICRA, 2022.   \n  Taylor, Ian H and Dong, Siyuan and Rodriguez, Alberto.   \n  [[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.12269)]    \n\n* **Tacto: A fast, flexible, and open-source simulator for high-resolution vision-based tactile sensors**, IEEE Robotics and Automation Letters, 2022.    \nWang, Shaoxiong and Lambeta, Mike and Chou, Po-Wei and Calandra, Roberto.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2012.08456)]   \n\n* **Taxim: An example-based simulation model for GelSight tactile sensors**, IEEE Robotics and Automation Letters, 2022.   \nSi, Zilin and Yuan, Wenzhen.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.04027)]\n\n* **Objectfolder 2.0: A multisensory object dataset for sim2real transfer**, CVPR, 2022.      \nGao, Ruohan and Si, Zilin and Chang, Yen-Yu and Clarke, Samuel and Bohg, Jeannette and Fei-Fei, Li and Yuan, Wenzhen and Wu, Jiajun.     \n[[page](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FGao_ObjectFolder_2.0_A_Multisensory_Object_Dataset_for_Sim2Real_Transfer_CVPR_2022_paper.pdf)]      \n\n* **Self-supervised visuo-tactile pretraining to locate and follow garment features**, ArXiv, 2022.      \nKerr, Justin and Huang, Huang and Wilcox, Albert and Hoque, Ryan and Ichnowski, Jeffrey and Calandra, Roberto and Goldberg, Ken.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.13042)]     \n\n* **Visuotactile 6D pose estimation of an in-hand object using vision and tactile sensor data**, IEEE Robotics and Automation Letters, 2022.   \nDikhale, Snehal and Patel, Karankumar and Dhingra, Daksh and Naramura, Itoshi and Hayashi, Akinobu and Iba, Soshi and Jamali, Nawid.   \n[[page](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FSnehal_Dikhale\u002Fpublication\u002F357842538_VisuoTactile_6D_Pose_Estimation_of_an_In-Hand_Object_Using_Vision_and_Tactile_Sensor_Data\u002Flinks\u002F6297b925416ec50bdb022987\u002FVisuoTactile-6D-Pose-Estimation-of-an-In-Hand-Object-Using-Vision-and-Tactile-Sensor-Data.pdf)]   \n\n* **Shapemap 3-D: Efficient shape mapping through dense touch and vision**, ICRA, 2022.    \nSuresh, Sudharshan and Si, Zilin and Mangelson, Joshua G and Yuan, Wenzhen and Kaess, Michael.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.09884)]\n\n* **Visuotactile-rl: Learning multimodal manipulation policies with deep reinforcement learning**, ICRA, 2022.          \nHansen, Johanna and Hogan, Francois and Rivkin, Dmitriy and Meger, David and Jenkin, Michael and Dudek, Gregory.          \n[[page](https:\u002F\u002Fjohannah.github.io\u002Fpapers\u002FVisuotactile-RL.pdf)]          \n\n* **Tactile gym 2.0: Sim-to-real deep reinforcement learning for comparing low-cost high-resolution robot touch**, IEEE Robotics and Automation Letters, 2022.   \nLin, Yijiong and Lloyd, John and Church, Alex and Lepora, Nathan F.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2207.10763)]\n\n* **Touch and go: Learning from human-collected vision and touch**, ArXiv, 2022.    \nYang, Fengyu and Ma, Chenyang and Zhang, Jiacheng and Zhu, Jing and Yuan, Wenzhen and Owens, Andrew.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.12498)]   \n\n* **Objectfolder: A dataset of objects with implicit visual, auditory, and tactile representations**, arXiv, 2021.   \nGao, Ruohan and Chang, Yen-Yu and Mall, Shivani and Fei-Fei, Li and Wu, Jiajun.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.07991)]   \n\n* **Learning transferable visual models from natural language supervision**, International Conference on Machine Learning, 2021.   \nRadford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others.   \n[[page](http:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Fradford21a\u002Fradford21a.pdf)]   \n\n* **GelSight wedge: Measuring high-resolution 3D contact geometry with a compact robot finger**, ICRA, 2021.   \nWang, Shaoxiong and She, Yu and Romero, Branden and Adelson, Edward.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.08851)]\n\n* **Tactile object pose estimation from the first touch with geometric contact rendering**, CoRL, 2021.   \nVillalonga, Maria Bauza and Rodriguez, Alberto and Lim, Bryan and Valls, Eric and Sechopoulos, Theo.   \n[[page](https:\u002F\u002Fproceedings.mlr.press\u002Fv155\u002Fvillalonga21a\u002Fvillalonga21a.pdf)]     \n\n* **Active 3D shape reconstruction from vision and touch**, NeurIPS, 2021.   \nSmith, Edward and Meger, David and Pineda, Luis and Calandra, Roberto and Malik, Jitendra and Romero Soriano, Adriana and Drozdzal, Michal.    \n[[page](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2021\u002Ffile\u002F8635b5fd6bc675033fb72e8a3ccc10a0-Paper.pdf)]   \n\n* **Interpreting and predicting tactile signals for the syntouch biotac**, The International Journal of Robotics Research, 2021.    \nNarang, Yashraj S and Sundaralingam, Balakumar and Van Wyk, Karl and Mousavian, Arsalan and Fox, Dieter.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2101.05452)]\n\n* **GelTip: A finger-shaped optical tactile sensor for robotic manipulation**, IROS, 2020.   \nGomes, Daniel Fernandes and Lin, Zhonglin and Luo, Shan.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.05404)]   \n\n* **DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor With Application to In-Hand Manipulation**, IEEE Robotics and Automation Letters, 2020.   \nLambeta, Mike and Chou, Po-Wei and Tian, Stephen and Yang, Brian and Maloon, Benjamin and Most, Victoria Rose and Stroud, Dave and Santos, Raymond and Byagowi, Ahmad and Kammerer, Gregg and Jayaraman, Dinesh and Calandra, Roberto.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.14679)]\n\n* **Digit: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation**, IEEE Robotics and Automation Letters, 2020.   \nLambeta, Mike and Chou, Po-Wei and Tian, Stephen and Yang, Brian and Maloon, Benjamin and Most, Victoria Rose and Stroud, Dave and Santos, Raymond and Byagowi, Ahmad and Kammerer, Gregg and others.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.14679)]\n\n* **Deep tactile experience: Estimating tactile sensor output from depth sensor data**, IROS, 2020.   \nPatel, Karankumar and Iba, Soshi and Jamali, Nawid.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.08946)]   \n\n* **3D shape reconstruction from vision and touch**, NeurIPS, 2020.   \nSmith, Edward and Calandra, Roberto and Romero, Adriana and Gkioxari, Georgia and Meger, David and Malik, Jitendra and Drozdzal, Michal.   \n[[page](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2020\u002Ffile\u002Fa3842ed7b3d0fe3ac263bcabd2999790-Paper.pdf)]\n\n* **Supervised autoencoder joint learning on heterogeneous tactile sensory data: Improving material classification performance**, IROS, 2020.      \nGao, Ruihan and Taunyazov, Tasbolat and Lin, Zhiping and Wu, Yan.      \n[[page](https:\u002F\u002Fyan-wu.com\u002Fwp-content\u002Fuploads\u002F2020\u002F08\u002Fgao2020supervised.pdf)]\n\n* **Making sense of vision and touch: Learning multimodal representations for contact-rich tasks**, IEEE Transactions on Robotics, 2020.    \nLee, Michelle A and Zhu, Yuke and Zachares, Peter and Tan, Matthew and Srinivasan, Krishnan and Savarese, Silvio and Fei-Fei, Li and Garg, Animesh and Bohg, Jeannette.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.13098)]\n\n* **Learning efficient haptic shape exploration with a rigid tactile sensor array**, PloS One, 2020.    \nFleer, Sascha and Moringen, Alexandra and Klatzky, Roberta L and Ritter, Helge.    \n[[page](https:\u002F\u002Fjournals.plos.org\u002Fplosone\u002Farticle\u002Ffile?id=10.1371\u002Fjournal.pone.0226880&type=printable)]    \n\n* **Interpreting and predicting tactile signals via a physics-based and data-driven framework**, ArXiv, 2020.    \nNarang, Yashraj S and Van Wyk, Karl and Mousavian, Arsalan and Fox, Dieter.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.03777)]\n\n* **Fast texture classification using tactile neural coding and spiking neural network**, IROS, 2020.    \nTaunyazov, Tasbolat and Chua, Yansong and Gao, Ruihan and Soh, Harold and Wu, Yan.    \n[[page](https:\u002F\u002Fruihangao.github.io\u002Ffiles\u002Ftaunyazov2020fast.pdf)]\n\n* **Simulation of the SynTouch BioTac sensor**, Intelligent Autonomous Systems 15: Proceedings of the 15th International Conference IAS-15, 2019.    \nRuppel, Philipp and Jonetzko, Yannick and Görner, Michael and Hendrich, Norman and Zhang, Jianwei.   \n[[page](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FYannick-Jonetzko\u002Fpublication\u002F330014756_Simulation_of_the_SynTouch_BioTac_Sensor_Proceedings_of_the_15th_International_Conference_IAS-15\u002Flinks\u002F5cc7ed694585156cd7bbc519\u002FSimulation-of-the-SynTouch-BioTac-Sensor-Proceedings-of-the-15th-International-Conference-IAS-15.pdf)]   \n\n* **Robust learning of tactile force estimation through robot interaction**, ICRA, 2019.   \nSundaralingam, Balakumar and Lambert, Alexander Sasha and Handa, Ankur and Boots, Byron and Hermans, Tucker and Birchfield, Stan and Ratliff, Nathan and Fox, Dieter.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.06187)]   \n\n* **From pixels to percepts: Highly robust edge perception and contour following using deep learning and an optical biomimetic tactile sensor**, IEEE Robotics and Automation Letters, 2019.    \nLepora, Nathan F and Church, Alex and De Kerckhove, Conrad and Hadsell, Raia and Lloyd, John.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1812.02941)]\n\n* **Tactile mapping and localization from high-resolution tactile imprints**, ICRA, 2019.   \nBauza, Maria and Canal, Oleguer and Rodriguez, Alberto.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.10944)]   \n\n* **Convolutional autoencoder for feature extraction in tactile sensing**, IEEE Robotics and Automation Letters, 2019.   \nPolic, Marsela and Krajacic, Ivona and Lepora, Nathan and Orsag, Matko.    \n[[page](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8758942\u002F)]     \n\n* **Learning to identify object instances by touch: Tactile recognition via multimodal matching**, ICRA, 2019.    \nLin, Justin and Calandra, Roberto and Levine, Sergey.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1903.03591)]\n\n* **The tactip family: Soft optical tactile sensors with 3D-printed biomimetic morphologies**, Soft Robotics, 2018.   \nWard-Cherrier, Benjamin and Pestell, Nicholas and Cramphorn, Luke and Winstone, Benjamin and Giannaccini, Maria Elena and Rossiter, Jonathan and Lepora, Nathan F.   \n[[page](https:\u002F\u002Fwww.liebertpub.com\u002Fdoi\u002Fpdf\u002F10.1089\u002Fsoro.2017.0052)]     \n\n* **3D shape perception from monocular vision, touch, and shape priors**, IROS, 2018.   \nWang, Shaoxiong and Wu, Jiajun and Sun, Xingyuan and Yuan, Wenzhen and Freeman, William T and Tenenbaum, Joshua B and Adelson, Edward H.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.13916)]\n\n* **GelSight: High-Resolution Robot Tactile Sensors for Estimating Geometry and Force**, Sensors, 2017.   \nYuan, Wenzhen and Dong, Siyuan and Adelson, Edward H.   \n[[page](https:\u002F\u002Fwww.mdpi.com\u002F1424-8220\u002F17\u002F12\u002F2762\u002Fpdf)]   \n\n* **The feeling of success: Does touch sensing help predict grasp outcomes?**, arXiv, 2017.      \nCalandra, Roberto and Owens, Andrew and Upadhyaya, Manu and Yuan, Wenzhen and Lin, Justin and Adelson, Edward H and Levine, Sergey.      \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.05512)]        \n\n* **Improved GelSight tactile sensor for measuring geometry and slip**, IROS, 2017.    \nDong, Siyuan and Yuan, Wenzhen and Adelson, Edward H.   \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1708.00922)]   \n\n* **GelSight: High-resolution robot tactile sensors for estimating geometry and force**, Sensors, vol. 17, no. 12, pp. 2762, 2017.   \nYuan, Wenzhen and Dong, Siyuan and Adelson, Edward H.   \n[[page](https:\u002F\u002Fwww.mdpi.com\u002F1424-8220\u002F17\u002F12\u002F2762\u002Fpdf)]   \n\n* **Connecting look and feel: Associating the visual and tactile properties of physical materials**, CVPR, 2017.    \nYuan, Wenzhen and Wang, Shaoxiong and Dong, Siyuan and Adelson, Edward.    \n[[page](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2017\u002Fpapers\u002FYuan_Connecting_Look_and_CVPR_2017_paper.pdf)]    \n\n* **Stable reinforcement learning with autoencoders for tactile and visual data**, IROS, 2016.    \nVan Hoof, Herke and Chen, Nutan and Karl, Maximilian and van der Smagt, Patrick and Peters, Jan.    \n[[page](https:\u002F\u002Fwww.academia.edu\u002Fdownload\u002F47652433\u002FHoof2016.pdf)]    \n\n* **Sensing tactile microvibrations with the BioTac—Comparison with human sensitivity**, BioRob, 2012.   \nFishel, Jeremy A and Loeb, Gerald E.   \n[[page](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FGerald-Loeb\u002Fpublication\u002F256748883_Sensing_tactile_microvibrations_with_the_BioTac_Comparison_with_human_sensitivity\u002Flinks\u002F5dbcacae299bf1a47b0a3fa6\u002FSensing-tactile-microvibrations-with-the-BioTac-Comparison-with-human-sensitivity.pdf)]   \n\n## \u003Ca id=\"interaction\"> Embodied Interaction \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa> \n\n* **DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness**, arXiv, 2025     \nYiming Zhong, Qi Jiang, Jingyi Yu, Yuexin Ma.       \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.08257)\n\n* **Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering**, arxiv, 2025     \nKaixuan Jiang, Yang Liu, Weixing Chen, Jingzhou Luo, Ziliang Chen, Ling Pan, Guanbin Li, Liang Lin.        \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.11117)\n\n* **Cross-Embodiment Dexterous Grasping with Reinforcement Learning**, arxiv, 2024     \nHaoqi Yuan, Bohan Zhou, Yuhui Fu, Zongqing Lu.       \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.02479)\n\n* **ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation**, arxiv, 2024     \nGuanxing Lu, Shiyi Zhang, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang.       \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.08321)\n\n* **MANUS: Markerless Grasp Capture using Articulated 3D Gaussians**, CVPR, 2024     \nChandradeep Pokhariya, Ishaan Nikhil Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar.       \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FPokhariya_MANUS_Markerless_Grasp_Capture_using_Articulated_3D_Gaussians_CVPR_2024_paper.pdf)\n\n* **Language-driven Grasp Detection**, CVPR, 2024     \nAn Dinh Vuong, Minh Nhat Vu, Baoru Huang, Nghia Nguyen, Hieu Le, Thieu Vo, Anh Nguyen.       \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FVuong_Language-driven_Grasp_Detection_CVPR_2024_paper.pdf)\n\n* **Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge**, CVPR, 2024     \nHaoxiang Ma, Modi Shi, Boyang Gao, Di Huang.       \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FMa_Generalizing_6-DoF_Grasp_Detection_via_Domain_Prior_Knowledge_CVPR_2024_paper.pdf)\n\n* **Dexterous Grasp Transformer**, CVPR, 2024     \nGuo-Hao Xu, Yi-Lin Wei, Dian Zheng, Xiao-Ming Wu, Wei-Shi Zheng.       \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FXu_Dexterous_Grasp_Transformer_CVPR_2024_paper.pdf)\n\n* **Single-View Scene Point Cloud Human Grasp Generation**, CVPR, 2024     \nYan-Kang Wang, Chengyi Xing, Yi-Lin Wei, Xiao-Ming Wu, Wei-Shi Zheng.       \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWang_Single-View_Scene_Point_Cloud_Human_Grasp_Generation_CVPR_2024_paper.pdf)\n\n* **G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis**, CVPR, 2024     \nYufei Ye, Abhinav Gupta, Kris Kitani, Shubham Tulsiani.       \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYe_G-HOP_Generative_Hand-Object_Prior_for_Interaction_Reconstruction_and_Grasp_Synthesis_CVPR_2024_paper.pdf)\n\n* **Grasping Diverse Objects with Simulated Humanoids** ArXiv, 2024.           \nZhengyi Luo, Jinkun Cao, Sammy Christen, Alexander Winkler, Kris Kitani, Weipeng Xu      \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.11385)        \n\n* **Task-Oriented Dexterous Grasp Synthesis via Differentiable Grasp Wrench Boundary Estimator**, IROS, 2024.           \nJiayi Chen,Yuxing Chen,Jialiang Zhang, He Wang          \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.13586)\n\n* **Open6DOR: Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach**, IROS, 2024.           \nYufei Ding,Haoran Geng , Chaoyi Xu ,Xiaomeng Fang,Jiazhao Zhang,Songlin Wei, Qiyu Dai, Zhizheng Zhang, He Wang             \n[[page]](https:\u002F\u002Fpku-epic.github.io\u002FOpen6DOR\u002F)\n\n* **ASGrasp: Generalizable Transparent Object Reconstruction and 6-DoF Grasp Detection from RGB-D Active Stereo Camera**, ICRA, 2024.           \nJun Shi, Yong A, Yixiang Jin, Dingzhe Li, Haoyu Niu, Zhezhu Jin, He Wang                 \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.05648)\n\n* **OpenEQA: Embodied Question Answering in the Era of Foundation Models**, CVPR, 2024    \nMajumdar, Arjun and Ajay, Anurag and Zhang, Xiaohan and Putta, Pranav and Yenamandra, Sriram and Henaff, Mikael and Silwal, Sneha and Mcvay, Paul and Maksymets, Oleksandr and Arnaud, Sergio and others    \n[[page]](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FMajumdar_OpenEQA_Embodied_Question_Answering_in_the_Era_of_Foundation_Models_CVPR_2024_paper.pdf)    \n\n* **Explore until Confident: Efficient Exploration for Embodied Question Answering**, ICRA Workshop VLMNM, 2024    \nRen, Allen Z and Clark, Jaden and Dixit, Anushri and Itkina, Masha and Majumdar, Anirudha and Sadigh, Dorsa    \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.15941)    \n\n* **S-EQA: Tackling Situational Queries in Embodied Question Answering**, arXix, 2024    \nDorbala, Vishnu Sashank and Goyal, Prasoon and Piramuthu, Robinson and Johnston, Michael and Manocha, Dinesh and Ghanadhan, Reza     \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.04732)\n\n* **Map-based Modular Approach for Zero-shot Embodied Question Answering**, arXiv, 2024    \nSakamoto, Koya and Azuma, Daichi and Miyanishi, Taiki and Kurita, Shuhei and Kawanabe, Motoaki    \n[[page]](https:\u002F\u002Fui.adsabs.harvard.edu\u002Fabs\u002F2024arXiv240516559S\u002Fabstract)    \n\n* **Embodied Question Answering via Multi-LLM Systems**, arXiv, 2024    \nBhrij Patel and Vishnu Sashank Dorbala and Amrit Singh Bedi    \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.10918)    \n\n* **MultiGripperGrasp: A Dataset for Robotic Grasping from Parallel Jaw Grippers to Dexterous Hands**, arXiv, 2024    \nMurrilo, Luis Felipe Casas and Khargonkar, Ninad and Prabhakaran, Balakrishnan and Xiang, Yu    \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.09841)\n\n* **Reasoning Grasping via Multimodal Large Language Model**, arXiv, 2024    \nJin, Shiyu and Xu, Jinxuan and Lei, Yutian and Zhang, Liangjun   \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.06798)    \n\n* **SemGrasp: Semantic Grasp Generation via Language Aligned Discretization**, CoRR, 2024    \nLi, Kailin and Wang, Jingbo and Yang, Lixin and Lu, Cewu and Dai, Bo    \n[[page]](https:\u002F\u002Fopenreview.net\u002Fforum?id=WUbr8NV1G6)\n\n* **GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping**, arXiv, 2024    \nZheng, Yuhang and Chen, Xiangyu and Zheng, Yupeng and Gu, Songen and Yang, Runyi and Jin, Bu and Li, Pengfei and Zhong, Chengliang and Wang, Zengmao and Liu, Lina and others    \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.09637)\n\n* **Knowledge-based Embodied Question Answering**, TPAMI, 2023    \nTan, Sinan and Ge, Mengmeng and Guo, Di and Liu, Huaping and Sun, Fuchun      \n[[page]](https:\u002F\u002Fpubmed.ncbi.nlm.nih.gov\u002F37195849\u002F)\n\n* **Deep Learning Approaches to Grasp Synthesis: A Review**, IEEE Transactions on Robotics, 2023    \nNewbury, Rhys and Gu, Morris and Chumbley, Lachlan and Mousavian, Arsalan and Eppner, Clemens and Leitner, J{\\\"u}rgen and Bohg, Jeannette and Morales, Antonio and Asfour, Tamim and Kragic, Danica and others    \n[[page]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1109\u002FTRO.2023.3280597)\n\n* **Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter**, CoRL, 2023    \nTziafas, Georgios and Xu, Yucheng and Goel, Arushi and Kasaei, Mohammadreza and Li, Zhibin and Kasaei, Hamidreza    \n[[page]](https:\u002F\u002Fwww.research.ed.ac.uk\u002Fen\u002Fpublications\u002Flanguage-guided-robot-grasping-clip-based-referring-grasp-synthes)    \n\n* **Reasoning Tuning Grasp: Adapting Multi-Modal Large Language Models for Robotic Grasping**, CoRL, 2023     \nXu, Jinxuan and Jin, Shiyu and Lei, Yutian and Zhang, Yuqian and Zhang, Liangjun   \n[[page]](https:\u002F\u002Fopenreview.net\u002Fforum?id=3mKb5iyZ2V)    \n\n* **Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation**, CoRL, 2023    \nShen, William and Yang, Ge and Yu, Alan and Wong, Jansen and Kaelbling, Leslie Pack and Isola, Phillip    \n[[page]](https:\u002F\u002Fproceedings.mlr.press\u002Fv229\u002Fshen23a.html)    \n\n* **AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains**, IEEE Transactions on Robotics, 2023 \nFang, Hao-Shu and Wang, Chenxi and Fang, Hongjie and Gou, Minghao and Liu, Jirong and Yan, Hengxu and Liu, Wenhai and Xie, Yichen and Lu, Cewu    \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10167687)\n\n* **DexGraspNet: A Large-Scale Robotic Dexterous Grasp Dataset for General Objects Based on Simulation**, ICRA, 2023.           \nRuicheng Wang, Jialiang Zhang, Jiayi Chen, Yinzhen Xu, Puhao Li, Tengyu Liu, He Wang                          \n[[page]](https:\u002F\u002Fpku-epic.github.io\u002FDexGraspNet\u002F)\n\n* **UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy**, CVPR, 2023.           \nYinzhen Xu, Weikang Wan, Jialiang Zhang, Haoran Liu, Zikang Shan, Hao Shen, Ruicheng Wang, Haoran Geng, Yijia Weng, Jiayi Chen, Tengyu Liu, Li Yi, He Wang                      \n[[page]](https:\u002F\u002Fpku-epic.github.io\u002FUniDexGrasp\u002F)\n\n* **UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning**, ICCV, 2023.           \nWeikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang                    \n[[page]](https:\u002F\u002Fpku-epic.github.io\u002FUniDexGrasp++\u002F)\n\n* **CLIPort: What and Where Pathways for Robotic Manipulation**, CoRL, 2022    \nShridhar, Mohit and Manuelli, Lucas and Fox, Dieter    \n[[page]](https:\u002F\u002Fproceedings.mlr.press\u002Fv164\u002Fshridhar22a.html)    \n\n* **ACRONYM: A Large-Scale Grasp Dataset Based on Simulation**, ICRA, 2021     \nEppner, Clemens and Mousavian, Arsalan and Fox, Dieter    \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F9560844)    \n\n* **Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI**, NeurIPS, 2021    \nRamakrishnan, Santhosh K and Gokaslan, Aaron and Wijmans, Erik and Maksymets, Oleksandr and Clegg, Alex and Turner, John and Undersander, Eric and Galuba, Wojciech and Westbury, Andrew and Chang, Angel X and others    \n[[page]](https:\u002F\u002Fdatasets-benchmarks-proceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2021\u002Ffile\u002F34173cb38f07f89ddbebc2ac9128303f-Paper-round2.pdf)    \n\n* **End-to-end Trainable Deep Neural Network for Robotic Grasp Detection and Semantic Segmentation from RGB**, ICRA, 2021    \nAinetter, Stefan and Fraundorfer, Friedrich   \n[[page]](https:\u002F\u002Felib.dlr.de\u002F146134\u002F)\n\n* **Revisiting EmbodiedQA: A Simple Baseline and Beyond**, IEEE Transactions on Image Processing, 2020    \nWu, Yu and Jiang, Lu and Yang, Yi    \n[[page]](https:\u002F\u002Fopus.lib.uts.edu.au\u002Frest\u002Fbitstreams\u002Fee2d1faf-ce3b-4f63-a133-4217d19e9db1\u002Fretrieve)    \n\n* **Multi-agent Embodied Question Answering in Interactive Environments**, ECCV, 2020    \nTan, Sinan and Xiang, Weilai and Liu, Huaping and Guo, Di and Sun, Fuchun    \n[[page]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1007\u002F978-3-030-58601-0_39)    \n\n* **Language Models are Few-Shot Learners**, NIPS, 2020    \nBrown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others    \n[[page]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.5555\u002F3495724.3495883)    \n\n* **GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping**, CVPR, 2020    \nFang, Hao-Shu and Wang, Chenxi and Gou, Minghao and Lu, Cewu    \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2020\u002Fhtml\u002FFang_GraspNet-1Billion_A_Large-Scale_Benchmark_for_General_Object_Grasping_CVPR_2020_paper.html)\n\n* **Multi-Target Embodied Question Answering**, CVPR, 2019    \nYu, Licheng and Chen, Xinlei and Gkioxari, Georgia and Bansal, Mohit and Berg, Tamara L and Batra, Dhruv    \n[[page]](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2019\u002Fpapers\u002FYu_Multi-Target_Embodied_Question_Answering_CVPR_2019_paper.pdf)    \n\n* **Embodied Question Answering in Photorealistic Environments with Point Cloud Perception**, CVPR, 2019    \nWijmans, Erik and Datta, Samyak and Maksymets, Oleksandr and Das, Abhishek and Gkioxari, Georgia and Lee, Stefan and Essa, Irfan and Parikh, Devi and Batra, Dhruv   \n[[page]](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2019\u002Fpapers\u002FWijmans_Embodied_Question_Answering_in_Photorealistic_Environments_With_Point_Cloud_Perception_CVPR_2019_paper.pdf)\n\n* **VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering**, BMVC, 2019    \nCangea, C{\\u{a}}t{\\u{a}}lina and Belilovsky, Eugene and Li{\\`o}, Pietro and Courville, Aaron    \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1908.04950)    \n\n* **6-DOF GraspNet: Variational Grasp Generation for Object Manipulation**, ICCV, 2019    \nMousavian, Arsalan and Eppner, Clemens and Fox, Dieter    \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_ICCV_2019\u002Fhtml\u002FMousavian_6-DOF_GraspNet_Variational_Grasp_Generation_for_Object_Manipulation_ICCV_2019_paper.html)    \n\n* **Embodied Question Answering**, CVPR, 2018    \nDas, Abhishek and Datta, Samyak and Gkioxari, Georgia and Lee, Stefan and Parikh, Devi and Batra, Dhruv       \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2018\u002Fpapers\u002FDas_Embodied_Question_Answering_CVPR_2018_paper.pdf)    \n\n* **IQA: Visual Question Answering in Interactive Environments**, CVPR, 2018     \nGordon, Daniel and Kembhavi, Aniruddha and Rastegari, Mohammad and Redmon, Joseph and Fox, Dieter and Farhadi, Ali \n[[page]](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2018\u002Fpapers\u002FGordon_IQA_Visual_Question_CVPR_2018_paper.pdf)      \n\n* **Building Generalizable Agents with a Realistic and Rich 3D Environment**, ECCV, 2018     \nWu, Yi and Wu, Yuxin and Gkioxari, Georgia and Tian, Yuandong    \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1801.02209)    \n\n* **MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments**, ECCV, 2018    \nSavva, Manolis and Chang, Angel X and Dosovitskiy, Alexey and Funkhouser, Thomas and Koltun, Vladlen    \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.03931)    \n\n* **Neural Modular Control for Embodied Question Answering**, ECCV, 2018    \nDas, Abhishek and Gkioxari, Georgia and Lee, Stefan and Parikh, Devi and Batra, Dhruv    \n[[page]](https:\u002F\u002Fauthors.library.caltech.edu\u002Frecords\u002Fykvm4-2ed40\u002Ffiles\u002F1810.11181.pdf)\n\n* **Jacquard: A Large Scale Dataset for Robotic Grasp Detection**, IROS, 2018    \nDepierre, Amaury and Dellandr{\\'e}a, Emmanuel and Chen, Liming    \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8593950)    \n\n* **Matterport3D: Learning from rgb-d data in indoor environments,**, IEEE International Conference on 3D Vision, 2017    \nChang, Angel and Dai, Angela and Funkhouser, Thomas and Halber, Maciej and Niessner, Matthias and Savva, Manolis and Song, Shuran and Zeng, Andy and Zhang, Yinda    \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1709.06158)    \n\n* **ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes**, CVPR, 2017    \nDai, Angela and Chang, Angel X and Savva, Manolis and Halber, Maciej and Funkhouser, Thomas and Nie{\\ss}ner, Matthias \n[[page]](https:\u002F\u002Fwww.computer.org\u002Fcsdl\u002Fproceedings-article\u002Fcvpr\u002F2017\u002F0457c432\u002F12OmNyRg4C5)\n\n* **Shape Completion Enabled Robotic Grasping**, IROS, 2017    \nVarley, Jacob and DeChant, Chad and Richardson, Adam and Ruales, Joaqu{\\'\\i}n and Allen, Peter    \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8206060)    \n\n* **Efficient grasping from RGBD images: Learning using a new rectangle representation**, IEEE International Conference on Robotics and Automation, 2011    \nJiang, Yun and Moseson, Stephen and Saxena, Ashutosh    \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F5980145)    \n\n* **A frontier-based approach for autonomous exploration**, CIRA, 1997     \nYamauchi, Brian    \n[[page]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.5555\u002F523996.793157)    \n\n## \u003Ca id=\"agent\"> Embodied Agent \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa> \n\n### Embodied Multimodal Foundation Models and VLA Methods\n* **π₀: A Vision-Language-Action Flow Model for General Robot Control**, arXiv, 2024.     \nKevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Lucy Xiaoyang Shi, James Tanner, Quan Vuong, Anna Walling, Haohuan Wang, Ury Zhilinsky.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.24164)] [[Project](https:\u002F\u002Fphysicalintelligence.company\u002Fblog\u002Fpi0)]\n\n* **π₀.₅: A Vision-Language-Action Model with Open-World Generalization**, arXiv, 2025.     \nPhysical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Manuel Y. Galliker, Dibya Ghosh, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Devin LeBlanc, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Allen Z. Ren, Laura Smith, Jost Tobias Springenberg, Kyle Stachowicz, James Tanner, Quan Vuong, Homer Walke, Anna Walling, Haohuan Wang, Lili Yu, Ury Zhilinsky.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.16054)] [[Project](https:\u002F\u002Fwww.physicalintelligence.company\u002Fblog\u002Fpi05)]\n\n* **GR00T N1: An Open Foundation Model for Generalist Humanoid Robots**, arXiv, 2025.     \nNVIDIA: Johan Bjorck, Fernando Castañeda, Nikita Cherniadev, Xingye Da, Runyu Ding, Linxi \"Jim\" Fan, Yu Fang, Dieter Fox, Fengyuan Hu, Spencer Huang, Joel Jang, Zhenyu Jiang, Jan Kautz, Yuke Zhu.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.14734)] [[Project](https:\u002F\u002Fdeveloper.nvidia.com\u002Fproject-groot)]\n\n* **Gemini Robotics: Bringing AI into the Physical World**, arXiv, 2025.     \nGemini Robotics Team, Google DeepMind.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.20020)] [[Project](https:\u002F\u002Fdeepmind.google\u002Fdiscover\u002Fblog\u002Fgemini-robotics\u002F)]\n\n* **OpenVLA: An Open-Source Vision-Language-Action Model**, CoRL, 2024.     \nMoo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag R. Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.09246)] [[Project](https:\u002F\u002Fopenvla.github.io\u002F)]\n\n* **Octo: An Open-Source Generalist Robot Policy**, RSS, 2024.     \nOcto Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Lerrel Pinto, Chelsea Finn, Sergey Levine.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.12213)] [[Project](https:\u002F\u002Focto-models.github.io\u002F)]\n\n* **Magma: A Foundation Model for Multimodal AI Agents**, CVPR, 2025.     \nJianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Jongmin Jang, Yuquan Deng, Lars Lidén, Jianfeng Gao.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.13130)]\n\n* **UniVLA: Unified Vision-Language-Action Model**, RSS, 2025.     \nYuqi Wang, Xinghang Li, Wenxuan Wang, Junbo Zhang, Yingyan Li, Yuntao Chen, Xinlong Wang, Zhaoxiang Zhang.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.19850)] [[Project](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002FUniVLA)]\n\n* **FAST: Efficient Action Tokenization for Vision-Language-Action Models**, arXiv, 2025.     \nKarl Pertsch, Kyle Stachowicz, Brian Ichter, Danny Driess, Suraj Nair, Quan Vuong, Sergey Levine, Chelsea Finn.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.09747)] [[Project](https:\u002F\u002Fphysicalintelligence.company\u002Fresearch\u002Ffast)]\n\n* **HumanPlus: Humanoid Shadowing and Imitation from Humans**, CoRL, 2024.     \nZipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, Chelsea Finn.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.10454)] [[Project](https:\u002F\u002Fhumanoid-ai.github.io\u002F)]\n\n* **ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills**, arXiv, 2025.     \nTairan He, Jiawei Gao, Wenli Xiao, Yuanhang Zhang, Zi Wang, Jiashun Wang, Zhengyi Luo, Guanqi He, Nikhil Sobanbab, Chaoyi Pan, Zeji Yi, Guannan Qu, Kris Kitani, Jessica Hodgins, Linxi \"Jim\" Fan, Yuke Zhu, Changliu Liu, Guanya Shi.     \n[[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.01143)]\n\n* **Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks**, arXiv, 2025.     \nWenqi Zhang, Mengna Wang, Gangao Liu, Xu Huixin, Yiwei Jiang, Yongliang Shen, Guiyang Hou, Zhe Zheng, Hang Zhang, Xin Li, Weiming Lu, Peng Li, Yueting Zhuang     \n[[page](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.21696)]\n\n* **RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World**, arXiv, 2024.     \nWeixin Mao, Weiheng Zhong, Zhou Jiang, Dong Fang, Zhongyue Zhang, Zihan Lan, Fan Jia, Tiancai Wang, Haoqiang Fan, Osamu Yoshie.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.00171)]\n\n* **Spatially Visual Perception for End-to-End Robotic Learning**, arXiv, 2024.     \nTravis Davies, Jiahuan Yan, Xiang Chen, Yu Tian, Yueting Zhuang, Yiqi Huang, Luhui Hu.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.17458)]\n\n* **GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation**, arXiv, 2024.     \nChi-Lam Cheang, Guangzeng Chen, Ya Jing, Tao Kong, Hang Li, Yifeng Li, Yuxiao Liu, Hongtao Wu, Jiafeng Xu, Yichu Yang, Hanbo Zhang, Minzhao Zhu.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.06158)]\n\n* **Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers**, arXiv, 2024.     \nLirui Wang, Xinlei Chen, Jialiang Zhao, Kaiming He.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.20537)]\n\n* **Spatial Reasoning and Planning for Deep Embodied Agents**, arXiv, 2024.     \nShu Ishida.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.19479)]\n\n* **Grounding Large Language Models In Embodied Environment With Imperfect World Models**, arXiv, 2024.     \nHaolan Liu, Jishen Zhao.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.02742)]\n\n* **SELU: Self-Learning Embodied MLLMs in Unknown Environments**, arXiv, 2024.     \nBoyu Li, Haobin Jiang, Ziluo Ding, Xinrun Xu, Haoran Li, Dongbin Zhao, Zongqing Lu.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.03303)]\n\n* **Autort: Embodied foundation models for large scale orchestration of robotic agents**, arXiv, 2024.     \nAhn, Michael, Debidatta, Dwibedi, Chelsea, Finn, Montse Gonzalez, Arenas, Keerthana, Gopalakrishnan, Karol, Hausman, Brian, Ichter, Alex, Irpan, Nikhil, Joshi, Ryan, Julian, others.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.12963)]     \n\n* **Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learningn**, arXiv, 2024.      \nNorman Di Palo, Leonard Hasenclever, Jan Humplik, Arunkumar Byravan.       \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.20798)]\n\n* **Rt-h: Action hierarchies using language**, ArXiv, 2024.    \nBelkhale, Suneel, Tianli, Ding, Ted, Xiao, Pierre, Sermanet, Quon, Vuong, Jonathan, Tompson, Yevgen, Chebotar, Debidatta, Dwibedi, Dorsa, Sadigh.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.01823)]\n\n* **Do as i can, not as i say: Grounding language in robotic affordances**, Conference on robot learning. 2023.    \nBrohan, Anthony, Yevgen, Chebotar, Chelsea, Finn, Karol, Hausman, Alexander, Herzog, Daniel, Ho, Julian, Ibarz, Alex, Irpan, Eric, Jang, Ryan, Julian.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.01691)]    \n\n* **Embodiedgpt: Vision-language pre-training via embodied chain of thought**, NeurIPS, 2024.     \nMu, Yao, Qinglong, Zhang, Mengkang, Hu, Wenhai, Wang, Mingyu, Ding, Jun, Jin, Bin, Wang, Jifeng, Dai, Yu, Qiao, Ping, Luo.     \n[[page](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2023\u002Ffile\u002F4ec43957eda1126ad4887995d05fae3b-Paper-Conference.pdf)]\n\n* **Q-transformer: Scalable offline reinforcement learning via autoregressive q-functions**, Conference on Robot Learning. 2023.    \nChebotar, Yevgen, Quan, Vuong, Karol, Hausman, Fei, Xia, Yao, Lu, Alex, Irpan, Aviral, Kumar, Tianhe, Yu, Alexander, Herzog, Karl, Pertsch, others.     \n[[page](https:\u002F\u002Fproceedings.mlr.press\u002Fv229\u002Fchebotar23a\u002Fchebotar23a.pdf)]    \n\n* **Sara-rt: Scaling up robotics transformers with self-adaptive robust attention**, arXiv, 2023.    \nLeal, Isabel, Krzysztof, Choromanski, Deepali, Jain, Avinava, Dubey, Jake, Varley, Michael, Ryoo, Yao, Lu, Frederick, Liu, Vikas, Sindhwani, Quan, Vuong, others.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.01990)]\n\n* **Palm-e: An embodied multimodal language model**, ArXiv, 2023.    \nDriess, Danny, Fei, Xia, Mehdi SM, Sajjadi, Corey, Lynch, Aakanksha, Chowdhery, Brian, Ichter, Ayzaan, Wahid, Jonathan, Tompson, Quan, Vuong, Tianhe, Yu, others.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.03378)]    \n\n* **Rt-2: Vision-language-action models transfer web knowledge to robotic control**, Conference on Robot Learning. 2023.    \nZitkovich, Brianna, Tianhe, Yu, Sichun, Xu, Peng, Xu, Ted, Xiao, Fei, Xia, Jialin, Wu, Paul, Wohlhart, Stefan, Welker, Ayzaan, Wahid, others.    \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.15818)]\n\n* **Open x-embodiment: Robotic learning datasets and rt-x models**, arXiv, 2023.        \nPadalkar, others.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.08864)]\n\n* **Vision-language foundation models as effective robot imitators**, arXiv, 2023.    \nLi, Xinghang, Minghuan, Liu, Hanbo, Zhang, Cunjun, Yu, Jie, Xu, Hongtao, Wu, Chilam, Cheang, Ya, Jing, Weinan, Zhang, Huaping, Liu, others.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.01378)]    \n\n* **Rt-1: Robotics transformer for real-world control at scale**, ArXiv, 2022.    \nBrohan, Anthony, Noah, Brown, Justice, Carbajal, Yevgen, Chebotar, Joseph, Dabis, Chelsea, Finn, Keerthana, Gopalakrishnan, Karol, Hausman, Alex, Herzog, Jasmine, Hsu, others.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2212.06817)]    \n\n### Embodied Manipulation & Control\n\n* **Diffusion Policy: Visuomotor Policy Learning via Action Diffusion**, RSS, 2023.    \nCheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, Shuran Song.       \n[[page](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.04137)] [[Project](https:\u002F\u002Fdiffusion-policy.cs.columbia.edu\u002F)]\n\n* **ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning**, CVPR, 2025.    \nKailin Li, Puhao Li, Tengyu Liu, Yuyang Li, Siyuan Huang.       \n[[page](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.21860)]\n\n* **KStar Diffuser: Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation**, CVPR, 2025.    \nQi Lv, Hao Li, Xiang Deng, Rui Shao, Yinchuan Li, Jianye Hao, Longxiang Gao, Michael Yu Wang, Liqiang Nie.       \n[[page](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F11093774\u002F)]\n\n* **AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems**, IROS, 2025.    \nAgiBot-World-Contributors, Qingwen Bu, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao, Xindong He, Xuan Hu, Xu Huang, Shu Jiang, Yuxin Jiang, Hongyang Li, Jialu Li, Chiming Liu, Yi Liu, Yuxiang Lu, Jianlan Luo, Ping Luo, Yao Mu, Yuehan Niu, Yixuan Pan, Jiangmiao Pang, Yu Qiao.       \n[[page](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.06669)] [[Project](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002FAgiBot-World)]\n\n* **Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation**, arXiv, 2025.    \nAbhiram Maddukuri, Zhenyu Jiang, Lawrence Yunliang Chen, Soroush Nasiriany, Yuqi Xie, Yu Fang, Wenqi Huang, Zu Wang, Zhenjia Xu, Nikita Chernyadev, Scott Reed, Ken Goldberg, Ajay Mandlekar, Linxi Fan, Yuke Zhu.       \n[[page](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.24361)]\n\n* **PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning**, NeurIPS, 2024.    \nChengyang Ying, Zhongkai Hao, Xinning Zhou, Xuezhou Xu, Hang Su, Xingxing Zhang, Jun Zhu.       \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.14073)]\n\n* **Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning**, ICML, 2024.    \nHengkai Tan, Songming Liu, Kai Ma, Chengyang Ying, Xingxing Zhang, Hang Su, Jun Zhu.       \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.19885)]\n\n* **RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation**, ArXiv, 2024.    \nSongming Liu, Lingxuan Wu, Bangguo Li, Hengkai Tan, Huayu Chen, Zhengyi Wang, Ke Xu, Hang Su, Jun Zhu.       \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.07864)]\n\n* **ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation**, ArXiv, 2024.    \nHengkai Tan, Xuezhou Xu, Chengyang Ying, Xinyi Mao, Songming Liu, Xingxing Zhang, Hang Su, Jun Zhu.       \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.01850)]\n\n* **RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator**, ArXiv, 2024.    \nXinhai Li, Jialin Li, Ziheng Zhang, Rui Zhang, Fan Jia, Tiancai Wang, Haoqiang Fan, Kuo-Kun Tseng, Ruiping Wang.       \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.11839)]\n\n* **SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation**, ArXiv, 2024.    \nZihan Zhou, Animesh Garg, Dieter Fox, Caelan Garrett, Ajay Mandlekar.       \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.18065)]\n\n* **Diffusion Transformer Policy**, ArXiv, 2024.    \nZhi Hou, Tianyi Zhang, Yuwen Xiong, Hengjun Pu, Chengyang Zhao, Ronglei Tong, Yu Qiao, Jifeng Dai, Yuntao Chen.       \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.15959)]\n\n* **Dexcap: Scalable and portable mocap data collection system for dexterous manipulation**, ArXiv, 2024.    \nChen Wang, Haochen Shi, Weizhuo Wang, Ruohan Zhang, Li Fei-Fei, C Karen Liu.       \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.07788)]\n\n* **Lota-bench: Benchmarking language-oriented task planners for embodied agents**, ArXiv, 2024.    \nChoi, Jae-Woo, Youngwoo, Yoon, Hyobin, Ong, Jaehong, Kim, Minsu, Jang.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.08178)]\n\n* **Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following**, Arxiv, 2024.    \nSuyeon Shin, Sujin jeon, Junghyun Kim, Gi-Cheon Kang, Byoung-Tak Zhang.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.15190)]    \n\n* **Large Language Models as Commonsense Knowledge for Large-Scale Task Planning**, NeurIPS, 2024.    \nZhao, Zirui, Wee Sun, Lee, David, Hsu.     \n[[page](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2023\u002Ffile\u002F65a39213d7d0e1eb5d192aa77e77eeb7-Paper-Conference.pdf)]\n\n* **Generalized Planning in PDDL Domains with Pretrained Large Language Models**, AAAI, 2024.    \nSilver, Tom, Soham, Dan, Kavitha, Srinivas, Joshua B., Tenenbaum, Leslie Pack, Kaelbling, Michael, Katz.     \n[[page](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fdownload\u002F30006\u002F31766)]    \n\n* **Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration** arXiv, 2024.    \nZhang, Yang, Shixin, Yang, Chenjia, Bai, Fei, Wu, Xiu, Li, Xuelong, Li, Zhen, Wang.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.14314)]\n\n* **Embodied Instruction Following in Unknown Environments**, arXiv, 2024.    \nZhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.11818)]\n\n* **A Backbone for Long-Horizon Robot Task Understanding**, arxiv, 2024.       \nXiaoshuai Chen, Wei Chen, Dongmyoung Lee, Yukun Ge, Nicolas Rojas, and Petar Kormushev.           \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.01334)\n\n* **RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation**, arXiv, 2024.    \nLiu, Jiaming, Mengzhen, Liu, Zhenyu, Wang, Lily, Lee, Kaichen, Zhou, Pengju, An, Senqiao, Yang, Renrui, Zhang, Yandong, Guo, Shanghang, Zhang.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.04339)]\n\n* **Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation**, arxiv, 2024.          \nRuoxuan Feng, Di Hu1, Wenke Ma, Xuelong Li.              \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.01366)\n\n* **Egocentric Vision Language Planning**, arxiv, 2024.          \nZhirui Fang, Ming Yang, Weishuai Zeng, Boyu Li, Junpeng Yue, Ziluo Ding, Xiu Li, Zongqing Lu.              \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.05802)\n\n* **Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models**, IROS, 2024.          \nTianyu Wang, Haitao Lin, Junqiu Yu, Yanwei Fu.              \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.07975)\n\n* **LLM-SAP: Large Language Model Situational Awareness Based Planning**, ICME 2024 Workshop MML4SG.           \nLiman Wang, Hanyang Zhong.          \n[[page]](https:\u002F\u002Fgithub.com\u002FHanyangZhong\u002FSituational_Planning_datasets)    \n\n* **FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning**, ArXiv, 2024.       \nJianlan Luo, Charles Xu, Fangchen Liu, Liam Tan, Zipeng Lin, Jeffrey Wu, Pieter Abbeel, and Sergey Levine.          \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.08553)    \n\n* **ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models** IROS, 2024.     \nSiyuan Huang, Iaroslav Ponomarenko, Zhengkai Jiang, Xiaoqi Li, Xiaobin Hu, Peng Gao, Hongsheng Li, and Hao Dong.     \n[[page]](https:\u002F\u002Fgithub.com\u002FSiyuanHuang95\u002FManipVQA)\n\n* **A3VLM: Actionable Articulation-Aware Vision Language Model** ArXiv, 2024.       \nSiyuan Huang, Haonan Chang, Yuhan Liu, Yimeng Zhu, Hao Dong, Peng Gao, Abdeslam Boularias, and Hongsheng Li.       \n[[page]](https:\u002F\u002Fgithub.com\u002Fchanghaonan\u002FA3VLM)\n\n* **Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld**, CVPR, 2024.       \nYijun Yang, Tianyi Zhou, Kanxue Li, Dapeng Tao, Lusong Li, Li Shen, Xiaodong He, Jing Jiang, Yuhui Shi.       \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_Embodied_Multi-Modal_Agent_trained_by_an_LLM_from_a_Parallel_CVPR_2024_paper.pdf)\n\n* **Retrieval-Augmented Embodied Agents**, CVPR, 2024.       \nYichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang.       \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhu_Retrieval-Augmented_Embodied_Agents_CVPR_2024_paper.pdf)\n\n* **Multi-agent Collaborative Perception via Motion-aware Robust Communication Network**, CVPR, 2024.       \nShixin Hong, Yu Liu, Zhi Li, Shaohui Li, You He.       \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FHong_Multi-agent_Collaborative_Perception_via_Motion-aware_Robust_Communication_Network_CVPR_2024_paper.pdf)\n\n* **LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models**, ICCV, 2023.         \nChan Hee Song, Jiaman Wu, Clay Washington, Brian M. Sadler, Wei-Lun Chao, Yu Su.      \n[[page](LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models)]    \n\n* **Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models** EMNLP, 2023.    \nSarch, Gabriel, Yue, Wu, Michael J., Tarr, Katerina, Fragkiadaki.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.15127)]    \n\n* **Voyager: An Open-Ended Embodied Agent with Large Language Models**, TMLR, 2023.    \nWang, Guanzhi, Yuqi, Xie, Yunfan, Jiang, Ajay, Mandlekar, Chaowei, Xiao, Yuke, Zhu, Linxi, Fan, Anima, Anandkumar.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.16291)]    \n\n* **ReAct: Synergizing Reasoning and Acting in Language Models**, ICLR, 2023.    \nYao, Shunyu, Jeffrey, Zhao, Dian, Yu, Nan, Du, Izhak, Shafran, Karthik, Narasimhan, Yuan, Cao.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.03629)]    \n\n* **ProgPrompt: Generating Situated Robot Task Plans Using Large Language Models** , ICRA, 2023.    \nSingh, Ishika, Valts, Blukis, Arsalan, Mousavian, Ankit, Goyal, Danfei, Xu, Jonathan, Tremblay, Dieter, Fox, Jesse, Thomason, Animesh, Garg.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.11302)]    \n\n* **ChatGPT for Robotics: Design Principles and Model Abilities**, IEEE Access 12. (2023): 55682-55696.    \nSai Vemprala, Rogerio Bonatti, Arthur Fender C. Bucker, Ashish Kapoor.     \n[[page](https:\u002F\u002Fieeexplore.ieee.org\u002Fstamp\u002Fstamp.jsp?arnumber=10500490)]    \n\n* **Code as Policies: Language Model Programs for Embodied Control**, ICRA, 2023.        \nJacky Liang, , Wenlong Huang, F. Xia, Peng Xu, Karol Hausman, Brian Ichter, Peter R. Florence, Andy Zeng.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.07753)]\n\n* **Reasoning with Language Model Is Planning with World Model**, Arxiv, 2023.    \nHao, Shibo, Yi, Gu, Haodi, Ma, Joshua Jiahua, Hong, Zhen, Wang, Daisy Zhe, Wang, Zhiting, Hu.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.14992)]    \n\n* **LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement**, arXiv, 2023.    \nHaonan Chang, Kai Gao, Kowndinya Boyalakuntla, Alex Lee, Baichuan Huang, Harish Udhaya Kumar, Jinjin Yu, Abdeslam Boularias.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.15821)]    \n\n* **Translating Natural Language to Planning Goals with Large-Language Models**, arXiv, 2023.    \nXie, Yaqi, Chen, Yu, Tongyao, Zhu, Jinbin, Bai, Ze, Gong, Harold, Soh.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.05128)]    \n\n* **LLM+P: Empowering Large Language Models with Optimal Planning Proficiency**, arXiv, 2023.    \nLiu, Bo, Yuqian, Jiang, Xiaohan, Zhang, Qiang, Liu, Shiqi, Zhang, Joydeep, Biswas, Peter, Stone.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.11477)]    \n\n* **Dynamic Planning with a LLM**, arXiv, 2023.    \nDagan, Gautier, Frank, Keller, Alex, Lascarides.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.06391)]    \n\n* **Embodied Task Planning with Large Language Models**, arXiv, 2023.    \nWu, Zhenyu, Ziwei, Wang, Xiuwei, Xu, Jiwen, Lu, Haibin, Yan.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.01848)]    \n\n* **SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning**, Conference on Robot Learning. 2023.    \nKrishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian D. Reid, Niko Sunderhauf.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.06135)]    \n\n* **ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning**, ArXiv, 2023.    \nQiao Gu, Ali Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Ramalingam Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.16650)]    \n\n* **RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks**, arXiv, 2023.    \nYaran Chen, Wenbo Cui, Yuanwen Chen, Mining Tan, Xinyao Zhang, Dong Zhao, He Wang.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.15649)]    \n\n* **Chat with the Environment: Interactive Multimodal Perception Using Large Language Models**, IROS, 2023.    \nZhao, Xufeng, Mengdi, Li, Cornelius, Weber, Muhammad Burhan, Hafez, Stefan, Wermter.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.08268)]    \n\n* **Video Language Planning**, arxiv, 2023.    \nDu, Yilun, Mengjiao, Yang, Pete, Florence, Fei, Xia, Ayzaan, Wahid, Brian, Ichter, Pierre, Sermanet, Tianhe, Yu, Pieter, Abbeel, Joshua B., Tenenbaum, Leslie, Kaelbling, Andy, Zeng, Jonathan, Tompson.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10625)]    \n\n* **Code as Policies: Language Model Programs for Embodied Control**, ICRA, 2023,    \nJacky Liang, Wenlong Huang, F. Xia, Peng Xu, Karol Hausman, Brian Ichter, Peter R. Florence, Andy Zeng.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.07753)]    \n\n* **Reflexion: an autonomous agent with dynamic memory and self-reflection**, ArXiv, 2023.    \nNoah Shinn, Beck Labash, A. Gopinath.     \n[[page]()]    \n\n* **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents**, Proceedings of the 37th International Conference on Neural Information Processing Systems, 2023.    \nZihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, Yitao Liang.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.01560)]    \n\n* **Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model**, ArXiv, 2023.         \nSiyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, and Hongsheng Li.          \n[[page]](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FInstruct2Act)    \n\n* **Cliport: What and where pathways for robotic manipulation**, Conference on robot learning, 2022.    \nShridhar, Mohit, Lucas, Manuelli, Dieter, Fox.     \n[[page](https:\u002F\u002Fproceedings.mlr.press\u002Fv164\u002Fshridhar22a\u002Fshridhar22a.pdf)]    \n\n* **Language models as zero-shot planners: Extracting actionable knowledge for embodied agents**, ICML, 2022.    \nHuang, Wenlong, Pieter, Abbeel, Deepak, Pathak, Igor, Mordatch.     \n[[page](https:\u002F\u002Fproceedings.mlr.press\u002Fv162\u002Fhuang22a\u002Fhuang22a.pdf)]     \n\n* **Inner Monologue: Embodied Reasoning through Planning with Language Models**, Conference on Robot Learning, 2022.    \nHuang, Wenlong, Fei, Xia, Ted, Xiao, Harris, Chan, Jacky, Liang, Pete, Florence, Andy, Zeng, Jonathan, Tompson, Igor, Mordatch, Yevgen, Chebotar, Pierre, Sermanet, Noah, Brown, Tomas, Jackson, Linda, Luu, Sergey, Levine, Karol, Hausman, Brian, Ichter.      \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2207.05608)]     \n\n* **Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents**, ICML, 2022.    \nHuang, Wenlong, Pieter, Abbeel, Deepak, Pathak, Igor, Mordatch.     \n[[page](https:\u002F\u002Fproceedings.mlr.press\u002Fv162\u002Fhuang22a\u002Fhuang22a.pdf)]\n\n* **Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language**, ICLR, 2022.    \nZeng, Andy, Maria, Attarian, Brian, Ichter, Krzysztof, Choromanski, Adrian, Wong, Stefan, Welker, Federico, Tombari, Aveek, Purohit, Michael, Ryoo, Vikas, Sindhwani, Johnny, Lee, Vincent, Vanhoucke, Pete, Florence.     \n[[page](Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language)]    \n\n* **Skill Induction and Planning with Latent Language**, ACL, 2021.    \nPratyusha Sharma, Antonio Torralba, Jacob Andreas.     \n[[page](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.01517)]\n\n* **PDDL-the planning domain definition language**, Technical Report. 1998.    \nDrew McDermott, Malik Ghallab, Adele E. Howe, Craig A. Knoblock, Ashwin Ram, Manuela M. Veloso, Daniel S. Weld, David E. Wilkins.     \n[[page](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FCraig-Knoblock\u002Fpublication\u002F2278933_PDDL_-_The_Planning_Domain_Definition_Language\u002Flinks\u002F0912f50c0c99385e19000000\u002FPDDL-The-Planning-Domain-Definition-Language.pdf)]\n\n* **Strips: A new approach to the application of theorem proving to problem solving**, Artificial Intelligence 2. 3(1971): 189-208.    \nRichard E. Fikes, Nils J. Nilsson.     \n[[page](https:\u002F\u002Fntrs.nasa.gov\u002Fapi\u002Fcitations\u002F19730013831\u002Fdownloads\u002F19730013831.pdf#page=98)]    \n\n* **A Formal Basis for the Heuristic Determination of Minimum Cost Paths**, IEEE Trans. Syst. Sci. Cybern. 4. (1968): 100-107.        \nPeter E. Hart, Nils J. Nilsson, Bertram Raphael.     \n[[page](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F4082128)]\n\n* **The Monte Carlo method**, Journal of the American Statistical Association 44 247. (1949): 335-41.    \nNicholas C. Metropolis, S. M. Ulam.     \n[[page](https:\u002F\u002Fweb.williams.edu\u002FMathematics\u002Fsjmiller\u002Fpublic_html\u002F341Fa09\u002Fhandouts\u002FMetropolisUlam_TheMonteCarloMethod.pdf)]    \n\n\n## \u003Ca id=\"sim-to-real\"> Sim-to-Real Adaptation \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa> \n\n* **Phantom: Training Robots Without Robots Using Only Human Videos**, arXiv, 2025    \nMarion Lepert, Jiaying Fang, Jeannette Bohg.       \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.00779)\n\n* **Generalizable Humanoid Manipulation with 3D Diffusion Policies**, arXiv, 2025    \nYanjie Ze, Zixuan Chen, Wenhao Wang, Tianyi Chen, Xialin He, Ying Yuan, Xue Bin Peng, Jiajun Wu.       \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.10803)\n\n* **VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks**, arxiv, 2024    \nShiduo Zhang, Zhe Xu, Peiju Liu, Xiaopeng Yu, Yuan Li, Qinghui Gao, Zhaoye Fei, Zhangyue Yin, Zuxuan Wu, Yu-Gang Jiang, Xipeng Qiu                     \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.18194)\n\n* **PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation**, NeurIPS, 2024    \nKaidong Zhang, Pengzhen Ren, Bingqian Lin, Junfan Lin, Shikui Ma, Hang Xu, Xiaodan Liang                      \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.10394)\n\n* **Data Scaling Laws in Imitation Learning for Robotic Manipulation**, arxiv, 2024       \nFanqi Lin, Yingdong Hu, Pingyue Sheng, Chuan Wen, Jiacheng You, Yang Gao                   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.18647)\n\n* **Evaluating Real-World Robot Manipulation Policies in Simulation**, arxiv, 2024       \nXuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao                \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.05941)\n\n* **Body Transformer: Leveraging Robot Embodiment for Policy Learning**, arxiv, 2024       \nCarmelo Sferrazza, Dun-Ming Huang, Fangchen Liu, Jongmin Lee, Pieter Abbeel             \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.06316)\n\n* **Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model**, arxiv, 2024    \nJin Wang, Arturo Laurenzi, Nikos Tsagarakis        \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.082827)\n\n* **Robust agents learn causal world models**, ICLR, 2024    \nRichens, Jonathan, and Tom Everitt   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.10877)    \n\n* **Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots**, arXiv， 2024   \nChi, Cheng and Xu, Zhenjia and Pan, Chuer and Cousineau, Eric and Burchfiel, Benjamin and Feng, Siyuan and Tedrake, Russ and Song, Shuran    \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.10329)   \n\n* **Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation**, arXiv, 2024    \nFu, Zipeng and Zhao, Tony Z and Finn, Chelsea   \n[[page]](https:\u002F\u002Fmobile-aloha.github.io\u002Fresources\u002Fmobile-aloha.pdf)   \n\n* **Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition**, arXiv, 2024    \nLuo, Shengcheng and Peng, Quanquan and Lv, Jun and Hong, Kaiwen and Driggs-Campbell, Katherine Rose and Lu, Cewu and Li, Yong-Lu    \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.00299)\n\n* **Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation**, arXiv, 2024    \nTorne, Marcel and Simeonov, Anthony and Li, Zechu and Chan, April and Chen, Tao and Gupta, Abhishek and Agrawal, Pulkit   \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.03949)    \n\n* **TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction**, arXiv, 2024   \nJiang, Yunfan and Wang, Chen and Zhang, Ruohan and Wu, Jiajun and Fei-Fei, Li    \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.10315)    \n\n* **Natural Language Can Help Bridge the Sim2Real Gap**, arXiv, 2024    \nYu, Albert and Foote, Adeline and Mooney, Raymond and Mart{\\'\\i}n-Mart{\\'\\i}n, Roberto   \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.10020)\n\n* **Visual Whole-Body Control for Legged Loco-Manipulation**, arXiv, 2024    \nLiu, Minghuan and Chen, Zixuan and Cheng, Xuxin and Ji, Yandong and Yang, Ruihan and Wang, Xiaolong    \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.16967)    \n\n* **Expressive Whole-Body Control for Humanoid Robots**, arXiv, 2024    \nCheng, Xuxin and Ji, Yandong and Chen, Junming and Yang, Ruihan and Yang, Ge and Wang, Xiaolong    \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.16796)\n\n* **Pandora: Towards General World Model with Natural Language Actions and Video States**, arXiv, 2024    \nXiang, Jiannan and Liu, Guangyi and Gu, Yi and Gao, Qiyue and Ning, Yuting and Zha, Yuheng and Feng, Zeyu and Tao, Tianhua and Hao, Shibo and Shi, Yemin and others    \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.09455)    \n\n* **3D-VLA: A 3D Vision-Language-Action Generative World Model**, ICML, 2024     \nZhen, Haoyu and Qiu, Xiaowen and Chen, Peihao and Yang, Jincheng and Yan, Xin and Du, Yilun and Hong, Yining and Gan, Chuang    \n[[page]](https:\u002F\u002Fopenreview.net\u002Fforum?id=EZcFK8HupF)    \n\n* **Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning**, arXiv, 2024    \nDing, Zihan and Zhang, Amy and Tian, Yuandong and Zheng, Qinqing    \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.03570)    \n\n* **MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features**, ICLR, 2024    \nBardes, Adrien and Ponce, Jean and LeCun, Yann    \n[[page]](https:\u002F\u002Fopenreview.net\u002Fforum?id=9XdLlbxZCC)    \n\n* **Learning and Leveraging World Models in Visual Representation Learning**, arXiv, 2024    \nGarrido, Quentin and Assran, Mahmoud and Ballas, Nicolas and Bardes, Adrien and Najman, Laurent and LeCun, Yann    \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.00504)    \n\n* **iVideoGPT: Interactive VideoGPTs are Scalable World Models**, arXiv, 2024    \nWu, Jialong and Yin, Shaofeng and Feng, Ningya and He, Xu and Li, Dong and Hao, Jianye and Long, Mingsheng   \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.15223)   \n\n* **Spatiotemporal Predictive Pre-training for Robotic Motor Control**, arXiv, 2024    \nYang, Jiange and Liu, Bei and Fu, Jianlong and Pan, Bocheng and Wu, Gangshan and Wang, Limin   \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.05304)    \n\n* **LEGENT: Open Platform for Embodied Agents**, arXiv, 2024    \nCheng, Zhili and Wang, Zhitong and Hu, Jinyi and Hu, Shengding and Liu, An and Tu, Yuge and Li, Pengkai and Shi, Lei and Liu, Zhiyuan and Sun, Maosong    \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.18243)    \n\n* **Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud**, arXiv, 2024    \nSaito, Ayumu and Poovvancheri, Jiju   \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.16432)\n\n* **MuDreamer: Learning Predictive World Models without Reconstruction**, ICLR, 2024    \nBurchi, Maxime and Timofte, Radu    \n[[page]](https:\u002F\u002Fopenreview.net\u002Fforum?id=9pe38WpsbX)    \n\n* **From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought**, arXiv, 2024    \nWong, Lionel and Grand, Gabriel and Lew, Alexander K and Goodman, Noah D and Mansinghka, Vikash K and Andreas, Jacob and Tenenbaum, Joshua B    \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.12672)    \n\n* **ElastoGen: 4D Generative Elastodynamics**, arXiv, 2024    \nFeng, Yutao and Shang, Yintong and Feng, Xiang and Lan, Lei and Zhe, Shandian and Shao, Tianjia and Wu, Hongzhi and Zhou, Kun and Su, Hao and Jiang, Chenfanfu and others   \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.15056)\n\n* **Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models**, Nature Machine Intelligence, 2024.        \nLei Han, Qingxu Zhu, Jiapeng Sheng, Chong Zhang, Tingguang Li, Yizheng Zhang, He Zhang et al.        \n[[page]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs42256-024-00861-3)\n\n* **Model Adaptation for Time Constrained Embodied Control**, CVPR, 2024.        \nJaehyun Song, Minjong Yoo, Honguk Woo.        \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FSong_Model_Adaptation_for_Time_Constrained_Embodied_Control_CVPR_2024_paper.pdf)\n\n* **ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation**, CVPR, 2024.        \nXiaoqi Li, Mingxu Zhang, Yiran Geng, Haoran Geng, Yuxing Long, Yan Shen, Renrui Zhang, Jiaming Liu, Hao Dong.        \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLi_ManipLLM_Embodied_Multimodal_Large_Language_Model_for_Object-Centric_Robotic_Manipulation_CVPR_2024_paper.pdf)    \n\n* **ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation**, CVPR, 2024.        \nXiaoqi Li, Mingxu Zhang, Yiran Geng, Haoran Geng, Yuxing Long, Yan Shen, Renrui Zhang, Jiaming Liu, Hao Dong.        \n[[page]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLi_ManipLLM_Embodied_Multimodal_Large_Language_Model_for_Object-Centric_Robotic_Manipulation_CVPR_2024_paper.pdf)\n\n* **GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation**, CVPR, 2024.        \nZifan Wang, Junyu Chen, Ziqing Chen, Pengwei Xie, Rui Chen, Li Yi.        \n[[page]](https:\u002F\u002Fgenh2r.github.io\u002F)\n\n* **SAGE: Bridging Semantic and Actionable Parts for Generalizable Manipulation of Articulated Objects**, RSS, 2024.        \nHaoran Geng, Songlin Wei, Congyue Deng, Bokui Shen, He Wang, Leonidas Guibas.        \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.01307)\n\n* **GAMMA: Graspability-Aware Mobile MAnipulation Policy Learning based on Online Grasping Pose Fusion**, ICRA, 2024.        \nJiazhao Zhang, Nandiraju Gireesh, Jilong Wang, Xiaomeng Fang, Chaoyi Xu, Weiguang Chen, Liu Dai, He Wang.        \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.15459)\n\n* **ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments**, ECCV, 2024.        \nTaewoong Kim, Cheolhong Min, Byeonghwi Kim, Jinyeon Kim, Wonje Jeung, Jonghyun Choi.        \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.18550)\n\n* **DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control**, ECCV, 2024.        \nXinyu Xu, Shengcheng Luo, Yanchao Yang, Yong-Lu Li, Cewu Lu.         \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.14758)\n\n* **DynSyn: Dynamical Synergistic Representation for Efficient Learning and Control in Overactuated Embodied Systems**, ICML, 2024.        \nKaibo He, Chenhui Zuo, Chengtian Ma, Yanan Sui.         \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.11472)\n\n* **A-JEPA: Joint-Embedding Predictive Architecture Can Listen**, arXiv, 2023    \nFei, Zhengcong and Fan, Mingyuan and Huang, Junshi   \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.15830)    \n\n* **One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization**, NeurIPS, 2023    \nLiu, Minghua and Xu, Chao and Jin, Haian and Chen, Linghao and Varma T, Mukund and Xu, Zexiang and Su, Hao   \n[[page]](https:\u002F\u002Fopenreview.net\u002Fforum?id=A6X9y8n4sT)    \n\n* **Introduction to Latent Variable Energy-Based Models: A Path Towards Autonomous Machine Intelligence**, arXiv, 2023    \nDawid, Anna and LeCun, Yann    \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.02572)    \n\n* **GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts**, CVPR, 2023    \nGeng, Haoran and Xu, Helin and Zhao, Chengyang and Xu, Chao and Yi, Li and Huang, Siyuan and Wang, He    \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10203924)\n\n* **Reward-Adaptive Reinforcement Learning: Dynamic Policy Gradient Optimization for Bipedal Locomotion**, IEEE TPAMI, 2023    \nHuang, Changxin and Wang, Guangrun and Zhou, Zhibo and Zhang, Ronghui and Lin, Liang   \n[[page]](https:\u002F\u002Fwww.computer.org\u002Fcsdl\u002Fjournal\u002Ftp\u002F2023\u002F06\u002F09956746\u002F1Iu2CDAJBcc)\n\n* **Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware**, ICML, 2023    \nZhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea    \n[[page]](https:\u002F\u002Fopenreview.net\u002Fforum?id=e8Eu1lqLaf)\n\n* **Surfer: Progressive Reasoning with World Models for Robotic Manipulation**, arxiv, 2023.    \nPengzhen Ren, Kaidong Zhang, Hetao Zheng, Zixuan Li, Yuhang Wen, Fengda Zhu, Mas Ma, Xiaodan Liang.         \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.11335)\n\n* **PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations**, CVPR, 2023.    \nHaoran Geng, Ziming Li, Yiran Geng, Jiayi Chen, Hao Dong, He Wang.         \n[[page]](https:\u002F\u002Fpku-epic.github.io\u002FPartManip\u002F)\n\n* **A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27**, Open Review, 2022    \nYann LeCun    \n[[page]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=BZ5a1r-kVsf&)    \n\n* **Real2Sim2Real: Self-Supervised Learning of Physical Single-Step Dynamic Actions for Planar Robot Casting**, ICRA, 2022    \nLim, Vincent and Huang, Huang and Chen, Lawrence Yunliang and Wang, Jonathan and Ichnowski, Jeffrey and Seita, Daniel and Laskey, Michael and Goldberg, Ken    \n[[page]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1109\u002FICRA46639.2022.9811651)\n\n* **Continuous Jumping for Legged Robots on Stepping Stones via Trajectory Optimization and Model Predictive Control**, IEEE CDC, 2022    \nNguyen, Chuong and Bao, Lingfan and Nguyen, Quan    \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.01147)\n\n* **Reward-Adaptive Reinforcement Learning: Dynamic Policy Gradient Optimization for Bipedal Locomotion**, TPAMI, 2022.          \nChangxin Huang, Guangrun Wang, Zhibo Zhou, Ronghui Zhang, Liang Lin.        \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F9956746)\n\n* **Transporter Networks: Rearranging the Visual World for Robotic Manipulation**, CoRL, 2021    \nZeng, Andy and Florence, Pete and Tompson, Jonathan and Welker, Stefan and Chien, Jonathan and Attarian, Maria and Armstrong, Travis and Krasin, Ivan and Duong, Dan and Sindhwani, Vikas and others    \n[[page]](https:\u002F\u002Fproceedings.mlr.press\u002Fv155\u002Fzeng21a.html)   \n\n* **The MIT Humanoid Robot: Design, Motion Planning, and Control for Acrobatic Behaviors**, IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids), 2021    \nChignoli, Matthew and Kim, Donghyun and Stanger-Jones, Elijah and Kim, Sangbae   \n[[page]](https:\u002F\u002Farxiv.longhoe.net\u002Fpdf\u002F2104.09025)   \n\n* **Sim2Real Transfer for Reinforcement Learning without Dynamics Randomization**, IROS, 2020    \nKaspar, Manuel and Osorio, Juan D Mu{\\~n}oz and Bock, Jurgen      \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F9341260)   \n\n* **Learning Dexterous In-Hand Manipulation**, The International Journal of Robotics Research, 2020    \nAndrychowicz, OpenAI: Marcin and Baker, Bowen and Chociej, Maciek and Jozefowicz, Rafal and McGrew, Bob and Pachocki, Jakub and Petron, Arthur and Plappert, Matthias and Powell, Glenn and Ray, Alex and others    \n[[page]](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002Ffull\u002F10.1177\u002F0278364919887447)\n\n* **DeepGait: Planning and Control of Quadrupedal Gaits using Deep Reinforcement Learning**, IEEE Robotics and Automation Letters, 2020   \nTsounis, Vassilios and Alge, Mitja and Lee, Joonho and Farshidian, Farbod and Hutter, Marco    \n[[page]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.08399)    \n\n* **Optimized Jumping on the MIT Cheetah 3 Robot**, ICRA, 2019    \nNguyen, Quan and Powell, Matthew J and Katz, Benjamin and Di Carlo, Jared and Kim, Sangbae   \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8794449)   \n\n* **World Models**, NIPS, 2018    \nHa, David and Schmidhuber, Jurgen    \n[[page]](https:\u002F\u002Fmx.nthu.edu.tw\u002F~jlliu\u002Fteaching\u002FAI17\u002FAuto8.pdf)\n\n* **MIT Cheetah 3: Design and Control of a Robust, Dynamic Quadruped Robot**, IEEE\u002FRSJ International Conference on Intelligent Robots and Systems (IROS), 2018    \nBledt, Gerardo and Powell, Matthew J and Katz, Benjamin and Di Carlo, Jared and Wensing, Patrick M and Kim, Sangbae    \n[[page]](https:\u002F\u002Fdspace.mit.edu\u002Fbitstream\u002Fhandle\u002F1721.1\u002F126619\u002Firos.pdf?sequence=2)   \n\n* **Sim-to-Real Reinforcement Learning for Deformable Object Manipulation**, CoRL, 2018    \nMatas, Jan and James, Stephen and Davison, Andrew J    \n[[page]](http:\u002F\u002Fproceedings.mlr.press\u002Fv87\u002Fmatas18a\u002Fmatas18a.pdf)\n\n* **Dynamic Walking on Randomly-Varying Discrete Terrain With One-Step Preview**, Robotics: Science and Systems, 2017    \nNguyen, Quan and Agrawal, Ayush and Da, Xingye and Martin, William C and Geyer, Hartmut and Grizzle, Jessy W and Sreenath, Koushil    \n[[page]](https:\u002F\u002Fhybrid-robotics.berkeley.edu\u002Fpublications\u002FRSS2017_DiscreteTerrain_Walking.pdf)   \n\n* **Deep Kernels for Optimizing Locomotion Controllers**, CoRL, 2017    \nAntonova, Rika and Rai, Akshara and Atkeson, Christopher G    \n[[page]](http:\u002F\u002Fproceedings.mlr.press\u002Fv78\u002Fantonova17a\u002Fantonova17a.pdf)\n\n* **Preparing for the Unknown: Learning a Universal Policy with Online System Identification**, RSS, 2017    \nYu, Wenhao and Tan, Jie and Liu, C Karen and Turk, Greg    \n[[page]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1702.02453)    \n\n* **Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World**, IROS, 2017    \nTobin, Josh and Fong, Rachel and Ray, Alex and Schneider, Jonas and Zaremba, Wojciech and Abbeel, Pieter    \n[[page]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8202133)\n\n* **Practice Makes Perfect: An Optimization-Based Approach to Controlling Agile Motions for a Quadruped Robot**, IEEE Robotics \\& Automation Magazine, 2016    \nGehring, Christian and Coros, Stelian and Hutter, Marco and Bellicoso, Carmine Dario and Heijnen, Huub and Diethelm, Remo and Bloesch, Michael and Fankhauser, P{\\'e}ter and Hwangbo, Jemin and Hoepflinger, Mark and others    \n[[page]](https:\u002F\u002Fwww.research-collection.ethz.ch\u002Fbitstream\u002Fhandle\u002F20.500.11850\u002F183161.1\u002F1\u002Feth-49107-01.pdf)   \n\n* **ANYmal - a highly mobile and dynamic quadrupedal robot**, IEEE\u002FRSJ international conference on intelligent robots and systems (IROS), 2016    \nHutter, Marco and Gehring, Christian and Jud, Dominic and Lauber, Andreas and Bellicoso, C Dario and Tsounis, Vassilios and Hwangbo, Jemin and Bodie, Karen and Fankhauser, Peter and Bloesch, Michael and others    \n[[page]](https:\u002F\u002Fwww.research-collection.ethz.ch\u002Fbitstream\u002Fhandle\u002F20.500.11850\u002F118642\u002Feth-49454-01.pdf;sequence=1)   \n\n* **Optimization Based Full Body Control for the Atlas Robot**, IEEE-RAS International Conference on Humanoid Robots, 2014    \nFeng, Siyuan and Whitman, Eric and Xinjilefu, X and Atkeson, Christopher G    \n[[page]](http:\u002F\u002Fwww.cs.cmu.edu\u002Fafs\u002Fcs\u002Fuser\u002Fsfeng\u002Fwww\u002Fsf_hum14.pdf)    \n\n* **A Compliant Hybrid Zero Dynamics Controller for Stable, Efficient and Fast Bipedal Walking on MABEL**, The International Journal of Robotics Research, 2011    \nSreenath, Koushil and Park, Hae-Won and Poulakakis, Ioannis and Grizzle, Jessy W    \n[[page]](https:\u002F\u002Fsites.udel.edu\u002Fpoulakas\u002Ffiles\u002F2022\u002F10\u002FJ07-A-Compliant-Hybrid-Zero-Dynamics-Controller.pdf)   \n\n* **Dynamic walk of a biped**, The International Journal of Robotics Research, 1984   \nMiura, Hirofumi and Shimoyama, Isao   \n[[page]](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002Fabs\u002F10.1177\u002F027836498400300206)  \n\n## \u003Ca id=\"datasets\"> Datasets \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa> \nTo be updated...     \n* **AgiBot World**, 2025. [[link]](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002FAgiBot-World)\n* **RoboVerse**, 2025. [[link]](https:\u002F\u002Froboverseorg.github.io\u002F)\n* **RefSpatial**, 2025. [[link]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FJingkunAn\u002FRefSpatial)\n* **VisualAgentBench**, 2023.[link](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FVisualAgentBench)\n* **Open X-Embodiment**, 2023.[link](https:\u002F\u002Frobotics-transformer-x.github.io\u002F)\n* **RH20T-P**, 2024.[link](https:\u002F\u002Fsites.google.com\u002Fview\u002Frh20t-primitive\u002Fmain)   \n* **ALOHA 2**, 2024.[link](https:\u002F\u002Faloha-2.github.io\u002F)  \n* **GRUtopia**, 2024.[link](https:\u002F\u002Fgithub.com\u002FOpenRobotLab\u002FGRUtopia)\n* **ARIO (All Robots In One)**, 2024.[link](https:\u002F\u002Fimaei.github.io\u002Fproject_pages\u002Fario\u002F)\n* **VLABench**, 2024.[link](https:\u002F\u002Fvlabench.github.io\u002F)\n* **Matterport3D**, 2017. [[link]](https:\u002F\u002Fgithub.com\u002Fniessner\u002FMatterport)\n* **RoboMIND**, 2025. [[link]](https:\u002F\u002Fx-humanoid-robomind.github.io\u002F)\n\n\n### Embodied Perception\n#### Vision\n\n\n* **BEHAVIOR Vision Suite**, 2024. [[link]](https:\u002F\u002Fbehavior-vision-suite.github.io\u002F)\n* **SpatialQA**, 2024.[[link]](https:\u002F\u002Fgithub.com\u002FBAAI-DCAI\u002FSpatialBot)  \n* **SpatialBench**, 2024. [[link]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRussRobin\u002FSpatialBench)\n* **Uni3DScenes**, 2024. [[link]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRussRobinSpatialBench)\n* **Active Recognition Dataset**, 2023. [[link]](https:\u002F\u002Fleifan95.github.io\u002F_pages\u002FAR-dataset\u002Findex.html)\n* **Baxter_UR5_95_Objects_Dataset**, 2023. [[link]](https:\u002F\u002Fwww.eecs.tufts.edu\u002F~gtatiya\u002Fpages\u002F2022\u002FBaxter_UR5_95_Objects_Dataset.html)\n* **Caltech-256**, 2022. [[link]](https:\u002F\u002Fdata.caltech.edu\u002Frecords\u002Fnyy15-4j048)\n* **DIDI Dataset**, 2020. [[link]](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Fblob\u002Fmaster\u002Fdidi_dataset\u002FREADME.md)\n* **Replica**, 2019. [[link]](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FReplica-Dataset)\n* **ScanObjectNN**, 2019. [[link]](https:\u002F\u002Fhkust-vgd.github.io\u002Fscanobjectnn\u002F)\n* **OCID Dataset**, 2019. [[link]](https:\u002F\u002Fwww.acin.tuwien.ac.at\u002Fen\u002Fvision-for-robotics\u002Fsoftware-tools\u002Fobject-clutter-indoor-dataset\u002F)\n* **L3RScan**, 2019. [[link]](https:\u002F\u002Fgithub.com\u002FWaldJohannaU\u002F3RScan)\n* **EmbodiedScan**, 2019. [[link]](https:\u002F\u002Fdocs.google.com\u002Fforms\u002Fd\u002Fe\u002F1FAIpQLScUXEDTksGiqHZp31j7Zp7zlCNV7p_08uViwP_Nbzfn3g6hhw\u002Fviewform)  \n* **UZH-FPV Dataset**, 2019. [[link]](https:\u002F\u002Ffpv.ifi.uzh.ch\u002F)\n* **LM Data**, 2019. [[link]](https:\u002F\u002Fperinglab.org\u002Flmdata\u002F)\n* **TUM Visual-Inertial Dataset**, 2018. [[link]](https:\u002F\u002Fcvg.cit.tum.de\u002Fdata\u002Fdatasets\u002Fvisual-inertial-dataset)\n* **ScanNet**, 2017. [[link]](https:\u002F\u002Fgithub.com\u002FScanNet\u002FScanNet)\n* **SUNCG**, 2017. [[link]](http:\u002F\u002Fsuncg.cs.princeton.edu\u002F)\n* **Semantic 3D**, 2017. [[link]](http:\u002F\u002Fwww.semantic3d.net\u002F)\n* **ScanNet v2**, 2017. [[link]](https:\u002F\u002Fgithub.com\u002FScanNet\u002FScanNet)\n* **S3DIS**, 2016. [[link]](http:\u002F\u002Fbuildingparser.stanford.edu\u002F)\n* **Synthia**, 2016. [[link]](https:\u002F\u002Fsynthia-dataset.net\u002F)\n* **ModelNet**, 2015. [[link]](https:\u002F\u002Fmodelnet.cs.princeton.edu\u002F)\n* **ORBvoc**, 2015. [[link]](https:\u002F\u002Fgithub.com\u002Fraulmur\u002FORB_SLAM)\n* **Sketch dataset**, 2015. [[link]](https:\u002F\u002Fcybertron.cg.tu-berlin.de\u002Feitz\u002Fprojects\u002Fclassifysketch\u002F)\n* **SUN RGBD**, 2015. [[link]](https:\u002F\u002Frgbd.cs.princeton.edu\u002F)\n* **ShapeNet**, 2015. [[link]](https:\u002F\u002Fshapenet.org\u002F)\n* **MVS Dataset**, 2014. [[link]](http:\u002F\u002Froboimagedata.compute.dtu.dk\u002F?page_id=36)\n* **SUOD**, 2013. [[link]](https:\u002F\u002Fwww.acfr.usyd.edu.au\u002Fpapers\u002FSydneyUrbanObjectsDataset.shtml)\n* **SUN360**, 2012. [[link]](https:\u002F\u002Fvision.cs.princeton.edu\u002Fprojects\u002F2012\u002FSUN360\u002Fdata\u002F)\n* **NYU Depth Dataset V2**, 2012. [[link]](https:\u002F\u002Fcs.nyu.edu\u002F~fergus\u002Fdatasets\u002Fnyu_depth_v2.html)\n* **TUM-RGBD**, 2012. [[link]](https:\u002F\u002Fcvg.cit.tum.de\u002Fdata\u002Fdatasets\u002Frgbd-dataset\u002Fdownload)\n* **EuRoC MAV Dataset**, 2012. [[link]](https:\u002F\u002Fprojects.asl.ethz.ch\u002Fdatasets\u002Fdoku.php?id=kmavvisualinertialdatasets)\n* **Semantic KITTI**, 2012. [[link]](https:\u002F\u002Fwww.semantic-kitti.org\u002Fdataset.html#download)\n* **KITTI Object Recognition**, 2012. [[link]](http:\u002F\u002Fwww.cvlibs.net\u002Fdatasets\u002Fkitti\u002Feval_object.php)\n* **Stanford Track Collection**, 2011. [[link]](http:\u002F\u002Fcs.stanford.edu\u002Fpeople\u002Fteichman\u002Fstc\u002F)\n\n\n#### Tactile \n\n* **Touch100k**, 2024. [[link]](https:\u002F\u002Fcocacola-lab.github.io\u002FTouch100k\u002F)\n* **ARIO (All Robots In One)**, 2024. [[link]](https:\u002F\u002Fimaei.github.io\u002Fproject_pages\u002Fario\u002F)\n* **TaRF**, 2024. [[link]](https:\u002F\u002Fdou-yiming.github.io\u002FTaRF\u002F)    \n* **TVL**, 2024. [[link]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fmlfu7\u002FTouch-Vision-Language-Dataset)\n* **YCB-Slide**, 2022. [[link]](https:\u002F\u002Fgithub.com\u002Frpl-cmu\u002FYCB-Slide)\n* **Touch and Go**, 2022. [[link]](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1NDasyshDCL9aaQzxjn_-Q5MBURRT360B)\n* **SSVTP**, 2022. [[link]](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1H0B-jJ4l3tJu2zuqf-HbZy2bjEl-vL3f\u002Fview?usp=sharing)\n* **ObjectFolder**, 2021-2023. [[link]](https:\u002F\u002Fgithub.com\u002Frhgao\u002FObjectFolder)\n* **Decoding the BioTac**, 2020. [[link]](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1-BkqiFN9q6cz9Dk74oDlfmDs2m7ZvbWC)\n* **SynTouch**, 2019. [[link]](https:\u002F\u002Ftams.informatik.uni-hamburg.de\u002Fresearch\u002Fdatasets\u002Findex.php#biotac_single_contact_response)\n* **The Feeling of Success**, 2017. [[link]](https:\u002F\u002Fsites.google.com\u002Fview\u002Fthe-feeling-of-success\u002F)\n\n### Embodied Navigation\n* **ALFRED**, 2020. [[link]](https:\u002F\u002Faskforalfred.com\u002F)  \n* **REVERIE**, 2020. [[link]](https:\u002F\u002Fgithub.com\u002FYuankaiQi\u002FREVERIE) \n* **CVDN**, 2019. [[link]](https:\u002F\u002Fgithub.com\u002Fmmurray\u002Fcvdn\u002F)     \n* **Room to Room (R2R)**, 2017. [[link]](https:\u002F\u002Fpaperswithcode.com\u002Fdataset\u002Froom-to-room)\n* **DivScene**, 2024.[[link]](https:\u002F\u002Fgithub.com\u002Fzhaowei-wang-nlp\u002FDivScene)\n* **LH-VLN**, 2025. [[link]](https:\u002F\u002Fhcplab-sysu.github.io\u002FLH-VLN\u002F)\n \n### Embodied Question Answering\n\n* **SpatialQA**, 2024. [[link]](https:\u002F\u002Fgithub.com\u002FBAAI-DCAI\u002FSpatialBot)  \n* **S-EQA**, 2024. [[link]](https:\u002F\u002Fgamma.umd.edu\u002Fresearchdirections\u002Fembodied\u002Fseqa\u002F)\n* **HM-EQA**, 2024. [[link]](https:\u002F\u002Fgithub.com\u002FStanford-ILIAD\u002Fexplore-eqa) \n* **K-EQA**, 2023. [[link]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.07872) \n* **SQA3D**, 2023. [[link]](https:\u002F\u002Fsqa3d.github.io\u002F) \n* **VideoNavQA**, 2019. [[link]](https:\u002F\u002Fgithub.com\u002Fcatalina17\u002FVideoNavQA)  \n* **MP3D-EQA**, 2019. [[link]](https:\u002F\u002Faskforalfred.com\u002F)  \n* **MT-EQA**, 2019. [[link]](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FMT-EQA)  \n* **IQUAD V1**, 2018. [[link]](https:\u002F\u002Fgithub.com\u002Fdanielgordon10\u002Fthor-iqa-cvpr-2018)  \n* **EQA**, 2018. [[link]](https:\u002F\u002Fembodiedqa.org\u002Fdata)  \n  \n### Embodied Manipulation\n* **OAKINK2**, 2024. [[link]](https:\u002F\u002Foakink.net\u002Fv2\u002F)  \n\n## Other Useful Embodied Projects & Tools\n\n### Resources\n[Awesome-Embodied-Agent-with-LLMs](https:\u002F\u002Fgithub.com\u002Fzchoi\u002FAwesome-Embodied-Agent-with-LLMs)    \n[Awesome Embodied Vision](https:\u002F\u002Fgithub.com\u002FChanganVR\u002Fawesome-embodied-vision)    \n[Awesome Touch](https:\u002F\u002Fgithub.com\u002Flinchangyi1\u002FAwesome-Touch)    \n[Awesome VLA Study](https:\u002F\u002Fgithub.com\u002FMilkClouds\u002Fawesome-vla-study)    \n\n### Simulate Platforms & Enviroments\n[Habitat-Lab](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-lab)    \n[Habitat-Sim](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-sim)    \n[GibsonEnv](https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv)    \n[LEGENT](https:\u002F\u002Fgithub.com\u002Fthunlp\u002FLEGENT)    \n[MetaUrban](https:\u002F\u002Fmetadriverse.github.io\u002Fmetaurban\u002F)  \n[GRUtopia](https:\u002F\u002Fgithub.com\u002FOpenRobotLab\u002FGRUtopia)             \n[GenH2R](https:\u002F\u002Fgenh2r.github.io\u002F)    \n[Demonstrating HumanTHOR](https:\u002F\u002Fsites.google.com\u002Fview\u002Fhumanthor\u002F)      \n[BestMan](https:\u002F\u002Fgithub.com\u002FAutonoBot-Lab\u002FBestMan_Pybullet)      \n[InfiniteWorld](https:\u002F\u002Fgithub.com\u002Fpzhren\u002FInfiniteWorld)      \n[Genesis](https:\u002F\u002Fgenesis-embodied-ai.github.io\u002F)      \n[Cosmos](https:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fai\u002Fcosmos\u002F)      \n### Projects\n* Manipulation\n\n[RoboMamba](https:\u002F\u002Fsites.google.com\u002Fview\u002Frobomamba-web)   \n[MANIPULATE-ANYTHING](https:\u002F\u002Frobot-ma.github.io\u002F)    \n[DexGraspNet](https:\u002F\u002Fpku-epic.github.io\u002FDexGraspNet\u002F)      \n[UniDexGrasp](https:\u002F\u002Fpku-epic.github.io\u002FUniDexGrasp\u002F)      \n[UniDexGrasp++](https:\u002F\u002Fpku-epic.github.io\u002FUniDexGrasp++\u002F)      \n[OAKINK2](https:\u002F\u002Foakink.net\u002Fv2)      \n[AgiBot-World](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002Fagibot-world)\n\n* Embodied Interaction\n\n[EmbodiedQA](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FEmbodiedQA)  \n\n* Embodied Perception\n\n[EmbodiedScan](https:\u002F\u002Fgithub.com\u002FOpenRobotLab\u002FEmbodiedScan)    \n  \n* Models & Tools\n\n[Octopus](https:\u002F\u002Fgithub.com\u002Fdongyh20\u002FOctopus)    \n[Holodeck](https:\u002F\u002Fgithub.com\u002Fallenai\u002FHolodeck)    \n[AllenAct](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fallenact)    \n\n* Agents\n\n[LEO](https:\u002F\u002Fgithub.com\u002Fembodied-generalist\u002Fembodied-generalist)    \n[Voyager](https:\u002F\u002Fgithub.com\u002FMineDojo\u002FVoyager)    \n   \n\n \n## :newspaper: Citation \nIf you think this survey is helpful, please feel free to leave a star ⭐️ and cite our paper:\n\n```bibtex\n@article{liu2024aligning,\n  title={Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI},\n  author={Liu, Yang and Chen, Weixing and Bai, Yongjie and Liang, Xiaodan and Li, Guanbin and Gao, Wen and Lin, Liang},\n  journal={arXiv preprint arXiv:2407.06886},\n  year={2024}\n}\n```\n```bibtex\n@article{liu2025aligning,\n  title={Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI},\n  author={Liu, Yang and Chen, Weixing and Bai, Yongjie and Liang, Xiaodan and Li, Guanbin and Gao, Wen and Lin, Liang},\n  journal={IEEE\u002FASME Transactions on Mechatronics},\n  year={2025}\n}\n```\n## 👏 Acknowledgements\nWe sincerely thank Jingzhou Luo, Xinshuai Song, Kaixuan Jiang, Junyi Lin, Zhida Li, and Ganlong Zhao for their contributions.\n","\u003Cbr>\n\u003Cp align=\"center\">\n\u003Ch1 align=\"center\">\u003Cstrong>具身人工智能论文列表与资源库\u003C\u002Fstrong>\u003C\u002Fh1>\n  \u003Cp align=\"center\">\n    \u003Ca href='https:\u002F\u002Fwww.sysu-hcp.net\u002F' target='_blank'>HCPLab\u003C\u002Fa>&emsp;\n    \u003Cbr>\n    中山大学HCP实验室与鹏城实验室\n    \u003Cbr>\n  \u003C\u002Fp>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHCPLab-SYSU_Embodied_AI_Paper_List_readme_1a5768392322.jpg\" width=\"250\">\n\u003C\u002Fp>\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2407.06886-orange)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.06886)\n[![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-%F0%9F%93%96-yellow)](https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List\u002Fblob\u002Fmain\u002FEmbodiedAI_Review.pdf)\n[![](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-%F0%9F%9A%80-pink)](https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List)\n\n#### 我们非常感谢同行对本论文列表或综述提出的任何有益改进建议。请提交问题或发送邮件至**liuy856@mail.sysu.edu.cn**和**chen867820261@gmail.com**。感谢您的合作！我们也欢迎为本项目贡献代码！\n\n![Teaser](teaser.png \"demo\")\n\n[**将网络空间与物理世界对齐：具身人工智能的全面综述，IEEE\u002FASME机电一体化汇刊 2025**](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06886)    \n  [刘洋](https:\u002F\u002Fyangliu9208.github.io), 陈伟星, 白永杰, [梁晓丹](https:\u002F\u002Flemondan.github.io), [李冠斌](http:\u002F\u002Fguanbinli.com\u002F), [高文](https:\u002F\u002Fidm.pku.edu.cn\u002Finfo\u002F1017\u002F1041.htm), [林亮](http:\u002F\u002Fwww.linliang.net\u002F)     \n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHCPLab-SYSU_Embodied_AI_Paper_List_readme_18f51e0c3e73.png\" width=\"800\">\n\u003C\u002Fp>  \n\n## 🏠 关于\n\n具身人工智能（Embodied AI）对于实现通用人工智能（AGI）至关重要，同时也是连接网络空间与物理世界的各类应用（如智能机电系统、智能制造等）的基础。近年来，多模态大模型（MLMs）和世界模型（WMs）凭借其卓越的感知、交互和推理能力，成为具身智能体的有前景架构，备受关注。在本综述中，我们全面探讨了具身人工智能领域的最新进展。首先，我们梳理了具身机器人和仿真平台的代表性研究成果，以深入理解当前的研究重点及其局限性。随后，我们从四个主要研究方向展开分析：1）具身感知，2）具身交互，3）具身智能体，以及4）模拟到现实的迁移，涵盖了最先进的方法、关键范式和丰富的数据集。此外，我们还探讨了多模态大模型在虚拟和真实具身智能体中的复杂性，强调其在数字与物理环境中促进交互的重要意义。最后，我们总结了具身人工智能面临的挑战与局限，并讨论了未来的发展方向。希望本综述能为研究社区提供基础性参考。\n\n## :collision: 更新日志 \n* [2026.03.11] 更新论文列表，新增2025-2026年各领域最新论文！\n* [2025.05.27] 我们的具身人工智能综述论文已被IEEE\u002FASME机电一体化汇刊接收！\n* [2024.09.08] 数据集部分持续更新中！\n* [2024.08.31] 新增数据集板块，并对相关项目进行了分类！\n* [2024.08.19] 为帮助读者聚焦最新成果，我们已按时间顺序排列论文！   \n* [2024.08.02] 我们每周定期更新项目内容！   \n* [2024.07.29] 项目已完成更新！   \n* [2024.07.22] 更新了论文列表及其他具身相关有用项目！   \n* [2024.07.10] 发布具身人工智能综述的首个版本[PDF](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06886)！\n* [2024.07.10] 发布具身人工智能论文列表的首个版本。本页面将持续更新！\n\n\n\n## \u003Ca id=\"table-of-contents\">📚 目录 \u003C\u002Fa>\n\n- [书籍与综述](#books-surveys)\n- [具身仿真平台](#simulators)\n- [具身感知](#perception)\n- [具身交互](#interaction)\n- [具身智能体](#agent)\n- [模拟到现实的迁移](#sim-to-real)\n- [数据集](#datasets)\n\n## \u003Ca id=\"books-surveys\"> 书籍与综述 \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa> \n\n* **自我进化具身智能**, arXiv:2602.04411, 2026       \n冯通通、王欣、朱文武。        \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2602.04411)]\n\n* **迈向鲁棒且安全的具身智能：漏洞与攻击的综述**, arXiv:2502.13175, 2025       \n邢文鹏、李明浩、李摩根、韩萌。        \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.13175)]\n\n* **从屏幕到场景：医疗领域具身智能的综述**, arXiv:2501.07468, 2025       \n刘一浩、曹旭、陈婷婷、蒋燕凯、游俊杰、吴明华、王小松、冯梦玲、金耀初、陈金泰。        \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.07468)]\n\n* **室内具身智能中的语义地图构建——综述**, arXiv:2501.05750, 2025       \n索尼亚·雷乔杜里、安吉尔·X·张。        \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.05750)]\n\n* **具身智能世界模型的全面综述**, arXiv:2510.16732, 2025       \n李新青、何鑫、张乐、吴敏、李晓丽、刘云。        \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.16732)]\n\n* **机器人操作中的生成式人工智能：综述**, arXiv:2503.03464, 2025       \n张坤、云鹏、岑军、蔡俊豪、朱迪迪、袁航杰、赵超、冯涛、王迈克尔宇、陈启峰、潘佳、张伟、杨博、陈华。        \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.03464)]\n\n* **通过模仿学习实现灵巧操作：综述**, arXiv:2504.03515, 2025       \n安山、孟子宇、唐超、周雨宁、刘腾宇、丁方强、张淑芳、穆瑶、宋冉、张伟、侯增光、张宏。        \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.03515)]\n\n* **人形机器人与人形人工智能：回顾、展望与方向**, arXiv:2405.15775, 2025       \n曹龙兵。        \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.15775)]\n\n* **具身智能时代下基于物理模拟器的机器人导航与操作综述**, arXiv:2505.01458, 2025       \n黄力恒、康雪阳、白凯欣、张建伟。        \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.01458)]\n\n* **多模态大模型：通用人工智能的新范式**, 电子工业出版社, 2024       \n刘洋、林亮             \n[[页面](https:\u002F\u002Fhcplab-sysu.github.io\u002FBook-of-MLM\u002F)]      \n\n* **将网络空间与物理世界对齐：具身智能的全面综述**, arXiv:2407.06886, 2024       \n刘洋、陈卫星、白永杰、李冠斌、高文、林亮。        \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06886)]    \n\n* **一个机器人解决所有问题：面向多功能通用具身智能体的新标准与统一数据集**, arXiv:2408.10899, 2024      \n王志强、郑浩、聂云霜、徐文俊、王庆伟、叶华、李哲、张凯东、程学文、董万喜、蔡昌、林亮、郑峰、梁晓丹           \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.10899)][[项目](https:\u002F\u002Fimaei.github.io\u002Fproject_pages\u002Fario\u002F)]\n\n* **基于AI基础模型时代的具身智能，助力未来智能制造**, IEEE\u002FASME机电一体化汇刊, 2024         \n任磊、董家宝、刘帅、张琳、王立辉。         \n[[论文](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10697107)]\n\n* **以物体为中心的机器人操作中具身学习的综述**, arXiv:2408.11537, 2024   \n郑英、姚雷、苏月娇、张毅、王毅、赵思成、张怡怡、周立辉    \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.11537)]\n\n* **人形机器人遥操作：综述**, IEEE机器人学汇刊, 2024       \n达维什·库鲁什、彭科·路易吉、拉莫斯·若昂、西斯内罗斯·拉斐尔、普拉特·杰里、吉田荣一、伊瓦尔迪·塞雷娜、普奇·达尼埃莱。        \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2301.04317)]\n\n* **具身智能中视觉-语言-动作模型的综述**, arXiv:2405.14093, 2024   \n马跃恩、宋子兴、庄宇正、郝建业、金尔温    \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.14093)]\n\n* **从互联网视频中学习通用机器人：综述**, arXiv:2404.19664, 2024   \n麦卡锡、罗伯特、陈丹尼尔、施密特·多米尼克、阿塞罗·费尔南多、赫尔·内森、杜一伦、瑟尔斯·托马斯·G、李志斌。  \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.19664)]\n\n* **基于基础模型的机器人技术：迈向具身智能**, arXiv:2402.02385, 2024    \n许志远、吴坤、温俊杰、李金明、刘宁、车正平、唐健。     \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.02385)]     \n\n* **借助基础模型实现通用机器人：综述与元分析**, Machines, 2023   \n胡亚飞、谢泉亭、贾因·维迪、弗朗西斯·乔纳森、帕特里卡尔·杰伊、基塔·尼希尔、金承灿、谢雅琪、张天一、赵世博、崇于权、王晨、西卡拉·卡蒂娅、约翰逊-罗伯森·马修、巴特拉·德鲁夫、王小龙、舍勒·塞巴斯蒂安、基拉·佐尔特、夏菲·费伊、比斯克·约纳坦。            \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.08782)]    \n\n* **护理场景中的可变形物体操作：综述**, Machines, 2023   \n王利民、朱继红。  \n[[论文]https:\u002F\u002Fwww.mdpi.com\u002F2075-1702\u002F11\u002F11\u002F1013]\n\n* **具身智能综述：从模拟器到研究任务**, IEEE新兴计算智能主题汇刊, 2022    \n段嘉飞、余山森、谭慧莉、朱洪源、谭切斯顿    \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.04918)]    \n\n* **具身认知的发展：来自婴儿的六个启示**, Artificial life, 2005    \n史密斯·琳达、加瑟·迈克尔    \n[[论文](https:\u002F\u002Fcogdev.sitehost.iu.edu\u002Flabwork\u002F6_lessons.pdf)]    \n\n* **具身人工智能：趋势与挑战**, 计算机科学讲义, 2004    \n罗尔夫·普菲弗、井田文弥   \n[[论文](https:\u002F\u002Fpeople.csail.mit.edu\u002Fiida\u002Fpapers\u002FPfeiferIidaEAIDags.pdf)]     \n\n## \u003Ca id=\"simulators\"> 具身模拟器 \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa>\n\n### 通用仿真器\n\n* **Gazebo：开源多机器人仿真器的设计与使用范式**, IROS, 2004        \n科尼格，内森；安德鲁，霍华德。      \n[[页面](https:\u002F\u002Fciteseerx.ist.psu.edu\u002Fdocument?repid=rep1&type=pdf&doi=79f91c1c95271a075b91e9fdca43d6c31e4cbe17)]\n\n* **NVIDIA Isaac Sim：机器人仿真与合成数据生成平台**, NVIDIA, 2023    \n[[页面](https:\u002F\u002Fdeveloper.nvidia.com\u002Fisaac\u002Fsim)]    \n\n* **Aerial Gym——面向空中机器人的Isaac Gym仿真器**, ArXiv, 2023    \n米希尔·库尔卡尼、西奥多·J·L·福加德、科斯塔斯·阿莱克西斯。     \n[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16510)]     \n\n* **Webots：开源机器人仿真器**, 2018      \n赛博机器人公司      \n[[页面](https:\u002F\u002Fcyberbotics.com\u002Fdoc\u002Freference\u002Findex), [代码](https:\u002F\u002Fgithub.com\u002Fcyberbotics\u002Fwebots)]     \n\n* **Unity：面向智能体的通用平台**, ArXiv, 2020    \n朱利亚尼，阿瑟；文森特-皮埃尔，贝尔热；埃尔文，滕；安德鲁，科恩；乔纳森，哈珀；克里斯，埃利昂；袁，戈伊；亨特，亨利；马尔万，马塔尔；丹尼，兰格。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1809.02627)]    \n\n* **AirSim：面向自动驾驶车辆的高保真视觉与物理仿真系统**, 场地与服务机器人, 2017    \n希塔尔·沙赫、德巴迪普塔·戴、克里斯·洛维特、阿希什·卡普尔。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1705.05065.pdf%20http:\u002F\u002Farxiv.org\u002Fabs\u002F1705.05065)]    \n\n* **PyBullet：用于游戏、机器人和机器学习的物理仿真Python模块**, 2016     \n库曼斯，埃尔温；白云飞。     \n[[页面](https:\u002F\u002Fgithub.com\u002Fbulletphysics\u002Fbullet3)]       \n\n* **V-REP：多功能且可扩展的机器人仿真框架**, IROS, 2013      \n罗默，埃里克；苏里亚·PN；辛格，马克；弗里斯。     \n[[页面](https:\u002F\u002Fcoppeliarobotics.com\u002FcoppeliaSim_v-rep_iros2013.pdf)]     \n\n* **MuJoCo：基于模型控制的物理引擎**, IROS, 2012    \n托多罗夫，伊曼纽尔；汤姆，埃雷兹；尤瓦尔，塔萨。      \n[[页面](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F6386109\u002F), [代码](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Fmujoco)]     \n\n* **模块化开源机器人仿真引擎：Morse**, ICRA, 2011       \n埃切韦里亚，吉尔伯托；拉萨贝，尼古拉斯；德格鲁特，阿尔诺；勒梅尼昂，塞韦林     \n[[页面](https:\u002F\u002Fwww.openrobots.org\u002Fmorse\u002Fmaterial\u002Fmedia\u002Fpdf\u002Fpaper-icra.pdf)]\n\n### 基于真实场景的仿真器\n\n* **RoboVerse：迈向可扩展且通用的机器人学习统一平台、数据集与基准测试**，arXiv，2025年  \n耿浩然、王飞石、魏松林、李宇阳、王邦俊、安博世、程天悦、娄浩哲、李沛昊、王延杰、梁宇彤、戈廷·迪伦、徐超毅、陈浩哲、钱宇曦、耿怡然、毛家庚、万维康、张明通、吕江然、赵思恒、张嘉钊、张佳亮、赵成阳、陆浩然、丁宇飞、龚冉、王雨然、匡宇轩、吴瑞海、贾宝雄、卡洛·斯费拉扎、董浩、黄思远、王岳、马利克·吉滕德拉、皮特·阿贝尔。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.18904)]\n\n* **Isaac Lab：用于多模态机器人学习的GPU加速仿真框架**，arXiv，2025年  \n米扬克·米塔尔、帕斯卡尔·罗斯、詹姆斯·蒂格、安托万·理查德、张奥克提、杜彼得、安东尼奥·塞拉诺-穆尼奥斯、姚新杰、勒内·祖尔布吕格、鲁丁·尼基塔、瓦夫日尼亚克·卢卡什、拉赫沙·米拉德、丹兹勒·阿兰、海登·埃里克、博罗维茨卡·阿莱斯、艾哈迈德·奥萨马、阿基诺拉·伊雷蒂亚约、安瓦尔·阿布拉尔、卡尔森·马克·T、冯·季元、加格·阿尼梅什。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2511.04831)]\n\n* **InfiniteWorld：用于通用视觉-语言机器人交互的统一可扩展仿真框架**，arXiv，2024年  \n任鹏振、李敏、罗震、宋新帅、陈子威、刘福伟嘉、杨一轩、郑浩、许荣涛、黄子桐、丁同生、谢路洋、张凯东、傅昌飞、刘洋、林亮、郑峰、梁晓丹。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.05789)]\n\n\n* **ManiSkill3：面向通用具身AI的GPU并行化机器人仿真与渲染**，arXiv，2024年  \n陶石头、向凡博、舒克拉·阿斯、秦宇哲、欣德里希森·赞德、袁晓迪、鲍晨、林信松、刘玉林、陈泽凯、高源、李玄林、穆通州、肖楠、古尔哈·阿尔纳夫、黄志傲、卡拉德拉·罗伯托、陈锐、罗珊、苏浩。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.00425)]\n\n* **PhyScene：面向具身AI的物理可交互3D场景合成**，arXiv，2024年  \n杨、严丹、贾宝雄、支培源、黄思远。   \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_PhyScene_Physically_Interactable_3D_Scene_Synthesis_for_Embodied_AI_CVPR_2024_paper.pdf)]\n\n* **Holodeck：语言引导生成3D具身AI环境**，CVPR，2024年  \n杨月、孙凡云、魏斯·卢卡、范德比尔特·伊利、埃拉瓦罗·埃拉斯特里、韩温森、吴嘉俊、哈伯·尼克、克里希纳·兰杰、刘凌洁、卡利森-伯奇·克里斯、雅茨卡尔·马克、坎巴维·阿尼鲁达、克拉克·克里斯托弗。   \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_Holodeck_Language_Guided_Generation_of_3D_Embodied_AI_Environments_CVPR_2024_paper.pdf)]\n\n* **RoboGen：通过生成式仿真释放无限数据以实现自动化机器人学习**，arXiv，2023年  \n王宇飞、周贤、冯辰、王存萱、王义安、王卡特琳娜、弗拉基亚达基·扎科里、埃里克森·大卫、赫尔德·庄、甘。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.01455)]\n\n* **ProcTHOR：基于程序化生成的大规模具身AI**，NeurIPS，2022年  \n代特克、范德比尔特、埃拉斯特里、魏斯、萨尔瓦多、埃赫萨尼、韩、科尔夫、法哈迪、坎巴维、莫塔吉。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2206.06994)]\n\n* **ThreeDWorld：用于交互式多模态物理仿真的平台**，NeurIPS，2021年  \n甘·庄、J、施瓦茨·塞思、阿尔特·马丁、施林普夫·詹姆斯、特雷尔·朱利安德、弗雷塔斯·乔纳斯、库比利乌斯·阿比舍克、班德瓦尔德·尼克、哈伯·梅古米、佐野·久野、金·伊利亚斯、王·达米安、姆罗卡·迈克尔、林格尔巴赫·艾丹、柯蒂斯·凯文T、费格尔里斯·戴维M、贝尔·丹、古特弗伦德·戴维D、考克斯·詹姆斯J、迪卡洛·乔什H、麦克德莫特·乔舒亚B、特南鲍姆·丹尼尔、亚马津。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2007.04954)]\n\n* **iGibson 1.0：大型真实场景中交互任务的仿真环境**，IROS，2021年  \n沈博魁、费夏、程书、李罗伯托、马丁-马丁·林熙、范关智、王克劳迪娅、佩雷斯-达尔皮诺·夏马尔、布赫·桑贾娜、斯里瓦斯塔瓦·莱恩、查普米·米卡埃尔、查普米·肯特、韦尼奥·约西亚、王莉、费-费·西尔维奥、萨瓦雷斯。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2012.02924)]\n\n* **SAPIEN：基于部件的模拟交互环境**，CVPR，2020年  \n向凡博、秦宇哲、莫凯春、李益宽、夏浩、朱方晨、刘明华、刘汉霄、蒋义夫、袁何、王李易、安吉尔X、张列奥尼达斯J、圭巴斯·郝、苏浩。   \n[[页面](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2020\u002Fpapers\u002FXiang_SAPIEN_A_SimulAted_Part-Based_Interactive_ENvironment_CVPR_2020_paper.pdf)]\n\n* **Habitat：具身AI研究平台**，ICCV，2019年  \n萨瓦·马诺利斯、阿比舍克·卡迪安、奥列克桑德尔·马克西梅茨、赵伊丽、维曼斯·埃里克、贾伊·布莱恩、刘弗拉德伦、科尔顿·吉滕德拉、马利克·黛薇、帕里克·德鲁夫、巴特拉。   \n[[页面](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_ICCV_2019\u002Fpapers\u002FSavva_Habitat_A_Platform_for_Embodied_AI_Research_ICCV_2019_paper.pdf)]\n\n* **VirtualHome：通过程序模拟家庭活动**，CVPR，2018年  \n普伊格·哈维尔、凯文·拉、马尔科·鲍本、李佳满、王珊雅、菲德勒·安东尼奥、托拉尔巴。   \n[[页面](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2018\u002Fpapers\u002FPuig_VirtualHome_Simulating_Household_CVPR_2018_paper.pdf)]\n\n* **Matterport3D：从室内环境中的RGB-D数据中学习**，3DV，2017年  \n张·安吉尔、安吉拉·戴、托马斯·芬克豪瑟、马切伊·哈尔伯、马蒂亚斯·尼布纳、马诺利斯·萨瓦、宋·安迪、曾·印达、张。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1709.06158)]\n\n* **AI2-THOR：用于视觉AI的交互式3D环境**，arXiv，2017年  \n科尔夫·埃里克、鲁兹贝·莫塔吉、丹尼尔·戈登、朱·阿比纳夫、古普塔·阿里、法哈迪。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.05474)]\n\n## \u003Ca id=\"perception\"> 具身感知 \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa>\n### 主动视觉探索\n* **迈向行走视觉：学习视觉驱动的主动视点选择**，Arxiv，2025年。  \n库·朱伊尔*、崔大贤*、尹尚佑*、李Phillip Y.、成珉赫。  \n[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.13250)]\n\n* **ActiveGAMER：通过高效渲染进行主动高斯映射**，CVPR，2025年。  \n陈丽燕、詹黄英、陈凯文、徐向宇、颜庆安、蔡长江、徐毅。  \n[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.06897)]\n\n* **ActiveGS：利用高斯泼溅进行主动场景重建**，RA-L，2025年。  \n金立仁、钟兴光、潘岳、贝利·延斯、斯塔赫尼斯·西里尔、波波维奇·玛丽亚。  \n[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.17769)]\n\n* **RoboTracer：借助视觉-语言模型中的推理掌握空间追踪技术，应用于机器人领域**，arxiv，2025年。  \n周恩深、池成、李一博、安景坤、张家源、荣善宇、韩毅、姬宇衡、刘孟珍、王鹏威、王中原、盛陆、张尚航。  \n[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.13660)] [[项目](https:\u002F\u002Fzhoues.github.io\u002FRoboTracer\u002F)]\n\n* **RoboRefer：面向机器人视觉-语言模型推理的空间指代**，arXiv，2025年。  \n周恩深、安景坤、池成、韩毅、荣善宇、张驰、王鹏威、王中元、黄铁军、盛璐、张尚航。  \n[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04308)] [[项目](https:\u002F\u002Fzhoues.github.io\u002FRoboRefer\u002F)]\n\n* **3DAffordSplat：基于3D高斯的高效 affordance 推理**，arXiv，2025年。  \n魏泽明、林俊义、刘洋、陈伟星、罗静洲、李冠斌、林亮。  \n[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.11218)] [[项目](https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002F3DAffordSplat)]\n\n* **代码即监控：面向反应式与前瞻式机器人故障检测的约束感知型视觉编程**，CVPR，2025年。  \n周恩深、苏琪、池成、张志正、王中元、黄铁军、盛璐、王赫。  \n[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.04455)] [[项目](https:\u002F\u002Fzhoues.github.io\u002FCode-as-Monitor\u002F)]\n\n* **SnapMem：基于快照的具身探索与推理用3D场景记忆**，arXiv，2024年。  \n杨云聪、杨涵、周嘉晨、陈沛浩、张洪鑫、杜一伦、甘闯。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.17735)]\n\n* **AIR-Embodied：基于具身大型语言模型的高效主动3DGS交互与重建框架**，arXiv，2024年。  \n齐正浩、袁圣海、刘芬、曹浩志、邓天辰、杨建飞、谢丽华。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.16019)]\n\n* **BEHAVIOR Vision Suite：通过仿真自定义数据集生成**，CVPR，2024年。  \n葛云浩、唐艺禾、徐家树、杰姆·戈克曼、李承书、艾文思、本杰明·何塞·马丁内斯、阿尔曼·艾丁、莫娜·安瓦里、阿尤什·K·查克拉瓦蒂、余宏兴、约西亚·王、桑贾娜·斯里瓦斯塔瓦、莎伦·李、赵圣欣、洛朗·伊蒂、李云竹、罗伯托·马丁-马丁、刘淼、张鹏川、张若涵、李飞飞、吴佳俊。  \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FGe_BEHAVIOR_Vision_Suite_Customizable_Dataset_Generation_via_Simulation_CVPR_2024_paper.pdf)]\n\n* **机器人焊接中多条焊缝的粗细结合检测**，arXiv，2024年。  \n魏鹏坤、程硕、李大友、宋然、张一鹏、张伟。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.10710)]\n\n* **证据主动识别：智能且审慎的开放世界具身感知**，CVPR，2024年。  \n范、雷、明福、梁、李云轩、华刚、吴英。  \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FFan_Evidential_Active_Recognition_Intelligent_and_Prudent_Open-World_Embodied_Perception_CVPR_2024_paper.pdf)]\n\n* **SpatialBot：利用视觉语言模型实现精确的空间理解**，arXiv，2024年。  \n蔡文晓、亚罗斯拉夫·波诺马连科、袁建豪、李小奇、杨万库、董浩、赵博。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.13642)]\n\n* **具身不确定性感知下的物体分割**，IROS，2024年。  \n方晓琳、莱斯利·帕克·凯尔布林、托马斯·洛萨诺-佩雷斯。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.04760)]\n\n* **Point Transformer V3：更简单、更快、更强**，CVPR，2024年。  \n吴晓阳、李江、彭帅、王志坚、刘希辉、刘宇、乔万里、欧阳彤、何恒爽、赵。  \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWu_Point_Transformer_V3_Simpler_Faster_Stronger_CVPR_2024_paper.pdf)]\n\n* **PointMamba：用于点云分析的简单状态空间模型**，arXiv，2024年。  \n梁定康、周新、周信宇、王兴奎、朱伟、许志康、邹晓青、叶翔、白。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.10739)]\n\n* **Point Could Mamba：基于状态空间模型的点云学习**，arXiv，2024年。  \n张涛、李向泰、李浩波、袁顺平、季水成、严。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.00762)]\n\n* **Mamba3d：通过状态空间模型增强3D点云分析中的局部特征**，arXiv，2024年。  \n韩旭、袁唐、赵轩宣、王贤志、李。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.14966)]\n\n* **Gs-slam：基于3D高斯泼溅的稠密视觉SLAM**，CVPR，2024年。  \n严驰、屈德林、徐丹、王志刚、王东、王雪龙、李。  \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYan_GS-SLAM_Dense_Visual_SLAM_with_3D_Gaussian_Splatting_CVPR_2024_paper.pdf)]\n\n* **GOReloc：基于图的物体级重定位技术，用于视觉SLAM**，IEEE RAL，2024年。  \n王宇彤、蒋朝阳、陈谢源立。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.07917)]\n\n* **Embodiedscan：面向具身AI的整体多模态3D感知套件**，CVPR，2024年。  \n王泰、夏涵、毛晨明、朱润森、徐瑞远、吕培森、李晓、陈文伟、张凯、陈天凡、薛以及其他。  \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWang_EmbodiedScan_A_Holistic_Multi-Modal_3D_Perception_Suite_Towards_Embodied_AI_CVPR_2024_paper.pdf)]\n\n* **Neu-nbv：基于图像神经渲染中的不确定性估计进行下一个最佳视角规划**，IROS，2023年。  \n金利仁、陈谢源立、朱利叶斯、鲁金、玛丽亚、波波维奇。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.01284)]\n\n* **具有在线适应性的离策略评估，用于机器人在复杂环境中的探索**，IEEE机器人与自动化快报，2023年。  \n胡亚飞、耿俊义、陈、王约翰、凯勒、塞巴斯蒂安、舍雷尔。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.03140)]\n\n* **OVD-SLAM：一种适用于动态环境的在线视觉SLAM**，IEEE传感器期刊，2023年。  \n何嘉明、李明睿、王阳阳、王鸿宇、王。  \n[[页面](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10113832)]\n\n* **跨异构机器人形态转移非视觉对象属性的隐式知识**，ICRA，2023年。  \n塔蒂娅、吉安、乔纳森、弗朗西斯、季夫科、西纳波夫。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.06890)]\n\n* **Swin3d：用于3D室内场景理解的预训练Transformer骨干网络**，arXiv，2023年。  \n杨宇奇、于晓、郭建宇、熊扬、刘浩、潘彭帅、王欣、童百宁、郭。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.06906)]\n\n* **Point Transformer v2：分组向量注意力与基于分区的池化**，NeurIPS，2022年。  \n吴晓阳、易兴、老李、李江、刘希辉、赵恒爽。  \n[[页面](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002Fd78ece6613953f46501b958b7bb4582f-Paper-Conference.pdf)]\n\n* **重新思考点云中的网络设计与局部几何：一个简单的残差MLP框架**，arXiv，2022年。  \n马旭、秦灿、郝轩、游浩熙、冉云、傅。  \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2202.07123)]\n\n* **So-slam：带有尺度比例和对称纹理约束的语义物体SLAM**。IEEE机器人与自动化快报第7卷第2期（2022年）：4008–4015页。  \n廖子威、胡宇彤、张家栋、张宪宇、齐晓宇、张伟、王。  \n[[页面](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F9705562)]\n\n* **SG-SLAM：一种面向动态场景、融合语义与几何信息的实时RGB-D视觉SLAM**，IEEE仪器与测量汇刊 72.（2022）：1–12。      \n程书宏，孙昌和，张世军，张典凡。    \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F9978699)   \n\n* **点变换器**，ICCV，2021年。\n赵恒爽，李江，贾亚，Philip HS，托尔，弗拉德伦，科尔顿。     \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fpapers\u002FZhao_Point_Transformer_ICCV_2021_paper.pdf)]    \n\n* **PointPillars：用于从点云中进行目标检测的快速编码器**，CVPR，2019年。    \n朗·亚历克斯·H，索拉布·沃拉，霍尔格·凯撒，周鲁冰，杨炯，奥斯卡·贝伊博姆。     \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2019\u002Fpapers\u002FLang_PointPillars_Fast_Encoders_for_Object_Detection_From_Point_Clouds_CVPR_2019_paper.pdf)    \n\n* **4D时空卷积网络：明可夫斯基卷积神经网络**，CVPR，2019年。    \n乔伊，克里斯托弗，具俊英，萨瓦雷斯，西尔维奥。    \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2019\u002Fpapers\u002FChoy_4D_Spatio-Temporal_ConvNets_Minkowski_Convolutional_Neural_Networks_CVPR_2019_paper.pdf)\n\n* **Cubeslam：单目3D目标SLAM**，IEEE T-RO 35. 4（2019）：925–938  \n杨世超，塞巴斯蒂安·舍雷尔。  \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8708251)\n\n* **基于层次主题模型的目标关联用于语义SLAM**，IEEE T-VCG 25. 11（2019）：3052–3062  \n张建华，桂孟平，王奇超，刘汝宇，徐盛勇，陈。   \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8794595)\n\n* **DS-SLAM：面向动态环境的语义视觉SLAM**，IROS，2018年   \n于超，刘祖鑫，刘新军，谢富贵，杨毅，魏琪，乔飞。   \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8593691)\n\n* **DynaSLAM：动态场景中的跟踪、建图与修复**，IEEE机器人与自动化快报 3. 4（2018）：4076–4083     \n贝斯科斯，贝尔塔，何塞·M，法西尔，哈维尔，西韦拉，何塞，内拉。   \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8421015)\n\n* **Quadricslam：基于目标检测的双二次曲面作为面向对象SLAM中的地标**，IEEE机器人与自动化快报 4. 1（2018）：1–8。  \n尼科尔森，拉克兰，米尔福德，迈克尔，桑德豪夫，尼科。   \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8440105)\n\n* **利用子流形稀疏卷积网络进行3D语义分割**，CVPR，2018年。    \n格雷厄姆，本杰明，恩格尔克，马丁，范德马滕，劳伦斯。     \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2018\u002Fpapers\u002FGraham_3D_Semantic_Segmentation_CVPR_2018_paper.pdf)]\n\n* **学习环顾四周：为未知任务智能探索未知环境**，CVPR，2018年。   \n贾亚拉曼，迪内什，格劳曼，克里斯汀。    \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2018\u002Fpapers\u002FJayaraman_Learning_to_Look_CVPR_2018_paper.pdf)]    \n\n* **用于自动驾驶的多视角3D目标检测网络**，CVPR，2017年。    \n陈晓志，马慧敏，万吉，李天，夏。     \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2017\u002Fpapers\u002FChen_Multi-View_3D_Object_CVPR_2017_paper.pdf)    \n\n* **从单幅深度图像进行语义场景补全**，CVPR，2017年。    \n宋舒然，费舍尔，余，安迪，曾，安杰尔·X，张，马诺利斯，萨瓦，托马斯，芬克豪瑟。     \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2017\u002Fpapers\u002FSong_Semantic_Scene_Completion_CVPR_2017_paper.pdf)    \n\n* **PointNet：用于3D分类与分割的点集深度学习**，CVPR，2017年。    \n齐，查尔斯·R，苏，郝，莫，凯春，古伊巴斯，莱昂尼达斯·J。     \n[[页面](Pointnet: Deep learning on point sets for 3d classification and segmentation)]    \n\n* **PointNet++：在度量空间中对点集进行深度层次特征学习**，NeurIPS，2017年。    \n齐，查尔斯·瑞仲泰，李，易，苏，郝，古伊巴斯，莱昂尼达斯·J。     \n[[页面](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2017\u002Ffile\u002Fd8bf84be3800d12f74d8b05e9b89836f-Paper.pdf)]\n\n* **好奇的机器人：通过物理交互学习视觉表征**，ECCV，2016年。   \n平托，勒雷尔，加迪，迪拉吉，韩元峰，朴永来，古普塔，阿比纳夫。    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1604.01360)    \n\n* **用于3D形状识别的多视角卷积神经网络**，ICCV，2015年。    \n苏，杭，马吉，苏布兰苏，卡洛格拉基斯，埃文杰洛斯，利アード-米勒，埃里克。     \n[[页面]](https:\u002F\u002Fwww.cv-foundation.org\u002Fopenaccess\u002Fcontent_iccv_2015\u002Fpapers\u002FSu_Multi-View_Convolutional_Neural_ICCV_2015_paper.pdf)    \n\n* **Voxnet：用于实时目标识别的3D卷积神经网络**，IROS，2015年。    \n马图拉纳，丹尼尔，舍雷尔，塞巴斯蒂安。     \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F7353481)    \n\n* **ORB-SLAM：一种通用且精确的单目SLAM系统** IEEE T-RO 31. 5（2015）：1147–1163  \n穆尔-阿尔塔尔，劳尔，马丁内斯·何塞·玛丽亚，蒙蒂埃尔，胡安·D，塔尔多斯。   \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F7219438\u002F)\n\n* **LSD-SLAM：大规模直接单目SLAM**，ECCV，2014年  \n恩格尔，雅各布，肖普斯，托马斯，克雷默斯，丹尼尔。  \n[[页面]](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-319-10605-2_54)\n\n* **Slam++：在目标级别实现的同时定位与建图**，CVPR，2013年  \n萨拉斯-莫雷诺，雷纳托·F，理查德·A，纽科姆，豪克，斯特拉斯达特，保罗·HJ，凯利，安德鲁·J，戴维森。   \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2013\u002Fpapers\u002FSalas-Moreno_SLAM_Simultaneous_Localisation_2013_CVPR_paper.pdf)\n\n* **DTAM：实时密集跟踪与建图**，ICCV，2011年  \n纽科姆，理查德·A，洛夫格罗夫，史蒂文·J，戴维森，安德鲁·J。  \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F6126513\u002F)\n\n* **MonoSLAM：实时单目SLAM**，IEEE T-PAMI，2007年。  \n戴维森，安德鲁·J，里德，伊恩·D，莫尔顿，尼古拉斯·D，斯塔斯，奥利维埃。   \n[[页面]](http:\u002F\u002Fwww.doc.ic.ac.uk\u002F~ajd\u002FPublications\u002Fdavison_etal_pami2007.pdf)\n\n* **用于视觉辅助惯性导航的多状态约束卡尔曼滤波器**，IROS，2007年  \n穆里基斯，阿纳斯塔西奥斯·I，鲁梅利奥蒂斯，斯特吉奥斯·I。   \n[[页面]](https:\u002F\u002Fintra.engr.ucr.edu\u002F~mourikis\u002Ftech_reports\u002FTR_MSCKF.pdf)\n\n* **用于小型AR工作空间的并行跟踪与建图**，ISMAR，2007年  \n克莱因，乔治，穆雷，大卫。   \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F4538852\u002F)\n\n\n\n### 3D视觉感知与接地\n* **ReasonGrounder：LVLM引导的层次化特征投射用于开放词汇3D视觉接地**，CVPR，2025年  \n刘振阳，王一凯，郑思晓，潘彤颖，梁龙飞，傅延伟，薛向阳。  \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.23297)\n\n* **ViGiL3D：一个用于3D视觉接地的多语言数据集**，arXiv，2025年  \n王奥斯汀·T，龚泽明，张安杰尔·X。  \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2501.01366)\n\n* **UAD：用于机器人操作泛化任务的无监督可供性蒸馏**，ICRA，2025\n唐一鹤、黄文龙、王英科、李成树、Roy Yuan、张若涵、吴嘉俊、李飞飞  \n[[页面]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=an953WOpo2)\n\n* **基于语言指令、视觉观测与交互的3D物体可供性对齐**，arXiv，2025  \n朱赫、孔秋宇、徐克春、夏训龙、邓冰、叶洁平、熊荣、王岳  \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.04744)  \n\n* **3D-AffordanceLLM：利用大型语言模型实现3D世界中的开放词汇可供性检测**，arXiv，2025  \n褚恒硕、邓翔、吕琪、陈晓阳、李银川、郝建业、聂立强  \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.20041)  \n\n* **SeqAfford：通过多模态大型语言模型进行序列化的3D可供性推理**，CVPR，2025  \n王汉青、于春林、罗浩洋、俞静怡、史烨、王静雅  \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.01550) \n\n* **GEAL：基于跨模态一致性的可泛化3D可供性学习**，CVPR，2025  \n卢东岳、孔令东、黄天欣、李金熙  \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.09511)  \n\n* **GREAT：面向开放词汇3D物体可供性对齐的几何-意图协同推理**，arXiv，2024  \n邵亚文、翟伟、杨宇航、罗洪晨、曹阳、查正军，CVPR，2025  \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.19626)  \n\n* **LASO：基于语言引导的3D物体可供性分割**，CVPR，2024  \n李一聪、赵娜、肖俊斌、冯春、王翔、蔡特生  \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLi_LASO_Language-guided_Affordance_Segmentation_on_3D_Object_CVPR_2024_paper.pdf)  \n\n* **SceneFun3D：3D场景中的细粒度功能与可供性理解**，CVPR，2024  \n亚历山德罗斯·德利察斯、艾伊卡·塔克马兹、费德里科·汤巴里、罗伯特·萨姆纳、马克·波勒菲斯、弗朗西斯·恩格尔曼  \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fhtml\u002FDelitzas_SceneFun3D_Fine-Grained_Functionality_and_Affordance_Understanding_in_3D_Scenes_CVPR_2024_paper.html)  \n\n* **语言条件下的3D点云可供性-位姿检测**，ICRA，2024  \n阮端、武明日、黄宝儒、武团文、张薇、黎银、武秀、黎北、阮英  \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.10911)   \n\n* **DSPNet：用于鲁棒3D问答的双目场景感知**，CVPR，2025        \n罗景州、刘洋、陈伟星、李振、王耀威、李冠彬、林亮                                   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.03190)[项目](https:\u002F\u002Fgithub.com\u002FLZ-CH\u002FDSPNet)    \n\n* **用于3D可供性对齐的2D不变可供性知识学习**，arXiv，2024        \n高贤强、张平瑞、曲德林、王东、王志刚、丁岩、赵斌、李学龙                               \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.13024)\n\n* **EmbodiedSAM：实时在线分割任意3D物体**，arXiv，2024        \n许修伟、陈黄兴、赵琳清、王子威、周杰、陆继文                          \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.11811)\n\n* **OpenScan：面向通用开放词汇3D场景理解的基准数据集**，arXiv，2024        \n赵友军、林佳颖、叶书权、庞千石、劳仁森·W·H·  \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.11030)\n\n* **LLMI3D：通过单张2D图像赋予大型语言模型3D感知能力**，arXiv，2024       \n杨帆、赵思诚、张彦豪、陈浩翔、陈辉、唐文博、陆浩楠、徐鹏飞、杨振宇、韩俊功、丁贵光                      \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.07422)\n\n* **MMScan：具有分层语义标注的多模态3D场景数据集**，arXiv，2024       \n吕睿远、王泰、林静丽、杨帅、毛晓涵、陈逸伦、徐润森、黄海峰、朱晨明、林大华、庞江淼                   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.09401)\n\n* **ShapeLLM：面向具身交互的通用3D对象理解模型**，arXiv，2024          \n戚泽坤、董润培、张绍晨、耿浩然、韩春锐、葛政、王何、李毅、马凯胜      \n[[页面]](https:\u002F\u002Fqizekun.github.io\u002Fshapellm\u002F)\n\n* **LEO：3D世界中的具身通用智能体**，ICML，2024      \n黄江勇、雍思龙、马晓健、凌虎雄坤、李普浩、王燕、李青、朱松纯、贾宝雄、黄思源   \n[[页面]](https:\u002F\u002Fembodied-generalist.github.io\u002F)    \n\n* **SceneVerse：面向场景理解的3D视觉-语言学习规模化扩展**，ECCV，2024    \n贾宝雄、陈艺心、于黄悦、王燕、牛雪松、刘腾宇、李青、黄思源    \n[[页面]](https:\u002F\u002Fscene-verse.github.io\u002F)    \n\n* **PQ3D：通过可提示查询统一3D视觉-语言理解**，ECCV，2024     \n朱子宇、张卓凡、马晓健、牛雪松、陈艺心、贾宝雄、邓志东、黄思源、李青    \n[[页面]](https:\u002F\u002F3d-vista.github.io\u002F)\n\n* **MultiPLY：3D世界中以多感官为中心的具身大型语言模型**，CVPR，2024     \n洪宜宁、郑子硕、陈培浩、王依安、李俊彦、甘创     \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FHong_MultiPLY_A_Multisensory_Object-Centric_Embodied_Large_Language_Model_in_3D_CVPR_2024_paper.pdf)\n\n* **MP5：基于主动感知的多模态开放式具身系统，应用于Minecraft**，CVPR，2024     \n秦怡然、周恩深、刘启昌、尹振飞、盛路、张瑞茂、乔宇、邵晶        \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FQin_MP5_A_Multi-modal_Open-ended_Embodied_System_in_Minecraft_via_Active_CVPR_2024_paper.pdf)\n\n* **MaskClustering：基于视图共识的掩码图聚类，用于开放词汇3D实例分割**，CVPR，2024     \n严米、张家照、朱燕、王何            \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.07745)\n\n* **TACO：可泛化双手工具-动作-物体理解的基准测试**，CVPR，2024     \n刘云、杨浩林、司旭、刘玲、李子朋、张雨翔、刘业斌、李毅                \n[[页面]](https:\u002F\u002Ftaco2024.github.io\u002F)\n\n* **EDA：显式文本解耦与密集对齐技术在3D视觉对齐中的应用**，CVPR，2023   \n吴、严敏、程、新华、张、仁瑞、程、泽森、张、健   \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FWu_EDA_Explicit_Text-Decoupling_and_Dense_Alignment_for_3D_Visual_Grounding_CVPR_2023_paper.pdf)   \n\n* **Affordpose：大规模手-物体交互数据集，包含基于可供性的手部姿态信息**，ICCV，2023  \n简俊涛、刘秀萍、李曼怡、胡瑞珍、刘健  \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fhtml\u002FJian_AffordPose_A_Large-Scale_Dataset_of_Hand-Object_Interactions_with_Affordance-Driven_Hand_ICCV_2023_paper.html)\n\n* **基于图像中2D交互的3D物体可供性定位**, ICCV, 2023  \n杨宇航、翟伟、罗洪晨、曹阳、罗杰波、查正军  \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fhtml\u002FYang_Grounding_3D_Object_Affordance_from_2D_Interactions_in_Images_ICCV_2023_paper.html)  \n\n* **3d-vista：用于3D视觉与文本对齐的预训练Transformer模型**, ICCV, 2023       \n朱子宇、马晓健、陈一欣、邓志东、黄思远、李青      \n[[页面]](https:\u002F\u002F3d-vista.github.io\u002F)    \n\n* **LeaF：用于4D点云序列理解的学习帧方法**, ICCV, 2023       \n刘云泽、陈俊宇、张泽凯、易力        \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FLiu_LeaF_Learning_Frames_for_4D_Point_Cloud_Sequence_Understanding_ICCV_2023_paper.pdf)\n\n* **SQA3D：3D场景中的情境化问答系统**, ICLR, 2023    \n马晓健、雍思龙、郑子龙、李青、梁义涛、朱松纯、黄思远    \n[[页面]]([https:\u002F\u002Fsqa3d.github.io\u002F)\n\n* **LLM-Grounder：以大型语言模型为代理的开放词汇3D视觉定位**, arXiv, 2023   \n杨佳宁、陈旭伟、钱圣毅、马丹尼尔、艾英加尔马达万、福黑大卫F、蔡乔伊斯   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.12311)   \n\n* **面向零样本开放词汇3D视觉定位的视觉编程**, arXiv, 2023   \n袁志浩、任金科、冯春梅、赵恒爽、崔曙光、李震   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.15383)\n\n* **用于3D视觉定位的多视角Transformer模型**, CVPR, 2022   \n黄世嘉、陈奕伦、贾佳亚、王立伟   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.02174)    \n\n* **环顾四周并参照：用于3D视觉定位的2D合成语义知识蒸馏**, CVPR, 2022   \n巴克尔埃斯拉姆、阿尔萨迪雅斯敏、埃尔霍赛尼穆罕默德   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.14241)   \n\n* **3D-SPS：通过引用点渐进式选择实现单阶段3D视觉定位**, CVPR, 2022   \n罗俊宇、傅家辉、孔祥昊、高辰、任海兵、沈浩、夏华夏、刘思   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.06272)    \n\n* **自下而上、自上而下的检测Transformer模型，用于图像和点云中的语言定位**, ECCV, 2022   \n贾因阿尤什、格卡纳西奥斯尼古拉斯、梅迪拉塔伊希塔、弗拉基亚达基卡特琳娜   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2112.08879)   \n\n* **3d affordancenet：视觉物体可供性理解的基准测试**, CVPR, 2021  \n邓盛恒、徐勋、吴超正、陈科、贾奎   \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2021\u002Fhtml\u002FDeng_3D_AffordanceNet_A_Benchmark_for_Visual_Object_Affordance_Understanding_CVPR_2021_paper.html)\n\n* **文本引导的图神经网络用于引用式3D实例分割**, AAAI, 2021   \n黄品豪、李汉鸿、陈焕宗、刘廷禄   \n[[页面]](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fview\u002F16253\u002F16060)   \n\n* **InstanceRefer：通过实例多层次上下文引用实现点云上视觉定位的协同整体理解**, ICCV, 2021   \n袁志浩、严旭、廖英红、张瑞茂、王晟、李震、崔曙光   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.01128)   \n\n* **自由描述引导的3D视觉图网络用于点云中的物体定位**, CVPR, 2021   \n冯明涛、李震、李琪、张亮、张向东、朱光明、张辉、王耀南、米安阿吉马尔   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.16381)   \n\n* **SAT：用于3D视觉定位的2D语义辅助训练**, CVPR, 2021   \n杨正元、张松阳、王立伟、罗杰波   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2105.11450)   \n\n* **LanguageRefer：用于3D视觉定位的空间语言模型**, CVPR, 2021   \n罗俊河、德辛格卡尔蒂克、法哈迪阿里、福克斯迪特尔   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2107.03438)   \n\n* **3DVG-Transformer：用于点云上视觉定位的关系建模**, ICCV, 2021    \n赵丽晨、蔡大刚、盛璐、许东    \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2021\u002Fpapers\u002FZhao_3DVG-Transformer_Relation_Modeling_for_Visual_Grounding_on_Point_Clouds_ICCV_2021_paper.pdf)    \n\n* **TransRefer3D：面向细粒度3D视觉定位的实体与关系感知Transformer模型**, CVPR, 2021    \n何代兰、赵宇生、罗俊宇、惠天睿、黄绍飞、张爱喜、刘思\n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.02388)   \n\n* **ScanRefer：利用自然语言在RGB-D扫描中进行3D物体定位**, ECCV, 2020    \n陈大卫振宇、常安吉尔X、尼斯纳马蒂亚斯    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1912.08830)    \n\n* **ReferIt3D：用于真实场景中细粒度3D物体识别的神经听者**, ECCV, 2020   \n阿奇利奥普塔斯帕诺斯、阿卜杜勒雷赫姆艾哈迈德、夏菲、埃尔霍赛尼穆罕默德、圭巴斯莱昂尼达斯   \n[[页面]](https:\u002F\u002Fwww.ecva.net\u002Fpapers\u002Feccv_2020\u002Fpapers_ECCV\u002Fpapers\u002F123460409.pdf)   \n\n\n\n\n### 视觉语言导航\n\n* **WMNav：将视觉-语言模型整合到世界模型中，用于目标物体导航**, IROS, 2025.       \n聂杜君、郭贤达、段义群、张瑞俊、陈龙。             \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.02247)] [[项目]](https:\u002F\u002Fb0b8k1ng.github.io\u002FWMNav\u002F)\n\n* **SmartWay：增强的航点预测与回溯功能，用于零样本视觉-语言导航**, IROS, 2025.       \n石向宇、李泽锐、吕文琦、夏家通、达优布费拉斯、乔燕媛、吴琪。             \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.10069)\n\n* **EmbodiedBench：面向视觉驱动具身智能体的多模态大型语言模型综合基准测试**, arXiv, 2025.       \n杨锐、陈汉阳、张俊宇、赵马克、钱成、王康睿、王秦能、科里佩拉泰贾文卡特、莫瓦赫迪马尔齐耶、李曼玲、季恒、张欢、张彤。             \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09560)] [[项目]](https:\u002F\u002Fembodiedbench.github.io)\n\n* **MapNav：基于标注语义地图的新型记忆表示，用于基于VLM的视觉-语言导航**, arXiv, 2025.       \n张凌峰、郝晓帅、徐钦文、张强、张新尧、王鹏威、张静、王忠源、张尚航、徐仁静。             \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.13451)\n\n* **迈向长时程视觉-语言导航：平台、基准与方法**, CVPR, 2025.       \n宋新帅、陈伟星、刘洋、陈维凯、李冠斌、林亮。             \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.09082)][项目](https:\u002F\u002Fhcplab-sysu.github.io\u002FLH-VLN\u002F)\n\n* **DivScene：基于多样化场景和物体的LVLMs目标导航基准测试**，arxiv，2024年。     \n王兆伟、张洪明、方天庆、田晔、杨岳、马凯欣、潘晓曼、宋阳秋、于东。     \n[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.02730)] [[项目](https:\u002F\u002Fzhaowei-wang-nlp.github.io\u002Fdivscene-project-page\u002F)]\n\n* **MapGPT：基于地图引导提示与自适应路径规划的视觉-语言导航**，ACL，2024年。       \n陈嘉琪、林冰倩、徐然、柴振华、梁晓丹、黄冠义。             \n[[页面](https:\u002F\u002Fchen-judge.github.io\u002FMapGPT\u002F)]\n\n* **NavCoT：通过学习解耦推理提升基于LLM的视觉-语言导航性能**，ArXiv，2024年。      \n林冰倩、聂云霜、魏子明、陈嘉琪、马世魁、韩建华、许航、常晓军、梁晓丹。             \n[[页面](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.07376)]       \n\n* **OMEGA：基于状态空间模型的高效遮挡感知式空地机器人动态环境导航**，ArXiv，2024年。      \n王俊明、黄栋、关秀贤、孙泽凯、沈天翔、刘方明、崔鹤鸣。         \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.10618)]\n\n* **CoVLA：面向自动驾驶的综合视觉-语言-动作数据集**，ArXiv，2024年。      \n荒井英久、三轮圭太、佐佐木健斗、山口优、渡边浩平、青木俊介、山本一成。         \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.10845)]\n\n* **FLAME：在城市环境中利用多模态LLM进行导航的学习**，ArXiv，2024年。      \n徐云哲、潘怡媛、刘哲、王赫生。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.11051)]\n\n* **基于基础模型的连续视觉-语言导航中的可供性导向规划**，ArXiv，2024年。      \n陈嘉琪、林冰倩、刘新民、梁晓丹、黄冠义。         \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.05890)]\n\n* **未知环境中的具身指令遵循**，ArXiv，2024年。      \n吴、王、徐、陆、颜。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.11818)]       \n\n* **DISCO：通过可微分场景语义与双层控制实现具身导航与交互**，arxiv，2024年。                \n徐鑫宇、罗圣诚、杨延超、李永禄、陆策吾。              \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.14758)\n\n* **NOLO：仅看一次即可导航**，arxiv，2024年。                \n周博文、王江星、陆宗清。              \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.01384)\n\n* **迈向具身导航通用模型的学习**，CVPR，2024年。    \n郑铎、黄诗佳、赵琳、钟毅武、王立伟。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.02010)]     \n\n* **在线视觉-语言导航中的快慢结合测试时适应** ICML，2024年。    \n高俊宇、姚璇、徐昌盛。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.13209)]   \n\n* **行动前先讨论：通过多专家讨论实现视觉语言导航**，ICRA，2024年。   \n龙、宇兴、小奇、李、文哲、蔡、浩、董。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.11382)]    \n\n* **基于因果学习的视觉-语言导航**，CVPR，2024年。   \n王柳依、陈启君。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.10241)]   \n\n* **用于视觉-语言导航的体素化环境表示**，CVPR，2024年。   \n刘睿、杨毅。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.14158)]    \n\n* **利用神经辐射场进行连续视觉-语言导航的前瞻式探索**，CVPR 2024。   \n王、子涵、向阳、李、家豪、杨、叶琪、刘、俊杰、胡、明、蒋、书强、蒋。 \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.01943)]    \n\n* **通过像素引导的导航技能连接零样本目标导航与基础模型** ICRA，2024年。           \n蔡文哲、黄思远、程光然、龙宇兴、高鹏、孙昌寅以及董浩。       \n[[页面]](https:\u002F\u002Fgithub.com\u002Fwzcai99\u002FPixel-Navigator)      \n\n* **OVER-NAV：借助开放词汇检测与结构化表示提升迭代式视觉-语言导航**，CVPR，2024年。              \n赵干龙、李冠斌、陈维凯、俞益舟。           \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhao_OVER-NAV_Elevating_Iterative_Vision-and-Language_Navigation_with_Open-Vocabulary_Detection_and_StructurEd_CVPR_2024_paper.pdf)     \n\n* **RILA：用于零样本语义视听导航的反思与想象型语言代理**，CVPR，2024年。                \n杨泽源、刘嘉庚、陈培浩、阿努普·切里安、蒂姆·K·马克斯、乔纳森·勒鲁、甘创。       \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_RILA_Reflective_and_Imaginative_Language_Agent_for_Zero-Shot_Semantic_Audio-Visual_CVPR_2024_paper.pdf)   \n\n* **迈向具身导航通用模型的学习**，CVPR，2024年。                \n郑铎、黄诗佳、赵琳、钟毅武、王立伟。       \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZheng_Towards_Learning_a_Generalist_Model_for_Embodied_Navigation_CVPR_2024_paper.pdf)\n\n* **基于因果学习的视觉-语言导航**，CVPR，2024年。                \n王柳依、何宗涛、党荣浩、申孟娇、刘承举、陈启君。        \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWang_Vision-and-Language_Navigation_via_Causal_Learning_CVPR_2024_paper.pdf)\n\n* **针对实例图像目标导航的实例感知探索-验证-开发**，CVPR，2024年。                \n雷晓涵、王敏、周文刚、李莉、李厚强。     \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLei_Instance-aware_Exploration-Verification-Exploitation_for_Instance_ImageGoal_Navigation_CVPR_2024_paper.pdf)\n\n* **Habitat合成场景数据集（HSSD-200）：对象目标导航中3D场景规模与真实感权衡的分析**，CVPR，2024年。                \n穆库尔·卡纳、毛永森、姜瀚霄、哈雷什·桑杰、布伦南·沙克莱特、德鲁夫·巴特拉、亚历山大·克莱格、埃里克·昂德桑德、安吉尔·X·张、马诺利斯·萨瓦。     \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FKhanna_Habitat_Synthetic_Scenes_Dataset_HSSD-200_An_Analysis_of_3D_Scene_CVPR_2024_paper.pdf)\n\n* **SchurVINS：基于舒尔补的轻量级视觉惯性导航系统**，CVPR，2024年。                \n范云飞、赵天宇、王贵东。     \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FFan_SchurVINS_Schur_Complement-Based_Lightweight_Visual_Inertial_Navigation_System_CVPR_2024_paper.pdf)\n\n* **SPOC：在仿真中模仿最短路径实现现实世界中的高效导航与操作**，CVPR，2024年。                \nKiana Ehsani、Tanmay Gupta、Rose Hendrix、Jordi Salvador、Luca Weihs、Kuo-Hao Zeng、Kunal Pratap Singh、Yejin Kim、Winson Han、Alvaro Herrasti、Ranjay Krishna、Dustin Schwenk、Eli VanderBilt、Aniruddha Kembhavi。  \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FEhsani_SPOC_Imitating_Shortest_Paths_in_Simulation_Enables_Effective_Navigation_and_CVPR_2024_paper.pdf)\n\n* **用于视觉-语言导航的体素化环境表示**，CVPR，2024年。                \n刘睿、王文冠、杨毅。     \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLiu_Volumetric_Environment_Representation_for_Vision-Language_Navigation_CVPR_2024_paper.pdf)\n\n* **GOAT-Bench：多模态终身导航基准测试**，CVPR，2024年。                \n王小涵、刘岳虎、宋欣航、刘宇怡、张思贤、蒋书强。        \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FKhanna_GOAT-Bench_A_Benchmark_for_Multi-Modal_Lifelong_Navigation_CVPR_2024_paper.pdf)\n\n* **基于效果导向效用的交互式导航方法**，CVPR，2024年。                \n王小涵、刘岳虎、宋欣航、刘宇怡、张思贤、蒋书强。      \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWang_An_Interactive_Navigation_Method_with_Effect-oriented_Affordance_CVPR_2024_paper.pdf)\n\n* **先想象再行动：面向目标物体导航的自监督生成地图**，CVPR，2024年。                \n张思贤、于新尧、宋欣航、王小涵、蒋书强。         \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhang_Imagine_Before_Go_Self-Supervised_Generative_Map_for_Object_Goal_Navigation_CVPR_2024_paper.pdf)\n\n* **MemoNav：用于视觉导航的工作记忆模型**，CVPR，2024年。                \n李洪鑫、王泽宇、杨旭、杨雨然、梅淑琪、张兆翔。         \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLi_MemoNav_Working_Memory_Model_for_Visual_Navigation_CVPR_2024_paper.pdf)\n\n* **基于价值引导扩散策略的偏观测下多功能导航**，CVPR，2024年。                \n张耿宇、唐浩、严燕。         \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhang_Versatile_Navigation_Under_Partial_Observability_via_Value-guided_Diffusion_Policy_CVPR_2024_paper.pdf)\n\n* **利用神经辐射场进行前瞻探索的连续视觉-语言导航**，CVPR，2024年。                \n王子涵、李向阳、杨嘉豪、刘叶琪、胡俊杰、江明、蒋书强。    \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWang_Lookahead_Exploration_with_Neural_Radiance_Representation_for_Continuous_Vision-Language_Navigation_CVPR_2024_paper.pdf)\n\n* **SPIN：同步感知、交互与导航**，CVPR，2024年。                \nShagun Uppal、Ananye Agarwal、熊浩宇、Kenneth Shaw、Deepak Pathak。    \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FUppal_SPIN_Simultaneous_Perception_Interaction_and_Navigation_CVPR_2024_paper.pdf)\n\n* **通过大型模型实现视觉-语言导航中的可修正地标发现**，TPAMI，2024年。              \n林冰倩、聂云霜、魏子明、朱毅、徐航、马世奎、刘建庄、梁晓丹。           \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10543121)\n\n* **ETPNav：面向连续环境下的视觉-语言导航的演化式拓扑规划**，IEEE T-PAMI，2024年。   \n安、董、韩青、王、王文冠、王尊、黄彦、何凯基、王亮。 \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.03047)]\n\n* **NaVid：基于视频的VLM为视觉-语言导航规划下一步行动**，RSS，2024年。   \n张佳钊、王坤宇、许荣涛、周庚泽、洪一聪、方晓萌、吴奇、张志正、王赫。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.15852)]\n\n* **对话中的行进：用于远程具身指代表达的交互式提示**，ICCV，2023年。   \n乔、闫元、袁凯、齐、郑、余、景、刘、齐、吴。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.10141)]     \n\n* **用于交互式指令遵循的多级组合推理**，AAAI，2023年。   \nBhambri、Suvaansh、金炳辉、崔钟贤。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.09387)]   \n\n* **通过在线视觉-语言地图实现真实世界的视觉和语言导航**，ArXiv，2023年。   \n徐成光、阮孝德、克里斯托弗·阿马托、劳森·L·S·王。 \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10822)]    \n\n* **通过扰动感知对比学习实现抗偏差的智能体导航**，TPAMI，2023年。              \n林冰倩、龙延鑫、朱毅、朱凤达、梁晓丹、叶启祥、林亮。           \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10120966\u002F)\n\n* **找到你想要的：学习需求条件下的物体属性空间以支持需求驱动的导航**，NIPS，2023年。   \n王晨、李武、董。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.08138)]\n\n* **HomeRobot：开放词汇的移动操作机器人**，NIPS，2023年。   \n耶纳曼德拉、斯里拉姆、阿伦、拉马昌德兰、卡尔梅什、亚达夫、奥斯汀、王、穆库尔、坎纳、提奥菲尔、热韦特、杨宗炎、维迪、贾因、亚历山大威廉、克莱格、约翰、特纳、佐尔特、基拉、马诺利斯、萨瓦、安吉尔、张、德文德拉辛格、查普洛特、德鲁夫、巴特拉、鲁兹贝、莫塔吉、约纳坦、比斯克、克里斯、帕克斯顿。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.11565)]    \n\n* **Behavior-1k：包含1,000项日常活动及真实感仿真的具身人工智能基准测试**，机器人学习会议，2023年。    \n李承书、张若涵、王乔西亚、戈克门、斯里瓦斯塔瓦、马丁-马丁、王陈、莱文、凌格尔巴赫、孙以及其他。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.09227)]\n\n* **DialFRED：用于具身指令遵循的对话式智能体**，arXiv，2022年。   \n高晓峰、高巧姿、龚冉、林戈文德、塔泰、高拉夫S.、苏卡特梅。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2202.13330)]   \n\n* **HOP：面向视觉-语言导航的历史与顺序感知预训练**，CVPR，2022年。       \n乔、闫元、齐、洪一聪、余、彭、王、齐、吴。        \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.11591)]      \n\n* **弥合视觉-语言导航中离散与连续环境学习之间的差距**，CVPR，2022年。    \n洪一聪、王尊、吴齐、史蒂芬·古尔德。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.02764)]   \n\n* **FILM：使用模块化方法遵循语言指令**，ICLR，2022年。   \nMin So Yeon、查普洛特、拉维库马尔、比斯克、萨拉胡丁诺夫。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.07342)]\n\n* **LM-Nav：基于大规模预训练语言、视觉和动作模型的机器人导航**，机器人学习会议。2022年。  \nDhruv Shah, , Blazej Osinski, Brian Ichter, Sergey Levine。      \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2207.04429)]\n\n* **SOON：基于图的探索的场景导向目标导航**，CVPR，2021年。      \nZhu, Fengda, Xiwen, Liang, Yi, Zhu, Qizhi, Yu, Xiaojun, Chang, Xiaodan, Liang。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.17138)]   \n\n* **视觉-语言导航策略学习与适应**，IEEE T-PAMI 43. 12(2021): 4205-4216。    \nWang, Xin, Qiuyuan, Huang, Asli, Celikyilmaz, Jianfeng, Gao, Dinghan, Shen, Yuan-Fang, Wang, William Yang, Wang, Lei, Zhang。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002Fhttps:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F8986691)]   \n\n* **邻域视图增强的视觉与语言导航模型**，MM，2021年。   \nAn, Dong, Yuankai, Qi, Yan, Huang, Qi, Wu, Liang, Wang, Tieniu, Tan。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2107.07201)]   \n\n* **超越导航图：连续环境中的视觉与语言导航**，ECCV，2020年。         \nKrantz, Jacob 和 Wijmans, Erik 和 Majumdar, Arjun 和 Batra, Dhruv 和 Lee, Stefan。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2004.02857)]\n\n* **REVERIE：真实室内环境中的远程具身视觉指代表达**，CVPR，2020年。   \nQi, Yuankai, Qi, Wu, Peter, Anderson, Xin, Wang, William Yang, Wang, Chunhua, Shen, Anton, Hengel。        \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.10151)]      \n\n* **ALFRED：面向日常任务的具身指令理解基准**，CVPR，2020年。    \nShridhar, Mohit, Jesse, Thomason, Daniel, Gordon, Yonatan, Bisk, Winson, Han, Roozbeh, Mottaghi, Luke, Zettlemoyer, Dieter, Fox。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1912.01734)]   \n\n* **视觉与对话导航**，机器人学习会议。2020年。   \nThomason, Jesse, Michael, Murray, Maya, Cakmak, Luke, Zettlemoyer。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.04957)]   \n\n* **用于智能体导航的语言与视觉实体关系图**，NeurIPS，2020年。   \nHong, Yicong, Cristian, Rodriguez, Yuankai, Qi, Qi, Wu, Stephen, Gould。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.09304)]   \n\n* **基于跨模态接地与交替对抗学习的语言引导导航**，IEEE T-CSVT 31. (2020): 3469-3481。    \nWeixia Zhang, , Chao Ma, Qi Wu, Xiaokang Yang。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2011.10972)]   \n\n* **坚守路径：视觉与语言导航中的指令忠实性**，ACL，2019年。   \nJain, Vihan, Gabriel, Magalhaes, Alexander, Ku, Ashish, Vaswani, Eugene, Ie, Jason, Baldridge。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.12255)]    \n\n* **TOUCHDOWN：视觉街道环境中的自然语言导航与空间推理**，CVPR，2019年。   \nChen, Howard, Alane, Suhr, Dipendra, Misra, Noah, Snavely, Yoav, Artzi。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1811.12354)]\n\n* **视觉与语言导航：在真实环境中解读具身导航指令**，CVPR，2018年。   \nAnderson, Peter, Qi, Wu, Damien, Teney, Jake, Bruce, Mark, Johnson, Niko, Sunderhauf, Ian, Reid, Stephen, Gould, Anton, Hengel。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.07280)]    \n\n* **三思而后行：为前瞻式视觉与语言导航架起无模型与基于模型强化学习的桥梁**，ECCV，2018年。   \nXin Eric Wang, , Wenhan Xiong, Hongmin Wang, William Yang Wang。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.07729)]     \n\n\n\n### 非视觉感知：触觉\n\n* **传感器无关触觉表征（SITR）**，ICLR，2025年。    \nHarsh Gupta, Yuchen Mo, Shengmiao Jin, Wenzhen Yuan。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.19638)]\n\n* **反应式扩散策略：面向接触密集型操作的慢速-快速视觉-触觉策略学习**，RSS，2025年。    \nHan Xue, Jieji Ren, Wendi Chen, Gu Zhang, Yuan Fang, Guoying Gu, Huazhe Xu, Cewu Lu。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.02881)]\n\n* **3D-ViTac：利用视觉-触觉传感进行精细操作学习**，CoRL，2024年。    \nBinghao Huang, Yixuan Wang, Xinyi Yang, Yiyue Luo, Yunzhu Li。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.24091)]\n\n* **TacSL：用于视觉-触觉传感器仿真与学习的库**，IEEE TRO，2025年。    \nIretiayo Akinola, Jie Xu, Jan Carius, Dieter Fox, Yashraj Narang。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fabs\u002F2408.06506)]\n\n* **当视觉遇见触觉：从信号处理视角看视觉-触觉传感器的当代综述**，Arxiv，2024年。    \nLi, Shoujie 和 Wang, Zihan 和 Wu, Changsheng 和 Li, Xiang 和 Luo, Shan 和 Fang, Bin 和 Sun, Fuchun 和 Zhang, Xiao-Ping 和 Ding, Wenbo。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.12226)]\n\n* **利用触觉传感提升手部物体的可泛化6D位姿跟踪**，RA-L，2024年。    \nYun Liu, Xiaomeng Xu, Weihang Chen, Haocheng Yuan, He Wang, Jing Xu, Rui Chen, Li Yi。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.04026)]\n\n* **用两只多指灵巧手学习视觉-触觉技能**，ArXiv，2024年。   \nLin, Toru 和 Zhang, Yu 和 Li, Qiyang 和 Qi, Haozhi 和 Yi, Brent 和 Levine, Sergey 和 Malik, Jitendra。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.16823)]   \n\n* **将触觉融入一切：学习统一的多模态触觉表征**，CVPR，2024年。   \nYang, Fengyu 和 Feng, Chao 和 Chen, Ziyang 和 Park, Hyoungseob 和 Wang, Daniel 和 Dou, Yiming 和 Zeng, Ziyao 和 Chen, Xien 和 Gangopadhyay, Rit 和 Owens, Andrew 等人。   \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_Binding_Touch_to_Everything_Learning_Unified_Multimodal_Tactile_Representations_CVPR_2024_paper.pdf)]\n\n* **受生物启发的传感器及其在智能机器人中的应用：综述**，机器人智能与自动化，2024年。    \nZhou, Yanmin 和 Yan, Zheng 和 Yang, Ye 和 Wang, Zhipeng 和 Lu, Ping 和 Yuan, Philip F 和 He, Bin。   \n[[页面](https:\u002F\u002Fwww.emerald.com\u002Finsight\u002Fcontent\u002Fdoi\u002F10.1108\u002FRIA-07-2023-0088\u002Ffull\u002Fpdf)]   \n\n* **给我一个信号：使用数据手套进行静态手势识别**，传感器，2023年。   \nAchenbach, Philipp 和 Laux, Sebastian 和 Purdack, Dennis 和 Müller, Philipp Niklas 和 Göbel, Stefan。   \n[[页面](https:\u002F\u002Fwww.mdpi.com\u002F1424-8220\u002F23\u002F24\u002F9847\u002Fpdf)]   \n\n* **语义感知的自适应知识蒸馏用于传感器到视觉的动作识别**，IEEE图像处理汇刊，2021年。   \nLiu, Yang 和 Wang, Keze 和 Li, Guanbin 和 Lin, Liang。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2009.00210)]   \n\n* **手部动作：触觉对象识别的窗口**，认知心理学，1987年。    \nLederman, Susan J 和 Klatzky, Roberta L。   \n[[页面](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002F0010028587900089)]   \n\n* **力与触觉传感**，施普林格机器人学手册，2016年。   \nCutkosky, Mark R 和 Howe, Robert D 和 Provancher, William R。   \n[[页面](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-319-32552-1_28)]\n\n* **触觉感知：教程**, 注意力、知觉与心理物理学, 2009年.   \n莱德曼，苏珊·J 和 克拉茨基，罗伯塔·L.   \n[[页面](https:\u002F\u002Flink.springer.com\u002Fcontent\u002Fpdf\u002F10.3758\u002FAPP.71.7.1439.pdf)]   \n\n* **基于压阻复合材料的柔性触觉传感：综述**, 传感器, 2014年.   \n斯塔西，斯特凡诺 和 卡乌达，瓦伦蒂娜 和 卡纳韦塞，詹卡洛 和 皮里，坎迪多·法布里齐奥.   \n[[页面](https:\u002F\u002Fwww.mdpi.com\u002F1424-8220\u002F14\u002F3\u002F5296.)]   \n\n* **灵巧机器人手中的触觉传感**, 机器人与自主系统, 2015年.    \n卡帕索夫，扎纳特 和 科拉列斯，胡安-安东尼奥 和 佩尔德罗，韦罗尼克.   \n[[页面](https:\u002F\u002Fuca.hal.science\u002Fhal-01680649\u002Fdocument)]   \n\n* **GelLink：一种基于视觉的触觉感知与本体感觉的紧凑型多指节手指**, arXiv, 2024年.   \n马，宇翔 和 阿德尔森，爱德华.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.14887)]\n\n* **用于多模态对齐的触觉、视觉和语言数据集**, ArXiv, 2024年.   \n傅，乐天 和 达塔，高拉夫 和 黄，黄 和 帕尼奇，威廉·钟浩 和 德雷克，贾伊敏 和 奥尔蒂斯，约瑟夫 和 穆卡达姆，穆斯塔法 和 兰贝塔，迈克 和 卡兰德拉，罗伯托 和 戈德堡，肯.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.13232)]\n\n* **通过离散扩散进行大规模无动作视频预训练以实现高效策略学习**, ArXiv, 2024年.   \n何，浩然 和 白，陈佳 和 潘，凌 和 张，维南 和 赵，彬 和 李，雪龙.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.14407)]\n\n* **Snap-it, Tap-it, Splat-it：面向复杂表面重建的触觉信息引导的3D高斯泼溅法**, ArXiv, 2024年.   \n科米，毛罗 和 托尼奥尼，阿莱西奥 和 杨，麦克斯 和 特雷姆布莱，乔纳森 和 布卢基斯，瓦尔茨 和 林，易琼 和 列波拉，内森·F 和 艾奇森，劳伦斯.    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.20275)]   \n\n* **触觉增强的辐射场**, CVPR, 2024年.      \nDou，Yiming 和 Yang，Fengyu 和 Liu，Yi 和 Loquercio，Antonio 和 Owens，Andrew.    \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FDou_Tactile-Augmented_Radiance_Fields_CVPR_2024_paper.pdf)]       \n\n* **AnyRotate：具有Sim-to-Real触感的重力不变式手持物体旋转**, ArXiv, 2024年.      \n杨，麦克斯 和 卢，成华 和 Church，Alex 和 林，易琼 和 Ford，Chris 和 李，浩然 和 Psomopoulou，Efi 和 Barton，David AW 和 Lepora，Nathan F.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.07391)]      \n\n* **用于机器人操作的触觉图像特征级Sim2Real回归**, ICRA ViTac, 2024年.    \n段，博义 和 钱，坤 和 赵，永强 和 张，东元 和 罗，山.    \n[[页面](https:\u002F\u002Fshanluo.github.io\u002FViTacWorkshops\u002Fcontent\u002FViTac2024_Paper_09.pdf)]   \n\n* **MAE4GM：利用多模态自编码器进行颗粒状物料属性估计的视觉-触觉学习**,ICRA ViTac, 2024年.    \n张，泽青 和 郑，广泽 和 季，学博 和 陈，冠琪 和 贾，瑞星 和 陈，文涛 和 陈，冠华 和 张，梁俊 和 潘，嘉.    \n[[页面](https:\u002F\u002Fshanluo.github.io\u002FViTacWorkshops\u002Fcontent\u002FViTac2024_Paper_01.pdf)]\n\n* **Octopi：利用大型触觉-语言模型进行物体属性推理**, arXiv预印本arXiv:2405.02794, 2024年.     \nYu，Samson 和 Lin，Kelvin 和 Xiao，Anxing 和 段，贾飞 和 Soh，Harold.    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.02794)]\n\n* **9dtact：一种紧凑型基于视觉的触觉传感器，用于精确的3D形状重建和可推广的6D力估计**, IEEE机器人与自动化快报, 2023年.   \n林，昌毅 和 张，韩 和 许，继凯 和 吴，雷 和 许，华哲.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.14277)]\n\n* **Allsight：一种低成本、高分辨率的圆形触觉传感器，具备零样本学习能力**, IEEE机器人与自动化快报, 2023年.   \n阿祖莱，奥舍 和 柯蒂斯，尼姆罗德 和 索科洛夫斯基，罗特姆 和 莱维茨基，盖伊 和 斯洛莫维克，丹尼尔 和 利林，盖伊 和 辛托夫，阿维沙伊.    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.02928)]   \n\n* **Vistac：迈向用于机器人操作的统一多模态感知手指**, IEEE传感器期刊, 2023年.   \n阿塔尔，希拉兹 和 帕特尔，高拉夫 和 徐，正通 和 邱，强 和 谢，宇.   \n[[页面](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10242327\u002F)]   \n\n* **Midastouch：跨滑动触摸分布的蒙特卡洛推理**, CoRL, 2023年.   \n苏雷什，苏达尔尚 和 施，子琳 和 安德森，斯图尔特 和 凯斯，迈克 和 穆卡达姆，穆斯塔法.   \n[[页面](https:\u002F\u002Fproceedings.mlr.press\u002Fv205\u002Fsuresh23a\u002Fsuresh23a.pdf)]    \n\n* **The objectfolder基准测试：结合神经网络与真实物体的多感官学习**, CVPR, 2023年.   \n高，若涵 和 Dou，Yiming 和 李，浩 和 阿加瓦尔，坦迈 和 博格，珍妮特 和 李，云竹 和 费-费，李 和 吴，家俊.\n[[页面](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FGao_The_ObjectFolder_Benchmark_Multisensory_Learning_With_Neural_and_Real_Objects_CVPR_2023_paper.pdf)]\n\n* **Imagebind：一个嵌入空间，将一切绑定在一起**, CVPR, 2023年.   \n吉尔达尔，罗希特 和 埃尔-努比，阿拉丁 和 刘，庄 和 辛格，曼纳特 和 阿尔瓦拉，卡利扬·瓦苏德夫 和 朱林，阿芒 和 米斯拉，伊桑.   \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2023\u002Fpapers\u002FGirdhar_ImageBind_One_Embedding_Space_To_Bind_Them_All_CVPR_2023_paper.pdf)]\n\n* **触摸神经场：利用神经辐射场生成触觉传感数据**, 机器人学习会议, 第1618–1628页, 2023年.    \n钟，绍洪 和 阿尔比尼，亚历山德罗 和 琼斯，欧伊维·帕克 和 迈奥利诺，佩尔拉 和 波斯纳，英格玛.   \n[[页面](https:\u002F\u002Fproceedings.mlr.press\u002Fv205\u002Fzhong23a\u002Fzhong23a.pdf)]    \n\n* **学习阅读盲文：利用扩散模型弥合触觉现实差距**, ArXiv, 2023年.   \n伊圭拉，卡罗丽娜 和 布茨，拜伦 和 穆卡达姆，穆斯塔法.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.01182)]   \n\n* **从触觉生成视觉场景**, CVPR, 2023年.   \n杨，冯宇 和 张，家诚 和 奥文斯，安德鲁.   \n[[页面](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023\u002Fpapers\u002FYang_Generating_Visual_Scenes_from_Touch_ICCV_2023_paper.pdf)]\n\n* **Dtact：一种基于视觉的触觉传感器，可直接从黑暗中测量高分辨率的3D几何形状**, ICRA, 2023年.   \n林，昌毅 和 林，子琪 和 王，绍雄 和 许，华哲.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.13916)]\n\n* **使用安装在手中的RGB相机和视觉触觉传感器进行手部姿态估计**, IEEE Access, 2023年.    \n高，袁 和 松冈，祥吾 和 万，伟伟 和 清川，拓也 和 小山，圭介 和 原田，健介.   \n[[页面](https:\u002F\u002Fieeexplore.ieee.org\u002Fiel7\u002F6287639\u002F6514899\u002F10043666.pdf)]   \n\n* **利用多个基于视觉的触觉传感器进行碰撞感知的手部6D物体姿态估计**, ICRA, 2023年.    \n卡代奥，加布里埃莱 M 和 皮加，尼古拉 A 和 博塔雷尔，法布里齐奥 和 纳塔莱，洛伦佐.    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2301.13667)]\n\n* **基于视觉触觉传感的三维形状重建的隐式神经表示**, ArXiv, 2023.    \n科米，毛罗；丘奇，亚历克斯；李克杰；艾奇森，劳伦斯；莱波拉，内森·F.    \n[[页面](https:\u002F\u002Fshanluo.github.io\u002FViTacWorkshops\u002Fcontent\u002FViTac2023_Paper_06.pdf)]    \n\n* **多指手滑动触碰探索用于未知物体形状建模**, IROS, 2023.   \n陈怡婷；泰克登，艾哈迈特·埃尔詹；戴森罗斯，马克·彼得；贝基罗格鲁，亚塞敏.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.00576)]   \n\n* **结合视觉与触觉的通用手持物体旋转**, CoRL, 2023.          \n齐浩志；易布伦特；苏雷什，苏达尔尚；兰贝塔，迈克；马毅；卡兰德拉，罗伯托；马利克，吉滕德拉.        \n[[页面](https:\u002F\u002Fproceedings.mlr.press\u002Fv229\u002Fqi23a\u002Fqi23a.pdf)]          \n\n* **基于模型与无模型的触觉推动模拟到现实深度强化学习**, IEEE机器人与自动化快报, 2023.   \n杨，麦克斯；林，义琼；丘奇，亚历克斯；劳埃德，约翰；张，丹丹；巴顿，大卫·AW；莱波拉，内森·F.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.14272)]      \n\n* **用于触觉图像模拟到现实迁移的无监督对抗域适应**, IEEE仪器与测量汇刊, 2023.       \n景，星硕；钱，坤；贾努，图多尔；罗珊.      \n[[页面](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10106009\u002F)]\n\n* **从不完全触觉数据中学习：基于掩码自编码器的触觉表征学习**, IROS, 2023.    \n曹，关群；江，佳琪；博勒加拉，达努什卡；罗珊.    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.07358)]\n\n* **触觉带来的灵巧性：通过机器人玩耍进行触觉表征的自监督预训练**, ArXiv, 2023.    \n古泽伊，伊尔马克；埃文斯，本；钦塔拉，索米思；平托，莱雷尔.    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.12076)]    \n\n* **Gelslim 3.0：紧凑型触觉感知手指中的高分辨率形状、力和滑移测量**, ICRA, 2022.   \n泰勒，伊恩·H；董，思远；罗德里格斯，阿尔贝托.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.12269)]    \n\n* **Tacto：一款快速、灵活且开源的高分辨率基于视觉的触觉传感器仿真器**, IEEE机器人与自动化快报, 2022.    \n王，绍雄；兰贝塔，迈克；周，柏伟；卡兰德拉，罗伯托.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2012.08456)]   \n\n* **Taxim：一种基于示例的GelSight触觉传感器仿真模型**, IEEE机器人与自动化快报, 2022.   \n司，子琳；袁，文珍.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.04027)]\n\n* **Objectfolder 2.0：用于模拟到现实迁移的多感官物体数据集**, CVPR, 2022.      \n高，若涵；司，子琳；张，延宇；克拉克，塞缪尔；博格，珍妮特；费-费，李；袁，文珍；吴，嘉俊.     \n[[页面](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FGao_ObjectFolder_2.0_A_Multisensory_Object_Dataset_for_Sim2Real_Transfer_CVPR_2022_paper.pdf)]      \n\n* **自监督视觉-触觉预训练用于定位和跟随衣物特征**, ArXiv, 2022.      \n克尔，贾斯汀；黄，黄；威尔科克斯，阿尔伯特；霍克，瑞安；伊赫诺夫斯基，杰弗里；卡兰德拉，罗伯托；戈德堡，肯.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.13042)]     \n\n* **利用视觉和触觉传感器数据对手中物体进行视觉触觉6D位姿估计**, IEEE机器人与自动化快报, 2022.   \n迪卡莱，斯内哈尔；帕特尔，卡兰库马尔；丁格拉，达克什；纳拉穆拉，伊托西；林，明信；伊巴，索希；贾马利，纳维德.   \n[[页面](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FSnehal_Dikhale\u002Fpublication\u002F357842538_VisuoTactile_6D_Pose_Estimation_of_an_In-Hand_Object_Using_Vision_and_Tactile_Sensor_Data\u002Flinks\u002F6297b925416ec50bdb022987\u002FVisuoTactile-6D-Pose-Estimation-of-an-In-Hand-Object-Using-Vision-and-Tactile-Sensor-Data.pdf)]   \n\n* **Shapemap 3-D：通过密集触碰和视觉实现高效形状映射**, ICRA, 2022.    \n苏雷什，苏达尔尚；司，子琳；曼格尔森，乔舒亚·G；袁，文珍；凯斯，迈克尔.    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.09884)]\n\n* **Visuotactile-rl：利用深度强化学习学习多模态操作策略**, ICRA, 2022.          \n汉森，约翰娜；霍根，弗朗索瓦；里夫金，德米特里；梅格，大卫；詹金，迈克尔；杜德克，格雷戈里.          \n[[页面](https:\u002F\u002Fjohannah.github.io\u002Fpapers\u002FVisuotactile-RL.pdf)]          \n\n* **触觉健身房2.0：低成本高分辨率机器人触觉比较的模拟到现实深度强化学习**, IEEE机器人与自动化快报, 2022.   \n林，义琼；劳埃德，约翰；丘奇，亚历克斯；莱波拉，内森·F.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2207.10763)]\n\n* **触碰即走：从人类收集的视觉和触觉数据中学习**, ArXiv, 2022.    \n杨，冯宇；马，晨阳；张，家成；朱，静；袁，文珍；欧文斯，安德鲁.    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.12498)]   \n\n* **Objectfolder：一个包含隐式视觉、听觉和触觉表征的对象数据集**, arXiv, 2021.   \n高，若涵；张，延宇；马尔，希瓦尼；李，费-费；吴，嘉俊.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.07991)]   \n\n* **从自然语言监督中学习可迁移的视觉模型**, 国际机器学习大会, 2021.   \n拉德福德，亚历克；金，钟旭；哈拉西，克里斯；拉梅什，阿迪蒂亚；戈，加布里埃尔；阿加瓦尔，桑迪尼；萨斯特里，吉里什；阿斯克尔，阿曼达；米什金，帕梅拉；克拉克，杰克等.   \n[[页面](http:\u002F\u002Fproceedings.mlr.press\u002Fv139\u002Fradford21a\u002Fradford21a.pdf)]   \n\n* **GelSight楔形：用紧凑型机器人手指测量高分辨率3D接触几何形状**, ICRA, 2021.   \n王，绍雄；谢，宇；罗梅罗，布兰登；阿德尔森，爱德华.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.08851)]\n\n* **利用几何接触渲染从首次触碰中进行触觉物体位姿估计**, CoRL, 2021.   \n维利亚隆加，玛丽亚·鲍萨；罗德里格斯，阿尔贝托；林，布莱恩；瓦尔斯，埃里克；塞霍波洛斯，西奥.   \n[[页面](https:\u002F\u002Fproceedings.mlr.press\u002Fv155\u002Fvillalonga21a\u002Fvillalonga21a.pdf)]     \n\n* **基于视觉和触觉的主动3D形状重建**, NeurIPS, 2021.   \n史密斯，爱德华；梅格，大卫；皮内达，路易斯；卡兰德拉，罗伯托；马利克，吉滕德拉；罗梅罗·索里亚诺，阿德里亚娜；德罗兹达尔，米哈尔.    \n[[页面](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2021\u002Ffile\u002F8635b5fd6bc675033fb72e8a3ccc10a0-Paper.pdf)]   \n\n* **为syntouch biotac解读和预测触觉信号**, 国际机器人研究杂志, 2021.    \n纳朗，雅什拉杰·S；孙达拉林甘，巴拉库马尔；范·维克，卡尔；穆萨维安，阿尔萨兰；福克斯，迪特.    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2101.05452)]\n\n* **GelTip：用于机器人操作的指状光学触觉传感器**, IROS, 2020.   \n戈梅斯，丹尼尔·费尔南德斯；林，钟林；罗，珊.   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2008.05404)]\n\n* **DIGIT：一种用于手持操作的低成本、紧凑型高分辨率触觉传感器的新设计**，IEEE机器人与自动化快报，2020年。   \n兰贝塔、迈克，周柏玮，田斯蒂芬，杨布莱恩，马伦本杰明，莫斯维多利亚·罗斯，斯特劳德戴夫，桑托斯雷蒙德，比亚戈维艾哈迈德，卡默雷尔格雷格，贾亚拉曼迪内什，卡兰德拉罗伯托。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.14679)]\n\n* **Digit：一种用于手持操作的低成本、紧凑型高分辨率触觉传感器的新设计**，IEEE机器人与自动化快报，2020年。   \n兰贝塔、迈克，周柏玮，田斯蒂芬，杨布莱恩，马伦本杰明，莫斯维多利亚·罗斯，斯特劳德戴夫，桑托斯雷蒙德，比亚戈维艾哈迈德，卡默雷尔格雷格等。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2005.14679)]\n\n* **深度触觉体验：基于深度传感器数据估计触觉传感器输出**，IROS，2020年。   \n帕特尔、卡拉库马尔，伊巴、索西，贾马利、纳维德。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.08946)]   \n\n* **基于视觉和触觉的3D形状重建**，NeurIPS，2020年。   \n史密斯、爱德华，卡兰德拉、罗伯托，罗梅罗、阿德里亚娜，吉奥克萨里、乔治娅，梅格尔、大卫，马利克、吉滕德拉，德罗兹达尔、米哈尔。   \n[[页面](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2020\u002Ffile\u002Fa3842ed7b3d0fe3ac263bcabd2999790-Paper.pdf)]\n\n* **异构触觉传感数据上的监督自编码器联合学习：提升材料分类性能**，IROS，2020年。      \n高瑞涵、陶尼亚佐夫、塔斯博拉特，林志平、吴燕。      \n[[页面](https:\u002F\u002Fyan-wu.com\u002Fwp-content\u002Fuploads\u002F2020\u002F08\u002Fgao2020supervised.pdf)]\n\n* **融合视觉与触觉：为接触密集型任务学习多模态表征**，IEEE机器人学汇刊，2020年。    \n李、米歇尔A，朱、宇科，扎卡雷斯、彼得，谭、马修，斯里尼瓦桑、克里希南，萨瓦雷斯、西尔维奥，费-费、李，加格、阿尼梅什，博格、珍妮特。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.13098)]\n\n* **利用刚性触觉传感器阵列进行高效触觉形状探索的学习**，PloS One，2020年。    \n弗勒、萨沙，莫林根、亚历山德拉，克拉茨基、罗伯塔L，里特、赫尔格。    \n[[页面](https:\u002F\u002Fjournals.plos.org\u002Fplosone\u002Farticle\u002Ffile?id=10.1371\u002Fjournal.pone.0226880&type=printable)]    \n\n* **通过物理驱动与数据驱动相结合的框架解释和预测触觉信号**，ArXiv，2020年。    \n纳朗、雅什拉杰S，范威克、卡尔，穆萨维安、阿尔萨兰，福克斯、迪特。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.03777)]\n\n* **利用触觉神经编码与脉冲神经网络实现快速纹理分类**，IROS，2020年。    \n陶尼亚佐夫、塔斯博拉特，蔡、彦松，高瑞涵、吴燕。    \n[[页面](https:\u002F\u002Fruihangao.github.io\u002Ffiles\u002Ftaunyazov2020fast.pdf)]\n\n* **SynTouch BioTac传感器的仿真**，智能自主系统15：第15届国际会议IAS-15论文集，2019年。    \n鲁佩尔、菲利普，约内茨科、扬尼克，格尔纳、米夏埃尔，亨德里希、诺尔曼，张、建伟。   \n[[页面](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FYannick-Jonetzko\u002Fpublication\u002F330014756_Simulation_of_the_SynTouch_BioTac_Sensor_Proceedings_of_the_15th_International_Conference_IAS-15\u002Flinks\u002F5cc7ed694585156cd7bbc519\u002FSimulation-of-the-SynTouch-BioTac-Sensor-Proceedings-of-the-15th-International-Conference-IAS-15.pdf)]   \n\n* **通过机器人交互稳健地学习触觉力估计**，ICRA，2019年。   \n孙达拉林甘、巴拉库马尔，兰伯特、亚历山大·萨莎，汉达、安库尔，布茨、拜伦，赫尔曼斯、塔克，伯奇菲尔德、斯坦，拉特利夫、内森，福克斯、迪特。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.06187)]   \n\n* **从像素到感知：利用深度学习和仿生光学触觉传感器实现高度鲁棒的边缘感知与轮廓跟踪**，IEEE机器人与自动化快报，2019年。    \n莱波拉、内森F，丘奇、亚历克斯，德凯尔克霍夫、康拉德，哈德塞尔、赖娅，劳埃德、约翰。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1812.02941)]\n\n* **基于高分辨率触觉印迹的触觉映射与定位**，ICRA，2019年。   \n鲍萨、玛丽亚，卡纳尔、奥列古尔，罗德里格斯、阿尔贝托。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1904.10944)]   \n\n* **用于触觉传感特征提取的卷积自编码器**，IEEE机器人与自动化快报，2019年。   \n波利克、马尔塞拉，克拉亚西奇、伊沃娜，莱波拉、内森，奥尔萨格、马特科。    \n[[页面](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8758942\u002F)]     \n\n* **通过触摸识别物体实例：基于多模态匹配的触觉识别**，ICRA，2019年。    \n林、贾斯汀，卡兰德拉、罗伯托，莱文、谢尔盖。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1903.03591)]\n\n* **Tactip系列：具有3D打印仿生形态的软体光学触觉传感器**，软体机器人，2018年。   \n沃德-谢里耶、本杰明，佩斯泰尔、尼古拉斯，克拉姆霍恩、卢克，温斯顿、本杰明，吉安纳奇尼、玛丽亚·埃莱娜，罗西特、乔纳森，莱波拉、内森F。   \n[[页面](https:\u002F\u002Fwww.liebertpub.com\u002Fdoi\u002Fpdf\u002F10.1089\u002Fsoro.2017.0052)]     \n\n* **基于单目视觉、触觉及形状先验的3D形状感知**，IROS，2018年。   \n王、绍雄，吴、嘉俊，孙、兴元，袁、文珍，弗里曼、威廉T，特南鲍姆、乔舒亚B，阿德尔森、爱德华H。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.13916)]\n\n* **GelSight：用于估计几何形状和作用力的高分辨率机器人触觉传感器**，传感器，2017年。   \n袁、文珍，董、思远，阿德尔森、爱德华H。   \n[[页面](https:\u002F\u002Fwww.mdpi.com\u002F1424-8220\u002F17\u002F12\u002F2762\u002Fpdf)]   \n\n* **成功的触感：触觉传感能否帮助预测抓取结果？**，arXiv，2017年。      \n卡兰德拉、罗伯托，欧文斯、安德鲁，乌帕迪亚亚、马努，袁、文珍，林、贾斯汀，阿德尔森、爱德华H，莱文、谢尔盖。      \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.05512)]        \n\n* **用于测量几何形状和滑移的改进型GelSight触觉传感器**，IROS，2017年。    \n董、思远，袁、文珍，阿德尔森、爱德华H。   \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1708.00922)]   \n\n* **GelSight：用于估计几何形状和作用力的高分辨率机器人触觉传感器**，传感器，第17卷第12期，第2762页，2017年。   \n袁、文珍，董、思远，阿德尔森、爱德华H。   \n[[页面](https:\u002F\u002Fwww.mdpi.com\u002F1424-8220\u002F17\u002F12\u002F2762\u002Fpdf)]   \n\n* **连接视觉与触感：关联物理材料的视觉与触觉特性**，CVPR，2017年。    \n袁、文珍，王、绍雄，董、思远，阿德尔森、爱德华。    \n[[页面](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2017\u002Fpapers\u002FYuan_Connecting_Look_and_CVPR_2017_paper.pdf)]    \n\n* **结合自编码器的稳定强化学习应用于触觉和视觉数据**，IROS，2016年。    \n范霍夫、赫尔克，陈、努坦，卡尔、马克西米利安，范德斯马赫特、帕特里克，彼得斯、扬。    \n[[页面](https:\u002F\u002Fwww.academia.edu\u002Fdownload\u002F47652433\u002FHoof2016.pdf)]\n\n* **利用BioTac感知触觉微振动——与人类敏感度的比较**, BioRob, 2012.   \nFishel, Jeremy A 和 Loeb, Gerald E.   \n[[页面](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FGerald-Loeb\u002Fpublication\u002F256748883_Sensing_tactile_microvibrations_with_the_BioTac_Comparison_with_human_sensitivity\u002Flinks\u002F5dbcacae299bf1a47b0a3fa6\u002FSensing-tactile-microvibrations-with-the-BioTac-Comparison-with-human-sensitivity.pdf)]   \n\n\n\n## \u003Ca id=\"interaction\"> 身体化交互 \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa> \n\n* **DexGrasp Anything：迈向具备物理意识的通用机器人灵巧抓取**, arXiv, 2025     \n钟一鸣、蒋琪、于静怡、马悦欣。       \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.08257)\n\n* **超越目的地：一种面向探索感知的身体化问答新基准**, arXiv, 2025     \n姜凯旋、刘洋、陈伟星、罗景洲、陈子良、潘玲、李冠斌、林亮。        \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.11117)\n\n* **基于强化学习的跨身体灵巧抓取**, arXiv, 2024     \n袁浩奇、周博涵、傅宇辉、陆宗庆。       \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.02479)\n\n* **ManiGaussian：用于多任务机器人操作的动态高斯泼溅法**, arXiv, 2024     \n卢冠兴、张世义、王子威、刘昌柳、陆继文、唐言松。       \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.08321)\n\n* **MANUS：使用关节式3D高斯模型实现无标记抓取捕捉**, CVPR, 2024     \n波卡里亚·钱德拉迪普、沙阿·伊山·尼基尔、辛格·安吉拉、李泽坤、陈可凡、夏尔马·阿维纳什、斯里达尔·斯里纳特。       \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FPokhariya_MANUS_Markerless_Grasp_Capture_using_Articulated_3D_Gaussians_CVPR_2024_paper.pdf)\n\n* **语言驱动的抓取检测**, CVPR, 2024     \n武英定、武明日、黄宝儒、阮义、黎孝、武秋、阮英。       \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FVuong_Language-driven_Grasp_Detection_CVPR_2024_paper.pdf)\n\n* **通过领域先验知识泛化六自由度抓取检测**, CVPR, 2024     \n马浩翔、石莫迪、高博阳、黄迪。       \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FMa_Generalizing_6-DoF_Grasp_Detection_via_Domain_Prior_Knowledge_CVPR_2024_paper.pdf)\n\n* **灵巧抓取Transformer**, CVPR, 2024     \n徐国豪、魏艺琳、郑典、吴晓明、郑伟士。       \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FXu_Dexterous_Grasp_Transformer_CVPR_2024_paper.pdf)\n\n* **单视角场景点云的人类抓取生成**, CVPR, 2024     \n王彦康、邢成毅、魏艺琳、吴晓明、郑伟士。       \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FWang_Single-View_Scene_Point_Cloud_Human_Grasp_Generation_CVPR_2024_paper.pdf)\n\n* **G-HOP：用于交互重建和抓取合成的生成式手物先验**, CVPR, 2024     \n叶宇飞、阿比纳夫·古普塔、克里斯·基塔尼、舒巴姆·图尔西亚尼。       \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYe_G-HOP_Generative_Hand-Object_Prior_for_Interaction_Reconstruction_and_Grasp_Synthesis_CVPR_2024_paper.pdf)\n\n* **利用模拟人形机器人抓取多样化物体** ArXiv, 2024.           \n罗正毅、曹金坤、萨米·克里斯滕、亚历山大·温克勒、克里斯·基塔尼、许伟鹏      \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.11385)        \n\n* **基于可微分抓握力矩边界估计器的任务导向灵巧抓取合成**, IROS, 2024.           \n陈嘉怡、陈宇星、张佳梁、王赫          \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.13586)\n\n* **Open6DOR：开放指令下6自由度物体重排的基准测试及基于VLM的方法**, IROS, 2024.           \n丁宇飞、耿浩然、徐超逸、方晓梦、张家照、魏松林、戴启宇、张志正、王赫             \n[[页面]](https:\u002F\u002Fpku-epic.github.io\u002FOpen6DOR\u002F)\n\n* **ASGrasp：基于RGB-D主动立体相机的透明物体重建与6自由度抓取检测的通用方法**, ICRA, 2024.           \n史俊、永A、金义祥、李鼎哲、牛浩宇、金哲柱、王赫                 \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.05648)\n\n* **OpenEQA：基础模型时代的身体化问答**, CVPR, 2024    \n马朱姆达尔、阿琼、阿贾伊、张晓涵、普塔、耶纳曼德拉、塞拉姆、亨纳夫、西尔瓦尔、麦克维、马克西梅茨、阿尔瑙德等人    \n[[页面]](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FMajumdar_OpenEQA_Embodied_Question_Answering_in_the_Era_of_Foundation_Models_CVPR_2024_paper.pdf)    \n\n* **探索至确信：身体化问答中的高效探索**, ICRA Workshop VLMNM, 2024    \n任、艾伦Z、克拉克、贾登、迪克西特、阿努什丽、伊特基娜、玛莎、马朱姆达尔、阿尼鲁达、萨迪格、多尔萨    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.15941)    \n\n* **S-EQA：解决身体化问答中的情境查询**, arXiv, 2024    \n多尔巴拉、维什努·萨尚克、戈亚尔、皮拉穆图、约翰斯顿、马诺查、加纳丹    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.04732)\n\n* **基于地图的模块化零样本身体化问答方法**, arXiv, 2024    \n坂本、小谷、东、大智、宫西、太贵、栗田、修平、川边、元明    \n[[页面]](https:\u002F\u002Fui.adsabs.harvard.edu\u002Fabs\u002F2024arXiv240516559S\u002Fabstract)    \n\n* **基于多LLM系统的身体化问答**, arXiv, 2024    \n帕特尔、多尔巴拉、贝迪    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.10918)\n\n* **MultiGripperGrasp：从平行爪夹持器到灵巧手的机器人抓取数据集**, arXiv, 2024    \n穆里洛、路易斯·费利佩·卡萨斯、卡尔贡卡尔、普拉巴卡兰、杨翔    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.09841)\n\n* **基于多模态大型语言模型的推理抓取**, arXiv, 2024    \n金世宇、徐锦轩、雷雨田、张良俊   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.06798)\n\n* **SemGrasp：通过语言对齐离散化生成语义抓取**, CoRR, 2024    \n李凯林、王京博、杨立新、陆策武、戴博    \n[[页面]](https:\u002F\u002Fopenreview.net\u002Fforum?id=WUbr8NV1G6)\n\n* **GaussianGrasper：用于开放词汇机器人抓取的3D语言高斯泼溅法**, arXiv, 2024    \n郑宇航、陈翔宇、郑宇鹏、顾松恩、杨润益、靳步、李鹏飞、仲承亮、王增茂、刘丽娜等    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.09637)\n\n* **基于知识的身体化问答**, TPAMI, 2023    \n谭思南、葛梦梦、郭迪、刘华萍、孙富春      \n[[页面]](https:\u002F\u002Fpubmed.ncbi.nlm.nih.gov\u002F37195849\u002F)\n\n* **抓取合成的深度学习方法：综述**，IEEE机器人学汇刊，2023年    \n纽伯里，瑞斯；顾，莫里斯；钱布尔利，拉克兰；穆萨维安，阿尔萨兰；埃普纳，克莱门斯；莱特纳，尤尔根；博格，珍妮特；莫拉莱斯，安东尼奥；阿斯福尔，塔米姆；克拉吉奇，达尼察等    \n[[页面]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1109\u002FTRO.2023.3280597)\n\n* **语言引导的机器人抓取：基于CLIP的杂乱场景下指代性抓取合成**，CoRL，2023年    \n齐亚法斯，乔治奥斯；许，宇成；戈埃尔，阿鲁希；卡塞伊，穆罕默德雷扎；李，志斌；卡塞伊，哈米德雷扎    \n[[页面]](https:\u002F\u002Fwww.research.ed.ac.uk\u002Fen\u002Fpublications\u002Flanguage-guided-robot-grasping-clip-based-referring-grasp-synthes)\n\n* **推理调优抓取：将多模态大型语言模型适配于机器人抓取任务**，CoRL，2023年    \n徐，金轩；金，世宇；雷，宇田；张，玉倩；张，梁俊   \n[[页面]](https:\u002F\u002Fopenreview.net\u002Fforum?id=3mKb5iyZ2V)\n\n* **蒸馏特征场实现少样本语言引导操作**，CoRL，2023年    \n沈，威廉；杨，戈；余，艾伦；王，詹森；凯尔布林，莱斯利·帕克；伊索拉，菲利普    \n[[页面]](https:\u002F\u002Fproceedings.mlr.press\u002Fv229\u002Fshen23a.html)\n\n* **AnyGrasp：空间与时间域中的鲁棒高效抓取感知**，IEEE机器人学汇刊，2023年    \n方，浩书；王，晨曦；方，洪杰；勾，明浩；刘，继荣；颜，恒旭；刘，文海；谢，义臣；陆，策武    \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10167687)\n\n* **DexGraspNet：基于仿真构建的大规模通用物体灵巧抓取数据集**，ICRA，2023年。           \n王瑞诚、张嘉亮、陈佳怡、许银振、李普浩、刘腾宇、王鹤                          \n[[页面]](https:\u002F\u002Fpku-epic.github.io\u002FDexGraspNet\u002F)\n\n* **UniDexGrasp：通过学习多样化的提案生成与目标条件策略实现通用机器人灵巧抓取**，CVPR，2023年。           \n许银振、万伟康、张嘉亮、刘浩然、单子康、沈浩、王瑞诚、耿浩然、翁一嘉、陈佳怡、刘腾宇、李毅、王鹤                      \n[[页面]](https:\u002F\u002Fpku-epic.github.io\u002FUniDexGrasp\u002F)\n\n* **UniDexGrasp++：通过几何感知课程和迭代式通才—专才学习改进灵巧抓取策略学习**，ICCV，2023年。           \n万伟康、耿浩然、刘芸、单子康、杨耀东、李毅、王鹤                    \n[[页面]](https:\u002F\u002Fpku-epic.github.io\u002FUniDexGrasp++\u002F)\n\n* **CLIPort：用于机器人操作的“什么”与“哪里”路径**，CoRL，2022年    \n施里达尔，莫希特；马努埃利，卢卡斯；福克斯，迪特    \n[[页面]](https:\u002F\u002Fproceedings.mlr.press\u002Fv164\u002Fshridhar22a.html)\n\n* **ACRONYM：基于仿真的大规模抓取数据集**，ICRA，2021年     \n埃普纳，克莱门斯；穆萨维安，阿尔萨兰；福克斯，迪特    \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F9560844)\n\n* **Habitat-Matterport 3D数据集（HM3D）：1000个用于具身AI的大规模3D环境**，NeurIPS，2021年    \n拉马克里希南，桑托什·K；戈卡斯兰，亚伦；维贾曼斯，埃里克；马克西梅茨，奥列克桑德尔；克雷格，亚历克斯；特纳，约翰；昂德桑德，埃里克；加卢巴，沃伊切赫；韦斯特伯里，安德鲁；昌，安吉尔·X等    \n[[页面]](https:\u002F\u002Fdatasets-benchmarks-proceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2021\u002Ffile\u002F34173cb38f07f89ddbebc2ac9128303f-Paper-round2.pdf)\n\n* **端到端可训练的深度神经网络：用于从RGB图像中进行机器人抓取检测与语义分割**，ICRA，2021年    \n艾内特，斯特凡；弗劳恩多费尔，弗里德里希   \n[[页面]](https:\u002F\u002Felib.dlr.de\u002F146134\u002F)\n\n* **重访具身QA：一个简单基线及更进一步**，IEEE图像处理汇刊，2020年    \n吴，宇；蒋，璐；杨，毅    \n[[页面]](https:\u002F\u002Fopus.lib.uts.edu.au\u002Frest\u002Fbitstreams\u002Fee2d1faf-ce3b-4f63-a133-4217d19e9db1\u002Fretrieve)\n\n* **交互环境中多智能体具身问答**，ECCV，2020年    \n谭，思南；向，伟来；刘，华平；郭，迪；孙，富春    \n[[页面]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1007\u002F978-3-030-58601-0_39)\n\n* **语言模型是少样本学习者**，NIPS，2020年    \n布朗，汤姆；曼，本杰明；赖德，尼克；苏比亚，梅拉妮；卡普兰，贾里德·D；达里瓦尔，普拉富拉；尼拉坎坦，阿温德；夏亚姆，普拉纳夫；萨斯特里，吉里什；阿斯克尔，阿曼达等    \n[[页面]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.5555\u002F3495724.3495883)\n\n* **GraspNet-1Billion：通用物体抓取的大规模基准测试**，CVPR，2020年    \n方，浩书；王，晨曦；勾，明浩；陆，策武    \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2020\u002Fhtml\u002FFang_GraspNet-1Billion_A_Large-Scale_Benchmark_for_General_Object_Grasping_CVPR_2020_paper.html)\n\n* **多目标具身问答**，CVPR，2019年    \n俞，立成；陈，欣蕾；吉奥克萨里，乔治娅；班萨尔，莫希特；伯格，塔玛拉·L；巴特拉，德鲁夫    \n[[页面]](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2019\u002Fpapers\u002FYu_Multi-Target_Embodied_Question_Answering_CVPR_2019_paper.pdf)\n\n* **具有点云感知的写实环境中具身问答**，CVPR，2019年    \n维贾曼斯，埃里克；达塔，萨米亚克；马克西梅茨，奥列克桑德尔；达斯，阿比舍克；吉奥克萨里，乔治娅；李，斯蒂芬；埃萨，伊尔凡；帕里克，黛薇；巴特拉，德鲁夫   \n[[页面]](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_CVPR_2019\u002Fpapers\u002FWijmans_Embodied_Question_Answering_in_Photorealistic_Environments_With_Point_Cloud_Perception_CVPR_2019_paper.pdf)\n\n* **VideoNavQA：弥合视觉问答与具身问答之间的鸿沟**，BMVC，2019年    \n坎吉亚，卡塔利娜；贝利洛夫斯基，尤金；利奥，皮耶特罗；库维尔，阿隆    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1908.04950)\n\n* **6-DOF GraspNet：面向物体操作的变分抓取生成**，ICCV，2019年    \n穆萨维安，阿尔萨兰；埃普纳，克莱门斯；福克斯，迪特    \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_ICCV_2019\u002Fhtml\u002FMousavian_6-DOF_GraspNet_Variational_Grasp_Generation_for_Object_Manipulation_ICCV_2019_paper.html)\n\n* **具身问答**，CVPR，2018年    \n达斯，阿比舍克；达塔，萨米亚克；吉奥克萨里，乔治娅；李，斯蒂芬；帕里克，黛薇；巴特拉，德鲁夫       \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2018\u002Fpapers\u002FDas_Embodied_Question_Answering_CVPR_2018_paper.pdf)\n\n* **IQA：交互环境中的视觉问答**，CVPR，2018年     \n戈登，丹尼尔；肯布哈维，阿尼鲁达；拉斯泰加里，穆罕默德；雷德蒙，约瑟夫；福克斯，迪特；法尔哈迪，阿里 \n[[页面]](http:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent_cvpr_2018\u002Fpapers\u002FGordon_IQA_Visual_Question_CVPR_2018_paper.pdf)      \n\n* **利用逼真且丰富的3D环境构建可泛化智能体**，ECCV，2018年     \n吴，毅；吴，宇鑫；吉奥克萨里，乔治娅；田，元东    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1801.02209)\n\n* **MINOS：复杂环境中导航的多模态室内模拟器**，ECCV，2018年    \nSavva, Manolis 和 Chang, Angel X 和 Dosovitskiy, Alexey 和 Funkhouser, Thomas 和 Koltun, Vladlen    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.03931)    \n\n* **用于具身问答的神经模块化控制**，ECCV，2018年    \nDas, Abhishek 和 Gkioxari, Georgia 和 Lee, Stefan 和 Parikh, Devi 和 Batra, Dhruv    \n[[页面]](https:\u002F\u002Fauthors.library.caltech.edu\u002Frecords\u002Fykvm4-2ed40\u002Ffiles\u002F1810.11181.pdf)\n\n* **Jacquard：用于机器人抓取检测的大规模数据集**，IROS，2018年    \nDepierre, Amaury 和 Dellandr{\\'e}a, Emmanuel 和 Chen, Liming    \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8593950)    \n\n* **Matterport3D：从室内场景的RGB-D数据中学习**，IEEE国际3D视觉会议，2017年    \nChang, Angel 和 Dai, Angela 和 Funkhouser, Thomas 和 Halber, Maciej 和 Niessner, Matthias 和 Savva, Manolis 和 Song, Shuran 和 Zeng, Andy 和 Zhang, Yinda    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1709.06158)    \n\n* **ScanNet：室内场景的丰富标注3D重建**，CVPR，2017年    \nDai, Angela 和 Chang, Angel X 和 Savva, Manolis 和 Halber, Maciej 和 Funkhouser, Thomas 和 Nie{\\ss}ner, Matthias \n[[页面]](https:\u002F\u002Fwww.computer.org\u002Fcsdl\u002Fproceedings-article\u002Fcvpr\u002F2017\u002F0457c432\u002F12OmNyRg4C5)\n\n* **基于形状补全的机器人抓取**，IROS，2017年    \nVarley, Jacob 和 DeChant, Chad 和 Richardson, Adam 和 Ruales, Joaqu{\\'\\i}n 和 Allen, Peter    \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8206060)    \n\n* **基于RGB-D图像的有效抓取：使用新的矩形表示进行学习**，IEEE国际机器人与自动化会议，2011年    \nJiang, Yun 和 Moseson, Stephen 和 Saxena, Ashutosh    \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F5980145)    \n\n* **一种基于前沿的自主探索方法**，CIRA，1997年     \nYamauchi, Brian    \n[[页面]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.5555\u002F523996.793157)    \n\n\n\n## \u003Ca id=\"agent\"> 具身智能体 \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa> \n\n### 具身多模态基础模型与VLA方法\n* **π₀：用于通用机器人控制的视觉-语言-动作流模型**，arXiv，2024年。     \nKevin Black、Noah Brown、Danny Driess、Adnan Esmail、Michael Equi、Chelsea Finn、Niccolo Fusai、Lachy Groom、Karol Hausman、Brian Ichter、Szymon Jakubczak、Tim Jones、Liyiming Ke、Sergey Levine、Adrian Li-Bell、Mohith Mothukuri、Suraj Nair、Karl Pertsch、Lucy Xiaoyang Shi、James Tanner、Quan Vuong、Anna Walling、Haohuan Wang、Ury Zhilinsky。     \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.24164) [[项目]](https:\u002F\u002Fphysicalintelligence.company\u002Fblog\u002Fpi0)\n\n* **π₀.₅：具有开放世界泛化能力的视觉-语言-动作模型**，arXiv，2025年。     \nPhysical Intelligence、Kevin Black、Noah Brown、James Darpinian、Karan Dhabalia、Danny Driess、Adnan Esmail、Michael Equi、Chelsea Finn、Niccolo Fusai、Manuel Y. Galliker、Dibya Ghosh、Lachy Groom、Karol Hausman、Brian Ichter、Szymon Jakubczak、Tim Jones、Liyiming Ke、Devin LeBlanc、Sergey Levine、Adrian Li-Bell、Mohith Mothukuri、Suraj Nair、Karl Pertsch、Allen Z. Ren、Laura Smith、Jost Tobias Springenberg、Kyle Stachowicz、James Tanner、Quan Vuong、Homer Walke、Anna Walling、Haohuan Wang、Lili Yu、Ury Zhilinsky。     \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.16054) [[项目]](https:\u002F\u002Fwww.physicalintelligence.company\u002Fblog\u002Fpi05)\n\n* **GR00T N1：面向通用人形机器人的开源基础模型**，arXiv，2025年。     \nNVIDIA：Johan Bjorck、Fernando Castañeda、Nikita Cherniadev、Xingye Da、Runyu Ding、Linxi \"Jim\" Fan、Yu Fang、Dieter Fox、Fengyuan Hu、Spencer Huang、Joel Jang、Zhenyu Jiang、Jan Kautz、Yuke Zhu。     \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.14734) [[项目]](https:\u002F\u002Fdeveloper.nvidia.com\u002Fproject-groot)\n\n* **Gemini Robotics：将AI带入物理世界**，arXiv，2025年。     \nGemini Robotics团队、Google DeepMind。     \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.20020) [[项目]](https:\u002F\u002Fdeepmind.google\u002Fdiscover\u002Fblog\u002Fgemini-robotics\u002F)\n\n* **OpenVLA：开源视觉-语言-动作模型**，CoRL，2024年。     \nMoo Jin Kim、Karl Pertsch、Siddharth Karamcheti、Ted Xiao、Ashwin Balakrishna、Suraj Nair、Rafael Rafailov、Ethan Foster、Grace Lam、Pannag R. Sanketi、Quan Vuong、Thomas Kollar、Benjamin Burchfiel、Russ Tedrake、Dorsa Sadigh、Sergey Levine、Percy Liang、Chelsea Finn。     \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.09246) [[项目]](https:\u002F\u002Fopenvla.github.io\u002F)\n\n* **Octo：开源通用机器人策略**，RSS，2024年。     \nOcto模型团队、Dibya Ghosh、Homer Walke、Karl Pertsch、Kevin Black、Oier Mees、Sudeep Dasari、Joey Hejna、Tobias Kreiman、Charles Xu、Jianlan Luo、You Liang Tan、Lawrence Yunliang Chen、Lerrel Pinto、Chelsea Finn、Sergey Levine。     \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.12213) [[项目]](https:\u002F\u002Focto-models.github.io\u002F)\n\n* **Magma：多模态AI智能体的基础模型**，CVPR，2025年。     \nJianwei Yang、Reuben Tan、Qianhui Wu、Ruijie Zheng、Baolin Peng、Yongyuan Liang、Yu Gu、Mu Cai、Seonghyeon Ye、Jongmin Jang、Yuquan Deng、Lars Lidén、Jianfeng Gao。     \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.13130)\n\n* **UniVLA：统一的视觉-语言-动作模型**，RSS，2025年。     \nYuqi Wang、Xinghang Li、Wenxuan Wang、Junbo Zhang、Yingyan Li、Yuntao Chen、Xinlong Wang、Zhaoxiang Zhang。     \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.19850) [[项目]](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002FUniVLA)\n\n* **FAST：视觉-语言-动作模型的高效动作标记化**，arXiv，2025年。     \nKarl Pertsch、Kyle Stachowicz、Brian Ichter、Danny Driess、Suraj Nair、Quan Vuong、Sergey Levine、Chelsea Finn。     \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.09747) [[项目]](https:\u002F\u002Fphysicalintelligence.company\u002Fresearch\u002Ffast)\n\n* **HumanPlus：来自人类的人形机器人影子跟随与模仿**，CoRL，2024年。     \nZipeng Fu、Qingqing Zhao、Qi Wu、Gordon Wetzstein、Chelsea Finn。     \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.10454) [[项目]](https:\u002F\u002Fhumanoid-ai.github.io\u002F)\n\n* **ASAP：对齐仿真与真实物理环境以学习敏捷的人形全身技能**，arXiv，2025年。     \nTairan He、Jiawei Gao、Wenli Xiao、Yuanhang Zhang、Zi Wang、Jiashun Wang、Zhengyi Luo、Guanqi He、Nikhil Sobanbab、Chaoyi Pan、Zeji Yi、Guannan Qu、Kris Kitani、Jessica Hodgins、Linxi \"Jim\" Fan、Yuke Zhu、Changliu Liu、Guanya Shi。     \n[[论文]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.01143)\n\n* **Embodied-Reasoner：为具身交互任务协同视觉搜索、推理与行动**，arXiv，2025年。     \nWenqi Zhang、Mengna Wang、Gangao Liu、Xu Huixin、Yiwei Jiang、Yongliang Shen、Guiyang Hou、Zhe Zheng、Hang Zhang、Xin Li、Weiming Lu、Peng Li、Yueting Zhuang     \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.21696)\n\n* **RoboMatrix：面向开放世界中可扩展机器人任务规划与执行的技能中心分层框架**, arXiv, 2024.     \n毛伟欣、钟伟恒、蒋洲、方东、张仲悦、兰子涵、贾凡、王天材、范浩强、吉江修。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.00171)]\n\n* **用于端到端机器人学习的空间视觉感知**, arXiv, 2024.     \n特拉维斯·戴维斯、严嘉欢、陈翔、田宇、庄雨婷、黄一奇、胡璐辉。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.17458)]\n\n* **GR-2：具有网络规模知识的生成式视频-语言-动作模型，用于机器人操作**, arXiv, 2024.     \n张志廉、陈广增、景雅、孔涛、李航、李一峰、刘宇晓、吴洪涛、徐家锋、杨一初、张汉博、朱敏钊。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.06158)]\n\n* **利用异构预训练Transformer扩展本体感觉-视觉学习**, arXiv, 2024.     \n王立睿、陈新磊、赵佳亮、何凯明。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.20537)]\n\n* **深度具身智能体的空间推理与规划**, arXiv, 2024.     \n石田修。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2409.19479)]\n\n* **在不完善的世界模型下将大型语言模型具身化于环境中**, arXiv, 2024.     \n刘浩然、赵继申。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.02742)]\n\n* **SELU：未知环境中的自学习具身多模态大语言模型**, arXiv, 2024.     \n李博宇、姜浩斌、丁子洛、徐新润、李浩然、赵东彬、陆宗庆。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.03303)]\n\n* **Autort：用于大规模机器人智能体编排的具身基础模型**, arXiv, 2024.     \n安恩、迈克尔、德比达塔、德维贝迪、切尔西、芬恩、蒙塞·冈萨雷斯、阿雷纳斯、基尔塔娜、戈帕拉克里希南、卡罗尔、豪斯曼、布赖恩、伊赫特、亚历克斯、伊尔潘、尼希尔、乔希、瑞安、朱利安等。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.12963)]     \n\n* **扩散增强型智能体：高效探索与迁移学习框架**, arXiv, 2024.      \n诺曼·迪·帕洛、莱昂纳德·哈森克莱弗、扬·洪普利克、阿伦库马尔·比亚万。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.20798)]\n\n* **Rt-h：基于语言的动作层次结构**, ArXiv, 2024.    \n贝尔哈利、苏尼尔、丁天力、泰德、肖、皮埃尔、塞尔梅内、权、武英、乔纳森、汤普森、叶夫根、切博塔尔、德比达塔、德维贝迪、多尔萨、萨迪格。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.01823)]\n\n* **照我所能做，而非照我说的做：将语言具身化于机器人操作能力之中**, 机器人学习会议，2023年。    \n布罗汉、安东尼、叶夫根·切博塔尔、切尔西·芬恩、卡罗尔·豪斯曼、亚历山大·赫尔佐格、丹尼尔·霍、朱利安·伊巴尔斯、艾瑞克·伊尔潘、杨瑞安、朱利安等。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.01691)]    \n\n* **Embodiedgpt：通过具身思维链进行视觉-语言预训练**, NeurIPS, 2024.     \n穆、姚、张青龙、胡孟康、王文海、丁俊、金斌、王继峰、戴宇、乔平、罗。     \n[[页面](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2023\u002Ffile\u002F4ec43957eda1126ad4887995d05fae3b-Paper-Conference.pdf)]\n\n* **Q-transformer：通过自回归Q函数实现可扩展的离线强化学习**, 机器人学习会议，2023年。    \n切博塔尔、叶夫根、权、武英、卡罗尔·豪斯曼、费伊、夏、姚、卢、亚历克斯·伊尔潘、阿维拉尔·库马尔、田和、俞、亚历山大·赫尔佐格、卡尔·佩尔茨等人。     \n[[页面](https:\u002F\u002Fproceedings.mlr.press\u002Fv229\u002Fchebotar23a\u002Fchebotar23a.pdf)]    \n\n* **Sara-rt：利用自适应鲁棒注意力扩展机器人Transformer**, arXiv, 2023.    \n莱阿尔、伊莎贝尔、克日什托夫·霍罗马斯基、迪帕莉·贾因、阿维纳瓦·杜贝、杰克·瓦利、迈克尔·里奥、姚、卢、弗雷德里克·刘、维卡斯·辛德瓦尼、权、武英等。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2312.01990)]\n\n* **Palm-e：一种具身多模态语言模型**, ArXiv, 2023.    \n德里斯、丹尼、费伊、夏、梅迪 SM、萨贾迪、科里、林奇、阿坎克沙、乔德里、布莱恩·伊赫特、艾扎安、瓦希德、乔纳森·汤普森、权、武英、田和、俞等。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.03378)]    \n\n* **Rt-2：视觉-语言-动作模型将网络知识迁移到机器人控制中**, 机器人学习会议，2023年。    \n齐特科维奇、布里安娜、田和、俞、西春、徐、彭、徐、泰德、肖、费伊、夏、贾琳、吴、保罗、沃尔哈特、斯特凡、韦尔克、艾扎安、瓦希德等。    \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.15818)]\n\n* **Open x-embodiment：机器人学习数据集及rt-x模型**, arXiv, 2023.        \n帕达尔卡尔及其他贡献者。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.08864)]\n\n* **视觉-语言基础模型作为高效的机器人模仿者**, arXiv, 2023.    \n李兴航、刘明焕、张汉博、于存军、于洁、徐洪涛、吴赤蓝、张雅、景、魏楠、张华平等。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.01378)]    \n\n* **Rt-1：用于大规模真实世界控制的机器人Transformer**, ArXiv, 2022.    \n布罗汉、安东尼、诺亚·布朗、贾斯蒂斯、卡巴哈尔、叶夫根·切博塔尔、约瑟夫·达比斯、切尔西·芬恩、基尔塔娜·戈帕拉克里希南、卡罗尔·豪斯曼、亚历克斯·赫尔佐格、茉莉·许等。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2212.06817)]    \n\n\n\n### 具身操控与控制\n\n* **扩散策略：基于动作扩散的视觉运动策略学习**, RSS, 2023.    \n程驰、许振佳、冯思远、埃里克·库赞诺、杜一伦、本杰明·伯奇菲尔、拉斯·特德拉克、宋舒然。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.04137)] [[项目](https:\u002F\u002Fdiffusion-policy.cs.columbia.edu\u002F)]\n\n* **ManipTrans：通过残差学习实现高效灵巧双臂操作迁移**, CVPR, 2025.    \n李凯琳、李普浩、刘腾宇、李宇阳、黄思远。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.21860)]\n\n* **KStar Diffuser：具有运动学建模的时空图扩散策略，用于双臂机器人操作**, CVPR, 2025.    \n吕琪、李浩、邓翔、邵锐、李银川、郝建业、高隆祥、王宇迈克尔、聂立强。       \n[[页面](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F11093774\u002F)]\n\n* **AgiBot World Colosseo：用于规模化与智能化具身系统的大型操控平台**, IROS, 2025.    \nAgiBot-World-Contributors、毕青文、蔡继松、陈丽、崔秀琪、丁燕、冯思远、高深源、何新东、胡轩、黄旭、姜书、姜宇欣、李宏洋、李嘉露、刘启明、刘毅、路宇翔、罗建兰、罗平、穆耀、牛月寒、潘一轩、庞江淼、乔宇等。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.06669)] [[项目](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002FAgiBot-World)]\n\n* **仿真与现实协同训练：基于视觉的机器人操作简易方案**, arXiv, 2025.    \n阿比拉姆·马杜库里、蒋振宇、陈永良劳伦斯、索鲁什·纳西里亚尼、谢宇琪、于芳、黄文琦、王祖、许振佳、切尔尼亚杰夫·尼基塔、里德·斯科特、肯·戈德堡、曼德尔卡尔·阿贾伊、樊林溪、朱玉珂等。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.24361)]\n\n* **PEAC：面向跨具身强化学习的无监督预训练**，NeurIPS，2024年。    \n应承阳、郝中凯、周欣宁、徐学舟、苏航、张星星、朱俊。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.14073)]\n\n* **用于具身学习实时决策的傅里叶控制器网络**，ICML，2024年。    \n谭恒凯、刘松明、马凯、应承阳、张星星、苏航、朱俊。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.19885)]\n\n* **RDT-1B：用于双手操作的扩散基础模型**，ArXiv，2024年。    \n刘松明、吴凌轩、李邦国、谭恒凯、陈华宇、王正毅、许科、苏航、朱俊。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.07864)]\n\n* **ManiBox：通过可扩展的仿真数据生成提升空间抓取泛化能力**，ArXiv，2024年。    \n谭恒凯、徐学舟、应承阳、毛新怡、刘松明、张星星、苏航、朱俊。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.01850)]\n\n* **RoboGSim：Real2Sim2Real 机器人高斯泼溅模拟器**，ArXiv，2024年。    \n李新海、李嘉林、张子恒、张睿、贾凡、王天财、范浩强、曾国坤、王瑞平。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2411.11839)]\n\n* **SPIRE：协同规划、模仿与强化学习在长 horizon 操作中的应用**，ArXiv，2024年。    \n周子涵、阿尼梅什·加格、迪特·福克斯、凯兰·加勒特、阿杰·曼德尔卡。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.18065)]\n\n* **扩散 Transformer 策略**，ArXiv，2024年。    \n侯志、张天一、熊宇文、蒲恒军、赵承阳、佟荣磊、乔宇、戴继峰、陈云涛。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.15959)]\n\n* **Dexcap：用于灵巧操作的可扩展且便携的动作捕捉数据采集系统**，ArXiv，2024年。    \n王晨、史浩辰、王伟卓、张若涵、李飞飞、C·卡伦·刘。       \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.07788)]\n\n* **Lota-bench：面向具身智能体的语言导向任务规划基准测试**，ArXiv，2024年。    \n崔在宇、尹英佑、翁孝彬、金民洙、张。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.08178)]\n\n* **苏格拉底规划器：基于探究的零样本具身指令遵循规划**，Arxiv，2024年。    \n申秀妍、全秀珍、金正贤、姜基千、张炳泽。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2404.15190)]    \n\n* **大型语言模型作为大规模任务规划中的常识知识**，NeurIPS，2024年。    \n赵子睿、李维孙、大卫·许。     \n[[页面](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2023\u002Ffile\u002F65a39213d7d0e1eb5d192aa77e77eeb7-Paper-Conference.pdf)]\n\n* **利用预训练大型语言模型在 PDDL 域中的通用规划**，AAAI，2024年。    \n西尔弗、汤姆、索哈姆、丹、卡维塔、斯里尼瓦斯、约书亚·B、特南鲍姆、莱斯利·帕克、凯尔布林、迈克尔、卡茨。     \n[[页面](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fdownload\u002F30006\u002F31766)]    \n\n* **迈向具身多智能体协作的高效 LLM 对齐**，arXiv，2024年。    \n张洋、杨世鑫、陈佳、白飞、吴秀、李雪龙、李振、王。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.14314)]\n\n* **未知环境下的具身指令遵循**，arXiv，2024年。    \n吴振宇、王子威、徐秀伟、陆继文、颜海斌。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.11818)]\n\n* **长 horizon 机器人任务理解的骨干模型**，arxiv，2024年。       \n陈晓帅、陈伟、李东明、葛玉坤、尼古拉斯·罗哈斯和彼得·科尔穆舍夫。           \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.01334)\n\n* **RoboMamba：用于高效机器人推理与操作的多模态状态空间模型**，arXiv，2024年。    \n刘家铭、孟真、刘振宇、王莉莉、李凯臣、周鹏举、安森桥、杨仁锐、张燕东、郭尚航、张。     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2406.04339)]\n\n* **按分而行：阶段引导的动态多感官融合在机器人操作中的应用**，arxiv，2024年。          \n冯若萱、胡迪1、马文珂、李雪龙。              \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.01366)\n\n* **自我中心视觉语言规划**，arxiv，2024年。          \n方志睿、杨明、曾伟帅、李博宇、岳俊鹏、丁子洛、李秀、陆宗庆。              \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.05802)\n\n* **Polaris：通过 Syn2Real 视觉对齐和大型语言模型实现开放式交互式机器人操作**，IROS，2024年。          \n王天宇、林海涛、于俊秋、傅延伟。              \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.07975)\n\n* **LLM-SAP：基于大型语言模型的情境感知规划**，ICME 2024 MML4SG 工作坊。           \n王立敏、钟汉阳。          \n[[页面]](https:\u002F\u002Fgithub.com\u002FHanyangZhong\u002FSituational_Planning_datasets)    \n\n* **FMB：面向可泛化机器人学习的功能性操作基准测试**，ArXiv，2024年。       \n罗建兰、徐查尔斯、刘芳晨、谭利亚姆、林子鹏、吴杰弗里、皮特·阿贝尔和谢尔盖·列文。          \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2401.08553)    \n\n* **ManipVQA：将机器人操作可能性与物理 grounded 信息注入多模态大型语言模型**，IROS，2024年。     \n黄思远、伊万·波诺马连科、蒋正凯、李小琪、胡晓彬、高鹏、李洪生和董浩。     \n[[页面]](https:\u002F\u002Fgithub.com\u002FSiyuanHuang95\u002FManipVQA)\n\n* **A3VLM：可行动的关节感知视觉语言模型**，ArXiv，2024年。       \n黄思远、常浩楠、刘宇涵、朱依梦、董浩、高鹏、阿卜杜斯拉姆·布拉里亚斯和李洪生。       \n[[页面]](https:\u002F\u002Fgithub.com\u002Fchanghaonan\u002FA3VLM)\n\n* **由 LLM 从 Parallel TextWorld 训练的具身多模态智能体**，CVPR，2024年。       \n杨义君、周天一、李侃雪、陶大鹏、李路松、沈丽、何晓东、江静、施雨辉。       \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FYang_Embodied_Multi-Modal_Agent_trained_by_an_LLM_from_a_Parallel_CVPR_2024_paper.pdf)\n\n* **检索增强型具身智能体**，CVPR，2024年。       \n朱一辰、欧志才、牟晓峰、唐健。       \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FZhu_Retrieval-Augmented_Embodied_Agents_CVPR_2024_paper.pdf)\n\n* **基于运动感知的鲁棒通信网络的多智能体协作感知**，CVPR，2024年。       \n洪世鑫、刘宇、李志、李绍辉、何友。       \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FHong_Multi-agent_Collaborative_Perception_via_Motion-aware_Robust_Communication_Network_CVPR_2024_paper.pdf)\n\n* **LLM-规划器：基于大型语言模型的具身智能体少样本 grounded 规划**，ICCV，2023年。         \n宋灿熙、吴嘉满、克莱·华盛顿、布莱恩·M·萨德勒、魏伦·曹、苏宇。      \n[[页面](LLM-规划器：基于大型语言模型的具身智能体少样本 grounded 规划)]\n\n* **具有记忆增强型大语言模型的开放式指令式具身智能体** EMNLP, 2023.    \nSarch, Gabriel, Yue, Wu, Michael J., Tarr, Katerina, Fragkiadaki.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.15127)]    \n\n* **Voyager：基于大语言模型的开放式具身智能体**, TMLR, 2023.    \nWang, Guanzhi, Yuqi, Xie, Yunfan, Jiang, Ajay, Mandlekar, Chaowei, Xiao, Yuke, Zhu, Linxi, Fan, Anima, Anandkumar.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.16291)]    \n\n* **ReAct：在语言模型中协同推理与行动**, ICLR, 2023.    \nYao, Shunyu, Jeffrey, Zhao, Dian, Yu, Nan, Du, Izhak, Shafran, Karthik, Narasimhan, Yuan, Cao.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.03629)]    \n\n* **ProgPrompt：利用大语言模型生成情境化的机器人任务规划**, ICRA, 2023.    \nSingh, Ishika, Valts, Blukis, Arsalan, Mousavian, Ankit, Goyal, Danfei, Xu, Jonathan, Tremblay, Dieter, Fox, Jesse, Thomason, Animesh, Garg.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.11302)]    \n\n* **ChatGPT用于机器人技术：设计原则与模型能力**, IEEE Access 12. (2023): 55682-55696.    \nSai Vemprala, Rogerio Bonatti, Arthur Fender C. Bucker, Ashish Kapoor.     \n[[页面](https:\u002F\u002Fieeexplore.ieee.org\u002Fstamp\u002Fstamp.jsp?arnumber=10500490)]    \n\n* **代码即策略：用于具身控制的语言模型程序**, ICRA, 2023.        \nJacky Liang, , Wenlong Huang, F. Xia, Peng Xu, Karol Hausman, Brian Ichter, Peter R. Florence, Andy Zeng.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.07753)]\n\n* **用语言模型进行推理就是使用世界模型进行规划**, Arxiv, 2023.    \nHao, Shibo, Yi, Gu, Haodi, Ma, Joshua Jiahua, Hong, Zhen, Wang, Daisy Zhe, Wang, Zhiting, Hu.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.14992)]    \n\n* **LGMCTS：面向可执行语义对象重排的语言引导蒙特卡洛树搜索**, arXiv, 2023.    \nHaonan Chang, Kai Gao, Kowndinya Boyalakuntla, Alex Lee, Baichuan Huang, Harish Udhaya Kumar, Jinjin Yu, Abdeslam Boularias.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.15821)]    \n\n* **利用大语言模型将自然语言翻译为规划目标**, arXiv, 2023.    \nXie, Yaqi, Chen, Yu, Tongyao, Zhu, Jinbin, Bai, Ze, Gong, Harold, Soh.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.05128)]    \n\n* **LLM+P：赋予大语言模型最优的规划能力**, arXiv, 2023.    \nLiu, Bo, Yuqian, Jiang, Xiaohan, Zhang, Qiang, Liu, Shiqi, Zhang, Joydeep, Biswas, Peter, Stone.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.11477)]    \n\n* **使用LLM进行动态规划**, arXiv, 2023.    \nDagan, Gautier, Frank, Keller, Alex, Lascarides.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.06391)]    \n\n* **利用大语言模型进行具身任务规划**, arXiv, 2023.    \nWu, Zhenyu, Ziwei, Wang, Xiuwei, Xu, Jiwen, Lu, Haibin, Yan.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.01848)]    \n\n* **SayPlan：利用3D场景图使大语言模型具身化，实现可扩展的任务规划**, 机器人学习会议。2023年.    \nKrishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian D. Reid, Niko Sunderhauf.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2307.06135)]    \n\n* **ConceptGraphs：用于感知和规划的开放词汇3D场景图**, ArXiv, 2023.    \nQiao Gu, Ali Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Ramalingam Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.16650)]    \n\n* **RoboGPT：一种能够为日常指令任务做出具身长期决策的智能体**, arXiv, 2023.    \nYaran Chen, Wenbo Cui, Yuanwen Chen, Mining Tan, Xinyao Zhang, Dong Zhao, He Wang.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.15649)]    \n\n* **与环境对话：利用大语言模型进行交互式多模态感知**, IROS, 2023.    \nZhao, Xufeng, Mengdi, Li, Cornelius, Weber, Muhammad Burhan, Hafez, Stefan, Wermter.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.08268)]    \n\n* **视频语言规划**, arxiv, 2023.    \nDu, Yilun, Mengjiao, Yang, Pete, Florence, Fei, Xia, Ayzaan, Wahid, Brian, Ichter, Pierre, Sermanet, Tianhe, Yu, Pieter, Abbeel, Joshua B., Tenenbaum, Leslie, Kaelbling, Andy, Zeng, Jonathan, Tompson.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10625)]    \n\n* **代码即策略：用于具身控制的语言模型程序**, ICRA, 2023,    \nJacky Liang, Wenlong Huang, F. Xia, Peng Xu, Karol Hausman, Brian Ichter, Peter R. Florence, Andy Zeng.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.07753)]    \n\n* **Reflexion：一个具有动态记忆和自我反思能力的自主智能体**, ArXiv, 2023.    \nNoah Shinn, Beck Labash, A. Gopinath.     \n[[页面]()]    \n\n* **描述、解释、规划与选择：利用大语言模型的交互式规划赋能开放世界多任务智能体**, 第37届国际神经信息处理系统大会论文集，2023年.    \nZihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, Yitao Liang.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.01560)]    \n\n* **Instruct2Act：利用大语言模型将多模态指令映射为机器人动作**, ArXiv, 2023.         \nSiyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, and Hongsheng Li.          \n[[页面]](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FInstruct2Act)    \n\n* **Cliport：用于机器人操作的“什么”和“哪里”路径**, 机器人学习会议，2022年.    \nShridhar, Mohit, Lucas, Manuelli, Dieter, Fox.     \n[[页面](https:\u002F\u002Fproceedings.mlr.press\u002Fv164\u002Fshridhar22a\u002Fshridhar22a.pdf)]    \n\n* **语言模型作为零样本规划者：为具身智能体提取可行动知识**, ICML, 2022.    \nHuang, Wenlong, Pieter, Abbeel, Deepak, Pathak, Igor, Mordatch.     \n[[页面](https:\u002F\u002Fproceedings.mlr.press\u002Fv162\u002Fhuang22a\u002Fhuang22a.pdf)]     \n\n* **内心独白：通过语言模型进行规划的具身推理**, 机器人学习会议，2022年.    \nHuang, Wenlong, Fei, Xia, Ted, Xiao, Harris, Chan, Jacky, Liang, Pete, Florence, Andy, Zeng, Jonathan, Tompson, Igor, Mordatch, Yevgen, Chebotar, Pierre, Sermanet, Noah, Brown, Tomas, Jackson, Linda, Luu, Sergey, Levine, Karol, Hausman, Brian, Ichter.      \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2207.05608)]     \n\n* **语言模型作为零样本规划者：为具身智能体提取可行动知识**, ICML, 2022.    \nHuang, Wenlong, Pieter, Abbeel, Deepak, Pathak, Igor, Mordatch.     \n[[页面](https:\u002F\u002Fproceedings.mlr.press\u002Fv162\u002Fhuang22a\u002Fhuang22a.pdf)]\n\n* **苏格拉底模型：用语言构建零样本多模态推理**, ICLR, 2022.    \nZeng, Andy, Maria, Attarian, Brian, Ichter, Krzysztof, Choromanski, Adrian, Wong, Stefan, Welker, Federico, Tombari, Aveek, Purohit, Michael, Ryoo, Vikas, Sindhwani, Johnny, Lee, Vincent, Vanhoucke, Pete, Florence.     \n[[页面](苏格拉底模型：用语言构建零样本多模态推理)]\n\n* **基于潜在语言的技能诱导与规划**, ACL, 2021.    \n普拉蒂尤莎·夏尔马、安东尼奥·托拉尔巴、雅各布·安德烈亚斯.     \n[[页面](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.01517)]\n\n* **PDDL：规划领域定义语言**, 技术报告。1998年.    \n德鲁·麦克德莫特、马利克·加拉卜、阿黛尔·E·豪、克雷格·A·诺克洛克、阿什温·拉姆、曼努埃拉·M·韦洛索、丹尼尔·S·韦尔德、大卫·E·威尔金斯.     \n[[页面](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FCraig-Knoblock\u002Fpublication\u002F2278933_PDDL_-_The_Planning_Domain_Definition_Language\u002Flinks\u002F0912f50c0c99385e19000000\u002FPDDL-The-Planning-Domain-Definition-Language.pdf)]\n\n* **STRIPS：将定理证明应用于问题求解的新方法**, 人工智能2. 3(1971): 189-208.    \n理查德·E·菲克斯、尼尔斯·J·尼尔森.     \n[[页面](https:\u002F\u002Fntrs.nasa.gov\u002Fapi\u002Fcitations\u002F19730013831\u002Fdownloads\u002F19730013831.pdf#page=98)]    \n\n* **启发式确定最小代价路径的正式基础**, IEEE系统科学与控制论汇刊4. (1968): 100-107.        \n彼得·E·哈特、尼尔斯·J·尼尔森、伯特伦·拉斐尔.     \n[[页面](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F4082128)]\n\n* **蒙特卡洛方法**, 美国统计协会期刊44 247. (1949): 335-41.    \n尼古拉斯·C·梅特ropolis、S. M. 乌兰.     \n[[页面](https:\u002F\u002Fweb.williams.edu\u002FMathematics\u002Fsjmiller\u002Fpublic_html\u002F341Fa09\u002Fhandouts\u002FMetropolisUlam_TheMonteCarloMethod.pdf)]    \n\n\n\n\n## \u003Ca id=\"sim-to-real\"> 仿真到现实的适应 \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa> \n\n* **Phantom：仅使用人类视频即可在无需机器人的情况下训练机器人**, arXiv, 2025    \n玛丽昂·勒佩尔特、方嘉颖、珍妮特·博格.       \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.00779)\n\n* **基于3D扩散策略的可泛化人形机器人操作**, arXiv, 2025    \n严杰泽、陈子轩、王文浩、陈天一、何夏林、袁莹、彭学斌、吴佳俊.       \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.10803)\n\n* **VLABench：一个大规模基准，用于具有长 horizon 推理任务的语言条件机器人操作**, arXiv, 2024    \n张世铎、徐哲、刘培菊、俞晓鹏、李源、高青辉、费兆业、尹章悦、吴祖轩、蒋宇刚、邱锡鹏                     \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.18194)\n\n* **PIVOT-R：面向机器人操作的原语驱动、航点感知世界模型**, NeurIPS, 2024    \n张凯东、任鹏振、林冰倩、林俊凡、马士奎、许航、梁晓丹                      \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.10394)\n\n* **机器人操作模仿学习中的数据缩放定律**, arXiv, 2024       \n林凡奇、胡英东、盛平岳、温川、游家成、高洋                   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2410.18647)\n\n* **在仿真中评估真实世界机器人操作策略**, arXiv, 2024       \n李玄林、许凯、顾嘉元、珀茨先生、梅斯先生、里克·沃尔克先生、傅楚渊、卢纳瓦特小姐、西赫女士、基尔马尼先生、莱文先生、吴佳俊先生、芬恩女士、苏浩先生、武权先生、肖泰德先生                \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.05941)\n\n* **身体转换器：利用机器人具身性进行策略学习**, arXiv, 2024       \n萨弗拉扎神父、黄敦明、刘芳晨、李钟敏、皮特·阿贝尔             \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.06316)\n\n* **通过 grounded 语言模型实现人形机器人移动操作的自主行为规划**, arXiv, 2024    \n王进、劳伦齐神父、尼科斯·察加拉基斯        \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2408.082827)\n\n* **稳健智能体学习因果世界模型**, ICLR, 2024    \n里琴斯、乔纳森和汤姆·埃弗里特   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.10877)    \n\n* **通用操作接口：无需野外机器人的野外机器人教学**, arXiv， 2024   \n池先生、潘振佳先生、潘秋儿女士、库辛诺先生、伯奇菲尔先生、冯思远先生、特德拉克先生、宋书然先生    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.10329)   \n\n* **Mobile ALOHA：通过低成本全身遥操作学习双臂移动操作**, arXiv, 2024    \n傅子鹏先生和赵托尼Z先生以及切尔西·芬恩女士   \n[[页面]](https:\u002F\u002Fmobile-aloha.github.io\u002Fresources\u002Fmobile-aloha.pdf)   \n\n* **人机联合学习以高效获取机器人操作技能**, arXiv, 2024    \n罗圣成先生、彭泉泉先生、吕军先生、洪凯文先生、德里格斯-坎贝尔女士、陆策吾先生、李永禄先生    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.00299)\n\n* **通过仿真弥合现实差距：一种从现实到仿真再到现实的稳健操作方法**, arXiv, 2024    \n托恩先生、西梅诺夫先生、李泽初先生、陈艾普丽尔女士、陈涛先生、阿比谢克·古普塔先生、普尔基特·阿格拉瓦尔先生   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.03949)    \n\n* **TRANSIC：通过在线纠正学习实现仿真到现实的策略迁移**, arXiv, 2024   \n姜云帆先生、王辰先生、张若涵先生、吴佳俊先生、李飞飞女士    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.10315)    \n\n* **自然语言可以帮助弥合Sim2Real鸿沟**, arXiv, 2024    \n余阿尔伯特先生、富特阿德琳女士、穆尼雷蒙德先生、马丁-马丁罗伯托先生   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.10020)\n\n* **用于足式移动操作的视觉全身控制**, arXiv, 2024    \n刘明焕先生、陈子轩先生、程旭欣先生、季延东先生、杨瑞涵先生、王小龙先生    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2403.16967)    \n\n* **用于人形机器人的富有表现力的全身控制**, arXiv, 2024    \n程旭欣先生、季延东先生、陈俊明先生、杨瑞涵先生、杨戈先生、王小龙先生    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.16796)\n\n* **Pandora：迈向具有自然语言动作和视频状态的通用世界模型**, arXiv, 2024    \n向建楠先生、刘广义先生、顾毅先生、高琪玥先生、宁玉婷先生、查宇恒先生、冯泽宇先生、陶天华先生、郝世博先生、史叶民先生等    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.09455)    \n\n* **3D-VLA：一种3D视觉-语言-动作生成式世界模型**, ICML, 2024     \n甄浩宇先生、邱晓雯女士、陈沛浩先生、杨锦程先生、颜鑫先生、杜逸伦先生、洪怡宁女士、甘创先生    \n[[页面]](https:\u002F\u002Fopenreview.net\u002Fforum?id=EZcFK8HupF)    \n\n* **扩散世界模型：超越逐步展开的未来建模，用于离线强化学习**, arXiv, 2024    \n丁子涵先生、张艾米女士、田元东先生、郑沁清女士    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.03570)    \n\n* **MC-JEPA：一种用于运动和内容特征自监督学习的联合嵌入预测架构**, ICLR, 2024    \n巴尔德斯先生、庞塞先生、勒丘恩先生    \n[[页面]](https:\u002F\u002Fopenreview.net\u002Fforum?id=9XdLlbxZCC)    \n\n* **在视觉表征学习中学习和利用世界模型**, arXiv, 2024    \n加里多先生、阿斯兰先生、巴拉斯先生、巴尔德斯先生、纳吉曼先生、勒丘恩先生    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.00504)\n\n* **iVideoGPT：交互式 VideoGPT 是可扩展的世界模型**，arXiv，2024    \n吴嘉龙、尹绍峰、冯宁雅、何旭、李栋、郝建业、龙明生   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.15223)   \n\n* **用于机器人运动控制的时空预测性预训练**，arXiv，2024    \n杨建格、刘贝、傅建龙、潘博成、吴刚山、王利民   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.05304)    \n\n* **LEGENT：具身智能体开放平台**，arXiv，2024    \n程志立、王志通、胡金毅、胡圣鼎、刘安、涂宇歌、李鹏凯、史磊、刘志远、孙茂松    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.18243)    \n\n* **Point-JEPA：面向点云自监督学习的联合嵌入预测架构**，arXiv，2024    \nSaito, Ayumu 和 Poovvancheri, Jiju   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.16432)\n\n* **MuDreamer：无需重建即可学习预测性世界模型**，ICLR，2024    \nBurchi, Maxime 和 Timofte, Radu    \n[[页面]](https:\u002F\u002Fopenreview.net\u002Fforum?id=9pe38WpsbX)    \n\n* **从词模型到世界模型：将自然语言转化为概率化的思维语言**，arXiv，2024    \nWong, Lionel、Grand, Gabriel、Lew, Alexander K、Goodman, Noah D、Mansinghka, Vikash K、Andreas, Jacob 和 Tenenbaum, Joshua B    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.12672)    \n\n* **ElastoGen：4D 生成式弹性动力学**，arXiv，2024    \n冯宇涛、尚银童、冯翔、兰雷、哲闪电、邵天嘉、吴洪志、周坤、苏浩、蒋晨帆等   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.15056)\n\n* **利用强化学习和生成式预训练模型实现四足机器人的逼真敏捷性和玩耍行为**，Nature Machine Intelligence，2024。        \n韩雷、朱庆旭、盛家鹏、张冲、李廷光、张义正、张鹤等        \n[[页面]](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs42256-024-00861-3)\n\n* **面向时间约束具身控制的模型适应**，CVPR，2024。        \n宋在贤、柳敏钟、禹洪郁。        \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FSong_Model_Adaptation_for_Time_Constrained_Embodied_Control_CVPR_2024_paper.pdf)\n\n* **ManipLLM：面向以物体为中心的机器人操作的具身多模态大型语言模型**，CVPR，2024。        \n李晓琪、张明旭、耿怡然、耿浩然、龙宇星、沈燕、张仁睿、刘佳明、董浩。        \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLi_ManipLLM_Embodied_Multimodal_Large_Language_Model_for_Object-Centric_Robotic_Manipulation_CVPR_2024_paper.pdf)    \n\n* **ManipLLM：面向以物体为中心的机器人操作的具身多模态大型语言模型**，CVPR，2024。        \n李晓琪、张明旭、耿怡然、耿浩然、龙宇星、沈燕、张仁睿、刘佳明、董浩。        \n[[页面]](https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2024\u002Fpapers\u002FLi_ManipLLM_Embodied_Multimodal_Large_Language_Model_for_Object-Centric_Robotic_Manipulation_CVPR_2024_paper.pdf)\n\n* **GenH2R：通过可扩展的仿真、演示和模仿学习通用的人机交接技能**，CVPR，2024。        \n王子凡、陈俊宇、陈子清、谢鹏威、陈瑞、李毅。        \n[[页面]](https:\u002F\u002Fgenh2r.github.io\u002F)\n\n* **SAGE：连接语义与可操作部件，实现铰接式物体的通用操作**，RSS，2024。        \n耿浩然、魏松林、邓聪悦、申博魁、王鹤、Leonidas Guibas。        \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.01307)\n\n* **GAMMA：基于在线抓取姿态融合的可抓取性感知移动操作策略学习**，ICRA，2024。        \n张嘉钊、Nandiraju Gireesh、王继龙、方晓梦、徐超逸、陈伟光、戴刘、王鹤。        \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.15459)\n\n* **ReALFRED：真实感环境中的具身指令遵循基准测试**，ECCV，2024。        \n金泰雄、闵哲弘、金炳辉、金珍妍、郑元杰、崔宗贤。        \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.18550)\n\n* **DISCO：基于可微场景语义与双层控制的具身导航与交互**，ECCV，2024。        \n许新宇、罗圣诚、杨延超、李永禄、陆策吾。         \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.14758)\n\n* **DynSyn：面向过驱动具身系统的高效学习与控制的动力协同表征**，ICML，2024。        \n何凯波、左晨辉、马承天、隋亚楠。         \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.11472)\n\n* **A-JEPA：联合嵌入预测架构能够“倾听”**，arXiv，2023    \n费正聪、范明远、黄俊石   \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.15830)    \n\n* **One-2-3-45：无需逐形状优化，任何单张图像均可在45秒内转换为3D网格**，NeurIPS，2023    \n刘明华、徐超、金海安、陈凌浩、Varma T, Mukund、徐泽翔、苏浩   \n[[页面]](https:\u002F\u002Fopenreview.net\u002Fforum?id=A6X9y8n4sT)    \n\n* **潜在变量能量模型导论：迈向自主机器智能之路**，arXiv，2023    \nDawid, Anna 和 LeCun, Yann    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.02572)    \n\n* **GAPartNet：通过通用且可操作的部件实现跨类别领域通用的对象感知与操作**，CVPR，2023    \n耿浩然、许赫林、赵成阳、徐超、李毅、黄思源、王鹤    \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10203924)\n\n* **奖励自适应强化学习：用于双足行走的动态策略梯度优化**，IEEE TPAMI，2023    \n黄昌鑫、王广润、周志博、张荣辉、林亮   \n[[页面]](https:\u002F\u002Fwww.computer.org\u002Fcsdl\u002Fjournal\u002Ftp\u002F2023\u002F06\u002F09956746\u002F1Iu2CDAJBcc)\n\n* **使用低成本硬件学习精细的双手操作**，ICML，2023    \nZhao, Tony Z、Kumar, Vikash、Levine, Sergey、Finn, Chelsea    \n[[页面]](https:\u002F\u002Fopenreview.net\u002Fforum?id=e8Eu1lqLaf)\n\n* **Surfer：结合世界模型的渐进式推理用于机器人操作**，arXiv，2023。    \n任鹏振、张凯东、郑和涛、李子轩、温宇航、朱凤达、马斯马、梁晓丹。         \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2306.11335)\n\n* **PartManip：从点云观测中学习跨类别通用的部件操作策略**，CVPR，2023。    \n耿浩然、李子铭、耿怡然、陈佳依、董浩、王鹤。         \n[[页面]](https:\u002F\u002Fpku-epic.github.io\u002FPartManip\u002F)\n\n* **迈向自主机器智能之路 版本 0.9.2，2022年6月27日**，Open Review，2022    \nYann LeCun    \n[[页面]](https:\u002F\u002Fopenreview.net\u002Fpdf?id=BZ5a1r-kVsf&)\n\n* **Real2Sim2Real：用于平面机器人投掷的物理单步动态动作自监督学习**，ICRA，2022    \nLim, Vincent、Huang, Huang、Chen, Lawrence Yunliang、Wang, Jonathan、Ichnowski, Jeffrey、Seita, Daniel、Laskey, Michael 和 Goldberg, Ken    \n[[页面]](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fabs\u002F10.1109\u002FICRA46639.2022.9811651)\n\n* **基于轨迹优化与模型预测控制的足式机器人在踏脚石上的连续跳跃**，IEEE CDC，2022    \nNguyen, Chuong、Bao, Lingfan 和 Nguyen, Quan    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.01147)\n\n* **奖励自适应强化学习：双足行走的动态策略梯度优化**，TPAMI，2022。          \nChangxin Huang、Guangrun Wang、Zhibo Zhou、Ronghui Zhang、Liang Lin。        \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F9956746)\n\n* **搬运网络：为机器人操作重新组织视觉世界**，CoRL，2021    \nZeng, Andy、Florence, Pete、Tompson, Jonathan、Welker, Stefan、Chien, Jonathan、Attarian, Maria、Armstrong, Travis、Krasin, Ivan、Duong, Dan、Sindhwani, Vikas 等    \n[[页面]](https:\u002F\u002Fproceedings.mlr.press\u002Fv155\u002Fzeng21a.html)   \n\n* **MIT 人形机器人：特技行为的设计、运动规划与控制**，IEEE-RAS 第20届国际人形机器人会议（Humanoids），2021    \nChignoli, Matthew、Kim, Donghyun、Stanger-Jones, Elijah 和 Kim, Sangbae   \n[[页面]](https:\u002F\u002Farxiv.longhoe.net\u002Fpdf\u002F2104.09025)   \n\n* **无需动力学随机化的强化学习Sim2Real迁移**，IROS，2020    \nKaspar, Manuel、Osorio, Juan D Mu{\\~n}oz 和 Bock, Jurgen      \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F9341260)   \n\n* **学习灵巧的手部操作**，国际机器人研究杂志，2020    \nAndrychowicz，OpenAI：Marcin、Baker，Bowen、Chociej，Maciek、Jozefowicz，Rafal、McGrew，Bob、Pachocki，Jakub、Petron，Arthur、Plappert，Matthias、Powell，Glenn、Ray，Alex 等    \n[[页面]](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002Ffull\u002F10.1177\u002F0278364919887447)\n\n* **DeepGait：利用深度强化学习规划与控制四足步态**，IEEE机器人与自动化快报，2020   \nTsounis, Vassilios、Alge, Mitja、Lee, Joonho、Farshidian, Farbod 和 Hutter, Marco    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.08399)    \n\n* **MIT Cheetah 3 机器人的优化跳跃**，ICRA，2019    \nNguyen, Quan、Powell, Matthew J、Katz, Benjamin、Di Carlo, Jared 和 Kim, Sangbae   \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8794449)   \n\n* **世界模型**，NIPS，2018    \nHa, David 和 Schmidhuber, Jurgen    \n[[页面]](https:\u002F\u002Fmx.nthu.edu.tw\u002F~jlliu\u002Fteaching\u002FAI17\u002FAuto8.pdf)\n\n* **MIT Cheetah 3：一款鲁棒、动态的四足机器人设计与控制**，IEEE\u002FRSJ 智能机器人与系统国际会议（IROS），2018    \nBledt, Gerardo、Powell, Matthew J、Katz, Benjamin、Di Carlo, Jared、Wensing, Patrick M 和 Kim, Sangbae    \n[[页面]](https:\u002F\u002Fdspace.mit.edu\u002Fbitstream\u002Fhandle\u002F1721.1\u002F126619\u002Firos.pdf?sequence=2)   \n\n* **可变形物体操作的模拟到现实强化学习**，CoRL，2018    \nMatas, Jan、James, Stephen 和 Davison, Andrew J    \n[[页面]](http:\u002F\u002Fproceedings.mlr.press\u002Fv87\u002Fmatas18a\u002Fmatas18a.pdf)\n\n* **具有单步预览功能的随机变化离散地形上的动态行走**，机器人：科学与系统，2017    \nNguyen, Quan、Agrawal, Ayush、Da, Xingye、Martin, William C、Geyer, Hartmut、Grizzle, Jessy W 和 Sreenath, Koushil    \n[[页面]](https:\u002F\u002Fhybrid-robotics.berkeley.edu\u002Fpublications\u002FRSS2017_DiscreteTerrain_Walking.pdf)   \n\n* **用于优化运动控制器的深度核方法**，CoRL，2017    \nAntonova, Rika、Rai, Akshara 和 Atkeson, Christopher G    \n[[页面]](http:\u002F\u002Fproceedings.mlr.press\u002Fv78\u002Fantonova17a\u002Fantonova17a.pdf)\n\n* **为未知做好准备：通过在线系统辨识学习通用策略**，RSS，2017    \nYu, Wenhao、Tan, Jie、Liu, C Karen 和 Turk, Greg    \n[[页面]](https:\u002F\u002Farxiv.org\u002Fabs\u002F1702.02453)    \n\n* **领域随机化：将深度神经网络从仿真迁移到现实世界**，IROS，2017    \nTobin, Josh、Fong, Rachel、Ray, Alex、Schneider, Jonas、Zaremba, Wojciech 和 Abbeel, Pieter    \n[[页面]](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F8202133)\n\n* **熟能生巧：基于优化的方法控制四足机器人的敏捷运动**，IEEE机器人与自动化杂志，2016    \nGehring, Christian、Coros, Stelian、Hutter, Marco、Bellicoso, Carmine Dario、Heijnen, Huub、Diethelm, Remo、Bloesch, Michael、Fankhauser, P{\\'e}ter、Hwangbo, Jemin 和 Hoepflinger, Mark 等    \n[[页面]](https:\u002F\u002Fwww.research-collection.ethz.ch\u002Fbitstream\u002Fhandle\u002F20.500.11850\u002F183161.1\u002F1\u002Feth-49107-01.pdf)   \n\n* **ANYmal——一款高度机动且动态的四足机器人**，IEEE\u002FRSJ 智能机器人与系统国际会议（IROS），2016    \nHutter, Marco、Gehring, Christian、Jud, Dominic、Lauber, Andreas、Bellicoso, C Dario、Tsounis, Vassilios、Hwangbo, Jemin、Bodie, Karen、Fankhauser, Peter 和 Bloesch, Michael 等    \n[[页面]](https:\u002F\u002Fwww.research-collection.ethz.ch\u002Fbitstream\u002Fhandle\u002F20.500.11850\u002F118642\u002Feth-49454-01.pdf;sequence=1)   \n\n* **基于优化的Atlas机器人全身控制**，IEEE-RAS 国际人形机器人会议，2014    \nFeng, Siyuan、Whitman, Eric、Xinjilefu、X 和 Atkeson, Christopher G    \n[[页面]](http:\u002F\u002Fwww.cs.cmu.edu\u002Fafs\u002Fcs\u002Fuser\u002Fsfeng\u002Fwww\u002Fsf_hum14.pdf)    \n\n* **适用于MABEL的柔顺混合零动力学控制器：实现稳定、高效且快速的双足行走**，国际机器人研究杂志，2011    \nSreenath, Koushil、Park, Hae-Won、Poulakakis, Ioannis 和 Grizzle, Jessy W    \n[[页面]](https:\u002F\u002Fsites.udel.edu\u002Fpoulakas\u002Ffiles\u002F2022\u002F10\u002FJ07-A-Compliant-Hybrid-Zero-Dynamics-Controller.pdf)   \n\n* **双足机器人的动态行走**，国际机器人研究杂志，1984年   \nMiura, Hirofumi 和 Shimoyama, Isao    \n[[页面]](https:\u002F\u002Fjournals.sagepub.com\u002Fdoi\u002Fabs\u002F10.1177\u002F027836498400300206)\n\n## \u003Ca id=\"datasets\"> 数据集 \u003Ca href=\"#table-of-contents\">🔝\u003C\u002Fa> \u003C\u002Fa> \n待更新...     \n* **AgiBot World**, 2025年。[[链接]](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002FAgiBot-World)\n* **RoboVerse**, 2025年。[[链接]](https:\u002F\u002Froboverseorg.github.io\u002F)\n* **RefSpatial**, 2025年。[[链接]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FJingkunAn\u002FRefSpatial)\n* **VisualAgentBench**, 2023年。[链接](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FVisualAgentBench)\n* **Open X-Embodiment**, 2023年。[链接](https:\u002F\u002Frobotics-transformer-x.github.io\u002F)\n* **RH20T-P**, 2024年。[链接](https:\u002F\u002Fsites.google.com\u002Fview\u002Frh20t-primitive\u002Fmain)   \n* **ALOHA 2**, 2024年。[链接](https:\u002F\u002Faloha-2.github.io\u002F)  \n* **GRUtopia**, 2024年。[链接](https:\u002F\u002Fgithub.com\u002FOpenRobotLab\u002FGRUtopia)\n* **ARIO (All Robots In One)**, 2024年。[链接](https:\u002F\u002Fimaei.github.io\u002Fproject_pages\u002Fario\u002F)\n* **VLABench**, 2024年。[链接](https:\u002F\u002Fvlabench.github.io\u002F)\n* **Matterport3D**, 2017年。[[链接]](https:\u002F\u002Fgithub.com\u002Fniessner\u002FMatterport)\n* **RoboMIND**, 2025年。[[链接]](https:\u002F\u002Fx-humanoid-robomind.github.io\u002F)\n\n\n### 身体化感知\n#### 视觉\n\n\n* **BEHAVIOR Vision Suite**, 2024年。[[链接]](https:\u002F\u002Fbehavior-vision-suite.github.io\u002F)\n* **SpatialQA**, 2024年。[[链接]](https:\u002F\u002Fgithub.com\u002FBAAI-DCAI\u002FSpatialBot)  \n* **SpatialBench**, 2024年。[[链接]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRussRobin\u002FSpatialBench)\n* **Uni3DScenes**, 2024年。[[链接]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRussRobinSpatialBench)\n* **Active Recognition Dataset**, 2023年。[[链接]](https:\u002F\u002Fleifan95.github.io\u002F_pages\u002FAR-dataset\u002Findex.html)\n* **Baxter_UR5_95_Objects_Dataset**, 2023年。[[链接]](https:\u002F\u002Fwww.eecs.tufts.edu\u002F~gtatiya\u002Fpages\u002F2022\u002FBaxter_UR5_95_Objects_Dataset.html)\n* **Caltech-256**, 2022年。[[链接]](https:\u002F\u002Fdata.caltech.edu\u002Frecords\u002Fnyy15-4j048)\n* **DIDI Dataset**, 2020年。[[链接]](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Fblob\u002Fmaster\u002Fdidi_dataset\u002FREADME.md)\n* **Replica**, 2019年。[[链接]](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FReplica-Dataset)\n* **ScanObjectNN**, 2019年。[[链接]](https:\u002F\u002Fhkust-vgd.github.io\u002Fscanobjectnn\u002F)\n* **OCID Dataset**, 2019年。[[链接]](https:\u002F\u002Fwww.acin.tuwien.ac.at\u002Fen\u002Fvision-for-robotics\u002Fsoftware-tools\u002Fobject-clutter-indoor-dataset\u002F)\n* **L3RScan**, 2019年。[[链接]](https:\u002F\u002Fgithub.com\u002FWaldJohannaU\u002F3RScan)\n* **EmbodiedScan**, 2019年。[[链接]](https:\u002F\u002Fdocs.google.com\u002Fforms\u002Fd\u002Fe\u002F1FAIpQLScUXEDTksGiqHZp31j7Zp7zlCNV7p_08uViwP_Nbzfn3g6hhw\u002Fviewform)  \n* **UZH-FPV Dataset**, 2019年。[[链接]](https:\u002F\u002Ffpv.ifi.uzh.ch\u002F)\n* **LM Data**, 2019年。[[链接]](https:\u002F\u002Fperinglab.org\u002Flmdata\u002F)\n* **TUM Visual-Inertial Dataset**, 2018年。[[链接]](https:\u002F\u002Fcvg.cit.tum.de\u002Fdata\u002Fdatasets\u002Fvisual-inertial-dataset)\n* **ScanNet**, 2017年。[[链接]](https:\u002F\u002Fgithub.com\u002FScanNet\u002FScanNet)\n* **SUNCG**, 2017年。[[链接]](http:\u002F\u002Fsuncg.cs.princeton.edu\u002F)\n* **Semantic 3D**, 2017年。[[链接]](http:\u002F\u002Fwww.semantic3d.net\u002F)\n* **ScanNet v2**, 2017年。[[链接]](https:\u002F\u002Fgithub.com\u002FScanNet\u002FScanNet)\n* **S3DIS**, 2016年。[[链接]](http:\u002F\u002Fbuildingparser.stanford.edu\u002F)\n* **Synthia**, 2016年。[[链接]](https:\u002F\u002Fsynthia-dataset.net\u002F)\n* **ModelNet**, 2015年。[[链接]](https:\u002F\u002Fmodelnet.cs.princeton.edu\u002F)\n* **ORBvoc**, 2015年。[[链接]](https:\u002F\u002Fgithub.com\u002Fraulmur\u002FORB_SLAM)\n* **Sketch dataset**, 2015年。[[链接]](https:\u002F\u002Fcybertron.cg.tu-berlin.de\u002Feitz\u002Fprojects\u002Fclassifysketch\u002F)\n* **SUN RGBD**, 2015年。[[链接]](https:\u002F\u002Frgbd.cs.princeton.edu\u002F)\n* **ShapeNet**, 2015年。[[链接]](https:\u002F\u002Fshapenet.org\u002F)\n* **MVS Dataset**, 2014年。[[链接]](http:\u002F\u002Froboimagedata.compute.dtu.dk\u002F?page_id=36)\n* **SUOD**, 2013年。[[链接]](https:\u002F\u002Fwww.acfr.usyd.edu.au\u002Fpapers\u002FSydneyUrbanObjectsDataset.shtml)\n* **SUN360**, 2012年。[[链接]](https:\u002F\u002Fvision.cs.princeton.edu\u002Fprojects\u002F2012\u002FSUN360\u002Fdata\u002F)\n* **NYU Depth Dataset V2**, 2012年。[[链接]](https:\u002F\u002Fcs.nyu.edu\u002F~fergus\u002Fdatasets\u002Fnyu_depth_v2.html)\n* **TUM-RGBD**, 2012年。[[链接]](https:\u002F\u002Fcvg.cit.tum.de\u002Fdata\u002Fdatasets\u002Frgbd-dataset\u002Fdownload)\n* **EuRoC MAV Dataset**, 2012年。[[链接]](https:\u002F\u002Fprojects.asl.ethz.ch\u002Fdatasets\u002Fdoku.php?id=kmavvisualinertialdatasets)\n* **Semantic KITTI**, 2012年。[[链接]](https:\u002F\u002Fwww.semantic-kitti.org\u002Fdataset.html#download)\n* **KITTI Object Recognition**, 2012年。[[链接]](http:\u002F\u002Fwww.cvlibs.net\u002Fdatasets\u002Fkitti\u002Feval_object.php)\n* **Stanford Track Collection**, 2011年。[[链接]](http:\u002F\u002Fcs.stanford.edu\u002Fpeople\u002Fteichman\u002Fstc\u002F)\n\n\n#### 触觉 \n\n* **Touch100k**, 2024年。[[链接]](https:\u002F\u002Fcocacola-lab.github.io\u002FTouch100k\u002F)\n* **ARIO (All Robots In One)**, 2024年。[[链接]](https:\u002F\u002Fimaei.github.io\u002Fproject_pages\u002Fario\u002F)\n* **TaRF**, 2024年。[[链接]](https:\u002F\u002Fdou-yiming.github.io\u002FTaRF\u002F)    \n* **TVL**, 2024年。[[链接]](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fmlfu7\u002FTouch-Vision-Language-Dataset)\n* **YCB-Slide**, 2022年。[[链接]](https:\u002F\u002Fgithub.com\u002Frpl-cmu\u002FYCB-Slide)\n* **Touch and Go**, 2022年。[[链接]](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1NDasyshDCL9aaQzxjn_-Q5MBURRT360B)\n* **SSVTP**, 2022年。[[链接]](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1H0B-jJ4l3tJu2zuqf-HbZy2bjEl-vL3f\u002Fview?usp=sharing)\n* **ObjectFolder**, 2021-2023年。[[链接]](https:\u002F\u002Fgithub.com\u002Frhgao\u002FObjectFolder)\n* **Decoding the BioTac**, 2020年。[[链接]](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1-BkqiFN9q6cz9Dk74oDlfmDs2m7ZvbWC)\n* **SynTouch**, 2019年。[[链接]](https:\u002F\u002Ftams.informatik.uni-hamburg.de\u002Fresearch\u002Fdatasets\u002Findex.php#biotac_single_contact_response)\n* **The Feeling of Success**, 2017年。[[链接]](https:\u002F\u002Fsites.google.com\u002Fview\u002Fthe-feeling-of-success\u002F)\n\n### 身体化导航\n* **ALFRED**, 2020年。[[链接]](https:\u002F\u002Faskforalfred.com\u002F)  \n* **REVERIE**, 2020年。[[链接]](https:\u002F\u002Fgithub.com\u002FYuankaiQi\u002FREVERIE) \n* **CVDN**, 2019年。[[链接]](https:\u002F\u002Fgithub.com\u002Fmmurray\u002Fcvdn\u002F)     \n* **Room to Room (R2R)**, 2017年。[[链接]](https:\u002F\u002Fpaperswithcode.com\u002Fdataset\u002Froom-to-room)\n* **DivScene**, 2024年。[[链接]](https:\u002F\u002Fgithub.com\u002Fzhaowei-wang-nlp\u002FDivScene)\n* **LH-VLN**, 2025年。[[链接]](https:\u002F\u002Fhcplab-sysu.github.io\u002FLH-VLN\u002F)\n\n### 身体化问答\n\n* **SpatialQA**, 2024年。[[链接]](https:\u002F\u002Fgithub.com\u002FBAAI-DCAI\u002FSpatialBot)  \n* **S-EQA**, 2024年。[[链接]](https:\u002F\u002Fgamma.umd.edu\u002Fresearchdirections\u002Fembodied\u002Fseqa\u002F)\n* **HM-EQA**, 2024年。[[链接]](https:\u002F\u002Fgithub.com\u002FStanford-ILIAD\u002Fexplore-eqa) \n* **K-EQA**, 2023年。[[链接]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.07872) \n* **SQA3D**, 2023年。[[链接]](https:\u002F\u002Fsqa3d.github.io\u002F) \n* **VideoNavQA**, 2019年。[[链接]](https:\u002F\u002Fgithub.com\u002Fcatalina17\u002FVideoNavQA)  \n* **MP3D-EQA**, 2019年。[[链接]](https:\u002F\u002Faskforalfred.com\u002F)  \n* **MT-EQA**, 2019年。[[链接]](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FMT-EQA)  \n* **IQUAD V1**, 2018年。[[链接]](https:\u002F\u002Fgithub.com\u002Fdanielgordon10\u002Fthor-iqa-cvpr-2018)  \n* **EQA**, 2018年。[[链接]](https:\u002F\u002Fembodiedqa.org\u002Fdata)  \n\n### 身体化操作\n* **OAKINK2**, 2024年。[[链接]](https:\u002F\u002Foakink.net\u002Fv2\u002F)  \n\n## 其他有用的身体化项目与工具\n\n### 资源\n[Awesome-Embodied-Agent-with-LLMs](https:\u002F\u002Fgithub.com\u002Fzchoi\u002FAwesome-Embodied-Agent-with-LLMs)    \n[Awesome Embodied Vision](https:\u002F\u002Fgithub.com\u002FChanganVR\u002Fawesome-embodied-vision)    \n[Awesome Touch](https:\u002F\u002Fgithub.com\u002Flinchangyi1\u002FAwesome-Touch)    \n[Awesome VLA Study](https:\u002F\u002Fgithub.com\u002FMilkClouds\u002Fawesome-vla-study)\n\n### 模拟平台与环境\n[Habitat-Lab](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-lab)    \n[Habitat-Sim](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-sim)    \n[GibsonEnv](https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv)    \n[LEGENT](https:\u002F\u002Fgithub.com\u002Fthunlp\u002FLEGENT)    \n[MetaUrban](https:\u002F\u002Fmetadriverse.github.io\u002Fmetaurban\u002F)  \n[GRUtopia](https:\u002F\u002Fgithub.com\u002FOpenRobotLab\u002FGRUtopia)             \n[GenH2R](https:\u002F\u002Fgenh2r.github.io\u002F)    \n[演示 HumanTHOR](https:\u002F\u002Fsites.google.com\u002Fview\u002Fhumanthor\u002F)      \n[BestMan](https:\u002F\u002Fgithub.com\u002FAutonoBot-Lab\u002FBestMan_Pybullet)      \n[InfiniteWorld](https:\u002F\u002Fgithub.com\u002Fpzhren\u002FInfiniteWorld)      \n[Genesis](https:\u002F\u002Fgenesis-embodied-ai.github.io\u002F)      \n[Cosmos](https:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fai\u002Fcosmos\u002F)      \n### 项目\n* 操控\n\n[RoboMamba](https:\u002F\u002Fsites.google.com\u002Fview\u002Frobomamba-web)   \n[MANIPULATE-ANYTHING](https:\u002F\u002Frobot-ma.github.io\u002F)    \n[DexGraspNet](https:\u002F\u002Fpku-epic.github.io\u002FDexGraspNet\u002F)      \n[UniDexGrasp](https:\u002F\u002Fpku-epic.github.io\u002FUniDexGrasp\u002F)      \n[UniDexGrasp++](https:\u002F\u002Fpku-epic.github.io\u002FUniDexGrasp++)      \n[OAKINK2](https:\u002F\u002Foakink.net\u002Fv2)      \n[AgiBot-World](https:\u002F\u002Fgithub.com\u002FOpenDriveLab\u002Fagibot-world)\n\n* 身体化交互\n\n[EmbodiedQA](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FEmbodiedQA)  \n\n* 身体化感知\n\n[EmbodiedScan](https:\u002F\u002Fgithub.com\u002FOpenRobotLab\u002FEmbodiedScan)    \n\n* 模型与工具\n\n[Octopus](https:\u002F\u002Fgithub.com\u002Fdongyh20\u002FOctopus)    \n[Holodeck](https:\u002F\u002Fgithub.com\u002Fallenai\u002FHolodeck)    \n[AllenAct](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fallenact)    \n\n* 代理\n\n[LEO](https:\u002F\u002Fgithub.com\u002Fembodied-generalist\u002Fembodied-generalist)    \n[Voyager](https:\u002F\u002Fgithub.com\u002FMineDojo\u002FVoyager)    \n   \n\n \n## :newspaper: 引用 \n如果您认为本综述有所帮助，请随时点个赞 ⭐️ 并引用我们的论文：\n\n```bibtex\n@article{liu2024aligning,\n  title={将网络空间与物理世界对齐：身体化人工智能的全面综述},\n  author={刘洋、陈伟星、白永杰、梁晓丹、李冠斌、高文、林亮},\n  journal={arXiv 预印本 arXiv:2407.06886},\n  year={2024}\n}\n```\n```bibtex\n@article{liu2025aligning,\n  title={将网络空间与物理世界对齐：身体化人工智能的全面综述},\n  author={刘洋、陈伟星、白永杰、梁晓丹、李冠斌、高文、林亮},\n  journal={IEEE\u002FASME 机电一体化汇刊},\n  year={2025}\n}\n```\n## 👏 致谢\n我们衷心感谢罗景洲、宋新帅、蒋凯旋、林俊毅、李志达和赵甘龙的贡献。","# Embodied_AI_Paper_List 快速上手指南\n\n`Embodied_AI_Paper_List` 并非一个需要编译安装的软件工具，而是一个由中山大学 HCPLab 维护的**具身智能（Embodied AI）领域论文与资源汇总仓库**。它主要作为研究文献库、综述参考及数据集索引使用。开发者无需复杂的环境配置，即可通过浏览器或 Git 直接获取资源。\n\n## 环境准备\n\n本项目无特殊的系统或依赖要求，仅需具备以下基础环境之一：\n\n*   **操作系统**：Windows \u002F macOS \u002F Linux 均可。\n*   **必备工具**：\n    *   **Web 浏览器**：用于直接在线浏览分类列表和下载论文（推荐 Chrome 或 Edge）。\n    *   **Git**（可选）：用于克隆仓库到本地，方便离线查阅或贡献代码。\n    *   **PDF 阅读器**：用于阅读下载的综述论文和数据集文档。\n\n> **国内访问建议**：\n> 由于项目托管在 GitHub 上，国内用户若遇到访问速度慢或图片加载失败的问题，建议使用 **Gitee 镜像**（如有）或通过 **GitHub 加速代理** 进行克隆。在线阅读时，可直接访问提供的 arXiv 链接或国内学术镜像站获取论文全文。\n\n## 安装步骤（获取资源）\n\n你可以通过以下两种方式获取该资源列表：\n\n### 方式一：在线浏览（推荐）\n直接访问 GitHub 项目页面，查看实时更新的 `README.md` 文件，其中包含了按类别整理的最新论文列表。\n*   **项目地址**：[https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List](https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List)\n\n### 方式二：克隆到本地\n如果你希望离线查阅或通过 Pull Request 贡献新的论文条目，请使用以下命令克隆仓库：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List.git\ncd Embodied_AI_Paper_List\n```\n\n> **国内加速命令**（如果原生克隆失败）：\n> ```bash\n> git clone https:\u002F\u002Fghp.ci\u002Fhttps:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List.git\n> ```\n\n## 基本使用\n\n本项目的核心用途是**检索文献**和**下载报告**。以下是两种最常用的使用场景：\n\n### 1. 检索特定领域的论文\n打开项目根目录下的 `README.md` 文件（或在 GitHub 网页端查看），利用目录导航快速定位到你感兴趣的研究方向。主要分类包括：\n\n*   **Books & Surveys**：书籍与综述文章（适合入门和了解前沿）。\n*   **Embodied Simulators**：具身仿真器（如 Isaac Sim, MuJoCo, Habitat 等）。\n*   **Embodied Perception**：具身感知。\n*   **Embodied Interaction**：具身交互。\n*   **Embodied Agent**：具身智能体。\n*   **Sim-to-Real Adaptation**：仿真到现实的迁移。\n*   **Datasets**：相关数据集汇总。\n\n**示例**：若想查找关于“世界模型（World Models）”的综述，可在 `Books & Surveys` 章节找到标题为 *\"A Comprehensive Survey on World Models for Embodied AI\"* 的条目，点击 `[Paper]` 链接即可跳转至 arXiv 下载 PDF。\n\n### 2. 下载核心综述报告\n该项目配套了一篇发表在 *IEEE\u002FASME Transactions on Mechatronics 2025* 上的深度综述论文，是理解该领域的绝佳起点。\n\n*   **论文标题**：Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI\n*   **下载方式**：\n    直接在浏览器打开以下链接获取完整版 PDF：\n    ```text\n    https:\u002F\u002Farxiv.org\u002Fpdf\u002F2407.06886\n    ```\n    或者在克隆后的本地仓库中查找 `EmbodiedAI_Review.pdf` 文件（如果仓库包含该附件）。\n\n### 3. 贡献新论文（进阶）\n如果你发现了最新的相关论文并希望收录其中，可以 Fork 该项目，修改 `README.md` 文件添加条目，然后发起 Pull Request。\n\n```markdown\n* **论文标题**, arXiv:编号, 年份\n作者列表.\n[[Paper](论文链接)]\n```","某高校机器人实验室的博士生正在撰写关于“具身智能感知与交互”的综述论文，并计划开发一套新的仿真训练框架。\n\n### 没有 Embodied_AI_Paper_List 时\n- **文献检索如大海捞针**：需要在 arXiv、Google Scholar 等多个平台反复搜索关键词，难以区分哪些是核心综述，哪些是边缘研究，耗时数周仍担心遗漏重要成果。\n- **技术脉络模糊不清**：面对碎片化的论文，难以系统梳理从“具身感知”到“虚实迁移（Sim-to-Real）”的技术演进路线，导致论文逻辑架构搭建困难。\n- **资源匹配效率低下**：找到了算法论文却找不到对应的开源代码或专用数据集，甚至发现选用的仿真器已过时，严重拖慢实验复现进度。\n- **前沿动态滞后**：无法及时获取 2025-2026 年的最新突破（如多模态大模型在具身智能中的最新应用），导致研究起点落后于社区平均水平。\n\n### 使用 Embodied_AI_Paper_List 后\n- **一站式获取权威清单**：直接查阅按时间排序的最新论文列表，快速锁定 IEEE\u002FASME Transactions 等顶刊收录的综述及 2025-2026 年的前沿工作，文献调研时间缩短 80%。\n- **清晰构建知识图谱**：依托工具中分类明确的四大核心板块（感知、交互、智能体、虚实迁移），迅速理清技术范式与局限性，高效完成论文大纲设计。\n- **代码与数据无缝对接**：通过关联的资源库直接定位到经过分类的优质项目、数据集和仿真器链接，实现了从理论阅读到实验复现的无缝衔接。\n- **紧跟社区最新节奏**：利用每周更新的机制，即时掌握多模态大模型与世界模型在具身智能领域的最新落地案例，确保研究内容始终处于行业最前沿。\n\nEmbodied_AI_Paper_List 将原本分散杂乱的科研信息整合为结构化的知识导航，极大提升了具身智能领域从理论研究到工程落地的全链路效率。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHCPLab-SYSU_Embodied_AI_Paper_List_1a576839.jpg","HCPLab-SYSU","HCP Lab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FHCPLab-SYSU_d9c13e31.jpg","Human Cyber Physical (HCP) Intelligence Integration Lab\r\n\r\n中山大学人机物智能融合实验室",null,"https:\u002F\u002Fwww.sysu-hcp.net\u002F","https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU",1986,139,"2026-04-05T11:17:26",1,"","未说明",{"notes":89,"python":87,"dependencies":90},"该项目是一个论文列表和资源仓库（Paper List and Resource Repository），主要包含综述文章、数据集链接和模拟器介绍，并非可执行的软件代码库，因此没有特定的运行环境、依赖库或硬件需求。用户只需通过浏览器查看网页或使用 Git 克隆仓库即可使用。",[],[46,15],[93,94,95,96,97,98,99,100,101,102],"embodied-ai","agent","causality","interaction","reasoning","robotics","percpetion","manipulation","survey","navigation","2026-03-27T02:49:30.150509","2026-04-06T08:15:57.739301",[106,111,116,121,125,130],{"id":107,"question_zh":108,"answer_zh":109,"source_url":110},9842,"如何向该论文列表项目推荐或添加新的相关论文？","您可以直接在 GitHub 上创建一个 Issue，提供论文的标题、作者、发表年份\u002F会议以及链接（如项目主页或 arXiv 链接）。维护者审核认为有用后会将其加入列表。部分情况下，维护者也可能建议您直接创建一个 Pull Request (PR) 来添加论文。","https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List\u002Fissues\u002F27",{"id":112,"question_zh":113,"answer_zh":114,"source_url":115},9843,"如果发现列表中某篇论文的引用链接错误，该如何反馈？","请通过提交 Issue 指出具体的错误链接，并提供正确的链接地址（例如 DOI 链接或官方项目页面）。维护者在确认后会立即修正该链接。","https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List\u002Fissues\u002F13",{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},9844,"项目中的论文是按什么顺序排列的？未来会如何更新？","根据社区建议，项目计划将论文按时间顺序从新到旧排列，以便读者关注最前沿的工作。后续更新也将遵循这一顺序。","https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List\u002Fissues\u002F12",{"id":122,"question_zh":123,"answer_zh":124,"source_url":120},9845,"是否接受特定细分领域（如扩散策略 Diffusion Policy）的论文推荐？","是的，项目欢迎各个热门及细分领域的论文推荐，包括最近火热的扩散策略（diffusion policy）等相关工作。用户可以通过 Issue 提交具体论文，维护者会根据相关性进行收录。",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},9846,"提交的论文推荐会被立即处理吗？","维护者通常会尽快响应。对于有价值的论文推荐，维护者会在评论中确认“已添加”或“即将更新”。如果论文较多或需要格式调整，可能会建议您通过 Pull Request 自行添加以加快进程。","https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List\u002Fissues\u002F15",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},9847,"我可以一次性推荐多篇不同任务类型（如操作和导航）的论文吗？","可以。您可以在一个 Issue 中分类列出多篇论文（例如分为“操作相关”和“导航相关”），并提供每篇论文的详细信息和链接。维护者会逐一评估并收录有用的论文。","https:\u002F\u002Fgithub.com\u002FHCPLab-SYSU\u002FEmbodied_AI_Paper_List\u002Fissues\u002F7",[]]