[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-WooooDyy--LLM-Agent-Paper-List":3,"tool-WooooDyy--LLM-Agent-Paper-List":62},[4,18,26,36,46,54],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160015,2,"2026-04-18T11:30:52",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":42,"last_commit_at":43,"category_tags":44,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode 是一款开源的 AI 编程助手（Coding Agent），旨在像一位智能搭档一样融入您的开发流程。它不仅仅是一个代码补全插件，而是一个能够理解项目上下文、自主规划任务并执行复杂编码操作的智能体。无论是生成全新功能、重构现有代码，还是排查难以定位的 Bug，OpenCode 都能通过自然语言交互高效完成，显著减少开发者在重复性劳动和上下文切换上的时间消耗。\n\n这款工具专为软件开发者、工程师及技术研究人员设计，特别适合希望利用大模型能力来提升编码效率、加速原型开发或处理遗留代码维护的专业人群。其核心亮点在于完全开源的架构，这意味着用户可以审查代码逻辑、自定义行为策略，甚至私有化部署以保障数据安全，彻底打破了传统闭源 AI 助手的“黑盒”限制。\n\n在技术体验上，OpenCode 提供了灵活的终端界面（Terminal UI）和正在测试中的桌面应用程序，支持 macOS、Windows 及 Linux 全平台。它兼容多种包管理工具，安装便捷，并能无缝集成到现有的开发环境中。无论您是追求极致控制权的资深极客，还是渴望提升产出的独立开发者，OpenCode 都提供了一个透明、可信",144296,1,"2026-04-16T14:50:03",[13,45],"插件",{"id":47,"name":48,"github_repo":49,"description_zh":50,"stars":51,"difficulty_score":32,"last_commit_at":52,"category_tags":53,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 
都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":55,"name":56,"github_repo":57,"description_zh":58,"stars":59,"difficulty_score":32,"last_commit_at":60,"category_tags":61,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[45,13,15,14],{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":78,"owner_email":78,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":78,"stars":82,"forks":83,"last_commit_at":84,"license":78,"difficulty_score":42,"env_os":85,"env_gpu":86,"env_ram":86,"env_deps":87,"category_tags":90,"github_topics":91,"view_count":32,"oss_zip_url":78,"oss_zip_packed_at":78,"status":17,"created_at":97,"updated_at":98,"faqs":99,"releases":100},9130,"WooooDyy\u002FLLM-Agent-Paper-List","LLM-Agent-Paper-List","The paper list of the 86-page SCIS cover paper \"The Rise and Potential of Large Language Model Based Agents: A Survey\" by Zhiheng Xi et al.","LLM-Agent-Paper-List 是一个专注于大语言模型（LLM）智能体领域的学术资源库，旨在系统整理和追踪该方向的前沿研究论文。它源于团队发表在《中国科学：信息科学》封面的综述文章《基于大语言模型的智能体的崛起与潜力》，核心目标是解决研究人员在面对海量文献时难以快速定位高质量、必读论文的痛点。\n\n该资源库不仅提供了一份精心筛选的论文清单，还构建了涵盖智能体“大脑、感知、行动”三大核心组件的概念框架，并深入探讨了单体、多体及人机协作等多种应用场景与社会化行为。除了静态的文献列表，项目还持续更新相关技术动态，例如配套推出的 AgentGym 平台及其强化学习版本 
AgentGym-RL，支持开发者在自定义环境中训练智能体进行长程决策，并提供可视化工具以复现和分析智能体的决策轨迹。\n\nLLM-Agent-Paper-List 特别适合人工智能领域的研究人员、高校师生以及希望深入了解 LLM 智能体架构的开发者使用。无论是想要快速把握领域发展脉络，还是寻找具体的算法实现与数据集，这里都能提供极具价值的指引，是探索通用人工智能（AGI）路径的重要参考站。","# The Rise and Potential of Large Language Model Based Agents: A Survey\n\n🔥 **Must-read papers for LLM-based agents.**\n\n🏃 **Coming soon: Add one-sentence intro to each paper.**\n\n## 🔔 News\n\n- 🎉 [2025\u002F09\u002F10] Note! You can add your custom environment to AgentGym and perform RL on it! The tutorial is [here](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FAgentGym\u002Fblob\u002Fmain\u002Fdocs\u002Ftutorials\u002Fen\u002F05-2nd-Development.md).\n- 🍺 [2025\u002F09\u002F10] A new paper has been released on arXiv: [AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.08755).\n- 🚀 [2025\u002F09\u002F10] The AgentGym-RL framework has been released! We introduce the reinforcement learning (RL) version of AgentGym, enabling agents to learn directly from interactive environments: [AgentGym-RL](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FAgentGym-RL).\n- 👀 [2025\u002F09\u002F03] AgentGym now provides an interactive frontend for visualization. 
Researchers can replay and inspect full trajectories, step through agent decision-making, and analyze model behaviors more easily.\n- ☄️ [2024\u002F06\u002F07] AgentGym has been released for developing and evolving LLM-based agents across diverse environments!\n  - Paper: [AgentGym](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.04151).\n  - Project page: [https:\u002F\u002Fagentgym.github.io\u002F](https:\u002F\u002Fagentgym.github.io\u002F).\n  - Code: [Platform and Implementations](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FAgentGym).\n  - Huggingface resources: [AgentTraj-L](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAgentGym\u002FAgentTraj-L), [AgentEval](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAgentGym\u002FAgentEval), [AgentEvol-7B](https:\u002F\u002Fhuggingface.co\u002FAgentGym\u002FAgentEvol-7B).\n- 🎉 [2024\u002F05\u002F02] R3 ([Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05808)) was accepted by ICML 2024!\n- 💫 [2024\u002F02\u002F08] A new paper, R3, on RL for LLM agent reasoning has been released! Paper: [Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05808). Code: [LLM-Reverse-Curriculum-RL](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FLLM-Reverse-Curriculum-RL).\n- 🥳 [2023\u002F09\u002F20] This project has been listed on [GitHub Trending](https:\u002F\u002Fgithub.com\u002Ftrending)! It is a great honor!\n- 💥 [2023\u002F09\u002F15] Our survey has been released! See [The Rise and Potential of Large Language Model Based Agents: A Survey](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.07864) for the paper!\n- ✨ [2023\u002F09\u002F14] We created this repository to maintain a paper list on LLM-based agents. 
More papers are coming soon!\n\n\u003Cdiv align=center>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_readme_9a9ae4a3358d.jpg\" width=\"80%\" \u002F>\u003C\u002Fdiv>\n\n\n## 🌟 Introduction\n\nFor a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. \n\nDue to the versatile and remarkable capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many research efforts have leveraged LLMs as the foundation to build AI agents and have achieved significant progress.\n\nIn this repository, we provide a systematic and comprehensive survey of LLM-based agents and list some must-read papers. \n\nSpecifically, we start with the general conceptual framework for LLM-based agents, comprising three main components: brain, perception, and action. This framework can be tailored to suit different applications. \nSubsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. 
\nFollowing this, we delve into agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge when they form societies, and the insights they offer for human society.\nFinally, we discuss a range of key topics and open problems within the field.\n\n**We greatly appreciate any contributions via PRs, issues, emails, or other methods.**\n\n## Table of Content (ToC)\n\n\n- [The Rise and Potential of Large Language Model Based Agents: A Survey](#the-rise-and-potential-of-large-language-model-based-agents-a-survey)\n  - [🔔 News](#-news)\n  - [🌟 Introduction](#-introduction)\n  - [Table of Content (ToC)](#table-of-content-toc)\n  - [1. The Birth of An Agent: Construction of LLM-based Agents](#1-the-birth-of-an-agent-construction-of-llm-based-agents)\n    - [1.1 Brain: Primarily Composed of An LLM](#11-brain-primarily-composed-of-an-llm)\n      - [1.1.1 Natural Language Interaction](#111-natural-language-interaction)\n        - [High-quality generation](#high-quality-generation)\n        - [Deep understanding](#deep-understanding)\n      - [1.1.2 Knowledge](#112-knowledge)\n        - [Pretrain model](#pretrain-model)\n        - [Linguistic knowledge](#linguistic-knowledge)\n        - [Commonsense knowledge](#commonsense-knowledge)\n        - [Actionable knowledge](#actionable-knowledge)\n        - [Potential issues of knowledge](#potential-issues-of-knowledge)\n      - [1.1.3 Memory](#113-memory)\n        - [Memory capability](#memory-capability)\n          - [Raising the length limit of Transformers](#raising-the-length-limit-of-transformers)\n          - [Summarizing memory](#summarizing-memory)\n          - [Compressing memories with vectors or data structures](#compressing-memories-with-vectors-or-data-structures)\n        - [Memory retrieval](#memory-retrieval)\n      - [1.1.4 Reasoning \\& Planning](#114-reasoning--planning)\n        - [Reasoning](#reasoning)\n        - [Planning](#planning)\n          - [Plan 
formulation](#plan-formulation)\n          - [Plan reflection](#plan-reflection)\n      - [1.1.5 Transferability and Generalization](#115-transferability-and-generalization)\n        - [Unseen task generalization](#unseen-task-generalization)\n        - [In-context learning](#in-context-learning)\n        - [Continual learning](#continual-learning)\n    - [1.2 Perception: Multimodal Inputs for LLM-based Agents](#12-perception-multimodal-inputs-for-llm-based-agents)\n      - [1.2.1 Visual](#121-visual)\n      - [1.2.2 Audio](#122-audio)\n    - [1.3 Action: Expand Action Space of LLM-based Agents](#13-action-expand-action-space-of-llm-based-agents)\n      - [1.3.1 Tool Using](#131-tool-using)\n      - [1.3.2 Embodied Action](#132-embodied-action)\n  - [2. Agents in Practice: Applications of LLM-based Agents](#2-agents-in-practice-applications-of-llm-based-agents)\n    - [2.1 General Ability of Single Agent](#21-general-ability-of-single-agent)\n      - [2.1.1 Task-oriented Deployment](#211-task-oriented-deployment)\n      - [2.1.2 Innovation-oriented Deployment](#212-innovation-oriented-deployment)\n      - [2.1.3 Lifecycle-oriented Deployment](#213-lifecycle-oriented-deployment)\n    - [2.2 Coordinating Potential of Multiple Agents](#22-coordinating-potential-of-multiple-agents)\n      - [2.2.1 Cooperative Interaction for Complementarity](#221-cooperative-interaction-for-complementarity)\n      - [2.2.2 Adversarial Interaction for Advancement](#222-adversarial-interaction-for-advancement)\n    - [2.3 Interactive Engagement between Human and Agent](#23-interactive-engagement-between-human-and-agent)\n      - [2.3.1 Instructor-Executor Paradigm](#231-instructor-executor-paradigm)\n        - [Education](#education)\n        - [Health](#health)\n        - [Other Application](#other-application)\n      - [2.3.2 Equal Partnership Paradigm](#232-equal-partnership-paradigm)\n        - [Empathetic Communicator](#empathetic-communicator)\n        - [Human-Level 
Participant](#human-level-participant)\n  - [3. Agent Society: From Individuality to Sociality](#3-agent-society-from-individuality-to-sociality)\n    - [3.1 Behavior and Personality of LLM-based Agents](#31-behavior-and-personality-of-llm-based-agents)\n      - [3.1.1 Social Behavior](#311-social-behavior)\n        - [Individual behaviors](#individual-behaviors)\n        - [Group behaviors](#group-behaviors)\n      - [3.1.2 Personality](#312-personality)\n        - [Cognition](#cognition)\n        - [Emotion](#emotion)\n        - [Character](#character)\n    - [3.2 Environment for Agent Society](#32-environment-for-agent-society)\n      - [3.2.1 Text-based Environment](#321-text-based-environment)\n      - [3.2.2 Virtual Sandbox Environment](#322-virtual-sandbox-environment)\n      - [3.2.3 Physical Environment](#323-physical-environment)\n    - [3.3 Society Simulation with LLM-based Agents](#33-society-simulation-with-llm-based-agents)\n  - [4. Other Topics](#4-other-topics)\n    - [4.1 Benchmarks for LLM-based Agents](#41-benchmarks-for-llm-based-agents)\n    - [4.2 Training and Optimizing LLM-based Agents](#42-training-and-optimizing-llm-based-agents)\n  - [Citation](#citation)\n  - [Project Maintainers \\& Contributors](#project-maintainers--contributors)\n  - [Contact](#contact)\n  - [Star History](#star-history)\n\n\n\n\n\n\n## 1. The Birth of An Agent: Construction of LLM-based Agents\n\u003Cdiv align=center>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_readme_ff2e2079c50f.jpg\" width=\"80%\" \u002F>\u003C\u002Fdiv>\n\n### 1.1 Brain: Primarily Composed of An LLM\n\n#### 1.1.1 Natural Language Interaction\n\n##### High-quality generation\n\n\n- [2023\u002F10] **Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond** *Liang Chen et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.02071)] [[code](https:\u002F\u002Fgithub.com\u002FPKUnlp-icler\u002FPCA-EVAL)]\n  - This work proposes PCA-EVAL, which benchmarks embodied decision making via an MLLM-based end-to-end method and LLM-based tool-using methods at the perception, cognition, and action levels.\n- [2023\u002F02] **A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity.** *Yejin Bang et al. arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2302.04023)]\n  - This work evaluates the multitask, multilingual, and multimodal aspects of ChatGPT using 21 datasets covering 8 common NLP application tasks.\n- [2023\u002F06] **LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models.** *Yen-Ting Lin et al. arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.13711)]\n  - LLM-EVAL scores open-domain conversations along multiple dimensions, such as content, grammar, relevance, and appropriateness.\n- [2023\u002F04] **Is ChatGPT a Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation.** *Tao Fang et al. arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2304.01746)]\n  - The evaluation shows that ChatGPT has excellent error-detection capabilities and can freely correct errors, producing highly fluent corrected sentences. Additionally, its performance in non-English and low-resource settings highlights its potential in multilingual GEC tasks.\n\n##### Deep understanding\n\n- [2023\u002F06] **Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models.** *Natalie Shapira et al. 
arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.14763)]\n  - LLMs exhibit certain theory-of-mind abilities, but this behavior is far from robust.\n- [2022\u002F08] **Inferring Rewards from Language in Context.** *Jessy Lin et al. ACL.* [[paper](https:\u002F\u002Fdoi.org\u002F10.18653\u002Fv1\u002F2022.acl-long.585)]\n  - This work presents a model that infers rewards from language and predicts optimal actions in unseen environments.\n- [2021\u002F10] **Theory of Mind Based Assistive Communication in Complex Human Robot Cooperation.** *Moritz C. Buehler et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.01355)]\n  - This work designs an agent, Sushi, that maintains an understanding of its human partner during interaction.\n\n#### 1.1.2 Knowledge\n\n##### Pretrain model\n\n- [2020\u002F02] **How Much Knowledge Can You Pack Into the Parameters of a Language Model?** *Adam Roberts (Google) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.08910)]\n- [2020\u002F01] **Scaling Laws for Neural Language Models.** *Jared Kaplan (Johns Hopkins University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2001.08361)]\n- [2017\u002F12] **Commonsense Knowledge in Machine Intelligence.** *Niket Tandon (Allen Institute for Artificial Intelligence) et al. SIGMOD.* [[paper](https:\u002F\u002Fsigmodrecord.org\u002Fpublications\u002FsigmodRecord\u002F1712\u002Fpdfs\u002F09_reports_Tandon.pdf)]\n- [2016\u002F02] **Learning Distributed Representations of Sentences from Unlabelled Data.** *Felix Hill (University of Cambridge) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.03483)]\n- [2011\u002F03] **Natural Language Processing (almost) from Scratch.** *Ronan Collobert (Princeton) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1103.0398)]\n\n##### Linguistic knowledge\n\n- [2023\u002F02] **A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity.** *Yejin Bang et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.04023)]\n- [2021\u002F06] **Probing Pre-trained Language Models for Semantic Attributes and their Values.** *Meriem Beloucif et al. EMNLP.* [[paper](https:\u002F\u002Faclanthology.org\u002F2021.findings-emnlp.218\u002F)]\n- [2020\u002F10] **Probing Pretrained Language Models for Lexical Semantics.** *Ivan Vulić et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05731)]\n- [2019\u002F04] **A Structural Probe for Finding Syntax in Word Representations.** *John Hewitt et al. NAACL.* [[paper](https:\u002F\u002Faclanthology.org\u002FN19-1419\u002F)]\n- [2016\u002F04] **Improved Automatic Keyword Extraction Given More Semantic Knowledge.** *H. Leung. Database Systems for Advanced Applications.* [[paper](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-319-32055-7_10)]\n\n##### Commonsense knowledge\n\n- [2022\u002F10] **Language Models of Code are Few-Shot Commonsense Learners.** *Aman Madaan et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.07128)]\n- [2021\u002F04] **Relational World Knowledge Representation in Contextual Language Models: A Review.** *Tara Safavi et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.05837)]\n- [2019\u002F11] **How Can We Know What Language Models Know?** *Zhengbao Jiang et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.12543)]\n\n##### Actionable knowledge\n\n- [2023\u002F07] **Large language models in medicine.** *Arun James Thirunavukarasu et al. 
Nature Medicine.* [[paper](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41591-023-02448-8)]\n- [2023\u002F06] **DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation.** *Yuhang Lai et al. ICML.* [[paper](https:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Flai23b.html)]\n- [2022\u002F10] **Language Models of Code are Few-Shot Commonsense Learners.** *Aman Madaan et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.07128)]\n- [2022\u002F02] **A Systematic Evaluation of Large Language Models of Code.** *Frank F. Xu et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.13169)]\n- [2021\u002F10] **Training Verifiers to Solve Math Word Problems.** *Karl Cobbe et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.14168)]\n\n##### Potential issues of knowledge\n\n- [2023\u002F10] **FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation.** *Tu Vu (Google) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.03214)] [[code](https:\u002F\u002Fgithub.com\u002Ffreshllms\u002Ffreshqa)]\n- [2023\u002F05] **Editing Large Language Models: Problems, Methods, and Opportunities.** *Yunzhi Yao et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13172)]\n- [2023\u002F05] **Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models.** *Miaoran Li et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14623)]\n- [2023\u002F05] **CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing.** *Zhibin Gou et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.11738)]\n- [2023\u002F04] **Tool Learning with Foundation Models.** *Yujia Qin et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.08354)]\n- [2023\u002F03] **SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models.** *Potsawee Manakul et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.08896)]\n- [2022\u002F06] **Memory-Based Model Editing at Scale.** *Eric Mitchell et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.06520)]\n- [2022\u002F04] **A Review on Language Models as Knowledge Bases.** *Badr AlKhamissi et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.06031)]\n- [2021\u002F04] **Editing Factual Knowledge in Language Models.** *Nicola De Cao et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.08164)]\n- [2017\u002F08] **Measuring Catastrophic Forgetting in Neural Networks.** *Ronald Kemker et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1708.02072)]\n\n#### 1.1.3 Memory\n\n##### Memory capability\n\n###### Raising the length limit of Transformers\n\n- [2023\u002F10] **MemGPT: Towards LLMs as Operating Systems.** *Charles Packer (UC Berkeley) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.08560)] [[project page](https:\u002F\u002Fmemgpt.ai\u002F)] [[code](https:\u002F\u002Fgithub.com\u002Fcpacker\u002FMemGPT)] [[dataset](https:\u002F\u002Fhuggingface.co\u002FMemGPT)]\n- [2023\u002F05] **Randomized Positional Encodings Boost Length Generalization of Transformers.** *Anian Ruoss (DeepMind) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16843)] [[code](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Frandomized_positional_encodings)]\n- [2023\u002F03] **CoLT5: Faster Long-Range Transformers with Conditional Computation.** *Joshua Ainslie (Google Research) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.09752)]\n- [2022\u002F03] **Efficient Classification of Long Documents Using Transformers.** *Hyunji Hayley Park (University of Illinois) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.11258)] [[code](https:\u002F\u002Fgithub.com\u002Famazon-science\u002Fefficient-longdoc-classification)]\n- [2021\u002F12] **LongT5: Efficient Text-To-Text Transformer for Long Sequences.** *Mandy Guo (Google Research) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.07916)] [[code](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Flongt5)]\n- [2019\u002F10] **BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.** *Michael Lewis (Facebook AI) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.13461)] [[code](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Ftree\u002Fmain\u002Fsrc\u002Ftransformers\u002Fmodels\u002Fbart)]\n\n###### Summarizing memory\n\n- [2023\u002F10] **Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading** *Howard Chen (Princeton University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05029)]\n- [2023\u002F09] **Empowering Private Tutoring by Chaining Large Language Models** *Yulin Chen (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.08112)]\n- [2023\u002F08] **ExpeL: LLM Agents Are Experiential Learners.** *Andrew Zhao (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.10144)] [[code](https:\u002F\u002Fgithub.com\u002FAndrewzh112\u002FExpeL)]\n- [2023\u002F08] **ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate.** *Chi-Min Chan (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.07201)] [[code](https:\u002F\u002Fgithub.com\u002Fthunlp\u002FChatEval)]\n- [2023\u002F05] **MemoryBank: Enhancing Large Language Models with Long-Term Memory.** *Wanjun Zhong (Harbin Institute of Technology) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10250)] [[code](https:\u002F\u002Fgithub.com\u002Fzhongwanjun\u002Fmemorybank-siliconfriend)]\n- [2023\u002F04] **Generative Agents: Interactive Simulacra of Human Behavior.** *Joon Sung Park (Stanford University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.03442)] [[code](https:\u002F\u002Fgithub.com\u002Fjoonspk-research\u002Fgenerative_agents)]\n- [2023\u002F04] **Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System.** *Xinnian Liang (Beihang University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.13343)] [[code](https:\u002F\u002Fgithub.com\u002Fwbbeyourself\u002Fscm4llms)]\n- [2023\u002F03] **Reflexion: Language Agents with Verbal Reinforcement Learning.** *Noah Shinn (Northeastern University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11366)] [[code](https:\u002F\u002Fgithub.com\u002Fnoahshinn024\u002Freflexion)]\n- [2023\u002F05] **RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text.** *Wangchunshu Zhou (AIWaves) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13304.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Faiwaves-cn\u002FRecurrentGPT)]\n\n\n###### Compressing memories with vectors or data structures\n\n- [2023\u002F07] **Communicative Agents for Software Development.** *Chen Qian (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.07924)] [[code](https:\u002F\u002Fgithub.com\u002Fopenbmb\u002Fchatdev)]\n- [2023\u002F06] **ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory.** *Chenxu Hu (Tsinghua University) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.03901)] [[code](https:\u002F\u002Fgithub.com\u002Fhuchenxucs\u002FChatDB)]\n- [2023\u002F05] **Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.** *Xizhou Zhu (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17144)] [[code](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FGITM)]\n- [2023\u002F05] **RET-LLM: Towards a General Read-Write Memory for Large Language Models.** *Ali Modarressi (LMU Munich) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14322)]\n- [2023\u002F05] **RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text.** *Wangchunshu Zhou (AIWaves) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13304.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Faiwaves-cn\u002FRecurrentGPT)]\n\n##### Memory retrieval\n\n- [2023\u002F08] **Memory Sandbox: Transparent and Interactive Memory Management for Conversational Agents.** *Ziheng Huang (University of California—San Diego) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.01542)]\n- [2023\u002F08] **AgentSims: An Open-Source Sandbox for Large Language Model Evaluation.** *Jiaju Lin (PTA Studio) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.04026)] [[project page](https:\u002F\u002Fwww.agentsims.com\u002F)] [[code](https:\u002F\u002Fgithub.com\u002Fpy499372727\u002FAgentSims\u002F)]\n- [2023\u002F06] **ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory.** *Chenxu Hu (Tsinghua University) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.03901)] [[code](https:\u002F\u002Fgithub.com\u002Fhuchenxucs\u002FChatDB)]\n- [2023\u002F05] **MemoryBank: Enhancing Large Language Models with Long-Term Memory.** *Wanjun Zhong (Harbin Institute of Technology) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10250)] [[code](https:\u002F\u002Fgithub.com\u002Fzhongwanjun\u002Fmemorybank-siliconfriend)]\n- [2023\u002F04] **Generative Agents: Interactive Simulacra of Human Behavior.** *Joon Sung Park (Stanford) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.03442)] [[code](https:\u002F\u002Fgithub.com\u002Fjoonspk-research\u002Fgenerative_agents)]\n- [2023\u002F05] **RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text.** *Wangchunshu Zhou (AIWaves) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13304.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Faiwaves-cn\u002FRecurrentGPT)]\n\n\n#### 1.1.4 Reasoning & Planning\n\n##### Reasoning\n- [2024\u002F02] **Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning.** *Zhiheng Xi (Fudan University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05808)] [[code](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FLLM-Reverse-Curriculum-RL)]\n- [2023\u002F09] **ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs.** *Justin Chih-Yao Chen (University of North Carolina at Chapel Hill) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.13007.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Fdinobby\u002FReConcile)]\n\n- [2023\u002F05] **Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement.** *Zhiheng Xi (Fudan University) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14497)] [[code](https:\u002F\u002Fgithub.com\u002Fwoooodyy\u002Fself-polish)]\n\n- [2023-03] **Large Language Models are Zero-Shot Reasoners.** *Takeshi Kojima (The University of Tokyo) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.11916)] [[code](https:\u002F\u002Fgithub.com\u002Fkojima-takeshi188\u002Fzero_shot_cot)]\n\n- [2023\u002F03] **Self-Refine: Iterative Refinement with Self-Feedback.** *Aman Madaan (Carnegie Mellon University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17651)] [[code](https:\u002F\u002Fgithub.com\u002Fmadaan\u002Fself-refine)]\n\n- [2022\u002F05] **Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning.** *Antonia Creswell (DeepMind) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.09712)]\n\n- [2022\u002F03] **Self-Consistency Improves Chain of Thought Reasoning in Language Models.** *Xuezhi Wang (Google Research) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.11171)] [[code](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Ftree\u002Fmain\u002Fsrc\u002Ftransformers\u002Fmodels\u002Fbart)]\n\n- [2023\u002F02] **Multimodal Chain-of-Thought Reasoning in Language Models.** *Zhuosheng Zhang (Shanghai Jiao Tong University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.00923)] [[code](https:\u002F\u002Fgithub.com\u002Famazon-science\u002Fmm-cot)]\n\n- [2022\u002F01] **Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.** *Jason Wei (Google Research) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.11903)]\n\n\n##### Planning\n\n###### Plan formulation\n\n- [2023\u002F11] **JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models.** *ZiHao Wang (Peking University) et al. 
arXiv.* [[paper](https://arxiv.org/abs/2311.05997)] [[code](https://github.com/CraftJarvis/JARVIS-1)]
- [2023/10] **Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models.** *Andy Zhou (University of Illinois Urbana-Champaign) et al. arXiv.* [[paper](https://arxiv.org/abs/2310.04406)] [[project page](https://andyz245.github.io/LanguageAgentTreeSearch/)] [[code](https://github.com/andyz245/LanguageAgentTreeSearch/)]
- [2023/05] **Tree of Thoughts: Deliberate Problem Solving with Large Language Models.** *Shunyu Yao (Princeton University) et al. arXiv.* [[paper](https://arxiv.org/abs/2305.10601)] [[code](https://github.com/princeton-nlp/tree-of-thought-llm)]
- [2023/05] **Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents.** *Yue Wu (Carnegie Mellon University) et al. arXiv.* [[paper](https://arxiv.org/abs/2305.02412)]
- [2023/05] **Reasoning with Language Model is Planning with World Model.** *Shibo Hao (UC San Diego) et al. arXiv.* [[paper](https://arxiv.org/abs/2305.14992)] [[code](https://github.com/Ber666/RAP)]
- [2023/05] **SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks.** *Bill Yuchen Lin (Allen Institute for Artificial Intelligence) et al. arXiv.* [[paper](https://arxiv.org/abs/2305.17390)] [[code](https://github.com/yuchenlin/swiftsage)]
- [2023/04] **LLM+P: Empowering Large Language Models with Optimal Planning Proficiency.** *Bo Liu (University of Texas at Austin) et al. arXiv.* [[paper](https://arxiv.org/abs/2304.11477)] [[code](https://github.com/Cranial-XIX/llm-pddl)]
- [2023/03] **HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face.** *Yongliang Shen (Microsoft Research Asia) et al. arXiv.* [[paper](https://arxiv.org/abs/2303.17580)] [[code](https://github.com/microsoft/JARVIS)]
- [2023/02] **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents.** *ZiHao Wang (Peking University) et al. arXiv.* [[paper](https://arxiv.org/abs/2302.01560)] [[code](https://github.com/CraftJarvis/MC-Planner)]
- [2022/05] **Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.** *Denny Zhou (Google Research) et al. arXiv.* [[paper](https://arxiv.org/abs/2205.10625)]
- [2022/05] **MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning.** *Ehud Karpas (AI21 Labs) et al. arXiv.* [[paper](https://arxiv.org/abs/2205.00445)]
- [2022/04] **Do As I Can, Not As I Say: Grounding Language in Robotic Affordances.** *Michael Ahn (Robotics at Google) et al. arXiv.* [[paper](https://arxiv.org/abs/2204.01691)]
- [2023/09] **Agents: An Open-source Framework for Autonomous Language Agents.** *Wangchunshu Zhou (AIWaves) et al. arXiv.* [[paper](https://arxiv.org/pdf/2309.07870.pdf)] [[code](https://github.com/aiwaves-cn/agents)]
- [2022/12] **Don’t Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments.** *Yu Gu (The Ohio State University) et al. ACL.* [[paper](https://aclanthology.org/2023.acl-long.270.pdf)] [[code](https://github.com/dki-lab/Pangu)]


###### Plan reflection

- [2024/02] **Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization.** *Wenqi Zhang (Zhejiang University) et al. arXiv.* [[paper](https://arxiv.org/abs/2402.17574)] [[code](https://github.com/zwq2018/Agent-Pro)]
- [2024/01] **Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives.** *Wenqi Zhang (Zhejiang University) et al. arXiv.* [[paper](https://arxiv.org/abs/2401.02009)]
- [2023/11] **JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models.** *ZiHao Wang (Peking University) et al. arXiv.* [[paper](https://arxiv.org/abs/2311.05997)] [[code](https://github.com/CraftJarvis/JARVIS-1)]
- [2023/10] **Chain-of-Verification Reduces Hallucination in Large Language Models.** *Shehzaad Dhuliawala (Meta AI & ETH Zürich) et al. arXiv.* [[paper](https://arxiv.org/abs/2309.11495)]
- [2023/10] **FireAct: Toward Language Agent Fine-tuning.** *Baian Chen (System2 Research) et al. arXiv.* [[paper](https://arxiv.org/abs/2310.05915)] [[project page](https://fireact-agent.github.io/)] [[code](https://github.com/anchen1011/FireAct)] [[dataset](https://github.com/anchen1011/FireAct/tree/main/data)]
- [2023/08] **SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning.** *Ning Miao (University of Oxford) et al.
arXiv.* [[paper](https://arxiv.org/abs/2308.00436)] [[code](https://github.com/NingMiao/SelfCheck)]
- [2023/05] **ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models.** *Zhipeng Chen (Renmin University of China) et al. arXiv.* [[paper](https://arxiv.org/abs/2305.14323)] [[code](https://github.com/RUCAIBOX/ChatCoT)]
- [2023/05] **Voyager: An Open-Ended Embodied Agent with Large Language Models.** *Guanzhi Wang (NVIDIA) et al. arXiv.* [[paper](https://arxiv.org/abs/2305.16291)] [[project page](https://voyager.minedojo.org/)] [[code](https://github.com/MineDojo/Voyager)]
- [2023/03] **Chat with the Environment: Interactive Multimodal Perception Using Large Language Models.** *Xufeng Zhao (University Hamburg) et al. arXiv.* [[paper](https://arxiv.org/abs/2303.08268)] [[code](https://matcha-model.github.io/)]
- [2022/12] **LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models.** *Chan Hee Song (The Ohio State University) et al. arXiv.* [[paper](https://arxiv.org/abs/2212.04088)] [[code](https://dki-lab.github.io/LLM-Planner/)]
- [2022/10] **ReAct: Synergizing Reasoning and Acting in Language Models.** *Shunyu Yao (Princeton University) et al. arXiv.* [[paper](https://arxiv.org/abs/2210.03629)] [[code](https://react-lm.github.io/)]
- [2022/07] **Inner Monologue: Embodied Reasoning through Planning with Language Models.** *Wenlong Huang (Robotics at Google) et al. arXiv.* [[paper](https://arxiv.org/abs/2207.05608)] [[code](https://innermonologue.github.io/)]
- [2021/10] **AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts.** *Tongshuang Wu (University of Washington) et al. arXiv.* [[paper](https://arxiv.org/abs/2110.01691)]

#### 1.1.5 Transferability and Generalization

##### Unseen task generalization
- [2024/06] **AgentGym: Evolving Large Language Model-based Agents across Diverse Environments.** *Zhiheng Xi (Fudan University) et al. arXiv.* [[paper](https://arxiv.org/abs/2406.04151)] [[project page](https://agentgym.github.io/)] [[codes and platform](https://github.com/WooooDyy/AgentGym)] [[dataset](https://huggingface.co/datasets/AgentGym/AgentTraj-L)] [[benchmark](https://huggingface.co/datasets/AgentGym/AgentEval)] [[model](https://huggingface.co/AgentGym/AgentEvol-7B)]
- [2023/10] **AgentTuning: Enabling Generalized Agent Abilities for LLMs.** *Aohan Zeng (Tsinghua University) et al. arXiv.* [[paper](https://arxiv.org/abs/2310.12823)] [[project page](https://thudm.github.io/AgentTuning/)] [[code](https://github.com/THUDM/AgentTuning)] [[dataset](https://huggingface.co/datasets/THUDM/AgentInstruct)]
- [2023/10] **Lemur: Harmonizing Natural Language and Code for Language Agents.** *Yiheng Xu (University of Hong Kong) et al. arXiv.* [[paper](https://arxiv.org/abs/2310.06830)] [[code](https://github.com/OpenLemur/Lemur)]
- [2023/05] **Training language models to follow instructions with human feedback.** *Long Ouyang et al. NeurIPS.* [[paper](https://proceedings.neurips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html)]
  - InstructGPT: Aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback.
- [2023/01] **Multitask Prompted Training Enables Zero-Shot Task Generalization.** *Victor Sanh et al. ICLR.* [[paper](https://openreview.net/forum?id=9Vrb9D0WI4)] [[code](https://github.com/bigscience-workshop/t-zero)]
  - T0: An encoder-decoder model that consumes textual inputs and produces target responses, trained on a multitask mixture of NLP datasets partitioned into different tasks.
- [2022/10] **Scaling Instruction-Finetuned Language Models.** *Hyung Won Chung et al. arXiv.* [[paper](https://doi.org/10.48550/arXiv.2210.11416)] [[code](https://github.com/google-research/t5x)]
  - This work explores instruction finetuning with a particular focus on scaling the number of tasks and the model size, which improves performance on a variety of model classes, prompting setups, and evaluation benchmarks.
- [2022/08] **Finetuned Language Models are Zero-Shot Learners.** *Jason Wei et al. ICLR.* [[paper](https://openreview.net/forum?id=gEZrGCozdqR)]
  - FLAN: Instruction tuning substantially improves zero-shot performance on unseen tasks.

##### In-context learning

- [2023/08] **Images Speak in Images: A Generalist Painter for In-Context Visual Learning.** *Xinlong Wang et al.
IEEE.* [[paper](https://doi.org/10.1109/CVPR52729.2023.00660)] [[code](https://github.com/baaivision/Painter)]
  - Painter: This work presents a generalist model for in-context visual learning with an "image"-centric solution.
- [2023/08] **Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers.** *Chengyi Wang et al. arXiv.* [[paper](https://arxiv.org/abs/2301.02111)] [[code](https://github.com/microsoft/unilm)]
  - VALL-E: This work trains a neural codec language model that exhibits emergent in-context learning capabilities.
- [2023/07] **A Survey for In-context Learning.** *Qingxiu Dong et al. arXiv.* [[paper](https://doi.org/10.48550/arXiv.2301.00234)]
  - This survey summarizes the progress and challenges of in-context learning (ICL).
- [2023/05] **Language Models are Few-Shot Learners.** *Tom B. Brown (OpenAI) et al. NeurIPS.* [[paper](https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html)]
  - GPT-3: Scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches.

##### Continual learning

- [2023/11] **JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models.** *ZiHao Wang (Peking University) et al. arXiv.* [[paper](https://arxiv.org/abs/2311.05997)] [[code](https://github.com/CraftJarvis/JARVIS-1)]
- [2023/07] **Progressive Prompts: Continual Learning for Language Models.** *Razdaibiedina et al. arXiv.* [[paper](https://arxiv.org/abs/2301.12314)]
  - This work introduces Progressive Prompts, which allows forward transfer and resists catastrophic forgetting, without relying on data replay or a large number of task-specific parameters.
- [2023/05] **Voyager: An Open-Ended Embodied Agent with Large Language Models.** *Guanzhi Wang (NVIDIA) et al. arXiv.* [[paper](https://arxiv.org/abs/2305.16291)] [[project page](https://voyager.minedojo.org/)] [[code](https://github.com/MineDojo/Voyager)]
  - Voyager: An example of an LLM-powered embodied lifelong-learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention.
- [2023/01] **A Comprehensive Survey of Continual Learning: Theory, Method and Application.** *Liyuan Wang et al. arXiv.* [[paper](https://arxiv.org/abs/2302.00487)]
  - This comprehensive survey of continual learning seeks to bridge the basic settings, theoretical foundations, representative methods, and practical applications.
- [2022/11] **Continual Learning of Natural Language Processing Tasks: A Survey.** *Zixuan Ke et al. arXiv.* [[paper](https://arxiv.org/abs/2211.12701)]
  - This survey presents a comprehensive review and analysis of the recent progress of CL in NLP.


### 1.2 Perception: Multimodal Inputs for LLM-based Agents

#### 1.2.1 Visual

- [2024/01] **Agent AI: Surveying the horizons of multimodal interaction.** *Zane Durante et al. arXiv.* [[paper](https://arxiv.org/abs/2401.03568)]
- [2023/10] **Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond.** *Liang Chen et al. arXiv.* [[paper](https://arxiv.org/abs/2310.02071)] [[code](https://github.com/PKUnlp-icler/PCA-EVAL)]
- [2023/05] **Language Is Not All You Need: Aligning Perception with Language Models.** *Shaohan Huang et al. arXiv.* [[paper](https://arxiv.org/abs/2302.14045)]
- [2023/05] **InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.** *Wenliang Dai et al. arXiv.* [[paper](https://arxiv.org/abs/2305.06500)]
- [2023/05] **MultiModal-GPT: A Vision and Language Model for Dialogue with Humans.** *Tao Gong et al. arXiv.* [[paper](https://arxiv.org/abs/2305.04790)]
- [2023/05] **PandaGPT: One Model To Instruction-Follow Them All.** *Yixuan Su et al. arXiv.* [[paper](https://arxiv.org/abs/2305.16355)]
- [2023/04] **Visual Instruction Tuning.** *Haotian Liu et al. arXiv.* [[paper](https://arxiv.org/abs/2304.08485)]
- [2023/04] **MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models.** *Deyao Zhu et al. arXiv.* [[paper](https://arxiv.org/abs/2304.10592)]
- [2023/01] **BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models.** *Junnan Li et al. arXiv.* [[paper](https://arxiv.org/abs/2301.12597)]
- [2022/04] **Flamingo: a Visual Language Model for Few-Shot Learning.** *Jean-Baptiste Alayrac et al.
arXiv.* [[paper](https://arxiv.org/abs/2204.14198)]
- [2021/10] **MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer.** *Sachin Mehta et al. arXiv.* [[paper](https://arxiv.org/abs/2110.02178)]
- [2021/05] **MLP-Mixer: An all-MLP Architecture for Vision.** *Ilya Tolstikhin et al. arXiv.* [[paper](https://arxiv.org/abs/2105.01601)]
- [2020/10] **An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.** *Alexey Dosovitskiy et al. arXiv.* [[paper](https://arxiv.org/abs/2010.11929)]
- [2017/11] **Neural Discrete Representation Learning.** *Aaron van den Oord et al. arXiv.* [[paper](https://arxiv.org/abs/1711.00937)]

#### 1.2.2 Audio

- [2023/06] **Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding.** *Hang Zhang et al. arXiv.* [[paper](https://arxiv.org/abs/2306.02858)]
- [2023/05] **X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages.** *Feilong Chen et al. arXiv.* [[paper](https://arxiv.org/abs/2305.04160)]
- [2023/05] **InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language.** *Zhaoyang Liu et al. arXiv.* [[paper](https://arxiv.org/abs/2305.05662)]
- [2023/04] **AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.** *Rongjie Huang et al. arXiv.* [[paper](https://arxiv.org/abs/2304.12995)]
- [2023/03] **HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face.** *Yongliang Shen et al. arXiv.* [[paper](https://arxiv.org/abs/2303.17580)]
- [2021/06] **HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units.** *Wei-Ning Hsu et al. arXiv.* [[paper](https://arxiv.org/abs/2106.07447)]
- [2021/04] **AST: Audio Spectrogram Transformer.** *Yuan Gong et al. arXiv.* [[paper](https://arxiv.org/abs/2104.01778)]

### 1.3 Action: Expand Action Space of LLM-based Agents

#### 1.3.1 Tool Using
- [2024/02] **Towards Uncertainty-Aware Language Agent.** *Jiuzhou Han (Monash University) et al. arXiv.* [[paper](https://arxiv.org/abs/2401.14016)] [[project page](https://uala-agent.github.io)] [[code](https://github.com/Jiuzhouh/Uncertainty-Aware-Language-Agent)]
- [2023/10] **OpenAgents: An Open Platform for Language Agents in the Wild.** *XLang Lab (The University of Hong Kong). arXiv.* [[paper](https://arxiv.org/abs/2310.10634)] [[project page](https://docs.xlang.ai)] [[code](https://github.com/xlang-ai/OpenAgents)] [[demo](https://chat.xlang.ai)]
- [2023/10] **Lemur: Harmonizing Natural Language and Code for Language Agents.** *Yiheng Xu (University of Hong Kong) et al. arXiv.* [[paper](https://arxiv.org/abs/2310.06830)] [[code](https://github.com/OpenLemur/Lemur)]
- [2023/10] **Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond.** *Liang Chen (Peking University) et al. arXiv.* [[paper](https://arxiv.org/abs/2310.02071)] [[code](https://github.com/PKUnlp-icler/PCA-EVAL)]
  - HOLMES is a multi-agent cooperation framework that allows LLMs to leverage MLLMs and APIs to gather multimodal information for informed decision-making.
- [2023/07] **ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs.** *Yujia Qin (Tsinghua University) et al. arXiv.* [[paper](https://arxiv.org/abs/2307.16789)] [[code](https://github.com/openbmb/toolbench)] [[dataset](https://paperswithcode.com/dataset/toolbench)]
  - ToolLLM is a general tool-use framework encompassing data construction, model training and evaluation.
- [2023/05] **Large Language Models as Tool Makers.** *Tianle Cai (Princeton University) et al. arXiv.* [[paper](https://arxiv.org/abs/2305.17126)] [[code](https://github.com/ctlllll/llm-toolmaker)]
  - LATM is a closed-loop framework that takes an initial step towards removing the dependency on the availability of existing tools.
- [2023/05] **CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation.** *Cheng Qian (Tsinghua University) et al. arXiv.* [[paper](https://arxiv.org/abs/2305.14318)]
  - CREATOR is a novel framework that empowers LLMs to create their own tools through documentation and code realization.
- [2023/04] **Tool Learning with Foundation Models.** *Yujia Qin (Tsinghua University) et al. arXiv.* [[paper](https://arxiv.org/abs/2304.08354)] [[code](https://github.com/openbmb/bmtools)]
  - This survey introduces a new paradigm called "tool learning with foundation models", which combines the advantages of specialized tools and foundation models, achieving higher precision, efficiency, and automation in problem-solving.
- [2023/04] **ChemCrow: Augmenting large-language models with chemistry tools.** *Andres M Bran (Laboratory of Artificial Chemical Intelligence, ISIC, EPFL) et al.
arXiv.* [[paper](https://arxiv.org/abs/2304.05376)] [[code](https://github.com/ur-whitelab/chemcrow-public)]
  - ChemCrow is an LLM chemistry agent that integrates 13 expert-designed tools, augmenting LLM performance in chemistry and giving rise to new capabilities.
- [2023/04] **GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information.** *Qiao Jin (National Institutes of Health), Yifan Yang, Qingyu Chen, Zhiyong Lu. arXiv.* [[paper](https://arxiv.org/abs/2304.09667)] [[code](https://github.com/ncbi/GeneGPT)]
  - GeneGPT is a model that answers genomics questions. It introduces a novel method for handling hallucinations by teaching LLMs to use Web APIs.
- [2023/04] **OpenAGI: When LLM Meets Domain Experts.** *Yingqiang Ge (Rutgers University) et al. arXiv.* [[paper](https://arxiv.org/abs/2304.04370)] [[code](https://github.com/agiresearch/openagi)]
  - OpenAGI is an open-source AGI research platform. It introduces a paradigm of LLMs operating various expert models for complex task-solving and proposes an RLTF mechanism to improve the LLM's task-solving ability.
- [2023/03] **HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face.** *Yongliang Shen (Zhejiang University) et al. arXiv.* [[paper](https://arxiv.org/abs/2303.17580)] [[code](https://github.com/microsoft/JARVIS)]
  - HuggingGPT is a system that leverages LLMs to connect various multimodal AI models from machine learning communities to solve AI tasks.
- [2023/03] **Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models.** *Chenfei Wu (Microsoft Research Asia) et al. arXiv.* [[paper](https://arxiv.org/abs/2303.04671)] [[code](https://github.com/microsoft/visual-chatgpt)]
  - Visual ChatGPT is a system that opens the door to investigating the visual roles of ChatGPT with the help of Visual Foundation Models.
- [2023/02] **Augmented Language Models: a Survey.** *Grégoire Mialon (Meta AI) et al. TMLR.* [[paper](https://openreview.net/forum?id=jh7wH2AzKK)]
  - This survey reviews works in which LMs are augmented with the ability to use tools. Augmented LMs can use external modules to expand their context-processing ability.
- [2023/02] **Toolformer: Language Models Can Teach Themselves to Use Tools.** *Timo Schick (Meta AI) et al. arXiv.* [[paper](https://arxiv.org/abs/2302.04761)]
  - Toolformer shows that LLMs can teach themselves to use external tools with a handful of demonstrations for each API.
- [2022/05] **TALM: Tool Augmented Language Models.** *Aaron Parisi (Google) et al. arXiv.* [[paper](https://arxiv.org/abs/2205.12255)]
  - TALM introduces a method that combines non-differentiable tools with LMs, enabling the model to access real-time or private data.
- [2022/05] **MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning.** *Ehud Karpas (AI21 Labs) et al. arXiv.* [[paper](https://arxiv.org/abs/2205.00445)]
  - MRKL Systems augment LLMs with an easily extensible set of external knowledge and reasoning modules.
- [2022/04] **Do As I Can, Not As I Say: Grounding Language in Robotic Affordances.** *Michael Ahn (Google) et al. CoRL.* [[paper](https://proceedings.mlr.press/v205/ichter23a.html)]
  - SayCan applies LMs in real-world robotic tasks by combining advanced semantic knowledge from LLMs with the value functions of pre-trained skills.
- [2021/12] **WebGPT: Browser-assisted question-answering with human feedback.** *Reiichiro Nakano (OpenAI) et al. arXiv.* [[paper](https://arxiv.org/abs/2112.09332)]
  - WebGPT answers questions in a web-browsing environment. It uses imitation learning during training and then optimizes answer quality through human feedback.
- [2021/07] **Evaluating Large Language Models Trained on Code.** *Mark Chen (OpenAI) et al. arXiv.* [[paper](https://arxiv.org/abs/2107.03374)] [[code](https://github.com/openai/human-eval)]
  - Codex can synthesize programs from docstrings, that is, create tools from documentation.


#### 1.3.2 Embodied Action
- [2023/12] **Towards Learning a Generalist Model for Embodied Navigation.** *Duo Zheng (The Chinese University of Hong Kong) et al. arXiv.* [[paper](https://arxiv.org/abs/2312.02010)] [[code](https://github.com/zd11024/NaviLLM)]
- [2023/11] **An Embodied Generalist Agent in 3D World.** *Jiangyong Huang (BIGAI & Peking University) et al. arXiv.* [[paper](https://arxiv.org/abs/2311.12871)] [[project page](https://embodied-generalist.github.io/)]
- [2023/11] **JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models.** *ZiHao Wang (Peking University) et al. arXiv.* [[paper](https://arxiv.org/abs/2311.05997)] [[code](https://github.com/CraftJarvis/JARVIS-1)]
- [2023/10] **Lemur: Harmonizing Natural Language and Code for Language Agents.** *Yiheng Xu (University of Hong Kong) et al.
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.06830)] [[code](https:\u002F\u002Fgithub.com\u002FOpenLemur\u002FLemur)]\n- [2023\u002F10] **Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond.** *Liang Chen et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.02071)] [[code](https:\u002F\u002Fgithub.com\u002FPKUnlp-icler\u002FPCA-EVAL)]\n- [2023\u002F07] **Interactive language: Talking to robots in real time.** *Corey Lynch et al. IEEE (RAL).* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.06407.pdf)]\n- [2023\u002F05] **Voyager: An Open-Ended Embodied Agent with Large Language Models.** *Guanzhi Wang (NVIDIA) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16291)] [[project page](https:\u002F\u002Fvoyager.minedojo.org\u002F)] [[code](https:\u002F\u002Fgithub.com\u002FMineDojo\u002FVoyager)]\n- [2023\u002F05] **AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments.** *Sudipta Paul et al. NeurIPS.* [[paper](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002F28f699175783a2c828ae74d53dd3da20-Paper-Conference.pdf)]\n- [2023\u002F05] **EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought.** *Yao Mu et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.15021.pdf)] [[code](https:\u002F\u002Fgithub.com\u002FEmbodiedGPT\u002FEmbodiedGPT_Pytorch)]\n- [2023\u002F05] **NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models.** *Gengze Zhou et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.16986.pdf)]\n- [2023\u002F05] **AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation.** *Chuhao Jin et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.18898.pdf)]\n- [2023\u002F03] **PaLM-E: An Embodied Multimodal Language Model.** *Danny Driess et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.03378.pdf)]\n- [2023\u002F03] **Reflexion: Language Agents with Verbal Reinforcement Learning.** *Noah Shinn et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2303.11366.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Fnoahshinn024\u002Freflexion)]\n- [2023\u002F02] **Collaborating with language models for embodied reasoning.** *Ishita Dasgupta et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.00763.pdf)]\n- [2023\u002F02] **Code as Policies: Language Model Programs for Embodied Control.** *Jacky Liang et al. IEEE (ICRA).* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2209.07753.pdf)]\n- [2022\u002F10] **ReAct: Synergizing Reasoning and Acting in Language Models.** *Shunyu Yao et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.03629.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Fysymyth\u002FReAct)]\n- [2022\u002F10] **Instruction-Following Agents with Multimodal Transformer.** *Hao Liu et al. CVPR.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2210.13431.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Flhao499\u002Finstructrl)]\n- [2022\u002F07] **Inner Monologue: Embodied Reasoning through Planning with Language Models.** *Wenlong Huang et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2207.05608.pdf)]\n- [2022\u002F07] **LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action.** *Dhruv Shah et al. CoRL.* [[paper](https:\u002F\u002Fproceedings.mlr.press\u002Fv205\u002Fshah23b\u002Fshah23b.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Fblazejosinski\u002Flm_nav)]\n- [2022\u002F04] **Do As I Can, Not As I Say: Grounding Language in Robotic Affordances.** *Michael Ahn et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.01691.pdf)]\n- [2022\u002F01] **A Survey of Embodied AI: From Simulators to Research Tasks.** *Jiafei Duan et al. 
IEEE (TETCI).* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.04918.pdf)]\n- [2022\u002F01] **Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents.** *Wenlong Huang et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2201.07207v2.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Fhuangwl18\u002Flanguage-planner)]\n- [2020\u002F04] **Experience Grounds Language.** *Yonatan Bisk et al. EMNLP.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2004.10151.pdf)]\n- [2019\u002F03] **Review of Deep Reinforcement Learning for Robot Manipulation.** *Hai Nguyen et al. IEEE (IRC).* [[paper](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FHai-Nguyen-128\u002Fpublication\u002F355980729_Review_of_Deep_Reinforcement_Learning_for_Robot_Manipulation\u002Flinks\u002F6187ef153068c54fa5bb977e\u002FReview-of-Deep-Reinforcement-Learning-for-Robot-Manipulation.pdf)]\n- [2005\u002F01] **The Development of Embodied Cognition: Six Lessons from Babies.** *Linda Smith et al. Artificial Life.* [[paper](https:\u002F\u002Fcogdev.sitehost.iu.edu\u002Flabwork\u002F6_lessons.pdf)]\n\n## 2. 
Agents in Practice: Applications of LLM-based Agents\n\n\u003Cdiv align=center>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_readme_da4fb6cc2482.jpg\" width=\"60%\" \u002F>\u003C\u002Fdiv>\n\n### 2.1 General Ability of Single Agent\n\u003Cdiv align=center>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_readme_c9ca91e1e74c.jpg\" width=\"60%\" \u002F>\u003C\u002Fdiv>\n\n#### 2.1.1 Task-oriented Deployment\n**In web scenarios**\n- [2023\u002F10] **OpenAgents: An Open Platform for Language Agents in the Wild.** *XLang Lab (The University of Hong Kong) arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.10634)] [[project page](https:\u002F\u002Fdocs.xlang.ai)] [[code](https:\u002F\u002Fgithub.com\u002Fxlang-ai\u002FOpenAgents)] [[demo](https:\u002F\u002Fchat.xlang.ai)]\n- [2023\u002F07] **WebArena: A Realistic Web Environment for Building Autonomous Agents.** *Shuyan Zhou (CMU) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.13854)] [[code](https:\u002F\u002Fwebarena.dev\u002F)]\n- [2023\u002F07] **A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis.** *Izzeddin Gur (DeepMind) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.12856)]\n- [2023\u002F06] **SYNAPSE: Leveraging Few-Shot Exemplars for Human-Level Computer Control.** *Longtao Zheng (Nanyang Technological University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.07863)] [[code](https:\u002F\u002Fgithub.com\u002Fltzheng\u002Fsynapse)]\n- [2023\u002F06] **Mind2Web: Towards a Generalist Agent for the Web.** *Xiang Deng (The Ohio State University) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.06070)] [[code](https:\u002F\u002Fosu-nlp-group.github.io\u002FMind2Web\u002F)]\n- [2023\u002F05] **Multimodal Web Navigation with Instruction-Finetuned Foundation Models.** *Hiroki Furuta (The University of Tokyo) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.11854)]\n- [2023\u002F03] **Language Models can Solve Computer Tasks.** *Geunwoo Kim (University of California) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17491)] [[code](https:\u002F\u002Fgithub.com\u002Fposgnu\u002Frci-agent)]\n- [2022\u002F07] **WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents.** *Shunyu Yao (Princeton University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.01206)] [[code](https:\u002F\u002Fwebshop-pnlp.github.io\u002F)]\n- [2021\u002F12] **WebGPT: Browser-assisted question-answering with human feedback.** *Reiichiro Nakano (OpenAI) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.09332)]\n- [2023\u002F09] **Agents: An Open-source Framework for Autonomous Language Agents.** *Wangchunshu Zhou (AIWaves) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.07870.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Faiwaves-cn\u002Fagents)]\n- [2024\u002F04] **OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments.** *XLang Lab (The University of Hong Kong) arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.07972)] [[project page](https:\u002F\u002Fdocs.xlang.ai)] [[code](https:\u002F\u002Fgithub.com\u002Fxlang-ai\u002FOSWorld)] [[data viewer](https:\u002F\u002Fos-world.github.io\u002Fexplorer.html)]\n\n**In life scenarios**\n- [2023\u002F10] **OpenAgents: An Open Platform for Language Agents in the Wild.** *XLang Lab (The University of Hong Kong) arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.10634)] [[project page](https:\u002F\u002Fdocs.xlang.ai)] [[code](https:\u002F\u002Fgithub.com\u002Fxlang-ai\u002FOpenAgents)] [[demo](https:\u002F\u002Fchat.xlang.ai)]\n- [2023\u002F08] **InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent.** *Po-Lin Chen et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.01552)]\n- [2023\u002F05] **Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents.** *Yue Wu (CMU) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.02412)]\n- [2023\u002F05] **Augmenting Autotelic Agents with Large Language Models.** *Cédric Colas (MIT) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.12487)]\n- [2023\u002F03] **Planning with Large Language Models via Corrective Re-prompting.** *Shreyas Sundara Raman (Brown University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.09935)]\n- [2022\u002F10] **Generating Executable Action Plans with Environmentally-Aware Language Models.** *Maitrey Gramopadhye (University of North Carolina at Chapel Hill) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.04964)] [[code](https:\u002F\u002Fgithub.com\u002Fhri-ironlab\u002Fscene_aware_language_planner)]\n- [2022\u002F01] **Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents.** *Wenlong Huang (UC Berkeley) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.07207)] [[code](https:\u002F\u002Fwenlong.page\u002Flanguage-planner\u002F)]\n\n#### 2.1.2 Innovation-oriented Deployment\n- [2023\u002F10] **OpenAgents: An Open Platform for Language Agents in the Wild.** *XLang Lab (The University of Hong Kong) arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.10634)] [[project page](https:\u002F\u002Fdocs.xlang.ai)] [[code](https:\u002F\u002Fgithub.com\u002Fxlang-ai\u002FOpenAgents)] [[demo](https:\u002F\u002Fchat.xlang.ai)]\n- [2023\u002F08] **The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models.** *Haonan Li (UC Riverside) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.00245)]\n- [2023\u002F08] **ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks.** *Yeonghun Kang (Korea Advanced Institute of Science and Technology) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.01423)]\n- [2023\u002F07] **Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics.** *Melanie Swan (University College London) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.02502)]\n- [2023\u002F06] **Towards Autonomous Testing Agents via Conversational Large Language Models.** *Robert Feldt (Chalmers University of Technology) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.05152)]\n- [2023\u002F04] **Emergent autonomous scientific research capabilities of large language models.** *Daniil A. Boiko (CMU) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.05332)]\n- [2023\u002F04] **ChemCrow: Augmenting large-language models with chemistry tools.** *Andres M Bran (Laboratory of Artificial Chemical Intelligence, ISIC, EPFL) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.05376)] [[code](https:\u002F\u002Fgithub.com\u002Fur-whitelab\u002Fchemcrow-public)]\n- [2022\u002F03] **ScienceWorld: Is your Agent Smarter than a 5th Grader?** *Ruoyao Wang (University of Arizona) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.07540)] [[code](https:\u002F\u002Fsciworld.apps.allenai.org\u002F)]\n\n#### 2.1.3 Lifecycle-oriented Deployment\n- [2023\u002F05] **Voyager: An Open-Ended Embodied Agent with Large Language Models.** *Guanzhi Wang (NVIDIA) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16291)] [[project page](https:\u002F\u002Fvoyager.minedojo.org\u002F)] [[code](https:\u002F\u002Fgithub.com\u002FMineDojo\u002FVoyager)]\n- [2023\u002F05] **Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.** *Xizhou Zhu (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17144)] [[code](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FGITM)]\n- [2023\u002F03] **Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks.** *Haoqi Yuan (PKU) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.16563)] [[project page](https:\u002F\u002Fsites.google.com\u002Fview\u002Fplan4mc)]\n- [2023\u002F02] **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents.** *Zihao Wang (PKU) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.01560)] [[code](https:\u002F\u002Fgithub.com\u002FCraftJarvis\u002FMC-Planner)]\n- [2023\u002F01] **Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling.** *Kolby Nottingham (University of California, Irvine) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.12050)] [[code](https:\u002F\u002Fdeckardagent.github.io\u002F)]\n\n### 2.2 Coordinating Potential of Multiple Agents\n\u003Cdiv align=center>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_readme_74784d697618.jpg\" width=\"60%\" \u002F>\u003C\u002Fdiv>\n\n#### 2.2.1 Cooperative Interaction for Complementarity\n**Disordered cooperation**\n- [2023\u002F07] **Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration.** *Zhenhailong Wang (University of Illinois Urbana-Champaign) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.05300)] [[code](https:\u002F\u002Fgithub.com\u002FMikeWangWZHL\u002FSolo-Performance-Prompting)]\n- [2023\u002F07] **RoCo: Dialectic Multi-Robot Collaboration with Large Language Models.** *Zhao Mandi, Shreeya Jain, Shuran Song (Columbia University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.04738)] [[code](https:\u002F\u002Fproject-roco.github.io\u002F)]\n- [2023\u002F04] **ChatLLM Network: More brains, More intelligence.** *Rui Hao (Beijing University of Posts and Telecommunications) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.12998)]\n- [2023\u002F01] **Blind Judgement: Agent-Based Supreme Court Modelling With GPT.** *Sil Hamilton (McGill University). arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.05327)]\n- [2023\u002F09] **Agents: An Open-source Framework for Autonomous Language Agents.** *Wangchunshu Zhou (AIWaves) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.07870.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Faiwaves-cn\u002Fagents)]\n\n**Ordered cooperation**\n- [2023\u002F10] **AutoAgents: A Framework for Automatic Agent Generation.** *Guangyao Chen (Peking University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.17288)] [[code](https:\u002F\u002Fgithub.com\u002FLink-AGI\u002FAutoAgents)]\n- [2023\u002F09] **MindAgent: Emerging Gaming Interaction.** *Ran Gong (UCLA) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.09971)] [[code](https:\u002F\u002Fmindagent.github.io\u002F)]\n- [2023\u002F08] **CGMI: Configurable General Multi-Agent Interaction Framework.** *Shi Jinxin (East China Normal University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.12503)]\n- [2023\u002F08] **ProAgent: Building Proactive Cooperative AI with Large Language Models.** *Ceyao Zhang (The Chinese University of Hong Kong, Shenzhen) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.11339)] [[code](https:\u002F\u002Fpku-proagent.github.io\u002F)]\n- [2023\u002F08] **AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents.** *Weize Chen (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.10848)] [[code](https:\u002F\u002Fgithub.com\u002FOpenBMB\u002FAgentVerse)]\n- [2023\u002F08] **AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework.** *Qingyun Wu (Pennsylvania State University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.08155)] [[code](https:\u002F\u002Fmicrosoft.github.io\u002FFLAML\u002Fdocs\u002FUse-Cases\u002FAutogen\u002F)]\n- [2023\u002F08] **MetaGPT: Meta Programming for Multi-Agent Collaborative Framework.** *Sirui Hong (DeepWisdom) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.00352)] [[code](https:\u002F\u002Fgithub.com\u002Fgeekan\u002FMetaGPT)]\n- [2023\u002F07] **Communicative Agents for Software Development.** *Chen Qian (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.07924)] [[code](https:\u002F\u002Fgithub.com\u002Fopenbmb\u002Fchatdev)]\n- [2023\u002F06] **Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents.** *Yashar Talebirad (University of Alberta) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.03314)]\n- [2023\u002F05] **Training Socially Aligned Language Models in Simulated Human Society.** *Ruibo Liu (Dartmouth College) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16960)] [[code](https:\u002F\u002Fgithub.com\u002Fagi-templar\u002FStable-Alignment)]\n- [2023\u002F05] **SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks.** *Bill Yuchen Lin (Allen Institute for Artificial Intelligence) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17390)] [[code](https:\u002F\u002Fyuchenlin.xyz\u002Fswiftsage\u002F)]\n- [2023\u002F05] **ChatGPT as your Personal Data Scientist.** *Md Mahadi Hassan (Auburn University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13657)]\n- [2023\u002F03] **CAMEL: Communicative Agents for \"Mind\" Exploration of Large Scale Language Model Society.** *Guohao Li (King Abdullah University of Science and Technology) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17760)] [[code](https:\u002F\u002Fgithub.com\u002Flightaime\u002Fcamel)]\n- [2023\u002F03] **DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents.** *Varun Nair (Curai Health) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17071)] [[code](https:\u002F\u002Fgithub.com\u002Fcurai\u002Fcurai-research\u002Ftree\u002Fmain\u002FDERA)]\n- [2023\u002F04] **Self-collaboration Code Generation via ChatGPT.** *Yihong Dong (Peking University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.07590)]\n\n#### 2.2.2 Adversarial Interaction for Advancement\n- [2023\u002F08] **ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate.** *Chi-Min Chan (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.07201)] [[code](https:\u002F\u002Fgithub.com\u002Fthunlp\u002FChatEval)]\n- [2023\u002F05] **Improving Factuality and Reasoning in Language Models through Multiagent Debate.** *Yilun Du (MIT CSAIL) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14325)] [[code](https:\u002F\u002Fcomposable-models.github.io\u002Fllm_debate\u002F)]\n- [2023\u002F05] **Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback.** *Yao Fu (University of Edinburgh) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10142)] [[code](https:\u002F\u002Fgithub.com\u002FFranxYao\u002FGPT-Bargaining)]\n- [2023\u002F05] **Examining the Inter-Consistency of Large Language Models: An In-depth Analysis via Debate.** *Kai Xiong (Harbin Institute of Technology) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.11595)]\n- [2023\u002F05] **Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate.** *Tian Liang (Tsinghua University) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.19118)] [[code](https:\u002F\u002Fgithub.com\u002FSkytliang\u002FMulti-Agents-Debate)]\n\n### 2.3 Interactive Engagement between Human and Agent\n\u003Cdiv align=center>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_readme_6ed12799a232.jpg\" width=\"60%\" \u002F>\u003C\u002Fdiv>\n\n#### 2.3.1 Instructor-Executor Paradigm\n\n##### Education\n\n- [2023\u002F07] **Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics.** *Melanie Swan (UCL) et al. arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2307.02502)]\n  - Communicate with humans to help them understand and use mathematics.\n- [2023\u002F03] **Hey Dona! Can you help me with student course registration?** *Vishesh Kalvakurthi (MSU) et al. arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2303.13548)]\n  - This is a developed application called Dona that offers virtual voice assistance in student course registration, where humans provide instructions.\n\n##### Health\n\n- [2023\u002F08] **Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue.** *Songhua Yang (ZZU) et al. arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2308.03549)] [[code](https:\u002F\u002Fgithub.com\u002FSupritYoung\u002FZhongjing)]\n- [2023\u002F05] **HuatuoGPT, towards Taming Language Model to Be a Doctor.** *Hongbo Zhang (CUHK-SZ) et al. arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.15075)] [[code](https:\u002F\u002Fgithub.com\u002FFreedomIntelligence\u002FHuatuoGPT)] [[demo](https:\u002F\u002Fwww.huatuogpt.cn\u002F)]\n- [2023\u002F05] **Helping the Helper: Supporting Peer Counselors via AI-Empowered Practice and Feedback.** *Shang-Ling Hsu (Gatech) et al. 
arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.08982)]\n- [2020\u002F10] **A Virtual Conversational Agent for Teens with Autism Spectrum Disorder: Experimental Results and Design Lessons.** *Mohammad Rafayet Ali (U of R) et al. IVA '20.* [[paper](https:\u002F\u002Fdoi.org\u002F10.1145\u002F3383652.3423900)]\n\n##### Other Applications\n\n- [2023\u002F08] **RecMind: Large Language Model Powered Agent For Recommendation.** *Yancheng Wang (ASU, Amazon) et al. arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2308.14296)]\n- [2023\u002F08] **Multi-Turn Dialogue Agent as Sales' Assistant in Telemarketing.** *Wanting Gao (JNU) et al. IEEE.* [[paper](https:\u002F\u002Fdoi.org\u002F10.1109\u002FIJCNN54540.2023.10192042)]\n- [2023\u002F07] **PEER: A Collaborative Language Model.** *Timo Schick (Meta AI) et al. arXiv.* [[paper](https:\u002F\u002Fopenreview.net\u002Fpdf?id=KbYevcLjnc)]\n- [2023\u002F07] **DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations.** *Bo-Ru Lu (UW) et al. arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2307.07047)]\n- [2023\u002F08] **LLM As DBA [vision].** *Xuanhe Zhou (Tsinghua) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.05481)]\n- [2023\u002F06] **AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn.** *Difei Gao (NUS) et al. arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2306.08640)]\n- [2023\u002F09] **Agents: An Open-source Framework for Autonomous Language Agents.** *Wangchunshu Zhou (AIWaves) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.07870.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Faiwaves-cn\u002Fagents)]\n- [2023\u002F12] **D-Bot: Database Diagnosis System using Large Language Models.** *Xuanhe Zhou (Tsinghua) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.01454)] [[code](https:\u002F\u002Fgithub.com\u002FTsinghuaDatabaseGroup\u002FDB-GPT)]\n\n\n#### 2.3.2 Equal Partnership Paradigm\n\n##### Empathetic Communicator\n\n- [2023\u002F08] **SAPIEN: Affective Virtual Agents Powered by Large Language Models.** *Masum Hasan et al. arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2308.03022)] [[project page](https:\u002F\u002Fsapien.coach\u002F)]\n- [2023\u002F05] **Helping the Helper: Supporting Peer Counselors via AI-Empowered Practice and Feedback.** *Shang-Ling Hsu (Gatech) et al. arXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.08982)]\n- [2022\u002F07] **Artificial empathy in marketing interactions: Bridging the human-AI gap in affective and social customer experience.** *Yuping Liu‑Thompkins et al.* [[paper](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1007\u002Fs11747-022-00892-5)]\n\n##### Human-Level Participant\n\n- [2023\u002F08] **Quantifying the Impact of Large Language Models on Collective Opinion Dynamics.** *Chao Li et al. CoRR.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2308.03313)]\n- [2023\u002F06] **Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning.** *Anton Bakhtin et al. ICLR.* [[paper](https:\u002F\u002Fopenreview.net\u002Fpdf?id=F61FwJTZhb)]\n- [2023\u002F06] **Decision-Oriented Dialogue for Human-AI Collaboration.** *Jessy Lin et al. CoRR.* [[paper](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.20076)]\n- [2022\u002F11] **Human-level play in the game of Diplomacy by combining language models with strategic reasoning.** *FAIR et al. Science.* [[paper](https:\u002F\u002Fwww.science.org\u002Fdoi\u002F10.1126\u002Fscience.ade9097)]\n\n## 3. 
Agent Society: From Individuality to Sociality\n\u003Cdiv align=center>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_readme_dc52bd770fd9.jpg\" width=\"60%\" \u002F>\u003C\u002Fdiv>\n\n### 3.1 Behavior and Personality of LLM-based Agents\n\n#### 3.1.1 Social Behavior\n\n##### Individual behaviors\n- [2023\u002F10] **Lyfe Agents: Generative agents for low-cost real-time social interactions.** *Zhao Kaiya (MIT) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.02172)]\n- [2023\u002F05] **Voyager: An Open-Ended Embodied Agent with Large Language Models.** *Guanzhi Wang (NVIDIA) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16291)] [[code](https:\u002F\u002Fgithub.com\u002FMineDojo\u002FVoyager)] [[project page](https:\u002F\u002Fvoyager.minedojo.org\u002F)]\n- [2023\u002F04] **LLM+P: Empowering Large Language Models with Optimal Planning Proficiency.** *Bo Liu (University of Texas) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.11477)] [[code](https:\u002F\u002Fgithub.com\u002FCranial-XIX\u002Fllm-pddl)]\n- [2023\u002F03] **Reflexion: Language Agents with Verbal Reinforcement Learning.** *Noah Shinn (Northeastern University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11366)] [[code](https:\u002F\u002Fgithub.com\u002Fnoahshinn024\u002Freflexion)]\n- [2023\u002F03] **PaLM-E: An Embodied Multimodal Language Model.** *Danny Driess (Google) et al. ICML.* [[paper](http:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Fdriess23a\u002Fdriess23a.pdf)] [[project page](https:\u002F\u002Fpalm-e.github.io\u002F)]\n- [2023\u002F03] **ReAct: Synergizing Reasoning and Acting in Language Models.** *Shunyu Yao (Princeton University) et al. 
ICLR.* [[paper](https:\u002F\u002Fopenreview.net\u002Fpdf?id=WE_vluYUL-X)] [[project page](https:\u002F\u002Freact-lm.github.io\u002F)]\n- [2022\u002F01] **Chain-of-thought prompting elicits reasoning in large language models.** *Jason Wei (Google) et al. NeurIPS.* [[paper](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002F9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf)]\n\n##### Group behaviors\n- [2023\u002F10] **Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View.** *Jintian Zhang (Zhejiang University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.02124)] [[code](https:\u002F\u002Fgithub.com\u002Fzjunlp\u002FMachineSoM)]\n- [2023\u002F09] **MindAgent: Emerging Gaming Interaction.** *Ran Gong (UCLA) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.09971)] [[code](https:\u002F\u002Fmindagent.github.io\u002F)]\n- [2023\u002F09] **Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf.** *Yuzhuang Xu (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.04658)]\n- [2023\u002F09] **Suspicion Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4.** *Jiaxian Guo et al. arXiv.* [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2309.17277)]\n- [2023\u002F08] **AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents.** *Weize Chen (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.10848)] [[code](https:\u002F\u002Fgithub.com\u002FOpenBMB\u002FAgentVerse)]\n- [2023\u002F08] **AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework.** *Qingyun Wu (Pennsylvania State University) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.08155)] [[code](https:\u002F\u002Fmicrosoft.github.io\u002FFLAML\u002Fdocs\u002FUse-Cases\u002FAutogen\u002F)]\n- [2023\u002F08] **ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate.** *Chi-Min Chan (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.07201)] [[code](https:\u002F\u002Fgithub.com\u002Fthunlp\u002FChatEval)]\n- [2023\u002F08] **ProAgent: Building Proactive Cooperative AI with Large Language Models.** *Ceyao Zhang (The Chinese University of Hong Kong, Shenzhen) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.11339)] [[code](https:\u002F\u002Fpku-proagent.github.io\u002F)]\n- [2023\u002F07] **Communicative Agents for Software Development.** *Chen Qian (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.07924)] [[code](https:\u002F\u002Fgithub.com\u002Fopenbmb\u002Fchatdev)]\n- [2023\u002F07] **RoCo: Dialectic Multi-Robot Collaboration with Large Language Models.** *Zhao Mandi, Shreeya Jain, Shuran Song (Columbia University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.04738)] [[code](https:\u002F\u002Fproject-roco.github.io\u002F)]\n- [2023\u002F06] **Homophily in An Artificial Social Network of Agents Powered By Large Language Models.** *James K. He (University of Cambridge) et al. PsyArXiv.* [[paper](https:\u002F\u002Fdoi.org\u002F10.21203\u002Frs.3.rs-3096289\u002Fv1)]\n\n#### 3.1.2 Personality\n\n##### Cognition\n\n- [2023\u002F09] **Suspicion Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4.** *Jiaxian Guo et al. arXiv.* [[paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2309.17277)]\n- [2023\u002F03] **Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods.** *Thilo Hagendorff (University of Stuttgart) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13988)]\n- [2023\u002F03] **Mind meets machine: Unravelling GPT-4's cognitive psychology.** *Sifatkaur Dhingra (Nowrosjee Wadia College) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11436)]\n- [2022\u002F07] **Language models show human-like content effects on reasoning.** *Ishita Dasgupta (DeepMind) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.07051)]\n- [2022\u002F06] **Using cognitive psychology to understand GPT-3.** *Marcel Binz et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.14576)]\n\n##### Emotion\n\n- [2023\u002F07] **Emotional Intelligence of Large Language Models.** *Xuena Wang (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.09042)]\n- [2023\u002F05] **ChatGPT outperforms humans in emotional awareness evaluations.** *Zohar Elyoseph et al. Frontiers in Psychology.* [[paper](https:\u002F\u002Fwww.frontiersin.org\u002Farticles\u002F10.3389\u002Ffpsyg.2023.1199058\u002Ffull)]\n- [2023\u002F02] **Empathetic AI for Empowering Resilience in Games.** *Reza Habibi (University of California) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.09070)]\n- [2022\u002F12] **Computer says “No”: The Case Against Empathetic Conversational AI.** *Alba Curry (University of Leeds) et al. ACL.* [[paper](https:\u002F\u002Faclanthology.org\u002F2023.findings-acl.515.pdf)]\n\n##### Character\n\n- [2024\u002F05] **TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models.** *Jaewoo Ahn (Seoul National University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.18027)] [[code](https:\u002F\u002Fgithub.com\u002Fahnjaewoo\u002Ftimechara)]\n- [2023\u002F10] **Character-LLM: A Trainable Agent for Role-Playing.** *Yunfan Shao (Fudan University) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.10158)] [[code](https:\u002F\u002Fgithub.com\u002Fchoosewhatulike\u002Ftrainable-agents\u002F)]\n- [2023\u002F07] **Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models.** *Keyu Pan (ByteDance) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.16180)] [[code](https:\u002F\u002Fgithub.com\u002FHarderThenHarder\u002Ftransformers_tasks)]\n- [2023\u002F07] **Personality Traits in Large Language Models.** *Mustafa Safdari (DeepMind) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.00184)] [[code](https:\u002F\u002Fgithub.com\u002FHarderThenHarder\u002Ftransformers_tasks)]\n- [2022\u002F12] **Does GPT-3 Demonstrate Psychopathy? Evaluating Large Language Models from a Psychological Perspective.** *Xingxuan Li (Alibaba) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10529)]\n- [2022\u002F12] **Identifying and Manipulating the Personality Traits of Language Models.** *Graham Caron et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10276)]\n\n### 3.2 Environment for Agent Society\n\n#### 3.2.1 Text-based Environment\n\n- [2023\u002F08] **Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models.** *Aidan O’Gara (University of Southern California) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.01404)] [[code](https:\u002F\u002Fgithub.com\u002Faogara-ds\u002Fhoodwinked)]\n- [2023\u002F03] **CAMEL: Communicative Agents for \"Mind\" Exploration of Large Scale Language Model Society.** *Guohao Li (King Abdullah University of Science and Technology) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17760)] [[code](https:\u002F\u002Fgithub.com\u002Flightaime\u002Fcamel)]\n- [2020\u002F12] **Playing Text-Based Games with Common Sense.** *Sahith Dambekodi (Georgia Institute of Technology) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2012.02757.pdf)]\n- [2019\u002F09] **Interactive Fiction Games: A Colossal Adventure.** *Matthew Hausknecht (Microsoft Research) et al. AAAI.* [[paper](https:\u002F\u002Fcdn.aaai.org\u002Fojs\u002F6297\u002F6297-13-9522-1-10-20200516.pdf)] [[code](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fjericho)]\n- [2019\u002F03] **Learning to Speak and Act in a Fantasy Text Adventure Game.** *Jack Urbanek (Facebook) et al. ACL.* [[paper](https:\u002F\u002Faclanthology.org\u002FD19-1062.pdf)] [[code](https:\u002F\u002Fparl.ai\u002Fprojects\u002Flight\u002F)]\n- [2018\u002F06] **TextWorld: A Learning Environment for Text-based Games.** *Marc-Alexandre Côté (Microsoft Research) et al. IJCAI.* [[paper](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-030-24337-1_3)] [[code](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FTextWorld)]\n\n#### 3.2.2 Virtual Sandbox Environment\n\n- [2023\u002F11] **JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models.** *ZiHao Wang (Peking University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05997)] [[code](https:\u002F\u002Fgithub.com\u002FCraftJarvis\u002FJARVIS-1)]\n- [2023\u002F10] **Humanoid Agents: Platform for Simulating Human-like Generative Agents.** *Zhilin Wang (University of Washington and NVIDIA) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05418)] [[code](https:\u002F\u002Fgithub.com\u002FHumanoidAgents\u002FHumanoidAgents)] [[demo](https:\u002F\u002Fwww.humanoidagents.com\u002F)]\n- [2023\u002F08] **AgentSims: An Open-Source Sandbox for Large Language Model Evaluation.** *Jiaju Lin (PTA Studio) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.04026)] [[project page](https:\u002F\u002Fwww.agentsims.com\u002F)] [[code](https:\u002F\u002Fgithub.com\u002Fpy499372727\u002FAgentSims\u002F)]\n- [2023\u002F05] **Training Socially Aligned Language Models in Simulated Human Society.** *Ruibo Liu (Dartmouth College) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16960)] [[code](https:\u002F\u002Fgithub.com\u002Fagi-templar\u002FStable-Alignment)]\n- [2023\u002F05] **Voyager: An Open-Ended Embodied Agent with Large Language Models.** *Guanzhi Wang (NVIDIA) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16291)] [[project page](https:\u002F\u002Fvoyager.minedojo.org\u002F)] [[code](https:\u002F\u002Fgithub.com\u002FMineDojo\u002FVoyager)]\n- [2023\u002F04] **Generative Agents: Interactive Simulacra of Human Behavior.** *Joon Sung Park (Stanford University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.03442)] [[code](https:\u002F\u002Fgithub.com\u002Fjoonspk-research\u002Fgenerative_agents)]\n- [2023\u002F03] **Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks.** *Haoqi Yuan (PKU) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.16563)] [[project page](https:\u002F\u002Fsites.google.com\u002Fview\u002Fplan4mc)]\n- [2022\u002F06] **MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge.** *Linxi Fan (NVIDIA) et al. NeurIPS.* [[paper](https:\u002F\u002Fpapers.nips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002F74a67268c5cc5910f64938cac4526a90-Paper-Datasets_and_Benchmarks.pdf)] [[project page](https:\u002F\u002Fminedojo.org\u002F)]\n\n#### 3.2.3 Physical Environment\n\n- [2023\u002F11] **An Embodied Generalist Agent in 3D World.** *Jiangyong Huang (BIGAI & Peking University) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.12871)] [[project page](https:\u002F\u002Fembodied-generalist.github.io\u002F)]\n- [2023\u002F09] **RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking.** *Homanga Bharadhwaj (Carnegie Mellon University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.01918)] [[project page](https:\u002F\u002Frobopen.github.io\u002F)]\n- [2023\u002F05] **AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments.** *Sudipta Paul et al. NeurIPS.* [[paper](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002F28f699175783a2c828ae74d53dd3da20-Paper-Conference.pdf)]\n- [2023\u002F03] **PaLM-E: An Embodied Multimodal Language Model.** *Danny Driess (Google) et al. ICML.* [[paper](http:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Fdriess23a\u002Fdriess23a.pdf)] [[project page](https:\u002F\u002Fpalm-e.github.io\u002F)]\n- [2022\u002F10] **Interactive Language: Talking to Robots in Real Time.** *Corey Lynch (Google) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.06407)] [[code](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Flanguage-table)]\n\n\n### 3.3 Society Simulation with LLM-based Agents\n- [2024\u002F03] **Emergence of Social Norms in Large Language Model-based Agent Societies.** *Siyue Ren et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.08251)] [[code](https:\u002F\u002Fgithub.com\u002Fsxswz213\u002FCRSEC)] \n- [2023\u002F08] **AgentSims: An Open-Source Sandbox for Large Language Model Evaluation.** *Jiaju Lin (PTA Studio) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.04026)] [[project page](https:\u002F\u002Fwww.agentsims.com\u002F)] [[code](https:\u002F\u002Fgithub.com\u002Fpy499372727\u002FAgentSims\u002F)]\n- [2023\u002F07] **S\u003Csup>3\u003C\u002Fsup>: Social-network Simulation System with Large Language Model-Empowered Agents.** *Chen Gao (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.14984)]\n- [2023\u002F07] **Epidemic Modeling with Generative Agents.** *Ross Williams (Virginia Tech) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.04986)] [[code](https:\u002F\u002Fgithub.com\u002Fbear96\u002FGABM-Epidemic)] \n- [2023\u002F06] **RecAgent: A Novel Simulation Paradigm for Recommender Systems.** *Lei Wang (Renmin University of China) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.02552)]\n- [2023\u002F05] **Training Socially Aligned Language Models in Simulated Human Society.** *Ruibo Liu (Dartmouth College) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16960)] [[code](https:\u002F\u002Fgithub.com\u002Fagi-templar\u002FStable-Alignment)]\n- [2023\u002F04] **Generative Agents: Interactive Simulacra of Human Behavior.** *Joon Sung Park (Stanford University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.03442)] [[code](https:\u002F\u002Fgithub.com\u002Fjoonspk-research\u002Fgenerative_agents)]\n- [2022\u002F08] **Social Simulacra: Creating Populated Prototypes for Social Computing Systems.** *Joon Sung Park (Stanford University) et al. UIST.* [[paper](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3526113.3545616)]\n\n## 4. Other Topics\n\n### 4.1 Benchmarks for LLM-based Agents\n- [2023\u002F11] **\"MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration.\"** *Lin Xu et al.* (NUS, ByteDance, Stanford & UC Berkeley) arXiv. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.08562)] [[project page](https:\u002F\u002Fzhiyuanhubj.github.io\u002FMAgIC\u002F)] [[code](https:\u002F\u002Fgithub.com\u002Fcathyxl\u002FMAgIC)]\n  - The work presents a benchmarking framework for evaluating LLMs in multi-agent settings, showing a 50% average improvement using Probabilistic Graphical Modeling.\n- [2023\u002F10] **\"Benchmarking Large Language Models As AI Research Agents.\"** *Qian Huang (Stanford) et al.* arXiv. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.03302)] [[code](https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002FMLAgentBench)]\n- [2023\u002F08] **\"AgentBench: Evaluating LLMs as Agents.\"** *Xiao Liu (THU) et al.* arXiv. [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.03688)] [[code](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAgentBench)] [[project page](https:\u002F\u002Fllmbench.ai\u002F)]\n  - AgentBench, a benchmark for assessing LLMs as agents, shows a performance gap between top commercial and open-source models.\n- [2023\u002F10] **\"SmartPlay: A Benchmark for LLMs as Intelligent Agents.\"** *Yue Wu (CMU & Microsoft) et al.* arXiv. 
[[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.01557)] [[code](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FSmartPlay)]\n  - SmartPlay is a benchmark and methodology for evaluating LLMs as intelligent agents, featuring six diverse games to assess key capabilities, providing a roadmap for identifying gaps in current methodologies.\n- [2024\u002F04] **\"OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments.\"** *XLang Lab (The University of Hong Kong) arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.07972)] [[project page](https:\u002F\u002Fdocs.xlang.ai)] [[code](https:\u002F\u002Fgithub.com\u002Fxlang-ai\u002FOSWorld)] [[data viewer](https:\u002F\u002Fos-world.github.io\u002Fexplorer.html)]\n  - OSWorld🖥️ is a unified, real computer environment for multimodal agents to benchmark open-ended computer tasks with arbitrary apps and interfaces on Ubuntu, Windows, & macOS.\n\n### 4.2 Training and Optimizing LLM-based Agents\n- [2024\u002F06] **AgentGym: Evolving Large Language Model-based Agents across Diverse Environments.** *Zhiheng Xi (Fudan University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.04151)] [[project page](https:\u002F\u002Fagentgym.github.io\u002F)] [[codes and platform](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FAgentGym)] [[dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAgentGym\u002FAgentTraj-L)] [[benchmark](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAgentGym\u002FAgentEval)] [[model](https:\u002F\u002Fhuggingface.co\u002FAgentGym\u002FAgentEvol-7B)].\n- [2023\u002F10] **FireAct: Toward Language Agent Fine-tuning.** *Baian Chen (System2 Research) et al. 
arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05915)] [[project page](https:\u002F\u002Ffireact-agent.github.io\u002F)] [[code](https:\u002F\u002Fgithub.com\u002Fanchen1011\u002FFireAct)] [[dataset](https:\u002F\u002Fgithub.com\u002Fanchen1011\u002FFireAct\u002Ftree\u002Fmain\u002Fdata)]\n- [2023\u002F10] **AgentTuning: Enabling Generalized Agent Abilities for LLMs.** *Aohan Zeng (Tsinghua University) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.12823)] [[project page](https:\u002F\u002Fthudm.github.io\u002FAgentTuning\u002F)] [[code](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAgentTuning)] [[dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTHUDM\u002FAgentInstruct)]\n- [2023\u002F10] **Lemur: Harmonizing Natural Language and Code for Language Agents.** *Yiheng Xu (University of Hong Kong) et al. arXiv.* [[paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.06830)] [[code](https:\u002F\u002Fgithub.com\u002FOpenLemur\u002FLemur)]\n\n## Citation\nIf you find this repository useful, please cite our paper:\n\n```\n@misc{xi2023rise,\n      title={The Rise and Potential of Large Language Model Based Agents: A Survey}, \n      author={Zhiheng Xi and Wenxiang Chen and Xin Guo and Wei He and Yiwen Ding and Boyang Hong and Ming Zhang and Junzhe Wang and Senjie Jin and Enyu Zhou and Rui Zheng and Xiaoran Fan and Xiao Wang and Limao Xiong and Yuhao Zhou and Weiran Wang and Changhao Jiang and Yicheng Zou and Xiangyang Liu and Zhangyue Yin and Shihan Dou and Rongxiang Weng and Wensen Cheng and Qi Zhang and Wenjuan Qin and Yongyan Zheng and Xipeng Qiu and Xuanjing Huang and Tao Gui},\n      year={2023},\n      eprint={2309.07864},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI}\n}\n```\n\n\n## Project Maintainers & Contributors\n- Zhiheng Xi （奚志恒, [@WooooDyy](https:\u002F\u002Fgithub.com\u002FWooooDyy)）\n- Wenxiang Chen （陈文翔, [@chenwxOggai](https:\u002F\u002Fgithub.com\u002FchenwxOggai)）\n- Xin Guo （郭昕, 
[@XinGuo2002](https:\u002F\u002Fgithub.com\u002FXinGuo2002)）\n- Wei He（何为, [@hewei2001](https:\u002F\u002Fgithub.com\u002Fhewei2001)）\n- Yiwen Ding （丁怡文, [@Yiwen-Ding](https:\u002F\u002Fgithub.com\u002FYiwen-Ding)）\n- Boyang Hong（洪博杨, [@HongBoYang](https:\u002F\u002Fgithub.com\u002FHBY-hub)）\n- Ming Zhang （张明, [@KongLongGeFDU](https:\u002F\u002Fgithub.com\u002FKongLongGeFDU)）\n- Junzhe Wang（王浚哲, [@zsxmwjz](https:\u002F\u002Fgithub.com\u002Fzsxmwjz)）\n- Senjie Jin（金森杰, [@Leonnnnnn929](https:\u002F\u002Fgithub.com\u002FLeonnnnnn929)）\n\n## Contact\n- Zhiheng Xi: zhxi22@m.fudan.edu.cn\n\n\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_readme_8ba72ed8f802.png)](https:\u002F\u002Fstar-history.com\u002F#WooooDyy\u002FLLM-Agent-Paper-List&Date)\n\n","# 基于大型语言模型的智能体的兴起与潜力：综述\n\n🔥 **基于LLM的智能体必读论文。**\n\n🏃 **即将推出：为每篇论文添加一句话简介。**\n\n## 🔔 新闻\n\n- 🎉 [2025\u002F09\u002F10] 注意！您可以在AgentGym中开发自定义环境，并在其上进行强化学习！教程请见[这里](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FAgentGym\u002Fblob\u002Fmain\u002Fdocs\u002Ftutorials\u002Fen\u002F05-2nd-Development.md)。\n- 🍺 [2025\u002F09\u002F10] arXiv上新论文发布：[AgentGym-RL：通过多轮强化学习训练LLM智能体进行长时程决策](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.08755)。\n- 🚀 [2025\u002F09\u002F10] AgentGym-RL框架发布！我们推出了AgentGym的强化学习（RL）版本，使智能体能够直接从交互式环境中学习：[AgentGym-RL](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FAgentGym-RL)。\n- 👀 [2025\u002F09\u002F03] AgentGym现提供交互式前端用于可视化。研究人员可以回放和检查完整的轨迹、逐步查看智能体的决策过程，并更方便地分析模型行为。\n- ☄️ [2024\u002F06\u002F07] AgentGym已发布，用于在多样化环境中开发和演化基于LLM的智能体！\n  - 论文：[AgentGym](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.04151)。\n  - 项目页面：[https:\u002F\u002Fagentgym.github.io\u002F](https:\u002F\u002Fagentgym.github.io\u002F)。\n  - 代码：[平台与实现](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FAgentGym)。\n  - 
Hugging Face 资源：[AgentTraj-L](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAgentGym\u002FAgentTraj-L)、[AgentEval](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAgentGym\u002FAgentEval)、[AgentEvol-7B](https:\u002F\u002Fhuggingface.co\u002FAgentGym\u002FAgentEvol-7B)。\n- 🎉 [2024\u002F05\u002F02] R3（[通过逆向课程强化学习训练大型语言模型进行推理](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05808)）已被ICML 2024接受！\n- 💫 [2024\u002F02\u002F08] 关于LLM智能体推理的强化学习新论文R3已发布！论文：[通过逆向课程强化学习训练大型语言模型进行推理](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05808)。代码：[LLM-Reverse-Curriculum-RL](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FLLM-Reverse-Curriculum-RL)。\n- 🥳 [2023\u002F09\u002F20] 本项目已被列入[GitHub Trending](https:\u002F\u002Fgithub.com\u002Ftrending)! 这是一项巨大的荣誉！\n- 💥 [2023\u002F09\u002F15] 我们的综述已发布！论文详见[基于大型语言模型的智能体的兴起与潜力：综述](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.07864)！\n- ✨ [2023\u002F09\u002F14] 我们创建了这个仓库，用于维护关于基于LLM的智能体的论文列表。更多论文即将发布！\n\n\u003Cdiv align=center>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_readme_9a9ae4a3358d.jpg\" width=\"80%\" \u002F>\u003C\u002Fdiv>\n\n\n## 🌟 引言\n\n长期以来，人类一直在追求达到或超越人类水平的人工智能（AI），而AI智能体被视为实现这一目标的有力工具。AI智能体是能够感知环境、做出决策并采取行动的人工实体。\n\n由于其多功能性和卓越的能力，大型语言模型（LLMs）被认为是通用人工智能（AGI）的潜在火花，为构建通用AI智能体带来了希望。许多研究工作已经以LLMs为基础来构建AI智能体，并取得了显著进展。\n\n在本仓库中，我们提供了关于基于LLM的智能体的系统性综合综述，并列出了几篇必读论文。\n\n具体而言，我们首先介绍基于LLM的智能体的一般概念框架：该框架由大脑、感知和行动三个主要组件构成，并可根据不同应用进行定制。随后，我们探讨了基于LLM的智能体在三个方面的大规模应用：单智能体场景、多智能体场景以及人机协作。接着，我们深入研究智能体社会，探索基于LLM的智能体的行为与个性、它们形成社会时出现的社会现象，以及这些现象对人类社会的启示。最后，我们讨论了该领域内的一系列关键议题和开放问题。\n\n**我们非常感谢通过PR、Issue、邮件或其他方式提出的任何贡献。**\n\n## 目录 (ToC)\n\n\n- [基于大型语言模型的智能体的兴起与潜力：综述](#the-rise-and-potential-of-large-language-model-based-agents-a-survey)\n  - [🔔 新闻](#-news)\n  - [🌟 引言](#-introduction)\n  - [目录 (ToC)](#table-of-content-toc)\n  - [1. 
智能体的诞生：基于LLM的智能体构建](#1-the-birth-of-an-agent-construction-of-llm-based-agents)\n    - [1.1 大脑：主要由LLM构成](#11-brain-primarily-composed-of-an-llm)\n      - [1.1.1 自然语言交互](#111-natural-language-interaction)\n        - [高质量生成](#high-quality-generation)\n        - [深度理解](#deep-understanding)\n      - [1.1.2 知识](#112-knowledge)\n        - [预训练模型](#pretrain-model)\n        - [语言知识](#linguistic-knowledge)\n        - [常识性知识](#commonsense-knowledge)\n        - [可操作性知识](#actionable-knowledge)\n        - [知识可能存在的问题](#potential-issues-of-knowledge)\n      - [1.1.3 记忆](#113-memory)\n        - [记忆能力](#memory-capability)\n          - [提升Transformer的长度限制](#raising-the-length-limit-of-transformers)\n          - [总结记忆](#summarizing-memory)\n          - [用向量或数据结构压缩记忆](#compressing-memories-with-vectors-or-data-structures)\n        - [记忆检索](#memory-retrieval)\n      - [1.1.4 推理与规划](#114-reasoning--planning)\n        - [推理](#reasoning)\n        - [规划](#planning)\n          - [制定计划](#plan-formulation)\n          - [反思计划](#plan-reflection)\n      - [1.1.5 可迁移性和泛化能力](#115-transferability-and-generalization)\n        - [未见任务的泛化](#unseen-task-generalization)\n        - [上下文学习](#in-context-learning)\n        - [持续学习](#continual-learning)\n    - [1.2 感知：面向基于LLM的智能体的多模态输入](#12-perception-multimodal-inputs-for-llm-based-agents)\n      - [1.2.1 视觉](#121-visual)\n      - [1.2.2 音频](#122-audio)\n    - [1.3 行动：扩展基于LLM的智能体的动作空间](#13-action-expand-action-space-of-llm-based-agents)\n      - [1.3.1 工具使用](#131-tool-using)\n      - [1.3.2 具身行动](#132-embodied-action)\n  - [2. 
智能体的应用实践：基于LLM的智能体的用途](#2-agents-in-practice-applications-of-llm-based-agents)\n    - [2.1 单个智能体的通用能力](#21-general-ability-of-single-agent)\n      - [2.1.1 任务导向型部署](#211-task-oriented-deployment)\n      - [2.1.2 创新导向型部署](#212-innovation-oriented-deployment)\n      - [2.1.3 生命周期导向型部署](#213-lifecycle-oriented-deployment)\n    - [2.2 多智能体的协同潜力](#22-coordinating-potential-of-multiple-agents)\n      - [2.2.1 合作互动以实现互补](#221-cooperative-interaction-for-complementarity)\n      - [2.2.2 对抗性互动以促进进步](#222-adversarial-interaction-for-advancement)\n    - [2.3 人机交互参与](#23-interactive-engagement-between-human-and-agent)\n      - [2.3.1 指导者-执行者模式](#231-instructor-executor-paradigm)\n        - [教育](#education)\n        - [健康](#health)\n        - [其他应用](#other-application)\n      - [2.3.2 平等伙伴关系模式](#232-equal-partnership-paradigm)\n        - [共情沟通者](#empathetic-communicator)\n        - [人类水平参与者](#human-level-participant)\n  - [3. 智能体社会：从个体到社会性](#3-agent-society-from-individuality-to-sociality)\n    - [3.1 基于LLM的智能体的行为与个性](#31-behavior-and-personality-of-llm-based-agents)\n      - [3.1.1 社会行为](#311-social-behavior)\n        - [个体行为](#individual-behaviors)\n        - [群体行为](#group-behaviors)\n      - [3.1.2 个性](#312-personality)\n        - [认知](#cognition)\n        - [情感](#emotion)\n        - [性格](#character)\n    - [3.2 智能体社会的环境](#32-environment-for-agent-society)\n      - [3.2.1 文本环境](#321-text-based-environment)\n      - [3.2.2 虚拟沙盒环境](#322-virtual-sandbox-environment)\n      - [3.2.3 物理环境](#323-physical-environment)\n    - [3.3 基于LLM的智能体的社会模拟](#33-society-simulation-with-llm-based-agents)\n  - [4. 其他主题](#4-other-topics)\n    - [4.1 基于LLM的智能体的基准测试](#41-benchmarks-for-llm-based-agents)\n    - [4.2 基于LLM的智能体的训练与优化](#42-training-and-optimizing-llm-based-agents)\n  - [引用](#citation)\n  - [项目维护者与贡献者](#project-maintainers--contributors)\n  - [联系方式](#contact)\n  - [星标历史](#star-history)\n\n\n\n\n\n\n## 1. 
智能体的诞生：基于LLM的智能体构建\n\u003Cdiv align=center>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_readme_ff2e2079c50f.jpg\" width=\"80%\" \u002F>\u003C\u002Fdiv>\n\n### 1.1 大脑：主要由LLM构成\n\n#### 1.1.1 自然语言交互\n\n##### 高质量生成\n\n\n- [2023\u002F10] **通过多模态大型语言模型实现端到端具身决策：与GPT4-Vision及其他模型的探索。** *陈亮等人 arXiv.* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.02071)] [[代码](https:\u002F\u002Fgithub.com\u002FPKUnlp-icler\u002FPCA-EVAL)]\n  - 该研究提出了PCA-EVAL，从感知、认知和行动三个层面，对基于MLLM的端到端方法以及基于LLM的工具使用方法进行具身决策的基准测试。\n- [2023\u002F06] **LLM-Eval：针对大型语言模型开放域对话的统一多维度自动评估。** *林延婷等人 arXiv.* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.13711)]\n  - LLM-Eval方法评估了内容、语法、相关性及恰当性等多个维度。\n- [2023\u002F04] **ChatGPT是否为高度流畅的语法错误修正系统？一项全面评估。** *方涛等人 arXiv.* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2304.01746)]\n  - 评估结果显示，ChatGPT具有出色的错误检测能力，并且能够自由地纠正错误，使修正后的句子非常流畅。此外，其在非英语和低资源环境中的表现也凸显了它在多语言语法错误修正任务中的潜力。\n- [2023\u002F02] **ChatGPT在推理、幻觉和交互性方面的多任务、多语言、多模态评估。** *Yejin Bang等人 arXiv.* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2302.04023)]\n  - 该研究使用涵盖8种常见NLP应用任务的21个数据集，评估了ChatGPT的多任务、多语言和多模态特性。\n\n##### 深度理解\n\n- [2023\u002F06] **聪明汉斯还是神经心智理论？大型语言模型中的社交推理压力测试。** *Natalie Shapira 等人，arXiv。* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.14763)]\n  - LLMs 展现出一定的心智理论能力，但这种行为远未达到稳健水平。\n- [2022\u002F08] **从上下文中通过语言推断奖励。** *Jessy Lin 等人，ACL。* [[论文](https:\u002F\u002Fdoi.org\u002F10.18653\u002Fv1\u002F2022.acl-long.585)]\n  - 该研究提出了一种能够从语言中推断奖励并在未见环境中预测最优动作的模型。\n- [2021\u002F10] **基于心智理论的复杂人机协作辅助沟通。** *Moritz C. 
Buehler 等人，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.01355)]\n  - 该研究设计了一个在交互过程中理解人类意图的智能体 Sushi。\n\n#### 1.1.2 知识\n\n##### 预训练模型\n\n- [2020\u002F02] **语言模型的参数中能容纳多少知识？** *Adam Roberts（谷歌）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.08910)]\n- [2020\u002F01] **神经语言模型的规模法则。** *Jared Kaplan（约翰霍普金斯大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2001.08361)]\n- [2017\u002F12] **机器智能中的常识知识。** *Niket Tandon（艾伦人工智能研究所）等，SIGMOD。* [[论文](https:\u002F\u002Fsigmodrecord.org\u002Fpublications\u002FsigmodRecord\u002F1712\u002Fpdfs\u002F09_reports_Tandon.pdf)]\n- [2016\u002F02] **从未标注数据中学习句子的分布式表示。** *Felix Hill（剑桥大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.03483)]\n- [2011\u002F03] **从零开始的自然语言处理（几乎）。** *Ronan Collobert（普林斯顿大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1103.0398)]\n\n##### 语言学知识\n\n- [2023\u002F02] **ChatGPT 在推理、幻觉和交互性方面的多任务、多语言、多模态评估。** *Yejin Bang 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.04023)]\n- [2021\u002F06] **探测预训练语言模型中的语义属性及其取值。** *Meriem Beloucif 等，EMNLP。* [[论文](https:\u002F\u002Faclanthology.org\u002F2021.findings-emnlp.218\u002F)]\n- [2020\u002F10] **探测预训练语言模型中的词汇语义。** *Ivan Vulić 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.05731)]\n- [2019\u002F04] **一种用于在词表示中寻找句法结构的结构性探测方法。** *John Hewitt 等，ACL。* [[论文](https:\u002F\u002Faclanthology.org\u002FN19-1419\u002F)]\n- [2016\u002F04] **在获得更多语义知识的情况下改进自动关键词提取。** *H Leung。高级应用系统。* [[论文](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-319-32055-7_10)]\n\n##### 常识知识\n\n- [2022\u002F10] **代码语言模型是少样本的常识学习者。** *Aman Madaan 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.07128)]\n- [2021\u002F04] **上下文语言模型中的关系型世界知识表示：综述。** *Tara Safavi 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.05837)]\n- [2019\u002F11] **我们如何知道语言模型掌握了什么知识？** *Zhengbao Jiang 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.12543)]\n\n##### 可操作知识\n\n- 
[2023\u002F07] **大型语言模型在医学领域的应用。** *Arun James Thirunavukarasu 等，Nature Medicine。* [[论文](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fs41591-023-02448-8)]\n- [2023\u002F06] **DS-1000：一个自然且可靠的用于数据科学代码生成的基准测试。** *Yuhang Lai 等，ICML。* [[论文](https:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Flai23b.html)]\n- [2022\u002F10] **代码语言模型是少样本的常识学习者。** *Aman Madaan 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.07128)]\n- [2022\u002F02] **对大型代码语言模型的系统性评估。** *Frank F. Xu 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.13169)]\n- [2021\u002F10] **训练验证器解决数学文字题。** *Karl Cobbe 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.14168)]\n\n##### 知识的潜在问题\n\n- [2023\u002F10] **FreshLLMs：利用搜索引擎增强刷新大型语言模型。** *Tu Vu（谷歌）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.03214)] [[代码](https:\u002F\u002Fgithub.com\u002Ffreshllms\u002Ffreshqa)]\n- [2023\u002F05] **编辑大型语言模型：问题、方法与机遇。** *Yunzhi Yao 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.13172)]\n- [2023\u002F05] **Self-Checker：用于大型语言模型事实核查的即插即用模块。** *Miaoran Li 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14623)]\n- [2023\u002F05] **CRITIC：大型语言模型可通过工具交互式批评实现自我修正。** *Zhibin Gou 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.11738)]\n- [2023\u002F04] **使用基础模型进行工具学习。** *Yujia Qin 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.08354)]\n- [2023\u002F03] **SelfCheckGPT：针对生成式大型语言模型的零资源黑盒幻觉检测。** *Potsawee Manakul 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.08896)]\n- [2022\u002F06] **大规模基于记忆的模型编辑。** *Eric Mitchell 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.06520)]\n- [2022\u002F04] **关于语言模型作为知识库的综述。** *Badr AlKhamissi 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.06031)]\n- [2021\u002F04] **编辑语言模型中的事实性知识。** *Nicola De Cao 等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.08164)]\n- [2017\u002F08] **衡量神经网络中的灾难性遗忘现象。** *Ronald Kemker 等，arXiv。* 
[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1708.02072)]\n\n#### 1.1.3 记忆\n\n##### 记忆能力\n\n###### 提升 Transformer 的序列长度限制\n\n- [2023\u002F10] **MemGPT：迈向将大语言模型作为操作系统。** *查尔斯·帕克（加州大学伯克利分校）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.08560)] [[项目页面](https:\u002F\u002Fmemgpt.ai\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Fcpacker\u002FMemGPT)] [[数据集](https:\u002F\u002Fhuggingface.co\u002FMemGPT)]\n- [2023\u002F05] **随机位置编码提升Transformer模型的长度泛化能力。** *安尼安·鲁奥斯（DeepMind）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16843)] [[代码](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Frandomized_positional_encodings)]\n- [2023\u002F03] **CoLT5：基于条件计算的更快速长距离Transformer模型。** *乔舒亚·艾恩斯利（谷歌研究院）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.09752)]\n- [2022\u002F03] **利用Transformer高效分类长文档。** *玄智海莉·朴（伊利诺伊大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.11258)] [[代码](https:\u002F\u002Fgithub.com\u002Famazon-science\u002Fefficient-longdoc-classification)]\n- [2021\u002F12] **LongT5：用于长序列的高效文本到文本Transformer模型。** *曼迪·郭（谷歌研究院）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.07916)] [[代码](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Flongt5)]\n- [2019\u002F10] **BART：面向自然语言生成、翻译和理解的去噪序列到序列预训练模型。** *迈克尔·刘易斯（Facebook AI）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.13461)] [[代码](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Ftree\u002Fmain\u002Fsrc\u002Ftransformers\u002Fmodels\u002Fbart)]\n\n###### 总结记忆\n\n- [2023\u002F10] **穿越记忆迷宫：通过交互式阅读突破上下文限制。** *霍华德·陈（普林斯顿大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05029)]\n- [2023\u002F09] **通过大型语言模型链式调用赋能私人辅导。** *陈宇林（清华大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.08112)]\n- [2023\u002F08] **ExpeL：LLM代理是体验式学习者。** *赵安德（清华大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.10144)] 
[[代码](https:\u002F\u002Fgithub.com\u002FAndrewzh112\u002FExpeL)]\n- [2023\u002F08] **ChatEval：通过多智能体辩论打造更优秀的基于LLM的评估工具。** *陈志敏（清华大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.07201)] [[代码](https:\u002F\u002Fgithub.com\u002Fthunlp\u002FChatEval)]\n- [2023\u002F05] **MemoryBank：用长期记忆增强大型语言模型。** *钟万军（哈尔滨工业大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10250)] [[代码](https:\u002F\u002Fgithub.com\u002Fzhongwanjun\u002Fmemorybank-siliconfriend)]\n- [2023\u002F04] **生成式代理：人类行为的交互式模拟体。** *朴俊成（斯坦福大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.03442)] [[代码](https:\u002F\u002Fgithub.com\u002Fjoonspk-research\u002Fgenerative_agents)]\n- [2023\u002F04] **自控记忆系统释放大规模语言模型的无限输入容量。** *梁新念（北京航空航天大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.13343)] [[代码](https:\u002F\u002Fgithub.com\u002Fwbbeyourself\u002Fscm4llms)]\n- [2023\u002F03] **Reflexion：具备言语强化学习能力的语言代理。** *诺亚·辛恩（东北大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11366)] [[代码](https:\u002F\u002Fgithub.com\u002Fnoahshinn024\u002Freflexion)]\n- [2023\u002F05] **RecurrentGPT：交互式生成任意长度文本。** *周旺春树（AIWaves）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13304.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Faiwaves-cn\u002FRecurrentGPT)]\n\n\n###### 用向量或数据结构压缩记忆\n\n- [2023\u002F07] **用于软件开发的沟通型智能体。** *钱晨（清华大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.07924)] [[代码](https:\u002F\u002Fgithub.com\u002Fopenbmb\u002Fchatdev)]\n- [2023\u002F06] **ChatDB：以数据库作为符号化记忆增强LLM。** *胡晨旭（清华大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.03901)] [[代码](https:\u002F\u002Fgithub.com\u002Fhuchenxucs\u002FChatDB)]\n- [2023\u002F05] **Minecraft中的幽灵：基于文本知识与记忆的大语言模型在开放世界环境中的通用智能体。** *朱锡洲（清华大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.17144)] [[代码](https:\u002F\u002Fgithub.com\u002FOpenGVLab\u002FGITM)]\n- [2023\u002F05] **RET-LLM：迈向大型语言模型的通用读写记忆。** 
*阿里·莫达雷西（慕尼黑大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14322)]\n- [2023\u002F05] **RecurrentGPT：交互式生成任意长度文本。** *周旺春树（AIWaves）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13304.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Faiwaves-cn\u002FRecurrentGPT)]\n\n##### 记忆检索\n\n- [2023\u002F08] **记忆沙盒：对话型智能体的透明且交互式记忆管理。** *黄子恒（加州大学圣地亚哥分校）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.01542)]\n- [2023\u002F08] **AgentSims：一个用于大型语言模型评估的开源沙盒。** *林家驹（PTA Studio）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.04026)] [[项目页面](https:\u002F\u002Fwww.agentsims.com\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Fpy499372727\u002FAgentSims\u002F)]\n- [2023\u002F06] **ChatDB：以数据库作为符号化记忆增强LLM。** *胡晨旭（清华大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.03901)] [[代码](https:\u002F\u002Fgithub.com\u002Fhuchenxucs\u002FChatDB)]\n- [2023\u002F05] **MemoryBank：用长期记忆增强大型语言模型。** *钟万军（哈尔滨工业大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.10250)] [[代码](https:\u002F\u002Fgithub.com\u002Fzhongwanjun\u002Fmemorybank-siliconfriend)]\n- [2023\u002F05] **RecurrentGPT：交互式生成任意长度文本。** *周旺春树（AIWaves）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.13304.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Faiwaves-cn\u002FRecurrentGPT)]\n- [2023\u002F04] **生成式代理：人类行为的交互式模拟体。** *朴俊成（斯坦福大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.03442)] [[代码](https:\u002F\u002Fgithub.com\u002Fjoonspk-research\u002Fgenerative_agents)]\n\n\n#### 1.1.4 推理与规划\n\n##### 推理\n\n- [2024\u002F02] **通过逆向课程强化学习训练大型语言模型进行推理。** *奚志恒（复旦大学）等，arXiv预印本。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2402.05808)] [[代码](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FLLM-Reverse-Curriculum-RL)]\n- [2023\u002F09] **ReConcile：圆桌会议通过多元LLM间的共识提升推理能力。** *陈志尧（北卡罗来纳大学教堂山分校）等，arXiv预印本。* 
[[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.13007.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Fdinobby\u002FReConcile)]\n- [2023\u002F05] **Self-Polish：通过问题精炼提升大语言模型的推理能力。** *Zhiheng Xi（复旦大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.14497)] [[代码](https:\u002F\u002Fgithub.com\u002Fwoooodyy\u002Fself-polish)]\n- [2023\u002F03] **Self-Refine：基于自我反馈的迭代精炼。** *Aman Madaan（卡内基梅隆大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17651)] [[代码](https:\u002F\u002Fgithub.com\u002Fmadaan\u002Fself-refine)]\n- [2023\u002F02] **语言模型中的多模态思维链推理。** *Zhuosheng Zhang（上海交通大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.00923)] [[代码](https:\u002F\u002Fgithub.com\u002Famazon-science\u002Fmm-cot)]\n- [2022\u002F05] **大语言模型是零样本推理者。** *Takeshi Kojima（东京大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.11916)] [[代码](https:\u002F\u002Fgithub.com\u002Fkojima-takeshi188\u002Fzero_shot_cot)]\n- [2022\u002F05] **选择—推理：利用大语言模型实现可解释的逻辑推理。** *Antonia Creswell（DeepMind）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2205.09712)]\n- [2022\u002F03] **自一致性改进了语言模型中的思维链推理。** *Xuezhi Wang（谷歌研究院）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.11171)]\n- [2022\u002F01] **思维链提示在大语言模型中激发推理能力。** *Jason Wei（谷歌研究院）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2201.11903)]\n\n\n##### 规划\n\n###### 计划制定\n\n- [2023\u002F11] **JARVIS-1：基于记忆增强型多模态语言模型的开放世界多任务智能体。** *ZiHao Wang（北京大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05997)] [[代码](https:\u002F\u002Fgithub.com\u002FCraftJarvis\u002FJARVIS-1)]\n- [2023\u002F10] **语言智能体树搜索统一了语言模型中的推理、行动与规划。** *Andy Zhou（伊利诺伊大学厄巴纳-香槟分校）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.04406)] [[项目页面](https:\u002F\u002Fandyz245.github.io\u002FLanguageAgentTreeSearch\u002F)] 
[[Code](https://github.com/andyz245/LanguageAgentTreeSearch/)]
- [2023/05] **Tree of Thoughts: Deliberate Problem Solving with Large Language Models.** *Shunyu Yao (Princeton University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.10601)] [[Code](https://github.com/princeton-nlp/tree-of-thought-llm)]
- [2023/05] **Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents.** *Yue Wu (Carnegie Mellon University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.02412)]
- [2023/05] **Reasoning with Language Model is Planning with World Model.** *Shibo Hao (UC San Diego) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.14992)] [[Code](https://github.com/Ber666/RAP)]
- [2023/05] **SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks.** *Bill Yuchen Lin (Allen Institute for AI) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.17390)] [[Code](https://github.com/yuchenlin/swiftsage)]
- [2023/04] **LLM+P: Empowering Large Language Models with Optimal Planning Proficiency.** *Bo Liu (UT Austin) et al., arXiv.* [[Paper](https://arxiv.org/abs/2304.11477)] [[Code](https://github.com/Cranial-XIX/llm-pddl)]
- [2023/03] **HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face.** *Yongliang Shen (Microsoft Research Asia) et al., arXiv.* [[Paper](https://arxiv.org/abs/2303.17580)] [[Code](https://github.com/microsoft/JARVIS)]
- [2023/02] **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents.** *Zihao Wang (Peking University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2302.01560)] [[Code](https://github.com/CraftJarvis/MC-Planner)]
- [2022/05] **Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.** *Denny Zhou (Google Research) et al., arXiv.* [[Paper](https://arxiv.org/abs/2205.10625)]
- [2022/05] **MRKL Systems: A Modular, Neuro-Symbolic Architecture that Combines Large Language Models, External Knowledge Sources and Discrete Reasoning.** *Ehud Karpas (AI21 Labs) et al., arXiv.* [[Paper](https://arxiv.org/abs/2205.00445)]
- [2022/04] **Do As I Can, Not As I Say: Grounding Language in Robotic Affordances.** *Michael Ahn (Robotics at Google) et al., arXiv.* [[Paper](https://arxiv.org/abs/2204.01691)]
- [2023/05] **Agents: An Open-source Framework for Autonomous Language Agents.**
*Wangchunshu Zhou (AIWaves) et al., arXiv.* [[Paper](https://arxiv.org/pdf/2309.07870.pdf)] [[Code](https://github.com/aiwaves-cn/agents)]
- [2022/12] **Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments.** *Yu Gu (The Ohio State University) et al., ACL.* [[Paper](https://aclanthology.org/2023.acl-long.270.pdf)] [[Code](https://github.com/dki-lab/Pangu)]


###### Plan Reflection

- [2024/02] **Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization.** *Wenqi Zhang (Zhejiang University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2402.17574)] [[Code](https://github.com/zwq2018/Agent-Pro)]
- [2024/01] **Self-Contrast: Better Reflection through Inconsistent Solving Perspectives.** *Wenqi Zhang (Zhejiang University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2401.02009)]
- [2023/11] **JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models.** *Zihao Wang (Peking University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2311.05997)] [[Code](https://github.com/CraftJarvis/JARVIS-1)]
- [2023/10] **Chain-of-Verification Reduces Hallucination in Large Language Models.** *Shehzaad Dhuliawala (Meta AI & ETH Zurich) et al., arXiv.* [[Paper](https://arxiv.org/abs/2309.11495)]
- [2023/10] **FireAct: Toward Language Agent Fine-tuning.** *Baian Chen (System2 Research) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.16291)] [[Project Page](https://fireact-agent.github.io/)] [[Code](https://github.com/anchen1011/FireAct)] [[Dataset](https://github.com/anchen1011/FireAct/tree/main/data)]
- [2023/08] **SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning.** *Ning Miao (University of Oxford) et al., arXiv.* [[Paper](https://arxiv.org/abs/2308.00436)] [[Code](https://github.com/NingMiao/SelfCheck)]
- [2023/05] **ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models.** *Zhipeng Chen (Renmin University of China) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.14323)] [[Code](https://github.com/RUCAIBOX/ChatCoT)]
- [2023/05] **Voyager: An Open-Ended Embodied Agent with Large Language Models.** *Guanzhi Wang (NVIDIA) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.16291)]
[[Project Page](https://voyager.minedojo.org/)] [[Code](https://github.com/MineDojo/Voyager)]
- [2023/03] **Chat with the Environment: Interactive Multimodal Perception Using Large Language Models.** *Xufeng Zhao (University of Hamburg) et al., arXiv.* [[Paper](https://arxiv.org/abs/2303.08268)] [[Code](https://matcha-model.github.io/)]
- [2022/12] **LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models.** *Chan Hee Song (The Ohio State University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2212.04088)] [[Code](https://dki-lab.github.io/LLM-Planner/)]
- [2022/10] **ReAct: Synergizing Reasoning and Acting in Language Models.** *Shunyu Yao (Princeton University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2210.03629)] [[Code](https://react-lm.github.io/)]
- [2022/07] **Inner Monologue: Embodied Reasoning through Planning with Language Models.** *Wenlong Huang (Robotics at Google) et al., arXiv.* [[Paper](https://arxiv.org/abs/2207.05608)] [[Code](https://innermonologue.github.io/)]
- [2021/10] **AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts.** *Tongshuang Wu (University of Washington) et al., arXiv.* [[Paper](https://arxiv.org/abs/2110.01691)]

#### 1.1.5 Transferability and Generalization

##### Unseen Task Generalization
- [2024/06] **AgentGym: Evolving Large Language Model-based Agents across Diverse Environments.** *Zhiheng Xi (Fudan University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2406.04151)] [[Project Page](https://agentgym.github.io/)] [[Code & Platform](https://github.com/WooooDyy/AgentGym)] [[Dataset](https://huggingface.co/datasets/AgentGym/AgentTraj-L)] [[Benchmark](https://huggingface.co/datasets/AgentGym/AgentEval)] [[Model](https://huggingface.co/AgentGym/AgentEvol-7B)]
- [2023/10] **AgentTuning: Enabling Generalized Agent Abilities for LLMs.** *Aohan Zeng (Tsinghua University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2310.12823)] [[Project Page](https://thudm.github.io/AgentTuning/)] [[Code](https://github.com/THUDM/AgentTuning)] [[Dataset](https://huggingface.co/datasets/THUDM/AgentInstruct)]
- [2023/10] **Lemur: Harmonizing Natural Language and Code for Language Agents.** *Yiheng Xu (The University of Hong Kong) et al., arXiv.*
[[Paper](https://arxiv.org/abs/2310.06830)] [[Code](https://github.com/OpenLemur/Lemur)]
- [2023/05] **Training Language Models to Follow Instructions with Human Feedback.** *Long Ouyang et al., NeurIPS.* [[Paper](https://proceedings.neurips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html)]
  - InstructGPT: aligns language models with user intent across a wide range of tasks by fine-tuning with human feedback.
- [2023/01] **Multitask Prompted Training Enables Zero-Shot Task Generalization.** *Victor Sanh et al., ICLR.* [[Paper](https://openreview.net/forum?id=9Vrb9D0WI4)] [[Code](https://github.com/bigscience-workshop/t-zero)]
  - T0: an encoder-decoder model that consumes textual inputs and produces target responses. It is trained on a multitask mixture of NLP datasets partitioned into different tasks.
- [2022/10] **Scaling Instruction-Finetuned Language Models.** *Hyung Won Chung et al., arXiv.* [[Paper](https://doi.org/10.48550/arXiv.2210.11416)] [[Code](https://github.com/google-research/t5x)]
  - This work explores instruction finetuning with a particular focus on scaling the number of tasks and the model size, which improves performance across a variety of model classes, prompting setups, and evaluation benchmarks.
- [2022/08] **Finetuned Language Models are Zero-Shot Learners.** *Jason Wei et al., ICLR.* [[Paper](https://openreview.net/forum?id=gEZrGCozdqR)]
  - FLAN: instruction tuning substantially improves zero-shot performance on unseen tasks.

##### In-context Learning

- [2023/08] **Images Speak in Images: A Generalist Painter for In-Context Visual Learning.** *Xinlong Wang et al., IEEE.* [[Paper](https://doi.org/10.1109/CVPR52729.2023.00660)] [[Code](https://github.com/baaivision/Painter)]
  - Painter: this work presents an "image"-centric solution, a generalist model for in-context visual learning.
- [2023/08] **Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers.** *Chengyi Wang et al., arXiv.* [[Paper](https://arxiv.org/abs/2301.02111)] [[Code](https://github.com/microsoft/unilm)]
  - VALL-E: this work trains a neural codec language model with in-context learning capabilities.
- [2023/07] **A Survey on In-context Learning.** *Qingxiu Dong et al., arXiv.* [[Paper](https://doi.org/10.48550/arXiv.2301.00234)]
  - This survey summarizes the progress of and challenges in in-context learning (ICL).
- [2023/05] **Language Models are Few-Shot Learners.** *Tom B. Brown (OpenAI) et al., NeurIPS.* [[Paper](https://proceedings.neurips.cc/paper_files/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html)]
  - GPT-3: scaling up language models greatly improves task-agnostic few-shot performance, sometimes even rivaling prior state-of-the-art fine-tuning approaches.

##### Continual Learning

- [2023/11] **JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models.** *Zihao Wang (Peking University) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2311.05997)] [[Code](https://github.com/CraftJarvis/JARVIS-1)]
- [2023/07] **Progressive Prompts: Continual Learning for Language Models.** *Razdaibiedina et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2301.12314)]
  - This work proposes progressive prompts, which achieve forward transfer and resist catastrophic forgetting without relying on data replay or a large number of task-specific parameters.
- [2023/05] **Voyager: An Open-Ended Embodied Agent with Large Language Models.** *Guanzhi Wang (NVIDIA) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2305.16291)] [[Project Page](https://voyager.minedojo.org/)] [[Code](https://github.com/MineDojo/Voyager)]
  - Voyager: an example of an LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention.
- [2023/01] **A Comprehensive Survey of Continual Learning: Theory, Method and Application.** *Liyuan Wang et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2302.00487)]
  - This survey provides a comprehensive review of continual learning, seeking to bridge the basic settings, theoretical foundations, representative methods, and practical applications.
- [2022/11] **Continual Learning of Natural Language Processing Tasks: A Survey.** *Zixuan Ke et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2211.12701)]
  - This survey presents a comprehensive review and analysis of recent progress in continual learning for NLP.




### 1.2 Perception: Multimodal Inputs for LLM-based Agents

#### 1.2.1 Visual

- [2024/01] **Agent AI: Surveying the Horizons of Multimodal Interaction.** *Zane Durante et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2401.03568)]
- [2023/10] **Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond.** *Liang Chen et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2310.02071)] [[Code](https://github.com/PKUnlp-icler/PCA-EVAL)]
- [2023/05] **Language Is Not All You Need: Aligning Perception with Language Models.** *Shaohan Huang et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2302.14045)]
- [2023/05] **InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning.** *Wenliang Dai et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2305.06500)]
- [2023/05] **MultiModal-GPT: A Vision and Language Model for Dialogue with Humans.** *Tao Gong et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2305.04790)]
- [2023/05] **PandaGPT: One Model To Instruction-Follow Them All.** *Yixuan Su et al., arXiv preprint.*
[[Paper](https://arxiv.org/abs/2305.16355)]
- [2023/04] **Visual Instruction Tuning.** *Haotian Liu et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2304.08485)]
- [2023/04] **MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models.** *Deyao Zhu et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2304.10592)]
- [2023/01] **BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models.** *Junnan Li et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2301.12597)]
- [2022/04] **Flamingo: a Visual Language Model for Few-Shot Learning.** *Jean-Baptiste Alayrac et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2204.14198)]
- [2021/10] **MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer.** *Sachin Mehta et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2110.02178)]
- [2021/05] **MLP-Mixer: An all-MLP Architecture for Vision.** *Ilya Tolstikhin et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2105.01601)]
- [2020/10] **An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.** *Alexey Dosovitskiy et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2010.11929)]
- [2017/11] **Neural Discrete Representation Learning.** *Aaron van den Oord et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/1711.00937)]

#### 1.2.2 Audio

- [2023/06] **Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding.** *Hang Zhang et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2306.02858)]
- [2023/05] **X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages.** *Feilong Chen et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2305.04160)]
- [2023/05] **InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language.** *Zhaoyang Liu et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2305.05662)]
- [2023/04] **AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.** *Rongjie Huang et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2304.12995)]
- [2023/03] **HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face.** *Yongliang Shen et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2303.17580)]
- [2021/06] **HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units.** *Wei-Ning Hsu
et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2106.07447)]
- [2021/04] **AST: Audio Spectrogram Transformer.** *Yuan Gong et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2104.01778)]

### 1.3 Action: Expand Action Space of LLM-based Agents

#### 1.3.1 Tool Using
- [2024/02] **Towards Uncertainty-Aware Language Agent.** *Jiuzhou Han (Monash University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2401.14016)] [[Project Page](https://uala-agent.github.io)] [[Code](https://github.com/Jiuzhouh/Uncertainty-Aware-Language-Agent)]
- [2023/10] **OpenAgents: An Open Platform for Language Agents in the Wild.** *XLang Lab (The University of Hong Kong), arXiv.* [[Paper](https://arxiv.org/abs/2310.10634)] [[Project Page](https://docs.xlang.ai)] [[Code](https://github.com/xlang-ai/OpenAgents)] [[Demo](https://chat.xlang.ai)]
- [2023/10] **Lemur: Harmonizing Natural Language and Code for Language Agents.** *Yiheng Xu (The University of Hong Kong) et al., arXiv.* [[Paper](https://arxiv.org/abs/2310.06830)] [[Code](https://github.com/OpenLemur/Lemur)]
- [2023/10] **Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond.** *Liang Chen (Peking University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2310.02071)] [[Code](https://github.com/PKUnlp-icler/PCA-EVAL)]
  - HOLMES is a multi-agent cooperation framework that allows LLMs to leverage MLLMs and APIs to gather multimodal information for informed decision making.
- [2023/07] **ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs.** *Yujia Qin (Tsinghua University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2307.16789)] [[Code](https://github.com/openbmb/toolbench)] [[Dataset](https://paperswithcode.com/dataset/toolbench)]
  - ToolLLM is a general tool-use framework covering data construction, model training, and evaluation.
- [2023/05] **Large Language Models as Tool Makers.** *Tianle Cai (Princeton University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.17126)] [[Code](https://github.com/ctlllll/llm-toolmaker)]
  - LATM is a closed-loop framework that takes a first step toward removing the dependency on the availability of existing tools.
- [2023/05] **CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation.** *Cheng Qian (Tsinghua University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.14318)]
  - CREATOR is a novel framework that enables LLMs to create their own tools through documentation and code realization.
- [2023/04] **Tool Learning with Foundation Models.**
*Yujia Qin (Tsinghua University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2304.08354)] [[Code](https://github.com/openbmb/bmtools)]
  - This survey primarily introduces a new paradigm called "tool learning with foundation models," which combines the strengths of specialized tools and foundation models to achieve higher accuracy, efficiency, and automation in problem solving.
- [2023/04] **ChemCrow: Augmenting Large-Language Models with Chemistry Tools.** *Andres M. Bran (Laboratory of Artificial Chemical Intelligence, ISIC, EPFL) et al., arXiv.* [[Paper](https://arxiv.org/abs/2304.05376)] [[Code](https://github.com/ur-whitelab/chemcrow-public)]
  - ChemCrow is an LLM chemistry agent that integrates 13 expert-designed tools to augment LLM performance in chemistry and demonstrate new capabilities.
- [2023/04] **GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information.** *Qiao Jin (National Institutes of Health), Yifan Yang, Qingyu Chen, Zhiyong Lu, arXiv.* [[Paper](https://arxiv.org/abs/2304.09667)] [[Code](https://github.com/ncbi/GeneGPT)]
  - GeneGPT answers genomics questions. It proposes a novel way to tackle hallucination by teaching LLMs to use Web APIs.
- [2023/04] **OpenAGI: When LLM Meets Domain Experts.** *Yingqiang Ge (Rutgers University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2304.04370)] [[Code](https://github.com/agiresearch/openagi)]
  - OpenAGI is an open-source AGI research platform. It introduces a paradigm in which an LLM operates various expert models to solve complex tasks, and proposes the RLTF mechanism to improve the LLM's task-solving ability.
- [2023/03] **HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face.** *Yongliang Shen (Zhejiang University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2303.17580)] [[Code](https://github.com/microsoft/JARVIS)]
  - HuggingGPT is a system that leverages LLMs to connect various multimodal AI models from machine learning communities to solve AI tasks.
- [2023/03] **Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models.** *Chenfei Wu (Microsoft Research Asia) et al., arXiv.* [[Paper](https://arxiv.org/abs/2303.04671)] [[Code](https://github.com/microsoft/visual-chatgpt)]
  - Visual ChatGPT is a system that opens the door to exploring the visual roles of ChatGPT with the help of visual foundation models.
- [2023/02] **Augmented Language Models: a Survey.** *Grégoire Mialon (Meta AI) et al., TMLR.* [[Paper](https://openreview.net/forum?id=jh7wH2AzKK)]
  - This survey reviews works that equip language models with tool-use abilities; such augmented language models can use external modules to expand their context-processing capacity.
- [2023/02] **Toolformer: Language Models Can Teach Themselves to Use Tools.** *Timo Schick (Meta AI) et al., arXiv.* [[Paper](https://arxiv.org/abs/2302.04761)]
  - Toolformer shows that LLMs can teach themselves to use external tools with only a handful of demonstrations for each API.
- [2022/05] **TALM: Tool Augmented Language Models.**
*Aaron Parisi (Google) et al., arXiv.* [[Paper](https://arxiv.org/abs/2205.12255)]
  - TALM presents a method for combining non-differentiable tools with language models, enabling the model to access real-time or private data.
- [2022/05] **MRKL Systems: A Modular, Neuro-Symbolic Architecture that Combines Large Language Models, External Knowledge Sources and Discrete Reasoning.** *Ehud Karpas (AI21 Labs) et al., arXiv.* [[Paper](https://arxiv.org/abs/2205.00445)]
  - The MRKL system augments LLMs with an easily extensible set of external knowledge and reasoning modules.
- [2022/04] **Do As I Can, Not As I Say: Grounding Language in Robotic Affordances.** *Michael Ahn (Google) et al., CoRL.* [[Paper](https://proceedings.mlr.press/v205/ichter23a.html)]
  - SayCan applies LLMs to real-world robotic tasks by combining the high-level semantic knowledge of LLMs with the value functions of pretrained skills.
- [2021/12] **WebGPT: Browser-Assisted Question-Answering with Human Feedback.** *Reiichiro Nakano (OpenAI) et al., arXiv.* [[Paper](https://arxiv.org/abs/2112.09332)]
  - WebGPT answers questions using a web-browsing environment. It is trained with imitation learning and then optimizes answer quality through human feedback.
- [2021/07] **Evaluating Large Language Models Trained on Code.** *Mark Chen (OpenAI) et al., arXiv.* [[Paper](https://arxiv.org/abs/2107.03374)] [[Code](https://github.com/openai/human-eval)]
  - Codex can synthesize programs from docstrings, that is, create tools from documentation.

#### 1.3.2 Embodied Action
- [2023/12] **Towards Learning a Generalist Model for Embodied Navigation.** *Duo Zheng (The Chinese University of Hong Kong) et al., arXiv.* [[Paper](https://arxiv.org/abs/2312.02010)] [[Code](https://github.com/zd11024/NaviLLM)]
- [2023/11] **An Embodied Generalist Agent in 3D World.** *Jiangyong Huang (BIGAI & Peking University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2311.12871)] [[Project Page](https://embodied-generalist.github.io/)]
- [2023/11] **JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models.** *Zihao Wang (Peking University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2311.05997)] [[Code](https://github.com/CraftJarvis/JARVIS-1)]
- [2023/10] **Lemur: Harmonizing Natural Language and Code for Language Agents.** *Yiheng Xu (The University of Hong Kong) et al., arXiv.* [[Paper](https://arxiv.org/abs/2310.06830)] [[Code](https://github.com/OpenLemur/Lemur)]
- [2023/10] **Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond.** *Liang Chen et al., arXiv.* [[Paper](https://arxiv.org/abs/2310.02071)] [[Code](https://github.com/PKUnlp-icler/PCA-EVAL)]
- [2023/07] **Interactive Language: Talking to Robots in Real Time.** *Corey Lynch et al., IEEE
(RAL).* [[Paper](https://arxiv.org/pdf/2210.06407.pdf)]
- [2023/05] **Voyager: An Open-Ended Embodied Agent with Large Language Models.** *Guanzhi Wang (NVIDIA) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.16291)] [[Project Page](https://voyager.minedojo.org/)] [[Code](https://github.com/MineDojo/Voyager)]
- [2023/05] **AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments.** *Sudipta Paul et al., NeurIPS.* [[Paper](https://proceedings.neurips.cc/paper_files/paper/2022/file/28f699175783a2c828ae74d53dd3da20-Paper-Conference.pdf)]
- [2023/05] **EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought.** *Yao Mu et al., arXiv.* [[Paper](https://arxiv.org/pdf/2305.15021.pdf)] [[Code](https://github.com/EmbodiedGPT/EmbodiedGPT_Pytorch)]
- [2023/05] **NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models.** *Gengze Zhou et al., arXiv.* [[Paper](https://arxiv.org/pdf/2305.16986.pdf)]
- [2023/05] **AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation.** *Chuhao Jin et al., arXiv.* [[Paper](https://arxiv.org/pdf/2305.18898.pdf)]
- [2023/03] **PaLM-E: An Embodied Multimodal Language Model.** *Danny Driess et al., arXiv.* [[Paper](https://arxiv.org/pdf/2303.03378.pdf)]
- [2023/03] **Reflexion: Language Agents with Verbal Reinforcement Learning.** *Noah Shinn et al., arXiv.* [[Paper](https://arxiv.org/pdf/2303.11366.pdf)] [[Code](https://github.com/noahshinn024/reflexion)]
- [2023/02] **Collaborating with Language Models for Embodied Reasoning.** *Ishita Dasgupta et al., arXiv.* [[Paper](https://arxiv.org/pdf/2302.00763.pdf)]
- [2023/02] **Code as Policies: Language Model Programs for Embodied Control.** *Jacky Liang et al., IEEE (ICRA).* [[Paper](https://arxiv.org/pdf/2209.07753.pdf)]
- [2022/10] **ReAct: Synergizing Reasoning and Acting in Language Models.** *Shunyu Yao et al., arXiv.* [[Paper](https://arxiv.org/pdf/2210.03629.pdf)] [[Code](https://github.com/ysymyth/ReAct)]
- [2022/10] **Instruction-Following Agents with Multimodal Transformer.** *Hao Liu et al., CVPR.* [[Paper](https://arxiv.org/pdf/2210.13431.pdf)] [[Code](https://github.com/lhao499/instructrl)]
- [2022/07]
**Inner Monologue: Embodied Reasoning through Planning with Language Models.** *Wenlong Huang et al., arXiv.* [[Paper](https://arxiv.org/pdf/2207.05608.pdf)]
- [2022/07] **LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action.** *Dhruv Shah et al., CoRL.* [[Paper](https://proceedings.mlr.press/v205/shah23b/shah23b.pdf)] [[Code](https://github.com/blazejosinski/lm_nav)]
- [2022/04] **Do As I Can, Not As I Say: Grounding Language in Robotic Affordances.** *Michael Ahn et al., arXiv.* [[Paper](https://arxiv.org/pdf/2204.01691.pdf)]
- [2022/01] **A Survey of Embodied AI: From Simulators to Research Tasks.** *Jiafei Duan et al., IEEE (TETCI).* [[Paper](https://arxiv.org/pdf/2103.04918.pdf)]
- [2022/01] **Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents.** *Wenlong Huang et al., arXiv.* [[Paper](https://arxiv.org/pdf/2201.07207v2.pdf)] [[Code](https://github.com/huangwl18/language-planner)]
- [2020/04] **Experience Grounds Language.** *Yonatan Bisk et al., EMNLP.* [[Paper](https://arxiv.org/pdf/2004.10151.pdf)]
- [2019/03] **Review of Deep Reinforcement Learning for Robot Manipulation.** *Hai Nguyen et al., IEEE (IRC).* [[Paper](https://www.researchgate.net/profile/Hai-Nguyen-128/publication/355980729_Review_of_Deep_Reinforcement_Learning_for_Robot_Manipulation/links/6187ef153068c54fa5bb977e/Review-of-Deep-Reinforcement-Learning-for-Robot-Manipulation.pdf)]
- [2005/01] **The Development of Embodied Cognition: Six Lessons from Babies.** *Linda Smith et al., Artificial Life.* [[Paper](https://cogdev.sitehost.iu.edu/labwork/6_lessons.pdf)]





## 2. Agents in Practice: Applications of LLM-based Agents

<div align=center><img src="https://oss.gittoolsai.com/images/WooooDyy_LLM-Agent-Paper-List_readme_da4fb6cc2482.jpg" width="60%" /></div>

### 2.1 General Ability of Single Agent
<div align=center><img src="https://oss.gittoolsai.com/images/WooooDyy_LLM-Agent-Paper-List_readme_c9ca91e1e74c.jpg" width="60%" /></div>

#### 2.1.1 Task-oriented Deployment
**In web scenarios**
- [2023/10] **OpenAgents: An Open Platform for Language Agents in the Wild.** *XLang Lab (The University of Hong Kong), arXiv preprint.* [[Paper](https://arxiv.org/abs/2310.10634)] [[Project Page](https://docs.xlang.ai)] [[Code](https://github.com/xlang-ai/OpenAgents)] [[Demo](https://chat.xlang.ai)]
- [2023/07] **WebArena: A Realistic Web Environment for Building Autonomous Agents.** *Shuyan Zhou (Carnegie Mellon University) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2307.13854)] [[Code](https://webarena.dev/)]
- [2023/07] **A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis.** *Izzeddin Gur (DeepMind) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2307.12856)]
- [2023/06] **SYNAPSE: Leveraging Few-Shot Exemplars for Human-Level Computer Control.** *Longtao Zheng (Nanyang Technological University) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2306.07863)] [[Code](https://github.com/ltzheng/synapse)]
- [2023/06] **Mind2Web: Towards a Generalist Agent for the Web.** *Xiang Deng (The Ohio State University) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2306.06070)] [[Code](https://osu-nlp-group.github.io/Mind2Web/)]
- [2023/05] **Multimodal Web Navigation with Instruction-Finetuned Foundation Models.** *Hiroki Furuta (The University of Tokyo) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2305.11854)]
- [2023/03] **Language Models can Solve Computer Tasks.** *Geunwoo Kim (University of California) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2303.17491)] [[Code](https://github.com/posgnu/rci-agent)]
- [2022/07] **WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents.** *Shunyu Yao (Princeton University) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2207.01206)] [[Code](https://webshop-pnlp.github.io/)]
- [2021/12]
**WebGPT: Browser-Assisted Question-Answering with Human Feedback.** *Reiichiro Nakano (OpenAI) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2112.09332)]
- [2023/05] **Agents: An Open-source Framework for Autonomous Language Agents.** *Wangchunshu Zhou (AIWaves) et al., arXiv preprint.* [[Paper](https://arxiv.org/pdf/2309.07870.pdf)] [[Code](https://github.com/aiwaves-cn/agents)]
- [2024/04] **OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments.** *XLang Lab (The University of Hong Kong), arXiv preprint.* [[Paper](https://arxiv.org/abs/2404.07972)] [[Project Page](https://docs.xlang.ai)] [[Code](https://github.com/xlang-ai/OSWorld)] [[Data Viewer](https://os-world.github.io/explorer.html)]

**In life scenarios**
- [2023/10] **OpenAgents: An Open Platform for Language Agents in the Wild.** *XLang Lab (The University of Hong Kong), arXiv preprint.* [[Paper](https://arxiv.org/abs/2310.10634)] [[Project Page](https://docs.xlang.ai)] [[Code](https://github.com/xlang-ai/OpenAgents)] [[Demo](https://chat.xlang.ai)]
- [2023/08] **InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent.** *Po-Lin Chen et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2308.01552)]
- [2023/05] **Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents.** *Yue Wu (Carnegie Mellon University) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2305.02412)]
- [2023/05] **Augmenting Autotelic Agents with Large Language Models.** *Cédric Colas (MIT) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2305.12487)]
- [2023/03] **Planning with Large Language Models via Corrective Re-prompting.** *Shreyas Sundara Raman (Brown University) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2211.09935)]
- [2022/10] **Generating Executable Action Plans with Environmentally-Aware Language Models.** *Maitrey Gramopadhye (UNC Chapel Hill) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2210.04964)] [[Code](https://github.com/hri-ironlab/scene_aware_language_planner)]
- [2022/01] **Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents.** *Wenlong Huang (UC Berkeley) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2201.07207)] [[Code](https://wenlong.page/language-planner/)]

#### 2.1.2 Innovation-oriented Deployment
- [2023/10] **OpenAgents: An Open Platform for Language Agents in the Wild.** *XLang Lab (The University of Hong Kong), arXiv preprint.*
[[Paper](https://arxiv.org/abs/2310.10634)] [[Project Page](https://docs.xlang.ai)] [[Code](https://github.com/xlang-ai/OpenAgents)] [[Demo](https://chat.xlang.ai)]
- [2023/08] **The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models.** *Haonan Li (UC Riverside) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2308.00245)]
- [2023/08] **ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks.** *Yeonghun Kang (KAIST) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2308.01423)]
- [2023/07] **Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics.** *Melanie Swan (University College London) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2307.02502)]
- [2023/06] **Towards Autonomous Testing Agents via Conversational Large Language Models.** *Robert Feldt (Chalmers University of Technology) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2306.05152)]
- [2023/04] **Emergent Autonomous Scientific Research Capabilities of Large Language Models.** *Daniil A. Boiko (Carnegie Mellon University) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2304.05332)]
- [2023/04] **ChemCrow: Augmenting Large-Language Models with Chemistry Tools.** *Andres M. Bran (Laboratory of Artificial Chemical Intelligence, ISIC, EPFL) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2304.05376)] [[Code](https://github.com/ur-whitelab/chemcrow-public)]
- [2022/03] **ScienceWorld: Is your Agent Smarter than a 5th Grader?** *Ruoyao Wang (University of Arizona) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2203.07540)] [[Code](https://sciworld.apps.allenai.org/)]

#### 2.1.3 Lifecycle-oriented Deployment
- [2023/05] **Voyager: An Open-Ended Embodied Agent with Large Language Models.** *Guanzhi Wang (NVIDIA) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2305.16291)] [[Project Page](https://voyager.minedojo.org/)] [[Code](https://github.com/MineDojo/Voyager)]
- [2023/05] **Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.** *Xizhou Zhu (Tsinghua University) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2305.17144)] [[Code](https://github.com/OpenGVLab/GITM)]
- [2023/03] **Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks.** *Haoqi Yuan (Peking University) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2303.16563)]
[[Project Page](https://sites.google.com/view/plan4mc)]
- [2023/02] **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents.** *Zihao Wang (Peking University) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2302.01560)] [[Code](https://github.com/CraftJarvis/MC-Planner)]
- [2023/01] **Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling.** *Kolby Nottingham (UC Irvine) et al., arXiv preprint.* [[Paper](https://arxiv.org/abs/2301.12050)] [[Code](https://deckardagent.github.io/)]

### 2.2 Coordinating Potential of Multiple Agents
<div align=center><img src="https://oss.gittoolsai.com/images/WooooDyy_LLM-Agent-Paper-List_readme_74784d697618.jpg" width="60%" /></div>

#### 2.2.1 Cooperative Interaction for Complementarity
**Disordered cooperation**
- [2023/07] **Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration.** *Zhenhailong Wang (UIUC) et al., arXiv.* [[Paper](https://arxiv.org/abs/2307.05300)] [[Code](https://github.com/MikeWangWZHL/Solo-Performance-Prompting)]
- [2023/07] **RoCo: Dialectic Multi-Robot Collaboration with Large Language Models.** *Zhao Mandi, Shreeya Jain, Shuran Song (Columbia University), arXiv.* [[Paper](https://arxiv.org/abs/2307.04738)] [[Code](https://project-roco.github.io/)]
- [2023/04] **ChatLLM Network: More Brains, More Intelligence.** *Rui Hao (Beijing University of Posts and Telecommunications) et al., arXiv.* [[Paper](https://arxiv.org/abs/2304.12998)]
- [2023/01] **Blind Judgement: Agent-Based Supreme Court Modelling With GPT.** *Sil Hamilton (McGill University), arXiv.* [[Paper](https://arxiv.org/abs/2301.05327)]
- [2023/05] **Agents: An Open-source Framework for Autonomous Language Agents.** *Wangchunshu Zhou (AIWaves) et al., arXiv.* [[Paper](https://arxiv.org/pdf/2309.07870.pdf)] [[Code](https://github.com/aiwaves-cn/agents)]


**Ordered cooperation**
- [2023/10] **AutoAgents: A Framework for Automatic Agent Generation.** *Guangyao Chen (Peking University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2309.17288)] [[Code](https://github.com/Link-AGI/AutoAgents)]
- [2023/09] **MindAgent: Emergent Gaming Interaction.** *Ran Gong (UCLA) et al., arXiv.* [[Paper](https://arxiv.org/abs/2309.09971)]
[[Code](https://mindagent.github.io/)]
- [2023/08] **CGMI: Configurable General Multi-Agent Interaction Framework.** *Jinxin Shi (East China Normal University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2308.12503)]
- [2023/08] **ProAgent: Building Proactive Cooperative AI with Large Language Models.** *Ceyao Zhang (The Chinese University of Hong Kong, Shenzhen) et al., arXiv.* [[Paper](https://arxiv.org/abs/2308.11339)] [[Code](https://pku-proagent.github.io/)]
- [2023/08] **AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents.** *Weize Chen (Tsinghua University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2308.10848)] [[Code](https://github.com/OpenBMB/AgentVerse)]
- [2023/08] **AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework.** *Qingyun Wu (Penn State University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2308.08155)] [[Code](https://microsoft.github.io/FLAML/docs/Use-Cases/Autogen/)]
- [2023/08] **MetaGPT: Meta Programming for Multi-Agent Collaborative Framework.** *Sirui Hong (DeepWisdom) et al., arXiv.* [[Paper](https://arxiv.org/abs/2308.00352)] [[Code](https://github.com/geekan/MetaGPT)]
- [2023/07] **Communicative Agents for Software Development.** *Chen Qian (Tsinghua University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2307.07924)] [[Code](https://github.com/openbmb/chatdev)]
- [2023/06] **Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents.** *Yashar Talebirad (University of Alberta) et al., arXiv.* [[Paper](https://arxiv.org/abs/2306.03314)]
- [2023/05] **Training Socially Aligned Language Models in Simulated Human Society.** *Ruibo Liu (Dartmouth College) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.16960)] [[Code](https://github.com/agi-templar/Stable-Alignment)]
- [2023/05] **SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks.** *Bill Yuchen Lin (Allen Institute for AI) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.17390)] [[Code](https://yuchenlin.xyz/swiftsage/)]
- [2023/05] **ChatGPT as your Personal Data Scientist.** *Md Mahadi Hassan (Auburn University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.13657)]
- [2023/03] **CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society.** *Guohao Li (KAUST) et al., arXiv.*
[[Paper](https://arxiv.org/abs/2303.17760)] [[Code](https://github.com/lightaime/camel)]
- [2023/03] **DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents.** *Varun Nair (Curai Health) et al., arXiv.* [[Paper](https://arxiv.org/abs/2303.17071)] [[Code](https://github.com/curai/curai-research/tree/main/DERA)]
- [2023/04] **Self-collaboration Code Generation via ChatGPT.** *Yihong Dong (Peking University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2304.07590)]

#### 2.2.2 Adversarial Interaction for Advancement
- [2023/08] **ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate.** *Chi-Min Chan (Tsinghua University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2308.07201)] [[Code](https://github.com/thunlp/ChatEval)]
- [2023/05] **Improving Factuality and Reasoning in Language Models through Multiagent Debate.** *Yilun Du (MIT CSAIL) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.14325)] [[Code](https://composable-models.github.io/llm_debate/)]
- [2023/05] **Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback.** *Yao Fu (University of Edinburgh) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.10142)] [[Code](https://github.com/FranxYao/GPT-Bargaining)]
- [2023/05] **Examining the Inter-Consistency of Large Language Models: An In-depth Analysis via Debate.** *Kai Xiong (Harbin Institute of Technology) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.11595)]
- [2023/05] **Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate.** *Tian Liang (Tsinghua University) et al., arXiv.* [[Paper](https://arxiv.org/abs/2305.19118)] [[Code](https://github.com/Skytliang/Multi-Agents-Debate)]

### 2.3 Interactive Engagement between Human and Agent
<div align=center><img src="https://oss.gittoolsai.com/images/WooooDyy_LLM-Agent-Paper-List_readme_6ed12799a232.jpg" width="60%" /></div>

#### 2.3.1 Instructor-Executor Paradigm

##### Education

- [2023/07] **Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics.** *Melanie Swan (University College London) et al., arXiv.* [[Paper](https://doi.org/10.48550/arXiv.2307.02502)]
  - Communicates with humans to help them understand and make use of mathematics.
- [2023/03] **Hey Dona! Can You Help Me with Student Course Registration?** *Vishesh Kalvakurthi (MSU) et al., arXiv.*
[[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2303.13548)]\n  - 这是一款名为“Dona”的应用，提供学生课程注册的虚拟语音助手服务，由人类下达指令。\n\n##### 健康\n\n- [2023\u002F08] **仲景（Zhongjing）：通过专家反馈和真实世界多轮对话提升大语言模型的中文医疗能力。** *杨松华（ZZU）等，arXiv。* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2308.03549)] [[代码](https:\u002F\u002Fgithub.com\u002FSupritYoung\u002FZhongjing)]\n- [2023\u002F05] **华佗GPT：让语言模型成为医生的探索。** *张洪波（CUHK-SZ）等，arXiv。* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.15075)] [[代码](https:\u002F\u002Fgithub.com\u002FFreedomIntelligence\u002FHuatuoGPT)] [[演示](https:\u002F\u002Fwww.huatuogpt.cn\u002F)]\n- [2023\u002F05] **帮助帮助者：利用AI赋能的实践与反馈支持同伴咨询师。** *许尚凌（Gatech）等，arXiv。* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.08982)]\n- [2020\u002F10] **针对自闭症谱系障碍青少年的虚拟对话智能体：实验结果与设计启示。** *穆罕默德·拉法耶特·阿里（U of R）等，IVA '20。* [[论文](https:\u002F\u002Fdoi.org\u002F10.1145\u002F3383652.3423900)]\n\n##### 其他应用\n\n- [2023\u002F08] **RecMind：基于大语言模型的推荐智能体。** *王延成（ASU、Amazon）等，arXiv。* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2308.14296)]\n- [2023\u002F08] **多轮对话智能体作为电话营销中的销售助理。** *高婉婷（JNU）等，IEEE。* [[论文](https:\u002F\u002Fdoi.org\u002F10.1109\u002FIJCNN54540.2023.10192042)]\n- [2023\u002F07] **PEER：一种协作式语言模型。** *蒂莫·希克（Meta AI）等，arXiv。* [[论文](https:\u002F\u002Fopenreview.net\u002Fpdf?id=KbYevcLjnc)]\n- [2023\u002F07] **DIALGEN：用于增进对人类间对话理解的协作式人类-语言模型生成对话。** *卢博儒（UW）等，arXiv。* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2307.07047)]\n- [2023\u002F08] **LLM作为数据库管理员[愿景]。** *周轩赫（清华）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.05481)]\n- [2023\u002F06] **AssistGPT：一款能够规划、执行、检查并学习的通用多模态助手。** *高迪菲（NUS）等，arXiv。* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2306.08640)]\n- [2023\u002F09] **Agents：一个用于自主语言智能体的开源框架。** *周王春澍（AIWaves）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2309.07870.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Faiwaves-cn\u002Fagents)]\n- [2023\u002F12] **D-Bot：基于大语言模型的数据库诊断系统。** *周轩赫（清华）等，arXiv。* 
[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.01454)] [[代码](https:\u002F\u002Fgithub.com\u002FTsinghuaDatabaseGroup\u002FDB-GPT)]\n\n#### 2.3.2 平等伙伴关系范式\n\n##### 共情沟通者\n\n- [2023\u002F08] **SAPIEN：由大语言模型驱动的情感虚拟智能体。** *马苏姆·哈桑等，arXiv。* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2308.03022)] [[项目页面](https:\u002F\u002Fsapien.coach\u002F)]\n- [2023\u002F05] **帮助帮助者：利用AI赋能的实践与反馈支持同伴咨询师。** *许尚凌（Gatech）等，arXiv。* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.08982)]\n- [2022\u002F07] **营销互动中的人工共情：弥合情感与社交客户体验中的人工智能鸿沟。** *刘玉萍-汤普金斯等。* [[论文](https:\u002F\u002Flink.springer.com\u002Farticle\u002F10.1007\u002Fs11747-022-00892-5)]\n\n##### 人类水平参与者\n\n- [2023\u002F08] **量化大语言模型对群体意见动态的影响。** *李超等，CoRR。* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2308.03313)]\n- [2023\u002F06] **通过人类正则化的强化学习与规划掌握无通讯（no-press）外交游戏。** *安东·巴赫京等，ICLR。* [[论文](https:\u002F\u002Fopenreview.net\u002Fpdf?id=F61FwJTZhb)]\n- [2023\u002F06] **面向决策的人类-AI协作对话。** *林杰西等，CoRR。* [[论文](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2305.20076)]\n- [2022\u002F11] **结合语言模型与战略推理实现外交游戏中的人类水平博弈。** *FAIR等，Science。* [[论文](https:\u002F\u002Fwww.science.org\u002Fdoi\u002F10.1126\u002Fscience.ade9097)]\n\n## 3. 
智能体社会：从个体性到社会性\n\u003Cdiv align=center>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_readme_dc52bd770fd9.jpg\" width=\"60%\" \u002F>\u003C\u002Fdiv>\n\n### 3.1 基于LLM的智能体行为与人格\n\n#### 3.1.1 社会行为\n\n##### 个体行为\n- [2023\u002F10] **Lyfe Agents：用于低成本实时社交互动的生成式智能体。** *赵凯雅（MIT）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.02172)]\n- [2023\u002F05] **Voyager：一款具有大语言模型的开放式具身智能体。** *王冠志（NVIDIA）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16291)] [[代码](https:\u002F\u002Fgithub.com\u002FMineDojo\u002FVoyager)] [[项目页面](https:\u002F\u002Fvoyager.minedojo.org\u002F)]\n- [2023\u002F04] **LLM+P：赋予大语言模型最优规划能力。** *刘博（德克萨斯大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.11477)] [[代码](https:\u002F\u002Fgithub.com\u002FCranial-XIX\u002Fllm-pddl)]\n- [2023\u002F03] **Reflexion：具备言语强化学习能力的语言智能体。** *诺亚·辛恩（东北大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11366)] [[代码](https:\u002F\u002Fgithub.com\u002Fnoahshinn024\u002Freflexion)]\n- [2023\u002F03] **PaLM-E：一款具身多模态语言模型。** *丹尼·德里斯（Google）等，ICML。* [[论文](http:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Fdriess23a\u002Fdriess23a.pdf)] [[项目页面](https:\u002F\u002Fpalm-e.github.io\u002F)]\n- [2023\u002F03] **ReAct：在语言模型中协同推理与行动。** *姚顺宇（普林斯顿大学）等，ICLR。* [[论文](https:\u002F\u002Fopenreview.net\u002Fpdf?id=WE_vluYUL-X)] [[项目页面](https:\u002F\u002Freact-lm.github.io\u002F)]\n- [2022\u002F01] **思维链提示可激发大语言模型的推理能力。** *贾森·魏（Google）等，NeurIPS。* [[论文](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002F9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf)]\n\n##### 团体行为\n- [2023\u002F10] **探索LLM智能体的协作机制：社会心理学视角。** *张金天（浙江大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.02124)] [[代码](https:\u002F\u002Fgithub.com\u002Fzjunlp\u002FMachineSoM)]\n- [2023\u002F09] **MindAgent：新兴的游戏交互方式。** *龚然（UCLA）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.09971)] 
[[代码](https:\u002F\u002Fmindagent.github.io\u002F)]\n- [2023\u002F09] **探索大型语言模型在交流类游戏中的应用：基于狼人杀的实证研究。** *徐宇壮（清华大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.04658)]\n- [2023\u002F09] **怀疑者智能体：利用具备心智理论意识的GPT-4玩不完全信息博弈** *顾家贤等，arXiv。* [[论文](http:\u002F\u002Farxiv.org\u002Fabs\u002F2309.17277)]\n\n- [2023\u002F08] **AgentVerse：促进多智能体协作并探索智能体的涌现行为。** *陈伟泽（清华大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.10848)] [[代码](https:\u002F\u002Fgithub.com\u002FOpenBMB\u002FAgentVerse)]\n- [2023\u002F08] **AutoGen：通过多智能体对话框架实现下一代LLM应用。** *吴庆云（宾夕法尼亚州立大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.08155)] [[代码](https:\u002F\u002Fmicrosoft.github.io\u002FFLAML\u002Fdocs\u002FUse-Cases\u002FAutogen\u002F)]\n- [2023\u002F08] **ChatEval：通过多智能体辩论提升基于LLM的评估器性能。** *陈志敏（清华大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.07201)] [[代码](https:\u002F\u002Fgithub.com\u002Fthunlp\u002FChatEval)]\n\n- [2023\u002F07] **面向软件开发的沟通型智能体。** *钱晨（清华大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.07924)] [[代码](https:\u002F\u002Fgithub.com\u002Fopenbmb\u002Fchatdev)]\n- [2023\u002F07] **RoCo：基于大型语言模型的辩证式多机器人协作。** *赵曼迪、Shreeya Jain、宋舒然（哥伦比亚大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.04738)] [[代码](https:\u002F\u002Fproject-roco.github.io\u002F)]\n- [2023\u002F08] **ProAgent：利用大型语言模型构建主动协作型AI。** *张策尧（香港中文大学深圳分校）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.11339)] [[代码](https:\u002F\u002Fpku-proagent.github.io\u002F)]\n\n- [2023\u002F06] **大型语言模型驱动的智能体人工社交网络中的同质性现象。** *詹姆斯·K·何（剑桥大学）等，PsyArXiv。* [[论文](https:\u002F\u002Fdoi.org\u002F10.21203\u002Frs.3.rs-3096289\u002Fv1)]\n\n#### 3.1.2 人格\n##### 认知\n- [2023\u002F09] **怀疑者智能体：利用具备心智理论意识的GPT-4玩不完全信息博弈** *顾家贤等，arXiv。* [[论文](http:\u002F\u002Farxiv.org\u002Fabs\u002F2309.17277)]\n- [2023\u002F03] **机器心理学：运用心理学方法探究大型语言模型的涌现能力与行为。** *蒂洛·哈根多夫（斯图加特大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13988)]\n- [2023\u002F03] 
**心灵与机器相遇：解析GPT-4的认知心理学特征。** *西法特考尔·丁格拉（Nowrosjee Wadia College）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11436)]\n- [2022\u002F07] **语言模型在推理中表现出类似人类的内容效应。** *伊希塔·达斯古普塔（DeepMind）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2207.07051)]\n- [2022\u002F06] **用认知心理学理解GPT-3。** *马塞尔·宾茨等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.14576)]\n\n\n##### 情感\n- [2023\u002F07] **大型语言模型的情感智力。** *王雪娜（清华大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.09042)]\n- [2023\u002F05] **ChatGPT在情感觉察评估中优于人类。** *佐哈尔·埃利奥塞夫等，Frontiers in Psychology。* [[论文](https:\u002F\u002Fwww.frontiersin.org\u002Farticles\u002F10.3389\u002Ffpsyg.2023.1199058\u002Ffull)]\n- [2023\u002F02] **用于增强游戏韧性的共情型AI。** *雷扎·哈比比（加州大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.09070)]\n- [2022\u002F12] **计算机说“不”：反对共情型对话式AI的理由。** *阿尔巴·柯里（利兹大学）等，ACL。* [[论文](https:\u002F\u002Faclanthology.org\u002F2023.findings-acl.515.pdf)]\n\n##### 性格\n- [2024\u002F05] **TimeChara：评估角色扮演型大型语言模型的时点性格幻觉。** *安在宇（首尔国立大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.18027)] [[代码](https:\u002F\u002Fgithub.com\u002Fahnjaewoo\u002Ftimechara)]\n- [2023\u002F10] **Character-LLM：一种可训练的角色扮演智能体。** *邵云帆（复旦大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.10158)] [[代码](https:\u002F\u002Fgithub.com\u002Fchoosewhatulike\u002Ftrainable-agents\u002F)]\n- [2023\u002F07] **LLM是否具有人格？将MBTI测试作为评估大型语言模型的绝佳工具。** *潘科宇（字节跳动）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.16180)] [[代码](https:\u002F\u002Fgithub.com\u002FHarderThenHarder\u002Ftransformers_tasks)]\n- [2023\u002F07] **大型语言模型中的人格特质。** *穆斯塔法·萨夫达里（DeepMind）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.00184)] [[代码](https:\u002F\u002Fgithub.com\u002FHarderThenHarder\u002Ftransformers_tasks)]\n- [2022\u002F12] **GPT-3是否表现出精神病态？从心理学角度评估大型语言模型。** *李星轩（阿里巴巴）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10529)]\n- [2022\u002F12] 
**识别和操纵语言模型的人格特质。** *格雷厄姆·卡隆等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.10276)]\n\n### 3.2 智能体社会的环境\n\n#### 3.2.1 文本型环境\n\n- [2023\u002F08] **Hoodwinked: 面向语言模型的文本游戏中的欺骗与合作。** *Aidan O’Gara（南加州大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.01404)] [[代码](https:\u002F\u002Fgithub.com\u002Faogara-ds\u002Fhoodwinked)]\n- [2023\u002F03] **CAMEL：用于大规模语言模型社会“心智”探索的沟通型智能体。** *Guohao Li（阿卜杜拉国王科技大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.17760)] [[代码](https:\u002F\u002Fgithub.com\u002Flightaime\u002Fcamel)]\n- [2020\u002F12] **运用常识玩文本游戏。** *Sahith Dambekodi（佐治亚理工学院）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2012.02757.pdf)]\n- [2019\u002F09] **互动式小说游戏：一场浩大的冒险。** *Matthew Hausknecht（微软研究院）等，AAAI。* [[论文](https:\u002F\u002Fcdn.aaai.org\u002Fojs\u002F6297\u002F6297-13-9522-1-10-20200516.pdf)] [[代码](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fjericho)]\n- [2019\u002F03] **在奇幻文本冒险游戏中学习说话与行动。** *Jack Urbanek（Facebook）等，ACL。* [[论文](https:\u002F\u002Faclanthology.org\u002FD19-1062.pdf)] [[代码](https:\u002F\u002Fparl.ai\u002Fprojects\u002Flight\u002F)]\n- [2018\u002F06] **TextWorld：一个面向文本游戏的学习环境。** *Marc-Alexandre Côté（微软研究院）等，IJCAI。* [[论文](https:\u002F\u002Flink.springer.com\u002Fchapter\u002F10.1007\u002F978-3-030-24337-1_3)] [[代码](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FTextWorld)]\n\n#### 3.2.2 虚拟沙盒环境\n\n- [2023\u002F11] **JARVIS-1：基于记忆增强型多模态语言模型的开放世界多任务智能体。** *ZiHao Wang（北京大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.05997)] [[代码](https:\u002F\u002Fgithub.com\u002FCraftJarvis\u002FJARVIS-1)]\n- [2023\u002F10] **Humanoid Agents：模拟类人生成式智能体的平台。** *Zhilin Wang（华盛顿大学和NVIDIA）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05418)] [[代码](https:\u002F\u002Fgithub.com\u002FHumanoidAgents\u002FHumanoidAgents)] [[演示](https:\u002F\u002Fwww.humanoidagents.com\u002F)]\n- [2023\u002F08] **AgentSims：一个用于大型语言模型评估的开源沙盒。** *Jiaju Lin（PTA Studio）等，arXiv。* 
[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.04026)] [[项目页面](https:\u002F\u002Fwww.agentsims.com\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Fpy499372727\u002FAgentSims\u002F)]\n- [2023\u002F05] **在模拟人类社会中训练社会对齐的语言模型。** *Ruibo Liu（达特茅斯学院）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16960)] [[代码](https:\u002F\u002Fgithub.com\u002Fagi-templar\u002FStable-Alignment)]\n- [2023\u002F05] **Voyager：一个基于大型语言模型的开放式具身智能体。** *Guanzhi Wang（NVIDIA）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16291)] [[项目页面](https:\u002F\u002Fvoyager.minedojo.org\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002FMineDojo\u002FVoyager)]\n- [2023\u002F04] **生成式智能体：人类行为的交互式模拟物。** *Joon Sung Park（斯坦福大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.03442)] [[代码](https:\u002F\u002Fgithub.com\u002Fjoonspk-research\u002Fgenerative_agents)]\n- [2023\u002F03] **Plan4MC：面向开放世界Minecraft任务的技能强化学习与规划。** *Haoqi Yuan（北大）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.16563)] [[项目页面](https:\u002F\u002Fsites.google.com\u002Fview\u002Fplan4mc)]\n- [2022\u002F06] **MineDojo：利用互联网规模知识构建开放式具身智能体。** *Linxi Fan（NVIDIA）等，NeurIPS。* [[论文](https:\u002F\u002Fpapers.nips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002F74a67268c5cc5910f64938cac4526a90-Paper-Datasets_and_Benchmarks.pdf)] [[项目页面](https:\u002F\u002Fminedojo.org\u002F)]\n\n#### 3.2.3 物理环境\n\n- [2023\u002F11] **3D世界中的具身通用智能体。** *Jiangyong Huang（BIGAI和北京大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.12871)] [[项目页面](https:\u002F\u002Fembodied-generalist.github.io\u002F)]\n- [2023\u002F09] **RoboAgent：通过语义增强和动作分块实现机器人操作中的泛化与效率。** *Homanga Bharadhwaj（卡内基梅隆大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.01918)] [[项目页面](https:\u002F\u002Frobopen.github.io\u002F)]\n- [2023\u002F05] **AVLEN：3D环境中基于音频-视觉-语言的具身导航。** *Sudipta Paul等人，NeurIPS。* 
[[论文](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2022\u002Ffile\u002F28f699175783a2c828ae74d53dd3da20-Paper-Conference.pdf)]\n- [2023\u002F03] **PaLM-E：一个具身多模态语言模型。** *Danny Driess（谷歌）等，ICML。* [[论文](http:\u002F\u002Fproceedings.mlr.press\u002Fv202\u002Fdriess23a\u002Fdriess23a.pdf)] [[项目页面](https:\u002F\u002Fpalm-e.github.io\u002F)]\n- [2022\u002F10] **交互式语言：与机器人实时对话。** *Corey Lynch（谷歌）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.06407)] [[代码](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Flanguage-table)]\n\n\n### 3.3 基于LLM的智能体社会仿真\n- [2024\u002F03] **大型语言模型驱动的智能体社会中社会规范的涌现。** *Siyue Ren等人，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.08251)] [[代码](https:\u002F\u002Fgithub.com\u002Fsxswz213\u002FCRSEC)]\n- [2023\u002F08] **AgentSims：一个用于大型语言模型评估的开源沙盒。** *Jiaju Lin（PTA Studio）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.04026)] [[项目页面](https:\u002F\u002Fwww.agentsims.com\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Fpy499372727\u002FAgentSims\u002F)]\n- [2023\u002F07] **S\u003Csup>3\u003C\u002Fsup>：基于大型语言模型赋能智能体的社会网络仿真系统。** *Chen Gao（清华大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.14984)]\n- [2023\u002F07] **利用生成式智能体进行流行病建模。** *Ross Williams（弗吉尼亚理工大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.04986)] [[代码](https:\u002F\u002Fgithub.com\u002Fbear96\u002FGABM-Epidemic)]\n- [2023\u002F06] **RecAgent：推荐系统的一种新型仿真范式。** *Lei Wang（中国人民大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.02552)]\n- [2023\u002F05] **在模拟人类社会中训练社会对齐的语言模型。** *Ruibo Liu（达特茅斯学院）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16960)] [[代码](https:\u002F\u002Fgithub.com\u002Fagi-templar\u002FStable-Alignment)]\n- [2023\u002F04] **生成式智能体：人类行为的交互式模拟物。** *Joon Sung Park（斯坦福大学）等，arXiv。* [[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.03442)] [[代码](https:\u002F\u002Fgithub.com\u002Fjoonspk-research\u002Fgenerative_agents)]\n- [2022\u002F08] 
**社会模拟物：为社交计算系统创建有人群的原型。** *Joon Sung Park（斯坦福大学）等，UIST。* [[论文](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3526113.3545616)]\n\n## 4. 其他主题\n\n### 4.1 基于LLM的智能体基准测试\n- [2023\u002F11] **“MAgIC：大型语言模型驱动的多智能体在认知、适应性、理性与协作方面的研究”** *林旭等*（新加坡国立大学、字节跳动、斯坦福大学及加州大学伯克利分校）arXiv。[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2311.08562)] [[项目页面](https:\u002F\u002Fzhiyuanhubj.github.io\u002FMAgIC\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Fcathyxl\u002FMAgIC)]\n  - 该工作提出了一套用于评估多智能体场景下LLM的基准测试框架，表明使用概率图模型可实现平均50%的性能提升。\n- [2023\u002F10] **“将大型语言模型作为人工智能研究智能体进行基准测试”** *黄谦（斯坦福大学）等* arXiv。[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.03302)] [[代码](https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002FMLAgentBench)]\n- [2023\u002F08] **“AgentBench：评估LLM作为智能体”** *刘晓等*（清华大学）arXiv。[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.03688)] [[代码](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAgentBench)] [[项目页面](https:\u002F\u002Fllmbench.ai\u002F)]\n  - AgentBench是一个用于评估LLM作为智能体的基准测试，结果显示顶尖商业模型与开源模型之间存在性能差距。\n- [2023\u002F10] **“SmartPlay：面向LLM作为智能体的基准测试”** *吴岳（卡内基梅隆大学和微软）等* arXiv。[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.01557)] [[代码](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FSmartPlay)]\n  - SmartPlay是一套用于评估LLM作为智能体的基准测试与方法论，包含六种不同游戏以评估关键能力，并为识别当前方法中的不足提供了路线图。\n- [2024\u002F04] **“OSWorld：在真实计算机环境中针对开放式任务的多模态智能体基准测试”** *XLang实验室（香港大学）* arXiv。[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.07972)] [[项目页面](https:\u002F\u002Fdocs.xlang.ai)] [[代码](https:\u002F\u002Fgithub.com\u002Fxlang-ai\u002FOSWorld)] [[数据查看器](https:\u002F\u002Fos-world.github.io\u002Fexplorer.html)]\n  - OSWorld🖥️是一个统一的真实计算机环境，供多模态智能体在Ubuntu、Windows和macOS上，就涉及任意应用和界面的开放式计算机任务进行基准测试。\n\n### 4.2 基于LLM的智能体训练与优化\n- [2024\u002F06] **AgentGym：跨多样化环境进化基于大型语言模型的智能体** *奚志恒（复旦大学）等* arXiv。[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.04151)] [[项目页面](https:\u002F\u002Fagentgym.github.io\u002F)] [[代码与平台](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FAgentGym)] 
[[数据集](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAgentGym\u002FAgentTraj-L)] [[基准测试](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAgentGym\u002FAgentEval)] [[模型](https:\u002F\u002Fhuggingface.co\u002FAgentGym\u002FAgentEvol-7B)]。\n- [2023\u002F10] **FireAct：迈向语言智能体的微调** *陈百安（System2 Research）等* arXiv。[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.05915)] [[项目页面](https:\u002F\u002Ffireact-agent.github.io\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002Fanchen1011\u002FFireAct)] [[数据集](https:\u002F\u002Fgithub.com\u002Fanchen1011\u002FFireAct\u002Ftree\u002Fmain\u002Fdata)]\n- [2023\u002F10] **AgentTuning：为LLM赋能通用智能体能力** *曾傲寒（清华大学）等* arXiv。[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.12823)] [[项目页面](https:\u002F\u002Fthudm.github.io\u002FAgentTuning\u002F)] [[代码](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAgentTuning)] [[数据集](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTHUDM\u002FAgentInstruct)]\n- [2023\u002F10] **Lemur：为语言智能体协调自然语言与代码** *许一恒（香港大学）等* arXiv。[[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.06830)] [[代码](https:\u002F\u002Fgithub.com\u002FOpenLemur\u002FLemur)]\n\n## 引用\n如果您觉得本仓库有用，请引用我们的论文（BibTeX 条目保留论文原始英文信息，便于直接引用）：\n\n```\n@misc{xi2023rise,\n      title={The Rise and Potential of Large Language Model Based Agents: A Survey}, \n      author={Zhiheng Xi and Wenxiang Chen and Xin Guo and Wei He and Yiwen Ding and Boyang Hong and Ming Zhang and Junzhe Wang and Senjie Jin and Enyu Zhou and Rui Zheng and Xiaoran Fan and Xiao Wang and Limao Xiong and Yuhao Zhou and Weiran Wang and Changhao Jiang and Yicheng Zou and Xiangyang Liu and Zhangyue Yin and Shihan Dou and Rongxiang Weng and Wensen Cheng and Qi Zhang and Wenjuan Qin and Yongyan Zheng and Xipeng Qiu and Xuanjing Huang and Tao Gui},\n      year={2023},\n      eprint={2309.07864},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI}\n}\n```\n\n\n## 项目维护者与贡献者\n- Zhiheng Xi （奚志恒, [@WooooDyy](https:\u002F\u002Fgithub.com\u002FWooooDyy)）\n- Wenxiang Chen （陈文翔, [@chenwxOggai](https:\u002F\u002Fgithub.com\u002FchenwxOggai)）\n- Xin Guo （郭昕, [@XinGuo2002](https:\u002F\u002Fgithub.com\u002FXinGuo2002)）\n- Wei He （何为, [@hewei2001](https:\u002F\u002Fgithub.com\u002Fhewei2001)）\n- Yiwen Ding （丁怡文, [@Yiwen-Ding](https:\u002F\u002Fgithub.com\u002FYiwen-Ding)）\n- Boyang Hong （洪博杨, [@HBY-hub](https:\u002F\u002Fgithub.com\u002FHBY-hub)）\n- Ming Zhang （张明, 
[@KongLongGeFDU](https:\u002F\u002Fgithub.com\u002FKongLongGeFDU)）\n- Junzhe Wang （王浚哲, [@zsxmwjz](https:\u002F\u002Fgithub.com\u002Fzsxmwjz)）\n- Senjie Jin （金森杰, [@Leonnnnnn929](https:\u002F\u002Fgithub.com\u002FLeonnnnnn929)）\n\n## 联系方式\n- 奚志恒：zhxi22@m.fudan.edu.cn\n\n\n\n## 星标历史\n\n[![星标历史图表](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_readme_8ba72ed8f802.png)](https:\u002F\u002Fstar-history.com\u002F#WooooDyy\u002FLLM-Agent-Paper-List&Date)","# LLM-Agent-Paper-List 快速上手指南\n\n`LLM-Agent-Paper-List` 是一个专注于大语言模型（LLM）智能体领域的开源论文清单与综述项目。它系统性地整理了基于 LLM 的智能体构建、应用、社会模拟等方向的核心文献，是研究人员和开发者追踪该领域前沿进展的必备资源库。\n\n本项目主要为**论文列表与知识库**，无需复杂的运行时环境即可浏览内容。若需复现列表中提到的具体算法（如 AgentGym），请参考对应论文的独立仓库。\n\n## 环境准备\n\n由于本项目本质为文档与资源索引，对系统要求极低：\n\n- **操作系统**：Windows \u002F macOS \u002F Linux 均可\n- **前置依赖**：\n  - Git（用于克隆仓库）\n  - 现代浏览器（用于查看渲染后的 Markdown 或访问链接）\n  - （可选）Python 3.x（仅当你需要运行列表中某些论文提供的配套代码时）\n\n> **注意**：本仓库本身不包含需要安装的 Python 包。如需体验项目中提到的 `AgentGym` 框架，请访问其独立仓库 [AgentGym](https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FAgentGym)。\n\n## 安装步骤\n\n通过 Git 克隆项目到本地，即可随时查阅最新的论文列表。\n\n```bash\n# 1. 克隆仓库\ngit clone https:\u002F\u002Fgithub.com\u002FWooooDyy\u002FLLM-Agent-Paper-List.git\n\n# 2. 进入项目目录\ncd LLM-Agent-Paper-List\n\n# 3. (可选) 拉取最新更新，保持论文列表同步\ngit pull origin main\n```\n\n> **国内加速建议**：\n> 如果直接克隆速度较慢，可使用 Gitee 镜像（如有）或通过以下命令配置代理加速：\n> ```bash\n> # 示例：使用国内镜像源克隆 (若存在官方同步镜像)\n> # 或者使用 git 代理设置\n> export GIT_PROXY_COMMAND=\"connect-proxy\" \n> ```\n> *注：该项目主要托管于 GitHub，建议确保网络通畅或使用合法的加速工具。*\n\n## 基本使用\n\n### 1. 浏览论文清单\n克隆完成后，你可以直接在本地使用 Markdown 阅读器（如 VS Code、Typora）打开 `README.md` 文件，或者直接在 GitHub 网页上浏览。\n\n项目内容按以下逻辑分类，你可以根据需求快速定位：\n\n- **1. 智能体的诞生 (Construction)**：涵盖大脑（LLM 核心）、感知（多模态输入）和行动（工具使用\u002F具身行动）三大组件的相关论文。\n- **2. 实践应用 (Applications)**：包含单智能体任务、多智能体协作以及人机交互场景的研究。\n- **3. 智能体社会 (Agent Society)**：探讨智能体的行为、性格及社会模拟实验。\n- **4. 其他主题**：包括基准测试 (Benchmarks) 和训练优化方法。\n\n### 2. 
查找特定论文\n在 `README.md` 中，每篇论文都标注了：\n- 📅 发布日期\n- 📝 标题与作者\n- 🔗 **[paper]**：指向 arXiv 或会议论文的链接\n- 💻 **[code]**：指向官方代码仓库的链接（如有）\n\n**使用示例**：\n假设你想研究“多智能体协作”，直接在文件中搜索 `2.2 Coordinating Potential of Multiple Agents`，即可找到相关论文列表，点击 `[paper]` 链接阅读原文，点击 `[code]` 链接获取实现代码。\n\n### 3. 贡献与更新\n该项目社区活跃，鼓励提交 PR 补充新论文。\n- 查看最新新闻：关注 `News` 章节，了解如 `AgentGym-RL` 等新框架的发布。\n- 提交新论文：通过 GitHub Issues 或 Pull Requests 向仓库添加遗漏的重要文献。\n\n---\n*提示：本指南仅针对论文列表仓库。若要动手开发智能体，请根据列表中推荐的论文（如 AgentGym），跳转至对应的代码仓库进行环境配置与模型训练。*","某高校人工智能实验室的博士生正在撰写关于“大语言模型智能体强化学习”的综述论文，急需梳理该领域的最新进展与核心文献。\n\n### 没有 LLM-Agent-Paper-List 时\n- **文献检索如大海捞针**：需要在 arXiv、Google Scholar 等多个平台反复搜索关键词，难以区分哪些是真正具有里程碑意义的必读论文，效率极低。\n- **知识体系支离破碎**：收集到的论文杂乱无章，缺乏统一的框架（如大脑、感知、行动）进行归类，难以构建系统的理论认知。\n- **错过关键前沿动态**：容易遗漏像 AgentGym-RL 这样刚刚发布、支持多轮强化学习的最新成果，导致研究内容滞后于社区发展。\n- **复现与环境搭建困难**：找到论文后，往往需要花费大量时间单独寻找对应的代码库、数据集或交互式前端，甚至发现资源已失效。\n\n### 使用 LLM-Agent-Paper-List 后\n- **一站式获取权威书单**：直接基于 86 页 SCIS 封面综述论文整理的清单，快速锁定涵盖单智能体、多智能体及社会行为等方向的必读文献。\n- **结构化掌握技术脉络**：依托工具提供的“大脑 - 感知 - 行动”概念框架，将零散论文有序归档，迅速理清技术演进路线。\n- **实时同步最新突破**：通过新闻板块即时捕捉到 2025 年 9 月发布的 AgentGym-RL 框架及其教程，确保研究紧跟最前沿的长程决策训练方法。\n- **资源链接直达可用**：每篇重要论文均附带项目主页、GitHub 代码库及 HuggingFace 数据集链接，甚至包含可视化的交互前端，大幅缩短复现路径。\n\nLLM-Agent-Paper-List 将原本数周的文献调研工作压缩至数小时，让研究者能从繁琐的信息搜集转向深度的创新思考。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FWooooDyy_LLM-Agent-Paper-List_472f7abf.png","WooooDyy","Zhiheng Xi","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FWooooDyy_0bce9f32.jpg","Now PhD student at Fudan NLP Group of Fudan University. 
Previously got Bachelor's degree at Nanjing University.\r\n","Fudan University",null,"Be1ong1","https:\u002F\u002Fwoooodyy.github.io\u002F","https:\u002F\u002Fgithub.com\u002FWooooDyy",8110,492,"2026-04-18T08:11:14","","未说明",{"notes":88,"python":86,"dependencies":89},"该仓库（LLM-Agent-Paper-List）主要是一个关于基于大语言模型（LLM）的智能体（Agents）的论文列表和综述资源，并非一个可直接运行的软件工具或框架，因此 README 中未提供具体的操作系统、GPU、内存、Python 版本或依赖库等运行环境需求。文中提到的相关代码实现（如 AgentGym, R3 等）位于独立的外部仓库链接中，需参考那些具体项目的文档以获取环境配置信息。",[],[14,13,35],[92,93,94,95,96],"agent","large-language-models","llm","nlp","survey","2026-03-27T02:49:30.150509","2026-04-18T22:31:45.717620",[],[]]