[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-LantaoYu--MARL-Papers":3,"tool-LantaoYu--MARL-Papers":64},[4,17,27,35,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,43,44,45,15,46,26,13,47],"数据工具","视频","插件","其他","音频",{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,46],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},2181,"OpenHands","OpenHands\u002FOpenHands","OpenHands 是一个专注于 AI 
驱动开发的开源平台，旨在让智能体（Agent）像人类开发者一样理解、编写和调试代码。它解决了传统编程中重复性劳动多、环境配置复杂以及人机协作效率低等痛点，通过自动化流程显著提升开发速度。\n\n无论是希望提升编码效率的软件工程师、探索智能体技术的研究人员，还是需要快速原型验证的技术团队，都能从中受益。OpenHands 提供了灵活多样的使用方式：既可以通过命令行（CLI）或本地图形界面在个人电脑上轻松上手，体验类似 Devin 的流畅交互；也能利用其强大的 Python SDK 自定义智能体逻辑，甚至在云端大规模部署上千个智能体并行工作。\n\n其核心技术亮点在于模块化的软件智能体 SDK，这不仅构成了平台的引擎，还支持高度可组合的开发模式。此外，OpenHands 在 SWE-bench 基准测试中取得了 77.6% 的优异成绩，证明了其解决真实世界软件工程问题的能力。平台还具备完善的企业级功能，支持与 Slack、Jira 等工具集成，并提供细粒度的权限管理，适合从个人开发者到大型企业的各类用户场景。",70612,"2026-04-05T11:12:22",[26,15,13,45],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":82,"owner_website":83,"owner_url":84,"languages":82,"stars":85,"forks":86,"last_commit_at":87,"license":82,"difficulty_score":88,"env_os":89,"env_gpu":90,"env_ram":90,"env_deps":91,"category_tags":98,"github_topics":99,"view_count":10,"oss_zip_url":82,"oss_zip_packed_at":82,"status":16,"created_at":102,"updated_at":103,"faqs":104,"releases":135},1164,"LantaoYu\u002FMARL-Papers","MARL-Papers","Paper list of multi-agent reinforcement learning (MARL)","MARL-Papers是一个专注于多智能体强化学习（MARL）的论文集合工具，按时间顺序整理了相关研究和综述论文。它帮助用户快速找到该领域的最新成果和经典文献，涵盖从基础理论到实际应用的多个方向，如协作与竞争、通信学习、迁移学习等。对于从事人工智能、机器学习或机器人研究的学者和开发者来说，这是一个高效获取知识资源的平台。工具内容经过分类整理，便于查找和深入理解不同分支的研究进展。其开放性和持续更新的特性，使得研究人员能够紧跟领域动态，推动自身工作的发展。","## Paper Collection of Multi-Agent Reinforcement Learning (MARL)\n\nMulti-Agent Reinforcement Learning is a very interesting research area, which has strong connections with single-agent RL, multi-agent systems, game theory, evolutionary computation and optimization theory, and its application in Large Language Models (LLMs) and Robotics.\n\nThis is a collection of research and review papers of multi-agent reinforcement learning (MARL). The Papers are sorted by time. 
Any suggestions and pull requests are welcome.

These references are shared for research purposes. If any authors do not want their paper to be listed here, please feel free to contact us.

## Overview
* [Tutorial](https://github.com/LantaoYu/MARL-Papers#tutorial-and-books)
* [Review Papers](https://github.com/LantaoYu/MARL-Papers#review-papers)
* [Research Papers](https://github.com/LantaoYu/MARL-Papers#research-papers)
  * [Framework](https://github.com/LantaoYu/MARL-Papers#framework)
  * [Joint action learning](https://github.com/LantaoYu/MARL-Papers#joint-action-learning)
  * [Cooperation and competition](https://github.com/LantaoYu/MARL-Papers#cooperation-and-competition)
  * [Coordination](https://github.com/LantaoYu/MARL-Papers#coordination)
  * [Security](https://github.com/LantaoYu/MARL-Papers#security)
  * [Self-Play](https://github.com/LantaoYu/MARL-Papers#self-play)
  * [Learning To Communicate](https://github.com/LantaoYu/MARL-Papers#learning-to-communicate)
  * [Transfer Learning](https://github.com/LantaoYu/MARL-Papers#transfer-learning)
  * [Imitation and Inverse Reinforcement Learning](https://github.com/LantaoYu/MARL-Papers#imitation-and-inverse-reinforcement-learning)
  * [Meta Learning](https://github.com/LantaoYu/MARL-Papers#meta-learning)
  * [Application](https://github.com/LantaoYu/MARL-Papers#application)
  * [Networked MARL (Decentralized Training Decentralized Execution)](https://github.com/LantaoYu/MARL-Papers#networked-marl)
  * [MARL in LLMs (Large Language Models)](https://github.com/LantaoYu/MARL-Papers#marl-in-llms)
  * [MARL in Robotics](https://github.com/LantaoYu/MARL-Papers#marl-in-robotics)

## Tutorial and Books
* [Multi-Agent Reinforcement Learning: Foundations and Modern Approaches](https://www.marl-book.com/download) by Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer, 2023.
* [Many-agent Reinforcement Learning](https://discovery.ucl.ac.uk/id/eprint/10124273/12/Yang_10124273_thesis_revised.pdf) by Yaodong Yang, 2021. PhD Thesis.
* [Deep Multi-Agent Reinforcement Learning](https://ora.ox.ac.uk/objects/uuid:a55621b3-53c0-4e1b-ad1c-92438b57ffa4) by Jakob N. Foerster, 2018. PhD Thesis.
* [Multi-Agent Machine Learning: A Reinforcement Approach](https://onlinelibrary.wiley.com/doi/book/10.1002/9781118884614) by H. M. Schwartz, 2014.
* [Multiagent Reinforcement Learning](http://www.ecmlpkdd2013.org/wp-content/uploads/2013/09/Multiagent-Reinforcement-Learning.pdf) by Daan Bloembergen, Daniel Hennes, Michael Kaisers, and Peter Vrancx. ECML, 2013.
* [Multiagent systems: Algorithmic, game-theoretic, and logical foundations](http://www.masfoundations.org/download.html) by Shoham Y, Leyton-Brown K. Cambridge University Press, 2008.

## Review Papers
* [The Landscape of Agentic Reinforcement Learning for LLMs: A Survey](https://arxiv.org/abs/2509.02547) by Guibin Zhang, Hejia Geng, Xiaohang Yu, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, Yifan Zhou, Yang Chen, Chen Zhang, Yutao Fan, Zihu Wang, Songtao Huang, Yue Liao, Hongru Wang, Mengyue Yang, Heng Ji, Michael Littman, Jun Wang, Shuicheng Yan, Philip Torr, and Lei Bai. 2025. [[GitHub](https://github.com/xhyumiracle/Awesome-AgenticLLM-RL-Papers)]
* [Model-based Multi-agent Reinforcement Learning: Recent Progress and Prospects](https://arxiv.org/pdf/2203.10603.pdf) by Xihuai Wang, Zhicheng Zhang, and Weinan Zhang. 2022.
* [An overview of multi-agent reinforcement learning from game theoretical perspective](https://arxiv.org/pdf/2011.00583.pdf) by Yaodong Yang and Jun Wang. 2020.
* [A Survey and Critique of Multiagent Deep Reinforcement Learning](https://arxiv.org/abs/1810.05587) by Pablo Hernandez-Leal, Bilal Kartal, and Matthew E. Taylor. 2019.
* [Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms](https://arxiv.org/pdf/1911.10635.pdf) by Kaiqing Zhang, Zhuoran Yang, and Tamer Başar. 2019.
* [A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems](https://www.jair.org/index.php/jair/article/view/11396) by Silva, Felipe Leno da; Costa, Anna Helena Reali. JAIR, 2019.
* [Autonomously Reusing Knowledge in Multiagent Reinforcement Learning](https://www.ijcai.org/proceedings/2018/774) by Silva, Felipe Leno da; Taylor, Matthew E.; Costa, Anna Helena Reali. IJCAI, 2018.
* [Deep Reinforcement Learning Variants of Multi-Agent Learning Algorithms](https://project-archive.inf.ed.ac.uk/msc/20162091/msc_proj.pdf) by Castaneda A O. 2016.
* [Evolutionary Dynamics of Multi-Agent Learning: A Survey](https://www.jair.org/index.php/jair/article/view/10952) by Bloembergen, Daan, et al. JAIR, 2015.
* [Game theory and multi-agent reinforcement learning](https://www.researchgate.net/publication/269100101_Game_Theory_and_Multi-agent_Reinforcement_Learning) by Nowé A, Vrancx P, De Hauwere Y M. Reinforcement Learning. Springer Berlin Heidelberg, 2012.
* [Multi-agent reinforcement learning: An overview](http://www.dcsc.tudelft.nl/~bdeschutter/pub/rep/10_003.pdf) by Buşoniu L, Babuška R, De Schutter B. Innovations in multi-agent systems and applications-1. Springer Berlin Heidelberg, 2010.
* [A comprehensive survey of multi-agent reinforcement learning](http://www.dcsc.tudelft.nl/~bdeschutter/pub/rep/07_019.pdf) by Busoniu L, Babuska R, De Schutter B. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, 2008.
* [If multi-agent learning is the answer, what is the question?](http://robotics.stanford.edu/~shoham/www%20papers/LearningInMAS.pdf) by Shoham Y, Powers R, Grenager T. Artificial Intelligence, 2007.
* [From single-agent to multi-agent reinforcement learning: Foundational concepts and methods](http://users.isr.ist.utl.pt/~mtjspaan/readingGroup/learningNeto05.pdf) by Neto G. Learning theory course, 2005.
* [Evolutionary game theory and multi-agent reinforcement learning](https://pdfs.semanticscholar.org/bb9f/bee22eae2b47bbf304804a6ac07def1aecdb.pdf) by Tuyls K, Nowé A. The Knowledge Engineering Review, 2005.
* [An Overview of Cooperative and Competitive Multiagent Learning](https://www.researchgate.net/publication/221622801_An_Overview_of_Cooperative_and_Competitive_Multiagent_Learning) by Pieter Jan ’t Hoen, Karl Tuyls, Liviu Panait, Sean Luke, and J. A. La Poutré.
AAMAS workshop LAMAS, 2005.
* [Cooperative multi-agent learning: the state of the art](https://cs.gmu.edu/~eclab/papers/panait05cooperative.pdf) by Liviu Panait and Sean Luke, 2005.

## Research Papers

### MARL in LLMs
* [CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards](https://arxiv.org/abs/2510.08529) by Xiangyuan Xue, Yifan Zhou, Guibin Zhang, Zaibin Zhang, Yijiang Li, Chen Zhang, Zhenfei Yin, Philip Torr, Wanli Ouyang, and Lei Bai. 2025.
* [Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task](https://arxiv.org/abs/2409.08811) by Shao Zhang*, Xihuai Wang*, Wenhao Zhang, Yongshan Chen, Landi Gao, Dakuo Wang, Weinan Zhang, Xinbing Wang, and Ying Wen. 2024.
* [Large language model based multi-agents: A survey of progress and challenges](https://arxiv.org/pdf/2402.01680) by Guo, Taicheng, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. 2024.
* [Leveraging Large Language Models for Optimised Coordination in Textual Multi-Agent Reinforcement Learning](https://openreview.net/pdf?id=1PPjf4wife) by Slumbers, Oliver, David Henry Mguni, Kun Shao, and Jun Wang. 2024.
* [Theory of mind for multi-agent collaboration via large language models](https://arxiv.org/pdf/2310.10701) by Li, Huao, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, and Katia Sycara. 2023.

### Framework
* [Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving](https://arxiv.org/pdf/2405.18209) by Zhi Zheng and Shangding Gu, 2024.
* [Multi-Agent Constrained Policy Optimisation](https://arxiv.org/pdf/2110.02793.pdf) by Shangding Gu, Jakub Grudzien Kuba, Muning Wen, Ruiqing Chen, Ziyan Wang, Zheng Tian, Jun Wang, Alois Knoll, and Yaodong Yang, 2021.
* [Settling the Variance of Multi-Agent Policy Gradients](https://arxiv.org/pdf/2108.08612.pdf) by Jakub Grudzien Kuba, Muning Wen, Linghui Meng, Shangding Gu, Haifeng Zhang, David Mguni, Jun Wang, and Yaodong Yang. NIPS 2021.
* [QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/1803.11485.pdf) by Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. ICML 2018.
* [Mean Field Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/1802.05438.pdf) by Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. ICML 2018.
* [Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](https://arxiv.org/pdf/1706.02275.pdf) by Lowe R, Wu Y, Tamar A, et al. arXiv, 2017.
* [Deep Decentralized Multi-task Multi-Agent RL under Partial Observability](https://arxiv.org/pdf/1703.06182.pdf) by Omidshafiei S, Pazis J, Amato C, et al. arXiv, 2017.
* [Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games](https://arxiv.org/pdf/1703.10069.pdf) by Peng P, Yuan Q, Wen Y, et al. arXiv, 2017.
* [Robust Adversarial Reinforcement Learning](https://arxiv.org/pdf/1703.02702.pdf) by Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta. arXiv, 2017.
* [Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/1702.08887.pdf) by Foerster J, Nardelli N, Farquhar G, et al. arXiv, 2017.
* [Multiagent reinforcement learning with sparse interactions by negotiation and knowledge transfer](https://arxiv.org/pdf/1508.05328.pdf) by Zhou L, Yang P, Chen C, et al. IEEE Transactions on Cybernetics, 2016.
* [Decentralised multi-agent reinforcement learning for dynamic and uncertain environments](https://arxiv.org/pdf/1409.4561.pdf) by Marinescu A, Dusparic I, Taylor A, et al. arXiv, 2014.
* [CLEANing the reward: counterfactual actions to remove exploratory action noise in multiagent learning](http://irll.eecs.wsu.edu/wp-content/papercite-data/pdf/2014iat-holmesparker.pdf) by HolmesParker C, Taylor M E, Agogino A, et al. AAMAS, 2014.
* [Bayesian reinforcement learning for multiagent systems with state uncertainty](http://www.fransoliehoek.net/docs/Amato13MSDM.pdf) by Amato C, Oliehoek F A. MSDM Workshop, 2013.
* [Multiagent learning: Basics, challenges, and prospects](http://www.weiss-gerhard.info/publications/AI_MAGAZINE_2012_TuylsWeiss.pdf) by Tuyls, Karl, and Gerhard Weiss. AI Magazine, 2012.
* [Classes of multiagent q-learning dynamics with epsilon-greedy exploration](http://icml2010.haifa.il.ibm.com/papers/191.pdf) by Wunder M, Littman M L, Babes M. ICML, 2010.
* [Conditional random fields for multi-agent reinforcement learning](http://www.machinelearning.org/proceedings/icml2007/papers/89.pdf) by Zhang X, Aberdeen D, Vishwanathan S V N. ICML, 2007.
* [Multi-agent reinforcement learning using strategies and voting](http://ama.imag.fr/~partalas/partalasmarl.pdf) by Partalas, Ioannis, Ioannis Feneris, and Ioannis Vlahavas. ICTAI, 2007.
* [A reinforcement learning scheme for a partially-observable multi-agent game](https://pdfs.semanticscholar.org/57fb/ae00e17c0d798559ebab0e8f4267e032f41d.pdf) by Ishii S, Fujita H, Mitsutake M, et al. Machine Learning, 2005.
* [Asymmetric multiagent reinforcement learning](http://lib.tkk.fi/Diss/2004/isbn9512273594/article1.pdf) by Könönen V. Web Intelligence and Agent Systems, 2004.
* [Adaptive policy gradient in multiagent learning](http://dl.acm.org/citation.cfm?id=860686) by Banerjee B, Peng J. AAMAS, 2003.
* [Reinforcement learning to play an optimal Nash equilibrium in team Markov games](https://papers.nips.cc/paper/2171-reinforcement-learning-to-play-an-optimal-nash-equilibrium-in-team-markov-games.pdf) by Wang X, Sandholm T. NIPS, 2002.
* [Multiagent learning using a variable learning rate](https://www.sciencedirect.com/science/article/pii/S0004370202001212) by Michael Bowling and Manuela Veloso, 2002.
* [Value-function reinforcement learning in Markov games](http://www.sts.rpi.edu/~rsun/si-mal/article3.pdf) by Littman M L. Cognitive Systems Research, 2001.
* [Hierarchical multi-agent reinforcement learning](http://researchers.lille.inria.fr/~ghavamza/my_website/Publications_files/agents01.pdf) by Makar, Rajbala, Sridhar Mahadevan, and Mohammad Ghavamzadeh.
The Fifth International Conference on Autonomous Agents, 2001.
* [An analysis of stochastic game theory for multiagent reinforcement learning](https://www.cs.cmu.edu/~mmv/papers/00TR-mike.pdf) by Michael Bowling and Manuela Veloso, 2000.

### Joint action learning
* [AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents](http://www.cs.cmu.edu/~conitzer/awesomeML06.pdf) by Conitzer V, Sandholm T. Machine Learning, 2007.
* [Extending Q-Learning to General Adaptive Multi-Agent Systems](https://papers.nips.cc/paper/2503-extending-q-learning-to-general-adaptive-multi-agent-systems.pdf) by Tesauro, Gerald. NIPS, 2003.
* [Multiagent reinforcement learning: theoretical framework and an algorithm](http://www.lirmm.fr/~jq/Cours/3cycle/module/HuWellman98icml.pdf) by Hu, Junling, and Michael P. Wellman. ICML, 1998.
* [The dynamics of reinforcement learning in cooperative multiagent systems](http://www.aaai.org/Papers/AAAI/1998/AAAI98-106.pdf) by Claus C, Boutilier C. AAAI, 1998.
* [Markov games as a framework for multi-agent reinforcement learning](https://www.cs.duke.edu/courses/spring07/cps296.3/littman94markov.pdf) by Littman, Michael L. ICML, 1994.

### Cooperation and competition
* [Order Matters: Agent-by-agent Policy Optimization](https://arxiv.org/pdf/2302.06205.pdf) by Xihuai Wang, Zheng Tian, Ziyu Wan, Ying Wen, Jun Wang, and Weinan Zhang. ICLR 2023.
* [Interaction Pattern Disentangling for Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/2207.03902.pdf) by Shunyu Liu, Jie Song, Yihe Zhou, Na Yu, Kaixuan Chen, Zunlei Feng, and Mingli Song. TPAMI, 2024.
* [Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition](https://arxiv.org/pdf/2211.12712.pdf) by Shunyu Liu, Yihe Zhou, Jie Song, Tongya Zheng, Kaixuan Chen, Tongtian Zhu, Zunlei Feng, and Mingli Song. AAAI, 2023.
* [Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?](https://arxiv.org/pdf/2305.17352.pdf) by Yihe Zhou, Shunyu Liu, Yunpeng Qing, Kaixuan Chen, Tongya Zheng, Yanhao Huang, Jie Song, and Mingli Song. 2023.
* [Multi-Agent Reinforcement Learning is a Sequence Modeling Problem](https://arxiv.org/pdf/2205.14953.pdf) by Wen, Muning, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, and Yaodong Yang, 2022.
* [The Complexity of Markov Equilibrium in Stochastic Games](https://arxiv.org/pdf/2204.03991.pdf) by Daskalakis, Constantinos, Noah Golowich, and Kaiqing Zhang, 2022.
* [Trust region policy optimisation in multi-agent reinforcement learning](https://arxiv.org/pdf/2109.11251.pdf) by Kuba, Jakub Grudzien, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, and Yaodong Yang. ICLR 2022.
* [Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts](https://arxiv.org/pdf/2203.10603.pdf) by Weinan Zhang, Xihuai Wang, Jian Shen, and Ming Zhou. IJCAI 2021.
* [The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games](https://arxiv.org/pdf/2103.01955.pdf) by Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, and Yi Wu, 2021.
* [Human-level performance in 3D multiplayer games with population-based reinforcement learning](https://www.science.org/doi/abs/10.1126/science.aau6249) by Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, et al. Science 364(6443): 859-865, 2019.
* [Emergent complexity through multi-agent competition](https://arxiv.org/pdf/1710.03748.pdf) by Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, and Igor Mordatch, 2018.
* [Learning with Opponent-Learning Awareness](https://arxiv.org/pdf/1709.04326.pdf) by Jakob Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch, 2018.
* [Multi-agent Reinforcement Learning in Sequential Social Dilemmas](https://arxiv.org/pdf/1702.03037.pdf) by Leibo J Z, Zambaldi V, Lanctot M, et al. arXiv, 2017. [[Post](https://deepmind.com/blog/understanding-agent-cooperation/)]
* [Cooperative Multi-Agent Control Using Deep Reinforcement Learning](https://ala2017.it.nuigalway.ie/papers/ALA2017_Gupta.pdf) by Gupta, J. K., Egorov, M., & Kochenderfer, M. AAMAS 2017.
* [Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds](http://orca.st.usm.edu/~banerjee/papers/p530-ceren.pdf) by Roi Ceren, Prashant Doshi, and Bikramjit Banerjee, pp. 530-538. AAMAS 2016.
* [Opponent Modeling in Deep Reinforcement Learning](http://www.umiacs.umd.edu/~hal/docs/daume16opponent.pdf) by He H, Boyd-Graber J, Kwok K, et al. ICML, 2016.
* [Multiagent cooperation and competition with deep reinforcement learning](https://arxiv.org/pdf/1511.08779.pdf) by Tampuu A, Matiisen T, Kodelja D, et al. arXiv, 2015.
* [Emotional multiagent reinforcement learning in social dilemmas](http://www.uow.edu.au/~fren/documents/EMR_2013.pdf) by Yu C, Zhang M, Ren F. International Conference on Principles and Practice of Multi-Agent Systems, 2013.
* [Multi-agent reinforcement learning in common interest and fixed sum stochastic games: An experimental study](http://www.jmlr.org/papers/volume9/bab08a/bab08a.pdf) by Bab, Avraham, and Ronen I. Brafman. Journal of Machine Learning Research, 2008.
* [Combining policy search with planning in multi-agent cooperation](https://pdfs.semanticscholar.org/5120/d9f2c738ad223e9f8f14cb3fd5612239a35c.pdf) by Ma J, Cameron S. Robot Soccer World Cup, 2008.
* [Collaborative multiagent reinforcement learning by payoff propagation](http://www.jmlr.org/papers/volume7/kok06a/kok06a.pdf) by Kok J R, Vlassis N. JMLR, 2006.
* [Learning to cooperate in multi-agent social dilemmas](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.107.335&rep=rep1&type=pdf) by de Cote E M, Lazaric A, Restelli M. AAMAS, 2006.
* [Learning to compete, compromise, and cooperate in repeated general-sum games](http://www.machinelearning.org/proceedings/icml2005/papers/021_Learning_CrandallGoodrich.pdf) by Crandall J W, Goodrich M A. ICML, 2005.
* [Sparse cooperative Q-learning](http://www.machinelearning.org/proceedings/icml2004/papers/267.pdf) by Kok J R, Vlassis N. ICML, 2004.
* [Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games](https://arxiv.org/pdf/2106.01969.pdf) by Leonardos, Stefanos, Will Overman, Ioannis Panageas, and Georgios Piliouras. 2021.
* [Markov α-Potential Games: Equilibrium Approximation and Regret Analysis](https://arxiv.org/pdf/2305.12553.pdf) by Xin G, et al. 2023.
* [A Natural Actor-Critic Framework for Zero-Sum Markov Games](https://proceedings.mlr.press/v162/alacaoglu22a/alacaoglu22a.pdf) by Ahmet A, et al. ICML, 2022.

### Coordination
* [ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination](https://arxiv.org/pdf/2310.05208.pdf) by Xihuai Wang, Shao Zhang, Wenhao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, and Weinan Zhang. NeurIPS 2024.
* [Collaborating with Humans without Human Data](https://openreview.net/pdf?id=1Kof-nkmQB8) by DJ Strouse, Kevin R. McKee, Matt Botvinick, Edward Hughes, and Richard Everett. NeurIPS 2021.
* [Coordinated Multi-Agent Imitation Learning](https://arxiv.org/pdf/1703.03121.pdf) by Le H M, Yue Y, Carr P. arXiv, 2017.
* [Reinforcement social learning of coordination in networked cooperative multiagent systems](http://mipc.inf.ed.ac.uk/2014/papers/mipc2014_hao_etal.pdf) by Hao J, Huang D, Cai Y, et al. AAAI Workshop, 2014.
* [Coordinating multi-agent reinforcement learning with limited communication](http://www.aamas-conference.org/Proceedings/aamas2013/docs/p1101.pdf) by Zhang, Chongjie, and Victor Lesser. AAMAS, 2013.
* [Coordination guided reinforcement learning](http://www.ifaamas.org/Proceedings/aamas2012/papers/1B_1.pdf) by Lau Q P, Lee M L, Hsu W. AAMAS, 2012.
* [Coordination in multiagent reinforcement learning: a Bayesian approach](https://www.cs.toronto.edu/~cebly/Papers/bayesMARL.pdf) by Chalkiadakis G, Boutilier C. AAMAS, 2003.
* [Coordinated reinforcement learning](https://users.cs.duke.edu/~parr/icml02.pdf) by Guestrin C, Lagoudakis M, Parr R. ICML, 2002.
* [Reinforcement learning of coordination in cooperative multi-agent systems](http://www.aaai.org/Papers/AAAI/2002/AAAI02-050.pdf) by Kapetanakis S, Kudenko D.
AAAI/IAAI, 2002.

### Security
* [Markov Security Games: Learning in Spatial Security Problems](http://www.fransoliehoek.net/docs/Klima16LICMAS.pdf) by Klima R, Tuyls K, Oliehoek F. The Learning, Inference and Control of Multi-Agent Systems workshop at NIPS, 2016.
* [Cooperative Capture by Multi-Agent using Reinforcement Learning, Application for Security Patrol Systems](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7244682) by Yasuyuki S, Hirofumi O, Tadashi M, et al. Control Conference (ASCC), 2015.
* [Improving learning and adaptation in security games by exploiting information asymmetry](http://www4.ncsu.edu/~hdai/infocom-2015-XH.pdf) by He X, Dai H, Ning P. INFOCOM, 2015.

### Self-Play
* [A Comparison of Self-Play Algorithms Under a Generalized Framework](https://arxiv.org/abs/2006.04471) by Daniel Hernandez, Kevin Denamganai, Sam Devlin, et al. IEEE Transactions on Games, 2021.
* [A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning](https://arxiv.org/pdf/1711.00832.pdf) by Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, and Thore Graepel. NIPS 2017.
* [Deep reinforcement learning from self-play in imperfect-information games](https://arxiv.org/pdf/1603.01121.pdf) by Heinrich, Johannes, and David Silver. arXiv, 2016.
* [Fictitious Self-Play in Extensive-Form Games](http://jmlr.org/proceedings/papers/v37/heinrich15.pdf) by Heinrich, Johannes, Marc Lanctot, and David Silver. ICML, 2015.

### Learning To Communicate
* [Hammer: Multi-level coordination of reinforcement learning agents via learned messaging] by Nikunj Gupta, G. Srinivasaraghavan, Swarup Mohalik, Nishant Kumar, and Matthew E. Taylor. Neural Computing and Applications, 2023.
* [Learning to ground multi-agent communication with autoencoders](https://arxiv.org/pdf/2110.15349) by Lin, Toru, Jacob Huh, Christopher Stauffer, Ser Nam Lim, and Phillip Isola. 2021.
* [Emergent Communication through Negotiation](https://openreview.net/pdf?id=Hk6WhagRW) by Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, and Stephen Clark, 2018.
* [Emergence of Linguistic Communication From Referential Games with Symbolic and Pixel Input](https://openreview.net/pdf?id=HJGv1Z-AW) by Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, and Stephen Clark. ICLR 2018.
* [Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols](https://openreview.net/pdf?id=SkaxnKEYg) by Serhii Havrylov and Ivan Titov. ICLR Workshop, 2017.
* [Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning](https://arxiv.org/pdf/1703.06585.pdf) by Abhishek Das, Satwik Kottur, et al. arXiv, 2017.
* [Emergence of Grounded Compositional Language in Multi-Agent Populations](https://arxiv.org/pdf/1703.04908.pdf) by Igor Mordatch and Pieter Abbeel. arXiv, 2017. [[Post](https://openai.com/blog/learning-to-communicate/)]
* [Cooperation and communication in multiagent deep reinforcement learning](https://repositories.lib.utexas.edu/handle/2152/45681) by Hausknecht M J. 2017.
* [Multi-agent cooperation and the emergence of (natural) language](https://openreview.net/pdf?id=Hk8N3Sclg) by Lazaridou A, Peysakhovich A, Baroni M. arXiv, 2016.
* [Learning to communicate to solve riddles with deep distributed recurrent q-networks](https://arxiv.org/pdf/1602.02672.pdf) by Foerster J N, Assael Y M, de Freitas N, et al. arXiv, 2016.
* [Learning to communicate with deep multi-agent reinforcement learning](https://arxiv.org/pdf/1605.06676.pdf) by Foerster J, Assael Y M, de Freitas N, et al. NIPS, 2016.
* [Learning multiagent communication with backpropagation](http://papers.nips.cc/paper/6398-learning-multiagent-communication-with-backpropagation.pdf) by Sukhbaatar S, Fergus R. NIPS, 2016.
* [Efficient distributed reinforcement learning through agreement](http://people.csail.mit.edu/lpk/papers/dars08.pdf) by Varshavskaya P, Kaelbling L P, Rus D. Distributed Autonomous Robotic Systems, 2009.

### Transfer Learning
* [Simultaneously Learning and Advising in Multiagent Reinforcement Learning](http://www.ifaamas.org/Proceedings/aamas2017/pdfs/p1100.pdf) by Silva, Felipe Leno da; Glatt, Ruben; and Costa, Anna Helena Reali. AAMAS, 2017.
* [Accelerating Multiagent Reinforcement Learning through Transfer Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14217/14005) by Silva, Felipe Leno da; and Costa, Anna Helena Reali. AAAI, 2017.
* [Accelerating multi-agent reinforcement learning with dynamic co-learning](https://web.cs.umass.edu/publication/docs/2015/UM-CS-2015-004.pdf) by Garant D, da Silva B C, Lesser V, et al. Technical report, 2015.
* [Transfer learning in multi-agent systems through parallel transfer](https://www.scss.tcd.ie/~tayloral/res/papers/Taylor_ParallelTransferLearning_ICML_2013.pdf) by Taylor, Adam, et al. ICML, 2013.
* [Transfer learning in multi-agent reinforcement learning domains](https://ewrl.files.wordpress.com/2011/08/ewrl2011_submission_19.pdf) by Boutsioukis, Georgios, Ioannis Partalas, and Ioannis Vlahavas. European Workshop on Reinforcement Learning, 2011.
* [Transfer Learning for Multi-agent Coordination](https://ai.vub.ac.be/~ydehauwe/publications/ICAART2011_2.pdf) by Vrancx, Peter, Yann-Michaël De Hauwere, and Ann Nowé. ICAART, 2011.

### Imitation and Inverse Reinforcement Learning
* [On the Utility of Learning about Humans for Human-AI Coordination](https://arxiv.org/abs/1910.05789) by Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, and Anca Dragan. NeurIPS 2019.
* [Multi-Agent Adversarial Inverse Reinforcement Learning](https://arxiv.org/abs/1907.13220) by Lantao Yu, Jiaming Song, and Stefano Ermon. ICML 2019.
* [Multi-Agent Generative Adversarial Imitation Learning](https://papers.nips.cc/paper/7975-multi-agent-generative-adversarial-imitation-learning) by Jiaming Song, Hongyu Ren, Dorsa Sadigh, and Stefano Ermon. NeurIPS 2018.
* [Cooperative inverse reinforcement learning](http://papers.nips.cc/paper/6420-cooperative-inverse-reinforcement-learning.pdf) by Hadfield-Menell D, Russell S J, Abbeel P, et al. NIPS, 2016.
* [Comparison of Multi-agent and Single-agent Inverse Learning on a Simulated Soccer Example](https://arxiv.org/pdf/1403.6822.pdf) by Lin X, Beling P A, Cogill R. arXiv, 2014.
* [Multi-agent inverse reinforcement learning for zero-sum games](https://arxiv.org/pdf/1403.6508.pdf) by Lin X, Beling P A, Cogill R. arXiv, 2014.
* [Multi-robot inverse reinforcement learning under occlusion with interactions](http://aamas2014.lip6.fr/proceedings/aamas/p173.pdf) by Bogert K, Doshi P. AAMAS, 2014.
* [Multi-agent inverse reinforcement learning](http://homes.soic.indiana.edu/natarasr/Papers/mairl.pdf) by Natarajan S, Kunapuli G, Judah K, et al.
ICMLA, 2010.\n\n### Meta Learning\n* [Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.03641.pdf) by Al-Shedivat, M. 2018.\n\n\n### Application\n* [Mobile User Interface Adaptation Based on Usability Reward Model and Multi-Agent Reinforcement Learning](https:\u002F\u002Fwww.mdpi.com\u002F2414-4088\u002F8\u002F4\u002F26) by Vidmanov Dmitry, Alfimtsev Alexander. Multimodal Technologies and Interaction, 2024.\n* [Safe multiagent learning with soft constrained policy optimization in real robot control](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10530650) by Shangding Gu, Dianye Huang, Muning Wen, Guang Chen, Alois Knoll. IEEE TII, 2024.\n* [MuZero with Self-competition for Rate Control in VP9 Video Compression](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.06626) by Amol Mandhane, Anton Zhernov, Maribeth Rauh, Chenjie Gu, et al. arXiv 2022.\n* [MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.00600.pdf) by Zheng L et al. NIPS 2017 & AAAI 2018 Demo. ([Github Page](https:\u002F\u002Fgithub.com\u002Fgeek-ai\u002FMAgent))\n* [Collaborative Deep Reinforcement Learning for Joint Object Search](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1702.05573.pdf) by Kong X, Xin B, Wang Y, et al. arXiv, 2017.\n* [Multi-Agent Stochastic Simulation of Occupants for Building Simulation](http:\u002F\u002Fwww.ibpsa.org\u002Fproceedings\u002FBS2017\u002FBS2017_051.pdf) by Chapman J, Siebers P, Darren R. Building Simulation, 2017.\n* [Extending No-MASS: Multi-Agent Stochastic Simulation for Demand Response of residential appliances](http:\u002F\u002Fwww.ibpsa.org\u002Fproceedings\u002FBS2017\u002FBS2017_056.pdf) by Sancho-Tomás A, Chapman J, Sumner M, Darren R. 
Building Simulation, 2017.\n* [Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1610.03295.pdf) by Shalev-Shwartz S, Shammah S, Shashua A. arXiv, 2016.\n* [Applying multi-agent reinforcement learning to watershed management](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FKarl_Mason\u002Fpublication\u002F299416955_Applying_Multi-Agent_Reinforcement_Learning_to_Watershed_Management\u002Flinks\u002F56f545b908ae95e8b6d1d3ff.pdf) by Mason, Karl, et al. Proceedings of the Adaptive and Learning Agents workshop at AAMAS, 2016.\n* [Crowd Simulation Via Multi-Agent Reinforcement Learning](http:\u002F\u002Fwww.aaai.org\u002Focs\u002Findex.php\u002FAIIDE\u002FAIIDE10\u002Fpaper\u002FviewFile\u002F2112\u002F2550) by Torrey L. AAAI, 2010.\n* [Traffic light control by multiagent reinforcement learning systems](https:\u002F\u002Fpdfs.semanticscholar.org\u002F61bc\u002Fb98b7ae3df894f4f72aba3d145bd48ca2cd5.pdf) by Bakker, Bram, et al. Interactive Collaborative Information Systems, 2010.\n* [Multiagent reinforcement learning for urban traffic control using coordination graphs](https:\u002F\u002Fstaff.science.uva.nl\u002Fs.a.whiteson\u002Fpubs\u002Fkuyerecml08.pdf) by Kuyer, Lior, et al. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2008.\n* [A multi-agent Q-learning framework for optimizing stock trading systems](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F221465347_A_Multi-agent_Q-learning_Framework_for_Optimizing_Stock_Trading_Systems) by Lee J W, Jangmin O. DEXA, 2002.\n* [Multi-agent reinforcement learning for traffic light control](http:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fdownload;jsessionid=422747CB9AF552CF1C4E455220E3F96F?doi=10.1.1.32.9887&rep=rep1&type=pdf) by Wiering, Marco. ICML. 
2000.\n\n\n### Networked MARL\n* [QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus Innovations](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F6415291) by Kar, Soummya and Moura, José M. F. and Poor, H. Vincent. IEEE Transactions on Signal Processing 2013.\n* [Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents](https:\u002F\u002Fproceedings.mlr.press\u002Fv80\u002Fzhang18n.html) by Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Basar. ICML 2018.\n* [Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2019\u002Fhash\u002F8a0e1141fd37fa5b98d5bb769ba1a7cc-Abstract.html) by Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, Junwu Xiong. NIPS 2019.\n* [Multi-agent Reinforcement Learning for Networked System Control](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.01339) by Tianshu Chu, Sandeep Chinchali, Sachin Katti. ICLR 2020.\n* [F2A2: Flexible fully-decentralized approximate actor-critic for cooperative multi-agent reinforcement learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.11145) by Wenhao Li, Bo Jin, Xiangfeng Wang, Junchi Yan, Hongyuan Zha. arXiv 2020.\n* [Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems](https:\u002F\u002Fproceedings.mlr.press\u002Fv120\u002Fqu20a.html) by Guannan Qu, Adam Wierman, Na Li. L4DC 2020.\n* [Finite-Sample Analysis For Decentralized Batch Multi-Agent Reinforcement Learning With Networked Agents](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F9314079) by Zhang, Kaiqing and Yang, Zhuoran and Liu, Han and Zhang, Tong and Başar, Tamer. 
TAC 2021.\n\n\n","## Multi-Agent Reinforcement Learning (MARL) Papers\n\nMulti-agent reinforcement learning is a fascinating research area with close ties to single-agent reinforcement learning, multi-agent systems, game theory, evolutionary computation, and optimization theory, and with wide applications in large language models (LLMs) and robotics.\n\nThis is a collection of research and review papers on multi-agent reinforcement learning (MARL). The papers are sorted by time. Suggestions and pull requests are welcome.\n\nThe references are shared here for research purposes only. If any authors do not want their papers listed here, please feel free to contact us.\n\n## Overview\n* [Tutorials and Books](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#tutorial-and-books)\n* [Review Papers](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#review-papers)\n* [Research Papers](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#research-papers)\n  * [Framework](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#framework)\n  * [Joint Action Learning](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#joint-action-learning)\n  * [Cooperation and Competition](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#cooperation-and-competition)\n  * [Coordination](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#coordination)\n  * [Security](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#security)\n  * [Self-Play](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#self-play)\n  * [Learning To Communicate](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#learning-to-communicate)\n  * [Transfer Learning](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#transfer-learning)\n  * [Imitation and Inverse Reinforcement Learning](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#imitation-and-inverse-reinforcement-learning)\n  * [Meta Learning](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#meta-learning)\n  * [Application](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#application)\n  * [Networked MARL (decentralized training, decentralized execution)](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#networked-MARL)\n  * [MARL in LLMs](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#framework)\n  * [MARL in Robotics](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#framework)\n\n## Tutorials and Books\n* [Multi-Agent Reinforcement Learning: Foundations and Modern Approaches](https:\u002F\u002Fwww.marl-book.com\u002Fdownload) by Stefano V. 
Albrecht, Filippos Christianos, and Lukas Schäfer, 2023.\n* [Multi-Agent Reinforcement Learning](https:\u002F\u002Fdiscovery.ucl.ac.uk\u002Fid\u002Feprint\u002F10124273\u002F12\u002FYang_10124273_thesis_revised.pdf) by Yaodong Yang, 2021. PhD thesis.\n* [Deep Multi-Agent Reinforcement Learning](https:\u002F\u002Fora.ox.ac.uk\u002Fobjects\u002Fuuid:a55621b3-53c0-4e1b-ad1c-92438b57ffa4) by Jakob N Foerster, 2018. PhD thesis.\n* [Multi-Agent Machine Learning: A Reinforcement Approach](https:\u002F\u002Fonlinelibrary.wiley.com\u002Fdoi\u002Fbook\u002F10.1002\u002F9781118884614) by H. M. Schwartz, 2014.\n* [Multiagent Reinforcement Learning](http:\u002F\u002Fwww.ecmlpkdd2013.org\u002Fwp-content\u002Fuploads\u002F2013\u002F09\u002FMultiagent-Reinforcement-Learning.pdf) by Daan Bloembergen, Daniel Hennes, Michael Kaisers, Peter Vrancx. ECML, 2013.\n* [Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations](http:\u002F\u002Fwww.masfoundations.org\u002Fdownload.html) by Shoham Y, Leyton-Brown K. Cambridge University Press, 2008.\n\n## Review Papers\n* [The Landscape of Agentic Reinforcement Learning for LLMs: A Survey](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.02547) by Guibin Zhang, Hejia Geng, Xiaohang Yu, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, Yifan Zhou, Yang Chen, Chen Zhang, Yutao Fan, Zihu Wang, Songtao Huang, Yue Liao, Hongru Wang, Mengyue Yang, Heng Ji, Michael Littman, Jun Wang, Shuicheng Yan, Philip Torr, and Lei Bai. 2025. [[GitHub](https:\u002F\u002Fgithub.com\u002Fxhyumiracle\u002FAwesome-AgenticLLM-RL-Papers)]\n* [Model-based Multi-Agent Reinforcement Learning: Recent Progress and Prospects](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.10603.pdf) by Xihuai Wang, Zhicheng Zhang, and Weinan Zhang. 2022.\n* [An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2011.00583.pdf) by Yaodong Yang and Jun Wang. 2020.\n* [A Survey and Critique of Multiagent Deep Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.05587) by Pablo Hernandez-Leal, Bilal Kartal, and Matthew E. 
Taylor. 2019.\n* [Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1911.10635.pdf) by Kaiqing Zhang, Zhuoran Yang, and Tamer Başar. 2019.\n* [A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems](https:\u002F\u002Fwww.jair.org\u002Findex.php\u002Fjair\u002Farticle\u002Fview\u002F11396) by Silva, Felipe Leno da; Costa, Anna Helena Reali. JAIR, 2019.\n* [Autonomously Reusing Knowledge in Multiagent Reinforcement Learning](https:\u002F\u002Fwww.ijcai.org\u002Fproceedings\u002F2018\u002F774) by Silva, Felipe Leno da; Taylor, Matthew E.; Costa, Anna Helena Reali. IJCAI, 2018.\n* [Deep Reinforcement Learning Variants of Multi-Agent Learning Algorithms](https:\u002F\u002Fproject-archive.inf.ed.ac.uk\u002Fmsc\u002F20162091\u002Fmsc_proj.pdf) by Castaneda A O. 2016.\n* [Evolutionary Dynamics of Multi-Agent Learning: A Survey](https:\u002F\u002Fwww.jair.org\u002Findex.php\u002Fjair\u002Farticle\u002Fview\u002F10952) by Bloembergen, Daan, et al. JAIR, 2015.\n* [Game Theory and Multi-agent Reinforcement Learning](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F269100101_Game_Theory_and_Multi-agent_Reinforcement_Learning) by Nowé A, Vrancx P, De Hauwere Y M. Reinforcement Learning. Springer Berlin Heidelberg, 2012.\n* [Multi-agent Reinforcement Learning: An Overview](http:\u002F\u002Fwww.dcsc.tudelft.nl\u002F~bdeschutter\u002Fpub\u002Frep\u002F10_003.pdf) by Buşoniu L, Babuška R, De Schutter B. Innovations in Multi-Agent Systems and Applications-1. Springer Berlin Heidelberg, 2010.\n* [A Comprehensive Survey of Multiagent Reinforcement Learning](http:\u002F\u002Fwww.dcsc.tudelft.nl\u002F~bdeschutter\u002Fpub\u002Frep\u002F07_019.pdf) by Busoniu L, Babuska R, De Schutter B. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2008.\n* [If Multi-Agent Learning is the Answer, What is the Question?](http:\u002F\u002Frobotics.stanford.edu\u002F~shoham\u002Fwww%20papers\u002FLearningInMAS.pdf) by Shoham Y, Powers R, Grenager T. Artificial Intelligence, 2007.\n* [From Single-Agent to Multi-Agent Reinforcement Learning: Foundational Concepts and Methods](http:\u002F\u002Fusers.isr.ist.utl.pt\u002F~mtjspaan\u002FreadingGroup\u002FlearningNeto05.pdf) by Neto G. Learning theory course, 2005.\n* [Evolutionary Game Theory and Multi-Agent Reinforcement Learning](https:\u002F\u002Fpdfs.semanticscholar.org\u002Fbb9f\u002Fbee22eae2b47bbf304804a6ac07def1aecdb.pdf) by Tuyls K, Nowé A. The Knowledge Engineering Review, 2005.\n* [An Overview of Cooperative and Competitive Multiagent Learning](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F221622801_An_Overview_of_Cooperative_and_Competitive_Multiagent_Learning) by 
Pieter Jan ’t Hoen, Karl Tuyls, Liviu Panait, Sean Luke, J. A. La Poutré. Workshop LAMAS at AAMAS, 2005.\n* [Cooperative Multi-Agent Learning: The State of the Art](https:\u002F\u002Fcs.gmu.edu\u002F~eclab\u002Fpapers\u002Fpanait05cooperative.pdf) by Liviu Panait and Sean Luke, 2005.\n\n## Research Papers\n\n### MARL in LLMs\n* [CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.08529) by Xiangyuan Xue, Yifan Zhou, Guibin Zhang, Zaibin Zhang, Yijiang Li, Chen Zhang, Zhenfei Yin, Philip Torr, Wanli Ouyang, and Lei Bai. 2025.\n* [Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.08811.pdf) by Shao Zhang*, Xihuai Wang*, Wenhao Zhang, Yongshan Chen, Landi Gao, Dakuo Wang, Weinan Zhang, Xinbing Wang, and Ying Wen. 2024.\n* [Large Language Model based Multi-Agents: A Survey of Progress and Challenges](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2402.01680) by Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. 2024.\n* [Leveraging Large Language Models for Optimised Coordination in Textual Multi-Agent Reinforcement Learning](https:\u002F\u002Fopenreview.net\u002Fpdf?id=1PPjf4wife) by Slumbers, Oliver, David Henry Mguni, Kun Shao, and Jun Wang. 2024.\n* [Theory of Mind for Multi-Agent Collaboration via Large Language Models](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.10701) by Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, and Katia Sycara. 2023.\n\n\n### Framework\n* [Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2405.18209) by Zhi Zheng and Shangding Gu, 2024.\n* [Multi-Agent Constrained Policy Optimisation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.02793.pdf) by Shangding Gu, Jakub Grudzien Kuba, Muning Wen, Ruiqing Chen, Ziyuan Wang, Zheng Tian, Jun Wang, Alois Knoll, and Yaodong Yang, 2021.\n* [Settling the Variance of Multi-Agent Policy Gradients](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2108.08612.pdf) by Kuba, Jakub, Muning Wen, Linghui Meng, Shangding Gu, Haifeng Zhang, David Mguni, Jun Wang, and Yaodong Yang, NIPS 2021.\n* [QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.11485.pdf) by Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson. ICML 2018.\n* [Mean Field Multi-Agent Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.05438.pdf) by Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. ICML 2018.\n* [Multi-agent Actor-critic for Mixed Cooperative-competitive Environments](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1706.02275.pdf) by Lowe R, Wu Y, Tamar A, et al. arXiv, 2017.\n* [Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.06182.pdf) by Omidshafiei S, Pazis J, Amato C, et al. arXiv, 2017.\n* [Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.10069.pdf) by Peng P, Yuan Q, Wen Y, et al. arXiv, 2017.\n* 
[Robust Adversarial Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.02702.pdf) by Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta. arXiv, 2017.\n* [Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1702.08887.pdf) by Foerster J, Nardelli N, Farquhar G, et al. arXiv, 2017.\n* [Multiagent Reinforcement Learning with Sparse Interactions by Negotiation and Knowledge Transfer](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1508.05328.pdf) by Zhou L, Yang P, Chen C, et al. IEEE Transactions on Cybernetics, 2016.\n* [Decentralised Multi-Agent Reinforcement Learning for Dynamic and Uncertain Environments](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1409.4561.pdf) by Marinescu A, Dusparic I, Taylor A, et al. arXiv, 2014.\n* [CLEANing the Reward: Counterfactual Actions to Remove Exploratory Action Noise in Multiagent Learning](http:\u002F\u002Firll.eecs.wsu.edu\u002Fwp-content\u002Fpapercite-data\u002Fpdf\u002F2014iat-holmesparker.pdf) by HolmesParker C, Taylor M E, Agogino A, et al. AAMAS, 2014.\n* [Bayesian Reinforcement Learning for Multiagent Systems with State Uncertainty](http:\u002F\u002Fwww.fransoliehoek.net\u002Fdocs\u002FAmato13MSDM.pdf) by Amato C, Oliehoek F A. MSDM Workshop, 2013.\n* [Multiagent Learning: Basics, Challenges, and Prospects](http:\u002F\u002Fwww.weiss-gerhard.info\u002Fpublications\u002FAI_MAGAZINE_2012_TuylsWeiss.pdf) by Tuyls, Karl, and Gerhard Weiss. AI Magazine, 2012.\n* [Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration](http:\u002F\u002Ficml2010.haifa.il.ibm.com\u002Fpapers\u002F191.pdf) by Wunder M, Littman M L, Babes M. ICML, 2010.\n* [Conditional Random Fields for Multi-agent Reinforcement Learning](http:\u002F\u002Fwww.machinelearning.org\u002Fproceedings\u002Ficml2007\u002Fpapers\u002F89.pdf) by Zhang X, Aberdeen D, Vishwanathan S V N. ICML, 2007.\n* [Multi-Agent Reinforcement Learning Using Strategies and Voting](http:\u002F\u002Fama.imag.fr\u002F~partalas\u002Fpartalasmarl.pdf) by Partalas, Ioannis, Ioannis Feneris, and Ioannis Vlahavas. ICTAI, 2007.\n* [A Reinforcement Learning Scheme for a Partially-Observable Multi-Agent Game](https:\u002F\u002Fpdfs.semanticscholar.org\u002F57fb\u002Fae00e17c0d798559ebab0e8f4267e032f41d.pdf) by Ishii S, Fujita H, Mitsutake M, et al. Machine Learning, 2005.\n* [Asymmetric Multiagent Reinforcement Learning](http:\u002F\u002Flib.tkk.fi\u002FDiss\u002F2004\u002Fisbn9512273594\u002Farticle1.pdf) by Kononen V. Web Intelligence and Agent Systems, 2004.\n* [Adaptive Policy Gradient in Multiagent Learning](http:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=860686) by Banerjee B, Peng J. AAMAS, 2003.\n* [Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F2171-reinforcement-learning-to-play-an-optimal-nash-equilibrium-in-team-markov-games.pdf) by Wang X, Sandholm T. NIPS, 2002.\n* 
[Multiagent Learning Using a Variable Learning Rate](https:\u002F\u002Fwww.sciencedirect.com\u002Fscience\u002Farticle\u002Fpii\u002FS0004370202001212) by Michael Bowling and Manuela Veloso, 2002.\n* [Value-function Reinforcement Learning in Markov Games](http:\u002F\u002Fwww.sts.rpi.edu\u002F~rsun\u002Fsi-mal\u002Farticle3.pdf) by Littman M L. Cognitive Systems Research, 2001.\n* [Hierarchical Multi-Agent Reinforcement Learning](http:\u002F\u002Fresearchers.lille.inria.fr\u002F~ghavamza\u002Fmy_website\u002FPublications_files\u002Fagents01.pdf) by Makar, Rajbala, Sridhar Mahadevan, and Mohammad Ghavamzadeh. Fifth International Conference on Autonomous Agents, 2001.\n* [An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning](https:\u002F\u002Fwww.cs.cmu.edu\u002F~mmv\u002Fpapers\u002F00TR-mike.pdf) by Michael Bowling and Manuela Veloso, 2000.\n\n### Joint Action Learning\n* [AWESOME: A General Multiagent Learning Algorithm that Converges in Self-Play and Learns a Best Response Against Stationary Opponents](http:\u002F\u002Fwww.cs.cmu.edu\u002F~conitzer\u002FawesomeML06.pdf) by Conitzer V, Sandholm T. Machine Learning, 2007.\n* [Extending Q-Learning to General Adaptive Multi-Agent Systems](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F2503-extending-q-learning-to-general-adaptive-multi-agent-systems.pdf) by Tesauro, Gerald. NIPS, 2003.\n* [Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm](http:\u002F\u002Fwww.lirmm.fr\u002F~jq\u002FCours\u002F3cycle\u002Fmodule\u002FHuWellman98icml.pdf) by Hu, Junling, and Michael P. 
Wellman. ICML, 1998.\n* [The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems](http:\u002F\u002Fwww.aaai.org\u002FPapers\u002FAAAI\u002F1998\u002FAAAI98-106.pdf) by Claus C, Boutilier C. AAAI, 1998.\n* [Markov Games as a Framework for Multi-agent Reinforcement Learning](https:\u002F\u002Fwww.cs.duke.edu\u002Fcourses\u002Fspring07\u002Fcps296.3\u002Flittman94markov.pdf) by Littman, Michael L. ICML, 1994.\n\n### Cooperation and Competition\n* [Order Matters: Agent-by-agent Policy Optimization](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2302.06205.pdf) by Xihuai Wang, Zheng Tian, Ziyu Wan, Ying Wen, Jun Wang, Weinan Zhang, ICLR 2023.\n* [Interaction Pattern Disentangling for Multi-Agent Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2207.03902.pdf) by Shunyu Liu, Jie Song, Yihe Zhou, Na Yu, Kaixuan Chen, Zunlei Feng, Mingli Song. TPAMI, 2024.\n* [Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2211.12712.pdf) by Shunyu Liu, Yihe Zhou, Jie Song, Tongya Zheng, Kaixuan Chen, Tongtian Zhu, Zunlei Feng, Mingli Song. AAAI, 2023.\n* [Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.17352.pdf) by Yihe Zhou, Shunyu Liu, Yunpeng Qing, Kaixuan Chen, Tongya Zheng, Yanhao Huang, Jie Song, Mingli Song. 2023.\n* [Multi-Agent Reinforcement Learning is a Sequence Modeling Problem](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2205.14953.pdf) by Wen, Muning, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, and Yaodong Yang, 2022.\n* [The Complexity of Markov Equilibrium in Stochastic Games](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2204.03991.pdf) by Daskalakis, Constantinos, Noah Golowich, and Kaiqing Zhang, 2022.\n* [Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.11251.pdf) by Kuba, Jakub Grudzien, Ruiqing Chen, Munning Wen, Ying Wen, Fanglei Sun, Jun Wang, and Yaodong Yang, ICLR 2022.\n* [Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2203.10603.pdf) by Weinan Zhang, Xihuai Wang, Jian Shen, and Ming Zhou. IJCAI 2021.\n* [The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.01955.pdf) by Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu, 2021.\n* [Human-level Performance in 3D Multiplayer Games with Population-based Reinforcement Learning](https:\u002F\u002Fwww.science.org\u002Fdoi\u002Fabs\u002F10.1126\u002Fscience.aau6249) by Max 
Jaderberg, Wojciech M. Czarnecki, Iain Dunning, et al. Science 364.6443: 859-865, 2019.\n* [Emergent Complexity via Multi-agent Competition](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.03748.pdf) by Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch, 2018.\n* [Learning with Opponent-Learning Awareness](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1709.04326.pdf) by Jakob Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch, 2018.\n* [Multi-agent Reinforcement Learning in Sequential Social Dilemmas](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1702.03037.pdf) by Leibo J Z, Zambaldi V, Lanctot M, et al. arXiv, 2017. [[Post](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Funderstanding-agent-cooperation\u002F)]\n* [Cooperative Multi-Agent Control Using Deep Reinforcement Learning](https:\u002F\u002Fala2017.it.nuigalway.ie\u002Fpapers\u002FALA2017_Gupta.pdf) by Gupta, J. K., Egorov, M., and Kochenderfer, M. AAMAS 2017.\n* [Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds](http:\u002F\u002Forca.st.usm.edu\u002F~banerjee\u002Fpapers\u002Fp530-ceren.pdf) by Roi Ceren, Prashant Doshi, and Bikramjit Banerjee, pp. 530-538, AAMAS 2016.\n* [Opponent Modeling in Deep Reinforcement Learning](http:\u002F\u002Fwww.umiacs.umd.edu\u002F~hal\u002Fdocs\u002Fdaume16opponent.pdf) by He H, Boyd-Graber J, Kwok K, et al. ICML 2016.\n* [Multiagent Cooperation and Competition with Deep Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.08779.pdf) by Tampuu A, Matiisen T, Kodelja D, et al. arXiv, 2015.\n* [Emotional Multiagent Reinforcement Learning in Social Dilemmas](http:\u002F\u002Fwww.uow.edu.au\u002F~fren\u002Fdocuments\u002FEMR_2013.pdf) by Yu C, Zhang M, Ren F. International Conference on Principles and Practice of Multi-Agent Systems, 2013.\n* [Multi-agent Reinforcement Learning in Common Interest and Fixed Sum Stochastic Games: An Experimental Study](http:\u002F\u002Fwww.jmlr.org\u002Fpapers\u002Fvolume9\u002Fbab08a\u002Fbab08a.pdf) by Bab, Avraham, and Ronen I. 
Brafman. Journal of Machine Learning Research, 2008.\n* [Combining Policy Search with Planning in Multi-agent Cooperation](https:\u002F\u002Fpdfs.semanticscholar.org\u002F5120\u002Fd9f2c738ad223e9f8f14cb3fd5612239a35c.pdf) by Ma J, Cameron S. RoboCup, 2008.\n* [Collaborative Multiagent Reinforcement Learning by Payoff Propagation](http:\u002F\u002Fwww.jmlr.org\u002Fpapers\u002Fvolume7\u002Fkok06a\u002Fkok06a.pdf) by Kok J R, Vlassis N. JMLR, 2006.\n* [Learning to Cooperate in Multi-Agent Social Dilemmas](http:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fdownload?doi=10.1.1.107.335&rep=rep1&type=pdf) by de Cote E M, Lazaric A, Restelli M. AAMAS, 2006.\n* [Learning to Compete, Compromise, and Cooperate in Repeated General-Sum Games](http:\u002F\u002Fwww.machinelearning.org\u002Fproceedings\u002Ficml2005\u002Fpapers\u002F021_Learning_CrandallGoodrich.pdf) by Crandall J W, Goodrich M A. ICML, 2005.\n* [Sparse Cooperative Q-learning](http:\u002F\u002Fwww.machinelearning.org\u002Fproceedings\u002Ficml2004\u002Fpapers\u002F267.pdf) by Kok J R, Vlassis N. ICML, 2004.\n* [Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.01969.pdf) by Leonardos, Stefanos, Will Overman, Ioannis Panageas, and Georgios Piliouras. 2021.\n* [Markov α-Potential Games: Equilibrium Approximation and Regret Analysis](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.12553.pdf) by Xin G, et al. 2023.\n* [A Natural Actor-Critic Framework for Zero-Sum Markov Games](https:\u002F\u002Fproceedings.mlr.press\u002Fv162\u002Falacaoglu22a\u002Falacaoglu22a.pdf) by Ahmet A., et al. ICML, 2022.\n\n### Coordination\n* [ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2310.05208.pdf) by Xihuai Wang, Shao Zhang, Wenhao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, Weinan Zhang. NeurIPS 2024.\n* [Collaborating with Humans without Human Data](https:\u002F\u002Fopenreview.net\u002Fpdf?id=1Kof-nkmQB8) by DJ Strouse, Kevin R. 
McKee、Matt Botvinick、Edward Hughes、Richard Everett。NeurIPS 2021。\n* [协调式多智能体模仿学习](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.03121.pdf) 作者：Le H M、Yue Y、Carr P。arXiv，2017年。\n* [网络化合作式多智能体系统中协作的强化社会学习](http:\u002F\u002Fmipc.inf.ed.ac.uk\u002F2014\u002Fpapers\u002Fmipc2014_hao_etal.pdf) 作者：Hao J、Huang D、Cai Y 等。AAAI研讨会，2014年。\n* [有限通信下的多智能体强化学习协作](http:\u002F\u002Fwww.aamas-conference.org\u002FProceedings\u002Faamas2013\u002Fdocs\u002Fp1101.pdf) 作者：Zhang, Chongjie 和 Victor Lesser。AAMAS，2013年。\n* [协作引导的强化学习](http:\u002F\u002Fwww.ifaamas.org\u002FProceedings\u002Faamas2012\u002Fpapers\u002F1B_1.pdf) 作者：Lau Q P、Lee M L、Hsu W。AAMAS，2012年。\n* [多智能体强化学习中的协作：贝叶斯方法](https:\u002F\u002Fwww.cs.toronto.edu\u002F~cebly\u002FPapers\u002FbayesMARL.pdf) 作者：Chalkiadakis G、Boutilier C。AAMAS，2003年。\n* [协调式强化学习](https:\u002F\u002Fusers.cs.duke.edu\u002F~parr\u002Ficml02.pdf) 作者：Guestrin C、Lagoudakis M、Parr R。ICML，2002年。\n* [合作式多智能体系统中协作的强化学习](http:\u002F\u002Fwww.aaai.org\u002FPapers\u002FAAAI\u002F2002\u002FAAAI02-050.pdf) 作者：Kapetanakis S、Kudenko D。AAAI\u002FIAAI，2002年。\n\n### 安全\n* [马尔可夫安全博弈：空间安全问题中的学习](http:\u002F\u002Fwww.fransoliehoek.net\u002Fdocs\u002FKlima16LICMAS.pdf) 作者：Klima R、Tuyls K、Oliehoek F。NIPS 多智能体系统的学习、推理与控制会议，2016年。\n* [基于强化学习的多智能体合作捕获：应用于安全巡逻系统](http:\u002F\u002Fieeexplore.ieee.org\u002Fstamp\u002Fstamp.jsp?arnumber=7244682) 作者：Yasuyuki S、Hirofumi O、Tadashi M 等。控制会议（ASCC），2015年。\n* [利用信息不对称提升安全博弈中的学习与适应能力](http:\u002F\u002Fwww4.ncsu.edu\u002F~hdai\u002Finfocom-2015-XH.pdf) 作者：He X、Dai H、Ning P。INFOCOM，2015年。\n\n### 自对弈\n* [广义框架下自对弈算法的比较](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.04471) 作者：Daniel Hernandez、Kevin Denamganai、Sam Devlin 等。IEEE 游戏事务期刊，2021年。\n* [多智能体强化学习的统一博弈论方法](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.00832.pdf) 作者：Marc Lanctot、Vinicius Zambaldi、Audrunas Gruslys、Angeliki Lazaridou、Karl Tuyls、Julien Perolat、David Silver。NIPS 2017。\n* [不完美信息博弈中的深度自对弈强化学习](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1603.01121.pdf) 作者：Heinrich Johannes、David 
Silver。arXiv，2016年。\n* [扩展形式博弈中的虚构自对弈](http:\u002F\u002Fjmlr.org\u002Fproceedings\u002Fpapers\u002Fv37\u002Fheinrich15.pdf) 作者：Heinrich Johannes、Marc Lanctot、David Silver。ICML，2015年。\n\n### 学习沟通\n* [Hammer：通过学习的消息传递实现强化学习智能体的多级协作] 作者：Nikunj Gupta、G. Srinivasaraghavan、Swarup Mohalik、Nishant Kumar 和 Matthew E. Taylor，《神经计算与应用》，2023年。\n* [使用自编码器学习多智能体通信的语义基础](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.15349) 作者：Lin Toru、Jacob Huh、Christopher Stauffer、Ser Nam Lim 和 Phillip Isola。2021年。\n* [通过谈判产生的新兴通信](https:\u002F\u002Fopenreview.net\u002Fpdf?id=Hk6WhagRW) 作者：Kris Cao、Angeliki Lazaridou、Marc Lanctot、Joel Z Leibo、Karl Tuyls、Stephen Clark。2018年。\n* [符号与像素输入参考游戏中的语言交流涌现](https:\u002F\u002Fopenreview.net\u002Fpdf?id=HJGv1Z-AW) 作者：Angeliki Lazaridou、Karl Moritz Hermann、Karl Tuyls、Stephen Clark。ICLR 2018。\n* [多智能体游戏中语言的涌现：学习用符号序列进行沟通](https:\u002F\u002Fopenreview.net\u002Fpdf?id=SkaxnKEYg) 作者：Serhii Havrylov、Ivan Titov。ICLR 研讨会，2017年。\n* [使用深度强化学习学习合作性视觉对话智能体](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.06585.pdf) 作者：Abhishek Das、Satwik Kottur 等。arXiv，2017年。\n* [多智能体群体中具身组合语言的涌现](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.04908.pdf) 作者：Igor Mordatch、Pieter Abbeel。arXiv，2017年。[[帖子](https:\u002F\u002Fopenai.com\u002Fblog\u002Flearning-to-communicate\u002F)]\n* [多智能体深度强化学习中的合作与沟通](https:\u002F\u002Frepositories.lib.utexas.edu\u002Fhandle\u002F2152\u002F45681) 作者：Hausknecht M J。2017年。\n* [多智能体合作与（自然）语言的涌现](https:\u002F\u002Fopenreview.net\u002Fpdf?id=Hk8N3Sclg) 作者：Lazaridou A、Peysakhovich A、Baroni M。arXiv，2016年。\n* [使用深度分布式循环Q网络学习沟通以解决谜题](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1602.02672.pdf) 作者：Foerster J N、Assael Y M、de Freitas N 等。arXiv，2016年。\n* [使用深度多智能体强化学习学习沟通](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1605.06676.pdf) 作者：Foerster J、Assael Y M、de Freitas N 等。NIPS，2016年。\n* [通过反向传播学习多智能体通信](http:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F6398-learning-multiagent-communication-with-backpropagation.pdf) 作者：Sukhbaatar S、Fergus R。NIPS，2016年。\n* 
[通过协商实现高效的分布式强化学习](http:\u002F\u002Fpeople.csail.mit.edu\u002Flpk\u002Fpapers\u002Fdars08.pdf) 作者：Varshavskaya P、Kaelbling L P、Rus D。分布式自主机器人系统，2009年。\n\n### 迁移学习\n* [在多智能体强化学习中同时学习与指导](http:\u002F\u002Fwww.ifaamas.org\u002FProceedings\u002Faamas2017\u002Fpdfs\u002Fp1100.pdf) 作者：Silva, Felipe Leno da；Glatt, Ruben；以及 Costa, Anna Helena Reali。AAMAS，2017年。\n* [通过迁移学习加速多智能体强化学习](https:\u002F\u002Fwww.aaai.org\u002Focs\u002Findex.php\u002FAAAI\u002FAAAI17\u002Fpaper\u002Fdownload\u002F14217\u002F14005) 作者：Silva, Felipe Leno da；以及 Costa, Anna Helena Reali。AAAI，2017年。\n* [通过动态协同学习加速多智能体强化学习](https:\u002F\u002Fweb.cs.umass.edu\u002Fpublication\u002Fdocs\u002F2015\u002FUM-CS-2015-004.pdf) 作者：Garant D、da Silva B C、Lesser V 等。技术报告，2015年。\n* [通过并行迁移在多智能体系统中进行迁移学习](https:\u002F\u002Fwww.scss.tcd.ie\u002F~tayloral\u002Fres\u002Fpapers\u002FTaylor_ParallelTransferLearning_ICML_2013.pdf) 作者：Taylor, Adam 等。ICML，2013年。\n* [多智能体强化学习领域的迁移学习](https:\u002F\u002Fewrl.files.wordpress.com\u002F2011\u002F08\u002Fewrl2011_submission_19.pdf) 作者：Boutsioukis, Georgios、Ioannis Partalas 和 Ioannis Vlahavas。欧洲强化学习研讨会，2011年。\n* [用于多智能体协作的迁移学习](https:\u002F\u002Fai.vub.ac.be\u002F~ydehauwe\u002Fpublications\u002FICAART2011_2.pdf) 作者：Vrancx, Peter、Yann-Michaël De Hauwere 和 Ann Nowé。ICAART，2011年。\n\n### 模仿学习与逆强化学习\n* [关于为人类-AI协作学习人类行为的效用](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.05789) 作者：Micah Carroll、Rohin Shah、Mark K. Ho、Thomas L. Griffiths、Sanjit A. 
Seshia、Pieter Abbeel、Anca Dragan。NeurIPS 2019。\n* [多智能体对抗性逆强化学习](https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.13220) 作者：Lantao Yu、Jiaming Song、Stefano Ermon。ICML 2019。\n* [多智能体生成式对抗模仿学习](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F7975-multi-agent-generative-adversarial-imitation-learning) 作者：Jiaming Song、Hongyu Ren、Dorsa Sadigh、Stefano Ermon。NeurIPS 2018。\n* [合作式逆强化学习](http:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F6420-cooperative-inverse-reinforcement-learning.pdf) 作者：Hadfield-Menell D、Russell S J、Abbeel P 等。NIPS，2016年。\n* [在模拟足球场景中比较多智能体与单智能体逆向学习](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1403.6822.pdf) 作者：Lin X、Beling P A、Cogill R。arXiv，2014年。\n* [零和博弈中的多智能体逆强化学习](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1403.6508.pdf) 作者：Lin X、Beling P A、Cogill R。arXiv，2014年。\n* [考虑交互作用的遮挡条件下多机器人逆强化学习](http:\u002F\u002Faamas2014.lip6.fr\u002Fproceedings\u002Faamas\u002Fp173.pdf) 作者：Bogert K、Doshi P。AAMAS，2014年。\n* [多智能体逆强化学习](http:\u002F\u002Fhomes.soic.indiana.edu\u002Fnatarasr\u002FPapers\u002Fmairl.pdf) 作者：Natarajan S、Kunapuli G、Judah K 等。ICMLA，2010年。\n\n### 元学习\n* [非平稳且竞争性环境下的元学习连续适应](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.03641.pdf) 作者：l-Shedivat, M。2018年。\n\n\n### 应用\n* [基于可用性奖励模型和多智能体强化学习的移动用户界面自适应](https:\u002F\u002Fwww.mdpi.com\u002F2414-4088\u002F8\u002F4\u002F26) 作者：Vidmanov Dmitry、Alfimtsev Alexander。多模态技术与交互，2024年。\n* [真实机器人控制中采用软约束策略优化的安全多智能体学习](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F10530650) 作者：Shangding Gu、Dianye Huang、Muning Wen、Guang Chen、Alois Knoll。IEEE TII，2024年。\n* [MuZero结合自我竞争用于VP9视频压缩中的速率控制](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.06626) 作者：Amol Mandhane、Anton Zhernov、Maribeth Rauh、Chenjie Gu 等。arXiv，2022年。\n* [MAgent：面向人工群体智能的多智能体强化学习平台](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1712.00600.pdf) 作者：Zheng L 等。NIPS 2017 & AAAI 2018演示。（[Github页面](https:\u002F\u002Fgithub.com\u002Fgeek-ai\u002FMAgent)）\n* [用于联合目标搜索的协作式深度强化学习](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1702.05573.pdf) 作者：Kong X、Xin 
B, Wang Y, et al. arXiv, 2017.\n* [Multi-agent Stochastic Simulation of Occupants in Buildings](http:\u002F\u002Fwww.ibpsa.org\u002Fproceedings\u002FBS2017\u002FBS2017_051.pdf) by Chapman J, Siebers P, Darren R. Building Simulation, 2017.\n* [Extending No-MASS: Multi-agent Stochastic Simulation for Demand Response of Residential Appliances](http:\u002F\u002Fwww.ibpsa.org\u002Fproceedings\u002FBS2017\u002FBS2017_056.pdf) by Sancho-Tomás A, Chapman J, Sumner M, Darren R. Building Simulation, 2017.\n* [Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1610.03295.pdf) by Shalev-Shwartz S, Shammah S, Shashua A. arXiv, 2016.\n* [Applying Multi-Agent Reinforcement Learning to Watershed Management](https:\u002F\u002Fwww.researchgate.net\u002Fprofile\u002FKarl_Mason\u002Fpublication\u002F299416955_Applying_Multi-Agent_Reinforcement_Learning_to_Watershed_Management\u002Flinks\u002F56f545b908ae95e8b6d1d3ff.pdf) by Mason, Karl, et al. Proceedings of the Adaptive and Learning Agents Workshop at AAMAS, 2016.\n* [Crowd Simulation via Multi-Agent Reinforcement Learning](http:\u002F\u002Fwww.aaai.org\u002Focs\u002Findex.php\u002FAIIDE\u002FAIIDE10\u002Fpaper\u002FviewFile\u002F2112\u002F2550) by Torrey L. AAAI, 2010.\n* [Traffic Light Control by Multiagent Reinforcement Learning Systems](https:\u002F\u002Fpdfs.semanticscholar.org\u002F61bc\u002Fb98b7ae3df894f4f72aba3d145bd48ca2cd5.pdf) by Bakker, Bram, et al. Interactive Collaborative Information Systems, 2010.\n* [Multiagent Reinforcement Learning for Urban Traffic Control Using Coordination Graphs](https:\u002F\u002Fstaff.science.uva.nl\u002Fs.a.whiteson\u002Fpubs\u002Fkuyerecml08.pdf) by Kuyer, Lior, et al. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2008.\n* [A Multi-agent Q-learning Framework for Optimizing Stock Trading Systems](https:\u002F\u002Fwww.researchgate.net\u002Fpublication\u002F221465347_A_Multi-agent_Q-learning_Framework_for_Optimizing_Stock_Trading_Systems) by Lee J W, Jangmin O. DEXA, 2002.\n* [Multi-agent Reinforcement Learning for Traffic Light Control](http:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fdownload;jsessionid=422747CB9AF552CF1C4E455220E3F96F?doi=10.1.1.32.9887&rep=rep1&type=pdf) by Wiering, Marco. ICML, 2000.\n\n\n### Networked MARL\n* [QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations](https:\u002F\u002Fieeexplore.ieee.org\u002Fdocument\u002F6415291) by Kar, Soummya and Moura, José M. F. and Poor, H. 
Vincent. IEEE Transactions on Signal Processing, 2013.\n* [Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents](https:\u002F\u002Fproceedings.mlr.press\u002Fv80\u002Fzhang18n.html) by Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Basar. ICML 2018.\n* [Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F2019\u002Fhash\u002F8a0e1141fd37fa5b98d5bb769ba1a7cc-Abstract.html) by Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, Junwu Xiong. NeurIPS 2019.\n* [Multi-agent Reinforcement Learning for Networked System Control](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.01339) by Tianshu Chu, Sandeep Chinchali, Sachin Katti. ICLR 2020.\n* [F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.11145) by Wenhao Li, Bo Jin, Xiangfeng Wang, Junchi Yan, Hongyuan Zha. arXiv, 2020.\n* [Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems](https:\u002F\u002Fproceedings.mlr.press\u002Fv120\u002Fqu20a.html) by Guannan Qu, Adam Wierman, Na Li. L4DC 2020.\n* [Finite-Sample Analysis for Decentralized Batch Multi-Agent Reinforcement Learning with Networked Agents](https:\u002F\u002Fieeexplore.ieee.org\u002Fabstract\u002Fdocument\u002F9314079) by Zhang, Kaiqing and Yang, Zhuoran and Liu, Han and Zhang, Tong and Başar, Tamer. TAC 2021.","# MARL-Papers Quick Start Guide\n\n## Prerequisites\n\n- **Operating system**: Linux, macOS, or Windows\n- **Dependencies**:\n  - Python 3.7 or later\n  - Git (to clone the repository)\n  - Optional: `pip` or `conda` for installing dependencies (if you want to run related code)\n\n> A PyPI mirror (e.g., the Tsinghua mirror) can speed up dependency installation:\n> ```bash\n> pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple <package>\n> ```\n\n## Installation\n\n1. Clone the repository:\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers.git\n   ```\n\n2. Enter the project directory:\n   ```bash\n   cd MARL-Papers\n   ```\n\n3. (Optional) To view or run any code samples, install the dependencies described in the corresponding papers.\n\n## Basic Usage\n\n1. Browse the category links in the README, for example:\n   - [Tutorial and Books](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#tutorial-and-books)\n   - [Review Papers](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#review-papers)\n   - [Research Papers](https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers#research-papers)\n\n2. Follow a link to read the paper or reference directly.\n\n3. 
To browse the full paper list locally, open the `README.md` file in a text editor.","The research team of a university AI lab is pursuing frontier work on multi-agent reinforcement learning (MARL), focusing on its application to autonomous drone formation control. Team members need to get up to speed on the latest progress in the field and select relevant papers for in-depth analysis.\n\n### Without MARL-Papers\n- Researchers must manually search and collate MARL papers across several academic platforms (e.g., arXiv, Google Scholar), which is slow and inefficient\n- Without a systematic taxonomy, it is hard to quickly locate high-quality papers in a specific subfield (e.g., cooperation and competition, self-play training)\n- Important papers may be overlooked, hurting the comprehensiveness and foresight of the research\n- Information is not shared among team members, leading to duplicated effort and wasted resources\n\n### With MARL-Papers\n- Direct access to a structured paper list saves a great deal of manual search time\n- A clear classification scheme makes it quick to find the core papers of a given subfield, improving research efficiency\n- Broad coverage ensures key results are not missed, adding depth to the research\n- Team members can work from a single shared reference library, improving collaboration and knowledge sharing\n\nMARL-Papers gives researchers an efficient, systematic, and comprehensive platform of MARL papers, markedly improving the quality and efficiency of their research.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FLantaoYu_MARL-Papers_1022896a.png","LantaoYu","Lantao Yu","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FLantaoYu_30183370.jpg","Ph.D. Student at Stanford CS Department","Stanford University","Stanford, California","lantaoyu@hotmail.com",null,"lantaoyu.com","https:\u002F\u002Fgithub.com\u002FLantaoYu",4779,771,"2026-04-02T14:01:47",1,"Linux, macOS, Windows","Not specified",
{"notes":92,"python":93,"dependencies":94},"conda is recommended for managing the environment; the first run downloads about 5 GB of model files","3.8+",[95,96,97],"torch>=2.0","transformers>=4.30","accelerate",[15],[100,101],"multiagent-reinforcement-learning","multi-agent-learning","2026-03-27T02:49:30.150509","2026-04-06T05:37:50.488958",[105,110,115,120,125,130],
{"id":106,"question_zh":107,"answer_zh":108,"source_url":109},5275,"How can new MARL-related topics be added?","You can propose new topics, such as LLM-based MARL, MARL with generative models, or game-theoretic analysis. The maintainer will consider adding the relevant literature.","https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers\u002Fissues\u002F24",
{"id":111,"question_zh":112,"answer_zh":113,"source_url":114},5276,"Is there a recommended new MARL book?","An upcoming MARL book has been added to the repository: https:\u002F\u002Fwww.marl-book.com\u002F","https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers\u002Fissues\u002F22",
{"id":116,"question_zh":117,"answer_zh":118,"source_url":119},5277,"Will this taxonomy keep being updated?","The maintainer has confirmed that the taxonomy will continue to be updated.","https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers\u002Fissues\u002F16",
{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},5278,"What about missing highly cited papers?","If you notice a highly cited paper is missing, submit its link and the maintainer will add it to the repository.","https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers\u002Fissues\u002F12",
{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},5279,"How are broken links fixed?","When a link is reported as broken, the maintainer updates it to keep it accessible.","https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers\u002Fissues\u002F10",
{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},5280,"How are inaccessible literature links resolved?","When a literature link is dead, the maintainer updates it to resolve the issue.","https:\u002F\u002Fgithub.com\u002FLantaoYu\u002FMARL-Papers\u002Fissues\u002F5",[]]