[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Replicable-MARL--MARLlib":3,"tool-Replicable-MARL--MARLlib":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",143909,2,"2026-04-07T11:33:18",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":72,"owner_avatar_url":73,"owner_bio":74,"owner_company":75,"owner_location":75,"owner_email":75,"owner_twitter":75,"owner_website":75,"owner_url":76,"languages":77,"stars":108,"forks":109,"last_commit_at":110,"license":111,"difficulty_score":112,"env_os":113,"env_gpu":114,"env_ram":114,"env_deps":115,"category_tags":121,"github_topics":122,"view_count":32,"oss_zip_url":75,"oss_zip_packed_at":75,"status":17,"created_at":126,"updated_at":127,"faqs":128,"releases":157},5040,"Replicable-MARL\u002FMARLlib","MARLlib","One repository is all that is necessary for Multi-agent Reinforcement Learning (MARL)","MARLlib 是一个专为多智能体强化学习（MARL）打造的一站式开源库。它基于强大的 Ray 和 RLlib 框架，旨在解决当前 MARL 领域环境碎片化、算法复现困难以及训练流程复杂等痛点，为开发者提供了一个统一且高效的实验平台。\n\n无论是高校研究人员还是算法工程师，都能利用 MARLlib 轻松完成从环境搭建、模型构建到算法训练与测试的全流程。该工具内置了丰富的经典环境与前沿任务支持，涵盖 MPE、StarCraft II、Overcooked-AI 乃至空中对抗等多种场景，并兼容最新的 PettingZoo 与 Gymnasium 标准。\n\nMARLlib 的核心亮点在于其高度模块化的设计。用户仅需几行代码即可灵活配置超参数、选择网络架构（如 MLP 或编码层），并快速部署 
MAPPO 等主流算法。它还支持参数共享策略，极大地降低了多智能体协作任务的开发门槛。凭借简洁的 API 接口和详尽的文档，MARLlib 让复杂的多智能体系统研究变得更加直观和可复现，是探索群体智能理想的得力助手。","\u003Cdiv align=\"center\">\n\n\u003Cimg src=docs\u002Fsource\u002Fimages\u002Flogo1.png width=75% \u002F>\n\u003C\u002Fdiv>\n\n\n\n\u003Ch1 align=\"center\"> MARLlib: A Multi-agent Reinforcement Learning Library \u003C\u002Fh1>\n\n\u003Cdiv align=\"center\">\n\n\u003Cimg src=docs\u002Fsource\u002Fimages\u002Fallenv.gif width=99% \u002F>\n\n\u003C\u002Fdiv>\n\n&emsp;\n\n[![GitHub license](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue.svg)]()\n![coverage](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fcoverage.svg)\n[![Documentation Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FReplicable-MARL_MARLlib_readme_13d664e1afd7.png)](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002F)\n[![GitHub issues](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002FReplicable-MARL\u002FMARLlib)](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fissues)\n[![PyPI version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fmarllib.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fmarllib)\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Fmaster\u002Fmarllib.ipynb)\n[![Organization](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOrganization-ReLER_RL-blue.svg)](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib)\n[![Organization](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOrganization-PKU_MARL-blue.svg)](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib)\n[![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fresources\u002Fawesome.html)\n\n\n| :exclamation:  News 
|\n|:-----------------------------------------|\n| **March 2023** :anchor: We are excited to announce that a major update has just been released. For detailed version information, please refer to the [version info](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Freleases\u002Ftag\u002F1.0.2).|\n| **May 2023** Exciting news! MARLlib now supports five more tasks: [MATE](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#mate), [GoBigger](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#gobigger), [Overcooked-AI](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#overcooked-ai), [MAPDN](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#power-distribution-networks), and [AirCombat](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#air-combat). Give them a try!|\n| **June 2023** [OpenAI: Hide and Seek](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#hide-and-seek) and [SISL](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#sisl) environments are incorporated into MARLlib.|\n| **Aug 2023** :tada: MARLlib has been accepted for publication in [JMLR](https:\u002F\u002Fwww.jmlr.org\u002Fmloss\u002F).|\n| **Sept 2023** The latest [PettingZoo](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#pettingzoo) with [Gymnasium](https:\u002F\u002Fgymnasium.farama.org) is now compatible with MARLlib.|\n| **Nov 2023** We are currently in the process of creating a hands-on MARL book and aim to release the draft by the end of 2023.|\n\n\n**Multi-agent Reinforcement Learning Library ([MARLlib](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.13708))** is ***a MARL library*** that utilizes [**Ray**](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fray) and one of its 
toolkits [**RLlib**](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fray\u002Ftree\u002Fmaster\u002Frllib). It offers a comprehensive platform for developing, training, and testing MARL algorithms across various tasks and environments. \n\nHere's an example of how MARLlib can be used:\n\n```py\nfrom marllib import marl\n\n# prepare env\nenv = marl.make_env(environment_name=\"mpe\", map_name=\"simple_spread\", force_coop=True)\n\n# initialize algorithm with appointed hyper-parameters\nmappo = marl.algos.mappo(hyperparam_source='mpe')\n\n# build agent model based on env + algorithms + user preference\nmodel = marl.build_model(env, mappo, {\"core_arch\": \"mlp\", \"encode_layer\": \"128-256\"})\n\n# start training\nmappo.fit(env, model, stop={'timesteps_total': 1000000}, share_policy='group')\n```\n\n\n\n## Why MARLlib?\n\nHere we provide a table for the comparison of MARLlib and existing work.\n\n|   Library   |  Supported Env | Algorithm | Parameter Sharing  | Model | Documentation |\n|:-------------:|:-------------:|:-------------:|:--------------:|:----------------:|:----------------:|\n|     [PyMARL](https:\u002F\u002Fgithub.com\u002Foxwhirl\u002Fpymarl) |        1 cooperative       |       5       |         share        |      GRU           | :x:\n|   [PyMARL2](https:\u002F\u002Fgithub.com\u002Fhijkzzz\u002Fpymarl2)|        2 cooperative       |     11   |         share        |  MLP + GRU  | :x:\n| [MAPPO Benchmark](https:\u002F\u002Fgithub.com\u002Fmarlbenchmark\u002Fon-policy) |       4 cooperative       |      1     |          share + separate        |          MLP + GRU        |         :x:              |\n| [MAlib](https:\u002F\u002Fgithub.com\u002Fsjtu-marl\u002Fmalib) |  4 self-play  | 10 | share + group + separate | MLP + LSTM | [![Documentation Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FReplicable-MARL_MARLlib_readme_13d664e1afd7.png)](https:\u002F\u002Fmalib.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest)\n|    
[EPyMARL](https:\u002F\u002Fgithub.com\u002Fuoe-agents\u002Fepymarl)|        4 cooperative      |    9    |        share + separate       |      GRU             |           :x:            |\n|    [HARL](https:\u002F\u002Fgithub.com\u002FPKU-MARL\u002FHARL)|        8 cooperative      |    9    |        share + separate       |      MLP + CNN + GRU           |           :x:            |\n|    **[MARLlib](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib)** |       17 **no task mode restriction**     |    18     |   share + group + separate + **customizable**         |         MLP + CNN + GRU + LSTM          |           [![Documentation Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FReplicable-MARL_MARLlib_readme_13d664e1afd7.png)](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002F) |\n\n|   Library   | Github Stars  | Documentation | Issues Open | Activity | Last Update\n|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|\n|     [PyMARL](https:\u002F\u002Fgithub.com\u002Foxwhirl\u002Fpymarl) | [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Foxwhirl\u002Fpymarl)](https:\u002F\u002Fgithub.com\u002Foxwhirl\u002Fpymarl)    |       :x: | ![GitHub opened issue](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Foxwhirl\u002Fpymarl.svg) | ![GitHub commit-activity](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002Foxwhirl\u002Fpymarl?label=commit) | ![GitHub last commit](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Foxwhirl\u002Fpymarl?label=last%20update)  \n|   [PyMARL2](https:\u002F\u002Fgithub.com\u002Fhijkzzz\u002Fpymarl2)| [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhijkzzz\u002Fpymarl2)](https:\u002F\u002Fgithub.com\u002Fhijkzzz\u002Fpymarl2)       |       :x:  | ![GitHub opened 
issue](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fhijkzzz\u002Fpymarl2.svg) | ![GitHub commit-activity](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002Fhijkzzz\u002Fpymarl2?label=commit) | ![GitHub last commit](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fhijkzzz\u002Fpymarl2?label=last%20update)  \n| [MAPPO Benchmark](https:\u002F\u002Fgithub.com\u002Fmarlbenchmark\u002Fon-policy)| [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmarlbenchmark\u002Fon-policy)](https:\u002F\u002Fgithub.com\u002Fmarlbenchmark\u002Fon-policy)   |        :x:              | ![GitHub opened issue](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fmarlbenchmark\u002Fon-policy.svg) | ![GitHub commit-activity](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002Fmarlbenchmark\u002Fon-policy?label=commit)| ![GitHub last commit](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fmarlbenchmark\u002Fon-policy?label=last%20update)  \n| [MAlib](https:\u002F\u002Fgithub.com\u002Fsjtu-marl\u002Fmalib) | [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsjtu-marl\u002Fmalib)](https:\u002F\u002Fgithub.com\u002Fsjtu-marl\u002Fmalib) | [![Documentation Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FReplicable-MARL_MARLlib_readme_13d664e1afd7.png)](https:\u002F\u002Fmalib.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest) | ![GitHub opened issue](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fsjtu-marl\u002Fmalib.svg) | ![GitHub commit-activity](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002Fsjtu-marl\u002Fmalib?label=commit) | ![GitHub last commit](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fsjtu-marl\u002Fmalib?label=last%20update)  \n|    [EPyMARL](https:\u002F\u002Fgithub.com\u002Fuoe-agents\u002Fepymarl)| [![GitHub 
stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fuoe-agents\u002Fepymarl)](https:\u002F\u002Fgithub.com\u002Fuoe-agents\u002Fepymarl)        |           :x:            | ![GitHub opened issue](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fuoe-agents\u002Fepymarl.svg) | ![GitHub commit-activity](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002Fuoe-agents\u002Fepymarl?label=commit) | ![GitHub last commit](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fuoe-agents\u002Fepymarl?label=last%20update)  \n|    [HARL](https:\u002F\u002Fgithub.com\u002FPKU-MARL\u002FHARL)**\\***| [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FPKU-MARL\u002FHARL)](https:\u002F\u002Fgithub.com\u002FPKU-MARL\u002FHARL)        |           :x:            | ![GitHub opened issue](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002FPKU-MARL\u002FHARL.svg) | ![GitHub commit-activity](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002FPKU-MARL\u002FHARL?label=commit) | ![GitHub last commit](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002FPKU-MARL\u002FHARL?label=last%20update)  \n|    **[MARLlib](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib)** |  [![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FReplicable-MARL\u002FMARLlib)](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib)  |           [![Documentation Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FReplicable-MARL_MARLlib_readme_13d664e1afd7.png)](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002F) | ![GitHub opened issue](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002FReplicable-MARL\u002FMARLlib.svg) | ![GitHub commit-activity](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002FReplicable-MARL\u002FMARLlib?label=commit) | ![GitHub last 
commit](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002FReplicable-MARL\u002FMARLlib?label=last%20update)  \n\n> **_\\*_**  **HARL** is a recently released MARL library :fire:. If cutting-edge MARL algorithms with state-of-the-art performance are your target, HARL is definitely worth [a look](https:\u002F\u002Fgithub.com\u002FPKU-MARL\u002FHARL)!\n\n[comment]: \u003C> (\u003Cdiv align=\"center\">)\n\n[comment]: \u003C> (\u003Cimg src=docs\u002Fsource\u002Fimages\u002Foverview.png width=100% \u002F>)\n\n[comment]: \u003C> (\u003C\u002Fdiv>)\n\n## Key features\n\n:beginner: MARLlib offers several key features that make it stand out:\n\n- MARLlib unifies diverse algorithm pipelines with agent-level distributed dataflow, allowing researchers to develop, test, and evaluate MARL algorithms across different tasks and environments.\n- MARLlib supports all task modes, including cooperative, collaborative, competitive, and mixed. This makes it easier for researchers to train and evaluate MARL algorithms across a wide range of tasks.\n- MARLlib provides a new interface that follows the structure of Gym, making it easier for researchers to work with multi-agent environments.\n- MARLlib provides flexible and customizable parameter-sharing strategies, allowing researchers to optimize their algorithms for different tasks and environments.\n\n:rocket: Using MARLlib, you can take advantage of various benefits, such as:\n\n- **Zero knowledge of MARL**: MARLlib provides 18 pre-built algorithms with an intuitive API, allowing researchers to start experimenting with MARL without prior knowledge of the field.\n- **Support for all task modes**: MARLlib supports almost all multi-agent environments, making it easier for researchers to experiment with different task modes.\n- **Customizable model architecture**: Researchers can choose their preferred model architecture from the model zoo, or build their own.\n- **Customizable policy sharing**: 
MARLlib provides grouping options for policy sharing, or researchers can create their own.\n- **Access to over a thousand released experiments**: Researchers can access over a thousand released experiments to see how other researchers have used MARLlib.\n\n## Installation\n\n> __Note__:\n> MARLlib is currently compatible only with Linux operating systems.\n\n### Step-by-step  (recommended)\n\n- install dependencies\n- install environments\n- install patches\n\n#### 1. install dependencies (basic)\n\nFirst, install the MARLlib dependencies to guarantee basic usage. Then install the environments you need by\nfollowing [this guide](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html). Finally, install the patches for RLlib.\n\n```bash\n$ conda create -n marllib python=3.8 # or 3.9\n$ conda activate marllib\n$ git clone https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib.git && cd MARLlib\n$ pip install -r requirements.txt\n```\n\n#### 2. install environments (optional)\n\nPlease follow [this guide](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html).\n\n> __Note__:\n> We recommend a gym version around 0.20.0.\n```bash\npip install \"gym==0.20.0\"\n```\n\n#### 3. install patches (basic)\n\nPatch the known RLlib bugs by running the following commands:\n\n```bash\n$ cd \u002FPath\u002FTo\u002FMARLlib\u002Fmarllib\u002Fpatch\n$ python add_patch.py -y\n```\n\n### PyPI\n\n```bash\n$ pip install --upgrade pip\n$ pip install marllib\n```\n\n### Docker-based usage\n\nWe provide a Dockerfile for building the MARLlib docker image in `MARLlib\u002Fdocker\u002FDockerfile` and a devcontainer setup in the `MARLlib\u002F.devcontainer` folder. 
If you use the devcontainer, note that you may need to customise certain arguments in `runArgs` of `devcontainer.json` according to your hardware, for example the `--shm-size` argument.\n\n## Getting started\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>\u003Cbig>Prepare the configuration\u003C\u002Fbig>\u003C\u002Fb>\u003C\u002Fsummary>\n\nFour groups of configuration control the whole training process.\n\n- scenario: specify the environment\u002Ftask settings\n- algorithm: choose the hyperparameters of the algorithm\n- model: customize the model architecture\n- ray\u002Frllib: change the basic training settings\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=docs\u002Fsource\u002Fimages\u002Fconfigurations.png width=100% \u002F>\n\u003C\u002Fdiv>\n\nBefore training, ensure all the parameters are set correctly, especially those you don't want to change.\n> __Note__:\n> You can also modify all the pre-set parameters via the MARLlib API.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>\u003Cbig>Register the environment\u003C\u002Fbig>\u003C\u002Fb>\u003C\u002Fsummary>\n\nEnsure all the dependencies are installed for the environment you are running with. 
Otherwise, please refer to\n[MARLlib documentation](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html).\n\n\n|   task mode   | api example |\n| :-----------: | ----------- |\n| cooperative | ```marl.make_env(environment_name=\"mpe\", map_name=\"simple_spread\", force_coop=True)``` |\n| collaborative | ```marl.make_env(environment_name=\"mpe\", map_name=\"simple_spread\")``` |\n| competitive | ```marl.make_env(environment_name=\"mpe\", map_name=\"simple_adversary\")``` |\n| mixed | ```marl.make_env(environment_name=\"mpe\", map_name=\"simple_crypto\")``` |\n\nMost of the popular environments in MARL research are supported by MARLlib:\n\n| Env Name | Learning Mode | Observability | Action Space | Observations |\n| :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |\n| **[LBF](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#lbf)**  | cooperative + collaborative | Both | Discrete | 1D  |\n| **[RWARE](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#rware)**  | cooperative | Partial | Discrete | 1D  |\n| **[MPE](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#mpe)**  | cooperative + collaborative + mixed | Both | Both | 1D  |\n| **[SISL](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#sisl)** | cooperative + collaborative | Full | Both | 1D |\n| **[SMAC](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#smac)**  | cooperative | Partial | Discrete | 1D |\n| **[MetaDrive](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#metadrive)**  | collaborative | Partial | Continuous | 1D |\n| **[MAgent](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#magent)** | collaborative + mixed | Partial | Discrete | 2D |\n| 
**[Pommerman](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#pommerman)**  | collaborative + competitive + mixed | Both | Discrete | 2D |\n| **[MAMuJoCo](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#mamujoco)**  | cooperative | Full | Continuous | 1D |\n| **[GRF](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#google-research-football)**  | collaborative + mixed | Full | Discrete | 2D |\n| **[Hanabi](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#hanabi)** | cooperative | Partial | Discrete | 1D |\n| **[MATE](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#mate)** | cooperative + mixed | Partial | Both | 1D |\n| **[GoBigger](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#gobigger)** | cooperative + mixed | Both | Continuous | 1D |\n| **[Overcooked-AI](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#overcooked-ai)** | cooperative | Full | Discrete | 1D |\n| **[PDN](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#power-distribution-networks)** | cooperative | Partial | Continuous | 1D |\n| **[AirCombat](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#air-combat)** | cooperative + mixed | Partial | MultiDiscrete | 1D |\n| **[HideAndSeek](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#hide-and-seek)** | competitive + mixed | Partial | MultiDiscrete | 1D |\n\nEach environment has a readme file, standing as the instruction for this task, including env settings, installation, and\nimportant notes.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>\u003Cbig>Initialize the algorithm\u003C\u002Fbig>\u003C\u002Fb>\u003C\u002Fsummary>\n\n\n|  running 
target   | api example |\n| :-----------: | ----------- |\n| train & finetune  | ```marl.algos.mappo(hyperparam_source=$ENV)``` |\n| develop & debug | ```marl.algos.mappo(hyperparam_source=\"test\")``` |\n| 3rd party env | ```marl.algos.mappo(hyperparam_source=\"common\")``` |\n\nHere is a chart describing the characteristics of each algorithm:\n\n| algorithm                                                    | support task mode | discrete action   | continuous action |  policy type        |\n| :------------------------------------------------------------: | :-----------------: | :----------: | :--------------------: | :----------: | \n| *IQL**                                                         | all four               | :heavy_check_mark:   |    |  off-policy |\n| *[PG](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)* | all four                  | :heavy_check_mark:       | :heavy_check_mark:   |  on-policy  |\n| *[A2C](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.01783)*                      | all four              | :heavy_check_mark:       | :heavy_check_mark:   |  on-policy  |\n| *[DDPG](https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.02971)*                     | all four             |  | :heavy_check_mark:   |  off-policy |\n| *[TRPO](http:\u002F\u002Fproceedings.mlr.press\u002Fv37\u002Fschulman15.pdf)*      | all four            | :heavy_check_mark:       | :heavy_check_mark:   |  on-policy  |\n| *[PPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06347)*                      | all four            | :heavy_check_mark:       | :heavy_check_mark:   |  on-policy  |\n| *[COMA](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fdownload\u002F11794\u002F11653)* | all four                           | :heavy_check_mark:       |   |  on-policy  |\n| *[MADDPG](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.02275)*                   | all four       
              |  | :heavy_check_mark:   |  off-policy |\n| *MAA2C**                                                       | all four                        | :heavy_check_mark:       | :heavy_check_mark:   |  on-policy  |\n| *MATRPO**                                                      | all four                         | :heavy_check_mark:       | :heavy_check_mark:   |  on-policy  |\n| *[MAPPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.01955)*                    | all four                         | :heavy_check_mark:       | :heavy_check_mark:   |  on-policy  |\n| *[HATRPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.11251)*                   | cooperative                     | :heavy_check_mark:       | :heavy_check_mark:   |  on-policy  |\n| *[HAPPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.11251)*                    | cooperative                     | :heavy_check_mark:       | :heavy_check_mark:   |  on-policy  |\n| *[VDN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.05296)*                      | cooperative         | :heavy_check_mark:   |    |  off-policy |\n| *[QMIX](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.11485)*                     | cooperative                    | :heavy_check_mark:   |   |  off-policy |\n| *[FACMAC](https:\u002F\u002Farxiv.org\u002Fabs\u002F2003.06709)*                   | cooperative                    |  | :heavy_check_mark:   |  off-policy |\n| *[VDAC](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.12306)*                    | cooperative                    | :heavy_check_mark:       | :heavy_check_mark:   |  on-policy  |\n| *VDPPO**                                                      | cooperative                | :heavy_check_mark:       | :heavy_check_mark:   |  on-policy  |\n\n***all four**: cooperative collaborative competitive mixed\n\n*IQL* is the multi-agent version of Q learning.\n*MAA2C* and *MATRPO* are the centralized version of A2C and TRPO.\n*VDPPO* is the value decomposition version of 
PPO.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>\u003Cbig>Build the agent model\u003C\u002Fbig>\u003C\u002Fb>\u003C\u002Fsummary>\n\nAn agent model consists of two parts, `encoder` and `core arch`. \n`encoder` will be constructed by MARLlib according to the observation space.\nChoose `mlp`, `gru`, or `lstm` as the core architecture to build the complete model.\n\n|  model arch   | api example |\n| :-----------: | ----------- |\n| MLP  | ```marl.build_model(env, algo, {\"core_arch\": \"mlp\"})``` |\n| GRU | ```marl.build_model(env, algo, {\"core_arch\": \"gru\"})```  |\n| LSTM | ```marl.build_model(env, algo, {\"core_arch\": \"lstm\"})```  |\n| Encoder Arch | ```marl.build_model(env, algo, {\"core_arch\": \"gru\", \"encode_layer\": \"128-256\"})```  |\n\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>\u003Cbig>Kick off the training\u003C\u002Fbig>\u003C\u002Fb>\u003C\u002Fsummary>\n\n|  setting   | api example |\n| :-----------: | ----------- |\n| train  | ```algo.fit(env, model)``` |\n| debug  | ```algo.fit(env, model, local_mode=True)``` |\n| stop condition | ```algo.fit(env, model, stop={'episode_reward_mean': 2000, 'timesteps_total': 10000000})```  |\n| policy sharing | ```algo.fit(env, model, share_policy='all') # or 'group' \u002F 'individual'```  |\n| save model | ```algo.fit(env, model, checkpoint_freq=100, checkpoint_end=True)```  |\n| GPU accelerate  | ```algo.fit(env, model, local_mode=False, num_gpus=1)``` |\n| CPU accelerate | ```algo.fit(env, model, local_mode=False, num_workers=5)```  |\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>\u003Cbig>Training & rendering API\u003C\u002Fbig>\u003C\u002Fb>\u003C\u002Fsummary>\n\n```py\nfrom marllib import marl\n\n# prepare env\nenv = marl.make_env(environment_name=\"smac\", map_name=\"5m_vs_6m\")\n# initialize algorithm with appointed hyper-parameters\nmappo = marl.algos.mappo(hyperparam_source=\"smac\")\n# build agent model based on env + algorithms + user 
preference\nmodel = marl.build_model(env, mappo, {\"core_arch\": \"gru\", \"encode_layer\": \"128-256\"})\n# start training\nmappo.fit(\n  env, model, \n  stop={\"timesteps_total\": 1000000}, \n  checkpoint_freq=100, \n  share_policy=\"group\"\n)\n# rendering\nmappo.render(\n  env, model, \n  local_mode=True, \n  restore_path={'params_path': \"checkpoint\u002Fparams.json\",\n                'model_path': \"checkpoint\u002Fcheckpoint-10\"}\n)\n```\n\u003C\u002Fdetails>\n\n## Results\n\nUnder the current working directory, you can find all the training data (logging and TensorFlow files) as well as the saved models. To visualize the learning curve, you can use Tensorboard. Follow the steps below:\n\n1. Install Tensorboard by running the following command:\n```bash\npip install tensorboard\n```\n\n2. Use the following command to launch Tensorboard and visualize the results:\n```bash\ntensorboard --logdir .\n```\n\nAlternatively, you can refer to [this tutorial](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorboard\u002Fblob\u002Fmaster\u002Fdocs\u002Fget_started.ipynb) for more detailed instructions.\n\nFor a list of all the existing results, you can visit [this link](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Ftree\u002Fmain\u002Fresults). Please note that these results were obtained from an older version of MARLlib, which may lead to inconsistencies when compared to the current results.\n\n## Quick examples\n\nMARLlib provides some practical examples for you to refer to.\n\n- [Detailed API usage](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fapi_basic_usage.py): show how to use MARLlib api in\n  detail, e.g. 
cmd + api combined running.\n- [Policy sharing customization](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fcustomize_policy_sharing.py):\n  define your own group policy-sharing strategy for the task at hand.\n- [Loading model](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fload_model.py):\n  load a pre-trained model and continue training.\n- [Loading model and rendering](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fload_and_render_model.py):\n  render the environment using a pre-trained model.\n- [Incorporating new environment](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fadd_new_env.py):\n  add your new environment following MARLlib's env-agent interaction interface.\n- [Incorporating new algorithm](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fadd_new_algorithm.py):\n  add your new algorithm following the MARLlib learning pipeline.\n- [Parallelized fine-tuning](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fgrid_search_usage.py):\n  fine-tune your policy\u002Fmodel performance with `ray.tune`.\n  \n## Tutorials\n\nTry MPE + MAPPO examples on Google Colaboratory!\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Fmaster\u002Fmarllib.ipynb)\nMore tutorials and documentation are available [here](https:\u002F\u002Fmarllib.readthedocs.io\u002F).\n\n## Awesome List\n\nA collection of research and review papers on multi-agent reinforcement learning (MARL) is available. 
The papers are organized by publication date and by the environments they are evaluated on.\n\nAlgorithms: [![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fresources\u002Fawesome.html)\nEnvironments: [![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html)\n\n\n## Community\n\n|  Channel   | Link |\n| :----------- | :----------- |\n| Issues | [GitHub Issues](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fissues) |\n\n## Roadmap\n\nThe roadmap for future releases is available in [ROADMAP.md](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Fmain\u002FROADMAP.md).\n\n## Contributing\n\nWe are a small team working on multi-agent reinforcement learning, and we will take all the help we can get! \nIf you would like to get involved, see the [contribution guidelines and how to test the code locally](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002FCONTRIBUTING.md).\n\nYou can contribute in multiple ways, e.g., reporting bugs, writing or translating documentation, reviewing or refactoring code, requesting or implementing new features, etc.\n\n## Citation\n\nIf you use MARLlib in your research, please cite the [MARLlib paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.13708).\n\n```tex\n@article{hu2022marllib,\n  author  = {Siyi Hu and Yifan Zhong and Minquan Gao and Weixun Wang and Hao Dong and Xiaodan Liang and Zhihui Li and Xiaojun Chang and Yaodong Yang},\n  title   = {MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library},\n  journal = {Journal of Machine Learning Research},\n  year    = {2023},\n}\n```\n\nWorks that are based on or closely collaborate with MARLlib 
\u003C[link](https:\u002F\u002Fgithub.com\u002FPKU-MARL\u002FHARL)>\n\n```tex\n@InProceedings{hu2022policy,\n      title={Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent {RL}},\n      author={Hu, Siyi and Xie, Chuanlong and Liang, Xiaodan and Chang, Xiaojun},\n      booktitle={Proceedings of the 39th International Conference on Machine Learning},\n      year={2022},\n}\n@misc{zhong2023heterogeneousagent,\n      title={Heterogeneous-Agent Reinforcement Learning}, \n      author={Yifan Zhong and Jakub Grudzien Kuba and Siyi Hu and Jiaming Ji and Yaodong Yang},\n      archivePrefix={arXiv},\n      year={2023},\n}\n```\n\n\n","\u003Cdiv align=\"center\">\n\n\u003Cimg src=docs\u002Fsource\u002Fimages\u002Flogo1.png width=75% \u002F>\n\u003C\u002Fdiv>\n\n\n\n\u003Ch1 align=\"center\"> MARLlib：多智能体强化学习库 \u003C\u002Fh1>\n\n\u003Cdiv align=\"center\">\n\n\u003Cimg src=docs\u002Fsource\u002Fimages\u002Fallenv.gif width=99% \u002F>\n\n\u003C\u002Fdiv>\n\n&emsp;\n\n[![GitHub 许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue.svg)]()\n![覆盖率](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fcoverage.svg)\n[![文档状态](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FReplicable-MARL_MARLlib_readme_13d664e1afd7.png)](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002F)\n[![GitHub 问题](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002FReplicable-MARL\u002FMARLlib)](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fissues)\n[![PyPI 版本](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fmarllib.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fmarllib)\n[![在 Colab 
中打开](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Fmaster\u002Fmarllib.ipynb)\n[![组织](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOrganization-ReLER_RL-blue.svg)](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib)\n[![组织](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOrganization-PKU_MARL-blue.svg)](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib)\n[![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fresources\u002Fawesome.html)\n\n\n| :exclamation:  新闻 |\n|:-----------------------------------------|\n| **2023年3月** :anchor:我们很高兴地宣布，一项重大更新刚刚发布。有关详细版本信息，请参阅[版本信息](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Freleases\u002Ftag\u002F1.0.2)。|\n| **2023年5月** 激动人心的消息！MARLlib 现在支持另外五项任务：[MATE](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#mate)、[GoBigger](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#gobigger)、[Overcooked-AI](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#overcooked-ai)、[MAPDN](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#power-distribution-networks) 和 [AirCombat](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#air-combat)。快来试试吧！|\n| **2023年6月** [OpenAI：躲猫猫](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#hide-and-seek) 和 [SISL](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#sisl) 环境已被纳入 MARLlib。|\n| **2023年8月** :tada:MARLlib 已被 [JMLR](https:\u002F\u002Fwww.jmlr.org\u002Fmloss\u002F) 接受发表。|\n| **2023年9月** 最新的 
[PettingZoo](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#pettingzoo) 与 [Gymnasium](https:\u002F\u002Fgymnasium.farama.org) 在 MARLlib 中实现了兼容。|\n| **2023年11月** 我们目前正在编写一本实践性的多智能体强化学习书籍，并计划在2023年底发布初稿。|\n\n\n**多智能体强化学习库（[MARLlib](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.13708)）** 是一个基于 [**Ray**](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fray) 及其工具包之一 [**RLlib**](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fray\u002Ftree\u002Fmaster\u002Frllib) 的 ***多智能体强化学习库***。它提供了一个全面的平台，用于开发、训练和测试各种任务及环境下的多智能体强化学习算法。\n\n以下是 MARLlib 的使用示例：\n\n```py\nfrom marllib import marl\n\n# 准备环境\nenv = marl.make_env(environment_name=\"mpe\", map_name=\"simple_spread\", force_coop=True)\n\n# 使用指定超参数初始化算法\nmappo = marl.algos.mappo(hyperparam_source='mpe')\n\n# 基于环境、算法和用户偏好构建智能体模型\nmodel = marl.build_model(env, mappo, {\"core_arch\": \"mlp\", \"encode_layer\": \"128-256\"})\n\n# 开始训练\nmappo.fit(env, model, stop={'timesteps_total': 1000000}, share_policy='group')\n```\n\n## 为什么选择 MARLlib？\n\n下面我们提供一张表格，用于比较 MARLlib 与现有工作。\n\n| 库名         | 支持环境       | 算法数量 | 参数共享方式   | 模型架构           | 文档状态 |\n|:-------------:|:-------------:|:-------------:|:--------------:|:----------------:|:--------:|\n|     [PyMARL](https:\u002F\u002Fgithub.com\u002Foxwhirl\u002Fpymarl) |        1 个合作环境       |       5       |         共享        |      GRU           | :x:\n|   [PyMARL2](https:\u002F\u002Fgithub.com\u002Fhijkzzz\u002Fpymarl2)|        2 个合作环境       |     11   |         共享        |  MLP + GRU  | :x:\n| [MAPPO Benchmark](https:\u002F\u002Fgithub.com\u002Fmarlbenchmark\u002Fon-policy) |       4 个合作环境       |      1     |          共享 + 分离        |          MLP + GRU        |         :x:              |\n| [MAlib](https:\u002F\u002Fgithub.com\u002Fsjtu-marl\u002Fmalib) |  4 种自对战场景  | 10 | 共享 + 分组 + 分离 | MLP + LSTM | 
[![文档状态](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FReplicable-MARL_MARLlib_readme_13d664e1afd7.png)](https:\u002F\u002Fmalib.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest)\n|    [EPyMARL](https:\u002F\u002Fgithub.com\u002Fuoe-agents\u002Fepymarl)|        4 个合作环境      |    9    |        共享 + 分离       |      GRU             |           :x:            |\n|    [HARL](https:\u002F\u002Fgithub.com\u002FPKU-MARL\u002FHARL)|        8 个合作环境      |    9    |        共享 + 分离       |      MLP + CNN + GRU           |           :x:            |\n|    **[MARLlib](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib)** |       17 种环境（无任务模式限制）     |    18     |   共享 + 分组 + 分离 + **可定制**         |         MLP + CNN + GRU + LSTM          |           [![文档状态](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FReplicable-MARL_MARLlib_readme_13d664e1afd7.png)](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002F) |\n\n| 库名         | GitHub 星数  | 文档 | 开放问题数 | 活跃度 | 最后更新时间 |\n|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|\n|     [PyMARL](https:\u002F\u002Fgithub.com\u002Foxwhirl\u002Fpymarl) | [![GitHub 星数](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Foxwhirl\u002Fpymarl)](https:\u002F\u002Fgithub.com\u002Foxwhirl\u002Fpymarl)    |       :x: | ![GitHub 开放问题](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Foxwhirl\u002Fpymarl.svg) | ![GitHub 提交活跃度](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002Foxwhirl\u002Fpymarl?label=提交) | ![GitHub 最后一次提交](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Foxwhirl\u002Fpymarl?label=最后更新)  \n|   [PyMARL2](https:\u002F\u002Fgithub.com\u002Fhijkzzz\u002Fpymarl2)| [![GitHub 星数](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhijkzzz\u002Fpymarl2)](https:\u002F\u002Fgithub.com\u002Fhijkzzz\u002Fpymarl2)       |       :x:  | ![GitHub 
开放问题](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fhijkzzz\u002Fpymarl2.svg) | ![GitHub 提交活跃度](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002Fhijkzzz\u002Fpymarl2?label=提交) | ![GitHub 最后一次提交](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fhijkzzz\u002Fpymarl2?label=最后更新)  \n| [MAPPO Benchmark](https:\u002F\u002Fgithub.com\u002Fmarlbenchmark\u002Fon-policy)| [![GitHub 星数](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmarlbenchmark\u002Fon-policy)](https:\u002F\u002Fgithub.com\u002Fmarlbenchmark\u002Fon-policy)   |        :x:              | ![GitHub 开放问题](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fmarlbenchmark\u002Fon-policy.svg) | ![GitHub 提交活跃度](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002Fmarlbenchmark\u002Fon-policy?label=提交)| ![GitHub 最后一次提交](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fmarlbenchmark\u002Fon-policy?label=最后更新)  \n| [MAlib](https:\u002F\u002Fgithub.com\u002Fsjtu-marl\u002Fmalib) | [![GitHub 星数](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsjtu-marl\u002Fmalib)](https:\u002F\u002Fgithub.com\u002Fsjtu-marl\u002Fmalib) | [![文档状态](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FReplicable-MARL_MARLlib_readme_13d664e1afd7.png)](https:\u002F\u002Fmalib.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest) | ![GitHub 开放问题](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fsjtu-marl\u002Fmalib.svg) | ![GitHub 提交活跃度](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002Fsjtu-marl\u002Fmalib?label=提交) | ![GitHub 最后一次提交](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fsjtu-marl\u002Fmalib?label=最后更新)  \n|    [EPyMARL](https:\u002F\u002Fgithub.com\u002Fuoe-agents\u002Fepymarl)| [![GitHub 
星数](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fuoe-agents\u002Fepymarl)](https:\u002F\u002Fgithub.com\u002Fuoe-agents\u002Fepymarl)        |           :x:            | ![GitHub 开放问题](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fuoe-agents\u002Fepymarl.svg) | ![GitHub 提交活跃度](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002Fuoe-agents\u002Fepymarl?label=提交) | ![GitHub 最后一次提交](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fuoe-agents\u002Fepymarl?label=最后更新)  \n|    [HARL](https:\u002F\u002Fgithub.com\u002FPKU-MARL\u002FHARL)**\\***| [![GitHub 星数](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FPKU-MARL\u002FHARL)](https:\u002F\u002Fgithub.com\u002FPKU-MARL\u002FHARL)        |           :x:            | ![GitHub 开放问题](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002FPKU-MARL\u002FHARL.svg) | ![GitHub 提交活跃度](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002FPKU-MARL\u002FHARL?label=提交) | ![GitHub 最后一次提交](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002FPKU-MARL\u002FHARL?label=最后更新)  \n|    **[MARLlib](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib)** |  [![GitHub 星数](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FReplicable-MARL\u002FMARLlib)](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib)  |           [![文档状态](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FReplicable-MARL_MARLlib_readme_13d664e1afd7.png)](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002F) | ![GitHub 开放问题](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002FReplicable-MARL\u002FMARLlib.svg) | ![GitHub 提交活跃度](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fy\u002FReplicable-MARL\u002FMARLlib?label=提交) | ![GitHub 最后一次提交](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002FReplicable-MARL\u002FMARLlib?label=最后更新)  \n\n> **_\\*_**  
**HARL** 是最近发布的最新 MARL 库：fire:。如果你的目标是前沿的、性能最先进的 MARL 算法，那么 HARL 绝对值得你 [一看](https:\u002F\u002Fgithub.com\u002FPKU-MARL\u002FHARL)！\n\n[comment]: \u003C> (\u003Cdiv align=\"center\">)\n\n[comment]: \u003C> (\u003Cimg src=docs\u002Fsource\u002Fimages\u002Foverview.png width=100% \u002F>)\n\n[comment]: \u003C> (\u003C\u002Fdiv>)\n\n## 主要特性\n\n:beginner: MARLlib 提供了多项关键特性，使其脱颖而出：\n\n- MARLlib 通过代理级别的分布式数据流统一了多样化的算法流程，使研究人员能够在不同的任务和环境中开发、测试和评估多智能体强化学习算法。\n- MARLlib 支持所有任务模式，包括合作、协作、竞争和混合模式。这使得研究人员能够更轻松地在各种任务中训练和评估多智能体强化学习算法。\n- MARLlib 提供了一个遵循 Gym 结构的新接口，使研究人员更容易处理多智能体环境。\n- MARLlib 提供灵活且可定制的参数共享策略，允许研究人员针对不同任务和环境优化其算法。\n\n:rocket: 使用 MARLlib，您可以享受多种优势，例如：\n\n- **无需多智能体强化学习知识**：MARLlib 提供了 18 种预构建的算法，并配有直观的 API，使研究人员无需事先了解该领域即可开始实验多智能体强化学习。\n- **支持所有任务模式**：MARLlib 几乎支持所有的多智能体环境，使研究人员能够更轻松地尝试不同的任务模式。\n- **可定制的模型架构**：研究人员可以从模型库中选择自己喜欢的模型架构，也可以自行构建。\n- **可定制的策略共享**：MARLlib 提供了策略共享的分组选项，或者研究人员也可以自定义自己的策略共享方式。\n- **访问上千个已发布的实验**：研究人员可以访问上千个已发布的实验，了解其他研究人员是如何使用 MARLlib 的。\n\n## 安装\n\n> __注意__：\n> 请注意，目前 MARLlib 仅兼容 Linux 操作系统。\n\n### 分步安装（推荐）\n\n- 安装依赖项\n- 安装环境\n- 安装补丁\n\n#### 1. 安装依赖项（基础）\n\n首先，安装 MARLlib 的依赖项以确保基本使用。\n按照 [此指南](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html)，最后为 RLlib 安装补丁。\n\n```bash\n$ conda create -n marllib python=3.8 # 或 3.9\n$ conda activate marllib\n$ git clone https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib.git && cd MARLlib\n$ pip install -r requirements.txt\n```\n\n#### 2. 安装环境（可选）\n\n请按照 [此指南](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html) 进行操作。\n\n> __注意__：\n> 我们建议使用大约 0.20.0 版本的 gym。\n```bash\npip install \"gym==0.20.0\"\n```\n\n#### 3. 
安装补丁（基础）\n\n使用补丁修复 RLlib 的错误，运行以下命令：\n\n```bash\n$ cd \u002FPath\u002FTo\u002FMARLlib\u002Fmarllib\u002Fpatch\n$ python add_patch.py -y\n```\n\n### PyPI\n\n```bash\n$ pip install --upgrade pip\n$ pip install marllib\n```\n\n### 基于 Docker 的使用\n\n我们在 `MARLlib\u002Fdocker\u002FDockerfile` 中提供了用于构建 MARLlib Docker 镜像的 Dockerfile，并在 `MARLlib\u002F.devcontainer` 文件夹中提供了 devcontainer 设置。如果您使用 devcontainer，请注意，您可能需要根据您的硬件设备自定义 `devcontainer.json` 中的 `runArgs` 参数，例如 `--shm-size` 参数。\n\n## 开始使用\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>\u003Cbig>准备配置\u003C\u002Fbig>\u003C\u002Fb>\u003C\u002Fsummary>\n\n整个训练过程由四个部分的配置负责。\n\n- 场景：指定环境\u002F任务设置\n- 算法：选择算法的超参数\n- 模型：自定义模型架构\n- ray\u002Frllib：更改基本训练设置\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=docs\u002Fsource\u002Fimages\u002Fconfigurations.png width=100% \u002F>\n\u003C\u002Fdiv>\n\n在开始训练之前，请确保所有参数都已正确设置，尤其是那些您不希望更改的参数。\n> __注意__：\n> 您也可以通过 MARLLib API 修改所有预设参数。*\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>\u003Cbig>注册环境\u003C\u002Fbig>\u003C\u002Fb>\u003C\u002Fsummary>\n\n请确保您正在使用的环境的所有依赖项均已安装。否则，请参考\n[MARLlib 文档](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html)。\n\n\n|   任务模式   | api 示例 |\n| :-----------: | ----------- |\n| 合作 | ```marl.make_env(environment_name=\"mpe\", map_name=\"simple_spread\", force_coop=True)``` |\n| 协作 | ```marl.make_env(environment_name=\"mpe\", map_name=\"simple_spread\")``` |\n| 竞争 | ```marl.make_env(environment_name=\"mpe\", map_name=\"simple_adversary\")``` |\n| 混合 | ```marl.make_env(environment_name=\"mpe\", map_name=\"simple_crypto\")``` |\n\nMARLlib 支持大多数多智能体强化学习研究中流行的环境：\n\n| 环境名称 | 学习模式 | 可观测性 | 动作空间 | 观测信息 |\n| :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |\n| **[LBF](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#lbf)**  | 协作 + 协同 | 全可观测 | 离散 | 1D  |\n| 
**[RWARE](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#rware)**  | 协作 | 部分可观测 | 离散 | 1D  |\n| **[MPE](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#mpe)**  | 协作 + 协同 + 混合 | 全可观测 | 离散和连续 | 1D  |\n| **[SISL](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#sisl)** | 协作 + 协同 | 全可观测 | 离散和连续 | 1D |\n| **[SMAC](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#smac)**  | 协作 | 部分可观测 | 离散 | 1D |\n| **[MetaDrive](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#metadrive)**  | 协同 | 部分可观测 | 连续 | 1D |\n| **[MAgent](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#magent)** | 协同 + 混合 | 部分可观测 | 离散 | 2D |\n| **[Pommerman](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#pommerman)**  | 协同 + 竞争 + 混合 | 全可观测 | 离散 | 2D |\n| **[MAMuJoCo](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#mamujoco)**  | 协作 | 全可观测 | 连续 | 1D |\n| **[GRF](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#google-research-football)**  | 协同 + 混合 | 全可观测 | 离散 | 2D |\n| **[Hanabi](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#hanabi)** | 协作 | 部分可观测 | 离散 | 1D |\n| **[MATE](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#mate)** | 协作 + 混合 | 部分可观测 | 离散和连续 | 1D |\n| **[GoBigger](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#gobigger)** | 协作 + 混合 | 全可观测 | 连续 | 1D |\n| **[Overcooked-AI](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#overcooked-ai)** | 协作 | 全可观测 | 离散 | 1D |\n| 
**[PDN](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#power-distribution-networks)** | 协作 | 部分可观测 | 连续 | 1D |\n| **[AirCombat](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#air-combat)** | 协作 + 混合 | 部分可观测 | 多离散 | 1D |\n| **[HideAndSeek](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html#hide-and-seek)** | 竞争 + 混合 | 部分可观测 | 多离散 | 1D |\n\n每个环境都配有说明文档，作为该任务的使用指南，其中包括环境设置、安装步骤及重要注意事项。\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>\u003Cbig>初始化算法\u003C\u002Fbig>\u003C\u002Fb>\u003C\u002Fsummary>\n\n\n|  运行目标   | API 示例 |\n| :-----------: | ----------- |\n| 训练与微调  | ```marl.algos.mappo(hyperparam_source=$ENV)``` |\n| 开发与调试 | ```marl.algos.mappo(hyperparam_source=\"test\")``` |\n| 第三方环境 | ```marl.algos.mappo(hyperparam_source=\"common\")``` |\n\n以下是各算法特性的表格：\n\n| 算法                                                    | 支持的任务模式 | 离散动作   | 连续动作 |  策略类型        |\n| :------------------------------------------------------------: | :-----------------: | :----------: | :--------------------: | :----------: | \n| *IQL**                                                         | 四种模式均支持 | :heavy_check_mark:   |    |  离策略 |\n| *[PG](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)* | 四种模式均支持                  | :heavy_check_mark:       | :heavy_check_mark:   |  在策 |\n| *[A2C](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.01783)*                      | 四种模式均支持              | :heavy_check_mark:       | :heavy_check_mark:   |  在策  |\n| *[DDPG](https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.02971)*                     | 四种模式均支持             |  | :heavy_check_mark:   |  离策 |\n| *[TRPO](http:\u002F\u002Fproceedings.mlr.press\u002Fv37\u002Fschulman15.pdf)*      | 四种模式均支持            | :heavy_check_mark:       | :heavy_check_mark:   |  在策  |\n| 
*[PPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06347)*                      | 四种模式均支持            | :heavy_check_mark:       | :heavy_check_mark:   |  在策  |\n| *[COMA](https:\u002F\u002Fojs.aaai.org\u002Findex.php\u002FAAAI\u002Farticle\u002Fdownload\u002F11794\u002F11653)* | 四种模式均支持                           | :heavy_check_mark:       |   |  在策  |\n| *[MADDPG](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.02275)*                   | 四种模式均支持                     |  | :heavy_check_mark:   |  离策 |\n| *MAA2C**                                                       | 四种模式均支持                        | :heavy_check_mark:       | :heavy_check_mark:   |  在策  |\n| *MATRPO**                                                      | 四种模式均支持                         | :heavy_check_mark:       | :heavy_check_mark:   |  在策  |\n| *[MAPPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2103.01955)*                    | 四种模式均支持                         | :heavy_check_mark:       | :heavy_check_mark:   |  在策  |\n| *[HATRPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.11251)*                   | 仅协作模式                     | :heavy_check_mark:       | :heavy_check_mark:   |  在策  |\n| *[HAPPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.11251)*                    | 仅协作模式                     | :heavy_check_mark:       | :heavy_check_mark:   |  在策  |\n| *[VDN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.05296)*                      | 仅协作模式         | :heavy_check_mark:   |    |  离策 |\n| *[QMIX](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.11485)*                     | 仅协作模式                    | :heavy_check_mark:   |   |  离策 |\n| *[FACMAC](https:\u002F\u002Farxiv.org\u002Fabs\u002F2003.06709)*                   | 仅协作模式                    |  | :heavy_check_mark:   |  离策 |\n| *[VDAC](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.12306)*                    | 仅协作模式                    | :heavy_check_mark:       | :heavy_check_mark:   |  在策  |\n| *VDPPO**                                    
                  | 仅协作模式                | :heavy_check_mark:       | :heavy_check_mark:   |  在策  |\n\n***四种模式**: 协作、协同、竞争、混合\n\n*IQL* 是多智能体版的Q学习。\n*MAA2C* 和 *MATRPO* 分别是A2C和TRPO的集中式版本。\n*VDPPO* 是PPO的值分解版本。\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>\u003Cbig>构建智能体模型\u003C\u002Fbig>\u003C\u002Fb>\u003C\u002Fsummary>\n\n智能体模型由两部分组成：`encoder` 和 `core arch`。\n`encoder` 将由 MARLlib 根据观测空间自动生成。\n您可以选择 `mlp`、`gru` 或 `lstm` 来构建完整的模型。\n\n| 模型架构 | API 示例 |\n| :-----------: | ----------- |\n| MLP  | ```marl.build_model(env, algo, {\"core_arch\": \"mlp\"})``` |\n| GRU | ```marl.build_model(env, algo, {\"core_arch\": \"gru\"})```  |\n| LSTM | ```marl.build_model(env, algo, {\"core_arch\": \"lstm\"})```  |\n| 编码器架构 | ```marl.build_model(env, algo, {\"core_arch\": \"gru\", \"encode_layer\": \"128-256\"})```  |\n\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>\u003Cbig>开始训练\u003C\u002Fbig>\u003C\u002Fb>\u003C\u002Fsummary>\n\n| 设置   | API 示例 |\n| :-----------: | ----------- |\n| 训练  | ```algo.fit(env, model)``` |\n| 调试  | ```algo.fit(env, model, local_mode=True)``` |\n| 停止条件 | ```algo.fit(env, model, stop={'episode_reward_mean': 2000, 'timesteps_total': 10000000})```  |\n| 策略共享 | ```algo.fit(env, model, share_policy='all') # 或 'group' \u002F 'individual'```  |\n| 保存模型 | ```algo.fit(env, model, checkpoint_freq=100, checkpoint_end=True)```  |\n| GPU 加速  | ```algo.fit(env, model, local_mode=False, num_gpus=1)``` |\n| CPU 加速 | ```algo.fit(env, model, local_mode=False, num_workers=5)```  |\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>\u003Cbig>训练与渲染 API\u003C\u002Fbig>\u003C\u002Fb>\u003C\u002Fsummary>\n\n```py\nfrom marllib import marl\n\n\n\n# 准备环境\nenv = marl.make_env(environment_name=\"smac\", map_name=\"5m_vs_6m\")\n# 使用指定超参数初始化算法\nmappo = marl.algos.mappo(hyperparam_source=\"smac\")\n# 根据环境、算法和用户偏好构建智能体模型\nmodel = marl.build_model(env, mappo, {\"core_arch\": \"gru\", \"encode_layer\": \"128-256\"})\n# 开始训练\nmappo.fit(\n  
env, model, \n  stop={\"timesteps_total\": 1000000}, \n  checkpoint_freq=100, \n  share_policy=\"group\"\n)\n# 渲染\nmappo.render(\n  env, model, \n  local_mode=True, \n  restore_path={'params_path': \"checkpoint\u002Fparams.json\",\n                'model_path': \"checkpoint\u002Fcheckpoint-10\"}\n)\n```\n\u003C\u002Fdetails>\n\n## 结果\n\n在当前工作目录下，您可以找到所有的训练数据（日志文件和 TensorFlow 文件）以及保存的模型。要可视化学习曲线，您可以使用 TensorBoard。请按照以下步骤操作：\n\n1. 通过运行以下命令安装 TensorBoard：\n```bash\npip install tensorboard\n```\n\n2. 使用以下命令启动 TensorBoard 并可视化结果：\n```bash\ntensorboard --logdir .\n```\n\n您也可以参考[这篇教程](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorboard\u002Fblob\u002Fmaster\u002Fdocs\u002Fget_started.ipynb)，获取更详细的说明。\n\n有关所有现有结果的列表，请访问[此链接](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Ftree\u002Fmain\u002Fresults)。请注意，这些结果来自 MARLlib 的旧版本，因此可能与当前结果存在不一致之处。\n\n## 快速示例\n\nMARLlib 提供了一些实用示例供您参考。\n\n- [详细 API 使用](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fapi_basic_usage.py)：展示如何详细使用 MARLlib API，例如结合命令行和 API 运行。\n- [自定义策略共享](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fcustomize_policy_sharing.py)：根据当前任务自定义您的组策略共享策略。\n- [加载模型](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fload_model.py)：加载预训练模型并继续训练。\n- [加载模型并渲染](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fload_and_render_model.py)：基于预训练模型渲染环境。\n- [集成新环境](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fadd_new_env.py)：按照 MARLlib 的环境-智能体交互接口添加您的新环境。\n- [集成新算法](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fadd_new_algorithm.py)：按照 MARLlib 的学习流程添加您的新算法。\n- 
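[Parallel fine-tuning](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002Fexamples\u002Fgrid_search_usage.py): fine-tune the performance of your policy\u002Fmodel with `ray.tune`.\n\n## Tutorials\n\nTry the MPE + MAPPO example on Google Colaboratory!\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Fmaster\u002Fmarllib.ipynb)\nMore tutorial documentation is available [here](https:\u002F\u002Fmarllib.readthedocs.io\u002F).\n\n## Awesome List\n\nA collection of research and review papers on multi-agent reinforcement learning (MARL). The papers are organized by publication date and by the environments on which they are evaluated.\n\nAlgorithms: [![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fresources\u002Fawesome.html)\nEnvironments: [![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html)\n\n\n## Community\n\n| Channel   | Link |\n| :----------- | :----------- |\n| Issues | [GitHub Issues](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fissues) |\n\n## Roadmap\n\nThe roadmap for future releases is available in [ROADMAP.md](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Fmain\u002FROADMAP.md).\n\n## Contributing\n\nWe are a small team working on multi-agent reinforcement learning, and any help is very welcome!\nIf you would like to get involved, here is information on [contribution guidelines and how to test the code locally](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Frllib_1.8.0_dev\u002FCONTRIBUTING.md).\n\nYou can contribute in many ways, for example by reporting bugs, writing or translating documentation, reviewing or refactoring code, and requesting or implementing new features.\n\n## Citation\n\nIf you use MARLlib in your research, please cite the [MARLlib paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.13708).\n\n```tex\n@article{hu2022marllib,\n  author  = {Siyi Hu and Yifan Zhong and Minquan Gao and Weixun Wang and Hao Dong and Xiaodan Liang and Zhihui Li and Xiaojun Chang and Yaodong Yang},\n  title   = {MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library},\n  journal = {Journal of Machine Learning Research},\n  year    = {2023},\n}\n```\n\nWorks that are based on or closely collaborate with MARLlib \u003C[link](https:\u002F\u002Fgithub.com\u002FPKU-MARL\u002FHARL)>\n\n```tex\n@InProceedings{hu2022policy,\n      title={Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent 
RL},\n      author={Hu, Siyi and Xie, Chuanlong and Liang, Xiaodan and Chang, Xiaojun},\n      booktitle={Proceedings of the 39th International Conference on Machine Learning},\n      year={2022},\n}\n@misc{zhong2023heterogeneousagent,\n      title={Heterogeneous-Agent Reinforcement Learning}, \n      author={Yifan Zhong and Jakub Grudzien Kuba and Siyi Hu and Jiaming Ji and Yaodong Yang},\n      archivePrefix={arXiv},\n      year={2023},\n}\n```","# MARLlib Quick Start Guide\n\nMARLlib is a multi-agent reinforcement learning (MARL) library built on Ray and RLlib. It supports cooperative, competitive, and mixed task modes, and provides 18 pre-built algorithms along with flexible model architectures. It aims to lower the barrier to MARL research so that developers can run experiments quickly without deep domain knowledge.\n\n## Prerequisites\n\nBefore you begin, make sure your development environment meets the following requirements:\n\n*   **Operating system**: only **Linux** is currently supported.\n*   **Python version**: Python 3.8 or 3.9 is recommended.\n*   **Package manager**: `conda` is recommended for environment management.\n*   **Network**: cloning the repository and installing dependencies require a stable network connection (users in mainland China can configure a domestic pip mirror to speed up downloads).\n\n## Installation\n\nInstall step by step as follows so that the base dependencies and environment patches are configured correctly.\n\n### 1. Create and activate a virtual environment\n\n```bash\nconda create -n marllib python=3.8\nconda activate marllib\n```\n\n### 2. Clone the repository and install the base dependencies\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib.git\ncd MARLlib\npip install -r requirements.txt\n```\n\n> **Tip**: users in mainland China can use the Tsinghua or Alibaba mirror to speed up installation, for example:\n> `pip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n### 3. Install specific environments and patches (optional but recommended)\n\nMARLlib supports many environments (such as MPE, StarCraft II, and Google Research Football). To use a specific environment, follow the [environment installation guide](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fhandbook\u002Fenv.html) in the official documentation for the extra installation steps.\n\n**Important**: after installing the dependencies, be sure to install the RLlib patches to ensure compatibility:\n```bash\n# For the exact patch installation command, see the project root or the documentation; the patches are usually either applied automatically after dependency installation or require running a specific script manually\n```\n\n## Basic Usage\n\nMARLlib provides an intuitive API, and a few lines of code are enough to start training. Below is a minimal example that trains the MAPPO algorithm on the `simple_spread` map of MPE (Multi-Agent Particle Environment):\n\n```python\nfrom marllib import marl\n\n# 1. Prepare the environment\n# environment_name: name of the environment\n# map_name: name of the specific map\u002Fscenario\n# force_coop: whether to force cooperative mode\nenv = marl.make_env(environment_name=\"mpe\", map_name=\"simple_spread\", force_coop=True)\n\n# 2. Initialize the algorithm\n# hyperparam_source: load the preset hyperparameter configuration (matching the environment name)\nmappo = marl.algos.mappo(hyperparam_source='mpe')\n\n# 3. 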
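Build the model\n# core_arch: core network architecture (mlp, gru, lstm)\n# encode_layer: hidden-layer sizes of the encoder\nmodel = marl.build_model(env, mappo, {\"core_arch\": \"mlp\", \"encode_layer\": \"128-256\"})\n\n# 4. Start training\n# stop: stopping condition (such as total timesteps)\n# share_policy: policy-sharing mode ('all', 'group', 'individual')\nmappo.fit(env, model, stop={'timesteps_total': 1000000}, share_policy='group')\n```\n\nAfter you run the script above, MARLlib uses Ray to carry out distributed data collection and model training, and prints the training logs to the console.","An algorithm team at an autonomous-driving startup is developing a multi-vehicle cooperative obstacle-avoidance system and needs to train multiple agents to cooperate efficiently in complex, dynamic traffic environments.\n\n### Without MARLlib\n- **Tedious environment adaptation**: engineers must hand-write large amounts of interface code for each simulation scenario (such as MPE or StarCraft II), and every switch of test environment means restructuring the data format, which is time-consuming and error-prone.\n- **Hard-to-reproduce algorithms**: when reproducing recent algorithms such as MAPPO or QMIX, there is no unified reference implementation; team members work in isolation, code styles diverge, and experimental results are hard to compare.\n- **High barrier to distributed training**: using a multi-GPU cluster to speed up training requires configuring the underlying Ray architecture and complex parameter-sharing logic by hand, and debugging the parallelism strategy consumes most of the development time.\n- **Poor model extensibility**: when the network structure must change (for example from an MLP to a CNN) to fit new perception inputs, a small change ripples through the whole codebase, making modifications very costly.\n\n### With MARLlib\n- **One-call environment loading**: the `marl.make_env` interface loads more than ten standard environments such as MPE and Overcooked-AI and unifies their observation and action spaces, so the team can focus on policy design instead of data wrangling.\n- **Standardized algorithm library**: mature built-in algorithms such as MAPPO can be called directly by specifying only a hyperparameter source, which keeps experiments reproducible and lets team members iterate quickly against the same baseline.\n- **Built-in distributed support**: with the integrated Ray\u002FRLlib architecture, a single `fit` call manages parameter sharing between agents and distributed training, greatly shortening the time needed for models to converge.\n- **Flexible model building**: `marl.build_model` lets you assemble a custom core network architecture like building blocks (for example by setting \"128-256\" hidden layers), adapting to different tasks without rewriting low-level logic.\n\nMARLlib frees multi-agent reinforcement learning from the low-level grind of reinventing the wheel, so R&D teams can validate the commercial value of cooperative decision-making algorithms far faster.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FReplicable-MARL_MARLlib_69443e09.png","Replicable-MARL","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FReplicable-MARL_d11934ca.png","Arena of MARL algorithms",null,"https:\u002F\u002Fgithub.com\u002FReplicable-MARL",[78,82,86,90,94,98,101,105],{"name":79,"color":80,"percentage":81},"Python","#3572A5",92.6,{"name":83,"color":84,"percentage":85},"C++","#f34b7d",6.5,{"name":87,"color":88,"percentage":89},"C","#555555",0.5,{"name":91,"color":92,"percentage":93},"Jupyter 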
Notebook","#DA5B0B",0.2,{"name":95,"color":96,"percentage":97},"Dockerfile","#384d54",0.1,{"name":99,"color":100,"percentage":97},"Shell","#89e051",{"name":102,"color":103,"percentage":104},"CMake","#DA3434",0,{"name":106,"color":107,"percentage":104},"Jsonnet","#0064bd",1298,194,"2026-04-02T16:50:26","MIT",4,"Linux","Not specified",{"notes":116,"python":117,"dependencies":118},"Currently compatible with Linux only. Installation steps: create a conda environment (Python 3.8 or 3.9), clone the repository, and install the dependencies in requirements.txt. You also need to install the RLlib patches as described in the guide; installing some of the environments is optional.","3.8, 3.9",[119,120],"ray","rllib",[13,14,52],[123,119,120,124,125],"multi-agent-reinforcement-learning","pytorch","deep-reinforcement-learning","2026-03-27T02:49:30.150509","2026-04-07T22:50:55.653227",[129,134,139,144,149,153],{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},22917,"How do I run PettingZoo SISL environments (such as multiwalker or waterworld) in MARLlib?","You can run SISL environments with the script below. If you hit errors, try lowering the learning rate (LR) in mappo.yaml from the default 5e-4 to 1e-6.\n\nExample code:\n```python\nfrom marllib import marl\n\nenv = marl.make_env(environment_name=\"sisl\", map_name=\"multiwalker\", force_coop=True)\n\nmappo = marl.algos.mappo(hyperparam_source=\"test\")\n\nmodel = marl.build_model(env, mappo, {\"core_arch\": \"mlp\", \"encode_layer\": \"128-256\"})\n\nmappo.fit(env, model, stop={'episode_reward_mean': 2000, 'timesteps_total': 20000000}, local_mode=False, num_gpus=1,\n          num_workers=2, share_policy='all', checkpoint_freq=100)\n```\nNote: you do not need to flatten obstacle_coords manually; the configuration above runs as is.","https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fissues\u002F117",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},22918,"What should I do about the error \"AttributeError: 'NoneType' object has no attribute 'reset'\" when running the MetaDrive environment?","The MetaDrive environment needs extra workers for evaluation. Edit the `ray.yaml` configuration file and set `num_workers` to a value greater than 2 (for example 5).\n\nSteps:\n1. Open the `ray.yaml` file.\n2. Find the `num_workers` parameter.\n3. Change its value to a number greater than 2, such as 5.\n4. 
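Rerun the training command.\n\nExample command:\n`python marl\u002Fmain.py --algo_config=mappo --env_config=metadrive with env_args.map_name=Roundabout`","https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fissues\u002F78",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},22919,"How do I fix the error \"'MAA2CTrainer' object has no attribute '_local_ip'\" or other attribute errors when running MAA2C or DDPG-family algorithms?","The DDPG family of algorithms (including MAA2C, MADDPG, and others) supports only continuous action spaces. If the environment uses discrete actions by default, you need to enable the continuous-action option explicitly in the configuration file.\n\nSolution:\nIn the corresponding environment configuration file (such as `mpe.yaml`), set the `continuous_actions` parameter under `env_args` to `True`.\n\nExample configuration snippet:\n```yaml\nenv_args:\n  continuous_actions: true\n  map_name: simple_adversary\n```","https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fissues\u002F65",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},22920,"How are the hyperparameters in MARLlib fine-tuned? How can I find the best hyperparameters?","There are two main ways to find good hyperparameters:\n1. **Follow benchmark papers**: MARLlib's fine-tuned settings are mainly based on the parameters reported in published benchmark papers.\n2. **Grid search**: use `ray.tune.grid_search()` to explore the hyperparameter space systematically.\n\nOnly the tasks under the `marllib\u002Fmarl\u002Falgos\u002Fhyperparams\u002Ffinetuned` directory have been officially fine-tuned. For other tasks, consult the relevant papers and surveys for more in-depth guidance.","https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fissues\u002F128",{"id":150,"question_zh":151,"answer_zh":152,"source_url":148},22921,"Which version of MuJoCo does MARLlib currently support? Is v4 supported?","MARLlib currently supports mainly MuJoCo v2. Version v4 includes many new features, but the library has not yet been fully updated to support it. If you need v4 features, you may have to wait for an official update or adapt the code yourself. For now, continuing with the stable v2 version is recommended to ensure compatibility.",{"id":154,"question_zh":155,"answer_zh":156,"source_url":143},22922,"When running the example scripts I see deprecation warnings about `observation_spaces` and `action_spaces`. Do they affect the run?","These warnings come from an update to the PettingZoo library (`The observation_spaces dictionary is deprecated...`) and indicate that the functions `observation_space()` and `action_space()` should be used instead of dictionary access. They are normally only warnings (UserWarning) and do not crash the program or prevent training from running. You can ignore them, or upgrade PettingZoo to the latest version to remove them. If the program stops because of another error (such as a Ray TaskError), resolve that specific error first rather than these deprecation warnings.",[158,163],{"id":159,"version":160,"summary_zh":161,"released_at":162},136697,"1.0.3","# Release Notes - 1.0.3\n\nThis release includes several updates and improvements over 1.0.2. The main changes are listed below:\n\n## Added\n- 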
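[MATE](https:\u002F\u002Fgithub.com\u002FXuehaiPan\u002Fmate): a new multi-agent tracking environment.\n- [GoBigger](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FGoBigger): an efficient and straightforward agar-like game engine that provides a variety of interfaces for game AI development.\n\nWe encourage users to upgrade to this version to take full advantage of the new features and improvements. If you have any questions or run into any problems, feel free to open an issue.","2023-04-25T03:22:21",{"id":164,"version":165,"summary_zh":166,"released_at":167},136698,"1.0.2","# Release Notes - 1.0.2\n\nThis release includes several updates and improvements to MARLlib. The main changes are listed below:\n\n## Added\n- [API-based usage](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib#getting-started): since version 1.0.0, MARLlib has adopted a new API that simplifies the training workflow and supports more flexible customization. See the [README](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib#getting-started) for detailed usage.\n- [Example code](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Ftree\u002Fmaster\u002Fexamples): examples are now provided to help users get started with MARLlib quickly.\n- [Rendering](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Fblob\u002Fmaster\u002Fexamples\u002Fload_and_render_model.py): pre-trained models can be loaded and used to render tasks.\n- [Awesome list](https:\u002F\u002Fmarllib.readthedocs.io\u002Fen\u002Flatest\u002Fresources\u002Fawesome.html): a survey of recent related papers that shows the latest progress in multi-agent reinforcement learning (as of early 2023).\n\n## Changed\n- [MLP module](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Ftree\u002Fmaster\u002Fmarllib\u002Fmarl\u002Fmodels\u002Fzoo\u002Fmlp): in addition to the existing GRU, this version adds a multi-layer perceptron (MLP) that can be used to build agent architectures.\n- [Standardized encoder](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Ftree\u002Fmaster\u002Fmarllib\u002Fmarl\u002Fmodels\u002Fzoo\u002Fencoder): all model classes now share the same encoder class.\n- Hyperparameter tuning: a new [test folder](https:\u002F\u002Fgithub.com\u002FReplicable-MARL\u002FMARLlib\u002Ftree\u002Fmaster\u002Fmarllib\u002Fmarl\u002Falgos\u002Fhyperparams\u002Ftest) was added for algorithm testing, and some environment folders whose hyperparameters had little effect were removed.\n- Result data: some experimental data was removed after verification.\n- Other minor improvements.\n\n## Removed\n- Console-style usage: because the new API is the only usage mode supported in future development, console-style usage has been fully deprecated.\n\nWe encourage users to upgrade to this version to take full advantage of the new features and improvements. If you have any questions or problems, feel free to open an issue.\n\n","2023-03-21T10:03:33"]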