[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-opendilab--DI-engine":3,"tool-opendilab--DI-engine":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",142651,2,"2026-04-06T23:34:12",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":73,"owner_website":76,"owner_url":78,"languages":79,"stars":92,"forks":93,"last_commit_at":94,"license":95,"difficulty_score":10,"env_os":96,"env_gpu":97,"env_ram":96,"env_deps":98,"category_tags":106,"github_topics":107,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":128,"updated_at":129,"faqs":130,"releases":159},4818,"opendilab\u002FDI-engine","DI-engine","OpenDILab Decision AI Engine. 
The Most Comprehensive Reinforcement Learning Framework B.P.","DI-engine 是由 OpenDILab 推出的决策智能引擎，旨在为强化学习（RL）领域提供一套全面且高效的开发框架。它主要解决了强化学习算法从理论研究到实际落地过程中面临的痛点，如环境适配复杂、算法复现困难以及分布式训练部署门槛高等问题，帮助用户将精力更集中于策略设计而非底层工程搭建。\n\n这款工具非常适合人工智能研究人员、算法工程师以及希望深入探索决策智能的开发者使用。无论是学术界的算法创新验证，还是工业界的游戏 AI、机器人控制等场景应用，DI-engine 都能提供强有力的支持。\n\n其核心技术亮点在于极高的算法覆盖率与灵活的架构设计。DI-engine 内置了数十种主流强化学习算法，并支持通过统一接口轻松切换；同时，它原生支持分布式训练，能够显著提升大规模任务的训练效率。此外，项目拥有完善的中文文档和社区支持，大幅降低了上手难度，让使用者能够更顺畅地构建、训练并部署自己的智能决策模型。","\u003Cdiv align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F\">\u003Cimg width=\"1000px\" height=\"auto\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_7389e3f44a0c.png\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n---\n\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl?style=social&url=https%3A%2F%2Ftwitter.com%2Fopendilab)](https:\u002F\u002Ftwitter.com\u002Fopendilab)\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002FDI-engine)](https:\u002F\u002Fpypi.org\u002Fproject\u002FDI-engine\u002F)\n![Conda](https:\u002F\u002Fanaconda.org\u002Fopendilab\u002Fdi-engine\u002Fbadges\u002Fversion.svg)\n![Conda update](https:\u002F\u002Fanaconda.org\u002Fopendilab\u002Fdi-engine\u002Fbadges\u002Flatest_release_date.svg)\n![PyPI - Python Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002FDI-engine)\n![PyTorch 
Version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdynamic\u002Fjson?color=blue&label=pytorch&query=%24.pytorchVersion&url=https%3A%2F%2Fgist.githubusercontent.com\u002FPaParaZz1\u002F54c5c44eeb94734e276b2ed5770eba8d\u002Fraw\u002F85b94a54933a9369f8843cc2cea3546152a75661\u002Fbadges.json)\n\n![Loc](https:\u002F\u002Fimg.shields.io\u002Fendpoint?url=https:\u002F\u002Fgist.githubusercontent.com\u002FHansBug\u002F3690cccd811e4c5f771075c2f785c7bb\u002Fraw\u002Floc.json)\n![Comments](https:\u002F\u002Fimg.shields.io\u002Fendpoint?url=https:\u002F\u002Fgist.githubusercontent.com\u002FHansBug\u002F3690cccd811e4c5f771075c2f785c7bb\u002Fraw\u002Fcomments.json)\n\n![Style](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Factions\u002Fworkflows\u002Fstyle.yml\u002Fbadge.svg)\n[![Read en Docs](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Factions\u002Fworkflows\u002Fdoc.yml\u002Fbadge.svg)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest)\n[![Read zh_CN Docs](https:\u002F\u002Fimg.shields.io\u002Freadthedocs\u002Fdi-engine-docs?label=%E4%B8%AD%E6%96%87%E6%96%87%E6%A1%A3)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest)\n![Unittest](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Factions\u002Fworkflows\u002Funit_test.yml\u002Fbadge.svg)\n![Algotest](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Factions\u002Fworkflows\u002Falgo_test.yml\u002Fbadge.svg)\n![deploy](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Factions\u002Fworkflows\u002Fdeploy.yml\u002Fbadge.svg)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fopendilab\u002FDI-engine\u002Fbranch\u002Fmain\u002Fgraph\u002Fbadge.svg?token=B0Q15JI301)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fopendilab\u002FDI-engine)\n\n![GitHub Org's stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopendilab)\n[![GitHub 
stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fstargazers)\n[![GitHub forks](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fnetwork)\n![GitHub commit activity](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fm\u002Fopendilab\u002FDI-engine)\n[![GitHub issues](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues)\n[![GitHub pulls](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues-pr\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fpulls)\n[![Contributors](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcontributors\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fgraphs\u002Fcontributors)\n[![GitHub license](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmaster\u002FLICENSE)\n[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity)\n[![Open in OpenXLab](https:\u002F\u002Fcdn-static.openxlab.org.cn\u002Fheader\u002Fopenxlab_models.svg)](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels?search=opendilab)\n[![discord badge](https:\u002F\u002Fdcbadge.vercel.app\u002Fapi\u002Fserver\u002FdkZS2JF56X?style=flat)](https:\u002F\u002Fdiscord.gg\u002FdkZS2JF56X)\n[![slack badge](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSlack-join-blueviolet?logo=slack&amp)](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fopendilab\u002Fshared_invite\u002Fzt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ)\n\n\u003Cdiv align=\"center\">\n  
\u003Ca href=\"https:\u002F\u002Fhellogithub.com\u002Frepository\u002F175c1e13739c4e429d0abf2b32ec583d\" target=\"_blank\">\n    \u003Cimg src=\"https:\u002F\u002Fapi.hellogithub.com\u002Fv1\u002Fwidgets\u002Frecommend.svg?rid=175c1e13739c4e429d0abf2b32ec583d&claim_uid=cExIpHuMKdTQ6BW\" alt=\"Featured｜HelloGitHub\" style=\"width: 250px; height: 54px;\" width=\"250\" height=\"54\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003Cbr>\n\nUpdated on 2024.12.23 DI-engine-v0.5.3\n\n## Introduction to DI-engine\n\n[Documentation](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F) | [中文文档](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F) | [Tutorials](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F01_quickstart\u002Findex.html) | [Feature](#feature) | [Task & Middleware](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F03_system\u002Findex.html) | [TreeTensor](#general-data-container-treetensor) | [Roadmap](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues\u002F548)\n\n**DI-engine** is a generalized decision intelligence engine for PyTorch and JAX.\n\nIt provides **python-first** and **asynchronous-native** task and middleware abstractions, and modularly integrates several of the most important decision-making concepts: Env, Policy and Model. 
Based on the above mechanisms, DI-engine supports **various [deep reinforcement learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F10_concepts\u002Findex.html) algorithms** with superior performance, high efficiency, well-organized [documentation](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F) and [unit tests](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Factions):\n\n- Most classic DRL algorithms, such as DQN, Rainbow, PPO, TD3, SAC, R2D2, IMPALA\n- Multi-agent RL algorithms, such as QMIX, WQMIX, MAPPO, HAPPO, ACE\n- Imitation learning algorithms (BC\u002FIRL\u002FGAIL), such as GAIL, SQIL, Guided Cost Learning, Implicit BC\n- Offline RL algorithms: BCQ, CQL, TD3BC, Decision Transformer, EDAC, Diffuser, Decision Diffuser, SO2\n- Model-based RL algorithms: SVG, STEVE, MBPO, DDPPO, DreamerV3\n- Exploration algorithms: HER, RND, ICM, NGU\n- LLM + RL algorithms: PPO-max, DPO, PromptPG, PromptAWR\n- Other algorithms, such as PER, PLR, PCGrad\n- MCTS + RL algorithms: AlphaZero, MuZero (see [LightZero](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLightZero))\n- Generative Model + RL algorithms: Diffusion-QL, QGPO, SRPO (see [GenerativeRL](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FGenerativeRL))\n\n\n**DI-engine** aims to **standardize different Decision Intelligence environments and applications**, supporting both academic research and prototype applications. 
Various training pipelines and customized decision AI applications are also supported:\n\n\u003Cdetails open>\n\u003Csummary>(Click to Collapse)\u003C\u002Fsummary>\n\n- Traditional academic environments\n  - [DI-zoo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine#environment-versatility): various decision intelligence demonstrations and benchmark environments with DI-engine.\n- Tutorial courses\n  - [PPOxFamily](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FPPOxFamily): PPO x Family DRL Tutorial Course\n- Real world decision AI applications\n  - [DI-star](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-star): Decision AI in StarCraftII\n  - [PsyDI](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FPsyDI): Towards a Multi-Modal and Interactive Chatbot for Psychological Assessments\n  - [DI-drive](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-drive): Auto-driving platform\n  - [DI-sheep](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-sheep): Decision AI in 3 Tiles Game\n  - [DI-smartcross](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-smartcross): Decision AI in Traffic Light Control\n  - [DI-bioseq](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-bioseq): Decision AI in Biological Sequence Prediction and Searching\n  - [DI-1024](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-1024): Deep Reinforcement Learning + 1024 Game\n- Research paper\n  - [InterFuser](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FInterFuser): [CoRL 2022] Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer\n  - [ACE](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FACE): [AAAI 2023] ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency\n  - [GoBigger](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FGoBigger): [ICLR 2023] Multi-Agent Decision Intelligence Environment\n  - [DOS](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDOS): [CVPR 2023] ReasonNet: End-to-End Driving with Temporal and Global 
Reasoning\n  - [LightZero](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLightZero): [NeurIPS 2023 Spotlight] A lightweight and efficient MCTS\u002FAlphaZero\u002FMuZero algorithm toolkit\n  - [SO2](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FSO2): [AAAI 2024] A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning\n  - [LMDrive](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive): [CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models\n  - [SmartRefine](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FSmartRefine): [CVPR 2024] SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction\n  - [ReZero](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLightZero): Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze\n  - [UniZero](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLightZero): Generalized and Efficient Planning with Scalable Latent World Models\n- Docs and Tutorials\n  - [DI-engine-docs](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine-docs): Tutorials, best practice and the API reference.\n  - [awesome-model-based-RL](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-model-based-RL): A curated list of awesome Model-Based RL resources\n  - [awesome-exploration-RL](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-exploration-rl): A curated list of awesome exploration RL resources\n  - [awesome-decision-transformer](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-decision-transformer): A curated list of Decision Transformer resources\n  - [awesome-RLHF](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-RLHF): A curated list of reinforcement learning with human feedback resources\n  - [awesome-multi-modal-reinforcement-learning](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-multi-modal-reinforcement-learning): A curated list of Multi-Modal Reinforcement Learning resources\n  - 
[awesome-diffusion-model-in-rl](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-diffusion-model-in-rl): A curated list of Diffusion Model in RL resources\n  - [awesome-ui-agents](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-ui-agents): A curated list of awesome UI agent resources, encompassing Web, App, OS, and beyond\n  - [awesome-AI-based-protein-design](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-AI-based-protein-design): A collection of research papers for AI-based protein design\n  - [awesome-end-to-end-autonomous-driving](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-end-to-end-autonomous-driving): A curated list of awesome End-to-End Autonomous Driving resources\n  - [awesome-driving-behavior-prediction](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-driving-behavior-prediction): A collection of research papers for Driving Behavior Prediction\n\n  \u003C\u002Fdetails>\n\nOn the low-level end, DI-engine comes with a set of highly reusable modules, including [RL optimization functions](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fding\u002Frl_utils), [PyTorch utilities](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fding\u002Ftorch_utils) and [auxiliary tools](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fding\u002Futils).\n\nIn addition, **DI-engine** provides dedicated **system optimizations and designs** for efficient and robust large-scale RL training:\n\n\u003Cdetails close>\n\u003Csummary>(Click for Details)\u003C\u002Fsummary>\n\n- [treevalue](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Ftreevalue): Tree-nested data structure\n- [DI-treetensor](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-treetensor): Tree-nested PyTorch tensor Lib\n- [DI-toolkit](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-toolkit): A simple toolkit package for decision intelligence\n- 
[DI-orchestrator](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-orchestrator): RL Kubernetes Custom Resource and Operator Lib\n- [DI-hpc](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-hpc): RL HPC OP Lib\n- [DI-store](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-store): RL Object Store\n\n\u003C\u002Fdetails>\n\nHave fun with exploration and exploitation.\n\n## Outline\n\n- [Introduction to DI-engine](#introduction-to-di-engine)\n- [Outline](#outline)\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [Feature](#feature)\n  - [Algorithm Versatility](#algorithm-versatility)\n  - [Environment Versatility](#environment-versatility)\n  - [General Data Container: TreeTensor](#general-data-container-treetensor)\n- [Feedback and Contribution](#feedback-and-contribution)\n- [Supporters](#supporters)\n  - [↳ Stargazers](#-stargazers)\n  - [↳ Forkers](#-forkers)\n- [Citation](#citation)\n- [License](#license)\n\n## Installation\n\nYou can install DI-engine from PyPI with the following command:\n\n```bash\npip install DI-engine\n```\n\nFor more installation details, see [installation](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F01_quickstart\u002Finstallation.html).\n\nOur Docker Hub repository can be found [here](https:\u002F\u002Fhub.docker.com\u002Frepository\u002Fdocker\u002Fopendilab\u002Fding); we provide a `base image` and `env image`s that bundle common RL environments.\n\n\u003Cdetails close>\n\u003Csummary>(Click for Details)\u003C\u002Fsummary>\n\n- base: opendilab\u002Fding:nightly\n- rpc: opendilab\u002Fding:nightly-rpc\n- atari: opendilab\u002Fding:nightly-atari\n- mujoco: opendilab\u002Fding:nightly-mujoco\n- dmc: opendilab\u002Fding:nightly-dmc2gym\n- metaworld: opendilab\u002Fding:nightly-metaworld\n- smac: opendilab\u002Fding:nightly-smac\n- grf: opendilab\u002Fding:nightly-grf\n- cityflow: opendilab\u002Fding:nightly-cityflow\n- evogym: opendilab\u002Fding:nightly-evogym\n- 
d4rl: opendilab\u002Fding:nightly-d4rl\n\n\u003C\u002Fdetails>\n\nThe detailed documentation is hosted on [doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F) | [中文文档](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F).\n\n## Quick Start\n\n[3-Minute Kickoff](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F01_quickstart\u002Ffirst_rl_program.html)\n\n[3-Minute Kickoff (colab)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1_7L-QFDfeCvMvLJzRyBRUW5_Q6ESXcZ4)\n\n[DI-engine Hugging Face Kickoff (colab)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1UH1GQOjcHrmNSaW77hnLGxFJrLSLwCOk)\n\n[How to migrate a new **RL Env**](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F11_dizoo\u002Findex.html) | [如何迁移一个新的**强化学习环境**](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F11_dizoo\u002Findex_zh.html)\n\n[How to customize the neural network model](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F04_best_practice\u002Fcustom_model.html) | [如何定制策略使用的**神经网络模型**](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F04_best_practice\u002Fcustom_model_zh.html)\n\n[Example of testing\u002Fdeploying an **RL policy**](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fdizoo\u002Fclassic_control\u002Fcartpole\u002Fentry\u002Fcartpole_c51_deploy.py)\n\n[Comparison of the old and new pipelines (in Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F04_best_practice\u002Fdiff_in_new_pipeline_zh.html)\n\n## Feature\n\n### Algorithm Versatility\n\n\u003Cdetails open>\n\u003Csummary>(Click to Collapse)\u003C\u002Fsummary>\n\n![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) &nbsp;means discrete action space; this label is only used for the standard DRL algorithms (1-23)\n\n![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) 
&nbsp;means continuous action space; this label is only used for the standard DRL algorithms (1-23)\n\n![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen) &nbsp;means hybrid (discrete + continuous) action space (1-23)\n\n![dist](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-distributed-blue) &nbsp;[Distributed Reinforcement Learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F02_algo\u002Fdistributed_rl.html)｜[分布式强化学习](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F02_algo\u002Fdistributed_rl_zh.html)\n\n![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) &nbsp;[Multi-Agent Reinforcement Learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F02_algo\u002Fmulti_agent_cooperation_rl.html)｜[多智能体强化学习](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F02_algo\u002Fmulti_agent_cooperation_rl_zh.html)\n\n![exp](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-exploration-orange) &nbsp;[Exploration Mechanisms in Reinforcement Learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F02_algo\u002Fexploration_rl.html)｜[强化学习中的探索机制](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F02_algo\u002Fexploration_rl_zh.html)\n\n![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple) &nbsp;[Imitation Learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F02_algo\u002Fimitation_learning.html)｜[模仿学习](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F02_algo\u002Fimitation_learning_zh.html)\n\n![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue) &nbsp;[Offline Reinforcement 
Learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F02_algo\u002Foffline_rl.html)｜[离线强化学习](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F02_algo\u002Foffline_rl_zh.html)\n\n![mbrl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-ModelBasedRL-lightblue) &nbsp;[Model-Based Reinforcement Learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F02_algo\u002Fmodel_based_rl.html)｜[基于模型的强化学习](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F02_algo\u002Fmodel_based_rl_zh.html)\n\n![other](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-other-lightgrey) &nbsp;means algorithms from other sub-directions, usually used as plug-ins within the whole pipeline\n\nP.S.: The `.py` files listed in the `Runnable Demo` column can be found in `dizoo`.\n\n\n| No. |                                                              Algorithm                                                              |                                                                                     Label                                                                                     |                                                                                                                                   Doc and Implementation                                                                                                                                   |                                      Runnable Demo                                      |\n| :-: | :---------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 
:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------: |\n|  1  |                             [DQN](https:\u002F\u002Fstorage.googleapis.com\u002Fdeepmind-media\u002Fdqn\u002FDQNNaturePaper.pdf)                             |                                                        ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                        |             [DQN doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fdqn.html)\u003Cbr>[DQN中文文档](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F12_policies\u002Fdqn_zh.html)\u003Cbr>[policy\u002Fdqn](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fdqn.py)             |     python3 -u cartpole_dqn_main.py \u002F ding -m serial -c cartpole_dqn_config.py -s 0     |\n|  2  |                                             [C51](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1707.06887.pdf)                                             |                                                        ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                        |                                                            [C51 doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fc51.html)\u003Cbr>[policy\u002Fc51](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fc51.py)                                                            |                      ding -m serial 
-c cartpole_c51_config.py -s 0                      |\n|  3  |                                            [QRDQN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.10044.pdf)                                            |                                                        ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                        |                                                        [QRDQN doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fqrdqn.html)\u003Cbr>[policy\u002Fqrdqn](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fqrdqn.py)                                                        |                     ding -m serial -c cartpole_qrdqn_config.py -s 0                     |\n|  4  |                                             [IQN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1806.06923.pdf)                                             |                                                        ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                        |                                                            [IQN doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fiqn.html)\u003Cbr>[policy\u002Fiqn](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fiqn.py)                                                            |                      ding -m serial -c cartpole_iqn_config.py -s 0                      |\n|  5  |                                             [FQF](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1911.02140.pdf)                                             |                                                        ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)       
                                                 |                                                            [FQF doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Ffqf.html)\u003Cbr>[policy\u002Ffqf](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Ffqf.py)                                                            |                      ding -m serial -c cartpole_fqf_config.py -s 0                      |\n|  6  |                                           [Rainbow](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.02298.pdf)                                           |                                                        ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                        |                                                    [Rainbow doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Frainbow.html)\u003Cbr>[policy\u002Frainbow](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Frainbow.py)                                                    |                    ding -m serial -c cartpole_rainbow_config.py -s 0                    |\n|  7  |                                             [SQL](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1702.08165.pdf)                                             |                          ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)                          |                                                            [SQL 
doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fsql.html)\u003Cbr>[policy\u002Fsql](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fsql.py)                                                            |                      ding -m serial -c cartpole_sql_config.py -s 0                      |\n|  8  |                                         [R2D2](https:\u002F\u002Fopenreview.net\u002Fforum?id=r1lyTjAqYX)                                         |                            ![dist](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-distributed-blue)![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                            |                                                          [R2D2 doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fr2d2.html)\u003Cbr>[policy\u002Fr2d2](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fr2d2.py)                                                          |                      ding -m serial -c cartpole_r2d2_config.py -s 0                      |\n|  9  |                   [PG](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F1999\u002Ffile\u002F464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf)                   |                                                        ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                        |                                                             [PG doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fa2c.html)\u003Cbr>[policy\u002Fpg](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fpg.py)                                                             |                       ding -m 
serial -c cartpole_pg_config.py -s 0                       |\n| 10 |                                            [PromptPG](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.14610)                                            |                                                        ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                        |                                                                                               [policy\u002Fprompt_pg](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fprompt_pg.py)                                                                                               |                   ding -m serial_onpolicy -c tabmwp_pg_config.py -s 0                   |\n| 11 |                                             [A2C](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1602.01783.pdf)                                             |                                                        ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                        |                                                            [A2C doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fa2c.html)\u003Cbr>[policy\u002Fa2c](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fa2c.py)                                                            |                      ding -m serial -c cartpole_a2c_config.py -s 0                      |\n| 12 |                        [PPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06347)\u002F[MAPPO](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.01955.pdf)                        | 
![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) |                                                            [PPO doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fppo.html)\u003Cbr>[policy\u002Fppo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fppo.py)                                                            | python3 -u cartpole_ppo_main.py \u002F ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0 |\n| 13 |                                             [PPG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2009.04416.pdf)                                             |                                                        ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                        |                                                            [PPG doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fppg.html)\u003Cbr>[policy\u002Fppg](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fppg.py)                                                            |                             python3 -u cartpole_ppg_main.py                             |\n| 14 |                                            [ACER](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1611.01224.pdf)                                            |                          ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)                          |                                                          [ACER 
doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Facer.html)\u003Cbr>[policy\u002Facer](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Facer.py)                                                          |                      ding -m serial -c cartpole_acer_config.py -s 0                      |\n| 15 |                                             [IMPALA](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.01561)                                             |                            ![dist](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-distributed-blue)![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                            |                                                      [IMPALA doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fimpala.html)\u003Cbr>[policy\u002Fimpala](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fimpala.py)                                                      |                     ding -m serial -c cartpole_impala_config.py -s 0                     |\n| 16 |                     [DDPG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1509.02971.pdf)\u002F[PADDPG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.04143.pdf)                     |                             ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen)                             |                                                          [DDPG doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fddpg.html)\u003Cbr>[policy\u002Fddpg](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fddpg.py)                                                    
      |                      ding -m serial -c pendulum_ddpg_config.py -s 0                      |\n| 17 |                                             [TD3](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.09477.pdf)                                             |                             ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen)                             |                                                            [TD3 doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Ftd3.html)\u003Cbr>[policy\u002Ftd3](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Ftd3.py)                                                            |     python3 -u pendulum_td3_main.py \u002F ding -m serial -c pendulum_td3_config.py -s 0     |\n| 18 |                                            [D4PG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1804.08617.pdf)                                            |                                                         ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)                                                         |                                                          [D4PG doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fd4pg.html)\u003Cbr>[policy\u002Fd4pg](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fd4pg.py)                                                          |                            python3 -u pendulum_d4pg_config.py                            |\n| 19 |                                           [SAC](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.01290)\u002F[MASAC]                                           | 
![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) |                                                            [SAC doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fsac.html)\u003Cbr>[policy\u002Fsac](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fsac.py)                                                            |                      ding -m serial -c pendulum_sac_config.py -s 0                      |\n| 20 |                                            [PDQN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.06394.pdf)                                            |                                                           ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen)                                                           |                                                                                                    [policy\u002Fpdqn](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fpdqn.py)                                                                                                    |                     ding -m serial -c gym_hybrid_pdqn_config.py -s 0                     |\n| 21 |                                            [MPDQN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.04388.pdf)                                            |                                                           ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen)                                                           |                                                                                                    
[policy\u002Fpdqn](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fpdqn.py)                                                                                                    |                    ding -m serial -c gym_hybrid_mpdqn_config.py -s 0                    |\n| 22 |                                            [HPPO](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1903.01344.pdf)                                            |                                                           ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen)                                                           |                                                                                                     [policy\u002Fppo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fppo.py)                                                                                                     |                ding -m serial_onpolicy -c gym_hybrid_hppo_config.py -s 0                |\n| 23 |                                             [BDQ](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.08946.pdf)                                             |                                                           ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen)                                                           |                                                                                                     [policy\u002Fbdq](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fdqn.py)                                                                                                     |                             python3 -u hopper_bdq_config.py                             |\n| 24 |                                              
[MDQN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.14430)                                              |                                                        ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                        |                                                                                                    [policy\u002Fmdqn](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fmdqn.py)                                                                                                    |                            python3 -u asterix_mdqn_config.py                            |\n| 25 |                                            [QMIX](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.11485.pdf)                                            |                                                              ![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)                                                              |                                                          [QMIX doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fqmix.html)\u003Cbr>[policy\u002Fqmix](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fqmix.py)                                                          |                     ding -m serial -c smac_3s5z_qmix_config.py -s 0                     |\n| 26 |                                            [COMA](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1705.08926.pdf)                                            |                                                              ![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)                                                              |                                                          [COMA 
doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fcoma.html)\u003Cbr>[policy\u002Fcoma](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fcoma.py)                                                          |                     ding -m serial -c smac_3s5z_coma_config.py -s 0                     |\n| 27 |                                              [QTran](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.05408)                                              |                                                              ![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)                                                              |                                                                                                   [policy\u002Fqtran](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fqtran.py)                                                                                                   |                     ding -m serial -c smac_3s5z_qtran_config.py -s 0                     |\n| 28 |                                              [WQMIX](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.10800)                                              |                                                              ![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)                                                              |                                                        [WQMIX doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fwqmix.html)\u003Cbr>[policy\u002Fwqmix](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fwqmix.py)                                                        |                     ding -m serial -c smac_3s5z_wqmix_config.py -s 0                     |\n| 
29 |                                           [CollaQ](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.08531.pdf)                                           |                                                              ![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)                                                              |                                                      [CollaQ doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fcollaq.html)\u003Cbr>[policy\u002Fcollaq](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fcollaq.py)                                                      |                    ding -m serial -c smac_3s5z_collaq_config.py -s 0                    |\n| 30 |                                           [MADDPG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1706.02275.pdf)                                           |                                                              ![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)                                                              |                                                         [MADDPG doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fddpg.html)\u003Cbr>[policy\u002Fddpg](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fddpg.py)                                                         |                ding -m serial -c ptz_simple_spread_maddpg_config.py -s 0                |\n| 31 |                                            [GAIL](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1606.03476.pdf)                                            |                                                                ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple)                                                                |                
                               [GAIL doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fgail.html)\u003Cbr>[reward_model\u002Fgail](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Freward_model\u002Fgail_irl_model.py)                                               |                 ding -m serial_gail -c cartpole_dqn_gail_config.py -s 0                 |\n| 32 |                                            [SQIL](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.11108.pdf)                                            |                                                                ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple)                                                                |                                                    [SQIL doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fsqil.html)\u003Cbr>[entry\u002Fsqil](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fentry\u002Fserial_entry_sqil.py)                                                    |                   ding -m serial_sqil -c cartpole_sqil_config.py -s 0                   |\n| 33 |                                            [DQFD](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1704.03732.pdf)                                            |                                                                ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple)                                                                |                                                          [DQFD doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fdqfd.html)\u003Cbr>[policy\u002Fdqfd](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fdqfd.py)                                                          |               
    ding -m serial_dqfd -c cartpole_dqfd_config.py -s 0                   |\n| 34 |                                            [R2D3](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.01387.pdf)                                            |                                                                ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple)                                                                |       [R2D3 doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fr2d3.html)\u003Cbr>[R2D3 doc (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F12_policies\u002Fr2d3_zh.html)\u003Cbr>[policy\u002Fr2d3](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fr2d3.py)       |                        python3 -u pong_r2d3_r2d2expert_config.py                        |\n| 35 |                                    [Guided Cost Learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1603.00448.pdf)                                    |                                                                ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple)                                                                |                      [Guided Cost Learning doc (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F12_policies\u002Fguided_cost_zh.html)\u003Cbr>[reward_model\u002Fguided_cost](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Freward_model\u002Fguided_cost_reward_model.py)                      |                            python3 lunarlander_gcl_config.py                            |\n| 36 |                                              [TREX](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.06387)                                              |                                                                
![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple)                                                                |                                             [TREX doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Ftrex.html)\u003Cbr>[reward_model\u002Ftrex](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Freward_model\u002Ftrex_reward_model.py)                                             |                               python3 mujoco_trex_main.py                               |\n| 37 |                               [Implicit Behavioral Cloning](https:\u002F\u002Fimplicitbc.github.io\u002F) (DFO+MCMC)                               |                                                                ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple)                                                                |                                                  [policy\u002Fibc](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fibc.py) \u003Cbr> [model\u002Ftemplate\u002Febm](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fmodel\u002Ftemplate\u002Febm.py)                                                  |              python3 d4rl_ibc_main.py -s 0 -c pen_human_ibc_mcmc_config.py              |\n| 38 |                                             [BCO](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1805.01954.pdf)                                             |                                                                ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple)                                                                |                                                                                                
[entry\u002Fbco](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fentry\u002Fserial_entry_bco.py)                                                                                                |                            python3 -u cartpole_bco_config.py                            |\n| 39 |                                             [HER](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1707.01495.pdf)                                             |                                                           ![exp](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-exploration-orange)                                                           |                                               [HER doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fher.html)\u003Cbr>[reward_model\u002Fher](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Freward_model\u002Fher_reward_model.py)                                               |                              python3 -u bitflip_her_dqn.py                              |\n| 40 |                                               [RND](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.12894)                                               |                                                           ![exp](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-exploration-orange)                                                           |                                               [RND doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Frnd.html)\u003Cbr>[reward_model\u002Frnd](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Freward_model\u002Frnd_reward_model.py)                                               |                         python3 -u cartpole_rnd_onppo_config.py                         |\n| 41 |                               
              [ICM](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1705.05363.pdf)                                             |                                                           ![exp](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-exploration-orange)                                                           | [ICM doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Ficm.html)\u003Cbr>[ICM doc (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F12_policies\u002Ficm_zh.html)\u003Cbr>[reward_model\u002Ficm](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Freward_model\u002Ficm_reward_model.py) |                          python3 -u cartpole_ppo_icm_config.py                          |\n| 42 |                                             [CQL](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.04779.pdf)                                             |                                                         ![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue)                                                         |                                                            [CQL doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fcql.html)\u003Cbr>[policy\u002Fcql](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fcql.py)                                                            |                               python3 -u d4rl_cql_main.py                               |\n| 43 |                                            [TD3BC](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.06860.pdf)                                            |                                                         ![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue)                                                         |                 
                                     [TD3BC doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Ftd3_bc.html)\u003Cbr>[policy\u002Ftd3_bc](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Ftd3_bc.py)                                                      |                              python3 -u d4rl_td3_bc_main.py                              |\n| 44 |                                    [Decision Transformer](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.01345.pdf)                                    |                                                         ![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue)                                                         |                                                                                                      [policy\u002Fdt](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fdt.py)                                                                                                      |                               python3 -u d4rl_dt_mujoco.py                               |\n| 45 |                                            [EDAC](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.01548.pdf)                                            |                                                         ![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue)                                                         |                                                          [EDAC doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fedac.html)\u003Cbr>[policy\u002Fedac](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fedac.py)                                                          |                               python3 -u 
d4rl_edac_main.py                               |\n| 46 |                                            [QGPO](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.12824.pdf)                                            |                                                         ![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue)                                                         |                                                          [QGPO doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fqgpo.html)\u003Cbr>[policy\u002Fqgpo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fqgpo.py)                                                          |                             python3 -u ding\u002Fexample\u002Fqgpo.py                             |\n| 47 |   MBSAC([SAC](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.01290)+[MVE](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.00101)+[SVG](https:\u002F\u002Farxiv.org\u002Fabs\u002F1510.09142))   |                           ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)![mbrl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-ModelBasedRL-lightblue)                           |                                                                                          [policy\u002Fmbpolicy\u002Fmbsac](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fmbpolicy\u002Fmbsac.py)                                                                                          |   python3 -u pendulum_mbsac_mbpo_config.py \u002F python3 -u pendulum_mbsac_ddppo_config.py   |\n| 48 | STEVESAC([SAC](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.01290)+[STEVE](https:\u002F\u002Farxiv.org\u002Fabs\u002F1807.01675)+[SVG](https:\u002F\u002Farxiv.org\u002Fabs\u002F1510.09142)) |                           
![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)![mbrl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-ModelBasedRL-lightblue)                           |                                                                                          [policy\u002Fmbpolicy\u002Fmbsac](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fmbpolicy\u002Fmbsac.py)                                                                                          |                       python3 -u pendulum_stevesac_mbpo_config.py                       |\n| 49 |                                            [MBPO](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.08253.pdf)                                            |                                                         ![mbrl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-ModelBasedRL-lightblue)                                                         |                                                     [MBPO doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fmbpo.html)\u003Cbr>[world_model\u002Fmbpo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fworld_model\u002Fmbpo.py)                                                     |                          python3 -u pendulum_sac_mbpo_config.py                          |\n| 50 |                                        [DDPPO](https:\u002F\u002Fopenreview.net\u002Fforum?id=rzvOQrnclO0)                                        |                                                         ![mbrl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-ModelBasedRL-lightblue)                                                         |                                                                                              
[world_model\u002Fddppo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fworld_model\u002Fddppo.py)                                                                                              |                        python3 -u pendulum_mbsac_ddppo_config.py                        |\n| 51 |                                          [DreamerV3](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2301.04104.pdf)                                          |                                                         ![mbrl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-ModelBasedRL-lightblue)                                                         |                                                                                          [world_model\u002Fdreamerv3](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fworld_model\u002Fdreamerv3.py)                                                                                          |                      python3 -u cartpole_balance_dreamer_config.py                      |\n| 52 |                                             [PER](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.05952.pdf)                                             |                                                            ![other](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-other-lightgrey)                                                            |                                                                                   [worker\u002Freplay_buffer](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fworker\u002Freplay_buffer\u002Fadvanced_buffer.py)                                                                                   |                                      `rainbow demo`                                      |\n| 53 |                                             
[GAE](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1506.02438.pdf)                                             |                                                            ![other](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-other-lightgrey)                                                            |                                                                                                   [rl_utils\u002Fgae](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Frl_utils\u002Fgae.py)                                                                                                   |                                        `ppo demo`                                        |\n| 54 |                                           [ST-DIM](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.08226.pdf)                                           |                                                            ![other](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-other-lightgrey)                                                            |                                                                              [torch_utils\u002Floss\u002Fcontrastive_loss](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Ftorch_utils\u002Floss\u002Fcontrastive_loss.py)                                                                              |                   ding -m serial -c cartpole_dqn_stdim_config.py -s 0                   |\n| 55 |                                             [PLR](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.03934.pdf)                                             |                                                            ![other](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-other-lightgrey)                                                            |                                       [PLR 
doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fplr.html)\u003Cbr>[data\u002Flevel_replay\u002Flevel_sampler](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fdata\u002Flevel_replay\u002Flevel_sampler.py) | python3 -u bigfish_plr_config.py -s 0 |\n| 56 | [PCGrad](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2001.06782.pdf) | ![other](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-other-lightgrey) | [torch_utils\u002Foptimizer_helper\u002FPCGrad](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Ftorch_utils\u002Foptimizer_helper.py) | python3 -u multi_mnist_pcgrad_main.py -s 0 |\n| 57 | [AWR](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.00177) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [policy\u002Fprompt_awr](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fprompt_awr.py) | python3 -u tabmwp_awr_config.py 
|\n\n\u003C\u002Fdetails>\n\n### Environment Versatility\n\n\u003Cdetails open>\n\u003Csummary>(Click to Collapse)\u003C\u002Fsummary>\n\n| No |                                          Environment                                          |                                                                                                                   Label                                                                                                                   |                                             Visualization                                             |                                                                                                                                     Code and Doc Links                                                                                                                                     |\n| :-: | :--------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |\n| 1 |               [Atari](https:\u002F\u002Fale.farama.org)               |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                                  
![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_deb7d17553ec.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fatari\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fatari.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fatari_zh.html) |\n| 2 | [box2d\u002Fbipedalwalker](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Ftree\u002Fmaster\u002Fgym\u002Fenvs\u002Fbox2d) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_03b074361224.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fbox2d\u002Fbipedalwalker\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fbipedalwalker.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fbipedalwalker_zh.html) |\n| 3 | [box2d\u002Flunarlander](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Ftree\u002Fmaster\u002Fgym\u002Fenvs\u002Fbox2d) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | 
![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_8b57ca01adf3.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fbox2d\u002Flunarlander\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Flunarlander.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Flunarlander_zh.html) |\n| 4 | [classic_control\u002Fcartpole](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Ftree\u002Fmaster\u002Fgym\u002Fenvs\u002Fclassic_control) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_d27e13293bcb.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fclassic_control\u002Fcartpole\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fcartpole.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fcartpole_zh.html) |\n| 5 | [classic_control\u002Fpendulum](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Ftree\u002Fmaster\u002Fgym\u002Fenvs\u002Fclassic_control) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | 
![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_b95ef8c9973c.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fclassic_control\u002Fpendulum\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fpendulum.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fpendulum_zh.html) |\n| 6 | [competitive_rl](https:\u002F\u002Fgithub.com\u002Fcuhkrlcourse\u002Fcompetitive-rl) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) ![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_57ab0e56c25d.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fcompetitive_rl)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fcompetitive_rl_zh.html) |\n| 7 | [gfootball](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Ffootball) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![sparse](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-sparse%20reward-orange)![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue) | 
![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_cec004ff6c1a.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fgfootball\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fgfootball.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fgfootball_zh.html) |\n| 8 | [minigrid](https:\u002F\u002Fgithub.com\u002Fmaximecb\u002Fgym-minigrid) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![sparse](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-sparse%20reward-orange) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_bb708e1ba572.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fminigrid\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fminigrid.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fminigrid_zh.html) |\n| 9 | [MuJoCo](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Ftree\u002Fmaster\u002Fgym\u002Fenvs\u002Fmujoco) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | 
![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_14e09becfe59.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fmujoco\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fmujoco.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fmujoco_zh.html) |\n| 10 | [PettingZoo](https:\u002F\u002Fgithub.com\u002FFarama-Foundation\u002FPettingZoo) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) ![marl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_b5ae00e38469.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fpetting_zoo\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fpettingzoo.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fpettingzoo_zh.html) |\n| 11 | [overcooked](https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fovercooked-demo) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) ![marl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | 
![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_2b0dab67310c.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fovercooked\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fovercooked.html) |\n| 12 | [procgen](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fprocgen) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_80b380d05c7e.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fprocgen)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fprocgen.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fprocgen_zh.html) |\n| 13 | [pybullet](https:\u002F\u002Fgithub.com\u002Fbenelot\u002Fpybullet-gym) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_e2c5b7f67b04.gif) 
| [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fpybullet\u002Fenvs)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fpybullet_zh.html) |\n| 14 | [smac](https:\u002F\u002Fgithub.com\u002Foxwhirl\u002Fsmac) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) ![marl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue)![sparse](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-sparse%20reward-orange) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_67a3c84bdb89.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fsmac\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fsmac.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fsmac_zh.html) |\n| 15 | [d4rl](https:\u002F\u002Fgithub.com\u002Frail-berkeley\u002Fd4rl) | ![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue) | ![ori](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_efcc51cf498f.gif) | 
[dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fd4rl)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fd4rl_zh.html) |\n| 16 | league_demo | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) ![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_64b8c7559165.png) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fleague_demo\u002Fenvs) |\n| 17 | pomdp atari | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fpomdp\u002Fenvs) |\n| 18 | 
[bsuite](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fbsuite) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_e2ab4ca22cb6.png) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fbsuite\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fbsuite.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fbsuite_zh.html) |\n| 19 | [ImageNet](https:\u002F\u002Fwww.image-net.org\u002F) | ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL\u002FSL-purple) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_e1af270145fe.png) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fimage_classification)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fimage_cls_zh.html) |\n| 20 | [slime_volleyball](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym) | 
![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue) | ![ori](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_dea895057c05.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fslime_volley)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fslime_volleyball.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fslime_volleyball_zh.html) |\n| 21 | [gym_hybrid](https:\u002F\u002Fgithub.com\u002Fthomashirtz\u002Fgym-hybrid) | ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen) | ![ori](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_7b04dc2eb83a.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fgym_hybrid)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fgym_hybrid.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fgym_hybrid_zh.html) |\n| 22 | [GoBigger](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FGoBigger) | 
![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen)![marl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue) | ![ori](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_8f0aa108697e.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FGoBigger-Challenge-2021\u002Ftree\u002Fmain\u002Fdi_baseline)\u003Cbr>[env tutorial](https:\u002F\u002Fgobigger.readthedocs.io\u002Fen\u002Flatest\u002Findex.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fgobigger.readthedocs.io\u002Fzh_CN\u002Flatest\u002F) |\n| 23 | [gym_soccer](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym-soccer) | ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen) | ![ori](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_423205f017d0.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fgym_soccer)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fgym_soccer_zh.html) |\n| 24 | [multiagent_mujoco](https:\u002F\u002Fgithub.com\u002Fschroederdewitt\u002Fmultiagent_mujoco) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) 
![marl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_14e09becfe59.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fmultiagent_mujoco\u002Fenvs)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fmujoco_zh.html) |\n| 25 | bitflip | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) ![sparse](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-sparse%20reward-orange) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_eb4166449771.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fbitflip\u002Fenvs)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fbitflip_zh.html) |\n| 26 | [sokoban](https:\u002F\u002Fgithub.com\u002FmpSchrader\u002Fgym-sokoban) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | ![Game 
2](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_e974934be4b7.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fsokoban\u002Fenvs)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fsokoban.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fsokoban_zh.html) |\n| 27 | [gym_anytrading](https:\u002F\u002Fgithub.com\u002FAminHP\u002Fgym-anytrading) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_cc418c586002.png) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fgym_anytrading)\u003Cbr>[env tutorial](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fdizoo\u002Fgym_anytrading\u002Fenvs\u002FREADME.md) |\n| 28 | [mario](https:\u002F\u002Fgithub.com\u002FKautenja\u002Fgym-super-mario-bros) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_7501674b0387.gif) | [dizoo 
link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fmario)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fgym_super_mario_bros.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fgym_super_mario_bros_zh.html) |\n| 29 | [dmc2gym](https:\u002F\u002Fgithub.com\u002Fdenisyarats\u002Fdmc2gym) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_984ad0c53fac.png) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fdmc2gym)\u003Cbr>[env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fdmc2gym.html)\u003Cbr>[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fdmc2gym_zh.html) |\n| 30 | [evogym](https:\u002F\u002Fgithub.com\u002FEvolutionGym\u002Fevogym) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_2c1d64c758c5.gif) | [dizoo 
link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fevogym\u002Fenvs) \u003Cbr> [env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fevogym.html) \u003Cbr> [环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002FEvogym_zh.html)            |\n| 31 |             [gym-pybullet-drones](https:\u002F\u002Fgithub.com\u002FutiasDSL\u002Fgym-pybullet-drones)             |                                                                                       ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)                                                                                       |                    ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_1a3c7b00a823.gif)                    |                                                                                          [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fgym_pybullet_drones\u002Fenvs)\u003Cbr>环境指南                                                                                          |\n| 32 |                 [beergame](https:\u002F\u002Fgithub.com\u002FOptMLGroup\u002FDeepBeerInventory-RL)                 |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                               ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_91bae50e9303.png)                               |                                                                                               [dizoo 
link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fbeergame\u002Fenvs)\u003Cbr>环境指南                                                                                               |\n| 33 | [classic_control\u002Facrobot](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Ftree\u002Fmaster\u002Fgym\u002Fenvs\u002Fclassic_control) |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                        ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_b9ca34ae76c8.gif)                        |                                                [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fclassic_control\u002Facrobot\u002Fenvs)\u003Cbr> [环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Facrobot_zh.html)                                                |\n| 34 |   [box2d\u002Fcar_racing](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Fblob\u002Fmaster\u002Fgym\u002Fenvs\u002Fbox2d\u002Fcar_racing.py)   |                                                     ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) \u003Cbr> ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)                                                     |                          ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_f4293ae4ad58.gif)                          |                                                                                            [dizoo 
link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fbox2d\u002Fcarracing\u002Fenvs)\u003Cbr>环境指南                                                                                            |\n| 35 |                     [metadrive](https:\u002F\u002Fgithub.com\u002Fmetadriverse\u002Fmetadrive)                     |                                                                                       ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)                                                                                       |                            ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_539637faf23e.gif)                            |                                                       [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fmetadrive\u002Fenv)\u003Cbr> [环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fmetadrive_zh.html)                                                       |\n| 36 |  [cliffwalking](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Fblob\u002Fmaster\u002Fgym\u002Fenvs\u002Ftoy_text\u002Fcliffwalking.py)  |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                          ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_7ddd9c04384f.gif)                          |                                                                                    [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fcliffwalking\u002Fenvs)\u003Cbr> env tutorial \u003Cbr> 环境指南                            
                                                        |\n| 37 |                       [tabmwp](https:\u002F\u002Fpromptpg.github.io\u002Fexplore.html)                       |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                                ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_059f0b09627e.jpeg)                                |                                                                                         [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Ftabmwp) \u003Cbr> env tutorial \u003Cbr> 环境指南                                                                                         |\n| 38 |            [frozen_lake](https:\u002F\u002Fgymnasium.farama.org\u002Fenvironments\u002Ftoy_text\u002Ffrozen_lake)      |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                                ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_aa3ca9c8615f.gif)                        |                                                                                         [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Ffrozen_lake) \u003Cbr> env tutorial \u003Cbr> 环境指南                                                                                         |\n| 39 | [ising_model](https:\u002F\u002Fgithub.com\u002Fmlii\u002Fmfrl\u002Ftree\u002Fmaster\u002Fexamples\u002Fising_model)                  |                           
 ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) ![marl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)                                                                                             |                                ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_2dbb615e9aae.gif)                           | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fising_env) \u003Cbr> env tutorial \u003Cbr> [环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fising_model_zh.html) |\n| 40 | [taxi](https:\u002F\u002Fwww.gymlibrary.dev\u002Fenvironments\u002Ftoy_text\u002Ftaxi\u002F)                  |                            ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                 |                                ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_46da3c7299e9.gif)                           | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Ftaxi\u002Fenvs) \u003Cbr> [env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Ftaxi.html) \u003Cbr> [环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh-cn\u002Flatest\u002F13_envs\u002Ftaxi_zh.html) |\n\n\n\n![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) means discrete action space\n\n![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) means continuous action space\n\n![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen) means hybrid (discrete + continuous) action space\n\n![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) means multi-agent 
RL environment\n\n![sparse](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-sparse%20reward-orange) means environment related to exploration and sparse reward\n\n![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue) means offline RL environment\n\n![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL\u002FSL-purple) means Imitation Learning or Supervised Learning dataset\n\n![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue) means environment that allows agent-vs-agent battles\n\nP.S. Some Atari environments, such as **MontezumaRevenge**, also have sparse rewards.\n\n\u003C\u002Fdetails>\n\n### General Data Container: TreeTensor\n\nDI-engine uses [TreeTensor](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-treetensor) as the basic data container across its components. It is easy to use and keeps data handling consistent across code modules such as environment definition, data processing and DRL optimization. Here are some concrete code examples:\n\n- TreeTensor can easily extend all the operations of `torch.Tensor` to nested data:\n\n  \u003Cdetails close>\n  \u003Csummary>(Click for Details)\u003C\u002Fsummary>\n\n  ```python\n  import treetensor.torch as ttorch\n\n\n  # create random tensor\n  data = ttorch.randn({'a': (3, 2), 'b': {'c': (3, )}})\n  # clone+detach tensor\n  data_clone = data.clone().detach()\n  # access tree structure like attribute\n  a = data.a\n  c = data.b.c\n  # stack\u002Fcat\u002Fsplit\n  stacked_data = ttorch.stack([data, data_clone], 0)\n  cat_data = ttorch.cat([data, data_clone], 0)\n  data, data_clone = ttorch.split(stacked_data, 1)\n  # reshape\n  data = data.unsqueeze(-1)\n  data = data.squeeze(-1)\n  flatten_data = data.view(-1)\n  # indexing\n  data_0 = data[0]\n  data_1to2 = data[1:2]\n  # execute math calculations\n  data = data.sin()\n  data.b.c.cos_().clamp_(-1, 1)\n  data += data ** 2\n  # backward\n  data.requires_grad_(True)\n  loss =
data.arctan().mean()\n  loss.backward()\n  # print shape\n  print(data.shape)\n  # result\n  # \u003CSize 0x7fbd3346ddc0>\n  # ├── 'a' --> torch.Size([1, 3, 2])\n  # └── 'b' --> \u003CSize 0x7fbd3346dd00>\n  #     └── 'c' --> torch.Size([1, 3])\n  ```\n\n  \u003C\u002Fdetails>\n- TreeTensor makes it simple and effective to implement a classic deep reinforcement learning pipeline:\n\n  \u003Cdetails close>\n  \u003Csummary>(Click for Details)\u003C\u002Fsummary>\n\n  ```diff\n  import torch\n  import treetensor.torch as ttorch\n\n  B = 4\n\n\n  def get_item():\n      return {\n          'obs': {\n              'scalar': torch.randn(12),\n              'image': torch.randn(3, 32, 32),\n          },\n          'action': torch.randint(0, 10, size=(1,)),\n          'reward': torch.rand(1),\n          'done': False,\n      }\n\n\n  data = [get_item() for _ in range(B)]\n\n\n  # execute `stack` op\n  - def stack(data, dim):\n  -     elem = data[0]\n  -     if isinstance(elem, torch.Tensor):\n  -         return torch.stack(data, dim)\n  -     elif isinstance(elem, dict):\n  -         return {k: stack([item[k] for item in data], dim) for k in elem.keys()}\n  -     elif isinstance(elem, bool):\n  -         return torch.BoolTensor(data)\n  -     else:\n  -         raise TypeError(\"not support elem type: {}\".format(type(elem)))\n  - stacked_data = stack(data, dim=0)\n  + data = [ttorch.tensor(d) for d in data]\n  + stacked_data = ttorch.stack(data, dim=0)\n\n  # validate\n  - assert stacked_data['obs']['image'].shape == (B, 3, 32, 32)\n  - assert stacked_data['action'].shape == (B, 1)\n  - assert stacked_data['reward'].shape == (B, 1)\n  - assert stacked_data['done'].shape == (B,)\n  - assert stacked_data['done'].dtype == torch.bool\n  + assert stacked_data.obs.image.shape == (B, 3, 32, 32)\n  + assert stacked_data.action.shape == (B, 1)\n  + assert stacked_data.reward.shape == (B, 1)\n  + assert stacked_data.done.shape == (B,)\n  + assert stacked_data.done.dtype ==
torch.bool\n  ```\n\n  \u003C\u002Fdetails>\n\n## Feedback and Contribution\n\n- [File an issue](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues\u002Fnew\u002Fchoose) on GitHub\n- Open or participate in our [forum](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fdiscussions)\n- Discuss on the DI-engine [Discord server](https:\u002F\u002Fdiscord.gg\u002FdkZS2JF56X)\n- Discuss on the DI-engine [Slack communication channel](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fopendilab\u002Fshared_invite\u002Fzt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ)\n- Discuss in DI-engine's WeChat group (add us on WeChat: ding314assist)\n\n  \u003Cimg src=https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fassets\u002Fwechat.jpeg width=35% \u002F>\n- Contact us by email (opendilab@pjlab.org.cn)\n- Contribute to our future plans on the [Roadmap](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues\u002F548)\n\nWe appreciate all feedback and contributions that help improve DI-engine, in both algorithms and system design.
`CONTRIBUTING.md` provides the necessary information for contributors.\n\n## Supporters\n\n### &#8627; Stargazers\n\n[![Stargazers repo roster for @opendilab\u002FDI-engine](https:\u002F\u002Freporoster.com\u002Fstars\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fstargazers)\n\n### &#8627; Forkers\n\n[![Forkers repo roster for @opendilab\u002FDI-engine](https:\u002F\u002Freporoster.com\u002Fforks\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fnetwork\u002Fmembers)\n\n## Citation\n\n```bibtex\n@misc{ding,\n    title={DI-engine: A Universal AI System\u002FEngine for Decision Intelligence},\n    author={Niu, Yazhe and Xu, Jingxin and Pu, Yuan and Nie, Yunpeng and Zhang, Jinouwen and Hu, Shuai and Zhao, Liangxuan and Zhang, Ming and Liu, Yu},\n    publisher={GitHub},\n    howpublished={\\url{https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine}},\n    year={2021},\n}\n```\n\n## License\n\nDI-engine is released under the Apache 2.0 license.\n","\u003Cdiv align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F\">\u003Cimg width=\"1000px\" height=\"auto\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_7389e3f44a0c.png\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n---\n\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Furl?style=social&url=https%3A%2F%2Ftwitter.com%2Fopendilab)](https:\u002F\u002Ftwitter.com\u002Fopendilab)\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002FDI-engine)](https:\u002F\u002Fpypi.org\u002Fproject\u002FDI-engine\u002F)\n![Conda](https:\u002F\u002Fanaconda.org\u002Fopendilab\u002Fdi-engine\u002Fbadges\u002Fversion.svg)\n![Conda update](https:\u002F\u002Fanaconda.org\u002Fopendilab\u002Fdi-engine\u002Fbadges\u002Flatest_release_date.svg)\n![PyPI - Python Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002FDI-engine)\n![PyTorch
Version](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdynamic\u002Fjson?color=blue&label=pytorch&query=%24.pytorchVersion&url=https%3A%2F%2Fgist.githubusercontent.com\u002FPaParaZz1\u002F54c5c44eeb94734e276b2ed5770eba8d\u002Fraw\u002F85b94a54933a9369f8843cc2cea3546152a75661\u002Fbadges.json)\n\n![Loc](https:\u002F\u002Fimg.shields.io\u002Fendpoint?url=https:\u002F\u002Fgist.githubusercontent.com\u002FHansBug\u002F3690cccd811e4c5f771075c2f785c7bb\u002Fraw\u002Floc.json)\n![Comments](https:\u002F\u002Fimg.shields.io\u002Fendpoint?url=https:\u002F\u002Fgist.githubusercontent.com\u002FHansBug\u002F3690cccd811e4c5f771075c2f785c7bb\u002Fraw\u002Fcomments.json)\n\n![Style](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Factions\u002Fworkflows\u002Fstyle.yml\u002Fbadge.svg)\n[![Read en Docs](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Factions\u002Fworkflows\u002Fdoc.yml\u002Fbadge.svg)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest)\n[![Read zh_CN Docs](https:\u002F\u002Fimg.shields.io\u002Freadthedocs\u002Fdi-engine-docs?label=%E4%B8%AD%E6%96%87%E6%96%87%E6%A1%A3)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest)\n![Unittest](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Factions\u002Fworkflows\u002Funit_test.yml\u002Fbadge.svg)\n![Algotest](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Factions\u002Fworkflows\u002Falgo_test.yml\u002Fbadge.svg)\n![deploy](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Factions\u002Fworkflows\u002Fdeploy.yml\u002Fbadge.svg)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fopendilab\u002FDI-engine\u002Fbranch\u002Fmain\u002Fgraph\u002Fbadge.svg?token=B0Q15JI301)](https:\u002F\u002Fcodecov.io\u002Fgh\u002Fopendilab\u002FDI-engine)\n\n![GitHub Org's stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopendilab)\n[![GitHub 
stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fstargazers)\n[![GitHub forks](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fnetwork)\n![GitHub commit activity](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fm\u002Fopendilab\u002FDI-engine)\n[![GitHub issues](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues)\n[![GitHub pulls](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues-pr\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fpulls)\n[![Contributors](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcontributors\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fgraphs\u002Fcontributors)\n[![GitHub license](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmaster\u002FLICENSE)\n[![Hugging Face](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity)\n[![Open in OpenXLab](https:\u002F\u002Fcdn-static.openxlab.org.cn\u002Fheader\u002Fopenxlab_models.svg)](https:\u002F\u002Fopenxlab.org.cn\u002Fmodels?search=opendilab)\n[![discord badge](https:\u002F\u002Fdcbadge.vercel.app\u002Fapi\u002Fserver\u002FdkZS2JF56X?style=flat)](https:\u002F\u002Fdiscord.gg\u002FdkZS2JF56X)\n[![slack badge](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSlack-join-blueviolet?logo=slack&amp)](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fopendilab\u002Fshared_invite\u002Fzt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ)\n\n\u003Cdiv align=\"center\">\n  
\u003Ca href=\"https:\u002F\u002Fhellogithub.com\u002Frepository\u002F175c1e13739c4e429d0abf2b32ec583d\" target=\"_blank\">\n    \u003Cimg src=\"https:\u002F\u002Fapi.hellogithub.com\u002Fv1\u002Fwidgets\u002Frecommend.svg?rid=175c1e13739c4e429d0abf2b32ec583d&claim_uid=cExIpHuMKdTQ6BW\" alt=\"Featured｜HelloGitHub\" style=\"width: 250px; height: 54px;\" width=\"250\" height=\"54\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003Cbr>\n\nUpdated on 2024.12.23 DI-engine-v0.5.3\n\n## Introduction to DI-engine\n\n[Documentation](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F) | [中文文档](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F) | [Tutorials](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F01_quickstart\u002Findex.html) | [Feature](#feature) | [Task & Middleware](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F03_system\u002Findex.html) | [TreeTensor](#general-data-container-treetensor) | [Roadmap](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues\u002F548)\n\n**DI-engine** is a generalized decision intelligence engine for PyTorch and JAX.\n\nIt provides **Python-first** and **asynchronous-native** task and middleware abstractions, and modularly integrates several of the most important decision-making concepts: environment, policy and model. Built on these mechanisms, DI-engine supports **various deep reinforcement learning algorithms** with strong performance, high efficiency, well-organized [documentation](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F) and comprehensive [unit tests](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Factions):\n\n- Fundamental DRL algorithms, such as DQN, Rainbow, PPO, TD3, SAC, R2D2, IMPALA\n- Multi-agent RL algorithms, such as QMIX, WQMIX, MAPPO, HAPPO, ACE\n- Imitation learning algorithms (BC\u002FIRL\u002FGAIL), such as GAIL, SQIL, Guided Cost Learning, Implicit BC\n- Offline RL algorithms: BCQ, CQL, TD3BC, Decision Transformer, EDAC, Diffuser, Decision Diffuser, SO2\n- Model-based RL algorithms: SVG, STEVE, MBPO, DDPPO, DreamerV3\n- Exploration algorithms: HER, RND, ICM, NGU\n- LLM + RL algorithms: PPO-max, DPO, PromptPG, PromptAWR\n- Other algorithms, such as PER, PLR, PCGrad\n- MCTS + RL algorithms: AlphaZero, MuZero; please refer to [LightZero](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLightZero)\n- Generative model + RL algorithms: Diffusion-QL, QGPO, SRPO; please refer to
[GenerativeRL](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FGenerativeRL)\n\n\n**DI-engine** aims to **standardize different decision intelligence environments and applications**, supporting both academic research and prototype application development. It also supports various training pipelines and customized decision AI applications:\n\n\u003Cdetails open>\n\u003Csummary>(Click to Collapse)\u003C\u002Fsummary>\n\n- Traditional academic environments\n  - [DI-zoo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine#environment-versatility): various decision intelligence demonstrations and benchmark environments built with DI-engine.\n- Tutorial courses\n  - [PPOxFamily](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FPPOxFamily): PPO x Family DRL tutorial course.\n- Real-world decision AI applications\n  - [DI-star](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-star): decision AI in StarCraft II.\n  - [PsyDI](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FPsyDI): a multi-modal, interactive chatbot for psychological assessment.\n  - [DI-drive](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-drive): autonomous driving platform.\n  - [DI-sheep](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-sheep): decision AI in the 3 Tiles game.\n  - [DI-smartcross](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-smartcross): decision AI in traffic light control.\n  - [DI-bioseq](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-bioseq): decision AI in biological sequence prediction and search.\n  - [DI-1024](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-1024): deep reinforcement learning + the 1024 game.\n- Research papers\n  - [InterFuser](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FInterFuser): [CoRL 2022] Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer.\n  - [ACE](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FACE): [AAAI 2023] ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency.\n  - [GoBigger](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FGoBigger): [ICLR 2023] Multi-agent decision intelligence environment.\n  - [DOS](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDOS): [CVPR 2023] ReasonNet: End-to-End Driving with Temporal and Global Reasoning.\n  - [LightZero](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLightZero): [NeurIPS 2023 Spotlight] A lightweight and efficient MCTS\u002FAlphaZero\u002FMuZero algorithm toolkit.\n  - [SO2](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FSO2): [AAAI 2024] A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning.\n  - [LMDrive](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive): [CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models.\n  -
[SmartRefine](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FSmartRefine): [CVPR 2024] SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction.\n  - [ReZero](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLightZero): Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze.\n  - [UniZero](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLightZero): Generalized and Efficient Planning with Scalable Latent World Models.\n- Docs and tutorials\n  - [DI-engine-docs](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine-docs): tutorials, best practices and the API reference.\n  - [awesome-model-based-RL](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-model-based-RL): a curated list of model-based RL resources.\n  - [awesome-exploration-RL](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-exploration-rl): a curated list of exploration RL resources.\n  - [awesome-decision-transformer](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-decision-transformer): a curated list of Decision Transformer resources.\n  - [awesome-RLHF](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-RLHF): a curated list of resources on reinforcement learning with human feedback.\n  - [awesome-multi-modal-reinforcement-learning](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-multi-modal-reinforcement-learning): a curated list of multi-modal reinforcement learning resources.\n  - [awesome-diffusion-model-in-rl](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-diffusion-model-in-rl): a curated list of resources on diffusion models in RL.\n  - [awesome-ui-agents](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-ui-agents): a curated list of UI agent resources, covering Web, App, OS and beyond.\n  - [awesome-AI-based-protein-design](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-AI-based-protein-design): a collection of research papers on AI-based protein design.\n  - [awesome-end-to-end-autonomous-driving](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-end-to-end-autonomous-driving): a curated list of end-to-end autonomous driving resources.\n  - [awesome-driving-behavior-prediction](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-driving-behavior-prediction): a collection of research papers on driving behavior prediction.\n\n\u003C\u002Fdetails>\n\nOn the low-level end, DI-engine comes with a set of highly reusable modules, including [RL optimization functions](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fding\u002Frl_utils), [PyTorch utilities](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fding\u002Ftorch_utils) and [auxiliary tools](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fding\u002Futils).\n\nIn addition, **DI-engine** includes some special **system optimizations and designs** for efficient and robust large-scale RL training:\n\n\u003Cdetails close>\n\u003Csummary>(Click for Details)\u003C\u002Fsummary>\n\n- [treevalue](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Ftreevalue): tree-nested data structure.\n- [DI-treetensor](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-treetensor): tree-nested PyTorch tensor library.\n- [DI-toolkit](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-toolkit): a simple toolkit package for decision intelligence.\n- [DI-orchestrator](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-orchestrator): RL Kubernetes custom resource and operator library.\n- [DI-hpc](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-hpc): RL HPC operator library.\n- [DI-store](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-store): RL object store.\n\n\u003C\u002Fdetails>\n\nHave fun with exploration and exploitation.\n\n## Outline\n\n- [Introduction to DI-engine](#introduction-to-di-engine)\n- [Outline](#outline)\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [Feature](#feature)\n  - [Algorithm Versatility](#algorithm-versatility)\n  - [Environment Versatility](#environment-versatility)\n  - [General Data Container: TreeTensor](#general-data-container-treetensor)\n- [Feedback and Contribution](#feedback-and-contribution)\n- [Supporters](#supporters)\n  - [↳ Stargazers](#-stargazers)\n  - [↳ Forkers](#-forkers)\n- [Citation](#citation)\n- [License](#license)\n\n## Installation\n\nYou can install DI-engine from PyPI with the following command:\n\n```bash\npip install DI-engine\n```\n\nFor more information about installation, refer to the [installation guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F01_quickstart\u002Finstallation.html).\n\nOur Docker Hub repository is [here](https:\u002F\u002Fhub.docker.com\u002Frepository\u002Fdocker\u002Fopendilab\u002Fding); we provide a `base image` and `env images` with common RL environments.\n\n\u003Cdetails close>\n\u003Csummary>(Click for Details)\u003C\u002Fsummary>\n\n- Base image: opendilab\u002Fding:nightly\n- RPC image: opendilab\u002Fding:nightly-rpc\n- Atari image: opendilab\u002Fding:nightly-atari\n- Mujoco
image: opendilab\u002Fding:nightly-mujoco\n- DMC image: opendilab\u002Fding:nightly-dmc2gym\n- MetaWorld image: opendilab\u002Fding:nightly-metaworld\n- SMAC image: opendilab\u002Fding:nightly-smac\n- GRF image: opendilab\u002Fding:nightly-grf\n- CityFlow image: opendilab\u002Fding:nightly-cityflow\n- EvoGym image: opendilab\u002Fding:nightly-evogym\n- D4RL image: opendilab\u002Fding:nightly-d4rl\n\n\u003C\u002Fdetails>\n\nDetailed documentation is hosted at [doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F) | [Chinese doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F).\n\n## Quick Start\n\n[3 Minutes Kickoff](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F01_quickstart\u002Ffirst_rl_program.html)\n\n[3 Minutes Kickoff (colab)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1_7L-QFDfeCvMvLJzRyBRUW5_Q6ESXcZ4)\n\n[DI-engine Huggingface Kickoff (colab)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1UH1GQOjcHrmNSaW77hnLGxFJrLSLwCOk)\n\n[How to migrate a new **RL environment**](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F11_dizoo\u002Findex.html) | [How to migrate a new **RL environment** (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F11_dizoo\u002Findex_zh.html)\n\n[How to customize the **neural network model** used by a policy](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F04_best_practice\u002Fcustom_model.html) | [How to customize the **neural network model** used by a policy (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F04_best_practice\u002Fcustom_model_zh.html)\n\n[Example of testing\u002Fdeploying an **RL policy**](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fdizoo\u002Fclassic_control\u002Fcartpole\u002Fentry\u002Fcartpole_c51_deploy.py)\n\n[Comparison of the new and old pipelines (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F04_best_practice\u002Fdiff_in_new_pipeline_zh.html)\n\n## Feature\n\n### Algorithm Versatility\n\n\u003Cdetails 
open>\n\u003Csummary>(Click to Collapse)\u003C\u002Fsummary>\n\n![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) &nbsp;Discrete action space, the label used for standard DRL algorithms (1-23)\n\n![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) &nbsp;Continuous action space, the label used for standard DRL algorithms (1-23)\n\n![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen) &nbsp;Hybrid (discrete + continuous) action space (1-23)\n\n![dist](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-distributed-blue) &nbsp;[Distributed Reinforcement Learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F02_algo\u002Fdistributed_rl.html)｜[Distributed Reinforcement Learning (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F02_algo\u002Fdistributed_rl_zh.html)\n\n![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) &nbsp;[Multi-Agent Reinforcement Learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F02_algo\u002Fmulti_agent_cooperation_rl.html)｜[Multi-Agent Reinforcement Learning (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F02_algo\u002Fmulti_agent_cooperation_rl_zh.html)\n\n![exp](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-exploration-orange) &nbsp;[Exploration Mechanisms in Reinforcement Learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F02_algo\u002Fexploration_rl.html)｜[Exploration Mechanisms in Reinforcement Learning (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F02_algo\u002Fexploration_rl_zh.html)\n\n![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple) &nbsp;[Imitation Learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F02_algo\u002Fimitation_learning.html)｜[Imitation Learning (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F02_algo\u002Fimitation_learning_zh.html)\n\n![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue) 
&nbsp;[Offline Reinforcement Learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F02_algo\u002Foffline_rl.html)｜[Offline Reinforcement Learning (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F02_algo\u002Foffline_rl_zh.html)\n\n![mbrl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-ModelBasedRL-lightblue) &nbsp;[Model-Based Reinforcement Learning](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F02_algo\u002Fmodel_based_rl.html)｜[Model-Based Reinforcement Learning (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F02_algo\u002Fmodel_based_rl_zh.html)\n\n![other](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-other-lightgrey) &nbsp;Algorithms from other sub-directions, usually used as plug-in components in the overall pipeline\n\nP.S.: The `.py` files listed under `Runnable Demo` can be found in the `dizoo` directory.\n\n| No. | Algorithm | Label | Doc and Implementation | Runnable Demo |\n| :-: | :-: | :-: | :-: | 
:-: |\n| 1 | [DQN](https:\u002F\u002Fstorage.googleapis.com\u002Fdeepmind-media\u002Fdqn\u002FDQNNaturePaper.pdf) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [DQN doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fdqn.html)\u003Cbr>[DQN Chinese doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F12_policies\u002Fdqn_zh.html)\u003Cbr>[policy\u002Fdqn](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fdqn.py) | python3 -u cartpole_dqn_main.py \u002F ding -m serial -c cartpole_dqn_config.py -s 0 |\n| 2 | [C51](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1707.06887.pdf) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [C51 doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fc51.html)\u003Cbr>[policy\u002Fc51](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fc51.py) | ding -m serial -c cartpole_c51_config.py -s 0 |\n| 3 | [QRDQN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.10044.pdf) | 
![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [QRDQN doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fqrdqn.html)\u003Cbr>[policy\u002Fqrdqn](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fqrdqn.py) | ding -m serial -c cartpole_qrdqn_config.py -s 0 |\n| 4 | [IQN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1806.06923.pdf) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [IQN doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fiqn.html)\u003Cbr>[policy\u002Fiqn](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fiqn.py) | ding -m serial -c cartpole_iqn_config.py -s 0 |\n| 5 | [FQF](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1911.02140.pdf) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | 
[FQF doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Ffqf.html)\u003Cbr>[policy\u002Ffqf](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Ffqf.py) | ding -m serial -c cartpole_fqf_config.py -s 0 |\n| 6 | [Rainbow](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1710.02298.pdf) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [Rainbow doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Frainbow.html)\u003Cbr>[policy\u002Frainbow](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Frainbow.py) | ding -m serial -c cartpole_rainbow_config.py -s 0 |\n| 7 | [SQL](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1702.08165.pdf) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | [SQL doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fsql.html)\u003Cbr>[policy\u002Fsql](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fsql.py) | ding -m serial 
-c cartpole_sql_config.py -s 0 |\n| 8 | [R2D2](https:\u002F\u002Fopenreview.net\u002Fforum?id=r1lyTjAqYX) | ![dist](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-distributed-blue)![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [R2D2 doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fr2d2.html)\u003Cbr>[policy\u002Fr2d2](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fr2d2.py) | ding -m serial -c cartpole_r2d2_config.py -s 0 |\n| 9 | [PG](https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper\u002F1999\u002Ffile\u002F464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [PG doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fa2c.html)\u003Cbr>[policy\u002Fpg](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fpg.py) | ding -m serial -c cartpole_pg_config.py -s 0 |\n| 10 | [PromptPG](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.14610) | 
![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [policy\u002Fprompt_pg](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fprompt_pg.py) | ding -m serial_onpolicy -c tabmwp_pg_config.py -s 0 |\n| 11 | [A2C](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1602.01783.pdf) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [A2C doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fa2c.html)\u003Cbr>[policy\u002Fa2c](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fa2c.py) | ding -m serial -c cartpole_a2c_config.py -s 0 |\n| 12 | [PPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06347)\u002F[MAPPO](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2103.01955.pdf) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | 
[PPO doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fppo.html)\u003Cbr>[policy\u002Fppo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fppo.py) | python3 -u cartpole_ppo_main.py \u002F ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0 |\n| 13 | [PPG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2009.04416.pdf) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [PPG doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fppg.html)\u003Cbr>[policy\u002Fppg](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fppg.py) | python3 -u cartpole_ppg_main.py |\n| 14 | [ACER](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1611.01224.pdf) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | [ACER doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Facer.html)\u003Cbr>[policy\u002Facer](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Facer.py) | ding -m 
serial -c cartpole_acer_config.py -s 0 |\n| 15 | [IMPALA](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.01561) | ![dist](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-distributed-blue)![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [IMPALA doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fimpala.html)\u003Cbr>[policy\u002Fimpala](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fimpala.py) | ding -m serial -c cartpole_impala_config.py -s 0 |\n| 16 | [DDPG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1509.02971.pdf)\u002F[PADDPG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.04143.pdf) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen) | [DDPG doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fddpg.html)\u003Cbr>[policy\u002Fddpg](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fddpg.py) | ding -m serial -c pendulum_ddpg_config.py -s 0 |\n| 17 | [TD3](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1802.09477.pdf) | 
![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen) | [TD3 doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Ftd3.html)\u003Cbr>[policy\u002Ftd3](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Ftd3.py) | python3 -u pendulum_td3_main.py \u002F ding -m serial -c pendulum_td3_config.py -s 0 |\n| 18 | [D4PG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1804.08617.pdf) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | [D4PG doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fd4pg.html)\u003Cbr>[policy\u002Fd4pg](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fd4pg.py) | python3 -u pendulum_d4pg_config.py |\n| 19 | [SAC](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.01290)\u002F[MASAC] | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | 
[SAC doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fsac.html)\u003Cbr>[policy\u002Fsac](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fsac.py) | ding -m serial -c pendulum_sac_config.py -s 0 |\n| 20 | [PDQN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.06394.pdf) | ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen) | [policy\u002Fpdqn](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fpdqn.py) | ding -m serial -c gym_hybrid_pdqn_config.py -s 0 |\n| 21 | [MPDQN](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.04388.pdf) | ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen) | [policy\u002Fpdqn](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fpdqn.py) | ding -m serial -c gym_hybrid_mpdqn_config.py -s 0 |\n| 22 | 
              [HPPO](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1903.01344.pdf)                                            |                                                           ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen)                                                           |                                                                                                     [policy\u002Fppo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fppo.py)                                                                                                     |                ding -m serial_onpolicy -c gym_hybrid_hppo_config.py -s 0                |\n| 23 |                                             [BDQ](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1711.08946.pdf)                                             |                                                           ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen)                                                           |                                                                                                     [policy\u002Fbdq](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fdqn.py)                                                                                                     |                             python3 -u hopper_bdq_config.py                             |\n| 24 |                                              [MDQN](https:\u002F\u002Farxiv.org\u002Fabs\u002F2007.14430)                                              |                                                        ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                        |                                                                                                    
[policy\u002Fmdqn](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fmdqn.py) | python3 -u asterix_mdqn_config.py |\n| 25 | [QMIX](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1803.11485.pdf) | ![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | [QMIX doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fqmix.html)\u003Cbr>[policy\u002Fqmix](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fqmix.py) | ding -m serial -c smac_3s5z_qmix_config.py -s 0 |\n| 26 | [COMA](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1705.08926.pdf) | ![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | [COMA doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fcoma.html)\u003Cbr>[policy\u002Fcoma](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fcoma.py) | ding -m serial -c smac_3s5z_coma_config.py -s 0 |\n| 27 | 
[QTran](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.05408) | ![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | [policy\u002Fqtran](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fqtran.py) | ding -m serial -c smac_3s5z_qtran_config.py -s 0 |\n| 28 | [WQMIX](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.10800) | ![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | [WQMIX doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fwqmix.html)\u003Cbr>[policy\u002Fwqmix](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fwqmix.py) | ding -m serial -c smac_3s5z_wqmix_config.py -s 0 |\n| 29 | [CollaQ](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.08531.pdf) | ![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | 
[CollaQ doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fcollaq.html)\u003Cbr>[policy\u002Fcollaq](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fcollaq.py) | ding -m serial -c smac_3s5z_collaq_config.py -s 0 |\n| 30 | [MADDPG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1706.02275.pdf) | ![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | [MADDPG doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fddpg.html)\u003Cbr>[policy\u002Fddpg](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fddpg.py) | ding -m serial -c ptz_simple_spread_maddpg_config.py -s 0 |\n| 31 | [GAIL](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1606.03476.pdf) | ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple) | [GAIL doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fgail.html)\u003Cbr>[reward_model\u002Fgail](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Freward_model\u002Fgail_irl_model.py) | ding -m serial_gail -c 
cartpole_dqn_gail_config.py -s 0 |\n| 32 | [SQIL](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1905.11108.pdf) | ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple) | [SQIL doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fsqil.html)\u003Cbr>[entry\u002Fsqil](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fentry\u002Fserial_entry_sqil.py) | ding -m serial_sqil -c cartpole_sqil_config.py -s 0 |\n| 33 | [DQFD](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1704.03732.pdf) | ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple) | [DQFD doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fdqfd.html)\u003Cbr>[policy\u002Fdqfd](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fdqfd.py) | ding -m serial_dqfd -c cartpole_dqfd_config.py -s 0 |\n| 34 | [R2D3](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1909.01387.pdf) | ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple) 
 | [R2D3 docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fr2d3.html)\u003Cbr>[R2D3 Chinese docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F12_policies\u002Fr2d3_zh.html)\u003Cbr>[policy\u002Fr2d3](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fr2d3.py) | python3 -u pong_r2d3_r2d2expert_config.py |\n| 35 | [Guided Cost Learning](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1603.00448.pdf) | ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple) | [Guided Cost Learning Chinese docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F12_policies\u002Fguided_cost_zh.html)\u003Cbr>[reward_model\u002Fguided_cost](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Freward_model\u002Fguided_cost_reward_model.py) | python3 lunarlander_gcl_config.py |\n| 36 | [TREX](https:\u002F\u002Farxiv.org\u002Fabs\u002F1904.06387) | ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple) | [TREX docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Ftrex.html)\u003Cbr>[reward_model\u002Ftrex](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Freward_model\u002Ftrex_reward_model.py)
 | python3 mujoco_trex_main.py |\n| 37 | [Implicit Behavioral Cloning](https:\u002F\u002Fimplicitbc.github.io\u002F) (DFO+MCMC) | ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple) | [policy\u002Fibc](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fibc.py)\u003Cbr>[model\u002Ftemplate\u002Febm](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fmodel\u002Ftemplate\u002Febm.py) | python3 d4rl_ibc_main.py -s 0 -c pen_human_ibc_mcmc_config.py |\n| 38 | [BCO](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1805.01954.pdf) | ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL-purple) | [entry\u002Fbco](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fentry\u002Fserial_entry_bco.py) | python3 -u cartpole_bco_config.py |\n| 39 | [HER](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1707.01495.pdf) |
![exp](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-exploration-orange) | [HER docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fher.html)\u003Cbr>[reward_model\u002Fher](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Freward_model\u002Fher_reward_model.py) | python3 -u bitflip_her_dqn.py |\n| 40 | [RND](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.12894) | ![exp](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-exploration-orange) | [RND docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Frnd.html)\u003Cbr>[reward_model\u002Frnd](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Freward_model\u002Frnd_reward_model.py) | python3 -u cartpole_rnd_onppo_config.py |\n| 41 | [ICM](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1705.05363.pdf) | ![exp](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-exploration-orange) |
[ICM docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Ficm.html)\u003Cbr>[ICM Chinese docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F12_policies\u002Ficm_zh.html)\u003Cbr>[reward_model\u002Ficm](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Freward_model\u002Ficm_reward_model.py) | python3 -u cartpole_ppo_icm_config.py |\n| 42 | [CQL](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2006.04779.pdf) | ![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue) | [CQL docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fcql.html)\u003Cbr>[policy\u002Fcql](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fcql.py) | python3 -u d4rl_cql_main.py |\n| 43 | [TD3BC](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.06860.pdf) | ![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue) | [TD3BC docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Ftd3_bc.html)\u003Cbr>[policy\u002Ftd3_bc](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Ftd3_bc.py)
 | python3 -u d4rl_td3_bc_main.py |\n| 44 | [Decision Transformer](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2106.01345.pdf) | ![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue) | [policy\u002Fdt](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fdt.py) | python3 -u d4rl_dt_mujoco.py |\n| 45 | [EDAC](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2110.01548.pdf) | ![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue) | [EDAC docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fedac.html)\u003Cbr>[policy\u002Fedac](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fedac.py) | python3 -u d4rl_edac_main.py |\n| 46 | [QGPO](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2304.12824.pdf) |
![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue) | [QGPO docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fqgpo.html)\u003Cbr>[policy\u002Fqgpo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fqgpo.py) | python3 -u ding\u002Fexample\u002Fqgpo.py |\n| 47 | MBSAC([SAC](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.01290)+[MVE](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.00101)+[SVG](https:\u002F\u002Farxiv.org\u002Fabs\u002F1510.09142)) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)![mbrl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-ModelBasedRL-lightblue) | [policy\u002Fmbpolicy\u002Fmbsac](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fmbpolicy\u002Fmbsac.py) | python3 -u pendulum_mbsac_mbpo_config.py\u003Cbr>python3 -u pendulum_mbsac_ddppo_config.py |\n| 48 | STEVESAC([SAC](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.01290)+[STEVE](https:\u002F\u002Farxiv.org\u002Fabs\u002F1807.01675)+[SVG](https:\u002F\u002Farxiv.org\u002Fabs\u002F1510.09142)) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)![mbrl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-ModelBasedRL-lightblue) |
[policy\u002Fmbpolicy\u002Fmbsac](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fmbpolicy\u002Fmbsac.py) | python3 -u pendulum_stevesac_mbpo_config.py |\n| 49 | [MBPO](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.08253.pdf) | ![mbrl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-ModelBasedRL-lightblue) | [MBPO docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fmbpo.html)\u003Cbr>[world_model\u002Fmbpo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fworld_model\u002Fmbpo.py) | python3 -u pendulum_sac_mbpo_config.py |\n| 50 | [DDPPO](https:\u002F\u002Fopenreview.net\u002Fforum?id=rzvOQrnclO0) | ![mbrl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-ModelBasedRL-lightblue) | [world_model\u002Fddppo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fworld_model\u002Fddppo.py) | python3 -u pendulum_mbsac_ddppo_config.py |\n| 51 |
[DreamerV3](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2301.04104.pdf)                                          |                                                         ![mbrl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-ModelBasedRL-lightblue)                                                         |                                                                                          [world_model\u002Fdreamerv3](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fworld_model\u002Fdreamerv3.py)                                                                                          |                      python3 -u cartpole_balance_dreamer_config.py                      |\n| 52 |                                             [PER](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.05952.pdf)                                             |                                                            ![other](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-other-lightgrey)                                                            |                                                                                   [worker\u002Freplay_buffer](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fworker\u002Freplay_buffer\u002Fadvanced_buffer.py)                                                                                   |                                      `rainbow demo`                                      |\n| 53 |                                             [GAE](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1506.02438.pdf)                                             |                                                            ![other](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-other-lightgrey)                                                            |                                                                                                   
[rl_utils\u002Fgae](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Frl_utils\u002Fgae.py) | `ppo demo` |\n| 54 | [ST-DIM](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1906.08226.pdf) | ![other](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-other-lightgrey) | [torch_utils\u002Floss\u002Fcontrastive_loss](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Ftorch_utils\u002Floss\u002Fcontrastive_loss.py) | ding -m serial -c cartpole_dqn_stdim_config.py -s 0 |\n| 55 | [PLR](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2010.03934.pdf) | ![other](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-other-lightgrey) | [PLR docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Fplr.html)\u003Cbr>[data\u002Flevel_replay\u002Flevel_sampler](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fdata\u002Flevel_replay\u002Flevel_sampler.py) | python3 -u bigfish_plr_config.py -s 0 |\n| 56 |
[PCGrad](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2001.06782.pdf) | ![other](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-other-lightgrey) | [torch_utils\u002Foptimizer_helper\u002FPCGrad](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Ftorch_utils\u002Foptimizer_helper.py) | python3 -u multi_mnist_pcgrad_main.py -s 0 |\n| 57 | [AWR](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1910.00177.pdf) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | [policy\u002Fprompt_awr](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fding\u002Fpolicy\u002Fprompt_awr.py) | python3 -u tabmwp_awr_config.py |\n\n\u003C\u002Fdetails>\n\n\n\n### Environment Versatility\n\n\u003Cdetails open>\n\u003Csummary>(Click to Collapse)\u003C\u002Fsummary>\n\n| No. | Environment | Label |
Visualization | Code and Doc Links |\n| :-: | :-: | :-: | :-: | :-: |\n| 1 | [Atari](https:\u002F\u002Fale.farama.org) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_deb7d17553ec.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fatari\u002Fenvs)\u003Cbr>[Env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fatari.html)\u003Cbr>[Env guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fatari_zh.html) |\n| 2 |
[box2d\u002Fbipedalwalker](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Ftree\u002Fmaster\u002Fgym\u002Fenvs\u002Fbox2d) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_03b074361224.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fbox2d\u002Fbipedalwalker\u002Fenvs)\u003Cbr>[Env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fbipedalwalker.html)\u003Cbr>[Env guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fbipedalwalker_zh.html) |\n| 3 | [box2d\u002Flunarlander](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Ftree\u002Fmaster\u002Fgym\u002Fenvs\u002Fbox2d) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_8b57ca01adf3.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fbox2d\u002Flunarlander\u002Fenvs)\u003Cbr>[Env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Flunarlander.html)\u003Cbr>[Env guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Flunarlander_zh.html) |\n| 4 |
[classic_control\u002Fcartpole](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Ftree\u002Fmaster\u002Fgym\u002Fenvs\u002Fclassic_control) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_d27e13293bcb.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fclassic_control\u002Fcartpole\u002Fenvs)\u003Cbr>[Env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fcartpole.html)\u003Cbr>[Env guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fcartpole_zh.html) |\n| 5 | [classic_control\u002Fpendulum](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Ftree\u002Fmaster\u002Fgym\u002Fenvs\u002Fclassic_control) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_b95ef8c9973c.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fclassic_control\u002Fpendulum\u002Fenvs)\u003Cbr>[Env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fpendulum.html)\u003Cbr>[Env guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fpendulum_zh.html) |\n| 6 |
[competitive_rl](https:\u002F\u002Fgithub.com\u002Fcuhkrlcourse\u002Fcompetitive-rl) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) ![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_57ab0e56c25d.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fcompetitive_rl\u002Fenvs)\u003Cbr>[Env guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fcompetitive_rl_zh.html) |\n| 7 | [gfootball](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Ffootball) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![sparse](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-sparse%20reward-orange)![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_cec004ff6c1a.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fgfootball\u002Fenvs)\u003Cbr>[Env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fgfootball.html)\u003Cbr>[Env guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fgfootball_zh.html) |\n| 8 | [minigrid](https:\u002F\u002Fgithub.com\u002Fmaximecb\u002Fgym-minigrid) |
![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![sparse](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-sparse%20reward-orange) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_bb708e1ba572.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fminigrid\u002Fenvs)\u003Cbr>[Env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fminigrid.html)\u003Cbr>[Env guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fminigrid_zh.html) |\n| 9 | [MuJoCo](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Ftree\u002Fmaster\u002Fgym\u002Fenvs\u002Fmujoco) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_14e09becfe59.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fmujoco\u002Fenvs)\u003Cbr>[Env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fmujoco.html)\u003Cbr>[Env guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fmujoco_zh.html) |\n| 10 | [PettingZoo](https:\u002F\u002Fgithub.com\u002FFarama-Foundation\u002FPettingZoo) |
![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) ![marl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_b5ae00e38469.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fpetting_zoo\u002Fenvs)\u003Cbr>[Env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fpettingzoo.html)\u003Cbr>[Env guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fpettingzoo_zh.html) |\n| 11 | [overcooked](https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fovercooked-demo) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) ![marl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_2b0dab67310c.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fovercooked\u002Fenvs)\u003Cbr>[Env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fovercooked.html) |\n| 12 | [procgen](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fprocgen) |
![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_80b380d05c7e.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fprocgen)\u003Cbr>[Env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fprocgen.html)\u003Cbr>[Env guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fprocgen_zh.html) |\n| 13 | [pybullet](https:\u002F\u002Fgithub.com\u002Fbenelot\u002Fpybullet-gym) | ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) | ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_e2c5b7f67b04.gif) | [dizoo link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fpybullet\u002Fenvs)\u003Cbr>[Env guide](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fpybullet_zh.html) |\n| 14 | [smac](https:\u002F\u002Fgithub.com\u002Foxwhirl\u002Fsmac) | ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)
![marl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue)![sparse](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-sparse%20reward-orange) |                                   ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_67a3c84bdb89.gif)                                   |                 [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fsmac\u002Fenvs)\u003Cbr>[环境教程](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fsmac.html)\u003Cbr>[环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fsmac_zh.html)                 |\n| 15 |                         [d4rl](https:\u002F\u002Fgithub.com\u002Frail-berkeley\u002Fd4rl)                         |                                                                                       ![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue)                                                                                       |                                      ![ori](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_efcc51cf498f.gif)                                      |                                                              [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fd4rl)\u003Cbr>[环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fd4rl_zh.html)                                                              |\n| 16 |                                          league_demo                                          |                                                         ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) 
![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue)                                                         |                            ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_64b8c7559165.png)                            |                                                                                                    [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fleague_demo\u002Fenvs)                                                                                                    |\n| 17 |                                          pomdp atari                                          |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                                                                                                        |                                                                                                       [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fpomdp\u002Fenvs)                                                                                                       |\n| 18 |                          [bsuite](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fbsuite)                          |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                                 ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_e2ab4ca22cb6.png)                                 |             
[dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fbsuite\u002Fenvs)\u003Cbr>[环境教程](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002F\u002Fbsuite.html) \u003Cbr> [环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fbsuite_zh.html)             |\n| 19 |                             [ImageNet](https:\u002F\u002Fwww.image-net.org\u002F)                             |                                                                                             ![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL\u002FSL-purple)                                                                                             |                         ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_e1af270145fe.png)                         |                                                    [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fimage_classification)\u003Cbr>[环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fimage_cls_zh.html)                                                    |\n| 20 |                 [slime_volleyball](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym)                 |                                                          ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue)                                                          |                              ![ori](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_dea895057c05.gif)                              |    
[dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fslime_volley)\u003Cbr>[环境教程](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fslime_volleyball.html)\u003Cbr>[环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fslime_volleyball_zh.html)    |\n| 21 |                    [gym_hybrid](https:\u002F\u002Fgithub.com\u002Fthomashirtz\u002Fgym-hybrid)                    |                                                                                         ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen)                                                                                         |                                 ![ori](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_7b04dc2eb83a.gif)                                 |           [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fgym_hybrid)\u003Cbr>[环境教程](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fgym_hybrid.html)\u003Cbr>[环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fgym_hybrid_zh.html)           |\n| 22 |                       [GoBigger](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FGoBigger)                       |                                    ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen)![marl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue)                                    |                                 ![ori](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_8f0aa108697e.gif)                                 |                                
[dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FGoBigger-Challenge-2021\u002Ftree\u002Fmain\u002Fdi_baseline)\u003Cbr>[环境教程](https:\u002F\u002Fgobigger.readthedocs.io\u002Fen\u002Flatest\u002Findex.html)\u003Cbr>[环境指南](https:\u002F\u002Fgobigger.readthedocs.io\u002Fzh_CN\u002Flatest\u002F)                                |\n| 23 |                       [gym_soccer](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym-soccer)                       |                                                                                         ![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen)                                                                                         |                              ![ori](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_423205f017d0.gif)                              |                                                        [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fgym_soccer)\u003Cbr>[环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fgym_soccer_zh.html)                                                        |\n| 24 |           [multiagent_mujoco](https:\u002F\u002Fgithub.com\u002Fschroederdewitt\u002Fmultiagent_mujoco)           |                                                              ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) ![marl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)                                                              |                                 ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_14e09becfe59.gif)                                 |                                                    
[dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fmultiagent_mujoco\u002Fenvs)\u003Cbr>[环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fmujoco_zh.html)                                                    |\n| 25 |                                            bitflip                                            |                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) ![sparse](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-sparse%20reward-orange)                                                      |                                ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_eb4166449771.gif)                                |                                                         [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fbitflip\u002Fenvs)\u003Cbr>[环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fbitflip_zh.html)                                                         |\n| 26 |                      [sokoban](https:\u002F\u002Fgithub.com\u002FmpSchrader\u002Fgym-sokoban)                      |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      | ![Game 2](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_e974934be4b7.gif) |             
[dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fsokoban\u002Fenvs)\u003Cbr>[环境教程](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fsokoban.html)\u003Cbr>[环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fsokoban_zh.html)             |\n| 27 |                   [gym_anytrading](https:\u002F\u002Fgithub.com\u002FAminHP\u002Fgym-anytrading)                   |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                         ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_cc418c586002.png)                         |                                                [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fgym_anytrading) \u003Cbr> [环境教程](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fdizoo\u002Fgym_anytrading\u002Fenvs\u002FREADME.md)                                                |\n| 28 |                   [mario](https:\u002F\u002Fgithub.com\u002FKautenja\u002Fgym-super-mario-bros)                   |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                                  ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_7501674b0387.gif)                                  |  [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fmario) \u003Cbr> 
[环境教程](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fgym_super_mario_bros.html) \u003Cbr>[环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fgym_super_mario_bros_zh.html)  |\n| 29 |                       [dmc2gym](https:\u002F\u002Fgithub.com\u002Fdenisyarats\u002Fdmc2gym)                       |                                                                                       ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)                                                                                       |                            ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_984ad0c53fac.png)                            |               [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fdmc2gym)\u003Cbr>[环境教程](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fdmc2gym.html)\u003Cbr>[环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fdmc2gym_zh.html)               |\n| 30 |                        [evogym](https:\u002F\u002Fgithub.com\u002FEvolutionGym\u002Fevogym)                        |                                                                                       ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)                                                                                       |                                 ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_2c1d64c758c5.gif)                                 |            [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fevogym\u002Fenvs) \u003Cbr> [环境教程](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Fevogym.html) 
\u003Cbr> [环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002FEvogym_zh.html)            |\n| 31 |             [gym-pybullet-drones](https:\u002F\u002Fgithub.com\u002FutiasDSL\u002Fgym-pybullet-drones)             |                                                                                       ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)                                                                                       |                    ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_1a3c7b00a823.gif)                    |                                                                                          [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fgym_pybullet_drones\u002Fenvs)\u003Cbr>环境指南                                                                                          |\n| 32 |                 [beergame](https:\u002F\u002Fgithub.com\u002FOptMLGroup\u002FDeepBeerInventory-RL)                 |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                               ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_91bae50e9303.png)                               |                                                                                               [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fbeergame\u002Fenvs)\u003Cbr>环境指南                                                                                               |\n| 33 | 
[classic_control\u002Facrobot](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Ftree\u002Fmaster\u002Fgym\u002Fenvs\u002Fclassic_control) |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                        ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_b9ca34ae76c8.gif)                        |                                                [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fclassic_control\u002Facrobot\u002Fenvs)\u003Cbr> [环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Facrobot_zh.html)                                                |\n| 34 |   [box2d\u002Fcar_racing](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Fblob\u002Fmaster\u002Fgym\u002Fenvs\u002Fbox2d\u002Fcar_racing.py)   |                                                     ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) \u003Cbr> ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)                                                     |                          ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_f4293ae4ad58.gif)                          |                                                                                            [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fbox2d\u002Fcarracing\u002Fenvs)\u003Cbr>环境指南                                                                                            |\n| 35 |                     [metadrive](https:\u002F\u002Fgithub.com\u002Fmetadriverse\u002Fmetadrive)                     |              
                                                                         ![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green)                                                                                       |                            ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_539637faf23e.gif)                            |                                                       [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fmetadrive\u002Fenv)\u003Cbr> [环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fmetadrive_zh.html)                                                       |\n| 36 |  [cliffwalking](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym\u002Fblob\u002Fmaster\u002Fgym\u002Fenvs\u002Ftoy_text\u002Fcliffwalking.py)  |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                          ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_7ddd9c04384f.gif)                          |                                                                                    [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fcliffwalking\u002Fenvs)\u003Cbr> env tutorial \u003Cbr> 环境指南                                                                                    |\n| 37 |                       [tabmwp](https:\u002F\u002Fpromptpg.github.io\u002Fexplore.html)                       |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                      
                                                |                                ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_059f0b09627e.jpeg)                                |                                                                                         [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Ftabmwp) \u003Cbr> env tutorial \u003Cbr> 环境指南                                                                                         |\n| 38 |            [frozen_lake](https:\u002F\u002Fgymnasium.farama.org\u002Fenvironments\u002Ftoy_text\u002Ffrozen_lake)      |                                                                                      ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                      |                                ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_aa3ca9c8615f.gif)                        |                                                                                         [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Ffrozen_lake) \u003Cbr> env tutorial \u003Cbr> 环境指南                                                                                         |\n| 39 | [ising_model](https:\u002F\u002Fgithub.com\u002Fmlii\u002Fmfrl\u002Ftree\u002Fmaster\u002Fexamples\u002Fising_model)                  |                            ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) ![marl](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow)                                                                                             |                                
![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_2dbb615e9aae.gif)                           | [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Fising_env) \u003Cbr> env tutorial \u003Cbr> [环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Fising_model_zh.html) |\n| 40 | [taxi](https:\u002F\u002Fwww.gymlibrary.dev\u002Fenvironments\u002Ftoy_text\u002Ftaxi\u002F)                  |                            ![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen)                                                                                 |                                ![original](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_readme_46da3c7299e9.gif)                           | [dizoo链接](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fdizoo\u002Ftaxi\u002Fenvs) \u003Cbr> [环境教程](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F13_envs\u002Ftaxi.html) \u003Cbr> [环境指南](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F13_envs\u002Ftaxi_zh.html) |\n\n![discrete](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-discrete-brightgreen) means a discrete action space\n\n![continuous](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-continous-green) means a continuous action space\n\n![hybrid](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-hybrid-darkgreen) means a hybrid (discrete + continuous) action space\n\n![MARL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-MARL-yellow) means a multi-agent reinforcement learning environment\n\n![sparse](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-sparse%20reward-orange) means an exploration-related environment with sparse rewards\n\n![offline](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-offlineRL-darkblue) means an offline reinforcement learning environment\n\n![IL](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-IL\u002FSL-purple) means an imitation learning or supervised learning dataset\n\n![selfplay](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-selfplay-blue) 
means an environment that allows agent-vs-agent battles\n\nP.S. Some Atari environments, such as **MontezumaRevenge**, are also of the sparse-reward type.\n\n\u003C\u002Fdetails>\n\n\n\n### General Data Container: TreeTensor\n\nDI-engine uses [TreeTensor](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-treetensor) as the basic data container across its components. It is easy to use and behaves consistently across code modules such as environment definition, data processing and DRL optimization. Here are some concrete code examples:\n\n- TreeTensor extends all operations of `torch.Tensor` to nested data:\n\n  \u003Cdetails close>\n  \u003Csummary>(Click for Details)\u003C\u002Fsummary>\n\n  ```python\n  import treetensor.torch as ttorch\n\n\n  # create random tensors\n  data = ttorch.randn({'a': (3, 2), 'b': {'c': (3, )}})\n  # clone and detach\n  data_clone = data.clone().detach()\n  # access the tree structure by attribute\n  a = data.a\n  c = data.b.c\n  # stack\u002Fcat\u002Fsplit\n  stacked_data = ttorch.stack([data, data_clone], 0)\n  cat_data = ttorch.cat([data, data_clone], 0)\n  data, data_clone = ttorch.split(stacked_data, 1)\n  # reshape\n  data = data.unsqueeze(-1)\n  data = data.squeeze(-1)\n  flatten_data = data.view(-1)\n  # indexing\n  data_0 = data[0]\n  data_1to2 = data[1:2]\n  # math calculation\n  data = data.sin()\n  data.b.c.cos_().clamp_(-1, 1)\n  data += data ** 2\n  # backward\n  data.requires_grad_(True)\n  loss = data.arctan().mean()\n  loss.backward()\n  # print shape\n  print(data.shape)\n  # result\n  # \u003CSize 0x7fbd3346ddc0>\n  # ├── 'a' --> torch.Size([1, 3, 2])\n  # └── 'b' --> \u003CSize 0x7fbd3346dd00>\n  #     └── 'c' --> torch.Size([1, 3])\n  ```\n\n  \u003C\u002Fdetails>\n- TreeTensor makes a classic deep reinforcement learning pipeline simple and efficient to implement:\n\n  \u003Cdetails close>\n  \u003Csummary>(Click for Details)\u003C\u002Fsummary>\n\n  ```diff\n  import torch\n  import treetensor.torch as ttorch\n\n  B = 4\n\n\n  def get_item():\n      return {\n          'obs': {\n              'scalar': torch.randn(12),\n              'image': torch.randn(3, 32, 32),\n          },\n          'action': torch.randint(0, 10, size=(1,)),\n          'reward': torch.rand(1),\n          'done': False,\n      }\n\n\n  data = [get_item() for _ in range(B)]\n\n\n  # execute the `stack` op\n  - def stack(data, dim):\n  -     elem = data[0]\n  -     if isinstance(elem, 
torch.Tensor):\n  -         return torch.stack(data, dim)\n  -     elif isinstance(elem, dict):\n  -         return {k: stack([item[k] for item in data], dim) for k in elem.keys()}\n  -     elif isinstance(elem, bool):\n  -         return torch.BoolTensor(data)\n  -     else:\n  -         raise TypeError(\"Unsupported element type: {}\".format(type(elem)))\n  - stacked_data = stack(data, dim=0)\n  + data = [ttorch.tensor(d) for d in data]\n  + stacked_data = ttorch.stack(data, dim=0)\n\n  # validate\n  - assert stacked_data['obs']['image'].shape == (B, 3, 32, 32)\n  - assert stacked_data['action'].shape == (B, 1)\n  - assert stacked_data['reward'].shape == (B, 1)\n  - assert stacked_data['done'].shape == (B,)\n  - assert stacked_data['done'].dtype == torch.bool\n  + assert stacked_data.obs.image.shape == (B, 3, 32, 32)\n  + assert stacked_data.action.shape == (B, 1)\n  + assert stacked_data.reward.shape == (B, 1)\n  + assert stacked_data.done.shape == (B,)\n  + assert stacked_data.done.dtype == torch.bool\n  ```\n\n  \u003C\u002Fdetails>\n\n## Feedback and Contribution\n\n- [File an issue](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues\u002Fnew\u002Fchoose) on GitHub\n- Open or join a discussion on our [forum](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fdiscussions)\n- Discuss on the DI-engine [Discord server](https:\u002F\u002Fdiscord.gg\u002FdkZS2JF56X)\n- Discuss on the DI-engine [Slack channel](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fopendilab\u002Fshared_invite\u002Fzt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ)\n- Discuss in the DI-engine WeChat group (add us on WeChat: ding314assist)\n\n  \u003Cimg src=https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fassets\u002Fwechat.jpeg width=35% \u002F>\n- Contact us by email (opendilab@pjlab.org.cn)\n- Contribute to our future plans: [Roadmap](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues\u002F548)\n\nWe appreciate all feedback and contributions that improve DI-engine, in both algorithms and system design. See `CONTRIBUTING.md` for the necessary information.\n\n## Supporters\n\n### &#8627; Stargazers\n\n[![@opendilab\u002FDI-engine 
stargazers](https:\u002F\u002Freporoster.com\u002Fstars\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fstargazers)\n\n### &#8627; Forkers\n\n[![@opendilab\u002FDI-engine forkers](https:\u002F\u002Freporoster.com\u002Fforks\u002Fopendilab\u002FDI-engine)](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fnetwork\u002Fmembers)\n\n## Citation\n\n```latex\n@misc{ding,\n    title={DI-engine: A Universal AI System\u002FEngine for Decision Intelligence},\n    author={Niu, Yazhe and Xu, Jingxin and Pu, Yuan and Nie, Yunpeng and Zhang, Jinouwen and Hu, Shuai and Zhao, Liangxuan and Zhang, Ming and Liu, Yu},\n    publisher={GitHub},\n    howpublished={\\url{https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine}},\n    year={2021},\n}\n```\n\n## License\n\nDI-engine is released under the Apache 2.0 license.","# DI-engine Quick Start Guide\n\nDI-engine is a general decision intelligence engine built on PyTorch and JAX. It supports a wide range of deep reinforcement learning (DRL) algorithms, as well as multi-agent RL, imitation learning and offline RL. This guide helps you set up the environment and run a first example.\n\n## 1. Prerequisites\n\nBefore you start, make sure your development environment meets these requirements:\n\n*   **OS**: Linux (Ubuntu 18.04+ recommended) or macOS. Windows users should use WSL2.\n*   **Python**: 3.8, 3.9, 3.10 or 3.11.\n*   **GPU driver**: to train on GPU, install a matching NVIDIA driver and CUDA toolkit.\n*   **Package manager**: `conda` is recommended for managing virtual environments; `pip` also works.\n\n## 2. Installation\n\n### Option 1: install with pip (recommended)\n\nInstall the latest stable release directly from PyPI. Users in mainland China can use the Tsinghua or Aliyun mirror for faster downloads.\n\n```bash\n# Create and activate a virtual environment (optional but recommended)\nconda create -n di-env python=3.9\nconda activate di-env\n\n# Install DI-engine (here via the Tsinghua PyPI mirror)\npip install DI-engine -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### Option 2: install with Conda\n\nIf you prefer installing from the conda channel:\n\n```bash\nconda create -n di-env python=3.9\nconda activate di-env\nconda install -c opendilab di-engine\n```\n\n### Verify the installation\n\nAfter installation, run the following command to check that it worked:\n\n```bash\npython -c \"import ding; print(ding.__version__)\"\n```\nIf a version number is printed without errors, the installation succeeded.\n\n## 3. 
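Basic Usage\n\nDI-engine is built around modularity. Its core components are the **Env** (environment), the **Policy** and the **Model**. The minimal example below trains a DQN policy on CartPole using the ready-made config shipped in `dizoo`, which wires these three components together.\n\n```python\n# Train DQN on CartPole using the config bundled with DI-engine's dizoo package.\n# main_config holds the env, model and policy hyperparameters;\n# create_config tells DI-engine which env, env manager and policy types to build.\nfrom ding.entry import serial_pipeline\nfrom dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config\n\nif __name__ == '__main__':\n    # max_env_step caps the number of environment steps so this demo ends quickly;\n    # remove it for a full training run.\n    serial_pipeline((main_config, create_config), seed=0, max_env_step=int(1e4))\n```\n\nYou can also launch the same config from its directory with the CLI: `ding -m serial -c cartpole_dqn_config.py -s 0`.\n\n### Next steps\n*   **Full training**: start from the config files in the `dizoo` directory (for example `dizoo\u002Fclassic_control\u002Fcartpole\u002Fconfig\u002F`) and launch them with the `ding` command-line tool as shown above.\n*   **Documentation**: see the [Chinese documentation](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F) for the full algorithm list, API reference and advanced tutorials.","An algorithm team at an autonomous-driving startup is building a decision-making model for complex urban intersections and needs to quickly compare how several reinforcement learning algorithms perform in simulation.\n\n### Without DI-engine\n- The team has to hand-write data collection, model training and evaluation, and trying each new algorithm (e.g. switching from PPO to SAC) takes days of refactoring.\n- With no unified interface standard, modules written by different members fit together poorly, causing frequent errors when debugging parallel environment interaction.\n- Benchmark results from papers are hard to reproduce because there are no built-in standard algorithm implementations or preset parameters, so experiment comparisons lack credibility.\n- Configuring distributed training is tedious, so cluster compute goes underused and a single large-scale experiment takes too long.\n\n### With DI-engine\n- With the 20+ mainstream algorithm templates built into the framework, the team can switch algorithms in minutes by editing only a config file.\n- Standardized data interfaces and a modular design connect the simulator and the training engine seamlessly, greatly reducing integration errors.\n- By using the officially provided high-quality baseline models and preset parameters, the team quickly establishes a reliable performance baseline.\n- Native support for distributed training makes multi-GPU resources easy to schedule, cutting training cycles that used to take days down to hours.\n\nBy providing full-stack, standardized reinforcement learning infrastructure, DI-engine frees R&D teams from reinventing the wheel so they can focus on innovating and iterating on core decision logic.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopendilab_DI-engine_3f2973ba.png","opendilab","OpenDILab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fopendilab_83f31d72.png","Open-source Decision Intelligence (DI) Platform",null,"opendilab@pjlab.org.cn","https:\u002F\u002Fgithub.com\u002Fopendilab",[80,84,88],{"name":81,"color":82,"percentage":83},"Python","#3572A5",99.9,{"name":85,"color":86,"percentage":87},"Shell","#89e051",0.1,{"name":89,"color":90,"percentage":91},"Makefile","#427819",0,3610,431,"2026-04-06T22:06:48","Apache-2.0","Not specified","Not specified (built on PyTorch\u002FJAX; deep learning training usually needs a GPU, but the README does not specify a GPU model or memory requirement)",{"notes":99,"python":100,"dependencies":101},"This is a general decision intelligence engine that supports both PyTorch and JAX backends. It modularly integrates many deep RL algorithms (e.g. DQN, PPO, SAC) as well as multi-agent and offline RL algorithms, and can be installed via PyPI or Conda. It depends on libraries from its own ecosystem (e.g. treevalue for tree-structured data). The required CUDA version and GPU model depend on the installed PyTorch or JAX version; the README does not pin them.","3.8+ (inferred from the PyPI badge; supports Python 3.8, 3.9, 3.10, etc.)",[102,103,104,105],"torch","jax","treevalue","DI-treetensor",[13,14],[108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127],"reinforcement-learning","multiagent-reinforcement-learning","self-play","imitation-learning","inverse-reinforcement-learning","exploration-exploitation","distributed-system","python","impala","smac","atari","mujoco","minigrid","r2d2","reinforcement-learning-algorithms","pytorch-rl","offline-rl","drl","distributed-reinforcement-learning","model-based-reinforcement-learning","2026-03-27T02:49:30.150509","2026-04-07T11:35:31.908452",[131,136,141,146,151,155],{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},21897,"How do I use the GTrXL model in Atari environments, or put a Conv2D network in front of it?","GTrXL itself does not take an obs_shape parameter for handling Atari image inputs. To use it on Atari, you generally need to place a Conv2D network in front of GTrXL to extract features. Also, for sparse-reward environments such as Montezuma Revenge, improving the sequence model alone (e.g. GTrXL) helps little; stronger exploration matters more, so combining it with RND (Random Network Distillation) or Go-Explore is recommended.","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues\u002F319",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},21898,"What should I do when deploying a trading model fails with 'NameNotFound: Environment stocks doesn't exist'?","This error usually means the `ckpt_path` parameter in the config is not being used by the main function, or the specified checkpoint file does not exist. Make sure you have a trained model file (e.g. `ckpt_best.pth.tar`) and set its path correctly in the config. If the file does not exist, train the model first. Also check that `env_id` is correctly registered as 'stocks-v0'.","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues\u002F710",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},21899,"Running R2D2 with CUDA raises an exception about insufficient samples or unroll_len. How do I fix it?","The `unroll_len` in the config (possibly 42 by default) is too large for some environments, so the collected samples are not enough to fill the buffer. During sampling, sequences that are too short are filtered out, which triggers the exception. The fix is to reduce `unroll_len` in the config file so that it fits the episode length of the current environment.","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues\u002F561",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},21900,"Running the gym_hybrid environment fails with 'AssertionError: gym_hybrid' or a missing replay_path. What should I do?","This error is usually caused by a missing `replay_path` field in the config file. Add the field explicitly to the env config dict and set it to a valid directory for saving replay videos. For example:\n```python\ngym_hybrid_ddpg_config = dict(\n    env=dict(\n        replay_path=\".\u002Fyour_video_save_directory\",\n        # ... other env fields\n    ),\n    # ... other config sections (policy, etc.)\n)\n```","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues\u002F442",{"id":152,"question_zh":153,"answer_zh":154,"source_url":135},21901,"How do I run the GTrXL algorithm with the PPO policy?","Although the issue title asks how to run GTrXL with PPO, the discussion suggests that GTrXL is usually paired more closely with off-policy value-based methods. If you still want to try it, make sure the buffer size is set sensibly, because it is critical for off-policy methods. For implementation details, refer to the multi-GPU and data-parallel examples in the official documentation, or see the related papers on how the buffer affects performance.",{"id":156,"question_zh":157,"answer_zh":158,"source_url":140},21902,"How do I configure the DQN policy parameters (window size, data range, etc.) in the stock trading environment?","When configuring the stock trading environment (e.g. stocks-v0), pay attention to these key parameters:\n1. `window_size`: the feature length (e.g. 20).\n2. `eps_length`: the trading episode length (e.g. 1 day or 253 days).\n3. `train_range` and `test_range`: the data proportions used for training and testing (e.g. 0.8 and -0.2).\n4. `stocks_data_filename`: the raw data file name (e.g. 'STOCKS_GOOGL').\nMake sure `obs_shape` matches the input feature dimension (e.g. 62) and that `action_shape` is set correctly.","https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fissues\u002F710",[160,165,170,175,180,185,190,195,200,205,210,215,220,225,230,235,240,245,250,255],{"id":161,"version":162,"summary_zh":163,"released_at":164},128326,"v0.5.3","# API Change\n1. Extend DI-engine's supported Python versions to Python 3.7–Python 3.10\n\n# Env\n1. Add the pistonball MARL env with unit tests and an example (#833)\n2. Update the trading env (#831)\n3. Polish the PPO config to improve performance on discrete action spaces (#809)\n4. Remove unused config fields from MuJoCo PPO\n\n# Algorithm\n1. Add the AWR algorithm (#828)\n2. Add an encoder to MAVAC (#823)\n3. Add the HPT model architecture (#841)\n4. Fix reset bugs in several model wrappers (#846)\n5. Add hybrid action space support to ActionNoiseWrapper (#829)\n6. Fix the advantage computation bug in MAPPO (#812)\n\n# Enhancement\n1. Add a resume_training option so that the envstep and train_iter counters continue seamlessly when training resumes (#835)\n2. Polish the old and new DDP (distributed data parallel) implementations (#842)\n3. Make DingEnvWrapper compatible with gymnasium (#817)\n\n# Fix\n1. Fix the deletion bug in the prioritized replay buffer (#844)\n2. Fix the env reset bug in the middleware collector (#845)\n3. Fix bugs in several unit tests\n\n# Style\n1. Lower the pyecharts log level to warning and polish the installation docs (#838)\n2. Polish the notes on required dependencies\n3. Polish the API docs in detail\n4. Correct the author information in the DI-engine citation\n5. Upgrade the CI macOS version from 12 to 13\n\n# News\n1. [CleanS2S](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FCleanS2S): a single-file implementation of a high-quality, streaming speech-to-speech interactive agent.\n2. 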
[GenerativeRL](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.01245): Revisiting generative policies: a simpler reinforcement learning algorithmic perspective.\n3. [PRG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.01787): Pretrained reversible generation as unsupervised visual representation learning.\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fcompare\u002Fv0.5.2...v0.5.3\n\n**Contributors**: @PaParaZz1 @puyuan1996 @kxzxvbk @YinminZhang @zjowowen @luodi-7 @MarkHolmstrom @TairanMK","2024-12-23T08:06:37",
{"id":166,"version":167,"summary_zh":168,"released_at":169},128327,"v0.5.2","# Env\n1. Add Taxi environment (#799) (#807)\n2. Add Ising model environment (#782)\n3. Add new FrozenLake environment (#781)\n4. Polish PPO continuous-control configs for MuJoCo (#801)\n5. Fix multi_agent=True bug in MASAC SMAC config (#791)\n6. Update\u002Fspeed up pendulum PPO\n\n# Algorithm\n1. Fix GTrXL compatibility bug (#796)\n2. Fix complex-observation demo in the PPO pipeline (#786)\n3. Add naive PWIL example\n4. Fix MARL n-step TD compatibility bug\n\n# Enhancement\n1. Add GPU utility functions (#788)\n2. Add deprecated-function decorator (#778)\n\n# Style\n1. Relax Flask requirement (#811)\n2. Add new badge (hellogithub) to README (#805)\n3. Update Discord link and badge in README (#795)\n4. Fix typos in config.py (#776)\n5. Polish rl_utils API docs\n6. Add numpy\u003C2 constraint\n7. Update macOS platform test version to 12\n8. Polish CI Python versions\n\n# News\n1. [PsyDI](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FPsyDI): Building a multimodal interactive chatbot for psychological assessment\n2. [ReZero](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLightZero): Boosting MCTS-based algorithms with backward-view and entire-buffer reanalyze\n3. [UniZero](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLightZero): Generalized and efficient planning with scalable latent world models\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fcompare\u002Fv0.5.1...v0.5.2\n\n**Contributors: @PaParaZz1 @zjowowen @YinminZhang @TuTuHuss @nighood @ruiheng123 @rongkunxue @ooooo-create @eltociear**","2024-06-27T08:56:39",
{"id":171,"version":172,"summary_zh":173,"released_at":174},128328,"v0.5.1","# Env\n1. Add MADDPG PettingZoo example (#774)\n2. Polish NGU Atari configs (#767)\n3. Fix bug in the Cliff Walking env (#759)\n4. Add PettingZoo replay video demo\n5. Change the env manager's default max retry count from 5 to 1\n\n# Algorithm\n1. Add the diffusion-model-related QGPO algorithm (#757)\n2. Add HAPPO multi-agent algorithm (#717)\n3. Add DreamerV3 + MiniGrid adaptation (#725)\n4. Fix hppo entropy weight to avoid NaN errors in log_prob (#761)\n5. Fix structured action bug (#760)\n6. Polish Decision Transformer getting-started docs (#754)\n7. Fix EDAC policy\u002Fmodel bug\n\n# Fix\n1. Fix typos in env\n2. Fix pynng requirement issue\n3. Fix unit test bug in the comm module\n\n# Style\n1. Polish policy API docs (#762) (#764) (#768)\n2. Add agent API docs (#758)\n3. Polish torch_utils\u002Futils API docs (#745) (#747) (#752) (#755) (#763)\n\n# News\n1. AAAI 2024: [SO2: A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FSO2)\n2. [LMDrive: Closed-Loop End-to-End Driving with Large Language Models](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLMDrive)\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fcompare\u002Fv0.5.0...v0.5.1\n\n**Contributors: @PaParaZz1 @zjowowen @nighood @kxzxvbk @puyuan1996 @Cloud-Pku @AltmanD @HarryXuancy**","2024-02-04T15:55:45",
{"id":176,"version":177,"summary_zh":178,"released_at":179},128329,"v0.5.0","# Env\n1. Add tabmwp env (#667)\n2. Fix issues in anytrading env (#731)\n\n# Algorithm\n1. Add PromptPG algorithm (#667)\n2. Add Plan Diffuser algorithm (#700) (#749)\n3. Add new pipeline implementation of IMPALA (#713)\n4. Add dropout layers to DQN-style algorithms (#712)\n\n# Enhancement\n1. Add new pipeline agents for SAC\u002FDDPG\u002FA2C\u002FPPO with Hugging Face support (#637) (#730) (#737)\n2. Add more unit test cases for models (#728)\n3. Add collector logging in the new pipeline (#735)\n\n# Fix\n1. Fix logger middleware problems (#715)\n2. Fix PPO parallel bug (#709)\n3. Fix typo in optimizer_helper.py (#726)\n4. Fix if-condition bug in MLP dropout\n5. Fix DreX collect-data unit test bug\n\n# Style\n1. Polish comments and API docs for env manager\u002Fwrapper (#742)\n2. Polish comments and API docs for models (#722) (#729) (#734) (#736) (#741)\n3. Polish comments and API docs for policies (#732)\n4. Polish comments and API docs for rl_utils (#724)\n5. Polish comments and API docs for torch_utils (#738)\n6. Update README.md and Colab demo (#733)\n7. Update Metaworld Docker image\n\n# News\n1. NeurIPS 2023 Spotlight: [LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLightZero)\n2. OpenDILab + Hugging Face RL model zoo [link](https:\u002F\u002Fhuggingface.co\u002FOpenDILabCommunity)\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fcompare\u002Fv0.4.9...v0.5.0\n\n\n**Contributors: @PaParaZz1 @zjowowen @AltmanD @puyuan1996 @kxzxvbk @Super1ce @nighood @Cloud-Pku @zhangpaipai @ruoyuGao @eltociear**","2023-12-05T05:04:13",
{"id":181,"version":182,"summary_zh":183,"released_at":184},128330,"v0.4.9","# API Change\n1. Refactor the Decision Transformer implementation. DI-engine now supports discrete and continuous DT outputs with multimodal observations (example: `ding\u002Fexample\u002Fdt.py`)\n2. Update the multi-GPU distributed data parallel (DDP) example ([link](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fdizoo\u002Fatari\u002Fconfig\u002Fserial\u002Fspaceinvaders\u002Fspaceinvaders_dqn_config_multi_gpu_ddp.py))\n3. Change the return value of `InteractionSerialEvaluator` to drop redundant results\n\n# Env\n1. Add Cliffwalking env (#677)\n2. Add LunarLander PPO config and example\n\n# Algorithm\n1. Add BCQ offline RL algorithm (#640)\n2. Add DreamerV3 model-based RL algorithm (#652)\n3. Add tensor stream merge network tools (#673)\n4. Add scatter connection model (#680)\n5. Refactor Decision Transformer in the new pipeline and support image input and discrete output (#693)\n6. Add three variants of the Bilinear class and a FiLM class (#703)\n\n# Enhancement\n1. Polish multi-GPU DDP training for off-policy RL (#679)\n2. Add middleware for the Ape-X distributed pipeline (#696)\n3. Add example for evaluating a trained DQN (#706)\n\n# Fix\n1. Fix to_ndarray being unable to set dtype for scalars (#708)\n2. Fix evaluator episode_info return compatibility bug\n3. Fix CQL example entry using the wrong config\n4. Fix enable_save_figure env interface\n5. Fix redundant env info bug in evaluator\n6. Fix to_item unit test bug\n\n# Style\n1. Polish and simplify requirements (#672)\n2. Add Hugging Face Model Zoo badge (#674)\n3. Add OpenXLab Model Zoo badge (#675)\n4. Fix py37 macOS CI bug and update default PyTorch version from 1.7.1 to 1.12.1 (#678)\n5. Fix mujoco-py compatibility issue with cython\u003C3 (#711)\n6. Fix type typo (#704)\n7. Fix PyPI release action bug on Ubuntu 18.04\n8. Update contact info (e.g. WeChat)\n9. Polish tables in algorithm docs\n\n# New Repo\n1. [DOS](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDOS): [CVPR 2023] ReasonNet: End-to-End Driving with Temporal and Global Reasoning\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fcompare\u002Fv0.4.8...v0.4.9\n\n\n**Contributors: @PaParaZz1 @zjowowen @zhangpaipai @AltmanD @puyuan1996 @Cloud-Pku @Super1ce @kxzxvbk @jayyoung0802 @Mossforest @lxl2gf @Privilger**","2023-08-23T09:49:24",
{"id":186,"version":187,"summary_zh":188,"released_at":189},128331,"v0.4.8","# API Change\n1. The `stop value` field is no longer mandatory in configs and defaults to `math.inf`. Users can instead specify `max_env_step` or `max_train_iter` in the training entry to run with a fixed termination condition.\n\n# Env\n1. Fix gym hybrid reward dtype bug (#664)\n2. Fix Atari env id NoFrameskip bug (#655)\n3. Fix typo in gym AnyTrading env (#654)\n4. Update TD3BC D4RL config (#659)\n5. Polish BipedalWalker config\n\n# Algorithm\n1. Add EDAC offline RL algorithm (#639)\n2. Support LN and GN norm types in ResBlock (#660)\n3. Add normal value norm baseline for PPOF (#658)\n4. Polish last-layer init and norm in MLP (#650)\n5. Polish TD3 monitor variables\n\n# Enhancement\n1. Add MAPPO\u002FMASAC task example (#661)\n2. Add PPO example for complex env observations (#644)\n3. Add barrier middleware (#570)\n\n# Fix\n1. Fix abnormal collector log and add `record_random_collect` option (#662)\n2. Fix `to_item` compatibility bug (#646)\n3. Fix trainer dtype transform compatibility bug\n4. Fix PettingZoo 1.23.0 compatibility bug\n5. Fix ensemble head unit test bug\n\n# Style\n1. Fix incompatible gym version bug in Dockerfile.env (#653)\n2. Add more algorithm [docs](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F12_policies\u002Findex.html)\n\n# New Repo\n1. [LightZero](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FLightZero): A lightweight and efficient toolkit for MCTS\u002FAlphaZero\u002FMuZero algorithms.\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fcompare\u002Fv0.4.6...v0.4.7\n\n**Contributors: @PaParaZz1 @zjowowen @puyuan1996 @SolenoidWGT @Super1ce @karroyan @zhangpaipai @eltociear**","2023-05-25T05:27:22",
{"id":191,"version":192,"summary_zh":193,"released_at":194},128332,"v0.4.7","# API Change\n1. Remove the mandatory requirement for sub-fields (learn\u002Fcollect\u002Feval) in policy config (users can customize the config format)\n2. Use `wandb` as the default logger in the task pipeline\n3. Remove the `value_network` config field and its implementation from SAC and related algorithms\n\n# Env\n1. Add dmc2gym env support and baseline (#451)\n2. Update pettingzoo to the latest version (#597)\n3. Fix bugs in ICM\u002FRND+ONPPO configs and add app_door_to_key env (#564)\n4. Add TD3\u002FSAC configs for continuous-action lunarlander\n5. Polish C51 config for discrete-action lunarlander\n\n# Algorithm\n1. Add procedure cloning (PC) imitation learning algorithm (#514)\n2. Add Munchausen RL (MDQN) algorithm (#590)\n3. Add reward\u002Fvalue normalization methods: PopArt, value rescale, and SymLog (#605)\n4. Polish reward model config and training pipeline (#624)\n5. Add PPOF reward space demo support (#608)\n6. Add PPOF Atari demo support (#589)\n7. Polish DQN default config and env examples (#611)\n8. Polish SAC-related comments and clean up code\n\n# Enhancement\n1. Add language model (e.g. GPT) training utilities (#625)\n2. Remove the mandatory sub-fields in policy config (#620)\n3. Add full WandB support (#579)\n\n# Fix\n1. Fix confusing shallow copy operation on next_obs (#641)\n2. Fix unsqueeze of action_args in PDQN when its shape is 1 (#599)\n3. Fix evaluator return_info tensor type bug (#592)\n4. Fix PER bug in deque buffer wrapper (#586)\n5. Fix reward model save method compatibility bug\n6. Fix logger assertion and unit test bugs\n7. Fix BFS test compatibility bug with Python 3.9\n8. Fix zergling collector unit test bug\n\n# Style\n1. Add DI-engine Torch-RPC p2p communication Docker (#628)\n2. Add D4RL Docker (#591)\n3. Fix typo in task (#617)\n4. Fix typo in time_helper (#602)\n5. Polish README and add TreeTensor example\n6. Update contributing docs\n\n# New Plan\n- **Call for DI-engine contributors** (#621)\n\u003Cdiv align=\"center\">\n   \u003Cimg width=\"300px\" height=\"auto\" src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F33195032\u002F226930216-b191c457-85ba-48d5-ae0f-c7ed7b46e9c1.png\">\n\u003C\u002Fdiv>\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fcompare\u002Fv0.4.6...v0.4.7\n\n**Contributors: @PaParaZz1 @karroyan @zjowowen @ruoyuGao @kxzxvbk @nighood @song2181 @SolenoidWGT @PSHarold @jimmydengpeng @eltociear**\n","2023-04-11T16:55:54",
{"id":196,"version":197,"summary_zh":198,"released_at":199},128333,"v0.4.6","# API Change\n1. Middleware: `CkptSaver(cfg, policy, train_freq=100)` -> `CkptSaver(policy, cfg.exp_name, train_freq=100)`\n\n# Env\n1. Add MetaDrive env and related PPO config (#574)\n2. Add Acrobot env and related DQN config (#577)\n3. Add CarRacing env in Box2D (#575)\n4. Add new gym hybrid visualization (#563)\n5. Update CartPole IL config (#578)\n\n# Algorithm\n1. Add BDQ algorithm (#558)\n2. Add procedure cloning model (#573)\n\n# Enhancement\n1. Add simplified PPOF (PPO × Family) interface (#567) (#568) (#581) (#582)\n\n# Fix\n1. Fix `to_device` and `prev_state` bug when using Torch (#571)\n2. Fix Py38 and NumPy unit test bugs (#565)\n3. Fix typo in `contrastive_loss.py` (#572)\n4. Fix dizoo env package installation bug\n5. Fix `multi_trainer` middleware unit test bug\n\n# Style\n1. Add Evogym Docker (#580)\n2. Fix MetaWorld Docker bug\n3. Fix incompatibility with newer setuptools versions\n4. Extend TreeTensor minimum version requirement\n\n# New Paper\n1. [GoBigger](https:\u002F\u002Fopenreview.net\u002Fforum?id=NnOZT_CR26Z): [ICLR 2023] A scalable platform for cooperative-competitive multi-agent interactive simulation\n\n\n**Contributors: @PaParaZz1 @puyuan1996 @timothijoe @Cloud-Pku @ruoyuGao @Super1ce @karroyan @kxzxvbk @eltociear**","2023-02-18T13:49:56",
{"id":201,"version":202,"summary_zh":203,"released_at":204},128334,"v0.4.5","# API Change\n1. Change the default example for adding a new env from extending `BaseEnv` to using `DingEnvWrapper`.\n2. Rename `final_eval_reward` to `eval_episode_return` in all related code (including envs and evaluators).\n\n# Env\n1. Add beergame supply chain optimization env (#512).\n2. Add gym_pybullet_drones env (#526).\n3. Rename `eval reward` to `episode return` (#536).\n\n# Algorithm\n1. Add policy gradient algorithm implementation (#544).\n2. Add MADDPG algorithm implementation (#550).\n3. Add IMPALA continuous-action algorithm implementation (#551).\n4. Add MADQN algorithm implementation (#540).\n\n# Enhancement\n1. Add IMPALA-type distributed training scheme (#321).\n2. Add load and save methods for replay buffer (#542).\n3. Add more DingEnvWrapper examples (#525).\n4. Add more info visualization for evaluator (#538).\n5. Add trace logging for subprocess env manager (#534).\n\n# Fix\n1. Fix HalfCheetah TD3 config file (#537).\n2. Fix MuJoCo `action_clip` argument compatibility bug (#535).\n3. Fix Atari A2C config entry bug.\n4. Fix DreX unit test compatibility bug.\n\n# Style\n1. Add DI-engine roadmap issue (#548).\n2. Update related project links and new env docs.\n\n# New Project\n1. [PPOxFamily](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FPPOxFamily): PPO x Family DRL tutorial course.\n2. [ACE](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FACE): Official PyTorch implementation of the [AAAI 2023] paper \"ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency\".\n\n**Contributors: @PaParaZz1 @sailxjx @zjowowen @hiha3456 @Weiyuhong-1998 @kxzxvbk @song2181 @zerlinwang**","2022-12-13T17:40:04",
{"id":206,"version":207,"summary_zh":208,"released_at":209},128335,"v0.4.4","# API Change\n1. `context` in the new task pipeline is now implemented as a `dataclass` instead of a `dict`.\n2. The recommended visualization tool is now `wandb` instead of `tensorboard`.\n\n# Env\n1. Add modified gym-hybrid env with moving, sliding, and hardmove actions (#505) (#519).\n2. Add evogym support (#495) (#527).\n3. Add `save_replay_gif` option (#506).\n4. Adapt minigrid_env and related configs to the latest MiniGrid v2.0.0 (#500).\n\n# Algorithm\n1. Add PCGrad optimizer (#489).\n2. Add some features to MLP and ResBlock (#511).\n3. Remove MCTS-related modules (#518) (we will release a dedicated MCTS repo in the future).\n\n# Enhancement\n1. Add wandb middleware and demo (#488) (#523) (#528).\n2. Add new properties to Context (#499).\n3. Add single-env policy wrapper for policy deployment ([demo](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Fblob\u002Fmain\u002Fdizoo\u002Fclassic_control\u002Fcartpole\u002Fentry\u002Fcartpole_c51_deploy.py)).\n4. Add custom model demo and docs ([doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F04_best_practice\u002Fcustom_model_zh.html)).\n\n# Fix\n1. Fix build logger args and unit tests (#522).\n2. Fix `total_loss` calculation in PDQN (#504).\n3. Fix save GIF function bug.\n4. Fix level sample unit test bug.\n\n# Style\n1. Update contact email address (#503).\n2. Polish env log and resblock naming.\n3. Add details button in README.\n\n# New Repo\n- [DI-1024](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-1024): Deep Reinforcement Learning + 1024 Game\n\n**Contributors: @PaParaZz1 @puyuan1996 @karroyan @hiha3456 @davide97l @Weiyuhong-1998 @zjowowen @norman26625**","2022-10-31T08:52:25",
{"id":211,"version":212,"summary_zh":213,"released_at":214},128336,"v0.4.3","# Env\r\n1. add rule-based gomoku expert (#465)\r\n\r\n# Algorithm\r\n1. fix a2c policy batch size bug (#481)\r\n2. enable activation option in collaq attention and mixer\r\n3. minor fix about IBC (#477)\r\n\r\n# Enhancement\r\n1. add IGM support (#486)\r\n2. add tb logger middleware and demo\r\n\r\n# Fix\r\n1. the type conversion in ding_env_wrapper (#483)\r\n2. di-orchestrator version bug in unittest (#479)\r\n3. data collection errors caused by shallow copies (#475)\r\n4. gym==0.26.0 seed args bug\r\n\r\n# Style\r\n1. add readme tutorial link (environment & algorithm) (#490) (#493)\r\n2. 
adjust location of the default_model method in policy (#453)\r\n\r\n# New Repo\r\n- [DI-sheep](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-sheep): Deep Reinforcement Learning + 3 Tiles Game\r\n\r\n**Contributors: @PaParaZz1 @nighood @norman26625 @ZHZisZZ @cpwan @mahuangxu**\r\n","2022-09-22T17:02:00",{"id":216,"version":217,"summary_zh":218,"released_at":219},128337,"v0.4.2","# API Change\r\n1. `config` will be deep-copied by default in the `compile_config` function\r\n2. After the `compile_config` function is called, the `git log` and `git diff` information of the current code repo will be saved in the `exp_name` directory\r\n\r\n# Env\r\n1. add rocket env (#449)\r\n2. update pettingzoo env and improve related performance (#457)\r\n3. add mario env demo (#443)\r\n4. add MAPPO multi-agent config (#464)\r\n5. add mountain car (discrete action) environment (#452)\r\n6. fix multi-agent mujoco gym compatibility bug\r\n7. fix gfootball env save_replay variable init bug\r\n\r\n# Algorithm\r\n1. add IBC (Implicit Behaviour Cloning) algorithm (#401)\r\n2. add BCO (Behaviour Cloning from Observation) algorithm (#270)\r\n3. add continuous PPOPG algorithm (#414)\r\n4. add PER in CollaQ (#472)\r\n5. add activation option in QMIX and CollaQ\r\n\r\n# Enhancement\r\n1. update ctx to dataclass (#467)\r\n\r\n# Fix\r\n1. base_env FinalMeta bug with gym 0.25.0-0.25.1\r\n2. config in-place modification bug\r\n3. ding CLI no-argument problem\r\n4. import errors after running setup.py (jinja2, markupsafe)\r\n5. conda py3.6 and cross-platform build bug\r\n\r\n# Style\r\n1. add project state and datetime in log dir (#455)\r\n2. polish notes for q-learning model (#427)\r\n3. revise mujoco dockerfile and validation (#474)\r\n4. add dockerfile for cityflow env\r\n5. 
polish default output log format\r\n\r\n**Contributors: @PaParaZz1 @ZHZisZZ @zjowowen @song2181 @zerlinwang @i-am-tc @hiha3456 @nighood @kxzxvbk @Weiyuhong-1998 @RobinC94**","2022-09-07T18:11:23",{"id":221,"version":222,"summary_zh":223,"released_at":224},128338,"v0.4.1","# API Change\r\n1. upgrade Python version from `3.6-3.8` to `3.7-3.9`\r\n2. upgrade gym version from `0.20.0` to `0.25.0`; many `env_id`s need to be updated (e.g., `Pendulum-v0` to `Pendulum-v1`) (#434)\r\n3. upgrade torch version from `1.10.0` to `1.12.0`\r\n4. upgrade mujoco bin from `2.0.0` to `2.1.0`\r\n5. add new task pipeline demo (DDPG\u002FTD3\u002FD4PG\u002FC51\u002FQRDQN\u002FIQN\u002FSQIL\u002FTREX\u002FPDQN) (#374, #380, #384, #407)\r\n\r\n# Env (dizoo)\r\n1. add gym anytrading env (#424)\r\n2. add board games env (tictactoe, gomoku, chess) (#356)\r\n3. add sokoban env (#397) (#429)\r\n4. add BC and DQN demo for gfootball (#418) (#423)\r\n5. add discrete pendulum env (#395)\r\n\r\n# Algorithm\r\n1. add STEVE model-based algorithm (#363)\r\n2. add PLR algorithm (#408)\r\n3. plug ST-DIM into PPO (#379)\r\n\r\n# Enhancement\r\n1. add final result saving in training pipeline\r\n\r\n# Fix\r\n1. random policy randomness bug\r\n2. action_space seed compatibility bug\r\n3. discard message sent by self in redis mq (#354)\r\n4. remove pace controller (#400)\r\n5. import error in serial_pipeline_trex (#410)\r\n6. unittest hang and fail bug (#413)\r\n7. DREX collect data bug\r\n8. remove unused import cv2\r\n9. ding CLI env\u002Fpolicy option bug\r\n\r\n# Style\r\n1. add buffer api description (#371)\r\n2. polish VAE comments (#404)\r\n3. unittest for FQF (#412)\r\n4. add metaworld dockerfile (#432)\r\n5. remove opencv requirement in default setting\r\n6. update long description in setup.py\r\n\r\n# New Repo\r\n1. [InterFuser](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FInterFuser): Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer\r\n2. 
[awesome-decision-transformer](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-decision-transformer): A curated list of Decision Transformer resources\r\n3. [awesome-exploration-RL](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-exploration-rl): A curated list of awesome exploration RL resources\r\n\r\n**Contributors: @PaParaZz1 @zjowowen @sailxjx @puyuan1996 @ZHZisZZ @lixl-st @Cloud-Pku @Weiyuhong-1998 @karroyan @kxzxvbk @song2181 @nighood @zhangpaipai @Hcnaeg**","2022-08-14T09:14:52",{"id":226,"version":227,"summary_zh":228,"released_at":229},128339,"v0.4.0","# API Change\r\n1. refactor DI-engine doc and update doc links [doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002F) | [Chinese doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002F)\r\n2. refactor default logging lib and add DI-toolkit (ditk) requirement (just run `pip install DI-toolkit`)\r\n\r\n\r\n# Env (dizoo)\r\n1. add all MAPPO\u002FMASAC configs for SMAC (#310) **(SOTA results in SMAC!!!)**\r\n2. add dmc2gym env (#344) (#360)\r\n3. remove DI-star requirements of dizoo\u002Fsmac, use official pysc2 (#302)\r\n4. add latest GAIL mujoco config (#298)\r\n5. polish procgen env (#311)\r\n6. add MBPO configs for ant and humanoid (#314)\r\n7. fix slime volley env obs space bug when agent_vs_agent\r\n8. fix smac env obs space bug\r\n9. fix import path error in lunarlander (#362)\r\n\r\n# Algorithm\r\n1. add Decision Transformer algorithm (#327) (#364)\r\n2. add on-policy PPG algorithm (#312)\r\n3. add DDPPO & add model-based SAC with lambda-return algorithm (#332)\r\n4. add infoNCE loss and ST-DIM algorithm (#326)\r\n5. add FQF distributional RL algorithm (#274)\r\n6. add continuous BC algorithm (#318)\r\n7. add pure policy gradient PPO algorithm (#382)\r\n8. add SQIL + SAC algorithm (#348)\r\n9. polish NGU and related modules (#283) (#343) (#353)\r\n10. add marl distributional td loss (#331)\r\n\r\n# Enhancement\r\n1. 
add new worker middleware (#236) **(new DRL programming model and pipeline example)**\r\n2. refactor model-based RL pipeline (ding\u002Fworld_model) (#332)\r\n3. refactor logging system in the whole DI-engine (#316)\r\n4. add env supervisor design (#330)\r\n5. support async reset for envpool env manager (#250)\r\n6. add log videos to tensorboard (#320)\r\n7. refactor impala cnn encoder interface (#378)\r\n\r\n# Fix\r\n1. env save replay bug\r\n2. transformer mask inplace operation bug\r\n3. transition_with_policy_data bug in SAC and PPG\r\n\r\n# Style\r\n1. add dockerfile for ding:hpc image (#337)\r\n2. pin mpire to 2.3.5, which handles default processes more elegantly (#306)\r\n3. use FORMAT_DIR instead of .\u002Fding (#309)\r\n4. update quickstart colab link (#347)\r\n5. polish comments in ding\u002Fmodel\u002Fcommon (#315)\r\n6. update mujoco docker download path (#386)\r\n7. fix protobuf new version compatibility bug\r\n8. fix torch 1.8.0 torch.div compatibility bug\r\n9. update doc links in readme\r\n10. add outline in readme and update wechat image\r\n11. update head image and refactor docker dir\r\n\r\n**Contributors: @PaParaZz1 @sailxjx @puyuan1996 @ZHZisZZ @Will-Nie @zjowowen @HansBug @zerlinwang @Weiyuhong-1998 @davide97l @hiha3456 @LuciusMos @kxzxvbk @lixl-st @zhangpaipai @song2181 @karroyan**","2022-06-21T12:37:50",{"id":231,"version":232,"summary_zh":233,"released_at":234},128340,"v0.3.1","## API Change\r\n1. Substitute `gym.wrappers.RecordVideo` for `gym.wrappers.Monitor` to save video replay\r\n2. Substitute `policy\u002Fbc.py` for `policy\u002Fil.py` and update relevant serial_pipeline and unittest\r\n3. Polish all the configurations in dizoo with our new config guideline\r\n\r\n## Env (dizoo)\r\n1. polish and standardize dizoo config (#252) (#255) (#249) (#246) (#262) (#261) (#266) (#273) (#263) (#280) (#259) (#286) (#277) (#290) (#289) (#299)\r\n4. add GRF academic env and config (#281)\r\n5. update env interface of GRF (#258)\r\n6. 
update D4RL offline RL env and config (#285)\r\n7. polish PomdpAtariEnv (#254)\r\n\r\n## Algorithm\r\n1. DREX Inverse RL algorithm (#218)\r\n\r\n## Feature\r\n1. separate mq and parallel modules, add redis (#247)\r\n2. rename env variables; fix attach_to parameter (#244)\r\n3. env implementation check (#275)\r\n4. adjust and set the max column number of tabulate in log (#296)\r\n5. speed up GTrXL forward method + GRU unittest (#253) (#292)\r\n6. add drop_extra option for sample collect\r\n\r\n## Fix\r\n1. add act_scale in DingEnvWrapper; fix envpool env manager (#245)\r\n2. auto_reset=False and env_ref bug in env manager (#248)\r\n3. data type and deepcopy bug in RND (#288)\r\n4. share_memory bug and multi_mujoco env (#279)\r\n5. some bugs in GTrXL (#276)\r\n6. update gym_vector_env_manager and add more unittests (#241)\r\n7. mdpolicy random collect bug (#293)\r\n8. gym.wrapper save video replay bug\r\n9. collect abnormal step format bug and add unittest\r\n\r\n## Test\r\n1. add buffer benchmark & socket test (#284)\r\n\r\n## Style\r\n1. upgrade mpire (#251)\r\n2. add GRF (Google Research Football) docker (#256)\r\n3. update policy and gail comment\r\n\r\n**Contributors: @PaParaZz1 @sailxjx @puyuan1996 @Will-Nie @davide97l @hiha3456 @zjowowen @Weiyuhong-1998 @LuciusMos @kxzxvbk @lixl-st @YinminZhang @song2181 @Hcnaeg @norman26625 @jayyoung0802 @RobinC94 @HansBug**","2022-04-23T08:19:37",{"id":236,"version":237,"summary_zh":238,"released_at":239},128341,"v0.3.0","# API Change\r\n1. add new `BaseEnv` definition:\r\n    - remove `info` method\r\n    - add `random_action` method\r\n    - add `observation_space`, `action_space`, `reward_space` properties\r\n    - [Env English doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002Fbest_practice\u002Fding_env.html) | [Env Chinese doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002Fbest_practice\u002Fding_env_zh.html)\r\n2. 
modify the return value of the `eval` method in the `InteractionSerialEvaluator` class from `Tuple[bool, float]` to `Tuple[bool, dict]`.\r\n3. switch the default logger to the rich logger; set an env variable such as `export ENABLE_RICH_LOGGING=False` to disable it.\r\n4. add `train_iter` and `env_step` arguments to the ding CLI.\r\n    - e.g. `ding -m serial -c pendulum_sac_config.py -s 0 --train-iter 1e3`\r\n5. remove the default `n_sample\u002Fn_episode` value from the policy default config.\r\n\r\n# Env (dizoo)\r\n1. add bitflip HER DQN benchmark (#192) (#193) (#197)\r\n2. add slime volley league training demo (#229)\r\n\r\n# Algorithm\r\n1. Gated Transformer-XL (GTrXL) algorithm (#136)\r\n2. TD3 + VAE (HyAR) latent action algorithm (#152)\r\n3. stochastic dueling network (#234)\r\n4. use log prob instead of prob in ACER (#186)\r\n\r\n# Feature\r\n1. support envpool env manager (#228)\r\n2. add league main and other improvements in new framework (#177) (#214)\r\n3. add pace controller middleware in new framework (#198)\r\n4. add auto recover option in new framework (#242)\r\n5. add k8s parser in new framework (#243)\r\n6. support async event handler and logger (#213)\r\n7. add grad norm calculator (#205)\r\n8. add gym vector env manager (#147)\r\n9. add train_iter and env_step in serial pipeline (#212)\r\n10. add rich logger handler (#219) (#223) (#232)\r\n11. add naive lr_scheduler demo\r\n\r\n# Refactor\r\n1. 
new BaseEnv and DingEnvWrapper (#171) (#231) (#240) [Env English doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002Fbest_practice\u002Fding_env.html) | [Env Chinese doc](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002Fbest_practice\u002Fding_env_zh.html)\r\n\r\n# Polish\r\n### Improve configurations in dizoo and add more algorithm benchmarks [doc example](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002Fhands_on\u002Fdqn.html) | [Chinese doc example](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fzh_CN\u002Flatest\u002Fhands_on\u002Fdqn_zh.html)\r\n1. MAPPO and MASAC smac config (#209) (#239)\r\n2. QMIX smac config (#175)\r\n3. R2D2 atari config (#181)\r\n4. A2C atari config (#189)\r\n5. GAIL box2d and mujoco config (#188)\r\n6. ACER atari config (#180)\r\n7. SQIL atari config (#230)\r\n8. TREX atari\u002Fmujoco config\r\n9. IMPALA atari config\r\n10. MBPO\u002FD4PG mujoco config\r\n\r\n# Fix\r\n1. random_collect compatibility with episode collector (#190)\r\n2. remove default n_sample\u002Fn_episode value in policy config (#185)\r\n3. PDQN model bug on gpu device (#220)\r\n4. TREX algorithm CLI bug (#182)\r\n5. DQfD JE computation bug and move to AdamW optimizer (#191)\r\n6. pytest problem for parallel middleware (#211)\r\n7. mujoco numpy compatibility bug\r\n8. markupsafe 2.1.0 bug\r\n9. framework parallel module network emit bug\r\n10. mpire bug and disable algotest in py3.8\r\n11. lunarlander env import and env_id bug\r\n12. icm unittest repeat name bug\r\n13. buffer thruput close bug\r\n\r\n# Test\r\n1. resnet unittest (#199)\r\n2. SAC\u002FSQN unittest (#207)\r\n3. CQL\u002FR2D3\u002FGAIL unittest (#201)\r\n4. NGU td unittest (#210)\r\n5. model wrapper unittest (#215)\r\n6. MAQAC model unittest (#226)\r\n\r\n# Style\r\n1. 
add doc docker (#221) (LaTeX support)\r\n\r\n**Contributors: @PaParaZz1 @sailxjx @puyuan1996 @Will-Nie @Weiyuhong-1998 @davide97l @zjowowen @LuciusMos @kxzxvbk @Hcnaeg @jayyoung0802 @simonat2011 @jiaruonan**","2022-03-24T08:59:27",{"id":241,"version":242,"summary_zh":243,"released_at":244},128342,"v0.2.3","# API Change\r\n1. move `actor_head_type` to `action_space` (affects DDPG\u002FTD3\u002FSAC)\r\n2. add multiple seeds in CLI: `ding -m serial -c cartpole_dqn_config.py -s 0 -s 1 -s 2`\r\n3. add new replay buffer (which separates algorithm and storage); see [buffer](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine\u002Ftree\u002Fmain\u002Fding\u002Fworker\u002Fbuffer)\r\n4. add new main pipeline for async\u002Fparallel framework [tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002Fdistributed\u002Findex.html)\r\n\r\n# Env (dizoo)\r\n1. add multi-agent mujoco env (#146)\r\n2. add delayed-reward mujoco env (#145)\r\n3. fix port conflict in gym_soccer (#139)\r\n\r\n# Algorithm\r\n1. MASAC algorithm (#112)\r\n2. TREX IRL algorithm (#119) (#144)\r\n3. H-PPO hybrid action space algorithm (#140)\r\n4. residual link in R2D2 (#150)\r\n5. gumbel softmax (#169)\r\n6. move actor_head_type to action_space field\r\n\r\n# Feature\r\n1. new main pipeline and async\u002Fparallel framework (#142) (#166) (#168)\r\n2. refactor buffer, separate algorithm and storage (#129)\r\n3. CLI in new pipeline (ditask) (#160)\r\n4. add multiprocess tblogger, fix circular reference problem (#156)\r\n5. add multiple seed CLI\r\n6. polish eps_greedy_multinomial_sample in model_wrapper (#154)\r\n\r\n# Fix\r\n1. R2D3 abs priority problem (#158) (#161)\r\n2. multi-discrete action space policies random action bug (#167)\r\n3. doc generation bug with enum_tools (#155)\r\n\r\n# Style\r\n1. more comments about R2D2 (#149)\r\n2. 
add doc about how to migrate a new env [link](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002Fbest_practice\u002Fding_env.html)\r\n3. add env tutorial docs for dizoo [link](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002Fenv_tutorial\u002Findex.html)\r\n4. add conda auto release (#148)\r\n5. update zh doc link\r\n6. update kaggle tutorial link\r\n\r\n# New Repo\r\n1. [awesome-model-based-RL](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Fawesome-model-based-RL): A curated list of awesome Model-Based RL resources\r\n2. [DI-smartcross](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-smartcross): Decision AI in Traffic Light Control\r\n\r\n**Contributors: @PaParaZz1 @sailxjx @puyuan1996 @Will-Nie @Weiyuhong-1998 @LikeJulia @RobinC94 @LuciusMos @mingzhang96 @shgqmrf15 @zjowowen**","2022-01-04T06:43:55",{"id":246,"version":247,"summary_zh":248,"released_at":249},128343,"v0.2.2","# Env (dizoo)\r\n1. apple key to door treasure env (#128)\r\n2. bsuite memory benchmark (#138)\r\n3. polish atari impala config\r\n\r\n# Algorithm\r\n1. Guided Cost IRL algorithm (#57)\r\n2. ICM exploration algorithm (#41)\r\n3. MP-DQN hybrid action space algorithm (#131)\r\n4. add loss statistics and polish r2d3 pong config (#126)\r\n\r\n# Enhancement\r\n1. add renew env mechanism in env manager and update timeout mechanism (#127) (#134)\r\n\r\n# Fix\r\n1. async subprocess env manager reset bug (#137)\r\n2. keepdims name bug in model wrapper\r\n3. on-policy ppo value norm bug\r\n4. GAE and RND unittest bug\r\n5. hidden state wrapper h tensor compatibility\r\n6. naive buffer auto config create bug\r\n\r\n# Style\r\n1. add supporters list\r\n\r\n# New Repo Feature\r\n1. 
[treevalue speed benchmark](https:\u002F\u002Fgithub.com\u002Fopendilab\u002Ftreevalue#speed-performance) \r\n\r\n**Contributors: @PaParaZz1 @puyuan1996 @RobinC94 @LikeJulia @Will-Nie @Weiyuhong-1998 @timothijoe @davide97l @lichuminglcm @YinminZhang**","2021-12-03T14:23:15",{"id":251,"version":252,"summary_zh":253,"released_at":254},128344,"v0.2.1","# API Change\r\n1. remove torch from all envs (numpy array is the basic data format in env)\r\n2. remove `on_policy` field from all configs\r\n3. change `eval_freq` from 50 to 1000\r\n\r\n# Tutorial and Doc\r\n1. [env tutorial](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Flatest\u002Fenv_tutorial\u002Findex.html)\u002F[env tutorial (Chinese)](https:\u002F\u002Fdi-engine-docs.readthedocs.io\u002Fen\u002Fmain-zh\u002Fenv_tutorial\u002Findex_zh.html)\r\n\r\n\r\n# Env (dizoo)\r\n1. gym-hybrid env (#86)\r\n2. gym-soccer (HFO) env (#94)\r\n3. Go-Bigger env baseline (#95)\r\n4. sac and ppo config for bipedalwalker env (#121)\r\n\r\n# Algorithm\r\n1. DQfD Imitation Learning algorithm (#48) (#98)\r\n2. TD3BC offline RL algorithm (#88)\r\n3. MBPO model-based RL algorithm (#113)\r\n4. PADDPG hybrid action space algorithm (#109)\r\n5. PDQN hybrid action space algorithm (#118)\r\n6. fix R2D2 bugs and produce benchmark, add naive NGU (#40)\r\n7. self-play training demo in slime_volley env (#23)\r\n8. add example of GAIL entry + config for mujoco (#114)\r\n\r\n# Enhancement\r\n1. enable arbitrary policy num in serial sample collector\r\n2. add torch DataParallel for single-machine multi-GPU\r\n3. add registry force_overwrite argument\r\n4. add naive buffer periodic throughput seconds argument\r\n\r\n\r\n# Fix\r\n1. target model wrapper hard reset bug\r\n2. fix learn state_dict target model bug\r\n3. ppo bugs and update atari ppo offpolicy config (#108)\r\n4. pyyaml version bug (#99)\r\n5. small fix on bsuite environment (#117)\r\n6. discrete cql unittest bug\r\n7. release workflow bug\r\n8. base policy model state_dict overlap bug\r\n9. 
remove on_policy option in dizoo config and entry\r\n10. remove torch from env\r\n\r\n\r\n# Test\r\n1. add pure docker setting test (#103)\r\n2. add unittest for dataset and evaluator (#107)\r\n3. add unittest for on-policy algorithm (#92)\r\n4. add unittest for ppo and td (MARL case) (#89)\r\n\r\n\r\n# Style\r\n1. gym version == 0.20.0\r\n2. torch version >= 1.1.0, \u003C= 1.10.0\r\n3. ale-py == 0.7.0\r\n\r\n\r\n# New Repo\r\n- [Go-Bigger](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FGoBigger) OpenDILab Multi-Agent Decision Intelligence Environment\r\n- [GoBigger-Challenge-2021](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FGoBigger-Challenge-2021) Basic code and description for GoBigger challenge 2021\r\n\r\n**Contributors: @PaParaZz1 @puyuan1996 @Will-Nie @YinminZhang @Weiyuhong-1998 @LikeJulia @sailxjx @davide97l @jayyoung0802  @lichuminglcm @yifan123 @RobinC94 @zjowowen**","2021-11-22T08:15:20",{"id":256,"version":257,"summary_zh":258,"released_at":259},128345,"v0.2.0","# API Change\r\n1. `SampleCollector` renamed to `SampleSerialCollector`\r\n2. `EpisodeCollector` renamed to `EpisodeSerialCollector`\r\n3. `BaseSerialEvaluator` renamed to `InteractionSerialEvaluator`\r\n4. `ZerglingCollector` renamed to `ZerglingParallelCollector`\r\n5. `OneVsOneCollector` renamed to `MarineParallelCollector`\r\n6. `AdvancedBuffer` registry name changed from `priority` to `advanced`\r\n\r\n\r\n# Env (dizoo)\r\n1. overcooked env (#20)\r\n2. procgen env (#26)\r\n3. modified predator env (#30)\r\n4. d4rl env (#37)\r\n5. imagenet dataset (#27)\r\n6. bsuite env (#58)\r\n7. move atari_py to ale-py\r\n\r\n# Algorithm\r\n1. SQIL algorithm (#25) (#44)\r\n2. CQL algorithm (discrete\u002Fcontinuous) (#37) (#68)\r\n3. MAPPO algorithm (#62)\r\n4. WQMIX algorithm (#24)\r\n5. D4PG algorithm (#76)\r\n6. update multi-discrete policy (dqn, ppo, rainbow) (#51) (#72)\r\n\r\n# Enhancement\r\n1. image classification supervised training pipeline (#27)\r\n2. 
add force_reproducibility option in subprocess env manager\r\n3. add\u002Fdelete\u002Frestart replicas via cli for k8s\r\n4. add league metric (trueskill and elo) (#22)\r\n5. add tb in naive buffer and modify tb in advanced buffer (#39)\r\n6. add k8s launcher and di-orchestrator launcher, add related unittest (#45) (#49)\r\n7. add hyper-parameter scheduler module (#38)\r\n8. add plot function (#59)\r\n\r\n# Fix\r\n1. acer weight bug and update atari result (#21)\r\n2. mappo nan bug and dict obs unsqueeze bug (#54)\r\n3. r2d2 hidden state and obs pre-processing bug (#36) (#52)\r\n4. ppo bug when using dual_clip and adv > 0\r\n5. qmix double_q hidden state bug\r\n6. spawn context problem in interaction unittest (#69)\r\n7. formatted config no eval bug (#53)\r\n8. catch statements that can never succeed, and system proxy bug (#71) (#79)\r\n9. lunarlander config polish\r\n10. c51 head dimension mismatch bug\r\n11. mujoco config typo bug\r\n12. ppg atari config multi buffer bug\r\n13. max use and priority update special branch bug in advanced_buffer\r\n\r\n# Style\r\n1. add docker deploy in github workflow (#70) (#78) (#80)\r\n2. support PyTorch 1.9.0\r\n3. add algo\u002Fenv list in README\r\n4. rename advanced_buffer registry name to advanced\r\n\r\n# New Repo\r\n- [DI-treetensor](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-treetensor): Tree Nested PyTorch Tensor Lib\r\n\r\n**Contributors: @PaParaZz1 @YinminZhang @Will-Nie @puyuan1996 @Weiyuhong-1998 @HansBug @sailxjx @simonat2011 @konnase @RobinC94 @LikeJulia @LuciusMos @jayyoung0802 @yifan123 @davide97l @garyzhang99**","2021-09-30T15:00:39"]