[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-kengz--awesome-deep-rl":3,"tool-kengz--awesome-deep-rl":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":80,"owner_email":81,"owner_twitter":79,"owner_website":82,"owner_url":83,"languages":79,"stars":84,"forks":85,"last_commit_at":86,"license":87,"difficulty_score":88,"env_os":89,"env_gpu":89,"env_ram":89,"env_deps":90,"category_tags":93,"github_topics":94,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":100,"updated_at":101,"faqs":102,"releases":103},1046,"kengz\u002Fawesome-deep-rl","awesome-deep-rl","A curated list of awesome Deep Reinforcement Learning resources.","awesome-deep-rl 是一个精心整理的深度强化学习（Deep RL）资源清单，旨在为社区提供一站式的优质内容导航。面对深度强化学习领域资源分散、学习曲线陡峭的现状，awesome-deep-rl 通过系统化的分类整理，帮助用户快速定位所需的代码库、基准测试结果、训练环境及学习材料。\n\n这份清单涵盖了从工业界到学术界的广泛资源，包括 Google DeepMind、OpenAI、Facebook 等机构开源的核心库（如 Ray RLLib、Dopamine、Acme），以及实用的教程、书籍和技术博客。无论是希望复现算法的研究人员，还是寻求高效开发框架的工程师，亦或是想要入门该领域的学生，都能从中获益。\n\nawesome-deep-rl 的独特价值在于其全面性与时效性，不仅罗列了主流算法实现，还提供了竞赛信息和发展时间线，极大地降低了信息检索成本。如果你正在探索深度强化学习的奥秘，awesome-deep-rl 将是值得信赖的起点，帮助你少走弯路，高效开展研究与开发工作。","# Awesome Deep RL [![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fawesome.re)\nA curated list of awesome Deep Reinforcement Learning resources.\n\n## Contents\n\n- [Libraries](#libraries)\n- [Benchmark 
Results](#benchmark-results)\n- [Environments](#environments)\n- [Competitions](#competitions)\n- [Timeline](#timeline)\n- [Books](#books)\n- [Tutorials](#tutorials)\n- [Blogs](#blogs)\n\n## Libraries\n\n- [AgileRL](https:\u002F\u002Fgithub.com\u002FAgileRL\u002FAgileRL) - A Deep Reinforcement Learning library focused on improving development by introducing RLOps - MLOps for reinforcement learning.\n- [Berkeley Ray RLLib](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fray) - An open-source library for reinforcement learning that offers both high scalability and a unified API for a variety of applications.\n- [Berkeley Softlearning](https:\u002F\u002Fgithub.com\u002Frail-berkeley\u002Fsoftlearning) - A reinforcement learning framework for training maximum entropy policies in continuous domains.\n- [Catalyst](https:\u002F\u002Fgithub.com\u002Fcatalyst-team\u002Fcatalyst) - Accelerated DL & RL.\n- [ChainerRL](https:\u002F\u002Fgithub.com\u002Fchainer\u002Fchainerrl) - A deep reinforcement learning library built on top of Chainer.\n- [DeepMind Acme](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Facme) - A research framework for reinforcement learning.\n- [DeepMind OpenSpiel](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fopen_spiel) - A collection of environments and algorithms for research in general reinforcement learning and search\u002Fplanning in games.\n- [DeepMind TRFL](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Ftrfl) - TensorFlow Reinforcement Learning.\n- [DeepRL](https:\u002F\u002Fgithub.com\u002FShangtongZhang\u002FDeepRL) - Modularized Implementation of Deep RL Algorithms in PyTorch.\n- [DeepX machina](https:\u002F\u002Fgithub.com\u002FDeepX-inc\u002Fmachina) - A library for real-world Deep Reinforcement Learning which is built on top of PyTorch.\n- [d3rlpy](https:\u002F\u002Fgithub.com\u002Ftakuseno\u002Fd3rlpy) - An offline deep reinforcement learning library.\n- [Facebook ELF](https:\u002F\u002Fgithub.com\u002Fpytorch\u002FELF) - A platform for 
game research with AlphaGoZero\u002FAlphaZero reimplementation.\n- [Facebook ReAgent](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FReAgent) - A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)\n- [garage](https:\u002F\u002Fgithub.com\u002Frlworkgroup\u002Fgarage) - A toolkit for reproducible reinforcement learning research.\n- [Google Dopamine](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fdopamine) - A research framework for fast prototyping of reinforcement learning algorithms.\n- [Google TF-Agents](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fagents) - TF-Agents is a library for Reinforcement Learning in TensorFlow.\n- [K-Scale Labs - ksim](https:\u002F\u002Fgithub.com\u002Fkscalelabs\u002Fksim) - A modular and easy-to-use framework for training policies in simulation.\n- [K-Scale Labs - ksim-gym](https:\u002F\u002Fgithub.com\u002Fkscalelabs\u002Fksim-gym) - K-Sim Gym: Making robots useful with RL. Built on top of K-Sim.\n- [MAgent](https:\u002F\u002Fgithub.com\u002Fgeek-ai\u002FMAgent) - A Platform for Many-agent Reinforcement Learning.\n- [Maze](https:\u002F\u002Fgithub.com\u002Fenlite-ai\u002Fmaze) - Application-oriented deep reinforcement learning framework addressing real-world decision problems.\n- [MushroomRL](https:\u002F\u002Fgithub.com\u002FMushroomRL\u002Fmushroom-rl) - Python library for Reinforcement Learning experiments.\n- [NervanaSystems coach](https:\u002F\u002Fgithub.com\u002FNervanaSystems\u002Fcoach) - Reinforcement Learning Coach by Intel AI Lab.\n- [OpenAI Baselines](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines) - High-quality implementations of reinforcement learning algorithms.\n- [OpenRL](https:\u002F\u002Fgithub.com\u002FOpenRL-Lab\u002Fopenrl) - An open-source general reinforcement learning research framework.\n- [pytorch-a2c-ppo-acktr-gail](https:\u002F\u002Fgithub.com\u002Fikostrikov\u002Fpytorch-a2c-ppo-acktr-gail) - PyTorch implementation of Advantage Actor Critic (A2C), 
Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).\n- [pytorch-rl](https:\u002F\u002Fgithub.com\u002Fnavneet-nmk\u002Fpytorch-rl) - Model-free deep reinforcement learning algorithms implemented in Pytorch.\n- [reaver](https:\u002F\u002Fgithub.com\u002Finoryy\u002Freaver) - A modular deep reinforcement learning framework with a focus on various StarCraft II based tasks.\n- [RLgraph](https:\u002F\u002Fgithub.com\u002Frlgraph\u002Frlgraph) - Modular computation graphs for deep reinforcement learning.\n- [RLkit](https:\u002F\u002Fgithub.com\u002Fvitchyr\u002Frlkit) - Reinforcement learning framework and algorithms implemented in PyTorch.\n- [rlpyt](https:\u002F\u002Fgithub.com\u002Fastooke\u002Frlpyt) - Reinforcement Learning in PyTorch.\n- [RLtools](https:\u002F\u002Fgithub.com\u002Frl-tools\u002Frl-tools) - The fastest deep reinforcement learning library for continuous control, implemented in pure, dependency-free C++ (Python bindings available as well).\n- [skrl](https:\u002F\u002Fgithub.com\u002FToni-SM\u002Fskrl) - Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Omniverse Isaac Gym and Isaac Lab.\n- [SLM Lab](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab) - Modular Deep Reinforcement Learning framework in PyTorch.\n- [Stable Baselines](https:\u002F\u002Fgithub.com\u002Fhill-a\u002Fstable-baselines) - A fork of OpenAI Baselines, implementations of reinforcement learning algorithms.\n- [TensorForce](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce) - A TensorFlow library for applied reinforcement learning.\n- [Tianshou](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002Ftianshou\u002F) - Tianshou (天授) is a reinforcement learning platform based on pure PyTorch.\n- [TorchRL](https:\u002F\u002Fdocs.pytorch.org\u002Frl\u002Fstable\u002Findex.html) - An open-source 
Reinforcement Learning (RL) library for PyTorch.\n- [UMass Amherst Autonomous Learning Library](https:\u002F\u002Fgithub.com\u002Fcpnota\u002Fautonomous-learning-library) - A PyTorch library for building deep reinforcement learning agents.\n- [Unity ML-Agents Toolkit](https:\u002F\u002Fgithub.com\u002FUnity-Technologies\u002Fml-agents) - Unity Machine Learning Agents Toolkit.\n- [vel](https:\u002F\u002Fgithub.com\u002FMillionIntegrals\u002Fvel) - Bring velocity to deep-learning research.\n- [DI-engine](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine) - A generalized decision intelligence engine. It supports various Deep RL algorithms.\n\n## Benchmark Results\n\n- [DeepMind bsuite](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fbsuite\u002Ftree\u002Fmaster\u002Fbsuite)\n- [OpenAI baselines-results](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines-results)\n- [OpenAI Baselines](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines#benchmarks)\n- [OpenAI Spinning Up](https:\u002F\u002Fspinningup.openai.com\u002Fen\u002Flatest\u002Fspinningup\u002Fbench.html)\n- [ray rl-experiments](https:\u002F\u002Fgithub.com\u002Fray-project\u002Frl-experiments)\n- [rl-baselines-zoo](https:\u002F\u002Fgithub.com\u002Faraffin\u002Frl-baselines-zoo\u002Fblob\u002Fmaster\u002Fbenchmark.md)\n- [SLM Lab](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fblob\u002Fmaster\u002FBENCHMARK.md)\n- [vel](https:\u002F\u002Fblog.millionintegrals.com\u002Fvel-pytorch-meets-baselines)\n- [What Matters In On-Policy Reinforcement Learning? 
A Large-Scale Empirical Study](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.05990)\n- [yarlp](https:\u002F\u002Fgithub.com\u002Fbtaba\u002Fyarlp)\n\n## Environments\n\n- [AI2-THOR](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fai2thor) - A near photo-realistic interactable framework for AI agents.\n- [Animal-AI Olympics](https:\u002F\u002Fgithub.com\u002Fbeyretb\u002FAnimalAI-Olympics) - An AI competition with tests inspired by animal cognition.\n- [Berkeley rl-generalization](https:\u002F\u002Fgithub.com\u002Fsunblaze-ucb\u002Frl-generalization) - Modifiable OpenAI Gym environments for studying generalization in RL.\n- [BTGym](https:\u002F\u002Fgithub.com\u002FKismuz\u002Fbtgym) - Scalable event-driven RL-friendly backtesting library. Build on top of Backtrader with OpenAI Gym environment API.\n- [Carla](https:\u002F\u002Fgithub.com\u002Fcarla-simulator\u002Fcarla) - Open-source simulator for autonomous driving research.\n- [CuLE](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Fcule) - A CUDA port of the Atari Learning Environment (ALE).\n- [Deepdrive](https:\u002F\u002Fgithub.com\u002Fdeepdrive\u002Fdeepdrive) - End-to-end simulation for self-driving cars.\n- [DeepMind AndroidEnv](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fandroid_env) - A library for doing RL research on Android devices.\n- [DeepMind DM Control](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fdm_control) - The DeepMind Control Suite and Package.\n- [DeepMind Lab](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Flab) - A customisable 3D platform for agent-based AI research.\n- [DeepMind pycolab](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fpycolab) - A highly-customisable gridworld game engine with some batteries included.\n- [DeepMind PySC2](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fpysc2) - StarCraft II Learning Environment.\n- [DeepMind RL Unplugged](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fdeepmind-research\u002Ftree\u002Fmaster\u002Frl_unplugged) - Benchmarks for Offline 
Reinforcement Learning.\n- [Facebook EmbodiedQA](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FEmbodiedQA) - Train embodied agents that can answer questions in environments.\n- [Facebook Habitat](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-api) - A modular high-level library to train embodied AI agents across a variety of tasks, environments, and simulators.\n- [Facebook House3D](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FHouse3D) - A Rich and Realistic 3D Environment.\n- [Facebook natural_rl_environment](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnatural_rl_environment) - natural signal Atari environments, introduced in the paper Natural Environment Benchmarks for Reinforcement Learning.\n- [Google Research Football](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Ffootball) - An RL environment based on open-source game Gameplay Football.\n- [GVGAI Gym](https:\u002F\u002Fgithub.com\u002Frubenrtorrado\u002FGVGAI_GYM) - An OpenAI Gym environment for games written in the Video Game Description Language, including the Generic Video Game Competition framework.\n- [gym-doom](https:\u002F\u002Fgithub.com\u002Fppaquette\u002Fgym-doom) - Doom environments based on VizDoom.\n- [gym-duckietown](https:\u002F\u002Fgithub.com\u002Fduckietown\u002Fgym-duckietown) - Self-driving car simulator for the Duckietown universe.\n- [gym-gazebo2](https:\u002F\u002Fgithub.com\u002FAcutronicRobotics\u002Fgym-gazebo2) - A toolkit for developing and comparing reinforcement learning algorithms using ROS 2 and Gazebo.\n- [gym-ignition](https:\u002F\u002Fgithub.com\u002Frobotology\u002Fgym-ignition) - Experimental OpenAI Gym environments implemented with Ignition Robotics.\n- [gym-idsgame](https:\u002F\u002Fgithub.com\u002FLimmen\u002Fgym-idsgame) - An Abstract Cyber Security Simulation and Markov Game for OpenAI Gym\n- [gym-super-mario](https:\u002F\u002Fgithub.com\u002Fppaquette\u002Fgym-super-mario) - 32 levels of original Super 
Mario Bros.\n- [Gym4ReaL](https:\u002F\u002Fgithub.com\u002FDaveonwave\u002Fgym4ReaL) - A Gymnasium-based benchmarking suite for testing reinforcement learning algorithms on real-world scenarios, including water management, energy management in microgrids, financial trading, and more.\n- [Holodeck](https:\u002F\u002Fgithub.com\u002FBYU-PCCL\u002Fholodeck) - High Fidelity Simulator for Reinforcement Learning and Robotics Research.\n- [home-platform](https:\u002F\u002Fgithub.com\u002FHoME-Platform\u002Fhome-platform) - A platform for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context\n- [ma-gym](https:\u002F\u002Fgithub.com\u002Fkoulanurag\u002Fma-gym) - A collection of multi agent environments based on OpenAI gym.\n- [mazelab](https:\u002F\u002Fgithub.com\u002Fzuoxingdong\u002Fmazelab) - A customizable framework to create maze and gridworld environments.\n- [Meta-World](https:\u002F\u002Fgithub.com\u002Frlworkgroup\u002Fmetaworld) - An open source robotics benchmark for meta- and multi-task reinforcement learning.\n- [Microsoft AirSim](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FAirSim) - Open source simulator for autonomous vehicles built on Unreal Engine \u002F Unity, from Microsoft AI & Research.\n- [Microsoft Jericho](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fjericho) - A learning environment for man-made Interactive Fiction games.\n- [Microsoft Malmö](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002Fmalmo) - A platform for Artificial Intelligence experimentation and research built on top of Minecraft.\n- [Microsoft MazeExplorer](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FMazeExplorer) - Customisable 3D environment for assessing generalisation in Reinforcement Learning.\n- [Microsoft TextWorld](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FTextWorld) - A text-based game generator and extensible sandbox learning environment for training and testing 
reinforcement learning (RL) agents.\n- [MineRL](https:\u002F\u002Fgithub.com\u002Fminerllabs\u002Fminerl) - MineRL Competition for Sample Efficient Reinforcement Learning.\n- [MuJoCo](http:\u002F\u002Fwww.mujoco.org) - Advanced physics simulation.\n- [OpenAI Coinrun](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fcoinrun) - Code for the environments used in the paper Quantifying Generalization in Reinforcement Learning.\n- [OpenAI Gym Retro](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fretro) - Retro Games in Gym.\n- [OpenAI Gym Soccer](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym-soccer) - A multiagent domain featuring continuous state and action spaces.\n- [OpenAI Gym](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym) - A toolkit for developing and comparing reinforcement learning algorithms.\n- [OpenAI Multi-Agent Particle Environment](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fmultiagent-particle-envs) - A simple multi-agent particle world with a continuous observation and discrete action space, along with some basic simulated physics.\n- [OpenAI Neural MMO](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fneural-mmo) - A Massively Multiagent Game Environment.\n- [OpenAI Procgen Benchmark](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fprocgen) - Procedurally Generated Game-Like Gym Environments.\n- [OpenAI Roboschool](https:\u002F\u002Fgithub.com\u002Fopenai\u002Froboschool) - Open-source software for robot simulation, integrated with OpenAI Gym.\n- [OpenAI RoboSumo](https:\u002F\u002Fgithub.com\u002Fopenai\u002Frobosumo) - A set of competitive multi-agent environments used in the paper Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments.\n- [OpenAI Safety Gym](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fsafety-gym) - Tools for accelerating safe exploration research.\n- [Personae](https:\u002F\u002Fgithub.com\u002FCeruleanacg\u002FPersonae) - RL & SL Methods and Envs For Quantitative Trading.\n- 
[Pommerman](https:\u002F\u002Fgithub.com\u002FMultiAgentLearning\u002Fplayground) - A clone of Bomberman built for AI research.\n- [pybullet-gym](https:\u002F\u002Fgithub.com\u002Fbenelot\u002Fpybullet-gym) - Open-source implementations of OpenAI Gym MuJoCo environments for use with the OpenAI Gym Reinforcement Learning Research Platform.\n- [PyGame Learning Environment](https:\u002F\u002Fgithub.com\u002Fntasfi\u002FPyGame-Learning-Environment) - Reinforcement Learning Environment in Python.\n- [RLBench](https:\u002F\u002Fgithub.com\u002Fstepjam\u002FRLBench) - A large-scale benchmark and learning environment.\n- [RLGym](https:\u002F\u002Fgithub.com\u002Flucas-emery\u002Frocket-league-gym) - A Python API to treat the game Rocket League as an OpenAI Gym environment.\n- [RLTrader](https:\u002F\u002Fgithub.com\u002Fnotadamking\u002FRLTrader) - A cryptocurrency trading environment using deep reinforcement learning and OpenAI's gym.\n- [RoboNet](https:\u002F\u002Fblog.ml.cmu.edu\u002F2019\u002F11\u002F26\u002Frobonet\u002F) - A Dataset for Large-Scale Multi-Robot Learning.\n- [rocket-lander](https:\u002F\u002Fgithub.com\u002Farex18\u002Frocket-lander) - SpaceX Falcon 9 Box2D continuous-action simulation with traditional and AI controllers.\n- [Stanford Gibson Environments](https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv) - Real-World Perception for Embodied Agents.\n- [Stanford osim-rl](https:\u002F\u002Fgithub.com\u002Fstanfordnmbl\u002Fosim-rl) - Reinforcement learning environments with musculoskeletal models.\n- [Unity ML-Agents Toolkit](https:\u002F\u002Fgithub.com\u002FUnity-Technologies\u002Fml-agents) - Unity Machine Learning Agents Toolkit.\n- [UnityObstacleTower](https:\u002F\u002Fgithub.com\u002FUnity-Technologies\u002Fobstacle-tower-env) - A procedurally generated environment consisting of multiple floors to be solved by a learning agent.\n- [VizDoom](https:\u002F\u002Fgithub.com\u002Fmwydmuch\u002FViZDoom) - Doom-based AI Research Platform for 
Reinforcement Learning from Raw Visual Information.\n- [RLCard](https:\u002F\u002Fgithub.com\u002Fdatamllab\u002Frlcard\u002F) - A research platform for reinforcement learning in card games.\n- [DouZero](https:\u002F\u002Fgithub.com\u002Fkwai\u002FDouZero\u002F) - A research platform for reinforcement learning in DouDizhu (Chinese poker).\n\n\n## Competitions\n\n- [AWS DeepRacer League 2019](https:\u002F\u002Faws.amazon.com\u002Fdeepracer\u002Fleague\u002F)\n- [Flatland Challenge 2019](https:\u002F\u002Fwww.aicrowd.com\u002Fchallenges\u002Fflatland-challenge)\n- [Kaggle Connect X Competition 2020](https:\u002F\u002Fwww.kaggle.com\u002Fc\u002Fconnectx)\n- [NeurIPS 2019: Animal-AI Olympics](http:\u002F\u002Fanimalaiolympics.com\u002F)\n- [NeurIPS 2019: Game of Drones](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Facademic-program\u002Fgame-of-drones-competition-at-neurips-2019\u002F)\n- [NeurIPS 2019: Learn to Move - Walk Around](https:\u002F\u002Fwww.aicrowd.com\u002Fchallenges\u002Fneurips-2019-learning-to-move-walk-around)\n- [NeurIPS 2019: MineRL Competition](http:\u002F\u002Fminerl.io\u002Fcompetition\u002F)\n- [NeurIPS 2019: Reconnaissance Blind Chess](https:\u002F\u002Frbc.jhuapl.edu\u002F)\n- [NeurIPS 2019: Robot open-Ended Autonomous Learning](https:\u002F\u002Fwww.aicrowd.com\u002Fchallenges\u002Frobot-open-ended-autonomous-learning-real)\n- [Unity Obstacle Tower Challenge 2019](https:\u002F\u002Fblogs.unity3d.com\u002F2019\u002F01\u002F28\u002Fobstacle-tower-challenge-test-the-limits-of-intelligence-systems\u002F)\n\n>Check [AICrowd](https:\u002F\u002Fwww.aicrowd.com) for the latest list of major RL competitions\n\n## Timeline\n\n- 1947: [Monte Carlo Sampling](http:\u002F\u002Feniacinaction.com\u002Fthe-articles\u002F3-los-alamos-bets-on-eniac-nuclear-monte-carlo-simulations-1947-8\u002F)\n- 1958: [Perceptron](https:\u002F\u002Fwww.ling.upenn.edu\u002Fcourses\u002Fcogs501\u002FRosenblatt1958.pdf)\n- 1959: [Temporal Difference 
Learning](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=1661924)\n- 1983: [ASE-ALE — the first Actor-Critic algorithm](https:\u002F\u002Fpsycnet.apa.org\u002Frecord\u002F1984-13799-001)\n- 1986: [Backpropagation algorithm](https:\u002F\u002Fwww.nature.com\u002Farticles\u002F323533a0)\n- 1989: [CNNs](http:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fdownload?doi=10.1.1.476.479&rep=rep1&type=pdf)\n- 1989: [Q-Learning](http:\u002F\u002Fwww.cs.rhul.ac.uk\u002F~chrisw\u002Fnew_thesis.pdf)\n- 1991: [TD-Gammon](http:\u002F\u002Fbkgm.com\u002Fbooks\u002FRobertie-LearningFromTheMachine.html)\n- 1992: [REINFORCE](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=139614)\n- 1992: [Experience Replay](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=139620)\n- 1994: [SARSA](http:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fsummary?doi=10.1.1.17.2539)\n- 1999: [Nvidia invented the GPU](https:\u002F\u002Fwww.nvidia.com\u002Fobject\u002Fgpu.html)\n- 2007: [CUDA released](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-zone)\n- 2012: [Arcade Learning Environment (ALE)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1207.4708)\n- 2013: [DQN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1312.5602)\n- 2015 Feb: [DQN human-level control in Atari](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fnature14236)\n- 2015 Feb: [TRPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F1502.05477)\n- 2015 Jun: [Generalized Advantage Estimation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1506.02438)\n- 2015 Sep: [Deep Deterministic Policy Gradient (DDPG)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.02971)\n- 2015 Sep: [DoubleDQN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.06461)\n- 2015 Nov: [DuelingDQN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06581)\n- 2015 Nov: [Prioritized Experience Replay](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.05952)\n- 2015 Nov: [TensorFlow](https:\u002F\u002Fwww.tensorflow.org\u002F)\n- 2016 Feb: 
[A3C](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.01783)\n- 2016 Mar: [AlphaGo beats Lee Sedol 4-1](https:\u002F\u002Fdeepmind.com\u002Falphago-korea)\n- 2016 Jun: [OpenAI Gym](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym)\n- 2016 Jun: [Generative Adversarial Imitation Learning (GAIL)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.03476)\n- 2016 Oct: [PyTorch](https:\u002F\u002Fpytorch.org\u002F)\n- 2017 Mar: [Model-Agnostic Meta-Learning (MAML)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.03400)\n- 2017 Jul: [Distributional RL](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06887)\n- 2017 Jul: [PPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06347)\n- 2017 Aug: [OpenAI DotA 2 1:1](https:\u002F\u002Fopenai.com\u002Fblog\u002Fmore-on-dota-2\u002F)\n- 2017 Aug: [Intrinsic Curiosity Module (ICM)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.05363)\n- 2017 Oct: [Rainbow](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.02298)\n- 2017 Oct: [AlphaGo Zero masters Go without human knowledge](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Falphago-zero-starting-scratch)\n- 2017 Dec: [AlphaZero masters Go, Chess and Shogi](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.01815)\n- 2018 Jan: [Soft Actor-Critic](https:\u002F\u002Fai.googleblog.com\u002F2019\u002F01\u002Fsoft-actor-critic-deep-reinforcement.html)\n- 2018 Feb: [IMPALA](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Fimpala-scalable-distributed-deeprl-dmlab-30)\n- 2018 Jun: [Qt-Opt](https:\u002F\u002Fai.googleblog.com\u002F2018\u002F06\u002Fscalable-deep-reinforcement-learning.html)\n- 2018 Nov: [Go-Explore solved Montezuma’s Revenge](https:\u002F\u002Feng.uber.com\u002Fgo-explore\u002F)\n- 2018 Dec: [AlphaZero becomes the strongest player in history for chess, Go, and Shogi](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Falphazero-shedding-new-light-grand-games-chess-shogi-and-go)\n- 2019 Apr: [OpenAI Five defeated world champions at DotA 
2](https:\u002F\u002Fopenai.com\u002Ffive\u002F)\n- 2019 May: [FTW Quake III Arena Capture the Flag](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Fcapture-the-flag-science)\n- 2019 Aug: [AlphaStar: Grandmaster level in StarCraft II](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002FAlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning)\n- 2019 Sep: [Emergent Tool Use from Multi-Agent Interaction](https:\u002F\u002Fopenai.com\u002Fblog\u002Femergent-tool-use\u002F)\n- 2019 Oct: [Solving Rubik’s Cube with a Robot Hand](https:\u002F\u002Fopenai.com\u002Fblog\u002Fsolving-rubiks-cube\u002F)\n- 2020 Mar: [Agent57 outperforms the standard human benchmark on all 57 Atari games](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002FAgent57-Outperforming-the-human-Atari-benchmark)\n- 2020 Nov: [AlphaFold for protein folding](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Falphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology)\n- 2020 Dec: [MuZero masters Go, chess, shogi and Atari without rules](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Fmuzero-mastering-go-chess-shogi-and-atari-without-rules)\n- 2021 Aug: [Generally capable agents emerge from open-ended play](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Fgenerally-capable-agents-emerge-from-open-ended-play)\n\n## Books\n\n- [Algorithms for Reinforcement Learning. *Szepesvari et al.*](https:\u002F\u002Fwww.amazon.com\u002FAlgorithms-Reinforcement-Learning-Csaba-Szepesvari\u002Fdp\u002F1608454924)\n- [An Introduction to Deep Reinforcement Learning. *Francois-Lavet et al.*](https:\u002F\u002Fwww.amazon.com\u002Fdp\u002F1680835386)\n- [Deep Reinforcement Learning Hands-On. *Lapan*](https:\u002F\u002Fwww.amazon.com\u002FDeep-Reinforcement-Learning-Hands-optimisation\u002Fdp\u002F1838826998)\n- [Deep Reinforcement Learning in Action. 
*Zai & Brown*](https:\u002F\u002Fwww.amazon.com\u002FDeep-Reinforcement-Learning-Action-Alexander\u002Fdp\u002F1617295434)\n- [Foundations of Deep Reinforcement Learning. *Graesser & Keng*](https:\u002F\u002Fwww.amazon.com\u002Fdp\u002F0135172381)\n- [Grokking Deep Reinforcement Learning. *Morales*](https:\u002F\u002Fwww.amazon.com\u002FGrokking-Reinforcement-Learning-Miguel-Morales\u002Fdp\u002F1617295450)\n- [Reinforcement Learning: An Introduction. *Sutton & Barto.*](https:\u002F\u002Fwww.amazon.com\u002Fdp\u002F0262039249)\n\n## Tutorials\n\n- [Andrej Karpathy Deep Reinforcement Learning: Pong from Pixels](http:\u002F\u002Fkarpathy.github.io\u002F2016\u002F05\u002F31\u002Frl\u002F)\n- [Arthur Juliani Simple Reinforcement Learning in Tensorflow Series](https:\u002F\u002Fmedium.com\u002Femergent-future\u002Fsimple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0)\n- [Berkeley Deep Reinforcement Learning Course](http:\u002F\u002Frail.eecs.berkeley.edu\u002Fdeeprlcourse\u002F)\n- [David Silver UCL Course on RL 2015](http:\u002F\u002Fwww0.cs.ucl.ac.uk\u002Fstaff\u002Fd.silver\u002Fweb\u002FTeaching.html)\n- [Deep RL Bootcamp 2017](https:\u002F\u002Fsites.google.com\u002Fview\u002Fdeep-rl-bootcamp\u002Flectures)\n- [DeepMind UCL Deep RL Course 2018](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLqYmG7hTraZDNJre23vqCGIVpfZ_K2RZs)\n- [DeepMind Learning Resources](https:\u002F\u002Fdeepmind.com\u002Flearning-resources)\n- [dennybritz\u002Freinforcement-learning](https:\u002F\u002Fgithub.com\u002Fdennybritz\u002Freinforcement-learning)\n- [higgsfield\u002FRL-Adventure-2](https:\u002F\u002Fgithub.com\u002Fhiggsfield\u002FRL-Adventure-2)\n- [higgsfield\u002FRL-Adventure](https:\u002F\u002Fgithub.com\u002Fhiggsfield\u002FRL-Adventure)\n- [The Hugging Face Deep Reinforcement Learning Class 🤗](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdeep-rl-class#the-hugging-face-deep-reinforcement-learning-class-)\n- 
[MorvanZhou\u002FReinforcement Learning Methods and Tutorials](https:\u002F\u002Fgithub.com\u002FMorvanZhou\u002FReinforcement-learning-with-tensorflow)\n- [OpenAI Spinning Up](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fspinningup)\n- [Sergey Levine CS294 Deep Reinforcement Learning Fall 2017](http:\u002F\u002Frail.eecs.berkeley.edu\u002Fdeeprlcourse-fa17\u002Findex.html)\n- [Udacity Deep Reinforcement Learning Nanodegree](https:\u002F\u002Fwww.udacity.com\u002Fcourse\u002Fdeep-reinforcement-learning-nanodegree--nd893)\n- [Reinforcement Learning Fundamental](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLzvYlJMoZ02Dxtwe-MmH4nOB5jYlMGBjr)\n- [PPOxFamily: DRL Tutorial Course](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FPPOxFamily)\n\n## Blogs\n\n- [Alex Irpan](https:\u002F\u002Fwww.alexirpan.com)\n- [Andrej Karpathy](http:\u002F\u002Fkarpathy.github.io\u002F)\n- [Berkeley AI Research](https:\u002F\u002Fbair.berkeley.edu\u002Fblog\u002F)\n- [Chris Olah](https:\u002F\u002Fcolah.github.io\u002F)\n- [David Ha](http:\u002F\u002Fblog.otoro.net\u002F)\n- [DeepMind](https:\u002F\u002Fdeepmind.com\u002Fblog)\n- [Distill](https:\u002F\u002Fdistill.pub)\n- [Eric Jang](https:\u002F\u002Fblog.evjang.com)\n- [Facebook AI](https:\u002F\u002Fai.facebook.com\u002Fblog\u002F)\n- [Google AI](https:\u002F\u002Fai.googleblog.com\u002F)\n- [Lilian Weng](https:\u002F\u002Flilianweng.github.io\u002Flil-log\u002F)\n- [Matthew Rahtz](http:\u002F\u002Famid.fish\u002F)\n- [OpenAI](https:\u002F\u002Fopenai.com\u002Fblog\u002F)\n- [The Gradient](https:\u002F\u002Fthegradient.pub\u002F)\n- [Uber AI](https:\u002F\u002Feng.uber.com\u002Fcategory\u002Farticles\u002Fai\u002F)\n","# Awesome Deep RL [![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fawesome.re)\n一份精选的深度强化学习（Deep Reinforcement Learning）资源列表。\n\n## 目录\n\n- [库](#libraries)\n- [基准测试结果](#benchmark-results)\n- [环境](#environments)\n- [竞赛](#competitions)\n- [时间线](#timeline)\n- [书籍](#books)\n- 
[教程](#tutorials)\n- [博客](#blogs)\n\n## 库\n\n- [AgileRL](https:\u002F\u002Fgithub.com\u002FAgileRL\u002FAgileRL) - 一个深度强化学习库，专注于通过引入 RLOps（强化学习运维）- MLOps（机器学习运维）来改进开发流程。\n- [Berkeley Ray RLLib](https:\u002F\u002Fgithub.com\u002Fray-project\u002Fray) - 一个开源的强化学习库，提供高可扩展性以及用于各种应用的统一 API（应用程序接口）。\n- [Berkeley Softlearning](https:\u002F\u002Fgithub.com\u002Frail-berkeley\u002Fsoftlearning) - 一个用于在连续域中训练最大熵策略的强化学习框架。\n- [Catalyst](https:\u002F\u002Fgithub.com\u002Fcatalyst-team\u002Fcatalyst) - 加速的深度学习（DL）与强化学习（RL）。\n- [ChainerRL](https:\u002F\u002Fgithub.com\u002Fchainer\u002Fchainerrl) - 一个基于 Chainer 构建的深度强化学习库。\n- [DeepMind Acme](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Facme) - 一个强化学习研究框架。\n- [DeepMind OpenSpiel](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fopen_spiel) - 一个用于通用强化学习研究以及游戏中搜索\u002F规划的环境和算法集合。\n- [DeepMind TRFL](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Ftrfl) - TensorFlow 强化学习。\n- [DeepRL](https:\u002F\u002Fgithub.com\u002FShangtongZhang\u002FDeepRL) - PyTorch 中深度强化学习算法的模块化实现。\n- [DeepX machina](https:\u002F\u002Fgithub.com\u002FDeepX-inc\u002Fmachina) - 一个用于现实世界深度强化学习的库，基于 PyTorch 构建。\n- [d3rlpy](https:\u002F\u002Fgithub.com\u002Ftakuseno\u002Fd3rlpy) - 一个离线深度强化学习库。\n- [Facebook ELF](https:\u002F\u002Fgithub.com\u002Fpytorch\u002FELF) - 一个包含 AlphaGoZero\u002FAlphaZero 重实现的游戏研究平台。\n- [Facebook ReAgent](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FReAgent) - 一个用于推理系统（强化学习、上下文赌博机等）的平台。\n- [garage](https:\u002F\u002Fgithub.com\u002Frlworkgroup\u002Fgarage) - 一个用于可复现强化学习研究的工具包。\n- [Google Dopamine](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fdopamine) - 一个用于强化学习算法快速原型设计的研究框架。\n- [Google TF-Agents](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Fagents) - TF-Agents 是一个 TensorFlow 中的强化学习库。\n- [K-Scale Labs - ksim](https:\u002F\u002Fgithub.com\u002Fkscalelabs\u002Fksim) - 一个模块化且易于使用的框架，用于在仿真中训练策略。\n- [K-Scale Labs - ksim-gym](https:\u002F\u002Fgithub.com\u002Fkscalelabs\u002Fksim-gym) - K-Sim Gym：利用 RL 让机器人更有用。基于 K-Sim 构建。\n- 
[MAgent](https:\u002F\u002Fgithub.com\u002Fgeek-ai\u002FMAgent) - 一个多智能体强化学习平台。\n- [Maze](https:\u002F\u002Fgithub.com\u002Fenlite-ai\u002Fmaze) - 面向应用的深度强化学习框架，解决现实世界决策问题。\n- [MushroomRL](https:\u002F\u002Fgithub.com\u002FMushroomRL\u002Fmushroom-rl) - 用于强化学习实验的 Python 库。\n- [NervanaSystems coach](https:\u002F\u002Fgithub.com\u002FNervanaSystems\u002Fcoach) - Intel AI 实验室出品的强化学习 Coach。\n- [OpenAI Baselines](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines) - 强化学习算法的高质量实现。\n- [OpenRL](https:\u002F\u002Fgithub.com\u002FOpenRL-Lab\u002Fopenrl) - 一个开源的通用强化学习研究框架。\n- [pytorch-a2c-ppo-acktr-gail](https:\u002F\u002Fgithub.com\u002Fikostrikov\u002Fpytorch-a2c-ppo-acktr-gail) - 优势演员 - 评论家 (A2C)、近端策略优化 (PPO)、使用 Kronecker 因子近似的大规模深度强化学习信任域方法 (ACKTR) 以及生成对抗模仿学习 (GAIL) 的 PyTorch 实现。\n- [pytorch-rl](https:\u002F\u002Fgithub.com\u002Fnavneet-nmk\u002Fpytorch-rl) - 用 PyTorch 实现的无模型深度强化学习算法。\n- [reaver](https:\u002F\u002Fgithub.com\u002Finoryy\u002Freaver) - 一个模块化深度强化学习框架，专注于各种基于 StarCraft II 的任务。\n- [RLgraph](https:\u002F\u002Fgithub.com\u002Frlgraph\u002Frlgraph) - 用于深度强化学习的模块化计算图。\n- [RLkit](https:\u002F\u002Fgithub.com\u002Fvitchyr\u002Frlkit) - 用 PyTorch 实现的强化学习框架和算法。\n- [rlpyt](https:\u002F\u002Fgithub.com\u002Fastooke\u002Frlpyt) - PyTorch 中的强化学习。\n- [RLtools](https:\u002F\u002Fgithub.com\u002Frl-tools\u002Frl-tools) - 最快的用于连续控制的深度强化学习库，用纯的、无依赖的 C++ 实现（也提供 Python 绑定）。\n- [skrl](https:\u002F\u002Fgithub.com\u002FToni-SM\u002Fskrl) - 模块化强化学习库（基于 PyTorch 和 JAX），支持 NVIDIA Isaac Gym、Omniverse Isaac Gym 和 Isaac Lab。\n- [SLM Lab](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab) - PyTorch 中的模块化深度强化学习框架。\n- [Stable Baselines](https:\u002F\u002Fgithub.com\u002Fhill-a\u002Fstable-baselines) - OpenAI Baselines 的一个分支，包含强化学习算法的实现。\n- [TensorForce](https:\u002F\u002Fgithub.com\u002Ftensorforce\u002Ftensorforce) - 一个用于应用强化学习的 TensorFlow 库。\n- [Tianshou](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002Ftianshou\u002F) - Tianshou (天授) 是一个基于纯 PyTorch 的强化学习平台。\n- 
[TorchRL](https:\u002F\u002Fdocs.pytorch.org\u002Frl\u002Fstable\u002Findex.html) - 一个用于 PyTorch 的开源强化学习 (RL) 库。\n- [UMass Amherst Autonomous Learning Library](https:\u002F\u002Fgithub.com\u002Fcpnota\u002Fautonomous-learning-library) - 一个用于构建深度强化学习智能体的 PyTorch 库。\n- [Unity ML-Agents Toolkit](https:\u002F\u002Fgithub.com\u002FUnity-Technologies\u002Fml-agents) - Unity 机器学习智能体工具包。\n- [vel](https:\u002F\u002Fgithub.com\u002FMillionIntegrals\u002Fvel) - 为深度学习研究带来速度。\n- [DI-engine](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FDI-engine) - 一个通用决策智能引擎。支持各种深度强化学习算法。\n\n## 基准测试结果\n\n- [DeepMind bsuite](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fbsuite\u002Ftree\u002Fmaster\u002Fbsuite)\n- [OpenAI baselines-results](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines-results)\n- [OpenAI Baselines](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines#benchmarks)\n- [OpenAI Spinning Up](https:\u002F\u002Fspinningup.openai.com\u002Fen\u002Flatest\u002Fspinningup\u002Fbench.html)\n- [ray rl-experiments](https:\u002F\u002Fgithub.com\u002Fray-project\u002Frl-experiments)\n- [rl-baselines-zoo](https:\u002F\u002Fgithub.com\u002Faraffin\u002Frl-baselines-zoo\u002Fblob\u002Fmaster\u002Fbenchmark.md)\n- [SLM Lab](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fblob\u002Fmaster\u002FBENCHMARK.md)\n- [vel](https:\u002F\u002Fblog.millionintegrals.com\u002Fvel-pytorch-meets-baselines)\n- [同策略强化学习中什么最重要？一项大规模实证研究](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.05990)\n- [yarlp](https:\u002F\u002Fgithub.com\u002Fbtaba\u002Fyarlp)\n\n## 环境\n\n- [AI2-THOR](https:\u002F\u002Fgithub.com\u002Fallenai\u002Fai2thor) - 一个近乎照片级真实感的可交互框架，用于 AI (人工智能) agents (智能体)。\n- [Animal-AI Olympics](https:\u002F\u002Fgithub.com\u002Fbeyretb\u002FAnimalAI-Olympics) - 一项 AI 竞赛，测试灵感来源于动物认知。\n- [Berkeley rl-generalization](https:\u002F\u002Fgithub.com\u002Fsunblaze-ucb\u002Frl-generalization) - 可修改的 OpenAI Gym 环境，用于研究 RL (强化学习) 中的泛化能力。\n- 
[BTGym](https:\u002F\u002Fgithub.com\u002FKismuz\u002Fbtgym) - 可扩展的事件驱动型、对 RL 友好的回测库。基于 Backtrader 构建，带有 OpenAI Gym 环境 API (应用程序接口)。\n- [Carla](https:\u002F\u002Fgithub.com\u002Fcarla-simulator\u002Fcarla) - 用于自动驾驶研究的开源模拟器。\n- [CuLE](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Fcule) - Atari 学习环境（ALE）的 CUDA 移植版本。\n- [Deepdrive](https:\u002F\u002Fgithub.com\u002Fdeepdrive\u002Fdeepdrive) - 用于自动驾驶汽车的端到端模拟。\n- [DeepMind AndroidEnv](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fandroid_env) - 一个在 Android 设备上进行 RL 研究的库。\n- [DeepMind DM Control](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fdm_control) - DeepMind 控制套件和包。\n- [DeepMind Lab](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Flab) - 一个可定制的 3D 平台，用于基于智能体的 AI 研究。\n- [DeepMind pycolab](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fpycolab) - 一个高度可定制的 gridworld (网格世界) 游戏引擎，包含一些内置功能。\n- [DeepMind PySC2](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fpysc2) - 星际争霸 II 学习环境。\n- [DeepMind RL Unplugged](https:\u002F\u002Fgithub.com\u002Fdeepmind\u002Fdeepmind-research\u002Ftree\u002Fmaster\u002Frl_unplugged) - Offline (离线) 强化学习的基准测试。\n- [Facebook EmbodiedQA](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FEmbodiedQA) - 训练 embodied (具身) 智能体，使其能够在环境中回答问题。\n- [Facebook Habitat](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-api) - 一个模块化的高级库，用于在各种任务、环境和模拟器中训练具身 AI 智能体。\n- [Facebook House3D](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FHouse3D) - 一个丰富且逼真的 3D 环境。\n- [Facebook natural_rl_environment](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fnatural_rl_environment) - 自然信号 Atari 环境，在论文《自然环境强化学习基准》中引入。\n- [Google Research Football](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Ffootball) - 一个基于开源游戏 Gameplay Football 的 RL 环境。\n- [GVGAI Gym](https:\u002F\u002Fgithub.com\u002Frubenrtorrado\u002FGVGAI_GYM) - 一个 OpenAI Gym 环境，适用于用视频游戏描述语言编写的游戏，包括通用视频游戏竞赛框架。\n- [gym-doom](https:\u002F\u002Fgithub.com\u002Fppaquette\u002Fgym-doom) - 基于 VizDoom 的 Doom 环境。\n- 
[gym-duckietown](https:\u002F\u002Fgithub.com\u002Fduckietown\u002Fgym-duckietown) - 用于 Duckietown 宇宙的自动驾驶汽车模拟器。\n- [gym-gazebo2](https:\u002F\u002Fgithub.com\u002FAcutronicRobotics\u002Fgym-gazebo2) - 一个使用 ROS 2 和 Gazebo 开发和比较强化学习算法的工具包。\n- [gym-ignition](https:\u002F\u002Fgithub.com\u002Frobotology\u002Fgym-ignition) - 使用 Ignition Robotics 实现的实验性 OpenAI Gym 环境。\n- [gym-idsgame](https:\u002F\u002Fgithub.com\u002FLimmen\u002Fgym-idsgame) - 一个用于 OpenAI Gym 的抽象网络安全模拟和 Markov Game (马尔科夫博弈)。\n- [gym-super-mario](https:\u002F\u002Fgithub.com\u002Fppaquette\u002Fgym-super-mario) - 32 关原版超级马里奥兄弟。\n- [Gym4ReaL](https:\u002F\u002Fgithub.com\u002FDaveonwave\u002Fgym4ReaL) - 一个基于 Gymnasium 的基准测试套件，用于在现实世界场景中测试强化学习算法，包括水资源管理、微电网能源管理、金融交易等。\n- [Holodeck](https:\u002F\u002Fgithub.com\u002FBYU-PCCL\u002Fholodeck) - 用于强化学习和机器人学研究的高保真模拟器。\n- [home-platform](https:\u002F\u002Fgithub.com\u002FHoME-Platform\u002Fhome-platform) - 一个人工智能体平台，用于在现实情境中从视觉、音频、语义、物理以及与物体和其他智能体的交互中学习。\n- [ma-gym](https:\u002F\u002Fgithub.com\u002Fkoulanurag\u002Fma-gym) - 基于 OpenAI gym 的 multi agent (多智能体) 环境集合。\n- [mazelab](https:\u002F\u002Fgithub.com\u002Fzuoxingdong\u002Fmazelab) - 一个可定制的框架，用于创建迷宫和网格世界环境。\n- [Meta-World](https:\u002F\u002Fgithub.com\u002Frlworkgroup\u002Fmetaworld) - 一个用于 Meta- (元) 和多任务强化学习的开源机器人基准测试。\n- [Microsoft AirSim](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FAirSim) - 由微软 AI & 研究部门开发的基于 Unreal Engine \u002F Unity 构建的用于自动驾驶车辆的开源模拟器。\n- [Microsoft Jericho](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fjericho) - 一个人造交互式小说（Interactive Fiction）游戏的学习环境。\n- [Microsoft Malmö](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002Fmalmo) - 一个建立在 Minecraft 之上的人工智能实验和研究平台。\n- [Microsoft MazeExplorer](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FMazeExplorer) - 可定制的 3D 环境，用于评估强化学习中的泛化能力。\n- [Microsoft TextWorld](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FTextWorld) - 一个基于文本的游戏生成器和可扩展的沙盒学习环境，用于训练和测试强化学习智能体。\n- [MineRL](https:\u002F\u002Fgithub.com\u002Fminerllabs\u002Fminerl) - 
样本高效强化学习竞赛。\n- [MuJoCo](http:\u002F\u002Fwww.mujoco.org) - 高级物理模拟。\n- [OpenAI Coinrun](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fcoinrun) - 论文《量化强化学习中的泛化》中使用的环境代码。\n- [OpenAI Gym Retro](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fretro) - Gym 中的复古游戏。\n- [OpenAI Gym Soccer](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym-soccer) - 一个具有连续状态和动作空间的多智能体领域。\n- [OpenAI Gym](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym) - 一个用于开发和比较强化学习算法的工具包。\n- [OpenAI Multi-Agent Particle Environment](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fmultiagent-particle-envs) - 一个简单的多智能体粒子世界，具有连续观察和离散动作空间，以及一些基本的模拟物理。\n- [OpenAI Neural MMO](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fneural-mmo) - 一个大规模多智能体游戏环境。\n- [OpenAI Procgen Benchmark](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fprocgen) - 程序生成的类游戏 Gym 环境。\n- [OpenAI Roboschool](https:\u002F\u002Fgithub.com\u002Fopenai\u002Froboschool) - 用于机器人模拟的开源软件，与 OpenAI Gym 集成。\n- [OpenAI RoboSumo](https:\u002F\u002Fgithub.com\u002Fopenai\u002Frobosumo) - 一组竞争性多智能体环境，用于论文《通过非平稳和竞争环境中的元学习进行持续适应》。\n- [OpenAI Safety Gym](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fsafety-gym) - 用于加速安全探索研究的工具。\n- [Personae](https:\u002F\u002Fgithub.com\u002FCeruleanacg\u002FPersonae) - 用于量化交易的 RL & SL (监督学习) 方法及环境。\n- [Pommerman](https:\u002F\u002Fgithub.com\u002FMultiAgentLearning\u002Fplayground) - 为 AI 研究构建的炸弹人克隆版。\n- [pybullet-gym](https:\u002F\u002Fgithub.com\u002Fbenelot\u002Fpybullet-gym) - OpenAI Gym MuJoCo 环境的开源实现，用于 OpenAI Gym 强化学习研究平台。\n- [PyGame Learning Environment](https:\u002F\u002Fgithub.com\u002Fntasfi\u002FPyGame-Learning-Environment) - Python 中的强化学习环境。\n- [RLBench](https:\u002F\u002Fgithub.com\u002Fstepjam\u002FRLBench) - 一个大规模基准测试和学习环境。\n- [RLGym](https:\u002F\u002Fgithub.com\u002Flucas-emery\u002Frocket-league-gym) - 一个 Python API，将火箭联盟游戏视为 OpenAI Gym 环境。\n- [RLTrader](https:\u002F\u002Fgithub.com\u002Fnotadamking\u002FRLTrader) - 一个使用深度强化学习和 OpenAI gym 的加密货币交易环境。\n- 
[RoboNet](https:\u002F\u002Fblog.ml.cmu.edu\u002F2019\u002F11\u002F26\u002Frobonet\u002F) - 一个用于大规模多机器人学习的数据集。\n- [rocket-lander](https:\u002F\u002Fgithub.com\u002Farex18\u002Frocket-lander) - SpaceX Falcon 9 Box2D 连续动作模拟，带有传统和 AI 控制器。\n- [Stanford Gibson Environments](https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv) - 具身智能体的现实世界感知。\n- [Stanford osim-rl](https:\u002F\u002Fgithub.com\u002Fstanfordnmbl\u002Fosim-rl) - 带有肌肉骨骼模型的强化学习环境。\n- [Unity ML-Agents Toolkit](https:\u002F\u002Fgithub.com\u002FUnity-Technologies\u002Fml-agents) - Unity 机器学习智能体工具包。\n- [UnityObstacleTower](https:\u002F\u002Fgithub.com\u002FUnity-Technologies\u002Fobstacle-tower-env) - 一个程序生成的环境，由多个楼层组成，供学习智能体解决。\n- [VizDoom](https:\u002F\u002Fgithub.com\u002Fmwydmuch\u002FViZDoom) - 基于 Doom 的 AI 研究平台，用于从原始视觉信息进行强化学习。\n- [RLCard](https:\u002F\u002Fgithub.com\u002Fdatamllab\u002Frlcard\u002F) - 一个用于纸牌游戏强化学习的研究平台。\n- [DouZero](https:\u002F\u002Fgithub.com\u002Fkwai\u002FDouZero\u002F) - 一个用于斗地主（中国扑克）强化学习的研究平台。\n\n## 竞赛\n\n- [AWS DeepRacer (深度赛车) 联赛 2019](https:\u002F\u002Faws.amazon.com\u002Fdeepracer\u002Fleague\u002F)\n- [Flatland 挑战赛 2019](https:\u002F\u002Fwww.aicrowd.com\u002Fchallenges\u002Fflatland-challenge)\n- [Kaggle Connect X 竞赛 2020](https:\u002F\u002Fwww.kaggle.com\u002Fc\u002Fconnectx)\n- [NeurIPS 2019: Animal-AI (动物智能) 奥运会](http:\u002F\u002Fanimalaiolympics.com\u002F)\n- [NeurIPS 2019: 无人机博弈](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Fresearch\u002Facademic-program\u002Fgame-of-drones-competition-at-neurips-2019\u002F)\n- [NeurIPS 2019: 学习移动 - 行走](https:\u002F\u002Fwww.aicrowd.com\u002Fchallenges\u002Fneurips-2019-learning-to-move-walk-around)\n- [NeurIPS 2019: MineRL 竞赛](http:\u002F\u002Fminerl.io\u002Fcompetition\u002F)\n- [NeurIPS 2019: 侦察盲棋](https:\u002F\u002Frbc.jhuapl.edu\u002F)\n- [NeurIPS 2019: 机器人开放式自主学习](https:\u002F\u002Fwww.aicrowd.com\u002Fchallenges\u002Frobot-open-ended-autonomous-learning-real)\n- [Unity 障碍塔挑战赛 
2019](https:\u002F\u002Fblogs.unity3d.com\u002F2019\u002F01\u002F28\u002Fobstacle-tower-challenge-test-the-limits-of-intelligence-systems\u002F)\n\n> 查看 [AICrowd](https:\u002F\u002Fwww.aicrowd.com) 获取主要强化学习竞赛的最新列表\n\n## 时间线\n\n- 1947: [Monte Carlo Sampling (蒙特卡洛采样)](http:\u002F\u002Feniacinaction.com\u002Fthe-articles\u002F3-los-alamos-bets-on-eniac-nuclear-monte-carlo-simulations-1947-8\u002F)\n- 1958: [Perceptron (感知机)](https:\u002F\u002Fwww.ling.upenn.edu\u002Fcourses\u002Fcogs501\u002FRosenblatt1958.pdf)\n- 1959: [Temporal Difference Learning (时序差分学习)](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=1661924)\n- 1983: [ASE-ALE — 第一个 Actor-Critic (演员 - 评论家) 算法](https:\u002F\u002Fpsycnet.apa.org\u002Frecord\u002F1984-13799-001)\n- 1986: [Backpropagation (反向传播) 算法](https:\u002F\u002Fwww.nature.com\u002Farticles\u002F323533a0)\n- 1989: [CNNs (卷积神经网络)](http:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fdownload?doi=10.1.1.476.479&rep=rep1&type=pdf)\n- 1989: [Q-Learning (Q 学习)](http:\u002F\u002Fwww.cs.rhul.ac.uk\u002F~chrisw\u002Fnew_thesis.pdf)\n- 1991: [TD-Gammon](http:\u002F\u002Fbkgm.com\u002Fbooks\u002FRobertie-LearningFromTheMachine.html)\n- 1992: [REINFORCE](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=139614)\n- 1992: [Experience Replay (经验回放)](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=139620)\n- 1994: [SARSA](http:\u002F\u002Fciteseerx.ist.psu.edu\u002Fviewdoc\u002Fsummary?doi=10.1.1.17.2539)\n- 1999: [Nvidia 发明了 GPU (图形处理器)](https:\u002F\u002Fwww.nvidia.com\u002Fobject\u002Fgpu.html)\n- 2007: [CUDA (统一计算设备架构) 发布](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-zone)\n- 2012: [Arcade Learning Environment (ALE) (街机学习环境)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1207.4708)\n- 2013: [DQN (深度 Q 网络)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1312.5602)\n- 2015 年 2 月：[DQN 在 Atari 上达到人类水平控制](https:\u002F\u002Fwww.nature.com\u002Farticles\u002Fnature14236)\n- 2015 年 2 月：[TRPO (信任区域策略优化)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1502.05477)\n- 
2015 年 6 月：[Generalized Advantage Estimation (广义优势估计)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1506.02438)\n- 2015 年 9 月：[Deep Deterministic Policy Gradient (DDPG) (深度确定性策略梯度)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.02971)\n- 2015 年 9 月：[DoubleDQN (双重深度 Q 网络)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.06461)\n- 2015 年 11 月：[DuelingDQN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06581)\n- 2015 年 11 月：[Prioritized Experience Replay (优先经验回放)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1511.05952)\n- 2015 年 11 月：[TensorFlow](https:\u002F\u002Fwww.tensorflow.org\u002F)\n- 2016 年 2 月：[A3C (异步优势演员 - 评论家)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.01783)\n- 2016 年 3 月：[AlphaGo 以 4-1 击败李世石](https:\u002F\u002Fdeepmind.com\u002Falphago-korea)\n- 2016 年 6 月：[OpenAI Gym](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym)\n- 2016 年 6 月：[Generative Adversarial Imitation Learning (GAIL) (生成对抗模仿学习)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.03476)\n- 2016 年 10 月：[PyTorch](https:\u002F\u002Fpytorch.org\u002F)\n- 2017 年 3 月：[Model-Agnostic Meta-Learning (MAML) (模型无关元学习)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.03400)\n- 2017 年 7 月：[Distributional RL (分布式强化学习)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06887)\n- 2017 年 7 月：[PPO (近端策略优化)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.06347)\n- 2017 年 8 月：[OpenAI DotA 2 1:1](https:\u002F\u002Fopenai.com\u002Fblog\u002Fmore-on-dota-2\u002F)\n- 2017 年 8 月：[Intrinsic Curiosity Module (ICM) (内在好奇心模块)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1705.05363)\n- 2017 年 10 月：[Rainbow](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.02298)\n- 2017 年 10 月：[AlphaGo Zero 在无人类知识情况下精通围棋](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Falphago-zero-starting-scratch)\n- 2017 年 12 月：[AlphaZero 精通围棋、国际象棋和将棋](https:\u002F\u002Farxiv.org\u002Fabs\u002F1712.01815)\n- 2018 年 1 月：[Soft Actor-Critic (软演员 - 
评论家)](https:\u002F\u002Fai.googleblog.com\u002F2019\u002F01\u002Fsoft-actor-critic-deep-reinforcement.html)\n- 2018 年 2 月：[IMPALA](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Fimpala-scalable-distributed-deeprl-dmlab-30)\n- 2018 年 6 月：[Qt-Opt](https:\u002F\u002Fai.googleblog.com\u002F2018\u002F06\u002Fscalable-deep-reinforcement-learning.html)\n- 2018 年 11 月：[Go-Explore 解决了 Montezuma's Revenge (蒙特祖玛的复仇)](https:\u002F\u002Feng.uber.com\u002Fgo-explore\u002F)\n- 2018 年 12 月：[AlphaZero 成为国际象棋、围棋和将棋历史上最强的选手](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Falphazero-shedding-new-light-grand-games-chess-shogi-and-go)\n- 2019 年 4 月：[OpenAI Five 在 DotA 2 中击败世界冠军](https:\u002F\u002Fopenai.com\u002Ffive\u002F)\n- 2019 年 5 月：[FTW Quake III Arena 夺旗模式](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Fcapture-the-flag-science)\n- 2019 年 8 月：[AlphaStar: 星际争霸 II 大师级别](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002FAlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning)\n- 2019 年 9 月：[多智能体交互中涌现的工具使用](https:\u002F\u002Fopenai.com\u002Fblog\u002Femergent-tool-use\u002F)\n- 2019 年 10 月：[用机械手解决魔方](https:\u002F\u002Fopenai.com\u002Fblog\u002Fsolving-rubiks-cube\u002F)\n- 2020 年 3 月：[Agent57 在所有 57 款 Atari 游戏中超越标准人类基准](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002FAgent57-Outperforming-the-human-Atari-benchmark)\n- 2020 年 11 月：[AlphaFold 用于蛋白质折叠](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Falphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology)\n- 2020 年 12 月：[MuZero 在无规则情况下精通围棋、国际象棋、将棋和 Atari](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Fmuzero-mastering-go-chess-shogi-and-atari-without-rules)\n- 2021 年 8 月：[通用能力智能体从开放式游戏中涌现](https:\u002F\u002Fdeepmind.com\u002Fblog\u002Farticle\u002Fgenerally-capable-agents-emerge-from-open-ended-play)\n\n## 书籍\n\n- [强化学习算法 (Algorithms for Reinforcement Learning). 
*Szepesvari 等.*](https:\u002F\u002Fwww.amazon.com\u002FAlgorithms-Reinforcement-Learning-Csaba-Szepesvari\u002Fdp\u002F1608454924)\n- [深度强化学习简介 (An Introduction to Deep Reinforcement Learning). *Francois-Lavet 等.*](https:\u002F\u002Fwww.amazon.com\u002Fdp\u002F1680835386)\n- [深度强化学习实战 (Deep Reinforcement Learning Hands-On). *Lapan*](https:\u002F\u002Fwww.amazon.com\u002FDeep-Reinforcement-Learning-Hands-optimisation\u002Fdp\u002F1838826998)\n- [深度强化学习行动 (Deep Reinforcement Learning in Action). *Zai & Brown*](https:\u002F\u002Fwww.amazon.com\u002FDeep-Reinforcement-Learning-Action-Alexander\u002Fdp\u002F1617295434)\n- [深度强化学习基础 (Foundations of Deep Reinforcement Learning). *Graesser & Keng*](https:\u002F\u002Fwww.amazon.com\u002Fdp\u002F0135172381)\n- [深入理解深度强化学习 (Grokking Deep Reinforcement Learning). *Morales*](https:\u002F\u002Fwww.amazon.com\u002FGrokking-Reinforcement-Learning-Miguel-Morales\u002Fdp\u002F1617295450)\n- [强化学习：简介 (Reinforcement Learning: An Introduction). *Sutton & Barto.*](https:\u002F\u002Fwww.amazon.com\u002Fdp\u002F0262039249)\n\n## 教程\n\n- [Andrej Karpathy 深度强化学习 (Deep Reinforcement Learning)：从像素玩乒乓球游戏 (Pong)](http:\u002F\u002Fkarpathy.github.io\u002F2016\u002F05\u002F31\u002Frl\u002F)\n- [Arthur Juliani TensorFlow 中的简单强化学习 (Reinforcement Learning) 系列](https:\u002F\u002Fmedium.com\u002Femergent-future\u002Fsimple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0)\n- [伯克利深度强化学习课程](http:\u002F\u002Frail.eecs.berkeley.edu\u002Fdeeprlcourse\u002F)\n- [David Silver UCL 2015 强化学习 (RL) 课程](http:\u002F\u002Fwww0.cs.ucl.ac.uk\u002Fstaff\u002Fd.silver\u002Fweb\u002FTeaching.html)\n- [2017 深度强化学习 (Deep RL) 训练营](https:\u002F\u002Fsites.google.com\u002Fview\u002Fdeep-rl-bootcamp\u002Flectures)\n- [DeepMind UCL 2018 深度强化学习 (Deep RL) 课程](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLqYmG7hTraZDNJre23vqCGIVpfZ_K2RZs)\n- [DeepMind 学习资源](https:\u002F\u002Fdeepmind.com\u002Flearning-resources)\n- 
[dennybritz\u002Freinforcement-learning](https:\u002F\u002Fgithub.com\u002Fdennybritz\u002Freinforcement-learning)\n- [higgsfield\u002FRL-Adventure-2](https:\u002F\u002Fgithub.com\u002Fhiggsfield\u002FRL-Adventure-2)\n- [higgsfield\u002FRL-Adventure](https:\u002F\u002Fgithub.com\u002Fhiggsfield\u002FRL-Adventure)\n- [Hugging Face 深度强化学习课程 🤗](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdeep-rl-class#the-hugging-face-deep-reinforcement-learning-class-)\n- [MorvanZhou\u002F强化学习方法与教程](https:\u002F\u002Fgithub.com\u002FMorvanZhou\u002FReinforcement-learning-with-tensorflow)\n- [OpenAI Spinning Up](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fspinningup)\n- [Sergey Levine CS294 2017 秋季深度强化学习课程](http:\u002F\u002Frail.eecs.berkeley.edu\u002Fdeeprlcourse-fa17\u002Findex.html)\n- [Udacity 深度强化学习纳米学位 (Nanodegree)](https:\u002F\u002Fwww.udacity.com\u002Fcourse\u002Fdeep-reinforcement-learning-nanodegree--nd893)\n- [强化学习基础](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLzvYlJMoZ02Dxtwe-MmH4nOB5jYlMGBjr)\n- [PPOxFamily：深度强化学习 (DRL) 教程课程](https:\u002F\u002Fgithub.com\u002Fopendilab\u002FPPOxFamily)\n\n## 博客\n\n- [Alex Irpan](https:\u002F\u002Fwww.alexirpan.com)\n- [Andrej Karpathy](http:\u002F\u002Fkarpathy.github.io\u002F)\n- [Berkeley 人工智能 (AI) 研究](https:\u002F\u002Fbair.berkeley.edu\u002Fblog\u002F)\n- [Chris Olah](https:\u002F\u002Fcolah.github.io\u002F)\n- [David Ha](http:\u002F\u002Fblog.otoro.net\u002F)\n- [DeepMind](https:\u002F\u002Fdeepmind.com\u002Fblog)\n- [Distill](https:\u002F\u002Fdistill.pub)\n- [Eric Jang](https:\u002F\u002Fblog.evjang.com)\n- [Facebook AI](https:\u002F\u002Fai.facebook.com\u002Fblog\u002F)\n- [Google AI](https:\u002F\u002Fai.googleblog.com\u002F)\n- [Lilian Weng](https:\u002F\u002Flilianweng.github.io\u002Flil-log\u002F)\n- [Matthew Rahtz](http:\u002F\u002Famid.fish\u002F)\n- [OpenAI](https:\u002F\u002Fopenai.com\u002Fblog\u002F)\n- [The Gradient](https:\u002F\u002Fthegradient.pub\u002F)\n- [Uber 
AI](https:\u002F\u002Feng.uber.com\u002Fcategory\u002Farticles\u002Fai\u002F)","# awesome-deep-rl 快速上手指南\n\n## 简介\n`awesome-deep-rl` 不是一个直接可调用的代码库，而是一个**深度强化学习（Deep RL）资源精选清单**。它汇集了主流的算法库、环境、基准测试及教程。本指南将帮助你获取该清单，并基于清单中推荐的库快速开始一个深度强化学习项目。\n\n## 环境准备\n在开始之前，请确保你的系统满足以下要求：\n\n- **操作系统**: Linux, macOS 或 Windows\n- **Python**: 3.8 或更高版本\n- **版本控制**: Git\n- **深度学习框架**: PyTorch 或 TensorFlow (根据所选库决定)\n- **硬件**: 可选 NVIDIA GPU (用于加速训练)\n\n## 安装步骤\n\n1. **克隆资源清单仓库**\n   通过 Git 获取该资源列表到本地：\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fkengz\u002Fawesome-deep-rl.git\n   ```\n\n2. **安装依赖库**\n   该清单本身无依赖，但使用清单中推荐的库（如 Stable Baselines3）需要安装相应环境。建议使用国内镜像源加速安装：\n   ```bash\n   pip install stable-baselines3[extra] -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   pip install gymnasium -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n   ```\n\n## 基本使用\n\n### 1. 浏览资源\n克隆完成后，直接在本地查看 `README.md` 文件，或在 GitHub 页面浏览。清单按以下分类整理，可根据需求查找：\n- **Libraries**: 算法库（如 Ray RLLib, Stable Baselines, Tianshou）\n- **Environments**: 训练环境（如 Gym, MuJoCo, Atari）\n- **Tutorials**: 教程与书籍\n- **Benchmark Results**: 算法性能基准\n\n### 2. 实战示例\n基于清单中推荐的 **Stable Baselines** 生态，以下是一个使用 PPO 算法训练智能体的最小化示例：\n\n```python\nimport gymnasium as gym\nfrom stable_baselines3 import PPO\n\n# 创建环境\nenv = gym.make(\"CartPole-v1\")\n\n# 初始化 PPO 算法\nmodel = PPO(\"MlpPolicy\", env, verbose=1)\n\n# 训练模型\nmodel.learn(total_timesteps=10000)\n\n# 测试模型\nobs, _ = env.reset()\nfor i in range(1000):\n    action, _states = model.predict(obs, deterministic=True)\n    obs, reward, terminated, truncated, info = env.step(action)\n    \n    if terminated or truncated:\n        obs, _ = env.reset()\n\nenv.close()\n```\n\n### 3. 
进阶探索\n- 如需多智能体强化学习，参考清单中的 **MAgent** 或 **PettingZoo**。\n- 如需离线强化学习，参考清单中的 **d3rlpy**。\n- 如需大规模分布式训练，参考清单中的 **Berkeley Ray RLLib**。","某机器人初创公司的算法团队正在开发四足机器人的步态控制系统，计划采用深度强化学习技术在仿真环境中训练策略网络。\n\n### 没有 awesome-deep-rl 时\n- 搜索引擎结果杂乱无章，难以区分过时的教程与最新的 SOTA 方案，技术选型耗时极长。\n- 不清楚哪些开源库支持特定的仿真环境，团队反复试错配置，消耗了大量宝贵的算力资源。\n- 缺乏权威的基准测试数据参考，无法快速评估当前算法性能是否达到行业达标线。\n- 新入职成员上手困难，缺少系统化的学习路径和资源指引，培训周期长达一个月。\n\n### 使用 awesome-deep-rl 后\n- 通过 Libraries 分类快速锁定支持仿真训练的主流框架，如 Ray RLLib 或 K-Sim，选型效率提升 80%。\n- 参考 Benchmark Results 直接对比不同算法在相同环境下的性能，避免盲目选择低效模型。\n- 利用 Tutorials 和 Books 板块构建标准化培训流程，结合优质博客资源，新人一周内即可上手开发。\n- 借助 Environments 列表找到适配的仿真场景接口，大幅减少环境配置与调试时间，专注核心算法优化。\n\nawesome-deep-rl 充当了深度强化学习领域的权威导航图，显著降低了技术选型门槛并加速了研发落地进程。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_awesome-deep-rl_b9553438.png","kengz","Keng","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fkengz_998657a3.jpg","Engineer by day, rock climber by night. Mathematician at heart.",null,"NYC","kengzwl@gmail.com","kengz.github.io","https:\u002F\u002Fgithub.com\u002Fkengz",883,83,"2026-04-04T04:38:23","MIT",1,"未说明",{"notes":91,"python":89,"dependencies":92},"该仓库为深度强化学习资源合集列表，并非单一可执行工具。具体运行环境需求取决于用户选择的子项目（如 Stable Baselines, Ray RLLib, PyTorch 等）。大多数列出的库基于 Python 开发，部分深度学习算法推荐配备 NVIDIA GPU。",[],[13,15,54],[95,96,97,98,99],"awesome-list","deep-reinforcement-learning","resources","deep-rl","reinforcement-learning","2026-03-27T02:49:30.150509","2026-04-06T05:37:25.257028",[],[]]