[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-kengz--SLM-Lab":3,"tool-kengz--SLM-Lab":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",151918,2,"2026-04-12T11:33:05",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":78,"owner_twitter":76,"owner_website":79,"owner_url":80,"languages":81,"stars":94,"forks":95,"last_commit_at":96,"license":97,"difficulty_score":32,"env_os":98,"env_gpu":99,"env_ram":100,"env_deps":101,"category_tags":110,"github_topics":112,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":123,"updated_at":124,"faqs":125,"releases":155},6976,"kengz\u002FSLM-Lab","SLM-Lab","Modular Deep Reinforcement Learning framework in PyTorch. 
Companion library of the book \"Foundations of Deep Reinforcement Learning\".","SLM-Lab 是一个基于 PyTorch 构建的模块化深度强化学习框架，也是经典教材《深度强化学习基础》的官方配套代码库。它旨在解决强化学习研究中实验配置复杂、结果难以复现以及数据分析繁琐等痛点。通过 SLM-Lab，用户无需反复修改底层代码，仅需编写简洁的 JSON 配置文件即可定义并运行完整的实验流程，极大地降低了试错成本。\n\n该工具非常适合人工智能研究人员、算法工程师以及希望深入理解强化学习原理的学生使用。其核心亮点在于内置了 PPO、SAC、DQN 等十余种主流算法，并已在 70 多种环境中完成验证，确保开箱即用。SLM-Lab 特别强调实验的可复现性，每次运行都会自动保存配置细节与代码版本信息，方便随时回溯。此外，它还集成了自动分析功能，能够直接生成训练曲线、关键指标及 TensorBoard 日志，并支持对接云端 GPU 资源与 HuggingFace 社区，让从本地调试到成果分享的全过程更加流畅高效。无论是用于教学演示还是前沿算法探索，SLM-Lab 都能提供专业且便捷的支持。","# [SLM Lab](https:\u002F\u002Fwww.amazon.com\u002Fdp\u002F0135172381) \u003Cbr> ![GitHub tag (latest SemVer)](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Ftag\u002Fkengz\u002Fslm-lab) ![CI](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fworkflows\u002FCI\u002Fbadge.svg)\n\n\u003Cp align=\"center\">\n  \u003Ci>Modular Deep Reinforcement Learning framework in PyTorch.\u003C\u002Fi>\n  \u003Cbr>\n  \u003Ci>Companion library of the book \u003Ca href=\"https:\u002F\u002Fwww.amazon.com\u002Fdp\u002F0135172381\">Foundations of Deep Reinforcement Learning\u003C\u002Fa>.\u003C\u002Fi>\n  \u003Cbr>\n  \u003Ca href=\"https:\u002F\u002Fslm-lab.gitbook.io\u002Fslm-lab\u002F\">Documentation\u003C\u002Fa> · \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fblob\u002Fmaster\u002Fdocs\u002FBENCHMARKS.md\">Benchmark Results\u003C\u002Fa>\n\u003C\u002Fp>\n\n>**NOTE:** v5.0 updates to Gymnasium, `uv` tooling, and modern dependencies with ARM support - see [CHANGELOG.md](docs\u002FCHANGELOG.md).\n>\n>Book readers: `git checkout v4.1.1` for *Foundations of Deep Reinforcement Learning* code.\n\n|||||\n|:---:|:---:|:---:|:---:|\n| ![ppo beamrider](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_c123aacc5375.gif) | ![ppo breakout](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_a2a7dfeaa895.gif) | ![ppo 
kungfumaster](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_4a92dfc090de.gif) | ![ppo mspacman](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_268ef8bdac55.gif) |\n| BeamRider | Breakout | KungFuMaster | MsPacman |\n| ![ppo pong](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_2e841b53a58d.gif) | ![ppo qbert](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_a9ed3976f8b5.gif) | ![ppo seaquest](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_be90a1159620.gif) | ![ppo spaceinvaders](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_7c458b01cf48.gif) |\n| Pong | Qbert | Seaquest | Sp.Invaders |\n| ![sac ant](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_9be66a77c4fa.gif) | ![sac halfcheetah](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_aa2f86f58dc4.gif) | ![sac hopper](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_16722c868c53.gif) | ![sac humanoid](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_08d29086eca5.gif) |\n| Ant | HalfCheetah | Hopper | Humanoid |\n| ![sac doublependulum](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_173b45387279.gif) | ![sac pendulum](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_db513900b92b.gif) | ![sac reacher](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_235954f17239.gif) | ![sac walker](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_1aff20ae0bc2.gif) |\n| Inv.DoublePendulum | InvertedPendulum | Reacher | Walker |\n\nSLM Lab is a software framework for **reinforcement learning** (RL) research and application in PyTorch. 
RL trains agents to make decisions by learning from trial and error—like teaching a robot to walk or an AI to play games.\n\n## What SLM Lab Offers\n\n| Feature | Description |\n|---------|-------------|\n| **Ready-to-use algorithms** | PPO, SAC, CrossQ, DQN, A2C, REINFORCE—validated on 70+ environments |\n| **Easy configuration** | JSON spec files fully define experiments—no code changes needed |\n| **Reproducibility** | Every run saves its spec + git SHA for exact reproduction |\n| **Automatic analysis** | Training curves, metrics, and TensorBoard logging out of the box |\n| **Cloud integration** | dstack for GPU training, HuggingFace for sharing results |\n\n## Algorithms\n\n| Algorithm | Type | Best For | Validated Environments |\n|-----------|------|----------|------------------------|\n| **REINFORCE** | On-policy | Learning\u002Fteaching | Classic |\n| **SARSA** | On-policy | Tabular-like | Classic |\n| **DQN\u002FDDQN+PER** | Off-policy | Discrete actions | Classic, Box2D, Atari |\n| **A2C** | On-policy | Fast iteration | Classic, Box2D, Atari |\n| **PPO** | On-policy | General purpose | Classic, Box2D, MuJoCo (11), Atari (54) |\n| **SAC** | Off-policy | Continuous control | Classic, Box2D, MuJoCo |\n| **CrossQ** | Off-policy | Sample-efficient control | Classic, Box2D, MuJoCo |\n\nSee [Benchmark Results](docs\u002FBENCHMARKS.md) for detailed performance data.\n\n## Environments\n\nSLM Lab uses [Gymnasium](https:\u002F\u002Fgymnasium.farama.org\u002F) (the maintained fork of OpenAI Gym):\n\n| Category | Examples | Difficulty | Docs |\n|----------|----------|------------|------|\n| **Classic Control** | CartPole, Pendulum, Acrobot | Easy | [Gymnasium Classic](https:\u002F\u002Fgymnasium.farama.org\u002Fenvironments\u002Fclassic_control\u002F) |\n| **Box2D** | LunarLander, BipedalWalker | Medium | [Gymnasium Box2D](https:\u002F\u002Fgymnasium.farama.org\u002Fenvironments\u002Fbox2d\u002F) |\n| **MuJoCo** | Hopper, HalfCheetah, Humanoid | Hard | [Gymnasium 
MuJoCo](https:\u002F\u002Fgymnasium.farama.org\u002Fenvironments\u002Fmujoco\u002F) |\n| **Atari** | Breakout, MsPacman, and 54 more | Varied | [ALE](https:\u002F\u002Fale.farama.org\u002Fenvironments\u002F) |\n\nAny gymnasium-compatible environment works—just specify its name in the spec.\n\n## Quick Start\n\n```bash\n# Install\nuv sync\nuv tool install --editable .\n\n# Run demo (PPO CartPole)\nslm-lab run                                    # PPO CartPole\nslm-lab run --render                           # with visualization\n\n# Run custom experiment\nslm-lab run spec.json spec_name train          # local training\nslm-lab run-remote spec.json spec_name train   # cloud training (dstack)\n\n# Help (CLI uses Typer)\nslm-lab --help                                 # list all commands\nslm-lab run --help                             # options for run command\n\n# Troubleshoot: if slm-lab not found, use uv run\nuv run slm-lab run\n```\n\n## Cloud Training (dstack)\n\nRun experiments on cloud GPUs with automatic result sync to HuggingFace.\n\n```bash\n# Setup\ncp .env.example .env  # Add HF_TOKEN\nuv tool install dstack  # Install dstack CLI\n# Configure dstack server - see https:\u002F\u002Fdstack.ai\u002Fdocs\u002Fquickstart\n\n# Run on cloud\nslm-lab run-remote spec.json spec_name train           # CPU training (default)\nslm-lab run-remote spec.json spec_name search          # CPU ASHA search (default)\nslm-lab run-remote --gpu spec.json spec_name train     # GPU training (for image envs)\n\n# Sync results\nslm-lab pull spec_name    # Download from HuggingFace\nslm-lab list              # List available experiments\n```\n\nConfig options in `.dstack\u002F`: `run-gpu-train.yml`, `run-gpu-search.yml`, `run-cpu-train.yml`, `run-cpu-search.yml`\n\n### Minimal Install (Orchestration Only)\n\nFor a lightweight box that only dispatches dstack runs, syncs results, and generates plots (no local ML training):\n\n```bash\nuv sync --no-default-groups  # skip ML deps (torch, 
gymnasium, etc.)\nuv tool install dstack\nuv run --no-default-groups slm-lab run-remote spec.json spec_name train\nuv run --no-default-groups slm-lab pull spec_name\nuv run --no-default-groups slm-lab plot -f folder1,folder2\n```\n\n## Citation\n\nIf you use SLM Lab in your research, please cite:\n\n```bibtex\n@misc{kenggraesser2017slmlab,\n    author = {Keng, Wah Loon and Graesser, Laura},\n    title = {SLM Lab},\n    year = {2017},\n    publisher = {GitHub},\n    journal = {GitHub repository},\n    howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab}},\n}\n```\n\n## License\n\nMIT\n","# [SLM Lab](https:\u002F\u002Fwww.amazon.com\u002Fdp\u002F0135172381) \u003Cbr> ![GitHub tag (latest SemVer)](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Ftag\u002Fkengz\u002Fslm-lab) ![CI](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fworkflows\u002FCI\u002Fbadge.svg)\n\n\u003Cp align=\"center\">\n  \u003Ci>基于 PyTorch 的模块化深度强化学习框架。\u003C\u002Fi>\n  \u003Cbr>\n  \u003Ci>《深度强化学习基础》（\u003Ca href=\"https:\u002F\u002Fwww.amazon.com\u002Fdp\u002F0135172381\">Foundations of Deep Reinforcement Learning\u003C\u002Fa>）一书的配套库。\u003C\u002Fi>\n  \u003Cbr>\n  \u003Ca href=\"https:\u002F\u002Fslm-lab.gitbook.io\u002Fslm-lab\u002F\">文档\u003C\u002Fa> · \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fblob\u002Fmaster\u002Fdocs\u002FBENCHMARKS.md\">基准测试结果\u003C\u002Fa>\n\u003C\u002Fp>\n\n>**注意：** v5.0 更新了 Gymnasium、`uv` 工具链以及现代依赖项，并增加了对 ARM 架构的支持——详情请参阅 [CHANGELOG.md](docs\u002FCHANGELOG.md)。\n>\n>对于本书读者：请使用 `git checkout v4.1.1` 来获取《深度强化学习基础》中的代码。\n\n|||||\n|:---:|:---:|:---:|:---:|\n| ![ppo beamrider](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_c123aacc5375.gif) | ![ppo breakout](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_a2a7dfeaa895.gif) | ![ppo kungfumaster](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_4a92dfc090de.gif) | ![ppo 
mspacman](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_268ef8bdac55.gif) |\n| BeamRider | Breakout | KungFuMaster | MsPacman |\n| ![ppo pong](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_2e841b53a58d.gif) | ![ppo qbert](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_a9ed3976f8b5.gif) | ![ppo seaquest](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_be90a1159620.gif) | ![ppo spaceinvaders](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_7c458b01cf48.gif) |\n| Pong | Qbert | Seaquest | Sp.Invaders |\n| ![sac ant](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_9be66a77c4fa.gif) | ![sac halfcheetah](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_aa2f86f58dc4.gif) | ![sac hopper](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_16722c868c53.gif) | ![sac humanoid](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_08d29086eca5.gif) |\n| Ant | HalfCheetah | Hopper | Humanoid |\n| ![sac doublependulum](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_173b45387279.gif) | ![sac pendulum](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_db513900b92b.gif) | ![sac reacher](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_235954f17239.gif) | ![sac walker](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_readme_1aff20ae0bc2.gif) |\n| Inv.DoublePendulum | InvertedPendulum | Reacher | Walker |\n\nSLM Lab 是一个用于 PyTorch 中 **强化学习**（RL）研究与应用的软件框架。强化学习通过试错来训练智能体做出决策，例如教机器人走路或让 AI 玩游戏。\n\n## SLM Lab 提供的功能\n\n| 功能 | 描述 |\n|---------|-------------|\n| **开箱即用的算法** | PPO、SAC、CrossQ、DQN、A2C、REINFORCE——已在 70 多个环境中验证 |\n| **简单配置** | JSON 规范文件完全定义实验，无需修改代码 |\n| **可重复性** | 每次运行都会保存规范和 git SHA，以便精确复现 |\n| **自动分析** | 训练曲线、指标和 TensorBoard 日志开箱即用 |\n| **云集成** | 
dstack 用于 GPU 训练，HuggingFace 用于分享结果 |\n\n## 算法\n\n| 算法 | 类型 | 最适合 | 已验证环境 |\n|-----------|------|----------|------------------------|\n| **REINFORCE** | 在策略 | 学习\u002F教学 | 经典 |\n| **SARSA** | 在策略 | 表格式 | 经典 |\n| **DQN\u002FDDQN+PER** | 离策略 | 离散动作 | 经典、Box2D、Atari |\n| **A2C** | 在策略 | 快速迭代 | 经典、Box2D、Atari |\n| **PPO** | 在策略 | 通用 | 经典、Box2D、MuJoCo（11）、Atari（54） |\n| **SAC** | 离策略 | 连续控制 | 经典、Box2D、MuJoCo |\n| **CrossQ** | 离策略 | 高效采样控制 | 经典、Box2D、MuJoCo |\n\n详细性能数据请参阅 [Benchmark Results](docs\u002FBENCHMARKS.md)。\n\n## 环境\n\nSLM Lab 使用 [Gymnasium](https:\u002F\u002Fgymnasium.farama.org\u002F)（OpenAI Gym 的维护分支）：\n\n| 类别 | 示例 | 难度 | 文档 |\n|----------|----------|------------|------|\n| **经典控制** | CartPole、Pendulum、Acrobot | 简单 | [Gymnasium Classic](https:\u002F\u002Fgymnasium.farama.org\u002Fenvironments\u002Fclassic_control\u002F) |\n| **Box2D** | LunarLander、BipedalWalker | 中等 | [Gymnasium Box2D](https:\u002F\u002Fgymnasium.farama.org\u002Fenvironments\u002Fbox2d\u002F) |\n| **MuJoCo** | Hopper、HalfCheetah、Humanoid | 困难 | [Gymnasium MuJoCo](https:\u002F\u002Fgymnasium.farama.org\u002Fenvironments\u002Fmujoco\u002F) |\n| **Atari** | Breakout、MsPacman 等 54 款游戏 | 不同 | [ALE](https:\u002F\u002Fale.farama.org\u002Fenvironments\u002F) |\n\n任何兼容 Gymnasium 的环境都可以使用——只需在规范中指定其名称即可。\n\n## 快速入门\n\n```bash\n# 安装\nuv sync\nuv tool install --editable .\n\n# 运行演示（PPO CartPole）\nslm-lab run                                    # PPO CartPole\nslm-lab run --render                           # 带可视化\n\n# 运行自定义实验\nslm-lab run spec.json spec_name train          # 本地训练\nslm-lab run-remote spec.json spec_name train   # 云端训练（dstack）\n\n# 帮助（CLI 使用 Typer）\nslm-lab --help                                 # 列出所有命令\nslm-lab run --help                             # run 命令的选项\n\n# 故障排除：如果找不到 slm-lab，请使用 uv run\nuv run slm-lab run\n```\n\n## 云端训练（dstack）\n\n在云端 GPU 上运行实验，并自动将结果同步到 HuggingFace。\n\n```bash\n# 设置\ncp .env.example .env  # 添加 HF_TOKEN\nuv tool install dstack  # 安装 dstack CLI\n# 配置 dstack 
服务器——详见 https:\u002F\u002Fdstack.ai\u002Fdocs\u002Fquickstart\n\n# 在云端运行\nslm-lab run-remote spec.json spec_name train           # CPU 训练（默认）\nslm-lab run-remote spec.json spec_name search          # CPU ASHA 搜索（默认）\nslm-lab run-remote --gpu spec.json spec_name train     # GPU 训练（适用于图像环境）\n\n# 同步结果\nslm-lab pull spec_name    # 从 HuggingFace 下载\nslm-lab list              # 列出可用实验\n```\n\n`.dstack\u002F` 中的配置选项：`run-gpu-train.yml`、`run-gpu-search.yml`、`run-cpu-train.yml`、`run-cpu-search.yml`\n\n### 极简安装（仅编排）\n\n对于仅用于分发 dstack 作业、同步结果和生成图表的轻量级机器（不进行本地机器学习训练）：\n\n```bash\nuv sync --no-default-groups  # 跳过机器学习依赖（torch、gymnasium 等）\nuv tool install dstack\nuv run --no-default-groups slm-lab run-remote spec.json spec_name train\nuv run --no-default-groups slm-lab pull spec_name\nuv run --no-default-groups slm-lab plot -f folder1,folder2\n```\n\n## 引用\n\n如果您在研究中使用 SLM Lab，请引用以下文献：\n\n```bibtex\n@misc{kenggraesser2017slmlab,\n    author = {Keng, Wah Loon and Graesser, Laura},\n    title = {SLM Lab},\n    year = {2017},\n    publisher = {GitHub},\n    journal = {GitHub repository},\n    howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab}},\n}\n```\n\n## 许可证\n\nMIT","# SLM-Lab 快速上手指南\n\nSLM-Lab 是一个基于 PyTorch 的模块化深度强化学习（Deep RL）框架，也是书籍《Foundations of Deep Reinforcement Learning》的配套代码库。它支持 PPO、SAC、DQN 等主流算法，适用于从经典控制到 Atari 游戏等多种环境。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux, macOS, 或 Windows (WSL2 推荐)。\n    *   *注：v5.0+ 版本已原生支持 ARM 架构（如 Apple Silicon）。*\n*   **Python 版本**：建议 Python 3.10 或更高版本。\n*   **前置工具**：\n    *   **uv**：SLM-Lab v5.0 推荐使用 `uv` 进行依赖管理和工具安装（比 pip 更快）。\n    *   **Git**：用于克隆代码库。\n*   **硬件要求**：\n    *   基础实验：普通 CPU 即可。\n    *   复杂环境（如 Atari, MuJoCo）：建议配备 NVIDIA GPU 以加速训练。\n\n> **提示**：国内开发者若遇到 `uv` 或 PyPI 源下载缓慢，可配置国内镜像加速：\n> ```bash\n> export UV_INDEX_URL=https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n## 安装步骤\n\nSLM-Lab 使用 `uv` 管理项目依赖和命令行工具。请按以下步骤操作：\n\n1.  
**克隆仓库**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab.git\n    cd SLM-Lab\n    ```\n    > **注意**：如果您是为了配合书籍《Foundations of Deep Reinforcement Learning》学习，请切换到对应版本：\n    > ```bash\n    > git checkout v4.1.1\n    > ```\n\n2.  **同步依赖并安装工具**\n    使用 `uv` 同步环境并将 `slm-lab` 命令安装到全局工具链中：\n    ```bash\n    uv sync\n    uv tool install --editable .\n    ```\n\n3.  **验证安装**\n    查看帮助信息以确认安装成功：\n    ```bash\n    slm-lab --help\n    ```\n    *如果终端提示找不到 `slm-lab` 命令，请使用 `uv run slm-lab --help` 运行。*\n\n## 基本使用\n\nSLM-Lab 的核心设计理念是通过 JSON 配置文件定义实验，无需修改代码即可运行不同的算法和环境。\n\n### 1. 运行默认示例\n运行默认的 PPO 算法在 CartPole 环境上进行训练：\n```bash\nslm-lab run\n```\n若要实时可视化训练过程（弹出窗口显示游戏画面）：\n```bash\nslm-lab run --render\n```\n\n### 2. 运行自定义实验\n您可以指定自己的配置文件（`.json`）和实验名称来启动训练：\n```bash\n# 格式：slm-lab run \u003C配置文件> \u003C实验名称> \u003C模式>\nslm-lab run spec.json my_experiment train\n```\n\n### 3. 查看结果\n训练完成后，SLM-Lab 会自动生成训练曲线、指标数据，并支持 TensorBoard 查看。结果通常保存在 `data\u002F` 目录下。\n\n### 4. 云端训练（可选）\n如果您配置了 `dstack` 和 HuggingFace Token，可以将任务分发到云端 GPU：\n```bash\n# 需先在 .env 文件中配置 HF_TOKEN\nslm-lab run-remote spec.json my_experiment train --gpu\n```\n\n---\n*更多详细算法基准测试和高级配置，请参阅官方文档或 `docs\u002FBENCHMARKS.md`。*","某机器人实验室的研究团队正在开发一款基于深度强化学习的四足机器人步态控制算法，需要在多种物理仿真环境中快速验证 PPO 和 SAC 等主流算法的有效性。\n\n### 没有 SLM-Lab 时\n- **代码重复造轮子**：每次切换算法（如从 DQN 换到 PPO）或环境，研究人员需手动重写大量数据收集、网络构建和训练循环代码，耗时且易错。\n- **实验配置混乱**：超参数散落在多个脚本文件中，缺乏统一标准，导致难以复现之前的实验结果，团队协作时常出现“在我机器上能跑”的纠纷。\n- **分析工作繁琐**：训练结束后，需手动编写脚本提取日志数据并绘制学习曲线，无法实时直观地对比不同配置下的性能差异。\n- **复现成本高昂**：由于未自动记录代码版本和具体配置，几个月后想要复现某个最优模型时，往往因环境依赖或参数丢失而失败。\n\n### 使用 SLM-Lab 后\n- **模块化即插即用**：通过 JSON 配置文件即可一键定义实验，无需修改核心代码便能灵活组合 PPO、SAC 等算法与各类仿真环境，研发效率提升数倍。\n- **配置标准化管理**：所有实验参数集中管理，SLM-Lab 自动保存每次运行的完整配置与 Git 版本哈希，确保任何成员都能精确复现实验过程。\n- **自动化可视化分析**：训练过程中自动生成 TensorBoard 日志和性能曲线，研究人员可实时监控收敛情况并快速筛选出最优策略。\n- **无缝云端部署**：直接集成 dstack 和 HuggingFace，轻松将本地实验迁移至 GPU 集群训练，并方便地分享基准测试结果。\n\nSLM-Lab 
通过将复杂的强化学习流程标准化和模块化，让研究团队从繁琐的工程实现中解放出来，专注于核心算法的创新与调优。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkengz_SLM-Lab_1742f3e8.png","kengz","Keng","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fkengz_998657a3.jpg","Engineer by day, rock climber by night. Mathematician at heart.",null,"NYC","kengzwl@gmail.com","kengz.github.io","https:\u002F\u002Fgithub.com\u002Fkengz",[82,86,90],{"name":83,"color":84,"percentage":85},"Python","#3572A5",98.7,{"name":87,"color":88,"percentage":89},"Shell","#89e051",1.1,{"name":91,"color":92,"percentage":93},"Dockerfile","#384d54",0.2,1346,288,"2026-04-06T06:17:33","MIT","Linux, macOS","非必需。CPU 可运行经典控制任务；GPU 推荐用于 Atari 图像环境训练。具体型号、显存及 CUDA 版本未在文档中明确说明（依赖 PyTorch 默认支持）。","未说明",{"notes":102,"python":103,"dependencies":104},"v5.0 版本已更新为使用 Gymnasium（OpenAI Gym 的维护分支）和 uv 包管理工具，并增加了对 ARM 架构的支持。若需复现书籍《Foundations of Deep Reinforcement Learning》中的代码，请切换至 v4.1.1 分支。支持通过 dstack 进行云端 GPU 训练并将结果同步至 HuggingFace。","未说明 (需支持 uv 工具及现代 PyTorch 版本)",[105,106,107,108,109],"torch","gymnasium","uv","dstack","typer",[14,111],"其他",[113,114,115,116,117,118,119,120,121,122],"pytorch","reinforcement-learning","deep-reinforcement-learning","benchmark","policy-gradient","dqn","ppo","sac","a2c","a3c","2026-03-27T02:49:30.150509","2026-04-13T06:10:42.024506",[126,131,136,141,146,151],{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},31435,"在 'search' 模式下运行示例时出现错误且无法生成图表，如何解决？","这通常是由于 plotly-orca 依赖项的问题。解决方案如下：\n1. 尝试卸载 orca：运行 `conda uninstall plotly-orca`，此时搜索将完成但不会生成 PNG 文件（CSV 数据仍可用）。\n2. 若需生成图表，请设置环境变量 `no_proxy`：运行 `export no_proxy='*'`，然后重新运行，即可成功生成 PNG 文件。\n3. 
建议更新到最新的主分支代码，其中已修复了相关崩溃问题。","https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fissues\u002F442",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},31436,"运行 DQN 示例时出现 OpenGL 错误 (GL_INVALID_OPERATION) 且进程结束无图表，怎么办？","该问题是由旧版本的 plotly-orca 引起的图形渲染错误。维护者已在 v4.2.4 版本中用 'kaleido' 替换了 'orca' 作为 Plotly 的后端。\n解决方法：请升级 SLM-Lab 到最新版本 (v4.2.4 或更高)，命令通常为 `git pull` 或通过 pip 更新。升级后无需额外配置即可解决该 OpenGL 错误并正常生成图表。","https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fissues\u002F465",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},31437,"程序执行结束时报错 'Invalid property specified for object of type plotly.graph_objs.Layout: yaxis2'，如何修复？","这是因为安装的 Plotly 版本过新，与当前代码不兼容导致的属性错误。\n解决方法：\n1. 降级 Plotly 到兼容版本（通常建议参考项目 requirements 中的版本）。\n2. 或者更新 SLM-Lab 到最新代码以支持新版 Plotly。\n3. 某些 Linux 发行版可能无法继承 shell 变量，尝试直接使用 `python run_lab.py` 命令而不是通过 yarn 运行（确保已激活 conda 环境）。","https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fissues\u002F142",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},31438,"如何在 Arch Linux 或其他非标准发行版上安装 SLM-Lab 所需的依赖？","虽然官方主要支持 Ubuntu，但在 Arch 等发行版上安装时，需要手动确保安装了构建工具和基础依赖。\n关键步骤：\n1. 确保安装了 `base-devel` (Arch) 或对应的编译工具链。\n2. 如果使用 `yarn install` 失败，检查是否缺少 Node.js 或 Python 开发头文件。\n3. 推荐直接使用 `pip` 安装 Python 依赖，并参考项目文档中的系统依赖列表手动通过包管理器（如 pacman）安装缺失的系统库。","https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fissues\u002F140",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},31439,"如何在 SLM-Lab 中添加非 Gym 环境（如麻将、扑克等多智能体环境）？","目前 SLM-Lab 原生仅支持单智能体 (Single-Agent) 环境，尚未正式支持多智能体 (Multi-Agent) 功能。\n若要使用非 Gym 环境（如 rlcard）：\n1. 您需要自定义代码，无法直接通过修改 spec JSON 文件实现多智能体配置。\n2. 可以参考官方文档 \"Using SLM-Lab in Your Project\" 部分，学习如何在外部项目中导入 SLM-Lab 的 Agent。\n3. 
您需要自行编写自定义的 Session 类来适配您的环境接口。","https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fissues\u002F505",{"id":152,"question_zh":153,"answer_zh":154,"source_url":130},31440,"为什么在 search 模式下没有生成图表，只有日志文件？","SLM-Lab 设计了一个通用的 try-catch 机制来包裹绘图代码，以确保即使绘图失败（如 segfault），主训练循环也不会中断。\n如果只看到日志而没有图表：\n1. 这通常意味着绘图后端（如 orca 或 kaleido）崩溃或被拦截。\n2. 检查是否安装了正确的绘图后端（推荐使用最新的 kaleido）。\n3. 如果是网络代理问题，尝试设置 `no_proxy` 环境变量。\n4. 确认没有发生导致绘图进程崩溃的严重错误，因为简单的 flag 无法绕过这种底层崩溃。",[156,161,166,171,176,181,186,191,196,201,206,211,216,221,226,231,236,241,246,251],{"id":157,"version":158,"summary_zh":159,"released_at":160},231163,"v5.2.0","# [5.2.0](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcompare\u002Fv5.1.0...v5.2.0) (2026-03-04)\n\n\n### Bug修复\n\n* 调整学习曲线中的CrossQ帧数——Ant 300万帧、Humanoid 200万帧、Swimmer 500万帧 ([f251cba](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002Ff251cba09d054096a2c8bfaf4944bdd508144cb3))\n* 对齐CrossQ MuJoCo规范与实际基准运行，以确保可复现性 ([2cf11b0](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002F2cf11b04a1d172384f2ae33495c27cdf17e415b8))\n* 将CrossQ复现表格中的max_frame与实际运行数据对齐 ([a4d3026](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002Fa4d30262613243be2a79b83a487c7d066e78e8fb))\n* 规范中统一数值替换及未替换变量的验证 ([1114fb5](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002F1114fb5618ed347019f75da167efd2ae351d4d17))\n* 将CrossQ max_frame上限设为SAC水平，以实现公平比较 ([1aae308](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002F1aae30821ed35ce8f66dbac9eb93855442228560))\n* CartPole帧数由20万提升至30万，Humanoid迭代次数由2次增至4次，以提高UTD值 ([f0f1dc0](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002Ff0f1dc04f91e51ff9ede652d7f8f37e2f67dbb25))\n* CartPole恢复为20万帧，并增加training_iter=2以获得更多梯度更新 ([bda2611](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002Fbda26112f365a4a680b750e95ce781b79884f296))\n* CartPole将training_iter设置为2，适度提升UTD值 
([1323766](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002F13237662bd0992694ece36991a4c6bd581736087))\n* CartPole将training_iter设置为4，BRN预热步数设为2000，以获取更多梯度更新 ([f57d843](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002Ff57d8433a30649740ef2c61538b952d95a334d30))\n* CartPole将training_start_step设置为5000，以改善初始缓冲区的多样性 ([3e6102d](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002F3e6102d76be22bdd113d0a29426903b338f695ec))\n* 修正审计中发现的4处BENCHMARKS.md不一致之处 ([7cfc6c5](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002F7cfc6c57ce4b0257516b7226269f4b8a14bcd48a))\n* CrossQ InvDblPend critic网络隐藏层大小由[1024]调整为[512]，Humanoid帧数由350万提升至400万 ([92803f7](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002F92803f713af00d5b83c682109de7ea80a1c1017c))\n* 恢复PPO的minibatch_size=64以及Atari硬编码的最大帧数 ([d8f5078](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002Fd8f50782fdba18e52b02b719981be33a8b03932d))\n* 将CartPole规范回滚至与得分405的弧线运行相匹配的状态 ([d8695ec](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002Fd8695ec03542c2bb8e68b87cb6496ea4f948729f))\n* 回滚固定内存以诊断持续的分数下降问题 ([21beecb](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002F21beecb1f3d1fa63d078acd031edacd8948633a0))\n* 更新dstack配置以兼容0.20.x版本 ([ca91e73](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002Fca91e737028c1a5715e1e62d800091855ef1ac3d))\n\n\n### 功能\n\n* 版本升级至5.2.0——CrossQ算法 ([6e0f68b](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002F6e0f68bf5a408d3defccd4983b997338ebcccbd5))\n* CrossQ算法——无目标网络的SAC + BatchRenorm ([945625b](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002F945625bc82cd5368551ca2aea3c9d6a0f787258d))\n* 所有环境的CrossQ基准测试规范 ([37a25ea](https:\u002F\u002Fgithub.","2026-03-04T10:41:30",{"id":162,"version":163,"summary_zh":164,"released_at":165},231164,"v5.1.0","# 
[5.1.0](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcompare\u002Fv5.0.2...v5.1.0) (2026-02-19)\n\n\n### 功能特性\n\n* 带有完整基准测试验证的 TorchArc YAML 网络架构 ([e56a414](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002Fe56a41477b814a51d1bc8831a699c01315f04c94))\n\n\n\n","2026-02-19T03:46:35",{"id":167,"version":168,"summary_zh":169,"released_at":170},231165,"v5.0.2","## SAC Atari 基准测试 - 全部 58 款游戏\n\n在全部 58 款游戏中完成 SAC Atari 基准测试（每款游戏 200 万帧，每个种子运行 4 次）。\n\n- 单一通用配置文件 (`sac_atari.json`)：training_iter=3，分类分布，AdamW 优化器，学习率 = 3e-4\n- 所有 58 款游戏的 A2C、PPO 和 SAC 对比图表\n- 结果已公开发布至 Hugging Face 数据集 [SLM-Lab\u002Fbenchmark](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FSLM-Lab\u002Fbenchmark)\n- 简化了 CLAUDE.md 文件，并更新了基准测试技能与数据生命周期相关文档\n- 移除了过时的 SAC PER 和 sac_pong 配置文件\n\n**SAC 表现最佳的游戏**：CrazyClimber 81839 分，Atlantis 64097 分，VideoPinball 22541 分  \n**SAC 表现最差的游戏**：Tennis -374 分，FishingDerby -77 分，DoubleDunk -44 分，Enduro 0 分，Freeway 0 分  \n\n总体而言，SAC 在 Atari 游戏上的表现逊于 PPO（在 58 款游戏中仅胜出约 10 款），但这一负结果同样具有参考价值。","2026-02-14T18:40:15",{"id":172,"version":173,"summary_zh":174,"released_at":175},231166,"v5.0.1","## [5.0.1](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcompare\u002Fv5.0.0...v5.0.1) (2026-02-11)\n\n\n### 错误修复\n\n* 所有环境的 SAC 基准规范 ([740da34](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002F740da3418ddb9869aaacba8309e6282c6b621f02))\n* SAC 离散动作、算法修复以及 uint8 重放缓存 ([23b6fbf](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcommit\u002F23b6fbf9de7f140006e7e8f757928419c6d48cf2))\n\n\n\n","2026-02-11T02:57:51",{"id":177,"version":178,"summary_zh":179,"released_at":180},231167,"v5.0.0","重大现代化发布，将 SLM-Lab 从 OpenAI Gym 迁移到 Gymnasium，迁移到现代 Python 工具链（`uv`），并在 70 多个环境中验证了所有算法。\n\n## 主要变更\n\n- **Gymnasium 迁移**，正确处理 `terminated`\u002F`truncated` 标志\n- **现代工具链**：`uv` + `pyproject.toml`，Python 3.12+，PyTorch 2.8+\n- **简化规范**：不再使用 `body` 部分或数组包装器\n- **完整基准测试验证**：7 种算法 × 4 类环境\n- **云训练支持**：通过 dstack + HuggingFace 实现\n\n## 
基准测试结果\n\n| 算法       | 经典   | Box2D  | MuJoCo | Atari |\n|------------|--------|--------|--------|-------|\n| REINFORCE  | ✅     | —      | —      | —     |\n| SARSA      | ✅     | —      | —      | —     |\n| DQN        | ✅     | ✅     | —      | —     |\n| DDQN+PER   | ✅     | ✅     | —      | —     |\n| A2C        | ✅     | ⚠️     | ⚠️     | ✅ 54 款游戏 |\n| PPO        | ✅     | ✅     | ✅ 11 个环境 | ✅ 54 款游戏 |\n| SAC        | ✅     | ✅     | ✅ 11 个环境 | —     |\n\n**Atari 基准测试** 使用 ALE v5 并启用粘滞动作（`repeat_action_probability=0.25`），遵循 [Machado 等人 (2018)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1709.06009) 的研究最佳实践。\n\n## 破坏性变更\n\n- 环境名称：`CartPole-v0` → `CartPole-v1`，`PongNoFrameskip-v4` → `ALE\u002FPong-v5`\n- 规范格式简化：`agent: [{...}]` → `agent: {...}`\n- 移除 `body` 部分，属性移至 `agent`\n- Roboschool → MuJoCo（`RoboschoolHopper-v1` → `Hopper-v5`）\n\n## 快速入门\n\n```bash\n# 安装\nuv sync && uv tool install --editable .\n\n# 运行\nslm-lab run spec.json spec_name train\n```\n\n## 书籍读者\n\n如需《深度强化学习基础》中的确切代码，请使用：\n```bash\ngit checkout v4.1.1\n```\n\n更多详细信息请参阅 [CHANGELOG.md](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fblob\u002Fmaster\u002FCHANGELOG.md)。","2026-02-02T14:08:25",{"id":182,"version":183,"summary_zh":184,"released_at":185},231168,"v4.2.4","## 变更内容\n* 升级 Plotly，由 @kengz 在 https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fpull\u002F501 中将 orca 替换为 kaleido\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcompare\u002Fv4.2.3...v4.2.4","2021-12-18T15:52:55",{"id":187,"version":188,"summary_zh":189,"released_at":190},231169,"v4.2.3","## 变更内容\n* 在 https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fpull\u002F488 中，@dd-iuonac 添加了 VideoPinball-v0 游戏的算法配置文件。\n* 在 https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fpull\u002F496 中，@kengz 和 @Karl-Grantham 修复了针对新款 RTX 显卡的构建问题。\n* 在 https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fpull\u002F499 中，移除了 `reinforce_pong.json` 规范，以避免混淆。\n\n## 新贡献者\n* @dd-iuonac 在 
https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fpull\u002F488 中完成了首次贡献。\n* @Karl-Grantham 协助调试 #496。\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fcompare\u002Fv4.2.2...v4.2.3","2021-12-06T00:05:08",{"id":192,"version":193,"summary_zh":194,"released_at":195},231170,"v4.2.2","## 提升安装稳定性\n*:raised_hands: 感谢 @Nickfagiano 在调试方面的帮助。*\n- #487 更新安装流程，使其兼容 macOS Big Sur\n- #487 改进 Conda 路径检查机制，提升安装稳定性\n- #487 为安全起见，将 `atari-py` 版本锁定至 0.2.6\n\n## Google Colab\u002FJupyter\n*:raised_hands: 感谢 @piosif97 的协助。*\n- 新增 [Google Colab\u002FJupyter 使用说明](https:\u002F\u002Fslm-lab.gitbook.io\u002Fslm-lab\u002Fresources\u002Fhelp#google-colab-jupyter-setup)\n- [Colab 笔记本示例](https:\u002F\u002Fgist.github.com\u002Fkengz\u002F6fd52a902129fb6d4509c721d71bda48)\n\n## Windows 系统安装\n*:raised_hands: 感谢 @vladimirnitu 和 @steindaian 提供的 PDF 文档。*\n- 新增 [Windows 系统安装说明](https:\u002F\u002Fslm-lab.gitbook.io\u002Fslm-lab\u002Fsetup\u002Finstallation#windows)","2021-05-25T16:28:27",{"id":197,"version":198,"summary_zh":199,"released_at":200},231171,"v4.2.1","## 安装更新\nSLM Lab 周边的依赖和系统发生了变化，导致了一些兼容性问题。本次发布修复了这些安装问题。\n\n- #461、#476 更新至 `homebrew\u002Fcask`（感谢 @ben-e、@amjadmajid）\n- #463 将 pybullet 添加到依赖项中（感谢 @rafapi）\n- #483 修复 Arch Linux 安装中缺失的安装命令（感谢 @sebimarkgraf）\n- #485 将 GitHub Actions CI 更新至 v2\n- #485 修正演示测试用例，使其使用严格 JSON 格式","2021-05-17T03:35:20",{"id":202,"version":203,"summary_zh":204,"released_at":205},231172,"v4.2.0","## 恢复训练模式\n\n- #455 添加了 `train@` 恢复训练模式，并重构了 `enjoy` 模式。详细信息请参阅该 PR。\n\n### `train@` 使用示例\n\n将训练模式指定为 `train@{predir}`，其中 `{predir}` 是上一次训练的数据目录；也可以直接使用 `latest` 来使用最近一次的训练结果。例如：\n```bash\npython run_lab.py slm_lab\u002Fspec\u002Fbenchmark\u002Freinforce\u002Freinforce_cartpole.json reinforce_cartpole train\n# 在训练未完成时终止运行\n# 可选地以过去与未来一致的方式编辑配置文件\n\n# 使用以下任一命令继续训练：\npython run_lab.py slm_lab\u002Fspec\u002Fbenchmark\u002Freinforce\u002Freinforce_cartpole.json reinforce_cartpole train@latest\n# 或者指定某个特定的运行文件夹：\npython run_lab.py 
slm_lab\u002Fspec\u002Fbenchmark\u002Freinforce\u002Freinforce_cartpole.json reinforce_cartpole train@data\u002Freinforce_cartpole_2020_04_13_232521\n```\n\n### `enjoy` 模式重构\n\n`train@` 恢复训练模式的 API 使得 `enjoy` 模式可以被重构。两者具有相似的语法。继续以上面的例子为例，要体验训练好的模型，我们现在使用：\n```bash\npython run_lab.py slm_lab\u002Fspec\u002Fbenchmark\u002Freinforce\u002Freinforce_cartpole.json reinforce_cartpole enjoy@data\u002Freinforce_cartpole_2020_04_13_232521\u002Freinforce_cartpole_t0_s0_spec.json\n```\n\n## Plotly 和 PyTorch 更新\n\n- #453 将 Plotly 更新至 4.5.4，PyTorch 更新至 1.3.1。\n- #454 在绘图完成后显式关闭 Plotly orca 服务器，以防止出现僵尸进程。\n\n## PPO 批量大小优化\n\n- #453 增加了分块处理功能，允许 PPO 在更大的批量大小下运行，通过拆分前向传播循环来实现。\n\n## 新的 OnPolicyCrossEntropy 内存类\n\n- #446 添加了一个新的 `OnPolicyCrossEntropy` 内存类。详情请参阅该 PR。感谢 @ingambe 的贡献。","2020-04-14T17:08:35",{"id":207,"version":208,"summary_zh":209,"released_at":210},231173,"v4.1.1","## Discrete SAC benchmark update\r\n\r\n- [Upload PR #429](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fpull\u002F429)\r\n- [Dropbox data](https:\u002F\u002Fwww.dropbox.com\u002Fs\u002Faz4vncwwktyotol\u002Fbenchmark_discrete_2019_09.zip?dl=0)\r\n\r\n||||||||\r\n|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\r\n| Env. \\ Alg. 
| DQN | DDQN+PER | A2C (GAE) | A2C (n-step) | PPO | SAC |\r\n| Breakout \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737546-dabb6380-f9c8-11e9-901e-b96cc28f1fdf.png\">\u003C\u002Fdetails> | 80.88 | 182 | 377 | 398 | **443** | 3.51* |\r\n| Pong \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737554-e018ae00-f9c8-11e9-92b5-3bd8d213b1e0.png\">\u003C\u002Fdetails> | 18.48 | 20.5 | 19.31 | 19.56 | **20.58** | 19.87* |\r\n| Seaquest \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737557-e3139e80-f9c8-11e9-9446-119593ca956b.png\">\u003C\u002Fdetails> | 1185 | **4405** | 1070 | 1684 | 1715 | 171* |\r\n| Qbert \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737559-e575f880-f9c8-11e9-8c98-f14c82041a45.png\">\u003C\u002Fdetails> | 5494 | 11426 | 12405 | **13590** | 13460 | 923* |\r\n| LunarLander \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737566-e7d85280-f9c8-11e9-8df8-39c1205c5308.png\">\u003C\u002Fdetails> | 192 | 233 | 25.21 | 68.23 | 214 | **276** |\r\n| UnityHallway \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737569-ead34300-f9c8-11e9-9e26-61fe1d779989.png\">\u003C\u002Fdetails> | -0.32 | 0.27 | 0.08 | -0.96 | **0.73** | 0.01 |\r\n| UnityPushBlock \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg 
src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737577-eeff6080-f9c8-11e9-931c-843ba697779c.png\">\u003C\u002Fdetails> | 4.88 | 4.93 | 4.68 | 4.93 | **4.97** | -0.70 |\r\n\r\n>Episode score at the end of training attained by SLM Lab implementations on discrete-action control problems. Reported episode scores are the average over the last 100 checkpoints, and then averaged over 4 Sessions. A Random baseline with score averaged over 100 episodes is included. Results marked with `*` were trained using the hybrid synchronous\u002Fasynchronous version of SAC to parallelize and speed up training time. For SAC, Breakout, Pong and Seaquest were trained for 2M frames instead of 10M frames.\r\n\r\n>For the full Atari benchmark, see [Atari Benchmark](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fblob\u002Fbenchmark\u002FBENCHMARK.md#atari-benchmark)","2019-11-13T08:21:09",{"id":212,"version":213,"summary_zh":214,"released_at":215},231174,"v4.1.0","This marks a stable release of SLM Lab with full benchmark results\r\n\r\n## RAdam+Lookahead optimizer\r\n\r\n- [Lookahead](https:\u002F\u002Farxiv.org\u002Fabs\u002F1907.08610) + [RAdam](https:\u002F\u002Farxiv.org\u002Fabs\u002F1908.03265) optimizer significantly improves the performance of some RL algorithms (A2C (n-step), PPO) on continuous domain problems, but does not improve (A2C (GAE), SAC). #416\r\n\r\n## TensorBoard\r\n\r\n- Add TensorBoard in body to auto-log summary variables, graph, network parameter histograms, action histogram. To launch TensorBoard, run `tensorboard --logdir=data` after a session\u002Ftrial is completed. 
Example screenshot:\r\n\r\n\u003Cimg width=\"1423\" alt=\"Screen Shot 2019-10-14 at 10 41 36 PM\" src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F66803221-d9bc0980-eed3-11e9-92b8-0e5cd42a6eab.png\">\r\n\r\n## Full Benchmark Upload\r\n\r\n#### Plot Legend\r\n\r\n>\u003Cimg width=\"400\" alt=\"legend\" src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737544-d727dc80-f9c8-11e9-904a-319b9aafd41b.png\">\r\n\r\n\r\n### Discrete Benchmark\r\n\r\n- [Upload PR #427](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fpull\u002F427)\r\n- [Dropbox data](https:\u002F\u002Fwww.dropbox.com\u002Fs\u002Faz4vncwwktyotol\u002Fbenchmark_discrete_2019_09.zip?dl=0)\r\n\r\n||||||||\r\n|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\r\n| Env. \\ Alg. | DQN | DDQN+PER | A2C (GAE) | A2C (n-step) | PPO | SAC |\r\n| Breakout \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737546-dabb6380-f9c8-11e9-901e-b96cc28f1fdf.png\">\u003C\u002Fdetails> | 80.88 | 182 | 377 | 398 | **443** |  - |\r\n| Pong \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737554-e018ae00-f9c8-11e9-92b5-3bd8d213b1e0.png\">\u003C\u002Fdetails> | 18.48 | 20.5 | 19.31 | 19.56 | **20.58** | 19.87* |\r\n| Seaquest \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737557-e3139e80-f9c8-11e9-9446-119593ca956b.png\">\u003C\u002Fdetails> | 1185 | **4405** | 1070 | 1684 | 1715 |  - |\r\n| Qbert \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737559-e575f880-f9c8-11e9-8c98-f14c82041a45.png\">\u003C\u002Fdetails> | 5494 | 11426 | 
12405 | **13590** | 13460 | 214* |\r\n| LunarLander \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737566-e7d85280-f9c8-11e9-8df8-39c1205c5308.png\">\u003C\u002Fdetails> | 192 | 233 | 25.21 | 68.23 | 214 | **276** |\r\n| UnityHallway \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737569-ead34300-f9c8-11e9-9e26-61fe1d779989.png\">\u003C\u002Fdetails> | -0.32 | 0.27 | 0.08 | -0.96 | **0.73** | - |\r\n| UnityPushBlock \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737577-eeff6080-f9c8-11e9-931c-843ba697779c.png\">\u003C\u002Fdetails> | 4.88 | 4.93 | 4.68 | 4.93 | **4.97** | - |\r\n\r\n>Episode score at the end of training attained by SLM Lab implementations on discrete-action control problems. Reported episode scores are the average over the last 100 checkpoints, and then averaged over 4 Sessions. Results marked with `*` were trained using the hybrid synchronous\u002Fasynchronous version of SAC to parallelize and speed up training time.\r\n\r\n>For the full Atari benchmark, see [Atari Benchmark](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fblob\u002Fbenchmark\u002FBENCHMARK.md#atari-benchmark)\r\n\r\n### Continuous Benchmark\r\n\r\n- [Upload PR #427](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fpull\u002F427)\r\n- [Dropbox data](https:\u002F\u002Fwww.dropbox.com\u002Fs\u002Fxaxybertpwt4s9j\u002Fbenchmark_continuous_2019_09.zip?dl=0)\r\n\r\n||||||\r\n|:---:|:---:|:---:|:---:|:---:|\r\n| Env. \\ Alg. 
| A2C (GAE) | A2C (n-step) | PPO | SAC |\r\n| RoboschoolAnt \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737923-1571cb80-f9ca-11e9-8f6b-b288fa19bff0.png\">\u003C\u002Fdetails> | 787 | 1396 | 1843 | **2915** |\r\n| RoboschoolAtlasForwardWalk \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737924-1571cb80-f9ca-11e9-98ee-82c920dfbf44.png\">\u003C\u002Fdetails> | 59.87 | 88.04 | 172 | **800** |\r\n| RoboschoolHalfCheetah \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737925-1571cb80-f9ca-11e9-9c7f-3a8294a517af.png\">\u003C\u002Fdetails> | 712 | 439 | 1960 | **2497** |\r\n| RoboschoolHopper \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737926-160a6200-f9ca-11e9-8cae-9afc532e5af8.png\">\u003C\u002Fdetails> | 710 | 285 | 2042 | **2045** |\r\n| RoboschoolInvertedDoublePendulum \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737927-160a6200-f9ca-11e9-8eb2-e04554e3844f.png\">\u003C\u002Fdetails> | 996 | 4410 | 8076 | **8085** |\r\n| RoboschoolInvertedPendulum \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F67737928-160a6200-f9ca-11e9-8eae-e7a3ccbe914a.png\">\u003C\u002Fdetails> | **995** | 978 | 986 | 941 |\r\n| RoboschoolReacher \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C","2019-10-29T05:11:21",{"id":217,"version":218,"summary_zh":219,"released_at":220},231175,"v4.0.1","This 
release adds a new algorithm: Soft Actor-Critic (SAC).\r\n\r\n## Soft Actor-Critic\r\n- implement the original paper: \"Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor\" https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.01290 #398 \r\n- implement the improvements from the follow-up SAC paper: \"Soft Actor-Critic Algorithms and Applications\" https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.05905 #399 \r\n- extend SAC to work directly on discrete environments using a custom `GumbelSoftmax` distribution\r\n\r\n### Roboschool (continuous control) Benchmark\r\n\r\n>Note that the Roboschool reward scales are different from MuJoCo's.\r\n\r\n| Env. \\ Alg. | SAC |\r\n|:---|---|\r\n| RoboschoolAnt | 2451.55 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62837481-c1eead80-bc24-11e9-913e-7685d64ecf87.png\">\u003C\u002Fdetails> |\r\n| RoboschoolHalfCheetah | 2004.27 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62837485-daf75e80-bc24-11e9-8fba-279802ccdd1d.png\">\u003C\u002Fdetails> |\r\n| RoboschoolHopper | 2090.52 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62837491-e8144d80-bc24-11e9-9d06-27a35b4aacca.png\">\u003C\u002Fdetails> |\r\n| RoboschoolWalker2d | 1711.92 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62837495-f2364c00-bc24-11e9-8bdc-fa88831c227b.png\">\u003C\u002Fdetails> |\r\n\r\n\r\n### LunarLander (discrete control) Benchmark\r\n\r\n| | 
|\r\n|---|---|\r\n|![sac_lunar_t0_trial_graph_mean_returns_vs_frames](https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62837421-0cbbf580-bc24-11e9-89db-4a0da92b27b8.png)|![sac_lunar_t0_trial_graph_mean_returns_ma_vs_frames](https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62837420-0cbbf580-bc24-11e9-9214-2efb14778014.png)|\r\n| Trial graph | Moving average |\r\n\r\n","2019-08-11T18:14:14",{"id":222,"version":223,"summary_zh":224,"released_at":225},231176,"v4.0.0","This release corrects and optimizes all the algorithms from benchmarking on Atari. New metrics are introduced. The lab's API is also redesigned for simplicity.\r\n\r\n## Benchmark\r\n- full algorithm benchmark on 4 core Atari environments #396\r\n- LunarLander benchmark #388 and BipedalWalker benchmark #377\r\n\r\nThis benchmark table is pulled from [PR396](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fpull\u002F396). See the full [benchmark results here](https:\u002F\u002Fgithub.com\u002Fkengz\u002FSLM-Lab\u002Fblob\u002Fmaster\u002FBENCHMARK.md).\r\n\r\n| Env. \\ Alg. 
| A2C (GAE) | A2C (n-step) | PPO | DQN | DDQN+PER |\r\n|:---|---|---|---|---|---|\r\n| Breakout \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62232119-554cf680-b37a-11e9-9059-3e49bbb799d2.png\">\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62232118-554cf680-b37a-11e9-9d5b-dd2ddf527305.png\">\u003C\u002Fdetails> | 389.99 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62019989-0171c000-b176-11e9-94da-017b146afe65.png\">\u003C\u002Fdetails> | 391.32 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62020340-6c6fc680-b177-11e9-8aa1-9ac5c2001783.png\">\u003C\u002Fdetails> | **425.89** \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62067085-c0b28f00-b1e7-11e9-9dd5-c52b6104878f.png\">\u003C\u002Fdetails> | 65.04 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62100441-9ba13900-b246-11e9-9373-95c6063915ab.png\">\u003C\u002Fdetails> | 181.72 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62230967-dd7dcc80-b377-11e9-965b-60a9f3d5a7a1.png\">\u003C\u002Fdetails> |\r\n| Pong \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62232135-5b42d780-b37a-11e9-9454-ff2d109ef4f4.png\">\u003Cimg 
src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62232134-5b42d780-b37a-11e9-892f-a84ea8881e78.png\">\u003C\u002Fdetails> | 20.04 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62020247-10a53d80-b177-11e9-9f0d-1433d4d87210.png\">\u003C\u002Fdetails> | 19.66 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62020342-6f6ab700-b177-11e9-824e-75f431dc14ec.png\">\u003C\u002Fdetails> | 20.09 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62067100-c6a87000-b1e7-11e9-919e-ad68e4166213.png\">\u003C\u002Fdetails> | 18.34 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62100450-9fcd5680-b246-11e9-8170-2ad4473e8294.png\">\u003C\u002Fdetails> | **20.44** \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62230975-e2428080-b377-11e9-8970-6917ae80c0b4.png\">\u003C\u002Fdetails> |\r\n| Qbert \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62232149-60078b80-b37a-11e9-99bb-cedc9fe064d5.png\">\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62232148-60078b80-b37a-11e9-9610-17ac447a479f.png\">\u003C\u002Fdetails> | 13,328.32 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62020263-261a6780-b177-11e9-8936-22a74d2405d3.png\">\u003C\u002Fdetails> | 
13,259.19 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62020347-742f6b00-b177-11e9-8bfb-edfcfd44c8b7.png\">\u003C\u002Fdetails> | **13,691.89** \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62067104-cb6d2400-b1e7-11e9-9c4f-9eaac265d7d6.png\">\u003C\u002Fdetails> | 4,787.79 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62100455-a4920a80-b246-11e9-8ca5-d4dc1ce3d76f.png\">\u003C\u002Fdetails> | 11,673.52 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62230986-e79fcb00-b377-11e9-8861-3686954b7e1a.png\">\u003C\u002Fdetails> |\r\n| Seaquest \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62232168-6bf34d80-b37a-11e9-9564-fa3609dc5c75.png\">\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62232167-6bf34d80-b37a-11e9-8db3-c79a0e78292b.png\">\u003C\u002Fdetails> | 892.68 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62020266-29adee80-b177-11e9-83c2-fafbdbb982b9.png\">\u003C\u002Fdetails> | 1,686.08 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62020350-772a5b80-b177-11e9-8917-e3c8a745cd08.png\">\u003C\u002Fdetails> | 1,583.04 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg 
src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62067113-cf994180-b1e7-11e9-870b-b9bba71f2a7e.png\">\u003C\u002Fdetails> | 1,118.50 \u003Cdetails>\u003Csummary>\u003Ci>graph\u003C\u002Fi>\u003C\u002Fsummary>\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F62100462-a9ef5500-b246-11e9-8699-9356ff81ff93.png\">\u003C\u002Fdetails>","2019-07-31T17:19:25",{"id":227,"version":228,"summary_zh":229,"released_at":230},231177,"v3.2.1","## Improve installation\r\n- #288 split out yarn installation as an extra step\r\n\r\n## Improve functions\r\n- #283 #284 redesign fitness slightly\r\n- #281 simplify PER sample index\r\n- #287 #290 improve DQN polyak and network switching\r\n- #291 refactor advantage functions\r\n- #295 #296 refactor various utils, fix PyTorch inplace ops\r\n\r\n## Add out layer activation\r\n- #300 add out layer activation","2019-04-17T16:49:44",{"id":232,"version":233,"summary_zh":234,"released_at":235},231178,"v3.2.0","## Eval rework\r\n\r\n#275 #278 #279 #280 \r\n\r\nThis release adds an eval mode that matches the OpenAI baselines setup. Spawn 2 environments, 1 for training and 1 for eval. In the same process (blocking), run training as usual, then at each ckpt, run an episode on the eval env and update stats.\r\n\r\nThe logic for the stats is the same as before, except the original `body.df` is now split into two: `body.train_df` and `body.eval_df`. Eval df uses the main env stats except for `t, reward` to reflect progress on the eval env. Correspondingly, session analysis also produces both versions of data.\r\n\r\nData from `body.eval_df` is used to generate `session_df, session_graph, session_fitness_df`, whereas the data from `body.train_df` is used to generate a new set of `trainsession_df, trainsession_graph, trainsession_fitness_df` for debugging.\r\n\r\nThe previous process-based eval functionality is kept, but is now called `parallel_eval`. 
This can be useful for more robust checkpointing and eval.\r\n\r\n## Refactoring\r\n#279 \r\n\r\n- purge useless computations\r\n- properly and efficiently gather and organize all update variable computations.\r\n\r\nThis also speeds up run time by 2x. For Atari Beamrider with DQN on V100 GPU, manual benchmark measurement gives 110 FPS for training every 4 frames, while eval achieves 160 FPS. This translates to 10M frames in roughly 24 hours.","2019-02-05T08:10:23",{"id":237,"version":238,"summary_zh":239,"released_at":240},231179,"v3.1.1",">[**Docker image `kengz\u002Fslm_lab:v3.0.0` released**](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fkengz\u002Fslm_lab\u002Ftags\u002F)\r\n\r\n## Add Retro Eval\r\n- #270 add retro eval mode to run failed online eval sessions. Use command `yarn retro_eval data\u002Freinforce_cartpole_2018_01_22_211751`\r\n- #272 #273 fix eval saving 0 index to `eval_session_df` causing trial analysis to break; add reset_index for safety\r\n\r\n## fix Boltzmann spec\r\n- #271 change Boltzmann spec to use Categorical instead of the wrong Argmax\r\n\r\n## misc\r\n- #273 update colorlover package to proper pip after they fixed division error\r\n- #274 remove unused torchvision package to lighten build","2019-01-20T19:38:09",{"id":242,"version":243,"summary_zh":244,"released_at":245},231180,"v3.1.0","# v3.1.0: L1 fitness norm, code and spec refactor, online eval\r\n\r\n>[**Docker image `kengz\u002Fslm_lab:v3.1.0` released**](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fkengz\u002Fslm_lab\u002Ftags\u002F)\r\n\r\n## L1 fitness norm (breaking change)\r\n- change fitness vector norm from L2 to L1 for intuitiveness and non-extreme values\r\n\r\n## code and spec refactor\r\n- #254 PPO cleanup: remove hack and restore minimization scheme\r\n- #255 remove `use_gae` and `use_nstep` param to infer from `lam, num_step_returns`\r\n- #260 fix decay `start_step` offset, add unit tests for rate decay methods\r\n- #262 make epi start from 0 instead of 1 for code 
logic consistency\r\n- #264 switch `max_total_t`, `max_epi` to `max_tick` and `max_tick_unit` for directness. retire `graph_x` for the unit above\r\n- #266 add Atari fitness std, fix CUDA coredump issue\r\n- #269 update gym, remove box2d hack\r\n\r\n## Online Eval mode\r\n#252 #257 #261 #267\r\nEvaluation sessions now run during training on a subprocess. This does not interfere with the training process, but spawns multiple subprocesses to do independent evaluation, which then append to an eval file; at the end a final eval will finish, plot all the graphs, and save all the data for eval.\r\n\r\n- enabled by meta spec `'training_eval'`\r\n- configure `NUM_EVAL_EPI` in `analysis.py`\r\n- update `enjoy` and `eval` mode syntax. see README.\r\n- change ckpt behavior to use e.g. tag `ckpt-epi10-totalt1000`\r\n- add new `eval` mode to lab. runs on a checkpoint file. see below\r\n\r\n### Eval Session\r\n- add a proper eval Session which loads from the ckpt like above, and does not interfere with existing files. This can be run in the terminal, and it's also used by the internal eval logic, e.g. command `python run_lab.py data\u002Fdqn_cartpole_2018_12_20_214412\u002Fdqn_cartpole_t0_spec.json dqn_cartpole eval@dqn_cartpole_t0_s2_ckpt-epi10-totalt1000`\r\n- when an eval session is done, it will average all of the episodes it ran and append a row to `eval_session_df.csv`\r\n- after that it will delete the ckpt files it had just used (to prevent large storage)\r\n- then, it will run a trial analysis to update `eval_trial_graph.png`, and an accompanying `trial_df` as the average of all `session_df`s\r\n\r\n### How eval mode works\r\n- checkpoint will save the models using the scheme which records its `epi` and `total_t`. 
This allows one to eval using the ckpt model\r\n- after creating ckpt files, if `spec.meta.training_eval` is set in `train` mode, a subprocess will launch using the ckpt prepath to run an eval Session, in the same way as above: `python run_lab.py data\u002Fdqn_cartpole_2018_12_20_214412\u002Fdqn_cartpole_t0_spec.json dqn_cartpole eval@dqn_cartpole_t0_s2_ckpt-epi10-totalt1000`\r\n- eval session runs as above. ckpt will now run at the starting timestep, ckpt timestep, and at the end\r\n- the main Session will wait for the final eval session and its final eval trial to finish before closing, to ensure that other processes like zipping wait for them.\r\n\r\nExample eval trial graph:\r\n\r\n![dqn_cartpole_t0_ckpt-eval_trial_graph](https:\u002F\u002Fuser-images.githubusercontent.com\u002F8209263\u002F50327674-eaaf1080-04a4-11e9-8586-1ca025aec3e0.png)\r\n","2019-01-09T06:11:57",{"id":247,"version":248,"summary_zh":249,"released_at":250},231181,"v3.0.0","# V3: PyTorch 1.0, faster Neural Network, Variable Scheduler\r\n\r\n>[**Docker image `kengz\u002Fslm_lab:v3.0.0` released**](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fkengz\u002Fslm_lab\u002Ftags\u002F)\r\n\r\nPRs included #240 #241 #239 #238 #244 #248 \r\n\r\n## PyTorch 1.0 and parallel CUDA\r\n- switch to PyTorch 1.0 with various improvements and parallel CUDA fix\r\n\r\n## new Neural Network API (breaking changes)\r\nTo accommodate more advanced features and improvements, all the networks have been improved with better spec and code design, faster operations, and added features\r\n- single-tail networks will now use a single tail instead of a list for fast output compute (for loop is slow)\r\n- use PyTorch `optim.lr_scheduler` for learning rate decay. retire old methods.\r\n- more efficient spec format for network, `clip_grad`, `lr_scheduler_spec`\r\n- fix and add proper generalization for ConvNet and RecurrentNet\r\n- add full basic network unit tests\r\n\r\n## DQN\r\n- rewrite DQN loss for 2x speedup and code simplicity. 
extend to SARSA\r\n- retire MultitaskDQN for HydraDQN\r\n\r\n## Memory\r\n- add `OnpolicyConcatReplay`\r\n- standardize `preprocess_state` logic in onpolicy memories\r\n\r\n## Variable Scheduler (breaking spec changes)\r\n- implement variable decay class `VarScheduler` similar to pytorch's LR scheduler. use clock with flexible scheduling units `epi` or `total_t`\r\n- unify VarScheduler to use standard `clock.max_tick_unit` specified from env\r\n- retire `action_policy_update`, update agent spec to `explore_var_spec`\r\n- replace `entropy_coef` with `entropy_coef_spec`\r\n- replace `clip_eps` with `clip_eps_spec` (PPO)\r\n- update all specs\r\n\r\n## Math util\r\n- move decay methods to `math_util.py`\r\n- move `math_util.py` from `algorithm\u002F` to `lib\u002F`\r\n\r\n## env max tick (breaking spec changes)\r\n- spec\u002Fvariable renamings:\r\n  - `max_episode` to `max_epi`\r\n  - `max_timestep` to `max_t`\r\n  - `save_epi_frequency` to `save_frequency`\r\n  - `traininig_min_timestep` to `training_start_step`\r\n- allow env to stop based on `max_epi` as well as `max_total_t`. 
Propagating the clock-unit usage:
- introduce `max_tick` and `max_tick_unit` properties on env and clock from the above
- allow `save_frequency` to use the same units accordingly
- update Pong and BeamRider to use `max_total_t` as the end condition

## Update Ray to re-enable CUDA in search
- update Ray from `0.3.1` to `0.5.3` to address broken GPU support with PyTorch 1.0.0
- to fix CUDA not being discovered in the Ray worker, the CUDA devices have to be set manually in the Ray remote function, due to poor design

## Improved logging and Enjoy mode
#243 #245
- best-model checkpointing is measured using the reward_ma
- early termination if the environment is solved
- the method for logging the learning rate to the session dataframe needed to be updated after the move to PyTorch's lr_scheduler
- also removed training_net from the mean learning rate reported in the session dataframe, since its learning rate doesn't change
- update the naming scheme to work with enjoy mode
- unify and simplify the prepath methods
- info_space now uses a `ckpt` for loading the ckpt model. Example usage: `yarn start pong.json dqn_pong enjoy@data/dqn_cartpole_2018_12_02_124127/dqn_cartpole_t0_s0_ckptbest`
- update agent load and policy to properly set variables to `end_val` in enjoy mode
- random-seed the env as well

## Working Atari
#242
The Atari benchmark had been failing, but the root cause has finally been discovered and fixed: wrong image preprocessing.
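One of the suspects is the image axis order: raw Atari frames come as (height, width, channel) arrays, while PyTorch conv layers expect channel-first (c, h, w). A minimal sketch of that kind of fix (`to_chw` is an illustrative name, not the project's API):

```python
import numpy as np

# Minimal sketch of the channel-first shape fix, assuming nothing about
# SLM-Lab internals: Atari frames arrive as (h, w, c) uint8 arrays, while
# PyTorch conv layers expect (c, h, w).
def to_chw(frame):
    return np.transpose(frame, (2, 0, 1))  # (h, w, c) -> (c, h, w)

raw = np.zeros((210, 160, 3), dtype=np.uint8)  # one raw RGB Atari frame
chw = to_chw(raw)
print(chw.shape)  # (3, 210, 160)
```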
The failure can be due to several factors, and we are doing ablation studies to check against the old code:
- image normalization scaled the input values down by ~255x, and the resultant loss was too small for the optimizer
- black frames in the stack at the beginning timesteps
- wrong image permutation

PR #242 introduces:
- a global environment preprocessor in the form of env wrappers borrowed from OpenAI baselines, in `env/wrapper.py`
- a `TransformImage` to do the proper image transform: grayscale, downsize, and **reshape from (w, h, c) to the PyTorch format (c, h, w)**
- a `FrameStack` which uses `LazyFrames` for efficiency, replacing the agent-specific Atari frame-stacking preprocessing; this simplifies the Atari memories
- an update to the convnet to use the honest shape (c, h, w) without extra transforms, removing its expensive image-axis permutation since the input is now in the right shape
- an update to VizDoom to produce the (c, h, w) shape consistent with the convnet's input expectation

Tuned parameters will be obtained and released in the next version.

Attached is a quick training curve on Pong with DQN, where the solution average is +18:
![fast_dqn_pong_t0_s0_session_graph](https://user-images.githubusercontent.com/8209263/49337610-c610f880-f5ca-11e8-9957-0fb53cf7fba8.png)
![pong](https://user-images.githubusercontent.com/8209263/49346161-07dd8580-f643-11e8-975c-38972465a587.gif)

(Released 2018-12-03)

---

# v2.2.0

## Add VizDoom environment
#222 #224
- add new `OnPolicyImageReplay` and `ImageReplay` memories
- add the [VizDoom](http://vizdoom.cs.put.edu.pl/) environment, thanks to @joelouismarino

## Add NN Weight Initialization functionality
#223 #225
- allow specification of the NN weight init function in the spec, thanks to @mwcvitkovic

## Update Plotly to v3
#221
- move to v3 to allow Python-based (instead of bash) image saving
  for stability

## Fixes
- #207 fix the PPO loss function broken during refactoring
- #217 fix multi-device CUDA parallelization in grad assignment

(Released 2018-11-03)
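As a closing illustration, the spec-driven variable decay introduced by v3.0.0's `VarScheduler` above could be sketched as a simple linear annealing (the linear form and function name are assumptions for illustration; the actual decay methods live in `math_util.py` and are driven by clock units):

```python
# Illustrative sketch of spec-driven variable decay in the spirit of
# VarScheduler / explore_var_spec (not SLM-Lab's actual implementation).
def linear_decay(start_val, end_val, start_step, end_step, step):
    """Linearly anneal a variable from start_val to end_val over steps."""
    if step < start_step:
        return start_val
    if step >= end_step:
        return end_val
    frac = (step - start_step) / (end_step - start_step)
    return start_val + frac * (end_val - start_val)

# e.g. decay an exploration variable from 1.0 to 0.1 over 100 ticks
print(round(linear_decay(1.0, 0.1, 0, 100, 50), 6))  # 0.55
```

The same shape applies whether the tick unit is `epi` or `total_t`; only the step counter fed in changes.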