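[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-HumanCompatibleAI--imitation":3,"tool-HumanCompatibleAI--imitation":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw is a local AI assistant built for individuals, designed to give you a fully controllable intelligent companion on your own devices. It breaks the traditional confinement of AI assistants to a specific web page or app and connects directly to the messaging channels you already use, dozens of platforms including WeChat, WhatsApp, Telegram, Discord, and iMessage. Whichever chat app you send a message from, OpenClaw responds instantly; it also supports voice interaction on macOS, iOS, and Android devices and provides live canvas rendering that you can control.\n\nThe tool mainly addresses users' needs for data privacy, fast responses, and an always-on experience. Because the AI runs locally, users get fast, private assistance without depending on cloud services, keeping their data under their own control. Its distinctive technical feature is a robust gateway architecture that separates the control plane from the core assistant, keeping cross-platform communication smooth and extensible.\n\nOpenClaw is well suited to tech enthusiasts and developers who want to build personalized workflows, as well as everyday users who value privacy and do not want to be locked into a single ecosystem. Anyone with basic terminal skills (macOS, Linux, and Windows WSL2 are supported) can deploy it by following a simple command-line setup. If you want an assistant that understands you",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","Development Framework","Image","Data Tools","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui is a browser-based interface built on Gradio that lets users easily run the powerful Stable Diffusion image-generation models locally. The original models depend on the command line, have a steep learning curve, and scatter their functionality; stable-diffusion-webui addresses these pain points by bringing the complex AI image-generation workflow into an intuitive graphical platform.\n\nCasual creators who want to get started quickly, designers who need fine control over image details, and developers and researchers who want to explore the models in depth can all benefit. Its main strength is its breadth of features. Besides the basic text-to-image, image-to-image, inpainting, and outpainting modes, it offers advanced features such as attention adjustment, prompt matrices, negative prompts, and high-resolution fix. It also bundles face-restoration tools such as GFPGAN and CodeFormer, supports several neural-network upscalers, and lets users extend its capabilities through an extension system. Even on devices with limited VRAM, stable-diffusion-webui offers optimization options that put high-quality AI art within reach.",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code is a high-performance optimization system built for AI coding assistants such as Claude Code, Codex, and Cursor. It is more than a set of configuration files: it is a complete framework refined through long real-world use, aimed at solving the core pain points that AI 
agents face in practical development: low efficiency, memory loss, security risks, and no ability to keep learning.\n\nBy introducing modular skills, instinct enhancement, persistent memory, and built-in security scanning, everything-claude-code can significantly improve AI performance on complex tasks and help developers build more stable, smarter production-grade AI agents. Its research-first development philosophy and token-usage optimizations make model responses faster and cheaper while defending against potential attack vectors.\n\nThe toolkit is especially suited to software developers, AI researchers, and technical teams that want to customize AI workflows in depth. Whether you are building a large codebase or need AI help with security audits and automated testing, everything-claude-code provides strong underlying support. An open-source project that won an Anthropic hackathon award, it combines multi-language support with a rich set of battle-tested hooks, letting AI truly grow into one that understands",158594,2,"2026-04-16T23:34:05",[14,13,35],"Language Model",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI is a powerful, highly modular visual AI engine for designing and executing complex Stable Diffusion image-generation workflows. Instead of writing code, users build custom generation pipelines in an intuitive node-based graph interface by connecting functional blocks.\n\nThis design addresses the complexity and limited flexibility of configuring advanced AI image workflows. Without a programming background, users can freely combine models, adjust parameters, and preview results in real time, handling tasks from basic text-to-image to multi-step high-resolution upscaling. ComfyUI is broadly compatible: it runs on Windows, macOS, and Linux, supports NVIDIA, AMD, Intel, and Apple Silicon hardware, and was among the first to support cutting-edge models such as SDXL, Flux, and SD3.\n\nResearchers and developers exploring what the algorithms can do, as well as designers and experienced AI art enthusiasts seeking maximum creative freedom, can all rely on ComfyUI. Its modular architecture lets the community keep adding new features, making it one of the most flexible open-source diffusion-model tools with one of the richest ecosystems, and it helps users turn ideas into results efficiently.",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli is an open-source AI command-line tool from Google that brings the capabilities of the Gemini models directly into the user's terminal. For developers who work at the command line, it offers the shortest path from prompt to model response, with AI assistance available without switching windows.\n\nThe tool mainly addresses the cost of frequent context switching during development, letting users understand, generate, and debug code and automate operations tasks from within a familiar terminal. Whether you are querying a large codebase, generating an app from a sketch, or performing complex Git operations, gemini-cli handles it through natural-language instructions.\n\nIt is especially well suited to software engineers, DevOps practitioners, and technical researchers. Highlights include a context window of up to 1 million tokens with strong reasoning ability; built-in tools such as Google Search, file operations, and shell command execution; and support for MCP (Model Context Protocol), which lets users add custom integrations and connect external capabilities such as image generation. A personal Google account comes with a free usage quota, and the project is fully open source under the Apache 2.0 
license, making it an ideal assistant for productivity in the terminal.",100752,"2026-04-10T01:20:03",[52,13,15,14],"Plugin",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown is a lightweight Python tool from Microsoft's AutoGen team for efficiently converting many kinds of files to Markdown. It parses PDF, Word, Excel, PowerPoint, images (with OCR), audio (with speech transcription), HTML, and even YouTube links, and accurately extracts key structure such as headings, lists, tables, and links.\n\nLarge language models (LLMs) handle text well but cannot directly read complex binary office documents. MarkItDown fills this gap by converting unstructured or semi-structured files into Markdown, a format that models understand natively and that is very token-efficient, which makes it a natural bridge between local files and AI analysis pipelines. It also provides an MCP (Model Context Protocol) server that integrates with LLM applications such as Claude Desktop.\n\nThe tool is especially suited to developers, data scientists, and AI researchers, particularly those building retrieval-augmented generation (RAG) systems over documents, running batch text analysis, or wanting an AI assistant to read local files directly. Although the output is also reasonably human-readable, its core advantage is that it gives machines",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":96,"forks":97,"last_commit_at":98,"license":99,"difficulty_score":32,"env_os":100,"env_gpu":101,"env_ram":101,"env_deps":102,"category_tags":110,"github_topics":112,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":116,"updated_at":117,"faqs":118,"releases":149},8201,"HumanCompatibleAI\u002Fimitation","imitation","Clean PyTorch implementations of imitation and reward learning algorithms","imitation is an open-source project built on PyTorch that provides clean, well-structured implementations of imitation learning and reward learning algorithms. In reinforcement learning, designing a reward function by hand is often difficult. imitation sidesteps this by letting agents imitate expert demonstrations or learn from human preferences, which lowers the barrier to training effective policies.\n\nThe project suits AI researchers, algorithm engineers, and developers who want to understand imitation learning in depth. Whether you need to reproduce paper experiments quickly or want a stable baseline to build on, imitation provides a solid foundation. Its core strength is broad coverage of mainstream algorithms, including Behavioral Cloning, DAgger, Generative Adversarial Imitation Learning (GAIL), Adversarial Inverse Reinforcement Learning (AIRL), and Deep RL from Human Preferences. The implementations are clearly structured and easy to read, support both discrete and continuous action and state spaces (with some exceptions), and come with detailed API 
documentation and benchmark results. With imitation, users can focus on validating and developing algorithmic ideas rather than spending effort debugging low-level implementation details.","[![CircleCI](https:\u002F\u002Fcircleci.com\u002Fgh\u002FHumanCompatibleAI\u002Fimitation.svg?style=svg)](https:\u002F\u002Fcircleci.com\u002Fgh\u002FHumanCompatibleAI\u002Fimitation)\n[![Documentation Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHumanCompatibleAI_imitation_readme_13d664e1afd7.png)](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002FHumanCompatibleAI\u002Fimitation\u002Fbranch\u002Fmaster\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002FHumanCompatibleAI\u002Fimitation)\n[![PyPI version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fimitation.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fimitation)\n\n# Imitation Learning Baseline Implementations\n\nThis project aims to provide clean implementations of imitation and reward learning algorithms.\nCurrently, we have implementations of the algorithms below. 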
'Discrete' and 'Continuous' indicate whether the algorithm supports discrete or continuous action\u002Fstate spaces, respectively.\n\n| Algorithm (+ link to paper) | API Docs | Discrete | Continuous |\n|---|---|---|---|\n| Behavioral Cloning | [`algorithms.bc`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fbc.html) | ✅ | ✅ |\n| [DAgger](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1011.0686.pdf) | [`algorithms.dagger`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fdagger.html) | ✅ | ✅ |\n| Density-Based Reward Modeling | [`algorithms.density`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fdensity.html) | ✅ | ✅ |\n| [Maximum Causal Entropy Inverse Reinforcement Learning](https:\u002F\u002Fwww.cs.cmu.edu\u002F~bziebart\u002Fpublications\u002Fmaximum-causal-entropy.pdf) | [`algorithms.mce_irl`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fmce_irl.html) | ✅ | ❌ |\n| 
[Adversarial Inverse Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.11248) | [`algorithms.airl`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fairl.html) | ✅ | ✅ |\n| [Generative Adversarial Imitation Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.03476) | [`algorithms.gail`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fgail.html) | ✅ | ✅ |\n| [Deep RL from Human Preferences](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03741) | [`algorithms.preference_comparisons`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fpreference_comparisons.html) | ✅ | ✅ |\n| [Soft Q Imitation Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.11108) | [`algorithms.sqil`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fsqil.html) | ✅ | ❌ |\n\n\nYou can find [the documentation here](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002F).\n\nYou can read the latest benchmark results [here](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Fmain-concepts\u002Fbenchmark_summary.html).\n\n## Installation\n\n### Prerequisites\n\n- Python 3.8+\n- (Optional) OpenGL (to render Gymnasium environments)\n- (Optional) FFmpeg (to encode videos of renders)\n\n> Note: `imitation` is only compatible with the newer [gymnasium](https:\u002F\u002Fgymnasium.farama.org\u002F) environment API and does not support the older `gym` API.\n\n### Installing the PyPI release\n\nInstalling the PyPI release 
is the standard way to use `imitation`, and the recommended way for most users.\n\n```\npip install imitation\n```\n\n### Installing from source\n\nIf you like, you can install `imitation` from source to [contribute to the project][contributing] or to access the latest features before they reach a stable release. To do this, clone the GitHub repository and run the installer directly. First run:\n`git clone http:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation && cd imitation`.\n\nFor development mode, then run:\n\n```\npip install -e \".[dev]\"\n```\n\nThis runs `setup.py` in development mode and installs the additional dependencies required for development. For regular use, run instead:\n\n```\npip install .\n```\n\nAdditional extras are available depending on your needs: `tests` for running the test suite, `docs` for building the documentation, `parallel` for parallelizing training, and `atari` for including Atari environments. The `dev` extra already installs the `tests`, `docs`, and `atari` dependencies, and `tests` installs the `atari` dependencies.\n\nmacOS users need some additional packages to run experiments (see `.\u002Fexperiments\u002FREADME.md` for details). First, install Homebrew if it is not already available (see [Homebrew](https:\u002F\u002Fbrew.sh\u002F)). Then run:\n\n```\nbrew install coreutils gnu-getopt parallel\n```\n\n## CLI Quickstart\n\nWe provide several CLI scripts as a front end to the algorithms implemented in `imitation`. These use [Sacred](https:\u002F\u002Fgithub.com\u002Fidsia\u002Fsacred) for configuration and reproducibility.\n\nFrom [examples\u002Fquickstart.sh](examples\u002Fquickstart.sh):\n\n```bash\n# Train a PPO agent on Pendulum and collect expert demonstrations. TensorBoard logs are saved in quickstart\u002Frl\u002F\npython -m imitation.scripts.train_rl with pendulum environment.fast policy_evaluation.fast rl.fast fast logging.log_dir=quickstart\u002Frl\u002F\n\n# Train GAIL from demonstrations.
 TensorBoard logs are saved in output\u002F (the default log directory).\npython -m imitation.scripts.train_adversarial gail with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.path=quickstart\u002Frl\u002Frollouts\u002Ffinal.npz demonstrations.source=local\n\n# Train AIRL from demonstrations. TensorBoard logs are saved in output\u002F (the default log directory).\npython -m imitation.scripts.train_adversarial airl with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.path=quickstart\u002Frl\u002Frollouts\u002Ffinal.npz demonstrations.source=local\n```\n\nTips:\n\n- Remove the \"fast\" options from the commands above to let training run to completion.\n- `python -m imitation.scripts.train_rl print_config` lists the Sacred script options. These configuration options are documented in each script's docstrings.\n\nFor more information on how to configure Sacred CLI options, see the [Sacred docs](https:\u002F\u002Fsacred.readthedocs.io\u002Fen\u002Fstable\u002F).\n\n## Python Interface Quickstart\n\nSee [examples\u002Fquickstart.py](examples\u002Fquickstart.py) for an example script that loads CartPole-v1 demonstrations and trains BC, GAIL, and AIRL models on that data.\n\n### Density reward baseline\n\nWe also implement a density-based reward baseline. You can find an [example notebook here](docs\u002Ftutorials\u002F7_train_density.ipynb).\n\n# Citations (BibTeX)\n\n```\n@misc{gleave2022imitation,\n  author = {Gleave, Adam and Taufeeque, Mohammad and Rocamonde, Juan and Jenner, Erik and Wang, Steven H. 
and Toyer, Sam and Ernestus, Maximilian and Belrose, Nora and Emmons, Scott and Russell, Stuart},\n  title = {imitation: Clean Imitation Learning Implementations},\n  year = {2022},\n  howPublished = {arXiv:2211.11972v1 [cs.LG]},\n  archivePrefix = {arXiv},\n  eprint = {2211.11972},\n  primaryClass = {cs.LG},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.11972},\n}\n```\n\n# Contributing\n\nSee [Contributing to imitation][contributing] for more information.\n\n\n[contributing]: https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Fdevelopment\u002Fcontributing\u002Findex.html\n","[![CircleCI](https:\u002F\u002Fcircleci.com\u002Fgh\u002FHumanCompatibleAI\u002Fimitation.svg?style=svg)](https:\u002F\u002Fcircleci.com\u002Fgh\u002FHumanCompatibleAI\u002Fimitation)\n[![Documentation Status](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHumanCompatibleAI_imitation_readme_13d664e1afd7.png)](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgh\u002FHumanCompatibleAI\u002Fimitation\u002Fbranch\u002Fmaster\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgh\u002FHumanCompatibleAI\u002Fimitation)\n[![PyPI version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fimitation.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fimitation)\n\n# Imitation Learning Baseline Implementations\n\nThis project aims to provide clean implementations of imitation and reward learning algorithms. We currently implement the algorithms below. 'Discrete' and 'Continuous' indicate whether the algorithm supports discrete or continuous action\u002Fstate spaces, respectively.\n\n| Algorithm (+ link to paper) | API Docs | Discrete | Continuous |\n|---|---|---|---|\n| Behavioral Cloning 
| [`algorithms.bc`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fbc.html) | ✅ | ✅ |\n| [DAgger](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1011.0686.pdf) | [`algorithms.dagger`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fdagger.html) | ✅ | ✅ |\n| Density-Based Reward Modeling | [`algorithms.density`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fdensity.html) | ✅ | ✅ |\n| [Maximum Causal Entropy Inverse Reinforcement Learning](https:\u002F\u002Fwww.cs.cmu.edu\u002F~bziebart\u002Fpublications\u002Fmaximum-causal-entropy.pdf) | [`algorithms.mce_irl`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fmce_irl.html) | ✅ | ❌ |\n| [Adversarial Inverse Reinforcement Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1710.11248) | [`algorithms.airl`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fairl.html) | ✅ | ✅ |\n| [Generative Adversarial Imitation Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1606.03476) | [`algorithms.gail`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fgail.html) | ✅ | ✅ |\n| [Deep RL from Human Preferences](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03741) | 
[`algorithms.preference_comparisons`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fpreference_comparisons.html) | ✅ | ✅ |\n| [Soft Q Imitation Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.11108) | [`algorithms.sqil`](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Falgorithms\u002Fsqil.html) | ✅ | ❌ |\n\n\nYou can find [the documentation here](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002F).\n\nYou can read the latest benchmark results [here](https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Fmain-concepts\u002Fbenchmark_summary.html).\n\n## Installation\n\n### Prerequisites\n\n- Python 3.8+\n- (Optional) OpenGL (to render Gymnasium environments)\n- (Optional) FFmpeg (to encode videos of renders)\n\n> Note: `imitation` is only compatible with the newer [gymnasium](https:\u002F\u002Fgymnasium.farama.org\u002F) environment API and does not support the older `gym` API.\n\n### Installing the PyPI release\n\nInstalling the PyPI release is the standard way to use `imitation`, and the recommended way for most users.\n\n```\npip install imitation\n```\n\n### Installing from source\n\nIf you like, you can install `imitation` from source to [contribute to the project][contributing] or to access the latest features before they reach a stable release. To do this, clone the GitHub repository and run the installer directly. First run:\n`git clone http:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation && cd imitation`.\n\nFor development mode, then run:\n\n```\npip install -e \".[dev]\"\n```\n\nThis runs `setup.py` in development mode and installs the additional dependencies required for development. For regular use, run instead:\n\n```\npip install .\n```\n\nAdditional extras are available depending on your needs: `tests` for running the test suite, `docs` for building the documentation, `parallel` for parallelizing training, and `atari` for including Atari environments. The `dev` extra already installs the `tests`, `docs`, and `atari` dependencies, and `tests` installs the `atari` dependencies.\n\nmacOS users need some additional packages to run experiments (see `.\u002Fexperiments\u002FREADME.md` for details). First, install Homebrew if it is not already available (see [Homebrew](https:\u002F\u002Fbrew.sh\u002F)). Then run:\n\n```\nbrew install coreutils gnu-getopt parallel\n```\n\n## CLI Quickstart\n\nWe provide several CLI scripts as a front end to the algorithms implemented in `imitation`. These use [Sacred](https:\u002F\u002Fgithub.com\u002Fidsia\u002Fsacred) for configuration and reproducibility.\n\nFrom [examples\u002Fquickstart.sh](examples\u002Fquickstart.sh):\n\n```bash\n# Train a PPO agent on Pendulum and collect expert demonstrations. TensorBoard logs are saved in 
quickstart\u002Frl\u002F\npython -m imitation.scripts.train_rl with pendulum environment.fast policy_evaluation.fast rl.fast fast logging.log_dir=quickstart\u002Frl\u002F\n\n# Train GAIL from demonstrations. TensorBoard logs are saved in output\u002F (the default log directory).\npython -m imitation.scripts.train_adversarial gail with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.path=quickstart\u002Frl\u002Frollouts\u002Ffinal.npz demonstrations.source=local\n\n# Train AIRL from demonstrations. TensorBoard logs are saved in output\u002F (the default log directory).\npython -m imitation.scripts.train_adversarial airl with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.path=quickstart\u002Frl\u002Frollouts\u002Ffinal.npz demonstrations.source=local\n```\n\nTips:\n\n- Remove the \"fast\" options from the commands above to let training run to completion.\n- `python -m imitation.scripts.train_rl print_config` lists the Sacred script options. These configuration options are documented in each script's docstrings.\n\nFor more information on how to configure Sacred CLI options, see the [Sacred docs](https:\u002F\u002Fsacred.readthedocs.io\u002Fen\u002Fstable\u002F).\n\n## Python Interface Quickstart\n\nSee [examples\u002Fquickstart.py](examples\u002Fquickstart.py) for an example script that loads CartPole-v1 demonstrations and trains BC, GAIL, and AIRL models on that data.\n\n### Density reward baseline\n\nWe also implement a density-based reward baseline. You can find an [example notebook here](docs\u002Ftutorials\u002F7_train_density.ipynb).\n\n# Citations (BibTeX)\n\n```\n@misc{gleave2022imitation,\n  author = {Gleave, Adam and Taufeeque, Mohammad and Rocamonde, Juan and Jenner, Erik and Wang, Steven H. 
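and Toyer, Sam and Ernestus, Maximilian and Belrose, Nora and Emmons, Scott and Russell, Stuart},\n  title = {imitation: Clean Imitation Learning Implementations},\n  year = {2022},\n  howPublished = {arXiv:2211.11972v1 [cs.LG]},\n  archivePrefix = {arXiv},\n  eprint = {2211.11972},\n  primaryClass = {cs.LG},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.11972},\n}\n```\n\n# Contributing\n\nSee [Contributing to imitation][contributing] for more information.\n\n\n[contributing]: https:\u002F\u002Fimitation.readthedocs.io\u002Fen\u002Flatest\u002Fdevelopment\u002Fcontributing\u002Findex.html","# Imitation Quickstart Guide\n\n`imitation` is an open-source project that provides clean implementations of imitation learning and reward learning algorithms, including mainstream algorithms such as Behavioral Cloning, GAIL, and AIRL.\n\n## Prerequisites\n\nBefore you begin, make sure your system meets the following requirements:\n\n*   **Python version**: 3.8 or later\n*   **Rendering (optional)**: install OpenGL to render Gymnasium environments\n*   **Video recording (optional)**: install FFmpeg to record videos of renders\n*   **Important**: this library is only compatible with the newer [`gymnasium`](https:\u002F\u002Fgymnasium.farama.org\u002F) environment API and does **not** support the older `gym` API.\n\n> **Tip for users in mainland China**:\n> Installing dependencies from a domestic PyPI mirror can speed up downloads, for example the Tsinghua mirror:\n> `pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple \u003Cpackage_name>`\n\n## Installation\n\n### Option 1: Install from PyPI (recommended)\n\nThis is the standard installation method for most users:\n\n```bash\npip install imitation\n```\n\n**Using the mirror:**\n```bash\npip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple imitation\n```\n\n### Option 2: Install from source\n\nIf you want to contribute code or use the latest unreleased features, clone the source from GitHub and install it.\n\n1. Clone the repository:\n```bash\ngit clone http:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation && cd imitation\n```\n\n2. Install in development mode (includes extra dependencies for tests, docs, and more):\n```bash\npip install -e \".[dev]\"\n```\n\n3. Or install for regular use only:\n```bash\npip install .\n```\n\n**Additional step for macOS users**:\nBefore running experiments, install some tools via Homebrew:\n```bash\nbrew install coreutils gnu-getopt parallel\n```\n\n## Basic Usage\n\n`imitation` can be used through the command line (CLI) or the Python interface.\n\n### 1. 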
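Command-Line Quickstart (CLI)\n\nThe project ships [Sacred](https:\u002F\u002Fgithub.com\u002Fidsia\u002Fsacred)-based scripts for configuring and reproducing experiments. The example below trains a PPO expert policy and then trains GAIL and AIRL models on its demonstrations.\n\n> **Note**: the `fast` options in these examples are for quick testing (fewer training steps). Remove all `fast` options for full training runs.\n\n**Step 1: Train a PPO agent and collect expert demonstrations**\n```bash\npython -m imitation.scripts.train_rl with pendulum environment.fast policy_evaluation.fast rl.fast fast logging.log_dir=quickstart\u002Frl\u002F\n```\n\n**Step 2: Train a GAIL model from the demonstrations**\n```bash\npython -m imitation.scripts.train_adversarial gail with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.path=quickstart\u002Frl\u002Frollouts\u002Ffinal.npz demonstrations.source=local\n```\n\n**Step 3: Train an AIRL model from the demonstrations**\n```bash\npython -m imitation.scripts.train_adversarial airl with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.path=quickstart\u002Frl\u002Frollouts\u002Ffinal.npz demonstrations.source=local\n```\n\n*   List the available configuration options: `python -m imitation.scripts.train_rl print_config`\n*   TensorBoard logs are saved to `output\u002F` by default (unless another path is specified, as in Step 1).\n\n### 2. Python Interface Quickstart\n\nYou can also call the algorithms directly from Python. The following sketch collects CartPole demonstrations from a pretrained expert and trains a Behavioral Cloning model (see `examples\u002Fquickstart.py` for the complete code):\n\n```python\nimport numpy as np\nfrom imitation.algorithms import bc\nfrom imitation.data import rollout\nfrom imitation.data.wrappers import RolloutInfoWrapper\nfrom imitation.policies.serialize import load_policy\nfrom imitation.util.util import make_vec_env\n\nrng = np.random.default_rng(0)\n\n# 1. Create a vectorized environment (RolloutInfoWrapper records complete episodes for rollouts)\nenv = make_vec_env(\n    \"seals:seals\u002FCartPole-v0\",\n    rng=rng,\n    post_wrappers=[lambda env, _: RolloutInfoWrapper(env)],\n)\n\n# 2. Load a pretrained PPO expert from the HuggingFace Hub and collect demonstrations\nexpert = load_policy(\n    \"ppo-huggingface\",\n    organization=\"HumanCompatibleAI\",\n    env_name=\"seals-CartPole-v0\",\n    venv=env,\n)\nrollouts = rollout.rollout(\n    expert,\n    env,\n    rollout.make_sample_until(min_timesteps=None, min_episodes=50),\n    rng=rng,\n)\ntransitions = rollout.flatten_trajectories(rollouts)\n\n# 3. Train Behavioral Cloning\nbc_trainer = bc.BC(\n    observation_space=env.observation_space,\n    action_space=env.action_space,\n    demonstrations=transitions,\n    rng=rng,\n)\nbc_trainer.train(n_epochs=1)\n\n# 4. 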
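Train GAIL or AIRL (requires demonstrations and an environment)\n# gail_trainer = gail.GAIL(...)\n# gail_trainer.train()\n```\n\nFor the density-based reward baseline, see the official Jupyter Notebook tutorial: `docs\u002Ftutorials\u002F7_train_density.ipynb`.","A self-driving startup wants its autonomous delivery vehicles to learn to drive through complex urban alleys as smoothly as an experienced driver, but it lacks enough well-specified rules to write traditional control code.\n\n### Without imitation\n- Engineers have to hand-write thousands of if-then rules for lane changes, obstacle avoidance, and car following. This takes months and still struggles to cover long-tail scenarios.\n- Training from scratch with pure reinforcement learning leads to frequent collisions and traffic violations during random exploration, burning huge amounts of compute on trial and error in simulation.\n- Even with driving-video data from human drivers, there is no ready-made algorithmic framework to turn it into a policy network, so the data sits idle on disk.\n- Tuning the reward function is extremely hard: a small mistake makes the model 'cheat' (for example, by driving dangerously to go faster), and debugging cycles are long.\n\n### With imitation\n- The team uses the Behavioral Cloning (BC) algorithm in imitation directly, treating human drivers' demonstrations as the supervised signal, and trains a policy with basic driving ability within days.\n- Iterating with DAgger, a human expert labels the states that the learned policy actually visits, which counters the compounding errors of pure behavioral cloning and reduces unsafe behavior in unfamiliar situations.\n- With GAIL or AIRL, the system learns a reward signal from expert trajectories (AIRL is designed to recover a reusable reward function), so the vehicle picks up a driving intuition and smoothness that are hard to describe in code.\n- Thanks to the clean PyTorch implementations, researchers can quickly switch between and compare algorithms, cutting algorithm-reproduction work from weeks to a few hours of configuration experiments.\n\nBy turning valuable human experience directly into executable policies, imitation greatly lowers the technical barrier and time cost of taking a robot from brute-force trial and error to proficient imitation.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FHumanCompatibleAI_imitation_ae9f12ef.png","HumanCompatibleAI","Center for Human-Compatible AI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FHumanCompatibleAI_99aa916f.png","CHAI seeks to develop the conceptual and technical wherewithal to reorient the general thrust of AI research towards provably beneficial systems.",null,"https:\u002F\u002Fhumancompatible.ai","https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI",[80,84,88,92],{"name":81,"color":82,"percentage":83},"Python","#3572A5",96,{"name":85,"color":86,"percentage":87},"Shell","#89e051",3.7,{"name":89,"color":90,"percentage":91},"Dockerfile","#384d54",0.2,{"name":93,"color":94,"percentage":95},"PowerShell","#012456",0,1722,301,"2026-04-16T07:08:22","MIT","Linux, macOS","Not specified",{"notes":103,"python":104,"dependencies":105},"Only compatible with the newer Gymnasium environment API; the old Gym API is not supported. Optionally install OpenGL to render environments and FFmpeg to record videos. macOS users need to install coreutils, gnu-getopt, and parallel via Homebrew 
to run experiments. Supports a range of imitation learning algorithms for discrete and continuous action\u002Fstate spaces.","3.8+",[106,107,108,109],"gymnasium","sacred","torch","stable-baselines3",[14,111],"Other",[113,114,115,106],"reward-learning","inverse-reinforcement-learning","imitation-learning","2026-03-27T02:49:30.150509","2026-04-17T09:47:54.243555",[119,124,129,134,139,144],{"id":120,"question_zh":121,"answer_zh":122,"source_url":123},36686,"How do I fix the 'vec_normalize.pkl ignored' or 'Outdated policy format' error when loading SB3 models from HuggingFace?","This error usually occurs because the model uses an outdated file format for its normalization statistics. The fix is to retrain the expert models for these environments (e.g. PPO and SAC) and specify the normalization settings explicitly in the config. For example, for 'seals\u002FMountainCar-v0' the config could look like this:\n\n```python\n\"seals\u002FMountainCar-v0\": dict(\n    normalize=dict(norm_obs=False, norm_reward=True),\n    policy_kwargs=dict(\n        activation_fn=torch.nn.modules.activation.Tanh,\n        net_arch=[{\"pi\": [64, 64], \"vf\": [64, 64]}],\n        features_extractor_class=imitation.policies.base.NormalizeFeaturesExtractor,\n    ),\n)\n```\n\nYou could also use Python modules instead of YAML files as config files, which gives more flexibility for handling such issues.","https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fissues\u002F576",{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},36687,"How do I handle inconsistent observation shapes in Atari environments ((h,w,c) vs. (c,h,w))?","Atari environments in Gym return observations in (h,w,c) format, whereas PyTorch policies usually expect (c,h,w). Stable Baselines3 transposes observations internally, but this can leave the reward network receiving inputs in a different layout.\n\nThe suggested solution is a wrapper that detects this case and automatically applies the necessary transpose before observations reach the reward function. Take particular care with frame stacking: SB3's `VecFrameStack` puts the stacking dimension last, while Gym's `FrameStack` puts it first, which changes where the channel dimension ends up. Using SB3's approach consistently is recommended to avoid confusion.","https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fissues\u002F486",{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},36688,"How do I convert an existing .pkl dataset into an expert dataset that the imitation library can use for Behavioral Cloning?","If your .pkl file has the same structure as the `final.pkl` generated by the library, you can usually load and use it directly. Make sure you have upgraded to the latest Stable Baselines3 (SB3), since older versions may have logging or data-loading issues.\n\nSteps:\n1. Check out the master branch and upgrade the installation: `pip install --upgrade .`\n2. Confirm that your .pkl file contains standard trajectory data (obs, acts, rews, dones, etc.).\n3. 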
If problems persist, check whether differences between SB3 versions have made the data structures incompatible; upgrading SB3 usually resolves this.","https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fissues\u002F329",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},36689,"AIRL misbehaves or performs poorly. Is this related to its inputs being preprocessed twice (double preprocessing)?","After investigation, the suspected double-preprocessing bug in AIRL's inputs turned out not to exist. Observations are normally first normalized by `VecNormalize` (which tracks a running mean\u002Fstandard deviation) and only then passed through `preprocess_obs`, which handles specific space types.\n\nIf AIRL performs poorly for you, try explicitly disabling reward normalization as a test:\n```shell\npython -m imitation.scripts.train_adversarial with airl algorithm_kwargs.shared.normalize_reward=False\n```\nIf the problem persists, look for other causes rather than double preprocessing.","https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fissues\u002F229",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},36690,"Where can I find getting-started tutorials or documentation for this project?","The project has Sphinx documentation hosted on ReadTheDocs. The early documentation was loosely structured, but it now includes detailed class docstrings.\n\nTo show the full API documentation, including `__init__` methods, the documentation config was updated to:\n```python\nautodoc_default_options = {\n    \"members\": True,\n    \"undoc-members\": True,\n    \"special-members\": \"__init__\",\n    \"show-inheritance\": True,\n}\n```\nThe community has also produced YouTube video tutorials for newcomers. For a quickstart guide, visit the official documentation site or search for the relevant video channels.","https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fissues\u002F190",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},36691,"When using a VecEnv, why can't I get the final observation of an episode (old_obs)?","This is standard VecEnv behavior: when an environment finishes an episode during step(), it automatically calls reset() and returns the first observation of the new episode, so the final state of the finished episode is lost.\n\nThis was fixed on the Stable Baselines side in PR #412, and `rollout.py` was adjusted accordingly so that trajectory-centric algorithms handle it correctly. Make sure you use a recent version of the library that includes these fixes, so that terminal states are captured correctly when you collect trajectories.","https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fissues\u002F1",[150,155,160,165,170,175,179,183],{"id":151,"version":152,"summary_zh":153,"released_at":154},292769,"v1.0.1","Fixes a bug where tensors were placed on the wrong device when multiple devices are available (#831).\n\n## What's Changed\n* Updated the README file by @ernestum in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F817\n* Added more benchmark documentation by @ernestum in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F822
\n* Clarified in README.md that we have switched to gymnasium by @ernestum in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F824\n* Switched to lualatex for generating the documentation PDF by @ernestum in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F826\n* Fixed the documentation build by @ernestum in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F827\n* Fixed warnings in quickstart.py by @ernestum in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F823\n* Fixed coverage in the BC tests by @ernestum in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F830\n* Removed FloatReward by @ernestum in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F829\n* Made safe_to_tensor move tensors to the specified device by @ernestum in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F831\n\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fcompare\u002Fv1.0.0...v1.0.1","2025-01-07T06:20:04",{"id":156,"version":157,"summary_zh":158,"released_at":159},292770,"v1.0.0","We are happy to announce the first stable release of `imitation`. Major improvements include:\n  * Gymnasium compatibility, replacing Gym\n  * Tuned hyperparameters and benchmark results for common algorithm-environment combinations (see the attached release assets).\n  * New algorithm (beta): SQIL\nSee the changelog below for more information.\n\n## What's Changed\n* Updated the installation instructions by @ernestum in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F760\n* Added downloading of expert data from Hugging Face in the tutorials and docs by @jas-ho in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F766\n* Implemented the SQIL algorithm by @RedTachyon in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F744\n* Added more CLI usage examples by @EdoardoPona in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F761\n* Fixed dependency issues by @ernestum in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F775\n* Tuned hyperparameters in the kernel density estimation tutorial by @michalzajac-ml in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F774\n* 
@michalzajac-ml tuned the hyperparameters in the GAIL and AIRL tutorials in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F772\n* @michalzajac-ml introduced an interactive policy for collecting data from the user in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F776\n* @michalzajac-ml added an option to run SQIL with various off-policy algorithms in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F778\n* @lukasberglund completed PR #771 (tuning hyperparameters for the preference comparison examples) in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F782\n* @lukasberglund added a CLI for SQIL in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F784\n* @ernestum implemented Gymnasium compatibility in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F735\n* @ernestum made MyST-NB raise an error when rendering a notebook fails in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F803\n* @ernestum added test timeouts in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F779\n* @AdamGleave fixed the macOS pipeline so that it includes tests that are not in a subdirectory in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F797\n* @AdamGleave removed the MuJoCo dependency from the SQIL notebook in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F800\n* @NixGD added partial support for dict observation spaces (bc, density) in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F785\n* @taufeeque9 updated the gymnasium dependency and the render_mode argument in gym.make in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F806\n* @ZiyueWang25 upgraded pytype in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F801\n* @ernestum reduced training time and improved the code that loads expert data in the tutorials in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F810\n* @taufeeque9 added scripts and config files for hyperparameter tuning in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F675\n* @ernestum fixed the SQIL and PC 
performance issues in …","2023-10-31T18:48:11",{"id":161,"version":162,"summary_zh":163,"released_at":164},292771,"v0.4.0","## What's Changed\n\n* Continuous integration: added macOS support; removed the MuJoCo dependency.\n* Preference comparisons: improved logging; added support for active learning based on ensemble variance.\n* Hugging Face integration for loading models and datasets.\n* Benchmarking: added results and example configurations.\n* Documentation: added notebook tutorials; other general improvements.\n* General changes: migrated to pathlib; added more type hints to support mypy and pytype.\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fcompare\u002Fv0.3.1...v0.4.0","2023-07-17T23:05:35",{"id":166,"version":167,"summary_zh":168,"released_at":169},292772,"v0.3.1","## What's Changed\n\nMajor changes:\n* @levmckinney added reward ensembles and conservative reward functions in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F460\n* @levmckinney dropped support for Python 3.7 in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F505\n\nMinor changes:\n* @Rocamonde made docstring and other fixes in response to #472 in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F497\n* @AdamGleave improved the Windows CI in https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fpull\u002F495\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002FHumanCompatibleAI\u002Fimitation\u002Fcompare\u002Fv0.3.0...v0.3.1","2022-07-29T00:58:46",{"id":171,"version":172,"summary_zh":173,"released_at":174},292773,"v0.3.0","New features:\n  - **New algorithm**: [Deep Reinforcement Learning from Human Preferences](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03741) (thanks @ejnnr, @norabelrose and others)\n  - Notebooks with examples (thanks @ernestum)\n  - Serialize trajectories as NumPy arrays instead of pickles, for stability across versions and to save disk space (thanks @norabelrose)\n  - Weights and Biases logging support (thanks @yawen-d)\n\nImprovements:\n  - Ported MCE IRL from JAX to PyTorch, removing the JAX dependency. (thanks @qxcv)\n  - Refactored the RewardNet code to be independent of AIRL and shared across algorithms. (thanks @ejnnr)\n  - Added Windows support, including continuous integration. (thanks @taufeeque9)","2022-07-26T21:07:16",{"id":176,"version":177,"summary_zh":76,"released_at":178},292774,"v0.2.0","2020-10-23T23:07:06",{"id":180,"version":181,"summary_zh":76,"released_at":182},292775,"v0.1.1","2020-09-01T01:39:59",{"id":184,"version":185,"summary_zh":186,"released_at":187},292776,"v0.1.0","Prototype versions of AIRL, GAIL, BC, and 
DAGGER.","2020-05-09T19:46:07"]