[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-hardmaru--slimevolleygym":3,"tool-hardmaru--slimevolleygym":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":75,"owner_avatar_url":76,"owner_bio":77,"owner_company":78,"owner_location":79,"owner_email":80,"owner_twitter":78,"owner_website":81,"owner_url":82,"languages":83,"stars":88,"forks":89,"last_commit_at":90,"license":91,"difficulty_score":92,"env_os":93,"env_gpu":94,"env_ram":95,"env_deps":96,"category_tags":104,"github_topics":78,"view_count":23,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":105,"updated_at":106,"faqs":107,"releases":148},2413,"hardmaru\u002Fslimevolleygym","slimevolleygym","A simple OpenAI Gym environment for single and multi-agent reinforcement learning","slimevolleygym 是一个基于经典网页游戏“史莱姆排球”打造的轻量级强化学习环境。它将原本简单的双人排球对抗转化为标准的 OpenAI Gym 接口，旨在为单智能体及多智能体强化学习算法提供高效、稳定的测试平台。\n\n该工具主要解决了传统基准任务（如 CartPole）缺乏博弈互动性的问题，让研究人员能够轻松构建自对弈（self-play）或多智能体对抗场景，从而更全面地评估智能体在动态竞争中的表现。它特别适合强化学习研究者、算法开发者以及高校师生用于教学演示和原型验证。\n\nslimevolleygym 拥有多项技术亮点：依赖极少，仅需 gym 和 numpy 即可运行，极大降低了环境配置出错的风险；运算效率极高，在普通 CPU 上每秒可模拟上万步，显著缩短实验迭代周期；支持像素观察模式，可直接在无显示器的云端服务器上运行，并兼容针对 Atari 游戏优化的主流模型。此外，其独特的观测设计确保了一侧训练的智能体无需调整即可直接切换到另一侧对战，为迁移学习和对称策略研究提供了便利。无论是探索多智能体协作还是验证新的进化算法，slimevolleygym 都是一个简洁而强大的选择。","# Slime Volleyball Gym Environment\n\n\u003Cp align=\"left\">\n  \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_dea895057c05.gif\">\u003C\u002Fimg>\n\u003C\u002Fp>\n\nSlime Volleyball is a game created in the early 2000s by an unknown author.\n\n*“The physics of the game are a little ‘dodgy,’ but its simple gameplay made it instantly addictive.”*\u003Cbr\u002F>\n\n---\n\n**Update (May 12, 2022):** This environment has been ported over to [EvoJAX](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fevojax), hardware-accelerated neuroevolution toolkit that allows SlimeVolley to run on GPUs, enabling training time in minutes rather than hours.\n\n---\n\nSlimeVolleyGym is a simple gym environment for testing single and multi-agent reinforcement learning algorithms.\n\nThe game is very simple: the agent's goal is to get the ball to land on the ground of its opponent's side, causing its opponent to lose a life. Each agent starts off with five lives. The episode ends when either agent loses all five lives, or after 3000 timesteps has passed. 
An agent receives a reward of +1 when its opponent loses or -1 when it loses a life.\n\nThis environment is based on [Neural Slime Volleyball](https:\u002F\u002Fotoro.net\u002Fslimevolley\u002F), a JavaScript game I created in [2015](https:\u002F\u002Fblog.otoro.net\u002F2015\u002F03\u002F28\u002Fneural-slime-volleyball\u002F) that used self-play and evolution to train a simple neural network agent to play the game better than most human players. I decided to port it over to Python as a lightweight and fast gym environment as a testbed for more advanced RL methods such as multi-agent, self-play, continual learning, and imitation learning algorithms.\n\n### Note: Regarding Libraries\n\n- The pre-trained PPO models were trained using [stable-baselines](https:\u002F\u002Fgithub.com\u002Fhill-a\u002Fstable-baselines) v2.10, *not* [stable-baselines3](https:\u002F\u002Fgithub.com\u002FDLR-RM\u002Fstable-baselines3).\n\n- The examples were developed based on Gym version 0.19.0 or earlier. I tested 0.20.0 briefly and it seems to work, but later versions of Gym have API-breaking changes.\n\n- I used pyglet library 0.15.7 or earlier while developing this, but have not tested whether the package works for the latest versions of pyglet.\n\n### Notable features\n\n- Only dependencies are gym and numpy. No other libraries needed to run the env, making it less likely to break.\n\n- In the normal single agent setting, the agent plays against a tiny 120-parameter [neural network](https:\u002F\u002Fotoro.net\u002Fslimevolley\u002F) baseline agent from 2015. This opponent can easily be replaced by another policy to enable a multi-agent or self-play environment.\n\n- Runs at around 12.5K timesteps per second on 2015 MacBook (core i7) for state-space observations, resulting in faster iteration in experiments.\n\n- A [tutorial](TRAINING.md) demonstrating several different training methods (e.g. single agent, self-play, evolution) that require only a single CPU machine in most cases. Potentially useful for educational purposes.\n\n- A pixel observation mode is available. Observations are directly rendered to numpy arrays and runs on headless cloud machines. The pixel version of the environment mimics gym environments based on the Atari Learning Environment and has been tested on several Atari gym wrappers and RL models tuned for Atari.\n\n- The opponent's observation is made available in the optional `info` object returned by `env.step()` for both state and pixel settings. The observations are constructed as if the agent is always playing on the right court, even if it is playing on the left court, so an agent trained to play on one side can play on the other side without adjustment.\n\nThis environment is meant to complement existing simple benchmark tasks, such as CartPole, Lunar Lander, Bipedal Walker, Car Racing, and continuous control tasks (MuJoCo \u002F PyBullet \u002F DM Control), but with an extra game-playing element. 
The motivation is to easily enable trained agents to play against each other, and also let us easily train agents directly in a multi-agent setting, thus adding an extra dimension for evaluating an agent's performance.\n\n## Installation\n\nInstall from pip package, if you only want to use the gym environment, but don't want the example usage scripts:\n\n```\npip install slimevolleygym\n```\n\nInstall from the repo, if you want basic usage demos, training scripts, pre-trained models:\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym.git\ncd slimevolleygym\npip install -e .\n```\n\n## Basic Usage\n\nAfter installing from the repo, you can play the game against the baseline agent by running:\n\n```\npython test_state.py\n```\n\n\u003Cp align=\"left\">\n  \u003Cimg width=\"50%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_1301e405533a.gif\">\u003C\u002Fimg>\n  \u003C!--\u003Cbr\u002F>\u003Ci>State-space observation mode.\u003C\u002Fi>-->\n\u003C\u002Fp>\n\nYou can control the agent on the right using the arrow keys, or the agent on the left using (A, W, D).\n\nSimilarly, `test_pixel.py` allows you to play in the pixelated environment, and `test_atari.py` lets you play the game by observing the preprocessed stacked frames (84px x 84px x 4 frames) typically done for Atari RL agents:\n\n\u003Cp align=\"left\">\n  \u003Cimg width=\"50%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_f757d4cafb1b.gif\">\u003C\u002Fimg>\n  \u003Cbr\u002F>\u003Ci>Atari gym wrappers combine 4 frames as one observation.\u003C\u002Fi>\n\u003C\u002Fp>\n\n## Environments\n\nThere are two types of environments: state-space observation or pixel observations:\n\n|Environment Id|Observation Space|Action Space\n|---|---|---|\n|SlimeVolley-v0|Box(12)|MultiBinary(3)\n|SlimeVolleyPixel-v0|Box(84, 168, 3)|MultiBinary(3)\n|SlimeVolleyNoFrameskip-v0|Box(84, 168, 3)|Discrete(6)\n\n`SlimeVolleyNoFrameskip-v0` identical to `SlimeVolleyPixel-v0` except that the action space is now a one-hot vector typically used in Atari RL agents.\n\nIn state-space observation, the 12-dim vector corresponds to the following states:\n\n\u003Cimg src=\"https:\u002F\u002Frender.githubusercontent.com\u002Frender\u002Fmath?math=\\left(x_{agent}, y_{agent}, \\dot{x}_{agent}, \\dot{y}_{agent}, x_{ball}, y_{ball}, \\dot{x}_{ball}, \\dot{y}_{ball}, x_{opponent}, y_{opponent}, \\dot{x}_{opponent}, \\dot{y}_{opponent}\\right)\">\u003C\u002Fimg>\n\nThe origin point (0, 0) is located at the bottom of the fence.\n\nBoth state and pixel observations are presented assuming the agent is playing on the right side of the screen.\n\n### Using Multi-Agent Environment\n\nIt is straight forward to modify the gym loop to enable multi-agent or self-play. 
Here is a basic gym loop:\n\n```python\nimport gym\nimport slimevolleygym\n\nenv = gym.make(\"SlimeVolley-v0\")\n\nobs = env.reset()\ndone = False\ntotal_reward = 0\n\nwhile not done:\n  action = my_policy(obs)\n  obs, reward, done, info = env.step(action)\n  total_reward += reward\n  env.render()\n\nprint(\"score:\", total_reward)\n```\n\nThe `info` object contains extra information including the observation for the opponent:\n\n```\ninfo = {\n  'ale.lives': agent's lives left,\n  'ale.otherLives': opponent's lives left,\n  'otherObs': opponent's observations,\n  'state': agent's state (same as obs in state mode),\n  'otherState': opponent's state (same as otherObs in state mode),\n}\n```\n\nThis modification allows you to evaluate `policy1` against `policy2`\n\n```python\nobs1 = env.reset()\nobs2 = obs1 # both sides always see the same initial observation.\n\ndone = False\ntotal_reward = 0\n\nwhile not done:\n\n  action1 = policy1(obs1)\n  action2 = policy2(obs2)\n\n  obs1, reward, done, info = env.step(action1, action2) # extra argument\n  obs2 = info['otherObs']\n\n  total_reward += reward\n  env.render()\n\nprint(\"policy1's score:\", total_reward)\nprint(\"policy2's score:\", -total_reward)\n```\n\nNote that in both state and pixel modes, `otherObs` is given as if the agent is playing on the right side of the screen, so one can swap an agent to play either side without modifying the agent.\n\n\u003Cp align=\"left\">\n  \u003Cimg width=\"50%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_a78d995ba52e.gif\">\u003C\u002Fimg>\n  \u003Cbr\u002F>\u003Ci>Opponent's observation is rendered in the smaller window.\u003C\u002Fi>\n\u003C\u002Fp>\n\nOne can consider replacing `policy2` with earlier versions of your agent (self-play) and wrapping the multi-agent environment as if it were a single-agent environment so that it can use standard RL algorithms. There are several examples of these techniques described in more detail in the [TRAINING.md](TRAINING.md) tutorial.\n\n## Evaluating against other agents\n\nSeveral pre-trained agents (`ppo`, `cma`, `ga`, `baseline`) are discussed in the [TRAINING.md](TRAINING.md) tutorial.\n\nYou can run them against each other using the following command:\n\n```\npython eval_agents.py --left ppo --right cma --render\n```\n\n\u003Cp align=\"left\">\n  \u003C!--\u003Cimg width=\"50%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_33390331a08e.gif\">\u003C\u002Fimg>-->\n  \u003Cimg width=\"50%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_be9a903d7087.gif\">\u003C\u002Fimg>\n  \u003Cbr\u002F>\u003Ci>Evaluating PPO agent (left) against CMA-ES (right).\u003C\u002Fi>\n\u003C\u002Fp>\n\nIt should be relatively straightforward to modify `eval_agents.py` to include your custom agent.\n\n## Leaderboard\n\nBelow are scores achieved by various algorithms and links to their implementations. 
Feel free to add yours here:\n\n### SlimeVolley-v0\n\n|Method|Average Score|Episodes|Other Info\n|---|---|---|---|\n|Maximum Possible Score|5.0|  | \n|PPO | 1.377 ± 1.133 | 1000 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|CMA-ES | 1.148 ± 1.071 | 1000 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|GA (Self-Play) | 0.353 ± 0.728 | 1000 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|CMA-ES (Self-Play) | -0.071 ± 0.827 | 1000 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|PPO (Self-Play) | -0.371 ± 1.085 | 1000 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|Random Policy | -4.866 ± 0.372 | 1000 | \n|[Add Method](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fedit\u002Fmaster\u002FREADME.md) |  |  |  \n\n### SlimeVolley-v0 (Sample Efficiency)\n\nFor sample efficiency, we can measure how many timesteps it took to train an agent that can achieve a positive average score (over 1000 episodes) against the built-in baseline policy:\n\n|Method| Timesteps (Best) | Timesteps (Median)| Trials | Other Info\n|---|---|---|---|---|\n|PPO | 1.274M | 2.998M | 17 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|Data-efficient Rainbow | 0.750M | 0.751M | 3 | [link](https:\u002F\u002Fgithub.com\u002Fpfnet\u002Fpfrl\u002Fblob\u002Fmaster\u002Fexamples\u002Fslimevolley\u002FREADME.md)\n|[Add Method](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fedit\u002Fmaster\u002FREADME.md) |  |  |  | \n\n### SlimeVolley-v0 (Against Other Agents)\n\nTable of average scores achieved versus agents other than the default baseline policy ([1000 episodes](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002Feval_agents.py)):\n\n|Method|Baseline|PPO|CMA-ES|GA (Self-Play)| Other Info\n|---|---|---|---|---|---|\n|PPO |  1.377 ± 1.133 | — |  0.133 ± 0.414 | -3.128 ± 1.509 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|CMA-ES | 1.148 ± 1.071 | -0.133 ± 0.414 | — | -0.301 ± 0.618 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|GA (Self-Play) | 0.353 ± 0.728  | 3.128 ± 1.509 | 0.301 ± 0.618 | — | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|CMA-ES (Self-Play) | -0.071 ± 0.827  |  -0.749 ± 0.846 |  -0.351 ± 0.651 |  -4.923 ± 0.342 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|PPO (Self-Play) | -0.371 ± 1.085  | 0.119 ± 1.46 |  -2.304 ± 1.392 |  -0.42 ± 0.717 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|[Add Method](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fedit\u002Fmaster\u002FREADME.md) |  |  |\n\nIt is interesting to note that while GA (Self-Play) did not perform as well against the baseline policy compared to PPO and CMA-ES, it is a superior policy if evaluated against these methods that trained directly against the baseline policy.\n\n### SlimeVolleyPixel-v0\n\nResults for pixel observation version of the 
environment (`SlimeVolleyPixel-v0` or `SlimeVolleyNoFrameskip-v0`):\n\n|Pixel Observation|Average Score|Episodes|Other Info\n|---|---|---|---|\n|Maximum Possible Score|5.0| | |\n|PPO | 0.435 ± 0.961 | 1000 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|Rainbow | 0.037 ± 0.994 | 1000 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002FRainbowSlimeVolley)\n|A2C | -0.079 ± 1.091 | 1000 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Frlzoo)\n|ACKTR | -1.183 ± 1.480 | 1000 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Frlzoo)\n|ACER | -1.789 ± 1.632 | 1000 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Frlzoo)\n|DQN | -4.091 ± 1.242 | 1000 | [link](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Frlzoo)\n|Random Policy | -4.866 ± 0.372 | 1000 | \n|[Add Method](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fedit\u002Fmaster\u002FREADME.md) |  | (>= 1000) | \n\n## Publications\n\nIf you have publications, articles, projects, blog posts that use this environment, feel free to add a link here via a [PR](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fedit\u002Fmaster\u002FREADME.md).\n\n## Citation\n\n\u003C!--\u003Cp align=\"left\">\n  \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_be9a903d7087.gif\">\u003C\u002Fimg>\u003C\u002Fimg>\n\u003C\u002Fp>-->\n\nPlease use this BibTeX to cite this repository in your publications:\n\n```\n@misc{slimevolleygym,\n  author = {David Ha},\n  title = {Slime Volleyball Gym Environment},\n  year = {2020},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym}},\n}\n```\n","# 黏液排球 Gym 环境\n\n\u003Cp align=\"left\">\n  \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_dea895057c05.gif\">\u003C\u002Fimg>\n\u003C\u002Fp>\n\n黏液排球是一款由不知名作者在2000年代初制作的游戏。\n\n*“游戏的物理机制有点‘不太靠谱’，但其简单的玩法却让人一玩就上瘾。”*\u003Cbr\u002F>\n\n---\n\n**更新（2022年5月12日）：** 该环境已被移植到 [EvoJAX](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fevojax) 上，这是一个硬件加速的神经进化工具包，使得 SlimeVolley 可以在 GPU 上运行，从而将训练时间从数小时缩短至几分钟。\n\n---\n\nSlimeVolleyGym 是一个用于测试单智能体和多智能体强化学习算法的简单 Gym 环境。\n\n游戏规则非常简单：智能体的目标是将球打到对方场地的地面上，从而使对方失去一条生命。每个智能体初始有五条生命。当任意一方的生命耗尽，或者经过 3000 个时间步后，回合即结束。当对手失分时，智能体获得 +1 的奖励；当自己失分时，则获得 -1 的奖励。\n\n该环境基于我于 [2015 年](https:\u002F\u002Fblog.otoro.net\u002F2015\u002F03\u002F28\u002Fneural-slime-volleyball\u002F) 开发的 JavaScript 游戏 [Neural Slime Volleyball](https:\u002F\u002Fotoro.net\u002Fslimevolley\u002F)。当时我利用自我对弈和进化算法训练了一个简单的神经网络智能体，使其表现优于大多数人类玩家。后来，我决定将其移植到 Python 中，作为一个轻量级、高速的 Gym 环境，用作更高级 RL 方法的测试平台，例如多智能体、自我对弈、持续学习和模仿学习等算法。\n\n### 注意：关于库\n\n- 预训练的 PPO 模型是使用 [stable-baselines](https:\u002F\u002Fgithub.com\u002Fhill-a\u002Fstable-baselines) v2.10 训练的，*而非* [stable-baselines3](https:\u002F\u002Fgithub.com\u002FDLR-RM\u002Fstable-baselines3)。\n\n- 示例代码基于 Gym 0.19.0 或更早版本开发。我曾短暂测试过 0.20.0 版本，似乎可以正常工作，但更高版本的 Gym 存在破坏兼容性的 API 变更。\n\n- 在开发过程中，我使用的是 pyglet 库 0.15.7 或更早版本，尚未测试最新版本的 pyglet 是否兼容。\n\n### 显著特点\n\n- 唯一依赖项是 gym 和 numpy。运行该环境无需其他库，因此出错的可能性较低。\n  \n- 在常规的单智能体设置中，智能体将与一个来自 2015 年的、仅有 120 个参数的 [神经网络](https:\u002F\u002Fotoro.net\u002Fslimevolley\u002F) 基线智能体对战。这个对手可以轻松替换为其他策略，从而实现多智能体或自我对弈环境。\n\n- 在 2015 年款 MacBook（Core i7）上，状态空间观测模式下可达到约 12.5K 时间步\u002F秒的运行速度，从而加快实验迭代。\n\n- 提供了一份[教程](TRAINING.md)，演示了多种不同的训练方法（如单智能体、自我对弈、进化等），这些方法通常仅需一台 CPU 
机器即可完成。对于教学用途可能非常有用。\n\n- 提供像素观测模式。观测结果直接渲染为 numpy 数组，并可在无头云服务器上运行。该像素版本的环境模仿了基于 Atari Learning Environment 的 Gym 环境，已成功应用于多个 Atari Gym 封装及针对 Atari 优化的 RL 模型。\n\n- 对手的观测信息会作为 `env.step()` 返回的可选 `info` 对象的一部分提供，适用于状态空间和像素两种观测模式。观测数据的构建方式是假设智能体始终位于右侧球场，即使实际位置在左侧，因此在某一侧训练的智能体无需调整即可在另一侧继续游戏。\n\n该环境旨在补充现有的简单基准任务，如 CartPole、Lunar Lander、Bipedal Walker、Car Racing 以及连续控制任务（MuJoCo \u002F PyBullet \u002F DM Control），同时增加游戏性元素。其设计初衷是便于让训练好的智能体相互对战，也能够直接在多智能体环境中进行训练，从而为评估智能体性能提供更多维度。\n\n## 安装\n\n如果仅需使用 Gym 环境，而不需要示例脚本，可通过 pip 包安装：\n\n```\npip install slimevolleygym\n```\n\n若希望获取基础使用演示、训练脚本及预训练模型，则可从仓库安装：\n\n```\ngit clone https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym.git\ncd slimevolleygym\npip install -e .\n```\n\n## 基本使用\n\n从仓库安装后，可以通过运行以下命令与基线智能体对战：\n\n```\npython test_state.py\n```\n\n\u003Cp align=\"left\">\n  \u003Cimg width=\"50%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_1301e405533a.gif\">\u003C\u002Fimg>\n  \u003C!--\u003Cbr\u002F>\u003Ci>状态空间观测模式。\u003C\u002Fi>-->\n\u003C\u002Fp>\n\n可使用方向键控制右侧智能体，或使用 (A, W, D) 控制左侧智能体。\n\n同样地，`test_pixel.py` 允许在像素化环境中游玩，而 `test_atari.py` 则可以让玩家通过观察预处理后的堆叠帧（84px x 84px x 4 帧）来玩游戏，这种观测方式通常用于 Atari RL 任务：\n\n\u003Cp align=\"left\">\n  \u003Cimg width=\"50%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_f757d4cafb1b.gif\">\u003C\u002Fimg>\n  \u003Cbr\u002F>\u003Ci>Atari Gym 封装会将 4 帧合并为一个观测值。\u003C\u002Fi>\n\u003C\u002Fp>\n\n## 环境类型\n\n该环境分为两种观测模式：状态空间观测和像素观测：\n\n|环境 ID|观测空间|动作空间|\n|---|---|---|\n|SlimeVolley-v0|Box(12)|MultiBinary(3)|\n|SlimeVolleyPixel-v0|Box(84, 168, 3)|MultiBinary(3)|\n|SlimeVolleyNoFrameskip-v0|Box(84, 168, 3)|Discrete(6)|\n\n`SlimeVolleyNoFrameskip-v0` 与 `SlimeVolleyPixel-v0` 相同，唯一的区别在于动作空间现在采用 Atari RL 任务中常用的独热向量表示。\n\n在状态空间观测中，12 维向量对应以下状态：\n\n\u003Cimg src=\"https:\u002F\u002Frender.githubusercontent.com\u002Frender\u002Fmath?math=\\left(x_{agent}, y_{agent}, \\dot{x}_{agent}, \\dot{y}_{agent}, x_{ball}, y_{ball}, \\dot{x}_{ball}, \\dot{y}_{ball}, x_{opponent}, y_{opponent}, \\dot{x}_{opponent}, \\dot{y}_{opponent}\\right)\">\u003C\u002Fimg>\n\n原点 (0, 0) 位于围栏底部。\n\n无论是状态空间观测还是像素观测，都假定智能体位于屏幕的右侧。\n\n### 使用多智能体环境\n\n修改 Gym 循环以支持多智能体或自我对弈非常简单。以下是一个基本的 Gym 循环：\n\n```python\nimport gym\nimport slimevolleygym\n\nenv = gym.make(\"SlimeVolley-v0\")\n\nobs = env.reset()\ndone = False\ntotal_reward = 0\n\nwhile not done:\n  action = my_policy(obs)\n  obs, reward, done, info = env.step(action)\n  total_reward += reward\n  env.render()\n\nprint(\"score:\", total_reward)\n```\n\n`info` 对象包含额外的信息，包括对手的观测：\n\n```\ninfo = {\n  'ale.lives': 本方剩余生命值,\n  'ale.otherLives': 对手剩余生命值,\n  'otherObs': 对手的观测,\n  'state': 本方的状态（在状态模式下与 obs 相同）,\n  'otherState': 对手的状态（在状态模式下与 otherObs 相同）,\n}\n```\n\n通过这种修改，你可以评估 `policy1` 对抗 `policy2` 的表现：\n\n```python\nobs1 = env.reset()\nobs2 = obs1 # 双方始终看到相同的初始观测。\n\ndone = False\ntotal_reward = 0\n\nwhile not done:\n\n  action1 = policy1(obs1)\n  action2 = policy2(obs2)\n\n  obs1, reward, done, info = env.step(action1, action2) # 额外参数\n  obs2 = info['otherObs']\n\n  total_reward += reward\n  env.render()\n\nprint(\"policy1's score:\", total_reward)\nprint(\"policy2's score:\", -total_reward)\n```\n\n需要注意的是，在状态模式和像素模式下，`otherObs` 都是以智能体位于屏幕右侧的角度给出的，因此可以轻松切换智能体的位置，而无需修改智能体本身。\n\n\u003Cp align=\"left\">\n  \u003Cimg width=\"50%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_a78d995ba52e.gif\">\u003C\u002Fimg>\n  
\u003Cbr\u002F>\u003Ci>对手的观测显示在较小的窗口中。\u003C\u002Fi>\n\u003C\u002Fp>\n\n你可以考虑用自己智能体的早期版本替换 `policy2`（即自我对弈），并将多智能体环境包装成单智能体环境，以便使用标准的强化学习算法。这些技术的更多详细示例可以在 [TRAINING.md](TRAINING.md) 教程中找到。\n\n## 对抗其他智能体进行评估\n\n[TRAINING.md](TRAINING.md) 教程中讨论了几种预训练的智能体（`ppo`、`cma`、`ga`、`baseline`）。\n\n你可以使用以下命令让它们相互对战：\n\n```\npython eval_agents.py --left ppo --right cma --render\n```\n\n\u003Cp align=\"left\">\n  \u003C!--\u003Cimg width=\"50%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_33390331a08e.gif\">\u003C\u002Fimg>-->\n  \u003Cimg width=\"50%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_be9a903d7087.gif\">\u003C\u002Fimg>\n  \u003Cbr\u002F>\u003Ci>评估 PPO 智能体（左）对抗 CMA-ES（右）。\u003C\u002Fi>\n\u003C\u002Fp>\n\n修改 `eval_agents.py` 以加入你自定义的智能体应该相对简单。\n\n## 排行榜\n\n以下是各种算法取得的分数及其实现链接。欢迎在此处添加你的结果：\n\n### SlimeVolley-v0\n\n|方法|平均得分|回合数|其他信息|\n|---|---|---|---|\n|理论最高分|5.0|  | \n|PPO | 1.377 ± 1.133 | 1000 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|CMA-ES | 1.148 ± 1.071 | 1000 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|GA（自我对弈） | 0.353 ± 0.728 | 1000 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|CMA-ES（自我对弈） | -0.071 ± 0.827 | 1000 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|PPO（自我对弈） | -0.371 ± 1.085 | 1000 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|随机策略 | -4.866 ± 0.372 | 1000 | \n|[添加方法](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fedit\u002Fmaster\u002FREADME.md) |  |  |  \n\n### SlimeVolley-v0（样本效率）\n\n对于样本效率，我们可以衡量训练一个能够在 1000 回合内对内置基准策略取得正平均得分的智能体所需的步数：\n\n|方法|最佳步数|中位数步数|试验次数|其他信息|\n|---|---|---|---|---|\n|PPO | 1.274M | 2.998M | 17 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|数据高效 Rainbow | 0.750M | 0.751M | 3 | [链接](https:\u002F\u002Fgithub.com\u002Fpfnet\u002Fpfrl\u002Fblob\u002Fmaster\u002Fexamples\u002Fslimevolley\u002FREADME.md)\n|[添加方法](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fedit\u002Fmaster\u002FREADME.md) |  |  |  | \n\n### SlimeVolley-v0（对抗其他智能体）\n\n与默认基准策略以外的其他智能体对战时的平均得分表（基于 1000 回合测试）：\n\n|方法|基准|PPO|CMA-ES|GA（自我对弈）|其他信息|\n|---|---|---|---|---|---|\n|PPO |  1.377 ± 1.133 | — |  0.133 ± 0.414 | -3.128 ± 1.509 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|CMA-ES | 1.148 ± 1.071 | -0.133 ± 0.414 | — | -0.301 ± 0.618 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|GA（自我对弈） | 0.353 ± 0.728  | 3.128 ± 1.509 | 0.301 ± 0.618 | — | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|CMA-ES（自我对弈） | -0.071 ± 0.827  |  -0.749 ± 0.846 |  -0.351 ± 0.651 |  -4.923 ± 0.342 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|PPO（自我对弈） | -0.371 ± 1.085  | 0.119 ± 1.46 |  -2.304 ± 1.392 |  -0.42 ± 0.717 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|[添加方法](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fedit\u002Fmaster\u002FREADME.md) 
|  |  |\n\n值得注意的是，尽管 GA（自我对弈）在对抗基准策略时的表现不如 PPO 和 CMA-ES，但如果将其与其他直接针对基准策略训练的智能体进行比较，则它是一种更优的策略。\n\n### SlimeVolleyPixel-v0\n\n像素观测版本环境（`SlimeVolleyPixel-v0` 或 `SlimeVolleyNoFrameskip-v0`）的结果：\n\n|像素观测|平均得分|回合数|其他信息|\n|---|---|---|---|\n|理论最高分|5.0| | |\n|PPO | 0.435 ± 0.961 | 1000 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fblob\u002Fmaster\u002FTRAINING.md)\n|Rainbow | 0.037 ± 0.994 | 1000 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002FRainbowSlimeVolley)\n|A2C | -0.079 ± 1.091 | 1000 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Frlzoo)\n|ACKTR | -1.183 ± 1.480 | 1000 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Frlzoo)\n|ACER | -1.789 ± 1.632 | 1000 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Frlzoo)\n|DQN | -4.091 ± 1.242 | 1000 | [链接](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Frlzoo)\n|随机策略 | -4.866 ± 0.372 | 1000 | \n|[添加方法](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fedit\u002Fmaster\u002FREADME.md) |  | (≥ 1000) | \n\n## 出版物\n\n如果你有使用该环境的出版物、文章、项目或博客文章，请随时通过 [PR](https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fedit\u002Fmaster\u002FREADME.md) 在此处添加链接。\n\n## 引用\n\n\u003C!--\n\u003Cp align=\"left\">\n  \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_readme_be9a903d7087.gif\">\u003C\u002Fimg>\u003C\u002Fimg>\n\u003C\u002Fp>\n-->\n\n请使用以下 BibTeX 格式在您的出版物中引用本仓库：\n\n```\n@misc{slimevolleygym,\n  author = {David Ha},\n  title = {史莱姆排球 Gym 环境},\n  year = {2020},\n  publisher = {GitHub},\n  journal = {GitHub 代码库},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym}},\n}\n```","# SlimeVolleyGym 快速上手指南\n\nSlimeVolleyGym 是一个轻量级的 Gym 环境，专为测试单智能体及多智能体强化学习算法而设计。该环境基于经典的\"Slime Volleyball\"游戏，支持状态空间（State-space）和像素观察（Pixel observation）两种模式，运行速度快且依赖极少。\n\n## 环境准备\n\n*   **操作系统**：Linux, macOS, Windows\n*   **Python 版本**：推荐 Python 3.6 - 3.9\n*   **核心依赖**：\n    *   `gym` (推荐版本 \u003C= 0.20.0，高版本可能存在 API 不兼容)\n    *   `numpy`\n    *   `pyglet` (如需渲染画面，推荐版本 \u003C= 0.15.7)\n*   **可选依赖**：若需使用预训练模型或示例脚本，需安装完整仓库。\n\n> **注意**：预训练的 PPO 模型是基于 `stable-baselines` v2.10 训练的，而非 `stable-baselines3`。\n\n## 安装步骤\n\n### 方式一：仅安装环境（推荐用于集成到自己的项目）\n如果你只需要 Gym 环境接口，不需要示例代码和预训练模型：\n\n```bash\npip install slimevolleygym\n```\n\n### 方式二：从源码安装（包含示例、训练脚本和预训练模型）\n如果你希望运行官方提供的演示、训练教程或评估脚本：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym.git\ncd slimevolleygym\npip install -e .\n```\n\n*(国内用户若克隆速度慢，可尝试使用 Gitee 镜像或配置 Git 代理)*\n\n## 基本使用\n\n### 1. 运行官方演示\n安装源码后，你可以直接运行以下命令体验游戏：\n\n**状态空间模式（默认，速度快）：**\n使用方向键控制右侧代理，或使用 (A, W, D) 控制左侧代理。\n```bash\npython test_state.py\n```\n\n**像素观察模式（模拟 Atari 环境）：**\n```bash\npython test_pixel.py\n```\n\n**Atari 预处理帧模式（堆叠 4 帧）：**\n```bash\npython test_atari.py\n```\n\n### 2. 代码集成示例\n以下是将 SlimeVolleyGym 集成到标准 Gym 循环中的最小化代码示例：\n\n```python\nimport gym\nimport slimevolleygym\n\n# 创建环境\n# 可选环境 ID: \n# 'SlimeVolley-v0' (状态空间，最快)\n# 'SlimeVolleyPixel-v0' (像素观察)\n# 'SlimeVolleyNoFrameskip-v0' (离散动作空间的像素观察)\nenv = gym.make(\"SlimeVolley-v0\")\n\nobs = env.reset()\ndone = False\ntotal_reward = 0\n\nwhile not done:\n  # 替换为你的策略函数\n  action = my_policy(obs)\n  \n  # 执行动作\n  obs, reward, done, info = env.step(action)\n  \n  total_reward += reward\n  \n  # 渲染画面（在无头服务器上可省略）\n  env.render()\n\nprint(\"score:\", total_reward)\nenv.close()\n```\n\n### 3. 
多智能体\u002F自博弈支持\n该环境原生支持多智能体设置。`env.step()` 返回的 `info` 字典中包含对手的观察数据 (`otherObs`)，且对手视角已自动调整为与当前代理一致的坐标系（即始终假设代理在右侧），无需额外处理即可实现左右互换训练。\n\n```python\n# 简化的多智能体循环示例\nobs1 = env.reset()\nobs2 = obs1 \ndone = False\n\nwhile not done:\n  action1 = policy1(obs1)\n  action2 = policy2(obs2)\n\n  # 传入两个动作\n  obs1, reward, done, info = env.step(action1, action2)\n  obs2 = info['otherObs'] # 获取对手视角的观察\n\n  env.render()\n```","某高校强化学习实验室的研究团队正致力于开发一种能在动态对抗环境中快速进化的多智能体协作算法，急需一个轻量级且支持自博弈的测试平台。\n\n### 没有 slimevolleygym 时\n- **环境搭建繁琐**：研究者需自行编写游戏物理引擎或集成厚重的 Atari 模拟器，依赖库冲突频发，配置耗时数天。\n- **训练迭代缓慢**：传统环境在 CPU 上运行效率低，完成一次完整的自博弈进化实验往往需要数小时甚至更久，严重拖慢验证节奏。\n- **多智能体支持缺失**：现有简单基准任务（如 CartPole）缺乏对抗机制，而复杂游戏难以直接修改为自博弈模式，无法有效评估策略的鲁棒性。\n- **云端部署困难**：大多数图形化环境依赖显示服务器，难以在无头（headless）云服务器上直接进行大规模并行训练。\n\n### 使用 slimevolleygym 后\n- **即插即用**：仅依赖 gym 和 numpy 即可运行，无需额外配置物理引擎，研究人员可在几分钟内启动实验代码。\n- **极速迭代**：在普通笔记本上即可达到每秒 1.25 万步的仿真速度，若结合 EvoJAX 加速，训练时间从小时级缩短至分钟级。\n- **原生自博弈支持**：内置对手模型可轻松替换为任意策略网络，天然支持单代理对抗、多智能体协作及持续学习场景，完美契合算法验证需求。\n- **无缝云端运行**：提供像素观察模式，直接将画面渲染为 NumPy 数组，无需图形界面即可在云端集群大规模并行训练。\n\nslimevolleygym 通过极简的架构与高效的仿真能力，将多智能体强化学习的实验门槛降至最低，让研究者能专注于算法创新而非环境调试。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhardmaru_slimevolleygym_dea89505.gif","hardmaru","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fhardmaru_e038fd9c.png","I make simple things with neural networks.",null,"Tokyo","hardmaru@gmail.com","https:\u002F\u002Fotoro.net\u002Fml\u002F","https:\u002F\u002Fgithub.com\u002Fhardmaru",[84],{"name":85,"color":86,"percentage":87},"Python","#3572A5",100,779,122,"2026-03-30T12:14:19","Apache-2.0",1,"Linux, macOS, Windows","非必需。默认版本仅需 CPU 即可运行（在 2015 年 MacBook i7 上可达 12.5K steps\u002Fs）。若需硬件加速训练，可配合 EvoJAX 工具包在 GPU 上运行。","未说明（轻量级环境，依赖仅为 gym 和 numpy）",{"notes":97,"python":98,"dependencies":99},"该环境非常轻量，核心运行仅需 'gym' 和 'numpy'。预训练模型是基于 stable-baselines v2.10 训练的，不兼容 stable-baselines3。开发时使用的 pyglet 版本为 0.15.7 或更早，新版兼容性未测试。支持无头（headless）模式下的像素观察，适用于云端机器。","未说明（兼容 Gym 0.19.0 及更早版本，Gym 0.20.0 经测试可用，更高版本可能存在 API 不兼容问题）",[100,101,102,103],"gym\u003C=0.20.0","numpy","pyglet\u003C=0.15.7","stable-baselines==2.10 (用于预训练模型)",[13,15,54],"2026-03-27T02:49:30.150509","2026-04-06T05:44:31.211010",[108,113,118,123,128,133,138,143],{"id":109,"question_zh":110,"answer_zh":111,"source_url":112},11106,"如何在 SlimeVolley 环境中使用 SLM Lab 或其他强化学习框架？","该环境是一个标准的 Gym 环境，因此理论上兼容任何支持 Gym 的框架（包括 SLM Lab 和 ChainerRL）。虽然维护者未直接测试 SLM Lab，但提供了使用 ChainerRL 的成功案例供参考。建议查阅 ChainerRL 的实现示例（如 RainbowSlimeVolley 项目）来配置 SLM Lab。","https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fissues\u002F15",{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},11107,"球穿过网（ball goes through net）是 Bug 吗？会被利用吗？","这是游戏原始设计中为了简化逻辑而保留的特性，并非意外 Bug。目前尚未发现有智能体能有效利用此漏洞获得不合理的高分。如果未来有代理发现并利用此漏洞，维护者计划先在排行榜上展示该案例，然后再通过版本更新修复它。","https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fissues\u002F8",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},11103,"为什么安装或运行时提示找不到 'cv2' 模块？","这是因为较新版本的 Gym 不再将 opencv-python (cv2) 作为默认依赖项。虽然项目代码中使用了 cv2，但 setup.py 中可能未显式声明。解决方法是手动安装 opencv：`pip install opencv-python`。维护者指出，通常安装 gym 时会附带 cv2，但如果使用新版 gym 则需单独安装。","https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fissues\u002F4",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},11104,"在 macOS Big Sur 上运行时报错 'Can't find framework OpenGL.framework' 怎么办？","这是 pyglet 在 macOS Big Sur 上的已知兼容性问题。建议参考 pyglet 官方 Issue #274 中的解决方案（通常涉及更新 pyglet 版本或调整系统配置）。维护者已在项目中添加了关于所用 pyglet 
版本的说明，建议检查并更新到兼容版本。","https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fissues\u002F10",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},11105,"使用 PPO 训练像素版（pixel）模型需要多少时间步和计算资源？","对于像素版 SlimeVolley 使用 PPO 算法，模型通常在约 2 亿（200M）步时开始收敛，在一台 96 核的机器上需要几天时间。为了捕捉细微的性能提升，完整训练可能会运行到 20 亿（2B）步。相比之下，状态观测版（state-observation）训练速度快得多，在较小的机器上几小时内即可完成。","https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fissues\u002F16",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},11108,"调用 env.close() 后渲染窗口没有关闭怎么办？","在某些环境下，调用 `env.close()` 可能不会立即关闭渲染窗口，直到脚本完全结束。这可能与具体的系统或 Gym 版本有关。维护者建议提交 PR 以改进窗口关闭逻辑，确保所有窗口在正确的时间关闭。如果遇到此问题，可以尝试手动管理窗口生命周期或等待相关修复合并。","https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fissues\u002F6",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},11109,"使用 AtariPreprocessing 包装器时出现 'no attribute np_random' 错误如何解决？","这是因为环境初始化时未正确设置随机种子。解决方法是在创建环境后立即调用 `env.seed(np.random.randint(0, 10000))`。此外，AtariPreprocessing 中依赖的某些 ALE 函数（如 getScreenGrayscale）可能不直接可用，建议使用项目中示例提供的其他包装器组合来实现相同的预处理效果。","https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fissues\u002F13",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},11110,"运行多智能体评估脚本时提示 'step() takes 2 positional arguments but 3 were given' 错误？","该错误通常由库版本不匹配引起。`eval_agents.py` 中的多智能体逻辑依赖于特定版本的 Gym 或环境接口。维护者建议在项目的文档或 README 中查看“库版本（Library Versions）”说明，确保安装的 gym、slimevolleygym 及相关依赖版本与开发环境一致。","https:\u002F\u002Fgithub.com\u002Fhardmaru\u002Fslimevolleygym\u002Fissues\u002F14",[]]