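[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-NVIDIA-NeMo--Gym":3,"tool-NVIDIA-NeMo--Gym":62},[4,18,26,36,46,54],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw is a local AI assistant built for individuals, designed to give you a fully controllable intelligent companion on your own devices. Traditional AI assistants are confined to a particular website or app. OpenClaw instead connects directly to the messaging channels you use every day, including WeChat, WhatsApp, Telegram, Discord, iMessage, and dozens of other platforms. Whichever chat app you send a message from, OpenClaw responds instantly. It also supports voice interaction on macOS, iOS, and Android devices and renders a live canvas that you can control.\n\nThe tool addresses users' needs for data privacy, fast responses, and an always-on experience. Because the AI is deployed locally, users get fast, private assistance without relying on cloud services, which keeps them in full control of their own data. Its distinguishing technical feature is a gateway architecture that separates the control plane from the core assistant, keeping cross-platform communication smooth and extensible.\n\nOpenClaw is well suited to tech enthusiasts and developers who want to build personalized workflows. It also fits everyday users who value privacy and do not want to be locked into a single ecosystem. Anyone with basic terminal skills can deploy it through a simple command-line onboarding process; macOS, Linux, and Windows WSL2 are supported. If you long to have one that understands you",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",
{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui is a web-based interface built on Gradio that lets users easily run the powerful Stable Diffusion image generation models locally. The original models depend on the command line, are hard to get started with, and spread their features across separate tools. stable-diffusion-webui consolidates the complex AI image generation workflow into one intuitive graphical platform.\n\nEveryday creators who want to get started quickly, designers who need fine-grained control over image details, and developers and researchers who want to explore the models' full potential can all benefit. Its core strength is an extremely rich feature set. Beyond the basic modes of text-to-image, image-to-image, inpainting, and outpainting, it introduced advanced features such as attention adjustment, prompt matrices, negative prompts, and a high-resolution fix. It also bundles face restoration tools such as GFPGAN and CodeFormer, supports a range of neural-network upscalers, and lets users add capabilities through an extension system. Even on devices with limited VRAM, stable-diffusion-webui offers optimization options that put high-quality AI art within reach.",162132,"2026-04-05T11:01:52",[14,15,13],
{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code is a high-performance optimization system built for AI coding assistants such as Claude Code, Codex, and Cursor. More than a set of configuration files, it is a complete framework refined through long-term real-world use. It targets the core problems AI agents face in practical development: low efficiency, lost memory, security risks, and a lack of continuous learning.\n\nIt introduces modular skills, intuition enhancement, persistent memory, and built-in security scanning. With these, everything-claude-code substantially improves AI performance on complex tasks and helps developers build more stable, more capable production-grade AI agents. Its research-first development philosophy and token-usage optimizations make model responses faster and cheaper while defending against potential attack vectors.\n\nThe toolkit is especially suited to software developers, AI researchers, and technical teams that want to deeply customize their AI workflows. Whether you are building a large codebase or need AI help with security audits and automated testing, everything-claude-code provides solid underlying support. This open-source project won a top prize at an Anthropic hackathon. It combines multi-language support with a rich set of battle-tested hooks, helping AI truly grow into",159636,2,"2026-04-17T23:33:34",[14,13,35],"语言模型",
{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":42,"last_commit_at":43,"category_tags":44,"status":17},8272,"opencode","anomalyco\u002Fopencode","OpenCode is an open-source AI coding agent designed to fit into your development workflow like an intelligent pair-programming partner. It is not just a code-completion plugin but an agent that understands project context, plans tasks on its own, and carries out complex coding operations. It can build new features, refactor existing code, and track down hard-to-locate bugs, all through natural-language interaction. This sharply reduces the time developers spend on repetitive work and context switching.\n\nThe tool is built for software developers, engineers, and technical researchers. It especially suits professionals who want to use large language models to code more efficiently, prototype faster, or maintain legacy code. Its core strength is a fully open-source architecture. Users can audit the code, customize its behavior, and even self-host it to keep their data secure, avoiding the black-box limitations of traditional closed-source AI assistants.\n\nOpenCode offers a flexible terminal UI and a desktop app currently in beta, and supports macOS, Windows, and Linux. It works with multiple package managers, installs easily, and integrates with existing development environments. Whether you are a seasoned power user who wants full control or an independent developer looking to ship more, OpenCode provides a transparent, trustworthy",144296,1,"2026-04-16T14:50:03",[13,45],"插件",
{"id":47,"name":48,"github_repo":49,"description_zh":50,"stars":51,"difficulty_score":32,"last_commit_at":52,"category_tags":53,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI is a powerful, highly modular visual AI engine for designing and running complex Stable Diffusion image generation workflows. Instead of writing code, users work in an intuitive node-based graph interface and build custom generation pipelines by connecting functional modules.\n\nThis design addresses the complexity and limited flexibility of configuring advanced AI image workflows. Users without any programming background can combine models, tune parameters, and preview results in real time. They can handle everything from basic text-to-image to multi-step high-resolution upscaling. ComfyUI is broadly compatible. It runs on Windows, macOS, and Linux, supports NVIDIA, AMD, Intel, and Apple Silicon hardware, and was among the first to support cutting-edge models such as SDXL, Flux, and SD3.\n\nComfyUI offers strong support to several groups. These include researchers and developers exploring what the algorithms can do, as well as designers and experienced AI artists seeking maximum creative freedom. Its modular architecture lets the community keep adding new features. This makes it one of the most flexible open-source diffusion model tools, with one of the richest ecosystems, and helps users turn ideas into reality efficiently.",108322,"2026-04-10T11:39:34",[14,15,13],
{"id":55,"name":56,"github_repo":57,"description_zh":58,"stars":59,"difficulty_score":32,"last_commit_at":60,"category_tags":61,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli is an open-source AI command-line tool from Google that brings the Gemini models directly into your terminal. For developers who work mainly on the command line, it offers the shortest path from prompt to model response, with AI assistance available without switching windows.\n\nThe tool reduces the frequent context switching of day-to-day development. Users can understand, generate, and debug code and automate operations tasks from within a familiar terminal. It can query a large codebase, generate an app from a sketch, or run complex Git operations, all through natural-language instructions.\n\nIt is particularly well suited to software engineers, DevOps practitioners, and technical researchers. Highlights include a context window of up to 1 million tokens with strong reasoning, and built-in tools for Google Search, file operations, and shell command execution. It also supports MCP (Model Context Protocol), which lets users add custom integrations and connect external capabilities such as image generation. A personal Google account comes with a free usage quota, and the project is fully open source under the Apache 2.0 license, making it a practical way to work more efficiently in the terminal.",100752,"2026-04-10T01:20:03",[45,13,15,14],
{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":96,"forks":97,"last_commit_at":98,"license":99,"difficulty_score":32,"env_os":100,"env_gpu":101,"env_ram":102,"env_deps":103,"category_tags":109,"github_topics":110,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":118,"updated_at":119,"faqs":120,"releases":151},8944,"NVIDIA-NeMo\u002FGym","Gym","Build RL environments for LLM training","NeMo Gym is an open-source library from NVIDIA for building reinforcement learning (RL) environments for training large language models (LLMs). It fills the infrastructure gap developers face when creating complex RL training scenarios. It provides standardized scaffolding for quickly developing advanced scenarios such as multi-step interaction, multi-turn dialogue, and user modeling.\n\nWith NeMo Gym, researchers and engineers can independently build, test, and scale environment rollout collection without mastering every detail of the RL training loop. A key feature is that environments are decoupled from the training framework. This lets users validate environments and throughput end to end, independent of any particular training algorithm. It is also highly interoperable. It connects to mainstream training frameworks such as NeMo RL, OpenRLHF, and Unsloth, and integrates ready-made environment libraries such as Reasoning Gym. This makes it especially suitable for reinforcement learning from verifiable rewards (RLVR).\n\nThe tool is aimed mainly at AI researchers, LLM algorithm engineers, and systems developers. It is still in early development and its APIs are evolving. Even so, it gives teams working to accelerate LLM alignment and reasoning a flexible, efficient infrastructure. It runs on a standard development machine, lowering the barrier to building high-performance RL environments.","# NeMo Gym\n\n**[Requirements](#-requirements)** • **[Quick Start](#-quick-start)** • **[Available Environments](#-available-environments)** • **[Documentation & Resources](#-documentation--resources)** • **[Community & Support](#-community--support)** • **[Citations](#-citations)**\n\nNeMo Gym is a library for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.\n\n## 🏆 Why NeMo Gym?\n\n- Scaffolding and patterns to accelerate environment development: multi-step, multi-turn, and user modeling scenarios\n- Contribute environments without expert knowledge of the entire RL training loop\n- Test environments and throughput end-to-end, independent of the RL training loop\n- Interoperable with existing environments, systems, and RL training frameworks\n- Growing collection of training environments and datasets for Reinforcement Learning from Verifiable Rewards (RLVR)\n\n> [!IMPORTANT]\n> NeMo Gym is currently in early development. You should expect evolving APIs, incomplete documentation, and occasional bugs. We welcome contributions and feedback; for any changes, please open an issue first to kick off a discussion!\n\n## 🔗 Ecosystem\n\nNeMo Gym is part of [NVIDIA NeMo](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fabout\u002Fecosystem.html#related-nemo-libraries), NVIDIA's GPU-accelerated platform for building and training generative AI models. 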
NeMo Gym integrates with a growing number of RL training frameworks and environment libraries; see the [Ecosystem](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fabout\u002Fecosystem.html) page for full details and tutorials.\n\n**Training Frameworks:** [NeMo RL](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Ftraining-tutorials\u002Fnemo-rl-grpo\u002Findex.html) • [OpenRLHF](https:\u002F\u002Fgithub.com\u002FOpenRLHF\u002FOpenRLHF\u002Fblob\u002Fmain\u002Fexamples\u002Fpython\u002Fagent_func_nemogym_executor.py) • [Unsloth](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Ftraining-tutorials\u002Funsloth-training.html) • [more →](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fabout\u002Fecosystem.html#training-framework-integrations)\n\n**Environment Libraries:** [Reasoning Gym](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Ftree\u002Fmain\u002Fresources_servers\u002Freasoning_gym) • [Aviary](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Ftree\u002Fmain\u002Fresources_servers\u002Faviary) • [more →](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fabout\u002Fecosystem.html#environment-library-integrations)\n\n## 📋 Requirements\n\nNeMo Gym is designed to run on standard development machines:\n\n| Hardware Requirements | Software Requirements |\n| --------------------- | --------------------- |\n| **GPU**: Not required for NeMo Gym library operation\u003Cbr>• GPU may be needed for specific resources servers or model inference (see individual server documentation) | **Operating System**:\u003Cbr>• Linux (Ubuntu 20.04+, or equivalent)\u003Cbr>• macOS (11.0+ for x86_64, 12.0+ for Apple Silicon)\u003Cbr>• Windows (via WSL2) |\n| **CPU**: Any modern x86_64 or ARM64 processor (e.g., Intel, AMD, Apple Silicon) | **Python**: 3.12 or higher |\n| **RAM**: Minimum 8 GB (16 GB+ recommended for larger environments) | **Git**: For cloning the 
repository |\n| **Storage**: Minimum 5 GB free disk space for installation and basic usage | **Internet Connection**: Required for downloading dependencies and API access |\n\n**Additional Requirements**\n\n- **API Keys**: OpenAI API key with available credits (for the quickstart examples)\n  - Other model providers are also supported (Azure OpenAI, self-hosted models via vLLM)\n- **Ray**: Automatically installed as a dependency (no separate setup required)\n\n## 🚀 Quick Start\n\nInstall NeMo Gym, start the servers, and collect your first verified rollouts for RL training.\n\n### Setup\n```bash\n# Clone the repository\ngit clone git@github.com:NVIDIA-NeMo\u002FGym.git\ncd Gym\n\n# Install uv (Python package manager)\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\nsource $HOME\u002F.local\u002Fbin\u002Fenv\n\n# Create virtual environment\nuv venv --python 3.12\nsource .venv\u002Fbin\u002Factivate\n\n# Install NeMo Gym\nuv sync --extra dev --group docs\n```\n\n### Configure Your API Key\nCreate an `env.yaml` file that contains your OpenAI API key and the [policy model](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fabout\u002Fconcepts\u002Fkey-terminology.html#term-Policy-Model) you want to use. Replace `your-openai-api-key` with your actual key. This file helps keep your secrets out of version control while still making them available to NeMo Gym.\n\n```bash\necho \"policy_base_url: https:\u002F\u002Fapi.openai.com\u002Fv1\npolicy_api_key: your-openai-api-key\npolicy_model_name: gpt-4.1-2025-04-14\" > env.yaml\n```\n\n> [!NOTE]\n> We use GPT-4.1 in this quickstart because it provides low latency (no reasoning step) and works reliably out of the box. NeMo Gym is **not limited to OpenAI models**—you can use self-hosted models via vLLM or any OpenAI-compatible inference server. 
See the [documentation](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fget-started\u002Fdetailed-setup.html) for details.\n\n### Start Servers\n\n**Terminal 1 (start servers)**:\n```bash\n# Start servers (this will keep running)\nconfig_paths=\"resources_servers\u002Fexample_single_tool_call\u002Fconfigs\u002Fexample_single_tool_call.yaml,\\\nresponses_api_models\u002Fopenai_model\u002Fconfigs\u002Fopenai_model.yaml\"\nng_run \"+config_paths=[${config_paths}]\"\n```\n\n**Terminal 2 (interact with agent)**:\n```bash\n# In a NEW terminal, activate environment\nsource .venv\u002Fbin\u002Factivate\n\n# Interact with your agent\npython responses_api_agents\u002Fsimple_agent\u002Fclient.py\n```\n\n### Collect Rollouts\n\n**Terminal 2** (keep servers running in Terminal 1):\n```bash\n# Create a simple dataset with one query\necho '{\"responses_create_params\":{\"input\":[{\"role\":\"developer\",\"content\":\"You are a helpful assistant.\"},{\"role\":\"user\",\"content\":\"What is the weather in Seattle?\"}]}}' > weather_query.jsonl\n\n# Collect verified rollouts\nng_collect_rollouts \\\n    +agent_name=example_single_tool_call_simple_agent \\\n    +input_jsonl_fpath=weather_query.jsonl \\\n    +output_jsonl_fpath=weather_rollouts.jsonl\n\n# View the result (JSONL: one JSON object per line)\npython -m json.tool --json-lines weather_rollouts.jsonl\n```\nThis generates training data with verification scores!\n\n### Clean Up Servers\n\nIn **Terminal 1** (where the servers are running), press Ctrl+C to stop the `ng_run` process.\n\n### Next Steps\n\nNow that you can generate rollouts, choose your path:\n\n- **Start training** — Train models using NeMo Gym with your preferred RL framework. 
See the [Training Tutorials](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Ftraining-tutorials\u002Findex.html).\n\n- **Use an existing environment** — Browse the [Available Environments](#-available-environments) below to find an environment that matches your goals.\n\n- **Build a custom environment** — Implement or integrate existing tools and define task verification logic. Get started with the [Creating a Training Environment](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fenvironment-tutorials\u002Fcreating-training-environment.html) tutorial.\n\n\n## 📦 Available Environments\n\nNeMo Gym includes a curated collection of environments for training and evaluation across multiple domains:\n\n### Example Environment Patterns\n\nPurpose: Demonstrate NeMo Gym patterns and concepts.\n\n\u003C!-- START_EXAMPLE_ONLY_SERVERS_TABLE -->\n| Name               | Demonstrates                         | Config                                                                                                                             | README                                                                      |\n| ------------------ | ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |\n| Multi Step         | Multi-step tool calling              | \u003Ca href='resources_servers\u002Fexample_multi_step\u002Fconfigs\u002Fexample_multi_step.yaml'>example_multi_step.yaml\u003C\u002Fa>                         | \u003Ca href='resources_servers\u002Fexample_multi_step\u002FREADME.md'>README\u003C\u002Fa>         |\n| Session State Mgmt | Session state management (in-memory) | \u003Ca href='resources_servers\u002Fexample_session_state_mgmt\u002Fconfigs\u002Fexample_session_state_mgmt.yaml'>example_session_state_mgmt.yaml\u003C\u002Fa> | \u003Ca 
href='resources_servers\u002Fexample_session_state_mgmt\u002FREADME.md'>README\u003C\u002Fa> |\n| Single Tool Call   | Basic single-step tool calling       | \u003Ca href='resources_servers\u002Fexample_single_tool_call\u002Fconfigs\u002Fexample_single_tool_call.yaml'>example_single_tool_call.yaml\u003C\u002Fa>       | \u003Ca href='resources_servers\u002Fexample_single_tool_call\u002FREADME.md'>README\u003C\u002Fa>   |\n\u003C!-- END_EXAMPLE_ONLY_SERVERS_TABLE -->\n\n### Environments for Training & Evaluation\n\nPurpose: Training-ready environments with curated datasets.\n\nEach resources server includes example data, configuration files, and tests. See each server's README for details.\n\nThe Dataset column links to publicly available datasets (e.g., on Hugging Face). A `-` means that the train\u002Fvalidation data either has not been publicly released yet or is procedurally generated by a provided script. If no data has been released, you can generate new data or use the environment as a reference. 
Each server includes 5 example tasks in `data\u002Fexample.jsonl`.\n\n\u003C!-- START_TRAINING_SERVERS_TABLE -->\n| Resources Server                              | Domain                | Description                                                                                                                                                                                                                  | Value                                                                                                                        | Train | Validation | License                                                   | Config                                                                                                                                                                                                                      | Dataset                                                                                                                                                        |\n| --------------------------------------------- | --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- | ----- | ---------- | --------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| Aalcr                                         | other                 | -                                 
                                                                                                                                                                                           | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Faalcr\u002Fconfigs\u002Faalcr.yaml'>aalcr.yaml\u003C\u002Fa>                                                                                                                                                         | -                                                                                                                                                              |\n| Abstention                                    | rlhf                  | Train models to abstain when unsure using three-tier reward on HotPotQA with LLM judge                                                                                                                                       | Improve calibration by rewarding abstention over incorrect answers                                                           | ✓     | ✓          | Creative Commons Attribution-ShareAlike 4.0 International | \u003Ca href='resources_servers\u002Fabstention\u002Fconfigs\u002Fabstention.yaml'>abstention.yaml\u003C\u002Fa>                                                                                                                                          | -                                                                                                                                                              |\n| Arc Agi                                       | knowledge             | Solve puzzles designed to test intelligence. See https:\u002F\u002Farcprize.org\u002Farc-agi.                                                                                                       
                                        | Improve puzzle-solving capabilities.                                                                                         | -     | ✓          | -                                                         | \u003Ca href='resources_servers\u002Farc_agi\u002Fconfigs\u002Farc_agi.yaml'>arc_agi.yaml\u003C\u002Fa>                                                                                                                                                   | -                                                                                                                                                              |\n| Aviary                                        | agent                 | Multi-hop question answering on the HotPotQA dataset with Wikipedia search                                                                                                                                                   | Improve knowledge and agentic capability                                                                                     | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Faviary\u002Fconfigs\u002Fhotpotqa_aviary.yaml'>hotpotqa_aviary.yaml\u003C\u002Fa>                                                                                                                                    | -                                                                                                                                                              |\n| Aviary                                        | math                  | GSM8k benchmark with calculator tool                                                                                                                                                                                         | Test math and agentic capability                                                                                        
     | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Faviary\u002Fconfigs\u002Fgsm8k_aviary.yaml'>gsm8k_aviary.yaml\u003C\u002Fa>                                                                                                                                          | -                                                                                                                                                              |\n| Calendar                                      | agent                 | Multi-turn calendar scheduling dataset. User states events and constraints in natural language; model schedules events to satisfy all constraints.                                                                           | Improve multi-turn instruction following capabilities                                                                        | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fcalendar\u002Fconfigs\u002Fcalendar.yaml'>calendar.yaml\u003C\u002Fa>                                                                                                                                                | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-agent-calendar_scheduling'>Nemotron-RL-agent-calendar_scheduling\u003C\u002Fa>                               |\n| Calendar                                      | agent                 | Multi-turn calendar scheduling dataset. User states events and constraints in natural language; model schedules events to satisfy all constraints.                                                                           
| Improve multi-turn instruction following capabilities                                                                        | ✓     | ✓          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Fcalendar\u002Fconfigs\u002Fcalendar_v2.yaml'>calendar_v2.yaml\u003C\u002Fa>                                                                                                                                          | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-Instruction-Following-Calendar-v2'>Nemotron-RL-Instruction-Following-Calendar-v2\u003C\u002Fa>               |\n| Circle Click                                  | other                 | Click on circles in images                                                                                                                                                                                                   | Improve visual grounding and spatial reasoning                                                                               | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fcircle_click\u002Fconfigs\u002Fcircle_click.yaml'>circle_click.yaml\u003C\u002Fa>                                                                                                                                    | -                                                                                                                                                              |\n| Circle Count                                  | other                 | Count circles of a given color in images                                                                                                                                                                                     | Improve visual counting and color recognition                                                                           
     | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fcircle_count\u002Fconfigs\u002Fcircle_count.yaml'>circle_count.yaml\u003C\u002Fa>                                                                                                                                    | -                                                                                                                                                              |\n| Code Gen                                      | coding                | Model must submit the right code to solve a problem                                                                                                                                                                          | Improve competitive coding capabilities                                                                                      | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fcode_gen\u002Fconfigs\u002Fcode_gen.yaml'>code_gen.yaml\u003C\u002Fa>                                                                                                                                                | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002Fnemotron-RL-coding-competitive_coding'>nemotron-RL-coding-competitive_coding\u003C\u002Fa>                               |\n| Competitive Coding Challenges                 | coding                | Execution of competitive programming competition questions                                                                                                                                                                   | Improve competitive coding capabilities on contest-style problems                                                            | -     | -          | -                                                         | \u003Ca 
href='resources_servers\u002Fcompetitive_coding_challenges\u002Fconfigs\u002Fcompetitive_coding_challenges.yaml'>competitive_coding_challenges.yaml\u003C\u002Fa>                                                                                 | -                                                                                                                                                              |\n| Cvdp                                          | coding                | CVDP benchmark dataset for code generation                                                                                                                                                                                   | Evaluate RTL code generation capabilities                                                                                    | -     | ✓          | -                                                         | \u003Ca href='resources_servers\u002Fcvdp\u002Fconfigs\u002Fcvdp.yaml'>cvdp.yaml\u003C\u002Fa>                                                                                                                                                            | -                                                                                                                                                              |\n| Equivalence Llm Judge                         | agent                 | Short bash command generation questions with LLM-as-a-judge                                                                                                                                                                  | Improve foundational bash and IF capabilities                                                                                | ✓     | ✓          | GNU General Public License v3.0                           | \u003Ca href='resources_servers\u002Fequivalence_llm_judge\u002Fconfigs\u002Fnl2bash-equivalency.yaml'>nl2bash-equivalency.yaml\u003C\u002Fa>                             
                                                                                 | -                                                                                                                                                              |\n| Equivalence Llm Judge                         | knowledge             | Short answer questions with LLM-as-a-judge                                                                                                                                                                                   | Improve knowledge-related benchmarks like GPQA \u002F HLE                                                                         | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fequivalence_llm_judge\u002Fconfigs\u002Fequivalence_llm_judge.yaml'>equivalence_llm_judge.yaml\u003C\u002Fa>                                                                                                         | -                                                                                                                                                              |\n| Ether0                                        | knowledge             | ether0 chemistry benchmark verifiers                                                                                                                                                                                         | Evaluate chemistry knowledge and reasoning with the ether0 benchmark                                                         | -     | ✓          | -                                                         | \u003Ca href='resources_servers\u002Fether0\u002Fconfigs\u002Fether0.yaml'>ether0.yaml\u003C\u002Fa>                                                                                                                                                      | -                                                                          
                                                                                    |\n| Finance Sec Search                            | agent                 | SEC EDGAR filing search for financial analysis questions                                                                                                                                                                     | Enable LLMs to search and analyze SEC filings                                                                                | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Ffinance_sec_search\u002Fconfigs\u002Ffinance_sec_search.yaml'>finance_sec_search.yaml\u003C\u002Fa>                                                                                                                  | -                                                                                                                                                              |\n| Format Verification                           | instruction_following | Verify citation\u002Freference markers in model responses via string matching                                                                                                                                                     | Improve instruction following for citation format adherence                                                                  | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fformat_verification\u002Fconfigs\u002Fcitation_format.yaml'>citation_format.yaml\u003C\u002Fa>                                                                                                                       | -                                                                                                                                                              |\n| Format Verification                           | 
instruction_following | Verify freeform text formatting (bullets, headings, tables, etc.) via regex patterns                                                                                                                                         | Improve instruction following for text formatting constraints                                                                | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fformat_verification\u002Fconfigs\u002Ffreeform_formatting.yaml'>freeform_formatting.yaml\u003C\u002Fa>                                                                                                               | -                                                                                                                                                              |\n| Genrm Compare                                 | rlhf                  | GenRM pairwise comparison for RLHF training                                                                                                                                                                                  | Compare multiple candidate responses using GenRM model                                                                       | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fgenrm_compare\u002Fconfigs\u002Fgenrm_compare.yaml'>genrm_compare.yaml\u003C\u002Fa>                                                                                                                                 | -                                                                                                                                                              |\n| Google Search                                 | agent                 | Multi-choice question answering problems with search tools integrated                                                                     
                                                                                   | Improve knowledge-related benchmarks with search tools                                                                       | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fgoogle_search\u002Fconfigs\u002Fgoogle_search.yaml'>google_search.yaml\u003C\u002Fa>                                                                                                                                 | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-knowledge-web_search-mcqa'>Nemotron-RL-knowledge-web_search-mcqa\u003C\u002Fa>                               |\n| Gpqa Diamond                                  | knowledge             | GPQA Diamond multiple-choice question answering problems                                                                                                                                                                     | Evaluate graduate-level scientific reasoning via MCQ verification                                                            | ✓     | -          | MIT                                                       | \u003Ca href='resources_servers\u002Fgpqa_diamond\u002Fconfigs\u002Fgpqa_diamond.yaml'>gpqa_diamond.yaml\u003C\u002Fa>                                                                                                                                    | -                                                                                                                                                              |\n| Ifbench                                       | instruction_following | IFBench instruction following evaluation using AllenAI's IFBench library (57 instruction types)                                                                                                                              | Improve IFBench instruction 
following                                                                                        | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fifbench\u002Fconfigs\u002Fifbench.yaml'>ifbench.yaml\u003C\u002Fa>                                                                                                                                                   | -                                                                                                                                                              |\n| Indirect Prompt Injection                     | safety                | Indirect prompt injection resistance for multi-domain tool-use agents                                                                                                                                                        | Improve agentic security by teaching robustness against tool outputs containing malicious instructions                       | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Findirect_prompt_injection\u002Fconfigs\u002Findirect_prompt_injection.yaml'>indirect_prompt_injection.yaml\u003C\u002Fa>                                                                                             | -                                                                                                                                                              |\n| Instruction Following                         | instruction_following | Instruction following datasets targeting IFEval and IFBench style instruction following capabilities                                                                                                                         | Improve IFEval and IFBench                                                                                                   | ✓     | -          | Apache 2.0                                
                | \u003Ca href='resources_servers\u002Finstruction_following\u002Fconfigs\u002Finstruction_following.yaml'>instruction_following.yaml\u003C\u002Fa>                                                                                                         | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-instruction_following'>Nemotron-RL-instruction_following\u003C\u002Fa>                                       |\n| Jailbreak Detection                           | safety                | Jailbreak detection with Nemotron judge + combined reward                                                                                                                                                                    | Improve Jailbreak Robustness and Safety\u002FSecurity Behavior Guide Enforcement                                                  | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fjailbreak_detection\u002Fconfigs\u002Fjailbreak_detection_nemotron_combined_reward_tp8.yaml'>jailbreak_detection_nemotron_combined_reward_tp8.yaml\u003C\u002Fa>                                                     | -                                                                                                                                                              |\n| Labbench2 Vlm                                 | knowledge             | labbench2 VLM benchmarks: scientific figure\u002Ftable QA (figqa2, tableqa2) with LLM-as-judge                                                                                                                                    | Measure VLM scientific reasoning on figures and tables                                                                       | -     | ✓          | -                                                         | \u003Ca 
href='resources_servers\u002Flabbench2_vlm\u002Fconfigs\u002Flabbench2_vlm.yaml'>labbench2_vlm.yaml\u003C\u002Fa>                                                                                                                                 | -                                                                                                                                                              |\n| Math Advanced Calculations                    | agent                 | An instruction following math environment with counter-intuitive calculators                                                                                                                                                 | Improve instruction following capabilities in specific math environments                                                     | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fmath_advanced_calculations\u002Fconfigs\u002Fmath_advanced_calculations.yaml'>math_advanced_calculations.yaml\u003C\u002Fa>                                                                                          | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-math-advanced_calculations'>Nemotron-RL-math-advanced_calculations\u003C\u002Fa>                             |\n| Math Formal Lean                              | math                  | Lean4 formal proof verification environment                                                                                                                                                                                  | Improve formal theorem proving capabilities                                                                                  | ✓     | -          | Apache 2.0                                                | \u003Ca 
href='resources_servers\u002Fmath_formal_lean\u002Fconfigs\u002Fnemotron_clean_easy.yaml'>nemotron_clean_easy.yaml\u003C\u002Fa>                                                                                                                  | -                                                                                                                                                              |\n| Math Formal Lean                              | math                  | Lean4 formal proof verification environment                                                                                                                                                                                  | Improve formal theorem proving capabilities                                                                                  | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fmath_formal_lean\u002Fconfigs\u002Fnemotron_first_try_hard.yaml'>nemotron_first_try_hard.yaml\u003C\u002Fa>                                                                                                          | -                                                                                                                                                              |\n| Math Formal Lean                              | math                  | Lean4 formal proof verification environment                                                                                                                                                                                  | Improve formal theorem proving capabilities                                                                                  | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fmath_formal_lean\u002Fconfigs\u002Fnemotron_medium_500.yaml'>nemotron_medium_500.yaml\u003C\u002Fa>                                  
                                                                                | -                                                                                                                                                              |\n| Math Formal Lean                              | math                  | Lean4 formal proof verification environment                                                                                                                                                                                  | Improve formal theorem proving capabilities                                                                                  | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fmath_formal_lean\u002Fconfigs\u002Fnemotron_very_easy.yaml'>nemotron_very_easy.yaml\u003C\u002Fa>                                                                                                                    | -                                                                                                                                                              |\n| Math Formal Lean                              | math                  | Lean4 formal proof verification environment                                                                                                                                                                                  | Improve formal theorem proving capabilities                                                                                  | ✓     | -          | MIT                                                       | \u003Ca href='resources_servers\u002Fmath_formal_lean\u002Fconfigs\u002Fmath_formal_lean.yaml'>math_formal_lean.yaml\u003C\u002Fa>                                                                                                                        | -                                                                               
                                                                               |\n| Math Formal Lean                              | math                  | Lean4 formal proof verification environment with multi-turn self-correction                                                                                                                                                  | Improve formal theorem proving capabilities                                                                                  | ✓     | -          | MIT                                                       | \u003Ca href='resources_servers\u002Fmath_formal_lean\u002Fconfigs\u002Fmath_formal_lean_multi_turn.yaml'>math_formal_lean_multi_turn.yaml\u003C\u002Fa>                                                                                                  | -                                                                                                                                                              |\n| Math With Code                                | math                  | Model solves competitive math problems using simple calculator tools                                                                                                                                                         | Improve math and simple tool use capabilities                                                                                | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fmath_with_code\u002Fconfigs\u002Fmath_with_code.yaml'>math_with_code.yaml\u003C\u002Fa>                                                                                                                              | -                                                                                                                                                              |\n| Math With Judge                               | math                  | 
DAPO17k math dataset with math-verify                                                                                                                                                                                        | Improve math capabilities including AIME 24 \u002F 25                                                                             | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fmath_with_judge\u002Fconfigs\u002Fdapo17k.yaml'>dapo17k.yaml\u003C\u002Fa>                                                                                                                                           | -                                                                                                                                                              |\n| Math With Judge                               | math                  | MathStackOverflow math dataset with math-verify                                                                                                                                                                              | Improve math capabilities including AIME 24 \u002F 25                                                                             | ✓     | ✓          | Creative Commons Attribution-ShareAlike 4.0 International | \u003Ca href='resources_servers\u002Fmath_with_judge\u002Fconfigs\u002Fmath_stack_overflow.yaml'>math_stack_overflow.yaml\u003C\u002Fa>                                                                                                                   | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-math-stack_overflow'>Nemotron-RL-math-stack_overflow\u003C\u002Fa>                                           |\n| Math With Judge                               | math                  | OpenMathReasoning math dataset with math-verify and LLM-as-a-judge                                              
                                                                                                             | Improve math capabilities including AIME 24 \u002F 25                                                                             | ✓     | ✓          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Fmath_with_judge\u002Fconfigs\u002Fmath_with_judge.yaml'>math_with_judge.yaml\u003C\u002Fa>                                                                                                                           | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-math-OpenMathReasoning'>Nemotron-RL-math-OpenMathReasoning\u003C\u002Fa>                                     |\n| Mcqa                                          | knowledge             | Multi-choice question answering problems                                                                                                                                                                                     | Improve benchmarks like MMLU \u002F GPQA \u002F HLE                                                                                    | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fmcqa\u002Fconfigs\u002Fmcqa.yaml'>mcqa.yaml\u003C\u002Fa>                                                                                                                                                            | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-knowledge-mcqa'>Nemotron-RL-knowledge-mcqa\u003C\u002Fa>                                                     |\n| Multichallenge                                | knowledge             | Targets inference memory, instruction retention, version editing, and self-coherence.                                                                                              
                                          | Improve complex multi-turn conversational capability                                                                         | ✓     | -          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Fmultichallenge\u002Fconfigs\u002Fmultichallenge_nrl.yaml'>multichallenge_nrl.yaml\u003C\u002Fa>                                                                                                                      | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-Instruction-Following-MultiTurnChat-v1'>Nemotron-RL-Instruction-Following-MultiTurnChat-v1\u003C\u002Fa>     |\n| Newton Bench                                  | math                  | Scientific law discovery tasks through agentic experimentation across 12 physics domains                                                                                                                                     | Improve science, reasoning, and tool use capabilities                                                                        | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fnewton_bench\u002Fconfigs\u002Fnewton_bench.yaml'>newton_bench.yaml\u003C\u002Fa>                                                                                                                                    | -                                                                                                                                                              |\n| Ns Tools                                      | agent                 | NeMo Skills tool execution with math verification                                                                                                                                                                            | -                                                                             
                                               | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fns_tools\u002Fconfigs\u002Fns_tools.yaml'>ns_tools.yaml\u003C\u002Fa>                                                                                                                                                | -                                                                                                                                                              |\n| Nvarc                                         | knowledge             | ARC-AGI inductive mode: model outputs Python code with transform()                                                                                                                                                           | Improve ARC-AGI puzzle-solving by inducing executable transformation programs                                                | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fnvarc\u002Fconfigs\u002Finductive.yaml'>inductive.yaml\u003C\u002Fa>                                                                                                                                                 | -                                                                                                                                                              |\n| Nvarc                                         | knowledge             | ARC-AGI transductive mode: model outputs grid directly                                                                                                                                                                       | Improve ARC-AGI puzzle-solving by directly predicting transformed grids                                                      | ✓     | ✓          | Apache 2.0                                                | \u003Ca 
href='resources_servers\u002Fnvarc\u002Fconfigs\u002Ftransductive.yaml'>transductive.yaml\u003C\u002Fa>                                                                                                                                           | -                                                                                                                                                              |\n| Openenv                                       | agent                 | Echo environment via OpenEnv (MCP). Echoes messages back with length-based rewards.                                                                                                                                          | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fopenenv\u002Fconfigs\u002Fopenenv_echo.yaml'>openenv_echo.yaml\u003C\u002Fa>                                                                                                                                         | -                                                                                                                                                              |\n| Openenv                                       | coding                | Python code execution environment via OpenEnv. Executes code and returns stdout\u002Fstderr.                                                                                                                                      
| -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fopenenv\u002Fconfigs\u002Fopenenv_coding.yaml'>openenv_coding.yaml\u003C\u002Fa>                                                                                                                                     | -                                                                                                                                                              |\n| Openenv                                       | games                 | Maze navigation environment via OpenEnv. Agent navigates an 8x8 grid to find the exit.                                                                                                                                       | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fopenenv\u002Fconfigs\u002Fopenenv_maze.yaml'>openenv_maze.yaml\u003C\u002Fa>                                                                                                                                         | -                                                                                                                                                              |\n| Over Refusal Detection                        |                       | -                                                                                                                                                                                                                            | -                                                                                                                            | ✓     | -          | TBD         
                                              | \u003Ca href='resources_servers\u002Fover_refusal_detection\u002Fconfigs\u002Fover_refusal_detection.yaml'>over_refusal_detection.yaml\u003C\u002Fa>                                                                                                      | -                                                                                                                                                              |\n| Proof Genselect                               | math                  | Pairwise proof selection with binary correctness reward                                                                                                                                                                      | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fproof_genselect\u002Fconfigs\u002Fproof_genselect.yaml'>proof_genselect.yaml\u003C\u002Fa>                                                                                                                           | -                                                                                                                                                              |\n| Proof Judge                                   | math                  | Theorem proving with verifier + meta-verifier judge (combined env)                                                                                                                                                           | -                                                                                                                            | -     | -          | -                                                         | \u003Ca 
href='resources_servers\u002Fproof_judge\u002Fconfigs\u002Fproof_judge.yaml'>proof_judge.yaml\u003C\u002Fa>                                                                                                                                       | -                                                                                                                                                              |\n| Proof Verification                            | math                  | Proof verification scored against ground truth and meta-verifier agreement                                                                                                                                                   | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fproof_verification\u002Fconfigs\u002Fproof_verification.yaml'>proof_verification.yaml\u003C\u002Fa>                                                                                                                  | -                                                                                                                                                              |\n| Rdkit Chemistry                               | knowledge             | Molecular chemistry question answering: calculate properties of SMILES. Includes a mix of tool-use (python + rdkit) and no-tool-use questions.                                                                               | Improve molecular reasoning and SMILES parsing.                                                                              
| ✓     | -          | TBD                                                       | \u003Ca href='resources_servers\u002Frdkit_chemistry\u002Fconfigs\u002Frdkit_chemistry.yaml'>rdkit_chemistry.yaml\u003C\u002Fa>                                                                                                                           | -                                                                                                                                                              |\n| Reasoning Gym                                 | knowledge             | LangGraph orchestrator agent compatible with resource servers that do not use tools; enables diverse agent training data and test time scaling vs a simple agent, extensible to use tools or other agent architectures       | Iterative test time scaling for improved performance in reasoning tasks                                                      | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Freasoning_gym\u002Fconfigs\u002Forchestrator_agent.yaml'>orchestrator_agent.yaml\u003C\u002Fa>                                                                                                                       | -                                                                                                                                                              |\n| Reasoning Gym                                 | knowledge             | LangGraph parallel thinking agent compatible with resource servers that do not use tools; enables diverse agent training data and test time scaling vs a simple agent, extensible to use tools or other agent architectures  | Iterative test time scaling for improved performance in reasoning tasks                                                      | ✓     | -          | Apache 2.0                                                | \u003Ca 
href='resources_servers\u002Freasoning_gym\u002Fconfigs\u002Fparallel_thinking_agent.yaml'>parallel_thinking_agent.yaml\u003C\u002Fa>                                                                                                             | -                                                                                                                                                              |\n| Reasoning Gym                                 | knowledge             | LangGraph reflection agent compatible with resource servers that do not use tools; provides iterative reflection for diverse agent training data and test time scaling, extensible to use tools or other agent architectures | Iterative test time scaling for improved performance in reasoning tasks                                                      | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Freasoning_gym\u002Fconfigs\u002Freflection_agent.yaml'>reflection_agent.yaml\u003C\u002Fa>                                                                                                                           | -                                                                                                                                                              |\n| Reasoning Gym                                 | knowledge             | LangGraph ReWOO agent compatible with resource servers that do not use tools; enables diverse agent training data and test time scaling vs a simple agent, extensible to use tools or other agent architectures              | Iterative test time scaling for improved performance in reasoning tasks                                                      | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Freasoning_gym\u002Fconfigs\u002Frewoo_agent.yaml'>rewoo_agent.yaml\u003C\u002Fa>                                                     
                                                                                | -                                                                                                                                                              |\n| Reasoning Gym                                 | knowledge             | Over 100 tasks including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and many common games.                                                                                                  | Improve robustness, generalization, broad knowledge and reasoning                                                            | ✓     | -          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Freasoning_gym\u002Fconfigs\u002Freasoning_gym.yaml'>reasoning_gym.yaml\u003C\u002Fa>                                                                                                                                 | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-ReasoningGym-v1'>Nemotron-RL-ReasoningGym-v1\u003C\u002Fa>                                                   |\n| Ruler                                         | other                 | -                                                                                                                                                                                                                            | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fruler\u002Fconfigs\u002Fruler.yaml'>ruler.yaml\u003C\u002Fa>                                                                                                                                                         | -                                       
                                                                                                                       |\n| Single Step Tool Use With Argument Comparison | agent                 | Conversational tool-use RL from expert trajectories; behavior cloning per step across auth, lookup, and servicing domains.                                                                                                   | -                                                                                                                            | ✓     | ✓          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Fsingle_step_tool_use_with_argument_comparison\u002Fconfigs\u002Fsingle_step_tool_use_with_argument_comparison.yaml'>single_step_tool_use_with_argument_comparison.yaml\u003C\u002Fa>                                 | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1'>Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1\u003C\u002Fa> |\n| Single Step Tool Use With Argument Comparison | agent                 | General function-calling RL dataset using expert trajectories; behavior cloning to match expert tool calls per step.                                                                                                         
| -                                                                                                                            | ✓     | ✓          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Fsingle_step_tool_use_with_argument_comparison\u002Fconfigs\u002Ftoolcall_schema_single_step_tool_use_with_argument_comparison.yaml'>toolcall_schema_single_step_tool_use_with_argument_comparison.yaml\u003C\u002Fa> | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-Agentic-Function-Calling-Pivot-v1'>Nemotron-RL-Agentic-Function-Calling-Pivot-v1\u003C\u002Fa>               |\n| Single Step Tool Use With Argument Comparison | agent                 | GitHub-issue dataset for software-engineering agents; refactored from SWE-Gym and SWE-Bench-Verified for NeMo Gym.                                                                                                           | -                                                                                                                            | ✓     | ✓          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Fsingle_step_tool_use_with_argument_comparison\u002Fconfigs\u002Fswe_pivot_single_step_tool_use_with_argument_comparison.yaml'>swe_pivot_single_step_tool_use_with_argument_comparison.yaml\u003C\u002Fa>             | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-Agentic-SWE-Pivot-v1'>Nemotron-RL-Agentic-SWE-Pivot-v1\u003C\u002Fa>                                         |\n| Single Step Tool Use With Argument Comparison | agent                 | The model must output the next correct call in a given trajectory involving search tools.                                                                                                                                    | Improve agentic search capability.                                              
                                             | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fsingle_step_tool_use_with_argument_comparison\u002Fconfigs\u002Fsearch_pivot_single_step_tool_use_with_argument_comparison.yaml'>search_pivot_single_step_tool_use_with_argument_comparison.yaml\u003C\u002Fa>       | -                                                                                                                                                              |\n| Spider2 Lite                                  | coding                | Text-to-SQL with execution-based evaluation on Spider 2.0-Lite (135 SQLite tasks). Binary reward based on result-set equivalence.                                                                                            | Improve text-to-SQL capabilities for real-world enterprise queries using execution-based binary reward without an LLM judge. | -     | ✓          | -                                                         | \u003Ca href='resources_servers\u002Fspider2_lite\u002Fconfigs\u002Fspider2_lite.yaml'>spider2_lite.yaml\u003C\u002Fa>                                                                                                                                    | -                                                                                                                                                              |\n| Structeval                                    | instruction_following | StructEval non-renderable format verification (JSON, YAML, CSV, TOML, XML)                                                                                                                                                   | Improve structured output generation quality                                                                                 | ✓     | -          | Apache 2.0                                                | \u003Ca 
href='resources_servers\u002Fstructeval\u002Fconfigs\u002Fstructeval_nonrenderable.yaml'>structeval_nonrenderable.yaml\u003C\u002Fa>                                                                                                              | -                                                                                                                                                              |\n| Structured Outputs                            | instruction_following | Check if responses are following structured output requirements in prompts                                                                                                                                                   | Improve instruction following capabilities                                                                                   | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fstructured_outputs\u002Fconfigs\u002Fstructured_outputs_json.yaml'>structured_outputs_json.yaml\u003C\u002Fa>                                                                                                        | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-instruction_following-structured_outputs'>Nemotron-RL-instruction_following-structured_outputs\u003C\u002Fa> |\n| Structured Outputs                            | instruction_following | Check if responses are following structured output requirements in prompts                                                                                                                                                   | Improve instruction following capabilities                                                                                   | ✓     | ✓          | Apache 2.0                                                | \u003Ca 
href='resources_servers\u002Fstructured_outputs\u002Fconfigs\u002Fstructured_outputs_json_yaml_xml_v1.yaml'>structured_outputs_json_yaml_xml_v1.yaml\u003C\u002Fa>                                                                                | -                                                                                                                                                              |\n| Structured Outputs                            | instruction_following | Check if responses follow structured output requirements (JSON, YAML, XML, TOML, CSV). Created 20260409.                                                                                                                     | Improve schema adherence across all structured output formats                                                                | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fstructured_outputs\u002Fconfigs\u002Fstructured_outputs_v3.yaml'>structured_outputs_v3.yaml\u003C\u002Fa>                                                                                                            | -                                                                                                                                                              |\n| Swerl Gen                                     | coding                | Running sandboxed evaluation for SWE-style tasks (either patch generation or reproduction test generation)                                                                                                                   | Improve SWE capabilities useful for benchmarks like SWE-bench                                                                | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fswerl_gen\u002Fconfigs\u002Fswerl_gen.yaml'>swerl_gen.yaml\u003C\u002Fa>                                                             
                                                                                | -                                                                                                                                                              |\n| Swerl Llm Judge                               | coding                | SWE-style multiple-choice LLM-judge tasks scored via \u003Csolution>...\u003C\u002Fsolution> choice.                                                                                                                                        | Improve SWE capabilities useful for benchmarks like SWE-bench                                                                | ✓     | ✓          | MIT                                                       | \u003Ca href='resources_servers\u002Fswerl_llm_judge\u002Fconfigs\u002Fswerl_llm_judge.yaml'>swerl_llm_judge.yaml\u003C\u002Fa>                                                                                                                           | -                                                                                                                                                              |\n| Tavily Search                                 | agent                 | Model uses search tools to satisfy a user query.                                                                                                                                                                             
| Measure agentic search capability                                                                                            | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Ftavily_search\u002Fconfigs\u002Ftavily_search_judge_vllm_model.yaml'>tavily_search_judge_vllm_model.yaml\u003C\u002Fa>                                                                                               | -                                                                                                                                                              |\n| Terminal Multi Harness                        | agent                 | Agent006 harness structured-action verifier for next-step pivot RL.                                                                                                                                                          | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fterminal_multi_harness\u002Fconfigs\u002Fterminal_multi_harness_agent006.yaml'>terminal_multi_harness_agent006.yaml\u003C\u002Fa>                                                                                    | -                                                                                                                                                              |\n| Terminal Multi Harness                        | agent                 | Codex harness structured-action verifier for next-step pivot RL.                                                                                                                                                             
| -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fterminal_multi_harness\u002Fconfigs\u002Fterminal_multi_harness_codex.yaml'>terminal_multi_harness_codex.yaml\u003C\u002Fa>                                                                                          | -                                                                                                                                                              |\n| Terminal Multi Harness                        | agent                 | OpenCode harness structured-action verifier for next-step pivot RL.                                                                                                                                                          | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fterminal_multi_harness\u002Fconfigs\u002Fterminal_multi_harness_opencode.yaml'>terminal_multi_harness_opencode.yaml\u003C\u002Fa>                                                                                    | -                                                                                                                                                              |\n| Terminus Judge                                | agent                 | single-step terminal based task (rubrics v4 judge prompt)                                                                                                                                                                    | Improve on terminal-style tasks                                                                                              | ✓     | ✓          | Apache 2.0  
                                              | \u003Ca href='resources_servers\u002Fterminus_judge\u002Fconfigs\u002Fterminus_judge.yaml'>terminus_judge.yaml\u003C\u002Fa>                                                                                                                              | -                                                                                                                                                              |\n| Terminus Judge                                | agent                 | single-step terminal based task (simple judge prompt)                                                                                                                                                                        | Improve on terminal-style tasks                                                                                              | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fterminus_judge\u002Fconfigs\u002Fterminus_judge_simple.yaml'>terminus_judge_simple.yaml\u003C\u002Fa>                                                                                                                | -                                                                                                                                                              |\n| Terminus Judge                                | agent                 | single-step terminal based task (string similarity only)                                                                                                                                                                     | Improve on terminal-style tasks                                                                                              | ✓     | ✓          | Apache 2.0                                                | \u003Ca 
href='resources_servers\u002Fterminus_judge\u002Fconfigs\u002Fterminus_judge_string_only.yaml'>terminus_judge_string_only.yaml\u003C\u002Fa>                                                                                                      | -                                                                                                                                                              |\n| Text To Sql                                   | coding                | Text-to-SQL generation with LLM-as-a-judge equivalence checking                                                                                                                                                              | Improve text-to-SQL capabilities across multiple dialects                                                                    | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Ftext_to_sql\u002Fconfigs\u002Ftext_to_sql.yaml'>text_to_sql.yaml\u003C\u002Fa>                                                                                                                                       | -                                                                                                                                                              |\n| Vlm Eval Kit                                  | other                 | -                                                                                                                                                                                                                            | Measure VLM capabilities                                                                                                     | -     | ✓          | -                                                         | \u003Ca href='resources_servers\u002Fvlm_eval_kit\u002Fconfigs\u002FMMBench_DEV_EN_V11.yaml'>MMBench_DEV_EN_V11.yaml\u003C\u002Fa>                                        
                                                                                | -                                                                                                                                                              |\n| Vlm Eval Kit                                  | other                 | -                                                                                                                                                                                                                            | Measure VLM capabilities                                                                                                     | -     | ✓          | -                                                         | \u003Ca href='resources_servers\u002Fvlm_eval_kit\u002Fconfigs\u002FOCRBench.yaml'>OCRBench.yaml\u003C\u002Fa>                                                                                                                                            | -                                                                                                                                                              |\n| Vlm Eval Kit                                  | other                 | Run all supported VLMEvalKit benchmarks.                                                                                                                                                                                     
| Measure VLM capabilities                                                                                                     | -     | ✓          | -                                                         | \u003Ca href='resources_servers\u002Fvlm_eval_kit\u002Fconfigs\u002Fvlm_eval_kit.yaml'>vlm_eval_kit.yaml\u003C\u002Fa>                                                                                                                                    | -                                                                                                                                                              |\n| Workplace Assistant                           | agent                 | Workplace assistant multi-step tool-using environment                                                                                                                                                                        | Improve multi-step tool use capability                                                                                       | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fworkplace_assistant\u002Fconfigs\u002Fworkplace_assistant.yaml'>workplace_assistant.yaml\u003C\u002Fa>                                                                                                               | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-agent-workplace_assistant'>Nemotron-RL-agent-workplace_assistant\u003C\u002Fa>                               |\n| Xlam Fc                                       | agent                 | Salesforce xlam-function-calling-60k tool calling tasks                                                                                                                                                                      | Improve tool-calling capabilities                                                                                       
     | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fxlam_fc\u002Fconfigs\u002Fxlam_fc.yaml'>xlam_fc.yaml\u003C\u002Fa>                                                                                                                                                   | -                                                                                                                                                              |\n| Xstest                                        | safety                | XSTest safety benchmark - exaggerated safety (over-refusal) evaluation                                                                                                                                                       | Evaluate model safety calibration between helpfulness and harmlessness                                                       | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fxstest\u002Fconfigs\u002Fxstest.yaml'>xstest.yaml\u003C\u002Fa>                                                                                                                                                      | -                                                                                                                                                              |\n\u003C!-- END_TRAINING_SERVERS_TABLE -->\n\n## 📖 Documentation & Resources\n\n- **[Documentation](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Findex.html)** - Technical reference docs\n- **[Training Tutorials](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Ftraining-tutorials\u002Findex.html)** - Train with NeMo Gym environments\n- **[API Reference](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fapidocs\u002Findex.html)** - Complete class and function reference\n \n\n## 🤝 Community & Support\n\nWe'd 
love your contributions! Here's how to get involved:\n\n- **[Report Issues](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fissues)** - Bug reports and feature requests\n- **[Contributing Guide](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fcontribute\u002Findex.html)** - How to contribute code, docs, new environments, or training framework integrations\n\n## 📚 Citations\n\nIf you use NeMo Gym in your research, please cite it using the following BibTeX entry:\n\n```bibtex\n@misc{nemo-gym,\n  title = {NeMo Gym: An Open Source Library for Scaling Reinforcement Learning Environments for LLM},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym}},\n  author={NVIDIA},\n  year = {2025},\n  note = {GitHub repository},\n}\n```\n","# NeMo Gym\n\n**[Requirements](#-requirements)** • **[Quick Start](#-quick-start)** • **[Available Environments](#-available-environments)** • **[Documentation & Resources](#-documentation--resources)** • **[Community & Support](#-community--support)** • **[Citations](#-citations)**\n\nNeMo Gym is a library for building reinforcement learning (RL) training environments for large language models (LLMs). It provides the infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.\n\n## 🏆 Why NeMo Gym?\n\n- Scaffolding and patterns that accelerate environment development: multi-step, multi-turn, and user modeling scenarios.\n- Contribute environments without expert knowledge of the entire RL training loop.\n- Test environments and throughput end-to-end, independent of the RL training loop.\n- Interoperable with existing environments, systems, and RL training frameworks.\n- A growing collection of training environments and datasets for Reinforcement Learning from Verifiable Reward (RLVR).\n\n> [!IMPORTANT]\n> NeMo Gym is currently in early development. Expect evolving APIs, incomplete documentation, and occasional bugs. We welcome contributions and feedback; for any changes, please open an issue first to kick off a discussion!\n\n## 🔗 Ecosystem\n\nNeMo Gym is part of NVIDIA NeMo, NVIDIA's GPU-accelerated platform for building and training generative AI models. NeMo Gym integrates with a growing number of RL training frameworks and environment libraries; see the [Ecosystem](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fabout\u002Fecosystem.html) page for full details and tutorials.\n\n**Training Frameworks:** [NeMo RL](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Ftraining-tutorials\u002Fnemo-rl-grpo\u002Findex.html) • [OpenRLHF](https:\u002F\u002Fgithub.com\u002FOpenRLHF\u002FOpenRLHF\u002Fblob\u002Fmain\u002Fexamples\u002Fpython\u002Fagent_func_nemogym_executor.py) • 
[Unsloth](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Ftraining-tutorials\u002Funsloth-training.html) • [More →](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fabout\u002Fecosystem.html#training-framework-integrations)\n\n**Environment Libraries:** [Reasoning Gym](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Ftree\u002Fmain\u002Fresources_servers\u002Freasoning_gym) • [Aviary](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Ftree\u002Fmain\u002Fresources_servers\u002Faviary) • [More →](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fabout\u002Fecosystem.html#environment-library-integrations)\n\n## 📋 Requirements\n\nNeMo Gym is designed to run on standard development machines:\n\n| Hardware Requirements | Software Requirements |\n| --------------------- | --------------------- |\n| **GPU**: Not required to run the NeMo Gym library\u003Cbr>• Specific resources servers or model inference may require GPUs (see the individual server docs) | **Operating System**:\u003Cbr>• Linux (Ubuntu 20.04+ or equivalent)\u003Cbr>• macOS (11.0+ on x86_64, 12.0+ on Apple Silicon)\u003Cbr>• Windows (via WSL2) |\n| **CPU**: Any modern x86_64 or ARM64 processor (e.g., Intel, AMD, Apple Silicon) | **Python**: 3.12 or higher |\n| **RAM**: 8 GB minimum (16 GB+ recommended for larger environments) | **Git**: For cloning the repository |\n| **Storage**: At least 5 GB of free disk space for installation and basic usage | **Internet Connection**: Required for downloading dependencies and accessing APIs |\n\n**Additional Requirements**\n\n- **API Keys**: An OpenAI API key with available credits (for the quick start examples)\n  - Other model providers are supported (Azure OpenAI, self-hosted models via vLLM)\n- **Ray**: Installed automatically as a dependency; no separate setup required.\n\n## 🚀 Quick Start\n\nInstall NeMo Gym, start the servers, and collect your first verified rollouts for RL training.\n\n### Setup\n```bash\n# Clone the repository\ngit clone git@github.com:NVIDIA-NeMo\u002FGym.git\ncd Gym\n\n# Install UV (Python package manager)\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\nsource $HOME\u002F.local\u002Fbin\u002Fenv\n\n# Create a virtual environment\nuv venv --python 3.12\nsource .venv\u002Fbin\u002Factivate\n\n# Install NeMo Gym\nuv sync --extra dev --group docs\n```\n\n### Configure Your API Keys\nCreate an `env.yaml` file that contains your OpenAI API key and the [policy model](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fabout\u002Fconcepts\u002Fkey-terminology.html#term-Policy-Model) you want to use. Replace `your-openai-api-key` 
with your actual key. This file keeps your secrets out of version control while still making them available to NeMo Gym.\n\n```bash\necho \"policy_base_url: https:\u002F\u002Fapi.openai.com\u002Fv1\npolicy_api_key: your-openai-api-key\npolicy_model_name: gpt-4.1-2025-04-14\" > env.yaml\n```\n\n> [!NOTE]\n> This quick start uses GPT-4.1 because it has low latency (no reasoning step) and works reliably out of the box. NeMo Gym is not limited to OpenAI models: you can also use self-hosted models via vLLM or any OpenAI-compatible inference server. See the [documentation](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fget-started\u002Fdetailed-setup.html) for details.\n\n### Start Servers\n\n**Terminal 1 (start servers)**:\n```bash\n# Start servers (this keeps running)\nconfig_paths=\"resources_servers\u002Fexample_single_tool_call\u002Fconfigs\u002Fexample_single_tool_call.yaml,\\\nresponses_api_models\u002Fopenai_model\u002Fconfigs\u002Fopenai_model.yaml\"\nng_run \"+config_paths=[${config_paths}]\"\n```\n\n**Terminal 2 (interact with the agent)**:\n```bash\n# In a new terminal, activate the environment\nsource .venv\u002Fbin\u002Factivate\n\n# Interact with your agent\npython responses_api_agents\u002Fsimple_agent\u002Fclient.py\n```\n\n### Collect Rollouts\n\n**Terminal 2** (keep the servers running in Terminal 1):\n```bash\n# Create a simple dataset with one query\necho '{\"responses_create_params\":{\"input\":[{\"role\":\"developer\",\"content\":\"You are a helpful assistant.\"},{\"role\":\"user\",\"content\":\"What is the weather in Seattle?\"}]}}' > weather_query.jsonl\n\n# Collect verified rollouts\nng_collect_rollouts \\\n    +agent_name=example_single_tool_call_simple_agent \\\n    +input_jsonl_fpath=weather_query.jsonl \\\n    +output_jsonl_fpath=weather_rollouts.jsonl\n\n# View the result\ncat weather_rollouts.jsonl | python -m json.tool\n```\n\nThis generates training data with verification scores!\n\n### Stop the Servers\n\nIn **Terminal 1**, where the servers are running, press Ctrl+C to stop the `ng_run` process.\n\n### Next Steps\n\nNow that you can generate rollouts, choose your path:\n\n- **Start training**: Train models using NeMo Gym with your preferred RL framework. See the [Training Tutorials](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Ftraining-tutorials\u002Findex.html).\n- **Use an existing environment**: Browse the [Available Environments](#-available-environments) below to find one that matches your goals.\n- **Build a custom environment**: Implement or integrate existing tools and define task verification logic. Get started with the 
[Creating a Training Environment](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fenvironment-tutorials\u002Fcreating-training-environment.html) tutorial.\n\n## 📦 Available Environments\n\nNeMo Gym includes a curated collection of environments for training and evaluation across multiple domains:\n\n### Example Environment Patterns\n\nPurpose: Demonstrate NeMo Gym patterns and concepts.\n\n\u003C!-- START_EXAMPLE_ONLY_SERVERS_TABLE -->\n| Name               | Demonstrates                         | Config                                                                                                                             | README                                                                      |\n| ------------------ | ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |\n| Multi Step         | Multi-step tool calling              | \u003Ca href='resources_servers\u002Fexample_multi_step\u002Fconfigs\u002Fexample_multi_step.yaml'>example_multi_step.yaml\u003C\u002Fa>                         | \u003Ca href='resources_servers\u002Fexample_multi_step\u002FREADME.md'>README\u003C\u002Fa>         |\n| Session State Mgmt | Session state management (in-memory) | \u003Ca href='resources_servers\u002Fexample_session_state_mgmt\u002Fconfigs\u002Fexample_session_state_mgmt.yaml'>example_session_state_mgmt.yaml\u003C\u002Fa> | \u003Ca href='resources_servers\u002Fexample_session_state_mgmt\u002FREADME.md'>README\u003C\u002Fa> |\n| Single Tool Call   | Basic single-step tool calling       | \u003Ca href='resources_servers\u002Fexample_single_tool_call\u002Fconfigs\u002Fexample_single_tool_call.yaml'>example_single_tool_call.yaml\u003C\u002Fa>       | \u003Ca href='resources_servers\u002Fexample_single_tool_call\u002FREADME.md'>README\u003C\u002Fa>   |\n\u003C!-- END_EXAMPLE_ONLY_SERVERS_TABLE -->\n\n### Environments for Training & Evaluation\n\nPurpose: Training-ready environments with curated datasets.\n\nEach resources server includes example data, configuration files, and tests. See each server's README for details.\n\nThe Dataset column links to the publicly available dataset (e.g., on HuggingFace). 
A dash (-) means the train\u002Fvalidation data has not been publicly released yet, or is generated programmatically by the provided scripts. Where no data has been released yet, you can generate new data or use the environment as a reference. Every server includes 5 example tasks in `data\u002Fexample.jsonl`.\n\n\u003C!-- START_TRAINING_SERVERS_TABLE -->\n| Resources Server                              | Domain                | Description                                                                                                                                                                                                                  | Value                                                                                                                        | Train | Validation | License                                                   | Config                                                                                                                                                                                                                      | Dataset                                                                                                                                                        |\n| --------------------------------------------- | --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- | ----- | ---------- | --------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| Aalcr                                         | other                 | -                            
 | - | - | - | - | \u003Ca href='resources_servers\u002Faalcr\u002Fconfigs\u002Faalcr.yaml'>aalcr.yaml\u003C\u002Fa> | - |\n| Abstention | rlhf | Trains the model to abstain when uncertain on HotPotQA, using an LLM judge with a three-tier reward | Improves calibration by rewarding abstention over incorrect answers | ✓ | ✓ | Creative Commons Attribution-ShareAlike 4.0 International | \u003Ca href='resources_servers\u002Fabstention\u002Fconfigs\u002Fabstention.yaml'>abstention.yaml\u003C\u002Fa> | - |\n| Arc Agi | knowledge | Solve puzzles designed to test intelligence. See https:\u002F\u002Farcprize.org\u002Farc-agi for details. | Improves puzzle-solving ability. 
 | - | ✓ | - | \u003Ca href='resources_servers\u002Farc_agi\u002Fconfigs\u002Farc_agi.yaml'>arc_agi.yaml\u003C\u002Fa> | - |\n| Aviary | agent | Multi-hop question answering on HotPotQA with Wikipedia search | Improves knowledge and agentic capabilities | ✓ | ✓ | Apache 2.0 | \u003Ca href='resources_servers\u002Faviary\u002Fconfigs\u002Fhotpotqa_aviary.yaml'>hotpotqa_aviary.yaml\u003C\u002Fa> | - |\n| Aviary | math | GSM8k benchmark with a calculator tool | Tests math and agentic capabilities | ✓ | ✓ | Apache 2.0 | \u003Ca href='resources_servers\u002Faviary\u002Fconfigs\u002Fgsm8k_aviary.yaml'>gsm8k_aviary.yaml\u003C\u002Fa> 
 | - |\n| Calendar | agent | Multi-turn calendar scheduling dataset. The user describes events and constraints in natural language; the model must schedule the events so that all constraints are satisfied. | Improves multi-turn instruction following | ✓ | ✓ | Apache 2.0 | \u003Ca href='resources_servers\u002Fcalendar\u002Fconfigs\u002Fcalendar.yaml'>calendar.yaml\u003C\u002Fa> | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-agent-calendar_scheduling'>Nemotron-RL-agent-calendar_scheduling\u003C\u002Fa> |\n| Calendar | agent | Multi-turn calendar scheduling dataset. The user describes events and constraints in natural language; the model must schedule the events so that all constraints are satisfied. | Improves multi-turn instruction following | ✓ | ✓ | Creative Commons Attribution 4.0 International | \u003Ca href='resources_servers\u002Fcalendar\u002Fconfigs\u002Fcalendar_v2.yaml'>calendar_v2.yaml\u003C\u002Fa> | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-Instruction-Following-Calendar-v2'>Nemotron-RL-Instruction-Following-Calendar-v2\u003C\u002Fa> |\n| Circle Click | other | Click the circles in an image 
 | Improves visual grounding and spatial reasoning | - | - | - | \u003Ca href='resources_servers\u002Fcircle_click\u002Fconfigs\u002Fcircle_click.yaml'>circle_click.yaml\u003C\u002Fa> | - |\n| Circle Count | other | Count the circles of a given color in an image | Improves visual counting and color recognition | - | - | - | \u003Ca href='resources_servers\u002Fcircle_count\u002Fconfigs\u002Fcircle_count.yaml'>circle_count.yaml\u003C\u002Fa> | - |\n| Code Gen | coding | The model must submit correct code that solves the problem | Improves competitive programming ability | ✓ | ✓ | Apache 2.0 | \u003Ca href='resources_servers\u002Fcode_gen\u002Fconfigs\u002Fcode_gen.yaml'>code_gen.yaml\u003C\u002Fa> | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002Fnemotron-RL-coding-competitive_coding'>nemotron-RL-coding-competitive_coding\u003C\u002Fa> |\n| Competitive Coding Challenges | coding | Executes solutions to competitive programming contest problems | Improves coding ability on competition-style problems | - | - | - | \u003Ca href='resources_servers\u002Fcompetitive_coding_challenges\u002Fconfigs\u002Fcompetitive_coding_challenges.yaml'>competitive_coding_challenges.yaml\u003C\u002Fa> | - |\n| Cvdp | coding | CVDP benchmark dataset for code generation | Evaluates RTL code generation | - | ✓ | - | \u003Ca href='resources_servers\u002Fcvdp\u002Fconfigs\u002Fcvdp.yaml'>cvdp.yaml\u003C\u002Fa> | - 
 |\n| Equivalence Llm Judge | agent | Short Bash command generation tasks judged by an LLM | Improves basic Bash and instruction-following capabilities | ✓ | ✓ | GNU General Public License v3.0 | \u003Ca href='resources_servers\u002Fequivalence_llm_judge\u002Fconfigs\u002Fnl2bash-equivalency.yaml'>nl2bash-equivalency.yaml\u003C\u002Fa> | - |\n| Equivalence Llm Judge | knowledge | Short-answer questions judged by an LLM | Improves knowledge benchmarks such as GPQA\u002FHLE | - | - | - | \u003Ca href='resources_servers\u002Fequivalence_llm_judge\u002Fconfigs\u002Fequivalence_llm_judge.yaml'>equivalence_llm_judge.yaml\u003C\u002Fa> | - |\n| Ether0 | knowledge | ether0 chemistry benchmark verifier 
 | Evaluates chemistry knowledge and reasoning using the ether0 benchmark | - | ✓ | - | \u003Ca href='resources_servers\u002Fether0\u002Fconfigs\u002Fether0.yaml'>ether0.yaml\u003C\u002Fa> | - |\n| Finance Sec Search | agent | SEC EDGAR filing search for financial analysis questions | Enables LLMs to search and analyze SEC filings | - | - | - | \u003Ca href='resources_servers\u002Ffinance_sec_search\u002Fconfigs\u002Ffinance_sec_search.yaml'>finance_sec_search.yaml\u003C\u002Fa> | - |\n| Format Verification | instruction following | Verifies citation markers in model output via string matching | Improves instruction following for citation-format compliance | ✓ | - | Apache 2.0 | \u003Ca href='resources_servers\u002Fformat_verification\u002Fconfigs\u002Fcitation_format.yaml'>citation_format.yaml\u003C\u002Fa> | - |\n| Format Verification | instruction following | Verifies free-form text (e.g., bullet points, headings, tables) via regex patterns | Improves instruction following for text-formatting constraints | ✓ | - | Apache 2.0 | \u003Ca href='resources_servers\u002Fformat_verification\u002Fconfigs\u002Ffreeform_formatting.yaml'>freeform_formatting.yaml\u003C\u002Fa> | - |\n| Genrm Compare | rlhf | GenRM pairwise comparison for RLHF training | Compares multiple candidate responses using a GenRM model | - | - | - | \u003Ca href='resources_servers\u002Fgenrm_compare\u002Fconfigs\u002Fgenrm_compare.yaml'>genrm_compare.yaml\u003C\u002Fa> | - 
 |\n| Google Search | agent | Multiple-choice QA with an integrated search tool | Improves knowledge benchmarks with a search tool | ✓ | - | Apache 2.0 | \u003Ca href='resources_servers\u002Fgoogle_search\u002Fconfigs\u002Fgoogle_search.yaml'>google_search.yaml\u003C\u002Fa> | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-knowledge-web_search-mcqa'>Nemotron-RL-knowledge-web_search-mcqa\u003C\u002Fa> |\n| Gpqa Diamond | knowledge | GPQA Diamond multiple-choice QA | Evaluates graduate-level scientific reasoning via MCQ verification | ✓ | - | MIT | \u003Ca href='resources_servers\u002Fgpqa_diamond\u002Fconfigs\u002Fgpqa_diamond.yaml'>gpqa_diamond.yaml\u003C\u002Fa> | - |\n| Ifbench | instruction following | Instruction-following evaluation using AllenAI's IFBench library (57 instruction types) 
 | Improves IFBench instruction following | - | - | - | \u003Ca href='resources_servers\u002Fifbench\u002Fconfigs\u002Fifbench.yaml'>ifbench.yaml\u003C\u002Fa> | - |\n| Indirect Prompt Injection | safety | Indirect prompt injection resistance for multi-domain tool-using agents | Improves agent safety by teaching robustness to tool outputs that contain malicious instructions | ✓ | ✓ | Apache 2.0 | \u003Ca href='resources_servers\u002Findirect_prompt_injection\u002Fconfigs\u002Findirect_prompt_injection.yaml'>indirect_prompt_injection.yaml\u003C\u002Fa> | - |\n| Instruction Following | instruction following | Instruction-following dataset targeting IFEval- and IFBench-style instruction following | Improves IFEval and IFBench performance | ✓ | - | Apache 2.0 | \u003Ca href='resources_servers\u002Finstruction_following\u002Fconfigs\u002Finstruction_following.yaml'>instruction_following.yaml\u003C\u002Fa> 
 | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-instruction_following'>Nemotron-RL-instruction_following\u003C\u002Fa> |\n| Jailbreak Detection | safety | Jailbreak detection using a Nemotron judge with a combined reward | Improves jailbreak robustness and adherence to safe-behavior guidelines | - | - | - | \u003Ca href='resources_servers\u002Fjailbreak_detection\u002Fconfigs\u002Fjailbreak_detection_nemotron_combined_reward_tp8.yaml'>jailbreak_detection_nemotron_combined_reward_tp8.yaml\u003C\u002Fa> | - |\n| Labbench2 Vlm | knowledge | labbench2 VLM benchmark: scientific figure and table QA (figqa2, tableqa2) with an LLM judge | Measures VLM scientific reasoning over figures and tables | - | ✓ | - | \u003Ca href='resources_servers\u002Flabbench2_vlm\u002Fconfigs\u002Flabbench2_vlm.yaml'>labbench2_vlm.yaml\u003C\u002Fa> | - |\n| Math Advanced Calculations | agent 
 | Instruction-following math environment with counterintuitive calculation tools | Improves instruction following in specialized math environments | ✓ | - | Apache 2.0 | \u003Ca href='resources_servers\u002Fmath_advanced_calculations\u002Fconfigs\u002Fmath_advanced_calculations.yaml'>math_advanced_calculations.yaml\u003C\u002Fa> | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-math-advanced_calculations'>Nemotron-RL-math-advanced_calculations\u003C\u002Fa> |\n| Math Formal Lean | math | Lean 4 formal proof verification environment | Improves formal theorem proving | ✓ | - | Apache 2.0 | \u003Ca href='resources_servers\u002Fmath_formal_lean\u002Fconfigs\u002Fnemotron_clean_easy.yaml'>nemotron_clean_easy.yaml\u003C\u002Fa> | - |\n| Math Formal Lean | math | Lean 4 formal proof verification environment | Improves formal theorem proving 
 | ✓ | - | Apache 2.0 | \u003Ca href='resources_servers\u002Fmath_formal_lean\u002Fconfigs\u002Fnemotron_first_try_hard.yaml'>nemotron_first_try_hard.yaml\u003C\u002Fa> | - |\n| Math Formal Lean | math | Lean 4 formal proof verification environment | Improves formal theorem proving | ✓ | - | Apache 2.0 | \u003Ca href='resources_servers\u002Fmath_formal_lean\u002Fconfigs\u002Fnemotron_medium_500.yaml'>nemotron_medium_500.yaml\u003C\u002Fa> | - |\n| Math Formal Lean | math | Lean 4 formal proof verification environment | Improves formal theorem proving | ✓ | - | Apache 2.0 | \u003Ca href='resources_servers\u002Fmath_formal_lean\u002Fconfigs\u002Fnemotron_very_easy.yaml'>nemotron_very_easy.yaml\u003C\u002Fa> 
 | - |\n| Math Formal Lean | math | Lean 4 formal proof verification environment | Improves formal theorem proving | ✓ | - | MIT | \u003Ca href='resources_servers\u002Fmath_formal_lean\u002Fconfigs\u002Fmath_formal_lean.yaml'>math_formal_lean.yaml\u003C\u002Fa> | - |\n| Math Formal Lean | math | Lean 4 formal proof verification environment with multi-turn self-correction | Improves formal theorem proving | ✓ | - | MIT | \u003Ca href='resources_servers\u002Fmath_formal_lean\u002Fconfigs\u002Fmath_formal_lean_multi_turn.yaml'>math_formal_lean_multi_turn.yaml\u003C\u002Fa> | - |\n| Math With Code | math | 
The model solves competition math problems using a simple calculator tool | Improves math and simple tool use | ✓ | - | Apache 2.0 | \u003Ca href='resources_servers\u002Fmath_with_code\u002Fconfigs\u002Fmath_with_code.yaml'>math_with_code.yaml\u003C\u002Fa> | - |\n| Math With Judge | math | DAPO17k math dataset with math verification | Improves math, including AIME 24\u002F25 | ✓ | ✓ | Apache 2.0 | \u003Ca href='resources_servers\u002Fmath_with_judge\u002Fconfigs\u002Fdapo17k.yaml'>dapo17k.yaml\u003C\u002Fa> | - |\n| Math With Judge | math | MathStackOverflow math dataset with math verification | Improves math, including AIME 24\u002F25 
    | ✓     | ✓          | Creative Commons Attribution-ShareAlike 4.0 International | \u003Ca href='resources_servers\u002Fmath_with_judge\u002Fconfigs\u002Fmath_stack_overflow.yaml'>math_stack_overflow.yaml\u003C\u002Fa>                                                                                                                   | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-math-stack_overflow'>Nemotron-RL-math-stack_overflow\u003C\u002Fa>                                           |\n| Math With Judge                               | math                  | OpenMathReasoning math dataset with math verification and LLM-as-a-judge                                                                                                                                     | Improve math capabilities, including AIME 24\u002F25                                                                        | ✓     | ✓          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Fmath_with_judge\u002Fconfigs\u002Fmath_with_judge.yaml'>math_with_judge.yaml\u003C\u002Fa>                                                                                                                           | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-math-OpenMathReasoning'>Nemotron-RL-math-OpenMathReasoning\u003C\u002Fa>                                     |\n| Mcqa                                          | knowledge             | Multiple-choice question answering problems                                                                                                                                                                  | Improve benchmarks such as MMLU\u002FGPQA\u002FHLE                                                                          | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fmcqa\u002Fconfigs\u002Fmcqa.yaml'>mcqa.yaml\u003C\u002Fa>                                                                                 
                                                                           | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-knowledge-mcqa'>Nemotron-RL-knowledge-mcqa\u003C\u002Fa>                                                     |\n| Multichallenge                                | knowledge             | Tests inference memory, instruction retention, versioned editing, and self-coherence.                                                                                                                       | Improve complex multi-turn conversation capabilities                                                                   | ✓     | -          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Fmultichallenge\u002Fconfigs\u002Fmultichallenge_nrl.yaml'>multichallenge_nrl.yaml\u003C\u002Fa>                                                                                                                      | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-Instruction-Following-MultiTurnChat-v1'>Nemotron-RL-Instruction-Following-MultiTurnChat-v1\u003C\u002Fa>     |\n| Newton Bench                                  | math                  | Scientific law discovery tasks across 12 physics domains via agentic experimentation                                                                                                                        | Improve science, reasoning, and tool-use capabilities                                                                  | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fnewton_bench\u002Fconfigs\u002Fnewton_bench.yaml'>newton_bench.yaml\u003C\u002Fa>                                                                                                                                    | -                                                                                                                                                              |\n| Ns Tools                                      | agent                 | 
NeMo Skills tool execution with math verification                                                                                                                                                            | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fns_tools\u002Fconfigs\u002Fns_tools.yaml'>ns_tools.yaml\u003C\u002Fa>                                                                                                                                                | -                                                                                                                                                              |\n| Nvarc                                         | knowledge             | ARC-AGI inductive mode: the model outputs Python code containing transform()                                                                                                                                | Improve ARC-AGI puzzle solving by inducing executable transformation programs                                          | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fnvarc\u002Fconfigs\u002Finductive.yaml'>inductive.yaml\u003C\u002Fa>                                                                                                                                                 | -                                                                                                                                                              |\n| Nvarc                                         | knowledge             | ARC-AGI transductive mode: the model outputs the grid directly                                                                                                                                              | Improve ARC-AGI puzzle solving by directly predicting the transformed grid                                           
  | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fnvarc\u002Fconfigs\u002Ftransductive.yaml'>transductive.yaml\u003C\u002Fa>                                                                                                                                           | -                                                                                                                                                              |\n| Openenv                                       | agent                 | Echo environment via OpenEnv (MCP). Echoes the message back unchanged; reward is based on message length.                                                                                                   | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fopenenv\u002Fconfigs\u002Fopenenv_echo.yaml'>openenv_echo.yaml\u003C\u002Fa>                                                                                                                                         | -                                                                                                                                                              |\n| Openenv                                       | coding                | Python code execution environment via OpenEnv. Executes code and returns stdout\u002Fstderr.                                                                                                               | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fopenenv\u002Fconfigs\u002Fopenenv_coding.yaml'>openenv_coding.yaml\u003C\u002Fa>                                              
                                                                                       | -                                                                                                                                                              |\n| Openenv                                       | games                 | Maze navigation environment via OpenEnv. The agent must find the exit in an 8x8 grid.                                                                                                                       | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fopenenv\u002Fconfigs\u002Fopenenv_maze.yaml'>openenv_maze.yaml\u003C\u002Fa>                                                                                                                                         | -                                                                                                                                                              |\n| Over Refusal Detection                        |                       | -                                                                                                                                                                                                                            | -                                                                                                                            | ✓     | -          | TBD                                                       | \u003Ca href='resources_servers\u002Fover_refusal_detection\u002Fconfigs\u002Fover_refusal_detection.yaml'>over_refusal_detection.yaml\u003C\u002Fa>                                                                                                      | -                                                                                                                              
                                 |\n| Proof Genselect                               | math                  | Pairwise proof selection with a binary correctness reward                                                                                                                                                   | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fproof_genselect\u002Fconfigs\u002Fproof_genselect.yaml'>proof_genselect.yaml\u003C\u002Fa>                                                                                                                           | -                                                                                                                                                              |\n| Proof Judge                                   | math                  | Theorem proving with verifier + meta-verifier judges (combined environment)                                                                                                                                 | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fproof_judge\u002Fconfigs\u002Fproof_judge.yaml'>proof_judge.yaml\u003C\u002Fa>                                                                                                                                       | -                                                                                                                                                              |\n| Proof Verification                            | math                  | Scores proofs by agreement with ground truth and the meta-verifier                                                                                       
                                 | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fproof_verification\u002Fconfigs\u002Fproof_verification.yaml'>proof_verification.yaml\u003C\u002Fa>                                                                                                                  | -                                                                                                                                                              |\n| Rdkit Chemistry                               | knowledge             | Molecular chemistry QA: computing properties of SMILES strings. Includes questions with tool use (Python + rdkit) and without tools.                                                                        | Improve molecular reasoning and SMILES parsing.                                                                        | ✓     | -          | TBD                                                       | \u003Ca href='resources_servers\u002Frdkit_chemistry\u002Fconfigs\u002Frdkit_chemistry.yaml'>rdkit_chemistry.yaml\u003C\u002Fa>                                                                                                                           | -                                                                                                                                                              |\n| Reasoning Gym                                 | knowledge             | LangGraph orchestrator agent compatible with tool-free resources servers; supports diverse agentic training data and test-time scaling, is more extensible than the simple agent, and can further integrate tools or other agent architectures | Improve reasoning-task performance via iterative test-time scaling                                                         | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Freasoning_gym\u002Fconfigs\u002Forchestrator_agent.yaml'>orchestrator_agent.yaml\u003C\u002Fa>                                                                                                                       
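| -                                                                                                                                                              |\n| Reasoning Gym                                 | knowledge             | LangGraph parallel-thinking agent compatible with tool-free resources servers; supports diverse agentic training data and test-time scaling, is more extensible than the simple agent, and can further integrate tools or other agent architectures | Improve reasoning-task performance via iterative test-time scaling                                                         | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Freasoning_gym\u002Fconfigs\u002Fparallel_thinking_agent.yaml'>parallel_thinking_agent.yaml\u003C\u002Fa>                                                                                                             | -                                                                                                                                                              |\n| Reasoning Gym                                 | knowledge             | LangGraph reflection agent compatible with tool-free resources servers; provides iterative reflection, supports diverse agentic training data and test-time scaling, and can further integrate tools or other agent architectures | Improve reasoning-task performance via iterative test-time scaling                                                         | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Freasoning_gym\u002Fconfigs\u002Freflection_agent.yaml'>reflection_agent.yaml\u003C\u002Fa>                                                                                                                           | -                                                                                                                                                              |\n| Reasoning Gym                                 | knowledge             | LangGraph ReWOO agent compatible with tool-free resources servers; supports diverse agentic training data and test-time scaling, is more extensible than the simple agent, and can further integrate tools or other agent architectures | Improve reasoning-task performance via iterative test-time scaling                                                         | ✓     | -          | Apache 2.0                                                | \u003Ca 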
href='resources_servers\u002Freasoning_gym\u002Fconfigs\u002Frewoo_agent.yaml'>rewoo_agent.yaml\u003C\u002Fa>                                                                                                                                     | -                                                                                                                                                              |\n| Reasoning Gym                                 | knowledge             | Over 100 tasks spanning algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and many common games.                                                                                  | Improve robustness, generalization, and broad knowledge and reasoning                                                      | ✓     | -          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Freasoning_gym\u002Fconfigs\u002Freasoning_gym.yaml'>reasoning_gym.yaml\u003C\u002Fa>                                                                                                                                 | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-ReasoningGym-v1'>Nemotron-RL-ReasoningGym-v1\u003C\u002Fa>                                                   |\n| Ruler                                         | other                 | -                                                                                                                                                                                                                            | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fruler\u002Fconfigs\u002Fruler.yaml'>ruler.yaml\u003C\u002Fa>                                                                                                                                                         | -                                              
                                                                                                                 |\n| Single Step Tool Use With Argument Comparison | agent                 | Conversational tool-use RL from expert trajectories; step-wise behavior cloning across domains such as authentication, lookup, and repair.                                                                 | -                                                                                                                            | ✓     | ✓          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Fsingle_step_tool_use_with_argument_comparison\u002Fconfigs\u002Fsingle_step_tool_use_with_argument_comparison.yaml'>single_step_tool_use_with_argument_comparison.yaml\u003C\u002Fa>                                 | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1'>Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1\u003C\u002Fa> |\n| Single Step Tool Use With Argument Comparison | agent                 | General function-calling RL dataset built from expert trajectories; step-wise behavior cloning to match the expert's tool calls.                                                                           | -                                                                                                                            | ✓     | ✓          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Fsingle_step_tool_use_with_argument_comparison\u002Fconfigs\u002Ftoolcall_schema_single_step_tool_use_with_argument_comparison.yaml'>toolcall_schema_single_step_tool_use_with_argument_comparison.yaml\u003C\u002Fa> | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-Agentic-Function-Calling-Pivot-v1'>Nemotron-RL-Agentic-Function-Calling-Pivot-v1\u003C\u002Fa>               |\n| Single Step Tool Use With Argument Comparison | agent                 | GitHub-issue dataset for software engineering agents; adapted from SWE-Gym and SWE-Bench-Verified for NeMo Gym.                                                  
                                                   | -                                                                                                                            | ✓     | ✓          | Creative Commons Attribution 4.0 International            | \u003Ca href='resources_servers\u002Fsingle_step_tool_use_with_argument_comparison\u002Fconfigs\u002Fswe_pivot_single_step_tool_use_with_argument_comparison.yaml'>swe_pivot_single_step_tool_use_with_argument_comparison.yaml\u003C\u002Fa>             | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-Agentic-SWE-Pivot-v1'>Nemotron-RL-Agentic-SWE-Pivot-v1\u003C\u002Fa>                                         |\n| Single Step Tool Use With Argument Comparison | agent                 | Given a trajectory involving search tools, the model must output the next correct call.                                                                                                                     | Improve agentic search capabilities.                                                                                       | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fsingle_step_tool_use_with_argument_comparison\u002Fconfigs\u002Fsearch_pivot_single_step_tool_use_with_argument_comparison.yaml'>search_pivot_single_step_tool_use_with_argument_comparison.yaml\u003C\u002Fa>       | -                                                                                                                                                              |\n| Spider2 Lite                                  | coding                | Text-to-SQL with execution-based evaluation on Spider 2.0-Lite (135 SQLite tasks). Binary reward based on result-set equivalence.                                                                          | Improve Text-to-SQL on real-world enterprise queries using a binary execution-based reward, with no LLM judge needed.       | -     | ✓          | -                                                         | \u003Ca href='resources_servers\u002Fspider2_lite\u002Fconfigs\u002Fspider2_lite.yaml'>spider2_lite.yaml\u003C\u002Fa>                  
                                                                                                                   | -                                                                                                                                                              |\n| Structeval                                    | instruction following | StructEval validation of non-renderable formats (JSON, YAML, CSV, TOML, XML)                                                                                                                                | Improve structured output generation quality                                                                           | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fstructeval\u002Fconfigs\u002Fstructeval_nonrenderable.yaml'>structeval_nonrenderable.yaml\u003C\u002Fa>                                                                                                              | -                                                                                                                                                              |\n| Structured Outputs                            | instruction following | Checks whether responses follow the structured output requirements in the prompt                                                                                                                            | Improve instruction-following capabilities                                                                             | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fstructured_outputs\u002Fconfigs\u002Fstructured_outputs_json.yaml'>structured_outputs_json.yaml\u003C\u002Fa>                                                                                                        | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-instruction_following-structured_outputs'>Nemotron-RL-instruction_following-structured_outputs\u003C\u002Fa> |\n| 
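Structured Outputs                            | instruction following | Checks whether responses follow the structured output requirements in the prompt                                                                                                                            | Improve instruction-following capabilities                                                                             | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fstructured_outputs\u002Fconfigs\u002Fstructured_outputs_json_yaml_xml_v1.yaml'>structured_outputs_json_yaml_xml_v1.yaml\u003C\u002Fa>                                                                                | -                                                                                                                                                              |\n| Structured Outputs                            | instruction following | Checks whether responses meet structured output requirements (JSON, YAML, XML, TOML, CSV). Created 2026-04-09.                                                                                              | Improve schema adherence across all structured output formats                                                              | ✓     | -          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fstructured_outputs\u002Fconfigs\u002Fstructured_outputs_v3.yaml'>structured_outputs_v3.yaml\u003C\u002Fa>                                                                                                            | -                                                                                                                                                              |\n| Swerl Gen                                     | coding                | Runs sandboxed evaluation for SWE-style tasks (patch generation or reproduction-test generation).                                                                                                           | Improve SWE capabilities, helping on benchmarks such as SWE-bench                                                          | ✓     | ✓          | 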
Apache 2.0                                                | \u003Ca href='resources_servers\u002Fswerl_gen\u002Fconfigs\u002Fswerl_gen.yaml'>swerl_gen.yaml\u003C\u002Fa>                                                                                                                                             | -                                                                                                                                                              |\n| Swerl Llm Judge                               | coding                | SWE-style multiple-choice LLM-judge tasks, graded via \u003Csolution>...\u003C\u002Fsolution> options.                                                                                                                   | Improve SWE capabilities, helping on benchmarks such as SWE-bench                                                          | ✓     | ✓          | MIT                                                       | \u003Ca href='resources_servers\u002Fswerl_llm_judge\u002Fconfigs\u002Fswerl_llm_judge.yaml'>swerl_llm_judge.yaml\u003C\u002Fa>                                                                                                                           | -                                                                                                                                                              |\n| Tavily Search                                 | agent                 | The model uses a search tool to satisfy user queries.                                                                                                                                                       | Measure agentic search capabilities                                                                                        | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Ftavily_search\u002Fconfigs\u002Ftavily_search_judge_vllm_model.yaml'>tavily_search_judge_vllm_model.yaml\u003C\u002Fa>                                                                     
                          | -                                                                                                                                                              |\n| Terminal Multi Harness                        | agent                 | Agent006 Harness is a structured action verifier for next-step pivot RL.                                                                                                                                    | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fterminal_multi_harness\u002Fconfigs\u002Fterminal_multi_harness_agent006.yaml'>terminal_multi_harness_agent006.yaml\u003C\u002Fa>                                                                                    | -                                                                                                                                                              |\n| Terminal Multi Harness                        | agent                 | Codex Harness is a structured action verifier for next-step pivot RL.                                                                                                                                       | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fterminal_multi_harness\u002Fconfigs\u002Fterminal_multi_harness_codex.yaml'>terminal_multi_harness_codex.yaml\u003C\u002Fa>                                                                                          | -                                                                                                                                                              |\n| Terminal Multi Harness        
                 | agent                 | OpenCode Harness is a structured action verifier for next-step pivot RL.                                                                                                                                    | -                                                                                                                            | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fterminal_multi_harness\u002Fconfigs\u002Fterminal_multi_harness_opencode.yaml'>terminal_multi_harness_opencode.yaml\u003C\u002Fa>                                                                                    | -                                                                                                                                                              |\n| Terminus Judge                                | agent                 | Single-step terminal tasks (rubrics v4 judge prompt)                                                                                                                                                        | Improve terminal-style tasks                                                                                               | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fterminus_judge\u002Fconfigs\u002Fterminus_judge.yaml'>terminus_judge.yaml\u003C\u002Fa>                                                                                                                              | -                                                                                                                                                              |\n| Terminus Judge                                | agent                 | Single-step terminal tasks (simple judge prompt)                                                                                                                                                            | Improve terminal-style tasks                       
                                                                         | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fterminus_judge\u002Fconfigs\u002Fterminus_judge_simple.yaml'>terminus_judge_simple.yaml\u003C\u002Fa>                                                                                                                | -                                                                                                                                                              |\n| Terminus Judge                                | agent                 | Single-step terminal tasks (string similarity only)                                                                                                                                                         | Improve terminal-style tasks                                                                                               | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fterminus_judge\u002Fconfigs\u002Fterminus_judge_string_only.yaml'>terminus_judge_string_only.yaml\u003C\u002Fa>                                                                                                      | -                                                                                                                                                              |\n| Text To Sql                                   | coding                | Text-to-SQL generation with LLM-as-a-judge equivalence checking                                                                                                                                             | Improve Text-to-SQL across multiple SQL dialects                                                                           | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Ftext_to_sql\u002Fconfigs\u002Ftext_to_sql.yaml'>text_to_sql.yaml\u003C\u002Fa>                           
                                                                                                            | -                                                                                                                                                              |\n| Vlm Eval Kit                                  | 其他                 | -                                                                                                                                                                                                                            | 衡量VLM能力                                                                                             | -     | ✓          | -                                                         | \u003Ca href='resources_servers\u002Fvlm_eval_kit\u002Fconfigs\u002FMMBench_DEV_EN_V11.yaml'>MMBench_DEV_EN_V11.yaml\u003C\u002Fa>                                                                                                                        | -                                                                                                                                                              |\n| Vlm Eval Kit                                  | 其他                 | -                                                                                                                                                                                                                            | 衡量VLM能力                                                                                             | -     | ✓          | -                                                         | \u003Ca href='resources_servers\u002Fvlm_eval_kit\u002Fconfigs\u002FOCRBench.yaml'>OCRBench.yaml\u003C\u002Fa>                                                                                                                                            | -                                                                                                           
                                                    |\n| Vlm Eval Kit                                  | other                | Runs all supported VLMEvalKit benchmarks.                                                                                                                                                                                    | Measure VLM capabilities                                                                                | -     | ✓          | -                                                         | \u003Ca href='resources_servers\u002Fvlm_eval_kit\u002Fconfigs\u002Fvlm_eval_kit.yaml'>vlm_eval_kit.yaml\u003C\u002Fa>                                                                                                                                    | -                                                                                                                                                              |\n| Workplace Assistant                           | agent                | Workplace assistant multi-step tool-use environment                                                                                                                                                          | Improve multi-step tool use                                                                            | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fworkplace_assistant\u002Fconfigs\u002Fworkplace_assistant.yaml'>workplace_assistant.yaml\u003C\u002Fa>                                                                                                               | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fnvidia\u002FNemotron-RL-agent-workplace_assistant'>Nemotron-RL-agent-workplace_assistant\u003C\u002Fa>                               |\n| Xlam Fc                                       | agent                | Salesforce xlam-function-calling-60k tool-calling tasks                                                                           
                                                                                            | Improve tool calling                                                                                   | ✓     | ✓          | Apache 2.0                                                | \u003Ca href='resources_servers\u002Fxlam_fc\u002Fconfigs\u002Fxlam_fc.yaml'>xlam_fc.yaml\u003C\u002Fa>                                                                                                                                                   | -                                                                                                                                                              |\n| Xstest                                        | safety               | XSTest safety benchmark: exaggerated-safety (over-refusal) evaluation                                                                                                                                        | Evaluate a model's safety calibration between helpfulness and harmlessness                             | -     | -          | -                                                         | \u003Ca href='resources_servers\u002Fxstest\u002Fconfigs\u002Fxstest.yaml'>xstest.yaml\u003C\u002Fa>                                                                                                                                                      | -                                                                                                                                                              |\n\u003C!-- END_TRAINING_SERVERS_TABLE -->\n\n## 📖 Documentation & Resources\n\n- **[Documentation](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Findex.html)** - Technical reference documentation\n- **[Training Tutorials](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Ftraining-tutorials\u002Findex.html)** - Train with NeMo Gym environments\n- **[API Reference](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fapidocs\u002Findex.html)** - Complete class and function reference\n\n## 🤝 Community & Support\n\nWe welcome your contributions! Here's how to get involved:\n\n- 
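**[Report Issues](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fissues)** - Bug reports and feature requests\n- **[Contributing Guide](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fgym\u002Flatest\u002Fcontribute\u002Findex.html)** - How to contribute code, documentation, new environments, or training framework integrations\n\n## 📚 Citations\n\nIf you use NeMo Gym in your research, please cite it with the following BibTeX entry:\n\n```bibtex\n@misc{nemo-gym,\n  title = {NeMo Gym: An Open Source Library for Scaling Reinforcement Learning Environments for LLM},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym}},\n  author={NVIDIA},\n  year = {2025},\n  note = {GitHub repository},\n}\n```","# NeMo Gym Quick Start Guide\n\nNeMo Gym is a library for building reinforcement learning (RL) training environments for large language models (LLMs). It provides the infrastructure for developing environments, scaling up data collection, and integrating with mainstream training frameworks.\n\n## Environment Setup\n\nBefore you begin, make sure your development machine meets the following hardware and software requirements:\n\n### Hardware Requirements\n*   **GPU**: Running the NeMo Gym library itself does **not** require a GPU. Some resource servers or model inference setups may need one, depending on your configuration.\n*   **CPU**: Any modern x86_64 or ARM64 processor (e.g., Intel, AMD, Apple Silicon).\n*   **Memory**: 8 GB minimum (16 GB+ recommended for large environments).\n*   **Storage**: At least 5 GB of free disk space.\n\n### Software Requirements\n*   **Operating system**: \n    *   Linux (Ubuntu 20.04+)\n    *   macOS (11.0+ on Intel, 12.0+ on Apple Silicon)\n    *   Windows (via WSL2)\n*   **Python**: 3.12 or later.\n*   **Git**: For cloning the repository.\n*   **Network**: An internet connection to download dependencies and access APIs.\n\n### Dependencies and API Keys\n*   **API Key**: The examples in this guide use the OpenAI API, so have a valid `OPENAI_API_KEY` ready.\n    *   *Note: Azure OpenAI and self-hosted models deployed with vLLM are also supported.*\n*   **Ray**: Installed automatically as a dependency; no separate setup is needed.\n\n---\n\n## Installation\n\nWe recommend `uv` as the Python package manager for faster installs.\n\n### 1. Clone the repository\n```bash\ngit clone git@github.com:NVIDIA-NeMo\u002FGym.git\ncd Gym\n```\n\n### 2. Install uv and create a virtual environment\n```bash\n# Install uv\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\nsource $HOME\u002F.local\u002Fbin\u002Fenv\n\n# Create a Python 3.12 virtual environment\nuv venv --python 3.12\nsource .venv\u002Fbin\u002Factivate\n\n# Install NeMo Gym (including dev and docs dependencies)\nuv sync --extra dev --group docs\n```\n\n### 3. 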
Configure the API key\nCreate a configuration file named `env.yaml` that holds your API key and the policy model you want to use. Keeping these values in this file helps keep sensitive information out of version control.\n\n```bash\necho \"policy_base_url: https:\u002F\u002Fapi.openai.com\u002Fv1\npolicy_api_key: your-openai-api-key\npolicy_model_name: gpt-4.1-2025-04-14\" > env.yaml\n```\n> **Note**: Replace `your-openai-api-key` with your real key. The example uses `gpt-4.1-2025-04-14` because it has low latency and works out of the box. You can also point to any other model endpoint that is compatible with the OpenAI API.\n\n---\n\n## Basic Usage\n\nThis section shows how to start the servers, interact with an agent, and collect verified rollouts for RL training. You will need two terminal windows.\n\n### Step 1: Start the servers\nIn **Terminal 1**, run the following command to start the servers. The process keeps running:\n\n```bash\nconfig_paths=\"resources_servers\u002Fexample_single_tool_call\u002Fconfigs\u002Fexample_single_tool_call.yaml,\\\nresponses_api_models\u002Fopenai_model\u002Fconfigs\u002Fopenai_model.yaml\"\nng_run \"+config_paths=[${config_paths}]\"\n```\n\n### Step 2: Test agent interaction\nIn **Terminal 2**, activate the virtual environment and run a quick interaction with the agent to confirm the servers are working:\n\n```bash\n# Activate the environment\nsource .venv\u002Fbin\u002Factivate\n\n# Run the simple client as a smoke test\npython responses_api_agents\u002Fsimple_agent\u002Fclient.py\n```\n\n### Step 3: Collect verified rollouts\nLeave the servers running in **Terminal 1** and do the following in **Terminal 2** to generate training data:\n\n1.  **Create a query file**:\n    ```bash\n    echo '{\"responses_create_params\":{\"input\":[{\"role\":\"developer\",\"content\":\"You are a helpful assistant.\"},{\"role\":\"user\",\"content\":\"What is the weather in Seattle?\"}]}}' > weather_query.jsonl\n    ```\n\n2.  **Run the collection command**:\n    ```bash\n    ng_collect_rollouts \\\n        +agent_name=example_single_tool_call_simple_agent \\\n        +input_jsonl_fpath=weather_query.jsonl \\\n        +output_jsonl_fpath=weather_rollouts.jsonl\n    ```\n\n3.  
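**View the results**:\n    ```bash\n    cat weather_rollouts.jsonl | python -m json.tool\n    ```\n    The output contains training data annotated with verification scores.\n\n### Cleanup\nWhen you are done testing, press `Ctrl+C` in **Terminal 1** to stop the `ng_run` process.\n\n---\n\n## Next Steps\n\nNow that you have generated rollouts, you can continue along any of these paths:\n*   **Start training**: Train models with frameworks such as NeMo RL, OpenRLHF, or Unsloth.\n*   **Use existing environments**: Browse the officially provided prebuilt environments, such as multi-step tool calling and session state management.\n*   **Build a custom environment**: Follow the documentation to implement custom tool integrations and task verification logic.","A fintech company is training a dedicated financial-compliance LLM and needs reinforcement learning to teach the model to cite regulatory clauses accurately and to refuse non-compliant advice.\n\n### Without Gym\n- **Tedious environment setup**: The team has to write code from scratch to simulate multi-turn conversations, manually handling user state memory and complex regulatory verification logic. This takes weeks.\n- **Testing coupled to training**: Every change to the dialogue rules or the reward mechanism means restarting the entire large RL training loop for end-to-end testing, so iteration is very slow.\n- **Poor extensibility**: Existing reasoning datasets (such as Reasoning Gym) are hard to reuse quickly, and newly built environments cannot easily be connected to different training frameworks such as OpenRLHF.\n- **Wasted resources**: Without a standardized rollout collection mechanism, GPUs sit idle while waiting for environment responses, and compute utilization stays low.\n\n### With Gym\n- **Faster development**: Using Gym's multi-step, multi-turn conversation scaffolding, the team builds a financial-compliance training environment with complex user modeling in just a few days.\n- **Independent verification workflow**: Environment logic and data throughput can be tested end to end without launching full RL training, so bugs are found and fixed quickly.\n- **Seamless ecosystem integration**: Built-in integrations such as Reasoning Gym can be used directly, and environments plug into mainstream training frameworks such as NeMo RL or Unsloth through standard interfaces.\n- **Efficient data collection**: Automated, large-scale rollout collection infrastructure significantly speeds up sample generation and keeps expensive GPUs fully busy with training.\n\nBy standardizing environment construction and decoupling testing from training, Gym shortens the RL iteration cycle for the financial-compliance model from weeks to days. This substantially lowers both the development barrier and the compute cost.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVIDIA-NeMo_Gym_743a8749.png","NVIDIA-NeMo","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FNVIDIA-NeMo_ef2128b9.png","",null,"https:\u002F\u002Fnvidia.com\u002F","https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo",[80,84,88,92],{"name":81,"color":82,"percentage":83},"Python","#3572A5",96.6,{"name":85,"color":86,"percentage":87},"Jinja","#a52a22",1.6,{"name":89,"color":90,"percentage":91},"Jupyter Notebook","#DA5B0B",1.2,{"name":93,"color":94,"percentage":95},"Shell","#89e051",0.6,836,120,"2026-04-17T22:46:32","Apache-2.0","Linux (Ubuntu 20.04+), macOS (11.0+ x86_64, 12.0+ Apple Silicon), Windows (via WSL2)","Running the NeMo Gym library itself does not require a GPU; specific resource servers or model inference may need one (see each server's documentation). No specific GPU model or VRAM requirement is given","8 GB minimum, 16 GB+ recommended (for large environments)",{"notes":104,"python":105,"dependencies":106},"Requires Git to clone the repository and an internet connection to download dependencies and access APIs. The quickstart example needs a valid OpenAI API key (Azure OpenAI and self-hosted models via vLLM are also supported). Using uv to manage the Python 
environment and dependencies is recommended. The project is in early development, and its APIs may change.","3.12+",[107,108],"uv","Ray",[13,14],[111,112,113,114,115,116,117],"reinforcement-learning","reinforcement-learning-agent","reinforcement-learning-environments","rl-environment","rl-training","gym","gym-environment","2026-03-27T02:49:30.150509","2026-04-18T14:25:54.837723",[121,126,131,136,141,146],{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},40138,"How can NeMo Gym support mixed training with reasoning turned on and off (thinking on\u002Foff)?","You can do this by configuring different chat template parameters per task. One option is to mix data points that carry reasoning on\u002Foff markers in the dataset. Another is to add multiple chat template configurations to the config file. For example:\n```yaml\npolicy_model:\n  responses_api_models:\n    vllm_model:\n      chat_template_configs:\n        reasoning_on:\n          enable_thinking: true\n        reasoning_off:\n          enable_thinking: false\n```\nThis was resolved in the related PR (#672). It lets you switch modes flexibly with a metadata flag, without duplicating the entire policy model and agent configuration.","https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fissues\u002F619",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},40139,"How do I configure and use vLLM as the model serving backend (instead of the OpenAI Responses API)?","The vLLM configuration docs have moved from the FAQ to a dedicated tutorial section. Follow the complete end-to-end example there, which covers setting the policy model's environment variables (such as base URLs and model names). The setup involves configuring `responses_api_models`, which uses middleware to convert between the chat and responses formats. See the latest `Model Server` documentation section for detailed guidance on local vLLM deployments or endpoints.","https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fissues\u002F194",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},40140,"Does NeMo Gym support Unsloth integration? What is currently supported?","Unsloth integration is currently supported for single-step scenarios. Support for multi-step (multi-step\u002Fturn) scenarios is being developed together with the TRL team. The project is also working toward compatibility with TRL 0.25 and Transformers v5. In vLLM server mode, operation currently works the same way as in TRL. Support for an async vLLM engine with an OpenAI-compatible endpoint is planned to improve efficiency.","https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fissues\u002F370",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},40141,"What should I do about a \"ModuleNotFoundError: No module named 'nemo_gym'\" error when running NeMo Gym on macOS?","This problem appears to come from the macOS-specific environment or local setup. It usually does not occur on remote Linux servers (such as Runpod). If you hit this import error, try running in a Linux environment. If you must fix it on macOS, check that the Python environment path is correct (especially with a framework 
install), and make sure the package is installed in the currently activated virtual environment. If the problem persists, provide concrete reproduction steps so it can be investigated further.","https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fissues\u002F232",{"id":142,"question_zh":143,"answer_zh":144,"source_url":145},40142,"How do I configure an Azure OpenAI model server in NeMo Gym?","Azure OpenAI requires specific `default_query` and `api_version` parameters, so it has to be configured as a separate model type. Define the API version in `env.yaml`:\n```yaml\npolicy_api_version: preview\n```\nThen add a `default_query` field to the `openai_model.yaml` config:\n```yaml\nopenai_model:\n  responses_api_models:\n    openai_model:\n      openai_base_url: ${policy_base_url}\n      openai_api_key: ${policy_api_key}\n      openai_model: ${policy_model_name}\n      default_query:\n        api-version: ${policy_api_version}\n```\nThis ensures the Azure endpoint's special request parameters are handled correctly.","https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fissues\u002F17",{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},40143,"How do I fix missing-module errors or tensor size mismatch errors when running the Unsloth tutorial notebook?","1. Missing Unsloth module: add an install cell at the top of the notebook to install Unsloth.\n2. Missing dependencies (matplotlib, pyparsing, etc.): when running outside Colab, install these missing packages manually.\n3. 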
Tensor size mismatch errors (e.g., \"size of tensor a must match size of tensor b\"): this is a bug in the Unsloth library itself and was fixed in Unsloth PR #4140. Upgrade Unsloth to the latest version to resolve this training error.","https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fissues\u002F783",[152,157,162,167],{"id":153,"version":154,"summary_zh":155,"released_at":156},323672,"v0.2.1","- PyPI fix for the 0.2.1 patch release by @cmunley1 and @kajalj22 :: Pull request: #1081","2026-04-15T22:52:56",{"id":158,"version":159,"summary_zh":160,"released_at":161},323673,"v0.2.0","## **Release Summary**\n\nNeMo Gym v0.2.0 ships alongside the release of NVIDIA Nemotron 3 Super and open-sources the RL environments and corresponding datasets used in its training. Key highlights:\n\n* 17 new training environments across coding, math, science, reasoning, agentic tasks, and safety.  \n* Integrations with Future House Aviary, Open-Thought Reasoning Gym, and Prime Intellect Verifiers, so you can use environments from these libraries directly in NeMo Gym.  \n* End-to-end rollout collection with locally managed vLLM servers.  \n* Direct installation from PyPI: `pip install nemo-gym`.\n\n## **First-Time Contributors**\n\nThis release welcomes **15 new contributors**! Highlights from a few of their contributions:\n\n* **@sidnarayanan** added the Aviary integration, which enables training on any Aviary environment. Aviary is a library of interactive RL environments spanning math, science, biology, and more.  \n* **@3mei** added a Text-to-SQL environment that translates natural language into SQL queries across multiple SQL dialects.  \n* **@Kelvin0110** added the NewtonBench environment, in which users discover scientific laws through interactive experiments.\n\nThanks to all our new contributors for making NeMo Gym better!\n\n## **Major Features and Improvements**\n\n**New Environments**\n\n* 17 new resource servers across the following domains:  \n  * Coding: Text-to-SQL ([\\#648](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F648)), SWE RL Gen ([\\#561](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F561)), SWE RL LLM Judge ([\\#561](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F561))  \n  * Math: Lean4 math proofs ([\\#563](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F563))  \n  * Science: Aviary ([\\#55](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F55)), NewtonBench ([\\#650](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F650))  \n  * Reasoning: MultiChallenge ([\\#654](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F654)), ARC-AGI ([\\#105](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F105)), Reasoning 
Gym ([\\#113](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F113))  \n  * Agentic tasks: xLAM function calling ([\\#262](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F262)), Tavily search ([\\#825](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F825)), single-step tool use with argument comparison ([\\#825](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F825)), Terminus Judge ([\\#594](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F594)), NeMo Skills tools ([\\#571](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F571))  \n  * Safety: jailbreak detection ([\\#825](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F825)), over-refusal detection ([\\#825](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F825))  \n  * RLHF: generative reward model comparison ([\\#674](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F674))  \n* 5 new agent servers: Aviary agent ([\\#55](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F55)), proof refinement agent ([\\#563](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F563)), SWE agent ([\\#343](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F343)), tool simulation agent ([\\#826](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FG","2026-03-11T15:03:26",{"id":163,"version":164,"summary_zh":165,"released_at":166},323674,"v0.1.1","## What's Changed\n* Bump package info for v0.2.0 by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F337\n* fix: update incorrect path in docs: library_judge_math -> math_with_j… by @shashank3959 in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F355\n* Update secret detector to support forked repos by @chtruong814 in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F358\n* Remove references to the GitLab master branch by @hwolff99 in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F377\n* Mark experimental tutorials by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F386\n* docs: add 'Experimental' labels by @lbliii in 
https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F391\n* Fix typos by @hwolff99 in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F400\n* README content on dataset discoverability by @fsiino-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F344\n* Add absolute IP addresses for multi-node setups by @sdevare-nv in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F286\n* docs: remove the 'How to Navigate' section from Concepts by @ahmadki in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F414\n* docs: fix image embedding on the core abstractions page by @ahmadki in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F410\n* docs: fix licensing info in the structured outputs section by @ahmadki in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F412\n* docs: add hyperlinks to the GitHub repo in the docs by @ahmadki in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F413\n* docs: add software\u002Fhardware requirements to the README and docs by @ffrujeri in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F401\n* docs: clean up the Quickstart section of the README by @ahmadki in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F411\n* Display system and version info by @fsiino-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F347\n* docs: improve resource server wording by @ffrujeri in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F408\n* docs: add a tutorial on creating a resource server by @ffrujeri in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F407\n* Mini example with offline UV by @sdevare-nv in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F357\n* Update comments for the vLLM model by @cmunley1 in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F423\n* docs: link several terms to their glossary definitions by @ahmadki in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F424\n* docs: explain why GPT-4 is used and clarify support for other models by @ahmadki in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F425\n* Remove internal sections by @hwolff99 in 
https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F430\n* docs: various improvements and fixes by @ahmadki in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F415\n* docs: link the 'Get Started' and 'Rollout Collection' sections by @fsiino-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F426\n* Guide users to next steps after completing 'Get Started' by @cwing-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F435\n* Add placeholder author by @jkyi-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F440\n* Clarify the framing of training environments and unify messaging across the docs by @cwing-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fp","2025-12-15T00:32:19",{"id":168,"version":169,"summary_zh":170,"released_at":171},323675,"v0.1.0","## What's Changed\n* Add copy-pr-bot by @chtruong814 in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F1\n* Add initial repo template by @chtruong814 in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F2\n* Update the GitHub main branch to the GitLab main branch by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F3\n* Add 'Penguin' alias by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F4\n* Add copyright docs, README, and FAQ by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F7\n* Dapo17k changes by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F6\n* Fix docs build failure by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F8\n* Fix issues in the docs by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F10\n* Improve docs for GitHub SSH key setup by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F12\n* Comp-Coding verifier by @kbhardwaj-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F5\n* Simple aggregation for the dataset viewer by @fsiino-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F9\n* Add VLLMModel docs to the main README by @bxyu-nvidia in 
https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F13\n* Fix agent name in the docs by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F15\n* VLLMModel passes token IDs by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F11\n* Clean up VLLMModel tokenization args by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F21\n* Update the Comp-Coding README.md by @kbhardwaj-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F26\n* Docs improvements: remove the 'Why NeMo Gym' section and add CI\u002FCD testing info by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F27\n* Make server log formatting more consistent by @cmunley1 in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F22\n* Rename `ng_collect_traj` to `ng_collect_rollouts` in the README by @cmunley1 in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F25\n* Simple agent stop condition only requires no tool calls and an existing output message item by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F19\n* Poll for readiness on server startup by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F31\n* Rename the top-level config key `openai_model` to `policy_model` by @pjin-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F33\n* Allow the simple agent to return non-JSON tool responses by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F35\n* Multi-verifier docs by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F36\n* Server hooks for easily accessing individual instances through sessions by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F24\n* Add the Math Stack Overflow dataset by @damon-mosk-aoyama-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F42\n* Add the Workbench validation dataset by @bxyu-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F46\n* Update docs by @bxyu-nvidia in 
https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F47\n* LLM judge for response equivalence by @soares-f in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym\u002Fpull\u002F16\n* Configure a global httpx client by @pjin-nvidia in https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FGym","2025-11-15T01:06:13"]