[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-leggedrobotics--robotic_world_model":3,"tool-leggedrobotics--robotic_world_model":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
# leggedrobotics/robotic_world_model

Repository for our papers: "Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics" and "Uncertainty-Aware Robotic World Model Makes Offline Model-Based Reinforcement Learning Work on Real Robots".

robotic_world_model is an open-source extension of Isaac Lab built for model-based reinforcement learning in robotics. It bundles the training environments and pipelines for the Robotic World Model (RWM) and its uncertainty-aware variant (RWM-U).

Conventional reinforcement learning depends heavily on physics simulators and transfers poorly to real robots. robotic_world_model instead builds the dynamics simulator from neural networks, so policies can be trained without any simulator at all. This makes offline model-based reinforcement learning practical on physical robots and sharply reduces the cost of trial and error.

The project suits robotics researchers, reinforcement learning developers, and teams working on sim-to-real transfer. Highlights include online joint training of policies and neural dynamics models, an uncertainty-awareness mechanism, and visualization of autoregressive imagination rollouts for comparing model-based against model-free policies. Backed by ETH Zurich, robotic_world_model provides solid infrastructure for exploring robust policy optimization.

---

# Robotic World Model Extension for Isaac Lab

[![IsaacSim](https://img.shields.io/badge/IsaacSim-4.5.0-silver.svg)](https://docs.omniverse.nvidia.com/isaacsim/latest/overview.html)
[![Isaac Lab](https://img.shields.io/badge/IsaacLab-2.1.0-silver)](https://isaac-sim.github.io/IsaacLab)
[![Python](https://img.shields.io/badge/python-3.10-blue.svg)](https://docs.python.org/3/whatsnew/3.10.html)
[![Linux platform](https://img.shields.io/badge/platform-linux--64-orange.svg)](https://releases.ubuntu.com/20.04/)
[![Windows platform](https://img.shields.io/badge/platform-windows--64-orange.svg)](https://www.microsoft.com/en-us/)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/)
[![License](https://img.shields.io/badge/license-MIT-yellow.svg)](https://opensource.org/license/mit)
## Overview

This repository extends [**Isaac Lab**](https://github.com/isaac-sim/IsaacLab) with environments and training pipelines for
- [**Robotic World Model (RWM)**](https://sites.google.com/view/roboticworldmodel/home),
- [**Uncertainty-Aware Robotic World Model (RWM-U)**](https://sites.google.com/view/uncertainty-aware-rwm),

and related model-based reinforcement learning methods.

It enables:
- joint training of policies and neural dynamics models in Isaac Lab (online),
- training of policies with learned neural network dynamics without any simulator (offline),
- evaluation of model-based vs. model-free policies,
- visualization of autoregressive imagination rollouts from learned dynamics,
- visualization of trained policies in Isaac Lab.

<table>
  <tr>
  <td valign="top" width="50%">

![Robotic World Model](https://oss.gittoolsai.com/images/leggedrobotics_robotic_world_model_readme_a4ae49784003.png)

**Paper**: [Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics](https://arxiv.org/abs/2501.10100)  
**Project Page**: [https://sites.google.com/view/roboticworldmodel](https://sites.google.com/view/roboticworldmodel)

  </td>
  <td valign="top" width="50%">

![Uncertainty-Aware Robotic World Model](https://oss.gittoolsai.com/images/leggedrobotics_robotic_world_model_readme_937edea84795.png)

**Paper**: [Uncertainty-Aware Robotic World Model Makes Offline Model-Based Reinforcement Learning Work on Real Robots](https://arxiv.org/abs/2504.16680)  
**Project Page**: [https://sites.google.com/view/uncertainty-aware-rwm](https://sites.google.com/view/uncertainty-aware-rwm)

  </td>
  </tr>
</table>

**Authors**: [Chenhao Li](https://breadli428.github.io/), [Andreas Krause](https://las.inf.ethz.ch/krausea), [Marco Hutter](https://rsl.ethz.ch/the-lab/people/person-detail.MTIxOTEx.TGlzdC8yNDQxLC0xNDI1MTk1NzM1.html)  
**Affiliation**: [ETH AI Center](https://ai.ethz.ch/), [Learning & Adaptive Systems Group](https://las.inf.ethz.ch/) and [Robotic Systems Lab](https://rsl.ethz.ch/), [ETH Zurich](https://ethz.ch/en.html)

---

## Installation

1. **Install Isaac Lab** (not needed for offline policy training)

Follow the official [installation guide](https://isaac-sim.github.io/IsaacLab/main/source/setup/installation/index.html). We recommend the Conda installation, as it simplifies calling Python scripts from the terminal.

2. **Install model-based RSL RL**

Follow the official installation guide of the model-based [RSL RL](https://github.com/leggedrobotics/rsl_rl_rwm) to replace the `rsl_rl_lib` that ships with Isaac Lab.

3. **Clone this repository** (outside your Isaac Lab directory)

```bash
git clone git@github.com:leggedrobotics/robotic_world_model.git
```

4. **Install the extension** using the Python environment where Isaac Lab is installed

```bash
python -m pip install -e source/mbrl
```

5. **Verify the installation** (not needed for offline policy training)

```bash
python scripts/reinforcement_learning/rsl_rl/train.py --task Template-Isaac-Velocity-Flat-Anymal-D-Init-v0 --headless
```

---

## World Model Pretraining & Evaluation

Robotic World Model is a model-based reinforcement learning algorithm that learns a dynamics model and a policy concurrently.

### Configure model inputs/outputs

You can configure the model inputs and outputs under `ObservationsCfg_PRETRAIN` in [`AnymalDFlatEnvCfg_PRETRAIN`](source/mbrl/mbrl/tasks/manager_based/locomotion/velocity/config/anymal_d/flat_env_cfg.py).

Available components:
- `SystemStateCfg`: state input and output head
- `SystemActionCfg`: action input
- `SystemExtensionCfg`: continuous privileged output head (e.g. rewards)
- `SystemContactCfg`: binary privileged output head (e.g. contacts)
- `SystemTerminationCfg`: binary privileged output head (e.g. terminations)

You can configure the model architecture and training hyperparameters under `RslRlSystemDynamicsCfg` and `RslRlMbrlPpoAlgorithmCfg` in [`AnymalDFlatPPOPretrainRunnerCfg`](source/mbrl/mbrl/tasks/manager_based/locomotion/velocity/config/anymal_d/agents/rsl_rl_ppo_cfg.py).

Available options (see the sketch after this list for how the pieces fit together):
- `ensemble_size`: ensemble size for uncertainty estimation
- `history_horizon`: stacked history horizon
- `architecture_config`: architecture configuration
- `system_dynamics_forecast_horizon`: autoregressive prediction steps
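To make the relationship between these components concrete, here is a minimal, self-contained sketch of how such input/output heads might compose a dynamics model's interface. The `HeadSpec` and `DynamicsModelSpec` classes, and all dimensions, are hypothetical illustrations, not the repository's actual configuration classes; the ensemble and forecast options are omitted for brevity.

```python
# Hypothetical illustration only -- not the repository's actual config classes.
# State is both an input and an output head, actions are input-only, and
# privileged heads (rewards, contacts, terminations) are extra outputs.
from dataclasses import dataclass, field

@dataclass
class HeadSpec:
    name: str
    dim: int
    binary: bool = False   # binary heads (contacts, terminations) get sigmoid outputs

@dataclass
class DynamicsModelSpec:
    state_dim: int                      # SystemStateCfg: input and output head
    action_dim: int                     # SystemActionCfg: input only
    history_horizon: int                # stacked history fed to the model
    extra_heads: list[HeadSpec] = field(default_factory=list)

    @property
    def input_dim(self) -> int:
        # stacked history of (state, action) pairs
        return (self.state_dim + self.action_dim) * self.history_horizon

    @property
    def output_heads(self) -> list[HeadSpec]:
        # next-state head plus privileged heads
        return [HeadSpec("next_state", self.state_dim), *self.extra_heads]

spec = DynamicsModelSpec(
    state_dim=48, action_dim=12, history_horizon=4,   # made-up quadruped-like sizes
    extra_heads=[
        HeadSpec("reward", 1),                        # SystemExtensionCfg
        HeadSpec("contact", 4, binary=True),          # SystemContactCfg
        HeadSpec("termination", 1, binary=True),      # SystemTerminationCfg
    ],
)
print(spec.input_dim)                                 # 240
print([h.name for h in spec.output_heads])            # ['next_state', 'reward', 'contact', 'termination']
```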
### Run dynamics model pretraining

```bash
python scripts/reinforcement_learning/rsl_rl/train.py \
  --task Template-Isaac-Velocity-Flat-Anymal-D-Pretrain-v0 \
  --headless
```

This trains a PPO policy from scratch, while the experience collected during training is used to train the dynamics model.
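The loop below is a minimal sketch of that interleaving, assuming generic `env`, `policy`, `ppo_update`, and `dynamics_model` callables rather than the repository's actual classes: PPO consumes the simulator experience, and the same transitions supervise one-step dynamics prediction.

```python
# Sketch only: joint policy + dynamics-model training from shared experience.
# Assumed interfaces: env.reset() -> obs; env.step(a) -> (obs, reward, done);
# policy(obs) -> action; ppo_update(policy, rollout) performs one PPO update.
import torch
import torch.nn.functional as F

def pretrain(env, policy, ppo_update, dynamics_model, iterations=1000, horizon=24):
    dyn_opt = torch.optim.Adam(dynamics_model.parameters(), lr=1e-3)
    obs = env.reset()
    for _ in range(iterations):
        rollout = []
        for _ in range(horizon):
            with torch.no_grad():
                action = policy(obs)
            next_obs, reward, done = env.step(action)
            rollout.append((obs, action, next_obs, reward, done))
            obs = next_obs
        # 1) policy update from the simulator experience
        ppo_update(policy, rollout)
        # 2) the same transitions supervise one-step dynamics prediction
        states = torch.stack([t[0] for t in rollout])
        actions = torch.stack([t[1] for t in rollout])
        targets = torch.stack([t[2] for t in rollout])
        loss = F.mse_loss(dynamics_model(states, actions), targets)
        dyn_opt.zero_grad()
        loss.backward()
        dyn_opt.step()
```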
### Visualize autoregressive predictions

```bash
python scripts/reinforcement_learning/rsl_rl/visualize.py \
  --task Template-Isaac-Velocity-Flat-Anymal-D-Visualize-v0 \
  --checkpoint <checkpoint_path> \
  --system_dynamics_load_path <dynamics_model_path>
```

This visualizes the learned dynamics model by rolling it out autoregressively in imagination, conditioned on the actions from the learned policy.
The `dynamics_model_path` should point to the pretrained dynamics model checkpoint (e.g. `model_<iteration>.pt`) inside the saved run directory.
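For intuition, an autoregressive imagination rollout amounts to feeding the model's own predictions back in as the next input. A minimal sketch, assuming `dynamics_model(state, action)` returns the predicted next state and `policy(state)` returns an action (hypothetical signatures, not the repository's API):

```python
# Sketch of an autoregressive imagination rollout: the learned model predicts
# each next state from its own previous prediction, with actions supplied by
# the learned policy.
import torch

@torch.no_grad()
def imagine(dynamics_model, policy, init_state, steps=100):
    states, state = [init_state], init_state
    for _ in range(steps):
        action = policy(state)
        state = dynamics_model(state, action)  # feed the prediction back in
        states.append(state)
    return torch.stack(states)  # imagined trajectory for comparison
```

Divergence between such imagined trajectories and simulator rollouts is exactly what the visualization script lets you inspect.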
---

## Model-Based Policy Training & Evaluation

Once a dynamics model is pretrained, you can train a model-based policy purely from **imagined rollouts** generated by the learned dynamics.

There are two options:
- **Option 1: Train the policy in imagination *online***, where additional environment interactions are continually collected with the latest policy to update the dynamics model (as implemented with RWM and MBPO-PPO in [Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics](https://arxiv.org/abs/2501.10100)).
- **Option 2: Train the policy in imagination *offline***, where no additional environment interactions are collected and the policy has to rely on the static dynamics model (as implemented with RWM-U and MOPO-PPO in [Uncertainty-Aware Robotic World Model Makes Offline Model-Based Reinforcement Learning Work on Real Robots](https://arxiv.org/abs/2504.16680)).

### Option 1: Train the policy in imagination *online*

Online data collection relies on interactions with the environment and therefore brings up the simulator.

```bash
python scripts/reinforcement_learning/rsl_rl/train.py --task Template-Isaac-Velocity-Flat-Anymal-D-Finetune-v0 --headless --checkpoint <checkpoint_path> --system_dynamics_load_path <dynamics_model_path>
```

You can start the policy from a pretrained checkpoint, or train from scratch by simply omitting the `--checkpoint` argument.

### Option 2: Train the policy in imagination *offline*

Offline policy training collects no new data and therefore relies solely on the static dynamics model.
Align the model architecture and specify the model load path under `ModelArchitectureConfig` in [`AnymalDFlatConfig`](scripts/reinforcement_learning/model_based/configs/anymal_d_flat_cfg.py).

Additionally, the offline imagination needs to branch off from some initial states. Specify the data path under `DataConfig` in [`AnymalDFlatConfig`](scripts/reinforcement_learning/model_based/configs/anymal_d_flat_cfg.py).

```bash
python scripts/reinforcement_learning/model_based/train.py --task anymal_d_flat
```
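Since Option 2 pairs RWM-U with MOPO-PPO, the `ensemble_size` option from the pretraining configuration matters here: ensemble disagreement can penalize imagined rewards so the policy avoids regions the offline data does not cover. A rough sketch under assumed interfaces (an iterable `ensemble` of dynamics models and a `reward_fn`; this is not the repository's implementation):

```python
# Sketch of a MOPO-style uncertainty penalty: imagined rewards are reduced by
# the disagreement of an ensemble of dynamics models, discouraging the policy
# from exploiting model error far from the offline data.
import torch

@torch.no_grad()
def penalized_step(ensemble, state, action, reward_fn, penalty_coef=1.0):
    # each ensemble member predicts the next state
    preds = torch.stack([m(state, action) for m in ensemble])  # (E, state_dim)
    next_state = preds.mean(dim=0)
    # disagreement across members as an epistemic-uncertainty proxy
    uncertainty = preds.std(dim=0).norm()
    reward = reward_fn(next_state, action) - penalty_coef * uncertainty
    return next_state, reward
```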
### Play the learned model-based policy

You can play the learned policies through the original Isaac Lab task registry.

```bash
python scripts/reinforcement_learning/rsl_rl/play.py --task Isaac-Velocity-Flat-Anymal-D-Play-v0 --checkpoint <checkpoint_path>
```

---

## Code Structure

We provide a reference pipeline that enables RWM and RWM-U on ANYmal D.

Key files:

**Online**

- Environment configurations + dynamics model setup
  [`flat_env_cfg.py`](source/mbrl/mbrl/tasks/manager_based/locomotion/velocity/config/anymal_d/flat_env_cfg.py)
- Algorithm configuration + training parameters
  [`rsl_rl_ppo_cfg.py`](source/mbrl/mbrl/tasks/manager_based/locomotion/velocity/config/anymal_d/agents/rsl_rl_ppo_cfg.py)
- Imagination rollout logic (constructs policy observations & rewards from model outputs)
  [`anymal_d_manager_based_mbrl_env.py`](source/mbrl/mbrl/tasks/manager_based/locomotion/velocity/config/anymal_d/envs/anymal_d_manager_based_mbrl_env.py)
- Visualization environment + rollout reset
  [`anymal_d_manager_based_visualize_env.py`](source/mbrl/mbrl/tasks/manager_based/locomotion/velocity/config/anymal_d/envs/anymal_d_manager_based_visualize_env.py)

**Offline**

- Environment configuration + imagination rollout logic (constructs policy observations & rewards from model outputs)
  [`anymal_d_flat.py`](scripts/reinforcement_learning/model_based/envs/anymal_d_flat.py)
- Algorithm configuration + training parameters
  [`anymal_d_flat_cfg.py`](scripts/reinforcement_learning/model_based/configs/anymal_d_flat_cfg.py)
- Pretrained RWM-U checkpoint
  [`pretrain_rnn_ens.pt`](assets/models/pretrain_rnn_ens.pt)
- Initial states for imagination rollouts
  [`state_action_data_0.csv`](assets/data/state_action_data_0.csv)

---

## Citation

If you find this repository useful for your research, please consider citing:

```text
@article{li2025robotic,
  title={Robotic world model: A neural network simulator for robust policy optimization in robotics},
  author={Li, Chenhao and Krause, Andreas and Hutter, Marco},
  journal={arXiv preprint arXiv:2501.10100},
  year={2025}
}
@article{li2025offline,
  title={Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator},
  author={Li, Chenhao and Krause, Andreas and Hutter, Marco},
  journal={arXiv preprint arXiv:2504.16680},
  year={2025}
}
```
---

# robotic_world_model Quick Start Guide

**Robotic World Model Extension for Isaac Lab** is an Isaac Lab extension for training and evaluating the Robotic World Model (RWM) and its uncertainty-aware variant (RWM-U). It supports joint training of policies and dynamics models in simulation, as well as offline training that uses only the learned neural network dynamics.

## Requirements

The tool depends on the NVIDIA Isaac Sim and Isaac Lab ecosystem; make sure your system meets the following requirements:

- **OS**: Linux 64-bit or Windows 64-bit
- **IsaacSim**: 4.5.0
- **Isaac Lab**: 2.1.0
- **Python**: 3.10
- **Recommended**: manage the Python environment with Conda, which makes it easier to call the scripts from the terminal.

> **Note**: If you only need offline policy training, Isaac Lab and Isaac Sim are not required.

## Installation

1. **Install Isaac Lab**
   Follow the official [installation guide](https://isaac-sim.github.io/IsaacLab/main/source/setup/installation/index.html). A Conda environment is recommended.

2. **Install model-based RSL RL**
   Follow the official [RSL RL](https://github.com/leggedrobotics/rsl_rl_rwm) guide to install the model-based reinforcement learning library, replacing the `rsl_rl_lib` that ships with Isaac Lab.

3. **Clone the repository**
   Clone it outside your Isaac Lab directory:
   ```bash
   git clone git@github.com:leggedrobotics/robotic_world_model.git
   ```

4. **Install the extension**
   In the Python environment where Isaac Lab is installed, run:
   ```bash
   python -m pip install -e source/mbrl
   ```

5. **Verify the installation**
   (Not needed for offline training.) Test the setup with:
   ```bash
   python scripts/reinforcement_learning/rsl_rl/train.py --task Template-Isaac-Velocity-Flat-Anymal-D-Init-v0 --headless
   ```

## Basic Usage

### 1. Pretrain the dynamics model
Train a dynamics model first; it is used for the subsequent policy training.
```bash
python scripts/reinforcement_learning/rsl_rl/train.py \
  --task Template-Isaac-Velocity-Flat-Anymal-D-Pretrain-v0 \
  --headless
```
This trains a PPO policy from scratch and uses the experience generated along the way to train the dynamics model.

### 2. Online policy training
Fine-tune a policy using the pretrained dynamics model together with live environment interaction.
```bash
python scripts/reinforcement_learning/rsl_rl/train.py --task Template-Isaac-Velocity-Flat-Anymal-D-Finetune-v0 --headless --checkpoint <checkpoint_path> --system_dynamics_load_path <dynamics_model_path>
```
- `<checkpoint_path>`: policy checkpoint path (optional; omit it to train from scratch).
- `<dynamics_model_path>`: dynamics model checkpoint path (e.g. `model_<iteration>.pt`).

### 3. Offline policy training
Train purely on imagined rollouts from the static dynamics model, without collecting new data.
```bash
python scripts/reinforcement_learning/model_based/train.py --task anymal_d_flat
```
Specify the model architecture (`ModelArchitectureConfig`) and the initial-state data path (`DataConfig`) in the configuration file first.

### 4. Visualization and testing
- **Visualize predictions**:
  ```bash
  python scripts/reinforcement_learning/rsl_rl/visualize.py \
    --task Template-Isaac-Velocity-Flat-Anymal-D-Visualize-v0 \
    --checkpoint <checkpoint_path> \
    --system_dynamics_load_path <dynamics_model_path>
  ```
- **Play a policy**:
  ```bash
  python scripts/reinforcement_learning/rsl_rl/play.py --task Isaac-Velocity-Flat-Anymal-D-Play-v0 --checkpoint <checkpoint_path>
  ```
---

# Use Case

A research team is developing a quadruped robot for disaster relief and needs stable autonomous locomotion over complex, unstructured terrain.

### Without robotic_world_model
- Conventional reinforcement learning leans heavily on real-robot trial and error; frequent falls and collisions drive up hardware repair costs and slow down development.
- Simulation differs from real-world physics, so directly transferred policies often fail to adapt to changes in ground friction.
- Without effective offline learning, massive amounts of real-world data must be collected before policies can be fine-tuned, limiting iteration speed.

### With robotic_world_model
- robotic_world_model builds a neural network simulator, so policies can be pretrained without a physics engine, greatly reducing hardware wear.
- The uncertainty-awareness mechanism filters out high-risk control commands that could cause damage, improving deployment safety.
- Offline model-based reinforcement learning lets policies be optimized and validated from a small amount of historical data, accelerating convergence.

Through neural dynamics modeling, robotic_world_model markedly improves the safety and data efficiency of robot policy training.
---

# Project Info

- **Owner**: [leggedrobotics](https://github.com/leggedrobotics) — Robotic Systems Lab - Legged Robotics at ETH Zürich
- **About**: The Robotic Systems Lab investigates the development of machines and their intelligence to operate in rough and challenging environments.
- **Website**: https://rsl.ethz.ch/
- **Language**: Python (100%)
- **Stars**: 567 · **Forks**: 42
- **Last commit**: 2026-04-04
- **License**: Apache-2.0
- **OS**: Linux, Windows
- **GPU**: NVIDIA GPU required (Isaac Sim dependency); specific model, VRAM, and CUDA version not stated
- **RAM**: not stated
- **Python**: 3.10
- **Notes**: Conda is recommended for environment management; Isaac Lab is not required for offline policy training; both online (simulator interaction) and offline (pure imagination) training modes are supported; verify the installation script on first run.

---

# FAQ

**Q: Which Isaac Sim and Isaac Lab versions is the project compatible with?**
A: Per the project documentation, the current versions are IsaacSim 4.5.0 and Isaac Lab 2.1.0. Match these versions to avoid compatibility issues. ([source](https://github.com/leggedrobotics/robotic_world_model/issues/8))

**Q: Running the test after installation fails with `ImportError: cannot import name 'dump_pickle' from 'isaaclab.utils.io'`. How do I fix it?**
A: This is caused by an IsaacLab version difference; the upstream code has been updated to remove the call. As a workaround, manually edit `scripts/reinforcement_learning/rsl_rl/train.py` and delete the `dump_pickle` import and its call sites. ([source](https://github.com/leggedrobotics/robotic_world_model/issues/2))

**Q: When fine-tuning a policy, what should the `system_dynamics_load_path` argument point to?**
A: It should point to a specific pretrained dynamics model checkpoint file, not the whole folder — e.g. the `model_<iteration>.pt` file inside the saved run directory. The README has been updated to clarify this. ([source](https://github.com/leggedrobotics/robotic_world_model/issues/5))

**Q: What are the state normalization statistics (mean and std) for G1 robot training?**
A: They are statistics gathered from runs of the pretrained policy. Base linear velocity: mean [0.0, 0.0, 0.0], std [0.5, 0.25, 0.1]; angular velocity: mean [0.0, 0.0, 0.0], std [0.25, 0.15, 0.3]; projected gravity: mean [0.0, 0.0, -1.0], std [0.02, 0.02, 0.01]. Joint position/velocity/torque means are all 0; their stds follow the arrays defined in the configuration code (e.g. joint positions roughly ±0.3 rad). ([source](https://github.com/leggedrobotics/robotic_world_model/issues/7))
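As a companion to the answer above, a minimal sketch of applying such per-dimension statistics (the mean/std values are copied from the base-linear-velocity entry; the observation value is made up for illustration):

```python
# Normalizing one observation block with per-dimension mean and std.
import torch

mean = torch.tensor([0.0, 0.0, 0.0])   # base linear velocity mean
std = torch.tensor([0.5, 0.25, 0.1])   # base linear velocity std

def normalize(x: torch.Tensor) -> torch.Tensor:
    return (x - mean) / std

print(normalize(torch.tensor([0.6, -0.1, 0.05])))  # tensor([ 1.2000, -0.4000,  0.5000])
```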
",null,"https:\u002F\u002Frsl.ethz.ch\u002F","https:\u002F\u002Fgithub.com\u002Fleggedrobotics",[83],{"name":84,"color":85,"percentage":86},"Python","#3572A5",100,567,42,"2026-04-04T09:54:57","Apache-2.0","Linux, Windows","需要 NVIDIA GPU (Isaac Sim 依赖)，具体型号\u002F显存\u002FCUDA 版本未说明","未说明",{"notes":95,"python":96,"dependencies":97},"建议使用 conda 管理环境；离线策略训练无需安装 Isaac Lab；支持在线（需模拟器交互）和离线（纯想象）两种训练模式；首次运行需验证安装脚本。","3.10",[93],[54,13],"2026-03-27T02:49:30.150509","2026-04-06T05:27:04.707155",[102,107,112,117,122,127],{"id":103,"question_zh":104,"answer_zh":105,"source_url":106},2343,"该项目兼容的 Isaac Sim 和 Isaac Lab 具体版本是什么？","根据项目说明，当前使用的版本为 IsaacSim 4.5.0 和 Isaac Lab 2.1.0。请确保您的环境依赖与此版本匹配以避免兼容性问题。","https:\u002F\u002Fgithub.com\u002Fleggedrobotics\u002Frobotic_world_model\u002Fissues\u002F8",{"id":108,"question_zh":109,"answer_zh":110,"source_url":111},2344,"安装后运行测试时报错 ImportError: cannot import name 'dump_pickle' from 'isaaclab.utils.io' 如何解决？","这是由于 IsaacLab 版本差异导致的。官方代码已更新移除了该调用。临时解决方法：手动编辑 `scripts\u002Freinforcement_learning\u002Frsl_rl\u002Ftrain.py` 文件，删除其中关于 `dump_pickle` 的导入和调用语句。","https:\u002F\u002Fgithub.com\u002Fleggedrobotics\u002Frobotic_world_model\u002Fissues\u002F2",{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},2345,"微调策略时，`system_dynamics_load_path` 参数应该指向什么路径？","该参数不应指向整个文件夹，而应指向预训练动力学模型的具体检查点文件。例如，应指向保存目录内的 `model_\u003Citeration>.pt` 文件。README 已更新以澄清此用法。","https:\u002F\u002Fgithub.com\u002Fleggedrobotics\u002Frobotic_world_model\u002Fissues\u002F5",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},2346,"G1 机器人训练时的状态归一化（Normalization）统计值（mean 和 std）是多少？","基于预训练策略运行产生的统计数据。基础线性速度均值 [0.0, 0.0, 0.0]，标准差 [0.5, 0.25, 0.1]；角速度均值 [0.0, 0.0, 0.0]，标准差 [0.25, 0.15, 0.3]；投影重力均值 [0.0, 0.0, -1.0]，标准差 [0.02, 0.02, 0.01]。关节位置\u002F速度\u002F力矩的均值均为 0，标准差需参考具体配置代码中的数组定义（如关节位置约±0.3 rad）。","https:\u002F\u002Fgithub.com\u002Fleggedrobotics\u002Frobotic_world_model\u002Fissues\u002F7",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},2347,"是否有使用世界模型梯度的 SHAC 实现代码？如何自行实施？","由于 SHAC 实现与 rsl_rl 兼容性不佳，官方未计划发布该部分代码。建议方案：尝试保留每次 RWM（Robotic World Model）rollout 相对于策略动作和学习价值函数的梯度，利用这些梯度来更新策略。","https:\u002F\u002Fgithub.com\u002Fleggedrobotics\u002Frobotic_world_model\u002Fissues\u002F1",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},2348,"如何将此框架部署到真实机器人上？是否需要先仿真再微调？","通常流程是在仿真中训练策略，然后使用真实数据进行微调。官方即将提供一个无需运行模拟器即可进行真实数据收集的微调框架。","https:\u002F\u002Fgithub.com\u002Fleggedrobotics\u002Frobotic_world_model\u002Fissues\u002F9",[]]