[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-StanfordVL--GibsonEnv":3,"tool-StanfordVL--GibsonEnv":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 
50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":80,"owner_email":80,"owner_twitter":81,"owner_website":82,"owner_url":83,"languages":84,"stars":117,"forks":118,"last_commit_at":119,"license":120,"difficulty_score":121,"env_os":122,"env_gpu":123,"env_ram":124,"env_deps":125,"category_tags":139,"github_topics":140,"view_count":10,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":151,"updated_at":152,"faqs":153,"releases":183},937,"StanfordVL\u002FGibsonEnv","GibsonEnv","Gibson Environments: Real-World Perception for Embodied Agents","GibsonEnv 是一个专为具身智能体（Embodied Agents）设计的虚拟环境模拟器，核心目标是为 AI 提供贴近真实世界的感知学习体验。它通过将真实空间数字化，让 AI 在虚拟环境中获得与物理世界高度相似的视觉和交互体验，从而解决传统仿真环境与现实差距过大、难以迁移的问题。\n\n这个工具主要面向机器人学和人工智能领域的研究人员与开发者。如果你正在研究视觉导航、主动感知或强化学习，需要让 AI 在复杂三维空间中自主移动并理解环境，GibsonEnv 会是一个实用的选择。它也适合那些希望先在仿真环境中训练机器人策略、再部署到真实硬件的团队。\n\nGibsonEnv 有几个值得关注的技术特点：首先，它的场景数据来自真实世界扫描，包含 572 个真实空间的 1440 层楼层，语义复杂度远高于人工设计的游戏场景；其次，内置的\"Goggles\"机制专门优化了从仿真到现实的迁移问题；最后，它集成了 Bullet 物理引擎，智能体的移动会受到真实物理约束，而非简单的碰撞检测。环境命名致敬了生态知觉理论家 James J. Gibson，体现了\"感知与运动相互依存\"的设计理念。\n\n项目由斯坦福大","GibsonEnv 是一个专为具身智能体（Embodied Agents）设计的虚拟环境模拟器，核心目标是为 AI 提供贴近真实世界的感知学习体验。它通过将真实空间数字化，让 AI 在虚拟环境中获得与物理世界高度相似的视觉和交互体验，从而解决传统仿真环境与现实差距过大、难以迁移的问题。\n\n这个工具主要面向机器人学和人工智能领域的研究人员与开发者。如果你正在研究视觉导航、主动感知或强化学习，需要让 AI 在复杂三维空间中自主移动并理解环境，GibsonEnv 会是一个实用的选择。它也适合那些希望先在仿真环境中训练机器人策略、再部署到真实硬件的团队。\n\nGibsonEnv 有几个值得关注的技术特点：首先，它的场景数据来自真实世界扫描，包含 572 个真实空间的 1440 层楼层，语义复杂度远高于人工设计的游戏场景；其次，内置的\"Goggles\"机制专门优化了从仿真到现实的迁移问题；最后，它集成了 Bullet 物理引擎，智能体的移动会受到真实物理约束，而非简单的碰撞检测。环境命名致敬了生态知觉理论家 James J. Gibson，体现了\"感知与运动相互依存\"的设计理念。\n\n项目由斯坦福大学开发，2018 年入选 CVPR Spotlight Oral，支持 Docker 快速部署和 ROS 集成，降低了上手门槛。","# GIBSON ENVIRONMENT for Embodied Active Agents with Real-World Perception \n\nYou shouldn't play video games all day, so shouldn't your AI! We built a virtual environment simulator, Gibson, that offers real-world experience for learning perception.  \n\n\u003Cimg src=misc\u002Fui.gif width=\"600\">\n \n**Summary**: Perception and being active (i.e. having a certain level of motion freedom) are closely tied. Learning active perception and sensorimotor control in the physical world is cumbersome as existing algorithms are too slow to efficiently learn in real-time and robots are fragile and costly. This has given a fruitful rise to learning in the simulation which consequently casts a question on transferring to real-world. 
We developed Gibson environment with the following primary characteristics:  \n\n**I.** being from the real-world and reflecting its semantic complexity through virtualizing real spaces,  \n**II.** having a baked-in mechanism for transferring to real-world (Goggles function), and  \n**III.** embodiment of the agent and making it subject to constraints of space and physics via integrating a physics engine ([Bulletphysics](http:\u002F\u002Fbulletphysics.org\u002Fwordpress\u002F)).  \n\n**Naming**: Gibson environment is named after *James J. Gibson*, the author of \"Ecological Approach to Visual Perception\", 1979. “We must perceive in order to move, but we must also move in order to perceive” – JJ Gibson\n\nPlease see the [website](http:\u002F\u002Fgibson.vision\u002F) (http:\u002F\u002Fgibsonenv.stanford.edu\u002F) for more technical details. This repository is intended for distribution of the environment and installation\u002Frunning instructions.\n\n#### Paper\n**[\"Gibson Env: Real-World Perception for Embodied Agents\"](http:\u002F\u002Fgibson.vision\u002F)**, in **CVPR 2018 [Spotlight Oral]**.\n\n\n[![Gibson summary video](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FStanfordVL_GibsonEnv_readme_2037e3528ff1.png)](https:\u002F\u002Fyoutu.be\u002FKdxuZjemyjc \"Click to watch the video summarizing Gibson environment!\")\n\n\n\nRelease\n=================\n**This is the 0.3.1 release. Bug reports, suggestions for improvement, as well as community developments are encouraged and appreciated.** [change log file](misc\u002FCHANGELOG.md).  \n\n\nDatabase\n=================\nThe full database includes 572 spaces and 1440 floors and can be downloaded [here](gibson\u002Fdata\u002FREADME.md). A diverse set of visualizations of all spaces in Gibson can be seen [here](http:\u002F\u002Fgibsonenv.stanford.edu\u002Fdatabase\u002F). To make the core assets download package lighter for the users, we  include a small subset (39) of the spaces. Users can download the rest of the spaces and add them to the assets folder. We also integrated [Stanford 2D3DS](http:\u002F\u002F3dsemantics.stanford.edu\u002F) and [Matterport 3D](https:\u002F\u002Fniessner.github.io\u002FMatterport\u002F) as separate datasets if one wishes to use Gibson's simulator with those datasets (access [here](gibson\u002Fdata\u002FREADME.md)).\n\nTable of contents\n=================\n\n   * [Installation](#installation)\n        * [Quick Installation (docker)](#a-quick-installation-docker)\n        * [Building from source](#b-building-from-source)\n        * [Uninstalling](#uninstalling)\n   * [Quick Start](#quick-start)\n        * [Gibson FPS](#gibson-framerate)\n        * [Web User Interface](#web-user-interface)\n        * [Rendering Semantics](#rendering-semantics)\n        * [Robotic Agents](#robotic-agents)\n        * [ROS Configuration](#ros-configuration)\n   * [Coding your RL agent](#coding-your-rl-agent)\n   * [Environment Configuration](#environment-configuration)\n   * [Goggles: transferring the agent to real-world](#goggles-transferring-the-agent-to-real-world)\n   * [Citation](#citation)\n\n\n\nInstallation\n=================\n\n#### Installation Method\n\nThere are two ways to install gibson, A. using our docker image (recommended) and B. building from source. 
\n\n#### System requirements\n\nThe minimum system requirements are the following:\n\nFor docker installation (A): \n- Ubuntu 16.04\n- Nvidia GPU with VRAM > 6.0GB\n- Nvidia driver >= 384\n- CUDA >= 9.0, CuDNN >= v7\n\nFor building from the source(B):\n- Ubuntu >= 14.04\n- Nvidia GPU with VRAM > 6.0GB\n- Nvidia driver >= 375\n- CUDA >= 8.0, CuDNN >= v5\n\n#### Download data\n\nFirst, our environment core assets data are available [here](https:\u002F\u002Fstorage.googleapis.com\u002Fgibson_scenes\u002Fassets_core_v2.tar.gz). You can follow the installation guide below to download and set up them properly. `gibson\u002Fassets` folder stores necessary data (agent models, environments, etc) to run gibson environment. Users can add more environments files into `gibson\u002Fassets\u002Fdataset` to run gibson on more environments. Visit the [database readme](gibson\u002Fdata\u002FREADME.md) for downloading more spaces. Please sign the [license agreement](gibson\u002Fdata\u002FREADME.md#download) before using Gibson's database.\n\n\nA. Quick installation (docker)\n-----\n\nWe use docker to distribute our software, you need to install [docker](https:\u002F\u002Fdocs.docker.com\u002Fengine\u002Finstallation\u002F) and [nvidia-docker2.0](https:\u002F\u002Fgithub.com\u002Fnvidia\u002Fnvidia-docker\u002Fwiki\u002FInstallation-(version-2.0)) first. \n\nRun `docker run --runtime=nvidia --rm nvidia\u002Fcuda nvidia-smi` to verify your installation. \n\nYou can either 1. pull from our docker image (recommended) or 2. build your own docker image.\n\n\n1. Pull from our docker image (recommended)\n\n```bash\n# download the dataset from https:\u002F\u002Fstorage.googleapis.com\u002Fgibson_scenes\u002Fdataset.tar.gz\ndocker pull xf1280\u002Fgibson:0.3.1\nxhost +local:root\ndocker run --runtime=nvidia -ti --rm -e DISPLAY -v \u002Ftmp\u002F.X11-unix:\u002Ftmp\u002F.X11-unix -v \u003Chost path to dataset folder>:\u002Froot\u002Fmount\u002Fgibson\u002Fgibson\u002Fassets\u002Fdataset xf1280\u002Fgibson:0.3.1\n```\n\n2. Build your own docker image \n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv.git\ncd GibsonEnv\n.\u002Fdownload.sh # this script downloads assets data file and decompress it into gibson\u002Fassets folder\ndocker build . -t gibson ### finish building inside docker, note by default, dataset will not be included in the docker images\nxhost +local:root ## enable display from docker\n```\nIf the installation is successful, you should be able to run `docker run --runtime=nvidia -ti --rm -e DISPLAY -v \u002Ftmp\u002F.X11-unix:\u002Ftmp\u002F.X11-unix -v \u003Chost path to dataset folder>:\u002Froot\u002Fmount\u002Fgibson\u002Fgibson\u002Fassets\u002Fdataset gibson` to create a container. Note that we don't include\ndataset files in docker image to keep our image slim, so you will need to mount it to the container when you start a container. \n\n#### Notes on deployment on a headless server\n\nGibson Env supports deployment on a headless server and remote access with `x11vnc`. \nYou can build your own docker image with the docker file `Dockerfile` as above.\nInstructions to run gibson on a headless server (requires X server running):\n\n1. Install nvidia-docker2 dependencies following the starter guide. Install `x11vnc` with `sudo apt-get install x11vnc`.\n2. Have xserver running on your host machine, and run `x11vnc` on DISPLAY :0.\n3. 
`docker run --runtime=nvidia -ti --rm -e DISPLAY -v \u002Ftmp\u002F.X11-unix\u002FX0:\u002Ftmp\u002F.X11-unix\u002FX0 -v \u003Chost path to dataset folder>:\u002Froot\u002Fmount\u002Fgibson\u002Fgibson\u002Fassets\u002Fdataset \u003Cgibson image name>`\n4. Run gibson with `python \u003Cgibson example or training>` inside docker.\n5. Visit your `host:5900` and you should be able to see the GUI.\n\nIf you don't have X server running, you can still run gibson, see [this guide](https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv\u002Fwiki\u002FRunning-GibsonEnv-on-headless-server) for more details.\n\nB. Building from source\n-----\nIf you don't want to use our docker image, you can also install gibson locally. This will require some dependencies to be installed. \n\nFirst, make sure you have Nvidia driver and CUDA installed. If you install from source, CUDA 9 is not necessary, as that is for nvidia-docker 2.0. Then, let's install some dependencies:\n\n```bash\napt-get update \napt-get install libglew-dev libglm-dev libassimp-dev xorg-dev libglu1-mesa-dev libboost-dev \\\n\t\tmesa-common-dev freeglut3-dev libopenmpi-dev cmake golang libjpeg-turbo8-dev wmctrl \\\n\t\txdotool libzmq3-dev zlib1g-dev\n```\t\n\nInstall required deep learning libraries: Using python3.5 is recommended. You can create a python3.5 environment first. \n\n```bash\nconda create -n py35 python=3.5 anaconda \nsource activate py35 # the rest of the steps needs to be performed in the conda environment\nconda install -c conda-forge opencv\npip install http:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu90\u002Ftorch-0.3.1-cp35-cp35m-linux_x86_64.whl \npip install torchvision==0.2.0\npip install tensorflow==1.3\n```\nClone the repository, download data and build\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv.git\ncd GibsonEnv\n.\u002Fdownload.sh # this script downloads assets data file and decompress it into gibson\u002Fassets folder\n.\u002Fbuild.sh build_local ### build C++ and CUDA files\npip install -e . ### Install python libraries\n```\n\nInstall OpenAI baselines if you need to run the training demo.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffxia22\u002Fbaselines.git\npip install -e baselines\n```\n\nUninstalling\n----\n\nUninstall gibson is easy. If you installed with docker, just run `docker images -a | grep \"gibson\" | awk '{print $3}' | xargs docker rmi` to clean up the image. If you installed from source, uninstall with `pip uninstall gibson`\n\n\nQuick Start\n=================\n\nFirst run `xhost +local:root` on your host machine to enable display. You may need to run `export DISPLAY=:0` first. After getting into the docker container with `docker run --runtime=nvidia -ti --rm -e DISPLAY -v \u002Ftmp\u002F.X11-unix:\u002Ftmp\u002F.X11-unix -v \u003Chost path to dataset folder>:\u002Froot\u002Fmount\u002Fgibson\u002Fgibson\u002Fassets\u002Fdataset gibson`, you will get an interactive shell. Now you can run a few demos. \n\nIf you installed from source, you can run those directly using the following commands without using docker. \n\n\n```bash\npython examples\u002Fdemo\u002Fplay_husky_nonviz.py ### Use ASWD keys on your keyboard to control a car to navigate around Gates building\n```\n\n\u003Cimg src=misc\u002Fhusky_nonviz.png width=\"600\">\n\nYou will be able to use ASWD keys on your keyboard to control a car to navigate around Gates building. A camera output will not be shown in this particular demo. 
\n\n```bash\npython examples\u002Fdemo\u002Fplay_husky_camera.py ### Use ASWD keys on your keyboard to control a car to navigate around Gates building, while RGB and depth camera outputs are also shown.\n```\n\u003Cimg src=misc\u002Fhusky_camera.png width=\"600\">\n\nYou will able to use ASWD keys on your keyboard to control a car to navigate around Gates building. You will also be able to see the RGB and depth camera outputs. \n\n```bash\npython examples\u002Ftrain\u002Ftrain_husky_navigate_ppo2.py ### Use PPO2 to train a car to navigate down the hallway in Gates building, using visual input from the camera.\n```\n\n\u003Cimg src=misc\u002Fhusky_train.png width=\"800\">\nBy running this command you will start training a husky robot to navigate in Gates building and go down the corridor with RGBD input. You will see some RL related statistics in the terminal after each episode.\n\n\n```bash\npython examples\u002Ftrain\u002Ftrain_ant_navigate_ppo1.py ### Use PPO1 to train an ant to navigate down the hallway in Gates building, using visual input from the camera.\n```\n\n\u003Cimg src=misc\u002Fant_train.png width=\"800\">\nBy running this command you will start training an ant to navigate in Gates building and go down the corridor with RGBD input. You will see some RL related statistics in the terminal after each episode.\n\n\n\nGibson Framerate\n----\nBelow is Gibson Environment's framerate benchmarked on different platforms. Please refer to [fps branch](https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv\u002Ftree\u002Ffps) for the code to reproduce the results.\n\u003Ctable class=\"table\">\n  \u003Ctr>\n    \u003Cth scope=\"row\">Platform\u003C\u002Fth>\n    \u003Ctd colspan=\"3\">Tested on Intel E5-2697 v4 + NVIDIA Tesla V100\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"col\">Resolution [nxn]\u003C\u002Fth>\n    \u003Cth scope=\"col\">128\u003C\u002Fth>\n    \u003Cth scope=\"col\">256\u003C\u002Fth>\n    \u003Cth scope=\"col\">512\u003C\u002Fth>\n \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">RGBD, pre network\u003Ccode>f\u003C\u002Fcode>\u003C\u002Fth>\n    \u003Ctd>109.1\u003C\u002Ftd>\n    \u003Ctd>58.5\u003C\u002Ftd>\n    \u003Ctd>26.5\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">RGBD, post network\u003Ccode>f\u003C\u002Fcode>\u003C\u002Fth>\n    \u003Ctd>77.7\u003C\u002Ftd>\n    \u003Ctd>30.6\u003C\u002Ftd>\n    \u003Ctd>14.5\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">RGBD, post small network\u003Ccode>f\u003C\u002Fcode>\u003C\u002Fth>\n    \u003Ctd>87.4\u003C\u002Ftd>\n    \u003Ctd>40.5\u003C\u002Ftd>\n    \u003Ctd>21.2\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">Depth only\u003C\u002Fth>\n    \u003Ctd>253.0\u003C\u002Ftd>\n    \u003Ctd>197.9\u003C\u002Ftd>\n    \u003Ctd>124.7\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">Surface Normal only\u003C\u002Fth>\n    \u003Ctd>207.7\u003C\u002Ftd>\n    \u003Ctd>129.7\u003C\u002Ftd>\n    \u003Ctd>57.2\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">Semantic only\u003C\u002Fth>\n    \u003Ctd>190.0\u003C\u002Ftd>\n    \u003Ctd>144.2\u003C\u002Ftd>\n    \u003Ctd>55.6\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">Non-Visual Sensory\u003C\u002Fth>\n    \u003Ctd>396.1\u003C\u002Ftd>\n    \u003Ctd>396.1\u003C\u002Ftd>\n    \u003Ctd>396.1\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\nWe also tested on \u003Ccode>Intel 
I7 7700 + NVIDIA GeForce GTX 1070Ti\u003C\u002Fcode> and \u003Ccode>Intel I7 6580k + NVIDIA GTX 1080Ti\u003C\u002Fcode> platforms. The FPS difference is within 10% on each task.\n\n\u003Ctable class=\"table\">\n    \u003Ctr>\n        \u003Cth scope=\"row\">Platform\u003C\u002Fth>\n        \u003Ctd colspan=\"6\">Multi-process FPS tested on Intel E5-2697 v4 + NVIDIA Tesla V100\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth scope=\"col\">Configuration\u003C\u002Fth>\n      \u003Cth scope=\"col\">512x512 episode sync\u003C\u002Fth>\n      \u003Cth scope=\"col\">512x512 frame sync\u003C\u002Fth>\n      \u003Cth scope=\"col\">256x256 episode sync\u003C\u002Fth>\n      \u003Cth scope=\"col\">256x256 frame sync\u003C\u002Fth>\n      \u003Cth scope=\"col\">128x128 episode sync\u003C\u002Fth>\n      \u003Cth scope=\"col\">128x128 frame sync\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth scope=\"row\">1 process\u003C\u002Fth>\n      \u003Ctd>12.8\u003C\u002Ftd>\n      \u003Ctd>12.02\u003C\u002Ftd>\n      \u003Ctd>32.98\u003C\u002Ftd>\n      \u003Ctd>32.98\u003C\u002Ftd>\n      \u003Ctd>52\u003C\u002Ftd>\n      \u003Ctd>52\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth scope=\"row\">2 processes\u003C\u002Fth>\n      \u003Ctd>23.4\u003C\u002Ftd>\n      \u003Ctd>20.9\u003C\u002Ftd>\n      \u003Ctd>60.89\u003C\u002Ftd>\n      \u003Ctd>53.63\u003C\u002Ftd>\n      \u003Ctd>86.1\u003C\u002Ftd>\n      \u003Ctd>101.8\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth scope=\"row\">4 processes\u003C\u002Fth>\n      \u003Ctd>42.4\u003C\u002Ftd>\n      \u003Ctd>31.97\u003C\u002Ftd>\n      \u003Ctd>105.26\u003C\u002Ftd>\n      \u003Ctd>76.23\u003C\u002Ftd>\n      \u003Ctd>97.6\u003C\u002Ftd>\n      \u003Ctd>145.9\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth scope=\"row\">8 processes\u003C\u002Fth>\n      \u003Ctd>72.5\u003C\u002Ftd>\n      \u003Ctd>48.1\u003C\u002Ftd>\n      \u003Ctd>138.5\u003C\u002Ftd>\n      \u003Ctd>97.72\u003C\u002Ftd>\n      \u003Ctd>113\u003C\u002Ftd>\n      \u003Ctd>151\u003C\u002Ftd>\n    \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\u003Cimg src=misc\u002Fmpi_fps.png width=\"600\">\n\nWeb User Interface\n----\nWhen running Gibson, you can start a web user interface with `python gibson\u002Futils\u002Fweb_ui.py 5552`. This is helpful when you cannot physically access the machine running gibson or you are running on a headless cloud environment. You need to change `mode` in the configuration file to `web_ui` to use the web user interface.\n\n\u003Cimg src=misc\u002Fweb_ui.png width=\"600\">\n\nRendering Semantics\n----\n\u003Cimg src=misc\u002Finstance_colorcoding_semantics.png width=\"600\">\n\nGibson can provide pixel-wise frame-by-frame semantic masks when the model is semantically annotated. As of now we have incorporated models from [Stanford 2D-3D-Semantics Dataset](http:\u002F\u002Fbuildingparser.stanford.edu\u002F) and [Matterport 3D](https:\u002F\u002Fniessner.github.io\u002FMatterport\u002F) for this purpose. You can access them within Gibson [here](https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv\u002Fblob\u002Fmaster\u002Fgibson\u002Fdata\u002FREADME.md#download-gibson-database-of-spaces). We refer you to the original dataset's reference for the list of their semantic classes and annotations. \n\nFor detailed instructions on rendering semantics in Gibson, see [semantic instructions](gibson\u002Futils\u002Fsemantics.md). 
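As a rough, assumed sketch (not a file from the repository), the semantics-related fields of an environment `yaml` config might look like the fragment below; see [Environment Configuration](#environment-configuration) for the full parameter list and treat the specific values as placeholders.\n\n```yaml\n# Partial config sketch for a semantically annotated model.\n# Remaining required fields (envname, initial_pos, output, speed, ...) follow examples\u002Fconfigs.\nmodel_id: space7                               # a semantically annotated space from the starter dataset (see below)\ndisplay_ui: true                               # show the aggregated pygame UI\nui_num: 3                                      # must equal the length of ui_components\nui_components: [RGB_FILLED, DEPTH, SEMANTICS]  # adds a semantic mask panel\nresolution: 512\n```\n\n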
As one example in the starter dataset that comes with installation, `space7` includes Stanford 2D-3D-Semantics style annotation. \n\n\u003C!---\n**Agreement**: If you choose to use the models from [Stanford 2D3DS](http:\u002F\u002F3dsemantics.stanford.edu\u002F) or [Matterport 3D](https:\u002F\u002Fniessner.github.io\u002FMatterport\u002F) for rendering semantics, please sign their respective license agreements. Stanford 2D3DS's agreement is inclued in Gibson Database's agreement and does not need to be signed again. For Matterport3D, please see [here](https:\u002F\u002Fniessner.github.io\u002FMatterport\u002F).\n--->\n\nRobotic Agents\n----\n\nGibson provides a base set of agents. See videos of these agents and their corresponding perceptual observation [here](http:\u002F\u002Fgibsonenv.stanford.edu\u002Fagents\u002F). \n\u003Cimg src=misc\u002Fagents.gif>\n\nTo enable (optionally) abstracting away low-level control and robot dynamics for high-level tasks, we also provide a set of practical and ideal controllers for each agent.\n\n| Agent Name     | DOF | Information      | Controller |\n|:-------------: | :-------------: |:-------------: |:-------------| \n| Mujoco Ant      | 8   | [OpenAI Link](https:\u002F\u002Fblog.openai.com\u002Froboschool\u002F) | Torque |\n| Mujoco Humanoid | 17  | [OpenAI Link](https:\u002F\u002Fblog.openai.com\u002Froboschool\u002F) | Torque |\n| Husky Robot     | 4   | [ROS](http:\u002F\u002Fwiki.ros.org\u002FRobots\u002FHusky), [Manufacturer](https:\u002F\u002Fwww.clearpathrobotics.com\u002F) | Torque, Velocity, Position |\n| Minitaur Robot  | 8   | [Robot Page](https:\u002F\u002Fwww.ghostrobotics.io\u002Fcopy-of-robots), [Manufacturer](https:\u002F\u002Fwww.ghostrobotics.io\u002F) | Sine Controller |\n| JackRabbot      | 2   | [Stanford Project Link](http:\u002F\u002Fcvgl.stanford.edu\u002Fprojects\u002Fjackrabbot\u002F) | Torque, Velocity, Position |\n| TurtleBot       | 2   | [ROS](http:\u002F\u002Fwiki.ros.org\u002FRobots\u002FTurtleBot), [Manufacturer](https:\u002F\u002Fwww.turtlebot.com\u002F) | Torque, Velocity, Position |\n| Quadrotor         | 6   | [Paper](https:\u002F\u002Frepository.upenn.edu\u002Fcgi\u002Fviewcontent.cgi?referer=https:\u002F\u002Fwww.google.com\u002F&httpsredir=1&article=1705&context=edissertations) | Position |\n\n\n### Starter Code \n\nMore demonstration examples can be found in `examples\u002Fdemo` folder\n\n| Example        | Explanation          |\n|:-------------: |:-------------| \n|`play_ant_camera.py`|Use 1234567890qwerty keys on your keyboard to control an ant to navigate around Gates building, while RGB and depth camera outputs are also shown. |\n|`play_ant_nonviz.py`| Use 1234567890qwerty keys on your keyboard to control an ant to navigate around Gates building.|\n|`play_drone_camera.py`| Use ASWDZX keys on your keyboard to control a drone to navigate around Gates building, while RGB and depth camera outputs are also shown.|\n|`play_drone_nonviz.py`| Use ASWDZX keys on your keyboard to control a drone to navigate around Gates building|\n|`play_humanoid_camera.py`| Use 1234567890qwertyui keys on your keyboard to control a humanoid to navigate around Gates building. Just kidding, controlling humaniod with keyboard is too difficult, you can only watch it fall. Press R to reset. RGB and depth camera outputs are also shown. |\n|`play_humanoid_nonviz.py`| Watch a humanoid fall. 
Press R to reset.|\n|`play_husky_camera.py`| Use ASWD keys on your keyboard to control a car to navigate around Gates building, while RGB and depth camera outputs are also shown.|\n|`play_husky_nonviz.py`| Use ASWD keys on your keyboard to control a car to navigate around Gates building|\n\nMore training code can be found in the `examples\u002Ftrain` folder.\n\n| Example        | Explanation          |\n|:-------------: |:-------------| \n|`train_husky_navigate_ppo2.py`|   Use PPO2 to train a car to navigate down the hallway in Gates building, using RGBD input from the camera.|\n|`train_husky_navigate_ppo1.py`|   Use PPO1 to train a car to navigate down the hallway in Gates building, using RGBD input from the camera.|\n|`train_ant_navigate_ppo1.py`| Use PPO1 to train an ant to navigate down the hallway in Gates building, using visual input from the camera. |\n|`train_ant_climb_ppo1.py`| Use PPO1 to train an ant to climb down the stairs in Gates building, using visual input from the camera.  |\n|`train_ant_gibson_flagrun_ppo1.py`| Use PPO1 to train an ant to chase a target (a red cube) in Gates building. Every time the ant gets to the target (or times out), the target will change position.|\n|`train_husky_gibson_flagrun_ppo1.py`|Use PPO1 to train a car to chase a target (a red cube) in Gates building. Every time the car gets to the target (or times out), the target will change position. |\n\nROS Configuration\n---------\n\nWe provide examples of configuring Gibson with ROS [here](examples\u002Fros\u002Fgibson-ros). We use turtlebot as an example; after a policy is trained in Gibson, it requires minimal changes to deploy onto a turtlebot. See [README](examples\u002Fros\u002Fgibson-ros) for more details.\n\n\n\n\nCoding Your RL Agent\n====\nYou can code your RL agent following our convention. The interface with our environment is very simple (see some examples at the end of this section).\n\nFirst, you can create an environment by creating an instance of classes in the `gibson\u002Fcore\u002Fenvs` folder. \n\n\n```python\nenv = AntNavigateEnv(is_discrete=False, config = config_file)\n```\n\nThen do one step of the simulation with `env.step`, and reset with `env.reset()`.\n```python\nobs, rew, env_done, info = env.step(action)\n```\n`obs` gives the observation of the robot. It is a dictionary with each component as a key-value pair. Its keys are specified by the user inside the config file. E.g. `obs['nonviz_sensor']` is proprioceptive sensor data, `obs['rgb_filled']` is rgb camera data.\n\n`rew` is the defined reward. `env_done` marks the end of one episode, for example, when the robot dies. \n`info` gives some additional information about this step; sometimes we use this to pass additional non-visual sensor values.\n\nWe mostly followed the [OpenAI gym](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym) convention when designing the interface of RL algorithms and the environment. In order to help users get started with the environment more quickly, we\nprovide some examples at [examples\u002Ftrain](examples\u002Ftrain). 
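\n\nAs an illustrative sketch only (the module path, the config path, the gym-style `action_space`, and the assumption that `env.reset()` returns an observation are placeholders to adapt to the actual classes in `gibson\u002Fcore\u002Fenvs` and the configs in `examples\u002Fconfigs`), a minimal interaction loop following the convention above might look like:\n\n```python\nfrom gibson.core.envs.ant_env import AntNavigateEnv  # module path is an assumption; check gibson\u002Fcore\u002Fenvs for the actual layout\n\nconfig_file = 'examples\u002Fconfigs\u002Fnavigate_ant.yaml'  # hypothetical path; use any config from examples\u002Fconfigs\nenv = AntNavigateEnv(is_discrete=False, config=config_file)\n\nobs = env.reset()  # assumed to return the initial observation dict\nfor _ in range(1000):\n    action = env.action_space.sample()  # random placeholder policy, assuming a gym-style action space\n    obs, rew, env_done, info = env.step(action)\n    # keys such as obs['nonviz_sensor'] or obs['rgb_filled'] depend on the 'output' list in the config\n    if env_done:\n        obs = env.reset()\n```\n\n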
The RL algorithms that we use are from [openAI baselines](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines) with some adaptation to work with hybrid visual and non-visual sensory data.\nIn particular, we used [PPO](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines\u002Ftree\u002Fmaster\u002Fbaselines\u002Fppo1) and a speed optimized version of [PPO](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines\u002Ftree\u002Fmaster\u002Fbaselines\u002Fppo2).\n\n\nEnvironment Configuration\n=================\nEach environment is configured with a `yaml` file. Examples of `yaml` files can be found in the `examples\u002Fconfigs` folder. Parameters for the file are explained below. For more information specific to the Bullet Physics engine, you can see the documentation [here](https:\u002F\u002Fdocs.google.com\u002Fdocument\u002Fd\u002F10sXEhzFRSnvFcl3XxNGhnD4N2SedqwdAvK3dsihxVUA\u002Fedit).\n\n| Argument name        | Example value           | Explanation  |\n|:-------------:|:-------------:| :-----|\n| envname      | AntClimbEnv | Environment name, make sure it is the same as the class name of the environment |\n| model_id      | space1-space8      |   Scene id, in beta release, choose from space1-space8 |\n| target_orn | [0, 0, 3.14]      |   Euler angle (in radians) target orientation for navigating, the reference frame is the world frame. For non-navigation tasks, this parameter is ignored. |\n|target_pos | [-7, 2.6, -1.5] | target position (in meters) for navigating, the reference frame is the world frame. For non-navigation tasks, this parameter is ignored. |\n|initial_orn | [0, 0, 3.14] | initial orientation (in radians) for navigating, the reference frame is the world frame |\n|initial_pos | [-7, 2.6, 0.5] | initial position (in meters) for navigating, the reference frame is the world frame|\n|fov | 1.57  | field of view for the camera, in radians |\n| use_filler | true\u002Ffalse  | use the neural network filler or not. It is recommended to leave this argument true. See [Gibson Environment website](http:\u002F\u002Fgibson.vision\u002F) for more information. |\n|display_ui | true\u002Ffalse  | Gibson has two ways of showing visual output, either in multiple windows, or aggregated into a single pygame window. This argument determines whether to show the pygame ui or not; in a production environment (training), you need to turn this off |\n|show_diagnostics | true\u002Ffalse  | show diagnostics (including fps, robot position and orientation, accumulated rewards) overlaid on the RGB image |\n|ui_num |2  | how many ui components to show, this should be the length of ui_components. |\n| ui_components | [RGB_FILLED, DEPTH]  | which ui components to show, choose from [RGB_FILLED, DEPTH, NORMAL, SEMANTICS, RGB_PREFILLED] |\n|output | [nonviz_sensor, rgb_filled, depth]  | output of the environment to the robot, choose from [nonviz_sensor, rgb_filled, depth]. These values are independent of `ui_components`, as `ui_components` determines what to show and `output` determines what the robot receives. |\n|resolution | 512 | resolution of the rgb\u002Fdepth image, choose from [128, 256, 512] |\n|initial_orn | [0, 0, 3.14] | initial orientation (in radians) for navigating, the reference frame is the world frame |\n|speed : timestep | 0.01 | length of one physics simulation step in seconds (as defined in [Bullet](https:\u002F\u002Fdocs.google.com\u002Fdocument\u002Fd\u002F10sXEhzFRSnvFcl3XxNGhnD4N2SedqwdAvK3dsihxVUA\u002Fedit)). 
For example, if timestep=0.01 sec, frameskip=10, and the environment is running at 100fps, it will be 10x real time. Note: setting timestep above 0.1 can cause instability in the current version of the Bullet simulator since an object should not travel faster than its own radius within one timestep. You can keep timestep at a low value but increase frameskip to simulate at a faster speed. See the [Bullet guide](https:\u002F\u002Fdocs.google.com\u002Fdocument\u002Fd\u002F10sXEhzFRSnvFcl3XxNGhnD4N2SedqwdAvK3dsihxVUA\u002Fedit) under \"discrete collision detection\" for more info.|\n|speed : frameskip | 10 | how many timesteps to skip when rendering frames. See the row above for an example. For tasks that do not require high-frequency control, you can set frameskip to a larger value to gain a further speedup. |\n|mode | gui\u002Fheadless\u002Fweb_ui  | gui or headless; in a production environment (training), you need to set this to headless. In gui mode, there will be visual output; in headless mode, there will be no visual output. In addition to that, if you set mode to web_ui, it will behave like headless mode but the visuals will be rendered to a web UI server. ([more information](#web-user-interface))|\n|verbose |true\u002Ffalse  | show diagnostics in the terminal |\n|fast_lq_render| true\u002Ffalse| if fast_lq_render is present in the yaml file, Gibson will use a smaller filler network; this will render faster but generate slightly lower-quality camera output. This option is useful for training RL agents fast. |\n\n#### Making Your Customized Environment\nGibson provides a set of methods for you to define your own environments. You can follow the existing environments inside `gibson\u002Fcore\u002Fenvs`.\n\n| Method name        | Usage           |\n|:------------------:|:---------------------------|\n| robot.render_observation(pose) | Render new observations based on pose, returns a dictionary. |\n| robot.get_observation() | Get observation at current pose. Needs to be called after robot.render_observation(pose). This does not induce extra computation. |\n| robot.get_position() | Get current robot position. |\n| robot.get_orientation() | Get current robot orientation. |\n| robot.eyes.get_position() | Get current robot perceptive camera position. |\n| robot.eyes.get_orientation() | Get current robot perceptive camera orientation. | \n| robot.get_target_position() | Get robot target position. |\n| robot.apply_action(action) | Apply action to robot. |  \n| robot.reset_new_pose(pos, orn) | Reset the robot to any pose. |\n| robot.dist_to_target() | Get current distance from robot to target. |\n\nGoggles: transferring the agent to real-world\n=================\nGibson includes a baked-in domain adaptation mechanism, named Goggles, for when an agent trained in Gibson is going to be deployed in the real world (i.e. operate based on images coming from an onboard camera). The mechanism is essentially a learned inverse function that alters the frames coming from a real camera to what they would look like if they were rendered via Gibson, and hence dissolves the domain gap. \n\n\u003Cimg src=http:\u002F\u002Fgibson.vision\u002Fpublic\u002Fimg\u002Ffigure4.jpg width=\"600\">\n\n\n**More details:** With all the imperfections in point cloud rendering, it has proven difficult to get completely photo-realistic rendering with neural network fixes. The remaining issues create a domain gap between the synthesized and real images. 
Therefore, we formulate the rendering problem as forming a joint space ensuring a correspondence between rendered and real images, rather than trying to (unsuccessfully) render images that are identical to real ones. This provides a deterministic pathway for traversing across these domains and hence undoing the gap. We add another network \"u\" for target image (I_t) and define the rendering loss to minimize the distance between f(I_s) and u(I_t), where \"f\" and \"I_s\" represent the filler neural network and point cloud rendering output, respectively (see the loss in above figure). We use the same network structure for f and u. The function u(I) is trained to alter the observation in real-world, I_t, to look like the corresponding I_s and consequently dissolve the gap. We named the u network goggles, as it resembles corrective lenses for the agent for deployment in real-world. Detailed formulation and discussion of the mechanism can be found in the paper. You can download the function u and apply it when you deploy your trained agent in real-world.\n\nIn order to use goggle, you will need preferably a camera with depth sensor, we provide an example [here](examples\u002Fros\u002Fgibson-ros\u002Fgoggle.py) for Kinect. The trained goggle functions are stored in `assets\u002Funfiller_{resolution}.pth`, and each one is paired with one filler function. You need to use the correct one depending on which filler function is used. If you don't have a camera with depth sensor, we also provide an example for RGB only [here](examples\u002Fdemo\u002Fgoggle_video.py).\n\n\nCitation\n=================\n\nIf you use Gibson Environment's software or database, please cite:\n```\n@inproceedings{xiazamirhe2018gibsonenv,\n  title={Gibson {Env}: real-world perception for embodied agents},\n  author={Xia, Fei and R. Zamir, Amir and He, Zhi-Yang and Sax, Alexander and Malik, Jitendra and Savarese, Silvio},\n  booktitle={Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},\n  year={2018},\n  organization={IEEE}\n}\n```\n","# GIBSON 环境：用于真实感知的具身主动智能体\n\n你不应该整天玩游戏，你的 AI 也不应该！我们构建了一个虚拟环境模拟器 Gibson，它通过真实世界的体验来学习感知。\n\n\u003Cimg src=misc\u002Fui.gif width=\"600\">\n\n**概述**：感知与主动性（即拥有一定程度的运动自由）密切相关。在物理世界中学习主动感知和传感器运动控制非常麻烦，因为现有算法太慢，无法高效地实时学习，而且机器人既脆弱又昂贵。这促使了在模拟环境中学习的兴起，但也引发了如何迁移到现实世界的问题。我们开发了 Gibson 环境，具有以下主要特点：\n\n**I.** 源自真实世界，并通过虚拟化真实空间反映其语义复杂性，  \n**II.** 内置迁移到现实世界的机制（Goggles 功能），以及  \n**III.** 智能体的具身化，并通过集成物理引擎（[Bulletphysics](http:\u002F\u002Fbulletphysics.org\u002Fwordpress\u002F)）使其受空间和物理约束。\n\n**命名**：Gibson 环境以《视觉感知的生态学方法》（1979 年）的作者 *James J. 
Gibson* 命名。“我们必须感知才能行动，但我们也必须行动才能感知”——JJ Gibson\n\n更多技术细节请参见 [网站](http:\u002F\u002Fgibson.vision\u002F) (http:\u002F\u002Fgibsonenv.stanford.edu\u002F)。本仓库旨在分发环境并提供安装\u002F运行说明。\n\n#### 论文\n**[\"Gibson Env: Real-World Perception for Embodied Agents\"](http:\u002F\u002Fgibson.vision\u002F)**，发表于 **CVPR 2018 [Spotlight Oral]**。\n\n\n[![Gibson 总结视频](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FStanfordVL_GibsonEnv_readme_2037e3528ff1.png)](https:\u002F\u002Fyoutu.be\u002FKdxuZjemyjc \"点击观看总结 Gibson 环境的视频！\")\n\n\n\n发布版本\n=================\n**这是 0.3.1 版本。欢迎并感谢提交错误报告、改进建议以及社区开发贡献。** [更新日志文件](misc\u002FCHANGELOG.md)。  \n\n\n数据库\n=================\n完整数据库包含 572 个空间和 1440 个楼层，可从 [这里](gibson\u002Fdata\u002FREADME.md) 下载。Gibson 中所有空间的多样化可视化效果可在此处查看 [here](http:\u002F\u002Fgibsonenv.stanford.edu\u002Fdatabase\u002F)。为了减轻用户核心资产下载包的负担，我们仅包含了一小部分（39 个）空间。用户可以下载其余空间并将它们添加到资产文件夹中。我们还集成了 [Stanford 2D3DS](http:\u002F\u002F3dsemantics.stanford.edu\u002F) 和 [Matterport 3D](https:\u002F\u002Fniessner.github.io\u002FMatterport\u002F) 作为单独的数据集，如果希望使用这些数据集与 Gibson 的模拟器（访问 [此处](gibson\u002Fdata\u002FREADME.md)）。\n\n目录\n=================\n\n   * [安装](#installation)\n        * [快速安装（docker）](#a-quick-installation-docker)\n        * [从源码构建](#b-building-from-source)\n        * [卸载](#uninstalling)\n   * [快速开始](#quick-start)\n        * [Gibson 帧率](#gibson-framerate)\n        * [Web 用户界面](#web-user-interface)\n        * [渲染语义](#rendering-semantics)\n        * [机器人智能体](#robotic-agents)\n        * [ROS 配置](#ros-configuration)\n   * [编写你的强化学习智能体](#coding-your-rl-agent)\n   * [环境配置](#environment-configuration)\n   * [Goggles：将智能体迁移到现实世界](#goggles-transferring-the-agent-to-real-world)\n   * [引用](#citation)\n\n\n\n安装\n=================\n\n#### 安装方法\n\n有两种方式安装 Gibson：A. 使用我们的 Docker 镜像（推荐）和 B. 从源码构建。\n\n#### 系统要求\n\n最低系统要求如下：\n\n对于 Docker 安装（A）：\n- Ubuntu 16.04\n- Nvidia GPU，显存 > 6.0GB\n- Nvidia 驱动 >= 384\n- CUDA >= 9.0，CuDNN >= v7\n\n对于从源码构建（B）：\n- Ubuntu >= 14.04\n- Nvidia GPU，显存 > 6.0GB\n- Nvidia 驱动 >= 375\n- CUDA >= 8.0，CuDNN >= v5\n\n#### 下载数据\n\n首先，我们的环境核心资产数据可从 [这里](https:\u002F\u002Fstorage.googleapis.com\u002Fgibson_scenes\u002Fassets_core_v2.tar.gz) 获取。您可以按照以下安装指南下载并正确设置它们。`gibson\u002Fassets` 文件夹存储运行 Gibson 环境所需的必要数据（智能体模型、环境等）。用户可以将更多环境文件添加到 `gibson\u002Fassets\u002Fdataset` 中，以便在更多环境中运行 Gibson。访问 [数据库 readme](gibson\u002Fdata\u002FREADME.md) 下载更多空间。在使用 Gibson 数据库之前，请签署 [许可协议](gibson\u002Fdata\u002FREADME.md#download)。\n\n\nA. 快速安装（docker）\n-----\n\n我们使用 Docker 分发软件，您需要先安装 [docker](https:\u002F\u002Fdocs.docker.com\u002Fengine\u002Finstallation\u002F) 和 [nvidia-docker2.0](https:\u002F\u002Fgithub.com\u002Fnvidia\u002Fnvidia-docker\u002Fwiki\u002FInstallation-(version-2.0))。\n\n运行 `docker run --runtime=nvidia --rm nvidia\u002Fcuda nvidia-smi` 验证您的安装。\n\n您可以选择 1. 从我们的 Docker 镜像拉取（推荐）或 2. 构建自己的 Docker 镜像。\n\n\n1. 从我们的 Docker 镜像拉取（推荐）\n\n```bash\n# 从 https:\u002F\u002Fstorage.googleapis.com\u002Fgibson_scenes\u002Fdataset.tar.gz 下载数据集\ndocker pull xf1280\u002Fgibson:0.3.1\nxhost +local:root\ndocker run --runtime=nvidia -ti --rm -e DISPLAY -v \u002Ftmp\u002F.X11-unix:\u002Ftmp\u002F.X11-unix -v \u003C主机路径到数据集文件夹>:\u002Froot\u002Fmount\u002Fgibson\u002Fgibson\u002Fassets\u002Fdataset xf1280\u002Fgibson:0.3.1\n```\n\n2. 构建自己的 Docker 镜像 \n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv.git\ncd GibsonEnv\n.\u002Fdownload.sh # 此脚本下载资产数据文件并解压缩到 gibson\u002Fassets 文件夹\ndocker build . 
-t gibson ### 在 Docker 内完成构建，默认情况下，数据集不会包含在 Docker 镜像中\nxhost +local:root ## 启用 Docker 显示\n```\n如果安装成功，您应该能够运行 `docker run --runtime=nvidia -ti --rm -e DISPLAY -v \u002Ftmp\u002F.X11-unix:\u002Ftmp\u002F.X11-unix -v \u003C主机路径到数据集文件夹>:\u002Froot\u002Fmount\u002Fgibson\u002Fgibson\u002Fassets\u002Fdataset gibson` 创建一个容器。请注意，我们未将数据集文件包含在 Docker 镜像中以保持镜像轻量化，因此在启动容器时需要将其挂载到容器中。\n\n#### 关于无头服务器部署的注意事项\n\nGibson Env 支持在无头服务器上部署并通过 `x11vnc` 进行远程访问。\n您可以使用上述 Dockerfile 构建自己的 Docker 镜像。\n在无头服务器上运行 Gibson 的说明（需要 X 服务器运行）：\n\n1. 按照入门指南安装 `nvidia-docker2` 依赖项。使用 `sudo apt-get install x11vnc` 安装 `x11vnc`。\n2. 在主机上运行 X server，并在 DISPLAY :0 上运行 `x11vnc`。\n3. 运行以下命令启动 Docker 容器：\n   ```bash\n   docker run --runtime=nvidia -ti --rm -e DISPLAY -v \u002Ftmp\u002F.X11-unix\u002FX0:\u002Ftmp\u002F.X11-unix\u002FX0 -v \u003C主机数据集文件夹路径>:\u002Froot\u002Fmount\u002Fgibson\u002Fgibson\u002Fassets\u002Fdataset \u003Cgibson 镜像名称>\n   ```\n4. 在 Docker 内通过 `python \u003Cgibson 示例或训练脚本>` 运行 Gibson。\n5. 访问 `主机:5900`，你应该能够看到图形用户界面（GUI）。\n\n如果你没有运行 X server，仍然可以运行 Gibson，请参阅[此指南](https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv\u002Fwiki\u002FRunning-GibsonEnv-on-headless-server)了解更多信息。\n\nB. 从源码构建\n-----\n如果你不想使用我们的 Docker 镜像，也可以在本地安装 Gibson。这需要安装一些依赖项。\n\n首先，确保你已安装 Nvidia 驱动程序和 CUDA。如果从源码安装，CUDA 9 并不是必需的，因为这是为 nvidia-docker 2.0 准备的。然后，让我们安装一些依赖项：\n\n```bash\napt-get update \napt-get install libglew-dev libglm-dev libassimp-dev xorg-dev libglu1-mesa-dev libboost-dev \\\n\t\tmesa-common-dev freeglut3-dev libopenmpi-dev cmake golang libjpeg-turbo8-dev wmctrl \\\n\t\txdotool libzmq3-dev zlib1g-dev\n```\t\n\n安装所需的深度学习库：推荐使用 Python 3.5。你可以先创建一个 Python 3.5 环境。\n\n```bash\nconda create -n py35 python=3.5 anaconda \nsource activate py35 # 后续步骤需要在 conda 环境中执行\nconda install -c conda-forge opencv\npip install http:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu90\u002Ftorch-0.3.1-cp35-cp35m-linux_x86_64.whl \npip install torchvision==0.2.0\npip install tensorflow==1.3\n```\n克隆代码仓库，下载数据并构建：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv.git\ncd GibsonEnv\n.\u002Fdownload.sh # 此脚本下载资产数据文件并解压到 gibson\u002Fassets 文件夹\n.\u002Fbuild.sh build_local ### 构建 C++ 和 CUDA 文件\npip install -e . 
### 安装 Python 库\n```\n\n如果需要运行训练示例，请安装 OpenAI baselines。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffxia22\u002Fbaselines.git\npip install -e baselines\n```\n\n卸载\n----\n\n卸载 Gibson 很简单。如果你是通过 Docker 安装的，只需运行以下命令清理镜像：\n```bash\ndocker images -a | grep \"gibson\" | awk '{print $3}' | xargs docker rmi\n```\n如果你是从源码安装的，则通过 `pip uninstall gibson` 卸载。\n\n\n快速开始\n=================\n\n首先在主机上运行 `xhost +local:root` 以启用显示。你可能需要先运行 `export DISPLAY=:0`。通过以下命令进入 Docker 容器：\n```bash\ndocker run --runtime=nvidia -ti --rm -e DISPLAY -v \u002Ftmp\u002F.X11-unix:\u002Ftmp\u002F.X11-unix -v \u003C主机数据集文件夹路径>:\u002Froot\u002Fmount\u002Fgibson\u002Fgibson\u002Fassets\u002Fdataset gibson\n```\n进入后，你会获得一个交互式 shell。现在可以运行一些示例。\n\n如果你是从源码安装的，可以直接运行以下命令而无需使用 Docker。\n\n\n```bash\npython examples\u002Fdemo\u002Fplay_husky_nonviz.py ### 使用键盘上的 ASWD 键控制一辆车在 Gates 建筑周围导航\n```\n\n\u003Cimg src=misc\u002Fhusky_nonviz.png width=\"600\">\n\n在此示例中，你可以使用键盘上的 ASWD 键控制一辆车在 Gates 建筑周围导航。此示例不会显示相机输出。\n\n```bash\npython examples\u002Fdemo\u002Fplay_husky_camera.py ### 使用键盘上的 ASWD 键控制一辆车在 Gates 建筑周围导航，同时显示 RGB 和深度相机输出\n```\n\u003Cimg src=misc\u002Fhusky_camera.png width=\"600\">\n\n你可以使用键盘上的 ASWD 键控制一辆车在 Gates 建筑周围导航。你还可以看到 RGB 和深度相机的输出。\n\n```bash\npython examples\u002Ftrain\u002Ftrain_husky_navigate_ppo2.py ### 使用 PPO2 训练一辆车通过 Gates 建筑的走廊，使用相机的视觉输入\n```\n\n\u003Cimg src=misc\u002Fhusky_train.png width=\"800\">\n运行此命令后，你将开始训练一个 Husky 机器人在 Gates 建筑中导航并通过走廊，使用 RGBD 输入。每轮结束后，你将在终端中看到一些强化学习相关的统计数据。\n\n\n```bash\npython examples\u002Ftrain\u002Ftrain_ant_navigate_ppo1.py ### 使用 PPO1 训练一只蚂蚁通过 Gates 建筑的走廊，使用相机的视觉输入\n```\n\n\u003Cimg src=misc\u002Fant_train.png width=\"800\">\n运行此命令后，你将开始训练一只蚂蚁在 Gates 建筑中导航并通过走廊，使用 RGBD 输入。每轮结束后，你将在终端中看到一些强化学习相关的统计数据。\n\n\n\nGibson 帧率\n----\n以下是 Gibson 环境在不同平台上的帧率基准测试结果。请参考 [fps 分支](https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv\u002Ftree\u002Ffps) 获取重现结果的代码。\n\u003Ctable class=\"table\">\n  \u003Ctr>\n    \u003Cth scope=\"row\">平台\u003C\u002Fth>\n    \u003Ctd colspan=\"3\">测试环境：Intel E5-2697 v4 + NVIDIA Tesla V100\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"col\">分辨率 [nxn]\u003C\u002Fth>\n    \u003Cth scope=\"col\">128\u003C\u002Fth>\n    \u003Cth scope=\"col\">256\u003C\u002Fth>\n    \u003Cth scope=\"col\">512\u003C\u002Fth>\n \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">RGBD，网络前\u003Ccode>f\u003C\u002Fcode>\u003C\u002Fth>\n    \u003Ctd>109.1\u003C\u002Ftd>\n    \u003Ctd>58.5\u003C\u002Ftd>\n    \u003Ctd>26.5\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">RGBD，网络后\u003Ccode>f\u003C\u002Fcode>\u003C\u002Fth>\n    \u003Ctd>77.7\u003C\u002Ftd>\n    \u003Ctd>30.6\u003C\u002Ftd>\n    \u003Ctd>14.5\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">RGBD，小型网络后\u003Ccode>f\u003C\u002Fcode>\u003C\u002Fth>\n    \u003Ctd>87.4\u003C\u002Ftd>\n    \u003Ctd>40.5\u003C\u002Ftd>\n    \u003Ctd>21.2\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">仅深度\u003C\u002Fth>\n    \u003Ctd>253.0\u003C\u002Ftd>\n    \u003Ctd>197.9\u003C\u002Ftd>\n    \u003Ctd>124.7\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">仅表面法线\u003C\u002Fth>\n    \u003Ctd>207.7\u003C\u002Ftd>\n    \u003Ctd>129.7\u003C\u002Ftd>\n    \u003Ctd>57.2\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth scope=\"row\">仅语义\u003C\u002Fth>\n    \u003Ctd>190.0\u003C\u002Ftd>\n    \u003Ctd>144.2\u003C\u002Ftd>\n    \u003Ctd>55.6\u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Cth 
scope=\"row\">非视觉感官\u003C\u002Fth>\n    \u003Ctd>396.1\u003C\u002Ftd>\n    \u003Ctd>396.1\u003C\u002Ftd>\n    \u003Ctd>396.1\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n我们还在 \u003Ccode>Intel I7 7700 + NVIDIA GeForce GTX 1070Ti\u003C\u002Fcode> 和 \u003Ccode>Intel I7 6580k + NVIDIA GTX 1080Ti\u003C\u002Fcode> 平台上进行了测试。每个任务的帧率差异在 10% 以内。\n\n\u003Ctable class=\"table\">\n    \u003Ctr>\n        \u003Cth scope=\"row\">平台 (Platform)\u003C\u002Fth>\n        \u003Ctd colspan=\"6\">在 Intel E5-2697 v4 + NVIDIA Tesla V100 上测试的多进程 FPS\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth scope=\"col\">配置 (Configuration)\u003C\u002Fth>\n      \u003Cth scope=\"col\">512x512 episode 同步\u003C\u002Fth>\n      \u003Cth scope=\"col\">512x512 frame 同步\u003C\u002Fth>\n      \u003Cth scope=\"col\">256x256 episode 同步\u003C\u002Fth>\n      \u003Cth scope=\"col\">256x256 frame 同步\u003C\u002Fth>\n      \u003Cth scope=\"col\">128x128 episode 同步\u003C\u002Fth>\n      \u003Cth scope=\"col\">128x128 frame 同步\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth scope=\"row\">1 进程\u003C\u002Fth>\n      \u003Ctd>12.8\u003C\u002Ftd>\n      \u003Ctd>12.02\u003C\u002Ftd>\n      \u003Ctd>32.98\u003C\u002Ftd>\n      \u003Ctd>32.98\u003C\u002Ftd>\n      \u003Ctd>52\u003C\u002Ftd>\n      \u003Ctd>52\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth scope=\"row\">2 进程\u003C\u002Fth>\n      \u003Ctd>23.4\u003C\u002Ftd>\n      \u003Ctd>20.9\u003C\u002Ftd>\n      \u003Ctd>60.89\u003C\u002Ftd>\n      \u003Ctd>53.63\u003C\u002Ftd>\n      \u003Ctd>86.1\u003C\u002Ftd>\n      \u003Ctd>101.8\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth scope=\"row\">4 进程\u003C\u002Fth>\n      \u003Ctd>42.4\u003C\u002Ftd>\n      \u003Ctd>31.97\u003C\u002Ftd>\n      \u003Ctd>105.26\u003C\u002Ftd>\n      \u003Ctd>76.23\u003C\u002Ftd>\n      \u003Ctd>97.6\u003C\u002Ftd>\n      \u003Ctd>145.9\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth scope=\"row\">8 进程\u003C\u002Fth>\n      \u003Ctd>72.5\u003C\u002Ftd>\n      \u003Ctd>48.1\u003C\u002Ftd>\n      \u003Ctd>138.5\u003C\u002Ftd>\n      \u003Ctd>97.72\u003C\u002Ftd>\n      \u003Ctd>113\u003C\u002Ftd>\n      \u003Ctd>151\u003C\u002Ftd>\n    \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\u003Cimg src=misc\u002Fmpi_fps.png width=\"600\">\n\nWeb 用户界面\n----\n运行 Gibson 时，可以通过 `python gibson\u002Futils\u002Fweb_ui.py python gibson\u002Futils\u002Fweb_ui.py 5552` 启动一个 Web 用户界面。这在您无法物理访问运行 Gibson 的机器或在无头云环境中运行时非常有用。您需要将配置文件中的 `mode` 更改为 `web_ui` 才能使用 Web 用户界面。\n\n\u003Cimg src=misc\u002Fweb_ui.png width=\"600\">\n\n渲染语义 (Rendering Semantics)\n----\n\u003Cimg src=misc\u002Finstance_colorcoding_semantics.png width=\"600\">\n\n当模型经过语义标注后，Gibson 可以提供逐像素、逐帧的语义掩码。目前我们已经整合了来自 [Stanford 2D-3D-Semantics 数据集](http:\u002F\u002Fbuildingparser.stanford.edu\u002F) 和 [Matterport 3D](https:\u002F\u002Fniessner.github.io\u002FMatterport\u002F) 的模型用于此目的。您可以在 Gibson 中通过 [这里](https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv\u002Fblob\u002Fmaster\u002Fgibson\u002Fdata\u002FREADME.md#download-gibson-database-of-spaces) 访问它们。有关其语义类别和标注的列表，请参考原始数据集的文档。\n\n有关在 Gibson 中渲染语义的详细说明，请参阅 [语义说明](gibson\u002Futils\u002Fsemantics.md)。例如，在安装时附带的入门数据集中，`space7` 包含 Stanford 2D-3D-Semantics 风格的标注。\n\n\u003C!---\n**协议**: 如果您选择使用 [Stanford 2D3DS](http:\u002F\u002F3dsemantics.stanford.edu\u002F) 或 [Matterport 3D](https:\u002F\u002Fniessner.github.io\u002FMatterport\u002F) 的模型进行语义渲染，请签署各自的许可协议。Stanford 2D3DS 的协议已包含在 Gibson 数据库的协议中，无需再次签署。对于 Matterport3D，请参阅 
[这里](https:\u002F\u002Fniessner.github.io\u002FMatterport\u002F)。\n--->\n\n机器人代理 (Robotic Agents)\n----\n\nGibson 提供了一组基础的代理。这些代理及其对应的感知观察视频可以在此处查看 [here](http:\u002F\u002Fgibsonenv.stanford.edu\u002Fagents\u002F)。\n\u003Cimg src=misc\u002Fagents.gif>\n\n为了（可选地）抽象化低级控制和机器人动力学以便于高级任务，我们还为每个代理提供了一组实用和理想的控制器。\n\n| 代理名称 (Agent Name) | 自由度 (DOF) | 信息 (Information) | 控制器 (Controller) |\n|:-------------: | :-------------: |:-------------: |:-------------| \n| Mujoco Ant      | 8   | [OpenAI 链接](https:\u002F\u002Fblog.openai.com\u002Froboschool\u002F) | 力矩 (Torque) |\n| Mujoco Humanoid | 17  | [OpenAI 链接](https:\u002F\u002Fblog.openai.com\u002Froboschool\u002F) | 力矩 (Torque) |\n| Husky 机器人     | 4   | [ROS](http:\u002F\u002Fwiki.ros.org\u002FRobots\u002FHusky), [制造商](https:\u002F\u002Fwww.clearpathrobotics.com\u002F) | 力矩、速度、位置 (Torque, Velocity, Position) |\n| Minitaur 机器人  | 8   | [机器人页面](https:\u002F\u002Fwww.ghostrobotics.io\u002Fcopy-of-robots), [制造商](https:\u002F\u002Fwww.ghostrobotics.io\u002F) | 正弦控制器 (Sine Controller) |\n| JackRabbot      | 2   | [斯坦福项目链接](http:\u002F\u002Fcvgl.stanford.edu\u002Fprojects\u002Fjackrabbot\u002F) | 力矩、速度、位置 (Torque, Velocity, Position) |\n| TurtleBot       | 2   | [ROS](http:\u002F\u002Fwiki.ros.org\u002FRobots\u002FTurtleBot), [制造商](https:\u002F\u002Fwww.turtlebot.com\u002F) | 力矩、速度、位置 (Torque, Velocity, Position) |\n| 四轴飞行器 (Quadrotor)         | 6   | [论文](https:\u002F\u002Frepository.upenn.edu\u002Fcgi\u002Fviewcontent.cgi?referer=https:\u002F\u002Fwww.google.com\u002F&httpsredir=1&article=1705&context=edissertations) | 位置 (Position) |\n\n\n\n\n### 入门代码 (Starter Code)\n\n更多演示示例可以在 `examples\u002Fdemo` 文件夹中找到。\n\n| 示例 (Example)        | 说明 (Explanation)          |\n|:-------------: |:-------------| \n|`play_ant_camera.py`| 使用键盘上的 1234567890qwerty 键控制蚂蚁在 Gates 大楼周围导航，同时显示 RGB 和深度相机输出。|\n|`play_ant_nonviz.py`| 使用键盘上的 1234567890qwerty 键控制蚂蚁在 Gates 大楼周围导航。|\n|`play_drone_camera.py`| 使用键盘上的 ASWDZX 键控制无人机在 Gates 大楼周围导航，同时显示 RGB 和深度相机输出。|\n|`play_drone_nonviz.py`| 使用键盘上的 ASWDZX 键控制无人机在 Gates 大楼周围导航。|\n|`play_humanoid_camera.py`| 使用键盘上的 1234567890qwertyui 键控制人形机器人在 Gates 大楼周围导航。开个玩笑，用键盘控制人形机器人太难了，你只能看着它摔倒。按 R 重置。RGB 和深度相机输出也会显示。|\n|`play_humanoid_nonviz.py`| 看着人形机器人摔倒。按 R 重置。|\n|`play_husky_camera.py`| 使用键盘上的 ASWD 键控制一辆车在 Gates 大楼周围导航，同时显示 RGB 和深度相机输出。|\n|`play_husky_nonviz.py`| 使用键盘上的 ASWD 键控制一辆车在 Gates 大楼周围导航。|\n\n更多训练代码可以在 `examples\u002Ftrain` 文件夹中找到。\n\n| 示例 (Example)        | 说明 (Explanation)          |\n|:-------------: |:-------------| \n|`train_husky_navigate_ppo2.py`| 使用 PPO2 训练一辆车沿着 Gates 大楼的走廊导航，使用相机的 RGBD 输入。|\n|`train_husky_navigate_ppo1.py`| 使用 PPO1 训练一辆车沿着 Gates 大楼的走廊导航，使用相机的 RGBD 输入。|\n|`train_ant_navigate_ppo1.py`| 使用 PPO1 训练一只蚂蚁沿着 Gates 大楼的走廊导航，使用相机的视觉输入。|\n|`train_ant_climb_ppo1.py`| 使用 PPO1 训练一只蚂蚁沿着 Gates 大楼的楼梯下楼，使用相机的视觉输入。|\n|`train_ant_gibson_flagrun_ppo1.py`| 使用 PPO1 训练一只蚂蚁在 Gates 大楼内追逐目标（一个红色立方体）。每次蚂蚁到达目标（或超时），目标会改变位置。|\n|`train_husky_gibson_flagrun_ppo1.py`| 使用 PPO1 训练一辆车在 Gates 大楼内追逐目标（一个红色立方体）。每次车到达目标（或超时），目标会改变位置。|\n\nROS 配置\n---------\n\n我们在此处提供了使用 ROS [配置 Gibson 的示例](examples\u002Fros\u002Fgibson-ros)。我们以 turtlebot 为例，在 Gibson 中训练策略后，只需进行极少的修改即可将其部署到 turtlebot 上。更多详细信息，请参阅 [README](examples\u002Fros\u002Fgibson-ros)。\n\n\n\n编码你的强化学习（RL）智能体\n====\n你可以按照我们的约定来编写你的 RL 智能体。与我们的环境交互的接口非常简单（请参见本节末尾的一些示例）。\n\n首先，可以通过创建 `gibson\u002Fcore\u002Fenvs` 文件夹中的类实例来创建一个环境。\n\n\n```python\nenv = AntNavigateEnv(is_discrete=False, config = config_file)\n```\n\n然后通过 `env.step` 执行一步模拟，并通过 `env.reset()` 重置环境。\n```python\nobs, rew, env_done, info = 
env.step(action)\n```\n`obs` 提供了机器人的观测值。它是一个字典，每个组件作为键值对存在。其键由用户在配置文件中指定。例如，`obs['nonviz_sensor']` 是本体感受传感器数据，`obs['rgb_filled']` 是 RGB 相机数据。\n\n`rew` 是定义的奖励。`env_done` 标志着一个回合的结束，例如当机器人“死亡”时。`info` 提供了此步骤的一些附加信息；有时我们用它传递额外的非视觉传感器值。\n\n在设计 RL 算法和环境的接口时，我们主要遵循了 [OpenAI gym](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgym) 的约定。为了帮助用户更快上手环境，我们在 [examples\u002Ftrain](examples\u002Ftrain) 提供了一些示例。我们使用的 RL 算法来自 [openAI baselines](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines)，并经过一些调整以适配混合视觉和非视觉感官数据。\n具体来说，我们使用了 [PPO](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines\u002Ftree\u002Fmaster\u002Fbaselines\u002Fppo1) 和速度优化版本的 [PPO](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fbaselines\u002Ftree\u002Fmaster\u002Fbaselines\u002Fppo2)。\n\n\n环境配置\n=================\n每个环境都通过一个 `yaml` 文件进行配置。`yaml` 文件的示例可以在 `examples\u002Fconfigs` 文件夹中找到。以下是对文件参数的解释。有关 Bullet 物理引擎的更多信息，请参阅文档 [此处](https:\u002F\u002Fdocs.google.com\u002Fdocument\u002Fd\u002F10sXEhzFRSnvFcl3XxNGhnD4N2SedqwdAvK3dsihxVUA\u002Fedit)。\n\n| 参数名称        | 示例值           | 解释  |\n|:-------------:|:-------------:| :-----|\n| envname      | AntClimbEnv | 环境名称，确保它与环境的类名相同 |\n| model_id      | space1-space8      |   场景 ID，在测试版发布中，从 space1 到 space8 中选择 |\n| target_orn | [0, 0, 3.14]      |   导航的目标方向（欧拉角，单位为弧度），参考系为世界坐标系。对于非导航任务，此参数将被忽略。 |\n|target_pos | [-7, 2.6, -1.5] | 导航的目标位置（单位为米），参考系为世界坐标系。对于非导航任务，此参数将被忽略。 |\n|initial_orn | [0, 0, 3.14] | 导航的初始方向（单位为弧度），参考系为世界坐标系 |\n|initial_pos | [-7, 2.6, 0.5] | 导航的初始位置（单位为米），参考系为世界坐标系|\n|fov | 1.57  | 相机的视场角（单位为弧度） |\n| use_filler | true\u002Ffalse  | 是否使用神经网络填充器。建议将此参数保持为 true。更多信息请参阅 [Gibson 环境网站](http:\u002F\u002Fgibson.vision\u002F)。 |\n|display_ui | true\u002Ffalse  | Gibson 有两种显示视觉输出的方式，要么在多个窗口中显示，要么将它们聚合到一个 pygame 窗口中。此参数决定是否显示 pygame UI，如果是在生产环境（训练）中，则需要将其关闭。 |\n|show_diagnostics | true\u002Ffalse  | 在 RGB 图像上叠加显示诊断信息（包括帧率、机器人位置和方向、累计奖励）。 |\n|ui_num |2  | 要显示的 UI 组件数量，这应该是 ui_components 的长度。 |\n| ui_components | [RGB_FILLED, DEPTH]  | UI 组件有哪些，可从 [RGB_FILLED, DEPTH, NORMAL, SEMANTICS, RGB_PREFILLED] 中选择。 |\n|output | [nonviz_sensor, rgb_filled, depth]  | 环境向机器人提供的输出，可从 [nonviz_sensor, rgb_filled, depth] 中选择。这些值独立于 `ui_components`，因为 `ui_components` 决定要显示的内容，而 `output` 决定机器人接收到的内容。 |\n|resolution | 512 | 可从 [128, 256, 512] 中选择 RGB\u002F深度图像的分辨率。 |\n|initial_orn | [0, 0, 3.14] | 导航的初始方向（单位为弧度），参考系为世界坐标系。 |\n|speed : timestep | 0.01 | 单个物理模拟步长的时间（以秒为单位，定义见 [Bullet](https:\u002F\u002Fdocs.google.com\u002Fdocument\u002Fd\u002F10sXEhzFRSnvFcl3XxNGhnD4N2SedqwdAvK3dsihxVUA\u002Fedit)）。例如，如果 timestep=0.01 秒，frameskip=10，且环境以 100fps 运行，则模拟速度将是实时的 10 倍。注意：在当前版本的 Bullet 模拟器中，将 timestep 设置为高于 0.1 可能会导致不稳定，因为物体在一个时间步内不应移动超过其自身半径的距离。可以将 timestep 保持在较低值，但增加 frameskip 以实现更快的模拟速度。更多信息请参阅 [Bullet 指南](https:\u002F\u002Fdocs.google.com\u002Fdocument\u002Fd\u002F10sXEhzFRSnvFcl3XxNGhnD4N2SedqwdAvK3dsihxVUA\u002Fedit) 中的“离散碰撞检测”。|\n|speed : frameskip | 10 | 渲染帧时跳过的时间步数。请参见上一行的示例。对于不需要高频控制的任务，可以将 frameskip 设置为更大的值以进一步加速。 |\n|mode | gui\u002Fheadless\u002Fweb_ui  | gui 或 headless，如果是在生产环境（训练）中，则需要将其设置为 headless。在 gui 模式下会有视觉输出；在 headless 模式下则没有视觉输出。此外，如果将模式设置为 web_ui，它的行为类似于 headless 模式，但视觉内容会渲染到 Web UI 服务器上。（[更多信息](#web-user-interface)）|\n|verbose |true\u002Ffalse  | 在终端中显示诊断信息 |\n|fast_lq_render| true\u002Ffalse| 如果 yaml 文件中有 fast_lq_render，Gibson 将使用较小的填充网络，这将加快渲染速度，但生成的相机输出质量会略低。此选项对于快速训练 RL 智能体很有用。 |\n\n#### 创建自定义环境\nGibson 提供了一组方法供你定义自己的环境。你可以参考 `gibson\u002Fcore\u002Fenvs` 中现有的环境。\n\n| 方法名称             | 用途           |\n|:------------------:|:---------------------------|\n| 
robot.render_observation(pose) | 根据姿态 (pose) 渲染新的观测结果，返回一个字典。 |\n| robot.get_observation() | 获取当前姿态下的观测结果。需要在调用 robot.render_observation(pose) 后使用。此操作不会引入额外的计算。 |\n| robot.get_position() | 获取机器人当前位置。 |\n| robot.get_orientation() | 获取机器人当前方向。 |\n| robot.eyes.get_position() | 获取机器人感知相机的当前位置。 |\n| robot.eyes.get_orientation() | 获取机器人感知相机的当前方向。 | \n| robot.get_target_position() | 获取机器人目标位置。 |\n| robot.apply_action(action) | 对机器人应用动作 (action)。 |  \n| robot.reset_new_pose(pos, orn) | 将机器人重置为任意姿态 (pose)。 |\n| robot.dist_to_target() | 获取机器人到目标的当前距离。 |\n\nGoggles：将智能体迁移到现实世界\n=================\nGibson 包含一种内置的域适应机制，名为 Goggles（护目镜），用于将在 Gibson 中训练的智能体部署到现实世界时（即基于车载摄像头传来的图像进行操作）。该机制本质上是一个学习到的逆函数，可以将来自真实摄像头的画面调整为看起来像是通过 Gibson 渲染的效果，从而消除域差异。\n\n\u003Cimg src=http:\u002F\u002Fgibson.vision\u002Fpublic\u002Fimg\u002Ffigure4.jpg width=\"600\">\n\n\n**更多细节：** 尽管点云渲染存在各种不完美之处，但事实证明，通过神经网络修复实现完全逼真的渲染非常困难。剩余的问题导致合成图像与真实图像之间存在域差异。因此，我们将渲染问题定义为构建一个联合空间，确保渲染图像与真实图像之间的对应关系，而不是试图（徒劳地）渲染出与真实图像完全相同的图像。这提供了一种确定性的跨域路径，从而消除了域差异。我们为目标图像 (I_t) 添加了另一个网络 \"u\"，并定义了渲染损失以最小化 f(I_s) 和 u(I_t) 之间的距离，其中 \"f\" 和 \"I_s\" 分别表示填充神经网络和点云渲染输出（见上图中的损失公式）。我们对 f 和 u 使用了相同的网络结构。函数 u(I) 被训练用于调整现实世界中的观测值 I_t，使其看起来像对应的 I_s，从而消除域差异。我们将 u 网络命名为 Goggles（护目镜），因为它类似于智能体在现实世界中部署时的矫正镜片。该机制的详细公式和讨论请参阅论文。您可以下载函数 u 并在现实世界中部署训练好的智能体时应用它。\n\n为了使用 Goggle，您最好配备一个带深度传感器的摄像头，我们在此提供了一个 Kinect 的示例 [这里](examples\u002Fros\u002Fgibson-ros\u002Fgoggle.py)。训练好的 Goggle 函数存储在 `assets\u002Funfiller_{resolution}.pth` 中，每个函数都与一个填充函数配对。您需要根据所使用的填充函数选择正确的 Goggle 函数。如果您没有带深度传感器的摄像头，我们也提供了一个仅支持 RGB 的示例 [这里](examples\u002Fdemo\u002Fgoggle_video.py)。\n\n\n引用\n=================\n\n如果您使用 Gibson 环境的软件或数据库，请引用：\n```\n@inproceedings{xiazamirhe2018gibsonenv,\n  title={Gibson {Env}: real-world perception for embodied agents},\n  author={Xia, Fei and R. Zamir, Amir and He, Zhi-Yang and Sax, Alexander and Malik, Jitendra and Savarese, Silvio},\n  booktitle={Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on},\n  year={2018},\n  organization={IEEE}\n}\n```","# GibsonEnv 快速上手指南\n\nGibsonEnv 是一个用于具身主动智能体的虚拟环境模拟器，提供基于真实世界的感知学习体验。\n\n---\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: Ubuntu 16.04 或更高版本\n- **GPU**: NVIDIA GPU，显存 > 6.0GB\n- **驱动**: NVIDIA 驱动 >= 384（推荐使用最新稳定版）\n- **CUDA\u002FCuDNN**: CUDA >= 9.0，CuDNN >= v7\n\n### 前置依赖\n安装以下基础依赖：\n```bash\napt-get update \napt-get install libglew-dev libglm-dev libassimp-dev xorg-dev libglu1-mesa-dev libboost-dev \\\n\t\tmesa-common-dev freeglut3-dev libopenmpi-dev cmake golang libjpeg-turbo8-dev wmctrl \\\n\t\txdotool libzmq3-dev zlib1g-dev\n```\n\n安装深度学习相关库（推荐使用 Python 3.5）：\n```bash\nconda create -n py35 python=3.5 anaconda \nsource activate py35\nconda install -c conda-forge opencv\npip install http:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu90\u002Ftorch-0.3.1-cp35-cp35m-linux_x86_64.whl \npip install torchvision==0.2.0\npip install tensorflow==1.3\n```\n\n---\n\n## 安装步骤\n\n### 方法一：使用 Docker（推荐）\n1. 安装 Docker 和 NVIDIA Docker 2.0：\n   ```bash\n   sudo apt-get install docker.io\n   sudo apt-get install nvidia-docker2\n   ```\n   验证安装是否成功：\n   ```bash\n   docker run --runtime=nvidia --rm nvidia\u002Fcuda nvidia-smi\n   ```\n\n2. 拉取官方镜像并运行容器：\n   ```bash\n   docker pull xf1280\u002Fgibson:0.3.1\n   xhost +local:root\n   docker run --runtime=nvidia -ti --rm -e DISPLAY -v \u002Ftmp\u002F.X11-unix:\u002Ftmp\u002F.X11-unix -v \u003C主机数据集路径>:\u002Froot\u002Fmount\u002Fgibson\u002Fgibson\u002Fassets\u002Fdataset xf1280\u002Fgibson:0.3.1\n   ```\n\n3. 
下载核心数据集：\n   数据集地址：[https:\u002F\u002Fstorage.googleapis.com\u002Fgibson_scenes\u002Fassets_core_v2.tar.gz](https:\u002F\u002Fstorage.googleapis.com\u002Fgibson_scenes\u002Fassets_core_v2.tar.gz)\n\n### 方法二：从源码构建\n1. 克隆代码仓库并下载数据：\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv.git\n   cd GibsonEnv\n   .\u002Fdownload.sh\n   ```\n\n2. 构建项目：\n   ```bash\n   .\u002Fbuild.sh build_local\n   pip install -e .\n   ```\n\n3. （可选）安装 OpenAI Baselines：\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Ffxia22\u002Fbaselines.git\n   pip install -e baselines\n   ```\n\n---\n\n## 基本使用\n\n### 示例 1：控制 Husky 机器人导航\n运行以下命令，使用键盘控制 Husky 机器人在 Gates 建筑中导航：\n```bash\npython examples\u002Fdemo\u002Fplay_husky_nonviz.py\n```\n- 使用 `ASWD` 键控制方向。\n- 此示例不显示摄像头输出。\n\n### 示例 2：带摄像头输出的 Husky 导航\n运行以下命令，查看 RGB 和深度摄像头输出：\n```bash\npython examples\u002Fdemo\u002Fplay_husky_camera.py\n```\n- 使用 `ASWD` 键控制方向。\n- 显示 RGB 和深度图像。\n\n### 示例 3：训练 Husky 机器人导航\n使用 PPO2 算法训练 Husky 机器人导航：\n```bash\npython examples\u002Ftrain\u002Ftrain_husky_navigate_ppo2.py\n```\n- 训练目标是让机器人沿走廊前进。\n- 终端会显示每轮训练的强化学习统计信息。\n\n---\n\n以上为 GibsonEnv 的快速上手指南，更多功能和配置请参考 [官方文档](http:\u002F\u002Fgibson.vision\u002F)。","一家机器人初创公司正在开发一款家用服务机器人，需要训练其在真实家庭环境中完成导航和物品识别任务。\n\n### 没有 GibsonEnv 时\n- 开发团队只能依赖真实场景测试，需要搭建多个模拟房间，成本高昂且调整困难  \n- 真实环境中的测试速度受限，每次运行都需要人工干预，效率极低  \n- 由于机器人硬件脆弱，频繁的碰撞测试导致设备损坏率居高不下  \n- 缺乏对复杂语义信息（如家具类型、空间功能）的快速验证手段，算法迭代缓慢  \n- 难以保证训练环境的一致性，不同房间的光照、布局差异影响实验结果  \n\n### 使用 GibsonEnv 后\n- 团队通过虚拟化的真实家庭场景进行训练，无需搭建实体环境，大幅降低开发成本  \n- 在仿真环境中可以全天候运行测试，无需人工干预，显著提升研发效率  \n- 借助内置物理引擎，机器人可以在虚拟环境中安全试错，避免硬件损耗  \n- 提供丰富的语义信息支持，开发者能快速验证感知算法的效果并优化模型  \n- 场景数据统一且可控，确保实验结果的可重复性和可靠性  \n\nGibsonEnv 让开发者在高度仿真的环境中高效训练机器人，真正实现了从虚拟到现实的无缝衔接。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FStanfordVL_GibsonEnv_2037e352.png","StanfordVL","Stanford Vision and Learning Lab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FStanfordVL_37170a43.png","Research Codebase",null,"StanfordSVL","http:\u002F\u002Fsvl.stanford.edu\u002F","https:\u002F\u002Fgithub.com\u002FStanfordVL",[85,89,93,97,101,105,109,113],{"name":86,"color":87,"percentage":88},"C","#555555",45.6,{"name":90,"color":91,"percentage":92},"Python","#3572A5",33,{"name":94,"color":95,"percentage":96},"C++","#f34b7d",17.6,{"name":98,"color":99,"percentage":100},"Cuda","#3A4E3A",2.4,{"name":102,"color":103,"percentage":104},"CMake","#DA3434",0.9,{"name":106,"color":107,"percentage":108},"Shell","#89e051",0.4,{"name":110,"color":111,"percentage":112},"Dockerfile","#384d54",0.1,{"name":114,"color":115,"percentage":116},"HTML","#e34c26",0,938,149,"2026-04-03T09:22:28","MIT",4,"Linux","需要 NVIDIA GPU，显存 >6GB，CUDA >=9.0（Docker 安装）或 CUDA >=8.0（源码安装）","未说明",{"notes":126,"python":127,"dependencies":128},"建议使用 Docker 安装方式以简化环境配置；首次运行需下载核心数据集文件（约数百MB到数GB）；支持无图形界面的远程服务器部署，但需要 X server 或其他显示支持。","3.5+（推荐使用 conda 创建 Python 3.5 环境）",[129,130,131,132,133,134,135,136,137,138],"torch==0.3.1","tensorflow==1.3","opencv","bulletphysics","numpy","PyYAML","scipy","matplotlib","mpi4py","zmq",[14,54,13],[141,142,143,144,145,146,147,148,149,150],"computer-vision","robotics","simulator","sim2real","deep-learning","deep-reinforcement-learning","research","ros","reinforcement-learning","cvpr2018","2026-03-27T02:49:30.150509","2026-04-06T07:13:40.276316",[154,159,163,168,173,178],{"id":155,"question_zh":156,"answer_zh":157,"source_url":158},4114,"为什么在运行包含相机和 RGB 的示例时，程序会自动关闭？","这可能是由于 CUDA 渲染器未正确加载导致的。请检查图形设置选项，并在配置文件中调整 timestep 和 frame_skip 
参数的关系。","https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv\u002Fissues\u002F54",{"id":160,"question_zh":161,"answer_zh":162,"source_url":158},4115,"如何解决低帧率问题？","可以尝试优化图形选项设置，并确保配置文件中的 timestep 和 frame_skip 参数设置合理。",{"id":164,"question_zh":165,"answer_zh":166,"source_url":167},4116,"为什么在 ROS 中运行时，相机图像显示异常？","这可能是由于 torchvision 版本不兼容导致的。将 torchvision 版本降级到 0.2.0 可以解决问题。","https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv\u002Fissues\u002F61",{"id":169,"question_zh":170,"answer_zh":171,"source_url":172},4117,"如何解决 'Invalid MIT-MAGIC-COOKIE-1 key' 错误？","如果使用 Docker 环境，请确保删除所有 *.tar.gz 文件后再解压，同时可以尝试运行命令 `sudo docker run xf1280\u002Fgibson:0.2`。","https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv\u002Fissues\u002F24",{"id":174,"question_zh":175,"answer_zh":176,"source_url":177},4118,"如何解决 enjoy_husky_gibson_flagrun_ppo1.py 中缺少参数的问题？","该问题可能与变量 'ob' 的定义有关，建议参考 Issue #55 的讨论内容进行修复。","https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv\u002Fissues\u002F81",{"id":179,"question_zh":180,"answer_zh":181,"source_url":182},4119,"如何解决语义渲染时出现的深度渲染错误？","可以通过修改代码中的一个 off-by-one 错误来解决，具体修改为：`std::vector\u003Cunsigned int> vertexFaces(temp_vertices.size()+1);`。","https:\u002F\u002Fgithub.com\u002FStanfordVL\u002FGibsonEnv\u002Fissues\u002F63",[184,189],{"id":185,"version":186,"summary_zh":187,"released_at":188},103521,"v0.3.1","#### Changelog\r\n\r\n- EGL integration, remove X server dependency (solve #16 #24 #25)\r\n- OpenAI Gym 0.10.5 compatibility\r\n- Updated rendering filler models, added unfiller models\r\n- Bug fixes","2018-08-11T22:43:33",{"id":190,"version":191,"summary_zh":80,"released_at":192},103522,"v0.1.0","2018-02-26T17:40:27"]
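
补充示例：强化学习智能体的最小交互循环
=================

下面给出一个最小化的交互循环草图，仅用于说明上文 README 中描述的 `env.step` / `env.reset()` 接口以及 `obs` 字典的读取方式。其中的导入路径、配置文件名以及 `action_space` 的用法均为示意性假设（Gibson 大体遵循 OpenAI gym 的约定），实际名称请以 `gibson/core/envs` 与 `examples/configs` 目录中的源码为准，这不是官方示例。

```python
# 最小交互循环草图（非官方示例）。
# 假设：AntNavigateEnv 可按 README 所述从 gibson/core/envs 下的模块导入；
#       配置文件 "examples/configs/ant_navigate.yaml" 仅为占位名称；
#       环境遵循 gym 约定提供 action_space 且 reset() 返回初始观测。
from gibson.core.envs.ant_env import AntNavigateEnv  # 假设的导入路径

config_file = "examples/configs/ant_navigate.yaml"   # 假设存在的示例配置
env = AntNavigateEnv(is_discrete=False, config=config_file)

obs = env.reset()
for _ in range(100):
    # 随机动作仅用于演示接口；实际训练中应由策略网络给出动作
    action = env.action_space.sample()
    obs, rew, env_done, info = env.step(action)

    # obs 是字典，键由 yaml 配置中的 output 字段决定
    proprioception = obs.get("nonviz_sensor")  # 本体感受传感器数据
    rgb = obs.get("rgb_filled")                # RGB 相机输出（若已在 output 中启用）
    depth = obs.get("depth")                   # 深度图（若已在 output 中启用）

    if env_done:
        # 回合结束（例如机器人“死亡”或超时）后重置环境
        obs = env.reset()
```

这种以字典形式返回观测的设计，便于像 README 中提到的那样，把视觉输入（`rgb_filled`、`depth`）与非视觉传感器数据（`nonviz_sensor`）混合后送入同一个 RL 算法（如 PPO1/PPO2）。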