[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-astra-vision--MonoScene":3,"tool-astra-vision--MonoScene":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",149489,2,"2026-04-10T11:32:46",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":72,"owner_avatar_url":73,"owner_bio":74,"owner_company":74,"owner_location":74,"owner_email":74,"owner_twitter":74,"owner_website":74,"owner_url":75,"languages":76,"stars":81,"forks":82,"last_commit_at":83,"license":84,"difficulty_score":85,"env_os":86,"env_gpu":87,"env_ram":88,"env_deps":89,"category_tags":103,"github_topics":105,"view_count":32,"oss_zip_url":74,"oss_zip_packed_at":74,"status":17,"created_at":120,"updated_at":121,"faqs":122,"releases":152},6358,"astra-vision\u002FMonoScene","MonoScene","[CVPR 2022] \"MonoScene: Monocular 3D Semantic Scene Completion\": 3D Semantic Occupancy Prediction from a single image","MonoScene 是一款基于单目图像的 3D 语义场景补全工具，源自 CVPR 2022 的研究成果。它的核心能力是仅凭一张普通的 2D 照片，就能重建出包含语义信息的完整 3D 空间结构，预测场景中每个体素（voxel）的几何形状与类别（如道路、车辆、行人等）。\n\n传统 3D 感知通常依赖激光雷达或多视角相机，成本高昂且数据获取复杂。MonoScene 解决了这一痛点，证明了仅使用单一摄像头即可实现高质量的 3D 语义占用预测，极大地降低了硬件门槛。其技术亮点在于独特的网络架构，能够有效从单张图像中推断被遮挡区域的几何与语义信息，并在 SemanticKITTI 和 NYUv2 等权威数据集上展现了卓越性能。\n\n这款工具非常适合计算机视觉领域的研究人员、自动驾驶算法开发者以及 3D 
场景理解方向的工程师使用。对于希望探索低成本 3D 感知方案，或需要在缺乏深度传感器条件下进行场景重建的开发者而言，MonoScene 提供了宝贵的开源基线与预训练模型。虽然普通用户可通过在线演示体验其效果，但要充分发挥其潜力，仍需具备一定的深度学习框架（如 PyTorch）操作能力。","# MonoScene: Monocular 3D Semantic Scene Completion\n\n\n**MonoScene: Monocular 3D Semantic Scene Completion**\\\n[Anh-Quan Cao](https:\u002F\u002Fanhquancao.github.io),\n[Raoul de Charette](https:\u002F\u002Fteam.inria.fr\u002Frits\u002Fmembres\u002Fraoul-de-charette\u002F)  \nInria, Paris, France.  \nCVPR 2022 \\\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv%20%2B%20supp-2112.00726-purple)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.00726) \n[![Project page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject%20Page-MonoScene-red)](https:\u002F\u002Fastra-vision.github.io\u002FMonoScene\u002F)\n[![Live demo](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLive%20demo-Hugging%20Face-yellow)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FCVPR\u002FMonoScene)\n\nIf you find this work or code useful, please cite our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.00726) and [give this repo a star](https:\u002F\u002Fgithub.com\u002Fastra-vision\u002FMonoScene\u002Fstargazers):\n```\n@inproceedings{cao2022monoscene,\n    title={MonoScene: Monocular 3D Semantic Scene Completion}, \n    author={Anh-Quan Cao and Raoul de Charette},\n    booktitle={CVPR},\n    year={2022}\n}\n```\n\n# Teaser\n\n\n|SemanticKITTI | KITTI-360 \u003Cbr\u002F>(Trained on SemanticKITTI) |\n|:------------:|:------:|\n|\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fastra-vision_MonoScene_readme_4a22288a8fe7.gif\"  \u002F>|\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fastra-vision_MonoScene_readme_0ab80c4755a5.gif\" \u002F>|\n\n\n\u003Cp align=\"center\">\n  \u003Cb>NYUv2\u003C\u002Fb>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fastra-vision_MonoScene_readme_8190e8b241bb.gif\" 
style=\"width:48%\"\u002F>\n\u003C\u002Fp>\n\n# Table of Contents\n- [News](#news)\n- [Preparing MonoScene](#preparing-monoscene)\n  - [Installation](#installation)  \n  - [Datasets](#datasets)\n  - [Pretrained models](#pretrained-models)\n- [Running MonoScene](#running-monoscene)\n  - [Training](#training)\n  - [Evaluating](#evaluating)\n- [Inference & Visualization](#inference--visualization)\n  - [Inference](#inference)\n  - [Visualization](#visualization)\n- [Related camera-only 3D occupancy prediction projects](#related-camera-only-3d-occupancy-prediction-projects)\n- [License](#license)\n\n# News\n- 25\u002F03\u002F2026: One step towards generalized unified occupancy prediction with [OccAny: Generalized Unconstrained Urban 3D Occupancy (CVPR'26)](https:\u002F\u002Fvaleoai.github.io\u002FOccAny\u002F)\n- 05\u002F12\u002F2023: Check out our recent work [PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness](https:\u002F\u002Fastra-vision.github.io\u002FPaSCo\u002F) :rotating_light:\n- 20\u002F04\u002F2023: Check out other [camera-only 3D occupancy prediction projects](#related-camera-only-3d-occupancy-prediction-projects)\n- 28\u002F06\u002F2022: We added [MonoScene demo on Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FCVPR\u002FMonoScene) \n- 13\u002F06\u002F2022: We added a tutorial on [How to define viewpoint programmatically in mayavi](https:\u002F\u002Fanhquancao.github.io\u002Fblog\u002F2022\u002Fhow-to-define-viewpoint-programmatically-in-mayavi\u002F) \n- 12\u002F06\u002F2022: We added a guide on [how to install mayavi](https:\u002F\u002Fanhquancao.github.io\u002Fblog\u002F2022\u002Fhow-to-install-mayavi-with-python-3-on-ubuntu-2004-using-pip-or-anaconda\u002F) \n- 09\u002F06\u002F2022: We fixed the installation errors mentioned in https:\u002F\u002Fgithub.com\u002Fastra-vision\u002FMonoScene\u002Fissues\u002F18 \n\n# Preparing MonoScene\n\n## Installation\n\n\n\n1. 
Create conda environment:\n\n```\n$ conda create -y -n monoscene python=3.7\n$ conda activate monoscene\n```\n2. This code was implemented with python 3.7, pytorch 1.7.1 and CUDA 10.2. Please install [PyTorch](https:\u002F\u002Fpytorch.org\u002F): \n\n```\n$ conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch\n```\n\n3. Install the additional dependencies:\n\n```\n$ cd MonoScene\u002F\n$ pip install -r requirements.txt\n```\n\n4. Install tbb:\n\n```\n$ conda install -c bioconda tbb=2020.2\n```\n\n5. Downgrade torchmetrics to 0.6.0\n```\n$ pip install torchmetrics==0.6.0\n```\n\n6. Finally, install MonoScene:\n\n```\n$ pip install -e .\u002F\n```\n\n\n## Datasets\n\n\n### SemanticKITTI\n\n1. You need to download\n\n      - The **Semantic Scene Completion dataset v1.1** (SemanticKITTI voxel data (700 MB)) from [SemanticKITTI website](http:\u002F\u002Fwww.semantic-kitti.org\u002Fdataset.html#download)\n      -  The **KITTI Odometry Benchmark calibration data** (Download odometry data set (calibration files, 1 MB)) and the **RGB images** (Download odometry data set (color, 65 GB)) from [KITTI Odometry website](http:\u002F\u002Fwww.cvlibs.net\u002Fdatasets\u002Fkitti\u002Feval_odometry.php).\n      - The dataset folder at **\u002Fpath\u002Fto\u002Fsemantic_kitti** should have the following structure:\n    ```\n    └── \u002Fpath\u002Fto\u002Fsemantic_kitti\u002F\n      └── dataset\n        ├── poses\n        └── sequences\n    ```\n\n\n2. Create a folder to store SemanticKITTI preprocess data at `\u002Fpath\u002Fto\u002Fkitti\u002Fpreprocess\u002Ffolder`.\n\n3. Store paths in environment variables for faster access (**Note: folder 'dataset' is in \u002Fpath\u002Fto\u002Fsemantic_kitti**):\n\n```\n$ export KITTI_PREPROCESS=\u002Fpath\u002Fto\u002Fkitti\u002Fpreprocess\u002Ffolder\n$ export KITTI_ROOT=\u002Fpath\u002Fto\u002Fsemantic_kitti \n```\n\n4. 
Preprocess the data to generate labels at a lower scale, which are used to compute the ground truth relation matrices:\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fdata\u002Fsemantic_kitti\u002Fpreprocess.py kitti_root=$KITTI_ROOT kitti_preprocess_root=$KITTI_PREPROCESS\n```\n\n### NYUv2\n\n1. Download the [NYUv2 dataset](https:\u002F\u002Fwww.rocq.inria.fr\u002Frits_files\u002Fcomputer-vision\u002Fmonoscene\u002Fnyu.zip).\n\n2. Create a folder to store NYUv2 preprocess data at `\u002Fpath\u002Fto\u002FNYU\u002Fpreprocess\u002Ffolder`.\n\n3. Store paths in environment variables for faster access:\n\n```\n$ export NYU_PREPROCESS=\u002Fpath\u002Fto\u002FNYU\u002Fpreprocess\u002Ffolder\n$ export NYU_ROOT=\u002Fpath\u002Fto\u002FNYU\u002Fdepthbin \n```\n\n4. Preprocess the data to generate labels at a lower scale, which are used to compute the ground truth relation matrices:\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fdata\u002FNYU\u002Fpreprocess.py NYU_root=$NYU_ROOT NYU_preprocess_root=$NYU_PREPROCESS\n\n```\n\n\n\n### KITTI-360\n\n1. We only perform inference on KITTI-360. You can download either the **Perspective Images for Train & Val (128G)** or the **Perspective Images for Test (1.5G)** at [http:\u002F\u002Fwww.cvlibs.net\u002Fdatasets\u002Fkitti-360\u002Fdownload.php](http:\u002F\u002Fwww.cvlibs.net\u002Fdatasets\u002Fkitti-360\u002Fdownload.php).\n\n2. Create a folder to store KITTI-360 data at `\u002Fpath\u002Fto\u002FKITTI-360\u002Ffolder`.\n\n3. 
Store paths in environment variables for faster access:\n\n```\n$ export KITTI_360_ROOT=\u002Fpath\u002Fto\u002FKITTI-360\n```\n\n## Pretrained models\n\nDownload MonoScene pretrained models [on SemanticKITTI](https:\u002F\u002Fwww.rocq.inria.fr\u002Frits_files\u002Fcomputer-vision\u002Fmonoscene\u002Fmonoscene_kitti.ckpt) and [on NYUv2](https:\u002F\u002Fwww.rocq.inria.fr\u002Frits_files\u002Fcomputer-vision\u002Fmonoscene\u002Fmonoscene_nyu.ckpt), then put them in the folder `\u002Fpath\u002Fto\u002FMonoScene\u002Ftrained_models`.\n\n\n# Running MonoScene\n\n## Training\n\nTo train MonoScene with SemanticKITTI, type:\n\n### SemanticKITTI\n\n1. Create folders to store training logs at **\u002Fpath\u002Fto\u002Fkitti\u002Flogdir**.\n\n2. Store in an environment variable:\n\n```\n$ export KITTI_LOG=\u002Fpath\u002Fto\u002Fkitti\u002Flogdir\n```\n\n3. Train MonoScene using 4 GPUs with batch_size of 4 (1 item per GPU) on Semantic KITTI:\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Ftrain_monoscene.py \\\n    dataset=kitti \\\n    enable_log=true \\\n    kitti_root=$KITTI_ROOT \\\n    kitti_preprocess_root=$KITTI_PREPROCESS\\\n    kitti_logdir=$KITTI_LOG \\\n    n_gpus=4 batch_size=4    \n```\n\n### NYUv2\n\n1. Create folders to store training logs at **\u002Fpath\u002Fto\u002FNYU\u002Flogdir**.\n\n2. Store in an environment variable:\n\n```\n$ export NYU_LOG=\u002Fpath\u002Fto\u002FNYU\u002Flogdir\n```\n\n3.  
Train MonoScene using 2 GPUs with batch_size of 4 (2 items per GPU) on NYUv2:\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Ftrain_monoscene.py \\\n    dataset=NYU \\\n    NYU_root=$NYU_ROOT \\\n    NYU_preprocess_root=$NYU_PREPROCESS \\\n    logdir=$NYU_LOG \\\n    n_gpus=2 batch_size=4\n\n```\n\n\n## Evaluating \n\n### SemanticKITTI\n\nTo evaluate MonoScene on the SemanticKITTI validation set, type:\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Feval_monoscene.py \\\n    dataset=kitti \\\n    kitti_root=$KITTI_ROOT \\\n    kitti_preprocess_root=$KITTI_PREPROCESS \\\n    n_gpus=1 batch_size=1\n```\n\n### NYUv2\n\nTo evaluate MonoScene on the NYUv2 test set, type:\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Feval_monoscene.py \\\n    dataset=NYU \\\n    NYU_root=$NYU_ROOT \\\n    NYU_preprocess_root=$NYU_PREPROCESS \\\n    n_gpus=1 batch_size=1\n```\n\n# Inference & Visualization\n\n## Inference\n\nPlease create the folder **\u002Fpath\u002Fto\u002Fmonoscene\u002Foutput** to store the MonoScene outputs and store it in an environment variable:\n\n```\nexport MONOSCENE_OUTPUT=\u002Fpath\u002Fto\u002Fmonoscene\u002Foutput\n```\n\n### NYUv2\n\nTo generate the predictions on the NYUv2 test set, type:\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Fgenerate_output.py \\\n    +output_path=$MONOSCENE_OUTPUT \\\n    dataset=NYU \\\n    NYU_root=$NYU_ROOT \\\n    NYU_preprocess_root=$NYU_PREPROCESS \\\n    n_gpus=1 batch_size=1\n```\n\n### Semantic KITTI\n\nTo generate the predictions on the Semantic KITTI validation set, type:\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Fgenerate_output.py \\\n    +output_path=$MONOSCENE_OUTPUT \\\n    dataset=kitti \\\n    kitti_root=$KITTI_ROOT \\\n    kitti_preprocess_root=$KITTI_PREPROCESS \\\n    n_gpus=1 batch_size=1\n```\n\n### KITTI-360\n\nHere we use the sequence **2013_05_28_drive_0009_sync**; you can use other sequences. 
To generate the predictions on KITTI-360, type:\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Fgenerate_output.py \\\n    +output_path=$MONOSCENE_OUTPUT \\\n    dataset=kitti_360 \\\n    +kitti_360_root=$KITTI_360_ROOT \\\n    +kitti_360_sequence=2013_05_28_drive_0009_sync  \\\n    n_gpus=1 batch_size=1\n```\n\n## Visualization\n\n**NOTE:** If you have trouble using mayavi, you can use an alternative [visualization code using Open3D](https:\u002F\u002Fgithub.com\u002Fastra-vision\u002FMonoScene\u002Fissues\u002F68#issuecomment-1637623145).\n\n\nWe use mayavi to visualize the predictions. Please install mayavi following the [official installation instructions](https:\u002F\u002Fdocs.enthought.com\u002Fmayavi\u002Fmayavi\u002Finstallation.html). Then, use the following commands to visualize the outputs on the respective datasets.\n\nIf you have **trouble installing mayavi**, you can take a look at our [**mayavi installation guide**](https:\u002F\u002Fanhquancao.github.io\u002Fblog\u002F2022\u002Fhow-to-install-mayavi-with-python-3-on-ubuntu-2004-using-pip-or-anaconda\u002F).\n\nIf you have **trouble fixing the mayavi viewpoint**, you can take a look at [**our tutorial**](https:\u002F\u002Fanhquancao.github.io\u002Fblog\u002F2022\u002Fhow-to-define-viewpoint-programmatically-in-mayavi\u002F).\n\n\nYou also need to install some packages used by the visualization scripts using the following commands:\n```\npip install tqdm\npip install omegaconf\npip install hydra-core\n```\n\n### NYUv2 \n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Fvisualization\u002FNYU_vis_pred.py +file=\u002Fpath\u002Fto\u002Foutput\u002Ffile.pkl\n```\n\n### Semantic KITTI \n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Fvisualization\u002Fkitti_vis_pred.py +file=\u002Fpath\u002Fto\u002Foutput\u002Ffile.pkl +dataset=kitti\n```\n\n\n### KITTI-360\n\n```\n$ cd MonoScene\u002F \n$ python monoscene\u002Fscripts\u002Fvisualization\u002Fkitti_vis_pred.py 
+file=\u002Fpath\u002Fto\u002Foutput\u002Ffile.pkl +dataset=kitti_360\n```\n\n# Related camera-only 3D occupancy prediction projects\n\n\n\n- [NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space](https:\u002F\u002Fgithub.com\u002FJiawei-Yao0812\u002FNDCScene), ICCV 2023.\n- [OG: Equip vision occupancy with instance segmentation and visual grounding](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.05873), arXiv 2023.\n- [FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FFB-BEV), CVPRW 2023.\n- [Symphonize 3D Semantic Scene Completion with Contextual Instance Queries](https:\u002F\u002Fgithub.com\u002Fhustvl\u002FSymphonies), arXiv 2023.\n- [OVO: Open-Vocabulary Occupancy](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.16133.pdf), arXiv 2023.\n- [OccNet: Scene as Occupancy](https:\u002F\u002Fgithub.com\u002Fopendrivelab\u002Foccnet), ICCV 2023.\n- [SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields](https:\u002F\u002Fastra-vision.github.io\u002FSceneRF\u002F), ICCV 2023.\n- [Behind the Scenes: Density Fields for Single View Reconstruction](https:\u002F\u002Ffwmb.github.io\u002Fbts\u002F), CVPR 2023.\n- [VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FVoxFormer), CVPR 2023.\n- [OccDepth: A Depth-aware Method for 3D Semantic Occupancy Network](https:\u002F\u002Fgithub.com\u002Fmegvii-research\u002FOccDepth), arXiv 2023.\n- [StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion](https:\u002F\u002Fgithub.com\u002FArlo0o\u002FStereoScene), arXiv 2023.\n- [Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction](https:\u002F\u002Fgithub.com\u002Fwzzheng\u002FTPVFormer), CVPR 2023.\n- [A Simple Attempt for 3D Occupancy Estimation in Autonomous 
Driving](https:\u002F\u002Fgithub.com\u002FGANWANSHUI\u002FSimpleOccupancy), arXiv 2023.\n- [OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction](https:\u002F\u002Fgithub.com\u002Fzhangyp15\u002FOccFormer), ICCV 2023.\n- [SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving](https:\u002F\u002Fgithub.com\u002Fweiyithu\u002FSurroundOcc), ICCV 2023.\n- [PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.10013), arXiv 2023.\n- [PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction](https:\u002F\u002Fgithub.com\u002Fwzzheng\u002FPointOcc), arXiv 2023.\n- [RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.09502), arXiv 2023.\n\n## Datasets\u002FBenchmarks\n- [PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.12708), arXiv 2023.\n- [OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception](https:\u002F\u002Fgithub.com\u002FJeffWang987\u002FOpenOccupancy), ICCV 2023.\n- [Occupancy Dataset for nuScenes](https:\u002F\u002Fgithub.com\u002FFANG-MING\u002Foccupancy-for-nuscenes), Github 2023\n- [Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving](https:\u002F\u002Fgithub.com\u002FTsinghua-MARS-Lab\u002FOcc3D), arXiv 2023.\n- [OccNet: Scene as Occupancy](https:\u002F\u002Fgithub.com\u002Fopendrivelab\u002Foccnet), ICCV 2023.\n- [SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving](https:\u002F\u002Fgithub.com\u002Fai4ce\u002FSSCBench), arXiv 2023.\n\n\n\n\n\n\n# License\nMonoScene is released under the [Apache 2.0 license](.\u002FLICENSE).\n","# MonoScene: 单目3D语义场景补全\n\n\n**MonoScene: 单目3D语义场景补全**  \n[Anh-Quan 
Cao](https:\u002F\u002Fanhquancao.github.io),\n[Raoul de Charette](https:\u002F\u002Fteam.inria.fr\u002Frits\u002Fmembres\u002Fraoul-de-charette\u002F)  \n法国巴黎Inria研究所。  \nCVPR 2022  \n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv%20%2B%20supp-2112.00726-purple)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.00726) \n[![项目页面](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject%20Page-MonoScene-red)](https:\u002F\u002Fastra-vision.github.io\u002FMonoScene\u002F)\n[![在线演示](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLive%20demo-Hugging%20Face-yellow)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FCVPR\u002FMonoScene)\n\n如果您觉得这项工作或代码有用，请引用我们的[论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.00726)并为本仓库点个赞([star](https:\u002F\u002Fgithub.com\u002Fastra-vision\u002FMonoScene\u002Fstargazers))：\n```\n@inproceedings{cao2022monoscene,\n    title={MonoScene: Monocular 3D Semantic Scene Completion}, \n    author={Anh-Quan Cao and Raoul de Charette},\n    booktitle={CVPR},\n    year={2022}\n}\n```\n\n# 预告片\n\n\n|SemanticKITTI | KITTI-360 \u003Cbr\u002F>(在SemanticKITTI上训练) |\n|:------------:|:------:|\n|\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fastra-vision_MonoScene_readme_4a22288a8fe7.gif\"  \u002F>|\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fastra-vision_MonoScene_readme_0ab80c4755a5.gif\" \u002F>|\n\n\n\u003Cp align=\"center\">\n  \u003Cb>NYUv2\u003C\u002Fb>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fastra-vision_MonoScene_readme_8190e8b241bb.gif\" style=\"width:48%\"\u002F>\n\u003C\u002Fp>\n\n# 目录\n- [新闻](#news)\n- [准备MonoScene](#preparing-monoscene)\n  - [安装](#installation)  \n  - [数据集](#datasets)\n  - [预训练模型](#pretrained-models)\n- [运行MonoScene](#running-monoscene)\n  - [训练](#training)\n  - [评估](#evaluating)\n- [推理与可视化](#inference--visualization)\n  - [推理](#inference)\n  - [可视化](#visualization)\n- 
[相关仅基于相机的3D占用预测项目](#related-camera-only-3d-occupancy-prediction-projects)\n- [许可证](#license)\n\n# 新闻\n- 25\u002F03\u002F2026：朝着通用统一的占用预测迈进了一步，推出了[OccAny：通用无约束城市3D占用预测（CVPR'26）](https:\u002F\u002Fvaleoai.github.io\u002FOccAny\u002F)\n- 05\u002F12\u002F2023：请查看我们最近的工作[PaSCo：具有不确定性感知的城市3D全景场景补全](https:\u002F\u002Fastra-vision.github.io\u002FPaSCo\u002F) :rotating_light:\n- 20\u002F04\u002F2023：请查看其他[仅基于相机的3D占用预测项目](#related-camera-only-3d-occupancy-prediction-projects)\n- 28\u002F06\u002F2022：我们在Hugging Face上添加了[MonoScene演示](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FCVPR\u002FMonoScene) \n- 13\u002F06\u002F2022：我们添加了一个关于[如何在mayavi中以编程方式定义视点的教程](https:\u002F\u002Fanhquancao.github.io\u002Fblog\u002F2022\u002Fhow-to-define-viewpoint-programmatically-in-mayavi\u002F) \n- 12\u002F06\u002F2022：我们添加了一篇关于[如何安装mayavi的指南](https:\u002F\u002Fanhquancao.github.io\u002Fblog\u002F2022\u002Fhow-to-install-mayavi-with-python-3-on-ubuntu-2004-using-pip-or-anaconda\u002F) \n- 09\u002F06\u002F2022：我们修复了GitHub仓库中提到的安装错误（见https:\u002F\u002Fgithub.com\u002Fastra-vision\u002FMonoScene\u002Fissues\u002F18）\n\n# 准备MonoScene\n\n## 安装\n\n\n\n1. 创建conda环境：\n\n```\n$ conda create -y -n monoscene python=3.7\n$ conda activate monoscene\n```\n2. 本代码使用Python 3.7、PyTorch 1.7.1和CUDA 10.2实现。请安装[PyTorch](https:\u002F\u002Fpytorch.org\u002F)：\n\n```\n$ conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch\n```\n\n3. 安装其他依赖项：\n\n```\n$ cd MonoScene\u002F\n$ pip install -r requirements.txt\n```\n\n4. 安装tbb：\n\n```\n$ conda install -c bioconda tbb=2020.2\n```\n\n5. 将torchmetrics降级到0.6.0版本：\n```\n$ pip install torchmetrics==0.6.0\n```\n\n6. 最后，安装MonoScene：\n\n```\n$ pip install -e .\u002F\n```\n\n\n## 数据集\n\n\n### SemanticKITTI\n\n1. 
您需要下载\n\n      - **语义场景补全数据集v1.1**（SemanticKITTI体素数据，700 MB）来自[SemanticKITTI官网](http:\u002F\u002Fwww.semantic-kitti.org\u002Fdataset.html#download)\n      - **KITTI Odometry Benchmark校准数据**（下载里程计数据集，包含校准文件，1 MB）以及**RGB图像**（下载里程计数据集，彩色图像，65 GB）来自[KITTI Odometry官网](http:\u002F\u002Fwww.cvlibs.net\u002Fdatasets\u002Fkitti\u002Feval_odometry.php)。\n      - 数据集文件夹位于**\u002Fpath\u002Fto\u002Fsemantic_kitti**，应具有以下结构：\n    ```\n    └── \u002Fpath\u002Fto\u002Fsemantic_kitti\u002F\n      └── dataset\n        ├── poses\n        └── sequences\n    ```\n\n\n2. 创建一个文件夹来存储SemanticKITTI预处理数据，路径为`\u002Fpath\u002Fto\u002Fkitti\u002Fpreprocess\u002Ffolder`。\n\n3. 将路径存储在环境变量中以便快速访问（注意：`dataset`文件夹位于`\u002Fpath\u002Fto\u002Fsemantic_kitti`）：\n\n```\n$ export KITTI_PREPROCESS=\u002Fpath\u002Fto\u002Fkitti\u002Fpreprocess\u002Ffolder\n$ export KITTI_ROOT=\u002Fpath\u002Fto\u002Fsemantic_kitti \n```\n\n4. 对数据进行预处理，生成低分辨率标签，用于计算真实值关系矩阵：\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fdata\u002Fsemantic_kitti\u002Fpreprocess.py kitti_root=$KITTI_ROOT kitti_preprocess_root=$KITTI_PREPROCESS\n```\n\n### NYUv2\n\n1. 下载[NYUv2数据集](https:\u002F\u002Fwww.rocq.inria.fr\u002Frits_files\u002Fcomputer-vision\u002Fmonoscene\u002Fnyu.zip)。\n\n2. 创建一个文件夹来存储NYUv2预处理数据，路径为`\u002Fpath\u002Fto\u002FNYU\u002Fpreprocess\u002Ffolder`。\n\n3. 将路径存储在环境变量中以便快速访问：\n\n```\n$ export NYU_PREPROCESS=\u002Fpath\u002Fto\u002FNYU\u002Fpreprocess\u002Ffolder\n$ export NYU_ROOT=\u002Fpath\u002Fto\u002FNYU\u002Fdepthbin \n```\n\n4. 对数据进行预处理，生成低分辨率标签，用于计算真实值关系矩阵：\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fdata\u002FNYU\u002Fpreprocess.py NYU_root=$NYU_ROOT NYU_preprocess_root=$NYU_PREPROCESS\n\n```\n\n\n\n### KITTI-360\n\n1. 我们仅对KITTI-360进行推理。您可以从[http:\u002F\u002Fwww.cvlibs.net\u002Fdatasets\u002Fkitti-360\u002Fdownload.php](http:\u002F\u002Fwww.cvlibs.net\u002Fdatasets\u002Fkitti-360\u002Fdownload.php)下载**用于训练和验证的透视图像（128G）**或**用于测试的透视图像（1.5G）**。\n\n2. 
创建一个文件夹来存储KITTI-360数据，路径为`\u002Fpath\u002Fto\u002FKITTI-360\u002Ffolder`。\n\n3. 将路径存储在环境变量中以便快速访问：\n\n```\n$ export KITTI_360_ROOT=\u002Fpath\u002Fto\u002FKITTI-360\n```\n\n## 预训练模型\n\n下载MonoScene在SemanticKITTI上的预训练模型[链接](https:\u002F\u002Fwww.rocq.inria.fr\u002Frits_files\u002Fcomputer-vision\u002Fmonoscene\u002Fmonoscene_kitti.ckpt)以及在NYUv2上的预训练模型[链接](https:\u002F\u002Fwww.rocq.inria.fr\u002Frits_files\u002Fcomputer-vision\u002Fmonoscene\u002Fmonoscene_nyu.ckpt)，然后将其放入文件夹`\u002Fpath\u002Fto\u002FMonoScene\u002Ftrained_models`。\n\n\n# 运行MonoScene\n\n## 训练\n\n要在SemanticKITTI上训练MonoScene，请输入以下命令：\n\n### SemanticKITTI\n\n1. 创建文件夹来存储训练日志，路径为**\u002Fpath\u002Fto\u002Fkitti\u002Flogdir**。\n\n2. 将其存储在环境变量中：\n\n```\n$ export KITTI_LOG=\u002Fpath\u002Fto\u002Fkitti\u002Flogdir\n```\n\n3. 使用4块GPU，每块GPU处理1个样本，以batch_size为4，在Semantic KITTI上训练MonoScene：\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Ftrain_monoscene.py \\\n    dataset=kitti \\\n    enable_log=true \\\n    kitti_root=$KITTI_ROOT \\\n    kitti_preprocess_root=$KITTI_PREPROCESS\\\n    kitti_logdir=$KITTI_LOG \\\n    n_gpus=4 batch_size=4    \n```\n\n### NYUv2\n\n1. 在 **\u002Fpath\u002Fto\u002FNYU\u002Flogdir** 创建用于存储训练日志的文件夹。\n\n2. 将路径存储到环境变量中：\n\n```\n$ export NYU_LOG=\u002Fpath\u002Fto\u002FNYU\u002Flogdir\n```\n\n3. 
使用 2 张 GPU、每张 GPU 处理 2 个样本（即 batch_size=4）在 NYUv2 数据集上训练 MonoScene：\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Ftrain_monoscene.py \\\n    dataset=NYU \\\n    NYU_root=$NYU_ROOT \\\n    NYU_preprocess_root=$NYU_PREPROCESS \\\n    logdir=$NYU_LOG \\\n    n_gpus=2 batch_size=4\n```\n\n\n## 评估 \n\n### SemanticKITTI\n\n要在 SemanticKITTI 验证集上评估 MonoScene，请执行以下命令：\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Feval_monoscene.py \\\n    dataset=kitti \\\n    kitti_root=$KITTI_ROOT \\\n    kitti_preprocess_root=$KITTI_PREPROCESS \\\n    n_gpus=1 batch_size=1\n```\n\n### NYUv2\n\n要在 NYUv2 测试集上评估 MonoScene，请执行以下命令：\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Feval_monoscene.py \\\n    dataset=NYU \\\n    NYU_root=$NYU_ROOT\\\n    NYU_preprocess_root=$NYU_PREPROCESS \\\n    n_gpus=1 batch_size=1\n```\n\n# 推理与可视化\n\n## 推理\n\n请创建文件夹 **\u002Fpath\u002Fto\u002Fmonoscene\u002Foutput** 以存储 MonoScene 的输出，并将其路径存储到环境变量中：\n\n```\nexport MONOSCENE_OUTPUT=\u002Fpath\u002Fto\u002Fmonoscene\u002Foutput\n```\n\n### NYUv2\n\n要在 NYUv2 测试集上生成预测结果，请执行以下命令：\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Fgenerate_output.py \\\n    +output_path=$MONOSCENE_OUTPUT \\\n    dataset=NYU \\\n    NYU_root=$NYU_ROOT \\\n    NYU_preprocess_root=$NYU_PREPROCESS \\\n    n_gpus=1 batch_size=1\n```\n\n### Semantic KITTI\n\n要在 Semantic KITTI 验证集上生成预测结果，请执行以下命令：\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Fgenerate_output.py \\\n    +output_path=$MONOSCENE_OUTPUT \\\n    dataset=kitti \\\n    kitti_root=$KITTI_ROOT \\\n    kitti_preprocess_root=$KITTI_PREPROCESS \\\n    n_gpus=1 batch_size=1\n```\n\n### KITTI-360\n\n这里我们使用序列 **2013_05_28_drive_0009_sync**，您也可以使用其他序列。要在 KITTI-360 上生成预测结果，请执行以下命令：\n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Fgenerate_output.py \\\n    +output_path=$MONOSCENE_OUTPUT \\\n    dataset=kitti_360 \\\n    +kitti_360_root=$KITTI_360_ROOT \\\n    
+kitti_360_sequence=2013_05_28_drive_0009_sync  \\\n    n_gpus=1 batch_size=1\n```\n\n## 可视化\n\n**注意：** 如果您在使用 mayavi 时遇到困难，可以使用替代的 [基于 Open3D 的可视化代码](https:\u002F\u002Fgithub.com\u002Fastra-vision\u002FMonoScene\u002Fissues\u002F68#issuecomment-1637623145)。\n\n\n我们使用 mayavi 来可视化预测结果。请按照 [官方安装说明](https:\u002F\u002Fdocs.enthought.com\u002Fmayavi\u002Fmayavi\u002Finstallation.html) 安装 mayavi。然后，使用以下命令分别对各个数据集的输出进行可视化。\n\n如果您在 **安装 mayavi 时遇到困难**，可以查看我们的 [**mayavi 安装指南**](https:\u002F\u002Fanhquancao.github.io\u002Fblog\u002F2022\u002Fhow-to-install-mayavi-with-python-3-on-ubuntu-2004-using-pip-or-anaconda\u002F)。\n\n如果您在 **调整 mayavi 视角时遇到困难**，可以查看我们的 [**教程**](https:\u002F\u002Fanhquancao.github.io\u002Fblog\u002F2022\u002Fhow-to-define-viewpoint-programmatically-in-mayavi\u002F)。\n\n\n您还需要使用以下命令安装可视化脚本所需的软件包：\n```\npip install tqdm\npip install omegaconf\npip install hydra-core\n```\n\n### NYUv2 \n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Fvisualization\u002FNYU_vis_pred.py +file=\u002Fpath\u002Fto\u002Foutput\u002Ffile.pkl\n```\n\n### Semantic KITTI \n\n```\n$ cd MonoScene\u002F\n$ python monoscene\u002Fscripts\u002Fvisualization\u002Fkitti_vis_pred.py +file=\u002Fpath\u002Fto\u002Foutput\u002Ffile.pkl +dataset=kitti\n```\n\n\n### KITTI-360\n\n```\n$ cd MonoScene\u002F \n$ python monoscene\u002Fscripts\u002Fvisualization\u002Fkitti_vis_pred.py +file=\u002Fpath\u002Fto\u002Foutput\u002Ffile.pkl +dataset=kitti_360\n```\n\n# 相关的仅基于相机的 3D 占用预测项目\n\n\n\n- [NDC-Scene: 在归一化设备坐标空间中提升单目 3D 语义场景补全](https:\u002F\u002Fgithub.com\u002FJiawei-Yao0812\u002FNDCScene)，ICCV 2023。\n- [OG: 为视觉占用预测配备实例分割和视觉定位能力](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.05873)，arXiv 2023。\n- [FB-OCC: 基于前后视图变换的 3D 占用预测](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FFB-BEV)，CVPRW 2023。\n- [通过上下文实例查询统一 3D 语义场景补全](https:\u002F\u002Fgithub.com\u002Fhustvl\u002FSymphonies)，arXiv 2023。\n- [OVO: 开放词汇占用预测](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2305.16133.pdf)，arXiv 2023。\n- 
[OccNet: Scene as Occupancy](https:\u002F\u002Fgithub.com\u002Fopendrivelab\u002Foccnet), ICCV 2023.\n- [SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields](https:\u002F\u002Fastra-vision.github.io\u002FSceneRF\u002F), ICCV 2023.\n- [Behind the Scenes: Density Fields for Single View Reconstruction](https:\u002F\u002Ffwmb.github.io\u002Fbts\u002F), CVPR 2023.\n- [VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FVoxFormer), CVPR 2023.\n- [OccDepth: A Depth-Aware Method for 3D Semantic Occupancy Network](https:\u002F\u002Fgithub.com\u002Fmegvii-research\u002FOccDepth), arXiv 2023.\n- [StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion](https:\u002F\u002Fgithub.com\u002FArlo0o\u002FStereoScene), arXiv 2023.\n- [Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction](https:\u002F\u002Fgithub.com\u002Fwzzheng\u002FTPVFormer), CVPR 2023.\n- [A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving](https:\u002F\u002Fgithub.com\u002FGANWANSHUI\u002FSimpleOccupancy), arXiv 2023.\n- [OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction](https:\u002F\u002Fgithub.com\u002Fzhangyp15\u002FOccFormer), ICCV 2023.\n- [SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving](https:\u002F\u002Fgithub.com\u002Fweiyithu\u002FSurroundOcc), ICCV 2023.\n- [PanoOcc: A Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.10013), arXiv 2023.\n- [PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction](https:\u002F\u002Fgithub.com\u002Fwzzheng\u002FPointOcc), arXiv 2023.\n- [RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.09502), arXiv 2023.\n\n## Datasets\u002FBenchmarks\n- [PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.12708), arXiv 2023.\n- [OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception](https:\u002F\u002Fgithub.com\u002FJeffWang987\u002FOpenOccupancy), ICCV 2023.\n- [Occupancy Dataset for nuScenes](https:\u002F\u002Fgithub.com\u002FFANG-MING\u002Foccupancy-for-nuscenes), GitHub 2023.\n- [Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving](https:\u002F\u002Fgithub.com\u002FTsinghua-MARS-Lab\u002FOcc3D), arXiv 2023.\n- [OccNet: Scene as Occupancy](https:\u002F\u002Fgithub.com\u002Fopendrivelab\u002Foccnet), ICCV 2023.\n- [SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving](https:\u002F\u002Fgithub.com\u002Fai4ce\u002FSSCBench), arXiv 2023.\n\n\n\n\n\n# License\nMonoScene is released under the [Apache 2.0 license](.\u002FLICENSE).","# MonoScene Quick Start Guide\n\nMonoScene is a 3D semantic scene completion tool that works from a single monocular image and supports the SemanticKITTI, NYUv2, and KITTI-360 datasets. This guide walks you through environment setup, installation, and basic inference.\n\n## 1. Environment\n\nBefore starting, make sure your system meets the following requirements:\n- **OS**: Linux (Ubuntu 20.04 recommended)\n- **Python**: 3.7 (the project strictly depends on this version)\n- **CUDA**: 10.2\n- **PyTorch**: 1.7.1\n- **GPU**: at least 8 GB of VRAM is recommended for training; inference needs vary with model size\n\n> **Note**: Because the project pins older versions of PyTorch and Python, it is strongly recommended to create an isolated virtual environment with `conda` to avoid conflicts.\n\n## 2. Installation\n\n### 2.1 Create the conda environment and install PyTorch\n\n```bash\n# Create a Python 3.7 environment named monoscene\nconda create -y -n monoscene python=3.7\nconda activate monoscene\n\n# Install the pinned PyTorch version (CUDA 10.2)\n# If downloads are slow, try the Tsinghua mirror: -c https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fanaconda\u002Fcloud\u002Fpytorch\u002F\nconda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch\n```\n\n### 2.2 Install project dependencies\n\nClone the repository and install the required dependencies:\n\n```bash\n# Assuming you have already git cloned the repo and entered its directory\ncd MonoScene\u002F\n\n# Install the dependencies from requirements.txt\npip install -r requirements.txt\n\n# Install the tbb library\nconda install -c bioconda tbb=2020.2\n\n# Force-downgrade torchmetrics to 0.6.0 (critical step; a version mismatch causes errors)\npip install torchmetrics==0.6.0\n\n# Install MonoScene in editable mode\npip install -e .\u002F\n```\n\n### 2.3 Install visualization tools (optional)\n\nTo run the official visualization scripts you need `mayavi`. If installation proves difficult, use the Open3D alternative (see the note in the main README).\n\n```bash\n# Install the packages the visualization scripts need\npip install tqdm omegaconf hydra-core\n# mayavi is tricky to install; see the author's dedicated tutorial or install via conda\nconda install -c conda-forge mayavi\n```\n\n## 3. Basic Usage\n\nThe following shows the simplest **inference and output generation** workflow on the **NYUv2** dataset with a pretrained model.\n\n### 3.1 Prepare the data and model\n\n1.  **Download the dataset**: download and extract the NYUv2 dataset.\n2.  **Download the pretrained model**: download the [NYUv2 pretrained weights](https:\u002F\u002Fwww.rocq.inria.fr\u002Frits_files\u002Fcomputer-vision\u002Fmonoscene\u002Fmonoscene_nyu.ckpt) and place them in the `trained_models` folder under the project root.\n3.  
**Set environment variables**:\n\n```bash\n# Replace with your actual paths\nexport NYU_ROOT=\u002Fpath\u002Fto\u002FNYU\u002Fdepthbin\nexport NYU_PREPROCESS=\u002Fpath\u002Fto\u002FNYU\u002Fpreprocess\u002Ffolder\nexport MONOSCENE_OUTPUT=\u002Fpath\u002Fto\u002Fmonoscene\u002Foutput\n\n# Run data preprocessing (generates low-resolution labels)\npython monoscene\u002Fdata\u002FNYU\u002Fpreprocess.py NYU_root=$NYU_ROOT NYU_preprocess_root=$NYU_PREPROCESS\n```\n\n### 3.2 Run inference\n\nRun the following command to generate predictions (`.pkl` files):\n\n```bash\ncd MonoScene\u002F\npython monoscene\u002Fscripts\u002Fgenerate_output.py \\\n    +output_path=$MONOSCENE_OUTPUT \\\n    dataset=NYU \\\n    NYU_root=$NYU_ROOT \\\n    NYU_preprocess_root=$NYU_PREPROCESS \\\n    n_gpus=1 batch_size=1\n```\n\n### 3.3 Visualize the results (optional)\n\nIf you have successfully installed `mayavi`, view the generated 3D scene with:\n\n```bash\n# Replace \u002Fpath\u002Fto\u002Foutput\u002Ffile.pkl with the actual file generated in the previous step\npython monoscene\u002Fscripts\u002Fvisualization\u002FNYU_vis_pred.py +file=\u002Fpath\u002Fto\u002Foutput\u002Ffile.pkl\n```\n\n> **Tip**: For SemanticKITTI or KITTI-360, just change the `dataset` parameter and the corresponding root-path environment variables (`KITTI_ROOT`, `KITTI_360_ROOT`, etc.); the command format is the same as above.","An autonomous-driving startup is building an urban road perception system that relies solely on a monocular camera and needs accurate 3D environment reconstruction on low-cost hardware.\n\n### Without MonoScene\n- **High hardware cost**: obtaining accurate 3D semantic information requires expensive LiDAR or multi-camera stereo rigs, blowing the vehicle's sensor budget.\n- **Sparse, incomplete data**: conventional monocular depth estimation yields only sparse point clouds and cannot infer the geometry of regions occluded by vehicles or buildings, leaving large blind spots in the map.\n- **Fragmented semantic understanding**: depth and semantic classes (lane markings, sidewalks, obstacles) are processed separately, making joint reasoning in a unified 3D voxel space difficult and complicating the downstream fusion pipeline.\n- **Poor real-time performance**: time synchronization and spatial calibration across sensors are laborious, and the complex fusion pipeline struggles to meet real-time constraints on embedded devices.\n\n### With MonoScene\n- **Low-cost, vision-only solution**: MonoScene predicts a dense 3D semantic occupancy grid from a single RGB image, removing the LiDAR dependency and sharply lowering the hardware barrier.\n- **Complete scene completion**: the model infers, from context, the geometry behind occlusions (e.g. a pedestrian hidden by a bus), producing a continuous, complete 3D model of the environment.\n- **Unified geometry and semantics**: it directly outputs semantically labeled 3D voxels, merging reconstruction and segmentation into one step, simplifying the perception stack and improving robustness.\n- **Efficient, flexible deployment**: a single-camera input stream simplifies preprocessing, making the algorithm easier to port to compute-constrained edge platforms.\n\nMonoScene's core value is giving a low-cost monocular camera the ability to see through occlusions and understand the complete 3D world, a key technical path toward affordable high-level autonomous driving.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fastra-vision_MonoScene_8190e8b2.gif","astra-vision","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fastra-vision_a8b3b8ea.png",null,"https:\u002F\u002Fgithub.com\u002Fastra-vision",[77],{"name":78,"color":79,"percentage":80},"Python","#3572A5",100,802,76,"2026-04-07T06:05:58","Apache-2.0",4,"Linux","Requires an NVIDIA GPU; training needs 4 GPUs (SemanticKITTI) or 2 GPUs (NYUv2); CUDA 10.2","Not specified",{"notes":90,"python":91,"dependencies":92},"1. The official installation guide targets Ubuntu 20.04; Windows\u002FmacOS are not explicitly supported and Mayavi can be hard to install on them. 2. Visualization depends on Mayavi; if installation fails, use the Open3D alternative. 3. You must manually download the SemanticKITTI, NYUv2, or KITTI-360 datasets and set the environment variables. 4. 4 GPUs are recommended for training on SemanticKITTI, 2 for NYUv2.","3.7",[93,94,95,96,97,98,99,100,101,102],"pytorch==1.7.1","torchvision==0.8.2","torchaudio==0.7.2","cudatoolkit=10.2","tbb=2020.2","torchmetrics==0.6.0","mayavi","omegaconf","hydra-core","tqdm",[15,14,104],"Other",[106,107,108,109,110,111,112,113,99,114,115,116,117,118,119],"nyu-depth-v2","semantic-scene-completion","semantic-scene-understanding","single-image-reconstruction","monocular","2d-to-3d","semantic-kitti","kitti-360","pytorch","deep-learning","computer-vision","cvpr22","cvpr2022","occupancy-prediction","2026-03-27T02:49:30.150509","2026-04-11T03:24:41.510858",[123,128,133,138,143,148],{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},28781,"Why do the model metrics in the paper (e.g. AICNet, 3DSketch) differ so much from the numbers reported in the original papers?","The difference is usually due to the evaluation region. If you train and evaluate on the visible surface and occluded space, you get numbers much closer to the original papers; for example, the maintainer reproduced AICNet under this setting with 53.65% IoU and 28.23% mIoU. Make sure your evaluation scope is consistent with the other SSC models.","https:\u002F\u002Fgithub.com\u002Fastra-vision\u002FMonoScene\u002Fissues\u002F66",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},28782,"Training fails with 'relation_loss: nan' or 'total_loss: nan' and the labels contain many 255 values. What is the cause?","A label value of 255 usually marks ignored pixels (the ignore label). If the dataset contains so many 255s that frustum_nonempty becomes zero, the division is skipped or given a default value, which leads to NaN. Also check that the length of the class weights (class_weights) equals your number of classes. Users reported the problem was resolved after adjusting the dataset format or handling the labels.","https:\u002F\u002Fgithub.com\u002Fastra-vision\u002FMonoScene\u002Fissues\u002F95",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},28783,"What should I do about a 'TypeError: int object is not subscriptable' error, or being unable to train due to insufficient GPU memory?","This is usually caused by insufficient VRAM or a misconfiguration. Try either of the following:\n1. Reduce the feature size: around line 54 of `monoscene\u002Fscripts\u002Ftrain_monoscene.py`, reduce the feature size (e.g. set it to 32).\n2. Disable the 3D CRP module: at line 26 of `monoscene\u002Fconfig\u002Fmonoscene.yaml`, set `context_prior` to `false`.\nNote: single-GPU training typically requires a Tesla V100 with 32 GB of VRAM, and reducing the batch size may hurt batch normalization.","https:\u002F\u002Fgithub.com\u002Fastra-vision\u002FMonoScene\u002Fissues\u002F1",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},28784,"The DataLoader cannot load data, or the program exits immediately with no output. How do I fix this?","This is most likely a PyTorch version incompatibility. The code runs correctly under PyTorch 1.7.1, but known problems exist on PyTorch 1.8 and above (see PyTorch issue #54752). Downgrading PyTorch to 1.7.1 is recommended.","https:\u002F\u002Fgithub.com\u002Fastra-vision\u002FMonoScene\u002Fissues\u002F48",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},28785,"How can I train on my own dataset, which only has RGB images and semantic labels but no voxel data?","MonoScene is designed for scene completion (SSC) and normally requires voxelized ground truth. With only RGB and semantics, you need to build your own data loader and preprocess the data into the format the model expects. Key points:\n1. Make sure the length of the `class_weights` list exactly matches the number of classes in your dataset (check `monoscene\u002Floss\u002Fssc_loss.py`).\n2. Without voxel data you may need to modify the network inputs, or use only the encoder for your task, which requires substantial code changes.","https:\u002F\u002Fgithub.com\u002Fastra-vision\u002FMonoScene\u002Fissues\u002F85",{"id":149,"question_zh":150,"answer_zh":151,"source_url":147},28786,"The program crashes when CE_ssc_loss is enabled, but the other losses work. What could be the reason?","This usually indicates a class-count mismatch. Check the `class_weights` parameter in the config or code and make sure its length exactly equals the actual number of classes in your dataset. If the counts do not match, the cross-entropy loss (CE loss) raises a dimension error during computation.",[153],{"id":154,"version":155,"summary_zh":156,"released_at":157},197648,"v0.1","Initial release, with pretrained models","2021-12-09T20:40:23"]
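The FAQ answers above keep returning to the same two pitfalls: a label volume that is entirely ignore-labeled (255) turning the loss into 0/0 = NaN, and a class_weights list whose length does not match the class count, which makes the cross-entropy loss crash. A minimal NumPy sketch of a weighted, ignore-aware cross-entropy illustrates both guards; this is an illustrative stand-in, not MonoScene's actual `ssc_loss.py` code, and the function name `weighted_ce` is invented for this example.

```python
import numpy as np

IGNORE_LABEL = 255  # voxels labeled 255 are excluded from the loss


def weighted_ce(logits, target, class_weights):
    """Weighted cross-entropy over voxels, skipping ignore labels.

    logits: (N, C) raw scores; target: (N,) int labels;
    class_weights: (C,) per-class weights.
    """
    n_classes = logits.shape[1]
    # A mismatched weight length is exactly the error the FAQ warns about.
    assert len(class_weights) == n_classes, \
        "class_weights length must equal the number of classes"
    mask = target != IGNORE_LABEL
    if not mask.any():
        # Guard against the all-ignored case that would yield 0/0 = NaN.
        return 0.0
    logits, target = logits[mask], target[mask]
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    w = class_weights[target]
    nll = -log_probs[np.arange(len(target)), target]
    return float((w * nll).sum() / w.sum())


rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 3))
target = np.array([0, 1, 2, 255, 255, 1])  # two ignore-labeled voxels
weights = np.array([1.0, 2.0, 0.5])        # must have length 3 = n_classes
loss = weighted_ce(logits, target, weights)
print(np.isfinite(loss))                             # prints True
print(weighted_ce(logits, np.full(6, 255), weights)) # all ignored -> 0.0, not NaN
```

The same sanity checks apply when adapting MonoScene to a custom dataset: verify the weight-list length against your class count before training, and decide explicitly what an all-ignored batch should contribute.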