[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-nianticlabs--simplerecon":3,"tool-nianticlabs--simplerecon":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",153609,2,"2026-04-13T11:34:59",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":84,"forks":85,"last_commit_at":86,"license":87,"difficulty_score":10,"env_os":88,"env_gpu":89,"env_ram":90,"env_deps":91,"category_tags":101,"github_topics":102,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":113,"updated_at":114,"faqs":115,"releases":145},7213,"nianticlabs\u002Fsimplerecon","simplerecon","[ECCV 2022] SimpleRecon: 3D Reconstruction Without 3D Convolutions","SimpleRecon 是一款专注于多视图立体深度估计的开源算法，旨在通过输入的带姿态 RGB 图像序列，高效生成目标图像的高精度深度图。它主要解决了传统 3D 重建方法中依赖计算密集型的 3D 卷积操作这一痛点，从而在保持甚至提升重建质量的同时，显著降低了硬件资源消耗并提升了运行速度。\n\n该工具的核心技术亮点在于其独特的架构设计：完全摒弃了昂贵的 3D 卷积层，转而利用高效的 2D 卷积网络配合精心设计的代价体构建策略来完成三维信息推理。这种“去 3D 卷积”的思路不仅简化了模型结构，还使其在扫描网（ScanNet）等标准数据集上取得了极具竞争力的评估分数，包括更低的绝对误差和更高的 F-Score。\n\nSimpleRecon 非常适合计算机视觉领域的研究人员、算法工程师以及需要部署轻量化 3D 
重建方案的开发者使用。对于希望深入理解现代多视图几何深度学习，或需要在有限算力环境下实现高质量点云融合与网格重建的专业人士来说，这是一个极具参考价值的基准实现。项目提供了完整的训练、测试及预训练模型支持，便于用户快速复现论文结果或进行二次开发。","# SimpleRecon: 3D Reconstruction Without 3D Convolutions\n\nThis is the reference PyTorch implementation for training and testing MVS depth estimation models using the method described in\n\n> **SimpleRecon: 3D Reconstruction Without 3D Convolutions**\n>\n> [Mohamed Sayed](https:\u002F\u002Fmasayed.com), [John Gibson](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fjohn-e-gibson-ii\u002F), [Jamie Watson](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fjamie-watson-544825127\u002F), [Victor Adrian Prisacariu](https:\u002F\u002Fwww.robots.ox.ac.uk\u002F~victor\u002F), [Michael Firman](http:\u002F\u002Fwww.michaelfirman.co.uk), and [Clément Godard](http:\u002F\u002Fwww0.cs.ucl.ac.uk\u002Fstaff\u002FC.Godard\u002F)\n>\n> [Paper, ECCV 2022 (arXiv pdf)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.14743), [Supplemental Material](https:\u002F\u002Fnianticlabs.github.io\u002Fsimplerecon\u002Fresources\u002FSimpleRecon_supp.pdf), [Project Page](https:\u002F\u002Fnianticlabs.github.io\u002Fsimplerecon\u002F), [Video](https:\u002F\u002Fyoutu.be\u002F3LP8jp45Ef8)\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnianticlabs_simplerecon_readme_4d9e8f3f1db2.jpeg\" alt=\"example output\" width=\"720\" \u002F>\n\u003C\u002Fp>\n\nhttps:\u002F\u002Fgithub.com\u002Fnianticlabs\u002Fsimplerecon\u002Fassets\u002F14994206\u002Fae5074c2-6537-45f1-9f5e-0b3646a96dcb\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F14994206\u002F189788536-5fa8a1b5-ae8b-4f64-92d6-1ff1abb03eaf.mp4\n\nThis code is for non-commercial use; please see the [license file](LICENSE) for terms. If you do find any part of this codebase helpful, please cite our paper using the BibTex below and link this repo. Thanks!\n\n## 🆕 Updates\n\n25\u002F05\u002F2023: Fixed package versions for `llvm-openmp`, `clang`, and `protobuf`. 
Do use this new environment file if you have trouble running the code and\u002For if dataloading is being limited to a single thread.\n\n09\u002F03\u002F2023: Added kornia version to the environments file to fix kornia typing issue. (thanks @natesimon!)\n\n26\u002F01\u002F2023: The license has been modified to make running the model for academic reasons easier. Please see the LICENSE file for the exact details.\n\nThere is an update as of 31\u002F12\u002F2022 that fixes slightly wrong intrinsics, flip augmentation for the cost volume, and a \nnumerical precision bug in projection. All scores improve. You will need to update your forks and use new weights. See [Bug Fixes](#-bug-fixes).\n\nPrecomputed scans for online default frames are here: https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1dSOFI9GayYHQjsx4I_NG0-3ebCAfWXjV?usp=share_link \n\n## Table of Contents\n\n  * [🗺️ Overview](#%EF%B8%8F-overview)\n  * [⚙️ Setup](#%EF%B8%8F-setup)\n  * [📦 Models](#-models)\n  * [🚀 Speed](#-speed)\n  * [📝 TODOs:](#-todos)\n  * [🏃 Running out of the box!](#-running-out-of-the-box)\n  * [💾 ScanNetv2 Dataset](#-scannetv2-dataset)\n  * [🖼️🖼️🖼️ Frame Tuples](#%EF%B8%8F%EF%B8%8F%EF%B8%8F-frame-tuples)\n  * [📊 Testing and Evaluation](#-testing-and-evaluation)\n  * [👉☁️ Point Cloud Fusion](#%EF%B8%8F-point-cloud-fusion)\n  * [📊 Mesh Metrics](#-mesh-metrics)\n  * [⏳ Training](#-training)\n    + [🎛️ Finetuning a pretrained model](#%EF%B8%8F-finetuning-a-pretrained-model)\n  * [🔧 Other training and testing options](#-other-training-and-testing-options)\n  * [✨ Visualization](#-visualization)\n  * [📝🧮👩‍💻 Notation for Transformation Matrices](#-notation-for-transformation-matrices)\n  * [🗺️ World Coordinate System](#%EF%B8%8F-world-coordinate-system)\n  * [🐜🔧 Bug Fixes](#-bug-fixes)\n  * [🗺️💾 COLMAP Dataset](#%EF%B8%8F-colmap-dataset)\n  * [🙏 Acknowledgements](#-acknowledgements)\n  * [📜 BibTeX](#-bibtex)\n  * [👩‍⚖️ License](#%EF%B8%8F-license)\n\n## 🗺️ Overview\n\nSimpleRecon 
takes as input posed RGB images, and outputs a depth map for a target image.\n\n## ⚙️ Setup\n\nAssuming a fresh [Anaconda](https:\u002F\u002Fwww.anaconda.com\u002Fdownload\u002F) distribution, you can install dependencies with:\n```shell\nconda env create -f simplerecon_env.yml\n```\nWe ran our experiments with PyTorch 1.10, CUDA 11.3, Python 3.9.7 and Debian GNU\u002FLinux 10.\n\n## 📦 Models\n\nDownload a pretrained model into the `weights\u002F` folder.\n\nWe provide the following models (scores are with online default keyframes):\n\n| `--config`  | Model  | Abs Diff↓| Sq Rel↓ | delta \u003C 1.05↑| Chamfer↓ | F-Score↑ |\n|-------------|----------|--------------------|---------|---------|--------------|----------|\n| [`hero_model.yaml`](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1hCuKZjEq-AghrYAmFxJs_4eeixIlP488\u002Fview?usp=sharing) | Metadata + Resnet Matching | 0.0868 | 0.0127 | 74.26 | 5.69 | 0.680 |\n| [`dot_product_model.yaml`](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F13lW-VPgsl2eAo95E87RKWoK8KUZelkUK\u002Fview?usp=sharing) | Dot Product + Resnet Matching | 0.0910 | 0.0134 | 71.90 | 5.92 | 0.667 |\n\n`hero_model` is the one we use in the paper as **Ours**\n\n## 🚀 Speed\n\n| `--config` |  Model | Inference Speed (`--batch_size 1`) | Inference GPU memory  | Approximate training time   |\n|------------|------------|------------|-------------------------|-----------------------------|\n| `hero_model` | Hero, Metadata + Resnet | 130ms \u002F 70ms (speed optimized) | 2.6GB \u002F 5.7GB (speed optimized)        | 36 hours                    |\n| `dot_product_model` | Dot Product + Resnet | 80ms | 2.6GB        | 36 hours                    |\n\nWith larger batches speed increases considerably. With batch size 8 on the non-speed optimized model, the latency drops to \n~40ms.\n\n## 📝 TODOs:\n- [x] Simple scan for folks to quickly try the code, instead of downloading the ScanNetv2 test scenes. 
DONE\n- [x] ScanNetv2 extraction, ~~ETA 10th October~~ DONE\n- [ ] FPN model weights.\n- ~~[ ] Tutorial on how to use Scanniverse data, ETA 5th October 10th October 20th October~~ At present there is no publicly available way of exporting scans from Scanniverse. You'll have to use ios-logger; NeuralRecon have a good tutorial on [this](https:\u002F\u002Fgithub.com\u002Fzju3dv\u002FNeuralRecon\u002Fblob\u002Fmaster\u002FDEMO.md), and a dataloader that accepts the processed format is at ```datasets\u002Farkit_dataset.py```. UPDATE: There is now a quick readme [data_scripts\u002FIOS_LOGGER_ARKIT_README.md](data_scripts\u002FIOS_LOGGER_ARKIT_README.md) for how to process and run inference on an ios-logger scan using the script at ```data_scripts\u002Fios_logger_preprocessing.py```.\n\n## 🏃 Running out of the box!\n\nWe've now included two scans for people to try out immediately with the code. You can download these scans [from here](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1x-auV7vGCMdu5yZUMPcoP83p77QOuasT\u002Fview?usp=sharing).\n\nSteps:\n1. Download weights for the `hero_model` into the weights directory.\n2. Download the scans and unzip them to a directory of your choosing.\n3. Modify the value for the option `dataset_path` in `configs\u002Fdata\u002Fvdr_dense.yaml` to the base path of the unzipped vdr folder.\n4. You should be able to run it! 
Something like this will work:\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fvdr_dense.yaml \\\n            --num_workers 8 \\\n            --batch_size 2 \\\n            --fast_cost_volume \\\n            --run_fusion \\\n            --depth_fuser open3d \\\n            --fuse_color \\\n            --dump_depth_visualization;\n```\n\nThis will output meshes, quick depth viz, and scores when benchmarked against LiDAR depth under `OUTPUT_PATH`. \n\nThis command uses `vdr_dense.yaml` which will generate depths for every frame and fuse them into a mesh. In the paper we report scores with fused keyframes instead, and you can run those using `vdr_default.yaml`. You can also use `dense_offline` tuples by instead using `vdr_dense_offline.yaml`.\n\n\n\nSee the section below on testing and evaluation. Make sure to use the correct config flags for datasets. \n\n## 💾 ScanNetv2 Dataset\n\n~~Please follow the instructions [here](https:\u002F\u002Fgithub.com\u002FScanNet\u002FScanNet) to download the dataset. This dataset is quite big (>2TB), so make sure you have enough space, especially for extracting files.~~\n\n~~Once downloaded, use this [script](https:\u002F\u002Fgithub.com\u002FScanNet\u002FScanNet\u002Ftree\u002Fmaster\u002FSensReader\u002Fpython) to export raw sensor data to images and depth files.~~\n\nWe've written a quick tutorial and included modified scripts to help you with downloading and extracting ScanNetv2. 
You can find them at [data_scripts\u002Fscannet_wrangling_scripts\u002F](data_scripts\u002Fscannet_wrangling_scripts)\n\nYou should change the `dataset_path` config argument for ScanNetv2 data configs at `configs\u002Fdata\u002F` to match where your dataset is.\n\nThe codebase expects ScanNetv2 to be in the following format:\n\n    dataset_path\n        scans_test (test scans)\n            scene0707\n                scene0707_00_vh_clean_2.ply (gt mesh)\n                sensor_data\n                    frame-000261.pose.txt\n                    frame-000261.color.jpg \n                    frame-000261.color.512.png (optional, image at 512x384)\n                    frame-000261.color.640.png (optional, image at 640x480)\n                    frame-000261.depth.png (full res depth, stored scale *1000)\n                    frame-000261.depth.256.png (optional, depth at 256x192 also\n                                                scaled)\n                scene0707.txt (scan metadata and image sizes)\n                intrinsic\n                    intrinsic_depth.txt\n                    intrinsic_color.txt\n            ...\n        scans (val and train scans)\n            scene0000_00\n                (see above)\n            scene0000_01\n            ....\n\nIn this example `scene0707.txt` should contain the scan's metadata:\n\n        colorHeight = 968\n        colorToDepthExtrinsics = 0.999263 -0.010031 0.037048 ........\n        colorWidth = 1296\n        depthHeight = 480\n        depthWidth = 640\n        fx_color = 1170.187988\n        fx_depth = 570.924255\n        fy_color = 1170.187988\n        fy_depth = 570.924316\n        mx_color = 647.750000\n        mx_depth = 319.500000\n        my_color = 483.750000\n        my_depth = 239.500000\n        numColorFrames = 784\n        numDepthFrames = 784\n        numIMUmeasurements = 1632\n\n`frame-000261.pose.txt` should contain pose in the form:\n\n        -0.384739 0.271466 -0.882203 4.98152\n        0.921157 
0.0521417 -0.385682 1.46821\n        -0.0587002 -0.961035 -0.270124 1.51837\n\n`frame-000261.color.512.png` and `frame-000261.color.640.png` are precached resized versions of the original image to save load and compute time during training and testing. `frame-000261.depth.256.png` is also a \nprecached resized version of the depth map. \n\nAll resized precached versions of depth and images are nice to have but not \nrequired. If they don't exist, the full resolution versions will be loaded, and downsampled on the fly.\n\n\n## 🖼️🖼️🖼️ Frame Tuples\n\nBy default, we estimate a depth map for each keyframe in a scan. We use DeepVideoMVS's heuristic for keyframe separation and construct tuples to match. We use the depth maps at these keyframes for depth fusion. For each keyframe, we associate a list of source frames that will be used to build the cost volume. We also use dense tuples, where we predict a depth map for each frame in the data, and not just at specific keyframes; these are mostly used for visualization.\n\nWe generate and export a list of tuples across all scans that act as the dataset's elements. We've precomputed these lists and they are available at `data_splits` under each dataset's split. For ScanNet's test scans they are at `data_splits\u002FScanNetv2\u002Fstandard_split`. Our core depth numbers are computed using `data_splits\u002FScanNetv2\u002Fstandard_split\u002Ftest_eight_view_deepvmvs.txt`.\n\n\n\nHere's a quick taxonomy of the types of tuples for test:\n\n- `default`: a tuple for every keyframe following DeepVideoMVS where all source frames are in the past. Used for all depth and mesh evaluation unless stated otherwise. For ScanNet use `data_splits\u002FScanNetv2\u002Fstandard_split\u002Ftest_eight_view_deepvmvs.txt`.\n- `offline`: a tuple for every frame in the scan where source frames can be both in the past and future relative to the current frame. These are useful when a scene is captured offline, and you want the best accuracy possible. 
With online tuples, the cost volume will contain empty regions as the camera moves away and all source frames lag behind; however with offline tuples, the cost volume is full on both ends, leading to a better scale (and metric) estimate.\n- `dense`: an online tuple (like default) for every frame in the scan where all source frames are in the past. For ScanNet this would be `data_splits\u002FScanNetv2\u002Fstandard_split\u002Ftest_eight_view_deepvmvs_dense.txt`.\n- `offline`: an offline tuple for every keyframe in the scan.\n\n\nFor the train and validation sets, we follow the same tuple augmentation strategy as in DeepVideoMVS and use the same core generation script.\n\nIf you'd like to generate these tuples yourself, you can use the scripts at `data_scripts\u002Fgenerate_train_tuples.py` for train tuples and `data_scripts\u002Fgenerate_test_tuples.py` for test tuples. These follow the same config format as `test.py` and will use whatever dataset class you build to read pose information.\n\nExample for test:\n\n```bash\n# default tuples\npython .\u002Fdata_scripts\u002Fgenerate_test_tuples.py \\\n    --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n    --num_workers 16\n\n# dense tuples\npython .\u002Fdata_scripts\u002Fgenerate_test_tuples.py \\\n    --data_config configs\u002Fdata\u002Fscannet_dense_test.yaml \\\n    --num_workers 16\n```\n\nExamples for train:\n\n```bash\n# train\npython .\u002Fdata_scripts\u002Fgenerate_train_tuples.py \\\n    --data_config configs\u002Fdata\u002Fscannet_default_train.yaml \\\n    --num_workers 16\n\n# val\npython .\u002Fdata_scripts\u002Fgenerate_val_tuples.py \\\n    --data_config configs\u002Fdata\u002Fscannet_default_val.yaml \\\n    --num_workers 16\n```\n\nThese scripts will first check each frame in the dataset to make sure it has an existing RGB frame, an existing depth frame (if appropriate for the dataset), and also an existing and valid pose file. 
They will save these `valid_frames` in a text file in each scan's folder, but if the directory is read only, they will skip saving a `valid_frames` file and generate tuples anyway.\n\n\n## 📊 Testing and Evaluation\n\nYou can use `test.py` for inferring and evaluating depth maps and fusing meshes. \n\nAll results will be stored at a base results folder (results_path) at:\n\n    opts.output_base_path\u002Fopts.name\u002Fopts.dataset\u002Fopts.frame_tuple_type\u002F\n\nwhere opts is the `options` class. For example, when `opts.output_base_path` is `.\u002Fresults`, `opts.name` is `HERO_MODEL`,\n`opts.dataset` is `scannet`, and `opts.frame_tuple_type` is `default`, the output directory will be \n\n    .\u002Fresults\u002FHERO_MODEL\u002Fscannet\u002Fdefault\u002F\n\nMake sure to set `--output_base_path` to a directory suitable for you to store results.\n\n`--frame_tuple_type` is the type of image tuple used for MVS. A selection should \nbe provided in the `data_config` file you used. \n\nBy default `test.py` will attempt to compute depth scores for each frame and provide both frame averaged and scene averaged metrics. The script will save these scores (per scene and totals) under `results_path\u002Fscores`.\n\nWe've done our best to ensure that a torch batching bug through the matching \nencoder is fixed for (\u003C10^-4) accurate testing by disabling image batching \nthrough that encoder. Run `--batch_size 4` at most if in doubt, and if \nyou're looking to get numbers that are as stable as possible and avoid PyTorch \ngremlins, use `--batch_size 1` for comparison evaluation.\n\nIf you want to use this for speed, pass the `--fast_cost_volume` flag. 
This will\nenable batching through the matching encoder and will enable an einops \noptimized feature volume.\n\n\n```bash\n# Example command to just compute scores \nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n            --num_workers 8 \\\n            --batch_size 4;\n\n# If you'd like to get a super fast version use:\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n            --num_workers 8 \\\n            --fast_cost_volume \\\n            --batch_size 2;\n```\n\nThis script can also be used to perform a few different auxiliary tasks, \nincluding:\n\n**TSDF Fusion**\n\nTo run TSDF fusion provide the `--run_fusion` flag. You have two choices for \nfusers:\n1) `--depth_fuser ours` (default) will use our fuser, whose meshes are used \n    in most visualizations and for scores. This fuser does not support \n    color. We've provided a custom branch of scikit-image with our custom\n    implementation of `measure.marching_cubes` that allows single-walled meshes. We use \n    single-walled meshes for evaluation. If this isn't important to you, you\n    can set `export_single_mesh` to `False` in the call to `export_mesh` in `test.py`.\n2) `--depth_fuser open3d` will use the open3d depth fuser. This fuser \n    supports color and you can enable this by using the `--fuse_color` flag. 
\n\nBy default, depth maps will be clipped to 3m for fusion and a TSDF \nresolution of 0.04m\u003Csup>3\u003C\u002Fsup> will be used, but you can change that by setting \n`--max_fusion_depth` and `--fusion_resolution`.\n\nYou can optionally ask for predicted depths used for fusion to be masked \nwhen no valid MVS information exists using `--mask_pred_depths`. This is not \nenabled by default.\n\nYou can also fuse the best guess depths from the cost volume before the \ncost volume encoder-decoder that introduces a strong image prior. You can do this by using \n`--fusion_use_raw_lowest_cost`.\n\nMeshes will be stored under `results_path\u002Fmeshes\u002F`.\n\n```bash\n# Example command to fuse depths to get meshes\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n            --num_workers 8 \\\n            --run_fusion \\\n            --batch_size 8;\n```\n\n**Cache depths**\n\nYou can optionally store depths by providing the `--cache_depths` flag. 
\nThey will be stored at `results_path\u002Fdepths`.\n\n```bash\n# Example command to compute scores and cache depths\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n            --num_workers 8 \\\n            --cache_depths \\\n            --batch_size 8;\n\n# Example command to fuse depths to get color meshes\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n            --num_workers 8 \\\n            --run_fusion \\\n            --depth_fuser open3d \\\n            --fuse_color \\\n            --batch_size 4;\n```\n**Quick viz**\n\nThere are other scripts for deeper visualizations of output depths and \nfusion, but for quick export of depth map visualization you can use \n`--dump_depth_visualization`. 
Visualizations will be stored at `results_path\u002Fviz\u002Fquick_viz\u002F`.\n\n\n```bash\n# Example command to output quick depth visualizations\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n            --num_workers 8 \\\n            --dump_depth_visualization \\\n            --batch_size 4;\n```\n## 👉☁️ Point Cloud Fusion\n\nWe also allow point cloud fusion of depth maps using the fuser from 3DVNet's [repo](https:\u002F\u002Fgithub.com\u002Falexrich021\u002F3dvnet\u002Fblob\u002Fmain\u002Fmv3d\u002Feval\u002Fpointcloudfusion_custom.py). \n\n```bash\n# Example command to fuse depths into point clouds.\nCUDA_VISIBLE_DEVICES=0 python pc_fusion.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_dense_test.yaml \\\n            --num_workers 8 \\\n            --batch_size 4;\n```\n\nChange `configs\u002Fdata\u002Fscannet_dense_test.yaml` to `configs\u002Fdata\u002Fscannet_default_test.yaml` to use keyframes only if you don't want to wait too long.\n\n## 📊 Mesh Metrics\n\nWe use TransformerFusion's [mesh evaluation](https:\u002F\u002Fgithub.com\u002FAljazBozic\u002FTransformerFusion\u002Fblob\u002Fmain\u002Fsrc\u002Fevaluation\u002Feval.py) for our main results table but set the seed to a fixed value for consistency when randomly sampling meshes. 
We also report mesh metrics using NeuralRecon's [evaluation](https:\u002F\u002Fgithub.com\u002Fzju3dv\u002FNeuralRecon\u002Fblob\u002Fmaster\u002Ftools\u002Fevaluation.py) in the supplemental material.\n\nFor point cloud evaluation, we use TransformerFusion's code but load in a point cloud in place of sampling a mesh's surface.\n\n\n\n## ⏳ Training\n\nBy default models and tensorboard event files are saved to `~\u002Ftmp\u002Ftensorboard\u002F\u003Cmodel_name>`.\nThis can be changed with the `--log_dir` flag.\n\nWe train with a batch_size of 16 with 16-bit precision on two A100s on the default ScanNetv2 split.\n\nExample command to train with two GPUs:\n```shell\nCUDA_VISIBLE_DEVICES=0,1 python train.py --name HERO_MODEL \\\n            --log_dir logs \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --data_config configs\u002Fdata\u002Fscannet_default_train.yaml \\\n            --gpus 2 \\\n            --batch_size 16;\n```\n\n\nThe code supports any number of GPUs for training.\nYou can specify which GPUs to use with the `CUDA_VISIBLE_DEVICES` environment variable.\n\nAll our training runs were performed on two NVIDIA A100s.\n\n**Different dataset**\n\nYou can train on a custom MVS dataset by writing a new dataloader class which inherits from `GenericMVSDataset` at `datasets\u002Fgeneric_mvs_dataset.py`. See the `ScannetDataset` class in `datasets\u002Fscannet_dataset.py` or indeed any other class in `datasets` for an example.\n\n\n### 🎛️ Finetuning a pretrained model\n\nTo finetune, simply load a checkpoint (not resume!) and train from there:\n```shell\nCUDA_VISIBLE_DEVICES=0 python train.py --config configs\u002Fmodels\u002Fhero_model.yaml \\\n                --data_config configs\u002Fdata\u002Fscannet_default_train.yaml \\\n                --load_weights_from_checkpoint weights\u002Fhero_model.ckpt\n```\n\nChange the data configs to whatever dataset you want to finetune to. 
\n\n## 🔧 Other training and testing options\n\nSee `options.py` for the range of other training options, such as learning rates and ablation settings, and testing options.\n\n## ✨ Visualization\n\nOther than quick depth visualization in the `test.py` script, there are two scripts for visualizing depth output. \n\nThe first is `visualization_scripts\u002Fvisualize_scene_depth_output.py`. This will produce a video with color images of the reference and source frames, depth prediction, cost volume estimate, GT depth, and estimated normals from depth. The script assumes you have cached depth output using `test.py` and accepts the same command template format as `test.py`:\n\n```shell\n# Example command to get visualizations for dense frames\nCUDA_VISIBLE_DEVICES=0 python .\u002Fvisualization_scripts\u002Fvisualize_scene_depth_output.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --data_config configs\u002Fdata\u002Fscannet_dense_test.yaml \\\n            --num_workers 8;\n```\n\nwhere `OUTPUT_PATH` is the base results directory for SimpleRecon (what you used for test to begin with). You could optionally run `.\u002Fvisualization_scripts\u002Fgenerate_gt_min_max_cache.py` before this script to get a scene average for the min and max depth values used for colormapping; if those aren't available, the script will use 0m and 5m for colormapping min and max.\n\nThe second allows a live visualization of meshing. This script will use cached depth maps if available, otherwise it will use the model to predict them before fusion. The script will iteratively load in a depth map, fuse it, save a mesh file at this step, and render this mesh alongside a camera marker for the birdseye video, and from the point of view of the camera for the fpv video. 
\n\n```shell\n# Example command to get live visualizations for mesh reconstruction\nCUDA_VISIBLE_DEVICES=0 python visualize_live_meshing.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_dense_test.yaml \\\n            --num_workers 8;\n```\n\nBy default the script will save meshes to an intermediate location, and you can optionally load those meshes to save time when visualizing the same meshes again by passing `--use_precomputed_partial_meshes`. All intermediate meshes will have had to be computed on the previous run for this to work.\n\n## 📝🧮👩‍💻 Notation for Transformation Matrices\n\n__TL;DR:__ `world_T_cam == world_from_cam`  \nThis repo uses the notation \"cam_T_world\" to denote a transformation from world to camera points (extrinsics). The intention is to make it so that the coordinate frame names would match on either side of the variable when used in multiplication from *right to left*:\n\n    cam_points = cam_T_world @ world_points\n\n`world_T_cam` denotes camera pose (from cam to world coords). `ref_T_src` denotes a transformation from a source to a reference view.  \nFinally this notation allows for representing both rotations and translations such as: `world_R_cam` and `world_t_cam`\n\n## 🗺️ World Coordinate System\n\nThis repo is geared towards ScanNet, so while its functionality should allow for any coordinate system (signaled via input flags), the model weights we provide assume a ScanNet coordinate system. This is important since we include ray information as part of metadata. Other datasets used with these weights should be transformed to the ScanNet system. The dataset classes we include will perform the appropriate transforms. 
\n\n## 🐜🔧 Bug Fixes\n\n### **Update 31\u002F12\u002F2022:**\n\nA few bugs are addressed in this update. You will need to update your forks and use the new weights from the table near the beginning of this README. You will also need to make sure you have the correct intrinsics files extracted using the reader.\n- We were initially using a slightly incorrect set of intrinsics in ScanNet. The repo now uses intrinsics from the intrinsics folder.\n- The MLP in the cost volume wasn't seeing any flip augmentation, which led to biases around edges, so we've now included a geometry-based flip in the base dataset class. It is enabled only for the train split.\n- We had a bug in projection that never allowed the mask in the cost volume to function properly, so we've now switched to using the same normalization as in OpenCV and Kornia.\n\nThanks to all those who pointed these out and were patient while we worked on fixes. \n\nScores improve on nearly all metrics with these fixes (see the comparison table below), and the associated weights are uploaded here. 
For old scores, code, and weights, check this commit hash: 7de5b451e340f9a11c7fd67bd0c42204d0b009a9\n\nFull scores for models with bug fixes:\n\n_Depth_\n| `--config`  | Abs Diff↓ | Abs Rel↓ | Sq Rel↓ |  RMSE↓  |  log RMSE↓  |delta \u003C 1.05↑ | delta \u003C 1.10↑ |\n|-------------|-----------|----------|---------|---------|-------------|--------------|---------------|\n| `hero_model.yaml`, Metadata + Resnet  | 0.0868 | 0.0428 | 0.0127 | 0.1472 |  0.0681 | 74.26 | 90.88 |\n| `dot_product_model.yaml`, dot product + Resnet | 0.0910 | 0.0453 | 0.0134 | 0.1509 | 0.0704 | 71.90 | 89.75 | \n\n_Mesh Fusion_\n| `--config`  | Acc↓ | Comp↓ | Chamfer↓ | Recall↑ | Precision↑ | F-Score↑ |\n|-------------|------|-------|----------|---------|------------|----------|\n| `hero_model.yaml`, Metadata + Resnet | 5.41 | 5.98 | 5.69 | 0.695 | 0.668 | 0.680 |\n| `dot_product_model.yaml`, dot product + Resnet | 5.66 | 6.18 | 5.92 | 0.682 | 0.655 | 0.667 | \n\n\n_Comparison:_\n| `--config`  | Model  | Abs Diff↓| Sq Rel↓ | delta \u003C 1.05↑| Chamfer↓ | F-Score↑ |\n|-------------|----------|--------------------|---------|---------|--------------|----------|\n| `hero_model.yaml` | Metadata + Resnet Matching | 0.0868 | 0.0127 | 74.26 | 5.69 | 0.680 |\n| OLD `hero_model.yaml` | Metadata + Resnet Matching | 0.0885 | 0.0125 | 73.16 | 5.81 | 0.671 |\n| `dot_product_model.yaml` | Dot Product + Resnet Matching | 0.0910 | 0.0134 | 71.90 | 5.92 | 0.667 |\n| OLD `dot_product_model.yaml` | Dot Product + Resnet Matching | 0.0941 | 0.0139 | 70.48 | 6.29 | 0.642 |\n\n\n### **Tiny bug with frame count:**\n\nInitially, this repo produced tuple files for default DVMVS-style keyframes with 9 extra frames, for a total of 25599 on the ScanNetv2 test set. This was caused by a minor bug in handling lost tracking, which is now fixed. This repo should now mimic the DVMVS keyframe buffer exactly, with 25590 keyframes for testing. 
The only effect this bug had was the inclusion of 9 extra frames; all the other tuples were exactly the same as those of DVMVS. The offending frames are in these scans:\n\n```\nscan         previous count  new count\n--------------------------------------\nscene0711_00 393             392\nscene0727_00 209             208 \nscene0736_00 1023            1022 \nscene0737_00 408             407 \nscene0751_00 165             164 \nscene0775_00 220             219 \nscene0791_00 227             226 \nscene0794_00 141             140 \nscene0795_00 102             101 \n```\n\nThe tuple files for default test have been updated. Since this is a small (~3e-4) difference in extra frames scored, the scores are unchanged.\n\n## 🗺️💾 COLMAP Dataset\n\n__TL;DR:__ Scale your poses and crop your images.\n\nWe provide a dataloader for loading images from a COLMAP sparse reconstruction. For this to work with SimpleRecon, you'll need to crop your images to match the FOV of ScanNet (roughly similar to an iPhone's FOV in video mode), and scale your poses' locations using known real-world measurements. 
If these steps aren't taken, the cost volume won't be built correctly, and the network will not estimate depth properly.\n\n## 🙏 Acknowledgements\n\nWe thank Aljaž Božič of [TransformerFusion](https:\u002F\u002Fgithub.com\u002FAljazBozic\u002FTransformerFusion), Jiaming Sun of [Neural Recon](https:\u002F\u002Fzju3dv.github.io\u002Fneuralrecon\u002F), and Arda Düzçeker of [DeepVideoMVS](https:\u002F\u002Fgithub.com\u002Fardaduz\u002Fdeep-video-mvs) for quickly providing useful information to help with baselines and for making their codebases readily available, especially on short notice.\n\nThe tuple generation scripts make heavy use of a modified version of DeepVideoMVS's [Keyframe buffer](https:\u002F\u002Fgithub.com\u002Fardaduz\u002Fdeep-video-mvs\u002Fblob\u002Fmaster\u002Fdvmvs\u002Fkeyframe_buffer.py) (thanks again Arda and co!).\n\nThe PyTorch point cloud fusion module in `torch_point_cloud_fusion` is borrowed from 3DVNet's [repo](https:\u002F\u002Fgithub.com\u002Falexrich021\u002F3dvnet\u002Fblob\u002Fmain\u002Fmv3d\u002Feval\u002Fpointcloudfusion_custom.py). Thanks Alexander Rich!\n\nWe'd also like to thank Niantic's infrastructure team for quick actions when we needed them. Thanks folks!\n\nMohamed is funded by a Microsoft Research PhD Scholarship (MRL 2018-085).\n\n## 📜 BibTeX\n\nIf you find our work useful in your research, please consider citing our paper:\n\n```\n@inproceedings{sayed2022simplerecon,\n  title={SimpleRecon: 3D Reconstruction Without 3D Convolutions},\n  author={Sayed, Mohamed and Gibson, John and Watson, Jamie and Prisacariu, Victor and Firman, Michael and Godard, Cl{\\'e}ment},\n  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},\n  year={2022},\n}\n```\n\n## 👩‍⚖️ License\n\nCopyright © Niantic, Inc. 2022. 
Patent Pending.\nAll rights reserved.\nPlease see the [license file](LICENSE) for terms.\n","# SimpleRecon：无需3D卷积的3D重建\n\n这是使用以下论文中描述的方法训练和测试MVS深度估计模型的参考PyTorch实现：\n\n> **SimpleRecon：无需3D卷积的3D重建**\n>\n> [Mohamed Sayed](https:\u002F\u002Fmasayed.com)、[John Gibson](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fjohn-e-gibson-ii\u002F)、[Jamie Watson](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fjamie-watson-544825127\u002F)、[Victor Adrian Prisacariu](https:\u002F\u002Fwww.robots.ox.ac.uk\u002F~victor\u002F)、[Michael Firman](http:\u002F\u002Fwww.michaelfirman.co.uk) 和 [Clément Godard](http:\u002F\u002Fwww0.cs.ucl.ac.uk\u002Fstaff\u002FC.Godard\u002F)\n>\n> [论文，ECCV 2022（arXiv pdf）](https:\u002F\u002Farxiv.org\u002Fabs\u002F2208.14743)、[补充材料](https:\u002F\u002Fnianticlabs.github.io\u002Fsimplerecon\u002Fresources\u002FSimpleRecon_supp.pdf)、[项目页面](https:\u002F\u002Fnianticlabs.github.io\u002Fsimplerecon\u002F)、[视频](https:\u002F\u002Fyoutu.be\u002F3LP8jp45Ef8)\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnianticlabs_simplerecon_readme_4d9e8f3f1db2.jpeg\" alt=\"示例输出\" width=\"720\" \u002F>\n\u003C\u002Fp>\n\nhttps:\u002F\u002Fgithub.com\u002Fnianticlabs\u002Fsimplerecon\u002Fassets\u002F14994206\u002Fae5074c2-6537-45f1-9f5e-0b3646a96dcb\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F14994206\u002F189788536-5fa8a1b5-ae8b-4f64-92d6-1ff1abb03eaf.mp4\n\n此代码仅供非商业用途；详细条款请参阅[许可证文件](LICENSE)。如果您发现本代码库中的任何部分对您有所帮助，请使用下面的BibTex引用我们的论文，并链接到此仓库。谢谢！\n\n## 🆕 更新\n\n2023年5月25日：修复了`llvm-openmp`、`clang`和`protobuf`的软件包版本。如果您在运行代码时遇到困难，或者数据加载被限制为单线程，请使用这个新的环境文件。\n\n2023年3月9日：在环境文件中添加了kornia版本，以修复kornia的类型问题。（感谢@natesimon！）\n\n2023年1月26日：许可证已修改，以便更方便地出于学术目的运行模型。具体细节请参阅LICENSE文件。\n\n2022年12月31日发布了一个更新，修复了略微错误的内参、代价体积的翻转增强以及投影中的数值精度 bug。所有指标均有所提升。您需要更新您的分支并使用新的权重。详情请参阅[Bug 
Fixes](#-bug-fixes)。\n\n用于在线默认帧的预计算扫描在此处：https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1dSOFI9GayYHQjsx4I_NG0-3ebCAfWXjV?usp=share_link \n\n## 目录\n\n  * [🗺️ 概述](#%EF%B8%8F-overview)\n  * [⚙️ 设置](#%EF%B8%8F-setup)\n  * [📦 模型](#-models)\n  * [🚀 速度](#-speed)\n  * [📝 待办事项：](#-todos)\n  * [🏃 开箱即用！](#-running-out-of-the-box)\n  * [💾 ScanNetv2 数据集](#-scannetv2-dataset)\n  * [🖼️🖼️🖼️ 帧组](#%EF%B8%8F%EF%B8%8F%EF%B8%8F-frame-tuples)\n  * [📊 测试与评估](#-testing-and-evaluation)\n  * [👉☁️ 点云融合](#%EF%B8%8F-point-cloud-fusion)\n  * [📊 网格指标](#-mesh-metrics)\n  * [⏳ 训练](#-training)\n    + [🎛️ 微调预训练模型](#%EF%B8%8F-finetuning-a-pretrained-model)\n  * [🔧 其他训练和测试选项](#-other-training-and-testing-options)\n  * [✨ 可视化](#-visualization)\n  * [📝🧮👩‍💻 变换矩阵表示法](#-notation-for-transformation-matrices)\n  * [🗺️ 世界坐标系](#%EF%B8%8F-world-coordinate-system)\n  * [🐜🔧 Bug Fixes](#-bug-fixes)\n  * [🗺️💾 COLMAP 数据集](#%EF%B8%8F-colmap-dataset)\n  * [🙏 致谢](#-acknowledgements)\n  * [📜 BibTeX](#-bibtex)\n  * [👩‍⚖️ 许可证](#%EF%B8%8F-license)\n\n## 🗺️ 概述\n\nSimpleRecon以带有位姿信息的RGB图像作为输入，输出目标图像的深度图。\n\n## ⚙️ 设置\n\n假设您已经安装了全新的[Anaconda](https:\u002F\u002Fwww.anaconda.com\u002Fdownload\u002F)发行版，您可以使用以下命令安装依赖项：\n```shell\nconda env create -f simplerecon_env.yml\n```\n我们使用PyTorch 1.10、CUDA 11.3、Python 3.9.7和Debian GNU\u002FLinux 10进行了实验。\n\n## 📦 模型\n\n将预训练模型下载到`weights\u002F`文件夹中。\n\n我们提供了以下模型（分数基于在线默认关键帧）：\n\n| `--config`  | 模型  | 绝对误差↓| 均方相对误差↓ | delta \u003C 1.05↑| Chamfer↓ | F-Score↑ |\n|-------------|----------|--------------------|---------|---------|--------------|----------|\n| [`hero_model.yaml`](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1hCuKZjEq-AghrYAmFxJs_4eeixIlP488\u002Fview?usp=sharing) | 元数据 + Resnet匹配 | 0.0868 | 0.0127 | 74.26 | 5.69 | 0.680 |\n| [`dot_product_model.yaml`](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F13lW-VPgsl2eAo95E87RKWoK8KUZelkUK\u002Fview?usp=sharing) | 点积 + Resnet匹配 | 0.0910 | 0.0134 | 71.90 | 5.92 | 0.667 
|\n\n`hero_model`是我们论文中使用的模型，标记为**Ours**。\n\n## 🚀 速度\n\n| `--config` |  模型 | 推理速度 (`--batch_size 1`) | 推理GPU内存  | 大致训练时间   |\n|------------|------------|------------|-------------------------|-----------------------------|\n| `hero_model` | 英雄，元数据 + Resnet | 130ms \u002F 70ms（速度优化） | 2.6GB \u002F 5.7GB（速度优化）        | 36小时                    |\n| `dot_product_model` | 点积 + Resnet | 80ms | 2.6GB        | 36小时                    |\n\n使用更大的批量时，速度会显著提高。在未优化速度的模型上，当批量大小为8时，延迟降至约40毫秒。\n\n## 📝 待办事项：\n- [x] 提供一个简单的扫描数据，方便大家快速试用代码，而不用下载ScanNetv2的测试场景。已完成\n- [x] ScanNetv2数据提取，~~预计10月10日~~ 已完成\n- [ ] FPN模型权重。\n- ~~[ ] 如何使用Scanniverse数据的教程，预计10月5日、10月10日、10月20日~~ 目前尚无公开可用的方式从Scanniverse导出扫描数据。您需要使用ios-logger；NeuralRecon有一个很好的教程[在这里](https:\u002F\u002Fgithub.com\u002Fzju3dv\u002FNeuralRecon\u002Fblob\u002Fmaster\u002FDEMO.md)，并且接受处理后格式的数据加载器位于```datasets\u002Farkit_dataset.py```。更新：现在有一个简短的说明文档[data_scripts\u002FIOS_LOGGER_ARKIT_README.md](data_scripts\u002FIOS_LOGGER_ARKIT_README.md)，介绍如何使用```data_scripts\u002Fios_logger_preprocessing.py```脚本处理并运行ios-logger扫描的推理。\n\n## 🏃 开箱即用！\n\n我们现在已经在代码中包含了两个扫描数据集，供大家立即试用。你可以从[这里](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1x-auV7vGCMdu5yZUMPcoP83p77QOuasT\u002Fview?usp=sharing)下载这些扫描数据。\n\n步骤：\n1. 将 `hero_model` 的权重文件下载到 weights 目录中。\n2. 下载扫描数据并解压到你选择的目录中。\n3. 修改 `configs\u002Fdata\u002Fvdr_dense.yaml` 文件中 `dataset_path` 选项的值，将其设置为解压后的 vdr 文件夹的根路径。\n4. 
现在你应该可以运行了！以下命令应该可以正常工作：\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fvdr_dense.yaml \\\n            --num_workers 8 \\\n            --batch_size 2 \\\n            --fast_cost_volume \\\n            --run_fusion \\\n            --depth_fuser open3d \\\n            --fuse_color \\\n            --dump_depth_visualization;\n```\n\n这将会在 `OUTPUT_PATH` 下生成网格、快速深度可视化以及与 LiDAR 深度对比的评估分数。\n\n该命令使用的是 `vdr_dense.yaml` 配置文件，它会为每一帧生成深度图，并将它们融合成一个网格。而在论文中，我们报告的是基于关键帧融合的结果，你可以使用 `vdr_default.yaml` 来运行这些任务。此外，如果你需要处理密集离线帧对，则可以使用 `vdr_dense_offline.yaml`。\n\n请参阅下方的测试与评估部分，确保为不同的数据集使用正确的配置参数。\n\n## 💾 ScanNetv2 数据集\n\n~~请按照 [这里](https:\u002F\u002Fgithub.com\u002FScanNet\u002FScanNet) 的说明下载数据集。这个数据集非常大（超过 2TB），因此请确保你有足够的存储空间，尤其是在解压文件时。~~\n\n~~下载完成后，使用此 [脚本](https:\u002F\u002Fgithub.com\u002FScanNet\u002FScanNet\u002Ftree\u002Fmaster\u002FSensReader\u002Fpython) 将原始传感器数据导出为图像和深度文件。~~\n\n我们编写了一个简短的教程，并附上了修改过的脚本，以帮助你下载和提取 ScanNetv2 数据。你可以在 [data_scripts\u002Fscannet_wrangling_scripts\u002F](data_scripts\u002Fscannet_wrangling_scripts) 找到这些内容。\n\n你需要将 `configs\u002Fdata\u002F` 中 ScanNetv2 数据配置文件中的 `dataset_path` 参数修改为你的数据集所在路径。\n\n代码库期望 ScanNetv2 数据具有以下格式：\n\n    dataset_path\n        scans_test (测试扫描)\n            scene0707\n                scene0707_00_vh_clean_2.ply (真实世界网格)\n                sensor_data\n                    frame-000261.pose.txt\n                    frame-000261.color.jpg \n                    frame-000261.color.512.png (可选，分辨率为 512x384 的图像)\n                    frame-000261.color.640.png (可选，分辨率为 640x480 的图像)\n                    frame-000261.depth.png (全分辨率深度图，存储时乘以 1000)\n                    frame-000261.depth.256.png (可选，分辨率为 256x192 的深度图，同样经过缩放)\n                scene0707.txt (扫描元数据及图像尺寸)\n          
      intrinsic\n                    intrinsic_depth.txt\n                    intrinsic_color.txt\n            ...\n        scans (验证和训练扫描)\n            scene0000_00\n                (见上文)\n            scene0000_01\n            ....\n\n在这个例子中，`scene0707.txt` 应该包含扫描的元数据：\n\n        colorHeight = 968\n        colorToDepthExtrinsics = 0.999263 -0.010031 0.037048 ........\n        colorWidth = 1296\n        depthHeight = 480\n        depthWidth = 640\n        fx_color = 1170.187988\n        fx_depth = 570.924255\n        fy_color = 1170.187988\n        fy_depth = 570.924316\n        mx_color = 647.750000\n        mx_depth = 319.500000\n        my_color = 483.750000\n        my_depth = 239.500000\n        numColorFrames = 784\n        numDepthFrames = 784\n        numIMUmeasurements = 1632\n\n`frame-000261.pose.txt` 应该包含姿态信息，格式如下：\n\n        -0.384739 0.271466 -0.882203 4.98152\n        0.921157 0.0521417 -0.385682 1.46821\n        -0.0587002 -0.961035 -0.270124 1.51837\n\n`frame-000261.color.512.png` 和 `frame-000261.color.640.png` 是原始图像的预缓存缩放版本，用于在训练和测试过程中节省加载和计算时间。同样地，`frame-000261.depth.256.png` 也是深度图的预缓存缩放版本。\n\n所有预缓存的缩放版本的深度图和图像都是有益的，但并非必需。如果不存在这些预缓存版本，系统将直接加载全分辨率的版本，并在运行时进行下采样。\n\n## 🖼️🖼️🖼️ 帧对\n\n默认情况下，我们会为每个扫描中的关键帧估计一张深度图。我们使用 DeepVideoMVS 的启发式方法来分离关键帧，并构建相应的帧对。我们利用这些关键帧上的深度图进行深度融合。对于每个关键帧，我们还会关联一个源帧列表，用于构建代价体积。此外，我们还使用密集帧对，即为数据中的每一帧都预测一张深度图，而不仅仅是特定的关键帧；这些主要用于可视化目的。\n\n我们会在所有扫描中生成并导出一列帧对，作为数据集的基本元素。这些列表已经预先计算好，可在每个数据集的 `data_splits` 目录下找到。对于 ScanNet 的测试扫描，它们位于 `data_splits\u002FScanNetv2\u002Fstandard_split`。我们的核心深度指标是通过 `data_splits\u002FScanNetv2\u002Fstandard_split\u002Ftest_eight_view_deepvmvs.txt` 计算得出的。\n\n\n\n以下是测试用帧对类型的简要分类：\n\n- `default`：遵循 DeepVideoMVS 规则的每帧关键帧对，所有源帧都在当前帧之前。除非另有说明，否则所有深度和网格评估都使用此类型。对于 ScanNet，请使用 `data_splits\u002FScanNetv2\u002Fstandard_split\u002Ftest_eight_view_deepvmvs.txt`。\n- 
`offline`：扫描中每一帧的帧对，其中源帧既可以是当前帧之前的，也可以是之后的。这类帧对在场景是离线拍摄时非常有用，因为可以获得最高的精度。使用在线帧对时，随着相机移动，代价体积中会出现空白区域，因为所有源帧都落后于当前帧；而使用离线帧对时，代价体积的两端都会被填满，从而获得更好的尺度（和度量）估计。\n- `dense`：类似于 default 类型的在线帧对，但适用于扫描中的每一帧，且所有源帧都在当前帧之前。对于 ScanNet，这对应于 `data_splits\u002FScanNetv2\u002Fstandard_split\u002Ftest_eight_view_deepvmvs_dense.txt`。\n- `offline`：扫描中每一帧的关键帧离线帧对。\n\n\n对于训练和验证集，我们采用与 DeepVideoMVS 相同的帧对增强策略，并使用相同的生成脚本。\n\n如果你想自己生成这些帧对，可以使用 `data_scripts\u002Fgenerate_train_tuples.py` 脚本生成训练帧对，或使用 `data_scripts\u002Fgenerate_test_tuples.py` 脚本生成测试帧对。这些脚本遵循与 `test.py` 相同的配置格式，并会使用你构建的任何数据集类来读取姿态信息。\n\n测试示例：\n\n# 默认元组\npython .\u002Fdata_scripts\u002Fgenerate_test_tuples.py \n    --data_config configs\u002Fdata\u002Fscannet_default_test.yaml\n    --num_workers 16\n\n# 密集元组\npython .\u002Fdata_scripts\u002Fgenerate_test_tuples.py \n    --data_config configs\u002Fdata\u002Fscannet_dense_test.yaml\n    --num_workers 16\n```\n\n训练示例：\n\n```bash\n# 训练\npython .\u002Fdata_scripts\u002Fgenerate_train_tuples.py \n    --data_config configs\u002Fdata\u002Fscannet_default_train.yaml\n    --num_workers 16\n\n# 验证\npython .\u002Fdata_scripts\u002Fgenerate_val_tuples.py \n    --data_config configs\u002Fdata\u002Fscannet_default_val.yaml\n    --num_workers 16\n```\n\n这些脚本会首先检查数据集中的每一帧，确保其包含有效的RGB图像、深度图像（如果数据集适用），以及有效的位置姿态文件。它会将这些“有效帧”保存到每个扫描文件夹下的文本文件中；但如果目录为只读模式，则会忽略保存“valid_frames”文件，但仍会生成元组。\n\n## 📊 测试与评估\n\n您可以使用 `test.py` 进行深度图推理与评估，并进行网格融合。\n\n所有结果将存储在基础结果目录（results_path）下，路径为：\n\n    opts.output_base_path\u002Fopts.name\u002Fopts.dataset\u002Fopts.frame_tuple_type\u002F\n\n其中 `opts` 是 `options` 类。例如，当 `opts.output_base_path` 为 `.\u002Fresults`，`opts.name` 为 `HERO_MODEL`，`opts.dataset` 为 `scannet`，`opts.frame_tuple_type` 为 `default` 时，输出目录将是：\n\n    .\u002Fresults\u002FHERO_MODEL\u002Fscannet\u002Fdefault\u002F\n\n请确保将 `--opts.output_base_path` 设置为您适合存储结果的目录。\n\n`--frame_tuple_type` 是用于多视图立体视觉（MVS）的图像元组类型，应在您使用的 `data_config` 文件中指定。\n\n默认情况下，`test.py` 会尝试计算每帧的深度评分，并提供帧平均和场景平均指标。这些评分（按场景及总计）将被保存在 
`results_path\u002Fscores` 目录下。\n\n我们已尽力修复匹配编码器中的批处理错误，通过禁用该编码器的图像批处理，以实现精度达到 (\u003C10^-4) 的准确测试。如有疑问，最多可运行 `--batch_size 4`；若希望获得尽可能稳定的结果并避免 PyTorch 的潜在问题，请使用 `--batch_size 1` 进行对比评估。\n\n如果您希望提高速度，可将 `--fast_cost_volume` 设置为 True。这将启用匹配编码器的批处理，并使用 einops 优化的特征体积。\n\n```bash\n# 示例命令：仅计算评分\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n            --num_workers 8 \\\n            --batch_size 4;\n\n# 如果需要超快速版本：\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n            --num_workers 8 \\\n            --fast_cost_volume \\\n            --batch_size 2;\n```\n\n该脚本还可用于执行一些辅助任务，包括：\n\n**TSDF 融合**\n\n要运行 TSDF 融合，请添加 `--run_fusion` 标志。您有两种融合器可供选择：\n1) `--depth_fuser ours`（默认）将使用我们的融合器，其生成的网格常用于大多数可视化和评分。此融合器不支持颜色。我们提供了 scikit-image 的自定义分支，其中包含我们对 `measure.matching_cubes` 的自定义实现，支持单壁网格。我们使用单壁网格进行评估。如果您对此不关心，可在调用 `test.py` 中的 `export_mesh` 函数时将 `export_single_mesh` 设置为 `False`。\n2) `--depth_fuser open3d` 将使用 Open3D 深度融合器。此融合器支持颜色，可通过使用 `--fuse_color` 标志启用。\n\n默认情况下，用于融合的深度图会被裁剪至 3 米，且 TSDF 分辨率为 0.04 m³，但您可以通过修改 `--max_fusion_depth` 和 `--fusion_resolution` 来调整这些参数。\n\n您还可以选择在没有有效 MVS 信息时，对用于融合的预测深度进行掩码处理，方法是使用 `--mask_pred_depths`。此功能默认未启用。\n\n此外，您还可以在引入强图像先验的代价卷积编码解码之前，融合来自代价体积的最佳猜测深度。只需使用 `--fusion_use_raw_lowest_cost` 即可。\n\n网格将存储在 `results_path\u002Fmeshes\u002F` 目录下。\n\n```bash\n# 示例命令：融合深度以获取网格\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n      
      --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n            --num_workers 8 \\\n            --run_fusion \\\n            --batch_size 8;\n```\n\n**缓存深度**\n\n您也可以通过提供 `--cache_depths` 标志来选择性地存储深度图。它们将被保存在 `results_path\u002Fdepths` 目录下。\n\n```bash\n# 示例命令：计算评分并缓存深度\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n            --num_workers 8 \\\n            --cache_depths \\\n            --batch_size 8;\n\n# 示例命令：融合深度以获取彩色网格\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n            --num_workers 8 \\\n            --run_fusion \\\n            --depth_fuser open3d \\\n            --fuse_color \\\n            --batch_size 4;\n```\n\n**快速可视化**\n\n虽然有其他脚本可用于更深入的输出深度和融合可视化，但若需快速导出深度图可视化效果，可使用 `--dump_depth_visualization`。可视化结果将被保存在 `results_path\u002Fviz\u002Fquick_viz\u002F` 目录下。\n\n```bash\n\n# 示例命令，用于输出快速深度可视化\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_default_test.yaml \\\n            --num_workers 8 \\\n            --dump_depth_visualization \\\n            --batch_size 
4;\n```\n## 👉☁️ 点云融合\n\n我们还允许使用3DVNet仓库中的融合器将深度图与点云进行融合。[链接](https:\u002F\u002Fgithub.com\u002Falexrich021\u002F3dvnet\u002Fblob\u002Fmain\u002Fmv3d\u002Feval\u002Fpointcloudfusion_custom.py)。\n\n```bash\n# 示例命令，用于将深度信息融合到点云中。\nCUDA_VISIBLE_DEVICES=0 python pc_fusion.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_dense_test.yaml \\\n            --num_workers 8 \\\n            --batch_size 4;\n```\n\n如果您不想等待太久，可以将 `configs\u002Fdata\u002Fscannet_dense_test.yaml` 改为 `configs\u002Fdata\u002Fscannet_default_test.yaml`，这样就只使用关键帧了。\n\n## 📊 网格指标\n\n我们使用TransformerFusion的[网格评估代码](https:\u002F\u002Fgithub.com\u002FAljazBozic\u002FTransformerFusion\u002Fblob\u002Fmain\u002Fsrc\u002Fevaluation\u002Feval.py)来生成主要的结果表格，但在随机采样网格时，为了保持一致性，我们将随机种子设置为固定值。此外，在补充材料中，我们还报告了使用NeuralRecon的[评估代码](https:\u002F\u002Fgithub.com\u002Fzju3dv\u002FNeuralRecon\u002Fblob\u002Fmaster\u002Ftools\u002Fevaluation.py)得到的网格指标。\n\n对于点云的评估，我们同样使用TransformerFusion的代码，但会加载点云数据，而不是对网格表面进行采样。\n\n## ⏳ 训练\n\n默认情况下，模型和TensorBoard事件文件会被保存到 `~\u002Ftmp\u002Ftensorboard\u002F\u003Cmodel_name>`。可以通过 `--log_dir` 参数来更改保存路径。\n\n我们在默认的ScanNetv2划分上，使用两块A100显卡以16的批量大小和16位精度进行训练。\n\n使用两块GPU训练的示例命令如下：\n```shell\nCUDA_VISIBLE_DEVICES=0,1 python train.py --name HERO_MODEL \\\n            --log_dir logs \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --data_config configs\u002Fdata\u002Fscannet_default_train.yaml \\\n            --gpus 2 \\\n            --batch_size 16;\n```\n\n\n该代码支持任意数量的GPU进行训练。您可以通过 `CUDA_VISIBLE_DEVICES` 环境变量指定要使用的GPU。\n\n我们所有的训练都在两块NVIDIA A100显卡上完成。\n\n**不同的数据集**\n\n您可以通过编写一个新的数据加载器类来训练自定义的MVS数据集，该类需要继承 `datasets\u002Fgeneric_mvs_dataset.py` 中的 `GenericMVSDataset` 类。您可以参考 `datasets\u002Fscannet_dataset.py` 中的 `ScannetDataset` 类，或者 
`datasets` 目录下的其他任何类作为示例。\n\n\n### 🎛️ 微调预训练模型\n\n要进行微调，只需加载一个检查点（不要恢复训练！）并从那里开始训练：\n```shell\nCUDA_VISIBLE_DEVICES=0 python train.py --config configs\u002Fmodels\u002Fhero_model.yaml\n                --data_config configs\u002Fdata\u002Fscannet_default_train.yaml \n                --load_weights_from_checkpoint weights\u002Fhero_model.ckpt\n```\n\n只需将数据配置更改为您想要微调到的数据集即可。\n\n## 🔧 其他训练和测试选项\n\n有关学习率、消融实验设置等其他训练选项以及测试选项，请参阅 `options.py` 文件。\n\n## ✨ 可视化\n\n除了在 `test.py` 脚本中进行的快速深度可视化之外，还有两个脚本可用于可视化深度输出。\n\n第一个是 `visualization_scripts\u002Fvisualize_scene_depth_output.py`。该脚本会生成一段视频，其中包含参考帧和源帧的彩色图像、深度预测结果、代价体积估计、真实深度以及根据深度估算的法线。该脚本假设您已经使用 `test.py` 脚本缓存了深度输出，并且接受与 `test.py` 相同的命令格式：\n\n```shell\n# 示例命令，用于获取密集帧的可视化效果\nCUDA_VISIBLE_DEVICES=0 python .\u002Fvisualization_scripts\u002Fvisualize_scene_depth_output.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --data_config configs\u002Fdata\u002Fscannet_dense_test.yaml \\\n            --num_workers 8;\n```\n\n其中 `OUTPUT_PATH` 是 SimpleRecon 的基础结果目录（即您最初用于测试的目录）。您还可以选择在运行此脚本之前先运行 `.visualization_scripts\u002Fgenerate_gt_min_max_cache.py`，以获取场景平均的最小和最大深度值，用于颜色映射；如果这些值不可用，则脚本会使用0米和5米作为颜色映射的最小值和最大值。\n\n第二个脚本则允许实时可视化网格重建过程。如果已有缓存的深度图，脚本将直接使用；否则，它会先通过模型预测深度图，然后再进行融合。脚本会迭代地加载深度图，将其融合，保存当前步骤的网格文件，并将该网格与俯视视角的摄像机标记一起渲染，同时还会从摄像机视角生成第一人称视角的视频。\n\n```shell\n# 示例命令，用于获取网格重建的实时可视化效果\nCUDA_VISIBLE_DEVICES=0 python visualize_live_meshing.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fscannet_dense_test.yaml \\\n            --num_workers 8;\n```\n\n默认情况下，脚本会将网格保存到一个中间位置，您也可以选择传递 `--use_precomputed_partial_meshes` 参数，以便在再次可视化相同网格时节省时间。不过，只有在前一次运行中计算过的中间网格才能被重复使用。\n\n## 📝🧮👩‍💻 变换矩阵的表示法\n\n__简而言之：__ `world_T_cam == world_from_cam`  
\n本仓库使用“cam_T_world”表示从世界坐标系到相机坐标系的变换矩阵（外参）。其目的是使变量在从右向左相乘时，两侧的坐标系名称能够匹配：\n\n    cam_points = cam_T_world @ world_points\n\n`world_T_cam` 表示相机姿态（从相机坐标系到世界坐标系）。`ref_T_src` 表示从源视图到参考视图的变换。  \n最后，这种表示法既可以表示旋转，也可以表示平移，例如：`world_R_cam` 和 `world_t_cam`。\n\n## 🗺️ 世界坐标系\n\n本仓库主要针对ScanNet设计，因此虽然其功能理论上适用于任何坐标系（可通过输入参数指定），但我们提供的模型权重默认假设使用的是ScanNet坐标系。这一点非常重要，因为我们在元数据中包含了光线信息。如果使用这些权重处理其他数据集，应将其转换为ScanNet坐标系。我们包含的数据集类会自动执行相应的坐标变换。\n\n## 🐜🔧 错误修复\n\n### **更新 2022年12月31日：**\n\n本次更新修复了几个 bug。您需要更新您的代码分支，并使用本 README 开头表格中的新权重文件。此外，还需确保已使用读取工具正确提取了内参文件。\n\n- 我们最初在 ScanNet 数据集中使用了一组略有偏差的内参数据。现在仓库已切换到 `intrinsics` 文件夹中的内参。\n- 成本体积中的 MLP 没有应用翻转变换增强，导致边缘附近出现偏差。因此，我们在基础数据集类中加入了基于几何的翻转变换，但仅在训练集上启用。\n- 投影过程中存在一个 bug，导致成本体积中的掩码无法正常工作。为此，我们现已改用与 OpenCV 和 Kornia 相同的归一化方法。\n\n感谢所有指出问题并耐心等待我们修复的人。\n\n应用这些修复后，各项指标均有提升，相关权重也已上传至此。如需查看旧版指标、代码和权重，请参考此 commit hash：7de5b451e340f9a11c7fd67bd0c42204d0b009a9。\n\n修复 bug 后的模型完整指标：\n\n**深度估计**\n| `--config`  | 绝对差↓ | 绝对相对误差↓ | 平方相对误差↓ | RMSE↓  | log RMSE↓  | delta \u003C 1.05↑ | delta \u003C 1.10↑ |\n|-------------|-----------|----------|---------|---------|-------------|--------------|---------------|\n| `hero_model.yaml`, 元数据 + Resnet | 0.0868 | 0.0428 | 0.0127 | 0.1472 | 0.0681 | 74.26 | 90.88 |\n| `dot_product_model.yaml`, 点积 + Resnet | 0.0910 | 0.0453 | 0.0134 | 0.1509 | 0.0704 | 71.90 | 89.75 |\n\n**网格融合**\n| `--config`  | 准确率↓ | 完整性↓ | 距离误差↓ | 召回率↑ | 精确率↑ | F1 分数↑ |\n|-------------|------|-------|----------|---------|------------|----------|\n| `hero_model.yaml`, 元数据 + Resnet | 5.41 | 5.98 | 5.69 | 0.695 | 0.668 | 0.680 |\n| `dot_product_model.yaml`, 点积 + Resnet | 5.66 | 6.18 | 5.92 | 0.682 | 0.655 | 0.667 |\n\n**对比：**\n| `--config`  | 模型  | 绝对差↓| 平方相对误差↓ | delta \u003C 1.05↑| 距离误差↓ | F1 分数↑ |\n|-------------|----------|--------------------|---------|---------|--------------|----------|\n| `hero_model.yaml` | 元数据 + Resnet 匹配 | 0.0868 | 0.0127 | 74.26 | 5.69 | 0.680 |\n| OLD `hero_model.yaml` | 元数据 + Resnet 匹配 | 0.0885 | 0.0125 | 73.16 | 5.81 | 0.671 
|\n| `dot_product_model.yaml` | 点积 + Resnet 匹配 | 0.0910 | 0.0134 | 71.90 | 5.92 | 0.667 |\n| OLD `dot_product_model.yaml` | 点积 + Resnet 匹配 | 0.0941 | 0.0139 | 70.48 | 6.29 | 0.642 |\n\n### **帧数的小 bug：**\n\n最初，该仓库为默认的 DVMVS 风格关键帧生成了 tuple 文件，其中 ScanNetv2 测试集多出了 9 帧，总数为 25599 帧。这是一个处理跟踪丢失时的小 bug，现已修复。现在，该仓库应能完全复现 DVMVS 的关键帧缓冲区，测试集的关键帧数量为 25590 帧。该 bug 的唯一影响是多出了 9 帧，其余 tuple 文件与 DVMVS 完全一致。出错的帧位于以下扫描场景中：\n\n```\n扫描场景         原始帧数  新帧数\n--------------------------------------\nscene0711_00     393       392\nscene0727_00     209       208 \nscene0736_00     1023      1022 \nscene0737_00     408       407 \nscene0751_00     165       164 \nscene0775_00     220       219 \nscene0791_00     227       226 \nscene0794_00     141       140 \nscene0795_00     102       101 \n```\n\n默认测试集的 tuple 文件已更新。由于额外帧的数量差异很小（约 3e-4），因此指标未发生变化。\n\n## 🗺️💾 COLMAP 数据集\n\n__简而言之：__ 缩放位姿并裁剪图像。\n\n我们确实提供了一个用于从 COLMAP 稀疏重建中加载图像的数据加载器。为了使 SimpleRecon 正常工作，您需要将图像裁剪至与 ScanNet 视场角大致相同的范围（类似于 iPhone 视频模式下的视场角），并根据已知的真实世界测量值缩放位姿的位置。如果未执行这些步骤，成本体积将无法正确构建，网络也无法准确估计深度。\n\n## 🙏 致谢\n\n我们感谢 [TransformerFusion](https:\u002F\u002Fgithub.com\u002FAljazBozic\u002FTransformerFusion) 的 Aljaž Božič、[Neural Recon](https:\u002F\u002Fzju3dv.github.io\u002Fneuralrecon\u002F) 的 Jiaming Sun，以及 [DeepVideoMVS](https:\u002F\u002Fgithub.com\u002Fardaduz\u002Fdeep-video-mvs) 的 Arda Düzçeker，他们迅速提供了有用的基准信息，并在短时间内开放了自己的代码库。\n\ntuple 文件生成脚本大量使用了 DeepVideoMVS 的 [关键帧缓冲区](https:\u002F\u002Fgithub.com\u002Fardaduz\u002Fdeep-video-mvs\u002Fblob\u002Fmaster\u002Fdvmvs\u002Fkeyframe_buffer.py) 的修改版本（再次感谢 Arda 及其团队）。\n\n`torch_point_cloud_fusion` 中的 PyTorch 点云融合模块代码借用了 3DVNet 的 [repo](https:\u002F\u002Fgithub.com\u002Falexrich021\u002F3dvnet\u002Fblob\u002Fmain\u002Fmv3d\u002Feval\u002Fpointcloudfusion_custom.py)。感谢 Alexander Rich！\n\n我们还要感谢 Niantic 的基础设施团队在我们需要时提供的快速支持。谢谢大家！\n\nMohamed 得到了微软研究院博士奖学金（MRL 2018-085）的支持。\n\n## 📜 BibTeX\n\n如果您在研究中使用了我们的工作，请考虑引用我们的论文：\n\n```\n@inproceedings{sayed2022simplerecon,\n  title={SimpleRecon: 无 3D 
卷积的 3D 重建},\n  author={Sayed, Mohamed and Gibson, John and Watson, Jamie and Prisacariu, Victor and Firman, Michael and Godard, Cl{\\'e}ment},\n  booktitle={欧洲计算机视觉大会 (ECCV) 论文集},\n  year={2022},\n}\n```\n\n## 👩‍⚖️ 许可证\n\n版权所有 © Niantic, Inc. 2022。专利申请中。\n保留所有权利。\n详细条款请参阅 [许可证文件](LICENSE)。","# SimpleRecon 快速上手指南\n\nSimpleRecon 是一个无需 3D 卷积即可实现高质量 3D 重建的多视图立体（MVS）深度估计模型。本指南将帮助中国开发者快速配置环境并运行预训练模型。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux (推荐 Debian GNU\u002FLinux 10 或 Ubuntu)\n*   **GPU**: 支持 CUDA 的 NVIDIA 显卡\n*   **基础软件**:\n    *   Anaconda 或 Miniconda (推荐用于管理环境)\n    *   CUDA 11.3\n    *   Python 3.9.7\n    *   PyTorch 1.10\n\n> **注意**：官方实验环境基于 PyTorch 1.10 + CUDA 11.3。如果您使用其他版本，可能会遇到兼容性问题，建议尽量匹配该环境。\n\n## 安装步骤\n\n### 1. 克隆仓库\n首先从 GitHub 克隆项目代码：\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fnianticlabs\u002Fsimplerecon.git\ncd simplerecon\n```\n\n### 2. 创建 Conda 环境\n使用项目提供的配置文件一键安装所有依赖。该配置文件已修复了 `llvm-openmp`、`clang` 和 `protobuf` 等包的版本问题，并包含了必要的 `kornia` 版本。\n\n```bash\nconda env create -f simplerecon_env.yml\n```\n\n安装完成后，激活环境：\n```bash\nconda activate simplerecon\n```\n\n### 3. 下载预训练模型\n下载官方提供的预训练权重（推荐使用 `hero_model`，即论文中的主要模型），并将其放入项目根目录下的 `weights\u002F` 文件夹中。\n\n*   **Hero Model (Metadata + Resnet Matching)**: [下载链接](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1hCuKZjEq-AghrYAmFxJs_4eeixIlP488\u002Fview?usp=sharing)\n*   **Dot Product Model**: [下载链接](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F13lW-VPgsl2eAo95E87RKWoK8KUZelkUK\u002Fview?usp=sharing)\n\n下载后执行：\n```bash\nmkdir -p weights\n# 将下载的 .ckpt 文件移动到 weights 目录，例如：\nmv ~\u002FDownloads\u002Fhero_model.ckpt weights\u002F\n```\n\n### 4. 
准备测试数据（可选但推荐）\n为了无需下载庞大的 ScanNetv2 数据集即可立即测试，官方提供了两个预处理好的扫描片段。\n\n*   **示例数据下载**: [Google Drive 链接](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1x-auV7vGCMdu5yZUMPcoP83p77QOuasT\u002Fview?usp=sharing)\n\n下载并解压到您选择的目录（例如 `\u002Fdata\u002Fvdr_samples`）。\n\n## 基本使用\n\n以下命令演示如何使用 `hero_model` 对示例数据进行推理、生成深度图可视化以及融合点云网格。\n\n请根据您的实际路径修改 `--output_base_path` 和配置文件中的 `dataset_path`。\n\n### 第一步：配置数据路径\n打开 `configs\u002Fdata\u002Fvdr_dense.yaml` 文件，将 `dataset_path` 的值修改为您解压示例数据的根目录路径。\n\n### 第二步：运行推理\n在终端执行以下命令：\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python test.py --name HERO_MODEL \\\n            --output_base_path OUTPUT_PATH \\\n            --config_file configs\u002Fmodels\u002Fhero_model.yaml \\\n            --load_weights_from_checkpoint weights\u002Fhero_model.ckpt \\\n            --data_config configs\u002Fdata\u002Fvdr_dense.yaml \\\n            --num_workers 8 \\\n            --batch_size 2 \\\n            --fast_cost_volume \\\n            --run_fusion \\\n            --depth_fuser open3d \\\n            --fuse_color \\\n            --dump_depth_visualization\n```\n\n**参数说明：**\n*   `--output_base_path`: 输出结果（网格、深度可视化、评估分数）的保存目录。\n*   `--run_fusion`: 启用深度图融合以生成 3D 网格。\n*   `--fuse_color`: 在生成的网格上贴图颜色。\n*   `--dump_depth_visualization`: 保存深度图的可视化图像。\n*   `--fast_cost_volume`: 使用优化后的成本体积计算以提升速度。\n\n运行结束后，您可以在 `OUTPUT_PATH` 目录下查看生成的 `.ply` 网格文件和深度可视化图片。\n\n> **提示**：如果您拥有完整的 ScanNetv2 数据集，只需更改 `--data_config` 为对应的配置文件（如 `scannetv2_test.yaml`），并确保配置中的 `dataset_path` 指向数据集根目录即可。","某自动驾驶初创团队正在构建城市街道的高精度 3D 地图，需要利用车载摄像头采集的视频流快速重建道路环境的深度信息。\n\n### 没有 simplerecon 时\n- **硬件成本高昂**：传统 3D 重建方法依赖昂贵的激光雷达（LiDAR）或专用深度传感器，导致车辆改装成本居高不下。\n- **计算资源消耗大**：现有基于 3D 卷积的算法对显存和算力要求极高，难以在边缘设备或普通工作站上实时运行。\n- **重建精度受限**：在纹理缺失区域（如白墙、路面）或光照剧烈变化时，生成的深度图噪点多，细节模糊，影响后续路径规划。\n- **部署流程复杂**：模型训练和推理环境配置繁琐，不同版本的依赖库冲突频繁，严重拖慢研发迭代速度。\n\n### 使用 simplerecon 后\n- **纯视觉低成本方案**：simplerecon 仅需普通的 RGB 摄像头图像即可输出高质量深度图，彻底摆脱了对激光雷达的依赖，大幅降低硬件门槛。\n- **高效轻量级推理**：通过摒弃耗时的 3D 卷积操作，该工具显著减少了显存占用，使得在消费级 GPU 
上也能流畅进行高分辨率深度估计。\n- **细节还原更精准**：得益于先进的多视图立体几何（MVS）策略，simplerecon 在弱纹理区域仍能保持边缘锐利，生成的点云密度和准确度显著提升。\n- **开箱即用的体验**：提供预训练模型和标准化的环境配置文件，开发人员可快速复现论文效果，将精力集中于业务逻辑而非环境调试。\n\nsimplerecon 通过“去 3D 卷积”的创新架构，让高精度 3D 重建从昂贵的实验室技术变成了可大规模落地的普惠型视觉方案。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fnianticlabs_simplerecon_4d9e8f3f.jpg","nianticlabs","Niantic Labs","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fnianticlabs_edeead43.png","Building technologies and ideas that move us",null,"https:\u002F\u002Fwww.nianticlabs.com","https:\u002F\u002Fgithub.com\u002Fnianticlabs",[80],{"name":81,"color":82,"percentage":83},"Python","#3572A5",100,1431,131,"2026-04-13T06:16:42","NOASSERTION","Linux","需要 NVIDIA GPU，显存至少 2.6GB（标准模式）或 5.7GB（速度优化模式），CUDA 11.3","未说明",{"notes":92,"python":93,"dependencies":94},"官方实验环境为 Debian GNU\u002FLinux 10。建议使用 Anaconda 创建环境（simplerecon_env.yml）。2023 年 5 月 25 日更新修复了数据加载线程限制问题，若遇问题请更新环境文件。代码仅限非商业用途。","3.9.7",[95,96,97,98,99,100],"torch==1.10","kornia","llvm-openmp","clang","protobuf","open3d",[15,14],[103,104,105,106,107,108,109,110,111,112],"computer-vision","cost-volume","depth","depth-estimation","eccv2022","multi-view-stereo","mvs","pytorch","scannet","visualization","2026-03-27T02:49:30.150509","2026-04-14T03:09:21.914719",[116,121,126,131,136,141],{"id":117,"question_zh":118,"answer_zh":119,"source_url":120},32390,"无法复现论文中的训练结果，准确率低于预期怎么办？","如果您使用的硬件配置（如显卡型号、Batch Size）与论文默认设置不同，可能会导致结果差异。请确保严格按照 README 中的方法进行操作。有用户反馈在使用 4 张 3090 显卡（每张卡 Batch Size=8）替代默认的 2 张 A100 时，初始结果不佳，但经过正确配置和训练后，成功复现了合理的结果（例如 abs_diff 降至 0.0891 左右）。请检查是否修改了其他选项，并确保数据预处理步骤正确。","https:\u002F\u002Fgithub.com\u002Fnianticlabs\u002Fsimplerecon\u002Fissues\u002F15",{"id":122,"question_zh":123,"answer_zh":124,"source_url":125},32391,"如何使用 iPhone 扫描图像并重建以获取精确测量值（无激光雷达设备）？","目前 Scanniverse App 尚未提供公开的导出扫描功能。您需要使用 `ios-logger` 工具。NeuralRecon 项目提供了相关教程和数据加载器（datasets\u002Farkit_dataset.py）。此外，仓库中提供了快速指南 `data_scripts\u002FIOS_LOGGER_ARKIT_README.md`，说明了如何使用 
`data_scripts\u002Fios_logger_preprocessing.py` 脚本处理和运行 ios-logger 扫描数据的推理。对于非激光雷达设备（如 iPhone 12 mini），Scanniverse 2.0+ 版本支持保存原始数据，但提取这些数据仍需通过特定工具或脚本处理。","https:\u002F\u002Fgithub.com\u002Fnianticlabs\u002Fsimplerecon\u002Fissues\u002F1",{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},32392,"COLMAP 数据集缺少 `get_valid_frame_id` 函数或路径配置错误如何解决？","COLMAP 数据集实现中确实曾缺失该函数，您可以参考 ScanNet 数据集的实现进行复制。更重要的是，请确保使用 COLMAP 生成的**去畸变图像（undistorted images）**。如果遇到路径错误（如假设图像在 `sparse` 文件夹下），可能需要手动调整代码中的路径逻辑（例如移除代码中对 `sparse` 目录的硬编码假设）。另外，建议尝试以 5fps 的频率注册图像以提高速度，并确保 COLMAP 重建时能正确链接序列的不同部分以避免尺度估计错误。","https:\u002F\u002Fgithub.com\u002Fnianticlabs\u002Fsimplerecon\u002Fissues\u002F4",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},32393,"点云融合（Pointcloud Fusion）结果混乱或不符合预期是什么原因？","这通常是由于输入图像的分辨率不匹配导致的。模型是使用 **640x480** 的图像训练的。如果您的输入数据（如 VDR 数据集）原始分辨率为 720x540 或其他尺寸，必须在预处理阶段使用 ImageMagick 等工具将其调整为 640x480，或者确保数据加载器（dataloader）中包含动态调整大小的逻辑（参考 `utils\u002Fgeneric_utils.py` 中的实现）。如果尺寸不一致，会导致深度预测和融合出错。","https:\u002F\u002Fgithub.com\u002Fnianticlabs\u002Fsimplerecon\u002Fissues\u002F9",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},32394,"如何处理 NeuralRecon 数据集或自定义视频输入以用于深度生成？","使用 NeuralRecon 数据集示例时，需要预先运行额外的预处理步骤来提取图像和姿态。具体而言，需要运行 NeuralRecon 仓库中的脚本：`tools\u002Fprocess_arkit_data.py`。运行该脚本后，数据格式才能被本项目的 `ARKitDataset` 类正确解析。对于自定义视频输入，建议先通过 COLMAP 或其他 SLAM 工具（如 DPVO, DroidSlam）估算相机内参和姿态，再按照 ARKit 数据集的格式进行整理。","https:\u002F\u002Fgithub.com\u002Fnianticlabs\u002Fsimplerecon\u002Fissues\u002F8",{"id":142,"question_zh":143,"answer_zh":144,"source_url":130},32395,"使用互联网视频生成深度图时 COLMAP 运行速度过慢怎么办？","COLMAP 稀疏重建对于长视频（如 10 分钟）确实非常耗时且难以扩展。建议的解决方案是：仅对视频中的一小段片段（例如 30 秒，5FPS）运行 COLMAP 以估算相机内参，然后结合视觉 SLAM\u002F里程计软件（如 DPVO 或 DroidSlam）来处理剩余帧的姿态估计。这样可以显著减少计算时间，同时保持足够的精度。",[]]