[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-wanmeihuali--taichi_3d_gaussian_splatting":3,"tool-wanmeihuali--taichi_3d_gaussian_splatting":61},[4,18,26,36,44,52],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",141543,2,"2026-04-06T11:32:54",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 
API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":53,"name":54,"github_repo":55,"description_zh":56,"stars":57,"difficulty_score":10,"last_commit_at":58,"category_tags":59,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,60],"视频",{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":73,"owner_location":76,"owner_email":73,"owner_twitter":73,"owner_website":73,"owner_url":77,"languages":78,"stars":91,"forks":92,"last_commit_at":93,"license":94,"difficulty_score":10,"env_os":95,"env_gpu":96,"env_ram":97,"env_deps":98,"category_tags":111,"github_topics":113,"view_count":32,"oss_zip_url":73,"oss_zip_packed_at":73,"status":17,"created_at":123,"updated_at":124,"faqs":125,"releases":159},4565,"wanmeihuali\u002Ftaichi_3d_gaussian_splatting","taichi_3d_gaussian_splatting","An unofficial implementation of paper 3D Gaussian Splatting for Real-Time Radiance Field Rendering by taichi lang.","taichi_3d_gaussian_splatting 是一个基于 Taichi 语言开发的开源项目，旨在复现\"3D 高斯泼溅（3D Gaussian 
Splatting）”这一前沿技术。它能够将多视角照片和稀疏点云转化为包含丰富特征的高密度点云，从而实现高质量的实时辐射场渲染。无论是从新角度生成逼真图像，还是合并多个场景对象，它都能轻松胜任，有效解决了传统神经辐射场（NeRF）类算法在场景融合与渲染效率上的痛点。\n\n该项目特别适合计算机视觉研究人员、图形学开发者以及对高性能渲染感兴趣的技术爱好者使用。其核心亮点在于利用 Taichi 语言 bridging Python 的开发效率与 C++\u002FCUDA 的运行性能，使得整个代码库纯由 Python 编写，既易读易维护，又在部分测试中以更少的点数实现了比官方版本更高的图像重建精度（PSNR）。虽然目前主要支持 CUDA 后端且训练速度略逊于官方实现，但其跨平台潜力和自动微分特性为未来扩展留下了广阔空间。如果你希望在不深入底层 CUDA 编程的前提下探索实时 3D 渲染的前沿方案，这是一个值得尝试的工具。","# taichi_3d_gaussian_splatting\nAn unofficial implementation of paper [3D Gaussian Splatting\nfor Real-Time Radiance Field Rendering](https:\u002F\u002Frepo-sam.inria.fr\u002Ffungraph\u002F3d-gaussian-splatting\u002F) by taichi lang. \n\n## What does 3D Gaussian Splatting do?\n\n### Training:\nThe algorithm takes image from multiple views, a sparse point cloud, and camera pose as input, use a differentiable rasterizer to train the point cloud, and output a dense point cloud with extra features(covariance, color information, etc.).\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_readme_699a577ab066.png\" alt=\"drawing\" width=\"200\"\u002F>\\\nIf we view the training process as module, it can be described as:\n```mermaid\ngraph LR\n    A[ImageFromMultiViews] --> B((Training))\n    C[sparsePointCloud] --> B\n    D[CameraPose] --> B\n    B --> E[DensePointCloudWithExtraFeatures]\n```\n\n### Inference:\nThe algorithm takes the dense point cloud with extra features and any camera pose as input, use the same rasterizer to render the image from the camera pose.\n```mermaid\ngraph LR\n    C[DensePointCloudWithExtraFeatures] --> B((Inference))\n    D[NewCameraPose] --> B\n    B --> E[Image]\n```\nAn example of inference result:\n\nhttps:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fassets\u002F18469933\u002Fcc760693-636b-4157-ae85-33813f3da54d\n\nBecause the nice property of point cloud, the algorithm easily handles scene\u002Fobject merging compared to other 
NeRF-like algorithms.\n\nhttps:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fassets\u002F18469933\u002Fbc38a103-e435-4d35-9239-940e605b4552\n\n\n\n\u003Cdetails>\u003Csummary>other example result\u003C\u002Fsummary>\n\u003Cp>\n\ntop left: [result from this repo(30k iteration)](https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fblob\u002Fcf7c1428e8d26495a236726adf9546e4f2a9adb7\u002Fconfig\u002Ftat_truck_every_8_test.yaml), top right: ground truth, bottom left: normalized depth, bottom right: normalized num of points per pixel\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_readme_2e2f4f53ce5d.png)\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_readme_12c9d01e706a.png)\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_readme_27accb243b7f.png)\n\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n## Why taichi?\n- Taichi is a language for high-performance computing. It is designed to close the gap between the productivity-focused Python language and the performance- and parallelism-focused C++\u002FCUDA languages. By using Taichi, the repo is pure Python, and achieves the same or even better performance compared to CUDA implementation. Also, the code is much easier to read and maintain.\n- Taichi provides various backends, including CUDA, OpenGL, Metal, etc. We do plan to change the backend to support various platforms, but currently, the repo only supports CUDA backend.\n- Taichi provides automatic differentiation, although the repo does not use it currently, it is a nice feature for future development. \n\n## Current status\nThe repo is now tested with the dataset provided by the official implementation. 
For the truck dataset, the repo is able to achieve a bit higher PSNR than the official implementation with only 1\u002F5 to 1\u002F4 number of points. However, the training\u002Finference speed is still slower than the official implementation. \n\nThe results for the official implementation and this implementation are tested on the same dataset. I notice that the result from official implementation is slightly different from their paper, the reason may be the difference in testing resolution.\n\n| Dataset | source | PSNR | SSIM | #points |\n| --- | --- | --- | --- | --- |\n| Truck(7k) | paper | 23.51 | 0.840 | - |\n| Truck(7k) | official implementation | 23.22 | - | 1.73e6 |\n| Truck(7k) | this implementation | 23.762359619140625 | 0.835700511932373 | ~2.3e5 |\n| Truck(30k) | paper | 25.187 | 0.879 | - |\n| Truck(30k) | official implementation | 24.88 | - | 2.1e6 |\n| Truck(30k) | this implementation | 25.21463966369629 | 0.8645088076591492 | 428687.0 |\n\n[Truck(30k)(recent best result)](https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fpull\u002F98#issuecomment-1634828783):\n| train:iteration | train:l1loss | train:loss | train:num_valid_points | train:psnr | train:ssim | train:ssimloss | val:loss | val:psnr | val:ssim |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| 30000.0 | 0.02784738875925541 | 0.04742341861128807 | 428687.0 | 25.662137985229492 | 0.8742724657058716 | 0.12572753429412842 | 0.05369199812412262 | 25.21463966369629 | 0.8645088076591492 |\n\n\n## Installation\n1. Prepare an environment that contains pytorch and torchvision\n2. clone the repo and cd into the directory.\n3. run the following command\n```\npip install -r requirements.txt\npip install -e .\n```\n\nAll dependencies can be installed by pip. pytorch\u002Ftorchvision can be installed by conda. The code is tested on Ubuntu 20.04.2 LTS with python 3.10.10. The hardware is RTX 3090 and CUDA 12.1. 
The code is not tested on other platforms, but it should work on other platforms with minor modifications.\n\n## Dataset\nThe algorithm requires point cloud for whole scene, camera parameters, and ground truth image. The point cloud is stored in parquet format. The camera parameters and ground truth image are stored in json format. The running config is stored in yaml format. A script to build dataset from colmap output is provided. It is also possible to build dataset from raw data.\n### Train on Tank and temple Truck scene\n\u003Cdetails>\u003Csummary>CLICK ME\u003C\u002Fsummary>\n\u003Cp>\n**Disclaimer**: users are required to get permission from the original dataset provider. Any usage of the data must obey the license of the dataset owner.\n\nThe truck scene in [tank and temple](https:\u002F\u002Fwww.tanksandtemples.org\u002Fdownload\u002F) dataset is the major dataset used to develop this repo. We use a downsampled version of images in most experiments. The camera poses and the sparse point cloud can be easily generated by colmap. The preprocessed image, pregenerated camera pose and point cloud for truck scene can be downloaded from this [link](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1ZhMSkm3YGfhtywII5Hik5YDdMzD3lZjX?usp=sharing\n).\n\nPlease download the images into a folder named `image` and put it under the root directory of this repo. The camera poses and sparse point cloud should be put under `data\u002Ftat_truck_every_8_test`. The folder structure should be like this:\n```\n├── data\n│   ├── tat_truck_every_8_test\n│   │   ├── train.json\n│   │   ├── val.json\n│   │   ├── point_cloud.parquet\n├── image\n│   ├── 000000.png\n│   ├── 000001.png\n```\nthe config file [config\u002Ftat_truck_every_8_test.yaml](config\u002Ftat_truck_every_8_test.yaml) is provided. The config file is used to specify the dataset path, the training parameters, and the network parameters. The config file is self-explanatory. 
The training can be started by running\n```bash\npython gaussian_point_train.py --train_config config\u002Ftat_truck_every_8_test.yaml\n```\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n\n### Train on Example Object(boot)\n\n\u003Cdetails>\u003Csummary>CLICK ME\u003C\u002Fsummary>\n\u003Cp>\n\nIt is actually one random free mesh from [Internet](https:\u002F\u002Fwww.turbosquid.com\u002F3d-models\u002F3d-tactical-boots-1948918), I believe it is free to use. [BlenderNerf](https:\u002F\u002Fgithub.com\u002Fmaximeraafat\u002FBlenderNeRF.git) is used to generate the dataset. The preprocessed image, pregenerated camera pose and point cloud for boot scene can be downloaded from this [link](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1d14l9ewnyI7zCA6BxuQUWseQbIKyo3Jh?usp=sharing). Please download the images into a folder named `image` and put it under the root directory of this repo. The camera poses and sparse point cloud should be put under `data\u002Fboots_super_sparse`. The folder structure should be like this:\n```\n├── data\n│   ├── boots_super_sparse\n│   │   ├── boots_train.json\n│   │   ├── boots_val.json\n│   │   ├── point_cloud.parquet\n├── image\n│   ├── images_train\n│   │   ├── COS_Camera.001.png\n│   │   ├── COS_Camera.002.png\n|   |   ├── ...\n```\nNote that because the image in this dataset has a higher resolution(1920x1080), training on it is actually slower than training on the truck scene.\n\n\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n\n### Train on dataset generated by colmap\n\u003Cdetails>\u003Csummary>CLICK ME\u003C\u002Fsummary>\n\u003Cp>\n    \n- Reconstruct using colmap: See https:\u002F\u002Fcolmap.github.io\u002Ftutorial.html. The image should be undistorted. 
Sparse reconstruction is usually enough.\n- save as txt: the standard colmap txt output contains three files, cameras.txt, images.txt, points3D.txt\n- transform the txt into json and parquet: see [this file](tools\u002Fprepare_colmap.py) about how to prepare it.\n- prepare config yaml: see [this file](config\u002Ftat_train.yaml) as an example\n- run with the config.\n\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n### Train on dataset with Instant-NGP format with extra mesh\n\u003Cdetails>\u003Csummary>CLICK ME\u003C\u002Fsummary>\n\u003Cp>\n\n- A script to convert Instant-NGP format dataset into the two required JSON files is provided. However, the algorithm requires an extra point cloud as input, which does not usually come with Instant-NGP format dataset. The script accepts a mesh file as input and generate a point cloud by sampling points on the mesh. The script is [here](tools\u002Fprepare_InstantNGP_with_mesh.py).\n- User can run the script with the following command:\n```bash\npython tools\u002Fprepare_InstantNGP_with_mesh.py \\\n    --transforms_train {path to train transform file} \\\n    --transforms_test {path to val transform file, if not provided, val will be sampled from train} \\\n    --mesh_path {path to mesh file} \\\n    --mesh_sample_points {number of points to sample on the mesh} \\\n    --val_sample {if sample val from train, sample by every n frames} \\\n    --image_path_prefix {path prefix to the image, usually the path to the folder containing the image folder} \\\n    --output_path {path to output folder}\n```\n- then in the output folder, there will be two json files, train.json and val.json, and a point cloud file point_cloud.parquet. \n- create a config yaml file similar to [test_sagemaker.yaml](config\u002Ftest_sagemaker.yaml), modify train-dataset-json-path to the path of train.json, val-dataset-json-path to the path of val.json, and pointcloud-parquet-path to the path of point_cloud.parquet. 
Also modify the summary-writer-log-dir and output-model-dir to where ever you want to save the model and tensorboard log.\n- run with the config:\n```bash\npython gaussian_point_train.py --train_config {path to config yaml}\n```\n\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n### Train on dataset generated by BlenderNerf\n\u003Cdetails>\u003Csummary>CLICK ME\u003C\u002Fsummary>\n\u003Cp>\n\n[BlenderNerf](https:\u002F\u002Fgithub.com\u002Fmaximeraafat\u002FBlenderNeRF.git) is a Blender Plugin to generate dataset for NeRF. The dataset generated by BlenderNerf can be the Instant-NGP format, and we can use the [script](tools\u002Fprepare_InstantNGP_with_mesh.py) to convert it into the required format. And the mesh can be easily exported from Blender. To generate the dataset:\n- Install [Blender](https:\u002F\u002Fwww.blender.org\u002F)\n- import the mesh\u002Fscene you want to [Blender](https:\u002F\u002Fwww.blender.org\u002F)\n- Install BlenderNerf by following the README in [BlenderNerf](https:\u002F\u002Fgithub.com\u002Fmaximeraafat\u002FBlenderNeRF.git)\n- config BlenderNerf: make sure Train is selected and Test is not selected(Test seems to be buggy), File Format is NGP, save path is filled.\\\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_readme_45b563ca40e9.png)\n- config BlenderNerf Camera on Sphere: follow BlenderNerf README to config the camera(default is enough for most case). Then click PLAY COS.\\\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_readme_17b332f30666.png) \n- A zip file will be generated in the save path. 
Unzip it, it should contain a folder named `train` and a file named `transforms_train.json`.\n- In Blender, File->Export->Stl(.stl), export the mesh as stl file.\n- can run the [script](tools\u002Fprepare_InstantNGP_with_mesh.py) with the following command:\n```bash\npython tools\u002Fprepare_InstantNGP_with_mesh.py \\\n    --transforms_train {path to transform_train.json} \\\n    --mesh_path {path to stl file} \\\n    --mesh_sample_points {number of points to sample on the mesh, default to be 500} \\\n    --val_sample {if sample val from train, sample by every n frames, default to be 8} \\\n    --image_path_prefix {absolute path of the directory contain the train dir} \\\n    --output_path {any path you want}\n```\n- then in the output folder, there will be two json files, train.json and val.json, and a point cloud file point_cloud.parquet.\n- create a config yaml file similar to [test_sagemaker.yaml](config\u002Ftest_sagemaker.yaml), modify train-dataset-json-path to the path of train.json, val-dataset-json-path to the path of val.json, and pointcloud-parquet-path to the path of point_cloud.parquet. 
Also modify the summary-writer-log-dir and output-model-dir to where ever you want to save the model and tensorboard log.\n- run with the config:\n```bash\npython gaussian_point_train.py --train_config {path to config yaml}\n```\n\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n### Train on dataset generated by other methods\n\u003Cdetails>\u003Csummary>CLICK ME\u003C\u002Fsummary>\n\u003Cp>\n\nsee [this file](docs\u002FRawDataFormat.md) about how to prepare the dataset.\n\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n\n\n \n## Run\n```bash\npython gaussian_point_train.py --train_config {path to config file}\n```\n\nThe training process works in the following way:\n```mermaid\nstateDiagram-v2\n    state WeightToTrain {\n        sparsePointCloud\n        pointCloudExtraFeatures\n    }\n    WeightToTrain --> Rasterizer: input\n    cameraPose --> Rasterizer: input\n    Rasterizer --> Loss: rasterized image\n    ImageFromMultiViews --> Loss\n    Loss --> Rasterizer: gradient\n    Rasterizer --> WeightToTrain: gradient\n```\n\nThe result is visualized in tensorboard. The tensorboard log is stored in the output directory specified in the config file. The trained point cloud with feature is also stored as parquet and the output directory is specified in the config file.\n\n### Run on colab (to take advantage of google provided GPU accelerators)\nYou can find the related notebook here: [\u002Ftools\u002Frun_3d_gaussian_splatting_on_colab.ipynb](\u002Ftools\u002Frun_3d_gaussian_splatting_on_colab.ipynb)\n\n1. Set the hardware accelerator in colab: \"Runtime->Change Runtime Type->Hardware accelerator->select GPU->select T4\"\n2. Upload this repo to corresponding folder in your google drive.\n3. Mount your google drive to your notebook (see notebook).\n4. Install condacolab (see notebook).\n5. Install requirement.txt with pip (see notebook).\n6. Install pytorch, torchvision, pytorch-cuda etc. with conda (see notebook).\n7. 
Prepare the dataset as instructed in https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting#dataset\n8. Run the trainer with correct config (see notebook).\n9. Check out the training process through tensorboard (see notebook).\n\n## Visualization\nA simple visualizer is provided. The visualizer is implemented by Taichi GUI which limited the FPS to 60(If anyone knows how to change this limitation please ping me). The visualizer takes one or multiple parquet results. Example parquets can be downloaded [here](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F12-kZZay8RFlDk7hJQysG_Cr4-oxDp37l\u002Fview?usp=sharing).\n```bash\npython3 visualizer --parquet_path_list \u003Cparquet_path_0> \u003Cparquet_path_1> ...\n```\nThe visualizer merges multiple point clouds and displays them in the same scene.\n- Press 0 to select all point clouds(default state).\n- Press 1 to 9 to select one of the point clouds.\n- When all point clouds are selected, use \"WASD=-\" to move the camera, and use \"QE\" to rotate by the y-axis, or drag the mouse to do free rotation.\n- When only one of the point clouds is selected, use \"WASD=-\" to move the object\u002Fscene, and use \"QE\" to rotate the object\u002Fscene by the y-axis, or r drag the mouse to do free rotation by the center of the object.\n\n## How to contribute\u002FUse CI to train on cloud\n\nI've enabled CI and cloud-based training now. The function is not very stable yet. It enables anyone to contribute to this repo even if you don't have a GPU.\nGenerally, the workflow is:\n1. For any algorithm improvement, please create a new branch and make a pull request.\n2. Please @wanmeihuali in the pull request, and I will check the code and add a label `need_experiment` or `need_experiment_garden` or `need_experiment_tat_truck` to the pull request.\n3. The CI will automatically build the docker image and upload it to AWS ECR. Then the cloud-based training will be triggered. 
The training result will be uploaded to the pull request as a comment, e.g. [this PR](https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fpull\u002F38). The dataset is generated by the default config of colmap. The training is on g4dn.xlarge Spot Instance(NVIDIA T4, a weaker GPU than 3090\u002FA6000), the training usually takes 2-3 hours.\n4. Now the best training result in README.md is manually updated. I will try to automate this process in the future.\n\nThe current implementation is based on my understanding of the paper, and it will have some difference from the paper\u002Fofficial implementation(they plan to release the code in the July). As a personal project, the parameters are not tuned well. I will try to improve performance in the future. Feel free to open an issue if you have any questions, and PRs are welcome, especially for any performance improvement.\n\n\n## TODO\n### Algorithm part\n- [ ] Fix the adaptive controller part, something is wrong with the densify process, and the description in the paper is very vague. Further experiments are needed to figure out the correct\u002Fbetter implementation.\n    - figure if the densify shall apply to all points, or only points in current frame.\n    - figure what \"average magnitude of view-space position gradients\" means, is it average across frames, or average across pixel? \n    - ~figure the correct split policy. Where shall the location of new point be? Currently the location is the location before optimization. 
Will it be better to put it at foci of the original ellipsoid?~ use sampling of pdf for over-reconstruct, use position before optimization for under-reconstruct.\n- [x] Add result score\u002Fimage in README.md\n    - try same dataset in the paper.\n    - fix issue in current blender plugin, and also make the plugin open source.\n- [ ] camera pose optimization: get the gradient of the camera pose, and optimize it during training.\n- [ ] Dynamic Rigid Object support. The current implementation already supports multiple camera poses in one scene, so the movement of rigid objects shall be able to transform into the movement of the camera. Need to find some sfm solution that can provide an estimation of 6 DOF pose for different objects, and modify the dataset code to do the test.\n\n### Engineering part\n- [x] fix bug: crash when there's no point in camera.\n- [x] Add an inference-only framework to support adding\u002Fmoving objects in the scene, scene merging, scene editing, etc.\n- [ ] Add an install script\u002Fdocker image\n- [ ] Support batch training. Currently the code only supports single image training, and only uses a small part of the GPU memory.\n- [ ] Implement radix sort\u002Fcumsum by Taichi instead of torch, torch-taichi tensor cast seems only available on CUDA device. 
If we want to switch to other device, we need to get rid of torch.\n- [ ] Implement a Taichi only inference rasterizer which only use taichi field, and migrate to MacOS\u002FAndroid\u002FIOS.\n","# taichi_3d_gaussian_splatting\n由 Taichi 语言实现的论文《用于实时辐射场渲染的 3D 高斯泼溅》（[3D Gaussian Splatting for Real-Time Radiance Field Rendering](https:\u002F\u002Frepo-sam.inria.fr\u002Ffungraph\u002F3d-gaussian-splatting\u002F)）的非官方版本。\n\n## 3D 高斯泼溅的作用是什么？\n\n### 训练：\n该算法以多视角图像、稀疏点云和相机位姿作为输入，利用可微分光栅化器对点云进行训练，最终输出带有额外特征（协方差、颜色信息等）的稠密点云。\n\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_readme_699a577ab066.png\" alt=\"drawing\" width=\"200\"\u002F>\\\n如果将训练过程视为一个模块，可以这样描述：\n```mermaid\ngraph LR\n    A[多视角图像] --> B((训练))\n    C[稀疏点云] --> B\n    D[相机位姿] --> B\n    B --> E[带额外特征的稠密点云]\n```\n\n### 推理：\n该算法以带额外特征的稠密点云和任意相机位姿作为输入，使用相同的光栅化器从给定的相机位姿渲染出图像。\n```mermaid\ngraph LR\n    C[带额外特征的稠密点云] --> B((推理))\n    D[新相机位姿] --> B\n    B --> E[图像]\n```\n推理结果示例：\n\nhttps:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fassets\u002F18469933\u002Fcc760693-636b-4157-ae85-33813f3da54d\n\n由于点云的良好特性，与其他类似 NeRF 
的算法相比，该算法在处理场景或物体合并时更加容易。\n\nhttps:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fassets\u002F18469933\u002Fbc38a103-e435-4d35-9239-940e605b4552\n\n\n\n\u003Cdetails>\u003Csummary>其他示例结果\u003C\u002Fsummary>\n\u003Cp>\n\n左上：[本仓库的结果（3万次迭代）](https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fblob\u002Fcf7c1428e8d26495a236726adf9546e4f2a9adb7\u002Fconfig\u002Ftat_truck_every_8_test.yaml)，右上：真实标签，左下：归一化深度，右下：每像素点数的归一化值\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_readme_2e2f4f53ce5d.png)\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_readme_12c9d01e706a.png)\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_readme_27accb243b7f.png)\n\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n## 为什么选择 Taichi？\n- Taichi 是一种用于高性能计算的语言。它旨在弥合以生产力为导向的 Python 语言与以性能和并行性为导向的 C++\u002FCUDA 语言之间的差距。通过使用 Taichi，本项目完全基于 Python 实现，却能达到与 CUDA 实现相当甚至更好的性能。此外，代码也更容易阅读和维护。\n- Taichi 提供多种后端，包括 CUDA、OpenGL、Metal 等。我们计划未来切换后端以支持更多平台，但目前该项目仅支持 CUDA 后端。\n- Taichi 提供自动微分功能，尽管当前项目尚未使用，但这对于未来的开发来说是一个非常有用的功能。\n\n## 当前状态\n目前，该项目已在官方实现提供的数据集上进行了测试。对于卡车数据集，本项目仅使用官方实现 1\u002F5 到 1\u002F4 的点数，便能获得略高于官方实现的 PSNR 值。然而，训练和推理的速度仍然慢于官方实现。\n\n官方实现和本实现的结果均在同一数据集上测试。我注意到官方实现的结果与其论文中的结果略有不同，这可能是由于测试分辨率的差异所致。\n\n| 数据集 | 来源 | PSNR | SSIM | 点数 |\n| --- | --- | --- | --- | --- |\n| 卡车（7k） | 论文 | 23.51 | 0.840 | - |\n| 卡车（7k） | 官方实现 | 23.22 | - | 173万 |\n| 卡车（7k） | 本实现 | 23.762359619140625 | 0.835700511932373 | 约23万 |\n| 卡车（30k） | 论文 | 25.187 | 0.879 | - |\n| 卡车（30k） | 官方实现 | 24.88 | - | 210万 |\n| 卡车（30k） | 本实现 | 25.21463966369629 | 0.8645088076591492 | 428,687.0 |\n\n[卡车（3万）最新最佳结果](https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fpull\u002F98#issuecomment-1634828783)：\n| 训练：迭代次数 | 训练：L1 损失 | 训练：损失 | 训练：有效点数 | 训练：PSNR | 训练：SSIM | 训练：SSIM 损失 | 验证：损失 | 验证：PSNR | 
验证：SSIM |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| 30000.0 | 0.02784738875925541 | 0.04742341861128807 | 428687.0 | 25.662137985229492 | 0.8742724657058716 | 0.12572753429412842 | 0.05369199812412262 | 25.21463966369629 | 0.8645088076591492 |\n\n\n## 安装\n1. 准备包含 PyTorch 和 torchvision 的环境。\n2. 克隆本仓库并进入目录。\n3. 运行以下命令：\n```\npip install -r requirements.txt\npip install -e .\n```\n\n所有依赖项均可通过 pip 安装。PyTorch\u002FTorchvision 也可以通过 conda 安装。代码已在 Ubuntu 20.04.2 LTS 上使用 Python 3.10.10 测试过。硬件配置为 RTX 3090 显卡和 CUDA 12.1。虽然尚未在其他平台上测试，但经过少量修改后应该也能在其他平台上运行。\n\n## 数据集\n该算法需要整个场景的点云、相机参数以及真实标签图像。点云以 parquet 格式存储，相机参数和真实标签图像以 JSON 格式存储。运行配置则以 YAML 格式存储。我们提供了一个从 Colmap 输出构建数据集的脚本，也可以直接从原始数据构建数据集。\n### 在 Tank and Temple 卡车场景上训练\n\u003Cdetails>\u003Csummary>点击我\u003C\u002Fsummary>\n\u003Cp>\n**免责声明**：用户需事先获得原始数据集提供者的许可。任何数据的使用都必须遵守数据集所有者的许可协议。\n\n[Tank and Temple](https:\u002F\u002Fwww.tanksandtemples.org\u002Fdownload\u002F) 数据集中卡车场景是本项目开发的主要数据集。我们在大多数实验中都使用了图像的下采样版本。相机位姿和稀疏点云可以通过 Colmap 轻松生成。预处理后的图像、预生成的相机位姿和点云可用于卡车场景，可从此 [链接](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1ZhMSkm3YGfhtywII5Hik5YDdMzD3lZjX?usp=sharing\n) 下载。\n\n请将图像下载到名为 `image` 的文件夹中，并将其放置在本项目的根目录下。相机位姿和稀疏点云应放在 `data\u002Ftat_truck_every_8_test` 目录下。文件夹结构应如下所示：\n```\n├── data\n│   ├── tat_truck_every_8_test\n│   │   ├── train.json\n│   │   ├── val.json\n│   │   ├── point_cloud.parquet\n├── image\n│   ├── 000000.png\n│   ├── 000001.png\n```\n同时提供了配置文件 [config\u002Ftat_truck_every_8_test.yaml](config\u002Ftat_truck_every_8_test.yaml)。该配置文件用于指定数据集路径、训练参数和网络参数。配置文件内容清晰易懂。可通过以下命令开始训练：\n```bash\npython gaussian_point_train.py --train_config config\u002Ftat_truck_every_8_test.yaml\n```\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n### 
在示例物体（靴子）上训练\n\n\u003Cdetails>\u003Csummary>点击我\u003C\u002Fsummary>\n\u003Cp>\n\n实际上，这是一个来自[互联网](https:\u002F\u002Fwww.turbosquid.com\u002F3d-models\u002F3d-tactical-boots-1948918)的随机免费网格模型，我认为可以免费使用。数据集是使用[BlenderNerf](https:\u002F\u002Fgithub.com\u002Fmaximeraafat\u002FBlenderNeRF.git)生成的。经过预处理的图像、预先生成的相机位姿以及靴子场景的点云可以从这个[链接](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1d14l9ewnyI7zCA6BxuQUWseQbIKyo3Jh?usp=sharing)下载。请将图像下载到名为`image`的文件夹中，并将其放置在本仓库的根目录下。相机位姿和稀疏点云应放在`data\u002Fboots_super_sparse`目录下。文件夹结构应如下所示：\n```\n├── data\n│   ├── boots_super_sparse\n│   │   ├── boots_train.json\n│   │   ├── boots_val.json\n│   │   ├── point_cloud.parquet\n├── image\n│   ├── images_train\n│   │   ├── COS_Camera.001.png\n│   │   ├── COS_Camera.002.png\n|   |   ├── ...\n```\n请注意，由于该数据集中的图像分辨率较高（1920x1080），因此在其上进行训练的速度实际上比在卡车场景上训练要慢。\n\n\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n\n### 使用Colmap生成的数据集进行训练\n\u003Cdetails>\u003Csummary>点击我\u003C\u002Fsummary>\n\u003Cp>\n    \n- 使用Colmap进行重建：参见https:\u002F\u002Fcolmap.github.io\u002Ftutorial.html。图像应进行去畸变处理。通常稀疏重建就足够了。\n- 保存为txt格式：标准的Colmap txt输出包含三个文件，即cameras.txt、images.txt和points3D.txt。\n- 将txt文件转换为json和parquet格式：请参阅[此文件](tools\u002Fprepare_colmap.py)，了解如何准备这些文件。\n- 准备配置yaml文件：以[此文件](config\u002Ftat_train.yaml)为例。\n- 使用该配置运行。\n\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n### 使用带有额外网格的Instant-NGP格式数据集进行训练\n\u003Cdetails>\u003Csummary>点击我\u003C\u002Fsummary>\n\u003Cp>\n\n- 提供了一个脚本，用于将Instant-NGP格式的数据集转换为所需的两个JSON文件。然而，该算法需要一个额外的点云作为输入，而Instant-NGP格式的数据集通常不包含这一点。该脚本接受一个网格文件作为输入，并通过在网格上采样点来生成点云。该脚本位于[这里](tools\u002Fprepare_InstantNGP_with_mesh.py)。\n- 用户可以使用以下命令运行该脚本：\n```bash\npython tools\u002Fprepare_InstantNGP_with_mesh.py \\\n    --transforms_train {训练变换文件路径} \\\n    --transforms_test {验证变换文件路径，若未提供，则从训练集中采样验证集} \\\n    --mesh_path {网格文件路径} \\\n    --mesh_sample_points {在网格上采样的点数} \\\n    --val_sample {若从训练集中采样验证集，则每隔n帧采样一次} \\\n    --image_path_prefix {图像路径前缀，通常是包含图像文件夹的目录路径} \\\n    --output_path {输出文件夹路径}\n```\n- 
After running, the output folder will contain two JSON files, train.json and val.json, and a point cloud file, point_cloud.parquet.\n- Create a config yaml file similar to [test_sagemaker.yaml](config\u002Ftest_sagemaker.yaml): set train-dataset-json-path to the path of train.json, val-dataset-json-path to the path of val.json, and pointcloud-parquet-path to the path of point_cloud.parquet. Also set summary-writer-log-dir and output-model-dir to the locations where you want the TensorBoard logs and the model to be saved.\n- Run with the config:\n```bash\npython gaussian_point_train.py --train_config {path to the config yaml file}\n```\n\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n### Train on a dataset generated by BlenderNerf\n\u003Cdetails>\u003Csummary>click me\u003C\u002Fsummary>\n\u003Cp>\n\n[BlenderNerf](https:\u002F\u002Fgithub.com\u002Fmaximeraafat\u002FBlenderNeRF.git) is a Blender plugin that generates datasets for NeRF. It can produce datasets in Instant-NGP format, which the [script](tools\u002Fprepare_InstantNGP_with_mesh.py) converts to the required format. The mesh file is also easy to export from Blender. To generate the dataset:\n- Install [Blender](https:\u002F\u002Fwww.blender.org\u002F).\n- Import the mesh or scene you want into [Blender](https:\u002F\u002Fwww.blender.org\u002F).\n- Install BlenderNerf by following the README in [BlenderNerf](https:\u002F\u002Fgithub.com\u002Fmaximeraafat\u002FBlenderNeRF.git).\n- Configure BlenderNerf: make sure “Train” is selected and “Test” is not (the “Test” option seems to have some issues), set the file format to NGP, and fill in the save path.\\\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_readme_45b563ca40e9.png)\n- Configure BlenderNerf Camera on Sphere: set up the camera as described in the BlenderNerf README (the default settings are enough in most cases), then click PLAY COS.\\\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_readme_17b332f30666.png) \n- A zip file is generated at the save path. After unzipping it, there should be a folder named `train` and a file named `transforms_train.json`.\n- In Blender, choose File->Export->Stl(.stl) to export the mesh as an stl file.\n- Run the [script](tools\u002Fprepare_InstantNGP_with_mesh.py) with the following command:\n```bash\npython tools\u002Fprepare_InstantNGP_with_mesh.py \\\n    --transforms_train {path to transforms_train.json} \\\n    --mesh_path {path to the stl file} \\\n    --mesh_sample_points {number of points to sample on the mesh, default 500} \\\n    --val_sample {if the val set is sampled from the train set, take one frame every n frames, default 8} \\\n    --image_path_prefix {absolute path of the directory containing the train folder} \\\n    --output_path 
{any path you like}\n```\n- After running, the output folder will contain two JSON files, train.json and val.json, and a point cloud file, point_cloud.parquet.\n- Create a config yaml file similar to [test_sagemaker.yaml](config\u002Ftest_sagemaker.yaml): set train-dataset-json-path to the path of train.json, val-dataset-json-path to the path of val.json, and pointcloud-parquet-path to the path of point_cloud.parquet. Also set summary-writer-log-dir and output-model-dir to the locations where you want the TensorBoard logs and the model to be saved.\n- Run with the config:\n```bash\npython gaussian_point_train.py --train_config {path to the config yaml file}\n```\n\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n### Train on a dataset generated by other methods\n\u003Cdetails>\u003Csummary>click me\u003C\u002Fsummary>\n\u003Cp>\n\nSee [this file](docs\u002FRawDataFormat.md) for how to prepare the dataset.\n\n\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n\n\n \n## Run\n```bash\npython gaussian_point_train.py --train_config {path to the config file}\n```\n\nThe training process works as follows:\n```mermaid\nstateDiagram-v2\n    state WeightToTrain {\n        sparsePointCloud\n        pointCloudExtraFeatures\n    }\n    WeightToTrain --> Rasterizer: input\n    cameraPose --> Rasterizer: input\n    Rasterizer --> Loss: Rasterized Image\n    ImageFromMultiViews --> Loss\n    Loss --> Rasterizer: gradient\n    Rasterizer --> WeightToTrain: gradient\n```\n\nThe results are visualized in TensorBoard. The TensorBoard logs are stored in the output directory specified in the config file. The trained point cloud with features is also stored as a parquet file, and its output directory is likewise specified in the config file.\n\n### Run on Colab (to take advantage of the GPU accelerators Google provides)\nThe notebook is available here: [\u002Ftools\u002Frun_3d_gaussian_splatting_on_colab.ipynb](\u002Ftools\u002Frun_3d_gaussian_splatting_on_colab.ipynb)\n\n1. Set the hardware accelerator in Colab: “Runtime->Change Runtime Type->Hardware accelerator->select GPU->select T4”\n2. Upload this repository to the corresponding folder in your Google Drive.\n3. Mount your Google Drive in the notebook (see notebook).\n4. Install condacolab (see notebook).\n5. Install the dependencies in requirements.txt with pip (see notebook).\n6. Install PyTorch, torchvision, pytorch-cuda, etc. with conda (see notebook).\n7. Prepare the dataset as described at https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting#dataset.\n8. Run the trainer with the correct config (see notebook).\n9. 
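Monitor the training process through TensorBoard (see notebook).\n\n## Visualization\nA simple visualizer is provided. It is implemented with the Taichi GUI, and its frame rate is capped at 60 FPS (if anyone knows how to lift this limit, please contact me). The visualizer takes one or more parquet results as input. An example parquet file can be downloaded here: [https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F12-kZZay8RFlDk7hJQysG_Cr4-oxDp37l\u002Fview?usp=sharing](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F12-kZZay8RFlDk7hJQysG_Cr4-oxDp37l\u002Fview?usp=sharing).\n```bash\npython3 visualizer --parquet_path_list \u003Cparquet_path_0> \u003Cparquet_path_1> ...\n```\nThe visualizer merges multiple point clouds and displays them in the same scene.\n- Press 0 to select all point clouds (the default state).\n- Press 1 to 9 to select one of the point clouds.\n- When all point clouds are selected, use WASD and - to move the camera, use QE to rotate around the Y axis, or drag the mouse to rotate freely.\n- When only one point cloud is selected, use WASD and - to move the object or scene, use QE to rotate the object or scene around the Y axis, or drag the mouse to rotate freely around the center of the object.\n\n## How to contribute\u002FUse CI to train on the cloud\n\nI have now enabled CI and cloud-based training. The feature is not very stable yet, but it lets people contribute to this repository even if they have no GPU.\nThe usual workflow is:\n1. For any algorithm improvement, create a new branch and open a pull request.\n2. Mention @wanmeihuali in the pull request. I will review the code and add the `need_experiment`, `need_experiment_garden`, or `need_experiment_tat_truck` label to the pull request.\n3. CI automatically builds a Docker image, uploads it to AWS ECR, and then triggers cloud training. The training result is posted to the pull request as a comment, for example in [this PR](https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fpull\u002F38). The dataset is generated with the default colmap configuration. Training runs on a g4dn.xlarge Spot instance (NVIDIA T4, weaker than a 3090\u002FA6000) and usually takes 2-3 hours.\n4. 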
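Currently the best training result in README.md is updated manually. I will try to automate this in the future.\n\nThe current implementation is based on my understanding of the paper and may differ from the paper or the official implementation (they plan to release their code in July). As a personal project, its parameters have not been tuned thoroughly. I will keep improving performance. If you have any questions, feel free to open an issue; PRs are also welcome, especially ones that improve performance.\n\n\n## TODO\n### Algorithm part\n- [ ] Fix the adaptive controller part: the densification process is currently problematic, and the description in the paper is very vague. Further experiments are needed to determine the correct or a better implementation.\n    - Determine whether densification should apply to all points or only to the points in the current frame.\n    - Figure out what “the average magnitude of view-space position gradients” means exactly: is it averaged across frames or across pixels?\n    - ~Determine the correct split strategy. Where should the new points be placed? Currently they are placed at the position before optimization. Would it be better to place them at the foci of the original ellipsoid?~ For over-reconstruction, sample from the probability density function; for under-reconstruction, use the position before optimization.\n- [x] Add result scores and images to README.md\n    - Try the same datasets as in the paper.\n    - Fix the issues in the current Blender plugin and open-source the plugin.\n- [ ] Camera pose optimization: obtain the gradient of the camera pose and optimize it during training.\n- [ ] Support dynamic rigid objects. The current implementation already supports multiple camera poses in one scene, so the motion of a rigid object can be converted into camera motion. An SfM solution that provides 6-DoF pose estimates for different objects is needed, and the dataset code must be modified to test it.\n\n### Engineering part\n- [x] Fix bug: the program crashes when there are no points in the camera view.\n- [x] Add an inference-only framework to support adding\u002Fmoving objects in a scene, scene merging, scene editing, and so on.\n- [ ] Add an install script\u002FDocker image.\n- [ ] Support batch training. Currently the code only supports single-image training and uses only a small part of the GPU memory.\n- [ ] Implement radix sort\u002Fcumsum in Taichi instead of relying on PyTorch. The PyTorch-Taichi tensor conversion seems to be available only on CUDA devices, so switching to other devices requires getting rid of PyTorch.\n- [ ] Implement a pure Taichi inference rasterizer that uses only Taichi fields, and port it to macOS, Android, and iOS.","# taichi_3d_gaussian_splatting Quick Start Guide\n\n`taichi_3d_gaussian_splatting` is an unofficial implementation of 3D Gaussian Splatting built on Taichi Lang. It implements a high-performance differentiable rasterizer in pure Python code, used to train a dense point cloud from multi-view images and render it in real time. Compared with the official CUDA implementation, this version is easier to read and maintain, and in some scenes it reaches a higher PSNR with fewer points.\n\n## Prerequisites\n\nBefore you start, make sure your development environment meets the following requirements:\n\n*   **Operating system**: Ubuntu 20.04.2 LTS is recommended (other Linux distributions or Windows may need minor adjustments).\n*   **Python version**: Python 3.10.10 is recommended.\n*   **Hardware**: an NVIDIA GPU (tested on an RTX 3090) with a matching CUDA driver installed (tested with CUDA 12.1).\n    *   *Note: only the CUDA backend is supported at the moment.*\n*   **Required dependencies**:\n    *   PyTorch\n    *   TorchVision\n\nIt is recommended to create a virtual environment with Conda and install PyTorch first:\n\n```bash\nconda create -n taichi_splatting python=3.10\nconda activate taichi_splatting\n# Visit pytorch.org for the install command that matches your CUDA version, for example:\n# pip install torch torchvision --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n```\n\n> **Tip for users in mainland China**: if downloading PyTorch or pip packages is slow, consider using the Tsinghua or Aliyun mirror.\n> ```bash\n> export PIP_INDEX_URL=https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> 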
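```\n\n## Installation\n\nClone the repository and install the project dependencies:\n\n1.  Clone the repository and enter the directory:\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting.git\n    cd taichi_3d_gaussian_splatting\n    ```\n\n2.  Install the Python dependencies and the project itself:\n    ```bash\n    pip install -r requirements.txt\n    pip install -e .\n    ```\n\n## Basic Usage\n\nThe core features of this project are training and inference. The simplest training workflow is shown below, using the **Tanks and Temples Truck** dataset provided by the repository as an example.\n\n### 1. Prepare the data\n\nYou need images, camera poses (JSON format), and a sparse point cloud (Parquet format).\n\n*   **Download the preprocessed data**: the preprocessed Truck scene data can be downloaded from this [Google Drive link](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1ZhMSkm3YGfhtywII5Hik5YDdMzD3lZjX?usp=sharing).\n*   **Directory structure**: put the downloaded images into the `image` folder in the repository root, and put the camera pose and point cloud files into the `data\u002Ftat_truck_every_8_test` folder. The final structure should look like this:\n\n    ```text\n    ├── data\n    │   ├── tat_truck_every_8_test\n    │   │   ├── train.json\n    │   │   ├── val.json\n    │   │   ├── point_cloud.parquet\n    ├── image\n    │   ├── 000000.png\n    │   ├── 000001.png\n    │   └── ...\n    ```\n\n    *(Note: if you have your own COLMAP output, you can convert it to the format above with the `tools\u002Fprepare_colmap.py` script)*\n\n### 2. Start training\n\nStart training with the provided config file, which already sets the dataset paths and training parameters.\n\n```bash\npython gaussian_point_train.py --train_config config\u002Ftat_truck_every_8_test.yaml\n```\n\nDuring training, the program reads the multi-view images and the sparse point cloud, optimizes the point cloud attributes (covariance, color, and so on) through the differentiable rasterizer, and finally outputs a dense point cloud with extra features.\n\n### 3. 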
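Inference and Rendering\n\nAfter training finishes, the model is saved automatically (the path is specified in the yaml config, usually under `output-model-dir`). Inference only needs the trained dense point cloud and new camera poses to render images. The inference logic is part of the validation step in the training code, or you can refer to the test scripts in the repository to load a model and generate images from new viewpoints.\n\n---\n*Tip: for other datasets (such as data generated by BlenderNerf or data in Instant-NGP format), preprocess the data with the conversion scripts in the repository's `tools` directory, update the paths in the corresponding yaml config file, and then run the same training command.*","An autonomous-driving simulation team needs to quickly build a high-fidelity, real-time-renderable 3D scene of an urban intersection from multi-view video captured by real vehicles, in order to test how perception algorithms behave under extreme viewpoints.\n\n### Without taichi_3d_gaussian_splatting\n- **Slow rendering**: traditional NeRF-style methods are slow at inference time and cannot provide real-time free-viewpoint navigation inside the simulation engine, so test iteration is very inefficient.\n- **Difficult scene merging**: when an “intersection” and a “vehicle” model captured at different times need to be combined, neural radiance fields often show artifacts or unnatural blending at the boundaries and are hard to edit dynamically.\n- **High development barrier**: the official high-performance implementation relies on complex CUDA code, so an algorithm engineer who wants to adjust the rasterization logic or add new features must be proficient in low-level GPU programming, which makes maintenance costly.\n- **Large GPU memory footprint**: reaching a high reconstruction quality (PSNR) usually takes millions of Gaussian points, which puts heavy pressure on the GPU memory of the simulation cluster.\n\n### With taichi_3d_gaussian_splatting\n- **Smooth real-time rendering**: with differentiable rasterization, image synthesis takes milliseconds, so the simulator can replay the view from any camera pose at a high frame rate, which greatly improves test coverage.\n- **Flexible object merging**: thanks to the point cloud representation, the team can join separately trained vehicle and intersection Gaussian models without obvious visual artifacts, which supports building dynamic scenes.\n- **Efficient development in pure Python**: built on Taichi, the core algorithm is written entirely in Python yet performs close to CUDA, so researchers can modify the training logic directly without touching low-level C++\u002FCUDA code.\n- **Sparse and efficient storage**: in the Truck dataset test, it reaches a higher PSNR (25.21 vs 24.88) with only about 1\u002F5 of the points used by the official implementation (about 430,000 points), which noticeably reduces GPU memory usage.\n\nWith its high-performance pure Python implementation, taichi_3d_gaussian_splatting lets developers keep real-time rendering and high-quality reconstruction while greatly lowering the engineering barrier and resource cost of 3D scene generation.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fwanmeihuali_taichi_3d_gaussian_splatting_2e2f4f53.png","wanmeihuali",null,"https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fwanmeihuali_08fb5905.png","Software Engineer in Autonomous Driving Industry","San Diego, CA","https:\u002F\u002Fgithub.com\u002Fwanmeihuali",[79,83,87],{"name":80,"color":81,"percentage":82},"Jupyter Notebook","#DA5B0B",55,{"name":84,"color":85,"percentage":86},"Python","#3572A5",44.9,{"name":88,"color":89,"percentage":90},"Shell","#89e051",0.1,747,72,"2026-04-06T12:57:34","Apache-2.0","Linux","An NVIDIA GPU is required; tested on an RTX 3090 with CUDA 12.1 (only the CUDA backend is currently supported)","Not specified",{"notes":99,"python":100,"dependencies":101},"The code has been tested on Ubuntu 20.04.2 LTS. Although Taichi in principle supports multiple backends, this repository currently supports only the CUDA backend. Other platforms may need minor modifications to run. Dataset preparation requires extra steps (for example, using COLMAP or BlenderNeRF 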
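to generate the point cloud and camera parameters).","3.10.10",[102,103,104,105,106,107,108,109,110],"torch","torchvision","taichi","pyarrow","pyyaml","tqdm","opencv-python","pillow","numpy",[15,14,112],"其他",[114,115,116,117,118,119,120,121,122,104],"3d-reconstruction","3d-rendering","computer-graphics","computer-vision","machine-learning","nerf","python","pytorch","real-time-rendering","2026-03-27T02:49:30.150509","2026-04-07T02:37:24.201008",[126,131,136,140,145,150,155],{"id":127,"question_zh":128,"answer_zh":129,"source_url":130},20770,"Why do tile-like artifacts appear during visualization, or why does a single color fill an entire tile?","This happens because the value of `gaussian_alpha` becomes infinite (inf). The workaround is to add a check in the rendering code that skips invalid alpha values. Specifically, add the following check inside the relevant loop in `GaussianPointCloudRasterisation.py`:\n```python\nif abs(gaussian_alpha) >= np.inf:\n    continue\n```\nThe root cause has been fixed in a related PR; the core idea is to perform a 2D convolution in pixel space to avoid numerical overflow.","https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fissues\u002F119",{"id":132,"question_zh":133,"answer_zh":134,"source_url":135},20771,"How can a .parquet file be converted to a mesh or another point cloud format?","There is currently no efficient way to generate a textured mesh directly. The main difficulty is that the point cloud produced by this algorithm contains many semi-transparent points distributed near the object surface (colors are accumulated through volume rendering), while traditional point-cloud-to-mesh algorithms usually ignore opacity. Conventional algorithms therefore give poor results, and a dedicated point-cloud-to-mesh algorithm that handles these semi-transparent points may be needed.","https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fissues\u002F116",{"id":137,"question_zh":138,"answer_zh":139,"source_url":135},20772,"How can a .parquet file be visualized on the web or on a local CPU-only machine?","In principle this is possible, but the color computation may need to be changed, for example by adding a sigmoid activation after the spherical harmonics (SH) function. Some users have tried WebGL-based viewers (such as antimatter15.com\u002Fsplat\u002F) to visualize the data, but the data format or the computation may need to be adapted.",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},20773,"Training on high-resolution video crashes or gives poor results. How can this be solved?","Training on higher-resolution images requires more tuning of the existing parameters; otherwise it may crash or give poor results. The suggested solution is to automatically downscale the training images to at most 1600 pixels when needed. This can be done by adding a resize step in data preprocessing, which keeps training robust across a wide range of scenes.","https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fissues\u002F151",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},20774,"How can camera pose optimization be implemented? How does the gradient flow from the 3D Gaussian covariance to the view?","Keep the following points in mind when implementing camera pose optimization:\n1. Check that the backpropagation formula is correct; the partial derivative of the loss with respect to the view must be computed.\n2. 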
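You can either optimize the c2w matrix directly, or define a separate tensor that represents the deviation (delta) of the matrix, initialize it to 0, and optimize the delta.\n3. Start testing with a simple perturbation to check whether pose optimization takes effect.\nIf you find that in the CUDA implementation only the first few iterations have gradients, carefully check the gradient backpropagation logic in `computeCov2DCUDA`.","https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fissues\u002F120",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},20775,"How should data-type-related errors (such as errors when plotting in tensorboard) be debugged?","This may be caused by poor int32 support in earlier versions of torch or tensorboard. Try commenting out the `self._plot_grad_histogram` line: it is only a logging function that plots the gradient distribution in tensorboard and is not core functionality. If the problem persists, check the data type conversions of the variables involved.","https:\u002F\u002Fgithub.com\u002Fwanmeihuali\u002Ftaichi_3d_gaussian_splatting\u002Fissues\u002F62",{"id":156,"question_zh":157,"answer_zh":158,"source_url":144},20776,"Why does the normalization factor of the 2D Gaussian convolution in the official code contain sqrt(|Σ|)?","When a 2D convolution is performed in pixel space to simulate pixel integration, the normalization factor is set to `sqrt(|Σ| \u002F |Σ+0.3I|)` to compensate for the change in the covariance matrix. In theory a normalized Gaussian integral only needs `1 \u002F sqrt(|Σ+0.3I|)`, but `sqrt(|Σ|)` is introduced so that the integral of the convolution result stays consistent with the original projected 3D Gaussian. This is the specific way the official code simulates pixel integration (and details of the official implementation are themselves open to discussion).",[]]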