[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-ashawkey--stable-dreamfusion":3,"tool-ashawkey--stable-dreamfusion":62},[4,18,26,35,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,2,"2026-04-10T11:39:34",[14,15,13],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":32,"last_commit_at":41,"category_tags":42,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[43,13,15,14],"插件",{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 
API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[52,15,13,14],"语言模型",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,61],"视频",{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":68,"readme_en":69,"readme_zh":70,"quickstart_zh":71,"use_case_zh":72,"hero_image_url":73,"owner_login":74,"owner_name":75,"owner_avatar_url":76,"owner_bio":77,"owner_company":78,"owner_location":79,"owner_email":80,"owner_twitter":77,"owner_website":81,"owner_url":82,"languages":83,"stars":108,"forks":109,"last_commit_at":110,"license":111,"difficulty_score":112,"env_os":113,"env_gpu":114,"env_ram":115,"env_deps":116,"category_tags":128,"github_topics":130,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":137,"updated_at":138,"faqs":139,"releases":168},8002,"ashawkey\u002Fstable-dreamfusion","stable-dreamfusion","Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion.","stable-dreamfusion 是一个基于 PyTorch 的开源项目，旨在实现通过文本描述或单张图片生成三维模型，并支持导出为网格格式。它巧妙地将强大的 Stable Diffusion 二维图像生成能力与神经辐射场（NeRF）技术相结合，让用户只需输入简单的文字提示或上传参考图，即可创造出对应的 3D 资产。\n\n该项目主要解决了传统 3D 建模门槛高、耗时久的问题，为快速原型设计和内容创作提供了自动化方案。其核心技术亮点在于使用多分辨率网格编码器替代了传统的 NeRF 主干网络，显著提升了渲染速度；同时引入了 Perp-Neg 
技术，有效缓解了生成物体出现“多头”或结构畸变的常见问题。此外，它还支持 DeepFloyd-IF 等后端模型，并提供了详细的 Colab 笔记以便快速上手。\n\n需要注意的是，作为一个持续迭代中的实验性项目，stable-dreamfusion 的生成质量目前可能尚未完全达到学术论文中的理想效果，部分复杂提示词仍可能生成失败。因此，它更适合具备一定技术背景的开发者、AI 研究人员以及希望探索前沿 3D 生成技术的创意设计师使用。对于普通用户而言，若熟悉","stable-dreamfusion 是一个基于 PyTorch 的开源项目，旨在实现通过文本描述或单张图片生成三维模型，并支持导出为网格格式。它巧妙地将强大的 Stable Diffusion 二维图像生成能力与神经辐射场（NeRF）技术相结合，让用户只需输入简单的文字提示或上传参考图，即可创造出对应的 3D 资产。\n\n该项目主要解决了传统 3D 建模门槛高、耗时久的问题，为快速原型设计和内容创作提供了自动化方案。其核心技术亮点在于使用多分辨率网格编码器替代了传统的 NeRF 主干网络，显著提升了渲染速度；同时引入了 Perp-Neg 技术，有效缓解了生成物体出现“多头”或结构畸变的常见问题。此外，它还支持 DeepFloyd-IF 等后端模型，并提供了详细的 Colab 笔记以便快速上手。\n\n需要注意的是，作为一个持续迭代中的实验性项目，stable-dreamfusion 的生成质量目前可能尚未完全达到学术论文中的理想效果，部分复杂提示词仍可能生成失败。因此，它更适合具备一定技术背景的开发者、AI 研究人员以及希望探索前沿 3D 生成技术的创意设计师使用。对于普通用户而言，若熟悉命令行操作和 Python 环境配置，也可尝试体验这一从 2D 到 3D 的神奇转换过程。","# Stable-Dreamfusion\n\nA pytorch implementation of the text-to-3D model **Dreamfusion**, powered by the [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) text-to-2D model.\n\n**ADVERTISEMENT: Please check out [threestudio](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio) for recent improvements and better implementation in 3D content generation!**\n\n**NEWS (2023.6.12)**:\n\n* Support of [Perp-Neg](https:\u002F\u002Fperp-neg.github.io\u002F) to alleviate multi-head problem in Text-to-3D.\n* Support of Perp-Neg for both [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) and [DeepFloyd-IF](https:\u002F\u002Fgithub.com\u002Fdeep-floyd\u002FIF).\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F25863658\u002F236712982-9f93bd32-83bf-423a-bb7c-f73df7ece2e3.mp4\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F25863658\u002F232403162-51b69000-a242-4b8c-9cd9-4242b09863fa.mp4\n\n### [Update Logs](assets\u002Fupdate_logs.md)\n\n### Colab notebooks:\n* Instant-NGP backbone (`-O`): [![Instant-NGP 
Backbone](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1MXT3yfOFvO0ooKEfiUUvTKwUkrrlCHpF?usp=sharing)\n\n* Vanilla NeRF backbone (`-O2`): [![Vanilla Backbone](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1mvfxG-S_n_gZafWoattku7rLJ2kPoImL?usp=sharing)\n\n# Important Notice\nThis project is a **work-in-progress**, and contains lots of differences from the paper. **The current generation quality cannot match the results from the original paper, and many prompts still fail badly!**\n\n## Notable differences from the paper\n* Since the Imagen model is not publicly available, we use [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) to replace it (implementation from [diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers)). Different from Imagen, Stable-Diffusion is a latent diffusion model, which diffuses in a latent space instead of the original image space. 
Therefore, we need the loss to propagate back from the VAE's encoder part too, which introduces extra time cost in training.\n* We use the [multi-resolution grid encoder](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Finstant-ngp\u002F) to implement the NeRF backbone (implementation from [torch-ngp](https:\u002F\u002Fgithub.com\u002Fashawkey\u002Ftorch-ngp)), which enables much faster rendering (~10FPS at 800x800).\n* We use the [Adan](https:\u002F\u002Fgithub.com\u002Fsail-sg\u002FAdan) optimizer as default.\n\n# Install\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion.git\ncd stable-dreamfusion\n```\n\n### Optional: create a python virtual environment\n\nTo avoid python package conflicts, we recommend using a virtual environment, e.g.: using conda or venv:\n\n```bash\npython -m venv venv_stable-dreamfusion\nsource venv_stable-dreamfusion\u002Fbin\u002Factivate # you need to repeat this step for every new terminal\n```\n\n### Install with pip\n\n```bash\npip install -r requirements.txt\n```\n\n### Download pre-trained models\n\nTo use image-conditioned 3D generation, you need to download some pretrained checkpoints manually:\n* [Zero-1-to-3](https:\u002F\u002Fgithub.com\u002Fcvlab-columbia\u002Fzero123) for diffusion backend.\n    We use `zero123-xl.ckpt` by default, and it is hard-coded in `guidance\u002Fzero123_utils.py`.\n    ```bash\n    cd pretrained\u002Fzero123\n    wget https:\u002F\u002Fzero123.cs.columbia.edu\u002Fassets\u002Fzero123-xl.ckpt\n    ```\n* [Omnidata](https:\u002F\u002Fgithub.com\u002FEPFL-VILAB\u002Fomnidata\u002Ftree\u002Fmain\u002Fomnidata_tools\u002Ftorch) for depth and normal prediction.\n    These ckpts are hardcoded in `preprocess_image.py`.\n    ```bash\n    mkdir pretrained\u002Fomnidata\n    cd pretrained\u002Fomnidata\n    # assume gdown is installed\n    gdown '1Jrh-bRnJEjyMCS7f-WsaFlccfPjJPPHI&confirm=t' # omnidata_dpt_depth_v2.ckpt\n    gdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR&confirm=t' # 
omnidata_dpt_normal_v2.ckpt\n    ```\n\nTo use [DeepFloyd-IF](https:\u002F\u002Fgithub.com\u002Fdeep-floyd\u002FIF), you need to accept the usage conditions from [hugging face](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-I-XL-v1.0), and login with `huggingface-cli login` in command line.\n\nFor DMTet, we port the pre-generated `32\u002F64\u002F128` resolution tetrahedron grids under `tets`.\nThe 256 resolution one can be found [here](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1lgvEKNdsbW5RS4gVxJbgBS4Ac92moGSa\u002Fview?usp=sharing).\n\n### Build extension (optional)\nBy default, we use [`load`](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fcpp_extension.html#torch.utils.cpp_extension.load) to build the extension at runtime.\nWe also provide the `setup.py` to build each extension:\n```bash\ncd stable-dreamfusion\n\n# install all extension modules\nbash scripts\u002Finstall_ext.sh\n\n# if you want to install manually, here is an example:\npip install .\u002Fraymarching # install to python path (you still need the raymarching\u002F folder, since this only installs the built extension.)\n```\n\n### Taichi backend (optional)\nUse [Taichi](https:\u002F\u002Fgithub.com\u002Ftaichi-dev\u002Ftaichi) backend for Instant-NGP. It achieves comparable performance to CUDA implementation while **No CUDA** build is required. Install Taichi with pip:\n```bash\npip install -i https:\u002F\u002Fpypi.taichi.graphics\u002Fsimple\u002F taichi-nightly\n```\n\n### Trouble Shooting:\n* we assume working with the latest version of all dependencies, if you meet any problems from a specific dependency, please try to upgrade it first (e.g., `pip install -U diffusers`). 
If the problem persists, [reporting a bug issue](https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion\u002Fissues\u002Fnew?assignees=&labels=bug&template=bug_report.yaml&title=%3Ctitle%3E) will be appreciated!\n* `[F glutil.cpp:338] eglInitialize() failed Aborted (core dumped)`: this usually indicates a problem with the OpenGL installation. Try reinstalling the NVIDIA driver, or use nvidia-docker as suggested in https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion\u002Fissues\u002F131 if you are using a headless server.\n* `TypeError: xxx_forward(): incompatible function arguments`: this happens when we update the CUDA source and you used `setup.py` to install the extensions earlier. Try reinstalling the corresponding extension (e.g., `pip install .\u002Fgridencoder`).\n\n### Tested environments\n* Ubuntu 22 with torch 1.12 & CUDA 11.6 on a V100.\n\n# Usage\n\nThe first run will take some time to compile the CUDA extensions.\n\n```bash\n#### stable-dreamfusion setting\n\n### Instant-NGP NeRF Backbone\n# + faster rendering speed\n# + less GPU memory (~16G)\n# - need to build CUDA extensions (a CUDA-free Taichi backend is available)\n\n## train with text prompt (with the default settings)\n# `-O` equals `--cuda_ray --fp16`\n# `--cuda_ray` enables instant-ngp-like occupancy grid based acceleration.\npython main.py --text \"a hamburger\" --workspace trial -O\n\n# reduce stable-diffusion memory usage with `--vram_O`\n# enable various vram savings (https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Foptimization\u002Ffp16).\npython main.py --text \"a hamburger\" --workspace trial -O --vram_O\n\n# You can collect arguments in a file. You can override arguments by specifying them after `--file`. 
Note that quoted strings can't be loaded from .args files...\npython main.py --file scripts\u002Fres64.args --workspace trial_awesome_hamburger --text \"a photo of an awesome hamburger\"\n\n# use CUDA-free Taichi backend with `--backbone grid_taichi`\npython3 main.py --text \"a hamburger\" --workspace trial -O --backbone grid_taichi\n\n# choose stable-diffusion version (support 1.5, 2.0 and 2.1, default is 2.1 now)\npython main.py --text \"a hamburger\" --workspace trial -O --sd_version 1.5\n\n# use a custom stable-diffusion checkpoint from hugging face:\npython main.py --text \"a hamburger\" --workspace trial -O --hf_key andite\u002Fanything-v4.0\n\n# use DeepFloyd-IF for guidance (experimental):\npython main.py --text \"a hamburger\" --workspace trial -O --IF\npython main.py --text \"a hamburger\" --workspace trial -O --IF --vram_O # requires ~24G GPU memory\n\n# we also support negative text prompt now:\npython main.py --text \"a rose\" --negative \"red\" --workspace trial -O\n\n## after the training is finished:\n# test (exporting 360 degree video)\npython main.py --workspace trial -O --test\n# also save a mesh (with obj, mtl, and png texture)\npython main.py --workspace trial -O --test --save_mesh\n# test with a GUI (free view control!)\npython main.py --workspace trial -O --test --gui\n\n### Vanilla NeRF backbone\n# + pure pytorch, no need to build extensions!\n# - slow rendering speed\n# - more GPU memory\n\n## train\n# `-O2` equals `--backbone vanilla`\npython main.py --text \"a hotdog\" --workspace trial2 -O2\n\n# if CUDA OOM, try to reduce NeRF sampling steps (--num_steps and --upsample_steps)\npython main.py --text \"a hotdog\" --workspace trial2 -O2 --num_steps 64 --upsample_steps 0\n\n## test\npython main.py --workspace trial2 -O2 --test\npython main.py --workspace trial2 -O2 --test --save_mesh\npython main.py --workspace trial2 -O2 --test --gui # not recommended, FPS will be low.\n\n### DMTet finetuning\n\n## use --dmtet and --init_with \u003Cnerf 
checkpoint> to finetune the mesh at higher resolution\npython main.py -O --text \"a hamburger\" --workspace trial_dmtet --dmtet --iters 5000 --init_with trial\u002Fcheckpoints\u002Fdf.pth\n\n## init dmtet with a mesh to generate texture\n# requires cubvh to be installed: pip install git+https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fcubvh\n# remove --lock_geo to also finetune geometry, but performance may be bad.\npython main.py -O --text \"a white bunny with red eyes\" --workspace trial_dmtet_mesh --dmtet --iters 5000 --init_with .\u002Fdata\u002Fbunny.obj --lock_geo\n\n## test & export the mesh\npython main.py -O --text \"a hamburger\" --workspace trial_dmtet --dmtet --iters 5000 --test --save_mesh\n\n## gui to visualize dmtet\npython main.py -O --text \"a hamburger\" --workspace trial_dmtet --dmtet --iters 5000 --test --gui\n\n### Image-conditioned 3D Generation\n\n## preprocess input image\n# note: the results of image-to-3D depend on zero-1-to-3's capability. For best performance, the input image should contain a single front-facing object and have a square aspect ratio with \u003C1024 pixel resolution. 
Check the examples under .\u002Fdata.\n# this will exports `\u003Cimage>_rgba.png`, `\u003Cimage>_depth.png`, and `\u003Cimage>_normal.png` to the directory containing the input image.\npython preprocess_image.py \u003Cimage>.png\npython preprocess_image.py \u003Cimage>.png --border_ratio 0.4 # increase border_ratio if the center object appears too large and results are unsatisfying.\n\n## zero123 train\n# pass in the processed \u003Cimage>_rgba.png by --image and do NOT pass in --text to enable zero-1-to-3 backend.\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image --iters 5000\n\n# if the image is not exactly front-view (elevation = 0), adjust default_polar (we use polar from 0 to 180 to represent elevation from 90 to -90)\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image --iters 5000 --default_polar 80\n\n# by default we leverage monocular depth estimation to aid image-to-3d, but if you find the depth estimation inaccurate and harms results, turn it off by:\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image --iters 5000 --lambda_depth 0\n\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image_dmtet --dmtet --init_with trial_image\u002Fcheckpoints\u002Fdf.pth\n\n## zero123 with multiple images\npython main.py -O --image_config config\u002F\u003Cconfig>.csv --workspace trial_image --iters 5000\n\n## render \u003Cnum> images per batch (default 1)\npython main.py -O --image_config config\u002F\u003Cconfig>.csv --workspace trial_image --iters 5000 --batch_size 4\n\n# providing both --text and --image enables stable-diffusion backend (similar to make-it-3d)\npython main.py -O --image hamburger_rgba.png --text \"a DSLR photo of a delicious hamburger\" --workspace trial_image_text --iters 5000\n\npython main.py -O --image hamburger_rgba.png --text \"a DSLR photo of a delicious hamburger\" --workspace trial_image_text_dmtet --dmtet --init_with 
trial_image_text\u002Fcheckpoints\u002Fdf.pth\n\n## test \u002F visualize\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image_dmtet --dmtet --test --save_mesh\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image_dmtet --dmtet --test --gui\n\n### Debugging\n\n# Guidance images can be saved for debugging purposes. These get saved in trial_hamburger\u002Fguidance.\n# Warning: this slows down training considerably and consumes lots of disk space!\npython main.py --text \"a hamburger\" --workspace trial_hamburger -O --vram_O --save_guidance --save_guidance_interval 5 # save every 5 steps\n```\n\nFor example commands, check [`scripts`](.\u002Fscripts).\n\nFor advanced tips and other development notes, check [Advanced Tips](.\u002Fassets\u002Fadvanced.md).\n\n# Evaluation\n\nReproduce the CLIP R-precision evaluation from the paper.\n\nAfter the testing step in the usage section, a validation set containing projections from different angles is generated. Test the R-precision between the prompt and the images (R=1).\n\n```bash\npython r_precision.py --text \"a snake is flying in the sky\" --workspace snake_HQ --latest ep0100 --mode depth --clip clip-ViT-B-16\n```\n\n# Acknowledgement\n\nThis work is based on an increasing list of amazing research works and open-source projects; thanks a lot to all the authors for sharing!\n\n* [DreamFusion: Text-to-3D using 2D Diffusion](https:\u002F\u002Fdreamfusion3d.github.io\u002F)\n    ```\n    @article{poole2022dreamfusion,\n        author = {Poole, Ben and Jain, Ajay and Barron, Jonathan T. 
and Mildenhall, Ben},\n        title = {DreamFusion: Text-to-3D using 2D Diffusion},\n        journal = {arXiv},\n        year = {2022},\n    }\n    ```\n\n* [Magic3D: High-Resolution Text-to-3D Content Creation](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fdir\u002Fmagic3d\u002F)\n   ```\n   @inproceedings{lin2023magic3d,\n      title={Magic3D: High-Resolution Text-to-3D Content Creation},\n      author={Lin, Chen-Hsuan and Gao, Jun and Tang, Luming and Takikawa, Towaki and Zeng, Xiaohui and Huang, Xun and Kreis, Karsten and Fidler, Sanja and Liu, Ming-Yu and Lin, Tsung-Yi},\n      booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},\n      year={2023}\n    }\n   ```\n\n* [Zero-1-to-3: Zero-shot One Image to 3D Object](https:\u002F\u002Fgithub.com\u002Fcvlab-columbia\u002Fzero123)\n    ```\n    @misc{liu2023zero1to3,\n        title={Zero-1-to-3: Zero-shot One Image to 3D Object},\n        author={Ruoshi Liu and Rundi Wu and Basile Van Hoorick and Pavel Tokmakov and Sergey Zakharov and Carl Vondrick},\n        year={2023},\n        eprint={2303.11328},\n        archivePrefix={arXiv},\n        primaryClass={cs.CV}\n    }\n    ```\n    \n* [Perp-Neg: Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond](https:\u002F\u002Fperp-neg.github.io\u002F)\n    ```\n    @article{armandpour2023re,\n      title={Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond},\n      author={Armandpour, Mohammadreza and Zheng, Huangjie and Sadeghian, Ali and Sadeghian, Amir and Zhou, Mingyuan},\n      journal={arXiv preprint arXiv:2304.04968},\n      year={2023}\n    }\n    ```\n    \n* [RealFusion: 360° Reconstruction of Any Object from a Single Image](https:\u002F\u002Fgithub.com\u002Flukemelas\u002Frealfusion)\n    ```\n    @inproceedings{melaskyriazi2023realfusion,\n        author = {Melas-Kyriazi, Luke and Rupprecht, Christian and Laina, 
Iro and Vedaldi, Andrea},\n        title = {RealFusion: 360 Reconstruction of Any Object from a Single Image},\n        booktitle={CVPR},\n        year = {2023},\n        url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.10663},\n    }\n    ```\n\n* [Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation](https:\u002F\u002Ffantasia3d.github.io\u002F)\n    ```\n    @article{chen2023fantasia3d,\n        title={Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation},\n        author={Rui Chen and Yongwei Chen and Ningxin Jiao and Kui Jia},\n        journal={arXiv preprint arXiv:2303.13873},\n        year={2023}\n    }\n    ```\n\n* [Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior](https:\u002F\u002Fmake-it-3d.github.io\u002F)\n    ```\n    @article{tang2023make,\n        title={Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior},\n        author={Tang, Junshu and Wang, Tengfei and Zhang, Bo and Zhang, Ting and Yi, Ran and Ma, Lizhuang and Chen, Dong},\n        journal={arXiv preprint arXiv:2303.14184},\n        year={2023}\n    }\n    ```\n\n* [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) and the [diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers) library.\n\n    ```\n    @misc{rombach2021highresolution,\n        title={High-Resolution Image Synthesis with Latent Diffusion Models},\n        author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},\n        year={2021},\n        eprint={2112.10752},\n        archivePrefix={arXiv},\n        primaryClass={cs.CV}\n    }\n\n    @misc{von-platen-etal-2022-diffusers,\n        author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},\n        title = {Diffusers: State-of-the-art diffusion 
models},\n        year = {2022},\n        publisher = {GitHub},\n        journal = {GitHub repository},\n        howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers}}\n    }\n    ```\n\n* The GUI is developed with [DearPyGui](https:\u002F\u002Fgithub.com\u002Fhoffstadt\u002FDearPyGui).\n\n* Puppy image from : https:\u002F\u002Fwww.pexels.com\u002Fphoto\u002Fhigh-angle-photo-of-a-corgi-looking-upwards-2664417\u002F\n\n* Anya images from : https:\u002F\u002Fwww.goodsmile.info\u002Fen\u002Fproduct\u002F13301\u002FPOP+UP+PARADE+Anya+Forger.html\n\n# Citation\n\nIf you find this work useful, a citation will be appreciated via:\n```\n@misc{stable-dreamfusion,\n    Author = {Jiaxiang Tang},\n    Year = {2022},\n    Note = {https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion},\n    Title = {Stable-dreamfusion: Text-to-3D with Stable-diffusion}\n}\n```\n","# Stable-Dreamfusion\n\n一个基于 PyTorch 的文本到 3D 模型 **Dreamfusion** 的实现，由 [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) 文本到 2D 模型驱动。\n\n**广告：请查看 [threestudio](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio)，它在 3D 内容生成方面有最新的改进和更好的实现！**\n\n**新闻 (2023.6.12)**：\n\n* 支持 [Perp-Neg](https:\u002F\u002Fperp-neg.github.io\u002F) 来缓解文本到 3D 中的多头问题。\n* 同时支持 [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) 和 [DeepFloyd-IF](https:\u002F\u002Fgithub.com\u002Fdeep-floyd\u002FIF) 的 Perp-Neg。\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F25863658\u002F236712982-9f93bd32-83bf-423a-bb7c-f73df7ece2e3.mp4\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F25863658\u002F232403162-51b69000-a242-4b8c-9cd9-4242b09863fa.mp4\n\n### [更新日志](assets\u002Fupdate_logs.md)\n\n### Colab 笔记本：\n* Instant-NGP 骨干（`-O`）：[![Instant-NGP 
Backbone](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1MXT3yfOFvO0ooKEfiUUvTKwUkrrlCHpF?usp=sharing)\n\n* 范式 NeRF 骨干（`-O2`）：[![Vanilla Backbone](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1mvfxG-S_n_gZafWoattku7rLJ2kPoImL?usp=sharing)\n\n# 重要提示\n该项目目前仍处于 **开发中**，与论文相比存在许多差异。**当前的生成质量还无法达到原论文的结果，许多提示仍然会严重失败！**\n\n## 与论文的主要区别\n* 由于 Imagen 模型未公开，我们使用 [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) 替代它（实现来自 [diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers)）。与 Imagen 不同，Stable-Diffusion 是一种潜在扩散模型，它是在潜在空间而非原始图像空间中进行扩散的。因此，我们需要将损失也反向传播到 VAE 的编码器部分，这会增加训练时间成本。\n* 我们使用 [多分辨率网格编码器](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Finstant-ngp\u002F) 来实现 NeRF 骨干（实现来自 [torch-ngp](https:\u002F\u002Fgithub.com\u002Fashawkey\u002Ftorch-ngp)），这使得渲染速度大幅提升（800x800 分辨率下约 10 FPS）。\n* 我们默认使用 [Adan](https:\u002F\u002Fgithub.com\u002Fsail-sg\u002FAdan) 优化器。\n\n# 安装\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion.git\ncd stable-dreamfusion\n```\n\n### 可选：创建 Python 虚拟环境\n\n为避免 Python 包冲突，建议使用虚拟环境，例如使用 conda 或 venv：\n\n```bash\npython -m venv venv_stable-dreamfusion\nsource venv_stable-dreamfusion\u002Fbin\u002Factivate # 每次打开新终端都需要重复此步骤\n```\n\n### 使用 pip 安装\n\n```bash\npip install -r requirements.txt\n```\n\n### 下载预训练模型\n\n要使用图像条件下的 3D 生成，需要手动下载一些预训练检查点：\n* [Zero-1-to-3](https:\u002F\u002Fgithub.com\u002Fcvlab-columbia\u002Fzero123) 作为扩散后端。\n    默认使用 `zero123-xl.ckpt`，并在 `guidance\u002Fzero123_utils.py` 中硬编码。\n    ```bash\n    cd pretrained\u002Fzero123\n    wget https:\u002F\u002Fzero123.cs.columbia.edu\u002Fassets\u002Fzero123-xl.ckpt\n    ```\n* [Omnidata](https:\u002F\u002Fgithub.com\u002FEPFL-VILAB\u002Fomnidata\u002Ftree\u002Fmain\u002Fomnidata_tools\u002Ftorch) 用于深度和法线预测。\n    这些检查点在 
`preprocess_image.py` 中硬编码。\n    ```bash\n    mkdir pretrained\u002Fomnidata\n    cd pretrained\u002Fomnidata\n    # 假设已安装 gdown\n    gdown '1Jrh-bRnJEjyMCS7f-WsaFlccfPjJPPHI&confirm=t' # omnidata_dpt_depth_v2.ckpt\n    gdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR&confirm=t' # omnidata_dpt_normal_v2.ckpt\n    ```\n\n要使用 [DeepFloyd-IF](https:\u002F\u002Fgithub.com\u002Fdeep-floyd\u002FIF)，需要接受来自 [hugging face](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-I-XL-v1.0) 的使用条款，并通过命令行使用 `huggingface-cli login` 登录。\n\n对于 DMTet，我们将预先生成的 `32\u002F64\u002F128` 分辨率四面体网格移植到 `tets` 目录下。256 分辨率的网格可以在 [这里](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1lgvEKNdsbW5RS4gVxJbgBS4Ac92moGSa\u002Fview?usp=sharing) 找到。\n\n### 构建扩展（可选）\n默认情况下，我们使用 [`load`](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fcpp_extension.html#torch.utils.cpp_extension.load) 在运行时构建扩展。\n我们也提供了 `setup.py` 来构建每个扩展：\n```bash\ncd stable-dreamfusion\n\n# 安装所有扩展模块\nbash scripts\u002Finstall_ext.sh\n\n# 如果你想手动安装，这里有一个例子：\npip install .\u002Fraymarching # 安装到 Python 路径（你仍然需要 raymarching\u002F 文件夹，因为这只是安装了编译好的扩展。)\n```\n\n### Taichi 后端（可选）\n使用 [Taichi](https:\u002F\u002Fgithub.com\u002Ftaichi-dev\u002Ftaichi) 后端来替代 Instant-NGP。它可以在无需 CUDA 的情况下实现与 CUDA 实现相当的性能。使用 pip 安装 Taichi：\n```bash\npip install -i https:\u002F\u002Fpypi.taichi.graphics\u002Fsimple\u002F taichi-nightly\n```\n\n### 故障排除：\n* 我们假设所有依赖项都使用最新版本，如果遇到特定依赖项的问题，请先尝试升级它（例如 `pip install -U diffusers`）。如果问题仍然存在，请提交 [bug 报告](https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion\u002Fissues\u002Fnew?assignees=&labels=bug&template=bug_report.yaml&title=%3Ctitle%3E)，我们将不胜感激！\n* `[F glutil.cpp:338] eglInitialize() failed Aborted (core dumped)`：这通常表示 OpenGL 安装存在问题。尝试重新安装 Nvidia 驱动程序，或者如 https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion\u002Fissues\u002F131 所示，如果您使用的是无头服务器，可以尝试使用 nvidia-docker。\n* `TypeError: xxx_forward(): incompatible function arguments`：当您更新 CUDA 源代码并之前使用 `setup.py` 
安装过扩展时，可能会出现这种情况。尝试重新安装相应的扩展（例如 `pip install .\u002Fgridencoder`）。\n\n### 测试环境\n* Ubuntu 22，配备 torch 1.12 和 CUDA 11.6 的 V100 显卡。\n\n# 使用方法\n\n首次运行时，编译 CUDA 扩展可能需要一些时间。\n\n```bash\n#### stable-dreamfusion 设置\n\n### Instant-NGP NeRF 骨干\n# + 渲染速度更快\n# + GPU 内存占用更少（约 16G）\n# - 需要编译 CUDA 扩展（也有无需 CUDA 的 Taichi 后端）\n\n## 使用文本提示训练（使用默认设置）\n# `-O` 等价于 `--cuda_ray --fp16`\n# `--cuda_ray` 启用类似 Instant-NGP 的占用网格加速。\npython main.py --text \"a hamburger\" --workspace trial -O\n\n# 使用 `--vram_O` 减少 Stable-Diffusion 的内存使用\n# 启用各种显存节省功能（https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Foptimization\u002Ffp16）。\npython main.py --text \"a hamburger\" --workspace trial -O --vram_O\n\n# 你可以将参数收集到一个文件中。通过在 `--file` 后指定参数，可以覆盖文件中的参数。请注意，带引号的字符串无法从 .args 文件中加载……\npython main.py --file scripts\u002Fres64.args --workspace trial_awesome_hamburger --text \"a photo of an awesome hamburger\"\n\n# 使用无 CUDA 的 Taichi 后端，只需添加 `--backbone grid_taichi`\npython3 main.py --text \"a hamburger\" --workspace trial -O --backbone grid_taichi\n\n# 选择 Stable Diffusion 版本（支持 1.5、2.0 和 2.1，默认为 2.1）\npython main.py --text \"a hamburger\" --workspace trial -O --sd_version 1.5\n\n# 使用来自 Hugging Face 的自定义 Stable Diffusion 检查点：\npython main.py --text \"a hamburger\" --workspace trial -O --hf_key andite\u002Fanything-v4.0\n\n# 使用 DeepFloyd-IF 进行引导（实验性）：\npython main.py --text \"a hamburger\" --workspace trial -O --IF\npython main.py --text \"a hamburger\" --workspace trial -O --IF --vram_O # 需要约 24G 显存\n\n# 我们现在也支持负面文本提示：\npython main.py --text \"a rose\" --negative \"red\" --workspace trial -O\n\n## 训练完成后：\n# 测试（导出 360 度视频）\npython main.py --workspace trial -O --test\n# 同时保存网格模型（包含 obj、mtl 和 png 纹理）\npython main.py --workspace trial -O --test --save_mesh\n# 使用 GUI 进行测试（可自由控制视角！）\npython main.py --workspace trial -O --test --gui\n\n### Vanilla NeRF 后台\n# + 纯 PyTorch 实现，无需编译扩展！\n# - 渲染速度较慢\n# - 需要更多显存\n\n## 训练\n# `-O2` 等同于 `--backbone vanilla`\npython main.py --text \"a hotdog\" --workspace trial2 
-O2\n\n# 如果 CUDA 内存不足，可以尝试减少 NeRF 采样步骤（`--num_steps` 和 `--upsample_steps`）\npython main.py --text \"a hotdog\" --workspace trial2 -O2 --num_steps 64 --upsample_steps 0\n\n## 测试\npython main.py --workspace trial2 -O2 --test\npython main.py --workspace trial2 -O2 --test --save_mesh\npython main.py --workspace trial2 -O2 --test --gui # 不推荐，帧率会很低。\n\n### DMTet 微调\n\n## 使用 `--dmtet` 和 `--init_with \u003Cnerf checkpoint>` 来以更高分辨率微调网格\npython main.py -O --text \"a hamburger\" --workspace trial_dmtet --dmtet --iters 5000 --init_with trial\u002Fcheckpoints\u002Fdf.pth\n\n## 使用网格初始化 DMTet 以生成纹理\n# 需要安装 cubvh：pip install git+https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fcubvh\n# 移除 `--lock_geo` 可以同时微调几何形状，但性能可能较差。\npython main.py -O --text \"a white bunny with red eyes\" --workspace trial_dmtet_mesh --dmtet --iters 5000 --init_with .\u002Fdata\u002Fbunny.obj --lock_geo\n\n## 测试与导出网格\npython main.py -O --text \"a hamburger\" --workspace trial_dmtet --dmtet --iters 5000 --test --save_mesh\n\n## 可视化 DMTet 的 GUI\npython main.py -O --text \"a hamburger\" --workspace trial_dmtet --dmtet --iters 5000 --test --gui\n\n### 基于图像的 3D 生成\n\n## 预处理输入图像\n# 注意：图像转 3D 的效果取决于 zero-1-to-3 的能力。为了获得最佳效果，输入图像应包含单一正面物体，长宽比为正方形，分辨率为 1024 像素以下。请查看 .\u002Fdata 下的示例。\n# 这将输出 `\u003Cimage>_rgba.png`、`\u003Cimage>_depth.png` 和 `\u003Cimage>_normal.png` 到输入图像所在的目录。\npython preprocess_image.py \u003Cimage>.png\npython preprocess_image.py \u003Cimage>.png --border_ratio 0.4 # 如果中心物体显得过大且结果不理想，可增加边框比例。\n\n## zero123 训练\n# 通过 `--image` 传入处理后的 `\u003Cimage>_rgba.png`，不要传入 `--text`，以启用 zero-1-to-3 后端。\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image --iters 5000\n\n# 如果图像并非完全正面（仰角 ≠ 0），需调整 default_polar（我们用 0 到 180 表示 90 到 -90 的仰角）\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image --iters 5000 --default_polar 80\n\n# 默认情况下，我们会利用单目深度估计来辅助图像转 3D，但如果发现深度估计不准确并影响结果，可以通过以下方式关闭：\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image --iters 5000 
--lambda_depth 0\n\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image_dmtet --dmtet --init_with trial_image\u002Fcheckpoints\u002Fdf.pth\n\n## 多张图像的 zero123\npython main.py -O --image_config config\u002F\u003Cconfig>.csv --workspace trial_image --iters 5000\n\n## 每批渲染 \u003Cnum> 张图像（默认 1 张）\npython main.py -O --image_config config\u002F\u003Cconfig>.csv --workspace trial_image --iters 5000 --batch_size 4\n\n# 同时提供 `--text` 和 `--image` 将启用 Stable Diffusion 后端（类似于 make-it-3d）\npython main.py -O --image hamburger_rgba.png --text \"一张美味汉堡的 DSLR 照片\" --workspace trial_image_text --iters 5000\n\npython main.py -O --image hamburger_rgba.png --text \"一张美味汉堡的 DSLR 照片\" --workspace trial_image_text_dmtet --dmtet --init_with trial_image_text\u002Fcheckpoints\u002Fdf.pth\n\n## 测试 \u002F 可视化\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image_dmtet --dmtet --test --save_mesh\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image_dmtet --dmtet --test --gui\n\n### 调试\n\n# 可以保存引导图像用于调试目的。这些图像会保存在 trial_hamburger\u002Fguidance 目录下。\n# 注意：这会显著减慢训练速度，并占用大量磁盘空间！\npython main.py --text \"a hamburger\" --workspace trial_hamburger -O --vram_O --save_guidance --save_guidance_interval 5 # 每 5 步保存一次\n```\n\n有关示例命令，请查看 [`scripts`](.\u002Fscripts)。\n\n有关高级技巧和其他开发内容，请查看 [Advanced Tips](.\u002Fassets\u002Fadvanced.md)。\n\n# 评估\n\n重现论文中的 CLIP R 精度评估\n\n在使用说明中的测试部分完成后，会生成包含不同角度投影的验证集。测试提示词与图像之间的 R 精度。（R=1）\n\n```bash\npython r_precision.py --text \"a snake is flying in the sky\" --workspace snake_HQ --latest ep0100 --mode depth --clip clip-ViT-B-16\n```\n\n# 致谢\n\n本工作基于日益增长的一系列卓越的研究成果和开源项目，衷心感谢所有作者的分享！\n\n* [DreamFusion：使用2D扩散模型实现文本到3D生成](https:\u002F\u002Fdreamfusion3d.github.io\u002F)\n    ```\n    @article{poole2022dreamfusion,\n        author = {Poole, Ben and Jain, Ajay and Barron, Jonathan T. 
and Mildenhall, Ben},\n        title = {DreamFusion: Text-to-3D using 2D Diffusion},\n        journal = {arXiv},\n        year = {2022},\n    }\n    ```\n\n* [Magic3D：高分辨率文本到3D内容生成](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fdir\u002Fmagic3d\u002F)\n   ```\n   @inproceedings{lin2023magic3d,\n      title={Magic3D: High-Resolution Text-to-3D Content Creation},\n      author={Lin, Chen-Hsuan and Gao, Jun and Tang, Luming and Takikawa, Towaki and Zeng, Xiaohui and Huang, Xun and Kreis, Karsten and Fidler, Sanja and Liu, Ming-Yu and Lin, Tsung-Yi},\n      booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},\n      year={2023}\n    }\n   ```\n\n* [Zero-1-to-3：零样本单张图像到3D物体生成](https:\u002F\u002Fgithub.com\u002Fcvlab-columbia\u002Fzero123)\n    ```\n    @misc{liu2023zero1to3,\n        title={Zero-1-to-3: Zero-shot One Image to 3D Object},\n        author={Ruoshi Liu and Rundi Wu and Basile Van Hoorick and Pavel Tokmakov and Sergey Zakharov and Carl Vondrick},\n        year={2023},\n        eprint={2303.11328},\n        archivePrefix={arXiv},\n        primaryClass={cs.CV}\n    }\n    ```\n    \n* [Perp-Neg：重新构想负向提示算法——将2D扩散转化为3D，缓解Janus问题并进一步拓展](https:\u002F\u002Fperp-neg.github.io\u002F)\n    ```\n    @article{armandpour2023re,\n      title={Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond},\n      author={Armandpour, Mohammadreza and Zheng, Huangjie and Sadeghian, Ali and Sadeghian, Amir and Zhou, Mingyuan},\n      journal={arXiv preprint arXiv:2304.04968},\n      year={2023}\n    }\n    ```\n    \n* [RealFusion：从单张图像重建任意物体的360°视图](https:\u002F\u002Fgithub.com\u002Flukemelas\u002Frealfusion)\n    ```\n    @inproceedings{melaskyriazi2023realfusion,\n        author = {Melas-Kyriazi, Luke and Rupprecht, Christian and Laina, Iro and Vedaldi, Andrea},\n        title = {RealFusion: 360 Reconstruction of Any Object from a Single Image},\n        booktitle={CVPR},\n        year = 
{2023},\n        url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.10663},\n    }\n    ```\n\n* [Fantasia3D：解耦几何与外观以实现高质量文本到3D内容生成](https:\u002F\u002Ffantasia3d.github.io\u002F)\n    ```\n    @article{chen2023fantasia3d,\n        title={Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation},\n        author={Rui Chen and Yongwei Chen and Ningxin Jiao and Kui Jia},\n        journal={arXiv preprint arXiv:2303.13873},\n        year={2023}\n    }\n    ```\n\n* [Make-It-3D：基于扩散先验的单张图像高保真3D生成](https:\u002F\u002Fmake-it-3d.github.io\u002F)\n    ```\n    @article{tang2023make,\n        title={Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior},\n        author={Tang, Junshu and Wang, Tengfei and Zhang, Bo and Zhang, Ting and Yi, Ran and Ma, Lizhuang and Chen, Dong},\n        journal={arXiv preprint arXiv:2303.14184},\n        year={2023}\n    }\n    ```\n\n* [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) 和 [diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers) 库。\n\n    ```\n    @misc{rombach2021highresolution,\n        title={High-Resolution Image Synthesis with Latent Diffusion Models},\n        author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},\n        year={2021},\n        eprint={2112.10752},\n        archivePrefix={arXiv},\n        primaryClass={cs.CV}\n    }\n\n    @misc{von-platen-etal-2022-diffusers,\n        author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},\n        title = {Diffusers: State-of-the-art diffusion models},\n        year = {2022},\n        publisher = {GitHub},\n        journal = {GitHub repository},\n        howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers}}\n    }\n    ```\n\n* GUI界面采用 
[DearPyGui](https:\u002F\u002Fgithub.com\u002Fhoffstadt\u002FDearPyGui) 开发。\n\n* 小狗图片来自：https:\u002F\u002Fwww.pexels.com\u002Fphoto\u002Fhigh-angle-photo-of-a-corgi-looking-upwards-2664417\u002F\n\n* 安雅图片来自：https:\u002F\u002Fwww.goodsmile.info\u002Fen\u002Fproduct\u002F13301\u002FPOP+UP+PARADE+Anya+Forger.html\n\n# 引用\n\n如果您觉得本工作有所帮助，请通过以下方式引用：\n```\n@misc{stable-dreamfusion,\n    Author = {Jiaxiang Tang},\n    Year = {2022},\n    Note = {https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion},\n    Title = {Stable-dreamfusion: Text-to-3D with Stable-diffusion}\n}\n```","# Stable-Dreamfusion 快速上手指南\n\nStable-Dreamfusion 是一个基于 PyTorch 的文本生成 3D 模型实现，利用 Stable Diffusion 作为先验知识。本指南将帮助你快速在本地部署并运行该工具。\n\n## 环境准备\n\n在开始之前，请确保你的系统满足以下要求：\n\n*   **操作系统**: 推荐 Ubuntu 22.04 (其他 Linux 发行版或 Windows WSL2 也可尝试)。\n*   **GPU**: 支持 CUDA 的 NVIDIA 显卡 (推荐显存 16GB 以上，使用 `--vram_O` 参数可适当降低要求)。\n*   **CUDA**: 推荐 CUDA 11.6 或更高版本。\n*   **Python**: Python 3.8+。\n*   **依赖**: 需要安装 `git`, `wget`, `gdown` (用于下载部分预训练模型)。\n\n> **注意**: 该项目处于开发阶段，生成质量可能与原论文有差异，且部分提示词可能效果不佳。\n\n## 安装步骤\n\n### 1. 克隆项目\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion.git\ncd stable-dreamfusion\n```\n\n### 2. 创建虚拟环境 (推荐)\n为了避免依赖冲突，建议使用 `conda` 或 `venv` 创建独立环境。\n```bash\npython -m venv venv_stable-dreamfusion\nsource venv_stable-dreamfusion\u002Fbin\u002Factivate\n```\n\n### 3. 安装 Python 依赖\n```bash\npip install -r requirements.txt\n```\n*国内用户若下载缓慢，可添加清华源加速：*\n```bash\npip install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 4. 
下载预训练模型 (可选但推荐)\n若需使用**图像生成 3D**功能，需手动下载以下模型：\n\n**Zero-1-to-3 (扩散后端):**\n```bash\nmkdir -p pretrained\u002Fzero123\ncd pretrained\u002Fzero123\nwget https:\u002F\u002Fzero123.cs.columbia.edu\u002Fassets\u002Fzero123-xl.ckpt\ncd ..\u002F..\n```\n\n**Omnidata (深度与法线预测):**\n需先安装 `gdown`: `pip install gdown`\n```bash\nmkdir -p pretrained\u002Fomnidata\ncd pretrained\u002Fomnidata\ngdown '1Jrh-bRnJEjyMCS7f-WsaFlccfPjJPPHI&confirm=t' # omnidata_dpt_depth_v2.ckpt\ngdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR&confirm=t' # omnidata_dpt_normal_v2.ckpt\ncd ..\u002F..\n```\n\n*注：若使用 DeepFloyd-IF 模型，需先在 Hugging Face 接受协议并执行 `huggingface-cli login` 登录。*\n\n### 5. 构建扩展模块 (可选)\n默认情况下，程序会在运行时自动编译 CUDA 扩展。若需预先安装以提升稳定性：\n```bash\nbash scripts\u002Finstall_ext.sh\n```\n\n*若无 CUDA 环境，可使用 Taichi 后端（性能相当但无需编译 CUDA）：*\n```bash\npip install -i https:\u002F\u002Fpypi.taichi.graphics\u002Fsimple\u002F taichi-nightly\n```\n\n## 基本使用\n\n首次运行时，系统会自动编译 CUDA 扩展，可能需要几分钟时间。\n\n### 场景一：文本生成 3D (Text-to-3D)\n\n使用默认的 Instant-NGP 骨干网络进行训练（速度快，显存占用较低）：\n\n```bash\n# 基础用法：生成一个汉堡包\npython main.py --text \"a hamburger\" --workspace trial -O\n\n# 显存优化模式：如果显存不足，添加 --vram_O 参数\npython main.py --text \"a hamburger\" --workspace trial -O --vram_O\n\n# 使用负向提示词\npython main.py --text \"a rose\" --negative \"red\" --workspace trial -O\n```\n\n**参数说明：**\n*   `-O`: 启用 Instant-NGP 加速 (`--cuda_ray --fp16`)。\n*   `--workspace`: 指定输出目录。\n*   `--sd_version`: 指定 Stable Diffusion 版本 (1.5, 2.0, 2.1)，默认为 2.1。\n\n### 场景二：测试与导出\n\n训练完成后，使用以下命令进行测试、视频导出或网格保存：\n\n```bash\n# 导出 360 度旋转视频\npython main.py --workspace trial -O --test\n\n# 保存 3D 网格文件 (.obj, .mtl, 纹理图)\npython main.py --workspace trial -O --test --save_mesh\n\n# 启动 GUI 交互式查看器 (支持自由视角控制)\npython main.py --workspace trial -O --test --gui\n```\n\n### 场景三：图像生成 3D (Image-to-3D)\n\n若需基于单张图像生成 3D 模型，首先预处理图像：\n\n```bash\n# 预处理：生成带透明通道、深度图和法线图的图像\npython preprocess_image.py \u003Cinput_image>.png\n```\n\n然后使用处理后的图像进行训练（**不要**传入 `--text` 参数以启用 Zero-1-to-3 后端）：\n\n```bash\npython 
main.py -O --image \u003Cinput_image>_rgba.png --workspace trial_image --iters 5000\n```\n\n训练结束后同样使用 `--test --save_mesh` 导出结果。","某独立游戏开发者需要为即将上线的奇幻题材项目快速制作一批风格统一的 3D 道具资产，但团队中缺乏专业的 3D 建模师。\n\n### 没有 stable-dreamfusion 时\n- **人力成本高昂**：必须外包或招聘专职建模师，单个低多边形道具的制作周期长达数天，严重拖慢开发进度。\n- **创意验证困难**：策划脑海中“发光的水晶骷髅”等抽象概念难以通过文字直接转化为可视模型，反复沟通修改效率极低。\n- **技术门槛限制**：团队成员仅熟悉 2D 绘图或代码，面对 Blender、Maya 等专业软件复杂的拓扑和布线规则无从下手。\n- **资产风格割裂**：外包制作的模型往往与游戏整体的美术风格存在细微偏差，后期调整材质和形状耗时费力。\n\n### 使用 stable-dreamfusion 后\n- **生成效率飞跃**：开发者只需输入“低多边形风格的发光水晶骷髅”等提示词，stable-dreamfusion 即可在数十分钟内自动生成带纹理的 3D 网格模型。\n- **创意即时落地**：利用其 Text-to-3D 能力，策划人员可直接将文字描述转化为初步模型，瞬间验证设计想法的可行性。\n- **流程大幅简化**：基于 NeRF 和扩散模型的技术栈屏蔽了底层几何构建细节，无需手动拓扑，直接导出 OBJ\u002FPLY 格式即可导入游戏引擎。\n- **风格高度可控**：结合 Stable Diffusion 的强大泛化能力，通过微调提示词即可确保生成的道具在光影和质感上与游戏世界观完美融合。\n\nstable-dreamfusion 通过将自然语言直接转化为高质量 3D 资产，彻底打破了传统建模的技术壁垒，让小型团队也能实现“所想即所得”的快速原型开发。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fashawkey_stable-dreamfusion_0b72f3c3.png","ashawkey","kiui","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fashawkey_24d0cf2b.jpg",null,"NVIDIA","Beijing, China","ashawkey1999@gmail.com","https:\u002F\u002Fme.kiui.moe","https:\u002F\u002Fgithub.com\u002Fashawkey",[84,88,92,96,100,104],{"name":85,"color":86,"percentage":87},"Python","#3572A5",85.9,{"name":89,"color":90,"percentage":91},"Cuda","#3A4E3A",10.4,{"name":93,"color":94,"percentage":95},"Shell","#89e051",2.8,{"name":97,"color":98,"percentage":99},"C","#555555",0.5,{"name":101,"color":102,"percentage":103},"C++","#f34b7d",0.2,{"name":105,"color":106,"percentage":107},"Dockerfile","#384d54",0.1,8819,774,"2026-04-14T03:21:51","Apache-2.0",4,"Linux","必需 NVIDIA GPU。测试环境为 V100 (CUDA 11.6)。Instant-NGP 后端需约 16GB 显存；使用 DeepFloyd-IF 后端需约 24GB 显存。支持 CUDA 11.6+，也可选无 CUDA 的 Taichi 后端。","未说明",{"notes":117,"python":118,"dependencies":119},"1. 项目主要在 Ubuntu 22 上测试通过，Windows\u002FmacOS 未明确提及且可能因 OpenGL 或 CUDA 扩展编译问题导致运行失败。2. 首次运行需编译 CUDA 扩展（或使用 Taichi 后端避免编译）。3. 
需手动下载预训练模型（Zero-1-to-3, Omnidata 等）。4. 若使用 DeepFloyd-IF 需登录 Hugging Face。5. 遇到 OpenGL 错误需重装 Nvidia 驱动或使用 nvidia-docker。","未说明 (建议使用虚拟环境)",[120,121,122,123,124,125,126,127],"torch>=1.12","diffusers","transformers","ninja","trimesh","dearpygui","Adan (optimizer)","taichi (可选)",[15,129],"其他",[131,132,133,134,135,136],"text-to-3d","gui","nerf","stable-diffusion","dreamfusion","image-to-3d","2026-03-27T02:49:30.150509","2026-04-16T10:47:43.803211",[140,145,150,155,160,164],{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},35826,"使用 --cuda_ray 标志生成 3D 对象时速度变慢并出现 NaN 错误，如何解决？","这通常是由过旧的 NVIDIA 显卡驱动引起的。请将 NVIDIA 驱动程序更新到最新版本（例如 525.x.x 或更高）。更新后，NaN 问题通常会消失，且 --cuda_ray 模式下的训练速度会显著提升（例如从 2.8 it\u002Fs 提升至 3.5 it\u002Fs），甚至优于纯 PyTorch 模式。","https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion\u002Fissues\u002F144",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},35827,"在 Docker 或特定环境中构建 gridencoder 扩展失败（报错 subprocess-exited-with-error），怎么办？","这是一个编译标准版本不匹配的问题。请打开 `gridencoder` 文件夹下的 `setup.py` 文件，将所有的 `c++14` 替换为 `c++17`。具体修改如下：\n1. 找到 `nvcc_flags` 列表，将 `-std=c++14` 改为 `-std=c++17`。\n2. 在 `if os.name == \"posix\":` 分支下，将 `c_flags` 中的 `-std=c++14` 改为 `-std=c++17`。\n3. 
在 `elif os.name == \"nt\":` (Windows) 分支下，确保使用 `\u002Fstd:c++17`。\n修改后重新运行安装命令即可。","https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion\u002Fissues\u002F81",{"id":151,"question_zh":152,"answer_zh":153,"source_url":154},35828,"如何在 DreamFusion 中使用 DALL-E 2 或 Karlo 模型进行引导生成？","项目已支持 Karlo 模型。使用时需添加命令行参数 `--guidance karlo`。不过根据社区反馈，目前 Karlo 生成的图像连贯性可能不如 Stable Diffusion，效果甚至可能差于 GLIDE，建议优先使用 Stable Diffusion 以获得更稳定的结果。","https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion\u002Fissues\u002F107",{"id":156,"question_zh":157,"answer_zh":158,"source_url":159},35829,"Image-to-3D 功能中 Zero123 似乎只使用了初始视图，没有生成新视角，这是预期行为吗？","这不是预期的最终状态，通常是因为模型未正确加载或配置问题。Zero123 模型旨在生成新视角以优化 3D 重建。如果遇到此问题，请检查是否已正确下载并放置了 Zero123 的检查点文件（ckpt），并确认环境支持该模型的精度要求（注意有关 fp16 加载的注释）。如果点击生成后长时间无反应，可能是模型加载失败导致的静默错误。","https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion\u002Fissues\u002F232",{"id":161,"question_zh":162,"answer_zh":163,"source_url":149},35830,"在 Windows 上安装时遇到环境变量设置困难或找不到 '_gridencoder' 模块的错误，如何处理？","在 Windows 上设置环境变量与 Linux 不同，需通过系统属性界面或专门的脚本进行设置。如果遇到 \"_gridencoder was not found\" 错误，通常意味着扩展模块编译成功但未正确链接或路径未包含。首先确保已按照上述方法修改 `setup.py` 为 C++17 标准并重新编译。如果问题依旧，请检查 Python 是否能找到编译生成的 `.pyd` 或 `.dll` 文件，并确保运行命令时的当前目录包含该模块。",{"id":165,"question_zh":166,"answer_zh":167,"source_url":144},35831,"为什么开启 --cuda_ray 后训练速度反而比纯 PyTorch 模式慢？","如果在更新显卡驱动前出现这种情况，通常是因为旧版驱动（如 510.x）对 CUDA 射线追踪优化支持不佳，导致计算回退或产生 NaN 从而拖慢速度。解决方案是将 NVIDIA 驱动更新至最新版（525.x+）。更新后，--cuda_ray 模式通常能发挥硬件加速优势，速度会快于或至少持平于纯 PyTorch 模式。",[169,174],{"id":170,"version":171,"summary_zh":172,"released_at":173},281025,"0.2.0","* 基于 [zero 1-to-3](https:\u002F\u002Fgithub.com\u002Fcvlab-columbia\u002Fzero123) 后端的实验性图像转3D支持。\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F25863658\u002F232403294-b77409bf-ddc7-4bb8-af32-ee0cc123825a.mp4","2023-04-17T06:38:11",{"id":175,"version":176,"summary_zh":177,"released_at":178},281026,"0.1.0","1. 使用 DMTet 对 NeRF 生成的网格进行微调。2. 提升导出网格资源的质量。","2023-04-07T03:36:58"]