[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tool-threestudio-project--threestudio":3,"similar-threestudio-project--threestudio":111},{"id":4,"github_repo":5,"name":6,"description_en":7,"description_zh":8,"ai_summary_zh":9,"readme_en":10,"readme_zh":11,"quickstart_zh":12,"use_case_zh":13,"hero_image_url":14,"owner_login":15,"owner_name":6,"owner_avatar_url":16,"owner_bio":17,"owner_company":18,"owner_location":18,"owner_email":18,"owner_twitter":18,"owner_website":18,"owner_url":19,"languages":20,"stars":37,"forks":38,"last_commit_at":39,"license":40,"difficulty_score":41,"env_os":42,"env_gpu":43,"env_ram":44,"env_deps":45,"category_tags":59,"github_topics":18,"view_count":62,"oss_zip_url":18,"oss_zip_packed_at":18,"status":63,"created_at":64,"updated_at":65,"faqs":66,"releases":105},9919,"threestudio-project\u002Fthreestudio","threestudio","A unified framework for 3D content generation.","threestudio 是一个专为 3D 内容创作打造的统一开源框架。它巧妙地将成熟的 2D 文本生成图像模型“升维”应用，让用户能够仅通过文字描述、单张图片或少量参考图，即可高效生成高质量的 3D 资产。\n\n在 threestudio 出现之前，3D 生成领域算法众多但实现分散，研究人员复现和对比不同方法（如 DreamFusion、Magic3D、ProlificDreamer 等）往往面临极高的环境配置与代码整合门槛。threestudio 通过模块化设计解决了这一痛点，将多种前沿算法整合在同一套代码库中，极大地降低了实验难度，促进了技术社区的交流与迭代。\n\n这款工具主要面向 AI 研究人员、3D 开发者以及希望探索生成式 3D 技术的设计师。对于研究者而言，它提供了便捷的基准测试平台；对于开发者，其清晰的架构便于二次开发和功能扩展；即使是具备一定编程基础的高级用户，也能利用其提供的 Colab 笔记本和 Gradio 演示快速上手体验。\n\nthreestudio 的核心亮点在于其卓越的兼容性与扩展性。它不仅原生支持十多种主流 3D 生成算法，还推出了专门的扩展仓库（threes","threestudio 是一个专为 3D 内容创作打造的统一开源框架。它巧妙地将成熟的 2D 文本生成图像模型“升维”应用，让用户能够仅通过文字描述、单张图片或少量参考图，即可高效生成高质量的 3D 资产。\n\n在 threestudio 出现之前，3D 生成领域算法众多但实现分散，研究人员复现和对比不同方法（如 DreamFusion、Magic3D、ProlificDreamer 等）往往面临极高的环境配置与代码整合门槛。threestudio 通过模块化设计解决了这一痛点，将多种前沿算法整合在同一套代码库中，极大地降低了实验难度，促进了技术社区的交流与迭代。\n\n这款工具主要面向 AI 研究人员、3D 开发者以及希望探索生成式 3D 技术的设计师。对于研究者而言，它提供了便捷的基准测试平台；对于开发者，其清晰的架构便于二次开发和功能扩展；即使是具备一定编程基础的高级用户，也能利用其提供的 Colab 笔记本和 Gradio 演示快速上手体验。\n\nthreestudio 的核心亮点在于其卓越的兼容性与扩展性。它不仅原生支持十多种主流 3D 生成算法，还推出了专门的扩展仓库（threestudio-extensions），允许社区轻松贡献新模型。这种“统一框架 + 灵活扩展”的模式，使其成为当前 3D 生成领域不可或缺的基础设施之一。","\u003Cp align=\"center\">\n    \u003Cpicture>\n    \u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_3a9d312e765b.png\" width=\"50%\">\n    \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Cb>\nthreestudio is a unified framework for 3D content creation from text prompts, single images, and few-shot images, by lifting 2D text-to-image generation models.\n\u003C\u002Fb>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F19284678\u002Ff48eca9f-45a7-4092-a519-6bb99f4939e4.gif\" width=\"100%\">\n\u003Cbr\u002F>\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F19284678\u002F01a00207-3240-4a8e-aa6f-d48436370fe7.png\" width=\"100%\">\n\u003Cbr\u002F>\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_16245d00c4d2.png\" width=\"48%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_3b4cbfa870ee.png\" width=\"25%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_a53555dda57f.png\" width=\"24%\">\n\n\u003Cbr\u002F>\n\u003Cimg alt=\"threestudio\" 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_65c0e35c8bdc.png\" width=\"48%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_47adc23cadde.png\" width=\"25%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_2d4cfdd37656.png\" width=\"25%\">\n\n\u003Cbr\u002F>\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_3196107bca58.png\" width=\"68%\">\n\u003Cbr\u002F>\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_a8e94b4a4ff6.png\" width=\"68%\">\n\u003Cbr\u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Cb>\n👆 Results obtained from methods implemented by threestudio 👆 \u003Cbr\u002F>\n| \u003Ca href=\"https:\u002F\u002Fml.cs.tsinghua.edu.cn\u002Fprolificdreamer\u002F\">ProlificDreamer\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdreamfusion3d.github.io\u002F\">DreamFusion\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fdir\u002Fmagic3d\u002F\">Magic3D\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fpals.ttic.edu\u002Fp\u002Fscore-jacobian-chaining\">SJC\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Feladrich\u002Flatent-nerf\">Latent-NeRF\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Ffantasia3d.github.io\u002F\">Fantasia3D\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Ffabi92.github.io\u002Ftextmesh\u002F\">TextMesh\u003C\u002Fa> |\n\u003Cbr\u002F>\n| \u003Ca href=\"https:\u002F\u002Fzero123.cs.columbia.edu\u002F\">Zero-1-to-3\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fguochengqian.github.io\u002Fproject\u002Fmagic123\u002F\">Magic123\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FJunzheJosephZhu\u002FHiFA\">HiFA\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Flukoianov.com\u002Fsdi\">SDI\u003C\u002Fa> |\n\u003Cbr \u002F>\n| \u003Ca href=\"https:\u002F\u002Finstruct-nerf2nerf.github.io\u002F\">InstructNeRF2NeRF\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fcontrol4darxiv.github.io\u002F\">Control4D\u003C\u002Fa> |\n\u003C\u002Fb>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fthreestudio-project\u002Fthreestudio\u002Fblob\u002Fmain\u002Fthreestudio.ipynb\">\n  \u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fbennyguo\u002Fthreestudio\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Gradio%20Demo-Huggingface-orange\">\u003C\u002Fa>\n  \u003Ca href=\"http:\u002F\u002Ft23-g-01.threestudio.ai\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGradio%20Demo-Tencent-blue?logo=tencentqq&logoColor=white\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002Fejer2MAB8N\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-5865F2?logo=discord&logoColor=white\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    Did not find what you want? 
    Check out <a href="https://threestudio-project.github.io/threestudio-extensions/"><b>threestudio-extensions</b></a> or submit a feature request <a href="https://github.com/threestudio-project/threestudio/discussions/46">here</a>!
</p>

<p align="center">
<img alt="threestudio" src="https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_4d241a114bf8.png" width="68%">
</p>
<p align="center">
<img alt="threestudio" src="https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_416316e81d47.png" width="35%">
<img alt="threestudio" src="https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_1f6395841655.png" width="35%">
<img alt="threestudio" src="https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_512a44d6bac7.png" width="18%">
<img alt="threestudio" src="https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_a274906ee5ca.png" width="35%">
<img alt="threestudio" src="https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_b1fb6df5fd48.png" width="35%">
<img alt="threestudio" src="https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_d42ea5788b12.png" width="18%">
<img alt="threestudio" src="https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_d5885a5d7414.png" width="35%">
<img alt="threestudio" src="https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_f797cd6e3b71.png" width="35%">
<img alt="threestudio" src="https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_4def89febabb.png" width="45%">
</p>
<p align="center"><b>
| <a href="https://github.com/HeliosZhao/Animate124/tree/threestudio">Animate-124</a> | <a href="https://github.com/DSaurus/threestudio-4dfy">4D-fy</a> | <a href="https://github.com/baaivision/GeoDream/tree/threestudio">GeoDream</a> | <a href="https://github.com/DSaurus/threestudio-dreamcraft3D">DreamCraft3D</a> | <a href="https://github.com/huanngzh/threestudio-dreamwaltz">Dreamwaltz</a> | <a href="https://github.com/KU-CVLAB/3DFuse-threestudio">3DFuse</a> | <a href="https://github.com/cxh0519/Progressive3D">Progressive3D</a> | <a href="https://github.com/cxh0519/threestudio-gaussiandreamer">GaussianDreamer</a> | <a href="https://github.com/DSaurus/threestudio-3dgs">Gaussian Splatting</a> | <a href="https://github.com/DSaurus/threestudio-mvdream">MVDream</a> | <a href="https://github.com/DSaurus/threestudio-meshfitting">Mesh-Fitting</a> |
</b></p>

## News
- 08/11/2024: Thanks to [Artem Lukoianov](https://github.com/ottogin) for the implementation of [Score Distillation via Reparametrized DDIM](https://lukoianov.com/sdi)! A text-to-3D module has been added to threestudio, along with a notebook with 2D score distillation experiments.
- 21/10/2024: Thanks to [Amir Barda](https://github.com/amirbarda) for the implementation of [MagicClay](https://github.com/amirbarda/MagicClay)! Follow the instructions on its website to give it a try.
- 12/03/2024: Thanks to [Matthew Kwak](https://github.com/mskwak01) and [Inès Hyeonsu Kim](https://github.com/Ines-Hyeonsu-Kim) for the implementation of [3DFuse](https://github.com/KU-CVLAB/3DFuse-threestudio)! Follow the instructions on its website to give it a try.
- 08/03/2024: Thanks to [Xinhua Cheng](https://github.com/cxh0519/) for the implementation of [GaussianDreamer](https://github.com/cxh0519/threestudio-gaussiandreamer)! Follow the instructions on its website to give it a try.
- 01/03/2024: Thanks to [Xinhua Cheng](https://github.com/cxh0519/) for the implementation of [Progressive3D](https://github.com/cxh0519/Progressive3D)! Follow the instructions on its website to give it a try.
- 09/01/2024: Thanks to [Zehuan Huang](https://github.com/huanngzh) for the implementation of the 3D human avatar generation method [Dreamwaltz](https://github.com/huanngzh/threestudio-dreamwaltz)! Follow the instructions on its website to give it a try.
- 06/01/2024: Thanks to [Baorui Ma](https://github.com/mabaorui) for the implementation of the [GeoDream extension](https://github.com/baaivision/GeoDream/tree/threestudio)! Follow the instructions on its website to give it a try.
- 05/01/2024: Implemented HiFA. Follow the instructions [here](https://github.com/threestudio-project/threestudio#hifa-) to try all three variants.
- 23/12/2023: Thanks to [Yuyang Zhao](https://github.com/HeliosZhao) for the implementation of the image-to-4D generation extension [Animate-124](https://github.com/HeliosZhao/Animate124/tree/threestudio)! Follow the instructions on the extension's website to give it a try.
- 18/12/2023: Implementation of [4D-fy](https://github.com/DSaurus/threestudio-4dfy) for 4D generation and [DreamCraft3D](https://github.com/DSaurus/threestudio-dreamcraft3D) for high-quality image-to-3D generation as custom extensions! Follow the instructions on the extensions' websites to give them a try.
- 13/12/2023: Implementation supporting [Stable Zero123](https://stability.ai/news/stable-zero123-3d-generation) for 3D generation from a single image! Follow the instructions [here](https://github.com/threestudio-project/threestudio#stable-zero123) to give it a try.
- 30/11/2023: Implementation of [MVDream](https://github.com/DSaurus/threestudio-mvdream) and [Gaussian Splatting](https://github.com/DSaurus/threestudio-3dgs) as custom extensions. You can also use a neural representation to fit a mesh via [Mesh-Fitting](https://github.com/DSaurus/threestudio-meshfitting).
- 30/11/2023: Implementation of the [custom extension system](https://threestudio-project.github.io/threestudio-extensions/); you can add your own extensions to [this project](https://github.com/threestudio-project/threestudio-extensions).
- 06/07/2023: Join our [Discord server](https://discord.gg/ejer2MAB8N) for lively discussions!
- 03/07/2023: Try text-to-3D online in [HuggingFace Spaces](https://huggingface.co/spaces/bennyguo/threestudio) or using our [self-hosted service](http://t23-g-01.threestudio.ai) (GPU support from Tencent). To host the web interface locally, see [here](https://github.com/threestudio-project/threestudio#gradio-web-interface).
- 25/06/2023: Implementation of [Magic123](https://guochengqian.github.io/project/magic123/)! Follow the instructions [here](https://github.com/threestudio-project/threestudio#magic123-) to give it a try.
- 20/06/2023: Implementations of Instruct-NeRF2NeRF and Control4D for high-fidelity 3D editing! Follow the instructions for [Control4D](https://github.com/threestudio-project/threestudio#control4d-) and [Instruct-NeRF2NeRF](https://github.com/threestudio-project/threestudio#instructnerf2nerf-) to give them a try.
- 14/06/2023: Implementation of TextMesh! Follow the instructions [here](https://github.com/threestudio-project/threestudio#textmesh-) to give it a try.
- 14/06/2023: Implementation of [prompt debiasing](https://arxiv.org/abs/2303.15413) and [Perp-Neg](https://perp-neg.github.io/)! Follow the instructions [here](https://github.com/threestudio-project/threestudio#tips-on-improving-quality) to give them a try.
- 29/05/2023: An experimental implementation of 3D generation from a single image using [Zero-1-to-3](https://zero123.cs.columbia.edu/)! Follow the instructions [here](https://github.com/threestudio-project/threestudio#zero-1-to-3-) to give it a try.
- 26/05/2023: Implementation of [ProlificDreamer](https://ml.cs.tsinghua.edu.cn/prolificdreamer/)! Follow the instructions [here](https://github.com/threestudio-project/threestudio#prolificdreamer-) to give it a try.
- 14/05/2023: You can experiment with the SDS loss on 2D images using our [2dplayground](2dplayground.ipynb).
- 13/05/2023: You can now try threestudio on [Google Colab](https://colab.research.google.com/github/threestudio-project/threestudio/blob/main/threestudio.ipynb)!
- 11/05/2023: We now support exporting textured meshes! See [here](https://github.com/threestudio-project/threestudio#export-meshes) for instructions.
![export-blender](https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_493bdc727347.png)

## Installation

See [installation.md](docs/installation.md) for additional information, including installation via Docker.

The following steps have been tested on Ubuntu 20.04.

- You must have an NVIDIA graphics card with at least 6GB VRAM and have [CUDA](https://developer.nvidia.com/cuda-downloads) installed.
- Install `Python >= 3.8`.
- (Optional, Recommended) Create a virtual environment:

```sh
python3 -m virtualenv venv
. venv/bin/activate

# Newer pip versions, e.g. pip-23.x, can be much faster than old versions, e.g. pip-20.x.
# For instance, they cache the wheels of git packages to avoid unnecessarily rebuilding them later.
python3 -m pip install --upgrade pip
```

- Install `PyTorch >= 1.12`. We have tested on `torch1.12.1+cu113` and `torch2.0.0+cu118`, but other versions should also work fine.

```sh
# torch1.12.1+cu113
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
# or torch2.0.0+cu118
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```

- (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:

```sh
pip install ninja
```

- Install dependencies:

```sh
pip install -r requirements.txt
```

- (Optional) `tiny-cuda-nn` installation might require downgrading pip to 23.0.1.

- (Optional, Recommended) The best-performing models in threestudio use the newly released T2I model [DeepFloyd IF](https://github.com/deep-floyd/IF), which currently requires signing a license agreement. If you would like to use these models, you need to [accept the license on the model card of DeepFloyd IF](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0) and log into the Hugging Face hub in the terminal via `huggingface-cli login`.

- For contributors, see [here](https://github.com/threestudio-project/threestudio#contributing-to-threestudio).

## Quickstart

Here we show some basic usage of threestudio. First, let's train a DreamFusion model to create a classic pancake bunny.

**If you are experiencing unstable connections to Hugging Face, we suggest you either (1) set the environment variables `TRANSFORMERS_OFFLINE=1 DIFFUSERS_OFFLINE=1 HF_HUB_OFFLINE=1` before your command once all needed files have been fetched on the first run, which prevents connecting to Hugging Face on every run, or (2) download the guidance model to a local folder following [here](https://huggingface.co/docs/huggingface_hub/v0.14.1/guides/download#download-an-entire-repository) and [here](https://huggingface.co/docs/huggingface_hub/v0.14.1/guides/download#download-files-to-local-folder), and set `pretrained_model_name_or_path` of both the guidance and the prompt processor to the local path.**

```sh
# if you have agreed to the license of DeepFloyd IF and have >20GB VRAM
# try this configuration for higher quality
python launch.py --config configs/dreamfusion-if.yaml --train --gpu 0 system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"
# otherwise you can try the Stable Diffusion model, which fits in 6GB VRAM
python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"
```
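To make the offline workflow above concrete, here is a minimal sketch of both options. The local folder name is arbitrary, and the `system.guidance.pretrained_model_name_or_path` / `system.prompt_processor.pretrained_model_name_or_path` key paths are assumptions based on the overrides used elsewhere in this README, not verified against every config:

```sh
# Option 1: after a first successful run has cached all needed files, run fully offline
export TRANSFORMERS_OFFLINE=1 DIFFUSERS_OFFLINE=1 HF_HUB_OFFLINE=1
python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger"

# Option 2: download the guidance model to a local folder once...
python -c "from huggingface_hub import snapshot_download; snapshot_download('stabilityai/stable-diffusion-2-1-base', local_dir='./models/sd21-base')"
# ...then point both the guidance and the prompt processor to it
python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 \
  system.guidance.pretrained_model_name_or_path="./models/sd21-base" \
  system.prompt_processor.pretrained_model_name_or_path="./models/sd21-base" \
  system.prompt_processor.prompt="a delicious hamburger"
```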
threestudio uses [OmegaConf](https://github.com/omry/omegaconf) for flexible configuration. You can override any configuration in the YAML file by specifying arguments without `--`, such as the prompt in the commands above. For all supported configurations, please see our [documentation](https://github.com/threestudio-project/threestudio/blob/main/DOCUMENTATION.md).
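Several dotted keys can be overridden in one command. An illustrative combination using keys that appear later in this README; the values are placeholders:

```sh
# override the prompt, training length, seed, and per-device batch size in place
python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 \
  system.prompt_processor.prompt="a delicious hamburger" \
  trainer.max_steps=5000 seed=42 data.batch_size=2
```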
The training lasts for 10,000 iterations. You can find visualizations of the current status in the trial directory, which defaults to `[exp_root_dir]/[name]/[tag]@[timestamp]`, where `exp_root_dir` (`outputs/` by default), `name`, and `tag` can be set in the configuration file. A 360-degree video will be generated after the training is completed. During training, pressing `ctrl+c` once will stop training and head directly to the test stage, which generates the video. Pressing `ctrl+c` a second time will fully quit the program.

### Multi-GPU training

Multi-GPU training is supported, but may still be [buggy](https://github.com/threestudio-project/threestudio/issues/195). Note that `data.batch_size` is the batch size **per rank (device)**. Also remember to

- Set `data.n_val_views` to a multiple of the number of GPUs.
- Set a unique `tag`, as the timestamp is disabled in multi-GPU training and will not be appended to the tag. If you use the same tag as previous trials, saved config files, code, and visualizations will be overwritten.

```sh
# this results in an effective batch size of 4 (number of GPUs) * 2 (data.batch_size) = 8
python launch.py --config configs/dreamfusion-if.yaml --train --gpu 0,1,2,3 system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes" data.batch_size=2 data.n_val_views=4
```

If you define the `CUDA_VISIBLE_DEVICES` environment variable before you call `launch.py`, you don't need to specify `--gpu` - all available GPUs from `CUDA_VISIBLE_DEVICES` will be used. For instance, the following command will automatically use GPUs 3 and 4:

`CUDA_VISIBLE_DEVICES=3,4 python launch.py --config configs/dreamfusion-if.yaml --train system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"`

This is particularly useful if you run `launch.py` in a cluster using a command that automatically picks GPU(s) and exports their IDs through `CUDA_VISIBLE_DEVICES`, e.g. through SLURM:

```bash
cd git/threestudio
. venv/bin/activate
srun --account mod3d --partition=g40 --gpus=1 --job-name=3s_bunny python launch.py --config configs/dreamfusion-if.yaml --train system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"
```

### Resume from checkpoints

If you want to resume from a checkpoint, do:

```sh
# resume training from the last checkpoint; you may replace last.ckpt with any other checkpoint
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
# if the training has completed, you can still continue training for longer by setting trainer.max_steps
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt trainer.max_steps=20000
# you can also perform testing using resumed checkpoints
python launch.py --config path/to/trial/dir/configs/parsed.yaml --test --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
# note that the above commands use parsed configuration files from previous trials,
# which will continue using the same trial directory
# if you want to save to a new trial directory, replace parsed.yaml with raw.yaml in the command

# only load weights from a saved checkpoint but don't resume training (i.e. don't load the optimizer state):
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 system.weights=path/to/trial/dir/ckpts/last.ckpt
```

### Export Meshes

To export the scene to textured meshes, use the `--export` option. We currently support exporting to obj+mtl, or obj with vertex colors.
```sh
# this uses the default mesh-exporter configuration, which exports obj+mtl
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter
# specify system.exporter.fmt=obj to get obj with vertex colors
# you may also add system.exporter.save_uv=false to accelerate the process, suitable for a quick peek at the result
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.exporter.fmt=obj
# for NeRF-based methods (DreamFusion, Magic3D coarse, Latent-NeRF, SJC)
# you may need to adjust the isosurface threshold (25 by default) to get satisfying outputs
# decrease the threshold if the extracted model is incomplete, increase it if it is extruded
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.geometry.isosurface_threshold=10.
# use marching cubes at a higher resolution to get more detailed models
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.geometry.isosurface_method=mc-cpu system.geometry.isosurface_resolution=256
```

For all the options you can specify when exporting, see [the documentation](https://github.com/threestudio-project/threestudio/blob/main/DOCUMENTATION.md#exporters).

See [here](https://github.com/threestudio-project/threestudio#supported-models) for example running commands for all our supported models. Please refer to [here](https://github.com/threestudio-project/threestudio#tips-on-improving-quality) for tips on getting higher-quality results, and [here](https://github.com/threestudio-project/threestudio#vram-optimization) for reducing VRAM usage.

### Gradio Web Interface

Launch the Gradio web interface by

```
python gradio_app.py launch
```

Parameters (combined in the example after this list):

- `--listen`: listens on all addresses by setting `server_name="0.0.0.0"` when launching the Gradio app.
- `--self-deploy`: enables changing arbitrary configurations directly from the web.
- `--save`: enables checkpoint saving.
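For example, a plausible combination for hosting the interface on a local network while keeping checkpoints; the particular flag combination is illustrative:

```sh
# listen on 0.0.0.0 and enable checkpoint saving
python gradio_app.py launch --listen --save
```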
For feature requests, bug reports, or discussions about technical problems, please [file an issue](https://github.com/threestudio-project/threestudio/issues/new). If you want to discuss generation quality or showcase your results, please feel free to participate in the [discussion panel](https://github.com/threestudio-project/threestudio/discussions).

## Supported Models

### Score Distillation via Reparametrized DDIM (SDI) [![arXiv](https://img.shields.io/badge/arXiv-2405.15891-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2405.15891)

SDI suggests reconsidering how the noise term is sampled in DreamFusion. The paper demonstrates that the score distillation process can be seen as a reparametrization of 2D image sampling algorithms, in which case the noise added at each step of score distillation should be of a very particular form. The noise in DreamFusion (SDS), however, is sampled randomly, which causes over-blurring. SDI approximates the correct noise term by inverting the DDIM process.

Notable differences from the paper: N/A.

Pros:
* High-quality textures
* Sharp geometric details

Cons:
* Slower than SDS (about 1.5x) due to the additional inversion, but still faster than ProlificDreamer due to the lower number of steps
* Requires more VRAM than SDS due to higher-resolution rendering; decrease the resolution to fit smaller GPUs (see the sketch after the example commands below)

**Results obtained in threestudio (Stable Diffusion, 512x512)**

<img alt="A_DSLR_photo_of_a_freshly_baked_round_loaf_of_sourdough_bread" src="https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_8bfaaad12a9d.png" width="48%">
<img alt="a_photograph_of_a_knight" src="https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_ecb24fc4e852.png" width="48%">

**Example running commands**
```sh
python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="pumpkin head zombie, skinny, highly detailed, photorealistic"

python launch.py --config configs/sdi.yaml --train --gpu 1 system.prompt_processor.prompt="a photograph of a ninja"

python launch.py --config configs/sdi.yaml --train --gpu 2 system.prompt_processor.prompt="a zoomed out DSLR photo of a hamburger"

python launch.py --config configs/sdi.yaml --train --gpu 3 system.prompt_processor.prompt="bagel filled with cream cheese and lox"
```
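To trade quality for memory as noted in the cons above, a hedged sketch reusing the `data.width`/`data.height` overrides shown for other methods in this README; it assumes `configs/sdi.yaml` uses the same camera data module, and the resolution value is a placeholder:

```sh
# render at 256x256 instead of 512x512 to fit smaller GPUs
python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="a zoomed out DSLR photo of a hamburger" data.width=256 data.height=256
```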
### ProlificDreamer [![arXiv](https://img.shields.io/badge/arXiv-2305.16213-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2305.16213)

**This is an unofficial experimental implementation! Please refer to [https://github.com/thu-ml/prolificdreamer](https://github.com/thu-ml/prolificdreamer) for the official code release.**

**Results obtained by threestudio (Stable Diffusion, 256x256 Stage1)**

https://github.com/threestudio-project/threestudio/assets/19284678/27b42d8f-4aa4-4b47-8ea0-0f77db90fd1e

https://github.com/threestudio-project/threestudio/assets/19284678/ffcbbb01-3817-4663-a2bf-5e21a076bc3d

**Results obtained by threestudio (Stable Diffusion, 256x256 Stage1, 512x512 Stage2+3)**

https://github.com/threestudio-project/threestudio/assets/19284678/cfab881e-18dc-45fc-8384-7476f835b36e

Notable differences from the paper:

- ProlificDreamer adopts a two-stage sampling strategy with 64 coarse samples and 32 fine samples, while we only use 512 coarse samples.
- In the first stage, we only render 64x64 images for the first 5000 iterations. After that, as the empty space has been effectively pruned, rendering 512x512 images doesn't cost too much VRAM.
- We currently don't support multiple particles.

```sh
# --------- Stage 1 (NeRF) --------- #
# object generation with 512x512 NeRF rendering, ~30GB VRAM
python launch.py --config configs/prolificdreamer.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple"
# if you don't have enough VRAM, try training with 64x64 NeRF rendering, ~15GB VRAM
python launch.py --config configs/prolificdreamer.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple" data.width=64 data.height=64 data.batch_size=1
# using the same model for pretrained and LoRA enables 64x64 training with <10GB VRAM
# but the quality is worse due to the use of an epsilon prediction model for LoRA training
python launch.py --config configs/prolificdreamer.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple" data.width=64 data.height=64 data.batch_size=1 system.guidance.pretrained_model_name_or_path_lora="stabilityai/stable-diffusion-2-1-base"
# using a patch-based renderer to reduce memory consumption, 512x512 resolution, ~20GB VRAM
python launch.py --config configs/prolificdreamer-patch.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple"
# scene generation with 512x512 NeRF rendering, ~30GB VRAM
python launch.py --config configs/prolificdreamer-scene.yaml --train --gpu 0 system.prompt_processor.prompt="Inside of a smart home, realistic detailed photo, 4k"

# --------- Stage 2 (Geometry Refinement) --------- #
# refine geometry with 512x512 rasterization, Stable Diffusion SDS guidance
python launch.py --config configs/prolificdreamer-geometry.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple" system.geometry_convert_from=path/to/stage1/trial/dir/ckpts/last.ckpt

# --------- Stage 3 (Texturing) --------- #
# texturing with 512x512 rasterization, Stable Diffusion VSD guidance
python launch.py --config configs/prolificdreamer-texture.yaml --train --gpu 0 system.prompt_processor.prompt="a pineapple" system.geometry_convert_from=path/to/stage2/trial/dir/ckpts/last.ckpt
```
### HiFA [![arXiv](https://img.shields.io/badge/arXiv-2305.18766-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2305.18766)
**This is a re-implementation missing some improvements from the original paper (coarse-to-fine NeRF sampling, kernel smoothing). For original results, please refer to [https://github.com/JunzheJosephZhu/HiFA](https://github.com/JunzheJosephZhu/HiFA).**

HiFA is best seen as a suite of improvements, including image-space SDS, a z-variance loss, and noise strength annealing. It is compatible with most optimization-based methods, so we provide three variants based on DreamFusion, ProlificDreamer, and Magic123. For the DreamFusion and ProlificDreamer variants we provide both a unified guidance config and an SDS/VSD guidance config; both configs should achieve the same results. Additionally, we also make HiFA compatible with ProlificDreamer-scene.

**Results obtained by threestudio (DreamFusion-HiFA, 512x512)**

https://github.com/threestudio-project/threestudio/assets/24391451/c0030c66-0691-4ec2-8b79-d933101864a0

**Results obtained by threestudio (ProlificDreamer-HiFA, 512x512)**

https://github.com/threestudio-project/threestudio/assets/24391451/ff5dc4d0-d7d7-4a73-964e-84b8c48e2907

**Results obtained by threestudio (Magic123-HiFA, 512x512)**

https://github.com/threestudio-project/threestudio/assets/24391451/eb6f2f74-9143-4e26-8429-e300ad2d2b80

**Example running commands**

```sh
# ------ DreamFusion-HiFA ------- # (similar to the original paper)
python launch.py --config configs/hifa.yaml --train --gpu 0 system.prompt_processor.prompt="a plate of delicious tacos"
python launch.py --config configs/experimental/unified-guidance/hifa.yaml --train --gpu 0 system.prompt_processor.prompt="a plate of delicious tacos"
# ------ ProlificDreamer-HiFA ------- #
python launch.py --config configs/prolificdreamer-hifa.yaml --train --gpu 0 system.prompt_processor.prompt="a plate of delicious tacos"
python launch.py --config configs/experimental/unified-guidance/prolificdreamer-hifa.yaml --train --gpu 0 system.prompt_processor.prompt="a plate of delicious tacos"
# ------ ProlificDreamer-scene-HiFA ------- #
python launch.py --config configs/prolificdreamer-scene-hifa.yaml --train --gpu 0 system.prompt_processor.prompt="A DSLR photo of a hamburger inside a restaurant"
# ------ Magic123-HiFA ------ #
python launch.py --config configs/magic123-hifa-coarse-sd.yaml --train --gpu 0 data.image_path=load/images/firekeeper_rgba.png system.prompt_processor.prompt="a toy figure of firekeeper from dark souls"
# We include a config for Magic123's refine stage, but didn't really run it, since the coarse stage result already looks pretty decent.
```

**Tips**

- If the generated object's color seems oversaturated, decrease `lambda_sds_img` (or `lambda_sd_img` if using unified guidance); see the sketch after this list.
- If the generated object looks cloudy, increase `lambda_z_variance`. If the shape becomes corrupted, decrease `lambda_z_variance`.
- If the generated object overall seems to have high luminance, increase `min_step_percent`.
- Make sure `sqrt_anneal` and `use_img_loss` are both set to `True`.
- Check out the [original repo](https://github.com/JunzheJosephZhu/HiFA)! The results are better.
- **If you are using `sqrt_anneal`, make sure `system.guidance.trainer_max_steps` is equal to `trainer.max_steps` so that noise strength annealing works correctly.**
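A minimal sketch combining these knobs in one command. The `system.loss.*` and `system.guidance.*` key paths are assumptions based on the patterns used elsewhere in this README, and the values are placeholders to tune rather than recommendations:

```sh
# hypothetical starting point: damp the image-space SDS term, adjust z-variance,
# and raise the minimum noise step against high luminance
python launch.py --config configs/hifa.yaml --train --gpu 0 \
  system.prompt_processor.prompt="a plate of delicious tacos" \
  system.loss.lambda_sds_img=0.5 system.loss.lambda_z_variance=1.0 \
  system.guidance.min_step_percent=0.2
```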
### DreamFusion [![arXiv](https://img.shields.io/badge/arXiv-2209.14988-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2209.14988)

**Results obtained by threestudio (DeepFloyd IF, batch size 8)**

https://user-images.githubusercontent.com/19284678/236694848-38ae4ea4-554b-4c9d-b4c7-fba5bee3acb3.mp4

**Notable differences from the paper**

- We use open-source T2I models (StableDiffusion, DeepFloyd IF), while the paper uses Imagen.
- We use a guidance scale of 20 for DeepFloyd IF, while the paper uses 100 for Imagen.
- We do not use sigmoid to normalize the albedo color but simply scale the color from `[-1,1]` to `[0,1]`, as we find this helps convergence.
- We use HashGrid encoding and uniformly sample points along rays, while the paper uses Integrated Positional Encoding and the sampling strategy from MipNeRF360.
- We adopt the camera settings and density initialization strategy from Magic3D, which is slightly different from the DreamFusion paper.
- Some hyperparameters are different, such as the weighting of loss terms.

**Example running commands**

```sh
# uses DeepFloyd IF, requires ~15GB VRAM to extract text embeddings and ~10GB VRAM in training
# here we adopt random background augmentation to improve geometry quality
python launch.py --config configs/dreamfusion-if.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" system.background.random_aug=true
# uses StableDiffusion, requires ~6GB VRAM in training
python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger"
```

**Tips**

- DeepFloyd IF performs **way better than** StableDiffusion.
- Validation shows the albedo color before `system.material.ambient_only_steps` and the shaded color after that.
- Try increasing/decreasing `system.loss.lambda_sparsity` if your scene is stuffed with floaters/becoming empty (a combined sketch follows this list).
- Try increasing/decreasing `system.loss.lambda_orient` if your object is foggy/over-smoothed.
- Try replacing the background with random colors with a probability of 0.5 by setting `system.background.random_aug=true` if you find the model incorrectly treats the background as part of the object.
- DeepFloyd IF uses T5-XXL as its text encoder, which consumes ~15GB VRAM even when using 8-bit quantization. This is currently the bottleneck for training with less VRAM. If anyone knows how to run the text encoder with less VRAM, please file an issue. We're also trying to push the text encoder to [Replicate](https://replicate.com/) to enable extracting text embeddings via API, but are having some network connection issues. Please [contact bennyguo](mailto:imbennyguo@gmail.com) if you would like to help out.
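As a worked example of the regularization tips above, using the keys named in the list; the values are placeholders to adjust for your scene, not recommendations:

```sh
# hypothetical tuning run: stronger sparsity against floaters, lighter orientation loss, random backgrounds
python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 \
  system.prompt_processor.prompt="a delicious hamburger" \
  system.loss.lambda_sparsity=2.0 system.loss.lambda_orient=0.05 \
  system.background.random_aug=true
```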
### Magic3D [![arXiv](https://img.shields.io/badge/arXiv-2211.10440-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2211.10440)

**Results obtained by threestudio (DeepFloyd IF, batch size 8; first row: coarse, second row: refine)**

https://user-images.githubusercontent.com/19284678/236694858-0ed6939e-cd7a-408f-a94b-406709ae90c0.mp4

**Notable differences from the paper**

- We use open-source T2I models (StableDiffusion, DeepFloyd IF) for the coarse stage, while the paper uses eDiff-I.
- In the coarse stage, we use a guidance scale of 20 for DeepFloyd IF, while the paper uses 100 for eDiff-I.
- In the coarse stage, we use analytic normals, while the paper uses predicted normals.
- In the coarse stage, we use the orientation loss as in DreamFusion, while the paper does not.
- Many details are omitted from the paper, such as the weighting of loss terms and the DMTet grid resolution, which could be different.

**Example running commands**

First train the coarse stage NeRF:

```sh
# uses DeepFloyd IF, requires ~15GB VRAM to extract text embeddings and ~10GB VRAM in training
python launch.py --config configs/magic3d-coarse-if.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger"
# uses StableDiffusion, requires ~6GB VRAM in training
python launch.py --config configs/magic3d-coarse-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger"
```

Then convert the NeRF from the coarse stage to DMTet and train with differentiable rasterization:

```sh
# the refinement stage uses StableDiffusion, and requires ~5GB VRAM in training
python launch.py --config configs/magic3d-refine-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" system.geometry_convert_from=path/to/coarse/stage/trial/dir/ckpts/last.ckpt
# if you're unsatisfied with the surface extracted using the default threshold (25)
# you can specify a threshold value using `system.geometry_convert_override`
# decrease the value if the extracted surface is incomplete, increase it if it is extruded
python launch.py --config configs/magic3d-refine-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" system.geometry_convert_from=path/to/coarse/stage/trial/dir/ckpts/last.ckpt system.geometry_convert_override.isosurface_threshold=10.
```

**Tips**

- For the coarse stage, DeepFloyd IF performs **way better than** StableDiffusion.
- Magic3D uses a neural network to predict the surface normal, which may not resemble the true geometric normal and can degrade geometry quality, so we use analytic normals instead.
- Try increasing/decreasing `system.loss.lambda_sparsity` if your scene is stuffed with floaters/becoming empty.
- Try increasing/decreasing `system.loss.lambda_orient` if your object is foggy/over-smoothed.
- Try replacing the background with random colors with a probability of 0.5 by setting `system.background.random_aug=true` if you find the model incorrectly treats the background as part of the object.

### Score Jacobian Chaining [![arXiv](https://img.shields.io/badge/arXiv-2212.00774-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2212.00774)

**Results obtained by threestudio (Stable Diffusion)**
https://user-images.githubusercontent.com/19284678/236694871-87a247c1-2d3d-4cbf-89df-450bfeac3aca.mp4

Notable differences from the paper: N/A.

**Example running commands**

```sh
# train with SJC guidance in latent space
python launch.py --config configs/sjc.yaml --train --gpu 0 system.prompt_processor.prompt="A high quality photo of a delicious burger"
# train with SJC guidance in latent space, Trump figure
python launch.py --config configs/sjc.yaml --train --gpu 0 system.prompt_processor.prompt="Trump figure" trainer.max_steps=30000 system.loss.lambda_emptiness="[15000,10000.0,200000.0,15001]" system.optimizer.params.background.lr=0.05 seed=42
```

**Tips**

- SJC uses subpixel rendering, which decodes a `128x128` latent feature map for better visualization quality. You can turn this feature off with `system.subpixel_rendering=false` to save VRAM during validation/testing.

### Latent-NeRF [![arXiv](https://img.shields.io/badge/arXiv-2211.07600-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2211.07600)

**Results obtained by threestudio (Stable Diffusion)**

https://user-images.githubusercontent.com/19284678/236694876-5a270347-6a41-4429-8909-44c90c554e06.mp4

Notable differences from the paper: N/A.

We currently only implement Latent-NeRF for text-guided generation and Sketch-Shape for (text, shape)-guided 3D generation. Latent-Paint is not implemented yet.

**Example running commands**

```sh
# train Latent-NeRF in Stable Diffusion latent space
python launch.py --config configs/latentnerf.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger"
# refine Latent-NeRF in RGB space
python launch.py --config configs/latentnerf-refine.yaml --train --gpu 0 system.prompt_processor.prompt="a delicious hamburger" system.weights=path/to/latent/stage/trial/dir/ckpts/last.ckpt

# train Sketch-Shape in Stable Diffusion latent space
python launch.py --config configs/sketchshape.yaml --train --gpu 0 system.guide_shape=load/shapes/teddy.obj system.prompt_processor.prompt="a teddy bear in a tuxedo"
# refine Sketch-Shape in RGB space
python launch.py --config configs/sketchshape-refine.yaml --train --gpu 0 system.guide_shape=load/shapes/teddy.obj system.prompt_processor.prompt="a teddy bear in a tuxedo" system.weights=path/to/latent/stage/trial/dir/ckpts/last.ckpt
```

### Fantasia3D [![arXiv](https://img.shields.io/badge/arXiv-2303.13873-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2303.13873)

**Results obtained by threestudio (Stable Diffusion)**

https://user-images.githubusercontent.com/19284678/236694880-33b0db21-4530-47f1-9c3b-c70357bc84b3.mp4

**Results obtained by threestudio (Stable Diffusion, mesh initialization)**

https://github.com/threestudio-project/threestudio/assets/19284678/762903c1-665b-47b5-a2c2-bd7021a9e548.mp4

<p align="center">
<img alt="threestudio" src="https://github.com/threestudio-project/threestudio/assets/19284678/2d22e30f-4a32-454a-a06e-d6e6bd2a1b96.png" width="100%">
</p>

Notable differences from the paper:

- We enable tangent-space normal perturbation by default, which can be turned off by appending `system.material.use_bump=false`.
**Example running commands**

```sh
# --------- Geometry --------- #
python launch.py --config configs/fantasia3d.yaml --train --gpu 0 system.prompt_processor.prompt="a DSLR photo of an ice cream sundae"
# Fantasia3D relies heavily on the initialized SDF shape
# the default shape is a sphere with radius 0.5
# change the shape initialization to match your input prompt
python launch.py --config configs/fantasia3d.yaml --train --gpu 0 system.prompt_processor.prompt="The leaning tower of Pisa" system.geometry.shape_init=ellipsoid system.geometry.shape_init_params="[0.3,0.3,0.8]"
# or you can initialize from a mesh
# here shape_init_params is the scale of the shape
# also make sure to input the correct up and front axes (one of +x, +y, +z, -x, -y, -z)
python launch.py --config configs/fantasia3d.yaml --train --gpu 0 system.prompt_processor.prompt="hulk" system.geometry.shape_init=mesh:load/shapes/human.obj system.geometry.shape_init_params=0.9 system.geometry.shape_init_mesh_up=+y system.geometry.shape_init_mesh_front=+z
# --------- Texture --------- #
# to train a PBR texture continued from a geometry checkpoint:
python launch.py --config configs/fantasia3d-texture.yaml --train --gpu 0 system.prompt_processor.prompt="a DSLR photo of an ice cream sundae" system.geometry_convert_from=path/to/geometry/stage/trial/dir/ckpts/last.ckpt
```

**Tips**

- If you find the shape diverges easily in early training stages, you may use a lower guidance scale by setting `system.guidance.guidance_scale=30.`.

### TextMesh [![arXiv](https://img.shields.io/badge/arXiv-2304.12439-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2304.12439)

**Results obtained by threestudio (DeepFloyd IF, batch size 4)**

https://github.com/threestudio-project/threestudio/assets/19284678/72217cdd-765a-475b-92d0-4ab62bf0f57a

**Notable differences from the paper**

- Most of the settings are the same as for the DreamFusion model. Please refer to the notable differences of the DreamFusion model.
- We use NeuS as the geometry representation, while the original paper uses VolSDF.
- We adopt techniques from [Neuralangelo](https://arxiv.org/abs/2306.03092) to stabilize normal computation when using hash grids.
- We currently only implement the coarse stage of TextMesh.

**Example running commands**

```sh
# uses DeepFloyd IF, requires ~15GB VRAM
python launch.py --config configs/textmesh-if.yaml --train --gpu 0 system.prompt_processor.prompt="lib:cowboy_boots"
```

**Tips**

- TextMesh uses a surface-based geometry representation, so you don't need to manually tune the isosurface threshold when exporting meshes!

### Control4D [![arXiv](https://img.shields.io/badge/arXiv-2305.20082-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2305.20082)

**This is an experimental implementation of Control4D using threestudio! Control4D will release the full code, including static and dynamic editing, after paper acceptance.**
**Results obtained by threestudio (512x512)**

https://github.com/threestudio-project/threestudio/assets/24589363/97d9aadd-32c7-488f-9543-6951b285d588

We currently don't support dynamic editing.

Download the Control4D data sample using this [link](https://mailstsinghuaeducn-my.sharepoint.com/:u:/g/personal/shaorz20_mails_tsinghua_edu_cn/EcqOaEuNwH1KpR0JTzL4Ur0BO_iJr8RiY2rNAGVC7h3fng?e=Dyr2gu).

**Example running commands**

```sh
# --------- Control4D --------- #
# static editing with 128x128 NeRF + 512x512 GAN rendering, ~20GB VRAM
python launch.py --config configs/control4d-static.yaml --train --gpu 0 data.dataroot="YOUR_DATAROOT/twindom" system.prompt_processor.prompt="Elon Musk wearing red shirt, RAW photo, (high detailed skin:1.2), 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
```

### InstructNeRF2NeRF [![arXiv](https://img.shields.io/badge/arXiv-2303.12789-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2303.12789)

**Results obtained by threestudio**

https://github.com/threestudio-project/threestudio/assets/24589363/7aa43a2d-87d7-4ef5-94b6-f778ddb041b5

Download the InstructNeRF2NeRF data sample using this [link](https://mailstsinghuaeducn-my.sharepoint.com/:u:/g/personal/shaorz20_mails_tsinghua_edu_cn/EbNazeNAYsBIvxGeXuCmOXgBiLv8KM-hfRNbNS7DtTvSvA?e=C1k4bM).

**Example running commands**

```sh
# --------- InstructNeRF2NeRF --------- #
# 3D editing with NeRF patch-based rendering, ~20GB VRAM
python launch.py --config configs/instructnerf2nerf.yaml --train --gpu 0 data.dataroot="YOUR_DATAROOT/face" data.camera_layout="front" data.camera_distance=1 data.eval_interpolation=[1,3,50] system.prompt_processor.prompt="Turn him into Albert Einstein"
```

### Magic123 [![arXiv](https://img.shields.io/badge/arXiv-2306.17843-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2306.17843)

**Results obtained by threestudio (Zero123 + Stable Diffusion)**

https://github.com/threestudio-project/threestudio/assets/19284678/335a58a8-8fee-485b-ac27-c55a16f4a673

**Notable differences from the paper**
- This is an unofficial re-implementation which shares the same overall idea as the [official implementation](https://github.com/guochengqian/Magic123) but differs in some aspects, such as hyperparameters.
- Textual Inversion is not supported, which means a text prompt is needed for training.

**Example running commands**

First train the coarse stage NeRF:

```sh
# Zero123 + Stable Diffusion, ~12GB VRAM
# data.image_path must point to a 4-channel RGBA image
# system.prompt_processor.prompt must be specified
python launch.py --config configs/magic123-coarse-sd.yaml --train --gpu 0 data.image_path=load/images/hamburger_rgba.png system.prompt_processor.prompt="a delicious hamburger"
```

Then convert the NeRF from the coarse stage to DMTet and train with differentiable rasterization:

```sh
# Zero123 + Stable Diffusion, ~10GB VRAM
# data.image_path must point to a 4-channel RGBA image
# system.prompt_processor.prompt must be specified
python launch.py --config configs/magic123-refine-sd.yaml --train --gpu 0 data.image_path=load/images/hamburger_rgba.png system.prompt_processor.prompt="a delicious hamburger" system.geometry_convert_from=path/to/coarse/stage/trial/dir/ckpts/last.ckpt
# if you're unsatisfied with the surface extracted using the default threshold (25)
# you can specify a threshold value using `system.geometry_convert_override`
# decrease the value if the extracted surface is incomplete, increase it if it is extruded
python launch.py --config configs/magic123-refine-sd.yaml --train --gpu 0 data.image_path=load/images/hamburger_rgba.png system.prompt_processor.prompt="a delicious hamburger" system.geometry_convert_from=path/to/coarse/stage/trial/dir/ckpts/last.ckpt system.geometry_convert_override.isosurface_threshold=10.
```

**Tips**

- If the image contains non-front-facing objects, specifying the approximate elevation and azimuth angles by setting `data.default_elevation_deg` and `data.default_azimuth_deg` can be helpful (see the sketch below). In threestudio, the top is elevation +90 and the bottom is elevation -90; left is azimuth -90 and right is azimuth +90.
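For instance, a hypothetical run for an input photo taken from slightly above and to the right of the object; the angle values are illustrative, not measured:

```sh
# hint the coarse stage about the viewpoint of the reference image
python launch.py --config configs/magic123-coarse-sd.yaml --train --gpu 0 \
  data.image_path=load/images/hamburger_rgba.png \
  system.prompt_processor.prompt="a delicious hamburger" \
  data.default_elevation_deg=20. data.default_azimuth_deg=30.
```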
### Stable Zero123

**Installation**

Download the pretrained Stable Zero123 checkpoint `stable-zero123.ckpt` into `load/zero123` from https://huggingface.co/stabilityai/stable-zero123.
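One way to fetch it, assuming a recent `huggingface_hub` CLI and that you have accepted the model license and logged in via `huggingface-cli login`:

```sh
# download just the checkpoint file into load/zero123
huggingface-cli download stabilityai/stable-zero123 stable-zero123.ckpt --local-dir load/zero123
```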
**Results obtained by threestudio (Stable Zero123 vs Zero123-XL)**
![Final_video_v01](https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_902377accad6.png)

**Direct multi-view image generation**
If you only want to generate multi-view images, please refer to [threestudio-mvimg-gen](https://github.com/DSaurus/threestudio-mvimg-gen). This extension can use Stable Zero123 to directly generate images from multi-view perspectives.

**Example running commands**

1. Take an image of your choice, or generate it from text using your favourite AI image generator such as SDXL Turbo (https://clipdrop.co/stable-diffusion-turbo), e.g. "A simple 3D render of a friendly dog"
2. Remove its background using Clipdrop (https://clipdrop.co/remove-background)
3. Save it to `load/images/`, preferably with `_rgba.png` as the suffix
4. Run Zero-1-to-3 with the Stable Zero123 checkpoint:
```sh
python launch.py --config configs/stable-zero123.yaml --train --gpu 0 data.image_path=./load/images/hamburger_rgba.png
```

**IMPORTANT NOTE: This is an experimental implementation and we're constantly improving the quality.**

**IMPORTANT NOTE: This implementation extends the Zero-1-to-3 implementation below, and is heavily inspired by the Zero-1-to-3 implementation in [stable-dreamfusion](https://github.com/ashawkey/stable-dreamfusion)! `extern/ldm_zero123` is borrowed from `stable-dreamfusion/ldm`.**

### Zero-1-to-3 [![arXiv](https://img.shields.io/badge/arXiv-2303.11328-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2303.11328)

**Installation**

Download the pretrained Zero123XL weights into `load/zero123`:

```sh
cd load/zero123
wget https://zero123.cs.columbia.edu/assets/zero123-xl.ckpt
```

**Results obtained by threestudio (Zero-1-to-3)**

https://github.com/threestudio-project/threestudio/assets/22424247/f4e7b66f-7a46-4f9f-8fcd-750300cef651

**IMPORTANT NOTE: This is an experimental implementation and we're constantly improving the quality.**

**IMPORTANT NOTE: This implementation is heavily inspired by the Zero-1-to-3 implementation in [stable-dreamfusion](https://github.com/ashawkey/stable-dreamfusion)! `extern/ldm_zero123` is borrowed from `stable-dreamfusion/ldm`.**

**Example running commands**

1. Take an image of your choice, or generate it from text using your favourite AI image generator such as Stable Diffusion XL (https://clipdrop.co/stable-diffusion), e.g. "A simple 3D render of a friendly dog"
2. Remove its background using Clipdrop (https://clipdrop.co/remove-background)
3. Save it to `load/images/`, preferably with `_rgba.png` as the suffix
4. Run Zero-1-to-3:
```sh
python launch.py --config configs/zero123.yaml --train --gpu 0 data.image_path=./load/images/dog1_rgba.png
```

For more Zero-1-to-3 scripts, please check `threestudio/scripts/run_zero123.sh`.

Previous Zero-1-to-3 weights are available at `https://huggingface.co/cvlab/zero123-weights/`. You can download them to `load/zero123` as above, and replace the path at `system.guidance.pretrained_model_name_or_path`.

**Guidance evaluation**

threestudio also supports evaluating the guidance during training. If `system.freq.guidance_eval` is set to a value > 0, it will periodically save the rendered image, the noisy image (the amount of added noise is shown at the top left), the 1-step-denoised image, the 1-step prediction of the original image, and the fully denoised image. For example:

![it143-train](https://oss.gittoolsai.com/images/threestudio-project_threestudio_readme_46bd3e07f6aa.png)
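A sketch of enabling this on the Zero-1-to-3 run above; the value of 100 is an arbitrary choice (any value > 0 works per the description), and treating it as a step interval is an assumption suggested by the `system.freq.*` naming:

```sh
# save guidance evaluation images during training
python launch.py --config configs/zero123.yaml --train --gpu 0 data.image_path=./load/images/dog1_rgba.png system.freq.guidance_eval=100
```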
\n## Tips on Improving Quality\n\nIt's important to note that existing techniques that lift 2D T2I models to 3D cannot consistently produce satisfying results. Results from great papers like DreamFusion and Magic3D are (to some extent) cherry-picked, so don't be frustrated if you do not get what you expected on your first trial. Here are some tips that may help you improve the generation quality:\n\n- **Increase batch size.** Large batch sizes help convergence and improve the 3D consistency of the geometry. State-of-the-art methods report using large batch sizes: DreamFusion uses a batch size of 4; Magic3D uses a batch size of 32; Fantasia3D uses a batch size of 24; some results shown above use a batch size of 8. You can easily change the batch size by setting `data.batch_size=N`. Increasing the batch size requires more VRAM. If you have limited VRAM but still want the benefit of large batch sizes, you may use [gradient accumulation provided by PyTorch Lightning](https:\u002F\u002Flightning.ai\u002Fdocs\u002Fpytorch\u002Fstable\u002Fadvanced\u002Ftraining_tricks.html#accumulate-gradients) by setting `trainer.accumulate_grad_batches=N`. This will accumulate the gradients of several batches and achieve a large effective batch size. Note that if you use gradient accumulation, you may need to multiply all step values by N in your config, such as values named `X_steps` and `trainer.val_check_interval`, since N accumulated batches now correspond to one large batch; see the example command after this list.\n- **Train longer.** This helps if you can already obtain reasonable results and would like to enhance the details. If the result is still a mess after several thousand steps, training for a longer time often won't help. You can set the total training iterations by `trainer.max_steps=N`.\n- **Try different seeds.** This is a simple solution if your results have correct overall geometry but suffer from the multi-face Janus problem. You can change the seed by setting `seed=N`. Good luck!\n- **Tune regularization weights.** Some methods have regularization terms which can be essential to obtaining good geometry. Try tuning the weights of these regularizations by setting `system.loss.lambda_X=value`. The appropriate values depend on your situation; refer to the [tips for each supported model](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#supported-models) for more detailed instructions.\n- **Try debiasing methods.** When conventional SDS techniques like DreamFusion, Magic3D, SJC, and others fail to produce the desired 3D results, Debiased Score Distillation Sampling (D-SDS) can be a solution. D-SDS is devised to tackle challenges such as artifacts or the Janus problem, employing two strategies: score debiasing and prompt debiasing. You can activate score debiasing by simply setting `system.guidance.grad_clip=[0,0.5,2.0,10000]`, where the order is `start_step, start_value, end_value, end_step`. You can enable prompt debiasing by setting `system.prompt_processor.use_prompt_debiasing=true`. When using prompt debiasing, it's recommended to provide a list of indices of words that may be removed, via `system.prompt_processor.prompt_debiasing_mask_ids=[i1,i2,...]`. For example, if the prompt is `a smiling dog` and you only want to remove the word `smiling` for certain views, you should set it to `[1]`. You can also manually specify the prompt for each view by setting `system.prompt_processor.prompt_side`, `system.prompt_processor.prompt_back` and `system.prompt_processor.prompt_overhead`. For a detailed explanation of these techniques, refer to [the D-SDS paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.15413) or check out [the project page](https:\u002F\u002Fsusunghong.github.io\u002FDebiased-Score-Distillation-Sampling\u002F).\n- **Try Perp-Neg.** The [Perp-Neg algorithm](https:\u002F\u002Fperp-neg.github.io\u002F) can potentially alleviate the multi-face Janus problem. We now support Perp-Neg for `stable-diffusion-guidance` and `deep-floyd-guidance` by setting `system.prompt_processor.use_perp_neg=true`.
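\n\nAs referenced in the batch-size tip above, here is a minimal sketch of combining a smaller per-step batch with gradient accumulation; all concrete numbers are illustrative, not recommendations:\n\n```sh\n# effective batch size = data.batch_size (2) x trainer.accumulate_grad_batches (4) = 8\n# step-based values such as trainer.max_steps are scaled by 4 to compensate (illustrative)\npython launch.py --config configs\u002Fdreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt=\"a delicious hamburger\" data.batch_size=2 trainer.accumulate_grad_batches=4 trainer.max_steps=40000\n```\n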
\n## VRAM Optimization\n\nIf you encounter a CUDA OOM error, try the following (roughly sorted by how strongly they are recommended) until the run fits into your VRAM.\n\n- If you only encounter OOM at validation\u002Ftest time, you can set `system.cleanup_after_validation_step=true` and `system.cleanup_after_test_step=true` to free memory after each validation\u002Ftest step. This will slow down validation\u002Ftesting.\n- Use a smaller batch size or use gradient accumulation as demonstrated [here](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#tips-on-improving-quality).\n- If you are using PyTorch 1.x, enable [memory efficient attention](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Foptimization\u002Ffp16#memory-efficient-attention) by setting `system.guidance.enable_memory_efficient_attention=true`. PyTorch 2.0 has built-in support for this optimization, which is enabled by default.\n- Enable [attention slicing](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Foptimization\u002Ffp16#sliced-attention-for-additional-memory-savings) by setting `system.guidance.enable_attention_slicing=true`. This will slow down training by ~20%.\n- If you are using StableDiffusionGuidance, you can use [Token Merging](https:\u002F\u002Fgithub.com\u002Fdbolya\u002Ftomesd) to **drastically** speed up computation and save memory. You can easily enable Token Merging by setting `system.guidance.token_merging=true`. You can also customize the Token Merging behavior by passing the parameters listed [here](https:\u002F\u002Fgithub.com\u002Fdbolya\u002Ftomesd\u002Fblob\u002Fmain\u002Ftomesd\u002Fpatch.py#L183-L213) via `system.guidance.token_merging_params`. Note that Token Merging may degrade generation quality.\n- Enable [sequential CPU offload](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Foptimization\u002Ffp16#offloading-to-cpu-with-accelerate-for-memory-savings) by setting `system.guidance.enable_sequential_cpu_offload=true`. This can save a lot of VRAM but will make training **extremely slow**.\n\n## Documentation\n\nthreestudio uses [OmegaConf](https:\u002F\u002Fgithub.com\u002Fomry\u002Fomegaconf) to manage configurations. You can change anything defined in the YAML configuration file, either by editing the file or by appending command-line arguments without the `--` prefix. We list all arguments that you can change in the configuration in our [documentation](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fblob\u002Fmain\u002FDOCUMENTATION.md). Happy experimenting!
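\n\nFor instance, several such overrides can be stacked in a single run; the values below are purely illustrative:\n\n```sh\n# override nested configuration entries directly from the command line\npython launch.py --config configs\u002Fdreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt=\"a delicious hamburger\" data.batch_size=2 trainer.max_steps=15000 seed=42\n```\n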
\n## wandb (Weights & Biases) logging\n\nTo enable the (experimental) wandb support, set `system.loggers.wandb.enable=true`, e.g.:\n\n```bash\npython launch.py --config configs\u002Fzero123.yaml --train --gpu 0 system.loggers.wandb.enable=true\n```\n\nIf you're using a corporate wandb server, you may first need to log in to your wandb instance, e.g.:\n`wandb login --host=https:\u002F\u002FCOMPANY_XYZ.wandb.io --relogin`\n\nBy default, runs are given a random name and recorded in the `threestudio` project. You can override the name to make it more descriptive, e.g.:\n\n`python launch.py --config configs\u002Fzero123.yaml --train --gpu 0 system.loggers.wandb.enable=true system.loggers.wandb.name=\"zero123xl_accum;bs=4;lr=0.05\"`\n\n## Contributing to threestudio\n\n- Fork the repository and create your branch from `main`.\n- Install development dependencies:\n\n```sh\npip install -r requirements-dev.txt\n```\n\n- If you are using VSCode as the text editor: (1) Install the `editorconfig` extension. (2) Set the default linter to mypy to enable static type checking. (3) Set the default formatter to black. You can format a document manually, or let the editor format it each time it is saved by setting `\"editor.formatOnSave\": true`.\n\n- Run `pre-commit install` to install pre-commit hooks which will automatically format the files before commit.\n\n- Make changes to the code, update README and DOCUMENTATION if needed, and open a pull request.\n\n### Code Structure\n\nHere we briefly introduce the code structure of this project. We will provide more detailed documentation in the future.\n\n- All methods are implemented as subclasses of `BaseSystem` (in `systems\u002Fbase.py`). A system typically contains six modules: geometry, material, background, renderer, guidance, and prompt_processor. All modules are subclasses of `BaseModule` (in `utils\u002Fbase.py`) except for guidance and prompt_processor, which are subclasses of `BaseObject` to prevent them from being treated as model parameters and to better control their behavior in multi-GPU settings.\n- All systems, modules, and data modules have their configurations in their own dataclasses.\n- Base configurations for the whole project can be found in `utils\u002Fconfig.py`. In the `ExperimentConfig` dataclass, `data`, `system`, and module configurations under `system` are parsed to configurations of each class mentioned above. These configurations are strictly typed, which means you can only use properties defined in the dataclass and must stick to the defined type of each property. This configuration paradigm (1) naturally supports default values for properties; (2) effectively prevents wrong assignments of these properties (say, typos in the YAML file) or inappropriate usage at runtime.\n- This project uses both static and runtime type checking. For more details, see `utils\u002Ftyping.py`.\n- To update anything of a module at each training step, simply make it inherit from `Updateable` (see `utils\u002Fbase.py`). At the beginning of each iteration, an `Updateable` will update itself, and update all its attributes that are also `Updateable`. 
Note that subclasses of `BaseSystem`, `BaseModule` and `BaseObject` inherit from `Updateable` by default.\n\n## Known Problems\n\n- Gradients of Vanilla MLP parameters are empty in AMP (temporarily fixed by disabling autocast).\n- FullyFused MLP may cause NaNs in 32-bit precision.\n\n## Credits\n\nthreestudio is built on the following amazing open-source projects:\n\n- **[Lightning](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flightning)** Framework for creating highly organized PyTorch code.\n- **[OmegaConf](https:\u002F\u002Fgithub.com\u002Fomry\u002Fomegaconf)** Flexible Python configuration system.\n- **[NerfAcc](https:\u002F\u002Fgithub.com\u002FKAIR-BAIR\u002Fnerfacc)** Plug-and-play NeRF acceleration.\n\nThe following repositories greatly inspire threestudio:\n\n- **[Stable-DreamFusion](https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion)**\n- **[Latent-NeRF](https:\u002F\u002Fgithub.com\u002Feladrich\u002Flatent-nerf)**\n- **[Score Jacobian Chaining](https:\u002F\u002Fgithub.com\u002Fpals-ttic\u002Fsjc)**\n- **[Fantasia3D.unofficial](https:\u002F\u002Fgithub.com\u002Fashawkey\u002Ffantasia3d.unofficial)**\n\nThanks to the maintainers of these projects for their contribution to the community!\n\n## Citing threestudio\n\nIf you find threestudio helpful, please consider citing:\n\n```\n@Misc{threestudio2023,\n  author =       {Yuan-Chen Guo and Ying-Tian Liu and Ruizhi Shao and Christian Laforte and Vikram Voleti and Guan Luo and Chia-Hao Chen and Zi-Xin Zou and Chen Wang and Yan-Pei Cao and Song-Hai Zhang},\n  title =        {threestudio: A unified framework for 3D content generation},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio}},\n  year =         {2023}\n}\n```\n","\u003Cp align=\"center\">\n    \u003Cpicture>\n    \u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_3a9d312e765b.png\" width=\"50%\">\n    \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Cb>\nthreestudio 是一个统一的框架，通过迁移 2D 文本到图像生成模型，实现从文本提示、单张图像以及少量示例图像进行 3D 内容创作。\n\u003C\u002Fb>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F19284678\u002Ff48eca9f-45a7-4092-a519-6bb99f4939e4.gif\" width=\"100%\">\n\u003Cbr\u002F>\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F19284678\u002F01a00207-3240-4a8e-aa6f-d48436370fe7.png\" width=\"100%\">\n\u003Cbr\u002F>\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_16245d00c4d2.png\" width=\"48%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_3b4cbfa870ee.png\" width=\"25%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_a53555dda57f.png\" width=\"24%\">\n\n\u003Cbr\u002F>\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_65c0e35c8bdc.png\" width=\"48%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_47adc23cadde.png\" width=\"25%\">\n\u003Cimg alt=\"threestudio\" 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_2d4cfdd37656.png\" width=\"25%\">\n\n\u003Cbr\u002F>\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_3196107bca58.png\" width=\"68%\">\n\u003Cbr\u002F>\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_a8e94b4a4ff6.png\" width=\"68%\">\n\u003Cbr\u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Cb>\n👆 由 threestudio 实现的方法所得到的结果 👆 \u003Cbr\u002F>\n| \u003Ca href=\"https:\u002F\u002Fml.cs.tsinghua.edu.cn\u002Fprolificdreamer\u002F\">ProlificDreamer\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fdreamfusion3d.github.io\u002F\">DreamFusion\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fdir\u002Fmagic3d\u002F\">Magic3D\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fpals.ttic.edu\u002Fp\u002Fscore-jacobian-chaining\">SJC\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Feladrich\u002Flatent-nerf\">Latent-NeRF\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Ffantasia3d.github.io\u002F\">Fantasia3D\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Ffabi92.github.io\u002Ftextmesh\u002F\">TextMesh\u003C\u002Fa> |\n\u003Cbr\u002F>\n| \u003Ca href=\"https:\u002F\u002Fzero123.cs.columbia.edu\u002F\">Zero-1-to-3\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fguochengqian.github.io\u002Fproject\u002Fmagic123\u002F\">Magic123\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FJunzheJosephZhu\u002FHiFA\">HiFA\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Flukoianov.com\u002Fsdi\">SDI\u003C\u002Fa> |\n\u003Cbr \u002F>\n| \u003Ca href=\"https:\u002F\u002Finstruct-nerf2nerf.github.io\u002F\">InstructNeRF2NeRF\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fcontrol4darxiv.github.io\u002F\">Control4D\u003C\u002Fa> |\n\u003C\u002Fb>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fthreestudio-project\u002Fthreestudio\u002Fblob\u002Fmain\u002Fthreestudio.ipynb\">\n  \u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fbennyguo\u002Fthreestudio\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Gradio%20Demo-Huggingface-orange\">\u003C\u002Fa>\n  \u003Ca href=\"http:\u002F\u002Ft23-g-01.threestudio.ai\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGradio%20Demo-Tencent-blue?logo=tencentqq&logoColor=white\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002Fejer2MAB8N\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-5865F2?logo=discord&logoColor=white\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    没有找到您想要的内容？请查看 \u003Ca href=\"https:\u002F\u002Fthreestudio-project.github.io\u002Fthreestudio-extensions\u002F\">\u003Cb>threestudio 扩展\u003C\u002Fb>\u003C\u002Fa> 或在此处提交功能请求 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fdiscussions\u002F46\">这里\u003C\u002Fa>！\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_4d241a114bf8.png\" width=\"68%\">\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n\u003Cimg alt=\"threestudio\" 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_416316e81d47.png\" width=\"35%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_1f6395841655.png\" width=\"35%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_512a44d6bac7.png\" width=\"18%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_a274906ee5ca.png\" width=\"35%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_b1fb6df5fd48.png\" width=\"35%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_d42ea5788b12.png\" width=\"18%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_d5885a5d7414.png\" width=\"35%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_f797cd6e3b71.png\" width=\"35%\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_4def89febabb.png\" width=\"45%\">\n\u003C\u002Fp>\n\u003Cp align=\"center\">\u003Cb>\n| \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FHeliosZhao\u002FAnimate124\u002Ftree\u002Fthreestudio\">Animate-124\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FDSaurus\u002Fthreestudio-4dfy\">4D-fy\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fbaaivision\u002FGeoDream\u002Ftree\u002Fthreestudio\">GeoDream\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FDSaurus\u002Fthreestudio-dreamcraft3D\">DreamCraft3D\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhuanngzh\u002Fthreestudio-dreamwaltz\">Dreamwaltz\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FKU-CVLAB\u002F3DFuse-threestudio\">3DFuse\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcxh0519\u002FProgressive3D\">Progressive3D\u003C\u002Fa> |  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcxh0519\u002Fthreestudio-gaussiandreamer\">GaussianDreamer\u003C\u002Fa> |  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FDSaurus\u002Fthreestudio-3dgs\">高斯泼溅\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FDSaurus\u002Fthreestudio-mvdream\">MVDream\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FDSaurus\u002Fthreestudio-meshfitting\">网格拟合\u003C\u002Fa> |\n\u003C\u002Fb>\n\n## 新闻\n- 2024年11月8日：感谢[Artem Lukoianov](https:\u002F\u002Fgithub.com\u002Fottogin)实现了[通过重参数化DDIM进行分数蒸馏](https:\u002F\u002Flukoianov.com\u002Fsdi)！Threestudio中新增了文本到3D模块，并附带了一个包含2D分数蒸馏实验的Notebook。\n- 2024年10月21日：感谢[Amir Barda](https:\u002F\u002Fgithub.com\u002Famirbarda)实现了[MagicClay](https:\u002F\u002Fgithub.com\u002Famirbarda\u002FMagicClay)！请按照其官网上的说明尝试使用。\n- 2024年3月12日：感谢[Matthew Kwak](https:\u002F\u002Fgithub.com\u002Fmskwak01)和[Inès Hyeonsu Kim](https:\u002F\u002Fgithub.com\u002FInes-Hyeonsu-Kim)实现了[3DFuse](https:\u002F\u002Fgithub.com\u002FKU-CVLAB\u002F3DFuse-threestudio)！请按照其官网上的说明尝试使用。\n- 2024年3月8日：感谢[Xinhua 
Cheng](https:\u002F\u002Fgithub.com\u002Fcxh0519\u002F)实现了[GaussianDreamer](https:\u002F\u002Fgithub.com\u002Fcxh0519\u002Fthreestudio-gaussiandreamer)！请按照其官网上的说明尝试使用。\n- 2024年3月1日：感谢[Xinhua Cheng](https:\u002F\u002Fgithub.com\u002Fcxh0519\u002F)实现了[Progressive3D](https:\u002F\u002Fgithub.com\u002Fcxh0519\u002FProgressive3D)！请按照其官网上的说明尝试使用。\n- 2024年1月9日：感谢[Zehuan Huang](https:\u002F\u002Fgithub.com\u002Fhuanngzh)实现了3D人体虚拟形象生成[Dreamwaltz](https:\u002F\u002Fgithub.com\u002Fhuanngzh\u002Fthreestudio-dreamwaltz)！请按照其官网上的说明尝试使用。\n- 2024年1月6日：感谢[Baorui Ma](https:\u002F\u002Fgithub.com\u002Fmabaorui)实现了[GeoDream扩展](https:\u002F\u002Fgithub.com\u002Fbaaivision\u002FGeoDream\u002Ftree\u002Fthreestudio)！请按照其官网上的说明尝试使用。\n- 2024年1月5日：实现了HiFA。请按照[这里](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#hifa-)的说明尝试所有三种变体。\n- 2023年12月23日：感谢[Yuyang Zhao](https:\u002F\u002Fgithub.com\u002FHeliosZhao)实现了图像到4D生成扩展[Animate-124](https:\u002F\u002Fgithub.com\u002FHeliosZhao\u002FAnimate124\u002Ftree\u002Fthreestudio)！请按照扩展官网上的说明尝试使用。\n- 2023年12月18日：作为自定义扩展，实现了用于4D生成的[4D-fy](https:\u002F\u002Fgithub.com\u002FDSaurus\u002Fthreestudio-4dfy)以及用于高质量图像到3D生成的[DreamCraft3D](https:\u002F\u002Fgithub.com\u002FDSaurus\u002Fthreestudio-dreamcraft3D)！请按照扩展官网上的说明尝试使用。\n- 2023年12月13日：实现了对[Stable Zero123](https:\u002F\u002Fstability.ai\u002Fnews\u002Fstable-zero123-3d-generation)的支持，可用于从单张图片生成3D模型！请按照[这里](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#stable-zero123)的说明尝试使用。\n- 2023年11月30日：实现了[MVDream](https:\u002F\u002Fgithub.com\u002FDSaurus\u002Fthreestudio-mvdream)和[Gaussian Splatting](https:\u002F\u002Fgithub.com\u002FDSaurus\u002Fthreestudio-3dgs)作为自定义扩展。您还可以使用神经表示通过[Mesh-Fitting](https:\u002F\u002Fgithub.com\u002FDSaurus\u002Fthreestudio-meshfitting)来拟合网格。\n- 2023年11月30日：实现了[自定义扩展系统](https:\u002F\u002Fthreestudio-project.github.io\u002Fthreestudio-extensions\u002F)，您可以在[这个项目](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio-extensions)中添加自己的扩展。\n- 2023年6月25日：实现了[Magic123](https:\u002F\u002Fguochengqian.github.io\u002Fproject\u002Fmagic123\u002F)！请按照[这里](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#magic123-)的说明尝试使用。\n- 2023年7月6日：加入我们的[Discord服务器](https:\u002F\u002Fdiscord.gg\u002Fejer2MAB8N)，参与热烈讨论！\n- 2023年7月3日：您可以在[HuggingFace Spaces](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fbennyguo\u002Fthreestudio)上在线体验文本到3D功能，或使用我们的[自托管服务](http:\u002F\u002Ft23-g-01.threestudio.ai)（腾讯提供GPU支持）。如需在本地部署Web界面，请参阅[这里](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#gradio-web-interface)。\n- 2023年6月20日：实现了Instruct-NeRF2NeRF和Control4D，用于高保真度的3D编辑！请按照[Control4D](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#control4d-)和[Instruct-NeRF2NeRF](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#instructnerf2nerf-)的说明尝试使用。\n- 2023年6月14日：实现了TextMesh！请按照[这里](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#textmesh-)的说明尝试使用。\n- 2023年6月14日：实现了[提示去偏见](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.15413)和[Perp-Neg](https:\u002F\u002Fperp-neg.github.io\u002F)！请按照[这里](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#tips-on-improving-quality)的说明尝试使用。\n- 2023年5月29日：实验性地实现了使用[Zero-1-to-3](https:\u002F\u002Fzero123.cs.columbia.edu\u002F)从单张图片生成3D模型！请按照[这里](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#zero-1-to-3-)的说明尝试使用。\n- 
2023年5月26日：实现了[ProlificDreamer](https:\u002F\u002Fml.cs.tsinghua.edu.cn\u002Fprolificdreamer\u002F)！请按照[这里](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#prolificdreamer-)的说明尝试使用。\n- 2023年5月14日：您可以通过我们的[2dplayground](2dplayground.ipynb)在2D图像上试验SDS损失。\n- 2023年5月13日：现在您可以在[Google Colab](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fthreestudio-project\u002Fthreestudio\u002Fblob\u002Fmain\u002Fthreestudio.ipynb)上试用Threestudio！\n- 2023年5月11日：我们现在支持导出带有纹理的网格！有关说明，请参阅[这里](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#export-meshes)。\n\n![export-blender](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_493bdc727347.png)\n\n## 安装\n\n更多信息请参阅[installation.md](docs\u002Finstallation.md)，包括通过Docker安装的方法。\n\n以下步骤已在Ubuntu 20.04上测试通过。\n\n- 您必须拥有至少6GB显存的NVIDIA显卡，并已安装[CUDA](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads)。\n- 安装`Python >= 3.8`。\n- （可选，推荐）创建虚拟环境：\n\n```sh\npython3 -m virtualenv venv\n. venv\u002Fbin\u002Factivate\n\n# 较新的pip版本，例如pip-23.x，通常比旧版本（如pip-20.x）快得多。\n# 例如，它会缓存Git包的轮子文件，以避免后续不必要的重新构建。\npython3 -m pip install --upgrade pip\n```\n\n- 安装`PyTorch >= 1.12`。我们已在`torch1.12.1+cu113`和`torch2.0.0+cu118`上进行了测试，但其他版本也应该可以正常工作。\n\n```sh\n# torch1.12.1+cu113\npip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu113\n\n# 或者 torch2.0.0+cu118\npip install torch torchvision --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n```\n\n- （可选，推荐）安装 ninja 以加快 CUDA 扩展的编译速度：\n\n```sh\npip install ninja\n```\n\n- 安装依赖项：\n\n```sh\npip install -r requirements.txt\n```\n\n- （可选）`tiny-cuda-nn` 的安装可能需要将 pip 降级到 23.0.1。\n\n- （可选，推荐）threestudio 中性能最佳的模型使用了新发布的 T2I 模型 [DeepFloyd IF](https:\u002F\u002Fgithub.com\u002Fdeep-floyd\u002FIF)，该模型目前需要签署许可协议。如果您希望使用这些模型，您需要在 [DeepFloyd IF 的模型卡片上接受许可协议](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-I-XL-v1.0)，并在终端中通过 `huggingface-cli login` 登录 Hugging Face Hub。\n\n- 对于贡献者，请参阅 [这里](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#contributing-to-threestudio)。\n\n## 快速入门\n\n下面我们展示 threestudio 的一些基本用法。首先让我们训练一个 DreamFusion 模型来创建一只经典的煎饼兔子。\n\n**如果您在与 Hugging Face 的连接中遇到不稳定的情况，我们建议您：(1) 在首次运行成功获取所有所需文件后，在运行命令之前设置环境变量 `TRANSFORMERS_OFFLINE=1 DIFFUSERS_OFFLINE=1 HF_HUB_OFFLINE=1`，以避免每次运行时都尝试连接 Hugging Face；或者 (2) 按照 [这里](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fv0.14.1\u002Fguides\u002Fdownload#download-an-entire-repository) 和 [这里](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhuggingface_hub\u002Fv0.14.1\u002Fguides\u002Fdownload#download-files-to-local-folder) 将您使用的指导模型下载到本地文件夹，并将指导模型和提示处理器的 `pretrained_model_name_or_path` 设置为本地路径。**\n\n```sh\n# 如果您已同意 DeepFloyd IF 的许可协议且显存大于 20GB\n# 请尝试此配置以获得更高质量的效果\npython launch.py --config configs\u002Fdreamfusion-if.yaml --train --gpu 0 system.prompt_processor.prompt=\"一张从远处拍摄的婴儿兔子坐在一摞煎饼上的单反照片\"\n# 否则，您可以尝试使用 Stable Diffusion 模型，该模型只需 6GB 显存即可运行\npython launch.py --config configs\u002Fdreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt=\"一张从远处拍摄的婴儿兔子坐在一摞煎饼上的单反照片\"\n```\n\nthreestudio 使用 [OmegaConf](https:\u002F\u002Fgithub.com\u002Fomry\u002Fomegaconf) 实现灵活的配置。您可以通过指定不带 `--` 的参数轻松更改 YAML 文件中的任何配置，例如上述示例中的指定提示。有关所有支持的配置，请参阅我们的 [文档](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fblob\u002Fmain\u002FDOCUMENTATION.md)。\n\n训练将持续 10,000 次迭代。您可以在默认为 `[exp_root_dir]\u002F[name]\u002F[tag]@[timestamp]` 的试验目录中找到当前状态的可视化结果，其中 
`exp_root_dir`（默认为 `outputs\u002F`）、`name` 和 `tag` 可以在配置文件中设置。训练完成后将生成一段 360 度视频。在训练过程中，按一次 `ctrl+c` 将停止训练并直接进入生成视频的测试阶段。再次按下 `ctrl+c` 则会完全退出程序。\n\n### 多 GPU 训练\n\n多 GPU 训练已被支持，但仍可能存在 [bug](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fissues\u002F195)。请注意，`data.batch_size` 是 **每个进程（设备）** 的批次大小。此外，请务必：\n\n- 将 `data.n_val_views` 设置为 GPU 数量的倍数。\n- 设置唯一的 `tag`，因为在多 GPU 训练中时间戳功能已被禁用，不会附加在标签之后。如果您使用与先前试验相同的标签，保存的配置文件、代码和可视化结果将会被覆盖。\n\n```sh\n# 这将导致有效的批次大小为 4（GPU 数量）* 2（data.batch_size）= 8\npython launch.py --config configs\u002Fdreamfusion-if.yaml --train --gpu 0,1,2,3 system.prompt_processor.prompt=\"一张从远处拍摄的婴儿兔子坐在一摞煎饼上的单反照片\" data.batch_size=2 data.n_val_views=4\n```\n\n如果您在调用 `launch.py` 之前定义了 `CUDA_VISIBLE_DEVICES` 环境变量，则无需指定 `--gpu`——这将使用 `CUDA_VISIBLE_DEVICES` 中的所有可用 GPU。例如，以下命令将自动使用 GPU 3 和 4：\n\n`CUDA_VISIBLE_DEVICES=3,4 python launch.py --config configs\u002Fdreamfusion-if.yaml --train system.prompt_processor.prompt=\"一张从远处拍摄的婴儿兔子坐在一摞煎饼上的单反照片\"`\n\n这在您通过集群运行 `launch.py` 并使用自动选择 GPU 并通过 `CUDA_VISIBLE_DEVICES` 导出其 ID 的命令时特别有用，例如通过 SLURM：\n\n```bash\ncd git\u002Fthreestudio\n. venv\u002Fbin\u002Factivate\nsrun --account mod3d --partition=g40 --gpus=1 --job-name=3s_bunny python launch.py --config configs\u002Fdreamfusion-if.yaml --train system.prompt_processor.prompt=\"一张从远处拍摄的婴儿兔子坐在一摞煎饼上的单反照片\"\n```\n\n### 从检查点恢复\n\n如果您想从检查点恢复训练，可以执行以下操作：\n\n```sh\n# 从最后一个检查点恢复训练，您可以将 last.ckpt 替换为其他检查点\npython launch.py --config path\u002Fto\u002Ftrial\u002Fdir\u002Fconfigs\u002Fparsed.yaml --train --gpu 0 resume=path\u002Fto\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt\n# 如果训练已完成，您仍然可以通过设置 trainer.max_steps 来继续训练更长时间\npython launch.py --config path\u002Fto\u002Ftrial\u002Fdir\u002Fconfigs\u002Fparsed.yaml --train --gpu 0 resume=path\u002Fto\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt trainer.max_steps=20000\n# 您也可以使用恢复的检查点进行测试\npython launch.py --config path\u002Fto\u002Ftrial\u002Fdir\u002Fconfigs\u002Fparsed.yaml --test --gpu 0 resume=path\u002Fto\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt\n# 请注意，上述命令使用的是先前试验的解析配置文件\n# 这些命令将继续使用相同的试验目录\n# 如果您想保存到新的试验目录，请在命令中将 parsed.yaml 替换为 raw.yaml\n\n# 仅加载保存的检查点中的权重，但不恢复训练（即不加载优化器状态）：\npython launch.py --config path\u002Fto\u002Ftrial\u002Fdir\u002Fconfigs\u002Fparsed.yaml --train --gpu 0 system.weights=path\u002Fto\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt\n```\n\n### 导出网格\n\n要将场景导出为带有纹理的网格，请使用 `--export` 选项。我们目前支持导出为 obj+mtl 格式，或带有顶点颜色的 obj 格式。\n\n```sh\n# 这将使用默认的网格导出配置，导出 obj+mtl 格式\npython launch.py --config path\u002Fto\u002Ftrial\u002Fdir\u002Fconfigs\u002Fparsed.yaml --export --gpu 0 resume=path\u002Fto\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt system.exporter_type=mesh-exporter\n# 指定 system.exporter.fmt=obj 即可获得带有顶点颜色的 obj 格式\n\n# 你也可以添加 system.exporter.save_uv=false 来加速导出过程，适合快速预览结果\npython launch.py --config path\u002Fto\u002Ftrial\u002Fdir\u002Fconfigs\u002Fparsed.yaml --export --gpu 0 resume=path\u002Fto\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt system.exporter_type=mesh-exporter system.exporter.fmt=obj\n# 对于基于 NeRF 的方法（DreamFusion、Magic3D coarse、Latent-NeRF、SJC）\n# 你可能需要调整等值面阈值（默认为 25），以获得满意的效果\n# 如果提取的模型不完整，可以降低阈值；如果模型过度膨胀，则提高阈值\npython launch.py --config path\u002Fto\u002Ftrial\u002Fdir\u002Fconfigs\u002Fparsed.yaml --export --gpu 0 resume=path\u002Fto\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt system.exporter_type=mesh-exporter system.geometry.isosurface_threshold=10.\n# 使用更高分辨率的 marching cubes 算法来获取更精细的模型\npython launch.py --config path\u002Fto\u002Ftrial\u002Fdir\u002Fconfigs\u002Fparsed.yaml 
--export --gpu 0 resume=path\u002Fto\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt system.exporter_type=mesh-exporter system.geometry.isosurface_method=mc-cpu system.geometry.isosurface_resolution=256\n```\n\n有关导出时可指定的所有选项，请参阅 [文档](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fblob\u002Fmain\u002FDOCUMENTATION.md#exporters)。\n\n关于我们所有支持模型的示例运行命令，请参阅 [此处](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#supported-models)。如需获取更高质量的结果，请参考 [此处](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#tips-on-improving-quality)，而要减少显存占用，请参阅 [此处](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#vram-optimization)。\n\n### Gradio Web 界面\n\n通过以下命令启动 Gradio Web 界面：\n\n```\npython gradio_app.py launch\n```\n\n参数说明：\n\n- `--listen`：在启动 Gradio 应用时将 `server_name=\"0.0.0.0\"` 设置为监听所有地址。\n- `--self-deploy`：允许直接从网页更改任意配置。\n- `--save`：启用检查点保存功能。\n\n如需功能请求、错误报告或技术问题讨论，请 [提交 issue](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fissues\u002Fnew)。如果您想讨论生成质量或展示您的生成成果，欢迎参与 [讨论区](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fdiscussions)。\n\n## 支持的模型\n\n### 基于重参数化 DDIM 的分数蒸馏（SDI）[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2405.15891-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2405.15891)\n\nSDI 提议重新思考 Dreamfusion 中噪声项采样的方法。论文表明，分数蒸馏过程可以被视为对二维图像采样算法的重新参数化。在这种情况下，分数蒸馏每一步添加的噪声应具有非常特定的形式。然而，在 Dreamfusion（SDS）中，噪声是随机采样的，这会导致过度模糊。SDI 通过反向 DDIM 过程来近似正确的噪声项。\n\n与原论文相比的显著差异：无。\n\n优点：\n* 纹理质量高\n* 几何细节清晰\n\n缺点：\n* 由于额外的反向过程，速度比 SDS 慢 1.5 倍。不过，由于步骤较少，仍比 Prolific Dreamer 快。\n* 由于渲染分辨率较高，所需显存比 SDS 更多。若使用显存较小的 GPU，可适当降低分辨率。\n\n**在 threestudio 中使用 Stable Diffusion (512x512) 获得的结果**\n\n\u003Cimg alt=\"A_DSLR_photo_of_a_freshly_baked_round_loaf_of_sourdough_bread\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_8bfaaad12a9d.png\" width=\"48%\">\n\u003Cimg alt=\"a_photograph_of_a_knight\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_ecb24fc4e852.png\" width=\"48%\">\n\n**示例运行命令**\n```sh\npython launch.py --config configs\u002Fsdi.yaml --train --gpu 0 system.prompt_processor.prompt=\"南瓜头僵尸，瘦削，高度细节，照片级真实感\"\n\npython launch.py --config configs\u002Fsdi.yaml --train --gpu 1 system.prompt_processor.prompt=\"忍者的照片\"\n\npython launch.py --config configs\u002Fsdi.yaml --train --gpu 2 system.prompt_processor.prompt=\"汉堡的广角 DSLR 照片\"\n\npython launch.py --config configs\u002Fsdi.yaml --train --gpu 3 system.prompt_processor.prompt=\"夹有奶油芝士和熏鲑鱼的贝果\"\n```\n\n### ProlificDreamer [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2305.16213-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.16213)\n\n**这是一个非官方的实验性实现！请参阅 [https:\u002F\u002Fgithub.com\u002Fthu-ml\u002Fprolificdreamer](https:\u002F\u002Fgithub.com\u002Fthu-ml\u002Fprolificdreamer) 获取官方代码发布。**\n\n**threestudio 使用 Stable Diffusion (256x256 Stage1) 获得的结果**\n\nhttps:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F19284678\u002F27b42d8f-4aa4-4b47-8ea0-0f77db90fd1e\n\nhttps:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F19284678\u002Fffcbbb01-3817-4663-a2bf-5e21a076bc3d\n\n**threestudio 使用 Stable Diffusion (256x256 Stage1, 512x512 Stage2+3) 
获得的结果**\n\nhttps:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F19284678\u002Fcfab881e-18dc-45fc-8384-7476f835b36e\n\n与原论文相比的显著差异：\n\n- ProlificDreamer 采用两阶段采样策略，即先进行 64 次粗略采样，再进行 32 次精细采样，而我们仅使用 512 次粗略采样。\n- 在第一阶段，我们只在前 5000 次迭代中渲染 64x64 的图像。之后，由于空余空间已被有效裁剪，渲染 512x512 的图像并不会消耗太多显存。\n- 目前我们还不支持多粒子模式。\n\n```sh\n# --------- 第一阶段（NeRF）--------- #\n# 物体生成，采用 512x512 NeRF 渲染，约需 30GB 显存\npython launch.py --config configs\u002Fprolificdreamer.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个菠萝\"\n# 如果显存不足，可以尝试使用 64x64 NeRF 渲染，约需 15GB 显存\npython launch.py --config configs\u002Fprolificdreamer.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个菠萝\" data.width=64 data.height=64 data.batch_size=1\n# 使用同一模型进行预训练和 LoRA 训练，可以在 \u003C10GB 显存下进行 64x64 训练\n# 但质量会稍差，因为 LoRA 训练使用的是 epsilon 预测模型\npython launch.py --config configs\u002Fprolificdreamer.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个菠萝\" data.width=64 data.height=64 data.batch_size=1 system.guidance.pretrained_model_name_or_path_lora=\"stabilityai\u002Fstable-diffusion-2-1-base\"\n# 使用基于补丁的渲染器以减少内存消耗，分辨率为 512x512，约需 20GB 显存\npython launch.py --config configs\u002Fprolificdreamer-patch.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个菠萝\"\n# 场景生成，采用 512x512 NeRF 渲染，约需 30GB 显存\npython launch.py --config configs\u002Fprolificdreamer-scene.yaml --train --gpu 0 system.prompt_processor.prompt=\"智能家居内部，写实细腻的照片，4k\"\n\n# --------- 第二阶段（几何精修）--------- #\n# 使用 512x512 光栅化细化几何体，结合 Stable Diffusion SDS 指导\npython launch.py --config configs\u002Fprolificdreamer-geometry.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个菠萝\" system.geometry_convert_from=path\u002Fto\u002Fstage1\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt\n\n# --------- 第三阶段（纹理化）--------- #\n# 使用 512x512 光栅化进行纹理化，结合 Stable Diffusion VSD 指导\npython launch.py --config configs\u002Fprolificdreamer-texture.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个菠萝\" system.geometry_convert_from=path\u002Fto\u002Fstage2\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt\n```
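\n\n下面是一个把三个阶段串联起来的最小示例脚本（仅作示意：其中的试验目录路径为占位符，请替换为各阶段实际生成的目录）：\n\n```sh\n# 端到端串联三个阶段（试验目录路径为占位符）\nPROMPT=\"一个菠萝\"\npython launch.py --config configs\u002Fprolificdreamer.yaml --train --gpu 0 system.prompt_processor.prompt=\"$PROMPT\"\n# 第一阶段结束后，将 path\u002Fto\u002Fstage1\u002Ftrial\u002Fdir 替换为实际试验目录（默认位于 outputs\u002F 下）\npython launch.py --config configs\u002Fprolificdreamer-geometry.yaml --train --gpu 0 system.prompt_processor.prompt=\"$PROMPT\" system.geometry_convert_from=path\u002Fto\u002Fstage1\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt\n# 第二阶段结束后，同样替换为第二阶段的试验目录\npython launch.py --config configs\u002Fprolificdreamer-texture.yaml --train --gpu 0 system.prompt_processor.prompt=\"$PROMPT\" system.geometry_convert_from=path\u002Fto\u002Fstage2\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt\n```\n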
\n### HiFA [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2305.18766-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.18766)\n**这是一个重新实现的版本，缺少了原论文中的一些改进（从粗到精的NeRF采样、核平滑）。如需原始结果，请参考[https:\u002F\u002Fgithub.com\u002FJunzheJosephZhu\u002FHiFA](https:\u002F\u002Fgithub.com\u002FJunzheJosephZhu\u002FHiFA)**\n\nHiFA更像是一套改进方案，包括图像空间SDS、z方差损失和噪声强度退火等。它兼容大多数基于优化的方法。因此，我们提供了基于DreamFusion、ProlificDreamer和Magic123的三种变体。我们还提供了一个统一的指导配置以及针对DreamFusion和ProlificDreamer变体的SDS\u002FVSD指导配置，这两种配置应该能达到相同的效果。此外，我们也使HiFA与ProlificDreamer-scene兼容。\n\n**由threestudio获得的结果（Dreamfusion-HiFA，512×512）**\n\nhttps:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F24391451\u002Fc0030c66-0691-4ec2-8b79-d933101864a0\n\n**由threestudio获得的结果（ProlificDreamer-HiFA，512×512）**\n\nhttps:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F24391451\u002Fff5dc4d0-d7d7-4a73-964e-84b8c48e2907\n\n**由threestudio获得的结果（Magic123-HiFA，512×512）**\n\nhttps:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F24391451\u002Feb6f2f74-9143-4e26-8429-e300ad2d2b80\n\n**示例运行命令**\n\n```sh\n# ------ DreamFusion-HiFA ------- # (类似于原论文)\npython launch.py --config configs\u002Fhifa.yaml --train --gpu 0 system.prompt_processor.prompt=\"一盘美味的塔可\"\npython launch.py --config configs\u002Fexperimental\u002Funified-guidance\u002Fhifa.yaml --train --gpu 0 system.prompt_processor.prompt=\"一盘美味的塔可\"\n# ------ ProlificDreamer-HiFA ------- #\npython launch.py --config configs\u002Fprolificdreamer-hifa.yaml --train --gpu 0 system.prompt_processor.prompt=\"一盘美味的塔可\"\npython launch.py --config configs\u002Fexperimental\u002Funified-guidance\u002Fprolificdreamer-hifa.yaml --train --gpu 0 system.prompt_processor.prompt=\"一盘美味的塔可\"\n# ------ ProlificDreamer-scene-HiFA ------- #\npython launch.py --config configs\u002Fprolificdreamer-scene-hifa.yaml --train --gpu 0 system.prompt_processor.prompt=\"餐厅内的一张单反相机拍摄的汉堡照片\"\n# ------ Magic123-HiFA ------ #\npython launch.py --config configs\u002Fmagic123-hifa-coarse-sd.yaml --train --gpu 0 data.image_path=load\u002Fimages\u002Ffirekeeper_rgba.png system.prompt_processor.prompt=\"黑暗之魂中的守火者玩具人偶\"\n# 我们为magic123的细化阶段提供了一个配置，但并未真正运行，因为粗略阶段的结果已经相当不错。\n```\n\n**提示**\n\n- 如果生成物体的颜色显得过于饱和，可以降低lambda_sds_img（或使用统一指导时的lambda_sd_img）。\n- 如果生成物体看起来模糊不清，可以增加lambda_z_variance；如果形状变得扭曲，则应降低lambda_z_variance。\n- 如果生成物体整体亮度较高，可以提高min_step_percent。\n- 确保sqrt_anneal和use_img_loss都设置为True。\n- 请查看[原仓库](https:\u002F\u002Fgithub.com\u002FJunzheJosephZhu\u002FHiFA)！那里的效果更好。\n- **如果您使用sqrt_anneal，请确保system.guidance.trainer_max_steps等于trainer.max_steps，以便噪声强度退火能够正常工作。**\n\n### DreamFusion [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2209.14988-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2209.14988)\n\n**由threestudio获得的结果（DeepFloyd IF，批次大小8）**\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F19284678\u002F236694848-38ae4ea4-554b-4c9d-b4c7-fba5bee3acb3.mp4\n\n**与论文相比的显著差异**\n\n- 我们使用开源的T2I模型（StableDiffusion、DeepFloyd IF），而论文中使用的是Imagen。\n- 对于DeepFloyd IF，我们采用20的指导尺度，而论文中对Imagen使用的是100。\n- 我们没有使用sigmoid函数来归一化反照率颜色，而是直接将颜色从`[-1,1]`缩放到`[0,1]`，因为我们发现这样有助于收敛。\n- 我们使用HashGrid编码，并沿光线均匀采样点，而论文中则采用了集成位置编码和来自MipNeRF360的采样策略。\n- 我们借鉴了Magic3D的相机设置和密度初始化策略，这与DreamFusion论文中的做法略有不同。\n- 一些超参数也有所不同，例如损失项的权重。\n\n**示例运行命令**\n\n```sh\n# 使用DeepFloyd IF，提取文本嵌入需要约15GB显存，训练时约需10GB显存\n# 在这里，我们采用了随机背景增强来提升几何质量\npython launch.py --config configs\u002Fdreamfusion-if.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个美味的汉堡\" system.background.random_aug=true\n# 使用StableDiffusion，训练时约需6GB显存\npython launch.py --config configs\u002Fdreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个美味的汉堡\"\n```\n\n**提示**\n\n- DeepFloyd IF的表现**远远优于**StableDiffusion。\n- 测试显示，在`system.material.ambient_only_steps`之前是反照率颜色，之后则是着色后的颜色。\n- 如果场景中漂浮物过多或变得空洞，可以尝试调整`system.loss.lambda_sparsity`的值。\n- 如果物体显得模糊或过度平滑，可以调整`system.loss.lambda_orient`。\n- 如果发现模型错误地将背景视为物体的一部分，可以尝试将背景替换为随机颜色，概率设为0.5，即设置`system.background.random_aug=true`。\n- DeepFloyd IF使用T5-XXL作为文本编码器，即使采用8位量化，也会消耗约15GB显存。目前这是在显存较少的情况下训练的瓶颈。如果有人知道如何用更少的显存运行文本编码器，请提交问题。我们也在尝试将文本编码器部署到[Replicate](https:\u002F\u002Freplicate.com\u002F)上，以便通过API提取文本嵌入，但遇到了一些网络连接问题。如果您愿意帮忙，请联系[bennyguo](mailto:imbennyguo@gmail.com)。\n\n### Magic3D [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2211.10440-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.10440)\n\n**由 threestudio 获得的结果（DeepFloyd IF，批量大小 8；第一行：粗略阶段，第二行：细化阶段）**\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F19284678\u002F236694858-0ed6939e-cd7a-408f-a94b-406709ae90c0.mp4\n\n**与论文相比的显著差异**\n\n- 我们在粗略阶段使用开源的 T2I 模型（StableDiffusion、DeepFloyd IF），而论文中使用的是 eDiff-I。\n- 在粗略阶段，我们为 DeepFloyd IF 使用 20 的指导尺度，而论文中 eDiff-I 使用的是 100。\n- 在粗略阶段，我们使用解析法计算的法线，而论文中使用的是预测的法线。\n- 在粗略阶段，我们采用了 DreamFusion 中的方向损失，而论文中没有使用。\n- 论文中省略了许多内容，例如损失项的权重以及 DMTet 网格分辨率等，这些可能有所不同。\n\n**示例运行命令**\n\n首先训练粗略阶段的 NeRF：\n\n```sh\n# 使用 DeepFloyd IF，提取文本嵌入需要约 15GB 显存，训练时约需 
10GB 显存\npython launch.py --config configs\u002Fmagic3d-coarse-if.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个美味的汉堡\"\n# 使用 StableDiffusion，训练时约需 6GB 显存\npython launch.py --config configs\u002Fmagic3d-coarse-sd.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个美味的汉堡\"\n```\n\n然后将粗略阶段的 NeRF 转换为 DMTet，并使用可微光栅化进行训练：\n\n```sh\n# 细化阶段使用 StableDiffusion，训练时约需 5GB 显存\npython launch.py --config configs\u002Fmagic3d-refine-sd.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个美味的汉堡\" system.geometry_convert_from=path\u002Fto\u002Fcoarse\u002Fstage\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt\n# 如果对默认阈值（25）提取的表面不满意，\n# 可以通过 `system.geometry_convert_override` 指定阈值：\n# 表面不完整时降低阈值，过度膨胀时则提高。\npython launch.py --config configs\u002Fmagic3d-refine-sd.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个美味的汉堡\" system.geometry_convert_from=path\u002Fto\u002Fcoarse\u002Fstage\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt system.geometry_convert_override.isosurface_threshold=10.\n```\n\n**提示**\n\n- 对于粗略阶段，DeepFloyd IF 的表现 **远优于** StableDiffusion。\n- Magic3D 使用神经网络预测表面法线，这可能与真实的几何法线不符，从而降低几何质量，因此我们改用解析法计算的法线。\n- 如果场景中漂浮物过多或过于空洞，可以尝试调整 `system.loss.lambda_sparsity` 的值。\n- 如果物体显得模糊或过度平滑，可以尝试调整 `system.loss.lambda_orient`。\n- 如果发现模型错误地将背景视为物体的一部分，可以通过设置 `system.background.random_aug=true`，以 50% 的概率用随机颜色替换背景。\n\n### Score Jacobian Chaining [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2212.00774-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.00774)\n\n**由 threestudio 获得的结果（Stable Diffusion）**\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F19284678\u002F236694871-87a247c1-2d3d-4cbf-89df-450bfeac3aca.mp4\n\n与论文相比的显著差异：无。\n\n**示例运行命令**\n\n```sh\n# 在潜在空间中使用 SJC 引导进行训练\npython launch.py --config configs\u002Fsjc.yaml --train --gpu 0 system.prompt_processor.prompt=\"一张高质量的美味汉堡照片\"\n# 在潜在空间中使用 SJC 引导进行训练，特朗普人偶\npython launch.py --config configs\u002Fsjc.yaml --train --gpu 0 system.prompt_processor.prompt=\"特朗普人偶\" trainer.max_steps=30000 system.loss.lambda_emptiness=\"[15000,10000.0,200000.0,15001]\" system.optimizer.params.background.lr=0.05 seed=42\n```\n\n**提示**\n\n- SJC 使用亚像素渲染技术，解码一个 `128x128` 的潜在特征图，以获得更好的可视化效果。如果在验证或测试时希望节省显存，可以通过设置 `system.subpixel_rendering=false` 关闭此功能。
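\n\n例如，可以在测试阶段关闭亚像素渲染以节省显存（仅作示意，路径为占位符；resume 的用法见上文「从检查点恢复」一节）：\n\n```sh\n# 测试时关闭亚像素渲染，降低解码潜在特征图的显存占用\npython launch.py --config path\u002Fto\u002Ftrial\u002Fdir\u002Fconfigs\u002Fparsed.yaml --test --gpu 0 resume=path\u002Fto\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt system.subpixel_rendering=false\n```\n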
\n### Latent-NeRF [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2211.07600-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.07600)\n\n**由 threestudio 获得的结果（Stable Diffusion）**\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F19284678\u002F236694876-5a270347-6a41-4429-8909-44c90c554e06.mp4\n\n与论文相比的显著差异：无。\n\n目前我们仅实现了用于文本引导的 Latent-NeRF，以及用于（文本、形状）引导的 Sketch-Shape 三维生成。Latent-Paint 尚未实现。\n\n**示例运行命令**\n\n```sh\n# 在 Stable Diffusion 潜在空间中训练 Latent-NeRF\npython launch.py --config configs\u002Flatentnerf.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个美味的汉堡\"\n# 在 RGB 空间中细化 Latent-NeRF\npython launch.py --config configs\u002Flatentnerf-refine.yaml --train --gpu 0 system.prompt_processor.prompt=\"一个美味的汉堡\" system.weights=path\u002Fto\u002Flatent\u002Fstage\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt\n\n# 在 Stable Diffusion 潜在空间中训练 Sketch-Shape\npython launch.py --config configs\u002Fsketchshape.yaml --train --gpu 0 system.guide_shape=load\u002Fshapes\u002Fteddy.obj system.prompt_processor.prompt=\"穿着燕尾服的泰迪熊\"\n# 在 RGB 空间中细化 Sketch-Shape\npython launch.py --config configs\u002Fsketchshape-refine.yaml --train --gpu 0 system.guide_shape=load\u002Fshapes\u002Fteddy.obj system.prompt_processor.prompt=\"穿着燕尾服的泰迪熊\" system.weights=path\u002Fto\u002Flatent\u002Fstage\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt\n```\n\n### Fantasia3D [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2303.13873-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.13873)\n\n**由 threestudio 获得的结果（Stable Diffusion）**\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F19284678\u002F236694880-33b0db21-4530-47f1-9c3b-c70357bc84b3.mp4\n\n**由 threestudio 获得的结果（Stable Diffusion，网格初始化）**\n\nhttps:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F19284678\u002F762903c1-665b-47b5-a2c2-bd7021a9e548.mp4\n\n\u003Cp align=\"center\">\n\u003Cimg alt=\"threestudio\" src=\"https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F19284678\u002F2d22e30f-4a32-454a-a06e-d6e6bd2a1b96.png\" width=\"100%\">\n\u003C\u002Fp>\n\n与论文相比的显著差异：\n\n- 我们默认启用了切线空间法线扰动，可通过添加 `system.material.use_bump=false` 来关闭。\n\n**示例运行命令**\n\n```sh\n# --------- 几何部分 --------- #\npython launch.py --config configs\u002Ffantasia3d.yaml --train --gpu 0 system.prompt_processor.prompt=\"一张单反相机拍摄的冰淇淋圣代照片\"\n# Fantasia3D 非常依赖于初始化的 SDF 形状\n# 默认形状是一个半径为 0.5 的球体\n# 可以根据输入提示更改形状初始化\npython launch.py --config configs\u002Ffantasia3d.yaml --train --gpu 0 system.prompt_processor.prompt=\"比萨斜塔\" system.geometry.shape_init=ellipsoid system.geometry.shape_init_params=\"[0.3,0.3,0.8]\"\n# 或者也可以从网格文件初始化\n# 这里 shape_init_params 是形状的缩放比例\n# 同时请确保输入正确的上方向和前方向轴（取值为 +x、+y、+z、-x、-y、-z）\npython launch.py --config configs\u002Ffantasia3d.yaml --train --gpu 0 system.prompt_processor.prompt=\"hulk\" system.geometry.shape_init=mesh:load\u002Fshapes\u002Fhuman.obj system.geometry.shape_init_params=0.9 system.geometry.shape_init_mesh_up=+y system.geometry.shape_init_mesh_front=+z\n# --------- 纹理 --------- #\n# 从几何检查点继续训练 PBR 纹理：\npython launch.py --config configs\u002Ffantasia3d-texture.yaml --train --gpu 0 system.prompt_processor.prompt=\"一张冰淇淋圣代的单反照片\" system.geometry_convert_from=path\u002Fto\u002Fgeometry\u002Fstage\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt\n```\n\n**提示**\n\n- 如果在训练初期发现形状容易发散，可以降低指导尺度，将 `system.guidance.guidance_scale` 设置为 `30`。\n\n### TextMesh [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2304.12439-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.12439)\n\n**由 threestudio（DeepFloyd IF，批量大小 4）获得的结果**\n\nhttps:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F19284678\u002F72217cdd-765a-475b-92d0-4ab62bf0f57a\n\n**与论文相比的显著差异**\n\n- 大多数设置与 DreamFusion 模型相同。请参考 DreamFusion 模型的显著差异。\n- 我们使用 NeuS 作为几何表示，而原论文使用 VolSDF。\n- 我们采用了来自 [Neuralangelo](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.03092) 的技术，以在使用哈希网格时稳定法线计算。\n- 目前我们仅实现了 TextMesh 的粗略阶段。\n\n**示例运行命令**\n\n```sh\n# 使用 DeepFloyd IF，需要约 15GB 显存\npython launch.py --config configs\u002Ftextmesh-if.yaml --train --gpu 0 system.prompt_processor.prompt=\"lib:cowboy_boots\"\n```\n\n**提示**\n\n- TextMesh 使用基于表面的几何表示，因此在导出网格时无需手动调整等值面阈值！
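\n\n例如，训练完成后可直接用默认导出配置得到带纹理的网格，无需设置等值面阈值（仅作示意，路径为占位符；更多导出选项见上文「导出网格」一节）：\n\n```sh\n# TextMesh 基于表面表示，导出网格时无需调整 isosurface 阈值\npython launch.py --config path\u002Fto\u002Ftrial\u002Fdir\u002Fconfigs\u002Fparsed.yaml --export --gpu 0 resume=path\u002Fto\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt system.exporter_type=mesh-exporter\n```\n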
\n### Control4D [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2305.20082-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2305.20082)\n\n**这是使用 threestudio 对 Control4D 的实验性实现！Control4D 将在论文被接受后发布包含静态和动态编辑的完整代码。**\n\n**由 threestudio 获得的结果（512×512）**\n\nhttps:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F24589363\u002F97d9aadd-32c7-488f-9543-6951b285d588\n\n我们目前不支持动态编辑。\n\n请使用此链接下载 Control4D 的数据样本：[link](https:\u002F\u002Fmailstsinghuaeducn-my.sharepoint.com\u002F:u:\u002Fg\u002Fpersonal\u002Fshaorz20_mails_tsinghua_edu_cn\u002FEcqOaEuNwH1KpR0JTzL4Ur0BO_iJr8RiY2rNAGVC7h3fng?e=Dyr2gu)。\n\n**示例运行命令**\n\n```sh\n# --------- Control4D --------- #\n# 静态编辑，采用 128×128 NeRF + 512×512 GAN 渲染，约 20GB 显存\npython launch.py --config configs\u002Fcontrol4d-static.yaml --train --gpu 0 data.dataroot=\"YOUR_DATAROOT\u002Ftwindom\" system.prompt_processor.prompt=\"埃隆·马斯克穿着红衬衫，RAW 照片，（高细节皮肤：1.2），8k 超高清，单反相机，柔和光线，高质量，胶片颗粒感，富士 XT3\"\n```\n\n### InstructNeRF2NeRF [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2303.12789-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.12789)\n\n**由 threestudio 获得的结果**\n\nhttps:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F24589363\u002F7aa43a2d-87d7-4ef5-94b6-f778ddb041b5\n\n请使用此链接下载 InstructNeRF2NeRF 的数据样本：[link](https:\u002F\u002Fmailstsinghuaeducn-my.sharepoint.com\u002F:u:\u002Fg\u002Fpersonal\u002Fshaorz20_mails_tsinghua_edu_cn\u002FEbNazeNAYsBIvxGeXuCmOXgBiLv8KM-hfRNbNS7DtTvSvA?e=C1k4bM)。\n\n**示例运行命令**\n\n```sh\n# --------- InstructNeRF2NeRF --------- #\n# 基于 NeRF 片段渲染的 3D 编辑，约 20GB 显存\npython launch.py --config configs\u002Finstructnerf2nerf.yaml --train --gpu 0 data.dataroot=\"YOUR_DATAROOT\u002Fface\" data.camera_layout=\"front\" data.camera_distance=1 data.eval_interpolation=[1,3,50] system.prompt_processor.prompt=\"把他变成爱因斯坦\"\n```\n\n### Magic123 [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2306.17843-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2306.17843)\n\n**由 threestudio 获得的结果（Zero123 + Stable Diffusion）**\n\nhttps:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F19284678\u002F335a58a8-8fee-485b-ac27-c55a16f4a673\n\n**与论文相比的显著差异**\n- 这是一个非官方的重新实现，其总体思路与 [官方实现](https:\u002F\u002Fgithub.com\u002Fguochengqian\u002FMagic123) 相同，但在一些方面有所不同，例如超参数。\n- 不支持文本反转，这意味着训练时需要提供文本提示。\n\n**示例运行命令**\n\n首先训练粗略阶段的 NeRF：\n\n```sh\n# Zero123 + Stable Diffusion，约 12GB 显存\n# data.image_path 必须指向一张 4 通道 RGBA 图像\n# 必须指定 system.prompt_processor.prompt\npython launch.py --config configs\u002Fmagic123-coarse-sd.yaml --train --gpu 0 data.image_path=load\u002Fimages\u002Fhamburger_rgba.png system.prompt_processor.prompt=\"一个美味的汉堡\"\n```\n\n然后将粗略阶段的 NeRF 转换为 DMTet，并使用可微分光栅化进行训练：\n\n```sh\n# Zero123 + Stable Diffusion，约 10GB 显存\n# data.image_path 必须指向一张 4 通道 RGBA 图像\n# 必须指定 system.prompt_processor.prompt\npython launch.py --config configs\u002Fmagic123-refine-sd.yaml --train --gpu 0 data.image_path=load\u002Fimages\u002Fhamburger_rgba.png system.prompt_processor.prompt=\"一个美味的汉堡\" system.geometry_convert_from=path\u002Fto\u002Fcoarse\u002Fstage\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt\n# 如果对使用默认阈值（25）提取的表面不满意\n# 可以通过 `system.geometry_convert_override` 指定阈值\n# 如果提取的表面不完整，则降低该值；如果过度膨胀，则提高该值\npython launch.py --config configs\u002Fmagic123-refine-sd.yaml --train --gpu 0 data.image_path=load\u002Fimages\u002Fhamburger_rgba.png system.prompt_processor.prompt=\"一个美味的汉堡\" system.geometry_convert_from=path\u002Fto\u002Fcoarse\u002Fstage\u002Ftrial\u002Fdir\u002Fckpts\u002Flast.ckpt system.geometry_convert_override.isosurface_threshold=10.\n```\n\n**提示**\n\n- 如果图像中包含非正面朝向的物体，通过设置 `data.default_elevation_deg` 和 `data.default_azimuth_deg` 来指定近似的仰角和方位角可能会有所帮助。在 threestudio 中，顶部为仰角 +90 度，底部为仰角 -90 度；左侧为方位角 -90 度，右侧为方位角 +90 度。
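\n\n例如，若输入图像是从略高于物体且偏向右侧的视角拍摄的，可以近似指定仰角和方位角（角度数值仅作示意）：\n\n```sh\n# 为非正面朝向的输入图像指定近似的仰角和方位角（数值仅作示意）\npython launch.py --config configs\u002Fmagic123-coarse-sd.yaml --train --gpu 0 data.image_path=load\u002Fimages\u002Fhamburger_rgba.png system.prompt_processor.prompt=\"一个美味的汉堡\" data.default_elevation_deg=20.0 data.default_azimuth_deg=30.0\n```\n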
\n### Stable Zero123\n\n**安装**\n\n从 https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-zero123 下载预训练的 Stable Zero123 检查点 `stable-zero123.ckpt`，并将其放置在 `load\u002Fzero123` 目录下。\n\n**由 threestudio 获得的结果（Stable Zero123 与 Zero123-XL 对比）**\n![Final_video_v01](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_902377accad6.png)\n\n**直接生成多视角图像**\n如果您只想生成多视角图像，请参考 [threestudio-mvimg-gen](https:\u002F\u002Fgithub.com\u002FDSaurus\u002Fthreestudio-mvimg-gen)。该扩展可以使用 Stable Zero123 直接生成多视角图像。\n\n**示例运行命令**\n\n1. 选择一张您喜欢的图片，或者使用您喜爱的 AI 图像生成器（如 SDXL Turbo，https:\u002F\u002Fclipdrop.co\u002Fstable-diffusion-turbo）根据文本生成一张图片，例如：“一只友好的狗的简单 3D 渲染”。\n2. 使用 Clipdrop（https:\u002F\u002Fclipdrop.co\u002Fremove-background）去除其背景。\n3. 将图片保存到 `load\u002Fimages\u002F` 目录下，最好以 `_rgba.png` 作为后缀。\n4. 使用 Stable Zero123 检查点运行 Zero-1-to-3：\n```sh\npython launch.py --config configs\u002Fstable-zero123.yaml --train --gpu 0 data.image_path=.\u002Fload\u002Fimages\u002Fhamburger_rgba.png\n```\n\n**重要提示：这是一个实验性实现，我们正在不断改进其质量。**\n\n**重要提示：该实现扩展了下方的 Zero-1-to-3 实现，并深受 [stable-dreamfusion](https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion) 中 Zero-1-to-3 实现的启发！`extern\u002Fldm_zero123` 借鉴自 `stable-dreamfusion\u002Fldm`。**\n\n\n### Zero-1-to-3 [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2303.11328-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.11328)\n\n**安装**\n\n将预训练的 Zero123XL 权重下载到 `load\u002Fzero123` 目录：\n\n```sh\ncd load\u002Fzero123\nwget https:\u002F\u002Fzero123.cs.columbia.edu\u002Fassets\u002Fzero123-xl.ckpt\n```\n\n**由 threestudio 获得的结果（Zero-1-to-3）**\n\n\nhttps:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fassets\u002F22424247\u002Ff4e7b66f-7a46-4f9f-8fcd-750300cef651\n\n\n**重要提示：这是一个实验性实现，我们正在不断改进其质量。**\n\n**重要提示：该实现深受 [stable-dreamfusion](https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion) 中 Zero-1-to-3 实现的启发！`extern\u002Fldm_zero123` 借鉴自 `stable-dreamfusion\u002Fldm`。**\n\n**示例运行命令**\n\n1. 选择一张您喜欢的图片，或者使用您喜爱的 AI 图像生成器（如 Stable Diffusion XL，https:\u002F\u002Fclipdrop.co\u002Fstable-diffusion）根据文本生成一张图片，例如：“一只友好的狗的简单 3D 渲染”。\n2. 使用 Clipdrop（https:\u002F\u002Fclipdrop.co\u002Fremove-background）去除其背景。\n3. 将图片保存到 `load\u002Fimages\u002F` 目录下，最好以 `_rgba.png` 作为后缀。\n4. 
运行 Zero-1-to-3：\n```sh\npython launch.py --config configs\u002Fzero123.yaml --train --gpu 0 data.image_path=.\u002Fload\u002Fimages\u002Fdog1_rgba.png\n```\n\n更多 Zero-1-to-3 的脚本，请查看 `threestudio\u002Fscripts\u002Frun_zero123.sh`。\n\n之前的 Zero-1-to-3 权重可在 `https:\u002F\u002Fhuggingface.co\u002Fcvlab\u002Fzero123-weights\u002F` 获取。您可以按照上述方法将其下载到 `load\u002Fzero123` 目录，并替换 `system.guidance.pretrained_model_name_or_path` 中的路径。\n\n**指导评估**\n\nthreestudio 还支持在训练过程中对指导进行评估。如果将 `system.freq.guidance_eval` 设置为大于 0 的值，系统会保存渲染后的图像、添加噪声的图像（噪声强度标注在左上角）、单步去噪后的图像、对原始图像的单步预测以及完全去噪后的图像。例如：\n\n![it143-train](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fthreestudio-project_threestudio_readme_46bd3e07f6aa.png)\n\n### 更多内容敬请期待。\n\n- [ ] [Dream3D](https:\u002F\u002Fbluestyle97.github.io\u002Fdream3d\u002F) [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2212.14704-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2212.14704)\n- [ ] [DreamAvatar](https:\u002F\u002Fyukangcao.github.io\u002FDreamAvatar\u002F) [![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2304.00916-b31b1b.svg?style=flat-square)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.00916)\n\n**如果您希望为 threestudio 贡献新的方法，请参阅 [此处](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#contributing-to-threestudio)。**\n\n## 提示词库\n\n为了便于比较，我们将 [DreamFusion](https:\u002F\u002Fdreamfusion3d.github.io\u002Fgallery.html) 网站上的 397 条预设提示词收集到了 [此文件](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio\u002Fblob\u002Fmain\u002Fload\u002Fprompt_library.json) 中。您可以通过设置 `system.prompt_processor.prompt=lib:keyword1_keyword2_..._keywordN` 来使用这些提示词。请注意，提示词必须以 `lib:` 开头，且所有关键词之间用 `_` 分隔。提示词处理器会将这些关键词与库中的所有提示词进行匹配，只有当**恰好匹配一条**时才会成功。所使用的提示词将会打印到控制台。另外请注意，这种语法无法指向库中的每一条提示词，因为有些提示词是其他提示词的子集。我们将进一步完善这一功能。\n\n## 提升质量的技巧\n\n需要注意的是，现有的将 2D 文本到图像模型提升至 3D 的技术，并不能始终如一地产生令人满意的结果。像DreamFusion和Magic3D这样优秀论文中的结果，在某种程度上是经过精心挑选的，因此如果你在第一次尝试时没有得到预期的效果，也不必感到沮丧。以下是一些可能帮助你提高生成质量的技巧：\n\n- **增加批量大小**。较大的批量有助于收敛，并提高几何体的三维一致性。当前最先进的方法都报告使用了较大的批量：DreamFusion使用4的批量大小；Magic3D使用32的批量大小；Fantasia3D使用24的批量大小；上述展示的一些结果则使用8的批量大小。你可以通过设置`data.batch_size=N`轻松更改批量大小。增加批量大小需要更多的显存。如果你显存有限，但仍希望获得大批量的好处，可以使用[PyTorch Lightning提供的梯度累积功能](https:\u002F\u002Flightning.ai\u002Fdocs\u002Fpytorch\u002Fstable\u002Fadvanced\u002Ftraining_tricks.html#accumulate-gradients)，通过设置`trainer.accumulate_grad_batches=N`来实现。这会累积多个批次的梯度，从而达到较大的有效批量。请注意，如果使用梯度累积，你可能需要在配置中将所有步数相关的值（例如带有`X_steps`名称的值以及`trainer.val_check_interval`）乘以N倍，因为现在N个批次等同于一个大批次。\n- **延长训练时间**。如果你已经能够获得合理的结果，并希望进一步提升细节，那么延长训练时间会有帮助。但如果经过数千步后结果仍然一团糟，继续训练通常不会带来改善。你可以通过设置`trainer.max_steps=N`来指定总的训练迭代次数。\n- **尝试不同的随机种子**。如果你的结果整体几何形状正确，但存在多面Janus问题，这是一个简单的解决方案。你可以通过设置`seed=N`来改变随机种子。祝你好运！\n- **调整正则化权重**。一些方法包含正则化项，这些正则化对于获得良好的几何形状至关重要。你可以通过设置`system.loss.lambda_X=value`来尝试调整这些正则化项的权重。具体的数值取决于你的实际情况，更多详细说明请参考[每个支持模型的提示](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio#supported-models)。\n- **尝试去偏方法**。当传统的SDS技术，如DreamFusion、Magic3D、SJC等无法生成理想的3D结果时，去偏分数蒸馏采样（D-SDS）可能是一个解决方案。D-SDS旨在解决诸如伪影或Janus问题等挑战，采用了两种策略：分数去偏和提示去偏。你只需通过设置`system.guidance.grad_clip=[0,0.5,2.0,10000]`即可激活分数去偏，其中顺序为`开始步数、开始值、结束值、结束步数`。要启用提示去偏，可以设置`system.prompt_processor.use_prompt_debiasing=true`。使用提示去偏时，建议通过`system.prompt_processor.prompt_debiasing_mask_ids=[i1,i2,...]`设置一个可能需要被移除的词索引列表。例如，如果提示是`a smiling dog`，而你只想针对某些视角移除“smiling”这个词，那么就应设置为`[1]`。你还可以通过设置`system.prompt_processor.prompt_side`、`system.prompt_processor.prompt_back`和`system.prompt_processor.prompt_overhead`来手动指定每个视角的提示。有关这些技术的详细解释，请参阅[D-SDS论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.15413)或访问[项目页面](https:\u002F\u002Fsusunghong.github.io\u002FDebiased-Score-Distillation-Sampling\u002F)。\n- **尝试Perp-Neg**。[Perp-Neg算法](https:\u002F\u002Fperp-neg.github.io\u002F)有可能缓解多面Janus问题。我们现在支持在`stable-diffusion-guidance`和`deep-floyd-guidance`中使用Perp-Neg，只需设置`system.prompt_processor.use_perp_neg=true`即可。
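\n\n例如，可以在一次运行中同时启用分数去偏和提示去偏（仅作示意，参数值沿用上文的说明）：\n\n```sh\n# 同时启用 D-SDS 的分数去偏与提示去偏（数值仅作示意）\npython launch.py --config configs\u002Fdreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt=\"a smiling dog\" system.guidance.grad_clip=[0,0.5,2.0,10000] system.prompt_processor.use_prompt_debiasing=true system.prompt_processor.prompt_debiasing_mask_ids=[1]\n```\n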
## Tips on Improving Quality

Note that the current techniques that lift 2D image generation models to 3D cannot consistently produce satisfying results. Results from great papers like DreamFusion and Magic3D are (to some extent) cherry-picked, so do not be frustrated if you do not get what you expect on your first trial. Here are some tips that may help you improve the generation quality:

- **Increase the batch size**. Larger batch sizes help convergence and improve the consistency of the geometry. State-of-the-art methods all claim to use large batch sizes: DreamFusion uses a batch size of 4; Magic3D uses 32; Fantasia3D uses 24; some of the results shown above use 8. You can easily change the batch size by setting `data.batch_size=N`. Increasing the batch size requires more VRAM. If your VRAM is limited but you still want the benefit of large batches, you can use [gradient accumulation provided by PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/advanced/training_tricks.html#accumulate-gradients) by setting `trainer.accumulate_grad_batches=N`. This accumulates gradients over several batches to reach a large effective batch size. Note that if you use gradient accumulation, you may need to multiply all step-related values (for example, values whose names contain `X_steps`, and `trainer.val_check_interval`) by N, since N batches now correspond to one large batch.

- **Train longer.** This helps if you can already obtain reasonable results and want to enhance the details. If the result is still a mess after several thousand steps, training longer usually does not help. You can set the total number of training iterations via `trainer.max_steps=N`.

- **Try different seeds.** This is a simple fix if your result has correct overall geometry but suffers from the multi-face Janus problem. You can change the seed by setting `seed=N`. Good luck!

- **Tune the regularization weights.** Some methods include regularization terms that are essential for obtaining good geometry. You can tune their weights by setting `system.loss.lambda_X=value`. The right values depend on your situation; see the [tips for each supported model](https://github.com/threestudio-project/threestudio#supported-models) for more detailed instructions.

- **Try debiasing methods.** When conventional SDS techniques such as DreamFusion, Magic3D, and SJC fail to produce the desired 3D result, Debiased Score Distillation Sampling (D-SDS) may be a solution. D-SDS targets challenges such as artifacts and the Janus problem using two strategies: score debiasing and prompt debiasing. You can activate score debiasing simply by setting `system.guidance.grad_clip=[0,0.5,2.0,10000]`, where the order is `start step, start value, end value, end step`. To enable prompt debiasing, set `system.prompt_processor.use_prompt_debiasing=true`. When using prompt debiasing, it is recommended to set a list of indices of words that may need to be removed via `system.prompt_processor.prompt_debiasing_mask_ids=[i1,i2,...]`. For example, if the prompt is `a smiling dog` and you only want to remove the word "smiling" for certain views, set it to `[1]`. You can also manually specify the prompt for each view by setting `system.prompt_processor.prompt_side`, `system.prompt_processor.prompt_back`, and `system.prompt_processor.prompt_overhead`. For a detailed explanation of these techniques, refer to the [D-SDS paper](https://arxiv.org/abs/2303.15413) or visit the [project page](https://susunghong.github.io/Debiased-Score-Distillation-Sampling/).

- **Try Perp-Neg.** The [Perp-Neg algorithm](https://perp-neg.github.io/) can potentially alleviate the multi-face Janus problem. We now support Perp-Neg for `stable-diffusion-guidance` and `deep-floyd-guidance`; enable it by setting `system.prompt_processor.use_perp_neg=true`.

## VRAM Optimization

If you encounter CUDA OOM errors, try the following, in roughly descending order of recommendation, to fit your VRAM budget.

- If the OOM occurs only during validation/testing, set `system.cleanup_after_validation_step=true` and `system.cleanup_after_test_step=true` to free memory after each validation/test step. This slows down validation/testing.

- Use a smaller batch size, or use gradient accumulation as described [here](https://github.com/threestudio-project/threestudio#tips-on-improving-quality).

- If you are using PyTorch 1.x, enable [memory-efficient attention](https://huggingface.co/docs/diffusers/optimization/fp16#memory-efficient-attention) by setting `system.guidance.enable_memory_efficient_attention=true`. PyTorch 2.0 has built-in support for this optimization, and it is enabled by default.

- Enable [attention slicing](https://huggingface.co/docs/diffusers/optimization/fp16#sliced-attention-for-additional-memory-savings) by setting `system.guidance.enable_attention_slicing=true`. This slows down training by about 20%.

- If you are using StableDiffusionGuidance, you can use [Token Merging](https://github.com/dbolya/tomesd) to **dramatically** speed up computation and save VRAM. Enable it simply by setting `system.guidance.token_merging=true`. You can also customize the Token Merging behavior by setting the parameters listed [here](https://github.com/dbolya/tomesd/blob/main/tomesd/patch.py#L183-L213) via `system.guidance.token_merging_params`. Note that Token Merging may degrade generation quality.

- Enable [sequential CPU offload](https://huggingface.co/docs/diffusers/optimization/fp16#offloading-to-cpu-with-accelerate-for-memory-savings) by setting `system.guidance.enable_sequential_cpu_offload=true`. This can save a lot of VRAM but makes training **extremely slow**.

## Documentation

threestudio uses [OmegaConf](https://github.com/omry/omegaconf) to manage configurations. You can change anything directly in the YAML configuration files, or override values by adding command-line arguments without `--`. All parameters that can be changed in the configuration are listed in our [documentation](https://github.com/threestudio-project/threestudio/blob/main/DOCUMENTATION.md). Happy experimenting!

## wandb (Weights & Biases) logging

To enable the (experimental) wandb support, set `system.loggers.wandb.enable=true`, e.g.:

```bash
python launch.py --config configs/zero123.yaml --train --gpu 0 system.loggers.wandb.enable=true
```

If you are using a corporate wandb server, you may first need to log in to your wandb instance, e.g.:
`wandb login --host=https://COMPANY_XYZ.wandb.io --relogin`

By default, runs get a random name and are logged under the `threestudio` project. You can override these to give the run a more descriptive name, e.g.:

```bash
python launch.py --config configs/zero123.yaml --train --gpu 0 system.loggers.wandb.enable=true system.loggers.wandb.name="zero123xl_accum;bs=4;lr=0.05"
```

## Contributing to threestudio

- Clone the repository and create your branch from `main`.
- Install the development dependencies:

```sh
pip install -r requirements-dev.txt
```

- If you use VSCode as your editor: (1) install the `editorconfig` extension; (2) set the default linter to mypy to enable static type checking; (3) set the default formatter to black. You can format a document manually, or have the editor format on every save by setting `"editor.formatOnSave": true`.
- Run `pre-commit install` to install the pre-commit hooks, which automatically format files before committing.

- Make changes to the code, update the README and DOCUMENTATION if needed, then open a Pull Request.

### Code Structure

Here we briefly introduce the code structure of this project. More detailed documentation will be provided in the future.

- All methods are implemented as subclasses of `BaseSystem` (in `systems/base.py`). A system usually consists of six modules: geometry, material, background, renderer, guidance, and prompt_processor. All modules except guidance and prompt_processor are subclasses of `BaseModule` (in `utils/base.py`), while guidance and prompt_processor are subclasses of `BaseObject`, which prevents them from being treated as model parameters and gives better control over their behavior in multi-GPU settings.
- All systems, modules, and data modules have their own dataclass configurations.
- The base configuration of the whole project can be found in `utils/config.py`. In the `ExperimentConfig` dataclass, `data`, `system`, and the module configurations under `system` are parsed into the configurations of the classes above. These configurations are strictly typed, which means you can only use attributes defined in the dataclasses and must respect each attribute's type. This configuration approach (1) naturally supports default values for attributes, and (2) effectively prevents wrong attribute assignments (e.g., typos in the YAML file) and improper use at runtime.
- The project uses both static and runtime type checking. See `utils/typing.py` for more details.
- To update anything in a module at each training step, simply make the module inherit from `Updateable` (see `utils/base.py`). At the beginning of each iteration, an `Updateable` instance updates itself and recursively updates all of its attributes that are also `Updateable`. Note that subclasses of `BaseSystem`, `BaseModule`, and `BaseObject` all inherit from `Updateable` by default.

## Known Problems

- Gradients of Vanilla MLP parameters are empty in AMP mode (temporarily worked around by disabling autocast).
- FullyFused MLP may cause NaNs in 32-bit precision.

## Credits

threestudio is built on the following amazing open-source projects:

- **[Lightning](https://github.com/Lightning-AI/lightning)** framework for highly organized PyTorch code.
- **[OmegaConf](https://github.com/omry/omegaconf)** flexible Python configuration system.
- **[NerfAcc](https://github.com/KAIR-BAIR/nerfacc)** plug-and-play NeRF acceleration library.

The following repositories greatly inspired threestudio:

- **[Stable-DreamFusion](https://github.com/ashawkey/stable-dreamfusion)**
- **[Latent-NeRF](https://github.com/eladrich/latent-nerf)**
- **[Score Jacobian Chaining](https://github.com/pals-ttic/sjc)**
- **[Fantasia3D.unofficial](https://github.com/ashawkey/fantasia3d.unofficial)**

Thanks to the maintainers of these projects for their contributions to the community!

## Citing threestudio

If you find threestudio helpful, please consider citing:

```
@Misc{threestudio2023,
  author =       {Yuan-Chen Guo and Ying-Tian Liu and Ruizhi Shao and Christian Laforte and Vikram Voleti and Guan Luo and Chia-Hao Chen and Zi-Xin Zou and Chen Wang and Yan-Pei Cao and Song-Hai Zhang},
  title =        {threestudio: A unified framework for 3D content generation},
  howpublished = {\url{https://github.com/threestudio-project/threestudio}},
  year =         {2023}
}
```

# threestudio Quickstart Guide

threestudio is a unified framework for generating 3D content from text prompts, single images, or a few images by lifting 2D text-to-image generation models. It integrates many mainstream methods, including DreamFusion, Magic3D, ProlificDreamer, and Zero-1-to-3.

## Environment Preparation

Before you start, make sure your development environment meets the following requirements:

*   **Operating system**: Ubuntu 20.04 is recommended (other Linux distributions or Windows may need extra configuration).
*   **Hardware**:
    *   An NVIDIA GPU.
    *   At least **6GB** of VRAM (16GB+ recommended for heavier methods such as ProlificDreamer).
    *   A matching **CUDA** driver installed.
*   **Software dependencies**:
    *   Python >= 3.8
    *   Git
    *   (Optional) Docker: if you prefer containerized deployment, see the official `docs/installation.md`.

> **Tip for developers in mainland China**: the project pulls many HuggingFace models and GitHub resources, so configuring local mirrors can speed up downloads:
> *   **Git**: use a Gitee mirror, or configure `git config --global url."https://ghproxy.com/".insteadOf "https://github.com/"`
> *   **Pip**: use the Tsinghua mirror, `pip install -i https://pypi.tuna.tsinghua.edu.cn/simple ...`
> *   **HuggingFace**: set the environment variable `export HF_ENDPOINT=https://hf-mirror.com`

## Installation Steps

The following commands assume an Ubuntu environment; run them in a terminal in order:

1.  **Clone the repository**
    ```bash
    git clone https://github.com/threestudio-project/threestudio.git
    cd threestudio
    ```
2.  **Create and activate a virtual environment**
    ```bash
    conda create -n threestudio python=3.8
    conda activate threestudio
    ```

3.  **Install PyTorch (choose by CUDA version)**
    *   If your CUDA version is 11.7:
        ```bash
        pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu117
        ```
    *   If your CUDA version is 11.8:
        ```bash
        pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
        ```
    *   *Note: if these sources are slow for users in mainland China, try installing the corresponding wheels from the Tsinghua mirror.*

4.  **Install the threestudio core dependencies**
    ```bash
    pip install -r requirements.txt
    ```

5.  **Install extra dependencies (optional but recommended)**
    To support mesh export and more features, install the extra dependencies:
    ```bash
    pip install -r requirements_optional.txt
    ```

## Basic Usage

threestudio manages its different generation methods through configuration files. The simplest "text-to-3D" example below uses the classic **DreamFusion** method.

### 1. Run Text-to-3D

Run the following command, replacing the prompt with whatever you want to generate (e.g., "a hamburger"):

```bash
python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a hamburger"
```

*   `--config`: specifies the configuration file; `dreamfusion-sd.yaml` is the DreamFusion implementation based on Stable Diffusion.
*   `--train`: starts the training/generation process.
*   `system.prompt_processor.prompt`: overrides the prompt in the configuration file directly from the command line.

### 2. View the Results

After the run finishes, the generated 3D model (typically an `.obj` mesh or a neural radiance field checkpoint) is saved in the corresponding experiment folder under `outputs/`. You can inspect the results with Blender, MeshLab, or the project's Gradio interface.

### 3. Other Common Scenarios

*   **Image-to-3D with Zero-1-to-3:**
    ```bash
    python launch.py --config configs/zero123.yaml --train --gpu 0 data.image_path="path/to/your/image.png"
    ```

*   **Local Gradio web UI:**
    If you prefer a graphical interface, you can start the local web UI:
    ```bash
    python gradio_app.py launch
    ```
    Then open the printed local address in your browser to generate 3D content interactively.

> **Tip**: for more advanced usage (Magic3D, ProlificDreamer, 4D generation, etc.), see the configuration files under `configs/` and the official extension repository `threestudio-extensions`.
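All of these commands rely on the same override mechanism: arguments given without a leading `--` are parsed as OmegaConf dot-list overrides and merged into the YAML config. A minimal sketch of that mechanism follows; the keys below are illustrative, not threestudio's actual parsing code:

```python
from omegaconf import OmegaConf

# A stand-in for a loaded YAML config such as configs/dreamfusion-sd.yaml.
base = OmegaConf.create({
    "system": {"prompt_processor": {"prompt": "a hamburger"}},
    "trainer": {"max_steps": 10000},
})

# Command-line arguments without a leading "--" behave like this dot-list.
overrides = OmegaConf.from_dotlist([
    "system.prompt_processor.prompt=a friendly dog",
    "trainer.max_steps=5000",
])

# Later values win, so the command line overrides the YAML defaults.
cfg = OmegaConf.merge(base, overrides)
print(cfg.system.prompt_processor.prompt)  # -> a friendly dog
print(cfg.trainer.max_steps)               # -> 5000
```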
# Use Case

The art team of an independent game studio urgently needs to mass-produce high-quality 3D prop assets for an upcoming fantasy RPG, but the team has no professional 3D modeler.

### Without threestudio

- **High labor costs**: the studio must outsource or hire expensive 3D modelers; each prop takes days from concept art to finished model, seriously slowing development.
- **Ideas are hard to realize**: a designer's unique "flaming-crest shield" or "crystal staff" is difficult to convey accurately to a modeler in words, and the back-and-forth revision cost is very high.
- **Technical barriers**: team members are good at generating 2D concept art with Stable Diffusion but know nothing about NeRFs, mesh extraction, or other complex 3D generation algorithms, so they cannot experiment on their own.
- **Inconsistent style**: assets produced by different outsourcers look disjointed and are hard to keep consistent with the game's overall art direction.

### With threestudio

- **Fast, automated production**: artists type a text prompt or upload a single concept image, and threestudio drives the underlying models to produce complete 3D geometry and textures, cutting production time to hours.
- **Instant idea validation**: designers can immediately turn a text description into a 3D prototype for preview and iterate on designs quickly, with no middleman translating requirements.
- **A unified framework lowers the barrier**: threestudio integrates DreamFusion, Magic3D, and other state-of-the-art methods in one framework; users can switch generation strategies via configuration files without digging into the math.
- **High quality with controllable style**: strong 2D priors improve 3D detail, the generated assets have more reasonable topology, and they closely follow the style of the input image, keeping game assets consistent.

threestudio bridges the gap between 2D creativity and 3D assets, letting small teams create high-quality 3D content at very low cost.

# Project Metadata

- **Repository**: threestudio-project/threestudio, ★ 7013, 550 forks, Apache-2.0 license, last commit 2026-04-17
- **Languages**: Jupyter Notebook 71.5%, Python 28.3%, Shell 0.2%, Dockerfile 0.1%
- **Environment**: Linux; NVIDIA GPU with at least 6GB of VRAM and CUDA installed; RAM requirement not specified; Python >= 3.8
- **Key dependencies**: torch, pytorch-lightning, transformers, diffusers, accelerate, opencv-python, trimesh, pygltflib, xatlas, nerfacc
- **Notes**: the official installation steps have been tested on Ubuntu 20.04, and installation via Docker is supported. The project integrates multiple 3D generation methods (DreamFusion, Magic3D, ProlificDreamer, etc.), and some of them may need more than the minimum 6GB of VRAM. See the project's installation.md for detailed installation instructions, including Docker.

# FAQ

**Q: How do I run DeepFloyd on Google Colab, or fix PyTorch version incompatibilities?**

The Colab environment ships with `torch 2.0.1+cu118` by default. Installing `xformers` may force a reinstall of the incompatible `torch 2.0.0`, which then conflicts with `torchvision`. The fix is to remove `xformers` from the notebook, since it is no longer needed with `torch>=2.0.0`. If you hit errors caused by PyTorch version mismatches, create a fresh environment and pull the latest commit; such problems are usually already fixed in the latest version. ([issue #47](https://github.com/threestudio-project/threestudio/issues/47))

**Q: How much VRAM does ProlificDreamer need at 512 resolution?**

Stage 1 needs about 33GB of VRAM. Stage 2 (the 512x512-resolution training part) needs only about 7GB after optimization, rather than the previously estimated 33GB. With a 48GB GPU you can usually run stage 1 without problems. The mesh geometry part uses relatively little VRAM. ([issue #108](https://github.com/threestudio-project/threestudio/issues/108))

**Q: How do I load a DreamFusion model into the ProlificDreamer pipeline?**

You can load a DreamFusion model into any stage of ProlificDreamer by setting:
`system.geometry_convert_from=path/to/your/dreamfusion/checkpoint`
`system.geometry_convert_inherit_texture=false`
Note: since the two pipelines use different camera configurations, you may also need to adjust `data.camera_distance_range` and `system.model.radius`. ([issue #108](https://github.com/threestudio-project/threestudio/issues/108))

**Q: How long does standard ProlificDreamer take for stage 1 (25k iterations) at the default resolution?**

On a single RTX 3090 GPU, stage 1 (25,000 iterations) takes about 4.5 hours. ([issue #108](https://github.com/threestudio-project/threestudio/issues/108))

**Q: I get a segmentation fault, or black holes in the mesh, when extracting geometry for scene-level text-to-3D. What can I do?**

This usually means the scene-level mesh extraction runs out of memory or the threshold is set poorly. Try lowering `system.geometry_convert_override.isosurface_threshold` (default 25); setting it to 10 or 5, for example, can noticeably improve the result and remove the black holes. Example: add `system.geometry_convert_override.isosurface_threshold=5` to your launch command. ([issue #245](https://github.com/threestudio-project/threestudio/issues/245))

**Q: How do I export the generated 3D mesh?**

Mesh export is now supported. See the "Export Meshes" section of the official documentation for detailed steps. Typically you run a dedicated export command, or use the configuration to convert the generated implicit representation into an `.obj` (or other format) mesh that can be viewed in Blender and similar tools. ([issue #35](https://github.com/threestudio-project/threestudio/issues/35))
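For reference, the export command documented in the README's "Export Meshes" section has roughly the following shape; the paths are placeholders for your own experiment, and the exact options may have changed, so treat the README as authoritative:

```sh
python launch.py --config "path/to/your/experiment/config/parsed.yaml" --export --gpu 0 \
    resume="path/to/your/experiment/ckpts/last.ckpt" \
    system.exporter_type=mesh-exporter \
    system.exporter.context_type=cuda
```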
**Q: How do I fix 'Text embedding file not found' or T5EncoderModel loading failures (out of VRAM)?**

This error usually means there is not enough VRAM to load the quantized model, or some modules have been dispatched to the CPU/disk. Possible fixes:
1. Make sure you have enough GPU VRAM.
2. If you need to offload modules between CPU and GPU, set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`.
3. See the discussion in related issues (such as #148) for concrete configuration changes. ([issue #88](https://github.com/threestudio-project/threestudio/issues/88))

**Q: What about 'RuntimeError: Expected all tensors to be on the same device'?**

This error means tensors live on different devices (e.g., some on the CPU and some on CUDA). The exact fix depends on the context, but check the following:
1. Make sure all model components and data loading happen on the same device.
2. If you use mixed-precision training (AMP), confirm that `torch.set_float32_matmul_precision` is set appropriately (e.g., 'medium' or 'high') for GPUs with Tensor Cores (such as the A10).
3. Check whether any custom operation fails to handle device placement correctly. ([issue #236](https://github.com/threestudio-project/threestudio/issues/236))
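To make the last point concrete, here is a minimal, self-contained PyTorch reproduction of the error and its usual fix (independent of any specific threestudio module):

```python
import torch

# Pick one device and keep both the module and its inputs on it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Matmul precision hint for Tensor-Core GPUs (e.g. A10), as suggested above.
torch.set_float32_matmul_precision("high")

model = torch.nn.Linear(8, 4).to(device)
x = torch.randn(2, 8)  # created on the CPU by default

# Calling model(x) on a CUDA machine would raise:
#   RuntimeError: Expected all tensors to be on the same device
y = model(x.to(device))  # fix: move the input to the model's device
print(y.shape)  # torch.Size([2, 4])
```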
# Releases

## v0.1.0 (2023-05-11)

### New features

- Implemented the DreamFusion, Magic3D, SJC, Latent-NeRF, and Sketch-Shape models.
- Implemented the geometry reconstruction stage of Fantasia3D.
- Added multi-GPU training support (#33).
- Added mesh export, supporting OBJ with MTL files as well as OBJ with vertex colors (#44).

# Similar Projects

**openclaw** (`openclaw/openclaw`, ★ 349277): OpenClaw is a local-first personal AI assistant designed to give you a fully controllable intelligent companion on your own devices. Instead of being confined to a particular web page or app, it plugs directly into the messaging channels you already use, including WeChat, WhatsApp, Telegram, Discord, iMessage, and dozens of other platforms. Whichever chat app you message from, OpenClaw responds instantly; it also supports voice interaction on macOS, iOS, and Android, and provides a live canvas you can manipulate. It addresses the need for data privacy, fast responses, and an always-on experience: running the AI locally means fast, private assistance without relying on cloud services, so your data stays under your control. Its key technical highlight is a powerful gateway architecture that separates the control plane from the core assistant, keeping cross-platform communication smooth and extensible. OpenClaw suits technically inclined users and developers building personalized workflows, as well as privacy-conscious users who do not want to be locked into a single ecosystem; basic terminal skills are enough to deploy it (macOS, Linux, and Windows via WSL2) through a simple command-line setup.

**n8n** (`n8n-io/n8n`, ★ 184740): n8n is a fair-code workflow automation platform for technical teams that combines the speed of low-code building with the flexibility of writing custom code. It addresses the pain points of traditional automation tools being either too closed to extend or entirely dependent on hand-written code, and helps users connect 400+ apps and services to automate complex business processes. n8n especially suits developers, engineers, and business users with some technical background. Its core highlight is "code when you need it": you can build flows by dragging nodes in an intuitive visual interface, and drop into JavaScript or Python or call npm packages whenever the logic gets complex. n8n also ships native AI capabilities built on LangChain, letting users build agent workflows with their own data and models. Deployment is highly flexible: fully self-hosted for data privacy and control, or cloud-hosted. With an active community ecosystem and hundreds of ready-made templates, n8n makes building powerful, controllable automation simple and efficient.

**stable-diffusion-webui** (`AUTOMATIC1111/stable-diffusion-webui`, ★ 162132): a Gradio-based web interface that makes it easy to run the powerful Stable Diffusion image generation model locally. It solves the pain of the original model being command-line-driven, hard to use, and scattered across tools by bundling the complex AI image generation workflow into an intuitive graphical platform. Casual creators who want to get started quickly, designers who need fine control over image details, and developers and researchers exploring the model's potential all benefit. Its core strength is feature richness: beyond basic text-to-image, image-to-image, inpainting, and outpainting, it introduces advanced features such as attention adjustment, prompt matrices, negative prompts, and a "highres fix". It also bundles face restoration tools such as GFPGAN and CodeFormer, supports multiple neural upscalers, and can be extended without limit through its plugin system. Even on VRAM-limited devices it offers suitable optimization options, putting high-quality AI art within reach.

**everything-claude-code** (`affaan-m/everything-claude-code`, ★ 161147): a high-performance optimization system built for AI coding assistants such as Claude Code, Codex, and Cursor. More than a set of configuration files, it is a complete, battle-tested framework that tackles the core weaknesses of AI agents in real development: inefficiency, lost memory, security risks, and lack of continuous learning. With modular skills, intuition enhancement, persistent memory, and built-in security scanning, it markedly improves AI performance on complex tasks and helps developers build more stable, production-grade agents. Its distinctive research-first development philosophy and token-consumption optimizations make model responses faster and cheaper while defending against potential attack vectors. It suits software developers, AI researchers, and teams that want deeply customized AI workflows, whether building large codebases or using AI for security audits and automated testing. An open-source project that won an Anthropic hackathon award, it combines multi-language support with a rich set of practical hooks.

**ComfyUI** (`Comfy-Org/ComfyUI`, ★ 109154): a powerful, highly modular visual AI engine built for designing and executing complex Stable Diffusion image generation pipelines. Instead of traditional code writing, it adopts an intuitive node-based flow-graph interface, letting users build personalized generation pipelines by wiring together functional modules. This design addresses the complexity and inflexibility of configuring advanced AI image workflows: without a programming background, users can freely combine models, tune parameters, and preview results in real time, handling everything from basic text-to-image to multi-stage high-resolution refinement. ComfyUI is highly compatible, supporting Windows, macOS, and Linux, running on NVIDIA, AMD, Intel, and Apple Silicon hardware, and adopting cutting-edge models such as SDXL, Flux, and SD3 early. Whether you are a researcher or developer probing what the algorithms can do, or a designer or experienced AI-art enthusiast chasing maximum creative freedom, ComfyUI offers strong support. Its modular architecture lets the community keep extending it, making it one of the most flexible open-source diffusion-model tools with the richest ecosystem, helping users turn ideas into results efficiently.

**gemini-cli** (`google-gemini/gemini-cli`, ★ 100752): an open-source AI command-line tool from Google that brings the powerful Gemini large models straight into the user's terminal. For developers who live on the command line, it offers the shortest path from prompt to model response, with intelligent assistance and no window switching. It removes the constant context switching of development work, letting users understand, generate, and debug code and automate operations tasks from within the terminal, whether querying a large codebase, building an app from a sketch, or running complex Git operations via natural language. It especially suits software engineers, DevOps practitioners, and technical researchers. Highlights include a context window of up to 1 million tokens with strong reasoning ability; built-in tools for Google Search, file operations, and shell command execution; and, uniquely, support for MCP (the Model Context Protocol), which lets users flexibly extend custom integrations and connect external capabilities such as image generation. A personal Google account comes with a free quota, and the project is fully open source under Apache 2.0, making it an ideal assistant for terminal productivity.