[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-InternRobotics--PointLLM":3,"tool-InternRobotics--PointLLM":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":83,"stars":92,"forks":93,"last_commit_at":94,"license":79,"difficulty_score":95,"env_os":96,"env_gpu":97,"env_ram":98,"env_deps":99,"category_tags":107,"github_topics":108,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":121,"updated_at":122,"faqs":123,"releases":152},908,"InternRobotics\u002FPointLLM","PointLLM","[ECCV 2024 Best Paper Candidate & TPAMI 2025] PointLLM: Empowering Large Language Models to Understand Point Clouds","PointLLM 是一个能够让大语言模型理解和处理三维点云数据的开源工具。它通过多模态学习，使模型不仅能识别物体的类型，还能理解其几何结构和外观特征，且不受深度模糊、遮挡或视角变化的干扰。\n\n该工具主要解决了传统语言模型难以处理三维视觉信息的问题。通过引入包含大量点云-文本配对指令的数据集，并采用两阶段训练策略，PointLLM 建立了生成式三维物体分类和三维物体描述生成两大评估基准，显著提升了模型对三维世界的感知与描述能力。\n\n它非常适合计算机视觉、机器人以及三维内容生成领域的研究人员和开发者使用。无论是希望探索三维场景理解的学术团队，还是需要为产品添加三维物体识别与交互功能的工程师，都能从中受益。其技术亮点在于首次将大语言模型与彩色点云理解深度结合，并通过精心构建的数据集和评估体系，为三维多模态学习提供了可靠的研究基础。","\u003Cp align=\"center\">\n\u003Ch1 align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_c1a982e8a5f2.png\" align=\"center\" width=\"6.5%\">\u003Cstrong>PointLLM: Empowering Large Language Models to 
Understand Point Clouds\u003C\u002Fstrong>\u003C\u002Fh1>\n  \u003Cp align=\"center\">\n    \u003Ca href='https:\u002F\u002Frunsenxu.com\u002F' target='_blank'>Runsen Xu\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fguanfang12.github.io\u002F' target='_blank'>Xiaolong Wang\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Ftai-wang.github.io\u002F' target='_blank'>Tai Wang\u003C\u002Fa>&emsp;\n    \u003Ca href='http:\u002F\u002Fyilunchen.com\u002Fabout' target='_blank'>Yilun Chen\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Foceanpang.github.io\u002F' target='_blank'>Jiangmiao Pang*\u003C\u002Fa>&emsp;\n    \u003Ca href='http:\u002F\u002Fdahua.site\u002F' target='_blank'>Dahua Lin\u003C\u002Fa>&emsp;\n    \u003Cbr>\n    The Chinese University of Hong Kong&emsp;Shanghai AI Laboratory&emsp;Zhejiang University\n  \u003C\u002Fp>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F2308.16911\" target='_blank'>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2308.16911-blue?\">\n  \u003C\u002Fa> \n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.16911.pdf\" target='_blank'>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-📖-blue?\">\n  \u003C\u002Fa> \n  \u003Ca href=\"https:\u002F\u002Frunsenxu.com\u002Fprojects\u002FPointLLM\" target='_blank'>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-&#x1F680-blue\">\n  \u003C\u002Fa>\n  \u003Ca href=\"\" target='_blank'>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-&#x1f917-blue\">\n  \u003C\u002Fa>\n  \u003Ca href=\"\" target='_blank'>\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_3beb23724624.png\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fopenxlab.org.cn\u002Fapps\u002Fdetail\u002Fopenxlab-app\u002FPointLLM\" target='_blank'>\n    \u003Cimg 
src=\"https:\u002F\u002Fcdn-static.openxlab.org.cn\u002Fapp-center\u002Fopenxlab_app.svg\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n## 🏠 About\n\u003C!-- ![Teaser](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_752a941587a4.jpg) -->\n\u003Cdiv style=\"text-align: center;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_752a941587a4.jpg\" alt=\"Dialogue_Teaser\" width=100% >\n\u003C\u002Fdiv>\nWe introduce \u003Cb>PointLLM, a multi-modal large language model capable of understanding colored point clouds of objects.\u003C\u002Fb> It perceives object types, geometric structures, and appearance without concerns for ambiguous depth, occlusion, or viewpoint dependency. \u003Cb>We collect a novel dataset comprising 660K simple and 70K complex point-text instruction pairs\u003C\u002Fb> to enable a two-stage training strategy. To rigorously evaluate our model's perceptual abilities and its generalization capabilities, \u003Cb>we establish two benchmarks: Generative 3D Object Classification and 3D Object Captioning, assessed through three different evaluation methods.\u003C\u002Fb>\n\n## 🔥 News\n- [2026-03-17] The training annotations for PointLLM-V2 are available [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Ftree\u002Fmain).\n- [2025-07-06] Our improved version of PointLLM, [PointLLM-V2](https:\u002F\u002Fwww.computer.org\u002Fcsdl\u002Fjournal\u002Ftp\u002F5555\u002F01\u002F11086426\u002F28xeHHLbKX6), has been accepted by TPAMI 2025! Models, codes, and data are coming! 🎉\n- [2025-04-21] We closed our online demo because we need to use the serving machine for other purposes.\n- [2024-09-06] We have uploaded the camera-ready version of PointLLM for ECCV 2024, which includes clearer writing and additional experimental results. 
Please check the paper [here](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.16911).\n- [2024-07-01] PointLLM has been accepted by ECCV 2024 with all \"strong-accept\" recommendations. 🎉 We are looking for self-motivated students to conduct research regarding PointLLM. Please send an email to runsxu@gmail.com with your CV if you are interested!\n- [2023-12-29] We release the codes of our online Gradio demo.\n- [2023-12-26] We release the codes for model evaluation, including ChatGPT\u002FGPT-4 evaluation and traditional metric evaluation.\n- [2023-12-08] We release the codes for training and PointLLM-v1.2. The online demo has also been upgraded to the v1.2 version. Please enjoy! &#x1F389; \n- [2023-12-01] We have released an updated version of our paper (v2), which includes additional baseline comparisons, enhanced human-evaluation metrics, improved model performance (PointLLM-v1.2), and other refinements. Please check the updated version [here](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.16911).\n- [2023-10-18] We release our instruction-following data, including both the simple-description and complex instructions. Download [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM).\n- [2023-09-26] We release the inferencing codes with checkpoints as well as the Objaverse colored point cloud files we use. You can chat with PointLLM on your own machine.\n- [2023-08-31] We release the [paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2308.16911) of PointLLM and an online Gradio demo. Try it! 
&#x1F389;\n\n\u003C!-- contents with emoji -->\n## 📋 Contents\n- [🤖 Online Demo](#-online-demo)\n- [💬 Dialogue Examples](#-dialogue-examples)\n- [🔍 Overview](#-overview)\n- [📦 Training and Evaluation](#-training-and-evaluation)\n- [📝 TODO List](#-todo-list)\n- [🔗 Citation](#-citation)\n- [📄 License](#-license)\n- [📚 Related Work](#-related-work)\n- [👏 Acknowledgements](#-acknowledgements)\n\n\n## 💬 Dialogue Examples\n| Dialogue 1 | Dialogue 2| Dialogue 3 | Dialogue 4\n| :-: | :-: | :-: | :-: |\n| \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_63e11e61c8a6.jpg\"> |  \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_59fa34a15ef1.jpg\"> |  \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_76e58346f20b.jpg\"> | \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_a409f68705f4.jpg\"> |\n\n\n## 🔍 Overview\n\n### Model\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_af713d326ccf.jpg\" align=\"center\" width=\"100%\">\n\u003C\u002Fp>\nThe point encoder extracts features from the input point cloud and projects them to the latent space of the LLM backbone. 
The LLM backbone processes sequences of point tokens and text tokens, and generates the predicted tokens as the output.\n\n### Experiment Results\n#### Quantitative Comparisons with baselines.\nPlease refer to our paper for more results.\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_b69a278c65c1.png\" align=\"center\" width=\"100%\">\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_f0484552a1bf.png\" align=\"center\" width=\"100%\">\n\u003C\u002Fp>\n\u003Cb>!!!Note: Traditional metrics such as BLEU-1, ROUGE-L, and METEOR tend to favor shorter responses and may not effectively capture semantic accuracy. For a detailed discussion on this, please refer to our paper. We suggest the community not solely rely on these metrics for evaluation.\u003C\u002Fb>\n\n#### Qualitative Comparisons with baselines.\nPlease refer to our paper for more results.\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_5321307c090c.png\" align=\"center\" width=\"100%\">\n\u003C\u002Fp>\n\n## 📦 Training and Evaluation\n### Installation\nWe test our codes under the following environment:\n- Ubuntu 20.04\n- NVIDIA Driver: 515.65.01\n- CUDA 11.7\n- Python 3.10.13\n- PyTorch 2.0.1\n- Transformers 4.28.0.dev(transformers.git@cae78c46)\n\nTo start: \n1. Clone this repository.\n```bash\ngit clone git@github.com:OpenRobotLab\u002FPointLLM.git\ncd PointLLM\n```\n2. Install packages\n```bash\nconda create -n pointllm python=3.10 -y\nconda activate pointllm\npip install --upgrade pip  # enable PEP 660 support\npip install -e .\n\n# * for training\npip install ninja\npip install flash-attn\n```\n\n### Data Preparation\n#### Objaverse Training Data\n1. 
Download the two compressed files of 660K Objaverse colored point clouds [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Ftree\u002Fmain). They require about 77GB of storage space.\n2. Run the following command to merge the two files into one and uncompress it. This will produce a folder named `8192_npy` containing 660K point cloud files named `{Objaverse_ID}_8192.npy`. Each file is a numpy array with dimensions (8192, 6), where the first three dimensions are `xyz` and the last three dimensions are `rgb` in [0, 1] range.\n```bash\ncat Objaverse_660K_8192_npy_split_a* > Objaverse_660K_8192_npy.tar.gz\ntar -xvf Objaverse_660K_8192_npy.tar.gz\n```\n3. In `PointLLM` folder, create a folder `data` and create a soft link to the uncompressed file in the directory.\n```bash\ncd PointLLM\nmkdir data\nln -s \u002Fpath\u002Fto\u002F8192_npy data\u002Fobjaverse_data\n```\n\n#### Instruction-Following Data\n1. In `PointLLM\u002Fdata` folder, create a directory named `anno_data`.\n2. Our instruction-following data, including both the simple-description and complex instructions, can be downloaded [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM). If you have difficulty downloading the data (e.g. network issue), please email the authors.\n- The simple-description data has 660K samples and the complex instructions have 70K samples.\n- Both training data are based on the Objaverse dataset.\n- The complex instructions are generated with GPT-4.\n3. Put the data files in the `anno_data` directory. The directory should look like this:\n```bash\nPointLLM\u002Fdata\u002Fanno_data\n├── PointLLM_brief_description_660K_filtered.json\n├── PointLLM_brief_description_660K.json\n└── PointLLM_complex_instruction_70K.json\n```\n4. Note, the `PointLLM_brief_description_660K_filtered.json` is filtered from `PointLLM_brief_description_660K.json` by removing the 3000 objects we reserved as the validation set. 
If you want to reproduce the results in our paper, you should use the `PointLLM_brief_description_660K_filtered.json` for training. The `PointLLM_complex_instruction_70K.json` contains objects from the training set.\n5. If you want to generate the complex instructions by yourself, please refer to our paper for other details. The system prompt is at `pointllm\u002Fdata\u002Fdata_generation\u002Fsystem_prompt_gpt4_0613.txt`.\n6. [Optional] The annotations for PointLLM-V2 are available at [PointLLM_V2_Stage1_1M_filtered.json](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Fresolve\u002Fmain\u002FPointLLM_V2_Stage1_1M_filtered.json) and [PointLLM_V2_Stage2_700k_filtered.json](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Fresolve\u002Fmain\u002FPointLLM_V2_Stage2_700k_filtered.json). You need to download additional point clouds from Objaverse-XL [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Ftiange\u002FCap3D\u002Ftree\u002Fmain\u002FPointCloud_zips).\n\n#### Evaluation Data\n1. Download the referencing GT `PointLLM_brief_description_val_200_GT.json` we use for the benchmarks on Objaverse dataset [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Fblob\u002Fmain\u002FPointLLM_brief_description_val_200_GT.json), and put it in `PointLLM\u002Fdata\u002Fanno_data`. We also provide the 3000 object ids we filter during training [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Fblob\u002Fmain\u002Fval_object_ids_3000.txt) and their corresponding referencing GT [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Fblob\u002Fmain\u002FPointLLM_brief_description_val_3000_GT.json), which can be used to evaluate on all the 3000 objects.\n2. Create a directory named `modelnet40_data` in `PointLLM\u002Fdata`. 
Download the test split of ModelNet40 point clouds `modelnet40_test_8192pts_fps.dat` [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Fblob\u002Fmain\u002Fmodelnet40_test_8192pts_fps.dat) and put it in `PointLLM\u002Fdata\u002Fmodelnet40_data`.\n\n### Training\n#### Download the Initial LLM and Point Encoder Weights\n1. In `PointLLM` folder, create a directory named `checkpoints`.\n2. Download the pre-trained LLM and point encoder: [\nPointLLM_7B_v1.1_init](https:\u002F\u002Fhuggingface.co\u002FRunsenXu\u002FPointLLM_7B_v1.1_init\u002Ftree\u002Fmain) or [PointLLM_13B_v1.1_init](https:\u002F\u002Fhuggingface.co\u002FRunsenXu\u002FPointLLM_13B_v1.1_init\u002Ftree\u002Fmain). Put them in the `checkpoints` directory.\n3. Note that the above \"v1.1\" means we use the Vicuna-v1.1 checkpoints, and you do **not** need to download the original LLaMA weights again. \n\n#### Start Training\n1. For stage-1 training, simply run:\n```bash\ncd PointLLM\nscripts\u002FPointLLM_train_stage1.sh\n```\n2. After stage-1 training, start stage-2 training:\n```bash\nscripts\u002FPointLLM_train_stage2.sh\n```\n\n#### PointLLM-v1.1 and PointLLM-v1.2\nUsually, you do not have to care about the following contents. They are only for reproducing the results in our v1 paper (PointLLM-v1.1). If you want to compare with our models or use our models for downstream tasks, please use PointLLM-v1.2 (refer to our v2 paper), which has better performance.\n\u003Cdetails>\n  \u003Csummary>The following steps are for reproducing PointLLM-v1.1 (click to expand)\u003C\u002Fsummary>\n  \n1. PointLLM v1.1 and v1.2 use slightly different pre-trained point encoders and projectors. If you want to reproduce PointLLM v1.1, edit the `config.json` file in the directory of initial LLM and point encoder weights, for example, `vim checkpoints\u002FPointLLM_7B_v1.1_init\u002Fconfig.json`.\n  \n2. 
Change the key `\"point_backbone_config_name\"` to specify another point encoder config:\n    ```bash\n    # change from\n    \"point_backbone_config_name\": \"PointTransformer_8192point_2layer\" # v1.2\n    # to\n    \"point_backbone_config_name\": \"PointTransformer_base_8192point\", # v1.1\n    ```\n\n3. Edit the checkpoint path of the point encoder in `scripts\u002Ftrain_stage1.sh`:\n    ```bash\n    # change from\n    point_backbone_ckpt=$model_name_or_path\u002Fpoint_bert_v1.2.pt # v1.2\n    # to\n    point_backbone_ckpt=$model_name_or_path\u002Fpoint_bert_v1.1.pt # v1.1\n    ```\n\u003C\u002Fdetails>\n\n### Chatting\n1. The trained model checkpoints are available [here](https:\u002F\u002Fhuggingface.co\u002FRunsenXu) (including different versions of PointLLM). \n2. Run the following command to launch a chatbot using the `torch.float32` data type for chatting about 3D models of Objaverse. The model checkpoints will be downloaded automatically. You can also manually download the model checkpoints and specify their paths. Here is an example:\n```bash\ncd PointLLM\nPYTHONPATH=$PWD python pointllm\u002Feval\u002FPointLLM_chat.py --model_name RunsenXu\u002FPointLLM_7B_v1.2 --data_name data\u002Fobjaverse_data --torch_dtype float32\n```\n3. You can also easily modify the codes for using point clouds other than those from Objaverse, as long as the point clouds input to the model have dimensions (N, 6), where the first three dimensions are `xyz` and the last three dimensions are `rgb` (in [0, 1] range). You may sample the point clouds to have 8192 points, as our model is trained on such point clouds.\n4. The following table shows GPU requirements for different models and data types. 
We recommend using `torch.bfloat16` if applicable, which is used in the experiments in our paper.\n   \n    |  Model   | Data Type | GPU Memory |\n    |:--------:|:---------:|:----------:|\n    | PointLLM-7B  | torch.float16 |    14GB    |\n    | PointLLM-7B  | torch.float32 |    28GB    |\n    | PointLLM-13B | torch.float16 |    26GB    |\n    | PointLLM-13B | torch.float32 |    52GB    |\n\n### Gradio Demo\n1. We provide the codes for our online Gradio demo. You can run the following commands to launch the demo locally for chatting and visualization.\n```bash\ncd PointLLM\nPYTHONPATH=$PWD python pointllm\u002Feval\u002Fchat_gradio.py --model_name RunsenXu\u002FPointLLM_7B_v1.2 --data_path data\u002Fobjaverse_data\n```\n2. Kind reminder: if you want to release the demo publicly, please refer to https:\u002F\u002Fwww.gradio.app\u002Fguides\u002Fsharing-your-app#security-and-file-access.\n\n### Evaluation\n#### Inferencing\n1. Run the following commands to infer the results.\n2. Different commands for inferencing on different benchmarks (PointLLM_7B_v1.2 as an example):\n```bash\ncd PointLLM\nexport PYTHONPATH=$PWD\n\n# Open Vocabulary Classification on Objaverse\npython pointllm\u002Feval\u002Feval_objaverse.py --model_name RunsenXu\u002FPointLLM_7B_v1.2 --task_type classification --prompt_index 0 # or --prompt_index 1\n\n# Object captioning on Objaverse\npython pointllm\u002Feval\u002Feval_objaverse.py --model_name RunsenXu\u002FPointLLM_7B_v1.2 --task_type captioning --prompt_index 2\n\n# Close-set Zero-shot Classification on ModelNet40\npython pointllm\u002Feval\u002Feval_modelnet_cls.py --model_name RunsenXu\u002FPointLLM_7B_v1.2 --prompt_index 0 # or --prompt_index 1\n```\n3. Please check the default command-line arguments of these two scripts. You can specify different prompts, data paths, and other parameters. \n4. 
After inferencing, the results will be saved in `{model_name}\u002Fevaluation` as a dict with the following format:\n```bash\n{\n  \"prompt\": \"\",\n  \"results\": [\n    {\n      \"object_id\": \"\",\n      \"ground_truth\": \"\", \n      \"model_output\": \"\",\n      \"label_name\": \"\" # only for classification on modelnet40\n    }\n  ]\n}\n```\n\n#### ChatGPT\u002FGPT-4 Evaluation\n1. Get your OpenAI API key at [https:\u002F\u002Fplatform.openai.com\u002Fapi-keys](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys).\n2. Run the following commands to evaluate the model outputs in parallel with ChatGPT\u002FGPT-4 (which cost approximately $1.5 to $2.2 USD).\n```bash\ncd PointLLM\nexport PYTHONPATH=$PWD\nexport OPENAI_API_KEY=sk-****\n\n# Open Vocabulary Classification on Objaverse\npython pointllm\u002Feval\u002Fevaluator.py --results_path \u002Fpath\u002Fto\u002Fmodel_output --model_type gpt-4-0613 --eval_type open-free-form-classification --parallel --num_workers 15\n\n# Object captioning on Objaverse\npython pointllm\u002Feval\u002Fevaluator.py --results_path \u002Fpath\u002Fto\u002Fmodel_output --model_type gpt-4-0613 --eval_type object-captioning --parallel --num_workers 15\n\n# Close-set Zero-shot Classification on ModelNet40\npython pointllm\u002Feval\u002Fevaluator.py --results_path \u002Fpath\u002Fto\u002Fmodel_output --model_type gpt-3.5-turbo-0613 --eval_type modelnet-close-set-classification --parallel --num_workers 15\n```\n3. The evaluation script supports interruption and resumption. You can interrupt the evaluation process at any time by using `Ctrl+C`. This will save the temporary results. If an error occurs during the evaluation, the script will also save the current state. You can resume the evaluation from where it left off by running the same command again.\n4. 
The evaluation results will be saved in `{model_name}\u002Fevaluation` as another dict.\nSome of the metrics are explained as follows:\n```bash\n\"average_score\": The GPT-evaluated captioning score we report in our paper.\n\"accuracy\": The classification accuracy we report in our paper, including random choices made by ChatGPT when model outputs are vague or ambiguous and ChatGPT outputs \"INVALID\".\n\"clean_accuracy\": The classification accuracy after removing those \"INVALID\" outputs.\n\"total_predictions\": The number of predictions.\n\"correct_predictions\": The number of correct predictions.\n\"invalid_responses\": The number of \"INVALID\" outputs by ChatGPT.\n\n# Some other statistics for calling OpenAI API\n\"prompt_tokens\": The total number of tokens of the prompts for ChatGPT\u002FGPT-4.\n\"completion_tokens\": The total number of tokens of the completion results from ChatGPT\u002FGPT-4.\n\"GPT_cost\": The API cost of the whole evaluation process, in US Dollars 💵.\n```\n5. \u003Cb>One-Step Evaluation.\u003C\u002Fb> You can also start evaluation immediately after inferencing by passing the `--start_eval` flag and specifying the `--gpt_type`. For example:\n```bash\npython pointllm\u002Feval\u002Feval_objaverse.py --model_name RunsenXu\u002FPointLLM_7B_v1.2 --task_type classification --prompt_index 0 --start_eval --gpt_type gpt-4-0613\n```\n\n#### Traditional Metric Evaluation\n1. For the object captioning task, run the following command to evaluate model outputs with traditional metrics including BLEU, ROUGE, METEOR, Sentence-BERT, and SimCSE.\n```bash\npython pointllm\u002Feval\u002Ftraditional_evaluator.py --results_path \u002Fpath\u002Fto\u002Fmodel_captioning_output\n```\n2. 
Note that we recommend not using BLEU, ROUGE, and METEOR for evaluation as they favor short captions and fall short of capturing semantic accuracy and diversity.\n\n## 📝 TODO List\n- [x] Add inferencing codes with checkpoints.\n- [x] Release instruction-following data.\n- [x] Add training codes.\n- [x] Add evaluation codes.\n- [x] Add gradio demo codes.\n- [ ] Release PointLLM-V2 with a better model and data.\n\nCommunity contributions are welcome!👇 If you need any support, please feel free to open an issue or contact us.\n- [ ] Support Phi-2 LLM to make PointLLM more accessible to the community.\n- [ ] Support Chinese LLMs like InternLM.\n\n## 🔗 Citation\n\nIf you find our work and this codebase helpful, please consider starring this repo 🌟 and citing:\n\n```bibtex\n@inproceedings{xu2024pointllm,\n  title={PointLLM: Empowering Large Language Models to Understand Point Clouds},\n  author={Xu, Runsen and Wang, Xiaolong and Wang, Tai and Chen, Yilun and Pang, Jiangmiao and Lin, Dahua},\n  booktitle={ECCV},\n  year={2024}\n}\n```\n\n## 📄 License\n\u003Ca rel=\"license\" href=\"http:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-nc-sa\u002F4.0\u002F\">\u003Cimg alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_8a4e76cf0ed2.png\" \u002F>\u003C\u002Fa>\n\u003Cbr \u002F>\nThis work is under the \u003Ca rel=\"license\" href=\"http:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-nc-sa\u002F4.0\u002F\">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License\u003C\u002Fa>.\n\n## 📚 Related Work\nTogether, let's make LLMs for 3D great!\n- [Point-Bind & Point-LLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.00615): aligns point clouds with ImageBind, and leverages ImageBind-LLM to reason over multi-modality input without 3D-instruction data training.\n- [3D-LLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.12981): employs 2D foundation models to 
encode multi-view images of 3D point clouds.\n\n\n## 👏 Acknowledgements\n- [LLaVA](https:\u002F\u002Fgithub.com\u002Fhaotian-liu\u002FLLaVA): Our codebase is built upon LLaVA.\n- [Vicuna](https:\u002F\u002Fgithub.com\u002Flm-sys\u002FFastChat): We use the Vicuna-7B and Vicuna-13B checkpoints.\n- [Objaverse](https:\u002F\u002Fobjaverse.allenai.org): We use models of the Objaverse dataset for training and evaluation.\n- [Cap3D](https:\u002F\u002Fgithub.com\u002Fcrockwell\u002FCap3D\u002F): We use the Cap3D captioning data for our data generation.\n- [ULIP-2](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FULIP): We use ULIP-2 for pre-training our point cloud encoder.\n","\u003Cp align=\"center\">\n\u003Ch1 align=\"center\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_c1a982e8a5f2.png\" align=\"center\" width=\"6.5%\">\u003Cstrong>PointLLM：赋能大语言模型理解点云\u003C\u002Fstrong>\u003C\u002Fh1>\n  \u003Cp align=\"center\">\n    \u003Ca href='https:\u002F\u002Frunsenxu.com\u002F' target='_blank'>Runsen Xu\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Fguanfang12.github.io\u002F' target='_blank'>Xiaolong Wang\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Ftai-wang.github.io\u002F' target='_blank'>Tai Wang\u003C\u002Fa>&emsp;\n    \u003Ca href='http:\u002F\u002Fyilunchen.com\u002Fabout' target='_blank'>Yilun Chen\u003C\u002Fa>&emsp;\n    \u003Ca href='https:\u002F\u002Foceanpang.github.io\u002F' target='_blank'>Jiangmiao Pang*\u003C\u002Fa>&emsp;\n    \u003Ca href='http:\u002F\u002Fdahua.site\u002F' target='_blank'>Dahua Lin\u003C\u002Fa>&emsp;\n    \u003Cbr>\n    香港中文大学&emsp;上海人工智能实验室&emsp;浙江大学\n  \u003C\u002Fp>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"http:\u002F\u002Farxiv.org\u002Fabs\u002F2308.16911\" target='_**blank**'>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2308.16911-blue?\">\n  \u003C\u002Fa> \n  \u003Ca 
href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2308.16911.pdf\" target='_blank'>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-📖-blue?\">\n  \u003C\u002Fa> \n  \u003Ca href=\"https:\u002F\u002Frunsenxu.com\u002Fprojects\u002FPointLLM\" target='_blank'>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-&#x1F680-blue\">\n  \u003C\u002Fa>\n  \u003Ca href=\"\" target='_blank'>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-&#x1f917-blue\">\n  \u003C\u002Fa>\n  \u003Ca href=\"\" target='_blank'>\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_b06df2c0a34d.png\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fopenxlab.org.cn\u002Fapps\u002Fdetail\u002Fopenxlab-app\u002FPointLLM\" target='_blank'>\n    \u003Cimg src=\"https:\u002F\u002Fcdn-static.openxlab.org.cn\u002Fapp-center\u002Fopenxlab_app.svg\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n## 🏠 简介\n\u003C!-- ![Teaser](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_752a941587a4.jpg) -->\n\u003Cdiv style=\"text-align: center;\">\n    \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_752a941587a4.jpg\" alt=\"Dialogue_Teaser\" width=100% >\n\u003C\u002Fdiv>\n我们提出了 \u003Cb>PointLLM，一个能够理解物体彩色点云的多模态大语言模型。\u003C\u002Fb> 它能够感知物体类型、几何结构和外观，而无需担心深度模糊、遮挡或视角依赖性问题。\u003Cb>我们收集了一个包含 66 万条简单指令对和 7 万条复杂指令对的新数据集\u003C\u002Fb>，以支持两阶段训练策略。为了严格评估我们模型的感知能力及其泛化能力，\u003Cb>我们建立了两个基准测试：生成式 3D 物体分类和 3D 物体描述生成，并通过三种不同的评估方法进行评估。\u003C\u002Fb>\n\n## 🔥 新闻\n- [2026-03-17] PointLLM-V2 的训练标注已发布在[此处](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Ftree\u002Fmain)。\n- [2025-07-06] PointLLM 的改进版本 [PointLLM-V2](https:\u002F\u002Fwww.computer.org\u002Fcsdl\u002Fjournal\u002Ftp\u002F5555\u002F01\u002F11086426\u002F28xeHHLbKX6) 已被 TPAMI 2025 接收！模型、代码和数据即将发布！🎉\n- [2025-04-21] 
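We have shut down the online demo because the server is needed for other purposes.\n- [2024-09-06] We have uploaded the camera-ready version of PointLLM for ECCV 2024, which includes clearer writing and additional experimental results. Please check the paper [here](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.16911).\n- [2024-07-01] PointLLM has been accepted by ECCV 2024 with all “strong accept” recommendations. 🎉 We are looking for self-motivated students to conduct research on PointLLM. If you are interested, please send your CV to runsxu@gmail.com!\n- [2023-12-29] We release the code of our online Gradio demo.\n- [2023-12-26] We release the code for model evaluation, including ChatGPT\u002FGPT-4 evaluation and traditional metric evaluation.\n- [2023-12-08] We release the training code and the PointLLM-v1.2 model. The online demo has also been upgraded to v1.2. Please enjoy! &#x1F389;\n- [2023-12-01] We have released an updated version of our paper (v2), which includes additional baseline comparisons, enhanced human-evaluation metrics, improved model performance (PointLLM-v1.2), and other refinements. Please check the updated version [here](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.16911).\n- [2023-10-18] We release our instruction-following data, including both the simple-description and complex instructions. Download [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM).\n- [2023-09-26] We release the inference code with checkpoints, as well as the Objaverse colored point cloud files we use. You can chat with PointLLM on your own machine.\n- [2023-08-31] We release the [paper](http:\u002F\u002Farxiv.org\u002Fabs\u002F2308.16911) of PointLLM and an online Gradio demo. Try it! &#x1F389;\n\n\u003C!-- contents with emoji -->\n## 📋 Contents\n- [🤖 Online Demo](#-online-demo)\n- [💬 Dialogue Examples](#-dialogue-examples)\n- [🔍 Overview](#-overview)\n- [📦 Training and Evaluation](#-training-and-evaluation)\n- [📝 TODO List](#-todo-list)\n- [🔗 Citation](#-citation)\n- [📄 License](#-license)\n- [📚 Related Work](#-related-work)\n- [👏 Acknowledgements](#-acknowledgements)\n\n\n## 💬 Dialogue Examples\n| Dialogue 1 | Dialogue 2 | Dialogue 3 | Dialogue 4 |\n| :-: | :-: | :-: | :-: |\n| \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_63e11e61c8a6.jpg\"> |  \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_59fa34a15ef1.jpg\"> |  \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_76e58346f20b.jpg\"> | \u003Cimg width=\"100%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_a409f68705f4.jpg\"> |\n\n\n## 🔍 Overview\n\n### Model\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_af713d326ccf.jpg\" align=\"center\" 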
width=\"100%\">\n\u003C\u002Fp>\nThe point encoder extracts features from the input point cloud and projects them into the latent space of the LLM backbone. The LLM backbone processes the sequence of point tokens and text tokens and generates the predicted tokens as output.\n\n### Experiment Results\n#### Quantitative comparisons with baselines.\nPlease refer to our paper for more results.\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_b69a278c65c1.png\" align=\"center\" width=\"100%\">\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_f0484552a1bf.png\" align=\"center\" width=\"100%\">\n\u003C\u002Fp>\n\u003Cb>!!!Note: Traditional metrics such as BLEU-1, ROUGE-L, and METEOR tend to favor shorter responses and may not effectively capture semantic accuracy. For a detailed discussion of this issue, please refer to our paper. We suggest that the community not rely solely on these metrics for evaluation.\u003C\u002Fb>\n\n#### Qualitative comparisons with baselines.\nPlease refer to our paper for more results.\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_5321307c090c.png\" align=\"center\" width=\"100%\">\n\u003C\u002Fp>\n\n## 📦 Training and Evaluation\n### Installation\nWe tested our code in the following environment:\n- Ubuntu 20.04\n- NVIDIA Driver: 515.65.01\n- CUDA 11.7\n- Python 3.10.13\n- PyTorch 2.0.1\n- Transformers 4.28.0.dev(transformers.git@cae78c46)\n\nTo start:\n1. Clone this repository.\n```bash\ngit clone git@github.com:OpenRobotLab\u002FPointLLM.git\ncd PointLLM\n```\n2. Install packages\n```bash\nconda create -n pointllm python=3.10 -y\nconda activate pointllm\npip install --upgrade pip  # enable PEP 660 support\npip install -e .\n\n# * for training\npip install ninja\npip install flash-attn\n```\n\n### Data Preparation\n#### Objaverse Training Data\n1. Download the two compressed files of 660K Objaverse colored point clouds [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Ftree\u002Fmain). They require about 77GB of storage space.\n2. Run the following commands to merge the two files into one and uncompress it. This produces a folder named `8192_npy` containing 660K point cloud files named `{Objaverse_ID}_8192.npy`. Each file is a numpy array of shape (8192, 6), where the first three columns are `xyz` and the last three are `rgb` in the [0, 1] range.\n```bash\ncat Objaverse_660K_8192_npy_split_a* > Objaverse_660K_8192_npy.tar.gz\ntar -xvf Objaverse_660K_8192_npy.tar.gz\n```\n3. 
In the `PointLLM` folder, create a `data` folder and, inside it, create a soft link to the uncompressed files.\n```bash\ncd PointLLM\nmkdir data\nln -s \u002Fpath\u002Fto\u002F8192_npy data\u002Fobjaverse_data\n```\n\n#### Instruction-Following Data\n1. In the `PointLLM\u002Fdata` folder, create a directory named `anno_data`.\n2. Our instruction-following data, including both the simple-description and complex instructions, can be downloaded [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM). If you have difficulty downloading the data (e.g. network issues), please email the authors.\n- The simple-description data has 660K samples and the complex instructions have 70K samples.\n- Both sets of training data are based on the Objaverse dataset.\n- The complex instructions are generated with GPT-4.\n3. Put the data files in the `anno_data` directory. The directory should look like this:\n```bash\nPointLLM\u002Fdata\u002Fanno_data\n├── PointLLM_brief_description_660K_filtered.json\n├── PointLLM_brief_description_660K.json\n└── PointLLM_complex_instruction_70K.json\n```\n4. Note that `PointLLM_brief_description_660K_filtered.json` is filtered from `PointLLM_brief_description_660K.json` by removing the 3000 objects we reserved as the validation set. If you want to reproduce the results in our paper, use `PointLLM_brief_description_660K_filtered.json` for training. `PointLLM_complex_instruction_70K.json` contains objects from the training set.\n5. If you want to generate the complex instructions yourself, please refer to our paper for further details. The system prompt is at `pointllm\u002Fdata\u002Fdata_generation\u002Fsystem_prompt_gpt4_0613.txt`.\n6. [Optional] The annotations of PointLLM-V2 are available at [PointLLM_V2_Stage1_1M_filtered.json](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Fresolve\u002Fmain\u002FPointLLM_V2_Stage1_1M_filtered.json) and [PointLLM_V2_Stage2_700k_filtered.json](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Fresolve\u002Fmain\u002FPointLLM_V2_Stage2_700k_filtered.json). You will need to download the additional point clouds from Objaverse-XL [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Ftiange\u002FCap3D\u002Ftree\u002Fmain\u002FPointCloud_zips).\n\n#### Evaluation Data\n1. 
Download the reference ground truth `PointLLM_brief_description_val_200_GT.json` that we use for the benchmarks on the Objaverse dataset [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Fblob\u002Fmain\u002FPointLLM_brief_description_val_200_GT.json), and put it in `PointLLM\u002Fdata\u002Fanno_data`. We also provide the IDs of the 3000 objects filtered out during training [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Fblob\u002Fmain\u002Fval_object_ids_3000.txt) and their corresponding reference ground truth [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Fblob\u002Fmain\u002FPointLLM_brief_description_val_3000_GT.json), which can be used to evaluate on all 3000 objects.\n2. Create a directory named `modelnet40_data` in `PointLLM\u002Fdata`. Download the test split of the ModelNet40 point clouds, `modelnet40_test_8192pts_fps.dat`, [here](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Fblob\u002Fmain\u002Fmodelnet40_test_8192pts_fps.dat) and put it in `PointLLM\u002Fdata\u002Fmodelnet40_data`.\n\n### Training\n#### Download the Initial LLM and Point Encoder Weights\n1. In the `PointLLM` folder, create a directory named `checkpoints`.\n2. Download the pre-trained LLM and point encoder: [PointLLM_7B_v1.1_init](https:\u002F\u002Fhuggingface.co\u002FRunsenXu\u002FPointLLM_7B_v1.1_init\u002Ftree\u002Fmain) or [PointLLM_13B_v1.1_init](https:\u002F\u002Fhuggingface.co\u002FRunsenXu\u002FPointLLM_13B_v1.1_init\u002Ftree\u002Fmain). Put them in the `checkpoints` directory.\n3. Note that the \"v1.1\" above means that we use the Vicuna-v1.1 checkpoints, and you do **not** need to download the original LLaMA weights again.\n\n#### Start Training\n1. For stage-1 training, simply run:\n```bash\ncd PointLLM\nscripts\u002FPointLLM_train_stage1.sh\n```\n2. After stage-1 training finishes, start stage-2 training:\n```bash\nscripts\u002FPointLLM_train_stage2.sh\n```\n\n#### PointLLM-v1.1 and PointLLM-v1.2\nUsually, you do not need to care about the following. It is only for reproducing the results in our v1 paper (PointLLM-v1.1). If you want to compare with our models or use them for downstream tasks, please use PointLLM-v1.2 (see our v2 paper), which performs better.\n\u003Cdetails>\n  \u003Csummary>The following steps are for reproducing PointLLM-v1.1 (click to expand)\u003C\u002Fsummary>\n\n1. PointLLM v1.1 and v1.2 use slightly different pre-trained point encoders and projectors. If you want to reproduce PointLLM v1.1, edit the `config.json` file in the directory of the initial LLM and point encoder weights, for example `vim checkpoints\u002FPointLLM_7B_v1.1_init\u002Fconfig.json`.\n\n2. 
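Change the key `\"point_backbone_config_name\"` to specify another point encoder config:\n    ```bash\n    # change from\n    \"point_backbone_config_name\": \"PointTransformer_8192point_2layer\" # v1.2\n    # to\n    \"point_backbone_config_name\": \"PointTransformer_base_8192point\", # v1.1\n    ```\n\n3. Edit the checkpoint path of the point encoder in `scripts\u002FPointLLM_train_stage1.sh`:\n    ```bash\n    # change from\n    point_backbone_ckpt=$model_name_or_path\u002Fpoint_bert_v1.2.pt # v1.2\n    # to\n    point_backbone_ckpt=$model_name_or_path\u002Fpoint_bert_v1.1.pt # v1.1\n    ```\n\u003C\u002Fdetails>\n\n### Chatting\n1. The trained model checkpoints are available [here](https:\u002F\u002Fhuggingface.co\u002FRunsenXu) (including different versions of PointLLM).\n2. Run the following command to launch a chatbot using the `torch.float32` data type for chatting about 3D models from Objaverse. The model checkpoints are downloaded automatically. You can also download the checkpoints manually and specify their paths. Here is an example:\n```bash\ncd PointLLM\nPYTHONPATH=$PWD python pointllm\u002Feval\u002FPointLLM_chat.py --model_name RunsenXu\u002FPointLLM_7B_v1.2 --data_name data\u002Fobjaverse_data --torch_dtype float32\n```\n3. You can also easily modify the code to use point clouds other than those from Objaverse, as long as the point clouds fed to the model have shape (N, 6), where the first three columns are `xyz` and the last three are `rgb` (in the [0, 1] range). You may sample the point clouds down to 8192 points, since our model is trained on such point clouds.\n4. The following table shows the GPU requirements for different models and data types. We recommend `torch.bfloat16` if applicable, which is the data type used in the experiments in our paper.\n\n    |  Model   | Data Type | GPU Memory |\n    |:--------:|:---------:|:----------:|\n    | PointLLM-7B  | torch.float16 |    14GB    |\n    | PointLLM-7B  | torch.float32 |    28GB    |\n    | PointLLM-13B | torch.float16 |    26GB    |\n    | PointLLM-13B | torch.float32 |    52GB    |\n\n### Gradio Demo\n1. We provide the code for our online Gradio demo. You can run the following commands to launch the demo locally for chatting and visualization.\n```bash\ncd PointLLM\nPYTHONPATH=$PWD python pointllm\u002Feval\u002Fchat_gradio.py --model_name RunsenXu\u002FPointLLM_7B_v1.2 --data_path data\u002Fobjaverse_data\n```\n2. Kind reminder: if you want to release the demo publicly, please refer to https:\u002F\u002Fwww.gradio.app\u002Fguides\u002Fsharing-your-app#security-and-file-access.\n\n### Evaluation\n#### Inference\n1. Run the following commands to run inference.\n2. 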
Different commands for inference on different benchmarks (PointLLM_7B_v1.2 as an example):\n```bash\ncd PointLLM\nexport PYTHONPATH=$PWD\n\n# Open-vocabulary classification on Objaverse\npython pointllm\u002Feval\u002Feval_objaverse.py --model_name RunsenXu\u002FPointLLM_7B_v1.2 --task_type classification --prompt_index 0 # or --prompt_index 1\n\n# Object captioning on Objaverse\npython pointllm\u002Feval\u002Feval_objaverse.py --model_name RunsenXu\u002FPointLLM_7B_v1.2 --task_type captioning --prompt_index 2\n\n# Close-set zero-shot classification on ModelNet40\npython pointllm\u002Feval\u002Feval_modelnet_cls.py --model_name RunsenXu\u002FPointLLM_7B_v1.2 --prompt_index 0 # or --prompt_index 1\n```\n3. Please check the default command-line arguments of these two scripts. You can specify different prompts, data paths, and other parameters.\n4. After inference, the results are saved in `{model_name}\u002Fevaluation` as a dict with the following format:\n```bash\n{\n  \"prompt\": \"\",\n  \"results\": [\n    {\n      \"object_id\": \"\",\n      \"ground_truth\": \"\", \n      \"model_output\": \"\",\n      \"label_name\": \"\" # only for classification on modelnet40\n    }\n  ]\n}\n```\n\n#### ChatGPT\u002FGPT-4 Evaluation\n1. Get your OpenAI API key at [https:\u002F\u002Fplatform.openai.com\u002Fapi-keys](https:\u002F\u002Fplatform.openai.com\u002Fapi-keys).\n2. Run the following commands to evaluate the model outputs in parallel with ChatGPT\u002FGPT-4 (this costs roughly $1.5 to $2.2).\n```bash\ncd PointLLM\nexport PYTHONPATH=$PWD\nexport OPENAI_API_KEY=sk-****\n\n# Open-vocabulary classification on Objaverse\npython pointllm\u002Feval\u002Fevaluator.py --results_path \u002Fpath\u002Fto\u002Fmodel_output --model_type gpt-4-0613 --eval_type open-free-form-classification --parallel --num_workers 15\n\n# Object captioning on Objaverse\npython pointllm\u002Feval\u002Fevaluator.py --results_path \u002Fpath\u002Fto\u002Fmodel_output --model_type gpt-4-0613 --eval_type object-captioning --parallel --num_workers 15\n\n# Close-set zero-shot classification on ModelNet40\npython pointllm\u002Feval\u002Fevaluator.py --results_path \u002Fpath\u002Fto\u002Fmodel_output --model_type gpt-3.5-turbo-0613 --eval_type modelnet-close-set-classification --parallel --num_workers 15\n```\n3. The evaluation script supports interruption and resumption. You can interrupt the evaluation at any time with `Ctrl+C`, which saves the temporary results. If an error occurs during evaluation, the script also saves the current state. You can resume from where it left off by running the same command again.\n4. 
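The evaluation results are saved in `{model_name}\u002Fevaluation` as another dict.\nSome of the metrics are explained below:\n```bash\n\"average_score\": The GPT-evaluated captioning score we report in our paper.\n\"accuracy\": The classification accuracy we report in our paper, including random choices made by ChatGPT when model outputs are vague or ambiguous and ChatGPT outputs \"INVALID\".\n\"clean_accuracy\": The classification accuracy after removing those \"INVALID\" outputs.\n\"total_predictions\": The number of predictions.\n\"correct_predictions\": The number of correct predictions.\n\"invalid_responses\": The number of \"INVALID\" outputs by ChatGPT.\n\n# Some other statistics for calling the OpenAI API\n\"prompt_tokens\": The total number of tokens in the prompts for ChatGPT\u002FGPT-4.\n\"completion_tokens\": The total number of tokens in the completion results from ChatGPT\u002FGPT-4.\n\"GPT_cost\": The API cost of the whole evaluation process, in US dollars 💵.\n```\n5. \u003Cb>One-step evaluation.\u003C\u002Fb> You can also start evaluation immediately after inference by passing the `--start_eval` flag and specifying `--gpt_type`. For example:\n```bash\npython pointllm\u002Feval\u002Feval_objaverse.py --model_name RunsenXu\u002FPointLLM_7B_v1.2 --task_type classification --prompt_index 0 --start_eval --gpt_type gpt-4-0613\n```\n\n#### Traditional Metric Evaluation\n1. For the object captioning task, run the following command to evaluate model outputs with traditional metrics, including BLEU, ROUGE, METEOR, Sentence-BERT, and SimCSE.\n```bash\npython pointllm\u002Feval\u002Ftraditional_evaluator.py --results_path \u002Fpath\u002Fto\u002Fmodel_captioning_output\n```\n2. 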
Note that we do not recommend using BLEU, ROUGE, and METEOR for evaluation, as they favor short captions and fall short in capturing semantic accuracy and diversity.\n\n## 📝 TODO List\n- [x] Add inference code with checkpoints.\n- [x] Release instruction-following data.\n- [x] Add training code.\n- [x] Add evaluation code.\n- [x] Add gradio demo code.\n- [ ] Release PointLLM-V2 with better models and data.\n\nCommunity contributions are welcome!👇 If you need any support, please feel free to open an issue or contact us.\n- [ ] Support the Phi-2 LLM to make PointLLM more accessible to the community.\n- [ ] Support Chinese LLMs such as InternLM.\n\n## 🔗 Citation\n\nIf you find our work and this codebase helpful, please consider starring this repo 🌟 and citing:\n\n```bibtex\n@inproceedings{xu2024pointllm,\n  title={PointLLM: Empowering Large Language Models to Understand Point Clouds},\n  author={Xu, Runsen and Wang, Xiaolong and Wang, Tai and Chen, Yilun and Pang, Jiangmiao and Lin, Dahua},\n  booktitle={ECCV},\n  year={2024}\n}\n```\n\n## 📄 License\n\u003Ca rel=\"license\" href=\"http:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-nc-sa\u002F4.0\u002F\">\u003Cimg alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_readme_8a4e76cf0ed2.png\" \u002F>\u003C\u002Fa>\n\u003Cbr \u002F>\nThis work is licensed under a \u003Ca rel=\"license\" href=\"http:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-nc-sa\u002F4.0\u002F\">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License\u003C\u002Fa>.\n\n## 📚 Related Work\nTogether, let's advance large language models (LLMs) for 3D!\n- [Point-Bind & Point-LLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2309.00615): aligns point clouds with Image-Bind and leverages ImageBind-LLM to reason over multi-modal input without training on 3D instruction data.\n- [3D-LLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2307.12981): employs 2D foundation models to encode multi-view images of 3D point clouds.\n\n## 👏 Acknowledgements\n- [LLaVA](https:\u002F\u002Fgithub.com\u002Fhaotian-liu\u002FLLaVA): Our codebase is built upon LLaVA.\n- [Vicuna](https:\u002F\u002Fgithub.com\u002Flm-sys\u002FFastChat): We use the Vicuna-7B and Vicuna-13B checkpoints.\n- [Objaverse](https:\u002F\u002Fobjaverse.allenai.org): We use models from the Objaverse dataset for training and evaluation.\n- [Cap3D](https:\u002F\u002Fgithub.com\u002Fcrockwell\u002FCap3D\u002F): We use the Cap3D captioning data for our data generation.\n- [ULIP-2](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FULIP): We use ULIP-2 for pre-training our point cloud encoder.","# PointLLM Quick Start Guide\n\nPointLLM 
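is a multimodal large language model that understands colored point clouds of objects. This guide walks you through environment setup, installation, and basic usage.\n\n## Environment\n\n**System requirements** (the environment the code was tested in)\n- **OS**: Ubuntu 20.04\n- **GPU driver**: NVIDIA Driver 515.65.01\n- **CUDA**: 11.7\n- **Python**: 3.10.13\n\n**Prerequisites**\n- PyTorch 2.0.1\n- Transformers 4.28.0.dev (pinned to `transformers.git@cae78c46`)\n\n## Installation\n\n1.  **Clone the repository**\n    ```bash\n    git clone git@github.com:OpenRobotLab\u002FPointLLM.git\n    cd PointLLM\n    ```\n\n2.  **Create and activate a Conda environment**\n    ```bash\n    conda create -n pointllm python=3.10 -y\n    conda activate pointllm\n    ```\n\n3.  **Install the dependencies**\n    ```bash\n    pip install --upgrade pip\n    pip install -e .\n    ```\n    *Note: for model training, also install the following packages*\n    ```bash\n    pip install ninja\n    pip install flash-attn\n    ```\n\n## Data Preparation\n\n### 1. Download the point cloud data\nDownload the two compressed files (about 77GB in total) from [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM\u002Ftree\u002Fmain), then merge and uncompress them:\n```bash\ncat Objaverse_660K_8192_npy_split_a* > Objaverse_660K_8192_npy.tar.gz\ntar -xvf Objaverse_660K_8192_npy.tar.gz\n```\nThis produces an `8192_npy` folder containing 660K point cloud files (`.npy` format, shape `(8192, 6)`).\n\n### 2. Create the data link\nCreate a `data` folder in the `PointLLM` directory and add a soft link pointing to the uncompressed data:\n```bash\ncd PointLLM\nmkdir data\nln -s \u002Fpath\u002Fto\u002F8192_npy data\u002Fobjaverse_data\n```\nReplace `\u002Fpath\u002Fto\u002F8192_npy` with your actual path.\n\n### 3. Download the instruction data\n1.  Create an `anno_data` directory under `PointLLM\u002Fdata`.\n2.  Download the following three JSON files from the same [Hugging Face dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRunsenXu\u002FPointLLM) and put them in the `anno_data` directory:\n    - `PointLLM_brief_description_660K_filtered.json`\n    - `PointLLM_brief_description_660K.json`\n    - `PointLLM_complex_instruction_70K.json`\n\n## Basic Usage (Inference)\n\n### 1. Download model weights\nTrained checkpoints such as `RunsenXu\u002FPointLLM_7B_v1.2` are hosted on Hugging Face and are downloaded automatically by the chat script, so inference needs no manual download. The initial weights below are only needed if you plan to train: create a `checkpoints` folder in the `PointLLM` directory and place them there (7B version shown):\n- [PointLLM_7B_v1.1_init](https:\u002F\u002Fhuggingface.co\u002FRunsenXu\u002FPointLLM_7B_v1.1_init\u002Ftree\u002Fmain)\n\n### 2. Run the inference script\nThe repository provides a chat script, `pointllm\u002Feval\u002FPointLLM_chat.py`, that loads the model and lets you interact with your point cloud data. The basic flow is:\n1.  Load the point cloud data (`.npy` file).\n2.  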
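Process the point cloud with the `PointLLM` model.\n3.  Enter a text instruction and get the description or dialogue reply generated by the model.\n\n*For the exact command, see the chat section of the project `README`, which runs `pointllm\u002Feval\u002FPointLLM_chat.py` with `--model_name`, `--data_name`, and `--torch_dtype`.*\n\n## Model Training (Optional)\n\nIf you need to train from scratch or fine-tune the model, follow these steps:\n\n### 1. Stage-1 training\nRun the provided script:\n```bash\ncd PointLLM\nscripts\u002FPointLLM_train_stage1.sh\n```\n\n### 2. Stage-2 training\nAfter stage-1 training finishes, run:\n```bash\nscripts\u002FPointLLM_train_stage2.sh\n```\n\n**Note**: training requires substantial compute and time, so make sure your hardware meets the requirements.","An autonomous-driving algorithm engineer is building a vehicle's 3D scene-understanding module and needs the system to accurately recognize and describe objects, and their states, in the complex point clouds produced by LiDAR scans.\n\n### Without PointLLM\n- **Reliance on separate multi-stage models**: point cloud segmentation, 3D object detection, and text generation models must each be deployed, which makes the pipeline cumbersome and adds latency.\n- **Hard to handle complex semantic queries**: the system can only output basic categories such as “car” or “pedestrian” and cannot answer questions like “Is the silver sedan at the front left opening its door?”, which require reasoning over geometry, appearance, and context.\n- **Sparse, fixed descriptions**: generated descriptions are usually templated and miss details such as object color, precise orientation, and part states (for example, whether a door is open).\n- **Limited generalization**: models trained on a specific dataset degrade noticeably on new object shapes or heavily occluded point clouds in real road data.\n- **Difficult debugging and evaluation**: engineers have to compare point clouds and outputs by hand, which makes it hard to assess quickly and quantitatively how well the model understands 3D geometry and semantics together.\n\n### With PointLLM\n- **Unified end-to-end understanding**: as a multimodal large model, PointLLM takes point clouds and text instructions directly and handles perception, reasoning, and description in one place, simplifying the system architecture.\n- **Natural-language interactive queries**: engineers or the system can ask questions in natural language, such as “Describe the nearest object at the rear right”, and PointLLM interprets the instruction and returns a description covering category, geometric attributes (position, size), appearance (color), and likely state.\n- **Rich, accurate, non-templated descriptions**: it can produce detailed, free-form descriptions such as “a red truck, partly loaded with cargo, with its cargo door open”, which greatly increases the granularity of scene understanding.\n- **Strong zero-shot generalization**: because it is trained on large-scale point cloud and text pairs, PointLLM understands and describes unseen object shapes, heavy occlusion, and sparse point clouds more robustly.\n- **Built-in benchmarks for faster iteration**: with the generative 3D classification and captioning benchmarks that PointLLM provides, engineers can evaluate model performance quickly and quantitatively and speed up development and debugging.\n\nThe core value of PointLLM is that it tightly combines the reasoning and generation abilities of large language models with point cloud perception, letting machines understand and describe the 3D physical world in a way that is closer to human thinking.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FInternRobotics_PointLLM_752a9415.jpg","InternRobotics","Intern Robotics","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FInternRobotics_3c4861a1.png","Building inclusive infrastructure for Embodied AI, from Shanghai AI Lab.",null,"embodiedai@pjlab.org.cn","https:\u002F\u002Finternrobotics.shlab.org.cn","https:\u002F\u002Fgithub.com\u002FInternRobotics",[84,88],{"name":85,"color":86,"percentage":87},"Python","#3572A5",98.6,{"name":89,"color":90,"percentage":91},"Shell","#89e051",1.4,996,56,"2026-04-03T04:23:43",4,"Linux","Required. Tested with NVIDIA Driver 515.65.01 and CUDA 11.7. Training memory is not specified; for inference, the README lists about 14GB (float16) or 28GB (float32) of GPU memory for PointLLM-7B, and 26GB or 52GB for PointLLM-13B.","Not specified, but about 77GB 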
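of storage is needed for the downloaded training data.",{"notes":100,"python":101,"dependencies":102},"1. The environment was tested on Ubuntu 20.04. 2. Use conda to create and activate a virtual environment named 'pointllm'. 3. The first run requires downloading about 77GB of point cloud training data plus the pre-trained model weights. 4. Training has two stages; run the scripts in order. 5. Initial weights are provided for both the 7B and 13B models.","3.10.13",[103,104,105,106],"torch==2.0.1","transformers==4.28.0.dev","ninja","flash-attn",[26,54,13],[109,110,111,112,113,114,115,116,117,118,119,120],"3d","chatbot","foundation-models","gpt-4","large-language-models","llama","multimodal","objaverse","point-cloud","representation-learning","vision-and-language","pointllm","2026-03-27T02:49:30.150509","2026-04-06T07:00:43.786305",[124,129,133,138,143,148],{"id":125,"question_zh":126,"answer_zh":127,"source_url":128},3968,"How do I resolve CUDA out-of-memory errors during training, especially on GPUs with limited memory (e.g. 24GB)?","When GPU memory is insufficient, you can try the following:\n1.  Use a more memory-efficient attention implementation, such as xformers.\n2.  Place different Transformer layers of the model on different GPUs.\n3.  Train in float16 precision.\n4.  Consider switching to a smaller large language model (LLM), for example replacing LLaMA-7B with Phi-2.\n5.  If you use PyTorch, try setting the environment variable `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to reduce memory fragmentation.\nNote that a single 24GB GPU may not be enough to complete training.","https:\u002F\u002Fgithub.com\u002FInternRobotics\u002FPointLLM\u002Fissues\u002F31",{"id":130,"question_zh":131,"answer_zh":132,"source_url":128},3969,"How do I select the GPU devices used for training?","GPU devices are typically selected by setting the `CUDA_VISIBLE_DEVICES` environment variable or by changing the relevant settings in the training script. If the script has no explicit device argument, check the launch script (for example a `.sh` file) or the main training file (such as `train.py`) for device ID or distributed-training settings.",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},3970,"Training prints the warning “Some weights of PointLLMLlamaForCausalLM were not initialized”. What causes it?","This warning means that some weights of `PointLLMLlamaForCausalLM` were not initialized when the pre-trained weights were loaded from the checkpoint. This usually happens because the weights in the checkpoint do not exactly match the current model structure.\nMake sure the checkpoint you use (for example `checkpoints\u002FPointLLM_7B_v1.1_init`) is the correct version for the current code. If the problem persists, try re-downloading or regenerating the checkpoint.","https:\u002F\u002Fgithub.com\u002FInternRobotics\u002FPointLLM\u002Fissues\u002F32",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},3971,"Training exits on its own right after loading the Point-BERT checkpoint, with no obvious error. What could be the cause?","A silent exit after loading the checkpoint may be caused by incompatible library versions. One known fix is to update the relevant libraries to compatible versions. For example, one user got training to start normally after updating the `transformers` library to 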
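4.45.0 and the `accelerate` library to 0.34.2. Check that the versions of key libraries such as PyTorch, Transformers, and Accelerate match the project's requirements.","https:\u002F\u002Fgithub.com\u002FInternRobotics\u002FPointLLM\u002Fissues\u002F68",{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},3972,"How are the provided point clouds normalized? This matters when testing the encoder with other point clouds.","According to the discussion in the issue, the point clouds in the `Objaverse_660K_8192_npy_split_a*` files from HuggingFace can be used directly, with no extra normalization step. The data are already preprocessed and can be fed to the Point-BERT encoder as they are. If you want to use your own point clouds, you may need to follow the original dataset's preprocessing (such as scaling and centering) to stay consistent.","https:\u002F\u002Fgithub.com\u002FInternRobotics\u002FPointLLM\u002Fissues\u002F23",{"id":149,"question_zh":150,"answer_zh":151,"source_url":137},3973,"Can the LLM backbone in PointLLM (e.g. LLaMA) be replaced with another model (such as T5)?","In principle, yes. The key is to understand PointLLM's inference flow: it turns the point cloud tokens into a token sequence that the LLM backbone can process, then calls the LLM's forward function. To switch to T5, you would need to:\n1.  Study the PointLLM code carefully, especially the parts of `pointllm\u002Fmodel\u002Fpointllm.py` that handle tokens and call the LLM.\n2.  Determine the input and output formats that T5 requires.\n3.  Adapt the code to T5's interface; this is expected to closely resemble the existing LLM integration.\nReading the relevant code in depth is recommended to understand the underlying process.",[]]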