[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-facebookresearch--dinov3":3,"tool-facebookresearch--dinov3":62},[4,18,26,35,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,2,"2026-04-10T11:39:34",[14,15,13],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":32,"last_commit_at":41,"category_tags":42,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[43,13,15,14],"插件",{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 
特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[52,15,13,14],"语言模型",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,61],"视频",{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":77,"owner_website":78,"owner_url":79,"languages":80,"stars":97,"forks":98,"last_commit_at":99,"license":100,"difficulty_score":32,"env_os":101,"env_gpu":102,"env_ram":101,"env_deps":103,"category_tags":109,"github_topics":77,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":111,"updated_at":112,"faqs":113,"releases":114},7434,"facebookresearch\u002Fdinov3","dinov3","Reference PyTorch implementation and models for DINOv3","DINOv3 是由 Meta AI 研发的最新一代自监督视觉基础模型，旨在为计算机视觉任务提供高质量的密集特征表示。它解决了传统模型依赖大量标注数据进行微调的痛点，无需针对特定任务进行精细调整，即可在语义分割、单目深度估计及植被高度测绘等多种视觉任务中达到甚至超越专用模型的顶尖水平。\n\n这款工具特别适合人工智能研究人员、算法工程师以及需要处理复杂视觉数据的开发者使用。其核心亮点在于能够生成高分辨率的密集特征图，显著提升了对图像细节的捕捉能力和全局一致性。例如，最新发布的 CHMv2 模型利用 DINOv3 技术，大幅提高了全球植被高度地图的精度与细节表现。此外，DINOv3 已全面集成至 Hugging Face Transformers 和 PyTorch Image Models (timm) 等主流开源库中，支持多种骨干网络架构，并提供了便捷的蒸馏代码与推理接口，让用户能够轻松将其应用于科研探索或实际工程部署中。",":new: [2026-03-10] :fire: The [Canopy Height Maps v2 (CHMv2) model](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.06382) and inference code are now available (more details on downloading the model weights and using the code [here](#canopy-height-maps-v2-chmv2)). The model weights are also available in [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fdinov3-vitl16-chmv2-dpt-head) and [supported](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fblob\u002Fmain\u002Fdocs\u002Fsource\u002Fen\u002Fmodel_doc\u002Fchmv2.md) by the Hugging Face [Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Findex) library. 
Building on our original high-resolution canopy height maps released in 2024, CHMv2 delivers substantial improvements in accuracy, detail, and global consistency by leveraging DINOv3.\n\n[2025-11-20] Distillation code and configurations for ConvNeXt backbones are now released!\n\n[2025-10-13] [Semantic segmentation](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdinov3?tab=readme-ov-file#linear-segmentation-with-data-augmentation-on-ade20k) (ADE20K) and [monocular depth estimation](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdinov3?tab=readme-ov-file#linear-depth-estimation-on-nyuv2-depth) (NYUv2-Depth) linear probing code are now released!\n\n[2025-09-17] DINOv3 backbones are now supported by the [PyTorch Image Models \u002F timm](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002F) library starting with version [1.0.20](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Freleases\u002Ftag\u002Fv1.0.20)\n\n[2025-08-29] DINOv3 backbones are [supported](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fdinov3) by released versions of the Hugging Face [Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Findex) library starting with version [4.56.0](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Freleases\u002Ftag\u002Fv4.56.0)\n\n[2025-08-14] DINOv3 backbones are now available in [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Ffacebook\u002Fdinov3-68924841bd6b561778e31009) and [supported](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fdinov3) by the [development](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002F) version of the Hugging Face [Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Findex) library\n\n# DINOv3 🦖🦖🦖\n\n**[Meta AI Research, FAIR](https:\u002F\u002Fai.meta.com\u002Fresearch\u002F)**\n\nOriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, \u003Cbr\u002F>\nCijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, \u003Cbr\u002F>\nFrancisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, \u003Cbr\u002F>\nTimothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, \u003Cbr\u002F>\nAndrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, \u003Cbr\u002F>\nJulien Mairal, Hervé Jégou, Patrick Labatut, Piotr Bojanowski\n\n[ :scroll: [`Paper`](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10104)] [ :newspaper: [`Blog`](https:\u002F\u002Fai.meta.com\u002Fblog\u002Fdinov3-self-supervised-vision-model\u002F)] [ :globe_with_meridians: [`Website`](https:\u002F\u002Fai.meta.com\u002Fdinov3\u002F)] [ :book: [`BibTeX`](#citing-dinov3)]\n\nReference PyTorch implementation and models for DINOv3. 
For details, see the **[DINOv3](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10104)** paper.\n\n## Overview\n\n\u003Cdiv align=\"center\">\n  \u003Cimg width=\"1364\" height=\"1024\" alt=\"market\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_dinov3_readme_e0197b9dd25f.png\" \u002F>\n\n  \u003Ci>\u003C\u002Fem>\u003Cb>High-resolution dense features.\u003C\u002Fb>\u003Cbr\u002F>We visualize the cosine similarity maps obtained with DINOv3 output features\u003Cbr\u002F> between the patches marked with a red cross and all other patches.\u003C\u002Fi>\n\u003C\u002Fdiv>\n\n\u003Cbr\u002F>\n\nAn extended family of versatile vision foundation models producing high-quality dense features and achieving outstanding performance on various vision tasks including outperforming the specialized state of the art across a broad range of settings, without fine-tuning\n\n## Pretrained models\n\n:information_source: Please follow the link provided below to get access to all the model weights: once accepted, an e-mail will be sent with the complete list of URLs pointing to all the available model weights (both backbones and adapters). These URLs can then be used to either:\n- download the model or adapter weights to a local filesystem and point `torch.hub.load()` to these local weights via the `weights` or `backbone_weights` parameters, or\n- directly invoke `torch.hub.load()` to download and load a backbone or an adapter from its URL via also the `weights` or `backbone_weights` parameters.\n\nSee the example code snippets below.\n\n:warning: Please use `wget` instead of a web browser to download the weights.\n\nViT models pretrained on web dataset (LVD-1689M):\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>Model\u003C\u002Fth>\n      \u003Cth>Parameters\u003C\u002Fth>\n      \u003Cth>Pretraining\u003Cbr\u002F>Dataset\u003C\u002Fth>\n      \u003Cth>Download\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-S\u002F16 distilled \u003C\u002Ftd>\n      \u003Ctd align=\"right\">21M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ViT-S+\u002F16 distilled\u003C\u002Ftd>\n      \u003Ctd align=\"right\">29M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ViT-B\u002F16 distilled\u003C\u002Ftd>\n      \u003Ctd align=\"right\">86M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ViT-L\u002F16 distilled\u003C\u002Ftd>\n      \u003Ctd align=\"right\">300M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    
\u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ViT-H+\u002F16 distilled\u003C\u002Ftd>\n      \u003Ctd align=\"right\">840M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ViT-7B\u002F16\u003C\u002Ftd>\n      \u003Ctd align=\"right\">6,716M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\nConvNeXt models pretrained on web dataset (LVD-1689M):\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>Model\u003C\u002Fth>\n      \u003Cth>Parameters\u003C\u002Fth>\n      \u003Cth>Pretraining\u003Cbr\u002F>Dataset\u003C\u002Fth>\n      \u003Cth>Download\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ConvNeXt Tiny\u003C\u002Ftd>\n      \u003Ctd align=\"right\">29M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ConvNeXt Small\u003C\u002Ftd>\n      \u003Ctd align=\"right\">50M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ConvNeXt Base\u003C\u002Ftd>\n      \u003Ctd align=\"right\">89M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ConvNeXt Large\u003C\u002Ftd>\n      \u003Ctd align=\"right\">198M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\nViT models pretrained on satellite dataset (SAT-493M):\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>Model\u003C\u002Fth>\n      \u003Cth>Parameters\u003C\u002Fth>\n      \u003Cth>Pretraining\u003Cbr\u002F>Dataset\u003C\u002Fth>\n      \u003Cth>Download\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-L\u002F16 distilled\u003C\u002Ftd>\n      \u003Ctd align=\"right\">300M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">SAT-493M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ViT-7B\u002F16\u003C\u002Ftd>\n 
     \u003Ctd align=\"right\">6,716M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">SAT-493M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\n### Pretrained backbones (via PyTorch [Hub](https:\u002F\u002Fdocs.pytorch.org\u002Fdocs\u002Fstable\u002Fhub.html))\n\nPlease follow the instructions [here](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F) to install PyTorch (the only required dependency for loading the model). Installing PyTorch with CUDA support is strongly recommended.\n\n```python\nimport torch\n\nREPO_DIR = \u003CPATH\u002FTO\u002FA\u002FLOCAL\u002FDIRECTORY\u002FWHERE\u002FTHE\u002FDINOV3\u002FREPO\u002FWAS\u002FCLONED>\n\n# DINOv3 ViT models pretrained on web images\ndinov3_vits16 = torch.hub.load(REPO_DIR, 'dinov3_vits16', source='local', weights=\u003CCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\ndinov3_vits16plus = torch.hub.load(REPO_DIR, 'dinov3_vits16plus', source='local', weights=\u003CCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\ndinov3_vitb16 = torch.hub.load(REPO_DIR, 'dinov3_vitb16', source='local', weights=\u003CCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\ndinov3_vitl16 = torch.hub.load(REPO_DIR, 'dinov3_vitl16', source='local', weights=\u003CCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\ndinov3_vith16plus = torch.hub.load(REPO_DIR, 'dinov3_vith16plus', source='local', weights=\u003CCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\ndinov3_vit7b16 = torch.hub.load(REPO_DIR, 'dinov3_vit7b16', source='local', weights=\u003CCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n\n# DINOv3 ConvNeXt models pretrained on web images\ndinov3_convnext_tiny = torch.hub.load(REPO_DIR, 'dinov3_convnext_tiny', source='local', weights=\u003CCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\ndinov3_convnext_small = torch.hub.load(REPO_DIR, 'dinov3_convnext_small', source='local', weights=\u003CCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\ndinov3_convnext_base = torch.hub.load(REPO_DIR, 'dinov3_convnext_base', source='local', weights=\u003CCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\ndinov3_convnext_large = torch.hub.load(REPO_DIR, 'dinov3_convnext_large', source='local', weights=\u003CCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n\n# DINOv3 ViT models pretrained on satellite imagery\ndinov3_vitl16 = torch.hub.load(REPO_DIR, 'dinov3_vitl16', source='local', weights=\u003CCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\ndinov3_vit7b16 = torch.hub.load(REPO_DIR, 'dinov3_vit7b16', source='local', weights=\u003CCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n```\n\n### Pretrained backbones (via Hugging Face [Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002F))\n\nAll the backbones are available in the [DINOv3](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Ffacebook\u002Fdinov3-68924841bd6b561778e31009) collection on Hugging Face Hub and supported via the Hugging Face [Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Findex) library (with released packages from version 4.56.0). 
Please refer to the corresponding documentation for usage, but below is a short example that demonstrates how to obtain an image embedding with either [Pipeline] or the [AutoModel] class.\n\n```python\nfrom transformers import pipeline\nfrom transformers.image_utils import load_image\n\nurl = \"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fdocumentation-images\u002Fresolve\u002Fmain\u002Fpipeline-cat-chonk.jpeg\"\nimage = load_image(url)\n\nfeature_extractor = pipeline(\n    model=\"facebook\u002Fdinov3-convnext-tiny-pretrain-lvd1689m\",\n    task=\"image-feature-extraction\",\n)\nfeatures = feature_extractor(image)\n```\n\n```python\nimport torch\nfrom transformers import AutoImageProcessor, AutoModel\nfrom transformers.image_utils import load_image\n\nurl = \"http:\u002F\u002Fimages.cocodataset.org\u002Fval2017\u002F000000039769.jpg\"\nimage = load_image(url)\n\npretrained_model_name = \"facebook\u002Fdinov3-convnext-tiny-pretrain-lvd1689m\"\nprocessor = AutoImageProcessor.from_pretrained(pretrained_model_name)\nmodel = AutoModel.from_pretrained(\n    pretrained_model_name,\n    device_map=\"auto\",\n)\n\ninputs = processor(images=image, return_tensors=\"pt\").to(model.device)\nwith torch.inference_mode():\n    outputs = model(**inputs)\n\npooled_output = outputs.pooler_output\nprint(\"Pooled output shape:\", pooled_output.shape)\n```\n\nwhere `model` and `pretrained_model_name` above can be one of:\n- `facebook\u002Fdinov3-vits16-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vits16plus-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vitb16-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vitl16-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vith16plus-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vit7b16-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-convnext-base-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-convnext-large-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-convnext-small-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-convnext-tiny-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vitl16-pretrain-sat493m`\n- `facebook\u002Fdinov3-vit7b16-pretrain-sat493m`\n\n### Image transforms\n\nFor models using the LVD-1689M weights (pretrained on web images), please use the following transform (standard ImageNet evaluation transform):\n\n```python\nimport torchvision\nfrom torchvision.transforms import v2\n\ndef make_transform(resize_size: int = 256):\n    to_tensor = v2.ToImage()\n    resize = v2.Resize((resize_size, resize_size), antialias=True)\n    to_float = v2.ToDtype(torch.float32, scale=True)\n    normalize = v2.Normalize(\n        mean=(0.485, 0.456, 0.406),\n        std=(0.229, 0.224, 0.225),\n    )\n    return v2.Compose([to_tensor, resize, to_float, normalize])\n```\n\n\nFor models using the SAT-493M weights (pretrained on satellite imagery), please use the following transform:\n\n\n```python\nimport torchvision\nfrom torchvision.transforms import v2\n\ndef make_transform(resize_size: int = 256):\n    to_tensor = v2.ToImage()\n    resize = v2.Resize((resize_size, resize_size), antialias=True)\n    to_float = v2.ToDtype(torch.float32, scale=True)\n    normalize = v2.Normalize(\n        mean=(0.430, 0.411, 0.296),\n        std=(0.213, 0.156, 0.143),\n    )\n    return v2.Compose([to_tensor, resize, to_float, normalize])\n```\n\n### Pretrained heads - Image classification\n\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>Backbone\u003C\u002Fth>\n      \u003Cth>Pretraining\u003Cbr\u002F>Dataset\u003C\u002Fth>\n      
\u003Cth>Head\u003Cbr\u002F>Dataset\u003C\u002Fth>\n      \u003Cth>Download\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-7B\u002F16\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">ImageNet\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\nThe (full) classifier models can be loaded via PyTorch Hub:\n\n```python\nimport torch\n\n# DINOv3\ndinov3_vit7b16_lc = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_lc', source=\"local\", weights=\u003CCLASSIFIER\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n```\n\n### Pretrained heads - Depther trained on SYNTHMIX dataset\n\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>Backbone\u003C\u002Fth>\n      \u003Cth>Pretraining\u003Cbr\u002F>Dataset\u003C\u002Fth>\n      \u003Cth>Head\u003Cbr\u002F>Dataset\u003C\u002Fth>\n      \u003Cth>Download\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-7B\u002F16\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">SYNTHMIX\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\n```python\ndepther = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_dd', source=\"local\", weights=\u003CDEPTHER\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n```\n\nFull example code running the depther on an image:\n\n```python\nfrom PIL import Image\nimport torch\nfrom torchvision.transforms import v2\nimport matplotlib.pyplot as plt\nfrom matplotlib import colormaps\n\ndef get_img():\n    import requests\n    url = \"http:\u002F\u002Fimages.cocodataset.org\u002Fval2017\u002F000000039769.jpg\"\n    image = Image.open(requests.get(url, stream=True).raw).convert(\"RGB\")\n    return image\n\ndef make_transform(resize_size: int | list[int] = 768):\n    to_tensor = v2.ToImage()\n    resize = v2.Resize((resize_size, resize_size), antialias=True)\n    to_float = v2.ToDtype(torch.float32, scale=True)\n    normalize = v2.Normalize(\n        mean=(0.485, 0.456, 0.406),\n        std=(0.229, 0.224, 0.225),\n    )\n    return v2.Compose([to_tensor, resize, to_float, normalize])\n\ndepther = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_dd', source=\"local\", weights=\u003CDEPTHER\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n\nimg_size = 1024\nimg = get_img()\ntransform = make_transform(img_size)\nwith torch.inference_mode():\n    with torch.autocast('cuda', dtype=torch.bfloat16):\n        batch_img = transform(img)[None]\n        depths = depther(batch_img)\n\nplt.figure(figsize=(12, 6))\nplt.subplot(121)\nplt.imshow(img)\nplt.axis(\"off\")\nplt.subplot(122)\nplt.imshow(depths[0,0].cpu(), cmap=colormaps[\"Spectral\"])\nplt.axis(\"off\")\n```\n\n#### Reproduce paper results\n\nMake sure the NYU dataset is set up following 
[this](DATASETS.md#depth-estimation-on-nyu).\n\nLaunch the following to reproduce our paper's depth estimation results on NYUv2 with the pretrained Depther trained on SYNTHMIX:\n\n```shell\nPYTHONPATH=. python -m dinov3.run.submit dinov3\u002Feval\u002Fdepth\u002Frun.py \\\nconfig=dinov3\u002Feval\u002Fdepth\u002Fconfigs\u002Fconfig-nyu-synthmix-dpt-inference.yaml \\\ndatasets.root=\u003CPATH\u002FTO\u002FDATASET> \\\nload_from=dinov3_vit7b16_dd \\\n--output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\n```\n\nNotes:\n- if you want to launch the code without dinov3.run.submit, you can do so using python directly or torchrun:\n\n```shell\nPYTHONPATH=. python dinov3\u002Feval\u002Fdepth\u002Frun.py \\\nconfig=dinov3\u002Feval\u002Fdepth\u002Fconfigs\u002Fconfig-nyu-synthmix-dpt-inference.yaml \\\ndatasets.root=\u003CPATH\u002FTO\u002FDATASET> \\\nload_from=dinov3_vit7b16_dd \\\noutput_dir=\u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\n```\n\n- One can also save prediction results using `result_config.save_results=true`.\n\n\n### Pretrained heads - Detector trained on COCO2017 dataset\n\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>Backbone\u003C\u002Fth>\n      \u003Cth>Pretraining\u003Cbr\u002F>Dataset\u003C\u002Fth>\n      \u003Cth>Head\u003Cbr\u002F>Dataset\u003C\u002Fth>\n      \u003Cth>Download\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-7B\u002F16\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">COCO2017\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\n```python\ndetector = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_de', source=\"local\", weights=\u003CDETECTOR\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n```\n\n### Pretrained heads - Segmentor trained on ADE20K dataset\n\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>Backbone\u003C\u002Fth>\n      \u003Cth>Pretraining\u003Cbr\u002F>Dataset\u003C\u002Fth>\n      \u003Cth>Head\u003Cbr\u002F>Dataset\u003C\u002Fth>\n      \u003Cth>Download\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-7B\u002F16\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">ADE20K\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n```python\nsegmentor = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_ms', source=\"local\", weights=\u003CSEGMENTOR\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n```\n\nExample command to run a full inference on ADE20K with the provided segmentor (ViT-7B + M2F):\n\n```shell\nPYTHONPATH=. 
python -m dinov3.run.submit dinov3\u002Feval\u002Fsegmentation\u002Frun.py \\\nconfig=dinov3\u002Feval\u002Fsegmentation\u002Fconfigs\u002Fconfig-ade20k-m2f-inference.yaml  \\\ndatasets.root=\u003CPATH\u002FTO\u002FDATASET> \\\nload_from=dinov3_vit7b16_ms \\\n--output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\n```\n\nFull example code running the segmentor on an image:\n\n```python\nimport sys\nsys.path.append(REPO_DIR)\n\nfrom PIL import Image\nimport torch\nfrom torchvision.transforms import v2\nimport matplotlib.pyplot as plt\nfrom matplotlib import colormaps\nfrom functools import partial\nfrom dinov3.eval.segmentation.inference import make_inference\n\n\ndef get_img():\n    import requests\n    url = \"http:\u002F\u002Fimages.cocodataset.org\u002Fval2017\u002F000000039769.jpg\"\n    image = Image.open(requests.get(url, stream=True).raw).convert(\"RGB\")\n    return image\n\ndef make_transform(resize_size: int | list[int] = 768):\n    to_tensor = v2.ToImage()\n    resize = v2.Resize((resize_size, resize_size), antialias=True)\n    to_float = v2.ToDtype(torch.float32, scale=True)\n    normalize = v2.Normalize(\n        mean=(0.485, 0.456, 0.406),\n        std=(0.229, 0.224, 0.225),\n    )\n    return v2.Compose([to_tensor, resize, to_float, normalize])\n\nsegmentor = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_ms', source=\"local\", weights=\u003CSEGMENTOR\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n\nimg_size = 896\nimg = get_img()\ntransform = make_transform(img_size)\nwith torch.inference_mode():\n    with torch.autocast('cuda', dtype=torch.bfloat16):\n        batch_img = transform(img)[None]\n        pred_vit7b = segmentor(batch_img)  # raw predictions\n        # actual segmentation map\n        segmentation_map_vit7b = make_inference(\n            batch_img,\n            segmentor,\n            inference_mode=\"slide\",\n            decoder_head_type=\"m2f\",\n            rescale_to=(img.size[-1], img.size[-2]),\n            n_output_channels=150,\n            crop_size=(img_size, img_size),\n            stride=(img_size, img_size),\n            output_activation=partial(torch.nn.functional.softmax, dim=1),\n        ).argmax(dim=1, keepdim=True)\nplt.figure(figsize=(12, 6))\nplt.subplot(121)\nplt.imshow(img)\nplt.axis(\"off\")\nplt.subplot(122)\nplt.imshow(segmentation_map_vit7b[0,0].cpu(), cmap=colormaps[\"Spectral\"])\nplt.axis(\"off\")\n```\n\n### Pretrained heads - Zero-shot tasks with `dino.txt`\n\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth rowspan=\"2\">Backbone\u003C\u002Fth>\n      \u003Cth>Download\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-L\u002F16 distilled\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[link]\u003C\u002Fa>,\n        \u003Ca href=\"https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fdinov3\u002Fthirdparty\u002Fbpe_simple_vocab_16e6.txt.gz\">vocabulary\u003C\u002Fa>,\n        \u003Ca href=\"https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fdinov2\u002Fthirdparty\u002FLICENSE\">vocabulary license\u003C\u002Fa>\n      \u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\nThe (full) dino.txt model can be loaded via PyTorch Hub:\n\n```python\nimport torch\n# DINOv3\ndinov3_vitl16_dinotxt_tet1280d20h24l, tokenizer = 
torch.hub.load(REPO_DIR, 'dinov3_vitl16_dinotxt_tet1280d20h24l', source=\"local\", weights=\u003CDINOTXT\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n```\n\n\n## Installation\n\nThe training and evaluation code requires PyTorch version >= 2.7.1 as well as a few other 3rd party packages. Note that the code has only been tested with the specified versions and also expects a Linux environment. To set up all the required dependencies for training and evaluation, please follow the instructions below:\n\n*[micromamba](https:\u002F\u002Fmamba.readthedocs.io\u002Fen\u002Flatest\u002Fuser_guide\u002Fmicromamba.html)* **(Recommended)** - Clone the repository and then create and activate a `dinov3` conda environment using the provided environment definition:\n\n```shell\nmicromamba env create -f conda.yaml\nmicromamba activate dinov3\n```\n\n## Getting started\n\nSeveral notebooks are provided to get started applying DINOv3:\n- [PCA of patch features](notebooks\u002Fpca.ipynb): display the PCA of DINOv3 patch features on a foreground object (rainbow visualizations from the paper) [[Run in Google Colab]](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ffacebookresearch\u002Fdinov3\u002Fblob\u002Fmain\u002Fnotebooks\u002Fpca.ipynb)\n- [Foreground segmentation](notebooks\u002Fforeground_segmentation.ipynb): train a linear foreground segmentation model based on DINOv3 features [[Run in Google Colab]](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ffacebookresearch\u002Fdinov3\u002Fblob\u002Fmain\u002Fnotebooks\u002Fforeground_segmentation.ipynb)\n- [Dense and sparse matching](notebooks\u002Fdense_sparse_matching.ipynb): match patches from objects on two different images based on DINOv3 features [[Run in Google Colab]](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ffacebookresearch\u002Fdinov3\u002Fblob\u002Fmain\u002Fnotebooks\u002Fdense_sparse_matching.ipynb)\n- [Segmentation tracking](notebooks\u002Fsegmentation_tracking.ipynb): video segmentation tracking using a non-parametric method based on DINOv3 features [[Run in Google Colab]](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ffacebookresearch\u002Fdinov3\u002Fblob\u002Fmain\u002Fnotebooks\u002Fsegmentation_tracking.ipynb)\n- [Zero-shot segmentation with DINOv3-based dino.txt](notebooks\u002Fdinotxt_segmentation_inference.ipynb): compute the open-vocabulary segmentation results with the dino.txt strategy.\n\n## Data preparation\n\n### ImageNet-1k\n\nThe root directory of the dataset should hold the following contents:\n\n- `\u003CROOT>\u002Ftest\u002FILSVRC2012_test_00000001.JPEG`\n- `\u003CROOT>\u002Ftest\u002F[..]`\n- `\u003CROOT>\u002Ftest\u002FILSVRC2012_test_00100000.JPEG`\n- `\u003CROOT>\u002Ftrain\u002Fn01440764\u002Fn01440764_10026.JPEG`\n- `\u003CROOT>\u002Ftrain\u002F[...]`\n- `\u003CROOT>\u002Ftrain\u002Fn15075141\u002Fn15075141_9993.JPEG`\n- `\u003CROOT>\u002Fval\u002Fn01440764\u002FILSVRC2012_val_00000293.JPEG`\n- `\u003CROOT>\u002Fval\u002F[...]`\n- `\u003CROOT>\u002Fval\u002Fn15075141\u002FILSVRC2012_val_00049174.JPEG`\n- `\u003CROOT>\u002Flabels.txt`\n\nThe provided dataset implementation expects a few additional metadata files to be present under the extra directory:\n\n- `\u003CEXTRA>\u002Fclass-ids-TRAIN.npy`\n- `\u003CEXTRA>\u002Fclass-ids-VAL.npy`\n- `\u003CEXTRA>\u002Fclass-names-TRAIN.npy`\n- `\u003CEXTRA>\u002Fclass-names-VAL.npy`\n- `\u003CEXTRA>\u002Fentries-TEST.npy`\n- `\u003CEXTRA>\u002Fentries-TRAIN.npy`\n- 
`\u003CEXTRA>\u002Fentries-VAL.npy`\n\nThese metadata files can be generated (once) with the following lines of Python code:\n\n```python\nfrom dinov3.data.datasets import ImageNet\n\nfor split in ImageNet.Split:\n    dataset = ImageNet(split=split, root=\"\u003CROOT>\", extra=\"\u003CEXTRA>\")\n    dataset.dump_extra()\n```\n\nNote that the root and extra directories do not have to be distinct directories.\n\n### ImageNet-22k\n\nPlease adapt the [dataset class](dinov3\u002Fdata\u002Fdatasets\u002Fimage_net_22k.py) to match your local setup.\n\n\u003Cbr \u002F>\n\n:warning: To execute the commands provided in the next sections for training and evaluation, the `dinov3` package should be included in the Python module search path, i.e. simply prefix the command to run with `PYTHONPATH=.`.\n\n## Training\n\n### Fast setup: training DINOv3 ViT-L\u002F16 on ImageNet-1k\n\nRun DINOv3 pre-training on 4 H100-80GB nodes (32 GPUs) in a SLURM cluster environment with submitit:\n\n```shell\n PYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Ftrain\u002Ftrain.py \\\n  --nodes 4 \\\n  --config-file dinov3\u002Fconfigs\u002Ftrain\u002Fvitl_im1k_lin834.yaml \\\n  --output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  train.dataset_path=ImageNet22k:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET>\n```\nTraining time is approximately 14 hours and the resulting checkpoint should reach 82.0% on k-NN eval and 83.5% on linear eval.\n\nThe training code saves the weights of the teacher in the eval folder every 12500 iterations for evaluation.\n\n### Exact DINOv3 setup: training DINOv3 ViT-7B\u002F16\n\nDINOv3 ViT-7B\u002F16 is trained on a private dataset. The training involves 3 stages:\n- Pretraining\n- Gram anchoring\n- High resolution adaptation\n\n#### Pretraining\n\nLaunch DINOV3 ViT-7B\u002F16 pretraining on 32 nodes (256 GPUs) in a SLURM cluster environment with submitit.\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Ftrain\u002Ftrain.py \\\n  --nodes 32 \\\n  --config-file dinov3\u002Fconfigs\u002Ftrain\u002Fdinov3_vit7b16_pretrain.yaml \\\n  --output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  train.dataset_path=\u003CDATASET>:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET>\n```\n\n#### Gram anchoring\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Ftrain\u002Ftrain.py \\\n  --nodes 32 \\\n  --config-file dinov3\u002Fconfigs\u002Ftrain\u002Fdinov3_vit7b16_gram_anchor.yaml \\\n  --output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  train.dataset_path=\u003CDATASET>:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET> \\\n  gram.ckpt=\u003CPATH\u002FTO\u002FGRAM_TEACHER_FROM_PREVIOUS_STEP>\n```\n\n#### High-resolution adaptation\n\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Ftrain\u002Ftrain.py \\\n  --nodes 32 \\\n  --config-file dinov3\u002Fconfigs\u002Ftrain\u002Fdinov3_vit7b16_high_res_adapt.yaml \\\n  --output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  train.dataset_path=\u003CDATASET>:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET> \\\n  gram.ckpt=\u003CPATH\u002FTO\u002FTEACHER_FROM_GRAM> \\\n  student.resume_from_teacher_chkpt=\u003CPATH\u002FTO\u002FTEACHER_FROM_GRAM>\n```\n\n## Multi-distillation\n\n### Test setup:\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Ftrain\u002Ftrain.py \\\n  --nodes 1 \\\n  --config-file 
dinov3\u002Fconfigs\u002Ftrain\u002Fmulti_distillation_test.yaml \\\n  --output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  --multi-distillation \\\n  train.dataset_path=\u003CDATASET>:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET>\n```\n\n## Evaluation\n\nThe training code regularly saves the teacher weights. In order to evaluate the model, run the following evaluation on a single node:\n\n\n### Logistic regression classification on ImageNet-1k\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Feval\u002Flog_regression.py \\\n  model.config_file=\u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\u002Fconfig.yaml \\\n  model.pretrained_weights=\u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\u002Fteacher_checkpoint.pth \\\n  output_dir=\u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  train.dataset=ImageNet:split=TRAIN:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET> \\\n  eval.test_dataset=ImageNet:split=VAL:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET>\n```\n\n### k-NN classification on ImageNet-1k\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Feval\u002Fknn.py \\\n  model.config_file=\u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\u002Fconfig.yaml \\\n  model.pretrained_weights=\u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\u002Fteacher_checkpoint.pth \\\n  output_dir=\u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  train.dataset=ImageNet:split=TRAIN:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET> \\\n  eval.test_dataset=ImageNet:split=VAL:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET>\n```\n\n### Linear classification with data augmentation on ImageNet-1k\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Feval\u002Flinear.py \\\n  model.config_file=\u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\u002Fconfig.yaml \\\n  model.pretrained_weights=\u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\u002Fteacher_checkpoint.pth \\\n  output_dir=\u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  train.dataset=ImageNet:split=TRAIN:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET> \\\n  train.val_dataset=ImageNet:split=VAL:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET>\n```\n\n### Linear segmentation with data augmentation on ADE20K\n\n```shell\nPYTHONPATH=. python -m dinov3.run.submit dinov3\u002Feval\u002Fsegmentation\u002Frun.py \\\nmodel.dino_hub=dinov3_vit7b16 \\\nconfig=dinov3\u002Feval\u002Fsegmentation\u002Fconfigs\u002Fconfig-ade20k-linear-training.yaml \\\ndatasets.root=\u003CPATH\u002FTO\u002FDATASET> \\\n--output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\n```\n\nAfter the job completes, you will find in the output path directory you specified\n- `segmentation_config.yaml` that contains the config you trained the model with;\n- `model_final.pth`, the final linear head checkpoint at the end of training; and\n- `results-semantic-segmentation.csv` with the final metrics.\n\n\n#### Linear depth estimation on NYUv2 Depth\n```shell\nPYTHONPATH=. 
python -m dinov3.run.submit dinov3\u002Feval\u002Fdepth\u002Frun.py \\\n    model.dino_hub=dinov3_vit7b16 \\\n    config=dinov3\u002Feval\u002Fdepth\u002Fconfigs\u002Fconfig-nyu.yaml \\\n    datasets.root=\u003CPATH\u002FTO\u002FDATASET> \\\n    --output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\n```\n\nAfter the job completes, you will find in the output path directory you specified\n- `depth_config.yaml` that contains the config you trained the model with;\n- `model_final.pth`, the final linear head checkpoint at the end of training; and\n- `results-depth.csv` with the final metrics.\n\n### Text alignment on DINOv3 using dino.txt\n\nText alignment can be done following the method from `dino.txt` aka [DINOv2 Meets Text](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.16334).\n\n```shell\n# An example config for text alignment is provided in dinov3\u002Feval\u002Ftext\u002Fconfigs\u002Fdinov3_vitl_text.yaml\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Feval\u002Ftext\u002Ftrain_dinotxt.py \\\n  --nodes 4 \\\n  trainer_config_file=\"\u003CPATH\u002FTO\u002FDINOv3\u002FTEXT\u002FCONFIG>\" \\\n  --output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\n```\nLaunching the above trains text alignment on 4 nodes with 8 GPUs each (32 GPUs in total).\nPlease note that the text alignment model in the DINOv3 paper was trained on a private dataset; the example config in `dinov3\u002Feval\u002Ftext\u002Fconfigs\u002Fdinov3_vitl_text.yaml` uses the `CocoCaptions` dataset for illustration purposes.\nPlease adapt the provided `CocoCaptions` dataset class; the dataset can be found [here](https:\u002F\u002Fwww.kaggle.com\u002Fdatasets\u002Fnikhil7280\u002Fcoco-image-caption).\n\n\n## Canopy Height Maps v2 (CHMv2)\n\nJohn Brandt, Seungeun Yi, Jamie Tolan, Xinyuan Li, Peter Potapov, \u003Cbr\u002F>\nJessica Ertel, Justine Spore, Huy V. Vo, Michaël Ramamonjisoa, Patrick Labatut, \u003Cbr\u002F>\nPiotr Bojanowski, Camille Couprie\n\n[ :scroll: [`Paper`](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.06382)] [ :newspaper: [`Blog`](http:\u002F\u002Fai.meta.com\u002Fblog\u002Fworld-resources-institute-dino-canopy-height-maps-v2)]\n\n### CHMv2 model loading (via PyTorch [Hub](https:\u002F\u002Fdocs.pytorch.org\u002Fdocs\u002Fstable\u002Fhub.html))\n\n:information_source: Please follow the link provided below to get access to the CHMv2 model weights: once accepted, an e-mail will be sent with the URL pointing to the available model weights. 
The URL can then be used to either:\n- download the model weights to a local filesystem and point `torch.hub.load()` to these local weights via the `weights` parameters, or\n- directly invoke `torch.hub.load()` to download and load a backbone from its URL.\n\nCHMv2 uses the DINOv3 ViT-L\u002F16 satellite as the backbone, available after requesting access [here](https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F).\n\n:warning: Please use `wget` instead of a web browser to download the weights.\n\nDownload link: https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fchmv2-downloads\u002F\n\n```python\nimport torch\nfrom dinov3.hub.backbones import Weights\n\nREPO_DIR = \u003CPATH\u002FTO\u002FA\u002FLOCAL\u002FDIRECTORY\u002FWHERE\u002FTHE\u002FDINOv3\u002FREPO\u002FWAS\u002FCLONED>\n\nchmv2_model = torch.hub.load(\n    REPO_DIR,\n    'dinov3_vitl16_chmv2',\n    source=\"local\",\n    weights=\"\u003CCHMV2_MODEL\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>\",\n    backbone_weights=Weights.SAT493M,  # or \u003CDINOV3_VITL_SAT\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>\n)\n```\n\nRefer to this [notebook](notebooks\u002Fchmv2_inference.ipynb) for an example of how to use the DINOv3 + CHMv2 model.\n\nThis [notebook](notebooks\u002Fchmv2_dataset_exploration.ipynb) can be used to download inference data from the existing global dataset stored on aws.\n\n### CHMv2 model loading (via Hugging Face [Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002F))\n\nThe CHMv2 model is also available on [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fdinov3-vitl16-chmv2-dpt-head) and supported via the Hugging Face [Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Findex) library. Please refer to the corresponding documentation for usage, but below is a short example that demonstrates how to obtain canopy height predictions on a sample image.\n\n```python\nfrom PIL import Image\nimport torch\n\nfrom transformers import AutoModelForDepthEstimation, AutoImageProcessor\n\nprocessor = AutoImageProcessor.from_pretrained(\"facebook\u002Fdinov3-vitl16-chmv2-dpt-head\")\nmodel = AutoModelForDepthEstimation.from_pretrained(\"facebook\u002Fdinov3-vitl16-chmv2-dpt-head\")\n\nimage = Image.open(\"image.tif\")\ninputs = processor(images=image, return_tensors=\"pt\")\n\nwith torch.no_grad():\n    outputs = model(**inputs)\n\ndepth = processor.post_process_depth_estimation(\n    outputs, target_sizes=[(image.height, image.width)]\n)[0][\"predicted_depth\"]\n```\n\n## License\n\nDINOv3 code and model weights are released under the DINOv3 License. See [LICENSE.md](LICENSE.md) for additional details.\n\n## Contributing\n\nSee [contributing](CONTRIBUTING.md) and the [code of conduct](CODE_OF_CONDUCT.md).\n\n## Citing DINOv3\n\nIf you find this repository useful, please consider giving a star :star: and citation :t-rex::\n\n```\n@misc{simeoni2025dinov3,\n  title={{DINOv3}},\n  author={Sim{\\'e}oni, Oriane and Vo, Huy V. 
and Seitzer, Maximilian and Baldassarre, Federico and Oquab, Maxime and Jose, Cijo and Khalidov, Vasil and Szafraniec, Marc and Yi, Seungeun and Ramamonjisoa, Micha{\\\"e}l and Massa, Francisco and Haziza, Daniel and Wehrstedt, Luca and Wang, Jianyuan and Darcet, Timoth{\\'e}e and Moutakanni, Th{\\'e}o and Sentana, Leonel and Roberts, Claire and Vedaldi, Andrea and Tolan, Jamie and Brandt, John and Couprie, Camille and Mairal, Julien and J{\\'e}gou, Herv{\\'e} and Labatut, Patrick and Bojanowski, Piotr},\n  year={2025},\n  eprint={2508.10104},\n  archivePrefix={arXiv},\n  primaryClass={cs.CV},\n  url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10104},\n}\n```\n",":新: [2026-03-10] :fire: [冠层高度地图v2 (CHMv2) 模型](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.06382)及推理代码现已发布（有关下载模型权重和使用代码的更多详情请见[此处](#canopy-height-maps-v2-chmv2)）。该模型的权重也已在[Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fdinov3-vitl16-chmv2-dpt-head)上提供，并由Hugging Face的[Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Findex)库[支持](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fblob\u002Fmain\u002Fdocs\u002Fsource\u002Fen\u002Fmodel_doc\u002Fchmv2.md)。基于我们在2024年发布的原始高分辨率冠层高度地图，CHMv2通过利用DINOv3，在精度、细节和全球一致性方面实现了显著提升。\n\n[2025-11-20] 用于ConvNeXt主干网络的蒸馏代码和配置现已发布！\n\n[2025-10-13] [语义分割](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdinov3?tab=readme-ov-file#linear-segmentation-with-data-augmentation-on-ade20k)（ADE20K）和[单目深度估计](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdinov3?tab=readme-ov-file#linear-depth-estimation-on-nyuv2-depth)（NYUv2-Depth）的线性探针代码现已发布！\n\n[2025-09-17] DINOv3主干网络现已由[PyTorch Image Models \u002F timm](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002F)库自版本[1.0.20](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models\u002Freleases\u002Ftag\u002Fv1.0.20)起支持。\n\n[2025-08-29] DINOv3主干网络已由Hugging Face的[Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Findex)库自版本[4.56.0](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Freleases\u002Ftag\u002Fv4.56.0)起[支持](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fdinov3)。\n\n[2025-08-14] DINOv3主干网络现已在[Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Ffacebook\u002Fdinov3-68924841bd6b561778e31009)上提供，并由Hugging Face的[Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Findex)库的[开发版](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002F)所[支持](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fdinov3)。\n\n# DINOv3 🦖🦖🦖\n\n**[Meta AI Research, FAIR](https:\u002F\u002Fai.meta.com\u002Fresearch\u002F)**\n\n奥里安·西梅奥尼、胡伊·V·武、马克西米利安·赛策尔、费德里科·巴尔达萨雷、马克西姆·欧卡布、\u003Cbr\u002F>\n西乔·约瑟夫、瓦西里·哈利多夫、马克·斯扎夫拉涅茨、李承恩、米夏埃尔·拉马蒙吉索阿、\u003Cbr\u002F>\n弗朗西斯科·马萨、丹尼尔·哈齐扎、卢卡·韦尔施泰特、王建元、\u003Cbr\u002F>\n蒂莫泰·达尔塞、泰奥·穆塔卡尼、莱昂内尔·森塔纳、克莱尔·罗伯茨、\u003Cbr\u002F>\n安德烈亚·韦达尔迪、杰米·托兰、约翰·布兰特、卡米尔·库普里、\u003Cbr\u002F>\n朱利安·迈拉尔、埃尔韦·热古、帕特里克·拉巴图、皮奥特尔·博亚诺夫斯基\n\n[ :scroll: [`论文`](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10104)] [ :newspaper: [`博客`](https:\u002F\u002Fai.meta.com\u002Fblog\u002Fdinov3-self-supervised-vision-model\u002F)] [ :globe_with_meridians: [`网站`](https:\u002F\u002Fai.meta.com\u002Fdinov3\u002F)] [ :book: [`BibTeX`](#citing-dinov3)]\n\nDINOv3的参考PyTorch实现及模型。详情请参阅**[DINOv3](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10104)**论文。\n\n## 概述\n\n\u003Cdiv align=\"center\">\n  \u003Cimg width=\"1364\" 
height=\"1024\" alt=\"市场\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_dinov3_readme_e0197b9dd25f.png\" \u002F>\n\n  \u003Ci>\u003C\u002Fem>\u003Cb>高分辨率密集特征。\u003C\u002Fb>\u003Cbr\u002F>我们可视化了使用DINOv3输出特征计算得到的余弦相似度图，\u003Cbr\u002F>对比以红十字标记的补丁与其他所有补丁之间的相似度。\u003C\u002Fi>\n\u003C\u002Fdiv>\n\n\u003Cbr\u002F>\n\n一个扩展的多功能视觉基础模型家族，能够生成高质量的密集特征，并在各类视觉任务中取得卓越表现，甚至在广泛的应用场景下无需微调即可超越专门优化的最先进方法。\n\n## 预训练模型\n\n:information_source: 请按照下方提供的链接获取所有模型权重的访问权限：一旦申请被接受，您将收到一封电子邮件，其中包含指向所有可用模型权重（包括主干网络和适配器）的完整 URL 列表。这些 URL 可用于：\n- 将模型或适配器权重下载到本地文件系统，并通过 `weights` 或 `backbone_weights` 参数将 `torch.hub.load()` 指向这些本地权重；\n- 直接调用 `torch.hub.load()`，通过 URL 下载并加载主干网络或适配器，同样使用 `weights` 或 `backbone_weights` 参数。\n\n请参阅下方的示例代码片段。\n\n:warning: 请使用 `wget` 而不是网页浏览器来下载权重。\n\n在 Web 数据集（LVD-1689M）上预训练的 ViT 模型：\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>模型\u003C\u002Fth>\n      \u003Cth>参数量\u003C\u002Fth>\n      \u003Cth>预训练\u003Cbr\u002F>数据集\u003C\u002Fth>\n      \u003Cth>下载\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-S\u002F16 蒸馏版\u003C\u002Ftd>\n      \u003Ctd align=\"right\">21M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ViT-S+\u002F16 蒸馏版\u003C\u002Ftd>\n      \u003Ctd align=\"right\">29M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ViT-B\u002F16 蒸馏版\u003C\u002Ftd>\n      \u003Ctd align=\"right\">86M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ViT-L\u002F16 蒸馏版\u003C\u002Ftd>\n      \u003Ctd align=\"right\">300M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ViT-H+\u002F16 蒸馏版\u003C\u002Ftd>\n      \u003Ctd align=\"right\">840M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ViT-7B\u002F16\u003C\u002Ftd>\n      \u003Ctd align=\"right\">6,716M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n在 Web 数据集（LVD-1689M）上预训练的 ConvNeXt 模型：\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      
\u003Cth>模型\u003C\u002Fth>\n      \u003Cth>参数量\u003C\u002Fth>\n      \u003Cth>预训练\u003Cbr\u002F>数据集\u003C\u002Fth>\n      \u003Cth>下载\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ConvNeXt Tiny\u003C\u002Ftd>\n      \u003Ctd align=\"right\">29M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ConvNeXt Small\u003C\u002Ftd>\n      \u003Ctd align=\"right\">50M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ConvNeXt Base\u003C\u002Ftd>\n      \u003Ctd align=\"right\">89M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ConvNeXt Large\u003C\u002Ftd>\n      \u003Ctd align=\"right\">198M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n在卫星数据集（SAT-493M）上预训练的 ViT 模型：\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>模型\u003C\u002Fth>\n      \u003Cth>参数量\u003C\u002Fth>\n      \u003Cth>预训练\u003Cbr\u002F>数据集\u003C\u002Fth>\n      \u003Cth>下载\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-L\u002F16 蒸馏版\u003C\u002Ftd>\n      \u003Ctd align=\"right\">300M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">SAT-493M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>ViT-7B\u002F16\u003C\u002Ftd>\n      \u003Ctd align=\"right\">6,716M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">SAT-493M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\n### 预训练主干网络（通过 PyTorch [Hub](https:\u002F\u002Fdocs.pytorch.org\u002Fdocs\u002Fstable\u002Fhub.html)）\n\n请按照 [此处](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F) 的说明安装 PyTorch（这是加载模型所需的唯一依赖项）。强烈建议安装支持 CUDA 的 PyTorch 版本。\n\n```python\nimport torch\n\nREPO_DIR = \u003CPATH\u002FTO\u002FA\u002FLOCAL\u002FDIRECTORY\u002FWHERE\u002FTHE\u002FDINOV3\u002FREPO\u002FWAS\u002FCLONED>\n\n# DINOv3 在网络图像上预训练的 ViT 模型\ndinov3_vits16 = torch.hub.load(REPO_DIR, 'dinov3_vits16', source='local', weights=\u003C检查点\u002FURL\u002F或路径>)\ndinov3_vits16plus = torch.hub.load(REPO_DIR, 'dinov3_vits16plus', source='local', weights=\u003C检查点\u002FURL\u002F或路径>)\ndinov3_vitb16 = 
torch.hub.load(REPO_DIR, 'dinov3_vitb16', source='local', weights=\u003C检查点\u002FURL\u002F或路径>)\ndinov3_vitl16 = torch.hub.load(REPO_DIR, 'dinov3_vitl16', source='local', weights=\u003C检查点\u002FURL\u002F或路径>)\ndinov3_vith16plus = torch.hub.load(REPO_DIR, 'dinov3_vith16plus', source='local', weights=\u003C检查点\u002FURL\u002F或路径>)\ndinov3_vit7b16 = torch.hub.load(REPO_DIR, 'dinov3_vit7b16', source='local', weights=\u003C检查点\u002FURL\u002F或路径>)\n\n# DINOv3 在网络图像上预训练的 ConvNeXt 模型\ndinov3_convnext_tiny = torch.hub.load(REPO_DIR, 'dinov3_convnext_tiny', source='local', weights=\u003C检查点\u002FURL\u002F或路径>)\ndinov3_convnext_small = torch.hub.load(REPO_DIR, 'dinov3_convnext_small', source='local', weights=\u003C检查点\u002FURL\u002F或路径>)\ndinov3_convnext_base = torch.hub.load(REPO_DIR, 'dinov3_convnext_base', source='local', weights=\u003C检查点\u002FURL\u002F或路径>)\ndinov3_convnext_large = torch.hub.load(REPO_DIR, 'dinov3_convnext_large', source='local', weights=\u003C检查点\u002FURL\u002F或路径>)\n\n# DINOv3 在卫星影像上预训练的 ViT 模型\ndinov3_vitl16 = torch.hub.load(REPO_DIR, 'dinov3_vitl16', source='local', weights=\u003C检查点\u002FURL\u002F或路径>)\ndinov3_vit7b16 = torch.hub.load(REPO_DIR, 'dinov3_vit7b16', source='local', weights=\u003C检查点\u002FURL\u002F或路径>)\n```\n\n### 预训练主干网络（通过 Hugging Face [Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002F)）\n\n所有主干网络均可在 Hugging Face Hub 上的 [DINOv3](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Ffacebook\u002Fdinov3-68924841bd6b561778e31009) 系列中找到，并通过 Hugging Face [Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Findex) 库支持（自 4.56.0 版本起已发布相关包）。请参阅相应文档以了解使用方法，以下是一个简短示例，展示了如何使用 [Pipeline] 或 [AutoModel] 类获取图像嵌入。\n\n```python\nfrom transformers import pipeline\nfrom transformers.image_utils import load_image\n\nurl = \"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fdocumentation-images\u002Fresolve\u002Fmain\u002Fpipeline-cat-chonk.jpeg\"\nimage = load_image(url)\n\nfeature_extractor = pipeline(\n    model=\"facebook\u002Fdinov3-convnext-tiny-pretrain-lvd1689m\",\n    task=\"image-feature-extraction\",\n)\nfeatures = feature_extractor(image)\n```\n\n```python\nimport torch\nfrom transformers import AutoImageProcessor, AutoModel\nfrom transformers.image_utils import load_image\n\nurl = \"http:\u002F\u002Fimages.cocodataset.org\u002Fval2017\u002F000000039769.jpg\"\nimage = load_image(url)\n\npretrained_model_name = \"facebook\u002Fdinov3-convnext-tiny-pretrain-lvd1689m\"\nprocessor = AutoImageProcessor.from_pretrained(pretrained_model_name)\nmodel = AutoModel.from_pretrained(\n    pretrained_model_name,\n    device_map=\"auto\",\n)\n\ninputs = processor(images=image, return_tensors=\"pt\").to(model.device)\nwith torch.inference_mode():\n    outputs = model(**inputs)\n\npooled_output = outputs.pooler_output\nprint(\"Pooled output shape:\", pooled_output.shape)\n```\n\n其中，上述 `model` 和 `pretrained_model_name` 可以是以下之一：\n- `facebook\u002Fdinov3-vits16-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vits16plus-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vitb16-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vitl16-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vith16plus-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vit7b16-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-convnext-base-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-convnext-large-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-convnext-small-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-convnext-tiny-pretrain-lvd1689m`\n- 
`facebook\u002Fdinov3-vitl16-pretrain-sat493m`\n- `facebook\u002Fdinov3-vit7b16-pretrain-sat493m`\n\n### 图像变换\n\n对于使用 LVD-1689M 权重（基于网络图像预训练）的模型，请使用以下变换（标准 ImageNet 评估变换）：\n\n```python\nimport torchvision\nfrom torchvision.transforms import v2\n\ndef make_transform(resize_size: int = 256):\n    to_tensor = v2.ToImage()\n    resize = v2.Resize((resize_size, resize_size), antialias=True)\n    to_float = v2.ToDtype(torch.float32, scale=True)\n    normalize = v2.Normalize(\n        mean=(0.485, 0.456, 0.406),\n        std=(0.229, 0.224, 0.225),\n    )\n    return v2.Compose([to_tensor, resize, to_float, normalize])\n```\n\n\n对于使用 SAT-493M 权重（基于卫星影像预训练）的模型，请使用以下变换：\n\n\n```python\nimport torchvision\nfrom torchvision.transforms import v2\n\ndef make_transform(resize_size: int = 256):\n    to_tensor = v2.ToImage()\n    resize = v2.Resize((resize_size, resize_size), antialias=True)\n    to_float = v2.ToDtype(torch.float32, scale=True)\n    normalize = v2.Normalize(\n        mean=(0.430, 0.411, 0.296),\n        std=(0.213, 0.156, 0.143),\n    )\n    return v2.Compose([to_tensor, resize, to_float, normalize])\n```\n\n\n### 预训练分类头 - 图像分类\n\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>主干网络\u003C\u002Fth>\n      \u003Cth>预训练\u003Cbr\u002F>数据集\u003C\u002Fth>\n      \u003Cth>分类头\u003Cbr\u002F>数据集\u003C\u002Fth>\n      \u003Cth>下载\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-7B\u002F16\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">ImageNet\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\n这些（完整）分类器模型可以通过 PyTorch Hub 加载：\n\n```python\nimport torch\n\n# DINOv3\ndinov3_vit7b16_lc = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_lc', source=\"local\", weights=\u003CDEPTHER\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n```\n\n### 预训练头部——在SYNTHMIX数据集上训练的深度估计模型\n\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>骨干网络\u003C\u002Fth>\n      \u003Cth>预训练\u003Cbr\u002F>数据集\u003C\u002Fth>\n      \u003Cth>头部\u003Cbr\u002F>数据集\u003C\u002Fth>\n      \u003Cth>下载\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-7B\u002F16\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">SYNTHMIX\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\n```python\ndepther = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_dd', source=\"local\", weights=\u003CDEPTHER\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n```\n\n使用深度估计模型对图像进行处理的完整示例代码\n\n```python\nfrom PIL import Image\nimport torch\nfrom torchvision.transforms import v2\nimport matplotlib.pyplot as plt\nfrom matplotlib import colormaps\n\ndef get_img():\n    import requests\n    url = \"http:\u002F\u002Fimages.cocodataset.org\u002Fval2017\u002F000000039769.jpg\"\n    image = Image.open(requests.get(url, 
stream=True).raw).convert(\"RGB\")\n    return image\n\ndef make_transform(resize_size: int | list[int] = 768):\n    to_tensor = v2.ToImage()\n    resize = v2.Resize((resize_size, resize_size), antialias=True)\n    to_float = v2.ToDtype(torch.float32, scale=True)\n    normalize = v2.Normalize(\n        mean=(0.485, 0.456, 0.406),\n        std=(0.229, 0.224, 0.225),\n    )\n    return v2.Compose([to_tensor, resize, to_float, normalize])\n\ndepther = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_dd', source=\"local\", weights=\u003CDEPTHER\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n\nimg_size = 1024\nimg = get_img()\ntransform = make_transform(img_size)\nwith torch.inference_mode():\n    with torch.autocast('cuda', dtype=torch.bfloat16):\n        batch_img = transform(img)[None]\n        batch_img = batch_img\n        depths = depther(batch_img)\n\nplt.figure(figsize=(12, 6))\nplt.subplot(121)\nplt.imshow(img)\nplt.axis(\"off\")\nplt.subplot(122)\nplt.imshow(depths[0,0].cpu(), cmap=colormaps[\"Spectral\"])\nplt.axis(\"off\")\n\n```\n\n#### 复现论文结果\n\n请确保按照[此文档](DATASETS.md#depth-estimation-on-nyu)设置NYU数据集。\n\n运行以下命令以复现我们在NYUv2数据集上使用在SYNTHMIX数据集上预训练的深度估计模型所得到的结果：\n\n```shell\nPYTHONPATH=. python -m dinov3.run.submit dinov3\u002Feval\u002Fdepth\u002Frun.py \\\nconfig=dinov3\u002Feval\u002Fdepth\u002Fconfigs\u002Fconfig-nyu-synthmix-dpt-inference.yaml \\\ndatasets.root=\u003CPATH\u002FTO\u002FDATASET> \\\nload_from=dinov3_vit7b16_dd \\\n--output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\n```\n\n注意事项：\n- 如果您希望不使用`dinov3.run.submit`直接运行代码，可以使用Python或`torchrun`来执行：\n\n```shell\nPYTHONPATH=. python dinov3\u002Feval\u002Fdepth\u002Frun.py \\\nconfig=dinov3\u002Feval\u002Fdepth\u002Fconfigs\u002Fconfig-nyu-synthmix-dpt-inference.yaml \\\ndatasets.root=\u003CPATH\u002FTO\u002FDATASET> \\\nload_from=dinov3_vit7b16_dd \\\noutput_dir=\u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\n```\n\n- 您还可以通过设置`result_config.save_results=true`来保存预测结果。\n\n\n### 预训练头部——在COCO2017数据集上训练的检测模型\n\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>骨干网络\u003C\u002Fth>\n      \u003Cth>预训练\u003Cbr\u002F>数据集\u003C\u002Fth>\n      \u003Cth>头部\u003Cbr\u002F>数据集\u003C\u002Fth>\n      \u003Cth>下载\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-7B\u002F16\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">COCO2017\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\n```python\ndetector = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_de', source=\"local\", weights=\u003CDETECTOR\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n```\n\n### 预训练头部——在ADE20K数据集上训练的分割模型\n\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>骨干网络\u003C\u002Fth>\n      \u003Cth>预训练\u003Cbr\u002F>数据集\u003C\u002Fth>\n      \u003Cth>头部\u003Cbr\u002F>数据集\u003C\u002Fth>\n      \u003Cth>下载\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-7B\u002F16\u003C\u002Ftd>\n      \u003Ctd align=\"center\">LVD-1689M\u003C\u002Ftd>\n      \u003Ctd align=\"center\">ADE20K\u003C\u002Ftd>\n      
\u003Ctd align=\"center\">\u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n```python\nsegmentor = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_ms', source=\"local\", weights=\u003CSEGMENTOR\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n```\n\n使用提供的分割模型（ViT-7B + M2F）对ADE20K数据集进行完整推理的示例命令：\n\n```shell\nPYTHONPATH=. python -m dinov3.run.submit dinov3\u002Feval\u002Fsegmentation\u002Frun.py \\\nconfig=dinov3\u002Feval\u002Fsegmentation\u002Fconfigs\u002Fconfig-ade20k-m2f-inference.yaml  \\\ndatasets.root=\u003CPATH\u002FTO\u002FDATASET> \\\nload_from=dinov3_vit7b16_ms \\\n--output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR>\n```\n\n使用分割模型对图像进行处理的完整示例代码\n\n```python\nimport sys\nsys.path.append(REPO_DIR)\n\nfrom PIL import Image\nimport torch\nfrom torchvision import transforms\nimport matplotlib.pyplot as plt\nfrom matplotlib import colormaps\nfrom functools import partial\nfrom dinov3.eval.segmentation.inference import make_inference\n\n\ndef get_img():\n    import requests\n    url = \"http:\u002F\u002Fimages.cocodataset.org\u002Fval2017\u002F000000039769.jpg\"\n    image = Image.open(requests.get(url, stream=True).raw).convert(\"RGB\")\n    return image\n\ndef make_transform(resize_size: int | list[int] = 768):\n    to_tensor = v2.ToImage()\n    resize = v2.Resize((resize_size, resize_size), antialias=True)\n    to_float = v2.ToDtype(torch.float32, scale=True)\n    normalize = v2.Normalize(\n        mean=(0.485, 0.456, 0.406),\n        std=(0.229, 0.224, 0.225),\n    )\n    return v2.Compose([to_tensor, resize, to_float, normalize])\n\nsegmentor = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_ms', source=\"local\", weights=\u003CSEGMENTOR\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n\nimg_size = 896\nimg  = get_img()\ntransform = make_transform(img_size)\nwith torch.inference_mode():\n    with torch.autocast('cuda', dtype=torch.bfloat16):\n        batch_img = transform(img)[None]\n        pred_vit7b = segmentor(batch_img)  # 原始预测结果\n        # 实际的分割图\n        segmentation_map_vit7b = make_inference(\n            batch_img,\n            segmentor,\n            inference_mode=\"slide\",\n            decoder_head_type=\"m2f\",\n            rescale_to=(img.size[-1], img.size[-2]),\n            n_output_channels=150,\n            crop_size=(img_size, img_size),\n            stride=(img_size, img_size),\n            output_activation=partial(torch.nn.functional.softmax, dim=1),\n        ).argmax(dim=1, keepdim=True)\nplt.figure(figsize=(12, 6))\nplt.subplot(121)\nplt.imshow(img)\nplt.axis(\"off\")\nplt.subplot(122)\nplt.imshow(segmentation_map_vit7b[0,0].cpu(), cmap=colormaps[\"Spectral\"])\nplt.axis(\"off\")\n```\n\n### 预训练头 —— 使用 `dino.txt` 的零样本任务\n\n\u003Ctable style=\"margin: auto\">\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth rowspan=\"2\">主干网络\u003C\u002Fth>\n      \u003Cth>下载\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>ViT-L\u002F16 蒸馏版\u003C\u002Ftd>\n      \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F\">[链接]\u003C\u002Fa>,\n        \u003Ca 
href=\"https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fdinov3\u002Fthirdparty\u002Fbpe_simple_vocab_16e6.txt.gz\">词汇表\u003C\u002Fa>,\n        \u003Ca href=\"https:\u002F\u002Fdl.fbaipublicfiles.com\u002Fdinov2\u002Fthirdparty\u002FLICENSE\">词汇表许可\u003C\u002Fa>\n      \u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n可以通过 PyTorch Hub 加载完整的 dino.txt 模型：\n\n```python\nimport torch\n# DINOv3\ndinov3_vitl16_dinotxt_tet1280d20h24l, tokenizer = torch.hub.load(REPO_DIR, 'dinov3_vitl16_dinotxt_tet1280d20h24l', weights=\u003CSEGMENTOR\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>, backbone_weights=\u003CBACKBONE\u002FCHECKPOINT\u002FURL\u002FOR\u002FPATH>)\n```\n\n\n## 安装\n\n训练和评估代码需要 PyTorch 版本 ≥ 2.7.1，以及一些其他第三方包。请注意，该代码仅在指定版本下经过测试，并且要求运行环境为 Linux。要设置训练和评估所需的所有依赖项，请按照以下步骤操作：\n\n*[micromamba](https:\u002F\u002Fmamba.readthedocs.io\u002Fen\u002Flatest\u002Fuser_guide\u002Fmicromamba.html)* **（推荐）** - 克隆仓库，然后使用提供的环境定义创建并激活一个名为 `dinov3` 的 conda 环境：\n\n```shell\nmicromamba env create -f conda.yaml\nmicromamba activate dinov3\n```\n\n## 入门\n\n提供了几个笔记本以帮助您开始使用 DINOv3：\n- [补丁特征的 PCA 分析](notebooks\u002Fpca.ipynb)：显示前景物体上 DINOv3 补丁特征的 PCA（论文中的彩虹可视化）[[在 Google Colab 中运行]](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ffacebookresearch\u002Fdinov3\u002Fblob\u002Fmain\u002Fnotebooks\u002Fpca.ipynb)\n- [前景分割](notebooks\u002Fforeground_segmentation.ipynb)：基于 DINOv3 特征训练线性前景分割模型 [[在 Google Colab 中运行]](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ffacebookresearch\u002Fdinov3\u002Fblob\u002Fmain\u002Fnotebooks\u002Fforeground_segmentation.ipynb)\n- [密集与稀疏匹配](notebooks\u002Fdense_sparse_matching.ipynb)：根据 DINOv3 特征匹配两张不同图像中物体的补丁 [[在 Google Colab 中运行]](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ffacebookresearch\u002Fdinov3\u002Fblob\u002Fmain\u002Fnotebooks\u002Fdense_sparse_matching.ipynb)\n- [分割跟踪](notebooks\u002Fsegmentation_tracking.ipynb)：使用基于 DINOv3 特征的非参数方法进行视频分割跟踪 [[在 Google Colab 中运行]](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ffacebookresearch\u002Fdinov3\u002Fblob\u002Fmain\u002Fnotebooks\u002Fsegmentation_tracking.ipynb)\n- [基于 DINOv3 的 dino.txt 的零样本分割](notebooks\u002Fdinotxt_segmentation_inference.ipynb)：使用 dino.txt 策略计算开放词汇分割结果。\n\n## 数据准备\n\n### ImageNet-1k\n\n数据集的根目录应包含以下内容：\n\n- `\u003CROOT>\u002Ftest\u002FILSVRC2012_test_00000001.JPEG`\n- `\u003CROOT>\u002Ftest\u002F[..]`\n- `\u003CROOT>\u002Ftest\u002FILSVRC2012_test_00100000.JPEG`\n- `\u003CROOT>\u002Ftrain\u002Fn01440764\u002Fn01440764_10026.JPEG`\n- `\u003CROOT>\u002Ftrain\u002F[...]`\n- `\u003CROOT>\u002Ftrain\u002Fn15075141\u002Fn15075141_9993.JPEG`\n- `\u003CROOT>\u002Fval\u002Fn01440764\u002FILSVRC2012_val_00000293.JPEG`\n- `\u003CROOT>\u002Fval\u002F[...]`\n- `\u003CROOT>\u002Fval\u002Fn15075141\u002FILSVRC2012_val_00049174.JPEG`\n- `\u003CROOT>\u002Flabels.txt`\n\n提供的数据集实现要求在额外目录下存在一些附加元数据文件：\n\n- `\u003CEXTRA>\u002Fclass-ids-TRAIN.npy`\n- `\u003CEXTRA>\u002Fclass-ids-VAL.npy`\n- `\u003CEXTRA>\u002Fclass-names-TRAIN.npy`\n- `\u003CEXTRA>\u002Fclass-names-VAL.npy`\n- `\u003CEXTRA>\u002Fentries-TEST.npy`\n- `\u003CEXTRA>\u002Fentries-TRAIN.npy`\n- `\u003CEXTRA>\u002Fentries-VAL.npy`\n\n这些元数据文件可以使用以下 Python 代码一次性生成：\n\n```python\nfrom dinov3.data.datasets import ImageNet\n\nfor split in ImageNet.Split:\n    dataset = ImageNet(split=split, root=\"\u003CROOT>\", extra=\"\u003CEXTRA>\")\n    dataset.dump_extra()\n```\n\n请注意，根目录和额外目录不必是独立的目录。\n\n### ImageNet-22k\n\n请根据您的本地设置调整 [数据集类](dinov3\u002Fdata\u002Fdatasets\u002Fimage_net_22k.py)。\n\n\u003Cbr 
\u002F>\n\n:warning: 要执行接下来各节中提供的训练和评估命令，`dinov3` 包必须包含在 Python 模块搜索路径中，即只需在运行命令前加上 `PYTHONPATH=.`。\n\n## 训练\n\n### 快速设置：在 ImageNet-1k 上训练 DINOv3 ViT-L\u002F16\n\n在 SLURM 集群环境中，使用 submitit 在 4 个 H100-80GB 节点（32 张 GPU）上运行 DINOv3 预训练：\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Ftrain\u002Ftrain.py \\\n  --nodes 4 \\\n  --config-file dinov3\u002Fconfigs\u002Ftrain\u002Fvitl_im1k_lin834.yaml \\\n  --output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  train.dataset_path=ImageNet22k:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET>\n```\n\n训练时间约为 14 小时，最终检查点在 k-NN 评估中应达到 82.0%，在线性评估中达到 83.5%。\n\n训练代码每 12500 次迭代会在评估文件夹中保存教师权重，以便进行评估。\n\n### DINOv3 的精确设置：训练 DINOv3 ViT-7B\u002F16\n\nDINOv3 ViT-7B\u002F16 是在一个私有数据集上训练的。训练分为三个阶段：\n- 预训练\n- Gram 锚定\n- 高分辨率适应\n\n#### 预训练\n\n在 SLURM 集群环境中，使用 submitit 启动 DINOV3 ViT-7B\u002F16 的预训练，共 32 个节点（256 张 GPU）。\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Ftrain\u002Ftrain.py \\\n  --nodes 32 \\\n  --config-file dinov3\u002Fconfigs\u002Ftrain\u002Fdinov3_vit7b16_pretrain.yaml \\\n  --output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  train.dataset_path=\u003CDATASET>:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET>\n```\n\n#### Gram 锚定\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Ftrain\u002Ftrain.py \\\n  --nodes 32 \\\n  --config-file dinov3\u002Fconfigs\u002Ftrain\u002Fdinov3_vit7b16_gram_anchor.yaml \\\n  --output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  train.dataset_path=\u003CDATASET>:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET> \\\n  gram.ckpt=\u003CPATH\u002FTO\u002FGRAM_TEACHER_FROM_PREVIOUS_STEP>\n```\n\n#### 高分辨率适应\n\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Ftrain\u002Ftrain.py \\\n  --nodes 32 \\\n  --config-file dinov3\u002Fconfigs\u002Ftrain\u002Fdinov3_vit7b16_high_res_adapt.yaml \\\n  --output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  train.dataset_path=\u003CDATASET>:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET> \\\n  gram.ckpt=\u003CPATH\u002FTO\u002FTEACHER_FROM_GRAM> \\\n  student.resume_from_teacher_chkpt=\u003CPATH\u002FTO\u002FTEACHER_FROM_GRAM>\n```\n\n## 多重蒸馏\n\n### 测试设置：\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Ftrain\u002Ftrain.py \\\n  --nodes 1 \\\n  --config-file dinov3\u002Fconfigs\u002Ftrain\u002Fmulti_distillation_test.yaml \\\n  --output-dir \u003CPATH\u002FTO\u002FOUTPUT\u002FDIR> \\\n  --multi-distillation \\\n  train.dataset_path=\u003CDATASET>:root=\u003CPATH\u002FTO\u002FDATASET>:extra=\u003CPATH\u002FTO\u002FDATASET>\n```\n\n## 评估\n\n训练代码会定期保存教师模型的权重。为了评估模型，请在单个节点上运行以下评估命令：\n\n\n### ImageNet-1k 数据集上的逻辑回归分类\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Feval\u002Flog_regression.py \\\n  model.config_file=\u003C输出目录路径>\u002Fconfig.yaml \\\n  model.pretrained_weights=\u003C输出目录路径>\u002Fteacher_checkpoint.pth \\\n  output_dir=\u003C输出目录路径> \\\n  train.dataset=ImageNet:split=TRAIN:root=\u003C数据集路径>:extra=\u003C数据集路径> \\\n  eval.test_dataset=ImageNet:split=VAL:root=\u003C数据集路径>:extra=\u003C数据集路径>\n```\n\n### ImageNet-1k 数据集上的 k-NN 分类\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Feval\u002Fknn.py \\\n  model.config_file=\u003C 输出目录路径 >\u002Fconfig.yaml \\\n  model.pretrained_weights=\u003C 输出目录路径 >\u002Fteacher_checkpoint.pth \\\n  output_dir=\u003C 输出目录路径 > \\\n  
train.dataset=ImageNet:split=TRAIN:root=\u003C 数据集路径 >:extra=\u003C 数据集路径 > \\\n  eval.test_dataset=ImageNet:split=VAL:root=\u003C 数据集路径 >:extra=\u003C 数据集路径 >\n```\n\n### ImageNet-1k 数据集上带数据增强的线性分类\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Feval\u002Flinear.py \\\n  model.config_file=\u003C 输出目录路径 >\u002Fconfig.yaml \\\n  model.pretrained_weights=\u003C 输出目录路径 >\u002Fteacher_checkpoint.pth \\\n  output_dir=\u003C 输出目录路径 > \\\n  train.dataset=ImageNet:split=TRAIN:root=\u003C 数据集路径 >:extra=\u003C 数据集路径 > \\\n  train.val_dataset=ImageNet:split=VAL:root=\u003C 数据集路径 >:extra=\u003C 数据集路径 >\n```\n\n### ADE20K 数据集上带数据增强的线性分割\n\n```shell\nPYTHONPATH=. python -m dinov3.run.submit dinov3\u002Feval\u002Fsegmentation\u002Frun.py \\\nmodel.dino_hub=dinov3_vit7b16 \\\nconfig=dinov3\u002Feval\u002Fsegmentation\u002Fconfigs\u002Fconfig-ade20k-linear-training.yaml \\\ndatasets.root=\u003C 数据集路径 > \\\n--output-dir \u003C 输出目录路径 >\n```\n\n作业完成后，您将在指定的输出路径目录中找到：\n- `segmentation_config.yaml`，包含用于训练模型的配置文件；\n- `model_final.pth`，训练结束时的最终线性头部检查点；以及\n- `results-semantic-segmentation.csv`，包含最终的评估指标。\n\n\n### NYUv2 Depth 数据集上的线性深度估计\n\n```shell\nPYTHONPATH=. python -m dinov3.run.submit dinov3\u002Feval\u002Fdepth\u002Frun.py \\\n    model.dino_hub=dinov3_vit7b16 \\\n    config=dinov3\u002Feval\u002Fdepth\u002Fconfigs\u002Fconfig-nyu.yaml \\\n    datasets.root=\u003C 数据集路径 > \\\n    --output-dir \u003C 输出目录路径 >\n```\n\n作业完成后，您将在指定的输出路径目录中找到：\n- `depth_config.yaml`，包含用于训练模型的配置文件；\n- `model_final.pth`，训练结束时的最终线性头部检查点；以及\n- `results-depth.csv`，包含最终的评估指标。\n\n### 使用 dino.txt 在 DINOv3 上进行文本对齐\n\n文本对齐可以按照 `dino.txt`（即 [DINOv2 Meets Text](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.16334)）中的方法进行。\n\n```shell\n# 文本对齐的一个示例配置在这里：dinov3\u002Feval\u002Ftext\u002Fconfigs\u002Fdinov3_vitl_text.yaml\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3\u002Feval\u002Ftext\u002Ftrain_dinotxt.py \\\n  --nodes 4 \\\n  trainer_config_file=\"\u003C DINOv3 文本配置路径 >\" \\\n  output-dir=\u003C 输出目录路径 >\n```\n\n上述命令将在 4 个节点上启动训练，每个节点配备 8 张 GPU（总共 32 张 GPU）。\n请注意，DINOv3 论文中使用的文本对齐模型是在私有数据集上训练的，而这里我们提供了一个基于 `CocoCaptions` 数据集的示例配置 `dinov3\u002Feval\u002Ftext\u002Fconfigs\u002Fdinov3_vitl_text.yaml`，仅用于说明目的。\n请根据提供的 `CocoCaptions` 数据集类进行调整；该数据集可在 [此处](https:\u002F\u002Fwww.kaggle.com\u002Fdatasets\u002Fnikhil7280\u002Fcoco-image-caption) 找到。\n\n## 冠层高度地图 v2 (CHMv2)\n\nJohn Brandt, Seungeun Yi, Jamie Tolan, Xinyuan Li, Peter Potapov, \u003Cbr\u002F>\nJessica Ertel, Justine Spore, Huy V. 
Vo, Michaël Ramamonjisoa, Patrick Labatut, \u003Cbr\u002F>\nPiotr Bojanowski, Camille Couprie\n\n[ :scroll: [`论文`](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.06382)] [ :newspaper: [`博客`](http:\u002F\u002Fai.meta.com\u002Fblog\u002Fworld-resources-institute-dino-canopy-height-maps-v2)]\n\n### CHMv2 模型加载（通过 PyTorch [Hub](https:\u002F\u002Fdocs.pytorch.org\u002Fdocs\u002Fstable\u002Fhub.html)）\n\n:information_source: 请按照下方链接获取 CHMv2 模型权重：一旦申请被接受，您将收到一封包含可用模型权重 URL 的电子邮件。随后，您可以使用该 URL：\n- 将模型权重下载到本地文件系统，并通过 `weights` 参数将 `torch.hub.load()` 指向这些本地权重；或者\n- 直接调用 `torch.hub.load()` 从其 URL 下载并加载骨干网络。\n\nCHMv2 使用基于卫星影像预训练的 DINOv3 ViT-L\u002F16 作为骨干网络，可在[此处](https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F)申请访问权限后获得。\n\n:warning: 请使用 `wget` 而不是网页浏览器来下载权重。\n\n下载链接：https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fchmv2-downloads\u002F\n\n```python\nimport torch\nfrom dinov3.hub.backbones import Weights\n\nREPO_DIR = \u003C 克隆 DINOv3 仓库的本地目录路径 >\n\nchmv2_model = torch.hub.load(\n    REPO_DIR,\n    'dinov3_vitl16_chmv2',\n    source=\"local\",\n    weights=\"\u003C CHMV2 模型检查点的 URL 或路径 >\",\n    backbone_weights=Weights.SAT493M,  # 或 \u003C DINOV3_VITL_SAT 检查点的 URL 或路径 >\n)\n```\n\n有关如何使用 DINOv3 + CHMv2 模型的示例，请参阅此 [笔记本](notebooks\u002Fchmv2_inference.ipynb)。\n\n此 [笔记本](notebooks\u002Fchmv2_dataset_exploration.ipynb) 可用于从 AWS 上存储的现有全球数据集中下载推理数据。\n\n### CHMv2 模型加载（通过 Hugging Face [Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002F)）\n\nCHMv2 模型也可在 [Hugging Face Hub](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fdinov3-vitl16-chmv2-dpt-head) 上找到，并由 Hugging Face [Transformers](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Findex) 库支持。请参考相应的文档以了解使用方法；下面是一个简短示例，演示如何对一张示例图像进行冠层高度预测。\n\n```python\nfrom PIL import Image\nimport torch\n\nfrom transformers import AutoModelForDepthEstimation, AutoImageProcessor\n\nprocessor = AutoImageProcessor.from_pretrained(\"facebook\u002Fdinov3-vitl16-chmv2-dpt-head\")\nmodel = AutoModelForDepthEstimation.from_pretrained(\"facebook\u002Fdinov3-vitl16-chmv2-dpt-head\")\n\nimage = Image.open(\"image.tif\")\ninputs = processor(images=image, return_tensors=\"pt\")\n\nwith torch.no_grad():\n    outputs = model(**inputs)\n\ndepth = processor.post_process_depth_estimation(\n    outputs, target_sizes=[(image.height, image.width)]\n)[0][\"predicted_depth\"]\n```\n\n## 许可证\n\nDINOv3 的代码和模型权重根据 DINOv3 许可证发布。更多详情请参阅 [LICENSE.md](LICENSE.md)。\n\n## 贡献\n\n请参阅 [contributing](CONTRIBUTING.md) 和 [行为准则](CODE_OF_CONDUCT.md)。\n\n## 引用 DINOv3\n\n如果您觉得这个仓库有用，请考虑给它点个赞 :star: 并引用 :t-rex::\n\n```\n@misc{simeoni2025dinov3,\n  title={{DINOv3}},\n  author={Sim{\\'e}oni, Oriane and Vo, Huy V. 
and Seitzer, Maximilian and Baldassarre, Federico and Oquab, Maxime and Jose, Cijo and Khalidov, Vasil and Szafraniec, Marc and Yi, Seungeun and Ramamonjisoa, Micha{\\\"e}l and Massa, Francisco and Haziza, Daniel and Wehrstedt, Luca and Wang, Jianyuan and Darcet, Timoth{\\'e}e and Moutakanni, Th{\\'e}o and Sentana, Leonel and Roberts, Claire and Vedaldi, Andrea and Tolan, Jamie and Brandt, John and Couprie, Camille and Mairal, Julien and J{\\'e}gou, Herv{\\'e} and Labatut, Patrick and Bojanowski, Piotr},\n  year={2025},\n  eprint={2508.10104},\n  archivePrefix={arXiv},\n  primaryClass={cs.CV},\n  url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.10104},\n}\n```","# DINOv3 快速上手指南\n\nDINOv3 是 Meta AI Research 推出的新一代自监督视觉基础模型家族，能够生成高质量的高分辨率密集特征，在无需微调的情况下即可在多种视觉任务中达到业界领先水平。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**: Linux (推荐), macOS, Windows\n- **Python**: 3.8 或更高版本\n- **GPU**: 支持 CUDA 的 NVIDIA GPU（强烈推荐用于加速推理）\n\n### 前置依赖\n主要依赖为 **PyTorch**。请确保安装带有 CUDA 支持的 PyTorch 版本以获得最佳性能。\n\n```bash\n# 访问 https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F 获取适合你环境的安装命令\n# 以下为示例命令（根据实际 CUDA 版本调整）\npip3 install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n```\n\n可选依赖（用于通过 Hugging Face 调用）：\n```bash\npip install transformers>=4.56.0\n# 或者使用timm库 (版本 >= 1.0.20)\npip install timm>=1.0.20\n```\n\n> **国内加速建议**：如果下载缓慢，可使用清华源或阿里源安装依赖：\n> ```bash\n> pip install -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple torch torchvision torchaudio transformers\n> ```\n\n## 安装步骤\n\nDINOv3 无需复杂的安装过程，主要通过 `torch.hub` 或 `Hugging Face Transformers` 直接加载预训练模型。\n\n### 方式一：克隆仓库（适用于 torch.hub 加载）\n如果你希望通过本地仓库加载模型，首先克隆官方代码库：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdinov3.git\ncd dinov3\n```\n\n### 方式二：直接通过 Hugging Face 使用\n无需克隆仓库，只需安装 `transformers` 库即可直接使用（见下文“基本使用”）。\n\n### 获取模型权重\n部分模型权重需要通过 Meta 官方页面申请下载链接：\n1. 访问 [DINOv3 模型下载页面](https:\u002F\u002Fai.meta.com\u002Fresources\u002Fmodels-and-libraries\u002Fdinov3-downloads\u002F)。\n2. 提交申请，审核通过后你将收到一封包含所有可用模型权重（Backbones 和 Adapters）下载链接的邮件。\n3. **注意**：请使用 `wget` 命令下载权重文件，不要直接使用浏览器下载大文件。\n\n```bash\n# 示例：使用 wget 下载权重\nwget \u003C邮件中提供的下载链接> -O dinov3_vitl16_pretrain.pth\n```\n\n## 基本使用\n\n### 方法 A：使用 PyTorch Hub (推荐)\n\n此方法适合需要灵活控制模型加载路径的用户。\n\n```python\nimport torch\n\n# 替换为你本地克隆的 dinov3 仓库绝对路径\nREPO_DIR = \"\u002Fpath\u002Fto\u002Flocal\u002Fdinov3\" \n\n# 加载预训练模型 (以 ViT-L\u002F16 为例)\n# weights 参数可以是本地文件路径，也可以是直接的 URL\ndinov3_vitl16 = torch.hub.load(\n    REPO_DIR, \n    'dinov3_vitl16', \n    source='local', \n    weights=\"\u003CCHECKPOINT_URL_OR_LOCAL_PATH>\"\n)\n\n# 将模型设置为评估模式并移至 GPU\ndinov3_vitl16.eval()\ndinov3_vitl16.to(\"cuda\")\n\n# 准备输入图像 (此处仅为伪代码，需配合 torchvision.transforms 预处理)\n# input_tensor = preprocess(image).unsqueeze(0).to(\"cuda\")\n\n# 推理\n# with torch.no_grad():\n#     features = dinov3_vitl16(input_tensor)\n```\n\n**可用模型名称列表：**\n- Web 数据集预训练 ViT: `dinov3_vits16`, `dinov3_vits16plus`, `dinov3_vitb16`, `dinov3_vitl16`, `dinov3_vith16plus`, `dinov3_vit7b16`\n- Web 数据集预训练 ConvNeXt: `dinov3_convnext_tiny`, `dinov3_convnext_small`, `dinov3_convnext_base`, `dinov3_convnext_large`\n- 卫星图像数据集预训练: `dinov3_vitl16` (SAT), `dinov3_vit7b16` (SAT)\n\n---\n\n### 方法 B：使用 Hugging Face Transformers (最简便)\n\n此方法无需手动下载权重，库会自动处理下载和缓存。\n\n#### 1. 
使用 Pipeline 快速提取特征\n\n```python\nfrom transformers import pipeline\nfrom transformers.image_utils import load_image\n\n# 加载图像\nurl = \"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhuggingface\u002Fdocumentation-images\u002Fresolve\u002Fmain\u002Fpipeline-cat-chonk.jpeg\"\nimage = load_image(url)\n\n# 初始化特征提取管道\nfeature_extractor = pipeline(\n    model=\"facebook\u002Fdinov3-convnext-tiny-pretrain-lvd1689m\",\n    task=\"image-feature-extraction\",\n    device=0  # 指定 GPU ID，若无 GPU 可移除该行\n)\n\n# 提取特征\nfeatures = feature_extractor(image)\nprint(features.shape)\n```\n\n#### 2. 使用 AutoModel 进行更精细的控制\n\n```python\nimport torch\nfrom transformers import AutoImageProcessor, AutoModel\nfrom transformers.image_utils import load_image\n\n# 加载图像\nurl = \"http:\u002F\u002Fimages.cocodataset.org\u002Fval2017\u002F000000039769.jpg\"\nimage = load_image(url)\n\n# 指定模型名称 (可从 Hugging Face Hub 选择其他型号)\npretrained_model_name = \"facebook\u002Fdinov3-convnext-tiny-pretrain-lvd1689m\"\n\n# 加载处理器和模型\nprocessor = AutoImageProcessor.from_pretrained(pretrained_model_name)\nmodel = AutoModel.from_pretrained(\n    pretrained_model_name,\n    device_map=\"auto\",  # 自动分配设备\n)\n\n# 预处理\ninputs = processor(images=image, return_tensors=\"pt\").to(model.device)\n\n# 推理\nwith torch.inference_mode():\n    outputs = model(**inputs)\n\n# 获取全局池化输出\npooled_output = outputs.pooler_output\nprint(\"Pooled output shape:\", pooled_output.shape)\n```\n\n**常用 Hugging Face 模型 ID：**\n- `facebook\u002Fdinov3-vits16-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vitb16-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vitl16-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-convnext-base-pretrain-lvd1689m`\n- `facebook\u002Fdinov3-vitl16-chmv2-dpt-head` (用于树冠高度图估算)","某林业监测团队正利用卫星遥感影像，对全球热带雨林进行高精度的树冠高度测绘与生物多样性评估。\n\n### 没有 dinov3 时\n- **细节丢失严重**：传统模型在处理高分辨率卫星图时，难以捕捉细微的植被纹理，导致单棵树木的轮廓模糊，无法区分相邻的树冠。\n- **泛化能力不足**：在不同光照条件或地理区域（如从亚马逊切换到东南亚雨林）时，模型表现大幅下降，需针对每个新区域重新收集数据并微调训练。\n- **任务割裂效率低**：估算树冠高度和识别树种需要分别部署两个专用模型，不仅推理速度慢，还增加了系统维护的复杂性。\n- **标注成本高昂**：为了达到可用精度，必须依赖大量人工标注的像素级掩码数据，耗时数月且专家资源难以获取。\n\n### 使用 dinov3 后\n- **高密度特征还原**：dinov3 生成的密集特征图能清晰呈现高分辨率下的植被细节，精准勾勒出单株树木边界，显著提升了地图的颗粒度。\n- **零样本全局一致**：凭借强大的自监督学习能力，dinov3 无需任何微调即可在全球不同气候带保持一致的高精度，实现了“一次训练，全球通用”。\n- **多任务统一架构**：基于 dinov3 骨干网络，团队仅用一套模型便同时完成了树冠高度估算（CHMv2）和语义分割，推理延迟降低 40%。\n- **摆脱数据依赖**：利用 dinov3 的线性探测技术，仅需极少量标注样本即可达到甚至超越以往全量微调的效果，将项目启动周期从数月缩短至数周。\n\ndinov3 通过提供无需微调的高质量密集特征，让全球尺度的精细化生态监测变得低成本且触手可及。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_dinov3_e0197b9d.jpg","facebookresearch","Meta Research","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ffacebookresearch_449342bd.png","",null,"https:\u002F\u002Fopensource.fb.com","https:\u002F\u002Fgithub.com\u002Ffacebookresearch",[81,85,89,93],{"name":82,"color":83,"percentage":84},"Jupyter Notebook","#DA5B0B",94.5,{"name":86,"color":87,"percentage":88},"Python","#3572A5",5.1,{"name":90,"color":91,"percentage":92},"Cuda","#3A4E3A",0.3,{"name":94,"color":95,"percentage":96},"C++","#f34b7d",0,10098,812,"2026-04-14T03:05:30","NOASSERTION","未说明","强烈建议使用支持 CUDA 的 NVIDIA GPU（具体型号和显存未说明，但运行 7B 参数模型需大显存）",{"notes":104,"python":101,"dependencies":105},"1. PyTorch 是加载模型的唯一必需依赖，强烈建议安装支持 CUDA 的版本。\n2. 模型权重需通过特定链接申请获取，收到邮件后方可下载。\n3. 建议使用 wget 而非浏览器下载模型权重文件。\n4. 支持通过 PyTorch Hub 或 Hugging Face Transformers (v4.56.0+) 加载模型。\n5. 提供多种规模模型（从 21M 到 6.7B 参数），大模型对硬件资源要求极高。",[106,107,108],"torch","transformers>=4.56.0","timm>=1.0.20",[15,110],"其他","2026-03-27T02:49:30.150509","2026-04-14T20:55:34.890469",[],[]]