[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-IDEA-Research--GroundingDINO":3,"tool-IDEA-Research--GroundingDINO":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":102,"forks":103,"last_commit_at":104,"license":105,"difficulty_score":106,"env_os":107,"env_gpu":108,"env_ram":109,"env_deps":110,"category_tags":120,"github_topics":121,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":127,"updated_at":128,"faqs":129,"releases":158},3574,"IDEA-Research\u002FGroundingDINO","GroundingDINO","[ECCV 2024] Official implementation of the paper \"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection\"","Grounding DINO 是一款强大的开源人工智能模型，专为“开放集物体检测”任务设计。传统检测模型通常只能识别训练时见过的固定类别，而 Grounding DINO 突破了这一限制，它能够根据用户输入的自然语言描述（如“穿红衣服的人”或“桌上的咖啡杯”），在图像中精准定位并框出任意物体，即使这些物体从未出现在其训练数据中。\n\n该工具有效解决了传统算法无法灵活应对未知类别、依赖大量标注数据的痛点，实现了真正的零样本（Zero-Shot）检测能力。其核心技术亮点在于巧妙融合了 DINO 检测架构与接地预训练（Grounded Pre-Training）技术，将视觉特征与语言语义深度对齐，从而具备极强的泛化性和理解力。此外，它还能与 SAM 等分割模型联动，进一步实现精细化的物体分割与跟踪。\n\nGrounding DINO 非常适合计算机视觉研究人员、AI 开发者以及需要处理复杂场景的数据分析师使用。对于希望自动化数据集标注、构建灵活监控系统的工程师，或是探索多模态应用的科研人员，它都是一个极具价值的基石工具。凭借在 MS COCO 等权威基准测试中的卓越表现，Grounding DI","Grounding DINO 是一款强大的开源人工智能模型，专为“开放集物体检测”任务设计。传统检测模型通常只能识别训练时见过的固定类别，而 Grounding DINO 突破了这一限制，它能够根据用户输入的自然语言描述（如“穿红衣服的人”或“桌上的咖啡杯”），在图像中精准定位并框出任意物体，即使这些物体从未出现在其训练数据中。\n\n该工具有效解决了传统算法无法灵活应对未知类别、依赖大量标注数据的痛点，实现了真正的零样本（Zero-Shot）检测能力。其核心技术亮点在于巧妙融合了 DINO 检测架构与接地预训练（Grounded Pre-Training）技术，将视觉特征与语言语义深度对齐，从而具备极强的泛化性和理解力。此外，它还能与 SAM 等分割模型联动，进一步实现精细化的物体分割与跟踪。\n\nGrounding DINO 非常适合计算机视觉研究人员、AI 开发者以及需要处理复杂场景的数据分析师使用。对于希望自动化数据集标注、构建灵活监控系统的工程师，或是探索多模态应用的科研人员，它都是一个极具价值的基石工具。凭借在 MS COCO 等权威基准测试中的卓越表现，Grounding DINO 正成为连接语言指令与视觉感知的重要桥梁。","\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002F.asset\u002Fgrounding_dino_logo.png\" width=\"30%\">\n\u003C\u002Fdiv>\n\n# :sauropod: Grounding DINO \n\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fgrounding-dino-marrying-dino-with-grounded\u002Fzero-shot-object-detection-on-mscoco)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fzero-shot-object-detection-on-mscoco?p=grounding-dino-marrying-dino-with-grounded) 
[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fgrounding-dino-marrying-dino-with-grounded\u002Fzero-shot-object-detection-on-odinw)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fzero-shot-object-detection-on-odinw?p=grounding-dino-marrying-dino-with-grounded) \\\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fgrounding-dino-marrying-dino-with-grounded\u002Fobject-detection-on-coco-minival)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fobject-detection-on-coco-minival?p=grounding-dino-marrying-dino-with-grounded) [![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fgrounding-dino-marrying-dino-with-grounded\u002Fobject-detection-on-coco)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fobject-detection-on-coco?p=grounding-dino-marrying-dino-with-grounded)\n\n\n**[IDEA-CVR, IDEA-Research](https:\u002F\u002Fgithub.com\u002FIDEA-Research)** \n\n[Shilong Liu](http:\u002F\u002Fwww.lsl.zone\u002F), [Zhaoyang Zeng](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=U_cvvUwAAAAJ&hl=zh-CN&oi=ao), [Tianhe Ren](https:\u002F\u002Frentainhe.github.io\u002F), [Feng Li](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=ybRe9GcAAAAJ&hl=zh-CN), [Hao Zhang](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=B8hPxMQAAAAJ&hl=zh-CN), [Jie Yang](https:\u002F\u002Fgithub.com\u002Fyangjie-cv), [Chunyuan Li](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=Zd7WmXUAAAAJ&hl=zh-CN&oi=ao), [Jianwei Yang](https:\u002F\u002Fjwyang.github.io\u002F), [Hang Su](https:\u002F\u002Fscholar.google.com\u002Fcitations?hl=en&user=dxN1_X0AAAAJ&view_op=list_works&sortby=pubdate), [Jun Zhu](https:\u002F\u002Fscholar.google.com\u002Fcitations?hl=en&user=axsP38wAAAAJ), [Lei Zhang](https:\u002F\u002Fwww.leizhang.org\u002F)\u003Csup>:email:\u003C\u002Fsup>.\n\n\n[[`Paper`](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499)] [[`Demo`](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FShilongLiu\u002FGrounding_DINO_demo)] [[`BibTex`](#black_nib-citation)]\n\n\nPyTorch implementation and pretrained models for Grounding DINO. For details, see the paper **[Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499)**.\n\n- 🔥 **[Grounded SAM 2](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounded-SAM-2)** is released now, which combines Grounding DINO with [SAM 2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsegment-anything-2) for any object tracking in open-world scenarios.\n- 🔥 **[Grounding DINO 1.5](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounding-DINO-1.5-API)** is released now, which is IDEA Research's **Most Capable** Open-World Object Detection Model!\n- 🔥 **[Grounding DINO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499)** and **[Grounded SAM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.14159)** are now supported in Huggingface. 
For more convenient use, you can refer to [this documentation](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fgrounding-dino)\n\n## :sun_with_face: Helpful Tutorial\n\n- :grapes: [[Read our arXiv Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499)]\n- :apple:  [[Watch our simple introduction video on YouTube](https:\u002F\u002Fyoutu.be\u002FwxWDt5UiwY8)]\n- :blossom:   &nbsp;[[Try the Colab Demo](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Froboflow-ai\u002Fnotebooks\u002Fblob\u002Fmain\u002Fnotebooks\u002Fzero-shot-object-detection-with-grounding-dino.ipynb)]\n- :sunflower: [[Try our Official Huggingface Demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FShilongLiu\u002FGrounding_DINO_demo)]\n- :maple_leaf: [[Watch the Step by Step Tutorial about GroundingDINO by Roboflow AI](https:\u002F\u002Fyoutu.be\u002FcMa77r3YrDk)]\n- :mushroom: [[GroundingDINO: Automated Dataset Annotation and Evaluation by Roboflow AI](https:\u002F\u002Fyoutu.be\u002FC4NqaRBz_Kw)]\n- :hibiscus: [[Accelerate Image Annotation with SAM and GroundingDINO by Roboflow AI](https:\u002F\u002Fyoutu.be\u002FoEQYStnF2l8)]\n- :white_flower: [[Autodistill: Train YOLOv8 with ZERO Annotations based on Grounding-DINO and Grounded-SAM by Roboflow AI](https:\u002F\u002Fgithub.com\u002Fautodistill\u002Fautodistill)]\n\n\u003C!-- Grounding DINO Methods | \n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2303.05499-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499) \n[![YouTube](https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fyoutube.svg)](https:\u002F\u002Fyoutu.be\u002FwxWDt5UiwY8) -->\n\n\u003C!-- Grounding DINO Demos |\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Froboflow-ai\u002Fnotebooks\u002Fblob\u002Fmain\u002Fnotebooks\u002Fzero-shot-object-detection-with-grounding-dino.ipynb) -->\n\u003C!-- [![YouTube](https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fyoutube.svg)](https:\u002F\u002Fyoutu.be\u002FcMa77r3YrDk)\n[![HuggingFace space](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace%20Space-cyan.svg)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FShilongLiu\u002FGrounding_DINO_demo)\n[![YouTube](https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fyoutube.svg)](https:\u002F\u002Fyoutu.be\u002FoEQYStnF2l8)\n[![YouTube](https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fyoutube.svg)](https:\u002F\u002Fyoutu.be\u002FC4NqaRBz_Kw) -->\n\n## :sparkles: Highlight Projects\n\n- [Semantic-SAM: a universal image segmentation model enabling segmentation and recognition of anything at any desired granularity](https:\u002F\u002Fgithub.com\u002FUX-Decoder\u002FSemantic-SAM)\n- [DetGPT: Detect What You Need via Reasoning](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FDetGPT)\n- [Grounded-SAM: Marrying Grounding DINO with Segment Anything](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounded-Segment-Anything)\n- [Grounding DINO with Stable Diffusion](demo\u002Fimage_editing_with_groundingdino_stablediffusion.ipynb)\n- [Grounding DINO with GLIGEN for Controllable Image Editing](demo\u002Fimage_editing_with_groundingdino_gligen.ipynb)\n- [OpenSeeD: A Simple and Strong Openset Segmentation Model](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FOpenSeeD)\n- [SEEM: Segment Everything Everywhere All at Once](https:\u002F\u002Fgithub.com\u002FUX-Decoder\u002FSegment-Everything-Everywhere-All-At-Once)\n- 
[X-GPT: Conversational Visual Agent supported by X-Decoder](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FX-Decoder\u002Ftree\u002Fxgpt)\n- [GLIGEN: Open-Set Grounded Text-to-Image Generation](https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN)\n- [LLaVA: Large Language and Vision Assistant](https:\u002F\u002Fgithub.com\u002Fhaotian-liu\u002FLLaVA)\n\n\u003C!-- Extensions | [Grounding DINO with Segment Anything](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounded-Segment-Anything); [Grounding DINO with Stable Diffusion](demo\u002Fimage_editing_with_groundingdino_stablediffusion.ipynb); [Grounding DINO with GLIGEN](demo\u002Fimage_editing_with_groundingdino_gligen.ipynb)  -->\n\n\n\n\u003C!-- Official PyTorch implementation of [Grounding DINO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499), a stronger open-set object detector. Code is available now! -->\n\n\n## :bulb: Highlight\n\n- **Open-Set Detection.** Detect **everything** with language!\n- **High Performance.** COCO zero-shot **52.5 AP** (training without COCO data!). COCO fine-tune **63.0 AP**.\n- **Flexible.** Collaboration with Stable Diffusion for Image Editing.\n\n\n\n\n## :fire: News\n- **`2023\u002F07\u002F18`**: We release [Semantic-SAM](https:\u002F\u002Fgithub.com\u002FUX-Decoder\u002FSemantic-SAM), a universal image segmentation model that enables segmenting and recognizing anything at any desired granularity. **Code** and **checkpoint** are available!\n- **`2023\u002F06\u002F17`**: We provide an example to evaluate Grounding DINO's COCO zero-shot performance.\n- **`2023\u002F04\u002F15`**: Refer to [CV in the Wild Readings](https:\u002F\u002Fgithub.com\u002FComputer-Vision-in-the-Wild\u002FCVinW_Readings) if you are interested in open-set recognition!\n- **`2023\u002F04\u002F08`**: We release [demos](demo\u002Fimage_editing_with_groundingdino_gligen.ipynb) that combine [Grounding DINO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499) with [GLIGEN](https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN) for more controllable image editing.\n- **`2023\u002F04\u002F08`**: We release [demos](demo\u002Fimage_editing_with_groundingdino_stablediffusion.ipynb) that combine [Grounding DINO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499) with [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FStability-AI\u002FStableDiffusion) for image editing.\n- **`2023\u002F04\u002F06`**: We build a new demo marrying GroundingDINO with [Segment-Anything](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsegment-anything), named **[Grounded-Segment-Anything](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounded-Segment-Anything)**, which aims to support segmentation in GroundingDINO.\n- **`2023\u002F03\u002F28`**: A YouTube [video](https:\u002F\u002Fyoutu.be\u002FcMa77r3YrDk) about Grounding DINO and basic object detection prompt engineering. [[SkalskiP](https:\u002F\u002Fgithub.com\u002FSkalskiP)]\n- **`2023\u002F03\u002F28`**: Add a [demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FShilongLiu\u002FGrounding_DINO_demo) on Hugging Face Space!\n- **`2023\u002F03\u002F27`**: Support CPU-only mode. Now the model can run on machines without GPUs.\n- **`2023\u002F03\u002F25`**: A [demo](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Froboflow-ai\u002Fnotebooks\u002Fblob\u002Fmain\u002Fnotebooks\u002Fzero-shot-object-detection-with-grounding-dino.ipynb) for Grounding DINO is available on Colab. 
[[SkalskiP](https:\u002F\u002Fgithub.com\u002FSkalskiP)]\n- **`2023\u002F03\u002F22`**: Code is available now!\n\n\u003Cdetails open>\n\u003Csummary>\u003Cfont size=\"4\">\nDescription\n\u003C\u002Ffont>\u003C\u002Fsummary>\n \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499\">Paper\u003C\u002Fa> introduction.\n\u003Cimg src=\".asset\u002Fhero_figure.png\" alt=\"ODinW\" width=\"100%\">\nMarrying \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\">Grounding DINO\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN\">GLIGEN\u003C\u002Fa>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FIDEA-Research_GroundingDINO_readme_c5ff84142634.png\" alt=\"gd_gligen\" width=\"100%\">\n\u003C\u002Fdetails>\n\n## :star: Explanations\u002FTips for Grounding DINO Inputs and Outputs\n- Grounding DINO accepts an `(image, text)` pair as inputs.\n- It outputs `900` (by default) object boxes. Each box has similarity scores across all input words. (as shown in the figures below.)\n- By default, we choose the boxes whose highest similarities are higher than a `box_threshold`.\n- We extract the words whose similarities are higher than the `text_threshold` as predicted labels.\n- If you want to obtain objects of specific phrases, like the `dogs` in the sentence `two dogs with a stick.`, you can select the boxes with the highest text similarity to `dogs` as final outputs. \n- Note that each word can be split into **more than one** token by different tokenizers. The number of words in a sentence may not equal the number of text tokens.\n- We suggest separating different category names with `.` for Grounding DINO.\n![model_explain1](.asset\u002Fmodel_explan1.PNG)\n![model_explain2](.asset\u002Fmodel_explan2.PNG)\n\n## :label: TODO \n\n- [x] Release inference code and demo.\n- [x] Release checkpoints.\n- [x] Grounding DINO with Stable Diffusion and GLIGEN demos.\n- [ ] Release training code.\n\n## :hammer_and_wrench: Install \n\n**Note:**\n\n0. If you have a CUDA environment, please make sure the environment variable `CUDA_HOME` is set. The package will be compiled in CPU-only mode if no CUDA is available.\n\nPlease follow the installation steps strictly; otherwise, the program may produce: \n```bash\nNameError: name '_C' is not defined\n```\n\nIf this happens, please reinstall GroundingDINO by re-cloning the repository and repeating all the installation steps.\n \n#### How to check CUDA:\n```bash\necho $CUDA_HOME\n```\nIf it prints nothing, the path has not been set yet.\n\nRun this to set the environment variable for the current shell: \n```bash\nexport CUDA_HOME=\u002Fpath\u002Fto\u002Fcuda-11.3\n```\n\nNote that this CUDA version should match your CUDA runtime, since multiple CUDA versions may be installed on the same machine. \n\nIf you want to set CUDA_HOME permanently, store it using:\n\n```bash\necho 'export CUDA_HOME=\u002Fpath\u002Fto\u002Fcuda' >> ~\u002F.bashrc\n```\nAfter that, source the bashrc file and check CUDA_HOME:\n```bash\nsource ~\u002F.bashrc\necho $CUDA_HOME\n```\n\nIn this example, \u002Fpath\u002Fto\u002Fcuda-11.3 should be replaced with the path where your CUDA toolkit is installed. 
You can find this by typing **which nvcc** in your terminal.\n\nFor instance, if the output is \u002Fusr\u002Flocal\u002Fcuda\u002Fbin\u002Fnvcc, then:\n```bash\nexport CUDA_HOME=\u002Fusr\u002Flocal\u002Fcuda\n```\n**Installation:**\n\n1. Clone the GroundingDINO repository from GitHub.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO.git\n```\n\n2. Change the current directory to the GroundingDINO folder.\n\n```bash\ncd GroundingDINO\u002F\n```\n\n3. Install the required dependencies in the current directory.\n\n```bash\npip install -e .\n```\n\n4. Download pre-trained model weights.\n\n```bash\nmkdir weights\ncd weights\nwget -q https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Freleases\u002Fdownload\u002Fv0.1.0-alpha\u002Fgroundingdino_swint_ogc.pth\ncd ..\n```\n\n## :arrow_forward: Demo\nCheck your GPU ID (only if you're using a GPU):\n\n```bash\nnvidia-smi\n```\nReplace `{GPU ID}`, `image_you_want_to_detect.jpg`, and `\"dir you want to save the output\"` with appropriate values in the following command:\n```bash\nCUDA_VISIBLE_DEVICES={GPU ID} python demo\u002Finference_on_a_image.py \\\n-c groundingdino\u002Fconfig\u002FGroundingDINO_SwinT_OGC.py \\\n-p weights\u002Fgroundingdino_swint_ogc.pth \\\n-i image_you_want_to_detect.jpg \\\n-o \"dir you want to save the output\" \\\n-t \"chair\"\n [--cpu-only] # add this flag to run in CPU-only mode\n```\n\nIf you would like to specify the phrases to detect, here is a demo:\n```bash\nCUDA_VISIBLE_DEVICES={GPU ID} python demo\u002Finference_on_a_image.py \\\n-c groundingdino\u002Fconfig\u002FGroundingDINO_SwinT_OGC.py \\\n-p .\u002Fgroundingdino_swint_ogc.pth \\\n-i .asset\u002Fcat_dog.jpeg \\\n-o logs\u002F1111 \\\n-t \"There is a cat and a dog in the image .\" \\\n--token_spans \"[[[9, 10], [11, 14]], [[19, 20], [21, 24]]]\"\n [--cpu-only] # add this flag to run in CPU-only mode\n```\nThe `token_spans` argument specifies the start and end positions of each phrase. For example, the first phrase is `[[9, 10], [11, 14]]`: `\"There is a cat and a dog in the image .\"[9:10] = 'a'` and `\"There is a cat and a dog in the image .\"[11:14] = 'cat'`, so it refers to the phrase `a cat`. Similarly, `[[19, 20], [21, 24]]` refers to the phrase `a dog`.\n\nSee `demo\u002Finference_on_a_image.py` for more details.\n\n**Running with Python:**\n\n```python\nfrom groundingdino.util.inference import load_model, load_image, predict, annotate\nimport cv2\n\nmodel = load_model(\"groundingdino\u002Fconfig\u002FGroundingDINO_SwinT_OGC.py\", \"weights\u002Fgroundingdino_swint_ogc.pth\")\nIMAGE_PATH = \"weights\u002Fdog-3.jpeg\"\nTEXT_PROMPT = \"chair . person . dog .\"\nBOX_THRESHOLD = 0.35\nTEXT_THRESHOLD = 0.25\n\nimage_source, image = load_image(IMAGE_PATH)\n\nboxes, logits, phrases = predict(\n    model=model,\n    image=image,\n    caption=TEXT_PROMPT,\n    box_threshold=BOX_THRESHOLD,\n    text_threshold=TEXT_THRESHOLD\n)\n\nannotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)\ncv2.imwrite(\"annotated_image.jpg\", annotated_frame)\n```\n**Web UI**\n\nWe also provide demo code to integrate Grounding DINO with a Gradio web UI. 
See the file `demo\u002Fgradio_app.py` for more details.\n\n**Notebooks**\n\n- We release [demos](demo\u002Fimage_editing_with_groundingdino_gligen.ipynb) that combine [Grounding DINO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499) with [GLIGEN](https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN) for more controllable image editing.\n- We release [demos](demo\u002Fimage_editing_with_groundingdino_stablediffusion.ipynb) that combine [Grounding DINO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499) with [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FStability-AI\u002FStableDiffusion) for image editing.\n\n## COCO Zero-shot Evaluations\n\nWe provide an example to evaluate Grounding DINO's zero-shot performance on COCO. The result should be **48.5**.\n\n```bash\nCUDA_VISIBLE_DEVICES=0 \\\npython demo\u002Ftest_ap_on_coco.py \\\n -c groundingdino\u002Fconfig\u002FGroundingDINO_SwinT_OGC.py \\\n -p weights\u002Fgroundingdino_swint_ogc.pth \\\n --anno_path \u002Fpath\u002Fto\u002Fannotations\u002Finstances_val2017.json \\\n --image_dir \u002Fpath\u002Fto\u002Fimages\u002Fval2017\n```\n\n\n## :luggage: Checkpoints\n\n\u003C!-- insert a table -->\n\u003Ctable>\n  \u003Cthead>\n    \u003Ctr style=\"text-align: right;\">\n      \u003Cth>\u003C\u002Fth>\n      \u003Cth>name\u003C\u002Fth>\n      \u003Cth>backbone\u003C\u002Fth>\n      \u003Cth>Data\u003C\u002Fth>\n      \u003Cth>box AP on COCO\u003C\u002Fth>\n      \u003Cth>Checkpoint\u003C\u002Fth>\n      \u003Cth>Config\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Cth>1\u003C\u002Fth>\n      \u003Ctd>GroundingDINO-T\u003C\u002Ftd>\n      \u003Ctd>Swin-T\u003C\u002Ftd>\n      \u003Ctd>O365,GoldG,Cap4M\u003C\u002Ftd>\n      \u003Ctd>48.4 (zero-shot) \u002F 57.2 (fine-tune)\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Freleases\u002Fdownload\u002Fv0.1.0-alpha\u002Fgroundingdino_swint_ogc.pth\">GitHub link\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FShilongLiu\u002FGroundingDINO\u002Fresolve\u002Fmain\u002Fgroundingdino_swint_ogc.pth\">HF link\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Fblob\u002Fmain\u002Fgroundingdino\u002Fconfig\u002FGroundingDINO_SwinT_OGC.py\">link\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>2\u003C\u002Fth>\n      \u003Ctd>GroundingDINO-B\u003C\u002Ftd>\n      \u003Ctd>Swin-B\u003C\u002Ftd>\n      \u003Ctd>COCO,O365,GoldG,Cap4M,OpenImage,ODinW-35,RefCOCO\u003C\u002Ftd>\n      \u003Ctd>56.7\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Freleases\u002Fdownload\u002Fv0.1.0-alpha2\u002Fgroundingdino_swinb_cogcoor.pth\">GitHub link\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FShilongLiu\u002FGroundingDINO\u002Fresolve\u002Fmain\u002Fgroundingdino_swinb_cogcoor.pth\">HF link\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Fblob\u002Fmain\u002Fgroundingdino\u002Fconfig\u002FGroundingDINO_SwinB_cfg.py\">link\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n## :medal_military: Results\n\n\u003Cdetails open>\n\u003Csummary>\u003Cfont size=\"4\">\nCOCO Object Detection Results\n\u003C\u002Ffont>\u003C\u002Fsummary>\n\u003Cimg 
src=\".asset\u002FCOCO.png\" alt=\"COCO\" width=\"100%\">\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>\u003Cfont size=\"4\">\nODinW Object Detection Results\n\u003C\u002Ffont>\u003C\u002Fsummary>\n\u003Cimg src=\".asset\u002FODinW.png\" alt=\"ODinW\" width=\"100%\">\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>\u003Cfont size=\"4\">\nMarrying Grounding DINO with \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FStability-AI\u002FStableDiffusion\">Stable Diffusion\u003C\u002Fa> for Image Editing\n\u003C\u002Ffont>\u003C\u002Fsummary>\nSee our example \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Fblob\u002Fmain\u002Fdemo\u002Fimage_editing_with_groundingdino_stablediffusion.ipynb\">notebook\u003C\u002Fa> for more details.\n\u003Cimg src=\".asset\u002FGD_SD.png\" alt=\"GD_SD\" width=\"100%\">\n\u003C\u002Fdetails>\n\n\n\u003Cdetails open>\n\u003Csummary>\u003Cfont size=\"4\">\nMarrying Grounding DINO with \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN\">GLIGEN\u003C\u002Fa> for more Detailed Image Editing.\n\u003C\u002Ffont>\u003C\u002Fsummary>\nSee our example \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Fblob\u002Fmain\u002Fdemo\u002Fimage_editing_with_groundingdino_gligen.ipynb\">notebook\u003C\u002Fa> for more details.\n\u003Cimg src=\".asset\u002FGD_GLIGEN.png\" alt=\"GD_GLIGEN\" width=\"100%\">\n\u003C\u002Fdetails>\n\n## :sauropod: Model: Grounding DINO\n\nIncludes: a text backbone, an image backbone, a feature enhancer, a language-guided query selection, and a cross-modality decoder.\n\n![arch](.asset\u002Farch.png)\n\n\n## :hearts: Acknowledgement\n\nOur model is related to [DINO](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FDINO) and [GLIP](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FGLIP). Thanks for their great work!\n\nWe also thank great previous work including DETR, Deformable DETR, SMCA, Conditional DETR, Anchor DETR, Dynamic DETR, DAB-DETR, DN-DETR, etc. More related work are available at [Awesome Detection Transformer](https:\u002F\u002Fgithub.com\u002FIDEACVR\u002Fawesome-detection-transformer). A new toolbox [detrex](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002Fdetrex) is available as well.\n\nThanks [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FStability-AI\u002FStableDiffusion) and [GLIGEN](https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN) for their awesome models.\n\n\n## :black_nib: Citation\n\nIf you find our work helpful for your research, please consider citing the following BibTeX entry.   
\n\n```bibtex\n@article{liu2023grounding,\n  title={Grounding dino: Marrying dino with grounded pre-training for open-set object detection},\n  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},\n  journal={arXiv preprint arXiv:2303.05499},\n  year={2023}\n}\n```\n\n\n\n\n","\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002F.asset\u002Fgrounding_dino_logo.png\" width=\"30%\">\n\u003C\u002Fdiv>\n\n# :sauropod: Grounding DINO \n\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fgrounding-dino-marrying-dino-with-grounded\u002Fzero-shot-object-detection-on-mscoco)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fzero-shot-object-detection-on-mscoco?p=grounding-dino-marrying-dino-with-grounded) [![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fgrounding-dino-marrying-dino-with-grounded\u002Fzero-shot-object-detection-on-odinw)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fzero-shot-object-detection-on-odinw?p=grounding-dino-marrying-dino-with-grounded) \\\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fgrounding-dino-marrying-dino-with-grounded\u002Fobject-detection-on-coco-minival)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fobject-detection-on-coco-minival?p=grounding-dino-marrying-dino-with-grounded) [![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fgrounding-dino-marrying-dino-with-grounded\u002Fobject-detection-on-coco)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fobject-detection-on-coco?p=grounding-dino-marrying-dino-with-grounded)\n\n\n**[IDEA-CVR, IDEA-Research](https:\u002F\u002Fgithub.com\u002FIDEA-Research)** \n\n[刘士龙](http:\u002F\u002Fwww.lsl.zone\u002F)、[曾昭阳](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=U_cvvUwAAAAJ&hl=zh-CN&oi=ao)、[任天贺](https:\u002F\u002Frentainhe.github.io\u002F)、[李峰](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=ybRe9GcAAAAJ&hl=zh-CN)、[张浩](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=B8hPxMQAAAAJ&hl=zh-CN)、[杨杰](https:\u002F\u002Fgithub.com\u002Fyangjie-cv)、[李春元](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=Zd7WmXUAAAAJ&hl=zh-CN&oi=ao)、[杨建伟](https:\u002F\u002Fjwyang.github.io\u002F)、[苏航](https:\u002F\u002Fscholar.google.com\u002Fcitations?hl=en&user=dxN1_X0AAAAJ&view_op=list_works&sortby=pubdate)、[朱俊](https:\u002F\u002Fscholar.google.com\u002Fcitations?hl=en&user=axsP38wAAAAJ)、[张磊](https:\u002F\u002Fwww.leizhang.org\u002F)\u003Csup>:email:\u003C\u002Fsup>.\n\n\n[[`论文`](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499)] [[`Demo`](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FShilongLiu\u002FGrounding_DINO_demo)] [[`BibTex`](#black_nib-citation)]\n\n\nGrounding DINO 的 PyTorch 实现及预训练模型。详情请参阅论文 **[Grounding DINO：将 DINO 与接地式预训练结合用于开放集目标检测](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499)**。\n\n- 🔥 **[Grounded SAM 2](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounded-SAM-2)** 已发布，它将 Grounding DINO 与 [SAM 2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsegment-anything-2) 结合，适用于开放世界场景中的任意对象跟踪。\n- 🔥 **[Grounding DINO 1.5](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounding-DINO-1.5-API)** 已发布，这是 IDEA Research 打造的 **最强大** 的开放世界目标检测模型！\n- 🔥 **[Grounding 
DINO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499)** 和 **[Grounded SAM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.14159)** 现已在 Hugging Face 上提供支持。为方便使用，您可以参考 [此文档](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmodel_doc\u002Fgrounding-dino)。\n\n## :sun_with_face: 有用的教程\n\n- :grapes: [[阅读我们的 arXiv 论文](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499)]\n- :apple:  [[观看我们在 YouTube 上的简单介绍视频](https:\u002F\u002Fyoutu.be\u002FwxWDt5UiwY8)]\n- :blossom:   &nbsp;[[尝试 Colab 演示](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Froboflow-ai\u002Fnotebooks\u002Fblob\u002Fmain\u002Fnotebooks\u002Fzero-shot-object-detection-with-grounding-dino.ipynb)]\n- :sunflower: [[尝试我们的官方 Hugging Face 演示](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FShilongLiu\u002FGrounding_DINO_demo)]\n- :maple_leaf: [[观看 Roboflow AI 关于 GroundingDINO 的分步教程](https:\u002F\u002Fyoutu.be\u002FcMa77r3YrDk)]\n- :mushroom: [[Roboflow AI 的 GroundingDINO：自动化数据集标注与评估](https:\u002F\u002Fyoutu.be\u002FC4NqaRBz_Kw)]\n- :hibiscus: [[Roboflow AI 利用 SAM 和 GroundingDINO 加速图像标注](https:\u002F\u002Fyoutu.be\u002FoEQYStnF2l8)]\n- :white_flower: [[Autodistill：基于 Grounding-DINO 和 Grounded-SAM，零标注训练 YOLOv8](https:\u002F\u002Fgithub.com\u002Fautodistill\u002Fautodistill)]\n\n\u003C!-- Grounding DINO 方法 | \n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2303.05499-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499) \n[![YouTube](https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fyoutube.svg)](https:\u002F\u002Fyoutu.be\u002FwxWDt5UiwY8) -->\n\n\u003C!-- Grounding DINO 演示 |\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Froboflow-ai\u002Fnotebooks\u002Fblob\u002Fmain\u002Fnotebooks\u002Fzero-shot-object-detection-with-grounding-dino.ipynb) -->\n\u003C!-- [![YouTube](https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fyoutube.svg)](https:\u002F\u002Fyoutu.be\u002FcMa77r3YrDk)\n[![HuggingFace space](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-HuggingFace%20Space-cyan.svg)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FShilongLiu\u002FGrounding_DINO_demo)\n[![YouTube](https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fyoutube.svg)](https:\u002F\u002Fyoutu.be\u002FoEQYStnF2l8)\n[![YouTube](https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fyoutube.svg)](https:\u002F\u002Fyoutu.be\u002FC4NqaRBz_Kw) -->\n\n## :sparkles: 亮点项目\n\n- [Semantic-SAM：一种通用图像分割模型，可在任何所需粒度上实现对任何内容的分割和识别。](https:\u002F\u002Fgithub.com\u002FUX-Decoder\u002FSemantic-SAM), \n- [DetGPT：通过推理检测你需要的内容](https:\u002F\u002Fgithub.com\u002FOptimalScale\u002FDetGPT)\n- [Grounded-SAM：将 Grounding DINO 与 Segment Anything 结合](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounded-Segment-Anything)\n- [Grounding DINO 与 Stable Diffusion 结合](demo\u002Fimage_editing_with_groundingdino_stablediffusion.ipynb)\n- [Grounding DINO 与 GLIGEN 结合，用于可控图像编辑](demo\u002Fimage_editing_with_groundingdino_gligen.ipynb)\n- [OpenSeeD：一个简单而强大的开放集分割模型](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FOpenSeeD)\n- [SEEM：一次完成所有地方的所有内容的分割](https:\u002F\u002Fgithub.com\u002FUX-Decoder\u002FSegment-Everything-Everywhere-All-At-Once)\n- [X-GPT：由 X-Decoder 支持的对话式视觉智能体](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FX-Decoder\u002Ftree\u002Fxgpt)\n- [GLIGEN：开放集接地式文本到图像生成](https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN)\n- [LLaVA：大型语言与视觉助手](https:\u002F\u002Fgithub.com\u002Fhaotian-liu\u002FLLaVA)\n\n\u003C!-- 扩展 | 
[Grounding DINO 与 Segment Anything 结合](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounded-Segment-Anything); [Grounding DINO 与 Stable Diffusion 结合](demo\u002Fimage_editing_with_groundingdino_stablediffusion.ipynb); [Grounding DINO 与 GLIGEN 结合](demo\u002Fimage_editing_with_groundingdino_gligen.ipynb)  -->\n\n\n\n\u003C!-- [Grounding DINO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499) 的官方 PyTorch 实现，这是一种更强大的开放集目标检测器。代码现已可用！ -->\n\n\n## :bulb: 亮点\n\n- **开放集检测。** 用语言检测 **一切**！\n- **高性能。** COCO 零样本 **52.5 AP**（未使用 COCO 数据进行训练！）。COCO 微调 **63.0 AP**。\n- **灵活。** 可与 Stable Diffusion 协同用于图像编辑。\n\n## :fire: 新闻\n- **`2023\u002F07\u002F18`**: 我们发布了[Semantic-SAM](https:\u002F\u002Fgithub.com\u002FUX-Decoder\u002FSemantic-SAM)，这是一个通用图像分割模型，能够在任意所需的粒度上实现对任何对象的分割和识别。**代码**和**检查点**现已开放！\n- **`2023\u002F06\u002F17`**: 我们提供了一个示例，用于评估Grounding DINO在COCO数据集上的零样本性能。\n- **`2023\u002F04\u002F15`**: 对于那些对开放集识别感兴趣的人，请参阅[CV in the Wild Readings](https:\u002F\u002Fgithub.com\u002FComputer-Vision-in-the-Wild\u002FCVinW_Readings)！\n- **`2023\u002F04\u002F08`**: 我们发布了[demos](demo\u002Fimage_editing_with_groundingdino_gligen.ipynb)，将[Grounding DINO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499)与[GLIGEN](https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN)结合，以实现更可控的图像编辑。\n- **`2023\u002F04\u002F08`**: 我们发布了[demos](demo\u002Fimage_editing_with_groundingdino_stablediffusion.ipynb)，将[Grounding DINO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499)与[Stable Diffusion](https:\u002F\u002Fgithub.com\u002FStability-AI\u002FStableDiffusion)结合，用于图像编辑。\n- **`2023\u002F04\u002F06`**: 我们构建了一个新的演示，将GroundingDINO与[Segment-Anything](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsegment-anything)相结合，命名为**[Grounded-Segment-Anything](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGrounded-Segment-Anything)**，旨在支持GroundingDINO中的分割功能。\n- **`2023\u002F03\u002F28`**: 一段关于Grounding DINO和基础目标检测提示工程的YouTube[视频](https:\u002F\u002Fyoutu.be\u002FcMa77r3YrDk)。[[SkalskiP](https:\u002F\u002Fgithub.com\u002FSkalskiP)]\n- **`2023\u002F03\u002F28`**: 在Hugging Face Space上新增了一个[demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FShilongLiu\u002FGrounding_DINO_demo)！\n- **`2023\u002F03\u002F27`**: 支持仅CPU模式。现在该模型可以在没有GPU的机器上运行。\n- **`2023\u002F03\u002F25`**: 在Colab上提供了一个关于Grounding DINO的[demo](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Froboflow-ai\u002Fnotebooks\u002Fblob\u002Fmain\u002Fnotebooks\u002Fzero-shot-object-detection-with-grounding-dino.ipynb)。[[SkalskiP](https:\u002F\u002Fgithub.com\u002FSkalskiP)]\n- **`2023\u002F03\u002F22`**: 代码现已开放！\n\n\u003Cdetails open>\n\u003Csummary>\u003Cfont size=\"4\">\n描述\n\u003C\u002Ffont>\u003C\u002Fsummary>\n \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499\">论文\u003C\u002Fa>介绍。\n\u003Cimg src=\".asset\u002Fhero_figure.png\" alt=\"ODinW\" width=\"100%\">\n将\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\">Grounding DINO\u003C\u002Fa>和\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN\">GLIGEN\u003C\u002Fa>结合\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FIDEA-Research_GroundingDINO_readme_c5ff84142634.png\" alt=\"gd_gligen\" width=\"100%\">\n\u003C\u002Fdetails>\n\n## :star: Grounding DINO 输入与输出的解释\u002F技巧\n- Grounding DINO 接受一个`(图片, 文本)`对作为输入。\n- 它会输出`900`个（默认）目标框。每个框都包含与所有输入词语的相似度分数。（如图所示。）\n- 默认情况下，我们会选择那些最高相似度高于`box_threshold`的框。\n- 我们会提取那些相似度高于`text_threshold`的词语作为预测标签。\n- 如果你想获取特定短语的对象，比如句子`two dogs with a 
stick.`中的`dogs`，你可以选择与`dogs`文本相似度最高的框作为最终输出。\n- 请注意，每个单词可能会被不同的分词器拆分成**多个**标记。因此，句子中的单词数量可能并不等于文本标记的数量。\n- 我们建议在Grounding DINO中使用`.`来分隔不同的类别名称。\n![model_explain1](.asset\u002Fmodel_explan1.PNG)\n![model_explain2](.asset\u002Fmodel_explan2.PNG)\n\n## :label: 待办事项 \n\n- [x] 发布推理代码和演示。\n- [x] 发布检查点。\n- [x] 提供Grounding DINO与Stable Diffusion和GLIGEN结合的演示。\n- [ ] 发布训练代码。\n\n## :hammer_and_wrench: 安装 \n\n**注意：**\n\n0. 如果你有CUDA环境，请确保已设置环境变量`CUDA_HOME`。如果没有CUDA，程序将以仅CPU模式编译。\n\n请严格按照以下安装步骤操作，否则程序可能会报错：\n```bash\nNameError: name '_C' is not defined\n```\n\n如果出现此错误，请重新克隆仓库并再次执行所有安装步骤来重新安装GroundingDINO。\n\n#### 如何检查CUDA：\n```bash\necho $CUDA_HOME\n```\n如果没有任何输出，则表示你尚未设置路径。\n\n运行以下命令以在当前Shell中设置环境变量：\n```bash\nexport CUDA_HOME=\u002Fpath\u002Fto\u002Fcuda-11.3\n```\n\n请注意，CUDA版本应与你的CUDA运行时版本一致，因为同一时间可能存在多个CUDA版本。\n\n如果你想永久设置CUDA_HOME，可以将其添加到配置文件中：\n```bash\necho 'export CUDA_HOME=\u002Fpath\u002Fto\u002Fcuda' >> ~\u002F.bashrc\n```\n之后，加载`.bashrc`文件并检查CUDA_HOME：\n```bash\nsource ~\u002F.bashrc\necho $CUDA_HOME\n```\n\n在本示例中，`\u002Fpath\u002Fto\u002Fcuda-11.3`应替换为你CUDA工具包的实际安装路径。你可以在终端中输入**which nvcc**来找到它：\n\n例如，如果输出是`\u002Fusr\u002Flocal\u002Fcuda\u002Fbin\u002Fnvcc`，那么：\n```bash\nexport CUDA_HOME=\u002Fusr\u002Flocal\u002Fcuda\n```\n**安装：**\n\n1. 从GitHub克隆GroundingDINO仓库。\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO.git\n```\n\n2. 切换到GroundingDINO目录。\n\n```bash\ncd GroundingDINO\u002F\n```\n\n3. 在当前目录中安装所需的依赖项。\n\n```bash\npip install -e .\n```\n\n4. 下载预训练模型权重。\n\n```bash\nmkdir weights\ncd weights\nwget -q https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Freleases\u002Fdownload\u002Fv0.1.0-alpha\u002Fgroundingdino_swint_ogc.pth\ncd ..\n```\n\n## :arrow_forward: 演示\n检查你的 GPU ID（仅在使用 GPU 时）\n\n```bash\nnvidia-smi\n```\n将以下命令中的 `{GPU ID}`、`image_you_want_to_detect.jpg` 和 `\"dir you want to save the output\"` 替换为适当的值\n```bash\nCUDA_VISIBLE_DEVICES={GPU ID} python demo\u002Finference_on_a_image.py \\\n-c groundingdino\u002Fconfig\u002FGroundingDINO_SwinT_OGC.py \\\n-p weights\u002Fgroundingdino_swint_ogc.pth \\\n-i image_you_want_to_detect.jpg \\\n-o \"dir you want to save the output\" \\\n-t \"chair\"\n [--cpu-only] # 打开以 CPU 模式运行\n```\n\n如果你想指定要检测的短语，这里有一个演示：\n```bash\nCUDA_VISIBLE_DEVICES={GPU ID} python demo\u002Finference_on_a_image.py \\\n-c groundingdino\u002Fconfig\u002FGroundingDINO_SwinT_OGC.py \\\n-p .\u002Fgroundingdino_swint_ogc.pth \\\n-i .asset\u002Fcat_dog.jpeg \\\n-o logs\u002F1111 \\\n-t \"There is a cat and a dog in the image .\" \\\n--token_spans \"[[[9, 10], [11, 14]], [[19, 20], [21, 24]]]\"\n [--cpu-only] # 打开以 CPU 模式运行\n```\n`token_spans` 指定了短语的起始和结束位置。例如，第一个短语是 `[[9, 10], [11, 14]]`。`\"There is a cat and a dog in the image .\"[9:10] = 'a'`, `\"There is a cat and a dog in the image .\"[11:14] = 'cat'`。因此，它指的是短语 `a cat`。同样地，`[[19, 20], [21, 24]]` 指的是短语 `a dog`。\n\n更多详细信息请参阅 `demo\u002Finference_on_a_image.py`。\n\n**使用 Python 运行：**\n\n```python\nfrom groundingdino.util.inference import load_model, load_image, predict, annotate\nimport cv2\n\nmodel = load_model(\"groundingdino\u002Fconfig\u002FGroundingDINO_SwinT_OGC.py\", \"weights\u002Fgroundingdino_swint_ogc.pth\")\nIMAGE_PATH = \"weights\u002Fdog-3.jpeg\"\nTEXT_PROMPT = \"chair . person . 
dog .\"\nBOX_TRESHOLD = 0.35\nTEXT_TRESHOLD = 0.25\n\nimage_source, image = load_image(IMAGE_PATH)\n\nboxes, logits, phrases = predict(\n    model=model,\n    image=image,\n    caption=TEXT_PROMPT,\n    box_threshold=BOX_TRESHOLD,\n    text_threshold=TEXT_TRESHOLD\n)\n\nannotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)\ncv2.imwrite(\"annotated_image.jpg\", annotated_frame)\n```\n**Web UI**\n\n我们还提供了一个演示代码，用于将 Grounding DINO 集成到 Gradio Web UI 中。更多详细信息请参阅文件 `demo\u002Fgradio_app.py`。\n\n**Notebooks**\n\n- 我们发布了[演示](demo\u002Fimage_editing_with_groundingdino_gligen.ipynb)，将[Grounding DINO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499)与[GLIGEN](https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN)结合，实现更可控的图像编辑。\n- 我们发布了[演示](demo\u002Fimage_editing_with_groundingdino_stablediffusion.ipynb)，将[Grounding DINO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2303.05499)与[Stable Diffusion](https:\u002F\u002Fgithub.com\u002FStability-AI\u002FStableDiffusion)结合，用于图像编辑。\n\n## COCO 零样本评估\n\n我们提供了一个示例，用于评估 Grounding DINO 在 COCO 数据集上的零样本性能。结果应为 **48.5**。\n\n```bash\nCUDA_VISIBLE_DEVICES=0 \\\npython demo\u002Ftest_ap_on_coco.py \\\n -c groundingdino\u002Fconfig\u002FGroundingDINO_SwinT_OGC.py \\\n -p weights\u002Fgroundingdino_swint_ogc.pth \\\n --anno_path \u002Fpath\u002Fto\u002Fannoataions\u002Fie\u002Finstances_val2017.json \\\n --image_dir \u002Fpath\u002Fto\u002Fimagedir\u002Fie\u002Fval2017\n```\n\n\n## :luggage: 检查点\n\n\u003C!-- 插入一个表格 -->\n\u003Ctable>\n  \u003Cthead>\n    \u003Ctr style=\"text-align: right;\">\n      \u003Cth>\u003C\u002Fth>\n      \u003Cth>名称\u003C\u002Fth>\n      \u003Cth>骨干网络\u003C\u002Fth>\n      \u003Cth>数据集\u003C\u002Fth>\n      \u003Cth>COCO 上的框 AP\u003C\u002Fth>\n      \u003Cth>检查点\u003C\u002Fth>\n      \u003Cth>配置文件\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Cth>1\u003C\u002Fth>\n      \u003Ctd>GroundingDINO-T\u003C\u002Ftd>\n      \u003Ctd>Swin-T\u003C\u002Ftd>\n      \u003Ctd>O365,GoldG,Cap4M\u003C\u002Ftd>\n      \u003Ctd>48.4（零样本）\u002F 57.2（微调）\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Freleases\u002Fdownload\u002Fv0.1.0-alpha\u002Fgroundingdino_swint_ogc.pth\">GitHub 链接\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FShilongLiu\u002FGroundingDINO\u002Fresolve\u002Fmain\u002Fgroundingdino_swint_ogc.pth\">HF 链接\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Fblob\u002Fmain\u002Fgroundingdino\u002Fconfig\u002FGroundingDINO_SwinT_OGC.py\">链接\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Cth>2\u003C\u002Fth>\n      \u003Ctd>GroundingDINO-B\u003C\u002Ftd>\n      \u003Ctd>Swin-B\u003C\u002Ftd>\n      \u003Ctd>COCO,O365,GoldG,Cap4M,OpenImage,ODinW-35,RefCOCO\u003C\u002Ftd>\n      \u003Ctd>56.7\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Freleases\u002Fdownload\u002Fv0.1.0-alpha2\u002Fgroundingdino_swinb_cogcoor.pth\">GitHub 链接\u003C\u002Fa>  | \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FShilongLiu\u002FGroundingDINO\u002Fresolve\u002Fmain\u002Fgroundingdino_swinb_cogcoor.pth\">HF 链接\u003C\u002Fa> \n      \u003Ctd>\u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Fblob\u002Fmain\u002Fgroundingdino\u002Fconfig\u002FGroundingDINO_SwinB_cfg.py\">链接\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n## :medal_military: 结果\n\n\u003Cdetails open>\n\u003Csummary>\u003Cfont size=\"4\">\nCOCO 物体检测结果\n\u003C\u002Ffont>\u003C\u002Fsummary>\n\u003Cimg src=\".asset\u002FCOCO.png\" alt=\"COCO\" width=\"100%\">\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>\u003Cfont size=\"4\">\nODinW 物体检测结果\n\u003C\u002Ffont>\u003C\u002Fsummary>\n\u003Cimg src=\".asset\u002FODinW.png\" alt=\"ODinW\" width=\"100%\">\n\u003C\u002Fdetails>\n\n\u003Cdetails open>\n\u003Csummary>\u003Cfont size=\"4\">\n将 Grounding DINO 与 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FStability-AI\u002FStableDiffusion\">Stable Diffusion\u003C\u002Fa> 结合进行图像编辑\n\u003C\u002Ffont>\u003C\u002Fsummary>\n请参阅我们的示例 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Fblob\u002Fmain\u002Fdemo\u002Fimage_editing_with_groundingdino_stablediffusion.ipynb\">notebook\u003C\u002Fa> 以获取更多详情。\n\u003Cimg src=\".asset\u002FGD_SD.png\" alt=\"GD_SD\" width=\"100%\">\n\u003C\u002Fdetails>\n\n\n\u003Cdetails open>\n\u003Csummary>\u003Cfont size=\"4\">\n将 Grounding DINO 与 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN\">GLIGEN\u003C\u002Fa> 结合，实现更精细的图像编辑。\n\u003C\u002Ffont>\u003C\u002Fsummary>\n请参阅我们的示例 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Fblob\u002Fmain\u002Fdemo\u002Fimage_editing_with_groundingdino_gligen.ipynb\">notebook\u003C\u002Fa> 以获取更多详情。\n\u003Cimg src=\".asset\u002FGD_GLIGEN.png\" alt=\"GD_GLIGEN\" width=\"100%\">\n\u003C\u002Fdetails>\n\n## :sauropod: 模型：Grounding DINO\n\n包括：文本骨干网络、图像骨干网络、特征增强器、语言引导的查询选择以及跨模态解码器。\n\n![arch](.asset\u002Farch.png)\n\n\n## :hearts: 致谢\n\n我们的模型与 [DINO](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FDINO) 和 [GLIP](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FGLIP) 有关。感谢他们的杰出工作！\n\n我们还要感谢 DETR、Deformable DETR、SMCA、Conditional DETR、Anchor DETR、Dynamic DETR、DAB-DETR、DN-DETR 等优秀的工作。更多相关工作可在 [Awesome Detection Transformer](https:\u002F\u002Fgithub.com\u002FIDEACVR\u002Fawesome-detection-transformer) 中找到。此外，还有一个新的工具箱 [detrex](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002Fdetrex) 可供使用。\n\n感谢 [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FStability-AI\u002FStableDiffusion) 和 [GLIGEN](https:\u002F\u002Fgithub.com\u002Fgligen\u002FGLIGEN) 提供的优秀模型。\n\n## :black_nib: 引用\n\n如果您认为我们的工作对您的研究有所帮助，请考虑引用以下 BibTeX 条目。\n\n```bibtex\n@article{liu2023grounding,\n  title={Grounding DINO：将 DINO 与接地式预训练相结合用于开放集目标检测},\n  author={刘士龙和曾昭阳和任天贺和李峰和张浩和杨杰和李春元和杨建伟和苏航和朱俊和其他人},\n  journal={arXiv 预印本 arXiv:2303.05499},\n  year={2023}\n}\n```","# Grounding DINO 快速上手指南\n\nGrounding DINO 是一款强大的开源零样本（Zero-Shot）目标检测模型。它能够结合图像和文本提示，检测出任意指定的物体，无需针对特定类别进行重新训练。\n\n## 1. 
环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux (推荐) 或 Windows (可能需要额外配置编译环境)。\n*   **Python**: 3.8 或更高版本。\n*   **PyTorch**: 1.10.0 或更高版本（建议与您的 CUDA 版本匹配）。\n*   **CUDA**: 如果需要使用 GPU 加速，必须安装 CUDA Toolkit 并设置环境变量 `CUDA_HOME`。\n    *   *检查 CUDA 路径*: 在终端运行 `which nvcc`。如果输出例如 `\u002Fusr\u002Flocal\u002Fcuda\u002Fbin\u002Fnvcc`，则您的 CUDA 路径为 `\u002Fusr\u002Flocal\u002Fcuda`。\n    *   *设置环境变量* (临时): `export CUDA_HOME=\u002Fusr\u002Flocal\u002Fcuda`\n    *   *设置环境变量* (永久): 将 `export CUDA_HOME=\u002Fpath\u002Fto\u002Fcuda` 添加到 `~\u002F.bashrc` 并运行 `source ~\u002F.bashrc`。\n*   **编译工具**: 需要安装 `gcc` 和 `g++` 以编译自定义算子。\n\n## 2. 安装步骤\n\n请严格按照以下步骤操作，以避免出现 `_C is not defined` 等编译错误。\n\n### 第一步：克隆仓库\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO.git\ncd GroundingDINO\n```\n\n### 第二步：安装依赖\n建议使用国内镜像源加速 PyTorch 和相关库的安装（如清华源或阿里源）。\n\n```bash\n# 创建虚拟环境 (可选但推荐)\nconda create -n groundingdino python=3.8 -y\nconda activate groundingdino\n\n# 安装 PyTorch (请根据实际 CUDA 版本选择，此处以 cu118 为例)\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n\n# 安装其他依赖\npip install -r requirements.txt\n```\n\n### 第三步：安装 Grounding DINO 本体\n这一步会编译模型所需的自定义 CUDA 算子，请确保 `CUDA_HOME` 已正确设置。\n\n```bash\npip install -e .\n```\n\n> **注意**：如果安装过程中报错提示找不到 `_C` 模块，通常是因为 CUDA 环境未识别或编译失败。请检查 `echo $CUDA_HOME` 是否有输出，并尝试清理构建缓存后重新运行 `pip install -e .`。\n\n### 第四步：下载预训练权重\n从 Hugging Face 下载预训练模型文件（例如 `groundingdino_swint_ogc.pth`），并将其放置在项目根目录或指定文件夹中。\n\n```bash\n# 示例：使用 wget 下载 (也可手动下载后上传至服务器)\nwget https:\u002F\u002Fhuggingface.co\u002FShilongLiu\u002FGroundingDINO\u002Fresolve\u002Fmain\u002Fgroundingdino_swint_ogc.pth\n```\n\n## 3. 基本使用\n\n以下是一个最简单的 Python 推理示例，展示如何加载模型并根据文本提示检测物体。\n\n```python\nimport cv2\nfrom groundingdino.util.inference import load_model, load_image, predict, annotate\n\n# 1. 配置参数（假设当前目录为克隆后的 GroundingDINO 仓库根目录）\nMODEL_CONFIG_PATH = \"groundingdino\u002Fconfig\u002FGroundingDINO_SwinT_OGC.py\"\nMODEL_CHECKPOINT_PATH = \".\u002Fgroundingdino_swint_ogc.pth\"\nIMAGE_PATH = \".\u002Fassets\u002Fdemo1.jpg\"  # 替换为你的图片路径\nTEXT_PROMPT = \"dog . cat .\"       # 提示词，不同类别建议用 \".\" 分隔\nBOX_THRESHOLD = 0.35\nTEXT_THRESHOLD = 0.25\n\n# 2. 加载模型（CUDA 可用时自动使用 GPU）\nmodel = load_model(MODEL_CONFIG_PATH, MODEL_CHECKPOINT_PATH)\n\n# 3. 加载图像\nimage_source, image = load_image(IMAGE_PATH)\n\n# 4. 执行预测\nboxes, logits, phrases = predict(\n    model=model,\n    image=image,\n    caption=TEXT_PROMPT,\n    box_threshold=BOX_THRESHOLD,\n    text_threshold=TEXT_THRESHOLD\n)\n\n# 5. 标注结果\nannotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)\n\n# 6. 保存或显示结果\ncv2.imwrite(\"annotated_output.jpg\", annotated_frame)\nprint(f\"检测到物体：{phrases}\")\nprint(f\"置信度分数：{logits}\")\n```\n\n### 使用提示\n*   **输入格式**: 模型接受 `(图像，文本)` 对。\n*   **提示词技巧**: 如果需要检测多个类别，建议使用 `.` 分隔，例如 `\"person . car . 
tree .\"`。\n*   **阈值调整**: \n    *   `box_threshold`: 过滤检测框的最低置信度。\n    *   `text_threshold`: 过滤文本匹配度的最低阈值。\n    *   若检测结果过多或过少，可适当调整这两个参数。\n*   **特定短语检测**: 如果句子是 \"two dogs with a stick\"，而您只想检测 \"dogs\"，模型会计算每个词元的相似度，您可以选取与 \"dogs\" 相似度最高的框作为最终输出。","某电商运营团队需要每天从数万张用户上传的生活场景图中，快速筛选并标注出包含“复古红色手提包”或“木质咖啡桌”等特定长尾商品的照片，以构建新品训练数据集。\n\n### 没有 GroundingDINO 时\n- **标注成本极高**：面对从未见过的新型商品，必须雇佣大量人工逐张看图框选，耗时数天且费用昂贵。\n- **泛化能力受限**：传统检测模型只能识别预定义好的 80 类常见物体，一旦用户搜索词超出训练集（如“波西米亚风地毯”），模型直接失效。\n- **迭代周期漫长**：每当业务需求变更（例如从找“椅子”变为找“折叠椅”），都需要重新收集数据、训练模型，无法即时响应。\n- **漏检率居高不下**：对于背景复杂或遮挡严重的目标，固定类别的模型往往无法准确定位，导致大量有价值图片被遗漏。\n\n### 使用 GroundingDINO 后\n- **零样本即时检测**：只需输入自然语言描述（如“复古红色手提包”），GroundingDINO 即可直接在图中框出目标，无需任何额外训练或标注。\n- **开放词汇支持**：彻底打破类别限制，无论是生僻词还是组合概念（如“放在窗边的绿植”），GroundingDINO 都能凭借强大的图文理解能力精准识别。\n- **自动化数据生产**：利用 GroundingDINO 批量预处理海量图片，自动生成高质量标注框，将原本数周的数据准备工作压缩至几小时。\n- **复杂场景鲁棒性**：得益于接地预训练机制，GroundingDINO 在人群拥挤、光线昏暗或部分遮挡的复杂环境下，依然能保持极高的定位准确率。\n\nGroundingDINO 通过将自然语言理解融入视觉检测，让机器具备了“听懂指令找物体”的能力，彻底重构了开放世界下的目标检测工作流。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FIDEA-Research_GroundingDINO_ab6e4790.png","IDEA-Research","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FIDEA-Research_b8b3359e.png","The International Digital Economy Academy (“IDEA”). ",null,"www.idea.edu.cn","https:\u002F\u002Fgithub.com\u002FIDEA-Research",[83,87,91,94,98],{"name":84,"color":85,"percentage":86},"Python","#3572A5",79.3,{"name":88,"color":89,"percentage":90},"Cuda","#3A4E3A",17.6,{"name":92,"color":93,"percentage":23},"C++","#f34b7d",{"name":95,"color":96,"percentage":97},"Jupyter Notebook","#DA5B0B",0.9,{"name":99,"color":100,"percentage":101},"Dockerfile","#384d54",0.3,9958,1016,"2026-04-05T01:44:59","Apache-2.0",4,"Linux, macOS, Windows","非必需（支持 CPU 模式）。若有 CUDA 环境，需设置 CUDA_HOME 环境变量，具体 CUDA 版本需与运行时对齐（文中示例为 CUDA 11.3），未明确指定最低显存要求。","未说明",{"notes":111,"python":109,"dependencies":112},"1. 若拥有 CUDA 环境，必须设置 CUDA_HOME 环境变量，否则将编译为仅 CPU 模式。\n2. 安装步骤必须严格执行，否则可能报错 'NameError: name '_C' is not defined'，若发生此错误需重新克隆仓库并重装。\n3. 可通过 'which nvcc' 查找 CUDA 安装路径以设置环境变量。\n4. 项目支持在無 GPU 的机器上运行（CPU-only mode）。",[113,114,115,116,117,118,119],"torch","torchvision","transformers","pycocotools","opencv-python","supervision","timm",[26,14],[122,123,124,125,126],"object-detection","open-world","open-world-detection","vision-language","vision-language-transformer","2026-03-27T02:49:30.150509","2026-04-06T08:18:26.107429",[130,135,140,145,150,154],{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},16364,"在 Colab 上运行 GroundingDINO Demo 失败或遇到安装错误怎么办？","Colab 环境中的 Torch 版本可能与代码存在兼容性问题。解决方法是在安装命令之前，进入特定目录并修改 CUDA 源码文件：\n1. 切换目录：%cd \u002Fcontent\u002FGroundingDINO\u002Fgroundingdino\u002Fmodels\u002FGroundingDINO\u002Fcsrc\u002FMsDeformAttn\n2. 执行 sed 命令修复类型转换错误：\n   !sed -i 's\u002Fvalue.type()\u002Fvalue.scalar_type()\u002Fg' ms_deform_attn_cuda.cu\n   !sed -i 's\u002Fvalue.scalar_type().is_cuda()\u002Fvalue.is_cuda()\u002Fg' ms_deform_attn_cuda.cu\n3. 然后再运行原本的安装或演示代码。","https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Fissues\u002F402",{"id":136,"question_zh":137,"answer_zh":138,"source_url":139},16365,"安装时出现 'no suitable conversion function from DeprecatedTypeProperties to ScalarType' 编译错误如何解决？","这是由于 PyTorch 新版本中 API 变更导致的。需要手动修改源码中的 CUDA 文件：\n1. 打开文件：GroundingDINO\u002Fgroundingdino\u002Fmodels\u002FGroundingDINO\u002Fcsrc\u002FMsDeformAttn\u002Fms_deform_attn_cuda.cu\n2. 找到第 65 行和第 135 行（或其他报错行）。\n3. 
将 value.type() 替换为 value.scalar_type()。\n   例如：AT_DISPATCH_FLOATING_TYPES(value.scalar_type(), \"ms_deform_attn_forward_cuda\", ...)\n4. 卸载旧版本并重新从本地安装：\n   pip uninstall groundingdino -y\n   pip install -e .","https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Fissues\u002F388",{"id":141,"question_zh":142,"answer_zh":143,"source_url":144},16366,"在 Windows 上安装 GroundingDINO 失败（Failed building wheel）的原因及解决方法？","通常是因为环境变量配置缺失或 CUDA 版本不匹配。请检查以下几点：\n1. 确保已正确设置环境变量 CUDA_HOME 和 LD_LIBRARY_PATH（Windows 下为 PATH）。\n2. 确认安装的 CUDA 版本已在系统路径中。\n3. 如果使用 SD-WebUI 等集成环境，可能是扩展与主环境版本冲突，尝试手动对齐扩展版本或停用可疑扩展。","https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Fissues\u002F78",{"id":146,"question_zh":147,"answer_zh":148,"source_url":149},16367,"在 COCO 数据集上 Fine-tune 训练时，为什么性能会在几个 Epoch 后突然大幅下降？","性能骤降通常与学习率调度策略不当有关。实验表明，如果在较晚的 Epoch（如第 8-11 个 Epoch）才降低学习率，模型可能已经过拟合或陷入局部最优导致崩溃。建议更早地降低学习率（例如在第 4-8 个 Epoch 之间），并确保其他超参数（如 Batch Size）与原始论文设置一致。此外，虽然 Denoising 不是必须的，但合理的 Label Assign 和匈牙利匹配策略对稳定性至关重要。","https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO\u002Fissues\u002F166",{"id":151,"question_zh":152,"answer_zh":153,"source_url":149},16368,"训练过程中分类损失（loss_label）出现 NaN 是什么原因？","Loss 出现 NaN 通常是因为预测的 logits 出现了 -inf 值，这往往是由于 Label Assign（标签分配）环节没有做好导致的。请检查匈牙利匹配过程中的 cost class 计算逻辑，确保其遵循了 GLIP 或相关论文的实现方法，避免无效梯度传播。",{"id":155,"question_zh":156,"answer_zh":157,"source_url":149},16369,"GroundingDINO 的匈牙利匹配和损失函数是参考哪个方法实现的？","GroundingDINO 在匈牙利匹配过程中的 cost class 计算以及最终的 label_loss（分类损失）主要遵循 GLIP 方法的实现思路。如果在复现中遇到问题，建议对照 GLIP 的代码逻辑进行检查。",[159,164],{"id":160,"version":161,"summary_zh":162,"released_at":163},98698,"v0.1.0-alpha2","模型数据集：O365、VG、RefCOCO、COCO、OpenImage、Cap4M、ODinW-35  \nCOCO 数据集上的性能：56.7 AP","2023-04-07T09:36:20",{"id":165,"version":166,"summary_zh":167,"released_at":168},98699,"v0.1.0-alpha","发布Grounding DINO的检查点","2023-03-21T04:50:27"]