[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-Intellindust-AI-Lab--DEIM":3,"tool-Intellindust-AI-Lab--DEIM":62},[4,18,26,35,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,2,"2026-04-10T11:39:34",[14,15,13],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":32,"last_commit_at":41,"category_tags":42,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[43,13,15,14],"插件",{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":10,"last_commit_at":50,"category_tags":51,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 
API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[52,15,13,14],"语言模型",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4292,"Deep-Live-Cam","hacksider\u002FDeep-Live-Cam","Deep-Live-Cam 是一款专注于实时换脸与视频生成的开源工具，用户仅需一张静态照片，即可通过“一键操作”实现摄像头画面的即时变脸或制作深度伪造视频。它有效解决了传统换脸技术流程繁琐、对硬件配置要求极高以及难以实时预览的痛点，让高质量的数字内容创作变得触手可及。\n\n这款工具不仅适合开发者和技术研究人员探索算法边界，更因其极简的操作逻辑（仅需三步：选脸、选摄像头、启动），广泛适用于普通用户、内容创作者、设计师及直播主播。无论是为了动画角色定制、服装展示模特替换，还是制作趣味短视频和直播互动，Deep-Live-Cam 都能提供流畅的支持。\n\n其核心技术亮点在于强大的实时处理能力，支持口型遮罩（Mouth Mask）以保留使用者原始的嘴部动作，确保表情自然精准；同时具备“人脸映射”功能，可同时对画面中的多个主体应用不同面孔。此外，项目内置了严格的内容安全过滤机制，自动拦截涉及裸露、暴力等不当素材，并倡导用户在获得授权及明确标注的前提下合规使用，体现了技术发展与伦理责任的平衡。",88924,"2026-04-06T03:28:53",[14,15,13,61],"视频",{"id":63,"github_repo":64,"name":65,"description_en":66,"description_zh":67,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":80,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":10,"env_os":93,"env_gpu":94,"env_ram":95,"env_deps":96,"category_tags":102,"github_topics":103,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":109,"updated_at":110,"faqs":111,"releases":141},7071,"Intellindust-AI-Lab\u002FDEIM","DEIM","[CVPR 2025] DEIM: DETR with Improved Matching for Fast Convergence","DEIM 是一个专为加速目标检测模型训练而设计的先进框架，全称为“具有改进匹配机制的 DETR\"。它主要解决了传统 DETR 类模型收敛速度慢、训练周期长的问题。通过优化模型内部的匹配算法，DEIM 能够显著缩短训练时间，同时在保持甚至提升检测精度的基础上，让模型更快达到最佳状态。\n\n这款工具特别适合从事计算机视觉研究的研究人员、需要高效训练检测模型的算法工程师，以及关注实时目标检测技术落地的开发者。对于希望在有限计算资源下快速验证想法或部署高性能检测系统的团队，DEIM 提供了坚实的技术基础。\n\n其核心技术亮点在于对 DETR 架构中匹配机制的深度改良，实现了“快速收敛”与“高精度”的双重突破。此外，该项目持续迭代，最新推出的 DEIMv2 
系列进一步涵盖了从超轻量级（如仅 0.49M 参数的 Atto 版本）到大型模型的多种规格，并引入了先进的 DINOv3 特征提取技术，在减少参数量和计算量的同时，依然保持了业界领先的性能表现，尤其适合移动端设备部署。","\u003Ch2 align=\"center\">\n  DEIM: DETR with Improved Matching for Fast Convergence\n\u003C\u002Fh2>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM\u002Fblob\u002Fmaster\u002FLICENSE\">\n        \u003Cimg alt=\"license\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLICENSE-Apache%202.0-blue\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.04234\">\n        \u003Cimg alt=\"arXiv\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2412.04234-red\">\n    \u003C\u002Fa>\n   \u003Ca href=\"https:\u002F\u002Fwww.shihuahuang.cn\u002FDEIM\u002F\">\n        \u003Cimg alt=\"project webpage\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebpage-DEIM-purple\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM\u002Fpulls\">\n        \u003Cimg alt=\"prs\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues-pr\u002FShihuaHuang95\u002FDEIM\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM\u002Fissues\">\n        \u003Cimg alt=\"issues\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002FShihuaHuang95\u002FDEIM?color=olive\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM\">\n        \u003Cimg alt=\"stars\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FShihuaHuang95\u002FDEIM\">\n    \u003C\u002Fa>\n    \u003Ca href=\"mailto:shihuahuang95@gmail.com\">\n        \u003Cimg alt=\"Contact Us\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FContact-Email-yellow\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\" style=\"font-size: 2.0em; font-weight: bold;\">\n    🎉 \u003Cstrong> \u003Ca 
href=\"https:\u002F\u002Fintellindust-ai-lab.github.io\u002Fprojects\u002FEdgeCrafter\u002F\" style=\"color: #d9534f; text-decoration: none;\">EdgeCrafter\u003C\u002Fa> is released with SOTA performance on detection, pose estimation as well as instance segmentation.\u003C\u002Fstrong>🎉\n\u003C\u002Fp>\n\u003Cp align=\"center\" style=\"font-size: 2.0em; font-weight: bold;\">\n    🎉 \u003Cstrong>We’re excited to share \u003Ca href=\"https:\u002F\u002Fintellindust-ai-lab.github.io\u002Fprojects\u002FDEIMv2\u002F\" style=\"color: #d9534f; text-decoration: none;\">DEIMv2\u003C\u002Fa> \u003C\u002Fstrong>🎉\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n    DEIM is an advanced training framework designed to enhance the matching mechanism in DETRs, enabling faster convergence and improved accuracy. It serves as a robust foundation for future research and applications in the field of real-time object detection. \n\u003C\u002Fp>\n\n---\n\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"http:\u002F\u002Fwww.shihuahuang.cn\">Shihua Huang\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>,\n  \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=tIFWBcQAAAAJ&hl=en\">Zhichao Lu\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>,\n  \u003Ca href=\"https:\u002F\u002Fvinthony.github.io\u002Facademic\u002F\">Xiaodong Cun\u003C\u002Fa>\u003Csup>3\u003C\u002Fsup>,\n  Yongjun Yu\u003Csup>1\u003C\u002Fsup>,\n  Xiao Zhou\u003Csup>4\u003C\u002Fsup>, \n  \u003Ca href=\"https:\u002F\u002Fxishen0220.github.io\">Xi Shen\u003C\u002Fa>\u003Csup>1*\u003C\u002Fsup>\n\u003C\u002Fdiv>\n\n  \n\u003Cp align=\"center\">\n\u003Ci>\n1. Intellindust AI Lab &nbsp; 2. City University of Hong Kong &nbsp; 3. Great Bay University &nbsp; 4. 
Hefei Normal University\n\u003C\u002Fi>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  **📧 Corresponding author:** \u003Ca href=\"mailto:shenxiluc@gmail.com\">shenxiluc@gmail.com\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Freal-time-object-detection-on-coco?p=deim-detr-with-improved-matching-for-fast\">\n    \u003Cimg alt=\"sota\" src=\"https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fdeim-detr-with-improved-matching-for-fast\u002Freal-time-object-detection-on-coco\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cstrong>If you like our work, please give us a ⭐!\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FIntellindust-AI-Lab_DEIM_readme_b8143637c259.png\" alt=\"Image 1\" width=\"49%\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FIntellindust-AI-Lab_DEIM_readme_bdfc7b06a1bb.png\" alt=\"Image 2\" width=\"49%\">\n\u003C\u002Fp>\n\n\u003C\u002Fdetails>\n\n \n  \n## 🚀 Updates\n- [x] **\\[2025.09.26\\]** **DEIMv2** is now available with the [project page](https:\u002F\u002Fintellindust-ai-lab.github.io\u002Fprojects\u002FDEIMv2\u002F) and [release code](https:\u002F\u002Fgithub.com\u002FIntellindust-AI-Lab\u002FDEIMv2). The series covers eight model sizes, from **X** down to **Atto**. For the **S, M, L, and X** variants, we leverage DINOv3 features (distilled or pretrained). **DEIMv2** achieves higher performance with fewer parameters and FLOPs.\n- [x] **\\[2025.06.24\\]** DEIMv2 is coming soon: our next-gen detection series, along with three ultra-light variants: Pico (1.5M), Femto (0.96M), and Atto (0.49M), all delivering SoTA performance. 
Atto, in particular, is tailored for mobile devices, achieving 23.8 AP on COCO at 320×320 resolution.\n- [x] **\\[2025.03.12\\]** The Objects365-pretrained [DEIM-D-FINE-X](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1RMNrHh3bYN0FfT5ZlWhXtQxkG23xb2xj\u002Fview?usp=drive_link) model is released, which achieves 59.5% AP after fine-tuning for 24 epochs on COCO.\n- [x] **\\[2025.03.05\\]** The Nano DEIM model is released.\n- [x] **\\[2025.02.27\\]** The DEIM paper is accepted to CVPR 2025. Thanks to all co-authors.\n- [x] **\\[2024.12.26\\]** A more efficient implementation of Dense O2O, achieving nearly a 30% improvement in loading speed (see [the pull request](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM\u002Fpull\u002F13) for more details). Huge thanks to my colleague [Longfei Liu](https:\u002F\u002Fgithub.com\u002Fcapsule2077).\n- [x] **\\[2024.12.03\\]** Release DEIM series. In addition, this repo supports re-implementations of [D-FINE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.13842) and [RT-DETR](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.17140).\n\n## Table of Contents\n* [1. Model Zoo](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM?tab=readme-ov-file#1-model-zoo)\n* [2. Quick start](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM?tab=readme-ov-file#2-quick-start)\n* [3. Usage](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM?tab=readme-ov-file#3-usage)\n* [4. Tools](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM?tab=readme-ov-file#4-tools)\n* [5. Citation](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM?tab=readme-ov-file#5-citation)\n* [6. Acknowledgement](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM?tab=readme-ov-file#6-acknowledgement)\n  \n  \n## 1. 
Model Zoo\n\n### DEIM-D-FINE\n| Model | Dataset | AP\u003Csup>D-FINE\u003C\u002Fsup> | AP\u003Csup>DEIM\u003C\u002Fsup> | #Params | Latency | GFLOPs | config | checkpoint\n| :---: | :---: | :---: | :---: |  :---: | :---: | :---: | :---: | :---: \n**N** | COCO | **42.8** | **43.0** | 4M | 2.12ms | 7 | [yml](.\u002Fconfigs\u002Fdeim_dfine\u002Fdeim_hgnetv2_n_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1ZPEhiU9nhW4M5jLnYOFwTSLQC1Ugf62e\u002Fview?usp=sharing) |\n**S** | COCO | **48.7** | **49.0** | 10M | 3.49ms | 25 | [yml](.\u002Fconfigs\u002Fdeim_dfine\u002Fdeim_hgnetv2_s_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1tB8gVJNrfb6dhFvoHJECKOF5VpkthhfC\u002Fview?usp=drive_link) |\n**M** | COCO | **52.3** | **52.7** | 19M | 5.62ms | 57 | [yml](.\u002Fconfigs\u002Fdeim_dfine\u002Fdeim_hgnetv2_m_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F18Lj2a6UN6k_n_UzqnJyiaiLGpDzQQit8\u002Fview?usp=drive_link) |\n**L** | COCO | **54.0** | **54.7** | 31M | 8.07ms | 91 | [yml](.\u002Fconfigs\u002Fdeim_dfine\u002Fdeim_hgnetv2_l_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1PIRf02XkrA2xAD3wEiKE2FaamZgSGTAr\u002Fview?usp=drive_link) | \n**X** | COCO | **55.8** | **56.5** | 62M | 12.89ms | 202 | [yml](.\u002Fconfigs\u002Fdeim_dfine\u002Fdeim_hgnetv2_x_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1dPtbgtGgq1Oa7k_LgH1GXPelg1IVeu0j\u002Fview?usp=drive_link) | \n\n\n### DEIM-RT-DETRv2\n| Model | Dataset | AP\u003Csup>RT-DETRv2\u003C\u002Fsup> | AP\u003Csup>DEIM\u003C\u002Fsup> | #Params | Latency | GFLOPs | config | checkpoint\n| :---: | :---: | :---: | :---: |  :---: | :---: | :---: | :---: | :---: \n**S** | COCO | **47.9** | **49.0** | 20M | 4.59ms | 60 | [yml](.\u002Fconfigs\u002Fdeim_rtdetrv2\u002Fdeim_r18vd_120e_coco.yml) | 
[ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F153_JKff6EpFgiLKaqkJsoDcLal_0ux_F\u002Fview?usp=drive_link) | \n**M** | COCO | **49.9** | **50.9** | 31M | 6.40ms | 92 | [yml](.\u002Fconfigs\u002Fdeim_rtdetrv2\u002Fdeim_r34vd_120e_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1O9RjZF6kdFWGv1Etn1Toml4r-YfdMDMM\u002Fview?usp=drive_link) | \n**M*** | COCO | **51.9** | **53.2** | 33M | 6.90ms | 100 | [yml](.\u002Fconfigs\u002Fdeim_rtdetrv2\u002Fdeim_r50vd_m_60e_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F10dLuqdBZ6H5ip9BbBiE6S7ZcmHkRbD0E\u002Fview?usp=drive_link) | \n**L** | COCO | **53.4** | **54.3** | 42M | 9.15ms | 136 | [yml](.\u002Fconfigs\u002Fdeim_rtdetrv2\u002Fdeim_r50vd_60e_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1mWknAXD5JYknUQ94WCEvPfXz13jcNOTI\u002Fview?usp=drive_link) | \n**X** | COCO | **54.3** | **55.5** | 76M | 13.66ms | 259 | [yml](.\u002Fconfigs\u002Fdeim_rtdetrv2\u002Fdeim_r101vd_60e_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1BIevZijOcBO17llTyDX32F_pYppBfnzu\u002Fview?usp=drive_link) | \n\n\n## 2. Quick start\n\n### Setup\n\n```shell\nconda create -n deim python=3.11.9\nconda activate deim\npip install -r requirements.txt\n```\n\n\n### Data Preparation\n\n\u003Cdetails>\n\u003Csummary> COCO2017 Dataset \u003C\u002Fsummary>\n\n1. Download COCO2017 from [OpenDataLab](https:\u002F\u002Fopendatalab.com\u002FOpenDataLab\u002FCOCO_2017) or [COCO](https:\u002F\u002Fcocodataset.org\u002F#download).\n1. 
Modify paths in [coco_detection.yml](.\u002Fconfigs\u002Fdataset\u002Fcoco_detection.yml)\n\n    ```yaml\n    train_dataloader:\n        img_folder: \u002Fdata\u002FCOCO2017\u002Ftrain2017\u002F\n        ann_file: \u002Fdata\u002FCOCO2017\u002Fannotations\u002Finstances_train2017.json\n    val_dataloader:\n        img_folder: \u002Fdata\u002FCOCO2017\u002Fval2017\u002F\n        ann_file: \u002Fdata\u002FCOCO2017\u002Fannotations\u002Finstances_val2017.json\n    ```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Custom Dataset\u003C\u002Fsummary>\n\nTo train on your custom dataset, you need to organize it in the COCO format. Follow the steps below to prepare your dataset:\n\n1. **Set `remap_mscoco_category` to `False`:**\n\n    This prevents the automatic remapping of category IDs to match the MSCOCO categories.\n\n    ```yaml\n    remap_mscoco_category: False\n    ```\n\n2. **Organize Images:**\n\n    Structure your dataset directories as follows:\n\n    ```shell\n    dataset\u002F\n    ├── images\u002F\n    │   ├── train\u002F\n    │   │   ├── image1.jpg\n    │   │   ├── image2.jpg\n    │   │   └── ...\n    │   ├── val\u002F\n    │   │   ├── image1.jpg\n    │   │   ├── image2.jpg\n    │   │   └── ...\n    └── annotations\u002F\n        ├── instances_train.json\n        ├── instances_val.json\n        └── ...\n    ```\n\n    - **`images\u002Ftrain\u002F`**: Contains all training images.\n    - **`images\u002Fval\u002F`**: Contains all validation images.\n    - **`annotations\u002F`**: Contains COCO-formatted annotation files.\n\n3. **Convert Annotations to COCO Format:**\n\n    If your annotations are not already in COCO format, you'll need to convert them. 
You can use the following Python script as a reference or utilize existing tools:\n\n    ```python\n    import json\n\n    def convert_to_coco(input_annotations, output_annotations):\n        # Implement conversion logic here\n        pass\n\n    if __name__ == \"__main__\":\n        convert_to_coco('path\u002Fto\u002Fyour_annotations.json', 'dataset\u002Fannotations\u002Finstances_train.json')\n    ```\n\n4. **Update Configuration Files:**\n\n    Modify your [custom_detection.yml](.\u002Fconfigs\u002Fdataset\u002Fcustom_detection.yml).\n\n    ```yaml\n    task: detection\n\n    evaluator:\n      type: CocoEvaluator\n      iou_types: ['bbox', ]\n\n    num_classes: 777 # your dataset classes\n    remap_mscoco_category: False\n\n    train_dataloader:\n      type: DataLoader\n      dataset:\n        type: CocoDetection\n        img_folder: \u002Fdata\u002Fyourdataset\u002Ftrain\n        ann_file: \u002Fdata\u002Fyourdataset\u002Ftrain\u002Ftrain.json\n        return_masks: False\n        transforms:\n          type: Compose\n          ops: ~\n      shuffle: True\n      num_workers: 4\n      drop_last: True\n      collate_fn:\n        type: BatchImageCollateFunction\n\n    val_dataloader:\n      type: DataLoader\n      dataset:\n        type: CocoDetection\n        img_folder: \u002Fdata\u002Fyourdataset\u002Fval\n        ann_file: \u002Fdata\u002Fyourdataset\u002Fval\u002Fann.json\n        return_masks: False\n        transforms:\n          type: Compose\n          ops: ~\n      shuffle: False\n      num_workers: 4\n      drop_last: False\n      collate_fn:\n        type: BatchImageCollateFunction\n    ```\n\n\u003C\u002Fdetails>\n\n\n## 3. Usage\n\u003Cdetails open>\n\u003Csummary> COCO2017 \u003C\u002Fsummary>\n\n1. Training\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml --use-amp --seed=0\n```\n\n\u003C!-- \u003Csummary>2. 
Testing \u003C\u002Fsummary> -->\n2. Testing\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml --test-only -r model.pth\n```\n\n\u003C!-- \u003Csummary>3. Tuning \u003C\u002Fsummary> -->\n3. Tuning\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml --use-amp --seed=0 -t model.pth\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary> Customizing Batch Size \u003C\u002Fsummary>\n\nFor example, if you want to double the total batch size when training D-FINE-L on COCO2017, here are the steps you should follow:\n\n1. **Modify your [dataloader.yml](.\u002Fconfigs\u002Fbase\u002Fdataloader.yml)** to increase the `total_batch_size`:\n\n    ```yaml\n    train_dataloader:\n        total_batch_size: 64  # Previously it was 32, now doubled\n    ```\n\n2. **Modify your [deim_hgnetv2_l_coco.yml](.\u002Fconfigs\u002Fdeim_dfine\u002Fdeim_hgnetv2_l_coco.yml)**. Here’s how the key parameters should be adjusted:\n\n    ```yaml\n    optimizer:\n    type: AdamW\n    params:\n        -\n        params: '^(?=.*backbone)(?!.*norm|bn).*$'\n        lr: 0.000025  # doubled, linear scaling law\n        -\n        params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'\n        weight_decay: 0.\n\n    lr: 0.0005  # doubled, linear scaling law\n    betas: [0.9, 0.999]\n    weight_decay: 0.0001  # need a grid search\n\n    ema:  # added EMA settings\n        decay: 0.9998  # adjusted by 1 - (1 - decay) * 2\n        warmups: 500  # halved\n\n    lr_warmup_scheduler:\n        warmup_duration: 250  # halved\n    ```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary> Customizing Input Size \u003C\u002Fsummary>\n\nIf you'd like to train **DEIM** on COCO2017 with an input size of 320x320, follow these steps:\n\n1. 
**Modify your [dataloader.yml](.\u002Fconfigs\u002Fbase\u002Fdataloader.yml)**:\n\n    ```yaml\n    train_dataloader:\n        dataset:\n            transforms:\n                ops:\n                    - {type: Resize, size: [320, 320], }\n        collate_fn:\n            base_size: 320\n    val_dataloader:\n        dataset:\n            transforms:\n                ops:\n                    - {type: Resize, size: [320, 320], }\n    ```\n\n2. **Modify your [dfine_hgnetv2.yml](.\u002Fconfigs\u002Fbase\u002Fdfine_hgnetv2.yml)**:\n\n    ```yaml\n    eval_spatial_size: [320, 320]\n    ```\n\n\u003C\u002Fdetails>\n\n## 4. Tools\n\u003Cdetails>\n\u003Csummary> Deployment \u003C\u002Fsummary>\n\n\u003C!-- \u003Csummary>4. Export onnx \u003C\u002Fsummary> -->\n1. Setup\n```shell\npip install onnx onnxsim\n```\n\n2. Export onnx\n```shell\npython tools\u002Fdeployment\u002Fexport_onnx.py --check -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml -r model.pth\n```\n\n3. Export [tensorrt](https:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Ftensorrt\u002Finstall-guide\u002Findex.html)\n```shell\ntrtexec --onnx=\"model.onnx\" --saveEngine=\"model.engine\" --fp16\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary> Inference (Visualization) \u003C\u002Fsummary>\n\n\n1. Setup\n```shell\npip install -r tools\u002Finference\u002Frequirements.txt\n```\n\n\n\u003C!-- \u003Csummary>5. Inference \u003C\u002Fsummary> -->\n2. Inference (onnxruntime \u002F tensorrt \u002F torch)\n\nInference on images and videos is now supported.\n```shell\npython tools\u002Finference\u002Fonnx_inf.py --onnx model.onnx --input image.jpg  # video.mp4\npython tools\u002Finference\u002Ftrt_inf.py --trt model.engine --input image.jpg\npython tools\u002Finference\u002Ftorch_inf.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml -r model.pth --input image.jpg --device cuda:0\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary> Benchmark \u003C\u002Fsummary>\n\n1. 
Setup\n```shell\npip install -r tools\u002Fbenchmark\u002Frequirements.txt\n```\n\n\u003C!-- \u003Csummary>6. Benchmark \u003C\u002Fsummary> -->\n2. Model FLOPs, MACs, and Params\n```shell\npython tools\u002Fbenchmark\u002Fget_info.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml\n```\n\n3. TensorRT Latency\n```shell\npython tools\u002Fbenchmark\u002Ftrt_benchmark.py --COCO_dir path\u002Fto\u002FCOCO2017 --engine_dir model.engine\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary> Fiftyone Visualization  \u003C\u002Fsummary>\n\n1. Setup\n```shell\npip install fiftyone\n```\n2. Voxel51 Fiftyone Visualization ([fiftyone](https:\u002F\u002Fgithub.com\u002Fvoxel51\u002Ffiftyone))\n```shell\npython tools\u002Fvisualization\u002Ffiftyone_vis.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml -r model.pth\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary> Others \u003C\u002Fsummary>\n\n1. Auto Resume Training\n```shell\nbash reference\u002Fsafe_training.sh\n```\n\n2. Converting Model Weights\n```shell\npython reference\u002Fconvert_weight.py model.pth\n```\n\u003C\u002Fdetails>\n\n\n## 5. Citation\nIf you use `DEIM` or its methods in your work, please cite the following BibTeX entry:\n\u003Cdetails open>\n\u003Csummary> bibtex \u003C\u002Fsummary>\n\n```latex\n@inproceedings{huang2024deim,\n      title={DEIM: DETR with Improved Matching for Fast Convergence},\n      author={Huang, Shihua and Lu, Zhichao and Cun, Xiaodong and Yu, Yongjun and Zhou, Xiao and Shen, Xi},\n      booktitle={Proceedings of the IEEE\u002FCVF Conference on Computer Vision and Pattern Recognition},\n      year={2025},\n}\n```\n\u003C\u002Fdetails>\n\n## 6. Acknowledgement\nOur work is built upon [D-FINE](https:\u002F\u002Fgithub.com\u002FPeterande\u002FD-FINE) and [RT-DETR](https:\u002F\u002Fgithub.com\u002Flyuwenyu\u002FRT-DETR).\n\n✨ Feel free to contribute and reach out if you have any questions! 
✨\n","\u003Ch2 align=\"center\">\n  DEIM：具有改进匹配机制的DETR，实现快速收敛\n\u003C\u002Fh2>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM\u002Fblob\u002Fmaster\u002FLICENSE\">\n        \u003Cimg alt=\"license\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLICENSE-Apache%202.0-blue\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.04234\">\n        \u003Cimg alt=\"arXiv\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2412.04234-red\">\n    \u003C\u002Fa>\n   \u003Ca href=\"https:\u002F\u002Fwww.shihuahuang.cn\u002FDEIM\u002F\">\n        \u003Cimg alt=\"project webpage\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebpage-DEIM-purple\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM\u002Fpulls\">\n        \u003Cimg alt=\"prs\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues-pr\u002FShihuaHuang95\u002FDEIM\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM\u002Fissues\">\n        \u003Cimg alt=\"issues\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002FShihuaHuang95\u002FDEIM?color=olive\">\n    \u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM\">\n        \u003Cimg alt=\"stars\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FShihuaHuang95\u002FDEIM\">\n    \u003C\u002Fa>\n    \u003Ca href=\"mailto:shihuahuang95@gmail.com\">\n        \u003Cimg alt=\"Contact Us\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FContact-Email-yellow\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\" style=\"font-size: 2.0em; font-weight: bold;\">\n    🎉 \u003Cstrong> \u003Ca href=\"https:\u002F\u002Fintellindust-ai-lab.github.io\u002Fprojects\u002FEdgeCrafter\u002F\" style=\"color: #d9534f; text-decoration: none;\">EdgeCrafter\u003C\u002Fa> 
已发布，在目标检测、姿态估计以及实例分割任务上均达到SOTA性能。\u003C\u002Fstrong>🎉\n\u003C\u002Fp>\n\u003Cp align=\"center\" style=\"font-size: 2.0em; font-weight: bold;\">\n    🎉 \u003Cstrong>我们很高兴地分享 \u003Ca href=\"https:\u002F\u002Fintellindust-ai-lab.github.io\u002Fprojects\u002FDEIMv2\u002F\" style=\"color: #d9534f; text-decoration: none;\">DEIMv2\u003C\u002Fa> \u003C\u002Fstrong>🎉\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n    DEIM是一个先进的训练框架，旨在优化DETR中的匹配机制，从而实现更快的收敛速度和更高的精度。它为实时目标检测领域的未来研究与应用提供了坚实的基础。\n\u003C\u002Fp>\n\n---\n\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"http:\u002F\u002Fwww.shihuahuang.cn\">Shihua Huang\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>,\n  \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=tIFWBcQAAAAJ&hl=en\">Zhichao Lu\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>,\n  \u003Ca href=\"https:\u002F\u002Fvinthony.github.io\u002Facademic\u002F\">Xiaodong Cun\u003C\u002Fa>\u003Csup>3\u003C\u002Fsup>,\n  Yongjun Yu\u003Csup>1\u003C\u002Fsup>,\n  Xiao Zhou\u003Csup>4\u003C\u002Fsup>, \n  \u003Ca href=\"https:\u002F\u002Fxishen0220.github.io\">Xi Shen\u003C\u002Fa>\u003Csup>1*\u003C\u002Fsup>\n\u003C\u002Fdiv>\n\n  \n\u003Cp align=\"center\">\n\u003Ci>\n1. Intellindust AI Lab &nbsp; 2. 香港城市大学 &nbsp; 3. 大湾区大学 &nbsp; 4. 
合肥师范学院\n\u003C\u002Fi>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  **📧 通讯作者:** \u003Ca href=\"mailto:shenxiluc@gmail.com\">shenxiluc@gmail.com\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Freal-time-object-detection-on-coco?p=deim-detr-with-improved-matching-for-fast\">\n    \u003Cimg alt=\"sota\" src=\"https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fdeim-detr-with-improved-matching-for-fast\u002Freal-time-object-detection-on-coco\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n\u003Cstrong>如果您喜欢我们的工作，请给我们一颗星！\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FIntellindust-AI-Lab_DEIM_readme_b8143637c259.png\" alt=\"Image 1\" width=\"49%\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FIntellindust-AI-Lab_DEIM_readme_bdfc7b06a1bb.png\" alt=\"Image 2\" width=\"49%\">\n\u003C\u002Fp>\n\n\u003C\u002Fdetails>\n\n \n  \n## 🚀 更新\n- [x] **\\[2025.09.26\\]** **DEIMv2** 现已上线，附有[项目页面](https:\u002F\u002Fintellindust-ai-lab.github.io\u002Fprojects\u002FDEIMv2\u002F)和[发布代码](https:\u002F\u002Fgithub.com\u002FIntellindust-AI-Lab\u002FDEIMv2)。该系列包含八种模型尺寸，从**X**到**Atto**。对于**S、M、L和X**版本，我们采用了DINOv3特征（蒸馏或预训练）。**DEIMv2**在参数量和FLOPs更少的情况下实现了更高的性能。\n- [x] **\\[2025.06.24\\]** DEIMv2 即将发布：我们的下一代检测系列，同时推出三种超轻量级变体：Pico（1.5M）、Femto（0.96M）和Atto（0.49M），均达到SOTA性能。其中，Atto特别针对移动设备设计，在320×320分辨率下于COCO数据集上达到23.8 AP。\n- [x] **\\[2025.03.12\\]** Object365预训练模型[DEIM-D-FINE-X](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1RMNrHh3bYN0FfT5ZlWhXtQxkG23xb2xj\u002Fview?usp=drive_link)发布，微调24个COCO周期后，AP达到59.5%。\n- [x] **\\[2025.03.05\\]** Nano DEIM模型发布。\n- [x] **\\[2025.02.27\\]** DEIM论文已被CVPR 2025接收。感谢所有合作者。\n- [x] **\\[2024.12.26\\]** 
对密集O2O进行了更高效的实现，加载速度提升了近30%（详情请参阅[拉取请求](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM\u002Fpull\u002F13)）。非常感谢我的同事[Longfei Liu](https:\u002F\u002Fgithub.com\u002Fcapsule2077)。\n- [x] **\\[2024.12.03\\]** 发布DEIM系列。此外，本仓库还支持[D-FINE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.13842)和[RT-DETR](https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.17140)的重新实现。\n\n## 目录\n* [1. 模型库](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM?tab=readme-ov-file#1-model-zoo)\n* [2. 快速入门](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM?tab=readme-ov-file#2-quick-start)\n* [3. 使用方法](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM?tab=readme-ov-file#3-usage)\n* [4. 工具](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM?tab=readme-ov-file#4-tools)\n* [5. 引用](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM?tab=readme-ov-file#5-citation)\n* [6. 致谢](https:\u002F\u002Fgithub.com\u002FShihuaHuang95\u002FDEIM?tab=readme-ov-file#6-acknowledgement)\n  \n  \n## 1. 模型库\n\n### DEIM-D-FINE\n| 模型 | 数据集 | AP\u003Csup>D-FINE\u003C\u002Fsup> | AP\u003Csup>DEIM\u003C\u002Fsup> | 参数量 | 延迟 | GFLOPs | 配置文件 | 检查点 |\n| :---: | :---: | :---: | :---: |  :---: | :---: | :---: | :---: | :---: \n**N** | COCO | **42.8** | **43.0** | 4M | 2.12ms | 7 | [yml](.\u002Fconfigs\u002Fdeim_dfine\u002Fdeim_hgnetv2_n_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1ZPEhiU9nhW4M5jLnYOFwTSLQC1Ugf62e\u002Fview?usp=sharing) |\n**S** | COCO | **48.7** | **49.0** | 10M | 3.49ms | 25 | [yml](.\u002Fconfigs\u002Fdeim_dfine\u002Fdeim_hgnetv2_s_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1tB8gVJNrfb6dhFvoHJECKOF5VpkthhfC\u002Fview?usp=drive_link) |\n**M** | COCO | **52.3** | **52.7** | 19M | 5.62ms | 57 | [yml](.\u002Fconfigs\u002Fdeim_dfine\u002Fdeim_hgnetv2_m_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F18Lj2a6UN6k_n_UzqnJyiaiLGpDzQQit8\u002Fview?usp=drive_link) |\n**L** | COCO | **54.0** | 
**54.7** | 31M | 8.07ms | 91 | [yml](.\u002Fconfigs\u002Fdeim_dfine\u002Fdeim_hgnetv2_l_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1PIRf02XkrA2xAD3wEiKE2FaamZgSGTAr\u002Fview?usp=drive_link) | \n**X** | COCO | **55.8** | **56.5** | 62M | 12.89ms | 202 | [yml](.\u002Fconfigs\u002Fdeim_dfine\u002Fdeim_hgnetv2_x_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1dPtbgtGgq1Oa7k_LgH1GXPelg1IVeu0j\u002Fview?usp=drive_link) |\n\n### DEIM-RT-DETRv2\n| 模型 | 数据集 | AP\u003Csup>RT-DETRv2\u003C\u002Fsup> | AP\u003Csup>DEIM\u003C\u002Fsup> | 参数量 | 延迟 | GFLOPs | 配置文件 | 检查点 |\n| :---: | :---: | :---: | :---: |  :---: | :---: | :---: | :---: | :---: |\n**S** | COCO | **47.9** | **49.0** | 20M | 4.59ms | 60 | [yml](.\u002Fconfigs\u002Fdeim_rtdetrv2\u002Fdeim_r18vd_120e_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F153_JKff6EpFgiLKaqkJsoDcLal_0ux_F\u002Fview?usp=drive_link) | \n**M** | COCO | **49.9** | **50.9** | 31M | 6.40ms | 92 | [yml](.\u002Fconfigs\u002Fdeim_rtdetrv2\u002Fdeim_r34vd_120e_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1O9RjZF6kdFWGv1Etn1Toml4r-YfdMDMM\u002Fview?usp=drive_link) | \n**M*** | COCO | **51.9** | **53.2** | 33M | 6.90ms | 100 | [yml](.\u002Fconfigs\u002Fdeim_rtdetrv2\u002Fdeim_r50vd_m_60e_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F10dLuqdBZ6H5ip9BbBiE6S7ZcmHkRbD0E\u002Fview?usp=drive_link) | \n**L** | COCO | **53.4** | **54.3** | 42M | 9.15ms | 136 | [yml](.\u002Fconfigs\u002Fdeim_rtdetrv2\u002Fdeim_r50vd_60e_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1mWknAXD5JYknUQ94WCEvPfXz13jcNOTI\u002Fview?usp=drive_link) | \n**X** | COCO | **54.3** | **55.5** | 76M | 13.66ms | 259 | [yml](.\u002Fconfigs\u002Fdeim_rtdetrv2\u002Fdeim_r101vd_60e_coco.yml) | [ckpt](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1BIevZijOcBO17llTyDX32F_pYppBfnzu\u002Fview?usp=drive_link) | 
\n\n\n## 2. 快速入门\n\n### 环境搭建\n\n```shell\nconda create -n deim python=3.11.9\nconda activate deim\npip install -r requirements.txt\n```\n\n\n### 数据准备\n\n\u003Cdetails>\n\u003Csummary> COCO2017 数据集 \u003C\u002Fsummary>\n\n1. 从 [OpenDataLab](https:\u002F\u002Fopendatalab.com\u002FOpenDataLab\u002FCOCO_2017) 或 [COCO](https:\u002F\u002Fcocodataset.org\u002F#download) 下载 COCO2017 数据集。\n1. 修改 [coco_detection.yml](.\u002Fconfigs\u002Fdataset\u002Fcoco_detection.yml) 中的路径：\n\n    ```yaml\n    train_dataloader:\n        img_folder: \u002Fdata\u002FCOCO2017\u002Ftrain2017\u002F\n        ann_file: \u002Fdata\u002FCOCO2017\u002Fannotations\u002Finstances_train2017.json\n    val_dataloader:\n        img_folder: \u002Fdata\u002FCOCO2017\u002Fval2017\u002F\n        ann_file: \u002Fdata\u002FCOCO2017\u002Fannotations\u002Finstances_val2017.json\n    ```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>自定义数据集\u003C\u002Fsummary>\n\n要在你的自定义数据集上进行训练，你需要将其组织成 COCO 格式。请按照以下步骤准备你的数据集：\n\n1. **将 `remap_mscoco_category` 设置为 `False`:**\n\n    这可以防止类别 ID 自动重新映射以匹配 MSCOCO 类别。\n\n    ```yaml\n    remap_mscoco_category: False\n    ```\n\n2. **组织图像:**\n\n    按照以下目录结构组织你的数据集：\n\n    ```shell\n    dataset\u002F\n    ├── images\u002F\n    │   ├── train\u002F\n    │   │   ├── image1.jpg\n    │   │   ├── image2.jpg\n    │   │   └── ...\n    │   ├── val\u002F\n    │   │   ├── image1.jpg\n    │   │   ├── image2.jpg\n    │   │   └── ...\n    └── annotations\u002F\n        ├── instances_train.json\n        ├── instances_val.json\n        └── ...\n    ```\n\n    - **`images\u002Ftrain\u002F`**: 包含所有训练图像。\n    - **`images\u002Fval\u002F`**: 包含所有验证图像。\n    - **`annotations\u002F`**: 包含 COCO 格式的标注文件。\n\n3. 
**将标注转换为 COCO 格式:**\n\n    如果你的标注还不是 COCO 格式，你需要将其转换。你可以使用以下 Python 脚本作为参考，或者利用现有的工具：\n\n    ```python\n    import json\n\n    def convert_to_coco(input_annotations, output_annotations):\n        # 在这里实现转换逻辑\n        pass\n\n    if __name__ == \"__main__\":\n        convert_to_coco('path\u002Fto\u002Fyour_annotations.json', 'dataset\u002Fannotations\u002Finstances_train.json')\n    ```\n\n4. **更新配置文件:**\n\n    修改你的 [custom_detection.yml](.\u002Fconfigs\u002Fdataset\u002Fcustom_detection.yml)。\n\n    ```yaml\n    task: detection\n\n    evaluator:\n      type: CocoEvaluator\n      iou_types: ['bbox', ]\n\n    num_classes: 777 # 你的数据集类别数\n    remap_mscoco_category: False\n\n    train_dataloader:\n      type: DataLoader\n      dataset:\n        type: CocoDetection\n        img_folder: \u002Fdata\u002Fyourdataset\u002Ftrain\n        ann_file: \u002Fdata\u002Fyourdataset\u002Ftrain\u002Ftrain.json\n        return_masks: False\n        transforms:\n          type: Compose\n          ops: ~\n      shuffle: True\n      num_workers: 4\n      drop_last: True\n      collate_fn:\n        type: BatchImageCollateFunction\n\n    val_dataloader:\n      type: DataLoader\n      dataset:\n        type: CocoDetection\n        img_folder: \u002Fdata\u002Fyourdataset\u002Fval\n        ann_file: \u002Fdata\u002Fyourdataset\u002Fval\u002Fann.json\n        return_masks: False\n        transforms:\n          type: Compose\n          ops: ~\n      shuffle: False\n      num_workers: 4\n      drop_last: False\n      collate_fn:\n        type: BatchImageCollateFunction\n    ```\n\n\u003C\u002Fdetails>\n\n## 3. 使用方法\n\u003Cdetails open>\n\u003Csummary> COCO2017 \u003C\u002Fsummary>\n\n1. 训练\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml --use-amp --seed=0\n```\n\n\u003C!-- \u003Csummary>2. 测试 \u003C\u002Fsummary> -->\n2. 
测试\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml --test-only -r model.pth\n```\n\n\u003C!-- \u003Csummary>3. 微调 \u003C\u002Fsummary> -->\n3. 微调\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml --use-amp --seed=0 -t model.pth\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary> 自定义批量大小 \u003C\u002Fsummary>\n\n例如，如果您希望在 COCO2017 数据集上训练 DEIM-D-FINE-L 时将总批量大小增加一倍，可以按照以下步骤操作：\n\n1. **修改 [dataloader.yml](.\u002Fconfigs\u002Fbase\u002Fdataloader.yml)**，将 `total_batch_size` 增大：\n\n    ```yaml\n    train_dataloader:\n        total_batch_size: 64  # 原本是 32，现在加倍\n    ```\n\n2. **修改 [deim_hgnetv2_l_coco.yml](.\u002Fconfigs\u002Fdeim_dfine\u002Fdeim_hgnetv2_l_coco.yml)**。以下是关键参数的调整方式：\n\n    ```yaml\n    optimizer:\n        type: AdamW\n        params:\n            -\n                params: '^(?=.*backbone)(?!.*norm|bn).*$'\n                lr: 0.000025  # 加倍，遵循线性缩放法则\n            -\n                params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'\n                weight_decay: 0.\n\n        lr: 0.0005  # 加倍，遵循线性缩放法则\n        betas: [0.9, 0.999]\n        weight_decay: 0.0001  # 需要进行网格搜索\n\n    ema:  # 添加 EMA 设置\n        decay: 0.9998  # 根据公式 1 - (1 - decay) * 2 进行调整\n        warmups: 500  # 减半\n\n    lr_warmup_scheduler:\n        warmup_duration: 250  # 减半\n    ```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary> 自定义输入尺寸 \u003C\u002Fsummary>\n\n如果您希望在 COCO2017 数据集上以 320x320 的输入尺寸训练 **DEIM**，请按照以下步骤操作：\n\n1. **修改 [dataloader.yml](.\u002Fconfigs\u002Fbase\u002Fdataloader.yml)**：\n\n    ```yaml\n    train_dataloader:\n        dataset:\n            transforms:\n                ops:\n                    - {type: Resize, size: [320, 320], }\n        collate_fn:\n            base_size: 320\n\n    val_dataloader:\n        dataset:\n            transforms:\n                ops:\n                    - {type: Resize, size: [320, 320], }\n    ```\n\n2. 
**修改 [dfine_hgnetv2.yml](.\u002Fconfigs\u002Fbase\u002Fdfine_hgnetv2.yml)**：\n\n    ```yaml\n    eval_spatial_size: [320, 320]\n    ```\n\n\u003C\u002Fdetails>\n\n## 4. 工具\n\u003Cdetails>\n\u003Csummary> 部署 \u003C\u002Fsummary>\n\n\u003C!-- \u003Csummary>4. 导出 onnx \u003C\u002Fsummary> -->\n1. 环境准备\n```shell\npip install onnx onnxsim\n```\n\n2. 导出 ONNX\n```shell\npython tools\u002Fdeployment\u002Fexport_onnx.py --check -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml -r model.pth\n```\n\n3. 导出 [TensorRT](https:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Ftensorrt\u002Finstall-guide\u002Findex.html)\n```shell\ntrtexec --onnx=\"model.onnx\" --saveEngine=\"model.engine\" --fp16\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary> 推理（可视化） \u003C\u002Fsummary>\n\n\n1. 环境准备\n```shell\npip install -r tools\u002Finference\u002Frequirements.txt\n```\n\n\n\u003C!-- \u003Csummary>5. 推理 \u003C\u002Fsummary> -->\n2. 推理（onnxruntime \u002F tensorrt \u002F torch）\n\n现已支持对图像和视频进行推理。\n```shell\npython tools\u002Finference\u002Fonnx_inf.py --onnx model.onnx --input image.jpg  # 或 video.mp4\npython tools\u002Finference\u002Ftrt_inf.py --trt model.engine --input image.jpg\npython tools\u002Finference\u002Ftorch_inf.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml -r model.pth --input image.jpg --device cuda:0\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary> 基准测试 \u003C\u002Fsummary>\n\n1. 环境准备\n```shell\npip install -r tools\u002Fbenchmark\u002Frequirements.txt\n```\n\n\u003C!-- \u003Csummary>6. 基准测试 \u003C\u002Fsummary> -->\n2. 模型 FLOPs、MACs 和参数量\n```shell\npython tools\u002Fbenchmark\u002Fget_info.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml\n```\n\n3. TensorRT 延迟\n```shell\npython tools\u002Fbenchmark\u002Ftrt_benchmark.py --COCO_dir path\u002Fto\u002FCOCO2017 --engine_dir model.engine\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary> Fiftyone 可视化 \u003C\u002Fsummary>\n\n1. 
环境准备\n```shell\npip install fiftyone\n```\n2. Voxel51 Fiftyone 可视化（[fiftyone](https:\u002F\u002Fgithub.com\u002Fvoxel51\u002Ffiftyone)）\n```shell\npython tools\u002Fvisualization\u002Ffiftyone_vis.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml -r model.pth\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary> 其他 \u003C\u002Fsummary>\n\n1. 自动恢复训练\n```shell\nbash reference\u002Fsafe_training.sh\n```\n\n2. 模型权重转换\n```shell\npython reference\u002Fconvert_weight.py model.pth\n```\n\u003C\u002Fdetails>\n\n\n## 5. 引用\n如果您在工作中使用了 `DEIM` 或其相关方法，请引用以下 BibTeX 条目：\n\u003Cdetails open>\n\u003Csummary> bibtex \u003C\u002Fsummary>\n\n```latex\n@inproceedings{huang2024deim,\n      title={DEIM: DETR with Improved Matching for Fast Convergence},\n      author={Huang, Shihua and Lu, Zhichao and Cun, Xiaodong and Yu, Yongjun and Zhou, Xiao and Shen, Xi},\n      booktitle={Proceedings of the IEEE\u002FCVF Conference on Computer Vision and Pattern Recognition},\n      year={2025},\n}\n```\n\u003C\u002Fdetails>\n\n## 6. 致谢\n我们的工作基于 [D-FINE](https:\u002F\u002Fgithub.com\u002FPeterande\u002FD-FINE) 和 [RT-DETR](https:\u002F\u002Fgithub.com\u002Flyuwenyu\u002FRT-DETR)。\n\n✨ 欢迎贡献代码，如有任何问题，欢迎随时联系！ ✨","# DEIM 快速上手指南\n\nDEIM (DETR with Improved Matching) 是一个先进的目标检测训练框架，旨在优化 DETR 系列模型的匹配机制，实现更快的收敛速度和更高的检测精度。本指南将帮助您快速搭建环境并运行模型。\n\n## 1. 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n- **操作系统**: Linux (推荐 Ubuntu 18.04+)\n- **Python**: 3.11.9 (README 环境搭建所用版本)\n- **GPU**: 支持 CUDA 的 NVIDIA 显卡\n- **依赖库**: PyTorch, torchvision 等 (将通过 requirements.txt 自动安装)\n\n## 2. 安装步骤\n\n### 2.1 创建虚拟环境\n推荐使用 Conda 创建独立的 Python 环境以避免依赖冲突：\n\n```shell\nconda create -n deim python=3.11.9\nconda activate deim\n```\n\n### 2.2 安装依赖\n克隆仓库后，进入目录并安装所需依赖。国内用户建议使用清华或阿里镜像源加速安装：\n\n```shell\n# 可选：配置 pip 国内镜像源\npip config set global.index-url https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n\n# 安装项目依赖\npip install -r requirements.txt\n```\n\n## 3. 数据准备\n\n### 3.1 COCO2017 数据集（默认）\n1. 
从 [OpenDataLab](https:\u002F\u002Fopendatalab.com\u002FOpenDataLab\u002FCOCO_2017) 或 [COCO 官网](https:\u002F\u002Fcocodataset.org\u002F#download) 下载 COCO2017 数据集。\n2. 修改配置文件 `configs\u002Fdataset\u002Fcoco_detection.yml` 中的路径：\n\n```yaml\ntrain_dataloader:\n    img_folder: \u002Fdata\u002FCOCO2017\u002Ftrain2017\u002F\n    ann_file: \u002Fdata\u002FCOCO2017\u002Fannotations\u002Finstances_train2017.json\nval_dataloader:\n    img_folder: \u002Fdata\u002FCOCO2017\u002Fval2017\u002F\n    ann_file: \u002Fdata\u002FCOCO2017\u002Fannotations\u002Finstances_val2017.json\n```\n\n### 3.2 自定义数据集\n若使用自定义数据，需将其整理为 COCO 格式：\n1. 目录结构示例：\n   ```text\n   dataset\u002F\n   ├── images\u002F\n   │   ├── train\u002F\n   │   └── val\u002F\n   └── annotations\u002F\n       ├── instances_train.json\n       └── instances_val.json\n   ```\n2. 在对应的 `.yml` 配置文件中设置 `remap_mscoco_category: False` 并更新数据路径。\n\n## 4. 基本使用\n\n以下命令以 `deim_hgnetv2_s_coco.yml` (Small 模型) 为例，`${model}` 可替换为 `n`, `s`, `m`, `l`, `x`。\n\n### 4.1 训练模型\n使用多卡训练（例如 4 张 GPU），开启混合精度加速 (`--use-amp`)：\n\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml --use-amp --seed=0\n```\n\n### 4.2 评估模型\n加载预训练权重进行测试：\n\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml --test-only -r model.pth\n```\n\n### 4.3 微调模型\n基于已有权重进行微调：\n\n```shell\nCUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs\u002Fdeim_dfine\u002Fdeim_hgnetv2_${model}_coco.yml --use-amp --seed=0 -t model.pth\n```\n\n> **提示**: 更多模型权重和配置文件请参考项目 `Model Zoo` 章节或直接访问官方 GitHub 仓库获取最新链接。","某自动驾驶初创团队正在研发车载实时行人检测系统，需要在有限算力下快速迭代高精度模型以应对复杂路况。\n\n### 没有 DEIM 时\n- **训练周期漫长**：传统 DETR 架构收敛极慢，往往需要数百个 Epoch 才能稳定，导致算法工程师每晚只能验证一轮实验，严重拖慢研发节奏。\n- **小目标漏检率高**：在密集人流场景中，模型难以精准匹配预测框与真实目标，导致远处的行人或小尺寸物体频繁漏检。\n- 
**调参成本高昂**：为了提升收敛速度，团队需花费大量时间手动调整学习率策略和匹配阈值，且效果往往不尽如人意。\n- **部署落地困难**：由于收敛慢，团队不敢轻易尝试更大规模的骨干网络，只能在精度和速度之间被迫妥协，难以满足车规级安全标准。\n\n### 使用 DEIM 后\n- **收敛速度飞跃**：DEIM 通过改进的匹配机制，将模型收敛所需的训练轮次大幅减少，原本需要 3 天的训练任务现在仅需数小时即可完成。\n- **匹配精度显著提升**：改进的二分图匹配策略让模型在处理密集遮挡和小目标时更加敏锐，行人检测的召回率在极端场景下提升了 15%。\n- **研发流程自动化**：不再依赖繁琐的人工调参，DEIM 让模型在不同配置下均能快速稳定收敛，团队可将精力集中于数据清洗和场景泛化研究。\n- **模型迭代更灵活**：得益于快速收敛特性，团队能够轻松尝试更多轻量化变体（如 DEIMv2 的 Atto 版本），在保持高精度的同时成功部署至嵌入式芯片。\n\nDEIM 通过重构匹配机制，将原本漫长的模型训练过程转化为高效的迭代循环，让实时检测系统的研发从“等待收敛”转变为“即时验证”。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FIntellindust-AI-Lab_DEIM_b8143637.png","Intellindust-AI-Lab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FIntellindust-AI-Lab_387d9b45.png","INTELLINDUST INFORMATION TECHNOLOGY (SHENZHEN) CO., LTD",null,"shenxiluc@gmail.com","https:\u002F\u002Fintellindust-ai-lab.github.io\u002F","https:\u002F\u002Fgithub.com\u002FIntellindust-AI-Lab",[81,85],{"name":82,"color":83,"percentage":84},"Python","#3572A5",99.3,{"name":86,"color":87,"percentage":88},"Shell","#89e051",0.7,1475,192,"2026-04-13T03:47:56","NOASSERTION","Linux","必需 NVIDIA GPU（训练命令使用 CUDA_VISIBLE_DEVICES），具体型号和显存未说明，需支持 CUDA","未说明",{"notes":97,"python":98,"dependencies":99},"建议使用 conda 创建名为'deim'的虚拟环境。训练脚本使用 torchrun 进行多卡分布式训练。数据集需准备为 COCO 格式。具体依赖库版本需查看项目根目录下的 requirements.txt 文件（README 中未直接列出）。","3.11.9",[100,101],"torch","requirements.txt 中列出的其他依赖",[15],[104,105,106,107,108],"detr","mal","object-detection","dense-o2o","real-time-detector","2026-03-27T02:49:30.150509","2026-04-13T17:41:35.861247",[112,117,122,127,132,136],{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},31806,"为什么 D-FINE 看起来比 DEIM-D-FINE 收敛得更快？","DEIM 中的 Dense O2O 机制应用了更强的数据增强，这会导致验证集结果在训练初期低于标准 D-FINE。所谓的“快速收敛”是指在更少的 epoch 内达到相当或更好的性能。如果您的数据集在 40-60 个 epoch 后验证结果不再变化，建议减少总训练 epoch 
数并重新对比。此外，请检查学习率调度器（lr_scheduler）、预热步数（warmup_iter）等参数设置是否与官方配置一致。","https:\u002F\u002Fgithub.com\u002FIntellindust-AI-Lab\u002FDEIM\u002Fissues\u002F5",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},31807,"在 COCO 数据集上训练需要多长时间？如何快速达到不错的效果？","如果目标是复现论文结果，请严格使用官方提供的配置文件（configs）。如果仅需达到约 52 COCO AP 的性能，通常只需训练 24 个 epoch，耗时约一天。关于显存和批次大小（Batchsize），如果不涉及严格的学术对比实验，建议将 Batchsize 设置为填满 GPU 显存的最大值以加速训练。","https:\u002F\u002Fgithub.com\u002FIntellindust-AI-Lab\u002FDEIM\u002Fissues\u002F100",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},31808,"训练时报错 'module torchvision... has no attribute get_spatial_size' 或分布式训练崩溃怎么办？","这是由于新版 torchvision 移除了 `get_spatial_size` API 导致的兼容性问题。有两种解决方案：\n1. **降级版本（推荐）**：将环境降级为 `torchvision==0.15.2` 和 `numpy==1.26.0`。\n2. **修改代码**：如果使用新版 torchvision，请将代码中（通常在 `engine\u002Fdata\u002Ftransforms` 目录下的 mosaic 和 _transform.py 文件中）所有的 `F.get_spatial_size` 替换为 `F.get_size`。","https:\u002F\u002Fgithub.com\u002FIntellindust-AI-Lab\u002FDEIM\u002Fissues\u002F6",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},31809,"导出 ONNX 模型时提示 'No module named src' 错误如何解决？","该问题是由于工具脚本中的路径索引问题导致的。维护者已修复了 `tools` 目录下的相关文件。请确保您拉取的是最新版本的代码。如果问题依旧，请检查运行脚本的目录结构，确保 `src` 模块能被正确识别，或者尝试从项目根目录运行导出命令。","https:\u002F\u002Fgithub.com\u002FIntellindust-AI-Lab\u002FDEIM\u002Fissues\u002F4",{"id":133,"question_zh":134,"answer_zh":135,"source_url":131},31810,"导出 ONNX 模型时内存占用过高（如超过 30GB）导致进程被杀死怎么办？","PyTorch 转 ONNX 过程中需要执行前向传播以追踪操作并存储中间张量，同时构建计算图会消耗大量内存，这是正常现象。解决方法包括：\n1. 增加系统物理内存或使用交换空间（Swap）。\n2. 尝试减小导出时的输入图像分辨率（如果模型支持动态形状）。\n3. 确保使用最新的代码版本，维护者可能已针对内存优化进行了修复。",{"id":137,"question_zh":138,"answer_zh":139,"source_url":140},31811,"推荐使用什么版本的 Torchvision 以避免兼容性错误？","目前最稳定的环境配置是使用 `torchvision==0.15`。较新版本（如 0.17+）的 torchvision 更改了部分 API（例如移除了 `get_spatial_size`），可能导致代码报错。虽然支持新版本的计划已在开发路线图中，但在正式更新前，建议使用 0.15 版本以确保顺利运行。","https:\u002F\u002Fgithub.com\u002FIntellindust-AI-Lab\u002FDEIM\u002Fissues\u002F2",[]]