[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-rwightman--efficientdet-pytorch":3,"tool-rwightman--efficientdet-pytorch":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",157379,2,"2026-04-15T23:32:42",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":77,"owner_email":76,"owner_twitter":78,"owner_website":79,"owner_url":80,"languages":81,"stars":90,"forks":91,"last_commit_at":92,"license":93,"difficulty_score":10,"env_os":94,"env_gpu":95,"env_ram":96,"env_deps":97,"category_tags":105,"github_topics":106,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":112,"updated_at":113,"faqs":114,"releases":143},8078,"rwightman\u002Fefficientdet-pytorch","efficientdet-pytorch","A PyTorch impl of EfficientDet faithful to the original Google impl w\u002F ported weights","efficientdet-pytorch 是一个基于 PyTorch 框架实现的 EfficientDet 目标检测工具，旨在高度还原谷歌官方 TensorFlow 版本的效果，并提供了经过迁移验证的预训练权重。它主要解决了开发者在 PyTorch 生态中难以复现原版高效检测模型性能的问题，让用户无需依赖 TensorFlow 即可利用先进的 EfficientDet 架构进行高精度的物体识别。\n\n这款工具非常适合计算机视觉领域的研究人员、算法工程师以及希望快速部署高性能检测模型的开发者使用。其核心亮点在于极高的灵活性与可配置性：不仅支持自由切换 BiFPN 连接模式和卷积类型，还允许用户通过参数轻松替换激活函数与归一化层。更独特的是，它能无缝集成 `timm` 库中任何支持特征提取的主干网络（如 
EfficientNetV2），极大地便利了模型改进实验。此外，项目持续更新，已适配 PyTorch 2.0 编译加速及自适应梯度裁剪（AGC）等先进技术，在保证精度的同时显著提升了训练效率与推理速度。","# EfficientDet (PyTorch)\n\nA PyTorch implementation of EfficientDet.\n\nIt is based on the\n* official Tensorflow implementation by [Mingxing Tan and the Google Brain team](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fautoml)\n* paper by Mingxing Tan, Ruoming Pang, Quoc V. Le [EfficientDet: Scalable and Efficient Object Detection](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.09070) \n\nThere are other PyTorch implementations. Either their approach didn't fit my aim to correctly reproduce the Tensorflow models (but with a PyTorch feel and flexibility) or they cannot come close to replicating MS COCO training from scratch.\n\nAside from the default model configs, there is a lot of flexibility to facilitate experiments and rapid improvements here -- some options based on the official Tensorflow impl, some of my own:\n* BiFPN connections and combination mode are fully configurable and not baked into the model code\n* BiFPN and head modules can be switched between depthwise separable or standard convolutions\n* Activations, batch norm layers are switchable via arguments (soon config)\n* Any backbone in my `timm` model collection that supports feature extraction (`features_only` arg) can be used as a backbone.\n\n## Updates\n\n### 2023-05-21\n* Depend on `timm` 0.9\n* Minor bug fixes\n* Version 0.4.1 release\n\n### 2023-02-09\n* Testing with PyTorch 2.0 (nightlies), add --torchcompile support to train and validate scripts\n* A small code cleanup pass, support bwd\u002Ffwd compat across timm 0.8.x and previous releases\n* Use `timm` convert_sync_batchnorm function as it handles updated models w\u002F BatchNormAct2d layers\n\n### 2022-01-06\n* New `efficientnetv2_ds` weights 50.1 mAP @ 1024x1024, using AGC clipping and `timm`'s `efficientnetv2_rw_s` backbone. Memory use comparable to D3, speed faster than D4. 
Smaller than optimal training batch size so can probably do better... \n\n### 2021-11-30\n* Update `efficientnetv2_dt` weights to a new set, 46.1 mAP @ 768x768, 47.0 mAP @ 896x896 using AGC clipping.\n* Add AGC (Adaptive Gradient Clipping) support via `timm`. Idea from `High-Performance Large-Scale Image Recognition Without Normalization` (https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.06171)\n* `timm` minimum version bumped up to 0.4.12\n\n### 2021-11-16\n* Add EfficientNetV2 backbone experiment `efficientnetv2_dt` based on `timm`'s `efficientnetv2_rw_t` (tiny) model. 45.8 mAP @ 768x768.\n* Updated TF EfficientDet-Lite model defs incl weights ported from official impl (https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fautoml)\n* For Lite models, updated feature resizing code in FPN to be based on feat size instead of reduction ratios, needed to support image sizes that aren't divisible by 128.\n* Minor tweaks, bug fixes\n\n### 2021-07-28\n* Add training example to README provided by Chris Hughes for training w\u002F custom dataset & Lightning training code\n  * [Medium blog post](https:\u002F\u002Fmedium.com\u002Fdata-science-at-microsoft\u002Ftraining-efficientdet-on-custom-data-with-pytorch-lightning-using-an-efficientnetv2-backbone-1cdf3bd7921f)\n  * [Python notebook](https:\u002F\u002Fgist.github.com\u002FChris-hughes10\u002F73628b1d8d6fc7d359b3dcbbbb8869d7)\n\n### 2021-04-30\n* Add EfficientDet AdvProp-AA weights for D0-D5 from TF impl. 
Model names `tf_efficientdet_d?_ap`\n  * See https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fautoml\u002Fblob\u002Fmaster\u002Fefficientdet\u002FDet-AdvProp.md\n\n### 2021-02-18\n* Add some new model weights with bilinear interpolation for upsample and downsample in FPN.\n  * 40.9 mAP - `efficientdet_q1`  (replace prev model at 40.6)\n  * 43.2 mAP - `cspresdet50`\n  * 45.2 mAP - `cspdarkdet53m`\n\n### 2020-12-07\n* Training w\u002F fully jit scripted model + bench (`--torchscript`) is possible with inclusion of ModelEmaV2 from `timm` and previous torchscript compat additions. Big speed gains for CPU bound training.\n* Add weights for alternate FPN layouts. QuadFPN experiments (`efficientdet_q0\u002Fq1\u002Fq2`) and CSPResDeXt + PAN (`cspresdext50pan`). See updated table below. Special thanks to [Artus](https:\u002F\u002Ftwitter.com\u002Fartuskg) for providing resources for training the Q2 model.\n* Heads can have a different activation from FPN via config\n* FPN resample (interpolation) can be specified via config and include any F.interpolation method or `max`\u002F`avg` pool\n* Default focal loss changed back to `new_focal`, use `--legacy-focal` arg to use the original. 
Legacy uses less memory, but has more numerical stability issues.\n* custom augmentation transform and collate fn can be passed to loader factory\n* `timm` >= 0.3.2 required, NOTE double check any custom defined model config for breaking change \n* PyTorch >= 1.6 now required\n\n### 2020-11-12\n* add experimental PAN and Quad FPN configs to the existing EfficientDet BiFPN w\u002F two test model configs\n* switch untrained experimental model configs to use torchscript compat bn head layout by default\n\n### 2020-11-09\n* set model config to read-only after creation to reduce likelihood of misuse\n* no accessing model or bench .config attr in forward() call chain (for torchscript compat)\n* numerous smaller changes that allow jit scripting of the model or train\u002Fpredict bench\n\n### 2020-10-30\nMerged a few months of accumulated fixes and additions.\n* Proper fine-tuning compatible model init (w\u002F changeable # classes and proper init, demoed in train.py)\n* A new dataset interface with dataset support (via parser classes) for COCO, VOC 2007\u002F2012, and OpenImages V5\u002FChallenge2019\n* New focal loss def w\u002F label smoothing available as an option, support for jit of loss fn for (potential) speedup\n* Improved a few hot spots that squeak out a couple % of throughput gains, higher GPU utilization\n* Pascal \u002F OpenImages evaluators based on Tensorflow Models Evaluator framework (usable for other datasets as well)\n* Support for native PyTorch DDP, SyncBN, and AMP in PyTorch >= 1.6. Still defaults to APEX if installed.\n* Non-square input image sizes are allowed for the model (the anchor layout). Specified by image_size tuple in model config. Currently still restricted to `size % 128 = 0` on each dim.\n* Allow anchor target generation to be done either in the dataloader processes via collate or in the model as in the past. 
Can help balance compute.\n* Filter out unused target cls\u002Fbox from dataset annotations in fixed size batch tensors before passing to target assigner. Seems to speed convergence.\n* Letterbox aware Random Erasing augmentation added.\n* A (very slow) SoftNMS impl added for inference\u002Fvalidation use. It can be manually enabled right now; an arg can be added if there's demand.\n* Tested with PyTorch 1.7\n* Add ResDet50 model weights, 41.6 mAP.\n\nA few things on the priority list I haven't tackled yet:\n* Mosaic augmentation\n* bbox IOU loss (tried a bit but so far not a great result, need time to debug\u002Fimprove)\n\n**NOTE** There are some breaking changes:\n* Predict and Train benches now output XYXY boxes, NOT XYWH as before. This was done to support other datasets as XYWH is COCO's evaluator requirement.\n* The TF Models Evaluator operates on YXYX boxes like the models. Conversion from XYXY is currently done by default. Why don't I just keep everything YXYX? Because PyTorch GPU NMS operates in XYXY.\n* You must update your version of `timm` to the latest (>=0.3), as some APIs for helpers changed a bit.\n\nTraining sanity checks were done on VOC and OI:\n  * 80.0 @ 50 mAP finetune on voc0712 with no attempt to tune params (roughly as per command below)\n  * 18.0 mAP @ 50 for OI Challenge2019 after a couple days of training (only 6 epochs, eek!). It's much bigger, and takes a LOONG time; many classes are quite challenging.\n\n\n## Models\n\nThe table below contains models with pretrained weights. 
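Re the XYXY breaking-change note above: converting between the box orderings involved is elementwise, and a pure-Python sketch makes the conventions concrete (these helper names are mine, not part of this repo):

```python
def xyxy_to_xywh(box):
    """[x_min, y_min, x_max, y_max] -> COCO-style [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

def yxyx_to_xyxy(box):
    """TF Models Evaluator order [y1, x1, y2, x2] -> [x1, y1, x2, y2]."""
    y1, x1, y2, x2 = box
    return [x1, y1, x2, y2]

# e.g. a 20x40 box anchored at (10, 20):
print(xyxy_to_xywh([10, 20, 30, 60]))  # [10, 20, 20, 40]
```

For batched detections the same arithmetic applies column-wise to a tensor of boxes.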
There are quite a number of other models that I have defined in [model configurations](effdet\u002Fconfig\u002Fmodel_config.py) that use various `timm` backbones.\n\n| Variant                | mAP (val2017) | mAP (test-dev2017) | mAP (TF official val2017) | mAP (TF official test-dev2017) | Params (M) | Img Size |\n|------------------------|:-------------:| :---: | :---: | :---: |:----------:|:--------:|\n| tf_efficientdet_lite0  |     27.1      | TBD | 26.4 | N\u002FA |    3.24    |   320    |\n| tf_efficientdet_lite1  |     32.2      | TBD | 31.5 | N\u002FA |    4.25    |   384    |\n| efficientdet_d0        |     33.6      | TBD | N\u002FA | N\u002FA |    3.88    |   512    |\n| tf_efficientdet_d0     |     34.2      | TBD | 34.3 | 34.6 |    3.88    |   512    |\n| tf_efficientdet_d0_ap  |     34.8      | TBD | 35.2 | 35.3 |    3.88    |   512    |\n| efficientdet_q0        |     35.7      | TBD | N\u002FA | N\u002FA |    4.13    |   512    |\n| tf_efficientdet_lite2  |     35.9      | TBD | 35.1 | N\u002FA |    5.25    |   448    |\n| efficientdet_d1        |     39.4      | 39.5 | N\u002FA | N\u002FA |    6.62    |   640    |\n| tf_efficientdet_lite3  |     39.6      | TBD | 38.8 | N\u002FA |    8.35    |   512    |\n| tf_efficientdet_d1     |     40.1      | TBD | 40.2 | 40.5 |    6.63    |   640    |\n| tf_efficientdet_d1_ap  |     40.8      | TBD | 40.9 | 40.8 |    6.63    |   640    |\n| efficientdet_q1        |     40.9      | TBD | N\u002FA | N\u002FA |    6.98    |   640    |\n| cspresdext50pan        |     41.2      | TBD | N\u002FA | N\u002FA |    22.2    |   640    |\n| resdet50               |     41.6      | TBD | N\u002FA | N\u002FA |    27.6    |   640    |\n| efficientdet_q2        |     43.1      | TBD | N\u002FA | N\u002FA |    8.81    |   768    |\n| cspresdet50            |     43.2      | TBD | N\u002FA | N\u002FA |    24.3    |   768    |\n| tf_efficientdet_d2     |     43.4      | TBD | 42.5 | 43 |    8.10    |   768    |\n| 
tf_efficientdet_lite3x |     43.6      | TBD | 42.6 | N\u002FA |    9.28    |   640    |\n| tf_efficientdet_lite4  |     44.2      | TBD | 43.2 | N\u002FA |    15.1    |   640    |\n| tf_efficientdet_d2_ap  |     44.2      | TBD | 44.3 | 44.3 |    8.10    |   768    |\n| cspdarkdet53m          |     45.2      | TBD | N\u002FA | N\u002FA |    35.6    |   768    |\n| efficientdetv2_dt      |     46.1      | TBD | N\u002FA | N\u002FA |    13.4    |   768    |\n| tf_efficientdet_d3     |     47.1      | TBD | 47.2 | 47.5 |    12.0    |   896    |\n| tf_efficientdet_d3_ap  |     47.7      | TBD | 48.0 | 47.7 |    12.0    |   896    |\n| tf_efficientdet_d4     |     49.2      | TBD | 49.3 | 49.7 |    20.7    |   1024   |\n| efficientdetv2_ds      |     50.1      | TBD | N\u002FA | N\u002FA |    26.6    |   1024   |\n| tf_efficientdet_d4_ap  |     50.2      | TBD | 50.4 | 50.4 |    20.7    |   1024   |\n| tf_efficientdet_d5     |     51.2      | TBD | 51.2 | 51.5 |    33.7    |   1280   |\n| tf_efficientdet_d6     |     52.0      | TBD | 52.1 | 52.6 |    51.9    |   1280   |\n| tf_efficientdet_d5_ap  |     52.1      | TBD | 52.2 | 52.5 |    33.7    |   1280   |\n| tf_efficientdet_d7     |     53.1      | 53.4 | 53.4 | 53.7 |    51.9    |   1536   |\n| tf_efficientdet_d7x    |     54.3      | TBD | 54.4 | 55.1 |    77.1    |   1536   |\n\n\nSee [model configurations](effdet\u002Fconfig\u002Fmodel_config.py) for model checkpoint urls and differences.\n\n_NOTE: Official scores for all models now using soft-nms, but still using normal NMS here._\n\n_NOTE: In training some experimental models, I've noticed some potential issues with the combination of synchronized BatchNorm (`--sync-bn`) and model EMA weight averaging (`--model-ema`) during distributed training. The result is either a model that fails to converge, or appears to converge (training loss) but the eval loss (running BN stats) is garbage. 
I haven't observed this with EfficientNets, but have with some backbones like CspResNeXt, VoVNet, etc. Disabling either EMA or sync bn seems to eliminate the problem and result in good models. I have not fully characterized this issue._\n\n## Environment Setup\n\nTested in a Python 3.7 - 3.9 conda environment in Linux with:\n* PyTorch 1.6 - 1.10\n* PyTorch Image Models (timm) >= 0.4.12, `pip install timm` or local install from (https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models)\n* Apex AMP master (as of 2020-08). I recommend using native PyTorch AMP and DDP now.\n\n*NOTE* - There is a conflict\u002Fbug with Numpy 1.18+ and pycocotools 2.0; force install numpy \u003C= 1.17.5 or ensure you install pycocotools >= 2.0.2\n\n## Dataset Setup and Use\n\n### COCO\nMSCOCO 2017 validation data:\n```\nwget http:\u002F\u002Fimages.cocodataset.org\u002Fzips\u002Fval2017.zip\nwget http:\u002F\u002Fimages.cocodataset.org\u002Fannotations\u002Fannotations_trainval2017.zip\nunzip val2017.zip\nunzip annotations_trainval2017.zip\n```\n\nMSCOCO 2017 test-dev data:\n```\nwget http:\u002F\u002Fimages.cocodataset.org\u002Fzips\u002Ftest2017.zip\nunzip -q test2017.zip\nwget http:\u002F\u002Fimages.cocodataset.org\u002Fannotations\u002Fimage_info_test2017.zip\nunzip image_info_test2017.zip\n```\n\n#### COCO Evaluation\n\nRun validation (val2017 by default) with D2 model: `python validate.py \u002Flocation\u002Fof\u002Fmscoco\u002F --model tf_efficientdet_d2`\n\n\nRun test-dev2017: `python validate.py \u002Flocation\u002Fof\u002Fmscoco\u002F --model tf_efficientdet_d2 --split testdev`\n\n#### COCO Training\n\n`.\u002Fdistributed_train.sh 4 \u002Fmscoco --model tf_efficientdet_d0 -b 16 --amp  --lr .09 --warmup-epochs 5  --sync-bn --opt fusedmomentum --model-ema`\n\nNOTE:\n* Training script currently defaults to a model that does NOT have redundant conv + BN bias layers like the official models, set correct flag when validating.\n* I've only trained with img mean 
(`--fill-color mean`) as the background for crop\u002Fscale\u002Faspect fill, the official repo uses black pixel (0) (`--fill-color 0`). Both likely work fine.\n* The official training code uses EMA weight averaging by default; it's not clear there is a point in doing this with the cosine LR schedule, and I find the non-EMA weights end up better than EMA in the last 10-20% of training epochs \n* The default h-params are very close to unstable (exploding loss); don't try using Nesterov momentum. Try to keep the batch size up, use sync-bn.\n\n\n### Pascal VOC\n\n2007, 2012, and combined 2007 + 2012 w\u002F labeled 2007 test for validation are supported.\n\n```\nwget http:\u002F\u002Fhost.robots.ox.ac.uk\u002Fpascal\u002FVOC\u002Fvoc2012\u002FVOCtrainval_11-May-2012.tar\nwget http:\u002F\u002Fhost.robots.ox.ac.uk\u002Fpascal\u002FVOC\u002Fvoc2007\u002FVOCtrainval_06-Nov-2007.tar\nwget http:\u002F\u002Fhost.robots.ox.ac.uk\u002Fpascal\u002FVOC\u002Fvoc2007\u002FVOCtest_06-Nov-2007.tar\nfind . -name '*.tar' -exec tar xf {} \\;\n```\n\nThere should be a `VOC2007` and `VOC2012` folder within `VOCdevkit`, dataset root for cmd line will be VOCdevkit.\n\nAlternative download links, slower but up more often than ox.ac.uk:\n```\nhttp:\u002F\u002Fpjreddie.com\u002Fmedia\u002Ffiles\u002FVOCtrainval_11-May-2012.tar\nhttp:\u002F\u002Fpjreddie.com\u002Fmedia\u002Ffiles\u002FVOCtrainval_06-Nov-2007.tar\nhttp:\u002F\u002Fpjreddie.com\u002Fmedia\u002Ffiles\u002FVOCtest_06-Nov-2007.tar\n```\n\n#### VOC Evaluation\n\nEvaluate on the VOC2007 validation set:\n`python validate.py \u002Fdata\u002FVOCdevkit --model efficientdet_d0 --num-gpu 2 --dataset voc2007 --checkpoint mycheckpoint.pth --num-classes 20`\n\n#### VOC Training\n\nFine tune COCO pretrained weights to VOC 2007 + 2012:\n`.\u002Fdistributed_train.sh 4 \u002Fdata\u002FVOCdevkit --model efficientdet_d0 --dataset voc0712 -b 16 --amp --lr .008 --sync-bn --opt fusedmomentum --warmup-epochs 3 --model-ema --model-ema-decay 0.9966 --epochs 150 
--num-classes 20 --pretrained`\n\n### OpenImages\n\nSetting up the OpenImages dataset is a commitment. I've tried to make it a bit easier wrt the annotations, but grabbing the dataset is still going to take some time. It will take approx 560GB of storage space.\n\nTo download the image data, I prefer the CVDF packaging. The main OpenImages dataset page, annotations, dataset license info can be found at: https:\u002F\u002Fstorage.googleapis.com\u002Fopenimages\u002Fweb\u002Findex.html\n\n#### CVDF Images Download\n\nFollow the s3 download directions here: https:\u002F\u002Fgithub.com\u002Fcvdfoundation\u002Fopen-images-dataset#download-images-with-bounding-boxes-annotations\n\nEach `train_\u003Cx>.tar.gz` should be extracted to `train\u002F\u003Cx>` folder, where x is a hex digit from 0-F. `validation.tar.gz` can be extracted as flat files into `validation\u002F`.\n\n#### Annotations Download\n\nAnnotations can be downloaded separately from the OpenImages home page above. For convenience, I've packaged them all together with some additional 'info' csv files that contain ids and stats for all image files. My datasets rely on the `\u003Cset>-info.csv` files. Please see https:\u002F\u002Fstorage.googleapis.com\u002Fopenimages\u002Fweb\u002Ffactsfigures.html for the License of these annotations. The annotations are licensed by Google LLC under CC BY 4.0 license. The images are listed as having a CC BY 2.0 license.\n```\nwget https:\u002F\u002Fgithub.com\u002Frwightman\u002Fefficientdet-pytorch\u002Freleases\u002Fdownload\u002Fv0.1-anno\u002Fopenimages-annotations.tar.bz2\nwget https:\u002F\u002Fgithub.com\u002Frwightman\u002Fefficientdet-pytorch\u002Freleases\u002Fdownload\u002Fv0.1-anno\u002Fopenimages-annotations-challenge-2019.tar.bz2\nfind . 
-name '*.tar.bz2' -exec tar xf {} \\;\n```\n\n#### Layout\n\nOnce everything is downloaded and extracted, the root of your openimages data folder should contain:\n```\nannotations\u002F\u003Ccsv anno for openimages v5\u002Fv6>\nannotations\u002Fchallenge-2019\u002F\u003Ccsv anno for challenge2019>\ntrain\u002F0\u002F\u003Call the image files starting with '0'>\n.\n.\n.\ntrain\u002Ff\u002F\u003Call the image files starting with 'f'>\nvalidation\u002F\u003Call the image files in same folder>\n```\n\n#### OpenImages Training\nTraining with Challenge2019 annotations (500 classes):\n`.\u002Fdistributed_train.sh 4 \u002Fdata\u002Fopenimages --model efficientdet_d0 --dataset openimages-challenge2019 -b 7 --amp --lr .042 --sync-bn --opt fusedmomentum --warmup-epochs 1 --lr-noise 0.4 0.9 --model-ema --model-ema-decay 0.999966 --epochs 100 --remode pixel --reprob 0.15 --recount 4 --num-classes 500 --val-skip 2`\n\nThe 500 (Challenge2019) or 601 (V5\u002FV6) class head for OI takes up a LOT more GPU memory vs COCO. You'll likely need to halve batch sizes.\n\n### Examples of Training \u002F Fine-Tuning on Custom Datasets\n\nThe models here have been used with custom training routines and datasets with great results. There are lots of details to figure out so please don't file any 'I get crap results on my custom dataset' issues. 
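On the "halve batch sizes" advice above: a common companion adjustment is the linear LR scaling rule of thumb, keeping lr per total batch size roughly constant. This is my assumption about a reasonable default, not something this repo enforces:

```python
def scale_lr(base_lr, base_batch, new_batch):
    """Linear LR scaling rule of thumb: keep lr / total_batch_size constant."""
    return base_lr * new_batch / base_batch

# e.g. the OI recipe above runs 4 GPUs x 7 imgs = effective batch 28 at lr .042;
# halving to effective batch 14 suggests roughly half the LR:
print(round(scale_lr(0.042, 28, 14), 3))  # 0.021
```

Whether linear scaling holds near the stability limit of these h-params is untested here; warmup epochs may also need adjusting.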
If you can illustrate a reproducible problem on a public, non-proprietary, downloadable dataset, with a public github fork of this repo including working dataset\u002Fparser implementations, I MAY have time to take a look.\n\nExamples:\n* Chris Hughes has put together a great example of training w\u002F `timm` EfficientNetV2 backbones and the latest versions of the EfficientDet models here\n  * [Medium blog post](https:\u002F\u002Fmedium.com\u002Fdata-science-at-microsoft\u002Ftraining-efficientdet-on-custom-data-with-pytorch-lightning-using-an-efficientnetv2-backbone-1cdf3bd7921f)\n  * [Python notebook](https:\u002F\u002Fgist.github.com\u002FChris-hughes10\u002F73628b1d8d6fc7d359b3dcbbbb8869d7)\n* Alex Shonenkov has a clear and concise Kaggle kernel which illustrates fine-tuning these models for detecting wheat heads: https:\u002F\u002Fwww.kaggle.com\u002Fshonenkov\u002Ftraining-efficientdet (NOTE: this is out of date wrt the latest versions here, many details have changed)\n\nIf you have a good example script or kernel training these models with a different dataset, feel free to notify me for inclusion here...\n\n## Results\n\n### My Training\n\n#### EfficientDet-D0\n\nLatest training run with .336 for D0 (on 4x 1080ti):\n`.\u002Fdistributed_train.sh 4 \u002Fmscoco --model efficientdet_d0 -b 22 --amp --lr .12 --sync-bn --opt fusedmomentum --warmup-epochs 5 --lr-noise 0.4 0.9 --model-ema --model-ema-decay 0.9999`\n\nThe hparams above resulted in a good model; a few points:\n* the mAP peaked very early (epoch 200 of 300) and then appeared to overfit, so likely still room for improvement\n* I enabled my experimental LR noise which tends to work well with EMA enabled\n* the effective LR is a bit higher than official. 
Official is .08 for batch 64; my .12 at effective batch 88 (4 GPUs x 22) works out to .0872 when rescaled to batch 64\n* drop_path (aka survival_prob \u002F drop_connect) rate of 0.1, which is higher than the suggested 0.0 for D0 in official, but lower than the 0.2 for the other models\n* longer EMA period than default\n\nVAL2017\n```\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.336251\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.521584\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.356439\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.123988\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.395033\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.521695\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.287121\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.441450\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.467914\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.197697\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.552515\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.689297\n```\n\n#### EfficientDet-D1 \n\nLatest run with .394 mAP (on 4x 1080ti):\n`.\u002Fdistributed_train.sh 4 \u002Fmscoco --model efficientdet_d1 -b 10 --amp --lr .06 --sync-bn --opt fusedmomentum --warmup-epochs 5 --lr-noise 0.4 0.9 --model-ema --model-ema-decay 0.99995`\n\nFor this run I used some improved augmentations (still experimenting, so not ready for release); it should work well without them but will likely start overfitting a bit sooner and possibly end up in the .385-.39 range.\n\n\n### Ported Tensorflow weights\n\n#### TEST-DEV2017\n\nNOTE: I've only tried submitting D7 to dev server for sanity check so far\n\n##### TF-EfficientDet-D7\n```\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | 
maxDets=100 ] = 0.534\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.726\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.577\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.356\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.569\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.660\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.397\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.644\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.682\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.508\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.718\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.818\n ```\n\n#### VAL2017\n\n##### TF-EfficientDet-D0\n```\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.341877\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.525112\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.360218\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.131366\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.399686\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.537368\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.293137\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.447829\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.472954\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.195282\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.558127\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.695312\n```\n\n##### 
TF-EfficientDet-D1\n```\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.401070\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.590625\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.422998\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.211116\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.459650\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.577114\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.326565\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.507095\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.537278\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.308963\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.610450\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.731814\n```\n\n##### TF-EfficientDet-D2\n```\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.434042\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.627834\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.463488\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.237414\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.486118\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.606151\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.343016\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.538328\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.571489\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.350301\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.638884\n 
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.746671\n```\n\n##### TF EfficientDet-D3\n\n```\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.471223\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.661550\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.505127\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301385\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.518339\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.626571\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.365186\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.582691\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.617252\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.424689\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.670761\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.779611\n```\n\n##### TF-EfficientDet-D4\n ```\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.491759\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.686005\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.527791\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.325658\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.536508\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.635309\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.373752\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.601733\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.638343\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 
0.463057\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.685103\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.789180\n```\n\n##### TF-EfficientDet-D5\n```\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.511767\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.704835\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.552920\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.355680\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.551341\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.650184\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.384516\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.619196\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.657445\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.499319\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.695617\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.788889\n```\n\n##### TF-EfficientDet-D6\n```\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.520200\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.713204\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.560973\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.361596\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.567414\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.657173\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.387733\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.629269\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | 
maxDets=100 ] = 0.667495\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.499002\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.711909\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.802336\n```\n\n##### TF-EfficientDet-D7\n ```\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.531256\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.724700\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.571787\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.368872\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.573938\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.668253\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.393620\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.637601\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.676987\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.524850\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.717553\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.806352\n ```\n\n##### TF-EfficientDet-D7X\n\n```\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.543\n Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.737\n Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.585\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.401\n Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.579\n Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.680\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.398\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | 
maxDets= 10 ] = 0.649\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.689\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.550\n Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.725\n Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.823\n```\n\n## TODO\n- [x] Basic Training (object detection) reimplementation\n- [ ] Mosaic Augmentation\n- [ ] Rand\u002FAutoAugment\n- [ ] BBOX IoU loss (giou, diou, ciou, etc)\n- [ ] Training (semantic segmentation) experiments\n- [ ] Integration with Detectron2 \u002F MMDetection codebases\n- [ ] Addition and cleanup of EfficientNet based U-Net and DeepLab segmentation models that I've used in past projects\n- [x] Addition and cleanup of OpenImages dataset\u002Ftraining support from a past project\n- [ ] Exploration of instance segmentation possibilities...\n\nIf your organization is interested in sponsoring any of this work, or the prioritization of possible future directions interests you, feel free to contact me (issue, LinkedIn, Twitter, hello at rwightman dot com). I will set up a GitHub sponsorship if there is any interest.\n","# EfficientDet (PyTorch)\n\nEfficientDet 的 PyTorch 实现。\n\n该实现基于：\n* [Mingxing Tan 和 Google Brain 团队](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fautoml) 的官方 TensorFlow 实现\n* Mingxing Tan、Ruoming Pang 和 Quoc V. 
Le 的论文《EfficientDet: 可扩展且高效的目标检测》（[arXiv:1911.09070](https:\u002F\u002Farxiv.org\u002Fabs\u002F1911.09070)）\n\n此外，还有其他 PyTorch 实现。然而，要么它们的方法不符合我正确复现 TensorFlow 模型的目标（同时保持 PyTorch 风格和灵活性），要么在从头开始训练 MS COCO 数据集时无法达到相近的性能。\n\n除了默认的模型配置外，这里还提供了极大的灵活性，以支持实验和快速改进——其中一些选项源自官方 TensorFlow 实现，另一些则是我自创的：\n* BiFPN 连接方式和组合模式完全可配置，不会硬编码到模型代码中。\n* BiFPN 和头部模块可以在深度可分离卷积与标准卷积之间切换。\n* 激活函数和批归一化层可通过参数切换（未来将支持配置）。\n* 我的 `timm` 模型库中任何支持特征提取（`features_only` 参数）的骨干网络都可以用作主干网络。\n\n## 更新日志\n\n### 2023-05-21\n* 依赖 `timm` 0.9 版本。\n* 修复了一些小 bug。\n* 发布版本 0.4.1。\n\n### 2023-02-09\n* 测试 PyTorch 2.0（夜间构建版），并在训练和验证脚本中添加了 `--torchcompile` 支持。\n* 进行了一次小规模代码清理，确保与 `timm` 0.8.x 及更早版本的前向和反向兼容性。\n* 使用 `timm` 的 `convert_sync_batchnorm` 函数，因为它可以处理包含 `BatchNormAct2d` 层的更新模型。\n\n### 2022-01-06\n* 新增 `efficientnetv2_ds` 权重，在 1024x1024 分辨率下达到 50.1 mAP，采用 AGC 裁剪并使用 `timm` 的 `efficientnetv2_rw_s` 主干网络。内存占用与 D3 相当，速度比 D4 更快。由于训练批次大小略小，性能可能还有提升空间……\n\n### 2021-11-30\n* 将 `efficientnetv2_dt` 权重更新为新版本：在 768x768 分辨率下达到 46.1 mAP，在 896x896 分辨率下达到 47.0 mAP，均采用 AGC 裁剪。\n* 添加 AGC（自适应梯度裁剪）支持，通过 `timm` 实现。灵感来源于论文《无需归一化的高性能大规模图像识别》（[arXiv:2102.06171](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.06171)）。\n* `timm` 的最低版本要求提升至 0.4.12。\n\n### 2021-11-16\n* 增加基于 `timm` 的 `efficientnetv2_rw_t`（tiny）模型的 EfficientNetV2 主干实验 `efficientnetv2_dt`，在 768x768 分辨率下达到 45.8 mAP。\n* 更新了 TF EfficientDet-Lite 模型定义，包括从官方实现移植的权重（[GitHub: google\u002Fautoml](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fautoml)）。\n* 对于 Lite 模型，更新了 FPN 中的特征尺寸调整代码，改为基于特征尺寸而非缩减比例，以便支持不能被 128 整除的图像尺寸。\n* 进行了一些小调整和 bug 修复。\n\n### 2021-07-28\n* 在 README 中添加了 Chris Hughes 提供的训练示例，用于使用自定义数据集和 Lightning 训练代码进行训练。\n  * [Medium 博客文章](https:\u002F\u002Fmedium.com\u002Fdata-science-at-microsoft\u002Ftraining-efficientdet-on-custom-data-with-pytorch-lightning-using-an-efficientnetv2-backbone-1cdf3bd7921f)\n  * [Python 笔记本](https:\u002F\u002Fgist.github.com\u002FChris-hughes10\u002F73628b1d8d6fc7d359b3dcbbbb8869d7)\n\n### 2021-04-30\n* 添加了来自 TF 实现的 D0-D5 级别 EfficientDet 
AdvProp-AA 权重，模型名称为 `tf_efficientdet_d?_ap`。\n  * 参见 [Google AutoML GitHub 仓库中的 Det-AdvProp.md 文件](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fautoml\u002Fblob\u002Fmaster\u002Fefficientdet\u002FDet-AdvProp.md)。\n\n### 2021-02-18\n* 添加了一些新的模型权重，使用双线性插值进行 FPN 中的上采样和下采样。\n  * 40.9 mAP - `efficientdet_q1`（取代之前的 40.6 mAP 模型）\n  * 43.2 mAP - `cspresdet50`\n  * 45.2 mAP - `cspdarkdet53m`\n\n### 2020-12-07\n* 通过引入 `timm` 中的 ModelEmaV2 以及之前的 TorchScript 兼容性改进，现在可以使用完全 JIT 编译的模型进行训练和基准测试（`--torchscript`）。对于 CPU 密集型训练，速度有显著提升。\n* 添加了替代 FPN 布局的权重。包括 QuadFPN 实验（`efficientdet_q0\u002Fq1\u002Fq2`）以及 CSPResDeXt + PAN（`cspresdext50pan`）。详见下方更新后的表格。特别感谢 [Artus](https:\u002F\u002Ftwitter.com\u002Fartuskg) 为 Q2 模型的训练提供了资源。\n* 头部模块的激活函数可以通过配置与 FPN 不同。\n* FPN 的重采样（插值）可以通过配置指定，支持任意 F 插值方法或 `max`\u002F`avg` 池化。\n* 默认焦点损失函数已改回 `new_focal`，若需使用原始版本，请使用 `--legacy-focal` 参数。旧版焦点损失占用内存较少，但数值稳定性较差。\n* 可以将自定义增强变换和 collate 函数传递给数据加载器工厂。\n* 需要 `timm` ≥ 0.3.2，请务必检查自定义模型配置是否存在破坏性变更。\n* 现在需要 PyTorch ≥ 1.6。\n\n### 2020-11-12\n* 在现有的 EfficientDet BiFPN 结构基础上，新增了实验性的 PAN 和 Quad FPN 配置，并附带两个测试模型配置。\n* 将未训练的实验性模型配置默认切换为兼容 TorchScript 的 BN 头布局。\n\n### 2020-11-09\n* 创建后将模型配置设置为只读，以减少误用的可能性。\n* 在前向传播调用链中不再访问模型或基准测试的 `.config` 属性（以确保 TorchScript 兼容性）。\n* 进行了许多小改动，使得模型或训练\u002F预测基准测试能够被 JIT 编译。\n\n### 2020-10-30\n合并了几个月积累的修复和新增功能。\n* 支持可变类别数且初始化正确的模型微调（在 train.py 中演示）\n* 新的数据集接口，通过解析器类支持 COCO、VOC 2007\u002F2012 和 OpenImages V5\u002FChallenge2019 数据集\n* 新增带有标签平滑选项的焦点损失定义，并支持将损失函数编译为 JIT 以提升速度\n* 优化了几个性能瓶颈，提升了约 2% 的吞吐量和 GPU 利用率\n* 基于 TensorFlow Models Evaluator 框架的 Pascal 和 OpenImages 评估器（也可用于其他数据集）\n* 支持原生 PyTorch DDP、SyncBN 和 AMP（PyTorch >= 1.6），若已安装 APEX，则仍默认使用 APEX\n* 模型现在允许非正方形输入图像尺寸（锚点布局）。通过模型配置中的 image_size 元组指定。目前每个维度仍需满足 `size % 128 = 0`\n* 锚点目标生成既可以在数据加载器进程中通过 collate 函数完成，也可以像以前一样在模型中完成，有助于平衡计算负载\n* 在将固定大小的批处理张量传递给目标分配器之前，从数据集注释中过滤掉未使用的类别和边界框，似乎能加快收敛速度\n* 添加了对 Letterbox 的 Random Erasing 数据增强支持\n* 添加了一个（非常慢的）SoftNMS 实现，用于推理和验证。目前可以手动启用，如有需求可添加参数\n* 已在 PyTorch 1.7 上测试通过\n* 添加 ResDet50 模型权重，mAP 达 
41.6。\n\n优先级列表中尚未完成的几项：\n* Mosaic 数据增强\n* bbox IOU 损失（尝试过一些，但效果不佳，需要时间调试和改进）\n\n**注意** 存在一些破坏性变更：\n* 预测和训练基准现在输出 XYXY 格式的边界框，而非之前的 XYWH。这是为了支持其他数据集，因为 XYWH 是 COCO 评估器的要求。\n* TF Models Evaluator 使用 YXYX 格式的边界框，与模型一致。目前默认会进行 XYXY 到 YXYX 的转换。为什么不直接全部使用 YXYX？因为 PyTorch 的 GPU NMS 是基于 XYXY 的。\n* 必须将 `timm` 更新到最新版本（>=0.3），因为一些辅助 API 发生了变化。\n\n在 VOC 和 OI 上进行了训练验证：\n* 在 voc0712 上进行微调，未做任何超参数调整，mAP@50 达 80.0（大致如以下命令所示）\n* 经过几天的训练（仅 6 个 epoch），OI Challenge2019 的 mAP@50 达 18.0！数据集更大，训练时间很长，许多类别也颇具挑战性。\n\n\n## 模型\n\n下表包含带有预训练权重的模型。我在 [模型配置](effdet\u002Fconfig\u002Fmodel_config.py) 中还定义了许多其他使用不同 `timm` 主干网络的模型。\n\n| 变体                | mAP (val2017) | mAP (test-dev2017) | mAP (TF official val2017) | mAP (TF official test-dev2017) | 参数 (M) | 图像尺寸 |\n|------------------------|:-------------:| :---: | :---: | :---: |:----------:|:--------:|\n| tf_efficientdet_lite0  |     27.1      | TBD | 26.4 | N\u002FA |    3.24    |   320    |\n| tf_efficientdet_lite1  |     32.2      | TBD | 31.5 | N\u002FA |    4.25    |   384    |\n| efficientdet_d0        |     33.6      | TBD | N\u002FA | N\u002FA |    3.88    |   512    |\n| tf_efficientdet_d0     |     34.2      | TBD | 34.3 | 34.6 |    3.88    |   512    |\n| tf_efficientdet_d0_ap  |     34.8      | TBD | 35.2 | 35.3 |    3.88    |   512    |\n| efficientdet_q0        |     35.7      | TBD | N\u002FA | N\u002FA |    4.13    |   512    |\n| tf_efficientdet_lite2  |     35.9      | TBD | 35.1 | N\u002FA |    5.25    |   448    |\n| efficientdet_d1        |     39.4      | 39.5 | N\u002FA | N\u002FA |    6.62    |   640    |\n| tf_efficientdet_lite3  |     39.6      | TBD | 38.8 | N\u002FA |    8.35    |   512    |\n| tf_efficientdet_d1     |     40.1      | TBD | 40.2 | 40.5 |    6.63    |   640    |\n| tf_efficientdet_d1_ap  |     40.8      | TBD | 40.9 | 40.8 |    6.63    |   640    |\n| efficientdet_q1        |     40.9      | TBD | N\u002FA | N\u002FA |    6.98    |   640    |\n| cspresdext50pan        |     41.2      | TBD | N\u002FA 
| N\u002FA |    22.2    |   640    |\n| resdet50               |     41.6      | TBD | N\u002FA | N\u002FA |    27.6    |   640    |\n| efficientdet_q2        |     43.1      | TBD | N\u002FA | N\u002FA |    8.81    |   768    |\n| cspresdet50            |     43.2      | TBD | N\u002FA | N\u002FA |    24.3    |   768    |\n| tf_efficientdet_d2     |     43.4      | TBD | 42.5 | 43 |    8.10    |   768    |\n| tf_efficientdet_lite3x |     43.6      | TBD | 42.6 | N\u002FA |    9.28    |   640    |\n| tf_efficientdet_lite4  |     44.2      | TBD | 43.2 | N\u002FA |    15.1    |   640    |\n| tf_efficientdet_d2_ap  |     44.2      | TBD | 44.3 | 44.3 |    8.10    |   768    |\n| cspdarkdet53m          |     45.2      | TBD | N\u002FA | N\u002FA |    35.6    |   768    |\n| efficientdetv2_dt      |     46.1      | TBD | N\u002FA | N\u002FA |    13.4    |   768    |\n| tf_efficientdet_d3     |     47.1      | TBD | 47.2 | 47.5 |    12.0    |   896    |\n| tf_efficientdet_d3_ap  |     47.7      | TBD | 48.0 | 47.7 |    12.0    |   896    |\n| tf_efficientdet_d4     |     49.2      | TBD | 49.3 | 49.7 |    20.7    |   1024   |\n| efficientdetv2_ds      |     50.1      | TBD | N\u002FA | N\u002FA |    26.6    |   1024   |\n| tf_efficientdet_d4_ap  |     50.2      | TBD | 50.4 | 50.4 |    20.7    |   1024   |\n| tf_efficientdet_d5     |     51.2      | TBD | 51.2 | 51.5 |    33.7    |   1280   |\n| tf_efficientdet_d6     |     52.0      | TBD | 52.1 | 52.6 |    51.9    |   1280   |\n| tf_efficientdet_d5_ap  |     52.1      | TBD | 52.2 | 52.5 |    33.7    |   1280   |\n| tf_efficientdet_d7     |     53.1      | 53.4 | 53.4 | 53.7 |    51.9    |   1536   |\n| tf_efficientdet_d7x    |     54.3      | TBD | 54.4 | 55.1 |    77.1    |   1536   |\n\n\n有关模型检查点 URL 和差异，请参阅 [模型配置](effdet\u002Fconfig\u002Fmodel_config.py)。\n\n_注：所有模型的官方分数现在都使用 soft-nms，但此处仍使用普通 NMS。_\n\n_注：在训练一些实验性模型时，我注意到分布式训练中同步 BatchNorm (`--sync-bn`) 和模型 EMA 权重平均 (`--model-ema`) 
的组合可能存在潜在问题。结果要么是模型无法收敛，要么看似收敛（训练损失降低），但评估损失（运行中的 BN 统计信息）却完全错误。我没有在 EfficientNets 上观察到这种情况，但在 CspResNeXt、VoVNet 等一些主干网络上确实出现了。禁用 EMA 或 sync bn 中的任意一个似乎可以消除问题并得到良好的模型。我尚未完全确定这一问题的原因。_\n\n## 环境设置\n\n已在 Linux 下的 Python 3.7 - 3.9 conda 环境中测试，环境包括：\n* PyTorch 1.6 - 1.10\n* PyTorch Image Models (timm) >= 0.4.12，可通过 `pip install timm` 或从 https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models 本地安装\n* Apex AMP master（截至 2020 年 8 月）。建议现在使用原生 PyTorch 的 AMP 和 DDP。\n\n*注意* - Numpy 1.18+ 与 pycocotools 2.0 存在冲突\u002Fbug，必须强制安装 Numpy \u003C= 1.17.5，或确保安装 pycocotools >= 2.0.2。\n\n## 数据集设置与使用\n\n### COCO\nMSCOCO 2017 验证数据：\n```\nwget http:\u002F\u002Fimages.cocodataset.org\u002Fzips\u002Fval2017.zip\nwget http:\u002F\u002Fimages.cocodataset.org\u002Fannotations\u002Fannotations_trainval2017.zip\nunzip val2017.zip\nunzip annotations_trainval2017.zip\n```\n\nMSCOCO 2017 测试-dev 数据：\n```\nwget http:\u002F\u002Fimages.cocodataset.org\u002Fzips\u002Ftest2017.zip\nunzip -q test2017.zip\nwget http:\u002F\u002Fimages.cocodataset.org\u002Fannotations\u002Fimage_info_test2017.zip\nunzip image_info_test2017.zip\n```\n\n#### COCO 评估\n\n使用 D2 模型运行验证（默认为 val2017）：`python validate.py \u002Flocation\u002Fof\u002Fmscoco\u002F --model tf_efficientdet_d2`\n\n\n运行 test-dev2017：`python validate.py \u002Flocation\u002Fof\u002Fmscoco\u002F --model tf_efficientdet_d2 --split testdev`\n\n#### COCO 训练\n\n`.\u002Fdistributed_train.sh 4 \u002Fmscoco --model tf_efficientdet_d0 -b 16 --amp  --lr .09 --warmup-epochs 5  --sync-bn --opt fusedmomentum --model-ema`\n\n注意：\n* 目前的训练脚本默认使用的模型不包含官方模型中的冗余卷积+BN偏置层，在进行验证时请设置正确的标志。\n* 我只使用图像均值（`--fill-color mean`）作为裁剪\u002F缩放\u002F纵横比填充的背景进行训练，而官方仓库则使用黑色像素（0）（`--fill-color 0`）。两者应该都能正常工作。\n* 官方训练代码默认使用 EMA 权重平均，但在采用余弦学习率调度的情况下，这样做是否有必要尚不明确。我发现，在训练的最后 10%-20% 的周期中，非 EMA 权重的表现往往优于 EMA 权重。\n* 默认的超参数设置非常接近不稳定状态（损失爆炸），不要尝试使用 Nesterov 动量。尽量保持较大的批量大小，并使用同步 BN。\n\n\n### Pascal VOC\n\n支持 2007 年、2012 年以及 2007 年和 2012 年合并的数据集，并以 2007 年测试集作为验证集。\n\n```\nwget 
http:\u002F\u002Fhost.robots.ox.ac.uk\u002Fpascal\u002FVOC\u002Fvoc2012\u002FVOCtrainval_11-May-2012.tar\nwget http:\u002F\u002Fhost.robots.ox.ac.uk\u002Fpascal\u002FVOC\u002Fvoc2007\u002FVOCtrainval_06-Nov-2007.tar\nwget http:\u002F\u002Fhost.robots.ox.ac.uk\u002Fpascal\u002FVOC\u002Fvoc2007\u002FVOCtest_06-Nov-2007.tar\nfind . -name '*.tar' -exec tar xf {} \\;\n```\n\n在 `VOCdevkit` 文件夹内应有 `VOC2007` 和 `VOC2012` 文件夹，命令行中数据集的根目录即为 `VOCdevkit`。\n\n备用下载链接，速度较慢但更新频率高于 ox.ac.uk：\n```\nhttp:\u002F\u002Fpjreddie.com\u002Fmedia\u002Ffiles\u002FVOCtrainval_11-May-2012.tar\nhttp:\u002F\u002Fpjreddie.com\u002Fmedia\u002Ffiles\u002FVOCtrainval_06-Nov-2007.tar\nhttp:\u002F\u002Fpjreddie.com\u002Fmedia\u002Ffiles\u002FVOCtest_06-Nov-2007.tar\n```\n\n#### VOC 评估\n\n在 VOC2007 测试集（作为验证集）上进行评估：\n`python validate.py \u002Fdata\u002FVOCdevkit --model efficientdet_d0 --num-gpu 2 --dataset voc2007 --checkpoint mycheckpoint.pth --num-classes 20`\n\n#### VOC 训练\n\n将 COCO 预训练权重微调至 VOC 2007 + 2012：\n`.\u002Fdistributed_train.sh 4 \u002Fdata\u002FVOCdevkit --model efficientdet_d0 --dataset voc0712 -b 16 --amp --lr .008 --sync-bn --opt fusedmomentum --warmup-epochs 3 --model-ema --model-ema-decay 0.9966 --epochs 150 --num-classes 20 --pretrained`\n\n### OpenImages\n\n搭建 OpenImages 数据集需要一定的投入。尽管我在标注方面做了一些简化，但获取整个数据集仍然会花费不少时间。大约需要 560GB 的存储空间。\n\n要下载图像数据，我推荐使用 CVDF 的打包方式。OpenImages 数据集主页、标注信息及数据集许可信息可在以下网址找到：https:\u002F\u002Fstorage.googleapis.com\u002Fopenimages\u002Fweb\u002Findex.html\n\n#### CVDF 图像下载\n\n请按照此处的 S3 下载说明操作：https:\u002F\u002Fgithub.com\u002Fcvdfoundation\u002Fopen-images-dataset#download-images-with-bounding-boxes-annotations\n\n每个 `train_\u003Cx>.tar.gz` 文件都应解压到 `train\u002F\u003Cx>` 文件夹中，其中 x 是从 0 到 F 的十六进制数字。`validation.tar.gz` 可以解压为扁平文件并放入 `validation\u002F` 文件夹中。\n\n#### 标注下载\n\n标注可以单独从上述 OpenImages 主页下载。为了方便起见，我已将所有标注打包在一起，并添加了一些包含所有图像文件 ID 和统计信息的“info”CSV 文件。我的数据集依赖于 `\u003Cset>-info.csv` 文件。请参阅 
https:\u002F\u002Fstorage.googleapis.com\u002Fopenimages\u002Fweb\u002Ffactsfigures.html 查看这些标注的许可信息。标注由 Google LLC 根据 CC BY 4.0 许可证授权。图像则被列为 CC BY 2.0 许可证。\n\n```\nwget https:\u002F\u002Fgithub.com\u002Frwightman\u002Fefficientdet-pytorch\u002Freleases\u002Fdownload\u002Fv0.1-anno\u002Fopenimages-annotations.tar.bz2\nwget https:\u002F\u002Fgithub.com\u002Frwightman\u002Fefficientdet-pytorch\u002Freleases\u002Fdownload\u002Fv0.1-anno\u002Fopenimages-annotations-challenge-2019.tar.bz2\nfind . -name '*.tar.bz2' -exec tar xf {} \\;\n```\n\n#### 数据布局\n\n当所有内容下载并解压完成后，您的 OpenImages 数据文件夹根目录应包含以下内容：\n```\nannotations\u002F\u003Copenimages v5\u002Fv6 的 CSV 标注>\nannotations\u002Fchallenge-2019\u002F\u003Cchallenge2019 的 CSV 标注>\ntrain\u002F0\u002F\u003C所有以 '0' 开头的图像文件>\n.\n.\n.\ntrain\u002Ff\u002F\u003C所有以 'f' 开头的图像文件>\nvalidation\u002F\u003C所有图像文件放在同一个文件夹中>\n```\n\n#### OpenImages 训练\n使用 Challenge2019 标注（500 类）进行训练：\n`.\u002Fdistributed_train.sh 4 \u002Fdata\u002Fopenimages --model efficientdet_d0 --dataset openimages-challenge2019 -b 7 --amp --lr .042 --sync-bn --opt fusedmomentum --warmup-epochs 1 --lr-noise 0.4 0.9 --model-ema --model-ema-decay 0.999966 --epochs 100 --remode pixel --reprob 0.15 --recount 4 --num-classes 500 --val-skip 2`\n\nOI 的 500 类（Challenge2019）或 601 类（V5\u002FV6）头部相比 COCO 需要占用更多的 GPU 内存。您可能需要将批量大小减半。\n\n### 自定义数据集上的训练\u002F微调示例\n\n这里的模型已经成功应用于自定义训练流程和数据集，并取得了很好的效果。由于涉及许多细节问题，请不要提交“我在自定义数据集上得到糟糕结果”的问题。如果您能在一个公开、非专有、可下载的数据集上重现问题，并提供包含有效数据集\u002F解析器实现的此仓库的公共 GitHub 分支，则我可能会有时间查看。\n\n示例：\n* Chris Hughes 整理了一个使用 `timm` EfficientNetV2 主干网络和最新版本 EfficientDet 模型进行训练的优秀示例：\n  * [Medium 博客文章](https:\u002F\u002Fmedium.com\u002Fdata-science-at-microsoft\u002Ftraining-efficientdet-on-custom-data-with-pytorch-lightning-using-an-efficientnetv2-backbone-1cdf3bd7921f)\n  * [Python 笔记本](https:\u002F\u002Fgist.github.com\u002FChris-hughes10\u002F73628b1d8d6fc7d359b3dcbbbb8869d7)\n* Alex Shonenkov 提供了一个清晰简洁的 Kaggle 
内核，展示了如何微调这些模型来检测小麦穗：https:\u002F\u002Fwww.kaggle.com\u002Fshonenkov\u002Ftraining-efficientdet（注意：该内容已过时，与当前版本相比有许多变化）\n\n如果您有使用不同数据集训练这些模型的良好示例脚本或内核，请随时通知我以便在此处收录……\n\n## 结果\n\n### 我的训练\n\n#### EfficientDet-D0\n\n使用4张1080ti显卡进行的最新训练，D0模型的mAP为0.336：\n`.\u002Fdistributed_train.sh 4 \u002Fmscoco --model efficientdet_d0 -b 22 --amp --lr .12 --sync-bn --opt fusedmomentum --warmup-epochs 5 --lr-noise 0.4 0.9 --model-ema --model-ema-decay 0.9999`\n\n上述超参数设置得到了一个不错的模型，有几点值得注意：\n* mAP在很早的时候（300个epoch中的第200个）就达到了峰值，随后似乎出现了过拟合，因此可能还有改进空间。\n* 我启用了实验性的学习率噪声，这种策略在启用EMA时通常效果较好。\n* 实际有效学习率略高于官方设定。官方设定的批量为64时的学习率为0.08，而这里的计算结果为0.0872。\n* drop_path（也称为生存概率或drop_connect）率为0.1，这比官方建议的D0模型的0.0要高，但低于其他模型的0.2。\n* EMA的持续时间比默认设置更长。\n\nVAL2017\n```\n 平均精度（AP）@[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.336251\n 平均精度（AP）@[ IoU=0.50      | area=   all | maxDets=100 ] = 0.521584\n 平均精度（AP）@[ IoU=0.75      | area=   all | maxDets=100 ] = 0.356439\n 平均精度（AP）@[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.123988\n 平均精度（AP）@[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.395033\n 平均精度（AP）@[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.521695\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.287121\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.441450\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.467914\n 平均召回率（AR）@[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.197697\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.552515\n 平均召回率（AR）@[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.689297\n```\n\n#### EfficientDet-D1 \n\n最新一次运行的mAP为0.394（使用4张1080ti显卡）：\n`.\u002Fdistributed_train.sh 4 \u002Fmscoco --model efficientdet_d1 -b 10 --amp --lr .06 --sync-bn --opt fusedmomentum --warmup-epochs 5 --lr-noise 0.4 0.9 --model-ema --model-ema-decay 0.99995`\n\n这次训练中我使用了一些改进的数据增强方法，目前仍在实验阶段，尚未准备好发布。如果没有这些增强，模型的表现应该也不错，但可能会更快地出现过拟合，最终的mAP可能在0.385到0.39之间。\n\n### 移植的TensorFlow权重\n\n#### TEST-DEV2017\n\n注意：我目前只尝试将D7提交到开发服务器进行了简单验证。\n\n##### 
TF-EfficientDet-D7\n```\n 平均精度（AP）@[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.534\n 平均精度（AP）@[ IoU=0.50      | area=   all | maxDets=100 ] = 0.726\n 平均精度（AP）@[ IoU=0.75      | area=   all | maxDets=100 ] = 0.577\n 平均精度（AP）@[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.356\n 平均精度（AP）@[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.569\n 平均精度（AP）@[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.660\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.397\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.644\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.682\n 平均召回率（AR）@[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.508\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.718\n 平均召回率（AR）@[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.818\n```\n\n#### VAL2017\n\n##### TF-EfficientDet-D0\n```\n 平均精度（AP）@[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.341877\n 平均精度（AP）@[ IoU=0.50      | area=   all | maxDets=100 ] = 0.525112\n 平均精度（AP）@[ IoU=0.75      | area=   all | maxDets=100 ] = 0.360218\n 平均精度（AP）@[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.131366\n 平均精度（AP）@[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.399686\n 平均精度（AP）@[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.537368\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.293137\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.447829\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.472954\n 平均召回率（AR）@[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.195282\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.558127\n 平均召回率（AR）@[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.695312\n```\n\n##### TF-EfficientDet-D1\n```\n 平均精度（AP）@[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.401070\n 平均精度（AP）@[ IoU=0.50      | area=   all | maxDets=100 ] = 0.590625\n 平均精度（AP）@[ IoU=0.75      | area=   all | maxDets=100 ] = 0.422998\n 平均精度（AP）@[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.211116\n 
平均精度（AP）@[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.459650\n 平均精度（AP）@[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.577114\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.326565\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.507095\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.537278\n 平均召回率（AR）@[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.308963\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.610450\n 平均召回率（AR）@[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.731814\n```\n\n##### TF-EfficientDet-D2\n```\n 平均精度（AP）@[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.434042\n 平均精度（AP）@[ IoU=0.50      | area=   all | maxDets=100 ] = 0.627834\n 平均精度（AP）@[ IoU=0.75      | area=   all | maxDets=100 ] = 0.463488\n 平均精度（AP）@[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.237414\n 平均精度（AP）@[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.486118\n 平均精度（AP）@[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.606151\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.343016\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.538328\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.571489\n 平均召回率（AR）@[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.350301\n 平均召回率（AR）@[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.638884\n 平均召回率（AR）@[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.746671\n```\n\n##### TF EfficientDet-D3\n\n```\n 平均精度  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.471223\n 平均精度  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.661550\n 平均精度  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.505127\n 平均精度  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301385\n 平均精度  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.518339\n 平均精度  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.626571\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.365186\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | 
maxDets= 10 ] = 0.582691\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.617252\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.424689\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.670761\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.779611\n```\n\n##### TF-EfficientDet-D4\n ```\n 平均精度  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.491759\n 平均精度  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.686005\n 平均精度  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.527791\n 平均精度  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.325658\n 平均精度  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.536508\n 平均精度  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.635309\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.373752\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.601733\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.638343\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.463057\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.685103\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.789180\n```\n\n##### TF-EfficientDet-D5\n```\n 平均精度  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.511767\n 平均精度  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.704835\n 平均精度  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.552920\n 平均精度  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.355680\n 平均精度  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.551341\n 平均精度  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.650184\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.384516\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.619196\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.657445\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 
0.499319\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.695617\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.788889\n```\n\n##### TF-EfficientDet-D6\n```\n 平均精度  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.520200\n 平均精度  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.713204\n 平均精度  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.560973\n 平均精度  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.361596\n 平均精度  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.567414\n 平均精度  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.657173\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.387733\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.629269\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.667495\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.499002\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.711909\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.802336\n```\n\n##### TF-EfficientDet-D7\n ```\n 平均精度  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.531256\n 平均精度  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.724700\n 平均精度  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.571787\n 平均精度  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.368872\n 平均精度  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.573938\n 平均精度  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.668253\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.393620\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.637601\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.676987\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.524850\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.717553\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.806352\n 
```\n\n##### TF-EfficientDet-D7X\n\n```\n 平均精度  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.543\n 平均精度  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.737\n 平均精度  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.585\n 平均精度  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.401\n 平均精度  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.579\n 平均精度  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.680\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.398\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.649\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.689\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.550\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.725\n 平均召回率     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.823\n```\n\n\n\n## 待办事项\n- [x] 基础训练（目标检测）的重新实现\n- [ ] 马赛克增强\n- [ ] 随机\u002F自动增强\n- [ ] BBOX IoU 损失（GIoU、DIoU、CIoU 等）\n- [ ] 语义分割实验\n- [ ] 与 Detectron2 \u002F MMDetection 代码库的集成\n- [ ] 添加并清理我过去项目中使用过的基于 EfficientNet 的 U-Net 和 DeepLab 分割模型\n- [x] 添加并清理过去项目中的 OpenImages 数据集\u002F训练支持\n- [ ] 探索实例分割的可能性...\n\n如果您所在的组织对赞助这些工作感兴趣，或者对可能的未来发展方向的优先级安排感兴趣，请随时联系我（通过 GitHub Issues、LinkedIn、Twitter 或发送邮件至 hello@rwightman.com）。如果有兴趣，我将设立一个 GitHub Sponsorship。","# EfficientDet (PyTorch) 快速上手指南\n\n本指南基于 `efficientdet-pytorch` 项目，帮助开发者快速在 PyTorch 环境中部署和使用 EfficientDet 目标检测模型。该项目复现了 Google Brain 的官方 TensorFlow 版本，并提供了灵活的实验配置。\n\n## 1. 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux (推荐)\n*   **Python 版本**: 3.7 - 3.9\n*   **PyTorch 版本**: 1.6 - 2.0+ (推荐使用原生 AMP 和 DDP)\n*   **核心依赖**:\n    *   `timm` (PyTorch Image Models) >= 0.9\n    *   `pycocotools` (用于 COCO 数据集评估)\n\n**⚠️ 重要注意事项**:\n存在 `numpy` 与 `pycocotools` 的版本冲突问题。如果安装 `pycocotools` 失败或报错，请强制安装旧版 numpy：\n```bash\npip install \"numpy\u003C=1.17.5\"\n# 或者确保安装 pycocotools >= 2.0.2\n```\n\n## 2. 
Installation Steps\n\nWe recommend creating a dedicated Conda environment and installing dependencies with pip. Users in mainland China can use the Tsinghua or Aliyun mirrors to speed up downloads.\n\n### 2.1 Create and activate an environment\n```bash\nconda create -n effdet python=3.8\nconda activate effdet\n```\n\n### 2.2 Install PyTorch\nVisit the [PyTorch website](https:\u002F\u002Fpytorch.org\u002F) for the install command matching your CUDA version. Example (CUDA 11.8):\n```bash\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n```\n*Mirror note*: a PyPI mirror such as `-i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple` can speed up installs, but the official PyTorch wheels may not be mirrored there; install torch from the official index above or from a mirror's dedicated torch channel.\n\n### 2.3 Install project dependencies\nInstall `timm` and the other required libraries:\n```bash\n# install timm and other dependencies (optionally via a local mirror);\n# quote the spec so the shell does not treat \">=\" as a redirection\npip install \"timm>=0.9\" opencv-python-headless pycocotools -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n\n# clone this project\ngit clone https:\u002F\u002Fgithub.com\u002Frwightman\u002Fefficientdet-pytorch.git\ncd efficientdet-pytorch\n```\n\n*(Optional) For training, install apex (only if you want the legacy mixed-precision path) or simply use native PyTorch AMP.*\n\n## 3. Basic Usage\n\n### 3.1 Load a pretrained model\nThe following is a minimal inference example: load a pretrained `tf_efficientdet_d0` model and run a forward pass.\n\n```python\nimport torch\nfrom effdet import create_model_from_config\nfrom effdet.config import get_efficientdet_config\n\n# 1. Get the model config (d0 as an example)\nconfig = get_efficientdet_config('tf_efficientdet_d0')\n\n# 2. Create the model\n# pretrained=True downloads the matching weights; bench_task='predict'\n# wraps the model for inference (box decoding + NMS)\nmodel = create_model_from_config(config, pretrained=True, bench_task='predict')\n\n# 3. Switch to eval mode and move to the GPU\nmodel = model.cuda()\nmodel.eval()\n\n# 4. Prepare input data (Batch, Channels, Height, Width)\n# EfficientDet expects input sizes divisible by 128, e.g. 512x512\ninput_tensor = torch.randn(1, 3, 512, 512).cuda()\n\n# 5. 
Run inference\nwith torch.no_grad():\n    # output shape is typically [batch, num_detections, 6]\n    # last dim: [x_min, y_min, x_max, y_max, score, class_id]\n    output = model(input_tensor)\n\nprint(output.shape)\n```\n\n### 3.2 Command-line validation\u002Finference\nThe project provides convenient scripts for validating on a dataset and running benchmarks.\n\n**Validate on the COCO dataset:**\n```bash\npython validate.py \u002Fpath\u002Fto\u002Fcoco --model tf_efficientdet_d0 --pretrained\n```\n\n**Single-image inference (write a small script or adapt the bench scripts):**\nFor a quick test, run the Python snippet above in an interactive session.\n\n### 3.3 Choosing a model variant\nSeveral pretrained models are supported; switch by changing the model name. Common choices:\n\n| Model name | Input size | Notes |\n| :--- | :--- | :--- |\n| `tf_efficientdet_lite0` | 320 | Lightweight, suited to mobile |\n| `tf_efficientdet_d0` | 512 | Balanced speed and accuracy, good starting point |\n| `tf_efficientdet_d3` | 896 | High accuracy |\n| `efficientdetv2_ds` | 1024 | EfficientNetV2 backbone, stronger performance |\n\nJust replace the model-name string when creating the model, for example:\n```python\nconfig = get_efficientdet_config('tf_efficientdet_d3')\n```","An e-commerce logistics team is developing an automated parcel-sorting system that needs to identify goods of different sizes and types on a conveyor belt in real time to guide a robotic arm.\n\n### Without efficientdet-pytorch\n- **Accuracy and speed were hard to get together**: the team was stuck between lightweight models (fast but with many missed detections) and large models (accurate but high-latency), unable to meet real-time and accuracy requirements on edge devices at the same time.\n- **Customizing the backbone was difficult**: most open-source implementations hard-code the network structure; the team wanted to try the newer EfficientNetV2 backbone for better feature extraction, but tight coupling forced extensive rewrites of low-level code.\n- **Reproducing official results was costly**: porting Google's official TensorFlow weights to PyTorch is tedious, and other community ports often cannot reproduce the MS COCO benchmarks when training from scratch, leaving model tuning without a reliable starting point.\n- **Long experiment cycles**: with the bidirectional feature pyramid (BiFPN) connection pattern fixed, researchers could not quickly test how different feature-fusion strategies affect detection of specific goods.\n\n### With efficientdet-pytorch\n- **Best balance of efficiency**: using the provided D0-D7 and Lite pretrained models, the team quickly deployed an edge-friendly solution that kept latency low while markedly improving recall on small parcels.\n- **Flexible backbone swapping**: thanks to deep integration with the `timm` library, developers can switch to EfficientNetV2 or any backbone that supports feature extraction just by changing configuration parameters, without touching core code.\n- **Official weights load seamlessly**: directly loading the ported Google weights (including the AdvProp variants) gives state-of-the-art initial performance, greatly shortening the time from environment setup to business validation.\n- **Efficient architecture experiments**: the BiFPN connection pattern and convolution types are fully configurable, so engineers can quickly adjust feature-fusion strategies and optimize the model for irregular parcels, accelerating iteration.\n\nWith its high-fidelity reproduction of the official implementation and highly flexible architecture, efficientdet-pytorch helped the team build an accurate, low-latency industrial-grade detection system on limited compute.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frwightman_efficientdet-pytorch_442c790e.png","rwightman","Ross Wightman","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Frwightman_2b42e5de.jpg","AI, Computer Vision. 
Always learning, constantly curious. Building ML\u002FAI systems, watching loss curves.",null,"Vancouver, BC, Canada","wightmanr","rwightman.com","https:\u002F\u002Fgithub.com\u002Frwightman",[82,86],{"name":83,"color":84,"percentage":85},"Python","#3572A5",100,{"name":87,"color":88,"percentage":89},"Shell","#89e051",0,1656,302,"2026-04-11T03:43:51","Apache-2.0","Linux","Requires an NVIDIA GPU with CUDA support; VRAM needs depend on model size and input resolution (e.g. D7X needs substantial memory); native PyTorch AMP is supported","Not specified",{"notes":98,"python":99,"dependencies":100},"Note that numpy 1.18+ conflicts with pycocotools 2.0; force-install numpy\u003C=1.17.5 or make sure pycocotools>=2.0.2 is installed. Native PyTorch AMP and DDP are recommended over Apex. In distributed training, combining synchronized batch norm (SyncBN) with model EMA on certain backbones can prevent convergence; disabling one of the two is recommended.","3.7 - 3.9",[101,102,103,104],"torch>=1.6","timm>=0.4.12","pycocotools>=2.0.2","numpy\u003C=1.17.5 (if pycocotools\u003C2.0.2)",[15,14],[107,108,109,110,111],"efficientdet","efficientnet","object-detection","semantic-segmentation","pytorch","2026-03-27T02:49:30.150509","2026-04-16T15:50:51.865775",[115,120,125,130,135,139],{"id":116,"question_zh":117,"answer_zh":118,"source_url":119},36159,"How do I train EfficientDet on a custom dataset? What if I have only a few classes and results are poor?","You can train on a dataset with a custom number of classes. Override `config.num_classes` (e.g. set it to 11) and `config.image_size`. Note that bounding-box coordinates must be in yxyx format and class indices should start at 1 (because of how fast_collate works). If results are poor, check that you are using an appropriate threshold (e.g. a box threshold of 0.3 rather than 0.001), which reduces the number of boxes the mask head has to process and improves precision. One user reported 42% AP with a d2 model on 10K images.","https:\u002F\u002Fgithub.com\u002Frwightman\u002Fefficientdet-pytorch\u002Fissues\u002F72",{"id":121,"question_zh":122,"answer_zh":123,"source_url":124},36160,"What should I do when batch_label_anchors raises an IndexError because the ground truth contains only one bounding box?","This is usually already fixed on PyTorch 1.6 and 1.7. The root cause is tensor dimensionality: the target box tensor should have shape [B, N, 4] and the class tensor [B, N]. Even if a sample has only one box, do not drop the N dimension. It is recommended to fix N (e.g. COCO's default maximum of 100 instances) and zero-pad the missing instances. The default training pipeline passes a box tensor of shape [B, 100, 4] and a class tensor of shape [B, 100].","https:\u002F\u002Fgithub.com\u002Frwightman\u002Fefficientdet-pytorch\u002Fissues\u002F44",{"id":126,"question_zh":127,"answer_zh":128,"source_url":129},36161,"Is EfficientNet-Lite supported as a backbone? How do I use the official Lite model weights?","The maintainer is working on an update to support the official Lite models, but supporting feature sizes not divisible by 128 is a substantial change that needs thorough testing. The Google AutoML repository currently provides checkpoints for 6 EfficientDet-Lite models, which in theory can be used directly with the tf-variant Lite model configs already in this codebase, but wait for the official update to ensure compatibility.","https:\u002F\u002Fgithub.com\u002Frwightman\u002Fefficientdet-pytorch\u002Fissues\u002F31",{"id":131,"question_zh":132,"answer_zh":133,"source_url":134},36162,"What should I do if the initial loss is extremely high or training is unstable when training on COCO from scratch?","Make sure you use the right hyperparameters and optimizer settings. Recommended command: `python3 train.py \u002Ffast-data\u002Fcoco --model tf_efficientdet_d0 -b 32 --amp --lr .05 --warmup-epochs 5 --sync-bn --opt fusedmomentum --fill-color mean --model-ema`. Enabling mixed precision (--amp) and the exponential moving average (--model-ema) helps stabilize training. Configured correctly, the initial loss should drop dramatically (e.g. from 10000 to around 18.4) and the loss curve should stay smooth without exploding.","https:\u002F\u002Fgithub.com\u002Frwightman\u002Fefficientdet-pytorch\u002Fissues\u002F13",{"id":136,"question_zh":137,"answer_zh":138,"source_url":134},36163,"How do I handle the integer-division warnings that appear during training?","This is a PyTorch deprecation warning: in a future version the `\u002F` operator will perform true division. Replace integer tensor divisions with `true_divide` or `floor_divide` (i.e. `\u002F\u002F` in Python). It usually does not affect current runs, but updating the code as the warning suggests keeps it compatible with future versions.",{"id":140,"question_zh":141,"answer_zh":142,"source_url":134},36164,"How do I resolve the error that occurs when the --ema flag is enabled?","This is a known issue that only occurs when `--ema` is set. The maintainer has confirmed the bug and opened a dedicated tracking issue (Issue #18) for the fix. For now, train without `--model-ema`, or check that issue for the latest patch.",[144,148,153,158],{"id":145,"version":146,"summary_zh":76,"released_at":147},288926,"v0.2.4","2021-05-01T00:11:57",{"id":149,"version":150,"summary_zh":151,"released_at":152},288927,"v0.1-anno","Some pre-collected dataset annotations that may be inconvenient to gather by hand.","2020-10-22T19:52:48",{"id":154,"version":155,"summary_zh":156,"released_at":157},288928,"v0.1.6","Latest TF weights, including the D7X model.","2020-09-18T18:04:54",{"id":159,"version":160,"summary_zh":161,"released_at":162},288929,"v0.1","Models with the `tf_` prefix are ported from TensorFlow pretrained weights and models; those without the prefix were trained with this codebase.","2020-04-09T09:09:10"]