[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-hustvl--SparseInst":3,"tool-hustvl--SparseInst":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 
是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":79,"owner_url":80,"languages":81,"stars":90,"forks":91,"last_commit_at":92,"license":93,"difficulty_score":94,"env_os":95,"env_gpu":96,"env_ram":97,"env_deps":98,"category_tags":108,"github_topics":109,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":115,"updated_at":116,"faqs":117,"releases":147},3613,"hustvl\u002FSparseInst","SparseInst","[CVPR 2022] SparseInst: Sparse Instance Activation for Real-Time Instance Segmentation","SparseInst 是一款专为实时实例分割设计的高效开源框架，曾发表于计算机视觉顶会 CVPR 2022。它致力于解决传统目标检测算法在速度与精度之间难以兼顾的痛点，特别是在处理复杂场景下的物体识别与轮廓分割时，往往依赖繁琐的后处理步骤（如非极大值抑制 NMS），导致推理速度受限。\n\n与其他依赖锚框或中心点的方法不同，SparseInst 创新性地提出了“实例激活图”（IAM）概念。这种方法能自适应地高亮显示物体中信息最丰富的区域，直接通过聚合特征完成识别与分割，无需任何排序或复杂的后处理操作。这种全卷积的设计不仅让架构更加简洁，还显著提升了部署便利性。实测数据显示，它在保持每秒 40 帧以上高速运行的同时，仍能取得优异的分割精度，实现了速度与性能的完美平衡。\n\n此外，SparseInst 支持多种主流骨干网络，并提供了 FP16 加速、ONNX 模型导出以及 MindSpore 框架实现等实用功能，进一步降低了使用门槛。无论是从事自动驾驶、视频监控算法研发的工程师，还是希望探索前沿分割技术的学术研究人员，都能利用 SparseInst 快速构建高性能的视觉应用。","\u003Cdiv align=\"center\">\n\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhustvl_SparseInst_readme_44994886342b.gif\">\n\u003Cbr>\n\u003Cbr>\nTianheng Cheng, \u003Ca href=\"https:\u002F\u002Fxwcv.github.io\u002F\">Xinggang Wang\u003C\u002Fa>\u003Csup>\u003Cspan>&#8224;\u003C\u002Fspan>\u003C\u002Fsup>, Shaoyu Chen, Wenqiang Zhang, \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=pCY-bikAAAAJ&hl=zh-CN\">Qian Zhang\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=IyyEKyIAAAAJ&hl=zh-CN\">Chang Huang\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fzhaoxiangzhang.net\u002F\">Zhaoxiang Zhang\u003C\u002Fa>, \u003Ca href=\"http:\u002F\u002Feic.hust.edu.cn\u002Fprofessor\u002Fliuwenyu\u002F\"> Wenyu Liu\u003C\u002Fa>\n\u003C\u002Fbr>\n(\u003Cspan>&#8224;\u003C\u002Fspan>: corresponding author)\n\n\u003C!-- \u003Cdiv>\u003Ca href=\"\">[Project Page]\u003C\u002Fa>(comming soon)\u003C\u002Fdiv>  -->\n\u003Cdiv>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.12827\">[arXiv paper]\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FCheng_Sparse_Instance_Activation_for_Real-Time_Instance_Segmentation_CVPR_2022_paper.pdf\">[CVPR paper]\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1xhqQvQ0YVCHd8XQxnCVqef75Hey7kI-d\u002Fview?usp=sharing\">[slides]\u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\n\n\n## Highlights \n\n\u003Cdiv align=\"center\">\n\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhustvl_SparseInst_readme_21ccb57c5512.gif\">\n\u003Cbr>\n\u003Cbr>\n\u003Cdiv>\n\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fsparse-instance-activation-for-real-time\u002Freal-time-instance-segmentation-on-mscoco)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Freal-time-instance-segmentation-on-mscoco?p=sparse-instance-activation-for-real-time)\n\u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\n\n\n* SparseInst presents a new object representation method, *i.e.*, Instance Activation Maps (IAM), to adaptively highlight informative regions of objects for recognition.\n* SparseInst is a simple, efficient, and fully convolutional framework without non-maximum suppression (NMS) or sorting, and it is easy to deploy!\n* SparseInst achieves a good trade-off between speed and accuracy, *e.g.*, 37.9 AP and 40 FPS with a 608-pixel (shorter side) input.\n\n\n\n## Updates\n\n`This project is under active development, please stay tuned!` &#9749;\n\n* `[2022-10-31]`: We release the models & weights for the [`CSP-DarkNet53`](configs\u002Fsparse_inst_cspdarknet53_giam.yaml) backbone, which is a strong baseline with highly competitive inference speed and accuracy.\n\n* `[2022-10-19]`: We provide the implementation and inference code based on [MindSpore](https:\u002F\u002Fwww.mindspore.cn\u002F), a nice and efficient Deep Learning framework. Thanks to [Ruiqi Wang](https:\u002F\u002Fgithub.com\u002FRuiqiWang00) for this kind contribution!\n\n* `[2022-8-9]`: We provide the FLOPs counter [`get_flops.py`](.\u002Ftools\u002Fget_flops.py) to obtain the FLOPs\u002FParameters of SparseInst. This update also includes some bug fixes.\n\n* `[2022-7-17]`: `Faster`&#128640;:  SparseInst now supports [training and inference with **FP16**](https:\u002F\u002Fgithub.com\u002Fhustvl\u002FSparseInst#-sparseinst-with-fp16). Inference with FP16 improves the speed by **30\\%**. 
`Robust`: we replace the `Sigmoid + Norm` with [`Softmax`](configs\u002Fsparse_inst_r50_giam_softmax.yaml) for numerical stability, especially for ONNX. `Easy-to-Use`: we provide the [script](.\u002Fonnx\u002Fconvert_onnx.py) for exporting SparseInst to ONNX models.\n\n* `[2022-4-29]`: We fixed a **common issue** with the visualization script `demo.py`, *e.g.,* `ValueError: GenericMask cannot handle ...`. \n\n* `[2022-4-7]`: We provide the `demo` code for visualization and inference on images. Besides, we have added more backbones for SparseInst, including [ResNet-101](https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.03385), [CSPDarkNet](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2004.10934v1.pdf), and [PVTv2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.12122). We are still adding support for more backbones.\n\n* `[2022-3-25]`: We have released the code and models for SparseInst! \n\n \n\n## Overview\n**SparseInst** is a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation.\nIn contrast to region boxes or anchors (centers), SparseInst adopts a sparse set of **instance activation maps** as the object representation, to highlight informative regions for each foreground object.\nThen it obtains the instance-level features by aggregating features according to the highlighted regions for recognition and segmentation.\nThe bipartite matching compels the instance activation maps to predict objects in a one-to-one style, thus avoiding non-maximum suppression (NMS) in post-processing. 
Owing to the simple yet effective design with instance activation maps, SparseInst runs very fast: it achieves **40 FPS** and **37.9 AP** on COCO (NVIDIA 2080Ti), significantly outperforming its counterparts in terms of speed and accuracy.\n\n\n\u003Ccenter>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhustvl_SparseInst_readme_04f0bd07d129.png\">\n\u003C\u002Fcenter>\n\n\n## Models\n\nWe provide two versions of SparseInst, *i.e.*, the basic IAM (3x3 convolution) and the Group IAM (G-IAM for short), with different backbones.\nAll models are trained on MS-COCO *train2017*.\n\n#### Fast models\n\n| model | backbone | input | aug | AP\u003Csup>val\u003C\u002Fsup> |  AP  | FPS | weights |\n| :---- | :------  | :---: | :-: |:--------------: | :--: | :-: | :-----: |\n| [SparseInst](configs\u002Fsparse_inst_r50_base.yaml) | [R-50](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Ee6nPXlj1eewAnooYtoPtLzbRp_mDxfB\u002Fview?usp=sharing) | 640 | &#x2718; | 32.8 | 33.2 | 44.3 | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F12RQLHD5EZKIOvlqW3avUCeYjFG1NPKDy\u002Fview?usp=sharing) |\n| [SparseInst](sparse_inst_r50vd_base.yaml) | [R-50-vd](https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models\u002Freleases\u002Fdownload\u002Fv0.1-weights\u002Fresnet50d_ra2-464e36ba.pth) | 640 | &#x2718; | 34.1 | 34.5 | 42.6 | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1fjPFy35X2iJu3tYwVdAq4Bel82PfH5kx\u002Fview?usp=sharing)|\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r50_giam.yaml) | [R-50](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Ee6nPXlj1eewAnooYtoPtLzbRp_mDxfB\u002Fview?usp=sharing) | 608 | &#x2718; | 33.4 | 34.0 | 44.6 | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1pXU7Dsa1L7nUiLU9ULG2F6Pl5m5NEguL\u002Fview?usp=sharing) |\n| [SparseInst (G-IAM, Softmax)](configs\u002Fsparse_inst_r50_giam_soft.yaml) | 
[R-50](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Ee6nPXlj1eewAnooYtoPtLzbRp_mDxfB\u002Fview?usp=sharing) | 608 | &#x2718; | 33.6 | - | 44.6 | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1doterrG89SjmLxDyU8IhLYRGxVH69sR2\u002Fview?usp=sharing) |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r50_giam_aug.yaml) | [R-50](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Ee6nPXlj1eewAnooYtoPtLzbRp_mDxfB\u002Fview?usp=sharing) | 608 | &#10003; | 34.2 | 34.7 | 44.6 | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1MK8rO3qtA7vN9KVSBdp0VvZHCNq8-bvz\u002Fview?usp=sharing) |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r50_dcn_giam_aug.yaml) | [R-50-DCN](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Ee6nPXlj1eewAnooYtoPtLzbRp_mDxfB\u002Fview?usp=sharing) | 608 | &#10003;| 36.4 | 36.8 | 41.6 | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1qxdLRRHbIWEwRYn-NPPeCCk6fhBjc946\u002Fview?usp=sharing) |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r50vd_giam_aug.yaml) | [R-50-vd](https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models\u002Freleases\u002Fdownload\u002Fv0.1-weights\u002Fresnet50d_ra2-464e36ba.pth) | 608 | &#10003;| 35.6 | 36.1 | 42.8| [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1dlamg7ych_BdWpPUCuiBXbwE0SXpsfGx\u002Fview?usp=sharing) |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r50vd_dcn_giam_aug.yaml) | [R-50-vd-DCN](https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models\u002Freleases\u002Fdownload\u002Fv0.1-weights\u002Fresnet50d_ra2-464e36ba.pth) | 608 | &#10003; | 37.4 | 37.9 | 40.0  | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1clYPdCNrDNZLbmlAEJ7wjsrOLn1igOpT\u002Fview?usp=sharing)|\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r50vd_dcn_giam_aug.yaml) | 
[R-50-vd-DCN](https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models\u002Freleases\u002Fdownload\u002Fv0.1-weights\u002Fresnet50d_ra2-464e36ba.pth) | 640 | &#10003; | 37.7 | 38.1 | 39.3 |  [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1clYPdCNrDNZLbmlAEJ7wjsrOLn1igOpT\u002Fview?usp=sharing)| \n\n#### SparseInst with other backbones\n\n| model | backbone | input | AP\u003Csup>val\u003C\u002Fsup> |  AP  | FPS | weights |\n| :---- | :------ | :---: | :--------------: | :--: | :-: | :-----: |\n| SparseInst (G-IAM) | [CSPDarkNet](configs\u002Fsparse_inst_cspdarknet53_giam.yaml) | 640 | 35.1 | -| - | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1rcUJWUbusM216Zbtmo_xB774jdjb3qSt\u002Fview?usp=sharing) |\n\n#### Larger models\n\n| model | backbone | input | aug  | AP\u003Csup>val\u003C\u002Fsup> |  AP  | FPS | weights |\n| :---- | :------ | :---: | :---: | :--------------: | :--: | :-: | :-----: |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r101_giam.yaml) | [R-101](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1-6ZBvC55unwuHvGn-Xf4xuy2Qr1vC7Zo\u002Fview?usp=sharing) | 640 | &#x2718; | 34.9 | 35.5 | - | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1EZZck-UNfom652iyDhdaGYbxS0MrO__z\u002Fview?usp=sharing)|\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r101_dcn_giam.yaml) | [R-101-DCN](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1-6ZBvC55unwuHvGn-Xf4xuy2Qr1vC7Zo\u002Fview?usp=sharing) | 640 | &#x2718; | 36.4 | 36.9 | - | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1shkFvyBmDlWRxl1ActD6VfZJTJYBGBjv\u002Fview?usp=sharing) |\n\n#### SparseInst with Vision Transformers\n\n| model | backbone | input | aug | AP\u003Csup>val\u003C\u002Fsup> |  AP  | FPS | weights |\n| :---- | :------ | :---: | :---: | :--------------: | :--: | :-: | :-----: |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_pvt_b1_giam.yaml) | 
[PVTv2-B1](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1B7JTO0WqyhFn7nvUlRf6qKQrFzTnRWDC\u002Fview?usp=sharing) | 640 |  &#x2718; | 35.3 | 36.0 | 33.5 (48.9\u003Csup>&#x021A1;\u003C\u002Fsup>)| [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F13l9JgTz3sF6j3vSVHOOhAYJnCf-QuNe_\u002Fview?usp=sharing) |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_pvt_b2_li_giam.yaml) | [PVTv2-B2-li](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1YhjCH4FZa9ekWUqa-JovEfAR2wuUXEtQ\u002Fview?usp=sharing) | 640 |  &#x2718; | 37.2 | 38.2 | 26.5 | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1DFxQnFg_UL6kmMoNC4StUKo79RXVHyNF\u002Fview?usp=sharing) |\n\n\u003Csup>&#x021A1;\u003C\u002Fsup>: measured on RTX 3090.\n\n\n**Note:** \n* **We will continue adding more models**, including more efficient convolutional networks, vision transformers, and larger models for high performance and high speed. Please stay tuned &#128513;!\n* Inference speeds are measured on one NVIDIA 2080Ti unless otherwise specified.\n* We have not yet adopted TensorRT or other tools to accelerate the inference of SparseInst. However, we are working on it now and will provide support for ONNX, TensorRT, MindSpore, [Blade](https:\u002F\u002Fgithub.com\u002Falibaba\u002FBladeDISC), and other frameworks as soon as possible!\n* AP denotes the AP evaluated on MS-COCO *test-dev2017*.\n* *input* denotes the shorter side of the input, *e.g.*, 512x864 and 608x864; we keep the aspect ratio of the input, and the longer side is no more than 864.\n* The inference speed may vary slightly across different machines (2080 Ti) and different versions of detectron2 (we mainly use [v0.3](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Ftree\u002Fv0.3)). 
If the difference is large, e.g., > 5 ms, please feel free to contact us.\n* For `aug` (augmentation), we only adopt the simple random crop (crop size: [384, 600]) provided by detectron2.\n* We adopt `weight decay=5e-2` as the default setting, which is slightly different from the original paper.\n* **[Weights on BaiduPan]**: we also provide trained models on BaiduPan: [ShareLink](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1tot7Wcoi4J1xh8ZS7VikZg) (password: lkdo).\n\n## Installation and Prerequisites\n\nThis project is built upon the excellent framework [detectron2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2). You should install detectron2 first; please check the [official installation guide](https:\u002F\u002Fdetectron2.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Finstall.html) for more details.\n\n**Updates:** SparseInst works well on [detectron2-v0.6](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Ftree\u002Fv0.6). \n\n**Note:** previously, we mainly used [v0.3](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Ftree\u002Fv0.3) of detectron2 for experiments and evaluations. Besides, we have also tested our code on the newest version [v0.6](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Ftree\u002Fv0.6). 
If you find any bugs or incompatibility problems with a higher version of detectron2, please feel free to raise an issue!\n\nInstall detectron2:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2.git\ncd detectron2\n# optionally switch to a specific version, e.g., v0.3 (recommended) or v0.6\ngit checkout tags\u002Fv0.6\n# build detectron2\npython setup.py build develop\n```\n\n## Getting Started\n\n\n### &#128293; SparseInst with FP16\n\nSparseInst with FP16 achieves 30% faster inference speed and saves a considerable amount of training memory. The table below compares memory usage, inference speed, and training speed.\n\n|  FP16 | train mem.(log) | train mem.(`nvidia-smi`) | train speed | infer. speed | \n| :---: | :-------------: | :----------------------: | :---------: | :----------: |\n| &#x2718; | 6.0G | 10.5G | 0.8690s\u002Fiter | 52.17 FPS |\n| &#10003; | 3.9G | 6.8G  | 0.6949s\u002Fiter | 67.57 FPS |\n\nNote: statistics are measured on NVIDIA 3090. With FP16, training is faster, and the batch size can also be increased for better performance.\n\n* Training with FP16: enabling FP16 is simple. You only need to set `SOLVER.AMP.ENABLED=True`, either on the command line as shown below or in the config file.\n\n```bash\npython tools\u002Ftrain_net.py --config-file configs\u002Fsparse_inst_r50_giam_fp16.yaml --num-gpus 8 SOLVER.AMP.ENABLED True\n```\n\n* Testing with FP16: enable FP16 for inference by adding `--fp16`.\n\n```bash\npython tools\u002Ftest_net.py --config-file configs\u002Fsparse_inst_r50_giam_fp16.yaml --fp16 MODEL.WEIGHTS model_final.pth \n```\n\n### Testing SparseInst\n\nBefore testing, you should specify the config file `\u003CCONFIG>` and the model weights `\u003CMODEL-PATH>`. 
In addition, you can change the input size by setting `INPUT.MIN_SIZE_TEST` either in the config file or on the command line.\n\n* [Performance Evaluation] To obtain the evaluation results, *e.g.*, mask AP on COCO, you can run:\n\n```bash\npython tools\u002Ftrain_net.py --config-file \u003CCONFIG> --num-gpus \u003CGPUS> --eval MODEL.WEIGHTS \u003CMODEL-PATH>\n# example:\npython tools\u002Ftrain_net.py --config-file configs\u002Fsparse_inst_r50_giam.yaml --num-gpus 8 --eval MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth\n```\n\n* [Inference Speed] To obtain the inference speed (FPS) on one GPU device, you can run:\n\n```bash\npython tools\u002Ftest_net.py --config-file \u003CCONFIG> MODEL.WEIGHTS \u003CMODEL-PATH> INPUT.MIN_SIZE_TEST 512\n# example:\npython tools\u002Ftest_net.py --config-file configs\u002Fsparse_inst_r50_giam.yaml MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512\n```\n\n**Note:** \n* The [`tools\u002Ftest_net.py`](.\u002Ftools\u002Ftest_net.py) only supports **1 GPU** and **1 image per batch** for measuring inference speed.\n* The inference time consists of the *pure forward time* and the *post-processing time*. The evaluation processing, data loading, and pre-processing for wrappers (*e.g.*, ImageList) are not included.\n* `COCOMaskEvaluator` is modified from [`COCOEvaluator`](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Fblob\u002Fmain\u002Fdetectron2\u002Fevaluation\u002Fcoco_evaluation.py) for evaluating mask-only results.\n\n### FLOPs and Parameters\n\nThe [`get_flops.py`](tools\u002Fget_flops.py) script is built on `detectron2` and `fvcore`. 
\n\n```bash\npython tools\u002Fget_flops.py --config-file \u003CCONFIG> --tasks parameter flop\n```\n\n### Visualizing Images with SparseInst\n\nTo run inference and visualize the segmentation results on your images, you can run:\n\n```bash\npython demo.py --config-file \u003CCONFIG> --input \u003CIMAGE-PATH> --output results --opts MODEL.WEIGHTS \u003CMODEL-PATH>\n# example\npython demo.py --config-file configs\u002Fsparse_inst_r50_giam.yaml --input datasets\u002Fcoco\u002Fval2017\u002F* --output results --opts MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512\n```\n* Besides, `demo.py` also supports inference on video (`--video-input`) and from a camera (`--webcam`). For inference on video, you might refer to [issue #9](https:\u002F\u002Fgithub.com\u002Fhustvl\u002FSparseInst\u002Fissues\u002F9) to avoid some errors.\n* `--opts` supports modifications to the config file, *e.g.,* `INPUT.MIN_SIZE_TEST 512`.\n* `--input` can be a single image or a folder of images, *e.g.,* `xxx\u002F*`.\n* If `--output` is not specified, a popup window will show the visualization results for each image.\n* Lowering the `confidence-threshold` will show more instances but with more false positives.\n\n\u003Cdiv>\n\u003Ctable align=\"center\">\n\u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhustvl_SparseInst_readme_a6f72205a7c3.jpg\" height=200>\u003C\u002Ftd>\n\u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhustvl_SparseInst_readme_95da21b280a1.jpg\" height=200>\u003C\u002Ftd>\n\u003C\u002Ftable>\n\u003Cspan>\u003Cp align=\"center\">Visualization results (SparseInst-R50-GIAM)\u003C\u002Fp>\u003C\u002Fspan>\n\u003C\u002Fdiv>\n\n\n### Training SparseInst\n\nThe default setting trains the SparseInst model on the COCO dataset with 8 GPUs. If you only have 4 GPUs or limited GPU memory, you can still train by reducing the batch size through `SOLVER.IMS_PER_BATCH` or by reducing the input size. 
If you adjust the batch size, the learning rate and schedule should be adjusted according to the linear scaling rule, i.e., the learning rate is scaled in proportion to the batch size.\n\n```bash\npython tools\u002Ftrain_net.py --config-file \u003CCONFIG> --num-gpus 8 \n# example\npython tools\u002Ftrain_net.py --config-file configs\u002Fsparse_inst_r50vd_dcn_giam_aug.yaml --num-gpus 8\n```\n\n\n\u003C!-- ### ONNX Export -->\n\n\n### Custom Training of SparseInst\n\n1. We suggest you convert your custom datasets into the `COCO` format, which enables the usage of the default dataset mappers and loaders. You may find more details in the [official guide of detectron2](https:\u002F\u002Fdetectron2.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fdatasets.html#register-a-coco-format-dataset).\n2. Check whether `NUM_CLASSES` and `NUM_MASKS` should be changed according to your scenarios or tasks.\n3. Change the configurations accordingly.\n4. After finishing the above steps, you can train SparseInst with `train_net.py`.\n\n\n## Acknowledgements\n\nSparseInst is based on [detectron2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2), [OneNet](https:\u002F\u002Fgithub.com\u002FPeizeSun\u002FOneNet), [DETR](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetr), and [timm](https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models), and we sincerely thank the authors for their code and contributions to the community!\n\n\n## Citing SparseInst\n\nIf you find SparseInst useful in your research or applications, please consider giving us a star &#127775; and citing SparseInst with the following BibTeX entry.\n\n```BibTeX\n@inproceedings{Cheng2022SparseInst,\n  title     =   {Sparse Instance Activation for Real-Time Instance Segmentation},\n  author    =   {Cheng, Tianheng and Wang, Xinggang and Chen, Shaoyu and Zhang, Wenqiang and Zhang, Qian and Huang, Chang and Zhang, Zhaoxiang and Liu, Wenyu},\n  booktitle =   {Proc. IEEE Conf. 
Computer Vision and Pattern Recognition (CVPR)},\n  year      =   {2022}\n}\n\n```\n\n\n## License\n\nSparseInst is released under the [MIT Licence](LICENCE).\n","\u003Cdiv align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhustvl_SparseInst_readme_44994886342b.gif\">\n\u003Cbr>\n\u003Cbr>\nTianheng Cheng, \u003Ca href=\"https:\u002F\u002Fxwcv.github.io\u002F\">Xinggang Wang\u003C\u002Fa>\u003Csup>\u003Cspan>&#8224;\u003C\u002Fspan>\u003C\u002Fsup>, Shaoyu Chen, Wenqiang Zhang, \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=pCY-bikAAAAJ&hl=zh-CN\">Qian Zhang\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=IyyEKyIAAAAJ&hl=zh-CN\">Chang Huang\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fzhaoxiangzhang.net\u002F\">Zhaoxiang Zhang\u003C\u002Fa>, \u003Ca href=\"http:\u002F\u002Feic.hust.edu.cn\u002Fprofessor\u002Fliuwenyu\u002F\"> Wenyu Liu\u003C\u002Fa>\n\u003C\u002Fbr>\n(\u003Cspan>&#8224;\u003C\u002Fspan>: 通讯作者)\n\n\u003C!-- \u003Cdiv>\u003Ca href=\"\">[项目页面]\u003C\u002Fa>(即将上线)\u003C\u002Fdiv>  -->\n\u003Cdiv>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2203.12827\">[arXiv论文]\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FCVPR2022\u002Fpapers\u002FCheng_Sparse_Instance_Activation_for_Real-Time_Instance_Segmentation_CVPR_2022_paper.pdf\">[CVPR论文]\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1xhqQvQ0YVCHd8XQxnCVqef75Hey7kI-d\u002Fview?usp=sharing\">[演示文稿]\u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\n\n\n## 亮点 \n\n\u003Cdiv align=\"center\">\n\u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhustvl_SparseInst_readme_21ccb57c5512.gif\">\n\u003Cbr>\n\u003Cbr>\n\u003Cdiv>\n\n[![PWC](https:\u002F\u002Fimg.shields.io\u002Fendpoint.svg?url=https:\u002F\u002Fpaperswithcode.com\u002Fbadge\u002Fsparse-instance-activation-for-real-time\u002Freal-time-instance-segmentation-on-mscoco)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Freal-time-instance-segmentation-on-mscoco?p=sparse-instance-activation-for-real-time)\n\u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\n\n\n* SparseInst提出了一种全新的目标表示方法——实例激活图（IAM），能够自适应地突出显示用于识别的目标信息区域。\n* SparseInst是一个简单、高效且完全基于卷积的框架，无需非极大值抑制（NMS）或排序操作，部署起来非常方便！\n* SparseInst在速度与精度之间取得了良好的平衡，例如，在输入分辨率为608×608时，可达到37.9 AP和40 FPS。\n\n\n\n## 更新\n\n`本项目仍在积极开发中，请持续关注！` &#9749;\n\n* `[2022-10-31]`: 我们发布了使用[`CSP-DarkNet53`](configs\u002Fsparse_inst_cspdarknet53_giam.yaml)主干网络的模型及权重。该主干网络具有很强的基准性能，在推理速度和精度方面都极具竞争力。\n\n* `[2022-10-19]`: 我们提供了基于[MindSpore](https:\u002F\u002Fwww.mindspore.cn\u002F)的实现与推理代码，MindSpore是一款优秀且高效的深度学习框架。感谢[Ruiqi Wang](https:\u002F\u002Fgithub.com\u002FRuiqiWang00)的这一贡献！\n\n* `[2022-8-9]`: 我们提供了FLOPs计数器[`get_flops.py`](.\u002Ftools\u002Fget_flops.py)，用于计算SparseInst的FLOPs和参数量。此次更新还包括一些错误修复。\n\n* `[2022-7-17]`: `更快`&#128640;：SparseInst现在支持以**FP16**格式进行训练和推理。使用FP16推理可将速度提升**30%**。`更稳健`：为了提高数值稳定性，尤其是在导出为ONNX格式时，我们用[`Softmax`](configs\u002Fsparse_inst_r50_giam_softmax.yaml)替换了原来的`Sigmoid + Norm`。`更易用`：我们提供了[脚本](.\u002Fonnx\u002Fconvert_onnx.py)，用于将SparseInst导出为ONNX模型。\n\n* `[2022-4-29]`: 我们修复了关于可视化`demo.py`的**常见问题**，例如`ValueError: GenericMask cannot handle ...`。\n\n* `[2022-4-7]`: 我们提供了用于图像可视化与推理的`demo`代码。此外，我们还为SparseInst增加了更多主干网络，包括[ResNet-101](https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.03385)、[CSPDarkNet](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2004.10934v1.pdf)以及[PvTv2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2102.12122)。我们仍在继续支持更多主干网络。\n\n* `[2022-3-25]`: 我们正式发布了SparseInst的代码和模型！ \n\n \n\n## 
概述\n**SparseInst**是一种概念新颖、高效且完全基于卷积的实时实例分割框架。\n与传统的区域框或锚点（中心点）不同，SparseInst采用一组稀疏的**实例激活图**作为目标表示形式，用于突出显示每个前景目标的信息区域。\n随后，它根据这些突出显示的区域聚合特征，从而获得实例级别的特征，用于识别和分割。\n通过二部匹配机制，实例激活图以一对一的方式预测目标，从而避免了后处理中的非极大值抑制（NMS）。得益于这种简单而有效的实例激活图设计，SparseInst具有极快的推理速度，在COCO数据集上可达到**40 FPS**和**37.9 AP**（NVIDIA 2080Ti），在速度和精度方面均显著优于同类方法。\n\n\n\u003Ccenter>\n\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhustvl_SparseInst_readme_04f0bd07d129.png\">\n\u003C\u002Fcenter>\n\n## 模型\n\n我们提供了两种版本的 SparseInst，即基础 IAM（3×3 卷积）和分组 IAM（简称 G-IAM），它们使用不同的主干网络。所有模型均在 MS-COCO *train2017* 数据集上进行训练。\n\n#### 快速模型\n\n| 模型 | 主干网络 | 输入尺寸 | 数据增强 | AP\u003Csup>val\u003C\u002Fsup> |  AP  | FPS | 权重 |\n| :---- | :------  | :---: | :-: |:--------------: | :--: | :-: | :-----: |\n| [SparseInst](configs\u002Fsparse_inst_r50_base.yaml) | [R-50](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Ee6nPXlj1eewAnooYtoPtLzbRp_mDxfB\u002Fview?usp=sharing) | 640 | &#x2718; | 32.8 | 33.2 | 44.3 | [模型](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F12RQLHD5EZKIOvlqW3avUCeYjFG1NPKDy\u002Fview?usp=sharing) |\n| [SparseInst](sparse_inst_r50vd_base.yaml) | [R-50-vd](https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models\u002Freleases\u002Fdownload\u002Fv0.1-weights\u002Fresnet50d_ra2-464e36ba.pth) | 640 | &#x2718; | 34.1 | 34.5 | 42.6 | [模型](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1fjPFy35X2iJu3tYwVdAq4Bel82PfH5kx\u002Fview?usp=sharing)|\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r50_giam.yaml) | [R-50](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Ee6nPXlj1eewAnooYtoPtLzbRp_mDxfB\u002Fview?usp=sharing) | 608 | &#x2718; | 33.4 | 34.0 | 44.6 | [模型](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1pXU7Dsa1L7nUiLU9ULG2F6Pl5m5NEguL\u002Fview?usp=sharing) |\n| [SparseInst (G-IAM, Softmax)](configs\u002Fsparse_inst_r50_giam_soft.yaml) | 
[R-50](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Ee6nPXlj1eewAnooYtoPtLzbRp_mDxfB\u002Fview?usp=sharing) | 608 | &#x2718; | 33.6 | - | 44.6 | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1doterrG89SjmLxDyU8IhLYRGxVH69sR2\u002Fview?usp=sharing) |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r50_giam_aug.yaml) | [R-50](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Ee6nPXlj1eewAnooYtoPtLzbRp_mDxfB\u002Fview?usp=sharing) | 608 | &#10003;| 34.2 | 34.7 | 44.6 | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1MK8rO3qtA7vN9KVSBdp0VvZHCNq8-bvz\u002Fview?usp=sharing) |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r50_dcn_giam_aug.yaml) | [R-50-DCN](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Ee6nPXlj1eewAnooYtoPtLzbRp_mDxfB\u002Fview?usp=sharing) | 608 | &#10003;| 36.4 | 36.8 | 41.6 | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1qxdLRRHbIWEwRYn-NPPeCCk6fhBjc946\u002Fview?usp=sharing) |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r50vd_giam_aug.yaml) | [R-50-vd](https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models\u002Freleases\u002Fdownload\u002Fv0.1-weights\u002Fresnet50d_ra2-464e36ba.pth) | 608 | &#10003;| 35.6 | 36.1 | 42.8| [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1dlamg7ych_BdWpPUCuiBXbwE0SXpsfGx\u002Fview?usp=sharing) |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r50vd_dcn_giam_aug.yaml) | [R-50-vd-DCN](https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models\u002Freleases\u002Fdownload\u002Fv0.1-weights\u002Fresnet50d_ra2-464e36ba.pth) | 608 | &#10003; | 37.4 | 37.9 | 40.0  | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1clYPdCNrDNZLbmlAEJ7wjsrOLn1igOpT\u002Fview?usp=sharing)| \n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r50vd_dcn_giam_aug.yaml) | 
[R-50-vd-DCN](https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models\u002Freleases\u002Fdownload\u002Fv0.1-weights\u002Fresnet50d_ra2-464e36ba.pth) | 640 | &#10003; | 37.7 | 38.1 | 39.3 |  [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1clYPdCNrDNZLbmlAEJ7wjsrOLn1igOpT\u002Fview?usp=sharing)| \n\n#### SparseInst with other backbones\n\n| model | backbone | input | AP\u003Csup>val\u003C\u002Fsup> |  AP  | FPS | weights |\n| :---- | :------ | :---: | :--------------: | :--: | :-: | :-----: |\n| SparseInst (G-IAM) | [CSPDarkNet](configs\u002Fsparse_inst_cspdarknet53_giam.yaml) | 640 | 35.1 | -| - | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1rcUJWUbusM216Zbtmo_xB774jdjb3qSt\u002Fview?usp=sharing) |\n\n#### Larger models\n\n| model | backbone | input | aug | AP\u003Csup>val\u003C\u002Fsup> |  AP  | FPS | weights |\n| :---- | :------ | :---: | :---: | :--------------: | :--: | :-: | :-----: |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r101_giam.yaml) | [R-101](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1-6ZBvC55unwuHvGn-Xf4xuy2Qr1vC7Zo\u002Fview?usp=sharing) | 640 | &#x2718; | 34.9 | 35.5 | - | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1EZZck-UNfom652iyDhdaGYbxS0MrO__z\u002Fview?usp=sharing)|\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_r101_dcn_giam.yaml) | [R-101-DCN](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1-6ZBvC55unwuHvGn-Xf4xuy2Qr1vC7Zo\u002Fview?usp=sharing) | 640 | &#x2718; | 36.4 | 36.9 | - | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1shkFvyBmDlWRxl1ActD6VfZJTJYBGBjv\u002Fview?usp=sharing) |\n\n#### SparseInst with Vision Transformers\n\n| model | backbone | input | aug | AP\u003Csup>val\u003C\u002Fsup> |  AP  | FPS | weights |\n| :---- | :------ | :---: | :---: | :--------------: | :--: | :-: | :-----: |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_pvt_b1_giam.yaml) | [PVTv2-B1](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1B7JTO0WqyhFn7nvUlRf6qKQrFzTnRWDC\u002Fview?usp=sharing) | 
640 |  &#x2718; | 35.3 | 36.0 | 33.5 (48.9\u003Csup>&#x021A1;\u003C\u002Fsup>)| [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F13l9JgTz3sF6j3vSVHOOhAYJnCf-QuNe_\u002Fview?usp=sharing) |\n| [SparseInst (G-IAM)](configs\u002Fsparse_inst_pvt_b2_li_giam.yaml) | [PVTv2-B2-li](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1YhjCH4FZa9ekWUqa-JovEfAR2wuUXEtQ\u002Fview?usp=sharing) | 640 |  &#x2718; | 37.2 | 38.2 | 26.5 | [model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1DFxQnFg_UL6kmMoNC4StUKo79RXVHyNF\u002Fview?usp=sharing) |\n\n\u003Csup>&#x021A1;\u003C\u002Fsup>: measured on RTX 3090.\n\n**Note:**\n* **We will continue adding more models**, including more efficient convolutional networks, vision transformers, and larger models for high performance and high speed, please stay tuned &#128513;!\n* Inference speeds are measured on a single NVIDIA 2080Ti unless otherwise specified.\n* We have not yet adopted TensorRT or other tools to accelerate the inference of SparseInst. However, we are working on it, and will provide support for ONNX, TensorRT, MindSpore, [Blade](https:\u002F\u002Fgithub.com\u002Falibaba\u002FBladeDISC), and other frameworks as soon as possible!\n* AP denotes the AP evaluated on MS-COCO *test-dev2017*.\n* *input* denotes the shorter side of the input image, e.g., 512x864 and 608x864; we keep the aspect ratio of the input image and the longer side does not exceed 864.\n* The inference speed might change slightly across machines (e.g., 2080 Ti) and across versions of detectron2 (we mainly use [v0.3](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Ftree\u002Fv0.3)). If the change is large, e.g., more than 5ms, please feel free to contact us.\n* For `aug` (data augmentation), we only adopt the simple random crop (crop size: [384, 600]) provided by detectron2.\n* We adopt `weight decay=5e-2` by default, which is slightly different from the original paper.\n* **[Weights on BaiduPan]**: we also provide the trained models on BaiduPan: [ShareLink](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1tot7Wcoi4J1xh8ZS7VikZg) (password: lkdo).\n\n## Installation and Prerequisites\n\nThis project is built upon the excellent framework [detectron2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2), so you need to install detectron2 first. Please refer to the [official installation guide](https:\u002F\u002Fdetectron2.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Finstall.html) for more details.\n\n**Update:** SparseInst works well on [detectron2-v0.6](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Ftree\u002Fv0.6).\n\n**Note:** Previously, we mainly used [v0.3](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Ftree\u002Fv0.3) of detectron2 for experiments and evaluations. In addition, we have also tested the latest version, 
[v0.6](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Ftree\u002Fv0.6). If you find any bugs or incompatibilities with a later version of detectron2, please feel free to open an issue!\n\nInstall detectron2:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2.git\ncd detectron2\n# optionally switch to a specific version, e.g., v0.3 (recommended) or v0.6\ngit checkout tags\u002Fv0.6\n# build detectron2\npython setup.py build develop\n```\n\n## Getting Started\n\n\n### &#128293; SparseInst with FP16\n\nSparseInst with FP16 achieves a 30% faster inference speed and saves much training memory. The table below compares memory, inference speed, and training speed.\n\n|  FP16 | train mem. (log) | train mem. (`nvidia-smi`) | train speed | infer. speed | \n| :---: | :-------------: | :----------------------: | :---------: | :----------: |\n| &#x2718; | 6.0G | 10.5G | 0.8690s\u002Fiter | 52.17 FPS |\n| &#10003; | 3.9G | 6.8G  | 0.6949s\u002Fiter | 67.57 FPS |\n\nNote: statistics are measured on an NVIDIA 3090. With FP16, training is faster and the batch size can be increased for better performance.\n\n* Training with FP16: enabling FP16 is simple, just set `SOLVER.AMP.ENABLED=True`, either on the command line or in the config file.\n\n```bash\npython tools\u002Ftrain_net.py --config-file configs\u002Fsparse_inst_r50_giam_fp16.yaml --num-gpus 8 SOLVER.AMP.ENABLED True\n```\n\n* Testing with FP16: add `--fp16` to enable FP16 during inference.\n\n```bash\npython tools\u002Ftest_net.py --config-file configs\u002Fsparse_inst_r50_giam_fp16.yaml --fp16 MODEL.WEIGHTS model_final.pth \n```\n\n### Testing SparseInst\n\nBefore testing, you need to specify the config file `\u003CCONFIG>` and the model weights `\u003CMODEL-PATH>`. In addition, you can change the input size by setting `INPUT.MIN_SIZE_TEST` in either the config file or the command line.\n\n* [Performance Evaluation] To obtain the evaluation results, e.g., mask AP on COCO, run:\n\n```bash\npython tools\u002Ftrain_net.py --config-file \u003CCONFIG> --num-gpus \u003CGPUS> --eval MODEL.WEIGHTS \u003CMODEL-PATH>\n# example:\npython tools\u002Ftrain_net.py --config-file configs\u002Fsparse_inst_r50_giam.yaml --num-gpus 8 --eval MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth\n```\n\n* [Inference Speed] To obtain the inference speed (FPS) on a single GPU device, run:\n\n```bash\npython tools\u002Ftest_net.py --config-file \u003CCONFIG> MODEL.WEIGHTS \u003CMODEL-PATH> INPUT.MIN_SIZE_TEST 512\n# example:\npython tools\u002Ftest_net.py --config-file 
configs\u002Fsparse_inst_r50_giam.yaml MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512\n```\n\n**Note:**\n* [`tools\u002Ftest_net.py`](.\u002Ftools\u002Ftest_net.py) only supports **1 GPU** and **1 image per batch** for measuring inference speed.\n* The inference time consists of the *pure forward time* and the *post-processing time*. The evaluation process, data loading, and pre-processing in wrappers (e.g., ImageList) are not included.\n* `COCOMaskEvaluator` is modified from [`COCOEvaluator`](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2\u002Fblob\u002Fmain\u002Fdetectron2\u002Fevaluation\u002Fcoco_evaluation.py) to evaluate mask-only results.\n\n### FLOPs and Parameters\n\n[`get_flops.py`](tools\u002Fget_flops.py) is built on `detectron2` and `fvcore`.\n\n```bash\npython tools\u002Fget_flops.py --config-file \u003CCONFIG> --tasks parameter flop\n```\n\n### Visualizing Images with SparseInst\n\nTo run inference on your images or visualize the segmentation results, run:\n\n```bash\npython demo.py --config-file \u003CCONFIG> --input \u003CIMAGE-PATH> --output results --opts MODEL.WEIGHTS \u003CMODEL-PATH>\n# example\npython demo.py --config-file configs\u002Fsparse_inst_r50_giam.yaml --input datasets\u002Fcoco\u002Fval2017\u002F* --output results --opts MODEL.WEIGHTS sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512\n```\n* In addition, `demo.py` supports inference on video (`--video-input`) and webcam (`--webcam`). For video inference, refer to [issue #9](https:\u002F\u002Fgithub.com\u002Fhustvl\u002FSparseInst\u002Fissues\u002F9) to avoid some errors.\n* `--opts` accepts modifications to the config, e.g., `INPUT.MIN_SIZE_TEST 512`.\n* `--input` can be a single image or a folder of images given as `xxx\u002F*`.\n* If `--output` is not specified, a window pops up to show the visualization result for each image.\n* Lowering the `confidence-threshold` shows more instances but also produces more false positives.\n\n\u003Cdiv>\n\u003Ctable align=\"center\">\n\u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhustvl_SparseInst_readme_a6f72205a7c3.jpg\" height=200>\u003C\u002Ftd>\n\u003Ctd>\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhustvl_SparseInst_readme_95da21b280a1.jpg\" height=200>\u003C\u002Ftd>\n\u003C\u002Ftable>\n\u003Cspan>\u003Cp align=\"center\">Visualization results (SparseInst-R50-GIAM)\u003C\u002Fp>\u003C\u002Fspan>\n\u003C\u002Fdiv>\n\n\n### Training SparseInst\n\nTo train SparseInst on the COCO 
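dataset with 8 GPUs, run the command below. If you only have 4 GPUs or limited GPU memory, you can reduce the batch size through `SOLVER.IMS_PER_BATCH` or reduce the input size. If you change the batch size, the learning rate schedule should be adjusted accordingly, following the linear scaling rule.\n\n```bash\npython tools\u002Ftrain_net.py --config-file \u003CCONFIG> --num-gpus 8 \n# example\npython tools\u002Ftrain_net.py --config-file configs\u002Fsparse_inst_r50vd_dcn_giam_aug.yaml --num-gpus 8\n```\n\n\n\u003C!-- ### ONNX export -->\n\n\n### Custom Training of SparseInst\n\n1. We suggest converting your custom dataset into the `COCO` format, so that you can use the default dataset mappers and loaders. See the [official detectron2 guide](https:\u002F\u002Fdetectron2.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fdatasets.html#register-a-coco-format-dataset) for more details.\n2. Check whether `NUM_CLASSES` and `NUM_MASKS` need to be adjusted for your scenario or task.\n3. Modify the config accordingly.\n4. After the steps above, you can train SparseInst with `train_net.py`.\n\n## Acknowledgements\n\nSparseInst is based on [detectron2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2), [OneNet](https:\u002F\u002Fgithub.com\u002FPeizeSun\u002FOneNet), [DETR](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetr), and [timm](https:\u002F\u002Fgithub.com\u002Frwightman\u002Fpytorch-image-models), and we sincerely thank these projects and their contributors for their support of the community!\n\n## Citing SparseInst\n\nIf you find SparseInst useful in your research or applications, please consider giving us a star &#127775; and citing SparseInst with the following BibTeX entry.\n\n```BibTeX\n@inproceedings{Cheng2022SparseInst,\n  title     =   {Sparse Instance Activation for Real-Time Instance Segmentation},\n  author    =   {Cheng, Tianheng and Wang, Xinggang and Chen, Shaoyu and Zhang, Wenqiang and Zhang, Qian and Huang, Chang and Zhang, Zhaoxiang and Liu, Wenyu},\n  booktitle =   {Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},\n  year      =   {2022}\n}\n\n```\n\n\n## License\n\nSparseInst is released under the [MIT License](LICENCE).","# SparseInst Quick Start Guide\n\nSparseInst is a conceptually novel, efficient, fully convolutional framework for real-time instance segmentation. It uses sparse instance activation maps (IAM) as the object representation and needs no non-maximum suppression (NMS) or sorting, which gives it a strong balance between speed and accuracy (e.g., 37.9 AP at 40 FPS on COCO).\n\n## Environment Setup\n\nThis project is built on the **Detectron2** framework. Make sure your environment meets the following requirements:\n\n*   **Operating system**: Linux (Ubuntu 18.04+ recommended)\n*   **Python**: 3.6 - 3.9\n*   **PyTorch**: 1.8 or later\n*   **CUDA**: an NVIDIA GPU with CUDA support (tests were run on 2080Ti\u002F3090)\n*   **Prerequisites**: `gcc`, `g++`, `opencv-python`, `scipy`, and other common deep learning dependencies.\n\n> **Note**: SparseInst has been verified on Detectron2 v0.3 and v0.6.\n\n## Installation\n\n### 1. 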
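Install Detectron2\n\nFirst clone the Detectron2 repository and install it. v0.6 is recommended (you can also switch to v0.3 if needed).\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2.git\ncd detectron2\n\n# switch to a specific version (v0.6 or v0.3 recommended)\ngit checkout tags\u002Fv0.6\n\n# build and install\npython setup.py build develop\n```\n\n*Tip for users in mainland China*: if cloning is slow, use a Gitee mirror or configure a git proxy. A domestic mirror such as Tsinghua or Aliyun may also speed up the PyTorch installation; the command below uses the official PyTorch index for CUDA 11.8 wheels:\n```bash\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n```\n\n### 2. Install SparseInst\n\nClone the SparseInst project and install its dependencies (assuming Detectron2 is already installed):\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fhustvl\u002FSparseInst.git\ncd SparseInst\n\n# install project dependencies (if a requirements.txt exists)\npip install -r requirements.txt\n```\n\n## Basic Usage\n\n### 1. Download Pretrained Models\n\nDownload the weights from the links provided in the README (for example, the model that corresponds to `sparse_inst_r50_giam.yaml`) and place them in the project directory, or specify their location through the config.\n\n*   **BaiduPan download**: [link](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1tot7Wcoi4J1xh8ZS7VikZg) (password: lkdo)\n*   **Google Drive**: see the links in the \"Models\" tables of the README.\n\n### 2. Image Inference and Visualization\n\nUse the provided `demo.py` script to run inference on a single image and visualize the result.\n\n```bash\npython demo.py \\\n    --config-file configs\u002Fsparse_inst_r50_giam.yaml \\\n    --input input.jpg \\\n    --output output.jpg \\\n    --opts MODEL.WEIGHTS \u002Fpath\u002Fto\u002Fyour\u002Fmodel.pth\n```\n\n**Arguments:**\n*   `--config-file`: path to the model config file.\n*   `--input`: path to the input image (wildcards are supported for multiple images).\n*   `--output`: path where the result is saved.\n*   `MODEL.WEIGHTS`: path to the pretrained weights file.\n\n### 3. 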
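Export an ONNX Model (Optional)\n\nTo deploy to a production environment, you can export the model to ONNX:\n\n```bash\npython onnx\u002Fconvert_onnx.py \\\n    --config-file configs\u002Fsparse_inst_r50_giam_softmax.yaml \\\n    --output sparse_inst.onnx \\\n    --opts MODEL.WEIGHTS \u002Fpath\u002Fto\u002Fyour\u002Fmodel.pth\n```\n\n> **Tip**: when exporting to ONNX, prefer a config that uses `Softmax` (such as `sparse_inst_r50_giam_softmax.yaml`) for better numerical stability.","The engineering team at a smart logistics park is building a real-time parcel sorting system. Cameras must accurately detect and segment overlapping parcels on a high-speed conveyor belt so that a robotic arm can pick them.\n\n### Without SparseInst\n- **High latency causes missed detections**: conventional instance segmentation models rely on non-maximum suppression (NMS) and sorting in post-processing, which is costly in scenes with dense parcels and cannot meet a real-time requirement of 30+ frames per second.\n- **Overlapping objects are hard to separate**: when several parcels are stacked tightly, anchor-based methods struggle to tell their boundaries apart, and masks often stick together or merge incorrectly, producing wrong sorting commands.\n- **Deployment is complex**: existing solutions have complicated structures and demand a lot of compute, so they run poorly on edge devices or low-cost industrial PCs and raise hardware costs.\n- **Poor adaptation to dynamic scenes**: when the belt speed fluctuates, inference speed becomes unstable, which leads to stuttering frames or a growing processing queue.\n\n### With SparseInst\n- **Truly real-time inference**: SparseInst uses a fully convolutional architecture and removes NMS and sorting entirely, reaching 40 FPS with 608x input, enough to keep up with a high-speed conveyor belt.\n- **Accurate segmentation of overlapping objects**: sparse instance activation maps (IAM) adaptively highlight key regions and produce clear, separate masks even when parcels touch, which improves the picking success rate.\n- **Lightweight and easy to deploy**: the framework is simple and efficient, supports FP16 acceleration (about 30% faster inference) and ONNX export, and can be deployed on a range of edge devices, lowering the hardware barrier.\n- **Stable throughput**: SparseInst keeps inference latency stable regardless of parcel density, so the sorting line runs without blocking.\n\nBy removing cumbersome post-processing and using sparse activation, SparseInst delivers very fast inference while maintaining high accuracy, which addresses the core difficulty of real-time instance segmentation in industrial settings.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fhustvl_SparseInst_a6f72205.jpg","hustvl","HUST Vision Lab","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fhustvl_3e2bf80d.png","HUST Vision Lab of the School of EIC in HUST. 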
Lab Lead @xinggangw",null,"https:\u002F\u002Fgithub.com\u002Fhustvl",[82,86],{"name":83,"color":84,"percentage":85},"Python","#3572A5",99.3,{"name":87,"color":88,"percentage":89},"Dockerfile","#384d54",0.7,618,75,"2026-03-05T09:07:36","MIT",4,"Linux","Requires an NVIDIA GPU (tested on NVIDIA 2080Ti\u002FRTX 3090) with FP16 acceleration support; install a matching CUDA version (the exact version depends on the detectron2 and PyTorch installation requirements)","Not specified",{"notes":99,"python":100,"dependencies":101},"This project is built on the detectron2 framework; build and install detectron2 first (v0.3 is recommended, v0.6 is also verified). Several backbones are supported (ResNet, CSPDarkNet, PVTv2). An ONNX export script and a MindSpore implementation are provided. Inference speed was measured on a single NVIDIA 2080Ti without TensorRT acceleration.","Not specified (depends on the detectron2 environment; Python 3.6+ is usually recommended)",[102,103,104,105,106,107],"detectron2 (v0.3 or v0.6)","torch","torchvision","opencv-python","pycocotools","timm (for PVT and similar backbones)",[14],[110,111,112,113,114],"instance-segmentation","detectron2","object-detection","panoptic-segmentation","real-time","2026-03-27T02:49:30.150509","2026-04-06T08:10:27.797618",[118,123,128,132,137,142],{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},16562,"Video inference with demo.py fails with an OpenCV error saying the starting number cannot be found, or with an error related to bounding boxes. What should I do?","By default, detectron2 needs bounding boxes to compute IoU so that it can associate objects across frames and assign colors, but SparseInst does not predict bounding boxes. The fix is to modify `detectron2\u002Futils\u002Fvideo_visualizer.py` (around line 85) so that it uses the predicted masks to associate objects. The change is as follows:\n\nif boxes is None:\n    masks_rles = mask_util.encode(\n        np.asarray(np.asarray(masks.tensor.permute(1, 2, 0)), dtype=np.uint8, order=\"F\")\n    )\n    detected = [\n        _DetectedInstance(classes[i], None, mask_rle=masks_rles[i], color=None, ttl=8)\n        for i in range(num_instances)\n    ]\nelse:\n    detected = [\n        _DetectedInstance(classes[i], boxes[i], mask_rle=None, color=None, ttl=8)\n        for i in range(num_instances)\n    ]","https:\u002F\u002Fgithub.com\u002Fhustvl\u002FSparseInst\u002Fissues\u002F9",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},16563,"How do I train SparseInst on a custom dataset? How do I configure the dataset paths?","Although the custom-dataset training section of the documentation may be empty, you can specify the datasets by modifying the `DATASETS` field in the config file. Make sure your dataset follows the COCO 
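format, and set the following in the config file:\n\nDATASETS:\n  TRAIN: (\"your_custom_dataset_train\",)\n  TEST: (\"your_custom_dataset_val\",)\n\nYou need to register these dataset names in detectron2 first, then pass the corresponding config file to the training script on the command line.","https:\u002F\u002Fgithub.com\u002Fhustvl\u002FSparseInst\u002Fissues\u002F79",{"id":129,"question_zh":130,"answer_zh":131,"source_url":127},16564,"How do I train on a specific set of GPUs (for example, GPUs 7 through 15)?","Set the `CUDA_VISIBLE_DEVICES` environment variable to the GPU indices you want to use. For example, to use GPUs 7 through 15, launch training with:\n\nCUDA_VISIBLE_DEVICES=7,8,9,10,11,12,13,14,15 python train_net.py --config-file ...",{"id":133,"question_zh":134,"answer_zh":135,"source_url":136},16565,"The model predicts duplicate masks for the same object. How can I remove them efficiently?","You can try the following to reduce duplicate predictions:\n1. Adjust `NUM_MASKS`: if each image contains far fewer than 100 instances, reducing `NUM_MASKS` lowers the number of predictions and therefore the duplication rate.\n2. Use `CLS_THRESHOLD`: bipartite matching is designed to output a single high-confidence prediction per ground-truth object, but duplicates can still occur. Raising the classification threshold filters out some low-quality duplicates.\n3. Note: work such as DETR and Sparse R-CNN suggests that NMS is not necessary for removing duplicate predictions and has limited effect. If duplicates are severe, check the dataset for annotation problems or try tuning the matching strategy.","https:\u002F\u002Fgithub.com\u002Fhustvl\u002FSparseInst\u002Fissues\u002F8",{"id":138,"question_zh":139,"answer_zh":140,"source_url":141},16566,"After exporting SparseInst to ONNX, inference with onnxruntime produces NaN values while PyTorch works fine. What could be the cause?","This is usually related to operations in the model that are not fully supported when exporting to ONNX. The maintainers have confirmed that ONNX export is now supported and have updated the conversion script. Suggestions:\n1. Make sure you use the latest version of the repository, which contains the fixed ONNX export script.\n2. Check whether the model has dynamic shapes or non-standard operators, and try setting an appropriate opset_version when exporting.\n3. The maintainers have said that tested ONNX models will be released soon; watch the repository for pre-exported models.","https:\u002F\u002Fgithub.com\u002Fhustvl\u002FSparseInst\u002Fissues\u002F14",{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},16567,"Does SparseInst support FP16 mixed-precision training? What are the benefits?","Yes, SparseInst now supports FP16 (half-precision) training. Enabling FP16 noticeably reduces GPU memory usage and speeds up training and inference, particularly when training large models on one or more GPUs. For training, set `SOLVER.AMP.ENABLED True` on the command line or in the config file; for inference with `tools\u002Ftest_net.py`, add `--fp16`.","https:\u002F\u002Fgithub.com\u002Fhustvl\u002FSparseInst\u002Fissues\u002F6",[]]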