[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-facebookresearch--CutLER":3,"tool-facebookresearch--CutLER":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 
是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":80,"owner_email":80,"owner_twitter":80,"owner_website":81,"owner_url":82,"languages":83,"stars":99,"forks":100,"last_commit_at":101,"license":102,"difficulty_score":103,"env_os":104,"env_gpu":105,"env_ram":106,"env_deps":107,"category_tags":114,"github_topics":80,"view_count":23,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":115,"updated_at":116,"faqs":117,"releases":147},3896,"facebookresearch\u002FCutLER","CutLER","Code release for \"Cut and Learn for Unsupervised Object Detection and Instance Segmentation\" and \"VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation\"","CutLER 是一款专注于无监督物体检测与实例分割的开源 AI 框架，旨在让模型在无需任何人工标注数据的情况下，学会识别和分割图像或视频中的物体。它主要解决了传统视觉模型依赖昂贵且耗时的人工标注数据的痛点，仅需使用如 ImageNet-1K 这类未标注的通用数据集即可完成训练。\n\n该工具特别适合计算机视觉领域的研究人员、算法工程师以及希望探索自监督学习潜力的开发者使用。其核心技术亮点在于提出了\"MaskCut\"方法，能够自动从单张图像中生成高质量的多物体伪掩码，进而驱动模型进行自我学习。实验表明，CutLER 在 11 个不同基准测试（涵盖自然图像、视频帧、绘画及草图等）上的表现优异，关键指标甚至超越了以往最先进方法的 2.6 倍以上，并展现出极强的跨域鲁棒性。\n\n此外，CutLER 还衍生出了 VideoCutLER 版本，这是一个令人惊喜的视频实例分割方案。它打破了常规，无需依赖光流估计或天然视频数据，仅凭静态图像集就能训练出顶尖的视频分割模型。无论是作为无监督任务的独立解决方案，还是作为全监督或半监督任务的预训练底座，CutLER 都为降低数据标注成本、提升模型泛化能力提供了强有","CutLER 是一款专注于无监督物体检测与实例分割的开源 AI 
框架，旨在让模型在无需任何人工标注数据的情况下，学会识别和分割图像或视频中的物体。它主要解决了传统视觉模型依赖昂贵且耗时的人工标注数据的痛点，仅需使用如 ImageNet-1K 这类未标注的通用数据集即可完成训练。\n\n该工具特别适合计算机视觉领域的研究人员、算法工程师以及希望探索自监督学习潜力的开发者使用。其核心技术亮点在于提出了\"MaskCut\"方法，能够自动从单张图像中生成高质量的多物体伪掩码，进而驱动模型进行自我学习。实验表明，CutLER 在 11 个不同基准测试（涵盖自然图像、视频帧、绘画及草图等）上的表现优异，关键指标甚至超越了以往最先进方法的 2.6 倍以上，并展现出极强的跨域鲁棒性。\n\n此外，CutLER 还衍生出了 VideoCutLER 版本，这是一个令人惊喜的视频实例分割方案。它打破了常规，无需依赖光流估计或天然视频数据，仅凭静态图像集就能训练出顶尖的视频分割模型。无论是作为无监督任务的独立解决方案，还是作为全监督或半监督任务的预训练底座，CutLER 都为降低数据标注成本、提升模型泛化能力提供了强有力的技术支持。","# Cut and Learn for Unsupervised Image & Video Object Detection and Instance Segmentation\n\n**Cut**-and-**LE**a**R**n (**CutLER**) is a simple approach for training object detection and instance segmentation models without human annotations.\nIt outperforms previous SOTA by **2.7 times** for AP50 and **2.6 times** for AR on **11 benchmarks**.\n\n\u003Cp align=\"center\"> \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_a0d1c6c5453b.jpg' align=\"center\" > \u003C\u002Fp>\n\n> [**Cut and Learn for Unsupervised Object Detection and Instance Segmentation**](http:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002Fprojects\u002FCutLER\u002F)            \n> [Xudong Wang](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002F), [Rohit Girdhar](https:\u002F\u002Frohitgirdhar.github.io\u002F), [Stella X. Yu](https:\u002F\u002Fwww1.icsi.berkeley.edu\u002F~stellayu\u002F), [Ishan Misra](https:\u002F\u002Fimisra.github.io\u002F)     \n> FAIR, Meta AI; UC Berkeley            \n> CVPR 2023            \n\n[[`project page`](http:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002Fprojects\u002FCutLER\u002F)] [[`arxiv`](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.11320)] [[`colab`](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1NgEyFHvOfuA2MZZnfNPWg1w5gSr3HOBb?usp=sharing)] [[`bibtex`](#citation)]             \n\nUnsupervised video instance segmentation (**VideoCutLER**) is also supported. 
***We demonstrate that video instance segmentation models can be learned without using any human annotations, without relying on natural videos (ImageNet data alone is sufficient), and even without motion estimations!*** The code is available [here](videocutler).             \n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_358cf4b16d1d.gif\" width=100%>\n\u003C\u002Fp>\n\n> [**VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation**](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002Fprojects\u002FVideoCutLER\u002Fvideocutler.pdf)            \n> [Xudong Wang](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002F), [Ishan Misra](https:\u002F\u002Fimisra.github.io\u002F), Ziyun Zeng, [Rohit Girdhar](https:\u002F\u002Frohitgirdhar.github.io\u002F), [Trevor Darrell](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~trevor\u002F)             \n> UC Berkeley; FAIR, Meta AI            \n> CVPR 2024            \n\n[[`code`](videocutler\u002FREADME.md)] [[`PDF`](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002Fprojects\u002FVideoCutLER\u002Fvideocutler.pdf)] [[`arxiv`](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.14710)] [[`bibtex`](#citation)]             \n\n## Features\n- We propose MaskCut approach to generate pseudo-masks for multiple objects in an image.\n- CutLER can learn unsupervised object detectors and instance segmentors solely on ImageNet-1K.\n- CutLER exhibits strong robustness to domain shifts when evaluated on 11 different benchmarks across domains like natural images, video frames, paintings, sketches, etc.\n- CutLER can serve as a pretrained model for fully\u002Fsemi-supervised detection and segmentation tasks.\n- We also propose VideoCutLER, a surprisingly simple unsupervised video instance segmentation (UVIS) method without relying on optical flows. 
ImageNet-1K is all we need for training a SOTA UVIS model!\n\n## Installation\nSee [installation instructions](INSTALL.md).\n\n## Dataset Preparation\nSee [Preparing Datasets for CutLER](datasets\u002FREADME.md).\n\n## Method Overview\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_ed47bff6ad37.jpg\" width=55%>\n\u003C\u002Fp>\nCut-and-Learn has two stages: 1) generating pseudo-masks with MaskCut and 2) learning unsupervised detectors from pseudo-masks of unlabeled data.\n\n### 1. MaskCut\n\nMaskCut can be used to provide segmentation masks for multiple instances of each image.\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_21c7ae64b420.gif\" width=100%>\n\u003C\u002Fp>\n\n### MaskCut Demo\n\nTry out the MaskCut demo using Colab (no GPU needed): [![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1X05lKL_IBRvZB7q6n6pb4w00_tIYjGlf?usp=sharing)\n\nTry out the web demo: [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffacebook\u002FMaskCut) (thanks to [@hysts](https:\u002F\u002Fgithub.com\u002Fhysts)!)\n\n\n\n\nIf you want to run MaskCut locally, we provide `demo.py` that is able to visualize the pseudo-masks produced by MaskCut.\nRun it with:\n```\ncd maskcut\npython demo.py --img-path imgs\u002Fdemo2.jpg \\\n  --N 3 --tau 0.15 --vit-arch base --patch-size 8 \\\n  [--other-options]\n```\nWe give a few demo images in maskcut\u002Fimgs\u002F. If you want to run demo.py with cpu, simply add \"--cpu\" when running the demo script. 
\nFor imgs\u002Fdemo4.jpg, you need to use \"--N 6\" to segment all six instances in the image.\nFollowing, we give some visualizations of the pseudo-masks on the demo images.\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_7c1e075e89f5.jpg\" width=100%>\n\u003C\u002Fp>\n\n### Generating Annotations for ImageNet-1K with MaskCut\nTo generate pseudo-masks for ImageNet-1K using MaskCut, first set up the ImageNet-1K dataset according to the instructions in [datasets\u002FREADME.md](datasets\u002FREADME.md), then execute the following command:\n```\ncd maskcut\npython maskcut.py \\\n--vit-arch base --patch-size 8 \\\n--tau 0.15 --fixed_size 480 --N 3 \\\n--num-folder-per-job 1000 --job-index 0 \\\n--dataset-path \u002Fpath\u002Fto\u002Fdataset\u002Ftraindir \\\n--out-dir \u002Fpath\u002Fto\u002Fsave\u002Fannotations \\\n```\nAs the process of generating pseudo-masks for all 1.3 million images in 1,000 folders takes a significant amount of time, it is recommended to use multiple runs. Each run should process the pseudo-mask generation for a smaller number of image folders by setting \"--num-folder-per-job\" and \"--job-index\". Once all runs are completed, you can merge all the resulting json files by using the following command:\n```\npython merge_jsons.py \\\n--base-dir \u002Fpath\u002Fto\u002Fsave\u002Fannotations \\\n--num-folder-per-job 2 --fixed-size 480 \\\n--tau 0.15 --N 3 \\\n--save-path imagenet_train_fixsize480_tau0.15_N3.json\n```\nThe \"--num-folder-per-job\", \"--fixed-size\", \"--tau\" and \"--N\" of merge_jsons.py should match the ones used to run maskcut.py.\n\nWe also provide a submitit script to launch the pseudo-mask generation process with multiple nodes. \n```\ncd maskcut\nbash run_maskcut_with_submitit.sh\n```\nAfter that, you can use \"merge_jsons.py\" to merge all these json files as described above.\n\n### 2. 
CutLER\n\n### Inference Demo for CutLER with Pre-trained Models\nTry out the CutLER demo using Colab (no GPU needed): [![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1NgEyFHvOfuA2MZZnfNPWg1w5gSr3HOBb?usp=sharing)\n\nTry out the web demo: [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffacebook\u002FCutLER) (thanks to [@hysts](https:\u002F\u002Fgithub.com\u002Fhysts)!)\n\n\nTry out Replicate demo and the API: [![Replicate](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_7dacf1cc5d87.png)](https:\u002F\u002Freplicate.com\u002Fcjwbw\u002Fcutler) \n\n\nIf you want to run CutLER demos locally,\n1. Pick a model and its config file from [model zoo](#model-zoo),\n  for example, `model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN.yaml`.\n2. We provide `demo.py` that is able to demo builtin configs. Run it with:\n```\ncd cutler\npython demo\u002Fdemo.py --config-file model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN_demo.yaml \\\n  --input demo\u002Fimgs\u002F*.jpg \\\n  [--other-options]\n  --opts MODEL.WEIGHTS \u002Fpath\u002Fto\u002Fcutler_w_cascade_checkpoint\n```\nThe configs are made for training, therefore we need to specify `MODEL.WEIGHTS` to a model from model zoo for evaluation.\nThis command will run the inference and show visualizations in an OpenCV window.\n\u003C!-- For details of the command line arguments, see `demo.py -h` or look at its source code\nto understand its behavior. 
Some common arguments are: -->\n* To run __on cpu__, add `MODEL.DEVICE cpu` after `--opts`.\n* To save outputs to a directory (for images) or a file (for webcam or video), use `--output`.\n\nFollowing, we give some visualizations of the model predictions on the demo images.\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_de7db9bf22ba.jpg\" width=100%>\n\u003C\u002Fp>\n\n### Unsupervised Model Learning\nBefore training the detector, it is necessary to use MaskCut to generate pseudo-masks for all ImageNet data.\nYou can either use the pre-generated json file directly by downloading it from [here](http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fmaskcut\u002Fimagenet_train_fixsize480_tau0.15_N3.json) and placing it under \"DETECTRON2_DATASETS\u002Fimagenet\u002Fannotations\u002F\", or generate your own pseudo-masks by following the instructions in [MaskCut](#1-maskcut).\n\nWe provide a script `train_net.py`, that is made to train all the configs provided in CutLER.\nTo train a model with \"train_net.py\", first setup the ImageNet-1K dataset following [datasets\u002FREADME.md](datasets\u002FREADME.md), then run:\n```\ncd cutler\nexport DETECTRON2_DATASETS=\u002Fpath\u002Fto\u002FDETECTRON2_DATASETS\u002F\npython train_net.py --num-gpus 8 \\\n  --config-file model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN.yaml\n```\n\nIf you want to train a model using multiple nodes, you may need to adjust [some model parameters](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.02677) and some SBATCH command options in \"tools\u002Ftrain-1node.sh\" and \"tools\u002Fsingle-node_run.sh\", then run:\n```\ncd cutler\nsbatch tools\u002Ftrain-1node.sh \\\n  --config-file model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN.yaml \\\n  MODEL.WEIGHTS \u002Fpath\u002Fto\u002Fdino\u002Fd2format\u002Fmodel \\\n  OUTPUT_DIR output\u002F\n```\nYou can also 
convert a pre-trained DINO model to detectron2's format by yourself following [this link](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fmoco\u002Ftree\u002Fmain\u002Fdetection).\n\n### Self-training\nWe further improve performance by self-training the model on its predictions.\n\nFirstly, we can get model predictions on ImageNet via running:\n```\npython train_net.py --num-gpus 8 \\\n  --config-file model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN.yaml \\\n  --test-dataset imagenet_train \\\n  --eval-only TEST.DETECTIONS_PER_IMAGE 30 \\\n  MODEL.WEIGHTS output\u002Fmodel_final.pth \\ # load previous stage\u002Fround checkpoints\n  OUTPUT_DIR output\u002F # path to save model predictions\n```\nSecondly, we can run the following command to generate the json file for the first round of self-training:\n```\npython tools\u002Fget_self_training_ann.py \\\n  --new-pred output\u002Finference\u002Fcoco_instances_results.json \\ # load model predictions\n  --prev-ann DETECTRON2_DATASETS\u002Fimagenet\u002Fannotations\u002Fimagenet_train_fixsize480_tau0.15_N3.json \\ # path to the old annotation file.\n  --save-path DETECTRON2_DATASETS\u002Fimagenet\u002Fannotations\u002Fcutler_imagenet1k_train_r1.json \\ # path to save a new annotation file.\n  --threshold 0.7\n```\nFinally, place \"cutler_imagenet1k_train_r1.json\" under \"DETECTRON2_DATASETS\u002Fimagenet\u002Fannotations\u002F\", then launch the self-training process:\n```\npython train_net.py --num-gpus 8 \\\n  --config-file model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN_self_train.yaml \\\n  --train-dataset imagenet_train_r1 \\\n  MODEL.WEIGHTS output\u002Fmodel_final.pth \\ # load previous stage\u002Fround checkpoints\n  OUTPUT_DIR output\u002Fself-train-r1\u002F # path to save checkpoints\n```\n\nYou can repeat the steps above to perform multiple rounds of self-training and adjust some arguments as needed (e.g., \"--threshold\" for round 1 and 2 can be 
set to 0.7 and 0.65, respectively; \"--train-dataset\" for round 1 and 2 can be set to \"imagenet_train_r1\" and \"imagenet_train_r2\", respectively; MODEL.WEIGHTS for round 1 and 2 should point to the previous stage\u002Fround checkpoints). Ensure that all annotation files are placed under DETECTRON2_DATASETS\u002Fimagenet\u002Fannotations\u002F.\nPlease ensure that \"--train-dataset\", json file names and locations match the ones specified in \"cutler\u002Fdata\u002Fdatasets\u002Fbuiltin.py\".\nPlease refer to this [instruction](https:\u002F\u002Fdetectron2.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fdatasets.html) for guidance on using custom datasets.\n\nYou can also directly download the MODEL.WEIGHTS and annotations used for each round of self-training:\n\u003Ctable>\u003Ctbody>\n\u003C!-- START TABLE -->\n\u003C!-- TABLE BODY -->\n\u003C!-- ROW: round 1 -->\n\u003Ctr>\u003Ctd align=\"center\">round 1\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_cascade_r1.pth\">cutler_cascade_r1.pth\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fmaskcut\u002Fcutler_imagenet1k_train_r1.json\">cutler_imagenet1k_train_r1.json\u003C\u002Fa>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C!-- ROW: round 2 -->\n\u003Ctr>\u003Ctd align=\"center\">round 2\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_cascade_r2.pth\">cutler_cascade_r2.pth\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fmaskcut\u002Fcutler_imagenet1k_train_r2.json\">cutler_imagenet1k_train_r2.json\u003C\u002Fa>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\n### Unsupervised Zero-shot Evaluation\nTo evaluate a model's performance on 11 different 
datasets, please refer to [datasets\u002FREADME.md](datasets\u002FREADME.md) for instructions on preparing the datasets. Next, select a model from the model zoo, specify the \"model_weights\", \"config_file\" and the path to \"DETECTRON2_DATASETS\" in `tools\u002Feval.sh`, then run the script.\n```\nbash tools\u002Feval.sh\n```\n\n### Model Zoo\nWe show zero-shot unsupervised object detection performance (AP50&nbsp;|&nbsp;AR) on 11 different datasets spanning a variety of domains. ^: CutLER using Mask R-CNN as a detector; *: CutLER using Cascade Mask R-CNN as a detector. \n\u003Ctable>\u003Ctbody>\n\u003C!-- START TABLE -->\n\u003C!-- TABLE HEADER -->\n\u003Cth valign=\"bottom\">Methods\u003C\u002Fth>\n\u003Cth valign=\"bottom\">Models\u003C\u002Fth>\n\u003Cth valign=\"bottom\">COCO\u003C\u002Fth>\n\u003Cth valign=\"bottom\">COCO20K\u003C\u002Fth>\n\u003Cth valign=\"bottom\">VOC\u003C\u002Fth>\n\u003Cth valign=\"bottom\">LVIS\u003C\u002Fth>\n\u003Cth valign=\"bottom\">UVO\u003C\u002Fth>\n\u003Cth valign=\"bottom\">Clipart\u003C\u002Fth>\n\u003Cth valign=\"bottom\">Comic\u003C\u002Fth>\n\u003Cth valign=\"bottom\">Watercolor\u003C\u002Fth>\n\u003Cth valign=\"bottom\">KITTI\u003C\u002Fth>\n\u003Cth valign=\"bottom\">Objects365\u003C\u002Fth>\n\u003Cth valign=\"bottom\">OpenImages\u003C\u002Fth>\n\u003C!-- TABLE BODY -->\n\u003C\u002Ftr>\n\u003Ctr>\u003Ctd align=\"center\">Prev. 
SOTA\u003C\u002Ftd>\n\u003Ctd valign=\"bottom\">-\u003C\u002Ftd>\n\u003Ctd align=\"center\">9.6&nbsp;|&nbsp;12.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">9.7&nbsp;|&nbsp;12.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">15.9&nbsp;|&nbsp;21.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">3.8&nbsp;|&nbsp;6.4\u003C\u002Ftd>\n\u003Ctd align=\"center\">10.0&nbsp;|&nbsp;14.2\u003C\u002Ftd>\n\u003Ctd align=\"center\">7.9&nbsp;|&nbsp;15.1\u003C\u002Ftd>\n\u003Ctd align=\"center\">9.9&nbsp;|&nbsp;16.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">6.7&nbsp;|&nbsp;16.2\u003C\u002Ftd>\n\u003Ctd align=\"center\">7.7&nbsp;|&nbsp;7.1\u003C\u002Ftd>\n\u003Ctd align=\"center\">8.1&nbsp;|&nbsp;10.2\u003C\u002Ftd>\n\u003Ctd align=\"center\">9.9&nbsp;|&nbsp;14.9\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C!-- ROW: Box\u002FMask AP for CutLER -->\n\u003C\u002Ftr>\n\u003Ctr>\u003Ctd align=\"center\">CutLER^\u003C\u002Ftd>\n\u003Ctd valign=\"bottom\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_mrcnn_final.pth\">download\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.1&nbsp;|&nbsp;29.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.6&nbsp;|&nbsp;30.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">36.6&nbsp;|&nbsp;41.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">7.7&nbsp;|&nbsp;18.7\u003C\u002Ftd>\n\u003Ctd align=\"center\">29.8&nbsp;|&nbsp;38.4\u003C\u002Ftd>\n\u003Ctd align=\"center\">20.9&nbsp;|&nbsp;38.5\u003C\u002Ftd>\n\u003Ctd align=\"center\">31.2&nbsp;|&nbsp;37.1\u003C\u002Ftd>\n\u003Ctd align=\"center\">37.3&nbsp;|&nbsp;39.9\u003C\u002Ftd>\n\u003Ctd align=\"center\">15.3&nbsp;|&nbsp;25.4\u003C\u002Ftd>\n\u003Ctd align=\"center\">19.5&nbsp;|&nbsp;30.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">17.1&nbsp;|&nbsp;26.4\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C!-- ROW: Box\u002FMask AP for CutLER -->\n\u003C\u002Ftr>\n\u003Ctr>\u003Ctd align=\"center\">CutLER*\u003C\u002Ftd>\n\u003Ctd valign=\"bottom\">\u003Ca 
href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_cascade_final.pth\">download\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.9&nbsp;|&nbsp;32.7\u003C\u002Ftd>\n\u003Ctd align=\"center\">22.4&nbsp;|&nbsp;33.1\u003C\u002Ftd>\n\u003Ctd align=\"center\">36.9&nbsp;|&nbsp;44.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">8.4&nbsp;|&nbsp;21.8\u003C\u002Ftd>\n\u003Ctd align=\"center\">31.7&nbsp;|&nbsp;42.8\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.1&nbsp;|&nbsp;41.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">30.4&nbsp;|&nbsp;38.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">37.5&nbsp;|&nbsp;44.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">18.4&nbsp;|&nbsp;27.5\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.6&nbsp;|&nbsp;34.2\u003C\u002Ftd>\n\u003Ctd align=\"center\">17.3&nbsp;|&nbsp;29.6\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\n## Semi-supervised and Fully-supervised Learning\nCutLER can also serve as a pretrained model for training fully supervised object detection and instance segmentation models and improves performance on COCO, including on few-shot benchmarks.\n\n### Training & Evaluation in Command Line\nYou can find all the semi-supervised and fully-supervised learning configs provided in CutLER under `model_zoo\u002Fconfigs\u002FCOCO-Semisupervised`.\n\nTo train a model using K% labels with `train_net.py`, first set up the COCO dataset according to [datasets\u002FREADME.md](datasets\u002FREADME.md) and specify K value in the config file, then run:\n```\npython train_net.py --num-gpus 8 \\\n  --config-file model_zoo\u002Fconfigs\u002FCOCO-Semisupervised\u002Fcascade_mask_rcnn_R_50_FPN_{K}perc.yaml \\\n  MODEL.WEIGHTS \u002Fpath\u002Fto\u002Fcutler_pretrained_model\n```\n\nYou can find all config files used to train supervised models under `model_zoo\u002Fconfigs\u002FCOCO-Semisupervised`.\nThe configs are made for 8-GPU training. 
To train on 1 GPU, you may need to [change some parameters](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.02677), e.g. number of GPUs (num-gpus your_num_gpus), learning rates (SOLVER.BASE_LR your_base_lr) and batch size (SOLVER.IMS_PER_BATCH your_batch_size).\n\n### Evaluation\nTo evaluate a model's performance, use\n```\npython train_net.py \\\n  --config-file model_zoo\u002Fconfigs\u002FCOCO-Semisupervised\u002Fcascade_mask_rcnn_R_50_FPN_{K}perc.yaml \\\n  --eval-only MODEL.WEIGHTS \u002Fpath\u002Fto\u002Fcheckpoint_file\n```\nFor more options, see `python train_net.py -h`.\n\n### Model Zoo\nWe fine-tune a Cascade R-CNN model initialized with CutLER or MoCo-v2 on varying amounts of labeled COCO data, and show results (Box&nbsp;|&nbsp;Mask AP) on the val2017 split below:\n\n\u003Ctable>\u003Ctbody>\n\u003C!-- START TABLE -->\n\u003C!-- TABLE HEADER -->\n\u003Cth valign=\"bottom\">% of labels\u003C\u002Fth>\n\u003Cth valign=\"bottom\">1%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">2%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">5%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">10%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">20%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">30%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">40%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">50%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">60%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">80%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">100%\u003C\u002Fth>\n\u003C!-- TABLE BODY -->\n\u003C!-- ROW: Box\u002FMask AP for CutLER -->\n\u003Ctr>\u003Ctd align=\"center\">MoCo-v2\u003C\u002Ftd>\n\u003Ctd align=\"center\">11.8&nbsp;|&nbsp;10.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">16.2&nbsp;|&nbsp;13.8\u003C\u002Ftd>\n\u003Ctd align=\"center\">20.5&nbsp;|&nbsp;17.8\u003C\u002Ftd>\n\u003Ctd align=\"center\">26.5&nbsp;|&nbsp;23.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">32.5&nbsp;|&nbsp;28.2\u003C\u002Ftd>\n\u003Ctd align=\"center\">35.5&nbsp;|&nbsp;30.8\u003C\u002Ftd>\n\u003Ctd 
align=\"center\">37.3&nbsp;|&nbsp;32.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">38.7&nbsp;|&nbsp;33.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">39.9&nbsp;|&nbsp;34.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">41.6&nbsp;|&nbsp;36.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">42.8&nbsp;|&nbsp;37.0\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C!-- ROW: Mask AP -->\n\u003Ctr>\u003Ctd align=\"center\">CutLER\u003C\u002Ftd>\n\u003Ctd align=\"center\">16.8&nbsp;|&nbsp;14.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.6&nbsp;|&nbsp;18.9\u003C\u002Ftd>\n\u003Ctd align=\"center\">27.8&nbsp;|&nbsp;24.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">32.2&nbsp;|&nbsp;28.1\u003C\u002Ftd>\n\u003Ctd align=\"center\">36.6&nbsp;|&nbsp;31.7\u003C\u002Ftd>\n\u003Ctd align=\"center\">38.2&nbsp;|&nbsp;33.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">39.9&nbsp;|&nbsp;34.7\u003C\u002Ftd>\n\u003Ctd align=\"center\">41.5&nbsp;|&nbsp;35.9\u003C\u002Ftd>\n\u003Ctd align=\"center\">42.3&nbsp;|&nbsp;36.7\u003C\u002Ftd>\n\u003Ctd align=\"center\">43.8&nbsp;|&nbsp;37.9\u003C\u002Ftd>\n\u003Ctd align=\"center\">44.7&nbsp;|&nbsp;38.5\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C!-- ROW: Model Downloads -->\n\u003Ctr>\u003Ctd align=\"center\">Download\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_1perc.pth\">model\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_2perc.pth\">model\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_5perc.pth\">model\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_10perc.pth\">model\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca 
href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_20perc.pth\">model\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_30perc.pth\">model\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_40perc.pth\">model\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_50perc.pth\">model\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_60perc.pth\">model\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_80perc.pth\">model\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_fully_100perc.pth\">model\u003C\u002Fa>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\nBoth MoCo-v2 and our CutLER are trained for the 1x schedule using Detectron2, except for extremely low-shot settings with 1% or 2% labels. 
When training with 1% or 2% labels, we train both MoCo-v2 and our model for 3,600 iterations with a batch size of 16.\n\n## License\nThe majority of CutLER, Detectron2 and DINO are licensed under the [CC-BY-NC license](LICENSE), however portions of the project are available under separate license terms: TokenCut, Bilateral Solver and CRF are licensed under the MIT license; If you later add other third party code, please keep this license info updated, and please let us know if that component is licensed under something other than CC-BY-NC, MIT, or CC0.\n\n## Ethical Considerations\nCutLER's wide range of detection capabilities may introduce similar challenges to many other visual recognition methods.\nAs the image can contain arbitrary instances, it may impact the model output.\n\n## How to get support from us?\nIf you have any general questions, feel free to email us at [Xudong Wang](mailto:xdwang@eecs.berkeley.edu), [Ishan Misra](mailto:imisra@meta.com) and [Rohit Girdhar](mailto:rgirdhar@meta.com). If you have code or implementation-related questions, please feel free to send emails to us or open an issue in this codebase (We recommend that you open an issue in this codebase, because your questions may help others). 
\n\n## Citation\nIf you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.\n```\n@inproceedings{wang2023cut,\n  title={Cut and learn for unsupervised object detection and instance segmentation},\n  author={Wang, Xudong and Girdhar, Rohit and Yu, Stella X and Misra, Ishan},\n  booktitle={Proceedings of the IEEE\u002FCVF Conference on Computer Vision and Pattern Recognition},\n  pages={3124--3134},\n  year={2023}\n}\n```\n\n```\n@article{wang2023videocutler,\n  title={VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation},\n  author={Wang, Xudong and Misra, Ishan and Zeng, Ziyun and Girdhar, Rohit and Darrell, Trevor},\n  journal={arXiv preprint arXiv:2308.14710},\n  year={2023}\n}\n```\n","# 无监督图像与视频目标检测及实例分割的剪切与学习\n\n**Cut**-and-**LE**a**R**n (**CutLER**) 是一种无需人工标注即可训练目标检测和实例分割模型的简单方法。\n它在 **11 个基准测试**上，AP50 指标比之前的 SOTA 提升了 **2.7 倍**，AR 指标提升了 **2.6 倍**。\n\n\u003Cp align=\"center\"> \u003Cimg src='https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_a0d1c6c5453b.jpg' align=\"center\" > \u003C\u002Fp>\n\n> [**无监督目标检测与实例分割的剪切与学习**](http:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002Fprojects\u002FCutLER\u002F)            \n> 王旭东 ([Xudong Wang](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002F))、罗希特·吉尔达尔 ([Rohit Girdhar](https:\u002F\u002Frohitgirdhar.github.io\u002F))、斯黛拉·X·余 ([Stella X. 
Yu](https:\u002F\u002Fwww1.icsi.berkeley.edu\u002F~stellayu\u002F))、伊山·米斯拉 ([Ishan Misra](https:\u002F\u002Fimisra.github.io\u002F))     \n> FAIR, Meta AI；加州大学伯克利分校            \n> CVPR 2023            \n\n[[`项目页面`](http:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002Fprojects\u002FCutLER\u002F)] [[`arxiv`](https:\u002F\u002Farxiv.org\u002Fabs\u002F2301.11320)] [[`colab`](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1NgEyFHvOfuA2MZZnfNPWg1w5gSr3HOBb?usp=sharing)] [[`bibtex`](#citation)]             \n\n同时支持无监督视频实例分割（**VideoCutLER**）。***我们证明了视频实例分割模型可以在完全不使用任何人工标注、不依赖自然视频（仅使用 ImageNet 数据就足够）、甚至不需要运动估计的情况下进行学习！*** 代码可在 [这里](videocutler) 获取。             \n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_358cf4b16d1d.gif\" width=100%>\n\u003C\u002Fp>\n\n> [**VideoCutLER：令人惊讶的简单无监督视频实例分割**](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002Fprojects\u002FVideoCutLER\u002Fvideocutler.pdf)            \n> 王旭东 ([Xudong Wang](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002F))、伊山·米斯拉 ([Ishan Misra](https:\u002F\u002Fimisra.github.io\u002F))、曾子云、罗希特·吉尔达尔 ([Rohit Girdhar](https:\u002F\u002Frohitgirdhar.github.io\u002F))、特雷弗·达雷尔 ([Trevor Darrell](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~trevor\u002F))             \n> 加州大学伯克利分校；FAIR, Meta AI            \n> CVPR 2024            \n\n[[`代码`](videocutler\u002FREADME.md)] [[`PDF`](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~xdwang\u002Fprojects\u002FVideoCutLER\u002Fvideocutler.pdf)] [[`arxiv`](https:\u002F\u002Farxiv.org\u002Fabs\u002F2308.14710)] [[`bibtex`](#citation)]             \n\n## 特性\n- 我们提出了 MaskCut 方法，用于为图像中的多个目标生成伪掩码。\n- CutLER 仅使用 ImageNet-1K 数据集即可学习无监督的目标检测器和实例分割模型。\n- 在涵盖自然图像、视频帧、绘画、素描等多种领域的 11 个不同基准测试上，CutLER 对领域迁移表现出强大的鲁棒性。\n- CutLER 可用作全监督或半监督检测与分割任务的预训练模型。\n- 我们还提出了 VideoCutLER，这是一种无需依赖光流的惊人简单的无监督视频实例分割（UVIS）方法。只需 ImageNet-1K 数据集，就能训练出 SOTA 的 UVIS 模型！\n\n## 安装\n请参阅 
[安装说明](INSTALL.md)。\n\n## 数据集准备\n请参阅 [为 CutLER 准备数据集](datasets\u002FREADME.md)。\n\n## 方法概述\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_ed47bff6ad37.jpg\" width=55%>\n\u003C\u002Fp>\n剪切与学习分为两个阶段：1) 使用 MaskCut 生成伪掩码；2) 从未标注数据的伪掩码中学习无监督检测器。\n\n### 1. MaskCut\n\nMaskCut 可以为每张图像中的多个实例提供分割掩码。\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_21c7ae64b420.gif\" width=100%>\n\u003C\u002Fp>\n\n### MaskCut 演示\n\n使用 Colab 尝试 MaskCut 演示（无需 GPU）：[![在 Colab 中打开](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1X05lKL_IBRvZB7q6n6pb4w00_tIYjGlf?usp=sharing)\n\n尝试网页演示：[![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffacebook\u002FMaskCut)（感谢 [@hysts](https:\u002F\u002Fgithub.com\u002Fhysts)！）\n\n如果想在本地运行 MaskCut，我们提供了 `demo.py`，可以可视化 MaskCut 生成的伪掩码。\n运行命令如下：\n```\ncd maskcut\npython demo.py --img-path imgs\u002Fdemo2.jpg \\\n  --N 3 --tau 0.15 --vit-arch base --patch-size 8 \\\n  [--其他选项]\n```\n我们在 maskcut\u002Fimgs\u002F 中提供了一些演示图片。如果想用 CPU 运行 demo.py，只需在运行脚本时添加 `--cpu` 即可。对于 imgs\u002Fdemo4.jpg，需要使用 `--N 6` 才能分割图中的六个实例。\n接下来是一些演示图片上伪掩码的可视化效果。\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_7c1e075e89f5.jpg\" width=100%>\n\u003C\u002Fp>\n\n### 使用 MaskCut 为 ImageNet-1K 生成标注\n要使用 MaskCut 为 ImageNet-1K 生成伪掩码，首先按照 [datasets\u002FREADME.md](datasets\u002FREADME.md) 中的说明设置 ImageNet-1K 数据集，然后执行以下命令：\n```\ncd maskcut\npython maskcut.py \\\n--vit-arch base --patch-size 8 \\\n--tau 0.15 --fixed_size 480 --N 3 \\\n--num-folder-per-job 1000 --job-index 0 \\\n--dataset-path \u002Fpath\u002Fto\u002Fdataset\u002Ftraindir \\\n--out-dir 
\u002Fpath\u002Fto\u002Fsave\u002Fannotations \\\n```\n由于为 1,000 个文件夹中的 130 万张图像生成伪掩码需要较长时间，建议分多次运行。每次运行时可通过设置 `--num-folder-per-job` 和 `--job-index` 来处理较少数量的图像文件夹。所有运行完成后，可以使用以下命令合并所有生成的 JSON 文件：\n```\npython merge_jsons.py \\\n--base-dir \u002Fpath\u002Fto\u002Fsave\u002Fannotations \\\n--num-folder-per-job 2 --fixed-size 480 \\\n--tau 0.15 --N 3 \\\n--save-path imagenet_train_fixsize480_tau0.15_N3.json\n```\n`merge_jsons.py` 中的 `--num-folder-per-job`、`--fixed-size`、`--tau` 和 `--N` 应与运行 `maskcut.py` 时使用的参数一致。\n\n我们还提供了一个 submitit 脚本，用于在多台节点上启动伪掩码生成过程。\n```\ncd maskcut\nbash run_maskcut_with_submitit.sh\n```\n之后，您可以按照上述说明使用 `merge_jsons.py` 合并所有这些 JSON 文件。\n\n### 2. CutLER\n\n### 使用预训练模型的 CutLER 推理演示\n使用 Colab 试用 CutLER 演示（无需 GPU）：[![在 Colab 中打开](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1NgEyFHvOfuA2MZZnfNPWg1w5gSr3HOBb?usp=sharing)\n\n试用网页版演示：[![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffacebook\u002FCutLER)（感谢 [@hysts](https:\u002F\u002Fgithub.com\u002Fhysts)！）\n\n\n试用 Replicate 演示及 API：[![Replicate](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_7dacf1cc5d87.png)](https:\u002F\u002Freplicate.com\u002Fcjwbw\u002Fcutler) \n\n\n如果您想在本地运行 CutLER 演示，\n1. 从 [模型库](#model-zoo) 中选择一个模型及其配置文件，例如 `model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN.yaml`。\n2. 
我们提供了 `demo.py`，可用于演示内置配置。运行命令如下：\n```\ncd cutler\npython demo\u002Fdemo.py --config-file model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN_demo.yaml \\\n  --input demo\u002Fimgs\u002F*.jpg \\\n  [--其他选项]\n  --opts MODEL.WEIGHTS \u002Fpath\u002Fto\u002Fcutler_w_cascade_checkpoint\n```\n这些配置是为训练设计的，因此我们需要通过 `MODEL.WEIGHTS` 指定来自模型库的模型来进行评估。\n该命令将执行推理，并在 OpenCV 窗口中显示可视化结果。\n\u003C!-- 关于命令行参数的详细信息，请参阅 `demo.py -h` 或查看其源代码以了解其行为。一些常用参数包括： -->\n* 若要在 __CPU 上__ 运行，可在 `--opts` 后添加 `MODEL.DEVICE cpu`。\n* 若要将输出保存到目录（图像）或文件（网络摄像头或视频），可使用 `--output`。\n\n接下来，我们展示模型在演示图像上的预测结果可视化效果。\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_readme_de7db9bf22ba.jpg\" width=100%>\n\u003C\u002Fp>\n\n### 无监督模型学习\n在训练检测器之前，需要使用 MaskCut 为 ImageNet 数据集中的所有图像生成伪掩码。\n您可以直接使用预先生成的 JSON 文件，从 [这里](http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fmaskcut\u002Fimagenet_train_fixsize480_tau0.15_N3.json) 下载并将其放置在 `DETECTRON2_DATASETS\u002Fimagenet\u002Fannotations\u002F` 目录下；或者按照 [MaskCut](#1-maskcut) 中的说明自行生成伪掩码。\n\n我们提供了一个脚本 `train_net.py`，用于训练 CutLER 中提供的所有配置文件。\n要使用 `train_net.py` 训练模型，首先请按照 [datasets\u002FREADME.md](datasets\u002FREADME.md) 设置 ImageNet-1K 数据集，然后运行：\n```\ncd cutler\nexport DETECTRON2_DATASETS=\u002Fpath\u002Fto\u002FDETECTRON2_DATASETS\u002F\npython train_net.py --num-gpus 8 \\\n  --config-file model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN.yaml\n```\n\n如果您希望使用多节点训练模型，可能需要调整 [某些模型参数](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.02677) 以及 “tools\u002Ftrain-1node.sh” 和 “tools\u002Fsingle-node_run.sh” 中的 SBATCH 命令选项，然后运行：\n```\ncd cutler\nsbatch tools\u002Ftrain-1node.sh \\\n  --config-file model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN.yaml \\\n  MODEL.WEIGHTS \u002Fpath\u002Fto\u002Fdino\u002Fd2format\u002Fmodel \\\n  OUTPUT_DIR output\u002F\n```\n您也可以按照 
[此链接](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fmoco\u002Ftree\u002Fmain\u002Fdetection) 自行将预训练的 DINO 模型转换为 detectron2 格式。\n\n### 自我训练\n我们通过利用模型自身的预测结果进行自我训练，进一步提升了性能。\n\n首先，我们可以通过以下命令获取模型在 ImageNet 上的预测结果：\n```\n# MODEL.WEIGHTS：加载上一阶段\u002F轮次的检查点\n# OUTPUT_DIR：保存模型预测结果的路径\npython train_net.py --num-gpus 8 \\\n  --config-file model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN.yaml \\\n  --test-dataset imagenet_train \\\n  --eval-only TEST.DETECTIONS_PER_IMAGE 30 \\\n  MODEL.WEIGHTS output\u002Fmodel_final.pth \\\n  OUTPUT_DIR output\u002F\n```\n其次，我们可以运行以下命令生成第一轮自我训练的 JSON 文件：\n```\n# --new-pred：加载模型预测结果\n# --prev-ann：旧标注文件的路径\n# --save-path：新标注文件的保存路径\npython tools\u002Fget_self_training_ann.py \\\n  --new-pred output\u002Finference\u002Fcoco_instances_results.json \\\n  --prev-ann DETECTRON2_DATASETS\u002Fimagenet\u002Fannotations\u002Fimagenet_train_fixsize480_tau0.15_N3.json \\\n  --save-path DETECTRON2_DATASETS\u002Fimagenet\u002Fannotations\u002Fcutler_imagenet1k_train_r1.json \\\n  --threshold 0.7\n```\n最后，将 “cutler_imagenet1k_train_r1.json” 放置在 “DETECTRON2_DATASETS\u002Fimagenet\u002Fannotations\u002F” 目录下，然后启动自我训练过程：\n```\n# MODEL.WEIGHTS：加载上一阶段\u002F轮次的检查点\n# OUTPUT_DIR：保存检查点的路径\npython train_net.py --num-gpus 8 \\\n  --config-file model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN_self_train.yaml \\\n  --train-dataset imagenet_train_r1 \\\n  MODEL.WEIGHTS output\u002Fmodel_final.pth \\\n  OUTPUT_DIR output\u002Fself-train-r1\u002F\n```\n\n您可以重复上述步骤进行多轮自我训练，并根据需要调整一些参数（例如，第 1 轮和第 2 轮的 “--threshold” 可分别设置为 0.7 和 0.65；第 1 轮和第 2 轮的 “--train-dataset” 可分别设置为 “imagenet_train_r1” 和 “imagenet_train_r2”；“MODEL.WEIGHTS” 应指向上一阶段\u002F轮次的检查点）。请确保所有标注文件都放置在 DETECTRON2_DATASETS\u002Fimagenet\u002Fannotations\u002F 目录下。\n请务必使 “--train-dataset”、JSON 文件名称及位置与 “cutler\u002Fdata\u002Fdatasets\u002Fbuiltin.py” 中指定的一致。\n有关使用自定义数据集的指导，请参考 [此说明](https:\u002F\u002Fdetectron2.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Fdatasets.html)。\n\n您还可以直接下载每一轮自我训练所使用的 MODEL.WEIGHTS 
和标注文件：\n\u003Ctable>\u003Ctbody>\n\u003C!-- 开始表格 -->\n\u003C!-- 表格主体 -->\n\u003C!-- 第一轮 -->\n\u003Ctr>\u003Ctd align=\"center\">第 1 轮\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_cascade_r1.pth\">cutler_cascade_r1.pth\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fmaskcut\u002Fcutler_imagenet1k_train_r1.json\">cutler_imagenet1k_train_r1.json\u003C\u002Fa>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C!-- 第二轮 -->\n\u003Ctr>\u003Ctd align=\"center\">第 2 轮\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_cascade_r2.pth\">cutler_cascade_r2.pth\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fmaskcut\u002Fcutler_imagenet1k_train_r2.json\">cutler_imagenet1k_train_r2.json\u003C\u002Fa>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\n### 无监督零样本评估\n要评估模型在 11 个不同数据集上的性能，请参阅 [datasets\u002FREADME.md](datasets\u002FREADME.md) 了解数据集准备说明。然后，从模型库中选择一个模型，在 `tools\u002Feval.sh` 中指定 “model_weights”、“config_file” 和 “DETECTRON2_DATASETS” 的路径，再运行该脚本。\n```\nbash tools\u002Feval.sh\n```\n\n### 模型动物园\n我们在涵盖多种领域的11个不同数据集上展示了零样本无监督目标检测性能（AP50 | AR）。^：CutLER 使用 Mask R-CNN 作为检测器；*：CutLER 使用 Cascade Mask R-CNN 作为检测器。\n\u003Ctable>\u003Ctbody>\n\u003C!-- 开始表格 -->\n\u003C!-- 表头 -->\n\u003Cth valign=\"bottom\">方法\u003C\u002Fth>\n\u003Cth valign=\"bottom\">模型\u003C\u002Fth>\n\u003Cth valign=\"bottom\">COCO\u003C\u002Fth>\n\u003Cth valign=\"bottom\">COCO20K\u003C\u002Fth>\n\u003Cth valign=\"bottom\">VOC\u003C\u002Fth>\n\u003Cth valign=\"bottom\">LVIS\u003C\u002Fth>\n\u003Cth valign=\"bottom\">UVO\u003C\u002Fth>\n\u003Cth valign=\"bottom\">Clipart\u003C\u002Fth>\n\u003Cth valign=\"bottom\">Comic\u003C\u002Fth>\n\u003Cth 
valign=\"bottom\">Watercolor\u003C\u002Fth>\n\u003Cth valign=\"bottom\">KITTI\u003C\u002Fth>\n\u003Cth valign=\"bottom\">Objects365\u003C\u002Fth>\n\u003Cth valign=\"bottom\">OpenImages\u003C\u002Fth>\n\u003C!-- 表体 -->\n\u003C\u002Ftr>\n\u003Ctr>\u003Ctd align=\"center\">先前SOTA\u003C\u002Ftd>\n\u003Ctd valign=\"bottom\">-\u003C\u002Ftd>\n\u003Ctd align=\"center\">9.6 | 12.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">9.7 | 12.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">15.9 | 21.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">3.8 | 6.4\u003C\u002Ftd>\n\u003Ctd align=\"center\">10.0 | 14.2\u003C\u002Ftd>\n\u003Ctd align=\"center\">7.9 | 15.1\u003C\u002Ftd>\n\u003Ctd align=\"center\">9.9 | 16.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">6.7 | 16.2\u003C\u002Ftd>\n\u003Ctd align=\"center\">7.7 | 7.1\u003C\u002Ftd>\n\u003Ctd align=\"center\">8.1 | 10.2\u003C\u002Ftd>\n\u003Ctd align=\"center\">9.9 | 14.9\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C!-- CutLER的边界框\u002FMask AP行 -->\n\u003C\u002Ftr>\n\u003Ctr>\u003Ctd align=\"center\">CutLER^\u003C\u002Ftd>\n\u003Ctd valign=\"bottom\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_mrcnn_final.pth\">下载\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.1 | 29.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.6 | 30.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">36.6 | 41.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">7.7 | 18.7\u003C\u002Ftd>\n\u003Ctd align=\"center\">29.8 | 38.4\u003C\u002Ftd>\n\u003Ctd align=\"center\">20.9 | 38.5\u003C\u002Ftd>\n\u003Ctd align=\"center\">31.2 | 37.1\u003C\u002Ftd>\n\u003Ctd align=\"center\">37.3 | 39.9\u003C\u002Ftd>\n\u003Ctd align=\"center\">15.3 | 25.4\u003C\u002Ftd>\n\u003Ctd align=\"center\">19.5 | 30.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">17.1 | 26.4\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C!-- CutLER的边界框\u002FMask AP行 -->\n\u003C\u002Ftr>\n\u003Ctr>\u003Ctd align=\"center\">CutLER*\u003C\u002Ftd>\n\u003Ctd valign=\"bottom\">\u003Ca 
href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_cascade_final.pth\">下载\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.9 | 32.7\u003C\u002Ftd>\n\u003Ctd align=\"center\">22.4 | 33.1\u003C\u002Ftd>\n\u003Ctd align=\"center\">36.9 | 44.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">8.4 | 21.8\u003C\u002Ftd>\n\u003Ctd align=\"center\">31.7 | 42.8\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.1 | 41.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">30.4 | 38.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">37.5 | 44.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">18.4 | 27.5\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.6 | 34.2\u003C\u002Ftd>\n\u003Ctd align=\"center\">17.3 | 29.6\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\n## 半监督与全监督学习\nCutLER 还可以作为预训练模型，用于训练完全监督的目标检测和实例分割模型，并在 COCO 数据集上提升性能，包括在少样本基准测试中。\n\n### 命令行中的训练与评估\n您可以在 `model_zoo\u002Fconfigs\u002FCOCO-Semisupervised` 下找到 CutLER 提供的所有半监督和全监督学习配置文件。\n\n要使用 `train_net.py` 利用 K% 的标注数据训练模型，首先按照 [datasets\u002FREADME.md](datasets\u002FREADME.md) 设置 COCO 数据集，并在配置文件中指定 K 值，然后运行：\n```\npython train_net.py --num-gpus 8 \\\n  --config-file model_zoo\u002Fconfigs\u002FCOCO-Semisupervised\u002Fcascade_mask_rcnn_R_50_FPN_{K}perc.yaml \\\n  MODEL.WEIGHTS \u002Fpath\u002Fto\u002Fcutler_pretrained_model\n```\n\n您可以在 `model_zoo\u002Fconfigs\u002FCOCO-Semisupervised` 下找到所有用于训练监督模型的配置文件。这些配置适用于 8 张 GPU 卡的训练。如果要在单张 GPU 上训练，您可能需要[调整一些参数](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.02677)，例如 GPU 数量 (num-gpus your_num_gpus)、学习率 (SOLVER.BASE_LR your_base_lr) 和批量大小 (SOLVER.IMS_PER_BATCH your_batch_size)。\n\n### 评估\n要评估模型性能，请使用：\n```\npython train_net.py \\\n  --config-file model_zoo\u002Fconfigs\u002FCOCO-Semisupervised\u002Fcascade_mask_rcnn_R_50_FPN_{K}perc.yaml \\\n  --eval-only MODEL.WEIGHTS \u002Fpath\u002Fto\u002Fcheckpoint_file\n```\n更多选项请参阅 `python train_net.py -h`。\n\n### 模型动物园\n我们在不同数量的 COCO 标注数据上对使用 CutLER 或 MoCo-v2 初始化的 Cascade R-CNN 模型进行微调，并在下面的 val2017 
分割上展示了结果（Box | Mask AP）：\n\n\u003Ctable>\u003Ctbody>\n\u003C!-- 开始表格 -->\n\u003C!-- 表头 -->\n\u003Cth valign=\"bottom\">标注比例\u003C\u002Fth>\n\u003Cth valign=\"bottom\">1%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">2%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">5%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">10%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">20%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">30%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">40%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">50%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">60%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">80%\u003C\u002Fth>\n\u003Cth valign=\"bottom\">100%\u003C\u002Fth>\n\u003C!-- 表体 -->\n\u003C!-- 第一行：CutLER 的 Box\u002FMask AP -->\n\u003Ctr>\u003Ctd align=\"center\">MoCo-v2\u003C\u002Ftd>\n\u003Ctd align=\"center\">11.8 | 10.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">16.2 | 13.8\u003C\u002Ftd>\n\u003Ctd align=\"center\">20.5 | 17.8\u003C\u002Ftd>\n\u003Ctd align=\"center\">26.5 | 23.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">32.5 | 28.2\u003C\u002Ftd>\n\u003Ctd align=\"center\">35.5 | 30.8\u003C\u002Ftd>\n\u003Ctd align=\"center\">37.3 | 32.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">38.7 | 33.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">39.9 | 34.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">41.6 | 36.0\u003C\u002Ftd>\n\u003Ctd align=\"center\">42.8 | 37.0\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C!-- 第二行：Mask AP -->\n\u003Ctr>\u003Ctd align=\"center\">CutLER\u003C\u002Ftd>\n\u003Ctd align=\"center\">16.8 | 14.6\u003C\u002Ftd>\n\u003Ctd align=\"center\">21.6 | 18.9\u003C\u002Ftd>\n\u003Ctd align=\"center\">27.8 | 24.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">32.2 | 28.1\u003C\u002Ftd>\n\u003Ctd align=\"center\">36.6 | 31.7\u003C\u002Ftd>\n\u003Ctd align=\"center\">38.2 | 33.3\u003C\u002Ftd>\n\u003Ctd align=\"center\">39.9 | 34.7\u003C\u002Ftd>\n\u003Ctd align=\"center\">41.5 | 35.9\u003C\u002Ftd>\n\u003Ctd align=\"center\">42.3 | 36.7\u003C\u002Ftd>\n\u003Ctd align=\"center\">43.8 | 
37.9\u003C\u002Ftd>\n\u003Ctd align=\"center\">44.7 | 38.5\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C!-- 第三行：模型下载 -->\n\u003Ctr>\u003Ctd align=\"center\">下载\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_1perc.pth\">模型\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_2perc.pth\">模型\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_5perc.pth\">模型\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_10perc.pth\">模型\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_20perc.pth\">模型\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_30perc.pth\">模型\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_40perc.pth\">模型\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_50perc.pth\">模型\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_60perc.pth\">模型\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_semi_80perc.pth\">模型\u003C\u002Fa>\u003C\u002Ftd>\n\u003Ctd align=\"center\">\u003Ca 
href=\"http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_fully_100perc.pth\">模型\u003C\u002Fa>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\nMoCo-v2 和我们的 CutLER 均使用 Detectron2 按照 1x 调度进行训练，除了标注比例仅为 1% 或 2% 的极低样本量设置。在使用 1% 或 2% 标注进行训练时，我们分别对 MoCo-v2 和我们的模型进行 3,600 次迭代训练，批次大小为 16。\n\n## 许可证\nCutLER、Detectron2 和 DINO 的大部分内容采用 [CC-BY-NC 许可证](LICENSE)授权，但项目中的部分内容则以单独的许可条款提供：TokenCut、Bilateral Solver 和 CRF 采用 MIT 许可证授权；如果您后续添加了其他第三方代码，请务必更新此许可证信息，并告知我们该组件是否采用 CC-BY-NC、MIT 或 CC0 之外的其他许可证。\n\n## 伦理考量\nCutLER 广泛的检测能力可能会带来与其他视觉识别方法类似的挑战。\n由于图像中可能包含任意实例，这可能会对模型输出产生影响。\n\n## 如何获得我们的支持？\n如果您有任何一般性问题，欢迎发送邮件至 [Xudong Wang](mailto:xdwang@eecs.berkeley.edu)、[Ishan Misra](mailto:imisra@meta.com) 和 [Rohit Girdhar](mailto:rgirdhar@meta.com)。如果您有关于代码或实现的问题，也欢迎随时向我们发送邮件或在此代码库中提交问题（我们建议您在此代码库中提交问题，因为您的问题可能会帮助其他人）。\n\n## 引用\n如果您觉得我们的工作富有启发性，或者在您的研究中使用了我们的代码库，请考虑为我们点亮一颗星 ⭐ 并引用我们的论文。\n```\n@inproceedings{wang2023cut,\n  title={Cut and learn for unsupervised object detection and instance segmentation},\n  author={Wang, Xudong and Girdhar, Rohit and Yu, Stella X and Misra, Ishan},\n  booktitle={Proceedings of the IEEE\u002FCVF Conference on Computer Vision and Pattern Recognition},\n  pages={3124--3134},\n  year={2023}\n}\n```\n\n```\n@article{wang2023videocutler,\n  title={VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation},\n  author={Wang, Xudong and Misra, Ishan and Zeng, Ziyun and Girdhar, Rohit and Darrell, Trevor},\n  journal={arXiv preprint arXiv:2308.14710},\n  year={2023}\n}\n```","# CutLER 快速上手指南\n\nCutLER (Cut-and-Learn) 是一个无需人工标注即可训练目标检测和实例分割模型的开源工具。它仅使用 ImageNet-1K 数据即可达到超越以往最先进方法（SOTA）的性能，并支持视频实例分割（VideoCutLER）。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux (推荐 Ubuntu 18.04+)\n*   **Python**: 3.8 或更高版本\n*   **GPU**: 支持 CUDA 的 NVIDIA GPU (训练推荐多卡，推理可单卡或 CPU)\n*   **前置依赖**:\n    *   PyTorch (建议 1.10+)\n    *   Detectron2 (CutLER 基于此框架)\n    *   torchvision\n    *   opencv-python\n    *   submitit (用于多节点任务调度，可选)\n\n> **注意**: 具体版本的兼容性请参考官方 `INSTALL.md` 文件。国内用户安装 PyTorch 时建议使用清华或阿里镜像源加速。\n\n## 安装步骤\n\n### 1. 
克隆仓库\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FCutLER.git\ncd CutLER\n```\n\n### 2. 安装 Detectron2 及依赖\nCutLER 依赖 Detectron2。请先按照 [Detectron2 安装指南](https:\u002F\u002Fdetectron2.readthedocs.io\u002Fen\u002Flatest\u002Ftutorials\u002Finstall.html) 安装基础环境，然后安装 CutLER 特定依赖：\n\n```bash\npip install -e .\n```\n\n如果在国内网络环境下安装缓慢，可尝试指定国内镜像源：\n```bash\npip install -e . -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 3. 数据集准备 (可选但推荐)\n若需复现训练过程，需准备 ImageNet-1K 数据集。\n*   下载 ImageNet-1K 训练集。\n*   按照 `datasets\u002FREADME.md` 中的说明整理目录结构。\n*   设置环境变量指向数据集根目录：\n    ```bash\n    export DETECTRON2_DATASETS=\u002Fpath\u002Fto\u002Fyour\u002Fdatasets\n    ```\n\n## 基本使用\n\nCutLER 的核心功能分为两部分：**MaskCut** (生成伪标签) 和 **CutLER** (推理与训练)。以下是无需训练、直接使用预训练模型进行推理的最简示例。\n\n### 方式一：使用 Colab 在线体验 (无需本地环境)\n如果您只想快速测试效果，推荐使用 Google Colab：\n*   **MaskCut 演示**: [打开 Colab](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1X05lKL_IBRvZB7q6n6pb4w00_tIYjGlf?usp=sharing)\n*   **CutLER 演示**: [打开 Colab](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1NgEyFHvOfuA2MZZnfNPWg1w5gSr3HOBb?usp=sharing)\n\n### 方式二：本地运行推理 Demo\n\n#### 1. 下载预训练模型\n从 Model Zoo 下载预训练权重（例如 Cascade Mask R-CNN R50）：\n```bash\n# 示例：下载 Cascade Mask R-CNN 权重到当前目录（CutLER 仓库根目录）\nwget http:\u002F\u002Fdl.fbaipublicfiles.com\u002Fcutler\u002Fcheckpoints\u002Fcutler_cascade_final.pth\n```\n\n#### 2. 运行推理脚本\n进入 `cutler` 目录并执行以下命令。该命令将读取示例图片并在窗口中显示检测结果。\n\n```bash\ncd cutler\npython demo\u002Fdemo.py --config-file model_zoo\u002Fconfigs\u002FCutLER-ImageNet\u002Fcascade_mask_rcnn_R_50_FPN_demo.yaml \\\n  --input demo\u002Fimgs\u002F*.jpg \\\n  --opts MODEL.WEIGHTS ..\u002Fcutler_cascade_final.pth\n```\n\n**常用参数说明：**\n*   `--input`: 输入图片路径，支持通配符。\n*   `MODEL.WEIGHTS`: 指定下载的预训练模型路径。\n*   `MODEL.DEVICE cpu`: 如果无 GPU，添加此参数以在 CPU 上运行。\n*   `--output`: 指定输出目录保存结果图片，而不是弹出窗口。\n\n#### 3. 
仅运行 MaskCut 生成分割掩码\n如果您只想对单张图片生成伪分割掩码（无需检测框）：\n\n```bash\ncd maskcut\npython demo.py --img-path imgs\u002Fdemo2.jpg \\\n  --N 3 --tau 0.15 --vit-arch base --patch-size 8\n```\n*   `--N`: 期望分割的对象数量。\n*   `--cpu`: 若无 GPU，添加此标志。\n\n---\n*注：如需进行无监督训练或自训练（Self-training），请先使用 MaskCut 为 ImageNet 生成完整的伪标签 JSON 文件，随后参考 README 中的 \"Unsupervised Model Learning\" 章节配置 `train_net.py`。*","一家初创安防公司急需构建能识别未知异常物体的视频监控系统，但团队面临海量监控录像完全缺乏人工标注数据的困境。\n\n### 没有 CutLER 时\n- **标注成本高昂**：依赖人工逐帧绘制物体掩码（Mask），面对数万小时视频数据，标注预算和时间成本直接导致项目停滞。\n- **冷启动困难**：传统无监督方法在未见过的场景（如夜间、特殊角度）下泛化能力差，模型无法有效识别新类别的入侵物体。\n- **技术门槛极高**：现有视频实例分割方案通常依赖复杂的光流估计（Optical Flow）计算，对算力要求苛刻且难以调试部署。\n- **数据利用率低**：手中大量的公开图像数据集（如 ImageNet）因缺乏标注而无法用于训练专用的检测模型，造成资源浪费。\n\n### 使用 CutLER 后\n- **实现零标注训练**：利用 CutLER 的 MaskCut 模块自动生成高质量伪掩码，直接在无标注的 ImageNet 数据上训练出高性能检测器，彻底省去人工标注环节。\n- **跨域鲁棒性强**：模型在自然图像、监控视频帧甚至素描等多种领域基准测试中表现优异，能稳定识别各种陌生环境下的异常目标。\n- **架构极简高效**：借助 VideoCutLER 特性，无需计算复杂的光流信息即可实现领先的视频实例分割效果，大幅降低了对硬件算力的依赖。\n- **快速落地迭代**：仅需单卡 GPU 即可运行演示代码生成伪标签，团队能在数天内完成从数据预处理到模型验证的全流程，加速产品上线。\n\nCutLER 通过“切割与学习”机制，让开发者仅凭未标注的通用图像数据就能构建出超越以往技术的物体检测与分割系统，真正打破了数据标注的壁垒。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Ffacebookresearch_CutLER_a0d1c6c5.jpg","facebookresearch","Meta Research","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Ffacebookresearch_449342bd.png","",null,"https:\u002F\u002Fopensource.fb.com","https:\u002F\u002Fgithub.com\u002Ffacebookresearch",[84,88,92,96],{"name":85,"color":86,"percentage":87},"Python","#3572A5",93.5,{"name":89,"color":90,"percentage":91},"Cuda","#3A4E3A",5.3,{"name":93,"color":94,"percentage":95},"Shell","#89e051",0.6,{"name":97,"color":98,"percentage":95},"C++","#f34b7d",1063,108,"2026-04-01T13:55:31","NOASSERTION",4,"Linux","训练必需（示例命令使用 --num-gpus 8），推理支持 CPU（需添加 --cpu 或 MODEL.DEVICE cpu）。具体显卡型号、显存大小及 CUDA 版本未在文中明确说明，但基于 Detectron2 和 ViT 架构，通常建议 NVIDIA GPU 且显存充足。","未说明",{"notes":108,"python":106,"dependencies":109},"1. 该工具主要基于 Detectron2 框架，安装需参考其官方指南。2. 
核心功能分为两步：首先使用 MaskCut 生成伪标签（支持 CPU 运行演示，但处理 ImageNet-1K 全量数据耗时较长，建议使用多节点集群和 submitit 脚本加速）；其次训练检测模型（训练示例命令显示需要 8 张 GPU）。3. 数据集需准备 ImageNet-1K。4. 提供 Colab 和 Hugging Face Spaces 演示，其中部分演示无需本地 GPU。5. 支持自训练（Self-training）以进一步提升性能。",[110,111,112,113],"Detectron2","PyTorch","ViT (Vision Transformer)","submitit (用于多节点任务调度)",[14,35],"2026-03-27T02:49:30.150509","2026-04-06T09:45:01.720060",[118,123,128,133,138,143],{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},17806,"使用 MaskCut 生成 VOC 格式标注时，生成的 JSON 文件中出现乱码或边界框（bbox）信息不一致怎么办？","如果在生成 VOC 标注时遇到乱码或结果与官方提供的 JSON 文件不一致（导致 AP50 异常偏高），通常是因为代码中保留了不必要的字段或解码方式问题。建议检查并删除 annotation_info 中导致乱码的 segmentation、height 和 width 相关代码。此外，确保使用的命令参数与官方一致，并对比官方提供的标注文件结构。如果问题依旧，可以参考维护者提供的官方标注文件包进行核对。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FCutLER\u002Fissues\u002F28",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},17807,"仅使用 MaskCut 计算 AP_mask 时，得分（score）是如何确定的？为什么我的结果接近零？","计算 AP_mask 时得分接近零通常是因为忘记将生成的掩码（mask）调整回原始图像尺寸。在使用 maskcut.py 生成伪掩码后，必须执行 resize 操作将其恢复到原图大小，然后再进行评估。请检查代码中是否包含将 bipartition 或 pseudo_mask resize 到原始图像分辨率的步骤。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FCutLER\u002Fissues\u002F18",{"id":129,"question_zh":130,"answer_zh":131,"source_url":132},17808,"在自训练（self-training）阶段，如何划分 ImageNet 数据的训练集和验证集？","在自训练设置中，通常使用 100% 的 ImageNet 数据作为训练集来生成伪掩码，并不预先划分验证集。实验设置是将所有 ImageNet 数据用于训练，然后在 11 个不同的检测数据集上评估模型以展示零样本无监督学习能力。如果你计划为验证集生成伪掩码，需要在生成步骤中通过 \"--dataset-path\" 参数指定包含验证分割的数据集路径，之后再将数据按比例（如 80%\u002F20%）划分为训练和验证输入。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FCutLER\u002Fissues\u002F23",{"id":134,"question_zh":135,"answer_zh":136,"source_url":137},17809,"训练 VideoCutLER 时遇到 'video_id' KeyError 错误或无法下载预生成标注文件怎么办？","遇到 'video_id' KeyError 通常是因为数据集注册或标注文件格式不正确。首先，确保严格按照 INSTALL.md 准备 VideoCutLER 所需的包和数据集。其次，不要随意重命名官方的标注文件（例如将 imagenet_train_fixsize480_tau0.15_N3.json 改为 video_...），这会导致索引错误。请使用官方提供的特定视频标注文件（如通过 Google Drive 
链接下载的正确文件）。如果下载链接重定向，请尝试右键点击下载链接获取真实地址。此外，检查代码中是否错误地注释掉了必要的参数（如 test_dataset, train_dataset），并确保 detectron2 版本兼容。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FCutLER\u002Fissues\u002F51",{"id":139,"question_zh":140,"answer_zh":141,"source_url":142},17810,"无法从 Model Zoo 下载预训练模型文件怎么办？","如果直接点击 Model Zoo 页面上的下载链接无法下载，可以尝试右键点击该下载链接，选择“链接另存为”或直接复制链接地址到下载工具中。维护者确认模型文件是可下载的，有时浏览器直接跳转可能会遇到问题，获取真实的文件地址（通常以 .pth 结尾）即可解决。","https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002FCutLER\u002Fissues\u002F47",{"id":144,"question_zh":145,"answer_zh":146,"source_url":132},17811,"如何在自定义 COCO 数据集上进行自训练，--test-dataset 参数应该填什么？","在自训练的第一步中，--test-dataset 通常设置为 'imagenet_train'，因为该阶段主要利用 ImageNet 数据进行无监督学习。对于自定义 COCO 数据，如果你已经注册了 train 和 val 数据集，但在自训练流程中仍应遵循官方实验设置：使用全部可用数据生成伪标签。只有在需要对特定验证集进行评估或生成特定伪掩码时，才需要通过 --dataset-path 明确指定验证集路径。不要在自训练初期强行分割数据，除非你有特定的评估需求。",[]]