[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-matterport--Mask_RCNN":3,"tool-matterport--Mask_RCNN":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",152630,2,"2026-04-12T23:33:54",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":84,"forks":85,"last_commit_at":86,"license":87,"difficulty_score":10,"env_os":88,"env_gpu":89,"env_ram":90,"env_deps":91,"category_tags":98,"github_topics":99,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":105,"updated_at":106,"faqs":107,"releases":136},7000,"matterport\u002FMask_RCNN","Mask_RCNN","Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow","Mask_RCNN 是一款基于 Keras 和 TensorFlow 构建的开源深度学习模型，专注于解决图像中的目标检测与实例分割难题。它不仅能精准识别图像中有哪些物体并框出位置（目标检测），还能进一步为每个独立物体生成精细的像素级轮廓掩膜（实例分割），即使多个同类物体重叠也能清晰区分。\n\n该工具特别适合计算机视觉领域的研究人员、算法工程师及希望深入理解底层原理的开发者使用。其核心优势在于采用了特征金字塔网络（FPN）与 ResNet101 主干网络相结合的强大架构，在保持高精度的同时具备良好的扩展性。除了提供在 MS COCO 数据集上预训练的权重以便快速上手，Mask_RCNN 还独具特色地配套了丰富的 Jupyter 
可视化教程。这些资源能逐步展示从锚框筛选、边界框修正到掩膜生成的完整流程，帮助用户直观调试模型、分析中间层激活状态及权重分布，极大地降低了学习与复现前沿算法的门槛，是探索实例分割技术的理想起点。","# Mask R-CNN for Object Detection and Segmentation\n\nThis is an implementation of [Mask R-CNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.06870) on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone.\n\n![Instance Segmentation Sample](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_511b26a23d69.png)\n\nThe repository includes:\n* Source code of Mask R-CNN built on FPN and ResNet101.\n* Training code for MS COCO\n* Pre-trained weights for MS COCO\n* Jupyter notebooks to visualize the detection pipeline at every step\n* ParallelModel class for multi-GPU training\n* Evaluation on MS COCO metrics (AP)\n* Example of training on your own dataset\n\n\nThe code is documented and designed to be easy to extend. If you use it in your research, please consider citing this repository (bibtex below). If you work on 3D vision, you might find our recently released [Matterport3D](https:\u002F\u002Fmatterport.com\u002Fblog\u002F2017\u002F09\u002F20\u002Fannouncing-matterport3d-research-dataset\u002F) dataset useful as well.\nThis dataset was created from 3D-reconstructed spaces captured by our customers who agreed to make them publicly available for academic use. You can see more examples [here](https:\u002F\u002Fmatterport.com\u002Fgallery\u002F).\n\n# Getting Started\n* [demo.ipynb](samples\u002Fdemo.ipynb) Is the easiest way to start. It shows an example of using a model pre-trained on MS COCO to segment objects in your own images.\nIt includes code to run object detection and instance segmentation on arbitrary images.\n\n* [train_shapes.ipynb](samples\u002Fshapes\u002Ftrain_shapes.ipynb) shows how to train Mask R-CNN on your own dataset. 
This notebook introduces a toy dataset (Shapes) to demonstrate training on a new dataset.\n\n* ([model.py](mrcnn\u002Fmodel.py), [utils.py](mrcnn\u002Futils.py), [config.py](mrcnn\u002Fconfig.py)): These files contain the main Mask RCNN implementation. \n\n\n* [inspect_data.ipynb](samples\u002Fcoco\u002Finspect_data.ipynb). This notebook visualizes the different pre-processing steps\nto prepare the training data.\n\n* [inspect_model.ipynb](samples\u002Fcoco\u002Finspect_model.ipynb) This notebook goes in depth into the steps performed to detect and segment objects. It provides visualizations of every step of the pipeline.\n\n* [inspect_weights.ipynb](samples\u002Fcoco\u002Finspect_weights.ipynb)\nThis notebook inspects the weights of a trained model and looks for anomalies and odd patterns.\n\n\n# Step by Step Detection\nTo help with debugging and understanding the model, there are 3 notebooks \n([inspect_data.ipynb](samples\u002Fcoco\u002Finspect_data.ipynb), [inspect_model.ipynb](samples\u002Fcoco\u002Finspect_model.ipynb),\n[inspect_weights.ipynb](samples\u002Fcoco\u002Finspect_weights.ipynb)) that provide a lot of visualizations and allow running the model step by step to inspect the output at each point. Here are a few examples:\n\n\n\n## 1. Anchor sorting and filtering\nVisualizes every step of the first stage Region Proposal Network and displays positive and negative anchors along with anchor box refinement.\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_467e5eca7a69.png)\n\n## 2. Bounding Box Refinement\nThis is an example of final detection boxes (dotted lines) and the refinement applied to them (solid lines) in the second stage.\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_e1af83f44a6f.png)\n\n## 3. Mask Generation\nExamples of generated masks. 
These then get scaled and placed on the image in the right location.\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_875868bf3ca0.png)\n\n## 4. Layer activations\nOften it's useful to inspect the activations at different layers to look for signs of trouble (all zeros or random noise).\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_2ee303db2e7c.png)\n\n## 5. Weight Histograms\nAnother useful debugging tool is to inspect the weight histograms. These are included in the inspect_weights.ipynb notebook.\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_690c33728db9.png)\n\n## 6. Logging to TensorBoard\nTensorBoard is another great debugging and visualization tool. The model is configured to log losses and save weights at the end of every epoch.\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_c063a26a92bd.png)\n\n## 7. Composing the different pieces into a final result\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_211670ba645c.png)\n\n\n# Training on MS COCO\nWe're providing pre-trained weights for MS COCO to make it easier to start. You can\nuse those weights as a starting point to train your own variation on the network.\nTraining and evaluation code is in `samples\u002Fcoco\u002Fcoco.py`. 
You can import this\nmodule in Jupyter notebook (see the provided notebooks for examples) or you\ncan run it directly from the command line as such:\n\n```\n# Train a new model starting from pre-trained COCO weights\npython3 samples\u002Fcoco\u002Fcoco.py train --dataset=\u002Fpath\u002Fto\u002Fcoco\u002F --model=coco\n\n# Train a new model starting from ImageNet weights\npython3 samples\u002Fcoco\u002Fcoco.py train --dataset=\u002Fpath\u002Fto\u002Fcoco\u002F --model=imagenet\n\n# Continue training a model that you had trained earlier\npython3 samples\u002Fcoco\u002Fcoco.py train --dataset=\u002Fpath\u002Fto\u002Fcoco\u002F --model=\u002Fpath\u002Fto\u002Fweights.h5\n\n# Continue training the last model you trained. This will find\n# the last trained weights in the model directory.\npython3 samples\u002Fcoco\u002Fcoco.py train --dataset=\u002Fpath\u002Fto\u002Fcoco\u002F --model=last\n```\n\nYou can also run the COCO evaluation code with:\n```\n# Run COCO evaluation on the last trained model\npython3 samples\u002Fcoco\u002Fcoco.py evaluate --dataset=\u002Fpath\u002Fto\u002Fcoco\u002F --model=last\n```\n\nThe training schedule, learning rate, and other parameters should be set in `samples\u002Fcoco\u002Fcoco.py`.\n\n\n# Training on Your Own Dataset\n\nStart by reading this [blog post about the balloon color splash sample](https:\u002F\u002Fengineering.matterport.com\u002Fsplash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46). It covers the process starting from annotating images to training to using the results in a sample application.\n\nIn summary, to train the model on your own dataset you'll need to extend two classes:\n\n```Config```\nThis class contains the default configuration. Subclass it and modify the attributes you need to change.\n\n```Dataset```\nThis class provides a consistent way to work with any dataset. \nIt allows you to use new datasets for training without having to change \nthe code of the model. 
It also supports loading multiple datasets at the\nsame time, which is useful if the objects you want to detect are not \nall available in one dataset. \n\nSee examples in `samples\u002Fshapes\u002Ftrain_shapes.ipynb`, `samples\u002Fcoco\u002Fcoco.py`, `samples\u002Fballoon\u002Fballoon.py`, and `samples\u002Fnucleus\u002Fnucleus.py`.\n\n## Differences from the Official Paper\nThis implementation follows the Mask RCNN paper for the most part, but there are a few cases where we deviated in favor of code simplicity and generalization. These are some of the differences we're aware of. If you encounter other differences, please do let us know.\n\n* **Image Resizing:** To support training multiple images per batch we resize all images to the same size. For example, 1024x1024px on MS COCO. We preserve the aspect ratio, so if an image is not square we pad it with zeros. In the paper the resizing is done such that the smallest side is 800px and the largest is trimmed at 1000px.\n* **Bounding Boxes**: Some datasets provide bounding boxes and some provide masks only. To support training on multiple datasets we opted to ignore the bounding boxes that come with the dataset and generate them on the fly instead. We pick the smallest box that encapsulates all the pixels of the mask as the bounding box. This simplifies the implementation and also makes it easy to apply image augmentations that would otherwise be harder to apply to bounding boxes, such as image rotation.\n\n    To validate this approach, we compared our computed bounding boxes to those provided by the COCO dataset.\nWe found that ~2% of bounding boxes differed by 1px or more, ~0.05% differed by 5px or more, \nand only 0.01% differed by 10px or more.\n\n* **Learning Rate:** The paper uses a learning rate of 0.02, but we found that to be\ntoo high, and often causes the weights to explode, especially when using a small batch\nsize. 
It might be related to differences between how Caffe and TensorFlow compute \ngradients (sum vs mean across batches and GPUs). Or, maybe the official model uses gradient\nclipping to avoid this issue. We do use gradient clipping, but don't set it too aggressively.\nWe found that smaller learning rates converge faster anyway so we go with that.\n\n## Citation\nUse this bibtex to cite this repository:\n```\n@misc{matterport_maskrcnn_2017,\n  title={Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow},\n  author={Waleed Abdulla},\n  year={2017},\n  publisher={Github},\n  journal={GitHub repository},\n  howpublished={\\url{https:\u002F\u002Fgithub.com\u002Fmatterport\u002FMask_RCNN}},\n}\n```\n\n## Contributing\nContributions to this repository are welcome. Examples of things you can contribute:\n* Speed Improvements. Like re-writing some Python code in TensorFlow or Cython.\n* Training on other datasets.\n* Accuracy Improvements.\n* Visualizations and examples.\n\nYou can also [join our team](https:\u002F\u002Fmatterport.com\u002Fcareers\u002F) and help us build even more projects like this one.\n\n## Requirements\nPython 3.4, TensorFlow 1.3, Keras 2.0.8 and other common packages listed in `requirements.txt`.\n\n### MS COCO Requirements:\nTo train or test on MS COCO, you'll also need:\n* pycocotools (installation instructions below)\n* [MS COCO Dataset](http:\u002F\u002Fcocodataset.org\u002F#home)\n* Download the 5K [minival](https:\u002F\u002Fdl.dropboxusercontent.com\u002Fs\u002Fo43o90bna78omob\u002Finstances_minival2014.json.zip?dl=0)\n  and the 35K [validation-minus-minival](https:\u002F\u002Fdl.dropboxusercontent.com\u002Fs\u002Fs3tw5zcg7395368\u002Finstances_valminusminival2014.json.zip?dl=0)\n  subsets. 
More details in the original [Faster R-CNN implementation](https:\u002F\u002Fgithub.com\u002Frbgirshick\u002Fpy-faster-rcnn\u002Fblob\u002Fmaster\u002Fdata\u002FREADME.md).\n\nIf you use Docker, the code has been verified to work on\n[this Docker container](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fwaleedka\u002Fmodern-deep-learning\u002F).\n\n\n## Installation\n1. Clone this repository\n2. Install dependencies\n   ```bash\n   pip3 install -r requirements.txt\n   ```\n3. Run setup from the repository root directory\n    ```bash\n    python3 setup.py install\n    ``` \n4. Download pre-trained COCO weights (mask_rcnn_coco.h5) from the [releases page](https:\u002F\u002Fgithub.com\u002Fmatterport\u002FMask_RCNN\u002Freleases).\n5. (Optional) To train or test on MS COCO, install `pycocotools` from one of these repos. They are forks of the original pycocotools with fixes for Python3 and Windows (the official repo doesn't seem to be active anymore).\n\n    * Linux: https:\u002F\u002Fgithub.com\u002Fwaleedka\u002Fcoco\n    * Windows: https:\u002F\u002Fgithub.com\u002Fphilferriere\u002Fcocoapi.\n    You must have the Visual C++ 2015 build tools on your path (see the repo for additional details)\n\n# Projects Using this Model\nIf you extend this model to other datasets or build projects that use it, we'd love to hear from you.\n\n### [4K Video Demo](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=OOT3UIXZztE) by Karol Majek.\n[![Mask RCNN on 4K Video](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_bddd6e8efefa.gif)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=OOT3UIXZztE)\n\n### [Images to OSM](https:\u002F\u002Fgithub.com\u002Fjremillard\u002Fimages-to-osm): Improve OpenStreetMap by adding baseball, soccer, tennis, football, and basketball fields.\n\n![Identify sport fields in satellite images](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_da48feb01634.png)\n\n### [Splash of 
Color](https:\u002F\u002Fengineering.matterport.com\u002Fsplash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46). A blog post explaining how to train this model from scratch and use it to implement a color splash effect.\n![Balloon Color Splash](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_f0f0efd39468.gif)\n\n\n### [Segmenting Nuclei in Microscopy Images](samples\u002Fnucleus). Built for the [2018 Data Science Bowl](https:\u002F\u002Fwww.kaggle.com\u002Fc\u002Fdata-science-bowl-2018)\nCode is in the `samples\u002Fnucleus` directory.\n\n![Nucleus Segmentation](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_d3f6af6f22b0.png)\n\n### [Detection and Segmentation for Surgery Robots](https:\u002F\u002Fgithub.com\u002FSUYEgit\u002FSurgery-Robot-Detection-Segmentation) by the NUS Control & Mechatronics Lab.\n![Surgery Robot Detection and Segmentation](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_262f6fd08c0d.gif)\n\n### [Reconstructing 3D buildings from aerial LiDAR](https:\u002F\u002Fmedium.com\u002Fgeoai\u002Freconstructing-3d-buildings-from-aerial-lidar-with-ai-details-6a81cb3079c0)\nA proof of concept project by [Esri](https:\u002F\u002Fwww.esri.com\u002F), in collaboration with Nvidia and Miami-Dade County. Along with a great write up and code by Dmitry Kudinov, Daniel Hedges, and Omar Maher.\n![3D Building Reconstruction](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_d4a7be73cdc3.png)\n\n### [Usiigaci: Label-free Cell Tracking in Phase Contrast Microscopy](https:\u002F\u002Fgithub.com\u002Foist\u002Fusiigaci)\nA project from Japan to automatically track cells in a microfluidics platform. 
Paper is pending, but the source code is released.\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_70a57913cc77.gif) ![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_62d39c65a091.gif)\n\n### [Characterization of Arctic Ice-Wedge Polygons in Very High Spatial Resolution Aerial Imagery](http:\u002F\u002Fwww.mdpi.com\u002F2072-4292\u002F10\u002F9\u002F1487)\nResearch project to understand the complex processes between degradations in the Arctic and climate change. By Weixing Zhang, Chandi Witharana, Anna Liljedahl, and Mikhail Kanevskiy.\n![image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_cb066b9ef17d.png)\n\n### [Mask-RCNN Shiny](https:\u002F\u002Fgithub.com\u002Fhuuuuusy\u002FMask-RCNN-Shiny)\nA computer vision class project by HU Shiyu to apply the color pop effect on people with beautiful results.\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_e3dee2355943.jpg)\n\n### [Mapping Challenge](https:\u002F\u002Fgithub.com\u002FcrowdAI\u002Fcrowdai-mapping-challenge-mask-rcnn): Convert satellite imagery to maps for use by humanitarian organisations.\n![Mapping Challenge](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_6547dbe5d292.png)\n\n### [GRASS GIS Addon](https:\u002F\u002Fgithub.com\u002Fctu-geoforall-lab\u002Fi.ann.maskrcnn) to generate vector masks from geospatial imagery. 
Based on a [Master's thesis](https:\u002F\u002Fgithub.com\u002Fctu-geoforall-lab-projects\u002Fdp-pesek-2018) by Ondřej Pešek.\n![GRASS GIS Image](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_83a0312e3dcf.png)\n","# 用于目标检测与分割的 Mask R-CNN\n\n这是在 Python 3、Keras 和 TensorFlow 上实现的 [Mask R-CNN](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.06870)。该模型为图像中每个对象实例生成边界框和分割掩码。它基于特征金字塔网络（FPN）和 ResNet101 主干网络。\n\n![实例分割示例](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_511b26a23d69.png)\n\n该仓库包含：\n* 基于 FPN 和 ResNet101 构建的 Mask R-CNN 源代码。\n* MS COCO 数据集的训练代码。\n* MS COCO 数据集的预训练权重。\n* Jupyter 笔记本，用于可视化检测流程的每一步。\n* 用于多 GPU 训练的 ParallelModel 类。\n* 在 MS COCO 数据集上的评估指标（AP）。\n* 自定义数据集训练示例。\n\n代码经过详细注释，并设计得易于扩展。如果您在研究中使用了此代码，请考虑引用本仓库（下方提供 BibTeX 格式）。如果您从事 3D 视觉相关工作，我们最近发布的 [Matterport3D](https:\u002F\u002Fmatterport.com\u002Fblog\u002F2017\u002F09\u002F20\u002Fannouncing-matterport3d-research-dataset\u002F) 数据集也可能对您有所帮助。\n该数据集由我们的客户采集的 3D 重建空间构成，这些客户同意将其公开供学术研究使用。您可以在 [这里](https:\u002F\u002Fmatterport.com\u002Fgallery\u002F) 查看更多示例。\n\n# 快速入门\n* [demo.ipynb](samples\u002Fdemo.ipynb) 是最简单的入门方式。它展示了如何使用在 MS COCO 数据集上预训练的模型来分割您自己的图像中的物体。\n其中包含了在任意图像上运行目标检测和实例分割的代码。\n\n* [train_shapes.ipynb](samples\u002Fshapes\u002Ftrain_shapes.ipynb) 展示了如何在自定义数据集上训练 Mask R-CNN。该笔记本引入了一个玩具数据集（Shapes），以演示如何在新数据集上进行训练。\n\n* ([model.py](mrcnn\u002Fmodel.py)、[utils.py](mrcnn\u002Futils.py)、[config.py](mrcnn\u002Fconfig.py))：这些文件包含了 Mask R-CNN 的核心实现。\n\n* [inspect_data.ipynb](samples\u002Fcoco\u002Finspect_data.ipynb)。该笔记本可视化了用于准备训练数据的不同预处理步骤。\n\n* [inspect_model.ipynb](samples\u002Fcoco\u002Finspect_model.ipynb) 该笔记本深入探讨了检测和分割物体所执行的各个步骤，并提供了整个流程中每一步的可视化结果。\n\n* [inspect_weights.ipynb](samples\u002Fcoco\u002Finspect_weights.ipynb)\n该笔记本检查了训练好的模型的权重，寻找异常和不寻常的模式。\n\n# 分步检测\n为了便于调试和理解模型，我们提供了 3 
个笔记本\n([inspect_data.ipynb](samples\u002Fcoco\u002Finspect_data.ipynb)、[inspect_model.ipynb](samples\u002Fcoco\u002Finspect_model.ipynb)、\n[inspect_weights.ipynb](samples\u002Fcoco\u002Finspect_weights.ipynb))，它们提供了大量可视化内容，并允许逐步运行模型，以便在每个阶段检查输出。以下是一些示例：\n\n\n\n## 1. 锚点排序与过滤\n可视化了第一阶段区域建议网络的每一步操作，并展示了正负锚点以及锚框的精炼过程。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_467e5eca7a69.png)\n\n## 2. 边界框精炼\n这是一个第二阶段最终检测框（虚线）及其应用的精炼结果（实线）的示例。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_e1af83f44a6f.png)\n\n## 3. 掩码生成\n生成的掩码示例。随后这些掩码会被缩放并放置到图像的正确位置。\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_875868bf3ca0.png)\n\n## 4. 层激活\n通常检查不同层的激活情况有助于发现潜在问题（如全零或随机噪声）。\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_2ee303db2e7c.png)\n\n## 5. 权重直方图\n另一个有用的调试工具是检查权重的直方图。这些内容包含在 inspect_weights.ipynb 笔记本中。\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_690c33728db9.png)\n\n## 6. 日志记录到 TensorBoard\nTensorBoard 是另一个优秀的调试和可视化工具。该模型已配置为在每个 epoch 结束时记录损失并保存权重。\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_c063a26a92bd.png)\n\n## 6. 
将各个部分组合成最终结果\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_211670ba645c.png)\n\n\n# 在 MS COCO 数据集上训练\n我们提供了 MS COCO 数据集的预训练权重，以便您更轻松地开始。您可以将这些权重作为起点，训练您自己的网络变体。\n训练和评估代码位于 `samples\u002Fcoco\u002Fcoco.py` 中。您可以在 Jupyter 笔记本中导入该模块（请参阅提供的笔记本示例），也可以直接通过命令行运行，如下所示：\n\n```\n# 从预训练的 COCO 权重开始训练新模型\npython3 samples\u002Fcoco\u002Fcoco.py train --dataset=\u002Fpath\u002Fto\u002Fcoco\u002F --model=coco\n\n# 从 ImageNet 权重开始训练新模型\npython3 samples\u002Fcoco\u002Fcoco.py train --dataset=\u002Fpath\u002Fto\u002Fcoco\u002F --model=imagenet\n\n# 继续训练之前已经训练过的模型\npython3 samples\u002Fcoco\u002Fcoco.py train --dataset=\u002Fpath\u002Fto\u002Fcoco\u002F --model=\u002Fpath\u002Fto\u002Fweights.h5\n\n# 继续训练上次训练的模型。这将会在模型目录中找到最后训练的权重。\npython3 samples\u002Fcoco\u002Fcoco.py train --dataset=\u002Fpath\u002Fto\u002Fcoco\u002F --model=last\n```\n\n您还可以使用以下命令运行 COCO 评估代码：\n```\n# 对上次训练的模型进行 COCO 评估\npython3 samples\u002Fcoco\u002Fcoco.py evaluate --dataset=\u002Fpath\u002Fto\u002Fcoco\u002F --model=last\n```\n\n训练计划、学习率和其他参数应在 `samples\u002Fcoco\u002Fcoco.py` 中设置。\n\n# 在自定义数据集上训练\n\n首先阅读这篇关于气球颜色泼溅示例的博客文章[《色彩的绽放：使用 Mask R-CNN 和 TensorFlow 进行实例分割》](https:\u002F\u002Fengineering.matterport.com\u002Fsplash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46)。它涵盖了从图像标注到训练，再到将结果应用于示例应用程序的完整流程。\n简而言之，要在您的自定义数据集上训练模型，您需要扩展两个类：\n\n```Config```\n该类包含默认配置。请继承它并修改您需要更改的属性。\n\n```Dataset```\n该类提供了一种一致的方式来处理任何数据集。\n它允许您使用新的数据集进行训练，而无需更改模型的代码。此外，它还支持同时加载多个数据集，这在您想要检测的对象并不都存在于一个数据集中时非常有用。\n\n请参阅 `samples\u002Fshapes\u002Ftrain_shapes.ipynb`、`samples\u002Fcoco\u002Fcoco.py`、`samples\u002Fballoon\u002Fballoon.py` 和 `samples\u002Fnucleus\u002Fnucleus.py` 中的示例。\n\n## 与官方论文的区别\n本实现大部分遵循 Mask R-CNN 论文，但在某些地方为了代码的简洁性和通用性，我们做了一些调整。以下是我们已知的一些差异。如果您发现其他差异，请随时告知我们。\n\n* **图像缩放：** 为了支持每批次训练多张图像，我们将所有图像缩放为相同的尺寸。例如，在 MS COCO 数据集上使用 1024x1024 像素。我们会保持宽高比，因此如果图像不是正方形，我们会用零填充。而在论文中，缩放是使最短边为 800 像素，最长边则裁剪至 1000 像素。\n* **边界框：** 
有些数据集提供边界框，而有些仅提供掩码。为了支持在多个数据集上进行训练，我们选择忽略数据集自带的边界框，转而实时生成它们。我们选取能够包围掩码所有像素的最小矩形作为边界框。这不仅简化了实现，还便于应用一些对边界框较难处理的数据增强技术，比如图像旋转。\n\n    为了验证这一方法，我们将其计算出的边界框与 COCO 数据集提供的边界框进行了对比。结果显示，约 2% 的边界框相差 1 像素或以上，约 0.05% 的边界框相差 5 像素或以上，而仅有 0.01% 的边界框相差 10 像素或以上。\n    \n* **学习率：** 论文中使用的学习率为 0.02，但我们发现这个值过高，容易导致权重爆炸，尤其是在小批量的情况下。这可能与 Caffe 和 TensorFlow 在梯度计算方式上的差异有关（即跨批次和 GPU 是求和还是取平均）。或者，官方模型可能使用了梯度裁剪来避免这个问题。虽然我们也使用梯度裁剪，但并未设置得过于激进。我们发现较小的学习率反而收敛得更快，因此我们选择了较低的学习率。\n\n## 引用\n请使用以下 BibTeX 格式引用本仓库：\n```\n@misc{matterport_maskrcnn_2017,\n  title={Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow},\n  author={Waleed Abdulla},\n  year={2017},\n  publisher={Github},\n  journal={GitHub repository},\n  howpublished={\\url{https:\u002F\u002Fgithub.com\u002Fmatterport\u002FMask_RCNN}},\n}\n```\n\n## 贡献\n欢迎为本仓库做出贡献。您可以贡献的内容包括：\n* 性能优化，例如将部分 Python 代码重写为 TensorFlow 或 Cython 实现。\n* 在其他数据集上进行训练。\n* 提升模型精度。\n* 可视化工具和示例。\n\n您也可以 [加入我们的团队](https:\u002F\u002Fmatterport.com\u002Fcareers\u002F)，帮助我们开发更多类似项目。\n\n## 环境要求\nPython 3.4、TensorFlow 1.3、Keras 2.0.8，以及 `requirements.txt` 中列出的其他常用包。\n\n### MS COCO 特别要求：\n若要在 MS COCO 数据集上进行训练或测试，您还需要：\n* pycocotools（安装说明见下文）\n* [MS COCO 数据集](http:\u002F\u002Fcocodataset.org\u002F#home)\n* 下载 5K 的 [minival](https:\u002F\u002Fdl.dropboxusercontent.com\u002Fs\u002Fo43o90bna78omob\u002Finstances_minival2014.json.zip?dl=0) 和 35K 的 [validation-minus-minival](https:\u002F\u002Fdl.dropboxusercontent.com\u002Fs\u002Fs3tw5zcg7395368\u002Finstances_valminusminival2014.json.zip?dl=0) 子集。更多详情请参阅原始的 [Faster R-CNN 实现](https:\u002F\u002Fgithub.com\u002Frbgirshick\u002Fpy-faster-rcnn\u002Fblob\u002Fmaster\u002Fdata\u002FREADME.md)。\n\n如果您使用 Docker，代码已在 [此 Docker 容器](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fwaleedka\u002Fmodern-deep-learning\u002F) 上验证过可以正常运行。\n\n## 安装步骤\n1. 克隆本仓库\n2. 安装依赖项\n   ```bash\n   pip3 install -r requirements.txt\n   ```\n3. 在仓库根目录下运行安装命令\n    ```bash\n    python3 setup.py install\n    ``` \n4. 
从 [发布页面](https:\u002F\u002Fgithub.com\u002Fmatterport\u002FMask_RCNN\u002Freleases) 下载预训练的 COCO 权重文件 `mask_rcnn_coco.h5`。\n5. （可选）若要在 MS COCO 数据集上进行训练或测试，可以从以下仓库之一安装 `pycocotools`。这些仓库是原版 pycocotools 的分支，针对 Python 3 和 Windows 进行了修复（官方仓库似乎已不再维护）。\n\n    * Linux 版本：https:\u002F\u002Fgithub.com\u002Fwaleedka\u002Fcoco\n    * Windows 版本：https:\u002F\u002Fgithub.com\u002Fphilferriere\u002Fcocoapi。您需要确保系统路径中包含 Visual C++ 2015 构建工具（详情请参阅相关仓库）。\n\n# 使用本模型的项目\n如果您将本模型扩展到其他数据集，或基于它构建相关项目，我们非常期待您的反馈。\n\n### [4K 视频演示](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=OOT3UIXZztE) by Karol Majek。\n[![Mask RCNN 在 4K 视频中的应用](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_bddd6e8efefa.gif)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=OOT3UIXZztE)\n\n### [图片转 OSM](https:\u002F\u002Fgithub.com\u002Fjremillard\u002Fimages-to-osm)：通过添加棒球、足球、网球、橄榄球和篮球场来改进 OpenStreetMap。\n\n![卫星图像中的运动场地识别](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_da48feb01634.png)\n\n### [色彩点缀](https:\u002F\u002Fengineering.matterport.com\u002Fsplash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46)：一篇博客文章，详细介绍了如何从头训练该模型，并利用它实现色彩点缀效果。\n![气球色彩点缀](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_f0f0efd39468.gif)\n\n\n### [显微镜图像中的细胞核分割](samples\u002Fnucleus)。专为 [2018 年数据科学碗竞赛](https:\u002F\u002Fwww.kaggle.com\u002Fc\u002Fdata-science-bowl-2018) 开发。\n代码位于 `samples\u002Fnucleus` 目录中。\n\n![细胞核分割](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_d3f6af6f22b0.png)\n\n### [手术机器人检测与分割](https:\u002F\u002Fgithub.com\u002FSUYEgit\u002FSurgery-Robot-Detection-Segmentation) by the NUS 控制与机电一体化实验室。\n![手术机器人检测和分割](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_262f6fd08c0d.gif)\n\n### [利用航空 LiDAR 数据重建三维建筑](https:\u002F\u002Fmedium.com\u002Fgeoai\u002Freconstructing-3d-buildings-from-aerial-lidar-with-ai-details-6a81cb3079c0)\n由 
[Esri](https:\u002F\u002Fwww.esri.com\u002F) 与 Nvidia、迈阿密戴德县合作完成的概念验证项目。该项目由 Dmitry Kudinov、Daniel Hedges 和 Omar Maher 共同撰写并提供了代码。\n![3D 建筑重建](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_d4a7be73cdc3.png)\n\n### [Usiigaci：相位差显微镜下的无标记细胞追踪](https:\u002F\u002Fgithub.com\u002Foist\u002Fusiigaci)\n来自日本的一项研究项目，旨在自动跟踪微流控平台中的细胞。论文尚未发表，但源代码已公开。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_70a57913cc77.gif) ![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_62d39c65a091.gif)\n\n### [极高分辨率航空影像中的北极冰楔多边形特征分析](http:\u002F\u002Fwww.mdpi.com\u002F2072-4292\u002F10\u002F9\u002F1487)\n一项研究北极地区退化过程与气候变化之间复杂关系的项目。作者包括 Weixing Zhang、Chandi Witharana、Anna Liljedahl 和 Mikhail Kanevskiy。\n![图像](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_cb066b9ef17d.png)\n\n### [Mask-RCNN Shiny](https:\u002F\u002Fgithub.com\u002Fhuuuuusy\u002FMask-RCNN-Shiny)\nHU Shiyu 的计算机视觉课程项目，用于对人物图像应用色彩突出效果，取得了很好的效果。\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_e3dee2355943.jpg)\n\n### [地图绘制挑战](https:\u002F\u002Fgithub.com\u002FcrowdAI\u002Fcrowdai-mapping-challenge-mask-rcnn)：将卫星影像转换为地图，供人道主义组织使用。\n![地图绘制挑战](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_6547dbe5d292.png)\n\n### 用于从地理空间影像生成矢量掩膜的 [GRASS GIS 插件](https:\u002F\u002Fgithub.com\u002Fctu-geoforall-lab\u002Fi.ann.maskrcnn)。基于 Ondřej Pešek 的[硕士论文](https:\u002F\u002Fgithub.com\u002Fctu-geoforall-lab-projects\u002Fdp-pesek-2018)。\n![GRASS GIS 图像](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_readme_83a0312e3dcf.png)","# Mask R-CNN 快速上手指南\n\nMask R-CNN 是一个基于 Python 3、Keras 和 TensorFlow 的目标检测与实例分割模型实现。该模型能够生成图像中每个物体实例的边界框和分割掩码，底层架构基于特征金字塔网络（FPN）和 ResNet101 骨干网络。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux 或 Windows (Windows 需额外配置编译工具)\n*   **Python**: 3.4+ (推荐 3.6+)\n*   **核心框架**:\n    *   
TensorFlow 1.3+ (注意：此版本主要针对 TF 1.x，若需在 TF 2.x 运行可能需要修改代码或使用兼容模式)\n    *   Keras 2.0.8+\n*   **其他依赖**: 详见项目根目录下的 `requirements.txt`\n*   **MS COCO 数据集支持 (可选)**: 如需训练或测试 MS COCO 数据集，需安装 `pycocotools`。\n    *   **Linux**: 推荐使用修复版 `https:\u002F\u002Fgithub.com\u002Fwaleedka\u002Fcoco`\n    *   **Windows**: 推荐使用修复版 `https:\u002F\u002Fgithub.com\u002Fphilferriere\u002Fcocoapi` (需安装 Visual C++ 2015 Build Tools)\n\n> **提示**: 国内用户如遇 pip 下载缓慢，可临时使用清华或阿里镜像源：\n> `pip3 install -r requirements.txt -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple`\n\n## 安装步骤\n\n请按顺序执行以下命令完成安装：\n\n1.  **克隆仓库**\n    ```bash\n    git clone https:\u002F\u002Fgithub.com\u002Fmatterport\u002FMask_RCNN.git\n    cd Mask_RCNN\n    ```\n\n2.  **安装 Python 依赖**\n    ```bash\n    pip3 install -r requirements.txt\n    ```\n\n3.  **安装 Mask R-CNN 包**\n    在项目根目录下运行 setup 脚本：\n    ```bash\n    python3 setup.py install\n    ```\n\n4.  **下载预训练权重**\n    从 [Releases 页面](https:\u002F\u002Fgithub.com\u002Fmatterport\u002FMask_RCNN\u002Freleases) 下载在 MS COCO 上预训练的权重文件 `mask_rcnn_coco.h5`，并将其放置在项目根目录或您指定的路径下。\n\n5.  **(可选) 安装 pycocotools**\n    如果您计划使用 MS COCO 数据集，请根据操作系统安装对应的 fork 版本：\n    *   Linux:\n        ```bash\n        pip3 install git+https:\u002F\u002Fgithub.com\u002Fwaleedka\u002Fcoco.git#subdirectory=PythonAPI\n        ```\n    *   Windows: 请参考 `philferriere\u002Fcocoapi` 仓库说明进行编译安装。\n\n## 基本使用\n\n最简单的入门方式是运行官方提供的 Jupyter Notebook 演示，它展示了如何加载预训练模型并对自定义图片进行实例分割。\n\n### 方式一：运行 Demo Notebook (推荐)\n\n启动 Jupyter Notebook 并打开演示文件：\n\n```bash\njupyter notebook samples\u002Fdemo.ipynb\n```\n\n在该 Notebook 中，您将看到完整的流程：\n1.  加载预训练的 COCO 权重。\n2.  读取任意图片。\n3.  执行目标检测和实例分割。\n4.  可视化结果（边界框、类别标签、彩色掩码）。\n\n### 方式二：代码调用示例\n\n如果您希望在 Python 脚本中直接调用，参考以下核心逻辑：\n\n```python\nimport os\nimport numpy as np\nfrom mrcnn.config import Config\nfrom mrcnn import model as modellib, utils\nfrom mrcnn import visualize\n\n# 1. 
定义配置类 (继承自 Config)\nclass InferenceConfig(Config):\n    NAME = \"coco_inference\"\n    IMAGES_PER_GPU = 1\n    NUM_CLASSES = 1 + 80  # COCO 有 80 个类别 + 背景\n\nconfig = InferenceConfig()\n\n# 2. 创建模型实例\nmodel = modellib.MaskRCNN(mode=\"inference\", config=config, model_dir=\".\u002Flogs\")\n\n# 3. 加载预训练权重\nCOCO_MODEL_PATH = \".\u002Fmask_rcnn_coco.h5\"  # 确保文件存在\nmodel.load_weights(COCO_MODEL_PATH, by_name=True)\n\n# 4. 运行检测\nimage = ... # 加载您的图片 (numpy array)\nresults = model.detect([image], verbose=1)\n\n# 5. 获取结果\nr = results[0]\n# r['rois']: 边界框\n# r['masks']: 掩码\n# r['class_ids']: 类别 ID\n# r['scores']: 置信度\n\n# 6. 可视化结果\n# class_names 按类别 ID 索引；此处以 ID 字符串占位，完整的 COCO 类别名称列表见 samples\u002Fdemo.ipynb\nclass_names = [str(i) for i in range(config.NUM_CLASSES)]\nvisualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],\n                            class_names=class_names, scores=r['scores'])\n```\n\n### 进阶：训练自己的数据集\n\n若要训练自定义数据集（如气球色彩点缀示例），请参考 `samples\u002Fballoon\u002Fballoon.py` 或 `samples\u002Fshapes\u002Ftrain_shapes.ipynb`。核心步骤包括：\n1.  子类化 `Config` 类以修改类别数和路径。\n2.  子类化 `Dataset` 类以加载和预处理您的数据。\n3.  调用 `model.train()` 开始训练。","某智慧城市交通部门正利用路口监控视频，自动统计早晚高峰期间不同车型的车流量并分析车辆轨迹。\n\n### 没有 Mask_RCNN 时\n- 传统目标检测算法只能输出矩形边框，在车辆密集拥堵时，边框严重重叠导致无法区分具体车辆数量。\n- 难以精确提取车辆轮廓，当车辆被路灯杆或绿化带部分遮挡时，系统极易丢失目标或误判车型。\n- 人工复核成本极高，工作人员需逐帧查看视频来修正错误的计数数据，效率低下且容易疲劳出错。\n- 缺乏像素级的分割掩码，无法进行精细化的车道占用分析或车辆三维尺寸估算。\n\n### 使用 Mask_RCNN 后\n- Mask_RCNN 生成的实例分割掩码能清晰分离紧挨着的车辆，即使在拥堵路段也能实现单车级别的精准计数。\n- 凭借强大的特征金字塔网络，Mask_RCNN 能有效识别被部分遮挡的车辆，显著降低漏检率并提升车型分类准确度。\n- 自动化流程完全取代人工复核，系统可实时输出带轮廓标注的视频流，将数据处理效率提升数十倍。\n- 输出的高精度像素级掩码支持深度分析，如计算车辆实际投影面积以辅助判断违章变道或异常停车行为。\n\nMask_RCNN 通过将目标检测与实例分割完美结合，解决了复杂交通场景下“数不清、看不准”的核心难题，让视觉数据分析从粗糙的框选迈向了精细化的像素级理解。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmatterport_Mask_RCNN_511b26a2.png","matterport","Matterport, 
Inc","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmatterport_dcb0ee3c.png","",null,"http:\u002F\u002Fmatterport.com","https:\u002F\u002Fgithub.com\u002Fmatterport",[80],{"name":81,"color":82,"percentage":83},"Python","#3572A5",100,25541,11681,"2026-04-12T12:05:07","NOASSERTION","Linux, Windows","需要 NVIDIA GPU（支持多 GPU 训练），具体型号和显存未说明，需匹配 TensorFlow 1.3 的 CUDA 版本","未说明",{"notes":92,"python":93,"dependencies":94},"Windows 用户安装 pycocotools 需要 Visual C++ 2015 构建工具；官方提供了经过验证的 Docker 容器；训练 MS COCO 数据集需额外下载特定的验证集子集文件。","3.4",[95,96,97],"TensorFlow>=1.3","Keras>=2.0.8","pycocotools",[14,15],[100,101,102,103,104],"mask-rcnn","tensorflow","object-detection","instance-segmentation","keras","2026-03-27T02:49:30.150509","2026-04-13T13:37:50.487908",[108,113,118,123,128,132],{"id":109,"question_zh":110,"answer_zh":111,"source_url":112},31527,"如何安装 pycocotools 模块？","可以通过以下两种方式安装：\n1. 使用 Conda（推荐）：运行命令 `conda install -c conda-forge pycocotools`。\n2. 使用 Pip：运行命令 `pip install \"git+https:\u002F\u002Fgithub.com\u002Fphilferriere\u002Fcocoapi.git#egg=pycocotools&subdirectory=PythonAPI\"`。","https:\u002F\u002Fgithub.com\u002Fmatterport\u002FMask_RCNN\u002Fissues\u002F6",{"id":114,"question_zh":115,"answer_zh":116,"source_url":117},31528,"如何提高输出掩码（Mask）的分辨率以减少块状效应？","需要修改 `model.py` 中的 `build_fpn_mask_graph` 函数。具体步骤是增加反卷积层（Conv2DTranspose）的数量以提升上采样倍数。例如，要生成 (56, 56) 的掩码，可以添加额外的层：\n```python\nx = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation=\"relu\"), name=\"mrcnn_mask_deconv\")(x)\nx = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation=\"relu\"), name=\"mrcnn_mask_deconv2\")(x)\n```\n同时，记得在 `config.py` 中将 `MASK_SHAPE` 更新为对应的新尺寸（如 [56, 56]）。","https:\u002F\u002Fgithub.com\u002Fmatterport\u002FMask_RCNN\u002Fissues\u002F635",{"id":119,"question_zh":120,"answer_zh":121,"source_url":122},31529,"如何在自定义数据集上训练模型以支持多个类别（多分类）？","默认的 balloon 示例仅支持单类。若要训练多类模型，建议参考以下资源修改数据集加载逻辑：\n1. 
参考多类别实现示例：https:\u002F\u002Fgithub.com\u002FSriRamGovardhanam\u002Fwastedata-Mask_RCNN-multiple-classes\n2. 阅读详细教程：https:\u002F\u002Fmedium.com\u002Fanalytics-vidhya\u002Ftraining-your-own-data-set-using-mask-r-cnn-for-detecting-multiple-classes-3960ada85079\n关键点在于正确修改 `load_mask()` 和 `load_image()` 函数，确保返回的掩码和类别 ID 与多类设置兼容。","https:\u002F\u002Fgithub.com\u002Fmatterport\u002FMask_RCNN\u002Fissues\u002F372",{"id":124,"question_zh":125,"answer_zh":126,"source_url":127},31530,"在 TensorFlow 2.x 环境下运行推理时出现错误或结果异常怎么办？","原始仓库可能不完全兼容 TF 2.x。建议采取以下措施：\n1. 克隆专门适配 TF 2.x 的分支或仓库：https:\u002F\u002Fgithub.com\u002FakTwelve\u002FMask_RCNN\n2. 确保已正确安装 `pycocotools` 和 `cython`（参考安装步骤第 5 步）。\n3. 检查并调整 TensorFlow、Keras、h5py 和 scikit-image 的版本兼容性，不同版本组合可能导致推理结果乱码或报错。","https:\u002F\u002Fgithub.com\u002Fmatterport\u002FMask_RCNN\u002Fissues\u002F1070",{"id":129,"question_zh":130,"answer_zh":131,"source_url":117},31531,"修改掩码分辨率后训练时报错 \"Incompatible shapes\"（形状不兼容）如何解决？","该错误通常是因为修改了网络结构（如增加卷积层以改变输出尺寸）但未同步更新配置文件中的 `MASK_SHAPE`，或者损失计算部分的张量形状不匹配。解决方法：\n1. 确保 `config.py` 中的 `MASK_SHAPE` 与你修改后的网络输出尺寸完全一致（例如改为 [56, 56]）。\n2. 检查 `build_fpn_mask_graph` 中的层堆叠是否正确，确保最终输出特征图的宽高与配置相符。\n3. 如果手动添加了卷积层，需确认没有破坏原有的维度传递逻辑。",{"id":133,"question_zh":134,"answer_zh":135,"source_url":122},31532,"使用 VIA 工具标注的数据集如何转换为 Mask RCNN 可用的格式？","VIA 导出的 JSON 格式与 COCO 格式不同。你需要编写自定义的 Dataset 类继承自 `mrcnn.utils.Dataset`。\n关键步骤包括：\n1. 重写 `load_custom` 方法解析 VIA 的 JSON 文件，提取 `regions` 中的 `shape_attributes`。\n2. 在 `load_mask` 方法中根据提取的多边形坐标生成二进制掩码。\n3. 
确保类别映射正确，如果是多类任务，需在 `class_names` 列表中定义所有类别。\n可参考社区实现的自定义数据集模块：https:\u002F\u002Fgithub.com\u002Fsoumyaiitkgp\u002FCustom_Mask_RCNN",[137,142,147],{"id":138,"version":139,"summary_zh":140,"released_at":141},238241,"v2.1","本次发布新增：\n\n* 气球颜色泼溅示例，附带数据集和训练好的权重。\n* 将最后一个预测层从 Python 实现转换为 TensorFlow 操作。\n* 自动下载 COCO 权重和数据集。\n* 修复了在 Windows 上运行的问题。\n\n感谢所有通过修复和提交 Pull Request 让这一切成为可能的贡献者。\n\n注意：本次发布未更新 COCO 权重。请继续使用 2.0 版本中的 .h5 文件。","2018-03-19T23:26:21",{"id":143,"version":144,"summary_zh":145,"released_at":146},238242,"v2.0","本次发布包含多项更新，旨在提升训练效果和模型精度，并新增了一个基于 MS COCO 数据集训练的模型。\n\n* 移除不必要的 Dropout 层\n* 将锚点步长从 2 减少到 1\n* 将 ROI 训练的 mini-batch 大小增加至每张图像 200 个\n* 改善候选框正负样本比例的计算\n* 更新了 COCO 数据集的训练计划\n* 在 coco.py 中添加 --logs 参数，用于设置日志目录\n* 修复 Bug：在 L2 正则化中排除 BN 层的权重\n* 在 TensorBoard 中使用 L2 正则化的均值（而非总和），以获得更平滑的损失曲线\n* 提高与 Python 2.7 的兼容性\n\n新训练得到的 MS COCO 权重相比之前的权重显著提升了精度。以下是该模型在 minival 数据集上的评估结果：\n\n``` \n评估标注类型 *bbox*\n 平均精度 (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.347\n 平均精度 (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.544\n 平均精度 (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.377\n 平均精度 (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.163\n 平均精度 (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.390\n 平均精度 (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.486\n 平均召回率 (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.295\n 平均召回率 (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.424\n 平均召回率 (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.433\n 平均召回率 (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.214\n 平均召回率 (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.481\n 平均召回率 (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.601\n```\n\n``` \n评估标注类型 *segm*\n 平均精度 (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.296\n 平均精度 (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.510\n 平均精度 (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.306\n 平均精度 (AP) @[ IoU=0.50:0.95 | area= 
small | maxDets=100 ] = 0.128\n 平均精度 (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.330\n 平均精度 (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.430\n 平均召回率 (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.258\n 平均召回率 (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.369\n 平均召回率 (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.376\n 平均召回率 (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.173\n 平均召回率 (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.417\n 平均召回率 (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.538\n```\n\n衷心感谢所有为本仓库做出贡献的开发者，他们的名字都记录在提交历史中。","2017-11-26T04:33:46",{"id":148,"version":149,"summary_zh":76,"released_at":150},238243,"v1.0","2017-10-23T22:57:20"]