[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-ruotianluo--pytorch-faster-rcnn":3,"tool-ruotianluo--pytorch-faster-rcnn":64},[4,17,26,40,48,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,2,"2026-04-03T11:11:01",[13,14,15],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":23,"last_commit_at":32,"category_tags":33,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,34,35,36,15,37,38,13,39],"数据工具","视频","插件","其他","语言模型","音频",{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":10,"last_commit_at":46,"category_tags":47,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,38,37],{"id":49,"name":50,"github_repo":51,"description_zh":52,"stars":53,"difficulty_score":10,"last_commit_at":54,"category_tags":55,"status":16},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 
等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74939,"2026-04-05T23:16:38",[38,14,13,37],{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":23,"last_commit_at":62,"category_tags":63,"status":16},2471,"tesseract","tesseract-ocr\u002Ftesseract","Tesseract 是一款历史悠久且备受推崇的开源光学字符识别（OCR）引擎，最初由惠普实验室开发，后由 Google 维护，目前由全球社区共同贡献。它的核心功能是将图片中的文字转化为可编辑、可搜索的文本数据，有效解决了从扫描件、照片或 PDF 文档中提取文字信息的难题，是数字化归档和信息自动化的重要基础工具。\n\n在技术层面，Tesseract 展现了强大的适应能力。从版本 4 开始，它引入了基于长短期记忆网络（LSTM）的神经网络 OCR 引擎，显著提升了行识别的准确率；同时，为了兼顾旧有需求，它依然支持传统的字符模式识别引擎。Tesseract 原生支持 UTF-8 编码，开箱即用即可识别超过 100 种语言，并兼容 PNG、JPEG、TIFF 等多种常见图像格式。输出方面，它灵活支持纯文本、hOCR、PDF、TSV 等多种格式，方便后续数据处理。\n\nTesseract 主要面向开发者、研究人员以及需要构建文档处理流程的企业用户。由于它本身是一个命令行工具和库（libtesseract），不包含图形用户界面（GUI），因此最适合具备一定编程能力的技术人员集成到自动化脚本或应用程序中",73286,"2026-04-03T01:56:45",[13,14],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":81,"owner_email":82,"owner_twitter":83,"owner_website":84,"owner_url":85,"languages":86,"stars":106,"forks":107,"last_commit_at":108,"license":109,"difficulty_score":110,"env_os":111,"env_gpu":112,"env_ram":113,"env_deps":114,"category_tags":124,"github_topics":83,"view_count":23,"oss_zip_url":83,"oss_zip_packed_at":83,"status":16,"created_at":125,"updated_at":126,"faqs":127,"releases":158},2756,"ruotianluo\u002Fpytorch-faster-rcnn","pytorch-faster-rcnn","pytorch1.0 updated. Support cpu test and demo. (Use detectron2, it's a masterpiece)","pytorch-faster-rcnn 是一个基于 PyTorch 框架实现的 Faster R-CNN 目标检测工具，旨在将经典的深度学习检测算法从 TensorFlow 或 Caffe 平台迁移至 PyTorch 生态。它主要解决了早期 PyTorch 社区缺乏高性能、可复现的目标检测实现这一痛点，让研究人员能够更方便地在 PyTorch 环境中进行算法验证与实验。\n\n该项目非常适合计算机视觉领域的开发者与科研人员使用，特别是那些希望深入理解 Faster R-CNN 架构细节、进行模型改进研究，或需要在 PyTorch 中复现经典论文结果的用户。其独特的技术亮点在于严格对齐了 Xinlei Chen 的 TensorFlow 版本实现，不仅支持 VGG16、ResNet101 及 MobileNetV1 等多种骨干网络，还实现了预训练模型的直接转换，确保了在不同框架间性能表现的一致性。此外，代码支持 CPU 推理与演示，降低了运行门槛。\n\n需要注意的是，随着 Detectron2 和 MMDetection 等更现代、功能更强大的库出现，pytorch-faster-rcnn 已","pytorch-faster-rcnn 是一个基于 PyTorch 框架实现的 Faster R-CNN 目标检测工具，旨在将经典的深度学习检测算法从 TensorFlow 或 Caffe 平台迁移至 PyTorch 生态。它主要解决了早期 PyTorch 社区缺乏高性能、可复现的目标检测实现这一痛点，让研究人员能够更方便地在 PyTorch 环境中进行算法验证与实验。\n\n该项目非常适合计算机视觉领域的开发者与科研人员使用，特别是那些希望深入理解 Faster R-CNN 架构细节、进行模型改进研究，或需要在 PyTorch 中复现经典论文结果的用户。其独特的技术亮点在于严格对齐了 Xinlei Chen 的 TensorFlow 版本实现，不仅支持 VGG16、ResNet101 及 MobileNetV1 等多种骨干网络，还实现了预训练模型的直接转换，确保了在不同框架间性能表现的一致性。此外，代码支持 CPU 推理与演示，降低了运行门槛。\n\n需要注意的是，随着 Detectron2 和 MMDetection 等更现代、功能更强大的库出现，pytorch-faster-rcnn 已不再活跃维护。但它作为一份清晰、标准的参考实现，对于学习目标检测基础原理和回顾技术发展历程依然具有重要的参考价值。","# Notice(2019.11.2)\nThis repo was built back two years ago when there were no pytorch detection implementation that can achieve reasonable performance. At this time, there are many better repos out there, for example:\n\n- [detectron2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2)(It's a masterpiece.)\n- [mmdetection](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmdetection)\n\nTherefore, this repo will not be actively maintained.\n\n# Important notice:\nIf you used the master branch before Sep. 
# Important notice
If you used the master branch before Sep. 26, 2017 and its corresponding pretrained model, **PLEASE PAY ATTENTION**: the old master branch is now under `old_master`. You can still run the code and download the pretrained model, but the pretrained model for that old master is not compatible with the current master!

The main differences between the new and old master branches are in these two commits: [9d4c24e](https://github.com/ruotianluo/pytorch-faster-rcnn/commit/9d4c24e83c3e4ec33751e50d5e4d8b1dd793dfaa), [c899ce7](https://github.com/ruotianluo/pytorch-faster-rcnn/commit/c899ce70dae62e3db1a5805eda96df88e4b59ca6).
The change is related to this [issue](https://github.com/ruotianluo/pytorch-faster-rcnn/issues/6); master now matches all the details in [tf-faster-rcnn](https://github.com/endernewton/tf-faster-rcnn), so pretrained tf models can now be converted to pytorch models.

# pytorch-faster-rcnn
A pytorch implementation of the faster RCNN detection framework based on Xinlei Chen's [tf-faster-rcnn](https://github.com/endernewton/tf-faster-rcnn). Xinlei Chen's repository is in turn based on the python Caffe implementation of faster RCNN available [here](https://github.com/rbgirshick/py-faster-rcnn).

**Note**: Several minor modifications were made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report [An Implementation of Faster RCNN with Study for Region Sampling](https://arxiv.org/pdf/1702.02138.pdf). If you are seeking to reproduce the results of the original paper, please use the [official code](https://github.com/ShaoqingRen/faster_rcnn) or maybe the [semi-official code](https://github.com/rbgirshick/py-faster-rcnn). For details about the faster RCNN architecture please refer to the paper [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](http://arxiv.org/pdf/1506.01497.pdf).

### Detection Performance
The current code supports **VGG16**, **Resnet V1** and **Mobilenet V1** models. We mainly tested it on plain VGG16 and Resnet101 architectures. As the baseline, we report numbers using a single model on a single convolution layer, so no multi-scale testing, no multi-stage bounding-box regression, no skip connections, and no extra inputs are used. The only data augmentation technique is left-right flipping during training, following the original Faster RCNN.
All models are released.

With VGG16 (``conv5_3``):
  - Train on VOC 2007 trainval and test on VOC 2007 test: **71.22** (from scratch), **70.75** (converted) (**70.8** for tf-faster-rcnn).
  - Train on VOC 2007+2012 trainval and test on VOC 2007 test ([R-FCN](https://github.com/daijifeng001/R-FCN) schedule): **75.33** (from scratch), **75.27** (converted) (**75.7** for tf-faster-rcnn).
  - Train on COCO 2014 [trainval35k](https://github.com/rbgirshick/py-faster-rcnn/tree/master/models) and test on [minival](https://github.com/rbgirshick/py-faster-rcnn/tree/master/models) (900k/1190k): **29.2** (from scratch), **30.1** (converted) (**30.2** for tf-faster-rcnn).

With Resnet101 (last ``conv4``):
  - Train on VOC 2007 trainval and test on VOC 2007 test: **75.29** (from scratch), **75.76** (converted) (**75.7** for tf-faster-rcnn).
  - Train on VOC 2007+2012 trainval and test on VOC 2007 test (R-FCN schedule): **79.26** (from scratch), **79.78** (converted) (**79.8** for tf-faster-rcnn).
  - Train on COCO 2014 trainval35k and test on minival (800k/1190k): **35.1** (from scratch), **35.4** (converted) (**35.4** for tf-faster-rcnn).

More results:
  - Train Mobilenet (1.0, 224) on COCO 2014 trainval35k and test on minival (900k/1190k): **21.4** (from scratch), **21.9** (converted) (**21.8** for tf-faster-rcnn).
  - Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k): **32.4** (converted) (**32.4** for tf-faster-rcnn).
  - Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k): **36.7** (converted) (**36.1** for tf-faster-rcnn).

Approximate *baseline* [setup](https://github.com/endernewton/tf-faster-rcnn/blob/master/experiments/cfgs/res101-lg.yml) from [FPN](https://arxiv.org/abs/1612.03144) (this repository does not contain training code for FPN yet):
  - Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k): ~~**34.2**~~.
  - Train Resnet101 on COCO 2014 trainval35k and test on minival (900k/1190k): ~~**37.4**~~.
  - Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k): ~~**38.2**~~.

**Note**:
  - Due to the randomness of GPU training, especially for VOC, the best numbers (over 2-3 attempts) are reported here. In Xinlei's experience, for COCO you can almost always get a very close number (within ~0.2%) despite the randomness.
  - The numbers are obtained with the **default** testing scheme, which selects region proposals using non-maximal suppression (TEST.MODE nms); the alternative testing scheme (TEST.MODE top) will likely result in slightly better performance (see the [report](https://arxiv.org/pdf/1702.02138.pdf); for COCO it boosts 0.X AP).
  - Since we keep small proposals (< 16 pixels width/height), our performance is especially good on small objects.
  - We do not apply a score threshold (where 0.05 is the usual default) for a detection to be included in the final result, which increases recall.
  - Weight decay is set to 1e-4.
  - For other minor modifications, please check the [report](https://arxiv.org/pdf/1702.02138.pdf). Notable ones include using ``crop_and_resize`` and excluding ground-truth boxes from the RoIs during training.
  - For COCO, we find performance improving with more iterations, and potentially better performance can be achieved with even more iterations.
  - For Resnets, we fix the first block (of 4 total) when fine-tuning the network, and only use ``crop_and_resize`` to resize the RoIs (7x7) without max-pooling (which Xinlei finds useless, especially for COCO). The final feature maps are average-pooled for classification and regression; a minimal sketch of this RoI head follows this list. All batch normalization parameters are fixed. The learning rate for biases is not doubled.
  - For Mobilenets, we fix the first five layers when fine-tuning the network. All batch normalization parameters are fixed. Weight decay for Mobilenet layers is set to 4e-5.
  - For the approximate [FPN](https://arxiv.org/abs/1612.03144) baseline setup, we simply resize the image to 800 pixels, add 32^2 anchors, and take 1000 proposals during testing.
  - Check out [here](http://ladoga.graphics.cs.cmu.edu/xinleic/tf-faster-rcnn/) / [here](http://xinlei.sp.cs.cmu.edu/xinleic/tf-faster-rcnn/) / [here](https://drive.google.com/open?id=0B1_fAEgxdnvJSmF3YUlZcHFqWTQ) for the latest models, including longer COCO VGG16 models and Resnet ones.
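The Resnet RoI head described above can be illustrated with `torchvision.ops`, which the prerequisites below already require. This is a minimal sketch, not the repository's actual code; the feature-map shape and the example box are made up for the illustration:

```Python
import torch
from torchvision.ops import roi_align

# crop_and_resize-style pooling to 7x7 with no max-pool, then average pooling of
# the final feature maps for the classification/regression heads.
feat = torch.randn(1, 1024, 38, 50)                      # pretend Resnet101 conv4 output, stride 16
rois = torch.tensor([[0.0, 16.0, 16.0, 160.0, 224.0]])   # (batch_idx, x1, y1, x2, y2) in image coords
pooled = roi_align(feat, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
# in the real model, the Resnet conv5 block would run on `pooled` here
vec = pooled.mean(dim=(2, 3))                            # average-pool to one vector per RoI
print(vec.shape)                                         # torch.Size([1, 1024])
```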
![](https://oss.gittoolsai.com/images/ruotianluo_pytorch-faster-rcnn_readme_ce0673762446.png)      |  ![](https://oss.gittoolsai.com/images/ruotianluo_pytorch-faster-rcnn_readme_29c8a8fa4d7a.png)
:-------------------------:|:-------------------------:
Displayed Ground Truth on Tensorboard |  Displayed Predictions on Tensorboard

### Additional features
Additional features not mentioned in the [report](https://arxiv.org/pdf/1702.02138.pdf) are added to make research life easier:
  - **Support for train-and-validation.** During training, the validation data is also tested from time to time to monitor the process and check for potential overfitting. Ideally, training and validation should be separate, with the model loaded each time to test on validation. However, Xinlei has implemented it jointly to save time and GPU memory. Though in the default setup the testing data is used for validation, no special attempt is made to overfit on the testing set.
  - **Support for resuming training.** Xinlei tried to store as much information as possible when snapshotting, with the goal of resuming training properly from the latest snapshot. The meta information includes the current image index, the permutation of images, and the random state of numpy; a minimal sketch of this bookkeeping follows this list. However, when you resume training, the random seed for tensorflow is reset (it is unclear how to save the random state of tensorflow), so results will differ. **Note** that the current implementation still cannot force the model to behave deterministically even with the random seeds set. Suggestions/solutions are welcome and much appreciated.
  - **Support for visualization.** The current implementation summarizes ground-truth boxes and statistics of losses, activations and variables during training, and dumps them to a separate folder for tensorboard visualization. The computation graph is also saved for debugging.
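A minimal sketch of the snapshot bookkeeping described in the resume-training item above. This is illustrative only; the function names and snapshot layout are assumptions, not the repository's actual format:

```Python
import numpy as np
import torch

def save_snapshot(path, model, optimizer, it, cur_image_index, image_permutation):
    # Persist model/optimizer state plus the data-ordering meta information the
    # README mentions: current image index, image permutation, numpy RNG state.
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "iter": it,
        "cur_image_index": cur_image_index,
        "image_permutation": image_permutation,
        "np_random_state": np.random.get_state(),
    }, path)

def load_snapshot(path, model, optimizer):
    snap = torch.load(path, map_location="cpu")
    model.load_state_dict(snap["model"])
    optimizer.load_state_dict(snap["optimizer"])
    np.random.set_state(snap["np_random_state"])   # resume the exact data order
    return snap["iter"], snap["cur_image_index"], snap["image_permutation"]
```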
### Prerequisites
  - A basic pytorch installation. The code follows **1.0**. If you are using the old **0.1.12**, **0.2**, **0.3** or **0.4**, you can check out the corresponding branch.
  - Torchvision **0.3**. This code uses `torchvision.ops` for `nms`, `roi_pool` and `roi_align`.
  - Python packages you might not have: `opencv-python`, `easydict` (similar to [py-faster-rcnn](https://github.com/rbgirshick/py-faster-rcnn)). For `easydict` make sure you have the right version; Xinlei uses 1.6.
  - [tensorboard-pytorch](https://github.com/lanpa/tensorboard-pytorch) to visualize the training and validation curves. Please build from source to use the latest tensorflow-tensorboard.
  - ~~Docker users: Since the recent upgrade, the docker image on docker hub (https://hub.docker.com/r/mbuckler/tf-faster-rcnn-deps/) is no longer valid. However, you can still build your own image using the dockerfile located in the `docker` folder (the cuda 8 version, as required by Tensorflow r1.0). Make sure to follow the Tensorflow installation instructions to install and use [nvidia-docker](https://github.com/NVIDIA/nvidia-docker). Last, after launching the container, you have to build the Cython modules within the running container.~~

### Installation
1. Clone the repository
  ```Shell
  git clone https://github.com/ruotianluo/pytorch-faster-rcnn.git
  ```

2. Install the [Python COCO API](https://github.com/pdollar/coco). The code requires the API to access the COCO dataset.
  ```Shell
  cd data
  git clone https://github.com/pdollar/coco.git
  cd coco/PythonAPI
  make
  cd ../../..
  ```

### Setup data
Please follow the py-faster-rcnn instructions [here](https://github.com/rbgirshick/py-faster-rcnn#beyond-the-demo-installation-for-training-and-testing-models) to set up the VOC and COCO datasets (part of COCO is done). The steps involve downloading data and optionally creating soft links in the ``data`` folder. Since faster RCNN does not rely on pre-computed proposals, it is safe to ignore the steps that set up proposals.

If you find it useful, the ``data/cache`` folder created on Xinlei's side is also shared [here](https://drive.google.com/drive/folders/0B1_fAEgxdnvJSmF3YUlZcHFqWTQ).

### Demo and Test with pre-trained models
1. Download a pre-trained model (only the Google Drive link works)
  <!-- ```Shell
  # Resnet101 for voc pre-trained on 07+12 set
  # ./data/scripts/fetch_faster_rcnn_models.sh
  ```
  **Note**: if you cannot download the models through the link, or you want to try more models, you can check out the following solutions and optionally update the downloading script: -->
  - ~~Another server [here](http://gs11655.sp.cs.cmu.edu/xinleic/tf-faster-rcnn/).~~
  - Google drive [here](https://drive.google.com/open?id=0B7fNdx_jAqhtNE10TDZDbFRuU0E).

**(Optional)**
Instead of downloading the pretrained or converted models, you can also convert a tf-faster-rcnn model yourself. Download the tensorflow pretrained model from [tf-faster-rcnn](https://github.com/endernewton/tf-faster-rcnn/#demo-and-test-with-pre-trained-models), then run:
```Shell
python tools/convert_from_tensorflow.py --tensorflow_model resnet_model.ckpt
python tools/convert_from_tensorflow_vgg.py --tensorflow_model vgg_model.ckpt
```

This script will create a `.pth` file with the same name in the same folder as the tensorflow model.
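A quick way to sanity-check a converted checkpoint is to open it and list a few tensors. A minimal sketch, assuming the converter wrote a plain state dict (the `.pth` file name here is illustrative, matching the `.ckpt` stem above):

```Python
import torch

sd = torch.load("resnet_model.pth", map_location="cpu")  # the .pth written next to the .ckpt
print(f"{len(sd)} entries")
for name in list(sd)[:5]:
    print(name, tuple(sd[name].shape))  # e.g. backbone conv/bn weights
```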
2. Create a folder and a soft link to use the pre-trained model
  ```Shell
  NET=res101
  TRAIN_IMDB=voc_2007_trainval+voc_2012_trainval
  mkdir -p output/${NET}/${TRAIN_IMDB}
  cd output/${NET}/${TRAIN_IMDB}
  ln -s ../../../data/voc_2007_trainval+voc_2012_trainval ./default
  cd ../../..
  ```

3. Demo for testing on custom images
  ```Shell
  # at repository root
  GPU_ID=0
  CUDA_VISIBLE_DEVICES=${GPU_ID} ./tools/demo.py
  ```
  **Note**: Resnet101 testing probably requires several gigabytes of memory, so if you encounter memory capacity issues, please install it with CPU support only. Refer to [Issue 25](https://github.com/endernewton/tf-faster-rcnn/issues/25).

4. Test with the pre-trained Resnet101 model
  ```Shell
  GPU_ID=0
  ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101
  ```
  **Note**: If you cannot get the reported numbers (79.8 on my side), the NMS function is probably compiled improperly; refer to [Issue 5](https://github.com/endernewton/tf-faster-rcnn/issues/5).

### Train your own model
1. Download pre-trained models and weights. The current code supports VGG16 and Resnet V1 models. Pre-trained models are provided by [pytorch-vgg](https://github.com/jcjohnson/pytorch-vgg.git) and [pytorch-resnet](https://github.com/ruotianluo/pytorch-resnet) (the ones with caffe in the name); download them and place them in the ``data/imagenet_weights`` folder. For example, for the VGG16 model you can set up like this:
   ```Shell
   mkdir -p data/imagenet_weights
   cd data/imagenet_weights
   python # open python in terminal and run the following Python code
   ```
   ```Python
   import torch
   from torch.utils.model_zoo import load_url
   from torchvision import models

   # Download jcjohnson's caffe-converted VGG16 weights and remap the classifier
   # keys to the layer indices expected downstream
   # (classifier.1 -> classifier.0, classifier.4 -> classifier.3).
   sd = load_url("https://s3-us-west-2.amazonaws.com/jcjohns-models/vgg16-00b39a1b.pth")
   sd['classifier.0.weight'] = sd['classifier.1.weight']
   sd['classifier.0.bias'] = sd['classifier.1.bias']
   del sd['classifier.1.weight']
   del sd['classifier.1.bias']

   sd['classifier.3.weight'] = sd['classifier.4.weight']
   sd['classifier.3.bias'] = sd['classifier.4.bias']
   del sd['classifier.4.weight']
   del sd['classifier.4.bias']

   torch.save(sd, "vgg16.pth")
   ```
   ```Shell
   cd ../..
   ```
   For Resnet101, you can set up like this:
   ```Shell
   mkdir -p data/imagenet_weights
   cd data/imagenet_weights
   # download from my gdrive (link in pytorch-resnet)
   mv resnet101-caffe.pth res101.pth
   cd ../..
   ```

   For Mobilenet V1, you can set up like this:
   ```Shell
   mkdir -p data/imagenet_weights
   cd data/imagenet_weights
   # download from my gdrive (https://drive.google.com/open?id=0B7fNdx_jAqhtZGJvZlpVeDhUN1k)
   mv mobilenet_v1_1.0_224.pth.pth mobile.pth
   cd ../..
   ```
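To verify the remapped VGG16 checkpoint created in step 1 before training, you can diff its keys against a reference VGG16 graph. A minimal, illustrative sketch (the path matches the setup above; torchvision's VGG16 is used purely as a key reference, and the detector itself builds VGG16 through the repo's own network classes):

```Python
import torch
from torchvision import models

# Compare the remapped checkpoint's keys against torchvision's VGG16 state dict;
# any difference is printed rather than silently ignored.
sd = torch.load("data/imagenet_weights/vgg16.pth", map_location="cpu")
ref = models.vgg16().state_dict()
print("missing from checkpoint:", sorted(set(ref) - set(sd)))
print("extra in checkpoint:   ", sorted(set(sd) - set(ref)))
```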
2. Train (and test, evaluate)
  ```Shell
  ./experiments/scripts/train_faster_rcnn.sh [GPU_ID] [DATASET] [NET]
  # GPU_ID is the GPU you want to train on
  # NET in {vgg16, res50, res101, res152} is the network arch to use
  # DATASET {pascal_voc, pascal_voc_0712, coco} is defined in train_faster_rcnn.sh
  # Examples:
  ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16
  ./experiments/scripts/train_faster_rcnn.sh 1 coco res101
  ```
  **Note**: Please double-check that you have deleted the soft links to the pre-trained models before training. If you find NaNs during training, please refer to [Issue 86](https://github.com/endernewton/tf-faster-rcnn/issues/86). If you want multi-GPU support, check out [Issue 121](https://github.com/endernewton/tf-faster-rcnn/issues/121).

3. Visualization with Tensorboard
  ```Shell
  tensorboard --logdir=tensorboard/vgg16/voc_2007_trainval/ --port=7001 &
  tensorboard --logdir=tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ --port=7002 &
  ```

4. Test and evaluate
  ```Shell
  ./experiments/scripts/test_faster_rcnn.sh [GPU_ID] [DATASET] [NET]
  # GPU_ID is the GPU you want to test on
  # NET in {vgg16, res50, res101, res152} is the network arch to use
  # DATASET {pascal_voc, pascal_voc_0712, coco} is defined in test_faster_rcnn.sh
  # Examples:
  ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16
  ./experiments/scripts/test_faster_rcnn.sh 1 coco res101
  ```

5. You can use ``tools/reval.sh`` for re-evaluation.


By default, trained networks are saved under:

```
output/[NET]/[DATASET]/default/
```

Test outputs are saved under:

```
output/[NET]/[DATASET]/default/[SNAPSHOT]/
```

Tensorboard information for train and validation is saved under:

```
tensorboard/[NET]/[DATASET]/default/
tensorboard/[NET]/[DATASET]/default_val/
```

The default number of training iterations is kept the same as in the original faster RCNN for VOC 2007; however, Xinlei finds it beneficial to train longer (see the [report](https://arxiv.org/pdf/1702.02138.pdf) for COCO), probably because the image batch size is one. For VOC 07+12 we switch to an 80k/110k schedule following [R-FCN](https://github.com/daijifeng001/R-FCN). Also note that due to the nondeterministic nature of the current implementation, performance can vary a bit, but in general it should be within ~1% of the reported numbers for VOC and ~0.2% for COCO.
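The nondeterminism noted above is partly run-to-run RNG variance. A minimal sketch of the usual PyTorch seeding recipe (illustrative; as the README says, even this does not make the implementation fully deterministic):

```Python
import random
import numpy as np
import torch

def set_seed(seed: int = 3):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        # Trades speed for reduced run-to-run variance in cuDNN kernels.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```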
Suggestions/contributions are welcome.

### Citation
If you find this implementation or the analysis conducted in our report helpful, please consider citing:

    @article{chen17implementation,
        Author = {Xinlei Chen and Abhinav Gupta},
        Title = {An Implementation of Faster RCNN with Study for Region Sampling},
        Journal = {arXiv preprint arXiv:1702.02138},
        Year = {2017}
    }

For convenience, here is the faster RCNN citation:

    @inproceedings{renNIPS15fasterrcnn,
        Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
        Title = {Faster {R-CNN}: Towards Real-Time Object Detection
                 with Region Proposal Networks},
        Booktitle = {Advances in Neural Information Processing Systems ({NIPS})},
        Year = {2015}
    }

### ~~Detailed numbers from the COCO server~~ (not supported)

All the models are trained on COCO 2014 [trainval35k](https://github.com/rbgirshick/py-faster-rcnn/tree/master/models).

VGG16 COCO 2015 test-dev (900k/1190k):
```
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.297
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.504
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.312
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.128
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.325
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.421
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.399
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.409
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.187
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.451
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.591
```

VGG16 COCO 2015 test-std (900k/1190k):
```
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.295
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.501
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.312
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.119
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.327
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.418
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.273
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.400
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.409
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.179
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.455
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.586
```
# pytorch-faster-rcnn quick-start guide

> **⚠️ Important**: this project stopped active maintenance in 2019. The author recommends more modern and actively maintained alternatives such as [detectron2](https://github.com/facebookresearch/detectron2) or [mmdetection](https://github.com/open-mmlab/mmdetection). This guide is only for users who need to reproduce old experiments or use specific historical models.

## 1. Environment

The project is built on PyTorch and has specific version requirements.

*   **OS**: Linux (Ubuntu recommended)
*   **Python**: 3.6+ recommended
*   **PyTorch**: version **1.0** (the code targets this version). For 0.1.12 through 0.4, switch to the corresponding branch.
*   **Torchvision**: version **0.3** (required; provides `nms`, `roi_pool`, `roi_align`)
*   **Other dependencies**:
    *   `opencv-python`
    *   `easydict` (version 1.6 recommended)
    *   `tensorboard-pytorch` (for visualizing training)

**Example dependency install:**
```bash
pip install torch==1.0.0 torchvision==0.3.0 opencv-python easydict==1.6 tensorboard-pytorch
```
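Before going further, it is worth confirming that the pinned versions actually provide `torchvision.ops`. A minimal check (an addition to this guide, not from the original README):

```Python
import torch
import torchvision

print(torch.__version__, torchvision.__version__)  # expect 1.0.x / 0.3.x
# torchvision.ops only exists from 0.3 onward, so this import doubles as a version check.
from torchvision.ops import nms, roi_align, roi_pool  # noqa: F401
```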
## 2. Installation

### 2.1 Clone the repository
```bash
git clone https://github.com/ruotianluo/pytorch-faster-rcnn.git
cd pytorch-faster-rcnn
```

### 2.2 Install the COCO API
The code needs the Python COCO API to access the COCO dataset.
```bash
cd data
git clone https://github.com/pdollar/coco.git
cd coco/PythonAPI
make
cd ../../..
```
*(Note: if cloning is slow from mainland China, try a Gitee mirror or a git proxy.)*

### 2.3 Prepare the datasets
VOC and COCO are supported. Follow the original [py-faster-rcnn](https://github.com/rbgirshick/py-faster-rcnn) instructions to download the data and create soft links under the `data` folder.
*   No pre-computed proposals are needed; the related setup steps can be skipped.
*   If you want the cache files, download the `data/cache` contents from the author's Google Drive link.

## 3. Basic usage

### 3.1 Get a pre-trained model
Either download the author's converted PyTorch models directly, or convert from a TensorFlow model.

**Option A: direct download (Google Drive)**
Download the model files from [this link](https://drive.google.com/open?id=0B7fNdx_jAqhtNE10TDZDbFRuU0E) and place them where the test scripts expect them (the README above creates `output/${NET}/${TRAIN_IMDB}/default` via a soft link).

**Option B: convert from a TensorFlow model**
If you already have a `tf-faster-rcnn` pretrained model, run:
```bash
python tools/convert_from_tensorflow.py --tensorflow_model <path_to_tf_model>
```
The converter writes a `.pth` file with the same name next to the TensorFlow model.

### 3.2 Run the demo
Detect objects in images with a pre-trained model, from the repository root:
```bash
GPU_ID=0
CUDA_VISIBLE_DEVICES=${GPU_ID} ./tools/demo.py
```
*(Check `tools/demo.py` for the expected model path and input images; results can optionally be visualized in TensorBoard.)*

### 3.3 Train and validate
The project monitors validation automatically during training. With the dataset paths configured, run the training script:
```bash
# examples; GPU id, dataset and net are positional arguments
./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16
./experiments/scripts/train_faster_rcnn.sh 1 coco res101
```

**Feature notes:**
*   **Resumable training**: training resumes from the latest snapshot, preserving the image index and random state.
*   **Visualization**: losses, activation statistics, and ground truth are logged to the TensorBoard directories automatically.

# Use case

An algorithm engineer at an autonomous-driving startup needs to validate a PyTorch-based real-time vehicle and pedestrian detection approach on a limited compute budget.

### Without pytorch-faster-rcnn
- **Painful framework migration**: the team knows the PyTorch ecosystem, but at the time no performant PyTorch Faster R-CNN existed, forcing awkward hops between Caffe and TensorFlow versions at high debugging cost.
- **Missing reproduction details**: working from the original paper or the official Caffe code, subtle choices such as region sampling diverge, so accuracy never lines up with the published baselines.
- **Backbone lock-in**: swapping VGG16, ResNet101, or a lightweight Mobilenet for ablation studies is hard, which blocks model selection for in-vehicle embedded hardware.
- **Incompatible pretrained weights**: existing TensorFlow weights cannot be loaded directly, so everything must be trained from scratch at a large cost in GPU time and labeling.

### With pytorch-faster-rcnn
- **Native PyTorch support**: a complete PyTorch implementation slots into the existing workflow with no cross-framework wrangling, cutting setup time substantially.
- **Accuracy parity**: the tool faithfully reproduces tf-faster-rcnn's details (including the region-sampling strategy), so VOC and COCO mAP matches the TensorFlow version (e.g. ResNet101 reaches 75.76 on VOC 2007).
- **Flexible backbones**: VGG16, ResNet, and Mobilenet are built in, letting the team compare accuracy/speed trade-offs quickly and accelerate edge-deployment decisions.
- **Smooth weight migration**: mature TensorFlow pretrained models convert directly to PyTorch, so fine-tuning can start immediately from high-quality weights.

By providing a precise, detail-faithful PyTorch implementation, pytorch-faster-rcnn closes the gap of reproducing detection algorithms across frameworks and lets teams focus on strategy rather than plumbing.
# About the author

[Ruotian (RT) Luo](https://github.com/ruotianluo): Waymo Perception SWE, PhD graduate from TTIC. Company: Waymo · Location: Austin · Email: rluo@ttic.edu · Website: http://ttic.uchicago.edu/~rluo

# Project facts

- **Stars**: 1,817 · **Forks**: 467 · **License**: MIT · **Difficulty**: 4/5 · **Last commit**: 2026-03-26
- **Languages**: Jupyter Notebook 97.2%, Python 2.6%, Shell 0.1%, MATLAB ~0%, Roff ~0%
- **Environment**: Linux; an NVIDIA GPU is assumed (the README mentions nvidia-docker and CUDA 8), VRAM requirement unspecified; RAM requirement unspecified
- **Dependencies**: pytorch==1.0, torchvision==0.3, opencv-python, easydict==1.6, tensorboard-pytorch, COCO API. Use a Python version compatible with PyTorch 1.0 (no exact version is pinned); for PyTorch 0.1.12 through 0.4, switch branches.
- **Notes**: maintenance stopped in 2019; detectron2 or mmdetection are recommended instead. The old trunk lives in the `old_master` branch and its pretrained models are incompatible with master. The Docker Hub image is defunct; build your own from the provided Dockerfile (CUDA 8) and compile the Cython modules inside the container. VGG16, ResNet, and Mobilenet backbones are supported.

# FAQ

**Q: The loss becomes NaN at the start of training, or the final results are far below expectations. What should I do?**

A: This is usually caused by the wrong pretrained weight file. Make sure you download and use the officially recommended PyTorch pretrained models rather than weights converted from another framework. The recommended ResNet101 download is https://download.pytorch.org/models/resnet101-5d3b4d8f.pth. Follow the README instructions exactly. ([issue #4](https://github.com/ruotianluo/pytorch-faster-rcnn/issues/4))

**Q: Loading a TensorFlow-converted model, or running demo.py, raises a KeyError such as 'resnet.bn1.bias' or 'vgg.features...'. How do I fix it?**

A: The TensorFlow-converted model is structurally incompatible with the current PyTorch code (notably the BatchNorm layers, or missing keys). Use native PyTorch pretrained weights for training and inference instead of a `.pth` file converted from TensorFlow, unless the code has been specifically adapted to that format. ([issue #25](https://github.com/ruotianluo/pytorch-faster-rcnn/issues/25))

**Q: Training aborts (or drops into debug mode) with 'fg_inds.numel()=0 and bg_inds.numel()=0'. What now?**

A: A known issue that occurs when no foreground (fg) or background (bg) samples are selected; it was fixed in PR #169. If you cannot update the code, patch `lib/layer_utils/proposal_target_layer.py` so that when both sets are empty, some RoIs are force-sampled as background, along the lines of `if fg_inds.numel() == 0 and bg_inds.numel() == 0: bg_inds = torch.from_numpy(npr.choice(...)).long().cuda()`. ([issue #68](https://github.com/ruotianluo/pytorch-faster-rcnn/issues/68))

**Q: Why doesn't proposal_target_layer.py handle the case where fg_inds and bg_inds are both empty?**

A: It was a logic gap: with `TRAIN.USE_GT` set to False, or with ill-chosen `TRAIN.BG_THRESH_HI/LO` thresholds, an image can yield neither positive nor negative samples. PR #169 added a random-sampling fallback for this edge case so training no longer crashes. ([issue #11](https://github.com/ruotianluo/pytorch-faster-rcnn/issues/11))

**Q: Training runs out of GPU memory, especially with large input images. What can I adjust?**

A: Reduce the batch size: lower the `TRAIN.BATCH_SIZE` setting in the config (for example to 64 or less) to accommodate a larger input scale (SCALE) or a smaller GPU. ([issue #29](https://github.com/ruotianluo/pytorch-faster-rcnn/issues/29))
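A hedged sketch of that batch-size override, assuming the port mirrors tf-faster-rcnn's config layout (an easydict `cfg` in `lib/model/config.py`; verify against your checkout before relying on it):

```Python
# With the repo's lib/ directory on PYTHONPATH:
from model.config import cfg

cfg.TRAIN.BATCH_SIZE = 64  # number of RoIs sampled per image; lower it to fit in memory
# Equivalently, set it in the experiment YAML (experiments/cfgs/*.yml):
#   TRAIN:
#     BATCH_SIZE: 64
```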
**Q: Fine-tuning on custom data with pretrained weights raises 'KeyError: resnet.bn1.num_batches_tracked'. What should I do?**

A: This is a PyTorch version mismatch: newer pretrained models contain `num_batches_tracked` entries that older code or network definitions do not. It is fixed in newer revisions, so first make sure your code is up to date; otherwise check the `load_pretrained_cnn` implementation and either ignore mismatched keys when loading the state dict, or move the loading until after the network structure is properly initialized. ([issue #140](https://github.com/ruotianluo/pytorch-faster-rcnn/issues/140))

# Releases

- **2.0** (2018-04-25): Code updated to PyTorch 0.4. Thanks to the `.to(device)` method, the model now switches easily between CPU and CUDA; by default, tests and demos fall back to the CPU automatically when no CUDA device is available.
- **1.0** (2017-10-20): This version is fairly stable, though not as strong as the tf-faster-rcnn master branch. The changes will be merged into master next; this tag exists only as a backup.
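A minimal sketch of the device-selection pattern the 2.0 release notes describe (illustrative; the tiny module below stands in for the detection network):

```Python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = torch.nn.Linear(4, 2).to(device)   # stand-in for the detection network
x = torch.randn(1, 4, device=device)
print(net(x).shape)                      # works unchanged on CPU or GPU
```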