[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-KaihuaTang--Scene-Graph-Benchmark.pytorch":3,"tool-KaihuaTang--Scene-Graph-Benchmark.pytorch":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":81,"owner_email":82,"owner_twitter":80,"owner_website":83,"owner_url":84,"languages":85,"stars":109,"forks":110,"last_commit_at":111,"license":112,"difficulty_score":113,"env_os":114,"env_gpu":115,"env_ram":116,"env_deps":117,"category_tags":125,"github_topics":80,"view_count":23,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":126,"updated_at":127,"faqs":128,"releases":157},2340,"KaihuaTang\u002FScene-Graph-Benchmark.pytorch","Scene-Graph-Benchmark.pytorch","A new codebase for popular Scene Graph Generation methods (2020). Visualization & Scene Graph Extraction on custom images\u002Fdatasets are provided. 
It's also a PyTorch implementation of the paper “Unbiased Scene Graph Generation from Biased Training CVPR 2020”","Scene-Graph-Benchmark.pytorch 是一个基于 PyTorch 构建的场景图生成（SGG）开源代码库，也是 CVPR 2020 口头报告论文《Unbiased Scene Graph Generation from Biased Training》的官方实现。它旨在解决旧有代码库（如 neural-motifs）与现代目标检测框架脱节的问题，通过集成成熟的 maskrcnn-benchmark 架构，将关系预测定义为额外的 ROI Head，从而提供更稳定、易读且易于扩展的开发环境。\n\n该工具不仅复现了多种主流场景图生成方法，还特别针对训练数据中的偏差问题提出了无偏生成方案，显著提升了模型在少样本和零样本场景下的表现。此外，它支持用户在自定义图片上进行场景图检测与可视化，并输出标准化的 JSON 结果，方便后续分析。\n\nScene-Graph-Benchmark.pytorch 非常适合计算机视觉领域的研究人员和开发者使用，尤其是那些希望深入探索场景图生成算法、复现前沿论文或构建自定义视觉关系理解系统的团队。其清晰的代码结构和完善的文档也使其成为初学者入门该领域的理想选择。","Scene-Graph-Benchmark.pytorch 是一个基于 PyTorch 构建的场景图生成（SGG）开源代码库，也是 CVPR 2020 口头报告论文《Unbiased Scene Graph Generation from Biased Training》的官方实现。它旨在解决旧有代码库（如 neural-motifs）与现代目标检测框架脱节的问题，通过集成成熟的 maskrcnn-benchmark 架构，将关系预测定义为额外的 ROI Head，从而提供更稳定、易读且易于扩展的开发环境。\n\n该工具不仅复现了多种主流场景图生成方法，还特别针对训练数据中的偏差问题提出了无偏生成方案，显著提升了模型在少样本和零样本场景下的表现。此外，它支持用户在自定义图片上进行场景图检测与可视化，并输出标准化的 JSON 结果，方便后续分析。\n\nScene-Graph-Benchmark.pytorch 非常适合计算机视觉领域的研究人员和开发者使用，尤其是那些希望深入探索场景图生成算法、复现前沿论文或构建自定义视觉关系理解系统的团队。其清晰的代码结构和完善的文档也使其成为初学者入门该领域的理想选择。","# Scene Graph Benchmark in Pytorch\n\n[![LICENSE](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-green)](https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FScene-Graph-Benchmark.pytorch\u002Fblob\u002Fmaster\u002FLICENSE)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.7-blue.svg)](https:\u002F\u002Fwww.python.org\u002F)\n![PyTorch](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpytorch-1.2.0-%237732a8)\n\nOur paper [Unbiased Scene Graph Generation from Biased Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.11949) has been accepted by CVPR 2020 (Oral).\n\n## Recent Updates\n\n- [x] 2020.06.23 Add no graph constraint mean Recall@K (ng-mR@K) and no graph constraint Zero-Shot Recall@K (ng-zR@K) [\\[link\\]](METRICS.md#explanation-of-our-metrics)\n- [x] 2020.06.23 Allow scene graph detection (SGDet) on custom images [\\[link\\]](#SGDet-on-custom-images)\n- [x] 2020.07.21 Change scene graph detection output on custom images to json files [\\[link\\]](#SGDet-on-custom-images)\n- [x] 2020.07.21 Visualize detected scene graphs of custom images [\\[link\\]](#Visualize-Detected-SGs-of-Custom-Images)\n- [ ] TODO: Using [Background-Exempted Inference](https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FLong-Tailed-Recognition.pytorch\u002Ftree\u002Fmaster\u002Flvis1.0#background-exempted-inference) to improve the quality of TDE Scene Graph\n\n## Contents\n\n1. [Overview](#Overview)\n2. [Install the Requirements](INSTALL.md)\n3. [Prepare the Dataset](DATASET.md)\n4. [Metrics and Results for our Toolkit](METRICS.md)\n    - [Explanation of R@K, mR@K, zR@K, ng-R@K, ng-mR@K, ng-zR@K, A@K, S2G](METRICS.md#explanation-of-our-metrics)\n    - [Output Format](METRICS.md#output-format-of-our-code)\n    - [Reported Results](METRICS.md#reported-results)\n5. [Faster R-CNN Pre-training](#pretrained-models)\n6. [Scene Graph Generation as RoI_Head](#scene-graph-generation-as-RoI_Head)\n7. [Training on Scene Graph Generation](#perform-training-on-scene-graph-generation)\n8. [Evaluation on Scene Graph Generation](#Evaluation)\n9. [**Detect Scene Graphs on Your Custom Images** :star2:](#SGDet-on-custom-images)\n10. [**Visualize Detected Scene Graphs of Custom Images** :star2:](#Visualize-Detected-SGs-of-Custom-Images)\n11. [Other Options that May Improve the SGG](#other-options-that-may-improve-the-SGG)\n12. 
[Tips and Tricks for TDE on any Unbiased Task](#tips-and-Tricks-for-any-unbiased-taskX-from-biased-training)\n13. [Frequently Asked Questions](#frequently-asked-questions)\n14. [Citations](#Citations)\n\n## Overview\n\nThis project aims to build a new CODEBASE of Scene Graph Generation (SGG), and it is also a Pytorch implementation of the paper [Unbiased Scene Graph Generation from Biased Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.11949). The previous widely adopted SGG codebase [neural-motifs](https:\u002F\u002Fgithub.com\u002Frowanz\u002Fneural-motifs) is detached from the recent development of Faster\u002FMask R-CNN. Therefore, I decided to build a scene graph benchmark on top of the well-known [maskrcnn-benchmark](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fmaskrcnn-benchmark) project and define relationship prediction as an additional roi_head. By the way, thanks to their elegant framework, this codebase is much more novice-friendly and easier to read\u002Fmodify for your own projects than the previous neural-motifs framework (at least I hope so). It is a pity that when I was working on this project, [detectron2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdetectron2) had not been released, but I think we can consider [maskrcnn-benchmark](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fmaskrcnn-benchmark) as a more stable version with fewer bugs, hahahaha. I also introduce all the old and new metrics used in SGG, and clarify two common misunderstandings in SGG metrics in [METRICS.md](METRICS.md), which cause abnormal results in some papers.\n\n### Benefiting from the up-to-date Faster R-CNN in [maskrcnn-benchmark](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fmaskrcnn-benchmark), this codebase achieves new state-of-the-art Recall@k on SGCls & SGGen (by 2020.2.16) through the reimplemented VCTree using two 1080ti GPUs and batch size 8:\n\nModels | SGGen R@20 | SGGen R@50 | SGGen R@100 | SGCls R@20 | SGCls R@50 | SGCls R@100 | PredCls R@20 | PredCls R@50 | PredCls R@100\n-- | -- | -- | -- | -- | -- | -- | -- | -- | -- \nVCTree | 24.53 | 31.93 | 36.21 | 42.77 | 46.67 | 47.64 | 59.02 | 65.42 | 67.18\n\nNote that all results of VCTree should be better than what we reported in [Unbiased Scene Graph Generation from Biased Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.11949), because we optimized the tree construction network after the publication.\n\n### The illustration of the Unbiased SGG from 'Unbiased Scene Graph Generation from Biased Training'\n\n![alt text](demo\u002Fteaser_figure.png \"from 'Unbiased Scene Graph Generation from Biased Training'\")\n\n## Installation\n\nCheck [INSTALL.md](INSTALL.md) for installation instructions.\n\n## Dataset\n\nCheck [DATASET.md](DATASET.md) for instructions on dataset preprocessing.\n\n## Metrics and Results **(IMPORTANT)**\nExplanation of metrics in our toolkit and reported results are given in [METRICS.md](METRICS.md).\n\n\n## Pretrained Models\n\nSince we tested many SGG models in our paper [Unbiased Scene Graph Generation from Biased Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.11949), I won't upload all the pretrained SGG models here. However, you can download the [pretrained Faster R-CNN](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir8xemVHbqPBrvjjtQg?e=hAhYCw) we used in the paper, which is the most time-consuming step in the whole training process (it took 4 2080ti GPUs). 
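\n\nBefore wiring the checkpoint into the configs, it can be worth a quick sanity check with plain PyTorch. The snippet below is a minimal sketch: it assumes the archive extracts to the ```model_final.pth``` referenced by the training commands later in this README, and that the file stores either a raw state dict or a dict like ```{'model': state_dict, ...}``` (both assumptions worth verifying on your copy):\n```python\nimport torch\n\n# Assumption: the extracted archive contains model_final.pth in this layout.\nckpt = torch.load('\u002Fhome\u002Fusername\u002Fcheckpoints\u002Fpretrained_faster_rcnn\u002Fmodel_final.pth', map_location='cpu')\nstate = ckpt.get('model', ckpt)  # unwrap if the weights sit under a 'model' key\nprint(len(state), 'tensors in the checkpoint')\nfor name in list(state)[:5]:  # peek at the first few parameter names and shapes\n    print(name, tuple(state[name].shape))\n```\n\n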
As to the SGG model, you can follow the rest of the instructions to train your own, which only takes 2 GPUs to train each SGG model. The results should be very close to the reported results given in [METRICS.md](METRICS.md).\n\nAfter you download the [Faster R-CNN model](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir8xemVHbqPBrvjjtQg?e=hAhYCw), please extract all the files to the directory `\u002Fhome\u002Fusername\u002Fcheckpoints\u002Fpretrained_faster_rcnn`. To train your own Faster R-CNN model, please follow the next section.\n\nThe above pretrained Faster R-CNN model achieves 38.52\u002F26.35\u002F28.14 mAP on the VG train\u002Fval\u002Ftest sets respectively.\n\n## Alternate links\nThanks for the sponsorship from [Catchip](https:\u002F\u002Fgithub.com\u002FCatchip). Since OneDrive links might be broken in mainland China, we also provide the following alternate links for all the pretrained models and dataset annotations: \n\nLink1(Baidu)：[https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1oyPQBDHXMQ5Tsl0jy5OzgA](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1oyPQBDHXMQ5Tsl0jy5OzgA)\nExtraction code：1234\n\nLink2(Weiyun): [https:\u002F\u002Fshare.weiyun.com\u002FViTWrFxG](https:\u002F\u002Fshare.weiyun.com\u002FViTWrFxG)\n\n## Faster R-CNN pre-training\nThe following command can be used to train your own Faster R-CNN model:\n```bash\nCUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 10001 --nproc_per_node=4 tools\u002Fdetector_pretrain_net.py --config-file \"configs\u002Fe2e_relation_detector_X_101_32_8_FPN_1x.yaml\" SOLVER.IMS_PER_BATCH 8 TEST.IMS_PER_BATCH 4 DTYPE \"float16\" SOLVER.MAX_ITER 50000 SOLVER.STEPS \"(30000, 45000)\" SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 MODEL.RELATION_ON False OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fpretrained_faster_rcnn SOLVER.PRE_VAL False\n```\nwhere ```CUDA_VISIBLE_DEVICES``` and ```--nproc_per_node``` represent the ids of the GPUs and the number of GPUs you use, and ```--config-file``` specifies the config we use, in which you can change other parameters. ```SOLVER.IMS_PER_BATCH``` and ```TEST.IMS_PER_BATCH``` are the training and testing batch sizes respectively, ```DTYPE \"float16\"``` enables Automatic Mixed Precision supported by [APEX](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex), ```SOLVER.MAX_ITER``` is the maximum number of iterations, ```SOLVER.STEPS``` lists the steps at which we decay the learning rate, ```SOLVER.VAL_PERIOD``` and ```SOLVER.CHECKPOINT_PERIOD``` are the periods for conducting validation and saving checkpoints, ```MODEL.RELATION_ON``` controls whether the relationship head is turned on (since this is the pretraining phase for Faster R-CNN only, we turn off the relationship head), ```OUTPUT_DIR``` is the output directory to save checkpoints and the log (here `\u002Fhome\u002Fusername\u002Fcheckpoints\u002Fpretrained_faster_rcnn`), and ```SOLVER.PRE_VAL``` controls whether we conduct validation before training.\n\n\n## Scene Graph Generation as RoI_Head\n\nTo standardize the SGG, I define scene graph generation as an RoI_Head. 
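\n\nIn maskrcnn-benchmark, every roi_head follows the same calling convention: it consumes the backbone features and the current proposals and returns updated features, the proposals, and a dict of losses. The skeleton below is only an illustrative sketch of that convention (the class name, the linear classifier, and the loss wiring are assumptions, not this repo's code; 51 is the number of VG predicate classes used throughout this project):\n```python\nfrom torch import nn\n\nclass RelationHeadSketch(nn.Module):\n    # Illustrative only: mirrors the (features, proposals, targets) ->\n    # (features, proposals, losses) convention of maskrcnn-benchmark roi_heads.\n    def __init__(self, in_channels, num_predicates=51):\n        super().__init__()\n        self.rel_logits = nn.Linear(in_channels, num_predicates)\n\n    def forward(self, features, proposals, targets=None):\n        rel_scores = self.rel_logits(features)\n        if self.training:\n            # Assumption: targets carry integer predicate labels in training.\n            loss = nn.functional.cross_entropy(rel_scores, targets)\n            return features, proposals, {'loss_rel': loss}\n        return features, proposals, {}\n```\n\n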
Referring to the design of other roi_heads like box_head, I put most of the SGG code under ```maskrcnn_benchmark\u002Fmodeling\u002Froi_heads\u002Frelation_head``` and their calling sequence is as follows:\n\n![alt text](demo\u002Frelation_head.png \"structure of relation_head\")\n\n\n## Perform training on Scene Graph Generation\n\nThere are **three standard protocols**: (1) Predicate Classification (PredCls): taking ground truth bounding boxes and labels as inputs, (2) Scene Graph Classification (SGCls): using ground truth bounding boxes without labels, (3) Scene Graph Detection (SGDet): detecting SGs from scratch. We use two switches ```MODEL.ROI_RELATION_HEAD.USE_GT_BOX``` and ```MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL``` to select the protocols. \n\nFor **Predicate Classification (PredCls)**, we need to set:\n``` bash\nMODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True\n```\nFor **Scene Graph Classification (SGCls)**:\n``` bash\nMODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False\n```\nFor **Scene Graph Detection (SGDet)**:\n``` bash\nMODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False\n```\n\n### Predefined Models\nWe abstract various SGG models to be different ```relation-head predictors``` in the file ```roi_heads\u002Frelation_head\u002Froi_relation_predictors.py```, which are independent of the Faster R-CNN backbone and relation-head feature extractor. To select our predefined models, you can use ```MODEL.ROI_RELATION_HEAD.PREDICTOR```.\n\nFor [Neural-MOTIFS](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.06640) Model:\n```bash\nMODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor\n```\nFor [Iterative-Message-Passing(IMP)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1701.02426) Model (Note that SOLVER.BASE_LR should be changed to 0.001 in SGCls, or the model won't converge):\n```bash\nMODEL.ROI_RELATION_HEAD.PREDICTOR IMPPredictor\n```\nFor [VCTree](https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.01880) Model:\n```bash\nMODEL.ROI_RELATION_HEAD.PREDICTOR VCTreePredictor\n```\nFor our predefined Transformer Model (Note that the Transformer Model needs SOLVER.BASE_LR changed to 0.001, SOLVER.SCHEDULE.TYPE to WarmupMultiStepLR, SOLVER.MAX_ITER to 16000, SOLVER.IMS_PER_BATCH to 16, and SOLVER.STEPS to (10000, 16000)), which is provided by [Jiaxin Shi](https:\u002F\u002Fgithub.com\u002Fshijx12):\n```bash\nMODEL.ROI_RELATION_HEAD.PREDICTOR TransformerPredictor\n```\nFor [Unbiased-Causal-TDE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.11949) Model:\n```bash\nMODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor\n```\n\nThe default settings are under ```configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml``` and ```maskrcnn_benchmark\u002Fconfig\u002Fdefaults.py```. The priority is ```command > yaml > defaults.py```.\n\n### Customize Your Own Model\nIf you want to customize your own model, you can refer to ```maskrcnn-benchmark\u002Fmodeling\u002Froi_heads\u002Frelation_head\u002Fmodel_XXXXX.py``` and ```maskrcnn-benchmark\u002Fmodeling\u002Froi_heads\u002Frelation_head\u002Futils_XXXXX.py```. You also need to add the corresponding nn.Module in ```maskrcnn-benchmark\u002Fmodeling\u002Froi_heads\u002Frelation_head\u002Froi_relation_predictors.py```. 
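\n\nA bare-bones predictor might look like the sketch below. It is hypothetical: copy the exact constructor and forward signature from the predictors already defined in ```roi_relation_predictors.py``` and register the new class there; the snippet only illustrates the general shape, using the plain feature concatenation that most predictors in this project default to:\n```python\nimport torch\nfrom torch import nn\n\nclass ToyRelationPredictor(nn.Module):\n    # Hypothetical skeleton -- mirror the signatures of the real predictors\n    # in roi_heads\u002Frelation_head\u002Froi_relation_predictors.py before using.\n    def __init__(self, in_channels, num_predicates=51):\n        super().__init__()\n        self.classifier = nn.Linear(2 * in_channels, num_predicates)\n\n    def forward(self, head_rep, tail_rep):\n        # Combine subject and object features by concatenation, as most\n        # predictors here do, then score every candidate predicate.\n        pair = torch.cat((head_rep, tail_rep), dim=-1)\n        return self.classifier(pair)\n```\nOnce registered, you would select it through ```MODEL.ROI_RELATION_HEAD.PREDICTOR``` just like the predefined models above.\n\n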
Sometimes you may also need to change the inputs & outputs of the module through ```maskrcnn-benchmark\u002Fmodeling\u002Froi_heads\u002Frelation_head\u002Frelation_head.py```.\n\n### The proposed Causal TDE on [Unbiased Scene Graph Generation from Biased Training](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.11949)\nAs to the Unbiased-Causal-TDE, there are some additional parameters you need to know. ```MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE``` is used to select the causal effect analysis type during inference (test), where \"none\" is original likelihood, \"TDE\" is total direct effect, \"NIE\" is natural indirect effect, \"TE\" is total effect. ```MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE``` has two choices: \"sum\" or \"gate\". Since Unbiased Causal TDE Analysis is model-agnostic, we support [Neural-MOTIFS](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.06640), [VCTree](https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.01880) and [VTransE](https:\u002F\u002Farxiv.org\u002Fabs\u002F1702.08319). ```MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER``` is used to select these models for Unbiased Causal Analysis, which has three choices: motifs, vctree, vtranse.\n\nNote that during training, we always set ```MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE``` to be 'none', because causal effect analysis is only applicable to the inference\u002Ftest phase.\n\n### Examples of the Training Command\nTraining Example 1 : (PredCls, Motif Model)\n```bash\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 10025 --nproc_per_node=2 tools\u002Frelation_train_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 2 DTYPE \"float16\" SOLVER.MAX_ITER 50000 SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 GLOVE_DIR \u002Fhome\u002Fkaihua\u002Fglove MODEL.PRETRAINED_DETECTOR_CKPT \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fpretrained_faster_rcnn\u002Fmodel_final.pth OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fmotif-precls-exmp\n```\nwhere ```GLOVE_DIR``` is the directory used to save GloVe initializations, ```MODEL.PRETRAINED_DETECTOR_CKPT``` is the pretrained Faster R-CNN model you want to load, ```OUTPUT_DIR``` is the output directory used to save checkpoints and the log. 
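\n\nAs an aside on the Causal TDE switches described above: at inference time, TDE boils down to running the model twice and subtracting the counterfactual (intervened) logits from the factual ones. The toy sketch below shows only that arithmetic; the stand-in linear classifier and the zeroed-out intervention are assumptions for illustration, not the repo's implementation:\n```python\nimport torch\n\ndef tde_logits(model, factual_x, counterfactual_x):\n    # Total Direct Effect: factual prediction minus the biased prior\n    # obtained from an intervened (here: wiped) input.\n    return model(factual_x) - model(counterfactual_x)\n\nmodel = torch.nn.Linear(8, 51)   # stand-in for a trained relation classifier\nx = torch.randn(4, 8)            # factual pair features\nx_bar = torch.zeros_like(x)      # intervened features (an assumption)\nunbiased = tde_logits(model, x, x_bar)\nprint(unbiased.shape)            # torch.Size([4, 51])\n```\n\n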
Note that since we use ```WarmupReduceLROnPlateau``` as the learning rate scheduler for SGG, ```SOLVER.STEPS``` is no longer required.\n\nTraining Example 2 : (SGCls, Causal, **TDE**, SUM Fusion, MOTIFS Model)\n```bash\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 10026 --nproc_per_node=2 tools\u002Frelation_train_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE none MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs  SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 2 DTYPE \"float16\" SOLVER.MAX_ITER 50000 SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 GLOVE_DIR \u002Fhome\u002Fkaihua\u002Fglove MODEL.PRETRAINED_DETECTOR_CKPT \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fpretrained_faster_rcnn\u002Fmodel_final.pth OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgcls-exmp\n```\n\n\n## Evaluation\n\n### Examples of the Test Command\nTest Example 1 : (PredCls, Motif Model)\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools\u002Frelation_test_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor TEST.IMS_PER_BATCH 1 DTYPE \"float16\" GLOVE_DIR \u002Fhome\u002Fkaihua\u002Fglove MODEL.PRETRAINED_DETECTOR_CKPT \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fmotif-precls-exmp OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fmotif-precls-exmp\n```\n\nTest Example 2 : (SGCls, Causal, **TDE**, SUM Fusion, MOTIFS Model)\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10028 --nproc_per_node=1 tools\u002Frelation_test_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE TDE MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs  TEST.IMS_PER_BATCH 1 DTYPE \"float16\" GLOVE_DIR \u002Fhome\u002Fkaihua\u002Fglove MODEL.PRETRAINED_DETECTOR_CKPT \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgcls-exmp OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgcls-exmp\n```\n\n### Examples of Pretrained Causal MOTIFS-SUM models\nExamples of Pretrained Causal MOTIFS-SUM models on SGDet\u002FSGCls\u002FPredCls (batch size 12): [(SGDet Download)](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir9x7OYb6sKBlzoXuYA?e=s3Y602), [(SGCls Download)](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir9xyuLO_I8TSZ6kfyQ?e=Y5686s), [(PredCls Download)](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir9xx725wYjN7lytynA?e=0B65Ws)\n\nCorresponding Results (The original models used in the paper are lost. These are the fresh ones, so there are some fluctuations in the results. 
More results can be found in [Reported Results](METRICS.md#reported-results)):\n\nModels |  R@20 | R@50 | R@100 | mR@20 | mR@50 | mR@100 | zR@20 | zR@50 | zR@100\n-- | -- | -- | -- | -- | -- | -- | -- | -- | -- \nMOTIFS-SGDet-none   | 25.42 | 32.45 | 37.26 | 4.36 | 5.83 | 7.08 | 0.02 | 0.08 | 0.24\nMOTIFS-SGDet-TDE    | 11.92 | 16.56 | 20.15 | 6.58 | 8.94 | 10.99 | 1.54 | 2.33 | 3.03\nMOTIFS-SGCls-none   | 36.02 | 39.25 | 40.07 | 6.50 | 8.02 | 8.51 | 1.06 | 2.18 | 3.07\nMOTIFS-SGCls-TDE    | 20.47 | 26.31 | 28.79 | 9.80 | 13.21 | 15.06 | 1.91 | 2.95 | 4.10\nMOTIFS-PredCls-none | 59.64 | 66.11 | 67.96 | 11.46 | 14.60 | 15.84 | 5.79 | 11.02 | 14.74\nMOTIFS-PredCls-TDE  | 33.38 | 45.88 | 51.25 | 17.85 | 24.75 | 28.70 | 8.28 | 14.31 | 18.04\n\n## SGDet on Custom Images\nNote that evaluation on custom images is only applicable to the SGDet model, because the PredCls and SGCls models require additional ground-truth bounding box information. To detect scene graphs on your own images and save them into a json file, you need to turn on the switch TEST.CUSTUM_EVAL and pass a folder path (or a json file containing a list of image paths) with the custom images to TEST.CUSTUM_PATH. Only JPG files are allowed. The output will be saved as custom_prediction.json in the given DETECTED_SGG_DIR.\n\nTest Example 1 : (SGDet, **Causal TDE**, MOTIFS Model, SUM Fusion) [(checkpoint)](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir9x7OYb6sKBlzoXuYA?e=s3Y602)\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools\u002Frelation_test_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE TDE MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs TEST.IMS_PER_BATCH 1 DTYPE \"float16\" GLOVE_DIR \u002Fhome\u002Fkaihua\u002Fglove MODEL.PRETRAINED_DETECTOR_CKPT \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgdet OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgdet TEST.CUSTUM_EVAL True TEST.CUSTUM_PATH \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcustom_images DETECTED_SGG_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fyour_output_path\n```\n\nTest Example 2 : (SGDet, **Original**, MOTIFS Model, SUM Fusion) [(same checkpoint)](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir9x7OYb6sKBlzoXuYA?e=s3Y602)\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools\u002Frelation_test_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE none MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs TEST.IMS_PER_BATCH 1 DTYPE \"float16\" GLOVE_DIR \u002Fhome\u002Fkaihua\u002Fglove MODEL.PRETRAINED_DETECTOR_CKPT \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgdet OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgdet TEST.CUSTUM_EVAL True TEST.CUSTUM_PATH \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcustom_images DETECTED_SGG_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fyour_output_path\n```\n\nThe output is a json file. 
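\n\nOnce the file is written, a few lines of Python are enough to peek at the predictions. This is a sketch: the two file names match what this section documents, but the exact top-level layout of the json files (and the keys inside them) should be checked against your own output and the visualization notebook:\n```python\nimport json\n\nwith open('custom_prediction.json') as f:\n    predictions = json.load(f)\nwith open('custom_data_info.json') as f:\n    info = json.load(f)\n\n# Assumption: predictions is indexed per image and each entry holds the\n# fields documented below (bbox, bbox_labels, rel_pairs, rel_labels, ...).\nfirst = list(predictions.values())[0] if isinstance(predictions, dict) else predictions[0]\nprint('boxes:', len(first['bbox']))\nprint('relations:', len(first['rel_pairs']))\n```\n\n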
For each image, the scene graph information is saved as a dictionary containing bbox(sorted), bbox_labels(sorted), bbox_scores(sorted), rel_pairs(sorted), rel_labels(sorted), rel_scores(sorted), rel_all_scores(sorted), where the last field, rel_all_scores, gives the probabilities of all 51 predicates for each pair of objects. The dataset information is saved as custom_data_info.json in the same DETECTED_SGG_DIR.\n\n## Visualize Detected SGs of Custom Images\nTo visualize the detected scene graphs of custom images, you can follow the jupyter notebook: [visualization\u002F3.visualize_custom_SGDet.ipynb](https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FScene-Graph-Benchmark.pytorch\u002Fblob\u002Fmaster\u002Fvisualization\u002F3.visualize_custom_SGDet.ipynb). The inputs of our visualization code are custom_prediction.json and custom_data_info.json in DETECTED_SGG_DIR. They will be automatically generated if you run the above custom SGDet instruction successfully. Note that there may be too many trivial bounding boxes and relationships, so you can select the top-k bboxes and predicates for better scene graphs by changing the parameters box_topk and rel_topk. \n\n## Other Options that May Improve the SGG\n\n- For some models (not all), turning on or turning off ```MODEL.ROI_RELATION_HEAD.POOLING_ALL_LEVELS``` will affect the performance of predicate prediction, e.g., turning it off will improve VCTree PredCls but not the corresponding SGCls and SGGen. For the reported results of VCTree, we simply turn it on for all three protocols like other models.\n\n- For some models (not all), a crazy fusion proposed by [Learning to Count Objects](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.05766) can significantly improve the results, which looks like ```f(x1, x2) = ReLU(x1 + x2) - (x1 - x2)**2```. It can be used to combine the subject and object features in ```roi_heads\u002Frelation_head\u002Froi_relation_predictors.py```. For now, most of our models just concatenate them as ```torch.cat((head_rep, tail_rep), dim=-1)```.\n\n- Not to mention the hidden dimensions in the models, e.g., ```MODEL.ROI_RELATION_HEAD.CONTEXT_HIDDEN_DIM```. Due to limited time, we didn't fully explore all the settings in this project, so I won't be surprised if you improve our results by simply changing one of the hyper-parameters.\n\n## Tips and Tricks for any Unbiased TaskX from Biased Training\n\nCounterfactual inference is not only applicable to SGG. Actually, my colleague [Yulei](https:\u002F\u002Fgithub.com\u002Fyuleiniu) found that counterfactual causal inference also has significant potential in [unbiased VQA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.04315). We believe such a counterfactual inference can also be applied to lots of reasoning tasks with significant bias. It basically just runs the model two times (once for the original output, once for the intervened output); the latter yields the biased prior that should be subtracted from the final prediction. But there are three tips you need to bear in mind:\n- The most important thing is always the causal graph. You need to find the correct causal graph with an identifiable branch that causes the biased predictions. If the causal graph is incorrect, the rest would be meaningless. Note that the causal graph is not a summarization of the existing network (but the guidance to build networks): you should modify your network based on the causal graph, not vice versa. \n- For those nodes having multiple input branches in the causal graph, it's crucial to choose the right fusion function. 
We tested lots of fusion functions and found only the SUM fusion and GATE fusion to work consistently well. Fusion functions like the element-wise product won't work for TDE analysis in most cases, because the causal influence from multiple branches cannot be linearly separated anymore, which means it's no longer an identifiable 'influence'.\n- For those final predictions having multiple input branches in the causal graph, you may also need to add auxiliary losses for each branch to stabilize the causal influence of each independent branch. When these branches have different convergence speeds, the hard branches can easily be learned as unimportant tiny fluctuations that depend on the fastest\u002Fmost stably converged branch. Auxiliary losses allow different branches to have independent and equal influences.\n\n## Frequently Asked Questions:\n\n1. **Q:** Fail to load the given checkpoints.\n**A:** The model to be loaded is based on the last_checkpoint file in the OUTPUT_DIR path. If you fail to load the given pretrained checkpoints, it is probably because the last_checkpoint file still contains the path from my workstation rather than your own path.\n\n2. **Q:** AssertionError on \"assert len(fns) == 108073\"\n**A:** If you are working on the VG dataset, it is probably caused by the wrong DATASETS (data path) in maskrcnn_benchmark\u002Fconfig\u002Fpaths_catalog.py. If you are working on your custom datasets, just comment out the assertions.\n\n3. **Q:** AssertionError on \"l_batch == 1\" in model_motifs.py\n**A:** The original MOTIFS code only supports evaluation on 1 GPU. Since my reimplemented motifs is based on their code, I keep this assertion to make sure it won't cause any unexpected errors.\n\n## Citations\n\nIf you find that this project helps your research, please kindly consider citing our project or papers in your publications.\n\n```\n@misc{tang2020sggcode,\ntitle = {A Scene Graph Generation Codebase in PyTorch},\nauthor = {Tang, Kaihua},\nyear = {2020},\nnote = {\\url{https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FScene-Graph-Benchmark.pytorch}},\n}\n\n@inproceedings{tang2018learning,\n  title={Learning to Compose Dynamic Tree Structures for Visual Contexts},\n  author={Tang, Kaihua and Zhang, Hanwang and Wu, Baoyuan and Luo, Wenhan and Liu, Wei},\n  booktitle= \"Conference on Computer Vision and Pattern Recognition\",\n  year={2019}\n}\n\n@inproceedings{tang2020unbiased,\n  title={Unbiased Scene Graph Generation from Biased Training},\n  author={Tang, Kaihua and Niu, Yulei and Huang, Jianqiang and Shi, Jiaxin and Zhang, Hanwang},\n  booktitle= \"Conference on Computer Vision and Pattern Recognition\",\n  year={2020}\n}\n```\n","# PyTorch中的场景图基准测试\n\n[![LICENSE](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-green)](https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FScene-Graph-Benchmark.pytorch\u002Fblob\u002Fmaster\u002FLICENSE)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.7-blue.svg)](https:\u002F\u002Fwww.python.org\u002F)\n![PyTorch](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpytorch-1.2.0-%237732a8)\n\n我们的论文《从有偏训练中无偏生成场景图》（Unbiased Scene Graph Generation from Biased Training）已被CVPR 2020接受并作口头报告。\n\n## 最新更新\n\n- [x] 2020.06.23 增加无图约束平均召回率@K (ng-mR@K) 和无图约束零样本召回率@K (ng-zR@K) [\\[链接\\]](METRICS.md#explanation-of-our-metrics)\n- [x] 2020.06.23 允许在自定义图像上进行场景图检测 (SGDet) [\\[链接\\]](#SGDet-on-custom-images)\n- [x] 2020.07.21 将自定义图像上的场景图检测输出更改为json文件 [\\[链接\\]](#SGDet-on-custom-images)\n- [x] 2020.07.21 可视化自定义图像的检测到的场景图 
[\\[链接\\]](#Visualize-Detected-SGs-of-Custom-Images)\n- [ ] 待办：使用[背景排除推理 (Background-Exempted Inference)](https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FLong-Tailed-Recognition.pytorch\u002Ftree\u002Fmaster\u002Flvis1.0#background-exempted-inference) 来提升TDE场景图的质量\n\n## 目录\n\n1. [概述](#Overview)\n2. [安装依赖](INSTALL.md)\n3. [准备数据集](DATASET.md)\n4. [我们工具包的指标与结果](METRICS.md)\n    - [R@K、mR@K、zR@K、ng-R@K、ng-mR@K、ng-zR@K、A@K、S2G的解释](METRICS.md#explanation-of-our-metrics)\n    - [输出格式](METRICS.md#output-format-of-our-code)\n    - [报告的结果](METRICS.md#reported-results)\n5. [Faster R-CNN预训练模型](#pretrained-models)\n6. [作为RoI_Head的场景图生成](#scene-graph-generation-as-RoI_Head)\n7. [场景图生成的训练](#perform-training-on-scene-graph-generation)\n8. [场景图生成的评估](#Evaluation)\n9. [**在您的自定义图像上检测场景图** :star2:](#SGDet-on-custom-images)\n10. [**可视化自定义图像的检测到的场景图** :star2:](#Visualize-Detected-SGs-of-Custom-Images)\n11. [可能提升SGG性能的其他选项](#other-options-that-may-improve-the-SGG)\n12. [针对任何无偏任务的技巧与窍门](#tips-and-Tricks-for-any-unbiased-taskX-from-biased-training)\n13. [常见问题](#frequently-asked-questions)\n14. [引用](#Citations)\n\n## 概述\n\n该项目旨在构建一个新的场景图生成（SGG）代码库，同时也是论文《从有偏训练中无偏生成场景图》（Unbiased Scene Graph Generation from Biased Training）的PyTorch实现。此前广泛采用的SGG代码库neural-motifs已经脱离了Faster\u002FMask R-CNN的最新发展。因此，我决定基于著名的maskrcnn-benchmark项目构建一个场景图基准测试，并将关系预测定义为额外的roi_head。顺便说一下，得益于他们优雅的框架，这个代码库比之前的neural-motifs框架更加友好，也更容易阅读和修改以适应您自己的项目（至少我希望如此）。遗憾的是，在我开发这个项目时，detectron2尚未发布，但我认为我们可以把maskrcnn-benchmark视为一个更稳定、bug更少的版本，哈哈哈哈哈。此外，我还介绍了SGG中使用的所有新旧指标，并在[METRICS.md](METRICS.md)中澄清了SGG指标中的两个常见误解，这些误解导致了一些论文中出现异常结果。\n\n### 凭借maskrcnn-benchmark中最新的Faster R-CNN，该代码库通过使用两块1080ti显卡和批量大小8重新实现的VCTree模型，在SGCls和SGGen任务上达到了新的最先进水平的Recall@k（截至2020年2月16日）：\n\n模型 | SGGen R@20 | SGGen R@50 | SGGen R@100 | SGCls R@20 | SGCls R@50 | SGCls R@100 | PredCls R@20 | PredCls R@50 | PredCls R@100\n-- | -- | -- | -- | -- | -- | -- | -- | -- | --\nVCTree | 24.53 | 31.93 | 36.21 | 42.77 | 46.67 | 47.64 | 59.02 | 65.42 | 67.18\n\n请注意，VCTree的所有结果都应该优于我们在《从有偏训练中无偏生成场景图》中报告的结果，因为在论文发表后我们对树结构网络进行了优化。\n\n### 来自《从有偏训练中无偏生成场景图》的无偏SGG示意图\n\n![alt text](demo\u002Fteaser_figure.png \"来自《从有偏训练中无偏生成场景图》\")\n\n## 安装\n\n请参阅[INSTALL.md](INSTALL.md)获取安装说明。\n\n## 数据集\n\n请参阅[DATASET.md](DATASET.md)获取数据集预处理说明。\n\n## 指标与结果 **(重要)**\n我们工具包中的指标解释以及报告的结果均在[METRICS.md](METRICS.md)中给出。\n\n\n## 预训练模型\n\n由于我们在论文《从有偏训练中无偏生成场景图》中测试了许多SGG模型，因此我不会在此处上传所有预训练的SGG模型。不过，您可以下载我们在论文中使用的[预训练Faster R-CNN](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir8xemVHbqPBrvjjtQg?e=hAhYCw)，这是整个训练过程中最耗时的步骤（当时使用了4块2080ti显卡）。至于SGG模型，您可以按照后续说明自行训练，每个SGG模型仅需2块GPU即可完成训练。训练结果应与[METRICS.md](METRICS.md)中报告的结果非常接近。\n\n下载[Faster R-CNN模型](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir8xemVHbqPBrvjjtQg?e=hAhYCw)后，请将所有文件解压到目录`\u002Fhome\u002Fusername\u002Fcheckpoints\u002Fpretrained_faster_rcnn`。要训练您自己的Faster R-CNN模型，请参阅下一节。\n\n上述预训练的Faster R-CNN模型在VG训练集、验证集和测试集上的mAP分别为38.52、26.35和28.14。\n\n## 替代链接\n感谢[Catchip](https:\u002F\u002Fgithub.com\u002FCatchip)的赞助。由于OneDrive链接在中国大陆可能会失效，我们还提供了以下替代链接，用于访问所有预训练模型和数据集标注：\n\n链接1(Baidu)：[https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1oyPQBDHXMQ5Tsl0jy5OzgA](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1oyPQBDHXMQ5Tsl0jy5OzgA)\n提取码：1234\n\n链接2(Weiyun): [https:\u002F\u002Fshare.weiyun.com\u002FViTWrFxG](https:\u002F\u002Fshare.weiyun.com\u002FViTWrFxG)\n\n## Faster R-CNN 预训练\n可以使用以下命令来训练您自己的 Faster R-CNN 模型：\n```bash\nCUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 10001 --nproc_per_node=4 tools\u002Fdetector_pretrain_net.py --config-file \"configs\u002Fe2e_relation_detector_X_101_32_8_FPN_1x.yaml\" SOLVER.IMS_PER_BATCH 8 TEST.IMS_PER_BATCH 4 DTYPE \"float16\" SOLVER.MAX_ITER 50000 SOLVER.STEPS 
\"(30000, 45000)\" SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 MODEL.RELATION_ON False OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fpretrained_faster_rcnn SOLVER.PRE_VAL False\n```\n其中 ```CUDA_VISIBLE_DEVICES``` 和 ```--nproc_per_node``` 分别表示您使用的 GPU 编号和 GPU 数量，```--config-file``` 表示我们所使用的配置文件，您可以在该文件中调整其他参数。```SOLVER.IMS_PER_BATCH``` 和 ```TEST.IMS_PER_BATCH``` 分别是训练和测试的批次大小，```DTYPE \"float16\"``` 启用由 [APEX](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex) 支持的自动混合精度，```SOLVER.MAX_ITER``` 是最大迭代次数，```SOLVER.STEPS``` 是学习率衰减的步长，```SOLVER.VAL_PERIOD``` 和 ```SOLVER.CHECKPOINT_PERIOD``` 分别是验证和保存检查点的周期，```MODEL.RELATION_ON``` 表示是否开启关系头（由于这是仅针对 Faster R-CNN 的预训练阶段，因此关闭关系头），```OUTPUT_DIR``` 是用于保存检查点和日志的输出目录（例如 `\u002Fhome\u002Fusername\u002Fcheckpoints\u002Fpretrained_faster_rcnn`），```SOLVER.PRE_VAL``` 表示是否在训练前进行验证。\n\n\n## 场景图生成作为 RoI_Head\n为了标准化场景图生成任务，我将其定义为一个 RoI_Head。参考其他 RoI_Head（如 box_head）的设计，我将大部分 SGG 相关代码放在 ```maskrcnn_benchmark\u002Fmodeling\u002Froi_heads\u002Frelation_head``` 目录下，其调用顺序如下所示：\n\n![alt text](demo\u002Frelation_head.png \"relation_head 的结构\")\n\n\n## 进行场景图生成的训练\n共有 **三种标准协议**：(1) 谓词分类 (PredCls)：以真实边界框和标签作为输入；(2) 场景图分类 (SGCls)：使用没有标签的真实边界框；(3) 场景图检测 (SGDet)：从零开始检测场景图。我们通过两个开关 ```MODEL.ROI_RELATION_HEAD.USE_GT_BOX``` 和 ```MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL``` 来选择不同的协议。\n\n对于 **谓词分类 (PredCls)**，需要设置：\n``` bash\nMODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True\n```\n对于 **场景图分类 (SGCls)**：\n``` bash\nMODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False\n```\n对于 **场景图检测 (SGDet)**：\n``` bash\nMODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False\n```\n\n### 预定义模型\n我们将各种 SGG 模型抽象为 ```roi_heads\u002Frelation_head\u002Froi_relation_predictors.py``` 文件中的不同 ```relation-head predictors```，这些预测器与 Faster R-CNN 的主干网络和关系头特征提取器无关。要选择预定义的模型，可以使用 ```MODEL.ROI_RELATION_HEAD.PREDICTOR```。\n\n对于 [Neural-MOTIFS](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.06640) 模型：\n```bash\nMODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor\n```\n对于 [迭代消息传递 (IMP)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1701.02426) 模型（注意，在 SGCls 中应将 SOLVER.BASE_LR 改为 0.001，否则模型无法收敛）：\n```bash\nMODEL.ROI_RELATION_HEAD.PREDICTOR IMPPredictor\n```\n对于 [VCTree](https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.01880) 模型：\n```bash\nMODEL.ROI_RELATION_HEAD.PREDICTOR VCTreePredictor\n```\n对于我们预定义的 Transformer 模型（注意，Transformer 模型需要将 SOLVER.BASE_LR 改为 0.001，SOLVER.SCHEDULE.TYPE 改为 WarmupMultiStepLR，SOLVER.MAX_ITER 改为 16000，SOLVER.IMS_PER_BATCH 改为 16，SOLVER.STEPS 改为 (10000, 16000)。该模型由 [Jiaxin Shi](https:\u002F\u002Fgithub.com\u002Fshijx12) 提供）：\n```bash\nMODEL.ROI_RELATION_HEAD.PREDICTOR TransformerPredictor\n```\n对于 [Unbiased-Causal-TDE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.11949) 模型：\n```bash\nMODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor\n```\n\n默认设置位于 ```configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml``` 和 ```maskrcnn_benchmark\u002Fconfig\u002Fdefaults.py``` 中。优先级顺序为：```命令 > yaml > defaults.py```。\n\n### 自定义您的模型\n如果您想自定义模型，可以参考 ```maskrcnn-benchmark\u002Fmodeling\u002Froi_heads\u002Frelation_head\u002Fmodel_XXXXX.py``` 和 ```maskrcnn-benchmark\u002Fmodeling\u002Froi_heads\u002Frelation_head\u002Futils_XXXXX.py```。同时，您还需要在 ```maskrcnn-benchmark\u002Fmodeling\u002Froi_heads\u002Frelation_head\u002Froi_relation_predictors.py``` 中添加相应的 nn.Module。有时，您可能还需要通过 ```maskrcnn-benchmark\u002Fmodeling\u002Froi_heads\u002Frelation_head\u002Frelation_head.py``` 修改模块的输入和输出。\n\n### 关于 
[从有偏训练中无偏生成场景图](https:\u002F\u002Farxiv.org\u002Fabs\u002F2002.11949) 中提出的因果 TDE\n对于无偏因果 TDE，还有一些额外的参数需要了解。```MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE``` 用于在推理（测试）阶段选择因果效应分析类型，其中“none”表示原始似然，“TDE”表示总直接效应，“NIE”表示自然间接效应，“TE”表示总效应。```MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE``` 有两个选项：“sum”或“gate”。由于无偏因果 TDE 分析与具体模型无关，我们支持 [Neural-MOTIFS](https:\u002F\u002Farxiv.org\u002Fabs\u002F1711.06640)、[VCTree](https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.01880) 和 [VTransE](https:\u002F\u002Farxiv.org\u002Fabs\u002F1702.08319)。```MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER``` 用于选择参与无偏因果分析的模型，有三个选项：motifs、vctree、vtranse。\n\n需要注意的是，在训练过程中，我们始终将 ```MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE``` 设置为“none”，因为因果效应分析仅适用于推理\u002F测试阶段。\n\n### 训练命令示例\n训练示例 1：（PredCls，Motif 模型）\n```bash\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 10025 --nproc_per_node=2 tools\u002Frelation_train_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 2 DTYPE \"float16\" SOLVER.MAX_ITER 50000 SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 GLOVE_DIR \u002Fhome\u002Fkaihua\u002Fglove MODEL.PRETRAINED_DETECTOR_CKPT \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fpretrained_faster_rcnn\u002Fmodel_final.pth OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fmotif-precls-exmp\n```\n其中 ```GLOVE_DIR``` 是用于保存 GloVe 初始化的目录，```MODEL.PRETRAINED_DETECTOR_CKPT``` 是您想要加载的预训练 Faster R-CNN 模型，```OUTPUT_DIR``` 是用于保存检查点和日志的输出目录。由于我们为 SGG 使用了 ```WarmupReduceLROnPlateau``` 学习率调度器，因此不再需要设置 ```SOLVER.STEPS```。\n\n训练示例 2：（SGCls，Causal，**TDE**，SUM 融合，MOTIFS 模型）\n```bash\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 10026 --nproc_per_node=2 tools\u002Frelation_train_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE none MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs  SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 2 DTYPE \"float16\" SOLVER.MAX_ITER 50000 SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 GLOVE_DIR \u002Fhome\u002Fkaihua\u002Fglove MODEL.PRETRAINED_DETECTOR_CKPT \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fpretrained_faster_rcnn\u002Fmodel_final.pth OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgcls-exmp\n```\n\n\n## 评估\n\n### 测试命令示例\n测试示例 1：（PredCls，Motif 模型）\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools\u002Frelation_test_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor TEST.IMS_PER_BATCH 1 DTYPE \"float16\" GLOVE_DIR \u002Fhome\u002Fkaihua\u002Fglove MODEL.PRETRAINED_DETECTOR_CKPT \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fmotif-precls-exmp OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fmotif-precls-exmp\n```\n\n测试示例 2：（SGCls，Causal，**TDE**，SUM 融合，MOTIFS 模型）\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10028 --nproc_per_node=1 tools\u002Frelation_test_net.py --config-file 
\"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX True MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE TDE MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs  TEST.IMS_PER_BATCH 1 DTYPE \"float16\" GLOVE_DIR \u002Fhome\u002Fkaihua\u002Fglove MODEL.PRETRAINED_DETECTOR_CKPT \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgcls-exmp OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgcls-exmp\n```\n\n### 预训练的 Causal MOTIFS-SUM 模型示例\nSGDet\u002FSGCls\u002FPredCls 上的预训练 Causal MOTIFS-SUM 模型示例（批量大小 12）：[(SGDet 下载)](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir9x7OYb6sKBlzoXuYA?e=s3Y602), [(SGCls 下载)](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir9xyuLO_I8TSZ6kfyQ?e=Y5686s), [(PredCls 下载)](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir9xx725wYjN7lytynA?e=0B65Ws)\n\n对应结果（论文中使用的原始模型已丢失，这些是新生成的模型，因此结果存在一些波动。更多结果请参见 [报告结果](METRICS.md#reported-results)）：\n\n模型 |  R@20 | R@50 | R@100 | mR@20 | mR@50 | mR@100 | zR@20 | zR@50 | zR@100\n-- | -- | -- | -- | -- | -- | -- | -- | -- | --\nMOTIFS-SGDet-none   | 25.42 | 32.45 | 37.26 | 4.36 | 5.83 | 7.08 | 0.02 | 0.08 | 0.24\nMOTIFS-SGDet-TDE    | 11.92 | 16.56 | 20.15 | 6.58 | 8.94 | 10.99 | 1.54 | 2.33 | 3.03\nMOTIFS-SGCls-none   | 36.02 | 39.25 | 40.07 | 6.50 | 8.02 | 8.51 | 1.06 | 2.18 | 3.07\nMOTIFS-SGCls-TDE    | 20.47 | 26.31 | 28.79 | 9.80 | 13.21 | 15.06 | 1.91 | 2.95 | 4.10\nMOTIFS-PredCls-none | 59.64 | 66.11 | 67.96 | 11.46 | 14.60 | 15.84 | 5.79 | 11.02 | 14.74\nMOTIFS-PredCls-TDE  | 33.38 | 45.88 | 51.25 | 17.85 | 24.75 | 28.70 | 8.28 | 14.31 | 18.04\n\n## SGDet 在自定义图像上的应用\n请注意，自定义图像上的评估仅适用于 SGDet 模型，因为 PredCls 和 SGCls 模型需要额外的标注边界框信息。要在您自己的图像上检测场景图并将其保存为 JSON 文件，您需要打开 TEST.CUSTUM_EVAL 开关，并在 TEST.CUSTUM_PATH 中指定包含自定义图像的文件夹路径（或包含图像路径列表的 JSON 文件）。仅允许使用 JPG 格式的文件。输出将被保存为 custom_prediction.json，位于指定的 DETECTED_SGG_DIR 目录中。\n\n测试示例 1：(SGDet，**因果 TDE**，MOTIFS 模型，SUM 融合) [(检查点)](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir9x7OYb6sKBlzoXuYA?e=s3Y602)\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools\u002Frelation_test_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE TDE MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs TEST.IMS_PER_BATCH 1 DTYPE \"float16\" GLOVE_DIR \u002Fhome\u002Fkaihua\u002Fglove MODEL.PRETRAINED_DETECTOR_CKPT \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgdet OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgdet TEST.CUSTUM_EVAL True TEST.CUSTUM_PATH \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcustom_images DETECTED_SGG_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fyour_output_path\n```\n\n测试示例 2：(SGDet，**原始**，MOTIFS 模型，SUM 融合) [(相同检查点)](https:\u002F\u002F1drv.ms\u002Fu\u002Fs!AmRLLNf6bzcir9x7OYb6sKBlzoXuYA?e=s3Y602)\n```bash\nCUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools\u002Frelation_test_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False 
MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE none MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs TEST.IMS_PER_BATCH 1 DTYPE \"float16\" GLOVE_DIR \u002Fhome\u002Fkaihua\u002Fglove MODEL.PRETRAINED_DETECTOR_CKPT \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgdet OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcausal-motifs-sgdet TEST.CUSTUM_EVAL True TEST.CUSTUM_PATH \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fcustom_images DETECTED_SGG_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fyour_output_path\n```\n\n输出是一个 JSON 文件。对于每张图像，场景图信息以字典形式保存，包含 bbox（已排序）、bbox_labels（已排序）、bbox_scores（已排序）、rel_pairs（已排序）、rel_labels（已排序）、rel_scores（已排序）以及 rel_all_scores（已排序），其中 rel_all_scores 提供了每对物体的所有 51 种谓词的概率。数据集信息则保存为 custom_data_info.json，位于相同的 DETECTED_SGG_DIR 目录中。\n\n## 可视化自定义图像的检测到的场景图\n要可视化自定义图像的检测到的场景图，您可以按照以下 Jupyter 笔记本进行操作：[visualization\u002F3.visualize_custom_SGDet.ipynb](https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FScene-Graph-Benchmark.pytorch\u002Fblob\u002Fmaster\u002Fvisualization\u002F3.visualize_custom_SGDet.ipynb)。我们的可视化代码的输入是 DETECTED_SGG_DIR 中的 custom_prediction.json 和 custom_data_info.json。如果您成功运行了上述自定义 SGDet 指令，这些文件将会自动生成。请注意，可能会出现过多的无关紧要的边界框和关系，因此您可以通过调整参数 box_topk 和 rel_topk 来选择前 k 个边界框和谓词，从而获得更清晰的场景图。\n\n## 其他可能提升 SGG 效果的选项\n\n- 对于某些模型（并非全部），开启或关闭 ```MODEL.ROI_RELATION_HEAD.POOLING_ALL_LEVELS``` 会显著影响谓词预测的性能。例如，关闭该选项可以提升 VCTree 的 PredCls 性能，但不会改善相应的 SGCls 和 SGGen。对于 VCTree 的报告结果，我们像其他模型一样，在所有三种协议中都将其保持开启状态。\n\n- 对于某些模型（并非全部），[Learning to Count Object](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.05766) 提出的一种奇特融合方法能够显著提升效果，其形式为 ```f(x1, x2) = ReLU(x1 + x2) - (x1 - x2)**2```。这种方法可用于在 ```roi_heads\u002Frelation_head\u002Froi_relation_predictors.py``` 中结合主体和客体特征。目前，大多数模型只是简单地将它们拼接在一起，即 ```torch.cat((head_rep, tail_rep), dim=-1)```。\n\n- 更不用说模型中的隐藏维度，例如 ```MODEL.ROI_RELATION_HEAD.CONTEXT_HIDDEN_DIM```。由于时间有限，我们并未在此项目中充分探索所有设置。因此，如果您仅仅通过调整其中一个超参数就提升了我们的结果，我也不会感到意外。\n\n## 针对从有偏训练中得到的无偏任务 X 的技巧与窍门\n\n反事实推理不仅适用于 SGG。事实上，我的同事 [Yulei](https:\u002F\u002Fgithub.com\u002Fyuleiniu) 发现，反事实因果推理在 [无偏 VQA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.04315) 中也具有巨大潜力。我们相信，这种反事实推理同样可以应用于许多存在显著偏差的推理任务。其基本原理是让模型运行两次（一次生成原始输出，另一次生成干预后的输出），然后用后者减去应被剔除的有偏先验，从而得到最终的无偏预测。不过，您需要注意以下三点：\n- 最重要的是因果图。您需要找到正确的因果图，并识别出导致有偏预测的可辨识分支。如果因果图不正确，后续的一切都将毫无意义。需要注意的是，因果图并不是对现有网络的总结（而是指导网络构建的原则），您应当根据因果图来修改网络结构，而不是反过来。\n- 对于那些在因果图中有多个输入分支的节点，选择合适的融合函数至关重要。我们尝试过多种融合函数，最终发现 SUM 融合和 GATE 融合始终表现稳定。而诸如逐元素相乘之类的融合方式在大多数情况下并不适用于 TDE 分析，因为来自不同分支的因果影响已经无法线性分离，也就不再具备可辨识的“影响力”。\n- 对于那些在因果图中有多个输入分支的最终预测，可能还需要为每个分支添加辅助损失，以稳定各独立分支的因果影响力。因为当这些分支的收敛速度不同时，较慢的分支很容易被视为依赖于最快或最稳定的分支的微不足道的小浮动值。辅助损失能够让各个分支拥有独立且平等的影响力。\n\n## 常见问题解答：\n\n1. **问：** 无法加载给定的检查点。\n**答：** 要加载的模型是基于 `OUTPUT_DIR` 路径下的 `last_checkpoint` 文件。如果无法加载给定的预训练检查点，很可能是由于 `last_checkpoint` 文件中仍然保存的是作者工作站中的路径，而不是您自己环境的路径。\n\n2. **问：** 在 `assert len(fns) == 108073` 处出现断言错误。\n**答：** 如果您正在使用 VG 数据集，这很可能是由于 `maskrcnn_benchmark\u002Fconfig\u002Fpaths_catalog.py` 文件中 `DATASETS`（数据路径）配置错误导致的。如果您使用的是自定义数据集，可以直接注释掉这些断言语句。\n\n3. 
**问：** 在 `model_motifs.py` 中出现 `l_batch == 1` 的断言错误。\n**答：** 原始的 MOTIFS 代码仅支持在单 GPU 上进行评估。由于我重新实现的 MOTIFS 是基于他们的代码，因此保留了这一断言，以确保不会引发任何意外错误。\n\n## 引用\n\n如果您认为本项目对您的研究有所帮助，请在您的论文或出版物中引用我们的项目或相关论文（引用条目保留原文，以便直接使用）。\n\n```\n@misc{tang2020sggcode,\ntitle = {A Scene Graph Generation Codebase in PyTorch},\nauthor = {Tang, Kaihua},\nyear = {2020},\nnote = {\\url{https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FScene-Graph-Benchmark.pytorch}},\n}\n\n@inproceedings{tang2018learning,\n  title={Learning to Compose Dynamic Tree Structures for Visual Contexts},\n  author={Tang, Kaihua and Zhang, Hanwang and Wu, Baoyuan and Luo, Wenhan and Liu, Wei},\n  booktitle= \"Conference on Computer Vision and Pattern Recognition\",\n  year={2019}\n}\n\n@inproceedings{tang2020unbiased,\n  title={Unbiased Scene Graph Generation from Biased Training},\n  author={Tang, Kaihua and Niu, Yulei and Huang, Jianqiang and Shi, Jiaxin and Zhang, Hanwang},\n  booktitle= \"Conference on Computer Vision and Pattern Recognition\",\n  year={2020}\n}\n```","# Scene-Graph-Benchmark.pytorch 快速上手指南\n\n本指南基于 PyTorch 实现，旨在帮助开发者快速搭建场景图生成（SGG）基准环境，复现 CVPR 2020 Oral 论文《Unbiased Scene Graph Generation from Biased Training》中的成果。\n\n## 1. 环境准备\n\n在开始之前，请确保满足以下系统和依赖要求：\n\n*   **操作系统**: Linux (推荐 Ubuntu)\n*   **Python**: 3.7\n*   **PyTorch**: 1.2.0 (需匹配对应的 CUDA 版本)\n*   **GPU**: 支持 CUDA 的 NVIDIA 显卡 (训练 Faster R-CNN 预训练模型建议多卡，如 4x 2080Ti；训练 SGG 模型仅需 2x 1080Ti)\n*   **核心依赖**:\n    *   [maskrcnn-benchmark](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fmaskrcnn-benchmark): 本项目基于此框架构建。\n    *   [APEX](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex): 用于启用混合精度训练 (`float16`)，加速训练过程。\n    *   其他标准库：`numpy`, `scipy`, `opencv-python`, `yacs` 等。\n\n> **注意**：详细的环境安装步骤（包括 CUDA\u002FcuDNN 配置及依赖包安装命令）请参考项目根目录下的 `INSTALL.md` 文件。\n\n## 2. 安装步骤\n\n### 2.1 克隆代码库\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FScene-Graph-Benchmark.pytorch.git\ncd Scene-Graph-Benchmark.pytorch\n```\n\n### 2.2 安装依赖\n请严格按照 `INSTALL.md` 中的指示安装 `maskrcnn-benchmark` 及其依赖。通常涉及以下步骤（具体版本请以官方文档为准）：\n```bash\n# 示例：安装 apex (需根据实际 CUDA 版本调整)\ngit clone https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex\ncd apex\npip install -v --no-cache-dir .\u002F --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\"\ncd ..\n```\n\n### 2.3 下载预训练模型与数据集\n由于原链接可能在大陆访问受限，优先推荐使用国内镜像源下载必要的资源。\n\n**预训练 Faster R-CNN 模型**（训练 SGG 的基础，最耗时步骤）：\n*   **百度网盘**: [下载链接](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1oyPQBDHXMQ5Tsl0jy5OzgA) (提取码：`1234`)\n*   **微云**: [下载链接](https:\u002F\u002Fshare.weiyun.com\u002FViTWrFxG)\n\n下载后，将解压后的文件放置于指定目录（请根据实际用户名修改路径）：\n```bash\n# 假设下载的文件解压后包含模型权重\nmv \u003Cdownloaded_files> \u002Fhome\u002Fusername\u002Fcheckpoints\u002Fpretrained_faster_rcnn\n```\n\n**数据集准备**：\n本项目主要使用 Visual Genome (VG) 数据集。请参考 `DATASET.md` 进行数据预处理和标注文件的整理。\n\n## 3. 基本使用\n\n本项目将场景图生成定义为 Faster R-CNN 的一个 `RoI_Head`。使用前需明确三种标准协议：\n1.  **PredCls** (谓词分类): 输入为真值框和真值标签。\n2.  **SGCls** (场景图分类): 输入为真值框，无标签。\n3.  
**SGDet** (场景图检测): 从零开始检测（最常用，完全端到端）。\n\n### 3.1 训练 Faster R-CNN (可选)\n如果未使用提供的预训练模型，可自行训练：\n```bash\nCUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 10001 --nproc_per_node=4 tools\u002Fdetector_pretrain_net.py --config-file \"configs\u002Fe2e_relation_detector_X_101_32_8_FPN_1x.yaml\" SOLVER.IMS_PER_BATCH 8 TEST.IMS_PER_BATCH 4 DTYPE \"float16\" SOLVER.MAX_ITER 50000 SOLVER.STEPS \"(30000, 45000)\" SOLVER.VAL_PERIOD 2000 SOLVER.CHECKPOINT_PERIOD 2000 MODEL.RELATION_ON False OUTPUT_DIR \u002Fhome\u002Fkaihua\u002Fcheckpoints\u002Fpretrained_faster_rcnn SOLVER.PRE_VAL False\n```\n\n### 3.2 训练场景图生成模型 (SGG)\n通过修改命令行参数选择模型架构和任务协议。\n\n**示例：使用 VCTree 模型进行场景图检测 (SGDet) 训练**\n```bash\nCUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 10002 --nproc_per_node=2 tools\u002Frelation_train_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR VCTreePredictor SOLVER.IMS_PER_BATCH 8 DTYPE \"float16\" OUTPUT_DIR \u002Fhome\u002Fusername\u002Fcheckpoints\u002Fvctree_sgdet\n```\n\n**常用模型选择参数 (`MODEL.ROI_RELATION_HEAD.PREDICTOR`)**:\n*   `MotifPredictor`: Neural-MOTIFS 模型\n*   `IMPPredictor`: IMP 模型 (SGCls 任务需调整学习率为 0.001)\n*   `VCTreePredictor`: VCTree 模型 (默认推荐)\n*   `TransformerPredictor`: Transformer 模型 (需调整学习率、迭代次数等超参)\n*   `CausalAnalysisPredictor`: 无偏因果 TDE 模型 (论文核心贡献)\n\n**协议切换开关**:\n*   **PredCls**: `MODEL.ROI_RELATION_HEAD.USE_GT_BOX True` + `MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL True`\n*   **SGCls**: `MODEL.ROI_RELATION_HEAD.USE_GT_BOX True` + `MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False`\n*   **SGDet**: `MODEL.ROI_RELATION_HEAD.USE_GT_BOX False` + `MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False`\n\n### 3.3 在自定义图片上检测场景图 (SGDet)\n项目支持对用户上传的自定义图片进行场景图检测，并输出 JSON 结果或可视化图像。\n\n**执行检测**:\n确保已训练好 SGDet 模型，运行推理脚本（具体脚本名参考项目最新更新，通常为 `tools\u002Frelation_test_net.py` 或特定 demo 脚本），并将输出格式设置为 JSON。\n\n**可视化结果**:\n利用内置工具将检测到的场景图绘制在图片上，直观展示物体间的关系三元组。\n\n> **提示**: 更多高级用法、指标解释（R@K, mR@K, ng-R@K 等）及自定义模型开发指南，请参阅项目根目录下的 `METRICS.md` 及相关源码注释。","某电商平台的算法团队正致力于升级商品图片的自动 tagging 系统，希望通过深度理解图中物体间的交互关系（如“人穿着鞋”、“手拿着包”）来提升搜索精准度。\n\n### 没有 Scene-Graph-Benchmark.pytorch 时\n- **模型架构陈旧**：团队被迫基于过时的 `neural-motifs` 框架开发，难以整合最新的 Faster R-CNN 检测能力，导致基础物体识别准确率受限。\n- **长尾关系识别差**：训练出的模型严重偏向常见关系（如“人在...旁边”），无法有效识别“人试戴帽子”等稀疏但高价值的长尾交互，导致搜索漏判。\n- **自定义数据验证难**：缺乏直接对业务自有图片进行场景图检测（SGDet）和可视化的便捷工具，每次验证新策略都需编写大量繁琐的预处理代码。\n- **评估指标混淆**：团队对 R@K、mR@K 等核心指标的定义存在误解，导致实验结果虚高，无法真实反映模型在无偏场景下的泛化能力。\n\n### 使用 Scene-Graph-Benchmark.pytorch 后\n- **架构现代化升级**：直接复用基于 `maskrcnn-benchmark` 构建的现代化架构，将关系预测定义为 RoI Head，轻松集成先进检测器，基础识别性能显著提升。\n- **无偏训练突破瓶颈**：利用论文提出的“从无偏训练中生成场景图”技术，大幅改善了长尾关系的召回率，使“试戴”、“手持”等细粒度动作被精准捕捉。\n- **开箱即用的自定义推理**：调用内置的 SGDet 功能，直接上传店铺实拍图即可输出 JSON 格式的场景图并生成可视化结果，验证效率从数天缩短至分钟级。\n- **指标体系规范化**：依托工具提供的详细指标说明（含 ng-mR@K 等），团队修正了评估逻辑，确保优化方向真正对齐业务所需的零样本泛化能力。\n\nScene-Graph-Benchmark.pytorch 通过提供现代化的无偏训练框架与便捷的自定义推理流程，帮助团队低成本地实现了从“粗略物体检测”到“精细语义关系理解”的技术跨越。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FKaihuaTang_Scene-Graph-Benchmark.pytorch_a2dad366.png","KaihuaTang","Kaihua Tang","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FKaihuaTang_700e8242.jpg","@kaihuatang.github.io\u002F\r\n",null,"Singapore","tkhchipaomian@gmail.com","https:\u002F\u002Fkaihuatang.github.io\u002F","https:\u002F\u002Fgithub.com\u002FKaihuaTang",[86,90,94,98,102,105],{"name":87,"color":88,"percentage":89},"Jupyter 
Notebook","#DA5B0B",93.2,{"name":91,"color":92,"percentage":93},"Python","#3572A5",5.8,{"name":95,"color":96,"percentage":97},"Cuda","#3A4E3A",0.8,{"name":99,"color":100,"percentage":101},"C++","#f34b7d",0.1,{"name":103,"color":104,"percentage":101},"C","#555555",{"name":106,"color":107,"percentage":108},"Dockerfile","#384d54",0,1190,238,"2026-03-25T12:59:37","MIT",4,"Linux","必需 NVIDIA GPU。训练 Faster R-CNN 需 4 张显卡 (如 2080ti)；训练 SGG 模型需 2 张显卡 (如 1080ti)。支持混合精度训练 (float16)，需安装 APEX。未明确指定最低显存，但基于批量大小和模型复杂度，建议单卡 11GB+。","未说明",{"notes":118,"python":119,"dependencies":120},"该项目基于较旧的 maskrcnn-benchmark 框架和 PyTorch 1.2.0 版本。若使用现代环境可能需要修改代码以适配新版本 PyTorch。训练分为两个阶段：首先需预训练 Faster R-CNN（耗时最长），然后训练场景图生成模型。提供了一键式自定义图片检测功能。","3.7",[121,122,123,124],"pytorch==1.2.0","maskrcnn-benchmark","apex (用于混合精度训练)","torchvision (隐含依赖)",[14,13],"2026-03-27T02:49:30.150509","2026-04-06T07:11:55.306536",[129,134,139,144,149,153],{"id":130,"question_zh":131,"answer_zh":132,"source_url":133},10750,"在自定义图像上进行场景图检测（SGDet）时，是否需要下载 Visual Genome (VG) 数据集？","如果仅进行 SGDet（场景图检测），理论上不需要下载完整的 VG 数据集。但根据用户反馈，如果不配置数据集路径，可能会遇到无法加载统计数据缓存文件（如 `VG_stanford_filtered_with_attribute_train_statistics.cache`）的错误。此外，运行代码时可能需要修复代码中的弃用警告：在 `maskrcnn_benchmark\u002Fdata\u002Fdatasets\u002Fvisual_genome.py` 文件中，将 `np.float` 替换为 `float` 或 `np.float64`，将 `np.bool` 替换为 `bool` 或 `np.bool_`，因为旧版本的 numpy 类型已被弃用。","https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FScene-Graph-Benchmark.pytorch\u002Fissues\u002F190",{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},10751,"启用属性预测（MODEL.ATTRIBUTE_ON=true）并使用预训练模型时，出现权重形状不匹配（size mismatch）错误怎么办？","这是一个已知问题。官方提供的预训练模型（例如 `upload_causal_motif_sgdet`）通常是在 `MODEL.ATTRIBUTE_ON=false` 的配置下训练的。如果您尝试将这些权重加载到开启了属性预测（`ATTRIBUTE_ON=true`）的模型结构中，会导致全连接层（如 `fc7`）和预测器层的维度不匹配（例如期望 4096 维但实际为 2048 维，或类别数量不同）。目前的解决方案是：要么使用未开启属性的预训练权重并保持配置为 `false`，要么需要从头开始训练或使用专门针对属性任务训练的预训练权重（如果可用）。直接混用会导致 `RuntimeError`。","https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FScene-Graph-Benchmark.pytorch\u002Fissues\u002F57",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},10752,"安装时运行 conda 命令出现依赖冲突（Conflict in installation）如何解决？","在执行 `conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch` 时可能会遇到依赖冲突。这通常是因为当前环境中的其他包与指定版本不兼容。建议尝试创建一个全新的干净 conda 环境后再执行安装命令，或者检查是否有其他源干扰。如果问题依旧，可能需要手动调整 PyTorch 版本以匹配您的 CUDA 版本和其他已安装的库。","https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FScene-Graph-Benchmark.pytorch\u002Fissues\u002F135",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},10753,"运行时出现 ImportError: cannot import name '_C' from 'maskrcnn_benchmark' 错误是什么原因？","该错误表明 `maskrcnn_benchmark` 的 C++ 扩展模块（_C）未成功编译或未被 Python 找到。这通常发生在安装步骤不完整时。解决方法是确保您已按照项目说明正确编译了代码。通常需要进入项目根目录，运行 `python setup.py build develop` 来编译 C++ 代码并注册模块。请确保您的环境中已安装了与代码兼容版本的 PyTorch 和 CUDA 工具包，并且编译器（如 gcc）可用。","https:\u002F\u002Fgithub.com\u002FKaihuaTang\u002FScene-Graph-Benchmark.pytorch\u002Fissues\u002F60",{"id":150,"question_zh":151,"answer_zh":152,"source_url":138},10754,"如何在自定义图像上运行预训练的 Causal MOTIFS-SUM 模型进行场景图检测？","您可以使用以下命令在自定义图像上运行测试。请确保替换路径为您的实际路径：\n`CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools\u002Frelation_test_net.py --config-file \"configs\u002Fe2e_relation_X_101_32_8_FPN_1x.yaml\" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE TDE MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum 
MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs TEST.IMS_PER_BATCH 1 DTYPE \"float16\" GLOVE_DIR \u003Cyour_glove_path> MODEL.PRETRAINED_DETECTOR_CKPT \u003Cyour_checkpoint_path> OUTPUT_DIR \u003Cyour_output_path> TEST.CUSTUM_EVAL True TEST.CUSTUM_PATH \u003Cyour_image_folder> DETECTED_SGG_DIR \u003Cyour_result_dir>`\n注意：如果不需要属性预测，请确保配置中 `MODEL.ATTRIBUTE_ON` 为 false 以匹配预训练权重。",{"id":154,"question_zh":155,"answer_zh":156,"source_url":138},10755,"下载的预训练模型权重文件为空或损坏怎么办？","部分用户反馈从 OneDrive 或其他链接下载的预训练权重文件可能为空或损坏，导致加载失败。如果遇到这种情况，建议检查网络连接或尝试使用不同的下载工具。如果问题持续，可以在 GitHub Issue 中留言请求维护者重新上传，或者查看是否有其他镜像源提供相同的权重文件。",[]]