Deep-Learning-for-Tracking-and-Detection
Deep-Learning-for-Tracking-and-Detection is an open-source collection of resources for object detection and tracking, gathering deep-learning papers, datasets, code implementations, and other practical material. It systematically organizes research on static-image object detection (mainstream methods such as YOLO, SSD, and RetinaNet), video object detection, and single- and multi-object tracking, covering everything from classical approaches to recent directions such as anchor-free detection, graph neural networks, and reinforcement learning. The collection is well suited to computer-vision researchers and developers who want to survey the field quickly, reproduce classic algorithms, or build new models. Teams working on video analytics, autonomous driving, UAV surveillance, or cell tracking will also find relevant datasets and baseline code here. With its clear structure and fine-grained categorization, it is an ideal entry point into object-perception tasks.
Use Case
A biomedical research team is building an automated system for analyzing the dynamic behavior of live cells, which requires long-duration, high-precision tracking of many cells in microscopy video and detection of division events.
Without Deep-Learning-for-Tracking-and-Detection
- The team must manually hunt for recent papers and open-source code, spending considerable time filtering for models applicable to cell tracking, with information scattered and versions inconsistent.
- They lack references to datasets tailored to microscopy imaging (low contrast, dense overlap), leading to poor model generalization.
- Detection and multi-object tracking modules must be built from scratch, making it hard to reproduce state-of-the-art methods (e.g., association strategies based on graph neural networks or Re-ID).
- Evaluation metrics are inconsistent, so algorithms cannot be compared objectively in complex scenarios such as cell division and occlusion.
- Team members iterate blindly among static detection frameworks such as YOLO and RetinaNet without a systematic technical roadmap.
With Deep-Learning-for-Tracking-and-Detection
- The "Microscopy / Cell Tracking" category quickly surfaces relevant papers, datasets (e.g., the Cell Tracking Challenge), and code, sharply reducing survey time.
- Curated multi-object tracking baselines (e.g., Siamese-network or graph-NN approaches) can be reused directly and fine-tuned for the microscopy setting, improving track continuity.
- Standardized evaluation metrics referenced in the project (e.g., MOTA, IDF1) enable quantitative comparison and targeted optimization of the division-event detection logic.
- Mainstream frameworks under "Static Detection" and "Multi Object Tracking" (e.g., a YOLOv5 + DeepSORT combination) make it quick to assemble an end-to-end pipeline.
- The team can focus on the biology itself rather than low-level integration, significantly improving research throughput.
By systematizing fragmented academic resources, Deep-Learning-for-Tracking-and-Detection lets research teams deploy complex visual-tracking tasks efficiently.
Runtime Requirements
- Not specified
Quick Start
Collection of papers, datasets, code and other resources for object detection and tracking using deep learning
- Research Data
- Papers
- Datasets
- Code
- Collections
- Tutorials
- Blogs
Research Data
I use DavidRM Journal to manage my research data because of its excellent hierarchical organization, cross-linking, and tagging functionality.
I provide an export of my Journal entries containing the categorized and tagged papers, articles, tutorials, code, and notes on computer vision and deep learning that I have collected over the past few years.
The topic cloud looks like this: (image omitted)
The file requires Journal version 8 and can be imported as follows:
- Import my user preferences via File -> Import -> Import User Preferences
- Import the research data via File -> Import -> Sync from The Journal Export File
Note that the user preferences must be imported before the research data; otherwise, the tagged topics will not work correctly.
(Optional) I also provide a Global Options file for users who prefer a dark theme; it can be imported via File -> Import -> Import Global Options.
Last updated: 2026-03-09
Papers
Static Detection
Region Proposal
- Scalable Object Detection Using Deep Neural Networks [cvpr14] [pdf] [notes]
- Selective Search for Object Recognition [ijcv2013] [pdf] [notes]
RCNN
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [tpami17] [pdf] [notes]
- RFCN - Object Detection via Region-based Fully Convolutional Networks [nips16] [Microsoft Research] [pdf] [notes]
- Mask R-CNN [iccv17] [Facebook AI Research] [pdf] [notes] [arxiv] [code (keras)] [code (tensorflow)]
- SNIPER: Efficient Multi-Scale Training [ax1812/nips18] [pdf] [notes] [code]
YOLO
- You Only Look Once: Unified, Real-Time Object Detection [ax1605] [pdf] [notes]
- YOLO9000: Better, Faster, Stronger [ax1612] [pdf] [notes]
- YOLOv3: An Incremental Improvement [ax1804] [pdf] [notes]
- YOLOv4: Optimal Speed and Accuracy of Object Detection [ax2004] [pdf] [notes] [code]
SSD
- SSD: Single Shot MultiBox Detector [ax1612/eccv16] [pdf] [notes]
- DSSD: Deconvolutional Single Shot Detector [ax1701] [pdf] [notes]
RetinaNet
- Feature Pyramid Networks for Object Detection [ax1704] [pdf] [notes]
- Focal Loss for Dense Object Detection [ax180207/iccv17] [pdf] [notes]
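The focal loss introduced in the RetinaNet paper above down-weights easy examples by scaling cross-entropy with a modulating factor. A minimal sketch of the per-example binary form, assuming the standard parameterization FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) from the paper:

```python
import math

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction.

    p: predicted probability of the positive class, y: label in {0, 1}.
    Easy examples (p_t close to 1) are down-weighted by (1 - p_t)**gamma.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction contributes far less than a hard one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
```

With gamma=0 and alpha=0.5 this reduces to half the usual cross-entropy, which is a quick sanity check when reimplementing it.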
Anchor Free
- FoveaBox: Beyond Anchor-based Object Detector [ax1904] [pdf] [notes] [code]
- CornerNet: Detecting Objects as Paired Keypoints [ax1903/ijcv19] [pdf] [notes] [code]
- FCOS: Fully Convolutional One-Stage Object Detection [ax1908/iccv19] [pdf] [notes] [code] [code/FCOS_PLUS] [code/VoVNet] [code/HRNet] [code/NAS]
- Feature Selective Anchor-Free Module for Single-Shot Object Detection [ax1903/cvpr19] [pdf] [notes] [code]
- Bottom-up Object Detection by Grouping Extreme and Center Points [ax1901] [pdf] [notes] [code]
- Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection [ax1912/cvpr20] [pdf] [notes] [code]
- End-to-End Object Detection with Transformers [ax200528] [pdf] [notes] [code]
- RepPoints: Point Set Representation for Object Detection [iccv19] [pdf] [notes] [code]
Misc
- OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks [ax1402/iclr14] [pdf] [notes]
- LSDA: Large Scale Detection through Adaptation [ax1411/nips14] [pdf] [notes]
- Acquisition of Localization Confidence for Accurate Object Detection [ax1807/eccv18] [pdf] [notes] [code]
- EfficientDet: Scalable and Efficient Object Detection [cvpr20] [pdf]
- Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression [ax1902/cvpr19] [pdf] [notes] [code] [project]
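Generalized IoU, listed above, extends IoU with a penalty for the empty area of the smallest enclosing box, so the loss stays informative even for non-overlapping boxes. A minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) form, following the formula in the paper:

```python
def giou(a, b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection area.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c = cw * ch
    # GIoU = IoU - |C \ (A U B)| / |C|; ranges over (-1, 1].
    return iou - (c - union) / c

# Identical boxes give 1; disjoint boxes go negative, unlike plain IoU.
```

The negative values for disjoint boxes are what make 1 - GIoU usable as a regression loss where plain IoU is flat at zero.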
Video Detection
Tubelet
- Object Detection from Video Tubelets with Convolutional Neural Networks [cvpr16] [pdf] [notes]
- Object Detection in Videos with Tubelet Proposal Networks [ax1704/cvpr17] [pdf] [notes]
FGFA
- Deep Feature Flow for Video Recognition [cvpr17] [Microsoft Research] [pdf] [arxiv] [code]
- Flow-Guided Feature Aggregation for Video Object Detection [ax1708/iccv17] [pdf] [notes]
- Towards High Performance Video Object Detection [ax1711] [Microsoft] [pdf] [notes]
RNN
- Online Video Object Detection using Association LSTM [iccv17] [pdf] [notes]
- Context Matters: Refining Object Detection in Video with Recurrent Neural Networks [bmvc16] [pdf] [notes]
Multi Object Tracking
Joint-Detection
Identity Embedding
- MOTS: Multi-Object Tracking and Segmentation [cvpr19] [pdf] [notes] [code] [project/data]
- Towards Real-Time Multi-Object Tracking [ax1909] [pdf] [notes]
- A Simple Baseline for Multi-Object Tracking [ax2004] [pdf] [notes] [code]
- Integrated Object Detection and Tracking with Tracklet-Conditioned Detection [ax1811] [pdf] [notes]
Association
- Deep Affinity Network for Multiple Object Tracking [ax1810/tpami19] [pdf] [notes] [code] [pytorch]
Deep Learning
- Online Multi-Object Tracking Using CNN-based Single Object Tracker with Spatial-Temporal Attention Mechanism [ax1708/iccv17] [pdf] [arxiv] [notes]
- Online Multi-Object Tracking with Dual Matching Attention Networks [ax1902/eccv18] [pdf] [arxiv] [notes] [code]
- FAMNet: Joint Learning of Feature, Affinity and Multi-Dimensional Assignment for Online Multiple Object Tracking [iccv19] [pdf] [notes]
- Exploit the Connectivity: Multi-Object Tracking with TrackletNet [ax1811/mm19] [pdf] [notes]
- Tracking without Bells and Whistles [ax1903/iccv19] [pdf] [notes] [code] [pytorch]
RNN
- Tracking The Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies [ax1704/iccv17] [Stanford] [pdf] [notes] [arxiv] [project]
- Multi-object Tracking with Neural Gating Using Bilinear LSTM [eccv18] [pdf] [notes]
- Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking [cvpr19] [pdf] [notes] [code]
Unsupervised Learning
- Unsupervised Person Re-identification by Deep Learning Tracklet Association [ax1809/eccv18] [pdf] [notes]
- Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers [ax1809/cvpr19] [pdf] [arxiv] [notes] [code]
- Simple Unsupervised Multi-Object Tracking [ax2006] [pdf] [notes]
Reinforcement Learning
- Learning to Track: Online Multi-object Tracking by Decision Making [iccv15] [Stanford] [pdf] [notes] [code (matlab)] [project]
- Collaborative Deep Reinforcement Learning for Multi-Object Tracking [eccv18] [pdf] [notes]
Network Flow
- Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor [iccv15] [NEC Labs] [pdf] [author] [notes]
- Deep Network Flow for Multi-Object Tracking [cvpr17] [NEC Labs] [pdf] [supplementary] [notes]
- Learning a Neural Solver for Multiple Object Tracking [ax1912/cvpr20] [pdf] [notes] [code]
Graph Optimization
- A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects [ax1607] [highest MT on MOT2015] [University of Freiburg, Germany] [pdf] [arxiv] [author] [notes]
Baseline
- Simple Online and Realtime Tracking [icip16] [pdf] [notes] [code]
- High-Speed Tracking-by-Detection Without Using Image Information [avss17] [pdf] [notes] [code]
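The IOU tracker from "High-Speed Tracking-by-Detection Without Using Image Information" above associates detections purely by spatial overlap: each track is greedily extended with the best-overlapping detection in the next frame. A minimal single-frame association step, assuming detections are (x1, y1, x2, y2) boxes; this is a sketch of the idea, not the authors' implementation:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, sigma_iou=0.5):
    """Greedily match each track's last box to the detection with highest IoU.

    Returns {track_index: detection_index}; in a full tracker, unmatched
    detections start new tracks and unmatched tracks are terminated.
    """
    matches = {}
    free = set(range(len(detections)))
    for t, last_box in enumerate(tracks):
        best = max(free, key=lambda d: iou(last_box, detections[d]), default=None)
        if best is not None and iou(last_box, detections[best]) >= sigma_iou:
            matches[t] = best
            free.remove(best)
    return matches

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(21, 21, 31, 31), (1, 1, 11, 11)]
```

Because no image features are used, this baseline runs at hundreds of frames per second, which is why it appears under "Baseline" here and in the code section below.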
Metrics
Single Object Tracking
Reinforcement Learning
- Deep Reinforcement Learning for Visual Object Tracking in Videos [ax1704] [USC-Santa Barbara, Samsung Research] [pdf] [arxiv] [author] [notes]
- Visual Tracking by Reinforced Decision Making [ax1702] [Seoul National University, Chung-Ang University] [pdf] [arxiv] [author] [notes]
- Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning [cvpr17] [Seoul National University] [pdf] [supplementary] [project] [notes] [code]
- End-to-end Active Object Tracking via Reinforcement Learning [ax1705] [Peking University, Tencent AI Lab] [pdf] [arxiv]
Siamese Networks
- Fully-Convolutional Siamese Networks for Object Tracking [eccv16] [pdf] [project] [notes]
- High Performance Visual Tracking with Siamese Region Proposal Network [cvpr18] [pdf] [author] [notes]
- Siam R-CNN: Visual Tracking by Re-Detection [cvpr20] [pdf] [notes] [project] [code]
Correlation
- ATOM: Accurate Tracking by Overlap Maximization [cvpr19] [pdf] [notes] [code]
- DiMP: Learning Discriminative Model Prediction for Tracking [iccv19] [pdf] [notes] [code]
- D3S - A Discriminative Single Shot Segmentation Tracker [cvpr20] [pdf] [notes] [code]
Misc
Deep Learning
Synthetic Gradients
- Decoupled Neural Interfaces using Synthetic Gradients [ax1608] [pdf] [notes]
- Understanding Synthetic Gradients and Decoupled Neural Interfaces [ax1703] [pdf] [notes]
Efficient
Unsupervised Learning
Interpolation
- Video Frame Interpolation via Adaptive Convolution [cvpr17 / iccv17] [pdf (cvpr17)] [pdf (iccv17)] [ppt]
Autoencoder
Variational
- beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework [iclr17] [pdf] [notes]
- Disentangling by Factorising [ax1806] [pdf] [notes]
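The beta-VAE objective above weights the KL term of the standard VAE evidence lower bound by a factor beta > 1 to encourage disentangled latents. A minimal sketch of the per-sample objective for a diagonal-Gaussian posterior N(mu, sigma^2) against a standard-normal prior, using the closed-form KL divergence:

```python
import math

def beta_vae_loss(recon_error, mu, log_var, beta=4.0):
    """beta-VAE objective: reconstruction error + beta * KL(q(z|x) || N(0, I)).

    mu, log_var: lists giving the diagonal Gaussian posterior per latent dim.
    The KL has the closed form 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1).
    """
    kl = 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))
    return recon_error + beta * kl

# With the posterior equal to the prior (mu=0, sigma=1), the KL term vanishes.
loss = beta_vae_loss(1.0, [0.0, 0.0], [0.0, 0.0])
```

Setting beta=1 recovers the plain VAE loss; larger beta trades reconstruction quality for disentanglement.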
Datasets
Multi Object Tracking
- IDOT
- UA-DETRAC Benchmark Suite
- GRAM Road-Traffic Monitoring [paper]
- Ko-PER Intersection Dataset
- TRANCOS
- Urban Tracker
- DARPA VIVID / PETS 2005 [non-stationary camera]
- KIT-AKS [no ground truth]
- CBCL StreetScenes Challenge Framework [no top-down viewpoint]
- MOT 2015 [mostly street-level viewpoint]
- MOT 2016 [mostly street-level viewpoint]
- MOT 2017 [mostly street-level viewpoint]
- MOT 2020 [mostly top-down viewpoint]
- MOTS: Multi-Object Tracking and Segmentation [based on MOT and KITTI]
- CVPR 2019 [mostly street-level viewpoint]
- PETS 2009 [no vehicles]
- PETS 2017 [low density] [mostly pedestrians]
- DukeMTMC [multi-camera] [static background] [pedestrians] [above street-level viewpoint] [website inaccessible]
- KITTI Tracking Dataset [no top-down viewpoint] [non-stationary camera]
- The WILDTRACK Seven-Camera HD Dataset [pedestrian detection and tracking]
- 3D Traffic Scene Understanding from Movable Platforms [intersection traffic] [stereo setup] [moving camera]
- LOST: Longterm Observation of Scenes with Tracks [top-down and street-level viewpoints] [no ground truth]
- JTA [top-down and street-level viewpoints] [synthetic/GTA 5] [pedestrians] [3D annotations]
- PathTrack: Fast Trajectory Annotation with Path Supervision [top-down and street-level viewpoints] [iccv17] [pedestrians]
- CityFlow [pole-mounted cameras] [intersections] [vehicles] [re-id] [cvpr19]
- JackRabbot Dataset [RGBD] [head-on viewpoint] [indoor/outdoor] [Stanford]
- TAO: A Large-Scale Benchmark for Tracking Any Object [eccv20] [code]
- Edinburgh office monitoring video dataset [indoor] [long-term] [mostly stationary people]
- Waymo Open Dataset [outdoor] [vehicles]
UAV
- Stanford Drone Dataset
- UAVDT - The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking [UAV] [intersections/highways] [vehicles] [eccv18]
- VisDrone
Synthetic
- MNIST-MOT / MNIST-Sprites [scripted] [cvpr19]
- TUB Multi-Object and Multi-Camera Tracking Dataset [avss16]
- Virtual KITTI [arxiv] [cvpr16] [link appears to be dead]
Microscopy / Cell Tracking
- Cell Tracking Challenge [nature methods/2017]
- CTMC: Cell Tracking with Mitosis Detection Dataset Challenge [cvprw20] [MOT]
Single Object Tracking
- TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild [eccv18]
- LaSOT: Large-scale Single Object Tracking [cvpr19]
- Need for Speed: A Benchmark for Higher Frame Rate Object Tracking [iccv17]
- Long-term Tracking in the Wild: A Benchmark [eccv18]
- UAV123: A Benchmark and Simulator for UAV Tracking [eccv16] [project]
- Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications [ijcv18]
- CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark [iccv19] [RGBD]
- Temple Color 128 - Color Tracking Benchmark [tip15]
Video Detection
Video Understanding / Activity Recognition
- YouTube-8M
- AVA: A Video Dataset of Atomic Visual Action
- VIRAT Video Dataset
- Kinetics Action Recognition Dataset
Static Detection
- PASCAL Visual Object Classes
- A Large-Scale Dataset for Vehicle Re-Identification in the Wild [cvpr19]
- Object Detection-based annotations for some frames of the VIRAT dataset
- MIO-TCD: A new benchmark dataset for vehicle classification and localization [tip18]
- Tiny ImageNet
Animals
- Wildlife Image and Localization Dataset (species and bounding box labels) [wacv18]
- Stanford Dogs Dataset [cvpr11]
- Oxford-IIIT Pet Dataset [cvpr12]
- Caltech-UCSD Birds 200 [rough segmentation] [attributes]
- Gold Standard Snapshot Serengeti Bounding Box Coordinates
Boundary Detection
Static Segmentation
- COCO - Common Objects in Context
- Open Images
- ADE20K [cvpr17]
- SYNTHIA [cvpr16]
- UC Berkeley Computer Vision Group - Contour Detection and Image Segmentation
Video Segmentation
- DAVIS: Densely Annotated VIdeo Segmentation
- Mapillary Vistas Dataset [street scenes] [semi-free]
- BDD100K [street scenes] [autonomous driving]
- ApolloScape [street scenes] [autonomous driving]
- Cityscapes [street scenes] [instance-level]
- YouTube-VOS [iccv19]
Classification
- ImageNet Large Scale Visual Recognition Competition 2012
- Animals with Attributes 2
- CompCars Dataset
- ObjectNet [only test set]
Optical Flow
Motion Prediction
Code
General Vision
- Gluon CV Toolkit [mxnet] [pytorch]
- OpenMMLab Computer Vision Foundation [pytorch]
Multi Object Tracking
Frameworks
General
- Globally-optimal greedy algorithms for tracking a variable number of objects [cvpr11] [matlab] [author]
- Continuous Energy Minimization for Multitarget Tracking [cvpr11 / iccv11 / tpami 2014] [matlab]
- Discrete-Continuous Energy Minimization for Multi-Target Tracking [cvpr12] [matlab] [project]
- The Way They Move: Tracking Multiple Targets with Similar Appearance [iccv13] [matlab]
- 3D Traffic Scene Understanding from Movable Platforms [2d_tracking] [pami14/kit13/iccv13/nips11] [c++/matlab]
- Multiple Target Tracking Based on Undirected Hierarchical Relation Hypergraph [cvpr14] [C++] [author]
- Robust Online Multi-object Tracking Based on Tracklet Confidence and Online Discriminative Appearance Learning [cvpr14] [matlab] (project)
- Learning to Track: Online Multi-Object Tracking by Decision Making [iccv15] [matlab]
- Joint Tracking and Segmentation of Multiple Targets [cvpr15] [matlab]
- Multiple Hypothesis Tracking Revisited [iccv15] [highest-performing open-source tracker on MOT2015] [matlab]
- Combined Image- and World-Space Tracking in Traffic Scenes [icra 2017] [c++]
- Online Multi-Target Tracking with Recurrent Neural Networks [aaai17] [lua/torch7]
- Real-Time Multiple Object Tracking - A Study on the Importance of Speed [ax1710/master's thesis] [c++]
- Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking [icra18] [matlab]
- Online Multi-Object Tracking with Dual Matching Attention Network [eccv18] [matlab/tensorflow]
- TrackR-CNN - Multi-Object Tracking and Segmentation [cvpr19] [tensorflow] [project]
- Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking [cvpr19] [tensorflow]
- Robust Multi-Modality Multi-Object Tracking [iccv19] [pytorch]
- Towards Real-Time Multi-Object Tracking / Joint Detection and Embedding [ax1909] [pytorch] [CMU]
- Deep Affinity Network for Multiple Object Tracking [tpami19] [pytorch]
- Tracking without Bells and Whistles [iccv19] [pytorch]
- Lifted Disjoint Paths with Application in Multiple Object Tracking [icml20] [matlab] [1st on mot15, 3rd on mot16, 2nd on mot17]
- Learning a Neural Solver for Multiple Object Tracking [cvpr20] [pytorch] [2nd on mot15]
- Tracking Objects as Points [ax2004] [pytorch]
- Quasi-Dense Similarity Learning for Multiple Object Tracking [ax2006] [pytorch]
- DEFT: Detection Embeddings for Tracking [ax2102] [pytorch]
- How To Train Your Deep Multi-Object Tracker [ax1906/cvpr20] [pytorch] [traktor/gitlab]
- Track To Detect and Segment: An Online Multi-Object Tracker [cvpr21] [pytorch] [project]
- MOTR: End-to-End Multiple-Object Tracking with Transformer [ax2202] [pytorch]
Baseline
- Simple Online and Realtime Tracking [icip 2016] [python]
- Deep SORT: Simple Online Realtime Tracking with a Deep Association Metric [icip17] [python]
- High-Speed Tracking-by-Detection Without Using Image Information [avss17] [python]
- A Simple Baseline for One-Shot Multi-Object Tracking [ax2004] [pytorch] [winning method on MOT15, 16, 17, and 20]
Siamese Networks
- SiamMOT: Siamese Multi-Object Tracking [ax2105] [pytorch]
Unsupervised
- Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers [cvpr19] [python/c++/pytorch]
Re-ID
- Torchreid: Deep Learning Person Re-identification in PyTorch [ax1910] [pytorch]
- SMOT: Single-Shot Multi Object Tracking [ax2010] [pytorch] [gluon-cv]
- FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking [ax2004] [pytorch] [Microsoft] [BDD100K] [face tracking]
- Rethinking the Competition Between Detection and ReID in Multi-Object Tracking [ax2010] [pytorch]
Graph NN
- Joint Object Detection and Multi-Object Tracking with Graph Neural Networks [ax2006/icra21] [pytorch]
Microscopy / Cell Tracking
- Baxter Algorithms / Viterbi Tracking [tmi14] [matlab]
- Deepcell: Accurate Cell Tracking and Lineage Construction in Live-Cell Imaging Experiments with Deep Learning [biorxiv1910] [tensorflow]
3D
- 3D Multi-Object Tracking: A Baseline and New Evaluation Metrics [iros20/eccvw20] [pytorch]
- GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning [iros20/eccvw20] [pytorch]
Metrics
- HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking [cvpr20] [python]
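The classic CLEAR-MOT metric MOTA, which HOTA above was designed to refine, aggregates misses, false positives, and identity switches over all frames relative to the number of ground-truth objects. A minimal sketch of the formula from per-frame error counts (assumed already produced by a detection-to-ground-truth matcher):

```python
def mota(frame_stats):
    """MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames.

    frame_stats: iterable of (num_gt, false_negatives, false_positives,
    id_switches) tuples, one per frame. MOTA can go negative when the
    error count exceeds the number of ground-truth objects.
    """
    gt = fn = fp = idsw = 0
    for g, n, p, s in frame_stats:
        gt += g
        fn += n
        fp += p
        idsw += s
    return 1.0 - (fn + fp + idsw) / gt

# Two frames, 10 ground-truth objects each, a handful of errors:
score = mota([(10, 1, 0, 0), (10, 0, 1, 1)])
```

In practice, libraries such as py-motmetrics compute MOTA, IDF1, and the rest of the CLEAR-MOT suite from raw track files; the sketch just shows what the headline number means.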
Single Object Tracking
- Collection of Common Tracking Algorithms (2003-2012) [c++/matlab]
- SenseTime Research platform for single object tracking, implementing algorithms like SiamRPN and SiamMask [pytorch]
- In Defense of Color-based Model-free Tracking [cvpr15] [c++]
- Hierarchical Convolutional Features for Visual Tracking [iccv15] [matlab]
- Visual Tracking with Fully Convolutional Networks [iccv15] [matlab]
- DeepTracking: Seeing Beyond Seeing Using Recurrent Neural Networks [aaai16] [torch 7]
- Learning Multi-Domain Convolutional Neural Networks for Visual Tracking [cvpr16] [vot2015 winner] [matlab/matconvnet] [pytorch]
- Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking [eccv 2016] [matlab]
- Fully-Convolutional Siamese Networks for Object Tracking [eccvw 2016] [matlab/matconvnet] [project] [pytorch] [pytorch (training only)]
- DCFNet: Discriminant Correlation Filters Network for Visual Tracking [ax1704] [matlab/matconvnet] [pytorch]
- End-to-end Representation Learning for Correlation Filter Based Tracking [cvpr17] [matlab/matconvnet] [tensorflow/inference only] [project]
- Dual Deep Network for Visual Tracking [tip1704] [caffe]
- SiameseX: A simplified PyTorch implementation of Siamese networks for tracking, including SiamFC, SiamRPN, SiamRPN++, SiamVGG, SiamDW, SiamRPN-VGG [pytorch]
- RATM: Recurrent Attentive Tracking Model [cvprw17] [python]
- ROLO: Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking [iscas 2017] [tensorflow]
- ECO: Efficient Convolution Operators for Tracking [cvpr17] [matlab] [python/cuda] [pytorch]
- Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning [cvpr17] [tensorflow]
- Detect to Track and Track to Detect [iccv17] [matlab]
- Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers [eccv18] [pytorch]
- Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking [cvpr18] [matlab]
- High Performance Visual Tracking with Siamese Region Proposal Network [cvpr18] [pytorch/195] [pytorch/313] [pytorch/no_train/104] [pytorch/177]
- Distractor-aware Siamese Networks for Visual Object Tracking [eccv18] [vot18 winner] [pytorch]
- VITAL: Visual Tracking via Adversarial Learning [cvpr18] [matlab] [pytorch] [project]
- Fast Online Object Tracking and Segmentation: A Unifying Approach (SiamMask) [cvpr19] [pytorch] [project]
- PyTracking: A general Python framework for training and running visual object trackers, based on PyTorch [ECO/ATOM/DiMP/PrDiMP] [cvpr17/cvpr19/iccv19/cvpr20] [pytorch]
- Unsupervised Deep Tracking [cvpr19] [matlab/matconvnet] [pytorch]
- Deeper and Wider Siamese Networks for Real-Time Visual Tracking [cvpr19] [pytorch]
- GradNet: Gradient-Guided Network for Visual Object Tracking [iccv19] [tensorflow]
- ['Skimming-Perusal' Tracking: A Framework for Real-Time and Robust Long-Term Tracking](https://github.com/iiau-tracker/SPLT) [iccv19] [tensorflow]
- Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking [iccv19] [matlab]
- Learning the Model Update for Siamese Trackers [iccv19] [pytorch]
- SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking [cvpr19] [pytorch] [inference only]
- Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking [iccv19] [matlab]
- Siam R-CNN: Visual Tracking by Re-Detection [cvpr20] [tensorflow]
- D3S - A Discriminative Single Shot Segmentation Tracker [cvpr20] [pytorch/pytracking]
- Discriminative and Robust Online Learning for Siamese Visual Tracking [aaai20] [pytorch/pysot]
- Siamese Box Adaptive Network for Visual Tracking [cvpr20] [pytorch/pysot]
- Ocean: Object-aware Anchor-free Tracking [ax2010] [pytorch]
GUI Application / Large-Scale Tracking / Animals
- BioTracker: An Open-Source Computer Vision Framework for Visual Animal Tracking [opencv/c++]
- Tracktor: Image-based Automated Tracking of Animal Movement and Behaviour [opencv/c++]
- MARGO (Massively Automated Real-time GUI for Object-tracking): A Platform for High-Throughput Ethology [matlab]
- idtracker.ai: Tracking All Individuals in Large Collectives of Unmarked Animals [tensorflow] [project]
Video Detection
- Flow-Guided Feature Aggregation for Video Object Detection [nips16 / iccv17] [mxnet]
- T-CNN: Tubelets with Convolution Neural Networks [cvpr16] [python]
- TPN: Tubelet Proposal Network [cvpr17] [python]
- Deep Feature Flow for Video Recognition [cvpr17] [mxnet]
- Mobile Video Object Detection with Temporally-Aware Feature Maps [cvpr18] [Google] [tensorflow]
Action Detection
Frameworks
- OpenMMLab's Next-Generation Video Understanding Toolbox and Benchmark [pytorch]
Static Detection and Matching
Frameworks
- Tensorflow Object Detection API [tensorflow]
- Detectron2 [pytorch]
- Detectron [pytorch]
- Open MMLab Detection Toolbox with PyTorch [pytorch]
- SimpleDet [mxnet]
Region Proposal
- MCG: Multiscale Combinatorial Grouping - Object Proposals and Segmentation (project) [tpami16/cvpr14] [python]
- COB: Convolutional Oriented Boundaries (project) [tpami18/eccv16] [matlab/caffe]
FPN (Feature Pyramid Networks)
- Feature Pyramid Networks for Object Detection [caffe/python]
RCNN
- RFCN (official implementation) [caffe/matlab]
- RFCN-tensorflow [tensorflow]
- PVANet: Lightweight Deep Neural Networks for Real-time Object Detection [intel] [emdnn16(nips16)]
- Mask R-CNN [tensorflow] [keras]
- Light-head R-CNN [cvpr18] [tensorflow]
- Evolving Boxes for Fast Vehicle Detection [icme18] [caffe/python]
- Cascade R-CNN [cvpr18] [detectron] [caffe]
- A MultiPath Network for Object Detection [torch] [bmvc16] [facebook]
- SNIPER: Efficient Multi-Scale Training / An Analysis of Scale Invariance in Object Detection - SNIP [nips18/cvpr18] [mxnet]
SSD (Single Shot MultiBox Detector)
- SSD-Tensorflow [tensorflow]
- SSD-Tensorflow (tf.estimator) [tensorflow]
- SSD-Tensorflow (tf.slim) [tensorflow]
- SSD-Keras [keras]
- SSD-Pytorch [pytorch]
- Enhanced SSD with Feature Fusion and Visual Reasoning [nca18] [tensorflow]
- RefineDet - Single-Shot Refinement Neural Network for Object Detection [cvpr18] [caffe]
RetinaNet
- 9.277.41 [pytorch]
- 31.857.212 [pytorch]
- 25.274.84 [pytorch] [nvidia]
- 22.869.302 [pytorch]
YOLO (You Only Look Once)
- Darknet: Convolutional Neural Networks [c/python]
- YOLO9000: Better, Faster, Stronger - Real-Time Object Detection for 9000 Classes! [c/python]
- Darkflow [tensorflow]
- Pytorch Yolov2 [pytorch]
- Yolo-v3 and Yolo-v2 for Windows and Linux [c/python]
- YOLOv3 in PyTorch [pytorch]
- pytorch-yolo-v3 [pytorch] [no training] [tutorial]
- YOLOv3_TensorFlow [tensorflow]
- tensorflow-yolo-v3 [tensorflow slim]
- tensorflow-yolov3 [tensorflow slim]
- keras-yolov3 [keras]
- YOLOv4 [darknet - c/python] [tensorflow] [pytorch/711] [pytorch/ONNX/TensorRT/1.9k] [pytorch 3D]
- YOLOv5 [pytorch]
- YOLOX [pytorch] [MegEngine] [ax2107]
Anchor Free
- FoveaBox: Beyond Anchor-based Object Detector [ax1904] [pytorch/mmdetection]
- CornerNet: Detecting Objects as Paired Keypoints [ax1903/eccv18] [pytorch]
- FCOS: Fully Convolutional One-Stage Object Detection [iccv19] [pytorch] [VoVNet] [HRNet] [NAS] [FCOS_PLUS]
- Feature Selective Anchor-Free Module for Single-Shot Object Detection [cvpr19] [pytorch]
- CenterNet: Objects as Points [ax1904] [pytorch]
- Bottom-up Object Detection by Grouping Extreme and Center Points [cvpr19] [pytorch]
- RepPoints: Point Set Representation for Object Detection [iccv19] [pytorch] [microsoft]
- DETR: End-to-End Object Detection with Transformers [ax200528] [pytorch] [facebook]
- Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection [cvpr20] [pytorch]
Misc
- Relation Networks for Object Detection [cvpr18] [mxnet]
- DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling [iccv17 (poster)] [theano]
- Multi-scale Location-aware Kernel Representation for Object Detection [cvpr18] [caffe/python]
Matching
Boundary Detection
- Holistically-Nested Edge Detection (HED) [iccv15] [caffe]
- Edge Detection with Deep Learning (HED) [tensorflow]
- Holistically-Nested Edge Detection (HED) in OpenCV [python/c++]
- Crisp Boundary Detection Using Pointwise Mutual Information [eccv14] [matlab]
- Dense Extreme Inception Network: Towards a Robust CNN Model for Edge Detection [wacv20] [tensorflow] [pytorch]
Text Detection
- Real-time Scene Text Detection with Differentiable Binarization [pytorch] [aaai20]
Frameworks
- OpenMMLab Text Detection, Recognition and Understanding Toolbox [pytorch]
3D Detection
Frameworks
- OpenMMLab's Next-Generation Platform for General 3D Object Detection [pytorch]
- OpenPCDet Toolbox for LiDAR-based 3D Object Detection [pytorch]
Optical Flow
- FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks [cvpr17] [caffe] [pytorch/nvidia]
- SPyNet: Spatial Pyramid Network for Optical Flow [cvpr17] [lua] [pytorch]
- Guided Optical Flow Learning [cvprw17] [caffe] [tensorflow]
- Fast Optical Flow using Dense Inverse Search (DIS) [eccv16] [C++]
- A Filter Formulation for Computing Real-Time Optical Flow [ral16] [c++/cuda - matlab, python wrappers]
- PatchBatch - A Batch Augmented Loss for Optical Flow [cvpr16] [python/theano]
- Piecewise Rigid Scene Flow [iccv13/eccv14/ijcv15] [c++/matlab]
- DeepFlow v2 [iccv13] [c++/python/matlab] [project]
- An Evaluation of Data Costs for Optical Flow [gcpr13] [matlab]
Frameworks
- OpenMMLab Optical Flow Toolbox and Benchmark [pytorch]
Instance Segmentation
- Fully Convolutional Instance-aware Semantic Segmentation [cvpr17] [coco16 winner] [mxnet]
- Instance-aware Semantic Segmentation via Multi-task Network Cascades [cvpr16] [caffe] [coco15 winner]
- DeepMask/SharpMask [nips15/eccv16] [facebook] [torch] [tensorflow] [pytorch/deepmask]
- Simultaneous Detection and Segmentation [eccv14] [matlab] [project]
- PANet [cvpr18] [pytorch]
- RetinaMask [arxiv1901] [pytorch]
- Mask Scoring R-CNN [cvpr19] [pytorch]
- DeepMAC [ax2104] [tensorflow]
- Swin Transformer [iccv21] [pytorch] [microsoft]
Frameworks
- Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch [pytorch] [facebook]
- PaddleDetection, an object detection and instance segmentation toolkit based on PaddlePaddle [2019]
Semantic Segmentation
- Learning from Synthetic Data: Addressing Domain Shift for Semantic Segmentation [cvpr18] [spotlight] [pytorch]
- Few-shot Segmentation Propagation with Guided Networks [ax1806] [pytorch] [incomplete]
- Pytorch-segmentation-toolbox [DeeplabV3 and PSPNet] [pytorch]
- DeepLab [tensorflow]
- Auto-DeepLab [pytorch]
- DeepLab v3+ [pytorch]
- Deep Extreme Cut (DEXTR): From Extreme Points to Object Segmentation [cvpr18] [project] [pytorch]
- FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation [ax1903] [project] [pytorch]
Frameworks
Polyp Segmentation
- PraNet: Parallel Reverse Attention Network for Polyp Segmentation [miccai20]
- HarDNet-MSEG: A Simple Encoder-Decoder Polyp Segmentation Neural Network that Achieves over 0.9 Mean Dice and 86 FPS [ax2101]
Panoptic Segmentation
- Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation [cvpr20] [pytorch]
Video Segmentation
- Improving Semantic Segmentation via Video Prediction and Label Relaxation [cvpr19] [pytorch] [nvidia]
- PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation [accv18/cvprw18/eccvw18] [tensorflow]
- MaskTrackRCNN for Video Instance Segmentation [iccv19] [pytorch/detectron]
- Video Instance Segmentation using Inter-Frame Communication Transformers [nips21] [pytorch/detectron]
- VNext: SeqFormer / IDOL [eccv22] [pytorch/detectron2]
- SeqFormer: Sequential Transformer for Video Instance Segmentation [eccv22] [pytorch/detectron2]
- VITA: Video Instance Segmentation via Object Token Association [nips22] [pytorch/detectron2]
Panoptic Video Segmentation
- ViP-DeepLab [cvpr21]
运动预测(Motion Prediction)
- Self-Supervised Learning via Conditional Motion Propagation [cvpr19] [pytorch]
- A Neural Temporal Model for Human Motion Prediction [cvpr19] [tensorflow]
- Learning Trajectory Dependencies for Human Motion Prediction [iccv19] [pytorch]
- Structural-RNN: Deep Learning on Spatio-Temporal Graphs [cvpr15] [tensorflow]
- A Keras multi-input multi-output LSTM-based RNN for object trajectory forecasting [keras]
- Transformer Networks for Trajectory Forecasting [ax2003] [pytorch]
- Regularizing neural networks for future trajectory prediction via IRL framework [ietcv1907] [tensorflow]
- Peeking into the Future: Predicting Future Person Activities and Locations in Videos [cvpr19] [tensorflow]
- DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting [ax200526] [pytorch]
- MCENET: Multi-Context Encoder Network for Homogeneous Agent Trajectory Prediction in Mixed Traffic [ax200405] [tensorflow]
- Human Trajectory Prediction in Socially Interacting Crowds Using a CNN-based Architecture [pytorch]
- A tool set for trajectory prediction, ready for pip install [icai19/wacv19] [pytorch]
- RobustTP: End-to-End Trajectory Prediction for Heterogeneous Road-Agents in Dense Traffic with Noisy Sensor Inputs [acmcscs19] [pytorch/tensorflow]
- The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction [cvpr20] [dummy]
- Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction [cvpr19] [tensorflow]
- Adversarial Loss for Human Trajectory Prediction [hEART19] [pytorch]
- Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks [cvpr18] [pytorch]
- Forecasting Trajectory and Behavior of Road-Agents Using Spectral Clustering in Graph-LSTMs [ax1912] [pytorch]
- Study of attention mechanisms for trajectory prediction in Deep Learning [msc thesis] [python]
- A python implementation of multi-model estimation algorithm for trajectory tracking and prediction, research project from BMW ABSOLUT self-driving bus project. [python]
- Predicting Human Trajectories [theano]
- Implementation of Recurrent Neural Networks for future trajectory prediction of pedestrians [pytorch]
Pose Estimation
Frameworks
Autoencoders
- β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework [iclr17] [deepmind] [tensorflow] [tensorflow] [pytorch]
- Disentangling by Factorising [ax1806] [pytorch]
Classification
- Learning Efficient Convolutional Networks Through Network Slimming [iccv17] [pytorch]
Frameworks
- OpenMMLab Image Classification Toolbox and Benchmark [pytorch]
Deep RL
Annotation
- LabelImg
- ByLabel: A Boundary Based Semi-Automatic Image Annotation Tool
- Bounding Box Editor and Exporter
- VGG Image Annotator
- Visual Object Tagging Tool: An electron app for building end-to-end object detection models from images and videos
- PixelAnnotationTool
- labelme: Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation)
- VATIC - Video Annotation Tool from Irvine, California [ijcv12] [project]
- Computer Vision Annotation Tool (CVAT)
- Image Annotation Tool
- Labelbox [paid]
- RectLabel: An image annotation tool to label images for bounding box object detection and segmentation [paid]
- Onepanel: A platform for production-grade vision AI, with complete components for model building, auto-annotation, data processing, and model-training pipelines [docs]
Editing
Augmentation
- Augmentor: Image augmentation library in Python for machine learning
- Albumentations: Fast image augmentation library and easy-to-use wrapper around other libraries
- imgaug: Image augmentation for machine learning experiments
- solt: Streaming over lightweight data transformations
Deep Learning
- Deformable Convolutional Networks
- RNNexp
- Grad-CAM: Gradient-weighted Class Activation Mapping
Class Imbalance
- Imbalanced Dataset Sampler [pytorch]
- Iterable Dataset Resampling in PyTorch [pytorch]
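The samplers above rebalance training by drawing rare classes more often. The usual recipe is to weight each sample inversely to its class frequency; a minimal sketch with a hypothetical helper (not the linked libraries' API):

```python
from collections import Counter
import random

def inverse_frequency_weights(labels):
    """Per-sample weights proportional to 1 / class frequency.

    With these weights, random.choices draws each class with roughly
    equal probability regardless of how imbalanced the dataset is.
    """
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

labels = ["cat"] * 9 + ["dog"]          # 9:1 imbalance
weights = inverse_frequency_weights(labels)

random.seed(0)
draws = random.choices(labels, weights=weights, k=10000)
```

Each class ends up with total weight 1, so "dog" is drawn about as often as "cat" despite being nine times rarer; a PyTorch `WeightedRandomSampler` applies the same idea to dataset indices.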
Few Shot Learning
- OpenMMLab Few Shot Learning Toolbox and Benchmark [pytorch]
Unsupervised Learning
资源集合(Collections)
数据集(Datasets)
- Awesome Public Datasets
- A list of traffic surveillance datasets
- Machine learning datasets: a list of the biggest machine learning datasets from across the web
- Labeled Information Library of Alexandria: Biology and Conservation [other conservation datasets]
- THOTH: Data Sets & Images
- Google AI Datasets
- Google Cloud Storage public datasets
- Microsoft Research Open Data
- Earth Engine Data Catalog
- Registry of Open Data on AWS
- Kaggle Datasets
- CVonline: Image Databases
- Synthetic data for computer vision: a list of synthetic datasets and tools for computer vision
- pgram machine learning datasets
- pgram vision datasets
Deep Learning
Static Detection
Video Detection
Single Object Tracking
- A list of visual tracking papers
- A list of deep-learning-based tracking papers
- A list of single object trackers evaluated on OTB
- A collection of correlation-filter-based trackers with links to papers, code, etc.
- VOT2018 trackers repository
- CUHK datasets
- CVPR19 visual tracking paper summary
- Single object visual trackers
Multi Object Tracking (MOT)
Static Segmentation
- Segmentation papers and code
- Segmentation.X: papers and benchmarks for semantic segmentation, instance segmentation, panoptic segmentation and video segmentation
- Instance segmentation papers with code
Video Segmentation
Motion Prediction
- Awesome-Trajectory-Prediction
- Awesome Interaction-aware Behavior and Trajectory Prediction
- Human trajectory prediction datasets
Deep Compressed Sensing
Misc
- Papers With Code: the latest in machine learning
- Awesome Deep Ecology
- A list of Matlab frameworks, libraries and software
- Face Recognition
- A month of machine learning paper summaries
- Awesome-model-compression-and-acceleration
- Model-Compression-Papers
Tutorials
Collections
Multi Object Tracking
Static Detection
- End-to-end object detection with Transformers
- A comprehensive survey of deep learning for object detection
- A review of deep learning algorithms for object detection
- A simple guide to the versions of the Inception network
- R-CNN, Fast R-CNN, Faster R-CNN, YOLO: object detection algorithms
- A gentle guide to deep learning object detection
- The intuition behind RetinaNet
- YOLO, "You Only Look Once": real-time object detection explained
- Understanding Feature Pyramid Networks (FPN) for object detection
- Fast object detection with SqueezeDet on Keras
- Region of Interest (RoI) Pooling explained
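The RoI pooling tutorial above boils down to three steps: crop the proposal region from the feature map, divide it into a fixed grid, and max-pool each cell so every proposal yields the same output size. A numpy-only sketch under simplifying assumptions (single channel, integer box coordinates; `roi_max_pool` is an illustrative helper, not a library function):

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size):
    """Max-pool a region of interest into a fixed out_size x out_size grid.

    feature_map: (H, W) single-channel feature map.
    roi: (x1, y1, x2, y2) box in feature-map coordinates (x2, y2 exclusive).
    Assumes the region is at least out_size pixels in each dimension.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    # Bin edges that split the region into out_size roughly equal strips.
    h_edges = np.linspace(0, region.shape[0], out_size + 1).astype(int)
    w_edges = np.linspace(0, region.shape[1], out_size + 1).astype(int)
    out = np.empty((out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = region[h_edges[i]:h_edges[i + 1],
                               w_edges[j]:w_edges[j + 1]].max()
    return out

fm = np.arange(16).reshape(4, 4)
pooled = roi_max_pool(fm, (0, 0, 4, 4), 2)
# pooled -> [[ 5,  7], [13, 15]]
```

This fixed-size output is what lets Fast/Faster R-CNN feed arbitrarily shaped proposal boxes into the same fully connected classification head.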
Video Detection
Instance Segmentation
- Splash of Color: instance segmentation with Mask R-CNN and TensorFlow
- A simple understanding of Mask R-CNN
- Learning to segment
- Analyzing the papers behind Facebook's computer vision approach
- Review: MNC, the Multi-task Network Cascade that won the 2015 COCO segmentation challenge
- Review: FCIS, winner of the 2016 COCO segmentation challenge
- Review: InstanceFCN, instance-sensitive score maps
Deep Learning
Optimization
Class Imbalance
- Learning from imbalanced data
- Learning from imbalanced classes
- Dealing with imbalanced datasets in machine learning [medium]
- How to handle class imbalance [medium]
- Methods for dealing with imbalanced data [towardsdatascience]
- How to handle imbalanced classes in machine learning [elitedatascience]
- 7 techniques to handle imbalanced data [kdnuggets]
- 10 techniques to handle imbalanced classes in machine learning [analyticsvidhya]
RNN (Recurrent Neural Networks)
Deep RL (Deep Reinforcement Learning)
Autoencoders
- Guide to autoencoders
- Applied deep learning, Part 3: Autoencoders
- Denoising Autoencoders
- Stacked Denoising Autoencoders
- A gentle introduction to LSTM autoencoders
- Variational autoencoders with TensorFlow
- Variational autoencoders with TensorFlow Probability layers
Blogs