learnopencv


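learnopencv is an open-source code repository focused on computer vision, deep learning, and artificial intelligence. It turns complex theory into runnable C++ and Python examples, and it accompanies the technical articles on the LearnOpenCV.com blog with code that ranges from basic concepts to recent applications.

Because AI techniques change quickly and papers are hard to reproduce, learnopencv provides tested implementations that help users move from understanding a principle to writing working code. Demo projects are available for real-time object detection (such as the recent YOLO26 and RF-DETR), multi-object tracking, face privacy protection, large-model deployment (such as Jetson edge computing and vLLM serving), 3D reconstruction (SAM 3D, Gaussian splatting), and retrieval-augmented generation (RAG).

The repository is well suited to developers, algorithm researchers, and AI learners. Programmers who want to strengthen their engineering skills get production-grade reference architectures; researchers get a testbed for quickly validating new ideas; beginners get a step-by-step, best-practice guide to OpenCV and deep learning frameworks. By covering the full stack from traditional image processing to large-model applications, learnopencv aims to make AI development approachable for everyone.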

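Use Case

A smart-retail team is building a real-time customer behavior analysis system that must track the paths of multiple people on edge devices and automatically blur faces to comply with privacy regulations.

Without learnopencv

Developers have to reproduce the complex YOLO26 or RF-DETR algorithms from scratch and spend weeks debugging compatibility between the instance segmentation and real-time detection code.
For multi-object tracking there is no mature Roboflow tracker integration example to follow, so person identities switch frequently and data accuracy is very low.
To meet privacy requirements, hand-writing YuNet-based face blurring logic is slow, and low-latency inference on edge devices such as Jetson is hard to achieve.
When deployment bottlenecks appear (for example, time spent in NMS post-processing), there is no officially optimized NMS-free inference path, and the frame rate falls short of real time.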
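
For readers unfamiliar with the NMS post-processing step mentioned above, the following is a minimal, library-free sketch of classical greedy non-maximum suppression. It is an illustration only, not code from the repository: the function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold are assumptions made for this example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Each candidate is compared against every box already kept, so the cost grows with the number of detections per frame. An NMS-free model avoids this step by design.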

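With learnopencv

Run the repository's verified YOLO26 and RF-DETR demo code directly and get pixel-level instance segmentation working within half a day, which shortens the development cycle considerably.
Reuse the ready-made Roboflow tracker integration scripts to get stable multi-object trajectory tracking without dealing with the underlying algorithm details.
Use the built-in OpenCV YuNet face blur module to deploy real-time privacy protection quickly, so that footage is anonymized at the moment of capture.
Drawing on the best practices for YOLO26 NMS-free inference and LLM deployment on Jetson, the team reduces system latency by 40% and runs smoothly on edge devices.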
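
The face anonymization step in this scenario boils down to replacing the pixels inside each detected face box. As a rough, library-free sketch (not the repository's implementation, which works on OpenCV images), pixelation can be written as block averaging. The function name `pixelate_region`, the nested-list grayscale image, and the exclusive `(x1, y1, x2, y2)` box convention are assumptions for illustration, and the box is assumed to lie inside the image.

```python
def pixelate_region(image, box, block=4):
    """Return a copy of `image` with the area inside `box` pixelated.

    image: list of rows of grayscale ints.
    box:   (x1, y1, x2, y2) with x2 and y2 exclusive, inside the image.
    """
    out = [row[:] for row in image]
    x1, y1, x2, y2 = box
    for by in range(y1, y2, block):
        for bx in range(x1, x2, block):
            ys = range(by, min(by + block, y2))
            xs = range(bx, min(bx + block, x2))
            # Replace every pixel in the block with the block's mean value.
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out
```

A larger `block` hides more detail. In a real pipeline the box would come from the face detector and the same idea would be applied per color channel.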

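By providing production-grade code examples and practical guides to recent algorithms, learnopencv frees developers from reinventing the wheel so they can focus on innovating and delivering business logic.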

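Runtime Requirements

Operating system

Not specified

GPU

Some projects (such as LLM serving, VLMs, and 3D reconstruction) require an NVIDIA GPU. VRAM requirements depend on the model (8 GB or more is usually recommended); no CUDA version is specified.
Some projects support edge devices (such as Jetson Nano/Orin) or Arduino.

Memory

Not specified (16 GB or more is usually recommended for training or inference with large models)

Dependencies

Notes: This repository is a collection of independent tutorials and demo code rather than a single tool, so environment requirements differ widely between subdirectories (such as YOLO26, vLLM deployment, and SAM-3). Some projects are designed specifically for edge devices (NVIDIA Jetson) or microcontrollers (Arduino). Before running a project, check the requirements in its subdirectory or the associated blog post.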
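
Since requirements vary per subdirectory, one way to get an overview is to list which folders ship their own requirements.txt. The sketch below uses only the standard library. The function name `find_project_requirements` is hypothetical, and it assumes it is pointed at a local clone of the repository.

```python
from pathlib import Path


def find_project_requirements(repo_root):
    """Map each folder (relative to repo_root) to its requirements.txt path."""
    root = Path(repo_root)
    found = {}
    for req in sorted(root.rglob("requirements.txt")):
        # as_posix() keeps the keys identical on Windows and Unix.
        rel = req.parent.relative_to(root).as_posix()
        found[rel] = req
    return found


if __name__ == "__main__":
    for folder, req in find_project_requirements(".").items():
        print(f"{folder}: {req}")
```

A requirements.txt at the repository root would appear under the key `.`. Folders without a requirements file still need their README or blog post checked by hand.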

python: not specified

opencv-python

torch

transformers

ultralytics

vllm

langgraph

accelerate

roboflow

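Quick Start

LearnOpenCV

This repository contains code for the computer vision, deep learning, and AI research articles shared on our blog, LearnOpenCV.com.

Want to become an expert in AI? AI Courses by OpenCV is a great place to start.

List of Blog Posts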

Blog Post	Code
RF-DETR Segmentation: Real-Time Detection & Instance Segmentation Guide	Code
YOLO26 Instance Segmentation: Pixel-Perfect AI at Real-Time Speed	Code
Multi-Object Tracking with Roboflow Trackers and OpenCV	Code
Real-Time Face Blur and Pixelation with OpenCV YuNet	Code
Breaking the Bottleneck: Achieving Native NMS-Free Inference with YOLO26	Code
YOLOv26: An Object Detector Built for Real-Time Deployment	Code
Beyond Transformers: A Deep Dive into HOPE
Serving SGLang: Launch a Production-Style Server
Deployment on Edge: LLM Serving on Jetson using vLLM	Code
Nested Learning: Is Deep Learning Architecture an Illusion?
How to Build a GitHub Code-Analyser Agent for Developer Productivity	Code
The Existential Problems in LLM Serving
SAM 3D: Foundation Model for Single-Image 3D Reconstruction
SAM-3: What’s New, How It Works, and Why It Matters	Code
Image-GS: Adaptive Image Reconstruction using 2D Gaussians	Code
Ultimate Guide to Vector Databases and RAG Pipeline	Code
What Makes DeepSeek OCR So Powerful	Code
2D Gaussian Splatting: Geometrically Accurate Radiance Field Reconstruction	Code
TRM: Tiny Recursive Models	Code
Deploying ML Models on Arduino: From Blink to Think	Code
VideoRAG: Redefining Long-Context Video Comprehension
AI Agent in Action: Automating Desktop Tasks with VLMs	Code
Top VLM Evaluation Metrics for Optimal Performance Analysis	Code
Getting Started with VLM on Jetson Nano	Code
VLM on Edge: Worth the Hype or Just a Novelty?	Code
AnomalyCLIP : Harnessing CLIP for Weakly-Supervised Video Anomaly Recognition	Code
AI_for_Video_Understanding_From_Content_Moderation_to_Summarization	Code
Video-RAG: Training-Free Retrieval for Long-Video LVLMs	Code
Object Detection and Spatial Understanding with VLMs ft. Qwen2.5-VL	Code
LangGraph: Building Self-Correcting RAG Agent for Code Generation	Code
Inside Sinusoidal Position Embeddings: A Sense of Order	Code
Inside RoPE: Rotary Magic into Position Embeddings	Code
SimLingo-Vision-Language-Action-Model-for-Autonomous-Driving	Code
FineTuning Gemma 3n for Medical VQA on ROCOv2	Code
SmolLM3 Blueprint: SOTA 3B-Parameter LLM
LangGraph-A-Visual-Automation-and-Summarization-Pipeline	Code
Fine-Tuning AnomalyCLIP: Class-Agnostic Zero-Shot Anomaly Detection	Code
SigLIP 2: DeepMind’s Multilingual Vision-Language Model
MedGemma: Google’s Medico VLM for Clinical QA, Imaging, and More	Code
Nanonets-OCR-s: Enabling Rich, Structured Markdown for Document Understanding
Optimizing VJEPA-2: Tackling Latency & Context in Real-Time Video Classification Scripts	Code
V-JEPA 2: Meta’s Breakthrough in AI for the Physical World	Code
NVIDIA Cosmos Reason1: Video Understanding	Code
GR00T N1.5 Explained
LLaVA	Code
SmolVLA: Affordable & Efficient VLA Robotics on Consumer GPUs	Code
Fine-Tuning Grounding DINO: Open-Vocabulary Object Detection	Code
Getting Started with Qwen3 – The Thinking Expert	Code
Inside the GPU: A Comprehensive Guide to Modern Graphics Architecture
Distributed Parallel Training: PyTorch	Code
MONAI: The Definitive Framework for Medical Imaging Powered by PyTorch
SANA-Sprint: The One-Step Revolution in High-Quality AI Image Synthesis
FramePack-Video-Diffusion-but-feels-like-Image-Diffusion	Code
Model Weights File Formats in Machine Learning
Unsloth: A Guide from Basics to Fine-Tuning Vision Models	Code
Iterative Closest Point (ICP) Algorithm Explained	Code
MedSAM2 Explained: One Prompt to Segment Anything in Medical Imaging	Code
Batch Normalization and Dropout as Regularizers
DINOv2_by_Meta_A_Self-Supervised_foundational_vision_model	Code
Beginner's Guide to Embedding Models
MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors	Code
Google's A2A Protocol
Nvidia SANA : Faster Image Generation
Fine-tuning RF-DETR	Code
Qwen2.5-Omni: A Real-Time Multimodal AI
Vision Language Action Models: Robotic Control	Code
Fine-Tuning Gemma 3 VLM using QLoRA for LaTeX-OCR Dataset	Code
ComfyUI	Code
Gemma-3: A Comprehensive Introduction
YOLO11 on Raspberry Pi: Optimizing Object Detection for Edge Devices	Code
VGGT: Visual Geometry Grounded Transformer – For Dense 3D Reconstruction	Code
DDIM: The Faster, Improved Version of DDPM for Efficient AI Image Generation	Code
Introduction to Model Context Protocol (MCP)
MASt3R and MASt3R-SfM Explanation: Image Matching and 3D Reconstruction	Code
MatAnyone Explained: Consistent Memory for Better Video Matting	Code
GraphRAG: For Medical Document Analysis	Code
OmniParser: Vision Based GUI Agent
Fine-Tuning-YOLOv12-Comparison-With-YOLOv11-And-YOLOv7-Based-Darknet	Code
FineTuning RetinaNet for Wildlife Detection with PyTorch: A Step-by-Step Tutorial	Code
DUSt3R: Geometric 3D Vision Made Easy : Explanation and Results	Code
YOLOv12: Attention Meets Speed	Code
Video Generation: A Diffusion based approach	Code
Agentic AI: A Comprehensive Introduction	Code
Finetuning SAM2 for Leaf Disease Segmentation	Code
Object Insertion in Gaussian Splatting: Paper Explained and Training Code for MCMC and Bilateral Grid	Code
Depth Pro: Sharp Monocular Metric Depth	Code
Fine-tuning-Stable-Diffusion-3_5-UI-images	Code
SimSiam: Streamlining SSL with Stop-Gradient Mechanism	Code
Image Captioning using ResNet and LSTM	Code
Molmo VLM: Paper Explanation and Demo	Code
3D Gaussian Splatting Paper Explanation: Training Custom Datasets with NeRF-Studio Gsplats	Code
FLUX Image Generation: Experimenting with the Parameters	Code
Contrastive-Learning-SimCLR-and-BYOL(With Code Example)	Code
The Annotated NeRF : Training on Custom Dataset from Scratch in Pytorch	Code
Stable Diffusion 3 and 3.5: Paper Explanation and Inference	Code
LightRAG - Legal Document Analysis	Code
NVIDIA AI Summit 2024 – India Overview
Introduction to Speech to Speech: Most Efficient Form of NLP	Code
Training 3D U-Net for Brain Tumor Segmentation (BraTS-GLI)	Code
DETR: Overview and Inference	Code
YOLO11: Faster Than You Can Imagine!	Code
Exploring DINO: Self-Supervised Transformers for Road Segmentation with ResNet50 and U-Net	Code
Sapiens: Foundation for Human Vision Models by Meta	Code
Multimodal RAG with ColPali and Gemini	Code
Building Autonomous Vehicle in Carla: Path Following with PID Control & ROS 2	Code
Handwritten Text Recognition using OCR	Code
Training CLIP from Scratch for Image Retrieval	Code
Introduction to LiDAR SLAM: LOAM and LeGO-LOAM Paper and Code Explanation with ROS 2 Implementation	Code
Recommendation System using Vector Search	Code
Fine Tuning Whisper on Custom Dataset	Code
SAM 2 – Promptable Segmentation for Images and Videos	Code
Introduction to Feature Matching Using Neural Networks	Code
Introduction to ROS2 (Robot Operating System 2): Tutorial on ROS2 Working, DDS, ROS1 RMW, Topics, Nodes, Publisher, Subscriber in Python	Code
CVPR 2024 Research Papers - Part- 2	Code
CVPR 2024: An Overview and Key Papers	Code
Object Detection on Edge Device - OAK-D-Lite	Code
Fine-Tuning YOLOv10 Models on Custom Dataset	Code
ROS2 and Carla Setup Guide for Ubuntu 22.04
Understanding Visual SLAM for Robotics Perception: Building Monocular SLAM from Scratch in Python	Code
Enhancing Image Segmentation using U2-Net: An Approach to Efficient Background Removal	Code
YOLOv10: The Dual-Head OG of YOLO Series	Code
Fine-tuning Faster R-CNN on Sea Rescue Dataset	Code
Mastering Recommendation System: A Complete Guide
Automatic Speech Recognition with Diarization : Speech-to-Text	Code
Building MobileViT Image Classification Model from Scratch In Keras 3	Code
SDXL Inpainting: Fusing Image Inpainting with Stable Diffusion	Code
YOLOv9 Instance Segmentation on Medical Dataset	Code
A Comprehensive Guide to Robotics
Integrating Gradio with OpenCV DNN	Code
Fine-Tuning YOLOv9 on Custom Dataset	Code
Dreambooth using Diffusers	Code
Introduction to Hugging Face Diffusers	Code
Introduction to Ultralytics Explorer API	Code
YOLOv9: Advancing the YOLO Legacy	Code
Fine-Tuning LLMs using PEFT	Code
Depth Anything: Accelerating Monocular Depth Perception	Code
Deciphering LLMs: From Transformers to Quantization	Code
YOLO Loss Function Part 2: GFL and VFL Loss	Code
YOLOv8-Object-Tracking-and-Counting-with-OpenCV	Code
Stereo Vision in ADAS: Pioneering Depth Perception Beyond LiDAR	Code
YOLO Loss Function Part 1: SIoU and Focal Loss	Code
Moving Object Detection with OpenCV	Code
Integrating ADAS with Keypoint Feature Pyramid Network for 3D LiDAR Object Detection	Code
Mastering All YOLO Models from YOLOv1 to YOLO-NAS: Papers Explained (2024)
GradCAM: Enhancing Neural Network Interpretability in the Realm of Explainable AI	Code
Text Summarization using T5: Fine-Tuning and Building Gradio App	Code
3D LiDAR Visualization using Open3D: A Case Study on 2D KITTI Depth Frames for Autonomous Driving	Code
Fine Tuning T5: Text2Text Transfer Transformer for Building a Stack Overflow Tag Generator	Code
SegFormer 🤗 : Fine-Tuning for Improved Lane Detection in Autonomous Vehicles	Code
Fine-Tuning BERT using Hugging Face Transformers	Code
YOLO-NAS Pose	Code
BERT: Bidirectional Encoder Representations from Transformers	Code
Comparing KerasCV YOLOv8 Models on the Global Wheat Data 2020	Code
Top 5 AI papers of September 2023
Empowering Drivers: The Rise and Role of Advanced Driver Assistance Systems
Semantic Segmentation using KerasCV DeepLabv3+	Code
Object Detection using KerasCV YOLOv8	Code
Fine-tuning YOLOv8 Pose Models for Animal Pose Estimation	Code
Top 5 AI papers of August 2023
Fine Tuning TrOCR - Training TrOCR to Recognize Curved Text	Code
TrOCR - Getting Started with Transformer Based OCR	Code
Facial Emotion Recognition	Code
Object Keypoint Similarity in Keypoint Detection	Code
Real Time Deep SORT with Torchvision Detectors	Code
Top 5 AI papers of July 2023
Medical Image Segmentation	Code
Weighted Boxes Fusion in Object Detection: A Comparison with Non-Maximum Suppression	Code
Medical Multi-label Classification with PyTorch & Lightning	Code
Getting Started with PaddlePaddle: Exploring Object Detection, Segmentation, and Keypoints	Code
Drone Programming With Computer Vision A Beginners Guide	Code
How to Build a Pip Installable Package & Upload to PyPi
IoU Loss Functions for Faster & More Accurate Object Detection
Exploring Slicing Aided Hyper Inference for Small Object Detection	Code
Advancements in Face Recognition Models, Toolkit and Datasets
Train YOLO NAS on Custom Dataset	Code
Train YOLOv8 Instance Segmentation on Custom Data	Code
YOLO-NAS: New Object Detection Model Beats YOLOv6 & YOLOv8	Code
Segment Anything – A Foundation Model for Image Segmentation	Code
Build a Video to Slides Converter Application using the Power of Background Estimation and Frame Differencing in OpenCV	Code
A Closer Look at CVAT: Perfecting Your Annotations	YouTube
ControlNet - Achieving Superior Image Generation Results	Code
InstructPix2Pix - Edit Images With Prompts	Code
NVIDIA Spring GTC 2023 Day 4: Ending on a High Note with Top Moments from the Finale!
NVIDIA Spring GTC 2023 Day 3: Digging deeper into Deep Learning, Semiconductors & more!
NVIDIA Spring GTC 2023 Day 2: Jensen’s keynote & the iPhone moment of AI is here!
NVIDIA Spring GTC 2023 Day 1: Welcome to the future!
NVIDIA GTC Spring 2023 Curtain Raiser
Stable Diffusion - A New Paradigm in Generative AI	Code
OpenCV Face Recognition – Does Face Recognition Work on AI-Generated Images?
An In-Depth Guide to Denoising Diffusion Probabilistic Models – From Theory to Implementation	Code
From Pixels to Paintings: The Rise of Midjourney AI Art
Mastering DALL·E 2: A Breakthrough in AI Art Generation
Top 10 AI Art Generation Tools using Diffusion Models
The Future of Image Recognition is Here: PyTorch Vision Transformer	Code
Understanding Attention Mechanism in Transformer Neural Networks	Code
Deploying a Deep Learning Model using Hugging Face Spaces and Gradio	Code
Train YOLOv8 on Custom Dataset – A Complete Tutorial	Code
Introduction to Diffusion Models for Image Generation	Code
Building An Automated Image Annotation Tool: PyOpenAnnotate	Code
Ultralytics YOLOv8: State-of-the-Art YOLO Models	Code
Getting Started with YOLOv5 Instance Segmentation	Code
The Ultimate Guide To DeepLabv3 - With PyTorch Inference	Code
AI Fitness Trainer using MediaPipe: Squats Analysis	Code
YoloR - Paper Explanation & Inference -An In-Depth Analysis	Code
Roadmap To an Automated Image Annotation Tool Using Python	Code
Performance Comparison of YOLO Object Detection Models – An Intensive Study
FCOS - Anchor Free Object Detection Explained	Code
YOLOv6 Custom Dataset Training – Underwater Trash Detection	Code
What is EXIF Data in Images?	Code
t-SNE: T-Distributed Stochastic Neighbor Embedding Explained	Code
CenterNet: Objects as Points – Anchor-free Object Detection Explained	Code
YOLOv7 Pose vs MediaPipe in Human Pose Estimation	Code
YOLOv6 Object Detection – Paper Explanation and Inference	Code
YOLOX Object Detector Paper Explanation and Custom Training	Code
Driver Drowsiness Detection Using Mediapipe In Python	Code
GTC 2022 Big Bang AI announcements: Everything you need to know
NVIDIA GTC 2022 : The most important AI event this Fall
Object Tracking and Reidentification with FairMOT	Code
What is Face Detection? – The Ultimate Guide for 2022	Code
Document Scanner: Custom Semantic Segmentation using PyTorch-DeepLabV3	Code
Fine Tuning YOLOv7 on Custom Dataset	Code
Center Stage for Zoom Calls using MediaPipe	Code
Mean Average Precision (mAP) in Object Detection
YOLOv7 Object Detection Paper Explanation and Inference	Code
Pothole Detection using YOLOv4 and Darknet	Code
Automatic Document Scanner using OpenCV	Code
Demystifying GPU architectures for deep learning: Part 2	Code
Demystifying GPU Architectures For Deep Learning	Code
Intersection-over-Union(IoU)-in-Object-Detection-and-Segmentation	Code
Understanding Multiple Object Tracking using DeepSORT	Code
Optical Character Recognition using PaddleOCR	Code
Gesture Control in Zoom Call using Mediapipe	Code
A Deep Dive into Tensorflow Model Optimization	Code
DepthAI Pipeline Overview: Creating a Complex Pipeline	Code
TensorFlow Lite Model Maker: Create Models for On-Device Machine Learning	Code
TensorFlow Lite: Model Optimization for On Device Machine Learning	Code
Object detection with depth measurement using pre-trained models with OAK-D	Code
Custom Object Detection Training using YOLOv5	Code
Object Detection using Yolov5 and OpenCV DNN (C++/Python)	Code
Create Snapchat/Instagram filters using Mediapipe	Code
AUTOSAR C++ compliant deep learning inference with TensorRT	Code
NVIDIA GTC 2022 Day 4 Highlights: Meet the new Jetson Orin
NVIDIA GTC 2022 Day 3 Highlights: Deep Dive into Hopper architecture
NVIDIA GTC 2022 Day 2 Highlights: Jensen’s Keynote
NVIDIA GTC 2022 Day 1 Highlights: Brilliant Start
Automatic License Plate Recognition using Python	Code
Building a Poor Body Posture Detection and Alert System using MediaPipe	Code
Introduction to MediaPipe	Code
Disparity Estimation using Deep Learning	Code
How to build Chrome Dino game bot using OpenCV Feature Matching	Code
Top 10 Sources to Find Computer Vision and AI Models
Multi-Attribute and Graph-based Object Detection
Plastic Waste Detection with Deep Learning	Code
Ensemble Deep Learning-based Defect Classification and Detection in SEM Images
Building Industrial embedded deep learning inference pipelines with TensorRT	Code
Transfer Learning for Medical Images
Stereo Vision and Depth Estimation using OpenCV AI Kit	Code
Introduction to OpenCV AI Kit and DepthAI	Code
WeChat QR Code Scanner in OpenCV	Code
AI behind the Diwali 2021 ‘Not just a Cadbury ad’
Model Selection and Benchmarking with Modelplace.AI	Model Zoo
Real-time style transfer in a zoom meeting	Code
Introduction to OpenVino Deep Learning Workbench	Code
Running OpenVino Models on Intel Integrated GPU	Code
Post Training Quantization with OpenVino Toolkit	Code
Introduction to Intel OpenVINO Toolkit
Human Action Recognition using Detectron2 and LSTM	Code
Pix2Pix:Image-to-Image Translation in PyTorch & TensorFlow	Code
Conditional GAN (cGAN) in PyTorch and TensorFlow	Code
Deep Convolutional GAN in PyTorch and TensorFlow	Code
Introduction to Generative Adversarial Networks (GANs)	Code
Human Pose Estimation using Keypoint RCNN in PyTorch	Code
Non Maximum Suppression: Theory and Implementation in PyTorch	Code
MRNet – The Multi-Task Approach	Code
Generative and Discriminative Models
Playing Chrome's T-Rex Game with Facial Gestures	Code
Variational Autoencoder in TensorFlow	Code
Autoencoder in TensorFlow 2: Beginner’s Guide	Code
Deep Learning with OpenCV DNN Module: A Definitive Guide	Code
Depth perception using stereo camera (Python/C++)	Code
Contour Detection using OpenCV (Python/C++)	Code
Super Resolution in OpenCV	Code
Improving Illumination in Night Time Images	Code
Video Classification and Human Activity Recognition	Code
How to use OpenCV DNN Module with Nvidia GPU on Windows	Code
How to use OpenCV DNN Module with NVIDIA GPUs	Code
Code OpenCV in Visual Studio
Install OpenCV on Windows – C++ / Python	Code
Face Recognition with ArcFace	Code
Background Subtraction with OpenCV and BGS Libraries	Code
RAFT: Optical Flow estimation using Deep Learning	Code
Making A Low-Cost Stereo Camera Using OpenCV	Code
Optical Flow in OpenCV (C++/Python)	Code
Introduction to Epipolar Geometry and Stereo Vision	Code
Classification With Localization: Convert any keras Classifier to a Detector	Code
Photoshop Filters in OpenCV	Code
Tetris Game using OpenCV Python	Code
Image Classification with OpenCV for Android	Code
Image Classification with OpenCV Java	Code
PyTorch to Tensorflow Model Conversion	Code
Snake Game with OpenCV Python	Code
Stanford MRNet Challenge: Classifying Knee MRIs	Code
Experiment Logging with TensorBoard and wandb	Code
Understanding Lens Distortion	Code
Image Matting with state-of-the-art Method “F, B, Alpha Matting”	Code
Bag Of Tricks For Image Classification - Let's check if it is working or not	Code
Getting Started with OpenCV CUDA Module	Code
Training a Custom Object Detector with DLIB & Making Gesture Controlled Applications	Code
How To Run Inference Using TensorRT C++ API	Code
Using Facial Landmarks for Overlaying Faces with Medical Masks	Code
Tensorboard with PyTorch Lightning	Code
Otsu's Thresholding with OpenCV	Code
PyTorch-to-CoreML-model-conversion	Code
Playing Rock, Paper, Scissors with AI	Code
CNN Receptive Field Computation Using Backprop with TensorFlow	Code
CNN Fully Convolutional Image Classification with TensorFlow	Code
How to convert a model from PyTorch to TensorRT and speed up inference	Code
Efficient image loading	Code
Graph Convolutional Networks: Model Relations In Data	Code
Getting Started with Federated Learning with PyTorch and PySyft	Code
Creating a Virtual Pen & Eraser	Code
Getting Started with PyTorch Lightning	Code
Multi-Label Image Classification with PyTorch: Image Tagging	Code
Funny Mirrors Using OpenCV	Code
t-SNE for ResNet feature visualization	Code
Multi-Label Image Classification with Pytorch	Code
CNN Receptive Field Computation Using Backprop	Code
CNN Receptive Field Computation Using Backprop with TensorFlow	Code
Augmented Reality using AruCo Markers in OpenCV(C++ and Python)	Code
Fully Convolutional Image Classification on Arbitrary Sized Image	Code
Camera Calibration using OpenCV	Code
Geometry of Image Formation
Ensuring Training Reproducibility in Pytorch
Gaze Tracking
Simple Background Estimation in Videos Using OpenCV	Code
Applications of Foreground-Background separation with Semantic Segmentation	Code
EfficientNet: Theory + Code	Code
PyTorch for Beginners: Mask R-CNN Instance Segmentation with PyTorch	Code
PyTorch for Beginners: Faster R-CNN Object Detection with PyTorch	Code
PyTorch for Beginners: Semantic Segmentation using torchvision	Code
PyTorch for Beginners: Comparison of pre-trained models for Image Classification	Code
PyTorch for Beginners: Basics	Code
PyTorch Model Inference using ONNX and Caffe2	Code
Image Classification Using Transfer Learning in PyTorch	Code
Hangman: Creating games in OpenCV	Code
Image Inpainting with OpenCV (C++/Python)	Code
Hough Transform with OpenCV (C++/Python)	Code
Xeus-Cling: Run C++ code in Jupyter Notebook	Code
Gender & Age Classification using OpenCV Deep Learning ( C++/Python )	Code
Invisibility Cloak using Color Detection and Segmentation with OpenCV	Code
Fast Image Downloader for Open Images V4 (Python)	Code
Deep Learning based Text Detection Using OpenCV (C++/Python)	Code
Video Stabilization Using Point Feature Matching in OpenCV	Code
Training YOLOv3 : Deep Learning based Custom Object Detector	Code
Using OpenVINO with OpenCV	Code
Duplicate Search on Quora Dataset	Code
Shape Matching using Hu Moments (C++/Python)	Code
Install OpenCV 4 on CentOS (C++ and Python)	Code
Install OpenCV 3.4.4 on CentOS (C++ and Python)	Code
Install OpenCV 3.4.4 on Red Hat (C++ and Python)	Code
Install OpenCV 4 on Red Hat (C++ and Python)	Code
Install OpenCV 4 on macOS (C++ and Python)	Code
Install OpenCV 3.4.4 on Raspberry Pi	Code
Install OpenCV 3.4.4 on macOS (C++ and Python)	Code
OpenCV QR Code Scanner (C++ and Python)	Code
Install OpenCV 3.4.4 on Windows (C++ and Python)	Code
Install OpenCV 3.4.4 on Ubuntu 16.04 (C++ and Python)	Code
Install OpenCV 3.4.4 on Ubuntu 18.04 (C++ and Python)	Code
Universal Sentence Encoder	Code
Install OpenCV 4 on Raspberry Pi	Code
Install OpenCV 4 on Windows (C++ and Python)	Code
Face Detection – Dlib, OpenCV, and Deep Learning ( C++ / Python )	Code
Hand Keypoint Detection using Deep Learning and OpenCV	Code
Deep learning based Object Detection and Instance Segmentation using Mask R-CNN in OpenCV (Python / C++)	Code
Install OpenCV 4 on Ubuntu 18.04 (C++ and Python)	Code
Install OpenCV 4 on Ubuntu 16.04 (C++ and Python)	Code
Multi-Person Pose Estimation in OpenCV using OpenPose	Code
Heatmap for Logo Detection using OpenCV (Python)	Code
Deep Learning based Object Detection using YOLOv3 with OpenCV ( Python / C++ )	Code
Convex Hull using OpenCV in Python and C++	Code
MultiTracker : Multiple Object Tracking using OpenCV (C++/Python)	Code
Convolutional Neural Network based Image Colorization using OpenCV	Code
SVM using scikit-learn	Code
GOTURN: Deep Learning based Object Tracking	Code
Find the Center of a Blob (Centroid) using OpenCV (C++/Python)	Code
Support Vector Machines (SVM)	Code
Batch Normalization in Deep Networks	Code
Deep Learning based Character Classification using Synthetic Dataset	Code
Image Quality Assessment : BRISQUE	Code
Understanding AlexNet
Deep Learning based Text Recognition (OCR) using Tesseract and OpenCV	Code
Deep Learning based Human Pose Estimation using OpenCV ( C++ / Python )	Code
Number of Parameters and Tensor Sizes in a Convolutional Neural Network (CNN)
How to convert your OpenCV C++ code into a Python module	Code
CV4Faces : Best Project Award 2018
Facemark : Facial Landmark Detection using OpenCV	Code
Image Alignment (Feature Based) using OpenCV (C++/Python)	Code
Barcode and QR code Scanner using ZBar and OpenCV	Code
Keras Tutorial : Fine-tuning using pre-trained models	Code
OpenCV Transparent API
Face Reconstruction using EigenFaces (C++/Python)	Code
Eigenface using OpenCV (C++/Python)	Code
Principal Component Analysis
Keras Tutorial : Transfer Learning using pre-trained models	Code
Keras Tutorial : Using pre-trained Imagenet models	Code
Technical Aspects of a Digital SLR
Using Harry Potter interactive wand with OpenCV to create magic
Install OpenCV 3 and Dlib on Windows ( Python only )
Image Classification using Convolutional Neural Networks in Keras	Code
Understanding Autoencoders using Tensorflow (Python)	Code
Best Project Award : Computer Vision for Faces
Understanding Activation Functions in Deep Learning
Image Classification using Feedforward Neural Network in Keras	Code
Exposure Fusion using OpenCV (C++/Python)	Code
Understanding Feedforward Neural Networks
High Dynamic Range (HDR) Imaging using OpenCV (C++/Python)	Code
Deep learning using Keras – The Basics	Code
Selective Search for Object Detection (C++ / Python)	Code
Installing Deep Learning Frameworks on Ubuntu with CUDA support
Parallel Pixel Access in OpenCV using forEach	Code
cvui: A GUI lib built on top of OpenCV drawing primitives	Code
Install Dlib on Windows
Install Dlib on Ubuntu
Install OpenCV3 on Ubuntu
Read, Write and Display a video using OpenCV ( C++/ Python )	Code
Install Dlib on MacOS
Install OpenCV 3 on MacOS
Install OpenCV 3 on Windows
Get OpenCV Build Information ( getBuildInformation )
Color spaces in OpenCV (C++ / Python)	Code
Neural Networks : A 30,000 Feet View for Beginners
Alpha Blending using OpenCV (C++ / Python)	Code
User stories : How readers of this blog are applying their knowledge to build applications
How to select a bounding box ( ROI ) in OpenCV (C++/Python) ?
Automatic Red Eye Remover using OpenCV (C++ / Python)	Code
Bias-Variance Tradeoff in Machine Learning
Embedded Computer Vision: Which device should you choose?
Object Tracking using OpenCV (C++/Python)	Code
Handwritten Digits Classification : An OpenCV ( C++ / Python ) Tutorial	Code
Training a better Haar and LBP cascade based Eye Detector using OpenCV
Deep Learning Book Gift Recipients
Minified OpenCV Haar and LBP Cascades	Code
Deep Learning Book Gift
Histogram of Oriented Gradients
Image Recognition and Object Detection : Part 1
Head Pose Estimation using OpenCV and Dlib	Code
Live CV : A Computer Vision Coding Application
Approximate Focal Length for Webcams and Cell Phone Cameras
Configuring Qt for OpenCV on OSX	Code
Rotation Matrix To Euler Angles	Code
Speeding up Dlib’s Facial Landmark Detector
Warp one triangle to another using OpenCV ( C++ / Python )	Code
Average Face : OpenCV ( C++ / Python ) Tutorial	Code
Face Swap using OpenCV ( C++ / Python )	Code
Face Morph Using OpenCV — C++ / Python	Code
Deep Learning Example using NVIDIA DIGITS 3 on EC2
NVIDIA DIGITS 3 on EC2
Homography Examples using OpenCV ( Python / C ++ )	Code
Filling holes in an image using OpenCV ( Python / C++ )	Code
How to find frame rate or frames per second (fps) in OpenCV ( Python / C++ ) ?	Code
Delaunay Triangulation and Voronoi Diagram using OpenCV ( C++ / Python)	Code
OpenCV (C++ vs Python) vs MATLAB for Computer Vision
Facial Landmark Detection
Why does OpenCV use BGR color format ?
Computer Vision for Predicting Facial Attractiveness	Code
applyColorMap for pseudocoloring in OpenCV ( C++ / Python )	Code
Image Alignment (ECC) in OpenCV ( C++ / Python )	Code
How to find OpenCV version in Python and C++ ?
Baidu banned from ILSVRC 2015
OpenCV Transparent API
How Computer Vision Solved the Greatest Soccer Mystery of All Time
Embedded Vision Summit 2015
Read an Image in OpenCV ( Python, C++ )	Code
Non-Photorealistic Rendering using OpenCV ( Python, C++ )	Code
Seamless Cloning using OpenCV ( Python , C++ )	Code
OpenCV Threshold ( Python , C++ )	Code
Blob Detection Using OpenCV ( Python, C++ )	Code
Turn your OpenCV Code into a Web API in under 10 minutes — Part 1
How to compile OpenCV sample Code ?
Install OpenCV 3 on Yosemite ( OSX 10.10.x )

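LearnOpenCV Quick-Start Guide

LearnOpenCV is an open-source repository that collects the companion code for research articles on computer vision, deep learning, and artificial intelligence. This guide helps you set up an environment quickly and run the example projects.

Environment Setup

Before you begin, make sure your development environment meets the following requirements:

Operating system: Linux (Ubuntu 20.04+ recommended), macOS, or Windows (WSL2 recommended).
Python version: Python 3.8 or later (3.10+ recommended).
Hardware acceleration: for some deep learning examples (such as YOLO, VLM, and LLM deployment), an NVIDIA GPU with the matching CUDA driver installed is strongly recommended.
Prerequisites:
- Git
- pip (the Python package manager)
- CMake (may be needed by some OpenCV modules that require compilation)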
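
The checklist above can be verified with a short script before cloning anything. This is a minimal sketch using only the standard library; the function name `check_environment` and its defaults are assumptions for illustration. Finding `nvidia-smi` on the PATH only hints that an NVIDIA driver is installed and says nothing about the CUDA version.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 8), tools=("git", "cmake", "nvidia-smi")):
    """Report whether the interpreter, pip, and command-line tools look usable."""
    report = {
        "python": sys.version_info[:2] >= tuple(min_python),
        # pip is checked as an importable module of the current interpreter.
        "pip": importlib.util.find_spec("pip") is not None,
    }
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

CMake and `nvidia-smi` are optional for many of the pure-Python examples, so a "missing" result for them is not necessarily a problem.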

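Tip for developers in mainland China: when installing Python dependencies, consider using a domestic mirror (such as the Tsinghua or Alibaba mirror) to speed up downloads.

Installation Steps

1. Clone the repository

First, clone the repository to your local machine: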

git clone https://github.com/spmallick/learnopencv.git
cd learnopencv

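2. Create a virtual environment

To avoid dependency conflicts, create a separate virtual environment, either one per project or one for the whole repository: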

python -m venv venv
source venv/bin/activate  # Linux/macOS
# Or on Windows: venv\Scripts\activate

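3. Install base dependencies

Individual sub-projects may have their own requirements.txt, but most of them rely on OpenCV and a deep learning framework, so you can install these common base packages first. The command below pulls PyTorch wheels from the cu118 index, which targets CUDA 11.8; adjust or drop that index to match your setup.

# Use a mirror in mainland China to speed up installation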
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple opencv-python-headless torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118

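Note: Individual sub-projects (such as YOLO26-instance-segmentation or LLaVA) usually include their own requirements.txt in their folder. After entering a project directory, run that project's install command first:
cd <project-folder-name>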
pip install -r requirements.txt
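
When several sub-projects need to be set up, the per-project install step can be scripted. The sketch below only builds the pip command line and leaves running it to the caller. The function name `pip_install_command` and the `index_url` parameter are assumptions for illustration; the `-r` and `-i` options are standard pip flags.

```python
import sys
from pathlib import Path


def pip_install_command(project_dir, index_url=None):
    """Build the argv for installing one project's requirements.txt."""
    req = Path(project_dir) / "requirements.txt"
    # sys.executable ensures pip installs into the currently active environment.
    cmd = [sys.executable, "-m", "pip", "install", "-r", str(req)]
    if index_url:
        # Optional mirror, for example the Tsinghua index used earlier.
        cmd += ["-i", index_url]
    return cmd
```

To run it, pass the result to `subprocess.run(cmd, check=True)` from inside the activated virtual environment.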

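Basic Usage

LearnOpenCV contains many independent project demos. The following uses generic object detection or image reading as an example to show how to run a simple Python script.

Example: running a simple OpenCV script

Suppose you want to run one of the basic examples in the repository (for example, reading and displaying an image, or running a simple detection model):

Enter the project directory: choose a folder that contains code, such as FaceBlurPixelate (face blurring) or any other project with a Code link.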
```
cd FaceBlurPixelate
```
Check the script's usage: most scripts support the -h or --help flag to show usage information.
```
python main.py --help
```
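Run the example: pass the required arguments according to the help output (usually the path to an input image or video). A typical command looks like this: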
```
python main.py --input path/to/your/image.jpg --output result.jpg
```
For examples based on deep learning models (such as YOLO or a VLM), you usually need to specify the model weights file:
```
python infer.py --weights yolov8n.pt --source 0
```
(Note: --source 0 usually means the camera is used.)
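
To make the command-line conventions above concrete, here is a minimal sketch of how a script might accept --weights and --source with `argparse`. It is a hypothetical CLI, not the interface of any particular script in the repository: `build_parser`, `parse_source`, and the default values are assumptions, so always check the real script's --help output.

```python
import argparse


def parse_source(value):
    """Treat an all-digit value as a camera index, anything else as a path."""
    return int(value) if value.isdigit() else value


def build_parser():
    """Create a parser for a hypothetical inference demo."""
    parser = argparse.ArgumentParser(description="Hypothetical inference demo")
    parser.add_argument("--weights", default="yolov8n.pt",
                        help="path to the model weights file")
    parser.add_argument("--source", type=parse_source, default=0,
                        help="camera index (e.g. 0) or path to an image/video")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"weights={args.weights!r} source={args.source!r}")
```

Converting digit-only values to an integer matters because OpenCV's `cv2.VideoCapture` takes an integer for a camera index and a string for a file path.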

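Explore More Projects

Browse the folders in the repository root; each folder corresponds to the implementation code for one technical blog post. After entering a folder, follow its README.md (if present) or run the main script directly to try recent AI techniques (such as RF-DETR, SAM-3, and V-JEPA 2).

Version History

RF_DETR_Segmentation: 2026/04/07

YOLO26_Keypoint_Estimation: 2026/04/16

Colorization: 2026/03/18

Roboflow_Trackers: 2026/03/17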

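FAQ

What should I do when running the EAST text detection model fails with 'Can't open "true"' or 'Can't create layer...RealDiv'?

How do I fix errors about missing model or configuration files (such as .pb or .xml files) when running the sample code?

What should I do when the stereo camera depth perception code fails with 'Can't open file: ...stereo_rectify_maps.xml' or '_map1.empty()'?

How do I fix the GroundingDINO fine-tuning script error 'module groundingdino.util.train does not exist'?

Why does YOLOv3 run much slower on OpenCV than the advertised figures (for example, only 4 FPS)?

What should I do when compiling Darknet on Windows for ALPR training frequently hangs or gets interrupted?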
