[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-shouxieai--tensorRT_Pro":3,"tool-shouxieai--tensorRT_Pro":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 
道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":78,"owner_location":78,"owner_email":78,"owner_twitter":78,"owner_website":78,"owner_url":79,"languages":80,"stars":119,"forks":120,"last_commit_at":121,"license":122,"difficulty_score":10,"env_os":123,"env_gpu":124,"env_ram":125,"env_deps":126,"category_tags":133,"github_topics":134,"view_count":10,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":141,"updated_at":142,"faqs":143,"releases":174},1213,"shouxieai\u002FtensorRT_Pro","tensorRT_Pro","C++ library based on tensorrt integration","tensorRT_Pro是一个基于TensorRT的高性能推理框架，专为简化AI模型部署而设计。它提供C++和Python的极简接口，让开发者只需几行代码就能运行YOLOv5、YOLOX等主流模型，无需深入处理TensorRT的复杂集成细节。例如，C++只需3行代码完成推理，Python示例也清晰易用。\n\n它解决了传统TensorRT部署门槛高的问题——原本需要大量代码处理插件开发、序列化和精度优化（如FP32\u002FFP16\u002FINT8编译），而tensorRT_Pro已封装这些步骤，让部署效率大幅提升。特别适合AI工程师和嵌入式开发者在服务器或边缘设备上快速部署模型，无需反复调试底层细节。\n\n工具附带丰富教程、Docker镜像和预训练模型示例（如YOLOv5的简单实现、CenterNet转换指南），新手也能轻松上手。核心优势在于“开箱即用”：从模型加载到推理结果输出，全程流畅高效，让高性能推理真正触手可及。","*Read this in other languages: [English](README.md), [简体中文](tutorial\u002FREADME.zh-cn.md).*\n\n## News: \n- 🔥 A simple implementation is released: https:\u002F\u002Fgithub.com\u002Fshouxieai\u002Finfer\n- 🔥 Add yolov7 support .\n- 🔥 Released python solution for hardware decoding with tensorRT integration\n- 🔥 Docker Image has been released：https:\u002F\u002Fhub.docker.com\u002Fr\u002Fhopef\u002Ftensorrt-pro\n- ⚡tensorRT_Pro_comments_version(co-contributing version) is also provided for a better learning experience. Repo: https:\u002F\u002Fgithub.com\u002FGuanbin-Huang\u002FtensorRT_Pro_comments\n- 🔥 [Simple yolov5\u002Fyolox implemention is released. Simple and easy to use.](example-simple_yolo)\n- 🔥 yolov5-1.0-6.0\u002Fmaster are supported.\n- Tutorial notebooks download:\n  - [WarpAffine.lesson.tar.gz](http:\u002F\u002Fzifuture.com:1000\u002Ffs\u002F25.shared\u002Fwarpaffine.lesson.tar.gz)\n  - [Offset.tar.gz](http:\u002F\u002Fzifuture.com:1000\u002Ffs\u002F25.shared\u002Foffset.tar.gz)\n- Tutorial for exporting CenterNet from pytorch to tensorRT is released. \n\n## Tutorial Video\n\n- \u003Cb>blibli\u003C\u002Fb> : https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Xw411f7FW (Now only in Chinese. 
English is coming)\n- \u003Cb>slides\u003C\u002Fb> : http:\u002F\u002Fzifuture.com:1556\u002Ffs\u002Fsxai\u002FtensorRT.pptx (Now only in Chinese. English is coming)\n- \u003Cb>tutorial folder\u003C\u002Fb>: a good intro for beginners to get a general idea of our framework. (Chinese\u002FEnglish)\n\n## An Out-of-the-Box TensorRT-based Framework for High Performance Inference with C++\u002FPython Support\n\n- C++ Interface: 3 lines of code is all you need to run a YoloX\n\n  ```C++\n  \u002F\u002F create inference engine on gpu-0\n  \u002F\u002Fauto engine = Yolo::create_infer(\"yolov5m.fp32.trtmodel\", Yolo::Type::V5, 0);\n  auto engine = Yolo::create_infer(\"yolox_m.fp32.trtmodel\", Yolo::Type::X, 0);\n  \n  \u002F\u002F load image\n  auto image = cv::imread(\"1.jpg\");\n  \n  \u002F\u002F do inference and get the result\n  auto box = engine->commit(image).get();  \u002F\u002F return vector\u003CBox>\n  ```\n\n- Python Interface:\n  ```python\n  import pytrt as tp\n  \n  model     = models.resnet18(True).eval().to(device)\n  trt_model = tp.from_torch(model, input)\n  trt_out   = trt_model(input)\n  ```\n  \n  - simple yolo for python\n  ```python\n  import os\n  import cv2\n  import numpy as np\n  import pytrt as tp\n\n  engine_file = \"yolov5s.fp32.trtmodel\"\n  if not os.path.exists(engine_file):\n      tp.compile_onnx_to_file(1, tp.onnx_hub(\"yolov5s\"), engine_file)\n\n  yolo   = tp.Yolo(engine_file, type=tp.YoloType.V5)\n  image  = cv2.imread(\"car.jpg\")\n  bboxes = yolo.commit(image).get()\n  print(f\"{len(bboxes)} objects\")\n\n  for box in bboxes:\n      left, top, right, bottom = map(int, [box.left, box.top, box.right, box.bottom])\n      cv2.rectangle(image, (left, top), (right, bottom), tp.random_color(box.class_label), 5)\n\n  saveto = \"yolov5.car.jpg\"\n  print(f\"Save to {saveto}\")\n\n  cv2.imwrite(saveto, image)\n  cv2.imshow(\"result\", image)\n  cv2.waitKey()\n  ```\n\n## INTRO\n\n1. High-level interface for C++\u002FPython.\n2. Simplify the implementation of custom plugins. Serialization and deserialization have been encapsulated for easier usage.\n3. Simplify the compilation of fp32, fp16 and int8 for facilitating the deployment with C++\u002FPython on servers or embedded devices.\n4. Models ready for use, with examples: RetinaFace, Scrfd, YoloV5, YoloX, Arcface, AlphaPose, CenterNet and DeepSORT (C++)\n\n## YoloX and YoloV5-series Model Test Report\n\n\u003Cdetails>\n\u003Csummary>app_yolo.cpp speed testing\u003C\u002Fsummary>\n  \n1. Resolution (YoloV5P5, YoloX) = (640x640),  (YoloV5P6) = (1280x1280)\n2. max batch size = 16\n3. preprocessing + inference + postprocessing\n4. cuda10.2, cudnn8.2.2.26, TensorRT-8.0.1.6\n5. RTX2080Ti\n6. num of tests: the average over 100 runs, excluding the first run used for warmup \n7. Testing log: [workspace\u002Fperf.result.std.log](workspace\u002Fperf.result.std.log)\n8. code for testing: [src\u002Fapplication\u002Fapp_yolo.cpp](src\u002Fapplication\u002Fapp_yolo.cpp)\n9. images for testing: 6 images in workspace\u002Finference \n    - with resolution 810x1080, 500x806, 1024x684, 550x676, 1280x720, 800x533 respectively\n10. Testing method: load 6 images. Then do the inference on the 6 images, which is repeated 100 times. 
Note that each image should be preprocessed and postprocessed.\n\n---\n\n| Model    | Resolution | Type      | Precision | Elapsed Time | FPS    |\n| -------- | ---------- | --------- | --------- | ------------ | ------ |\n| yolox_x  | 640x640    | YoloX     | FP32      | 21.879       | 45.71  |\n| yolox_l  | 640x640    | YoloX     | FP32      | 12.308       | 81.25  |\n| yolox_m  | 640x640    | YoloX     | FP32      | 6.862        | 145.72 |\n| yolox_s  | 640x640    | YoloX     | FP32      | 3.088        | 323.81 |\n| yolox_x  | 640x640    | YoloX     | FP16      | 6.763        | 147.86 |\n| yolox_l  | 640x640    | YoloX     | FP16      | 3.933        | 254.25 |\n| yolox_m  | 640x640    | YoloX     | FP16      | 2.515        | 397.55 |\n| yolox_s  | 640x640    | YoloX     | FP16      | 1.362        | 734.48 |\n| yolox_x  | 640x640    | YoloX     | INT8      | 4.070        | 245.68 |\n| yolox_l  | 640x640    | YoloX     | INT8      | 2.444        | 409.21 |\n| yolox_m  | 640x640    | YoloX     | INT8      | 1.730        | 577.98 |\n| yolox_s  | 640x640    | YoloX     | INT8      | 1.060        | 943.15 |\n| yolov5x6 | 1280x1280  | YoloV5_P6 | FP32      | 68.022       | 14.70  |\n| yolov5l6 | 1280x1280  | YoloV5_P6 | FP32      | 37.931       | 26.36  |\n| yolov5m6 | 1280x1280  | YoloV5_P6 | FP32      | 20.127       | 49.69  |\n| yolov5s6 | 1280x1280  | YoloV5_P6 | FP32      | 8.715        | 114.75 |\n| yolov5x  | 640x640    | YoloV5_P5 | FP32      | 18.480       | 54.11  |\n| yolov5l  | 640x640    | YoloV5_P5 | FP32      | 10.110       | 98.91  |\n| yolov5m  | 640x640    | YoloV5_P5 | FP32      | 5.639        | 177.33 |\n| yolov5s  | 640x640    | YoloV5_P5 | FP32      | 2.578        | 387.92 |\n| yolov5x6 | 1280x1280  | YoloV5_P6 | FP16      | 20.877       | 47.90  |\n| yolov5l6 | 1280x1280  | YoloV5_P6 | FP16      | 10.960       | 91.24  |\n| yolov5m6 | 1280x1280  | YoloV5_P6 | FP16      | 7.236        | 138.20 |\n| yolov5s6 | 1280x1280  | YoloV5_P6 | FP16      | 3.851        | 259.68 |\n| yolov5x  | 640x640    | YoloV5_P5 | FP16      | 5.933        | 168.55 |\n| yolov5l  | 640x640    | YoloV5_P5 | FP16      | 3.450        | 289.86 |\n| yolov5m  | 640x640    | YoloV5_P5 | FP16      | 2.184        | 457.90 |\n| yolov5s  | 640x640    | YoloV5_P5 | FP16      | 1.307        | 765.10 |\n| yolov5x6 | 1280x1280  | YoloV5_P6 | INT8      | 12.207       | 81.92  |\n| yolov5l6 | 1280x1280  | YoloV5_P6 | INT8      | 7.221        | 138.49 |\n| yolov5m6 | 1280x1280  | YoloV5_P6 | INT8      | 5.248        | 190.55 |\n| yolov5s6 | 1280x1280  | YoloV5_P6 | INT8      | 3.149        | 317.54 |\n| yolov5x  | 640x640    | YoloV5_P5 | INT8      | 3.704        | 269.97 |\n| yolov5l  | 640x640    | YoloV5_P5 | INT8      | 2.255        | 443.53 |\n| yolov5m  | 640x640    | YoloV5_P5 | INT8      | 1.674        | 597.40 |\n| yolov5s  | 640x640    | YoloV5_P5 | INT8      | 1.143        | 874.91 |\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>app_yolo_fast.cpp speed testing. Never stop desiring for being faster\u003C\u002Fsummary>\n  \n- \u003Cb>Highlight:\u003C\u002Fb>   0.5 ms faster without any loss in precision compared with the above. Specifically, we remove the Focus and some transpose nodes etc, and implement them in CUDA kenerl function. 
But the rest remains the same.\n- \u003Cb>Test log:\u003C\u002Fb>   [workspace\u002Fperf.result.std.log](workspace\u002Fperf.result.std.log)\n- \u003Cb>Code for testing:\u003C\u002Fb>   [src\u002Fapplication\u002Fapp_yolo_fast.cpp](src\u002Fapplication\u002Fapp_yolo_fast.cpp)\n- \u003Cb>Tips:\u003C\u002Fb>   you can do the modification while refering to the downloaded onnx. Any questions are welcomed through any kinds of contact.\n- \u003Cb>Conclusion:\u003C\u002Fb>   the main idea of this work is to optimize the pre-and-post processing. If you go for yolox, yolov5 small version, the optimization might help you.\n\n|Model|Resolution|Type|Precision|Elapsed Time|FPS|\n|---|---|---|---|---|---|\n|yolox_x_fast|640x640|YoloX|FP32|21.598 |46.30 |\n|yolox_l_fast|640x640|YoloX|FP32|12.199 |81.97 |\n|yolox_m_fast|640x640|YoloX|FP32|6.819 |146.65 |\n|yolox_s_fast|640x640|YoloX|FP32|2.979 |335.73 |\n|yolox_x_fast|640x640|YoloX|FP16|6.764 |147.84 |\n|yolox_l_fast|640x640|YoloX|FP16|3.866 |258.64 |\n|yolox_m_fast|640x640|YoloX|FP16|2.386 |419.16 |\n|yolox_s_fast|640x640|YoloX|FP16|1.259 |794.36 |\n|yolox_x_fast|640x640|YoloX|INT8|3.918 |255.26 |\n|yolox_l_fast|640x640|YoloX|INT8|2.292 |436.38 |\n|yolox_m_fast|640x640|YoloX|INT8|1.589 |629.49 |\n|yolox_s_fast|640x640|YoloX|INT8|0.954 |1048.47 |\n|yolov5x6_fast|1280x1280|YoloV5_P6|FP32|67.075 |14.91 |\n|yolov5l6_fast|1280x1280|YoloV5_P6|FP32|37.491 |26.67 |\n|yolov5m6_fast|1280x1280|YoloV5_P6|FP32|19.422 |51.49 |\n|yolov5s6_fast|1280x1280|YoloV5_P6|FP32|7.900 |126.57 |\n|yolov5x_fast|640x640|YoloV5_P5|FP32|18.554 |53.90 |\n|yolov5l_fast|640x640|YoloV5_P5|FP32|10.060 |99.41 |\n|yolov5m_fast|640x640|YoloV5_P5|FP32|5.500 |181.82 |\n|yolov5s_fast|640x640|YoloV5_P5|FP32|2.342 |427.07 |\n|yolov5x6_fast|1280x1280|YoloV5_P6|FP16|20.538 |48.69 |\n|yolov5l6_fast|1280x1280|YoloV5_P6|FP16|10.404 |96.12 |\n|yolov5m6_fast|1280x1280|YoloV5_P6|FP16|6.577 |152.06 |\n|yolov5s6_fast|1280x1280|YoloV5_P6|FP16|3.087 |323.99 |\n|yolov5x_fast|640x640|YoloV5_P5|FP16|5.919 |168.95 |\n|yolov5l_fast|640x640|YoloV5_P5|FP16|3.348 |298.69 |\n|yolov5m_fast|640x640|YoloV5_P5|FP16|2.015 |496.34 |\n|yolov5s_fast|640x640|YoloV5_P5|FP16|1.087 |919.63 |\n|yolov5x6_fast|1280x1280|YoloV5_P6|INT8|11.236 |89.00 |\n|yolov5l6_fast|1280x1280|YoloV5_P6|INT8|6.235 |160.38 |\n|yolov5m6_fast|1280x1280|YoloV5_P6|INT8|4.311 |231.97 |\n|yolov5s6_fast|1280x1280|YoloV5_P6|INT8|2.139 |467.45 |\n|yolov5x_fast|640x640|YoloV5_P5|INT8|3.456 |289.37 |\n|yolov5l_fast|640x640|YoloV5_P5|INT8|2.019 |495.41 |\n|yolov5m_fast|640x640|YoloV5_P5|INT8|1.425 |701.71 |\n|yolov5s_fast|640x640|YoloV5_P5|INT8|0.844 |1185.47 |\n  \n\u003C\u002Fdetails>\n\n## Setup and Configuration\n\u003Cdetails>\n\u003Csummary>Linux\u003C\u002Fsummary>\n  \n  \n1. VSCode (highly recommended!)\n2. Configure your path for cudnn, cuda, tensorRT8.0 and protobuf.\n3. Configure the compute capability matched with your nvidia graphics card in Makefile\u002FCMakeLists.txt\n    - e.g.  `-gencode=arch=compute_75,code=sm_75`. If you are using 3080Ti, that should be `gencode=arch=compute_86,code=sm_86`\n    - reference for the table for GPU Compute Capability:\n  https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-gpus#compute\n4. Configure your library path in .vscode\u002Fc_cpp_properties.json\n5. CUDA version: CUDA10.2\n6. CUDNN version: cudnn8.2.2.26. Note that dev(.h file) and runtime(.so file) should be downloaded.\n7. tensorRT version：tensorRT-8.0.1.6-cuda10.2\n8. 
protobuf version（for onnx parser）：protobufv3.11.4\n    - if other version, refer to the ........\n    - link for download: https:\u002F\u002Fgithub.com\u002Fprotocolbuffers\u002Fprotobuf\u002Ftree\u002Fv3.11.4\n    - download, compile and replace the path in Makefile\u002FCMakeLists.txt with new path to protobuf3.11.4\n  - CMake:\n    - `mkdir build && cd build`\n    - `cmake ..`\n    - `make yolo -j8`\n  - Makefile:\n    - `make yolo -j8`\n  \n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Linux: Compile for Python\u003C\u002Fsummary>\n\n- compile and install\n    - Makefile：\n        - set `use_python := true` in Makefile\n    - CMakeLists.txt:\n        - `set(HAS_PYTHON ON)` in CMakeLists.txt\n    - Type in `make pyinstall -j8`\n    - Complied files are in `python\u002Fpytrt\u002Flibpytrtc.so`\n\n\u003C\u002Fdetails>\n  \n\u003Cdetails>\n\u003Csummary>Windows\u003C\u002Fsummary>\n\n  \n1. Please check the [lean\u002FREADME.md](lean\u002FREADME.md) for the detailed dependency\n2. In TensorRT.vcxproj, replace the `\u003CImport Project=\"$(VCTargetsPath)\\BuildCustomizations\\CUDA 10.0.props\" \u002F>` with your own CUDA path\n3. In TensorRT.vcxproj, replace the `\u003CImport Project=\"$(VCTargetsPath)\\BuildCustomizations\\CUDA 10.0.targets\" \u002F>` with your own CUDA path\n4. In TensorRT.vcxproj, replace the `\u003CCodeGeneration>compute_61,sm_61\u003C\u002FCodeGeneration>` with your compute capability.\n    - refer to the table in https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-gpus#compute\n  \n5. Configure your dependency or download it to the foler \u002Flean. Configure VC++ dir (include dir and refence)\n\n6. Configure your env, debug->environment\n7. Compile and run the example, where 3 options are available.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Windows: Compile for Python\u003C\u002Fsummary>\n\n  \n1. Compile pytrtc.pyd. Choose python in visual studio to compile\n2. Copy dll and execute 'python\u002Fcopy_dll_to_pytrt.bat'\n3. Execute the example in python dir by 'python test_yolov5.py'\n  - if installation is needed, switch to target env(e.g. your conda env) then 'python setup.py install', which has to be followed by step 1 and step 2.\n  - the compiled files are in `python\u002Fpytrt\u002Flibpytrtc.pyd`\n\n\u003C\u002Fdetails>\n  \n  \n\u003Cdetails>\n\u003Csummary>Other Protobuf Version\u003C\u002Fsummary>\n  \n- in onnx\u002Fmake_pb.sh, replace the path `protoc=\u002Fdata\u002Fsxai\u002Flean\u002Fprotobuf3.11.4\u002Fbin\u002Fprotoc` in protoc with the protoc of your own version\n\n```bash\n#cd the path in terminal to \u002Fonnx\ncd onnx\n\n#execuete the command to make pb files\nbash make_pb.sh\n```\n  \n- CMake:\n    - replace the `set(PROTOBUF_DIR \"\u002Fdata\u002Fsxai\u002Flean\u002Fprotobuf3.11.4\")` in CMakeLists.txt with the same path of your protoc.\n\n```bash\nmkdir build && cd build\ncmake ..\nmake yolo -j64\n```\n- Makefile:\n    - replace the path `lean_protobuf  := \u002Fdata\u002Fsxai\u002Flean\u002Fprotobuf3.11.4` in Makefile with the same path of protoc\n\n```bash\nmake yolo -j64\n```\n\n\u003C\u002Fdetails>\n  \n\n\u003Cdetails>\n\u003Csummary>TensorRT 7.x support\u003C\u002Fsummary>\n\n- The default is tensorRT8.x\n1. Replace onnx_parser_for_7.x\u002Fonnx_parser to src\u002FtensorRT\u002Fonnx_parser\n    - `bash onnx_parser\u002Fuse_tensorrt_7.x.sh`\n2. Configure Makefile\u002FCMakeLists.txt path to TensorRT7.x\n3. 
Execute `make yolo -j64`\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>TensorRT 8.x support\u003C\u002Fsummary>\n\n- The default is tensorRT8.x\n1. Replace onnx_parser_for_8.x\u002Fonnx_parser to src\u002FtensorRT\u002Fonnx_parser\n    - `bash onnx_parser\u002Fuse_tensorrt_8.x.sh`\n2. Configure Makefile\u002FCMakeLists.txt path to TensorRT8.x\n3. Execute `make yolo -j64`\n\n\u003C\u002Fdetails>\n  \n  \n## Guide for Different Tasks\u002FModel Support\n\u003Cdetails>\n\u003Csummary>YoloV5 Support\u003C\u002Fsummary>\n  \n- if pytorch >= 1.7, and the model is 5.0+, the model is suppored by the framework \n- if pytorch \u003C 1.7 or yolov5(2.0, 3.0 or 4.0), minor modification should be done in opset.\n- if you want to achieve the inference with lower pytorch, dynamic batchsize and other advanced setting, please check our [blog](http:\u002F\u002Fzifuture.com:8090) (now in Chinese) and scan the QRcode via Wechat to join us.\n\n\n1. Download yolov5\n\n```bash\ngit clone git@github.com:ultralytics\u002Fyolov5.git\n```\n\n2. Modify the code for dynamic batchsize\n```python\n# line 55 forward function in yolov5\u002Fmodels\u002Fyolo.py \n# bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)\n# x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n# modified into:\n\nbs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)\nbs = -1\nny = int(ny)\nnx = int(nx)\nx[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n\n# line 70 in yolov5\u002Fmodels\u002Fyolo.py\n#  z.append(y.view(bs, -1, self.no))\n# modified into：\nz.append(y.view(bs, self.na * ny * nx, self.no))\n\n############# for yolov5-6.0 #####################\n# line 65 in yolov5\u002Fmodels\u002Fyolo.py\n# if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:\n#    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)\n# modified into:\nif self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:\n    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)\n\n# disconnect for pytorch trace\nanchor_grid = (self.anchors[i].clone() * self.stride[i]).view(1, -1, 1, 1, 2)\n\n# line 70 in yolov5\u002Fmodels\u002Fyolo.py\n# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh\n# modified into:\ny[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh\n\n# line 73 in yolov5\u002Fmodels\u002Fyolo.py\n# wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh\n# modified into:\nwh = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh\n############# for yolov5-6.0 #####################\n\n\n# line 52 in yolov5\u002Fexport.py\n# torch.onnx.export(dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # shape(1,3,640,640)\n#                                'output': {0: 'batch', 1: 'anchors'}  # shape(1,25200,85)  修改为\n# modified into:\ntorch.onnx.export(dynamic_axes={'images': {0: 'batch'},  # shape(1,3,640,640)\n                                'output': {0: 'batch'}  # shape(1,25200,85) \n```\n3. Export to onnx model\n```bash\ncd yolov5\npython export.py --weights=yolov5s.pt --dynamic --include=onnx --opset=11\n```\n4. Copy the model and execute it\n```bash\ncp yolov5\u002Fyolov5s.onnx tensorRT_cpp\u002Fworkspace\u002F\ncd tensorRT_cpp\nmake yolo -j32\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>YoloV7 Support\u003C\u002Fsummary>\n1. 
Download yolov7 and pth\n\n```bash\n# from cdn\n# or wget https:\u002F\u002Fgithub.com\u002FWongKinYiu\u002Fyolov7\u002Freleases\u002Fdownload\u002Fv0.1\u002Fyolov7.pt\n\nwget https:\u002F\u002Fcdn.githubjs.cf\u002FWongKinYiu\u002Fyolov7\u002Freleases\u002Fdownload\u002Fv0.1\u002Fyolov7.pt\ngit clone git@github.com:WongKinYiu\u002Fyolov7.git\n```\n\n2. Modify the code for dynamic batchsize\n```python\n# line 45 forward function in yolov7\u002Fmodels\u002Fyolo.py \n# bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)\n# x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n# modified into:\n\nbs, _, ny, nx = map(int, x[i].shape)  # x(bs,255,20,20) to x(bs,3,20,20,85)\nbs = -1\nx[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n\n# line 52 in yolov7\u002Fmodels\u002Fyolo.py\n# y = x[i].sigmoid()\n# y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy\n# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh\n# z.append(y.view(bs, -1, self.no))\n# modified into：\ny = x[i].sigmoid()\nxy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy\nwh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, -1, 1, 1, 2)  # wh\nclassif = y[..., 4:]\ny = torch.cat([xy, wh, classif], dim=-1)\nz.append(y.view(bs, self.na * ny * nx, self.no))\n\n# line 57 in yolov7\u002Fmodels\u002Fyolo.py\n# return x if self.training else (torch.cat(z, 1), x)\n# modified into:\nreturn x if self.training else torch.cat(z, 1)\n\n\n# line 52 in yolov7\u002Fmodels\u002Fexport.py\n# output_names=['classes', 'boxes'] if y is None else ['output'],\n# dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # size(1,3,640,640)\n#               'output': {0: 'batch', 2: 'y', 3: 'x'}} if opt.dynamic else None)\n# modified into:\noutput_names=['classes', 'boxes'] if y is None else ['output'],\ndynamic_axes={'images': {0: 'batch'},  # size(1,3,640,640)\n              'output': {0: 'batch'}} if opt.dynamic else None)\n\n```\n3. Export to onnx model\n```bash\ncd yolov7\npython models\u002Fexport.py --dynamic --grid --weight=yolov7.pt\n```\n4. Copy the model and execute it\n```bash\ncp yolov7\u002Fyolov7.onnx tensorRT_cpp\u002Fworkspace\u002F\ncd tensorRT_cpp\nmake yolo -j32\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>YoloX Support\u003C\u002Fsummary>\n  \n- download from: https:\u002F\u002Fgithub.com\u002FMegvii-BaseDetection\u002FYOLOX\n- If you don't want to export onnx by yourself, just make run in the repo of Megavii\n\n1. Download YoloX\n```bash\ngit clone git@github.com:Megvii-BaseDetection\u002FYOLOX.git\ncd YOLOX\n```\n\n2. Modify the code\nThe modification ensures a successful int8 compilation and inference, otherwise `Missing scale and zero-point for tensor (Unnamed Layer* 686)` will be raised.\n  \n```Python\n# line 206 forward fuction in yolox\u002Fmodels\u002Fyolo_head.py. Replace the commented code with the uncommented code\n# self.hw = [x.shape[-2:] for x in outputs] \nself.hw = [list(map(int, x.shape[-2:])) for x in outputs]\n\n\n# line 208 forward function in yolox\u002Fmodels\u002Fyolo_head.py. 
Replace the commented code with the uncommented code\n# [batch, n_anchors_all, 85]\n# outputs = torch.cat(\n#     [x.flatten(start_dim=2) for x in outputs], dim=2\n# ).permute(0, 2, 1)\nproc_view = lambda x: x.view(-1, int(x.size(1)), int(x.size(2) * x.size(3)))\noutputs = torch.cat(\n    [proc_view(x) for x in outputs], dim=2\n).permute(0, 2, 1)\n\n\n# line 253 decode_output function in yolox\u002Fmodels\u002Fyolo_head.py Replace the commented code with the uncommented code\n#outputs[..., :2] = (outputs[..., :2] + grids) * strides\n#outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides\n#return outputs\nxy = (outputs[..., :2] + grids) * strides\nwh = torch.exp(outputs[..., 2:4]) * strides\nreturn torch.cat((xy, wh, outputs[..., 4:]), dim=-1)\n\n# line 77 in tools\u002Fexport_onnx.py\nmodel.head.decode_in_inference = True\n```\n\n \n3. Export to onnx\n```bash\n\n# download model\nwget https:\u002F\u002Fgithub.com\u002FMegvii-BaseDetection\u002FYOLOX\u002Freleases\u002Fdownload\u002F0.1.1rc0\u002Fyolox_m.pth\n\n# export\nexport PYTHONPATH=$PYTHONPATH:.\npython tools\u002Fexport_onnx.py -c yolox_m.pth -f exps\u002Fdefault\u002Fyolox_m.py --output-name=yolox_m.onnx --dynamic --no-onnxsim\n```\n\n4. Execute the command\n```bash\ncp YOLOX\u002Fyolox_m.onnx tensorRT_cpp\u002Fworkspace\u002F\ncd tensorRT_cpp\nmake yolo -j32\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>YoloV3 Support\u003C\u002Fsummary>\n  \n- if pytorch >= 1.7, and the model is 5.0+, the model is suppored by the framework \n- if pytorch \u003C 1.7 or yolov3, minor modification should be done in opset.\n- if you want to achieve the inference with lower pytorch, dynamic batchsize and other advanced setting, please check our [blog](http:\u002F\u002Fzifuture.com:8090) (now in Chinese) and scan the QRcode via Wechat to join us.\n\n\n1. Download yolov3\n\n```bash\ngit clone git@github.com:ultralytics\u002Fyolov3.git\n```\n\n2. 
Modify the code for dynamic batchsize\n```python\n# line 55 forward function in yolov3\u002Fmodels\u002Fyolo.py \n# bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)\n# x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n# modified into:\n\nbs, _, ny, nx = map(int, x[i].shape)  # x(bs,255,20,20) to x(bs,3,20,20,85)\nbs = -1\nx[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n\n\n# line 70 in yolov3\u002Fmodels\u002Fyolo.py\n#  z.append(y.view(bs, -1, self.no))\n# modified into：\nz.append(y.view(bs, self.na * ny * nx, self.no))\n\n# line 62 in yolov3\u002Fmodels\u002Fyolo.py\n# if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:\n#    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)\n# modified into:\nif self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:\n    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)\nanchor_grid = (self.anchors[i].clone() * self.stride[i]).view(1, -1, 1, 1, 2)\n\n# line 70 in yolov3\u002Fmodels\u002Fyolo.py\n# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh\n# modified into:\ny[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh\n\n# line 73 in yolov3\u002Fmodels\u002Fyolo.py\n# wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh\n# modified into:\nwh = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh\n\n\n# line 52 in yolov3\u002Fexport.py\n# torch.onnx.export(dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # shape(1,3,640,640)\n#                                'output': {0: 'batch', 1: 'anchors'}  # shape(1,25200,85) \n# modified into:\ntorch.onnx.export(dynamic_axes={'images': {0: 'batch'},  # shape(1,3,640,640)\n                                'output': {0: 'batch'}  # shape(1,25200,85) \n```\n3. Export to onnx model\n```bash\ncd yolov3\npython export.py --weights=yolov3.pt --dynamic --include=onnx --opset=11\n```\n4. Copy the model and execute it\n```bash\ncp yolov3\u002Fyolov3.onnx tensorRT_cpp\u002Fworkspace\u002F\ncd tensorRT_cpp\n\n# change src\u002Fapplication\u002Fapp_yolo.cpp: main\n# test(Yolo::Type::V3, TRT::Mode::FP32, \"yolov3\");\n\nmake yolo -j32\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>UNet Support\u003C\u002Fsummary>\n  \n- reference to : https:\u002F\u002Fgithub.com\u002Fshouxieai\u002Funet-pytorch\n\n```\nmake dunet -j32\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>Retinaface Support\u003C\u002Fsummary>\n\n- https:\u002F\u002Fgithub.com\u002Fbiubug6\u002FPytorch_Retinaface\n\n1. Download Pytorch_Retinaface Repo\n\n```bash\ngit clone git@github.com:biubug6\u002FPytorch_Retinaface.git\ncd Pytorch_Retinaface\n```\n\n2. Download model from the Training of README.md in https:\u002F\u002Fgithub.com\u002Fbiubug6\u002FPytorch_Retinaface#training .Then unzip it to the \u002Fweights . Here, we use mobilenet0.25_Final.pth\n\n3. 
Modify the code\n\n```python\n# line 24 in models\u002Fretinaface.py\n# return out.view(out.shape[0], -1, 2) is modified into \nreturn out.view(-1, int(out.size(1) * out.size(2) * 2), 2)\n\n# line 35 in models\u002Fretinaface.py\n# return out.view(out.shape[0], -1, 4) is modified into\nreturn out.view(-1, int(out.size(1) * out.size(2) * 2), 4)\n\n# line 46 in models\u002Fretinaface.py\n# return out.view(out.shape[0], -1, 10) is modified into\nreturn out.view(-1, int(out.size(1) * out.size(2) * 2), 10)\n\n# The following modification ensures the output of resize node is based on scale rather than shape such that dynamic batch can be achieved.\n# line 89 in models\u002Fnet.py\n# up3 = F.interpolate(output3, size=[output2.size(2), output2.size(3)], mode=\"nearest\") is modified into\nup3 = F.interpolate(output3, scale_factor=2, mode=\"nearest\")\n\n# line 93 in models\u002Fnet.py\n# up2 = F.interpolate(output2, size=[output1.size(2), output1.size(3)], mode=\"nearest\") is modified into\nup2 = F.interpolate(output2, scale_factor=2, mode=\"nearest\")\n\n# The following code removes softmax (bug sometimes happens). At the same time, concatenate the output to simplify the decoding.\n# line 123 in models\u002Fretinaface.py\n# if self.phase == 'train':\n#     output = (bbox_regressions, classifications, ldm_regressions)\n# else:\n#     output = (bbox_regressions, F.softmax(classifications, dim=-1), ldm_regressions)\n# return output\n# the above is modified into:\noutput = (bbox_regressions, classifications, ldm_regressions)\nreturn torch.cat(output, dim=-1)\n\n# set 'opset_version=11' to ensure a successful export\n# torch_out = torch.onnx._export(net, inputs, output_onnx, export_params=True, verbose=False,\n#     input_names=input_names, output_names=output_names)\n# is modified into:\ntorch_out = torch.onnx._export(net, inputs, output_onnx, export_params=True, verbose=False, opset_version=11,\n    input_names=input_names, output_names=output_names)\n\n\n\n\n```\n4. Export to onnx\n```bash\npython convert_to_onnx.py\n```\n\n5. Execute\n```bash\ncp FaceDetector.onnx ..\u002FtensorRT_cpp\u002Fworkspace\u002Fmb_retinaface.onnx\ncd ..\u002FtensorRT_cpp\nmake retinaface -j64\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>DBFace Support\u003C\u002Fsummary>\n\n- https:\u002F\u002Fgithub.com\u002Fdlunion\u002FDBFace\n\n```bash\nmake dbface -j64\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Scrfd Support\u003C\u002Fsummary>\n\n- https:\u002F\u002Fgithub.com\u002Fdeepinsight\u002Finsightface\u002Ftree\u002Fmaster\u002Fdetection\u002Fscrfd\n- The know-how about exporting to onnx is comming. Before it is released, come and join us to disucss. 
\n\n\u003C\u002Fdetails>\n\n\n\n\u003Cdetails>\n\u003Csummary>Arcface Support\u003C\u002Fsummary>\n\n- https:\u002F\u002Fgithub.com\u002Fdeepinsight\u002Finsightface\u002Ftree\u002Fmaster\u002Frecognition\u002Farcface_torch\n```C++\nauto arcface = Arcface::create_infer(\"arcface_iresnet50.fp32.trtmodel\", 0);\nauto feature = arcface->commit(make_tuple(face, landmarks)).get();\ncout \u003C\u003C feature \u003C\u003C endl;  \u002F\u002F 1x512\n```\n- In the example of Face Recognition, `workspace\u002Fface\u002Flibrary` is the set of registered faces.\n- `workspace\u002Fface\u002Frecognize` is the set of faces to be recognized.\n- the results are saved in `workspace\u002Fface\u002Fresult` and `workspace\u002Fface\u002Flibrary_draw`\n\n\u003C\u002Fdetails>\n  \n\u003Cdetails>\n\u003Csummary>CenterNet Support\u003C\u002Fsummary>\n  \ncheck the great details in tutorial\u002F2.0\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>Bert Support (Chinese Classification)\u003C\u002Fsummary>\n\n- https:\u002F\u002Fgithub.com\u002F649453932\u002FBert-Chinese-Text-Classification-Pytorch\n- `make bert -j6`  \n\n\u003C\u002Fdetails>\n\n\n## Intro to the Interface\n\n\u003Cdetails>\n\u003Csummary>Python Interface: Get onnx and trtmodel from a pytorch model more easily\u003C\u002Fsummary>\n\n- Just one line of code to export onnx and trtmodel, and save them for future use.\n```python\nimport pytrt\n\nmodel = models.resnet18(True).eval()\npytrt.from_torch(\n    model, \n    dummy_input, \n    max_batch_size=16, \n    onnx_save_file=\"test.onnx\", \n    engine_save_file=\"engine.trtmodel\"\n)\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Python Interface: TensorRT Inference\u003C\u002Fsummary>\n\n- YoloX TensorRT Inference\n```python\nimport pytrt as tp\n\nyolo   = tp.Yolo(engine_file, type=tp.YoloType.X)   # engine_file is the trtmodel file\nimage  = cv2.imread(\"inference\u002Fcar.jpg\")\nbboxes = yolo.commit(image).get()\n```\n\n- Seamless Inference from Pytorch to TensorRT\n```python\nimport pytrt as tp\n\nmodel     = models.resnet18(True).eval().to(device) # pt model\ntrt_model = tp.from_torch(model, input)\ntrt_out   = trt_model(input)\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>C++ Interface: YoloX Inference\u003C\u002Fsummary>\n\n```C++\n\n\u002F\u002F create infer engine on gpu 0\nauto engine = Yolo::create_infer(\"yolox_m.fp32.trtmodel\", Yolo::Type::X, 0);\n\n\u002F\u002F load image\nauto image = cv::imread(\"1.jpg\");\n\n\u002F\u002F do inference and get the result\nauto box = engine->commit(image).get();\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>C++ Interface: Compile Model in FP32\u002FFP16\u003C\u002Fsummary>\n\n```cpp\nTRT::compile(\n  TRT::Mode::FP32,   \u002F\u002F compile model in fp32\n  3,                          \u002F\u002F max batch size\n  \"plugin.onnx\",              \u002F\u002F onnx file\n  \"plugin.fp32.trtmodel\",     \u002F\u002F save path\n  {}                         \u002F\u002F  redefine the shape of input when needed\n);\n```\n- For fp32 compilation, all you need to offer is an onnx file whose input shape is allowed to be redefined.\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>C++ Interface: Compile in int8\u003C\u002Fsummary>\n\n- The int8 inference performs slightly worse than fp32 in precision (about a 5% drop), but is stunningly faster. 
In the framework, we offer int8 inference\n\n```cpp\n\u002F\u002F define int8 calibration function to read data and handle it to tenor.\nauto int8process = [](int current, int count, vector\u003Cstring>& images, shared_ptr\u003CTRT::Tensor>& tensor){\n    for(int i = 0; i \u003C images.size(); ++i){\n    \u002F\u002F int8 compilation requires calibration. We read image data and set_norm_mat. Then the data will be transfered into the tensor.\n        auto image = cv::imread(images[i]);\n        cv::resize(image, image, cv::Size(640, 640));\n        float mean[] = {0, 0, 0};\n        float std[]  = {1, 1, 1};\n        tensor->set_norm_mat(i, image, mean, std);\n    }\n};\n\n\n\u002F\u002F Specify TRT::Mode as INT8\nauto model_file = \"yolov5m.int8.trtmodel\";\nTRT::compile(\n  TRT::Mode::INT8,            \u002F\u002F INT8\n  3,                          \u002F\u002F max batch size\n  \"yolov5m.onnx\",             \u002F\u002F onnx\n  model_file,                 \u002F\u002F saved filename\n  {},                         \u002F\u002F redefine the input shape\n  int8process,                \u002F\u002F the recall function for calibration\n  \".\",                        \u002F\u002F the dir where the image data is used for calibration\n  \"\"                          \u002F\u002F the dir where the data generated from calibration is saved(a.k.a where to load the calibration data.)\n);\n```\n- We integrate into only one int8process function to save otherwise a lot of issues that might happen in tensorRT official implementation. \n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>C++ Interface：Inference\u003C\u002Fsummary>\n\n- We introduce class Tensor for easier inference and data transfer between host to device. So that as a user, the details wouldn't be annoying.\n\n- class Engine is another facilitator.\n\n```cpp\n\u002F\u002F load model and get a shared_ptr. get nullptr if fail to load.\nauto engine = TRT::load_infer(\"yolov5m.fp32.trtmodel\");\n\n\u002F\u002F print model info\nengine->print();\n\n\u002F\u002F load image\nauto image = imread(\"demo.jpg\");\n\n\u002F\u002F get the model input and output node, which can be accessed by name or index\nauto input = engine->input(0);   \u002F\u002F or auto input = engine->input(\"images\");\nauto output = engine->output(0); \u002F\u002F or auto output = engine->output(\"output\");\n\n\u002F\u002F put the image into input tensor by calling set_norm_mat()\nfloat mean[] = {0, 0, 0};\nfloat std[]  = {1, 1, 1};\ninput->set_norm_mat(i, image, mean, std);\n\n\u002F\u002F do the inference. Here sync(true) or async(false) is optional\nengine->forward(); \u002F\u002F engine->forward(true or false)\n\n\u002F\u002F get the outut_ptr, which can used to access the output\nfloat* output_ptr = output->cpu\u003Cfloat>();\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>C++ Interface：Plugin\u003C\u002Fsummary>\n\n- You only need to define kernel function and inference process. The details of code(e.g the serialization, deserialization and injection of plugin etc) are under the hood.\n- Easy to implement a new plugin in FP32 and FP16. Refer to HSwish.cu for details.\n```cpp\ntemplate\u003C>\n__global__ void HSwishKernel(float* input, float* output, int edge) {\n\n    KernelPositionBlock;\n    float x = input[position];\n    float a = x + 3;\n    a = a \u003C 0 ? 0 : (a >= 6 ? 
6 : a);\n    output[position] = x * a \u002F 6;\n}\n\nint HSwish::enqueue(const std::vector\u003CGTensor>& inputs, std::vector\u003CGTensor>& outputs, const std::vector\u003CGTensor>& weights, void* workspace, cudaStream_t stream) {\n\n    int count = inputs[0].count();\n    auto grid = CUDATools::grid_dims(count);\n    auto block = CUDATools::block_dims(count);\n    HSwishKernel \u003C\u003C\u003Cgrid, block, 0, stream >>> (inputs[0].ptr\u003Cfloat>(), outputs[0].ptr\u003Cfloat>(), count);\n    return 0;\n}\n\n\nRegisterPlugin(HSwish);\n```\n\n\u003C\u002Fdetails>\n\n\n## About Us\n- Our blog：http:\u002F\u002Fwww.zifuture.com\u002F                        (Now only in Chinese. English is comming)\n- Our video channel： https:\u002F\u002Fspace.bilibili.com\u002F1413433465 (Now only in Chinese. English is comming)\n\n\n\n\n\n\n\n\n\n\n","*在其他语言中阅读此内容：[英语](README.md), [简体中文](tutorial\u002FREADME.zh-cn.md).*\n\n## 新闻：\n- 🔥 发布了一个简单的实现：https:\u002F\u002Fgithub.com\u002Fshouxieai\u002Finfer\n- 🔥 增加了对 YOLOv7 的支持。\n- 🔥 发布了集成 TensorRT 的硬件解码 Python 解决方案。\n- 🔥 Docker 镜像已发布：https:\u002F\u002Fhub.docker.com\u002Fr\u002Fhopef\u002Ftensorrt-pro\n- ⚡ 也提供了 tensorRT_Pro_comments_version（协作贡献版本），以获得更好的学习体验。仓库：https:\u002F\u002Fgithub.com\u002FGuanbin-Huang\u002FtensorRT_Pro_comments\n- 🔥 [发布了简单的 YOLOv5\u002FYOLOX 实现，简单易用。](example-simple_yolo)\n- 🔥 支持 YOLOv5 1.0-6.0\u002Fmaster 版本。\n- 教程笔记本下载：\n  - [WarpAffine.lesson.tar.gz](http:\u002F\u002Fzifuture.com:1000\u002Ffs\u002F25.shared\u002Fwarpaffine.lesson.tar.gz)\n  - [Offset.tar.gz](http:\u002F\u002Fzifuture.com:1000\u002Ffs\u002F25.shared\u002Foffset.tar.gz)\n- 发布了将 CenterNet 从 PyTorch 导出到 TensorRT 的教程。\n\n## 教程视频\n\n- \u003Cb>哔哩哔哩\u003C\u002Fb>：https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Xw411f7FW（目前仅提供中文版，英文版即将推出）\n- \u003Cb>幻灯片\u003C\u002Fb>：http:\u002F\u002Fzifuture.com:1556\u002Ffs\u002Fsxai\u002FtensorRT.pptx（目前仅提供中文版，英文版即将推出）\n- \u003Cb>教程文件夹\u003C\u002Fb>：为初学者提供了一个很好的入门介绍，帮助他们大致了解我们的框架。（中文\u002F英文）\n\n## 开箱即用的基于 TensorRT 的高性能推理框架，支持 C++\u002FPython\n\n- C++ 接口：只需 3 行代码即可运行 YOLOX\n\n  ```C++\n  \u002F\u002F 在 GPU-0 上创建推理引擎\n  \u002F\u002Fauto engine = Yolo::create_infer(\"yolov5m.fp32.trtmodel\", Yolo::Type::V5, 0);\n  auto engine = Yolo::create_infer(\"yolox_m.fp32.trtmodel\", Yolo::Type::X, 0);\n  \n  \u002F\u002F 加载图像\n  auto image = cv::imread(\"1.jpg\");\n  \n  \u002F\u002F 进行推理并获取结果\n  auto box = engine->commit(image).get();  \u002F\u002F 返回 Box 向量\n  ```\n\n- Python 接口：\n  ```python\n  import pytrt\n  \n  model     = models.resnet18(True).eval().to(device)\n  trt_model = tp.from_torch(model, input)\n  trt_out   = trt_model(input)\n  ```\n  \n  - 简单的 Python YOLO 示例：\n  ```python\n  import os\n  import cv2\n  import numpy as np\n  import pytrt as tp\n\n  engine_file = \"yolov5s.fp32.trtmodel\"\n  if not os.path.exists(engine_file):\n      tp.compile_onnx_to_file(1, tp.onnx_hub(\"yolov5s\"), engine_file)\n\n  yolo   = tp.Yolo(engine_file, type=tp.YoloType.V5)\n  image  = cv2.imread(\"car.jpg\")\n  bboxes = yolo.commit(image).get()\n  print(f\"{len(bboxes)} objects\")\n\n  for box in bboxes:\n      left, top, right, bottom = map(int, [box.left, box.top, box.right, box.bottom])\n      cv2.rectangle(image, (left, top), (right, bottom), tp.random_color(box.class_label), 5)\n\n  saveto = \"yolov5.car.jpg\"\n  print(f\"Save to {saveto}\")\n\n  cv2.imwrite(saveto, image)\n  cv2.imshow(\"result\", image)\n  cv2.waitKey()\n  ```\n  \n## 简介\n\n1. 提供面向 C++\u002FPython 的高级接口。\n2. 简化自定义插件的实现，并封装了序列化和反序列化过程，使使用更加便捷。\n3. 
简化 fp32、fp16 和 int8 模型的编译流程，便于在服务器或嵌入式设备上使用 C++\u002FPython 进行部署。\n4. 提供可直接使用的模型及示例，包括 RetinaFace、SCRFD、YOLOv5、YOLOX、ArcFace、AlphaPose、CenterNet 和 DeepSORT(C++)。\n\n## YOLOX与YOLOv5系列模型测试报告\n\n\u003Cdetails>\n\u003Csummary>app_yolo.cpp速度测试\u003C\u002Fsummary>\n  \n1. 分辨率（YOLOv5P5、YOLOX）= (640×640)，(YOLOv5P6) = (1280×1280)\n2. 最大批处理大小 = 16\n3. 预处理 + 推理 + 后处理\n4. CUDA 10.2，cuDNN 8.2.2.26，TensorRT 8.0.1.6\n5. RTX 2080 Ti\n6. 测试次数：取100次结果的平均值，但排除首次预热运行\n7. 测试日志：[workspace\u002Fperf.result.std.log](workspace\u002Fperf.result.std.log)\n8. 测试代码：[src\u002Fapplication\u002Fapp_yolo.cpp](src\u002Fapplication\u002Fapp_yolo.cpp)\n9. 测试图像：位于workspace\u002Finference目录下的6张图片\n    - 分别为810×1080、500×806、1024×684、550×676、1280×720、800×533分辨率\n10. 测试方法：加载6张图片，对这6张图片进行推理，重复100次。注意每张图片都需要进行预处理和后处理。\n\n---\n\n| 模型    | 分辨率 | 类型      | 精度   | 耗时(ms) | FPS    |\n| -------- | -------- | --------- | ------- | ---------- | ------ |\n| yolox_x  | 640×640  | YOLOX     | FP32    | 21.879     | 45.71  |\n| yolox_l  | 640×640  | YOLOX     | FP32    | 12.308     | 81.25  |\n| yolox_m  | 640×640  | YOLOX     | FP32    | 6.862      | 145.72 |\n| yolox_s  | 640×640  | YOLOX     | FP32    | 3.088      | 323.81 |\n| yolox_x  | 640×640  | YOLOX     | FP16    | 6.763      | 147.86 |\n| yolox_l  | 640×640  | YOLOX     | FP16    | 3.933      | 254.25 |\n| yolox_m  | 640×640  | YOLOX     | FP16    | 2.515      | 397.55 |\n| yolox_s  | 640×640  | YOLOX     | FP16    | 1.362      | 734.48 |\n| yolox_x  | 640×640  | YOLOX     | INT8    | 4.070      | 245.68 |\n| yolox_l  | 640×640  | YOLOX     | INT8    | 2.444      | 409.21 |\n| yolox_m  | 640×640  | YOLOX     | INT8    | 1.730      | 577.98 |\n| yolox_s  | 640×640  | YOLOX     | INT8    | 1.060      | 943.15 |\n| yolov5x6 | 1280×1280| YOLOv5_P6 | FP32    | 68.022     | 14.70  |\n| yolov5l6 | 1280×1280| YOLOv5_P6 | FP32    | 37.931     | 26.36  |\n| yolov5m6 | 1280×1280| YOLOv5_P6 | FP32    | 20.127     | 49.69  |\n| yolov5s6 | 1280×1280| YOLOv5_P6 | FP32    | 8.715      | 114.75 |\n| yolov5x  | 640×640  | YOLOv5_P5 | FP32    | 18.480     | 54.11  |\n| yolov5l  | 640×640  | YOLOv5_P5 | FP32    | 10.110     | 98.91  |\n| yolov5m  | 640×640  | YOLOv5_P5 | FP32    | 5.639      | 177.33 |\n| yolov5s  | 640×640  | YOLOv5_P5 | FP32    | 2.578      | 387.92 |\n| yolov5x6 | 1280×1280| YOLOv5_P6 | FP16    | 20.877     | 47.90  |\n| yolov5l6 | 1280×1280| YOLOv5_P6 | FP16    | 10.960     | 91.24  |\n| yolov5m6 | 1280×1280| YOLOv5_P6 | FP16    | 7.236      | 138.20 |\n| yolov5s6 | 1280×1280| YOLOv5_P6 | FP16    | 3.851      | 259.68 |\n| yolov5x  | 640×640  | YOLOv5_P5 | FP16    | 5.933      | 168.55 |\n| yolov5l  | 640×640  | YOLOv5_P5 | FP16    | 3.450      | 289.86 |\n| yolov5m  | 640×640  | YOLOv5_P5 | FP16    | 2.184      | 457.90 |\n| yolov5s  | 640×640  | YOLOv5_P5 | FP16    | 1.307      | 765.10 |\n| yolov5x6 | 1280×1280| YOLOv5_P6 | INT8    | 12.207     | 81.92  |\n| yolov5l6 | 1280×1280| YOLOv5_P6 | INT8    | 7.221      | 138.49 |\n| yolov5m6 | 1280×1280| YOLOv5_P6 | INT8    | 5.248      | 190.55 |\n| yolov5s6 | 1280×1280| YOLOv5_P6 | INT8    | 3.149      | 317.54 |\n| yolov5x  | 640×640  | YOLOv5_P5 | INT8    | 3.704      | 269.97 |\n| yolov5l  | 640×640  | YOLOv5_P5 | INT8    | 2.255      | 443.53 |\n| yolov5m  | 640×640  | YOLOv5_P5 | INT8    | 1.674      | 597.40 |\n| yolov5s  | 640×640  | YOLOv5_P5 | INT8    | 1.143      | 874.91 |\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>app_yolo_fast.cpp速度测试。永不止步，追求更快\u003C\u002Fsummary>\n  \n- \u003Cb>亮点：\u003C\u002Fb> 
在精度无损的情况下，比上述结果快约0.5毫秒。具体来说，我们移除了Focus层及部分转置节点等，并将其改用CUDA内核函数实现，其余部分保持不变。\n- \u003Cb>测试日志：\u003C\u002Fb> [workspace\u002Fperf.result.std.log](workspace\u002Fperf.result.std.log)\n- \u003Cb>测试代码：\u003C\u002Fb> [src\u002Fapplication\u002Fapp_yolo_fast.cpp](src\u002Fapplication\u002Fapp_yolo_fast.cpp)\n- \u003Cb>提示：\u003C\u002Fb> 可以参考下载的ONNX文件进行修改。如有任何疑问，欢迎通过各种方式联系。\n- \u003Cb>结论：\u003C\u002Fb> 本工作的核心思想是优化预处理和后处理流程。若使用YOLOX或YOLOv5的小型版本，该优化可能会有所帮助。\n\n|模型|分辨率|类型|精度|耗时(ms)|FPS|\n|---|---|---|---|---|---|\n|yolox_x_fast|640×640|YOLOX|FP32|21.598 |46.30 |\n|yolox_l_fast|640×640|YOLOX|FP32|12.199 |81.97 |\n|yolox_m_fast|640×640|YOLOX|FP32|6.819 |146.65 |\n|yolox_s_fast|640×640|YOLOX|FP32|2.979 |335.73 |\n|yolox_x_fast|640×640|YOLOX|FP16|6.764 |147.84 |\n|yolox_l_fast|640×640|YOLOX|FP16|3.866 |258.64 |\n|yolox_m_fast|640×640|YOLOX|FP16|2.386 |419.16 |\n|yolox_s_fast|640×640|YOLOX|FP16|1.259 |794.36 |\n|yolox_x_fast|640×640|YOLOX|INT8|3.918 |255.26 |\n|yolox_l_fast|640×640|YOLOX|INT8|2.292 |436.38 |\n|yolox_m_fast|640×640|YOLOX|INT8|1.589 |629.49 |\n|yolox_s_fast|640×640|YOLOX|INT8|0.954 |1048.47 |\n|yolov5x6_fast|1280×1280|YOLOv5_P6|FP32|67.075 |14.91 |\n|yolov5l6_fast|1280×1280|YOLOv5_P6|FP32|37.491 |26.67 |\n|yolov5m6_fast|1280×1280|YOLOv5_P6|FP32|19.422 |51.49 |\n|yolov5s6_fast|1280×1280|YOLOv5_P6|FP32|7.900 |126.57 |\n|yolov5x_fast|640×640|YOLOv5_P5|FP32|18.554 |53.90 |\n|yolov5l_fast|640×640|YOLOv5_P5|FP32|10.060 |99.41 |\n|yolov5m_fast|640×640|YOLOv5_P5|FP32|5.500 |181.82 |\n|yolov5s_fast|640×640|YOLOv5_P5|FP32|2.342 |427.07 |\n|yolov5x6_fast|1280×1280|YOLOv5_P6|FP16|20.538 |48.69 |\n|yolov5l6_fast|1280×1280|YOLOv5_P6|FP16|10.404 |96.12 |\n|yolov5m6_fast|1280×1280|YOLOv5_P6|FP16|6.577 |152.06 |\n|yolov5s6_fast|1280×1280|YOLOv5_P6|FP16|3.087 |323.99 |\n|yolov5x_fast|640×640|YOLOv5_P5|FP16|5.919 |168.95 |\n|yolov5l_fast|640×640|YOLOv5_P5|FP16|3.348 |298.69 |\n|yolov5m_fast|640×640|YOLOv5_P5|FP16|2.015 |496.34 |\n|yolov5s_fast|640×640|YOLOv5_P5|FP16|1.087 |919.63 |\n|yolov5x6_fast|1280×1280|YOLOv5_P6|INT8|11.236 |89.00 |\n|yolov5l6_fast|1280×1280|YOLOv5_P6|INT8|6.235 |160.38 |\n|yolov5m6_fast|1280×1280|YOLOv5_P6|INT8|4.311 |231.97 |\n|yolov5s6_fast|1280×1280|YOLOv5_P6|INT8|2.139 |467.45 |\n|yolov5x_fast|640×640|YOLOv5_P5|INT8|3.456 |289.37 |\n|yolov5l_fast|640×640|YOLOv5_P5|INT8|2.019 |495.41 |\n|yolov5m_fast|640×640|YOLOv5_P5|INT8|1.425 |701.71 |\n|yolov5s_fast|640×640|YOLOv5_P5|INT8|0.844 |1185.47 |\n\n\u003C\u002Fdetails>\n\n## 设置与配置\n\u003Cdetails>\n\u003Csummary>Linux\u003C\u002Fsummary>\n  \n  \n1. VSCode（强烈推荐！）\n2. 配置 cuDNN、CUDA、TensorRT 8.0 和 Protocol Buffers 的路径。\n3. 在 Makefile 或 CMakeLists.txt 中配置与你的 NVIDIA 显卡匹配的计算能力：\n    - 例如：`-gencode=arch=compute_75,code=sm_75`。如果你使用的是 3080Ti，则应为 `gencode=arch=compute_86,code=sm_86`。\n    - GPU 计算能力参考表：\n  https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-gpus#compute\n4. 在 .vscode\u002Fc_cpp_properties.json 中配置库路径。\n5. CUDA 版本：CUDA 10.2\n6. cuDNN 版本：cudnn 8.2.2.26。注意需要同时下载开发文件（.h 文件）和运行时文件（.so 文件）。\n7. TensorRT 版本：tensorRT-8.0.1.6-cuda10.2\n8. 
Protocol Buffers 版本（用于 ONNX 解析器）：protobuf v3.11.4\n    - 如果使用其他版本，请参考……\n    - 下载链接：https:\u002F\u002Fgithub.com\u002Fprotocolbuffers\u002Fprotobuf\u002Ftree\u002Fv3.11.4\n    - 下载后编译，并将 Makefile\u002FCMakeLists.txt 中的路径替换为新的 protobuf 3.11.4 路径。\n  - CMake：\n    - `mkdir build && cd build`\n    - `cmake ..`\n    - `make yolo -j8`\n  - Makefile：\n    - `make yolo -j8`\n  \n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Linux：为 Python 编译\u003C\u002Fsummary>\n\n- 编译并安装\n    - Makefile：\n        - 在 Makefile 中设置 `use_python := true`\n    - CMakeLists.txt：\n        - 在 CMakeLists.txt 中设置 `set(HAS_PYTHON ON)`\n    - 输入 `make pyinstall -j8`\n    - 编译后的文件位于 `python\u002Fpytrt\u002Flibpytrtc.so`\n\n\u003C\u002Fdetails>\n  \n\u003Cdetails>\n\u003Csummary>Windows\u003C\u002Fsummary>\n\n  \n1. 请查看 [lean\u002FREADME.md](lean\u002FREADME.md) 以获取详细的依赖项信息。\n2. 在 TensorRT.vcxproj 中，将 `\u003CImport Project=\"$(VCTargetsPath)\\BuildCustomizations\\CUDA 10.0.props\" \u002F>` 替换为你自己的 CUDA 路径。\n3. 在 TensorRT.vcxproj 中，将 `\u003CImport Project=\"$(VCTargetsPath)\\BuildCustomizations\\CUDA 10.0.targets\" \u002F>` 替换为你自己的 CUDA 路径。\n4. 在 TensorRT.vcxproj 中，将 `\u003CCodeGeneration>compute_61,sm_61\u003C\u002FCodeGeneration>` 替换为你自己的计算能力。\n    - 参考 https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-gpus#compute 上的表格\n  \n5. 配置你的依赖项，或将它们下载到 \u002Flean 文件夹中。配置 VC++ 目录（包含目录和引用）。\n6. 配置环境变量，在“调试”->“环境”中进行设置。\n7. 编译并运行示例，有三种选项可供选择。\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Windows：为 Python 编译\u003C\u002Fsummary>\n\n  \n1. 编译 pytrtc.pyd。在 Visual Studio 中选择 Python 进行编译。\n2. 复制 dll 文件并执行 `python\u002Fcopy_dll_to_pytrt.bat`。\n3. 在 python 目录下通过 `python test_yolov5.py` 执行示例。\n  - 如果需要安装，切换到目标环境（例如你的 conda 环境），然后运行 `python setup.py install`，之后再按照步骤 1 和 2 操作。\n  - 编译后的文件位于 `python\u002Fpytrt\u002Flibpytrtc.pyd`。\n\n\u003C\u002Fdetails>\n  \n  \n\u003Cdetails>\n\u003Csummary>其他 Protocol Buffers 版本\u003C\u002Fsummary>\n  \n- 在 onnx\u002Fmake_pb.sh 中，将 protoc 的路径 `protoc=\u002Fdata\u002Fsxai\u002Flean\u002Fprotobuf3.11.4\u002Fbin\u002Fprotoc` 替换为你自己版本的 protoc。\n\n```bash\n# 在终端中进入 \u002Fonnx 路径\ncd onnx\n\n# 执行命令生成 pb 文件\nbash make_pb.sh\n```\n  \n- CMake：\n    - 将 CMakeLists.txt 中的 `set(PROTOBUF_DIR \"\u002Fdata\u002Fsxai\u002Flean\u002Fprotobuf3.11.4\")` 替换为你所用 protoc 的相同路径。\n\n```bash\nmkdir build && cd build\ncmake ..\nmake yolo -j64\n```\n- Makefile：\n    - 将 Makefile 中的 `lean_protobuf := \u002Fdata\u002Fsxai\u002Flean\u002Fprotobuf3.11.4` 替换为你所用 protoc 的相同路径。\n\n```bash\nmake yolo -j64\n```\n\n\u003C\u002Fdetails>\n  \n\n\u003Cdetails>\n\u003Csummary>TensorRT 7.x 支持\u003C\u002Fsummary>\n\n- 默认是 TensorRT 8.x\n1. 将 onnx_parser_for_7.x\u002Fonnx_parser 替换为 src\u002FtensorRT\u002Fonnx_parser\n    - `bash onnx_parser\u002Fuse_tensorrt_7.x.sh`\n2. 配置 Makefile\u002FCMakeLists.txt 中指向 TensorRT 7.x 的路径。\n3. 执行 `make yolo -j64`\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>TensorRT 8.x 支持\u003C\u002Fsummary>\n\n- 默认是 TensorRT 8.x\n1. 将 onnx_parser_for_8.x\u002Fonnx_parser 替换为 src\u002FtensorRT\u002Fonnx_parser\n    - `bash onnx_parser\u002Fuse_tensorrt_8.x.sh`\n2. 配置 Makefile\u002FCMakeLists.txt 中指向 TensorRT 8.x 的路径。\n3. 执行 `make yolo -j64`\n\n\u003C\u002Fdetails>\n  \n  \n## 不同任务\u002F模型支持指南\n\u003Cdetails>\n\u003Csummary>YoloV5 支持\u003C\u002Fsummary>\n  \n- 如果 PyTorch ≥ 1.7，且模型为 5.0+，则该框架支持此模型。\n- 如果 PyTorch \u003C 1.7 或 YOLOv5 为 2.0、3.0 或 4.0，则需要对 opset 进行小幅修改。\n- 如果你想实现低版本 PyTorch 下的推理、动态批次大小以及其他高级设置，请查看我们的 [博客](http:\u002F\u002Fzifuture.com:8090)（目前为中文），并通过微信扫描二维码加入我们。\n\n\n1. 
下载 YOLOv5\n\n```bash\ngit clone git@github.com:ultralytics\u002Fyolov5.git\n```\n\n2. 修改代码以支持动态批次大小\n```python\n# yolov5\u002Fmodels\u002Fyolo.py 中 forward 函数第 55 行\n# bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) 转为 x(bs,3,20,20,85)\n# x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n# 修改为：\n\nbs, _, ny, nx = x[i].shape  # x(bs,255,20,20) 转为 x(bs,3,20,20,85)\nbs = -1\nny = int(ny)\nnx = int(nx)\nx[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n\n# yolov5\u002Fmodels\u002Fyolo.py 第 70 行\n# z.append(y.view(bs, -1, self.no))\n# 修改为：\nz.append(y.view(bs, self.na * ny * nx, self.no))\n\n############# 对于 YOLOv5-6.0 #####################\n# yolov5\u002Fmodels\u002Fyolo.py 第 65 行\n# if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:\n#    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)\n# 修改为：\nif self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:\n    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)\n\n# 断开 PyTorch trace 的连接\nanchor_grid = (self.anchors[i].clone() * self.stride[i]).view(1, -1, 1, 1, 2)\n\n# yolov5\u002Fmodels\u002Fyolo.py 第 70 行\n# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh\n# 修改为：\ny[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh\n\n# yolov5\u002Fmodels\u002Fyolo.py 第 73 行\n# wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh\n# 修改为：\nwh = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh\n############# 对于 YOLOv5-6.0 #####################\n\n\n# yolov5\u002Fexport.py 第 52 行\n# torch.onnx.export(dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # 形状(1,3,640,640)\n#                                'output': {0: 'batch', 1: 'anchors'}  # 形状(1,25200,85) 修改为\n# 修改为：\ntorch.onnx.export(dynamic_axes={'images': {0: 'batch'},  # 形状(1,3,640,640)\n                                'output': {0: 'batch'}  # 形状(1,25200,85) \n```\n3. 导出为 ONNX 模型\n```bash\ncd yolov5\npython export.py --weights=yolov5s.pt --dynamic --include=onnx --opset=11\n```\n4. 复制模型并执行\n```bash\ncp yolov5\u002Fyolov5s.onnx tensorRT_cpp\u002Fworkspace\u002F\ncd tensorRT_cpp\nmake yolo -j32\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>YOLOv7 支持\u003C\u002Fsummary>\n1. 下载 YOLOv7 和对应的 pth 文件。\n\n```bash\n# 来自 CDN\n\n# 或者使用 wget 下载：https:\u002F\u002Fgithub.com\u002FWongKinYiu\u002Fyolov7\u002Freleases\u002Fdownload\u002Fv0.1\u002Fyolov7.pt\n\nwget https:\u002F\u002Fcdn.githubjs.cf\u002FWongKinYiu\u002Fyolov7\u002Freleases\u002Fdownload\u002Fv0.1\u002Fyolov7.pt\ngit clone git@github.com:WongKinYiu\u002Fyolov7.git\n```\n\n2. 修改代码以支持动态批次大小\n```python\n# yolov7\u002Fmodels\u002Fyolo.py 中的第45行 forward 函数\n# bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) 转为 x(bs,3,20,20,85)\n# x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n# 修改为：\n\nbs, _, ny, nx = map(int, x[i].shape)  # x(bs,255,20,20) 转为 x(bs,3,20,20,85)\nbs = -1\nx[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n\n# yolov7\u002Fmodels\u002Fyolo.py 中的第52行\n# y = x[i].sigmoid()\n# y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy\n# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh\n# z.append(y.view(bs, -1, self.no))\n# 修改为：\ny = x[i].sigmoid()\nxy = (y[..., 0:2] * 2. 
- 0.5 + self.grid[i]) * self.stride[i]  # xy\nwh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, -1, 1, 1, 2)  # wh\nclassif = y[..., 4:]\ny = torch.cat([xy, wh, classif], dim=-1)\nz.append(y.view(bs, self.na * ny * nx, self.no))\n\n# yolov7\u002Fmodels\u002Fyolo.py 中的第57行\n# return x if self.training else (torch.cat(z, 1), x)\n# 修改为：\nreturn x if self.training else torch.cat(z, 1)\n\n\n# yolov7\u002Fmodels\u002Fexport.py 中的第52行\n# output_names=['classes', 'boxes'] if y is None else ['output'],\n# dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # 尺寸(1,3,640,640)\n#               'output': {0: 'batch', 2: 'y', 3: 'x'}} if opt.dynamic else None)\n# 修改为：\noutput_names=['classes', 'boxes'] if y is None else ['output'],\ndynamic_axes={'images': {0: 'batch'},  # 尺寸(1,3,640,640)\n              'output': {0: 'batch'}} if opt.dynamic else None)\n\n```\n3. 导出为 ONNX 模型\n```bash\ncd yolov7\npython models\u002Fexport.py --dynamic --grid --weight=yolov7.pt\n```\n4. 复制模型并执行\n```bash\ncp yolov7\u002Fyolov7.onnx tensorRT_cpp\u002Fworkspace\u002F\ncd tensorRT_cpp\nmake yolo -j32\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>YoloX 支持\u003C\u002Fsummary>\n  \n- 下载地址：https:\u002F\u002Fgithub.com\u002FMegvii-BaseDetection\u002FYOLOX\n- 如果不想自己导出 ONNX 文件，可以直接在 Megvii 的仓库中运行。\n\n1. 下载 YoloX\n```bash\ngit clone git@github.com:Megvii-BaseDetection\u002FYOLOX.git\ncd YOLOX\n```\n\n2. 修改代码\n修改后的代码可以确保成功进行 int8 编译和推理，否则会抛出 `Missing scale and zero-point for tensor (Unnamed Layer* 686)` 错误。\n  \n```Python\n# yolox\u002Fmodels\u002Fyolo_head.py 中的第206行 forward 函数。将注释掉的代码替换为未注释的代码\n# self.hw = [x.shape[-2:] for x in outputs] \nself.hw = [list(map(int, x.shape[-2:])) for x in outputs]\n\n\n# yolox\u002Fmodels\u002Fyolo_head.py 中的第208行 forward 函数。将注释掉的代码替换为未注释的代码\n# [batch, n_anchors_all, 85]\n# outputs = torch.cat(\n#     [x.flatten(start_dim=2) for x in outputs], dim=2\n# ).permute(0, 2, 1)\nproc_view = lambda x: x.view(-1, int(x.size(1)), int(x.size(2) * x.size(3)))\noutputs = torch.cat(\n    [proc_view(x) for x in outputs], dim=2\n).permute(0, 2, 1)\n\n\n# yolox\u002Fmodels\u002Fyolo_head.py 中的第253行 decode_output 函数。将注释掉的代码替换为未注释的代码\n#outputs[..., :2] = (outputs[..., :2] + grids) * strides\n#outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides\n#return outputs\nxy = (outputs[..., :2] + grids) * strides\nwh = torch.exp(outputs[..., 2:4]) * strides\nreturn torch.cat((xy, wh, outputs[..., 4:]), dim=-1)\n\n# tools\u002Fexport_onnx.py 中的第77行\nmodel.head.decode_in_inference = True\n```\n\n \n3. 导出为 ONNX\n```bash\n\n# 下载模型\nwget https:\u002F\u002Fgithub.com\u002FMegvii-BaseDetection\u002FYOLOX\u002Freleases\u002Fdownload\u002F0.1.1rc0\u002Fyolox_m.pth\n\n# 导出\nexport PYTHONPATH=$PYTHONPATH:.\npython tools\u002Fexport_onnx.py -c yolox_m.pth -f exps\u002Fdefault\u002Fyolox_m.py --output-name=yolox_m.onnx --dynamic --no-onnxsim\n```\n\n4. 执行命令\n```bash\ncp YOLOX\u002Fyolox_m.onnx tensorRT_cpp\u002Fworkspace\u002F\ncd tensorRT_cpp\nmake yolo -j32\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>YoloV3 支持\u003C\u002Fsummary>\n  \n- 如果 PyTorch 版本 ≥ 1.7，且模型版本 ≥ 5.0，则框架本身即可支持该模型。\n- 如果 PyTorch 版本 \u003C 1.7 或是 YOLOv3 模型，则需要对 opset 进行小幅调整。\n- 若希望在较低版本的 PyTorch 上实现推理，或使用动态批次大小等高级功能，请查看我们的[博客](http:\u002F\u002Fzifuture.com:8090)（目前为中文），并通过微信扫描二维码加入我们。\n\n\n1. 下载 YOLOv3\n\n```bash\ngit clone git@github.com:ultralytics\u002Fyolov3.git\n```\n\n2. 
修改代码以支持动态批次大小\n```python\n# yolov3\u002Fmodels\u002Fyolo.py 中的第55行 forward 函数\n# bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) 转为 x(bs,3,20,20,85)\n# x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n# 修改为：\n\nbs, _, ny, nx = map(int, x[i].shape)  # x(bs,255,20,20) 转为 x(bs,3,20,20,85)\nbs = -1\nx[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n\n\n# yolov3\u002Fmodels\u002Fyolo.py 中的第70行\n#  z.append(y.view(bs, -1, self.no))\n# 修改为：\nz.append(y.view(bs, self.na * ny * nx, self.no))\n\n# yolov3\u002Fmodels\u002Fyolo.py 中的第62行\n# if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:\n#    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)\n# 修改为（保留原判断，并在其后新增 anchor_grid 一行）：\nif self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:\n    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)\n\n# 断开 PyTorch trace 的连接\nanchor_grid = (self.anchors[i].clone() * self.stride[i]).view(1, -1, 1, 1, 2)\n\n# yolov3\u002Fmodels\u002Fyolo.py 中的第70行\n# y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh\n# 修改为：\ny[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh\n\n# yolov3\u002Fmodels\u002Fyolo.py 中的第73行\n# wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh\n# 修改为：\nwh = (y[..., 2:4] * 2) ** 2 * anchor_grid  # wh\n\n\n# yolov3\u002Fexport.py 中的第52行\n# torch.onnx.export(dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # 形状(1,3,640,640)\n#                                'output': {0: 'batch', 1: 'anchors'}  # 形状(1,25200,85) \n# 修改为：\ntorch.onnx.export(dynamic_axes={'images': {0: 'batch'},  # 形状(1,3,640,640)\n                                'output': {0: 'batch'}  # 形状(1,25200,85) \n```\n3. 导出为 ONNX 模型\n```bash\ncd yolov3\npython export.py --weights=yolov3.pt --dynamic --include=onnx --opset=11\n```\n4. 复制模型并执行\n```bash\ncp yolov3\u002Fyolov3.onnx tensorRT_cpp\u002Fworkspace\u002F\ncd tensorRT_cpp\n\n# 修改 src\u002Fapplication\u002Fapp_yolo.cpp 的 main 函数，启用：\n# test(Yolo::Type::V3, TRT::Mode::FP32, \"yolov3\");\n\nmake yolo -j32\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>UNet支持\u003C\u002Fsummary>\n  \n- 参考链接：https:\u002F\u002Fgithub.com\u002Fshouxieai\u002Funet-pytorch\n\n```\nmake dunet -j32\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>Retinaface支持\u003C\u002Fsummary>\n\n- https:\u002F\u002Fgithub.com\u002Fbiubug6\u002FPytorch_Retinaface\n\n1. 下载Pytorch_Retinaface仓库\n\n```bash\ngit clone git@github.com:biubug6\u002FPytorch_Retinaface.git\ncd Pytorch_Retinaface\n```\n\n2. 从 https:\u002F\u002Fgithub.com\u002Fbiubug6\u002FPytorch_Retinaface#training（README.md 的训练部分）下载模型，然后解压到 \u002Fweights 目录。这里我们使用 mobilenet0.25_Final.pth。\n\n3. 
修改代码\n\n```python\n# models\u002Fretinaface.py第24行\n# return out.view(out.shape[0], -1, 2) 修改为\nreturn out.view(-1, int(out.size(1) * out.size(2) * 2), 2)\n\n# models\u002Fretinaface.py第35行\n# return out.view(out.shape[0], -1, 4) 修改为\nreturn out.view(-1, int(out.size(1) * out.size(2) * 2), 4)\n\n# models\u002Fretinaface.py第46行\n# return out.view(out.shape[0], -1, 10) 修改为\nreturn out.view(-1, int(out.size(1) * out.size(2) * 2), 10)\n\n# 下面的修改确保resize节点的输出基于缩放比例而非固定形状，从而实现动态批处理。\n# models\u002Fnet.py第89行\n# up3 = F.interpolate(output3, size=[output2.size(2), output2.size(3)], mode=\"nearest\") 修改为\nup3 = F.interpolate(output3, scale_factor=2, mode=\"nearest\")\n\n# models\u002Fnet.py第93行\n# up2 = F.interpolate(output2, size=[output1.size(2), output1.size(3)], mode=\"nearest\") 修改为\nup2 = F.interpolate(output2, scale_factor=2, mode=\"nearest\")\n\n# 下面的代码移除了softmax（有时会出现问题），同时将输出拼接起来以简化解码过程。\n# models\u002Fretinaface.py第123行\n# if self.phase == 'train':\n#     output = (bbox_regressions, classifications, ldm_regressions)\n# else:\n#     output = (bbox_regressions, F.softmax(classifications, dim=-1), ldm_regressions)\n# return output\n# 上述内容修改为：\noutput = (bbox_regressions, classifications, ldm_regressions)\nreturn torch.cat(output, dim=-1)\n\n# 设置'opset_version=11'以确保导出成功。\n# torch_out = torch.onnx._export(net, inputs, output_onnx, export_params=True, verbose=False,\n#     input_names=input_names, output_names=output_names)\n# 修改为：\ntorch_out = torch.onnx._export(net, inputs, output_onnx, export_params=True, verbose=False, opset_version=11,\n    input_names=input_names, output_names=output_names)\n\n\n\n\n```\n4. 导出为ONNX格式\n```bash\npython convert_to_onnx.py\n```\n\n5. 执行\n```bash\ncp FaceDetector.onnx ..\u002FtensorRT_cpp\u002Fworkspace\u002Fmb_retinaface.onnx\ncd ..\u002FtensorRT_cpp\nmake retinaface -j64\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>DBFace支持\u003C\u002Fsummary>\n\n- https:\u002F\u002Fgithub.com\u002Fdlunion\u002FDBFace\n\n```bash\nmake dbface -j64\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Scrfd支持\u003C\u002Fsummary>\n\n- https:\u002F\u002Fgithub.com\u002Fdeepinsight\u002Finsightface\u002Ftree\u002Fmaster\u002Fdetection\u002Fscrfd\n- 关于导出为ONNX格式的技术细节即将发布。在正式发布之前，欢迎加入我们进行讨论。\n\n\u003C\u002Fdetails>\n\n\n\n\u003Cdetails>\n\u003Csummary>Arcface支持\u003C\u002Fsummary>\n\n- https:\u002F\u002Fgithub.com\u002Fdeepinsight\u002Finsightface\u002Ftree\u002Fmaster\u002Frecognition\u002Farcface_torch\n```C++\nauto arcface = Arcface::create_infer(\"arcface_iresnet50.fp32.trtmodel\", 0);\nauto feature = arcface->commit(make_tuple(face, landmarks)).get();\ncout \u003C\u003C feature \u003C\u003C endl;  \u002F\u002F 1x512\n```\n- 在人脸识别示例中，`workspace\u002Fface\u002Flibrary`是已注册的人脸集合。\n- `workspace\u002Fface\u002Frecognize`是要识别的人脸集合。\n- 结果保存在`workspace\u002Fface\u002Fresult`和`workspace\u002Fface\u002Flibrary_draw`中。\n\n\u003C\u002Fdetails>\n  \n\u003Cdetails>\n\u003Csummary>CenterNet支持\u003C\u002Fsummary>\n  \n请参阅教程\u002F2.0中的详细说明。\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>Bert支持（中文分类）\u003C\u002Fsummary>\n\n- https:\u002F\u002Fgithub.com\u002F649453932\u002FBert-Chinese-Text-Classification-Pytorch\n- `make bert -j6`  \n\n\u003C\u002Fdetails>\n\n## 界面简介\n\n\u003Cdetails>\n\u003Csummary>Python 接口：更轻松地从 PyTorch 模型获取 ONNX 和 TRT 模型\u003C\u002Fsummary>\n\n- 仅需一行代码即可导出 ONNX 和 TRT 模型，并将其保存以供后续使用。\n```python\nimport pytrt\n\nmodel = models.resnet18(True).eval()\npytrt.from_torch(\n    model, \n    dummy_input, \n    max_batch_size=16, \n    
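# （补充注释，依据参数名与上文说明）max_batch_size 为编译引擎时允许的最大批量；下面两个参数分别指定导出的 ONNX 文件与 TensorRT 引擎的保存路径\n    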
onnx_save_file=\"test.onnx\", \n    engine_save_file=\"engine.trtmodel\"\n)\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Python 接口：TensorRT 推理\u003C\u002Fsummary>\n\n- YOLOX TensorRT 推理\n```python\nimport pytrt\n\nyolo   = tp.Yolo(engine_file, type=tp.YoloType.X)   # engine_file 是 TRT 模型文件\nimage  = cv2.imread(\"inference\u002Fcar.jpg\")\nbboxes = yolo.commit(image).get()\n```\n\n- 从 PyTorch 到 TensorRT 的无缝推理\n```python\nimport pytrt\n\nmodel     = models.resnet18(True).eval().to(device) # PyTorch 模型\ntrt_model = tp.from_torch(model, input)\ntrt_out   = trt_model(input)\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>C++ 接口：YOLOX 推理\u003C\u002Fsummary>\n\n```C++\n\n\u002F\u002F 在 GPU 0 上创建推理引擎\nauto engine = Yolo::create_infer(\"yolox_m.fp32.trtmodel\"， Yolo::Type::X, 0);\n\n\u002F\u002F 加载图像\nauto image = cv::imread(\"1.jpg\");\n\n\u002F\u002F 进行推理并获取结果\nauto box = engine->commit(image).get();\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>C++ 接口：以 FP32\u002FFP16 编译模型\u003C\u002Fsummary>\n\n```cpp\nTRT::compile(\n  TRT::Mode::FP32,   \u002F\u002F 以 FP32 编译模型\n  3,                          \u002F\u002F 最大批量大小\n  \"plugin.onnx\",              \u002F\u002F ONNX 文件\n  \"plugin.fp32.trtmodel\",     \u002F\u002F 保存路径\n  {}                         \u002F\u002F 需要时重新定义输入形状\n);\n```\n- 对于 FP32 编译，你只需提供一个允许重新定义输入形状的 ONNX 文件。\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>C++ 接口：以 INT8 编译\u003C\u002Fsummary>\n\n- INT8 推理在精度上略逊于 FP32（约下降 5%），但速度却快得惊人。在该框架中，我们提供了 INT8 推理功能。\n\n```cpp\n\u002F\u002F 定义 INT8 校准函数，用于读取数据并处理为张量。\nauto int8process = [](int current, int count, vector\u003Cstring>& images, shared_ptr\u003CTRT::Tensor>& tensor){\n    for(int i = 0; i \u003C images.size(); ++i){\n    \u002F\u002F INT8 编译需要校准。我们读取图像数据并设置归一化矩阵，然后将数据转换为张量。\n        auto image = cv::imread(images[i]);\n        cv::resize(image, image, cv::Size(640, 640));\n        float mean[] = {0, 0, 0};\n        float std[]  = {1, 1, 1};\n        tensor->set_norm_mat(i, image, mean, std);\n    }\n};\n\n\n\u002F\u002F 指定 TRT::Mode 为 INT8\nauto model_file = \"yolov5m.int8.trtmodel\";\nTRT::compile(\n  TRT::Mode::INT8,            \u002F\u002F INT8\n  3,                          \u002F\u002F 最大批量大小\n  \"yolov5m.onnx\",             \u002F\u002F ONNX\n  model_file,                 \u002F\u002F 保存文件名\n  {},                         \u002F\u002F 重新定义输入形状\n  int8process,                \u002F\u002F 校准回调函数\n  \".\",                        \u002F\u002F 用于校准的图像数据所在目录\n  \"\"                          \u002F\u002F 校准数据保存目录（即加载校准数据的地方）\n);\n```\n- 我们通过整合一个 int8process 函数，避免了 TensorRT 官方实现中可能出现的诸多问题。\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>C++ 接口：推理\u003C\u002Fsummary>\n\n- 我们引入了 Tensor 类，以便更轻松地进行推理和主机与设备之间的数据传输，从而让用户无需关注底层细节。\n\n- Engine 类是另一个便利工具。\n\n```cpp\n\u002F\u002F 加载模型并获取共享指针。如果加载失败，则返回 nullptr。\nauto engine = TRT::load_infer(\"yolov5m.fp32.trtmodel\");\n\n\u002F\u002F 打印模型信息\nengine->print();\n\n\u002F\u002F 加载图像\nauto image = imread(\"demo.jpg\");\n\n\u002F\u002F 获取模型的输入和输出节点，可以通过名称或索引访问\nauto input = engine->input(0);   \u002F\u002F 或者 auto input = engine->input(\"images\");\nauto output = engine->output(0); \u002F\u002F 或者 auto output = engine->output(\"output\");\n\n\u002F\u002F 调用 set_norm_mat() 将图像放入输入张量中\nfloat mean[] = {0, 0, 0};\nfloat std[]  = {1, 1, 1};\ninput->set_norm_mat(i, image, mean, std);\n\n\u002F\u002F 进行推理。这里 sync(true) 或 async(false) 是可选的\nengine->forward(); \u002F\u002F engine->forward(true 或 false)\n\n\u002F\u002F 
获取输出指针，用于访问输出结果\nfloat* output_ptr = output->cpu\u003Cfloat>();\n```\n\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>C++ 接口：插件\u003C\u002Fsummary>\n\n- 你只需定义核函数和推理过程即可。代码的细节（例如插件的序列化、反序列化和注入等）都由框架自动处理。\n- 很容易实现新的 FP32 和 FP16 插件。详情请参阅 HSwish.cu。\n```cpp\ntemplate\u003C>\n__global__ void HSwishKernel(float* input, float* output, int edge) {\n\n    KernelPositionBlock;\n    float x = input[position];\n    float a = x + 3;\n    a = a \u003C 0 ? 0 : (a >= 6 ? 6 : a);\n    output[position] = x * a \u002F 6;\n}\n\nint HSwish::enqueue(const std::vector\u003CGTensor>& inputs, std::vector\u003CGTensor>& outputs, const std::vector\u003CGTensor>& weights, void* workspace, cudaStream_t stream) {\n\n    int count = inputs[0].count();\n    auto grid = CUDATools::grid_dims(count);\n    auto block = CUDATools::block_dims(count);\n    HSwishKernel \u003C\u003C\u003Cgrid, block, 0, stream >>> (inputs[0].ptr\u003Cfloat>(), outputs[0].ptr\u003Cfloat>(), count);\n    return 0;\n}\n\n\nRegisterPlugin(HSwish);\n```\n\n\u003C\u002Fdetails>\n\n\n## 关于我们\n- 我们的博客：http:\u002F\u002Fwww.zifuture.com\u002F                        （目前仅提供中文版，英文版即将推出）\n- 我们的视频频道： https:\u002F\u002Fspace.bilibili.com\u002F1413433465 （目前仅提供中文版，英文版即将推出）","# tensorRT_Pro 快速上手指南\n\n## 环境准备\n\n- **系统要求**：Linux (Ubuntu 20.04+)\n- **前置依赖**：\n  - CUDA 10.2\n  - cuDNN 8.2.2.26（需同时安装开发头文件和运行时库）\n  - TensorRT 8.0.1.6（CUDA 10.2）\n  - Protobuf 3.11.4（推荐使用清华镜像下载：`https:\u002F\u002Fmirrors.tuna.tsinghua.edu.cn\u002Fgithub-release\u002Fprotocolbuffers\u002Fprotobuf\u002Fv3.11.4\u002Fprotobuf-cpp-3.11.4.zip`）\n  - GPU Compute Capability（根据显卡设置，如 RTX 3080 Ti 使用 `compute_86`，参考 [NVIDIA Compute Capability 表](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-gpus#compute)）\n  - VSCode（推荐用于开发）\n\n## 安装步骤\n\n1. 克隆仓库：\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fshouxieai\u002Finfer.git\n   cd infer\n   ```\n\n2. 配置依赖路径：\n   - 编辑 `Makefile`，设置路径变量（示例）：\n     ```makefile\n     CUDA_PATH := \u002Fusr\u002Flocal\u002Fcuda-10.2\n     CUDNN_PATH := \u002Fusr\u002Flib\u002Fx86_64-linux-gnu\n     TENSORRT_PATH := \u002Fusr\u002Flib\u002Fx86_64-linux-gnu\n     PROTOBUF_PATH := \u002Fpath\u002Fto\u002Fprotobuf-3.11.4\n     ```\n   - 设置 Compute Capability（在 `NVCC_FLAGS` 中修改）：\n     ```makefile\n     NVCC_FLAGS += -gencode=arch=compute_86,code=sm_86  # 示例：RTX 3080 Ti\n     ```\n\n3. 编译项目：\n   - 仅 C++ 版本：\n     ```bash\n     make -j8\n     ```\n   - Python 支持（需额外配置）：\n     ```makefile\n     # 在 Makefile 中设置\n     use_python := true\n     ```\n     ```bash\n     make pyinstall -j8\n     ```\n\n## 基本使用\n\n### Python 最简示例\n\n以下代码实现 YOLOv5s 推理（自动编译 ONNX 模型）：\n\n```python\nimport os\nimport cv2\nimport numpy as np\nimport pytrt as tp\n\nengine_file = \"yolov5s.fp32.trtmodel\"\nif not os.path.exists(engine_file):\n    tp.compile_onnx_to_file(1, tp.onnx_hub(\"yolov5s\"), engine_file)\n\nyolo = tp.Yolo(engine_file, type=tp.YoloType.V5)\nimage = cv2.imread(\"car.jpg\")\nbboxes = yolo.commit(image).get()\nprint(f\"{len(bboxes)} objects\")\n\nfor box in bboxes:\n    left, top, right, bottom = map(int, [box.left, box.top, box.right, box.bottom])\n    cv2.rectangle(image, (left, top), (right, bottom), tp.random_color(box.class_label), 5)\n\nsaveto = \"yolov5.car.jpg\"\nprint(f\"Save to {saveto}\")\ncv2.imwrite(saveto, image)\ncv2.imshow(\"result\", image)\ncv2.waitKey()\n```\n\n**使用说明**：\n1. 确保 `car.jpg` 文件在当前目录\n2. 首次运行会自动编译 ONNX 模型到 TensorRT 引擎\n3. 
结果保存为 `yolov5.car.jpg`，显示检测框","某智能安防公司需在NVIDIA Jetson AGX Xavier嵌入式设备上部署YOLOv5行人检测模型，实时分析1080P视频流以实现安防预警，要求推理延迟低于80ms。\n\n### 没有 tensorRT_Pro 时\n- 需手动编写TensorRT引擎创建代码，涉及输入输出绑定、插件注册等底层操作，开发周期长达2周。\n- 模型精度优化（FP16\u002FINT8）需反复编译测试，每次调整耗时1小时以上，影响紧急需求迭代。\n- 自定义后处理插件（如NMS）实现复杂，调试时频繁出现内存泄漏，导致3次返工。\n- 嵌入式部署需单独处理CUDA 11.4、TensorRT 8.4版本依赖，环境配置失败率超40%。\n- 性能瓶颈不明确，推理延迟常达120ms，无法满足实时预警要求。\n\n### 使用 tensorRT_Pro 后\n- 仅需3行C++代码加载预编译模型并执行推理，开发周期压缩至2天。\n- 内置一键精度优化工具，FP16\u002FINT8切换仅需5分钟，优化效率提升12倍。\n- 封装后处理插件序列化机制，NMS逻辑通过API直接调用，调试错误率归零。\n- 通过Docker镜像一键部署到Jetson设备，环境配置时间从1天降至10分钟。\n- 性能测试报告精准定位瓶颈，优化后平均延迟降至58ms，满足实时性指标。\n\ntensorRT_Pro将复杂的TensorRT部署流程简化为几行代码，让嵌入式AI开发从“技术攻坚”转向“业务创新”。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fshouxieai_tensorRT_Pro_0ddc00e2.png","shouxieai","手写AI","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fshouxieai_48eb56de.jpg",null,"https:\u002F\u002Fgithub.com\u002Fshouxieai",[81,85,89,93,96,100,104,108,112,115],{"name":82,"color":83,"percentage":84},"C++","#f34b7d",75.4,{"name":86,"color":87,"percentage":88},"C","#555555",16.6,{"name":90,"color":91,"percentage":92},"HTML","#e34c26",2.4,{"name":94,"color":95,"percentage":23},"Cuda","#3A4E3A",{"name":97,"color":98,"percentage":99},"Jupyter Notebook","#DA5B0B",1.9,{"name":101,"color":102,"percentage":103},"Python","#3572A5",0.6,{"name":105,"color":106,"percentage":107},"CSS","#663399",0.5,{"name":109,"color":110,"percentage":111},"JavaScript","#f1e05a",0.2,{"name":113,"color":114,"percentage":111},"Makefile","#427819",{"name":116,"color":117,"percentage":118},"CMake","#DA3434",0.1,2867,579,"2026-04-01T01:12:08","MIT","Linux, Windows","需要 NVIDIA GPU，CUDA 10.2","未说明",{"notes":127,"python":125,"dependencies":128},"编译时需指定 GPU Compute Capability（如 compute_75, sm_75），需配置 CUDA\u002FCUDNN\u002FTensorRT 路径，首次运行可能需下载模型文件",[129,130,131,132],"cuda==10.2","cudnn==8.2.2.26","tensorrt==8.0.1.6","protobuf==3.11.4",[14,13],[135,136,137,138,139,140],"pytorch","tensorrt","object-detection","deep-learning","yolov5","yolox","2026-03-27T02:49:30.150509","2026-04-06T08:17:38.232243",[144,149,154,159,164,169],{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},5524,"INT8 模式下没有检测结果","问题原因是模型 INT8 时的预处理不匹配。解决方案：修改为最新版的预处理方式，使用 Norm::None()。已在仓库提交修复。","https:\u002F\u002Fgithub.com\u002Fshouxieai\u002FtensorRT_Pro\u002Fissues\u002F4",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},5525,"Yolov7 模型转换失败","问题可能是预训练模型已包含 IDetect 类型，导致 yaml 配置无效。解决方案：检查预训练模型是否为 IDetect 格式（如模型文件中显示 IDetect），若已包含则需使用对应配置，避免使用错误的 yaml 文件。","https:\u002F\u002Fgithub.com\u002Fshouxieai\u002FtensorRT_Pro\u002Fissues\u002F119",{"id":155,"question_zh":156,"answer_zh":157,"source_url":158},5526,"编译时出现 undefined reference 错误","错误 'undefined reference to symbol 'dlsym@@GLIBC_2.2.5'' 的解决方案：在 CMakeLists.txt 文件的最后一行添加 'c'，例如将 'target_link_libraries(...)' 修改为 'target_link_libraries(... c)'，即可解决链接问题。","https:\u002F\u002Fgithub.com\u002Fshouxieai\u002FtensorRT_Pro\u002Fissues\u002F29",{"id":160,"question_zh":161,"answer_zh":162,"source_url":163},5527,"Jetson AGX ORIN 中出现未知设备错误","错误 'Assertion upperBound != 0 failed. 
Unknown embedded device detected' 的解决方案：尝试使用最新版本的 JetPack（如 5.0.2 或 5.1.1），因为 NVIDIA 表示此问题会在后续版本中修复。","https:\u002F\u002Fgithub.com\u002Fshouxieai\u002FtensorRT_Pro\u002Fissues\u002F137",{"id":165,"question_zh":166,"answer_zh":167,"source_url":168},5528,"仿射变换矩阵的偏移量计算问题","坐标系理解角度不同但结果一致。解决方案：使用平移矩阵 TPS 相乘可得到相同变换矩阵 M，无需调整偏移量计算。例如，通过 TPS 矩阵相乘可抵消偏移量差异。","https:\u002F\u002Fgithub.com\u002Fshouxieai\u002FtensorRT_Pro\u002Fissues\u002F124",{"id":170,"question_zh":171,"answer_zh":172,"source_url":173},5529,"NMS CUDA 加速实现精度损失问题","精度损失较小的解决方案：参考 yolo.hpp 文件中的备注，具体位置在 https:\u002F\u002Fgithub.com\u002Fshouxieai\u002FtensorRT_Pro\u002Fblob\u002Fd577fbab615a3d84cb50824d2418655659fd61af\u002Fsrc\u002Fapplication\u002Fapp_yolo\u002Fyolo.hpp#L29，按代码注释实现即可。","https:\u002F\u002Fgithub.com\u002Fshouxieai\u002FtensorRT_Pro\u002Fissues\u002F85",[175],{"id":176,"version":177,"summary_zh":178,"released_at":179},114750,"v1.0","第一个版本固定下来","2021-09-14T03:39:56"]