[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-onnx--models":3,"tool-onnx--models":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",158594,2,"2026-04-16T23:34:05",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 
token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":77,"owner_website":78,"owner_url":79,"languages":80,"stars":89,"forks":90,"last_commit_at":91,"license":92,"difficulty_score":32,"env_os":93,"env_gpu":94,"env_ram":94,"env_deps":95,"category_tags":99,"github_topics":100,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":104,"updated_at":105,"faqs":106,"releases":136},8345,"onnx\u002Fmodels","models","A collection of pre-trained, state-of-the-art models in the ONNX format ","ONNX Model Zoo 是一个汇聚了众多预训练、最先进机器学习模型的开源资源库，所有模型均采用通用的 ONNX 格式。它主要解决了不同深度学习框架之间模型转换困难、复用成本高的问题，让开发者能够轻松跨越框架壁垒，在各种工具、运行时和编译器中灵活部署和使用模型。\n\n虽然该项目目前已转为历史归档状态，不再直接通过 Git LFS 提供下载（新资源已迁移至 Hugging Face），但其沉淀的模型资产依然极具价值。这些模型源自 timm、torchvision、transformers 等知名开源项目，涵盖计算机视觉、自然语言处理、生成式 AI 及图机器学习等多个领域，并经过严格的准确性验证。此外，资源库还特别提供了由 Intel Neural Compressor 生成的 INT8 量化模型，帮助追求高性能推理的用户进一步优化模型体积与速度。\n\n无论是希望快速上手实验的 AI 开发者、需要基准模型进行研究的研究人员，还是对机器学习感兴趣的技术爱好者，都能从中找到适合的起点。通过标准化的格式和丰富的类别，models 让高质量的人工智能技术变得更加触手可及，促进了社区内的知识共享与技","ONNX Model Zoo 是一个汇聚了众多预训练、最先进机器学习模型的开源资源库，所有模型均采用通用的 ONNX 格式。它主要解决了不同深度学习框架之间模型转换困难、复用成本高的问题，让开发者能够轻松跨越框架壁垒，在各种工具、运行时和编译器中灵活部署和使用模型。\n\n虽然该项目目前已转为历史归档状态，不再直接通过 Git LFS 提供下载（新资源已迁移至 Hugging Face），但其沉淀的模型资产依然极具价值。这些模型源自 timm、torchvision、transformers 等知名开源项目，涵盖计算机视觉、自然语言处理、生成式 AI 及图机器学习等多个领域，并经过严格的准确性验证。此外，资源库还特别提供了由 Intel Neural Compressor 生成的 INT8 量化模型，帮助追求高性能推理的用户进一步优化模型体积与速度。\n\n无论是希望快速上手实验的 AI 开发者、需要基准模型进行研究的研究人员，还是对机器学习感兴趣的技术爱好者，都能从中找到适合的起点。通过标准化的格式和丰富的类别，models 让高质量的人工智能技术变得更加触手可及，促进了社区内的知识共享与技术普及。","\u003C!--- SPDX-License-Identifier: Apache-2.0 -->\n\n> **Deprecation Notice**: We sincerely thank the community for participating in the ONNX Model Zoo effort. As the machine learning ecosystem has evolved, much of the novel model sharing has successfully transitioned to Hugging Face, which maintains a vibrant and healthy state. We are preserving the ONNX Model Zoo  repository for historical purposes only. Please note that models will no longer be available for LFS download starting July 1st, 2025. You can still get access to the models that were originally available on this repository by going to https:\u002F\u002Fhuggingface.co\u002Fonnxmodelzoo.\n\n# ONNX Model Zoo\n\n\n## Introduction\n\nWelcome to the ONNX Model Zoo! 
The Open Neural Network Exchange (ONNX) is an open standard format created to represent machine learning models. Supported by a robust community of partners, ONNX defines a common set of operators and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.\n\nThis repository is a curated collection of pre-trained, state-of-the-art models in the ONNX format. These models are sourced from prominent open-source repositories and have been contributed by a diverse group of community members. Our aim is to facilitate the spread and usage of machine learning models among a wider audience of developers, researchers, and enthusiasts.\n\nTo handle ONNX model files, which can be large, we use Git LFS (Large File Storage). \n\n## Models\n\nCurrently, we are expanding the ONNX Model Zoo by incorporating additional models from the following categories.\nAs we are rigorously validating the new models for accuracy, refer to the [validated models](#validated-models) below that have been successfully validated for accuracy:\n\n- Computer Vision\n- Natural Language Processing (NLP)\n- Generative AI\n- Graph Machine Learning\n\nThese models are sourced from prominent open-source repositories such as [timm](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models), [torchvision](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fvision), [torch_hub](https:\u002F\u002Fpytorch.org\u002Fhub\u002F), and [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers), and exported into the ONNX format using the open-source [TurnkeyML toolchain](https:\u002F\u002Fgithub.com\u002Fonnx\u002Fturnkeyml).\n\n\n## Validated Models\n\n#### Vision\n* [Image Classification](#image_classification)\n* [Object Detection & Image Segmentation](#object_detection)\n* [Body, Face & Gesture Analysis](#body_analysis)\n* [Image Manipulation](#image_manipulation)\n\n#### Language\n* [Machine Comprehension](#machine_comprehension)\n* [Machine Translation](#machine_translation)\n* [Language Modelling](#language_modelling)\n\n#### Other\n* [Visual Question Answering & Dialog](#visual_qna)\n* [Speech & Audio Processing](#speech)\n* [Other interesting models](#others)\n\nRead the [Usage](#usage-) section below for more details on the file formats in the ONNX Model Zoo (.onnx, .pb, .npz), downloading multiple ONNX models through [Git LFS command line](#gitlfs-), and starter Python code for validating your ONNX model using test data.\n\nINT8 models are generated by [Intel® Neural Compressor](https:\u002F\u002Fgithub.com\u002Fintel\u002Fneural-compressor). [Intel® Neural Compressor](https:\u002F\u002Fgithub.com\u002Fintel\u002Fneural-compressor) is an open-source Python library which supports automatic accuracy-driven tuning strategies to help user quickly find out the best quantized model. It implements dynamic and static quantization for ONNX models and can represent quantized ONNX models with operator oriented as well as tensor oriented (QDQ) ways. Users can use web-based UI service or python code to do quantization. 
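As a rough, version-dependent sketch of the code-based route (assuming the Intel Neural Compressor 2.x Python API; class and argument names may differ in other releases, and `model.onnx` is a placeholder path to a downloaded zoo model):

```python
# A minimal sketch only (not the exact recipe used for the zoo's INT8 files),
# assuming the Intel Neural Compressor 2.x Python API; names and arguments
# may differ between releases. "model.onnx" is a placeholder path.
import onnx
from neural_compressor import PostTrainingQuantConfig, quantization

fp32_model = onnx.load("model.onnx")

# Dynamic post-training quantization needs no calibration data; the static
# (QDQ) approach would additionally take a calib_dataloader built from the
# test data shipped with each model.
config = PostTrainingQuantConfig(approach="dynamic")
int8_model = quantization.fit(model=fp32_model, conf=config)

int8_model.save("./model_int8")  # writes the quantized ONNX model to this directory
```
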
Read the [Introduction](https:\u002F\u002Fgithub.com\u002Fintel\u002Fneural-compressor\u002Fblob\u002Fmaster\u002FREADME.md) for more details.\n\n### Image Classification \u003Ca name=\"image_classification\"\u002F>\nThis collection of models take images as input, then classifies the major objects in the images into 1000 object categories such as keyboard, mouse, pencil, and many animals.\n\n|Model Class |Reference |Description |Huggingface Spaces|\n|-|-|-|-|\n|\u003Cb>[MobileNet](validated\u002Fvision\u002Fclassification\u002Fmobilenet)\u003C\u002Fb>|[Sandler et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.04381)|Light-weight deep neural network best suited for mobile and embedded vision applications. \u003Cbr>Top-5 error from paper - ~10%|\n|\u003Cb>[ResNet](validated\u002Fvision\u002Fclassification\u002Fresnet)\u003C\u002Fb>|[He et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.03385)|A CNN model (up to 152 layers). Uses shortcut connections to achieve higher accuracy when classifying images. \u003Cbr> Top-5 error from paper - ~3.6%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FResNet) |\n|\u003Cb>[SqueezeNet](validated\u002Fvision\u002Fclassification\u002Fsqueezenet)\u003C\u002Fb>|[Iandola et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.07360)|A light-weight CNN model providing AlexNet level accuracy with 50x fewer parameters. \u003Cbr>Top-5 error from paper - ~20%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FSqueezeNet) |\n|\u003Cb>[VGG](validated\u002Fvision\u002Fclassification\u002Fvgg)\u003C\u002Fb>|[Simonyan et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1409.1556)|Deep CNN model(up to 19 layers). Similar to AlexNet but uses multiple smaller kernel-sized filters that provides more accuracy when classifying images. \u003Cbr>Top-5 error from paper - ~8%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FVGG) |\n|\u003Cb>[AlexNet](validated\u002Fvision\u002Fclassification\u002Falexnet)\u003C\u002Fb>|[Krizhevsky et al.](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)|A Deep CNN model (up to 8 layers) where the input is an image and the output is a vector of 1000 numbers. \u003Cbr> Top-5 error from paper - ~15%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FAlexNet) |\n|\u003Cb>[GoogleNet](validated\u002Fvision\u002Fclassification\u002Finception_and_googlenet\u002Fgooglenet)\u003C\u002Fb>|[Szegedy et al.](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1409.4842.pdf)|Deep CNN model(up to 22 layers). Comparatively smaller and faster than VGG and more accurate in detailing than AlexNet. 
\u003Cbr> Top-5 error from paper - ~6.7%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FGoogleNet) |\n|\u003Cb>[CaffeNet](validated\u002Fvision\u002Fclassification\u002Fcaffenet)\u003C\u002Fb>|[Krizhevsky et al.]( https:\u002F\u002Fucb-icsi-vision-group.github.io\u002Fcaffe-paper\u002Fcaffe.pdf)|Deep CNN variation of AlexNet for Image Classification in Caffe where the max pooling precedes the local response normalization (LRN) so that the LRN takes less compute and memory.| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FCaffeNet) |\n|\u003Cb>[RCNN_ILSVRC13](validated\u002Fvision\u002Fclassification\u002Frcnn_ilsvrc13)\u003C\u002Fb>|[Girshick et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1311.2524)|Pure Caffe implementation of R-CNN for image classification. This model uses localization of regions to classify and extract features from images.|\n|\u003Cb>[DenseNet-121](validated\u002Fvision\u002Fclassification\u002Fdensenet-121)\u003C\u002Fb>|[Huang et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1608.06993)|Model that has every layer connected to every other layer and passes on its own feature providing strong gradient flow and more diversified features.| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FDenseNet-121) |\n|\u003Cb>[Inception_V1](validated\u002Fvision\u002Fclassification\u002Finception_and_googlenet\u002Finception_v1)\u003C\u002Fb>|[Szegedy et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1409.4842)|This model is same as GoogLeNet, implemented through Caffe2 that has improved utilization of the computing resources inside the network and helps with the vanishing gradient problem. \u003Cbr> Top-5 error from paper - ~6.7%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FInception_v1) |\n|\u003Cb>[Inception_V2](validated\u002Fvision\u002Fclassification\u002Finception_and_googlenet\u002Finception_v2)\u003C\u002Fb>|[Szegedy et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.00567)|Deep CNN model for Image Classification as an adaptation to Inception v1 with batch normalization. This model has reduced computational cost and improved image resolution compared to Inception v1. \u003Cbr> Top-5 error from paper ~4.82%|\n|\u003Cb>[ShuffleNet_V1](validated\u002Fvision\u002Fclassification\u002Fshufflenet)\u003C\u002Fb>|[Zhang et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.01083)|Extremely computation efficient CNN model that is designed specifically for mobile devices. This model greatly reduces the computational cost and provides a ~13x speedup over AlexNet on ARM-based mobile devices. Compared to MobileNet, ShuffleNet achieves superior performance by a significant margin due to it's efficient structure. \u003Cbr> Top-1 error from paper - ~32.6%|\n|\u003Cb>[ShuffleNet_V2](validated\u002Fvision\u002Fclassification\u002Fshufflenet)\u003C\u002Fb>|[Zhang et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1807.11164)|Extremely computation efficient CNN model that is designed specifically for mobile devices. 
This network architecture design considers direct metric such as speed, instead of indirect metric like FLOP. \u003Cbr> Top-1 error from paper - ~30.6%|\n|\u003Cb>[ZFNet-512](validated\u002Fvision\u002Fclassification\u002Fzfnet-512)\u003C\u002Fb>|[Zeiler et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1311.2901)|Deep CNN model (up to 8 layers) that increased the number of features that the network is capable of detecting that helps to pick image features at a finer level of resolution. \u003Cbr> Top-5 error from paper - ~14.3%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FZFNet-512) |\n|\u003Cb>[EfficientNet-Lite4](validated\u002Fvision\u002Fclassification\u002Fefficientnet-lite4)\u003C\u002Fb>|[Tan et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.11946)|CNN model with an order of magnitude of few computations and parameters, while still acheiving state-of-the-art accuracy and better efficiency than previous ConvNets. \u003Cbr> Top-5 error from paper - ~2.9%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FEfficientNet-Lite4) |\n\u003Chr>\n\n#### Domain-based Image Classification \u003Ca name=\"domain_based_image\"\u002F>\nThis subset of models classify images for specific domains and datasets.\n\n|Model Class |Reference |Description |\n|-|-|-|\n|\u003Cb>[MNIST-Handwritten Digit Recognition](validated\u002Fvision\u002Fclassification\u002Fmnist)\u003C\u002Fb>|[Convolutional Neural Network with MNIST](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FCNTK\u002Fblob\u002Fmaster\u002FTutorials\u002FCNTK_103D_MNIST_ConvolutionalNeuralNetwork.ipynb)\t|Deep CNN model for handwritten digit identification|\n\u003Chr>\n\n### Object Detection & Image Segmentation \u003Ca name=\"object_detection\"\u002F>\nObject detection models detect the presence of multiple objects in an image and segment out areas of the image where the objects are detected. Semantic segmentation models partition an input image by labeling each pixel into a set of pre-defined categories.\n\n|Model Class |Reference |Description |Hugging Face Spaces |\n|-|-|-|-|\n|\u003Cb>[Tiny YOLOv2](validated\u002Fvision\u002Fobject_detection_segmentation\u002Ftiny-yolov2)\u003C\u002Fb>|[Redmon et al.](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1612.08242.pdf)|A real-time CNN for object detection that detects 20 different classes. A smaller version of the more complex full YOLOv2 network.|\n|\u003Cb>[SSD](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fssd)\u003C\u002Fb>|[Liu et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.02325)|Single Stage Detector: real-time CNN for object detection that detects 80 different classes.|\n|\u003Cb>[SSD-MobileNetV1](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fssd-mobilenetv1)\u003C\u002Fb>|[Howard et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.04861)|A variant of MobileNet that uses the Single Shot Detector (SSD) model framework. 
The model detects 80 different object classes and locates up to 10 objects in an image.|\n|\u003Cb>[Faster-RCNN](validated\u002Fvision\u002Fobject_detection_segmentation\u002Ffaster-rcnn)\u003C\u002Fb>|[Ren et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1506.01497)|Increases efficiency from R-CNN by connecting a RPN with a CNN to create a single, unified network for object detection that detects 80 different classes.| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002Ffaster-rcnn) |\n|\u003Cb>[Mask-RCNN](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fmask-rcnn)\u003C\u002Fb>|[He et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.06870)|A real-time neural network for object instance segmentation that detects 80 different classes. Extends Faster R-CNN as each of the 300 elected ROIs go through 3 parallel branches of the network: label prediction, bounding box prediction and mask prediction.| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002Fmask-rcnn) |\n|\u003Cb>[RetinaNet](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fretinanet)\u003C\u002Fb>|[Lin et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1708.02002)|A real-time dense detector network for object detection that addresses class imbalance through Focal Loss. RetinaNet is able to match the speed of previous one-stage detectors and defines the state-of-the-art in two-stage detectors (surpassing R-CNN).|\n|\u003Cb>[YOLO v2-coco](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fyolov2-coco)\u003C\u002Fb>|[Redmon et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.08242)|A CNN model for real-time object detection system that can detect over 9000 object categories. It uses a single network evaluation, enabling it to be more than 1000x faster than R-CNN and 100x faster than Faster R-CNN. This model is trained with COCO dataset and contains 80 classes.\n|\u003Cb>[YOLO v3](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fyolov3)\u003C\u002Fb>|[Redmon et al.](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1804.02767.pdf)|A deep CNN model for real-time object detection that detects 80 different classes. A little bigger than YOLOv2 but still very fast. As accurate as SSD but 3 times faster.|\n|\u003Cb>[Tiny YOLOv3](validated\u002Fvision\u002Fobject_detection_segmentation\u002Ftiny-yolov3)\u003C\u002Fb>|[Redmon et al.](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1804.02767.pdf)| A smaller version of YOLOv3 model. |\n|\u003Cb>[YOLOv4](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fyolov4)\u003C\u002Fb>|[Bochkovskiy et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.10934)|Optimizes the speed and accuracy of object detection. Two times faster than EfficientDet. 
It improves YOLOv3's AP and FPS by 10% and 12%, respectively, with mAP50 of 52.32 on the COCO 2017 dataset and FPS of 41.7 on a Tesla V100.| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002Fyolov4) |\n|\u003Cb>[DUC](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fduc)\u003C\u002Fb>|[Wang et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1702.08502)|Deep CNN based pixel-wise semantic segmentation model with >80% [mIOU](\u002Fmodels\u002Fsemantic_segmentation\u002FDUC\u002FREADME.md\u002F#metric) (mean Intersection Over Union). Trained on cityscapes dataset, which can be effectively implemented in self driving vehicle systems.| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FDUC) |\n|\u003Cb>[FCN](validated\u002Fvision\u002Fobject_detection_segmentation\u002Ffcn)|[Long et al.](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~jonlong\u002Flong_shelhamer_fcn.pdf)|Deep CNN based segmentation model trained end-to-end, pixel-to-pixel that produces efficient inference and learning. Built off of AlexNet, VGG net, GoogLeNet classification methods. \u003Cbr>[contribute](contribute.md)| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FFCN) |\n\u003Chr>\n\n### Body, Face & Gesture Analysis \u003Ca name=\"body_analysis\"\u002F>\nFace detection models identify and\u002For recognize human faces and emotions in given images. Body and Gesture Analysis models identify gender and age in given image.\n\n|Model Class |Reference |Description |Hugging Face Spaces |\n|-|-|-|-|\n|\u003Cb>[ArcFace](validated\u002Fvision\u002Fbody_analysis\u002Farcface)\u003C\u002Fb>|[Deng et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.07698)|A CNN based model for face recognition which learns discriminative features of faces and produces embeddings for input face images.| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FArcFace) |\n|\u003Cb>[UltraFace](validated\u002Fvision\u002Fbody_analysis\u002Fultraface)\u003C\u002Fb>|[Ultra-lightweight face detection model](https:\u002F\u002Fgithub.com\u002FLinzaer\u002FUltra-Light-Fast-Generic-Face-Detector-1MB)|This model is a lightweight facedetection model designed for edge computing devices.| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002Fultraface) |\n|\u003Cb>[Emotion FerPlus](validated\u002Fvision\u002Fbody_analysis\u002Femotion_ferplus)\u003C\u002Fb> |[Barsoum et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1608.01041)\t| Deep CNN for emotion recognition trained on images of faces.|\n|\u003Cb>[Age and Gender Classification using Convolutional Neural Networks](validated\u002Fvision\u002Fbody_analysis\u002Fage_gender)\u003C\u002Fb>| [Rothe et al.](https:\u002F\u002Fdata.vision.ee.ethz.ch\u002Fcvl\u002Fpublications\u002Fpapers\u002Fproceedings\u002Feth_biwi_01229.pdf)\t|This model accurately classifies gender and age even the amount of learning data is limited.|\n\u003Chr>\n\n### Image Manipulation \u003Ca 
name=\"image_manipulation\"\u002F>\nImage manipulation models use neural networks to transform input images to modified output images. Some popular models in this category involve style transfer or enhancing images by increasing resolution.\n\n|Model Class |Reference |Description |Hugging Face Spaces |\n|-|-|-|-|\n|Unpaired Image to Image Translation using Cycle consistent Adversarial Network|[Zhu et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.10593)|The model uses learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. \u003Cbr>[contribute](contribute.md)|\n|\u003Cb>[Super Resolution with sub-pixel CNN](validated\u002Fvision\u002Fsuper_resolution\u002Fsub_pixel_cnn_2016)\u003C\u002Fb> |\t[Shi et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.05158)\t|A deep CNN that uses sub-pixel convolution layers to upscale the input image. | [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002Fsub_pixel_cnn_2016) |\n|\u003Cb>[Fast Neural Style Transfer](validated\u002Fvision\u002Fstyle_transfer\u002Ffast_neural_style)\u003C\u002Fb> |\t[Johnson et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1603.08155)\t|This method uses a loss network pretrained for image classification to define perceptual loss functions that measure perceptual differences in content and style between images. The loss network remains fixed during the training process.|\n\u003Chr>\n\n### Speech & Audio Processing \u003Ca name=\"speech\"\u002F>\nThis class of models uses audio data to train models that can identify voice, generate music, or even read text out loud.\n\n|Model Class |Reference |Description |\n|-|-|-|\n|Speech recognition with deep recurrent neural networks|\t[Graves et al.](https:\u002F\u002Fwww.cs.toronto.edu\u002F~fritz\u002Fabsps\u002FRNN13.pdf)|A RNN model for sequential data for speech recognition. Labels problems where the input-output alignment is unknown\u003Cbr>[contribute](contribute.md)|\n|Deep voice: Real time neural text to speech |\t[Arik et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1702.07825)\t|A DNN model that performs end-to-end neural speech synthesis. Requires fewer parameters and it is faster than other systems. \u003Cbr>[contribute](contribute.md)|\n|Sound Generative models|\t[WaveNet: A Generative Model for Raw Audio ](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.03499)|A CNN model that generates raw audio waveforms. Has predictive distribution for each audio sample. Generates realistic music fragments. 
\u003Cbr>[contribute](contribute.md)|\n\u003Chr>\n\n### Machine Comprehension \u003Ca name=\"machine_comprehension\"\u002F>\nThis subset of natural language processing models that answer questions about a given context paragraph.\n\n|Model Class |Reference |Description |Hugging Face Spaces|\n|-|-|-|-|\n|\u003Cb>[Bidirectional Attention Flow](validated\u002Ftext\u002Fmachine_comprehension\u002Fbidirectional_attention_flow)\u003C\u002Fb>|[Seo et al.](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1611.01603)|A model that answers a query about a given context paragraph.| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FBiDAF) |\n|\u003Cb>[BERT-Squad](validated\u002Ftext\u002Fmachine_comprehension\u002Fbert-squad)\u003C\u002Fb>|[Devlin et al.](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.04805.pdf)|This model answers questions based on the context of the given input paragraph. | [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FBERT-Squad) |\n|\u003Cb>[RoBERTa](validated\u002Ftext\u002Fmachine_comprehension\u002Froberta)\u003C\u002Fb>|[Liu et al.](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.11692.pdf)|A large transformer-based model that predicts sentiment based on given input text.| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FRoBERTa) |\n|\u003Cb>[GPT-2](validated\u002Ftext\u002Fmachine_comprehension\u002Fgpt-2)\u003C\u002Fb>|[Radford et al.](https:\u002F\u002Fd4mucfpksywv.cloudfront.net\u002Fbetter-language-models\u002Flanguage_models_are_unsupervised_multitask_learners.pdf)|A large transformer-based language model that given a sequence of words within some text, predicts the next word. | [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FGPT-2) |\n|\u003Cb>[T5](validated\u002Ftext\u002Fmachine_comprehension\u002Ft5)\u003C\u002Fb>|[Raffel et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.10683)|A large transformer-based language model trained on multiple tasks at once to achieve better semantic understanding of the prompt, capable of sentiment-analysis, question-answering, similarity-detection, translation, summarization, etc. |[![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FT5) |\n\u003Chr>\n\n### Machine Translation \u003Ca name=\"machine_translation\"\u002F>\nThis class of natural language processing models learns how to translate input text to another language.\n\n|Model Class |Reference |Description |\n|-|-|-|\n|Neural Machine Translation by jointly learning to align and translate|\t[Bahdanau et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1409.0473)|Aims to build a single neural network that can be jointly tuned to maximize the translation performance. 
\u003Cbr>[contribute](contribute.md)|\n|Google's Neural Machine Translation System|\t[Wu et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.08144)|This model helps to improve issues faced by the Neural Machine Translation (NMT) systems like parallelism that helps accelerate the final translation speed.\u003Cbr>[contribute](contribute.md)|\n\u003Chr>\n\n### Language Modelling \u003Ca name=\"language_modelling\"\u002F>\nThis subset of natural language processing models learns representations of language from large corpuses of text.\n\n|Model Class |Reference |Description |\n|-|-|-|\n|Deep Neural Network Language Models | [Arisoy et al.](https:\u002F\u002Fpdfs.semanticscholar.org\u002Fa177\u002F45f1d7045636577bcd5d513620df5860e9e5.pdf)|A DNN acoustic model. Used in many natural language technologies. Represents a probability distribution over all possible word strings in a language. \u003Cbr> [contribute](contribute.md)|\n\u003Chr>\n\n### Visual Question Answering & Dialog \u003Ca name=\"visual_qna\"\u002F>\nThis subset of natural language processing models uses input images to answer questions about those images.\n\n|Model Class |Reference |Description |\n|-|-|-|\n|VQA: Visual Question Answering |[Agrawal et al.](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1505.00468v6.pdf)|A model that takes an image and a free-form, open-ended natural language question about the image and outputs a natural-language answer. \u003Cbr>[contribute](contribute.md)|\n|Yin and Yang: Balancing and Answering Binary Visual Questions |[Zhang et al.](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.05099.pdf)|Addresses VQA by converting the question to a tuple that concisely summarizes the visual concept to be detected in the image. Next, if the concept can be found in the image, it provides a “yes” or “no” answer. Its performance matches the traditional VQA approach on unbalanced dataset, and outperforms it on the balanced dataset. \u003Cbr>[contribute](contribute.md)|\n|Making the V in VQA Matter|[Goyal et al.](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1612.00837.pdf)|Balances the VQA dataset by collecting complementary images such that every question is associated with a pair of similar images that result in two different answers to the question, providing a unique interpretable model that provides a counter-example based explanation.  \u003Cbr>[contribute](contribute.md)|\n|Visual Dialog|\t[Das et al.](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.08669)|An AI agent that holds a meaningful dialog with humans in natural, conversational language about visual content. Curates a large-scale Visual Dialog dataset (VisDial). \u003Cbr>[contribute](contribute.md)|\n\u003Chr>\n\n### Other interesting models \u003Ca name=\"others\"\u002F>\nThere are many interesting deep learning models that do not fit into the categories described above. The ONNX team would like to highly encourage users and researchers to [contribute](contribute.md) their models to the growing model zoo.\n\n|Model Class |Reference |Description |\n|-|-|-|\n|Text to Image|\t[Generative Adversarial Text to image Synthesis ](https:\u002F\u002Farxiv.org\u002Fabs\u002F1605.05396)|Effectively bridges the advances in text and image modeling, translating visual concepts from characters to pixels. Generates plausible images of birds and flowers from detailed text descriptions. 
\u003Cbr>[contribute](contribute.md)|\n|Time Series Forecasting|\t[Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks ](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.07015.pdf)|The model extracts short-term local dependency patterns among variables and to discover long-term patterns for time series trends. It helps to predict solar plant energy output, electricity consumption, and traffic jam situations. \u003Cbr>[contribute](contribute.md)|\n|Recommender systems|[DropoutNet: Addressing Cold Start in Recommender Systems](http:\u002F\u002Fwww.cs.toronto.edu\u002F~mvolkovs\u002Fnips2017_deepcf.pdf)|A collaborative filtering method that makes predictions about an individual’s preference based on preference information from other users.\u003Cbr>[contribute](contribute.md)|\n|Collaborative filtering|[Neural Collaborative Filtering](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1708.05031.pdf)|A DNN model based on the interaction between user and item features using matrix factorization. \u003Cbr>[contribute](contribute.md)|\n|Autoencoders|[A Hierarchical Neural Autoencoder for Paragraphs and Documents](https:\u002F\u002Farxiv.org\u002Fabs\u002F1506.01057)|An LSTM (long-short term memory) auto-encoder to preserve and reconstruct multi-sentence paragraphs.\u003Cbr>[contribute](contribute.md)|\n\u003Chr>\n\n## Usage \u003Ca name=\"usage-\"\u002F>\n\nEvery ONNX backend should support running the models out of the box. After downloading and extracting the tarball of each model, you will find:\n\n- A protobuf file `model.onnx` that represents the serialized ONNX model.\n- Test data (in the form of serialized protobuf TensorProto files or serialized NumPy archives).\n\n### Usage - Test data starter code\n\nThe test data files can be used to validate ONNX models from the Model Zoo. We have provided the following interface examples for you to get started. 
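For instance, with ONNX Runtime as the backend (one possible choice, installed separately via `pip install onnxruntime`), a minimal end-to-end check of a downloaded model against its bundled `test_data_set_0` directory might look like the following sketch; the backend-agnostic starter snippets follow below.

```python
# A concrete sketch of the validation flow, with ONNX Runtime standing in for
# the generic onnx_backend placeholder used below; assumes the extracted
# archive provides model.onnx and a test_data_set_0 directory.
import glob
import os

import numpy as np
import onnx
import onnxruntime as ort
from onnx import numpy_helper


def load_pb_tensors(pattern):
    """Parse serialized TensorProto (.pb) files matching pattern into numpy arrays."""
    arrays = []
    for path in sorted(glob.glob(pattern)):
        tensor = onnx.TensorProto()
        with open(path, "rb") as f:
            tensor.ParseFromString(f.read())
        arrays.append(numpy_helper.to_array(tensor))
    return arrays


test_data_dir = "test_data_set_0"
inputs = load_pb_tensors(os.path.join(test_data_dir, "input_*.pb"))
ref_outputs = load_pb_tensors(os.path.join(test_data_dir, "output_*.pb"))

# Run the model and compare against the bundled reference outputs.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_names = [inp.name for inp in session.get_inputs()]
outputs = session.run(None, dict(zip(input_names, inputs)))

for ref, out in zip(ref_outputs, outputs):
    np.testing.assert_allclose(ref, out, rtol=1e-3, atol=1e-5)
print("Model outputs match the bundled reference data.")
```
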
Please replace `onnx_backend` in your code with the appropriate framework of your choice that provides ONNX inferencing support, and likewise replace `backend.run_model` with the framework's model evaluation logic.\n\nThere are two different formats for the test data files:\n\n- Serialized protobuf TensorProtos (.pb), stored in folders with the naming convention `test_data_set_*`.\n\n```python\nimport numpy as np\nimport onnx\nimport os\nimport glob\nimport onnx_backend as backend\n\nfrom onnx import numpy_helper\n\nmodel = onnx.load('model.onnx')\ntest_data_dir = 'test_data_set_0'\n\n# Load inputs\ninputs = []\ninputs_num = len(glob.glob(os.path.join(test_data_dir, 'input_*.pb')))\nfor i in range(inputs_num):\n    input_file = os.path.join(test_data_dir, 'input_{}.pb'.format(i))\n    tensor = onnx.TensorProto()\n    with open(input_file, 'rb') as f:\n        tensor.ParseFromString(f.read())\n    inputs.append(numpy_helper.to_array(tensor))\n\n# Load reference outputs\nref_outputs = []\nref_outputs_num = len(glob.glob(os.path.join(test_data_dir, 'output_*.pb')))\nfor i in range(ref_outputs_num):\n    output_file = os.path.join(test_data_dir, 'output_{}.pb'.format(i))\n    tensor = onnx.TensorProto()\n    with open(output_file, 'rb') as f:\n        tensor.ParseFromString(f.read())\n    ref_outputs.append(numpy_helper.to_array(tensor))\n\n# Run the model on the backend\noutputs = list(backend.run_model(model, inputs))\n\n# Compare the results with reference outputs.\nfor ref_o, o in zip(ref_outputs, outputs):\n    np.testing.assert_almost_equal(ref_o, o)\n```\n\n- Serialized Numpy archives, stored in files with the naming convention `test_data_*.npz`. Each file contains one set of test inputs and outputs.\n\n```python\nimport numpy as np\nimport onnx\nimport onnx_backend as backend\n\n# Load the model and sample inputs and outputs\nmodel = onnx.load(model_pb_path)\nsample = np.load(npz_path, encoding='bytes')\ninputs = list(sample['inputs'])\noutputs = list(sample['outputs'])\n\n# Run the model with an onnx backend and verify the results\nnp.testing.assert_almost_equal(outputs, backend.run_model(model, inputs))\n```\n\n### Usage - Model quantization\nYou can get quantized ONNX models by using [Intel® Neural Compressor](https:\u002F\u002Fgithub.com\u002Fintel\u002Fneural-compressor). It provides web-based UI service to make quantization easier and supports code-based usage for more abundant quantization settings. Refer to [bench document](https:\u002F\u002Fgithub.com\u002Fintel\u002Fneural-compressor\u002Fblob\u002Fmaster\u002Fdocs\u002Fbench.md) for how to use web-based UI service and [example document](.\u002Fresource\u002Fdocs\u002FINC_code.md) for a simple code-based demo.\n![image](.\u002Fresource\u002Fimages\u002FINC_GUI.gif)\n\n## Usage\n\nThere are multiple ways to access the ONNX Model Zoo:\n\n### Git Clone (Not Recommended)\n\nCloning the repository using git won't automatically download the ONNX models due to their size. To manage these files, first, install Git LFS by running:\n\n```bash\npip install git-lfs\n```\n\nTo download a specific model:\n\n```bash\ngit lfs pull --include=\"[path to model].onnx\" --exclude=\"\"\n```\n\nTo download all models:\n\n```bash\ngit lfs pull --include=\"*\" --exclude=\"\"\n```\n\n### GitHub UI\n\nAlternatively, you can download models directly from GitHub. 
Navigate to the model's page and click the \"Download\" button on the top right corner.\n\n## Model Visualization\n\nFor a graphical representation of each model's architecture, we recommend using [Netron](https:\u002F\u002Fgithub.com\u002Flutzroeder\u002Fnetron).\n\n## Contributions\n\nContributions to the ONNX Model Zoo are welcome! Please check our [contribution guidelines](contribute.md) for more information on how you can contribute to the growth and improvement of this resource.\n\nThank you for your interest in the ONNX Model Zoo, and we look forward to your participation in our community!\n\n# License\n\n[Apache License v2.0](LICENSE)\n","\u003C!--- SPDX-License-Identifier: Apache-2.0 -->\n\n> **弃用通知**：我们衷心感谢社区对 ONNX Model Zoo 项目的参与和支持。随着机器学习生态系统的不断发展，许多新颖的模型共享已成功迁移至 Hugging Face 平台，该平台目前保持着活跃且健康的态势。我们仅将 ONNX Model Zoo 仓库保留用于历史记录目的。请注意，自 2025 年 7 月 1 日起，模型将不再可通过 LFS 下载。您仍可访问原本在此仓库中提供的模型，请前往 https:\u002F\u002Fhuggingface.co\u002Fonnxmodelzoo。\n\n# ONNX 模型库\n\n\n## 简介\n\n欢迎来到 ONNX 模型库！开放神经网络交换格式（ONNX）是一种开放标准格式，旨在表示机器学习模型。在强大的合作伙伴社区支持下，ONNX 定义了一组通用算子和一种通用文件格式，使 AI 开发人员能够在多种框架、工具、运行时和编译器之间使用模型。\n\n本仓库是一个精选的预训练、最先进的 ONNX 格式模型集合。这些模型来自知名的开源项目，并由多元化的社区成员贡献。我们的目标是促进机器学习模型在更广泛的开发者、研究人员和爱好者群体中的传播与应用。\n\n由于 ONNX 模型文件可能较大，我们使用 Git LFS（大文件存储）来管理这些文件。\n\n## 模型\n\n目前，我们正在通过纳入以下类别中的更多模型来扩展 ONNX 模型库。鉴于我们正在严格验证新模型的准确性，请参阅下方已成功通过准确性验证的[已验证模型]：\n\n- 计算机视觉\n- 自然语言处理（NLP）\n- 生成式 AI\n- 图机器学习\n\n这些模型来源于诸如 [timm](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models)、[torchvision](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fvision)、[torch_hub](https:\u002F\u002Fpytorch.org\u002Fhub\u002F) 和 [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) 等知名开源项目，并使用开源的 [TurnkeyML 工具链](https:\u002F\u002Fgithub.com\u002Fonnx\u002Fturnkeyml) 导出为 ONNX 格式。\n\n\n## 已验证模型\n\n#### 视觉\n* [图像分类](#image_classification)\n* [目标检测与图像分割](#object_detection)\n* [人体、人脸与手势分析](#body_analysis)\n* [图像处理](#image_manipulation)\n\n#### 语言\n* [机器阅读理解](#machine_comprehension)\n* [机器翻译](#machine_translation)\n* [语言建模](#language_modelling)\n\n#### 其他\n* [视觉问答与对话](#visual_qna)\n* [语音与音频处理](#speech)\n* [其他有趣模型](#others)\n\n请阅读下方的[使用说明](#usage-)部分，了解更多关于 ONNX 模型库中文件格式（.onnx、.pb、.npz）、通过[Git LFS 命令行](#gitlfs-)下载多个 ONNX 模型，以及使用测试数据验证 ONNX 模型的入门 Python 代码的信息。\n\nINT8 模型由 [Intel® Neural Compressor](https:\u002F\u002Fgithub.com\u002Fintel\u002Fneural-compressor) 生成。[Intel® Neural Compressor](https:\u002F\u002Fgithub.com\u002Fintel\u002Fneural-compressor) 是一个开源的 Python 库，支持自动的精度驱动调优策略，帮助用户快速找到最佳量化模型。它为 ONNX 模型实现了动态和静态量化，并能以算子导向和张量导向（QDQ）两种方式表示量化后的 ONNX 模型。用户可以通过基于 Web 的 UI 服务或 Python 代码进行量化操作。更多详情请参阅[简介](https:\u002F\u002Fgithub.com\u002Fintel\u002Fneural-compressor\u002Fblob\u002Fmaster\u002FREADME.md)。\n\n### 图像分类 \u003Ca name=\"image_classification\"\u002F>\n这一系列模型以图像作为输入，然后将图像中的主要物体分类到1000个类别中，例如键盘、鼠标、铅笔以及许多动物。\n\n|模型类别 |参考文献 |描述 |Huggingface Spaces|\n|-|-|-|-|\n|\u003Cb>MobileNet\u003C\u002Fb>|[Sandler等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.04381)|轻量级深度神经网络，最适合移动和嵌入式视觉应用。\u003Cbr>论文中的Top-5错误率约为10%|\n|\u003Cb>ResNet\u003C\u002Fb>|[He等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.03385)|一种CNN模型（最多152层）。使用捷径连接在图像分类时达到更高的准确率。\u003Cbr>论文中的Top-5错误率约为3.6%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FResNet) 
|\n|\u003Cb>SqueezeNet\u003C\u002Fb>|[Iandola等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1602.07360)|一种轻量级CNN模型，在参数数量减少50倍的情况下仍能提供AlexNet级别的准确率。\u003Cbr>论文中的Top-5错误率约为20%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FSqueezeNet) |\n|\u003Cb>VGG\u003C\u002Fb>|[Simonyan等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1409.1556)|深度CNN模型（最多19层）。与AlexNet类似，但使用多个较小的卷积核，从而在图像分类时提供更高的准确率。\u003Cbr>论文中的Top-5错误率约为8%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FVGG) |\n|\u003Cb>AlexNet\u003C\u002Fb>|[Krizhevsky等](https:\u002F\u002Fpapers.nips.cc\u002Fpaper\u002F4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)|一种深度CNN模型（最多8层），输入为图像，输出为一个包含1000个数字的向量。\u003Cbr>论文中的Top-5错误率约为15%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FAlexNet) |\n|\u003Cb>GoogleNet\u003C\u002Fb>|[Szegedy等](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1409.4842.pdf)|深度CNN模型（最多22层）。相比VGG更小、更快，而在细节刻画上又比AlexNet更准确。\u003Cbr>论文中的Top-5错误率约为6.7%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FGoogleNet) |\n|\u003Cb>CaffeNet\u003C\u002Fb>|[Krizhevsky等]( https:\u002F\u002Fucb-icsi-vision-group.github.io\u002Fcaffe-paper\u002Fcaffe.pdf)|AlexNet的深度CNN变体，用于Caffe中的图像分类，其中最大池化先于局部响应归一化（LRN），从而使LRN占用更少的计算资源和内存。| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FCaffeNet) |\n|\u003Cb>RCNN_ILSVRC13\u003C\u002Fb>|[Girshick等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1311.2524)|R-CNN的纯Caffe实现，用于图像分类。该模型利用区域定位来对图像进行分类并提取特征。|\n|\u003Cb>DenseNet-121\u003C\u002Fb>|[Huang等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1608.06993)|一种每层都与其他所有层相连的模型，能够传递自身的特征，从而提供更强的梯度流动和更丰富的特征。\u003Cbr>Top-5错误率约为6.7%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FDenseNet-121) |\n|\u003Cb>Inception_V1\u003C\u002Fb>|[Szegedy等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1409.4842)|该模型与GoogLeNet相同，通过Caffe2实现，改进了网络内部计算资源的利用率，并有助于缓解梯度消失问题。\u003Cbr>论文中的Top-5错误率约为6.7%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FInception_v1) |\n|\u003Cb>Inception_V2\u003C\u002Fb>|[Szegedy等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.00567)|一种针对图像分类的深度CNN模型，是Inception v1的改进版本，加入了批量归一化。与Inception 
v1相比，该模型降低了计算成本并提高了图像分辨率。\u003Cbr>论文中的Top-5错误率约为4.82%|\n|\u003Cb>ShuffleNet_V1\u003C\u002Fb>|[Zhang等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1707.01083)|一种极其高效的CNN模型，专为移动设备设计。该模型大大减少了计算成本，在基于ARM的移动设备上比AlexNet快约13倍。与MobileNet相比，ShuffleNet凭借其高效的结构取得了显著的优势。\u003Cbr>论文中的Top-1错误率约为32.6%|\n|\u003Cb>ShuffleNet_V2\u003C\u002Fb>|[Zhang等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1807.11164)|一种极其高效的CNN模型，专为移动设备设计。该网络架构设计考虑的是直接指标，如速度，而不是间接指标，如FLOP。\u003Cbr>论文中的Top-1错误率约为30.6%|\n|\u003Cb>ZFNet-512\u003C\u002Fb>|[Zeiler等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1311.2901)|一种深度CNN模型（最多8层），增加了网络能够检测的特征数量，从而帮助在网络分辨率更高的情况下提取图像特征。\u003Cbr>论文中的Top-5错误率约为14.3%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FZFNet-512) |\n|\u003Cb>EfficientNet-Lite4\u003C\u002Fb>|[Tan等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1905.11946)|一种计算和参数数量都大幅减少的CNN模型，同时仍能达到最先进的准确率，并且比之前的ConvNet效率更高。\u003Cbr>论文中的Top-5错误率约为2.9%| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FEfficientNet-Lite4) |\n\u003Chr>\n\n#### 领域特定图像分类 \u003Ca name=\"domain_based_image\"\u002F>\n这一子集的模型针对特定领域和数据集对图像进行分类。\n\n|模型类别 |参考文献 |描述 |\n|-|-|-|\n|\u003Cb>MNIST手写数字识别\u003C\u002Fb>|[带有MNIST的卷积神经网络](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002FCNTK\u002Fblob\u002Fmaster\u002FTutorials\u002FCNTK_103D_MNIST_ConvolutionalNeuralNetwork.ipynb)\t|用于手写数字识别的深度CNN模型|\n\u003Chr>\n\n### 目标检测与图像分割 \u003Ca name=\"object_detection\"\u002F>\n目标检测模型用于检测图像中是否存在多个对象，并分割出检测到对象的区域。语义分割模型则通过为每个像素分配预定义的类别标签，将输入图像划分为不同的区域。\n\n|模型类别 |参考文献 |描述 |Hugging Face Spaces |\n|-|-|-|-|\n|\u003Cb>[Tiny YOLOv2](validated\u002Fvision\u002Fobject_detection_segmentation\u002Ftiny-yolov2)\u003C\u002Fb>|[Redmon等](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1612.08242.pdf)|一种实时目标检测CNN，可检测20个不同类别。它是更复杂的完整YOLOv2网络的一个较小版本。|\n|\u003Cb>[SSD](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fssd)\u003C\u002Fb>|[Liu等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1512.02325)|单阶段检测器：一种实时目标检测CNN，可检测80个不同类别。|\n|\u003Cb>[SSD-MobileNetV1](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fssd-mobilenetv1)\u003C\u002Fb>|[Howard等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1704.04861)|MobileNet的一种变体，使用单次检测器（SSD）模型框架。该模型可检测80个不同物体类别，并在一张图像中定位最多10个物体。|\n|\u003Cb>[Faster-RCNN](validated\u002Fvision\u002Fobject_detection_segmentation\u002Ffaster-rcnn)\u003C\u002Fb>|[Ren等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1506.01497)|通过将RPN与CNN连接，形成一个统一的目标检测网络，从而提高了R-CNN的效率，可检测80个不同类别。| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002Ffaster-rcnn) |\n|\u003Cb>[Mask-RCNN](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fmask-rcnn)\u003C\u002Fb>|[He等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.06870)|一种实时目标实例分割神经网络，可检测80个不同类别。它扩展了Faster R-CNN，对选出的300个ROI分别进行三个并行分支的处理：类别预测、边界框预测和掩码预测。| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002Fmask-rcnn) 
|\n|\u003Cb>[RetinaNet](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fretinanet)\u003C\u002Fb>|[Lin等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1708.02002)|一种实时密集型目标检测网络，通过焦点损失解决类别不平衡问题。RetinaNet能够达到之前单阶段检测器的速度，并在双阶段检测器中树立了新的标杆（超越R-CNN）。|\n|\u003Cb>[YOLO v2-coco](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fyolov2-coco)\u003C\u002Fb>|[Redmon等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1612.08242)|一种用于实时目标检测系统的CNN模型，可检测超过9000个物体类别。它采用单次网络评估，速度比R-CNN快1000多倍，比Faster R-CNN快100倍。该模型使用COCO数据集训练，包含80个类别。\n|\u003Cb>[YOLO v3](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fyolov3)\u003C\u002Fb>|[Redmon等](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1804.02767.pdf)|一种深度CNN模型，用于实时目标检测，可检测80个不同类别。比YOLOv2稍大，但仍非常快速。准确度与SSD相当，但速度快3倍。|\n|\u003Cb>[Tiny YOLOv3](validated\u002Fvision\u002Fobject_detection_segmentation\u002Ftiny-yolov3)\u003C\u002Fb>|[Redmon等](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1804.02767.pdf)|YOLOv3模型的一个较小版本。|\n|\u003Cb>[YOLOv4](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fyolov4)\u003C\u002Fb>|[Bochkovskiy等](https:\u002F\u002Farxiv.org\u002Fabs\u002F2004.10934)|优化了目标检测的速度和精度。速度是EfficientDet的两倍。它将YOLOv3的AP和FPS分别提高了10%和12%，在COCO 2017数据集上的mAP50为52.32，Tesla V100上的FPS为41.7。| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002Fyolov4) |\n|\u003Cb>[DUC](validated\u002Fvision\u002Fobject_detection_segmentation\u002Fduc)\u003C\u002Fb>|[Wang等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1702.08502)|基于深度CNN的逐像素语义分割模型，mIOU（平均交并比）超过80%。该模型在cityscapes数据集上训练，可有效应用于自动驾驶车辆系统。| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FDUC) |\n|\u003Cb>[FCN](validated\u002Fvision\u002Fobject_detection_segmentation\u002Ffcn)\u003C\u002Fb>|[Long等](https:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~jonlong\u002Flong_shelhamer_fcn.pdf)|一种端到端、逐像素训练的深度CNN分割模型，具有高效的推理和学习能力。基于AlexNet、VGG网络和GoogLeNet分类方法构建。\u003Cbr>[贡献](contribute.md)| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FFCN) |\n\u003Chr>\n\n### 身体、面部及手势分析 \u003Ca name=\"body_analysis\"\u002F>\n面部检测模型用于识别和\u002F或在给定图像中识别人脸及其情感。身体和手势分析模型则用于识别图像中的性别和年龄。\n\n|模型类别 |参考文献 |描述 |Hugging Face Spaces |\n|-|-|-|-|\n|\u003Cb>[ArcFace](validated\u002Fvision\u002Fbody_analysis\u002Farcface)\u003C\u002Fb>|[Deng等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.07698)|一种基于CNN的面部识别模型，能够学习人脸的判别特征，并为输入的面部图像生成嵌入向量。| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FArcFace) |\n|\u003Cb>[UltraFace](validated\u002Fvision\u002Fbody_analysis\u002Fultraface)\u003C\u002Fb>|[超轻量级面部检测模型](https:\u002F\u002Fgithub.com\u002FLinzaer\u002FUltra-Light-Fast-Generic-Face-Detector-1MB)|该模型是一种专为边缘计算设备设计的轻量级面部检测模型。| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002Fultraface) |\n|\u003Cb>[Emotion FerPlus](validated\u002Fvision\u002Fbody_analysis\u002Femotion_ferplus)\u003C\u002Fb> 
|[Barsoum等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1608.01041)\t|一种基于深度CNN的情感识别模型，基于人脸图像进行训练。|\n|\u003Cb>[基于卷积神经网络的年龄与性别分类](validated\u002Fvision\u002Fbody_analysis\u002Fage_gender)\u003C\u002Fb>| [Rothe等](https:\u002F\u002Fdata.vision.ee.ethz.ch\u002Fcvl\u002Fpublications\u002Fpapers\u002Fproceedings\u002Feth_biwi_01229.pdf)\t|该模型即使在训练数据有限的情况下，也能准确地对性别和年龄进行分类。|\n\u003Chr>\n\n### 图像处理 \u003Ca name=\"image_manipulation\"\u002F>\n图像处理模型利用神经网络将输入图像转换为经过修改的输出图像。这一类别中一些流行的模型涉及风格迁移或通过提高分辨率来增强图像。\n\n|模型类别 |参考文献 |描述 |Hugging Face Spaces |\n|-|-|-|-|\n|基于循环一致对抗网络的无配对图像到图像翻译|[Zhu 等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.10593)|该模型在缺乏成对示例的情况下，学习将源域 X 中的图像转换为目标域 Y 中的图像。\u003Cbr>[贡献](contribute.md)|\n|\u003Cb>基于亚像素卷积神经网络的超分辨率\u003C\u002Fb> |\t[Shi 等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.05158)\t|一种使用亚像素卷积层来放大输入图像的深度卷积神经网络。 | [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002Fsub_pixel_cnn_2016) |\n|\u003Cb>快速神经风格迁移\u003C\u002Fb> |\t[Johnson 等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1603.08155)\t|该方法使用一个用于图像分类的预训练损失网络来定义感知损失函数，以衡量图像内容和风格之间的感知差异。在训练过程中，该损失网络保持固定不变。|\n\u003Chr>\n\n### 语音与音频处理 \u003Ca name=\"speech\"\u002F>\n这一类模型使用音频数据来训练能够识别语音、生成音乐，甚至将文本朗读出来的模型。\n\n|模型类别 |参考文献 |描述 |\n|-|-|-|\n|基于深度循环神经网络的语音识别|\t[Graves 等](https:\u002F\u002Fwww.cs.toronto.edu\u002F~fritz\u002Fabsps\u002FRNN13.pdf)|一种用于语音识别的序列数据循环神经网络模型。适用于输入输出对齐未知的问题\u003Cbr>[贡献](contribute.md)|\n|Deep Voice：实时神经文本转语音 |\t[Arik 等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1702.07825)\t|一种执行端到端神经语音合成的深度神经网络模型。所需参数较少，且速度比其他系统更快。\u003Cbr>[贡献](contribute.md)|\n|声音生成模型|\t[WaveNet：原始音频的生成模型 ](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.03499)|一种生成原始音频波形的卷积神经网络模型。对每个音频样本都有预测分布，能生成逼真的音乐片段。\u003Cbr>[贡献](contribute.md)|\n\u003Chr>\n\n### 机器阅读理解 \u003Ca name=\"machine_comprehension\"\u002F>\n这是自然语言处理模型的一个子集，能够根据给定的上下文段落回答问题。\n\n|模型类别 |参考文献 |描述 |Hugging Face Spaces|\n|-|-|-|-|\n|\u003Cb>双向注意力流\u003C\u002Fb>|[Seo 等](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1611.01603)|一种根据给定的上下文段落回答问题的模型。| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FBiDAF) |\n|\u003Cb>BERT-SQuAD\u003C\u002Fb>|[Devlin 等](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1810.04805.pdf)|该模型根据给定输入段落的上下文回答问题。 | [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FBERT-Squad) |\n|\u003Cb>RoBERTa\u003C\u002Fb>|[Liu 等](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1907.11692.pdf)|一种基于 Transformer 的大型模型，可根据给定的文本预测情感倾向。| [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FRoBERTa) |\n|\u003Cb>GPT-2\u003C\u002Fb>|[Radford 等](https:\u002F\u002Fd4mucfpksywv.cloudfront.net\u002Fbetter-language-models\u002Flanguage_models_are_unsupervised_multitask_learners.pdf)|一种基于 Transformer 的大型语言模型，给定一段文本中的词序列，可预测下一个词。 | [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FGPT-2) |\n|\u003Cb>T5\u003C\u002Fb>|[Raffel 等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1910.10683)|一种基于 Transformer 的大型语言模型，同时在多个任务上进行训练，以更好地理解提示的语义，能够进行情感分析、问答、相似度检测、翻译、摘要等任务。 |[![Hugging Face 
Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fonnx\u002FT5) |\n\u003Chr>\n\n### 机器翻译 \u003Ca name=\"machine_translation\"\u002F>\n这类自然语言处理模型学习如何将输入文本翻译成另一种语言。\n\n|模型类别 |参考文献 |描述 |\n|-|-|-|\n|通过联合学习对齐与翻译实现神经机器翻译|\t[Bahdanau 等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1409.0473)|旨在构建一个可以联合调优以最大化翻译性能的单一神经网络。\u003Cbr>[贡献](contribute.md)|\n|谷歌的神经机器翻译系统|\t[Wu 等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1609.08144)|该模型有助于改善神经机器翻译（NMT）系统面临的问题，例如并行化，从而加快最终的翻译速度。\u003Cbr>[贡献](contribute.md)|\n\u003Chr>\n\n### 语言建模 \u003Ca name=\"language_modelling\"\u002F>\n这是自然语言处理模型的一个子集，从大规模文本语料库中学习语言表示。\n\n|模型类别 |参考文献 |描述 |\n|-|-|-|\n|深度神经网络语言模型 | [Arisoy 等](https:\u002F\u002Fpdfs.semanticscholar.org\u002Fa177\u002F45f1d7045636577bcd5d513620df5860e9e5.pdf)|一种深度神经网络声学模型。广泛应用于多种自然语言技术中。它表示语言中所有可能词串的概率分布。\u003Cbr> [贡献](contribute.md)|\n\u003Chr>\n\n### 视觉问答与对话 \u003Ca name=\"visual_qna\"\u002F>\n这一自然语言处理模型子集使用输入图像来回答关于这些图像的问题。\n\n|模型类别 |参考文献 |描述 |\n|-|-|-|\n|VQA：视觉问答 |[Agrawal 等](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1505.00468v6.pdf)|该模型接收一张图像以及一个关于该图像的自由形式、开放式自然语言问题，并输出一个自然语言答案。\u003Cbr>[贡献](contribute.md)|\n|阴阳：平衡与回答二元视觉问题 |[Zhang 等](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1511.05099.pdf)|通过将问题转换为一个简洁总结图像中待检测视觉概念的元组来解决 VQA 问题。随后，如果图像中能找到该概念，则给出“是”或“否”的答案。其在不平衡数据集上的表现与传统 VQA 方法相当，在平衡数据集上则优于后者。\u003Cbr>[贡献](contribute.md)|\n|让 VQA 中的“V”更有意义|[Goyal 等](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1612.00837.pdf)|通过收集互补图像来平衡 VQA 数据集，使得每个问题都对应一对相似但会产生不同答案的图像，从而提供一种独特的可解释模型，基于反例进行解释。\u003Cbr>[贡献](contribute.md)|\n|视觉对话|\t[Das 等](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.08669)|一个能够以自然、会话式的语言与人类就视觉内容进行有意义对话的 AI 代理。整理了一个大规模的视觉对话数据集（VisDial）。\u003Cbr>[贡献](contribute.md)|\n\u003Chr>\n\n### 其他有趣的模型 \u003Ca name=\"others\"\u002F>\n有许多有趣的深度学习模型并不符合上述分类。ONNX 团队非常鼓励用户和研究人员将他们的模型 [贡献](contribute.md) 到不断增长的模型库中。\n\n|模型类别 |参考文献 |描述 |\n|-|-|-|\n|文本到图像|\t[生成对抗网络文本到图像合成 ](https:\u002F\u002Farxiv.org\u002Fabs\u002F1605.05396)|有效地结合了文本和图像建模领域的进展，将视觉概念从文字转化为像素。根据详细的文本描述生成逼真的鸟类和花卉图像。\u003Cbr>[贡献](contribute.md)|\n|时间序列预测|\t[利用深度神经网络建模长期和短期时间模式 ](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1703.07015.pdf)|该模型提取变量之间的短期局部依赖模式，并发现时间序列趋势中的长期模式。有助于预测太阳能电站的发电量、电力消耗以及交通拥堵情况。\u003Cbr>[贡献](contribute.md)|\n|推荐系统|[DropoutNet：解决推荐系统中的冷启动问题](http:\u002F\u002Fwww.cs.toronto.edu\u002F~mvolkovs\u002Fnips2017_deepcf.pdf)|一种协同过滤方法，基于其他用户的偏好信息来预测个人的偏好。\u003Cbr>[贡献](contribute.md)|\n|协同过滤|[神经协同过滤](https:\u002F\u002Farxiv.org\u002Fpdf\u002F1708.05031.pdf)|一种基于矩阵分解的 DNN 模型，利用用户和物品特征之间的交互作用。\u003Cbr>[贡献](contribute.md)|\n|自编码器|[用于段落和文档的层次化神经自编码器](https:\u002F\u002Farxiv.org\u002Fabs\u002F1506.01057)|一种 LSTM（长短期记忆）自编码器，用于保存和重建多句段落。\u003Cbr>[贡献](contribute.md)|\n\u003Chr>\n\n## 使用 \u003Ca name=\"usage-\"\u002F>\n\n每个 ONNX 后端都应该支持开箱即用地运行这些模型。下载并解压每个模型的 tarball 文件后，您会发现：\n\n- 一个表示序列化 ONNX 模型的 protobuf 文件 `model.onnx`。\n- 测试数据（以序列化的 protobuf TensorProto 文件或序列化的 NumPy 归档文件的形式）。\n\n### 使用 - 测试数据入门代码\n\n测试数据文件可用于验证来自模型库的 ONNX 模型。我们提供了以下接口示例供您开始使用。请将代码中的 `onnx_backend` 替换为您选择的、支持 ONNX 推理的相应框架，并将 `backend.run_model` 替换为该框架的模型评估逻辑。\n\n测试数据文件有两种不同的格式：\n\n- 序列化的 protobuf TensorProtos (.pb)，存储在以 `test_data_set_*` 命名的文件夹中。\n\n```python\nimport numpy as np\nimport onnx\nimport os\nimport glob\nimport onnx_backend as backend\n\nfrom onnx import numpy_helper\n\nmodel = onnx.load('model.onnx')\ntest_data_dir = 'test_data_set_0'\n\n# 加载输入\ninputs = []\ninputs_num = len(glob.glob(os.path.join(test_data_dir, 'input_*.pb')))\nfor i in range(inputs_num):\n    input_file = 
os.path.join(test_data_dir, 'input_{}.pb'.format(i))\n    tensor = onnx.TensorProto()\n    with open(input_file, 'rb') as f:\n        tensor.ParseFromString(f.read())\n    inputs.append(numpy_helper.to_array(tensor))\n\n# 加载参考输出\nref_outputs = []\nref_outputs_num = len(glob.glob(os.path.join(test_data_dir, 'output_*.pb')))\nfor i in range(ref_outputs_num):\n    output_file = os.path.join(test_data_dir, 'output_{}.pb'.format(i))\n    tensor = onnx.TensorProto()\n    with open(output_file, 'rb') as f:\n        tensor.ParseFromString(f.read())\n    ref_outputs.append(numpy_helper.to_array(tensor))\n\n# 在后端运行模型\noutputs = list(backend.run_model(model, inputs))\n\n# 将结果与参考输出进行比较。\nfor ref_o, o in zip(ref_outputs, outputs):\n    np.testing.assert_almost_equal(ref_o, o)\n```\n\n- 序列化的 Numpy 归档文件，以 `test_data_*.npz` 的命名方式存储。每个文件包含一组测试输入和输出。\n\n```python\nimport numpy as np\nimport onnx\nimport onnx_backend as backend\n\n# 加载模型以及样本输入和输出\nmodel = onnx.load(model_pb_path)\nsample = np.load(npz_path, encoding='bytes')\ninputs = list(sample['inputs'])\noutputs = list(sample['outputs'])\n\n# 使用 ONNX 后端运行模型并验证结果\nnp.testing.assert_almost_equal(outputs, backend.run_model(model, inputs))\n```\n\n### 使用 - 模型量化\n您可以使用 [Intel® Neural Compressor](https:\u002F\u002Fgithub.com\u002Fintel\u002Fneural-compressor) 获取量化后的 ONNX 模型。它提供基于 Web 的 UI 服务，使量化更加简便，并支持基于代码的使用方式以实现更丰富的量化设置。有关如何使用基于 Web 的 UI 服务，请参阅 [bench 文档](https:\u002F\u002Fgithub.com\u002Fintel\u002Fneural-compressor\u002Fblob\u002Fmaster\u002Fdocs\u002Fbench.md)，有关简单的基于代码的演示，请参阅 [示例文档](.\u002Fresource\u002Fdocs\u002FINC_code.md)。\n![image](.\u002Fresource\u002Fimages\u002FINC_GUI.gif)\n\n## 获取模型\n\n访问 ONNX 模型库有多种方式：\n\n### Git 克隆（不推荐）\n\n使用 `git` 克隆仓库时，由于 ONNX 模型文件体积较大，不会自动下载这些模型。要管理这些文件，首先需要安装 Git LFS，运行以下命令：\n\n```bash\npip install git-lfs\n```\n\n要下载特定模型：\n\n```bash\ngit lfs pull --include=\"[模型路径].onnx\" --exclude=\"\"\n```\n\n要下载所有模型：\n\n```bash\ngit lfs pull --include=\"*\" --exclude=\"\"\n```\n\n### GitHub 网页界面\n\n此外，您也可以直接从 GitHub 下载模型。导航到相应模型的页面，然后点击右上角的“Download”按钮。\n\n## 模型可视化\n\n为了以图形化方式展示每个模型的架构，我们推荐使用 [Netron](https:\u002F\u002Fgithub.com\u002Flutzroeder\u002Fnetron)。\n\n## 贡献\n\n欢迎为 ONNX 模型库做出贡献！请查看我们的[贡献指南](contribute.md)，了解如何参与本资源的建设与改进。\n\n感谢您对 ONNX 模型库的关注，我们期待您的加入！\n\n# 许可证\n\n[Apache License v2.0](LICENSE)","# ONNX Model Zoo 快速上手指南\n\n> **重要提示**：ONNX Model Zoo 仓库目前已进入归档维护状态，不再新增模型。自 2025 年 7 月 1 日起，将停止通过 Git LFS 提供模型下载。\n> *   **推荐方案**：请访问 **[Hugging Face ONNX Model Zoo](https:\u002F\u002Fhuggingface.co\u002Fonnxmodelzoo)** 获取所有历史及最新模型。\n> *   **国内加速**：中国开发者建议使用 [Hugging Face 镜像站](https:\u002F\u002Fhf-mirror.com\u002F) 或配置 `HF_ENDPOINT` 环境变量以加速下载。\n\n本指南旨在帮助开发者快速获取并使用仓库中经过验证的预训练 ONNX 模型（如 ResNet, YOLO, BERT 等）。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Linux, macOS 或 Windows\n*   **Python**：版本 3.8 及以上\n*   **Git & Git LFS**：必须安装 Git Large File Storage (LFS) 以拉取大型模型文件\n*   **核心依赖**：\n    *   `onnx`: 用于加载和验证模型结构\n    *   `onnxruntime`: 用于推理执行\n    *   `numpy`: 用于数据处理\n\n### 安装前置依赖\n\n```bash\n# 安装 Git LFS (以 Ubuntu\u002FDebian 为例，其他系统请参考官方文档)\nsudo apt-get install git-lfs\ngit lfs install\n\n# 创建虚拟环境并安装 Python 依赖\npython -m venv onnx-env\nsource onnx-env\u002Fbin\u002Factivate  # Windows 用户请使用: onnx-env\\Scripts\\activate\n\n# 安装核心库 (推荐使用国内镜像源加速)\npip install onnx onnxruntime numpy -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 安装步骤 (获取模型)\n\n由于原仓库已停止 LFS 服务，**强烈建议直接从 Hugging Face 下载模型**。\n\n### 方法一：使用 Hugging Face CLI (推荐)\n\n```bash\n# 安装 huggingface hub 工具\npip install huggingface_hub -i 
https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n\n# 设置国内镜像加速 (可选但推荐)\nexport HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com\n\n# 下载特定模型 (例如 ResNet-50)\n# 注意：将 \u003Cmodel_path> 替换为 Hugging Face 上的具体模型路径\nhuggingface-cli download --repo-type model onnxmodelzoo\u002Fresnet-50 --local-dir .\u002Fmodels\u002Fresnet-50\n```\n\n### 方法二：直接下载 .onnx 文件\n\n如果您只需单个模型文件，可直接在 [Hugging Face ONNX Model Zoo](https:\u002F\u002Fhuggingface.co\u002Fonnxmodelzoo) 页面找到对应的 `.onnx` 文件并下载，或使用 `wget`\u002F`curl`：\n\n```bash\n# 示例：下载 ResNet-50 模型 (链接仅为示例，请以官网最新链接为准)\nwget https:\u002F\u002Fhuggingface.co\u002Fonnxmodelzoo\u002Fresnet-50\u002Fresolve\u002Fmain\u002Fresnet-50.onnx\n```\n\n## 基本使用\n\n以下是一个最简单的 Python 示例，展示如何加载一个图像分类模型（如 ResNet）并进行推理。\n\n### 1. 准备测试数据\n确保你有一张测试图片（例如 `test.jpg`），或者使用随机生成的符合模型输入形状的数据。\n\n### 2. 运行推理代码\n\n创建一个名为 `infer.py` 的文件，写入以下代码：\n\n```python\nimport onnx\nimport onnxruntime as ort\nimport numpy as np\nfrom PIL import Image\n\n# 1. 加载模型\nmodel_path = \"resnet-50.onnx\"  # 替换为你下载的模型路径\nsession = ort.InferenceSession(model_path)\n\n# 2. 获取模型输入信息\ninput_name = session.get_inputs()[0].name\ninput_shape = session.get_inputs()[0].shape\n# 假设形状为 [batch, channel, height, width]，去除 batch 维度处理单张图片\nheight, width = input_shape[2], input_shape[3]\n\n# 3. 预处理图片 (以 ResNet 为例，需调整大小并归一化)\n# 实际使用时请根据具体模型的预处理要求调整\nimage = Image.open(\"test.jpg\").convert(\"RGB\")\nimage = image.resize((width, height))\nimage_data = np.array(image).astype(np.float32)\nimage_data = np.transpose(image_data, (2, 0, 1)) \u002F 255.0  # HWC -> CHW & Normalize\nimage_data = np.expand_dims(image_data, axis=0)  # Add batch dimension\n\n# 4. 执行推理\noutputs = session.run(None, {input_name: image_data})\n\n# 5. 处理结果\nresult = outputs[0]\nprint(f\"输出形状：{result.shape}\")\nprint(f\"预测分数前 5 类索引：{np.argsort(result[0])[-5:][::-1]}\")\n```\n\n### 3. 运行脚本\n\n```bash\npython infer.py\n```\n\n如果一切正常，终端将输出模型的预测结果索引。您可以结合 ImageNet 标签表将这些索引转换为具体的类别名称（如 \"cat\", \"dog\" 等）。","某边缘计算团队正致力于将先进的图像识别算法部署到资源受限的工业质检摄像头中，以实现实时缺陷检测。\n\n### 没有 models 时\n- **框架绑定严重**：团队使用的模型多基于 PyTorch 或 TensorFlow 训练，而目标硬件仅支持特定推理引擎，导致跨框架迁移需重写大量代码甚至重新训练。\n- **格式转换繁琐**：缺乏现成的标准化模型，开发人员需手动导出并调试 ONNX 格式，常因算子不兼容导致转换失败，耗费数天排查环境差异。\n- **验证成本高昂**：自行转换的模型缺乏权威精度验证，必须在生产环境中反复测试才能确认效果，极大拖慢了从原型到落地的周期。\n- **量化优化困难**：为了让模型在低算力设备上流畅运行，团队需从零研究 INT8 量化策略，难以快速找到精度与速度的最佳平衡点。\n\n### 使用 models 后\n- **即插即用部署**：直接下载 models 库中预置的、已转换为 ONNX 格式的 ResNet 或 YOLO 等_state-of-the-art_模型，无缝对接各类推理后端，消除框架壁垒。\n- **开箱即用体验**：获取经过严格准确性验证的模型文件，跳过复杂的导出与调试环节，将原本数天的环境适配工作缩短至几小时。\n- **可信基准参考**：利用库中提供的验证脚本和测试数据，快速确认模型在目标场景下的表现，确保上线前的性能指标可靠可控。\n- **高效量化支持**：直接复用由 Intel Neural Compressor 生成的 INT8 量化模型，在几乎不损失精度的前提下显著提升边缘设备的推理速度。\n\nmodels 通过提供标准化、预验证且优化的模型资产，让开发者从繁琐的格式转换与调优中解放出来，专注于业务逻辑创新与快速落地。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fonnx_models_17c6b964.png","onnx","Open Neural Network Exchange","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fonnx_e2e75b88.png","ONNX is an open ecosystem for interoperable AI models. 
It's a community project: we welcome your contributions!",null,"https:\u002F\u002Fonnx.ai","https:\u002F\u002Fgithub.com\u002Fonnx",[81,85],{"name":82,"color":83,"percentage":84},"Jupyter Notebook","#DA5B0B",97.6,{"name":86,"color":87,"percentage":88},"Python","#3572A5",2.4,9540,1565,"2026-04-16T14:17:50","Apache-2.0","","未说明",{"notes":96,"python":94,"dependencies":97},"该项目为已弃用的 ONNX 模型库（历史存档），模型文件需通过 Git LFS 下载（2025 年 7 月 1 日后将停止 LFS 支持，建议转至 Hugging Face 获取）。部分 INT8 量化模型依赖 Intel® Neural Compressor。README 未明确列出具体的操作系统、GPU、内存或 Python 版本要求，因为该仓库主要存储模型文件而非推理代码，具体运行环境取决于用户使用的推理框架（如 ONNX Runtime）。",[73,98],"Git LFS",[14],[73,64,101,102,103],"download","pretrained","deep-learning","2026-03-27T02:49:30.150509","2026-04-17T10:20:41.593406",[107,112,117,122,127,132],{"id":108,"question_zh":109,"answer_zh":110,"source_url":111},37344,"为什么在 Windows CPU 上运行量化模型（如 VGG16, ResNet50 int8）会失败或产生显著不同的结果？","这是由于硬件差异导致的。量化模型在支持 VNNI 指令集的机器和不支持 VNNI 的机器上运行时，可能会产生显著不同的输出结果。通常认为 VNNI 机器生成的结果更准确可靠。解决方案是：如果您有 VNNI 机器，应使用它来生成测试数据集（test_data_set）并更新模型的预期输出；对于没有 VNNI 的用户，需注意这种硬件差异可能导致的结果偏差。维护者已通过 PR 更新了基于 VNNI 机器的测试结果以消除混淆。","https:\u002F\u002Fgithub.com\u002Fonnx\u002Fmodels\u002Fissues\u002F568",{"id":113,"question_zh":114,"answer_zh":115,"source_url":116},37345,"将 ArcFace (MXNet) 模型转换为 ONNX 时遇到 'Sub' 操作数类型不匹配或 'PRelu' 广播错误怎么办？","这是一个已知问题，通常发生在将 MXNet 的 ArcFace 模型转换为 ONNX 时。错误包括 'Sub' 操作的输入类型不一致（float vs double）以及 'PRelu' 节点的形状无法广播（例如左形状 [1,64,112,112] 与右形状 [64] 不匹配）。官方修复方案参考 Apache MXNet 的 PR #17711。用户在使用 insightface 的 convert_onnx.py 脚本转换时也会遇到此问题，建议检查是否已应用相关的修复补丁，或者等待框架更新以正确导出 PRelu 层。","https:\u002F\u002Fgithub.com\u002Fonnx\u002Fmodels\u002Fissues\u002F91",{"id":118,"question_zh":119,"answer_zh":120,"source_url":121},37346,"将 SSD 模型转换为 TensorRT 时遇到 'Assertion failed: axis >= 0 && axis \u003C nbDims' 错误如何解决？","该错误通常与 ONNX 模型的 Opset 版本或 TensorRT 解析器版本过旧有关。日志显示模型使用的是 Opset 10，而旧版解析器可能不支持。解决方案是升级 TensorRT 及其 ONNX 解析器以支持 Opset 10 或更高版本。此外，警告中提到的 INT64 权重不被 TensorRT 原生支持的问题，解析器通常会尝试自动将其转换为 INT32，但确保使用最新版本的 onnx-tensorrt 可以避免此类断言失败。","https:\u002F\u002Fgithub.com\u002Fonnx\u002Fmodels\u002Fissues\u002F185",{"id":123,"question_zh":124,"answer_zh":125,"source_url":126},37347,"ONNX Model Zoo 中是否有预训练的目标检测模型（如 SSD, Faster-RCNN, RetinaNet）可用？","Model Zoo 中提供部分目标检测模型，如 SSD 和 Faster-RCNN。关于 RetinaNet，虽然大多数 Detectron 模型已受支持，但 RetinaNet 包含一些特定的操作符，导致其支持进度较慢。如果需要使用特定模型，通常需要确保原模型可以通过 Torch JIT 进行追踪（traceable），但这并不保证能直接转换为 ONNX。建议查看 Model Zoo 的最新列表获取已支持的模型，或参考社区中其他项目转换的 ONNX 模型作为参考。","https:\u002F\u002Fgithub.com\u002Fonnx\u002Fmodels\u002Fissues\u002F17",{"id":128,"question_zh":129,"answer_zh":130,"source_url":131},37348,"为什么 ONNX Model Zoo 中存在重复的 VGG19 模型文件？应该使用哪一个？","Model Zoo 中确实存在多个 VGG19 模型，这是因为它们是由不同的深度学习框架（如 MXNet 与其他框架）生成的。目前官方尚未达成统一意见来删除其中一个或合并文件，因此两个版本都保留。不同来源的模型可能在精度或结构细节上略有差异。建议用户根据自己使用的框架背景选择对应的模型，或者查阅各模型目录下的 README.md 文件以了解具体的生成来源和差异说明。","https:\u002F\u002Fgithub.com\u002Fonnx\u002Fmodels\u002Fissues\u002F93",{"id":133,"question_zh":134,"answer_zh":135,"source_url":121},37349,"在转换模型到 TensorRT 时遇到 INT64 权重不支持的警告，这会影响模型运行吗？","TensorRT 原生不支持 INT64 类型的权重。当 ONNX 模型包含 INT64 权重时，解析器会发出警告并尝试将其自动向下转换（cast down）为 INT32。日志中显示 'Successfully casted down to INT32' 表示转换已成功，通常不会影响模型的正常运行。但如果遇到后续的解析错误（如轴范围断言失败），则可能需要检查模型结构或使用更新版本的 TensorRT 解析器来处理这些数据类型转换。",[]]