[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-openvinotoolkit--nncf":3,"tool-openvinotoolkit--nncf":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":106,"forks":107,"last_commit_at":108,"license":109,"difficulty_score":23,"env_os":110,"env_gpu":111,"env_ram":112,"env_deps":113,"category_tags":120,"github_topics":121,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":139,"updated_at":140,"faqs":141,"releases":170},3600,"openvinotoolkit\u002Fnncf","nncf","Neural Network Compression Framework for enhanced OpenVINO™ inference","nncf 是一款专为提升 OpenVINO 推理效率而设计的神经网络压缩框架。它主要解决深度学习模型在部署时体积过大、运行速度慢以及资源消耗高的问题，帮助开发者在几乎不损失精度的前提下，显著优化模型的推理性能。\n\n这款工具非常适合 AI 工程师、算法研究人员以及需要在边缘设备或服务器上高效部署模型的开发者使用。无论是基于 PyTorch、ONNX 还是 OpenVINO 的模型，nncf 都能提供强大的支持。\n\n其核心技术亮点在于提供了丰富的“训练后”与“训练中”压缩算法。用户可以直接对已训练好的模型进行量化和权重压缩，快速获得轻量级模型；也可以在训练阶段引入量化感知训练、剪枝等策略，从源头打造高效模型。nncf 具备自动转换模型图的能力，拥有统一的调用接口，并支持 GPU 加速微调与分布式训练，让复杂的压缩流程变得简单可控。作为 Intel OpenVINO 生态的重要组件，nncf 以开源友好的方式，助力各类神经网络应用实现更快的速度与更低的成本。","\u003Cdiv align=\"center\">\n\n# Neural Network Compression Framework (NNCF)\n\n[Key Features](#key-features) •\n[Installation](#installation-guide) •\n[Documentation](#documentation) •\n[Usage](#usage) •\n[Tutorials and 
Samples](#demos-tutorials-and-samples) •\n[Third-party integration](#third-party-repository-integration) •\n[Model Zoo](.\u002Fdocs\u002FModelZoo.md)\n\n[![GitHub Release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002Fopenvinotoolkit\u002Fnncf?color=green)](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Freleases)\n[![Website](https:\u002F\u002Fimg.shields.io\u002Fwebsite?up_color=blue&up_message=docs&url=https%3A%2F%2Fdocs.openvino.ai%2Fnncf)](https:\u002F\u002Fdocs.openvino.ai\u002Fnncf)\n[![Apache License Version 2.0](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache_2.0-green.svg)](LICENSE)\n[![PyPI Downloads](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenvinotoolkit_nncf_readme_0adfd481def8.png)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fnncf\u002F)\n\n![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10+-blue)\n![Backends](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fbackends-openvino_|_pytorch_|_onnx_-orange)\n![OS](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOS-Linux_|_Windows_|_MacOS-blue)\n\n\u003C\u002Fdiv>\n\nNeural Network Compression Framework (NNCF) provides a suite of post-training and training-time algorithms for\noptimizing inference of neural networks in [OpenVINO&trade;](https:\u002F\u002Fdocs.openvino.ai) with a minimal accuracy drop.\n\nNNCF is designed to work with models from [PyTorch](https:\u002F\u002Fpytorch.org\u002F),\n[TorchFX](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Ffx.html),\n[ONNX](https:\u002F\u002Fonnx.ai\u002F) and [OpenVINO&trade;](https:\u002F\u002Fdocs.openvino.ai).\n\nNNCF provides [samples](#demos-tutorials-and-samples) that demonstrate the usage of compression algorithms for different\nuse cases and models. 
See compression results achievable with the NNCF-powered samples on the [NNCF Model Zoo page](.\u002Fdocs\u002FModelZoo.md).\n\nThe framework is organized as a Python\\* package that can be built and used in a standalone mode. The framework\narchitecture is unified to make it easy to add different compression algorithms for the supported deep\nlearning frameworks.\n\n\u003Ca id=\"key-features\">\u003C\u002Fa>\n\n## Key Features\n\n### Post-Training Compression Algorithms\n\n| Compression algorithm                                                                                    | OpenVINO      | PyTorch      | TorchFX       | ONNX          |\n| :------------------------------------------------------------------------------------------------------- | :-----------: | :----------: | :-----------: | :-----------: |\n| [Post-Training Quantization](.\u002Fdocs\u002Fusage\u002Fpost_training_compression\u002Fpost_training_quantization\u002FUsage.md) | Supported     | Supported    | Experimental  | Supported     |\n| [Weights Compression](.\u002Fdocs\u002Fusage\u002Fpost_training_compression\u002Fweights_compression\u002FUsage.md)               | Supported     | Supported    | Experimental  | Supported     |\n| [Activation Sparsity](.\u002Fsrc\u002Fnncf\u002Fexperimental\u002Ftorch\u002Fsparsify_activations\u002FActivationSparsity.md)          | Not supported | Experimental | Not supported | Not supported |\n\n### Training-Time Compression Algorithms\n\n| Compression algorithm                                                                                                                         | PyTorch   |\n| :-------------------------------------------------------------------------------------------------------------------------------------------- | :-------: |\n| [Quantization Aware Training](.\u002Fdocs\u002Fusage\u002Ftraining_time_compression\u002Fquantization_aware_training\u002FUsage.md)                                    | Supported |\n| [Weight-Only Quantization 
Aware Training with LoRA and NLS](.\u002Fdocs\u002Fusage\u002Ftraining_time_compression\u002Fquantization_aware_training_lora\u002FUsage.md) | Supported |\n| [Pruning](.\u002Fdocs\u002Fusage\u002Ftraining_time_compression\u002Fpruning\u002FUsage.md)                                                                            | Supported |\n\n- Automatic, configurable model graph transformation to obtain the compressed model.\n- Common interface for compression methods.\n- GPU-accelerated layers for faster compressed model fine-tuning.\n- Distributed training support.\n- Git patch for a prominent third-party repository ([huggingface-transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)) demonstrating the process of integrating NNCF into custom training pipelines.\n- Exporting compressed PyTorch models to ONNX\\* checkpoints, ready to use with the [OpenVINO&trade; toolkit](https:\u002F\u002Fdocs.openvino.ai).\n\n\u003Ca id=\"documentation\">\u003C\u002Fa>\n\n## Documentation\n\nThis documentation covers detailed information about the NNCF algorithms and functions needed to contribute to NNCF.\n\nThe latest user documentation for NNCF is available [here](https:\u002F\u002Fdocs.openvino.ai\u002Fnncf).\n\nNNCF API documentation can be found [here](https:\u002F\u002Fopenvinotoolkit.github.io\u002Fnncf\u002Fautoapi\u002Fnncf\u002F).\n\n\u003Ca id=\"usage\">\u003C\u002Fa>\n\n## Usage\n\n### Post-Training Quantization\n\nThe NNCF PTQ is the simplest way to apply 8-bit quantization. 
To run the algorithm, you only need your model and a small (~300 samples) calibration dataset.\n\n[OpenVINO](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino) is the preferred backend to run PTQ with, while PyTorch and ONNX are also supported.\n\n\u003Cdetails open>\u003Csummary>\u003Cb>OpenVINO\u003C\u002Fb>\u003C\u002Fsummary>\n\n```python\nimport nncf\nimport openvino as ov\nimport torch\nfrom torchvision import datasets, transforms\n\n# Instantiate your uncompressed model\nmodel = ov.Core().read_model(\"\u002Fmodel_path\")\n\n# Provide validation part of the dataset to collect statistics needed for the compression algorithm\nval_dataset = datasets.ImageFolder(\"\u002Fpath\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1)\n\n# Step 1: Initialize transformation function\ndef transform_fn(data_item):\n    images, _ = data_item\n    return images\n\n# Step 2: Initialize NNCF Dataset\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n# Step 3: Run the quantization pipeline\nquantized_model = nncf.quantize(model, calibration_dataset)\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>\u003Cb>PyTorch\u003C\u002Fb>\u003C\u002Fsummary>\n\n```python\nimport nncf\nimport torch\nfrom torchvision import datasets, models, transforms\n\n# Instantiate your uncompressed model\nmodel = models.mobilenet_v2()\n\n# Provide validation part of the dataset to collect statistics needed for the compression algorithm\nval_dataset = datasets.ImageFolder(\"\u002Fpath\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = torch.utils.data.DataLoader(val_dataset)\n\n# Step 1: Initialize the transformation function\ndef transform_fn(data_item):\n    images, _ = data_item\n    return images\n\n# Step 2: Initialize NNCF Dataset\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n# Step 3: Run the quantization pipeline\nquantized_model = 
nncf.quantize(model, calibration_dataset)\n```\n\n**NOTE** If the Post-Training Quantization algorithm does not meet quality requirements, you can fine-tune the quantized PyTorch model. You can find an example of the Quantization-Aware Training pipeline for a PyTorch model [here](examples\u002Fquantization_aware_training\u002Ftorch\u002Fresnet18\u002FREADME.md).\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>\u003Cb>TorchFX\u003C\u002Fb>\u003C\u002Fsummary>\n\n```python\nimport nncf\nimport torch.fx\nfrom torchvision import datasets, models, transforms\n\n# Instantiate your uncompressed model\nmodel = models.mobilenet_v2()\n\n# Provide validation part of the dataset to collect statistics needed for the compression algorithm\nval_dataset = datasets.ImageFolder(\"\u002Fpath\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = torch.utils.data.DataLoader(val_dataset)\n\n# Step 1: Initialize the transformation function\ndef transform_fn(data_item):\n    images, _ = data_item\n    return images\n\n# Step 2: Initialize NNCF Dataset\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n\n# Step 3: Export model to TorchFX\ninput_shape = (1, 3, 224, 224)\nex_input = torch.ones(input_shape)  # example input used to trace the model\nfx_model = torch.export.export_for_training(model, args=(ex_input,)).module()\n# or\n# fx_model = torch.export.export(model, args=(ex_input,)).module()\n\n# Step 4: Run the quantization pipeline\nquantized_fx_model = nncf.quantize(fx_model, calibration_dataset)\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>\u003Cb>ONNX\u003C\u002Fb>\u003C\u002Fsummary>\n\n```python\nimport onnx\nimport nncf\nimport torch\nfrom torchvision import datasets, transforms\n\n# Instantiate your uncompressed model\nonnx_model = onnx.load_model(\"\u002Fmodel_path\")\n\n# Provide validation part of the dataset to collect statistics needed for the compression algorithm\nval_dataset = datasets.ImageFolder(\"\u002Fpath\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = 
torch.utils.data.DataLoader(val_dataset, batch_size=1)\n\n# Step 1: Initialize transformation function\ninput_name = onnx_model.graph.input[0].name\ndef transform_fn(data_item):\n    images, _ = data_item\n    return {input_name: images.numpy()}\n\n# Step 2: Initialize NNCF Dataset\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n# Step 3: Run the quantization pipeline\nquantized_model = nncf.quantize(onnx_model, calibration_dataset)\n```\n\n\u003C\u002Fdetails>\n\n[\u002F\u002F]: # (NNCF provides full  [samples]&#40;#post-training-quantization-samples&#41;, which demonstrate Post-Training Quantization usage for PyTorch, ONNX, and OpenVINO.)\n\n### Training-Time Quantization\n\nHere is an example of a Quantization-Aware Training pipeline in which model weights and compression parameters can be fine-tuned to achieve higher accuracy.\n\n\u003Cdetails>\u003Csummary>\u003Cb>PyTorch\u003C\u002Fb>\u003C\u002Fsummary>\n\n```python\nimport nncf\nimport torch\nfrom torchvision import datasets, models, transforms\n\n# Instantiate your uncompressed model\nmodel = models.mobilenet_v2()\n\n# Provide validation part of the dataset to collect statistics needed for the compression algorithm\nval_dataset = datasets.ImageFolder(\"\u002Fpath\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = torch.utils.data.DataLoader(val_dataset)\n\n# Step 1: Initialize the transformation function\ndef transform_fn(data_item):\n    images, _ = data_item\n    return images\n\n# Step 2: Initialize NNCF Dataset\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n# Step 3: Run the quantization pipeline\nquantized_model = nncf.quantize(model, calibration_dataset)\n\n# Now use quantized_model as a usual torch.nn.Module\n# to fine-tune compression parameters along with the model weights\n\n# Save quantization modules and the quantized model parameters\ncheckpoint = {\n    'state_dict': quantized_model.state_dict(),\n    'nncf_config': nncf.torch.get_config(quantized_model),\n    
... # the rest of the user-defined objects to save\n}\ntorch.save(checkpoint, path_to_checkpoint)\n\n# ...\n\n# Load quantization modules and the quantized model parameters\nresuming_checkpoint = torch.load(path_to_checkpoint)\nnncf_config = resuming_checkpoint['nncf_config']\nstate_dict = resuming_checkpoint['state_dict']\n\nquantized_model = nncf.torch.load_from_config(model, nncf_config, example_input)\nmodel.load_state_dict(state_dict)\n# ... the rest of the usual PyTorch-powered training pipeline\n```\n\n\u003C\u002Fdetails>\n\n\u003Ca id=\"demos-tutorials-and-samples\">\u003C\u002Fa>\n\n## Demos, Tutorials and Samples\n\nFor a quicker start with NNCF-powered compression, try sample notebooks and scripts presented below.\n\n### Jupyter* Notebook Tutorials and Demos\n\nReady-to-run Jupyter* notebook tutorials and demos are available to explain and display NNCF compression algorithms for optimizing models for inference with the OpenVINO Toolkit:\n\n| Notebook Tutorial Name                                                                                                                                                                                                                                                                                                                                 |                                  Compression Algorithm                                  |  Backend   |               Domain                |\n|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------:|:----------:|:-----------------------------------:|\n| [BERT 
Quantization](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Flanguage-quantize-bert)\u003Cbr>[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Flanguage-quantize-bert\u002Flanguage-quantize-bert.ipynb) |                               Post-Training Quantization                                |  OpenVINO  |                 NLP                 |\n| [MONAI Segmentation Model Quantization](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fct-segmentation-quantize)\u003Cbr>[![Binder](https:\u002F\u002Fmybinder.org\u002Fbadge_logo.svg)](https:\u002F\u002Fmybinder.org\u002Fv2\u002Fgh\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002FHEAD?filepath=notebooks%2Fct-segmentation-quantize%2Fct-scan-live-inference.ipynb)     |                               Post-Training Quantization                                |  OpenVINO  |            Segmentation             |\n| [PyTorch Model Quantization](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fpytorch-post-training-quantization-nncf)                                                                                                                                                                                                      |                               Post-Training Quantization                                |  PyTorch   |        Image Classification         |\n| [YOLOv11 Quantization with Accuracy Control](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fyolov11-quantization-with-accuracy-control)                                                                                     
                                                                                                          |                    Post-Training Quantization with Accuracy Control                     |  OpenVINO  | Speech-to-Text,\u003Cbr>Object Detection |\n\nA list of notebooks demonstrating OpenVINO conversion and inference together with NNCF compression for models from various domains:\n\n| Demo Model                                                                                                                                                                                                                                                                                                                                        |               Compression Algorithm               |  Backend  |                                Domain                                |\n|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------:|:---------:|:--------------------------------------------------------------------:|\n| [YOLOv8](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fyolov8-optimization)\u003Cbr>[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fyolov8-optimization\u002Fyolov8-object-detection.ipynb)            |            Post-Training Quantization             | OpenVINO  |  Object Detection,\u003Cbr>KeyPoint Detection,\u003Cbr>Instance Segmentation   |\n| 
[EfficientSAM](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fefficient-sam)                                                                                                                                                                                                                                         |            Post-Training Quantization             | OpenVINO  |                          Image Segmentation                          |\n| [Segment Anything Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fsegment-anything)                                                                                                                                                                                                                            |            Post-Training Quantization             | OpenVINO  |                          Image Segmentation                          |\n| [OneFormer](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Foneformer-segmentation)                                                                                                                                                                                                                                   |            Post-Training Quantization             | OpenVINO  |                          Image Segmentation                          |\n| [CLIP](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fclip-zero-shot-image-classification)                                                                                                                                                                                                                           |            Post-Training Quantization             | OpenVINO  |   
                         Image-to-Text                             |\n| [BLIP](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fblip-visual-language-processing)                                                                                                                                                                                                                               |            Post-Training Quantization             | OpenVINO  |                            Image-to-Text                             |\n| [Latent Consistency Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Flatent-consistency-models-image-generation)                                                                                                                                                                                                |            Post-Training Quantization             | OpenVINO  |                            Text-to-Image                             |\n| [Distil-Whisper](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fdistil-whisper-asr)                                                                                                                                                                                                                                  |            Post-Training Quantization             | OpenVINO  |                            Speech-to-Text                            |\n| 
[Whisper](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fwhisper-subtitles-generation)\u003Cbr>[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fwhisper-subtitles-generation\u002Fwhisper-convert.ipynb) |            Post-Training Quantization             | OpenVINO  |                            Speech-to-Text                            |\n| [MMS Speech Recognition](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fmms-massively-multilingual-speech)                                                                                                                                                                                                           |            Post-Training Quantization             | OpenVINO  |                            Speech-to-Text                            |\n| [LLM Instruction Following](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fllm-question-answering)                                                                                                                                                                                                                   |                Weight Compression                 | OpenVINO  |                      NLP, Instruction Following                      |\n| [LLM Chat Bots](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fllm-chatbot)                                                                                                                                                                                                                                   
       |                Weight Compression                 | OpenVINO  |                            NLP, Chat Bot                             |\n\n### Post-Training Quantization and Weight Compression Examples\n\nCompact scripts demonstrating quantization\u002Fweight compression and corresponding inference speed boost:\n\n| Example Name                                                                                                                             |              Compression Algorithm               |  Backend   |         Domain         |\n|:-----------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------:|:----------:|:----------------------:|\n| [OpenVINO MobileNetV2](.\u002Fexamples\u002Fpost_training_quantization\u002Fopenvino\u002Fmobilenet_v2\u002FREADME.md)                                            |            Post-Training Quantization            |  OpenVINO  |  Image Classification  |\n| [OpenVINO YOLO26](.\u002Fexamples\u002Fpost_training_quantization\u002Fopenvino\u002Fyolo26\u002FREADME.md)                                                       |            Post-Training Quantization            |  OpenVINO  |    Object Detection    |\n| [OpenVINO YOLOv8 QwAC](.\u002Fexamples\u002Fpost_training_quantization\u002Fopenvino\u002Fyolov8_quantize_with_accuracy_control\u002FREADME.md)                   | Post-Training Quantization with Accuracy Control |  OpenVINO  |    Object Detection    |\n| [OpenVINO Anomaly Classification](.\u002Fexamples\u002Fpost_training_quantization\u002Fopenvino\u002Fanomaly_stfpm_quantize_with_accuracy_control\u002FREADME.md) | Post-Training Quantization with Accuracy Control |  OpenVINO  | Anomaly Classification |\n| [PyTorch MobileNetV2](.\u002Fexamples\u002Fpost_training_quantization\u002Ftorch\u002Fmobilenet_v2\u002FREADME.md)                                                |            
Post-Training Quantization            |  PyTorch   |  Image Classification  |\n| [PyTorch SSD](.\u002Fexamples\u002Fpost_training_quantization\u002Ftorch\u002Fssd300_vgg16\u002FREADME.md)                                                        |            Post-Training Quantization            |  PyTorch   |    Object Detection    |\n| [TorchFX Resnet18](.\u002Fexamples\u002Fpost_training_quantization\u002Ftorch_fx\u002Fresnet18\u002FREADME.md)                                                    |            Post-Training Quantization            |  TorchFX   |  Image Classification  |\n| [ONNX MobileNetV2](.\u002Fexamples\u002Fpost_training_quantization\u002Fonnx\u002Fmobilenet_v2\u002FREADME.md)                                                    |            Post-Training Quantization            |    ONNX    |  Image Classification  |\n| [ONNX YOLOv8 QwAC](.\u002Fexamples\u002Fpost_training_quantization\u002Fonnx\u002Fyolov8_quantize_with_accuracy_control\u002FREADME.md)                           | Post-Training Quantization with Accuracy Control |    ONNX    |    Object Detection    |\n| [ONNX TinyLlama WC](.\u002Fexamples\u002Fllm_compression\u002Fonnx\u002Ftiny_llama\u002FREADME.md)                                                                |                Weight Compression                |    ONNX    |           LLM          |\n| [TorchFX TinyLlama WC](.\u002Fexamples\u002Fllm_compression\u002Ftorch_fx\u002Ftiny_llama\u002FREADME.md)                                                         |                Weight Compression                |  TorchFX   |           LLM          |\n| [OpenVINO TinyLlama WC](.\u002Fexamples\u002Fllm_compression\u002Fopenvino\u002Ftiny_llama\u002FREADME.md)                                                        |                Weight Compression                |  OpenVINO  |           LLM          |\n| [OpenVINO TinyLlama WC with 
HS](.\u002Fexamples\u002Fllm_compression\u002Fopenvino\u002Ftiny_llama_find_hyperparams\u002FREADME.md)                               |  Weight Compression with Hyperparameters Search  |  OpenVINO  |           LLM          |\n| [ONNX TinyLlama WC with SE](.\u002Fexamples\u002Fllm_compression\u002Fonnx\u002Ftiny_llama_scale_estimation\u002FREADME.md)                                       |     Weight Compression with Scale Estimation     |    ONNX    |           LLM          |\n\n### Quantization-Aware Training Examples\n\n| Example Name                                                                        |   Compression Algorithm     | Backend |        Domain        |\n|:------------------------------------------------------------------------------------|:---------------------------:|:-------:|:--------------------:|\n| [PyTorch Resnet18](.\u002Fexamples\u002Fquantization_aware_training\u002Ftorch\u002Fresnet18\u002FREADME.md) | Quantization-Aware Training | PyTorch | Image Classification |\n| [PyTorch Anomalib](.\u002Fexamples\u002Fquantization_aware_training\u002Ftorch\u002Fanomalib\u002FREADME.md) | Quantization-Aware Training | PyTorch | Anomaly Detection    |\n\n\u003Ca id=\"third-party-repository-integration\">\u003C\u002Fa>\n\n## Third-party Repository Integration\n\nNNCF may be easily integrated into training\u002Fevaluation pipelines of third-party repositories.\n\n### Used by\n\n- [HuggingFace Optimum Intel](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Foptimum\u002Fintel\u002Foptimization_ov)\n\n  NNCF is used as a compression backend within the renowned `transformers` repository in HuggingFace Optimum Intel. 
For instance, the command below exports the [Llama-3.2-3B-Instruct](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-3.2-3B-Instruct) model to OpenVINO format with INT4-quantized weights:\n\n  ```bash\n  optimum-cli export openvino -m meta-llama\u002FLlama-3.2-3B-Instruct --weight-format int4 .\u002FLlama-3.2-3B-Instruct-int4\n  ```\n\n- [Ultralytics](https:\u002F\u002Fdocs.ultralytics.com\u002Fintegrations\u002Fopenvino)\n\n  NNCF is integrated into the Intel OpenVINO export pipeline, enabling quantization for the exported models.\n\n- [ExecuTorch](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fexecutorch\u002Fblob\u002Fmain\u002Fexamples\u002Fopenvino\u002FREADME.md)\n\n  NNCF is used as primary quantization framework for the [ExecuTorch OpenVINO integration](https:\u002F\u002Fdocs.pytorch.org\u002Fexecutorch\u002Fmain\u002Fbuild-run-openvino.html).\n\n- [torch.compile](https:\u002F\u002Fdocs.pytorch.org\u002Ftutorials\u002Fprototype\u002Fopenvino_quantizer.html)\n\n  NNCF is used as primary quantization framework for the [torch.compile OpenVINO integration](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fopenvino-workflow\u002Ftorch-compile.html).\n\n- [OpenVINO Training Extensions](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Ftraining_extensions)\n\n  NNCF is integrated into OpenVINO Training Extensions as a model optimization backend. 
You can train, optimize, and\n  export new models based on available model templates as well as run the exported models with OpenVINO.\n\n- [Microsoft Olive](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Folive)\n\n  NNCF is used to quantize OpenVINO IR and ONNX models for the [OpenVINO integration](https:\u002F\u002Fmicrosoft.github.io\u002FOlive\u002Ffeatures\u002Fihv-integration\u002Fopenvino.html).\n\n\u003Ca id=\"installation-guide\">\u003C\u002Fa>\n\n## Installation Guide\n\nFor detailed installation instructions, refer to the [Installation](.\u002Fdocs\u002FInstallation.md) guide.\n\nNNCF can be installed as a regular PyPI package via pip:\n\n```bash\npip install nncf\n```\n\nNNCF is also available via [conda](https:\u002F\u002Fanaconda.org\u002Fconda-forge\u002Fnncf):\n\n```bash\nconda install -c conda-forge nncf\n```\n\nThe system requirements of NNCF depend on the backend in use. The requirements for each backend and\nthe matrix of corresponding versions can be found in [Installation.md](.\u002Fdocs\u002FInstallation.md).\n\n## NNCF Compressed Model Zoo\n\nThe list of models and their compression results is available on our [NNCF Model Zoo page](.\u002Fdocs\u002FModelZoo.md).\n\n## Citing\n\n```bibtex\n@article{kozlov2020neural,\n    title =   {Neural network compression framework for fast model inference},\n    author =  {Kozlov, Alexander and Lazarevich, Ivan and Shamporov, Vasily and Lyalyushkin, Nikolay and Gorbachev, Yury},\n    journal = {arXiv preprint arXiv:2002.08679},\n    year =    {2020}\n}\n```\n\n## Contributing Guide\n\nRefer to the [CONTRIBUTING.md](.\u002FCONTRIBUTING.md) file for guidelines on contributions to the NNCF repository.\n\n## Useful links\n\n- [Documentation](.\u002Fdocs)\n- [Examples](.\u002Fexamples)\n- [FAQ](.\u002Fdocs\u002FFAQ.md)\n- [Notebooks](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks#-model-training)\n- [HuggingFace Optimum 
Intel](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Foptimum\u002Fintel\u002Foptimization_ov)\n- [OpenVINO Model Optimization Guide](https:\u002F\u002Fdocs.openvino.ai\u002Fnncf)\n- [OpenVINO Hugging Face page](https:\u002F\u002Fhuggingface.co\u002FOpenVINO#models)\n- [OpenVino Performance Benchmarks page](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fabout-openvino\u002Fperformance-benchmarks.html)\n\n## Telemetry\n\nNNCF as part of the OpenVINO™ toolkit collects anonymous usage data for the purpose of improving OpenVINO™ tools.\nYou can opt-out at any time by running the following command in the Python environment where you have NNCF installed:\n\n`opt_in_out --opt_out`\n\nMore information available on [OpenVINO telemetry](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fabout-openvino\u002Fadditional-resources\u002Ftelemetry.html).\n","\u003Cdiv align=\"center\">\n\n# 神经网络压缩框架 (NNCF)\n\n[主要特性](#key-features) •\n[安装指南](#installation-guide) •\n[文档](#documentation) •\n[使用方法](#usage) •\n[教程与示例](#demos-tutorials-and-samples) •\n[第三方集成](#third-party-repository-integration) •\n[模型库](.\u002Fdocs\u002FModelZoo.md)\n\n[![GitHub 发布](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002Fopenvinotoolkit\u002Fnncf?color=green)](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Freleases)\n[![官网](https:\u002F\u002Fimg.shields.io\u002Fwebsite?up_color=blue&up_message=docs&url=https%3A%2F%2Fdocs.openvino.ai%2Fnncf)](https:\u002F\u002Fdocs.openvino.ai\u002Fnncf)\n[![Apache 2.0 许可证](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache_2.0-green.svg)](LICENSE)\n[![PyPI 
下载量](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenvinotoolkit_nncf_readme_0adfd481def8.png)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fnncf\u002F)\n\n![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10+-blue)\n![后端](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fbackends-openvino_|_pytorch_|_onnx_-orange)\n![操作系统](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOS-Linux_|_Windows_|_MacOS-blue)\n\n\u003C\u002Fdiv>\n\n神经网络压缩框架 (NNCF) 提供了一系列训练后和训练时算法，用于在 [OpenVINO&trade;](https:\u002F\u002Fdocs.openvino.ai) 中优化神经网络的推理性能，同时将精度损失降至最低。\n\nNNCF 设计用于支持来自 [PyTorch](https:\u002F\u002Fpytorch.org\u002F)、\n[TorchFX](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Ffx.html)、\n[ONNX](https:\u002F\u002Fonnx.ai\u002F) 和 [OpenVINO&trade;](https:\u002F\u002Fdocs.openvino.ai) 的模型。\n\nNNCF 提供了 [示例](#demos-tutorials-and-samples)，展示了如何针对不同用例和模型使用压缩算法。有关使用 NNCF 示例实现的压缩效果，请参阅 [NNCF 模型库页面](.\u002Fdocs\u002FModelZoo.md)。\n\n该框架以 Python\\* 包的形式组织，可以独立构建和使用。其架构统一，便于为 PyTorch 等深度学习框架添加不同的压缩算法。\n\n\u003Ca id=\"key-features\">\u003C\u002Fa>\n\n## 主要特性\n\n### 训练后压缩算法\n\n| 压缩算法                                                                                    | OpenVINO      | PyTorch      | TorchFX       | ONNX          |\n| :------------------------------------------------------------------------------------------------------- | :-----------: | :----------: | :-----------: | :-----------: |\n| [训练后量化](.\u002Fdocs\u002Fusage\u002Fpost_training_compression\u002Fpost_training_quantization\u002FUsage.md) | 支持     | 支持    | 实验性  | 支持     |\n| [权重压缩](.\u002Fdocs\u002Fusage\u002Fpost_training_compression\u002Fweights_compression\u002FUsage.md)               | 支持     | 支持    | 实验性  | 支持     |\n| [激活稀疏化](.\u002Fsrc\u002Fnncf\u002Fexperimental\u002Ftorch\u002Fsparsify_activations\u002FActivationSparsity.md)          | 不支持 | 实验性 | 不支持 | 不支持 |\n\n### 训练时压缩算法\n\n| 压缩算法                                                                                                 
                        | PyTorch   |\n| :-------------------------------------------------------------------------------------------------------------------------------- | :-------: |\n| [量化感知训练](.\u002Fdocs\u002Fusage\u002Ftraining_time_compression\u002Fquantization_aware_training\u002FUsage.md)                                    | 支持 |\n| [基于 LoRA 和 NLS 的仅权重量化感知训练](.\u002Fdocs\u002Fusage\u002Ftraining_time_compression\u002Fquantization_aware_training_lora\u002FUsage.md) | 支持 |\n| [剪枝](.\u002Fdocs\u002Fusage\u002Ftraining_time_compression\u002Fpruning\u002FUsage.md)                                                                            | 支持 |\n\n- 自动且可配置的模型图转换，以获得压缩后的模型。\n- 压缩方法的通用接口。\n- GPU 加速层，用于更快地微调压缩模型。\n- 分布式训练支持。\n- 针对知名第三方仓库（如 [huggingface-transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)）提供的 Git 补丁，演示了如何将 NNCF 集成到自定义训练流程中。\n- 可将 PyTorch 压缩模型导出为 ONNX\\* 检查点、SavedModel 或 Frozen Graph 格式，以便直接与 [OpenVINO&trade; 工具包](https:\u002F\u002Fdocs.openvino.ai) 一起使用。\n\n\u003Ca id=\"documentation\">\u003C\u002Fa>\n\n## 文档\n\n本文档涵盖了 NNCF 算法和功能的详细信息，这些内容对于贡献代码至 NNCF 至关重要。\n\nNNCF 的最新用户文档可在 [此处](https:\u002F\u002Fdocs.openvino.ai\u002Fnncf) 获取。\n\nNNCF API 文档请见 [此处](https:\u002F\u002Fopenvinotoolkit.github.io\u002Fnncf\u002Fautoapi\u002Fnncf\u002F)。\n\n\u003Ca id=\"usage\">\u003C\u002Fa>\n\n## 使用方法\n\n### 训练后量化\n\nNNCF 的 PTQ 是应用 8 位量化最简单的方式。运行该算法只需您的模型和一个小规模（约 300 个样本）的校准数据集。\n\n首选后端是 [OpenVINO](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino)，但 PyTorch 和 ONNX 也受支持。\n\n\u003Cdetails open>\u003Csummary>\u003Cb>OpenVINO\u003C\u002Fb>\u003C\u002Fsummary>\n\n```python\nimport nncf\nimport openvino as ov\nimport torch\nfrom torchvision import datasets, transforms\n\n# 实例化未压缩的模型\nmodel = ov.Core().read_model(\"\u002Fmodel_path\")\n\n# 提供验证数据集的一部分，以收集压缩算法所需的统计信息\nval_dataset = datasets.ImageFolder(\"\u002Fpath\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = torch.utils.data.DataLoader(val_dataset, 
batch_size=1)\n\n# Step 1: Initialize the transformation function\ndef transform_fn(data_item):\n    images, _ = data_item\n    return images\n\n# Step 2: Initialize the NNCF Dataset\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n# Step 3: Run the quantization pipeline\nquantized_model = nncf.quantize(model, calibration_dataset)\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>\u003Cb>PyTorch\u003C\u002Fb>\u003C\u002Fsummary>\n\n```python\nimport nncf\nimport torch\nfrom torchvision import datasets, models, transforms\n\n# Instantiate your uncompressed model\nmodel = models.mobilenet_v2()\n\n# Provide the validation part of the dataset to collect the statistics needed by the compression algorithm\nval_dataset = datasets.ImageFolder(\"\u002Fpath\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = torch.utils.data.DataLoader(val_dataset)\n\n# Step 1: Initialize the transformation function\ndef transform_fn(data_item):\n    images, _ = data_item\n    return images\n\n# Step 2: Initialize the NNCF Dataset\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n\n# Step 3: Run the quantization pipeline\nquantized_model = nncf.quantize(model, calibration_dataset)\n```\n\n**NOTE** If the Post-Training Quantization algorithm does not meet your quality requirements, you can fine-tune the quantized PyTorch model. An example of a Quantization-Aware Training pipeline for a PyTorch model is available [here](examples\u002Fquantization_aware_training\u002Ftorch\u002Fresnet18\u002FREADME.md).\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>\u003Cb>TorchFX\u003C\u002Fb>\u003C\u002Fsummary>\n\n```python\nimport nncf\nimport torch\nimport torch.fx\nfrom torchvision import datasets, models, transforms\n\n# Instantiate your uncompressed model\nmodel = models.mobilenet_v2()\n\n# Provide the validation part of the dataset to collect the statistics needed by the compression algorithm\nval_dataset = datasets.ImageFolder(\"\u002Fpath\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = torch.utils.data.DataLoader(val_dataset)\n\n# Step 1: Initialize the transformation function\ndef transform_fn(data_item):\n    images, _ = data_item\n    return images\n\n# Step 2: Initialize the NNCF Dataset\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n\n# Step 3: Export the model to TorchFX\ninput_shape = (1, 3, 224, 224)\nex_input = torch.ones(input_shape)  # example input used only to trace the model\nfx_model = torch.export.export_for_training(model, args=(ex_input,)).module()\n# or\n# fx_model = torch.export.export(model, args=(ex_input,)).module()\n\n# Step 4: Run the quantization pipeline\nquantized_fx_model = nncf.quantize(fx_model, 
calibration_dataset)\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\u003Csummary>\u003Cb>ONNX\u003C\u002Fb>\u003C\u002Fsummary>\n\n```python\nimport onnx\nimport nncf\nimport torch\nfrom torchvision import datasets, transforms\n\n# Instantiate your uncompressed model\nonnx_model = onnx.load_model(\"\u002Fmodel_path\")\n\n# Provide the validation part of the dataset to collect the statistics needed by the compression algorithm\nval_dataset = datasets.ImageFolder(\"\u002Fpath\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1)\n\n# Step 1: Initialize the transformation function\ninput_name = onnx_model.graph.input[0].name\ndef transform_fn(data_item):\n    images, _ = data_item\n    return {input_name: images.numpy()}\n\n# Step 2: Initialize the NNCF Dataset\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n# Step 3: Run the quantization pipeline\nquantized_model = nncf.quantize(onnx_model, calibration_dataset)\n```\n\n\u003C\u002Fdetails>\n\n[\u002F\u002F]: # (NNCF provides full [samples]&#40;#post-training-quantization-samples&#41;, which demonstrate Post-Training Quantization usage for PyTorch, ONNX, and OpenVINO.)\n\n### Training-Time Quantization\n\nBelow is an example of an accuracy-aware quantization pipeline in which the model weights and compression parameters can be fine-tuned to achieve higher accuracy.\n\n\u003Cdetails>\u003Csummary>\u003Cb>PyTorch\u003C\u002Fb>\u003C\u002Fsummary>\n\n```python\nimport nncf\nimport torch\nfrom torchvision import datasets, models, transforms\n\n# Instantiate your uncompressed model\nmodel = models.mobilenet_v2()\n\n# Provide the validation part of the dataset to collect the statistics needed by the compression algorithm\nval_dataset = datasets.ImageFolder(\"\u002Fpath\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = torch.utils.data.DataLoader(val_dataset)\n\n# Step 1: Initialize the transformation function\ndef transform_fn(data_item):\n    images, _ = data_item\n    return images\n\n# Step 2: Initialize the NNCF Dataset\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n# Step 3: Run the quantization pipeline\nquantized_model = nncf.quantize(model, calibration_dataset)\n\n# The compressed model can now be used as a regular torch.nn.Module\n# to fine-tune the compression parameters together with the model weights.\n\n# Save the quantization modules and the quantized model parameters\ncheckpoint = {\n    'state_dict': model.state_dict(),\n    'nncf_config': nncf.torch.get_config(model),\n    ... # the rest of the user-defined objects to save\n}\ntorch.save(checkpoint, path_to_checkpoint)\n\n# ...\n\n# Load the quantization modules and the quantized model parameters\nresuming_checkpoint = torch.load(path_to_checkpoint)\nnncf_config = resuming_checkpoint['nncf_config']\nstate_dict = resuming_checkpoint['state_dict']\n\nquantized_model = nncf.torch.load_from_config(model, nncf_config, example_input)\nmodel.load_state_dict(state_dict)\n# ... 
剩下的常规 PyTorch 训练流程\n```\n\n\u003C\u002Fdetails>\n\n\u003Ca id=\"demos-tutorials-and-samples\">\u003C\u002Fa>\n\n## 演示、教程和示例\n\n为了更快地开始使用 NNCF 驱动的模型压缩，请尝试下面提供的示例笔记本和脚本。\n\n### Jupyter* 笔记本教程和演示\n\n现成可运行的 Jupyter* 笔记本教程和演示可用于解释并展示 NNCF 的压缩算法，以优化适用于 OpenVINO 工具包推理的模型：\n\n| 笔记本教程名称                                                                                                                                                                                                                                                                                                                                 |                                  压缩算法                                  |  后端   |               领域                |\n|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------:|:----------:|:-----------------------------------:|\n| [BERT量化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Flanguage-quantize-bert)\u003Cbr>[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Flanguage-quantize-bert\u002Flanguage-quantize-bert.ipynb) |                               训练后量化                                |  OpenVINO  |                 NLP                 |\n| 
[MONAI分割模型量化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fct-segmentation-quantize)\u003Cbr>[![Binder](https:\u002F\u002Fmybinder.org\u002Fbadge_logo.svg)](https:\u002F\u002Fmybinder.org\u002Fv2\u002Fgh\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002FHEAD?filepath=notebooks%2Fct-segmentation-quantize%2Fct-scan-live-inference.ipynb)     |                               训练后量化                                |  OpenVINO  |            分割             |\n| [PyTorch模型量化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fpytorch-post-training-quantization-nncf)                                                                                                                                                                                                      |                               训练后量化                                |  PyTorch   |        图像分类         |\n| [带精度控制的YOLOv11量化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fyolov11-quantization-with-accuracy-control)                                                                                                                                                                                               |                    带精度控制的训练后量化                     |  OpenVINO  | 语音转文本,\u003Cbr>目标检测 |\n\n一个展示OpenVINO转换与推理，以及结合NNCF压缩技术对来自不同领域模型进行优化的笔记本列表：\n\n| 演示模型                                                                                                                                                                                                                                                                                                                                        |               压缩算法               |  后端  |                                领域                                
|\n|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------:|:---------:|:--------------------------------------------------------------------:|\n| [YOLOv8](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fyolov8-optimization)\u003Cbr>[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fyolov8-optimization\u002Fyolov8-object-detection.ipynb)            |            训练后量化             | OpenVINO  |  目标检测,\u003Cbr>关键点检测,\u003Cbr>实例分割   |\n| [EfficientSAM](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fefficient-sam)                                                                                                                                                                                                                                         |            训练后量化             | OpenVINO  |                          图像分割                          |\n| [Segment Anything Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fsegment-anything)                                                                                                                                                                                                                            |            训练后量化             | OpenVINO  |                          图像分割                          |\n| 
[OneFormer](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Foneformer-segmentation)                                                                                                                                                                                                                                   |            训练后量化             | OpenVINO  |                          图像分割                          |\n| [CLIP](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fclip-zero-shot-image-classification)                                                                                                                                                                                                                           |            训练后量化             | OpenVINO  |                            图像到文本                             |\n| [BLIP](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fblip-visual-language-processing)                                                                                                                                                                                                                               |            训练后量化             | OpenVINO  |                            图像到文本                             |\n| [潜在一致性模型](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Flatent-consistency-models-image-generation)                                                                                                                                                                                                |            训练后量化             | OpenVINO  |                            文本到图像                             |\n| 
[Distil-Whisper](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fdistil-whisper-asr)                                                                                                                                                                                                                                  |            训练后量化             | OpenVINO  |                            语音到文本                            |\n| [Whisper](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fwhisper-subtitles-generation)\u003Cbr>[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fwhisper-subtitles-generation\u002Fwhisper-convert.ipynb) |            训练后量化             | OpenVINO  |                            语音到文本                            |\n| [MMS语音识别](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fmms-massively-multilingual-speech)                                                                                                                                                                                                           |            训练后量化             | OpenVINO  |                            语音到文本                            |\n| [LLM指令遵循](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fllm-question-answering)                                                                                                                                                                                                                   |                权重压缩                 | OpenVINO  |                      自然语言处理，指令遵循                      |\n| 
[LLM聊天机器人](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fllm-chatbot)                                                                                                                                                                                                                                          |                权重压缩                 | OpenVINO  |                            自然语言处理，聊天机器人                             |\n\n### 训练后量化与权重压缩示例\n\n展示量化\u002F权重压缩及其相应推理速度提升的精简脚本：\n\n| 示例名称                                                                                                                             |              压缩算法               |  后端   |         领域         |\n|:-----------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------:|:----------:|:----------------------:|\n| [OpenVINO MobileNetV2](.\u002Fexamples\u002Fpost_training_quantization\u002Fopenvino\u002Fmobilenet_v2\u002FREADME.md)                                            |            训练后量化            |  OpenVINO  |  图像分类  |\n| [OpenVINO YOLO26](.\u002Fexamples\u002Fpost_training_quantization\u002Fopenvino\u002Fyolo26\u002FREADME.md)                                                       |            训练后量化            |  OpenVINO  |    目标检测    |\n| [OpenVINO YOLOv8 QwAC](.\u002Fexamples\u002Fpost_training_quantization\u002Fopenvino\u002Fyolov8_quantize_with_accuracy_control\u002FREADME.md)                   | 带精度控制的训练后量化 |  OpenVINO  |    目标检测    |\n| [OpenVINO 异常分类](.\u002Fexamples\u002Fpost_training_quantization\u002Fopenvino\u002Fanomaly_stfpm_quantize_with_accuracy_control\u002FREADME.md) | 带精度控制的训练后量化 |  OpenVINO  | 异常分类 |\n| [PyTorch MobileNetV2](.\u002Fexamples\u002Fpost_training_quantization\u002Ftorch\u002Fmobilenet_v2\u002FREADME.md)                                                |            训练后量化      
      |  PyTorch   |  图像分类  |\n| [PyTorch SSD](.\u002Fexamples\u002Fpost_training_quantization\u002Ftorch\u002Fssd300_vgg16\u002FREADME.md)                                                        |            训练后量化            |  PyTorch   |    目标检测    |\n| [TorchFX Resnet18](.\u002Fexamples\u002Fpost_training_quantization\u002Ftorch_fx\u002Fresnet18\u002FREADME.md)                                                    |            训练后量化            |  TorchFX   |  图像分类  |\n| [ONNX MobileNetV2](.\u002Fexamples\u002Fpost_training_quantization\u002Fonnx\u002Fmobilenet_v2\u002FREADME.md)                                                    |            训练后量化            |    ONNX    |  图像分类  |\n| [ONNX YOLOv8 QwAC](.\u002Fexamples\u002Fpost_training_quantization\u002Fonnx\u002Fyolov8_quantize_with_accuracy_control\u002FREADME.md)                           | 带精度控制的训练后量化 |    ONNX    |    目标检测    |\n| [ONNX TinyLlama WC](.\u002Fexamples\u002Fllm_compression\u002Fonnx\u002Ftiny_llama\u002FREADME.md)                                                                |                压缩权重                |    ONNX    |           LLM          |\n| [TorchFX TinyLlama WC](.\u002Fexamples\u002Fllm_compression\u002Ftorch_fx\u002Ftiny_llama\u002FREADME.md)                                                         |                压缩权重                |  TorchFX   |           LLM          |\n| [OpenVINO TinyLlama WC](.\u002Fexamples\u002Fllm_compression\u002Fopenvino\u002Ftiny_llama\u002FREADME.md)                                                        |                压缩权重                |  OpenVINO  |           LLM          |\n| [OpenVINO TinyLlama WC 带超参数搜索](.\u002Fexamples\u002Fllm_compression\u002Fopenvino\u002Ftiny_llama_find_hyperparams\u002FREADME.md)                               |  压缩权重并进行超参数搜索  |  OpenVINO  |           LLM          |\n| [ONNX TinyLlama WC 带尺度估计](.\u002Fexamples\u002Fllm_compression\u002Fonnx\u002Ftiny_llama_scale_estimation\u002FREADME.md)                               
        |     压缩权重并进行尺度估计     |    ONNX    |           LLM          |\n\n### 量化感知训练示例\n\n| 示例名称                                                                        |   压缩算法     | 后端 |        领域        |\n|:------------------------------------------------------------------------------------|:---------------------------:|:-------:|:--------------------:|\n| [PyTorch Resnet18](.\u002Fexamples\u002Fquantization_aware_training\u002Ftorch\u002Fresnet18\u002FREADME.md) | 量化感知训练 | PyTorch | 图像分类 |\n| [PyTorch Anomalib](.\u002Fexamples\u002Fquantization_aware_training\u002Ftorch\u002Fanomalib\u002FREADME.md) | 量化感知训练 | PyTorch | 异常检测 |\n\n\u003Ca id=\"third-party-repository-integration\">\u003C\u002Fa>\n\n## 第三方仓库集成\n\nNNCF 可以轻松集成到第三方仓库的训练\u002F评估流程中。\n\n### 已被使用的项目\n\n- [HuggingFace Optimum Intel](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Foptimum\u002Fintel\u002Foptimization_ov)\n\n  NNCF 被用作 HuggingFace Optimum Intel 中著名 `transformers` 仓库内的压缩后端。例如，以下命令将 [Llama-3.2-3B-Instruct](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-3.2-3B-Instruct) 模型导出为 OpenVINO 格式，并采用 INT4 量化权重：\n\n  ```bash\n  optimum-cli export openvino -m meta-llama\u002FLlama-3.2-3B-Instruct --weight-format int4 .\u002FLlama-3.2-3B-Instruct-int4\n  ```\n\n- [Ultralytics](https:\u002F\u002Fdocs.ultralytics.com\u002Fintegrations\u002Fopenvino)\n\n  NNCF 已集成到 Intel OpenVINO 导出流程中，从而实现对导出模型的量化。\n\n- [ExecuTorch](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fexecutorch\u002Fblob\u002Fmain\u002Fexamples\u002Fopenvino\u002FREADME.md)\n\n  NNCF 被用作 [ExecuTorch OpenVINO 集成](https:\u002F\u002Fdocs.pytorch.org\u002Fexecutorch\u002Fmain\u002Fbuild-run-openvino.html) 的主要量化框架。\n\n- [torch.compile](https:\u002F\u002Fdocs.pytorch.org\u002Ftutorials\u002Fprototype\u002Fopenvino_quantizer.html)\n\n  NNCF 被用作 [torch.compile OpenVINO 集成](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fopenvino-workflow\u002Ftorch-compile.html) 的主要量化框架。\n\n- [OpenVINO Training 
Extensions](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Ftraining_extensions)\n\n  NNCF 已集成到 OpenVINO Training Extensions 中，作为模型优化后端。用户可以基于现有模型模板训练、优化并导出新模型，同时使用 OpenVINO 运行这些导出的模型。\n\n- [Microsoft Olive](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Folive)\n\n  NNCF 被用于对 OpenVINO IR 和 ONNX 模型进行量化，以支持 [OpenVINO 集成](https:\u002F\u002Fmicrosoft.github.io\u002FOlive\u002Ffeatures\u002Fihv-integration\u002Fopenvino.html)。\n\n\u003Ca id=\"installation-guide\">\u003C\u002Fa>\n\n## 安装指南\n\n有关详细的安装说明，请参阅 [安装](.\u002Fdocs\u002FInstallation.md) 指南。\n\nNNCF 可以通过 pip 以常规 PyPI 包的形式安装：\n\n```bash\npip install nncf\n```\n\nNNCF 也可通过 [conda](https:\u002F\u002Fanaconda.org\u002Fconda-forge\u002Fnncf) 获取：\n\n```bash\nconda install -c conda-forge nncf\n```\n\nNNCF 的系统要求取决于所使用的后端。各后端的系统要求及对应版本矩阵可在 [installation.md](.\u002Fdocs\u002FInstallation.md) 中找到。\n\n## NNCF 压缩模型库\n\n模型列表及其压缩结果可在我们的 [NNCF 模型库页面](.\u002Fdocs\u002FModelZoo.md) 中找到。\n\n## 引用\n\n```bi\n@article{kozlov2020neural,\n    title =   {用于快速模型推理的神经网络压缩框架},\n    author =  {科兹洛夫、拉扎列维奇、沙姆波罗夫、利亚柳什金、戈尔巴乔夫},\n    journal = {arXiv 预印本 arXiv:2002.08679},\n    year =    {2020}\n}\n```\n\n## 贡献指南\n\n有关向 NNCF 仓库贡献的指导原则，请参阅 [CONTRIBUTING.md](.\u002FCONTRIBUTING.md) 文件。\n\n## 有用链接\n\n- [文档](.\u002Fdocs)\n- [示例](.\u002Fexamples)\n- [常见问题解答](.\u002Fdocs\u002FFAQ.md)\n- [Notebooks](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks#-model-training)\n- [HuggingFace Optimum Intel](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Foptimum\u002Fintel\u002Foptimization_ov)\n- [OpenVINO 模型优化指南](https:\u002F\u002Fdocs.openvino.ai\u002Fnncf)\n- [OpenVINO Hugging Face 页面](https:\u002F\u002Fhuggingface.co\u002FOpenVINO#models)\n- [OpenVino 性能基准测试页面](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fabout-openvino\u002Fperformance-benchmarks.html)\n\n## 遥测\n\n作为 OpenVINO™ 工具包的一部分，NNCF 会收集匿名使用数据，以用于改进 OpenVINO™ 工具。您可以在安装了 NNCF 的 Python 环境中运行以下命令，随时选择退出：\n\n`opt_in_out --opt_out`\n\n更多信息请参阅 [OpenVINO 
遥测](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fabout-openvino\u002Fadditional-resources\u002Ftelemetry.html)。","# NNCF 快速上手指南\n\nNeural Network Compression Framework (NNCF) 是一个用于优化神经网络推理的开源工具库，支持 OpenVINO、PyTorch、ONNX 和 TorchFX 后端。它提供训练后量化（PTQ）和训练时量化（QAT）等算法，旨在最小化精度损失的前提下提升模型推理速度并减小模型体积。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**: Linux, Windows 或 MacOS\n*   **Python 版本**: 3.10 或更高版本\n*   **支持的框架**:\n    *   OpenVINO\n    *   PyTorch\n    *   ONNX\n    *   TorchFX (实验性支持)\n*   **依赖项**: 根据您使用的后端，需预先安装对应的深度学习框架（如 `torch`, `onnx`, `openvino`）。\n\n## 安装步骤\n\n推荐使用 pip 进行安装。国内开发者建议使用清华或阿里镜像源以加速下载。\n\n**基础安装：**\n```bash\npip install nncf -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n**安装特定后端支持（可选）：**\n如果需要使用特定的后端功能，可能需要额外安装相关依赖。例如，若未安装 OpenVINO 或 PyTorch，请先安装它们：\n\n```bash\n# 安装 PyTorch (示例，具体版本请参考 pytorch.org)\npip install torch torchvision -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n\n# 安装 OpenVINO\npip install openvino -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n## 基本使用\n\nNNCF 最核心的功能是**训练后量化 (Post-Training Quantization, PTQ)**。以下展示如何使用 NNCF 将模型量化为 8-bit 整数格式。该过程仅需模型文件和少量校准数据集（约 300 个样本）。\n\n### 场景一：OpenVINO 模型量化（推荐）\n\n这是性能最优的后端组合。\n\n```python\nimport nncf\nimport openvino as ov\nimport torch\nfrom torchvision import datasets, transforms\n\n# 1. 加载未压缩的 OpenVINO 模型\nmodel = ov.Core().read_model(\"\u002Fpath\u002Fto\u002Fyour\u002Fmodel.xml\")\n\n# 2. 准备校准数据集 (用于收集统计信息)\nval_dataset = datasets.ImageFolder(\"\u002Fpath\u002Fto\u002Fdataset\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1)\n\n# 定义数据转换函数：从 dataloader 中提取输入张量\ndef transform_fn(data_item):\n    images, _ = data_item\n    return images\n\n# 初始化 NNCF 数据集\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n\n# 3. 
执行量化\nquantized_model = nncf.quantize(model, calibration_dataset)\n\n# 保存量化后的模型\nov.save_model(quantized_model, \"\u002Fpath\u002Fto\u002Fquantized_model.xml\")\n```\n\n### 场景二：PyTorch 模型量化\n\n直接对 PyTorch 模型进行量化，无需转换为中间格式。\n\n```python\nimport nncf\nimport torch\nfrom torchvision import datasets, models, transforms\n\n# 1. 实例化 PyTorch 模型（pretrained=True 已弃用，改用 weights 参数加载预训练权重）\nmodel = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)\nmodel.eval()  # 确保模型处于评估模式\n\n# 2. 准备校准数据集\nval_dataset = datasets.ImageFolder(\"\u002Fpath\u002Fto\u002Fdataset\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = torch.utils.data.DataLoader(val_dataset)\n\n# 定义数据转换函数\ndef transform_fn(data_item):\n    images, _ = data_item\n    return images\n\n# 初始化 NNCF 数据集\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n\n# 3. 执行量化\nquantized_model = nncf.quantize(model, calibration_dataset)\n\n# 此时 quantized_model 仍是一个 torch.nn.Module，可直接用于推理或进一步微调\n```\n\n### 场景三：ONNX 模型量化\n\n```python\nimport onnx\nimport nncf\nimport torch\nfrom torchvision import datasets, transforms\n\n# 1. 加载 ONNX 模型\nonnx_model = onnx.load_model(\"\u002Fpath\u002Fto\u002Fyour\u002Fmodel.onnx\")\n\n# 2. 准备校准数据集\nval_dataset = datasets.ImageFolder(\"\u002Fpath\u002Fto\u002Fdataset\", transform=transforms.Compose([transforms.ToTensor()]))\ndataset_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1)\n\n# 获取输入节点名称\ninput_name = onnx_model.graph.input[0].name\n\n# 定义数据转换函数 (ONNX 需要字典格式的输入)\ndef transform_fn(data_item):\n    images, _ = data_item\n    return {input_name: images.numpy()}\n\n# 初始化 NNCF 数据集\ncalibration_dataset = nncf.Dataset(dataset_loader, transform_fn)\n\n# 3. 
执行量化\nquantized_model = nncf.quantize(onnx_model, calibration_dataset)\n\n# 保存量化后的 ONNX 模型\nonnx.save(quantized_model, \"\u002Fpath\u002Fto\u002Fquantized_model.onnx\")\n```\n\n> **提示**: 如果量化后精度下降超出预期，可以考虑使用 **训练时量化 (Quantization Aware Training, QAT)** 对模型进行微调，以恢复精度。详细用法请参考官方文档中的 Training-Time Compression 部分。","某边缘计算团队正致力于将基于 PyTorch 训练的高精度缺陷检测模型部署到算力受限的工业相机上，以满足产线实时质检需求。\n\n### 没有 nncf 时\n- **推理延迟过高**：原始浮点模型在边缘设备上单次推理耗时超过 80ms，无法达到产线要求的 30ms 实时响应标准。\n- **内存资源紧张**：模型参数量大导致显存占用极高，容易引发设备内存溢出（OOM），迫使团队更换更昂贵的硬件。\n- **精度与速度难平衡**：手动尝试量化或剪枝不仅代码改造量大，且极易造成检测准确率断崖式下跌，难以通过验收。\n- **部署流程繁琐**：缺乏统一的压缩接口，针对不同后端（如 OpenVINO）需要重复编写转换脚本，维护成本高昂。\n\n### 使用 nncf 后\n- **推理速度显著提升**：利用 nncf 的“训练后量化”技术，将模型权重从 FP32 压缩至 INT8，推理延迟降低至 25ms，完美满足实时性要求。\n- **内存占用大幅缩减**：通过“权重量化压缩”，模型体积缩小约 4 倍，轻松在低配边缘设备上运行，无需升级硬件。\n- **精度损失微乎其微**：nncf 自动优化模型图结构并校准数据分布，在加速的同时保持检测准确率下降不超过 1%。\n- **一站式压缩流程**：仅需几行代码即可调用统一接口完成从 PyTorch 到 OpenVINO 的自动转换与压缩，极大简化了部署流水线。\n\nnncf 通过自动化的高效压缩算法，让高精度 AI 模型在低成本边缘设备上实现了速度与精度的最佳平衡。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenvinotoolkit_nncf_33b67927.png","openvinotoolkit","OpenVINO™ Toolkit","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fopenvinotoolkit_3a5e7b58.png","",null,"https:\u002F\u002Fdocs.openvino.ai\u002F","https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit",[83,87,91,95,99,102],{"name":84,"color":85,"percentage":86},"Python","#3572A5",91.9,{"name":88,"color":89,"percentage":90},"Jupyter Notebook","#DA5B0B",7.4,{"name":92,"color":93,"percentage":94},"Cuda","#3A4E3A",0.5,{"name":96,"color":97,"percentage":98},"C++","#f34b7d",0.1,{"name":100,"color":101,"percentage":98},"Makefile","#427819",{"name":103,"color":104,"percentage":105},"C","#555555",0,1146,295,"2026-04-03T07:10:24","Apache-2.0","Linux, Windows, macOS","未说明（支持 GPU 加速层用于微调，但未指定具体型号、显存或 CUDA 版本要求）","未说明",{"notes":114,"python":115,"dependencies":116},"该工具主要支持 OpenVINO、PyTorch、TorchFX 和 ONNX 后端。提供训练后量化（PTQ）和训练时量化（QAT）等算法。虽然支持分布式训练和 GPU 加速微调，但 README 
中未列出具体的硬件配置门槛。","3.10+",[117,118,119],"openvino","torch","onnx",[13,26,14],[122,123,124,125,126,127,128,129,130,131,132,133,134,135,119,117,136,137,138],"quantization","pruning","sparsity","quantization-aware-training","mixed-precision-training","compression","semantic-segmentation","object-detection","classification","nlp","bert","transformers","pytorch","tensorflow","deep-learning","genai","llm","2026-03-27T02:49:30.150509","2026-04-06T05:35:35.515641",[142,147,152,157,162,166],{"id":143,"question_zh":144,"answer_zh":145,"source_url":146},16487,"如何认领一个 'Good First Issue'（适合新手的任务）？","在 Issue 评论区输入命令 \".take\" 即可将问题分配给自己。如果该问题已被分配给其他贡献者，系统会提示您联系当前负责人确认状态。","https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fissues\u002F2496",{"id":148,"question_zh":149,"answer_zh":150,"source_url":151},16488,"量化 ONNX 模型时遇到 \"RuntimeError: Could not find the bias value of the node\" 错误怎么办？","这通常是因为卷积层后直接跟随 BatchNormalization 层，导致偏置值位于 BN 层而非卷积层中。解决方法是安装最新版本的 OpenVINO 开发包以获取修复后的逻辑：运行 `pip install openvino-dev==2023.1.0.dev20230728`（或更新版本）。无需手动为所有无偏置的卷积层添加 0 偏置。","https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fissues\u002F1936",{"id":153,"question_zh":154,"answer_zh":155,"source_url":156},16489,"如何在 NNCF 项目中启用并运行 mypy 类型检查？","首先在仓库根目录创建 `.mypy.ini` 文件，配置目标文件范围（如 `files = nncf\u002Fcommon\u002Fgraph`）和严格模式（`strict = True`）。然后通过 `pip install -e .` 以可编辑模式安装 NNCF，最后在命令行运行 `mypy` 即可查看类型错误报告。","https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fissues\u002F2495",{"id":158,"question_zh":159,"answer_zh":160,"source_url":161},16490,"NNCF 中的旧版张量统计（tensor_statistics）和新版有什么区别？如何处理？","新版张量统计位于 `experimental\u002Fcommon\u002Ftensor_statistics`，提供更好的功能和性能。目前的任务是搜索代码库中将旧版统计用于算法、测试和工具函数的地方，将其替换为新版实现，更新相应测试，最后删除旧的统计模块及实验性目录以消除冗余。","https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fissues\u002F3041",{"id":163,"question_zh":164,"answer_zh":165,"source_url":146},16491,"为什么某些标记为 \"Good First Issue\" 
的任务突然被关闭了？","任务可能因外部依赖变更而关闭。例如，某次代码覆盖率提升任务被关闭，是因为仓库禁用了代码覆盖率测量的 GitHub Action，且当时没有可用的替代测量工具。遇到这种情况请寻找其他开放的任务。",{"id":167,"question_zh":168,"answer_zh":169,"source_url":161},16492,"在哪里可以找到 NNCF 的贡献指南和社区支持渠道？","贡献指南位于仓库的 `CONTRIBUTING.md` 文件中。如需社区支持、讨论或提问，可以加入 Intel DevHub Discord 频道（链接通常在 Issue 描述或仓库 README 中提供），那里有 OpenVINO 开发者和其他贡献者。",[171,176,181,186,191,196,201,206,211,216,221,226,231,236,241,246,251,256,261,266],{"id":172,"version":173,"summary_zh":174,"released_at":175},98827,"v3.0.0","训练后量化：\n\n- 破坏性变更：\n  - 将 `nncf.CompressWeightsMode.CB4_F8E4M3` 模式选项重命名为 `nncf.CompressWeightsMode.CB4`。\n- 通用：\n  - 新增了 `nncf.prune` API 函数，为剪枝算法提供统一接口。目前适用于 PyTorch 后端，并支持幅度剪枝。\n    更多关于该新 API 的详细信息，请参阅[文档](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Ftree\u002Fdevelop\u002Fdocs\u002Fusage\u002Ftraining_time_compression\u002Fpruning\u002FUsage.md)。\n  - 新增了 `nncf.build_graph` API 函数，用于从模型构建 `NNCFGraph`。该 API 可用于检查和定义[忽略范围](\u002Fdocs\u002Fusage\u002FIgnoredScope.md)。\n  - 新增了关于使用 `nncf.IgnoredScope` 的[文档](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fblob\u002Fdevelop\u002Fdocs\u002Fusage\u002FIgnoredScope.md)。\n  - 重构了 `HWConfig`，现采用 Python 风格的硬件配置定义，而非 JSON 文件。\n- 功能：\n  - 在无数据权重压缩和数据感知 AWQ 算法中，新增了对包含转置激活输入的 MatMul 操作的模型的支持。\n  - （OpenVINO）引入了一种新的实验性压缩数据类型 ADAPTIVE_CODEBOOK。该压缩类型会为每个 MatMul 或一组相同的 MatMul 计算一个唯一的码本（例如，所有 down_proj 可以共享同一个码本）。这种方法能够在按通道进行权重压缩时减少精度损失。请参阅[示例](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Ftree\u002Fdevelop\u002Fexamples\u002Fllm_compression\u002Fopenvino\u002Fsmollm2_360m_adaptive_codebook)。\n  - （TorchFX）推出了对新 `compress_pt2e` API 的预览支持，允许使用 `OpenVINOQuantizer` 对 `torch.fx.GraphModule` 模型进行量化。用户现在可以通过 nncf 的 `compress_pt2e`，结合 Scale Estimation 和 AWQ，将其模型量化到 OpenVINO 后端的 [ExecuTorch](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fexecutorch)。\n  - （PyTorch）为快速偏差校正算法新增了对线性函数的支持，以提高量化后此类模型的精度。\n  - 
（OpenVINO）新增了[激活分析器](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Ftree\u002Fdevelop\u002Ftools\u002Factivation_profiler)工具，用于收集和可视化张量统计信息。\n- 修复：\n  - （ONNX）修复了 `compress_quantize_weights_transformation()` 方法，移除了已删除初始值设定项名称从图输入中的引用。\n  - （ONNX）修复了 MatMulNBits 节点插入不正确的问题。\n- 改进：\n  - 在 AWQ、Scale Estimation 和 GPTQ 算法中，新增了对 3D 权重压缩的支持。现在，带有 MoE（专家混合）结构的模型，如 GPT-OSS-20B 和 Qwen3-30B-A3B，也可以使用数据感知方法进行压缩。\n- 教程：\n  - [YOLO26 OpenVINO 模型的训练后量化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Ftree\u002Fdevelop\u002Fexamples\u002Fpost_training_quantization\u002Fopenvino\u002Fyolo26)\n  - [Wan2.2 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fwan2.2-te","2026-02-24T10:19:44",{"id":177,"version":178,"summary_zh":179,"released_at":180},98828,"v2.19.0","训练后量化：\n\n- 破坏性变更：\n  - （OpenVINO）`nncf.CompressWeightsMode.E2M1` 模式选项已重命名为 `nncf.CompressWeightsMode.MXFP4`。\n- 功能：\n  - 引入了直方图聚合器：它会记录张量值的运行直方图，并计算量化范围，以最小化直方图桶量化误差的 L2 范数。直方图聚合器显著提升了多个 PTQ 分类模型的精度指标。可通过 `nncf.quantize()` 中的 `AdvancedQuantizationParameters` 使用 `RangeEstimatorParametersSet.HISTOGRAM` 来启用。\n  - （OpenVINO）在 `nncf.CompressWeightsMode` 中引入了多个新的压缩模式：`MXFP8`、`FP8` 和 `FP4`。这些模式可用作 `nncf.compress_weights()` 的 `mode` 参数，以应用相应的 MXFP8、FP8 或 FP4 精度（实验性）。\n  - 现在，权重压缩位宽分布表还会为每种压缩数据类型显示分组大小值。\n  - （ONNX）ONNX 后端的 INT8 量化现已支持 SmoothQuant 算法。\n  - （ONNX）新增了一种优化模型的转换方法，可将具有常量输入的 `QuantizeLinear` 节点折叠为预计算的量化初始化器。此行为由 `nncf.quantize()` 中的 `COMPRESS_WEIGHTS` 后端参数控制，该参数现默认启用（`True`）。\n  - （ONNX）新增了对 `MatMul` + `Add` 子图应用快速偏置\u002F偏置校正算法的支持，其中 `Add` 操作的一个输入为常量。此前，由于系统未识别出 `MatMul` 操作包含偏置，导致此类情况被跳过，无法应用该算法。\n- 修复：\n  - 在 Segment Anything 模型中添加了对位置嵌入层的忽略模式。\n  - （ONNX）修复了 `MatMulNBits` 操作中不正确的输入处理问题，该问题曾导致图结构中断。\n  - （ONNX）解决了在 `transB=1` 时，`Gemm` 操作中 INT4 权重压缩的问题。\n  - 修复了 [#3613](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fissues\u002F3613) 中报告的 `_get_smooth_quant_param_grid()` 方法中的拼写错误。\n- 改进：\n  - 
通过在下一次统计收集推理调用之前释放模型输出内存，降低了统计信息收集过程中的最大内存消耗。\n  - 减少了偏置校正算法的峰值内存占用。\n  - （OpenVINO）将模型压缩至 `MXFP4` 数据类型的耗时缩短了最多 3 倍，内存使用减少了最多 1.5 倍。\n- 教程：\n  - [训练后优化后的 Qwen3 代理部署](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fsupplementary_materials\u002Fnotebooks\u002Fqwen-3\u002Fsmolagents\u002Fqwen3_agent.ipynb)\n  - [ACE Step 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Face-step-music-generation\u002Face-step-music-generation.ipynb)\n  - [Qwen3-VL 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fqwen3-vl\u002Fqwen3-vl.ipynb)","2025-12-01T16:40:47",{"id":182,"version":183,"summary_zh":184,"released_at":185},98829,"v2.18.0","训练后量化：\n\n- 功能：\n  - （OpenVINO）引入了新的压缩数据类型 CB4_F8E4M3 和 CODEBOOK。CB4_F8E4M3 是一种基于 NF4 数据类型的固定码本，包含 16 个 fp8 值。CODEBOOK 则是用户可任意选择的码本，可用于尝试不同的数据类型。这两种数据类型均用于权重压缩。针对这些数据类型，支持 AWQ 和尺度估计算法。\n  - （OpenVINO）新增对 FP8（f8e4m3 和 f8e5m2）权重压缩为 4 位数据类型的支持，这对于 DeepSeek-R1 等模型尤为有益。\n  - 新增 `group_size_fallback_mode` 参数，用于高级权重压缩。该参数控制如何处理不支持默认分组大小的节点。默认设置为 `IGNORE`，即跳过此类节点；设置为 `ERROR` 时，若通道数不能被分组大小整除，则会抛出异常；而设置为 `ADJUST` 时，则会尝试调整分组大小使其有效。\n  - （TorchFX）在 `quantize_pt2e` API 中新增对外部量化器的支持，包括 [XNNPACKQuantizer](https:\u002F\u002Fdocs.pytorch.org\u002Fexecutorch\u002Fstable\u002Fbackends-xnnpack.html#quantization) 和 [CoreMLQuantizer](https:\u002F\u002Fdocs.pytorch.org\u002Fexecutorch\u002Fstable\u002Fbackends-coreml.html#quantization)。用户现在可以通过 nncf 的 `quantize_pt2e` 接口，在 [ExecuTorch](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fexecutorch) 中为 XNNPACK 和 CoreML 后端量化模型，采用平滑量化、偏置校正算法以及多种统计信息收集器。\n  - （ONNX）在 ONNX 后端新增对数据感知型权重压缩的支持，包括 AWQ 和尺度估计算法。并提供了一个[示例](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Ftree\u002Fdevelop\u002Fexamples\u002Fllm_compression\u002Fonnx\u002Ftiny_llama_scale_estimation)，演示了使用 ONNX 格式的 `TinyLlama\u002FTinyLlama-1.1B-Chat-v1.0` 
模型进行数据感知型权重压缩的流程。\n- 改进：\n  - 支持带有旋转位置嵌入模块的模型进行权重压缩。\n  - 支持带有状态自注意力模块的模型进行权重压缩。\n- 教程：\n  - [Qwen-Agent 的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fllm-agent-mcp\u002Fllm-agent-mcp.ipynb)\n  - [FLUX.1 Kontext 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fflux.1-kontext\u002Fflux.1-kontext.ipynb)\n  - [Qwen3 嵌入模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fqwen3-embedding\u002Fqwen3-embedding.ipynb)\n  - [GLM-4.1V-9B-Thinking 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fglm4.1-v-thinking\u002Fglm4.1-v-thinking.ipynb)\n\n压缩感知训练：\n\n- 功能：\n  - （PyTorch）利用先进的压缩方法（AWQ + 尺度估计），增强了“可吸收 LoRA 的 QAT”初始化。这一改进取代了先前的…","2025-09-04T14:37:05",{"id":187,"version":188,"summary_zh":189,"released_at":190},98830,"v2.17.0","训练后量化：\n\n- 通用：\n  - （PyTorch）function_hook 模块现已成为模型追踪的默认机制。它已从实验性状态移出，并被迁移到核心 nncf.torch 命名空间中。\n- 功能：\n  - （OpenVINO、PyTorch、TorchFX）基于权重的每列幅值，新增了4位无数据AWQ（激活感知权重量化），使得在无需数据集的情况下也能应用AWQ，从而实现更精确的压缩。\n  - （OpenVINO）增加了对FP8下ScaledDotProductAttention值输入进行量化的支持。\n  - （ONNX）在ONNX后端新增了使用INT4（INT8）进行无数据权重压缩的支持，并添加了一个LLM权重压缩的示例。[该示例](examples\u002Fllm_compression\u002Fonnx\u002Ftiny_llama)展示了如何利用NNCF权重压缩API对ONNX格式的`TinyLlama-1.1B-Chat-v0.3`模型进行优化。\n  - （ONNX）为ONNX后端新增了`BackendParameters.EXTERNAL_DATA_DIR`参数。该参数用于指定存储模型外部数据文件的绝对路径。所有外部数据文件必须位于同一目录下。当使用`onnx.load(\"model.onnx\", load_external_data=False)`加载模型且未包含外部数据时，若外部数据文件不在进程的当前工作目录中，则应使用此参数。如果外部数据文件位于进程的当前工作目录中，则可省略该参数。\n  - （TorchFX，实验性）新增了支持AWQ和Scale Estimation等数据感知方法的4位权重压缩功能，以减少精度损失。\n- 修复：\n  - （TorchFX，实验性）为简化使用，nncf.torch.disable_patching() 上下文管理器已被弃用，不再需要（[示例](\u002Fexamples\u002Fpost_training_quantization\u002Ftorch_fx\u002Fresnet18\u002FREADME.md)）。\n  - 修复了不含批次维度的模型中BiasCorrection失败的问题。\n  - 
将NF4的分位数中心与OpenVINO实现对齐。\n  - 修复了权重压缩统计信息的显示问题，使其能够正确展示被忽略权重的数据类型。\n- 改进：\n  - （OpenVINO）将NNCF版本添加到rt_info中。\n  - 优化了NF4的权重压缩性能（速度提升高达10倍）。\n  - `nncf.data.generate_text_data`增加了对`transformers>4.52`版本的支持。\n- 教程：\n  - [MiniCPM-o 2.6模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fminicpm-o-omnimodal-chatbot\u002Fminicpm-o-omnimodal-chatbot.ipynb)\n  - [Qwen2.5-Omni模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fqwen2.5-omni-chatbot\u002Fqwen2.5-omni-chatbot.ipynb)\n  - [InternVideo2模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fintern-video2-classiciation\u002Fintern-video2-classification.ipynb)\n  - [OpenVoice2和MeloTTS模型的训练后优化]","2025-06-18T14:59:06",{"id":192,"version":193,"summary_zh":194,"released_at":195},98831,"v2.16.0","### 训练后量化：\n\n#### 功能：\n  - （PyTorch）新增了对 AWQ 和 Scale Estimation 数据感知方法的 4 位权重压缩支持，以减少精度损失。\n  - （PyTorch，实验性）为 MinMax、FastBiasCorrection、SmoothQuant 和 WeightCompression 算法引入了 TorchFunctionMode 支持。\n\n#### 修复：\n  - 修复了 ARM CPU 上权重压缩算法偶尔失败的问题。\n  - 修复了 GPTQ 在使用按通道 int4 权重压缩时失败的问题。\n  - 修复了具有 fp8 权重的模型在权重压缩时失败的问题。\n  - （PyTorch，实验性）修复了 float16\u002Fbfloat16 模型的权重压缩问题。\n  - （PyTorch，实验性）修复了多个内存泄漏问题：未分离的张量、提取的模块以及带有梯度的图构建。\n\n#### 改进：\n  - 降低了 OpenVINO 后端在权重压缩过程中混合精度分配步骤的运行时间和峰值内存消耗。总体而言，在混合精度情况下，压缩时间缩短约 20%-40%；峰值内存减少约 20%。\n  - NNCF 硬件配置中新增了 `narrow_range` 参数，从而在 MinMax 量化算法中支持更多量化配置组合。\n  - （TorchFX，实验性）为使用动态形状导出的 [TorchFX](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Ffx.html) 模型添加了量化支持。\n  - （TorchFX，实验性）从 `quantize_pt2e` 函数和 `OpenVINOQuantizer` 的 `transform_for_annotation` 方法中移除了常量折叠步骤，以与 `torch.ao` 的量化实现保持一致。\n  - 优化了 GPTQ 算法的行为，使其内存和时间消耗分别减少了 2.71 倍和 1.16 倍。\n  - 新增了对具有 FP8 和 NF4 权重的模型进行优化的通用支持。\n  - 对非 8 位量化不再应用溢出修复（overflow fix）。\n\n#### 教程：\n  - [Gemma3 
模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fgemma3\u002Fgemma3.ipynb)\n  - [GLM4-V 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fglm4-v\u002Fglm4-v.ipynb)\n  - [Llasa 语音合成模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fllasa-speech-synthesis\u002Fllasa-speech-synthesis.ipynb)\n  - [YOLOv12 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fyolov12-optimization\u002Fyolov12-object-detection.ipynb)\n  - [Phi-4 多模态模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fphi-4-multimodal\u002Fphi-4-multimodal.ipynb)\n  - [Qwen2.5VL 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fqwen2.5-vl\u002Fqwen2.5-vl.ipynb)\n  - [DeepSeek-VL2 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fdeepseek-vl","2025-04-10T10:13:45",{"id":197,"version":198,"summary_zh":199,"released_at":200},98832,"v2.15.0","### 训练后量化：\n\n#### 功能特性：\n\n  - （TensorFlow）`nncf.quantize()` 方法现已成为量化感知训练的推荐 API。有关如何使用这一新方法的详细信息，请参阅 [示例](examples\u002Fquantization_aware_training\u002Ftensorflow\u002Fmobilenet_v2)。\n  - （TensorFlow）现在可以通过新的 API 函数 `nncf.tensorflow.get_config()` 和 `nncf.tensorflow.load_from_config()` 对模型中的压缩层位置进行序列化和恢复。有关量化模型的保存与加载的更多详细信息，请参阅 [文档](docs\u002Fusage\u002Ftraining_time_compression\u002Fquantization_aware_training\u002FUsage.md#saving-and-loading-compressed-models)。\n  - （OpenVINO）新增了针对 LLM 量化至 FP8 精度的 [示例](examples\u002Fllm_compression\u002Fopenvino\u002Fsmollm2_360m_fp8)。\n  - （TorchFX，实验性）引入了对全新 `quantize_pt2e` API 的预览支持，该 API 可以使用 `OpenVINOQuantizer` 和 
`X86InductorQuantizer` 量化器对 `torch.fx.GraphModule` 模型进行量化。`quantize_pt2e` API 利用 MinMax 算法统计收集器，以及 SmoothQuant、BiasCorrection 和 FastBiasCorrection 等训练后量化算法。\n  - 为 ScaledDotProductAttention 操作增加了尺度统一。\n\n#### 修复：\n  - （ONNX）修复了 BiasCorrection 算法中偶发的精度问题。\n  - （ONNX）修复了 GroupConvolution 操作的权重量化问题，这也提升了多个模型的性能。\n  - 修复了 AccuracyAwareQuantization 算法，以解决 [#3118](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fissues\u002F3118) 问题。\n  - 修复了 NNCF 在可能损坏的后端框架中使用时出现的问题。\n\n#### 改进：\n  - （TorchFX，实验性）新增了 YoloV11 支持。\n  - （OpenVINO）提升了 FastBiasCorrection 算法的性能。\n  - OpenVINO 模型的无数据权重压缩速度显著提升：INT4 压缩速度最高可提升 10 倍，而 INT8 压缩速度最高可提升 3 倍。模型越大，时间缩短越明显。\n  - AWQ 权重压缩速度最高可提升 2 倍，从而提高整体运行效率。\n  - 在 OpenVINO 后端进行 INT4 无数据权重压缩时，部分模型的峰值内存占用可降低高达 50%。\n\n#### 教程：\n  - [GLM-Edge-V 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fglm-edge-v\u002Fglm-edge-v.ipynb)\n  - [OmniGen 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fomnigen\u002Fomnigen.ipynb)\n  - [Sana 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fsana-image-generation\u002Fsana-image-generation.ipynb)\n  - [BGE 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fllm-rag-langchain\u002Fllm-rag-langchain-genai.ipynb)\n  - [训练后优","2025-02-06T10:08:27",{"id":202,"version":203,"summary_zh":204,"released_at":205},98833,"v2.14.1","### 训练后量化：\n\n#### 错误修复：\n  - （PyTorch）修复了 `get_torch_compile_wrapper` 函数，使其与 `torch.compile` 保持一致。\n  - （OpenVINO）更新了缓存统计功能，以采用 `safetensors` 方式。","2024-12-19T12:28:49",{"id":207,"version":208,"summary_zh":209,"released_at":210},98834,"v2.14.0","### 训练后量化：\n\n#### 功能特性：\n\n  - 在 `nncf.compress_weights()` 中引入了可选参数 `backup_mode`，用于指定在进行4位权重压缩时嵌入层、卷积层和最后一层线性层的数据类型。可用选项包括默认的 
INT8_ASYM、INT8_SYM，以及 NONE，后者会保留模型权重的原始浮点精度。\n  - 新增了 `quantizer_propagation_rule` 参数，提供对量化器传播的细粒度控制。这一高级选项旨在提升那些可能因不同粒度的量化器合并为逐张量量化而影响精度的模型的准确性。\n  - 引入了 `nncf.data.generate_text_data` API 方法，该方法利用大语言模型生成数据，以进一步支持数据感知型优化。详情请参阅 [示例](examples\u002Fllm_compression\u002Fopenvino\u002Ftiny_llama_synthetic_data\u002F)。\n  - （OpenVINO）扩展了 `nncf.compress_weights()` 对无数据和数据感知权重压缩方法的支持，新增了 NF4 每通道量化，从而使压缩后的大语言模型在 NPU 上更加准确且运行更快。\n  - （OpenVINO）引入了一个新的选项 `statistics_path`，用于缓存并复用 `nncf.compress_weights()` 的统计信息，从而减少寻找最优压缩配置所需的时间。详情请参阅 [TinyLlama 示例](examples\u002Fllm_compression\u002Fopenvino\u002Ftiny_llama_find_hyperparams)。\n  - （TorchFX，实验性）新增了对 [Torch FX](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Ffx.html) 模型的量化和权重压缩支持。压缩后的模型可以直接通过 `torch.compile(compressed_model, backend=\"openvino\")` 执行（详情请见 [此处](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fopenvino-workflow\u002Ftorch-compile.html)）。同时新增了 [INT8 量化示例](examples\u002Fpost_training_quantization\u002Ftorch_fx\u002Fresnet18)。支持的功能列表如下：\n    - 通过 `nncf.quantize()` 实现 SmoothQuant、MinMax、FastBiasCorrection 和 BiasCorrection 算法的 INT8 量化。\n    - 使用 `nncf.compress_weights()` 进行无数据的 INT8、INT4 以及混合精度权重压缩。\n  - （PyTorch，实验性）基于 TorchFunctionMode 添加了模型跟踪及执行前后的钩子。\n\n#### 修复内容：\n\n  - 解决了在元素级运算前冗余插入量化器的问题，从而减少了量化带来的噪声。\n  - 修复了 `nncf.quantize_with_accuracy_control()` 中的类型不匹配问题。\n  - 针对特定分支情况修复了 BiasCorrection 算法。\n  - （OpenVINO）修复了 Stable Diffusion 模型中 GPTQ 权重压缩方法的问题。\n  - （OpenVINO）修复了 `nncf.compress_weights()` 中变分统计处理的相关问题。\n  - （PyTorch，ONNX）将缩放点积注意力模式的量化设置与 OpenVINO 保持一致。\n\n#### 性能改进：\n\n  - 对于使用 AWQ、Scale Estimation、LoRA 及混合精度算法的数据感知型 `nncf.compress_weights()`，峰值内存占用降低了 30% 至 50%。\n  - 对于 `nncf.com`，压缩时间缩短了 10% 至 20%。","2024-11-20T11:49:13",{"id":212,"version":213,"summary_zh":214,"released_at":215},98835,"v2.13.0","### 训练后量化：\n\n#### 功能特性：\n\n  - （OpenVINO）在 `nncf.compress_weights()` 中新增了将 GPTQ 与 AWQ 和尺度估计（SE）算法结合的支持，以实现对 LLM 权重更精确的压缩。因此，目前支持以下包含 GPTQ 的组合：AWQ+GPTQ+SE、AWQ+GPTQ、GPTQ+SE、GPTQ。\n  - （OpenVINO）新增 
LoRA 校正算法，在 AWQ 和尺度估计等现有算法的基础上进一步提升 int4 压缩模型的精度。可通过 `nncf.compress_weights()` API 的可选参数 `lora_correction` 启用该算法。该算法会增加压缩时间，并带来可忽略不计的模型尺寸开销。有关不同 int4 压缩方法的精度与占用空间权衡，请参阅 [精度\u002F占用空间权衡](docs\u002Fusage\u002Fpost_training_compression\u002Fweights_compression\u002FUsage.md#accuracyfootprint-trade-off)。\n  - （PyTorch）新增实验性训练后激活剪枝算法的实现。详情请参阅 [激活稀疏化](nncf\u002Fexperimental\u002Ftorch\u002Fsparsify_activations\u002FActivationSparsity.md)。\n  - 新增内存监控工具，用于记录一段 Python 代码或脚本所分配的内存。详细信息请参阅 [NNCF 工具](tools\u002FREADME.md)。\n\n#### 修复内容：\n\n  - （OpenVINO）修复了在部分输入属于 ShapeOF 子图的情况下，卷积和 LSTMSequence 操作的量化问题。\n  - （OpenVINO）修复了 FP8 的 FakeConvert 重复问题。\n  - 修复了 Smooth Quant 算法在形状不正确情况下的问题。\n  - 修复了逐层调度的非确定性问题。\n\n#### 优化改进：\n\n  - （OpenVINO）扩大了硬件融合模式的覆盖范围。\n  - 改进了权重压缩过程中的进度条逻辑，以更准确地估算剩余时间。\n  - 扩展了 `nncf.compress_weights()` 对尺度估计位宽范围的支持。\n  - 移除了算法生成的被忽略作用域的额外日志记录。\n\n#### 教程：\n\n  - [Flux.1 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fflux.1-image-generation\u002Fflux.1-image-generation.ipynb)\n  - [PixArt-α 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fpixart\u002Fpixart.ipynb)\n  - [InternVL2 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Finternvl2\u002Finternvl2.ipynb)\n  - [Qwen2Audio 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fqwen2-audio\u002Fqwen2-audio.ipynb)\n  - [NuExtract 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fnuextract-structure-extraction\u002Fnuextract-structure-extraction.ipynb)\n  - [MiniCPM-V2 
模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Fblob\u002Flatest\u002Fnotebooks\u002Fminicpm-v-multimodal-chatbot\u002Fminicpm-v-multimodal-chatbot.ipynb)\n\n### 压缩感知 t","2024-09-19T10:24:31",{"id":217,"version":218,"summary_zh":219,"released_at":220},98836,"v2.12.0","### 训练后量化：\n\n#### 功能特性：\n\n  - （OpenVINO、PyTorch、ONNX）对于 `nncf.ModelType.TRANSFORMER` 类型的模型，已将比较运算符排除在量化范围之外。\n  - （OpenVINO、PyTorch）在 `nncf.compress_weights()` 方法中，将对称量化权重的表示方式由带有固定零点的无符号整数改为不带零点的有符号数据类型。\n  - （OpenVINO）扩展了 AWQ 算法在 `nncf.compress_weights()` 中的支持模式，从而使其能够应用于更广泛的模型。\n  - （OpenVINO）在 `nncf.compress_weights()` 方法中引入了 `nncf.CompressWeightsMode.E2M1` 模式选项，作为新的 MXFP4 精度（实验性）。\n  - （OpenVINO）在 `nncf.quantize()` 方法中增加了对 BF16 精度模型的支持。\n  - （PyTorch）新增了对 `torch.addmm` 的量化支持。\n  - （PyTorch）新增了对 `torch.nn.functional.scaled_dot_product_attention` 的量化支持。\n\n#### 修复内容：\n\n  - （OpenVINO、PyTorch、ONNX）修复了 Fast-\u002FBiasCorrection 算法，使其正确支持转置的 MatMul 层。\n  - （OpenVINO）修复了针对包含 If 操作的模型的 `nncf.IgnoredScope()` 功能。\n  - （OpenVINO）修复了包含 PReLU 操作的模式。\n  - 修复了在未安装 Matplotlib 包的情况下导入 NNCF 时出现的运行时错误。\n\n#### 改进内容：\n\n  - 减少了对 OpenVINO 模型应用 `nncf.compress_weights()` 时所需的内存用量。\n  - 改进了在 `nncf.IgnoredScope()` 不为空时的日志记录。\n\n#### 教程：\n\n  - [Stable Audio Open 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fstable-audio\u002Fstable-audio.ipynb)\n  - [Phi3-Vision 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fphi-3-vision\u002Fphi-3-vision.ipynb)\n  - [MiniCPM-V2 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fminicpm-v-multimodal-chatbot\u002Fminicpm-v-multimodal-chatbot.ipynb)\n  - [Jina CLIP 
模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fjina-clip\u002Fjina-clip.ipynb)\n  - [Stable Diffusion v3 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fstable-diffusion-v3\u002Fstable-diffusion-v3.ipynb)\n  - [HunyuanDIT 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fhunyuan-dit-image-generation\u002Fhunyuan-dit-image-generation.ipynb)\n  - [DDColor 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fddcolor-image-colorization\u002Fddcolor-image-colorization.ipynb)\n  - [DynamiCrafter 模型的训练后优化](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fdynamicrafter-animating-images\u002Fdynamicrafter-animating-images.ipynb)\n  - [训练后…","2024-07-31T12:28:55",{"id":222,"version":223,"summary_zh":224,"released_at":225},98837,"v2.11.0","### Post-training Quantization:\r\n\r\n#### Features:\r\n\r\n- (OpenVINO) Added Scale Estimation algorithm for 4-bit data-aware weights compression. The optional scale_estimation parameter was introduced to nncf.compress_weights() and can be used to minimize accuracy degradation of compressed models (note that this algorithm increases the compression time).\r\n- (OpenVINO) Added GPTQ algorithm for 8\u002F4-bit data-aware weights compression, supporting INT8, INT4, and NF4 data types. 
The optional gptq parameter was introduced to nncf.compress_weights() to enable the [GPTQ](https:\u002F\u002Farxiv.org\u002Fabs\u002F2210.17323) algorithm.\r\n- (OpenVINO) Added support for models with BF16 weights in the weights compression method, nncf.compress_weights().\r\n- (PyTorch) Added support for quantization and weight compression of the custom modules.\r\n\r\n#### Fixes:\r\n\r\n- (OpenVINO) Fixed incorrect node with bias determination in Fast-\u002FBiasCorrection and ChannelAlignment algorithms.\r\n- (OpenVINO, PyTorch) Fixed incorrect behaviour of nncf.compress_weights() in case of compressed model as input.\r\n- (OpenVINO, PyTorch) Fixed SmoothQuant algorithm to work with Split ports correctly.\r\n\r\n#### Improvements:\r\n\r\n- (OpenVINO) Aligned resulting compression subgraphs for the nncf.compress_weights() in different FP precisions.\r\n- Aligned 8-bit scheme for NPU target device with the CPU.\r\n\r\n#### Examples:\r\n\r\n- (OpenVINO, ONNX) Updated ignored scope for YOLOv8 examples utilizing a subgraphs approach.\r\n\r\n#### Tutorials:\r\n\r\n- [Post-Training Optimization of Stable Video Diffusion Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fstable-video-diffusion\u002Fstable-video-diffusion.ipynb)\r\n- [Post-Training Optimization of YOLOv10 Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fyolov10-optimization\u002Fyolov10-optimization.ipynb)\r\n- [Post-Training Optimization of LLaVA Next Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fnano-llava-multimodal-chatbot\u002Fnano-llava-multimodal-chatbot.ipynb)\r\n- [Post-Training Optimization of S3D MIL-NCE 
Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fs3d-mil-nce-text-to-video-retrieval\u002Fs3d-mil-nce-text-to-video-retrieval.ipynb)\r\n- [Post-Training Optimization of Stable Cascade Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fstable-cascade-image-generation\u002Fstable-cascade-image-generation.ipynb)\r\n\r\n### Compression-aware training:\r\n\r\n#### Features:\r\n\r\n- (PyTorch) nncf.quantize method is now the recommended path for the quantization initialization for Quantization-Aware Training.\r\n- (PyTorch) Compression modules placement in the model now can be serialized and restored with new API functions: compressed_model.nncf.get_config() and nncf.torch.load_from_config. The [documentation](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fblob\u002Fmaster\u002Fdocs\u002Fusage\u002Ftraining_time_compression\u002Fquantization_aware_training\u002FUsage.md#saving-and-loading-compressed-models) for the saving\u002Floading of a quantized model is available, and Resnet18 [example](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fblob\u002Fmaster\u002Fexamples\u002Fquantization_aware_training\u002Ftorch\u002Fresnet18) was updated to use the new API.\r\n\r\n#### Fixes:\r\n\r\n- (PyTorch) Fixed compatibility with torch.compile.\r\n\r\n#### Improvements:\r\n\r\n- (PyTorch) Base parameters were extended for the EvolutionOptimizer (LeGR algorithm part).\r\n- (PyTorch) Improved wrapping for parameters which are not tensors.\r\n\r\n#### Examples:\r\n\r\n- (PyTorch) Added [an example](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fblob\u002Fmaster\u002Fexamples\u002Fquantization_aware_training\u002Ftorch\u002Fanomalib) for STFPM model from Anomalib.\r\n\r\n#### Tutorials:\r\n\r\n- [Quantization-Sparsity Aware Training of PyTorch ResNet-50 
Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fpytorch-quantization-sparsity-aware-training\u002Fpytorch-quantization-sparsity-aware-training.ipynb)\r\n\r\n### Deprecations\u002FRemovals:\r\n\r\n- Removed the extras for installing backends from setup.py (such as [torch], [tf], [onnx] and [openvino]).\r\n- Removed openvino-dev dependency.\r\n\r\n### Requirements:\r\n\r\n- Updated PyTorch (2.3.0) and Torchvision (0.18.0) versions.\r\n\r\n**Acknowledgements**\r\n\r\nThanks for contributions from the OpenVINO developer community:\r\n@DaniAffCH \r\n@UsingtcNower \r\n@anzr299 \r\n@AdiKsOnDev \r\n@Viditagarwal7479 \r\n@truhinnm ","2024-06-17T11:02:21",{"id":227,"version":228,"summary_zh":229,"released_at":230},98838,"v2.10.0","### Post-training Quantization:\r\n\r\n#### Features:\r\n\r\n- Introduced subgraph definition functionality for the nncf.IgnoredScope() option.\r\n- Introduced limited support for batch sizes greater than 1. 
MobilenetV2 [PyTorch example](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fblob\u002Fmaster\u002Fexamples\u002Fpost_training_quantization\u002Ftorch\u002Fmobilenet_v2) was updated with batch support.\r\n\r\n#### Fixes:\r\n\r\n- Fixed issue with the nncf.OverflowFix parameter absence in some scenarios.\r\n- Aligned the list of correctable layers for the FastBiasCorrection algorithm between PyTorch, OpenVINO and ONNX backends.\r\n- Fixed issue with the nncf.QuantizationMode parameters combination.\r\n- Fixed MobilenetV2 ([PyTorch](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fblob\u002Fmaster\u002Fexamples\u002Fpost_training_quantization\u002Ftorch\u002Fmobilenet_v2), [ONNX](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fblob\u002Fmaster\u002Fexamples\u002Fpost_training_quantization\u002Fonnx\u002Fmobilenet_v2), [OpenVINO](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fblob\u002Fmaster\u002Fexamples\u002Fpost_training_quantization\u002Fopenvino\u002Fmobilenet_v2)) examples for the Windows platform.\r\n- (OpenVINO) Fixed [Anomaly Classification example](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fblob\u002Fmaster\u002Fexamples\u002Fpost_training_quantization\u002Fopenvino\u002Fanomaly_stfpm_quantize_with_accuracy_control) for the Windows platform.\r\n- (PyTorch) Fixed bias shift magnitude calculation for fused layers.\r\n- (OpenVINO) Fixed removing the ShapeOf graph which led to an error in the nncf.quantize_with_accuracy_control() method.\r\n\r\n#### Improvements:\r\n\r\n- OverflowFix, AdvancedSmoothQuantParameters and AdvancedBiasCorrectionParameters were exposed into the nncf.* namespace.\r\n- (OpenVINO, PyTorch) Introduced scale compression to FP16 for weights in nncf.compress_weights() method, regardless of model weights precision.\r\n- (PyTorch) Modules that NNCF inserted were excluded from parameter tracing.\r\n- (OpenVINO) Extended the list of correctable layers for the 
BiasCorrection algorithm.\r\n- (ONNX) Aligned BiasCorrection algorithm behaviour with OpenVINO in specific cases.\r\n\r\n#### Tutorials:\r\n\r\n- [Post-Training Optimization of PhotoMaker Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fphoto-maker\u002Fphoto-maker.ipynb)\r\n- [Post-Training Optimization of Stable Diffusion XL Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fstable-diffusion-xl\u002Fstable-diffusion-xl.ipynb)\r\n- [Post-Training Optimization of KerasCV Stable Diffusion Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fstable-diffusion-keras-cv\u002Fstable-diffusion-keras-cv.ipynb)\r\n- [Post-Training Optimization of Paint By Example Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fpaint-by-example\u002Fpaint-by-example.ipynb)\r\n- [Post-Training Optimization of aMUSEd Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Famused-lightweight-text-to-image\u002Famused-lightweight-text-to-image.ipynb)\r\n- [Post-Training Optimization of InstantID Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Finstant-id\u002Finstant-id.ipynb)\r\n- [Post-Training Optimization of LLaVA Next Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fllava-next-multimodal-chatbot\u002Fllava-next-multimodal-chatbot.ipynb)\r\n- [Post-Training Optimization of AnimateAnyone Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fanimate-anyone\u002Fanimate-anyone.ipynb)\r\n- [Post-Training Optimization of YOLOv8-OBB 
Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fyolov8-optimization\u002Fyolov8-obb.ipynb)\r\n- [Post-Training Optimization of LLM Agent](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Flatest\u002Fnotebooks\u002Fllm-agent-langchain\u002Fllm-agent-langchain.ipynb)\r\n\r\n### Compression-aware training:\r\n\r\n#### Features:\r\n\r\n- (PyTorch) nncf.quantize method now may be used as quantization initialization for Quantization-Aware Training. Added a [Resnet18-based example](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fblob\u002Fmaster\u002Fexamples\u002Fquantization_aware_training\u002Ftorch\u002Fresnet18) with the transition from the Post-Training Quantization to a Quantization-Aware Training algorithm.\r\n- (PyTorch) Introduced extractors for the fused Convolution, Batch-\u002FGroupNorm, and Linear functions.\r\n\r\n#### Fixes:\r\n\r\n- (PyTorch) Fixed apply_args_defaults function issue.\r\n- (PyTorch) Fixed dtype handling for the compressed torch.nn.Parameter.\r\n- (PyTorch) Fixed is_shared parameter propagation.\r\n\r\n#### Improvements:\r\n\r\n- (PyTorch) Updated command creation behaviour to reduce the number of adapters.\r\n- (PyTorch) Added option to insert point for models that wrapped with replace_modules=False.\r\n\r\n#### Deprecations\u002FRemovals:\r\n\r\n- (PyTorch) Removed the binarization algorithm.\r\n- NNCF installation via pip install nncf[\u003Cframework>] option is now deprecated.\r\n\r\n#### Requirements:\r\n\r\n- Updated PyTorch (2.2.1) and CUDA (12.1) versions.\r\n- Updated ONNX (1.16.0) and ONNXRun","2024-04-25T12:01:23",{"id":232,"version":233,"summary_zh":234,"released_at":235},98839,"v2.9.0","### Post-training Quantization:\r\n\r\n#### Features:\r\n  - (OpenVINO) Added modified AWQ algorithm for 4-bit data-aware weights compression. This algorithm applied only for patterns `MatMul->Multiply->Matmul`. 
For that `awq` optional parameter has been added to `nncf.compress_weights()` and can be used to minimize accuracy degradation of compressed models (note that this option increases the compression time).\r\n  - (ONNX) Introduced support for the ONNX backend in the `nncf.quantize_with_accuracy_control()` method. Users can now perform quantization with accuracy control for `onnx.ModelProto`. By leveraging this feature, users can enhance the accuracy of quantized models while minimizing performance impact.\r\n  - (ONNX) Added an example based on the YOLOv8n-seg model for demonstrating the usage of quantization with accuracy control for the ONNX backend.\r\n  - (PT) Added SmoothQuant algorithm for PyTorch backend in `nncf.quantize()`.\r\n  - (OpenVINO) Added [an example](examples\u002Fllm_compression\u002Fopenvino\u002Ftiny_llama_find_hyperparams) with the hyperparameters tuning for the TinyLLama model.\r\n  - Introduced the `nncf.AdvancedAccuracyRestorerParameters`.\r\n  - Introduced the `subset_size` option for the `nncf.compress_weights()`.\r\n  - Introduced `TargetDevice.NPU` as the replacement for `TargetDevice.VPU`.\r\n#### Fixes:\r\n  - Fixed API Enums serialization\u002Fdeserialization issue.\r\n  - Fixed issue with required arguments for `revert_operations_to_floating_point_precision` method.\r\n#### Improvements:\r\n  - (ONNX) Aligned statistics collection with OpenVINO and PyTorch backends.\r\n  - Extended `nncf.compress_weights()` with Convolution & Embeddings compression in order to reduce memory footprint.\r\n#### Deprecations\u002FRemovals:\r\n  - (OpenVINO) Removed outdated examples with `nncf.quantize()` for BERT and YOLOv5 models.\r\n  - (OpenVINO) Removed outdated example with `nncf.quantize_with_accuracy_control()` for SSD MobileNetV1 FPN model.\r\n  - (PyTorch) Deprecated the `binarization` algorithm.\r\n  - Removed Post-training Optimization Tool as OpenVINO backend.\r\n  - Removed Dockerfiles.\r\n  - `TargetDevice.VPU` was replaced by 
`TargetDevice.NPU`.\r\n#### Tutorials:\r\n  - [Post-Training Optimization of Stable Diffusion v2 Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F236-stable-diffusion-v2\u002F236-stable-diffusion-v2-text-to-image.ipynb)\r\n  - [Post-Training Optimization of DeciDiffusion Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F259-decidiffusion-image-generation\u002F259-decidiffusion-image-generation.ipynb)\r\n  - [Post-Training Optimization of DepthAnything Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F280-depth-anything\u002F280-depth-anything.ipynb)\r\n  - [Post-Training Optimization of Stable Diffusion ControlNet Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F235-controlnet-stable-diffusion\u002F235-controlnet-stable-diffusion.ipynb)\r\n\r\n### Compression-aware training:\r\n\r\n#### Fixes\r\n  - (PyTorch) Fixed issue with `NNCFNetworkInterface.get_clean_shallow_copy` missed arguments.\r\n \r\n**Acknowledgements**\r\n\r\nThanks for contributions from the OpenVINO developer community:\r\n@AishwaryaDekhane \r\n@UsingtcNower \r\n@Om-Doiphode \r\n","2024-03-06T11:39:00",{"id":237,"version":238,"summary_zh":239,"released_at":240},98840,"v2.8.1","### Post-training Quantization:\r\n\r\n#### Bugfixes:\r\n  - (Common) Fixed issue with `nncf.compress_weights()` to avoid overflows on 32-bit Windows systems.\r\n  - (Common) Fixed performance issue with `nncf.compress_weights()` on LLama models.\r\n  - (Common) Fixed `nncf.quantize_with_accuracy_control` pipeline with `tune_hyperparams=True` enabled option.\r\n  - (OpenVINO) Fixed issue for stateful LLM models and added state restoring after the inference for it.\r\n  - (PyTorch) Fixed issue with `nncf.compress_weights()` for LLM models with 
the executing `is_floating_point` with tracing.","2024-02-09T09:45:24",{"id":242,"version":243,"summary_zh":244,"released_at":245},98841,"v2.8.0","### Post-training Quantization:\r\n\r\n#### Breaking changes:\r\n  - `nncf.quantize` signature has been changed to add `mode: Optional[nncf.QuantizationMode] = None` as its 3-rd argument, between the original `calibration_dataset` and `preset` arguments.\r\n  - (Common) `nncf.common.quantization.structs.QuantizationMode` has been renamed to `nncf.common.quantization.structs.QuantizationScheme`\r\n#### General:\r\n  - (OpenVINO) Changed default OpenVINO opset from 9 to 13.\r\n#### Features:\r\n  - (OpenVINO) Added 4-bit data-aware weights compression. For that `dataset` optional parameter has been added to `nncf.compress_weights()` and can be used to minimize accuracy degradation of compressed models (note that this option increases the compression time).\r\n  - (PyTorch) Added support for PyTorch models with shared weights and custom PyTorch modules in `nncf.compress_weights()`. The weights compression algorithm for PyTorch models is now based on tracing the model graph. The `dataset` parameter is now required in `nncf.compress_weights()` for the compression of PyTorch models.\r\n  - (Common) Renamed the `nncf.CompressWeightsMode.INT8` to `nncf.CompressWeightsMode.INT8_ASYM` and introduce `nncf.CompressWeightsMode.INT8_SYM` that can be efficiently used with dynamic 8-bit quantization of activations.\r\n  The original `nncf.CompressWeightsMode.INT8` enum value is now deprecated.\r\n  - (OpenVINO) Added support for quantizing the ScaledDotProductAttention operation from OpenVINO opset 13.\r\n  - (OpenVINO) Added FP8 quantization support via `nncf.QuantizationMode.FP8_E4M3` and `nncf.QuantizationMode.FP8_E5M2` enum values, invoked via passing one of these values as an optional `mode` argument to `nncf.quantize`. 
Currently, OpenVINO supports inference of FP8-quantized models in reference mode with no performance benefits and can be used for accuracy projections.\r\n  - (Common) Post-training Quantization with Accuracy Control - `nncf.quantize_with_accuracy_control()` has been extended by `restore_mode` optional parameter to revert weights to int8 instead of the original precision.\r\n  This parameter helps to reduce the size of the quantized model and improves its performance.\r\n  By default, it's disabled and model weights are reverted to the original precision in `nncf.quantize_with_accuracy_control()`.\r\n  - (Common) Added an `all_layers: Optional[bool] = None` argument to `nncf.compress_weights` to indicate whether embeddings and last layers of the model should be compressed to a primary precision. This is relevant to 4-bit quantization only.\r\n  - (Common) Added a `sensitivity_metric: Optional[nncf.parameters.SensitivityMetric] = None` argument to `nncf.compress_weights` for finer control over the sensitivity metric for assigning quantization precision to layers.\r\n  Defaults to weight quantization error if a dataset is not provided for weight compression and to maximum variance of the layers' inputs multiplied by inverted 8-bit quantization noise if a dataset is provided.\r\n  By default, the backup precision is assigned for the embeddings and last layers.\r\n#### Fixes:\r\n  - (OpenVINO) Models with embeddings (e.g. 
`gpt-2`, `stable-diffusion-v1-5`, `stable-diffusion-v2-1`, `opt-6.7b`, `falcon-7b`, `bloomz-7b1`) are now more accurately quantized.\r\n  - (PyTorch) `nncf.strip(..., do_copy=True)` now actually returns a deepcopy (stripped) of the model object.\r\n  - (PyTorch) Post-hooks can now be set up on operations that return `torch.return_type` (such as `torch.max`).\r\n  - (PyTorch) Improved dynamic graph tracing for various tensor operations from `torch` namespace.\r\n  - (PyTorch) More robust handling of models with disjoint traced graphs when applying PTQ.\r\n#### Improvements:\r\n  - Reformatted the tutorials section in the top-level `README.md` for better readability.\r\n#### Deprecations\u002FRemovals:\r\n  - (Common) The original `nncf.CompressWeightsMode.INT8` enum value is now deprecated.\r\n  - (PyTorch) The Git patch for integration with HuggingFace `transformers` repository is marked as deprecated and will be removed in a future release.\r\n  Developers are advised to use [optimum-intel](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Foptimum-intel) instead.\r\n  - Dockerfiles in the NNCF Git repository are deprecated and will be removed in a future release.","2024-01-24T13:06:00",{"id":247,"version":248,"summary_zh":249,"released_at":250},98842,"v2.7.0","### Post-training Quantization:\r\n\r\n#### Features:\r\n  - (OpenVINO) Added support for data-free 4-bit weights compression through NF4 and INT4 data types (`compress_weights(…)` pipeline).\r\n  - (OpenVINO) Added support for [IF operation](https:\u002F\u002Fdocs.openvino.ai\u002Flatest\u002Fopenvino_docs_ops_infrastructure_If_8.html) quantization.\r\n  - (OpenVINO) Added `dump_intermediate_model` parameter support for AccuracyAwareAlgorithm (`quantize_with_accuracy_control(…)` pipeline).\r\n  - (OpenVINO) Added support for SmoothQuant and ChannelAlignment algorithms for HyperparameterTuner algorithm (`quantize_with_tune_hyperparams(…)` pipeline).\r\n  - (PyTorch) Post-training Quantization is now 
supported with `quantize(…)` pipeline and the common implementation of quantization algorithms. Deprecated `create_compressed_model()` method for Post-training Quantization.\r\n  - Added new types (AvgPool, GroupNorm, LayerNorm) to the ignored scope for `ModelType.Transformer` scheme.\r\n  - `QuantizationPreset.Mixed` was set as the default for `ModelType.Transformer` scheme.\r\n#### Fixes:\r\n  - (OpenVINO, ONNX, PyTorch) Aligned\u002Fadded patterns between backends (SE block, MVN layer, multiple activations, etc.) to restore performance\u002Fmetrics.\r\n  - Fixed patterns for `ModelType.Transformer` to align with the [quantization scheme](https:\u002F\u002Fdocs.openvino.ai\u002Flatest\u002Fopenvino_docs_OV_UG_lpt.html).\r\n#### Improvements:\r\n  - Improved UX with the new progress bar for pipeline, new exceptions, and .dot graph visualization updates.\r\n  - (OpenVINO) Optimized WeightsCompression algorithm (`compress_weights(…)` pipeline) execution time for LLM's quantization, added ignored scope support.\r\n  - (OpenVINO) Optimized AccuracyAwareQuantization algorithm execution time with multi-threaded approach while calculating ranking score (`quantize_with_accuracy_control(…)` pipeline).\r\n  - (OpenVINO) Added [extract_ov_subgraph tool](tools\u002Fextract_ov_subgraph.py) for large IR subgraph extraction.\r\n  - (ONNX) Optimized quantization pipeline (up to 1.15x speed up).\r\n#### Tutorials:\r\n  - [Post-Training Optimization of BLIP Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F233-blip-visual-language-processing)\r\n  - [Post-Training Optimization of DeepFloyd IF Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F238-deepfloyd-if)\r\n  - [Post-Training Optimization of Grammatical Error Correction 
Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F214-grammar-correction)\r\n  - [Post-Training Optimization of Dolly 2.0 Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F240-dolly-2-instruction-following)\r\n  - [Post-Training Optimization of Massively Multilingual Speech Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F255-mms-massively-multilingual-speech)\r\n  - [Post-Training Optimization of OneFormer Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F249-oneformer-segmentation)\r\n  - [Post-Training Optimization of InstructPix2Pix Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F231-instruct-pix2pix-image-editing)\r\n  - [Post-Training Optimization of LLaVA Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F257-llava-multimodal-chatbot)\r\n  - [Post-Training Optimization of Latent Consistency Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F263-latent-consistency-models-image-generation)\r\n  - [Post-Training Optimization of Distil-Whisper Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F267-distil-whisper-asr)\r\n  - [Post-Training Optimization of FastSAM Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F261-fast-segment-anything)\r\n#### Known issues:\r\n  - (ONNX) `quantize(...)` method can generate inaccurate int8 results for models with the BatchNormalization layer that contains biases. 
To get the best accuracy, use the `do_constant_folding=True` option during export from PyTorch to ONNX.\r\n\r\n### Compression-aware training:\r\n\r\n#### Fixes:\r\n  - (PyTorch) Fixed Hessian trace calculation to solve [#2155](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fissues\u002F2155) issue.\r\n#### Requirements:\r\n  - Updated PyTorch version (2.1.0).\r\n  - Updated numpy version (\u003C1.27).\r\n#### Deprecations\u002FRemovals:\r\n  - (PyTorch) Removed legacy external quantizer storage names.\r\n  - (PyTorch) Removed torch \u003C 2.0 version support.","2023-11-16T14:59:00",{"id":252,"version":253,"summary_zh":254,"released_at":255},98843,"v2.6.0","### Post-training Quantization:\r\n\r\n#### Features:\r\n  - Added `CPU_SPR` device type support.\r\n  - Added quantizers scales unification.\r\n  - Added quantization scheme for ReduceSum operation.\r\n  - Added new types (ReduceL2, ReduceSum, Maximum) to the ignored scope for `ModelType.Transformer`.\r\n  - (OpenVINO) Added SmoothQuant algorithm.\r\n  - (OpenVINO) Added ChannelAlignment algorithm.\r\n  - (OpenVINO) Added HyperparameterTuner algorithm.\r\n  - (PyTorch) Added FastBiasCorrection algorithm support.\r\n  - (OpenVINO, ONNX) Added embedding weights quantization.\r\n  - (OpenVINO, PyTorch) Added new `compress_weights` method that provides data-free [INT8 weights compression](docs\u002Fcompression_algorithms\u002FCompressWeights.md).\r\n#### Fixes:\r\n  - Fixed detection of decomposed post-processing in models.\r\n  - Multiple fixes (new patterns, bugfixes, etc.) 
to solve [#1936](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fnncf\u002Fissues\u002F1936) issue.\r\n  - Fixed model reshaping during quantization to keep the original model shape.\r\n  - (OpenVINO) Added support for quantization of sequential models.\r\n  - (OpenVINO) Fixed in-place statistics cast to support empty dimensions.\r\n  - (OpenVINO, ONNX) Fixed quantization of the MatMul operation with weights rank > 2.\r\n  - (OpenVINO, ONNX) Fixed BiasCorrection algorithm to enable [CLIP model quantization](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F228-clip-zero-shot-image-classification).\r\n#### Improvements:\r\n  - Optimized `quantize(…)` pipeline (up to 4.3x speed up in total).\r\n  - Optimized `quantize_with_accuracy_control(…)` pipeline (up to 8x speed up for [122-quantizing-model-with-accuracy-control](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F122-quantizing-model-with-accuracy-control) notebook).\r\n  - Optimized general statistics collection (up to 1.2x speed up for ONNX backend).\r\n  - Ignored patterns separated from Fused patterns scheme (with multiple patterns addition).\r\n#### Tutorials:\r\n  - [Post-Training Optimization of Segment Anything Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F237-segment-anything).\r\n  - [Post-Training Optimization of CLIP Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F228-clip-zero-shot-image-classification).\r\n  - [Post-Training Optimization of ImageBind Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F239-image-bind).\r\n  - [Post-Training Optimization of Whisper 
Model](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F227-whisper-subtitles-generation).\r\n  - [Post-Training Optimization with accuracy control](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F122-quantizing-model-with-accuracy-control).\r\n\r\n### Compression-aware training:\r\n\r\n#### Features:\r\n  - Added shape pruning processor for BootstrapNAS algorithm.\r\n  - Added KD loss for BootstrapNAS algorithm.\r\n  - Added `validate_scopes` parameter for NNCF configuration.\r\n  - (PyTorch) Added PyTorch 2.0 support.\r\n  - (PyTorch) Added `.strip()` option to API.\r\n  - (PyTorch) Enabled bfloat data type for quantization kernels.\r\n  - (PyTorch) Quantized models can now be `torch.jit.trace`d without calling `.strip()`.\r\n  - (PyTorch) Added support for overridden `forward` instance attribute on model objects passed into `create_compressed_model`.\r\n  - (Tensorflow) Added Tensorflow 2.12 support.\r\n#### Fixes:\r\n  - (PyTorch) Fixed padding adjustment issue in the elastic kernel to work with the different active kernel sizes.\r\n  - (PyTorch) Fixed the torch graph tracing in the case the tensors belonging to parallel edges are interleaved in the order of the tensor argument.\r\n  - (PyTorch) Fixed recurrent nodes matching (LSTM, GRU cells) condition with the strict rule to avoid adding not necessary nodes to the ignored scope.\r\n  - (PyTorch) Fixed `torch.jit.script` wrapper so that user-side handling exceptions during `torch.jit.script` invocation do not cause NNCF to be permanently disabled.\r\n  - (PyTorch, Tensorflow) Adjusted quantizer propagation algorithm to check if quantizer propagation will result in output quantization.\r\n  - (PyTorch) Added redefined `__class__` method for ProxyModule that avoids causing error while calling `.super()` in forward method.\r\n#### Deprecations\u002FRemovals:\r\n  - (PyTorch) Removed 
deprecated `NNCFNetwork.__getattr__`, `NNCFNetwork.get_nncf_wrapped_model` methods.\r\n#### Requirements:\r\n  - Updated PyTorch version (2.0.1).\r\n  - Updated Tensorflow version (2.12.0).","2023-09-18T15:59:08",{"id":257,"version":258,"summary_zh":259,"released_at":260},98844,"v2.5.0","### Post-training Quantization:\r\n\r\n#### Features:\r\n  - Official release of OpenVINO framework support.\r\n    - Ported NNCF OpenVINO backend to use the [nGraph](https:\u002F\u002Fdocs.openvino.ai\u002F2021.3\u002Fopenvino_docs_nGraph_DG_Introduction.html) representation of OpenVINO models.\r\n    - Changed dependencies of the NNCF OpenVINO backend. It now depends on `openvino` package and not on the `openvino-dev` package.\r\n    - Added GRU\u002FLSTM quantization support.\r\n    - Added quantizer scales unification.\r\n    - Added support for models with 3D and 5D Depthwise convolution.\r\n    - Added FP16 OpenVINO models support.\r\n  - Added `\"overflow_fix\"` parameter (for `quantize(...)` & `quantize_with_accuracy_control(...)` methods) support & functionality. It improves the accuracy of optimized models on affected devices. More details in [Quantization section](docs\u002Fcompression_algorithms\u002FQuantization.md).\r\n  - (OpenVINO) Added support for in-place statistics collection (reduce memory footprint during optimization).\r\n  - (OpenVINO) Added Quantization with accuracy control algorithm.\r\n  - (OpenVINO) Added YOLOv8 examples for [`quantize(...)`](examples\u002Fpost_training_quantization\u002Fopenvino\u002Fyolov8) & [`quantize_with_accuracy_control(...)`](examples\u002Fpost_training_quantization\u002Fopenvino\u002Fyolov8_quantize_with_accuracy_control) methods.\r\n  - (PyTorch) Added min-max quantization algorithm as experimental.\r\n\r\n#### Fixes:\r\n  - Fixed `ignored_scope` attribute behaviour for weights. Weighted layers are now correctly excluded from the optimization scope.\r\n  - (ONNX) The ONNX opset version is now checked in `nncf.quantize(...)`. 
Now, models with opset \u003C 13 are optimized correctly in per-tensor quantization.\r\n\r\n#### Improvements:\r\n  - Added improvements for statistic collection process (collect weights statistics only once).\r\n  - (PyTorch, OpenVINO, ONNX) Introduced unified quantizer parameters calculation.\r\n\r\n#### Known issues:\r\n  - `quantize(...)` method can generate inaccurate int8 results for models with the *DenseNet-like* architecture. Use `quantize_with_accuracy_control(...)` in such case.\r\n  - `quantize(...)` method can hang on models with *transformer* architecture when `fast_bias_correction` optional parameter is set to *False*. Don't set it to *False* or use `quantize_with_accuracy_control(...)` in such case.\r\n  - `quantize(...)` method can generate inaccurate int8 results for models with the *MobileNet-like* architecture on non-VNNI machines.\r\n\r\n### Compression-aware training:\r\n\r\n#### New Features:\r\n  - Introduced automated structured pruning algorithm for JPQD with support for BERT, Wave2VecV2, Swin, ViT, DistilBERT, CLIP, and MobileBERT models.\r\n  - Added `nncf.common.utils.patcher.Patcher` - this class can be used to patch methods on live PyTorch model objects with wrappers such as `nncf.torch.dynamic_graph.context.no_nncf_trace` when doing so in the model code is not possible (e.g. 
if the model comes from an external library package).\r\n  - Compression controllers of the `nncf.api.compression.CompressionAlgorithmController` class now have a `.strip()` method that will return the compressed model object with as many custom NNCF additions removed as possible while preserving the functioning of the model object as a compressed model.\r\n\r\n#### Fixes:\r\n  - Fixed statistics computation for pruned layers.\r\n  - (PyTorch) Fixed traced tensors to implement the YOLOv8 from Ultralytics.\r\n\r\n#### Improvements:\r\n  - Extension of attributes (`transpose\u002Fpermute\u002Fgetitem`) for pruning node selector.\r\n  - NNCFNetwork was refactored from a wrapper-approach to a mixin-like approach.\r\n  - Added average pool 3d-like ops to pruning mask.\r\n  - Added Conv3d for overflow fix.\r\n  - `nncf.set_log_file(...)` can now be used to set location of the NNCF log file.\r\n  - (PyTorch) Added support for pruning of `torch.nn.functional.pad` operation.\r\n  - (PyTorch) Added `torch.baddbmm` as an alias for the matmul metatype for quantization purposes.\r\n  - (PyTorch) Added config file for ResNet18 accuracy-aware pruning + quantization on CIFAR10.\r\n  - (PyTorch) Fixed JIT-traceable PyTorch models with internal patching.\r\n  - (PyTorch) Added `__matmul__` magic functions to the list of patched ops (for SwinTransformer by Microsoft).\r\n\r\n### Requirements:\r\n  - Updated ONNX version (1.13)\r\n  - Updated Tensorflow version (2.11)\r\n\r\n### General changes:\r\n  - Added Windows support for NNCF.","2023-06-06T16:40:27",{"id":262,"version":263,"summary_zh":264,"released_at":265},98845,"v2.4.0","### Target version updates:\r\n- Bump target framework versions to PyTorch 1.13.1, TensorFlow 2.8.x, ONNX 1.12, ONNXRuntime 1.13.1\r\n- Increased target HuggingFace transformers version for the integration patch to 4.23.1\r\n\r\n### Features:\r\n- Official release of the ONNX framework support.\r\nNNCF may now be used for post-training quantization (PTQ) on 
ONNX models.\r\nAdded an [example script](examples\u002Fpost_training_quantization\u002Fonnx\u002Fmobilenet_v2) demonstrating the ONNX post-training quantization on MobileNetV2.\r\n- Preview release of OpenVINO framework support. \r\nNNCF may now be used for post-training quantization on OpenVINO models. Added an example script demonstrating the OpenVINO post-training quantization on MobileNetV2.\r\n`pip install nncf[openvino]` will install NNCF with the required OV framework dependencies.\r\n- Common post-training quantization API across the supported framework model formats (PyTorch, TensorFlow, ONNX, OpenVINO IR) via the `nncf.quantize(...)` function.\r\nThe parameter set of the function is the same for all frameworks - actual framework-specific implementations are being dispatched based on the type of the model object argument.\r\n- (PyTorch, TensorFlow) Improved the adaptive compression training functionality to reduce effective training time.\r\n- (ONNX) Post-processing nodes are now automatically excluded from quantization.\r\n- (PyTorch - Experimental) Joint Pruning, Quantization and Distillation for Transformers enabled for certain models from HuggingFace `transformers` repo.\r\nSee [description](nncf\u002Fexperimental\u002Ftorch\u002Fsparsity\u002Fmovement\u002FMovementSparsity.md) of the movement pruning involved in the JPQD for details.\r\n\r\n### Bugfixes:\r\n- Fixed a division by zero if every operation is added to ignored scope\r\n- Improved logging output, cutting down on the number of messages being output to the standard `logging.INFO` log level.\r\n- Fixed FLOPS calculation for linear filters - this impacts existing models that were pruned with a FLOPS target.\r\n- \"chunk\" and \"split\" ops are correctly handled during pruning.\r\n- Linear layers may now be pruned by input and output independently.\r\n- Matmul-like operations and subsequent arithmetic operations are now treated as a fused pattern.\r\n- (PyTorch) Fixed a rare condition with 
accumulator overflow in CUDA quantization kernels, which led to CUDA runtime errors and NaN values appearing in quantized tensors.\r\n- (PyTorch) `transformers` integration patch now allows exporting to ONNX during training, not only at the end of it.\r\n- (PyTorch) `torch.nn.utils.weight_norm` weights are now detected correctly.\r\n- (PyTorch) Exporting a model with sparsity or pruning no longer causes the weights of the original in-memory model object to be hard-set to 0.\r\n- (PyTorch - Experimental) Improved the automatic search of blocks to skip within the NAS algorithm – overlapping blocks are now correctly filtered.\r\n- (PyTorch, TensorFlow) Various bugs and issues with compression training were fixed.\r\n- (TensorFlow) Fixed an error with `\"num_bn_adaptation_samples\": 0` in config leading to a `TypeError` during quantization algorithm initialization.\r\n- (ONNX) Temporary model file is no longer saved on disk.\r\n- (ONNX) Depthwise convolutions are now quantizable in per-channel mode.\r\n- (ONNX) Improved the working time of PTQ by optimizing the calls to ONNX shape inferencing.\r\n\r\n### Breaking changes:\r\n- Fused patterns will be excluded from quantization via `ignored_scopes` only if the top-most node in data flow order matches against `ignored_scopes`.\r\n- NNCF config's `\"ignored_scopes\"` and `\"target_scopes\"` are now strictly checked to match at least one node in the model graph; unmatched entries are no longer silently ignored.\r\n- Calling `setup.py` directly to install NNCF is deprecated and no longer guaranteed to work.\r\n- Importing NNCF logger as `from nncf.common.utils.logger import logger as nncf_logger` is deprecated - use `from nncf import nncf_logger` instead.\r\n- `pruning_rate` is renamed to `pruning_level` in pruning compression controllers.\r\n- (ONNX) Removed CompressionBuilder. 
Removed the NNCF for ONNX examples that used the CompressionBuilder API.","2023-02-01T11:08:28",{"id":267,"version":268,"summary_zh":269,"released_at":270},98846,"v2.3.0","**New features**\r\n- (ONNX) PTQ API support for ONNX.\r\n- (ONNX) Added PTQ examples for ONNX in image classification, object detection, and semantic segmentation.\r\n- (PyTorch) Added `BootstrapNAS` to find high-performing sub-networks from the super-network optimization.\r\n\r\n**Bugfixes**\r\n- (PyTorch) The initial quantized model is now returned when retraining fails to find the best checkpoint.\r\n- (Experimental) Fixed weight initialization for `ONNXGraph` and `MinMaxQuantization`.","2022-07-05T01:00:09"]