[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-NVIDIA--DALI":3,"tool-NVIDIA--DALI":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":110,"forks":111,"last_commit_at":112,"license":113,"difficulty_score":10,"env_os":114,"env_gpu":115,"env_ram":116,"env_deps":117,"category_tags":126,"github_topics":127,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":143,"updated_at":144,"faqs":145,"releases":176},3990,"NVIDIA\u002FDALI","DALI","A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.","DALI 是 NVIDIA 推出的一款 GPU 加速数据加载与预处理库，旨在为深度学习训练和推理应用提供高效的数据处理引擎。它内置了针对图像、视频和音频数据的高度优化算子，能够无缝替代主流框架（如 PyTorch、TensorFlow）中原有的数据加载器。\n\n在深度学习任务中，复杂的多阶段数据处理流程（包括解码、裁剪、缩放及增强等）传统上依赖 CPU 执行，往往成为限制整体性能与扩展性的瓶颈。DALI 通过将这部分繁重的预处理工作卸载到 GPU 上运行，有效解决了“CPU 瓶颈”问题。其独有的执行引擎支持自动预取、并行执行和批处理，用户在编写代码时无需手动管理这些细节，即可显著提升数据吞吐率。此外，DALI 具备良好的可移植性，同一套数据处理流水线可轻松应用于不同的深度学习框架，降低了代码维护成本。\n\nDALI 非常适合需要处理大规模数据集的 AI 研究人员、算法工程师及深度学习开发者。无论是构建高性能训练管道还是优化推理服务，只要希望突破数据供给速度的限制并简化跨框架工作流，DALI 都是一个值得尝试的专业工具。","|License|  |Documentation|  |Format|\n\nNVIDIA DALI\n===========\n.. 
overview-begin-marker-do-not-remove\n\nThe NVIDIA Data Loading Library (DALI) is a GPU-accelerated library for data loading\nand pre-processing to accelerate deep learning applications. It provides a\ncollection of highly optimized building blocks for loading and processing\nimage, video and audio data. It can be used as a portable drop-in replacement\nfor built in data loaders and data iterators in popular deep learning frameworks.\n\nDeep learning applications require complex, multi-stage data processing pipelines\nthat include loading, decoding, cropping, resizing, and many other augmentations.\nThese data processing pipelines, which are currently executed on the CPU, have become a\nbottleneck, limiting the performance and scalability of training and inference.\n\nDALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the\nGPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput\nof the input pipeline. Features such as prefetching, parallel execution, and batch processing\nare handled transparently for the user.\n\nIn addition, the deep learning frameworks have multiple data pre-processing implementations,\nresulting in challenges such as portability of training and inference workflows, and code\nmaintainability. Data processing pipelines implemented using DALI are portable because they\ncan easily be retargeted to TensorFlow, PyTorch, and PaddlePaddle.\n\n.. image:: \u002Fdali.png\n    :width: 800\n    :align: center\n    :alt: DALI Diagram\n\nDALI in action:\n\n.. container:: dali-tabs\n\n   **Pipeline mode:**\n\n   .. 
code-block:: python\n\n      from nvidia.dali.pipeline import pipeline_def\n      import nvidia.dali.types as types\n      import nvidia.dali.fn as fn\n      from nvidia.dali.plugin.pytorch import DALIGenericIterator\n      import os\n\n      # To run with different data, see documentation of nvidia.dali.fn.readers.file\n      # points to https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI_extra\n      data_root_dir = os.environ['DALI_EXTRA_PATH']\n      images_dir = os.path.join(data_root_dir, 'db', 'single', 'jpeg')\n\n\n      def loss_func(pred, y):\n          pass\n\n\n      def model(x):\n          pass\n\n\n      def backward(loss, model):\n          pass\n\n\n      @pipeline_def(num_threads=4, device_id=0)\n      def get_dali_pipeline():\n          images, labels = fn.readers.file(\n              file_root=images_dir, random_shuffle=True, name=\"Reader\")\n          # decode data on the GPU\n          images = fn.decoders.image_random_crop(\n              images, device=\"mixed\", output_type=types.RGB)\n          # the rest of processing happens on the GPU as well\n          images = fn.resize(images, resize_x=256, resize_y=256)\n          images = fn.crop_mirror_normalize(\n              images,\n              crop_h=224,\n              crop_w=224,\n              mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],\n              std=[0.229 * 255, 0.224 * 255, 0.225 * 255],\n              mirror=fn.random.coin_flip())\n          return images, labels\n\n\n      train_data = DALIGenericIterator(\n          [get_dali_pipeline(batch_size=16)],\n          ['data', 'label'],\n          reader_name='Reader'\n      )\n\n\n      for i, data in enumerate(train_data):\n          x, y = data[0]['data'], data[0]['label']\n          pred = model(x)\n          loss = loss_func(pred, y)\n          backward(loss, model)\n\n   **Dynamic mode:**\n\n   .. 
code-block:: python\n\n      import os\n      import nvidia.dali.types as types\n      import nvidia.dali.experimental.dynamic as ndd\n      import torch\n\n      # To run with different data, see documentation of ndd.readers.File\n      # points to https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI_extra\n      data_root_dir = os.environ['DALI_EXTRA_PATH']\n      images_dir = os.path.join(data_root_dir, 'db', 'single', 'jpeg')\n\n\n      def loss_func(pred, y):\n          pass\n\n\n      def model(x):\n          pass\n\n\n      def backward(loss, model):\n          pass\n\n\n      reader = ndd.readers.File(file_root=images_dir, random_shuffle=True)\n\n      for images, labels in reader.next_epoch(batch_size=16):\n          images = ndd.decoders.image_random_crop(images, device=\"gpu\", output_type=types.RGB)\n          # the rest of processing happens on the GPU as well\n          images = ndd.resize(images, resize_x=256, resize_y=256)\n          images = ndd.crop_mirror_normalize(\n              images,\n              crop_h=224,\n              crop_w=224,\n              mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],\n              std=[0.229 * 255, 0.224 * 255, 0.225 * 255],\n              mirror=ndd.random.coin_flip(),\n          )\n\n          x = torch.as_tensor(images)\n          y = torch.as_tensor(labels.gpu())\n\n          pred = model(x)\n          loss = loss_func(pred, y)\n          backward(loss, model)\n\n\nHighlights\n----------\n- Easy-to-use functional style Python API.\n- Multiple data formats support - LMDB, RecordIO, TFRecord, COCO, JPEG, JPEG 2000, WAV, FLAC, OGG, H.264, VP9 and HEVC.\n- Portable across popular deep learning frameworks: TensorFlow, PyTorch, PaddlePaddle, JAX.\n- Supports CPU and GPU execution.\n- Scalable across multiple GPUs.\n- Flexible graphs let developers create custom pipelines.\n- Extensible for user-specific needs with custom operators.\n- Accelerates image classification (ResNet-50), object detection (SSD) workloads as 
well as ASR models (Jasper, RNN-T).\n- Allows direct data path between storage and GPU memory with `GPUDirect Storage \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fgpudirect-storage>`__.\n- Easy integration with `NVIDIA Triton Inference Server \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fnvidia-triton-inference-server>`__\n  with `DALI TRITON Backend \u003Chttps:\u002F\u002Fgithub.com\u002Ftriton-inference-server\u002Fdali_backend>`__.\n- Open source.\n\n.. overview-end-marker-do-not-remove\n\n----\n\nDALI success stories:\n---------------------\n\n- `During Kaggle computer vision competitions \u003Chttps:\u002F\u002Fwww.kaggle.com\u002Fcode\u002Ftheoviel\u002Frsna-breast-baseline-faster-inference-with-dali>`__:\n  `\"DALI is one of the best things I have learned in this competition\" \u003Chttps:\u002F\u002Fwww.kaggle.com\u002Fcompetitions\u002Frsna-breast-cancer-detection\u002Fdiscussion\u002F391059>`__\n- `Lightning Pose - state of the art pose estimation research model \u003Chttps:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC10168383\u002F>`__\n- `To improve the resource utilization in Advanced Computing Infrastructure \u003Chttps:\u002F\u002Farcwiki.rs.gsu.edu\u002Fen\u002Fdali\u002Fusing_nvidia_dali_loader>`__\n- `MLPerf - the industry standard for benchmarking compute and deep learning hardware and software \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fmlperf-hpc-v1-0-deep-dive-into-optimizations-leading-to-record-setting-nvidia-performance\u002F>`__\n- `\"we optimized major models inside eBay with the DALI framework\" \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtc24-s62578\u002F>`__\n\n----\n\nDALI Roadmap\n------------\n\n`The following issue represents \u003Chttps:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Fissues\u002F5320>`__ a high-level overview of our 2024 plan. 
You should be aware that this\nroadmap may change at any time and the order of its items does not reflect any type of priority.\n\nWe strongly encourage you to comment on our roadmap and provide us feedback on the mentioned\nGitHub issue.\n\n----\n\nInstalling DALI\n---------------\n\nTo install the latest DALI release for the latest CUDA version (12.x)::\n\n    pip install nvidia-dali-cuda120\n    # or\n    pip install --extra-index-url https:\u002F\u002Fpypi.nvidia.com  --upgrade nvidia-dali-cuda120\n\nDALI requires `NVIDIA driver \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fdrivers>`__ supporting the appropriate CUDA version.\nIn case of DALI based on CUDA 12, it requires `CUDA Toolkit \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-installation-guide-linux\u002Findex.html>`__\nto be installed.\n\nDALI comes preinstalled in the `TensorFlow \u003Chttps:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fcontainers\u002Ftensorflow>`__,\n`PyTorch \u003Chttps:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fcontainers\u002Fpytorch>`__,\nand `PaddlePaddle \u003Chttps:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fcontainers\u002Fpaddlepaddle>`__\ncontainers on `NVIDIA GPU Cloud \u003Chttps:\u002F\u002Fngc.nvidia.com>`__.\n\nFor other installation paths (TensorFlow plugin, older CUDA version, nightly and weekly builds, etc),\nand specific requirements please refer to the `Installation Guide \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fuser-guide\u002Fdocs\u002Finstallation.html>`__.\n\nTo build DALI from source, please refer to the `Compilation Guide \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fuser-guide\u002Fdocs\u002Fcompilation.html>`__.\n\n\n----\n\nExamples and Tutorials\n----------------------\n\nAn introduction to DALI can be found in the `Getting Started 
\u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fuser-guide\u002Fdocs\u002Fexamples\u002Fgetting_started.html>`__ page.\n\nMore advanced examples can be found in the `Examples and Tutorials \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fuser-guide\u002Fdocs\u002Fexamples\u002Findex.html>`__ page.\n\nFor an interactive version (Jupyter notebook) of the examples, go to the `docs\u002Fexamples \u003Chttps:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Fblob\u002Fmain\u002Fdocs\u002Fexamples>`__\ndirectory.\n\n**Note:** Select the `Latest Release Documentation \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fuser-guide\u002Fdocs\u002Findex.html>`__\nor the `Nightly Release Documentation \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fmain-user-guide\u002Fdocs\u002Findex.html>`__, which stays in sync with the main branch,\ndepending on your version.\n\n----\n\nAdditional Resources\n--------------------\n\n- GPU Technology Conference 2024; **Optimizing Inference Model Serving for Highest Performance at eBay**; Yiheng Wang:\n  `event \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtc24-s62578\u002F>`__\n- GPU Technology Conference 2023; **Developer Breakout: Accelerating Enterprise Workflows With Triton Server and DALI**; Brandon Tuttle:\n  `event \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtcspring23-se52140\u002F>`__.\n- GPU Technology Conference 2023; **GPU-Accelerating End-to-End Geospatial Workflows**; Kevin Green:\n  `event \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtcspring23-s51796\u002F>`__.\n- GPU Technology Conference 2022; **Effective NVIDIA DALI: Accelerating Real-life Deep-learning Applications**; Rafał Banaś:\n  `event \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtcspring22-s41442\u002F>`__.\n- GPU 
Technology Conference 2022; **Introduction to NVIDIA DALI: GPU-accelerated Data Preprocessing**; Joaquin Anton Guirao:\n  `event \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtcspring22-s41443\u002F>`__.\n- GPU Technology Conference 2021; **NVIDIA DALI: GPU-Powered Data Preprocessing** by Krzysztof Łęcki and Michał Szołucha:\n  `event \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtcspring21-s31298\u002F>`__.\n- GPU Technology Conference 2020; **Fast Data Pre-Processing with NVIDIA Data Loading Library (DALI)**; Albert Wolant, Joaquin Anton Guirao:\n  `recording \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fgtc\u002F2020\u002Fvideo\u002Fs21139>`__.\n- GPU Technology Conference 2019; **Fast AI data pre-preprocessing with DALI**; Janusz Lisiecki, Michał Zientkiewicz:\n  `slides \u003Chttps:\u002F\u002Fdeveloper.download.nvidia.com\u002Fvideo\u002Fgputechconf\u002Fgtc\u002F2019\u002Fpresentation\u002Fs9925-fast-ai-data-pre-processing-with-nvidia-dali.pdf>`__,\n  `recording \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fgtc\u002F2019\u002Fvideo\u002FS9925\u002Fvideo>`__.\n- GPU Technology Conference 2019; **Integration of DALI with TensorRT on Xavier**; Josh Park and Anurag Dixit:\n  `slides \u003Chttps:\u002F\u002Fdeveloper.download.nvidia.com\u002Fvideo\u002Fgputechconf\u002Fgtc\u002F2019\u002Fpresentation\u002Fs9818-integration-of-tensorrt-with-dali-on-xavier.pdf>`__,\n  `recording \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fgtc\u002F2019\u002Fvideo\u002FS9818\u002Fvideo>`__.\n- GPU Technology Conference 2018; **Fast data pipeline for deep learning training**, T. Gale, S. Layton and P. 
Trędak:\n  `slides \u003Chttp:\u002F\u002Fon-demand.gputechconf.com\u002Fgtc\u002F2018\u002Fpresentation\u002Fs8906-fast-data-pipelines-for-deep-learning-training.pdf>`__,\n  `recording \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtcsiliconvalley2018-s8906\u002F>`__.\n- `Developer Page \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002FDALI>`__.\n- `Blog Posts \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Ftag\u002Fdali\u002F>`__.\n\n\n----\n\nContributing to DALI\n--------------------\n\nWe welcome contributions to DALI. To contribute to DALI and make pull requests,\nfollow the guidelines outlined in the `Contributing \u003Chttps:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Fblob\u002Fmain\u002FCONTRIBUTING.md>`__\ndocument.\n\nIf you are looking for a task good for the start please check one from\n`external contribution welcome label \u003Chttps:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Flabels\u002Fexternal%20contribution%20welcome>`__.\n\nReporting Problems, Asking Questions\n------------------------------------\n\nWe appreciate feedback, questions or bug reports. When you need help\nwith the code, follow the process outlined in the `Stack Overflow\n\u003Chttps:\u002F\u002Fstackoverflow.com\u002Fhelp\u002Fmcve>`__ document. Ensure that the\nposted examples are:\n\n- **minimal**: Use as little code as possible that still produces the same problem.\n- **complete**: Provide all parts needed to reproduce the problem.\n  Check if you can strip external dependency and still show the problem.\n  The less time we spend on reproducing the problems, the more time we\n  can dedicate to the fixes.\n- **verifiable**: Test the code you are about to provide, to make sure\n  that it reproduces the problem. 
Remove all other problems that are not\n  related to your request.\n\nAcknowledgements\n----------------\n\nDALI was originally built with major contributions from Trevor Gale, Przemek Tredak,\nSimon Layton, Andrei Ivanov and Serge Panev.\n\n.. |License| image:: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-blue.svg\n   :target: https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0\n\n.. |Documentation| image:: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNVIDIA%20DALI-documentation-brightgreen.svg?longCache=true\n   :target: https:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fuser-guide\u002Fdocs\u002Findex.html\n\n.. |Format| image:: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcode%20style-black-000000.svg\n    :target: https:\u002F\u002Fgithub.com\u002Fpsf\u002Fblack\n","|许可证|  |文档|  |格式|\n\nNVIDIA DALI\n===========\n.. overview-begin-marker-do-not-remove\n\nNVIDIA 数据加载库（DALI）是一个 GPU 加速的数据加载和预处理库，用于加速深度学习应用。它提供了一系列高度优化的构建模块，用于加载和处理图像、视频和音频数据。它可以作为流行深度学习框架中内置数据加载器和数据迭代器的可移植替代方案。\n\n深度学习应用需要复杂的多阶段数据处理流水线，包括加载、解码、裁剪、调整大小以及许多其他增强操作。这些目前在 CPU 上执行的数据处理流水线已经成为瓶颈，限制了训练和推理的性能与可扩展性。\n\nDALI 通过将数据预处理卸载到 GPU 来解决 CPU 瓶颈问题。此外，DALI 还依赖于其自身的执行引擎，旨在最大化输入流水线的吞吐量。诸如预取、并行执行和批处理等功能对用户来说都是透明的。\n\n此外，深度学习框架通常有多种数据预处理实现，这导致了训练和推理工作流的可移植性以及代码维护性方面的挑战。使用 DALI 实现的数据处理流水线是可移植的，因为它们可以轻松地重新部署到 TensorFlow、PyTorch 和 PaddlePaddle 中。\n\n.. image:: \u002Fdali.png\n    :width: 800\n    :align: center\n    :alt: DALI 图表\n\nDALI 的实际应用：\n\n.. container:: dali-tabs\n\n   **流水线模式：**\n\n   .. 
code-block:: python\n\n      from nvidia.dali.pipeline import pipeline_def\n      import nvidia.dali.types as types\n      import nvidia.dali.fn as fn\n      from nvidia.dali.plugin.pytorch import DALIGenericIterator\n      import os\n\n      # 要使用不同的数据，请参阅 nvidia.dali.fn.readers.file 的文档\n      # 该文档指向 https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI_extra\n      data_root_dir = os.environ['DALI_EXTRA_PATH']\n      images_dir = os.path.join(data_root_dir, 'db', 'single', 'jpeg')\n\n\n      def loss_func(pred, y):\n          pass\n\n\n      def model(x):\n          pass\n\n\n      def backward(loss, model):\n          pass\n\n\n      @pipeline_def(num_threads=4, device_id=0)\n      def get_dali_pipeline():\n          images, labels = fn.readers.file(\n              file_root=images_dir, random_shuffle=True, name=\"Reader\")\n          # 在 GPU 上解码数据\n          images = fn.decoders.image_random_crop(\n              images, device=\"mixed\", output_type=types.RGB)\n          # 其余处理也在 GPU 上进行\n          images = fn.resize(images, resize_x=256, resize_y=256)\n          images = fn.crop_mirror_normalize(\n              images,\n              crop_h=224,\n              crop_w=224,\n              mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],\n              std=[0.229 * 255, 0.224 * 255, 0.225 * 255],\n              mirror=fn.random.coin_flip())\n          return images, labels\n\n\n      train_data = DALIGenericIterator(\n          [get_dali_pipeline(batch_size=16)],\n          ['data', 'label'],\n          reader_name='Reader'\n      )\n\n\n      for i, data in enumerate(train_data):\n          x, y = data[0]['data'], data[0]['label']\n          pred = model(x)\n          loss = loss_func(pred, y)\n          backward(loss, model)\n\n   **动态模式：**\n\n   .. 
code-block:: python\n\n      import os\n      import nvidia.dali.types as types\n      import nvidia.dali.experimental.dynamic as ndd\n      import torch\n\n      # 要使用不同的数据，请参阅 ndd.readers.File 的文档\n      # 该文档指向 https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI_extra\n      data_root_dir = os.environ['DALI_EXTRA_PATH']\n      images_dir = os.path.join(data_root_dir, 'db', 'single', 'jpeg')\n\n\n      def loss_func(pred, y):\n          pass\n\n\n      def model(x):\n          pass\n\n\n      def backward(loss, model):\n          pass\n\n\n      reader = ndd.readers.File(file_root=images_dir, random_shuffle=True)\n\n      for images, labels in reader.next_epoch(batch_size=16):\n          images = ndd.decoders.image_random_crop(images, device=\"gpu\", output_type=types.RGB)\n          # 其余处理也在 GPU 上进行\n          images = ndd.resize(images, resize_x=256, resize_y=256)\n          images = ndd.crop_mirror_normalize(\n              images,\n              crop_h=224,\n              crop_w=224,\n              mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],\n              std=[0.229 * 255, 0.224 * 255, 0.225 * 255],\n              mirror=ndd.random.coin_flip(),\n          )\n\n          x = torch.as_tensor(images)\n          y = torch.as_tensor(labels.gpu())\n\n          pred = model(x)\n          loss = loss_func(pred, y)\n          backward(loss, model)\n\n\n亮点\n----------\n- 易于使用的函数式 Python API。\n- 支持多种数据格式——LMDB、RecordIO、TFRecord、COCO、JPEG、JPEG 2000、WAV、FLAC、OGG、H.264、VP9 和 HEVC。\n- 可跨主流深度学习框架使用：TensorFlow、PyTorch、PaddlePaddle、JAX。\n- 支持 CPU 和 GPU 执行。\n- 可扩展至多 GPU。\n- 灵活的图结构使开发者能够创建自定义流水线。\n- 可通过自定义算子扩展以满足特定需求。\n- 加速图像分类（ResNet-50）、目标检测（SSD）工作负载以及 ASR 模型（Jasper、RNN-T）。\n- 通过 `GPUDirect Storage \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fgpudirect-storage>`__ 实现存储与 GPU 内存之间的直接数据路径。\n- 可通过 `DALI TRITON Backend \u003Chttps:\u002F\u002Fgithub.com\u002Ftriton-inference-server\u002Fdali_backend>`__ 轻松集成到 `NVIDIA Triton 推理服务器 
\u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fnvidia-triton-inference-server>`__。\n- 开源。\n\n.. overview-end-marker-do-not-remove\n\n----\n\nDALI 成功案例：\n---------------------\n\n- `在 Kaggle 计算机视觉竞赛中 \u003Chttps:\u002F\u002Fwww.kaggle.com\u002Fcode\u002Ftheoviel\u002Frsna-breast-baseline-faster-inference-with-dali>`__：\n  `\"DALI 是我在本次比赛中学到的最佳工具之一\" \u003Chttps:\u002F\u002Fwww.kaggle.com\u002Fcompetitions\u002Frsna-breast-cancer-detection\u002Fdiscussion\u002F391059>`__\n- `Lightning Pose - 最先进的姿态估计研究模型 \u003Chttps:\u002F\u002Fwww.ncbi.nlm.nih.gov\u002Fpmc\u002Farticles\u002FPMC10168383\u002F>`__\n- `用于提升高级计算基础设施中的资源利用率 \u003Chttps:\u002F\u002Farcwiki.rs.gsu.edu\u002Fen\u002Fdali\u002Fusing_nvidia_dali_loader>`__\n- `MLPerf - 行业标准的计算和深度学习硬件及软件基准测试 \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fmlperf-hpc-v1-0-deep-dive-into-optimizations-leading-to-record-setting-nvidia-performance\u002F>`__\n- `\"我们使用 DALI 框架优化了 eBay 内部的主要模型\" \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtc24-s62578\u002F>`__\n\n----\n\nDALI 路线图\n------------\n\n`以下议题代表 \u003Chttps:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Fissues\u002F5320>`__ 我们2024年计划的高层次概览。请注意，该路线图可能会随时更改，其中条目的顺序并不反映任何优先级。\n\n我们强烈鼓励您在上述 GitHub 问题中对我们的路线图发表评论，并向我们提供反馈。\n\n----\n\n安装 DALI\n---------\n\n要为最新 CUDA 版本（12.x）安装最新的 DALI 发行版，请执行以下命令：\n\n    pip install nvidia-dali-cuda120\n    # 或\n    pip install --extra-index-url https:\u002F\u002Fpypi.nvidia.com  --upgrade nvidia-dali-cuda120\n\nDALI 需要支持相应 CUDA 版本的 `NVIDIA 驱动程序 \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fdrivers>`__。对于基于 CUDA 12 的 DALI，还需要安装 `CUDA 工具包 \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-installation-guide-linux\u002Findex.html>`__。\n\nDALI 已预装在 `TensorFlow \u003Chttps:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fcontainers\u002Ftensorflow>`__、`PyTorch \u003Chttps:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fcontainers\u002Fpytorch>`__ 和 
`PaddlePaddle \u003Chttps:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fcontainers\u002Fpaddlepaddle>`__ 容器中，这些容器位于 `NVIDIA GPU Cloud \u003Chttps:\u002F\u002Fngc.nvidia.com>`__ 上。\n\n有关其他安装方式（如 TensorFlow 插件、旧版 CUDA、夜间和每周构建等）以及特定需求，请参阅 `安装指南 \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fuser-guide\u002Fdocs\u002Finstallation.html>`__。\n\n若要从源代码构建 DALI，请参阅 `编译指南 \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fuser-guide\u002Fdocs\u002Fcompilation.html>`__。\n\n\n----\n\n示例与教程\n------------\n\n关于 DALI 的入门介绍可在 `快速入门 \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fuser-guide\u002Fdocs\u002Fexamples\u002Fgetting_started.html>`__ 页面找到。\n\n更高级的示例则可在 `示例与教程 \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fuser-guide\u002Fdocs\u002Fexamples\u002Findex.html>`__ 页面找到。\n\n如需交互式版本（Jupyter Notebook）的示例，请前往 `docs\u002Fexamples \u003Chttps:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Fblob\u002Fmain\u002Fdocs\u002Fexamples>`__ 目录。\n\n**注意：** 请根据您的版本选择 `最新发行版文档 \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fuser-guide\u002Fdocs\u002Findex.html>`__ 或与主分支保持同步的 `夜间发行版文档 \u003Chttps:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fmain-user-guide\u002Fdocs\u002Findex.html>`__。\n\n----\n\n附加资源\n----------\n\n- GPU 技术大会 2024；**优化 eBay 的推理模型服务以实现最高性能**；Yiheng Wang：\n  `活动链接 \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtc24-s62578\u002F>`__\n- GPU 技术大会 2023；**开发者分会场：使用 Triton Server 和 DALI 加速企业工作流**；Brandon Tuttle：\n  `活动链接 \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtcspring23-se52140\u002F>`__。\n- GPU 技术大会 2023；**GPU 加速端到端地理空间工作流**；Kevin Green：\n  `活动链接 \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtcspring23-s51796\u002F>`__。\n- GPU 技术大会 2022；**高效 NVIDIA DALI：加速实际深度学习应用**；Rafał Banaś：\n  `活动链接 
\u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtcspring22-s41442\u002F>`__。\n- GPU 技术大会 2022；**NVIDIA DALI 入门：GPU 加速的数据预处理**；Joaquin Anton Guirao：\n  `活动链接 \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtcspring22-s41443\u002F>`__。\n- GPU 技术大会 2021；**NVIDIA DALI：GPU 驱动的数据预处理**，由 Krzysztof Łęcki 和 Michał Szołucha 主讲：\n  `活动链接 \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtcspring21-s31298\u002F>`__。\n- GPU 技术大会 2020；**使用 NVIDIA 数据加载库 (DALI) 进行快速数据预处理**；Albert Wolant、Joaquin Anton Guirao：\n  `录像链接 \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fgtc\u002F2020\u002Fvideo\u002Fs21139>`__。\n- GPU 技术大会 2019；**使用 DALI 进行快速 AI 数据预处理**；Janusz Lisiecki、Michał Zientkiewicz：\n  `幻灯片链接 \u003Chttps:\u002F\u002Fdeveloper.download.nvidia.com\u002Fvideo\u002Fgputechconf\u002Fgtc\u002F2019\u002Fpresentation\u002Fs9925-fast-ai-data-pre-processing-with-nvidia-dali.pdf>`__，\n  `录像链接 \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fgtc\u002F2019\u002Fvideo\u002FS9925\u002Fvideo>`__。\n- GPU 技术大会 2019；**DALI 与 TensorRT 在 Xavier 上的集成**；Josh Park 和 Anurag Dixit：\n  `幻灯片链接 \u003Chttps:\u002F\u002Fdeveloper.download.nvidia.com\u002Fvideo\u002Fgputechconf\u002Fgtc\u002F2019\u002Fpresentation\u002Fs9818-integration-of-tensorrt-with-dali-on-xavier.pdf>`__，\n  `录像链接 \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fgtc\u002F2019\u002Fvideo\u002FS9818\u002Fvideo>`__。\n- GPU 技术大会 2018；**用于深度学习训练的快速数据流水线**，T. Gale、S. Layton 和 P. 
Trędak：\n  `幻灯片链接 \u003Chttp:\u002F\u002Fon-demand.gputechconf.com\u002Fgtc\u002F2018\u002Fpresentation\u002Fs8906-fast-data-pipelines-for-deep-learning-training.pdf>`__，\n  `录像链接 \u003Chttps:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fon-demand\u002Fsession\u002Fgtcsiliconvalley2018-s8906\u002F>`__。\n- `开发者页面 \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002FDALI>`__。\n- `博客文章 \u003Chttps:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Ftag\u002Fdali\u002F>`__。\n\n\n----\n\n参与贡献 DALI\n--------------\n\n我们欢迎对 DALI 的贡献。要为 DALI 做出贡献并提交拉取请求，请遵循 `贡献指南 \u003Chttps:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Fblob\u002Fmain\u002FCONTRIBUTING.md>`__ 文档中的说明。\n\n如果您正在寻找适合初学者的任务，请查看带有 `欢迎外部贡献标签 \u003Chttps:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Flabels\u002Fexternal%20contribution%20welcome>`__ 的任务。\n\n报告问题、提问\n--------------\n\n我们非常感谢您的反馈、问题或错误报告。当您需要代码方面的帮助时，请遵循 `Stack Overflow \u003Chttps:\u002F\u002Fstackoverflow.com\u002Fhelp\u002Fmcve>`__ 文档中概述的流程。请确保您发布的示例：\n\n- **最小化**：尽可能使用最少的代码，但仍能重现相同的问题。\n- **完整**：提供所有用于重现问题所需的组件。尝试去除外部依赖，看看是否仍能展示问题。我们花在复现问题上的时间越少，就能有更多时间用于修复。\n- **可验证**：在提交代码之前，先测试它是否确实能重现问题。移除所有与您的请求无关的问题。\n\n致谢\n------\n\nDALI 最初是在 Trevor Gale、Przemek Tredak、Simon Layton、Andrei Ivanov 和 Serge Panev 的主要贡献下开发的。\n\n.. |License| image:: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-blue.svg\n   :target: https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0\n\n.. |Documentation| image:: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNVIDIA%20DALI-documentation-brightgreen.svg?longCache=true\n   :target: https:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fdali\u002Fuser-guide\u002Fdocs\u002Findex.html\n\n.. 
|Format| image:: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcode%20style-black-000000.svg\n    :target: https:\u002F\u002Fgithub.com\u002Fpsf\u002Fblack","# NVIDIA DALI 快速上手指南\n\nNVIDIA DALI (Data Loading Library) 是一个 GPU 加速的数据加载和预处理库，旨在解决深度学习训练中 CPU 数据预处理成为瓶颈的问题。它支持图像、视频和音频数据，可无缝集成到 TensorFlow、PyTorch、PaddlePaddle 等主流框架中。\n\n## 环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**: Linux (推荐 Ubuntu 18.04\u002F20.04\u002F22.04)。\n*   **GPU**: 支持 CUDA 的 NVIDIA GPU。\n*   **驱动程序**: 安装与您的 CUDA 版本匹配的 [NVIDIA Driver](https:\u002F\u002Fwww.nvidia.com\u002Fdrivers)。\n*   **CUDA Toolkit**: \n    *   若使用 CUDA 12.x 版本的 DALI，需安装对应的 [CUDA Toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-installation-guide-linux\u002Findex.html)。\n    *   若使用 NGC 容器（推荐），则无需单独安装，环境已预配置。\n*   **Python**: Python 3.9 及以上（自 DALI 1.50 起已移除对 Python 3.8 的支持，具体请以所装版本的支持矩阵为准）。\n*   **深度学习框架**: 已安装 PyTorch、TensorFlow 或 PaddlePaddle。\n\n> **提示**：国内开发者推荐使用 [NVIDIA NGC 中国镜像站](https:\u002F\u002Fngc.nvidia.cn\u002F) 获取预装 DALI 的 Docker 容器，可避免复杂的环境配置。\n\n## 安装步骤\n\n### 方法一：使用 pip 安装（推荐用于现有环境）\n\n针对最新的 CUDA 12.x 版本，运行以下命令：\n\n```bash\npip install nvidia-dali-cuda120\n```\n\n或者使用 NVIDIA 官方 PyPI 索引进行升级安装：\n\n```bash\npip install --extra-index-url https:\u002F\u002Fpypi.nvidia.com --upgrade nvidia-dali-cuda120\n```\n\n> **注意**：如果您使用的是旧版 CUDA (如 11.x)，请将包名中的 `cuda120` 替换为相应的版本号（例如 `cuda110`）。\n\n### 方法二：使用 Docker 容器（最简便）\n\n直接拉取包含 DALI 的官方容器，无需手动配置依赖：\n\n```bash\n# PyTorch 示例\ndocker pull nvcr.io\u002Fnvidia\u002Fpytorch:24.02-py3\n# 启动容器时确保添加 --gpus all 参数以启用 GPU\n```\n\n## 基本使用\n\nDALI 提供两种主要使用模式：**Pipeline 模式**（经典静态图）和 **Dynamic 模式**（动态图，更灵活）。以下以 **PyTorch** 为例展示最简单的用法。\n\n### 示例：Pipeline 模式（静态定义）\n\n此模式适合固定的数据处理流程，性能最优。\n\n```python\nfrom nvidia.dali.pipeline import pipeline_def\nimport nvidia.dali.types as types\nimport nvidia.dali.fn as fn\nfrom nvidia.dali.plugin.pytorch import DALIGenericIterator\nimport os\n\n# 设置数据路径 (需提前下载 DALI_EXTRA 数据集或指向本地图片目录)\ndata_root_dir = os.environ.get('DALI_EXTRA_PATH', 
'\u002Fpath\u002Fto\u002Fyour\u002Fdata')\nimages_dir = os.path.join(data_root_dir, 'db', 'single', 'jpeg')\n\n@pipeline_def(num_threads=4, device_id=0)\ndef get_dali_pipeline():\n    # 1. 读取文件\n    images, labels = fn.readers.file(\n        file_root=images_dir, random_shuffle=True, name=\"Reader\")\n    \n    # 2. GPU 解码并随机裁剪\n    images = fn.decoders.image_random_crop(\n        images, device=\"mixed\", output_type=types.RGB)\n    \n    # 3. 调整大小\n    images = fn.resize(images, resize_x=256, resize_y=256)\n    \n    # 4. 归一化、中心裁剪及随机翻转\n    images = fn.crop_mirror_normalize(\n        images,\n        crop_h=224,\n        crop_w=224,\n        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],\n        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],\n        mirror=fn.random.coin_flip())\n        \n    return images, labels\n\n# 创建迭代器\ntrain_data = DALIGenericIterator(\n    [get_dali_pipeline(batch_size=16)],\n    ['data', 'label'],\n    reader_name='Reader'\n)\n\n# 在训练循环中使用\nfor i, data in enumerate(train_data):\n    x = data[0]['data']  # GPU Tensor\n    y = data[0]['label'] # GPU Tensor\n    \n    # 此处接入您的 PyTorch 模型训练代码\n    # pred = model(x)\n    # loss = loss_func(pred, y)\n    # loss.backward()\n    break # 仅演示，实际使用时移除\n```\n\n### 示例：Dynamic 模式（动态执行）\n\n此模式允许在 Python 循环中动态改变处理逻辑，代码风格更接近原生 PyTorch。\n\n```python\nimport os\nimport nvidia.dali.types as types\nimport nvidia.dali.experimental.dynamic as ndd\nimport torch\n\ndata_root_dir = os.environ.get('DALI_EXTRA_PATH', '\u002Fpath\u002Fto\u002Fyour\u002Fdata')\nimages_dir = os.path.join(data_root_dir, 'db', 'single', 'jpeg')\n\n# 定义读取器\nreader = ndd.readers.File(file_root=images_dir, random_shuffle=True)\n\n# 动态迭代\nfor images, labels in reader.next_epoch(batch_size=16):\n    # 直接在循环中定义处理流程\n    images = ndd.decoders.image_random_crop(images, device=\"gpu\", output_type=types.RGB)\n    images = ndd.resize(images, resize_x=256, resize_y=256)\n    images = ndd.crop_mirror_normalize(\n        images,\n        
crop_h=224,\n        crop_w=224,\n        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],\n        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],\n        mirror=ndd.random.coin_flip(),\n    )\n\n    # 转换为 PyTorch Tensor\n    x = torch.as_tensor(images)\n    y = torch.as_tensor(labels.gpu())\n\n    # 接入模型训练\n    # pred = model(x)\n    # loss = loss_func(pred, y)\n    # loss.backward()\n    break # 仅演示\n```\n\n### 关键特性说明\n*   **device=\"mixed\"**: 解码在 CPU 上解析输入、在 GPU 上完成（利用 nvJPEG 硬件加速），输出直接位于 GPU，后续算子继续在 GPU 上执行，以最大化吞吐量。\n*   **自动批处理**: DALI 自动处理数据的批量化、预取和并行执行，用户无需手动管理队列。\n*   **框架兼容**: 输出的数据可直接作为 GPU Tensor 传递给模型，无额外拷贝开销。","某计算机视觉团队正在训练一个基于 ResNet-50 的大规模图像分类模型，需处理百万级高清图片并执行复杂的实时增强操作。\n\n### 没有 DALI 时\n- **CPU 成为性能瓶颈**：数据解码、裁剪和归一化等预处理全在 CPU 上进行，导致 GPU 经常空闲等待数据，利用率不足 60%。\n- **多框架代码冗余**：为了适配 PyTorch 和 TensorFlow 不同环境，团队需维护两套独立的数据加载逻辑，增加了出错风险和维护成本。\n- **流水线延迟高**：传统的串行数据处理方式无法有效隐藏 I\u002FO 延迟，整体训练迭代速度缓慢，延长了模型上线周期。\n- **复杂增强实现困难**：在 CPU 上实现高效的随机裁剪与混合精度预处理代码复杂且难以优化，往往被迫简化增强策略影响模型精度。\n\n### 使用 DALI 后\n- **GPU 加速释放算力**：DALI 将解码、调整大小及归一化等繁重任务卸载至 GPU，使 GPU 利用率提升至 95% 以上，显著缩短单步训练时间。\n- **一套代码多端运行**：借助 DALI 的可移植性，同一套数据流水线定义可直接服务于 PyTorch、TensorFlow 等多个框架，消除了重复开发工作。\n- **智能流水线优化**：DALI 内置的执行引擎自动处理预取、并行执行和批处理，透明地掩盖了数据读取延迟，大幅提高了吞吐量。\n- **高效实现复杂增强**：利用 DALI 高度优化的算子，团队轻松实现了包括随机镜像、色彩抖动在内的高级增强策略，既提升了效率又保证了模型泛化能力。\n\nDALI 通过将数据预处理从 CPU 迁移至 GPU 并统一多框架接口，彻底打破了数据加载瓶颈，让深度学习训练重回算力驱动的高速轨道。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FNVIDIA_DALI_5eb58370.png","NVIDIA","NVIDIA 
Corporation","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FNVIDIA_7dcf6000.png","",null,"https:\u002F\u002Fnvidia.com","https:\u002F\u002Fgithub.com\u002FNVIDIA",[83,87,91,95,99,103,106],{"name":84,"color":85,"percentage":86},"C++","#f34b7d",52.9,{"name":88,"color":89,"percentage":90},"Python","#3572A5",33,{"name":92,"color":93,"percentage":94},"Cuda","#3A4E3A",9.6,{"name":96,"color":97,"percentage":98},"CMake","#DA3434",1.9,{"name":100,"color":101,"percentage":102},"Shell","#89e051",1.3,{"name":104,"color":105,"percentage":102},"C","#555555",{"name":107,"color":108,"percentage":109},"Dockerfile","#384d54",0,5658,660,"2026-04-03T12:06:22","Apache-2.0","Linux","必需 NVIDIA GPU。需安装支持相应 CUDA 版本的 NVIDIA 驱动。示例中提及支持 CUDA 12.x (nvidia-dali-cuda120)。支持多 GPU 扩展及 GPUDirect Storage。","未说明",{"notes":118,"python":116,"dependencies":119},"DALI 是 NVIDIA 推出的 GPU 加速数据加载与预处理库，旨在解决 CPU 数据预处理瓶颈。它可作为 TensorFlow、PyTorch、PaddlePaddle 和 JAX 等框架的数据加载器替代品。官方推荐使用 NGC 容器（已预装 DALI）或通过 pip 安装特定 CUDA 版本的包（如 nvidia-dali-cuda120）。支持多种数据格式（图像、视频、音频）及从存储到显存的直接数据路径（GPUDirect Storage）。",[120,121,122,123,124,125],"NVIDIA Driver","CUDA Toolkit (针对 CUDA 12 版本)","TensorFlow (可选)","PyTorch (可选)","PaddlePaddle (可选)","JAX (可选)",[13,55,14,51],[128,129,130,131,132,133,134,135,136,137,138,139,140,141,142],"fast-data-pipeline","image-augmentation","data-augmentation","image-processing","data-processing","deep-learning","machine-learning","python","neural-network","gpu","gpu-tensorflow","audio-processing","pytorch","mxnet","paddle","2026-03-27T02:49:30.150509","2026-04-06T05:15:31.741122",[146,151,156,161,166,171],{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},18209,"DALI 支持在 Windows 平台上安装和使用吗？","DALI 原生不支持 Windows 平台，因此直接使用 pip 无法找到合适的安装包。建议用户使用基于 Linux 的环境来评估和使用 DALI。如果必须在 Windows 上运行，可以使用 WSL (Windows Subsystem for Linux) 来运行 Linux 包。","https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Fissues\u002F476",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},18210,"如何在 
VideoReader 加载视频时同时获取标签（labels）？","可以通过特定的 PR 实现该功能。维护者指出相关功能已在 PR #1500 中解决，建议用户检查最新 nightly 构建版本中该 PR 的工作方式，以支持视频分类任务（如动作识别）中的标签返回。","https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Fissues\u002F666",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},18211,"VideoReader 的输出能否直接连接到 Resize 算子？遇到 'Assert on input.GetLayout() == DALI_NHWC failed' 错误怎么办？","VideoReader 默认输出的布局可能与 Resize 算子要求的 NHWC 布局不匹配。对于推理场景，如果需要消除视频序列维度（F）仅保留批次维度（N），可以使用 Reshape 算子将序列长度设为 1，或者使用 ElementExtract 算子。相关功能已在 0.25 版本中实现。","https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Fissues\u002F1247",{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},18212,"如何在 Pipeline 内部获取解码后图像的形状（宽高通道数）？","在 CPU 端无法直接获取形状信息，因为数据位于 GPU 上且输出类型为 TensorReference（没有 shape 属性）。如果需要获取形状，必须开发自定义的 GPU kernel 来直接在 GPU 上访问这些数据。","https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Fissues\u002F410",{"id":167,"question_zh":168,"answer_zh":169,"source_url":170},18213,"ops.WarpAffine 的输出结果为什么与 cv2.warpAffine 不一致？","即使仿射矩阵和解码器输出相同，ops.WarpAffine (GPU) 与 cv2.warpAffine (CPU) 的结果也可能存在差异。这通常是由于底层插值算法、边界处理策略或坐标映射方式的细微差别导致的。在追求完全一致的复现时需注意这一特性，建议以 DALI 的 GPU 加速结果为基准进行模型训练和推理。","https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Fissues\u002F2648",{"id":172,"question_zh":173,"answer_zh":174,"source_url":175},18214,"使用 VideoReader 加载大量视频文件时出现阻塞或性能问题，如何优化？","确保正确配置 VideoReader 的参数，如 sequence_length, initial_fill 和 random_shuffle。利用 DALI 的 GPU 硬件加速视频解码功能可以显著加速 CNN 训练。如果遇到阻塞，请检查是否错误地混用了 OpenCV 等其他库的代码，并确保整个数据加载流程都在 DALI Pipeline 中执行以发挥并行化优势。","https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI\u002Fissues\u002F1599",[177,182,187,192,197,202,207,212,217,222,227,232,237,242,247,252,257,262,267,272],{"id":178,"version":179,"summary_zh":180,"released_at":181},108739,"v2.0.0","主要特性与改进\n---\n本次 DALI 发布包含以下主要特性与改进：\n\n* 改进的 DALI 动态模式：\n  * 添加了异步和延迟执行功能（#6210、#6204、#6124、#6216、#6152）\n  * 优化了多线程支持，兼容无 GIL 的 Python 3.13t 和 Python 3.14。（#6200、#6174、#6136、#5884、#6201、#6202、#6164、#6142）\n  * 增加了 TorchData 
集成功能（#6198）\n  * 提升了易用性及与其他库的互操作性（#6131、#6182、#6188、#6172、#6179、#6143）\n  * 改进了执行设备的指定与处理方式（#6194、#6165）\n  * 完善了示例和文档（#6140、#6189、#6170）\n* 新增对比度受限自适应直方图均衡化（CLAHE）算子（#6069）\n  * 感谢 @tonyreina 的贡献！\n* 增加对 CUDA 13.1U1 的支持（#6163）\n* 改进了 slice、full、zeros、ones 算子（#6159、#6109、#6169）\n\n修复的问题\n---\n* 添加 DALI_MAX_IMAGE_SIZE 环境变量，用于限制 CPU 和 GPU 解码器中解码后的图像大小。（#6208）\n* 修复了图像格式检测中的越界读取问题。（#6207）\n* 修复了音频解码器对超过 2GB 文件的处理问题。（#6199）\n* 修复了随机裁剪算子在新随机状态传递机制下的行为一致性问题。（#6190）\n* 修复了位移滤波器因同步缺失偶尔返回损坏数据的问题。（#6168）\n* 将 DALI 检查点格式中使用的 pickle 替换为 JSON。（#6154）\n* 修复了使用负步长进行切片时的问题。（#6161）\n* 修复了固定大小轮询分配器中的内存泄漏问题。（#6153）\n\n改进内容\n---\n* 添加一个函数，用于清除 EvalContext 的算子实例缓存。（#6216）\n* 在动态模式中集成 TorchData，并创建相关示例。（#6198）\n* 为延迟和异步执行添加异常传播功能。（#6210）\n* 将版本号更新至 2.0.0\n* 添加 ndd.Stream.synchronize 方法，并实现 EvalMode.sync_full 功能。（#6204）\n* `ndd` 与 `fn` 测试第一部分：工具与自动化测试。（#6191）\n* 为动态模式添加多线程使用指南。（#6200）\n* 将 ndd 多线程测试中的线程数限制为 32 个。（#6201）\n* 修复自由线程环境下的 conda 测试问题。（#6202）\n* 改进了设备处理逻辑，移除混合设备支持，使 DALI 能够在无 GPU 的情况下正常运行。（#6194）\n* 将已弃用的 pkg_resources.require 替换为基于 packaging\u002Fimportlib 的替代方案。（#6196）\n* 在张量转换中新增一等公民级别的批处理功能，并支持可选填充。（#6182）\n* 将 DALI 动态模式与 Pipeline API 分为两个独立章节。（#6189）\n* 编写了 ndd.DType 的文档。（#6170）\n* 为动态模式添加多线程测试用例。（#6164）\n* 将 ndd 读取类算子从算子文档中排除。（#6173）\n* 更新 DALI_DEPS：libsound、openssl。（#6185）\n* 在 ArgValue 中支持将标量列表广播到任意形状。（#6188）\n* 添加每线程流，重新设计流语义，并引入真正的 Python 流类。（#6174）\n* 从文档中隐藏已弃用的算子。（#6180）\n* 修复 Jupyter 测试问题。（#6184）\n* 迁移到 CUDA 13.1U1 版本。（#6163）\n* 提升动态模式与 PyTorch 的互操作性。（#6172）\n* 从文档中移除调试模式的相关引用。（#6175）\n* C","2026-03-03T16:32:22",{"id":183,"version":184,"summary_zh":185,"released_at":186},108740,"v1.53.0","关键特性与增强\n---\n本次 DALI 发布包含以下关键特性与增强：\n\n* 动态模式改进（ndd）\n  * 新增 ndd.imread 函数，用于读取和解码图像 (#6092)\n  * 动态模式下支持 DALI 随机算子 (#6110, #6101, #6100, #6096, #6093, #6088)\n  * 添加自动参数类型转换 (#6095)\n  * 改进文档、注释和存根文件 (#6085, #6089, #6080, #6078)\n* 新增 fn.bbox_rotate 算子 (#5979)\n  * 感谢 @5had3z 的贡献！\n* 增加对 nvImageCodec 0.7.0 的支持 (#6105)\n* 改进流的复用，以减少内核之间的冗余依赖 (#6072)\n* 迁移到 C++20 
(#5962)\n\n\n已修复的问题\n---\n* 修复常量和 fn.full 中的广播问题 (#6104)\n* 修复 GPU 随机算子中大批次的处理问题 (#6082)\n* 修复动态模式下的设备跟踪问题 (#6090)\n* 修复 GPU Tensor 和 TensorList 复制中的同步问题 (#6071)\n\n\n改进内容\n---\n* 在动态模式存根文件中支持随机算子 (#6110)\n* 将 VERSION 更新至 1.53.0\n* 将 nvcomp 更新至 libnvcomp 5.1.0.21 (#6111)\n* 将 numba-cuda 更新至 0.22.0，并修复 TL0_python-self-test-operators_1 测试 (#6106)\n* 将 nvImageCodec 更新至 0.7.0 (#6105)\n* 为 DALI 动态模式随机算子添加 Python RNG API (#6101)\n* 修复 TL1_cutom_src_pattern_build 测试中的问题 (#6103)\n* 修复 Python 3.9 的 numba-cuda 安装问题 (#6097)\n* 重新设计随机算子 (#6100)\n* 更新依赖至 2025.11 版本 (#6094)\n* 为 Python 3.9 添加弃用警告 (#6098)\n* 为动态模式创建存根文件 (#6089)\n* 新增 ndd.imread 函数，用于读取和解码图像 (#6092)\n* 从文档和错误信息中移除“dynamic executor”这一名称 (#6084)\n* 使用 cuRAND 和 STL 实现泊松分布 (#6096)\n* 生成动态模式文档 (#6085)\n* 为动态 API 添加自动参数类型转换功能 (#6095)\n* 提供主机端和设备端的随机分布 (#6093)\n* 修复 clang CUDA 运行时包装器在多个版本上的补丁问题 (#6091)\n* 为动态模式随机算子添加隐藏的 _random_state 参数 (#6087)\n* 为 CPU 添加 Philox32x4_10 生成器 (#6088)\n* 补充动态模式文档中的空白部分——Tensor、Batch、EvalContext、Device 和 Readers (#6080)\n* 移除 NvDecoder 中基于纹理的视频处理功能 (#6076)\n* 进行 Coverity 检查 2025.11 版本 (#6081)\n* 为动态模式算子添加文档字符串 (#6078)\n* 为 DALI 插件添加 --no-build-isolation 标志 (#6077)\n* 记录执行器标志 StreamPolicy 和 OperatorConcurrency (#5988)\n* 向 CUDAStreamPool 添加繁忙列表，不从池中返回繁忙的流 (#6072)\n* 使信号量实现可选，默认使用 POSIX 实现 (#6074)\n* fn.bbox_rotate (#5979)\n* 切换到 C++20 (#5962)\n* 修复 flake8 报告的警告 (#6059)\n\n\n错误修复\n---\n* 修复 TL3_EfficientDet_convergence 和 TL3_YOLO_convergence 测试 (#6118)\n* 为 constant_value 算子系列添加正确的广播逻辑 (#6104)\n* 修复 imread 类型注释中 Python 3.9 的兼容性问题 (#6099)\n* M","2026-01-07T15:31:56",{"id":188,"version":189,"summary_zh":190,"released_at":191},108741,"v1.52.0","主要特性与改进\n---\n本次 DALI 发布包含以下主要特性与改进：\n\n* 引入实验性动态模式：一种带有惰性求值的命令式执行模型，便于集成到 Python 工作流中。(#6066, #6064, #6060, #6056, #6042, #6039, #6037, #6036, #5954)\n  * 动态模式：添加增强图库 (#6057)\n  * DALI 动态文档主页 (#6052)\n* 新增 Pipeline ZOO——针对常见图像和视频处理用例的代码片段及示例。(#5922)\n* 增加对 CUDA 13U2 的支持 (#6063)\n* 新增 `fn.decoders.numpy` (#5953) 和 CPU `fn.paste` 算子 (#5968)。\n  感谢 @5had3z 
的贡献。\n* 公开管道动态执行器的相关配置选项：\n    * 公开执行器的 `stream_policy` 和 `concurrency` 选项 (#5983)\n    * 用于控制执行器线程数的环境变量。(#5949)\n\n\n已修复的问题\n---\n* 修复 Tensor::Copy 和 Tensor(List)GPU.as_cpu 中的流顺序问题 (#6070)\n* 修复固定内存张量到 DLPack 的转换问题 (#6061)\n* 修复当 DLPack 步幅指针为 NULL 时的步幅检查问题\n* 修复无关键帧视频的处理及旧索引复用问题 (#6058)\n* 修复 resize_crop_mirror 视频输出形状问题 (#5957)\n\n\n改进内容\n---\n* 更新至 FFmpeg 8.0\n* 动态模式：添加增强图库 (#6057)\n* 添加数学函数的动态 API 及测试。(#6066)\n* 将 DALI2 重命名为 dynamic (#6064)\n* 切换到 CUDA 13.0 U2 (#6063)\n* 动态模式：算子基类及算子调用生成器 (#6060)\n* 将版本号更新为 1.52.0\n* 依赖项更新（25\u002F10）(#6053)\n* 动态模式：Tensor 和 Batch 类型 (#6056)\n* 从致谢列表中移除 CMake。(#6020)\n* DALI 动态文档主页 (#6052)\n* 降低 TL1_decoder_perf 中实验性解码器的最低吞吐量要求 (#6050)\n* 修复 TL0_video_plugin 以兼容 sanitizers 运行 (#6040)\n* 命令式模式：调用方式 (#6042)\n* 更新 sanitizers 配置中的 LD_PRELOAD，并排除更多 numba 测试 (#6041)\n* 命令式模式：EvalContext、EvalMode、类型与设备 (#6039)\n* 将测试环境更新至 Ubuntu 24.04 (#6033)\n* 更新 curl 版本：3.15 至 3.16 (#6038)\n* 添加 TensorList 广播构造函数。(#6037)\n* 为命令式模式调整后端实现 (#6036)\n* 在 numba CUDA 测试中加入 nvcc\u002Fnvjitlink 版本兼容性检查 (#6035)\n* 统一所需的最小 CMake 版本。(#6022)\n* 修复 TL1_tensorflow-dali_test 中 Horovod 的安装问题 (#6024)\n* 移除主机解码器回退时的混淆警告 (#6029)\n* 为 TensorGPU DLPack 构造函数添加 `stream` 参数。(#6015)\n* 2025 年 9 月的累计依赖项更新。(#6017)\n* 屏蔽 sanitizers 构建中的误报警告 (#6018)\n* 将图像解码器性能测试中的 5% 阈值下调至 15%，以考虑非迭代情况 (#6021)\n* 将 CMake 升级至 3.25.2 (#6019)\n* 切换到 CUDA 13.0 U1 (#6016)\n* 切换到 gcc-toolset-14 (#6014)\n* 更新测试包 (#6010)\n* 更正 Orin 的支持矩阵条目 (#6008)\n* 屏蔽由 GCC 12.2.1 触发的误报警告 (#6002)\n* 修复 CVE-2024-13978","2025-10-27T14:10:55",{"id":193,"version":194,"summary_zh":195,"released_at":196},108742,"v1.51.0","关键特性与增强\n---\n本次 DALI 发布包含以下关键特性与增强：\n\n* 增加对 CUDA 13 和 CUDA 12.9U1 的支持。(#5946)\n* 增加对 nvImageCodec 0.6.0 的支持。\n* 提升 CPU 多线程效率。(#5960, #5963, #5961)\n  * 减少 ARM CPU 上的锁竞争。\n  * 减少 ThreadPool 中的互斥锁数量。\n  * 优化自旋锁热点路径。\n* 将新的（动态）执行器设为默认。(#5936, #5944)\n* 改进基于 nvImageCodec 的解码器的内存管理。(#5948, #5945)\n\n\n改进\n---\n* 优化自旋锁热点路径。(#5961)\n* 提升 ThreadPool 效率。(#5963)\n* 减少 ThreadPool 中的互斥锁数量。(#5960)\n* 修复 
TL1_superres_pytorch 测试的模型权重路径问题。(#5955)\n* 将 VERSION 更新至 1.51.0\n* 依赖项更新 06.2025。(#5951)\n* 修复部分测试中安装 numpy 2 的问题。(#5952)\n* 切换到 CUDA 12.9U1。(#5946)\n* 新执行器性能修复。(#5944)\n* 更新测试中的 TensorFlow 和 Numba 版本。(#5942)\n* 在适用情况下使用 std::move 代替拷贝。(#5940)\n* BLD：静默 setuptools 关于包配置的警告。(#5939)\n* 将动态执行器设为默认选项。(#5936)\n* 更新 DALI_DEPS_VERSION。(#5934)\n* 为 sw_scale 添加越界内存写入保护。(#5931)\n* 移除重复的文档版本选择器。(#5933)\n* 更新子模块依赖。(#5927)\n* 启用 tfrecord2idx 脚本，将对象存储中的 tfrecord 转换为索引文件，并将其也存储在对象存储中。(#5918)\n* 禁用启用 sanitizers 时的 conda 测试。(#5923)\n* 启用带有 AWS SDK 的 conda 构建。(#5917)\n* 将 POST_BUILD 添加到自定义命令中，并在包装文件中包含 stdexcept。(#5903)\n* 在 conda 构建中启用 nvJPEG2k。(#5920)\n\n\n错误修复\n---\n* 不再强制使图像解码器的输出为非连续内存布局。(#5948)\n* nvImageCodec 解码器——分配整个批次内存。(#5945)\n* 在 test_crop_window_warning 测试流水线中设置 prefetch_queue_depth=1 参数。(#5938)\n* BLD：将 CMake Policy 175 设置为 NEW。(#5937)\n* 抑制样本中无边界框时的警告。(#5932)\n* 修复 DALI 代理的 ResNet50 测试。(#5925)\n\n\n破坏性 API 变更\n---\n* DALI 1.50 是最后一个支持 CUDA 11 的版本。\n* CUDA 13 构建中已移除对计算能力低于 75 的架构的支持。\n\n\n已弃用的功能\n---\n本次发布未弃用任何功能。\n\n\n已知问题：\n---\n* 在 Ubuntu 22.04 的 aarch64 平台上，曾观察到静态 TLS 分配大小不足的问题，这可能导致加载动态库时进程崩溃。将 glibc 更新至 2.39 或更高版本，或通过 `GLIBC_TUNABLES=glibc.rtld.optional_static_tls=10000` 指定更高的静态 TLS 大小，应能解决该问题。\n* 以下算子目前不支持检查点功能：`experimental.readers.fits`、`experimental.decoders.video` 和 `experimental.inputs.video`。\n* 视频加载算子要求关键帧至少每 10 至 15 帧出现一次。","2025-08-13T10:26:42",{"id":198,"version":199,"summary_zh":200,"released_at":201},108743,"v1.50.0","主要特性与改进\n---\n本次 DALI 发布包含以下主要特性与改进：\n\n* 添加对 CUDA 12.9 的支持 (#5908)\n* 增加了为 S3 存储桶禁用 SSL 验证的选项 (#5907)\n  感谢 @dimabasow 的贡献。\n* 添加了从 Python wheel 加载 nvComp 的支持 (#5894, #5889, #5909)\n* 改进了视频加载器的错误信息，在消息中加入文件名 (#5910)\n\n\n已修复的问题\n---\n* 修复了视频解码器中每包多帧的处理问题 (#5911) \n* 修复了 TF 插件中稀疏张量的处理问题 (#5916, #5887) \n* 修复了算子中默认随机种子的序列化问题 (#5919) \n* 修复了 GPU 归约操作中空输入的处理问题 (#5914) \n* 修复了 CUFileDriverScope 中标准输入描述符的处理问题 (#5902) \n\n改进\n---\n* 将 Python 3.10 设置为 build.sh 的默认版本 (#5913) \n* 使库打包错误更容易在日志中被发现。(#5915) \n* 升级到 CUDA 12.9 (#5908) \n* 
改进视频加载器的错误信息，使其包含文件名 (#5910) \n* 增加了为 S3 存储桶禁用 SSL 验证的功能 (#5907) \n* 将 DALI TF 插件迁移到 C API 2.0 (#5904) \n* 构建系统：如果可用，使用 CMake 的 nvimgcodec 模块来获取头文件 (#5906) \n* TF 插件所需的 C API 变更。(#5898) \n* 从 augmentation_gallery 中移除冗余的导入 (#5900) \n* 切换到外部提供的 nvComp (#5894) \n* 因 EOL 而移除对 Python 3.8 的支持 (#5896) \n* 扩展 EfficientNet 的 README 文件 (#5895) \n* 修复 PyTorch 在 dlpack 零拷贝性能测试中的内存消耗问题 (#5891) \n* 在 get_device_memory_info 中添加对 NVMLError_NotSupported 的处理 (#5890) \n* 为 SBSA 平台启用 nvComp (#5889) \n* 实验性视频读取器将丢弃显示时间戳为负值的帧 (#5885)\n\n\n错误修复\n---\n* Coverity 检查 04.2025 (#5912)  \n* 帧解码器修复：避免溢出，处理每包多帧的情况 (#5911) \n* 修复 TF 插件中稀疏张量的构造问题。(#5916) \n* 不再序列化默认种子 (#5919) \n* 修复 GPU 空归约操作的问题 (#5914) \n* 确保即使 WITH_DYNAMIC_CUDA_TOOLKIT 关闭时，nvComp 也会被打包进去 (#5909) \n* 改进 conda 构建配方 (#5905) \n* 修复 CUFileDriverScope 中标准输入的处理问题 (#5902) \n* 从 C API 中移除挤压功能——它可能从来就没真正起过作用。(#5893) \n* 修复旧版 C API 中无效的堆栈读取问题 (#5892) \n* 使用 Polygon(..., closed=true) 代替 Polygon(..., true) (#5842) \n* 修复 TF 稀疏张量中标量的处理问题。(#5887) \n\n破坏性 API 变更\n---\nDALI 1.49 是最后一个支持 Python 3.8 的版本\n\n\n已弃用的功能\n---\n对 CUDA 11 的支持将在后续版本中终止。\n\n已知问题：\n---\n* 以下算子：`experimental.readers.fits`、`experimental.decoders.video` 和 `experimental.inputs.video` 目前不支持检查点功能。 \n* 视频加载器算子要求关键帧至少每隔 10 到 15 帧出现一次。  \n如果关键帧出现的频率","2025-05-27T17:43:38",{"id":203,"version":204,"summary_zh":205,"released_at":206},108744,"v1.49.0","主要特性与增强\n---\n本次 DALI 发布包含以下主要特性与增强：\n\n* 改进了新的（实验性）C API (#5879, #5872, #5866, #5857, #5835, #5868)\n* 添加了对 CUDA 12.8U1 的支持 (#5850)\n* 为 dali.fn.experimental.debayer 添加了 CPU 支持 (#5832)\n  感谢 @5had3z 的贡献！\n* 添加了对 nvImageCodec 0.5.0 的支持 (#5854)\n\n\n已修复的问题\n---\n* 修复了实验性图像解码器中的竞态条件 (#5856)\n\n\n改进\n---\n* 将 VERSION 更新至 1.49.0\n* C API 2.0 检查点功能 + 解除 dali.h 的阻塞 (#5879)\n* 暂时禁用失败的测试 (#5882)\n* 实验性视频读取器重构及 API 改进 (#5839)\n* 升级到 LLVM 20.1.2 (#5870)\n* C API 2.0：外部源信息 (#5872)\n* 将 _zmq.cpython 添加到地址消毒器抑制列表 (#5873)\n* 为 Horovod 构建设置最低 CMake 策略版本 (#5871)\n* 流水线重构 (#5866)\n* 添加多配置性能基准测试 (#5858)\n* 整理 Python 3.8 支持 (#5867)\n* 切换到 manylinux_2_28 
(#5608)\n* 调整测试以兼容 numpy 2.x (#5862)\n* 提高 ffts 所需的最低 CMake 版本 (#5864)\n* 移除不必要的全局声明并添加 noqa 注释 (#5865)\n* 在 check_batch 中为缺失的源信息添加回退 (#5861)\n* 将 nvimagecodec 的要求提升至 0.5.0 (#5854)\n* 当 Mixed ImageDecoder 未注册时，跳过使用它的 C API2 测试。(#5857)\n* C API 2.0 流水线及流水线输出 (#5835)\n* 更新 six 包的版本约束 (#5855)\n* 在文档中添加 GIT sha 信息 (#5853)\n* 将 Black 版本升级至 25.x (#5849)\n* 将 conda 中的 OpenCV 版本提升至 4.11 (#5851)\n* 改进消毒器配置并抑制误报 (#5795)\n* 基于 OpenCV 添加 Debayer CPU 支持 (#5832)\n* FramesDecoder 的边界处理及视频工具 (#5844)\n* 升级到 CUDA 12.8 U1 (#5850)\n* 将透视变换测试添加到其他测试套件中 (#5847)\n* 在 OpSchema 中添加算子状态信息 (#5848)\n* 将支持的 TensorFlow 版本提升至 2.18 (#5840)\n\n\n错误修复\n---\n* 修复了实验性图像解码器中的竞态条件 (#5856)\n\n\n破坏性 API 变更\n---\n本次 DALI 发布没有破坏性变更。\n\n\n已弃用的功能\n---\nDALI 1.49 是最后一个支持 Python 3.8 的版本。\n\n\n已知问题：\n---\n* 以下算子：`experimental.readers.fits`、`experimental.decoders.video` 和 `experimental.inputs.video` 目前不支持检查点功能。\n* 视频加载算子要求关键帧至少每 10 至 15 帧出现一次。\n  如果关键帧的频率低于每 10-15 帧，返回的帧可能会不同步。\n* 实验性 VideoReaderDecoder 不支持开放 GOP。\n  它不会报告错误，但可能会生成无效帧。VideoReader 使用启发式方法检测开放 GOP，在大多数常见情况下应能正常工作。\n* DALI TensorFlow 插件可能不兼容。","2025-04-29T14:58:10",{"id":208,"version":209,"summary_zh":210,"released_at":211},108745,"v1.48.0","关键特性与增强\n---\n本次 DALI 发布包含以下关键特性与增强：\n\n* 改进的 `fn.experimental.decoders.video` 视频解码器：(#5830, #5814)\n    * 优化了寻址和重置行为\n    * 增加了支持多种可配置模式的帧填充功能\n    * 新增了帧选择选项\n    * 添加了 `build_index` 选项，用于控制帧索引的生成\n* 为 `dali.fn.experimental.warp_perspective` 增加了 CPU 支持 (#5829, #5815)\n    * 感谢 @5had3z 的贡献！\n* 引入了新的（实验性）C API (#5796, #5797, #5798, #5799)\n\n已修复的问题\n---\n* 引入 AvUniquePtr 以避免帧解码器中的内存泄漏 (#5834)\n* 移除了在使用固定内存输入的操作符中不必要的主机同步。(#5822)\n* 修复了以非主机顺序生成的固定 CPU 缓冲区的主机端访问问题 (#5820)\n* 修复了 GPU 算术操作符对空批次的处理问题。(#5818)\n\n改进\n---\n* 修复 TL3 短测试中的数据路径 (#5845)\n* 由于收敛问题，将 SSD LT3 的批大小更改回 64 (#5846)\n* 将 VERSION 更新为 1.48.0\n* 修复 Coverity 问题 25\u002F03 (#5843)\n* 将 FFmpeg 升级至 7.1.1 (#5838)\n* 重新组织视频解码器源代码 (#5836)\n* 依赖项更新 2025-03 (#5833)\n* C API 2.0 Tensor 和 TensorList (#5799)\n* 更新音频解码器算子文档（支持的格式）(#5803)\n* 移除 RN50 基准测试，将其移至 
DALI_EXTRA 用于 RN50 DL FW 迭代测试 (#5824)\n* 改善视频解码器的寻址和重置行为 (#5830)\n* 实现 Warp Perspective 的 CPU 版本 (#5829)\n* 移除 ScratchpadAllocator 和 ScratchpadEstimator (#5810)\n* 对 Pipeline、OpSpec 和 InputOperator 进行代码现代化和重构 (#5826)\n* ``fn.experimental.decoders.video`` 的改进 (#5814)\n* C API 2.0 辅助函数 (#5798)\n* C API 2.0 的初始化与错误处理 (#5797)\n* 限制 TensorTest 中的最大张量列表大小 (#5823)\n* 将 DisplacementTest.Sphere 的约束从 0.005 放宽至 0.006 (#5821)\n* 限制 Python 3.8 和 3.9 的 dm-tree 版本 (#5819)\n* 添加 C API 头文件和 C 语言构建测试。(#5796)\n* 在文档中公开 DLPack 支持 (#5817)\n\nBug 修复\n---\n* 修复 data_objects_test 中数组使用 unique_ptr 的问题 (#5837)\n* 引入 AvUniquePtr 以避免帧解码器中的内存泄漏 (#5834)\n* 修复 tensor_shape.h 警告 (#5831)\n* 增强视频编解码器支持及错误处理 (#5825)\n* 修正 warp_perspective 的文档说明，要求输入为 3x3 形状，而非展平的一维数组 (#5815)\n* 移除在使用固定内存输入的操作符中不必要的潜在主机同步。(#5822)\n* 修复以非主机顺序生成的固定 CPU 缓冲区的主机端访问问题 (#5820)\n* 修复 GPU 算术操作符对空批次的处理问题。(#5818)\n\n破坏性 API 变更\n---\n本次 DALI 发布没有破坏性 API 变更。\n\n已弃用的功能\n---\n本次发布未弃用任何功能。\n\n已知问题：\n---\n* 以下算子：`experimental.readers.fits`、`experimental.decoders.video`","2025-03-25T17:57:30",{"id":213,"version":214,"summary_zh":215,"released_at":216},108746,"v1.47.0","主要特性与增强\n---\n本次 DALI 发布包含以下主要特性与增强：\n\n* 添加了对 DALI 批量处理的支持，作为 PyTorch DataLoader 的一部分（DALI 代理）：\n    * 实验性 DALI 代理 (#5726, #5784)\n    * 训练中使用 DALI 代理的示例 (#5791, #5792)\n* Tegra 构建已迁移到 JetPack 6.2（CUDA 12.6）(#5449)\n\n修复的问题\n---\n* 修复了实验性图像解码器中的同步不足问题 (#5806)\n* 修复了实验性视频解码器中的内存泄漏问题 (#5778)\n\n改进\n---\n* 修复 openssl 中的 CVE-2024-13176 漏洞 (#5805) \n* 将 VERSION 更新至 1.47.0 \n* 使帧解码器在不进行文件解码的情况下构建索引 (#5809) \n* 清理警告信息 (#5811) \n* 将 PyNvVideoCodec 的下载源切换至 PyPI (#5813) \n* 依赖库更新至 2025 年 2 月 (#5801) \n* 在 resnet50 示例中将 DALI 设置为默认选项 (#5808) \n* 在 EfficientNet 和 ResNet 示例中添加关于 DALI 代理的文档 (#5800) \n* 增加对 AWS SDK C++、curl 和 openssl 的致谢 (#5794) \n* 迁移到 CUDA 12.8 (#5793) \n* 迁移到 JetPack 6.2（CUDA 12.6）(#5449) \n* 在 EfficientNet 示例中添加 DALI 代理选项 (#5791) \n* 使用 DALI 代理运行 ResNet50 示例，并引入 TL3_RN50_benchmark (#5792) \n* 从 asan 屏蔽列表中移除 libavutils (#5783) \n* 为 EfficientNet 添加典型的数据加载流水线路径 
(#5761) \n* 移除已失效的执行器测试。(#5788) \n* 修复 test_dali_proxy 的使用问题 (#5784) \n* 修复 pip show 对 TL1_decoder_perf 的使用问题 (#5781) \n* 引入（实验性）DALI 代理 (#5726) \n* 将光流测试从特定的 TU 测试任务迁移至 Ampere 测试 (#5771) \n* 修复文档中 ipynb 文件中的轻微 Markdown 格式问题 (#5773) \n\n错误修复\n---\n* 确保分配的临时内存可供 nvImageCodec 流使用，因为我们通常会跳过预同步以避免不必要的开销 (#5806) \n* 从光流代码中移除冗余的 nvml::Shutdown 调用 (#5804) \n* 修复 libaviutils 占用过多内存的问题 (#5778) \n\n破坏性 API 变更\n---\n本次 DALI 发布没有破坏性 API 变更。\n\n已弃用的功能\n---\n本次发布未弃用任何功能。\n\n已知问题：\n---\n* 以下算子：`experimental.readers.fits`、`experimental.decoders.video` 和 `experimental.inputs.video` 目前不支持检查点功能。  \n* 视频加载算子要求关键帧至少每 10 至 15 帧出现一次。如果关键帧的频率低于 10–15 帧，返回的帧可能会不同步。  \n* 实验性 VideoReaderDecoder 不支持开放 GOP。它不会报错，但可能会生成无效帧。VideoReader 使用启发式方法检测开放 GOP，在大多数常见情况下应能正常工作。  \n* DALI TensorFlow 插件可能与 TensorFlow 1.15.0 及更高版本不兼容。  \n为了在没有随 DALI 附带预编译插件二进制文件的 TensorFlow 版本上使用 DALI，请确保用于编译的编译器是…","2025-02-25T18:26:31",{"id":218,"version":219,"summary_zh":220,"released_at":221},108747,"v1.46.0","关键特性与增强\n---\n本次 DALI 发布包含以下关键特性与增强：\n\n* 新增对 CUDA 12.8 的支持\n* 优化了工作空间和算子规范 (#5740, #5770)\n* 引入了适用于 DALI 流水线\u002F图的公共子图消除功能 (#5752, #5755)\n* 增加了对 nvImageCodec 0.4.1 的支持 (#5576, #5774, #5780)\n* 改进了对支持环境变量的文档说明 (#5756)\n* 使流水线的 `build` 调用变为可选 (#5754)\n\n已修复的问题\n---\n* 修复了全局命名空间中 DALIDataType 的打印问题（针对自定义 C++ 构建）(#5748)\n\n改进\n---\n* 禁用 Xavier 构建中的 nvimgcodec 支持 (#5780)\n* 在 conda 构建中将 nvimagecodec 版本升级至 0.4.1 (#5774)\n* 将 VERSION 更新至 1.46.0\n* 优化 OpSpec 和 Workspace 查询 (#5770)\n* 将 CUTLASS 升级至 3.6.0 (#5765)\n* 将 nvImageCodec 升级至 0.4.1 版本 (#5576)\n* 依赖项更新 01\u002F2025 (#5767)\n* 更改向源密钥环添加 CUDA 公钥的方式 (#5766)\n* 为 TensorCPU 和 TensorListCPU 添加 .as_cpu() 方法 (#5751)\n* 使 build() 变为可选 (#5754)\n* 调整 L1 测试以符合规范化的 DALI 插件命名规则 (#5760)\n* 添加测试以验证 CSE 不会合并外部源 (#5755)\n* 记录环境变量文档 (#5756)\n* 实现 CSE——公共子表达式（子图）消除 (#5752)\n* 将 setuptools 添加为 conda 构建的依赖项 (#5753)\n* 规范化 wheel 和 sdist 文件名，使其仅使用下划线 `_`，符合 PEP 625 的要求 (#5750)\n* 在 TL1_decoder_perf 中增加 taskset 的使用 (#5738)\n* 对 OpSchema 进行重大重构，并弃用默认种子 (#5740)\n* 修复 python-self-core-exec2 
被运行两次的问题 (#5743)\n\n错误修复\n---\n* 修复 dlpack 测试 (#5768)\n* 修复 libsnd 中的 CVE-2024-50612 漏洞 (#5745)\n* 将 DALIDataType 的 operator\u003C\u003C 和 to_string 移至全局命名空间。(#5748)\n\n破坏性 API 变更\n---\n本次 DALI 发布没有破坏性 API 变更。\n\n已弃用的功能\n---\n* 向非随机算子传递 `seed` 参数已被弃用。虽然传递该参数不会产生任何效果，但会触发警告。\n\n已知问题：\n---\n* 以下算子目前不支持检查点功能：`experimental.readers.fits`、`experimental.decoders.video` 和 `experimental.inputs.video`。\n* 视频加载算子要求关键帧至少每 10 到 15 帧出现一次。如果关键帧的频率低于每 10–15 帧，返回的帧可能会不同步。\n* Experimental VideoReaderDecoder 不支持开放 GOP。它不会报错，但可能会生成无效帧。VideoReader 使用启发式方法检测开放 GOP，在大多数常见情况下应能正常工作。\n* DALI TensorFlow 插件可能与 TensorFlow 1.15.0 及更高版本不兼容。若要将 DALI 与未随 DALI 提供预编译插件二进制文件的 TensorFlow 版本一起使用，请确保在安装插件时，系统上存在用于构建 TensorFlow 的编译器。（取决于","2025-01-31T17:44:28",{"id":223,"version":224,"summary_zh":225,"released_at":226},108748,"v1.45.0","主要特性与增强\n---\n本次 DALI 发布包含以下主要特性与增强：\n\n* 添加对 CUDA 12.8 的支持 (#5711)。\n* 优化了使用动态执行器时 JAX 和 PaddlePaddle 插件中输出的（零拷贝）传输。(#5703, #5715)\n\n已修复的问题\n---\n* 修复了将从 GPU 传输到 CPU 的输入通过 `.cpu()` 调用作为关键字参数传递时的问题 (#5732)\n\n\n改进\n---\n* 添加 CUDA 12.8 支持\n* 更新 libjpeg2k (#5742)\n* 将版本号更新为 1.45.0\n* 从 DALI wheel 名称中移除“构建标签” (#5736)\n* 将 CV-CUDA 从 0.8 更新至 0.12，rapidjson (ToT)，google benchmark 从 1.9.0 更新至 1.15.1，black 从 24.4.2 更新至 24.8.0 (#5733)\n* 以常量引用方式返回 TensorLayout。(#5730)\n* 提取 DALIDataType。(#5729)\n* 改善 hw_decoder_bench.py 中的打印功能 (#5724)\n* 将所有参数引用中的双反引号替换为单反引号 (#5716)\n* 启用 ops API 的运行时和 sphinx 级别签名 (#5722)\n* 切换到 CUDA 12.6 U3 (#5719)\n* 移除未使用的 `max_num_stream` Pipeline 参数，并在 Python 中弃用 `max_streams`。(#5720)\n* 从原生代码中移除 default_cuda_stream_priority，并在 Python 中将其弃用。(#5717)\n* PaddlePaddle 零拷贝 (#5715)\n* 在 Sphinx 文档中添加对参数引用的处理 (#5707)\n* JAX 零拷贝 (#5703)\n* 在可分离重采样中使用 FMA。(#5711)\n* 在 RNN-t 流水线中使用 exec-dynamic。对 exec2 进行小幅修复。(#5706)\n\n错误修复\n---\n* 提高 tf 测试中 numpy 版本的上限 (#5741)\n* 从 backend_impl 中移除 TFRecordParser 依赖 (#5737)\n* 修复 coverity 问题 (#5734)\n* 修复将 .cpu() 结果传递给参数输入的问题。(#5732)\n* 对参数使用绝对寻址方式 (#5725)\n* 更正 conda 包及安装说明消息中的 nvimagecodec 版本 (#5714)\n\n\n破坏性 API 变更\n---\n本次 DALI 
发布没有破坏性变更。\n\n\n已弃用的功能\n---\n* Pipeline 参数 `max_streams` 和 `default_cuda_stream_priority` 已被弃用。传递这些参数不会产生任何效果，但会触发警告。\n\n\n已知问题：\n---\n* 最新的 nvImageCodec (0.4.0) 目前与 DALI 不兼容。从 DALI 1.44 开始，Python wheel 将依赖锁定为 0.3.0，但较早的版本并未明确指定所需版本。使用早期 DALI 版本的用户可能需要手动安装旧版 nvImageCodec，才能使用 `fn.experimental.decoders.image.*`，或者对于 DALI 1.39 和 1.40，使用 `fn.decoders.image.*`。兼容版本可通过 `pip install nvidia-nvimgcodec-cu12~=0.3.0` 安装。\n* 以下算子：`experimental.readers.fits`、`experimental.decoders.video` 和 `experimental.inputs.video` 当前不支持检查点功能。\n* 视频加载算子要求关键帧至少每 10 至 15 帧出现一次。如果关键帧的频率低于每 10–15 帧，返回的帧可能会不同步。\n* 实验性 VideoReaderDecoder 不支持开放 GOP。","2025-01-24T15:08:06",{"id":228,"version":229,"summary_zh":230,"released_at":231},108749,"v1.44.0","Key Features and Enhancements\r\n---\r\nThis DALI release includes the following key features and enhancements:\r\n\r\n* The dynamic executor (exec_dynamic) is no longer experimental. It supports GPU to CPU transfers and reduces memory consumption. (#5704)\r\n* Added support for zero-copy outputs transfer with dynamic executor. (#5684, #5673)\r\n  * Eliminated the outputs copy in PyTorch plugin. (#5699)\r\n* Added dynamic executor support to TF plugin. (#5686)\r\n* Optimized pipeline's output contiguity handling. (#5677)\r\n\r\nFixed Issues\r\n---\r\n* Restricted nvImageCodec version in DALI wheel dependencies list, as the most recent nvImageCodec (0.4.0) is incompatible. 
(#5709)\r\n* Fixed custom stream handling on non-default device in `fn.external_source` (#5690).\r\n* Fixed problem with using DALI with Python3.12 with no distutils\u002Fsetuptools installed.\r\n* Fixed incorrect stream usage in `fn.experimental.inputs.video` (#5682)\r\n* Fixed possible hang in video decoder when rewinding near the last keyframe (#5676, #5669)\r\n* Fixed `dont_use_mmap` option handling in `fn.readers.webdataset` (#5683)\r\n* Fixed redundant usage of pinned memory in the CPU `fn.readers.numpy` reader (#5678)\r\n* Fixed dynamic executor's handling of operators that produce no outputs (#5674)\r\n\r\n\r\nImprovements\r\n---\r\n* Make `exec_dynamic` non-experimental (alternative formatting) (#5704)\r\n* Use zero-copy outputs with PyTorch (#5699)\r\n* Add Python 3.13 (experimental) support (#5692)\r\n* Add proper NVTX markers to Executor2. (#5694)\r\n* Add Efficientnet pipeline to hw_bench script (#5691)\r\n* Stream aware outputs (#5684)\r\n* Update DALI_DEPS_VERSION for new OpenSSL (#5689)\r\n* Add dynamic executor support to TF plugin. (#5686)\r\n* Make black and flake8 run independently. (#5685)\r\n* Update of FFmpeg to n7.1 (#5681)\r\n* Deps update 10 2024 (#5670)\r\n* Refactor operator output contiguity handling (#5677)\r\n* Add ready event to Tensor and TensorList. (#5673)\r\n\r\nBug Fixes\r\n---\r\n* Fix nvimgcodec version check, do not install it separately in tests env (#5713)\r\n* Limit the upper versions of DALI wheel installation dependencies (#5710)\r\n* Limit the maximum version of nvimagecodec for current DALI (#5709)\r\n* Use exec-dynamic in RNN-t pipeline. Minor fix to exec2. (#5706)\r\n* Check JAX version and invoke __dlpack__ manually for jax pre-0.4.16. 
(#5702)\r\n* Fix `nose` imports (#5698)\r\n* ExternalSource refactoring and fixing (#5690)\r\n* Move from deprecated distutils to packaging (#5687)\r\n* Make sure that the proper video stream index is used by the GPU decoder (#5682)\r\n* Add an ability to rewind at the end of the video (#5676)\r\n* Fix inverted mmap inside webdataset reader (#5683)\r\n* Fix the redundant usage of pinned memory in the numpy cpu reader (#5678)\r\n* Fix handling of tasks with zero outputs. (#5674)\r\n* Add an ability to retry rewind to the one before the last keyframe (#5669)\r\n\r\n\r\nBreaking API changes\r\n---\r\nThere are no breaking changes in this DALI release.\r\n\r\n\r\nDeprecated features\r\n---\r\nNo features were deprecated in this release.\r\n\r\n\r\n\r\nKnown issues:\r\n---\r\n* The most recent nvImageCodec (0.4.0) is currently incompatible with DALI. Python wheel for DALI 1.44 pins the dependency to 0.3.0, but older releases do not specify the required version explicitly. Users of previous DALI releases may need to manually install older nvImageCodec in order to use `fn.experimental.decoders.image.*` or, for DALI 1.39 and 1.40, `fn.decoders.image.*`. The compatible version can be installed with `pip install nvidia-nvimgcodec-cu12~=0.3.0`.\r\n* The following operators: `experimental.readers.fits`, `experimental.decoders.video`, and `experimental.inputs.video` do not currently support checkpointing. \r\n* The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.  \r\nIf the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.\r\n* Experimental VideoReaderDecoder does not support open GOP.  \r\nIt will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.\r\n* The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.  
\r\nTo use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)\r\n* In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams. \r\nAs a workaround, you can manually synchronize the device before returning the data from the callback.\r\n* Due to some known issues with meltdown\u002Fspectra mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:\r\n  * `privileged=yes` in Extra Settings for AWS data points\r\n  * `--privileged` o","2024-11-28T18:09:35",{"id":233,"version":234,"summary_zh":235,"released_at":236},108750,"v1.43.0","Key Features and Enhancements\r\n---\r\nThis DALI release includes the following key features and enhancements:\r\n\r\n* Added DataNode methods for runtime access to batch's shape, layout, and source_info (#5650, #5648). \r\n* Added support for CUDA 12.6U2 (#5657)\r\n* Add experimental CV-CUDA resize operator (#5637)\r\n* Improved performance of TensorList resizing and TypeTable (#5638, #5634).\r\n* Improved DLPack support (to enable sharing ownership and pinned memory) (#5661).\r\n\r\n\r\nFixed Issues\r\n---\r\n* Fixed cleanup of pipelines containing PythonFunction. (#5668)\r\n* Fixed CPU resize operator running with multiple resampling modes in a batch. (#5647)\r\n\r\n\r\nImprovements\r\n---\r\n* Add support for bool type for the numba operator (#5666)\r\n* Bump numpy version in Xavier tests. (#5663)\r\n* DLPack support rework (#5661)\r\n* Update links in DALI readme (#5660)\r\n* Bump required NumPy version to 1.23. 
(#5658)\r\n* Move to CUDA 12.6 update 2 (#5657)\r\n* Increase number of the decoder bench iterations (#5655)\r\n* GetProperty refactor + DataNode.property accessor (#5650)\r\n* Remove and forbid direct inclusion of half.hpp. (#5654)\r\n* Add DataNode.shape() (#5648)\r\n* Fix conda build for Python 3.9 (#5649)\r\n* Increase batch size in RN50 test for TF as on H100 it works well again (#5645)\r\n* Add experimental CV-CUDA resize (#5637)\r\n* Pin libprotobuf and protobuf to 5.24 which works with python 3.8-3.12 in conda (#5643)\r\n* Optimize TensorList::Resize (#5638)\r\n* TypeTable\u002FTypeInfo optimization (#5634)\r\n\r\n\r\nBug Fixes\r\n---\r\n* Fix Pipeline reference leak in PythonFunction. (#5668)\r\n* Fix constness in (Const)SampleView. Improve diagnostics. (#5664)\r\n* Fix issues detected by Coverity (2024.09.30) (#5652)\r\n* Fix CPU resize with mixed NN\u002Fother resampling filters. (#5647)\r\n* Fix block size in TransposeTiled kernel test. (#5641)\r\n* Fix the lack of the previous release in the docs switcher list (#5640)\r\n\r\n\r\nBreaking API changes\r\n---\r\nThere are no breaking changes in this DALI release.\r\n\r\n\r\nDeprecated features\r\n---\r\nNo features were deprecated in this release.\r\n\r\n\r\n\r\nKnown issues:\r\n---\r\n* The following operators: `experimental.readers.fits`, `experimental.decoders.video`, and `experimental.inputs.video` do not currently support checkpointing. \r\n* The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.  \r\nIf the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.\r\n* Experimental VideoReaderDecoder does not support open GOP.  \r\nIt will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.\r\n* The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.  
\r\nTo use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)\r\n* In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams. \r\nAs a workaround, you can manually synchronize the device before returning the data from the callback.\r\n* Due to some known issues with meltdown\u002Fspectra mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:\r\n  * `privileged=yes` in Extra Settings for AWS data points\r\n  * `--privileged` or `--security-opt seccomp=unconfined` for bare Docker.\r\n\r\n\r\nBinary builds\r\n---\r\n\r\n**NOTE:** DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest [CUDA toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-installation-guide-linux\u002Findex.html).\r\n\r\n```\r\nCUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. \r\nThey are built with the latest CUDA 11.x\u002F12.x toolkit respectively but they can run on the latest, \r\nstable CUDA 11.0\u002FCUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).\r\nHowever, using the most recent driver may enable additional functionality. 
\r\nMore details can be found in enhanced CUDA compatibility guide.\r\n```\r\n\r\nInstall via pip for CUDA 12.0:\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-cuda120==1.43.0`\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-tf-plugin-cuda120==1.43.0`\r\n\r\nor just:\r\n\r\n`pip install nvidia-dali-cuda120==1.43.0`\r\n`pip install nvidia-dali-tf-plugin-cuda120==1.43.0`\r\n\r\nFor CUDA 11:\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-cuda110==1.43.0`\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-tf-plugin-cuda110==1.43.0`\r\n \r\nor just:\r\n\r\n`pip install nvidia-dali-cuda110==1.43.0`\r\n`pip install nvidia-dali-tf-plugin-cuda110==1","2024-10-30T17:22:30",{"id":238,"version":239,"summary_zh":240,"released_at":241},108751,"v1.42.0","Key Features and Enhancements\r\n---\r\nThis DALI release includes the following key features and enhancements:\r\n\r\n* Introduced more flexible execution in the DALI pipeline with the  `experimental_exec_dynamic` flag (#5635, #5631, #5593, #5528, #5620, #5602, #5529, #5595):\r\n  * Enabled support for GPU-to-CPU transfers in a pipeline.\r\n  * Added support for accessing CPU metadata of GPU outputs (e.g. shape of GPU decoded images\u002Fvideos).\r\n* Added support for CUDA 12.6U1 (#5616).\r\n* Added an option to return the number of frames in the experimental video reader (#5628). \r\n\r\n\r\n\r\nFixed Issues\r\n---\r\n* Fixed handling of optical flow initialization failure (#5624).\r\n\r\n\r\nImprovements\r\n---\r\n* Add metadata-only inputs. 
(#5635)\r\n* Schema-based input device check (#5631)\r\n* Enable GPU->CPU transfers (#5593)\r\n* Adds `enable_frame_num` to the experimental video reader (#5628)\r\n* Executor2 class implementation & tests (#5528)\r\n* Executor 2.0: Per-operator stream assignment policy (#5620)\r\n* Move to CUDA 12.6U1 (#5616)\r\n* Executor 2.0: Stream assignment (#5602)\r\n* Tasking: Test returning multiple outputs of type std::any. (#5529)\r\n* Patch OSS vulnerabilities (#5612)\r\n* Executor 2.0: Graph lowering (#5595)\r\n* Make DALI tests compatible with Python 3.12 (#5452)\r\n* Adjust the L3 perf test threshold for H100 runners (#5606)\r\n* Add L1 image decoder DALI test (#5601)\r\n\r\n\r\nBug Fixes\r\n---\r\n* Fix multiple initialization attempts in optical flow operator. (#5624)\r\n* Fix null pointer access when clearing incomplete workspace payload. (#5622) \r\n\r\n\r\nBreaking API changes\r\n---\r\nThere are no breaking changes in this DALI release.\r\n\r\n\r\nDeprecated features\r\n---\r\nNo features were deprecated in this release.\r\n\r\n\r\n\r\nKnown issues:\r\n---\r\n* The following operators: `experimental.readers.fits`, `experimental.decoders.video`, and `experimental.inputs.video` do not currently support checkpointing. \r\n* The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.  \r\nIf the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.\r\n* Experimental VideoReaderDecoder does not support open GOP.  \r\nIt will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.\r\n* The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.  
\r\nTo use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)\r\n* In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams. \r\nAs a workaround, you can manually synchronize the device before returning the data from the callback.\r\n* Due to some known issues with meltdown\u002Fspectra mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:\r\n  * `privileged=yes` in Extra Settings for AWS data points\r\n  * `--privileged` or `--security-opt seccomp=unconfined` for bare Docker.\r\n\r\n\r\nBinary builds\r\n---\r\n\r\n**NOTE:** DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest [CUDA toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-installation-guide-linux\u002Findex.html).\r\n\r\n```\r\nCUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. \r\nThey are built with the latest CUDA 11.x\u002F12.x toolkit respectively but they can run on the latest, \r\nstable CUDA 11.0\u002FCUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).\r\nHowever, using the most recent driver may enable additional functionality. 
\r\nMore details can be found in enhanced CUDA compatibility guide.\r\n```\r\n\r\nInstall via pip for CUDA 12.0:\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-cuda120==1.42.0`\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-tf-plugin-cuda120==1.42.0`\r\n\r\nor just:\r\n\r\n`pip install nvidia-dali-cuda120==1.42.0`\r\n`pip install nvidia-dali-tf-plugin-cuda120==1.42.0`\r\n\r\nFor CUDA 11:\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-cuda110==1.42.0`\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-tf-plugin-cuda110==1.42.0`\r\n \r\nor just:\r\n\r\n`pip install nvidia-dali-cuda110==1.42.0`\r\n`pip install nvidia-dali-tf-plugin-cuda110==1.42.0`\r\n\r\n\r\nOr use direct download links (CUDA 12.0):\r\n* https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002Fnvidia-dali-cuda120\u002Fnvidia_dali_cuda120-1.42.0-18507157-py3-none-manylinux2014_x86_64.whl\r\n* https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002Fnvidia-dali-cuda120\u002Fnvidia_dali_cuda120-1.42.0-18507157-py3-none-manylinux2014_aarch64.wh","2024-09-30T16:53:23",{"id":243,"version":244,"summary_zh":245,"released_at":246},108752,"v1.41.0","Key Features and Enhancements\r\n---\r\nThis DALI release includes the following key features and enhancements:\r\n\r\n* Added support for CUDA 12.6. (#5596)\r\n* Added `fn.experimental.warp_perspective` operator. (#5542, #5575)\r\n* Added `fn.random.beta` random variate sampling operator. (#5550, #5571)\r\n* Added `fn.io.file.read` operator that supports loading files from dynamically specified paths. (#5552, #5572)\r\n* Enabled support for more simple types in `fn.python_function`, `fn.ones`, and `fn.zeros`. 
(#5598)\r\n* Removed unnecessary copy of tensor arguments fed into GPU operators. (#5590)\r\n\r\n\r\nFixed Issues\r\n---\r\n* Reverted the `fn.decoders.image*` to use legacy decoders due to performance regression in nvImageCodec. (#5582, #5578, #5586)\r\n* Optimized S3 downloading in TFRecord reader. (#5554)\r\n* Added missing validation for number of inputs in argument promotion. (#5592)\r\n* Added missing header to support compilation with GCC 14. (#5594)\r\n* Fixed empty batch handling when copying batch from cpu to gpu. (#5567)\r\n\r\n\r\nImprovements\r\n---\r\n* Executor 2.0: ExecGraph (#5587) \r\n* Enable more Python types to be supported by the DALI python function (#5598) \r\n* Remove usages of `std::call_once`. (#5599) \r\n* Move to CUDA 12.6 (#5596) \r\n* Remove MakeContiguous before CPU inputs of GPU ops.  (#5590) \r\n* nvImageCodec related fixes (#5586) \r\n* Mark PropagateError as [[noreturn]] (#5589) \r\n* Make test_beta_distribution compatible with Python 3.8 (#5571) \r\n* Add default_batch_size to IterationData. (#5588) \r\n* Add thread_setup callback to tasking::Executor (#5581) \r\n* Fix librosa deprecated usage (#5579) \r\n* Bring back the legacy image decoder operator (#5578) \r\n* Extract librosa's effects.trim and stft to DALI test utils, to avoid issues with breaking changes (#5568) \r\n* Remove libjpeg and libtiff deps (#5569) \r\n* Add warp_perspective operator (#5542) \r\n* Remove legacy image decoder (#5559) \r\n* Optimize S3 downloading for TFRecord reader (#5554) \r\n* Add io.file.read operator (#5552) \r\n* Add `fn.random.beta` random variate (#5550) \r\n* Reduce the batch size in the TensorFlow RN50 L3 test (#5565) \r\n* Use MakeContiguous\u003CCPUBackend> when copying CPU->CPU. 
(#5562) \r\n* Update the DALI EfficientNet example to be compatible with the latest NumPy (#5561) \r\n\r\n\r\nBug Fixes\r\n---\r\n* Fixes problems with fetching LFS objects during nvImageCodec conda build (#5603) \r\n* Fix the --python-tag option passed to python setup.py bdist_wheel command (#5600) \r\n* Revert \"Reintroduce \"Move old ImageDecoder to legacy module and make the nvImageCodec based ImageDecoder the default\" (#5470)\" (#5582)\r\n* Adding cstdint header to support GCC 14 compilation (#5594) \r\n* Add missing validation for input count in argument promotion (#5592) \r\n* Don't return pointers to a local variable in dali_operator_test. (#5585) \r\n* Fix operator trace caching (#5580) \r\n* Fix readlink usage - readlink doesn't null-terminate strings. (#5577) \r\n* Fix WarpPerspective::GetFillValue (#5575) \r\n* Prevent stack-use-after-scope (#5572) \r\n* Add missing `#include \u003Coptional>` in nvcvop.h (#5570) \r\n* Fix MakeContiguous sample_dim for empty batches. (#5567) \r\n* Set affinity by device UUID. (#5566) \r\n* Unchecked return value from CUDA library (#5564) \r\n\r\n\r\nBreaking API changes\r\n---\r\n* DALI 1.39 was the final release to support the MXNet integration.\r\n\r\n\r\nDeprecated features\r\n---\r\nNo features were deprecated in this release.\r\n\r\n\r\n\r\nKnown issues:\r\n---\r\n* The following operators: `experimental.readers.fits`, `experimental.decoders.video`, and `experimental.inputs.video` do not currently support checkpointing. \r\n* The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.  \r\nIf the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.\r\n* Experimental VideoReaderDecoder does not support open GOP.  \r\nIt will not report an error and might produce invalid frames. 
VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.\r\n* The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.  \r\nTo use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)\r\n* In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams. \r\nAs a workaround, you can manually synchronize the device before returning the data from the callback.\r\n* Due to some known issues with meltdown\u002Fspectra mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:\r\n  * `privileged=yes` in Extra Settings for AWS data points\r\n  * `--privileged` or `--security-opt seccomp=unconfined` for bare Docker.\r\n\r\n\r\nBinary builds\r\n---\r\n\r\n**NO","2024-08-29T19:14:26",{"id":248,"version":249,"summary_zh":250,"released_at":251},108753,"v1.40.0","Key Features and Enhancements\r\n---\r\nThis DALI release includes the following key features and enhancements:\r\n\r\n* Added operators: `fn.zeros` , `fn.zeros_like` , `fn.ones` , `fn.ones_like` , `fn.full`  and `fn.full_like`  (#5505).\r\n* Added support for H264, H265, and AV1 video formats to `fn.plugin.video` (#5504).\r\n* Added support for CUDA 12.5U1 (#5545).\r\n\r\n\r\nFixed Issues\r\n---\r\n* Fixed following issues with S3 files reading:\r\n  * Fixed handling of file names with whitespaces in TFRecord reader (#5525).\r\n  * Fixed loading when no GPU is available (#5533).\r\n  * Fixed handling of TFrecord index file (#5515).\r\n\r\n\r\nImprovements\r\n---\r\n* Dependency update 07\u002F2024 (#5556) \r\n* Move checkpoint to IterationData. Remove ExecIterData. 
(#5555) \r\n* Remove pruning from the Executor. (#5553) \r\n* Move most of Operator\u003CBackend> to OperatorBase. Unify and simplify operator interfaces. (#5548) \r\n* Move graph visiting utilities to a separate file. (#5549) \r\n* Move to CUDA 12.5U1 (#5545) \r\n* Extend the external source signature to include all arguments (#5541) \r\n* Update DALI_deps version (#5536) \r\n* Pin numpy to \u003C1.24 in TensorFlow examples (#5534) \r\n* Use new graph in Pipeline (#5520) \r\n* Deps update 06\u002F24 (#5514) \r\n* Revert reducing the number of epoch in SBSA training test case (#5531) \r\n* Add AV1 support (#5504) \r\n* Removes MXNet support from DALI (#5526) \r\n* Video decoder in plugin (#5477) \r\n* Checkpoint refactoring - recognize checkpoints by operator instance name. (#5503) \r\n* Keep separate per-pipeline operator counters. Error out when \"stealing\" subgraphs from other pipelines results in duplicate names. (#5506) \r\n* Graph lowering. (#5496) \r\n* Use \"device\" and \"preserve\" built-in arguments in OpGraph2. (#5516) \r\n* Add fn.zeros, fn.zeros_like, fn.ones, fn.ones_like, fn.full and fn.full_like (#5505) \r\n\r\n\r\nBug Fixes\r\n---\r\n* Support spaces in S3 paths (#5525) \r\n* Fix device ID in s3_client_manager (#5533) \r\n* Add failure tests for stealing subgraphs. Minor fix in pipeline validation. (#5518) \r\n* TFRecord to support S3 index URIs (#5515) \r\n* Exclude docs line length adjustment PR from the blame history (#5509)\r\n* Fix keras compat mode for ResNet50 tensorflow example (#5530) \r\n\r\n\r\nBreaking API changes\r\n---\r\n* DALI 1.39 was the final release to support the MXNet integration.\r\n\r\n\r\nDeprecated features\r\n---\r\nNo features were deprecated in this release.\r\n\r\n\r\n\r\nKnown issues:\r\n---\r\n* Starting with DALI 1.39, a performance regression was observed in hardware-accelerated image decoders for setups with a high number of worker threads. 
The nvImageCodec hardware decoder pre-allocation uses a higher mini-batch size, causing extra cuMemFree calls that may slow down decoding in some iterations. The issue will be fixed in the upcoming release.\r\n* The following operators: `experimental.readers.fits`, `experimental.decoders.video`, and `experimental.inputs.video` do not currently support checkpointing. \r\n* The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.  \r\nIf the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.\r\n* Experimental VideoReaderDecoder does not support open GOP.  \r\nIt will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.\r\n* The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.  \r\nTo use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)\r\n* In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams. \r\nAs a workaround, you can manually synchronize the device before returning the data from the callback.\r\n* Due to some known issues with meltdown\u002Fspectra mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:\r\n  * `privileged=yes` in Extra Settings for AWS data points\r\n  * `--privileged` or `--security-opt seccomp=unconfined` for bare Docker.\r\n\r\n\r\nBinary builds\r\n---\r\n\r\n**NOTE:** DALI builds for CUDA 12 dynamically link the CUDA toolkit. 
To use DALI, install the latest [CUDA toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-installation-guide-linux\u002Findex.html).\r\n\r\n```\r\nCUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. \r\nThey are built with the latest CUDA 11.x\u002F12.x toolkit respectively but they can run on the latest, \r\nstable CUDA 11.0\u002FCUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).\r\nHowever, using the most recent driver may enable additional functionality. \r\nMore details can be found in enhanced CUDA compatibility guide.\r\n```\r\n\r\nInstall via pip for CUDA 12.0:\r\n`pip install --extra","2024-07-31T11:06:56",{"id":253,"version":254,"summary_zh":255,"released_at":256},108754,"v1.39.0","Key Features and Enhancements\r\n---\r\nThis DALI release includes the following key features and enhancements:\r\n\r\n* Added support for CUDA 12.5 (#5478).\r\n* Migrated `fn.decoders.image*` operators to use nvImageCodec as a decoding backend (#5470).\r\n* Improved error handling (#5466, #5494, #5486, #5491).\r\n\r\n\r\nFixed Issues\r\n---\r\n* Fixed DALI TF plugin compatibility with TensorFlow 2.9 (#5499).\r\n* Fixed S3 `fn.readers.file` support for pad_last_batch=True (#5493).\r\n* Fixed a bug that resulted in long build times for some pipelines with enabled conditional execution (#5475).\r\n\r\n\r\nImprovements\r\n---\r\n* Add a mention of blogpost in Automatic Augmentation docs (#5508)\r\n* Removal of Python 3.8 notes from documentation (#5502)\r\n* Add default schema and use it in OpSpec argument queries. (#5500)\r\n* Add missing `blocking` argument documentation to the external source operator (#5501)\r\n* Trim line length in the documentation\u002Fexamples for the new theme (#5479)\r\n* Refactoring in Pipeline, OpGraph and old Executor + name lookup improvement in old OpGraph and Pipeline. 
(#5495)\r\n* Improve error message about FFmpeg not being available (#5494)\r\n* Extend docs by adding info about ``@do_not_convert`` for NUMBA and Python ops (#5488)\r\n* New OpGraph (#5485)\r\n* Fix tests for sanitizer build (#5492)\r\n* Github comment acceptance formating table fix (#5490)\r\n* Remove image decoder memory padding from examples (#5484)\r\n* Adding git lfs as a compilation prerequisite (#5483)\r\n* Remove unused JIT workspace policy. (#5487)\r\n* Add a warning about pipeline definition being executed only once. (#5486)\r\n* Move to CUDA 12.5 (#5478)\r\n* Pin NPP version for CUDA 12 (#5480)\r\n* Reintroduce \"Move old ImageDecoder to legacy module and make the nvImageCodec based ImageDecoder the default\" (#5470)\r\n* Move to new, unified, NVIDIA sphinx theme (#5471)\r\n* Add DALI video plugin skeleton (#5328)\r\n* Don't initialize NVML when not setting affinity. (#5472)\r\n* Add MXNet deprecation message to the docs and plugin (#5465)\r\n* Add first-class check for nested datanodes in math\u002Farithmetic ops. (#5466)\r\n\r\nBug Fixes\r\n---\r\n* Fix DALI TF plugin incompatibility with TF 2.9 (#5499)\r\n* Coverity May 2024 (#5497)\r\n* Fix S3 FileReader when using repeated samples (pad_last_batch=True) (#5493)\r\n* Improve the video decoder errors (#5491)\r\n* Add extra rpath for prebuilt ffmpeg dependencies for video plugin (#5481)\r\n* Use dynamic programming in `OpGraph::HasConsumersInOtherStage` (#5475)\r\n\r\n\r\nBreaking API changes\r\n---\r\nThere are no breaking changes in this DALI release.\r\n\r\n\r\nDeprecated features\r\n---\r\nDALI 1.39 is the final release that will support the MXNet integration.\r\n\r\n\r\n\r\nKnown issues:\r\n---\r\n* The following operators: `experimental.readers.fits`, `experimental.decoders.video`, and `experimental.inputs.video` do not currently support checkpointing. \r\n* The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.  
\r\nIf the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.\r\n* Experimental VideoReaderDecoder does not support open GOP.  \r\nIt will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.\r\n* The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.  \r\nTo use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)\r\n* In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams. \r\nAs a workaround, you can manually synchronize the device before returning the data from the callback.\r\n* Due to some known issues with meltdown\u002Fspectra mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:\r\n  * `privileged=yes` in Extra Settings for AWS data points\r\n  * `--privileged` or `--security-opt seccomp=unconfined` for bare Docker.\r\n\r\n\r\nBinary builds\r\n---\r\n\r\n**NOTE:** DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest [CUDA toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-installation-guide-linux\u002Findex.html).\r\n\r\n```\r\nCUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. \r\nThey are built with the latest CUDA 11.x\u002F12.x toolkit respectively but they can run on the latest, \r\nstable CUDA 11.0\u002FCUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).\r\nHowever, using the most recent driver may enable additional functionality. 
\r\nMore details can be found in enhanced CUDA compatibility guide.\r\n```\r\n\r\nInstall via pip for CUDA 12.0:\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-cuda120==1.","2024-06-28T13:53:16",{"id":258,"version":259,"summary_zh":260,"released_at":261},108755,"v1.38.0","Key Features and Enhancements\r\n---\r\nThis DALI release includes the following key features and enhancements:\r\n\r\n* Added support for AWS S3 urls in DALI readers (#5415, #5434).\r\n* Improved support for enum types in `types.Constant`, `fn.cast`, `fn.random.choice` (#5422).\r\n* Improved error reporting (#5428).\r\n\r\n\r\nFixed Issues\r\n---\r\n* Fixed checkpoint clean-up in C API. (#5453)\r\n\r\n\r\nImprovements\r\n---\r\n* Dependency update for May 2024 - black, boost-pp, cv-cuda, pybind11, rapidjson (#5458)\r\n* Introduce DALI_PRELOAD_PLUGINS (#5457)\r\n* Move old ImageDecoder to legacy module and make the nvImageCodec based ImageDecoder the default (#5445)\r\n* Bump up NUMBA version used in tests to 0.59.1 (#5451)\r\n* Extend the documentation footer (#5454)\r\n* Remove the use of (soon deprecated) aligned_storage. (#5455)\r\n* Make shared IterationData a first class member of Workspace. (#5447)\r\n* Tasking module (#5436)\r\n* Add AWS SDK support to all file readers (FileReader, NumpyReader, WebdatasetReader...) 
(#5415)\r\n* Fix test_enum_types.py for Python3.11 (#5443)\r\n* Remove files related to QNX that are no longer used (#5438)\r\n* Remove usage of THRUST host&device vector (#5439)\r\n* Add CMake to aarch64 base docker images (#5437)\r\n* Refactoring of File Reader classes to accommodate for AWS SDK S3 integration (#5434)\r\n* Replace Ops class name with proper operator API name (#5428)\r\n* Use CMake binary release (#5435)\r\n* Improve support for DALI enum types (#5422)\r\n* Disable some JAX iterator tests in sanitizer run (#5427)\r\n\r\nBug Fixes\r\n---\r\n* Fix GTest Death Style Tests and LoadDirectory test in conda (#5469)\r\n* Revert \"Move old ImageDecoder to legacy module and make the nvImageCodec based ImageDecoder the default (#5445)\" (#5464)\r\n* Pin JAX version for multigpu test (#5460)\r\n* Use C++17 standard in nodeps test. (#5459)\r\n* Fix Coverity issues (May\u002F2024) (#5453)\r\n* Fix equalize unit test (#5456)\r\n\r\n\r\nBreaking API changes\r\n---\r\nThere are no breaking changes in this DALI release.\r\n\r\n\r\nDeprecated features\r\n---\r\nDALI 1.39 will be the last release to support MXNet integration.\r\n\r\n\r\n\r\nKnown issues:\r\n---\r\n* The following operators: `experimental.readers.fits`, `experimental.decoders.video`, `experimental.inputs.video`, and `experimental.decoders.image_random_crop` do not currently support checkpointing. \r\n* The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.  \r\nIf the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.\r\n* Experimental VideoReaderDecoder does not support open GOP.  \r\nIt will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.\r\n* The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.  
\r\nTo use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)\r\n* In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams. \r\nAs a workaround, you can manually synchronize the device before returning the data from the callback.\r\n* Due to some known issues with meltdown\u002Fspectra mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:\r\n  * `privileged=yes` in Extra Settings for AWS data points\r\n  * `--privileged` or `--security-opt seccomp=unconfined` for bare Docker.\r\n\r\n\r\nBinary builds\r\n---\r\n\r\n**NOTE:** DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest [CUDA toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-installation-guide-linux\u002Findex.html).\r\n\r\n```\r\nCUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. \r\nThey are built with the latest CUDA 11.x\u002F12.x toolkit respectively but they can run on the latest, \r\nstable CUDA 11.0\u002FCUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).\r\nHowever, using the most recent driver may enable additional functionality. 
\r\nMore details can be found in enhanced CUDA compatibility guide.\r\n```\r\n\r\nInstall via pip for CUDA 12.0:\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-cuda120==1.38.0`\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-tf-plugin-cuda120==1.38.0`\r\n\r\nor just:\r\n\r\n`pip install nvidia-dali-cuda120==1.38.0`\r\n`pip install nvidia-dali-tf-plugin-cuda120==1.38.0`\r\n\r\nFor CUDA 11:\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-cuda110==1.38.0`\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-tf-plugin-cuda110==1.38.0`\r\n \r\nor just:\r\n\r\n`pip install nvidia-dali-cuda110==1.38.0`\r\n`pip install nvidia-dali-tf-plugin-cuda110==1.38.0`","2024-05-29T09:42:13",{"id":263,"version":264,"summary_zh":265,"released_at":266},108756,"v1.37.1","Key Features and Enhancements\r\n---\r\nThere are no new features in this release.\r\n\r\nFixed Issues\r\n---\r\n* Fixed DALI TF plugin source compilation during installation #5448\r\n\r\nImprovements\r\n---\r\nThere are no new improvements in this release.\r\n\r\nBug Fixes\r\n---\r\n* Fixed DALI TF plugin source compilation during installation #5448\r\n* Pin all nvJPEG2k subpackages #5442\r\n\r\nBreaking API changes\r\n---\r\nThere are no breaking changes in this DALI release.\r\n\r\n\r\n\r\nDeprecated features\r\n---\r\nNo features were deprecated in this release.\r\n\r\n\r\n\r\nKnown issues:\r\n---\r\n* The following operators: `experimental.readers.fits`, `experimental.decoders.video`, `experimental.inputs.video`, and `experimental.decoders.image_random_crop` do not currently support checkpointing. \r\n* The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.  
\r\nIf the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.\r\n* Experimental VideoReaderDecoder does not support open GOP.  \r\nIt will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.\r\n* The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.  \r\nTo use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)\r\n* In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams. \r\nAs a workaround, you can manually synchronize the device before returning the data from the callback.\r\n* Due to some known issues with the Meltdown\u002FSpectre mitigations, DALI shows best performance when running in Docker with escalated privileges, for example:\r\n  * `privileged=yes` in Extra Settings for AWS data points\r\n  * `--privileged` or `--security-opt seccomp=unconfined` for bare Docker.\r\n\r\n\r\nBinary builds\r\n---\r\n\r\n**NOTE:** DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest [CUDA toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-installation-guide-linux\u002Findex.html).\r\n\r\n```\r\nCUDA 11.0 and CUDA 12.0 builds use CUDA toolkit enhanced compatibility. \r\nThey are built with the latest CUDA 11.x\u002F12.x toolkit respectively but they can run on the latest, \r\nstable CUDA 11.0\u002FCUDA 12.0 capable drivers (450.80 or later and 525.60 or later respectively).\r\nHowever, using the most recent driver may enable additional functionality. 
\r\nMore details can be found in enhanced CUDA compatibility guide.\r\n```\r\n\r\nInstall via pip for CUDA 12.0:\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-cuda120==1.37.1`\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-tf-plugin-cuda120==1.37.1`\r\n\r\nor just:\r\n\r\n`pip install nvidia-dali-cuda120==1.37.1`\r\n`pip install nvidia-dali-tf-plugin-cuda120==1.37.1`\r\n\r\nFor CUDA 11:\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-cuda110==1.37.1`\r\n`pip install --extra-index-url https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002F nvidia-dali-tf-plugin-cuda110==1.37.1`\r\n \r\nor just:\r\n\r\n`pip install nvidia-dali-cuda110==1.37.1`\r\n`pip install nvidia-dali-tf-plugin-cuda110==1.37.1`\r\n\r\n\r\nOr use direct download links (CUDA 12.0):\r\n* https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002Fnvidia-dali-cuda120\u002Fnvidia_dali_cuda120-1.37.1-14636516-py3-none-manylinux2014_x86_64.whl\r\n* https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002Fnvidia-dali-cuda120\u002Fnvidia_dali_cuda120-1.37.1-14636516-py3-none-manylinux2014_aarch64.whl\r\n* https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002Fnvidia-dali-tf-plugin-cuda120\u002Fnvidia-dali-tf-plugin-cuda120-1.37.1.tar.gz\r\n\r\nOr use direct download links (CUDA 11.0):\r\n* https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002Fnvidia-dali-cuda110\u002Fnvidia_dali_cuda110-1.37.1-14636526-py3-none-manylinux2014_x86_64.whl\r\n* https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002Fnvidia-dali-cuda110\u002Fnvidia_dali_cuda110-1.37.1-14636526-py3-none-manylinux2014_aarch64.whl\r\n* 
https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002Fnvidia-dali-tf-plugin-cuda110\u002Fnvidia-dali-tf-plugin-cuda110-1.37.1.tar.gz\r\n\r\nFFmpeg source code:\r\n* This software uses code of [FFmpeg](http:\u002F\u002Fffmpeg.org) licensed under the [LGPLv2.1](http:\u002F\u002Fwww.gnu.org\u002Flicenses\u002Fold-licenses\u002Flgpl-2.1.html) and its source can be downloaded [here](https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002Fnvidia-dali\u002FFFmpeg-n6.1.1.tar.gz)\r\n\r\nLibsndfile source code:\r\n* https:\u002F\u002Fdeveloper.download.nvidia.com\u002Fcompute\u002Fredist\u002Fnvidia-dali\u002Flibsndfile-1.2.2.tar.gz","2024-05-07T16:55:22",{"id":268,"version":269,"summary_zh":270,"released_at":271},108757,"v1.37.0","Key Features and Enhancements\r\n---\r\nThis DALI release includes the following key features and enhancements:\r\n\r\n* Added support for running JAX defined augmentations in the iterator and pipeline. (#5406, #5426, #5432)\r\n* Improved error reporting with a stack trace pointing to the offending operation in user code. (#5357, #5396)\r\n* Added CPU `fn.random.choice` operator. (#5380, #5387)\r\n* Added support for CUDA 12.4. (#5353, #5410)\r\n* Improved iterators checkpointing (#5374, #5375, #5371, #5356)\r\n* Optimized `fn.resize` operator for better GPU utilization (#5382)\r\n* Added option to skip bboxes in `fn.random_bbox_crop` with the fraction of area within the crop below user-provided threshold. (#5368) \r\n\r\nFixed Issues\r\n---\r\n* Fixed handling of special values of the `stream` field in CUDA Array Interface v3 (#5425).\r\n* Fixed insufficient synchronization around scratch memory in nvImageCoded-based decoders (`fn.experimental.decoders.*`). (#5408)\r\n* Fixed readers saving incorrect checkpoint when restored and saved back in the same epoch. 
(#5378)\r\n\r\n\r\nImprovements\r\n---\r\n* Add JAX-defined augmentation examples (#5426)\r\n* Extend context and name propagation in errors (#5396) \r\n* Add experimental jax operator (#5406) \r\n* Enable Bandit security scan (#5402) \r\n* Reworks links in the RST documentation (#5413) \r\n* Refactor to remove duplicated logic in traverse_directories utility function (#5419) \r\n* Update DALI deps version (#5417) \r\n* Changes to dali\u002Futil\u002Fnumpy (#5416) \r\n* Add libcurl-devel (#5412) \r\n* Move to CUDA 12.4 U1 (#5410) \r\n* Separate executor interface and implementation files. (#5411) \r\n* Make the video reader use cudaVideoDeinterlaceMode_Adaptive only for non-progressive videos (#5392) \r\n* Skip AutoAug test when sanitizers are on (#5403) \r\n* Unpin typing_extensions in tests (#5405) \r\n* Dependency update 03-2024 (#5397) \r\n* Review Bandit reported vulnerabilities (#5398) \r\n* Support checkpointing in JAX decorators (#5374) \r\n* Workaround ASAN bug ignoring RPATH (#5388) \r\n* Update supported TensorFlow version (#5386) \r\n* Disable more video tests on selected machines (#5385) \r\n* Extend fn.random.choice to support n-D inputs (#5387) \r\n* Add random choice CPU operator for 0D samples (#5380) \r\n* Resize: Optimize block sizes, use dynamic amount of shared mem. 
(#5382) \r\n* Support checkpointing in JAX peekable iterator (#5375) \r\n* Increase DALI TF Plugin loading timeout (#5381) \r\n* Improve iterator checkpointing (#5371) \r\n* Improve logs when the DALI TF plugin loading process fails (#5379) \r\n* Add option to prune bboxes based on % area in Crop ROI (#5368) \r\n* Improve op deprecation and deprecate sequence reader (#5372) \r\n* Fix typo in nvcuvid error (#5373) \r\n* Optimize sanitizer operator tests (#5352) \r\n* Introduce operator origin stack trace in the error message (#5357) \r\n* Make ExternalContext more flexible (#5356) \r\n* Enable CUDA 12.4 build (#5353) \r\n\r\nBug Fixes\r\n---\r\n* Add nose as a dependency to iterators tests (#5433)\r\n* Disable jax_function notebook conversions for unsupported Python3.8 (#5432)\r\n* Improve handling of CUDA Array Interface v3 (#5425) \r\n* Fix debug build (#5414) \r\n* Add stream synchronization before decode for nvImageCodec \u003C= 0.2 (#5408) \r\n* Fix Loader checkpointing bug (#5378) \r\n* Fix pixelwise_masks support when the ratio is on in the coco reader (#5407) \r\n* Fix storage of non-POD random distributions. (#5395) \r\n* Fix nvImageCodec version check. (#5399) \r\n* Fix bug in checkpointing C API (#5390) \r\n* Add nose to the package list for TL1_separate_executor. 
(#5393) \r\n* Use host sync allocation for nvImageCodec \u003C= 0.2 (#5391) \r\n* Remove temporary lock file from wheel (#5384) \r\n* Disable type annotation tests in sanitizer build (#5383) \r\n* Fix CUDA 12.4 with ASAN (#5370) \r\n* Skip video tests on M60 (#5369) \r\n* Enable eager mode tests, fix mixed ops and improve coverage (#5367) \r\n\r\nBreaking API changes\r\n---\r\nThere are no breaking changes in this DALI release.\r\n\r\n\r\n\r\nDeprecated features\r\n---\r\nNo features were deprecated in this release.\r\n\r\n\r\n\r\nKnown issues:\r\n---\r\n* The following operators: `experimental.readers.fits`, `experimental.decoders.video`, `experimental.inputs.video`, and `experimental.decoders.image_random_crop` do not currently support checkpointing. \r\n* The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.  \r\nIf the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.\r\n* Experimental VideoReaderDecoder does not support open GOP.  \r\nIt will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.\r\n* The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.  
\r\nTo use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)","2024-04-29T15:41:50",{"id":273,"version":274,"summary_zh":275,"released_at":276},108758,"v1.36.0","Key Features and Enhancements\r\n---\r\nThis DALI release includes the following key features and enhancements:\r\n\r\n* Added support for checkpointing in MXNet iterator and CPU TensorFlow plugin (#5334, #5315).\r\n* Added morphological operators (`fn.experimental.dilate`, `fn.experimental.erode`) (#5294).\r\n* Integrated nvImageCodec for decoding in `fn.experimental.decoders` (#5297, #5336, #5324, #5333, #5339).\r\n* Added `fn.random_crop_generator` operator (#5304).\r\n* Added support for multiple inputs and relative shapes and anchors in `fn.multi_paste` (#5331).\r\n\r\n\r\nFixed Issues\r\n---\r\n* Fixed insufficient synchronization in MXNet iterator (#5364).\r\n* Fixed auto_reset argument handling in iterator plugins (#5340).\r\n* Fixed missing calls to nvml::Shutdown (#5317).\r\n* Limited the number of progressive scans for jpeg decoding (#5316).\r\n\r\n\r\nImprovements\r\n---\r\n* Propagate module and display name of the operator to backend (#5344)\r\n* Update dependencies (#5349)\r\n* Map backend exceptions into Python exception types (#5345)\r\n* Emphasise the optical flow is calculated at input resolution. 
(#5350)\r\n* Refactor custom ops classes to use python_op_factory as base (#5338)\r\n* Add origin stack trace capture for DALI operators (#5302)\r\n* Test fused decoder with two separate pipelines (#5343)\r\n* [Cutmix] Make fn.multi_paste more flexible, fix validation (#5331)\r\n* Enable checkpointing in TensorFlow plugin (CPU only) (#5334)\r\n* Copy out nvImageCodec conda package from the build (#5336)\r\n* Add error message when GPU is not available (#5329)\r\n* Enable build with statically linked nvimgcodec + hard dependency for dynamic linking (#5324)\r\n* Add tf_stack util to autograph (#5322)\r\n* Rewrite median blur to use nvcvop tools (#5327)\r\n* Add morphological operators and the nvcvop module (#5294)\r\n* Add OpSpec::ArgumentInputIdx (#5330)\r\n* Simplify workspace object. Ensure predictable argument order in OpSpec. (#5325)\r\n* Support checkpointing in MXNet iterator (#5315)\r\n* Set rpath at cmake level (do not wait for bundle-wheel) (#5323)\r\n* Interpolation modes documentation upgrade (#5314)\r\n* Update links in DALI documentation (#5321)\r\n* Integrate nvimagecodec (#5297)\r\n* Add `naive_histogram` custom operator to test suite (#4731)\r\n* Add RandomCropGenerator (#5304)\r\n* Use small videos in checkpointing tests (#5305)\r\n\r\n\r\nBug Fixes\r\n---\r\n* Use synchronous copy to framework array in the absence of a stream (#5364)\r\n* Process TFRecord reader binding classes only when it is enabled (#5360)\r\n* Adjust stack formatting in backend to match Python (#5354)\r\n* Link test operators against nvml wrapper (#5355)\r\n* Fix range check in Workspace::SetInput (#5358)\r\n* Make async_pool immune to stream handle reuse. 
(#5348)\r\n* Coverity fixes for 1.36 (#5342)\r\n* Fix \"auto_reset\" argument handling (#5340)\r\n* Fix cupy tests (#5341)\r\n* Add nvimagecodec libs to DALI_EXCLUDES + test utils to dump mismatched images (#5339)\r\n* Fix warning about nvImageCodec version (#5333)\r\n* Silence warning about DOWNLOAD_EXTRACT_TIMESTAMP while fixing the cmake \u003C3.24 builds (#5326)\r\n* Fix inconsistent calls to nvml::Init and nvml::Shutdown (#5317)\r\n* Limit the number of progressive scans for jpeg decoding (#5316)\r\n\r\n\r\nBreaking API changes\r\n---\r\nThere are no breaking changes in this DALI release.\r\n\r\n\r\n\r\nDeprecated features\r\n---\r\nNo features were deprecated in this release.\r\n\r\n\r\n\r\nKnown issues:\r\n---\r\n* The following operators: `experimental.readers.fits`, `experimental.decoders.video`, `experimental.inputs.video`, and `experimental.decoders.image_random_crop` do not currently support checkpointing. \r\n* The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.  \r\nIf the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.\r\n* Experimental VideoReaderDecoder does not support open GOP.  \r\nIt will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.\r\n* The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.  \r\nTo use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)\r\n* In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams. 
\r\nAs a workaround, you can manually synchronize the device before returning the data from the callback.\r\n* Due to some known issues with the Meltdown\u002FSpectre mitigations, DALI shows best performance when running in Docker with escalated privileges, for example:\r\n  * `privileged=yes` in Extra Settings for AWS data points\r\n  * `--privileged` or `--security-opt seccomp=unconfined` for bare Docker.\r\n\r\n\r\nBinary builds\r\n---\r\n\r\n**NOTE:** DALI builds for CUDA 12 dynamically link the CUDA toolkit. To use DALI, install the latest [CUDA toolkit](https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-installation-guide-linux\u002Findex.html).","2024-03-25T18:23:41"]
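The known-issues sections above repeat the same workaround for the GPU external source in experimental debug and eager modes: manually synchronize the device before returning the data from the callback. A minimal, framework-agnostic sketch of that ordering, where `Device`, `launch`, and `synchronize` are hypothetical stand-ins for a real device handle (with PyTorch-produced GPU data, for example, the synchronization step would be `torch.cuda.synchronize()`), not DALI or CUDA APIs:

```python
# Sketch of the debug/eager-mode workaround: drain all asynchronous
# device work BEFORE the external_source callback returns its batch,
# so DALI's internal streams never observe half-written buffers.
# `Device` is a toy stand-in for a framework device handle.

class Device:
    """Toy device that queues asynchronous work."""
    def __init__(self):
        self.pending = []    # work launched but not yet completed
        self.completed = []  # work known to be finished

    def launch(self, work):
        # Asynchronous launch: returns before the work is done.
        self.pending.append(work)

    def synchronize(self):
        # Block until all previously launched work has finished.
        self.completed.extend(self.pending)
        self.pending.clear()

def external_source_callback(device, batch):
    # Produce the batch asynchronously on the device ...
    for sample in batch:
        device.launch(sample)
    # ... then synchronize before handing the data back.
    device.synchronize()
    return batch

dev = Device()
out = external_source_callback(dev, ["sample0", "sample1"])
```

The only point of the sketch is the ordering: every asynchronous operation queued for the batch completes before the callback returns, which is what the workaround asks for.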