[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tool-microsoft--onnxruntime":3,"similar-microsoft--onnxruntime":215},{"id":4,"github_repo":5,"name":6,"description_en":7,"description_zh":8,"ai_summary_zh":9,"readme_en":10,"readme_zh":11,"quickstart_zh":12,"use_case_zh":13,"hero_image_url":14,"owner_login":15,"owner_name":16,"owner_avatar_url":17,"owner_bio":18,"owner_company":19,"owner_location":19,"owner_email":20,"owner_twitter":21,"owner_website":22,"owner_url":23,"languages":24,"stars":64,"forks":65,"last_commit_at":66,"license":67,"difficulty_score":44,"env_os":68,"env_gpu":69,"env_ram":70,"env_deps":71,"category_tags":79,"github_topics":81,"view_count":90,"oss_zip_url":19,"oss_zip_packed_at":19,"status":91,"created_at":92,"updated_at":93,"faqs":94,"releases":115},6933,"microsoft\u002Fonnxruntime","onnxruntime","ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator","ONNX Runtime 是一款由微软开源的高性能机器学习推理与训练加速引擎，旨在打破不同框架与硬件之间的壁垒。它主要解决了模型部署难、推理速度慢以及跨平台兼容性差的痛点，让开发者无需重写代码，即可将来自 PyTorch、TensorFlow\u002FKeras 等深度学习框架，或 scikit-learn、XGBoost 等传统机器学习库训练的模型，高效运行在 Windows、Linux、macOS 等多种操作系统及各类硬件加速器上。\n\n这款工具非常适合 AI 工程师、后端开发者及算法研究人员使用。对于希望将模型落地到生产环境并优化响应速度的团队，ONNX Runtime 能通过先进的图优化技术和算子融合，充分挖掘硬件潜力，显著降低延迟并节约计算成本。其独特的亮点在于“一次转换，处处运行”的跨平台能力，以及在训练场景下，仅需在现有 PyTorch 脚本中添加一行代码，即可利用多节点 NVIDIA GPU 大幅加速 Transformer 等大模型的训练过程。无论是构建实时智能应用，还是进行大规模模型迭代，ONNX Runtime 都能提供稳定且卓越的性能支持，是连接模型研发与实际应用的高效桥","ONNX Runtime 是一款由微软开源的高性能机器学习推理与训练加速引擎，旨在打破不同框架与硬件之间的壁垒。它主要解决了模型部署难、推理速度慢以及跨平台兼容性差的痛点，让开发者无需重写代码，即可将来自 PyTorch、TensorFlow\u002FKeras 等深度学习框架，或 scikit-learn、XGBoost 等传统机器学习库训练的模型，高效运行在 Windows、Linux、macOS 等多种操作系统及各类硬件加速器上。\n\n这款工具非常适合 AI 工程师、后端开发者及算法研究人员使用。对于希望将模型落地到生产环境并优化响应速度的团队，ONNX Runtime 能通过先进的图优化技术和算子融合，充分挖掘硬件潜力，显著降低延迟并节约计算成本。其独特的亮点在于“一次转换，处处运行”的跨平台能力，以及在训练场景下，仅需在现有 PyTorch 脚本中添加一行代码，即可利用多节点 NVIDIA GPU 大幅加速 Transformer 等大模型的训练过程。无论是构建实时智能应用，还是进行大规模模型迭代，ONNX Runtime 都能提供稳定且卓越的性能支持，是连接模型研发与实际应用的高效桥梁。","\u003Cp align=\"center\">\u003Cimg width=\"50%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_onnxruntime_readme_4aef65f22eb9.png\" \u002F>\u003C\u002Fp>\n\n**ONNX Runtime is a cross-platform inference and training machine-learning accelerator**.\n\n**ONNX Runtime inference** can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow\u002FKeras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms. [Learn more &rarr;](https:\u002F\u002Fwww.onnxruntime.ai\u002Fdocs\u002F#onnx-runtime-for-inferencing)\n\n**ONNX Runtime training** can accelerate the model training time on multi-node NVIDIA GPUs for transformer models with a one-line addition for existing PyTorch training scripts. 
[Learn more &rarr;](https:\u002F\u002Fwww.onnxruntime.ai\u002Fdocs\u002F#onnx-runtime-for-training)\n\n## Get Started & Resources\n\n* **General Information**: [onnxruntime.ai](https:\u002F\u002Fonnxruntime.ai)\n\n* **Usage documentation and tutorials**: [onnxruntime.ai\u002Fdocs](https:\u002F\u002Fonnxruntime.ai\u002Fdocs)\n\n* **YouTube video tutorials**: [youtube.com\u002F@ONNXRuntime](https:\u002F\u002Fwww.youtube.com\u002F@ONNXRuntime)\n\n* [**Upcoming Release Roadmap**](https:\u002F\u002Fonnxruntime.ai\u002Froadmap)\n\n* **Companion sample repositories**:\n  - ONNX Runtime Inferencing: [microsoft\u002Fonnxruntime-inference-examples](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-inference-examples)\n  - ONNX Runtime Training: [microsoft\u002Fonnxruntime-training-examples](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-training-examples)\n\n## Releases\n\nThe current release and past releases can be found here: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Freleases.\n\nFor details on the upcoming release, including release dates, announcements, features, and guidance on submitting feature requests, please visit the release roadmap: https:\u002F\u002Fonnxruntime.ai\u002Froadmap.\n\n## Data\u002FTelemetry\n\nWindows distributions of this project may collect usage data and send it to Microsoft to help improve our products and services. See the [privacy statement](docs\u002FPrivacy.md) for more details.\n\n## Contributions and Feedback\n\nWe welcome contributions! Please see the [contribution guidelines](CONTRIBUTING.md).\n\nFor feature requests or bug reports, please file a [GitHub Issue](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002Fonnxruntime\u002Fissues).\n\nFor general discussion or questions, please use [GitHub Discussions](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fdiscussions).\n\n## Code of Conduct\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002F).\nFor more information see the [Code of Conduct FAQ](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002Ffaq\u002F)\nor contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE).\n","\u003Cp align=\"center\">\u003Cimg width=\"50%\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_onnxruntime_readme_4aef65f22eb9.png\" \u002F>\u003C\u002Fp>\n\n**ONNX Runtime 是一个跨平台的机器学习推理与训练加速器**。\n\n**ONNX Runtime 推理** 可以带来更快的用户体验和更低的成本，支持来自 PyTorch、TensorFlow\u002FKeras 等深度学习框架以及 scikit-learn、LightGBM、XGBoost 等传统机器学习库的模型。ONNX Runtime 兼容不同的硬件、驱动程序和操作系统，并在适用的情况下利用硬件加速器，结合图优化和变换，实现最佳性能。[了解更多 &rarr;](https:\u002F\u002Fwww.onnxruntime.ai\u002Fdocs\u002F#onnx-runtime-for-inferencing)\n\n**ONNX Runtime 训练** 通过为现有的 PyTorch 训练脚本添加一行代码，即可加速多节点 NVIDIA GPU 上的 Transformer 模型训练时间。[了解更多 &rarr;](https:\u002F\u002Fwww.onnxruntime.ai\u002Fdocs\u002F#onnx-runtime-for-training)\n\n## 开始使用与资源\n\n* **基本信息**: [onnxruntime.ai](https:\u002F\u002Fonnxruntime.ai)\n\n* **使用文档和教程**: [onnxruntime.ai\u002Fdocs](https:\u002F\u002Fonnxruntime.ai\u002Fdocs)\n\n* **YouTube 视频教程**: [youtube.com\u002F@ONNXRuntime](https:\u002F\u002Fwww.youtube.com\u002F@ONNXRuntime)\n\n* [**即将发布的路线图**](https:\u002F\u002Fonnxruntime.ai\u002Froadmap)\n\n* **配套示例仓库**:\n  - ONNX Runtime 推理: 
[microsoft\u002Fonnxruntime-inference-examples](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-inference-examples)\n  - ONNX Runtime 训练: [microsoft\u002Fonnxruntime-training-examples](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-training-examples)\n\n## 发布版本\n\n当前及历史版本可在以下链接查看：https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Freleases。\n\n有关即将发布的版本详情，包括发布日期、公告、新特性以及提交功能请求的指南，请访问发布路线图：https:\u002F\u002Fonnxruntime.ai\u002Froadmap。\n\n## 数据\u002F遥测\n\n该项目的 Windows 版本可能会收集使用数据并发送至 Microsoft，以帮助改进我们的产品和服务。更多详情请参阅 [隐私声明](docs\u002FPrivacy.md)。\n\n## 贡献与反馈\n\n我们欢迎各类贡献！请参阅 [贡献指南](CONTRIBUTING.md)。\n\n如需提交功能请求或报告问题，请创建 [GitHub 问题](https:\u002F\u002Fgithub.com\u002FMicrosoft\u002Fonnxruntime\u002Fissues)。\n\n如需进行一般性讨论或提问，请使用 [GitHub 讨论区](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fdiscussions)。\n\n## 行为准则\n\n本项目已采纳 [微软开源行为准则](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002F)。\n更多信息请参阅 [行为准则常见问题解答](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002Ffaq\u002F)\n或通过 [opencode@microsoft.com](mailto:opencode@microsoft.com) 联系我们，提出任何其他问题或意见。\n\n## 许可证\n\n本项目采用 [MIT 许可证](LICENSE) 许可。","# ONNX Runtime 快速上手指南\n\nONNX Runtime 是一个跨平台的推理和训练机器学习加速器。它支持来自 PyTorch、TensorFlow\u002FKeras 等深度学习框架以及 scikit-learn、LightGBM、XGBoost 等传统机器学习库的模型，能够通过利用硬件加速器（如 GPU）和图优化技术提供最佳性能。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：Windows、Linux 或 macOS。\n*   **Python 版本**：推荐 Python 3.8 - 3.11（各版本支持的 Python 范围会随发布调整，请以官方发布说明为准）。\n*   **前置依赖**：\n    *   已安装 `pip` 包管理工具。\n    *   （可选）若需使用 GPU 加速，请确保已安装对应的 NVIDIA 驱动程序和 CUDA\u002FcuDNN（针对 NVIDIA GPU 版本）。\n\n## 安装步骤\n\n根据您的硬件环境选择合适的安装包。国内用户如遇下载缓慢，可配置清华源或阿里源加速安装。\n\n### 1. 安装 CPU 版本（通用）\n适用于大多数基础推理场景，无需额外显卡驱动。\n\n```bash\npip install onnxruntime\n# 国内加速安装\npip install onnxruntime -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 2. 安装 GPU 版本（NVIDIA）\n适用于拥有 NVIDIA 显卡且需要高性能推理的场景。请确保系统已正确配置 CUDA 环境。\n\n```bash\npip install onnxruntime-gpu\n# 国内加速安装\npip install onnxruntime-gpu -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n> **注意**：请勿将 `onnxruntime` 与 `onnxruntime-gpu` 安装在同一 Python 环境中，二者可能相互冲突；GPU 包所要求的 CUDA\u002FcuDNN 版本随 ONNX Runtime 版本而变化，具体请参考官方发布说明中的版本兼容性说明。\n\n## 基本使用\n\n以下是最简单的推理示例，展示如何加载一个现有的 ONNX 模型并执行推理。\n\n### 示例代码\n\n假设您已经有了一个名为 `model.onnx` 的模型文件：\n\n```python\nimport onnxruntime as ort\nimport numpy as np\n\n# 1. 创建推理会话 (Inference Session)\n# 自动检测并使用可用的执行提供者 (CPU 或 GPU)\nsession = ort.InferenceSession(\"model.onnx\")\n\n# 2. 获取模型的输入和输出名称\ninput_name = session.get_inputs()[0].name\noutput_name = session.get_outputs()[0].name\n\n# 3. 准备输入数据 (示例为随机生成的浮点数数组)\n# 请根据实际模型的输入形状调整数据\ninput_data = np.random.rand(1, 3, 224, 224).astype(np.float32)\n\n# 4. 执行推理\noutputs = session.run([output_name], {input_name: input_data})\n\n# 5. 
处理结果\nresult = outputs[0]\nprint(f\"推理完成，输出形状：{result.shape}\")\nprint(f\"前 5 个预测值：{result[0][:5]}\")\n```\n\n### 关键说明\n*   **自动加速**：`InferenceSession` 默认会尝试加载可用的执行提供者（Execution Providers）。如果安装了 `onnxruntime-gpu` 且检测到 GPU，它将自动使用 CUDA 进行加速；否则回退到 CPU。\n*   **模型来源**：您可以将 PyTorch (`torch.onnx.export`) 或 TensorFlow (`tf2onnx`) 模型导出为 `.onnx` 格式后在此运行。","某电商团队正在将基于 PyTorch 训练的商品推荐模型部署到生产环境，以支持每秒数千次的实时个性化推荐请求。\n\n### 没有 onnxruntime 时\n- **推理延迟高**：直接加载原始 PyTorch 模型进行推理，未利用底层硬件加速，导致单次请求耗时超过 50ms，难以满足高并发下的低延迟要求。\n- **环境依赖复杂**：生产服务器需安装完整的深度学习框架及特定版本的 CUDA 驱动，镜像体积庞大且容易因版本冲突导致部署失败。\n- **跨平台适配难**：若需将服务扩展至边缘设备或非 NVIDIA GPU 环境，需要重写大量后端代码以适配不同的推理引擎。\n- **资源成本高昂**：由于计算效率低下，团队不得不申请更多高性能 GPU 实例来维持服务稳定性，显著增加了云资源开支。\n\n### 使用 onnxruntime 后\n- **性能显著提升**：onnxruntime 自动应用图优化并调用硬件加速提供者（如 CUDA、TensorRT），将单次推理延迟降低至 10ms 以内，吞吐量提升 4 倍。\n- **部署轻量化**：只需导出标准的 ONNX 模型文件，运行时不再依赖庞大的训练框架，大幅减小了容器镜像体积并简化了依赖管理。\n- **无缝跨平台运行**：同一份 ONNX 模型可直接在 CPU、NVIDIA GPU 甚至 ARM 架构设备上运行，无需修改任何业务代码即可实现多端部署。\n- **成本大幅降低**：高效的推理能力使得同等流量下所需的计算节点数量减少 60%，直接降低了每月的云服务账单。\n\nonnxruntime 通过统一的加速运行时和深度硬件优化，成功打通了从模型训练到高效生产部署的“最后一公里”。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_onnxruntime_4aef65f2.png","microsoft","Microsoft","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmicrosoft_4900709c.png","Open source projects and samples from Microsoft",null,"opensource@microsoft.com","OpenAtMicrosoft","https:\u002F\u002Fopensource.microsoft.com","https:\u002F\u002Fgithub.com\u002Fmicrosoft",[25,29,33,37,41,45,49,53,57,60],{"name":26,"color":27,"percentage":28},"C++","#f34b7d",89.3,{"name":30,"color":31,"percentage":32},"Python","#3572A5",3.5,{"name":34,"color":35,"percentage":36},"C","#555555",2.4,{"name":38,"color":39,"percentage":40},"Cuda","#3A4E3A",1.1,{"name":42,"color":43,"percentage":44},"C#","#178600",1,{"name":46,"color":47,"percentage":48},"Assembly","#6E4C13",0.8,{"name":50,"color":51,"percentage":52},"TypeScript","#3178c6",0.7,{"name":54,"color":55,"percentage":56},"JavaScript","#f1e05a",0.3,{"name":58,"color":59,"percentage":56},"CMake","#DA3434",{"name":61,"color":62,"percentage":63},"Java","#b07219",0.2,19841,3820,"2026-04-12T14:41:26","MIT","Linux, macOS, Windows","推理非必需（支持利用硬件加速器）；训练需要多节点 NVIDIA GPU（具体型号、显存及 CUDA 版本未在文中说明）","未说明",{"notes":72,"python":70,"dependencies":73},"该工具是跨平台的推理和训练加速器。推理支持多种深度学习框架及传统机器学习库，并通过图优化和硬件加速提供最佳性能。训练功能主要针对 Transformer 模型，可在现有 PyTorch 脚本中通过一行代码添加以加速多节点 NVIDIA GPU 上的训练。Windows 发行版可能会收集使用数据。",[74,75,76,77,78],"PyTorch","TensorFlow\u002FKeras","scikit-learn","LightGBM","XGBoost",[80],"开发框架",[82,83,84,85,86,87,88,89,76],"deep-learning","onnx","neural-networks","machine-learning","ai-framework","hardware-acceleration","pytorch","tensorflow",2,"ready","2026-03-27T02:49:30.150509","2026-04-13T04:03:04.421415",[95,100,105,110],{"id":96,"question_zh":97,"answer_zh":98,"source_url":99},31241,"如何在 Apple M1 (ARM64) Mac 上安装 ONNX Runtime？","官方 pip 包可能尚未直接支持或存在兼容性问题。您可以安装社区维护的专用包 `onnxruntime-silicon`。使用命令：`pip install onnxruntime-silicon`。该包专为 Mac M1 (Apple Silicon) 构建，可解决在 ARM64 架构 macOS 上无法找到匹配分发版本的问题。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fissues\u002F6633",{"id":101,"question_zh":102,"answer_zh":103,"source_url":104},31242,"TensorRT Execution Provider 比 CUDA Execution Provider 慢，且部分节点未分配到 TensorRT，如何解决？","这通常是由于特定算子（如 ScatterND）在旧版本 TensorRT 中支持不佳，导致节点回退到 CPU 或 CUDA，引发频繁的内存传输。解决方案是升级 TensorRT 版本。例如，升级到 TRT 10.9 可以修复嵌入式上下文加载和 ScatterND 算子的问题。您通常可以直接运行预编译的包含 TRT 10.9 支持的 ONNX Runtime 
包，无需修改构建依赖文件。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fissues\u002F17434",{"id":106,"question_zh":107,"answer_zh":108,"source_url":109},31243,"在 AWS Lambda (ARM64\u002FGraviton2) 上导入 onnxruntime 时遇到 cpuinfo 解析错误并崩溃，怎么办？","该问题表现为报错 `Error in cpuinfo: failed to parse the list of possible processors`，原因是 AWS Lambda 环境中缺少 `\u002Fsys\u002Fdevices\u002Fsystem\u002Fcpu\u002Fpossible` 和 `\u002Fsys\u002Fdevices\u002Fsystem\u002Fcpu\u002Fpresent` 文件。这是一个已知回归问题（发生在 v1.22.2 到 v1.23.1 之间）。如果遇到此问题，建议检查是否可以使用不受影响的版本，或者关注官方后续针对 aarch64-linux 沙箱环境的修复补丁。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fissues\u002F10038",{"id":111,"question_zh":112,"answer_zh":113,"source_url":114},31244,"在 Jetson Xavier 上从源码构建 ONNX Runtime 时遇到 Eigen 相关的编译错误，如何处理？","在 Jetson 等 ARM64 设备上构建时，可能会遇到 Eigen 库相关的编译错误（例如 `GeneralBlockPanelKernel.h` 中的成员函数更新错误）。这通常与特定的编译器版本或 Eigen 版本兼容性有关。虽然具体修复可能需要调整构建脚本或更新依赖，但遇到此类问题时，应首先确保使用了与 JetPack 版本匹配的 CUDA 和 cuDNN 路径，并检查是否有针对该架构的特定构建标志需要添加。如果问题持续，可能需要等待官方对该架构编译链的更新或尝试不同的 GCC 版本。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fissues\u002F3024",[116,121,126,131,136,140,145,150,155,160,165,170,175,180,185,190,195,200,205,210],{"id":117,"version":118,"summary_zh":119,"released_at":120},230953,"v1.24.4","这是 ONNX Runtime 1.24 的补丁版本，包含错误修复和执行提供程序更新。\n\n## 错误修复\n- **核心**：在容器化环境中（例如 AKS\u002FKubernetes），当 `nvidia-drm` 驱动未加载但 GPU PCI 设备仍通过 sysfs 暴露时，为 Linux 上的 GPU 设备发现添加了 PCI 总线回退机制。（[#27591](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27591)）\n- **插件 EP**：修复了在 `GetOutputIndex` 中遍历输出跨度时出现的空指针解引用问题。（[#27644](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27644)）\n- **插件 EP**：修复了一个 bug，该 bug 会错误地为不同 GraphView 中的融合节点（例如 If 节点的 then\u002Felse 分支）分配重复的 MetaDef ID，从而导致会话创建因内核冲突错误而失败。（[#27666](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27666)）\n\n## 执行提供程序更新\n- **QNN EP**：通过将 rpcmem 库的加载延迟到推理时，启用了使用 memhandle IO 类型的离线 x64 编译。（[#27479](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27479)）\n- **QNN EP**：回滚了 QNN SDK 日志详细程度的更改，这些更改曾在后端销毁时导致段错误。（[#27650](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27650)）\n\n## 构建与基础设施\n- **Python**：将 `python_requires` 从 `>=3.10` 更新为 `>=3.11`，以反映已停止对 Python 3.10 的支持。（[#27354](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27354)）\n- **构建**：用编译器可移植的 `_tpause` 内联函数替换了 `__builtin_ia32_tpause`，以解决 GCC 和 LLVM 之间的跨编译器可移植性问题。（[#27607](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27607)）\n\n**完整变更日志**：[v1.24.3...v1.24.4](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fcompare\u002Frel-1.24.3...rel-1.24.4)\n\n## 贡献者\n[@derdeljan-msft](https:\u002F\u002Fgithub.com\u002Fderdeljan-msft)、[@adrianlizarraga](https:\u002F\u002Fgithub.com\u002Fadrianlizarraga)、[@apwojcik](https:\u002F\u002Fgithub.com\u002Fapwojcik)、[@baijumeswani](https:\u002F\u002Fgithub.com\u002Fbaijumeswani)、[@edgchen1](https:\u002F\u002Fgithub.com\u002Fedgchen1)、[@mocknen](https:\u002F\u002Fgithub.com\u002Fmocknen)、[@tianleiwu](https:\u002F\u002Fgithub.com\u002Ftianleiwu)、[@XXXXRT666](https:\u002F\u002Fgithub.com\u002FXXXXRT666)","2026-03-17T23:08:09",{"id":122,"version":123,"summary_zh":124,"released_at":125},230954,"v1.24.3","这是 ONNX Runtime 1.24 的补丁版本，包含错误修复、安全改进、性能增强以及执行提供程序的更新。\n\n## 安全修复\n- **核心**：修复了 GatherCopyData 
中的整数截断问题，该问题会导致堆越界读写。([#27444](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27444))\n- **核心**：通过未检查的 batch_indices，修复了 RoiAlign 中的堆越界读取问题。([#27543](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27543))\n- **核心**：防止由恶意构造的 LoRA Adapter 引发的堆越界访问。([#27518](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27518))\n- **核心**：修复了 Resize 操作中的越界访问问题。([#27419](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27419))\n\n## 错误修复\n- **核心**：修复了当批维度不匹配时，GatherND 运算中出现的除零错误。([#27090](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27090))\n- **核心**：修复了从字节加载模型时外部数据路径的验证问题。([#27430](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27430))\n- **核心**：修复了 SkipLayerNorm 融合在 gamma\u002Fbeta 不为 1D 时被错误应用的问题。([#27459](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27459))\n- **核心**：修复了 TRT EP 自定义算子域 Release 函数中的双重释放问题。([#27471](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27471))\n- **核心**：修复了 QMoE CPU 算子。([#27360](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27360))\n- **核心**：修复了 MatmulNBits 预打包缩放因子的问题。([#27412](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27412))\n- **Python**：修复了映射输入转换中的引用计数 bug，该 bug 曾导致关闭时发生段错误。([#27413](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27413))\n- **NuGet**：修复了 DllImportResolver。([#27397](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27397))\n- **NuGet**：添加了 `OrtEnv.DisableDllImportResolver`，以防止解析器冲突导致的致命错误。([#27535](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27535))\n\n## 性能改进\n- **核心**：QMoE CPU 性能更新（4-bit 时最高可提升至 4 倍）。([#27364](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27364))\n- **核心**：修复了带有分类特征链的 TreeEnsemble 模型加载时间 O(n²) 的问题。([#27391](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27391))\n\n## 执行提供程序更新\n- **NvTensorRtRtx EP**：\n    - 避免重复创建 fp4\u002Ffp8 原生自定义算子域。([#27192](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27192))\n    - 添加了缺失的 override 修饰符以抑制警告。([#27288](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27288))\n    - 实现了 DQ→MatMulNBits 融合变换器。([#27466](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27466))\n- **WebGPU**：\n    - 当提供 `wasmBinary` 时，在 Blob URL 工作线程中使用嵌入式 WASM 模块。([#27318](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27318))\n    - 修复了 `wasmBinary` 与用于 `.mjs` 文件的 Blob URL 同时使用时的问题。([#27411](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F2741","2026-03-05T19:00:05",{"id":127,"version":128,"summary_zh":129,"released_at":130},230955,"v1.24.2","这是 ONNX Runtime 1.24 的补丁版本，包含多项错误修复、安全改进以及执行提供程序更新。\n\n## 错误修复\n- **NuGet**：修复了 Linux 和 macOS 上 ONNX Runtime NuGet 包中本机库加载的问题。([#27266](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27266))\n- **macOS**：修复了 macOS ARM64 平台上的 Java 支持及 Jar 测试问题。([#27271](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27271))\n- **核心**：为 Hugging Face Hub 缓存的外部数据启用更健壮的符号链接支持。([#27374](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27374))\n- **核心**：为 `SparseTensorProtoToDenseTensorProto` 
添加了边界检查，以提高鲁棒性。([#27323](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27323))\n- **安全**：修复了 `ArrayFeatureExtractor` 中的一个越界读取漏洞。([#27275](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27275))\n\n## 执行提供程序更新\n- **MLAS**：修复了 Lut GEMM（MatMulNBitsLutGemm）中的不稳定性和精度问题。([#27216](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27216))\n- **QNN**：为 HTP 目标 v81 或更高版本启用了 64 位 UDMA 模式。([#26677](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26677))\n- **WebGPU**：\n    - 对预打包分配器使用了 `LazyRelease`。([#27077](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27077))\n    - 修复了 TypeScript 和 C++ 实现中 `ConvTranspose` 偏置验证的问题。([#27213](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27213))\n- **OpenVINO (OVEP)**：通过在共享上下文中复用权重文件来减少常驻内存的补丁。([#27238](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27238))\n- **DNNL**：通过添加缺失文件修复了 DNNL 构建错误。([#27334](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27334))\n\n## 构建与基础设施\n- **CUDA**：\n    - 添加了对 CUDA 12.9 引入的架构系列代码（后缀 'f'）的支持。([#27278](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27278))\n    - 修复了针对不同 CUDA 版本（12.8、13.0、13.1.1）的构建错误和警告。([#27276](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27276))\n    - 应用了针对 Abseil CUDA 警告的补丁。([#27096](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27096)，[#27126](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27126))\n- **流水线**：\n    - 修复了 Windows ARM64 平台的 Python 打包流水线及发布流程。([#27339](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27339)，[#27350](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27350)，[#27299](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27299))\n    - 修复了 DirectML NuGet 流水线，使其能够正确地将 x64 和 ARM64 二进制文件打包到发布版本中。([#27349](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27349))\n    - 更新了 `Microsoft.ML.OnnxRuntime.Foundry` 包，以支持 Windows ARM64 并进行 NuGet 签名。([#27294](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F27294))\n- **测试**：更新了 `BaseTester`，使其能够同时支持带有插件 EP 的","2026-02-19T21:28:03",{"id":132,"version":133,"summary_zh":134,"released_at":135},230956,"v1.24.1","## 📢 通知与重大变更\n\n### 平台支持变更\n- **不再发布 Python 3.10 的 wheel 包** — 请升级至 Python 3.11 或更高版本\n- **新增对 Python 3.14 的支持**\n- **自由线程化 Python（PEP 703）** — 在 Linux 上新增对 Python 3.13t 和 3.14t 的支持（[#26786](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26786)）\n- **不再提供适用于 macOS\u002FiOS 的 x86_64 二进制文件，并将最低 macOS 版本提升至 14.0**\n\n### API 版本\n- **ORT_API_VERSION** 更新至 **24**（[#26418](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26418)）\n\n---\n\n## ✨ 新特性\n\n### 🤖 执行提供者（EP）插件 API\n一项重要的基础设施增强，支持基于插件的 EP 并实现动态加载：\n- 初步支持基于内核的 EP（[#26206](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26206)）\n- 插件 EP 的权重预打包支持（[#26754](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26754)）\n- EP 上下文模型支持（[#25124](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F25124)）\n- 控制流内核 API（[#26927](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26927)）\n- 面向基于内核的插件 EP 的 `OrtKernelInfo` 
API（[#26803](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26803)）\n\n### 🔧 核心 API\n- **`OrtApi::CreateEnvWithOptions()`** 和 **`OrtEpApi::GetEnvConfigEntries()`**（[#26971](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26971)）\n- **EP 设备兼容性 API**（[#26922](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26922)）\n- 用于 D3D12 共享资源的外部资源导入器 API（[#26828](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26828)）\n- 从 `KernelInfo` 访问会话配置（[#26589](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26589)）\n\n### 📊 依赖与集成\n- **ONNX 升级至 1.20.1**（[#26579](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26579)）\n- **Protobuf 更新** 从 3.20.3 → **4.25.8**（[#26910](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26910)）\n- **CUDA 图默认启用**（[#26929](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F26929)）\n\n---\n\n## 🖥️ 执行提供者更新\n\n### NVIDIA\n- **CUDA EP：** Flash Attention 更新、GQA 内核融合、MoE\u002FqMoE\u002FMatMulNBits 的 BF16 支持、CUDA 13.0 支持\n- **TensorRT EP：** 升级至 TensorRT 10.14、自动加载插件、NVFP4 自定义算子\n- **TensorRT RTX EP：** RTX 运行时缓存、CUDA 图支持、BFloat16、内存映射引擎\n\n### Qualcomm QNN EP\n- QNN SDK 升级至 **2.42.0**，新增算子（RMSNorm、ScatterElements、GatherND、STFT、RandomUniformLike）\n- Gelu 模式融合、LPBQ 量化支持、ARM64 wheel 构建、v81 设备支持\n\n### Intel & AMD\n- **OpenVINO EP：** 升级至 2025.4.1\n- **VitisAI EP：** 外部 EP 加载器、编译模型兼容性 API\n- **MIGraphX EP：** QuickGelu、多头注意力、QLinear 池化算子\n\n### ArmNN EP\nArm 正式宣布在 ONNX Runtime 中弃用 Arm NN 执行提供者（EP）。Arm NN EP 目前仍处于实验阶段，依赖","2026-02-06T00:00:13",{"id":137,"version":138,"summary_zh":19,"released_at":139},230957,"v1.23.2","2025-10-25T04:15:21",{"id":141,"version":142,"summary_zh":143,"released_at":144},230958,"v1.23.1","## 变更内容\n- 修复 CPU 上的 Attention GQA 实现 (#25966)  \n- 处理 GetMemInfo 接口的边缘情况 (#26021)  \n- 实现新的 Python API (#25999)  \n- 为插件执行提供 MemcpyFromHost 和 MemcpyToHost 支持 (#26088)  \n- [TRT RTX EP] 修复 GetCapability 中生成正确子图的 bug (#26132)\n- 向 LogEvaluationStart\u002FStop 和 LogSessionCreationStart 添加 session_id_ (#25590)  \n- [构建] 修复 macOS\u002Farm64 平台上的 WebAssembly 构建问题 (#25653)  \n- [CPU] MoE 内核 (#25958)  \n- [CPU] 面向 CPU 的分块 QMoE 内核 (#26009)  \n- [C#] 实现缺失的 API (#26101)  \n- 使用 ONNX IR \u003C 12 重新生成测试模型 (#26149)  \n- [CPU] 修复因未使用变量导致的编译错误 (#26147)  \n- [EP ABI] 检查 GetCapability() 中指定的节点是否已被分配 (#26156)  \n- [QNN EP] 添加用于设置 HTP 性能模式的动态选项 (#26135)\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fcompare\u002Fv1.23.0...v1.23.1","2025-10-08T04:12:55",{"id":146,"version":147,"summary_zh":148,"released_at":149},230959,"v1.23.0","# 公告\n\n- 本次发布引入了执行提供者（EP）插件 API，这是一项用于构建基于插件的 EP 的新基础设施。（#24887、#25137、#25124、#25147、#25127、#25159、#25191、#2524）\n\n- 本次发布新增了动态下载和安装执行提供者的能力。此功能仅在 WinML 构建中可用，并且需要 Windows 11 版本 25H2 或更高版本。为了利用这一新功能，C\u002FC++\u002FC# 用户应使用通过 Windows 应用 SDK 分发的构建版本，而 Python 用户则应安装 onnxruntime-winml 包（即将发布）。我们鼓励能够升级到最新 Windows 11 的用户使用 WinML 构建，以充分利用此增强功能。\n\n## 即将发生的变更\n\n- 下一个版本将停止为 macOS 和 iOS 操作系统提供 x86_64 二进制文件。\n- 下一个版本将把支持的最低 macOS 版本从 13.4 提升至 14.0。\n- 下一个版本将停止提供 Python 3.10 的 wheel 文件。\n\n# 执行与核心优化\n\n## 简化了 Windows 上的关闭逻辑\n\n现在，在 Windows 上，如果我们检测到进程正在关闭，则某些全局对象将不会被销毁（#24891）。由于进程结束时所有内存都会归还给操作系统，因此这不会导致内存泄漏。这一更改可以降低进程退出时发生崩溃的可能性。\n\n## AutoEP\u002F设备管理\n\nONNX Runtime 现在具备自动发现计算设备并选择最佳 EP 进行下载和注册的能力。目前，EP 下载功能仅适用于 Windows 11 版本 25H2 或更高版本。\n\n## 执行提供者（EP）更新\n\nROCM EP 已从源代码树中移除。建议用户改用来自 AMD 的 Migraphx 或 Vitis AI EP。\n此外，新增了一个名为 Nvidia 
TensorRT RTX 的 EP。\n\n### Web\nEMSDK 已从 4.0.4 升级至 4.0.8。\n\n### WebGPU EP\n增加了对 WGSL 模板的支持。\n\n### QNN EP\nSDK 更新：新增对 QNN SDK 2.37 的支持。\n\n### KleidiAI\n提升了 SGEMM、IGEMM 和动态量化 MatMul 操作的性能，尤其是在支持 SME2（可扩展矩阵扩展 v2）硬件上的 Conv2D 算子。\n\n# 已知问题\n\n- build.py 中与 KleidiAI 相关的一项更改可能导致交叉编译失败（#26175）。\n\n# 贡献\n\nONNX Runtime 的贡献者包括微软各团队的成员以及社区成员：\n\n@1duo、@Akupadhye、@amarin16、@AndreyOrb、@ankan-ban、@ankitm3k、@anujj、@aparmp-quic、@arnej27959、@bachelor-dou、@benjamin-hodgson、@Bonoy0328、@chenweng-quic、@chuteng-quic、@clementperon、@co63oc、@daijh、@damdoo01-arm、@danyue333、@fanchenkong1、@gedoensmax、@genarks、@gnedanur、@Honry、@huaychou、@ianfhunter、@ishwar-raut1、@jing-bao、@joeyearsley、@johnpaultaken、@jordanozang、@JulienMaille、@keshavv27、@kevinch-nv、@khoover、@krahenbuhl、@kuanyul-quic、@mauriciocm9、@mc-nv、@minfhong-quic、@mingyueliuh、@MQ-mengqing、@NingW101、@","2025-09-26T04:33:48",{"id":151,"version":152,"summary_zh":153,"released_at":154},230960,"v1.22.2","# 有哪些新内容？\n\n此版本添加了 DequantizeLinear（8 位）的优化 CPU\u002FMLAS 实现，并引入了构建选项 client_package_build，该选项启用了更适合客户端\u002F设备端工作负载的默认配置（例如，默认禁用线程自旋）。\n\n## 构建系统与包\n\n- 添加 --client_package_build 选项 ([#25351](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F25351)) - @jywu-msft\n- 从 win-qnn-arm64-ci-pipeline.yml 中移除 Python 安装步骤 ([#25552](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F25552)) - @snnn\n\n## CPU EP\n\n- 为 int8 和 uint8 输入添加 DequantizeLinear 的多线程\u002F向量化实现（SSE2、NEON）([#24818](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F24818)) - @adrianlizarraga\n\n## QNN EP\n\n- 添加对 Upsample、Einsum、LSTM 和 CumSum 算子的支持 ([#24265](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F24265)、[#24616](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F24616)、[#24646](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F24646)、[#24820](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F24820)) - @quic-zhaoxul、@1duo、@chenweng-quic、@Akupadhye\n- 将缩放因子融合到 Softmax 中 ([#24809](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F24809)) - @qti-yuduo\n- 当性能设置为“burst”模式时，启用 DSP 队列轮询 ([#25361](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F25361)) - @quic-calvnguy\n- 将 QNN SDK 更新至 2.36.1 版本 ([#25388](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F25388)) - @qti-jkilpatrick\n- 在 Microsoft.ML.OnnxRuntime.QNN NuGet 包中包含 QNN SDK 的许可证文件 ([#25158](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F25158)) - @HectorSVC","2025-08-13T16:53:48",{"id":156,"version":157,"summary_zh":158,"released_at":159},230961,"v1.22.1","# 有什么新内容？\n\n本次发布将 dxcore.lib 的静态链接替换为可选的运行时加载方式，从而将最低支持版本从 Windows 10 22H2（10.0.22621）降低至 20H1（10.0.19041）。这使得本项目能够兼容 Windows Server 2019（10.0.17763），在该版本中 dxcore.dll 可能不存在。\n\n- 将依赖项由 GitLab 上的 Eigen 更改为 GitHub 上的 Eigen 镜像 #24884 - @prathikr\n- 削弱对 dxcore 的依赖 #24845 - @skottmckay\n- [DML] 恢复与 Windows SDK 10.0.17134.0 的兼容性 #24950 - @JulienMaille\n- 禁用 VCPKG 的二进制缓存 #24889 - @snnn\n\n","2025-07-08T22:08:08",{"id":161,"version":162,"summary_zh":163,"released_at":164},230962,"v1.22.0","## 公告\n\n* 本次发布引入了用于模型编辑器、自动执行提供程序基础架构以及提前编译的新 API。\n* OnnxRuntime GPU 包要求使用 CUDA 12.x，针对 CUDA 11.x 构建的包已不再发布。\n* 支持的最低 Windows 版本现已提升至 10.0.19041。\n\n## 生成式 AI 及高级模型特性\n\n* **约束解码：** 引入了新的约束解码功能，可更精细地控制生成式 AI 模型的输出。\n\n## 执行与核心优化\n\n### 核心\n* **自动 EP 选择基础架构：** 添加了基础架构，支持通过选择策略自动选择执行提供程序，旨在简化配置并优化性能。（拉取请求 #24430）\n* **编译 API：** 
引入了新 API，以支持显式编译 ONNX 模型。\n    * 参见：[OrtCompileApi 结构参考](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fapi\u002Fc\u002Fstruct_ort_compile_api.html)（假设未来文档将采用类似链接结构）\n    * 参见：[EP 上下文设计](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fapi\u002Fc\u002Fstruct_ort_ep_context.html)（假设未来文档将采用类似链接结构）\n* **模型编辑器 API：** 用于创建或编辑 ONNX 模型的 API。\n    * 参见：[OrtModelEditorApi](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fapi\u002Fc\u002Fstruct_ort_model_editor_api.html#details)\n\n### 执行提供程序 (EP) 更新\n\n#### CPU EP\u002FMLAS\n* **KleidiAI 集成：** 将 KleidiAI 集成到 ONNX Runtime\u002FMLAS 中，以提升 Arm 架构上的性能。\n* **MatMulNBits 支持：** 增加了对 `MatMulNBits` 的支持，允许使用量化为 8 位权重的矩阵乘法。\n* **GroupQueryAttention 优化与增强**\n\n#### OpenVINO EP\n* 支持最高版本 OpenVINO 2025.1。\n* 为 QDQ 模型引入了 Intel 编译器级别的优化。\n* 增加了基于 LUID 选择 Intel 设备的功能。\n* 改进了 load_config 功能，以支持 AUTO、HETERO 和 MULTI 插件。\n* 其他错误修复和优化。\n* 有关详细更新，请参阅拉取请求 #24394：[ONNXRuntime OpenVINO - Release 1.22](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F24394)。\n\n#### QNN EP\n* **SDK 更新：** 增加了对 QNN SDK 2.33.2 的支持。\n* 对 Sum、Softmax、Upsample、Expand、ScatterND、Einsum 等算子进行了更新和支持。\n* QNN EP 可以构建为共享库或静态库。\n* 启用了 QnnGpu 后端。\n* 有关详细更新，请参阅 [最近标记为 QNN 的已合并拉取请求](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpulls?q=is%3Apr+qnn+ep+is%3Aclosed+label%3Aep%3AQNN)。\n\n#### TensorRT EP\n* **TensorRT 版本：** 增加了对 TensorRT 10.9 的支持。\n    * **注意：** 对于使用 onnx-tensorrt 开源解析器的用户，请查看[此处](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fbuild\u002Feps.html#note-to-ort-1210-open-sourced-parser-users)，了解具体要求（此处引用 1.21 的链接作为占位符，应更新为 1.22 的链接）。\n* **新特性：**\n    * EP 选项可启用 TRT 预览功能。\n    * 支持加载 TensorRT V3 插件。\n* **错误修复：**\n    * 解决了一个问题 rel","2025-05-10T01:14:00",{"id":166,"version":167,"summary_zh":168,"released_at":169},230963,"v1.21.1","# 有哪些新内容？\n\n- 使用所有 Blackwell 计算能力扩展 CMAKE_CUDA_FLAGS #23928 - @yf711\n- [ARM CPU] 修复在不支持 FP16 的平台上 FP16 常量初始化问题 #23978 - @fajin-corp\n- [TensorRT EP] 在计算函数中调用 cudaSetDevice，以处理多线程场景 #24010 - @chilo-ms\n- 修复注意力偏置广播问题 #24017 - @tianleiwu\n- 删除常量 SKIP_CUDA_TEST_WITH_DML #24113 - @CodingSeaotter\n- [QNN EP] ARM64EC Python 包构建中移除 --vcpkg 选项 #24174 - @jywu-msft\n- [WASM] 移除 WASM 构建中的 --vcpkg 选项 #24179 - @fs-eire\n\n","2025-04-21T17:38:36",{"id":171,"version":172,"summary_zh":173,"released_at":174},230964,"v1.21.0","## Announcements\r\n- No large announcements of note this release! 
We've made a lot of small refinements to streamline your ONNX Runtime experience.\r\n\r\n## GenAI & Advanced Model Features\r\n\r\n### Enhanced Decoding & Pipeline Support\r\n- Added \"chat mode\" support for CPU, GPU, and WebGPU.\r\n- Provided support for decoder model pipelines.\r\n- Added support for Java API for MultiLoRA.\r\n\r\n### API & Compatibility Updates\r\n- Chat mode introduced breaking changes in the API (see [migration guide](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fgenai\u002Fhowto\u002Fmigrate.html)).\r\n\r\n### Bug Fixes for Model Output\r\n- Fixed Phi series garbage output issues with long prompts.\r\n- Resolved gibberish issues with `top_k` on CPU.\r\n\r\n## Execution & Core Optimizations\r\n\r\n### Core Refinements\r\n- Reduced default logger usage for improved efficiency (#23030).\r\n- Fixed a visibility issue in threadpool (#23098).\r\n\r\n### Execution Provider (EP) Updates\r\n#### General\r\n- Removed TVM EP from the source tree (#22827).\r\n- Marked NNAPI EP for deprecation (following Google's deprecation of NNAPI).\r\n- Fixed a DLL delay loading issue that impacts WebGPU EP and DirectML EP's usability on Windows (#23111, #23227).\r\n\r\n#### TensorRT EP Improvements\r\n- Added support for TensorRT 10.8.\r\n  - [onnx-tensorrt](https:\u002F\u002Fgithub.com\u002Fonnx\u002Fonnx-tensorrt) open-source parser users: please check [here](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fbuild\u002Feps.html#note-to-ort-1210-open-sourced-parser-users) for requirements.\r\n- Assigned DDS ops (`NMS`, `RoiAlign`, `NonZero`) to TensorRT by default.\r\n- Introduced option `trt_op_types_to_exclude` to exclude specific ops from TensorRT assignment (see the usage sketch after the Mobile section below).\r\n\r\n#### CUDA EP Improvements\r\n- Added a Python API [preload_dlls](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fexecution-providers\u002FCUDA-ExecutionProvider.html#preload-dlls) to [coexist with PyTorch](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fexecution-providers\u002FCUDA-ExecutionProvider.html#compatibility-with-pytorch).\r\n- Miscellaneous enhancements for Flux model inference.\r\n\r\n#### QNN EP Improvements\r\n- Introduced QNN shared memory support.\r\n- Improved performance for AI Hub models.\r\n- Added support for QAIRT\u002FQNN SDK 2.31.\r\n- Added Python 3.13 package.\r\n- Miscellaneous bug fixes and enhancements.\r\n- QNN EP is now built as a shared library\u002FDLL by default. To retain previous build behavior, use build option `--use_qnn static_lib`.\r\n\r\n#### DirectML EP Support & Upgrades\r\n- Updated DirectML version from 1.15.2 to 1.15.4 (#22635).\r\n\r\n#### OpenVINO EP Improvements\r\n- Introduced OpenVINO EP Weights Sharing feature.\r\n- Added support for various contrib Ops in OVEP:\r\n  - `SkipLayerNormalization`, `MatMulNBits`, `FusedGemm`, `FusedConv`, `EmbedLayerNormalization`, `BiasGelu`, `Attention`, `DynamicQuantizeMatMul`, `FusedMatMul`, `QuickGelu`, `SkipSimplifiedLayerNormalization`\r\n- Miscellaneous bug fixes and improvements.\r\n\r\n#### VitisAI EP Improvements\r\n- Miscellaneous bug fixes and improvements.\r\n\r\n## Mobile Platform Enhancements\r\n\r\n### CoreML Updates\r\n- Added support for caching generated CoreML models.  
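\r\n\r\nAs a usage sketch for the new `trt_op_types_to_exclude` option above (the excluded op list and model path are illustrative; exact provider-option names can vary between releases):\r\n\r\n```python\r\nimport onnxruntime as ort\r\n\r\n# Prefer TensorRT but keep two DDS ops off it, falling back to CUDA and then CPU.\r\nproviders = [\r\n    (\"TensorrtExecutionProvider\", {\"trt_op_types_to_exclude\": \"NonMaxSuppression,NonZero\"}),\r\n    \"CUDAExecutionProvider\",\r\n    \"CPUExecutionProvider\",\r\n]\r\nsession = ort.InferenceSession(\"model.onnx\", providers=providers)  # hypothetical model file\r\n```\r\n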
\r\n\r\n## Extensions & Tokenizer Improvements\r\n\r\n### Expanded Tokenizer Support\r\n- Now supports more tokenizer models, including `ChatGLM`, `Baichuan2`, `Phi-4`, etc.\r\n- Added full `Phi-4` pre\u002Fpost-processing support for text, vision, and audio.\r\n- Introduced RegEx pattern loading from `tokenizer.json`.\r\n\r\n### Image Codec Enhancements\r\n- `ImageCodec` now links to native APIs if available; otherwise, falls back to built-in libraries.\r\n\r\n### Unified Tokenizer API\r\n- Introduced a new tokenizer op schema to unify the tokenizer codebase.\r\n- Added support for loading tokenizer data from a memory blob in the C API.\r\n\r\n## Infrastructure & Build Improvements\r\n\r\n### Runtime Requirements\r\n\r\nAll the prebuilt Windows packages now require VC++ Runtime version >= 14.40(instead of 14.38).  If your VC++ runtime version is lower than that, you may see a crash when ONNX Runtime was initializing. See https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FSTL\u002Fwiki\u002FChangelog#vs-2022-1710 for more details. \r\n\r\nUpdated minimum iOS and Android SDK requirements to align with React Native 0.76:\r\n   - iOS  >=  [15.1](https:\u002F\u002Fsupport.apple.com\u002Fen-gb\u002F108051#151)\r\n   - Android API >= [24](https:\u002F\u002Fdeveloper.android.com\u002Ftools\u002Freleases\u002Fplatforms#7.0) (Android 7)\r\n\r\nAll macOS packages now require macOS version >= 13.3.\r\n\r\n### CMake File Changes\r\n\r\nCMake Version: Increased the minimum required CMake version from 3.26 to 3.28. Added support for CMake 4.0.\r\nPython Version: Increased the minimum required Python version from 3.8 to 3.10 for building ONNX Runtime from source.\r\nImproved VCPKG support\r\n\r\nAdded the following cmake options for WebGPU EP\r\n\r\n- onnxruntime_USE_EXTERNAL_DAWN\r\n- onnxruntime_CUSTOM_DAWN_SRC_PATH\r\n- onnxruntime_BUILD_DAWN_MONOLITHIC_LIBRARY\r\n- onnxruntime_ENABLE_PIX_FOR_WEBGPU_EP\r\n- onnxruntime_ENABLE_DAWN_BACKEND_VULKAN\r\n- onnxruntime_ENABLE_DAWN_BACKEND_D3D12\r\n\r\nAdded cmake option onnxruntime_BUILD_QNN_EP_STATIC_LIB for building with QNN EP as a static library.\r\nRemoved cmake option onnxruntime_USE_PREINSTALLED_EIGEN","2025-03-08T05:33:03",{"id":176,"version":177,"summary_zh":178,"released_at":179},230965,"v1.20.2","# What's new?\r\n## Build System & Packages\r\n- Merge Windows machine pools for Web CI pipeline to reduce maintenance costs (#23243) - @snnn \r\n- Update boost URL for React Native CI pipeline (#23281) - @jchen351 \r\n- Move ORT Training pipeline to GitHub actions and enable CodeQL scan for the source code (#22543) - @snnn \r\n- Move Linux GitHub actions to a dedicated machine pool (#22566) - @snnn \r\n- Update Apple deployment target to iOS 15.1 and macOS 13.3 (#23308) - @snnn \r\n- Deprecate macOS 12 in packaging pipeline (#23017) - @mszhanyi \r\n- Remove net8.0-android MAUI target from MAUI test project (#23607) - @carzh \r\n\r\n## CUDA EP\r\n- Fixes use of numeric_limits that causes a compiler error in Visual Studio 2022 v17.12 Preview 5 (#22738, #22868) - @tianleiwu\r\n\r\n## QNN EP\r\n- Enable offloading graph input quantization and graph output dequantization to CPU by default. Improves inference latency by reducing the amount of I\u002FO data copied between CPU and NPU. 
(#23368) - @adrianlizarraga\r\n","2025-02-12T22:57:43",{"id":181,"version":182,"summary_zh":183,"released_at":184},230966,"v1.20.1","# What's new?\r\n\r\n## Python Quantization Tool\r\n- Prevent int32 quantized bias from clipping by adjusting the weight's scale (#22020) - @adrianlizarraga \r\n- Update QDQ Pad, Slice, Softmax (#22676) - @adrianlizarraga \r\n- Introduce get_qdq_config() helper to get QDQ configurations (#22677) - @adrianlizarraga \r\n- Add reduce_range option to get_qdq_config() (#22782) - @adrianlizarraga \r\n- Flaky test due to Pad reflect bug (#22798) - @adrianlizarraga \r\n\r\n## CPU EP\r\n- Refactor SkipLayerNorm implementation to address issues (#22719, #22862) - @amarin16, @liqunfu\r\n\r\n## QNN EP\r\n- Add QNN SDK v2.28.2 support (#22724, #22844) - @HectorSVC, @adrianlizarraga\r\n\r\n## TensorRT EP\r\n- Exclude DDS ops from running on TRT (#22875) - @chilo-ms\r\n\r\n## Packaging\r\n- Rework the native library usage so that a pre-built ORT native package can be easily used (#22345) - @skottmckay \r\n- Fix Maven Sha256 Checksum Issue (#22600) - @idiskyle \r\n\r\n## Contributions\r\nBig thank you to the release manager @yf711, along with @adrianlizarraga, @HectorSVC, @jywu-msft, and everyone else who helped to make this patch release process a smooth one!","2024-11-21T22:20:31",{"id":186,"version":187,"summary_zh":188,"released_at":189},230967,"v1.20.0","**Release Manager: @apsonawane** \r\n\r\n# Announcements\r\n- **All ONNX Runtime Training packages have been deprecated.** ORT 1.19.2 was the last release for which onnxruntime-training (PyPI), onnxruntime-training-cpu (PyPI), Microsoft.ML.OnnxRuntime.Training (Nuget), onnxruntime-training-c (CocoaPods), onnxruntime-training-objc (CocoaPods), and onnxruntime-training-android (Maven Central) were published.\r\n- **ONNX Runtime packages will stop supporting Python 3.8 and Python 3.9.** This decision aligns with NumPy Python version support. To continue using ORT with Python 3.8 and Python 3.9, you can use ORT 1.19.2 and earlier.\r\n- **ONNX Runtime 1.20 CUDA packages will include new dependencies that were not required in 1.19 packages.** The following dependencies are new: libcudnn_adv.so.9, libcudnn_cnn.so.9, libcudnn_engines_precompiled.so.9, libcudnn_engines_runtime_compiled.so.9, libcudnn_graph.so.9, libcudnn_heuristic.so.9, libcudnn_ops.so.9, libnvrtc.so.12, and libz.so.1.\r\n\r\n# Build System & Packages\r\n- Python 3.13 support is included in PyPI packages.\r\n- ONNX 1.17 support will be delayed until a future release, but the ONNX version used by ONNX Runtime has been patched to include a [shape inference change to the Einsum op](https:\u002F\u002Fgithub.com\u002Fonnx\u002Fonnx\u002Fpull\u002F6010).\r\n- DLLs in the Maven build are now digitally signed (fix for issue reported [here](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fissues\u002F19204)).\r\n- (Experimental) vcpkg support added for the CPU EP. 
The DML EP does not yet support vcpkg, and other EPs have not been tested.\r\n\r\n# Core\r\n- MultiLoRA support.\r\n- Reduced memory utilization.\r\n  - Fixed alignment that was causing mmap to fail for external weights.\r\n  - Eliminated double allocations when deserializing external weights.\r\n  - Added ability to serialize pre-packed weights so that they don’t cause an increase in memory utilization when the model is loaded.\r\n- Support bfloat16 and float8 data types in python I\u002FO binding API.\r\n\r\n# Performance\r\n- INT4 quantized embedding support on CPU and CUDA EPs.\r\n- Miscellaneous performance improvements and bug fixes.\r\n\r\n# EPs\r\n\r\n## CPU\r\n- FP16 support for MatMulNbits, Clip, and LayerNormalization ops.\r\n\r\n## CUDA\r\n- Cudnn frontend integration for convolution operators.\r\n- Added support of cuDNN Flash Attention and Lean Attention in MultiHeadAttention op.\r\n\r\n## TensorRT\r\n- TensorRT [10.4](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT\u002Freleases\u002Ftag\u002Fv10.4.0) and [10.5](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT\u002Freleases\u002Ftag\u002Fv10.5.0) support.\r\n\r\n## QNN\r\n- QNN HTP support for weight sharing across multiple ORT inference sessions. (See [ORT QNN EP documentation](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fexecution-providers\u002FQNN-ExecutionProvider.html#qnn-ep-weight-sharing) for more information.) \r\n- Support for QNN SDK 2.27.\r\n\r\n## OpenVINO\r\n- Added support up to OpenVINO 2024.4.1.\r\n- Compile-time memory optimizations.\r\n- Enhancement of ORT EPContext Session option for optimized first inference latency.\r\n- Added remote tensors to ensure direct memory access for inferencing on NPU.\r\n\r\n## DirectML\r\n- [DirectML 1.15.2](https:\u002F\u002Fwww.nuget.org\u002Fpackages\u002FMicrosoft.AI.DirectML\u002F1.15.2) support.\r\n\r\n# Mobile\r\n- Improved Android QNN support, including a pre-built Maven package and various performance improvements. \r\n- FP16 support for ML Program models with CoreML EP. \r\n- FP16 XNNPACK kernels to provide a fallback option if CoreML is not available at runtime. \r\n- Initial support for using the native WebGPU EP on Android and iOS. _Note: The set of initial operators is limited, and the code is available from the main branch, not ORT 1.20 packages. 
See [#22591](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F22591) for more information.\r\n\r\n# Web\r\n- Quantized embedding support.\r\n- On-demand weight loading support (offloads Wasm32 heap and enables 8B-parameter LLMs).\r\n- Integrated Intel GPU performance improvements.\r\n- [Opset-21](https:\u002F\u002Fgithub.com\u002Fonnx\u002Fonnx\u002Freleases\u002Ftag\u002Fv1.16.0) support (Reshape, Shape, Gelu).\r\n\r\n# GenAI\r\n- MultiLoRA support.\r\n- Generations can now be terminated mid-loop.\r\n- Logit soft capping support in Group Query Attention (GQA).\r\n- Additional model support, including Phi-3.5 Vision Multi-Frame, ChatGLM3, and Nemotron-Mini.\r\n- Python package now available for Mac.\r\n- Mac \u002F iOS now available in NuGet packages.\r\n\r\n_Full release notes for ONNX Runtime generate() API v0.5.0 can be found [here](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Freleases)._\r\n\r\n# Extensions\r\n- Tokenization performance improvements.\r\n- Support for latest Hugging Face tokenization JSON format (transformers>=4.45).\r\n- Unigram tokenization model support.\r\n- OpenCV dependency removed from C API build.\r\n\r\n_Full release notes for ONNX Runtime Extensions v0.13 can be found [here](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-extensions\u002Freleases)._\r\n\r\n# Olive\r\n- Olive command line interface (CLI) now available with support to execute well-defined, concrete workflows without the need to create or edit configs manually.\r\n- Additional improvements, incl","2024-11-01T18:02:18",{"id":191,"version":192,"summary_zh":193,"released_at":194},230968,"v1.19.2","## Announcements\r\n* ORT 1.19.2 is a small patch release, fixing some broken workflows and introducing bug fixes.\r\n\r\n## Build System & Packages\r\n* Fixed the signing of native DLLs.\r\n* Disabled absl symbolize in Windows Release build to avoid dependency on dbghelp.dll.\r\n\r\n## Training\r\n* Restored support for CUDA compute capability 7.0 and 7.5 with CUDA 12, and 6.0 and 6.1 with CUDA 11.\r\n* Several fixes for training CI pipelines.\r\n\r\n## Mobile\r\n* Fixed ArgMaxOpBuilder::AddToModelBuilderImpl() nullptr Node access for CoreML EP.\r\n\r\n## Generative AI\r\n* Added CUDA kernel for Phi3 MoE.\r\n* Added smooth softmax support in CUDA and CPU kernels for the GroupQueryAttention operator.\r\n* Fixed number of splits calculations in GroupQueryAttention CUDA operator.\r\n* Enabled causal support in the MultiHeadAttention CUDA operator.\r\n\r\n## Contributors\r\n@prathikr, @mszhanyi, @edgchen1, @tianleiwu, @wangyems, @aciddelgado, @mindest, @snnn, @baijumeswani, @MaanavD \r\n\r\n**Thanks to everyone who helped ship this release smoothly!**\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fcompare\u002Fv1.19.0...v1.19.2","2024-09-04T19:33:14",{"id":196,"version":197,"summary_zh":198,"released_at":199},230969,"v1.19.0","## Announcements\r\n* Note that the wrong commit was initially tagged with v1.19.0. The final commit has since been correctly tagged: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fcommit\u002F26250ae74d2c9a3c6860625ba4a147ddfb936907. 
This shouldn't affect much, but sorry for the inconvenience!\r\n\r\n## Build System & Packages\r\n* NumPy 2.x support has been added \r\n* Qualcomm SDK has been upgraded to 2.25 \r\n* ONNX has been upgraded from 1.16 → 1.16.1 \r\n* Default GPU packages use CUDA 12.x and cuDNN 9.x (previously CUDA 11.x\u002FcuDNN 8.x). CUDA 11.x\u002FcuDNN 8.x packages have been moved to the aiinfra VS feed. \r\n* TensorRT 10.2 support added \r\n* Introduced Java CUDA 12 packages on Maven. \r\n* Discontinued support for Xamarin. (Xamarin reached EOL on May 1, 2024) \r\n* Discontinued support for macOS 11 and increased the minimum supported macOS version to 12. (macOS 11 reached EOL in September 2023) \r\n* Discontinued support for iOS 12 and increased the minimum supported iOS version to 13. \r\n\r\n## Core\r\n* Implemented DeformConv\r\n* [Fixed big-endian support](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fpull\u002F21133) and enabled building on AIX\r\n\r\n## Performance\r\n* Added QDQ support for INT4 quantization in CPU and CUDA Execution Providers \r\n* Implemented FlashAttention on CPU to improve performance for GenAI prompt cases \r\n* Improved INT4 performance on CPU (X64, ARM64) and NVIDIA GPUs \r\n\r\n## Execution Providers\r\n* TensorRT\r\n  * Updated to support TensorRT 10.2 \r\n  * Removed calls to deprecated APIs \r\n  * Enabled refittable embedded engines when an ONNX model is provided as a byte stream \r\n\r\n* CUDA\r\n  * Upgraded CUTLASS to 3.5.0 to improve performance of memory-efficient attention.\r\n  * Updated MultiHeadAttention and Attention operators to be thread-safe.\r\n  * Added sdpa_kernel provider option to choose kernel for Scaled Dot-Product Attention.\r\n  * Expanded op support - Tile (bf16)\r\n\r\n* CPU\r\n  * Expanded op support - GroupQueryAttention, SparseAttention (for Phi-3 small)\r\n  \r\n* QNN\r\n  * Updated to support QNN SDK 2.25 \r\n  * Expanded op support - HardSigmoid, ConvTranspose 3d, Clip (int32 data), Matmul (int4 weights), Conv (int4 weights), prelu (fp16) \r\n  * Expanded fusion support - Conv + Clip\u002FRelu fusion \r\n\r\n* OpenVINO\r\n  * Added support for OpenVINO 2024.3 \r\n  * Support for enabling EpContext using session options \r\n\r\n* DirectML\r\n  * Updated DirectML from 1.14.1 → 1.15.1 \r\n  * Updated ONNX opset from 17 → 20 \r\n  * Opset 19 and Opset 20 are supported with known caveats: \r\n    * Gridsample 20: 5d not supported \r\n    * DeformConv not supported \r\n\r\n## Mobile\r\n* Additional CoreML ML Program operators were added \r\n  * See supported operators list [here](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime\u002Fblob\u002Fmain\u002Ftools\u002Fci_build\u002Fgithub\u002Fapple\u002Fcoreml_supported_mlprogram_ops.md) \r\n* Fixed packaging issue with macOS framework in onnxruntime-c cocoapod \r\n* Removed Xamarin support \r\n  * Xamarin EOL was May 1, 2024 \r\n  * [Xamarin official support policy | .NET (microsoft.com)](https:\u002F\u002Fdotnet.microsoft.com\u002Fen-us\u002Fplatform\u002Fsupport\u002Fpolicy\u002Fxamarin)\r\n\r\n## Web\r\n* Updated JavaScript packaging to align with best practices, including slight incompatibilities when apps bundle onnxruntime-web \r\n* Improved CPU operator coverage for WebNN (now supported by Chrome) \r\n\r\n## Training\r\n* No specific updates \r\n\r\n## GenAI\r\n* Support for new models Qwen, Llama 3.1, Gemma 2, phi3 small \r\n* Support for building quantized models with the AWQ and GPTQ methods \r\n* Performance improvements for Intel and Arm CPU \r\n* Packaging and language 
binding \r\n  * Added Java bindings (build from source) \r\n  * Separate OnnxRuntime.dll and directml.dll out of GenAI package to improve usability \r\n  * Publish packages for Win Arm \r\n  * Support for Android (build from source) \r\n* Bug fixes, like the [long prompt correctness issue for](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fissues\u002F552) phi3.\r\n\r\n## Extensions\r\n* Added C APIs for language, vision and audio processors including new FeatureExtractor for Whisper \r\n* Support for Phi-3 Small Tokenizer and new OpenAI tiktoken format for fast loading of BPE tokenizers \r\n* Added new CUDA custom operators such as MulSigmoid, Transpose2DCast, ReplaceZero, AddSharedInput and MulSharedInput \r\n* Enhanced Custom Op Lite API on GPU and fused kernels for DORT \r\n* Bug fixes, including null bos_token for Qwen2 tokenizer and SentencePiece converted FastTokenizer issue on non-ASCII characters, as well as necessary updates for MSVC 19.40 and numpy 2.0 release \r\n\r\n## Contributors\r\nChangming Sun, Baiju Meswani, Scott McKay, Edward Chen, Jian Chen, Wanming Lin, Tianlei Wu, Adrian Lizarraga, Chester Liu, Yi Zhang, Yulong Wang, Hector Li, kunal-vaishnavi, pengwa, aciddelgado, Yifan Li, Xu Xing, Yufeng Li, Patrice Vignola, Yueqing Zhang, Jing Fang, Chi Lo, Dmitri Smirnov, mingyueliuh, cloudhan, Yi-Hong Lyu, Ye Wang, Ted Themistokleous, Guenther Schmuelling, George Wu, mindest, liqun Fu, Preetha Veeramalai, Justin Chu, Xiang Zhang, zz002, vraspar, kailums, guyang3532, Satya Kumar Jandhyala, Rachel Guo, Prath","2024-08-19T18:44:22",{"id":201,"version":202,"summary_zh":203,"released_at":204},230970,"v1.18.1","## What's new?\r\n\r\n**Announcements:**\r\n- ONNX Runtime Python packages now have numpy dependency >=1.21.6, \u003C2.0. Support for numpy 2.0 will be added in a future release.\r\n- CUDA 12.x ONNX Runtime GPU packages are now built against cuDNN 9.x (1.18.0 packages previously depended on cuDNN 8.x). CUDA 11.x ONNX Runtime GPU packages continue to depend on CuDNN 8.x.\r\n- Windows packages require installation of Microsoft Visual C++ Redistributable Runtime 14.38 or newer. \r\n\r\n**TensorRT EP:**\r\n- TensorRT Weightless API integration.\r\n- Support for TensorRT hardware compatible engines.\r\n- Support for INT64 types in TensorRT constant layer calibration.\r\n- Now using latest commit of onnx-tensorrt parser, which includes several issue fixes.\r\n- Additional TensorRT support and performance improvements.\r\n\r\n**Packages:**\r\n- Publish CUDA 12 Java packages to Azure DevOps feed.\r\n- Various packaging pipeline fixes.\r\n\r\nThis patch release also features various other bug fixes, including a CUDA 12.5 build error fix.\r\n\r\n**Big thank you to @yf711 for driving this release as the release manager and to all our contributors!**\r\n\r\n@yf711 @jchen351 @mszhanyi @snnn @wangyems @jywu-msft @skottmckay @chilo-ms @moraxu @kevinch-nv @pengwa @wejoncy @pranavsharma @Craigacp @jslhcl @adrianlizarraga @inisis @jeffbloo @mo-ja @kunal-vaishnavi @sumitsays @neNasko1 @yufenglee @dhruvbird @wangshuai09 @xiaoyu-work @axinging @yuslepukhin @YUNQIUGUO @shubhambhokare1 @fs-eire @afantino951 @tboby @HectorSVC @baijumeswani","2024-06-28T00:29:13",{"id":206,"version":207,"summary_zh":208,"released_at":209},230971,"v1.18.0","## Announcements\r\n* **Windows ARM32 support has been dropped at the source code level**.\r\n* **Python version >=3.8 is now required for build.bat\u002Fbuild.sh** (previously >=3.7). 
*Note: If you have Python version \u003C3.8, you can bypass the tools and use CMake directly.*\r\n* **The [onnxruntime-mobile](https:\u002F\u002Fmvnrepository.com\u002Fartifact\u002Fcom.microsoft.onnxruntime\u002Fonnxruntime-mobile) Android package and onnxruntime-mobile-c\u002Fonnxruntime-mobile-objc iOS cocoapods are being deprecated**. Please use the [onnxruntime-android](https:\u002F\u002Fmvnrepository.com\u002Fartifact\u002Fcom.microsoft.onnxruntime\u002Fonnxruntime-android) Android package, and onnxruntime-c\u002Fonnxruntime-objc cocoapods, which support ONNX and ORT format models and all operators and data types. *Note: If you require a smaller binary size, a custom build is required. See details on creating a custom Android or iOS package on [Custom build | onnxruntime](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fbuild\u002Fcustom.html#custom-build-packages).*\r\n\r\n## Build System & Packages\r\n* CoreML execution provider now depends on coremltools. \r\n* Flatbuffers has been upgraded from 1.12.0 → 23.5.26. \r\n* ONNX has been upgraded from 1.15 → 1.16. \r\n* EMSDK has been upgraded from 3.1.51 → 3.1.57. \r\n* Intel neural_speed library has been upgraded from v0.1.1 → v0.3 with several important bug fixes. \r\n* There is a new onnxruntime_CUDA_MINIMAL CMake option for building ONNX Runtime CUDA execution provider without any operations apart from memcpy ops. \r\n* Added support for Catalyst for macOS build support. \r\n* Added initial support for RISC-V and three new build options for it: `--rv64`, `--riscv_toolchain_root`, and `--riscv_qemu_path`. \r\n* Now you can build TensorRT EP with protobuf-lite instead of the full version of protobuf. \r\n* Some security-related compile\u002Flink flags have been moved from the default setting → new build option: `--use_binskim_compliant_compile_flags`. *Note: All our release binaries are built with this flag, but when building ONNX Runtime from source, this flag is default OFF.*\r\n* Windows ARM64 build now depends on PyTorch CPUINFO library. \r\n* Windows OneCore build now uses “Reverse forwarding” apisets instead of “Direct forwarding”, so onnxruntime.dll in our Nuget packages will depend on kernel32.dll. 
*Note: Windows systems without kernel32.dll need to have reverse forwarders (see [API set loader operation - Win32 apps | Microsoft Learn](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fwindows\u002Fwin32\u002Fapiindex\u002Fapi-set-loader-operation) for more information).*\r\n\r\n## Core\r\n* Added ONNX 1.16 support.\r\n* Added additional optimizations related to Dynamo-exported models.\r\n* Improved testing infrastructure for EPs developed as shared libraries.\r\n* Exposed Reserve() in OrtAllocator to allow custom allocators to work when session.use_device_allocator_for_initializers is specified.\r\n* Improved lock contention due to memory allocations.\r\n* Improved session creation time (graph and graph transformer optimizations).\r\n* Added new SessionOptions config entry to disable specific transformers and rules.\r\n* [C# API] Exposed SessionOptions.DisablePerSessionThreads to allow sharing of threadpool between sessions.\r\n* [Java API] Added CUDA 12 Java support.\r\n  \r\n## Performance\r\n* Improved 4bit quant support:\r\n  * Added HQQ quantization support to improve accuracy.\r\n  * Implemented general GEMM kernel and improved GEMV kernel performance on GPU.\r\n  * Improved GEMM kernel quality and performance on x64.\r\n  * Implemented general GEMM kernel and improved GEMV performance on ARM64.\r\n* Improved MultiheadAttention performance on CPU.\r\n\r\n## Execution Providers\r\n* TensorRT\r\n  * Added support for TensorRT 10. \r\n  * Finalized support for DDS ops. \r\n  * Added Python support for user provided CUDA stream. \r\n  * Fixed various bugs. \r\n\r\n* CUDA\r\n  * Added support of multiple CUDA graphs.\r\n  * Added a provider option to disable TF32.\r\n  * Added Python support for user provided CUDA stream.\r\n  * Extended MoE to support of Tensor Parallelism and int4 quantization.\r\n  * Fixed bugs in BatchNorm and TopK kernel.\r\n\r\n* QNN\r\n  * Added support for up to QNN SDK 2.22. \r\n  * Upgraded support from A16W8 → mixed 8\u002F16-bit precision configurability per layer. \r\n  * Added fp16 execution support via enable_htp_fp16 option. \r\n  * Added multiple partition support for QNN context binary. \r\n  * Expanded operator support and fixed various bugs. \r\n  * Added support for per-channel quantized weights for Conv.\r\n  * Integration with Qualcomm’s AIHub.\r\n\r\n* OpenVINO\r\n  * Added support for up to OpenVINO 2024.1. \r\n  * Added support for importing pre-compiled blob as EPContext blob. \r\n  * Separated device and precision as inputs by removing support for device_id in provider options and adding precision as separate CLI option. \r\n  * Deprecated CPU_FP32 and GPU_FP32 terminology and introduced CPU and GPU terminology. \r\n  * `AUTO:GPU,CPU` will only create GPU blob, not CPU blob. 
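\r\n\r\nA small sketch of selecting the OpenVINO EP with the new device terminology from Python (the model path and exact option keys are illustrative and can vary by release):\r\n\r\n```python\r\nimport onnxruntime as ort\r\n\r\n# \"device_type\" follows the new CPU\u002FGPU naming; AUTO picks between GPU and CPU at load time.\r\nsession = ort.InferenceSession(\r\n    \"model.onnx\",  # hypothetical model file\r\n    providers=[(\"OpenVINOExecutionProvider\", {\"device_type\": \"AUTO:GPU,CPU\"})],\r\n)\r\n```\r\n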
## Execution Providers
* TensorRT
  * Added support for TensorRT 10.
  * Finalized support for DDS ops.
  * Added Python support for user-provided CUDA streams.
  * Fixed various bugs.

* CUDA
  * Added support for multiple CUDA graphs.
  * Added a provider option to disable TF32.
  * Added Python support for user-provided CUDA streams.
  * Extended MoE to support Tensor Parallelism and int4 quantization.
  * Fixed bugs in the BatchNorm and TopK kernels.

* QNN
  * Added support for QNN SDK versions up to 2.22.
  * Upgraded from A16W8 support to mixed 8/16-bit precision, configurable per layer.
  * Added fp16 execution support via the enable_htp_fp16 option.
  * Added multiple-partition support for QNN context binaries.
  * Expanded operator support and fixed various bugs.
  * Added support for per-channel quantized weights for Conv.
  * Integrated with Qualcomm's AI Hub.

* OpenVINO
  * Added support for OpenVINO versions up to 2024.1.
  * Added support for importing a pre-compiled blob as an EPContext blob.
  * Separated device and precision as inputs by removing support for device_id in provider options and adding precision as a separate CLI option.
  * Deprecated the CPU_FP32 and GPU_FP32 terminology in favor of CPU and GPU.
  * `AUTO:GPU,CPU` will now only create a GPU blob, not a CPU blob.

* DirectML
  * Additional ONNX operator support: Resize-18 and Resize-19, Col2Im-18, IsNaN-20, IsInf-20.
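The new EP behaviors are exposed as provider options at session creation. A minimal, hedged Python sketch: the model path is a placeholder, and the option names (`use_tf32` for the TF32 toggle, `enable_htp_fp16` as spelled in the QNN notes) should be verified against the documentation for your release:

```python
import onnxruntime as ort

# CUDA EP with TF32 disabled; "use_tf32" is the assumed name of the
# new provider option described in the notes above.
providers = [
    ("CUDAExecutionProvider", {"use_tf32": "0"}),
    "CPUExecutionProvider",  # fallback for nodes the CUDA EP rejects
]

sess = ort.InferenceSession("model.onnx", providers=providers)  # placeholder path
```

On Qualcomm hardware the fp16 path would be requested the same way, e.g. `("QNNExecutionProvider", {"enable_htp_fp16": "1"})`.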
# v1.17.3 — What's new? (released 2024-04-18)

**General:**
- Update copying of API header files to make the Linux logic consistent with Windows ([#19736](https://github.com/microsoft/onnxruntime/pull/19736)) - @mszhanyi
- Pin the ONNX version to fix DML and Python packaging pipeline exceptions ([#20073](https://github.com/microsoft/onnxruntime/pull/20073)) - @mszhanyi

**Build System & Packages:**
- Fix a bug in the minimal build with training APIs enabled that affected the Apple framework ([#19858](https://github.com/microsoft/onnxruntime/pull/19858)) - @edgchen1

**Core:**
- Fix a SplitToSequence op bug with string tensors ([#19942](https://github.com/microsoft/onnxruntime/pull/19942)) - @Craigacp

**CUDA EP:**
- Fix an onnxruntime_test_all build break with CUDA ([#19673](https://github.com/microsoft/onnxruntime/pull/19673)) - @gedoensmax
- Fix broken pooling CUDA NHWC ops and ensure NCHW/NHWC parity ([#19889](https://github.com/microsoft/onnxruntime/pull/19889)) - @mtavenrath

**TensorRT EP:**
- Fix a TensorRT build break caused by an image update ([#19880](https://github.com/microsoft/onnxruntime/pull/19880)) - @jywu-msft
- Fix a TensorRT custom-op-list concurrency bug ([#20093](https://github.com/microsoft/onnxruntime/pull/20093)) - @chilo-ms

**Web:**
- Add hardSigmoid op support and hardSigmoid activation for fusedConv ([#19215](https://github.com/microsoft/onnxruntime/pull/19215), [#19233](https://github.com/microsoft/onnxruntime/pull/19233)) - @qjia7
- Add support for the WebNN async API with Asyncify ([#19415](https://github.com/microsoft/onnxruntime/pull/19145)) - @Honry
- Add uniform support for conv, conv transpose, conv grouped, and fp16 ([#18753](https://github.com/microsoft/onnxruntime/pull/18753), [#19098](https://github.com/microsoft/onnxruntime/pull/19098)) - @axinging
- Add capture and replay support for the JS EP ([#18989](https://github.com/microsoft/onnxruntime/pull/18989)) - @fs-eire
- Add LeakyRelu activation for fusedConv ([#19369](https://github.com/microsoft/onnxruntime/pull/19369)) - @qjia7
- Add FastGelu custom op support ([#19392](https://github.com/microsoft/onnxruntime/pull/19369)) - @fs-eire
- Allow uint8 tensors for WebGPU ([#19545](https://github.com/microsoft/onnxruntime/pull/19545)) - @satyajandhyala
- Add and optimize MatMulNBits ([#19852](https://github.com/microsoft/onnxruntime/pull/19852)) - @satyajandhyala
- Enable ort-web with any Float16Array polyfill ([#19305](https://github.com/microsoft/onnxruntime/pull/19305)) - @fs-eire
- Allow multiple EPs to be specified in the backend resolve logic ([#19735](https://github.com/microsoft/onnxruntime/pull/19735)) - @fs-eire
- Various bug fixes: ([#19258](https://github.com/microsoft/onnxruntime/pull/19258)) - @gyagp, ([#19201](https://github.com/microsoft/onnxruntime/pull/19201), [#19554](https://github.com/microsoft/onnxruntime/pull/19554)) - @hujiajie, ([#19262](https://github.com/microsoft/onnxruntime/pull/19262), [#19981](https://github.com/microsoft/onnxruntime/pull/19981)) - @guschmue, ([#19581](https://github.com/microsoft/onnxruntime/pull/19581), [#19596](https://github.com/microsoft/onnxruntime/pull/19596), [#19387](https://github.com/microsoft/onnxruntime/pull/19387)) - @axinging, ([#19613](https://github.com/microsoft/onnxruntime/pull/19613)) - @satyajandhyala
- Various improvements for performance and usability: ([#19202](https://github.com/microsoft/onnxruntime/pull/19202)) - @qjia7, ([#18900](https://github.com/microsoft/onnxruntime/pull/18900), [#19281](https://github.com/microsoft/onnxruntime/pull/19281), [#18883](https://github.com/microsoft/onnxruntime/pull/18883)) - @axinging, ([#18788](https://github.com/microsoft/onnxruntime/pull/18788), [#19737](https://github.com/microsoft/onnxruntime/pull/19737)) - @satyajandhyala, ([#19610](https://github.com/microsoft/onnxruntime/pull/19610)) - @segevfiner, ([#19614](https://github.com/microsoft/onnxruntime/pull/19614), [#19702](https://github.com/microsoft/onnxruntime/pull/19702), [#19677](https://github.com/microsoft/onnxruntime/pull/19677), [#19857](https://github.com/microsoft/onnxruntime/pull/19857), [#19940](https://github.com/microsoft/onnxruntime/pull/19940)) - @fs-eire, ([#19791](https://github.com/microsoft/onnxruntime/pull/19791)) - @gyagp, ([#19868](https://github.com/microsoft/onnxruntime/pull/19868)) - @guschmue, ([#19433](https://github.com/microsoft/onnxruntime/pull/19433)) - @martholomew, ([#19932](https://github.com/microsoft/onnxruntime/pull/19932)) - @ibelem

**Windows:**
- Fix a Windows memory-mapping bug affecting some larger models ([#19623](https://github.com/microsoft/onnxruntime/pull/19623)) - @yufenglee

**Kernel Optimizations:**
- Fix GQA and Rotary Embedding bugs affecting some models ([#19801](https://github.com/microsoft/onnxruntime/pull/19801), [#19874](https://github.com/microsoft/onnxruntime/pull/19874)) - @aciddelgado
- Update the replacement of MultiHeadAttention (MHA) and GroupQueryAttention (GQA) ([#19882](https://github.com/microsoft/onnxruntime/pull/19882))

## Similar Projects

### openclaw (openclaw/openclaw)
OpenClaw is a local-first AI assistant built for individuals, designed to give you a fully controllable intelligent companion on your own devices. It breaks free of the traditional constraint that AI assistants live inside a particular web page or app, plugging directly into the communication channels you already use, including WeChat, WhatsApp, Telegram, Discord, iMessage, and dozens of other platforms. Whichever chat app you message from, OpenClaw responds instantly; it even supports voice interaction on macOS, iOS, and Android, and provides a live canvas-rendering surface for you to drive.

The tool addresses users' needs for data privacy, responsiveness, and an always-on experience. By running the AI locally, you get fast, private assistance without depending on cloud services: your data stays under your control. Its standout technical feature is a robust gateway architecture that separates the control plane from the core assistant, keeping cross-platform communication smooth and extensible.

OpenClaw suits tinkerers and developers who want to build personalized workflows, as well as privacy-conscious users who don't want to be locked into a single ecosystem. Basic terminal skills are all that's required (macOS, Linux, and Windows WSL2 are supported), and deployment is a simple guided command-line process.

### stable-diffusion-webui (AUTOMATIC1111/stable-diffusion-webui)

stable-diffusion-webui is a web interface built on Gradio that makes it easy to run and use the powerful Stable Diffusion image-generation models locally. It addresses the pain points of the original models' command-line-only workflow, steep learning curve, and scattered functionality by consolidating the entire AI image-generation pipeline into an intuitive graphical platform.

Casual creators who want a quick start, designers who need fine-grained control over image details, and developers and researchers who want to explore the models' potential can all benefit. Its core strength is sheer breadth of functionality: beyond the basic text-to-image, image-to-image, inpainting, and outpainting modes, it pioneered advanced features such as attention adjustment, prompt matrices, negative prompts, and "highres fix". It also bundles face-restoration tools such as GFPGAN and CodeFormer, supports multiple neural-network upscalers, and lets users extend it indefinitely through an extension system. Even on VRAM-constrained hardware, stable-diffusion-webui offers optimization options that put high-quality AI art within reach.

### everything-claude-code (affaan-m/everything-claude-code)

everything-claude-code is a performance-oriented optimization system built for AI coding assistants such as Claude Code, Codex, and Cursor. More than a set of config files, it is a complete framework honed through long-term real-world use, targeting the core pain points AI agents hit in day-to-day development: inefficiency, memory loss, security risks, and the lack of continuous learning.

By introducing modular skills, intuition boosts, persistent memory, and built-in security scanning, everything-claude-code markedly improves how AI performs on complex tasks and helps developers build more stable, smarter production-grade agents. Its distinctive research-first development philosophy and token-consumption optimizations make responses faster and cheaper while defending against potential attack vectors.

The toolkit is especially suited to software developers, AI researchers, and teams that want deeply customized AI workflows, whether building large codebases or enlisting AI for security audits and automated testing. An open-source project that won an Anthropic hackathon award, it combines multi-language support with a rich set of battle-tested hooks.

### ComfyUI (Comfy-Org/ComfyUI)

ComfyUI is a powerful, highly modular visual AI engine built for designing and executing complex Stable Diffusion image-generation pipelines. It abandons traditional code-centric workflows in favor of an intuitive node-graph interface: users build personalized generation pipelines by wiring functional modules together.

This design neatly solves the complexity and inflexibility of configuring advanced AI image workflows. Without a programming background, users can freely combine models, tune parameters, and preview results in real time, handling everything from basic text-to-image to multi-stage high-resolution refinement. ComfyUI is broadly compatible: it runs on Windows, macOS, and Linux, supports NVIDIA, AMD, Intel, and Apple Silicon hardware, and was among the first to support frontier models such as SDXL, Flux, and SD3.

Whether you are a researcher or developer probing an algorithm's potential or a designer or seasoned AI-art enthusiast chasing maximum creative freedom, ComfyUI delivers strong support. Its modular architecture lets the community keep extending it, making it one of the most flexible, ecosystem-rich open-source diffusion-model tools available and helping users turn ideas into reality efficiently.

### gemini-cli (google-gemini/gemini-cli)

gemini-cli is an open-source AI command-line tool from Google that integrates the capabilities of the Gemini models directly into your terminal. For developers who live on the command line, it offers the shortest path from typed prompt to model response, with intelligent assistance available without switching windows.

It chiefly eliminates the constant context switching of development work, letting users handle code comprehension, generation, debugging, and automated ops tasks from the familiar terminal. Whether querying a large codebase, generating an app from a sketch, or running complex Git operations, gemini-cli handles it efficiently through natural-language instructions.

It fits software engineers, DevOps practitioners, and technical researchers alike. Highlights include a context window of up to one million tokens with strong reasoning ability; built-in tools for Google Search, file operations, and shell command execution; and, distinctively, support for MCP (Model Context Protocol), which lets users flexibly wire in custom integrations such as image generation. A personal Google account comes with a free usage quota, and the project is fully open source under the Apache 2.0 license, making it an ideal assistant for boosting terminal productivity.
### markitdown (microsoft/markitdown)

MarkItDown is a lightweight Python tool from Microsoft's AutoGen team, purpose-built for converting a wide range of files into Markdown efficiently. It can parse PDF, Word, Excel, PowerPoint, images (with OCR), audio (with speech transcription), HTML, and even YouTube links, accurately extracting key structure such as headings, lists, tables, and hyperlinks.

As AI applications spread, large language models handle text well but cannot read complex binary office documents directly. MarkItDown fills exactly that gap: it converts unstructured and semi-structured files into Markdown, a format models understand natively and that is highly token-efficient, making it an ideal bridge between local files and AI analysis pipelines. It also provides an MCP (Model Context Protocol) server for seamless integration with LLM applications such as Claude Desktop.

The tool is a natural fit for developers, data scientists, and AI researchers, particularly anyone building retrieval-augmented generation (RAG) systems, running batch text analysis, or letting an AI assistant "read" local files directly. While the generated output is also reasonably human-readable, its core strength lies in what it offers machine consumers.
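As a quick illustration of the conversion workflow described above, a minimal sketch using MarkItDown's Python API; the file name is a placeholder, and the package is assumed to be installed (`pip install markitdown`):

```python
from markitdown import MarkItDown

md = MarkItDown()

# Convert a binary office document into LLM-friendly Markdown.
result = md.convert("quarterly_report.pdf")  # placeholder file

# The extracted Markdown, ready to feed into a RAG pipeline.
print(result.text_content)
```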