[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-microsoft--onnxruntime-genai":3,"tool-microsoft--onnxruntime-genai":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",144730,2,"2026-04-07T23:26:32",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107888,"2026-04-06T11:32:50",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 
助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":10,"last_commit_at":59,"category_tags":60,"status":17},4487,"LLMs-from-scratch","rasbt\u002FLLMs-from-scratch","LLMs-from-scratch 是一个基于 PyTorch 的开源教育项目，旨在引导用户从零开始一步步构建一个类似 ChatGPT 的大型语言模型（LLM）。它不仅是同名技术著作的官方代码库，更提供了一套完整的实践方案，涵盖模型开发、预训练及微调的全过程。\n\n该项目主要解决了大模型领域“黑盒化”的学习痛点。许多开发者虽能调用现成模型，却难以深入理解其内部架构与训练机制。通过亲手编写每一行核心代码，用户能够透彻掌握 Transformer 架构、注意力机制等关键原理，从而真正理解大模型是如何“思考”的。此外，项目还包含了加载大型预训练权重进行微调的代码，帮助用户将理论知识延伸至实际应用。\n\nLLMs-from-scratch 特别适合希望深入底层原理的 AI 开发者、研究人员以及计算机专业的学生。对于不满足于仅使用 API，而是渴望探究模型构建细节的技术人员而言，这是极佳的学习资源。其独特的技术亮点在于“循序渐进”的教学设计：将复杂的系统工程拆解为清晰的步骤，配合详细的图表与示例，让构建一个虽小但功能完备的大模型变得触手可及。无论你是想夯实理论基础，还是为未来研发更大规模的模型做准备",90106,"2026-04-06T11:19:32",[35,15,13,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":78,"owner_twitter":79,"owner_website":80,"owner_url":81,"languages":82,"stars":123,"forks":124,"last_commit_at":125,"license":126,"difficulty_score":32,"env_os":127,"env_gpu":128,"env_ram":129,"env_deps":130,"category_tags":135,"github_topics":77,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":136,"updated_at":137,"faqs":138,"releases":175},5391,"microsoft\u002Fonnxruntime-genai","onnxruntime-genai","Generative AI extensions for onnxruntime","onnxruntime-genai 是微软推出的开源扩展库，旨在让开发者能够轻松、高效地在本地设备上运行生成式 AI 模型（如大语言模型）。它基于成熟的 ONNX Runtime 构建，将复杂的模型推理流程封装为简洁的 API，涵盖了从输入预处理、核心推理、日志处理到搜索采样、键值缓存管理及语法约束等全链路环节。\n\n这一工具主要解决了大模型在边缘设备部署难、推理优化复杂的问题。通过提供统一的接口，它屏蔽了底层硬件差异，让同一套代码能在 Windows、Linux、macOS、Android 等多种操作系统上流畅运行，并自动适配 CPU、CUDA、DirectML、WebGPU 等多种硬件加速后端。目前，它已广泛应用于 Foundry Local、Windows ML 及 VS Code AI 
工具箱等实际产品中。\n\nonnxruntime-genai 特别适合 AI 应用开发者、算法工程师及研究人员使用。无论是希望在个人电脑或移动设备上离线部署智能助手，还是需要集成多 LoRA 适配、连续解码及受限解码等高级功能，都能从中获益。其独特的亮点在于对 Llama、Phi、Qwen、Gemm","onnxruntime-genai 是微软推出的开源扩展库，旨在让开发者能够轻松、高效地在本地设备上运行生成式 AI 模型（如大语言模型）。它基于成熟的 ONNX Runtime 构建，将复杂的模型推理流程封装为简洁的 API，涵盖了从输入预处理、核心推理、日志处理到搜索采样、键值缓存管理及语法约束等全链路环节。\n\n这一工具主要解决了大模型在边缘设备部署难、推理优化复杂的问题。通过提供统一的接口，它屏蔽了底层硬件差异，让同一套代码能在 Windows、Linux、macOS、Android 等多种操作系统上流畅运行，并自动适配 CPU、CUDA、DirectML、WebGPU 等多种硬件加速后端。目前，它已广泛应用于 Foundry Local、Windows ML 及 VS Code AI 工具箱等实际产品中。\n\nonnxruntime-genai 特别适合 AI 应用开发者、算法工程师及研究人员使用。无论是希望在个人电脑或移动设备上离线部署智能助手，还是需要集成多 LoRA 适配、连续解码及受限解码等高级功能，都能从中获益。其独特的亮点在于对 Llama、Phi、Qwen、Gemma 等主流模型架构的广泛支持，以及提供 Python、C#、C\u002FC++ 和 Java 等多语言绑定，极大地降低了高性能生成式 AI 应用的开发门槛。","# ONNX Runtime GenAI\n\n## Status\n\n[![Latest version](https:\u002F\u002Fimg.shields.io\u002Fnuget\u002Fvpre\u002FMicrosoft.ML.OnnxRuntimeGenAI.Managed?label=latest)](https:\u002F\u002Fwww.nuget.org\u002Fpackages\u002FMicrosoft.ML.OnnxRuntimeGenAI.Managed\u002FabsoluteLatest)\n\n[![Nightly Build](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Factions\u002Fworkflows\u002Flinux-cpu-x64-nightly-build.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Factions\u002Fworkflows\u002Flinux-cpu-x64-nightly-build.yml)\n\n## Description\n\nRun generative AI models with ONNX Runtime. This API gives you an easy, flexible and performant way of running LLMs on device. 
It implements the generative AI loop for ONNX models, including pre and post processing, inference with ONNX Runtime, logits processing, search and sampling, KV cache management, and grammar specification for tool calling.\n\nONNX Runtime GenAI powers Foundry Local, Windows ML, and the Visual Studio Code AI Toolkit.\n\nSee documentation at the [ONNX Runtime website](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fgenai) for more details.\n\n| Support matrix | Supported now | Under development | On the roadmap|\n| -------------- | ------------- | ----------------- | -------------- |\n| Model architectures | AMD OLMo \u003Cbr\u002F> ChatGLM \u003Cbr\u002F> DeepSeek \u003Cbr\u002F> ERNIE 4.5 \u003Cbr\u002F> Fara \u003Cbr\u002F> Gemma \u003Cbr\u002F> gpt-oss \u003Cbr\u002F> Granite \u003Cbr\u002F> InternLM2 \u003Cbr\u002F> Llama \u003Cbr\u002F> Mistral \u003Cbr\u002F> Nemotron \u003Cbr\u002F> Phi (language + vision) \u003Cbr\u002F> Qwen (language + vision) \u003Cbr\u002F> SmolLM3 \u003Cbr\u002F> Whisper | Stable diffusion | Multi-modal models |\n| API | Python \u003Cbr\u002F>C# \u003Cbr\u002F>C\u002FC++ \u003Cbr\u002F> Java ^ | Objective-C ||\n| O\u002FS | Linux \u003Cbr\u002F> Windows \u003Cbr\u002F>Mac  \u003Cbr\u002F>Android || iOS |||\n| Architecture | x86 \u003Cbr\u002F> x64 \u003Cbr\u002F> arm64 ||||\n| Hardware Acceleration | CPU \u003Cbr\u002F> CUDA \u003Cbr\u002F> DirectML \u003Cbr\u002F> NvTensorRtRtx (TRT-RTX) \u003Cbr\u002F> OpenVINO \u003Cbr\u002F> QNN \u003Cbr\u002F> WebGPU | | AMD GPU |\n| Features | Multi-LoRA \u003Cbr\u002F> Continuous decoding \u003Cbr\u002F> Constrained decoding | | Speculative decoding |\n\n^ Requires build from source\n\n## Installation\n\nSee [installation instructions](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fgenai\u002Fhowto\u002Finstall) or [build from source](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fgenai\u002Fhowto\u002Fbuild-from-source.html)\n\n## Sample code for Phi-3 in Python\n\n1. 
Download the model\n\n   ```shell\n   huggingface-cli download microsoft\u002FPhi-3-mini-4k-instruct-onnx --include cpu_and_mobile\u002Fcpu-int4-rtn-block-32-acc-level-4\u002F* --local-dir .\n   ```\n\n2. Install the API\n\n   ```shell\n   pip install numpy\n   pip install --pre onnxruntime-genai\n   ```\n\n3. Run the model\n\n   ```python\n   import onnxruntime_genai as og\n\n   model = og.Model('cpu_and_mobile\u002Fcpu-int4-rtn-block-32-acc-level-4')\n   tokenizer = og.Tokenizer(model)\n   stream = tokenizer.create_stream()\n    \n   # Set the max length to something sensible by default,\n   # since otherwise it will be set to the entire context length\n   search_options = {}\n   search_options['max_length'] = 2048\n   search_options['batch_size'] = 1\n\n   chat_template = '\u003C|user|>\\n{input} \u003C|end|>\\n\u003C|assistant|>'\n\n   text = input(\"Input: \")\n   if not text:\n      print(\"Error, input cannot be empty\")\n      exit()\n\n   prompt = f'{chat_template.format(input=text)}'\n\n   input_tokens = tokenizer.encode(prompt)\n\n   params = og.GeneratorParams(model)\n   params.set_search_options(**search_options)\n   generator = og.Generator(model, params)\n  \n   print(\"Output: \", end='', flush=True)\n\n   try:\n      generator.append_tokens(input_tokens)\n      while not generator.is_done():\n         generator.generate_next_token()\n         new_token = generator.get_next_tokens()[0]\n         print(stream.decode(new_token), end='', flush=True)\n   except KeyboardInterrupt:\n         print(\"  --control+c pressed, aborting generation--\")\n\n   print()\n   del generator\n   ```\n\n### Choose the correct version of the examples\n\nDue to the evolving nature of this project and ongoing feature additions, examples in the `main` branch may not always align with the latest stable release. 
This section outlines how to ensure compatibility between the examples and the corresponding version.\n\n### Stable version\n\nInstall the package according to the [installation instructions](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fgenai\u002Fhowto\u002Finstall). For example, install the Python package.\n\n```bash\npip install onnxruntime-genai\n```\n\nGet the version of the package\n\nLinux\u002FMac:\n```bash\npip list | grep onnxruntime-genai\n```\n\nWindows:\n```bash\npip list | findstr \"onnxruntime-genai\"\n```\n\nThen, check out the version of the examples that corresponds to that release.\n\n```bash\n# Clone the repo\ngit clone https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai.git && cd onnxruntime-genai\n# Checkout the branch for the version you are using\ngit checkout v0.11.5\ncd examples\n```\n\n### Nightly version (main branch)\n\nCheckout the main branch of the repo\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai.git && cd onnxruntime-genai\n```\n\nBuild from source, using these [instructions](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fgenai\u002Fhowto\u002Fbuild-from-source.html). For example, to build the Python wheel:\n\n```bash\npython build.py\n```\n\nNavigate to the examples folder in the main branch.\n\n```bash\ncd examples\n```\n\nTo install the nightly Python build:\n\n```bash\n# Change onnxruntime-genai to the Python package you want to install\npip install --index-url https:\u002F\u002Faiinfra.pkgs.visualstudio.com\u002FPublicPackages\u002F_packaging\u002FORT-Nightly\u002Fpypi\u002Fsimple\u002F onnxruntime-genai\n```\n\n## Roadmap\n\nSee the [Discussions](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fdiscussions) to request new features and up-vote existing requests.\n\n\n## Contributing\n\nThis project welcomes contributions and suggestions.  
Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https:\u002F\u002Fcla.opensource.microsoft.com.\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002F).\nFor more information see the [Code of Conduct FAQ](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002Ffaq\u002F) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n### Linting\n\nThis project enables [lintrunner](https:\u002F\u002Fgithub.com\u002Fsuo\u002Flintrunner) for linting. You can install the dependencies and initialize with\n\n```sh\npip install -r requirements-lintrunner.txt\nlintrunner init\n```\n\nThis will install lintrunner on your system and download all the necessary dependencies to run linters locally.\n\nTo format local changes:\n\n```bash\nlintrunner -a\n```\n\nTo format all files:\n\n```bash\nlintrunner -a --all-files\n```\n\n## Trademarks\n\nThis project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft \ntrademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Flegal\u002Fintellectualproperty\u002Ftrademarks\u002Fusage\u002Fgeneral). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. 
Any use of third-party trademarks or logos are subject to those third-party's policies.\n","# ONNX Runtime GenAI\n\n## 状态\n\n[![最新版本](https:\u002F\u002Fimg.shields.io\u002Fnuget\u002Fvpre\u002FMicrosoft.ML.OnnxRuntimeGenAI.Managed?label=latest)](https:\u002F\u002Fwww.nuget.org\u002Fpackages\u002FMicrosoft.ML.OnnxRuntimeGenAI.Managed\u002FabsoluteLatest)\n\n[![夜间构建](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Factions\u002Fworkflows\u002Flinux-cpu-x64-nightly-build.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Factions\u002Fworkflows\u002Flinux-cpu-x64-nightly-build.yml)\n\n## 描述\n\n使用 ONNX Runtime 运行生成式 AI 模型。该 API 提供了一种简单、灵活且高效的在设备上运行 LLM 的方式。它为 ONNX 模型实现了生成式 AI 循环，包括预处理和后处理、使用 ONNX Runtime 进行推理、logits 处理、搜索与采样、KV 缓存管理，以及用于工具调用的语法规范。\n\nONNX Runtime GenAI 为 Foundry Local、Windows ML 和 Visual Studio Code AI 工具包提供支持。\n\n更多详细信息请参阅 [ONNX Runtime 官网](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fgenai) 的文档。\n\n| 支持矩阵 | 当前支持 | 开发中 | 路线图 |\n| -------------- | ------------- | ----------------- | -------------- |\n| 模型架构 | AMD OLMo \u003Cbr\u002F> ChatGLM \u003Cbr\u002F> DeepSeek \u003Cbr\u002F> ERNIE 4.5 \u003Cbr\u002F> Fara \u003Cbr\u002F> Gemma \u003Cbr\u002F> gpt-oss \u003Cbr\u002F> Granite \u003Cbr\u002F> InternLM2 \u003Cbr\u002F> Llama \u003Cbr\u002F> Mistral \u003Cbr\u002F> Nemotron \u003Cbr\u002F> Phi (语言 + 视觉) \u003Cbr\u002F> Qwen (语言 + 视觉) \u003Cbr\u002F> SmolLM3 \u003Cbr\u002F> Whisper | Stable diffusion | 多模态模型 |\n| API | Python \u003Cbr\u002F>C# \u003Cbr\u002F>C\u002FC++ \u003Cbr\u002F> Java ^ | Objective-C ||\n| 操作系统 | Linux \u003Cbr\u002F> Windows \u003Cbr\u002F>Mac  \u003Cbr\u002F>Android || iOS |||\n| 架构 | x86 \u003Cbr\u002F> x64 \u003Cbr\u002F> arm64 ||||\n| 硬件加速 | CPU \u003Cbr\u002F> CUDA \u003Cbr\u002F> DirectML \u003Cbr\u002F> NvTensorRtRtx (TRT-RTX) \u003Cbr\u002F> OpenVINO \u003Cbr\u002F> QNN \u003Cbr\u002F> WebGPU | | AMD GPU |\n| 特性 | Multi-LoRA \u003Cbr\u002F> 连续解码 
\u003Cbr\u002F> 约束解码 | | 推测解码 |\n\n^ 需要从源代码构建\n\n## 安装\n\n请参阅 [安装说明](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fgenai\u002Fhowto\u002Finstall) 或 [从源代码构建](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fgenai\u002Fhowto\u002Fbuild-from-source.html)\n\n## Phi-3 的 Python 示例代码\n\n1. 下载模型\n\n   ```shell\n   huggingface-cli download microsoft\u002FPhi-3-mini-4k-instruct-onnx --include cpu_and_mobile\u002Fcpu-int4-rtn-block-32-acc-level-4\u002F* --local-dir .\n   ```\n\n2. 安装 API\n\n   ```shell\n   pip install numpy\n   pip install --pre onnxruntime-genai\n   ```\n\n3. 运行模型\n\n   ```python\n   import onnxruntime_genai as og\n\n   model = og.Model('cpu_and_mobile\u002Fcpu-int4-rtn-block-32-acc-level-4')\n   tokenizer = og.Tokenizer(model)\n   stream = tokenizer.create_stream()\n    \n   # 默认设置一个合理的最大长度，\n   # 否则它将被设置为整个上下文长度\n   search_options = {}\n   search_options['max_length'] = 2048\n   search_options['batch_size'] = 1\n\n   chat_template = '\u003C|user|>\\n{input} \u003C|end|>\\n\u003C|assistant|>'\n\n   text = input(\"Input: \")\n   if not text:\n      print(\"Error, input cannot be empty\")\n      exit()\n\n   prompt = f'{chat_template.format(input=text)}'\n\n   input_tokens = tokenizer.encode(prompt)\n\n   params = og.GeneratorParams(model)\n   params.set_search_options(**search_options)\n   generator = og.Generator(model, params)\n  \n   print(\"Output: \", end='', flush=True)\n\n   try:\n      generator.append_tokens(input_tokens)\n      while not generator.is_done():\n         generator.generate_next_token()\n         new_token = generator.get_next_tokens()[0]\n         print(stream.decode(new_token), end='', flush=True)\n   except KeyboardInterrupt:\n         print(\"  --control+c pressed, aborting generation--\")\n\n   print()\n   del generator\n   ```\n\n### 选择正确的示例版本\n\n由于该项目仍在不断发展并持续添加新功能，`main` 分支中的示例可能并不总是与最新的稳定版完全一致。本节介绍了如何确保示例与相应版本之间的兼容性。\n\n### 稳定版\n\n按照 [安装说明](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fgenai\u002Fhowto\u002Finstall) 
安装软件包。例如，安装 Python 包：\n\n```bash\npip install onnxruntime-genai\n```\n\n获取软件包版本：\n\nLinux\u002FMac：\n```bash\npip list | grep onnxruntime-genai\n```\n\nWindows：\n```bash\npip list | findstr \"onnxruntime-genai\"\n```\n\n然后检出与该版本对应的示例分支：\n\n```bash\n# 克隆仓库\ngit clone https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai.git && cd onnxruntime-genai\n# 检出您正在使用的版本分支\ngit checkout v0.11.5\ncd examples\n```\n\n### 夜间版（main 分支）\n\n检出仓库的 `main` 分支：\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai.git && cd onnxruntime-genai\n```\n\n按照这些 [说明](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fgenai\u002Fhowto\u002Fbuild-from-source.html) 从源代码构建。例如，构建 Python wheel：\n\n```bash\npython build.py\n```\n\n导航到 `main` 分支中的示例文件夹：\n\n```bash\ncd examples\n```\n\n安装夜间版 Python 构建：\n\n```bash\n# 将 onnxruntime-genai 替换为您想要安装的 Python 包\npip install --index-url https:\u002F\u002Faiinfra.pkgs.visualstudio.com\u002FPublicPackages\u002F_packaging\u002FORT-Nightly\u002Fpypi\u002Fsimple\u002F onnxruntime-genai\n```\n\n## 路线图\n\n请查看 [讨论区](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fdiscussions)，以请求新功能或为现有请求投票。\n\n\n## 贡献\n\n本项目欢迎贡献和建议。大多数贡献都需要您同意一份贡献者许可协议 (CLA)，声明您有权并将您的贡献权利授予我们。有关详情，请访问 https:\u002F\u002Fcla.opensource.microsoft.com。\n\n当您提交拉取请求时，CLA 机器人会自动判断您是否需要提供 CLA，并相应地标记 PR（例如状态检查、评论）。只需按照机器人提供的指示操作即可。对于使用我们 CLA 的所有仓库，您只需执行一次此操作。\n\n本项目已采用 [微软开源行为准则](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002F)。更多信息请参阅 [行为准则常见问题解答](https:\u002F\u002Fopensource.microsoft.com\u002Fcodeofconduct\u002Ffaq\u002F)，或如有任何其他疑问或意见，请联系 [opencode@microsoft.com](mailto:opencode@microsoft.com)。\n\n### 代码检查\n\n本项目启用了 [lintrunner](https:\u002F\u002Fgithub.com\u002Fsuo\u002Flintrunner) 进行代码检查。您可以安装依赖项并初始化：\n\n```sh\npip install -r requirements-lintrunner.txt\nlintrunner init\n```\n\n这将在您的系统上安装 lintrunner，并下载所有必要的依赖项，以便在本地运行 linter。\n\n格式化本地更改：\n\n```bash\nlintrunner -a\n```\n\n格式化所有文件：\n\n```bash\nlintrunner -a 
--all-files\n```\n\n## 商标\n\n本项目可能包含项目、产品或服务的商标或标识。对微软商标或标识的授权使用须遵守并依据[微软商标与品牌指南](https:\u002F\u002Fwww.microsoft.com\u002Fen-us\u002Flegal\u002Fintellectualproperty\u002Ftrademarks\u002Fusage\u002Fgeneral)。在本项目的修改版本中使用微软商标或标识时，不得造成混淆或暗示微软的赞助关系。任何对第三方商标或标识的使用均应遵循该第三方的相关政策。","# ONNX Runtime GenAI 快速上手指南\n\nONNX Runtime GenAI 是一个用于在本地设备运行生成式 AI 模型（如 LLM）的高性能库。它封装了完整的生成式 AI 循环，包括预处理、推理、日志处理、搜索采样、KV 缓存管理及工具调用的语法约束。\n\n## 环境准备\n\n### 系统要求\n支持以下操作系统和架构：\n- **操作系统**: Linux, Windows, macOS, Android\n- **架构**: x86, x64, arm64\n- **硬件加速**: 支持 CPU, CUDA (NVIDIA), DirectML, OpenVINO, QNN, WebGPU 等。\n\n### 前置依赖\n- **Python**: 建议 Python 3.8 或更高版本。\n- **包管理工具**: `pip`\n- **基础库**: `numpy` (运行示例代码必需)\n\n> **注意**：本项目迭代迅速，示例代码需与安装的库版本严格对应。本指南基于最新稳定版流程编写。\n\n## 安装步骤\n\n### 1. 安装基础依赖\n首先安装 `numpy`：\n```bash\npip install numpy\n```\n\n### 2. 安装 ONNX Runtime GenAI\n安装预发布版本（通常包含最新的功能支持）：\n```bash\npip install --pre onnxruntime-genai\n```\n\n> **国内加速提示**：如果下载速度慢，可使用清华或阿里镜像源：\n> ```bash\n> pip install --pre onnxruntime-genai -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n> ```\n\n### 3. 
验证版本（可选但推荐）\n由于 `main` 分支的示例可能与稳定版不兼容，建议确认已安装的版本号：\n```bash\npip list | grep onnxruntime-genai\n# Windows 用户使用: pip list | findstr \"onnxruntime-genai\"\n```\n*若需运行特定版本的示例代码，请前往 GitHub 仓库 checkout 对应的 tag（例如 `git checkout v0.11.5`）。*\n\n## 基本使用\n\n以下以微软 **Phi-3** 模型为例，演示如何下载模型并运行一个简单的对话生成任务。\n\n### 第一步：下载模型\n使用 `huggingface-cli` 下载 Phi-3 的 ONNX 量化版本（针对 CPU 和移动端优化）：\n\n```shell\nhuggingface-cli download microsoft\u002FPhi-3-mini-4k-instruct-onnx --include cpu_and_mobile\u002Fcpu-int4-rtn-block-32-acc-level-4\u002F* --local-dir .\n```\n\n> **国内加速提示**：如果无法访问 Hugging Face，可设置镜像环境变量：\n> ```shell\n> export HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com\n> huggingface-cli download microsoft\u002FPhi-3-mini-4k-instruct-onnx --include cpu_and_mobile\u002Fcpu-int4-rtn-block-32-acc-level-4\u002F* --local-dir .\n> ```\n\n### 第二步：运行代码\n创建文件 `run_phi3.py`，填入以下代码：\n\n```python\nimport onnxruntime_genai as og\n\n# 加载模型路径\nmodel = og.Model('cpu_and_mobile\u002Fcpu-int4-rtn-block-32-acc-level-4')\ntokenizer = og.Tokenizer(model)\nstream = tokenizer.create_stream()\n \n# 配置生成参数\nsearch_options = {}\nsearch_options['max_length'] = 2048\nsearch_options['batch_size'] = 1\n\n# 定义聊天模板 (Phi-3 特定格式)\nchat_template = '\u003C|user|>\\n{input} \u003C|end|>\\n\u003C|assistant|>'\n\ntext = input(\"Input: \")\nif not text:\n   print(\"Error, input cannot be empty\")\n   exit()\n\nprompt = f'{chat_template.format(input=text)}'\n\n# 编码输入\ninput_tokens = tokenizer.encode(prompt)\n\nparams = og.GeneratorParams(model)\nparams.set_search_options(**search_options)\ngenerator = og.Generator(model, params)\n\nprint(\"Output: \", end='', flush=True)\n\ntry:\n   generator.append_tokens(input_tokens)\n   while not generator.is_done():\n      generator.generate_next_token()\n      new_token = generator.get_next_tokens()[0]\n      # 流式解码输出\n      print(stream.decode(new_token), end='', flush=True)\nexcept KeyboardInterrupt:\n      print(\"  --control+c pressed, aborting generation--\")\n\nprint()\ndel 
generator\n```\n\n### 第三步：执行\n在终端运行脚本：\n```bash\npython run_phi3.py\n```\n输入问题后，模型将流式输出回答。\n\n---\n*更多模型架构支持（如 Llama, Qwen, Gemma 等）及高级用法，请参考 [ONNX Runtime 官方文档](https:\u002F\u002Fonnxruntime.ai\u002Fdocs\u002Fgenai)。*","某初创团队希望将智能客服助手部署在用户的本地笔记本电脑上，以确保数据隐私并降低云端推理成本。\n\n### 没有 onnxruntime-genai 时\n- **开发复杂度极高**：工程师需要手动拼凑预处理、KV 缓存管理、Logits 处理和搜索采样等底层逻辑，代码冗长且极易出错。\n- **跨平台适配困难**：为了让模型同时支持 Windows、Linux 和 macOS，甚至不同的 CPU 架构，需要为每种环境单独编写和优化推理后端。\n- **资源占用不可控**：缺乏统一的显存与内存优化机制，导致大模型在普通消费级硬件上运行缓慢或直接因内存溢出而崩溃。\n- **功能迭代缓慢**：若要添加约束解码（如强制输出特定 JSON 格式）或工具调用功能，需从头研发算法，严重拖慢产品上线节奏。\n\n### 使用 onnxruntime-genai 后\n- **一站式生成循环**：直接调用其封装好的 API，自动处理从 Token 编码、推理执行到流式解码的全流程，核心代码缩减至几十行。\n- **无缝多端部署**：凭借对 x86、ARM64 及 CUDA、DirectML 等多种硬件加速的后端支持，同一套代码即可在用户不同的设备上高效运行。\n- **极致性能优化**：内置高效的 KV 缓存管理和量化模型支持，使 Phi-3 等模型在本地笔记本上也能实现低延迟的流畅对话。\n- **高级特性即插即用**：原生支持约束解码和语法指定，轻松实现结构化数据输出和工具调用，无需重复造轮子。\n\nonnxruntime-genai 通过屏蔽底层推理细节并提供高性能的生成式 AI 原语，让开发者能专注于业务逻辑，真正实现大模型在终端设备上的普惠落地。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fmicrosoft_onnxruntime-genai_3380f9e6.png","microsoft","Microsoft","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fmicrosoft_4900709c.png","Open source projects and samples from 
Microsoft",null,"opensource@microsoft.com","OpenAtMicrosoft","https:\u002F\u002Fopensource.microsoft.com","https:\u002F\u002Fgithub.com\u002Fmicrosoft",[83,87,91,95,99,103,107,111,115,119],{"name":84,"color":85,"percentage":86},"C++","#f34b7d",49.8,{"name":88,"color":89,"percentage":90},"Python","#3572A5",29.3,{"name":92,"color":93,"percentage":94},"Cuda","#3A4E3A",7.4,{"name":96,"color":97,"percentage":98},"C#","#178600",4.4,{"name":100,"color":101,"percentage":102},"Java","#b07219",2.6,{"name":104,"color":105,"percentage":106},"C","#555555",2.2,{"name":108,"color":109,"percentage":110},"CMake","#DA3434",2.1,{"name":112,"color":113,"percentage":114},"Objective-C++","#6866fb",1,{"name":116,"color":117,"percentage":118},"Objective-C","#438eff",0.8,{"name":120,"color":121,"percentage":122},"Shell","#89e051",0.2,998,278,"2026-04-07T17:29:48","MIT","Linux, Windows, macOS, Android","非必需（支持 CPU）。若需加速，支持 NVIDIA GPU (CUDA, TRT-RTX), AMD GPU (开发中), Intel GPU (OpenVINO), Qualcomm NPU (QNN) 或 WebGPU。具体显存大小和 CUDA 版本取决于所选的后端及运行的模型规模，文中未明确指定统一最低要求。","未说明（取决于运行的模型大小，示例中使用的是量化后的 Phi-3 模型以适应移动端\u002F低资源环境）",{"notes":131,"python":132,"dependencies":133},"1. 该工具旨在实现端侧（On-device）运行大语言模型，支持多种硬件加速后端（CPU, CUDA, DirectML, OpenVINO, QNN 等）。2. Java API 需要从源代码构建。3. 支持的模型架构包括 Llama, Phi, Qwen, Gemma, Mistral 等，部分多模态模型正在路线图中。4. 安装时需注意版本匹配：稳定版直接使用 pip 安装，夜间版（Nightly）需从特定索引源安装或从源码构建。5. 
示例代码展示了如何下载 INT4 量化版本的 Phi-3 模型并在 CPU 上运行，表明其对低资源设备友好。","未说明（示例代码使用 Python，需安装 numpy 和 onnxruntime-genai 包）",[134,64],"numpy",[35,14],"2026-03-27T02:49:30.150509","2026-04-08T13:57:46.078242",[139,144,149,154,158,162,167,171],{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},24446,"在特定输入长度下（如 3000 或 8000），Phi-3.5 模型生成的输出全是乱码或换行符，如何解决？","该问题已在 onnxruntime-genai 0.6.0 版本中修复。请升级到最新版本（pip install --upgrade onnxruntime-genai）或使用最新源代码构建。如果问题仍然存在，请确保使用最新的示例代码并从头构建项目。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fissues\u002F954",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},24447,"如果在导入 onnxruntime 之后再导入 onnxruntime-genai，程序会因 SIGSEGV 错误崩溃，原因是什么？","这是一个已知的兼容性问题，通常与 GCC 版本或底层 C++ 标准库冲突有关。维护者正在通过升级 GCC 来解决此问题。临时解决方案是调整导入顺序（先导入 onnxruntime-genai），或者等待包含修复的新版本发布。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fissues\u002F257",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},24448,"在 Windows 上安装 onnxruntime-genai-cuda 时出现 'DLL load failed' 或 'The specified module could not be found' 错误怎么办？","这通常是因为缺少对应的 CUDA 运行时依赖或安装了错误的包版本。请确保您的环境匹配 CUDA 12，并使用以下命令安装预发布版本：\npip install --pre onnxruntime-genai-cuda --index-url=https:\u002F\u002Faiinfra.pkgs.visualstudio.com\u002FPublicPackages\u002F_packaging\u002Fonnxruntime-genai\u002Fpypi\u002Fsimple\u002F\n如果指定版本，可以添加版本号，例如 ==0.3.0。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fissues\u002F490",{"id":155,"question_zh":156,"answer_zh":157,"source_url":153},24449,"如何在 Docker 容器中正确安装适用于 CUDA 12 的 onnxruntime-genai-cuda 包？","在 Dockerfile 中，请确保基础镜像包含正确的 CUDA 版本（如 nvidia\u002Fcuda:12.0.0），并安装必要的 cuDNN 库。然后使用以下命令安装 Python 包：\nRUN pip install --pre onnxruntime-genai-cuda --index-url=https:\u002F\u002Faiinfra.pkgs.visualstudio.com\u002FPublicPackages\u002F_packaging\u002Fonnxruntime-genai\u002Fpypi\u002Fsimple\u002F\n如果找不到包，请检查索引 URL 是否正确以及网络连通性。",{"id":159,"question_zh":160,"answer_zh":161,"source_url":153},24450,"部署量化后的 CUDA 
模型时，是否可以删除其他架构（如 CPU、DirectML）的模型文件以节省空间？","是的，您可以只保留需要的模型文件夹。例如，如果您只使用 cuda-int4-rtn-block-32 版本的模型，只需保留 Phi-3-mini-4k-instruct-onnx\u002Fcuda\u002Fcuda-int4-rtn-block-32 文件夹，可以安全删除 cpu_and_mobile、directml 或其他 cuda-fp16 文件夹来减少部署体积。",{"id":163,"question_zh":164,"answer_zh":165,"source_url":166},24451,"Phi-3 Vision 模型对输入图像的文件大小有限制吗？最大分辨率是多少？","虽然官方文档可能未明确列出文件大小限制（KB），但模型主要关注图像分辨率。已知支持的分辨率包括 1366x1366。建议控制图像尺寸在模型训练时的分辨率范围内以获得最佳效果，过大的文件或分辨率可能导致处理失败或性能下降。","https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fissues\u002F571",{"id":168,"question_zh":169,"answer_zh":170,"source_url":166},24452,"是否有同时支持 CUDA、DirectML 和 CPU 的单一 NuGet 包用于 .NET\u002FC# 开发？","目前通常需要针对不同的执行提供者（Execution Provider）选择特定的包或配置。维护者正在评估提供统一包的可能性，但在当前版本中，建议根据目标运行环境选择对应的包（如专门针对 DirectML 的包），或者在应用层面动态加载不同的后端。",{"id":172,"question_zh":173,"answer_zh":174,"source_url":166},24453,"onnxruntime_genai.models.builder 工具现在支持构建视觉（Vision）模型吗？","截至该 Issue 讨论时，对 Vision 模型的支持仍在完善中。建议查看最新的官方文档或 Hugging Face 仓库以确认 builder 工具是否已更新支持 Phi-3 Vision 等多模态模型的转换和构建。",[176,181,186,191,196,201,206,211,216,221,226,231,236,241,246,251,256,261,266,271],{"id":177,"version":178,"summary_zh":179,"released_at":180},153988,"v0.12.2","- [在 0.12.0 版本发布后更新示例](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1980)\n- [为 ChatGLM3 输出层添加缺失的 Quark 0.11 权重模式](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1983)\n- [在 qwen.py 中支持 Qwen2.5-VL 的预量化模型](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1985)\n- [修复使用多个提示时批次响应不正确的问题](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1986)\n- [加强整个代码库中的 CUDA 错误检查](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1994)\n- [允许在预填充阶段使用剪枝模型](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1995)\n- 
[在剪枝预填充后添加少量改动](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F2000)","2026-03-27T17:49:15",{"id":182,"version":183,"summary_zh":184,"released_at":185},153989,"v0.12.1","- https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1988\n- https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1984","2026-03-02T23:22:46",{"id":187,"version":188,"summary_zh":189,"released_at":190},153990,"v0.12.0","## 变更内容\n* 在 @kunal-vaishnavi 的 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1867 中，创建 0.11.0 分支后更新版本号。\n* @kunal-vaishnavi 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1870 中修复了连续解码中的指导使用问题。\n* @kunal-vaishnavi 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1871 中修复了 HelloPhi C# 示例。\n* @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1875 中修复了正则表达式问题。\n* @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1874 中更新了扩展提交。\n* @xiaofeihan1 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1878 中撤销了移除 eps_without_if_support 的操作。\n* @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1880 中修复了 NPU 的条件判断问题。\n* @tianleiwu 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1862 中重构了模型构建器。\n* @tianleiwu 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1884 中添加了 lintrunner 以格式化代码。\n* @xkszltl 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1883 中移除了空子模块的残留文件。\n* @jaeyoonjung 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1888 中修复了因缺少 RTLD_DI_ORIGIN 支持而导致的构建问题。\n* @qjia7 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1848 中启用了 WebGPU 的图捕获功能。\n* 
@jixiongdeng 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1885 中实现了通用的共享 emb_tokens\u002Flm_head 实现。\n* @Honry 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1886 中修复了 Squeeze 操作中获取 total_seq_len 值的 bug。\n* @jixiongdeng 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1893 中新增了 extra_options `disable_qkv_fusion`，用于将 qkv_projs 与上游选择解耦。\n* @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1904 中修复了 Mac 上的流水线问题。\n* @RyanMetcalfeInt8 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1857 中为 Whisper 流水线增加了一种变体，其中编码器和解码器均为有状态模式。\n* @tianleiwu 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1882 中添加了 Qwen2_5_VLTextModel 的模型构建器。\n* @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1902 中集成了 FARA-7B 模型。\n* @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1861 中修复了 gpt-oss 模型的导出问题。\n* @RyanMetcalfeInt8 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1900 中为 OpenVINO 添加了通过 'cache_dir' 提供者选项支持模型缓存的功能。\n* @chrisdMSFT 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1907 中移除了 Microsoft.WindowsAppSDK.ML 的包含性范围检查。\n* @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1908 中以文本模式运行模型。\n* @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1914 中更新了扩展提交。\n* @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1915 中修复了 gpt-oss 的导出问题。\n* @xia 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1916 中增加了对 Olive 新的 uint8 量化格式的支持。","2026-02-13T17:38:22",{"id":192,"version":193,"summary_zh":194,"released_at":195},153991,"v0.11.4","## 变更内容\n* 
WinML - 由 @chrisdMSFT 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1907 中移除了包含 Microsoft.WindowsAppSDK.ML 的范围检查\n* 以文本模式运行模型 - 由 @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1908 中完成\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fcompare\u002Fv0.11.3...v0.11.4","2025-12-12T05:23:01",{"id":197,"version":198,"summary_zh":199,"released_at":200},153992,"v0.11.3","## 变更内容\n* 模型构建器重构，由 @tianleiwu 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1862 中完成\n* 添加 lintrunner 用于代码格式化，由 @tianleiwu 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1884 中完成\n* 移除空子模块残留，由 @xkszltl 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1883 中完成\n* 修复缺少 RTLD_DI_ORIGIN 支持导致的构建问题，由 @jaeyoonjung 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1888 中完成\n* 为 WebGPU 启用图捕获功能，由 @qjia7 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1848 中完成\n* 实现通用的共享 emb_tokens\u002Flm_head 实现，由 @jixiongdeng 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1885 中完成\n* 修复 Squeeze 操作中获取 total_seq_len 值的 bug，由 @Honry 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1886 中完成\n* 增加 extra_options disable_qkv_fusion 选项，以解除 qkv_projs 与上游选择的绑定关系，由 @jixiongdeng 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1893 中完成\n* 修复 macOS 管道问题，由 @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1904 中完成\n* Whisper：支持一种编码器\u002F解码器有状态的 Whisper 管道变体，由 @RyanMetcalfeInt8 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1857 中完成\n* 添加 Qwen2_5_VLTextModel 的模型构建器，由 @tianleiwu 在 
https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1882 中完成\n* 集成 FARA-7B 模型，由 @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1902 中完成\n* 将版本号设置为 0.11.3，由 @kunal-vaishnavi 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1905 中完成\n\n## 新贡献者\n* @xkszltl 在 #1883 中完成了首次贡献\n* @jaeyoonjung 在 #1888 中完成了首次贡献\n* @jixiongdeng 在 #1885 中完成了首次贡献\n* @Honry 在 #1886 中完成了首次贡献\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fcompare\u002Fv0.11.2...v0.11.3","2025-12-08T20:23:11",{"id":202,"version":203,"summary_zh":204,"released_at":205},153993,"v0.11.2","## 变更内容\n* 恢复 @xiaofeihan1 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1878 中移除的 eps_without_if_support\n* 修复 NPU 的条件判断，由 @apsonawane 完成，详见 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1880\n* 将版本号设置为 0.11.2，由 @kunal-vaishnavi 完成，详见 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1881\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fcompare\u002Fv0.11.1...v0.11.2","2025-11-18T12:53:55",{"id":207,"version":208,"summary_zh":209,"released_at":210},153994,"v0.11.1","## 变更内容\n* @kunal-vaishnavi 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1872 中将引导修复合入 0.11.1 版本\n* @kunal-vaishnavi 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1873 中将版本号设置为 0.11.1\n* @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1876 中修复了正则表达式\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fcompare\u002Fv0.11.0...v0.11.1","2025-11-17T03:39:36",{"id":212,"version":213,"summary_zh":214,"released_at":215},153995,"v0.11.0","## 变更内容\n* ADO - 更新 WinML 构建管道，由 @chrisdMSFT 在 
https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1768 中完成\n* 修复 CMakeLists.txt 自动检测库目录的问题，由 @anujj 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1774 中完成\n* 修复 new\u002Fdelete 重载问题，并在 Windows 上启用 CUDA 内核测试，由 @tianleiwu 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1772 中完成\n* 为 TensorRT RTX EP 使用缩写，由 @kunal-vaishnavi 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1763 中完成\n* 向模型构建器添加信任远程代码选项，由 @kunal-vaishnavi 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1766 中完成\n* 支持 qmoe 操作中的分块量化，由 @apsonawane 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1746 中完成\n* 更改 TRT-RTX EP 的状态，由 @gaugarg-nv 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1780 中完成\n* 将 rel 0.10.0 中的更改 cherry-pick 回主分支，由 @chrisdMSFT 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1782 中完成\n* 修复交叉编译时 \u002FCETCOMPAT 的使用问题，由 @sayanshaw24 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1779 中完成\n* 提供改进型 TopK 内核的分布式版本，由 @hariharans29 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1710 中完成\n* [TRT-RTX] 禁用 Phi 模型的 KV 缓存重新计算，由 @gaugarg-nv 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1787 中完成\n* [CUDA] 添加高性能 Top-K 内核及在线基准测试，由 @tianleiwu 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1748 中完成\n* 将共享索引数组类型由 float 改为 int，由 @hariharans29 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1789 中完成\n* 启用 bfloat16 多模态模型，由 @kunal-vaishnavi 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1786 中完成\n* 在处理提示时禁用 lmhead，由 @qti-ashimaj 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1762 
中完成\n* 引入对动态批处理的支持，由 @baijumeswani 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1662 中完成\n* 生成 pyd 类型信息，由 @chemwolf6922 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1742 中完成\n* 在 C 示例中添加 trt-rtx c 包，由 @anujj 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1794 中完成\n* [CUDA] 修复使用 CUDA >= 12.9 构建时的问题，由 @tianleiwu 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1802 中完成\n* [CUDA] topk 内核 v2，由 @tianleiwu 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1798 中完成\n* 为 NvTensorRtRtx 和 CUDA 提供者添加预填充分块支持，由 @anujj 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1765 中完成\n* 添加 TRT-RTX EP 支持，将 NvTensorRtRtx 保留为面向用户的名称，并强制执行 QDQ，由 @anujj 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1791 中完成\n* [CUDA] 添加静态断言以抑制 Windows 构建警告，由 @tianleiwu 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1804 中完成\n* 撤销“生成 pyd 类型信息”的更改，由 @baijumeswani 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1805 中完成\n* [QNN] 支持 co","2025-11-14T02:51:44",{"id":217,"version":218,"summary_zh":219,"released_at":220},153996,"v0.10.0","## 变更内容\n* 由 @anujj 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1697 中为 NvTensorRtRtx EP 启用连续解码功能\n* 由 @sayanshaw24 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1722 中使用带有 `skip_special_tokens` 参数的更新版 Decoder API\n* 由 @baijumeswani 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1724 中更新扩展组件，包含内存泄漏修复\n* 由 @jiafatom 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1723 中支持 Whisper 示例的批处理功能\n* 由 @baijumeswani 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1725 中更新 
onnxruntime_extensions 依赖版本\n* 由 @baijumeswani 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1727 中将 C++ 头文件纳入原生 NuGet 包，并修复编译器警告\n* 由 @rogerbarreto 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1689 中将 Microsoft.Extensions.AI 更新至 9.8.0 版本\n* 由 @sayanshaw24 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1730 中更新 Extensions 提交，修复 Qwen 2.5 对话模板工具问题\n* 由 @sayanshaw24 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1735 中更新 Whisper 截断扩展组件提交\n* 由 @anujj 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1734 中默认启用 TensorRtRtx 的 CUDA 图功能\n* 由 @tianleiwu 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1729 中更新采样基准测试\n* 由 @chrisdMSFT 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1740 中添加 Windows WinML x64 构建工作流\n* 由 @anujj 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1733 中修复 ORT-GenAI 和 TRT-RTX 推理之间的 CUDA 同步问题\n* 由 @chrisdMSFT 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1711 中实现 Hello WindowsML 功能\n* 由 @tianleiwu 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1732 中对 [CUDA] 采样内核进行优化改进\n* 由 @snnn 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1749 中将 GitHub Actions 更新至最新版本\n* 由 @nieubank 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1750 中将 WinML 版本更新至 1.8.2091\n* 由 @baijumeswani 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1747 中解决 macOS 打包流水线问题\n* 由 @vortex-captain 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1744 中实现 ProviderOptions 级别的设备筛选，以及用于配置模型级别设备筛选的 API\n* 由 @kunal-vaishnavi 在 
https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1751 中修复 Phi-4 mm 分词中的字符串索引错误\n* 由 @gaugarg-nv 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1754 中修复 TRT-RTX EP 的回归问题\n* 由 @kunal-vaishnavi 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1753 中修正 C API 头文件中的拼写错误\n* 由 @chrisdMSFT 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1755 中在 ADO 流水线中默认启用 WinML\n* 由 @baijumeswani 在 https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1757 中将默认构建配置更改为 'relwithdebinfo'\n* 锁定 cmake 和 vcpkg 的版本","2025-10-10T17:26:47",{"id":222,"version":223,"summary_zh":224,"released_at":225},153997,"v0.9.2","本次发布修复了 Phi-4 多模态模型的预处理错误。\n\n**完整更新日志**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fcompare\u002Fv0.9.1...v0.9.2","2025-09-16T07:27:58",{"id":227,"version":228,"summary_zh":229,"released_at":230},153998,"v0.9.1","🚀 Features\r\n\r\nSupport for Continuous Batching (#1580) by @baijumeswani\r\nRegisterExecutionProviderLibrary (#1628) by @vortex-captain\r\nEnable CUDA graph for LLMs for NvTensorRtRtx EP (#1645) by @anujj\r\nAdd support for smollm3 (#1666) by @xenova\r\nAdd OpenAI's gpt-oss to ONNX Runtime GenAI (#1678) by @kunal-vaishnavi\r\nAdd custom ops library path resolution using EP metadata (#1707) by @psakhamoori\r\nUse OnnxRuntime API wrapper for EP device operations (#1719) by @psakhamoori\r\n\r\n\r\n🛠 Improvements\r\n\r\nUpdate Extensions Commit to Support Strft Custom Function for Chat Template (#1670) by @sayanshaw24\r\nAdd parameters to chat template in chat example (#1673) by @kunal-vaishnavi\r\nUpdate how Hugging Face's config files are processed (#1693) by @kunal-vaishnavi\r\nTie embedding weight sharing (#1690) by @jiafatom\r\nImprove top-k sampling CUDA kernel (#1708) by @gaugarg-nv\r\n\r\n\r\n🐛 Bug Fixes\r\n\r\nFix accessing final norm for Gemma-3 models (#1687) by 
@kunal-vaishnavi\r\nFix runtime bugs with multi-modal models (#1701) by @kunal-vaishnavi\r\nFix BF16 CUDA version of OpenAI's gpt-oss (#1706) by @kunal-vaishnavi\r\nFix benchmark_e2e (#1702) by @jiafatom\r\nFix benchmark_multimodal (#1714) by @jiafatom\r\nFix pad vs. eos token misidentification (#1694) by @aciddelgado\r\n\r\n\r\n⚡ Performance & EP Enhancements\r\n\r\nNvTensorRtRtx: Support num_beam > 1 (#1688) by @anujj\r\nNvTensorRtRtx: Skip if node of Phi4 models (#1696) by @anujj\r\nRemove QDQ and Opset Coupling for TRT RTX EP (#1692) by @xiaoyu-work\r\n\r\n\r\n🔒 Build & CI\r\n\r\nEnable Security Protocols in MSVC for BinSkim (#1672) by @sayanshaw24\r\nExplicitly specify setup-java architecture in win-cpu-arm64-build.yml (#1685) by @edgchen1\r\nUse dotnet instead of nuget in mac build (#1717) by @natke\r\n\r\n\r\n📦 Versioning & Release\r\n\r\nUpdate version to 0.10.0 (#1676) by @ajindal1\r\nCherrypick 0: Forgot to change versions (#1721) by @aciddelgado\r\nCherrypick 1... Becomes RC1 (#1726) by @aciddelgado\r\nCherrypick 2 (#1743) by @aciddelgado\r\n\r\n\r\n🙌 New Contributors\r\n\r\n@xiaoyu-work (#1692)\r\n@psakhamoori (#1707)\r\n\r\n\r\n✅ Full Changelog: v0.9.0...v0.9.1","2025-09-09T22:53:00",{"id":232,"version":233,"summary_zh":234,"released_at":235},153999,"v0.9.0","## What's Changed\r\n\r\n### New Features\r\n* Constrained decoding integration by @ajindal1 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1381\r\n* Update constrained decoding by @ajindal1 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1477\r\n* Enable TRT multi profile option though provider option   by @anujj in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1493\r\n* Add support for Machine Translation model by @apsonawane in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1482\r\n* Overlap prompt processing KV cache update for WindowedKeyValueCache 
in DecoderOnlyPipelineState by @edgchen1 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1526\r\n* Add basic support for tracing by @edgchen1 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1524\r\n* Logging SetLogCallback + Debugging cleanup by @RyanUnderhill in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1471\r\n* Support loading models from memory by @baijumeswani in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1571\r\n* Add SLM Engine support function calling by @kinfey in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1582\r\n* Pass the batch_size thought the Overlay  by @anujj in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1627\r\n* Enable GPU based sampling for TRT-RTX by @gaugarg-nv in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1650\r\n\r\n### Model Builder Changes\r\n* Whisper Redesigned Solution by @kunal-vaishnavi in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1229\r\n* [Builder] Add support for Olive quantized models by @jambayk in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1647\r\n* Add Qwen3 to model builder by @xenova in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1428\r\n* Model builder: Add ability to exclude a node from quantization by @sushraja-msft in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1436\r\n* Support k_quant in model builder by @jiafatom in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1444\r\n* Add final norm for LoRA models by @kunal-vaishnavi in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1446\r\n* Add bfloat16 support in model builder by 
@kunal-vaishnavi in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1447\r\n* Fix accuracy issues with Gemma models by @kunal-vaishnavi in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1448\r\n* Always cast bf16 logits to fp32 by @nenad1002 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1479\r\n* NvTensorRtRtx EP option in GenAI - model builder by @BLSharda in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1453\r\n* Add Gemma3 Model support for NvTensorRtRtx execution provider by @anujj in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1520\r\n* Use IRv10 in the model builder by @justinchuby in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1547\r\n* [Builder] Rename methods make_value and make_initializer by @justinchuby in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1554\r\n* Always use opset21 in builder by @justinchuby in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1548\r\n* Clamp KV Cache Size to Sliding Window for NvTensorRtRtx EP by @BLSharda in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1523\r\n* [Builder] Fix output name in make_rotary_embedding_multi_cache by @justinchuby in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1562\r\n* [Builder] Use lazy tensor by @justinchuby in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1556\r\n* [Builder] Fix KeyError for torch.uint8 in dtype mapping for MoE quantization by @Copilot in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1561\r\n* [Builder] Fix 1d constant creation by @justinchuby in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1568\r\n* [Builder] 
Create progress bar by @justinchuby in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1559\r\n* [Builder] Use packed 4bit tensors directly by @justinchuby in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1566\r\n* [Builder] Simplify constant creation by @justinchuby in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1569\r\n* [Builder] Add cuda-bfloat16 entry to valid_gqa_configurations by @justinchuby in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1585\r\n* [Builder] use dtype conversion helpers from onnx_ir by @justinchuby in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1587\r\n* [Model builder] Add support for Ernie 4.5 models by @xenova in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1608\r\n* whisper: Allow session options to be used for encoder by @RyanMetcalfeInt8 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1622\r\n* Make default top_k=50 in model builder by @jiafatom in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1642\r\n* Update builder.py by @lnigam in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1665\r\n* Change IO dtype for INT4 CUDA models by @kunal-vaishnavi in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1629\r\n\r\n### Bug fixes\r\n* CUDA Top K \u002F Top P Fixes by @aciddelgado in https:\u002F\u002Fgithub.com\u002Fmi","2025-08-06T17:30:57",{"id":237,"version":238,"summary_zh":239,"released_at":240},154000,"v0.8.3","This release addresses regressions with DML.\r\n\r\nFixes include:\r\n\r\n- #1578 @aciddelgado \r\n- #1590 @baijumeswani ","2025-07-03T20:37:56",{"id":242,"version":243,"summary_zh":244,"released_at":245},154001,"v0.8.2","## What's changed\r\n\r\n### New features\r\n* Use 
Accuracy level 4 for webgpu by default by [@guschmue](https:\u002F\u002Fgithub.com\u002Fguschmue) ([#1474](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1474))\r\n* Enable guidance by default on macos by [@ajindal1](https:\u002F\u002Fgithub.com\u002Fajindal1) ([#1514](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1514))\r\n\r\n### Bug fixes\r\n* Remove position_id and fix context phase KV shapes for in-place cache buffer support by [@anujj](https:\u002F\u002Fgithub.com\u002Fanujj) ([#1505](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1505))\r\n* Update Extensions Commit for 0.8.2 by [@sayanshaw24](https:\u002F\u002Fgithub.com\u002Fsayanshaw24) ([#1519](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1519))\r\n* Update Extensions Commit for another DeepSeek Fix by [@sayanshaw24](https:\u002F\u002Fgithub.com\u002Fsayanshaw24) ([#1521](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1521))\r\n\r\n### Packaging and testing\r\n* Update triggers by [@snnn](https:\u002F\u002Fgithub.com\u002Fsnnn) ([#1490](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1490))\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fcompare\u002Fv0.8.1...v0.8.2","2025-06-05T23:03:26",{"id":247,"version":248,"summary_zh":249,"released_at":250},154002,"v0.8.1","## What's changed\r\n\r\n### New features\r\n- Integrate tools input into Chat Template API by [@sayanshaw24](https:\u002F\u002Fgithub.com\u002Fsayanshaw24) ([#1472](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1472))\r\n* NvTensorRtRtx EP option in GenAI - model builder by [@BLSharda](https:\u002F\u002Fgithub.com\u002FBLSharda) ([#1453](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1453))\r\n* Enable TRT multi 
profile option though provider option by [@anujj](https:\u002F\u002Fgithub.com\u002Fanujj) ([#1493](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1493))\r\n\r\n### Bug fixes\r\n* Always cast bf16 logits to fp32 by [@nenad1002](https:\u002F\u002Fgithub.com\u002Fnenad1002) ([#1479](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1479))\r\n\r\n### Examples and documentation\r\n* Update Chat Template Examples for Tools API change by [@sayanshaw24](https:\u002F\u002Fgithub.com\u002Fsayanshaw24) ([#1506](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1506))\r\n* Fix model chat example for rewind by [@ajindal1](https:\u002F\u002Fgithub.com\u002Fajindal1) ([#1480](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1480))\r\n\r\n### Model builder changes\r\n* Fix from pretrained method for quantized models by [@kunal-vaishnavi](https:\u002F\u002Fgithub.com\u002Fkunal-vaishnavi) ([#1503](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1503))\r\n* Fix missing parameter name by [@xadupre](https:\u002F\u002Fgithub.com\u002Fxadupre) ([#1502](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1502))\r\n* minor change to support qwen3 by [@guschmue](https:\u002F\u002Fgithub.com\u002Fguschmue) ([#1499](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1499))\r\n* Fix how torch tensors are saved by [@kunal-vaishnavi](https:\u002F\u002Fgithub.com\u002Fkunal-vaishnavi) ([#1476](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1476))\r\n* Support k_quant in model builder by [@jiafatom](https:\u002F\u002Fgithub.com\u002Fjiafatom) ([#1444](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1444))\r\n\r\n### Dependency updates\r\n- Update to stable release of Microsoft.Extensions.AI.Abstractions 
by [@stephentoub](https:\u002F\u002Fgithub.com\u002Fstephentoub) ([#1489](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1489))\r\n- Update to M.E.AI 9.4.3-preview.1.25230.7 by [@stephentoub](https:\u002F\u002Fgithub.com\u002Fstephentoub) ([#1443](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1443))\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fcompare\u002Fv0.8.0...v0.8.1","2025-05-30T22:14:29",{"id":252,"version":253,"summary_zh":254,"released_at":255},154003,"v0.8.0","## What's Changed\r\n\r\n### New Features\r\n* Add Chat Template API Changes by @sayanshaw24 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1398\r\n* Add Python and C# bindings for Chat Template API by @sayanshaw24 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1411\r\n* Support for gemma3 model by @baijumeswani in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1374\r\n* Support more QNN models with different model structures by @baijumeswani in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1322\r\n* Add ability to load audio from bytes, to match images API by @RyanUnderhill in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1304\r\n* Add support for DML Graph Capture to improve speed by @aciddelgado in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1305\r\n* Added OnnxRuntimeGenAIChatClient ctor with Config. 
by @azchohfi in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1364\r\n* Extensible AppendExecutionProvider and expose OrtSessionOptions::AddConfigEntry directly by @RyanUnderhill in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1384\r\n* OpenVINO: Model Managed KVCache by @RyanMetcalfeInt8 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1399\r\n* Changes how the device OrtAllocators work, use a global OrtSession instead by @RyanUnderhill in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1378\r\n* Remove audio attention mask processing and update ort-extensions by @baijumeswani in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1319\r\n* Simplify the C API definitions and prevent any type mismatches going forward by @RyanUnderhill in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1365\r\n\r\n### Model builder updates\r\n* Quark Quantizer Support by @shobrienDMA in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1207\r\n* Add Gemma 3 to model builder by @kunal-vaishnavi in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1359\r\n* Initial support for VitisAI EP by @AnanyaA-9 in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1370\r\n* [OVEP] feat: Adding OpenVINO EP in ORT-GenAI by @ankitm3k in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1389\r\n* Initial support for NV EP by @BLSharda in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1404\r\n* Adapt to MatMulNBitsQuantizer in ort by @jiafatom in https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime-genai\u002Fpull\u002F1426\r\n* Fix LM head for Gemma-2 by @kunal-vaishnavi in 
https://github.com/microsoft/onnxruntime-genai/pull/1420

### Bug Fixes
* Fix mismatch in Java bindings by @CaptainIRS in https://github.com/microsoft/onnxruntime-genai/pull/1307
* Fix type mismatch in Java bindings by @CaptainIRS in https://github.com/microsoft/onnxruntime-genai/pull/1313
* Update ort-extensions to fix a tokenizer bug for Phi-4 by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/1331
* Windows: show more useful DLL load errors that state exactly which DLL is missing by @RyanUnderhill in https://github.com/microsoft/onnxruntime-genai/pull/1345
* Deprecate graph capture by @aciddelgado in https://github.com/microsoft/onnxruntime-genai/pull/1338
* Support load/unload of models to avoid QNN errors on DeepSeek R1 1.5B by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/1346
* Add missing `value_stats` to the logging API and fix a wrong default by @RyanUnderhill in https://github.com/microsoft/onnxruntime-genai/pull/1353
* Convert tokens to a list for concatenation by @ajindal1 in https://github.com/microsoft/onnxruntime-genai/pull/1358
* Improve and fix TopKTopP by @jiafatom in https://github.com/microsoft/onnxruntime-genai/pull/1363
* Switch the order of softmax in CPU Top K by @aciddelgado in https://github.com/microsoft/onnxruntime-genai/pull/1354
* Update pybind, fix rpath on macOS, and check for nullptr by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/1367
* Iterate over the providers by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/1486
* Correctly iterate over the providers to check whether graph capture is enabled by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/1487

### Examples and Documentation
* Update README.md by @RyanUnderhill in https://github.com/microsoft/onnxruntime-genai/pull/1372
* Add an SLM engine example by @avijit-chakroborty in https://github.com/microsoft/onnxruntime-genai/pull/1242
* Add cancellation to the streaming method of OnnxRuntimeGenAIChatClient by @azchohfi in https://github.com/microsoft/onnxruntime-genai/pull/1289
* Update the NuGet README with the latest API by @natke in https://github.com/microsoft/onnxruntime-genai/pull/1326
* Update the C examples downloads by @ajindal1 in https://github.com/microsoft/onnxruntime-genai/pull/1332
* Add a Q&A test example to the nightly build by @ajindal1 in https://github.com/microsoft/onnxruntime-genai/pull/1277
* docs: update the slm_engine doc to ensure consistency with the code by @dennis2030 in https://github.com/microsoft/onnxruntime-genai/pull/1386
* C++ and Python samples: follow

Released 2025-05-30.

## v0.7.1 (released 2025-04-22)

- Add AMD Quark quantizer support #1207
- Add Gemma 3 to the model builder #1359
- Update the Phi-3 Python Q&A example to be consistent with the C++ example #1392
- Update Microsoft.Extensions.AI.Abstractions to 9.4.0-preview.1.25207.5 #1388
- Add an OnnxRuntimeGenAIChatClient constructor that takes a Config #1364
- Improve and fix TopKTopP #1363
- Switch the order of softmax in CPU Top K #1354
- Update the custom NuGet packaging logic #1377
- Update pybind, fix rpath on macOS, and check for nullptr #1367
- Convert tokens to a list for concatenation, to accommodate a breaking API change in the tokenizer #1358

## v0.7.0 (released 2025-03-28)

We are excited to announce the release of `onnxruntime-genai` version 0.7.0. Key updates in this release:

1. Support for a wider variety of QNN NPU models (such as DeepSeek R1).
2. Removal of the `onnxruntime-genai` static library. All language bindings now interface with `onnxruntime-genai` through the shared library.
    - All return types from the `onnxruntime-genai` Python package are now numpy arrays.
    - Previously, `tokenizer.encode` returned a Python list. This broke [examples/python/model-qa.py](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-qa.py), which used `+` to concatenate two lists; `np.concatenate` must be used instead in these cases.
3. Execution-provider-specific code is abstracted into shared libraries of its own (for example, `onnxruntime-genai-cuda` for CUDA and `onnxruntime-genai-dml` for DML). This allows the `onnxruntime-genai-cuda` package, as an example, to also work on non-CUDA machines.
4. Support for multi-modal models (text, speech, and vision) such as Phi-4 multimodal.
5. An IChatClient implementation in the `onnxruntime-genai` C# bindings.
6. The model type is now exposed through the Python bindings.
7. Code and performance improvements for the DML EP.

This release also includes several bug fixes that resolve issues reported by users.

## v0.6.0 (released 2025-02-14)

We are excited to announce the release of `onnxruntime-genai` version 0.6.0. Key updates in this release:

1. Support for contextual (continuous) decoding, which allows users to carry out multi-turn, conversation-style generation.
2. Support for new models such as DeepSeek R1, AMD OLMo, IBM Granite, and others.
3. Python 3.13 wheels.
4. Support for generation with models sourced from [Qualcomm's AI Hub](https://aihub.qualcomm.com/mobile/models). This work also includes publishing a NuGet package, `Microsoft.ML.OnnxRuntimeGenAI.QNN`, for the QNN EP.
5. Support for the WebGPU EP.

This release also includes performance improvements that optimize memory usage and speed, as well as several bug fixes that resolve issues reported by users.

## v0.5.2 (released 2024-11-26)

Patch release 0.5.2 adds:

* Fixes for bugs #1074 and #1092 via PRs #1065 and #1070
* A fix for the NuGet sample in the package README to show correct disposal of objects
* Extra validation via PRs #1050 and #1066

Features in 0.5.0:

* Support for MultiLoRA
* Support for multi-frame input in the Phi-3 Vision and Phi-3.5 Vision models
* Support for the Phi-3 MoE model
* Support for the NVIDIA Nemotron model
* Support for the Qwen model
* A Set Terminate feature, which allows users to cancel generation midway
* Soft-capping support for Group Query Attention
* Quantization support extended to the embedding and LM head layers
* macOS support in published packages

### Known issues
* Models running with DirectML do not support batching
* Python 3.13 is not supported in this release
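
The breaking tokenizer change called out in the v0.7.0 notes (and the matching fix in #1358) can be illustrated with plain numpy. A minimal sketch, with made-up token values standing in for real `tokenizer.encode` output:

```python
import numpy as np

# Pre-0.7.0: tokenizer.encode returned a Python list, so a prompt could be
# extended with '+'. The token ids below are illustrative, not real output.
old_tokens = [101, 7592, 102]
follow_up = [2054, 102]
joined_old = old_tokens + follow_up  # list '+' concatenates

# 0.7.0+: encode returns a numpy array, where '+' means element-wise
# addition (or raises on a shape mismatch), so concatenation must be explicit:
new_tokens = np.array([101, 7592, 102])
joined_new = np.concatenate([new_tokens, np.array(follow_up)])
# joined_new now holds the same token sequence as joined_old
```

Code that previously relied on list semantics, such as the `model-qa.py` example, needs exactly this `np.concatenate` substitution when upgrading.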
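
The Top K fixes above (#1363, #1354) concern where the softmax falls relative to the Top K cut. As a sketch of the underlying idea only, not the library's internal code: filtering to the k largest logits first and then normalizing over the survivors yields the standard top-k sampling distribution. The function name and values below are illustrative.

```python
import numpy as np

def top_k_probs(logits: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest logits, then softmax over just those survivors.

    Applying softmax after the top-k cut renormalizes probability mass over
    the k retained entries; softmax-then-cut would leave an unnormalized
    distribution. Ties at the k-th value may retain extra entries.
    """
    kth = np.partition(logits, -k)[-k]             # value of the k-th largest logit
    masked = np.where(logits >= kth, logits, -np.inf)  # drop everything below it
    shifted = masked - masked.max()                # subtract max for numerical stability
    exp = np.exp(shifted)                          # exp(-inf) -> 0 for dropped entries
    return exp / exp.sum()
```

For example, `top_k_probs(np.array([1.0, 2.0, 3.0, 4.0]), 2)` assigns zero probability to the two smallest logits and splits all the mass between the top two.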