[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-openvinotoolkit--model_server":3,"tool-openvinotoolkit--model_server":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",154349,2,"2026-04-13T23:32:16",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108322,"2026-04-10T11:39:34",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 
人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":76,"owner_twitter":76,"owner_website":77,"owner_url":78,"languages":79,"stars":119,"forks":120,"last_commit_at":121,"license":122,"difficulty_score":10,"env_os":123,"env_gpu":124,"env_ram":125,"env_deps":126,"category_tags":134,"github_topics":135,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":148,"updated_at":149,"faqs":150,"releases":179},7381,"openvinotoolkit\u002Fmodel_server","model_server","A scalable inference server for models optimized with OpenVINO™","OpenVINO Model Server 是一个专为高效部署 AI 模型打造的高性能推理服务器，特别针对英特尔架构进行了深度优化。它核心解决了模型落地难的问题：通过将模型推理能力封装为标准网络服务，让应用程序无需关心底层硬件、框架或基础设施细节，即可通过 REST 或 gRPC 协议轻松调用。这不仅实现了客户端轻量化，还有效保护了模型权重与结构的安全，非常适合构建基于微服务的云原生应用（如 Kubernetes 环境）。\n\n该工具主要面向需要大规模部署 AI 模型的开发者、算法工程师及系统架构师。其独特亮点在于全面兼容 OpenAI、Cohere 等主流生成式 API 标准，支持大语言模型、图像生成、语音识别及文本重排序等多种任务，让迁移现有应用变得异常简单。此外，它还具备动态批处理、模型热更新、版本管理及原生 Windows 支持等高级特性，能够灵活应对从边缘设备到云端集群的各种复杂场景，帮助团队以更低的成本实现资源利用率最大化。","# OpenVINO&trade; Model Server\n\nModel Server hosts models and makes them accessible to software components over standard network protocols: a client sends a request to the model server, which performs model inference and sends a response back to the client. 
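For illustration, a minimal Python client sketch for such a request-response exchange (the model name \"resnet50\", input name \"input\", and REST port 8000 are assumptions matching the quick start used later on this page):\n\n```python\n# Hedged sketch: a single REST predict call against a running model server.\n# Adjust the model name, input name, and port to your deployment.\nimport requests\n\n# Placeholder payload: a real request needs preprocessed data matching\n# the served model's input shape.\npayload = {\"instances\": [{\"input\": [0.1, 0.2, 0.3]}]}\nresponse = requests.post(\n    \"http:\u002F\u002Flocalhost:8000\u002Fv1\u002Fmodels\u002Fresnet50:predict\",\n    json=payload,\n    timeout=30,\n)\nresponse.raise_for_status()\nprint(response.json())\n```\n\n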
Model Server offers many advantages for efficient model deployment:\n- Remote inference enables using lightweight clients with only the necessary functions to perform API calls to edge or cloud deployments.\n- Applications are independent of the model framework, hardware device, and infrastructure.\n- Client applications in any programming language that supports REST or gRPC calls can be used to run inference remotely on the model server.\n- Clients require fewer updates since client libraries change very rarely.\n- Model topology and weights are not exposed directly to client applications, making it easier to control access to the model.\n- Ideal architecture for microservices-based applications and deployments in cloud environments – including Kubernetes and OpenShift clusters.\n- Efficient resource utilization with horizontal and vertical inference scaling.\n\n![OVMS diagram](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenvinotoolkit_model_server_readme_20f07cd21e3f.png)\n\nOpenVINO&trade; Model Server (OVMS) is a high-performance system for serving models. It is implemented in C++ for scalability and optimized for deployment on Intel architectures. It exposes a [generative API](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_clients_genai.html) compatible with OpenAI and Cohere, as well as the [KServe](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_clients_kfs.html) and [TensorFlow Serving](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_clients_tfs.html) APIs, while applying OpenVINO for inference execution. The inference service is provided via gRPC or REST API, making it easy to deploy new algorithms and AI experiments.\n\n![OVMS picture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenvinotoolkit_model_server_readme_7a0b65e4a8b5.png)\n\nThe models used by the server can be stored locally, hosted remotely by object storage services or pulled from HuggingFace Hub. For more details, refer to [Preparing Model Repository](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_models_repository.html) and [Deployment](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_deploying_server.html) documentation.\nModel Server works inside Docker containers, on Bare Metal, and in Kubernetes environments.\n\nStart using OpenVINO Model Server with a fast-forward serving example from the [QuickStart guide](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_quick_start_guide.html) or [LLM QuickStart guide](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_llm_quickstart.html).\n\nRead [release notes](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Freleases) to find out what’s new.\n\n### Key features:\n- **[NEW]** [Speech Generation and Speech Recognition with OpenAI API](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_audio.html)\n- **[NEW]** [Support for AI agents](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_continuous_batching_agent.html)\n- **[NEW]** [Image generation compatible with OpenAI API](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_image_generation.html)\n- Native Windows support. 
Check updated [deployment guide](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_deploying_server_baremetal.html)\n- [Text Embeddings compatible with OpenAI API](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_embeddings.html)\n- [Reranking compatible with Cohere API](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_rerank.html)\n- [Efficient Text Generation via OpenAI API](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_continuous_batching.html)\n- [Python code execution](docs\u002Fpython_support\u002Freference.md)\n- [gRPC streaming](docs\u002Fstreaming_endpoints.md)\n- [MediaPipe graphs serving](docs\u002Fmediapipe.md)\n- Model management - including [model versioning](docs\u002Fmodel_version_policy.md) and [model updates in runtime](docs\u002Fonline_config_changes.md)\n- [Dynamic model inputs](docs\u002Fshape_batch_size_and_layout.md)\n- [Directed Acyclic Graph Scheduler](docs\u002Fdag_scheduler.md) along with [custom nodes in DAG pipelines](docs\u002Fcustom_node_development.md)\n- [Metrics](docs\u002Fmetrics.md) - metrics compatible with Prometheus standard\n- Support for multiple frameworks, such as TensorFlow, PaddlePaddle and ONNX\n- Support for [AI accelerators](.\u002Fdocs\u002Faccelerators.md)\n\nCheck full list of [features](.\u002Fdocs\u002Ffeatures.md)\n\n**Note:** OVMS has been tested on RedHat, Ubuntu and Windows. \nPublic docker images are stored in:\n- [Dockerhub](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fopenvino\u002Fmodel_server)\n- [RedHat Ecosystem Catalog](https:\u002F\u002Fcatalog.redhat.com\u002Fsoftware\u002Fcontainers\u002Fintel\u002Fopenvino-model-server\u002F607833052937385fc98515de)\nBinary packages for Linux and Windows are on [Github](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Freleases)\n\n## Run OpenVINO Model Server\n\nA demonstration on how to use OpenVINO Model Server can be found in our [quick-start guide for vision use case](docs\u002Fovms_quickstart.md) and [LLM text generation](docs\u002Fllm\u002Fquickstart.md).\n\nCheck also other instructions:\n\n[Preparing model repository](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_models_repository.html)\n\n[Deployment](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_deploying_server.html)\n\n[Writing client code](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_server_app.html)\n\n[Demos](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_demos.html)\n\n\n\n## References\n\n* [OpenVINO&trade;](https:\u002F\u002Fsoftware.intel.com\u002Fen-us\u002Fopenvino-toolkit)\n\n* [ADVANCING GENAI WITH CPU OPTIMIZATION](https:\u002F\u002Fcdrdv2-public.intel.com\u002F864404\u002FvFINAL_Intel%20SLM%20Whitepaper.pdf)\n\n* [Manage deep learning models with OpenVINO Model Server](https:\u002F\u002Fdevelopers.redhat.com\u002Farticles\u002F2024\u002F07\u002F03\u002Fmanage-deep-learning-models-openvino-model-server#)\n\n* [RAG building blocks made easy and affordable with OpenVINO Model Server](https:\u002F\u002Fmedium.com\u002Fopenvino-toolkit\u002Frag-building-blocks-made-easy-and-affordable-with-openvino-model-server-e7b03da5012b)\n\n* [Simple deployment with KServe API](https:\u002F\u002Fblog.openvino.ai\u002Fblog-posts\u002Fkserve-api)\n\n* [Benchmarking 
results](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fabout-openvino\u002Fperformance-benchmarks.html)\n\n\n## Contact\n\nIf you have a question, a feature request, or a bug report, feel free to submit a Github issue.\n\n\n---\n\\* Other names and brands may be claimed as the property of others.\n","# OpenVINO&trade; 模型服务器\n\n模型服务器用于托管模型，并通过标准网络协议使其可供软件组件访问：客户端向模型服务器发送请求，模型服务器执行模型推理并将响应返回给客户端。模型服务器为高效部署模型提供了诸多优势：\n- 远程推理允许使用仅具备必要功能的轻量级客户端，通过 API 调用与边缘或云端部署进行交互。\n- 应用程序与模型框架、硬件设备和基础设施无关。\n- 支持 REST 或 gRPC 调用的任何编程语言的客户端应用程序均可用于在模型服务器上远程执行推理。\n- 客户端所需的更新较少，因为客户端库很少发生变化。\n- 模型拓扑和权重不会直接暴露给客户端应用程序，从而更容易控制对模型的访问权限。\n- 是基于微服务的应用程序以及云环境（包括 Kubernetes 和 OpenShift 集群）部署的理想架构。\n- 通过水平和垂直推理扩展实现高效的资源利用。\n\n![OVMS 流程图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenvinotoolkit_model_server_readme_20f07cd21e3f.png)\n\nOpenVINO&trade; 模型服务器 (OVMS) 是一个高性能的模型推理服务系统。它采用 C++ 实现以支持可扩展性，并针对 Intel 架构进行了优化。OVMS 使用类似于 OpenAI 和 Cohere 的[生成式 API](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_clients_genai.html)、[KServe](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_clients_kfs.html) 和 [TensorFlow Serving](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_clients_tfs.html)，同时利用 OpenVINO 执行推理。推理服务通过 gRPC 或 REST API 提供，使得部署新算法和 AI 实验变得简单易行。\n\n![OVMS 高层架构图](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenvinotoolkit_model_server_readme_7a0b65e4a8b5.png)\n\n服务器所使用的模型可以存储在本地、由对象存储服务远程托管，或从 HuggingFace Hub 中拉取。更多详细信息，请参阅 [准备模型仓库](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_models_repository.html) 和 [部署](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_deploying_server.html) 文档。模型服务器可在 Docker 容器、裸金属服务器以及 Kubernetes 环境中运行。\n\n您可以通过 [快速入门指南](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_quick_start_guide.html) 或 [LLM 快速入门指南](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_llm_quickstart.html) 中的快速演示示例开始使用 OpenVINO 模型服务器。\n\n请阅读 [发布说明](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Freleases)，了解最新功能。\n\n### 核心特性：\n- **[新增]** [使用 OpenAI API 进行语音生成和语音识别](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_audio.html)\n- **[新增]** [支持 AI 代理](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_continuous_batching_agent.html)\n- **[新增]** [兼容 OpenAI API 的图像生成](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_image_generation.html)\n- 原生 Windows 支持。请查看更新后的 [部署指南](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_deploying_server_baremetal.html)\n- [兼容 OpenAI API 的文本嵌入](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_embeddings.html)\n- [兼容 Cohere API 的重排序](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_rerank.html)\n- [通过 OpenAI API 实现高效文本生成](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_continuous_batching.html)\n- [Python 代码执行](docs\u002Fpython_support\u002Freference.md)\n- [gRPC 流式传输](docs\u002Fstreaming_endpoints.md)\n- [MediaPipe 图谱服务](docs\u002Fmediapipe.md)\n- 模型管理——包括 [模型版本控制](docs\u002Fmodel_version_policy.md) 和 [运行时模型更新](docs\u002Fonline_config_changes.md)\n- [动态模型输入](docs\u002Fshape_batch_size_and_layout.md)\n- [有向无环图调度器](docs\u002Fdag_scheduler.md)，以及 [DAG 流水线中的自定义节点](docs\u002Fcustom_node_development.md)\n- [指标](docs\u002Fmetrics.md)——兼容 Prometheus 
标准的指标\n- 支持多种框架，如 TensorFlow、PaddlePaddle 和 ONNX\n- 支持 [AI 加速器](.\u002Fdocs\u002Faccelerators.md)\n\n完整特性列表请参阅 [特性文档](.\u002Fdocs\u002Ffeatures.md)。\n\n**注意：** OVMS 已在 RedHat、Ubuntu 和 Windows 上经过测试。公开的 Docker 镜像存储于：\n- [Dockerhub](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fopenvino\u002Fmodel_server)\n- [RedHat 生态系统目录](https:\u002F\u002Fcatalog.redhat.com\u002Fsoftware\u002Fcontainers\u002Fintel\u002Fopenvino-model-server\u002F607833052937385fc98515de)\nLinux 和 Windows 的二进制包可在 [Github](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Freleases) 上找到。\n\n## 运行 OpenVINO 模型服务器\n\n有关如何使用 OpenVINO 模型服务器的演示，请参阅我们的 [视觉用例快速入门指南](docs\u002Fovms_quickstart.md) 和 [LLM 文本生成快速入门指南](docs\u002Fllm\u002Fquickstart.md)。\n\n此外，请参考以下说明：\n- [准备模型仓库](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_models_repository.html)\n- [部署](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_deploying_server.html)\n- [编写客户端代码](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_server_app.html)\n- [演示](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_demos.html)\n\n## 参考文献\n* [OpenVINO&trade;](https:\u002F\u002Fsoftware.intel.com\u002Fen-us\u002Fopenvino-toolkit)\n* [通过 CPU 优化推动生成式 AI 发展](https:\u002F\u002Fcdrdv2-public.intel.com\u002F864404\u002FvFINAL_Intel%20SLM%20Whitepaper.pdf)\n* [使用 OpenVINO 模型服务器管理深度学习模型](https:\u002F\u002Fdevelopers.redhat.com\u002Farticles\u002F2024\u002F07\u002F03\u002Fmanage-deep-learning-models-openvino-model-server#)\n* [借助 OpenVINO 模型服务器轻松且经济地构建 RAG 组件](https:\u002F\u002Fmedium.com\u002Fopenvino-toolkit\u002Frag-building-blocks-made-easy-and-affordable-with-openvino-model-server-e7b03da5012b)\n* [使用 KServe API 简单部署](https:\u002F\u002Fblog.openvino.ai\u002Fblog-posts\u002Fkserve-api)\n* [基准测试结果](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fabout-openvino\u002Fperformance-benchmarks.html)\n\n## 联系方式\n如果您有任何问题、功能请求或错误报告，请随时提交 Github 问题。\n\n---\n\\* 其他名称和品牌可能属于其各自所有者。","# OpenVINO Model Server (OVMS) 快速上手指南\n\nOpenVINO Model Server (OVMS) 是一个高性能的模型服务系统，专为在 Intel 架构上部署深度学习模型而优化。它支持通过 gRPC 或 REST API 提供推理服务，兼容 OpenAI、KServe 和 TensorFlow Serving 等主流接口标准，适用于云原生、微服务及边缘计算场景。\n\n## 环境准备\n\n### 系统要求\n- **操作系统**：Ubuntu, RedHat Enterprise Linux (RHEL), 或 Windows (原生支持)。\n- **硬件架构**：Intel CPU (推荐)，支持多种 AI 加速器。\n- **运行环境**：\n  - **Docker** (推荐)：需安装 Docker Engine。\n  - **裸金属 (Bare Metal)**：直接运行官方提供的预编译二进制包。\n  - **Kubernetes**：适用于 K8s 或 OpenShift 集群部署。\n\n### 前置依赖\n- 若使用 Docker 方式，无需额外安装 OpenVINO Toolkit，镜像已包含所有依赖。\n- 若需从 HuggingFace Hub 拉取模型，请确保网络通畅（国内用户建议配置代理或使用镜像源）。\n- 客户端开发：任意支持 REST 或 gRPC 调用的编程语言环境（如 Python, C++, Java 等）。\n\n## 安装步骤\n\n推荐使用 Docker 容器化部署，这是最简便且隔离性最好的方式。\n\n### 1. 拉取官方 Docker 镜像\n从 Docker Hub 拉取最新稳定版镜像：\n\n```bash\ndocker pull openvino\u002Fmodel_server:latest\n```\n\n> **提示**：国内用户若拉取缓慢，可尝试配置 Docker 镜像加速器，或使用 RedHat Ecosystem Catalog 镜像（需注册账号）。\n\n### 2. 准备模型仓库\nOVMS 支持本地存储、对象存储或直接从 HuggingFace Hub 加载模型。以下以本地模型为例：\n\n假设你已将 OpenVINO 格式模型（`.xml` 和 `.bin` 文件）放置在本地目录 `\u002Fhome\u002Fuser\u002Fmodels\u002Fresnet50\u002F1`（其中 `1` 为版本号）。\n\n目录结构示例：\n```text\nmodels\u002F\n└── resnet50\u002F\n    └── 1\u002F\n        ├── model.xml\n        └── model.bin\n```\n\n### 3. 
启动模型服务\n使用 `docker run` 命令挂载模型目录并启动服务。以下示例将本地模型目录映射到容器内，并开放 gRPC (9000) 和 REST (8000) 端口：\n\n```bash\ndocker run -d --rm -p 8000:8000 -p 9000:9000 \\\n  -v \u002Fhome\u002Fuser\u002Fmodels:\u002Fmodels \\\n  openvino\u002Fmodel_server:latest \\\n  --model_path \u002Fmodels\u002Fresnet50 \\\n  --model_name resnet50 \\\n  --port 9000 \\\n  --rest_port 8000\n```\n\n*注：`--port` 为 gRPC 端口，`--rest_port` 为 REST 端口；`--model_path` 应指向包含版本号子目录的模型目录（而非具体版本文件夹）。若需加载多个模型或配置复杂策略，建议使用 JSON 配置文件并通过 `--config_path` 参数指定。*\n\n## 基本使用\n\n启动成功后，即可通过 REST API 或 gRPC 客户端发送推理请求。以下展示最简单的 REST API 调用示例。\n\n### 发送推理请求 (REST API)\n假设使用 `curl` 向上述启动的 ResNet-50 模型发送一张预处理后的图片数据（此处以占位符表示输入张量）：\n\n```bash\ncurl http:\u002F\u002Flocalhost:8000\u002Fv1\u002Fmodels\u002Fresnet50:predict \\\n  -d '{\n    \"instances\": [\n      {\"input\": [0.1, 0.2, 0.3, ...]} \n    ]\n  }'\n```\n\n*注意：实际使用时，`input` 数组应为经过预处理（归一化、Resize 等）并展平的图片像素数据，具体形状需参考模型的输入要求。*\n\n### 查看模型状态\n检查模型是否加载成功：\n\n```bash\ncurl http:\u002F\u002Flocalhost:8000\u002Fv1\u002Fmodels\u002Fresnet50\n```\n\n若返回包含 `\"version\": \"1\"` 和 `\"state\": \"AVAILABLE\"` 的 JSON 响应，则表示服务就绪。\n\n### 进阶场景提示\n- **LLM 文本生成**：OVMS 兼容 OpenAI API 格式，可直接使用标准的 OpenAI Python SDK 连接 OVMS 进行大模型推理。\n- **动态批处理**：支持在运行时调整 Batch Size 和输入形状，无需重启服务。\n- **多框架支持**：除 OpenVINO IR 格式外，还支持直接加载 TensorFlow SavedModel、ONNX 和 PaddlePaddle 模型。\n\n更多详细用例（如语音生成、图像生成、RAG 架构搭建等），请参考官方 [Demos](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_demos.html) 文档。","某电商团队正在构建一个支持多语言客服对话和实时商品图生成的智能中台，需同时服务 Web、App 及内部数据分析系统。\n\n### 没有 model_server 时\n- 各业务线需分别集成不同框架的推理代码，导致 Python、Go 和 Java 客户端维护成本极高，每次模型更新都需全员协同发布。\n- 直接暴露模型权重和拓扑结构给前端应用，存在核心算法被逆向工程或非法拷贝的安全隐患。\n- 面对大促流量洪峰，单体推理服务难以动态扩容，常因资源争抢导致响应延迟飙升甚至服务崩溃。\n- 硬件绑定严重，更换英特尔新代 CPU 或加速卡时，所有客户端驱动和依赖库均需重新编译适配。\n\n### 使用 model_server 后\n- 所有客户端仅需通过标准 REST 或 gRPC 接口调用，彻底解耦了业务逻辑与底层模型框架，新功能上线周期从周级缩短至小时级。\n- 模型文件集中托管在 server 端，客户端只发送请求数据，有效保护了核心资产，并实现了细粒度的访问控制。\n- 依托 Kubernetes 实现横向自动扩缩容，轻松应对突发流量，结合 OpenVINO 优化技术将推理吞吐量提升数倍。\n- 屏蔽了底层硬件差异，团队可无缝切换至最新的英特尔架构，而无需修改任何一行客户端代码。\n\nmodel_server 通过标准化的远程推理架构，让企业能够以更安全、弹性且高效的方式将 AI 能力规模化落地到复杂生产环境中。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fopenvinotoolkit_model_server_9d75b9a8.png","openvinotoolkit","OpenVINO™ Toolkit","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fopenvinotoolkit_3a5e7b58.png","",null,"https:\u002F\u002Fdocs.openvino.ai\u002F","https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit",[80,84,88,92,96,100,104,108,111,115],{"name":81,"color":82,"percentage":83},"C++","#f34b7d",89.1,{"name":85,"color":86,"percentage":87},"Python","#3572A5",4.3,{"name":89,"color":90,"percentage":91},"Starlark","#76d275",2.2,{"name":93,"color":94,"percentage":95},"Shell","#89e051",1,{"name":97,"color":98,"percentage":99},"Batchfile","#C1F12E",0.7,{"name":101,"color":102,"percentage":103},"Groovy","#4298b8",0.6,{"name":105,"color":106,"percentage":107},"C","#555555",0.5,{"name":109,"color":110,"percentage":107},"Jinja","#a52a22",{"name":112,"color":113,"percentage":114},"Makefile","#427819",0.4,{"name":116,"color":117,"percentage":118},"Go","#00ADD8",0.3,855,248,"2026-04-13T13:53:53","Apache-2.0","Linux (RedHat, Ubuntu), Windows","非必需（支持 CPU 推理）。若使用 AI 加速器，需参考官方文档，未明确指定具体 NVIDIA 显卡型号或 CUDA 版本要求。","未说明",{"notes":127,"python":128,"dependencies":129},"该工具主要作为 C++ 编写的高性能模型服务系统，推荐通过 Docker 容器、裸机（Bare Metal）或 Kubernetes 环境部署。支持从本地、对象存储或 HuggingFace Hub 加载模型。原生支持 Windows，已在 RedHat 和 Ubuntu 上测试。支持多种框架（TensorFlow, PaddlePaddle, ONNX）及 OpenAI\u002FCohere\u002FKServe 兼容 API。","未说明（核心服务由 C++ 实现，提供 Python 客户端支持）",[130,131,132,133],"Docker (推荐部署方式)","Kubernetes\u002FOpenShift (可选)","gRPC 或 
REST 客户端库","OpenVINO Toolkit",[14,13,16,15],[136,137,138,139,140,141,142,143,144,145,146,147],"openvino","inference","ai","edge","cloud","deep-learning","serving","dag","kubernetes","machine-learning","model-serving","genai","2026-03-27T02:49:30.150509","2026-04-14T12:35:38.050286",[151,156,161,166,171,175],{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},33124,"启动 OpenVINO Model Server Docker 容器时，如何正确设置模型路径和命令参数以避免连接错误？","确保挂载卷的路径与容器内的模型路径一致。如果本地文件夹结构为 `models\u002Fmodel1\u002F1\u002Ffacedetection.xml`，建议使用以下命令启动容器：\n`docker run -v $(pwd)\u002Fmodels:\u002Fmodels\u002F -p 9000:9000 openvino\u002Fmodel_server:latest --model_path \u002Fmodels\u002Fmodel1\u002F --model_name face-detection --port 9000 --shape auto`\n注意 `--model_path` 应指向包含版本号的父目录（如 `\u002Fmodels\u002Fmodel1\u002F`），而不是具体的版本号文件夹。","https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fissues\u002F441",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},33125,"OpenVINO Server 报错输入长度超过允许的 512，但模型配置文件显示支持更长上下文，如何解决？","这通常是由于 GPU 显存不足导致的，而非配置未生效。当输入较长或 Batch Size 较大时，显存需求会激增。解决方案包括：\n1. 减少其他应用程序的 RAM\u002FGPU 显存占用；\n2. 通过量化降低模型精度（如转为 Int8 或 Int4）；\n3. 减小推理时的 Batch Size。\n注意：在 WSL 环境下，Linux 可能只分配到主机一半的内存，且集成显卡（iGPU）也会占用部分系统内存，需特别留意可用资源。","https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fissues\u002F2923",{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},33126,"加载使用 AVX512 指令集的自定义 CPU 扩展库（cpu_extension）时出现段错误（Segmentation fault）怎么办？","该问题通常与基础 Docker 镜像的操作系统版本或编译器环境不兼容有关。建议尝试基于 Ubuntu 20.04 的开发镜像重新构建 OVMS。可以使用以下命令在 develop 分支上构建：\n`BASE_OS=ubuntu no_proxy=localhost,127.0.0.1 OVMS_CPP_DOCKER_IMAGE=OVMS_ubuntu_develop JOBS=24 make docker_build`\n构建完成后使用生成的本地镜像（如 `OVMS_ubuntu_develop-build`）运行服务，通常可解决因指令集兼容性导致的崩溃问题。","https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fissues\u002F842",{"id":167,"question_zh":168,"answer_zh":169,"source_url":170},33127,"启动服务器时报错 'Given config is invalid according to schema... Key: layout'，提示配置文件无效，原因是什么？","此错误表明 `config.json` 中包含了当前 OVMS 版本不支持的配置字段（例如 `layout`）。这通常是因为使用了旧版本的配置文件模板与新版本的服务器不兼容。解决方法是检查并移除配置文件中 `model_config` 下不被支持的属性（如 `layout`），或参考当前版本官方文档中的最新 `config.json` 示例格式进行修改。","https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fissues\u002F755",{"id":172,"question_zh":173,"answer_zh":174,"source_url":155},33128,"TensorFlow 相关的 'libcudart.so' 动态库加载警告是否会影响 OpenVINO Model Server 的正常运行？","通常不会影响。日志中出现的 `Could not load dynamic library 'libcudart.so.11.0'` 是 TensorFlow 后端尝试加载 CUDA 库时的警告。如果您没有配置 GPU 或者使用的是 CPU 模式进行推理，可以安全地忽略此错误。只有当您确实需要使用 GPU 加速时，才需要安装对应的 NVIDIA CUDA Toolkit 并确保库文件路径正确。",{"id":176,"question_zh":177,"answer_zh":178,"source_url":160},33129,"在 WSL (Windows Subsystem for Linux) 环境下运行 OpenVINO GPU 推理时遇到内存分配失败，可能是什么原因？","WSL 默认仅分配主机物理内存的一半给 Linux 子系统，而集成显卡（iGPU）又共享这部分系统内存，导致实际可用于模型推理的显存非常有限。如果遇到内存不足错误，建议：\n1. 修改 WSL 配置（`.wslconfig`）增加内存分配上限；\n2. 关闭主机上其他占用内存的应用；\n3. 
降低模型的推理批次大小（Batch Size）或使用低精度模型。",[180,185,190,195,200,205,210,215,220,225,230,235,240,245,250,255,260,265,270,275],{"id":181,"version":182,"summary_zh":183,"released_at":184},255290,"v2026.1","## 针对 Qwen3-MOE 模型和 gpt-oss-20b 的增强支持  \n这些模型现具备更出色的性能、准确性和强大的并发请求处理能力，并支持连续批处理功能。目前，这些模型已以预优化的 OpenVINO™ 格式直接在 [Hugging Face 模型库](https:\u002F\u002Fhuggingface.co\u002FOpenVINO) 上提供，部署极为便捷。请参阅相关演示了解使用方法：  \n- 与 [代理框架](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_continuous_batching_agent.html) 的集成  \n- 与 [Visual Studio Code](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_code_completion_vsc.html) 的集成  \n- 与 [OpenWebUI](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_integration_with_open_webui.html) 的集成  \n\n## 新增对 Qwen3-VL 的支持  \n该系列模型具备函数调用能力，使其能够在代理场景中发挥作用。上述演示中包含了使用示例。  \n\n## 扩展 `\u002Fimage` 端点，支持图像修复与扩展功能  \n现在可以同时传入输入图像和掩码，以编辑图像局部区域或扩展图像边界。  \n请参阅 [图像生成演示](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_image_generation.html)，了解如何使用这些功能。  \n\n## 其他改进与修复  \n- 服务器日志现会报告当前 KV 缓存分配情况以及使用指标。在动态缓存大小模式下（默认设置），缓存分配会根据请求并发数和已处理上下文长度，在运行时自动调整。  \n- 现已支持在 NPU 设备上取消生成请求；对于断开连接的客户端发起的请求，系统将自动取消。  \n- 当模型生成函数调用时，完成原因字段将返回 `tool_calls`，符合 OpenAI API 标准。  \n- 修正了在 NPU 执行时，文本生成最后流式事件中的 token 使用量报告问题。  \n- 增加了在首个 token 生成后立即发送的额外流式事件，以符合 OpenAI API 规范，从而更准确地使用依赖流式事件的工具进行 TTFT 指标基准测试。  \n- 改进了从 Hugging Face 模型库拉取和下载模型时的错误处理机制，新增重试与断点续传功能，以应对大型模型文件下载过程中可能出现的网络连接问题。下载操作现在可以从之前的网络故障中恢复，或在无法恢复时记录日志。  \n\n---  \n\n您可以通过以下命令使用基于 Ubuntu 的 OpenVINO Model Server 公共 Docker 镜像：  \n- `docker pull openvino\u002Fmodel_server:2026.1` - 支持 CPU 设备，镜像基于 Ubuntu 24.04  \n- `docker pull openvino\u002Fmodel_server:2026.1-gpu` - 同时支持 GPU、NPU 和 CPU 设备，镜像基于 Ubuntu 24.04  \n\n此外，您也可以使用提供的二进制软件包。仅带有 `_python_on` 后缀的软件包才支持 Python。  \n\n另有额外的分发渠道：[https:\u002F\u002Fstorage.openvinotoolkit.org\u002Frepositories\u002Fopenvino_model_server\u002Fpackages\u002F2026.1.0\u002F](","2026-04-07T15:28:13",{"id":186,"version":187,"summary_zh":188,"released_at":189},255291,"v2026.0","## 性能提升\n\n- GPT-OSS 和 Qwen3-MOE 模型的性能和准确性得到提升。\n- 执行性能显著改善，尤其是在搭载 Intel® Core™ Ultra 系列 3 内置 GPU 的设备上。\n- 在 INT4 精度下，尤其是处理长提示时，模型的准确性更高。\n- 修复了编译缓存的处理方式，从而加快模型加载速度。\n\n\n## 智能体用例改进\n\n- 改进了聊天模板示例，以更好地支持智能体用例。\n- 工具解析器的限制性降低，生成内容更加灵活，响应可靠性也有所提高。\n- 新增对与 `devstral` 模型兼容的工具解析器的支持——可利用 unsloth\u002FDevstral-Small-2507 模型或类似模型来完成编码任务。详情请参阅 [代码本地助手演示](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_code_completion_vsc.html) 和 [LLM 参考文档](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_llm_reference.html)。\n\n\n## 音频端点改进\n\n- 文本转语音端点的改进：\n  - 新增 voice 参数，可根据提供的嵌入向量选择不同的说话人。\n- 语音转文本端点的改进：\n  - 新增对温度采样参数的处理。\n  - 输出中支持时间戳。\n\n请查看 [音频端点演示](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_audio.html)。\n\n\n## VLM 流水线改进\n\n- 为 VLM 流水线新增了参数，用于控制请求中图像 URL 的域名限制，并可选支持 URL 重定向。默认情况下，所有 URL 均被阻止。可通过 `--allowed_media_domains` 和 `--allowed_local_media_path` 配置允许的来源。详情请参阅 [服务器参数文档](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_parameters.html)。\n\n\n## 嵌入和重排序器改进\n\n- 文本嵌入端点现已支持 NPU 执行（预览版）。详情请参阅 [嵌入演示](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_embeddings.html)。\n- 向重排序器和 LLM 流水线公开了分词器端点。\n\n\n## 部署改进\n\n- 为经典模型新增了可配置的预处理功能。部署的模型可以在运行时动态添加额外的预处理层，从而简化客户端实现，并支持将编码后的图像作为输入数组直接传递给模型。可能的选项包括：\n  - 颜色格式转换\n  - 布局调整\n  - 尺度缩放\n  - 均值调整\n  - 精度转换\n\n详细信息请参阅 
[服务器参数文档](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_docs_parameters.html) 和 [ONNX 模型预处理演示](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demo_using_onnx_model.html)。\n- 优化了文件句柄的使用，在 Linux 部署的高负载场景下减少了打开文件的数量。\n\n\n## 新增或更新的演示\n\n- [音频端点](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-server\u002Fovms_demos_audio.html)\n- [VLM 端点使用](https:\u002F\u002Fdocs.openvino.ai\u002F2026\u002Fmodel-ser","2026-02-24T16:35:27",{"id":191,"version":192,"summary_zh":193,"released_at":194},255292,"v2025.4.1","2025.4.1 是一个基于 [OpenVINO 2025.4.1](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino\u002Freleases\u002Ftag\u002F2025.4.1) 的小版本发布，包含错误修复和改进。\n\n## 预览：\n新增了对 **GPT-OSS** [代理用例](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_continuous_batching_agent.html) 的预览支持。  \n截至 2025.4.1 版本，达到最佳准确率的配置为：  \n- `--pipeline_type LM`（不启用连续批处理和并发）  \n- `--target_device GPU`（该配置已在 Lunar Lake、Arrow Lake-H 以及配备 ≥16 GB 显存的 Intel Arc Battlemage 独显上验证）  \n同时，必须使用 INT4 精度。\n\n## 错误修复：\n* 修复了 **qwen3coder** 工具调用解析器中字符串参数里空白字符的转义问题。  \n* 将流式传输和用量统计相关的 `chat\u002Fcompletions` 端点请求处理方式调整为适用于无连续批处理的 LLM 流程。此类流程不会跟踪生成的 token 数量，此前最后一个数据块可能未发送至客户端，导致响应中缺少 token。现在会发送最后一个数据块，并将 token 使用量设为 0，供客户端忽略。  \n* 对文档和示例进行了一些小幅修复。\n\n---\n您可以通过以下命令使用基于 Ubuntu 的 OpenVINO Model Server 公共 Docker 镜像：\n\ndocker pull openvino\u002Fmodel_server:2025.4.1 - 支持 CPU 设备，镜像基于 Ubuntu 24.04  \ndocker pull openvino\u002Fmodel_server:2025.4.1-gpu - 同时支持 GPU、NPU 和 CPU 设备，镜像基于 Ubuntu 24.04  \n\n或者直接使用提供的二进制软件包。请注意，仅带有 `_python_on` 后缀的软件包才支持 Python。\n\n此外，您还可以通过以下网址获取额外的分发渠道：  \n[https:\u002F\u002Fstorage.openvinotoolkit.org\u002Frepositories\u002Fopenvino_model_server\u002Fpackages\u002F2025.4.1\u002F](https:\u002F\u002Fstorage.openvinotoolkit.org\u002Frepositories\u002Fopenvino_model_server\u002Fpackages\u002F2025.4.1\u002F)","2025-12-18T12:55:26",{"id":196,"version":197,"summary_zh":198,"released_at":199},255293,"v2025.4","## 智能体用例改进\n* 已启用针对新模型 Qwen3-Coder-30B 和 Qwen3-30B-A3B-Instruct 的工具解析器。这些模型作为预览功能在 OpenVINO 运行时中受到支持，可评估其“工具调用”能力。\n* 对 phi-4-mini-instruct 和 mistral-7B-v0.4 模型的流式处理已支持“工具调用”，与其余受支持的智能体模型相同。\n* 改进了 mistral 和 hermes3 的工具解析器，解决了与复杂生成的 JSON 对象相关的多个问题，并提高了整体响应的可靠性。\n* 引导式生成现支持 XGrammar 集成中的所有规则。`response_format` 参数现在可以接受 [XGrammar 结构化标签格式](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fxgrammar\u002Fblob\u002Fmain\u002Fdocs\u002Ftutorials\u002Fstructural_tag.md#format-types)（不属于 OpenAI API）。示例：`{\"type\": \"regex\", \"pattern\": \"\\\\w+\\\\.\\\\w+@company\\\\.com\"}`。\n\n## 新增或更新的演示\n* [与 OpenWebUI 集成](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_integration_with_open_webui.html)\n* [使用 Continue 扩展与 Visual Studio Code 集成](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_code_completion_vsc.html)\n* [智能体客户端演示](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_continuous_batching_agent.html)\n* [音频端点](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_audio.html)\n* [Windows 服务使用](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_docs_deploying_server_service.html)\n* [GGUF 模型拉取](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_gguf.html)\n\n## 部署改进\n* 现在可以直接从 Hugging Face Hub 部署 GGUF 格式的多种 LLM 架构模型。例如 Qwen2、Qwen2.5、Qwen3 和 Llama3 等架构，只需一条命令即可部署。详情请参阅 [OVMS 中加载 GGUF 模型演示](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_gguf.html)。\n\n* 
OpenVINO 模型服务器现在可以在 Windows 操作系统中以服务形式部署。它可通过服务配置管理进行统一管理，供所有运行的应用程序共享，并使用简化的 CLI 来拉取、配置、启用和禁用模型。[链接](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_docs_deploying_server_service.html)\n\n* 以 IR 格式拉取模型的功能已扩展至 Hugging Face* Hub 中的非 OpenVINO 组织范围。虽然 OpenVINO 组织的模型由 Intel 验证，但来自其他发布者的快速增长的 IR 格式模型生态系统，现在也可以通过 OVMS CLI 拉取并部署。注意：该仓库需通过 `optimum-cli export openvino` 命令填充，并且必须包含 IR 格式的分词器模型，才能被 OpenVINO 模型服务器成功加载。\n\n* 为简化部署而对 CLI 进行了优化：\n    `--plugin_config` 参数现在不仅可用于经典模型，还可应用于生成式流水线。\n    `--cache_dir` 现在支持编译缓存功能。","2025-12-01T14:42:08",{"id":201,"version":202,"summary_zh":203,"released_at":204},255294,"v2025.3","2025.3 是一个重大版本，它改进了代理式应用场景，新增了对图像生成端点的官方支持以及简化的部署方式。此外，还增加了对一系列新型生成模型的支持。\n\n## 代理式应用场景改进\n* 实现了工具引导生成功能，通过服务器参数 `--enable_tool_guided_generation` 和 `--tool_parser` 启用特定于模型的 XGrammar 配置，以遵循预期的响应语法。该功能基于生成的序列动态调整规则，从而提高模型准确性，并最大限度地减少工具调用时出现无效响应格式的情况。[链接](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_docs_llm_reference.html#output-parsing-settings)\n\n* 扩展了支持工具处理的模型列表，为 Mistral-7B-Instruct-v0.3 添加了工具解析器。\n\n* 为 Qwen3、Hermes3 和 Llama3 模型实现了流式响应功能，使这些模型能够以更具交互性的方式与工具协同工作。\n\n* 将工具解析器和推理解析器的实现与配置分离——不再使用单一参数 `response_parser`，而是分别使用 `tool_parser` 和 `reasoning_parser` 参数。这一改动提高了服务器端解析器实现和配置的灵活性，使得不同模型可以独立共享解析器。目前支持的工具解析器包括 Hermes3、Phi4、Llama3 和 Mistral；而推理解析器则仅实现了 Qwen3。  \n\n* 如果聊天模板未包含在 `tokenizer_config.json` 文件中，则将聊天模板文件名由 `template.jinja` 更改为 `chat_template.jinja`。\n\n* 现已支持结构化输出，通过 OpenAI 的 `response_format` 字段实现了基于 JSON Schema 的引导生成。此参数可用于生成 JSON 格式的响应，适用于自动化场景，并能提升响应的准确性。更多详情请参阅文档：[链接](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_structured_output.html) 文档中还附带了一个用于测试准确率提升效果的脚本。\n\n* 新增了通过聊天\u002F补全接口字段 `tool_choice=required` 强制执行工具调用生成的功能。该选项会触发工具序列的开始，促使模型至少生成一条工具响应。虽然这并不能保证响应一定有效，但可以提高响应的可靠性。[链接](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_docs_rest_api_chat.html#request)\n\n* 更新了包含所有功能的 MCP 服务器使用演示。[链接](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_continuous_batching_agent.html)\n\n## 新增支持的模型及应用场景\n* Qwen3 嵌入模型——新增了对采用最后一个 token pooling 模式嵌入模型的支持。导出此类模型时需传递额外参数 `--pooling`，示例可参见：[链接](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_embeddings.html#model-preparation)\n\n* Qwen3 重排序模型——新增了对 tomaarsen\u002FQwen3-Reranker-seq-cl 的支持，该模型是 th 的副本。","2025-09-04T19:51:33",{"id":206,"version":207,"summary_zh":208,"released_at":209},255295,"v2025.2.1","2025.2.1 是一个包含错误修复和改进的次要版本，主要集中在自动模型拉取和图像生成方面。\n\n改进：\n* 支持在 `chat\u002Fcompletion` 请求中传递 [`chat_template_kwargs` 参数](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_docs_rest_api_chat.html#generic)。该参数可用于关闭模型的推理（reasoning\u002F思考）模式。\n* 允许在 HTTP 响应中设置 [CORS 头](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_docs_parameters.html#server-configuration-options)。这可以解决 [OpenWebUI](https:\u002F\u002Fopenwebui.com\u002F) 与模型服务器之间的连接问题。\n\n其他变更：\n* Docker 镜像中的 NPU 驱动程序版本由 1.17 更新至 1.19。\n* 依赖项中的安全相关更新。\n\n错误修复：\n* 移除了图像生成的限制——现在支持通过参数 `n` 请求多个输出图像。\n* `add_to_config` 和 `remove_from_config` 参数除了接受包含 `config.json` 文件的目录外，还支持配置文件的完整路径。\n* 解决了在未配置代理的情况下从 HuggingFace Hub 拉取模型时出现的连接问题。\n* 修复了对 `HF_ENDPOINT` 环境变量的处理：此前会错误地在 HTTP 地址前添加 `https:\u002F\u002F` 前缀。\n* 将 `pull` 功能的相关环境变量 `GIT_SERVER_CONNECT_TIMEOUT_MS` 更改为 `GIT_OPT_SET_SERVER_CONNECT_TIMEOUT`，`GIT_SERVER_TIMEOUT_MS` 更改为 `GIT_OPT_SET_SERVER_TIMEOUT`，以与底层 libgit2 实现保持一致。\n* 修复了在 Windows 系统上使用 MediaPipe\u002FLLM 时，`config_path` 参数对相对路径的处理问题。\n* 修复了无代理情况下代理式演示无法正常工作的问题。\n* 不再拒绝图像生成中的 `response_format` 
字段。虽然该参数目前仅接受 `b64_json` 值，但这样可以更好地与 Open WebUI 集成。\n* 在使用 OVMS 拉取 LLM 模型并准备其配置时，为 [参数](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_docs_parameters.html#text-generation) 添加了缺失的 `--response_parser` 选项。\n* 禁止同时使用 `--list_models` 和 `--pull` 参数，因为它们互斥。\n* 修复了 Phi4-mini 模型响应解析器在使用列表作为参数的函数时的准确性问题。\n* 修复了 `export_model.py` 脚本对嵌入和重排序模型 `target_device` 参数的处理。\n* 有状态文本生成流水线的响应中不再包含 usage（用量统计）内容——此类流水线不支持该功能。此前会返回错误的响应。\n\n已知问题与限制：\n* VLM 模型 QwenVL2、QwenVL2.5 和 Phi3_VL 在 CPU 上部署于带有连续批处理的文本生成流水线时，精度较低。建议将这些模型部署在有状态流水线中，按顺序处理请求，如[示例](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_continuous_batching_vlm.html)所示。\n* 本版本不支持在图像生成端点中使用 NPU。\n\n您可以使用基于 Ubuntu 的 OpenVINO Model Server 公共 Docker 镜像，运行以下命令：\n\ndocker pull openvino\u002Fmodel_server:2025.2.1 - 支持 CPU 设备，镜像基于 Ubuntu 24.04","2025-07-16T13:49:33",{"id":211,"version":212,"summary_zh":213,"released_at":214},255296,"v2025.2","2025.2 是一个重大版本，新增了对 **图像生成** 的支持、对带有 `tool_calls` 处理的 **AI 代理** 的支持，以及在 **模型管理** 方面的新功能。  \n\n## 图像生成（预览）  \n\n图像生成端点——这一预览功能允许基于文本提示生成图像。该端点与 OpenAI API 兼容，便于与现有生态系统集成。它支持 Stable Diffusion、Stable Diffusion XL、Stable Diffusion 3 和 FLUX 等热门模型。  \n\n请查看 [端到端演示](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_image_generation.html)  \n\n图像生成 [API 参考文档](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_docs_rest_api_image_generation.html)  \n\n## 智能体式 AI（预览）  \n\n在使用 LLM 模型生成文本时，可以通过工具扩展上下文。这些工具能够从外部源（如 Python 函数）提供额外的上下文信息。AI 代理可以利用 OpenVINO 模型服务器选择合适的工具并生成函数参数，最终的代理响应也可以基于工具的返回结果生成。  \n\n现在可以在用于文本生成的 `chat\u002Fcompletions` 端点中使用工具规范，并且消息中可以包含工具调用的响应（tool_calls）。此类智能体式用例需要经过专门调优的聊天模板和自定义响应解析器。目前，这些功能已适用于支持工具的热门模型。  \n\n请查看包含 **AI 代理** 的演示：[AI 代理演示](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_continuous_batching_agent.html)  \n\n## 针对生成式用例的模型管理  \n\n此版本为模型管理和开发机制带来了多项改进，尤其针对生成式用例。  \n\n现在可以直接从 Hugging Face Hub 拉取并部署 OpenVINO 格式的生成式模型。生成式流水线的所有运行时参数均可通过模型服务器命令行界面进行设置。`ovms` 二进制文件可用于将模型拉取到本地模型仓库，以便在后续运行中重复使用。此外，还提供了用于列出模型仓库中模型、以及在配置文件中添加或移除启用模型的 CLI 命令。  \n\n有关使用 CLI 拉取模型和启动服务器的更多详细信息，请参阅：  \n- [拉取模型](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_docs_pull.html)  \n- [启动模型服务](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_docs_serving_model.html)  \n\n请查看 [RAG 演示](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_continuous_batching_rag.html)，了解如何在单个服务器实例中轻松部署 3 个模型。  \n\n需要注意的是，Python 脚本 `export_models.py` 仍可用于准备来自 Hugging Face Hub 中非 OpenVINO 组织的模型，并且现已扩展以支持图像生成任务。  \n\n\n## 破坏性变更  \n\n此前，文本生成的采样参数均为静态值。此版本将默认采样参数改为基于模型目录中的 `generation_config.json` 文件。","2025-06-18T09:52:55",{"id":216,"version":217,"summary_zh":218,"released_at":219},255297,"v2025.1","2025.1 版本是一个重大发布，新增了对视觉语言模型的支持，并且能够在 NPU 加速器上进行文本生成。\n\n## VLM 支持\n\n`chat\u002Fcompletion` 端点现已扩展以支持视觉语言模型。现在可以在聊天上下文中发送图像。视觉语言模型的部署方式与 LLM 模型相同。\n\n查看端到端演示：[链接](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_continuous_batching_vlm.html)\n\n更新后的 API 参考：[链接](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_docs_rest_api_chat.html#api-reference)\n\n## NPU 上的文本生成\n\n现在可以将 LLM 和 VLM 模型部署在 NPU 加速器上。文本生成将通过 `completions` 和 `chat\u002Fcompletions` 端点提供。从客户端角度来看，其工作方式与 GPU 和 CPU 部署相同，但不支持连续批处理算法。NPU 主要面向并发度较低的 AI PC 场景。\n\n请查看 [NPU LLM 演示](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_llm_npu.html)和 [NPU VLM 演示](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fmodel-server\u002Fovms_demos_vlm_npu.html)。\n\n## 
模型管理改进\n\n- 无需配置文件即可通过 CLI 启动 MediaPipe 图和生成式端点。只需将 `--model_path` CLI 参数指向包含 MediaPipe 图的目录即可。\n- 统一了模型和图的 JSON 配置文件结构，统一归入 `models_config_list` 部分。\n\n## 破坏性变更\n\n- gRPC 服务器现为可选。不再设置默认的 gRPC 端口。必须指定 `--port` 参数才能启动 gRPC 服务器。也可以仅通过 `--rest_port` 参数启动 REST API 服务器。从 CLI 启动 OVMS 时，至少需要定义一个端口号（`--port` 用于 gRPC，或 `--rest_port` 用于 REST）。而通过 C-API 启动 OVMS 则无需定义任何端口。\n\n## 其他变更\n\n- 更新了使用多实例的可扩展性演示：[链接](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Fmain\u002Fdemos\u002Fcontinuous_batching\u002Fscaling)\n- 将请求中允许的文本生成停止词数量从 4 个增加至 16 个。\n- 实现并测试了 OVMS 与 [Continue](https:\u002F\u002Fcontinue.dev\u002F) 的 Visual Studio Code 扩展集成。OpenVINO 模型服务器可用作代码补全和内置 IDE 聊天助手的后端。请参阅说明：[链接](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2025\u002F1\u002Fdemos\u002Fcode_local_assistant)。\n- 性能优化——对 OpenVINO 运行时以及文本采样生成算法进行了改进，有望在高并发负载下提升吞吐量。\n\n## 错误修复\n\n- 修复了 LLM 上下文长度的处理问题——现在当超过模型上下文长度时，OVMS 将停止生成文本。如果提示文本长度超过上下文长度，或者 `max_tokens` 加上输入标记数超过模型上下文长度，将会抛出错误。\n- 安全性和稳定性提升","2025-04-10T12:03:20",{"id":221,"version":222,"summary_zh":223,"released_at":224},255298,"v2025.0","2025.0 是一个重大版本，新增了对 Windows 原生部署的支持，并针对生成式用例进行了多项改进。\n\n## 新特性——Windows 原生服务器部署\n\n- 本版本支持在 Windows 操作系统上以二进制应用程序的形式部署模型服务器。\n- 完全支持生成式端点：基于 OpenAI API 的文本生成和嵌入，以及基于 Cohere API 的重排序。\n- 功能与 Linux 版本基本一致，但存在一些细微差异，包括云存储、CAPI 接口和 DAG 流水线等。更多详情请参阅：[Windows 开发者指南](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fblob\u002Freleases\u002F2025\u002F0\u002Fdocs\u002Fwindows_developer_guide.md#list-of-disabled-features-on-windows-model-server)。\n- 该版本主要面向运行 **Windows 11** 的客户端机器，以及使用 **Windows Server 2022** 操作系统的数据中心环境。\n- 示例已更新，可在 Linux 和 Windows 上同时运行。请参阅[安装指南](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fopenvino-workflow\u002Fmodel-server\u002Fovms_docs_deploying_server_baremetal.html)。\n\n## 其他变更与改进\n\n- 正式支持 **Battle Mage GPU**、**Arrow Lake CPU、iGPU、NPU** 以及 **Lunar Lake CPU、iGPU 和 NPU**。\n- 更新了基础 Docker 镜像：新增 Ubuntu 24 和 RedHat UBI 9，移除了 Ubuntu 20 和 RedHat UBI 8。\n- 扩展了聊天\u002F补全 API，支持 `max_completion_tokens` 参数，并允许将消息内容以数组形式传递。这些更改旨在使 API 与 OpenAI API 保持兼容。\n- 在嵌入端点中新增截断选项：现在可以导出嵌入模型，并配置为自动截断输入，使其长度与嵌入上下文长度匹配。默认情况下，当输入过长时会抛出错误。\n- 为文本生成添加了推测解码算法：请参阅[示例](https:\u002F\u002Fdocs.openvino.ai\u002F2025\u002Fopenvino-workflow\u002Fmodel-server\u002Fovms_demos_continuous_batching_speculative_decoding.html)。\n- 新增对无命名输出模型的直接支持：当模型没有命名输出时，将在模型初始化时按 `out_\u003Cindex>` 的模式为其分配通用名称。\n- 添加了用于跟踪 MediaPipe 图处理时长的直方图指标。\n- 性能优化。\n\n## 破坏性变更\n\n- 停止对 NVIDIA 插件的支持。\n\n## Bug 修复\n\n- 修正了断开连接客户端取消文本生成时的行为。\n- 修复了嵌入端点中模型上下文长度的检测问题。\n- 提升了安全性和稳定性。\n\n您可以使用基于 Ubuntu 的 OpenVINO Model Server 公共 Docker 镜像，命令如下：\n* `docker pull openvino\u002Fmodel_server:2025.0` - 仅支持 CPU 设备\n* `docker pull openvino\u002Fmodel_server:2025.0-gpu` - 支持 GPU、NPU 和 CPU 设备\n\n或者使用提供的二进制包。预构建镜像也可在 [RedHat 生态系统目录](https:\u002F\u002Fcatalog.redhat.com\u002Fsoftware\u002Fcontainers\u002Fintel\u002Fopenvino-model-server\u002F607833052937385fc98515de) 中获取。","2025-02-06T14:36:29",{"id":226,"version":227,"summary_zh":228,"released_at":229},255299,"v2024.5","2024.5 版本新增了对嵌入和重排序端点的支持，并推出了实验性的 Windows 支持版本。\n\n## 变更与改进\n\n* 新增了 OpenAI API 文本 [嵌入端点](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fopenvino-workflow\u002Fmodel-server\u002Fovms_docs_rest_api_embeddings.html)，使 OVMS 能够作为 RAG 等 AI 应用的构建模块使用。  \n\n* 基于 Cohere API 新增了 [重排序端点](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fopenvino-workflow\u002Fmodel-server\u002Fovms_docs_rest_api_rerank.html)，可轻松实现查询与一组文档之间的相似度检测。它是 RAG 等 AI 
应用的重要组成部分，并能便捷地与 LangChain 等框架集成。\n\n\n* 现在支持 `completions` 端点中的 `echo` 采样参数以及 `logprobs` 参数。  \n\n* LLM 文本生成在 CPU 和 GPU 上的性能均有所提升。  \n\n* 针对 GPU 目标设备的 LLM `dynamic_split_fuse` 优化，可在高并发场景下显著提升吞吐量。  \n\n* 简化了 LLM 服务部署及模型仓库准备流程。  \n* 提升了 LLM 测试覆盖率和稳定性。  \n\n* 提供了[说明文档](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fblob\u002Fmain\u002Fdocs\u002Fwindows_binary_guide.md)，介绍如何构建 Windows 二进制包的实验性版本——适用于 Windows 操作系统的原生模型服务器。该版本存在一些限制，测试覆盖范围也较为有限，主要用于测试用途；正式生产版本预计将在 2025.0 版本中发布。欢迎用户提供反馈。  \n\n* OpenVINO Model Server C-API 现已支持异步推理，通过设置输出的能力提升了性能，并允许在 GPU 目标设备上同时将 OpenCL 和 VA 表面用于输入和输出。  \n\n* KServe REST API 的 `Model_metadata` 端点现在可以提供额外的 [model_info 引用](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fopenvino-workflow\u002Fmodel-server\u002Fovms_docs_rest_api_kfs.html#model-metadata-api)。  \n* 新增对 MTL 和 LNL 平台上的 NPU 和 iGPU 的支持。  \n\n* 安全性和稳定性得到进一步提升。  \n\n## 中断性变更\n\n无中断性变更。  \n\n## 错误修复：\n* 修复了 KServe REST API 对 URL 编码模型名称的支持问题。  \n* OpenAI 文本生成端点现可接受带有 v3 或 v3\u002Fv1 路径前缀的请求。  \n* 修复了视频流基准测试客户端的指标上报问题。  \n* 修复了 `completions` 端点偶发的 `INVALID_ARGUMENT` 错误。  \n* 修复了在预期停止但实际因长度限制而结束时，LLM 结束原因显示不正确的问题。  \n\n## 停止维护计划\n在未来的版本中，将不再维护以下构建选项的支持：\n* 以 Ubuntu 20 为基础镜像  \n* OpenVINO NVIDIA 插件  \n\n您可以通过以下命令使用基于 Ubuntu 22.04 的 OpenVINO Model Server 公共 Docker 镜像：\n* `docker pull openvino\u002Fmodel_server:2024.5` —— 仅支持 CPU 设备  \n* `docker pull openvino\u002Fmodel_server:2024.5-gpu` —— 支持 GPU、NPU 和 CPU 设备  \n\n或者直接使用提供的二进制包。","2024-11-20T14:08:49",{"id":231,"version":232,"summary_zh":233,"released_at":234},255300,"v2024.4","The 2024.4 release brings official support for OpenAI API [text generation](docs\u002Fllm\u002Freference.md). It is now recommended for production usage. It comes with a set of added features and improvements. \r\n\r\n## Changes and improvements \r\n\r\n- Significant performance improvements for multinomial sampling algorithm \r\n\r\n- `finish_reason` in the response correctly determines reaching the max_tokens (length) and completed the sequence (stop) \r\n\r\n- Added automatic cancelling of text generation for disconnected clients \r\n\r\n- Included prefix caching feature which speeds up text generation by caching the prompt evaluation \r\n\r\n- Option to compress the KV Cache to lower precision – it reduces the memory consumption with minimal impact on accuracy \r\n\r\n- Added support for `stop` sampling parameters. It can define a sequence which stops text generation.  \r\n\r\n- Added support for `logprobs` sampling parameter. It returns the probabilities of generated tokens.\r\n\r\n- Included generic [metrics](docs\u002Fmetrics.md) related to execution of MediaPipe graph. Metric `ovms_current_graphs` can be used for autoscaling based on current load and the level of concurrency. Counters like `ovms_requests_accepted` and `ovms_responses` can track the activity of the server.\r\n\r\n- Included demo of text generation [horizontal scalability](demos\u002Fcontinuous_batching\u002Fscaling)\r\n\r\n- Configurable handling of non-UTF-8 responses from the model – detokenizer can now automatically change then to Unicode replacement character  \r\n\r\n- Included support for Llama3.1 models\r\n\r\n- Text generation is supported both on CPU and GPU -check the [demo](demos\u002Fcontinuous_batching)\r\n\r\n \r\n\r\n## Breaking changes \r\n\r\nNo breaking changes. 
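\r\n\r\nFor reference, the OpenAI-compatible text generation endpoint made official in this release can be exercised with the standard openai Python package. A hedged sketch (the model name \"llama\" and REST port 8000 are assumptions; the \u002Fv3 path prefix follows the endpoint prefixes documented for the server): \r\n\r\n```python\r\n# Hedged sketch: chat completion via the OpenAI-compatible API.\r\n# Substitute the model name you serve and your --rest_port value.\r\nfrom openai import OpenAI\r\n\r\n# The server does not check the API key by default, but the SDK requires a value.\r\nclient = OpenAI(base_url=\"http:\u002F\u002Flocalhost:8000\u002Fv3\", api_key=\"unused\")\r\ncompletion = client.chat.completions.create(\r\n    model=\"llama\",\r\n    messages=[{\"role\": \"user\", \"content\": \"Hello\"}],\r\n    max_tokens=32,\r\n)\r\nprint(completion.choices[0].message.content)\r\n```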
\r\n\r\n## Bug fixes \r\n\r\n- Security and stability improvements \r\n\r\n- Fixed handling of model templates without bos_token \r\n\r\n\r\n\r\nYou can use an OpenVINO Model Server public Docker images based on Ubuntu via the following command: \r\n`docker pull openvino\u002Fmodel_server:2024.4` - CPU device support with the image based on Ubuntu22.04 \r\n`docker pull openvino\u002Fmodel_server:2024.4-gpu` - CPU, GPU and NPU device support with the image based on Ubuntu22.04 \r\nor use provided binary packages. \r\nThe prebuilt image is available also on [RedHat Ecosystem Catalog](https:\u002F\u002Fcatalog.redhat.com\u002Fsoftware\u002Fcontainers\u002Fintel\u002Fopenvino-model-server\u002F607833052937385fc98515de)","2024-09-19T11:46:25",{"id":236,"version":237,"summary_zh":238,"released_at":239},255301,"v2024.3","The 2024.3 release focus mostly on improvements in OpenAI API text generation implementation.\r\n\r\n## Changes and improvements \r\n\r\nA set of improvements in [OpenAI API text generation](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fovms_docs_llm_reference.html): \r\n\r\n- Significantly better performance thanks to numerous improvements in OpenVINO Runtime and sampling algorithms\r\n- Added config parameters `best_of_limit` and `max_tokens_limit` to avoid memory overconsumption impact from invalid requests [Read more](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fovms_docs_llm_reference.html)\r\n- Added reporting LLM metrics in the server logs [Read more](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fovms_docs_llm_reference.html)\r\n- Added extra sampling parameters `diversity_penalty`, `length_penalty`, `repetition_penalty`. [Read more](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fovms_docs_rest_api_chat.html)\r\n\r\nImprovements in documentation and demos:\r\n\r\n- Added [RAG demo](demos\u002Fcontinuous_batching\u002Frag) with OpenAI API \r\n- Added [K8S deployment demo](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Foperator\u002Fblob\u002Fmain\u002Fdocs\u002FRAG_serving_demo.md) for text generation scenarios \r\n- Simplified models initialization for a set of demos with mediapipe graphs using pose_detection model. TFLite models don't required any conversions [Check demo](demos\u002Fmediapipe\u002Fholistic_tracking)\r\n\r\n## Breaking changes\r\n\r\nNo breaking changes. \r\n\r\n## Bug fixes \r\n\r\n- Resolved issue with sporadic text generation hang via OpenAI API endpoints\r\n- Fixed issue with chat streamer impacting incomplete utf-8 sequences \r\n- Corrected format of the last streaming event in `completions` endpoint \r\n- Fixed issue with request hanging when running out of available cache\r\n\r\nYou can use an OpenVINO Model Server public Docker images based on Ubuntu via the following command: \r\n`docker pull openvino\u002Fmodel_server:2024.3` - CPU device support with the image based on Ubuntu22.04 \r\n`docker pull openvino\u002Fmodel_server:2024.3-gpu` - GPU and CPU device support with the image based on Ubuntu22.04 \r\nor use provided binary packages. 
\r\nThe prebuilt image is available also on [RedHat Ecosystem Catalog](https:\u002F\u002Fcatalog.redhat.com\u002Fsoftware\u002Fcontainers\u002Fintel\u002Fopenvino-model-server\u002F607833052937385fc98515de)","2024-07-31T14:15:40",{"id":241,"version":242,"summary_zh":243,"released_at":244},255302,"v2024.2","The major new functionality in 2024.2 is a preview feature of OpenAI compatible API for text generation along with state of the art techniques like continuous batching and paged attention for improving efficiency of generative workloads.\r\n\r\n## Changes and improvements \r\n\r\n- Updated OpenVINO Runtime backend to [2024.2](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fhome.html)\r\n\r\n- OpenVINO Model Server can be now used for text generation use cases using [OpenAI compatible API](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fovms_docs_clients_openai.html)\r\n\r\n- Added support for continuous batching and PagedAttention algorithms for text generation with fast and efficient in high concurrency load especially on Intel Xeon processors. [Learn more about it](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fovms_docs_llm_reference.html).\r\n\r\n- Added LLM text generation OpenAI API [demo](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fovms_demos_continuous_batching.html).\r\n\r\n- Added notebook showcasing RAG algorithm with online scope changes delegated to the model server. [Link](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2024\u002F2\u002Fdemos\u002Fcontinuous_batching\u002Frag)\r\n\r\n- Enabled python 3.12 for python clients, samples and demos.\r\n\r\n- Updated RedHat UBI base image to 8.10\r\n\r\n## Breaking changes \r\n\r\nNo breaking changes. \r\n\r\n\r\nYou can use an OpenVINO Model Server public Docker images based on Ubuntu via the following command: \r\n`docker pull openvino\u002Fmodel_server:2024.2` - CPU device support with the image based on Ubuntu 22.04 \r\n`docker pull openvino\u002Fmodel_server:2024.2-gpu` - GPU and CPU device support with the image based on Ubuntu 22.04 \r\nor use provided binary packages. \r\nThe prebuilt image is available also on [RedHat Ecosystem Catalog](https:\u002F\u002Fcatalog.redhat.com\u002Fsoftware\u002Fcontainers\u002Fintel\u002Fopenvino-model-server\u002F607833052937385fc98515de)\r\n","2024-06-17T13:37:09",{"id":246,"version":247,"summary_zh":248,"released_at":249},255303,"v2024.1","The 2024.1 has a few improvements in the serving functionality, demo enhancements and bug fixes. \r\n\r\n## Changes and improvements \r\n\r\n- Updated OpenVINO Runtime backend to 2024.1  [Link](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fhome.html)\r\n\r\n- Added support for OpenVINO models with string data type on output. Together with the features introduced in 2024.0, now OVMS can support models with input and output of string type. That way you can take advantage of the tokenization built into the model as the first layer. You can also rely on any post-processing embedded into the model which returns just text. Check [universal sentence encoder demo](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fovms_demo_universal-sentence-encoder.html)  and [image classification with string output demo](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Fmain\u002Fdemos\u002Fimage_classification_with_string_output)\r\n\r\n- Updated MediaPipe python calculators to support relative path for all related configuration and python code files. 
Now, the complete graph configuration folder can be deployed in arbitrary path without any code changes. It is demonstrated in the updated [text generation demo](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2024\u002F1\u002Fdemos\u002Fpython_demos\u002Fllm_text_generation).\r\n\r\n- Extended support for KServe REST API for MediaPipe graph endpoints. Now you can send the data in KServe JSON body. Check how it is used in [text generation use case](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2024\u002F1\u002Fdemos\u002Fpython_demos\u002Fllm_text_generation#use-kserve-rest-api-with-curl).\r\n\r\n- Added demo showcasing full RAG algorithm entirely delegated to the model server [Link](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2024\u002F1\u002Fdemos\u002Fpython_demos\u002Frag_chatbot)\r\n\r\n- Added RedHat UBI based Dockerfile for python demos, usage documented in [python demos](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2024\u002F1\u002Fdemos\u002Fpython_demos)\r\n\r\n## Breaking changes \r\n\r\nNo breaking changes. \r\n\r\n## Bug fixes \r\n\r\n- Improvements in error handling for invalid requests and incorrect configuration \r\n- Fixes in the demos and documentation \r\n \r\n\r\nYou can use an OpenVINO Model Server public Docker images based on Ubuntu via the following command: \r\n`docker pull openvino\u002Fmodel_server:2024.1` - CPU device support with the image based on Ubuntu22.04 \r\n`docker pull openvino\u002Fmodel_server:2024.1-gpu` - GPU and CPU device support with the image based on Ubuntu22.04 \r\nor use provided binary packages. \r\nThe prebuilt image is available also on [RedHat Ecosystem Catalog](https:\u002F\u002Fcatalog.redhat.com\u002Fsoftware\u002Fcontainers\u002Fintel\u002Fopenvino-model-server\u002F607833052937385fc98515de)\r\n","2024-04-25T12:23:46",{"id":251,"version":252,"summary_zh":253,"released_at":254},255304,"v2024.0","The 2024.0 includes new version of OpenVINO™ backend and several improvements in the serving functionality.  \r\n\r\n## Changes and improvements \r\n\r\n- Updated OpenVINO™ Runtime backend to 2024.0. [Link](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fhome.html)\r\n- Extended text generation demo to support multi batch size both with streaming and unary clients. [Link to demo](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2024\u002F0\u002Fdemos\u002Fpython_demos\u002Fllm_text_generation)\r\n- Added support for REST client for servables based on MediaPipe graphs including python pipeline nodes. [Link to demo](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2024\u002F0\u002Fdemos\u002Fpython_demos\u002Fclip_image_classification) \r\n- Added additional MediaPipe calculators which can be reused for multiple image analysis scenarios. [Link to new calculators](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmediapipe\u002Ftree\u002Fmain\u002Fmediapipe\u002Fcalculators\u002Fgeti)\r\n- Added support for models with a `string` input data type including tokenization extension. [Link to demo](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2024\u002F0\u002Fdemos\u002Funiversal-sentence-encoder)\r\n- Security related updates in versions of included dependencies. 
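\r\n\r\nAs an illustration of the serving API these changes build on, a minimal gRPC client sketch using the ovmsclient helper package (model name \"resnet\", gRPC port 9000, input name \"0\", and input shape are assumptions; match them to your served model's metadata): \r\n\r\n```python\r\n# Hedged sketch: gRPC inference via ovmsclient (pip install ovmsclient).\r\nimport numpy as np\r\nfrom ovmsclient import make_grpc_client\r\n\r\nclient = make_grpc_client(\"localhost:9000\")\r\n# Inspect the served model to learn its real input names and shapes.\r\nprint(client.get_model_metadata(model_name=\"resnet\"))\r\ndata = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input\r\nresult = client.predict(inputs={\"0\": data}, model_name=\"resnet\")\r\nprint(result)\r\n```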
","2024-04-25T12:23:46",{"id":251,"version":252,"summary_zh":253,"released_at":254},255304,"v2024.0","The 2024.0 release includes a new version of the OpenVINO™ backend and several improvements in the serving functionality.  \r\n\r\n## Changes and improvements \r\n\r\n- Updated the OpenVINO™ Runtime backend to 2024.0. [Link](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fhome.html)\r\n- Extended the text generation demo to support multiple batch sizes with both streaming and unary clients. [Link to demo](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2024\u002F0\u002Fdemos\u002Fpython_demos\u002Fllm_text_generation)\r\n- Added support for a REST client for servables based on MediaPipe graphs, including Python pipeline nodes. [Link to demo](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2024\u002F0\u002Fdemos\u002Fpython_demos\u002Fclip_image_classification) \r\n- Added additional MediaPipe calculators which can be reused for multiple image analysis scenarios. [Link to new calculators](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmediapipe\u002Ftree\u002Fmain\u002Fmediapipe\u002Fcalculators\u002Fgeti)\r\n- Added support for models with a `string` input data type, including a tokenization extension (a client sketch follows at the end of these notes). [Link to demo](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2024\u002F0\u002Fdemos\u002Funiversal-sentence-encoder)\r\n- Security-related updates in versions of included dependencies. \r\n\r\n## Deprecation notices\r\n\r\n[Batch Size AUTO](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fovms_docs_dynamic_bs_auto_reload.html) and [Shape AUTO](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fovms_docs_dynamic_shape_auto_reload.html) are deprecated and will be removed.\r\nUse the [Dynamic Model Shape feature](https:\u002F\u002Fdocs.openvino.ai\u002F2024\u002Fovms_docs_dynamic_shape_dynamic_model.html) instead.\r\n\r\n## Breaking changes \r\n\r\nNo breaking changes. \r\n\r\n## Bug fixes \r\n\r\n- Improvements in error handling for invalid requests and incorrect configuration \r\n- Minor fixes in the demos and documentation \r\n\r\nYou can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands: \r\n`docker pull openvino\u002Fmodel_server:2024.0` - CPU device support with the image based on Ubuntu 22.04 \r\n`docker pull openvino\u002Fmodel_server:2024.0-gpu` - GPU and CPU device support with the image based on Ubuntu 22.04 \r\nor use the provided binary packages. \r\nThe prebuilt image is also available on [RedHat Ecosystem Catalog](https:\u002F\u002Fcatalog.redhat.com\u002Fsoftware\u002Fcontainers\u002Fintel\u002Fopenvino-model-server\u002F607833052937385fc98515de)
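\r\n\r\nTo illustrate the `string` input support, a sketch using the `tritonclient` package (which speaks the same KServe gRPC API) follows; the gRPC port, the model name `usem` and the tensor names are assumptions, and the linked demo remains the reference.\r\n\r\n```python\r\nimport numpy as np\r\nimport tritonclient.grpc as grpcclient\r\n\r\n# Assumed gRPC port, model name and tensor names - adjust to your deployment.\r\nclient = grpcclient.InferenceServerClient(url='localhost:9000')\r\ntext = np.array([b'OpenVINO Model Server accepts string inputs'], dtype=np.object_)\r\ninfer_input = grpcclient.InferInput('input_text', [1], 'BYTES')\r\ninfer_input.set_data_from_numpy(text)\r\nresult = client.infer(model_name='usem', inputs=[infer_input])\r\nprint(result.as_numpy('output_embedding'))\r\n```\r\n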
","2024-03-06T14:23:45",{"id":256,"version":257,"summary_zh":258,"released_at":259},255305,"v2023.3","The 2023.3 is a major release with a new feature and numerous improvements.  \r\n\r\n## Changes and improvements \r\n\r\n- Included a set of new [demos](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Fmain\u002Fdemos\u002Fpython_demos) using custom nodes implemented as Python code. They include LLM text generation, stable diffusion and seq2seq translation. \r\n\r\n- Improvements in the [demo](https:\u002F\u002Fdocs.openvino.ai\u002F2023.3\u002Fovms_demo_real_time_stream_analysis.html) highlighting video stream analysis. A simple client example can now process the video stream from a local camera, a video file or an RTSP stream. The data can be sent to the model server via unary gRPC calls or gRPC streaming. \r\n\r\n- Changes in the public release artifacts – the base image of the public model server images is now updated to Ubuntu 22.04 and RHEL 8.8. Public Docker images include support for Python custom nodes, but without custom Python dependencies. The public binary distribution of the model server also targets Ubuntu 22.04 and RHEL 8.8, but without Python support (it can be deployed on bare-metal hosts without Python installed). Check the [building from source](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fblob\u002Freleases\u002F2023\u002F3\u002Fdocs\u002Fbuild_from_source.md#building-binary-package) guide. \r\n\r\n- Improvements in the documentation: https:\u002F\u002Fdocs.openvino.ai\u002F2023.3\u002Fovms_what_is_openvino_model_server.html \r\n\r\n## New Features (Preview)\r\n\r\n- Added support for serving MediaPipe graphs with custom nodes implemented as Python code. It greatly simplifies exposing GenAI algorithms based on the Hugging Face and Optimum libraries. It can also be applied for arbitrary pre- and post-processing in AI solutions (a node sketch follows at the end of these notes). [Learn more about it](https:\u002F\u002Fdocs.openvino.ai\u002F2023.3\u002Fovms_docs_python_support_reference.html)  \r\n\r\n## Stable Feature\r\ngRPC streaming support is out of preview and considered stable. \r\n\r\n## Breaking changes \r\n\r\nNo breaking changes. \r\n\r\n## Deprecation notices\r\n\r\n[Batch Size AUTO](https:\u002F\u002Fdocs.openvino.ai\u002F2023.3\u002Fovms_docs_dynamic_bs_auto_reload.html) and [Shape AUTO](https:\u002F\u002Fdocs.openvino.ai\u002F2023.3\u002Fovms_docs_dynamic_shape_auto_reload.html) are deprecated and will be removed.\r\nUse the [Dynamic Model Shape feature](https:\u002F\u002Fdocs.openvino.ai\u002F2023.3\u002Fovms_docs_dynamic_shape_dynamic_model.html) instead.\r\n\r\n## Bug fixes \r\n\r\n- OVMS now handles boolean parameters in the plugin config: https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fpull\u002F2197 \r\n\r\n- Sporadic failures in the IrisTracking demo using a gRPC stream are fixed: https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fpull\u002F2161 \r\n\r\n- Fixed handling of incorrect MediaPipe graphs producing multiple outputs with the same name: https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fpull\u002F2161 \r\n\r\nYou can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands: \r\n`docker pull openvino\u002Fmodel_server:2023.3` - CPU device support with the image based on Ubuntu 22.04 \r\n`docker pull openvino\u002Fmodel_server:2023.3-gpu` - GPU and CPU device support with the image based on Ubuntu 22.04 \r\nor use the provided binary packages. \r\nThe prebuilt image is also available on [RedHat Ecosystem Catalog](https:\u002F\u002Fcatalog.redhat.com\u002Fsoftware\u002Fcontainers\u002Fintel\u002Fopenvino-model-server\u002F607833052937385fc98515de)
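\r\n\r\nAs a rough illustration of the Python custom nodes preview, a sketch of a node file follows. It is based on the python support reference linked above; treat the class and method names as approximations and consult that documentation for the exact interface.\r\n\r\n```python\r\n# model.py - a sketch of a Python custom node served by the model server.\r\nfrom pyovms import Tensor\r\n\r\nclass OvmsPythonModel:\r\n    def initialize(self, kwargs):\r\n        # One-time setup, e.g. loading a Hugging Face pipeline (omitted here).\r\n        pass\r\n\r\n    def execute(self, inputs):\r\n        # Echo the first input back upper-cased; a real node would run\r\n        # tokenization, generation or other pre\u002Fpost-processing here.\r\n        text = bytes(inputs[0]).decode()\r\n        return [Tensor('output', text.upper().encode())]\r\n```\r\n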
","2024-01-24T13:55:03",{"id":261,"version":262,"summary_zh":263,"released_at":264},255306,"v2023.2","The 2023.2 is a major release with several new features and improvements.\r\n\r\n## Changes\r\n* Updated the OpenVINO backend to version 2023.2.\r\n* The MediaPipe framework has been updated to the latest version, 0.10.3.  \r\n* The Model API used in the OpenVINO Inference MediaPipe Calculator has been updated and included with all its features.\r\n\r\n## New Features\r\n* Introduced an extension of the KServe gRPC API with a stream on input and output. That extension is enabled for servables with MediaPipe graphs. The MediaPipe graph is persistent in the scope of the user session, which improves processing performance and supports stateful graphs – for example, tracking algorithms. It also enables the use of source calculators. Check [more details](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fblob\u002Freleases\u002F2023\u002F2\u002Fdocs\u002Fstreaming_endpoints.md).  \r\n* Added a demo showcasing gRPC streaming with a MediaPipe graph. Check [more details](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2023\u002F2\u002Fdemos\u002Fmediapipe\u002Fholistic_tracking).\r\n* Added parameters for gRPC quota configuration and changed the default gRPC channel arguments to add rate limits. This minimizes the risk of the service being impacted by an uncontrolled flow of requests. Check [more details](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fblob\u002Freleases\u002F2023\u002F2\u002Fdocs\u002Fsecurity_considerations.md).\r\n* Updated the Python client requirements to match a wide range of Python versions, from 3.7 to 3.11. \r\n\r\n## Breaking changes\r\nNo breaking changes.\r\n\r\n## Bug fixes\r\n* Handled the situation when a MediaPipe graph is added with the same name as a previously loaded DAG.\r\n* Fixed the HTTP status code returned when a MediaPipe graph\u002FDAG is not loaded yet (previously 404, now 503) – a readiness-probe sketch follows at the end of these notes.\r\n* Corrected the error message returned via HTTP when using a method other than GET on the metadata endpoint - \"Unsupported method\".\r\n\r\nYou can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:\r\n`docker pull openvino\u002Fmodel_server:2023.2` - CPU device support with the image based on Ubuntu 20.04\r\n`docker pull openvino\u002Fmodel_server:2023.2-gpu` - GPU and CPU device support with the image based on Ubuntu 22.04\r\nor use the provided binary packages.\r\nThe prebuilt image is also available on [RedHat Ecosystem Catalog](https:\u002F\u002Fcatalog.redhat.com\u002Fsoftware\u002Fcontainers\u002Fintel\u002Fopenvino-model-server\u002F607833052937385fc98515de)\r\n
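\r\n\r\nThe 404-to-503 change above matters for readiness checks. Below is a minimal polling sketch against the standard KServe readiness endpoint; the port and the servable name `my_graph` are illustrative.\r\n\r\n```python\r\nimport time\r\nimport requests\r\n\r\n# Poll the KServe readiness endpoint until the servable is loaded.\r\nurl = 'http:\u002F\u002Flocalhost:8000\u002Fv2\u002Fmodels\u002Fmy_graph\u002Fready'\r\nwhile True:\r\n    status = requests.get(url, timeout=5).status_code\r\n    if status == 200:\r\n        print('servable is ready')\r\n        break\r\n    # 503 now signals 'not loaded yet' (previously 404), so keep waiting.\r\n    print(f'not ready yet (HTTP {status}), retrying...')\r\n    time.sleep(1)\r\n```\r\n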
","2023-11-16T15:41:47",{"id":266,"version":267,"summary_zh":268,"released_at":269},255307,"v2023.1","OpenVINO™ Model Server 2023.1 \r\n\r\nThe 2023.1 is a major release with numerous improvements and changes.  \r\n\r\n## New Features\r\n* Improvements in Model Server with [MediaPipe](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fmediapipe) integration. In the previous version, the MediaPipe scheduler was included in OpenVINO Model Server as a preview. Now the MediaPipe graph scheduler is included by default and officially supported. Check [mediapipe in the model server documentation](docs\u002Fmediapipe.md). This release includes the following improvements in running request calls to the graphs: \r\n  * `GetModelMetadata` implementation for MediaPipe graphs – the calls to model metadata return information about the expected input and output names from the graph, with [the limitation on shape and datatype](docs\u002Fmediapipe.md#current-limitations-)  \r\n  * Support for data serialization and deserialization to a range of types: `ov::Tensor`, `mediapipe::Image`, KServe ModelInfer Request\u002FResponse – those capabilities simplify adoption of existing graphs which may expect input and output data in many different formats. Now the data submitted to the KServe endpoint can be automatically deserialized to the expected type. The deserialization function is determined based on the naming convention in the graph input and output tags in the graph config. Check [more details](docs\u002Fmediapipe.md#supported-inputoutput-packet-types).  \r\n  * `OpenVINOInferenceCalculator` support for a range of input formats from `ov::Tensor` to `tensorflow::Tensor` and `TfLite::Tensor` - the `OpenVINOInferenceCalculator` has been created as a replacement for TensorFlow calculators. It can accept input data and return data in a range of possible formats. That simplifies swapping inference-related nodes in existing graphs without changing the rest of the graph. Learn [more about the calculators](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmediapipe\u002Fblob\u002Fmain\u002Fmediapipe\u002Fcalculators\u002Fovms\u002Fcalculators.md) \r\n  * Added demos based on MediaPipe upstream graphs: [holistic sensory analysis](demos\u002Fmediapipe\u002Fholistic_tracking\u002FREADME.md), [object detection](demos\u002Fmediapipe\u002Fobject_detection\u002FREADME.md), [iris detection](demos\u002Fmediapipe\u002Fholistic_tracking\u002FREADME.md#run-client-application-for-iris-tracking)\r\n* Improvements in the C-API interface:\r\n  * Added an `OVMS_ApiVersion` call \r\n  * Added support for C-API calls to DAG pipelines \r\n  * Changed the data type in API calls for the data shape from `uint64_t` to `int64_t` and `dimCount` from `uint32_t` to `size_t`; this is a breaking change \r\n  * Added a call to servable (model, DAG) [metadata](docs\u002Fmodel_server_c_api.md#servable-metadata) and [state](docs\u002Fmodel_server_c_api.md#servable-readiness)\r\n  * Added a call to get [ServerMetadata](docs\u002Fmodel_server_c_api.md#servable-metadata)\r\n* Improvements in error handling \r\n* Improvements in gRPC and REST status codes - the error statuses will include more meaningful and accurate info about the culprit \r\n* Support for models with scalars on input (empty shape) - the model server can be used with models whose input shape is represented by an empty list `[]` (scalar); a client sketch follows at the end of these notes \r\n* Support for inputs with zero-size dimensions - the model server can now accept requests to dynamic shape models even with a `0`-sized dimension, like `[0,234]` \r\n* Added support for TFLite models - OpenVINO Model Server can now directly serve models with the `.tflite` extension \r\n* Demo improvements: \r\n  * Added video streaming demos - [text detection](demos\u002Fhorizontal_text_detection\u002Fpython\u002FREADME.md#rtsp-client) and [holistic pose tracking](demos\u002Fmediapipe\u002Fholistic_tracking\u002FREADME.md#rtsp-client)\r\n  * Stable diffusion [demo](demos\u002Fstable-diffusion\u002Fpython\u002FREADME.md)\r\n  * MediaPipe [demos](demos\u002Fmediapipe)\r\n\r\n## Breaking changes\r\n * Changed a few of the C-API function names. Check [this commit](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fpull\u002F2012\u002Ffiles#diff-afe687ca58f92d43e499a9d684e30d39e86ceebdf175d68bc8e2bca34784e67b)\r\n\r\n## Bug fixes\r\n* Fixed the REST status code when an improper path is requested \r\n* The metrics endpoint now returns a correct response even with unsupported parameters\r\n\r\nYou can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:\r\n`docker pull openvino\u002Fmodel_server:2023.1` - CPU device support with the image based on Ubuntu 20.04\r\n`docker pull openvino\u002Fmodel_server:2023.1-gpu` - GPU and CPU device support with the image based on Ubuntu 22.04\r\nor use the provided binary packages.\r\nThe prebuilt image is also available on [RedHat Ecosystem Catalog](https:\u002F\u002Fcatalog.redhat.com\u002Fsoftware\u002Fcontainers\u002Fintel\u002Fopenvino-model-server\u002F607833052937385fc98515de)
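\r\n\r\nTo illustrate the scalar (empty shape) support, a sketch over the KServe gRPC API using the `tritonclient` package follows; the port, the model name `scalar_model` and the tensor names are assumptions.\r\n\r\n```python\r\nimport numpy as np\r\nimport tritonclient.grpc as grpcclient\r\n\r\n# Hypothetical model with a scalar FP32 input (empty shape []).\r\nclient = grpcclient.InferenceServerClient(url='localhost:9000')\r\nscalar = np.array(3.14, dtype=np.float32)  # 0-dimensional array, shape ()\r\ninfer_input = grpcclient.InferInput('threshold', [], 'FP32')\r\ninfer_input.set_data_from_numpy(scalar)\r\nresult = client.infer(model_name='scalar_model', inputs=[infer_input])\r\nprint(result.as_numpy('output'))\r\n```\r\n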
","2023-09-18T13:17:07",{"id":271,"version":272,"summary_zh":273,"released_at":274},255308,"v2023.0","The 2023.0 is a major release with numerous improvements and changes.\r\n\r\n## New Features\r\n* Added an option to submit [inference requests in the form of strings](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_docs_text.html) and read the response also in the form of a string. That can currently be utilized via custom nodes and OpenVINO models with a CPU extension handling string data:\r\n    * Using a custom node in a [DAG pipeline](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_docs_dag.html) which can perform string tokenization before passing it to the OpenVINO model - that is beneficial for models without a tokenization layer, to fully delegate that preprocessing to the model server.\r\n    * Using a custom node in a DAG pipeline which can perform string detokenization of the model response to convert it to a string format - that can be beneficial for models without a detokenization layer, to fully delegate that postprocessing to the model server.\r\n    * Both options above are demonstrated with a GPT model for text generation in a [demo](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_demo_gptj_causal_lm.html).\r\n    * For models with a tokenization layer, like [universal-sentence-encoder](https:\u002F\u002Ftfhub.dev\u002Fgoogle\u002Funiversal-sentence-encoder-multilingual\u002F3), there is an added CPU extension which implements a sentencepiece_tokenization layer. Users can pass a string to the model, and it is automatically converted to the format needed by the CPU extension.\r\n    * The option above is demonstrated in the universal-sentence-encoder model usage [demo](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_demo_universal-sentence-encoder.html).\r\n    * Added support for string input and output in [ovmsclient](https:\u002F\u002Fpypi.org\u002Fproject\u002Fovmsclient\u002F) – the `ovmsclient` library can be used to send string data to the model server. Check the code [snippets](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_docs_clients_tfs.html).\r\n* Preview version of OVMS with the MediaPipe framework - it is possible to make calls to OpenVINO Model Server to perform MediaPipe graph processing. There are calculators performing OpenVINO inference via C-API calls from OpenVINO Model Server, and also calculators converting the `ov::Tensor` input format to the MediaPipe image format. That creates a foundation for creating arbitrary graphs. Check the [model server integration with mediapipe documentation](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_docs_mediapipe.html).\r\n* Extended the [C-API interface](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_docs_c_api.html) with ApiVersion and Metadata calls; the C-API version is now 0.3.\r\n* Added support for the saved_model format. Check how to create a [models repository](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_docs_models_repository.html). An example of such a use case is in the [universal-sentence-encoder demo](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_demo_universal-sentence-encoder.html).\r\n* Added an option to [build the model server with the NVIDIA plugin](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Fblob\u002Freleases\u002F2023\u002F0\u002Fdocs\u002Fbuild_from_source.md#nvidia) on the UBI8 base image.\r\n* Virtual plugins AUTO, HETERO and MULTI are now supported with the [NVIDIA plugin](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_docs_target_devices.html#using-nvidia-plugin).\r\n* In the DEBUG log_level, a message is included about the actual execution device for each inference request with the [AUTO target_device](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_docs_target_devices.html#using-auto-plugin). Learn more about the [AUTO plugin](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fopenvino_docs_OV_UG_supported_plugins_AUTO.html).\r\n* Support for relative paths to the model files. The paths can now be relative to the config.json location. It simplifies deployments when the config.json is distributed together with the models repository.\r\n* Updated OpenCL drivers for the GPU device to version 23.13 (with the Ubuntu 22.04 base image).\r\n* Added an option to build OVMS on the base OS Ubuntu:22.04. This is an addition to the supported base OSes Ubuntu:20.04 and UBI 8.7.\r\n\r\n## Breaking changes\r\n* KServe API unification with the Triton implementation for handling string and encoded image formats: now every string or encoded image located in the binary extension (REST) or raw_input_contents (gRPC) needs to be preceded by 4 bytes (little endian) containing its size (a packing sketch follows at the end of these notes). See the updated [code snippets](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_docs_clients_kfs.html) and [samples](https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fmodel_server\u002Ftree\u002Freleases\u002F2023\u002F0\u002Fclient\u002Fpython\u002Fkserve-api\u002Fsamples).\r\n* Changed the default performance hint from THROUGHPUT to LATENCY. With the new default settings, the model server will be adjusted for optimal execution and minimal latency with low concurrency. The default setting will also minimize memory consumption. In case of model usage with high concurrency, it is recommended to adjust NUM_STREAMS or set the performance hint to THROUGHPUT explicitly. Read more in the [performance tuning guide](https:\u002F\u002Fdocs.openvino.ai\u002F2023.0\u002Fovms_docs_perfor
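\r\n\r\nThe 4-byte framing described above can be produced in a few lines; below is a minimal sketch of packing strings for raw_input_contents (gRPC) or the REST binary extension. The helper name is illustrative.\r\n\r\n```python\r\nimport struct\r\n\r\ndef pack_kserve_bytes(items):\r\n    # Each string or encoded image is preceded by 4 bytes (little endian)\r\n    # holding its byte length, as required by the unified KServe API.\r\n    payload = b''\r\n    for item in items:\r\n        data = item.encode() if isinstance(item, str) else item\r\n        payload += struct.pack('<I', len(data)) + data\r\n    return payload\r\n\r\nprint(pack_kserve_bytes(['hello', 'world']).hex())\r\n```\r\n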
","2023-06-01T15:35:31",{"id":276,"version":277,"summary_zh":278,"released_at":279},255309,"v2022.3.0.1","The 2022.3.0.1 version is a patch release for the OpenVINO Model Server. It includes a few bug fixes and enhancements in the C-API.\r\n\r\n## New Features\r\n* Added support for DAG pipelines to the inference execution method OVMS_Inference in the C-API. The parameter servableName can be either a model name or a pipeline name. \r\n* Added a debug log in the AUTO plugin execution to report which physical device is used - the AUTO plugin allocates the best available device for the model execution. For troubleshooting purposes, in the debug log level, the model server will report which device is used for each inference execution. \r\n* Allowed enabling metrics collection via CLI parameters while using the configuration file. Metrics collection can be configured in CLI parameters or in the configuration file. Enabling the metrics in the CLI no longer blocks the usage of the configuration file to define multiple models for serving.\r\n* Added a client sample in Java to demonstrate KServe API usage.\r\n* Added a client sample in Go to demonstrate KServe API usage.\r\n* Added client samples demonstrating asynchronous calls via the KServe API (see the sketch after this list).\r\n* Added a [demo](https:\u002F\u002Fdocs.openvino.ai\u002Fnightly\u002Fovms_demo_gptj_causal_lm.html#doxid-ovms-demo-gptj-causal-lm) showcasing OVMS with the GPT-J-6b model from Hugging Face.\r\n
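\r\n\r\nFor illustration, a minimal sketch of an asynchronous KServe gRPC call using the `tritonclient` package follows; the port, the model name `resnet` and the tensor names are assumptions, and the added client samples remain the reference.\r\n\r\n```python\r\nimport numpy as np\r\nimport tritonclient.grpc as grpcclient\r\n\r\n# Hypothetical model with one FP32 input named 'input'.\r\nclient = grpcclient.InferenceServerClient(url='localhost:9000')\r\n\r\ndef on_result(result, error):\r\n    # Called on a worker thread when the response arrives.\r\n    if error is not None:\r\n        print(f'inference failed: {error}')\r\n    else:\r\n        print(result.as_numpy('output'))\r\n\r\ndata = np.zeros((1, 3, 224, 224), dtype=np.float32)\r\ninfer_input = grpcclient.InferInput('input', list(data.shape), 'FP32')\r\ninfer_input.set_data_from_numpy(data)\r\nclient.async_infer(model_name='resnet', inputs=[infer_input], callback=on_result)\r\n# Real code would synchronize with the callback before exiting.\r\n```\r\n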
\r\n## Bug fixes\r\n* Fixed model server image building with the NVIDIA plugin on a host with the NVIDIA Container Toolkit installed. \r\n* Fixed the KServe API response to include the DAG pipeline name for calls to a DAG – based on the API definition, the response includes the servable name. In case of DAG processing, it will now return the pipeline name instead of an empty value. \r\n* The default number of gRPC and REST workers will be calculated correctly based on allocated CPU cores – when the model server is started in a Docker container with constrained CPU allocation, the default number of frontend threads will be set more efficiently. \r\n* Corrected reporting of the number of streams in the metrics while using non-CPU plugins – before fixing that bug, a zero value was returned. That metric suggests the optimal number of active parallel inference calls for the best throughput performance.\r\n* Fixed handling of model mapping with model reloads.\r\n* Fixed handling of model mapping with dynamic shape\u002Fbatch size.\r\n* ovmsclient no longer causes conflicts with the tensorflow-serving-api package installed in the same Python environment.\r\n* Fixed debug image building.\r\n* Fixed C-API demo building.\r\n* Added security fixes.\r\n\r\n## Other changes\r\n* Updated the OpenCV version to 4.7 - OpenCV is an included dependency for image transformation in the custom nodes and for jpeg\u002Fpng input decoding.\r\n* Lengthened the request waiting timeout during DAG reloads. On slower machines, the timeout was sporadically reached during DAG configuration reload, ending in an unsuccessful request.\r\n* ovmsclient has more relaxed requirements related to the numpy version.\r\n* Improved unit test stability.\r\n* Improved documentation.\r\n\r\nYou can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:\r\n`docker pull openvino\u002Fmodel_server:2022.3.0.1`\r\n`docker pull openvino\u002Fmodel_server:2022.3.0.1-gpu`\r\nor use the provided binary packages.\r\n","2023-02-27T14:44:51"]