[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-kserve--kserve":3,"tool-kserve--kserve":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",140436,2,"2026-04-05T23:32:43",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":67,"owner_name":76,"owner_avatar_url":77,"owner_bio":68,"owner_company":78,"owner_location":78,"owner_email":79,"owner_twitter":78,"owner_website":80,"owner_url":81,"languages":82,"stars":109,"forks":110,"last_commit_at":111,"license":112,"difficulty_score":113,"env_os":114,"env_gpu":115,"env_ram":114,"env_deps":116,"category_tags":119,"github_topics":120,"view_count":10,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":140,"updated_at":141,"faqs":142,"releases":171},480,"kserve\u002Fkserve","kserve","Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes","KServe 是一个基于 Kubernetes 的 AI 模型推理平台，专注于统一部署生成式和预测式 AI 模型。它通过标准化接口和自动化流程，简化了在云端或本地环境中部署、扩展和管理 AI 模型的复杂性，支持 TensorFlow、PyTorch、Hugging Face 等主流框架，同时针对大语言模型（LLM）提供性能优化方案。\n\n传统 AI 部署常面临框架适配困难、资源利用率低、维护成本高等问题。KServe 通过内置的 GPU 加速、智能内存管理（如 KV Cache 卸载）、动态自动扩展等功能，显著提升了推理效率并降低了硬件成本。其独特的多框架兼容能力，允许用户在单一平台上混合部署生成式（如文本生成）和预测式（如分类回归）模型，减少重复建设。\n\n该工具特别适合需要大规模部署 AI 服务的企业开发团队、云原生架构师以及 AI 研究人员。开发者可通过标准 API 快速集成模型，运维人员则能利用其可观察性工具（如异常检测、模型解释）保障服务稳定性。技术亮点包括对 vLLM 等高效推理后端的原生支持、基于流量的智能路由策略，以及通过 CNCF 孵化认证的社区生态保障。用户可访问官网获取部署指南和案例参考。","KServe 是一个基于 Kubernetes 的 AI 模型推理平台，专注于统一部署生成式和预测式 AI 模型。它通过标准化接口和自动化流程，简化了在云端或本地环境中部署、扩展和管理 AI 模型的复杂性，支持 
TensorFlow、PyTorch、Hugging Face 等主流框架，同时针对大语言模型（LLM）提供性能优化方案。\n\n传统 AI 部署常面临框架适配困难、资源利用率低、维护成本高等问题。KServe 通过内置的 GPU 加速、智能内存管理（如 KV Cache 卸载）、动态自动扩展等功能，显著提升了推理效率并降低了硬件成本。其独特的多框架兼容能力，允许用户在单一平台上混合部署生成式（如文本生成）和预测式（如分类回归）模型，减少重复建设。\n\n该工具特别适合需要大规模部署 AI 服务的企业开发团队、云原生架构师以及 AI 研究人员。开发者可通过标准 API 快速集成模型，运维人员则能利用其可观察性工具（如异常检测、模型解释）保障服务稳定性。技术亮点包括对 vLLM 等高效推理后端的原生支持、基于流量的智能路由策略，以及通过 CNCF 孵化认证的社区生态保障。用户可访问官网获取部署指南和案例参考。","# KServe\n[![go.dev reference](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fgo.dev-reference-007d9c?logo=go&logoColor=white)](https:\u002F\u002Fpkg.go.dev\u002Fgithub.com\u002Fkserve\u002Fkserve)\n[![Coverage Status](https:\u002F\u002Fimg.shields.io\u002Fendpoint?url=https:\u002F\u002Fgist.githubusercontent.com\u002Fandyi2it\u002F5174bd748ac63a6e4803afea902e9810\u002Fraw\u002Fcoverage.json)](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Factions\u002Fworkflows\u002Fgo.yml)\n[![Go Report Card](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkserve_kserve_readme_2b4a70945b89.png)](https:\u002F\u002Fgoreportcard.com\u002Freport\u002Fgithub.com\u002Fkserve\u002Fkserve)\n[![OpenSSF Best Practices](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkserve_kserve_readme_50ccde67b228.png)](https:\u002F\u002Fbestpractices.coreinfrastructure.org\u002Fprojects\u002F6643)\n[![Releases](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease-pre\u002Fkserve\u002Fkserve.svg?sort=semver)](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Freleases)\n[![LICENSE](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fkserve\u002Fkserve.svg)](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fblob\u002Fmaster\u002FLICENSE)\n[![Slack 
Status](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fslack-join_chat-white.svg?logo=slack&style=social)](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fcommunity\u002Fblob\u002Fmain\u002FREADME.md#questions-and-issues)\n[![Gurubase](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGurubase-Ask%20KServe%20Guru-006BFF)](https:\u002F\u002Fgurubase.io\u002Fg\u002Fkserve)\n\nKServe is a standardized distributed generative and predictive AI inference platform for scalable, multi-framework deployment on Kubernetes.\n\nKServe is being [used by many organizations](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fcommunity\u002Fadopters) and is a [Cloud Native Computing Foundation (CNCF)](https:\u002F\u002Fwww.cncf.io\u002F) incubating project.\n\nFor more details, visit the [KServe website](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002F).\n\n![KServe](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkserve_kserve_readme_4cbc7fbcf8da.png)\n\n### Why KServe?\n\nSingle platform that unifies Generative and Predictive AI inference on Kubernetes. 
Simple enough for quick deployments, yet powerful enough to handle enterprise-scale AI workloads with advanced features.\n\n### Features\n\n**Generative AI**\n  * 🧮 **Optimized Backends**: Support for vLLM and llm-d for optimized performance for serving LLMs\n  * 📌 **Standardization**: OpenAI-compatible inference protocol for seamless integration with LLMs\n  * 🚅 **GPU Acceleration**: High-performance serving with GPU support and optimized memory management for large models\n  * 💾 **Model Caching**: Intelligent model caching to reduce loading times and improve response latency for frequently used models\n  * 🗂️ **KV Cache Offloading**: Advanced memory management with KV cache offloading to CPU\u002Fdisk for handling longer sequences efficiently\n  * 📈 **Autoscaling**: Request-based autoscaling capabilities optimized for generative workload patterns\n  * 🔧 **Hugging Face Ready**: Native support for Hugging Face models with streamlined deployment workflows\n\n**Predictive AI**\n  * 🧮 **Multi-Framework**: Support for TensorFlow, PyTorch, scikit-learn, XGBoost, ONNX, and more\n  * 🔀 **Intelligent Routing**: Seamless request routing between predictor, transformer, and explainer components with automatic traffic management\n  * 🔄 **Advanced Deployments**: Canary rollouts, inference pipelines, and ensembles with InferenceGraph\n  * ⚡ **Autoscaling**: Request-based autoscaling with scale-to-zero for predictive workloads\n  * 🔍 **Model Explainability**: Built-in support for model explanations and feature attribution to understand prediction reasoning\n  * 📊 **Advanced Monitoring**: Enables payload logging, outlier detection, adversarial detection, and drift detection\n  * 💰 **Cost Efficient**: Scale-to-zero on expensive resources when not in use, reducing infrastructure costs\n\n### Learn More\nTo learn more about KServe, how to use various supported features, and how to participate in the KServe community, \nplease follow the [KServe website 
documentation](https:\u002F\u002Fkserve.github.io\u002Fwebsite). \nAdditionally, we have compiled a list of [presentations and demos](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fcommunity\u002Fpresentations) to dive through various details.\n\n### :hammer_and_wrench: Installation\n\n#### Standalone Installation\n- **[Standard Kubernetes Installation](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fadmin-guide\u002Foverview#raw-kubernetes-deployment)**: Compared to Serverless Installation, this is a more **lightweight** installation. However, this option does not support canary deployment and request based autoscaling with scale-to-zero.\n- **[Knative Installation](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fadmin-guide\u002Foverview#serverless-deployment)**: KServe by default installs Knative for **serverless deployment** for InferenceService.\n- **[ModelMesh Installation](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fadmin-guide\u002Foverview#modelmesh-deployment)**: You can optionally install ModelMesh to enable **high-scale**, **high-density** and **frequently-changing model serving** use cases. \n- **[Quick Installation](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fgetting-started\u002Fquickstart-guide)**: Install KServe on your local machine.\n\n#### Kubeflow Installation\nKServe is an important addon component of Kubeflow, please learn more from the [Kubeflow KServe documentation](https:\u002F\u002Fwww.kubeflow.org\u002Fdocs\u002Fexternal-add-ons\u002Fkserve\u002Fkserve). 
Check out the following guides for running [on AWS](https:\u002F\u002Fawslabs.github.io\u002Fkubeflow-manifests\u002Fmain\u002Fdocs\u002Fcomponent-guides\u002Fkserve) or [on OpenShift Container Platform](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fblob\u002Fmaster\u002Fdocs\u002FOPENSHIFT_GUIDE.md).\n\n### :flight_departure: [Create your first InferenceService](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fgetting-started\u002Fgenai-first-isvc)\n\n### :bulb: [Roadmap](.\u002FROADMAP.md)\n\n### :blue_book: [InferenceService API Reference](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Freference\u002Fcrd-api)\n\n### :toolbox: [Developer Guide](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fdeveloper-guide)\n\n### :writing_hand: [Contributor Guide](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fdeveloper-guide\u002Fcontribution)\n\n### :handshake: [Adopters](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fcommunity\u002Fadopters)\n\n### Star History\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkserve_kserve_readme_573c534db316.png)](https:\u002F\u002Fwww.star-history.com\u002F#kserve\u002Fkserve&Date)\n\n### Contributors\n\nThanks to all of our amazing contributors!\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkserve_kserve_readme_802cd3758daf.png\" \u002F>\n\u003C\u002Fa>\n","# KServe\n[![go.dev reference](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fgo.dev-reference-007d9c?logo=go&logoColor=white)](https:\u002F\u002Fpkg.go.dev\u002Fgithub.com\u002Fkserve\u002Fkserve)\n[![Coverage 
Status](https:\u002F\u002Fimg.shields.io\u002Fendpoint?url=https:\u002F\u002Fgist.githubusercontent.com\u002Fandyi2it\u002F5174bd748ac63a6e4803afea902e9810\u002Fraw\u002Fcoverage.json)](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Factions\u002Fworkflows\u002Fgo.yml)\n[![Go Report Card](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkserve_kserve_readme_2b4a70945b89.png)](https:\u002F\u002Fgoreportcard.com\u002Freport\u002Fgithub.com\u002Fkserve\u002Fkserve)\n[![OpenSSF Best Practices](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkserve_kserve_readme_50ccde67b228.png)](https:\u002F\u002Fbestpractices.coreinfrastructure.org\u002Fprojects\u002F6643)\n[![Releases](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease-pre\u002Fkserve\u002Fkserve.svg?sort=semver)](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Freleases)\n[![LICENSE](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fkserve\u002Fkserve.svg)](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fblob\u002Fmaster\u002FLICENSE)\n[![Slack Status](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fslack-join_chat-white.svg?logo=slack&style=social)](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fcommunity\u002Fblob\u002Fmain\u002FREADME.md#questions-and-issues)\n[![Gurubase](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGurubase-Ask%20KServe%20Guru-006BFF)](https:\u002F\u002Fgurubase.io\u002Fg\u002Fkserve)\n\nKServe 是一个标准化的分布式生成式和预测式 AI 推理平台，用于在 Kubernetes 上进行可扩展的多框架部署。\n\nKServe 正被 [许多组织使用](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fcommunity\u002Fadopters)，并且是 [云原生计算基金会 (CNCF)](https:\u002F\u002Fwww.cncf.io\u002F) 的孵化项目。\n\n更多详情，请访问 [KServe 官网](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002F)。\n\n![KServe](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkserve_kserve_readme_4cbc7fbcf8da.png)\n\n### 为什么选择 KServe？\n\n单一平台统一了 Kubernetes 上的生成式和预测式 AI 推理。足够简单以实现快速部署，同时又足够强大以处理企业级 AI 工作负载并具备高级功能。\n\n### 功能特性\n\n**生成式 AI**\n  
* 🧮 **优化后端**：支持 vLLM 和 llm-d，为 LLM 服务提供优化性能\n  * 📌 **标准化**：OpenAI 兼容的推理协议，实现与 LLM 的无缝集成\n  * 🚅 **GPU 加速**：高性能服务，支持 GPU 及针对大模型的优化内存管理\n  * 💾 **模型缓存**：智能模型缓存，减少加载时间并改善常用模型的响应延迟\n  * 🗂️ **KV 缓存卸载**：高级内存管理，将 KV 缓存 (KV Cache) 卸载到 CPU\u002F磁盘，高效处理长序列\n  * 📈 **自动扩缩容**：基于请求的自动扩缩容 (Autoscaling) 能力，针对生成式工作负载模式进行了优化\n  * 🔧 **Hugging Face 就绪**：原生支持 Hugging Face 模型，简化部署流程\n\n**预测式 AI**\n  * 🧮 **多框架**：支持 TensorFlow、PyTorch、scikit-learn、XGBoost、ONNX 等\n  * 🔀 **智能路由**：预测器、转换器和解释器组件之间无缝的请求路由，以及自动流量管理\n  * 🔄 **高级部署**：配合 InferenceGraph 实现金丝雀发布 (Canary)、推理管道和集成\n  * ⚡ **自动扩缩容**：基于请求的自动扩缩容 (Autoscaling)，支持预测式工作负载的缩容至零 (Scale-to-zero)\n  * 🔍 **模型可解释性**：内置模型解释和特征归因支持，理解预测推理\n  * 📊 **高级监控**：启用载荷日志记录、异常检测、对抗检测和漂移检测\n  * 💰 **成本效益**：在不使用时对昂贵资源进行缩容至零 (Scale-to-zero)，降低基础设施成本\n\n### 了解更多\n要了解有关 KServe 的更多信息、如何使用各种支持的功能以及如何参与 KServe 社区，请参阅 [KServe 网站文档](https:\u002F\u002Fkserve.github.io\u002Fwebsite)。此外，我们还整理了一份 [演讲和演示](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fcommunity\u002Fpresentations) 列表，以便深入了解各种细节。\n\n### :hammer_and_wrench: 安装\n\n#### 独立安装\n- **[标准 Kubernetes 安装](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fadmin-guide\u002Foverview#raw-kubernetes-deployment)**：与无服务器安装相比，这是一种更**轻量级**的安装方式。但是，此选项不支持金丝雀部署和基于请求的缩容至零 (Scale-to-zero) 自动扩缩容。\n- **[Knative 安装](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fadmin-guide\u002Foverview#serverless-deployment)**：KServe 默认安装 Knative 以用于 InferenceService 的**无服务器 (Serverless)** 部署。\n- **[ModelMesh 安装](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fadmin-guide\u002Foverview#modelmesh-deployment)**：您可以选择安装 ModelMesh 以启用**高扩展性**、**高密度**和**频繁变更的模型服务**用例。 \n- **[快速安装](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fgetting-started\u002Fquickstart-guide)**：在本地机器上安装 KServe。\n\n#### Kubeflow 安装\nKServe 是 Kubeflow 的重要附加组件，请从 [Kubeflow KServe 文档](https:\u002F\u002Fwww.kubeflow.org\u002Fdocs\u002Fexternal-add-ons\u002Fkserve\u002Fkserve) 了解更多。查看以下指南以了解如何在 
[AWS](https:\u002F\u002Fawslabs.github.io\u002Fkubeflow-manifests\u002Fmain\u002Fdocs\u002Fcomponent-guides\u002Fkserve) 或 [OpenShift 容器平台](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fblob\u002Fmaster\u002Fdocs\u002FOPENSHIFT_GUIDE.md) 上运行。\n\n### :flight_departure: [创建您的第一个 InferenceService](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fgetting-started\u002Fgenai-first-isvc)\n\n### :bulb: [路线图](.\u002FROADMAP.md)\n\n### :blue_book: [InferenceService API 参考](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Freference\u002Fcrd-api)\n\n### :toolbox: [开发者指南](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fdeveloper-guide)\n\n### :writing_hand: [贡献者指南](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fdeveloper-guide\u002Fcontribution)\n\n### :handshake: [采用者](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fcommunity\u002Fadopters)\n\n### Star 历史记录\n\n[![Star History Chart](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkserve_kserve_readme_573c534db316.png)](https:\u002F\u002Fwww.star-history.com\u002F#kserve\u002Fkserve&Date)\n\n### 贡献者\n\n感谢所有杰出的贡献者！\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkserve_kserve_readme_802cd3758daf.png\" \u002F>\n\u003C\u002Fa>","# KServe 快速上手指南\n\n**KServe** 是一个标准化的分布式生成式和预测式 AI 推理平台，支持在 Kubernetes 上进行可扩展的多框架部署。作为 CNCF 孵化项目，它统一了生成式 AI（如 LLM）和预测式 AI 的推理服务。\n\n---\n\n## 1. 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n- **集群要求**: 已运行的 Kubernetes 集群（版本建议 v1.20+）。\n- **工具依赖**:\n  - `kubectl`: 用于管理集群资源。\n  - `helm`: 推荐使用 Helm 进行组件安装（v3.x+）。\n- **网络配置**:\n  - 由于涉及拉取镜像，国内开发者可能需要配置 Docker 镜像加速或代理，以确保能顺利拉取 `kserve` 相关镜像。\n  - 如需本地测试，可使用 Minikube 或 Kind 搭建本地集群。\n\n---\n\n## 2. 
安装步骤\n\nKServe 支持多种安装模式，您可以根据需求选择。默认推荐 **Knative Serverless** 模式以启用自动扩缩容功能。\n\n### 方式一：Knative 安装（默认推荐）\n此模式支持请求驱动的自动扩缩容（Scale-to-Zero），适合企业级负载。\n\n```bash\n# 添加仓库\nhelm repo add kserve https:\u002F\u002Fkserve.github.io\u002Fkserve\n\n# 更新仓库\nhelm repo update\n\n# 安装 KServe (默认包含 Knative)\nhelm install kserve kserve\u002Fkserve --namespace kserve --create-namespace\n```\n\n### 方式二：标准 Kubernetes 安装（轻量级）\n如果您不需要 Knative 的 Serverless 特性，可选择此模式（注意：该模式不支持 Canary 发布和缩容至零）。\n> 详细参数请参考 [Standard Kubernetes Installation](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fadmin-guide\u002Foverview#raw-kubernetes-deployment)。\n\n### 方式三：ModelMesh 安装\n适用于高并发、高密度及模型频繁变更的场景。\n> 参考 [ModelMesh Installation](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fadmin-guide\u002Foverview#modelmesh-deployment)。\n\n### 方式四：Kubeflow 集成\nKServe 也是 Kubeflow 的重要组件。\n> 参考 [Kubeflow KServe 文档](https:\u002F\u002Fwww.kubeflow.org\u002Fdocs\u002Fexternal-add-ons\u002Fkserve\u002Fkserve)。\n\n---\n\n## 3. 基本使用\n\n安装完成后，创建一个最简单的 `InferenceService` 来部署一个预测模型。以下示例展示了如何部署一个基于 scikit-learn 的 iris 分类模型。\n\n### 创建 InferenceService 资源\n\n将以下内容保存为 `my-model.yaml`：\n\n```yaml\napiVersion: serving.kserve.io\u002Fv1beta1\nkind: InferenceService\nmetadata:\n  name: my-model\nspec:\n  predictor:\n    sklearn:\n      storageUri: \"gs:\u002F\u002Fkfserving-samples\u002Fmodels\u002Fsklearn\u002Firis\"\n      resources:\n        requests:\n          cpu: 100m\n          memory: 128Mi\n```\n\n应用该资源：\n\n```bash\nkubectl apply -f my-model.yaml\n```\n\n### 验证服务状态\n\n检查服务是否就绪：\n\n```bash\nkubectl get isvc my-model\n```\n\n当状态显示为 `Ready` 时，即可通过 HTTP 端点发送推理请求。\n\n---\n\n## 4. 
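发送推理请求（示例）\n\n下面是一个示意性的请求草案（假设沿用上文的 `my-model` sklearn iris 示例，并已通过 `kubectl port-forward` 将 Ingress 网关映射到本地 8080 端口；主机名、端口与特征取值均为示例，请按实际环境调整）：\n\n```bash\n# 从 InferenceService 状态 URL 中提取服务主机名（用作路由的 Host 头）\nSERVICE_HOSTNAME=$(kubectl get inferenceservice my-model -o jsonpath='{.status.url}' | cut -d \"\u002F\" -f 3)\n\n# 按 KServe v1 推理协议发送预测请求（iris 四维特征示例）\ncurl -H \"Host: ${SERVICE_HOSTNAME}\" -H \"Content-Type: application\u002Fjson\" -d '{\"instances\": [[6.8, 2.8, 4.8, 1.4]]}' \"http:\u002F\u002Flocalhost:8080\u002Fv1\u002Fmodels\u002Fmy-model:predict\"\n```\n\n若请求成功，服务将返回形如 `{\"predictions\": [...]}` 的 JSON 响应。\n\n---\n\n## 5. 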
更多资源\n\n- **完整文档**: [KServe Website](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002F)\n- **API 参考**: [InferenceService API Reference](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Freference\u002Fcrd-api)\n- **开发者指南**: [Developer Guide](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fdeveloper-guide)\n- **社区交流**: [Slack Join Chat](https:\u002F\u002Fgithub.com\u002Fkserve\u002Fcommunity\u002Fblob\u002Fmain\u002FREADME.md#questions-and-issues)","某电商公司计划在 Kubernetes 集群中统一部署商品推荐预测模型与智能客服大模型，以支撑大促期间的业务需求。\n\n### 没有 kserve 时\n- 不同框架模型需独立搭建推理服务，运维团队需维护多套环境，资源利用率低。\n- 大促期间流量突增，依赖人工手动扩容，响应滞后常导致服务雪崩。\n- 缺乏统一的监控指标，难以追踪模型预测偏差、数据漂移或异常攻击请求。\n- 新模型上线流程繁琐，需要编写大量定制化脚本配置 GPU 环境与依赖包。\n\n### 使用 kserve 后\n- kserve 支持多框架统一接入，PyTorch 与 vLLM 等大模型可在同一平台标准化管理。\n- 基于请求量的自动扩缩容功能，确保高并发下服务稳定且空闲时自动缩容至零。\n- 内置漂移检测与可解释性组件，实时监控模型表现并自动生成预测依据报告。\n- 通过自定义资源定义一键部署，大幅简化了从训练完成到在线服务发布的流程。\n\nkserve 实现了异构 AI 模型的标准化托管，显著降低了企业级 AI 落地的运维复杂度与基础设施成本。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkserve_kserve_419ddae1.png","KServe","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fkserve_d27d5deb.png",null,"info@lfaidata.foundation","https:\u002F\u002Fkserve.github.io\u002Fwebsite","https:\u002F\u002Fgithub.com\u002Fkserve",[83,87,91,95,97,101,105],{"name":84,"color":85,"percentage":86},"Go","#00ADD8",64.5,{"name":88,"color":89,"percentage":90},"Python","#3572A5",33.4,{"name":92,"color":93,"percentage":94},"Dockerfile","#384d54",0.7,{"name":96,"color":85,"percentage":94},"Go Template",{"name":98,"color":99,"percentage":100},"Makefile","#427819",0.6,{"name":102,"color":103,"percentage":104},"Shell","#89e051",0.1,{"name":106,"color":107,"percentage":108},"Procfile","#3B2F63",0,5295,1424,"2026-04-05T15:19:48","Apache-2.0",5,"未说明","支持 GPU 加速，具体型号、显存及 CUDA 版本未说明",{"notes":117,"python":114,"dependencies":118},"1. 核心运行环境为 Kubernetes 集群，需预先部署 K8s。\n2. 支持生成式 AI（如 LLM）和预测式 AI 的统一推理平台。\n3. 可选安装 Knative 实现 Serverless 部署或 ModelMesh 实现高密度模型服务。\n4. 
兼容 TensorFlow, PyTorch, scikit-learn, XGBoost, ONNX 等多种模型框架。",[114],[26,13],[121,122,123,124,125,126,127,128,129,130,131,132,133,67,134,135,136,137,138,139],"knative","machine-learning","model-interpretability","model-serving","istio","kubeflow","artificial-intelligence","tensorflow","pytorch","xgboost","kubernetes","k8s","service-mesh","hacktoberfest","mlops","genai","llm-inference","cncf","vllm","2026-03-27T02:49:30.150509","2026-04-06T08:40:54.255974",[143,148,153,158,162,167],{"id":144,"question_zh":145,"answer_zh":146,"source_url":147},1890,"在 GCP 环境下部署 Flowers Sample 时，如何正确获取 CLUSTER_IP？","如果在执行命令后 CLUSTER_IP 返回空值，可能是因为 `istio-ingressgateway` 服务的 `.status.loadBalancer.ingress[0].ip` 字段为空。此时可以尝试使用 `.spec.clusterIP` 来获取集群内部 IP。例如：`kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.clusterIP}'`。","https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fissues\u002F734",{"id":149,"question_zh":150,"answer_zh":151,"source_url":152},1891,"运行预测时 SERVICE_HOSTNAME 变量为何显示为 null？","这通常是因为 InferenceService 的状态 URL 未正确生成或无法解析。请检查 InferenceService 的状态是否为 'Ready'。如果状态正常但仍为空，可能是 Ingress 配置问题。尝试从 `kubectl get inferenceservice ${MODEL_NAME} -o jsonpath='{.status.url}'` 中手动提取 hostname，确保没有多余的空格或截断。","https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fissues\u002F912",{"id":154,"question_zh":155,"answer_zh":156,"source_url":157},1892,"Deployment 更新失败报错 \"the object has been modified\" 该如何解决？","此错误通常发生在 Kubernetes 对象被并发修改时。根据经验，这可能与 Knative 版本和 Kubernetes 版本不兼容有关。建议检查 Knative 版本是否支持当前的 K8s 版本。例如，Knative 0.11 可能不支持 K8s 1.18，需升级 Knative 到 v0.15.0 以解决此类冲突。","https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fissues\u002F833",{"id":159,"question_zh":160,"answer_zh":161,"source_url":157},1893,"Kubernetes 1.18 与 Knative 0.11 版本不兼容导致的问题如何解决？","Kubernetes 1.18 发布后，Knative Serving 0.11 可能存在 Webhook Handler 不兼容问题。解决方法是卸载旧的 Knative 版本（如 0.11.1），并按照 Knative 官方文档重新安装较新的版本（如 v0.15.0）。安装完成后，KFServing 
样本应能正常运行。",{"id":163,"question_zh":164,"answer_zh":165,"source_url":166},1894,"使用 Kubeflow Master 分支时出现 IngressNotConfigured 错误是什么原因？","该错误通常与 Istio 版本有关。如果使用接近 kfctl_istio_dex.yaml 的配置，可能会安装 Istio 1.1 而非推荐的 1.3。由于 Manifests 更新，旧版 Istio 可能导致 Kservice 无法变为可用状态。建议参考相关文档（如 istio-dex README）确认 Istio 版本兼容性。","https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fissues\u002F668",{"id":168,"question_zh":169,"answer_zh":170,"source_url":166},1895,"Kubeflow 安装中 Istio 版本的选择有什么注意事项？","在 Kubeflow Master 分支中，建议使用 Istio 1.3 以获得更好的兼容性。部分云厂商（如 DigitalOcean）可能暂时无法运行 1.3 版本而只能使用 1.1，但这可能导致 Ingress 配置问题。如果遇到问题，请优先尝试升级到 Istio 1.3 并检查 KubeFlow 文档中的版本要求。",[172,177,182,187,192,197,201,206,211,216,221,225,230,235,240,244,249,254,259,264],{"id":173,"version":174,"summary_zh":175,"released_at":176},111059,"v0.17.0","## Installation\r\n\r\n- [Installation Guide](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fnext\u002Fgetting-started\u002Fquickstart-guide)\r\n\r\n## Breaking Change\r\n- [Major Helm Chart Restructuring ](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Finstall\u002Fupgrade-guide#upgrading-v0160-to-v0170)\r\n\r\n## What's Changed\r\n* choir: fix missing markers by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4793\r\n* fix: update base image for PMML to use eclipse-temurin JDK by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4794\r\n* add suffix kserve for helm chart of llmisvc by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4792\r\n* Parallelize blob downloads from azure in storage-initializer by @agoston in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4709\r\n* Parallelize blob downloads from S3 in storage-initializer by @agoston in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4714\r\n* chore: bump k8s package versions to v0.34.0 by @murataslan in 
https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4713\r\n* Prevent path traversal on https.go by @spolti in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4796\r\n* Refactor existing script to use centralized installation scripts by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4787\r\n* fix: CI failure caused by deprecated MinIO image by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4812\r\n* feat: Add a python script to generate quick install script by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4806\r\n* fix generator to be executable without cloning repo by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4813\r\n* Fixing LLMISVC CRD Installation Paths by @brettmthompson in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4811\r\n* feat: Add pathTemplate configuration for inference service routing by @DeepFlame-JR in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4817\r\n* helmchart: add image pull policy for serving runtimes by @stergem in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4814\r\n* fix: multiple storage uri volume mount error by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4822\r\n* Fix Bug in LLMISVC E2E Test Setup by @brettmthompson in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4841\r\n* fix: Fix various bugs in quickstart scripts by @JonahSussman in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4832\r\n* Pin starlette to version 0.49.1 to fix CVE-2025-62727 by @hdefazio in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4846\r\n* fix(llmisvc): adds missing generated clients by @bartoszmajsak in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4849\r\n* Change security mailing list email address 
by @terrytangyuan in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4853\r\n* Allow stopping LLMInferenceService by @pierDipi in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4839\r\n* Add cjohannsen-cloudera as a reviewer by @terrytangyuan in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4854\r\n* update go version to 1.25 and kubebuilder version to 1.9.0 by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4862\r\n* fix: remove cert-manager annotation in inferenceservice crd by @juhyeon-cha in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4837\r\n* update kserve version and update helm chart name by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4860\r\n* fix: Address CVE-2025-22872, CVE-2025-47914, CVE-2025-58181 by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4866\r\n* Add Support For Protocol in S3 Endpoint by @brettmthompson in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4797\r\n* fix: llmisvc image patch script to use renamed manager yaml file by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4820\r\n* fix make deploy target in makefile by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4872\r\n* ci: add manual trigger for helm publish workflow by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4873\r\n* docs: Update KServe diagram to use the KubeCon NA version by @terrytangyuan in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4877\r\n* CVE-2024-43598: Update lightgbm version to 4.6.0 by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4799\r\n* fix: the quick install scripts temporarily. 
by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4880\r\n* mock test for ray by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4885\r\n* fix: synced kserve-helm.sh and kserve-kustomize.sh by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4891\r\n* fix: update LWS version to 0.7.0 to fix multi-node E2E test failure by @vivekk16 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4892\r\n* OCI model storage is not working with multi-node feature by @spolti in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4889\r\n* Follow up 4787 by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4810\r\n* Fix CI precommit error by @andyi2it in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4897\r\n* fix: clear stale router and scheduler conditions on config changes by @vivekk16 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4876\r\n* feat: bump up vllm version to 0.11.2 & remove python 3.9 support by @csy1204 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4851\r\n* Remove deprecated --disable-log-requests flag for vLLM 0.11 compatibility by @vivekk16 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4883\r\n* Improved CA Bundle","2026-03-13T23:12:37",{"id":178,"version":179,"summary_zh":180,"released_at":181},111060,"v0.17.0-rc1","## Installation\r\n\r\n- [Installation Guide](https:\u002F\u002Fkserve.github.io\u002Fwebsite\u002Fdocs\u002Fnext\u002Fgetting-started\u002Fquickstart-guide)\r\n\r\n## What's Cherry-Picked\r\n* bbbf9f49f release: prepare release v0.17.0-rc1 (#5222)\r\n* 8ac79d4fa feat: include GIE CRDs as part of Kserve bundle (#5214)\r\n* fa74bca24 fix: add helm chart version to kserve-runtime-configs (#5209)\r\n* 16ea3d0e8 feat(release): add draft release automation and improve release process (#5191)\r\n* 706a9674c feat: 
5060 CSV and Parquet marshallers (#5115)

**Full Changelog**: https://github.com/kserve/kserve/compare/v0.17.0-rc0...v0.17.0-rc1

_Released 2026-03-12T14:27:48._

# v0.17.0-rc0

## What's Changed
* choir: fix missing markers by @Jooho in https://github.com/kserve/kserve/pull/4793
* fix: update base image for PMML to use eclipse-temurin JDK by @sivanantha321 in https://github.com/kserve/kserve/pull/4794
* add suffix kserve for helm chart of llmisvc by @Jooho in https://github.com/kserve/kserve/pull/4792
* Parallelize blob downloads from azure in storage-initializer by @agoston in https://github.com/kserve/kserve/pull/4709
* Parallelize blob downloads from S3 in storage-initializer by @agoston in https://github.com/kserve/kserve/pull/4714
* chore: bump k8s package versions to v0.34.0 by @murataslan in https://github.com/kserve/kserve/pull/4713
* Prevent path traversal on https.go by @spolti in https://github.com/kserve/kserve/pull/4796
* Refactor existing script to use centralized installation scripts by @Jooho in https://github.com/kserve/kserve/pull/4787
* fix: CI failure caused by deprecated MinIO image by @Jooho in https://github.com/kserve/kserve/pull/4812
* feat: Add a python script to generate quick install script by @Jooho in https://github.com/kserve/kserve/pull/4806
* fix generator to be executable without cloning repo by @Jooho in https://github.com/kserve/kserve/pull/4813
* Fixing LLMISVC CRD Installation Paths by @brettmthompson in https://github.com/kserve/kserve/pull/4811
* feat: Add pathTemplate configuration for inference service routing by @DeepFlame-JR in https://github.com/kserve/kserve/pull/4817
* helmchart: add image pull policy for serving runtimes by @stergem in https://github.com/kserve/kserve/pull/4814
* fix: multiple storage uri volume mount error by @sivanantha321 in https://github.com/kserve/kserve/pull/4822
* Fix Bug in LLMISVC E2E Test Setup by @brettmthompson in https://github.com/kserve/kserve/pull/4841
* fix: Fix various bugs in quickstart scripts by @JonahSussman in https://github.com/kserve/kserve/pull/4832
* Pin starlette to version 0.49.1 to fix CVE-2025-62727 by @hdefazio in https://github.com/kserve/kserve/pull/4846
* fix(llmisvc): adds missing generated clients by @bartoszmajsak in https://github.com/kserve/kserve/pull/4849
* Change security mailing list email address by @terrytangyuan in https://github.com/kserve/kserve/pull/4853
* Allow stopping LLMInferenceService by @pierDipi in https://github.com/kserve/kserve/pull/4839
* Add cjohannsen-cloudera as a reviewer by @terrytangyuan in https://github.com/kserve/kserve/pull/4854
* update go version to 1.25 and kubebuilder version to 1.9.0 by @Jooho in https://github.com/kserve/kserve/pull/4862
* fix: remove cert-manager annotation in inferenceservice crd by @juhyeon-cha in https://github.com/kserve/kserve/pull/4837
* update kserve version and update helm chart name by @Jooho in https://github.com/kserve/kserve/pull/4860
* fix: Address CVE-2025-22872, CVE-2025-47914, CVE-2025-58181 by @sivanantha321 in https://github.com/kserve/kserve/pull/4866
* Add Support For Protocol in S3 Endpoint by @brettmthompson in https://github.com/kserve/kserve/pull/4797
* fix: llmisvc image patch script to use renamed manager yaml file by @sivanantha321 in https://github.com/kserve/kserve/pull/4820
* fix make deploy target in makefile by @Jooho in https://github.com/kserve/kserve/pull/4872
* ci: add manual trigger for helm publish workflow by @sivanantha321 in https://github.com/kserve/kserve/pull/4873
* docs: Update KServe diagram to use the KubeCon NA version by @terrytangyuan in https://github.com/kserve/kserve/pull/4877
* CVE-2024-43598: Update lightgbm version to 4.6.0 by @sivanantha321 in https://github.com/kserve/kserve/pull/4799
* fix: the quick install scripts temporarily. by @Jooho in https://github.com/kserve/kserve/pull/4880
* mock test for ray by @Jooho in https://github.com/kserve/kserve/pull/4885
* fix: synced kserve-helm.sh and kserve-kustomize.sh by @Jooho in https://github.com/kserve/kserve/pull/4891
* fix: update LWS version to 0.7.0 to fix multi-node E2E test failure by @vivekk16 in https://github.com/kserve/kserve/pull/4892
* OCI model storage is not working with multi-node feature by @spolti in https://github.com/kserve/kserve/pull/4889
* Follow up 4787 by @Jooho in https://github.com/kserve/kserve/pull/4810
* Fix CI precommit error by @andyi2it in https://github.com/kserve/kserve/pull/4897
* fix: clear stale router and scheduler conditions on config changes by @vivekk16 in https://github.com/kserve/kserve/pull/4876
* feat: bump up vllm version to 0.11.2 & remove python 3.9 support by @csy1204 in https://github.com/kserve/kserve/pull/4851
* Remove deprecated --disable-log-requests flag for vLLM 0.11 compatibility by @vivekk16 in https://github.com/kserve/kserve/pull/4883
* Improved CA Bundle Management For LLM Inference Services by @brettmthompson in https://github.com/kserve/kserve/pull/4803
* ci: split KServe and Storage publish workflow into separate jobs by @sivanantha321 in https://github.com/kserve/kserve/pull/4801
* chore: Add .gitattributes t

_Released 2026-03-06T18:55:36._

# v0.16.0

## What's Changed
* chore: remove 'default' suffix compatibility by @sivanantha321 in https://github.com/kserve/kserve/pull/4178
* Upgrade Torch to
v2.6.0 everywhere by @ashahba in https://github.com/kserve/kserve/pull/4450
* chore: drop pydantic v1 support by @sivanantha321 in https://github.com/kserve/kserve/pull/4353
* fix: Update TextIteratorStreamer to skip special tokens by @sivanantha321 in https://github.com/kserve/kserve/pull/4490
* Add Jooho to approvers in OWNERS file by @terrytangyuan in https://github.com/kserve/kserve/pull/4504
* Rename CRD file to reflect all KServe CRDs (Fixes #4396) by @WHITE-ICE-BOX in https://github.com/kserve/kserve/pull/4494
* Update kserve-resources helm chart to disable desired servingruntimes by @jmlaubach in https://github.com/kserve/kserve/pull/4485
* upgrade vllm to v0.9.0 and Torch to v2.7.0 by @ashahba in https://github.com/kserve/kserve/pull/4501
* Upgrade vLLM to v0.9.0.1 by @ashahba in https://github.com/kserve/kserve/pull/4507
* Initial segregation of the storage module from KServe SDK by @spolti in https://github.com/kserve/kserve/pull/4391
* Fix pss restricted warnings by @akagami-harsh in https://github.com/kserve/kserve/pull/4327
* Fix: do not update poetry dependency when install hf cpu deps by @yuzisun in https://github.com/kserve/kserve/pull/4516
* [Bug] Fixes error in trace logging by @gavrissh in https://github.com/kserve/kserve/pull/4514
* Stop and resume a model [Raw Deployment] by @hdefazio in https://github.com/kserve/kserve/pull/4455
* Resolve inference endpoint using runtime protocol when applicable by @israel-hdez in https://github.com/kserve/kserve/pull/4527
* fix(codegen): pins code-generator binaries version by @bartoszmajsak in https://github.com/kserve/kserve/pull/4533
* Allow to set custom timeouts for `InferenceGraph` router by @lifo9 in https://github.com/kserve/kserve/pull/4218
* feat: support remote storage URI injection for serving runtimes by @de0725 in https://github.com/kserve/kserve/pull/4492
* [API] Define LLMInferenceService and LLMInferenceServiceConfig types and CRDs by @pierDipi in https://github.com/kserve/kserve/pull/4522
* Stop and Resume a transformer by @hdefazio in https://github.com/kserve/kserve/pull/4534
* Allow OCI for multi-node/multi-gpu by @israel-hdez in https://github.com/kserve/kserve/pull/4441
* 4380 - Inference logging to blob storage by @cjohannsen-cloudera in https://github.com/kserve/kserve/pull/4473
* Fix outdated BentoML import in sample code (BentoService no longer available in v1.x) by @YehCC52 in https://github.com/kserve/kserve/pull/4540
* Auto-update annotation for isvc. by @andresllh in https://github.com/kserve/kserve/pull/4342
* Stop and resume an explainer by @hdefazio in https://github.com/kserve/kserve/pull/4546
* fix: unset clampMax and clampMin, since they are not for replicas by @houshengbo in https://github.com/kserve/kserve/pull/4556
* refactor: Enhance HTTPRoute readiness checks by @sivanantha321 in https://github.com/kserve/kserve/pull/4543
* Refactor KServe to use global context for PredictorConfig by @sivanantha321 in https://github.com/kserve/kserve/pull/4526
* feat: refactor storage initializer resources configuration by @takamai06 in https://github.com/kserve/kserve/pull/4411
* feat(envtest): simplifies CRD lookup by @bartoszmajsak in https://github.com/kserve/kserve/pull/4564
* llmisvc: Initial controller scaffold and helm chart by @sivanantha321 in https://github.com/kserve/kserve/pull/4557
* Add logic to merge specs for LLMInferenceService by @VedantMahabaleshwarkar in https://github.com/kserve/kserve/pull/4563
* fix: Allow CA bundle path without config map by @fabiendupont in https://github.com/kserve/kserve/pull/4451
* docs: fixes invalid openshift subscription by @bartoszmajsak in https://github.com/kserve/kserve/pull/4572
* Add Code Coverage change report for PRs by @andyi2it in https://github.com/kserve/kserve/pull/4487
* feat: support secure access to prometheus in keda by @andyi2it in https://github.com/kserve/kserve/pull/4384
* feat: switch kserve from poetry to uv by @andyi2it in https://github.com/kserve/kserve/pull/4407
* chore(utils): simplifies code using generics by @bartoszmajsak in https://github.com/kserve/kserve/pull/4578
* Add AOT flashinfer build to huggingfaceserver dockerfile to precompil… by @AyushSawant18588 in https://github.com/kserve/kserve/pull/4567
* feat: improves collection helpers by @bartoszmajsak in https://github.com/kserve/kserve/pull/4579
* Add Missing config file for code coverage by @andyi2it in https://github.com/kserve/kserve/pull/4581
* Fix: Support multiple metrics in OpenTelemetryCollector for autoscaling by @houshengbo in https://github.com/kserve/kserve/pull/4591
* Upgrade vllm to v0.9.2 by @AyushSawant18588 in https://github.com/kserve/kserve/pull/4586
* Remove unused Strategy interface from sharding package by @houshengbo in https://github.com/kserve/kserve/pull/4590
* fix(test): us

_Released 2025-11-03T02:40:28._

# v0.16.0-rc1

## What's Changed
* deprecate: remove EnableDirectPvcVolumeMount flag by @anurags25 in https://github.com/kserve/kserve/pull/4694
* Fix autoscaling tests duration and crashes by @andyi2it in https://github.com/kserve/kserve/pull/4688
* Avoid Pervasive Logging of SA Not Found Errors by Credential Builder by @brettmthompson in https://github.com/kserve/kserve/pull/4696
* fix: correct llmisvc Dockerfile reference in image publish workflow by @sivanantha321 in https://github.com/kserve/kserve/pull/4705
* llmisvc: fix RBAC, templating, and adds quick install script by @sivanantha321 in https://github.com/kserve/kserve/pull/4698
* Fixed the panic nil pointer issue componentExt being nil by @houshengbo in
https://github.com/kserve/kserve/pull/4704
* fix: Add disk space cleanup step to Docker publish workflows by @sivanantha321 in https://github.com/kserve/kserve/pull/4717
* Add Star History section to README by @terrytangyuan in https://github.com/kserve/kserve/pull/4719
* docs: Mention CNCF in project README by @terrytangyuan in https://github.com/kserve/kserve/pull/4718
* Enabled the configuration options in Helm for opentelemetryCollector and autoscaler by @houshengbo in https://github.com/kserve/kserve/pull/4725
* Time Series Forecast API Endpoint by @jinan-zhou in https://github.com/kserve/kserve/pull/4615
* llmisvc dev & kustomize setup, add webhook config to helm chart by @sivanantha321 in https://github.com/kserve/kserve/pull/4712
* Revise KServe overview and enhance features section by @terrytangyuan in https://github.com/kserve/kserve/pull/4721
* Fix incorrect entrypoint in llmisvc Dockerfile by @Jooho in https://github.com/kserve/kserve/pull/4730
* Configure HF Downloads to Lower Memory Usage by @brettmthompson in https://github.com/kserve/kserve/pull/4726
* Temporarily disable SSL to unblock e2e tests by @Jooho in https://github.com/kserve/kserve/pull/4731
* Use a wrapper struct to accept resource.Quantity and keep the original input by @houshengbo in https://github.com/kserve/kserve/pull/4699
* Support Multiple Storage URIs for InferenceServices by @anurags25 in https://github.com/kserve/kserve/pull/4702
* Fix: prepend the KO_DOCKER_REPOSITORY to the base docker build to allow local publishing of the controller by @cjohannsen-cloudera in https://github.com/kserve/kserve/pull/4736
* Add Cert Manager installation to llmisvc quick install script by @sivanantha321 in https://github.com/kserve/kserve/pull/4733
* Injecting CA Bundle Into Storage Initializer Container for S3 Storage on LLMISVC Reconciliation by @brettmthompson in https://github.com/kserve/kserve/pull/4728
* Add metadata propagation for Kueue configurations to both Deployment and LeaderWorkerSet workloads by @hdefazio in https://github.com/kserve/kserve/pull/4747
* Add Support for Configuring S3 Storage via Secret Data by @brettmthompson in https://github.com/kserve/kserve/pull/4727
* 4739 - Fix: blob storage for inference logging recognizes embedded spec by @cjohannsen-cloudera in https://github.com/kserve/kserve/pull/4740
* Add llmd e2e tests by @andresllh in https://github.com/kserve/kserve/pull/4729
* Prepare for 0.16.0-rc1 release by @houshengbo in https://github.com/kserve/kserve/pull/4732
* Update the kserve-storage module to the latest version by @spolti in https://github.com/kserve/kserve/pull/4754
* feature: 4553 - Support inference logging to GCS and Azure by @cjohannsen-cloudera in https://github.com/kserve/kserve/pull/4582
* fix(helm-chart): expose `uidModelcar` in the chart by @maciej-tatarski in https://github.com/kserve/kserve/pull/4689

## New Contributors
* @anurags25 made their first contribution in https://github.com/kserve/kserve/pull/4694
* @jinan-zhou made their first contribution in https://github.com/kserve/kserve/pull/4615
* @maciej-tatarski made their first contribution in https://github.com/kserve/kserve/pull/4689

**Full Changelog**: https://github.com/kserve/kserve/compare/v0.16.0-rc0...v0.16.0-rc1

_Released 2025-10-19T19:00:03._

# v0.16.0-rc0

_Release notes same as v0.16.0. Released 2025-09-15T12:18:38._

# v0.15.2

## What's Changed
* Fixes CVE-2025-43859 by @spolti in https://github.com/kserve/kserve/pull/4468
* config: enable ModelCar by default by @tarilabs in https://github.com/kserve/kserve/pull/4316
* fix: huggingface e2e test output mismatch and add tests for stream requests by @sivanantha321 in https://github.com/kserve/kserve/pull/4482
* Rework the order in which the knative autoscaler configmap is read during reconciliation by @brettmthompson in https://github.com/kserve/kserve/pull/4471
* Add predictor_config to ModelServer init function by @greenmoon55 in https://github.com/kserve/kserve/pull/4491
* docs: enhance security documentation with detailed reporting and prevention mechanisms by @sivanantha321 in https://github.com/kserve/kserve/pull/4495
* fix: update workflow to use ubuntu-latest for rerun PR tests by @sivanantha321 in https://github.com/kserve/kserve/pull/4496
* Generate Release 0.15.2 by @yuzisun in https://github.com/kserve/kserve/pull/4497

## New Contributors
* @tarilabs made their first contribution in https://github.com/kserve/kserve/pull/4316

**Full Changelog**:
https://github.com/kserve/kserve/compare/v0.15.1...v0.15.2

_Released 2025-05-27T07:11:20._

# v0.15.1

## What's Changed
* fix typo on inferenceservice-config by @spolti in https://github.com/kserve/kserve/pull/4244
* Bump Go version to 1.24 by @sivanantha321 in https://github.com/kserve/kserve/pull/4321
* CI: Increase timeout for REST client connections to improve reliability by @sivanantha321 in https://github.com/kserve/kserve/pull/4355
* Localmodel agent can watch node groups by @greenmoon55 in https://github.com/kserve/kserve/pull/4362
* Fixing flake8 linter error. by @andresllh in https://github.com/kserve/kserve/pull/4354
* Update Huggingface Transformer to 4.50.3 by @rajatvig in https://github.com/kserve/kserve/pull/4351
* fix: Register LoRA model name in model registry to avoid not found error by @sivanantha321 in https://github.com/kserve/kserve/pull/4352
* Update Huggingface Transformer to 4.51.0 and huggingface-hub for kserve by @rajatvig in https://github.com/kserve/kserve/pull/4364
* Router config fixes in configmap template by @tmbochenski in https://github.com/kserve/kserve/pull/4369
* Chore: Deprecate Openvino support in HF runtime by @gavrissh in https://github.com/kserve/kserve/pull/4379
* vLLM V1 support for HF Server Runtime by @gavrissh in https://github.com/kserve/kserve/pull/4368
* Support Numpy 2.x by @sivanantha321 in https://github.com/kserve/kserve/pull/4386
* [Model cache] Do not remove PVC and PV after isvc deletion by @greenmoon55 in https://github.com/kserve/kserve/pull/4390
* Fix Flaky multi processing tests by @andyi2it in https://github.com/kserve/kserve/pull/4383
* Rerank support for vLLM in HuggingFace Serving Runtime by @AyushSawant18588 in https://github.com/kserve/kserve/pull/4376
* Upgrade vllm to support Llama4 by @gavrissh in https://github.com/kserve/kserve/pull/4388
* Adding bitsandbytes package for 4 bit support by @johnugeorge in https://github.com/kserve/kserve/pull/4406
* Fix: Isvc matched with wrong ModelCache by @HotsauceLee in https://github.com/kserve/kserve/pull/4398
* Fix: remove duplicated OpenAIGenerativeModel in __init__ by @huazq in https://github.com/kserve/kserve/pull/4399
* Feat: Allow inference service metadata injection at agent sidecar level for payload logging by @tylerhyang in https://github.com/kserve/kserve/pull/4325
* Fix for KEDA scaledobject target value is set to pointer instead of specified value by @andyi2it in https://github.com/kserve/kserve/pull/4373
* MultiNode change a logic to calculate ray node count and gpu count by @Jooho in https://github.com/kserve/kserve/pull/4356
* Remove internal annotations when no cache resource is matched by @greenmoon55 in https://github.com/kserve/kserve/pull/4412
* Add DeploymentMode to InferenceService and InferenceGraph status and prevent deploymentMode change by @israel-hdez in https://github.com/kserve/kserve/pull/4423
* Feat should not NewHPARecroncile for external hpa. by @johnzheng1975 in https://github.com/kserve/kserve/pull/4363
* chore: Upgrade prow-github-actions to version 2 in workflow files by @sivanantha321 in https://github.com/kserve/kserve/pull/4417
* Fix: update external autoscaler tests missed from PR 4363 by @Jooho in https://github.com/kserve/kserve/pull/4438
* Update OWNERS by @terrytangyuan in https://github.com/kserve/kserve/pull/4442
* Upgrade vLLM to support Qwen3 by @gavrissh in https://github.com/kserve/kserve/pull/4434
* InferenceGraph: Fix response code when condition step is not fulfilled by @israel-hdez in https://github.com/kserve/kserve/pull/4429
* Stop and resume a model by adding a new annotation [Serverless] by @hdefazio in https://github.com/kserve/kserve/pull/4337
* Improve Handling of Knative Autoscaler Confguration by @brettmthompson in https://github.com/kserve/kserve/pull/4394
* chore: adds CNCF Code of Conduct by @bartoszmajsak in https://github.com/kserve/kserve/pull/4458
* chore: Reenable Docker workflows to support arm64 build by @sivanantha321 in https://github.com/kserve/kserve/pull/4446
* Fix issue with precommit hook by @andyi2it in https://github.com/kserve/kserve/pull/4456
* Fix raw deployment update by @andyi2it in https://github.com/kserve/kserve/pull/4445
* update golangcli-lint to 1.64.8 by @ashahba in https://github.com/kserve/kserve/pull/4459
* fix: corrects links to translations by @bartoszmajsak in https://github.com/kserve/kserve/pull/4461
* chore: Include third party licenses, Add license checker, Enable SBOM Generation for images by @sivanantha321 in https://github.com/kserve/kserve/pull/4416
* LMCache Integration with vLLM runtime by @sivanantha321 in https://github.com/kserve/kserve/pull/4320
* Publish 0.15.1 release by @greenmoon55 in https://github.com/kserve/kserve/pull/4466
* Fix: add type specification for nthread argument in argument parser by @sivanantha321 in https://github.com/kserve/kserve/pull/4410
* Improve code coverage by @andyi2it in https://github.com/kserve/kserve/pull/4385
* Fixes vLLM V1 failures: Revert back the approach to initiate the background engine task by @ga

_Released 2025-05-15T15:23:01._

# v0.15.0

## What's Changed
* bump to vllm0.6.2 and add explicit chat template by @hustxiayang in https://github.com/kserve/kserve/pull/3964
* bump to vllm0.6.3 by @hustxiayang in https://github.com/kserve/kserve/pull/4001
* Feature: Add hf transfer by @tjandy98 in https://github.com/kserve/kserve/pull/4000
* Fix snyk scan null error by @sivanantha321 in https://github.com/kserve/kserve/pull/3974
* Update quick install script by @johnugeorge in https://github.com/kserve/kserve/pull/4005
* Local Model Node CR by @HotsauceLee in https://github.com/kserve/kserve/pull/3978
* Reduce E2Es dependency on CI environment (2) by @israel-hdez in https://github.com/kserve/kserve/pull/4008
* Allow GCS to download single file by @spolti in https://github.com/kserve/kserve/pull/4015
* bump to vllm0.6.3.post1 by @hustxiayang in
https://github.com/kserve/kserve/pull/4023
* Set default for SamplingParams.max_tokens in OpenAI requests if unset by @kevinmingtarja in https://github.com/kserve/kserve/pull/4020
* Add tools functionality to vLLM by @ArjunBhalla98 in https://github.com/kserve/kserve/pull/4033
* For vllm users, our parser should be able to support both - and _ by @hustxiayang in https://github.com/kserve/kserve/pull/3933
* Add tools unpacking for vLLM by @ArjunBhalla98 in https://github.com/kserve/kserve/pull/4035
* Multi-Node Inference Implementation by @Jooho in https://github.com/kserve/kserve/pull/3972
* Enhance InjectAgent to Handle Only HTTPGet, TCP Readiness Probes by @LOADBC in https://github.com/kserve/kserve/pull/4012
* Feat: Fix memory issue by replacing io.ReadAll with io.Copy (#4017) by @ops-jaeha in https://github.com/kserve/kserve/pull/4018
* Update alibiexplainer example by @spolti in https://github.com/kserve/kserve/pull/4004
* Fix huggingface build runs out of storage in CI by @sivanantha321 in https://github.com/kserve/kserve/pull/4044
* Update snyk scan to include new images by @sivanantha321 in https://github.com/kserve/kserve/pull/4042
* Introducing KServe Guru on Gurubase.io by @kursataktas in https://github.com/kserve/kserve/pull/4038
* Fix Hugging Face server EncoderModel not returning probabilities by correctly passing --return_probabilities flag (#3958) by @oplushappy in https://github.com/kserve/kserve/pull/4024
* Add deeper readiness check for transformer by @sivanantha321 in https://github.com/kserve/kserve/pull/3348
* Fix Starlette Denial of service (DoS) via multipart/form-data by @spolti in https://github.com/kserve/kserve/pull/4006
* remove duplicated import "github.com/onsi/gomega" by @carlory in https://github.com/kserve/kserve/pull/4051
* Fix localmodel controller name in snyk scan workflow by @sivanantha321 in https://github.com/kserve/kserve/pull/4054
* Fix azure blob storage access key env not mounted by @bentohset in https://github.com/kserve/kserve/pull/4064
* Storage Initializer support single digit azure DNS zone ID by @bentohset in https://github.com/kserve/kserve/pull/4070
* Fix trust remote code encoder model by @sivanantha321 in https://github.com/kserve/kserve/pull/4043
* introduce the prepare-for-release.sh script by @spolti in https://github.com/kserve/kserve/pull/3993
* Model cache controller and node agent by @yuzisun in https://github.com/kserve/kserve/pull/4089
* Storage containers typo fix for Huggingface Storage type by @andyi2it in https://github.com/kserve/kserve/pull/4098
* Support datetime object serialization in v1/v2 response by @sivanantha321 in https://github.com/kserve/kserve/pull/4099
* Replace klog with klog/v2 by @sivanantha321 in https://github.com/kserve/kserve/pull/4093
* Add exception handling and logging for grpc server by @sivanantha321 in https://github.com/kserve/kserve/pull/4066
* Update ClusterLocalModel to LocalModelCache by @yuzisun in https://github.com/kserve/kserve/pull/4105
* Fix LocalModelCache controller reconciles deleted resource by @sivanantha321 in https://github.com/kserve/kserve/pull/4106
* Fix InferenceService state when Predictor pod in CrashLoopBackOff by @hdefazio in https://github.com/kserve/kserve/pull/4003
* LocalModelCache Admission Webhook by @HotsauceLee in https://github.com/kserve/kserve/pull/4102
* Add namespace to localmodel and localmodelnode ServiceAccount helm chart by @ritzdevp in https://github.com/kserve/kserve/pull/4111
* KServe VLLM cpu image by @AyushSawant18588 in https://github.com/kserve/kserve/pull/4049
* Update max_model_len calculation and fixup encoder pooling by @Datta0 in https://github.com/kserve/kserve/pull/4055
* chore: use patch instead of update for finalizer changes by @whynowy in https://github.com/kserve/kserve/pull/4072
* Fix isvc role localmodelcache permission by @sivanantha321 in https://github.com/kserve/kserve/pull/4131
* Detect missing models and redownload models by @greenmoon55 in https://github.com/kserve/kserve/pull/4095
* introduce service configuration at configmap level by @spolti in https://github.com/kserve/kserve/pull/3672
* Allow

_Released 2025-03-31T11:17:55._

# v0.15.0-rc1

## What's Changed
* Fix Gateway API flaky test by @sivanantha321 in https://github.com/kserve/kserve/pull/4214
* Remove linux/arm64/v8 as platform option to fix build errors by @gavrissh in https://github.com/kserve/kserve/pull/4217
* Fix: typo in inferenceservice configmap by @sukumargaonkar in
https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4215\r\n* Fix CI not using localmodelnode agent dev image by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4221\r\n* Fix model download path by @hakuro95 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4112\r\n* Support Multiple NodeGroups In LocalModelCache by @HotsauceLee in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4170\r\n* Inference Graph: use plain text HTTP when part of Istio Mesh by @israel-hdez in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4031\r\n* Better compatibility with in-place upgrades by @israel-hdez in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4234\r\n* Increase request timeout seconds for art explainer by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4241\r\n* fix: add trainedmodels custom resource to kubeflow-kserve clusterroles by @gigabyte132 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4225\r\n* Fix CVE-2025-24357 and Bump vLLM to 0.7.2 by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4223\r\n* Use Go 1.23 to build kserve and update mod versions by @rajatvig in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4239\r\n* install: Remove modelmesh installation from helm chart by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4243\r\n* Bump golang-lint to 1.63 and fix all linter errors by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3967\r\n* Issue 4248: Request Logger with Multiple Metadata Headers fail by @tylerhyang in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4249\r\n* Add predictor healthcheck to OpenAIProxyModel by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4250\r\n* 
Expose podSpec fields for Inferencegraph by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4091\r\n* Fix localmodel test by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4268\r\n* Force symlink for ModelCar by @pmtk in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4274\r\n* Refactor vLLM + Embed support by @gavrissh in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4177\r\n* Fix triton health check by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4277\r\n* Upgrade vLLM version to 0.7.3 by @gavrissh in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4281\r\n* 0.15.0-rc1 release by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4285\r\n* Add model_version field to InferRequest by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4287\r\n* (Bug #4273) quick_install.sh failed to uninstall incomplete installation and has small syntax bug by @zozowell in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4275\r\n* update openshift guide by @spolti in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4210\r\n* Collocation transformer and predictor spec by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4255\r\n* Move arguments from 'args' to 'command' for huggingface server multnode SR by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4289\r\n* Include reasoning parser option in vLLM for reasoning models by @gavrissh in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4282\r\n* KServe Keda Integration by @andyi2it in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3652\r\n* add huggingfaceserver-multinode to helm chart by @Jooho in 
https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4293\r\n* Add missing CRDs for Keda by @andyi2it in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4296\r\n\r\n## New Contributors\r\n* @hakuro95 made their first contribution in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4112\r\n* @gigabyte132 made their first contribution in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4225\r\n* @tylerhyang made their first contribution in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4249\r\n* @pmtk made their first contribution in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4274\r\n* @zozowell made their first contribution in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4275\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fcompare\u002Fv0.15.0-rc0...v0.15.0-rc1","2025-03-07T16:01:32",{"id":222,"version":223,"summary_zh":214,"released_at":224},111069,"v0.15.0-rc0","2025-01-27T21:59:47",{"id":226,"version":227,"summary_zh":228,"released_at":229},111070,"v0.14.1","## What's Changed\r\n* Support datetime object serialization for v1\u002Fv2 response by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4123\r\n* Introduce LocalModelNode CR by @HotsauceLee in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3978\r\n* Update Model Cache controller for LocalModelNode and implement LocalModel node agent by @HotsauceLee @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4089\r\n* Rename ClusterLocalModel to LocalModelCache by @yuzisun in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4105\r\n* Detect missing models and redownload models by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4095\r\n* Allow multiple node groups in the model cache CR by 
@greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4134\r\n* Annotation to disable model cache by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4118\r\n* Clean up jobs in local model agent by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4140\r\n* Add node group to PVC name by @HotsauceLee in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4141\r\n* Make local node agent reconciliation frequency configurable by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4143\r\n* Add LocalModelCache admission webhook by @HotsauceLee in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4102\r\n* Fix model server fails to gracefully shutdown by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4116\r\n* Ensure root model directory exists and add protection for jobs created by @yuzisun https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F4152\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fcompare\u002Fv0.14.0...v0.14.1","2024-12-25T04:23:53",{"id":231,"version":232,"summary_zh":233,"released_at":234},111071,"v0.14.0","## What's Changed\r\n* Prevent the PassthroughCluster for clients\u002Fworkloads in the service mesh by @israel-hdez in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3711\r\n* Extract openai predict logic into smaller methods by @grandbora in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3716\r\n* Bump MLServer to 1.5.0 by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3740\r\n* Refactor storage initializer to log model download time for all storage types by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3735\r\n* inferenceservice controller: fix error check in Serverless 
mode by @dtrifiro in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3753\r\n* Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3723\r\n* Propagate `trust_remote_code` flag throughout vLLM startup by @calwoo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3729\r\n* Fix dead links on PyPI by @kevinbazira in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3754\r\n* Fix model is ready even if there is no model by @HAO2167 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3275\r\n* Fix No model ready error in multi model serving by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3758\r\n* Initial implementation of Inference client by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3401\r\n* Fix logprobs for vLLM by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3738\r\n* Fix model name not properly parsed by inference graph by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3746\r\n* pillow - Buffer Overflow by @spolti in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3598\r\n* Use add_generation_prompt while creating chat template by @Datta0 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3775\r\n* Deduplicate the names for the additional domain names by @houshengbo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3773\r\n* Make Virtual Service case-insensitive by @andyi2it in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3779\r\n* Install packages needed for vllm model load by @gavrissh in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3802\r\n* Make gRPC max message length configurable by @sivanantha321 in 
https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3741\r\n* Add readiness probe for MLServer and Increase memory for pmml in CI by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3789\r\n* Several bug fixes for vLLM completion endpoint by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3788\r\n* Increase timeout to make unit test stable by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3808\r\n* Upgrade CI deps by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3822\r\n* Add tests for vLLM by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3771\r\n* Bump python to 3.11 for serving runtime images and Bump poetry to 1.8.3 by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3812\r\n* Bump vLLM to 0.5.3.post1 by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3828\r\n* Refactor the ModelServer to let uvicorn handle multiple workers and use 'spawn' for mutiprocessing by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3757\r\n* Update golang for docs\u002FDockerfile to 1.21 by @spolti in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3761\r\n* Make ray an optional dependency by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3834\r\n* Update aif example by @spolti in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3765\r\n* Use helm for quick installation by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3813\r\n* Allow KServe to have its own local gateways for Serverless mode by @israel-hdez in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3737\r\n* Add support for Azure DNS zone endpoints by @tjandy98 in 
https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3819\r\n* Fix failed build for knativeLocalGatewayService by @yuzisun in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3866\r\n* Add logging request feature for vLLM backend by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3849\r\n* Bump vLLM to 0.5.4 by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3874\r\n* Fix: Add workaround for snyk image scan failure by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3880\r\n* Fix trust_remote_code not working with huggingface backend by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3879\r\n* Update KServe 2024-2025 Roadmap by @yuzisun in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3810\r\n* Configurable image pull secrets in Helm charts by @saileshd1402 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3838\r\n* Fix issue with rolling update behavior by @andyi2it in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3786\r\n* Fix the 'tokens exceeding model limit' error response in vllm server by @saileshd1402 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3886\r\n* Add support for binary data extension protocol and FP16 datatype by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3685\r\n* Protobuf version upgrade 4.25.4 by @andyi2it in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3881\r\n* Adds optional labels and annotations to the controller by @guitouni in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F336","2024-10-16T11:16:32",{"id":236,"version":237,"summary_zh":238,"released_at":239},111072,"v0.14.0-rc1","## What's Changed\r\n* Publish 0.14.0-rc0 release by @yuzisun in 
https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3867\r\n* Use API token for publishing package to PyPI by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3896\r\n* Fix sdlc broken when kserve installed using helm by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3890\r\n* Add Security Context and Resources to RBAC Proxy by @HotsauceLee in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3898\r\n* Remove unwanted cluster scope secret permissions by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3893\r\n* bump to vllm 0.5.5 by @lizzzcai in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3911\r\n* pin gosec to 2.20.0 by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3921\r\n* add a new doc 'common issues and solutions' by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3878\r\n* Implement health endpoint for vLLM backend by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3850\r\n* Add security best practices for inferenceservice, inferencegraph, servingruntimes by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3917\r\n* Bump Go to 1.22 by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3912\r\n* bump to vllm 0.6.0 by @hustxiayang in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3934\r\n* Set the volume mount's readonly annotation based on the ISVC annotation by @hdefazio in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3885\r\n* mount \u002Fdev\u002Fshm volume to huggingfaceserver by @lizzzcai in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3910\r\n* Fix permission error in snyk scan by @sivanantha321 in 
https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3889\r\n* Cluster Local Model CR by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3839\r\n* added http headers to inbound request by @andyi2it in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3895\r\n* Add prow-github-action by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3888\r\n* Add TLS support for Inference Loggers by @ruivieira in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3863\r\n* Fix explainer endpoint not working with path based routing by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3257\r\n* Fix ingress configuration for path based routing  and update go mod by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3944\r\n* Add HostIPC field to ServingRuntimePodSpec by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3943\r\n* remove conversion wehbook part from self-signed-ca.sh by @Jooho in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3941\r\n* update fluid kserve sample to use huggingface servingruntime by @lizzzcai in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3907\r\n* bump to vLLM0.6.1post2 by @hustxiayang in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3948\r\n* Add NodeDownloadPending status to ClusterLocalModel by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3955\r\n* add tags to rest server timing logs to differentiate cpu and wall time by @gfkeith in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3954\r\n* Implement Huggingface model download in storage initializer by @andyi2it in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3584\r\n* Update OWNERS file by @yuzisun in 
https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3966\r\n* Cluster local model controller by @greenmoon55 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3860\r\n* Prepare for 0.14.0-rc1 release and automate sync process by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3970\r\n\r\n## New Contributors\r\n* @HotsauceLee made their first contribution in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3898\r\n* @hustxiayang made their first contribution in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3934\r\n* @hdefazio made their first contribution in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3885\r\n* @ruivieira made their first contribution in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3863\r\n* @gfkeith made their first contribution in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3954\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fcompare\u002Fv0.14.0-rc0...v0.14.0-rc1","2024-10-03T08:44:00",{"id":241,"version":242,"summary_zh":233,"released_at":243},111073,"v0.14.0-rc0","2024-08-27T03:20:43",{"id":245,"version":246,"summary_zh":247,"released_at":248},111074,"v0.13.1","## What's Changed\r\n* Add nccl package and Bump vLLM to 0.4.3 for huggingface runtime by @sivanantha321 (https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3723)\r\n* Propagate trust_remote_code flag throughout vLLM startup by @calwoo (https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3729)\r\n* Use add_generation_prompt while creating chat template by @Datta0 (https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3775)\r\n* Fix logprobs for vLLM by @sivanantha321 (https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3738)\r\n* Install packages needed for vllm model load by @gavrissh 
(https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3802)\r\n* Publish 0.13.1 Release by @johnugeorge in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3824\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fcompare\u002Fv0.13.0...v0.13.1","2024-07-28T17:22:03",{"id":250,"version":251,"summary_zh":252,"released_at":253},111075,"v0.13.0","## :rainbow: What's New?\r\n* add support for async streaming in predict by @alexagriffith in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3475\r\n* Fix: Support model parallelism in HF transformer by @gavrishp in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3459\r\n* Support model revision and tokenizer revision in huggingface server by @lizzzcai in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3558\r\n* OpenAI schema by @tessapham in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3477\r\n* Support OpenAIModel in ModelRepository by @grandbora in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3590\r\n* updated xgboost to support json and ubj models by @andyi2it in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3551\r\n* Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3582\r\n* VLLM support for OpenAI Completions in HF server by @gavrishp in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3589\r\n* Add a user friendly error message for http exceptions by @grandbora in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3581\r\n* feat: Provide minimal distribution of CRDs by @terrytangyuan in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3492\r\n* set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in 
https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3594\r\n* Enabled the multiple domains support on an inference service by @houshengbo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3615\r\n* Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3621\r\n* Add headers to predictor exception logging by @grandbora in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3658\r\n* Enhance controller setup based on available CRDs by @israel-hdez in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3472\r\n* Add openai models endpoint by @cmaddalozzo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3666\r\n* feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 by @terrytangyuan in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3603\r\n* Enable dtype support for huggingface server by @Datta0 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3613\r\n* Add method for checking model health\u002Freadiness by @cmaddalozzo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3673\r\n* Unify the log configuration using kserve logger by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3577\r\n* Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3705\r\n* Add FP16 datatype support for OIP grpc by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3695\r\n* Add option for returning probabilities in huggingface server by @andyi2it in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3607\r\n\r\n## ⚠️ What's Changed\r\n* Remove conversion webhook from manifests by @Jooho in 
https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3476\r\n* Remove cluster level list\u002Fwatch for configmaps, serviceaccounts, secrets by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3469\r\n* chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3443\r\n* docs: Move Alibi explainer to docs by @terrytangyuan in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3579\r\n* Remove generate endpoints by @cmaddalozzo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3654\r\n* Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3700\r\n\r\n## 🐛  What's Fixed\r\n* Fix:Support Parallelism in vllm runtime by @gavrishp in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3464\r\n* fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3424\r\n* Fix isADirectoryError in Azure blob download by @tjandy98 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3502\r\n* Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3481\r\n* Make the modelcar injection idempotent by @rhuss in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3517\r\n* Only pad left for decode-only architecture models. by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3534\r\n* fix lint typo on Makefile by @spolti in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3569\r\n* fix: Set writable cache folder to avoid permission issue. 
Fixes #3562 by @terrytangyuan in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3576\r\n* Fix model unload in server stop method by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3587\r\n* Fix golint errors by @andyi2it in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3552\r\n* Fix make deploy-dev-storage-initializer not working by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3617\r\n* Fix Pydantic 2 warnings by @cmaddalozzo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3622\r\n* build: Fix CRD copying in generate-install.sh by @terrytangyuan in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3620\r\n* Only load from model repository if model binary is not found under model_dir by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3559\r\n* build: Remove misleading logs from minim","2024-06-05T13:38:29",{"id":255,"version":256,"summary_zh":257,"released_at":258},111076,"v0.13.0-rc1","## What's Changed\r\n* upgrade vllm\u002Ftransformers version by @johnugeorge in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3671\r\n* Add openai models endpoint by @cmaddalozzo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3666\r\n* feat: Support customizable deployment strategy for RawDeployment mode. 
Fixes #3452 by @terrytangyuan in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3603\r\n* Enable dtype support for huggingface server by @Datta0 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3613\r\n* Add method for checking model health\u002Freadiness by @cmaddalozzo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3673\r\n* fix for extract zip from gcs by @andyi2it in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3510\r\n* Update Dockerfile and Readme by @gavrishp in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3676\r\n* Update huggingface readme by @alexagriffith in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3678\r\n* fix: HPA equality check should include annotations by @terrytangyuan in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3650\r\n* Fix:  huggingface runtime in helm chart by @yuzisun in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3679\r\n* Fix: model id and model dir check order by @yuzisun in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3680\r\n* Fix:vLLM Model Supported check throwing circular dependency by @gavrishp in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3688\r\n* Fix: Allow null in Finish reason streaming response in vLLM by @gavrishp in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3684\r\n* Unify the log configuration using kserve logger by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3577\r\n* Remove conversion webhook from kubeflow manifest patch by @sivanantha321 in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3700\r\n* Add the field ResponseStartTimeoutSeconds to create ksvc by @houshengbo in https:\u002F\u002Fgithub.com\u002Fkserve\u002Fkserve\u002Fpull\u002F3705\r\n\r\n## New Contributors\r\n* @Datta0 made their first 
contribution in https://github.com/kserve/kserve/pull/3613

**Full Changelog**: https://github.com/kserve/kserve/compare/v0.13.0-rc0...v0.13.0-rc1

Released: 2024-05-21

# v0.13.0-rc0 (2024-05-07)

## :rainbow: What's New?
* add support for async streaming in predict by @alexagriffith in https://github.com/kserve/kserve/pull/3475
* Fix: Support model parallelism in HF transformer by @gavrishp in https://github.com/kserve/kserve/pull/3459
* Support model revision and tokenizer revision in huggingface server by @lizzzcai in https://github.com/kserve/kserve/pull/3558
* OpenAI schema by @tessapham in https://github.com/kserve/kserve/pull/3477
* Support OpenAIModel in ModelRepository by @grandbora in https://github.com/kserve/kserve/pull/3590
* updated xgboost to support json and ubj models by @andyi2it in https://github.com/kserve/kserve/pull/3551
* Add OpenAI API support to Huggingfaceserver by @cmaddalozzo in https://github.com/kserve/kserve/pull/3582
* VLLM support for OpenAI Completions in HF server by @gavrishp in https://github.com/kserve/kserve/pull/3589
* Add a user friendly error message for http exceptions by @grandbora in https://github.com/kserve/kserve/pull/3581
* feat: Provide minimal distribution of CRDs by @terrytangyuan in https://github.com/kserve/kserve/pull/3492
* set default SAFETENSORS_FAST_GPU and HF_HUB_DISABLE_TELEMETRY in HF Server by @lizzzcai in https://github.com/kserve/kserve/pull/3594
* Enabled the multiple domains support on an inference service by @houshengbo in https://github.com/kserve/kserve/pull/3615
* Add base model for proxying request to an OpenAI API enabled model server by @cmaddalozzo in https://github.com/kserve/kserve/pull/3621
* Add headers to predictor exception logging by @grandbora in https://github.com/kserve/kserve/pull/3658
* Enhance controller setup based on available CRDs by @israel-hdez in https://github.com/kserve/kserve/pull/3472

## ⚠️ What's Changed
* Remove conversion webhook from manifests by @Jooho in https://github.com/kserve/kserve/pull/3476
* Remove cluster level list/watch for configmaps, serviceaccounts, secrets by @sivanantha321 in https://github.com/kserve/kserve/pull/3469
* chore: Remove Seldon Alibi dependencies. Fixes #3380 by @terrytangyuan in https://github.com/kserve/kserve/pull/3443
* docs: Move Alibi explainer to docs by @terrytangyuan in https://github.com/kserve/kserve/pull/3579
* Remove generate endpoints by @cmaddalozzo in https://github.com/kserve/kserve/pull/3654

## 🐛 What's Fixed
* Fix: Support Parallelism in vllm runtime by @gavrishp in https://github.com/kserve/kserve/pull/3464
* fix: Instantiate HuggingfaceModelRepository only when model cannot be loaded. Fixes #3423 by @terrytangyuan in https://github.com/kserve/kserve/pull/3424
* Fix IsADirectoryError in Azure blob download by @tjandy98 in https://github.com/kserve/kserve/pull/3502
* Fix bug: Remove redundant helm chart affinity on predictor CRD by @trojaond in https://github.com/kserve/kserve/pull/3481
* Make the modelcar injection idempotent by @rhuss in https://github.com/kserve/kserve/pull/3517
* Only pad left for decode-only architecture models by @sivanantha321 in https://github.com/kserve/kserve/pull/3534
* fix lint typo on Makefile by @spolti in https://github.com/kserve/kserve/pull/3569
* fix: Set writable cache folder to avoid permission issue. Fixes #3562 by @terrytangyuan in https://github.com/kserve/kserve/pull/3576
* Fix model unload in server stop method by @sivanantha321 in https://github.com/kserve/kserve/pull/3587
* Fix golint errors by @andyi2it in https://github.com/kserve/kserve/pull/3552
* Fix make deploy-dev-storage-initializer not working by @sivanantha321 in https://github.com/kserve/kserve/pull/3617
* Fix Pydantic 2 warnings by @cmaddalozzo in https://github.com/kserve/kserve/pull/3622
* build: Fix CRD copying in generate-install.sh by @terrytangyuan in https://github.com/kserve/kserve/pull/3620
* Only load from model repository if model binary is not found under model_dir by @sivanantha321 in https://github.com/kserve/kserve/pull/3559
* build: Remove misleading logs from minimal-crdgen.sh by @terrytangyuan in https://github.com/kserve/kserve/pull/3641
* Assign device to input tensors in huggingface server with huggingface backend by @saileshd1402 in https://github.com/kserve/kserve/pull/3657
* Fix Huggingface server stopping criteria by @cmaddalozzo in https://github.com/kserve/kserve/pull/3659
* Explicitly specify pad token id when generating tokens by @sivanantha321 in https://github.com/kserve/kserve/pull/3565
* Fix quick install not cleaning up Istio installer by @sivanantha321 in https://github.com/kserve/kserve/pull/3660

## ⬆️ Version Upgrade
* Upgrade orjson to version 3.9.15 by @spolti in https://github.com/kserve/kserve/pull/3488
* feat: upgrade to new fastapi, update models to handle both pydantic v… by @timothyjlaurent in https://github.com/kserve/kserve/pull/3374
* Update cert manager version in quick install script by @shauryagoel in https://github.com/kserve/kserve/pull/3496
* ci: Bump minikube version to work with newer K8s version by @terrytangyuan in https://github.com/

# v0.12.1 (2024-04-23)

## What's Changed
* [release-0.12] Update fastapi to 0.109.1 and Support ray 2.10 by @sivanantha321 in https://github.com/kserve/kserve/pull/3609
* [release-0.12] Pydantic 2 support by @cmaddalozzo in https://github.com/kserve/kserve/pull/3614
* [release-0.12] Make the modelcar injection idempotent by @sivanantha321 in https://github.com/kserve/kserve/pull/3612
* Prepare for release 0.12.1 by @sivanantha321 in https://github.com/kserve/kserve/pull/3610
* release-0.12 pin back ray to 2.10 by @yuzisun in https://github.com/kserve/kserve/pull/3616
* [release-0.12] Fix docker build failure for ARM64 by @sivanantha321 in https://github.com/kserve/kserve/pull/3627

**Full Changelog**: https://github.com/kserve/kserve/compare/v0.12.0...v0.12.1
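Several entries above (PRs #3477, #3582, #3589, #3621) add OpenAI-compatible endpoints to the KServe HuggingFace server. As a rough illustration only, not taken from the release notes, the shape of an OpenAI-style completions request can be sketched as follows; the base URL, port, and model name are hypothetical placeholders, and the `/openai/v1/completions` path is an assumption to be checked against the KServe docs for your version:

```python
import json

# Hypothetical inference service address; substitute your own deployment.
BASE_URL = "http://localhost:8080/openai/v1"


def build_completion_request(model: str, prompt: str, max_tokens: int = 64):
    """Return the URL and JSON body for an OpenAI-compatible /completions call."""
    url = f"{BASE_URL}/completions"
    body = {
        "model": model,        # the name the model server was deployed with
        "prompt": prompt,
        "max_tokens": max_tokens,
        "stream": False,       # streaming responses were also added in this cycle
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    # The payload could then be POSTed with any HTTP client,
    # e.g. requests.post(url, data=body, headers={"Content-Type": "application/json"}).
    url, body = build_completion_request("bloom-560m", "KServe is")
    print(url)
    print(body)
```

Because the server follows the OpenAI schema, standard OpenAI client libraries pointed at the service's base URL should also work, which is the main practical benefit of these changes.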