[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-volcano-sh--volcano":3,"tool-volcano-sh--volcano":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":80,"owner_twitter":79,"owner_website":81,"owner_url":82,"languages":83,"stars":105,"forks":106,"last_commit_at":107,"license":108,"difficulty_score":109,"env_os":110,"env_gpu":111,"env_ram":112,"env_deps":113,"category_tags":119,"github_topics":120,"view_count":23,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":131,"updated_at":132,"faqs":133,"releases":154},3817,"volcano-sh\u002Fvolcano","volcano","A Cloud Native Batch System (Project under CNCF)","Volcano 是一个源自 CNCF 的云原生批量调度系统，专为增强 Kubernetes 在处理复杂计算任务时的能力而生。它并非简单的任务队列，而是对标准 Kubernetes 调度器进行了深度扩展，旨在解决人工智能训练、基因测序、大数据分析等高负载场景下资源利用率低、任务排队拥堵以及多任务协同困难等痛点。\n\n对于需要大规模运行 AI\u002FML 模型（如 TensorFlow、PyTorch）、大数据框架（如 Spark、Flink）或高性能计算（HPC）应用的开发者与研究人员而言，Volcano 是理想的底层基础设施伙伴。它支持作业优先级管理、队列调度、断点续跑及弹性伸缩等高级特性，确保关键任务能优先获得资源，同时在集群空闲时最大化算力利用。\n\nVolcano 的独特之处在于其融合了华为等企业在超大规模集群调度领域超过十五年的实践经验，并针对云原生环境进行了深度优化。它能够智能地协调成千上万个并行任务，显著缩短整体作业完成时间。目前，Volcano 已在全球互联网、金融、制造及医疗等多个行业得到广泛应用，是一个成熟且活跃的开源项目，帮助团队更高效、稳定地驾驭云端算力。","\u003Ca href=\"https:\u002F\u002Fvolcano.sh\u002F\">\n    \u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_readme_44effcc59257.png\"\u002F>\n\u003C\u002Fa>\n\n-------\n\n[![Build Status](https:\u002F\u002Ftravis-ci.org\u002Fvolcano-sh\u002Fvolcano.svg?branch=master)](https:\u002F\u002Ftravis-ci.org\u002Fvolcano-sh\u002Fvolcano)\n[![Go Report Card](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_readme_2b4a70945b89.png)](https:\u002F\u002Fgoreportcard.com\u002Freport\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano)\n[![RepoSize](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frepo-size\u002Fvolcano-sh\u002Fvolcano.svg)](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano)\n[![Release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Fvolcano-sh\u002Fvolcano.svg)](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Freleases)\n[![LICENSE](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fvolcano-sh\u002Fvolcano.svg)](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002FLICENSE)\n[![CII Best Practices](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_readme_50ccde67b228.png)](https:\u002F\u002Fbestpractices.coreinfrastructure.org\u002Fprojects\u002F3012)\n[![OpenSSF Scorecard](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_readme_bf909dba5f08.png)](https:\u002F\u002Fscorecard.dev\u002Fviewer\u002F?uri=github.com\u002Fvolcano-sh\u002Fvolcano)\n[![Gurubase](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGurubase-Ask%20Volcano%20Guru-006BFF)](https:\u002F\u002Fgurubase.io\u002Fg\u002Fvolcano)\n\n\n\n[Volcano](https:\u002F\u002Fvolcano.sh\u002F) is a Kubernetes-native batch scheduling system, extending and enhancing the capabilities of the standard kube-scheduler. 
It provides a comprehensive set of features specifically designed to manage and optimize various batch and elastic workloads, including Artificial Intelligence (AI) \u002F machine learning (ML) \u002F deep learning (DL), bioinformatics \u002F genomics, and other \"Big Data\" applications.\n\nThese workloads commonly leverage AI, Big Data, and HPC frameworks such as Spark, Flink, Ray, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, KubeGene, and others, with which Volcano offers robust integration.\n\nVolcano incorporates over fifteen years of collective experience in operating diverse high-performance workloads at scale across multiple systems and platforms. It combines proven best practices and innovative concepts from the open-source community to deliver a powerful and flexible scheduling solution.\n\nAs of 2025, Volcano has seen widespread adoption across numerous industries globally, including Internet\u002FCloud, Finance, Manufacturing, and Medical sectors. Many organizations and institutions are not only end-users but also active contributors to the project. Hundreds of contributors actively participate in code commits, pull request reviews, issue discussions, documentation updates, and design proposals. We encourage your participation in the ongoing development and growth of the Volcano project.\n\n\n> [!NOTE]\n> the scheduler is built based on [kube-batch](https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Fkube-batch);\nrefer to [#241](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fissues\u002F241) and [#288](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F288) for more detail.\n\n![cncf_logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_readme_c27bab9bb634.png)\n\nVolcano is an incubating project of the [Cloud Native Computing Foundation](https:\u002F\u002Fcncf.io\u002F) (CNCF). 
Please consider joining the CNCF if you are an organization that wants to take an active role in supporting the growth and evolution of the cloud native ecosystem. \n\n## Overall Architecture\n\n![volcano](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_readme_c93b72ba9be4.png)\n\n## Talks\n\n- [Intro: Kubernetes Batch Scheduling @ KubeCon 2019 EU](https:\u002F\u002Fsched.co\u002FMPi7)\n- [Volcano 在 Kubernetes 中运行高性能作业实践 @ ArchSummit 2019](https:\u002F\u002Farchsummit.infoq.cn\u002F2019\u002Fshenzhen\u002Fpresentation\u002F1817)\n- [Volcano：基于云原生的高密计算解决方案 @ Huawei Connection 2019](https:\u002F\u002Fe.huawei.com\u002Fcn\u002Fmaterial\u002Fevent\u002FHC\u002F09099dce0070415e9f26ada51b2216d7)\n- [Improving Performance of Deep Learning Workloads With Volcano @ KubeCon 2019 NA](https:\u002F\u002Fsched.co\u002FUaZi)\n- [Batch Capability of Kubernetes Intro @ KubeCon 2019 NA](https:\u002F\u002Fsched.co\u002FUajv)\n- [Optimizing Knowledge Distillation Training With Volcano @ KubeCon 2021 EU](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=cDPGmhVcj7Y&t=143s)\n- [Exploration About Mixing Technology of Online Services and Offline Jobs Based On Volcano @ KubeCon 2021 China](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=daqkUlT5ReY)\n- [Volcano - Cloud Native Batch System for AI, Big Data and HPC @ KubeCon 2022 EU](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=wjy35HfIP_k)\n- [How to Leverage Volcano to Improve the Resource Utilization of AI Pharmaceuticals, Autonomous Driving, and Smart Buildings @ KubeCon 2023 EU](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ujHDV5xteqU)\n- [Run Your AI Workloads and Microservices on Kubernetes More Easily and Efficiently @ KubeCon 2023 China](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=OO7zpyf7fgs)\n- [Optimize LLM Workflows with Smart Infrastructure Enhanced by Volcano @ KubeCon 2024 China](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=77Qn1-I-muQ)\n- [How Volcano Enable Next Wave of Intelligent Applications 
@ KubeCon 2024 China](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=IzR7zJQ8vMw)\n- [Leverage Topology Modeling and Topology-Aware Scheduling to Accelerate LLM Training @ KubeCon 2024 China](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=IB54LHQQ8lI)\n\n\n## Ecosystem\n\n- [Spark Operator](https:\u002F\u002Fwww.kubeflow.org\u002Fdocs\u002Fcomponents\u002Fspark-operator\u002Fuser-guide\u002Fvolcano-integration\u002F)\n- [Native Spark](https:\u002F\u002Fspark.apache.org\u002Fdocs\u002F3.5.0\u002Frunning-on-kubernetes.html#using-volcano-as-customized-scheduler-for-spark-on-kubernetes)\n- [Flink](https:\u002F\u002Fgithub.com\u002FGoogleCloudPlatform\u002Fflink-on-k8s-operator\u002Fblob\u002Fmaster\u002Fdocs\u002Fvolcano_integration.md)\n- [KubeRay](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Fmaster\u002Fcluster\u002Fkubernetes\u002Fk8s-ecosystem\u002Fvolcano.html)\n- [PyTorch](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fuser-guide\u002Fhow_to_use_pytorch_plugin.md)\n- [TensorFlow](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Ftree\u002Fmaster\u002Fexample\u002Fintegrations\u002Ftensorflow)\n- [kubeflow\u002Ftraining-operator](https:\u002F\u002Fwww.kubeflow.org\u002Fdocs\u002Fcomponents\u002Ftraining\u002Fuser-guides\u002Fjob-scheduling\u002F)\n- [kubeflow\u002Farena](https:\u002F\u002Fgithub.com\u002Fkubeflow\u002Farena\u002Fblob\u002Fmaster\u002Fdocs\u002Ftraining\u002Fvolcanojob\u002Fvolcanojob.md)\n- [MPI](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Ftree\u002Fmaster\u002Fexample\u002Fintegrations\u002Fmpi)\n- [Horovod](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fexample\u002Fkubecon-2019-china\u002Fhorovod-sample\u002Flm-horovod-tf-mnist-v0.5.yaml)\n- [PaddlePaddle](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Ftree\u002Fmaster\u002Fexample\u002Fintegrations\u002Fpaddlepaddle)\n- 
[Cromwell](https:\u002F\u002Fgithub.com\u002Fbroadinstitute\u002Fcromwell\u002Fblob\u002Fdevelop\u002Fdocs\u002Fbackends\u002FVolcano.md)\n- [MindSpore](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Ftree\u002Fmaster\u002Fexample\u002FMindSpore-example)\n- [MXNet](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Ftree\u002Fmaster\u002Fexample\u002Fintegrations\u002Fmxnet\u002Ftrain)\n- [Argo](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Ftree\u002Fmaster\u002Fexample\u002Fintegrations\u002Fargo)\n- [KubeGene](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fkubegene)\n\n## Use Cases\n- [Why Spark chooses Volcano as built-in batch scheduler on Kubernetes?](https:\u002F\u002Fwww.cncf.io\u002Fblog\u002F2022\u002F06\u002F30\u002Fwhy-spark-chooses-volcano-as-built-in-batch-scheduler-on-kubernetes\u002F)\n- [ING Bank: How Volcano empowers its big data analytics platform](https:\u002F\u002Fwww.cncf.io\u002Fblog\u002F2023\u002F02\u002F21\u002Fing-bank-how-volcano-empowers-its-big-data-analytics-platform\u002F)\n- [Using Volcano as a custom scheduler for Apache Spark on Amazon EMR on EKS](https:\u002F\u002Fdocs.aws.amazon.com\u002Femr\u002Flatest\u002FEMR-on-EKS-DevelopmentGuide\u002Ftutorial-volcano.html)\n- [Deploy Azure Machine Learning extension on AKS or Arc Kubernetes cluster](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fmachine-learning\u002Fhow-to-deploy-kubernetes-extension?view=azureml-api-2&tabs=deploy-extension-with-cli)\n- [Practical Tips for Preventing GPU Fragmentation for Volcano Scheduler](https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fpractical-tips-for-preventing-gpu-fragmentation-for-volcano-scheduler\u002F)\n- [Using Volcano in Large-Scale, Distributed Offline Computing](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fruitian2-en\u002F)\n- [OpenI-Octopus: How to Avoid Resource Preemption in Kubernetes 
Clusters](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fpengcheng-en\u002F)\n- [How Does Volcano Empower a Content Recommendation Engine in Xiaohongshu](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fxiaohongshu-en\u002F)\n- [How Ruitian Used Volcano to Run Large-Scale Offline HPC Jobs](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fruitian-en\u002F)\n- [Integrating Volcano into the Leinao Cloud OS](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fleinao-en\u002F)\n- [HPC on Volcano: How Containers Support HPC Applications in the Meteorological Industry](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fhpc-en\u002F)\n- [iQIYI:Volcano-based Cloud Native Migration Practices](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Faiqiyi-en\u002F)\n- [PaddlePaddle Distributed Training on Volcano](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fpaddlepaddle-en\u002F)\n\n## Quick Start Guide\n\n### Prerequisites\n\n- Kubernetes 1.12+ with CRD support\n\n\nYou can try Volcano by one of the following two ways.\n\n> [!NOTE]\n> * For Kubernetes v1.17 and above, use CRDs under config\u002Fcrd\u002Fbases (recommended)\n> * For Kubernetes v1.16 and below, use CRDs under config\u002Fcrd\u002Fv1beta1 (deprecated)\n\n### Install with YAML files\n\nInstall Volcano on an existing Kubernetes cluster. This way is both available for x86_64 and arm64 architecture.\n\n```\nkubectl apply -f https:\u002F\u002Fraw.githubusercontent.com\u002Fvolcano-sh\u002Fvolcano\u002Fmaster\u002Finstaller\u002Fvolcano-development.yaml\n```\n\nEnjoy! 
Volcano will create the following resources in `volcano-system` namespace.\n\n\n```\nNAME                                       READY   STATUS      RESTARTS   AGE\npod\u002Fvolcano-admission-5bd5756f79-dnr4l     1\u002F1     Running     0          96s\npod\u002Fvolcano-controllers-687948d9c8-nw4b4   1\u002F1     Running     0          96s\npod\u002Fvolcano-scheduler-94998fc64-4z8kh      1\u002F1     Running     0          96s\n\nNAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE\nservice\u002Fvolcano-admission-service   ClusterIP   10.98.152.108   \u003Cnone>        443\u002FTCP   96s\n\nNAME                                  READY   UP-TO-DATE   AVAILABLE   AGE\ndeployment.apps\u002Fvolcano-admission     1\u002F1     1            1           96s\ndeployment.apps\u002Fvolcano-controllers   1\u002F1     1            1           96s\ndeployment.apps\u002Fvolcano-scheduler     1\u002F1     1            1           96s\n\nNAME                                             DESIRED   CURRENT   READY   AGE\nreplicaset.apps\u002Fvolcano-admission-5bd5756f79     1         1         1       96s\nreplicaset.apps\u002Fvolcano-controllers-687948d9c8   1         1         1       96s\nreplicaset.apps\u002Fvolcano-scheduler-94998fc64      1         1         1       96s\n\nNAME                               COMPLETIONS   DURATION   AGE\njob.batch\u002Fvolcano-admission-init   1\u002F1           48s        96s\n\n```\n\n### Install via helm\n\nTo install official release, please visit [helm-charts](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fhelm-charts) for details.\n\n```bash\nhelm repo add volcano-sh https:\u002F\u002Fvolcano-sh.github.io\u002Fhelm-charts\nhelm install volcano volcano-sh\u002Fvolcano -n volcano-system --create-namespace\n```\n\nInstall from source code for developers:\n\n```bash\nhelm install volcano installer\u002Fhelm\u002Fchart\u002Fvolcano --namespace volcano-system --create-namespace\n\n# list helm release\nhelm 
list -n volcano-system\n```\n\n### Install from code\n\nIf you don't have a kubernetes cluster, try one-click install from code base:\n\n```bash\n.\u002Fhack\u002Flocal-up-volcano.sh\n```\n\nThis way is only available for x86_64 temporarily.\n\n### Install volcano agent\n\nPlease follow the guide [Volcano Agent](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fdocs\u002Fcolocation) to install volcano agent.\n\n### Install monitoring system\n\nIf you want to get prometheus and grafana volcano dashboard after volcano installed, try following commands:\n\n```bash\nkubectl create -f installer\u002Fvolcano-monitoring.yaml\n```\n\n### Install dashboard\n\nPlease follow the guide [Volcano Dashboard](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fdashboard#volcano-dashboard) to install volcano dashboard.\n\n## Kubernetes compatibility\n|                       | Kubernetes 1.35 | Kubernetes 1.34 | Kubernetes 1.33 | Kubernetes 1.32 | Kubernetes 1.31 | Kubernetes 1.30 | Kubernetes 1.29 | Kubernetes 1.28 | Kubernetes 1.27 | Kubernetes 1.26 | Kubernetes 1.25 | Kubernetes 1.24 | Kubernetes 1.23 | Kubernetes 1.22 | Kubernetes 1.21 |\n|-----------------------|-----------------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|\n| Volcano HEAD (master) | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | -               | -               |\n| Volcano v1.14         | -               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | -      
         | -               |\n| Volcano v1.13         | -               | -               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | -               | -               |\n| Volcano v1.12         | -               | -               | -               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               |\n| Volcano v1.11         | -               | -               | -               | -               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               |\n| Volcano v1.10         | -               | -               | -               | -               | -               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               |\n\nKey:\n* `✓` Volcano and the Kubernetes version are exactly compatible.\n* `+` Volcano has features or API objects that may not be present in the Kubernetes version.\n* `-` The Kubernetes version has features or API objects that Volcano can't use.\n\n\n## Citing Volcano\n\nIf Volcano helps your research, we appreciate your citations. Here is the BibTeX entry:\n\n```bibtex\n@misc{volcano2025,\n  title={Volcano: A Cloud Native Batch System},\n  author={Klaus Ma and Kevin Wang and others},\n  year={2025},\n  howpublished={\\url{https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano}},\n}\n```\n\n## Meeting\n\nWe hold community meetings for different timezones. 
See [Volcano Community Meeting Info](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fcommunity#community-meeting).\n\nResources:\n- [Meeting notes and agenda](https:\u002F\u002Fdocs.google.com\u002Fdocument\u002Fd\u002F1YLbF8zjZBiR9PbXQPB22iuc_L0Oui5A1lddVfRnZrqs\u002Fedit)\n- [Meeting link](https:\u002F\u002Fzoom.us\u002Fj\u002F91804791393)\n- [Meeting Calendar](https:\u002F\u002Fcalendar.google.com\u002Fcalendar\u002Fb\u002F1\u002Fembed?src=volcano.sh.bot@gmail.com) | [Subscribe](https:\u002F\u002Fcalendar.google.com\u002Fcalendar\u002Fb\u002F1?cid=dm9sY2Fuby5zaC5ib3RAZ21haWwuY29t)\n\n## Contact\n\nIf you have any questions, feel free to reach out to us in the following ways:\n\n[Volcano Slack Channel](https:\u002F\u002Fcloud-native.slack.com\u002Farchives\u002FC011GJDQS0N) | [Join](https:\u002F\u002Fslack.cncf.io\u002F)\n\n[Mailing List](https:\u002F\u002Fgroups.google.com\u002Fforum\u002F#!forum\u002Fvolcano-sh)\n\nWeChat: Please add WeChat account `k8s2222` and request an invitation to the group chat.\n","\u003Ca href=\"https:\u002F\u002Fvolcano.sh\u002F\">\n    \u003Cimg 
src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_readme_44effcc59257.png\"\u002F>\n\u003C\u002Fa>\n\n-------\n\n[![构建状态](https:\u002F\u002Ftravis-ci.org\u002Fvolcano-sh\u002Fvolcano.svg?branch=master)](https:\u002F\u002Ftravis-ci.org\u002Fvolcano-sh\u002Fvolcano)\n[![Go报告卡](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_readme_2b4a70945b89.png)](https:\u002F\u002Fgoreportcard.com\u002Freport\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano)\n[![仓库大小](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frepo-size\u002Fvolcano-sh\u002Fvolcano.svg)](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano)\n[![发布版本](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Fvolcano-sh\u002Fvolcano.svg)](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Freleases)\n[![许可证](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fvolcano-sh\u002Fvolcano.svg)](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002FLICENSE)\n[![CII最佳实践](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_readme_50ccde67b228.png)](https:\u002F\u002Fbestpractices.coreinfrastructure.org\u002Fprojects\u002F3012)\n[![OpenSSF评分卡](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_readme_bf909dba5f08.png)](https:\u002F\u002Fscorecard.dev\u002Fviewer\u002F?uri=github.com\u002Fvolcano-sh\u002Fvolcano)\n[![Gurubase](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGurubase-向Volcano专家请教-006BFF)](https:\u002F\u002Fgurubase.io\u002Fg\u002Fvolcano)\n\n\n\n[Volcano](https:\u002F\u002Fvolcano.sh\u002F) 是一个 Kubernetes 原生的批处理调度系统，扩展并增强了标准 kube-scheduler 的功能。它提供了一套全面的功能，专门用于管理和优化各种批处理及弹性工作负载，包括人工智能（AI）\u002F机器学习（ML）\u002F深度学习（DL）、生物信息学\u002F基因组学以及其他“大数据”应用。\n\n这些工作负载通常会使用 AI、大数据和 HPC 框架，如 Spark、Flink、Ray、TensorFlow、PyTorch、Argo、MindSpore、PaddlePaddle、Kubeflow、MPI、Horovod、MXNet、KubeGene 等，而 Volcano 则提供了与这些框架的强大集成能力。\n\nVolcano 
汇集了超过十五年的经验，曾在多个系统和平台上大规模运行各类高性能工作负载。它结合了开源社区中经过验证的最佳实践和创新理念，提供了一个强大且灵活的调度解决方案。\n\n截至 2025 年，Volcano 已在全球众多行业中得到广泛应用，涵盖互联网\u002F云计算、金融、制造和医疗等领域。许多组织和机构不仅是最终用户，也是该项目的积极贡献者。数百名贡献者积极参与代码提交、拉取请求评审、问题讨论、文档更新以及设计方案的提出。我们诚挚地邀请您参与到 Volcano 项目的持续开发与成长中来。\n\n\n> [!注意]\n> 该调度器基于 [kube-batch](https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Fkube-batch) 构建；\n有关详细信息，请参阅 [#241](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fissues\u002F241) 和 [#288](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F288)。\n\n![cncf_logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_readme_c27bab9bb634.png)\n\nVolcano 是 [云原生计算基金会](https:\u002F\u002Fcncf.io\u002F)（CNCF）的孵化项目。如果您所在的组织希望在支持云原生生态系统的成长与演进中发挥积极作用，请考虑加入 CNCF。\n\n## 总体架构\n\n![volcano](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_readme_c93b72ba9be4.png)\n\n## 演讲\n\n- [简介：Kubernetes 批处理调度 @ KubeCon 2019 欧洲站](https:\u002F\u002Fsched.co\u002FMPi7)\n- [Volcano 在 Kubernetes 中运行高性能作业实践 @ ArchSummit 2019](https:\u002F\u002Farchsummit.infoq.cn\u002F2019\u002Fshenzhen\u002Fpresentation\u002F1817)\n- [Volcano：基于云原生的高密计算解决方案 @ 华为连接 2019](https:\u002F\u002Fe.huawei.com\u002Fcn\u002Fmaterial\u002Fevent\u002FHC\u002F09099dce0070415e9f26ada51b2216d7)\n- [利用 Volcano 提升深度学习工作负载性能 @ KubeCon 2019 北美站](https:\u002F\u002Fsched.co\u002FUaZi)\n- [Kubernetes 的批处理能力介绍 @ KubeCon 2019 北美站](https:\u002F\u002Fsched.co\u002FUajv)\n- [利用 Volcano 优化知识蒸馏训练 @ KubeCon 2021 欧洲站](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=cDPGmhVcj7Y&t=143s)\n- [基于 Volcano 探索在线服务与离线作业混合技术 @ KubeCon 2021 中国站](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=daqkUlT5ReY)\n- [Volcano - 面向 AI、大数据和 HPC 的云原生批处理系统 @ KubeCon 2022 欧洲站](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=wjy35HfIP_k)\n- [如何利用 Volcano 提升 AI 制药、自动驾驶和智能建筑的资源利用率 @ KubeCon 2023 欧洲站](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ujHDV5xteqU)\n- [更轻松高效地在 Kubernetes 上运行您的 AI 工作负载和微服务 @ KubeCon 2023 
中国站](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=OO7zpyf7fgs)\n- [利用 Volcano 增强的智能基础设施优化 LLM 工作流 @ KubeCon 2024 中国站](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=77Qn1-I-muQ)\n- [Volcano 如何助力下一波智能应用 @ KubeCon 2024 中国站](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=IzR7zJQ8vMw)\n- [利用拓扑建模与拓扑感知调度加速 LLM 训练 @ KubeCon 2024 中国站](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=IB54LHQQ8lI)\n\n\n## 生态系统\n\n- [Spark Operator](https:\u002F\u002Fwww.kubeflow.org\u002Fdocs\u002Fcomponents\u002Fspark-operator\u002Fuser-guide\u002Fvolcano-integration\u002F)\n- [原生 Spark](https:\u002F\u002Fspark.apache.org\u002Fdocs\u002F3.5.0\u002Frunning-on-kubernetes.html#using-volcano-as-customized-scheduler-for-spark-on-kubernetes)\n- [Flink](https:\u002F\u002Fgithub.com\u002FGoogleCloudPlatform\u002Fflink-on-k8s-operator\u002Fblob\u002Fmaster\u002Fdocs\u002Fvolcano_integration.md)\n- [KubeRay](https:\u002F\u002Fdocs.ray.io\u002Fen\u002Fmaster\u002Fcluster\u002Fkubernetes\u002Fk8s-ecosystem\u002Fvolcano.html)\n- [PyTorch](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fuser-guide\u002Fhow_to_use_pytorch_plugin.md)\n- [TensorFlow](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Ftree\u002Fmaster\u002Fexample\u002Fintegrations\u002Ftensorflow)\n- [kubeflow\u002Ftraining-operator](https:\u002F\u002Fwww.kubeflow.org\u002Fdocs\u002Fcomponents\u002Ftraining\u002Fuser-guides\u002Fjob-scheduling\u002F)\n- [kubeflow\u002Farena](https:\u002F\u002Fgithub.com\u002Fkubeflow\u002Farena\u002Fblob\u002Fmaster\u002Fdocs\u002Ftraining\u002Fvolcanojob\u002Fvolcanojob.md)\n- [MPI](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Ftree\u002Fmaster\u002Fexample\u002Fintegrations\u002Fmpi)\n- [Horovod](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fexample\u002Fkubecon-2019-china\u002Fhorovod-sample\u002Flm-horovod-tf-mnist-v0.5.yaml)\n- 
[PaddlePaddle](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Ftree\u002Fmaster\u002Fexample\u002Fintegrations\u002Fpaddlepaddle)\n- [Cromwell](https:\u002F\u002Fgithub.com\u002Fbroadinstitute\u002Fcromwell\u002Fblob\u002Fdevelop\u002Fdocs\u002Fbackends\u002FVolcano.md)\n- [MindSpore](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Ftree\u002Fmaster\u002Fexample\u002FMindSpore-example)\n- [MXNet](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Ftree\u002Fmaster\u002Fexample\u002Fintegrations\u002Fmxnet\u002Ftrain)\n- [Argo](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Ftree\u002Fmaster\u002Fexample\u002Fintegrations\u002Fargo)\n- [KubeGene](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fkubegene)\n\n## 使用场景\n- [为什么 Spark 选择 Volcano 作为 Kubernetes 上的内置批处理调度器？](https:\u002F\u002Fwww.cncf.io\u002Fblog\u002F2022\u002F06\u002F30\u002Fwhy-spark-chooses-volcano-as-built-in-batch-scheduler-on-kubernetes\u002F)\n- [ING 银行：Volcano 如何赋能其大数据分析平台](https:\u002F\u002Fwww.cncf.io\u002Fblog\u002F2023\u002F02\u002F21\u002Fing-bank-how-volcano-empowers-its-big-data-analytics-platform\u002F)\n- [在 Amazon EMR on EKS 上将 Volcano 用作 Apache Spark 的自定义调度器](https:\u002F\u002Fdocs.aws.amazon.com\u002Femr\u002Flatest\u002FEMR-on-EKS-DevelopmentGuide\u002Ftutorial-volcano.html)\n- [在 AKS 或 Arc Kubernetes 集群上部署 Azure 机器学习扩展](https:\u002F\u002Flearn.microsoft.com\u002Fen-us\u002Fazure\u002Fmachine-learning\u002Fhow-to-deploy-kubernetes-extension?view=azureml-api-2&tabs=deploy-extension-with-cli)\n- [防止 Volcano 调度器中 GPU 碎片化的实用技巧](https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fpractical-tips-for-preventing-gpu-fragmentation-for-volcano-scheduler\u002F)\n- [在大规模分布式离线计算中使用 Volcano](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fruitian2-en\u002F)\n- [OpenI-Octopus：如何避免 Kubernetes 集群中的资源抢占](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fpengcheng-en\u002F)\n- [Volcano 
如何赋能小红书的内容推荐引擎](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fxiaohongshu-en\u002F)\n- [Ruitian 如何利用 Volcano 运行大规模离线 HPC 作业](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fruitian-en\u002F)\n- [将 Volcano 集成到雷鸟云操作系统](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fleinao-en\u002F)\n- [Volcano 上的 HPC：容器如何支持气象行业的 HPC 应用程序](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fhpc-en\u002F)\n- [爱奇艺：基于 Volcano 的云原生迁移实践](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Faiqiyi-en\u002F)\n- [PaddlePaddle 在 Volcano 上的分布式训练](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fblog\u002Fpaddlepaddle-en\u002F)\n\n## 快速入门指南\n\n### 前提条件\n\n- Kubernetes 1.12 及以上版本，且支持 CRD\n\n\n您可以通过以下两种方式之一试用 Volcano。\n\n> [!NOTE]\n> * 对于 Kubernetes v1.17 及以上版本，请使用 config\u002Fcrd\u002Fbases 下的 CRD（推荐）\n> * 对于 Kubernetes v1.16 及以下版本，请使用 config\u002Fcrd\u002Fv1beta1 下的 CRD（已弃用）\n\n### 使用 YAML 文件安装\n\n在现有的 Kubernetes 集群上安装 Volcano。此方法适用于 x86_64 和 arm64 架构。\n\n```\nkubectl apply -f https:\u002F\u002Fraw.githubusercontent.com\u002Fvolcano-sh\u002Fvolcano\u002Fmaster\u002Finstaller\u002Fvolcano-development.yaml\n```\n\n现在您可以开始使用了！Volcano 将在 `volcano-system` 命名空间中创建以下资源。\n\n\n```\nNAME                                       READY   STATUS      RESTARTS   AGE\npod\u002Fvolcano-admission-5bd5756f79-dnr4l     1\u002F1     Running     0          96s\npod\u002Fvolcano-controllers-687948d9c8-nw4b4   1\u002F1     Running     0          96s\npod\u002Fvolcano-scheduler-94998fc64-4z8kh      1\u002F1     Running     0          96s\n\nNAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE\nservice\u002Fvolcano-admission-service   ClusterIP   10.98.152.108   \u003Cnone>        443\u002FTCP   96s\n\nNAME                                  READY   UP-TO-DATE   AVAILABLE   AGE\ndeployment.apps\u002Fvolcano-admission     1\u002F1     1            1           96s\ndeployment.apps\u002Fvolcano-controllers   1\u002F1     1            1           
96s\ndeployment.apps\u002Fvolcano-scheduler     1\u002F1     1            1           96s\n\nNAME                                             DESIRED   CURRENT   READY   AGE\nreplicaset.apps\u002Fvolcano-admission-5bd5756f79     1         1         1       96s\nreplicaset.apps\u002Fvolcano-controllers-687948d9c8   1         1         1       96s\nreplicaset.apps\u002Fvolcano-scheduler-94998fc64      1         1         1       96s\n\nNAME                               COMPLETIONS   DURATION   AGE\njob.batch\u002Fvolcano-admission-init   1\u002F1           48s        96s\n\n```\n\n### 通过 Helm 安装\n\n要安装官方发布的版本，请访问 [helm-charts](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fhelm-charts) 获取详细信息。\n\n```bash\nhelm repo add volcano-sh https:\u002F\u002Fvolcano-sh.github.io\u002Fhelm-charts\nhelm install volcano volcano-sh\u002Fvolcano -n volcano-system --create-namespace\n```\n\n开发者从源代码安装：\n\n```bash\nhelm install volcano installer\u002Fhelm\u002Fchart\u002Fvolcano --namespace volcano-system --create-namespace\n\n# 列出 Helm 发布\nhelm list -n volcano-system\n```\n\n### 从代码安装\n\n如果您没有 Kubernetes 集群，可以尝试从代码库一键安装：\n\n```bash\n.\u002Fhack\u002Flocal-up-volcano.sh\n```\n\n目前此方法仅适用于 x86_64 架构。\n\n### 安装 Volcano 代理\n\n请按照 [Volcano 代理](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fdocs\u002Fcolocation) 指南安装 Volcano 代理。\n\n### 安装监控系统\n\n如果您希望在安装 Volcano 后获得 Prometheus 和 Grafana 的 Volcano 仪表板，请尝试以下命令：\n\n```bash\nkubectl create -f installer\u002Fvolcano-monitoring.yaml\n```\n\n### 安装仪表板\n\n请按照 [Volcano 仪表板](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fdashboard#volcano-dashboard) 指南安装 Volcano 仪表板。\n\n## Kubernetes 兼容性\n|                       | Kubernetes 1.35 | Kubernetes 1.34 | Kubernetes 1.33 | Kubernetes 1.32 | Kubernetes 1.31 | Kubernetes 1.30 | Kubernetes 1.29 | Kubernetes 1.28 | Kubernetes 1.27 | Kubernetes 1.26 | Kubernetes 1.25 | Kubernetes 1.24 | Kubernetes 1.23 | Kubernetes 1.22 | Kubernetes 1.21 
|\n|-----------------------|-----------------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|\n| Volcano HEAD (master) | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | -               | -               |\n| Volcano v1.14         | -               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | -               | -               |\n| Volcano v1.13         | -               | -               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | -               | -               |\n| Volcano v1.12         | -               | -               | -               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               |\n| Volcano v1.11         | -               | -               | -               | -               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓               |\n| Volcano v1.10         | -               | -               | -               | -               | -               | ✓               | ✓               | ✓               | ✓               | ✓               | ✓  
             | ✓               | ✓               | ✓               | ✓               |\n\n说明：\n* `✓` 表示 Volcano 与该 Kubernetes 版本完全兼容。\n* `+` 表示 Volcano 具有某些功能或 API 对象，而该 Kubernetes 版本可能不具备。\n* `-` 表示该 Kubernetes 版本具有某些功能或 API 对象，而 Volcano 无法使用。\n\n## 引用 Volcano\n\n如果 Volcano 对您的研究有所帮助，我们非常感谢您的引用。以下是 BibTeX 格式的引用条目：\n\n```bibtex\n@misc{volcano2025,\n  title={Volcano: A Cloud Native Batch System},\n  author={Klaus Ma and Kevin Wang and others},\n  year={2025},\n  howpublished={\\url{https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano}},\n}\n```\n\n## 会议\n\n我们为不同的时区举办社区会议。详情请参阅 [Volcano 社区会议信息](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fcommunity#community-meeting)。\n\n资源：\n- [会议记录和议程](https:\u002F\u002Fdocs.google.com\u002Fdocument\u002Fd\u002F1YLbF8zjZBiR9PbXQPB22iuc_L0Oui5A1lddVfRnZrqs\u002Fedit)\n- [会议链接](https:\u002F\u002Fzoom.us\u002Fj\u002F91804791393)\n- [会议日历](https:\u002F\u002Fcalendar.google.com\u002Fcalendar\u002Fb\u002F1\u002Fembed?src=volcano.sh.bot@gmail.com) | [订阅](https:\u002F\u002Fcalendar.google.com\u002Fcalendar\u002Fb\u002F1?cid=dm9sY2Fuby5zaC5ib3RAZ21haWwuY29t)\n\n## 联系方式\n\n如果您有任何问题，请随时通过以下方式联系我们：\n\n[Volcano Slack 频道](https:\u002F\u002Fcloud-native.slack.com\u002Farchives\u002FC011GJDQS0N) | [加入](https:\u002F\u002Fslack.cncf.io\u002F)\n\n[邮件列表](https:\u002F\u002Fgroups.google.com\u002Fforum\u002F#!forum\u002Fvolcano-sh)\n\n微信：请添加微信号 `k8s2222` 并申请加入群聊。","# Volcano 快速上手指南\n\nVolcano 是一个构建在 Kubernetes 之上的原生批量调度系统，专为 AI\u002FML、深度学习、生物信息学及大数据应用设计。它扩展了标准 kube-scheduler 的能力，提供队列管理、作业依赖、弹性伸缩等高级特性，完美支持 TensorFlow、PyTorch、Spark、Flink 等主流框架。\n\n## 1. 环境准备\n\n在开始之前，请确保满足以下前置条件：\n\n*   **Kubernetes 集群**：版本需为 **1.12+** 且支持 CRD（自定义资源定义）。\n    *   *推荐*：Kubernetes v1.17 及以上版本。\n    *   *注意*：若使用 v1.16 及以下版本，需使用旧的 CRD 定义（已弃用）。\n*   **kubectl 工具**：已安装并配置好与集群的连接。\n*   **架构支持**：官方安装包同时支持 `x86_64` 和 `arm64` 架构。\n\n## 2. 
安装步骤\n\n您可以选择以下任意一种方式进行安装。\n\n### 方式一：使用 YAML 文件安装（推荐）\n\n这是最快捷的方式，适用于大多数现有 Kubernetes 集群。执行以下命令即可在 `volcano-system` 命名空间中部署所有组件：\n\n```bash\nkubectl apply -f https:\u002F\u002Fraw.githubusercontent.com\u002Fvolcano-sh\u002Fvolcano\u002Fmaster\u002Finstaller\u002Fvolcano-development.yaml\n```\n\n安装完成后，可通过以下命令验证组件状态（`volcano-admission`、`volcano-controllers`、`volcano-scheduler` 三个组件的 Pod 应处于 `Running` 状态）：\n\n```bash\nkubectl get pods -n volcano-system\n```\n\n预期输出示例：\n```text\nNAME                                   READY   STATUS      RESTARTS   AGE\nvolcano-admission-5bd5756f79-dnr4l     1\u002F1     Running     0          96s\nvolcano-controllers-687948d9c8-nw4b4   1\u002F1     Running     0          96s\nvolcano-scheduler-94998fc64-4z8kh      1\u002F1     Running     0          96s\n```\n\n### 方式二：使用 Helm 安装\n\n如果您习惯使用 Helm 进行包管理，可以使用官方 Chart 进行安装。\n\n**添加仓库并安装：**\n```bash\nhelm repo add volcano-sh https:\u002F\u002Fvolcano-sh.github.io\u002Fhelm-charts\nhelm install volcano volcano-sh\u002Fvolcano -n volcano-system --create-namespace\n```\n\n**查看发布状态：**\n```bash\nhelm list -n volcano-system\n```\n\n### 方式三：本地开发环境一键安装\n\n如果您没有现成的 Kubernetes 集群，可以先克隆 Volcano 源码仓库，然后在仓库根目录下运行以下脚本快速搭建测试环境（仅限 `x86_64`）：\n\n```bash\n.\u002Fhack\u002Flocal-up-volcano.sh\n```\n\n### （可选）安装监控与看板\n\n*   **监控面板**：在 Volcano 源码仓库根目录下执行以下命令，安装 Prometheus 和 Grafana 监控看板。\n    ```bash\n    kubectl create -f installer\u002Fvolcano-monitoring.yaml\n    ```\n*   **Web 控制台**：参考 [Volcano Dashboard](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fdashboard#volcano-dashboard) 指南安装可视化管理界面。\n\n## 3. 
基本使用\n\n安装完成后，只有将 `schedulerName` 设置为 `volcano` 的工作负载才会由 Volcano 调度，其余 Pod 仍由默认调度器处理。您可以通过定义 `Job` 和 `Queue` 资源来使用其高级功能。\n\n### 第一步：创建队列 (Queue)\n\nVolcano 使用队列来划分资源，`weight` 表示该队列在集群资源分配中的相对权重。Volcano 安装后会自带一个名为 `default` 的队列，以下清单显式声明了它的配置：\n\n```yaml\napiVersion: scheduling.volcano.sh\u002Fv1beta1\nkind: Queue\nmetadata:\n  name: default\nspec:\n  weight: 1\n```\n\n将上述内容保存为 `queue.yaml` 后应用：\n```bash\nkubectl apply -f queue.yaml\n```\n\n### 第二步：提交批量作业 (Job)\n\n以下是一个简单的 Volcano Job 示例，它启动一个包含两个副本的 `worker` 任务（类似 MPI 或分布式训练场景）：\n\n```yaml\napiVersion: batch.volcano.sh\u002Fv1alpha1\nkind: Job\nmetadata:\n  name: volcano-job-example\nspec:\n  minAvailable: 2\n  schedulerName: volcano\n  queue: default\n  tasks:\n    - replicas: 2\n      name: \"worker\"\n      template:\n        spec:\n          containers:\n            - image: ubuntu:18.04\n              command: [\"sleep\", \"60\"]\n              imagePullPolicy: IfNotPresent\n              name: worker\n          restartPolicy: OnFailure\n```\n\n将上述内容保存为 `job.yaml` 后提交：\n```bash\nkubectl apply -f job.yaml\n```\n\n### 第三步：查看作业状态\n\n使用 `kubectl` 查看作业运行情况（`vcjob` 是 Volcano Job 的资源简称），输出中会显示作业所处的阶段（如 `Pending`、`Running`、`Completed`）：\n\n```bash\nkubectl get vcjob volcano-job-example\n```\n\n只有当集群能够同时满足 `minAvailable` 个 Pod 的资源需求时，Volcano 才会调度该作业（即 Gang 调度），从而避免分布式任务只启动一部分 Pod 而空占资源。","某生物科技公司正在利用 Kubernetes 集群运行大规模基因组测序分析任务，需要同时调度数百个相互依赖的计算作业。\n\n### 没有 volcano 时\n- **资源碎片化严重**：原生调度器无法感知作业间的依赖关系，导致大量 GPU 和 CPU 资源被零散占用，大任务因凑不齐资源而长期排队。\n- **任务启动不同步**：分布式训练任务（如 MPI 或 TensorFlow）的多个 Pod 往往分批启动，先启动的节点空等后续节点，造成昂贵的算力闲置浪费。\n- **缺乏公平性机制**：不同研发团队的作业互相争抢资源，重要紧急任务常被大量低优先级的小任务阻塞，难以保障核心业务 SLA。\n- **队列管理缺失**：无法按部门或项目划分资源配额，一旦某个实验提交海量任务，极易耗尽整个集群资源，影响其他业务线正常运行。\n\n### 使用 volcano 后\n- **智能装箱与协同调度**：Volcano 通过 Gang Scheduling（帮派调度）机制，确保分布式任务的所有 Pod 资源就绪后才统一启动，彻底消除节点空等现象。\n- **高效资源利用率**：凭借对批处理作业的深度优化，Volcano 能识别作业拓扑结构，将关联任务紧凑调度，显著减少资源碎片，集群整体吞吐量提升 40%。\n- **多级队列与公平共享**：支持基于层级的队列管理，可为不同团队设定资源权重和上限，确保高优先级任务插队执行，同时防止单一用户垄断集群。\n- **断点续跑与弹性伸缩**：在资源紧张时自动挂起低优先级作业并释放资源，待资源充裕后无缝恢复，保障关键基因分析任务按时交付。\n\nVolcano 
将原本混乱无序的批处理集群转变为高效、公平且可预测的算力引擎，让科研人员能专注于数据价值而非资源博弈。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fvolcano-sh_volcano_44effcc5.png","volcano-sh","Volcano","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fvolcano-sh_1669e065.png","A Cloud Native Batch System",null,"volcano-sh@googlegroups.com","http:\u002F\u002Fvolcano.sh","https:\u002F\u002Fgithub.com\u002Fvolcano-sh",[84,88,92,96,99,103],{"name":85,"color":86,"percentage":87},"Go","#00ADD8",96.4,{"name":89,"color":90,"percentage":91},"Shell","#89e051",3.3,{"name":93,"color":94,"percentage":95},"Makefile","#427819",0.2,{"name":97,"color":98,"percentage":95},"Dockerfile","#384d54",{"name":100,"color":101,"percentage":102},"Python","#3572A5",0,{"name":104,"color":86,"percentage":102},"Go Template",5421,1318,"2026-04-05T08:53:53","Apache-2.0",4,"Linux (x86_64, arm64)","未说明 (Volcano 是 Kubernetes 调度系统，本身不直接依赖 GPU，但用于调度 AI\u002FHPC 等需要 GPU 的工作负载)","未说明 (取决于底层 Kubernetes 集群规模及运行的工作负载)",{"notes":114,"python":115,"dependencies":116},"Volcano 是一个 Kubernetes 原生批处理调度系统，而非直接的 AI 模型推理\u002F训练工具。它需要运行在现有的 Kubernetes 集群上（推荐 1.17+ 版本）。安装方式支持 YAML、Helm 或源码一键安装（源码一键安装目前仅支持 x86_64 架构）。它主要作为基础设施层，与 Spark、Flink、TensorFlow、PyTorch 等框架集成以优化资源调度。","未说明",[117,118],"Kubernetes 1.12+","CRD support",[52,26,15,14,55,51,13],[121,122,123,124,125,126,127,128,129,130],"batch-systems","kubernetes","golang","hpc","bigdata","machine-learning","gene","ai","serving","training","2026-03-27T02:49:30.150509","2026-04-06T05:35:54.378686",[134,139,144,149],{"id":135,"question_zh":136,"answer_zh":137,"source_url":138},17481,"安装 Volcano 后启动 MPIJob 失败，提示队列不工作或连接错误，可能是什么原因？","这通常是由网络配置问题引起的。主要原因包括：\n1. CoreDNS 配置错误：检查集群的 CoreDNS 是否正常运行。\n2. kube-apiserver 无法访问 admission 服务：尝试从 kube-apiserver 所在的节点直接 ping Volcano admission pod 的 IP 地址进行验证。\n3. 
自定义 Docker 镜像的用户权限问题：如果自定义镜像的默认用户不是 root，可能会导致 DNS 解析或 SSH 密钥错误。解决方案是将 Dockerfile 中的默认用户改回 root，或者在启动命令中显式使用 sudo（例如 `sudo \u002Fusr\u002Fsbin\u002Fsshd`），但改为 root 用户通常能彻底解决此类权限导致的 DNS 可见性问题。","https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fissues\u002F452",{"id":140,"question_zh":141,"answer_zh":142,"source_url":143},17482,"如何使用 kubectl 直接删除 Volcano 队列？为什么提示“只有状态为 Closed 的队列才能被删除”？","Volcano 的准入控制 webhook 强制要求队列在删除前必须处于 `Closed` 状态，直接使用 `kubectl delete` 删除 `Open` 状态的队列会被拒绝。\n目前的限制是：虽然可以通过 API 直接删除队列，但无法通过 `kubectl edit` 直接修改队列状态来关闭它。\n解决方案：\n1. 暂时方案：需要使用 Volcano 提供的命令行工具 `vcctl` 先执行关闭操作（`vcctl queue close`），然后再用 kubectl 删除。\n2. 未来规划：维护者表示未来将增加通过 Kubernetes API 直接控制队列状态的功能，届时将支持纯 Kubernetes 原生命令完成关闭和删除操作。","https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fissues\u002F1687",{"id":145,"question_zh":146,"answer_zh":147,"source_url":148},17483,"Volcano Scheduler 运行几天后内存占用过高且任务一直 Pending，没有报错日志，该如何排查？","这是一个已知的潜在问题，表现为 Scheduler Pod 内存持续增长且调度功能异常，重启后可暂时恢复。可能的原因和排查建议包括：\n1. 存储性能影响：如果集群的 ETCD 使用的是普通磁盘而非 SSD，高延迟可能导致 Scheduler 内部状态同步阻塞，进而引发内存堆积。\n2. 资源竞争：确保部署 Volcano 组件的节点没有被其他高负载任务（如 PyTorchJob）抢占资源，尽管 Volcano 自身没有设置 Requests，但节点资源不足会影响其运行。\n3. 监控与重启：目前建议在监控到内存异常升高时重启 Scheduler Pod 作为临时规避措施。维护者提到 CI 流程会每天构建并推送最新镜像到 Docker Hub，尝试升级到最新版本可能包含相关的内存优化修复。","https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fissues\u002F1947",{"id":150,"question_zh":151,"answer_zh":152,"source_url":153},17484,"设置了 vGPU 内存限制后，容器内 nvidia-smi 仍显示全部显存（如 32GB），限制未生效怎么办？","当在 Volcano 中配置 vGPU 内存限制后，如果容器内 `nvidia-smi` 仍然显示物理卡的全部显存（例如 V100 的 32GB），说明限制未正确应用。\n注意：此问题在报告中因长期无活动被关闭，但常见原因通常涉及:\n1. 驱动版本兼容性：确认 Nvidia 驱动版本（如 545.23.08）是否与 Volcano 版本及 vGPU 插件版本完全兼容。\n2. 配置校验：检查 Pod 定义中是否正确添加了 vGPU 相关的注解或资源请求。\n3. 
MIG 配置：确认是否错误地启用了 MIG 模式（报告中显示 MIG M: NA），某些配置下 vGPU 与 MIG 可能存在冲突。\n如果遇到此问题，建议检查 Volcano 控制器日志以确认 vGPU 设备插件是否成功拦截并限制了设备文件。","https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fissues\u002F3384",[155,160,165,170,175,180,185,190,195,200,205,210,215,220,225,230,235,240,245,250],{"id":156,"version":157,"summary_zh":158,"released_at":159},107677,"v1.13.2","## 变更内容\n\n### Bug 修复\n\n* 将 4829 提交 cherry-pick 到 1.13 版本：在作业中保留正在终止的 Pod，由 @kingeasternsun 在 [#4860](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4860) 中完成\n* [release-1.13] 修复快照中 NUMA 资源信息更新时可能出现的 panic，由 @qi-min 在 [#4897](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4897) 中完成\n* [release-1.13] 修复 GPU 资源错误，由 @sailorvii 在 [#4916](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4916) 中完成\n* [release-1.13] 更新 metrics_client_prometheus.go 文件，由 @nitindhiman314e 在 [#4931](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4931) 中完成\n* [release-1.13] 修复调度器快照克隆中的共享可变对象问题，由 @zhifei92 在 [#5093](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F5093) 中完成\n\n**完整变更日志**：[v1.13.1...v1.13.2](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fcompare\u002Fv1.13.1...v1.13.2)","2026-03-30T13:16:14",{"id":161,"version":162,"summary_zh":163,"released_at":164},107678,"v1.14.1","## 变更内容\n\n### Bug 修复\n\n* [release-1.14] 修复了包含子组但未设置硬性 networkTopology.mode 的作业无法被调度的问题。由 @JesseStutler 在 #5041 中完成。\n* [release-1.14] 修复：调度器重启时，SubJob 的 AllocatedHyperNode 恢复可能并非最低层级。由 @ouyangshengjia 在 #5012 中完成。\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fcompare\u002Fv1.14.0...v1.14.1","2026-02-14T01:57:51",{"id":166,"version":167,"summary_zh":168,"released_at":169},107679,"v1.14.0","## 摘要\n\nVolcano 1.14.0 版本将 Volcano 打造成面向大规模多样化工作负载的统一调度平台。该版本引入了可扩展的多调度器架构，并配备了动态节点调度分片功能，使多个调度器能够在大型集群中高效协同工作。同时，新增的 Agent 调度器为时延敏感的 AI Agent 工作负载提供快速调度，且与 Volcano 
批处理调度器无缝衔接。网络拓扑感知调度也得到了显著增强，包括超节点级别的紧凑放置、SubGroup 策略，以及跨 Job 和 SubGroup 范围的多级 Gang 调度。Volcano Global 集成进一步推进，支持通过 HyperJob 实现多集群训练和数据感知调度。此外，混部功能现已支持通用操作系统，并提供 CPU 限流、内存 QoS 以及 Cgroup V2 支持。最后，集成的 Ascend vNPU 调度能力实现了 Ascend AI 加速器的高效共享。\n\n## 新特性\n\n### 核心特性概览\n\n- **可扩展的多调度器架构与动态节点调度分片（Alpha）**：基于可扩展策略，为各调度器动态计算候选节点池。\n- **AI Agent 工作负载的快速调度（Alpha）**：推出专为时延敏感的 AI Agent 工作负载设计的新 Agent 调度器，与 Volcano 批处理调度器协同工作，构建统一调度平台。\n- **网络拓扑感知调度增强**：支持超节点级别的紧凑放置、SubGroup 级别的网络拓扑感知调度，以及跨 Job 和 SubGroup 范围的多级 Gang 调度，以满足分布式工作负载的需求。\n- **Volcano Global 功能增强**：通过 HyperJob 支持多集群训练，并在联邦环境中实现数据感知调度。\n- **通用操作系统上的混部支持**：在 Ubuntu、CentOS 等通用操作系统上，支持 CPU 限流、内存 QoS、CPU 突发，并集成 Cgroup V2。\n- **Ascend vNPU 调度**：集成对 Ascend 310P\u002F910 系列 vNPU 的调度支持，兼容 MindCluster 和 HAMi 模式。\n\n### 核心特性详解\n\n### 可扩展的多调度器架构与动态节点调度分片（Alpha）\n\n**背景与动机**：\n\n随着 Volcano 不断演进，以支持大规模下的多样化调度需求，单调度器架构正面临严峻挑战。不同类型的工作负载（如批处理训练、AI Agent、微服务等）具有截然不同的调度要求和资源使用模式。单一调度器容易成为瓶颈，而静态的资源分配则会导致集群资源利用率低下。\n\n为此，Sharding Controller 引入了一种可扩展的多调度器架构，能够为每个调度器动态计算候选节点池。与严格的资源分区不同，Sharding Controller 并非强制实现调度器之间的硬隔离，而是根据当前需求动态调整候选节点集。这种灵活的设计使 Volcano 能够作为统一调度平台，同时服务于多种工作负载，保持高吞吐量和低延迟。\n\n**Alpha 特性说明**","2026-01-31T06:34:11",{"id":171,"version":172,"summary_zh":173,"released_at":174},107680,"v1.12.3","## 变更内容\n### Bug 修复\n- [Cherry-pick v1.12] 添加 hcclrank 作业插件，由 @wangdongyang1 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4555 中提交\n- 自动 cherry-pick #4347：当应得资源中某些标量资源为 0 时，分层队列验证无法通过，由 @wuxiaobao 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4586 中提交\n- 自动 cherry-pick #4590：在准入规则中添加管理命名空间的权限，由 @suyiiyii 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4594 中提交\n- [Cherry-pick v1.12] 修复 MPI 作业插件在仅包含主任务时出现的 panic，由 @wangdongyang1 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4619 中提交\n- [Cherry-pick v1.12] 同步 kube-scheduler：通过使用 VolumeAttachments 提升 CSILimits 插件的准确性，由 @guoqinwill 在 
https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4627 中提交\n- 自动 cherry-pick #4599：修复：为每个队列报告所有标量指标，由 @hajnalmt 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4651 中提交\n- [Cherry-pick 1.12] 修复：在 newQueueAttr 中初始化 realCapability 字段，由 @dafu-wu 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4695 中提交\n- [cherry-pick 1.12] 由于 Volcano 中未释放的 PreBind 锁导致调度主循环阻塞并超时，由 @guoqinwill 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4699 中提交\n- [release-1.12] Cherry-pick #4786 和 #4792：修复 ReplicaSet 的 KubeGroupNameAnnotation 处理以及 ReplicaSet PodGroup 更新同步问题，由 @hajnalmt 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4843 中提交\n- 自动 cherry-pick #4829：在作业中保留正在终止的 Pod，由 @wangdongyang1 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4861 中提交\n- [release-1.12] 修复快照中 NUMA 资源信息更新时可能出现的 panic，由 @qi-min 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4898 中提交\n- [release-1.12] 修复 GPU 资源错误，由 @chenw66 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4915 中提交\n- [release-1.12] 修复：PodGroup 中任务成员发生变化时，会导致调度过程中任务有效性检查失败，由 @ouyangshengjia 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4920 中提交\n- [release-1.12] 修复在禁用指标时调度器出现的 panic，由 @Copilot 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4921 中提交\n- [release-1.12] 更新 metrics_client_prometheus.go 文件，由 @nitindhiman314e 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4932 中提交\n## 维护\n- [release-1.12] 在 E2E 流程中添加“可用磁盘空间”步骤，由 @Copilot 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4851 中提交\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fcompare\u002Fv1.12.2...v1.12.3","2026-01-18T08:34:15",{"id":176,"version":177,"summary_zh":178,"released_at":179},107681,"v1.13.1","## 变更内容\n### Bug 修复\n* 自动 cherry-pick 
#4670：修复：ci 错误导致 bt ray e2e 默认镜像问题，由 @Wonki4 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4681 中提交\n* [Cherry-pick 1.13] 修复：在 newQueueAttr 中初始化 realCapability 字段，由 @dafu-wu 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4694 中提交\n* [cherry-pick 1.13] Volcano 中因未释放 PreBind 锁而导致调度主循环阻塞并超时，由 @guoqinwill 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4700 中提交\n* [release-1.13] 修复当指标功能被禁用时调度器出现 panic 的问题，由 @Copilot 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4770 中提交\n* 将 PR #4786 cherry-pick 到 release-1.13：修复 replicaSet podgroup 更新同步问题，由 @jiahuat 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4799 中提交\n* [release-1.13] 修复：replicaset 对 KubeGroupNameAnnotation 的处理问题，由 @hajnalmt 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4826 中提交\n* [release-1.13] 修复：缓存相关的持续警告，由 @hajnalmt 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4831 中提交\n* [release-1.13] 修复：容量插件的 preemptivefn 逻辑，由 @hajnalmt 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4830 中提交\n* [release-1.13] 修复：PodGroup 中任务成员发生变化时，会导致调度过程中任务有效性检查失败的问题，由 @ouyangshengjia 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4852 中提交\n\n### 维护\n* [release-1.13] 在 E2E 流程中添加“可用磁盘空间”步骤，由 @Copilot 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4763 中提交\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fcompare\u002Fv1.13.0...v1.13.1","2025-12-23T11:32:34",{"id":181,"version":182,"summary_zh":183,"released_at":184},107682,"v1.13.0","## 新增功能\n欢迎使用 Volcano v1.13.0 版本！🚀 🎉 📣  \n在本次发布中，我们带来了一系列备受社区用户期待的重大增强：\n\n- **AI 训练与推理增强**  \n  - [支持 LeaderWorkerSet 用于大模型推理场景](#support-leaderworkerset-for-large-model-inference-scenarios)  \n  - [引入 Cron VolcanoJob](#introduce-cron-volcanojob)  \n  - [支持基于标签的 HyperNode 
自动发现](#support-label-based-hypernode-auto-discovery)  \n  - [新增原生 Ray 框架支持](#add-native-ray-framework-support)  \n  - [引入 HCCL 插件支持](#introduce-hccl-plugin-support)  \n\n- **资源管理与调度增强**  \n  - [引入 ResourceStrategyFit 插件](#introduce-resourcestrategyfit-plugin)  \n    - [按资源类型独立打分策略](#independent-scoring-strategy-by-resource-type)  \n    - [稀缺资源规避（SRA）](#scarce-resource-avoidance-sra)  \n  - [增强 NodeGroup 功能](#enhance-nodegroup-functionality)  \n\n- **混部增强**  \n  - [将混部与操作系统解耦](#decouple-colocation-from-os)  \n  - [支持自定义超卖资源名称](#support-custom-oversubscription-resource-names)  \n\n## 支持 LeaderWorkerSet 用于大模型推理场景\n\n[LeaderWorkerSet (LWS)](https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Flws) 是一个用于在 Kubernetes 上部署 Pod 组的 API。它主要用于解决 AI\u002FML 推理工作负载中的多机推理问题，特别是需要对大型语言模型（LLM）进行分片，并将其跨多个节点上的多台设备运行的场景。\n\n自开源以来，Volcano 积极与上下游生态集成，构建了涵盖 AI 和大数据等批处理计算的全方位社区生态体系。在 LWS 的 [v0.7](https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Flws\u002Freleases\u002Ftag\u002Fv0.7.0) 版本中，其原生集成了 Volcano 的 AI 调度能力。当与新版 Volcano 配合使用时，LWS 会自动创建 PodGroup，随后由 Volcano 进行调度和管理，从而为大模型推理场景实现 Gang 调度等高级功能。\n\n展望未来，Volcano 将继续拓展生态集成能力，为更多致力于在 Kubernetes 上实现分布式推理的项目提供强大的调度和资源管理支持。\n\n使用文档：[LeaderWorkerSet 与 Gang 调度](https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Flws\u002Ftree\u002Fmain\u002Fdocs\u002Fexamples\u002Fsample\u002Fgang-scheduling)。  \n\n相关 PR：https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Flws\u002Fpull\u002F496、https:\u002F\u002Fgithub.com\u002Fkubernetes-sigs\u002Flws\u002Fpull\u002F498，@[JesseStutler](https:\u002F\u002Fgithub.com\u002FJesseStutler)  \n\n## 引入 Cron VolcanoJob\n\n本次发布新增了对 Cron VolcanoJob 的支持。用户现在可以按照预定义的调度计划周期性地创建和运行 VolcanoJob，类似于原生的 Kubernetes CronJob，从而实现定期…","2025-09-29T11:40:54",{"id":186,"version":187,"summary_zh":188,"released_at":189},107683,"v1.12.2","## 变更内容\n* 自动 cherry pick #4422：将 kube-scheduler 相关指标的初始化移至 server.go，以避免 panic，由 @JesseStutler 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4461 中提出\n* 自动 cherry 
pick #4473：修复节点数量对齐问题，由 @Monokaix 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4488 中提出\n* [cherry-pick for 1.12] 修复 ReleaseNameEnvKey 的错误定义，由 @ouyangshengjia 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4490 中提出\n* [cherry-pick for 1.12] 修复 SelectBestNode 在插件得分均为负值时返回 nil 的问题，由 @guoqinwill 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4472 中提出\n* 自动 cherry pick #4487：在分层队列中添加缺失的容量指标，由 @JesseStutler 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4494 中提出\n* [Cherry-pick] 添加版本号更新脚本；使版本发布更加自动化，由 @JesseStutler 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4521 中提出\n* [Cherry-pick] 修复：在 StatefulSet 更新时同步更新 podGroup，由 @Poor12 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4522 中提出\n* 自动化：将版本号升级至 v1.12.2，由 @JesseStutler 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4518 中提出\n\n\n**完整变更日志**：https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fcompare\u002Fv1.12.1...v1.12.2","2025-08-14T01:26:39",{"id":191,"version":192,"summary_zh":193,"released_at":194},107684,"v1.12.1","## 变更内容\n* 由 @Monokaix 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4336 中修复升级到新版本时队列更新冲突的问题\n* 由 @Monokaix 在 https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4337 中将镜像版本升级至 v1.12.1\n\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fcompare\u002Fv1.12.0...v1.12.1","2025-05-31T14:47:45",{"id":196,"version":197,"summary_zh":198,"released_at":199},107685,"v1.12.0","# 新增功能\n\n**欢迎使用 Volcano v1.12.0 版本！🚀 🎉 📣**  \n在本次发布中，我们带来了社区用户期待已久的多项重要增强功能。**\n\n## 网络拓扑感知调度：Alpha 版本\nVolcano 的网络拓扑感知调度功能最初在 v1.11 中以预览形式推出，如今已在 v1.12 中正式进入 Alpha 阶段。该特性旨在优化大规模训练和推理场景中的 AI 任务部署，例如模型并行训练和 Leader-Worker 推理架构。通过将任务调度在同一网络拓扑性能域内，可以减少跨交换机通信，从而显著提升任务执行效率。Volcano 利用 HyperNode CRD 抽象并表示异构硬件的网络拓扑结构，支持分层架构以简化管理。\n\nv1.12 中集成的关键特性包括：\n\n- 
**HyperNode 自动发现：**  \n  Volcano 现已支持集群网络拓扑的自动发现功能。用户只需配置发现类型，系统便会自动生成并维护反映实际集群网络拓扑的分层 HyperNode 结构。目前，该功能支持 InfiniBand（IB）网络，通过 UFM（Unified Fabric Manager）接口获取拓扑信息并自动更新 HyperNode。未来计划还将扩展对 RoCE 等更多网络协议的支持。\n\n- **HyperNode 优先级选择：**  \n\n  本次版本引入了一种基于节点级别和 HyperNode 级别评估的打分策略，并将两者得分累加以确定最终的 HyperNode 评分。\n\n  - **节点级别：** 建议配置 BinPack 插件，优先填充 HyperNode，从而减少资源碎片化。\n  - **HyperNode 级别：** 由于低层级的 HyperNode 具有更少的跨交换机通信，因此优先选择低层级的 HyperNode 以获得更好的性能。对于同层级的 HyperNode，包含更多任务的 HyperNode 将获得更高分数，以降低 HyperNode 层面的资源碎片化。\n\n- **支持 Label Selector 节点匹配：**  \n\n  HyperNode 叶子节点与集群中的物理节点相关联，支持三种匹配策略：\n\n  - **精确匹配：** 直接匹配节点名称。\n  - **正则表达式匹配：** 使用正则表达式匹配节点名称。\n  - **Label 匹配：** 通过标准的 Label Selector 匹配节点。\n\n**相关文档：**\n\n- [网络拓扑感知调度简介与使用指南](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fdocs\u002Fnetwork_topology_aware_scheduling\u002F)\n- [网络拓扑感知调度设计文档](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fdesign\u002FNetwork%20Topology%20Aware%20Scheduling.md)\n- [网络拓扑自动发现设计文档](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fdesign\u002FhyperNode-auto-discovery.md)\n- [网络拓扑自动发现使用指南](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fuser-guide\u002Fhow_t","2025-05-31T14:46:56",{"id":201,"version":202,"summary_zh":203,"released_at":204},107686,"v1.11.0-network-topology-preview.3","**重要提示**\n此版本修复了多个严重的安全漏洞。我们强烈建议所有用户立即升级，以保护您的系统和数据。\n## 安全修复\n- [Cherry-pick network-topology] 添加 HTTP 响应体大小限制 (#4255 @JesseStutler)\n- [Cherry-pick network-topology] 添加安全上下文配置 (#4250 @JesseStutler)\n- 移除部分文件的执行权限，将权限设置为 644 ([#4171](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4171) **@JesseStutler**)\n- 在调度器中添加一个开关，用于控制是否启用 pprof ([#4173](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4173) **@JesseStutler**)\n- 当 TLS 验证被禁用时，添加警告信息 ([#4211](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4211) 
**@Monokaix**)\n- 添加 HTTP 服务器超时时间 ([#4208](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4208) **@Monokaix**)\n\n## 其他改进\n- 将镜像版本升级至 v1.11.0-network-topology-preview.3 ([#4237](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4237) **@JesseStutler**)\n- 添加 NetworkTopology 插件评分文档 ([#4213](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4213) **@ecosysbin**)\n- HyperNode 支持按标签选择节点 ([#4068](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4068) **@ecosysbin**)\n- 更新 Ubuntu 基础镜像 ([#4197](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4197) **@Monokaix**)\n\n## 升级前的重要注意事项\n**变更：Volcano 调度器的 pprof 端点默认关闭**\n为提升安全性，本版本中 Volcano 调度器的 pprof 端点已默认关闭。如果您需要该端点进行调试或监控，则需在升级后显式启用。具体方法如下：\n- 如果您使用 Helm 进行部署，请在 Helm 安装或升级时指定 `custom.scheduler_pprof_enable=true`。\n- 或者，在启动 Volcano 调度器时，手动设置命令行参数 `--enable-pprof=true`。\n\n请务必注意生产环境中启用此端点可能带来的安全风险。","2025-04-30T10:33:54",{"id":206,"version":207,"summary_zh":208,"released_at":209},107687,"v1.10.2","**Important:**\r\nThis release addresses multiple critical security vulnerabilities. 
We strongly advise all users to upgrade immediately to protect your systems and data.\r\n## Security Fixes\r\n- [Cherry-pick 1.10] Add http response body size limit (#4253 @kevin-wangzefeng)\r\n- [Cherry-pick 1.10] Add security context configuration (#4246 @JesseStutler)\r\n- Remove the execute permission for some files, chmod to 644 ([#4171](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4171) **@JesseStutler**)\r\n- add a switch to control whether enable pprof in scheduler ([#4173](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4173) **@JesseStutler**)\r\n- Add warning msg when TLS verification disabled([#4211](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4211) **@Monokaix**)\r\n- Add http server timeout([#4208](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4208) **@Monokaix**)\r\n\r\n## Other Improvements\r\n- Update ubuntu base image([#4194](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4194) **@Monokaix**)\r\n- Bump image to v1.10.2 ([#4231](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4231) **@JesseStutler**)\r\n\r\n## Important Notes Before Upgrading\r\n**Change: Volcano Scheduler pprof Endpoint Disabled by Default**\r\nFor security enhancement, the pprof endpoint for the Volcano Scheduler is now disabled by default in this release. If you require this endpoint for debugging or monitoring, you will need to explicitly enable it post-upgrade. This can be achieved by:\r\n- If you are using helm, specifying `custom.scheduler_pprof_enable=true` during Helm installation or upgrade. \r\n- OR, manually  setting the command-line argument `--enable-pprof=true` when starting the Volcano Scheduler. 
\r\n\r\nPlease be aware of the security implications before enabling this endpoint in production environments.","2025-04-30T10:33:38",{"id":211,"version":212,"summary_zh":213,"released_at":214},107688,"v1.9.1","**Important:**\r\nThis release addresses multiple critical security vulnerabilities. We strongly advise all users to upgrade immediately to protect your systems and data.\r\n## Security Fixes\r\n- [Cherry-pick 1.9] Add http response body size limit (#4254 @JesseStutler)\r\n- [Cherry-pick 1.9] Add security context configuration (#4249 @Monokaix)\r\n- Remove the execute permission for some files, chmod to 644 ([#4171](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4171) **@JesseStutler**)\r\n- add a switch to control whether enable pprof in scheduler ([#4173](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4173) **@JesseStutler**)\r\n- Add warning msg when TLS verification disabled([#4211](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4211) **@Monokaix**)\r\n- Add http server timeout([#4208](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4208) **@Monokaix**)\r\n\r\n## Other Improvements\r\n- Bump image to v1.9.1 ([#4230](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4230) **@JesseStutler**)\r\n- fix panic when get job's elastic resource ([#4103](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4103) **@lowang-bh**)\r\n- change to action cache v4 ([#4075](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4075) **@Monokaix**)\r\n- fix flaky test ([#4121](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4121) **@Monokaix**)\r\n- Supports rollback when allocate callback function fails ([#3780](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F3780) **@wangyang0616**)\r\n- Supports rollback when allocate callback function fails 
([#3776](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F3776) **@wangyang0616**)\r\n- fix pg controller create redundancy podGroup when schedulerName isn't matched ([#3675](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F3675) **@liuyuanchun11**)\r\n- Update Kubernetes compatibility ([#3570](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F3570) **@Monokaix**)\r\n- Fix podgroup not created ([#3572](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F3572) **@liuyuanchun11**)\r\n- update pod status when bind error ([#3550](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F3550) **@bibibox**)\r\n- Update NominatedNodeName for pipelined task ([#3501](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F3501) **@bibibox**)\r\n\r\n## Important Notes Before Upgrading\r\n**Change: Volcano Scheduler pprof Endpoint Disabled by Default**\r\nFor security enhancement, the pprof endpoint for the Volcano Scheduler is now disabled by default in this release. If you require this endpoint for debugging or monitoring, you will need to explicitly enable it post-upgrade. This can be achieved by:\r\n- If you are using helm, specifying `custom.scheduler_pprof_enable=true` during Helm installation or upgrade. \r\n- OR, manually  setting the command-line argument `--enable-pprof=true` when starting the Volcano Scheduler. \r\n\r\nPlease be aware of the security implications before enabling this endpoint in production environments.","2025-04-30T10:32:45",{"id":216,"version":217,"summary_zh":218,"released_at":219},107689,"v1.11.2","**Important:**\r\nThis release addresses multiple critical security vulnerabilities. 
We strongly advise all users to upgrade immediately to protect your systems and data.\r\n## Security Fixes\r\n- [Cherry-pick 1.11] Add http response body size limit (#4252 @kevin-wangzefeng )\r\n- [Cherry-pick 1.11] Add security context configuration (#4245 @JesseStutler)\r\n- Remove the execute permission for some files, chmod to 644 ([#4171](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4171) **@JesseStutler**)\r\n- add a switch to control whether enable pprof in scheduler ([#4173](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4173) **@JesseStutler**)\r\n- Add warning msg when TLS verification disabled([#4211](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4211) **@Monokaix**)\r\n- Add http server timeout([#4208](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4208) **@Monokaix**)\r\n\r\n## Other Improvements\r\n- Bump image to v1.11.2 ([#4232](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4232) **@JesseStutler**)\r\n- Fix: remove controller-manager metrics that should not be introduced ([#4202](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4202) **@dongjiang1989**)\r\n- Filter useless logs in binpack ([#4240](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4240) **@XbaoWu**)\r\n\r\n## Important Notes Before Upgrading\r\n**Change: Volcano Scheduler pprof Endpoint Disabled by Default**\r\nFor security enhancement, the pprof endpoint for the Volcano Scheduler is now disabled by default in this release. If you require this endpoint for debugging or monitoring, you will need to explicitly enable it post-upgrade. This can be achieved by:\r\n- If you are using helm, specifying `custom.scheduler_pprof_enable=true` during Helm installation or upgrade. \r\n- OR, manually setting the command-line argument `--enable-pprof=true` when starting the Volcano Scheduler. 
\r\n\r\nPlease be aware of the security implications before enabling this endpoint in production environments.","2025-04-30T10:34:04",{"id":221,"version":222,"summary_zh":223,"released_at":224},107690,"v1.11.0-network-topology-preview.2","## What's Changed\r\n* [cherry-pick]change to action cache v4 by @Monokaix in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4074\r\n* [Cherry-pick network-topology] Replace queue status update by using ApplyStatus method & Bump image to v1.11.0-network-topology-preview.2 by @JesseStutler in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4153\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fcompare\u002Fv1.11.0-network-topology-preview.0...v1.11.0-network-topology-preview.2","2025-04-09T09:33:20",{"id":226,"version":227,"summary_zh":228,"released_at":229},107691,"v1.11.1","## What's Changed\r\n* [cherry-pick]change to action cache v4 by @Monokaix in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4075\r\n* [cherry-pick]fix creating a hierarchical sub-queue will be rejected by @zhutong196 in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4080\r\n* [cherry-pick] Fix jobflow `status` confusion problem  by @dongjiang1989 in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4094\r\n* [cherry-pick] fix: the problem that PVC will be continuously created indefinitely by @ytcisme in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4144\r\n* [Cherry-pick v1.11] Replace queue status update by using ApplyStatus method & Bump image to v1.11.1 by @JesseStutler in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4155\r\n* [Cherry-pick v1.11] fix: remove lessPartly condition in reclaimable fn from capacity and proportion plugins by @JesseStutler in 
https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4178\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fcompare\u002Fv1.11.0...v1.11.1","2025-04-09T01:46:55",{"id":231,"version":232,"summary_zh":233,"released_at":234},107692,"v1.10.1","## What's Changed\r\n* [cherry-pick for release-1.10]fix job controller reports duplicate warnings by @liuyuanchun11 in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F3755\r\n* [cherry-pick for release 1.10] Fix predicate return unexpected result by @bibibox in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F3859\r\n* [cherry-pick for release-1.10]Supports rollback when allocate callback function fails by @wangyang0616 in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F3864\r\n* change to action cache v4 by @Monokaix in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4096\r\n* [cherry-pick] Fix jobflow `status` confusion problem by @dongjiang1989 in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4095\r\n* [cherry-pick] fix panic when get job's elastic resource by @lowang-bh in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4103\r\n* Update release-1.10 api version to v1.10.1 and Bump image to v1.10.1 by @JesseStutler in https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fpull\u002F4154\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fcompare\u002Fv1.10.0...v1.10.1","2025-04-09T09:26:06",{"id":236,"version":237,"summary_zh":238,"released_at":239},107693,"v1.11.0","# What's New\r\n\r\n**Welcome to the v1.11.0 release of Volcano! 
🚀 🎉 📣\r\nIn this release, we have brought a number of significant enhancements that community users have long awaited.**\r\n\r\n## Feature Preview: Network Topology Aware Scheduling\r\n\r\nIn AI large model training scenarios, model parallelism splits the model across multiple nodes, requiring frequent data exchange between these nodes during training. At this point, network transmission performance between nodes often becomes a bottleneck, significantly impacting training efficiency. Data centers have diverse network types (e.g., IB, RoCE, NVSwitch) and complex network topologies, typically involving multiple layers of switches. The fewer switches between two nodes, the lower the communication latency and the higher the throughput. Therefore, users want to schedule workloads to the best performance domain with the highest throughput and lowest latency, minimizing cross-switch communication to accelerate data exchange and improve training efficiency.\r\n\r\nTo address this, Volcano has introduced the **Network Topology Aware Scheduling** strategy, solving the network communication performance issues in large-scale data center AI training tasks through a unified network topology API and intelligent scheduling policies. 
It provides the following capabilities:\r\n\r\n- **Unified Network Topology API**\r\n\r\n  Introduced the [HyperNode CRD](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fapis\u002Fblob\u002Fnetwork-topology-dev\u002Fpkg\u002Fapis\u002Ftopology\u002Fv1alpha1\u002Fhypernode_types.go) to accurately express the network topology of data centers.\r\n\r\n- **Network Topology-Aware Scheduling Policy**\r\n\r\n  Volcano Job and PodGroup can set topology constraints for jobs through the `networkTopology` field, supporting the following configurations:\r\n\r\n  - **mode**: Supports `hard` and `soft` modes.\r\n    - `hard`: Hard constraint, tasks within the job must be deployed within the same HyperNode.\r\n    - `soft`: Soft constraint, attempts to deploy the job within the same HyperNode.\r\n  - **highestTierAllowed**: Used with `hard` mode, indicating the highest tier of HyperNode the job is allowed to span.\r\n\r\nDesign doc: [Topology Aware Scheduling](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fdesign\u002FNetwork%20Topology%20Aware%20Scheduling.md).\r\n\r\nUser Guide: [Topology Aware Scheduling | Volcano](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fdocs\u002Fnetwork_topology_aware_scheduling\u002F).\r\n\r\nRelated PRs: (#3850, [#144](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fapis\u002Fpull\u002F144), #3874, #3922, #3964, #3971, #3974, #3887, #3897, @**[ecosysbin](https:\u002F\u002Fgithub.com\u002Fecosysbin)**, @**[weapons97](https:\u002F\u002Fgithub.com\u002Fweapons97)**, @**[Xu-Wentao](https:\u002F\u002Fgithub.com\u002FXu-Wentao)**,@**[penggu](https:\u002F\u002Fgithub.com\u002Fpenggu)**  @**[JesseStutler](https:\u002F\u002Fgithub.com\u002FJesseStutler)**, @Monokaix)\r\n\r\n## Supports Elastic Hierarchical Queue\r\n\r\nIn multi-tenant scenarios, fairness, isolation, and task priority control in resource allocation are core requirements. 
Different departments or teams often need to share cluster resources while ensuring their tasks can obtain resources on demand, avoiding resource contention or waste. To address this, Volcano has introduced the **Elastic Hierarchical Queue** feature, significantly enhancing queue resource management capabilities. Through hierarchical queues, users can achieve finer-grained resource quota management, cross-level resource sharing and reclamation, and flexible preemption strategies, building an efficient and fair unified scheduling platform. YARN users can seamlessly migrate big data workloads to Kubernetes clusters using Volcano.\r\n\r\nVolcano's elastic hierarchical queues have the following key features to meet the complex demands of multi-tenant scenarios:\r\n\r\n1. **Supports Configuring Queue Hierarchies**\r\n   Users can create multi-level queues as needed, forming a tree structure. Each queue can have independent resource quotas and priorities, ensuring fair resource allocation.\r\n2. **Cross-Level Resource Sharing and Reclamation**\r\n   When a sub-queue is idle, its resources can be shared with other sub-queues. When jobs are submitted to the sub-queue, resources can be reclaimed from other sub-queues.\r\n3. **Fine-Grained Resource Quota Management**\r\n   Each queue can set the following resource parameters:\r\n   - **`capability`**: The upper limit of the queue's resource capacity.\r\n   - **`deserved`**: The amount of resources the queue deserves. If the queue's allocated resources exceed the `deserved` value, the excess can be reclaimed.\r\n   - **`guarantee`**: The reserved resources for the queue, ensuring the minimum resource guarantee.\r\n4. 
**Flexible Preemption Strategies**\r\n   Supports priority-based resource preemption, ensuring high-priority tasks can obtain the required resources promptly.\r\n\r\nFor detailed design and usage guidance on elastic hierarchical queues, please refer to:\r\n\r\nDesign doc: [hierarchical-queue-on-capacity-plugin](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fdesign\u002Fhierarchical-queue-on-capacity-plugi","2025-01-24T10:31:22",{"id":241,"version":242,"summary_zh":243,"released_at":244},107694,"v1.10.0","# What's New\r\n\r\n## Support Queue Priority Scheduling Strategy\r\n\r\nIn traditional big data processing scenarios, users can directly set queue priorities to control the scheduling order of jobs. To ease the migration from Hadoop\u002FYarn to cloud-native platforms, Volcano supports setting priorities at the queue level, reducing migration costs for big data users while enhancing user experience and resource utilization efficiency.\r\n\r\nQueues are a fundamental resource in Volcano, each with its own priority. By default, a queue's priority is determined by its `share` value, which is calculated by dividing the resources allocated to the queue by its total capacity. This is done automatically, with no manual configuration needed. The smaller the `share` value, the fewer resources the queue has, making it less saturated and more likely to receive resources first. Thus, queues with smaller `share` values have higher priority, ensuring fairness in resource allocation.\r\n\r\nIn production environments—especially in big data scenarios—users often prefer to manually set queue priorities to have a clearer understanding of the order in which queues are scheduled. Since the `share` value is dynamic and changes in real-time as resources are allocated, Volcano introduces a `priority` field to allow users to set queue priorities more intuitively. The higher the `priority`, the higher the queue's standing. 
High-priority queues receive resources first, while low-priority queues have their jobs reclaimed earlier when resources need to be recycled.\r\n\r\nTo ensure compatibility with the `share` mechanism, Volcano also considers the share value when calculating queue priorities. By default, if a user has not set a specific queue priority or if priorities are equal, Volcano will fall back to comparing share values. In this case, the queue with the smaller share has higher priority. Users have the flexibility to choose between different priority strategies based on their specific needs—either by using the priority or the share method.\r\n\r\nQueue priority design doc: [Queue priority](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fdesign\u002Fqueue-priority.md)\r\n\r\nRelated PRs: ([#132](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fapis\u002Fpull\u002F132), #3700, @TaiPark)\r\n\r\n## Enable Fine-Grained GPU Resource Sharing and Reclaim\r\n\r\nVolcano introduced the elastic queue capacity scheduling feature in version v1.9, allowing users to directly set the capacity for each resource dimension within a queue. This feature also supports elastic scheduling based on the `deserved` value, enabling more fine-grained resource sharing and recycling across queues.\r\n\r\nFor detailed design information on elastic queue capacity scheduling, refer to the [Capacity Scheduling Design Document](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fdesign\u002Fcapacity-scheduling.md).\r\n\r\nFor a step-by-step guide on using the capacity plugin, see the [Capacity Plugin User Guide](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fuser-guide\u002Fhow_to_use_capacity_plugin.md).\r\n\r\nIn version v1.10, Volcano extends its support to include reporting different types of GPU resources within elastic queue capacities. 
NVIDIA's default `Device Plugin` does not distinguish between GPU models, instead reporting all resources uniformly as `nvidia.com\u002Fgpu`. This limits AI training and inference tasks from selecting specific GPU models, such as A100 or T4, based on their particular needs. To address this, Volcano now supports reporting distinct GPU models at the `Device Plugin` level, working with the `capacity` plugin to enable more precise GPU resource sharing and recycling.\r\n\r\nFor instructions on using the `Device Plugin` to report various GPU models, refer to the [GPU Resource Naming Guide](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fdevices\u002Ftree\u002Frelease-1.1\u002Fdocs\u002Fresource-naming).\r\n\r\n**Note:**\r\n\r\nIn version v1.10.0, the `capacity` plugin is the default for queue management. Note that the `capacity` and `proportion` plugins are incompatible, so after upgrading to v1.10.0, you must set the `deserved` field for queues to ensure proper functionality. For detailed instructions, refer to the [Capacity Plugin User Guide](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fuser-guide\u002Fhow_to_use_capacity_plugin.md).\r\n\r\nThe `capacity` plugin allocates cluster resources based on the `deserved` value set by the user, while the `proportion` plugin dynamically allocates resources according to queue weight. Users can select either the `capacity` or `proportion` plugin for queue management based on their specific needs. For more details on the proportion plugin, visit: [Proportion Plugin](https:\u002F\u002Fvolcano.sh\u002Fen\u002Fdocs\u002Fplugins\u002F#proportion).\r\n\r\nRelated PR: ([#68](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fdevices\u002Fpull\u002F68), @MondayCha)\r\n\r\n## Introduce Pod Scheduling Readiness Support\r\n\r\nOnce a Pod is created, it is considered ready for scheduling. In Kube-scheduler, it will try its best to find a suitable node to place all pending Pods. 
However, in reality, s","2024-09-19T06:47:46",{"id":246,"version":247,"summary_zh":248,"released_at":249},107695,"v1.9.0"," # What's New\r\n\r\n  ## Support elastic queue capacity scheduling\r\n\r\n  Volcano currently uses the proportion plugin for queue management. Users can set the guarantee, capacity and other fields of the queue to define its reserved resources and capacity limit. By setting the weight value of each queue, users share resources within the cluster: cluster resources are divided among queues in proportion to their weights. However, this queue management method has the following problems:\r\n\r\n  - The resource capacity assigned to a queue is expressed only through its weight, which is not intuitive enough.\r\n  - All resources in the queue are divided using the same ratio, and the capacity cannot be set separately for each dimension of the queue.\r\n\r\n  Based on the above considerations, Volcano implements a new elastic queue capacity management capability. It supports:\r\n\r\n  - Allowing users to directly set the capacity of each resource dimension for the queue instead of setting a weight value.\r\n  - Elastic capacity scheduling based on deserved resources, so that a queue's resources can be shared and reclaimed.\r\n\r\n  For example, in AI large model training scenarios, users can set separate resource capacities for different GPU models in the queue, such as A100 and V100. When cluster resources are idle, the queue can reuse the resources of other idle queues and, when needed, reclaim the resources the user set for it (that is, its deserved amount), which achieves elastic capacity scheduling.\r\n\r\n  To use this feature, you need to set the deserved field of the queue and set the amount of resources to be deserved for each dimension. 
At the same time, you need to turn on the capacity plugin and turn off the proportion plugin in the scheduling configuration.\r\n\r\n  Please refer to [Capacity Scheduling Design](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fdesign\u002Fcapacity-scheduling.md) for more detail.\r\n\r\n  Capacity scheduling example: [How to use capacity plugin](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fuser-guide\u002Fhow_to_use_capacity_plugin.md).\r\n\r\n  Related PR: (#3277, #121, #3283, @Monokaix)\r\n\r\n  ## Support affinity scheduling between queues and nodes\r\n\r\n  Queues are usually associated with departments within the company, and different departments usually need to use different heterogeneous resource types. For example, the large model training team needs to use NVIDIA’s Tesla GPU, and the recommendation team needs to use AMD’s GPU. When users submit jobs to the queue, the jobs need to be automatically scheduled to nodes of the corresponding resource type according to the attributes of the queue.\r\n\r\n  Volcano has implemented affinity scheduling capabilities for queues and nodes. Users only need to set the node labels that require affinity in the affinity field of the queue. Volcano will automatically schedule jobs submitted to the current queue to the nodes associated with the queue. Users do not need to set the affinity of each job separately; they only need to set the affinity of the queue once. Jobs submitted to the queue will be scheduled to the corresponding node based on the affinity of the queue and the node.\r\n\r\n  This feature supports hard affinity, soft affinity, and anti-affinity scheduling at the same time. 
When using it, you need to set a label with the key `volcano.sh\u002Fnodegroup-name` for the node, and then set the affinity field of the queue to specify hard affinity, soft affinity label values.\r\n\r\n  The scheduling plugin for this feature is called nodegroup, for a complete example of its use see: [How to use nodegroup plugin](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fuser-guide\u002Fhow_to_use_nodegroup_plugin.md).\r\n\r\n  For detailed design documentation, see [The nodegroup design](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmaster\u002Fdocs\u002Fdesign\u002Fnode-group.md).\r\n\r\n  Related PR: (#3132, @qiankunli, @wuyueandrew)\r\n\r\n  ## GPU sharing feature supports node scoring scheduling\r\n\r\n  GPU Sharing is a GPU sharing and isolation solution introduced in Volcano v1.8, which provides GPU sharing and device memory control capabilities to enhance the GPU resource utilization in AI training and inference scenarios. v1.9 adds a new scoring strategy for GPU nodes on top of this feature, so that the optimal node can be selected during job assignment to further enhance resource utilization. Users can set different scoring strategies. 
Currently, the following two strategies are supported:\r\n\r\n  - Binpack: Provides a binpack algorithm for GPU card granularity, prioritizing to fill up a node with GPU cards that have already been allocated resources to avoid resource fragmentation and waste.\r\n\r\n  - Spread: Prioritizes the use of idle GPU cards over shared cards that have already been allocated resources.\r\n\r\n  For detailed usage documentation, please refer to: [How to use gpu sharing](https:\u002F\u002Fgithub.com\u002Fvolcano-sh\u002Fvolcano\u002Fblob\u002Fmast","2024-05-21T01:52:54",{"id":251,"version":252,"summary_zh":253,"released_at":254},107696,"v1.8.2","\r\nChanges since v1.8.1\r\n\r\n- fix wrong pods field format output of queue status (#3287 @Monokaix)\r\n- add ignored csi provisioner when compute csi resources (#3286 @Monokaix)\r\n- fix k8s.io\u002Fdynamic-resource-allocation go mod not found err (#3272 @Monokaix)\r\n- fix: json marsh error for unsupport type: func() (#3282 @lowang-bh)\r\n- fix job CRD metadata.annotations: Too long error (#3267 @Monokaix)\r\n- fix queue update validation err when status.allocated empty ( #3266 @Monokaix)\r\n- fix grafana dashboard format err (#3265 @Monokaix)\r\n- update parameter BestEffort of taskInfo after changing parameter InitResreq (#3232 @Lily922)\r\n- fix: allocated field in queue status is calcutated error (#3221 @shusley244)\r\n- Avoid repeatedly creating links to obtain node metrics (#3229 @wangyang0616)\r\n- skip 'pods' resource when checking if the Resource is empty (#3224 @Lily922)\r\n- queue realcapability change to min dimension of queue capability and … (#3219 @Monokaix)\r\n- support preemption when the number of pods of a node reaches the upper limit (#3202 @Lily922)\r\n- Delete duplicate logs generated by the predicate_helper method (#3214 @guoqinwill)\r\n- support preempting task with bound status (#3209 @Lily922)\r\n- support preemption when the number of attachment volumes of a node reaches the upper limit (#3212 
@Lily922)\r\n- fix: task scheduling latancy metrics is not accurate (#3128 @lowang-bh)\r\n- backfill add score process (#3164 @lowang-bh)\r\n- Obtains the actual load data of a node from the custom metrics API (#3181 @wangyang0616)\r\n- Update the default value of parameter worker-threads-for-podgroup to 5 (#3180 @Lily922)\r\n- update volcano.sh\u002Fapis version (#3166 @Lily922)","2024-01-10T03:43:36"]