[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-kvcache-ai--Mooncake":3,"tool-kvcache-ai--Mooncake":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",160015,2,"2026-04-18T11:30:52",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",109154,"2026-04-18T11:18:24",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":66,"readme_en":67,"readme_zh":68,"quickstart_zh":69,"use_case_zh":70,"hero_image_url":71,"owner_login":72,"owner_name":73,"owner_avatar_url":74,"owner_bio":75,"owner_company":76,"owner_location":76,"owner_email":77,"owner_twitter":76,"owner_website":78,"owner_url":79,"languages":80,"stars":117,"forks":118,"last_commit_at":119,"license":120,"difficulty_score":121,"env_os":122,"env_gpu":123,"env_ram":124,"env_deps":125,"category_tags":134,"github_topics":135,"view_count":32,"oss_zip_url":76,"oss_zip_packed_at":76,"status":17,"created_at":143,"updated_at":144,"faqs":145,"releases":174},9039,"kvcache-ai\u002FMooncake","Mooncake","Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.","Mooncake 是月之暗面（Moonshot AI）为其旗舰大模型服务 Kimi 打造的高性能推理平台，现已开源核心组件。它主要解决大模型在高并发场景下的显存瓶颈与数据传输延迟问题，通过一种“以 KVCache 为中心”的分离式架构，将计算与显存管理解耦。这种设计让多张显卡甚至多台服务器能高效共享关键缓存数据，大幅减少重复计算，显著提升长文本处理和复杂任务时的响应速度与吞吐量。\n\nMooncake 特别适合从事大模型基础设施开发的工程师、追求极致推理性能的研究人员，以及需要部署大规模 LLM 服务的企业团队。其技术亮点在于自研的 Transfer Engine（传输引擎）和 Mooncake Store，它们实现了跨设备、跨节点的低延迟数据搬运，并支持动态显存池化。目前，Mooncake 已深度集成至 PyTorch 生态，并被 
SGLang、vLLM 等主流推理框架采纳，用于优化多模态嵌入缓存及分离式推理流程。如果你正在构建需要处理海量上下文或高流量请求的 AI 应用，Mooncake 提供了一个经过工业级验证的可靠解决方案，帮助你在有限的硬件资源下释放更大的模型潜能。","\u003Cdiv align=\"center\">\n  \u003Cimg src=image\u002Fmooncake-icon.png width=44% \u002F>\n  \u003Ch2 align=\"center\">\n      A KVCache-centric Disaggregated Architecture for LLM Serving\n  \u003C\u002Fh2>\n  \u003Ca href=\"https:\u002F\u002Fwww.usenix.org\u002Fsystem\u002Ffiles\u002Ffast25-qin.pdf\" target=\"_blank\">\u003Cstrong>Paper\u003C\u002Fstrong>\u003C\u002Fa>\n  | \u003Ca href=\"https:\u002F\u002Fwww.usenix.org\u002Fsystem\u002Ffiles\u002Ffast25_slides-qin.pdf\" target=\"_blank\">\u003Cstrong>Slides\u003C\u002Fstrong>\u003C\u002Fa>\n  | \u003Ca href=\"FAST25-release\u002Ftraces\" target=\"_blank\">\u003Cstrong>Traces\u003C\u002Fstrong>\u003C\u002Fa>\n  | \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.00079\" target=\"_blank\">\u003Cstrong>Technical Report\u003C\u002Fstrong>\u003C\u002Fa>\n  | \u003Ca href=\"https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002F\" target=\"_blank\">\u003Cstrong>Blog\u003C\u002Fstrong>\u003C\u002Fa>\n  | \u003Ca href=\"https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fmooncake-project\u002Fshared_invite\u002Fzt-3qx4x35ea-zSSTqTHItHJs9SCoXLOSPA\" target=\"_blank\">\u003Cstrong>Slack\u003C\u002Fstrong>\u003C\u002Fa>\n  \u003Cbr \u002F>\n  \u003Cbr \u002F>\n\n  [![Docs](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-live-brightgreen)](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002F)\n  [![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmooncake-transfer-engine)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmooncake-transfer-engine)\n  [![PyPI - Python Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fmooncake-transfer-engine)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmooncake-transfer-engine)\n  [![CUDA 
\u003C=12.9](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=CUDA&message=%3C%3D12.9&color=76B900)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmooncake-transfer-engine)\n  [![CUDA 13.0\u002F13.1](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=CUDA&message=13.0%2F13.1&color=76B900)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmooncake-transfer-engine-cuda13)\n  [![PyPI - Downloads](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Fmooncake-transfer-engine)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmooncake-transfer-engine)\n  [![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Fkvcache-ai\u002FMooncake)\n  [![GitHub commit activity](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fw\u002Fkvcache-ai\u002FMooncake)](https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fgraphs\u002Fcommit-activity)\n  [![license](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fkvcache-ai\u002Fmooncake.svg)](https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fblob\u002Fmain\u002FLICENSE-APACHE)\n\n\u003C\u002Fdiv>\n\u003Cbr\u002F>\n\nMooncake is the serving platform for  \u003Ca href=\"https:\u002F\u002Fkimi.ai\u002F\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_0514200a5ad8.png\" alt=\"icon\" style=\"height: 16px; vertical-align: middle;\"> Kimi\u003C\u002Fa>, a leading LLM service provided by \u003Ca href=\"https:\u002F\u002Fwww.moonshot.cn\u002F\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_69c5b785aee5.jpg\" alt=\"icon\" style=\"height: 16px; vertical-align: middle;\"> Moonshot AI\u003C\u002Fa>.\nNow both the Transfer Engine and Mooncake Store are open-sourced!\nThis repository also hosts its technical report and the open-sourced traces.\n\n\u003Ch2 id=\"updates\">🔄 Updates\u003C\u002Fh2>\n\n- **Mar 19, 2026**: [TorchSpec: Speculative 
Decoding Training at Scale](https:\u002F\u002Fpytorch.org\u002Fblog\u002Ftorchspec-speculative-decoding-training-at-scale) is [open sourced](https:\u002F\u002Fgithub.com\u002Ftorchspec-project\u002FTorchSpec), using Mooncake to decouple inference and training via efficient hidden states management.\n- **Mar 5, 2026**: [LightX2V](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Fpull\u002F893) now supports disaggregated deployment based on Mooncake, enabling encoder\u002Ftransformer service decoupling with Mooncake Transfer Engine for high-performance cross-device and cross-machine data transfer.\n- **Feb 25, 2026**: [SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang) merged [Encoder Global Cache Manager](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang\u002Fpull\u002F16137), introducing a Mooncake-powered global multimodal embedding cache that enables cross-instance sharing of ViT embeddings to avoid redundant GPU computation.\n- **Feb 24, 2026**: [vLLM-Omni](https:\u002F\u002Fdocs.vllm.ai\u002Fprojects\u002Fvllm-omni\u002Fen\u002Flatest\u002Fdesign\u002Ffeature\u002Fdisaggregated_inference\u002F) introduces disaggregated inference connectors with support for both `MooncakeStoreConnector` and `MooncakeTransferEngineConnector` for multi-node omni-modality pipelines.\n- **Feb 12, 2026**: [Mooncake Joins PyTorch Ecosystem](https:\u002F\u002Fpytorch.org\u002Fblog\u002Fmooncake-joins-pytorch-ecosystem\u002F) We are thrilled to announce that Mooncake has officially joined the PyTorch Ecosystem!\n- **Jan 28, 2026**: [FlexKV](https:\u002F\u002Fgithub.com\u002Ftaco-project\u002FFlexKV), a distributed KV store and cache system from Tencent and NVIDIA in collaboration with the community, now supports [distributed KVCache reuse](https:\u002F\u002Fgithub.com\u002Ftaco-project\u002FFlexKV\u002Fblob\u002Fmain\u002Fdocs\u002Fdist_reuse\u002FREADME_en.md) with the Mooncake Transfer Engine.\n - **Dec 27, 2025**: Collaboration with 
[ROLL](https:\u002F\u002Fgithub.com\u002Falibaba\u002FROLL)! Check out the paper [here](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.22560).\n - **Dec 23, 2025**: SGLang introduces [Encode-Prefill-Decode (EPD) Disaggregation](https:\u002F\u002Flmsys.org\u002Fblog\u002F2026-01-12-epd\u002F) with Mooncake as a transfer backend. This integration allows decoupling compute-intensive multimodal encoders (e.g., Vision Transformers) from language model nodes, utilizing Mooncake's RDMA engine for zero-copy transfer of large multimodal embeddings.\n - **Dec 19, 2025**: Mooncake Transfer Engine has been [integrated into TensorRT LLM](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT-LLM\u002Ftree\u002Fmain\u002Fcpp\u002Ftensorrt_llm\u002Fexecutor\u002Fcache_transmission\u002Fmooncake_utils) for KVCache transfer in PD-disaggregated inference.\n - **Dec 19, 2025**: Mooncake Transfer Engine has been directly integrated into vLLM v1 as a [KV Connector](https:\u002F\u002Fdocs.vllm.ai\u002Fen\u002Flatest\u002Ffeatures\u002Fmooncake_connector_usage\u002F) in PD-disaggregated setups.\n - **Nov 07, 2025**: [RBG + SGLang HiCache + Mooncake](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Frbg\u002Fblob\u002Fmain\u002Fkeps\u002F74-mooncake-integration\u002FREADME.md), a role-based out-of-the-box solution for cloud native deployment, which is elastic, scalable, and high-performance.\n - **Sept 18, 2025**: Mooncake Store empowers vLLM Ascend by serving as [the distributed KV cache pool backend](https:\u002F\u002Fdocs.vllm.ai\u002Fprojects\u002Fascend\u002Fzh-cn\u002Fmain\u002Fuser_guide\u002Ffeature_guide\u002Fkv_pool.html).\n - **Sept 10, 2025**: SGLang officially supports Mooncake Store as a [hierarchical KV caching storage backend](https:\u002F\u002Flmsys.org\u002Fblog\u002F2025-09-10-sglang-hicache\u002F). 
The integration extends RadixAttention with multi-tier KV cache storage across device, host, and remote storage layers.\n - **Sept 10, 2025**: The official & high-performance version of Mooncake P2P Store is open-sourced as [checkpoint-engine](https:\u002F\u002Fgithub.com\u002FMoonshotAI\u002Fcheckpoint-engine\u002F). It has been successfully applied in K1.5 and K2 production training, updating Kimi-K2 model (1T parameters) across thousands of GPUs in ~20s.\n - **Aug 23, 2025**: [xLLM](https:\u002F\u002Fgithub.com\u002Fjd-opensource\u002Fxllm) high-performance inference engine builds hybrid KV cache management based on Mooncake, supporting global KV cache management with intelligent offloading and prefetching.\n - **Aug 18, 2025**: vLLM-Ascend [integrates Mooncake Transfer Engine](https:\u002F\u002Fdocs.vllm.ai\u002Fprojects\u002Fascend\u002Fen\u002Flatest\u002Fdeveloper_guide\u002Ffeature_guide\u002Fdisaggregated_prefill.html) for KV cache register and disaggregate prefill, enabling efficient distributed inference on Ascend NPUs.\n - **Jul 20, 2025**: Mooncake powers [the deployment of Kimi K2](https:\u002F\u002Flmsys.org\u002Fblog\u002F2025-07-20-k2-large-scale-ep\u002F) on 128 H200 GPUs with PD disaggregation and large-scale expert parallelism, achieving 224k tokens\u002Fsec prefill throughput and 288k tokens\u002Fsec decode throughput.\n - **Jun 20, 2025**: Mooncake becomes a PD disaggregation [backend](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fexamples\u002Flmdeploy-integration-v0.9.html) for LMDeploy.\n - **May 9, 2025**: NIXL officially supports Mooncake Transfer Engine as [a backend plugin](https:\u002F\u002Fgithub.com\u002Fai-dynamo\u002Fnixl\u002Fblob\u002Fmain\u002Fsrc\u002Fplugins\u002Fmooncake\u002FREADME.md).\n - **May 8, 2025**: [Mooncake x LMCache](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fexamples\u002Flmcache-integration.html) unite to pioneer KVCache-centric LLM serving 
system.\n - **May 5, 2025**: Supported by the Mooncake Team, SGLang released \u003Ca href=\"https:\u002F\u002Flmsys.org\u002Fblog\u002F2025-05-05-large-scale-ep\u002F\" target=\"_blank\">guidance\u003C\u002Fa> to deploy DeepSeek with PD Disaggregation on 96 H100 GPUs.\n - **Apr 22, 2025**: LMCache officially supports Mooncake Store as a \u003Ca href=\"https:\u002F\u002Fblog.lmcache.ai\u002F2025-04-22-tencent\u002F\" target=\"_blank\">remote connector\u003C\u002Fa>.\n - **Apr 10, 2025**: SGLang officially supports Mooncake Transfer Engine for disaggregated prefilling and KV cache transfer.\n- **Mar 7, 2025**: We open-sourced the Mooncake Store, a distributed KVCache based on Transfer Engine. vLLM's xPyD disaggregated prefilling & decoding based on Mooncake Store will be released soon.\n - **Feb 25, 2025**: Mooncake receives the **Best Paper Award** at **FAST 2025**!\n - **Feb 21, 2025**: The updated \u003Ca href=\"FAST25-release\u002Ftraces\" target=\"_blank\">traces\u003C\u002Fa> used in our FAST'25 paper have been released.\n - **Dec 16, 2024**: vLLM officially supports Mooncake Transfer Engine for disaggregated prefilling and KV cache transfer.\n- **Nov 28, 2024**: We open-sourced the Transfer Engine, the central component of Mooncake. 
We also provide two demonstrations of Transfer Engine: a P2P Store and vLLM integration.\n- **July 9, 2024**: We open-sourced the trace as a \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fblob\u002Fmain\u002FFAST25-release\u002Farxiv-trace\u002Fmooncake_trace.jsonl\" target=\"_blank\">JSONL file\u003C\u002Fa>.\n - **June 27, 2024**: We present a series of Chinese blogs with more discussions on \u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F705754254\">zhihu 1\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F705910725\">2\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F706204757\">3\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F707997501\">4\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F9461861451\">5\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F1939988652114580803\">6\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F1959366095443064318\">7\u003C\u002Fa>.\n - **June 26, 2024**: Initial technical report release.\n\n\n\u003Ch2 id=\"overview\">🎉 Overview\u003C\u002Fh2>\n\nMooncake features a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated KVCache pool.\n\n![architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_f4c39cb25ae7.png)\n\nThe core of Mooncake is its KVCache-centric scheduler, which balances maximizing overall effective throughput while meeting latency-related Service Level Objectives (SLOs). Unlike traditional studies that assume all requests will be processed, Mooncake faces challenges in highly overloaded scenarios. To mitigate these, we developed a prediction-based early rejection policy. 
Experiments show that Mooncake excels in long-context scenarios. Compared to the baseline method, Mooncake can achieve up to a 525% increase in throughput in certain simulated scenarios while adhering to SLOs. Under real workloads, Mooncake’s innovative architecture enables \u003Ca href=\"https:\u002F\u002Fkimi.ai\u002F\">Kimi\u003C\u002Fa> to handle 75% more requests.\n\n\u003Ch2 id=\"components\">🧩 Components\u003C\u002Fh2>\n\n\u003C!-- ![components](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_9ed8b0474045.png) -->\n\u003Cimg src=https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_9ed8b0474045.png width=74% \u002F>\n\n**Mooncake Core Component: Transfer Engine (TE)**\nThe core of Mooncake is the Transfer Engine (TE), which provides a unified interface for batched data transfer across various storage devices and network links. Supporting multiple protocols including TCP, RDMA, CXL\u002Fshared-memory, and NVMe over Fabric (NVMe-of), TE is designed to enable fast and reliable data transfer for AI workloads. Compared to Gloo (used by Distributed PyTorch) and traditional TCP, TE achieves significantly lower I\u002FO latency, making it a superior solution for efficient data transmission.\n\n**P2P Store and Mooncake Store**\nBoth P2P Store and Mooncake Store are built on the Transfer Engine and provide key\u002Fvalue caching for different scenarios. P2P Store focuses on sharing temporary objects (e.g., checkpoint files) across nodes in a cluster, preventing bandwidth saturation on a single machine. Mooncake Store, on the other hand, supports distributed pooled KVCache, specifically designed for XpYd disaggregation to enhance resource utilization and system performance.\n\n**Mooncake Integration with Leading LLM Inference Systems**\nMooncake has been seamlessly integrated with several popular large language model (LLM) inference systems. 
Through collaboration with the vLLM and SGLang teams, Mooncake now officially supports prefill-decode disaggregation. By leveraging the high-efficiency communication capabilities of RDMA devices, Mooncake significantly improves inference efficiency in prefill-decode disaggregation scenarios, providing robust technical support for large-scale distributed inference tasks.\nIn addition, Mooncake has been successfully integrated with SGLang's Hierarchical KV Caching, vLLM's prefill serving, and LMCache, augmenting KV cache management capabilities across large-scale inference scenarios.\n\n**Elastic Expert Parallelism Support**\nMooncake adds elasticity and fault tolerance support for MoE model inference, enabling inference systems to remain responsive and recoverable in the event of GPU failures or changes in resource configuration. This functionality includes automatic faulty rank detection and can work with the EPLB module to dynamically route tokens to healthy ranks during inference.\n\n**Tensor-Centric Ecosystem**\nMooncake establishes a full-stack, Tensor-oriented AI infrastructure where Tensors serve as the fundamental data carrier. The ecosystem spans from the Transfer Engine, which accelerates Tensor data movement across heterogeneous storage (DRAM\u002FVRAM\u002FNVMe), to the P2P Store and Mooncake Store for distributed management of Tensor objects (e.g., Checkpoints and KVCache), up to the Mooncake Backend enabling Tensor-based elastic distributed computing. This architecture is designed to maximize Tensor processing efficiency for large-scale model inference and training.\n\n\u003Ch2 id=\"show-cases\">🔥 Show Cases\u003C\u002Fh2>\n\n### Use Transfer Engine Standalone ([Guide](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fdesign\u002Ftransfer-engine\u002Findex.html))\n\nTransfer Engine is a high-performance data transfer framework. 
Transfer Engine provides a unified interface to transfer data from DRAM, VRAM or NVMe, while the technical details related to hardware are hidden. Transfer Engine supports multiple communication protocols including TCP, RDMA (InfiniBand\u002FRoCEv2\u002FeRDMA\u002FNVIDIA GPUDirect), NVMe over Fabric (NVMe-of), NVLink, HIP, CXL, and Ascend. When built with the corresponding runtime, Transfer Engine can also detect and route accelerator memory on CUDA, MUSA, HIP, and Cambricon MLU devices. For a complete list of supported protocols and configuration guide, see the [Supported Protocols Documentation](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fsupported-protocols.html).\n\n#### Highlights\n- **Efficient use of multiple RDMA NIC devices.** Transfer Engine supports the use of multiple RDMA NIC devices to achieve the *aggregation of transfer bandwidth*.\n\n- **Topology aware path selection.** Transfer Engine can *select optimal devices* based on the location (NUMA affinity, etc.) 
of both source and destination.\n\n- **More robust against temporary network errors.** Once transmission fails, Transfer Engine will try to use alternative paths for data delivery automatically.\n\n#### Performance\nWith 40 GB of data (equivalent to the size of the KVCache generated by 128k tokens in the LLaMA3-70B model), Mooncake Transfer Engine delivers up to **87 GB\u002Fs** and **190 GB\u002Fs** of bandwidth in 4×200 Gbps and 8×400 Gbps RoCE networks respectively, which are about **2.4x and 4.6x faster** than the TCP protocol.\n\n\u003C!-- ![transfer-engine-performance.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_d8a0c90a97ae.png) -->\n\u003Cimg src=https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_d8a0c90a97ae.png width=75% \u002F>\n\n### P2P Store  ([Guide](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fdesign\u002Fp2p-store.html))\nP2P Store is built on the Transfer Engine and supports sharing temporary objects between peer nodes in a cluster. P2P Store is ideal for scenarios like checkpoint transfer, where data needs to be rapidly and efficiently shared across a cluster.\n**P2P Store has been used in the checkpoint transfer service of Moonshot AI.**\n\n#### Highlights\n- **Decentralized architecture.** P2P Store leverages a pure client-side architecture with global metadata managed by the etcd service.\n\n- **Efficient data distribution.** Designed to enhance the efficiency of large-scale data distribution, P2P Store *avoids bandwidth saturation* issues by allowing replicated nodes to share data directly. This reduces the CPU\u002FRDMA NIC pressures of data providers (e.g., trainers).\n\n\u003C!-- #### Performance\nThanks to the high performance of Transfer Engine, P2P Stores can also distribute objects with full utilization of *hardware incoming bandwidth* (e.g., A 25Gbps NIC was used in the following figure, and the throughput of get replica is about 3.1 GB\u002Fs). 
-->\n\n\u003C!-- ![p2p-store.gif](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_05d5c15c4fd6.gif) -->\n\n### Mooncake Store ([Guide](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fdesign\u002Fmooncake-store.html))\nMooncake Store is a distributed KVCache storage engine specialized for LLM inference based on Transfer Engine. It is the central component of the KVCache-centric disaggregated architecture. The goal of Mooncake Store is to store the reusable KV caches across various locations in an inference cluster. Mooncake Store has been supported in  [SGLang's Hierarchical KV Caching](https:\u002F\u002Flmsys.org\u002Fblog\u002F2025-09-10-sglang-hicache\u002F), [vLLM's prefill serving](https:\u002F\u002Fdocs.vllm.ai\u002Fen\u002Flatest\u002Ffeatures\u002Fdisagg_prefill.html) and is now integrated with [LMCache](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fexamples\u002Flmcache-integration.html) to provide enhanced KVCache management capabilities.\n\n#### Highlights\n- **Multi-replica support**: Mooncake Store supports storing multiple data replicas for the same object, effectively alleviating hotspots in access pressure.\n\n- **High bandwidth utilization**: Mooncake Store supports striping and parallel I\u002FO transfer of large objects, fully utilizing multi-NIC aggregated bandwidth for high-speed data reads and writes.\n\n### SGLang Integration ([Guide](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fexamples\u002Fsglang-integration\u002Fhicache-integration-v1.html))\n\nSGLang officially supports Mooncake Store as a [HiCache storage backend](https:\u002F\u002Flmsys.org\u002Fblog\u002F2025-09-10-sglang-hicache\u002F). 
This integration enables scalable KV cache retention and high-performance access for large-scale LLM serving scenarios.\n\n#### Highlights\n- **Hierarchical KV Caching**: Mooncake Store serves as an external storage backend in SGLang's HiCache system, extending RadixAttention with multi-level KV cache storage across device, host, and remote storage layers.\n- **Flexible Cache Management**: Supports multiple cache policies including write-through, write-through-selective, and write-back modes, with intelligent prefetching strategies for optimal performance.\n- **Comprehensive Optimizations**: Features advanced data plane optimizations including page-first memory layout for improved I\u002FO efficiency, zero-copy mechanisms for reduced memory overhead, GPU-assisted I\u002FO kernels delivering fast CPU-GPU transfers, and layer-wise overlapping for concurrent KV cache loading while computation executes.\n- **Elastic Expert Parallel**: Mooncake's collective communication backend and expert parallel kernels are integrated into SGLang to enable fault-tolerant expert parallel inference ([sglang#11657](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang\u002Fpull\u002F11657)).\n- **Significant Performance Gains**: The multi-turn benchmark demonstrates substantial performance improvements over the non-HiCache setting. See our [benchmark report](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fperformance\u002Fsglang-hicache-benchmark-results-v1.html) for more details.\n- **Community Feedback**: Effective KV caching significantly reduces TTFT by eliminating redundant and costly re-computation. Integrating SGLang HiCache with the Mooncake service enables scalable KV cache retention and high-performance access. In our evaluation, we tested the DeepSeek-R1-671B model under PD-disaggregated deployment using in-house online requests sampled from a general QA scenario. On average, cache hits achieved an 84% reduction in TTFT compared to full re-computation. 
– Ant Group\n\n### vLLM Integration ([Guide v0.2](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fexamples\u002Fvllm-integration\u002Fvllm-integration-v0.2.html))\nTo optimize LLM inference, the vLLM community is working on supporting [disaggregated prefilling (PR 10502)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\u002Fpull\u002F10502). This feature allows separating the **prefill** phase from the **decode** phase in different processes. The vLLM uses `nccl` and `gloo` as the transport layer by default, but currently it cannot efficiently decouple both phases in different machines.\n\nWe have implemented vLLM integration, which uses Transfer Engine as the network layer instead of `nccl` and `gloo`, to support **inter-node KVCache transfer** [(PR 10884)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\u002Fpull\u002F10884). Transfer Engine provides simpler interfaces and more efficient use of RDMA devices.\n\nWe will soon release the new vLLM integration based on Mooncake Store, which supports xPyD prefill\u002Fdecode disaggregation.\n\n**_Update[Dec 16, 2024]: Here is the latest vLLM Integration ([Guide v0.2](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fexamples\u002Fvllm-integration\u002Fvllm-integration-v0.2.html)) that is based on vLLM's main branch._**\n\n#### Performance\nBy supporting Topology Aware Path Selection and multi-card bandwidth aggregation, Mean TTFT of vLLM with Transfer Engine is up to 25% lower than traditional TCP-based transports.\nIn the future, we will further improve TTFT through GPUDirect RDMA and zero-copy.\n\n| Backend\u002FSetting                                         | Output Token Throughput (tok\u002Fs) | Total Token Throughput (tok\u002Fs) | Mean TTFT (ms) | Median TTFT (ms) | P99 TTFT 
(ms)|\n|---------------------------------------------------------|---------------------------------|--------------------------------|----------------|------------------|---------------|\n| Transfer Engine (RDMA) | 12.06                           | 2042.74                        | 1056.76        | 635.00           | 4006.59       |\n| TCP  | 12.05                           | 2041.13                        | 1414.05        | 766.23          | 6035.36       |\n\n- Click [here](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fperformance\u002Fvllm-benchmark-results-v0.2.html) to access detailed benchmark results.\n\n**More advanced features are coming soon, so stay tuned!**\n\n\u003Ch2 id=\"quick-start\">🚀 Quick Start\u003C\u002Fh2>\n\n### Before using Mooncake\n\nMooncake is designed and optimized for high-speed RDMA networks. Though Mooncake supports TCP-only data transfer, we **strongly** recommend users to evaluate the functionality and performance of Mooncake with RDMA network support.\n\nThe following need to be installed before running any component of Mooncake:\n- RDMA Driver & SDK, such as Mellanox OFED.\n- Python 3.10, virtual environment is recommended.\n- CUDA 12.1 and above, including NVIDIA GPUDirect Storage Support, if the package is built with `-DUSE_CUDA` (disabled by default). *You may install them from [here](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads)*.\n- Cambricon Neuware, if the package is built with `-DUSE_MLU`. 
By default, Mooncake looks for Neuware under `NEUWARE_HOME` or `\u002Fusr\u002Flocal\u002Fneuware`.\n\n### Use Python package\nThe simplest way to install Mooncake Transfer Engine is with `pip`:\n\n**For CUDA-enabled systems:**\n\n- CUDA \u003C 13.0\n```bash\npip install mooncake-transfer-engine\n```\n- CUDA >= 13.0\n```bash\npip install mooncake-transfer-engine-cuda13\n```\n\n**For non-CUDA systems:**\n```bash\npip install mooncake-transfer-engine-non-cuda\n```\n\n> [!IMPORTANT]\n> - The CUDA version (`mooncake-transfer-engine`) includes Mooncake-EP and GPU topology detection, and requires CUDA 12.1+.\n> - The non-CUDA version (`mooncake-transfer-engine-non-cuda`) is for environments without CUDA dependencies.\n> - MLU support is currently available through source builds with `-DUSE_MLU=ON`; there is no dedicated prebuilt MLU wheel yet.\n> - If you encounter problems such as a missing `lib*.so`, uninstall the installed package and build the binaries manually.\n\n### Use Docker image\nMooncake supports Docker-based deployment; see the [Build Guide](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fbuild.html) for details.\n\nTo produce an image that compiles Mooncake from source, builds the wheel via `scripts\u002Fbuild_wheel.sh`, and installs that wheel inside the container, use `docker\u002Fmooncake.Dockerfile`:\n\n```bash\ndocker build -f docker\u002Fmooncake.Dockerfile \\\n  --build-arg PYTHON_VERSION=3.10 \\\n  --build-arg EP_TORCH_VERSIONS=\"2.9.1\" \\\n  -t mooncake:from-source .\n```\n\nThe resulting image already has a virtual environment at `\u002Fopt\u002Fvenv` with the freshly built wheel installed. 
Launch it with GPU\u002FRDMA access as needed, for example:\n\n```bash\ndocker run --gpus all --network host -it mooncake:from-source \u002Fbin\u002Fbash\n```\n\n> [!NOTE]\n> Make sure you build the image from the repository root so that Git metadata and submodules are available inside the build context.\n\n### Build and use binaries\nThe following are additional dependencies for building Mooncake:\n- Build essentials, including gcc, g++ (9.4+) and cmake (3.16+).\n- Go 1.20+, if you want to build with `-DWITH_P2P_STORE`, `-DUSE_ETCD` (enabled by default to use etcd as the metadata server), or `-DSTORE_USE_ETCD` (use etcd for failover of the store master).\n- CUDA 12.1 or above, including NVIDIA GPUDirect Storage Support, if the package is built with `-DUSE_CUDA`. *This is NOT included in the `dependencies.sh` script. You can install it from [here](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads)*.\n- Cambricon Neuware, if you want to build with `-DUSE_MLU`. *This is NOT included in the `dependencies.sh` script.* Mooncake resolves it from `NEUWARE_HOME` or `\u002Fusr\u002Flocal\u002Fneuware` by default, and also supports overriding `MLU_INCLUDE_DIR` \u002F `MLU_LIB_DIR` during CMake configure.\n- [Optional] Rust Toolchain, if you want to build with `-DWITH_RUST_EXAMPLE`. *This is NOT included in the `dependencies.sh` script.*\n- [Optional] `hiredis`, if you want to build with `-DUSE_REDIS` to use Redis instead of etcd as the metadata server.\n- [Optional] `curl`, if you want to build with `-DUSE_HTTP` to use HTTP instead of etcd as the metadata server.\n\nThe build and installation steps are as follows:\n1. Retrieve the source code from the GitHub repository\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake.git\n   cd Mooncake\n   ```\n\n2. Install dependencies\n   ```bash\n   bash dependencies.sh\n   ```\n\n3. 
Compile Mooncake and examples\n   ```bash\n   mkdir build\n   cd build\n   cmake ..\n   make -j\n   sudo make install # optional, make it ready to be used by vLLM\u002FSGLang\n   ```\n\nFor Cambricon MLU builds, configure CMake with `-DUSE_MLU=ON`. For example:\n```bash\nmkdir build\ncd build\ncmake .. -DUSE_MLU=ON -DNEUWARE_ROOT=\u002Fusr\u002Flocal\u002Fneuware\nmake -j\n```\n\n\n\u003Ch2 id=\"milestones\"> 🛣️ Incoming Milestones\u003C\u002Fh2>\n\n- [x] First release of Mooncake and integration with the latest vLLM\n- [ ] Share KV caches across multiple serving engines\n- [ ] User and developer documentation\n\n\u003Ch2 id=\"trace\">📦 Open Source Trace\u003C\u002Fh2>\n\n```json\n{\n    \"timestamp\": 27482,\n    \"input_length\": 6955,\n    \"output_length\": 52,\n    \"hash_ids\": [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 2353, 2354]\n}\n{\n    \"timestamp\": 30535,\n    \"input_length\": 6472,\n    \"output_length\": 26,\n    \"hash_ids\": [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 2366]\n}\n```\nThe two records above are samples from our trace dataset. Each record includes the request arrival time (`timestamp`), the number of input tokens (`input_length`), the number of output tokens (`output_length`), and the remapped block hashes (`hash_ids`). To protect our customers' privacy, we applied several mechanisms to remove user-related information while preserving the dataset's utility for simulated evaluation. Further details about the trace (e.g., up to 50% cache hit ratio) can be found in Section 4 of the technical report.\n\n**_Update [Feb 21, 2025]: The updated [traces](FAST25-release\u002Ftraces) used in our FAST'25 paper have been released! 
Please refer to the paper's appendix (found [here](FAST25-release\u002FMooncake-FAST25.pdf)) for more details._**\n\n\u003Ch2 id=\"citation\">📑 Citation\u003C\u002Fh2>\nPlease cite our paper if you find the paper or the traces useful:\n\n```bibtex\n@article{qin2025mooncake_tos,\n  author    = {Qin Ruoyu and Li Zheming and He Weiran and Cui Jialei and Tang Heyi and Ren Feng and Ma Teng and Cai Shangming and Zhang Yineng and Zhang Mingxing and Wu Yongwei and Zheng Weimin and Xu Xinran},\n  title     = {Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving},\n  year      = {2025},\n  publisher = {Association for Computing Machinery},\n  address   = {New York, NY, USA},\n  issn      = {1553-3077},\n  url       = {https:\u002F\u002Fdoi.org\u002F10.1145\u002F3773772},\n  doi       = {10.1145\u002F3773772},\n  journal   = {ACM Trans. Storage},\n  month     = {nov},\n  keywords  = {Machine learning system, LLM serving, KVCache},\n}\n\n@inproceedings{qin2025mooncake,\n  author    = {Ruoyu Qin and Zheming Li and Weiran He and Jialei Cui and Feng Ren and Mingxing Zhang and Yongwei Wu and Weimin Zheng and Xinran Xu},\n  title     = {Mooncake: Trading More Storage for Less Computation {\\textemdash} A {KVCache-centric} Architecture for Serving {LLM} Chatbot},\n  booktitle = {23rd USENIX Conference on File and Storage Technologies (FAST 25)},\n  year      = {2025},\n  isbn      = {978-1-939133-45-8},\n  address   = {Santa Clara, CA},\n  pages     = {155--170},\n  url       = {https:\u002F\u002Fwww.usenix.org\u002Fconference\u002Ffast25\u002Fpresentation\u002Fqin},\n  publisher = {USENIX Association},\n  month     = {feb},\n}\n\n@article{qin2024mooncake_arxiv,\n  title  = {Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving},\n  author = {Ruoyu Qin and Zheming Li and Weiran He and Mingxing Zhang and Yongwei Wu and Weimin Zheng and Xinran Xu},\n  year   = {2024},\n  url    = 
{https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.00079},\n}\n```\n","\u003Cdiv align=\"center\">\n  \u003Cimg src=image\u002Fmooncake-icon.png width=44% \u002F>\n  \u003Ch2 align=\"center\">\n      一种以KV缓存为中心的解耦架构，用于大模型推理服务\n  \u003C\u002Fh2>\n  \u003Ca href=\"https:\u002F\u002Fwww.usenix.org\u002Fsystem\u002Ffiles\u002Ffast25-qin.pdf\" target=\"_blank\">\u003Cstrong>论文\u003C\u002Fstrong>\u003C\u002Fa>\n  | \u003Ca href=\"https:\u002F\u002Fwww.usenix.org\u002Fsystem\u002Ffiles\u002Ffast25_slides-qin.pdf\" target=\"_blank\">\u003Cstrong>演示文稿\u003C\u002Fstrong>\u003C\u002Fa>\n  | \u003Ca href=\"FAST25-release\u002Ftraces\" target=\"_blank\">\u003Cstrong>数据追踪文件\u003C\u002Fstrong>\u003C\u002Fa>\n  | \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.00079\" target=\"_blank\">\u003Cstrong>技术报告\u003C\u002Fstrong>\u003C\u002Fa>\n  | \u003Ca href=\"https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002F\" target=\"_blank\">\u003Cstrong>博客\u003C\u002Fstrong>\u003C\u002Fa>\n  | \u003Ca href=\"https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fmooncake-project\u002Fshared_invite\u002Fzt-3qx4x35ea-zSSTqTHItHJs9SCoXLOSPA\" target=\"_blank\">\u003Cstrong>Slack 社区\u003C\u002Fstrong>\u003C\u002Fa>\n  \u003Cbr \u002F>\n  \u003Cbr \u002F>\n\n  [![文档](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-live-brightgreen)](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002F)\n  [![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fmooncake-transfer-engine)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmooncake-transfer-engine)\n  [![PyPI - Python 版本](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fmooncake-transfer-engine)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmooncake-transfer-engine)\n  [![CUDA \u003C=12.9](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=CUDA&message=%3C%3D12.9&color=76B900)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmooncake-transfer-engine)\n  [![CUDA 
13.0\u002F13.1](https:\u002F\u002Fimg.shields.io\u002Fstatic\u002Fv1?label=CUDA&message=13.0%2F13.1&color=76B900)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmooncake-transfer-engine-cuda13)\n  [![PyPI - 下载量](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fdm\u002Fmooncake-transfer-engine)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fmooncake-transfer-engine)\n  [![Ask DeepWiki](https:\u002F\u002Fdeepwiki.com\u002Fbadge.svg)](https:\u002F\u002Fdeepwiki.com\u002Fkvcache-ai\u002FMooncake)\n  [![GitHub 提交活跃度](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fcommit-activity\u002Fw\u002Fkvcache-ai\u002FMooncake)](https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fgraphs\u002Fcommit-activity)\n  [![许可证](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002Fkvcache-ai\u002Fmooncake.svg)](https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fblob\u002Fmain\u002FLICENSE-APACHE)\n\n\u003C\u002Fdiv>\n\u003Cbr\u002F>\n\nMooncake 是 \u003Ca href=\"https:\u002F\u002Fkimi.ai\u002F\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_0514200a5ad8.png\" alt=\"icon\" style=\"height: 16px; vertical-align: middle;\"> Kimi\u003C\u002Fa> 的推理服务平台，Kimi 是由 \u003Ca href=\"https:\u002F\u002Fwww.moonshot.cn\u002F\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_69c5b785aee5.jpg\" alt=\"icon\" style=\"height: 16px; vertical-align: middle;\"> Moonshot AI\u003C\u002Fa> 提供的领先大模型服务。目前，Transfer Engine 和 Mooncake Store 均已开源！本仓库还托管了其技术报告以及开源的数据追踪文件。\n\n\u003Ch2 id=\"updates\">🔄 最新动态\u003C\u002Fh2>\n\n- **2026年3月19日**: [TorchSpec: 大规模推测解码训练](https:\u002F\u002Fpytorch.org\u002Fblog\u002Ftorchspec-speculative-decoding-training-at-scale) 已 [开源](https:\u002F\u002Fgithub.com\u002Ftorchspec-project\u002FTorchSpec)，利用 Mooncake 通过高效的隐藏状态管理实现推理与训练的解耦。\n- **2026年3月5日**: [LightX2V](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\u002Fpull\u002F893) 现已支持基于 Mooncake 的分离式部署，借助 
Mooncake Transfer Engine 实现编码器\u002FTransformer 服务的解耦，从而实现高性能的跨设备、跨机器数据传输。\n- **2026年2月25日**: [SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang) 合并了 [Encoder Global Cache Manager](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang\u002Fpull\u002F16137)，引入由 Mooncake 提供支持的全局多模态嵌入缓存，实现 ViT 嵌入在不同实例间的共享，避免重复的 GPU 计算。\n- **2026年2月24日**: [vLLM-Omni](https:\u002F\u002Fdocs.vllm.ai\u002Fprojects\u002Fvllm-omni\u002Fen\u002Flatest\u002Fdesign\u002Ffeature\u002Fdisaggregated_inference\u002F) 引入了分离式推理连接器，同时支持 `MooncakeStoreConnector` 和 `MooncakeTransferEngineConnector`，用于多节点全模态流水线。\n- **2026年2月12日**: [Mooncake 加入 PyTorch 生态系统](https:\u002F\u002Fpytorch.org\u002Fblog\u002Fmooncake-joins-pytorch-ecosystem\u002F) 我们非常高兴地宣布，Mooncake 已正式加入 PyTorch 生态系统！\n- **2026年1月28日**: [FlexKV](https:\u002F\u002Fgithub.com\u002Ftaco-project\u002FFlexKV)，由腾讯和 NVIDIA 联合社区开发的分布式 KV 存储与缓存系统，现已支持使用 Mooncake Transfer Engine 进行 [分布式 KVCache 复用](https:\u002F\u002Fgithub.com\u002Ftaco-project\u002FFlexKV\u002Fblob\u002Fmain\u002Fdocs\u002Fdist_reuse\u002FREADME_en.md)。\n- **2025年12月27日**: 与 [ROLL](https:\u002F\u002Fgithub.com\u002Falibaba\u002FROLL) 达成合作！论文请见 [这里](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.22560)。\n- **2025年12月23日**: SGLang 引入了 [Encode-Prefill-Decode (EPD) 分离](https:\u002F\u002Flmsys.org\u002Fblog\u002F2026-01-12-epd\u002F)，以 Mooncake 作为传输后端。该集成允许将计算密集型多模态编码器（如 Vision Transformer）从语言模型节点中解耦，并利用 Mooncake 的 RDMA 引擎实现大型多模态嵌入的零拷贝传输。\n- **2025年12月19日**: Mooncake Transfer Engine 已被 [集成到 TensorRT LLM 中](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT-LLM\u002Ftree\u002Fmain\u002Fcpp\u002Ftensorrt_llm\u002Fexecutor\u002Fcache_transmission\u002Fmooncake_utils)，用于 PD 分离式推理中的 KVCache 传输。\n- **2025年12月19日**: Mooncake Transfer Engine 已直接集成到 vLLM v1 中，作为 PD 分离式设置中的 [KV 连接器](https:\u002F\u002Fdocs.vllm.ai\u002Fen\u002Flatest\u002Ffeatures\u002Fmooncake_connector_usage\u002F)。\n- **2025年11月7日**: [RBG + SGLang HiCache + 
Mooncake](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Frbg\u002Fblob\u002Fmain\u002Fkeps\u002F74-mooncake-integration\u002FREADME.md)，一种基于角色的云原生部署开箱即用解决方案，具备弹性、可扩展性和高性能。\n- **2025年9月18日**: Mooncake Store 为 vLLM Ascend 提供支持，作为 [分布式 KV 缓存池后端](https:\u002F\u002Fdocs.vllm.ai\u002Fprojects\u002Fascend\u002Fzh-cn\u002Fmain\u002Fuser_guide\u002Ffeature_guide\u002Fkv_pool.html)。\n- **2025年9月10日**: SGLang 正式支持 Mooncake Store 作为 [层次化 KV 缓存存储后端](https:\u002F\u002Flmsys.org\u002Fblog\u002F2025-09-10-sglang-hicache\u002F)。该集成将 RadixAttention 扩展至跨设备、主机及远程存储层的多级 KV 缓存存储。\n- **2025年9月10日**: Mooncake P2P Store 的官方高性能版本已作为 [checkpoint-engine](https:\u002F\u002Fgithub.com\u002FMoonshotAI\u002Fcheckpoint-engine\u002F) 开源。它已成功应用于 K1.5 和 K2 的生产训练中，在数千张 GPU 上以约 20 秒的时间更新 Kimi-K2 模型（1T 参数）。\n- **2025年8月23日**: [xLLM](https:\u002F\u002Fgithub.com\u002Fjd-opensource\u002Fxllm) 高性能推理引擎基于 Mooncake 构建混合 KV 缓存管理方案，支持全局 KV 缓存管理，并具备智能卸载与预取功能。\n- **2025年8月18日**: vLLM-Ascend [集成 Mooncake Transfer Engine](https:\u002F\u002Fdocs.vllm.ai\u002Fprojects\u002Fascend\u002Fen\u002Flatest\u002Fdeveloper_guide\u002Ffeature_guide\u002Fdisaggregated_prefill.html) 用于 KV 缓存注册和分离式预填充，从而实现 Ascend NPU 上的高效分布式推理。\n- **2025年7月20日**: Mooncake 支持 [Kimi K2 的部署](https:\u002F\u002Flmsys.org\u002Fblog\u002F2025-07-20-k2-large-scale-ep\u002F) 在 128 张 H200 GPU 上采用 PD 分离和大规模专家并行技术，实现了 224k tokens\u002Fsec 的预填充吞吐量和 288k tokens\u002Fsec 的解码吞吐量。\n- **2025年6月20日**: Mooncake 成为 LMDeploy 的 PD 分离式 [后端](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fexamples\u002Flmdeploy-integration-v0.9.html)。\n- **2025年5月9日**: NIXL 正式支持 Mooncake Transfer Engine 作为 [后端插件](https:\u002F\u002Fgithub.com\u002Fai-dynamo\u002Fnixl\u002Fblob\u002Fmain\u002Fsrc\u002Fplugins\u002Fmooncake\u002FREADME.md)。\n- **2025年5月8日**: [Mooncake x LMCache](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fexamples\u002Flmcache-integration.html) 联手，开创以 KVCache 为中心的 LLM 服务系统。\n- **2025年5月5日**: 在 Mooncake 
团队的支持下，SGLang 发布了 \u003Ca href=\"https:\u002F\u002Flmsys.org\u002Fblog\u002F2025-05-05-large-scale-ep\u002F\" target=\"_blank\">指南\u003C\u002Fa>,介绍如何在 96 张 H100 GPU 上使用 PD 分离部署 DeepSeek。\n- **2025年4月22日**: LMCache 正式支持 Mooncake Store 作为 \u003Ca href=\"https:\u002F\u002Fblog.lmcache.ai\u002F2025-04-22-tencent\u002F\" target=\"_blank\">远程连接器\u003C\u002Fa>。\n- **2025年4月10日**: SGLang 正式支持 Mooncake Transfer Engine，用于分离式预填充和 KV 缓存传输。\n- **2025年3月7日**: 我们开源了 Mooncake Store，这是一个基于 Transfer Engine 的分布式 KVCache。基于 Mooncake Store 的 vLLM xPyD 分离式预填充和解码功能即将发布。\n- **2025年2月25日**: Mooncake 在 **FAST 2025** 上荣获 **最佳论文奖**！\n- **2025年2月21日**: 我们发布了用于 FAST'25 论文的更新版 \u003Ca href=\"FAST25-release\u002Ftraces\" target=\"_blank\">trace 数据\u003C\u002Fa>。\n- **2024年12月16日**: vLLM 正式支持 Mooncake Transfer Engine，用于分离式预填充和 KV 缓存传输。\n- **2024年11月28日**: 我们开源了 Transfer Engine，这是 Mooncake 的核心组件。同时提供了两个关于 Transfer Engine 的演示：P2P Store 和 vLLM 集成。\n- **2024年7月9日**: 我们将 trace 以 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fblob\u002Fmain\u002FFAST25-release\u002Farxiv-trace\u002Fmooncake_trace.jsonl\" target=\"_blank\">JSONL 文件\u003C\u002Fa> 的形式开源。\n- **2024年6月27日**: 我们发布了一系列中文博客，进一步探讨相关内容，详见 \u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F705754254\">知乎 1\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F705910725\">2\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F706204757\">3\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F707997501\">4\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F9461861451\">5\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F1939988652114580803\">6\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F1959366095443064318\">7\u003C\u002Fa>。\n- **2024年6月26日**: 初步技术报告发布。\n\n\u003Ch2 id=\"overview\">🎉 概述\u003C\u002Fh2>\n\nMooncake 采用以 KVCache 
为中心的解耦架构，将预填充和解码集群分离。同时，它充分利用 GPU 集群中未充分使用的 CPU、DRAM 和 SSD 资源，构建了一个分布式 KVCache 池。\n\n![architecture](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_f4c39cb25ae7.png)\n\nMooncake 的核心是其基于 KVCache 的调度器，能够在满足延迟相关的服务等级目标（SLO）的同时，最大化整体有效吞吐量。与传统研究假设所有请求都会被处理不同，Mooncake 面临着高负载场景下的挑战。为此，我们开发了一种基于预测的早期拒绝策略。实验表明，Mooncake 在长上下文场景中表现尤为出色。与基准方法相比，在某些模拟场景中，Mooncake 可以在遵守 SLO 的前提下，将吞吐量提升高达 525%。而在实际工作负载下，Mooncake 的创新架构使 \u003Ca href=\"https:\u002F\u002Fkimi.ai\u002F\">Kimi\u003C\u002Fa> 能够处理多出 75% 的请求。\n\n\u003Ch2 id=\"components\">🧩 组件\u003C\u002Fh2>\n\n\u003C!-- ![components](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_9ed8b0474045.png) -->\n\u003Cimg src=https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_9ed8b0474045.png width=74% \u002F>\n\n**Mooncake 核心组件：传输引擎（TE）**\nMooncake 的核心是传输引擎（TE），它为跨多种存储设备和网络链路的批量数据传输提供统一接口。TE 支持 TCP、RDMA、CXL\u002F共享内存以及 NVMe over Fabric（NVMe-of）等多种协议，旨在为 AI 工作负载实现快速可靠的数据传输。与 Distributed PyTorch 使用的 Gloo 以及传统 TCP 相比，TE 的 I\u002FO 延迟显著更低，是高效数据传输的更优解决方案。\n\n**P2P 存储与 Mooncake 存储**\nP2P 存储和 Mooncake 存储均基于传输引擎构建，分别针对不同场景提供键值缓存功能。P2P 存储专注于在集群节点间共享临时对象（如检查点文件），从而避免单台机器带宽饱和。而 Mooncake 存储则支持分布式池化 KVCache，专为 XpYd 解耦设计，以提升资源利用率和系统性能。\n\n**Mooncake 与主流 LLM 推理系统的集成**\nMooncake 已与多个流行的大语言模型（LLM）推理系统无缝集成。通过与 vLLM 和 SGLang 团队的合作，Mooncake 现已正式支持预填充-解码解耦。借助 RDMA 设备的高效通信能力，Mooncake 在预填充-解码解耦场景中显著提升了推理效率，为大规模分布式推理任务提供了强大的技术支持。\n此外，Mooncake 还成功集成了 SGLang 的分层 KV 缓存、vLLM 的预填充服务以及 LMCache，从而增强了大规模推理场景下的 KV 缓存管理能力。\n\n**弹性专家并行支持**\nMooncake 为 MoE 模型推理增加了弹性和容错支持，使推理系统在 GPU 故障或资源配置变化时仍能保持响应并恢复运行。该功能包括自动检测故障 Rank，并可与 EPLB 模块协同工作，在推理过程中将 Token 动态路由到健康的 Rank 上。\n\n**以 Tensor 为中心的生态系统**\nMooncake 构建了一个全栈式的、面向 Tensor 的 AI 基础设施，其中 Tensor 是基础的数据载体。该生态系统从加速异构存储（DRAM\u002FVRAM\u002FNVMe）间 Tensor 数据移动的传输引擎，到用于分布式管理 Tensor 对象（如检查点和 KVCache）的 P2P 存储和 Mooncake 存储，再到支持基于 Tensor 的弹性分布式计算的 Mooncake 后端，层层递进。这一架构旨在最大限度地提高大规模模型推理和训练中的 Tensor 处理效率。\n\n\u003Ch2 id=\"show-cases\">🔥 
案例展示\u003C\u002Fh2>\n\n\n\n### 单独使用传输引擎（[指南](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fdesign\u002Ftransfer-engine\u002Findex.html)）\n\n传输引擎是一个高性能的数据传输框架。它提供统一的接口来传输来自 DRAM、VRAM 或 NVMe 的数据，同时隐藏了与硬件相关的技术细节。传输引擎支持多种通信协议，包括 TCP、RDMA（InfiniBand\u002FRoCEv2\u002FeRDMA\u002FNVIDIA GPUDirect）、NVMe over Fabric（NVMe-of）、NVLink、HIP、CXL 和 Ascend 等。当与相应运行时环境结合使用时，传输引擎还能检测并路由 CUDA、MUSA、HIP 和寒武纪 MLU 设备上的加速器显存。有关支持的完整协议列表和配置指南，请参阅 [支持的协议文档](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fsupported-protocols.html)。\n\n#### 亮点\n- **高效利用多张 RDMA 网卡。** 传输引擎支持使用多张 RDMA 网卡，实现 *传输带宽的聚合*。\n\n- **拓扑感知路径选择。** 传输引擎能够根据源和目标的位置（NUMA 亲和性等） *选择最优设备*。\n\n- **对临时网络故障更具鲁棒性。** 一旦传输失败，传输引擎会自动尝试使用替代路径进行数据交付。\n\n#### 性能\n在 4×200 Gbps 和 8×400 Gbps RoCE 网络中，对于 40 GB 的数据（相当于 LLaMA3-70B 模型中 12.8 万个 Token 生成的 KVCache 大小），Mooncake 传输引擎分别可达到 **87 GB\u002Fs** 和 **190 GB\u002Fs** 的带宽，这比 TCP 协议快约 **2.4 倍和 4.6 倍**。\n\n\u003C!-- ![transfer-engine-performance.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_d8a0c90a97ae.png) -->\n\u003Cimg src=https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_d8a0c90a97ae.png width=75% \u002F>\n\n### P2P Store  ([指南](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fdesign\u002Fp2p-store.html))\nP2P Store 构建于 Transfer Engine 之上，支持在集群中的对等节点间共享临时对象。P2P Store 非常适合检查点传输等场景，在这些场景中需要在集群内快速高效地共享数据。\n**P2P Store 已被用于 Moonshot AI 的检查点传输服务中。**\n\n#### 亮点\n- **去中心化架构。** P2P Store 采用纯客户端架构，全局元数据由 etcd 服务管理。\n\n- **高效的数据分发。** 为提升大规模数据分发效率而设计，P2P Store *避免了带宽饱和*问题，允许副本节点直接共享数据。这降低了数据提供者（例如训练器）的 CPU 和 RDMA 网卡压力。\n\n\u003C!-- #### 性能\n得益于 Transfer Engine 的高性能，P2P Store 还可以以 *硬件入站带宽* 的充分利用来分发对象（例如下图中使用了一块 25Gbps 的网卡，获取副本的吞吐量约为 3.1 GB\u002Fs）。 -->\n\n\u003C!-- ![p2p-store.gif](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_readme_05d5c15c4fd6.gif) -->\n\n### Mooncake Store 
([指南](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fdesign\u002Fmooncake-store.html))\nMooncake Store 是一个基于 Transfer Engine 的分布式 KVCache 存储引擎，专为 LLM 推理而设计。它是以 KVCache 为中心的分离式架构的核心组件。Mooncake Store 的目标是在推理集群中的各个位置存储可重用的 KVCache。Mooncake Store 已被 [SGLang 的分层 KV 缓存](https:\u002F\u002Flmsys.org\u002Fblog\u002F2025-09-10-sglang-hicache\u002F)、[vLLM 的预填充服务](https:\u002F\u002Fdocs.vllm.ai\u002Fen\u002Flatest\u002Ffeatures\u002Fdisagg_prefill.html)所支持，并且现在已与 [LMCache](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fexamples\u002Flmcache-integration.html) 集成，以提供更强大的 KVCache 管理能力。\n\n#### 亮点\n- **多副本支持**：Mooncake Store 支持为同一对象存储多个数据副本，从而有效缓解访问压力的热点问题。\n\n- **高带宽利用率**：Mooncake Store 支持大型对象的条带化和并行 I\u002FO 传输，充分利用多网卡聚合带宽进行高速数据读写。\n\n### SGLang 集成 ([指南](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fexamples\u002Fsglang-integration\u002Fhicache-integration-v1.html))\n\nSGLang 官方将 Mooncake Store 作为 [HiCache 存储后端](https:\u002F\u002Flmsys.org\u002Fblog\u002F2025-09-10-sglang-hicache\u002F)正式支持。这一集成使得大规模 LLM 服务场景能够实现可扩展的 KVCache 保留和高性能访问。\n\n#### 亮点\n- **分层 KV 缓存**：Mooncake Store 在 SGLang 的 HiCache 系统中作为外部存储后端，通过设备、主机和远程存储层的多级 KVCache 存储扩展了 RadixAttention。\n- **灵活的缓存管理**：支持多种缓存策略，包括直写、选择性直写和回写模式，并配备智能预取策略以实现最佳性能。\n- **全面优化**：具备先进的数据平面优化功能，包括页面优先的内存布局以提高 I\u002FO 效率、减少内存开销的零拷贝机制、加速 CPU-GPU 传输的 GPU 辅助 I\u002FO 核心，以及在计算执行时并发加载 KVCache 的分层重叠技术。\n- **弹性专家并行**：Mooncake 的集体通信后端和专家并行核已被集成到 SGLang 中，以实现容错的专家并行推理（[sglang#11657](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang\u002Fpull\u002F11657)）。\n- **显著的性能提升**：多轮基准测试表明，相比非 HiCache 设置，性能有了大幅提高。更多详情请参阅我们的 [基准测试报告](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fperformance\u002Fsglang-hicache-benchmark-results-v1.html)。\n- **社区反馈**：有效的 KV 缓存显著减少了 TTFT，消除了冗余且昂贵的重新计算。将 SGLang HiCache 与 Mooncake 服务集成，可以实现可扩展的 KVCache 保留和高性能访问。在我们的评估中，我们使用内部在线请求，从通用 QA 场景中采样，对 PD 分离式部署下的 DeepSeek-R1-671B 模型进行了测试。平均而言，缓存命中使 TTFT 相比完全重新计算降低了 84%。——蚂蚁集团\n\n### vLLM 集成 
([指南 v0.2](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fexamples\u002Fvllm-integration\u002Fvllm-integration-v0.2.html))\n为了优化 LLM 推理，vLLM 社区正在努力支持 [分离式预填充（PR 10502）](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\u002Fpull\u002F10502)。该特性允许将 **预填充** 阶段与 **解码** 阶段分离到不同的进程。vLLM 默认使用 `nccl` 和 `gloo` 作为传输层，但目前还无法在不同机器上高效地解耦这两个阶段。\n\n我们已经实现了 vLLM 集成，该集成使用 Transfer Engine 作为网络层，而非 `nccl` 和 `gloo`，以支持 **节点间 KVCache 传输** [(PR 10884)](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\u002Fpull\u002F10884)。Transfer Engine 提供更简单的接口和更高效的 RDMA 设备使用方式。\n\n我们很快将发布基于 Mooncake Store 的新 vLLM 集成，该集成支持 xPyD 预填充\u002F解码分离。\n\n**_更新[2024年12月16日]：以下是基于 vLLM 主分支的最新 vLLM 集成（[指南 v0.2](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fexamples\u002Fvllm-integration\u002Fvllm-integration-v0.2.html))。_**\n\n#### 性能\n通过支持拓扑感知路径选择和多卡带宽聚合，使用 Transfer Engine 的 vLLM 的平均 TTFT 比传统的基于 TCP 的传输方式低多达 25%。\n未来，我们将通过 GPUDirect RDMA 和零拷贝进一步改善 TTFT。\n\n| 后端\u002F设置                                         | 输出 Token 吞吐量 (tok\u002Fs) | 总 Token 吞吐量 (tok\u002Fs) | 平均 TTFT (ms) | 中位数 TTFT (ms) | P99 TTFT (ms)|\n|---------------------------------------------------------|---------------------------|--------------------------|----------------|------------------|---------------|\n| Transfer Engine (RDMA) | 12.06                   | 2042.74                  | 1056.76        | 635.00           | 4006.59       |\n| TCP  | 12.05                   | 2041.13                  | 1414.05        | 766.23          | 6035.36       |\n\n- 点击 [这里](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fperformance\u002Fvllm-benchmark-results-v0.2.html) 查看详细的基准测试结果。\n\n**更多高级功能即将推出，请继续关注！**\n\n\u003Ch2 id=\"quick-start\">🚀 快速入门\u003C\u002Fh2>\n\n### 使用 Mooncake 之前\n\nMooncake 专为高速 RDMA 网络设计并进行了优化。尽管 Mooncake 支持仅使用 TCP 的数据传输，但我们**强烈**建议用户在具备 RDMA 网络支持的环境中评估 Mooncake 的功能和性能。\n\n在运行 Mooncake 的任何组件之前，需要先安装以下内容：\n- RDMA 驱动程序及 SDK，例如 Mellanox OFED。\n- 
Python 3.10，建议使用虚拟环境。\n- CUDA 12.1 及以上版本，包括 NVIDIA GPUDirect Storage 支持；如果软件包是通过 `-DUSE_CUDA` 标志构建的（默认未启用）。*您可从 [此处](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads) 安装所需工具。*\n- Cambricon Neuware；如果软件包是通过 `-DUSE_MLU` 标志构建的。默认情况下，Mooncake 会在 `NEUWARE_HOME` 或 `\u002Fusr\u002Flocal\u002Fneuware` 路径下查找 Neuware。\n\n### 使用 Python 包\n使用 Mooncake Transfer Engine 最简单的方式是通过 `pip`：\n\n**对于启用了 CUDA 的系统：**\n\n- CUDA \u003C 13.0\n```bash\npip install mooncake-transfer-engine\n```\n- CUDA ≥ 13.0\n```bash\npip install mooncake-transfer-engine-cuda13\n```\n\n**对于非 CUDA 系统：**\n```bash\npip install mooncake-transfer-engine-non-cuda\n```\n\n> [!重要提示]\n> - 含有 CUDA 的版本（`mooncake-transfer-engine`）包含 Mooncake-EP 和 GPU 拓扑检测功能，因此需要 CUDA 12.1 及以上版本。\n> - 非 CUDA 版本（`mooncake-transfer-engine-non-cuda`）适用于没有 CUDA 依赖的环境。\n> - 目前 MLU 支持仅可通过源码构建实现（需指定 `-DUSE_MLU=ON`），暂无专门的预编译 MLU wheel 文件。\n> - 如果用户遇到诸如缺少 `lib*.so` 文件等问题，请先卸载已安装的包，然后手动重新编译二进制文件。\n\n### 使用 Docker 镜像\nMooncake 支持基于 Docker 的部署，详细信息请参阅[构建指南](https:\u002F\u002Fkvcache-ai.github.io\u002FMooncake\u002Fgetting_started\u002Fbuild.html)。\n\n若要构建一个从源码编译 Mooncake、通过 `scripts\u002Fbuild_wheel.sh` 生成 wheel 文件，并将该 wheel 安装到容器内的镜像，可以使用 `build-wheel.dockerfile`：\n\n```bash\ndocker build -f docker\u002Fmooncake.Dockerfile \\\n  --build-arg PYTHON_VERSION=3.10 \\\n  --build-arg EP_TORCH_VERSIONS=\"2.9.1\" \\\n  -t mooncake:from-source .\n```\n\n生成的镜像已在 `\u002Fopt\u002Fvenv` 路径下创建了虚拟环境，并安装了新构建的 wheel 文件。根据需要以 GPU 或 RDMA 访问权限启动容器，例如：\n\n```bash\ndocker run --gpus all --network host -it mooncake:from-source \u002Fbin\u002Fbash\n```\n\n> [!注意]\n> 请确保从仓库根目录构建镜像，以便 Git 元数据和子模块能够在构建上下文中被正确读取。\n\n### 构建和使用二进制文件\n以下是构建 Mooncake 的额外依赖项：\n- 构建必备工具，包括 gcc、g++（9.4+）和 cmake（3.16+）。\n- Go 1.20+，如果你希望使用 `-DWITH_P2P_STORE`、`-DUSE_ETCD`（默认启用以使用 etcd 作为元数据服务器）或 `-DSTORE_USE_ETCD`（将 etcd 用于存储主节点的故障转移）进行构建。\n- CUDA 12.1 及以上版本，包含 NVIDIA GPUDirect Storage 支持，如果软件包是使用 `-DUSE_CUDA` 构建的。*这并未包含在 `dependencies.sh` 脚本中。你可以从 
[这里](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-downloads) 安装。*\n- Cambricon Neuware，如果你希望使用 `-DUSE_MLU` 进行构建。*这并未包含在 `dependencies.sh` 脚本中。* Mooncake 默认会从 `NEUWARE_HOME` 或 `\u002Fusr\u002Flocal\u002Fneuware` 中解析，同时也支持在 CMake 配置时覆盖 `MLU_INCLUDE_DIR` \u002F `MLU_LIB_DIR`。\n- [可选] Rust 工具链，如果你希望使用 `-DWITH_RUST_EXAMPLE` 进行构建。*这并未包含在 `dependencies.sh` 脚本中。*\n- [可选] `hiredis`，如果你希望使用 `-DUSE_REDIS` 将 Redis 作为元数据服务器，而非 etcd。\n- [可选] `curl`，如果你希望使用 `-DUSE_HTTP` 将 HTTP 作为元数据服务器，而非 etcd。\n\n构建和安装步骤如下：\n1. 从 GitHub 仓库获取源代码\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake.git\n   cd Mooncake\n   ```\n\n2. 安装依赖项\n   ```bash\n   bash dependencies.sh\n   ```\n\n3. 编译 Mooncake 和示例程序\n   ```bash\n   mkdir build\n   cd build\n   cmake ..\n   make -j\n   sudo make install # 可选，使系统准备好被 vLLM\u002FSGLang 使用\n   ```\n\n对于 Cambricon MLU 的构建，请在配置 CMake 时添加 `-DUSE_MLU=ON`。例如：\n```bash\nmkdir build\ncd build\ncmake .. -DUSE_MLU=ON -DNEUWARE_ROOT=\u002Fusr\u002Flocal\u002Fneuware\nmake -j\n```\n\n\n\u003Ch2 id=\"milestones\"> 🛣️ 即将到来的里程碑\u003C\u002Fh2>\n\n- [x] Mooncake 首次发布，并与最新的 vLLM 集成\n- [ ] 在多个推理引擎之间共享 KV 缓存\n- [ ] 用户和开发者文档\n\n\u003Ch2 id=\"trace\">📦 开源追踪数据\u003C\u002Fh2>\n\n```json\n{\n    \"timestamp\": 27482,\n    \"input_length\": 6955,\n    \"output_length\": 52,\n    \"hash_ids\": [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 2353, 2354]\n}\n{\n    \"timestamp\": 30535,\n    \"input_length\": 6472,\n    \"output_length\": 26,\n    \"hash_ids\": [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 2366]\n}\n```\n以上展示了我们追踪数据集中的两个样本。该追踪数据包含了请求到达的时间、输入 token 的数量、输出 token 的数量以及重新映射后的块哈希值。为了保护客户的隐私，我们在保留数据集用于模拟评估价值的同时，应用了多种机制来移除与用户相关的信息。关于追踪数据的更多描述（例如高达 50% 的缓存命中率），请参阅技术报告的第 4 节。\n\n**_更新[2025年2月21日]：我们在 FAST'25 论文中使用的更新版 [追踪数据](FAST25-release\u002Ftraces) 已经发布！更多信息请参阅论文附录（见 [这里](FAST25-release\u002FMooncake-FAST25.pdf)）。_**\n\n\u003Ch2 id=\"citation\">📑 引用\u003C\u002Fh2>\n如果你认为我们的论文或追踪数据有用，请引用：\n\n```bibtex\n@article{qin2025mooncake_tos,\n  
author    = {Qin Ruoyu and Li Zheming and He Weiran and Cui Jialei and Tang Heyi and Ren Feng and Ma Teng and Cai Shangming and Zhang Yineng and Zhang Mingxing and Wu Yongwei and Zheng Weimin and Xu Xinran},\n  title     = {Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving},\n  year      = {2025},\n  publisher = {Association for Computing Machinery},\n  address   = {New York, NY, USA},\n  issn      = {1553-3077},\n  url       = {https:\u002F\u002Fdoi.org\u002F10.1145\u002F3773772},\n  doi       = {10.1145\u002F3773772},\n  journal   = {ACM Trans. Storage},\n  month     = {nov},\n  keywords  = {Machine learning system, LLM serving, KVCache},\n}\n\n@inproceedings{qin2025mooncake,\n  author    = {Ruoyu Qin and Zheming Li and Weiran He and Jialei Cui and Feng Ren and Mingxing Zhang and Yongwei Wu and Weimin Zheng and Xinran Xu},\n  title     = {Mooncake: Trading More Storage for Less Computation {\\textemdash} A {KVCache-centric} Architecture for Serving {LLM} Chatbot},\n  booktitle = {23rd USENIX Conference on File and Storage Technologies (FAST 25)},\n  year      = {2025},\n  isbn      = {978-1-939133-45-8},\n  address   = {Santa Clara, CA},\n  pages     = {155--170},\n  url       = {https:\u002F\u002Fwww.usenix.org\u002Fconference\u002Ffast25\u002Fpresentation\u002Fqin},\n  publisher = {USENIX Association},\n  month     = {feb},\n}\n\n@article{qin2024mooncake_arxiv,\n  title  = {Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving},\n  author = {Ruoyu Qin and Zheming Li and Weiran He and Mingxing Zhang and Yongwei Wu and Weimin Zheng and Xinran Xu},\n  year   = {2024},\n  url    = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.00079},\n}\n```","# Mooncake 快速上手指南\n\nMooncake 是一个以 KVCache 为核心的解耦架构大模型服务框架，由月之暗面（Moonshot AI）开源，也是 Kimi 智能助手的服务底座。它通过高效的传输引擎（Transfer Engine）和分布式存储（Mooncake Store），实现了预填充（Prefill）与解码（Decode）阶段的解耦，显著提升长上下文场景下的吞吐量。\n\n## 环境准备\n\n在开始之前，请确保您的开发环境满足以下要求：\n\n*   **操作系统**：推荐 Linux (Ubuntu 
20.04\u002F22.04)。\n*   **Python 版本**：3.8 - 3.12。\n*   **GPU 驱动**：已安装兼容的 NVIDIA 驱动。\n*   **CUDA 版本**：\n    *   标准版：支持 CUDA \u003C= 12.9\n    *   新版：支持 CUDA 13.0 \u002F 13.1 (需安装特定包)\n*   **网络硬件（可选但推荐）**：若需高性能跨节点传输，建议配置 RDMA (InfiniBand\u002FRoCE) 环境；若无 RDMA，TE 引擎也支持 TCP 协议。\n\n## 安装步骤\n\n### 1. 安装 Transfer Engine (核心组件)\n\n根据您当前的 CUDA 版本选择对应的安装命令。推荐使用国内镜像源加速下载。\n\n**对于 CUDA 12.x 及以下版本：**\n```bash\npip install mooncake-transfer-engine -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n**对于 CUDA 13.0 \u002F 13.1 版本：**\n```bash\npip install mooncake-transfer-engine-cuda13 -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n### 2. 安装 Mooncake Store (分布式 KVCache 存储)\n\nMooncake Store 构建在 Transfer Engine 之上，用于实现分布式 KVCache 池。\n\n```bash\npip install mooncake-store -i https:\u002F\u002Fpypi.tuna.tsinghua.edu.cn\u002Fsimple\n```\n\n> **注意**：如果您计划将 Mooncake 与 vLLM、SGLang 或 LMDeploy 等推理框架集成，通常只需安装上述核心库，推理框架会通过插件机制自动调用。具体集成请参考各框架官方文档中的 Mooncake 章节。\n\n## 基本使用\n\nMooncake 的核心功能是提供低延迟的数据传输和分布式缓存。以下是一个使用 Python 初始化 Transfer Engine 并进行简单数据传输的最小化示例。\n\n### 示例：初始化引擎与数据传输\n\n此示例演示如何启动一个本地传输实例，并模拟内存块的注册与传输准备。\n\n```python\nimport mooncake_transfer_engine as te\n\n# 1. 初始化传输引擎\n# 指定本地元数据服务器地址（单机测试可用本地地址）\nengine = te.TransferEngine(\n    metadata_server=\"localhost:12345\",\n    local_hostname=\"localhost\",\n    protocol=\"tcp\"  # 若有 RDMA 环境，可改为 \"rdma\"\n)\n\n# 2. 分配并注册内存缓冲区\n# 在实际 LLM 服务中，这里通常是 KVCache 所在的显存\nbuffer_size = 1024 * 1024 * 100  # 100MB\nbuffer = engine.allocate_buffer(buffer_size)\n\nprint(f\"Buffer allocated at address: {hex(buffer.address)}\")\n\n# 3. 注册内存段以便远程访问\nsegment_id = engine.register_segment(buffer.address, buffer_size)\nprint(f\"Segment registered with ID: {segment_id}\")\n\n# 4. (逻辑示意) 发起传输任务\n# 实际生产中，调度器会根据请求协调不同节点间的 KVCache 拉取\n# 此处仅展示 API 调用形式，具体 target 需为集群内其他节点信息\n# task_id = engine.submit_task(...) 
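\n\nprint(\"Mooncake Transfer Engine initialized successfully.\")\n```\n\n### Integrating with inference frameworks\n\nMooncake is most often used as a backend plugin for mainstream inference frameworks:\n\n*   **vLLM**: pass `--kv-transfer-config` with `kv_connector` set to `MooncakeConnector` and set the related environment variables to enable prefill\u002Fdecode (PD) disaggregated inference.\n*   **SGLang**: Mooncake Store can serve as the storage backend of the hierarchical KV cache (HiCache); enable it at launch with `--enable-hierarchical-cache --hicache-storage-backend mooncake`.\n\nFor advanced configuration and cluster deployment, see the official Mooncake documentation or the integration guide of the framework you use.","A large multimodal content platform faces massive concurrent user requests at peak hours and must handle long-text conversations and high-resolution image understanding at the same time, which leaves the inference cluster unevenly loaded and drives response latency up sharply.\n\n### Without Mooncake\n- **Wasted GPU memory**: each inference instance maintains its own KVCache, so identical prefix prompts (such as system instructions or long documents) are recomputed and stored on different nodes, which sharply lowers GPU memory utilization.\n- **Cross-node communication bottleneck**: a multi-machine deployment lacks an efficient state-migration mechanism, so scheduling a request across GPUs incurs high transfer latency and time to first token (TTFT) fluctuates widely.\n- **Difficult elastic scaling**: inference state is tightly bound to compute nodes, so in-flight sessions are hard to migrate smoothly when instances are added or removed, which often causes service interruptions or expensive full recomputation.\n- **Redundant multimodal processing**: for video or multi-image input, embeddings produced by the vision encoder (ViT) cannot be shared within the cluster, so the same visual features are extracted repeatedly and compute is wasted.\n\n### With Mooncake\n- **Global cache sharing**: Mooncake's disaggregated architecture pools KVCache globally, so an identical prefix is computed once and reused by every instance in the cluster, which can improve GPU memory efficiency several-fold.\n- **Fast state migration**: with the Mooncake Transfer Engine, inference state can be moved between devices or machines within milliseconds without loss, which markedly lowers TTFT and cuts long-tail latency.\n- **Transparent elastic scaling**: separating compute from storage lets session state move freely, so a newly added node can take over existing requests immediately and scaling proceeds without interruption.\n- **Cross-instance visual reuse**: with a global multimodal embedding cache, features extracted by ViT are shared directly between inference tasks, which avoids repeated computation for the same visual input.\n\nWith its KVCache-centric disaggregated architecture, Mooncake moves LLM serving from single-node computation to cluster-wide cooperation, keeping latency low while substantially reducing compute cost.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fkvcache-ai_Mooncake_36dee92b.png","kvcache-ai","kvcache.ai","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fkvcache-ai_1d3b634d.png","KVCache.AI is a joint research project between MADSys and top industry collaborators, focusing on efficient LLM 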
serving.",null,"zhang_mingxing@mail.tsinghua.edu.cn","https:\u002F\u002Fmadsys.cs.tsinghua.edu.cn\u002F","https:\u002F\u002Fgithub.com\u002Fkvcache-ai",[81,85,89,93,97,101,105,109,113],{"name":82,"color":83,"percentage":84},"C++","#f34b7d",83.4,{"name":86,"color":87,"percentage":88},"Python","#3572A5",9.3,{"name":90,"color":91,"percentage":92},"C","#555555",1.8,{"name":94,"color":95,"percentage":96},"Shell","#89e051",1.5,{"name":98,"color":99,"percentage":100},"CMake","#DA3434",1.3,{"name":102,"color":103,"percentage":104},"Cuda","#3A4E3A",1.1,{"name":106,"color":107,"percentage":108},"Go","#00ADD8",1,{"name":110,"color":111,"percentage":112},"Rust","#dea584",0.6,{"name":114,"color":115,"percentage":116},"Dockerfile","#384d54",0.1,5120,682,"2026-04-18T01:35:25","Apache-2.0",4,"Linux","Requires an NVIDIA GPU (RDMA-capable); no specific model is stated, but production use on H100\u002FH200 is mentioned; supports CUDA \u003C=12.9 and CUDA 13.0\u002F13.1","Not specified",{"notes":126,"python":127,"dependencies":128},"The core of this tool is the Transfer Engine, which is designed for large-scale LLM serving and supports high-performance data transfer over RDMA, TCP, CXL, and other protocols. It is mainly used in architectures that disaggregate prefill and decode. It is integrated into mainstream inference frameworks such as vLLM, SGLang, and TensorRT-LLM. Reported production cases use large multi-node clusters (for example, 128 H200 GPUs).","Not specified (the PyPI badge indicates that specific Python versions are supported but does not list the numbers)",[129,130,131,132,133],"mooncake-transfer-engine","vLLM (optional integration)","SGLang (optional integration)","TensorRT-LLM (optional integration)","LMDeploy (optional integration)",[35,14],[136,137,138,139,140,141,142],"inference","kvcache","llm","rdma","sglang","vllm","disaggregation","2026-03-27T02:49:30.150509","2026-04-18T22:35:23.891326",[146,151,156,161,166,170],{"id":147,"question_zh":148,"answer_zh":149,"source_url":150},40542,"How do I resolve a 'Mooncake memory registration failed' or 'Bad address' error?","This usually happens because the `nvidia_peermem` kernel module is missing, so RDMA cannot map GPU memory addresses correctly. Take the following steps:\n1. Check whether the module is loaded: run `lsmod | grep nvidia_peermem`.\n2. 
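If it is not loaded, load it manually: run `sudo modprobe nvidia_peermem`.\n\nAlso make sure that the kernel sources\u002Fheaders are installed for your Linux kernel and that the NVIDIA driver was installed with its kernel modules (avoid the `--no-kernel-module` option), so that `gdrcopy` and memory registration work correctly.","https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fissues\u002F351",{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},40543,"How do I correctly build and import `mooncake.mooncake_ep_buffer` or `mooncake.ep`?","If the version installed with `pip install` fails with 'device kernel image is invalid' or cannot be imported, build from source and replace the files in your environment:\n1. Clone the repository and build Mooncake from source.\n2. Locate the build output (usually in the `mooncake-wheel\u002Fmooncake` directory).\n3. Copy that directory into the `site-packages` folder of your Python virtual environment, overwriting the existing `mooncake` directory.\nThis ensures that you use a build that matches your current CUDA environment.","https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fissues\u002F936",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},40544,"What causes poor performance or errors when transferring KV cache over TCP?","Currently, transferring CUDA device memory over TCP requires the data to be copied from the GPU to CPU memory before it is sent over the network, which adds significant overhead and can cause latency problems. This is a limitation of the current architecture; a more efficient direct device-memory path is planned, but the timeline depends on development progress. For high-performance scenarios, configure and use RDMA where possible.","https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fissues\u002F960",{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},40545,"What should I do about 'local access violation work queue error' when running vLLM + Mooncake on a single node with multiple GPUs?","This error is usually related to a particular private vLLM build, the model configuration, or contention over RDMA contexts. Troubleshooting steps:\n1. Confirm whether it reproduces on the official, unmodified vLLM release; some privately modified versions may have compatibility problems.\n2. Check the `mooncake.json` configuration and make sure that `device_name` (for example `mlx5_0`) matches the actual NIC name.\n3. Try adjusting the memory allocation size of each worker so that it stays within the limit.\n4. If it reproduces only with long inputs (for example 3000+ tokens), a specific boundary condition may have been hit; upgrade to the latest stable release or contact the maintainers with detailed logs.","https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fissues\u002F277",{"id":167,"question_zh":168,"answer_zh":169,"source_url":160},40546,"How do I configure MooncakeConnector in vLLM to separate prefill and decode (PD separation)?","You need to start two vLLM instances (a producer and a consumer) together with a proxy server. A sample configuration:\n\n1. **Configuration file (mooncake.json)**:\n```json\n{\n  \"prefill_url\": \"127.0.0.1:13003\",\n  \"decode_url\": \"127.0.0.1:13103\",\n  \"metadata_server\": \"127.0.0.1:2379\",\n  \"protocol\": \"tcp\",\n  \"device_name\": \"\"\n}\n```\n\n2. 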
**Start the producer (P instance)**:\nSet the `MOONCAKE_CONFIG_PATH` environment variable and add the following to the launch command:\n`--kv-transfer-config '{\"kv_connector\":\"MooncakeConnector\",\"kv_role\":\"kv_producer\", \"kv_connector_module_path\":\"mooncake.mooncake_connector_v1\"}'`\n\n3. **Start the consumer (D instance)**:\nSame as the P instance, but with the role set to `kv_consumer`.\n\n4. **Start the proxy**:\nRun `python3 -m mooncake.vllm_v1_proxy_server` and specify the address and port of the prefill and decode instances.",{"id":171,"question_zh":172,"answer_zh":173,"source_url":150},40547,"Why does memory address resolution fail when using PyTorch with RDMA?","This may be related to PyTorch's memory allocation strategy. By default, PyTorch may not enable `expandable_segments`, which can leave memory fragmented or addresses non-contiguous, so that `ibv_reg_mr()` cannot resolve the address.\nSolutions:\n1. Set the environment variable `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`.\n2. Make sure you use a PyTorch version that includes the relevant compatibility fix (such as PR #153261), which makes PyTorch work better with RDMA memory registration.",[175,180,185,190,195,200,205,210,215,220,225,230,235,240,245,250,255,260,265,270],{"id":176,"version":177,"summary_zh":178,"released_at":179},323994,"v0.3.10.post1","## What's Changed\n* Fix\u002FTent batch transfer merge boundary by @Primary33 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1704\n* docs: add TorchSpec to the Mooncake README by @zhyncs in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1709\n* [Store] Add an eviction policy with batched master notification to BucketStorageBackend by @zhangzuo21 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1646\n* Support duration units for master TTL flags by @Primary33 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1684\n* Storage backend end-to-end tests by @maheshrbapatu in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1660\n* [CI] Add a Hixl RoCE example on the Ascend platform by @VNightMare in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1697\n* Remove non-portable GCC internal headers from Transfer Engine by @Copilot in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1716\n* [STORE] Introduce a high-availability backend abstraction by @YiXR in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1678\n* [TENT] fix: avoid resetting RDMA endpoints during repeated concurrent bootstrap by @00fish0 in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1705\n* [Bugfix] Fix the tent_metrics build error when TENT_METRICS_ENABLED=ON by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1712\n* [PG] Fix group size extension by @caozhanhao in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1706\n* build: add memory-aware compile\u002Flink parallelism by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1718\n* [Skill] feat: add a troubleshooting skill by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1724\n* [EP] Make num_ranks more flexible by @ympcMark in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1725\n* [Misc] Improve the developer experience for EP and PG by @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1708\n* [TE] Refactor Ascend direct transport and adapt it to the Store's dummy-real mode by @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1720\n* [STORE] Add a Redis leader backend and high-availability regression coverage by @YiXR in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1722\n* Store: split client HA\u002Fcontrol-plane threads and suppress zero-segment he… by @XucSh in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1736\n* [Transfer Engine] Add an initial MACA build path and CUDA-like adapters by @Dayuxiaoshui in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1731\n* Add required libraries at build time by @xleoken in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1674\n* [PG] Optimize the p2p proxy buffer size by @JunlinW113 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1735\n* [Store] Add a hard-pin mechanism for eviction-protected objects by @he-yufeng in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1728\n* [PG] Raise kP2PBufferSize to unlock full performance by @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1740\n* Fix\u002FTent Store metadata overwrite by @Primary33 in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1743\n* [bug fix] NVMeoFTransport::submitTransferTask parameter mismatch by @yz53665 in htt","2026-04-01T03:50:21",{"id":181,"version":182,"summary_zh":183,"released_at":184},323995,"v0.3.10","## What's Changed\n* [Store] Wrap the basic etcd interfaces for master-service high availability by @Libotry in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1451\n* [Build] Update the package name and keywords for the CUDA 13 build by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1506\n* [EP] Improve debug information by @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1505\n* [Misc] Fix the wheel build script by @Ann-1024 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1504\n* [Store][Feature] Add client-side copy and move support by @zhongzhouTan-coder in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1364\n* [EP] Fix a regression that occurs when IBGDA is disabled by @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1514\n* Add the ROLL collaboration announcement to the updates section of the README by @Copilot in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1517\n* [TE] Add an AWS EFA transport based on libfabric by @whn09 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1509\n* [TENT] fix: reduce unnecessary bandwidth consumption in TCP recvData operations by @00fish0 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1513\n* [Config][1\u002Fn] Add a global configuration for all environment variables by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1512\n* [docs] Add the conductor indexer API design document by @yejj710 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1416\n* [Doc] Document the missing USE_MNNVL build option by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1525\n* [TE] New feature: hixl reports error information when an interface call fails by @A-Liuhao in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1524\n* [TE] Support TCP fallback in EFA builds and improve the EFA documentation by @whn09 in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1523\n* [Store] Optimize BucketStorageBackend to reduce lock contention and make deletion safer by @maheshrbapatu in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1456\n* Add a local caching mechanism to the Mooncake Store client by @Shichang-Zhang in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1226\n* [CI] Add sglang epd test cases by @luketong777 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1528\n* Add the efa protocol to the mooncake store client by @snadampal in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1526\n* Implement C++ methods for P2P connections by @donghun-furiosa in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1539\n* [TENT] Improve tebench: GPU selection, graceful interruption, and build fixes by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1537\n* [TE] Update the ascend direct transport documentation and fix an asynchronous transfer disconnect bug by @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1534\n* [Bug] Add a CI switch and free-space code by @JasonZhang517 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1540\n* [Store] Introduce a free-ratio-first allocation strategy to improve convergence by @00fish0 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1511\n* Update the README with recent project updates by @zhyncs in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1541\n* ","2026-03-19T14:40:24",{"id":186,"version":187,"summary_zh":188,"released_at":189},323996,"v0.3.9","## What's Changed\n* [Store] fix: switch the offset allocator's node storage to a vector by @yuechen-sys in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1286\n* [Store] New feature: enable Ascend Fabric memory mode in Mooncake Store by @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1170\n* [Doc] Fix image paths in the transfer engine documentation by @00fish0 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1295\n* Update the yalantinglibs dependency to download the archive directly by @Copilot in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1282\n* [store] Support batch publishing and transaction awareness for pub_tensor by @zxpdemonio in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1288\n* docs: add dummy client support for the SGLang hicache integration by @YiXR in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1299\n* Add YiXR as a code owner of mooncake-store by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1301\n* docs: fix an incorrect environment variable configuration by @YiXR in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1302\n* [store] Add error checking to get_tensor_into by @zxpdemonio in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1272\n* [Store] Add HugePage support by @YiXR in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1300\n* [TENT] Add backward compatibility with TE by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1277\n* [Store]: Add an OffsetAllocator disk backend with lock-striped metadata and refcounted extents by @maheshrbapatu in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1284\n* [EP] Use NVLink in EP whenever possible by @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1308\n* [EP] Replace _mm_pause() with the PAUSE() macro by @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1313\n* [TE] Add task completion latency tracking and detailed metrics reporting by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1310\n* New feature: add code coverage support to CI by @LiYiMing-lg in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1316\n* docs: add Docker deployment instructions to the Chinese build guide by @LiYiMing-lg in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1318\n* [TE] Add an early memory backend detection method to NVLINK_allocator by @TTThanos in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1296\n* [TE] New feature: add an asynchronous transfer task limit to Ascend direct transport by @ascend-direct-dev in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1325\n* [TENT] Fix a bug in memory registration and add TENT support to the examples by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1330\n* [Store][Feature]: Add a task management component by @zhongzhouTan-coder in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1326\n* [Store] - Implement partial success handling in BatchOffload by @maheshrbapatu in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1319\n* [TE] Support the PAUSE() instruction on ARM by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1340\n* [CI] Add a CUDA13 wheel release workflow by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1331\n* [TENT] Add Redis authentication and database selection support","2026-02-05T10:50:06",{"id":191,"version":192,"summary_zh":193,"released_at":194},323997,"v0.3.8.post1","## What's Changed\n* [Store] fix: switch the offset allocator's node storage to a vector by @yuechen-sys in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1286\n* [Store] feat: enable Ascend Fabric memory mode in Mooncake Store by @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1170\n* [Doc] Fix image paths in the transfer engine documentation by @00fish0 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1295\n* Update the yalantinglibs dependency to download the archive directly by @Copilot in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1282\n* [store] Support batch publishing and transaction awareness for pub_tensor by @zxpdemonio in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1288\n* docs: add dummy client support for the SGLang hicache integration by @YiXR in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1299\n* Add YiXR as a code owner of mooncake-store by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1301\n* docs: fix an incorrect environment variable configuration by @YiXR in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1302\n* [store] Add error checking to get_tensor_into by @zxpdemonio in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1272\n* [Store] Add HugePage support by @YiXR in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1300\n* [TENT] Add backward compatibility with TE by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1277\n* [Store]: Add an OffsetAllocator disk backend with lock-striped metadata and refcounted extents by @maheshrbapatu in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1284\n* [EP]: Use NVLink in EP whenever possible by @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1308\n* [EP]: Replace _mm_pause() with the PAUSE() macro by @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1313\n* [TE]: Add task completion latency tracking and detailed metrics reporting by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1310\n* feat: add code coverage support to CI by @LiYiMing-lg in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1316\n* docs: add Docker deployment instructions to the Chinese build guide by @LiYiMing-lg in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1318\n* [TE]: Add an early memory backend detection method to NVLINK_allocator by @TTThanos in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1296\n* [TE] feat: add an asynchronous transfer task limit to Ascend direct transport by @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1325\n* [TENT]: Fix a bug in memory registration and add TENT support to the examples by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1330\n* [Store][Feature]: Add a task management component by @zhongzhouTan-coder in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1326\n* [Store]: Implement partial success handling in BatchOffload by @maheshrbapatu in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1319\n* [TE]: Support PAUSE() on ARM by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1340\n* [CI]: Add a CUDA13 wheel release workflow by @stmatengss in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1331\n* [TENT]: Add Redis authentication and database selection support","2026-01-09T05:32:05",{"id":196,"version":197,"summary_zh":198,"released_at":199},323998,"v0.3.8","## What's Changed\n* adxl: @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F963: fix the aclrtMemcpyBatch max 4096 limit bug\n* ci: @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F969: add a release workflow for the non-CUDA build and update the docs\n* fix(transfer_engine): @iBenzene in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F966: add notify callback registration to RPC metadata handling\n* [Misc] (Mooncake backend): @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F980: exit early if ping messages show that a rank has failed\n* @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F985: revise the installation instructions for the non-CUDA build of Mooncake\n* [Store] feat: @zhuxinjie-nz in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F968: store key-value data in buckets\n* Support AMDGPU (refactor CUDA-like interfaces) by @yeahdongcn in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F973\n* @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F996: add logging\n* feat: support custom key prefixes for issue 957 by @uniqueni in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F958\n* refactor(store): @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F994: remove Store's use of Transfer Engine internal APIs\n* [TransferEngine]: mitigate the performance overhead of large clusters and large batch transfers by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F999\n* Bump the version in pyproject.toml to 0.3.7.post1 by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F984\n* [store] feat: @yejj710 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F976: add secondary storage usage monitoring\n* fix(ci): @xiaguan in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1003: remove the nvlink allocator --ci-build flag\n* [DOC] Update diagrams by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F987\n* Modify the nvlink_allocator build command by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1001\n* fix(ci): @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1004: add the id-token permission and unify the token used for PyPI publishing\n* [Misc]: Lazily import `ep` in `mooncake_ep_buffer.py` by @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1014\n* Bump the version in pyproject.toml to 0.3.7.post2 by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1015\n* [Store] Fix a CI bug, improve log output, and refactor TE initialization by @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1006\n* [CI] Fix a CI bug in PyClientTest:TestSetupExistTransferEngine by @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1016\n* [Misc]: Remove EP's duplicate implementation of `getCudaTopologyJson` by @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1009\n* [Store]: Clean up the processing objects if a transfer times out (#975) by @nickyc975 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F993\n* [Store] Support segment-level metrics (fix the code format of #1029) by @cocktail828 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1030\n* [Store] Enable","2025-12-26T09:25:46",{"id":201,"version":202,"summary_zh":203,"released_at":204},323999,"v0.3.7.post2","## What's Changed\n* [Misc] Lazily import `ep` in `mooncake_ep_buffer.py` by @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1014\n* Bump the version in `pyproject.toml` to 0.3.7.post2 by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1015\n\n\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fcompare\u002Fv0.3.7.post1...v0.3.7.post2","2025-11-04T04:40:58",{"id":206,"version":207,"summary_zh":208,"released_at":209},324000,"v0.3.7.post1","## What's Changed\n* adxl: @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F963: fix the aclrtMemcpyBatch max 4096 limit bug\n* ci: @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F969: add a release workflow for the non-CUDA build and update the docs\n* fix(transfer_engine): @iBenzene in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F966: add notify callback registration to RPC metadata handling\n* [Misc] (Mooncake backend): exit early if ping messages show that a rank has failed by @UNIDY2002 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F980\n* Revise the installation instructions for the non-CUDA build of Mooncake by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F985\n* [Store] feat: store key-value data in buckets by @zhuxinjie-nz in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F968\n* Support AMDGPU (refactor CUDA-like interfaces) by @yeahdongcn in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F973\n* Add logging by @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F996\n* feat: support custom key prefixes for issue 957 by @uniqueni in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F958\n* refactor(store): remove Store's use of Transfer Engine internal APIs by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F994\n* [TransferEngine] Mitigate the performance overhead of large clusters and large batch operations by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F999\n* Bump the version in pyproject.toml to 0.3.7.post1 by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F984\n* [store] feat: add secondary storage usage monitoring by @yejj710 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F976\n* fix(ci): remove the nvlink allocator --ci-build flag by @xiaguan in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1003\n* [DOC] Update diagrams by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F987\n* Modify the nvlink_allocator build command by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1001\n* fix(ci): add the id-token permission and unify the PyPI token used for publishing by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F1004\n\n## New Contributors\n* @iBenzene made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F966\n* @zhuxinjie-nz made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F968\n* @yeahdongcn made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F973\n* @yejj710 made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F976\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fcompare\u002Fv0.3.7...v0.3.7.post1","2025-11-03T08:04:23",{"id":211,"version":212,"summary_zh":213,"released_at":214},324001,"v0.3.7","## What's Changed\n* [Store] @XucSh in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F812: skip empty buffers\n* [Store] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F820: change the default values of eviction_high_watermark_ratio and eviction_ratio\n* [CI\u002FBuild] @peng1999 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F821: put the mooncake-store tests behind the BUILD_UNIT_TESTS option\n* [Doc] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F829: integrate Store into SGLang HiCache\n* [TransferEngine]: @doujiang24 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F830: remove SO_REUSEADDR from findAvailableTcpPort\n* [Build] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F836: install Python files\n* [TransferEngine] @hjchen2 in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F827: make Ascend TE post successfully and support fast recovery from failures through retries\n* [Build] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F839: patch for installing Python files\n* [Doc] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F833: SGLang HiCache integration\n* [Docs] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F841: fix broken tracking links\n* feat(store): @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F823: add NUMA node binding support through the bind_to_numa_node method\n* docs(deployment): @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F825: add a basic Mooncake Store deployment guide\n* refactor(store): @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F840: handle signals on a dedicated thread\n* @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F835: add the ascend protocol to mooncake store\n* store: @201341 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F843: add a JSON file and improve the docs\n* feat(store): @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F845: add client heartbeat support for non-HA mode\n* @Zane-Jiang in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F858: fix typos in the issue template\n* Fix the nvlink_transport error: revert #683 by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F869\n* @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F856: fix the adxl TCP port lookup bug\n* chore: @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F870: bump the version in pyproject.toml to 0.3.6.post1\n* [Transfer Engine] @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F831: send a notification after all transfer tasks complete\n* feat(store): @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F852 
(support p2p handshake for the transfer engine)\n* [Chores] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F822: remove unused variables\n* [TransferEngine] @zuochunwei in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F859: improve heterogeneous Ascend performance through smart aggregation and a pipelined design\n* [Misc] feat: @dtcccc in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F865: support an external kv_connector for vllm v1\n* feat(store):","2025-10-25T02:56:12",{"id":216,"version":217,"summary_zh":218,"released_at":219},324002,"v0.3.6.post1","## What's Changed\n* [Store] @XucSh in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F812: skip empty buffers\n* [Store] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F820: change the default values of eviction_high_watermark_ratio and eviction_ratio\n* [CI\u002FBuild] @peng1999 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F821: put the mooncake-store tests behind the BUILD_UNIT_TESTS option\n* [Doc] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F829: integrate Store into SGLang HiCache\n* [TransferEngine]: @doujiang24 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F830: remove SO_REUSEADDR from findAvailableTcpPort\n* [Build] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F836: install Python files\n* [TransferEngine] @hjchen2 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F827: make Ascend TE post successfully and support fast recovery from failures through retries\n* [Build] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F839: patch for installing Python files\n* [Doc] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F833: SGLang HiCache integration\n* [Doc] @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F841: fix broken tracking links\n* feat(store): @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F823: add NUMA node binding support through the bind_to_numa_node method\n* docs(deployment): @xiaguan in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F825: add a basic Mooncake Store deployment guide\n* refactor(store): @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F840: handle signals on a dedicated thread\n* @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F835: add the ascend protocol to mooncake store\n* store: @201341 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F843: add a JSON file and improve the docs\n* feat(store): @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F845: add client heartbeat support for non-HA mode\n* @Zane-Jiang in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F858: fix typos in the issue template\n* Fix the nvlink_transport error: @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F869: revert #683\n* @ascend-direct-dev in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F856: fix the adxl TCP port lookup bug\n* chore: @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F870: bump the version in pyproject.toml to 0.3.6.post1\n\n## New Contributors\n* @peng1999 made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F821\n* @Zane-Jiang made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F858\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fcompare\u002Fv0.3.6...v0.3.6.post1","2025-09-20T03:31:09",{"id":221,"version":222,"summary_zh":223,"released_at":224},324003,"v0.3.6","## What's Changed\n* feat(store): add batch get buffer support by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F671\n* [TransferEngine] Optimization: remove the request_list parameter from submitTransferTask by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F565\n* feat(transfer_engine_bench): add multi-GPU support by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F675\n* [CI\u002FBuild] 
Mooncake-common\u002Fcommon.cmake：添加 pthread 链接标志，由 @weinanliu 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F681 中完成\n* [DOC] 修复 mooncake-store-preview.md 中的问题，由 @SgtPepperr 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F685 中完成\n* [Store] 指标：添加响应结构体，由 @stmatengss 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F686 中实现\n* [Store] 修复：添加客户端列表指标，由 @stmatengss 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F693 中完成\n* 重构（offset-allocator）：添加内存分配指标跟踪，由 @xiaguan 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F687 中实现\n* [TransferEngine] 修复构建问题并适配最新的 Mooncake 变更，由 @AscendTransport 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F684 中完成\n* 文档：添加 RDMA 内存注册故障排除指南，由 @xiaguan 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F694 中完成\n* 添加在 AMD GPU 上运行的说明，由 @lihaofd 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F689 中完成\n* [BugFix] 解决 RDMA 内存注册大小为零的问题，由 @ykwd 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F695 中完成\n* 重构（store）：将客户端缓冲区实现移至 store 模块，由 @xiaguan 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F700 中完成\n* 重构（MasterClient）：引入通用 RPC 调用辅助函数，由 @xiaguan 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F697 中完成\n* [BugFix] 修复拓扑空检查错误，由 @ykwd 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F696 中完成\n* 错误修复（nvlink）：为 NvlinkTransport 添加显式 P2P 访问启用及错误处理，由 @staryxchen 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F683 中完成\n* [BugFix] 禁止注册大小为零的内存，由 @ykwd 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F701 中完成\n* [TransferEngine] 功能：支持 CXL 共享内存，并提供简单的单元测试，由 @hemist 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F670 中完成\n* 代码格式化及在 CI 中启用代码格式检查，由 @doujiang24 在 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F677 中完成\n* 文档：添加 QP 分配错误的故障排除步骤，由 @staryxchen 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F707 中完成\n* [Store] 增强 Master 指标，由 @ykwd 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F705 中完成\n* [store] 功能：添加 master 配置，由 @201341 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F650 中完成\n* [Store][bind] 添加新的支持数据类型，由 @stmatengss 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F712 中完成\n* [TransferEngine] 功能：添加自动获取 CXL 设备大小的通用方法，由 @StepY1aoZz 在 https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F715 中完成\n* 在 TCP 传输中重新实现 VRAM 缓冲，由 @alogfans 在 http 中完成","2025-09-10T07:46:26",{"id":226,"version":227,"summary_zh":228,"released_at":229},324004,"v0.3.5","## What's Changed\r\n* feat(store): add thread safety analysis with clang annotations by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F538\r\n* feat(master): support rpc server address parameter by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F530\r\n* add notify support by @haobayuxi in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F528\r\n* [TE] revert: fix QP reclaim issues by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F543\r\n* chore: bump version to 0.3.4.post1 in pyproject.toml by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F544\r\n* [TransferEngine] Add Redis password authentication and DB selection via environment variables by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F512\r\n* feat(store): add batch exist support for master by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F542\r\n* [TransferEngine] Fix side effect of wild location 
registration by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F552\r\n* chore: bump version to 0.3.4.post2 in pyproject.toml by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F554\r\n* chore: checkout specific version of yalantinglibs in script by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F555\r\n* [TransferEngine]: fix compilation warning by @201341 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F550\r\n* [TransferEngine] fix segfault when create cq failed by @doujiang24 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F535\r\n* [Integration] feat: expose batch reg API by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F558\r\n* support batch put\u002Fget api in python module by @xinranwang17 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F556\r\n* feat(store): add zero copy batch put and get for python binding by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F551\r\n* [TransferEngine] bugfix: ensure proper socket closure in destructor by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F566\r\n* [TransferEngine] Add support to force MNNVL transport by MC_FORCE_MNNVL by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F572\r\n* [Store] Add Chaos Tests and Fix Bugs by @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F568\r\n* Optimize slice handling to accelerate the large batch transfer operation by @SCDESPERTATE in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F557\r\n* [P2P Store] Add cuda link option when it is installed by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F560\r\n* [DOC] fix: Naming errors 
in Doc transfer-engine-python.md by @SgtPepperr in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F508\r\n* [cmake]fix cmake for centos by @qicosmos in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F573\r\n* [Doc] Add pypi install guide in the build doc by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F574\r\n* [TransferEngine] Enable Huawei Ascend Transport for TransferEngine by @AscendTransport in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F502\r\n* [Misc] Add Issue Template in Github by @scatyf3 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F506\r\n* [DOC] Update API description of mooncake store client by @panli889 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F548\r\n* [DOC] Add Description for High Availability in Store by @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F576\r\n* Disable memcpy by default and improve stress workload test by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F577\r\n* [Store] Enable Client SSD Offload And Storage Persistence by @SgtPepperr in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F437\r\n* [TransferEngine] Fix retry logics in RDMA worker by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F417\r\n* refactor: introduce expected pattern for error handling in master service by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F562\r\n* refactor(tests): enhance stress test benchmarking with zero-copy batch by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F586\r\n* [TransferEngine] Enlarge default send\u002Frecv message size in etcd by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F575\r\n* fixed 
initall function by @JasonZhang517 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F591\r\n* [DOC]: Add Description for Data Persistence and KVCache offloading in Store by @SgtPepperr in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F585\r\n* add support for asynchronous batch transfer to accelerate transfer operation by @SCDESPERTATE in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F564\r\n* [Store] Add ungister_buffer api for Store by @SgtPepperr in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F596\r\n* feat(topology): improve HCA selection by considering PCIe distance by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F581\r\n* [Store] Soft Pin for Important Object by @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F587\r\n* docs: add support for LMDeploy by @Risc-lt in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F592\r\n* [doc] Update mooncake-store doc by @LuyuZhang00 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F603\r\n* test(cli","2025-07-25T03:29:58",{"id":231,"version":232,"summary_zh":233,"released_at":234},324005,"v0.3.4.post2","## What's Changed\r\n* [TransferEngine] Add Redis password authentication and DB selection via environment variables by @staryxchen in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F512\r\n* feat(store): add batch exist support for master by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F542\r\n* [TransferEngine] Fix side effect of wild location registration by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F552\r\n* chore: bump version to 0.3.4.post2 in pyproject.toml by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F554\r\n* chore: checkout 
specific version of yalantinglibs in script by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F555\r\n\r\n## New Contributors\r\n* @staryxchen made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F512\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fcompare\u002Fv0.3.4.post1...v0.3.4.post2","2025-06-25T08:23:37",{"id":236,"version":237,"summary_zh":238,"released_at":239},324006,"v0.3.4.post1","## What's Changed\r\n* feat(store): add thread safety analysis with clang annotations by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F538\r\n* feat(master): support rpc server address parameter by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F530\r\n* add notify support by @haobayuxi in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F528\r\n* [TE] revert: fix QP reclaim issues by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F543\r\n* chore: bump version to 0.3.4.post1 in pyproject.toml by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F544\r\n\r\n## New Contributors\r\n* @haobayuxi made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F528\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fcompare\u002Fv0.3.4...v0.3.4.post1","2025-06-23T11:28:29",{"id":241,"version":242,"summary_zh":243,"released_at":244},324007,"v0.3.4","## What's Changed\r\n* chore(ci): disable asan in release workflow by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F493\r\n* chore: bump version to 0.3.3.post1 in pyproject.toml by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F494\r\n* [TransferEngine] Optimize 
custom allocator function name by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F497\r\n* chore: bump version to 0.3.3.post2 in pyproject.toml by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F498\r\n* fix(transfer-task): fix error hanlding logic in transfer task by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F503\r\n* [MooncakeIntegration] Fix find class id by @jellor in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F500\r\n* [Build] add TE bench into wheel package by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F514\r\n* [Build] add nvlink hook into python package dir for local build by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F517\r\n* [Build] Optimize nvlink allocator build logic and fix name issue by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F523\r\n* [Build] Add allocator class to support nvlink for more use-cases by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F524\r\n* [TransferEngine] Change option use_nvlink to use_mnnvl to clarify the usage by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F525\r\n* [Bugfix] Fix missing option and sglang integration doc by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F526\r\n* [Build] Skip etcd go package compilation by default by @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F520\r\n* [Build] Deprecate stale adaptor usage to reduce whl package size by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F529\r\n* [TransferEngine] Disabling auto-delete QP trying to avoid the availabilty problem by @alogfans in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F483\r\n* use kWildcardLocation instead of hardcode \"cpu:0\" to recognize cpu numa node automatically. by @doujiang24 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F527\r\n* [Build] Optimize store build control for wheel and local build by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F531\r\n* add support for batch transfer to accelerate transfer operation by @ssssnow in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F499\r\n* [Store] High Availability V2: Client Failover  by @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F501\r\n* feat(store): add zero-copy operations for python binding by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F532\r\n* chore: bump version to 0.3.4 in pyproject.toml by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F533\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fcompare\u002Fv0.3.3...v0.3.4","2025-06-20T10:58:18",{"id":246,"version":247,"summary_zh":248,"released_at":249},324008,"v0.3.3.post2","## What's Changed\r\n* [TransferEngine] Optimize custom allocator function name by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F497\r\n* fix(transfer-task): fix error hanlding logic in transfer task by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F503\r\n* chore: bump version to 0.3.3.post2 in pyproject.toml by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F498\r\n\r\n\r\n**Full Changelog**: 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fcompare\u002Fv0.3.3.post1...v0.3.3.post2","2025-06-16T16:46:32",{"id":251,"version":252,"summary_zh":253,"released_at":254},324009,"v0.3.3.post1","## What's Changed\r\n* chore(ci): disable asan in release workflow by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F493\r\n* chore: bump version to 0.3.3.post1 in pyproject.toml by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F494\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fcompare\u002Fv0.3.3...v0.3.3.post1","2025-06-15T04:12:02",{"id":256,"version":257,"summary_zh":258,"released_at":259},324010,"v0.3.3","## What's Changed\r\n* Fix(#374)(master): Pass eviction ration flag to the master service by @maobaolong in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F405\r\n* [TransferEngine]: Add a new configuration option of log path by @SCDESPERTATE in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F399\r\n* [TransferEngine] Avoid query segment desc too often by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F404\r\n* chore: bump version to 0.3.2.post1 in pyproject.toml by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F409\r\n* Improve[store-service] Support put 100MB limited value size by @maobaolong in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F407\r\n* [Fix] Store Metrics-related Bugs by @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F408\r\n* leave endpoint status unchanged when delete endpoint reference to avoid endpoint deconstruction before CQ being generated by @jiafuzha in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F384\r\n* add VRAM support in server side for transfer_engine_bench.cpp by 
@yongjianxu in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F413\r\n* make backlog size of handshake listen configurable by @jiafuzha in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F388\r\n* fix: filter out DOWN network interfaces in findLocalIpAddresses() by @phoenixwu0229 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F422\r\n* followup(#374)(master): Add high watermark ratio flag to avoid trigger eviction until put failed by @maobaolong in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F403\r\n* [TransferEngine] Reduce duplicated code by @201341 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F425\r\n* ci: add asan check by @201341 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F423\r\n* Improve the batchEvict logic by @maobaolong in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F432\r\n* [Doc] Update lmcacheV1-deployment.md by @joker-star-l in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F438\r\n* chore: artifact include etcd by default by @qyzhaoxun in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F440\r\n* [Test] add fault tolerant CI by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F418\r\n* [Doc] fix default `slice_size` documentation by @chenhao-ye in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F444\r\n* [MooncakeStore] support batch api by @xinranwang17 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F428\r\n* feat(store): add preferred segment allocation strategy by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F412\r\n* ci: fix sccache by @201341 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F443\r\n* [TransferEngine] Enable NVLink transport 
across multiple processes by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F442\r\n* [MooncakeStore]Provide cache aware interface for scheduler by @zhaoyongke in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F448\r\n* Support cuMem APIs by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F457\r\n* [DOC] Transfer Engine Python API Doc by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F458\r\n* [TransferEngine] Add debug information in NVLink xport register by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F461\r\n* chore: automate build output directory and update scripts by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F460\r\n* [TransferEngine] Fix registration and relocating problem when requested size smaller than physical page size by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F462\r\n* [TransferEngine] Fix compilation bug in NVLink xport by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F463\r\n* [TransferEngine] hotfix bench program by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F466\r\n* feat(ci): enable nvlink hook in build configuration by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F465\r\n* [TransferEngine] Fix minor bugs in NVLink transport and benchmark by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F468\r\n* [Store] High Availability 1: Master Failover by @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F451\r\n* Revert \"[TransferEngine] Fix minor bugs in NVLink transport and benchmark\" by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F469\r\n* 
[TransferEngine] Fix protection problem when multiple GPU devices are used by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F475\r\n* fixed etcd and cmake version issues by @JasonZhang517 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F478\r\n* [TransferEngine] Add IPv6 support by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F479\r\n* feat(client): Abstract client-side data transmission for async and batch optimization by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F455\r\n* [Build] fix build wheel if nvlink is disabled by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F480\r\n* Add operation cost time for mooncake_store_service by @maobaolong in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F471\r\n* Fix NVLink transport error in multi-node scenarios by @fzyzcjy in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F485\r\n* feat(release)","2025-06-14T07:11:06",{"id":261,"version":262,"summary_zh":263,"released_at":264},324011,"v0.3.2.post1","## What's Changed\r\n* Fix(#374)(master): Pass eviction ration flag to the master service by @maobaolong in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F405\r\n* [TransferEngine]: Add a new configuration option of log path by @SCDESPERTATE in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F399\r\n* [TransferEngine] Avoid query segment desc too often by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F404\r\n* chore: bump version to 0.3.2.post1 in pyproject.toml by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F409\r\n\r\n## New Contributors\r\n* @SCDESPERTATE made their first contribution in 
https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F399\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fcompare\u002Fv0.3.2...v0.3.2.post1","2025-05-26T11:51:05",{"id":266,"version":267,"summary_zh":268,"released_at":269},324012,"v0.3.2","## Highlights\r\n* TE supports fault tolerance\r\n* Store: supports eviction and lease\r\n\r\n## What's Changed\r\n* chore(deps): bump golang.org\u002Fx\u002Fnet from 0.36.0 to 0.38.0 in \u002Fmooncake-p2p-store\u002Fsrc\u002Fp2pstore by @dependabot in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F376\r\n* [Build]: exclude cuda so files in auditwheel. by @doujiang24 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F379\r\n* [TransferEngine] Remove unused local variable by @jellor in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F382\r\n* [Doc] Add document for integrating Mooncake Store to LMCache V1 by @XucSh in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F385\r\n* [Store] Add features: lease and eviction by @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F374\r\n* fix bug in async apis in python, the batch_id's dtype is int64_t, not int by @niqi-lyu in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F387\r\n* [Bugfix] Fix PID retrieval when destroying the vLLM thread in the vllm benchmark demo by @0x777a6c in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F389\r\n* [TE FIX] Updating the outstanding work request counting when closing a QP by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F390\r\n* [TransferEngine] Fix hang problem due to previous failed connection by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F392\r\n* Remove duplicated code between transferSync[Read|Write] and transferSync by 
@alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F394\r\n* [FIX] avoid locking for the same spinlock multiple times by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F395\r\n* [Build] do not hard code release build type. by @doujiang24 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F393\r\n* [TransferEngine] add lock for handle_map_ to avoid segfault. by @doujiang24 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F396\r\n* [TransferEngine] Update software-based timeout mechanism by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F398\r\n* [TransferEngine] Revert to disable slice timeout by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F401\r\n* chore: bump version to 0.3.2 in pyproject.toml by @ShangmingCai in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F402\r\n\r\n## New Contributors\r\n* @jellor made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F382\r\n* @niqi-lyu made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F387\r\n* @0x777a6c made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F389\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fcompare\u002Fv0.3.1...v0.3.2","2025-05-25T14:35:46",{"id":271,"version":272,"summary_zh":273,"released_at":274},324013,"v0.3.1","## Highlights\r\n* Performance: Optimized local data transfer via memcpy, Enhanced buffer allocation logic and path selection strategy.\r\n* CI\u002FCD: more build scripts, Docker support for master server and CI testing.\r\n* Observability: Improved error logging, metrics and null pointer checks\r\n* Create mooncake website\r\n* Bug fixes: GCC10 build fixes, 
Dependency and RDMA transport fixes.\r\n\r\n## What's Changed\r\n* docs: add lmcache integration documentation by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F336\r\n* [Store] Improve: Add RemoveAll rpc for remove all keys by @maobaolong in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F327\r\n* [fix]fix compile error for gcc10 by @qicosmos in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F339\r\n* feat(Mooncake Integration): Support pure client without store mode by @maobaolong in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F341\r\n* ARM build_wheel.sh by @johnnynunez in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F344\r\n* [DOC] Add news about NIXL supports Mooncake as a backend by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F348\r\n* Fix news render in README by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F350\r\n* chore: enhance error logging for tcp transport by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F349\r\n* add arm compatibility by @johnnynunez in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F343\r\n* gitmodules: use full path instead of relative path. 
by @doujiang24 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F345\r\n* feat(transfer-engine): enhance logging for RPC and topology discovery by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F352\r\n* chore(ci): expand python version testing matrix to include 3.8-3.13 by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F353\r\n* feat(Mooncake Integration): Supply a MooncakeConfig into whl file by @maobaolong in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F338\r\n* feat(docs): build documentation website for Mooncake using Sphinx by @Risc-lt in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F354\r\n* [Build] feature: start the Mooncake master server through Docker by @Chasing1020 in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F320\r\n* fix wrong command in XpYd by @gujingit in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F357\r\n* fix(master): Fix negative storage size metrics after removeAll by @maobaolong in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F355\r\n* [CI] add dockerfile CI test by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F362\r\n* [FIX] avoid sending request after setting inactive by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F361\r\n* [Transfer Engine] Check the buffer size before register by @XucSh in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F364\r\n* fix: incorrect urls and update page deployment by @Risc-lt in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F363\r\n* [Build] Call find_package() before using external deps by @tchaikov in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F359\r\n* [BugFix] Buffer Allocation Always Tries on the 
Same Allocator by @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F365\r\n* [TransferEngine] Add sanity check on nullptrs  by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F366\r\n* fix(store): the metadata leak after umount segment by @maobaolong in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F337\r\n* store: optimize local data transfer with memcpy fast path by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F340\r\n* [DOC] update blog url by @stmatengss in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F369\r\n* build(transport): add glog and pthread to rdma target link libraries by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F370\r\n* Add LRU in MasterService, complexity O(1) by @zhaoyongke in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F287\r\n* [Store] feat: Add a MooncakeStoreService to serve store and rest api by @maobaolong in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F328\r\n* [hotfix] Allow compile flag USE_LRU_MASTER to enable\u002Fdisable the LRU feature by @ykwd in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F372\r\n* [FIX] update path reselection stragegy to cover all possible available devices by @alogfans in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F373\r\n* feat(py): integrate python http metadata server by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F367\r\n* chore: bump version to 0.3.1 in pyproject.toml by @xiaguan in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F377\r\n\r\n## New Contributors\r\n* @qicosmos made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F339\r\n* @johnnynunez made their first 
contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F344\r\n* @Risc-lt made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F354\r\n* @XucSh made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F364\r\n* @tchaikov made their first contribution in https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002FMooncake\u002Fpull\u002F359\r\n* @ykwd made their first contribution in https:\u002F\u002Fgithub.com\u002Fkv","2025-05-19T05:36:43"]