[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-rapidsai--raft":3,"tool-rapidsai--raft":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":75,"owner_name":76,"owner_avatar_url":77,"owner_bio":78,"owner_company":79,"owner_location":79,"owner_email":79,"owner_twitter":80,"owner_website":81,"owner_url":82,"languages":83,"stars":123,"forks":124,"last_commit_at":125,"license":126,"difficulty_score":10,"env_os":127,"env_gpu":128,"env_ram":127,"env_deps":129,"category_tags":139,"github_topics":140,"view_count":10,"oss_zip_url":79,"oss_zip_packed_at":79,"status":16,"created_at":161,"updated_at":162,"faqs":163,"releases":189},662,"rapidsai\u002Fraft","raft","RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. 
The algorithms are CUDA-accelerated and form building blocks for more easily writing high-performance applications.","RAFT 是一个专为机器学习和数据挖掘打造的高性能计算库。它提供了一系列经过 CUDA 加速的基础算法与计算原语，旨在成为构建高效应用的坚实基石。\n\n在实际开发中，底层算子的重复实现往往拖慢进度且难以维护。RAFT 通过模块化设计解决了这一痛点，让开发者能像搭积木一样快速组合出高性能程序。它不仅支持 C++ 原生调用，还提供轻量级的 Python 封装及 Dask 分布式支持，方便不同技术栈的用户接入。\n\n无论是算法工程师还是系统开发者，只要涉及大规模数据处理、线性代数运算或模型评估，RAFT 都是得力助手。其独特的头文件模板库架构兼顾了编译速度与灵活性，涵盖了从数据格式转换到稀疏矩阵求解的全方位功能。对于追求极致性能并希望简化 GPU 编程复杂度的团队来说，RAFT 是理想选择。","# \u003Cdiv align=\"left\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frapidsai_raft_readme_9e040bc0234d.png\" width=\"90px\"\u002F>&nbsp;RAFT: Reusable Accelerated Functions and Tools\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Frapidsai\u002Fraft\u002FHEAD\u002Fimg\u002Fraft-tech-stack.svg\" alt=\"RAFT tech stack\" width=\"100%\">\n\u003C\u002Fp>\n\n\n\n## Contents\n\u003Chr>\n\n1. [Useful Resources](#useful-resources)\n2. [What is RAFT?](#what-is-raft)\n3. [Is RAFT right for me?](#is-raft-right-for-me)\n4. [Getting Started](#getting-started)\n5. [Installing RAFT](#installing)\n6. [Contributing](#contributing)\n7. 
[References](#references)\n\n\u003Chr>\n\n## Useful Resources\n\n- [RAFT Reference Documentation](https:\u002F\u002Fdocs.rapids.ai\u002Fapi\u002Fraft\u002Fstable\u002F): API Documentation.\n- [RAFT Getting Started](.\u002Fdocs\u002Fsource\u002Fquick_start.md): Getting started with RAFT.\n- [Build and Install RAFT](.\u002Fdocs\u002Fsource\u002Fbuild.md): Instructions for installing and building RAFT.\n- [RAPIDS Community](https:\u002F\u002Frapids.ai\u002Fcommunity.html): Get help, contribute, and collaborate.\n- [GitHub repository](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft): Download the RAFT source code.\n- [Issue tracker](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fissues): Report issues or request features.\n\n\n\n## What is RAFT?\n\nRAFT contains fundamental widely-used algorithms and primitives for machine learning and data mining. The algorithms are CUDA-accelerated and form building blocks for more easily writing high-performance applications.\n\nBy taking a primitives-based approach to algorithm development, RAFT\n- shortens algorithm construction time\n- reduces the maintenance burden by maximizing reuse across projects, and\n- centralizes core reusable computations, allowing future optimizations to benefit all algorithms that use them.\n\nWhile not exhaustive, the following general categories help summarize the accelerated functions in RAFT:\n#####\n| Category              | Accelerated Functions in RAFT                                                                                                     |\n|-----------------------|-----------------------------------------------------------------------------------------------------------------------------------|\n| **Data Formats**      | sparse & dense, conversions, data generation                                                                                      |\n| **Dense Operations**  | linear algebra, matrix and vector operations, reductions, slicing, norms, 
factorization, least squares, SVD & eigenvalue problems |\n| **Sparse Operations** | linear algebra, eigenvalue problems, slicing, norms, reductions, factorization, symmetrization, components & labeling             |\n| **Solvers**           | combinatorial optimization, iterative solvers                                                                                     |\n| **Statistics**        | sampling, moments and summary statistics, metrics, model evaluation                                                               |\n| **Tools & Utilities** | common tools and utilities for developing CUDA applications, multi-node multi-GPU infrastructure                                  |\n\n\nRAFT is a C++ header-only template library with an optional shared library that\n1) can speed up compile times for common template types, and\n2) provides host-accessible \"runtime\" APIs, which can be used without a CUDA compiler\n\nIn addition to the C++ library, RAFT provides two Python libraries:\n- `pylibraft` - lightweight Python wrappers around RAFT's host-accessible \"runtime\" APIs.\n- `raft-dask` - multi-node multi-GPU communicator infrastructure for building distributed algorithms on the GPU with Dask.\n\n![RAFT is a C++ header-only template library with optional shared library and lightweight Python wrappers](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frapidsai_raft_readme_286e145567d6.png)\n\n\n## Is RAFT right for me?\n\nRAFT contains low-level primitives for accelerating applications and workflows. Data source providers and application developers may find specific tools very useful. RAFT is not intended to be used directly by data scientists for discovery and experimentation. 
For data science tools, please see the [RAPIDS website](https:\u002F\u002Frapids.ai\u002F).\n\n## Getting started\n\n### RAPIDS Memory Manager (RMM)\n\nRAFT relies heavily on RMM, which eases the burden of configuring different allocation strategies globally across the libraries that use it.\n\n### Multi-dimensional Arrays\n\nThe APIs in RAFT accept the [mdspan](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.06474) multi-dimensional array view for representing data in higher dimensions, similar to NumPy's `ndarray`. RAFT also contains the corresponding owning `mdarray` structure, which simplifies the allocation and management of multi-dimensional data in both host and device (GPU) memory.\n\nThe `mdarray` forms a convenience layer over RMM and can be constructed in RAFT using a number of different helper functions:\n\n```c++\n#include \u003Craft\u002Fcore\u002Fdevice_mdarray.hpp>\n#include \u003Craft\u002Fcore\u002Fdevice_resources.hpp>\n\n\u002F\u002F Holds the CUDA stream and other resources used by RAFT allocations and primitives\nraft::device_resources handle;\n\nint n_rows = 10;\nint n_cols = 10;\n\nauto scalar = raft::make_device_scalar\u003Cfloat>(handle, 1.0f);\nauto vector = raft::make_device_vector\u003Cfloat>(handle, n_cols);\nauto matrix = raft::make_device_matrix\u003Cfloat>(handle, n_rows, n_cols);\n```\n\n### C++ Example\n\nMost of the primitives in RAFT accept a `raft::device_resources` object for the management of resources that are expensive to create, such as CUDA streams, stream pools, and handles to other CUDA libraries like `cublas` and `cusolver`.\n\nThe example below demonstrates creating a RAFT handle, wrapping device pointers in `device_matrix_view` and `device_vector_view`, generating random clusters, and computing pairwise Euclidean distances with the [NVIDIA cuVS](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcuvs) library:\n\n```c++\n#include \u003Craft\u002Fcore\u002Fdevice_resources.hpp>\n#include \u003Craft\u002Fcore\u002Fdevice_mdspan.hpp>\n#include 
\u003Craft\u002Frandom\u002Fmake_blobs.cuh>\n#include \u003Ccuvs\u002Fdistance\u002Fdistance.hpp>\n\nraft::device_resources handle;\n\nint n_samples 
= 5000;\nint n_features = 50;\n\nfloat *input;\nint *labels;\nfloat *output;\n\n\u002F\u002F ...\n\u002F\u002F Allocate input, labels, and output pointers\n\u002F\u002F ...\n\nauto input_view = raft::make_device_matrix_view(input, n_samples, n_features);\nauto labels_view = raft::make_device_vector_view(labels, n_samples);\nauto output_view = raft::make_device_matrix_view(output, n_samples, n_samples);\n\nraft::random::make_blobs(handle, input_view, labels_view);\n\nauto metric = cuvs::distance::DistanceType::L2SqrtExpanded;\ncuvs::distance::pairwise_distance(handle, input_view, input_view, output_view, metric);\n```\n\n\n### Python Example\n\nThe `pylibraft` package contains a Python API for RAFT algorithms and primitives. Because `pylibraft` is lightweight, has minimal dependencies, and accepts any object that supports the `__cuda_array_interface__`, such as [CuPy's ndarray](https:\u002F\u002Fdocs.cupy.dev\u002Fen\u002Fstable\u002Fuser_guide\u002Finteroperability.html#rmm), it integrates easily with other libraries. The number of RAFT algorithms exposed in this package continues to grow with each release.\n\nThe example below demonstrates computing the pairwise Euclidean distances between CuPy arrays using the [NVIDIA cuVS](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcuvs) library. 
Note that CuPy is not a required dependency for `pylibraft`.\n\n```python\nimport cupy as cp\n\nfrom cuvs.distance import pairwise_distance\n\nn_samples = 5000\nn_features = 50\n\nin1 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)\nin2 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)\n\noutput = pairwise_distance(in1, in2, metric=\"euclidean\")\n```\n\nThe `output` array in the above example is of type `pylibraft.common.device_ndarray`, which supports [__cuda_array_interface__](https:\u002F\u002Fnumba.pydata.org\u002Fnumba-doc\u002Fdev\u002Fcuda\u002Fcuda_array_interface.html#cuda-array-interface-version-2), making it interoperable with other libraries that also support it, such as CuPy, Numba, PyTorch, and RAPIDS cuDF. CuPy supports DLPack, which also enables zero-copy conversion from `pylibraft.common.device_ndarray` to JAX and TensorFlow.\n\nBelow is an example of converting the output `pylibraft.common.device_ndarray` to a CuPy array:\n```python\ncupy_array = cp.asarray(output)\n```\n\nAnd converting to a PyTorch tensor:\n```python\nimport torch\n\ntorch_tensor = torch.as_tensor(output, device='cuda')\n```\n\nOr converting to a RAPIDS cuDF dataframe:\n```python\nimport cudf\n\ncudf_dataframe = cudf.DataFrame(output)\n```\n\nWhen the corresponding library is installed and available in your environment, all RAFT compute APIs can also perform this conversion automatically through a global configuration option:\n```python\nimport pylibraft.config\npylibraft.config.set_output_as(\"cupy\")  # All compute APIs will return cupy arrays\npylibraft.config.set_output_as(\"torch\") # All compute APIs will return torch tensors\n```\n\nYou can also specify a `callable` that accepts a `pylibraft.common.device_ndarray` and performs a custom conversion. 
The following example converts all output to `numpy` arrays:\n```python\nimport pylibraft.config\n\n# copy_to_host() copies the device array into a host NumPy array\npylibraft.config.set_output_as(lambda device_ndarray: device_ndarray.copy_to_host())\n```\n\n`pylibraft` also supports writing to a pre-allocated output array, so results can be written in place into any array that supports `__cuda_array_interface__`:\n\n```python\nimport cupy as cp\n\nfrom cuvs.distance import pairwise_distance\n\nn_samples = 5000\nn_features = 50\n\nin1 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)\nin2 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)\noutput = cp.empty((n_samples, n_samples), dtype=cp.float32)\n\npairwise_distance(in1, in2, out=output, metric=\"euclidean\")\n```\n\n\n## Installing\n\nRAFT's C++ and Python libraries can both be installed through Conda, and the Python libraries can also be installed through Pip.\n\n\n### Installing C++ and Python through Conda\n\nThe easiest way to install RAFT is through conda, which provides several packages:\n- `libraft-headers` C++ headers\n- `pylibraft` (optional) Python library\n- `raft-dask` (optional) Python library for deployment of multi-node multi-GPU algorithms that use the RAFT `raft::comms` abstraction layer in Dask clusters.\n\nUse the following command, depending on your CUDA version, to install all of the RAFT packages with conda (replace `rapidsai` with `rapidsai-nightly` to install more up-to-date but less stable nightly packages). `mamba` is preferred over the `conda` command.\n\n```bash\n# CUDA 13\nmamba install -c rapidsai -c conda-forge raft-dask pylibraft cuda-version=13.1\n\n# CUDA 12\nmamba install -c rapidsai -c conda-forge raft-dask pylibraft cuda-version=12.9\n```\n\nNote that the above commands will also install `libraft-headers` and `libraft`.\n\nYou can also install the conda packages individually using the `mamba` command above. 
For example, if you'd like to install RAFT's headers and pre-compiled shared library to use in your project:\n\n```bash\n# CUDA 13\nmamba install -c rapidsai -c conda-forge libraft libraft-headers cuda-version=13.1\n\n# CUDA 12\nmamba install -c rapidsai -c conda-forge libraft libraft-headers cuda-version=12.9\n```\n\n### Installing Python through Pip\n\n`pylibraft` and `raft-dask` both have experimental packages that can be [installed through pip](https:\u002F\u002Frapids.ai\u002Fpip.html#install):\n\n```bash\n# CUDA 13\npip install pylibraft-cu13\npip install raft-dask-cu13\n\n# CUDA 12\npip install pylibraft-cu12\npip install raft-dask-cu12\n```\n\nThese packages statically build RAFT's pre-compiled instantiations, so the C++ headers won't be readily available to use in your code.\n\nThe [build instructions](https:\u002F\u002Fdocs.rapids.ai\u002Fapi\u002Fraft\u002Fnightly\u002Fbuild\u002F) contain more details on building RAFT from source and including it in downstream projects; see in particular the [Building RAFT C++ and Python from source](https:\u002F\u002Fdocs.rapids.ai\u002Fapi\u002Fraft\u002Fnightly\u002Fbuild\u002F#building-c-and-python-from-source) section.\n\n\n## Contributing\n\nIf you are interested in contributing to the RAFT project, please read our [Contributing guidelines](docs\u002Fsource\u002Fcontributing.md). 
Refer to the [Developer Guide](docs\u002Fsource\u002Fdeveloper_guide.md) for details on the developer guidelines, workflows, and principles.\n\n## References\n\nWhen citing RAFT generally, please consider referencing this GitHub project.\n```bibtex\n@misc{rapidsai,\n  title={Rapidsai\u002Fraft: RAFT contains fundamental widely-used algorithms and primitives for data science, Graph and machine learning.},\n  url={https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft},\n  journal={GitHub},\n  publisher={NVIDIA RAPIDS},\n  author={Rapidsai},\n  year={2022}\n}\n```\n","# \u003Cdiv align=\"left\">\u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frapidsai_raft_readme_9e040bc0234d.png\" width=\"90px\"\u002F>&nbsp;RAFT: Reusable Accelerated Functions and Tools\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Frapidsai\u002Fraft\u002FHEAD\u002Fimg\u002Fraft-tech-stack.svg\" alt=\"RAFT tech stack\" width=\"100%\">\n\u003C\u002Fp>\n\n\n\n## Contents\n\u003Chr>\n\n1. [有用资源](#useful-resources)\n2. [RAFT 是什么？](#what-is-raft)\n3. [RAFT 适合我吗？](#is-raft-right-for-me)\n4. [入门指南](#getting-started)\n5. [安装 RAFT](#installing)\n6. [贡献指南](#contributing)\n7. 
[参考文献](#references)\n\n\u003Chr>\n\n## 有用资源\n\n- [RAFT 参考文档](https:\u002F\u002Fdocs.rapids.ai\u002Fapi\u002Fraft\u002Fstable\u002F): API 文档。\n- [RAFT 入门指南](.\u002Fdocs\u002Fsource\u002Fquick_start.md): 开始使用 RAFT。\n- [构建和安装 RAFT](.\u002Fdocs\u002Fsource\u002Fbuild.md): 安装和构建 RAFT 的说明。\n- [RAPIDS 社区](https:\u002F\u002Frapids.ai\u002Fcommunity.html): 获取帮助、做出贡献并协作。\n- [GitHub 仓库](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft): 下载 RAFT 源代码。\n- [问题追踪器](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fissues): 报告问题或请求功能。\n\n\n\n## RAFT 是什么？\n\nRAFT 包含机器学习和数据挖掘中广泛使用的基础算法和原语（primitives）。这些算法经过 CUDA 加速，构成了编写高性能应用程序的基础模块。\n\n通过采用基于原语的算法开发方法，RAFT\n- 加速了算法构建时间\n- 通过最大化跨项目的复用性来减少维护负担，并且\n- 集中了核心可复用计算，允许未来的优化惠及所有使用它们的算法。\n\n虽然不全面，但以下一般类别有助于总结 RAFT 中的加速函数：\n#####\n| 类别              | RAFT 中的加速函数                                                                                                     |\n|-----------------------|-----------------------------------------------------------------------------------------------------------------------------------|\n| **数据格式**      | 稀疏与稠密、转换、数据生成                                                                                      |\n| **稠密操作**  | 线性代数、矩阵和向量运算、归约、切片、范数、分解、最小二乘法、SVD 及特征值问题 |\n| **稀疏操作** | 线性代数、特征值问题、切片、范数、归约、分解、对称化、组件与标记             |\n| **求解器**           | 组合优化、迭代求解器                                                                                     |\n| **统计**        | 采样、矩与汇总统计、指标、模型评估                                                               |\n| **工具与实用程序** | 用于开发 CUDA 应用的通用工具和实用程序、多节点多 GPU 基础设施                                  |\n\n\nRAFT 是一个 C++ 仅头文件模板库，带有一个可选的共享库，该库\n1) 可以加快常见模板类型的编译时间，并且\n2) 提供主机可访问的“运行时”API，无需 CUDA 编译器即可使用\n\n除了作为 C++ 库之外，RAFT 还提供了 2 个 Python 库：\n- `pylibraft` - 围绕 RAFT 的主机可访问“运行时”API 的轻量级 Python 包装器。\n- `raft-dask` - 用于在 GPU 上使用 Dask 构建分布式算法的多节点多 GPU 通信基础设施。\n\n![RAFT 是一个 C++ 仅头文件模板库，带有可选的共享库和轻量级 Python 
包装器](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frapidsai_raft_readme_286e145567d6.png)\n\n\n## RAFT 适合我吗？\n\nRAFT 包含用于加速应用程序和工作流的底层原语。数据源提供者和应用开发人员可能会发现特定工具非常有用。RAFT 并不打算供数据科学家直接用于发现和实验。对于数据科学工具，请参阅 [RAPIDS 网站](https:\u002F\u002Frapids.ai\u002F)。\n\n## 入门指南\n\n### RAPIDS 内存管理器 (RMM)\n\nRAFT 严重依赖 RMM，它减轻了在使用它的库中全局配置不同分配策略的负担。\n\n### 多维数组\n\nRAFT 中的 API 接受 [mdspan](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.06474) 多维数组视图来表示更高维度的数据，类似于 NumPy 中的 `ndarray`。RAFT 还包含相应的、拥有数据所有权的 `mdarray` 结构，简化了主机和设备（GPU）内存中多维数据的分配和管理。\n\n`mdarray` 是建立在 RMM 之上的便捷层，可以在 RAFT 中使用多种不同的辅助函数进行构造：\n\n```c++\n#include \u003Craft\u002Fcore\u002Fdevice_mdarray.hpp>\n#include \u003Craft\u002Fcore\u002Fdevice_resources.hpp>\n\n\u002F\u002F Holds the CUDA stream and other resources used by RAFT allocations and primitives\nraft::device_resources handle;\n\nint n_rows = 10;\nint n_cols = 10;\n\nauto scalar = raft::make_device_scalar\u003Cfloat>(handle, 1.0f);\nauto vector = raft::make_device_vector\u003Cfloat>(handle, n_cols);\nauto matrix = raft::make_device_matrix\u003Cfloat>(handle, n_rows, n_cols);\n```\n\n### C++ 示例\n\nRAFT 中的大多数原语接受一个 `raft::device_resources` 对象来管理创建成本较高的资源，例如 CUDA 流、流池以及像 `cublas` 和 `cusolver` 等其他 CUDA 库的句柄。\n\n下面的示例展示了如何创建一个 RAFT 句柄、将设备指针包装为 `device_matrix_view` 和 `device_vector_view`、生成随机簇，并使用 [NVIDIA cuVS](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcuvs) 库计算成对欧几里得距离：\n\n```c++\n#include \u003Craft\u002Fcore\u002Fdevice_resources.hpp>\n#include \u003Craft\u002Fcore\u002Fdevice_mdspan.hpp>\n#include \u003Craft\u002Frandom\u002Fmake_blobs.cuh>\n#include \u003Ccuvs\u002Fdistance\u002Fdistance.hpp>\n\nraft::device_resources handle;\n\nint n_samples = 5000;\nint n_features = 50;\n\nfloat *input;\nint *labels;\nfloat *output;\n\n\u002F\u002F ...\n\u002F\u002F Allocate input, labels, and output pointers\n\u002F\u002F ...\n\nauto input_view = raft::make_device_matrix_view(input, n_samples, n_features);\nauto labels_view = raft::make_device_vector_view(labels, n_samples);\nauto output_view = 
raft::make_device_matrix_view(output, n_samples, n_samples);\n\nraft::random::make_blobs(handle, input_view, labels_view);\n\nauto metric = 
cuvs::distance::DistanceType::L2SqrtExpanded;\ncuvs::distance::pairwise_distance(handle, input_view, input_view, output_view, metric);\n```\n\n### Python 示例\n\n`pylibraft` 包包含了用于 RAFT 算法和原语（primitives）的 Python API。`pylibraft` 非常轻量且依赖极少，能够很好地集成到其他库中，并支持任何实现了 `__cuda_array_interface__` 接口的对象，例如 [CuPy 的 ndarray](https:\u002F\u002Fdocs.cupy.dev\u002Fen\u002Fstable\u002Fuser_guide\u002Finteroperability.html#rmm)。此包中暴露的 RAFT 算法数量正在持续增加。\n\n下面的示例演示了如何使用 [NVIDIA cuVS](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcuvs) 库计算 CuPy 数组之间的成对欧氏距离。注意，CuPy 不是 `pylibraft` 的必要依赖项。\n\n```python\nimport cupy as cp\n\nfrom cuvs.distance import pairwise_distance\n\nn_samples = 5000\nn_features = 50\n\nin1 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)\nin2 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)\n\noutput = pairwise_distance(in1, in2, metric=\"euclidean\")\n```\n\n上述示例中的 `output` 数组类型为 `pylibraft.common.device_ndarray`，它支持 [__cuda_array_interface__](https:\u002F\u002Fnumba.pydata.org\u002Fnumba-doc\u002Fdev\u002Fcuda\u002Fcuda_array_interface.html#cuda-array-interface-version-2)，使其能够与同样支持该接口的其他库（如 CuPy、Numba、PyTorch 和 RAPIDS cuDF）互操作。CuPy 支持 DLPack，这也使得从 `pylibraft.common.device_ndarray` 到 JAX 和 TensorFlow 的零拷贝（zero-copy）转换成为可能。\n\n下面是将输出 `pylibraft.common.device_ndarray` 转换为 CuPy 数组的示例：\n```python\ncupy_array = cp.asarray(output)\n```\n\n以及转换为 PyTorch 张量：\n```python\nimport torch\n\ntorch_tensor = torch.as_tensor(output, device='cuda')\n```\n\n或者转换为 RAPIDS cuDF 数据框：\n```python\nimport cudf\n\ncudf_dataframe = cudf.DataFrame(output)\n```\n\n当相应的库已安装并在您的环境中可用时，通过设置全局配置选项，所有 RAFT 计算 API 也可以自动完成此转换：\n```python\nimport pylibraft.config\npylibraft.config.set_output_as(\"cupy\")  # 所有计算 API 将返回 cupy 数组\npylibraft.config.set_output_as(\"torch\") # 所有计算 API 将返回 torch 张量\n```\n\n您还可以指定一个可调用对象（callable），它接受 `pylibraft.common.device_ndarray` 并执行自定义转换。以下示例将所有输出转换为 `numpy` 数组：\n```python\nimport 
pylibraft.config\n\npylibraft.config.set_output_as(lambda device_ndarray: 
device_ndarray.copy_to_host())\n```\n\n`pylibraft` 还支持写入预分配的输出数组，因此任何支持 `__cuda_array_interface__` 的数组都可以就地（in-place）写入：\n\n```python\nimport cupy as cp\n\nfrom cuvs.distance import pairwise_distance\n\nn_samples = 5000\nn_features = 50\n\nin1 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)\nin2 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)\noutput = cp.empty((n_samples, n_samples), dtype=cp.float32)\n\npairwise_distance(in1, in2, out=output, metric=\"euclidean\")\n```\n\n\n## 安装\n\nRAFT 的 C++ 和 Python 库都可以通过 Conda 安装，Python 库也可以通过 Pip 安装。\n\n\n### 通过 Conda 安装 C++ 和 Python\n\n通过 conda 安装 RAFT 是最简单的方法，并提供多个软件包。\n- `libraft-headers` C++ 头文件\n- `pylibraft`（可选）Python 库\n- `raft-dask`（可选）用于在 Dask 集群中部署使用 RAFT `raft::comms` 抽象层的多节点多 GPU 算法的 Python 库。\n\n根据您的 CUDA 版本使用以下命令通过 conda 安装所有 RAFT 软件包（将 `rapidsai` 替换为 `rapidsai-nightly` 以安装更新但稳定性稍差的夜间构建版本）。推荐使用 `mamba` 命令而非 `conda`。\n\n```bash\n# CUDA 13\nmamba install -c rapidsai -c conda-forge raft-dask pylibraft cuda-version=13.1\n\n# CUDA 12\nmamba install -c rapidsai -c conda-forge raft-dask pylibraft cuda-version=12.9\n```\n\n注意，上述命令也会安装 `libraft-headers` 和 `libraft`。\n\n您也可以使用上面的 `mamba` 命令单独安装 conda 软件包。例如，如果您想安装 RAFT 的头文件和预编译共享库以便在项目中使用：\n\n```bash\n# CUDA 13\nmamba install -c rapidsai -c conda-forge libraft libraft-headers cuda-version=13.1\n\n# CUDA 12\nmamba install -c rapidsai -c conda-forge libraft libraft-headers cuda-version=12.9\n```\n\n### 通过 Pip 安装 Python\n\n`pylibraft` 和 `raft-dask` 都有可以通过 [pip 安装](https:\u002F\u002Frapids.ai\u002Fpip.html#install) 的实验性软件包：\n\n```bash\n# CUDA 13\npip install pylibraft-cu13\npip install raft-dask-cu13\n\n# CUDA 12\npip install pylibraft-cu12\npip install raft-dask-cu12\n```\n\n这些软件包静态构建了 RAFT 的预编译实例，因此 C++ 头文件不会直接可用于您的代码中。\n\n[构建说明](https:\u002F\u002Fdocs.rapids.ai\u002Fapi\u002Fraft\u002Fnightly\u002Fbuild\u002F) 包含有关从源代码构建 RAFT 并将其包含在下游项目中的更多详细信息。您也可以在构建说明的 [从源代码构建 RAFT C++ 和 
Python](https:\u002F\u002Fdocs.rapids.ai\u002Fapi\u002Fraft\u002Fnightly\u002Fbuild\u002F#building-c-and-python-from-source) 部分找到更多详细信息。\n\n\n## 贡献\n\n如果您对为 RAFT 项目做出贡献感兴趣，请阅读我们的 [贡献指南](docs\u002Fsource\u002Fcontributing.md)。有关开发者指南、工作流程和原则的详细信息，请参阅 [开发者指南](docs\u002Fsource\u002Fdeveloper_guide.md)。\n\n## 参考文献\n\n在一般引用 RAFT 时，请考虑引用此 GitHub 项目。\n```bibtex\n@misc{rapidsai,\n  title={Rapidsai\u002Fraft: RAFT contains fundamental widely-used algorithms and primitives for data science, Graph and machine learning.},\n  url={https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft},\n  journal={GitHub},\n  publisher={NVIDIA RAPIDS},\n  author={Rapidsai},\n  year={2022}\n}\n```","# RAFT 快速上手指南\n\nRAFT (Reusable Accelerated Functions and Tools) 是一个用于机器学习和数据挖掘的 CUDA 加速算法与基础工具库。它提供 C++ 头文件模板库及轻量级 Python 封装，旨在加速算法构建并提高性能。\n\n## 环境准备\n\n- **硬件要求**: NVIDIA GPU (支持 CUDA)。\n- **操作系统**: Linux（Windows 用户需通过 WSL2 使用）。\n- **软件依赖**:\n  - CUDA Toolkit (版本 12.x 或 13.x)。\n  - Conda 或 Mamba (推荐使用 Mamba 以加快安装速度)。\n  - Python (3.8+)。\n- **前置知识**: 熟悉 Python 编程及基本的 CUDA 概念。\n\n## 安装步骤\n\n推荐使用 `mamba` 进行安装，它通常比 `conda` 更快。请根据您的 CUDA 版本选择对应的命令。\n\n### 通过 Conda\u002FMamba 安装 (推荐)\n\n此方式将安装 C++ 头文件、核心库以及 Python 接口。\n\n```bash\n# CUDA 13 环境\nmamba install -c rapidsai -c conda-forge raft-dask pylibraft cuda-version=13.1\n\n# CUDA 12 环境\nmamba install -c rapidsai -c conda-forge raft-dask pylibraft cuda-version=12.9\n```\n\n> **注意**: 上述命令会自动安装 `libraft-headers` 和 `libraft`。如需仅安装 C++ 头文件和预编译共享库，请将包名替换为 `libraft libraft-headers`。\n\n### 通过 Pip 安装 (实验性)\n\n如果您偏好使用 pip，可以使用以下命令安装实验性包：\n\n```bash\n# CUDA 13\npip install pylibraft-cu13\npip install raft-dask-cu13\n\n# CUDA 12\npip install pylibraft-cu12\npip install raft-dask-cu12\n```\n\n## 基本使用\n\nRAFT 提供了丰富的 Python 接口 (`pylibraft`)，支持与 CuPy、PyTorch、RAPIDS cuDF 等库无缝互操作。\n\n### Python 示例：计算成对距离\n\n以下示例展示了如何使用 RAFT 基础设施（通过 `cuvs` 调用）计算两个 CuPy 数组之间的欧几里得距离。\n\n```python\nimport cupy as 
cp\n\nfrom cuvs.distance import 
pairwise_distance\n\nn_samples = 5000\nn_features = 50\n\nin1 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)\nin2 = cp.random.random_sample((n_samples, n_features), dtype=cp.float32)\n\noutput = pairwise_distance(in1, in2, metric=\"euclidean\")\n```\n\n### 配置输出类型\n\n`pylibraft` 允许您全局配置计算结果的返回类型，以便与其他库（如 PyTorch 或 NumPy）无缝集成。\n\n```python\nimport pylibraft.config\n\n# 设置所有计算 API 返回 CuPy 数组\npylibraft.config.set_output_as(\"cupy\") \n\n# 设置所有计算 API 返回 PyTorch Tensor\npylibraft.config.set_output_as(\"torch\") \n```\n\n### 手动转换数据类型\n\n您也可以手动将 RAFT 的输出转换为其他格式：\n\n```python\n# 转换为 CuPy 数组\ncupy_array = cp.asarray(output)\n\n# 转换为 PyTorch Tensor\nimport torch\ntorch_tensor = torch.as_tensor(output, device='cuda')\n\n# 转换为 NumPy 数组\nnumpy_array = output.copy_to_host()\n```","某大型电商公司的机器学习团队正在构建亿级用户行为的实时推荐引擎，核心任务是对海量稀疏用户 - 物品交互矩阵进行高效分解。\n\n### 没有 raft 时\n- 团队需从零编写复杂的 CUDA 内核处理稀疏矩阵运算，单个模块开发周期长达数月\n- 不同项目组重复实现基础的线性代数函数，导致大量代码冗余和潜在的维护风险\n- CPU 与 GPU 间频繁的数据拷贝造成显著延迟，难以满足实时推理的低延迟需求\n- 缺乏统一优化的底层算子，模型迭代速度严重受限于硬件性能的深度挖掘不足\n\n### 使用 raft 后\n- 直接调用 RAFT 提供的稀疏线性代数原语，将矩阵分解功能的开发时间从月缩短至数天\n- 复用 RAFT 中经过生产验证的核心计算逻辑，彻底消除了重复造轮子的维护成本\n- 利用内置的 CUDA 加速机制，显著减少了主机与设备间的数据传输开销\n- 结合 raft-dask 轻松扩展至多节点多 GPU 环境，大幅提升大规模训练吞吐量\n\nRAFT 通过封装高性能 GPU 加速原语，让工程师能专注于业务算法创新而非底层硬件优化。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Frapidsai_raft_899a7e51.png","rapidsai","RAPIDS","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Frapidsai_30174da9.png","Open GPU Data Science",null,"RAPIDSai","https:\u002F\u002Frapids.ai","https:\u002F\u002Fgithub.com\u002Frapidsai",[84,88,92,96,100,104,108,112,116,120],{"name":85,"color":86,"percentage":87},"Cuda","#3A4E3A",62.6,{"name":89,"color":90,"percentage":91},"C++","#f34b7d",20.6,{"name":93,"color":94,"percentage":95},"Jupyter 
Notebook","#DA5B0B",10.5,{"name":97,"color":98,"percentage":99},"Python","#3572A5",2.4,{"name":101,"color":102,"percentage":103},"Cython","#fedf5b",1.4,{"name":105,"color":106,"percentage":107},"Shell","#89e051",1,{"name":109,"color":110,"percentage":111},"CMake","#DA3434",0.9,{"name":113,"color":114,"percentage":115},"C","#555555",0.4,{"name":117,"color":118,"percentage":119},"HTML","#e34c26",0,{"name":121,"color":122,"percentage":119},"Dockerfile","#384d54",990,228,"2026-04-02T20:32:13","Apache-2.0","未说明","需要 NVIDIA GPU，支持 CUDA 12.9 或 13.1",{"notes":130,"python":127,"dependencies":131},"RAFT 是 C++ 头文件模板库，提供可选共享库及 Python 包装器（pylibraft, raft-dask）。核心依赖 RMM 进行内存管理，API 使用 mdspan 表示多维数组。支持与 CuPy、PyTorch 等通过 __cuda_array_interface__ 互操作。推荐使用 mamba 或 conda 安装，需匹配 CUDA 版本。",[132,133,134,135,136,137,138],"RMM","cuVS","Dask","mdspan","CUB","cuBLAS","cuSOLVER",[26,54,13],[141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160],"anns","building-blocks","clustering","cuda","distance","gpu","information-retrieval","linear-algebra","llm","machine-learning","nearest-neighbors","primitives","random-sampling","solvers","sparse","statistics","vector-search","vector-similarity","vector-store","neighborhood-methods","2026-03-27T02:49:30.150509","2026-04-06T05:37:20.172245",[164,169,174,179,184],{"id":165,"question_zh":166,"answer_zh":167,"source_url":168},2738,"RAFT 的公共 API 头文件应该使用什么扩展名？编译时如何处理 CUDA 代码？","公共 API 头文件应统一使用 `.hpp` 扩展名，以减少消费者选择文件的负担并保持向后兼容。关于编译：NVCC 解析速度并非主要瓶颈，关键在于区分源文件类型。如果文件中没有 `__device__` 函数且不包含含设备代码的头文件，应使用 `.cpp` 由 GCC 编译以节省时间；若包含设备代码（如内核启动），则需使用 `.cu` 并由 NVCC 编译。核心组件如 `raft\u002Fcuda_utils.cuh` 允许在非 CUDA 编译器环境下编译。","https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fissues\u002F330",{"id":170,"question_zh":171,"answer_zh":172,"source_url":173},2739,"CAGRA 算法目前在 RAFT 中是否已经稳定可用？","是的，CAGRA 已正式移除实验状态，可以投入使用。它适用于处理大规模和小批量的最近邻查询任务。相关实现已合并至 PR 
#1666。用户如有反馈可直接提供。","https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fissues\u002F997",{"id":175,"question_zh":176,"answer_zh":177,"source_url":178},2740,"RAFT 的内存分配是否完全集成到 RMM 内存池中？","早期版本的 `raft::allocate` 函数可能直接调用 `cudaMalloc` 从而绕过 RMM。为了支持内存池功能，建议替换这些调用。具体的内存分配模式应遵循 `MR->allocate()`，主要涉及 `cpp\u002Finclude\u002Fraft\u002Fmr\u002Fbuffer_base.hpp` 和 `cpp\u002Finclude\u002Fraft\u002Fmr\u002Fdevice\u002Fallocator.hpp` 等文件的重构。","https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fissues\u002F308",{"id":180,"question_zh":181,"answer_zh":182,"source_url":183},2741,"如何在 RAFT 中使用设备端的随机数生成器接口？","该功能已通过 PR #609 解决。开发者可以使用基于 `random\u002Fdetail\u002Frng_impl.cuh` 的模板结构来生成随机数，或者直接暴露相应的设备函数。此外，接口还支持生成难以正确实现的随机值，特别是边界整数。","https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fissues\u002F406",{"id":185,"question_zh":186,"answer_zh":187,"source_url":188},2742,"为什么 RAFT 的编译时间较长？如何优化模板实例化导致的重复编译？","RAFT 通过预编译库 (`libraft.so`) 实例化常用模板参数以减少编译时间。常见问题包括意外重复实例化和因核心实现变更导致的依赖重新编译。解决方案要求：在编译 `libraft.so` 时确保不可能发生意外重实例化（报错而非静默失败），并减少关键头文件（如 `neighbors\u002Fspecializations.cuh`）的包含次数以降低耦合。","https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fissues\u002F1416",[190,195,200,205,210,215,220,225,230,235,240,245,250,255,260,265,270,275,280,285],{"id":191,"version":192,"summary_zh":193,"released_at":194},102227,"v24.10.00","## 🚨 Breaking Changes\n\n- [Feat] add `repeat`, `sparsity`,  `eval_n_elements` APIs to `bitset` (#2439) @rhdong\n\n## 🐛 Bug Fixes\n\n- Disable NN Descent Batch tests temporarily (#2453) @divyegala\n- Fix sed syntax in `update-version.sh` (#2441) @raydouglass\n- Use runtime check of cudart version for eig (#2430) @lowener\n- [BUG] Fix bitset function visibility (#2429) @lowener\n- Exclude any kernel symbol that uses cutlass (#2425) @robertmaynard\n\n## 🚀 New Features\n\n- [Feat] add `repeat`, `sparsity`,  `eval_n_elements` APIs to `bitset` (#2439) @rhdong\n- [Opt] Enforce the UT Coverity and add benchmark for `transpose` (#2438) @rhdong\n- [FEA] 
Support for half-float mixed precise in brute-force (#2382) @rhdong\n\n## 🛠️ Improvements\n\n- bump NCCL floor to 2.19 (#2458) @jameslamb\n- Deprecating vector search APIs and updating README accordingly (#2448) @cjnolet\n- Update update-version.sh to use packaging lib (#2447) @AyodeAwe\n- Switch traceback to `native` (#2446) @galipremsagar\n- bump NCCL floor to 2.18.1.1 (#2443) @jameslamb\n- Add missing `cuda_suffixed: true` (#2440) @trxcllnt\n- Use CI workflow branch &#39;branch-24.10&#39; again (#2437) @jameslamb\n- Update to flake8 7.1.1. (#2435) @bdice\n- Update fmt (to 11.0.2) and spdlog (to 1.14.1). (#2433) @jameslamb\n- Allow coo_sort to work on int64_t indices (#2432) @benfred\n- Adding NCCL clique to the RAFT handle (#2431) @viclafargue\n- Add support for Python 3.12 (#2428) @jameslamb\n- Update rapidsai\u002Fpre-commit-hooks (#2420) @KyleFromNVIDIA\n- Drop Python 3.9 support (#2417) @jameslamb\n- Use CUDA math wheels (#2415) @KyleFromNVIDIA\n- Remove NumPy &lt;2 pin (#2414) @seberg\n- Update pre-commit hooks (#2409) @KyleFromNVIDIA\n- Improve update-version.sh (#2408) @bdice\n- Use tool.scikit-build.cmake.version, set scikit-build-core minimum-version (#2406) @jameslamb\n- [FEA] Batching NN Descent (#2403) @jinsolp\n- Update pip devcontainers to UCX v1.17.0 (#2401) @jameslamb\n- Merge branch-24.08 into branch-24.10 (#2397) @jameslamb","2024-10-09T15:29:23",{"id":196,"version":197,"summary_zh":198,"released_at":199},102218,"v26.02.00","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v26.02.00 -->\n\n## What's Changed\n### 🚨 Breaking Changes\n* Use CCCL's mdspan implementation by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2836\n* Default to static linking of libcudart by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2890\n* Remove `neighbors\u002F`, `cluster\u002F`, `distance\u002F`, `spatial\u002F`, `sparse\u002Fneighbors\u002F` apis by @aamijar in 
https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2885\n* Remove cutlass and cuco dependencies by @divyegala in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2916\n### 🐛 Bug Fixes\n* Include `\u003Cthrust\u002Ffor_each.h>` where it is used by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2883\n* Include CTest module in CMakeLists.txt by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2895\n* Fix Lanczos Determinism by @aamijar in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2894\n* Change compile-time assertion to runtime assertion on is_strided by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2909\n* Set memory pool through RMM by @viclafargue in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2866\n### 📖 Documentation\n* New readme image by @aamijar in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2907\n* Readme improvements by @aamijar in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2906\n### 🚀 New Features\n* Tile Policy for Uint8 Input (Pairwise) by @tarang-jain in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2770\n* Add copy_vectorized to RAFT by @lowener in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2900\n### 🛠️ Improvements\n* Use strict priority in CI conda tests by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2879\n* Use strict priority in CI conda tests by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2884\n* Remove alpha specs from non-RAPIDS dependencies by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2886\n* Enable merge barriers by @KyleFromNVIDIA in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2889\n* Fix is_exhaustive, no longer constexpr by @bdice in 
https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2888\n* Add devcontainer fallback for C++ test location by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2893\n* `eigsh` optional seed by @aamijar in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2899\n* Empty commit to trigger a build by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2904\n* Update to C++20 by @divyegala in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2908\n* Use SPDX license identifiers in pyproject.toml, bump build dependency floors by @jameslamb in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2910\n* Remove `neighbors\u002Fdetail\u002Ffaiss_select` by @aamijar in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2902\n* Remove `sparse\u002Fdistance` by @aamijar in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2905\n* Add CUDA 13.1 support by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2896\n* Fix CCCL 3.2 mdspan constexpr issues by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2911\n* build and test against CUDA 13.1.0 by @jameslamb in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2912\n* Laplacian Kernel for COO inputs by @aamijar in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2891\n* Empty commit to trigger a build by @jameslamb in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2919\n* Use main shared-workflows branch by @jameslamb in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2921\n* Fix update-version.sh incorrectly replacing main() function names by @AyodeAwe in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2923\n* Lanczos remove dead code by @aamijar in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2918\n* 
wheel builds: react to changes in pip's handling of build constraints by @mmccarty in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2927\n* fix(build): build package on merge to `release\u002F*` branch by @gforsyth in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2929\n\n## New Contributors\n* @mmccarty made their first contribution in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2927\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fcompare\u002Fv26.02.00a...v26.02.00","2026-02-04T21:15:46",{"id":201,"version":202,"summary_zh":203,"released_at":204},102219,"v25.12.00","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v25.12.00 -->\n\n## What's Changed\n### 🚨 Breaking Changes\n* More consistent container policies & host memory resource by @achirkin in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2835\n* Require CUDA 12.2+ by @jakirkham in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2850\n### 🐛 Bug Fixes\n* Correct tagging in the `irecv` function of the STD communicator by @viclafargue in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2829\n* Fix copyright hook file exclusion by @KyleFromNVIDIA in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2840\n* Properly guard usage of openmp function calls by @robertmaynard in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2839\n* Fix reduce mdspan API by @lowener in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2853\n* Fix for STD comm waitall function by @viclafargue in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2852\n* Pin Cython pre-3.2.0 and PyTest pre-9 by @jakirkham in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2864\n* refactored update-version.sh to handle new branching strategy by @rockhowse in 
https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2863\n* Fix laplacian scaling coefficients by @aamijar in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2871\n* Revert \"Remove Deprecated API (#2813)\" by @csadorf in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2881\n### 📖 Documentation\n* Use current system architecture in conda environment creation command by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2862\n### 🚀 New Features\n* BENCH_PRIMS: convenience reporting of benchmark parameters and read throughput by @achirkin in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2824\n### 🛠️ Improvements\n* Update to rapids-logger 0.2 by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2828\n* Enable `sccache-dist` connection pool by @trxcllnt in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2837\n* Use main in RAPIDS_BRANCH by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2842\n* Use main shared-workflows branch by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2844\n* Use SPDX for all copyright headers by @KyleFromNVIDIA in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2845\n* Use ruff-check, ruff-format instead of black, flake8, isort by @KyleFromNVIDIA in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2855\n* Remove shims for CCCL \u003C 3.1 compatibility by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2858\n* Always convert warnings to errors by @jakirkham in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2857\n* Lanczos Solver with COO input and cusparse wrappers by @aamijar in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2851\n* COO support in sparse matrix utilities by @aamijar in 
https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2861\n* Update RMM includes from `\u003Crmm\u002Fmr\u002Fdevice\u002F*>` to `\u003Crmm\u002Fmr\u002F*>` by @bdice in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2867\n* Use `sccache-dist` build cluster for conda and wheel builds by @trxcllnt in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2859\n* Remove Deprecated API by @jnke2016 in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2813\n\n## New Contributors\n* @rockhowse made their first contribution in https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fpull\u002F2863\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fcompare\u002Fv25.12.00a...v25.12.00","2025-12-11T00:00:39",{"id":206,"version":207,"summary_zh":208,"released_at":209},102220,"v25.10.00","## 🐛 Bug Fixes\n\n- Workaround for an illegal memory access on SM 120 devices (#2821) @achirkin\n- Fix sparse select_k: don&#39;t write beyond min(input_len, k) (#2814) @achirkin\n- [BUG] Fix compilation error in matrix\u002Fdetail\u002Fgather.cuh (#2811) @enp1s0\n- Fix select_k for negative bfloat16 (#2799) @apivovarov\n- Fix index types for coo kernels (#2793) @aamijar\n- Fix the GEMM pointer mode setting (#2777) @achirkin\n- Fix `host_vector_policy` issue (#2739) @viclafargue\n\n## 📖 Documentation\n\n- Fix UCX-Py mention to UCXX in docstring (#2804) @pentschev\n\n## 🚀 New Features\n\n- Update cutlass to a version that supports CUDA 13 (#2774) @robertmaynard\n\n## 🛠️ Improvements\n\n- Fix missed deps in `update-version.sh` (#2826) @AyodeAwe\n- Empty commit to trigger a build (#2816) @msarahan\n- Make warpsort kernels use the IEEE 754 bit representation for ordering (#2807) @achirkin\n- Configure repo for automatic release notes generation (#2806) @AyodeAwe\n- Support &lt; 2 element arrays in `rand_index`\u002F`adjusted_rand_index` (#2805) @jcrist\n- update dependencies: use cuda-toolkit wheels 
(#2802) @jameslamb\n- Use branch-25.10 again (#2800) @jameslamb\n- Remove CMake find UCX package (#2798) @pentschev\n- use dask-cuda[cu12, cu13] extras for wheel dependencies (#2797) @jameslamb\n- Remove UCX-Py (#2791) @pentschev\n- Update rapids-dependency-file-generator (#2790) @KyleFromNVIDIA\n- Build and test with CUDA 13.0.0 (#2787) @jameslamb\n- Fix template arg passing in `adjusted_rand_index` (#2785) @jinsolp\n- Use build cluster in devcontainers (#2781) @trxcllnt\n- Use rapids_cuda_enable_fatbin_compression (#2780) @robertmaynard\n- Increase Dask tests verbosity in CI (#2779) @pentschev\n- Update rapids_config to handle user defined branch name (#2778) @robertmaynard\n- [REVIEW] Fix: skip default_allocation_limit() if unnecessary (#2775) @i-Pear\n- Update rapids-build-backend to 0.4.1 (#2773) @KyleFromNVIDIA\n- ci(labeler): update labeler action to @v5 (#2772) @gforsyth\n- Register bfloat16\u002Fbfloat162 in util\u002Fvectorized.cuh (#2769) @apivovarov\n- Use mdspan::index_type to Only Instantiate Specific Kernels (#2767) @tarang-jain\n- Allow latest OS in devcontainers (#2759) @bdice\n- Update build infra to support new branching strategy (#2751) @robertmaynard\n- Use GCC 14 in conda builds. 
(#2708) @vyasr","2025-10-08T17:49:50",{"id":211,"version":212,"summary_zh":213,"released_at":214},102221,"v25.12.00a","## 🔗 Links\n\n- [Development Branch](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Ftree\u002Fbranch-25.12)\n- [Compare with `main` branch](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fraft\u002Fcompare\u002Fmain...branch-25.12)\n\n\n## 🐛 Bug Fixes\n\n- Fix copyright hook file exclusion (#2840) @KyleFromNVIDIA\n- Properly guard usage of openmp function calls (#2839) @robertmaynard\n- Correct tagging in the `irecv` function of the STD communicator (#2829) @viclafargue\n\n## 🛠️ Improvements\n\n- Enable `sccache-dist` connection pool (#2837) @trxcllnt\n- Update to rapids-logger 0.2 (#2828) @bdice","2025-10-01T15:59:16",{"id":216,"version":217,"summary_zh":218,"released_at":219},102222,"v25.08.00","## 🚨 Breaking Changes\n\n- `MatrixLinewiseOp` compile-time-invocation (#2701) @aamijar\n- Remove CUDA 11 from dependencies.yaml (#2695) @KyleFromNVIDIA\n- stop uploading packages to downloads.rapids.ai (#2688) @jameslamb\n- Reduce instantiations of `Reduction` kernels (#2679) @divyegala\n\n## 🐛 Bug Fixes\n\n- Fix stream sync for Copy2DAsync test (#2744) @lowener\n- Several small fixes to make Raft compile with LLVM. (#2735) @vitor1001\n- Add missing header (#2734) @vitor1001\n- [REVIEW] Fix static initialization order fiasco in `lanczos.cu` (#2733) @legrosbuffle\n- [REVIEW] Fix assertion in `fill_indices_by_rows_kernel`. (#2732) @legrosbuffle\n- libucx: consider post-releases in wheel builds (#2729) @jameslamb\n- Fix laplacian cast (#2725) @aamijar\n- Fix excess_subsample (#2723) @mfoerste4\n- Fix the constructor for `coordinate_structure` for non-zero `nnz`. (#2717) @legrosbuffle\n- [REVIEW] Fix compile error when using `mdbuffer` with all-static extents. (#2716) @legrosbuffle\n- Fix unsafe cast `coo_remove_scalar` (#2713) @aamijar\n- Fix laplacian self-loops (#2712) @aamijar\n- [REVIEW] Fix a few memory leaks. 
(#2710) @legrosbuffle\n- Fix MST bug for graph with identical edge weights (#2707) @jnke2016\n- Missed update accounting for reduction related APIs (#2704) @divyegala\n- Adding GH_TOKEN pass-through to summarize job (#2702) @msarahan\n- Work around Cython ctypedef bug (#2686) @vyasr\n\n## 📖 Documentation\n\n- add docs on CI workflow inputs (#2728) @jameslamb\n\n## 🛠️ Improvements\n\n- An additional small change to remove cuda 11 stuff (#2763) @cjnolet\n- Removing CUDA 11 from docs and code (#2757) @cjnolet\n- fix(docker): use versioned `-latest` tag for all `rapidsai` images (#2745) @gforsyth\n- Update protocol name for UCX-Py tests (#2743) @pentschev\n- Remove sphinx upper bound (#2742) @bdice\n- remove cuspatial references, avoid triggering tests on clang-format config changes (#2740) @jameslamb\n- MST Edge Case (#2736) @tarang-jain\n- Add missing `#include &lt;cassert&gt;` in `cpp\u002Finclude\u002Fraft\u002Fcore\u002Fmath.hpp` (#2730) @trxcllnt\n- Update leftover CUDA 12.8 to 12.9 in docs (#2724) @jakirkham\n- Fix docs lanczos solver (#2722) @aamijar\n- Use CUDA 12.9 in Conda, Devcontainers, Spark, GHA, etc. (#2721) @jakirkham\n- Remove nvidia and dask channels (#2720) @vyasr\n- [REVIEW] Fix compile error of `abs_op` when compiling with `clang` (#2718) @legrosbuffle\n- Avoid using internal method std::experimental::details::alignTo(). 
(#2714) @vitor1001\n- refactor(shellcheck): fix all remaining warnings\u002Ferrors (#2703) @gforsyth\n- `MatrixLinewiseOp` compile-time-invocation (#2701) @aamijar\n- Remove pytest pin (#2699) @vyasr\n- Fix several issues that breaks LLVM (#2698) @vitor1001\n- Remove CUDA 11 from dependencies.yaml (#2695) @KyleFromNVIDIA\n- Remove CUDA 11 devcontainers and update CI scripts (#2690) @bdice\n- refactor(rattler): remove cuda11 options and general cleanup (#2689) @gforsyth\n- stop uploading packages to downloads.rapids.ai (#2688) @jameslamb\n- fix(devcontainers): typo in container name (#2687) @gforsyth\n- Reduce instantiations of `Reduction` kernels (#2679) @divyegala\n- Forward-merge branch-25.06 into branch-25.08 (#2675) @divyegala\n- Add support for F16 in linalg::transpose (#2672) @enp1s0\n- Forward-merge branch-25.06 into branch-25.08 (#2664) @gforsyth\n- Support `coo_matrix` in `coo_symmetrize` and `coo_remove_scalar` (#2662) @aamijar\n- Lanczos Solver `which=SA,SM,LA,LM` argument (#2628) @aamijar","2025-08-06T18:33:50",{"id":221,"version":222,"summary_zh":223,"released_at":224},102223,"v25.06.00","## 🚨 Breaking Changes\n\n- Decoupling multi gpu resources from nccl usage (#2647) @jinsolp\n\n## 🐛 Bug Fixes\n\n- NCCL comm resource fix (#2692) @viclafargue\n- Fix the launch bounds for nn-descent kernel for 1210 and remove nn-descent tests (#2691) @viclafargue\n- Prefer host gather when dataset is available both on host and device (#2671) @tfeher\n- Fix warnings treated as errors downstream in cuVS (#2644) @achirkin\n- Fix nccl_comm.hpp warning: #83-D: type qualifier specified more than once (#2643) @achirkin\n- NVTX: null destination pointer warning-treated-as-error (#2639) @achirkin\n- Add UCXX and NCCL to `libraft` conda recipe (#2636) @divyegala\n- Fix building cutlass (#2619) @miscco\n- Fix COO symmetrization (#2582) @viclafargue\n\n## 🚀 New Features\n\n- [Feat] add `cudaMemcpy2DAsync` wrapper (#2674) @rhdong\n- Python wrapper for `device_resources_snmg` 
(#2666) @jinsolp\n- Laplacian normalization primitives (#2648) @aamijar\n- [FEA] Matrix shift rows and columns (#2634) @jinsolp\n- Use NCCL wheels from PyPI for CUDA 12 builds (#2629) @divyegala\n- Support strided matrix view as an input to matrix::samples_rows (#2626) @enp1s0\n- [Feat] add support for bm25 and tfidf (#2567) @jperez999\n\n## 🛠️ Improvements\n\n- use &#39;rapids-init-pip&#39; in wheel CI, other CI changes (#2677) @jameslamb\n- Dask 2025.4.1 compatibility (#2673) @TomAugspurger\n- Finish CUDA 12.9 migration and use branch-25.06 workflows (#2669) @bdice\n- Update to clang 20 (#2665) @bdice\n- Quote head_rev in conda recipes (#2660) @bdice\n- CUDA 12.9 use updated compression flags (#2657) @robertmaynard\n- Build and test with CUDA 12.9.0 (#2655) @bdice\n- Exclude librmm.so from auditwheel (#2654) @bdice\n- Fix cub include in normalize.cuh (#2652) @lowener\n- Add support for Python 3.13 (#2649) @gforsyth\n- Decoupling multi gpu resources from nccl usage (#2647) @jinsolp\n- [BUGFIX] Fixed quoting in wheel paths in pylibraft and raft_dask wheel tests (#2645) @VenkateshJaya\n- Download build artifacts from Github for CI (#2640) @VenkateshJaya\n- Limit allowed wheel sizes (#2638) @divyegala\n- Remove CUDA whole compilation ODR violations (#2633) @divyegala\n- refactor(rattler): enable strict channel priority for builds (#2632) @gforsyth\n- Vendor RAPIDS.cmake (#2631) @bdice\n- Replace `Thrust` iterator facilities and replace them with `libcu++` ones (#2627) @miscco\n- Port all conda recipes to `rattler-build` (#2623) @gforsyth\n- Add missing thrust include (#2618) @miscco\n- Moving wheel builds to specified location and uploading build artifacts to Github (#2617) @VenkateshJaya\n- Fixed pytest marker warnings by removing unused pytest.ini (#2591) @TomAugspurger\n- Introduction of the `raft::device_resources_snmg` type (#2549) @viclafargue\n- Create a NCCL sub-communicator using ncclCommSplit (#2495) 
@seunghwak","2025-06-05T19:23:46",{"id":226,"version":227,"summary_zh":228,"released_at":229},102224,"v25.04.00","## 🚨 Breaking Changes\n\n- Account for cugraph API breakage (#2581) @divyegala\n- Use new rapids-logger library (#2566) @vyasr\n\n## 🐛 Bug Fixes\n\n- Backport build patch fix (#2620) @KyleFromNVIDIA\n- Revert &quot;Temporarily increase `max_days_without_success` (#2602)&quot; (#2613) @divyegala\n- Relax max duplicates in batched NN Descent (#2610) @jinsolp\n- [Fix] Lanczos solver gemv fix (#2607) @aamijar\n- [Fix] `select-k-csr` failure on CUDA11.x + H100 (#2604) @rhdong\n- Temporarily increase `max_days_without_success` (#2602) @divyegala\n- Swap `blocks` and `threads_per_block` in `compute_graph_laplacian` (#2597) @jcrist\n- [BUG] Fix illegal memory access in linalg::reduction (#2592) @enp1s0\n- Require sphinx&lt;8.2.0 (#2590) @KyleFromNVIDIA\n- Account for cugraph API breakage (#2581) @divyegala\n- `#include &lt;numeric&gt;` for `std::iota` (#2578) @benfred\n- Fix Laplacian calculation in spectral partitioning (#2568) @wphicks\n- Take argument by `const&amp;` as the input range is const (#2558) @miscco\n- Allow some of the sparse utility functions to handle larger matrices (#2541) @viclafargue\n\n## 🛠️ Improvements\n\n- ci: pre-filter 11.4 jobs before they are enabled in shared workflows (#2608) @gforsyth\n- Use conda-build instead of conda-mambabuild (#2595) @bdice\n- Replace `cub::Sum` and `cub::Max` with `cuda::std::plus` and `cuda::maximum` (#2594) @miscco\n- Update all `conda_build_config.yaml`s RAPIDS UCX version (#2589) @jakirkham\n- Drop `cub::TransformInputIterator` in favor of `thrust::transform_iterator` (#2588) @miscco\n- Consolidate more Conda solves in CI (#2587) @KyleFromNVIDIA\n- Fix duplicate indices in batch NN Descent (#2586) @jinsolp\n- Require CMake 3.30.4 (#2584) @robertmaynard\n- Create Conda CI test env in one step (#2580) @KyleFromNVIDIA\n- Use shared-workflows branch-25.04 (#2576) @bdice\n- Add `shellcheck` to pre-commit and 
fix warnings (#2575) @gforsyth\n- Add build_type input field for `test.yaml` (#2573) @gforsyth\n- Use `rapids-pip-retry` in CI jobs that might need retries (#2571) @gforsyth\n- Avoid limited memory adaptor issue in balanced KMeans (#2570) @csadorf\n- update telemetry and retarget 25.04 (#2569) @msarahan\n- Use new rapids-logger library (#2566) @vyasr\n- disallow fallback to Make in Python builds (#2563) @jameslamb\n- Forward-merge branch-25.02 into branch-25.04 (#2561) @bdice\n- Migrate to NVKS for amd64 CI runners (#2559) @bdice\n- Add `verify-codeowners` hook (#2557) @KyleFromNVIDIA","2025-04-09T19:08:38",{"id":231,"version":232,"summary_zh":233,"released_at":234},102225,"v25.02.00","## 🚨 Breaking Changes\r\n\r\n- Update pip devcontainers to UCX 1.18 (#2550) @jameslamb\r\n- Switch over to rapids-logger (#2530) @vyasr\r\n- Adapt to rmm logger changes (#2513) @vyasr\r\n\r\n## 🐛 Bug Fixes\r\n\r\n- Rename test to tests. (#2546) @bdice\r\n- Fix bit order of RMAT Rectangular Generator to match expectation (#2542) @mfoerste4\r\n- Fix broken link to python doc (#2537) @lowener\r\n- Fix lanczos solver integer overflow (#2536) @viclafargue\r\n- Fix rnd bit generation in rmat_rectangular_kernel (#2524) @tfeher\r\n\r\n## 📖 Documentation\r\n\r\n- Fix docs builds (#2562) @bdice\r\n- [DOC] Fix sample codes (#2518) @enp1s0\r\n\r\n## 🚀 New Features\r\n\r\n- Add cuda 12.8 support (#2551) @robertmaynard\r\n- Add support for different data type of bitset (#2535) @lowener\r\n- [Feat] Support `bitset_to_csr` (#2523) @rhdong\r\n- Remove upper bounds on cuda-python to allow 12.6.2 and 11.8.5 (#2517) @bdice\r\n\r\n## 🛠️ Improvements\r\n\r\n- Revert CUDA 12.8 shared workflow branch changes (#2560) @vyasr\r\n- Build and test with CUDA 12.8.0 (#2555) @bdice\r\n- Update pip devcontainers to UCX 1.18 (#2550) @jameslamb\r\n- use dynamic CUDA wheels on CUDA 11 (#2548) @jameslamb\r\n- Normalize whitespace (#2547) @bdice\r\n- Use cuda.bindings layout. 
(#2545) @bdice\r\n- Revert &quot;Introduction of the `raft::device_resources_snmg` type (#2487)&quot; (#2543) @cjnolet\r\n- Add missing `#include &lt;cstdint&gt;` (#2540) @jakirkham\r\n- Use GCC 13 in CUDA 12 conda builds. (#2539) @bdice\r\n- Use rapids-cmake for the logger (#2534) @vyasr\r\n- Check if nightlies have succeeded recently enough (#2533) @vyasr\r\n- remove unused &#39;joblib&#39; and &#39;numba&#39; dependencies, other packaging cleanup (#2532) @jameslamb\r\n- introduce libraft wheels (#2531) @jameslamb\r\n- Switch over to rapids-logger (#2530) @vyasr\r\n- reduce duplication, removed unused things in dependencies.yaml (#2529) @jameslamb\r\n- Update cuda-python lower bounds to 12.6.2 \u002F 11.8.5 (#2522) @bdice\r\n- [Opt] Optimizing the performance of `bitmap_to_csr` (#2516) @rhdong\r\n- prefer system install of UCX in devcontainers, update outdated RAPIDS references (#2514) @jameslamb\r\n- Adapt to rmm logger changes (#2513) @vyasr\r\n- Require approval to run CI on draft PRs (#2512) @bdice\r\n- Shrink wheel size limit following removal of vector search APIs. (#2509) @bdice\r\n- Forward-merge branch-24.12 to branch-25.02 (#2508) @bdice\r\n- Introduction of the `raft::device_resources_snmg` type (#2487) @viclafargue\r\n- Add breaking change workflow trigger (#2482) @AyodeAwe\r\n- Remove &#39;sample&#39; parameter from stats::mean API (#2389) @mfoerste4","2025-03-24T12:28:45",{"id":236,"version":237,"summary_zh":238,"released_at":239},102226,"v24.12.00","## 🚨 Breaking Changes\n\n- Do not initialize the pinned mdarray at construction time (#2478) @achirkin\n\n## 🐛 Bug Fixes\n\n- Skip gtests for new lanczos solver when CUDA version is 11.4 or below. (#2520) @cjnolet\n- Switch `assert` to `static_assert` (#2510) @divyegala\n- Revert use of new Lanczos solver in spectral clustering (#2507) @lowener\n- Put a ceiling on cuda-python (#2486) @bdice\n- Don&#39;t presume pointers location infers usability. 
(#2480) @robertmaynard\n- Use Python for sccache hit rate computation. (#2474) @bdice\n- Allow compilation with CUDA 12.6.1 (#2469) @robertmaynard\n\n## 🚀 New Features\n\n- [FEA] Lanczos solver v2 (#2481) @lowener\n\n## 🛠️ Improvements\n\n- Skip gtests for Rmat Lanczos tests with cuda &lt;= 11.4 (#2525) @benfred\n- Upgrade to latest cutlass version (#2503) @vyasr\n- Removing some left over places where implicit instantiations were being ignored in headers (#2501) @cjnolet\n- Remove leftover template project code. (#2500) @bdice\n- 2412 remove libraft vss instantiations (#2498) @cjnolet\n- Remove raft-ann-bench (#2497) @cjnolet\n- Pin FAISS Version for raft-ann-bench (#2496) @tarang-jain\n- enforce wheel size limits and README formatting in CI, put a ceiling on Cython dependency (#2490) @jameslamb\n- Do not initialize the pinned mdarray at construction time (#2478) @achirkin\n- Use environment variables in cache hit rate computation. (#2475) @bdice\n- devcontainer: replace `VAULT_HOST` with `AWS_ROLE_ARN` (#2472) @jjacobelli\n- print sccache stats in builds (#2470) @jameslamb\n- make package installations in CI stricter (#2467) @jameslamb\n- Prune workflows based on changed files (#2466) @KyleFromNVIDIA\n- Merge branch-24.10 into branch-24.12 (#2461) @jameslamb\n- Update all rmm imports to use pylibrmm\u002Flibrmm (#2451) @Matt711","2024-12-11T19:15:40",{"id":241,"version":242,"summary_zh":243,"released_at":244},102228,"v24.08.00","## 🚨 Breaking Changes\n\n- [Refactor] move `popc` to under util (#2394) @rhdong\n- [Opt] Expose the `detail::popc` as public API (#2346) @rhdong\n\n## 🐛 Bug Fixes\n\n- Add timeout to UCXX generic operations (#2398) @pentschev\n- [Fix] bitmap set\u002Ftest issue (#2371) @rhdong\n- Fix 0 recall issue in `raft_cagra_hnswlib` ANN benchmark (#2369) @divyegala\n- Fix `ef` setting in HNSW wrapper (#2367) @divyegala\n- Fix cagra graph opt bug (#2365) @enp1s0\n- Fix a bug where the wrong API is used to free the memory (#2361) @PointKernel\n- Allow 
anonymous user in devcontainer name (#2355) @bdice\n- Fix compilation error when _CLK_BREAKDOWN is defined in cagra. (#2350) @jiangyinzuo\n- ensure raft-dask wheel tests install pylibraft wheel from the same CI run, fix wheel dependencies (#2349) @jameslamb\n- Change --config-setting to --config-settings (#2342) @KyleFromNVIDIA\n- Add workaround for syevd in CUDA 12.0 (#2332) @lowener\n\n## 🚀 New Features\n\n- [FEA] add the support of `masked_matmul` (#2362) @rhdong\n- [FEA] Dice Distance for Dense Inputs (#2359) @aamijar\n- [Opt] Expose the `detail::popc` as public API (#2346) @rhdong\n- Enable distance return for NN Descent (#2345) @jinsolp\n\n## 🛠️ Improvements\n\n- [Refactor] move `popc` to under util (#2394) @rhdong\n- split up CUDA-suffixed dependencies in dependencies.yaml (#2388) @jameslamb\n- Use workflow branch 24.08 again (#2385) @KyleFromNVIDIA\n- Add cusparseSpMV_preprocess to cusparse wrapper (#2384) @Kh4ster\n- Consolidate SUM reductions (#2381) @mfoerste4\n- Use slicing kernel to copy distances inside NN Descent (#2380) @jinsolp\n- Build and test with CUDA 12.5.1 (#2378) @KyleFromNVIDIA\n- Add CUDA_STATIC_MATH_LIBRARIES (#2376) @KyleFromNVIDIA\n- skip CMake 3.30.0 (#2375) @jameslamb\n- Use verify-alpha-spec hook (#2373) @KyleFromNVIDIA\n- Binarize Dice Distance for Dense Inputs (#2370) @aamijar\n- [FEA] Add distance epilogue for NN Descent (#2364) @jinsolp\n- resolve dependency-file-generator warning, other rapids-build-backend followup (#2360) @jameslamb\n- Remove text builds of documentation (#2354) @vyasr\n- Use default init in reduction (#2351) @akifcorduk\n- ensure update-version.sh preserves alpha spec, add tests on version constants (#2344) @jameslamb\n- remove unnecessary &#39;setuptools&#39; dependencies (#2343) @jameslamb\n- Use rapids-build-backend (#2331) @KyleFromNVIDIA\n- Add FAISS with RAFT enabled Benchmarking to raft-ann-bench (#2026) 
@tarang-jain","2024-08-07T17:01:24",{"id":246,"version":247,"summary_zh":248,"released_at":249},102229,"v24.06.00","## 🚨 Breaking Changes\n\n- Rename raft-ann-bench module to raft_ann_bench (#2333) @KyleFromNVIDIA\n- Scaling workspace resources (#2322) @achirkin\n- [REVIEW] Adjust UCX dependencies (#2304) @pentschev\n- Convert device_memory_resource* to device_async_resource_ref (#2269) @harrism\n\n## 🐛 Bug Fixes\n\n- Fix import of VERSION file in raft-ann-bench (#2338) @KyleFromNVIDIA\n- Rename raft-ann-bench module to raft_ann_bench (#2333) @KyleFromNVIDIA\n- Support building faiss main statically (#2323) @robertmaynard\n- Refactor spectral scale_obs to use existing normalization function (#2319) @ChuckHastings\n- Correct initializer list order found by cuvs (#2317) @robertmaynard\n- ANN_BENCH: enable move semantics for configured_raft_resources (#2311) @achirkin\n- Revert &quot;Build C++ wheel (#2264)&quot; (#2305) @vyasr\n- Revert &quot;Add `compile-library` by default on pylibraft build&quot; (#2300) @vyasr\n- Add VERSION to raft-ann-bench package (#2299) @KyleFromNVIDIA\n- Remove nonexistent job from workflow (#2298) @vyasr\n- `libucx` should be run dependency of `raft-dask` (#2296) @divyegala\n- Fix clang intrinsic warning (#2292) @aaronmondal\n- Replace too long index file name with hash in ANN bench (#2280) @tfeher\n- Fix build command for C++ compilation (#2270) @lowener\n- Fix a compilation error in CAGRA when enabling log output (#2262) @enp1s0\n- Correct member initialization order (#2254) @robertmaynard\n- Fix time computation in CAGRA notebook (#2231) @lowener\n\n## 📖 Documentation\n\n- Fix citation info (#2318) @enp1s0\n\n## 🚀 New Features\n\n- Scaling workspace resources (#2322) @achirkin\n- ANN_BENCH: AnnGPU::uses_stream() for optional algo GPU sync (#2314) @achirkin\n- [FEA] Split Bitset code (#2295) @lowener\n- [FEA] support of prefiltered brute force (#2294) @rhdong\n- Always use a static gtest and gbench (#2265) @robertmaynard\n- Build C++ 
wheel (#2264) @vyasr\n- InnerProduct Distance Metric for CAGRA search (#2260) @tarang-jain\n- [FEA] Add support for `select_k` on CSR matrix (#2140) @rhdong\n\n## 🛠️ Improvements\n\n- ANN_BENCH: common AnnBase::index_type (#2315) @achirkin\n- ANN_BENCH: split instances of RaftCagra into multiple files (#2313) @achirkin\n- ANN_BENCH: a global pool of result buffers across benchmark cases (#2312) @achirkin\n- Remove the shared state and the mutex from NVTX internals (#2310) @achirkin\n- docs: update README.md (#2308) @eltociear\n- [REVIEW] Reenable raft-dask wheel tests requiring UCX-Py (#2307) @pentschev\n- [REVIEW] Adjust UCX dependencies (#2304) @pentschev\n- Overhaul ops-codeowners (#2303) @raydouglass\n- Make thrust nosync execution policy the default thrust policy (#2302) @abc99lr\n- InnerProduct testing for CAGRA+HNSW (#2297) @divyegala\n- Enable warnings as errors for Python tests (#2288) @mroeschke\n- Normalize dataset vectors in the CAGRA InnerProduct tests (#2287) @enp1s0\n- Use dynamic version for raft-ann-bench (#2285) @KyleFromNVIDIA\n- Make &#39;librmm&#39; a &#39;host&#39; dependency for conda packages (#2284) @jameslamb\n- Fix comments in cpp\u002Finclude\u002Fraft\u002Fneighbors\u002Fcagra_serialize.cuh (#2283) @jiangyinzuo\n- Only use functions in the limited API (#2282) @vyasr\n- define &#39;ucx&#39; pytest marker (#2281) @jameslamb\n- Migrate to `{{ stdlib(&quot;c&quot;) }}` (#2278) @hcho3\n- add --rm and --name to devcontainer run args (#2275) @trxcllnt\n- Update pip devcontainers to UCX v1.15.0 (#2274) @trxcllnt\n- `#ifdef` out pragma deprecation warning messages (#2271) @trxcllnt\n- Convert device_memory_resource* to device_async_resource_ref (#2269) @harrism\n- Update the developer&#39;s guide with new copyright hook (#2266) @KyleFromNVIDIA\n- Improve coalesced reduction performance for tall and thin matrices (up to 2.6x faster) (#2259) @Nyrio\n- Adds missing files to `update-version.sh` (#2255) @AyodeAwe\n- Enable all tests for `arm64` jobs 
(#2248) @galipremsagar\n- Update nvtx3 link in cmake (#2246) @lowener\n- Add CAGRA-Q subspace dim = 4 support (#2244) @enp1s0\n- Get rid of `cuco::sentinel` namespace (#2243) @PointKernel\n- Replace usages of raw `get_upstream` with `get_upstream_resource()` (#2207) @miscco\n- Set the import mode for dask tests (#2142) @vyasr\n- Add UCXX support (#1983) @pentschev","2024-06-05T15:05:40",{"id":251,"version":252,"summary_zh":253,"released_at":254},102230,"v24.04.00","## 🐛 Bug Fixes\n\n- Update pre-commit-hooks to v0.0.3 (#2239) @KyleFromNVIDIA\n- MAINT: Simplify NCCL worker rank identification (#2228) @VibhuJawa\n- Fix bug in blockRankedReduce (#2226) @akifcorduk\n- Fix illegal acces mean\u002Fstdev, sum add Kahan Summation (#2223) @mfoerste4\n- Batch cutlass distance kernels along N matrix dim (#2215) @mdoijade\n- Fix out of bounds access in sum kernel (#2183) @tfeher\n- Fix ANN bench ground truth generation for k&gt;1024 (#2180) @tfeher\n- Fixing cusparse aligned address issue and adding note (#2179) @cjnolet\n- Launch `neighborhood_recall` kernel on CUDA stream (#2156) @divyegala\n- Add `compile-library` by default on pylibraft build (#2090) @lowener\n\n## 📖 Documentation\n\n- Adding cuVS notice to README and front page of docs. (#2224) @cjnolet\n\n## 🚀 New Features\n\n- Add CAGRA-Q to ANN benchmarks (#2233) @achirkin\n- Add CAGRA-Q build (compression) (#2213) @achirkin\n- CAGRA-Q search (#2206) @enp1s0\n- Demangle backtrace symbols on raft error (#2188) @achirkin\n- Reapply: Support for fp16 in CAGRA and IVF-PQ (#2172) @achirkin\n- Remove supports_streams from custom RAFT memory resources (#2121) @harrism\n- [FEA] Add support for bitmap_view &amp; the API of `bitmap_to_csr` (#2109) @rhdong\n\n## 🛠️ Improvements\n\n- Use `conda env create --yes` instead of `--force` (#2247) @bdice\n- Align ucx version pinning with ucx-py\u002Fucxx. 
(#2227) @bdice\n- Add upper bound to prevent usage of NumPy 2 (#2222) @bdice\n- Performance optimization of IVF-flat \u002F select_k (#2221) @mfoerste4\n- Replace local copyright check with pre-commit-hooks verify-copyright (#2220) @KyleFromNVIDIA\n- Remove hard-coding of RAPIDS version where possible (#2219) @KyleFromNVIDIA\n- Fix style. (#2214) @bdice\n- Add explicit instantiations for IVF-PQ search kernels used in tests (#2212) @tfeher\n- Improve RBC eps-neighborhood query performance (#2211) @mfoerste4\n- Add test for spmm (#2210) @mfoerste4\n- Only install necessary components in conda packages. (#2209) @bdice\n- Automate C++ include file grouping and ordering using clang-format (#2202) @harrism\n- Add support for Python 3.11, require NumPy 1.23+ (#2200) @jameslamb\n- Pass `std::optional` instead of `thrust::optional` to RMM (#2199) @trxcllnt\n- Update devcontainers to CUDA Toolkit 12.2 (#2192) @trxcllnt\n- target branch-24.04 for GitHub Actions workflows (#2189) @jameslamb\n- Fixing workaround for cuSPARSE bug with correct copy dimensions (#2185) @mfoerste4\n- Allow topk larger than 1024 in CAGRA (#2181) @benfred\n- IVF-FLAT support k &gt; 256 (#2169) @mfoerste4\n- Add environment-agnostic scripts for running ctests and pytests (#2165) @trxcllnt\n- Ensure that `ctest` is called with `--no-tests=error`. 
(#2163) @bdice\n- Update ops-bot.yaml (#2158) @AyodeAwe\n- random sampling of dataset rows with improved memory utilization (#2155) @tfeher\n- [FIX] Ensure hnswlib can be found from RAFT&#39;s build dir (#2145) @trxcllnt\n- Improve analysis experience for ANN benchmarks (#2139) @achirkin\n- Enable CAGRA index building without adding dataset to the index (#2126) @tfeher\n- Add fused cosine 1-NN cutlass based kernel (#2125) @mdoijade\n- Update raft for compatibility with the latest cuco (#2118) @PointKernel\n- Support CUDA 12.2 (#2092) @jameslamb\n- Cache IVF-PQ and select-warpsort kernel launch parameters to reduce latency (#1786) @achirkin","2024-04-10T15:06:36",{"id":256,"version":257,"summary_zh":258,"released_at":259},102231,"v24.02.00","## 🚨 Breaking Changes\n\n- Switch to scikit-build-core (#2051) @vyasr\n- Update to CCCL 2.2.0. (#2049) @bdice\n- Update `raft-ann-bench` output filenames and add features to plotting (#2043) @divyegala\n- Remove selection_faiss (#2027) @benfred\n\n## 🐛 Bug Fixes\n\n- fix is_row\u002Fcol_order for strided layouts (#2173) @mfoerste4\n- Fix failing C++ tests and revert #2097, #2085. (#2168) @cjnolet\n- Exclude tests from builds (#2162) @vyasr\n- [HOTFIX] 24.02 Revert Random Sampling (#2144) @cjnolet\n- Pin to pytest 7. (#2137) @bdice\n- Conditionally include `hnsw` wrapper source in CMake (#2135) @divyegala\n- [BUG] Fix `SPMM` strided view (#2124) @lowener\n- Fixing small bug in CUSPARSE spmm w\u002F CUDA 12.2 (#2117) @cjnolet\n- [BUG] Fix `num_cta_per_query` div (#2107) @lowener\n- Remove extraneous host pinnings from libraft-headers-only. 
(#2102) @bdice\n- Remove unneeded CI symbol excludes (#2098) @robertmaynard\n- Properly taking ownership of nccl subcomm (and destroying it) (#2094) @cjnolet\n- Fix `max_queries` for CAGRA (#2081) @lowener\n- Fix compile failure on RTX 4090 (#2076) @JieFengWang\n- Fix a crash in FAISS benchmark wrapper introduced in #2021 (#2062) @achirkin\n- Correct function that wasn&#39;t returning a value (#2045) @robertmaynard\n- Fixing small bug in raft-ann-bench (#2041) @cjnolet\n- Make device_resources accessed from device_resources_manager thread-safe (#2030) @wphicks\n- Fix ann-bench multithreading (#2021) @achirkin\n- Fix `ci\u002Fchecks\u002Fcopyright.py` to mirror RAPIDS reference (#2008) @divyegala\n- Fix pyproject versions (#2002) @vyasr\n\n## 📖 Documentation\n\n- Adding license info for wiki-all dataset (#2129) @cjnolet\n- [DOC] Documentation updates for release 24.02 (#2093) @cjnolet\n- Fix errors with ingroup exposed by doxygen 1.10 (#2079) @wphicks\n- Fix a typo (#2070) @narangvivek10\n- Add usage example for brute_force::build (#2029) @benfred\n- Add filtering to vector search tutorial (#1996) @lowener\n\n## 🚀 New Features\n\n- Update to use rapids-cmake for all deps (#2096) @robertmaynard\n- Add IVF-PQ example into the template project (#2091) @achirkin\n- Support for fp16 in CAGRA and IVF-PQ (#2085) @achirkin\n- Add random subsampling for IVF methods (#2077) @tfeher\n- Update `raft-ann-bench` output filenames and add features to plotting (#2043) @divyegala\n- Add brute_force index serialization (#2036) @wphicks\n- Add eps-neighbor search via RBC (#2028) @mfoerste4\n- `libraft` and `pylibraft` API for CAGRA build and HNSW search (#2022) @divyegala\n- Export Pareto frontier in `raft-ann-bench.data_export` (#2009) @divyegala\n- Implement maybe-owning multi-dimensional container (mdbuffer) (#1999) @wphicks\n- Add support for 1024+ dim vectors in CAGRA search (#1994) @enp1s0\n- Replace GEMM backend: cublas.gemm -&gt; cublaslt.matmul (#1736) @achirkin\n\n## 🛠️ 
Improvements\n\n- Remove get_mem_info functions from RAFT custom memory resources (#2108) @harrism\n- Replace call to mr::get_mem_info() (#2099) @harrism\n- Allow topk larger than 1024 in CAGRA (#2097) @benfred\n- Remove usages of rapids-env-update (#2095) @KyleFromNVIDIA\n- Provide explicit pool size for pool_memory_resources and clean up includes (#2088) @harrism\n- refactor CUDA versions in dependencies.yaml (#2086) @jameslamb\n- ANN bench fix latency measurement overhead (#2084) @tfeher\n- Remove hardcoded limit in `print_results` function (#2080) @narangvivek10\n- [FEA] Add support for SDDMM by wrapping the cusparseSDDMM (#2067) (#2067) @rhdong\n- Benchmark brute force knn (#2063) @benfred\n- [BUG] fix empty initialization of device_ndarray in pylibraft (#2061) @mfoerste4\n- Improve parallelism of refine host (#2059) @anaruse\n- Subsampling for IVF-PQ codebook generation (#2052) @abc99lr\n- Switch to scikit-build-core (#2051) @vyasr\n- Update to CCCL 2.2.0. (#2049) @bdice\n- Use cuda::proclaim_return_type on device lambda. (#2048) @bdice\n- Removing code that explicitly compares equality of rmm memory resources (#2047) @cjnolet\n- Add public enum for select-k algorithm selection (#2046) @benfred\n- Update dependencies.yaml to new pip index (#2042) @vyasr\n- Remove RAFT_BUILD_WHEELS and standardize Python builds (#2040) @vyasr\n- Fix ucx-py version pinning in dependencies.yaml. (#2035) @bdice\n- [REVIEW] Fix typos in parameter tuning guide (#2034) @abc99lr\n- Add AIR-Top-k reference (#2031) @tfeher\n- Remove selection_faiss (#2027) @benfred\n- Fixing json parse error in `raft-ann-bench.data_export` (#2025) @cjnolet\n- Updating cagra build constraint (#2016) @cjnolet\n- Update to fmt 10.1.1 and spdlog 1.12.0. 
(#1957) @bdice\n- Enable host dataset for IVF-Flat (#1635) @tfeher\n- add half\u002Fbfloat support to myInf and abs (#1592) @Kh4ster","2024-02-12T21:23:41",{"id":261,"version":262,"summary_zh":263,"released_at":264},102232,"v23.12.00","## 🐛 Bug Fixes\n\n- Update actions\u002Flabeler to v4 (#2037) @raydouglass\n- pylibraft only depends on numpy at runtime, not build time. (#2013) @bdice\n- Fixes to update-version.sh (#1991) @raydouglass\n- Adjusting end-to-end start time so it doesn&#39;t include stream creation time (#1989) @cjnolet\n- CAGRA graph optimizer: clamp rev_graph_count (#1987) @tfeher\n- Catching conversion errors in data_export instead of fully failing (#1979) @cjnolet\n- Fix syncing mechanism in `raft-ann-bench` C++ search (#1961) @divyegala\n- Fixing hnswlib in latency mode (#1959) @cjnolet\n- Fix `ucx-py` alpha version update for `raft-dask` (#1953) @divyegala\n- Reduce NN Descent test threshold (#1946) @divyegala\n- Fixes to new YAML config `raft-bench-ann` (#1945) @divyegala\n- Set RNG seeds in NN Descent to diagnose flaky tests (#1931) @divyegala\n- Fix FAISS CPU algorithm names in `raft-ann-bench` (#1916) @divyegala\n- Increase iterations in NN Descent tests to avoid flakiness (#1915) @divyegala\n- Fix filepath in `raft-ann-bench\u002Fsplit_groundtruth` module (#1911) @divyegala\n- Remove dynamic entry-points from raft-ann-bench (#1910) @benfred\n- Remove unnecessary dataset path check in ANN bench (#1908) @tfeher\n- Fixing Googletests and re-enabling in CI (#1904) @cjnolet\n- Fix NN Descent overflows (#1875) @divyegala\n- Build fix for CUDA 12.2 (#1870) @benfred\n- [BUG] Fix a bug in NN descent (#1869) @enp1s0\n\n## 📖 Documentation\n\n- Brute Force Index documentation fix (#1944) @lowener\n- Add `wiki_all` dataset config and documentation. 
(#1918) @cjnolet\n- Updates to raft-ann-bench docs (#1905) @cjnolet\n- End-to-end vector search tutorial in docs (#1776) @cjnolet\n\n## 🚀 New Features\n\n- Adding `dry-run` option to `raft-ann-bench` (#1970) @cjnolet\n- Add ANN bench scripts to generate ground truth (#1967) @tfeher\n- CAGRA build + HNSW search (#1956) @divyegala\n- Verify conda-cpp-post-build-checks (#1935) @robertmaynard\n- Make all cuda kernels have hidden visibility (#1898) @robertmaynard\n- Update rapids-cmake functions to non-deprecated signatures (#1884) @robertmaynard\n- [FEA] Helpers for identifying contiguous layouts. (#1861) @trivialfis\n- Add `raft::stats::neighborhood_recall` (#1860) @divyegala\n- [FEA] Helpers and CodePacker for IVF-PQ (#1826) @tarang-jain\n\n## 🛠️ Improvements\n\n- Pinning fmt and spdlog for raft-ann-bench-cpu (#2018) @cjnolet\n- Build concurrency for nightly and merge triggers (#2011) @bdice\n- Using `EXPORT_SET` in `rapids_find_package_root` (#2006) @cjnolet\n- Remove static checks for serialization size (#1997) @cjnolet\n- Skipping bad json parse (#1990) @cjnolet\n- Update select-k heuristic (#1985) @benfred\n- ANN bench: use different offset for each thread (#1981) @tfeher\n- Allow `raft-ann-bench\u002Frun` to continue after encountering bad YAML configs (#1980) @divyegala\n- Add build and search params to `raft-ann-bench.data_export` CSVs (#1971) @divyegala\n- Use new `rapids-dask-dependency` metapackage for managing dask versions (#1968) @galipremsagar\n- Remove unused header (#1960) @wphicks\n- Adding pool back in and fixing cagra benchmark params (#1951) @cjnolet\n- Add constraints to `hnswlib` in `raft-bench-ann` (#1949) @divyegala\n- Add support for iterating over batches in bfknn (#1947) @benfred\n- Fix ANN bench latency (#1940) @tfeher\n- Add YAML config files to run parameter sweeps for ANN benchmarks (#1929) @divyegala\n- Relax ucx pinning (#1927) @vyasr\n- Try using contiguous rank to fix cuda_visible_devices (#1926) @VibhuJawa\n- Unpin `dask` and 
`distributed` for `23.12` development (#1925) @galipremsagar\n- Adding `throughput` and `latency` modes to `raft-ann-bench` (#1920) @cjnolet\n- Providing `aarch64` yaml environment files (#1914) @cjnolet\n- CAGRA ANN bench: parse build options for IVF-PQ build algo (#1912) @tfeher\n- Fix python script location in ANN bench description (#1906) @tfeher\n- Refactor install\u002Fbuild guide. (#1899) @cjnolet\n- Check return values of raft-ann-bench subprocess calls (#1897) @benfred\n- ANN bench options to specify CAGRA graph and dataset locations (#1896) @cjnolet\n- Add check-json to pre-commit linters, and fix invalid ann-bench JSON config (#1894) @benfred\n- Use branch-23.12 workflows. (#1886) @bdice\n- Setup Consistent Nightly Versions for Pip and Conda (#1880) @divyegala\n- Fix and improve one-block radix select (#1878) @yong-wang\n- [FEA] Improvements on bitset class (#1877) @lowener\n- Branch 23.12 merge 23.10 (#1873) @AyodeAwe\n- Branch 23.12 merge 23.10 (#1868) @cjnolet\n- Replace `raft::random` calls to not use deprecated API (#1867) @lowener\n- raft: Build CUDA 12.0 ARM conda packages. (#1853) @bdice\n- Documentation for raft ANN benchmark containers. 
(#1833) @dantegd\n- [FEA] Support vector deletion in ANN IVF (#1831) @lowener\n- Provide a raft::copy overload for mdspan-to-mdspan copies (#1818) @wphicks\n- Adding FAISS cpu to `raft-ann-bench` (#1814) @cjnolet","2023-12-06T15:47:19",{"id":266,"version":267,"summary_zh":268,"released_at":269},102233,"v23.10.00","## 🚨 Breaking Changes\n\n- Change CAGRA auto mode selection (#1830) @enp1s0\n- Update CAGRA serialization (#1755) @benfred\n- Improvements to ANN Benchmark Python scripts and docs (#1734) @divyegala\n- Update to Cython 3.0.0 (#1688) @vyasr\n- ANN-benchmarks: switch to use gbench (#1661) @achirkin\n\n## 🐛 Bug Fixes\n\n- [BUG] Fix a bug in the filtering operation in CAGRA multi-kernel (#1862) @enp1s0\n- Fix conf file for benchmarking glove datasets (#1846) @dantegd\n- raft-ann-bench package fixes for plotting and conf files (#1844) @dantegd\n- Fix update-version.sh for all pyproject.toml files (#1839) @raydouglass\n- Make RMM a run dependency of the raft-ann-bench conda package (#1838) @dantegd\n- Printing actual exception in `require base set` (#1816) @cjnolet\n- Adding rmm to `raft-ann-bench` dependencies (#1815) @cjnolet\n- Use `conda mambabuild` not `mamba mambabuild` (#1812) @bdice\n- Fix `raft-dask` naming in wheel builds (#1805) @divyegala\n- neighbors::refine_host: check the dataset bounds (#1793) @achirkin\n- [BUG] Fix search parameter check in CAGRA (#1784) @enp1s0\n- IVF-Flat: fix search batching (#1764) @achirkin\n- Using expanded distance computations in `pylibraft` (#1759) @cjnolet\n- Fix ann-bench Documentation (#1754) @divyegala\n- Make get_cache_idx a weak symbol with dummy template (#1733) @ahendriksen\n- Fix IVF-PQ fused kernel performance problems (#1726) @achirkin\n- Fix build.sh to enable NEIGHBORS_ANN_CAGRA_TEST (#1724) @enp1s0\n- Fix template types for create_descriptor function. 
(#1680) @csadorf\n\n## 📖 Documentation\n\n- Fix the CAGRA paper citation (#1788) @enp1s0\n- Add citation info for the CAGRA paper preprint (#1787) @enp1s0\n- [DOC] Fix grouping for ANN in C++ doxygen (#1782) @lowener\n- Update RAFT documentation (#1717) @lowener\n- Additional polishing of README and docs (#1713) @cjnolet\n\n## 🚀 New Features\n\n- [FEA] Add `bitset_filter` for CAGRA indices removal (#1837) @lowener\n- ann-bench: miscellaneous improvements (#1808) @achirkin\n- [FEA] Add bitset for ANN pre-filtering and deletion (#1803) @lowener\n- Adding config files for remaining (relevant) ann-benchmarks million-scale datasets (#1761) @cjnolet\n- Port NN-descent algorithm to use in `cagra::build()` (#1748) @divyegala\n- Adding conda build for libraft static (#1746) @cjnolet\n- [FEA] Provide device_resources_manager for easy generation of device_resources (#1716) @wphicks\n\n## 🛠️ Improvements\n\n- Add option to brute_force index to store non-owning reference to norms (#1865) @benfred\n- Pin `dask` and `distributed` for `23.10` release (#1864) @galipremsagar\n- Update image names (#1835) @AyodeAwe\n- Fixes for OOM during CAGRA benchmarks (#1832) @benfred\n- Change CAGRA auto mode selection (#1830) @enp1s0\n- Update to clang 16.0.6. 
(#1829) @bdice\n- Add IVF-Flat C++ example (#1828) @tfeher\n- matrix::select_k: extra tests and benchmarks (#1821) @achirkin\n- Add index class for brute_force knn (#1817) @benfred\n- [FEA] Add pre-filtering to CAGRA (#1811) @enp1s0\n- More updates to ann-bench docs (#1810) @cjnolet\n- Add best deep-100M configs for IVF-PQ to ANN benchmarks (#1807) @tfeher\n- A few fixes to `raft-ann-bench` recipe and docs (#1806) @cjnolet\n- Simplify wheel build scripts and allow alphas of RAPIDS dependencies (#1804) @divyegala\n- Various fixes to reproducible benchmarks (#1800) @cjnolet\n- ANN-bench: more flexible cuda_stub.hpp (#1792) @achirkin\n- Add RAFT devcontainers (#1791) @trxcllnt\n- Cagra memory optimizations (#1790) @benfred\n- Fixing a couple security concerns in `raft-dask` nccl unique id generation (#1785) @cjnolet\n- Don&#39;t serialize dataset with CAGRA bench (#1781) @benfred\n- Use `copy-pr-bot` (#1774) @ajschmidt8\n- Add GPU and CPU packages for ANN benchmarks (#1773) @dantegd\n- Improvements to raft-ann-bench scripts, docs, and benchmarking implementations. 
(#1769) @cjnolet\n- [REVIEW] Introducing host API for PCG (#1767) @vinaydes\n- Unpin `dask` and `distributed` for `23.10` development (#1760) @galipremsagar\n- Add ivf-flat notebook (#1758) @tfeher\n- Update CAGRA serialization (#1755) @benfred\n- Remove block size template parameter from CAGRA search (#1740) @enp1s0\n- Add NVTX ranges for cagra search\u002Fserialize functions (#1737) @benfred\n- Improvements to ANN Benchmark Python scripts and docs (#1734) @divyegala\n- Fixing forward merger for 23.08 -&gt; 23.10 (#1731) @cjnolet\n- [FEA] Use CAGRA in C++ template (#1730) @lowener\n- fixed box around raft image (#1710) @nwstephens\n- Enable CUTLASS-based distance kernels on CTK 12 (#1702) @ahendriksen\n- Update bench-ann configuration (#1696) @lowener\n- Update to Cython 3.0.0 (#1688) @vyasr\n- Update CMake version (#1677) @vyasr\n- Branch 23.10 merge 23.08 (#1672) @vyasr\n- ANN-benchmarks: switch to use gbench (#1661) @achirkin","2023-10-11T15:00:07",{"id":271,"version":272,"summary_zh":273,"released_at":274},102234,"v23.08.00","## 🚨 Breaking Changes\n\n- Separate CAGRA index type from internal idx type (#1664) @tfeher\n- Stop using setup.py in build.sh (#1645) @vyasr\n- CAGRA max_queries auto configuration (#1613) @enp1s0\n- Rename the CAGRA prune function to optimize (#1588) @enp1s0\n- CAGRA pad dataset for 128bit vectorized load (#1505) @tfeher\n- Sparse Pairwise Distances API Updates (#1502) @divyegala\n- Cagra index construction without copying device mdarrays (#1494) @tfeher\n- [FEA] Masked NN for connect_components (#1445) @tarang-jain\n- Limiting workspace memory resource (#1356) @achirkin\n\n## 🐛 Bug Fixes\n\n- Remove push condition on docs-build (#1693) @raydouglass\n- IVF-PQ: Fix illegal memory access with large max_samples (#1685) @achirkin\n- Fix missing parameter for select_k (#1682) @ucassjy\n- Separate CAGRA index type from internal idx type (#1664) @tfeher\n- Add rmm to pylibraft run dependencies, since it is used by Cython. 
(#1656) @bdice\n- Hotfix: wrong constant in IVF-PQ fp_8bit2half (#1654) @achirkin\n- Fix sparse KNN for large batches (#1640) @viclafargue\n- Fix uploading of RAFT nightly packages (#1638) @dantegd\n- Fix cagra multi CTA bug (#1628) @enp1s0\n- pass correct stream to cutlass kernel launch of L2\u002Fcosine pairwise distance kernels (#1597) @mdoijade\n- Fix launchconfig y-gridsize too large in epilogue kernel (#1586) @mfoerste4\n- Fix update version and pinnings for 23.08. (#1556) @bdice\n- Fix for function exposing KNN merge (#1418) @viclafargue\n\n## 📖 Documentation\n\n- Critical doc fixes and updates for 23.08 (#1705) @cjnolet\n- Fix the documentation about changing the logging level (#1596) @enp1s0\n- Fix raft::bitonic_sort small usage example (#1580) @enp1s0\n\n## 🚀 New Features\n\n- Use rapids-cmake new parallel testing feature (#1623) @robertmaynard\n- Add support for row-major slice (#1591) @lowener\n- IVF-PQ tutorial notebook (#1544) @achirkin\n- [FEA] Masked NN for connect_components (#1445) @tarang-jain\n- raft: Build CUDA 12 packages (#1388) @vyasr\n- Limiting workspace memory resource (#1356) @achirkin\n\n## 🛠️ Improvements\n\n- Pin `dask` and `distributed` for `23.08` release (#1711) @galipremsagar\n- Add algo parameter for CAGRA ANN bench (#1687) @tfeher\n- ANN benchmarks python wrapper for splitting billion-scale dataset groundtruth (#1679) @divyegala\n- Rename CAGRA parameter num_parents to search_width (#1676) @tfeher\n- Renaming namespaces to promote CAGRA from experimental (#1666) @cjnolet\n- CAGRA Python wrappers (#1665) @dantegd\n- Add notebook for Vector Search - Question Retrieval (#1662) @lowener\n- Fix CMake CUDA support for pylibraft when raft is found. (#1659) @bdice\n- Cagra ANN benchmark improvements (#1658) @tfeher\n- ANN-benchmarks: avoid using the dataset during search when possible (#1657) @achirkin\n- Revert CUDA 12.0 CI workflows to branch-23.08. 
(#1652) @bdice\n- ANN: Optimize host-side refine (#1651) @achirkin\n- Cagra template instantiations (#1650) @tfeher\n- Modify comm_split to avoid ucp (#1649) @ChuckHastings\n- Stop using setup.py in build.sh (#1645) @vyasr\n- IVF-PQ: Add a (faster) direct conversion fp8-&gt;half (#1644) @achirkin\n- Simplify `bench\u002Fann` scripts to Python based module (#1642) @divyegala\n- Further removal of uses-setup-env-vars (#1639) @dantegd\n- Drop blank line in `raft-dask\u002Fmeta.yaml` (#1637) @jakirkham\n- Enable conservative memory allocations for RAFT IVF-Flat benchmarks. (#1634) @tfeher\n- [FEA] Codepacking for IVF-flat (#1632) @tarang-jain\n- Fixing ann bench cmake (and docs) (#1630) @cjnolet\n- [WIP] Test  CI issues (#1626) @VibhuJawa\n- Set pool memory resource for raft IVF ANN benchmarks (#1625) @tfeher\n- Adding sort option to matrix::select_k api (#1615) @cjnolet\n- CAGRA max_queries auto configuration (#1613) @enp1s0\n- Use exceptions instead of `exit(-1)` (#1594) @benfred\n- [REVIEW] Add scheduler_file argument to support MNMG setup (#1593) @VibhuJawa\n- Rename the CAGRA prune function to optimize (#1588) @enp1s0\n- This PR adds support to __half and nb_bfloat16 to myAtomicReduce (#1585) @Kh4ster\n- [IMP] move core CUDA RT macros to cuda_rt_essentials.hpp (#1584) @MatthiasKohl\n- preprocessor syntax fix (#1582) @AyodeAwe\n- use rapids-upload-docs script (#1578) @AyodeAwe\n- Unpin `dask` and `distributed` for development and fix `merge_labels` test (#1574) @galipremsagar\n- Remove documentation build scripts for Jenkins (#1570) @ajschmidt8\n- Add support to __half and nv_bfloat16 to most math functions (#1554) @Kh4ster\n- Add RAFT ANN benchmark for CAGRA (#1552) @enp1s0\n- Update CAGRA knn_graph_sort to use Raft::bitonic_sort (#1550) @enp1s0\n- Add identity matrix function (#1548) @lowener\n- Unpin scikit-build upper bound (#1547) @vyasr\n- Migrate wheel workflow scripts locally (#1546) @divyegala\n- Add sample filtering for ivf_flat. 
Filtering code refactoring and cleanup (#1541) @alexanderguzhva\n- CAGRA pad dataset for 128bit vectorized load (#1505) @tfeher\n- Sparse Pairwise Distances API Updates (#1502) @divyegala\n- Add CAGRA gbench (#1496) @tfeher\n- Cagra index construction without copying device mdarrays (#1","2023-08-09T17:02:49",{"id":276,"version":277,"summary_zh":278,"released_at":279},102235,"v23.06.02","## 🚨 Breaking Changes\n\n- ivf-pq::search: fix the indexing type of the query-related mdspan arguments (#1539) @achirkin\n- Dropping Python 3.8 (#1454) @divyegala\n\n## 🐛 Bug Fixes\n\n- PR to trigger rebuild of raft (#1622) @raydouglass\n- Assigning Deterministic rank to Dask Workers Based on CUDA_VISIBLE_DEVICES (branch-23.06) (#1587) @VibhuJawa\n- [HOTFIX] Fix  distance metrics L2\u002Fcosine\u002Fcorrelation when X &amp; Y are same buffer but with different shape and add unit test for such case. (#1571) @mdoijade\n- Using raft::resources in rsvd (#1543) @cjnolet\n- ivf-pq::search: fix the indexing type of the query-related mdspan arguments (#1539) @achirkin\n- Check python brute-force knn inputs (#1537) @benfred\n- Fix failing TiledKNNTest unittest (#1533) @benfred\n- ivf-flat: fix incorrect recomputed size of the index (#1525) @achirkin\n- ivf-flat: limit the workspace size of the search via batching (#1515) @achirkin\n- Support uint64_t in CAGRA index data type (#1514) @enp1s0\n- Workaround for cuda 12 issue in cusparse (#1508) @cjnolet\n- Un-scale output distances (#1499) @achirkin\n- Inline get_cache_idx (#1492) @ahendriksen\n- Pin to scikit-build&lt;17.2 (#1487) @vyasr\n- Remove pool_size() calls from debug printouts (#1484) @tfeher\n- Add missing ext declaration for log detail::format (#1482) @tfeher\n- Remove include statements from inside namespace (#1467) @robertmaynard\n- Use pin_compatible to ensure that lower CTKs can be used (#1462) @vyasr\n- fix ivf_pq n_probes (#1456) @benfred\n- The glog project root CMakeLists.txt is where we should build from (#1442) 
@robertmaynard\n- Add missing resource factory virtual destructor (#1433) @cjnolet\n- Removing cuda stream view include from mdarray (#1429) @cjnolet\n- Fix dim param for IVF-PQ wrapper in ANN bench (#1427) @tfeher\n- Remove MetricProcessor code from brute_force::knn (#1426) @benfred\n- Fix is_min_close (#1419) @benfred\n- Have consistent compile lines between BUILD_TESTS enabled or not (#1401) @robertmaynard\n- Fix ucx-py pin in raft-dask recipe (#1396) @vyasr\n\n## 📖 Documentation\n\n- Various updates to the docs for 23.06 release (#1538) @cjnolet\n- Rename kernel arch finding function for dispatch (#1536) @mdoijade\n- Adding bfknn and ivf-pq python api to docs (#1507) @cjnolet\n- Add RAPIDS cuDF as a library that supports cuda_array_interface (#1444) @miguelusque\n\n## 🚀 New Features\n\n- IVF-PQ: manipulating individual lists (#1298) @achirkin\n- Gram matrix support for sparse input (#1296) @mfoerste4\n- [FEA] Add randomized svd from cusolver (#1000) @lowener\n\n## 🛠️ Improvements\n\n- Require Numba 0.57.0+ (#1559) @jakirkham\n- remove device_resources include from linalg::map (#1540) @benfred\n- Learn heuristic to pick fastest select_k algorithm (#1523) @benfred\n- [REVIEW] make raft::cache::Cache protected to allow overrides (#1522) @mfoerste4\n- [REVIEW] Fix padding assertion in sparse Gram evaluation (#1521) @mfoerste4\n- run docs nightly too (#1520) @AyodeAwe\n- Switch back to using primary shared-action-workflows branch (#1519) @vyasr\n- Python API for IVF-Flat serialization (#1516) @tfeher\n- Introduce sample filtering to IVFPQ index search (#1513) @alexanderguzhva\n- Migrate from raft::device_resources -&gt; raft::resources (#1510) @benfred\n- Use rmm allocator in CAGRA prune (#1503) @enp1s0\n- Update recipes to GTest version &gt;=1.13.0 (#1501) @bdice\n- Remove raft\u002Fmatrix\u002Fmatrix.cuh includes (#1498) @benfred\n- Generate dataset of select_k times (#1497) @benfred\n- Re-use memory pool between benchmark runs (#1495) @benfred\n- Support CUDA 
12.0 for pip wheels (#1489) @divyegala\n- Update cupy dependency (#1488) @vyasr\n- Enable sccache hits from local builds (#1478) @AyodeAwe\n- Build wheels using new single image workflow (#1477) @vyasr\n- Revert shared-action-workflows pin (#1475) @divyegala\n- CAGRA: Separate graph index sorting functionality from prune function (#1471) @enp1s0\n- Add generic reduction functions and separate reductions\u002Fwarp_primitives (#1470) @akifcorduk\n- [ENH] [FINAL] Header structure: combine all PRs into one (#1469) @ahendriksen\n- use `matrix::select_k` in brute_force::knn call (#1463) @benfred\n- Dropping Python 3.8 (#1454) @divyegala\n- Fix linalg::map to work with non-power-of-2-sized types again (#1453) @ahendriksen\n- [ENH] Enable building with clang (limit strict error checking to GCC) (#1452) @ahendriksen\n- Remove usage of rapids-get-rapids-version-from-git (#1436) @jjacobelli\n- Minor Updates to Sparse Structures (#1432) @divyegala\n- Use nvtx3 includes. (#1431) @bdice\n- Remove wheel pytest verbosity (#1424) @sevagh\n- Add python bindings for matrix::select_k (#1422) @benfred\n- Using `raft::resources` across `raft::random` (#1420) @cjnolet\n- Generate build metrics report for test and benchmarks (#1414) @divyegala\n- Update clang-format to 16.0.1. 
(#1412) @bdice\n- Use ARC V2 self-hosted runners for GPU jobs (#1410) @jjacobelli\n- Remove uses-setup-env-vars (#1406) @vyasr\n- Resolve conflicts in auto-merger of `branch-23.06` and `branch-23.04` (#1403) @galipremsagar\n- Adding base header-","2023-07-05T17:06:26",{"id":281,"version":282,"summary_zh":283,"released_at":284},102236,"v23.06.01","## 🚨 Breaking Changes\n\n- ivf-pq::search: fix the indexing type of the query-related mdspan arguments (#1539) @achirkin\n- Dropping Python 3.8 (#1454) @divyegala\n\n## 🐛 Bug Fixes\n\n- Assigning Deterministic rank to Dask Workers Based on CUDA_VISIBLE_DEVICES (branch-23.06) (#1587) @VibhuJawa\n- [HOTFIX] Fix  distance metrics L2\u002Fcosine\u002Fcorrelation when X &amp; Y are same buffer but with different shape and add unit test for such case. (#1571) @mdoijade\n- Using raft::resources in rsvd (#1543) @cjnolet\n- ivf-pq::search: fix the indexing type of the query-related mdspan arguments (#1539) @achirkin\n- Check python brute-force knn inputs (#1537) @benfred\n- Fix failing TiledKNNTest unittest (#1533) @benfred\n- ivf-flat: fix incorrect recomputed size of the index (#1525) @achirkin\n- ivf-flat: limit the workspace size of the search via batching (#1515) @achirkin\n- Support uint64_t in CAGRA index data type (#1514) @enp1s0\n- Workaround for cuda 12 issue in cusparse (#1508) @cjnolet\n- Un-scale output distances (#1499) @achirkin\n- Inline get_cache_idx (#1492) @ahendriksen\n- Pin to scikit-build&lt;17.2 (#1487) @vyasr\n- Remove pool_size() calls from debug printouts (#1484) @tfeher\n- Add missing ext declaration for log detail::format (#1482) @tfeher\n- Remove include statements from inside namespace (#1467) @robertmaynard\n- Use pin_compatible to ensure that lower CTKs can be used (#1462) @vyasr\n- fix ivf_pq n_probes (#1456) @benfred\n- The glog project root CMakeLists.txt is where we should build from (#1442) @robertmaynard\n- Add missing resource factory virtual destructor (#1433) @cjnolet\n- Removing cuda stream 
view include from mdarray (#1429) @cjnolet\n- Fix dim param for IVF-PQ wrapper in ANN bench (#1427) @tfeher\n- Remove MetricProcessor code from brute_force::knn (#1426) @benfred\n- Fix is_min_close (#1419) @benfred\n- Have consistent compile lines between BUILD_TESTS enabled or not (#1401) @robertmaynard\n- Fix ucx-py pin in raft-dask recipe (#1396) @vyasr\n\n## 📖 Documentation\n\n- Various updates to the docs for 23.06 release (#1538) @cjnolet\n- Rename kernel arch finding function for dispatch (#1536) @mdoijade\n- Adding bfknn and ivf-pq python api to docs (#1507) @cjnolet\n- Add RAPIDS cuDF as a library that supports cuda_array_interface (#1444) @miguelusque\n\n## 🚀 New Features\n\n- IVF-PQ: manipulating individual lists (#1298) @achirkin\n- Gram matrix support for sparse input (#1296) @mfoerste4\n- [FEA] Add randomized svd from cusolver (#1000) @lowener\n\n## 🛠️ Improvements\n\n- Require Numba 0.57.0+ (#1559) @jakirkham\n- remove device_resources include from linalg::map (#1540) @benfred\n- Learn heuristic to pick fastest select_k algorithm (#1523) @benfred\n- [REVIEW] make raft::cache::Cache protected to allow overrides (#1522) @mfoerste4\n- [REVIEW] Fix padding assertion in sparse Gram evaluation (#1521) @mfoerste4\n- run docs nightly too (#1520) @AyodeAwe\n- Switch back to using primary shared-action-workflows branch (#1519) @vyasr\n- Python API for IVF-Flat serialization (#1516) @tfeher\n- Introduce sample filtering to IVFPQ index search (#1513) @alexanderguzhva\n- Migrate from raft::device_resources -&gt; raft::resources (#1510) @benfred\n- Use rmm allocator in CAGRA prune (#1503) @enp1s0\n- Update recipes to GTest version &gt;=1.13.0 (#1501) @bdice\n- Remove raft\u002Fmatrix\u002Fmatrix.cuh includes (#1498) @benfred\n- Generate dataset of select_k times (#1497) @benfred\n- Re-use memory pool between benchmark runs (#1495) @benfred\n- Support CUDA 12.0 for pip wheels (#1489) @divyegala\n- Update cupy dependency (#1488) @vyasr\n- Enable sccache hits from 
local builds (#1478) @AyodeAwe\n- Build wheels using new single image workflow (#1477) @vyasr\n- Revert shared-action-workflows pin (#1475) @divyegala\n- CAGRA: Separate graph index sorting functionality from prune function (#1471) @enp1s0\n- Add generic reduction functions and separate reductions\u002Fwarp_primitives (#1470) @akifcorduk\n- [ENH] [FINAL] Header structure: combine all PRs into one (#1469) @ahendriksen\n- use `matrix::select_k` in brute_force::knn call (#1463) @benfred\n- Dropping Python 3.8 (#1454) @divyegala\n- Fix linalg::map to work with non-power-of-2-sized types again (#1453) @ahendriksen\n- [ENH] Enable building with clang (limit strict error checking to GCC) (#1452) @ahendriksen\n- Remove usage of rapids-get-rapids-version-from-git (#1436) @jjacobelli\n- Minor Updates to Sparse Structures (#1432) @divyegala\n- Use nvtx3 includes. (#1431) @bdice\n- Remove wheel pytest verbosity (#1424) @sevagh\n- Add python bindings for matrix::select_k (#1422) @benfred\n- Using `raft::resources` across `raft::random` (#1420) @cjnolet\n- Generate build metrics report for test and benchmarks (#1414) @divyegala\n- Update clang-format to 16.0.1. (#1412) @bdice\n- Use ARC V2 self-hosted runners for GPU jobs (#1410) @jjacobelli\n- Remove uses-setup-env-vars (#1406) @vyasr\n- Resolve conflicts in auto-merger of `branch-23.06` and `branch-23.04` (#1403) @galipremsagar\n- Adding base header-only conda package without cuda math libs (#1386) @cj","2023-06-12T15:49:09",{"id":286,"version":287,"summary_zh":288,"released_at":289},102237,"v23.06.00","## 🚨 Breaking Changes\n\n- ivf-pq::search: fix the indexing type of the query-related mdspan arguments (#1539) @achirkin\n- Dropping Python 3.8 (#1454) @divyegala\n\n## 🐛 Bug Fixes\n\n- [HOTFIX] Fix  distance metrics L2\u002Fcosine\u002Fcorrelation when X &amp; Y are same buffer but with different shape and add unit test for such case. 
(#1571) @mdoijade\n- Using raft::resources in rsvd (#1543) @cjnolet\n- ivf-pq::search: fix the indexing type of the query-related mdspan arguments (#1539) @achirkin\n- Check python brute-force knn inputs (#1537) @benfred\n- Fix failing TiledKNNTest unittest (#1533) @benfred\n- ivf-flat: fix incorrect recomputed size of the index (#1525) @achirkin\n- ivf-flat: limit the workspace size of the search via batching (#1515) @achirkin\n- Support uint64_t in CAGRA index data type (#1514) @enp1s0\n- Workaround for cuda 12 issue in cusparse (#1508) @cjnolet\n- Un-scale output distances (#1499) @achirkin\n- Inline get_cache_idx (#1492) @ahendriksen\n- Pin to scikit-build&lt;17.2 (#1487) @vyasr\n- Remove pool_size() calls from debug printouts (#1484) @tfeher\n- Add missing ext declaration for log detail::format (#1482) @tfeher\n- Remove include statements from inside namespace (#1467) @robertmaynard\n- Use pin_compatible to ensure that lower CTKs can be used (#1462) @vyasr\n- fix ivf_pq n_probes (#1456) @benfred\n- The glog project root CMakeLists.txt is where we should build from (#1442) @robertmaynard\n- Add missing resource factory virtual destructor (#1433) @cjnolet\n- Removing cuda stream view include from mdarray (#1429) @cjnolet\n- Fix dim param for IVF-PQ wrapper in ANN bench (#1427) @tfeher\n- Remove MetricProcessor code from brute_force::knn (#1426) @benfred\n- Fix is_min_close (#1419) @benfred\n- Have consistent compile lines between BUILD_TESTS enabled or not (#1401) @robertmaynard\n- Fix ucx-py pin in raft-dask recipe (#1396) @vyasr\n\n## 📖 Documentation\n\n- Various updates to the docs for 23.06 release (#1538) @cjnolet\n- Rename kernel arch finding function for dispatch (#1536) @mdoijade\n- Adding bfknn and ivf-pq python api to docs (#1507) @cjnolet\n- Add RAPIDS cuDF as a library that supports cuda_array_interface (#1444) @miguelusque\n\n## 🚀 New Features\n\n- IVF-PQ: manipulating individual lists (#1298) @achirkin\n- Gram matrix support for sparse input 
(#1296) @mfoerste4\n- [FEA] Add randomized svd from cusolver (#1000) @lowener\n\n## 🛠️ Improvements\n\n- Require Numba 0.57.0+ (#1559) @jakirkham\n- remove device_resources include from linalg::map (#1540) @benfred\n- Learn heuristic to pick fastest select_k algorithm (#1523) @benfred\n- [REVIEW] make raft::cache::Cache protected to allow overrides (#1522) @mfoerste4\n- [REVIEW] Fix padding assertion in sparse Gram evaluation (#1521) @mfoerste4\n- run docs nightly too (#1520) @AyodeAwe\n- Switch back to using primary shared-action-workflows branch (#1519) @vyasr\n- Python API for IVF-Flat serialization (#1516) @tfeher\n- Introduce sample filtering to IVFPQ index search (#1513) @alexanderguzhva\n- Migrate from raft::device_resources -&gt; raft::resources (#1510) @benfred\n- Use rmm allocator in CAGRA prune (#1503) @enp1s0\n- Update recipes to GTest version &gt;=1.13.0 (#1501) @bdice\n- Remove raft\u002Fmatrix\u002Fmatrix.cuh includes (#1498) @benfred\n- Generate dataset of select_k times (#1497) @benfred\n- Re-use memory pool between benchmark runs (#1495) @benfred\n- Support CUDA 12.0 for pip wheels (#1489) @divyegala\n- Update cupy dependency (#1488) @vyasr\n- Enable sccache hits from local builds (#1478) @AyodeAwe\n- Build wheels using new single image workflow (#1477) @vyasr\n- Revert shared-action-workflows pin (#1475) @divyegala\n- CAGRA: Separate graph index sorting functionality from prune function (#1471) @enp1s0\n- Add generic reduction functions and separate reductions\u002Fwarp_primitives (#1470) @akifcorduk\n- [ENH] [FINAL] Header structure: combine all PRs into one (#1469) @ahendriksen\n- use `matrix::select_k` in brute_force::knn call (#1463) @benfred\n- Dropping Python 3.8 (#1454) @divyegala\n- Fix linalg::map to work with non-power-of-2-sized types again (#1453) @ahendriksen\n- [ENH] Enable building with clang (limit strict error checking to GCC) (#1452) @ahendriksen\n- Remove usage of rapids-get-rapids-version-from-git (#1436) @jjacobelli\n- Minor 
Updates to Sparse Structures (#1432) @divyegala\n- Use nvtx3 includes. (#1431) @bdice\n- Remove wheel pytest verbosity (#1424) @sevagh\n- Add python bindings for matrix::select_k (#1422) @benfred\n- Using `raft::resources` across `raft::random` (#1420) @cjnolet\n- Generate build metrics report for test and benchmarks (#1414) @divyegala\n- Update clang-format to 16.0.1. (#1412) @bdice\n- Use ARC V2 self-hosted runners for GPU jobs (#1410) @jjacobelli\n- Remove uses-setup-env-vars (#1406) @vyasr\n- Resolve conflicts in auto-merger of `branch-23.06` and `branch-23.04` (#1403) @galipremsagar\n- Adding base header-only conda package without cuda math libs (#1386) @cjnolet\n- Fix IVF-PQ API to use `device_vector_view` (#1384) @lowener\n- Branch 23.06 merge 23.04 (#1379) @vyasr\n-","2023-06-07T15:34:50"]