[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-lance-format--lance":3,"tool-lance-format--lance":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,3,"2026-04-05T11:01:52",[13,14,15],"开发框架","图像","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",138956,2,"2026-04-05T11:33:21",[13,15,26],"语言模型",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 
绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat 是一款轻量且极速的 AI 助手，旨在为用户提供流畅、跨平台的大模型交互体验。它完美解决了用户在多设备间切换时难以保持对话连续性，以及面对众多 AI 模型不知如何统一管理的痛点。无论是日常办公、学习辅助还是创意激发，NextChat 都能让用户随时随地通过网页、iOS、Android、Windows、MacOS 或 Linux 端无缝接入智能服务。\n\n这款工具非常适合普通用户、学生、职场人士以及需要私有化部署的企业团队使用。对于开发者而言，它也提供了便捷的自托管方案，支持一键部署到 Vercel 或 Zeabur 等平台。\n\nNextChat 的核心亮点在于其广泛的模型兼容性，原生支持 Claude、DeepSeek、GPT-4 及 Gemini Pro 等主流大模型，让用户在一个界面即可自由切换不同 AI 能力。此外，它还率先支持 MCP（Model Context Protocol）协议，增强了上下文处理能力。针对企业用户，NextChat 提供专业版解决方案，具备品牌定制、细粒度权限控制、内部知识库整合及安全审计等功能，满足公司对数据隐私和个性化管理的高标准要求。",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"数据工具","视频","插件","其他","音频",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":80,"owner_email":81,"owner_twitter":80,"owner_website":82,"owner_url":83,"languages":84,"stars":122,"forks":123,"last_commit_at":124,"license":125,"difficulty_score":23,"env_os":126,"env_gpu":127,"env_ram":127,"env_deps":128,"category_tags":137,"github_topics":138,"view_count":155,"oss_zip_url":80,"oss_zip_packed_at":80,"status":16,"created_at":156,"updated_at":157,"faqs":158,"releases":179},966,"lance-format\u002Flance","lance","Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. 
Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.","Lance 是一个专为多模态 AI 设计的开源湖仓格式，旨在高效存储、查询和管理图像、视频、音频、文本及嵌入等多模态数据。它通过创新的文件格式、表格式和目录规范，帮助用户在对象存储上构建高性能的数据湖仓，从而支持复杂的 AI 工作流。\n\nLance 解决了传统数据格式（如 Parquet）在随机访问性能、向量搜索和数据版本管理上的不足。相比 Parquet 或 Iceberg，Lance 的随机访问速度快 100 倍，同时保持高效的扫描性能。此外，它支持混合搜索（向量相似性搜索、全文搜索和 SQL 分析），并提供零拷贝版本控制和 ACID 事务，非常适合需要频繁更新和回溯的机器学习特征工程。\n\n对于开发者和研究人员来说，Lance 提供了与 Pandas、DuckDB、PyArrow 和 PyTorch 等流行工具的无缝集成，降低了使用门槛。无论是构建搜索引擎、特征存储，还是进行大规模机器学习训练，Lance 都能显著提升效率。其独特的技术亮点包括原生多模态数据支持、高效的二进制编码和懒加载机制，以及无需额外基础设施的自动版本管理。\n\n如果你需要处理复杂多模态数据或优化 AI 数据管道","Lance 是一个专为多模态 AI 设计的开源湖仓格式，旨在高效存储、查询和管理图像、视频、音频、文本及嵌入等多模态数据。它通过创新的文件格式、表格式和目录规范，帮助用户在对象存储上构建高性能的数据湖仓，从而支持复杂的 AI 工作流。\n\nLance 解决了传统数据格式（如 Parquet）在随机访问性能、向量搜索和数据版本管理上的不足。相比 Parquet 或 Iceberg，Lance 的随机访问速度快 100 倍，同时保持高效的扫描性能。此外，它支持混合搜索（向量相似性搜索、全文搜索和 SQL 分析），并提供零拷贝版本控制和 ACID 事务，非常适合需要频繁更新和回溯的机器学习特征工程。\n\n对于开发者和研究人员来说，Lance 提供了与 Pandas、DuckDB、PyArrow 和 PyTorch 等流行工具的无缝集成，降低了使用门槛。无论是构建搜索引擎、特征存储，还是进行大规模机器学习训练，Lance 都能显著提升效率。其独特的技术亮点包括原生多模态数据支持、高效的二进制编码和懒加载机制，以及无需额外基础设施的自动版本管理。\n\n如果你需要处理复杂多模态数据或优化 AI 数据管道，Lance 是一个值得尝试的工具。只需几行代码，即可将现有 Parquet 数据转换为 Lance 格式，快速体验其卓越性能。","\u003Cdiv align=\"center\">\n\u003Cp align=\"center\">\n\n\u003Cimg width=\"257\" alt=\"Lance Logo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flance-format_lance_readme_9636001a59ef.png\">\n\n**The Open Lakehouse Format for Multimodal AI**\u003Cbr\u002F>\n**High-performance vector search, full-text search, random access, and feature engineering capabilities for the lakehouse.**\u003Cbr\u002F>\n**Compatible with Pandas, DuckDB, Polars, PyArrow, Ray, Spark, and more integrations on the way.**\n\n\u003Ca href=\"https:\u002F\u002Flance.org\">Documentation\u003C\u002Fa> •\n\u003Ca 
href=\"https:\u002F\u002Flance.org\u002Fcommunity\">Community\u003C\u002Fa> •\n\u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002Flance\">Discord\u003C\u002Fa>\n\n[CI]: 
https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Factions\u002Fworkflows\u002Frust.yml\n[CI Badge]: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Factions\u002Fworkflows\u002Frust.yml\u002Fbadge.svg\n[Docs]: https:\u002F\u002Flance.org\n[Docs Badge]: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-passing-brightgreen\n[crates.io]: https:\u002F\u002Fcrates.io\u002Fcrates\u002Flance\n[crates.io badge]: https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Flance.svg\n[Python versions]: https:\u002F\u002Fpypi.org\u002Fproject\u002Fpylance\u002F\n[Python versions badge]: https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fpylance\n\n[![CI Badge]][CI]\n[![Docs Badge]][Docs]\n[![crates.io badge]][crates.io]\n[![Python versions badge]][Python versions]\n\n\u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\u003Chr \u002F>\n\nLance is an open lakehouse format for multimodal AI. It contains a file format, table format, and catalog spec that allows you to build a complete lakehouse on top of object storage to power your AI workflows. Lance is perfect for:\n\n1. Building search engines and feature stores with hybrid search capabilities.\n2. Large-scale ML training requiring high performance IO and random access.\n3. 
Storing, querying, and managing multimodal data including images, videos, audio, text, and embeddings.\n\nThe key features of Lance include:\n\n* **Expressive hybrid search:** Combine vector similarity search, full-text search (BM25), and SQL analytics on the same dataset with accelerated secondary indices.\n\n* **Lightning-fast random access:** 100x faster than Parquet or Iceberg for random access without sacrificing scan performance.\n\n* **Native multimodal data support:** Store images, videos, audio, text, and embeddings in a single unified format with efficient blob encoding and lazy loading.\n\n* **Data evolution:** Efficiently add columns with backfilled values without full table rewrites, perfect for ML feature engineering.\n\n* **Zero-copy versioning:** Automatic versioning with ACID transactions, time travel, tags, and branches—no extra infrastructure needed.\n\n* **Rich ecosystem integrations:** Apache Arrow, Pandas, Polars, DuckDB, Apache Spark, Ray, Trino, Apache Flink, and open catalogs (Apache Polaris, Unity Catalog, Apache Gravitino).\n\nFor more details, see the full [Lance format specification](https:\u002F\u002Flance.org\u002Fformat).\n\n> [!TIP]\n> Lance is in active development and we welcome contributions. Please see our [contributing guide](https:\u002F\u002Flance.org\u002Fcommunity\u002Fcontributing\u002F) for more information.\n\n## Quick Start\n\n**Installation**\n\n```shell\npip install pylance\n```\n\nTo install a preview release:\n\n```shell\npip install --pre --extra-index-url https:\u002F\u002Fpypi.fury.io\u002Flance-format\u002Fpylance\n```\n\n> [!TIP]\n> Preview releases are released more often than full releases and contain the\n> latest features and bug fixes. They receive the same level of testing as full releases.\n> We guarantee they will remain published and available for download for at\n> least 6 months. 
When you want to pin to a specific version, prefer a stable release.\n\n**Converting to Lance**\n\n```python\nimport lance\n\nimport pandas as pd\nimport pyarrow as pa\nimport pyarrow.dataset\n\ndf = pd.DataFrame({\"a\": [5], \"b\": [10]})\nuri = \"\u002Ftmp\u002Ftest.parquet\"\ntbl = pa.Table.from_pandas(df)\npa.dataset.write_dataset(tbl, uri, format='parquet')\n\nparquet = pa.dataset.dataset(uri, format='parquet')\nlance.write_dataset(parquet, \"\u002Ftmp\u002Ftest.lance\")\n```\n\n**Reading Lance data**\n```python\ndataset = lance.dataset(\"\u002Ftmp\u002Ftest.lance\")\nassert isinstance(dataset, pa.dataset.Dataset)\n```\n\n**Pandas**\n```python\ndf = dataset.to_table().to_pandas()\ndf\n```\n\n**DuckDB**\n```python\nimport duckdb\n\n# If this segfaults, make sure you have duckdb v0.7+ installed\nduckdb.query(\"SELECT * FROM dataset LIMIT 10\").to_df()\n```\n\n**Vector search**\n\nDownload the sift1m subset\n\n```shell\nwget ftp:\u002F\u002Fftp.irisa.fr\u002Flocal\u002Ftexmex\u002Fcorpus\u002Fsift.tar.gz\ntar -xzf sift.tar.gz\n```\n\nConvert it to Lance\n\n```python\nimport lance\nfrom lance.vector import vec_to_table\nimport numpy as np\n\nnvecs = 1000000\nndims = 128\nwith open(\"sift\u002Fsift_base.fvecs\", mode=\"rb\") as fobj:\n    buf = fobj.read()\n    # Each .fvecs record is an int32 dimension header followed by ndims float32 values:\n    # view the file as rows of (ndims + 1) words and drop the header column\n    data = np.frombuffer(buf, dtype=\"\u003Cf4\").reshape(-1, ndims + 1)[:, 1:]\n    dd = dict(zip(range(nvecs), data))\n\ntable = vec_to_table(dd)\nuri = \"vec_data.lance\"\nsift1m = lance.write_dataset(table, uri, max_rows_per_group=8192, max_rows_per_file=1024*1024)\n```\n\nBuild the index\n\n```python\nsift1m.create_index(\"vector\",\n                    index_type=\"IVF_PQ\",\n                    num_partitions=256,  # IVF\n                    num_sub_vectors=16)  # PQ\n```\n\nSearch the dataset\n\n```python\n# Get 
top 10 similar vectors\nimport duckdb\n\ndataset = lance.dataset(uri)\n\n# Sample 100 query vectors. 
If this segfaults, make sure you have duckdb v0.7+ installed\nsample = duckdb.query(\"SELECT vector FROM dataset USING SAMPLE 100\").to_df()\nquery_vectors = np.array([np.array(x) for x in sample.vector])\n\n# Get nearest neighbors for all of them\nrs = [dataset.to_table(nearest={\"column\": \"vector\", \"k\": 10, \"q\": q})\n      for q in query_vectors]\n```\n\n## Directory structure\n\n| Directory          | Description              |\n|--------------------|--------------------------|\n| [rust](.\u002Frust)     | Core Rust implementation |\n| [python](.\u002Fpython) | Python bindings (PyO3)   |\n| [java](.\u002Fjava)     | Java bindings (JNI)      |\n| [docs](.\u002Fdocs)     | Documentation source     |\n\n## Benchmarks\n\n### Vector search\n\nWe benchmarked vector search on the SIFT dataset (1M vectors, 128 dimensions).\n\n1. For 100 randomly sampled query vectors, the average response time is under 1 ms (on a 2023 M2 MacBook Air).\n\n![avg_latency.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flance-format_lance_readme_a5f33a38c880.png)\n\n2. Approximate nearest neighbor (ANN) search is always a trade-off between recall and performance.\n\n![avg_latency.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flance-format_lance_readme_70966d52e81a.png)\n\n### Vs. Parquet\n\nWe created a Lance dataset from the Oxford Pet dataset to run preliminary performance tests comparing Lance with Parquet and with the raw image\u002FXML files. For analytics queries, Lance is 50-100x faster than reading the raw metadata. 
For batched random access, Lance is 100x faster than both Parquet and the raw files.\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flance-format_lance_readme_ed239a844aec.png)\n\n## Why Lance for AI\u002FML workflows?\n\nThe machine learning development cycle involves multiple stages:\n\n```mermaid\ngraph LR\n    A[Collection] --> B[Exploration];\n    B --> C[Analytics];\n    C --> D[Feature Engineer];\n    D --> E[Training];\n    E --> F[Evaluation];\n    F --> C;\n    E --> G[Deployment];\n    G --> H[Monitoring];\n    H --> A;\n```\n\nTraditional lakehouse formats were designed for SQL analytics and struggle with AI\u002FML workloads that require:\n- **Vector search** for similarity and semantic retrieval\n- **Fast random access** for sampling and interactive exploration\n- **Multimodal data** storage (images, videos, audio alongside embeddings)\n- **Data evolution** for feature engineering without full table rewrites\n- **Hybrid search** combining vectors, full-text, and SQL predicates\n\nWhile existing formats (Parquet, Iceberg, Delta Lake) excel at SQL analytics, they require additional specialized systems for AI capabilities. 
Lance brings these AI-first features directly into the lakehouse format.\n\nA comparison of different formats across ML development stages:\n\n|                     | Lance | Parquet & ORC | JSON & XML | TFRecord | Database | Warehouse |\n|---------------------|-------|---------------|------------|----------|----------|-----------|\n| Analytics           | Fast  | Fast          | Slow       | Slow     | Decent   | Fast      |\n| Feature Engineering | Fast  | Fast          | Decent     | Slow     | Decent   | Good      |\n| Training            | Fast  | Decent        | Slow       | Fast     | N\u002FA      | N\u002FA       |\n| Exploration         | Fast  | Slow          | Fast       | Slow     | Fast     | Decent    |\n| Infra Support       | Rich  | Rich          | Decent     | Limited  | Rich     | Rich      |\n\n","\u003Cdiv align=\"center\">\n\u003Cp align=\"center\">\n\n\u003Cimg width=\"257\" alt=\"Lance Logo\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flance-format_lance_readme_9636001a59ef.png\">\n\n**多模态 AI 的开源湖仓格式（Open Lakehouse Format for Multimodal AI）**\u003Cbr\u002F>\n**为湖仓提供高性能向量搜索、全文搜索、随机访问和特征工程能力。**\u003Cbr\u002F>\n**兼容 Pandas、DuckDB、Polars、PyArrow、Ray、Spark，更多集成正在进行中。**\n\n\u003Ca href=\"https:\u002F\u002Flance.org\">文档\u003C\u002Fa> •\n\u003Ca href=\"https:\u002F\u002Flance.org\u002Fcommunity\">社区\u003C\u002Fa> •\n\u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002Flance\">Discord\u003C\u002Fa>\n\n[CI]: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Factions\u002Fworkflows\u002Frust.yml\n[CI Badge]: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Factions\u002Fworkflows\u002Frust.yml\u002Fbadge.svg\n[Docs]: https:\u002F\u002Flance.org\n[Docs Badge]: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-passing-brightgreen\n[crates.io]: https:\u002F\u002Fcrates.io\u002Fcrates\u002Flance\n[crates.io badge]: https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Flance.svg\n[Python versions]: 
https:\u002F\u002Fpypi.org\u002Fproject\u002Fpylance\u002F\n[Python versions badge]: https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fpylance\n\n[![CI Badge]][CI]\n[![Docs Badge]][Docs]\n[![crates.io badge]][crates.io]\n[![Python versions badge]][Python versions]\n\n\u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\u003Chr \u002F>\n\nLance 是一个多模态 AI 的开源湖仓格式（Open Lakehouse Format）。它包含一个文件格式、表格式和目录规范，允许您在对象存储之上构建完整的湖仓，以支持您的 AI 工作流。Lance 非常适合：\n\n1. 构建具有混合搜索能力的搜索引擎和特征存储。\n2. 需要高性能 IO 和随机访问的大规模机器学习训练。\n3. 存储、查询和管理包括图像、视频、音频、文本和嵌入在内的多模态数据。\n\nLance 的主要特性包括：\n\n* **表达性强的混合搜索：** 在同一数据集上结合向量相似性搜索、全文搜索（BM25）和 SQL 分析，并通过加速的二级索引提升性能。\n\n* **闪电般的随机访问：** 比 Parquet 或 Iceberg 快 100 倍的随机访问速度，同时不牺牲扫描性能。\n\n* **原生多模态数据支持：** 使用高效的二进制编码和延迟加载，将图像、视频、音频、文本和嵌入存储在统一的格式中。\n\n* **数据演进：** 高效添加带有回填值的列，无需重写整个表，非常适合机器学习特征工程。\n\n* **零拷贝版本控制：** 自动版本控制，支持 ACID 事务、时间旅行、标签和分支——无需额外的基础设施。\n\n* **丰富的生态系统集成：** Apache Arrow、Pandas、Polars、DuckDB、Apache Spark、Ray、Trino、Apache Flink 和开放目录（Apache Polaris、Unity Catalog、Apache Gravitino）。\n\n更多细节，请参阅完整的 [Lance 格式规范](https:\u002F\u002Flance.org\u002Fformat)。\n\n> [!TIP]\n> Lance 正在积极开发中，我们欢迎贡献。请参阅我们的 [贡献指南](https:\u002F\u002Flance.org\u002Fcommunity\u002Fcontributing\u002F) 以获取更多信息。\n\n## 快速开始\n\n**安装**\n\n```shell\npip install pylance\n```\n\n安装预览版：\n\n```shell\npip install --pre --extra-index-url https:\u002F\u002Fpypi.fury.io\u002Flance-format\u002Fpylance\n```\n\n> [!TIP]\n> 预览版比正式版发布更频繁，包含最新的功能和错误修复。它们经过与正式版相同的测试级别。\n> 我们保证它们至少会在未来 6 个月内保持发布并可供下载。当您需要固定到特定版本时，请优先选择稳定版。\n\n**转换为 Lance 格式**\n\n```python\nimport lance\n\nimport pandas as pd\nimport pyarrow as pa\nimport pyarrow.dataset\n\ndf = pd.DataFrame({\"a\": [5], \"b\": [10]})\nuri = \"\u002Ftmp\u002Ftest.parquet\"\ntbl = pa.Table.from_pandas(df)\npa.dataset.write_dataset(tbl, uri, format='parquet')\n\nparquet = pa.dataset.dataset(uri, format='parquet')\nlance.write_dataset(parquet, \"\u002Ftmp\u002Ftest.lance\")\n```\n\n**读取 Lance 数据**\n```python\ndataset = 
lance.dataset(\"\u002Ftmp\u002Ftest.lance\")\nassert isinstance(dataset, pa.dataset.Dataset)\n```\n\n**Pandas**\n```python\ndf = dataset.to_table().to_pandas()\ndf\n```\n\n**DuckDB**\n```python\nimport duckdb\n\n# 如果出现段错误，请确保已安装 duckdb v0.7+ 版本\nduckdb.query(\"SELECT * FROM dataset LIMIT 10\").to_df()\n```\n\n**向量搜索**\n\n下载 sift1m 子集\n\n```shell\nwget ftp:\u002F\u002Fftp.irisa.fr\u002Flocal\u002Ftexmex\u002Fcorpus\u002Fsift.tar.gz\ntar -xzf sift.tar.gz\n```\n\n将其转换为 Lance 格式\n\n```python\nimport lance\nfrom lance.vector import vec_to_table\nimport numpy as np\n\nnvecs = 1000000\nndims = 128\nwith open(\"sift\u002Fsift_base.fvecs\", mode=\"rb\") as fobj:\n    buf = fobj.read()\n    # 每条 .fvecs 记录由一个 int32 维度头和 ndims 个 float32 组成：\n    # 按 (ndims + 1) 列解析后丢弃维度头所在的第一列\n    data = np.frombuffer(buf, dtype=\"\u003Cf4\").reshape(-1, ndims + 1)[:, 1:]\n    dd = dict(zip(range(nvecs), data))\n\ntable = vec_to_table(dd)\nuri = \"vec_data.lance\"\nsift1m = lance.write_dataset(table, uri, max_rows_per_group=8192, max_rows_per_file=1024*1024)\n```\n\n构建索引\n\n```python\nsift1m.create_index(\"vector\",\n                    index_type=\"IVF_PQ\",\n                    num_partitions=256,  # IVF\n                    num_sub_vectors=16)  # PQ\n```\n\n搜索数据集\n\n```python\n# 获取前 10 个最相似的向量\nimport duckdb\n\ndataset = lance.dataset(uri)\n\n# 抽样 100 个查询向量。如果出现段错误，请确保已安装 duckdb v0.7+ 版本\nsample = duckdb.query(\"SELECT vector FROM dataset USING SAMPLE 100\").to_df()\nquery_vectors = np.array([np.array(x) for x in sample.vector])\n\n# 获取所有查询向量的最近邻\nrs = [dataset.to_table(nearest={\"column\": \"vector\", \"k\": 10, \"q\": q})\n      for q in query_vectors]\n```\n\n## 目录结构\n\n| 目录              | 描述                  |\n|-------------------|-----------------------|\n| [rust](.\u002Frust)    | 核心 Rust 实现       |\n| [python](.\u002Fpython)| Python 绑定 (PyO3)   |\n| [java](.\u002Fjava)    | Java 绑定 (JNI)      |\n| [docs](.\u002Fdocs)    | 文档源码             |\n\n## 
性能基准\n\n### 向量搜索\n\n我们使用 SIFT 数据集对 1M 个 128 维向量进行基准测试\n\n1. 
对于 100 个随机抽样的查询向量，平均响应时间 \u003C1ms（在 2023 年款 M2 MacBook Air 上）\n\n![avg_latency.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flance-format_lance_readme_a5f33a38c880.png)\n\n2. 近似最近邻（ANNs）始终是召回率和性能之间的权衡\n\n![avg_latency.png](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flance-format_lance_readme_70966d52e81a.png)\n\n### 与 Parquet 对比\n\n我们使用 Oxford Pet 数据集创建了一个 Lance 数据集，以对 Lance 与 Parquet 和原始图像\u002FXML 文件进行初步性能测试。对于分析查询，Lance 比直接读取原始元数据快 50-100 倍。对于批量随机访问，Lance 比 Parquet 和原始文件快 100 倍。\n\n![](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flance-format_lance_readme_ed239a844aec.png)\n\n## 为什么选择 Lance 用于 AI\u002FML 工作流？\n\n机器学习开发周期包含多个阶段：\n\n```mermaid\ngraph LR\n    A[Collection] --> B[Exploration];\n    B --> C[Analytics];\n    C --> D[Feature Engineer];\n    D --> E[Training];\n    E --> F[Evaluation];\n    F --> C;\n    E --> G[Deployment];\n    G --> H[Monitoring];\n    H --> A;\n```\n\n传统的湖仓格式（lakehouse formats）专为 SQL 分析设计，在处理需要以下能力的 AI\u002FML 工作负载时表现不佳：\n- **向量搜索**（Vector search），用于相似性和语义检索\n- **快速随机访问**（Fast random access），用于采样和交互式探索\n- **多模态数据存储**（Multimodal data storage），支持图像、视频、音频以及嵌入（embeddings）的存储\n- **数据演化**（Data evolution），在无需重写整个表的情况下进行特征工程\n- **混合搜索**（Hybrid search），结合向量、全文检索和 SQL 谓词\n\n尽管现有格式（如 Parquet、Iceberg、Delta Lake）在 SQL 分析方面表现出色，但它们需要额外的专业系统来实现 AI 功能。Lance 将这些以 AI 为核心的功能直接集成到湖仓格式中。\n\n以下是不同格式在 ML 开发阶段中的对比：\n\n|                     | Lance | Parquet & ORC | JSON & XML | TFRecord | Database | Warehouse |\n|---------------------|-------|---------------|------------|----------|----------|-----------|\n| 分析（Analytics）   | 快速  | 快速          | 慢         | 慢       | 一般     | 快速      |\n| 特征工程（Feature Engineering） | 快速  | 快速          | 一般       | 慢       | 一般     | 良好      |\n| 训练（Training）    | 快速  | 一般          | 慢         | 快速     | 不适用   | 不适用    |\n| 探索（Exploration） | 快速  | 慢            | 快速       | 慢       | 快速     | 一般      |\n| 基础设施支持（Infra Support） | 丰富  | 丰富          | 一般       | 有限     | 丰富     | 丰富      |","# 
Lance 快速上手指南\n\nLance 是一个专为多模态 AI 设计的开源湖仓格式，支持高性能向量搜索、全文搜索、随机访问和特征工程能力。以下是快速上手指南。\n\n---\n\n## 环境准备\n\n### 系统要求\n- Python 3.7 或更高版本\n- 支持的操作系统：Linux、macOS 和 Windows（WSL2 推荐）\n\n### 前置依赖\n确保已安装以下工具：\n- `pandas`\n- `pyarrow`\n- （可选）`duckdb`（用于 SQL 查询）\n\n可以通过以下命令安装依赖：\n```shell\npip install pandas pyarrow duckdb\n```\n\n---\n\n## 安装步骤\n\n安装稳定版：\n```shell\npip install pylance\n```\n\n安装预览版（包含最新功能和修复）：\n```shell\npip install --pre --extra-index-url https:\u002F\u002Fpypi.fury.io\u002Flance-format\u002Fpylance\n```\n\n> **提示**：预览版经过与稳定版相同的测试流程，并保证至少 6 个月的可用性。\n\n---\n\n## 基本使用\n\n### 转换为 Lance 格式\n以下示例展示如何将 Parquet 数据转换为 Lance 格式：\n\n```python\nimport lance\nimport pandas as pd\nimport pyarrow as pa\nimport pyarrow.dataset\n\n# 创建示例数据\ndf = pd.DataFrame({\"a\": [5], \"b\": [10]})\nuri = \"\u002Ftmp\u002Ftest.parquet\"\ntbl = pa.Table.from_pandas(df)\npa.dataset.write_dataset(tbl, uri, format='parquet')\n\n# 转换为 Lance 格式\nparquet = pa.dataset.dataset(uri, format='parquet')\nlance.write_dataset(parquet, \"\u002Ftmp\u002Ftest.lance\")\n```\n\n### 读取 Lance 数据\n```python\ndataset = lance.dataset(\"\u002Ftmp\u002Ftest.lance\")\nassert isinstance(dataset, pa.dataset.Dataset)\n```\n\n### 使用 Pandas 查看数据\n```python\ndf = dataset.to_table().to_pandas()\nprint(df)\n```\n\n### 使用 DuckDB 查询数据\n```python\nimport duckdb\n\n# 如果出现段错误，请确保安装了 DuckDB v0.7+ 版本\nresult = duckdb.query(\"SELECT * FROM dataset LIMIT 10\").to_df()\nprint(result)\n```\n\n### 向量搜索示例\n以下代码展示如何对 SIFT 数据集进行向量搜索：\n\n#### 下载并转换数据\n```shell\nwget ftp:\u002F\u002Fftp.irisa.fr\u002Flocal\u002Ftexmex\u002Fcorpus\u002Fsift.tar.gz\ntar -xzf sift.tar.gz\n```\n\n```python\nimport lance\nfrom lance.vector import vec_to_table\nimport numpy as np\n\nnvecs = 1000000\nndims = 128\nwith open(\"sift\u002Fsift_base.fvecs\", mode=\"rb\") as fobj:\n    buf = fobj.read()\n    # 每条 .fvecs 
记录由一个 int32 维度头和 ndims 个 float32 组成：\n    # 按 (ndims + 1) 列解析后丢弃维度头所在的第一列\n    data = np.frombuffer(buf, dtype=\"\u003Cf4\").reshape(-1, ndims + 1)[:, 1:]\n    dd = dict(zip(range(nvecs), 
data))\n\ntable = vec_to_table(dd)\nuri = \"vec_data.lance\"\nsift1m = lance.write_dataset(table, uri, max_rows_per_group=8192, max_rows_per_file=1024*1024)\n```\n\n#### 构建索引\n```python\nsift1m.create_index(\"vector\",\n                    index_type=\"IVF_PQ\",\n                    num_partitions=256,  # IVF\n                    num_sub_vectors=16)  # PQ\n```\n\n#### 搜索数据\n```python\nimport duckdb\n\ndataset = lance.dataset(uri)\n\n# 随机采样 100 个查询向量\nsample = duckdb.query(\"SELECT vector FROM dataset USING SAMPLE 100\").to_df()\nquery_vectors = np.array([np.array(x) for x in sample.vector])\n\n# 获取最近邻\nrs = [dataset.to_table(nearest={\"column\": \"vector\", \"k\": 10, \"q\": q})\n      for q in query_vectors]\n```\n\n---\n\n以上是 Lance 的快速上手指南，更多高级功能请参考 [官方文档](https:\u002F\u002Flance.org)。","一家电商公司正在构建一个商品推荐系统，需要处理数百万条包含商品图片、描述文本和用户行为数据的多模态数据，并为实时查询提供支持。\n\n### 没有 lance 时\n- 数据存储在 Parquet 文件中，随机访问速度慢，每次查询特定商品信息时都需要加载大量无关数据，导致延迟高。\n- 图片和文本数据分开存储，整合分析困难，开发人员需要额外编写代码来处理不同格式的数据。\n- 数据更新频繁，但每次新增特征列都需要重写整个表，耗费大量存储空间和时间。\n- 缺乏版本管理功能，模型训练时无法追溯历史数据状态，调试和复现结果变得复杂。\n- 查询性能不足，尤其是在结合向量搜索和全文检索时，响应时间难以满足实时需求。\n\n### 使用 lance 后\n- 随机访问速度提升 100 倍，查询特定商品信息几乎瞬时完成，显著改善用户体验。\n- 多模态数据统一存储，图片、文本和嵌入向量可以高效编码和懒加载，简化了数据处理流程。\n- 支持高效列添加和回填功能，无需重写整个表，节省存储空间并加快数据更新速度。\n- 自动版本控制功能让数据变更可追溯，支持时间旅行和分支管理，方便模型调试和结果复现。\n- 结合向量相似性搜索、全文检索和 SQL 分析的能力，大幅提升复杂查询性能，满足实时推荐需求。\n\nlance 的高性能和多功能特性显著提升了多模态数据处理效率，为电商推荐系统的开发和运行提供了坚实基础。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Flance-format_lance_9636001a.png","lance-format","Lance Format","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Flance-format_910182f4.png","The Open Lakehouse Format for Multimodal 
AI",null,"community@lance.org","https:\u002F\u002Flance.org","https:\u002F\u002Fgithub.com\u002Flance-format",[85,89,93,97,101,105,109,113,116,119],{"name":86,"color":87,"percentage":88},"Rust","#dea584",69.9,{"name":90,"color":91,"percentage":92},"HTML","#e34c26",13.1,{"name":94,"color":95,"percentage":96},"Python","#3572A5",9.1,{"name":98,"color":99,"percentage":100},"Java","#b07219",5.3,{"name":102,"color":103,"percentage":104},"Jupyter Notebook","#DA5B0B",2.3,{"name":106,"color":107,"percentage":108},"Shell","#89e051",0.2,{"name":110,"color":111,"percentage":112},"C","#555555",0,{"name":114,"color":115,"percentage":112},"Makefile","#427819",{"name":117,"color":118,"percentage":112},"Handlebars","#f7931e",{"name":120,"color":121,"percentage":112},"Dockerfile","#384d54",6273,610,"2026-04-04T14:32:31","Apache-2.0","Linux, macOS, Windows","未说明",{"notes":129,"python":130,"dependencies":131},"需要安装 DuckDB 0.7+ 版本以避免可能的崩溃问题；支持与 Pandas、PyArrow、DuckDB 等工具集成；建议使用最新稳定版本以获得最佳性能。","3.8+",[132,133,134,135,136],"pyarrow","pandas","duckdb>=0.7","numpy","pylance",[13,51,54,26,14],[139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154],"machine-learning","computer-vision","data-format","deep-learning","python","apache-arrow","duckdb","mlops","data-analysis","data-analytics","data-science","dataops","data-centric","embeddings","rust","llms",4,"2026-03-27T02:49:30.150509","2026-04-06T05:16:10.399107",[159,164,169,174],{"id":160,"question_zh":161,"answer_zh":162,"source_url":163},4277,"Lance 的快照是否支持存储摘要属性？","此功能已在 Python 和 Java SDK 中支持，可以通过以下 PR 验证：https:\u002F\u002Fgithub.com\u002Fapache\u002Ffluss\u002Fpull\u002F1441。此外，Lance 0.33.0 版本已包含相关更新，可以使用该版本调用 `readTransaction` 方法。","https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fissues\u002F4181",{"id":165,"question_zh":166,"answer_zh":167,"source_url":168},4278,"如何解决 MergeInsert 操作中产生的重复行问题？","此问题已通过 PR #4687 修复。解决方案包括：1. 使用主键连接检查写入数据和现有数据之间的重复；2. 
使用布隆过滤器检查增量新数据中的并发 MergeInsert 数据，如果发现重复行，则提交失败。注意，此方法依赖用户始终使用 MergeInsert API 来维护主键唯一性。","https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fissues\u002F4585",{"id":170,"question_zh":171,"answer_zh":172,"source_url":173},4279,"如何减少 Lance 的冷启动时间？","可以通过延迟导入可选依赖（如 Pandas、Numpy）来优化。建议参考 Polars 的实现方式，并确保在 Python 目录下运行，同时使用 `maturin develop` 构建 Python 模块。","https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fissues\u002F1217",{"id":175,"question_zh":176,"answer_zh":177,"source_url":178},4280,"如何启用二进制复制以加速小文件合并？","需要满足以下条件才能启用二进制复制：1. 设置 `CompactionOptions.enable_binary_copy` 为 true；2. 所有要合并的片段必须具有相同的文件版本；3. 片段不能包含删除文件。此功能适用于简单的小文件合并场景，但查询性能可能略低于普通压缩。","https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fissues\u002F5433",[180,185,190,195,200,205,210,215,220,225,230,235,240,245,250,255,260,265,270,275],{"id":181,"version":182,"summary_zh":183,"released_at":184},103699,"v5.0.0-beta.5","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v5.0.0-beta.5 -->\n\n## What's Changed\n### New Features 🎉\n* feat(namespace): add count_table_rows, insert_into_table, query_table by @XuQianJin-Stars in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6132\n* feat: thread data_size through decode pipeline by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6391\n* feat: support index build progress callbacks in Python bindings by @vivek-bharathan in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6394\n* feat: refine logical vector index into an IVF view by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6400\n* feat: optimize one segmented vector segment per run by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6402\n### Bug Fixes 🐛\n* fix: remove legacy tempfile save of IVF centroids in GPU training path by @hushengquan in 
https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6396\n### Other Changes\n* refactor: rename \"region\" to \"shard\" in mem_wal implementation by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6367\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv5.0.0-beta.4...v5.0.0-beta.5","2026-04-04T07:28:48",{"id":186,"version":187,"summary_zh":188,"released_at":189},103700,"v5.0.0-beta.4","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v5.0.0-beta.4 -->\n\n## What's Changed\n### Breaking Changes 🛠\n* feat!: add progress monitoring via callbacks for distributed merge by @vivek-bharathan in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6210\n### New Features 🎉\n* feat: support ingest mode for external blob writes by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6356\n* feat: support vector query pruning by index segments by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6376\n### Bug Fixes 🐛\n* fix: ensure durable writes actually wait for WAL flush by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6368\n* fix: preserve multipart part ordering in throttle wrapper by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6393\n### Performance Improvements 🚀\n* perf: fix O(N²) version column scan with deletion vectors by @pengw0048 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6375\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv5.0.0-beta.3...v5.0.0-beta.4","2026-04-02T21:37:34",{"id":191,"version":192,"summary_zh":193,"released_at":194},103701,"v5.0.0-beta.3","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v5.0.0-beta.3 -->\n\n## What's 
Changed\n### Breaking Changes 🛠\n* refactor!: cleanup namespace related APIs by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6186\n* refactor!: align distributed index build around segments by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6313\n### New Features 🎉\n* feat: io_uring based file reader by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5777\n* feat: add an arrow-stats crate with the ability to calculate basic stats on arrays by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5967\n* feat: change default file format version to 2.1 by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6115\n* feat: expose stable row ids by @ivscheianu in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6325\n* feat: make MAX_MINIBLOCK_VALUES configurable via env var by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6340\n* feat: support bf16 from pytorch dataset by @eddyxu in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6342\n* feat: support DataFusion Expr in DeleteBuilder by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6343\n* feat: support distributed IVF_RQ segment builds by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6359\n* feat: expose branch_identifier in python and java bindings by @majin1102 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6360\n* feat: object store decides scheduler type by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6373\n### Bug Fixes 🐛\n* fix(python): preserve index details in python index metadata by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6279\n* fix: use 
StorageOptions::new() in cloud providers to pick up env vars by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6316\n* refactor: remove row-id ordering from vector index merge by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6332\n* fix: avoid full scan for nullable vector fragment sampling by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6341\n* fix: respect fragment filters during distributed worker training by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6358\n### Documentation 📚\n* docs: clarify helper function guidance in AGENTS.md by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6357\n### Performance Improvements 🚀\n* perf: remove O(n²) performance regression in take() with duplicate indices by @YSBF in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6351\n* perf: reduce bitmap index build to 1 bitmap in RAM by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6371\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv5.0.0-beta.2...v5.0.0-beta.3","2026-04-02T08:54:49",{"id":196,"version":197,"summary_zh":198,"released_at":199},103702,"v5.0.0-beta.2","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v5.0.0-beta.2 -->\n\n## What's Changed\n### New Features 🎉\n* feat: pluggable index cache via CacheBackend trait by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6222\n* feat: add write progress callback to InsertBuilder by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6318\n* feat: support hamming distance in HNSW by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6336\n### Documentation 📚\n* docs: add conflict 
handling and FRI guidance by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6304\n* docs(python): prefer uv for local environment setup by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6335\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv5.0.0-beta.1...v5.0.0-beta.2","2026-03-30T19:12:39",{"id":201,"version":202,"summary_zh":203,"released_at":204},103703,"v4.0.0","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v4.0.0 -->\n\n## What's Changed\n### Breaking Changes 🛠\n* feat!: upgrade DataFusion dependency to 52.1.0 by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6015\n* refactor!: refactor java access to file format version by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6053\n* refactor!: remove create_empty_table usage by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6087\n* fix!: bump IVF_RQ version for compatibility check by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6097\n* perf(inverted)!: reduce fts indexing time and memory by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6174\n* feat: add index segment commit API by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6209\n* refactor!: remove staging from distributed vector indexing by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6269\n### New Features 🎉\n* feat: compress complex all null by @yingjianwu98 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F4990\n* feat: expose `use_scalar_index` param in Java scanner by @xloya in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5487\n* feat: add file list with sizes 
to IndexMetadata by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5497\n* feat(compaction): add Python config for defer_index_remap by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5691\n* feat(core): add Levenshtein-based suggestions to not-found errors in schema by @HemantSudarshan in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5976\n* feat: add URI-based commit support to Java SDK by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5978\n* fix: concurrent read and write to directory namespace by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5983\n* feat: add ability to pass custom headers to objectstore requests by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5989\n* feat: add DeleteResult with num_deleted_rows by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6001\n* feat: introduce IncompatibleTransaction error by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6003\n* feat(cleanup): add more metrics to RemovalStats by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6025\n* feat(java): expose prefilter parameter to support vector search with fragments by @nyl3532016 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6040\n* feat: surface ambiguous merge insert error as `InvalidInput` by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6048\n* feat(blob): distribute blob sidecar keys with reversed binary ids by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6060\n* feat: handle JSONB literals in Lance SQL planner by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6061\n* 
feat(java): expose Dataset.dropIndex method to drop specific index by @fangbo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6065\n* feat(blob): map external blob URIs to multi-base base ids by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6066\n* feat: add env toggle for repetition index cache on read by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6069\n* feat(compaction): single reserve_fragment_ids after rewriting files by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6072\n* feat: expose compaction binary copy configuration through python and java SDKs by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6074\n* feat(cleanup): support rate limiter for cleanup operation by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6084\n* feat: mark 2.2 as stable and add 2.3 as the next file format version by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6088\n* feat: support prewarm for IVF-based ANN indices by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6090\n* feat: add skip_transpose flag to vector index builders by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6114\n* feat: enable HNSW-accelerated partition assignment for fp16 vectors by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6119\n* feat: clearer progress reporting for IVF by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6126\n* feat: support vector indices in describe_indices filtering by @ndpvt-web in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6145\n* feat: reduce open file handles during IVF training by @westonpace in 
https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6169\n* feat: add compaction options in manifest config by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6170\n* feat: support atomic multi-table transactions via namespace manifest by @XuQianJin-Stars in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6173\n* feat: add abfss:\u002F\u002F scheme support for Azure ADLS Gen2 by @burlacio in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6192\n* feat: bounding source fragments for compaction execution by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6232\n* fix: filter out detached versions when scanning manifests by @jackye19","2026-03-30T18:08:19",{"id":206,"version":207,"summary_zh":208,"released_at":209},103704,"v5.0.0-beta.1","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v5.0.0-beta.1 -->\n\n## What's Changed\n### Breaking Changes 🛠\n* refactor!: remove staging from distributed vector indexing by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6269\n* refactor: move DatasetIndexExt out of lance-index by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6280\n* feat!: support sampling selected fragments by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6294\n### New Features 🎉\n* feat(java): add non-blocking AsyncScanner with CompletableFuture API by @beinan in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6102\n* feat(DirectoryNamespace): support index and transaction related operations by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6196\n* feat: bounding source fragments for compaction execution by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6232\n* 
fix: filter out detached versions when scanning manifests by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6245\n* feat: allow setting transaction properties in various operations by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6246\n* feat: add OpenDAL Azdls backend for abfss:\u002F\u002F with use_opendal flag by @burlacio in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6256\n* feat: add aimd throttled object store by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6266\n* feat: clarify logical indices and physical index segments by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6270\n* feat: support stop-word gaps in phrase queries by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6277\n* feat: move rate limiting to the object store by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6293\n* feat: support non-shared centroid vector index builds by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6296\n* feat: add a fast dataset version ID API by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6303\n* feat(python): add storage_options to IvfModel and PqModel save\u002Fload by @hushengquan in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6312\n### Bug Fixes 🐛\n* fix: handle list-level NULLs in NOT filters by @fenfeng9 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6044\n* fix: like queries with a prefix should be accelerated by btree and zonemap by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6188\n* fix: respect the old data filter on inverted index by @westonpace in 
https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6216\n* fix: 2.1\u002F2.2 panic when a list column had small values and many empty values by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6234\n* fix: resolve_latest_location converts errors to not_found unconditionally by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6248\n* fix: return errors for unsupported fixed-size-list child types by @myandpr in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6253\n* fix: adding namespace support to java SDK CommitBuilder from dataset by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6257\n* fix: pass dataset_options to SafeLanceDataset in worker processes by @eddyxu in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6278\n* fix: support hamming distance in IndicesBuilder by @jmhsieh in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6295\n* fix(namespace): support nested types in convert_json_arrow_type by @jiaoew1991 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6300\n* fix: restore namespace build after DatasetIndexExt move by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6302\n* fix: multiple improvements for gh workflows by @esteban in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6306\n### Documentation 📚\n* docs: shorten core major release vote window by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6154\n### Performance Improvements 🚀\n* perf: add benchmark for distributed vector merge finalization by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6176\n* perf: new layout for positions and new algo for phrase query by @BubbleCal in 
https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6203\n* perf: batched WAND and new WAND structure, ~50% faster by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6241\n* perf: optimize stable row_id index build from O(rows) to O(fragments) by @jiaoew1991 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6310\n### Other Changes\n* refactor: distributed vector segment build by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6220\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Frelease-root\u002F5.0.0-beta.N...v5.0.0-beta.1","2026-03-27T22:47:34",{"id":211,"version":212,"summary_zh":213,"released_at":214},103705,"v4.0.0-rc.3","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v4.0.0-rc.3 -->\n\n## What's Changed\n### Breaking Changes 🛠\n* feat!: upgrade DataFusion dependency to 52.1.0 by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6015\n* refactor!: refactor java access to file format version by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6053\n* refactor!: remove create_empty_table usage by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6087\n* fix!: bump IVF_RQ version for compatibility check by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6097\n* perf(inverted)!: reduce fts indexing time and memory by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6174\n* feat: add index segment commit API by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6209\n* refactor!: remove staging from distributed vector indexing by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6269\n### New Features 
🎉\n* feat: compress complex all null by @yingjianwu98 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F4990\n* feat: expose `use_scalar_index` param in Java scanner by @xloya in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5487\n* feat: add file list with sizes to IndexMetadata by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5497\n* feat(compaction): add Python config for defer_index_remap by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5691\n* feat(core): add Levenshtein-based suggestions to not-found errors in schema by @HemantSudarshan in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5976\n* feat: add URI-based commit support to Java SDK by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5978\n* fix: concurrent read and write to directory namespace by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5983\n* feat: add ability to pass custom headers to objectstore requests by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5989\n* feat: add DeleteResult with num_deleted_rows by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6001\n* feat: introduce IncompatibleTransaction error by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6003\n* feat(cleanup): add more metrics to RemovalStats by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6025\n* feat(java): expose prefilter parameter to support vector search with fragments by @nyl3532016 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6040\n* feat: surface ambiguous merge insert error as `InvalidInput` by @wjones127 in 
https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6048\n* feat(blob): distribute blob sidecar keys with reversed binary ids by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6060\n* feat: handle JSONB literals in Lance SQL planner by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6061\n* feat(java): expose Dataset.dropIndex method to drop specific index by @fangbo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6065\n* feat(blob): map external blob URIs to multi-base base ids by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6066\n* feat: add env toggle for repetition index cache on read by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6069\n* feat(compaction): single reserve_fragment_ids after rewriting files by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6072\n* feat: expose compaction binary copy configuration through python and java SDKs by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6074\n* feat(cleanup): support rate limiter for cleanup operation by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6084\n* feat: mark 2.2 as stable and add 2.3 as the next file format version by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6088\n* feat: support prewarm for IVF-based ANN indices by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6090\n* feat: add skip_transpose flag to vector index builders by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6114\n* feat: enable HNSW-accelerated partition assignment for fp16 vectors by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6119\n* feat: clearer 
progress reporting for IVF by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6126\n* feat: support vector indices in describe_indices filtering by @ndpvt-web in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6145\n* feat: reduce open file handles during IVF training by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6169\n* feat: add compaction options in manifest config by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6170\n* feat: support atomic multi-table transactions via namespace manifest by @XuQianJin-Stars in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6173\n* feat: add abfss:\u002F\u002F scheme support for Azure ADLS Gen2 by @burlacio in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6192\n* feat: bounding source fragments for compaction execution by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6232\n* fix: filter out detached versions when scanning manifests by @jac","2026-03-24T22:37:53",{"id":216,"version":217,"summary_zh":218,"released_at":219},103706,"v4.1.0-beta.3","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v4.1.0-beta.3 -->\n\n## What's Changed\n### New Features 🎉\n* feat: add OpenDAL Azdls backend for abfss:\u002F\u002F with use_opendal flag by @burlacio in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6256\n### Bug Fixes 🐛\n* fix: resolve_latest_location converts errors to not_found unconditionally by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6248\n* fix: return errors for unsupported fixed-size-list child types by @myandpr in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6253\n### Performance Improvements 🚀\n* perf: batched WAND and new WAND structure, ~50% faster by @BubbleCal 
in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6241\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv4.1.0-beta.2...v4.1.0-beta.3","2026-03-23T17:25:28",{"id":221,"version":222,"summary_zh":223,"released_at":224},103707,"v4.1.0-beta.2","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v4.1.0-beta.2 -->\n\n## What's Changed\n### Bug Fixes 🐛\n* fix: adding namespace support to java SDK CommitBuilder from dataset by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6257\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv4.1.0-beta.1...v4.1.0-beta.2","2026-03-23T14:56:10",{"id":226,"version":227,"summary_zh":228,"released_at":229},103708,"v4.1.0-beta.1","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v4.1.0-beta.1 -->\n\n## What's Changed\n### New Features 🎉\n* feat: bounding source fragments for compaction execution by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6232\n* fix: filter out detached versions when scanning manifests by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6245\n* feat: allow setting transaction properties in various operations by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6246\n### Bug Fixes 🐛\n* fix: like queries with a prefix should be accelerated by btree and zonemap by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6188\n* fix: 2.1\u002F2.2 panic when a list column had small values and many empty values by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6234\n### Performance Improvements 🚀\n* perf: new layout for positions and new algo for phrase query by @BubbleCal in 
https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6203\n### Other Changes\n* refactor: distributed vector segment build by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6220\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Frelease-root\u002F4.1.0-beta.N...v4.1.0-beta.1","2026-03-22T19:58:27",{"id":231,"version":232,"summary_zh":233,"released_at":234},103709,"v4.0.0-rc.2","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v4.0.0-rc.2 -->\n\n## What's Changed\n### Breaking Changes 🛠\n* feat!: upgrade DataFusion dependency to 52.1.0 by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6015\n* refactor!: refactor java access to file format version by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6053\n* refactor!: remove create_empty_table usage by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6087\n* fix!: bump IVF_RQ version for compatibility check by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6097\n* perf(inverted)!: reduce fts indexing time and memory by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6174\n### New Features 🎉\n* feat: compress complex all null by @yingjianwu98 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F4990\n* feat: expose `use_scalar_index` param in Java scanner by @xloya in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5487\n* feat: add file list with sizes to IndexMetadata by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5497\n* feat(compaction): add Python config for defer_index_remap by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5691\n* feat(core): 
add Levenshtein-based suggestions to not-found errors in schema by @HemantSudarshan in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5976\n* feat: add URI-based commit support to Java SDK by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5978\n* fix: concurrent read and write to directory namespace by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5983\n* feat: add ability to pass custom headers to objectstore requests by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5989\n* feat: add DeleteResult with num_deleted_rows by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6001\n* feat: introduce IncompatibleTransaction error by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6003\n* feat(cleanup): add more metrics to RemovalStats by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6025\n* feat(java): expose prefilter parameter to support vector search with fragments by @nyl3532016 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6040\n* feat: surface ambiguous merge insert error as `InvalidInput` by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6048\n* feat(blob): distribute blob sidecar keys with reversed binary ids by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6060\n* feat: handle JSONB literals in Lance SQL planner by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6061\n* feat(java): expose Dataset.dropIndex method to drop specific index by @fangbo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6065\n* feat(blob): map external blob URIs to multi-base base ids by @Xuanwo in 
https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6066\n* feat: add env toggle for repetition index cache on read by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6069\n* feat(compaction): single reserve_fragment_ids after rewriting files by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6072\n* feat: expose compaction binary copy configuration through python and java SDKs by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6074\n* feat(cleanup): support rate limiter for cleanup operation by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6084\n* feat: mark 2.2 as stable and add 2.3 as the next file format version by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6088\n* feat: support prewarm for IVF-based ANN indices by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6090\n* feat: add skip_transpose flag to vector index builders by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6114\n* feat: enable HNSW-accelerated partition assignment for fp16 vectors by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6119\n* feat: clearer progress reporting for IVF by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6126\n* feat: support vector indices in describe_indices filtering by @ndpvt-web in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6145\n* feat: reduce open file handles during IVF training by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6169\n* feat: add compaction options in manifest config by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6170\n* feat: support atomic multi-table transactions via 
namespace manifest by @XuQianJin-Stars in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6173\n* feat: add abfss:\u002F\u002F scheme support for Azure ADLS Gen2 by @burlacio in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6192\n* feat: add index segment commit API by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6209\n### Bug Fixes 🐛\n* fix(java): transaction fatal bug in java transaction api by @wojiaodoubao in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5824\n* fix: maintaining individual fragment operation when calling take_source by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5844\n* fix(encoding): handle empty ro","2026-03-20T17:52:26",{"id":236,"version":237,"summary_zh":238,"released_at":239},103710,"v4.0.0-rc.1","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v4.0.0-rc.1 -->\n\n## What's Changed\n### Breaking Changes 🛠\n* feat!: upgrade DataFusion dependency to 52.1.0 by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6015\n* refactor!: refactor java access to file format version by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6053\n* refactor!: remove create_empty_table usage by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6087\n* fix!: bump IVF_RQ version for compatibility check by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6097\n* perf(inverted)!: reduce fts indexing time and memory by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6174\n### New Features 🎉\n* feat: compress complex all null by @yingjianwu98 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F4990\n* feat: expose `use_scalar_index` param in Java scanner by @xloya 
in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5487\n* feat: add file list with sizes to IndexMetadata by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5497\n* feat(compaction): add Python config for defer_index_remap by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5691\n* feat(core): add Levenshtein-based suggestions to not-found errors in schema by @HemantSudarshan in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5976\n* feat: add URI-based commit support to Java SDK by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5978\n* fix: concurrent read and write to directory namespace by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5983\n* feat: add ability to pass custom headers to objectstore requests by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5989\n* feat: add DeleteResult with num_deleted_rows by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6001\n* feat: introduce IncompatibleTransaction error by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6003\n* feat(cleanup): add more metrics to RemovalStats by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6025\n* feat(java): expose prefilter parameter to support vector search with fragments by @nyl3532016 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6040\n* feat: surface ambiguous merge insert error as `InvalidInput` by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6048\n* feat(blob): distribute blob sidecar keys with reversed binary ids by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6060\n* feat: handle JSONB literals in 
Lance SQL planner by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6061\n* feat(java): expose Dataset.dropIndex method to drop specific index by @fangbo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6065\n* feat(blob): map external blob URIs to multi-base base ids by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6066\n* feat: add env toggle for repetition index cache on read by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6069\n* feat(compaction): single reserve_fragment_ids after rewriting files by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6072\n* feat: expose compaction binary copy configuration through python and java SDKs by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6074\n* feat(cleanup): support rate limiter for cleanup operation by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6084\n* feat: mark 2.2 as stable and add 2.3 as the next file format version by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6088\n* feat: support prewarm for IVF-based ANN indices by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6090\n* feat: add skip_transpose flag to vector index builders by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6114\n* feat: enable HNSW-accelerated partition assignment for fp16 vectors by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6119\n* feat: clearer progress reporting for IVF by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6126\n* feat: support vector indices in describe_indices filtering by @ndpvt-web in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6145\n* 
feat: reduce open file handles during IVF training by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6169\n* feat: add compaction options in manifest config by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6170\n* feat: support atomic multi-table transactions via namespace manifest by @XuQianJin-Stars in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6173\n* feat: add abfss:\u002F\u002F scheme support for Azure ADLS Gen2 by @burlacio in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6192\n* feat: add index segment commit API by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6209\n### Bug Fixes 🐛\n* fix(java): transaction fatal bug in java transaction api by @wojiaodoubao in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5824\n* fix: maintaining individual fragment operation when calling take_source by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5844\n* fix(encoding): handle empty ro","2026-03-19T21:06:05",{"id":241,"version":242,"summary_zh":243,"released_at":244},103711,"v4.0.0-beta.13","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v4.0.0-beta.13 -->\n\n## What's Changed\n### New Features 🎉\n* feat: add file list with sizes to IndexMetadata by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5497\n* feat: enable HNSW-accelerated partition assignment for fp16 vectors by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6119\n* feat: reduce open file handles during IVF training by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6169\n* feat: add compaction options in manifest config by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6170\n* feat: 
support atomic multi-table transactions via namespace manifest by @XuQianJin-Stars in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6173\n* feat: add abfss:\u002F\u002F scheme support for Azure ADLS Gen2 by @burlacio in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6192\n* feat: add index segment commit API by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6209\n### Bug Fixes 🐛\n* fix: maintaining individual fragment operation when calling take_source by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5844\n* fix: add missing type hint for producer function by @Gallardot in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6133\n* fix: prevent duplicate manifest entries from concurrent table creation by @jmhsieh in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6143\n* fix: preserve create index transaction semantics by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6204\n* fix: allow same field name with different type in dataset overwrites by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6206\n* fix: prewarm all segments for named indices by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6211\n### Documentation 📚\n* docs: fix incorrect URLs and cleanup by @prrao87 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5317\n* docs: document vector index RAM (training) & storage requirements by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6108\n* docs: add example to show how to index JSON column by @prrao87 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6208\n### Performance Improvements 🚀\n* perf: pre-transpose PQ codebook for SIMD-friendly L2 distance by @wkalt in 
https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5923\n* perf: reuse distance calculator at selecting candidates by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6202\n### Other Changes\n* refactor: use the dataset file version to determine index file version by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6142\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv4.0.0-beta.12...v4.0.0-beta.13","2026-03-19T05:23:28",{"id":246,"version":247,"summary_zh":248,"released_at":249},103712,"v3.0.1","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v3.0.1 -->\n\n## What's Changed\n### Other Changes\n* refactor: rename arrow-scalar to lance-arrow-scalar by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6199\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv3.0.0...v3.0.1","2026-03-19T21:04:09",{"id":251,"version":252,"summary_zh":253,"released_at":254},103713,"v3.0.1-rc.1","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v3.0.1-rc.1 -->\n\n## What's Changed\n### Other Changes\n* refactor: rename arrow-scalar to lance-arrow-scalar by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6199\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv3.0.0...v3.0.1-rc.1","2026-03-18T17:54:46",{"id":256,"version":257,"summary_zh":258,"released_at":259},103714,"v4.0.0-beta.12","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v4.0.0-beta.12 -->\n\n## What's Changed\n### Bug Fixes 🐛\n* fix: disallowing stale credentials from directory namespace by @hamersaw in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6194\n\n**Full 
Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv4.0.0-beta.11...v4.0.0-beta.12","2026-03-13T21:40:06",{"id":261,"version":262,"summary_zh":263,"released_at":264},103715,"v4.0.0-beta.11","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v4.0.0-beta.11 -->\n\n## What's Changed\n### New Features 🎉\n* feat: support vector indices in describe_indices filtering by @ndpvt-web in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6145\n### Bug Fixes 🐛\n* fix: preserve merge insert delete-by-source semantics by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6148\n* fix: handle nullable validity layers without def levels by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6187\n* fix: use to_arrow_reader in benchmark datagen by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6190\n* fix: memory_limit and num_workers params are not passed to index worker by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6197\n### Documentation 📚\n* docs: update the rules for data replacement conflicts to reflect reality by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6182\n### Performance Improvements 🚀\n* perf(inverted): reuse posting batch builder and merge tail partitions by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6191\n### Other Changes\n* refactor: rename arrow-scalar to lance-arrow-scalar by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6199\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv4.0.0-beta.10...v4.0.0-beta.11","2026-03-13T20:37:25",{"id":266,"version":267,"summary_zh":268,"released_at":269},103716,"v3.0.0","\u003C!-- Release notes generated using 
configuration in .github\u002Frelease.yml at v3.0.0 -->\n\n## What's Changed\n### Breaking Changes 🛠\n* feat!: support index progress reporting via callbacks by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5910\n* perf!: remove shuffle buffer by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5912\n* feat!: upgrade DataFusion dependency to 52.1.0 by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6015\n* refactor!: refactor java access to file format version by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6053\n* refactor!: remove create_empty_table usage by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6087\n* fix!: bump IVF_RQ version for compatibility check by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6097\n### Critical Fixes ‼️\n* fix: deduplicate row addresses in take to prevent panic by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5881\n* fix: fts flat search drops rows when avg_doc_length \u003C 1.0 by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5897\n* fix: invalidate index fragment bitmaps after data replacement and stale merge by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5929\n### New Features 🎉\n* feat: add RLE support for block by @yingjianwu98 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F4937\n* feat: compress complex all null by @yingjianwu98 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F4990\n* feat: support cleanup across branches by @majin1102 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5009\n* feat: dictionary index always32 bits by @yingjianwu98 in 
https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5011\n* feat: abort dictionary encode if not useful by @yingjianwu98 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5055\n* feat(cdf): cdf support upsert for views by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5369\n* feat(compaction): binary copy capability for compaction by @zhangyue19921010 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5434\n* feat: expose `use_scalar_index` param in Java scanner by @xloya in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5487\n* feat(python): expose search_filter in scanner by @wojiaodoubao in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5506\n* feat: add alter column nullable to non-nullable support by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5589\n* feat: evolute all_null_layout to constant layout by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5641\n* feat(java): support creating IVF_RQ index by @majin1102 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5648\n* feat(java): support building vector index distributively by @majin1102 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5664\n* feat(rust): add datafusion catalog_provider through namespace by @majin1102 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5686\n* feat: support List and Struct type for KeyValue in inserted_rows.rs by @wojiaodoubao in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5713\n* feat: support tencent cos by @ztorchan in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5740\n* feat: add Lance-HF docs to lance.org\u002Fintegrations\u002Fhuggingface\u002F by @prrao87 in 
https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5748\n* feat(python): support namespace for tensorflow by @yuqi1129 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5750\n* feat: add range to External blob by @wojiaodoubao in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5765\n* feat(java): support json extraction by scanning by @majin1102 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5770\n* feat: introduce RowIdSet and RowIdMask by @yanghua in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5771\n* feat: expose blob handling APIs to python by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5790\n* feat: add blob handling support for fragment by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5801\n* feat: add plan\u002Fexecute separation to FilteredReadExec by @LuQQiu in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5843\n* feat: add LSM scanner with point lookup and vector search support by @touch-of-grey in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5850\n* feat: add rename table implementations to REST namespaces by @bryanck in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5874\n* feat(python): expose enable_stable_row_ids in commit() by @fecet in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5908\n* feat: support aggregate in scanner by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5911\n* feat: spill page metadata to disk during IVF shuffle by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5921\n* feat: add third party licenses lists by @jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5922\n* feat(java): support session by 
@jackye1995 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5931\n* feat: make geodatafusion\u002Fgeoarrow optional via `geo` feature flag by @apoc in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F5934\n* perf: create local writer for efficient local writes by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull","2026-03-13T15:11:41",{"id":271,"version":272,"summary_zh":273,"released_at":274},103717,"v4.0.0-beta.10","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v4.0.0-beta.10 -->\n\n## What's Changed\n### Breaking Changes 🛠\n* perf(inverted)!: reduce fts indexing time and memory by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6174\n### Bug Fixes 🐛\n* fix: avoid empty range reads for zero-length blobs by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6168\n### Documentation 📚\n* docs: add alicloud oss configuration by @FarmerChillax in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6167\n### Performance Improvements 🚀\n* perf: remove shard content key sorting from distributed merge by @Xuanwo in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6179\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv4.0.0-beta.9...v4.0.0-beta.10","2026-03-12T22:36:28",{"id":276,"version":277,"summary_zh":278,"released_at":279},103718,"v4.0.0-beta.9","\u003C!-- Release notes generated using configuration in .github\u002Frelease.yml at v4.0.0-beta.9 -->\n\n## What's Changed\n### New Features 🎉\n* feat: clearer progress reporting for IVF by @wkalt in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6126\n### Bug Fixes 🐛\n* fix: replace `fetch_arrow_table` with `to_arrow_table` by @BubbleCal in 
https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6146\n* fix: handle DataType::Null in adjust_child_validity to prevent panic by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6160\n* fix: persist frag reuse index external file on local filesystem by @wjones127 in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6163\n### Documentation 📚\n* docs: document the rules for transaction conflicts by @westonpace in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6158\n### Performance Improvements 🚀\n* perf: parallelize FTS prewarming by @BubbleCal in https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fpull\u002F6144\n\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Flance-format\u002Flance\u002Fcompare\u002Fv4.0.0-beta.8...v4.0.0-beta.9","2026-03-11T22:10:22"]