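[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-spiceai--spiceai":3,"tool-spiceai--spiceai":64},[4,17,27,35,43,56],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":16},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui is a browser-based interface built on Gradio that lets users easily run the powerful Stable Diffusion image-generation models on their own machines. It addresses the pain points of the original models (reliance on the command line, a steep learning curve, and scattered features) by consolidating the complex AI image-generation workflow into one intuitive graphical platform.\n\nCasual creators who want to get started quickly, designers who need fine control over image details, and developers and researchers who want to explore the models' potential can all benefit. Its core strength is the breadth of its feature set: beyond basic modes such as text-to-image, image-to-image, inpainting, and outpainting, it introduced advanced features such as attention adjustment, prompt matrices, negative prompts, and 'Hires. fix'. It also bundles face-restoration tools such as GFPGAN and CodeFormer, supports multiple neural-network upscalers, and can be extended almost without limit through its extension system. Even on devices with limited VRAM, stable-diffusion-webui offers optimization options that put high-quality AI art within reach.",162132,3,"2026-04-05T11:01:52",[13,14,15],"Development Framework","Image","Agent","ready",{"id":18,"name":19,"github_repo":20,"description_zh":21,"stars":22,"difficulty_score":23,"last_commit_at":24,"category_tags":25,"status":16},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code is a high-performance optimization system built for AI coding assistants such as Claude Code, Codex, and Cursor. More than a set of configuration files, it is a complete framework refined through extended real-world use. It targets the core pain points AI agents face in practical development: low efficiency, memory loss, security risks, and no ability to keep learning.\n\nBy introducing modular skills, instinct enhancement, persistent memory, and built-in security scanning, everything-claude-code can significantly improve AI performance on complex tasks and help developers build more stable, smarter production-grade AI agents. Its research-first development philosophy and token-consumption optimizations make model responses faster and cheaper while defending against potential attack vectors.\n\nThe toolkit is especially suited to software developers, AI researchers, and technical teams that want to deeply customize their AI workflows. Whether you are building a large codebase or need AI help with security audits and automated testing, everything-claude-code provides solid underlying support. An open-source project that won an Anthropic hackathon award, it combines multi-language support with a rich set of battle-tested hooks, letting AI truly grow into one that",138956,2,"2026-04-05T11:33:21",[13,15,26],"Language Model",{"id":28,"name":29,"github_repo":30,"description_zh":31,"stars":32,"difficulty_score":23,"last_commit_at":33,"category_tags":34,"status":16},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI is a powerful, highly modular visual AI engine for designing and executing complex Stable Diffusion image-generation pipelines. Instead of requiring code, it uses an intuitive node-based flowchart interface in which users build custom generation pipelines by connecting functional modules.\n\nThis design addresses the complexity and inflexibility of configuring advanced AI 
image-generation workflows. Users without a programming background can freely combine models, tweak parameters, and preview results in real time, handling everything from basic text-to-image to multi-step high-resolution fixes. ComfyUI is highly compatible: it runs on Windows, macOS, and Linux, supports a wide range of hardware including NVIDIA, AMD, Intel, and Apple Silicon, and was among the first to support cutting-edge models such as SDXL, Flux, and SD3.\n\nResearchers and developers who want to explore algorithms in depth, as well as designers and experienced AI-art enthusiasts seeking maximum creative freedom, will all find strong support in ComfyUI. Its modular architecture lets the community keep adding new features, making it one of the most flexible open-source diffusion-model tools with one of the richest ecosystems, and helping users turn ideas into reality efficiently.",107662,"2026-04-03T11:11:01",[13,14,15],{"id":36,"name":37,"github_repo":38,"description_zh":39,"stars":40,"difficulty_score":23,"last_commit_at":41,"category_tags":42,"status":16},3704,"NextChat","ChatGPTNextWeb\u002FNextChat","NextChat is a lightweight, fast AI assistant that provides a smooth, cross-platform experience for interacting with large models. It addresses two pain points: keeping conversations continuous when switching between devices, and managing many different AI models in one place. Whether for everyday office work, study, or creative brainstorming, NextChat gives users seamless access from the web, iOS, Android, Windows, macOS, or Linux.\n\nThe tool suits general users, students, professionals, and enterprise teams that need private deployment. For developers, it offers convenient self-hosting, with one-click deployment to platforms such as Vercel or Zeabur.\n\nNextChat's core strength is broad model compatibility: it natively supports mainstream models such as Claude, DeepSeek, GPT-4, and Gemini Pro, letting users switch between AI capabilities in a single interface. It was also an early adopter of MCP (Model Context Protocol), improving context handling. For enterprises, NextChat offers a professional edition with brand customization, fine-grained permission control, internal knowledge-base integration, and security auditing to meet strict requirements for data privacy and tailored management.",87618,"2026-04-05T07:20:52",[13,26],{"id":44,"name":45,"github_repo":46,"description_zh":47,"stars":48,"difficulty_score":23,"last_commit_at":49,"category_tags":50,"status":16},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners is a structured introductory machine-learning curriculum from Microsoft, designed to help complete beginners learn classic machine learning. The curriculum spans 12 weeks with 26 lessons and 52 accompanying quizzes, covering the full path from basic concepts to practical applications and giving beginners the structured guidance they often lack when facing a large body of knowledge.\n\nDevelopers changing careers, researchers who need algorithmic background, and curious hobbyists can all benefit. The course pairs clear theory with hands-on practice so learners build a solid skill base step by step. A standout feature is its multilingual support: an automated pipeline provides versions in more than 50 languages, including Simplified Chinese, greatly lowering the barrier for learners worldwide. The project is also developed in the open, with an active community and continuously updated content, so learners get current and accurate material. If you are looking for a clear, friendly, and professional path into machine learning, ML-For-Beginners is an ideal starting point.",84991,"2026-04-05T10:45:23",[14,51,52,53,15,54,26,13,55],"Data Tools","Video","Plugin","Other","Audio",{"id":57,"name":58,"github_repo":59,"description_zh":60,"stars":61,"difficulty_score":10,"last_commit_at":62,"category_tags":63,"status":16},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 
is a leading open-source retrieval-augmented generation (RAG) engine that builds a more precise and reliable context layer for large language models. It combines state-of-the-art RAG techniques with agent capabilities: it efficiently extracts knowledge from many kinds of documents and lets models reason and carry out tasks based on that knowledge.\n\nHallucination and stale knowledge are common pain points in LLM applications. By deeply parsing complex document structures (tables, charts, and mixed layouts), RAGFlow significantly improves retrieval accuracy, reducing fabricated answers and keeping responses grounded and up to date. Its built-in agent mechanism goes further, letting the system not only answer questions but also plan the steps needed to solve complex problems on its own.\n\nThe tool is especially suited to developers, enterprise engineering teams, and AI researchers, whether they want to quickly build a Q&A system over a private knowledge base or explore applying LLMs in vertical domains. RAGFlow provides a visual workflow-orchestration interface and flexible APIs, lowering the barrier for users without an algorithms background while meeting professional developers' needs for deep customization. Open-sourced under the Apache 2.0 license, it is becoming an important bridge between general-purpose LLMs and industry-specific knowledge.",77062,"2026-04-04T04:44:48",[15,14,13,26,54],{"id":65,"github_repo":66,"name":67,"description_en":68,"description_zh":69,"ai_summary_zh":69,"readme_en":70,"readme_zh":71,"quickstart_zh":72,"use_case_zh":73,"hero_image_url":74,"owner_login":67,"owner_name":75,"owner_avatar_url":76,"owner_bio":77,"owner_company":78,"owner_location":78,"owner_email":79,"owner_twitter":80,"owner_website":81,"owner_url":82,"languages":83,"stars":118,"forks":119,"last_commit_at":120,"license":121,"difficulty_score":23,"env_os":122,"env_gpu":123,"env_ram":124,"env_deps":125,"category_tags":137,"github_topics":138,"view_count":23,"oss_zip_url":78,"oss_zip_packed_at":78,"status":16,"created_at":148,"updated_at":149,"faqs":150,"releases":181},3409,"spiceai\u002Fspiceai","spiceai","A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.","Spice is a portable acceleration engine built in Rust for data-driven AI applications and agents. It combines SQL query, semantic search, and large language model (LLM) inference in a single lightweight runtime, helping developers address AI hallucination by grounding generated answers in real, reliable data.\n\nFor developers building enterprise AI assistants or data-analytics applications, Spice greatly simplifies the architecture. It supports unified federated queries across databases, data warehouses, and data lakes, enabling efficient analysis without moving data. Its distinguishing feature is an all-in-one API design: in addition to standard SQL interfaces (such as ODBC\u002FJDBC) and vector search, it is natively compatible with the OpenAI protocol and can act directly as an accelerated gateway for local models. It also supports Iceberg catalogs and the MCP protocol for integration with external tools.\n\nWith portable deployment as a single binary or container and CUDA\u002FMetal hardware acceleration, Spice lets teams focus on business logic rather than tedious infrastructure. From full-stack engineers at startups to AI architects at large enterprises, teams can use Spice to quickly build intelligent applications that are both smart and trustworthy.","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_readme_2b4a8d386306.png\" 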
alt=\"spice oss logo\" width=\"600\"\u002F>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fcodeql-analysis.yml\">\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fcodeql-analysis.yml\u002Fbadge.svg?branch=trunk&event=push\" alt=\"CodeQL\"\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg\" alt=\"License: Apache-2.0\"\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fspiceai.org\u002Fslack\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSlack-Join%20Us-4A154B?logo=slack\" alt=\"Slack\"\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002Fintent\u002Ffollow?screen_name=spice_ai\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Fspice_ai.svg?style=social&logo=x\" alt=\"Follow on X\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fbuild_and_release.yml?branch=trunk\">\u003Cimg alt=\"GitHub Actions Workflow Status - build\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fspiceai\u002Fspiceai\u002Fbuild_and_release.yml?branch=trunk\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fspiced_docker_nightly.yml?branch=trunk\">\u003Cimg alt=\"GitHub Actions Workflow Status - docker build\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fspiceai\u002Fspiceai\u002Fspiced_docker_nightly.yml?branch=trunk&label=docker%20build\" \u002F>\u003C\u002Fa>\n  \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fpr.yml?branch=trunk\">\u003Cimg alt=\"GitHub Actions Workflow Status - unit tests\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fspiceai\u002Fspiceai\u002Fpr.yml?event=merge_group&label=unit%20tests\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fintegration.yml?branch=trunk\">\u003Cimg alt=\"GitHub Actions Workflow Status - integration tests\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fspiceai\u002Fspiceai\u002Fintegration.yml?branch=trunk&label=integration%20tests\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fintegration_models.yml?branch=trunk\">\u003Cimg alt=\"GitHub Actions Workflow Status - integration tests (models)\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fspiceai\u002Fspiceai\u002Fintegration_models.yml?branch=trunk&label=integration%20tests%20(models)\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fbenchmarks.yml?branch=trunk\">\u003Cimg alt=\"GitHub Actions Workflow Status - benchmark tests\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fspiceai\u002Fspiceai\u002Ftestoperator_run_bench.yml?branch=trunk&label=benchmark%20tests\" \u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fspiceai.org\u002Fdocs\">📄 Docs\u003C\u002Fa> | \u003Ca href=\"#%EF%B8%8F-quickstart-local-machine\">⚡️ Quickstart\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\">🧑‍🍳 Cookbook\u003C\u002Fa>\n\u003C\u002Fp>\n\n**Spice** is a SQL query, search, and 
LLM-inference engine, written in Rust, for data apps and agents.\n\n\u003Cimg width=\"740\" alt=\"Spice.ai Open Source accelerated data query and LLM-inference engine\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_readme_32134d46fe8d.png\" \u002F>\n\nSpice provides four industry-standard APIs in a lightweight, portable runtime (single binary\u002Fcontainer):\n\n1. **SQL Query & Search**: HTTP, Arrow Flight, Arrow Flight SQL, ODBC, JDBC, and ADBC APIs; `vector_search` and `text_search` UDTFs.\n2. **OpenAI-Compatible APIs**: HTTP APIs for OpenAI SDK compatibility, local model serving (CUDA\u002FMetal accelerated), and a hosted model gateway.\n3. **Iceberg Catalog REST APIs**: A unified Iceberg REST Catalog API.\n4. **MCP HTTP+SSE APIs**: Integration with external tools via Model Context Protocol (MCP) using HTTP and Server-Sent Events (SSE).\n\n🎯 Goal: Developers can focus on building data apps and AI agents confidently, knowing their apps and agents are grounded in data.\n\nSpice's primary features include:\n\n- **Data Federation**: SQL queries across any database, data warehouse, or data lake. Scale from single-node to distributed multi-node query execution. [Learn More](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fquery-federation).\n- **Data Materialization and Acceleration**: Materialize, accelerate, and cache database queries with Arrow, DuckDB, SQLite, PostgreSQL, or Spice Cayenne (Vortex). [Read the MaterializedView interview - Building a CDN for Databases](https:\u002F\u002Fmaterializedview.io\u002Fp\u002Fbuilding-a-cdn-for-databases-spice-ai)\n- **Enterprise Search**: Keyword, vector, and full-text search with Tantivy-powered BM25 and petabyte-scale vector similarity search via Amazon S3 Vectors or pgvector for structured and unstructured data.\n- **AI apps and agents**: An AI database powering retrieval-augmented generation (RAG) and intelligent agents with OpenAI-compatible APIs and MCP integration. 
[Learn More](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fuse-cases\u002Frag).\n\nIf you want to build with DataFusion, DuckDB, or Vortex, Spice provides a simple, flexible, and production-ready engine you can just use.\n\n📣 Read the [Spice.ai 1.0-stable announcement](https:\u002F\u002Fspiceai.org\u002Fblog\u002Fannouncing-1.0-stable).\n\nSpice is built on industry-leading technologies including [Apache DataFusion](https:\u002F\u002Fdatafusion.apache.org), Apache Arrow, Arrow Flight, SQLite, and DuckDB.\n\n\u003Cdiv align=\"center\">\n  \u003Cpicture>\n    \u003Cimg width=\"600\" alt=\"How Spice works.\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_readme_a2de21b7ee51.png\" \u002F>\n  \u003C\u002Fpicture>\n\u003C\u002Fdiv>\n\n🎥 [Watch the CMU Databases talk: Accelerating Data and AI with Spice.ai Open-Source](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=tyM-ec1lKfU)\n\n🎥 [Watch How to Query Data using Spice, OpenAI, and MCP](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=TFAu4qxjTPk&list=PLesJrUXEx3U-dQul0PqLV3TGTdUmr3B6e&index=8)\n\n🎥 [Watch How to Search with Amazon S3 Vectors](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=QPbqPf5W36g)\n\n## Why Spice?\n\n\u003Cdiv align=\"center\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fassets\u002F80174\u002F96b5fcef-a550-4ce8-a74a-83931275e83e\">\n    \u003Cimg width=\"800\" alt=\"Spice.ai\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_readme_ad2351c0603d.png\" \u002F>\n  \u003C\u002Fpicture>\n\u003C\u002Fdiv>\n\nSpice simplifies building data-driven AI applications and agents by making it fast and easy to query, federate, and accelerate data from one or more sources using SQL, while grounding AI in real-time, reliable data. 
Co-locate datasets with apps and AI models to power AI feedback loops, enable RAG and search, and deliver fast, low-latency data query and AI inference with full control over cost and performance.\n\n### Latest Capabilities\n\n- **Spice Cayenne Data Accelerator**: Simplified multi-file acceleration using the [Vortex columnar format](https:\u002F\u002Fgithub.com\u002Fvortex-data\u002Fvortex) + SQLite metadata. Delivers DuckDB-comparable performance without single-file scaling limitations.\n- **Multi-Node Distributed Query**: Scale query execution across multiple nodes with Apache Ballista integration for improved performance on large datasets.\n- **Acceleration Snapshots**: Bootstrap accelerations from S3 for fast cold starts (seconds vs. minutes). Supports ephemeral storage with persistent recovery.\n- **Iceberg Table Writes**: Write to Iceberg tables using standard SQL `INSERT INTO` for data ingestion and transformation, with no Spark required.\n- **Petabyte-Scale Vector Search**: Native Amazon S3 Vectors integration manages the full vector lifecycle from ingestion to embedding to querying. SQL-integrated hybrid search with reciprocal rank fusion (RRF).\n\n### How is Spice different?\n\n1. **AI-Native Runtime**: Spice combines data query and AI inference in a single engine, so AI responses are grounded in data and therefore more accurate.\n\n2. **Application-Focused**: Designed to run distributed at the application and agent level, often as a 1:1 or 1:N mapping between app and Spice instance, unlike traditional data systems built for many apps on one centralized database. It’s common to spin up multiple Spice instances, even one per tenant or customer.\n\n3. **Dual-Engine Acceleration**: Supports both **OLAP** (Arrow\u002FDuckDB) and **OLTP** (SQLite\u002FPostgreSQL) engines at the dataset level, providing flexible performance across analytical and transactional workloads.\n\n4. 
**Disaggregated Storage**: Separation of compute from disaggregated storage, co-locating local, materialized working sets of data with applications, dashboards, or ML pipelines while accessing source data in its original storage.\n\n5. **Edge to Cloud Native**: Deploy as a standalone instance, Kubernetes sidecar, microservice, or cluster—across edge\u002FPOP, on-prem, and public clouds. Chain multiple Spice instances for tier-optimized, distributed deployments.\n\n## How does Spice compare?\n\n### Data Query and Analytics\n\n| Feature                          | **Spice**                             | Trino \u002F Presto       | Dremio                | ClickHouse          | Materialize         |\n| -------------------------------- | ------------------------------------- | -------------------- | --------------------- | ------------------- | ------------------- |\n| **Primary Use-Case**             | Data & AI apps\u002Fagents                 | Big data analytics   | Interactive analytics | Real-time analytics | Real-time analytics |\n| **Primary deployment model**     | Sidecar                               | Cluster              | Cluster               | Cluster             | Cluster             |\n| **Federated Query Support**      | ✅                                     | ✅                    | ✅                     | ―                   | ―                   |\n| **Acceleration\u002FMaterialization** | ✅ (Arrow, SQLite, DuckDB, PostgreSQL) | Intermediate storage | Reflections (Iceberg) | Materialized views  | ✅ (Real-time views) |\n| **Catalog Support**              | ✅ (Iceberg, Unity Catalog, AWS Glue)  | ✅                    | ✅                     | ―                   | ―                   |\n| **Query Result Caching**         | ✅                                     | ✅                    | ✅                     | ✅                   | Limited             |\n| **Multi-Modal Acceleration**     | ✅ (OLAP + OLTP)                       | ―                    | ― 
                    | ―                   | ―                   |\n| **Change Data Capture (CDC)**    | ✅ (Debezium)                          | ―                    | ―                     | ―                   | ✅ (Debezium)        |\n\n### AI Apps and Agents\n\n| Feature                       | **Spice**                               | LangChain          | LlamaIndex | AgentOps.ai      | Ollama                        |\n| ----------------------------- | --------------------------------------- | ------------------ | ---------- | ---------------- | ----------------------------- |\n| **Primary Use-Case**          | Data & AI apps                          | Agentic workflows  | RAG apps   | Agent operations | LLM apps                      |\n| **Programming Language**      | Any language (HTTP interface)           | JavaScript, Python | Python     | Python           | Any language (HTTP interface) |\n| **Unified Data + AI Runtime** | ✅                                       | ―                  | ―          | ―                | ―                             |\n| **Federated Data Query**      | ✅                                       | ―                  | ―          | ―                | ―                             |\n| **Accelerated Data Access**   | ✅                                       | ―                  | ―          | ―                | ―                             |\n| **Tools\u002FFunctions**           | ✅ (MCP HTTP+SSE)                        | ✅                  | ✅          | Limited          | Limited                       |\n| **LLM Memory**                | ✅                                       | ✅                  | ―          | ✅                | ―                             |\n| **Hybrid Search**             | ✅ (Keyword, Vector, & Full-Text-Search) | ✅                  | ✅          | Limited          | Limited                       |\n| **Caching**                   | ✅ (Query and results caching)           | Limited            | ―          | 
―                | ―                             |\n| **Embeddings**                | ✅ (Built-in & pluggable models\u002FDBs)     | ✅                  | ✅          | Limited          | ―                             |\n\n- ✅ = Fully supported\n- ― = Not supported\n- Limited = Partial or restricted support\n\n## Example Use-Cases\n\n### Data-grounded Agentic AI Applications\n\n- **OpenAI-compatible API**: Connect to hosted models (OpenAI, Anthropic, xAI, Amazon Bedrock) or deploy locally (Llama, NVIDIA NIM) with OpenAI Responses API support for advanced interactions. [AI Gateway Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Fopenai_sdk\u002FREADME.md)\n- **Federated Data Access**: Query using SQL and NSQL (text-to-SQL) across databases, data warehouses, and data lakes with advanced query push-down for fast retrieval. Scale to distributed multi-node query execution with Apache Ballista. [Federated SQL Query Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Ffederation\u002FREADME.md)\n- **Search and RAG**: Search and retrieve context with accelerated embeddings for retrieval-augmented generation (RAG) workflows. Native Amazon S3 Vectors integration for petabyte-scale vector search. Full-text search (FTS) via Tantivy-powered BM25 and vector similarity search (VSS) integrated into SQL via `text_search` and `vector_search` UDTFs. Reciprocal rank fusion (RRF) for hybrid search. [Amazon S3 Vectors Cookbook Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fvectors\u002Fs3\u002FREADME.md)\n- **LLM Memory and Observability**: Store and retrieve history and context for AI agents while gaining deep visibility into data flows, model performance, and traces. 
[LLM Memory Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Fllm-memory\u002FREADME.md) | [Observability & Monitoring Features Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fobservability)\n\n### Database CDN and Query Mesh\n\n- **Data Acceleration**: Co-locate materialized datasets in Arrow, SQLite, DuckDB, PostgreSQL, or Cayenne (Vortex+SQLite) with applications for sub-second queries. Bootstrap from snapshots stored in S3 for fast cold starts. Write to Iceberg tables with standard SQL `INSERT INTO`. [DuckDB Data Accelerator Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Fduckdb\u002Faccelerator\u002FREADME.md)\n- **Resiliency and Local Dataset Replication**: Maintain application availability with local replicas of critical datasets. Recover from federated source outages using acceleration snapshots. [Local Dataset Replication Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Flocalpod\u002FREADME.md)\n- **Responsive Dashboards**: Enable fast, real-time analytics by accelerating data for frontends and BI tools with configurable refresh schedules. [Sales BI Dashboard Demo](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Fsales-bi\u002FREADME.md)\n- **Simplified Legacy Migration**: Use a single endpoint to unify legacy systems with modern infrastructure, including federated SQL querying across multiple sources. [Federated SQL Query Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Ffederation\u002FREADME.md)\n\n### Retrieval-Augmented Generation (RAG)\n\n- **Unified Search with Vector Similarity**: Perform efficient vector similarity search across structured and unstructured data sources with native Amazon S3 Vectors integration for petabyte-scale vector storage and querying. 
The Spice runtime manages the vector lifecycle: ingesting data, embedding it using AWS Bedrock (Amazon Titan, Cohere), HuggingFace models, or Model2Vec (500x faster static embeddings), and storing the vectors in S3 Vectors buckets or pgvector. Supported distance metrics are cosine similarity, Euclidean distance, and dot product. Search is integrated into SQL through the `vector_search` and `text_search` UDTFs, with hybrid search using reciprocal rank fusion (RRF). Example: `SELECT * FROM vector_search(my_table, 'search query', 10) WHERE condition ORDER BY _score;`. [Amazon S3 Vectors Cookbook Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fvectors\u002Fs3\u002FREADME.md)\n- **Semantic Knowledge Layer**: Define a semantic context model to enrich data for AI. [Semantic Model Feature Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fsemantic-model)\n- **Text-to-SQL**: Convert natural language queries into SQL using built-in NSQL and sampling tools for accurate queries. [Text-to-SQL Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Ftext-to-sql\u002FREADME.md)\n\n## FAQ\n\n- **Is Spice a cache?** Not specifically; you can think of Spice data acceleration as an _active_ cache, materialization, or data prefetcher. A cache fetches data on a cache miss, whereas Spice prefetches and materializes filtered data on an interval, on a trigger, or as data changes using CDC. In addition to acceleration, Spice supports [results caching](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fcaching).\n\n- **Is Spice a CDN for databases?** Yes, a common use-case for Spice is as a CDN for different data sources. 
Using CDN concepts, Spice enables you to ship (load) a working set of your database (or data lake, or data warehouse) where it's most frequently accessed, like from a data-intensive application or for AI context.\n\n[➡️ Docs FAQ](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffaq)\n\n### Watch a 30-sec BI dashboard acceleration demo\n\n\u003Chttps:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fassets\u002F80174\u002F7735ee94-3f4a-4983-a98e-fe766e79e03a>\n\nSee more demos on [YouTube](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLesJrUXEx3U9anekJvbjyyTm7r9A26ugK).\n\n## Supported Data Connectors\n\n| Name                               | Description                           | Status            | Protocol\u002FFormat              |\n| ---------------------------------- | ------------------------------------- | ----------------- | ---------------------------- |\n| `databricks (mode: delta_lake)`    | [Databricks][databricks]              | Stable            | S3\u002FDelta Lake                |\n| `delta_lake`                       | Delta Lake                            | Stable            | Delta Lake                   |\n| `dremio`                           | [Dremio][dremio]                      | Stable            | Arrow Flight                 |\n| `duckdb`                           | DuckDB                                | Stable            | Embedded                     |\n| `file`                             | File                                  | Stable            | Parquet, CSV                 |\n| `github`                           | GitHub                                | Stable            | GitHub API                   |\n| `postgres`                         | PostgreSQL                            | Stable            |                              |\n| `s3`                               | [S3][s3]                              | Stable            | Parquet, CSV                 |\n| `mysql`                            | MySQL                 
                 | Stable            |                              |\n| `spice.ai`                         | [Spice.ai][spiceai]                   | Stable            | Arrow Flight                 |\n| `graphql`                          | GraphQL                               | Release Candidate | JSON                         |\n| `dynamodb`                         | Amazon DynamoDB                       | Release Candidate |                              |\n| `databricks (mode: spark_connect)` | [Databricks][databricks]              | Beta              | [Spark Connect][spark]       |\n| `flightsql`                        | FlightSQL                             | Beta              | Arrow Flight SQL             |\n| `iceberg`                          | [Apache Iceberg][iceberg]             | Beta              | Parquet                      |\n| `mssql`                            | Microsoft SQL Server                  | Beta              | Tabular Data Stream (TDS)    |\n| `odbc`                             | ODBC                                  | Beta              | ODBC                         |\n| `snowflake`                        | Snowflake                             | Beta              | Arrow                        |\n| `spark`                            | Spark                                 | Beta              | [Spark Connect][spark]       |\n| `oracle`                           | Oracle                                | Alpha             | [Oracle ODPI-C][ODPIC]       |\n| `abfs`                             | Azure BlobFS                          | Alpha             | Parquet, CSV                 |\n| `clickhouse`                       | ClickHouse                            | Alpha             |                              |\n| `debezium`                         | Debezium CDC                          | Alpha             | Kafka + JSON                 |\n| `gcs`, `gs`                        | [Google Cloud Storage][gcs]           | Alpha            
 | Parquet, CSV, JSON           |\n| `kafka`                            | Kafka                                 | Alpha             | Kafka + JSON                 |\n| `ftp`, `sftp`                      | FTP\u002FSFTP                              | Alpha             | Parquet, CSV                 |\n| `glue`                             | [AWS Glue][glue]                      | Alpha             | Iceberg, Parquet, CSV        |\n| `http`, `https`                    | HTTP(s)                               | Alpha             | Parquet, CSV, JSON           |\n| `imap`                             | IMAP                                  | Alpha             | IMAP Emails                  |\n| `localpod`                         | [Local dataset replication][localpod] | Alpha             |                              |\n| `mongodb`                          | MongoDB                               | Alpha             |                              |\n| `sharepoint`                       | Microsoft SharePoint                  | Alpha             | Unstructured UTF-8 documents |\n| `scylladb`                         | ScyllaDB                              | Alpha             |                              |\n| `smb`                              | SMB (Server Message Block)            | Alpha             | SMB                          |\n| `elasticsearch`                    | Elasticsearch                         | Roadmap           |                              |\n\n[databricks]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Fdatabricks\u002FREADME.md\n[spark]: https:\u002F\u002Fspark.apache.org\u002Fdocs\u002Flatest\u002Fspark-connect-overview.html\n[gcs]: docs\u002Ffeatures\u002Fgcs-connector.md\n[s3]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fs3#readme\n[spiceai]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fspiceai#readme\n[dremio]: 
https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fdremio#readme\n[localpod]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Flocalpod\u002FREADME.md\n[iceberg]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fcatalogs\u002Ficeberg#readme\n[glue]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fglue\u002FREADME.md\n[ODPIC]: https:\u002F\u002Foracle.github.io\u002Fodpi\u002F\n\n## Supported Data Accelerators\n\n| Name       | Description                       | Status            | Engine Modes     |\n| ---------- | --------------------------------- | ----------------- | ---------------- |\n| `cayenne`  | [Spice Cayenne (Vortex)][cayenne] | Release Candidate | `file`           |\n| `arrow`    | [In-Memory Arrow Records][arrow]  | Stable            | `memory`         |\n| `duckdb`   | Embedded [DuckDB][duckdb]         | Stable            | `memory`, `file` |\n| `postgres` | Attached [PostgreSQL][postgres]   | Release Candidate | N\u002FA              |\n| `sqlite`   | Embedded [SQLite][sqlite]         | Release Candidate | `memory`, `file` |\n\n[arrow]: https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Farrow\n[cayenne]: https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fcayenne\n[duckdb]: https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fduckdb\n[postgres]: https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fpostgres\n[sqlite]: https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fsqlite\n\n## Supported Model Providers\n\n| Name          | Description                                  | Status            | ML Format(s) | LLM Format(s)                   |\n| ------------- | -------------------------------------------- | ----------------- | ------------ | 
------------------------------- |\n| `openai`      | OpenAI (or compatible) LLM endpoint          | Release Candidate | -            | OpenAI-compatible HTTP endpoint |\n| `file`        | Local filesystem                             | Release Candidate | ONNX         | GGUF, GGML, SafeTensor          |\n| `huggingface` | Models hosted on HuggingFace                 | Release Candidate | ONNX         | GGUF, GGML, SafeTensor          |\n| `spice.ai`    | Models hosted on the Spice.ai Cloud Platform |                   | ONNX         | OpenAI-compatible HTTP endpoint |\n| `azure`       | Azure OpenAI                                 |                   | -            | OpenAI-compatible HTTP endpoint |\n| `bedrock`     | Amazon Bedrock (Nova models)                 | Alpha             | -            | OpenAI-compatible HTTP endpoint |\n| `anthropic`   | Models hosted on Anthropic                   | Alpha             | -            | OpenAI-compatible HTTP endpoint |\n| `xai`         | Models hosted on xAI                         | Alpha             | -            | OpenAI-compatible HTTP endpoint |\n\n## Supported Embeddings Providers\n\n| Name          | Description                         | Status            | ML Format(s) | LLM Format(s)\\*                 |\n| ------------- | ----------------------------------- | ----------------- | ------------ | ------------------------------- |\n| `openai`      | OpenAI (or compatible) LLM endpoint | Release Candidate | -            | OpenAI-compatible HTTP endpoint |\n| `file`        | Local filesystem                    | Release Candidate | ONNX         | GGUF, GGML, SafeTensor          |\n| `huggingface` | Models hosted on HuggingFace        | Release Candidate | ONNX         | GGUF, GGML, SafeTensor          |\n| `model2vec`   | Static embeddings (500x faster)     | Release Candidate | Model2Vec    | -                               |\n| `azure`       | Azure OpenAI                        | Alpha             | -            
| OpenAI-compatible HTTP endpoint |\n| `bedrock`     | AWS Bedrock (e.g., Titan, Cohere)   | Alpha             | -            | OpenAI-compatible HTTP endpoint |\n\n## Supported Vector Stores\n\n| Name            | Description                                                          | Status |\n| --------------- | -------------------------------------------------------------------- | ------ |\n| `s3_vectors`    | Amazon S3 Vectors for petabyte-scale vector storage and querying     | Alpha  |\n| `pgvector`      | PostgreSQL with pgvector extension                                   | Alpha  |\n| `duckdb_vector` | DuckDB with vector extension for efficient vector storage and search | Alpha  |\n| `sqlite_vec`    | SQLite with sqlite-vec extension for lightweight vector operations   | Alpha  |\n\n## Supported Catalogs\n\nCatalog Connectors connect to external catalog providers and make their tables available for federated SQL query in Spice. Configuring accelerations for tables in external catalogs is not supported. 
The schema hierarchy of the external catalog is preserved in Spice.\n\n| Name            | Description             | Status | Protocol\u002FFormat              |\n| --------------- | ----------------------- | ------ | ---------------------------- |\n| `spice.ai`      | Spice.ai Cloud Platform | Stable | Arrow Flight                 |\n| `unity_catalog` | Unity Catalog           | Stable | Delta Lake                   |\n| `databricks`    | Databricks              | Beta   | Spark Connect, S3\u002FDelta Lake |\n| `iceberg`       | Apache Iceberg          | Beta   | Parquet                      |\n| `glue`          | AWS Glue                | Alpha  | CSV, Parquet, Iceberg        |\n\n## ⚡️ Quickstart (Local Machine)\n\n\u003Chttps:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fassets\u002F88671039\u002F85cf9a69-46e7-412e-8b68-22617dcbd4e0>\n\n### Installation\n\nInstall the Spice CLI:\n\nOn **macOS, Linux, and WSL**:\n\n```bash\ncurl https:\u002F\u002Finstall.spiceai.org | \u002Fbin\u002Fbash\n```\n\nOr using `brew`:\n\n```bash\nbrew install spiceai\u002Fspiceai\u002Fspice\n```\n\nOn **Windows** using PowerShell:\n\n```powershell\niex ((New-Object System.Net.WebClient).DownloadString(\"https:\u002F\u002Finstall.spiceai.org\u002FInstall.ps1\"))\n```\n\n### Usage\n\n**Step 1.** Initialize a new Spice app with the `spice init` command:\n\n```bash\nspice init spice_qs\n```\n\nA `spicepod.yaml` file is created in the `spice_qs` directory. Change to that directory:\n\n```bash\ncd spice_qs\n```\n\n**Step 2.** Start the Spice runtime:\n\n```bash\nspice run\n```\n\nExample output:\n\n```bash\n2025\u002F01\u002F20 11:26:10 INFO Spice.ai runtime starting...\n2025-01-20T19:26:10.679068Z  INFO runtime::init::dataset: No datasets were configured. 
If this is unexpected, check the Spicepod configuration.\n2025-01-20T19:26:10.679716Z  INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051\n2025-01-20T19:26:10.679786Z  INFO runtime::metrics_server: Spice Runtime Metrics listening on 127.0.0.1:9090\n2025-01-20T19:26:10.680140Z  INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090\n2025-01-20T19:26:10.879126Z  INFO runtime::init::results_cache: Initialized sql results cache; max size: 128.00 MiB, item ttl: 1s\n```\n\nThe runtime is now running and ready for queries.\n\n**Step 3.** In a new terminal window, add the `spiceai\u002Fquickstart` Spicepod. A Spicepod is a package of configuration that defines datasets and ML models.\n\n```bash\nspice add spiceai\u002Fquickstart\n```\n\nThe `spicepod.yaml` file is updated with the `spiceai\u002Fquickstart` dependency:\n\n```yaml\nversion: v1\nkind: Spicepod\nname: spice_qs\ndependencies:\n  - spiceai\u002Fquickstart\n```\n\nThe `spiceai\u002Fquickstart` Spicepod adds a `taxi_trips` table to the runtime, which is then available to query with SQL.\n\n```bash\n2025-01-20T19:26:30.011633Z  INFO runtime::init::dataset: Dataset taxi_trips registered (s3:\u002F\u002Fspiceai-demo-datasets\u002Ftaxi_trips\u002F2024\u002F), acceleration (arrow), results cache enabled.\n2025-01-20T19:26:30.013002Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips\n2025-01-20T19:26:40.312839Z  INFO runtime::accelerated_table::refresh_task: Loaded 2,964,624 rows (399.41 MiB) for dataset taxi_trips in 10s 299ms\n```\n\n**Step 4.** Start the Spice SQL REPL:\n\n```bash\nspice sql\n```\n\nThe SQL REPL interface is displayed:\n\n```bash\nWelcome to the Spice.ai SQL REPL! 
Type 'help' for help.\n\nshow tables; -- list available tables\nsql>\n```\n\nEnter `show tables;` to list the tables available to query:\n\n```bash\nsql> show tables;\n+---------------+--------------+---------------+------------+\n| table_catalog | table_schema | table_name    | table_type |\n+---------------+--------------+---------------+------------+\n| spice         | public       | taxi_trips    | BASE TABLE |\n| spice         | runtime      | query_history | BASE TABLE |\n| spice         | runtime      | metrics       | BASE TABLE |\n+---------------+--------------+---------------+------------+\n\nTime: 0.022671708 seconds. 3 rows.\n```\n\nEnter a query to display the longest taxi trips:\n\n```sql\nSELECT trip_distance, total_amount FROM taxi_trips ORDER BY trip_distance DESC LIMIT 10;\n```\n\nOutput:\n\n```bash\n+---------------+--------------+\n| trip_distance | total_amount |\n+---------------+--------------+\n| 312722.3      | 22.15        |\n| 97793.92      | 36.31        |\n| 82015.45      | 21.56        |\n| 72975.97      | 20.04        |\n| 71752.26      | 49.57        |\n| 59282.45      | 33.52        |\n| 59076.43      | 23.17        |\n| 58298.51      | 18.63        |\n| 51619.36      | 24.2         |\n| 44018.64      | 52.43        |\n+---------------+--------------+\n\nTime: 0.045150667 seconds. 10 rows.\n```\n\n## ⚙️ Runtime Container Deployment\n\nUsing the [Docker image](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fspiceai\u002Fspiceai) locally:\n\n```bash\ndocker pull spiceai\u002Fspiceai\n```\n\nIn a Dockerfile:\n\n```dockerfile\nFROM spiceai\u002Fspiceai:latest\n```\n\nUsing Helm:\n\n```bash\nhelm repo add spiceai https:\u002F\u002Fhelm.spiceai.org\nhelm install spiceai spiceai\u002Fspiceai\n```\n\n## 🏎️ Next Steps\n\n### Explore the Spice.ai Cookbook\n\nThe Spice.ai Cookbook is a collection of recipes and examples for using Spice. 
Find it at [https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook#readme).\n\n### Using Spice.ai Cloud Platform\n\nAccess ready-to-use Spicepods and datasets hosted on the Spice.ai Cloud Platform using the Spice runtime. A list of public Spicepods is available on Spicerack: [https:\u002F\u002Fspicerack.org\u002F](https:\u002F\u002Fspicerack.org\u002F).\n\nTo use public datasets, create a free account on Spice.ai:\n\n1. Visit [spice.ai](https:\u002F\u002Fspice.ai\u002F) and click **Try for Free**.\n   ![Try for Free](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_readme_13db350a7e64.png)\n\n2. After creating an account, create an app to generate an API key.\n   ![Create App](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_readme_84977fe4da1a.png)\n\nOnce set up, you can access ready-to-use Spicepods and their datasets. For this demonstration, use the `taxi_trips` dataset from the [Spice.ai Quickstart](https:\u002F\u002Fspice.ai\u002Fspiceai\u002Fquickstart).\n\n**Step 1.** Initialize a new project:\n\n```bash\n# Initialize a new Spice app\nspice init spice_app\n\n# Change to app directory\ncd spice_app\n```\n\n**Step 2.** Log in and authenticate from the command line using the `spice login` command. A browser window opens and prompts you to authenticate:\n\n```bash\nspice login\n```\n\n**Step 3.** Start the runtime:\n\n```bash\n# Start the runtime\nspice run\n```\n\n**Step 4.** Configure the dataset:\n\nIn a new terminal window, configure a new dataset using the `spice dataset configure` command:\n\n```bash\nspice dataset configure\n```\n\nEnter a dataset name that will be used to reference the dataset in queries. 
This name does not need to match the name in the dataset source.\n\n```bash\ndataset name: (spice_app) taxi_trips\n```\n\nEnter the description of the dataset:\n\n```bash\ndescription: Taxi trips dataset\n```\n\nEnter the location of the dataset:\n\n```bash\nfrom: spice.ai\u002Fspiceai\u002Fquickstart\u002Fdatasets\u002Ftaxi_trips\n```\n\nSelect `y` when prompted whether to accelerate the data:\n\n```bash\nLocally accelerate (y\u002Fn)? y\n```\n\nYou should see the following output from your runtime terminal:\n\n```bash\n2024-12-16T05:12:45.803694Z  INFO runtime::init::dataset: Dataset taxi_trips registered (spice.ai\u002Fspiceai\u002Fquickstart\u002Fdatasets\u002Ftaxi_trips), acceleration (arrow, 10s refresh), results cache enabled.\n2024-12-16T05:12:45.805494Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips\n2024-12-16T05:13:24.218345Z  INFO runtime::accelerated_table::refresh_task: Loaded 2,964,624 rows (8.41 GiB) for dataset taxi_trips in 38s 412ms.\n```\n\n**Step 5.** In a new terminal window, use the Spice SQL REPL to query the dataset:\n\n```bash\nspice sql\n```\n\n```sql\nSELECT tpep_pickup_datetime, passenger_count, trip_distance FROM taxi_trips LIMIT 10;\n```\n\nThe output displays the results of the query along with the query execution time:\n\n```bash\n+----------------------+-----------------+---------------+\n| tpep_pickup_datetime | passenger_count | trip_distance |\n+----------------------+-----------------+---------------+\n| 2024-01-11T12:55:12  | 1               | 0.0           |\n| 2024-01-11T12:55:12  | 1               | 0.0           |\n| 2024-01-11T12:04:56  | 1               | 0.63          |\n| 2024-01-11T12:18:31  | 1               | 1.38          |\n| 2024-01-11T12:39:26  | 1               | 1.01          |\n| 2024-01-11T12:18:58  | 1               | 5.13          |\n| 2024-01-11T12:43:13  | 1               | 2.9           |\n| 2024-01-11T12:05:41  | 1               | 1.36          |\n| 
2024-01-11T12:20:41  | 1               | 1.11          |\n| 2024-01-11T12:37:25  | 1               | 2.04          |\n+----------------------+-----------------+---------------+\n\nTime: 0.00538925 seconds. 10 rows.\n```\n\nTo compare query times without acceleration, change the acceleration setting from `true` to `false` in the datasets.yaml file and run the query again.\n\n### 📄 Documentation\n\nComprehensive documentation is available at [spiceai.org\u002Fdocs](https:\u002F\u002Fspiceai.org\u002Fdocs\u002F).\n\nOver 45 quickstarts and samples are available in the [Spice Cookbook](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook#spiceai-oss-cookbook).\n\n### 🔌 Extensibility\n\nSpice.ai is designed to be extensible, with extension points documented in [EXTENSIBILITY.md](.\u002Fdocs\u002FEXTENSIBILITY.md). Build custom [Data Connectors](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors), [Data Accelerators](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators), [Catalog Connectors](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fcatalogs), [Secret Stores](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fsecret-stores), [Models](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fmodels), or [Embeddings](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fembeddings).\n\n### 🔨 Upcoming Features\n\n🚀 See the [Roadmap](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fblob\u002Ftrunk\u002Fdocs\u002FROADMAP.md) for upcoming features.\n\n### 🤝 Connect with us\n\nWe greatly appreciate and value your support! 
You can help Spice in a number of ways:\n\n- Build an app with Spice.ai and send us feedback and suggestions at [hey@spice.ai](mailto:hey@spice.ai) or on [Discord](https:\u002F\u002Fdiscord.gg\u002FkZnTfneP5u), [X](https:\u002F\u002Ftwitter.com\u002Fspice_ai), or [LinkedIn](https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002F74148478).\n- [File an issue](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fissues\u002Fnew) if you see something not quite working correctly.\n- Join our team ([We’re hiring!](https:\u002F\u002Fspice.ai\u002Fcareers)).\n- Contribute code or documentation to the project (see [CONTRIBUTING.md](CONTRIBUTING.md)).\n- Follow our blog at [spiceai.org\u002Fblog](https:\u002F\u002Fspiceai.org\u002Fblog).\n\n⭐️ Star this repo! Thank you for your support! 🙏\n","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_readme_2b4a8d386306.png\" alt=\"spice oss logo\" width=\"600\"\u002F>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fcodeql-analysis.yml\">\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fcodeql-analysis.yml\u002Fbadge.svg?branch=trunk&event=push\" alt=\"CodeQL\"\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue.svg\" alt=\"License: Apache-2.0\"\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fspiceai.org\u002Fslack\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSlack-Join%20Us-4A154B?logo=slack\" alt=\"Slack\"\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002Fintent\u002Ffollow?screen_name=spice_ai\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Fspice_ai.svg?style=social&logo=x\" alt=\"Follow on 
X\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fbuild_and_release.yml?branch=trunk\">\u003Cimg alt=\"GitHub Actions Workflow Status - build\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fspiceai\u002Fspiceai\u002Fbuild_and_release.yml?branch=trunk\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fspiced_docker_nightly.yml?branch=trunk\">\u003Cimg alt=\"GitHub Actions Workflow Status - docker build\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fspiceai\u002Fspiceai\u002Fspiced_docker_nightly.yml?branch=trunk&label=docker%20build\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fpr.yml?branch=trunk\">\u003Cimg alt=\"GitHub Actions Workflow Status - unit tests\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fspiceai\u002Fspiceai\u002Fpr.yml?event=merge_group&label=unit%20tests\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fintegration.yml?branch=trunk\">\u003Cimg alt=\"GitHub Actions Workflow Status - integration tests\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fspiceai\u002Fspiceai\u002Fintegration.yml?branch=trunk&label=integration%20tests\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fintegration_models.yml?branch=trunk\">\u003Cimg alt=\"GitHub Actions Workflow Status - integration tests (models)\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fspiceai\u002Fspiceai\u002Fintegration_models.yml?branch=trunk&label=integration%20tests%20(models)\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Factions\u002Fworkflows\u002Fbenchmarks.yml?branch=trunk\">\u003Cimg alt=\"GitHub Actions Workflow Status - benchmark tests\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Factions\u002Fworkflow\u002Fstatus\u002Fspiceai\u002Fspiceai\u002Ftestoperator_run_bench.yml?branch=trunk&label=benchmark%20tests\" \u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fspiceai.org\u002Fdocs\">📄 Docs\u003C\u002Fa> | \u003Ca href=\"#%EF%B8%8F-quickstart-local-machine\">⚡️ Quickstart\u003C\u002Fa> | \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\">🧑‍🍳 Cookbook\u003C\u002Fa>\n\u003C\u002Fp>\n\n**Spice** is a SQL query, search, and LLM-inference engine, written in Rust, for data apps and agents.\n\n\u003Cimg width=\"740\" alt=\"Spice.ai open-source accelerated data query and LLM-inference engine\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_readme_32134d46fe8d.png\" \u002F>\n\nSpice provides four industry-standard APIs in a lightweight, portable runtime (a single binary or container):\n\n1. **SQL Query & Search**: HTTP, Arrow Flight, Arrow Flight SQL, ODBC, JDBC, and ADBC APIs; `vector_search` and `text_search` UDTFs.\n2. **OpenAI-Compatible APIs**: HTTP APIs for OpenAI SDK compatibility, local model serving (CUDA\u002FMetal accelerated), and a gateway to hosted models.\n3. **Iceberg Catalog REST APIs**: A unified Iceberg REST Catalog API.\n4. 
**MCP HTTP+SSE APIs**: Integration with external tools via the Model Context Protocol (MCP) using HTTP and Server-Sent Events (SSE).\n\n🎯 Goal: Developers can focus on building data apps and AI agents, confident that their apps are grounded in reliable data.\n\nSpice's primary features include:\n\n- **Data Federation**: SQL query across any database, data warehouse, or data lake, with query execution that scales from a single node to distributed multi-node. [Learn More](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fquery-federation).\n- **Data Materialization and Acceleration**: Materialize, accelerate, and cache database queries with Arrow, DuckDB, SQLite, PostgreSQL, or Spice Cayenne (Vortex). [Read the MaterializedView interview: Building a CDN for Databases](https:\u002F\u002Fmaterializedview.io\u002Fp\u002Fbuilding-a-cdn-for-databases-spice-ai)\n- **Enterprise Search**: Tantivy-powered BM25 keyword, vector, and full-text search, plus petabyte-scale vector similarity search via Amazon S3 Vectors or pgvector, across structured and unstructured data.\n- **AI Apps and Agents**: An AI database that combines OpenAI-compatible APIs with MCP integration to power retrieval-augmented generation (RAG) and intelligent agents. [Learn More](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fuse-cases\u002Frag).\n\nIf you want to build with DataFusion, DuckDB, or Vortex, Spice provides a simple, flexible, production-ready engine you can use directly.\n\n📣 Read the [Spice.ai 1.0-stable announcement](https:\u002F\u002Fspiceai.org\u002Fblog\u002Fannouncing-1.0-stable).\n\nSpice is built on industry-leading technologies including [Apache DataFusion](https:\u002F\u002Fdatafusion.apache.org), Apache Arrow, Arrow Flight, SQLite, and DuckDB.\n\n\u003Cdiv align=\"center\">\n  \u003Cpicture>\n    \u003Cimg width=\"600\" alt=\"How Spice works.\" src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_readme_a2de21b7ee51.png\" \u002F>\n  \u003C\u002Fpicture>\n\u003C\u002Fdiv>\n\n🎥 [Watch CMU Databases: Accelerating Data and AI with Spice.ai Open-Source](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=tyM-ec1lKfU)\n\n🎥 [Watch how to query data using Spice, OpenAI, and MCP](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=TFAu4qxjTPk&list=PLesJrUXEx3U-dQul0PqLV3TGTdUmr3B6e&index=8)\n\n🎥 [Watch how to search with Amazon S3 Vectors](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=QPbqPf5W36g)\n\n## Why Spice?\n\n\u003Cdiv align=\"center\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fassets\u002F80174\u002F96b5fcef-a550-4ce8-a74a-83931275e83e\">\n    \u003Cimg width=\"800\" alt=\"Spice.ai\" 
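src=\"https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_readme_ad2351c0603d.png\" \u002F>\n  \u003C\u002Fpicture>\n\u003C\u002Fdiv>\n\nSpice simplifies building data-driven AI applications and agents by making it fast and easy to query, federate, and accelerate data from one or more sources using SQL, while grounding AI in real-time, reliable data. Co-locate datasets with apps and AI models to power AI feedback loops, enable RAG and search, and deliver fast, low-latency data query and AI inference with full control over cost and performance.\n\n### Latest Features\n\n- **Spice Cayenne Data Accelerator**: Simplified multi-file acceleration using the [Vortex columnar format](https:\u002F\u002Fgithub.com\u002Fvortex-data\u002Fvortex) + SQLite metadata. Delivers performance comparable to DuckDB without single-file scaling limits.\n- **Multi-Node Distributed Query**: Scale query execution across multiple nodes with Apache Ballista integration for better performance on large datasets.\n- **Acceleration Snapshots**: Bootstrap accelerations from S3 for fast cold starts (seconds instead of minutes). Supports ephemeral storage with persistent recovery.\n- **Iceberg Table Writes**: Write directly to Iceberg tables with standard SQL `INSERT INTO` statements for data ingestion and transformation, with no Spark required.\n- **Petabyte-Scale Vector Search**: Native Amazon S3 Vectors integration manages the full vector lifecycle, from ingestion and embedding to querying. SQL-integrated hybrid search with RRF.\n\n### How is Spice different?\n\n1. **AI-Native Runtime**: Spice combines data query and AI inference in a single engine for data-grounded, accurate AI.\n2. **Application-Focused**: Designed to be distributed at the application and agent level, often as a 1:1 or 1:N mapping between app and Spice instance, unlike traditional data systems that serve many apps from one centralized database. It is common to spin up multiple Spice instances, even one per tenant or customer.\n3. **Dual-Engine Acceleration**: Supports both **OLAP** (Arrow\u002FDuckDB) and **OLTP** (SQLite\u002FPostgreSQL) engines at the dataset level, providing flexible performance across analytical and transactional workloads.\n4. **Disaggregated Storage**: Compute is separated from disaggregated storage, co-locating local, materialized working sets of data with applications, dashboards, or ML pipelines while accessing source data in its original storage.\n5. 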
**Edge to Cloud Native**: Deploy as a standalone instance, Kubernetes sidecar, microservice, or cluster across edge\u002FPOP, on-premises data centers, and public clouds. Chain multiple Spice instances for tier-optimized, distributed deployments.\n\n## How does Spice compare?\n\n### Data Query and Analytics\n\n| Feature                          | **Spice**                              | Trino \u002F Presto       | Dremio                | ClickHouse          | Materialize          |\n| -------------------------------- | -------------------------------------- | -------------------- | --------------------- | ------------------- | -------------------- |\n| **Primary Use-Case**             | Data & AI apps\u002Fagents                  | Big data analytics   | Interactive analytics | Real-time analytics | Real-time analytics  |\n| **Primary Deployment Model**     | Sidecar                                | Cluster              | Cluster               | Cluster             | Cluster              |\n| **Federated Query Support**      | ✅                                     | ✅                   | ✅                    | ―                   | ―                    |\n| **Acceleration\u002FMaterialization** | ✅ (Arrow, SQLite, DuckDB, PostgreSQL) | Intermediate storage | Reflections (Iceberg) | Materialized views  | ✅ (Real-time views) |\n| **Catalog Support**              | ✅ (Iceberg, Unity Catalog, AWS Glue)  | ✅                   | ✅                    | ―                   | ―                    |\n| **Query Result Caching**         | ✅                                     | ✅                   | ✅                    | ✅                  | Limited              |\n| **Multi-Modal Acceleration**     | ✅ (OLAP + OLTP)                       | ―                    | ―                     | ―                   | ―                    |\n| **Change Data Capture (CDC)**    | ✅ (Debezium)                          | ―                    | ―                     | ―                   | ✅ (Debezium)        |\n\n### AI Apps and Agents\n\n| Feature                      | **Spice**                               | LangChain          | LlamaIndex | AgentOps.ai      | Ollama                        |\n| ---------------------------- | --------------------------------------- | ------------------ | ---------- | 
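---------------- | ----------------------------- |\n| **Primary Use-Case**         | Data & AI apps                          | Agentic workflows  | RAG apps   | Agent operations | LLM apps                      |\n| **Programming Language**     | Any language (HTTP interface)           | JavaScript, Python | Python     | Python           | Any language (HTTP interface) |\n| **Unified Data + AI Runtime** | ✅                                     | ―                  | ―          | ―                | ―                             |\n| **Federated Data Query**     | ✅                                      | ―                  | ―          | ―                | ―                             |\n| **Accelerated Data Access**  | ✅                                      | ―                  | ―          | ―                | ―                             |\n| **Tools\u002FFunctions**          | ✅ (MCP HTTP+SSE)                       | ✅                 | ✅         | Limited          | Limited                       |\n| **LLM Memory**               | ✅                                      | ✅                 | ―          | ✅               | ―                             |\n| **Hybrid Search**            | ✅ (Keyword, Vector, and Full-Text)     | ✅                 | ✅         | Limited          | Limited                       |\n| **Caching**                  | ✅ (Query and results caching)          | Limited            | ―          | ―                | ―                             |\n| **Embeddings**               | ✅ (Built-in & pluggable models\u002FDBs)    | ✅                 | ✅         | Limited          | ―                             |\n\n✅ = Fully supported  \n❌ = Not supported  \nLimited = Partial or restricted support\n\n## Example Use-Cases\n\n### Data-Grounded Agentic AI Applications\n\n- **OpenAI-compatible API**: Connect to hosted models (OpenAI, Anthropic, xAI, Amazon Bedrock) or deploy locally (Llama, NVIDIA NIM), with advanced interactions through the OpenAI Responses API. [AI Gateway Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Fopenai_sdk\u002FREADME.md)\n- **Federated Data Access**: Query across databases, data warehouses, and data lakes using SQL and NSQL (text-to-SQL), with advanced query push-down for fast retrieval. Scale to distributed multi-node query execution with Apache Ballista. [Federated SQL Query Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Ffederation\u002FREADME.md)\n- **Search and RAG**: Search and retrieve context with accelerated embeddings for retrieval-augmented generation (RAG) workflows. Native Amazon S3 Vectors integration supports petabyte-scale vector search. Full-text search (FTS) is powered by Tantivy BM25, and vector similarity search (VSS) is integrated into SQL through the `text_search` and `vector_search` UDTFs. Hybrid search uses Reciprocal Rank Fusion (RRF). [Amazon S3 Vectors Cookbook Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fvectors\u002Fs3\u002FREADME.md)\n- **LLM Memory and Observability**: Store and retrieve history and context for AI agents, with deep visibility into data flows, model performance, and traces. [LLM Memory Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Fllm-memory\u002FREADME.md) | [Observability & Monitoring Features Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fobservability)\n\n### Database CDN and Query Mesh\n\n- **Data Acceleration**: Co-locate materialized datasets with applications in Arrow, SQLite, DuckDB, PostgreSQL, or Cayenne (Vortex + SQLite) for sub-second queries. Cold-start quickly from snapshots stored in S3. Write to Iceberg tables with standard SQL `INSERT INTO`. [DuckDB Data Accelerator Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Fduckdb\u002Faccelerator\u002FREADME.md)\n- **Resiliency and Local Dataset Replication**: Keep applications available by maintaining local replicas of critical datasets, and recover from acceleration snapshots when federated sources fail. [Local Dataset Replication Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Flocalpod\u002FREADME.md)\n- **Responsive Dashboards**: Enable fast, real-time analytics by accelerating data for frontends and BI tools on configurable refresh schedules. [Sales BI Dashboard Demo](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Fsales-bi\u002FREADME.md)\n- **Simplified Legacy Migration**: Unify legacy systems and modern infrastructure behind a single endpoint, including federated SQL queries across multiple data sources. [Federated SQL Query Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Ffederation\u002FREADME.md)\n\n### Retrieval-Augmented Generation (RAG)\n\n- **Unified Vector Similarity Search**: Run efficient vector similarity search across structured and unstructured data sources, with native Amazon S3 Vectors integration for petabyte-scale vector storage and querying. The Spice runtime manages the vector lifecycle: it ingests data, embeds it with AWS Bedrock (Amazon Titan, Cohere), HuggingFace models, or Model2Vec (static embeddings, 500x faster), and stores the vectors in S3 Vector buckets or pgvector. Supports cosine similarity, Euclidean distance, or dot product. Search is integrated into SQL through the `vector_search` and `text_search` UDTFs, with hybrid search via Reciprocal Rank Fusion (RRF). Example: `SELECT * FROM vector_search(my_table, 'search query', 10) WHERE condition ORDER BY _score;`. [Amazon S3 Vectors Cookbook Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fvectors\u002Fs3\u002FREADME.md)\n- **Semantic Knowledge Layer**: Define a semantic context model to enrich data for AI. [Semantic Model Feature Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fsemantic-model)\n- **Text-to-SQL**: Convert natural-language queries into accurate SQL using built-in NSQL and sampling tools. [Text-to-SQL Recipe](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Ftext-to-sql\u002FREADME.md)\n\n## FAQ\n\n- **Is Spice a cache?** Not strictly; you can think of Spice data acceleration as an \"active\" cache, materialization, or data prefetcher. A cache fetches data on a cache miss, whereas Spice prefetches and materializes filtered data on an interval, on a trigger, or as data changes using CDC. In addition to data acceleration, Spice supports [results caching](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fcaching).\n\n- **Is Spice a CDN for databases?** Yes, a common use of Spice is as a CDN for different data sources. Borrowing CDN concepts, Spice lets you ship (load) a working set of your database (or data lake, or data warehouse) to where it is most frequently accessed, such as a data-intensive application or AI context.\n\n[➡️ Docs FAQ](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffaq)\n\n### Watch a 30-second BI dashboard acceleration demo\n\n\u003Chttps:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fassets\u002F80174\u002F7735ee94-3f4a-4983-a98e-fe766e79e03a>\n\nSee more demos on [YouTube](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLesJrUXEx3U9anekJvbjyyTm7r9A26ugK).\n\n## Supported Data Connectors\n\n| Name                               | Description                           | Status            | Protocol\u002FFormat              |\n| ---------------------------------- | ------------------------------------- | ----------------- | ---------------------------- |\n| `databricks (mode: delta_lake)`    | [Databricks][databricks]              | Stable            | S3\u002FDelta Lake                |\n| `delta_lake`                       | Delta Lake                            | Stable            | Delta Lake                   |\n| `dremio`                           | [Dremio][dremio]                      | Stable            | Arrow Flight                 |\n| `duckdb`                           | DuckDB                                | Stable            | Embedded                     |\n| `file`                             | File                                  | Stable            | Parquet, CSV                 |\n| 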
`github`                           | GitHub                                | Stable            | GitHub API                   |\n| `postgres`                         | PostgreSQL                            | Stable            |                              |\n| `s3`                               | [S3][s3]                              | Stable            | Parquet, CSV                 |\n| `mysql`                            | MySQL                                 | Stable            |                              |\n| `spice.ai`                         | [Spice.ai][spiceai]                   | Stable            | Arrow Flight                 |\n| `graphql`                          | GraphQL                               | Release Candidate | JSON                         |\n| `dynamodb`                         | Amazon DynamoDB                       | Release Candidate |                              |\n| `databricks (mode: spark_connect)` | [Databricks][databricks]              | Beta              | [Spark Connect][spark]       |\n| `flightsql`                        | FlightSQL                             | Beta              | Arrow Flight SQL             |\n| `iceberg`                          | [Apache Iceberg][iceberg]             | Beta              | Parquet                      |\n| `mssql`                            | Microsoft SQL Server                  | Beta              | Tabular Data Stream (TDS)    |\n| `odbc`                             | ODBC                                  | Beta              | ODBC                         |\n| `snowflake`                        | Snowflake                             | Beta              | Arrow                        |\n| `spark`                            | Spark                                 | Beta              | [Spark Connect][spark]       |\n| `oracle`                           | Oracle                                | Alpha             | [Oracle ODPI-C][ODPIC]       |\n| `abfs`                             | Azure BlobFS                          | Alpha             | Parquet, CSV                 
|\n| `clickhouse`                       | ClickHouse                            | Alpha             |                              |\n| `debezium`                         | Debezium CDC                          | Alpha             | Kafka + JSON                 |\n| `gcs`, `gs`                        | [Google Cloud Storage][gcs]           | Alpha             | Parquet, CSV, JSON           |\n| `kafka`                            | Kafka                                 | Alpha             | Kafka + JSON                 |\n| `ftp`, `sftp`                      | FTP\u002FSFTP                              | Alpha             | Parquet, CSV                 |\n| `glue`                             | [AWS Glue][glue]                      | Alpha             | Iceberg, Parquet, CSV        |\n| `http`, `https`                    | HTTP(s)                               | Alpha             | Parquet, CSV, JSON           |\n| `imap`                             | IMAP                                  | Alpha             | IMAP emails                  |\n| `localpod`                         | [Local dataset replication][localpod] | Alpha             |                              |\n| `mongodb`                          | MongoDB                               | Alpha             |                              |\n| `sharepoint`                       | Microsoft SharePoint                  | Alpha             | Unstructured UTF-8 documents |\n| `scylladb`                         | ScyllaDB                              | Alpha             |                              |\n| `smb`                              | SMB (Server Message Block)            | Alpha             | SMB                          |\n| `elasticsearch`                    | Elasticsearch                         | Planned           |                              |\n\n[databricks]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Fdatabricks\u002FREADME.md\n[spark]: https:\u002F\u002Fspark.apache.org\u002Fdocs\u002Flatest\u002Fspark-connect-overview.html\n[gcs]: 
docs\u002Ffeatures\u002Fgcs-connector.md\n[s3]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fs3#readme\n[spiceai]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fspiceai#readme\n[dremio]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fdremio#readme\n[localpod]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Fblob\u002Ftrunk\u002Flocalpod\u002FREADME.md\n[iceberg]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fcatalogs\u002Ficeberg#readme\n[glue]: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook\u002Ftree\u002Ftrunk\u002Fglue\u002FREADME.md\n[ODPIC]: https:\u002F\u002Foracle.github.io\u002Fodpi\u002F\n\n## Supported Data Accelerators\n\n| Name       | Description                       | Status            | Engine Modes     |\n| ---------- | --------------------------------- | ----------------- | ---------------- |\n| `cayenne`  | [Spice Cayenne (Vortex)][cayenne] | Release Candidate | `file`           |\n| `arrow`    | [In-Memory Arrow Records][arrow]  | Stable            | `memory`         |\n| `duckdb`   | Embedded [DuckDB][duckdb]         | Stable            | `memory`, `file` |\n| `postgres` | Attached [PostgreSQL][postgres]   | Release Candidate | N\u002FA              |\n| `sqlite`   | Embedded [SQLite][sqlite]         | Release Candidate | `memory`, `file` |\n\n[arrow]: https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Farrow\n[cayenne]: https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fcayenne\n[duckdb]: https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fduckdb\n[postgres]: https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fpostgres\n[sqlite]: https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fsqlite\n\n## Supported Model Providers\n\n| Name          | Description                                  | Status            | ML Format(s) | LLM Format(s)                   |\n| 
------------- | -------------------------------------------- | ----------------- | ------------ | ------------------------------- |\n| `openai`      | OpenAI (or compatible) LLM endpoint          | Release Candidate | -            | OpenAI-compatible HTTP endpoint |\n| `file`        | Local filesystem                             | Release Candidate | ONNX         | GGUF, GGML, SafeTensor          |\n| `huggingface` | Models hosted on HuggingFace                 | Release Candidate | ONNX         | GGUF, GGML, SafeTensor          |\n| `spice.ai`    | Models hosted on the Spice.ai Cloud Platform |                   | ONNX         | OpenAI-compatible HTTP endpoint |\n| `azure`       | Azure OpenAI                                 |                   | -            | OpenAI-compatible HTTP endpoint |\n| `bedrock`     | Amazon Bedrock (Nova models)                 | Alpha             | -            | OpenAI-compatible HTTP endpoint |\n| `anthropic`   | Models hosted on Anthropic                   | Alpha             | -            | OpenAI-compatible HTTP endpoint |\n| `xai`         | Models hosted on xAI                         | Alpha             | -            | OpenAI-compatible HTTP endpoint |\n\n## Supported Embeddings Providers\n\n| Name          | Description                         | Status            | ML Format(s) | LLM Format(s)\\*                 |\n| ------------- | ----------------------------------- | ----------------- | ------------ | ------------------------------- |\n| `openai`      | OpenAI (or compatible) LLM endpoint | Release Candidate | -            | OpenAI-compatible HTTP endpoint |\n| `file`        | Local filesystem                    | Release Candidate | ONNX         | GGUF, GGML, SafeTensor          |\n| `huggingface` | Models hosted on HuggingFace        | Release Candidate | ONNX         | GGUF, GGML, SafeTensor          |\n| `model2vec`   | Static embeddings (500x faster)     | Release Candidate | Model2Vec    | -                               |\n| `azure`       | Azure OpenAI                        | Alpha             | -            | OpenAI-compatible HTTP endpoint |\n| `bedrock`     | AWS Bedrock (e.g., Titan, Cohere)   | Alpha             | -            | OpenAI-compatible HTTP endpoint |\n\n## Supported Vector Stores\n\n| Name            | Description                           
                               | Status |\n| --------------- | -------------------------------------------------------------------- | ------ |\n| `s3_vectors`    | Amazon S3 Vectors for petabyte-scale vector storage and querying     | Alpha  |\n| `pgvector`      | PostgreSQL with the pgvector extension                               | Alpha  |\n| `duckdb_vector` | DuckDB with vector extension for efficient vector storage and search | Alpha  |\n| `sqlite_vec`    | SQLite with sqlite-vec extension for lightweight vector operations   | Alpha  |\n\n## Supported Catalogs\n\nCatalog Connectors connect to external catalog providers and make their tables available for federated SQL query in Spice. Configuring acceleration for tables in external catalogs is not supported. The schema hierarchy of the external catalog is preserved in Spice.\n\n| Name            | Description             | Status | Protocol\u002FFormat              |\n| --------------- | ----------------------- | ------ | ---------------------------- |\n| `spice.ai`      | Spice.ai Cloud Platform | Stable | Arrow Flight                 |\n| `unity_catalog` | Unity Catalog           | Stable | Delta Lake                   |\n| `databricks`    | Databricks              | Beta   | Spark Connect, S3\u002FDelta Lake |\n| `iceberg`       | Apache Iceberg          | Beta   | Parquet                      |\n| `glue`          | AWS Glue                | Alpha  | CSV, Parquet, Iceberg        |\n\n## ⚡️ Quickstart (Local Machine)\n\n\u003Chttps:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fassets\u002F88671039\u002F85cf9a69-46e7-412e-8b68-22617dcbd4e0>\n\n### Installation\n\nInstall the Spice CLI:\n\nOn **macOS, Linux, and WSL**:\n\n```bash\ncurl https:\u002F\u002Finstall.spiceai.org | \u002Fbin\u002Fbash\n```\n\nOr using `brew`:\n\n```bash\nbrew install spiceai\u002Fspiceai\u002Fspice\n```\n\nOn **Windows** using PowerShell:\n\n```powershell\niex ((New-Object System.Net.WebClient).DownloadString(\"https:\u002F\u002Finstall.spiceai.org\u002FInstall.ps1\"))\n```\n\n### Usage\n\n**Step 1.** Initialize a new Spice app with the `spice init` command:\n\n```bash\nspice init spice_qs\n```\n\nA `spicepod.yaml` file is created in the `spice_qs` directory. Change to that directory:\n\n```bash\ncd spice_qs\n```\n\n**Step 2.** Start the Spice runtime:\n\n```bash\nspice run\n```\n\nExample output:\n\n```bash\n2025\u002F01\u002F20 11:26:10 INFO Spice.ai 
Runtime Starting...\n2025-01-20T19:26:10.679068Z  INFO runtime::init::dataset: No datasets were configured. If this is unexpected, check the Spicepod configuration.\n2025-01-20T19:26:10.679716Z  INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051\n2025-01-20T19:26:10.679786Z  INFO runtime::metrics_server: Spice Runtime Metrics listening on 127.0.0.1:9090\n2025-01-20T19:26:10.680140Z  INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090\n2025-01-20T19:26:10.879126Z  INFO runtime::init::results_cache: Initialized results cache; max size: 128.00 MiB, item ttl: 1s\n```\n\nThe runtime is now started and ready for queries.\n\n**Step 3.** In a new terminal window, add the `spiceai\u002Fquickstart` Spicepod. A Spicepod is a package of configuration for datasets and ML models.\n\n```bash\nspice add spiceai\u002Fquickstart\n```\n\nThe `spicepod.yaml` file is updated to include the `spiceai\u002Fquickstart` dependency.\n\n```yaml\nversion: v1\nkind: Spicepod\nname: spice_qs\ndependencies:\n  - spiceai\u002Fquickstart\n```\n\nThe `spiceai\u002Fquickstart` Spicepod adds a `taxi_trips` data table to the runtime, which can now be queried with SQL.\n\n```bash\n2025-01-20T19:26:30.011633Z  INFO runtime::init::dataset: Dataset taxi_trips registered (s3:\u002F\u002Fspiceai-demo-datasets\u002Ftaxi_trips\u002F2024\u002F), acceleration (arrow), results cache enabled.\n2025-01-20T19:26:30.013002Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips\n2025-01-20T19:26:40.312839Z  INFO runtime::accelerated_table::refresh_task: Loaded 2,964,624 rows (399.41 MiB) for dataset taxi_trips in 10s 299ms.\n```\n\n**Step 4.** Start the Spice SQL REPL:\n\n```bash\nspice sql\n```\n\nThe SQL REPL interface is displayed:\n\n```bash\nWelcome to the Spice.ai SQL REPL! Type 'help' for help.\n\nshow tables; -- list available tables\nsql>\n```\n\nEnter `show tables;` to list the tables available for querying:\n\n```bash\nsql> show tables;\n+---------------+--------------+---------------+------------+\n| table_catalog | table_schema | table_name    | table_type |\n+---------------+--------------+---------------+------------+\n| spice         | public       | taxi_trips    | BASE TABLE |\n| spice         | runtime      | query_history | BASE TABLE |\n| spice         | runtime      | metrics       | BASE TABLE |\n+---------------+--------------+---------------+------------+\n\nTime: 0.022671708 
seconds. 3 rows.\n```\n\nEnter a query to display the longest taxi trips:\n\n```sql\nSELECT trip_distance, total_amount FROM taxi_trips ORDER BY trip_distance DESC LIMIT 10;\n```\n\nOutput:\n\n```bash\n+---------------+--------------+\n| trip_distance | total_amount |\n+---------------+--------------+\n| 312722.3      | 22.15        |\n| 97793.92      | 36.31        |\n| 82015.45      | 21.56        |\n| 72975.97      | 20.04        |\n| 71752.26      | 49.57        |\n| 59282.45      | 33.52        |\n| 59076.43      | 23.17        |\n| 58298.51      | 18.63        |\n| 51619.36      | 24.2         |\n| 44018.64      | 52.43        |\n+---------------+--------------+\n\nTime: 0.045150667 seconds. 10 rows.\n```\n\n## ⚙️ Runtime Container Deployment\n\nUsing the [Docker image](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fspiceai\u002Fspiceai) locally:\n\n```bash\ndocker pull spiceai\u002Fspiceai\n```\n\nIn a Dockerfile:\n\n```dockerfile\nFROM spiceai\u002Fspiceai:latest\n```\n\nUsing Helm:\n\n```bash\nhelm repo add spiceai https:\u002F\u002Fhelm.spiceai.org\nhelm install spiceai spiceai\u002Fspiceai\n```\n\n## 🏎️ Next Steps\n\n### Explore the Spice.ai Cookbook\n\nThe Spice.ai Cookbook is a collection of recipes and examples for using Spice. Find it at [https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook#readme).\n\n### Using the Spice.ai Cloud Platform\n\nAccess ready-to-use Spicepods and datasets hosted on the Spice.ai Cloud Platform using the Spice runtime. A list of public Spicepods is available on Spicerack: [https:\u002F\u002Fspicerack.org\u002F](https:\u002F\u002Fspicerack.org\u002F).\n\nTo use public datasets, create a free account on Spice.ai:\n\n1. Visit [spice.ai](https:\u002F\u002Fspice.ai\u002F) and click **Try for Free**.\n   ![Try for Free](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_readme_13db350a7e64.png)\n\n2. 
After creating an account, create an app to generate an API key.\n   ![Create App](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_readme_84977fe4da1a.png)\n\nOnce set up, you can access ready-to-use Spicepods, including datasets. For this demonstration, use the `taxi_trips` dataset from the [Spice.ai Quickstart](https:\u002F\u002Fspice.ai\u002Fspiceai\u002Fquickstart).\n\n**Step 1.** Initialize a new project.\n\n```bash\n# Initialize a new Spice app\nspice init spice_app\n\n# Change to the app directory\ncd spice_app\n```\n\n**Step 2.** Log in and authenticate from the command line using the `spice login` command. A browser window opens and prompts you to authenticate:\n\n```bash\nspice login\n```\n\n**Step 3.** Start the runtime:\n\n```bash\n# Start the runtime\nspice run\n```\n\n**Step 4.** Configure the dataset:\n\nIn a new terminal window, configure a new dataset using the `spice dataset configure` command:\n\n```bash\nspice dataset configure\n```\n\nEnter a name that will be used to reference the dataset in queries. This name does not need to match the name in the dataset source.\n\n```bash\ndataset name: (spice_app) taxi_trips\n```\n\nEnter a description for the dataset:\n\n```bash\ndescription: Taxi trips dataset\n```\n\nEnter the location of the dataset:\n\n```bash\nfrom: spice.ai\u002Fspiceai\u002Fquickstart\u002Fdatasets\u002Ftaxi_trips\n```\n\nSelect `y` when prompted whether to accelerate the data locally:\n\n```bash\nLocally accelerate (y\u002Fn)? y\n```\n\nYou should see the following output in the runtime terminal:\n\n```bash\n2024-12-16T05:12:45.803694Z  INFO runtime::init::dataset: Dataset taxi_trips registered (spice.ai\u002Fspiceai\u002Fquickstart\u002Fdatasets\u002Ftaxi_trips), acceleration (arrow, 10s refresh), results cache enabled.\n2024-12-16T05:12:45.805494Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips\n2024-12-16T05:13:24.218345Z  INFO runtime::accelerated_table::refresh_task: Loaded 2,964,624 rows (8.41 GiB) for dataset taxi_trips in 38s 412ms.\n```\n\n**Step 5.** In a new terminal window, use the Spice SQL REPL to query the dataset:\n\n```bash\nspice sql\n```\n\n```sql\nSELECT tpep_pickup_datetime, passenger_count, trip_distance FROM taxi_trips LIMIT 10;\n```\n\nThe output shows the query results along with the query execution time:\n\n```bash\n+----------------------+-----------------+---------------+\n| tpep_pickup_datetime | passenger_count | trip_distance |\n+----------------------+-----------------+---------------+\n| 2024-01-11T12:55:12  | 1               | 0.0           |\n| 2024-01-11T12:55:12  | 1               | 0.0           |\n| 2024-01-11T12:04:56  | 1               | 0.63          |\n| 2024-01-11T12:18:31  | 1      
         | 1.38          |\n| 2024-01-11T12:39:26  | 1               | 1.01          |\n| 2024-01-11T12:18:58  | 1               | 5.13          |\n| 2024-01-11T12:43:13  | 1               | 2.9           |\n| 2024-01-11T12:05:41  | 1               | 1.36          |\n| 2024-01-11T12:20:41  | 1               | 1.11          |\n| 2024-01-11T12:37:25  | 1               | 2.04          |\n+----------------------+-----------------+---------------+\n\nTime: 0.00538925 seconds. 10 rows.\n```\n\nTo compare query times without acceleration, change the acceleration setting from `true` to `false` in the `datasets.yaml` file.\n\n### 📄 Documentation\n\nComprehensive documentation is available at [spiceai.org\u002Fdocs](https:\u002F\u002Fspiceai.org\u002Fdocs\u002F).\n\nOver 45 quickstarts and samples are available in the [Spice Cookbook](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fcookbook#spiceai-oss-cookbook).\n\n### 🔌 Extensibility\n\nSpice.ai is designed to be extensible, with extension points documented in [EXTENSIBILITY.md](.\u002Fdocs\u002FEXTENSIBILITY.md). You can build custom [Data Connectors](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors), [Data Accelerators](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators), [Catalog Connectors](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fcatalogs), [Secret Stores](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fsecret-stores), [Models](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fmodels), or [Embeddings](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fembeddings).\n\n### 🔨 Upcoming Features\n\n🚀 See the [Roadmap](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fblob\u002Ftrunk\u002Fdocs\u002FROADMAP.md) for upcoming features.\n\n### 🤝 Connect with us\n\nWe greatly appreciate and value your support! You can help Spice in a number of ways:\n\n- Build an app with Spice.ai and send us feedback and suggestions at [hey@spice.ai](mailto:hey@spice.ai) or on [Discord](https:\u002F\u002Fdiscord.gg\u002FkZnTfneP5u), [X](https:\u002F\u002Ftwitter.com\u002Fspice_ai), or [LinkedIn](https:\u002F\u002Fwww.linkedin.com\u002Fcompany\u002F74148478).\n- [File an issue](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fissues\u002Fnew) if you see something not quite working correctly.\n- Join our team 
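([We're hiring!](https:\u002F\u002Fspice.ai\u002Fcareers)).\n- Contribute code or documentation to the project (see [CONTRIBUTING.md](CONTRIBUTING.md)).\n- Follow our blog at [spiceai.org\u002Fblog](https:\u002F\u002Fspiceai.org\u002Fblog).\n\n⭐️ Star this repo! Thank you for your support! 🙏","# Spice.ai Quick Start Guide\n\nSpice is a high-performance SQL query, search, and LLM inference engine written in Rust, built for data apps and AI agents. It supports data federation, acceleration and caching, vector search, and an OpenAI-compatible API, helping developers build AI apps grounded in real data.\n\n## Prerequisites\n\nBefore you begin, make sure your development environment meets the following requirements:\n\n*   **Operating system**: Linux (x86_64\u002Faarch64), macOS (Intel\u002FApple Silicon), or Windows (WSL2 recommended).\n*   **Container runtime (optional but recommended)**: Docker or Podman, for quick deployment.\n*   **Network access**: Docker Hub must be reachable to pull the runtime image.\n    *   *Tip for users in mainland China*: If pulling images is slow, configure a Docker registry mirror (such as those offered by Alibaba Cloud or Tencent Cloud), or route network requests through a proxy via environment variables when starting `spiced`.\n\n## Installation\n\nYou can run Spice quickly with Docker, or install the CLI and run it locally.\n\n### Option 1: Run with Docker (recommended)\n\nThis is the simplest approach: it starts the `spiced` runtime without installing any additional dependencies.\n\n```bash\ndocker run --rm -p 8090:8090 -p 50051:50051 spiceai\u002Fspiceai:latest\n```\n\n*   `8090`: HTTP API port (SQL, OpenAI-compatible endpoints, etc.).\n*   `50051`: Arrow Flight \u002F Flight SQL port.\n\n### Option 2: Install the command-line tool (spice CLI)\n\nThe official CLI is recommended for managing configuration and interacting with the runtime.\n\n**macOS \u002F Linux:**\n\n```bash\ncurl https:\u002F\u002Finstall.spiceai.org | \u002Fbin\u002Fbash\n```\n\n**Windows (PowerShell):**\n\n```powershell\niex ((New-Object System.Net.WebClient).DownloadString(\"https:\u002F\u002Finstall.spiceai.org\u002FInstall.ps1\"))\n```\n\nAfter installation, verify the version:\n\n```bash\nspice --version\n```\n\n## Basic Usage\n\nThe following example shows how to initialize a project, connect a data source, and run a simple SQL query.\n\n### 1. Initialize a project\n\nUse the CLI to create a new Spice project directory:\n\n```bash\nspice init my_data_app\ncd my_data_app\n```\n\nThis generates a `spicepod.yaml` configuration file, which defines data sources and acceleration settings.\n\n### 2. Configure a data source\n\nEdit `spicepod.yaml` to add a data source. The following example uses the `taxi_trips` dataset from the Spice.ai Quickstart and enables local acceleration:\n\n```yaml\nversion: v1\nkind: Spicepod\nname: my_data_app\n\ndatasets:\n  - from: spice.ai\u002Fspiceai\u002Fquickstart\u002Fdatasets\u002Ftaxi_trips\n    name: taxi_trips\n    description: \"Taxi trips dataset\"\n    acceleration:\n      enabled: true\n      engine: duckdb\n      mode: file\n```\n\n*Note: `spice.ai\u002F` sources are datasets hosted on the Spice.ai Cloud Platform and require authenticating with `spice login` first. You can also replace the source with any supported connector, such as `postgres:`, `mysql:`, or `s3:\u002F\u002F`.*\n\n### 3. 
Start the service\n\nFrom the project directory, start the Spice runtime:\n\n```bash\nspice run\n```\n\nOnce the runtime starts, you should see log output similar to the following, indicating that the HTTP endpoint is listening and the dataset is loading:\n\n```text\nINFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090\nINFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips\n```\n\n### 4. Run SQL queries\n\nWith `spice run` still running, open a new terminal window, start the SQL REPL with `spice sql`, and enter a query at the `sql>` prompt:\n\n```bash\nsql> SELECT tpep_pickup_datetime, passenger_count, trip_distance FROM taxi_trips LIMIT 5;\n```\n\nAlternatively, send SQL to the runtime's HTTP API:\n\n```bash\ncurl -X POST http:\u002F\u002Flocalhost:8090\u002Fv1\u002Fsql -d \"SELECT tpep_pickup_datetime, passenger_count FROM taxi_trips LIMIT 5;\"\n```\n\n### 5. Use AI\u002FLLM features (optional)\n\nSpice natively supports an OpenAI-compatible API. If you have configured an LLM, you can call it directly over HTTP (`local-model` is a placeholder for the name of a model defined in your `spicepod.yaml`):\n\n```bash\ncurl http:\u002F\u002Flocalhost:8090\u002Fv1\u002Fchat\u002Fcompletions \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\n    \"model\": \"local-model\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Summarize the taxi_trips data.\"}]\n  }'\n```\n\nYou now have Spice running and can go on to explore data federation, vector search, or integrations with LangChain and MCP.","An e-commerce company's data team is building an intelligent analytics assistant that can answer “Why did high-value customers churn last quarter?” in real time, which requires processing both large volumes of transaction data and unstructured customer-service records.\n\n### Without spiceai\n- **Severe data silos**: Transaction data lives in PostgreSQL, logs in an S3 data lake, and customer-service text in Elasticsearch; developers must write complex ETL scripts or maintain multiple connections to aggregate the data.\n- **High response latency**: Traditional cross-database queries are inefficient, and without vector search the AI agent takes seconds to retrieve relevant support conversations, ruling out real-time interaction.\n- **Bloated architecture**: To support SQL queries, vector retrieval, and LLM inference, the team must deploy separate database middleware, a vector database, and a local model service, driving up operational costs.\n- **High development barrier**: Front-end engineers must learn multiple protocols (JDBC, REST, gRPC, etc.) to integrate with different back ends, making it hard to embed data capabilities into AI apps quickly.\n\n### With spiceai\n- **Unified data federation**: Through a single SQL interface, spiceai federates queries directly across Postgres, S3, and Elasticsearch, correlating transactions with support records instantly without moving data.\n- **Accelerated intelligent retrieval**: With the built-in `vector_search` UDTF and Rust-based acceleration engine, spiceai cuts hybrid search (keyword + vector) latency to milliseconds, so the AI responds more promptly.\n- **Minimal runtime**: A single lightweight binary or container gives spiceai SQL queries, an OpenAI-compatible LLM inference gateway, and vector search, greatly simplifying the architecture.\n- **Standardized developer experience**: Developers call all of these capabilities through standard ODBC\u002FJDBC or the OpenAI SDK, quickly building data-driven analytics agents.\n\nBy using a unified acceleration engine to remove the barrier between data and applications, spiceai lets developers build AI apps that are truly grounded in data at minimal cost.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002Fspiceai_spiceai_13db350a.png","Spice.ai 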
OSS","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002Fspiceai_26f92df1.png","Unified SQL query interface & materialization engine",null,"hey@spiceai.org","spice_ai","https:\u002F\u002Fspiceai.org","https:\u002F\u002Fgithub.com\u002Fspiceai",[84,88,92,96,99,103,106,109,112,115],{"name":85,"color":86,"percentage":87},"Rust","#dea584",98.4,{"name":89,"color":90,"percentage":91},"Shell","#89e051",1.1,{"name":93,"color":94,"percentage":95},"Python","#3572A5",0.2,{"name":97,"color":98,"percentage":95},"Makefile","#427819",{"name":100,"color":101,"percentage":102},"Dockerfile","#384d54",0,{"name":104,"color":105,"percentage":102},"TSQL","#e38c00",{"name":107,"color":108,"percentage":102},"PowerShell","#012456",{"name":110,"color":111,"percentage":102},"Mustache","#724b3b",{"name":113,"color":114,"percentage":102},"Visual Basic 6.0","#2c6353",{"name":116,"color":117,"percentage":102},"C","#555555",2859,180,"2026-04-04T17:11:50","Apache-2.0","Linux, macOS, Windows","Optional. Accelerating local model serving requires an NVIDIA GPU (CUDA) or Apple Silicon (Metal). The specific GPU model and amount of VRAM depend on the size of the LLM being loaded; the README does not specify a hard minimum.","Not specified. Memory requirements depend on data federation, the acceleration engine (Arrow\u002FDuckDB\u002FSQLite), and the size of any loaded models.",{"notes":126,"python":127,"dependencies":128},"Spice is a standalone binary or container written in Rust; its core features run without a Python environment. It supports several deployment modes: standalone instance, Kubernetes sidecar, microservice, or cluster. Docker is supported for quick deployment. To use local LLM inference acceleration, make sure the correct CUDA drivers are installed (NVIDIA) or that Metal is supported (macOS).","Not required (the core engine is written in Rust; interaction is through HTTP\u002FSQL interfaces)",["Rust (build environment)","Apache DataFusion","Apache Arrow","DuckDB","SQLite","Tantivy","Apache Ballista","Vortex"],[13,26,51],[139,140,141,142,143,144,145,146,147],"artificial-intelligence","developers","machine-learning","data","sql","infrastructure","data-federation","full-text-search","llm-inference","2026-03-27T02:49:30.150509","2026-04-06T07:05:51.126265",[151,156,161,166,171,176],{"id":152,"question_zh":153,"answer_zh":154,"source_url":155},15668,"What should I do if a panic occurs when using the DuckDB acceleration engine with a TPCH dataset containing Decimal128 types?","This is a known type-support issue. When configuring the 
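DuckDB acceleration engine in spicepod.yaml and loading a dataset that contains complex decimal types such as Decimal128(15, 2), the runtime may throw a 'not yet implemented' panic. Resolving it generally requires a code fix to the type-mapping logic in the DuckDB connector. Make sure you upgrade to the latest version that includes the fix, or temporarily avoid DuckDB acceleration for columns with these decimal types and use another supported engine (such as SQLite) instead.","https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fissues\u002F1026",{"id":157,"question_zh":158,"answer_zh":159,"source_url":160},15669,"How do I fix the 'Unsupported ArrowType Utf8View' error when connecting to Spice with the Flight SQL JDBC driver and casting with ::TEXT?","This error is caused by incomplete support for the Utf8View Arrow type in older versions. The maintainers confirmed the issue is fixed in Spice v1.11.3. If you still see similar 'Unsupported vector type' errors after upgrading, a specific fix commit may still need to be merged, or you may need to wait for a later patch release. Upgrading the Spice runtime to v1.11.3 or later is recommended to resolve this compatibility issue.","https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fissues\u002F9253",{"id":162,"question_zh":163,"answer_zh":164,"source_url":165},15670,"Does Spice support connecting to partitioned PostgreSQL parent tables? What should I do when loading fails with 'Table not found'?","Earlier versions had a bug that prevented partitioned PostgreSQL tables from being recognized, causing 'Failed to load the dataset... Table not found' errors when loading the dataset. This was fixed in Spice v1.11.5. If you encounter this issue, upgrade the Spice runtime to v1.11.5 or later; partitioned PostgreSQL tables can then be connected and queried normally.","https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fissues\u002F9990",{"id":167,"question_zh":168,"answer_zh":169,"source_url":170},15671,"Why do data type handling failures or calculation errors occur when running TPCH benchmarks with the SQLite acceleration engine?","This happens because the SQLite accelerator stores `decimal(a, b)` types as `TEXT`. When SQLite federation pushdown is enabled, some queries execute entirely inside SQLite, which does not know those TEXT fields should be treated as decimals, leading to type-handling errors. Unless the unparser knows the expected output schema, the generated SQL may be incorrect. Under the current mechanism, if all processing happens in SQLite, decimal types may not be correctly parsed back into Decimal types.","https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fissues\u002F8594",{"id":172,"question_zh":173,"answer_zh":174,"source_url":175},15672,"A security scanner reports a hard-coded 'BEGIN PRIVATE KEY' in spiced, causing a false positive. How should I handle it?","This is usually a false positive, likely triggered by sample files or specific strings in dependencies (for example, a flagged DER PKCS8 private key). The maintainers noted that in many cases this is simply scanner oversensitivity. If it blocks an enterprise evaluation, try upgrading to the latest version (e.g. 1.9.0-rc1 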
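and later); security hardening or dependency updates in newer versions may have removed the pattern that triggers the scan. If the problem persists, confirm that no sensitive private key files are actually embedded in the project, and explain to your security team that it is a false positive.","https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fissues\u002F7667",{"id":177,"question_zh":178,"answer_zh":179,"source_url":180},15673,"Why doesn't the LIMIT clause in SQL statements take effect when querying with the GitHub\u002FGraphQL connectors?","This is a fixed bug. In older versions, the GitHub\u002FGraphQL connectors did not correctly respect the LIMIT clause in SQL queries, returning all results instead of the limited number of rows. The issue was fixed through an optimizer-rule change (specifically, rewriting the BytesProcessedNodes optimizer rule to handle Limit nodes correctly). Make sure you are using a version that includes the fix (it was merged into trunk and shipped in subsequent releases); after upgrading, LIMIT clauses work as expected.","https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fissues\u002F2850",[182,187,192,197,202,207,212,217,222,227,232,237,242,247,252,257,262,267,272,277],{"id":183,"version":184,"summary_zh":185,"released_at":186},90300,"v1.11.5","# Spice v1.11.5 (April 1, 2026)\n\nSpice v1.11.5 is a patch release that improves **`on_zero_results: use_source`** fallback performance, **Delta Lake** timestamp-predicate data skipping, **S3 Parquet** read performance, **PostgreSQL** partitioned table support, and **Cayenne** target file size handling, and prepares the CLI for the v2.0 runtime upgrade.\n\n## What's New in v1.11.5\n\n### `on_zero_results: use_source` Fallback Performance Improvements\n\nOptimized the [`on_zero_results: use_source`](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fdata-acceleration\u002Fdata-refresh#behavior-on-zero-results) fallback path to run DataFusion's physical optimizers on the federated scan plan ([#9927](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F9927)). The fallback path now applies the `SessionState::physical_optimizers()` rules to the federated scan plan before execution, enabling parallel file-group scans and other optimizations. As a result, fallback queries run significantly faster on multi-core machines, especially for file-based sources such as Delta Lake.\n\n### Delta Lake: Improved Data Skipping for `>=` Timestamp Predicates\n\n**Delta Lake** table scans with `>=` timestamp filters now correctly prune files that do not match the predicate ([#9932](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F9932)), improving query performance through more effective data skipping (file-level pruning).\n\n### PostgreSQL: Partitioned Table Support\n\nThe **PostgreSQL** data connector now supports [partitioned tables](https:\u002F\u002Fwww.postgresql.org\u002Fdocs\u002Fcurrent\u002Fddl-partitioning.html) ([#9997](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F9997)) for both federated and accelerated queries.\n\n### S3 Parquet Read Performance Improvements\n\nImproved the performance of reading Parquet 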
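files from **S3** and other object stores ([#10064](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F10064)), especially for tables with many columns. Column data ranges are now coalesced into fewer, larger requests instead of being fetched individually, reducing HTTP round trips.\n\n### Cayenne: Target File Size Is Now Respected\n\nThe **Cayenne** accelerator now correctly respects the configured target file size ([#10071](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F10071)). Previously, Cayenne could produce many small, fragmented Vortex files; with this fix, files are written at the expected target size, improving storage efficiency and query performance.\n\n### CLI: Support for Upgrading to the v2.0 Runtime\n\nThe Spice CLI now supports upgrading to v2.0 runtime versions. This allows upgrading to the v2.0 release candidate now, and to the stable v2.0 runtime once it is released.\n\n```console\nspice upgrade v2.0.0-rc.1\n```\n\nRunning `spice upgrade` without a version upgrades to the latest stable release, including v2.0 once it is generally available.","2026-04-01T08:01:56",{"id":188,"version":189,"summary_zh":190,"released_at":191},90301,"v1.11.4","# Spice v1.11.4 (March 12, 2026)\n\nSpice v1.11.4 is a patch release that improves the robustness of **S3** metadata column queries and enables **`on_zero_results: use_source`** for accelerated views.\n\n## What's New in v1.11.4\n\n### Accelerated Views: `on_zero_results: use_source` Support\n\nAccelerated [views](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fviews) now support the [`on_zero_results: use_source`](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fdata-acceleration\u002Fdata-refresh#behavior-on-zero-results) configuration ([#9699](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F9699)). Previously, accelerated views supported only `on_zero_results: return_empty`, which returns an empty result set when the accelerated data has no matching rows. With this change, views can fall back to querying the source data when the accelerated query returns zero rows, matching the existing behavior of accelerated datasets.\n\n**Example configuration:**\n\n```yaml\nviews:\n  - name: sales_summary\n    sql: |\n      SELECT region, SUM(amount) as total\n      FROM sales\n      GROUP BY region\n    acceleration:\n      enabled: true\n      on_zero_results: use_source\n```\n\n#### How the Fallback Works\n\nWhen an accelerated view is configured with `on_zero_results: use_source`, query execution proceeds as follows:\n\n1. **The accelerated store is queried first.** The query runs against the view's accelerated data (for example, Spice Cayenne, Arrow, DuckDB, or SQLite).\n\n2. **If the accelerated query returns zero rows, the runtime falls back to re-executing the view's SQL, this time directly against the datasets it references.**\n\n3. 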
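**Referenced datasets are queried according to their own configuration.** The view's SQL is re-executed according to each referenced dataset's configuration. This means:\n\n   - If a referenced dataset has acceleration enabled, the query hits that dataset's accelerated store rather than the original source.\n   - If a referenced dataset is also accelerated with `on_zero_results: use_source`, and its accelerated store also returns zero rows, it independently falls back to its own federated source (for example, Postgres or S3).\n   - If a referenced dataset is federated (acceleration not enabled), the query goes directly to that source.\n\nAs a result, the fallback can cascade: first the view's own acceleration layer, then the referenced datasets' acceleration layers, and finally the original sources, with each layer independently applying its own `on_zero_results` behavior.\n\n**Multi-level fallback example:**\n\n```yaml\ndatasets:\n  - from: postgres:orders\n    name: orders\n    acceleration:\n      enabled: true\n      refresh_sql: \"SELECT * FROM orders WHERE created_at > now() - interval '7 days'\"\n      on_zero_results: use_source  # fall back to Postgres if the accelerated data has no matching rows\n\nviews:\n  - name: recent_orders_summary\n    sql: |\n      SELECT status, COUNT(*) as order_count\n      FROM orders\n      GROUP BY status\n    acceleration:\n      enabled: true\n","2026-03-12T08:23:30",{"id":193,"version":194,"summary_zh":195,"released_at":196},90302,"v1.11.3","Spice v1.11.3 is a patch release that fixes schema consistency issues in the **S3** and **FlightSQL** data connectors, improves **CDC cache invalidation**, and enhances error handling and response metadata in the **HTTP** data connector.\n\n## What's New in v1.11.3\n\n### S3 Data Connector Fixes\n\nFixed an issue where queries using metadata columns (`location`, `last_modified`, `size`) on [S3](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fs3) datasets failed with an error stating that the input field name does not match the projection expression ([#9647](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fissues\u002F9647)). The issue occurred when filters or scalar functions were applied to projected metadata columns (for example, `SELECT lower(location) FROM table WHERE location = '...'`), and when the projection matched no files.\n\n### FlightSQL Schema Consistency\n\nFixed an issue where the Flight SQL JDBC driver returned an `Unsupported ArrowType Utf8View` error when performing `::TEXT` casts ([#9253](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fissues\u002F9253)). The [FlightSQL](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fflightsql) endpoint now maps view types (such as `Utf8View` and `BinaryView`) to their non-view equivalents, ensuring compatibility with JDBC and ODBC clients.\n\n### CDC Cache Invalidation\n\nFixed an issue where the SQL results cache was invalidated on every changes-stream poll, even when no records were returned ([#9472](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fissues\u002F9472)). This caused datasets using `refresh_mode: 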
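changes` (for example, DynamoDB Streams) to almost always miss the cache, rendering it nearly useless. The cache is now invalidated only when a change batch contains actual data changes.\n\n### HTTP Data Connector Improvements\n\n- HTTP error responses (for example, 5xx) are now excluded from caching, preventing transient server errors from polluting cached results.\n- Added a `response_headers` column (Map type) to HTTP responses, exposing response header metadata in query results.\n\n## Contributors\n\n- [@krinart](https:\u002F\u002Fgithub.com\u002Fkrinart)\n- [@sgrebnov](https:\u002F\u002Fgithub.com\u002Fsgrebnov)\n\n## Breaking Changes\n\nNo breaking changes.\n\n## Cookbook Updates\n\nNo new cookbook recipes.\n\nThe [Spice Cookbook](https:\u002F\u002Fspiceai.org\u002Fcookbook) includes 86 recipes to help you get started with Spice quickly and easily.\n\n## Upgrading\n\nTo upgrade to v1.11.3, use one of the following methods:\n\n**CLI**:\n\n```console\nspice upgrade\n```\n\n**Homebrew**:\n\n```console\nbrew upgrade spiceai\u002Fspiceai\u002Fspice\n```\n\n**Docker**:\n\nPull the `spiceai\u002Fspiceai:1.11.3` image:\n\n```console\ndocker pull spiceai\u002Fspiceai:1.11.3\n```\n\nFor available tags, see [DockerHub](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fspiceai\u002Fspiceai\u002Ftags).\n\n**Helm**:\n\n```console\nhelm repo update\nhelm upgrade spiceai spiceai\u002Fspiceai --version 1.11.3\n```\n\n**AWS Marketplace**:\n\nSpice is available on the [AWS Marketplace](https:\u002F\u002Faws.amazon.com\u002Fmarketplace\u002Fpp\u002Fprodview-jmf6jskjvnq7i).\n\n## What's Changed\n\n#","2026-03-09T13:13:12",{"id":198,"version":199,"summary_zh":200,"released_at":201},90303,"v2.0.0-rc.1","# Spice v2.0.0-rc.1 (March 2, 2026)\n\nv2.0.0-rc.1 is the first release candidate of v2.0, intended for early testing.\n\nHighlights of this release candidate include:\n\n- **Active-active highly available distributed query**: native object-storage support, built on Apache Ballista, with dynamic cluster sizing, distributed ingestion, and cluster observability.\n- **Spice Cayenne RC**: support for staged append writes, file-based retention deletes, composite partitioning, and distributed ingestion.\n- **DataFusion v52.2.0 upgrade**: adds sort pushdown, a new merge join, and dynamic filters.\n- **DDL support**: `CREATE TABLE` and `DROP TABLE` via SQL for Iceberg and Cayenne catalogs.\n- **DuckLake catalog and data connector** for lakehouse-style data management.\n- **GCS data connector (Alpha)** for Google Cloud Storage.\n- **Rust CLI rewrite** for a unified, single-binary experience.\n- Dependency upgrades, including **DuckDB v1.4.4**, **delta_kernel v0.18.2**, and **mistral.rs**.\n\nSpice v2.0 includes several [breaking changes](#breaking-changes). Review the breaking changes section carefully before upgrading.\n\n## What's New in v2.0.0-rc.1\n\n### Active-Active Highly Available Distributed Query\n\n[Distributed query](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fdistributed-query) has exited 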
Beta and now provides active-active highly available distributed query backed by object storage.\n\nDistributed query supports two execution modes:\n\n- **Synchronous mode**: Queries against accelerated datasets are distributed across executors, with results streamed back in real time. Non-accelerated datasets execute only on the scheduler. This mode suits latency-sensitive interactive queries.\n- **Asynchronous mode**: Queries can be submitted through the new HTTP-only `\u002Fv1\u002Fqueries` API, with results materialized to object storage for later retrieval. This mode is better suited to long-running analytical workloads, batch jobs, and non-accelerated datasets in distributed mode.\n\n**Key improvements**:\n\n- **Dynamic cluster sizing**: The query planner automatically adjusts parallelism based on the number of active executors in the cluster, keeping resource use optimal as nodes are added or removed.\n- **Distributed ingestion**: Data ingestion for partitioned accelerated tables is now distributed across executors, enabling higher throughput and parallel data loading in cluster mode. Regular (non-partitioned) accelerated tables do not distribute ingestion.\n- **Synchronous execution on the scheduler**: Where appropriate, `\u002Fv1\u002Fsql` and FlightSQL queries now execute synchronously on the scheduler, reducing inter-node overhead for queries that do not need distributed processing.\n- **Faster failure detection**: The executor heartbeat timeout is reduced from 180 seconds to 30 seconds, allowing the cluster to detect and respond to executor failures quickly.\n- **Cluster observability**: New metrics and Grafana dashboards for monitoring distributed query clusters.\n\n### Spice Cayenne Improvements\n\nThe [Spice Cayenne data accelerator](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fcayenne) has exited Beta and brings significant
","2026-03-04T22:22:04",{"id":203,"version":204,"summary_zh":205,"released_at":206},90304,"v1.11.2","Spice v1.11.2 is a patch release that refines the behavior of the HTTP data connector.\n\n## What's New in v1.11.2\n\n- **HTTP data connector**: HTTP 429 (Too Many Requests) responses are now treated as retryable and non-cacheable, preventing rate-limited responses from being stored in the HTTP cache accelerator.\n\n## Contributors\n\n- [@sgrebnov](https:\u002F\u002Fgithub.com\u002Fsgrebnov)\n\n## Breaking Changes\n\nNo breaking changes.\n\n## Cookbook Updates\n\nNo new cookbook recipes.\n\nThe [Spice Cookbook](https:\u002F\u002Fspiceai.org\u002Fcookbook) includes 86 recipes to help you get started with Spice quickly and easily.\n\n## Upgrading\n\nTo upgrade to v1.11.2, use one of the following methods:\n\n**CLI**:\n\n```console\nspice upgrade\n```\n\n**Homebrew**:\n\n```console\nbrew upgrade spiceai\u002Fspiceai\u002Fspice\n```\n\n**Docker**:\n\nPull the `spiceai\u002Fspiceai:1.11.2` image:\n\n```console\ndocker pull spiceai\u002Fspiceai:1.11.2\n```\n\nFor available tags, see [DockerHub](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fspiceai\u002Fspiceai\u002Ftags).\n\n**Helm**:\n\n```console\nhelm repo update\nhelm upgrade spiceai spiceai\u002Fspiceai --version 1.11.2\n```\n\n**AWS Marketplace**:\n\nSpice is available on the [AWS Marketplace](https:\u002F\u002Faws.amazon.com\u002Fmarketplace\u002Fpp\u002Fprodview-jmf6jskjvnq7i).\n\n## What's Changed\n\n### Changelog\n\n- Handle HTTP 429 (Too Many Requests) responses as retryable, non-cacheable responses by 
[FlightSQL](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fflightsql) 数据连接器新增了 Cookie 中间件支持。\n\n## 贡献者\n\n- [@krinart](https:\u002F\u002Fgithub.com\u002Fkrinart)\n- [@phillipleblanc](https:\u002F\u002Fgithub.com\u002Fphillipleblanc)\n- [@sgrebnov](https:\u002F\u002Fgithub.com\u002Fsgrebnov)\n\n## 破坏性变更\n\n无破坏性变更。\n\n## 食谱更新\n\n无重大更新。","2026-02-10T13:11:35",{"id":213,"version":214,"summary_zh":215,"released_at":216},90306,"v1.11.0","在 Spice v1.11.0 中，**Spice Cayenne 达到 Beta 状态**，新增加速快照、基于键的删除向量以及对 Amazon S3 Express One Zone 的支持。**DataFusion 已升级至 v51**，同时更新了 **Arrow v57.2** 和 **iceberg-rust v0.8.0**。v1.11 还引入了多项 **DynamoDB 及 DynamoDB Streams 改进**，例如 JSON 嵌套功能，并通过主动-主动调度器和 mTLS 为分布式查询带来了显著提升，从而实现企业级高可用性和安全的集群通信。\n\n此外，本次发布还新增了 **SMB、NFS 和 ScyllaDB 数据连接器（Alpha）**、**带有完整 SDK 支持的预编译语句**（gospice、spice-rs、spice-dotnet、spice-java、spice.js 和 spicepy）、**Google LLM 支持**以扩展 AI 推理能力，以及针对缓存、可观测性和 Arrow 加速的哈希索引的重大改进。\n\n## v1.11.0 新特性\n\n### Spice Cayenne 加速器进入 Beta 阶段\n\n[Spice Cayenne](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fcayenne) 现已晋升为 Beta 状态，支持加速快照，并进行了多项性能与稳定性优化。\n\n**核心增强功能**：\n\n- **基于键的删除向量**：通过基于键的查找方式改进了删除向量的支持，从而实现更高效的数据管理和更快的删除操作。对于稀疏删除场景，基于键的删除向量比基于位置的向量更加节省内存。\n- **主键冲突处理**：Cayenne 表新增 `on_conflict` 配置选项，允许在插入操作中选择覆盖现有行（`upsert`）或忽略重复行（`drop`），以满足不同的业务需求。\n- **S3 Express One Zone 支持**：可将 Cayenne 数据文件存储在 [S3 Express One Zone](https:\u002F\u002Faws.amazon.com\u002Fs3\u002Fstorage-classes\u002Fexpress-one-zone\u002F) 中，实现个位数毫秒级延迟，非常适合对延迟敏感且需要持久化的查询工作负载。\n\n**可靠性提升**：\n\n- 解决了 `FuturesUnordered` 在递归调用时崩溃的问题\n- 修复了与 Vortex 指标分配相关的内存增长问题\n- 元数据目录现能正确识别并应用 `cayenne_file_path` 路径配置\n- 增加了对无法解析配置值的警告提示\n\n更多详细信息，请参阅 [Cayenne 文档](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fcayenne)。\n\n### DataFusion v51 升级\n\n[Apache DataFusion 已升级至 v51](https:\u002F\u002Fdatafusion.apache.org\u002Fblog\u002F2025\u002F11\u002F25\u002Fdatafusion-51.0.0\u002F)，带来了显著的性能提升、全新 
SQL 功能以及更强的可观测性。\n\n![DataFusion v51 ClickBench 性能对比图](https:\u002F\u002Fdatafusion.apache.org\u002Fblog\u002Fimages\u002Fdatafusion-51.0.0\u002Fperformance_over_time_clickbench.png)\n\n**性能改进**：\n\n- **更快的 `CASE` 表达式求值**：表达式现在能够更早地进行短路计算、复用部分结果并避免不必要的分发操作，从而加速常见的 ETL 流程。\n- **远程 Parquet 文件读取的默认行为优化**：DataFusion 现在","2026-01-28T07:46:44",{"id":218,"version":219,"summary_zh":220,"released_at":221},90307,"v1.11.0-rc.3","# Spice v1.11.0-rc.3（2026年1月22日）\n\nv1.11.0-rc.3 是一个补丁版本，包含了对 Arrow 加速中哈希索引的改进，以及修复了与 Flight SQL 端点的 TLS 连接问题。\n\n## v1.11.0-rc.3 的新特性\n\n### Arrow 加速中的哈希索引（实验性）\n\n基于 Arrow 的加速现在支持哈希索引，以加快等值谓词上的点查询速度。对于高基数列，哈希索引可提供平均 O(1) 复杂度的查找性能。\n\n**功能**：\n\n- 支持主键哈希索引\n- 支持非主键列的二级索引\n- 支持复合键，并正确处理空值\n\n配置示例：\n\n```yaml\ndatasets:\n  - from: postgres:users\n    name: users\n    acceleration:\n      enabled: true\n      engine: arrow\n      primary_key: user_id\n      indexes:\n        '(tenant_id, user_id)': unique  # 复合哈希索引\n```\n\n更多详细信息，请参阅 [哈希索引文档](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fnext\u002Ffeatures\u002Fdata-acceleration\u002Fhash-index)。\n\n### Flight SQL TLS 连接修复\n\n**TLS 连接支持**：修复了在使用 `grpc+tls:\u002F\u002F` 方案连接 Flight SQL 端点时出现的 TLS 连接问题。新增了通过新参数 `flightsql_tls_ca_certificate_file` 指定自定义 CA 证书文件的支持。\n\n## 贡献者\n\n- [@lukekim](https:\u002F\u002Fgithub.com\u002Flukekim)\n- [@phillipleblanc](https:\u002F\u002Fgithub.com\u002Fphillipleblanc)\n\n## 破坏性变更\n\n无破坏性变更。\n\n## 食谱更新\n\n无重大食谱更新。\n\n[Spice 食谱](https:\u002F\u002Fspiceai.org\u002Fcookbook) 包含 86 个食谱，帮助您快速轻松地开始使用 Spice。\n\n## 升级\n\n要升级到 v1.11.0-rc.3，可以使用以下方法之一：\n\n**CLI**：\n\n```console\nspice upgrade\n```\n\n**Homebrew**：\n\n```console\nbrew upgrade spiceai\u002Fspiceai\u002Fspice\n```\n\n**Docker**：\n\n拉取 `spiceai\u002Fspiceai:v1.11.0-rc.3` 镜像：\n\n```console\ndocker pull spiceai\u002Fspiceai:v1.11.0-rc.3\n```\n\n有关可用标签，请参阅 [DockerHub](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fspiceai\u002Fspiceai\u002Ftags)。\n\n**Helm**：\n\n```console\nhelm repo update\nhelm upgrade spiceai 
spiceai\u002Fspiceai --version 1.11.0-rc.3\n```\n\n**AWS Marketplace**：\n\nSpice 已在 [AWS Marketplace](https:\u002F\u002Faws.amazon.com\u002Fmarketplace\u002Fpp\u002Fprodview-jmf6jskjvnq7i) 上架。\n\n## 变更内容\n\n### 更改日志\n\n- Arrow 加速中的哈希索引，由 [@lukekim](https:\u002F\u002Fgithub.com\u002Flukekim) 在 [#8924](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8924) 中实现\n- 改进哈希索引的验证和日志记录，由 [@lukekim](https:\u002F\u002Fgithub.com\u002Flukekim) 在 [#9047](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F9047) 中完成\n- 修复 grpc+tls:\u002F\u002F Flight SQL 端点的 TLS 连接问题，并添加自定义 CA 证书支持，由 [@phillipleblanc](https:\u002F\u002Fgithub.com\u002Fphillipleblanc) 在 [#9073](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F9073) 中完成","2026-01-23T19:53:21",{"id":223,"version":224,"summary_zh":225,"released_at":226},90308,"v1.11.0-rc.2","# Spice v1.11.0-rc.2（2026年1月20日）\n\nv1.11.0-rc.2 是 v1.11 版本的第二个候选发布版，用于高级测试。它将 **Spice Cayenne 提升至 Beta 状态**，新增加速快照支持，并引入了全新的 **ScyllaDB 数据连接器**，同时升级了 **DataFusion v51**、**Arrow 57.2** 和 **iceberg-rust v0.8.0**。此外，还对分布式查询、缓存和可观性功能进行了显著改进。\n\n## v1.11.0-rc.2 的新特性\n\n### Spice Cayenne 加速器进入 Beta 阶段\n\n[Spice Cayenne](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fcayenne) 已被提升至 Beta 状态，新增了加速快照支持以及多项稳定性改进。\n\n**可靠性提升**：\n\n- 修复了 Docker 镜像中因时区数据库问题导致的加速崩溃；\n- 解决了 `FuturesUnordered` 递归释放时的崩溃问题；\n- 修复了与 Vortex 指标分配相关的内存增长问题；\n- 元数据目录现在能够正确遵循 `cayenne_file_path` 的指定路径；\n- 增加了对无法解析配置值的警告提示。\n\n带有快照功能的示例配置如下：\n\n```yaml\ndatasets:\n  - from: s3:\u002F\u002Fmy-bucket\u002Fdata.parquet\n    name: my_dataset\n    acceleration:\n      enabled: true\n      engine: cayenne\n      mode: file\n```\n\n### DataFusion v51 升级\n\n[Apache DataFusion 已升级至 v51](https:\u002F\u002Fdatafusion.apache.org\u002Fblog\u002F2025\u002F11\u002F25\u002Fdatafusion-51.0.0\u002F)，带来了显著的性能提升、新的 SQL 功能以及更强的可观性支持。\n\n![DataFusion v51 ClickBench 
性能图](https:\u002F\u002Fdatafusion.apache.org\u002Fblog\u002Fimages\u002Fdatafusion-51.0.0\u002Fperformance_over_time_clickbench.png)\n\n**性能改进**：\n\n- **更快的 `CASE` 表达式评估**：表达式现在可以更早地进行短路计算、重用部分结果并避免不必要的散射操作，从而加速常见的 ETL 流程；\n- **远程 Parquet 文件读取的默认行为优化**：DataFusion 现在会默认获取 Parquet 文件的最后 512KB 数据，通常可避免每个文件产生两次 I\u002FO 请求；\n- **更快的 Parquet 元数据解析**：利用 Arrow 57 的全新 thrift 元数据解析器，元数据解析速度最高可提升至原来的四倍。\n\n**新 SQL 功能**：\n\n- **SQL 管道运算符**：支持使用 `|>` 语法进行内联转换；\n- **`DESCRIBE \u003Cquery>`**：无需执行即可返回任意查询的 Schema；\n- **SQL 函数中的命名参数**：支持 PostgreSQL 风格的 `param => value` 语法，适用于标量函数、聚合函数和窗口函数；\n- **Decimal32\u002FDecimal64 支持**：新增对这些 Arrow 类型的支持，包括 `SUM`、`AVG` 以及 `MIN\u002FMAX` 等聚合操作。\n\n管道运算符示例：\n\n```sql\nSELECT * FROM t\n|> WHERE a > 10\n|> ORDER BY b\n|> LIMIT 5;\n```\n\n**可观性增强**：\n\n- **改进的 `EXPLAIN ANALYZE` 指标**：新增了 `output_bytes`、过滤条件的选择率 `selectivity`、聚合操作的缩减因子 `reduction_factor` 以及详细的计时分解信息等指标。\n\n### Arrow 57.2 升级\n\nSpice 已升级至 [Apache Arrow Rust 57.2.0](https:\u002F\u002Farrow.apache.org\u002Fblog\u002F2025\u002F10\u002F30\u002Farrow-rs-57.0.0\u002F) 版本，带来了重大性能提升和新功能。\n\n![Arrow 57 Parquet 元数据","2026-01-21T20:56:03",{"id":228,"version":229,"summary_zh":230,"released_at":231},90309,"v1.11.0-rc.1","# Spice v1.11.0-rc.1（2026年1月5日）\n\nv1.11.0-rc.1 是 v1.11 版本的首个候选发布版，用于提前测试新功能，包括面向企业级安全集群通信的 **mTLS 分布式查询**、用于直接访问网络附加存储的新 **SMB 和 NFS 数据连接器**、可提升查询性能与安全性的 **预处理语句**、配备基于密钥的删除向量及对 Amazon S3 Express One Zone 支持的 **Cayenne 加速器增强**、可扩展 AI 推理能力的 **Google LLM 支持**，以及新增参数化查询支持的 **Spice Java SDK v0.5.0**。\n\n## v1.11.0-rc.1 的新特性\n\n### 基于 mTLS 的分布式查询\n\n**企业级安全集群通信**：[分布式查询](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fdistributed-query) 集群模式现默认启用双向 TLS（mTLS），以确保调度器与执行器之间的通信安全。集群内部通信包含高权限的 RPC 调用，例如获取 Spicepod 配置和展开敏感秘密等操作。mTLS 可确保只有经过身份验证的节点才能加入集群并访问敏感数据。\n\n**核心特性**：\n\n- **双向 TLS 认证**：所有执行器到调度器以及执行器之间在集群内部端口（50052）上的 gRPC 连接均采用 mTLS 保护，从而保障通信安全，并防止未经授权的节点加入集群。\n- **证书管理 CLI 工具**：新增开发者命令 `spice cluster tls init` 和 `spice cluster tls add`，用于生成 CA 证书及带有正确 
SAN（主题备用名称）的节点证书。\n- **简化 CLI 参数**：为提高清晰度，重命名了集群相关参数（`--role`、`--scheduler-address`、`--node-mtls-*`），其中 `--scheduler-address` 默认表示 `--role executor`。\n- **端口隔离**：公共服务（Flight 查询、HTTP API、Prometheus 指标）分别运行在 50051、8090 和 9090 端口；而集群内部服务（`SchedulerGrpcServer`、`ClusterService`）则被隔离在 50052 端口，并强制使用 mTLS。\n- **开发模式**：可通过 `--allow-insecure-connections` 标志禁用 mTLS 要求，以便进行本地开发和测试。\n\n**快速入门**：\n\n```bash\n# 生成开发用证书\nspice cluster tls init\nspice cluster tls add scheduler1\nspice cluster tls add executor1\n\n# 启动调度器\nspiced --role scheduler \\\n  --node-mtls-ca-certificate-file ca.crt \\\n  --node-mtls-certificate-file scheduler1.crt \\\n  --node-mtls-key-file scheduler1.key\n\n# 启动执行器\nspiced --role executor \\\n  --scheduler-address https:\u002F\u002Fscheduler1:50052 \\\n  --node-mtls-ca-certificate-file ca.crt \\\n  --node-mtls-certificate-file executor1.crt \\\n  --node-mtls-key-file executor1.key\n```\n\n更多详细信息，请参阅 [分布式查询文档](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fdistributed-query)。\n\n### SMB 和 NFS 数据连接器\n\n**网络附加存储连接器**：新的 [数据连接器](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-conne","2026-01-07T05:27:59",{"id":233,"version":234,"summary_zh":235,"released_at":236},90310,"v1.10.4","# Spice v1.10.4 (Jan 5, 2026)\r\n\r\nv1.10.4 is a patch release with fixes for **Kafka\u002FDebezium batch commits**, **ABFSS URL support** for Azure Data Lake Storage Gen2, and improved **column projection handling** for location metadata columns.\r\n\r\n## What's New in v1.10.4\r\n\r\n### Additional Improvements & Bug Fixes\r\n\r\n- **Reliability**: Fixed Kafka and Debezium batch commit handling to properly commit offsets across all partitions. Previously, only the last message's offset was committed, which could cause message loss when batches contained messages from multiple partitions.\r\n- **Reliability**: Added support for `abfss:\u002F\u002F` URL prefix for Azure Data Lake Storage Gen2, in addition to the existing `abfs:\u002F\u002F` prefix. 
The `abfss` scheme indicates secure (TLS) connections to ADLS Gen2.\r\n- **Reliability**: Fixed column projection order mismatch when querying datasets with location metadata columns (e.g., `SELECT location, day, size FROM dataset`). Queries that specified columns in a different order than the schema would fail with \"column types must match schema types\" errors.\r\n- **Developer Experience**: Added detailed diagnostic logging for union projection pushdown optimization failures in cluster mode. When projection pushdown cannot be applied, debug-level logs now provide additional context to help identify the root cause.\r\n\r\n## Contributors\r\n\r\n- [@krinart](https:\u002F\u002Fgithub.com\u002Fkrinart)\r\n- [@phillipleblanc](https:\u002F\u002Fgithub.com\u002Fphillipleblanc)\r\n\r\n## Breaking Changes\r\n\r\nNo breaking changes.\r\n\r\n## Cookbook Updates\r\n\r\nNo major cookbook updates.\r\n\r\nThe [Spice Cookbook](https:\u002F\u002Fspiceai.org\u002Fcookbook) includes 84 recipes to help you get started with Spice quickly and easily.\r\n\r\n## Upgrading\r\n\r\nTo upgrade to v1.10.4, use one of the following methods:\r\n\r\n**CLI**:\r\n\r\n```console\r\nspice upgrade\r\n```\r\n\r\n**Homebrew**:\r\n\r\n```console\r\nbrew upgrade spiceai\u002Fspiceai\u002Fspice\r\n```\r\n\r\n**Docker**:\r\n\r\nPull the `spiceai\u002Fspiceai:1.10.4` image:\r\n\r\n```console\r\ndocker pull spiceai\u002Fspiceai:1.10.4\r\n```\r\n\r\nFor available tags, see [DockerHub](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fspiceai\u002Fspiceai\u002Ftags).\r\n\r\n**Helm**:\r\n\r\n```console\r\nhelm repo update\r\nhelm upgrade spiceai spiceai\u002Fspiceai\r\n```\r\n\r\n**AWS Marketplace**:\r\n\r\n🎉 Spice is now available in the [AWS Marketplace](https:\u002F\u002Faws.amazon.com\u002Fmarketplace\u002Fpp\u002Fprodview-jmf6jskjvnq7i)!\r\n\r\n## What's Changed\r\n\r\n### Changelog\r\n\r\n- Update acknowledgements by [@app\u002Fgithub-actions](https:\u002F\u002Fgithub.com\u002Fapp\u002Fgithub-actions) in 
[#8695](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8695)\r\n- Proper batch commit for kafka\u002Fdebezium by [@krinart](https:\u002F\u002Fgithub.com\u002Fkrinart) in [#8671](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8671)\r\n- Add support for abfss by [@krinart](https:\u002F\u002Fgithub.com\u002Fkrinart) in [#8706](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8706)\r\n- cluster: UnionProjectionPushdownOptimizer: Add projection pushdown diagnostics for union children by [@phillipleblanc](https:\u002F\u002Fgithub.com\u002Fphillipleblanc) in [#8734](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8734)\r\n- Fix column projection order mismatch with location metadata columns by [@phillipleblanc](https:\u002F\u002Fgithub.com\u002Fphillipleblanc) in [#8738](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8738)","2026-01-05T07:00:33",{"id":238,"version":239,"summary_zh":240,"released_at":241},90311,"v1.10.3","# Spice v1.10.3 (Dec 29, 2025)\r\n\r\nv1.10.3 is a patch release with improved startup reliability, fixes for Azure BlobFS versioned containers, S3 custom endpoint query resolution, and a fix for the OpenAI Responses API.\r\n\r\n## What's New in v1.10.3\r\n\r\n### Additional Improvements & Bug Fixes\r\n\r\n- **Reliability**: Telemetry exporter initialization now runs asynchronously, preventing blocked startup in environments with network restrictions (e.g., Kubernetes with restrictive network policies).\r\n- **Reliability**: Fixed an issue where queries on Azure Blob containers with versioning enabled would fail with \"Azure does not support suffix range requests\" error in distributed query mode.\r\n- **Reliability**: Fixed S3 location-based queries against custom S3 endpoints (e.g., MinIO, LocalStack). 
Queries with `location` predicates on datasets using `s3_endpoint` and `s3_region` parameters now correctly route to the configured endpoint instead of defaulting to AWS S3.\r\n- **Reliability**: Fixed \"project index out of bounds\" errors in the query optimizer when union children have mismatched schemas. The optimizer now validates schema compatibility before applying projection pushdown.\r\n- **Reliability**: Fixed an issue where the OpenAI Responses API (`\u002Fv1\u002Fresponses`) was not working correctly.\r\n\r\n## Contributors\r\n\r\n- [@lukekim](https:\u002F\u002Fgithub.com\u002Flukekim)\r\n- [@phillipleblanc](https:\u002F\u002Fgithub.com\u002Fphillipleblanc)\r\n\r\n## Breaking Changes\r\n\r\nNo breaking changes.\r\n\r\n## Cookbook Updates\r\n\r\nNo major cookbook updates.\r\n\r\nThe [Spice Cookbook](https:\u002F\u002Fspiceai.org\u002Fcookbook) includes 84 recipes to help you get started with Spice quickly and easily.\r\n\r\n## Upgrading\r\n\r\nTo upgrade to v1.10.3, use one of the following methods:\r\n\r\n**CLI**:\r\n\r\n```console\r\nspice upgrade\r\n```\r\n\r\n**Homebrew**:\r\n\r\n```console\r\nbrew upgrade spiceai\u002Fspiceai\u002Fspice\r\n```\r\n\r\n**Docker**:\r\n\r\nPull the `spiceai\u002Fspiceai:1.10.3` image:\r\n\r\n```console\r\ndocker pull spiceai\u002Fspiceai:1.10.3\r\n```\r\n\r\nFor available tags, see [DockerHub](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fspiceai\u002Fspiceai\u002Ftags).\r\n\r\n**Helm**:\r\n\r\n```console\r\nhelm repo update\r\nhelm upgrade spiceai spiceai\u002Fspiceai\r\n```\r\n\r\n**AWS Marketplace**:\r\n\r\n🎉 Spice is now available in the [AWS Marketplace](https:\u002F\u002Faws.amazon.com\u002Fmarketplace\u002Fpp\u002Fprodview-jmf6jskjvnq7i)!\r\n\r\n## What's Changed\r\n\r\n### Changelog\r\n\r\n- Upgrade to openai-async v0.32 by [@lukekim](https:\u002F\u002Fgithub.com\u002Flukekim) in [#8635](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8635)\r\n- Fix issue with location predicate for custom S3 
endpoints + regression integration test by [@phillipleblanc](https:\u002F\u002Fgithub.com\u002Fphillipleblanc) in [#8668](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8668)\r\n- fix: Validate schema match before projection pushdown in UnionProjectionPushdownOptimizer by [@phillipleblanc](https:\u002F\u002Fgithub.com\u002Fphillipleblanc) in [#8669](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8669)\r\n- Start the anonymous telemetry exporter asynchronously by [@phillipleblanc](https:\u002F\u002Fgithub.com\u002Fphillipleblanc) in [#8679](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8679)\r\n- fix: Azure does not support suffix range requests by [@phillipleblanc](https:\u002F\u002Fgithub.com\u002Fphillipleblanc) in [#8685](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8685)","2025-12-29T11:05:52",{"id":243,"version":244,"summary_zh":245,"released_at":246},90312,"v1.10.2","# Spice v1.10.2 (Dec 22, 2025)\r\n\r\nv1.10.2 introduces **Tiered Caching Acceleration with Localpod** for multi-layer acceleration architectures, **Periodic Acceleration Snapshots** with configurable intervals, **DynamoDB JSON Nesting** for column consolidation, and **Kafka\u002FDebezium Batching** for faster data ingestion. 
This release also includes fixes for SQLite accelerator decimal\u002Fdate handling and real-time status reporting for the `\u002Fv1\u002Fdatasets` and `\u002Fv1\u002Fmodels` API endpoints.\r\n\r\n## What's New in v1.10.2\r\n\r\n### Tiered Caching with Localpod\r\n\r\n**Multi-Layer Acceleration Architecture**: The [Localpod connector](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Flocalpod) now supports `caching` refresh mode, enabling tiered acceleration where a persistent cache (e.g., file-mode DuckDB) feeds a fast in-memory cache (e.g., Arrow, memory-mode DuckDB).\r\n\r\n**Key Features**:\r\n\r\n- **Automatic Cache Propagation**: New cache entries automatically propagate from parent to child accelerators\r\n- **Warm Startup**: Child accelerators initialize from existing parent data on startup, eliminating cold-start latency\r\n- **Flexible Tiering**: Combine any accelerator engines (DuckDB, SQLite, Cayenne) across tiers\r\n\r\nExample `spicepod.yaml` configuration:\r\n\r\n```yaml\r\ndatasets:\r\n  # Parent: persistent file-mode cache\r\n  - from: https:\u002F\u002Fapi.example.com\r\n    name: api_cache\r\n    acceleration:\r\n      enabled: true\r\n      refresh_mode: caching\r\n      engine: duckdb\r\n      mode: file\r\n\r\n  # Child: fast in-memory cache fed by parent\r\n  - from: localpod:api_cache\r\n    name: api_cache_memory\r\n    acceleration:\r\n      enabled: true\r\n      refresh_mode: caching\r\n      engine: arrow\r\n      mode: memory\r\n```\r\n\r\nFor more details, refer to the [Localpod Data Connector Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Flocalpod).\r\n\r\n### Periodic Acceleration Snapshots\r\n\r\n**Configurable Snapshot Intervals**: A new `snapshots_create_interval` parameter enables periodic snapshot creation for accelerated datasets across all refresh modes. 
This provides better control over snapshot frequency and ensures consistent recovery points for accelerated data.\r\n\r\nExample `spicepod.yaml` configuration:\r\n\r\n```yaml\r\ndatasets:\r\n  - from: s3:\u002F\u002Fmy-bucket\u002Fdata.parquet\r\n    name: my_data\r\n    acceleration:\r\n      enabled: true\r\n      engine: duckdb\r\n      mode: file\r\n      refresh_mode: caching\r\n      snapshots: enabled\r\n      params:\r\n        snapshots_create_interval: 60s # Write a snapshot every 60 seconds\r\n```\r\n\r\nFor more details, refer to the [Data Acceleration Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fdata-acceleration).\r\n\r\n### DynamoDB JSON Nesting\r\n\r\n**Consolidate Columns into JSON**: The [DynamoDB Data Connector](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fdynamodb) now supports consolidating columns into a single JSON column using the `json_object: \"*\"` metadata option. This is useful when only a few columns are needed as discrete fields while the rest can be accessed as nested JSON.\r\n\r\nExample `spicepod.yaml` configuration:\r\n\r\n```yaml\r\ndatasets:\r\n  - from: dynamodb:my_table\r\n    name: my_table\r\n    columns:\r\n      - name: PK\r\n      - name: SK\r\n      - name: data_json\r\n        metadata:\r\n          json_object: '*' # Captures all other columns as JSON\r\n```\r\n\r\n**Example Output**: Given a DynamoDB table with columns `PK`, `SK`, `name`, `email`, and `status`, the resulting table schema consolidates all non-specified columns into the `data_json` column:\r\n\r\n| PK   | SK     | data_json                                                             |\r\n| ---- | ------ | --------------------------------------------------------------------- |\r\n| pk_1 | sort_1 | `{\"name\": \"Alice\", \"email\": \"alice@example.com\", \"status\": \"active\"}` |\r\n| pk_2 | sort_2 | `{\"name\": \"Bob\", \"email\": \"bob@example.com\", \"status\": \"inactive\"}`   
|\r\n\r\nFor more details, refer to the [DynamoDB JSON Nesting Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fdynamodb#json-nesting).\r\n\r\n### Kafka\u002FDebezium Batching\r\n\r\n**Faster Data Ingestion**: Configure message batching for [Kafka](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fkafka) and [Debezium](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fdebezium) connectors to improve data ingestion throughput. Batching reduces processing overhead by grouping multiple messages together before insertion.\r\n\r\n**Key Features**:\r\n\r\n- **Configurable Batch Size**: Control the maximum number of records per batch (default: 10,000)\r\n- **Configurable Batch Duration**: Set the maximum wait time before flushing a partial batch (default: 1s)\r\n\r\nExample `spicepod.yaml` configuration:\r\n\r\n```yaml\r\ndatasets:\r\n  - from: debezium:kafka-server.public.my_table\r\n    name: my_table\r\n    params:\r\n      batch_max_size: 10000 # Max records per batch (default: 10000)\r\n      batch_max_duration: 1s # Max wait time per batch (default: 1s)\r\n```\r\n\r\nFor more details, refer to the [Kafka Data Connector Documentation](https:\u002F\u002Fspiceai.org\u002Fdoc","2025-12-22T22:39:49",{"id":248,"version":249,"summary_zh":250,"released_at":251},90313,"v1.10.1","# Spice v1.10.1 (Dec 15, 2025)\r\n\r\nv1.10.1 is a patch release with **Cayenne accelerator improvements** including configurable compression strategies and improved partition ID handling, **isolated refresh runtime** for better query API responsiveness, and **security hardening**. 
In addition, the Go SDK, gospice v8, has been released.\r\n\r\n## What's New in v1.10.1\r\n\r\n### Cayenne Accelerator Improvements\r\n\r\nSeveral improvements and bug fixes for the [Cayenne data accelerator](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fcayenne):\r\n\r\n- **Compression Strategies**: The new `cayenne_compression_strategy` parameter lets you choose between `zstd` for compact storage and `btrblocks` for encoding-efficient compression.\r\n- **Improved Vortex Defaults**: Aligned Cayenne with the Vortex footer configuration for better compatibility.\r\n- **Partition ID Handling**: Improved partition ID generation to avoid potential locking race conditions.\r\n\r\nExample `spicepod.yaml` configuration:\r\n\r\n```yaml\r\ndatasets:\r\n  - from: s3:\u002F\u002Fmy-bucket\u002Fdata.parquet\r\n    name: my_dataset\r\n    acceleration:\r\n      enabled: true\r\n      engine: cayenne\r\n      mode: file\r\n      params:\r\n        cayenne_compression_strategy: zstd # or btrblocks (default)\r\n```\r\n\r\nFor more details, refer to the [Cayenne Data Accelerator Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fcayenne).\r\n\r\n### Isolated Refresh Runtime\r\n\r\nRefresh tasks now run on a separate Tokio runtime isolated from the main query API. 
This prevents long-running or resource-intensive refresh operations from impacting query latency and ensures the `\u002Fhealth` endpoint remains responsive during heavy refresh workloads.\r\n\r\n### Security Hardening\r\n\r\nMultiple security improvements have been implemented:\r\n\r\n- **Recursion Depth Limits**: Added limits to DynamoDB and S3 Vectors integrations to prevent stack overflow from deeply nested structures, mitigating potential DoS attacks.\r\n- **Spicepod Summary API**: The GET `\u002Fv1\u002Fspicepods` endpoint now returns summarized information instead of full `spicepod.yaml` representations, preventing potential leakage of sensitive information.\r\n\r\n### Additional Improvements & Bug Fixes\r\n\r\n- **Performance**: Fixed double hashing of user-supplied cache keys, improving cache lookup efficiency.\r\n- **Reliability**: Fixed idle DynamoDB Stream handling for more stable CDC operations.\r\n- **Reliability**: Added warnings when multiple partitions are defined for the same table.\r\n- **Performance**: Cached records are now eagerly dropped for results larger than the maximum cache size.\r\n\r\n## Spice Go SDK v8\r\n\r\nThe Spice Go SDK has been upgraded to v8 with a cleaner API, parameterized queries, and health check methods: [gospice v8.0.0](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fgospice\u002Freleases\u002Ftag\u002Fv8.0.0).\r\n\r\n**Key Features**:\r\n\r\n- **Cleaner API**: New `Sql()` and `SqlWithParams()` methods with more intuitive naming.\r\n- **Parameterized Queries**: Safe, SQL-injection-resistant queries with automatic Go-to-Arrow type inference.\r\n- **Typed Parameters**: Explicit type control with constructors like `Decimal128Param`, `TimestampParam`, and more.\r\n- **Health Check Methods**: New `IsSpiceHealthy()` and `IsSpiceReady()` methods for instance monitoring.\r\n- **Upgraded Dependencies**: Apache Arrow v18 and ADBC Go driver v1.3.0.\r\n\r\nExample usage with a local Spice runtime:\r\n\r\n```go\r\nimport 
\"github.com\u002Fspiceai\u002Fgospice\u002Fv8\"\r\n\r\n\u002F\u002F Initialize client for local runtime\r\nspice := gospice.NewSpiceClient()\r\ndefer spice.Close()\r\n\r\nif err := spice.Init(\r\n    gospice.WithFlightAddress(\"grpc:\u002F\u002Flocalhost:50051\"),\r\n); err != nil {\r\n    panic(err)\r\n}\r\n\r\n\u002F\u002F Parameterized query (safe from SQL injection)\r\nreader, err := spice.SqlWithParams(\r\n    ctx,\r\n    \"SELECT * FROM users WHERE id = $1 AND created_at > $2\",\r\n    userId,\r\n    startTime,\r\n)\r\n```\r\n\r\n**Upgrade**:\r\n\r\n```console\r\ngo get github.com\u002Fspiceai\u002Fgospice\u002Fv8@v8.0.0\r\n```\r\n\r\nFor more details, refer to the [Go SDK Documentation](https:\u002F\u002Fdocs.spice.ai\u002Fsdks\u002Fgo).\r\n\r\n## Contributors\r\n\r\n- [@phillipleblanc](https:\u002F\u002Fgithub.com\u002Fphillipleblanc)\r\n- [@lukekim](https:\u002F\u002Fgithub.com\u002Flukekim)\r\n- [@peasee](https:\u002F\u002Fgithub.com\u002Fpeasee)\r\n- [@krinart](https:\u002F\u002Fgithub.com\u002Fkrinart)\r\n- [@jeadie](https:\u002F\u002Fgithub.com\u002Fjeadie)\r\n- [@sgrebnov](https:\u002F\u002Fgithub.com\u002Fsgrebnov)\r\n- [@mach-kernel](https:\u002F\u002Fgithub.com\u002Fmach-kernel)\r\n\r\n## Breaking Changes\r\n\r\n- GET `\u002Fv1\u002Fspicepods` no longer returns the full `spicepod.yaml` JSON representation. A summary is returned instead. 
See [#8404](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8404).\r\n\r\n## Cookbook Updates\r\n\r\nNo major cookbook updates.\r\n\r\nThe [Spice Cookbook](https:\u002F\u002Fspiceai.org\u002Fcookbook) includes 82+ recipes to help you get started with Spice quickly and easily.\r\n\r\n## Upgrading\r\n\r\nTo upgrade to v1.10.1, use one of the following methods:\r\n\r\n**CLI**:\r\n\r\n```console\r\nspice upgrade\r\n```\r\n\r\n**Homebrew**:\r\n\r\n```console\r\nbrew upgrade spiceai\u002Fspiceai\u002Fspice\r\n```\r\n\r\n**Docker**:\r\n\r\nPull the `spiceai\u002Fspiceai:1.10.1` image:\r\n\r\n```console\r\ndocker pull spiceai\u002Fspiceai:1.10.1\r\n```\r\n\r\nFor availabl","2025-12-16T20:53:19",{"id":253,"version":254,"summary_zh":255,"released_at":256},90314,"v1.10.0","# Spice v1.10.0 (Dec 9, 2025)\r\n\r\nSpice v1.10.0 introduces a new **Caching Acceleration Mode** with stale-while-revalidate (SWR) semantics for disk-persisted, low-latency queries with background refresh. This release also adds the **TinyLFU eviction policy** for the SQL results cache, a preview of the **DynamoDB Streams connector** for real-time CDC, **S3 location predicate pruning** for faster partitioned queries, improved **distributed query execution**, and multiple security hardening improvements.\r\n\r\n## What's New in v1.10.0\r\n\r\n### Caching Acceleration Mode\r\n\r\n**Low-Latency Queries with Background Refresh**: This release introduces a new `caching` [acceleration mode](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators) that implements the stale-while-revalidate (SWR) pattern. Queries return cached results immediately while data refreshes asynchronously in the background, eliminating query latency spikes during refresh cycles. 
Cached data persists to disk using DuckDB, SQLite, or Cayenne file modes.\r\n\r\n**Key Features**:\r\n\r\n- **Stale-While-Revalidate (SWR)**: Returns cached data immediately while refreshing in the background, reducing query latency\r\n- **Disk Persistence**: Cached results persist across restarts using DuckDB, SQLite, or Cayenne file modes\r\n- **Configurable Refresh**: Control refresh intervals with `refresh_check_interval` to balance freshness and source load\r\n\r\n> **Recommendation**: Use [retention configuration](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Freference\u002Fspicepod\u002Fdatasets#accelerationretention_check_enabled) with caching acceleration to ensure stale data is cleaned up over time.\r\n\r\nExample spicepod.yaml configuration:\r\n\r\n```yaml\r\ndatasets:\r\n  - from: http:\u002F\u002Flocalhost:7400\r\n    name: cached_data\r\n    time_column: fetched_at\r\n    acceleration:\r\n      enabled: true\r\n      engine: duckdb\r\n      mode: file # Persist cache to disk\r\n      refresh_mode: caching\r\n      refresh_check_interval: 10m\r\n      retention_check_enabled: true\r\n      retention_period: 24h\r\n      retention_check_interval: 1h\r\n```\r\n\r\nFor more details, refer to the [Data Acceleration Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fdata-acceleration).\r\n\r\n### TinyLFU Cache Eviction Policy\r\n\r\n**Higher Cache Hit Rates for SQL Results Cache**: A new TinyLFU [cache eviction policy](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fcaching#cache-eviction-policies) is now available for the SQL results cache. 
TinyLFU is a probabilistic cache admission policy that maintains higher hit rates than LRU while keeping memory usage predictable, making it ideal for workloads with varying query frequency patterns.\r\n\r\nExample spicepod.yaml configuration:\r\n\r\n```yaml\r\nruntime:\r\n  caching:\r\n    sql_results:\r\n      enabled: true\r\n      eviction_policy: tiny_lfu # default: lru\r\n```\r\n\r\nFor more details, refer to the [Caching Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fcaching). For a description of the algorithm, see the [Moka TinyLFU Documentation](https:\u002F\u002Fdocs.rs\u002Fmoka\u002Flatest\u002Fmoka\u002F#tinylfu).\r\n\r\n### DynamoDB Streams Data Connector (Preview)\r\n\r\n**Real-Time Change Data Capture for DynamoDB**: The DynamoDB connector now integrates with DynamoDB Streams for real-time change data capture (CDC). This enables continuous synchronization of DynamoDB table changes into Spice for real-time query, search, and LLM inference.\r\n\r\n**Key Features**:\r\n\r\n- **Real-Time CDC**: Automatically captures inserts, updates, and deletes from DynamoDB tables as they occur\r\n- **Table Bootstrapping**: Performs an initial full table scan before streaming changes, ensuring complete data consistency\r\n- **Acceleration Integration**: Works with `refresh_mode: changes` to incrementally update accelerated datasets\r\n\r\n> **Note**: DynamoDB Streams must be enabled on your DynamoDB table. 
This feature is in preview.\r\n\r\nExample spicepod.yaml configuration:\r\n\r\n```yaml\r\ndatasets:\r\n  - from: dynamodb:my_table\r\n    name: orders_stream\r\n    acceleration:\r\n      enabled: true\r\n      refresh_mode: changes # Enable Streams capture\r\n```\r\n\r\nFor more details, refer to the [DynamoDB Connector Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fdynamodb#streams).\r\n\r\n### OpenTelemetry Metrics Exporter\r\n\r\nSpice can now push metrics to an [OpenTelemetry](https:\u002F\u002Fopentelemetry.io\u002F) collector, enabling integration with platforms such as [Jaeger](https:\u002F\u002Fwww.jaegertracing.io\u002F), [New Relic](https:\u002F\u002Fnewrelic.com\u002F), [Honeycomb](https:\u002F\u002Fwww.honeycomb.io\u002F), and other OpenTelemetry-compatible backends.\r\n\r\n**Key Features**:\r\n\r\n- **Protocol Support**: Supports the gRPC protocol (default port 4317)\r\n- **Configurable Push Interval**: Control how frequently metrics are pushed to the collector\r\n\r\nExample spicepod.yaml configuration for gRPC:\r\n\r\n```yaml\r\nruntime:\r\n  telemetry:\r\n    enabled: true\r\n    otel_exporter:\r\n      endpoint: 'localhost:4317'\r\n      push_interval: '30s'\r\n```\r\n\r\nFor more details, refer to the [Observability & Monitoring Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fobservability).\r\n\r\n### S3 Connector Improvements","2025-12-09T07:30:00",{"id":258,"version":259,"summary_zh":260,"released_at":261},90315,"v1.10.0-rc.1","# Spice v1.10.0-rc1 (Dec 2, 2025)\n\nv1.10.0-rc1 is a release candidate for early testing of v1.10 features, including an all-new `caching` acceleration mode, a `tiny_lfu` caching policy, a new DynamoDB Streams connector (Preview), improvements to the DynamoDB connector, faster distributed query execution, S3 connector improvements, and security hardening for v1.10.0-stable.\n\n## What's New in v1.10.0-rc1\n\n### Caching Acceleration Mode with SWR 
and TinyLFU\n\nThis release introduces a new `caching` [acceleration mode](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators) that implements the stale-while-revalidate (SWR) pattern using Data Accelerators such as DuckDB or Cayenne, enabling queries to return file-persisted cached results immediately while asynchronously refreshing data in the background. Combined with the new TinyLFU [cache eviction policy](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fcaching#cache-eviction-policies), Spice can now maintain higher cache hit rates while keeping memory usage predictable.\n\n**Key Features**:\n\n- **Stale-While-Revalidate (SWR)**: Returns cached data immediately while refreshing in the background\n- **Data Accelerator Support**: Cached data can be persisted to disk using DuckDB, SQLite, or Cayenne file modes\n- **TinyLFU Cache Policy**: Probabilistic cache admission policy that maintains high hit rates with minimal overhead\n- **Predictable Memory Usage**: Configurable memory limits with automatic eviction of less frequently used entries\n\nExample Spicepod.yml configuration:\n\n```yaml\nruntime:\n  caching:\n    sql_results:\n      enabled: true\n      eviction_policy: tiny_lfu # default: lru\n\ndatasets:\n  - from: s3:\u002F\u002Fmy-bucket\u002Fdata.parquet\n    name: cached_data\n    acceleration:\n      enabled: true\n      engine: duckdb\n      mode: file # Persist cache to disk\n      refresh_mode: caching\n      refresh_check_interval: 10m\n```\n\nFor more details, refer to the [Data Acceleration Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fdata-acceleration) and [Caching Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fcaching).\n\n### DynamoDB Streams Data Connector in Preview\n\nThe DynamoDB connector now integrates with DynamoDB Streams, enabling real-time streaming with support for both table bootstrapping and continuous change data capture (CDC). 
This connector automatically detects changes in DynamoDB tables and streams them into Spice for real-time query, search, and LLM inference.\n\n**Key Features**:\n\n- **Real-Time CDC**: Automatically captures inserts, updates, and deletes from DynamoDB tables\n- **Table Bootstrapping**: Initial full table load before streaming changes\n\nExample Spicepod.yml configuration:\n\n```yaml\ndatasets:\n  - from: dynamodb:my_table\n    name: orders_stream\n    acceleration:\n      enabled: true\n      refresh_mode: changes\n```\n\nFor more details, refer to the [DynamoDB Connector Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fdynamodb#streams).\n\n### Cayenne Accelerator Enhancements\n\nThe [Cayenne data accelerator](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fcayenne) now supports:\n\n- **Sort Columns Configuration**: Optimize inserts by pre-sorting data on specified columns for improved query performance\n\nExample Spicepod.yml configuration:\n\n```yaml\ndatasets:\n  - from: s3:\u002F\u002Fmy-bucket\u002Fdata.parquet\n    name: sorted_data\n    acceleration:\n      enabled: true\n      engine: cayenne\n      mode: file_create\n      params:\n        sort_columns: timestamp,region\n```\n\nFor more details, refer to the [Cayenne Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fcayenne).\n\n### S3 Connector Improvements\n\n**S3 Location Predicate Pruning**: The S3 [data connector](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fs3) now supports location-based predicate pruning, dramatically reducing data scanned by pushing down predicates to S3 listing operations. 
This optimization is especially effective for partitioned datasets stored in S3.\n\n**AWS S3 Tables Write Support**: Full read\u002Fwrite capability for [AWS S3 Tables](https:\u002F\u002Fdocs.aws.amazon.com\u002FAmazonS3\u002Flatest\u002Fuserguide\u002Fs3-tables-buckets.html), enabling fast integration with AWS's table format for S3.\n\nFor more details, refer to the [S3 Tables Data Connector Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fs3) and [Glue Data Connector Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fglue).\n\n### Faster Distributed Query Execution\n\n[Distributed query](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fdistributed-query) planning and execution have been significantly improved:\n\n- **Fixed executor registration** in cluster mode for more reliable distributed deployments\n- **Improved hostname resolution** for Flight server binding, enabling better executor discovery\n- **Distributed accelerator registration**: Data accelerators now properly register in distributed mode\n- **Optimized query planning**: `DistributeFileScanOptimizer` improvements for faster planning with large datasets\n\nFor more details, refer t","2025-12-03T10:32:19",{"id":263,"version":264,"summary_zh":265,"released_at":266},90316,"v1.9.2.d2541e8","## What's Changed\r\n* 1.9.2 Release Notes (cherry-pick) by @krinart in https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8315\r\n* release\u002F1.9: cherry pick security fixes by @phillipleblanc in https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8303\r\n\r\n\r\n**Full Changelog**: https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fcompare\u002Fv1.9.2...v1.9.2.d2541e8","2025-12-04T17:34:51",{"id":268,"version":269,"summary_zh":270,"released_at":271},90317,"v1.9.2","# Spice v1.9.2 (Nov 26, 2025)\r\n\r\nv1.9.2 is a patch release that fixes a bug in SQL query results 
cache metrics emission, ensuring cache performance metrics are properly available for monitoring and observability.\r\n\r\n## What's New in v1.9.2\r\n\r\n### SQL Results Cache Metrics Fixed\r\n\r\nThe SQL query results cache metrics were not being properly emitted to the Prometheus metrics endpoint after startup. This release fixes the issue, ensuring all cache-related metrics are now correctly available for monitoring.\r\n\r\n**Available Cache Metrics**:\r\n\r\nOnce SQL results caching is enabled in `spicepod.yaml`:\r\n\r\n```yaml\r\nruntime:\r\n  caching:\r\n    sql_results:\r\n      enabled: true\r\n```\r\n\r\nYou can now fetch the following `results_cache_*` metrics from the `\u002Fmetrics` endpoint:\r\n\r\n```bash\r\ncurl -s http:\u002F\u002Flocalhost:9090\u002Fmetrics | grep \"^# HELP\" | awk '{print $3}' | sort -u | grep results_cache_\r\n\r\n# Example output:\r\nresults_cache_hit_ratio       # Cache hit ratio (hits \u002F total requests)\r\nresults_cache_items_count     # Number of cached items\r\nresults_cache_max_size_bytes  # Maximum cache size in bytes\r\nresults_cache_misses          # Total cache misses\r\nresults_cache_requests        # Total cache requests\r\nresults_cache_size_bytes      # Current cache size in bytes\r\n```\r\n\r\nFor more information on SQL query results caching, see the [Caching Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fcaching).\r\n\r\n## Contributors\r\n\r\n- [@peasee](https:\u002F\u002Fgithub.com\u002Fpeasee)\r\n\r\n## Breaking Changes\r\n\r\nNo breaking changes.\r\n\r\n## Cookbook Updates\r\n\r\nNo major cookbook updates.\r\n\r\nThe [Spice Cookbook](https:\u002F\u002Fspiceai.org\u002Fcookbook) includes 82 recipes to help you get started with Spice quickly and easily.\r\n\r\n## Upgrading\r\n\r\nTo upgrade to v1.9.2, use one of the following methods:\r\n\r\n**CLI**:\r\n\r\n```console\r\nspice upgrade\r\n```\r\n\r\n**Homebrew**:\r\n\r\n```console\r\nbrew upgrade 
spiceai\u002Fspiceai\u002Fspice\r\n```\r\n\r\n**Docker**:\r\n\r\nPull the `spiceai\u002Fspiceai:1.9.2` image:\r\n\r\n```console\r\ndocker pull spiceai\u002Fspiceai:1.9.2\r\n```\r\n\r\nFor available tags, see [DockerHub](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fspiceai\u002Fspiceai\u002Ftags).\r\n\r\n**Helm**:\r\n\r\n```console\r\nhelm repo update\r\nhelm upgrade spiceai spiceai\u002Fspiceai\r\n```\r\n\r\n**AWS Marketplace**:\r\n\r\n🎉 Spice is now available in the [AWS Marketplace](https:\u002F\u002Faws.amazon.com\u002Fmarketplace\u002Fpp\u002Fprodview-jmf6jskjvnq7i)!\r\n\r\n## What's Changed\r\n\r\n### Changelog\r\n\r\n- fix: Ensure caching metrics are emitted after Prometheus startup by [@peasee](https:\u002F\u002Fgithub.com\u002Fpeasee) in [#8184](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fpull\u002F8184)","2025-11-26T22:04:06",{"id":273,"version":274,"summary_zh":275,"released_at":276},90318,"v1.9.1","# Spice v1.9.1 (Nov 24, 2025)\r\n\r\nv1.9.1 introduces **Amazon Bedrock Nova 2 Multimodal embeddings support** with high-dimensional vectors up to 3,072 dimensions and purpose-optimized embeddings for semantic search and retrieval operations, **DynamoDB timestamp filter pushdown** for more efficient append-mode acceleration with configurable time formatting, **HTTP Data Connector health probe configuration** for improved endpoint validation reliability, and **Spice .NET SDK v0.2** with expanded .NET version support and updated gRPC libraries. This release focuses on bug fixes, stability, and performance improvements.\r\n\r\n## Amazon Bedrock Nova 2 Multimodal embeddings\r\n\r\nSpice now supports the Amazon Nova 2 Multimodal embeddings models via the Bedrock models provider, enabling high-quality text embeddings for semantic search and vector similarity operations. 
The Nova embeddings model offers configurable dimensions and advanced features like truncation modes and embedding purpose optimization.\r\n\r\n**Key Features**:\r\n\r\n- **High-Dimensional Embeddings**: Support for up to 3,072 dimensions for rich semantic representations\r\n- **Configurable Truncation**: Control how input text is truncated when exceeding token limits (`START`, `END`, or `NONE`)\r\n- **Purpose Optimization**: Optimize embeddings for specific use cases (`GENERIC_INDEX`, `GENERIC_RETRIEVAL`, or `CLASSIFICATION`)\r\n- **Multimodal Model**: Leverages Amazon's Nova 2 multimodal architecture for consistent embeddings across different content types\r\n\r\nExample `spicepod.yml` configuration:\r\n\r\n```yaml\r\nembeddings:\r\n  - from: bedrock:amazon.nova-2-multimodal-embeddings-v1:0\r\n    name: nova_embeddings\r\n    params:\r\n      dimensions: '3072' # Required: Output dimensions\r\n      truncation_mode: START # Optional: START, END, or NONE (default: NONE)\r\n      embedding_purpose: GENERIC_RETRIEVAL # Optional. GENERIC_INDEX is default\r\n```\r\n\r\nFor more details on the embedding parameters and configuration options, refer to the [Amazon Nova Embeddings Documentation](https:\u002F\u002Fdocs.aws.amazon.com\u002Fnova\u002Flatest\u002Fuserguide\u002Fembeddings-schema.html) and the [Spice Embeddings Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fembeddings).\r\n\r\n## DynamoDB Timestamp Filter Pushdown\r\n\r\nThe DynamoDB Data Connector now supports timestamp filter pushdown, enabling more efficient append-mode acceleration refreshes by pushing timestamp filters directly to DynamoDB queries. 
Since DynamoDB stores timestamps as strings rather than native datetime types, this feature includes configurable timestamp formatting to ensure correct parsing and filtering.\r\n\r\n**Key Features**:\r\n\r\n- Filters on timestamp columns are now pushed down to DynamoDB, reducing data transfer and improving query performance\r\n- Support for Go-style datetime formatting patterns to handle various timestamp string formats\r\n- Uses ISO 8601 format by default when no custom format is specified\r\n\r\nExample `spicepod.yml` configuration:\r\n\r\n```yaml\r\ndatasets:\r\n  - from: dynamodb:sales\r\n    name: sales\r\n    time_column: created_at\r\n    time_format: timestamptz\r\n    params:\r\n      time_format: 2006-01-02T15:04:05.000Z07:00\r\n    acceleration:\r\n      enabled: true\r\n      engine: duckdb\r\n      refresh_mode: append\r\n```\r\n\r\nFor more details, refer to the [DynamoDB Data Connector Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fdynamodb).\r\n\r\n## HTTP Data Connector Health Probe Configuration\r\n\r\nThe HTTP Data Connector now supports configurable health probe paths for endpoint validation. 
Instead of using a random non-existent path, the system can now validate endpoints using a user-specified path, improving flexibility and reliability for health checks.\r\n\r\nExample `spicepod.yml` configuration:\r\n\r\n```yaml\r\ndatasets:\r\n  - from: https:\u002F\u002Fapi.tvmaze.com\r\n    name: tvmaze\r\n    params:\r\n      file_format: json\r\n      health_probe: \u002Fhealth-check\r\n```\r\n\r\nFor more details, refer to the [HTTP Data Connector Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-connectors\u002Fhttps).\r\n\r\n## Spice .NET SDK v0.2\r\n\r\nThe Spice .NET SDK has been upgraded with expanded .NET version support, custom User-Agent configuration, and updated gRPC libraries: [spice-dotnet v0.2.0](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspice-dotnet\u002Freleases\u002Ftag\u002Fv0.2.0). The SDK is available on [NuGet](https:\u002F\u002Fwww.nuget.org\u002Fpackages\u002FSpiceAI).\r\n\r\n**Key Features**:\r\n\r\n- **Expanded .NET Support**: Now supports .NET Standard 2.0, .NET Core 8.0, 9.0, and 10.0.\r\n- **Custom User-Agent**: Configure custom User-Agent headers for client identification and telemetry.\r\n- **Updated gRPC Libraries**: Upgraded gRPC dependencies and `netstandard` for improved performance and reliability\r\n\r\n**Upgrade Example**:\r\n\r\n```console\r\ndotnet add package SpiceAI --version 0.2.0\r\n```\r\n\r\nFor more details, refer to the [.NET SDK Documentation](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspice-dotnet\u002Fblob\u002Ftrunk\u002FREADME.md).\r\n\r\n### Additional Improvements & Bug Fixes\r\n\r\n- **Reliability**: Fixed view loading to respect topological order, preventing de","2025-11-25T00:09:37",{"id":278,"version":279,"summary_zh":280,"released_at":281},90319,"v1.9.0","# Spice v1.9.0 (Nov 18, 2025)\r\n\r\nv1.9.0-stable introduces **Spice Cayenne**, a new high-performance data accelerator built on the **Vortex** columnar format that delivers better than DuckDB performance 
without single-file scaling limitations, and a preview of **Multi-Node Distributed Query** based on Apache Ballista. v1.9.0 also upgrades to DataFusion v50, DuckDB v1.4.2, and Delta-Kernel v0.16 for even higher query performance, expands search capabilities with full-text search on views and multi-column embeddings, and delivers many additional features and improvements.\r\n\r\n## What's New in v1.9.0\r\n\r\n### Cayenne Data Accelerator (Beta)\r\n\r\n**Introducing Cayenne: SQL as an Acceleration Format**: A new high-performance [Data Accelerator](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Ffeatures\u002Fdata-acceleration) that simplifies multi-file data acceleration by using an embedded database (SQLite) for metadata while storing data in the [Vortex columnar format](https:\u002F\u002Fgithub.com\u002Fvortex-data\u002Fvortex), a Linux Foundation project. Cayenne delivers better query and ingestion performance than DuckDB's file-based acceleration, without DuckDB's memory overhead or the scaling challenges of single DuckDB files.\r\n\r\nCayenne uses SQLite to manage acceleration metadata (schemas, snapshots, statistics, file tracking) through simple SQL transactions, while storing data in Vortex's compressed columnar format.\r\n\r\n**Key Features**:\r\n\r\n- **SQLite + Vortex Architecture**: All metadata is stored in SQLite tables with standard SQL transactions, while data lives in Vortex's compressed, chunked columnar format designed for zero-copy access and efficient scanning.\r\n- **Simplified Operations**: No complex file hierarchies, no JSON\u002FAvro metadata files, and no separate catalog servers; just SQL tables and Vortex data files. 
The entire metadata schema is intentionally simple for maximum reliability.\r\n- **Fast Metadata Access**: A single SQL query retrieves all metadata needed for query planning, with no repeated round trips to storage, no S3 throttling, and no reconstruction of metadata state from scattered files.\r\n- **Efficient Small Changes**: Dramatically reduces small file proliferation. Snapshots are just rows in SQLite tables, not new files on disk. Supports millions of snapshots without performance degradation.\r\n- **High Concurrency**: Changes consist of two steps: stage Vortex files (if any), then run a single SQL transaction. This yields much faster conflict resolution and supports many more concurrent updates than file-based formats.\r\n- **Advanced Data Lifecycle**: Full ACID transactions, delete support, and retention SQL execution on refresh commit.\r\n\r\nExample Spicepod.yml configuration:\r\n\r\n```yaml\r\ndatasets:\r\n  - from: s3:my_table\r\n    name: accelerated_data_30d\r\n    acceleration:\r\n      enabled: true\r\n      engine: cayenne\r\n      mode: file\r\n      refresh_mode: append\r\n      retention_sql: DELETE FROM accelerated_data_30d WHERE created_at \u003C NOW() - INTERVAL '30 days'\r\n```\r\n\r\nNote: the Cayenne Data Accelerator is in [Beta](https:\u002F\u002Fgithub.com\u002Fspiceai\u002Fspiceai\u002Fblob\u002Ftrunk\u002Fdocs\u002Fcriteria\u002Faccelerators\u002Fbeta.md) with limitations.\r\n\r\nFor more details, refer to the [Cayenne Documentation](https:\u002F\u002Fspiceai.org\u002Fdocs\u002Fcomponents\u002Fdata-accelerators\u002Fcayenne), the [Vortex project](https:\u002F\u002Fgithub.com\u002Fvortex-data\u002Fvortex), and the [DuckLake announcement](https:\u002F\u002Fduckdb.org\u002F2025\u002F05\u002F27\u002Fducklake) that partly inspired this design.\r\n\r\n### Multi-Node Distributed Query (Preview)\r\n\r\n**Apache Ballista Integration**: Spice now supports distributed query execution based on [Apache 
Ballista](https:\u002F\u002Fgithub.com\u002Fapache\u002Fdatafusion-ballista), enabling distributed queries across multiple executor nodes for improved performance on large datasets. This feature is in **preview** in v1.9.0.\r\n\r\n**Architecture**:\r\n\r\nA distributed Spice cluster consists of:\r\n\r\n- **Scheduler**: Responsible for distributed query planning and work queue management for the executor fleet\r\n- **Executors**: One or more nodes responsible for running physical query plans\r\n\r\n**Getting Started**:\r\n\r\nStart a scheduler instance using an existing Spicepod. The scheduler is the only spiced instance that needs to be configured:\r\n\r\n```bash\r\n# Start scheduler (note the flight bind address override if you want it reachable outside localhost)\r\nspiced --cluster-mode scheduler --flight 0.0.0.0:50051\r\n```\r\n\r\nStart one or more executors configured with the scheduler's flight URI:\r\n\r\n```bash\r\n# Start executor (automatically selects a free port if 50051 is taken)\r\nspiced --cluster-mode executor --scheduler-url spiced:\u002F\u002Flocalhost:50051\r\n```\r\n\r\n**Query Execution**:\r\n\r\nQueries run through the scheduler will now show a `distributed_plan` in `EXPLAIN` output, demonstrating how the query is distributed across executor nodes:\r\n\r\n```sql\r\nEXPLAIN SELECT count(id) FROM my_dataset;\r\n```\r\n\r\n**Current Limitations**:\r\n\r\n- Accelerated datasets are currently **not supported**. This feature is designed for querying partitioned data lake formats (Parquet, Delta ","2025-11-19T06:46:45"]