[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-4paradigm--OpenMLDB":3,"tool-4paradigm--OpenMLDB":61},[4,18,26,36,44,53],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":17},4358,"openclaw","openclaw\u002Fopenclaw","OpenClaw 是一款专为个人打造的本地化 AI 助手，旨在让你在自己的设备上拥有完全可控的智能伙伴。它打破了传统 AI 助手局限于特定网页或应用的束缚，能够直接接入你日常使用的各类通讯渠道，包括微信、WhatsApp、Telegram、Discord、iMessage 等数十种平台。无论你在哪个聊天软件中发送消息，OpenClaw 都能即时响应，甚至支持在 macOS、iOS 和 Android 设备上进行语音交互，并提供实时的画布渲染功能供你操控。\n\n这款工具主要解决了用户对数据隐私、响应速度以及“始终在线”体验的需求。通过将 AI 部署在本地，用户无需依赖云端服务即可享受快速、私密的智能辅助，真正实现了“你的数据，你做主”。其独特的技术亮点在于强大的网关架构，将控制平面与核心助手分离，确保跨平台通信的流畅性与扩展性。\n\nOpenClaw 非常适合希望构建个性化工作流的技术爱好者、开发者，以及注重隐私保护且不愿被单一生态绑定的普通用户。只要具备基础的终端操作能力（支持 macOS、Linux 及 Windows WSL2），即可通过简单的命令行引导完成部署。如果你渴望拥有一个懂你",349277,3,"2026-04-06T06:32:30",[13,14,15,16],"Agent","开发框架","图像","数据工具","ready",{"id":19,"name":20,"github_repo":21,"description_zh":22,"stars":23,"difficulty_score":10,"last_commit_at":24,"category_tags":25,"status":17},3808,"stable-diffusion-webui","AUTOMATIC1111\u002Fstable-diffusion-webui","stable-diffusion-webui 是一个基于 Gradio 构建的网页版操作界面，旨在让用户能够轻松地在本地运行和使用强大的 Stable Diffusion 图像生成模型。它解决了原始模型依赖命令行、操作门槛高且功能分散的痛点，将复杂的 AI 绘图流程整合进一个直观易用的图形化平台。\n\n无论是希望快速上手的普通创作者、需要精细控制画面细节的设计师，还是想要深入探索模型潜力的开发者与研究人员，都能从中获益。其核心亮点在于极高的功能丰富度：不仅支持文生图、图生图、局部重绘（Inpainting）和外绘（Outpainting）等基础模式，还独创了注意力机制调整、提示词矩阵、负向提示词以及“高清修复”等高级功能。此外，它内置了 GFPGAN 和 CodeFormer 等人脸修复工具，支持多种神经网络放大算法，并允许用户通过插件系统无限扩展能力。即使是显存有限的设备，stable-diffusion-webui 也提供了相应的优化选项，让高质量的 AI 艺术创作变得触手可及。",162132,"2026-04-05T11:01:52",[14,15,13],{"id":27,"name":28,"github_repo":29,"description_zh":30,"stars":31,"difficulty_score":32,"last_commit_at":33,"category_tags":34,"status":17},1381,"everything-claude-code","affaan-m\u002Feverything-claude-code","everything-claude-code 是一套专为 AI 编程助手（如 Claude Code、Codex、Cursor 等）打造的高性能优化系统。它不仅仅是一组配置文件，而是一个经过长期实战打磨的完整框架，旨在解决 AI 
代理在实际开发中面临的效率低下、记忆丢失、安全隐患及缺乏持续学习能力等核心痛点。\n\n通过引入技能模块化、直觉增强、记忆持久化机制以及内置的安全扫描功能，everything-claude-code 能显著提升 AI 在复杂任务中的表现，帮助开发者构建更稳定、更智能的生产级 AI 代理。其独特的“研究优先”开发理念和针对 Token 消耗的优化策略，使得模型响应更快、成本更低，同时有效防御潜在的攻击向量。\n\n这套工具特别适合软件开发者、AI 研究人员以及希望深度定制 AI 工作流的技术团队使用。无论您是在构建大型代码库，还是需要 AI 协助进行安全审计与自动化测试，everything-claude-code 都能提供强大的底层支持。作为一个曾荣获 Anthropic 黑客大奖的开源项目，它融合了多语言支持与丰富的实战钩子（hooks），让 AI 真正成长为懂上",148568,2,"2026-04-09T23:34:24",[14,13,35],"语言模型",{"id":37,"name":38,"github_repo":39,"description_zh":40,"stars":41,"difficulty_score":32,"last_commit_at":42,"category_tags":43,"status":17},2271,"ComfyUI","Comfy-Org\u002FComfyUI","ComfyUI 是一款功能强大且高度模块化的视觉 AI 引擎，专为设计和执行复杂的 Stable Diffusion 图像生成流程而打造。它摒弃了传统的代码编写模式，采用直观的节点式流程图界面，让用户通过连接不同的功能模块即可构建个性化的生成管线。\n\n这一设计巧妙解决了高级 AI 绘图工作流配置复杂、灵活性不足的痛点。用户无需具备编程背景，也能自由组合模型、调整参数并实时预览效果，轻松实现从基础文生图到多步骤高清修复等各类复杂任务。ComfyUI 拥有极佳的兼容性，不仅支持 Windows、macOS 和 Linux 全平台，还广泛适配 NVIDIA、AMD、Intel 及苹果 Silicon 等多种硬件架构，并率先支持 SDXL、Flux、SD3 等前沿模型。\n\n无论是希望深入探索算法潜力的研究人员和开发者，还是追求极致创作自由度的设计师与资深 AI 绘画爱好者，ComfyUI 都能提供强大的支持。其独特的模块化架构允许社区不断扩展新功能，使其成为当前最灵活、生态最丰富的开源扩散模型工具之一，帮助用户将创意高效转化为现实。",108111,"2026-04-08T11:23:26",[14,15,13],{"id":45,"name":46,"github_repo":47,"description_zh":48,"stars":49,"difficulty_score":32,"last_commit_at":50,"category_tags":51,"status":17},6121,"gemini-cli","google-gemini\u002Fgemini-cli","gemini-cli 是一款由谷歌推出的开源 AI 命令行工具，它将强大的 Gemini 大模型能力直接集成到用户的终端环境中。对于习惯在命令行工作的开发者而言，它提供了一条从输入提示词到获取模型响应的最短路径，无需切换窗口即可享受智能辅助。\n\n这款工具主要解决了开发过程中频繁上下文切换的痛点，让用户能在熟悉的终端界面内直接完成代码理解、生成、调试以及自动化运维任务。无论是查询大型代码库、根据草图生成应用，还是执行复杂的 Git 操作，gemini-cli 都能通过自然语言指令高效处理。\n\n它特别适合广大软件工程师、DevOps 人员及技术研究人员使用。其核心亮点包括支持高达 100 万 token 的超长上下文窗口，具备出色的逻辑推理能力；内置 Google 搜索、文件操作及 Shell 命令执行等实用工具；更独特的是，它支持 MCP（模型上下文协议），允许用户灵活扩展自定义集成，连接如图像生成等外部能力。此外，个人谷歌账号即可享受免费的额度支持，且项目基于 Apache 2.0 
协议完全开源，是提升终端工作效率的理想助手。",100752,"2026-04-10T01:20:03",[52,13,15,14],"插件",{"id":54,"name":55,"github_repo":56,"description_zh":57,"stars":58,"difficulty_score":32,"last_commit_at":59,"category_tags":60,"status":17},4721,"markitdown","microsoft\u002Fmarkitdown","MarkItDown 是一款由微软 AutoGen 团队打造的轻量级 Python 工具，专为将各类文件高效转换为 Markdown 格式而设计。它支持 PDF、Word、Excel、PPT、图片（含 OCR）、音频（含语音转录）、HTML 乃至 YouTube 链接等多种格式的解析，能够精准提取文档中的标题、列表、表格和链接等关键结构信息。\n\n在人工智能应用日益普及的今天，大语言模型（LLM）虽擅长处理文本，却难以直接读取复杂的二进制办公文档。MarkItDown 恰好解决了这一痛点，它将非结构化或半结构化的文件转化为模型“原生理解”且 Token 效率极高的 Markdown 格式，成为连接本地文件与 AI 分析 pipeline 的理想桥梁。此外，它还提供了 MCP（模型上下文协议）服务器，可无缝集成到 Claude Desktop 等 LLM 应用中。\n\n这款工具特别适合开发者、数据科学家及 AI 研究人员使用，尤其是那些需要构建文档检索增强生成（RAG）系统、进行批量文本分析或希望让 AI 助手直接“阅读”本地文件的用户。虽然生成的内容也具备一定可读性，但其核心优势在于为机器",93400,"2026-04-06T19:52:38",[52,14],{"id":62,"github_repo":63,"name":64,"description_en":65,"description_zh":66,"ai_summary_zh":67,"readme_en":68,"readme_zh":69,"quickstart_zh":70,"use_case_zh":71,"hero_image_url":72,"owner_login":73,"owner_name":74,"owner_avatar_url":75,"owner_bio":76,"owner_company":77,"owner_location":77,"owner_email":77,"owner_twitter":77,"owner_website":78,"owner_url":79,"languages":80,"stars":119,"forks":120,"last_commit_at":121,"license":122,"difficulty_score":123,"env_os":124,"env_gpu":125,"env_ram":125,"env_deps":126,"category_tags":137,"github_topics":138,"view_count":32,"oss_zip_url":77,"oss_zip_packed_at":77,"status":17,"created_at":150,"updated_at":151,"faqs":152,"releases":153},6148,"4paradigm\u002FOpenMLDB","OpenMLDB","OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.","OpenMLDB 是一款开源的机器学习数据库，专为解决人工智能工程中数据与特征处理的难题而生。在传统的机器学习开发流程中，从离线训练到在线推理往往涉及两套不同的代码和团队，这不仅导致开发周期漫长，还极易引发“数据泄露”或特征不一致等严重问题，使得企业不得不投入巨大成本进行验证和维护。\n\nOpenMLDB 的核心使命是实现“开发即部署”。它提供了一个统一的特征平台，允许开发者直接使用 SQL 进行复杂的特征工程。通过这一设计，OpenMLDB 
确保了用于模型训练的特征与线上实时推理的特征完全一致，从而消除了因环境差异导致的误差。其独特的技术亮点在于能够高效处理实时特征计算，同时满足低延迟、高吞吐和高可用性的生产级要求，将原本需要重构代码的繁琐过程简化为统一的 SQL 开发体验。\n\n这款工具非常适合机器学习工程师、数据科学家以及后端开发人员使用。无论是需要构建实时个性化推荐系统，还是从事金融风控分析的团队，都能利用 OpenMLDB 大幅降低从实验到落地的门槛，将精力更多地集中在算法优化而非工程适配上。作为一个已在数百家企业实际应用中验证过的成熟方案，OpenMLDB 正帮助各类组织以更高","OpenMLDB 是一款开源的机器学习数据库，专为解决人工智能工程中数据与特征处理的难题而生。在传统的机器学习开发流程中，从离线训练到在线推理往往涉及两套不同的代码和团队，这不仅导致开发周期漫长，还极易引发“数据泄露”或特征不一致等严重问题，使得企业不得不投入巨大成本进行验证和维护。\n\nOpenMLDB 的核心使命是实现“开发即部署”。它提供了一个统一的特征平台，允许开发者直接使用 SQL 进行复杂的特征工程。通过这一设计，OpenMLDB 确保了用于模型训练的特征与线上实时推理的特征完全一致，从而消除了因环境差异导致的误差。其独特的技术亮点在于能够高效处理实时特征计算，同时满足低延迟、高吞吐和高可用性的生产级要求，将原本需要重构代码的繁琐过程简化为统一的 SQL 开发体验。\n\n这款工具非常适合机器学习工程师、数据科学家以及后端开发人员使用。无论是需要构建实时个性化推荐系统，还是从事金融风控分析的团队，都能利用 OpenMLDB 大幅降低从实验到落地的门槛，将精力更多地集中在算法优化而非工程适配上。作为一个已在数百家企业实际应用中验证过的成熟方案，OpenMLDB 正帮助各类组织以更高效、可靠的方式拥抱人工智能。","![openmldb_logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F4paradigm_OpenMLDB_readme_3d70fc649919.png)\n\n[![build status](https:\u002F\u002Fgithub.com\u002F4paradigm\u002Fopenmldb\u002Factions\u002Fworkflows\u002Fcicd.yaml\u002Fbadge.svg?branch=openmldb)](https:\u002F\u002Fgithub.com\u002F4paradigm\u002Fopenmldb\u002Factions\u002Fworkflows\u002Fcicd.yaml)\n[![docker 
pulls](https:\u002F\u002Fimg.shields.io\u002Fdocker\u002Fpulls\u002F4pdosc\u002Fopenmldb.svg)](https:\u002F\u002Fhub.docker.com\u002Fr\u002F4pdosc\u002Fopenmldb)\n[![slack](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSlack-Join%20Slack-blue)](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fhybridsql-ws\u002Fshared_invite\u002Fzt-ozu3llie-K~hn9Ss1GZcFW2~K_L5sMg)\n[![discuss](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscuss-Ask%20Questions-blue)](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fdiscussions)\n[![release](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002F4paradigm\u002FOpenMLDB?color=lime)](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Freleases)\n[![license](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002F4paradigm\u002FOpenMLDB?color=orange)](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fblob\u002Fmain\u002FLICENSE)\n[![gitee](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitee-mirror-lightyellow)](https:\u002F\u002Fgitee.com\u002Fparadigm4\u002FOpenMLDB)\n[![maven central](https:\u002F\u002Fimg.shields.io\u002Fmaven-central\u002Fv\u002Fcom.4paradigm.openmldb\u002Fopenmldb-batch)](https:\u002F\u002Fmvnrepository.com\u002Fartifact\u002Fcom.4paradigm.openmldb\u002Fopenmldb-batch)\n[![maven central](https:\u002F\u002Fimg.shields.io\u002Fmaven-central\u002Fv\u002Fcom.4paradigm.openmldb\u002Fopenmldb-jdbc)](https:\u002F\u002Fmvnrepository.com\u002Fartifact\u002Fcom.4paradigm.openmldb\u002Fopenmldb-jdbc)\n[![pypi](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fopenmldb)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fopenmldb\u002F)\n\n**English | [中文](README_cn.md)**\n\n## Content\n\n1. [Our Philosophy](#1-our-philosophy)\n2. [A Feature Platform for ML Applications](#2-a-feature-platform-for-ml-applications)\n3. [Highlights](#3-highlights)\n4. [FAQ](#4-faq)\n5. [Download and Install](#5-download-and-install)\n6. [QuickStart](#6-quickstart)\n7. 
[Use Cases](#7-use-cases)\n8. [Documentation](#8-documentation)\n9. [Roadmap](#9-roadmap)\n10. [Contribution](#10-contribution)\n11. [Community](#11-community)\n12. [Publications](#12-publications)\n13. [The User List](#13-the-user-list)\n\n### OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.\n\n## 1. Our Philosophy\n\nIn artificial intelligence (AI) engineering, 95% of the time and effort is consumed by data-related workloads. To tackle this challenge, tech giants spend thousands of hours building in-house data and feature platforms to address engineering issues such as data leakage, feature backfilling, and efficiency. Small and medium-sized enterprises, by contrast, have to purchase expensive SaaS tools and data governance services. \n\nOpenMLDB is an open-source machine learning database that is committed to solving these data and feature challenges. OpenMLDB has been deployed in hundreds of real-world enterprise applications. Its open-source release prioritizes feature engineering using SQL, offering a feature platform that provides consistent features for training and inference.\n\n## 2. A Feature Platform for ML Applications\n\nReal-time features are essential for many machine learning applications, such as real-time personalized recommendation and risk analytics. However, a feature engineering script developed by data scientists (Python scripts in most cases) cannot be directly deployed into production for online inference because it usually cannot meet the engineering requirements, such as low latency, high throughput and high availability. Therefore, an engineering team needs to be involved to refactor and optimize the source code using a database or C++ to ensure its efficiency and robustness. 
As there are two teams and two toolchains involved in the development and deployment life cycle, consistency verification is essential, and it usually costs a lot of time and human effort. \n\nOpenMLDB is specifically designed as a feature platform for ML applications to accomplish the mission of **Development as Deployment**, significantly reducing the cost of moving from offline training to online inference. Based on OpenMLDB, the entire life cycle takes only three steps:\n\n- Step 1: Offline development of feature engineering script based on SQL\n- Step 2: SQL online deployment using just one command\n- Step 3: Online data source configuration to import real-time data\n\nWith those three steps done, the system is ready to serve real-time features, and is highly optimized to achieve low latency and high throughput in production.\n\n![workflow](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F4paradigm_OpenMLDB_readme_78a4925518f3.png)\n\nIn order to achieve the goal of Development as Deployment, OpenMLDB is designed to provide consistent features for training and inference. The figure above shows the high-level architecture of OpenMLDB, which consists of four key components: (1) SQL as the unified programming language; (2) The real-time SQL engine for ultra-low latency services; (3) The batch SQL engine based on [a tailored Spark distribution](https:\u002F\u002Fgithub.com\u002F4paradigm\u002Fspark); (4) The unified execution plan generator, which bridges the batch and real-time SQL engines to guarantee consistency.\n\n## 3. Highlights\n\n**Consistent Features for Training and Inference:** Based on the unified execution plan generator, correct and consistent features are produced for offline training and online inference, providing hassle-free time travel without data leakage.\n\n**Real-Time Features with Ultra-Low Latency**: The real-time SQL engine is built from scratch and particularly optimized for time series data. 
It can achieve the response time of a few milliseconds only to produce real-time features, which significantly outperforms other commercial in-memory database systems (Figures 9 & 10, [the VLDB 2021 paper](http:\u002F\u002Fvldb.org\u002Fpvldb\u002Fvol14\u002Fp799-chen.pdf)).\n\n**Define Features as SQL**: SQL is used as the unified programming language to define and manage features. SQL is further enhanced for feature engineering, such as the extended syntax `LAST JOIN` and `WINDOW UNION`.\n\n**Production-Ready for ML Applications**: Production features are seamlessly integrated to support enterprise-grade ML applications, including distributed storage and computing, fault recovery, high availability, seamless scale-out, smooth upgrade, monitoring, heterogeneous memory support, and so on.\n\n## 4. FAQ\n\n1. **What are use cases of OpenMLDB?**\n   \n   At present, it is mainly positioned as a feature platform for ML applications, with the strength of low-latency real-time features. It provides the capability of Development as Deployment to significantly reduce the cost for machine learning applications. On the other hand, OpenMLDB contains an efficient and fully functional time-series database, which is used in finance, IoT and other fields.\n   \n2. **How does OpenMLDB evolve?**\n   \n   OpenMLDB originated from the commercial product of [4Paradigm](https:\u002F\u002Fwww.4paradigm.com\u002F) (a leading artificial intelligence service provider). In 2021, the core team has abstracted, enhanced and developed community-friendly features based on the commercial product; and then makes it publicly available as an open-source project to benefit more enterprises to achieve successful digital transformations at low cost. 
Before being open-sourced, it had been successfully deployed in hundreds of real-world ML applications together with 4Paradigm's other commercial products.\n\nDespite the similar name, it is unrelated to MLDB, a different open-source project in development since 2015.\n   \n3. **Is OpenMLDB a feature store?**\n   \n   OpenMLDB is more than a feature store that provides features for ML applications. OpenMLDB is capable of producing real-time features in a few milliseconds. Nowadays, most feature stores on the market serve online features by syncing features that were pre-computed offline, but they are unable to produce low-latency real-time features. By comparison, OpenMLDB takes advantage of its optimized online SQL engine to efficiently produce real-time features in a few milliseconds.\n   \n4. **Why does OpenMLDB choose SQL to define and manage features?**\n   \n   SQL (with extensions) has an elegant syntax yet powerful expressive ability. The SQL-based programming experience flattens the learning curve of using OpenMLDB, and further makes collaboration and sharing easier.\n\n## 5. Download and Install\n\n- Download: [GitHub release](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Freleases), [mirror site (China)](https:\u002F\u002Fwww.openmldb.com\u002Fdownload\u002F)\n- Install and deploy: [English](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fdeploy\u002Finstall_deploy.html), [Chinese](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fzh\u002Fmain\u002Fdeploy\u002Finstall_deploy.html)\n\n## 6. QuickStart\n\n[OpenMLDB QuickStart](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fquickstart\u002Fopenmldb_quickstart.html)\n\n## 7. Use Cases\n\nWe are building a list of real-world use cases based on OpenMLDB to demonstrate how it can fit into your business. 
\n\n| Use Cases                                                    | Tools                                                        | Brief Introduction                                           |\n| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |\n| [New York City Taxi Trip Duration](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002Flightgbm_demo.html) | OpenMLDB, LightGBM                                           | This is a challenge from Kaggle to predict the total ride duration of taxi trips in New York City. You can read [more detail here](https:\u002F\u002Fwww.kaggle.com\u002Fc\u002Fnyc-taxi-trip-duration\u002F). It demonstrates using the open-source tools OpenMLDB + LightGBM to easily build an end-to-end machine learning application. |\n| [Importing real-time data streams from Pulsar](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002Fpulsar_connector_demo.html) | OpenMLDB, Pulsar, [OpenMLDB-Pulsar connector](https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fio-connectors\u002F#jdbc-openmldb) | Apache Pulsar is a cloud-native streaming platform. Based on the OpenMLDB-Pulsar connector, we are able to seamlessly import real-time data streams from Pulsar to OpenMLDB as the online data sources. |\n| [Importing real-time data streams from Kafka](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002Fkafka_connector_demo.html) | OpenMLDB, Kafka, [OpenMLDB-Kafka connector](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Ftree\u002Fmain\u002Fextensions\u002Fkafka-connect-jdbc) | Apache Kafka is a distributed event streaming platform. With the OpenMLDB-Kafka connector, the real-time data streams can be imported from Kafka as the online data sources for OpenMLDB. 
|\n| [Importing real-time data streams from RocketMQ](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fzh\u002Fmain\u002Fintegration\u002Fonline_datasources\u002Frocketmq_connector.html) | OpenMLDB, RocketMQ, [OpenMLDB-RocketMQ connector](https:\u002F\u002Fgithub.com\u002Fapache\u002Frocketmq-connect\u002Ftree\u002Fmaster\u002Fconnectors\u002Frocketmq-connect-jdbc\u002Frocketmq-connect-jdbc-openmldb) | Apache RocketMQ is a distributed messaging and streaming platform. The OpenMLDB-RocketMQ connector is used to efficiently import real-data streams from RocketMQ to OpenMLDB. |\n| [Building end-to-end ML pipelines in DolphinScheduler](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002Fdolphinscheduler_task_demo.html) | OpenMLDB, DolphinScheduler, [OpenMLDB task plugin](https:\u002F\u002Fdolphinscheduler.apache.org\u002Fzh-cn\u002Fdocs\u002Fdev\u002Fuser_doc\u002Fguide\u002Ftask\u002Fopenmldb.html) | We demonstrate to build an end-to-end machine learning pipeline based on OpenMLDB and DolphinScheduler (an open-source workflow scheduler platform). It consists of feature engineering, model training, and deployment. |\n| [Ad Tracking Fraud Detection](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002Ftalkingdata_demo.html) | OpenMLDB, XGBoost                                            | This demo uses OpenMLDB and XGBoost to [detect click fraud](https:\u002F\u002Fwww.kaggle.com\u002Fc\u002Ftalkingdata-adtracking-fraud-detection\u002F) for online advertisements. |\n| [SQL-based ML pipelines](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002FOpenMLDB_Byzer_taxi.html) | OpenMLDB, Byzer, [OpenMLDB Plugin for Byzer](https:\u002F\u002Fgithub.com\u002Fbyzer-org\u002Fbyzer-extension\u002Ftree\u002Fmaster\u002Fbyzer-openmldb) | Byzer is a low-code open-source programming language for data pipeline, analytics and AI. Byzer has integrated OpenMLDB to deliver the capability of building ML pipelines with SQL. 
|\n| [Building end-to-end ML pipelines in Airflow](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002Fairflow_provider_demo.html) | OpenMLDB, Airflow, [Airflow OpenMLDB Provider](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Ftree\u002Fmain\u002Fextensions\u002Fairflow-provider-openmldb), XGBoost | Airflow is a popular workflow management and scheduling tool. This demo shows how to effectively schedule OpenMLDB tasks in the Airflow through the provider package. |\n| [Precision marketing](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002FJD_recommendation_en.html) | OpenMLDB, OneFlow                                            | OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. This use case demonstrates to use OpenMLDB for feature engineering and OneFlow for model training\u002Finference, to build an application for [precision marketing](https:\u002F\u002Fjdata.jd.com\u002Fhtml\u002Fdetail.html?id=1). |\n\n## 8. Documentation\n\n- Chinese documentations: [https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fzh](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fzh\u002F)\n- English documentations: [https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002F](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002F)\n\n## 9. Roadmap\n\nPlease refer to our [public Roadmap page](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fprojects\u002F10).\n\nFurthermore, there are a few important features on the development roadmap but have not been scheduled yet. We appreciate any feedbacks on those features.\n\n- A cloud-native OpenMLDB\n- Automatic feature extraction\n- Optimization based on heterogeneous storage and computing resources\n- A lightweight OpenMLDB for edge computing\n\n## 10. 
Contribution\n\nWe really appreciate the contribution from our community.\n\n- If you are interested in contributing, please read our [Contribution Guideline](CONTRIBUTING.md) for more details. \n- If you are a new contributor, you may get started with [the list of issues labeled with `good first issue`](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fissues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).\n- If you have experience with OpenMLDB development, or want to tackle a challenge that may take 1-2 weeks, see [the list of issues labeled with `call-for-contributions`](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fissues?q=is%3Aopen+is%3Aissue+label%3Acall-for-contributions).\n\n[![Open in Gitpod](https:\u002F\u002Fgitpod.io\u002Fbutton\u002Fopen-in-gitpod.svg)](https:\u002F\u002Fgitpod.io\u002F#https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB)\n\n## 11. Community\n\n- Website: [https:\u002F\u002Fopenmldb.ai\u002Fen](https:\u002F\u002Fopenmldb.ai\u002Fen)\n\n- Email: [contact@openmldb.ai](mailto:contact@openmldb.ai)\n\n- [Slack](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fopenmldb\u002Fshared_invite\u002Fzt-ozu3llie-K~hn9Ss1GZcFW2~K_L5sMg) \n\n- [GitHub Issues](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fissues) and [GitHub Discussions](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fdiscussions): The GitHub Issues is used to report bugs and collect new feature requirements. 
The GitHub Discussions is open to any discussions related to OpenMLDB.\n\n- [Blogs (English)](https:\u002F\u002Fopenmldb.medium.com\u002F)\n\n- [Blogs (Chinese)](https:\u002F\u002Fwww.zhihu.com\u002Fcolumn\u002Fc_1417199590352916480)\n\n- Public drives maintained by the PMC: [English](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1T5myyLVe--I9b77Vg0Y8VCYH29DRujUL) |  [中文](https:\u002F\u002Fopenmldb.feishu.cn\u002Fwiki\u002Fspace\u002F7101318128021307396)\n\n- [Mailing list for developers](https:\u002F\u002Fgroups.google.com\u002Fg\u002Fopenmldb-developers)\n\n- WeChat Groups (Chinese):\n\n  ![wechat](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F4paradigm_OpenMLDB_readme_28a013a0d13a.png)  \n\n## 12. Publications\n- [OpenMLDB: A Real-Time Relational Data Feature Computation System for Online ML](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3722212.3724446). Xuanhe Zhou, Wei Zhou, Liguo Qi, Hao Zhang, Dihao Chen, Bingsheng He, Mian Lu, Guoliang Li, Fan Wu, Yuqiang Chen. SIGMOD 2025.\n- [PECJ: Stream Window Join on Disorder Data Streams with Proactive Error Compensation](https:\u002F\u002Ftonyskyzeng.github.io\u002Fdownloads\u002FPECJ_TR.pdf). Xianzhi Zeng, Shuhao Zhang, Hongbin Zhong, Hao Zhang, Mian Lu, Zhao Zheng, and Yuqiang Chen. International Conference on Management of Data (SIGMOD\u002FPODS) 2024.\n- [Principles and Practices of Real-Time Feature Computing Platforms for ML](https:\u002F\u002Fcacm.acm.org\u002Fmagazines\u002F2023\u002F7\u002F274061-principles-and-practices-of-real-time-feature-computing-platforms-for-ml\u002Ffulltext). Hao Zhang, Jun Yang, Cheng Chen, Siqi Wang, Jiashu Li, and Mian Lu. 2023. Communications of the ACM 66, 7 (July 2023), 77–78.\n- [Scalable Online Interval Join on Modern Multicore Processors in OpenMLDB](docs\u002Fpaper\u002Fscale_oij_icde2023.pdf). Hao Zhang, Xianzhi Zeng, Shuhao Zhang, Xinyi Liu, Mian Lu, and Zhao Zheng. 
In 2023 IEEE 39th International Conference on Data Engineering (ICDE) 2023. [[code]](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Ftree\u002Fstream)\n- [FEBench: A Benchmark for Real-Time Relational Data Feature Extraction](https:\u002F\u002Fgithub.com\u002Fdecis-bench\u002Ffebench\u002Fblob\u002Fmain\u002Freport\u002Ffebench.pdf). Xuanhe Zhou, Cheng Chen, Kunyi Li, Bingsheng He, Mian Lu, Qiaosheng Liu, Wei Huang, Guoliang Li, Zhao Zheng, Yuqiang Chen. International Conference on Very Large Data Bases (VLDB) 2023. [[code]](https:\u002F\u002Fgithub.com\u002Fdecis-bench\u002Ffebench).\n- [A System for Time Series Feature Extraction in Federated Learning](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3511808.3557176). Siqi Wang, Jiashu Li, Mian Lu, Zhao Zheng, Yuqiang Chen, and Bingsheng He. 2022. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM) 2022. [[code]](https:\u002F\u002Fgithub.com\u002F4paradigm\u002Ftsfe).\n- [Optimizing in-memory database engine for AI-powered on-line decision augmentation using persistent memory](http:\u002F\u002Fvldb.org\u002Fpvldb\u002Fvol14\u002Fp799-chen.pdf). Cheng Chen, Jun Yang, Mian Lu, Taize Wang, Zhao Zheng, Yuqiang Chen, Wenyuan Dai, Bingsheng He, Weng-Fai Wong, Guoan Wu, Yuping Zhao, and Andy Rudoff. International Conference on Very Large Data Bases (VLDB) 2021.\n\n## 13. [The User List](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fdiscussions\u002F707)\n\nWe are building [a user list](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fdiscussions\u002F707) to collect feedback from the community. We would really appreciate it if you could share your use cases, comments, or any feedback from using OpenMLDB. We want to hear from you! 
\n","![openmldb_logo](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F4paradigm_OpenMLDB_readme_3d70fc649919.png)\n\n[![构建状态](https:\u002F\u002Fgithub.com\u002F4paradigm\u002Fopenmldb\u002Factions\u002Fworkflows\u002Fcicd.yaml\u002Fbadge.svg?branch=openmldb)](https:\u002F\u002Fgithub.com\u002F4paradigm\u002Fopenmldb\u002Factions\u002Fworkflows\u002Fcicd.yaml)\n[![Docker 拉取次数](https:\u002F\u002Fimg.shields.io\u002Fdocker\u002Fpulls\u002F4pdosc\u002Fopenmldb.svg)](https:\u002F\u002Fhub.docker.com\u002Fr\u002F4pdosc\u002Fopenmldb)\n[![Slack](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSlack-Join%20Slack-blue)](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fhybridsql-ws\u002Fshared_invite\u002Fzt-ozu3llie-K~hn9Ss1GZcFW2~K_L5sMg)\n[![讨论](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscuss-Ask%20Questions-blue)](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fdiscussions)\n[![发布版本](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Frelease\u002F4paradigm\u002FOpenMLDB?color=lime)](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Freleases)\n[![许可证](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002F4paradigm\u002FOpenMLDB?color=orange)](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fblob\u002Fmain\u002FLICENSE)\n[![Gitee](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitee-mirror-lightyellow)](https:\u002F\u002Fgitee.com\u002Fparadigm4\u002FOpenMLDB)\n[![Maven Central](https:\u002F\u002Fimg.shields.io\u002Fmaven-central\u002Fv\u002Fcom.4paradigm.openmldb\u002Fopenmldb-batch)](https:\u002F\u002Fmvnrepository.com\u002Fartifact\u002Fcom.4paradigm.openmldb\u002Fopenmldb-batch)\n[![Maven 
Central](https:\u002F\u002Fimg.shields.io\u002Fmaven-central\u002Fv\u002Fcom.4paradigm.openmldb\u002Fopenmldb-jdbc)](https:\u002F\u002Fmvnrepository.com\u002Fartifact\u002Fcom.4paradigm.openmldb\u002Fopenmldb-jdbc)\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fopenmldb)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fopenmldb\u002F)\n\n**English | [中文](README_cn.md)**\n\n## 目录\n\n1. [我们的理念](#1-our-philosophy)\n2. [面向机器学习应用的特征平台](#2-a-feature-platform-for-ml-applications)\n3. [亮点](#3-highlights)\n4. [常见问题解答](#4-faq)\n5. [下载与安装](#5-download-and-install)\n6. [快速入门](#6-quickstart)\n7. [使用案例](#7-use-cases)\n8. [文档](#8-documentation)\n9. [路线图](#9-roadmap)\n10. [贡献](#10-contribution)\n11. [社区](#11-community)\n12. [出版物](#12-publications)\n13. [用户列表](#13-the-user-list)\n\n### OpenMLDB 是一款开源的机器学习数据库，提供用于训练和推理的一致性特征计算平台。\n\n## 1. 我们的理念\n\n在人工智能（AI）工程化过程中，95% 的时间和精力都耗费在数据相关的工作上。为了应对这一挑战，科技巨头们投入数千小时自建数据和特征平台，以解决诸如数据泄露、特征回填和效率等问题。而其他中小型企业则不得不购买昂贵的 SaaS 工具和数据治理服务。\n\nOpenMLDB 是一款开源的机器学习数据库，致力于解决数据和特征方面的难题。它已在数百个真实的企业级应用场景中部署，优先支持使用 SQL 进行特征工程，为开源社区提供一个能够实现训练与推理一致性特征的平台。\n\n## 2. 面向机器学习应用的特征平台\n\n实时特征对于许多机器学习应用至关重要，例如实时个性化推荐和风险分析。然而，数据科学家开发的特征工程脚本（通常是 Python 脚本）无法直接部署到生产环境中进行在线推理，因为它们通常无法满足低延迟、高吞吐量和高可用性等工程需求。因此，需要专门的工程团队介入，利用数据库或 C++ 对源代码进行重构和优化，以确保其高效性和鲁棒性。由于开发和部署生命周期涉及两个团队和两套工具链，验证一致性变得尤为重要，而这往往需要耗费大量时间和人力。\n\nOpenMLDB 专为机器学习应用设计，旨在实现“开发即部署”的目标，从而大幅降低从离线训练到在线推理的成本。基于 OpenMLDB，整个生命周期只需三步：\n\n- 第一步：基于 SQL 进行离线特征工程脚本开发\n- 第二步：仅需一条命令即可完成 SQL 在线部署\n- 第三步：配置在线数据源以导入实时数据\n\n完成这三步后，系统即可提供实时特征，并经过高度优化以实现低延迟和高吞吐量的生产环境需求。\n\n![workflow](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F4paradigm_OpenMLDB_readme_78a4925518f3.png)\n\n为实现“开发即部署”的目标，OpenMLDB 被设计成能够为训练和推理提供一致性的特征。上图展示了 OpenMLDB 的高层架构，由四个关键组件组成：(1) 统一的编程语言 SQL；(2) 用于超低延迟服务的实时 SQL 引擎；(3) 基于 [定制化的 Spark 发行版](https:\u002F\u002Fgithub.com\u002F4paradigm\u002Fspark) 的批处理 SQL 引擎；(4) 统一的执行计划生成器，用于连接批处理和实时 SQL 引擎，以保证一致性。\n\n## 3. 
亮点\n\n**训练与推理的一致性特征**：基于统一的执行计划生成器，能够为离线训练和在线推理生成正确且一致的特征，实现无数据泄露的便捷时间旅行。\n\n**超低延迟的实时特征**：实时 SQL 引擎从零开始构建，并针对时序数据进行了特别优化。它能够在几毫秒内生成实时特征，性能显著优于其他商业内存数据库系统（图 9 和 10，[VLDB 2021 论文](http:\u002F\u002Fvldb.org\u002Fpvldb\u002Fvol14\u002Fp799-chen.pdf)）。\n\n**用 SQL 定义特征**：SQL 被用作定义和管理特征的统一编程语言。此外，SQL 还针对特征工程进行了扩展，例如新增了 `LAST JOIN` 和 `WINDOW UNION` 等语法。\n\n**适用于生产环境的机器学习应用**：生产级特性无缝集成，支持企业级机器学习应用，包括分布式存储与计算、故障恢复、高可用性、无缝扩容、平滑升级、监控以及异构内存支持等。\n\n## 4. 常见问题解答\n\n1. **OpenMLDB 的应用场景有哪些？**\n\n   目前，OpenMLDB 主要定位于机器学习应用的特征平台，其优势在于低延迟的实时特征。它提供了“开发即部署”的能力，能够显著降低机器学习应用的开发和部署成本。此外，OpenMLDB 还内置了一个高效且功能完备的时序数据库，广泛应用于金融、物联网等领域。\n\n2. **OpenMLDB 是如何发展起来的？**\n\n   OpenMLDB 最初源自 [4Paradigm](https:\u002F\u002Fwww.4paradigm.com\u002F)（一家领先的人工智能服务提供商）的商业产品。2021 年，核心团队在该商业产品的基础上进行了抽象、增强，并开发了更符合社区需求的功能，随后将其以开源项目的形式对外发布，旨在帮助更多企业以低成本实现数字化转型。在开源之前，OpenMLDB 已与 4Paradigm 的其他商业产品一起成功部署于数百个实际的机器学习应用场景中。\n\n   需要注意的是，尽管名称相似，但 OpenMLDB 与自 2015 年开始开发的另一个开源项目 MLDB 并无关联。\n\n3. **OpenMLDB 是一个特征存储吗？**\n\n   OpenMLDB 不仅仅是一个为机器学习应用提供特征的特征存储。它能够在几毫秒内生成实时特征。目前市场上大多数特征存储系统主要通过同步离线预计算好的特征来提供在线特征，而无法生成低延迟的实时特征。相比之下，OpenMLDB 利用其优化的在线 SQL 引擎，能够高效地在几毫秒内生成实时特征。\n\n4. **为什么 OpenMLDB 选择使用 SQL 来定义和管理特征？**\n\n   SQL（及其扩展）具有简洁优雅的语法和强大的表达能力。基于 SQL 的编程方式可以大幅降低使用 OpenMLDB 的学习曲线，同时也有助于团队协作和知识共享。\n\n## 5. 下载与安装\n\n- 下载：[GitHub 发布页面](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Freleases)、[国内镜像站](https:\u002F\u002Fwww.openmldb.com\u002Fdownload\u002F)\n- 安装与部署：[英文文档](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fdeploy\u002Finstall_deploy.html)、[中文文档](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fzh\u002Fmain\u002Fdeploy\u002Finstall_deploy.html)\n\n## 6. 快速入门\n\n[OpenMLDB 快速入门](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fquickstart\u002Fopenmldb_quickstart.html)\n\n## 7. 
使用案例\n\n我们正在整理一系列基于 OpenMLDB 的真实应用场景，以展示它如何适配您的业务需求。\n\n| 使用案例                                                    | 使用工具                                                        | 简介                                           |\n| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |\n| [纽约市出租车行程时长预测](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002Flightgbm_demo.html) | OpenMLDB、LightGBM                                           | 这是 Kaggle 上的一个挑战赛，目标是预测纽约市出租车行程的总时长。详细信息请参阅 [此处](https:\u002F\u002Fwww.kaggle.com\u002Fc\u002Fnyc-taxi-trip-duration\u002F)。本案例展示了如何使用 OpenMLDB 和 LightGBM 这两个开源工具，轻松构建端到端的机器学习应用。 |\n| [从 Pulsar 导入实时数据流](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002Fpulsar_connector_demo.html) | OpenMLDB、Pulsar、[OpenMLDB-Pulsar 连接器](https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fio-connectors\u002F#jdbc-openmldb) | Apache Pulsar 是一个云原生的流式处理平台。借助 OpenMLDB-Pulsar 连接器，我们可以将 Pulsar 中的实时数据流无缝导入 OpenMLDB，作为在线数据源。 |\n| [从 Kafka 导入实时数据流](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002Fkafka_connector_demo.html) | OpenMLDB、Kafka、[OpenMLDB-Kafka 连接器](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Ftree\u002Fmain\u002Fextensions\u002Fkafka-connect-jdbc) | Apache Kafka 是一个分布式事件流平台。通过 OpenMLDB-Kafka 连接器，Kafka 中的实时数据流可以被导入 OpenMLDB，作为在线数据源。 |\n| [从 RocketMQ 导入实时数据流](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fzh\u002Fmain\u002Fintegration\u002Fonline_datasources\u002Frocketmq_connector.html) | OpenMLDB、RocketMQ、[OpenMLDB-RocketMQ 连接器](https:\u002F\u002Fgithub.com\u002Fapache\u002Frocketmq-connect\u002Ftree\u002Fmaster\u002Fconnectors\u002Frocketmq-connect-jdbc\u002Frocketmq-connect-jdbc-openmldb) | Apache RocketMQ 是一个分布式消息和流处理平台。利用 OpenMLDB-RocketMQ 连接器，可以高效地将 RocketMQ 中的实时数据流导入 OpenMLDB。 |\n| [在 DolphinScheduler 
中构建端到端机器学习流水线](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002Fdolphinscheduler_task_demo.html) | OpenMLDB、DolphinScheduler、[OpenMLDB 任务插件](https:\u002F\u002Fdolphinscheduler.apache.org\u002Fzh-cn\u002Fdocs\u002Fdev\u002Fuser_doc\u002Fguide\u002Ftask\u002Fopenmldb.html) | 本案例演示如何基于 OpenMLDB 和 DolphinScheduler（一个开源的工作流调度平台）构建端到端的机器学习流水线，包括特征工程、模型训练和部署等环节。 |\n| [广告点击欺诈检测](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002Ftalkingdata_demo.html) | OpenMLDB、XGBoost                                            | 本示例使用 OpenMLDB 和 XGBoost 来检测在线广告中的点击欺诈行为，相关比赛详情请参阅 [此处](https:\u002F\u002Fwww.kaggle.com\u002Fc\u002Ftalkingdata-adtracking-fraud-detection\u002F)。 |\n| [基于 SQL 的机器学习流水线](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002FOpenMLDB_Byzer_taxi.html) | OpenMLDB、Byzer、[OpenMLDB Byzer 插件](https:\u002F\u002Fgithub.com\u002Fbyzer-org\u002Fbyzer-extension\u002Ftree\u002Fmaster\u002Fbyzer-openmldb) | Byzer 是一种低代码的开源编程语言，适用于数据处理、分析和人工智能领域。Byzer 已集成 OpenMLDB，从而支持使用 SQL 构建机器学习流水线。 |\n| [在 Airflow 中构建端到端机器学习流水线](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002Fairflow_provider_demo.html) | OpenMLDB、Airflow、[Airflow OpenMLDB 提供者插件](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Ftree\u002Fmain\u002Fextensions\u002Fairflow-provider-openmldb)、XGBoost | Airflow 是一款流行的工作流管理和调度工具。本示例展示了如何通过提供的插件包，在 Airflow 中高效地调度 OpenMLDB 任务。 |\n| [精准营销](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002Fmain\u002Fuse_case\u002FJD_recommendation_en.html) | OpenMLDB、OneFlow                                            | OneFlow 是一个用户友好、可扩展且高效的深度学习框架。本案例展示了如何使用 OpenMLDB 进行特征工程，再结合 OneFlow 进行模型训练和推理，从而构建一个用于精准营销的应用程序，具体参考 [京东数据平台](https:\u002F\u002Fjdata.jd.com\u002Fhtml\u002Fdetail.html?id=1)。 |\n\n## 8. 
文档\n\n- 中文文档：[https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fzh](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fzh\u002F)\n- 英文文档：[https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002F](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fen\u002F)\n\n## 9. 路线图\n\n请参阅我们的[公开路线图页面](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fprojects\u002F10)。\n\n此外，开发路线图上还有一些重要的功能尚未排期。我们非常欢迎对这些功能的任何反馈。\n\n- 云原生的 OpenMLDB\n- 自动特征提取\n- 基于异构存储和计算资源的优化\n- 面向边缘计算的轻量级 OpenMLDB\n\n## 10. 贡献\n\n我们非常感谢社区的贡献。\n\n- 如果您有兴趣参与贡献，请阅读我们的[贡献指南](CONTRIBUTING.md)以获取更多详细信息。\n- 对于新贡献者，您可以从[标记为 `good first issue` 的问题列表](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fissues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)开始。\n- 如果您有 OpenMLDB 开发经验，或者希望挑战一个可能需要 1-2 周完成的任务，可以查看[标记为 `call-for-contributions` 的问题列表](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fissues?q=is%3Aopen+is%3Aissue+label%3Acall-for-contributions)。\n\n[![在 Gitpod 中打开](https:\u002F\u002Fgitpod.io\u002Fbutton\u002Fopen-in-gitpod.svg)](https:\u002F\u002Fgitpod.io\u002F#https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB)\n\n## 11. 
社区\n\n- 官网：[https:\u002F\u002Fopenmldb.ai\u002Fen](https:\u002F\u002Fopenmldb.ai\u002Fen)\n\n- 邮箱：[contact@openmldb.ai](mailto:contact@openmldb.ai)\n\n- [Slack](https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fopenmldb\u002Fshared_invite\u002Fzt-ozu3llie-K~hn9Ss1GZcFW2~K_L5sMg) \n\n- [GitHub Issues](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fissues) 和 [GitHub Discussions](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fdiscussions)：GitHub Issues 用于报告 bug 和收集新功能需求。GitHub Discussions 则开放用于与 OpenMLDB 相关的任何讨论。\n\n- [博客（英文）](https:\u002F\u002Fopenmldb.medium.com\u002F)\n\n- [博客（中文）](https:\u002F\u002Fwww.zhihu.com\u002Fcolumn\u002Fc_1417199590352916480)\n\n- PMC 维护的公共网盘：[英文](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1T5myyLVe--I9b77Vg0Y8VCYH29DRujUL) |  [中文](https:\u002F\u002Fopenmldb.feishu.cn\u002Fwiki\u002Fspace\u002F7101318128021307396)\n\n- [开发者邮件列表](https:\u002F\u002Fgroups.google.com\u002Fg\u002Fopenmldb-developers)\n\n- 微信群（中文）：\n\n  ![wechat](https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F4paradigm_OpenMLDB_readme_28a013a0d13a.png)  \n\n## 12. 
出版物\n- [OpenMLDB：面向在线机器学习的实时关系型数据特征计算系统](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002F10.1145\u002F3722212.3724446)。周轩赫、周伟、齐立国、张浩、陈迪浩、何炳胜、陆勉、李国梁、吴凡、陈宇强。SIGMOD 2025。\n- [PECJ：具有主动误差补偿的无序数据流上的流式窗口连接](https:\u002F\u002Ftonyskyzeng.github.io\u002Fdownloads\u002FPECJ_TR.pdf)。曾宪志、张书豪、钟洪斌、张浩、陆勉、郑兆和陈宇强。2024 年国际数据管理会议（SIGMOD\u002FPODS）。\n- [机器学习实时特征计算平台的原则与实践](https:\u002F\u002Fcacm.acm.org\u002Fmagazines\u002F2023\u002F7\u002F274061-principles-and-practices-of-real-time-feature-computing-platforms-for-ml\u002Ffulltext)。张浩、杨俊、陈诚、王思琪、李家树和陆勉。2023 年。《ACM 通信》第 66 卷第 7 期（2023 年 7 月），77–78 页。\n- [OpenMLDB 中现代多核处理器上的可扩展在线区间连接](docs\u002Fpaper\u002Fscale_oij_icde2023.pdf)。张浩、曾宪志、张书豪、刘欣怡、陆勉和郑兆。发表于 2023 年 IEEE 第 39 届国际数据工程会议（ICDE）。[[代码]](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Ftree\u002Fstream)\n- [FEBench：实时关系型数据特征提取基准测试](https:\u002F\u002Fgithub.com\u002Fdecis-bench\u002Ffebench\u002Fblob\u002Fmain\u002Freport\u002Ffebench.pdf)。周轩赫、陈诚、李坤义、何炳胜、陆勉、刘乔生、黄伟、李国梁、郑兆和陈宇强。2023 年国际大型数据库会议（VLDB）。[[代码]](https:\u002F\u002Fgithub.com\u002Fdecis-bench\u002Ffebench)。\n- [联邦学习中的时间序列特征提取系统](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3511808.3557176)。王思琪、李家树、陆勉、郑兆、陈宇强和何炳胜。2022 年。发表于第 31 届 ACM 国际信息与知识管理会议（CIKM）论文集中。[[代码]](https:\u002F\u002Fgithub.com\u002F4paradigm\u002Ftsfe)。\n- [利用持久内存优化用于 AI 驱动在线决策增强的内存数据库引擎](http:\u002F\u002Fvldb.org\u002Fpvldb\u002Fvol14\u002Fp799-chen.pdf)。陈诚、杨俊、陆勉、王泰泽、郑兆、陈宇强、戴文渊、何炳胜、黄永辉、吴国安、赵玉萍和鲁道夫。2021 年国际大型数据库会议（VLDB）。\n\n## 13. [用户列表](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fdiscussions\u002F707)\n\n我们正在建立[用户列表](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fdiscussions\u002F707)，以收集社区的反馈。如果您在使用 OpenMLDB 时能提供您的使用场景、评论或其他反馈，我们将不胜感激。我们期待您的声音！","# OpenMLDB 快速上手指南\n\nOpenMLDB 是一个开源的机器学习数据库，专为机器学习应用提供特征平台。它通过统一的 SQL 接口，确保离线训练和在线推理的特征一致性，实现“开发即部署”，显著降低从模型训练到线上服务的成本。\n\n## 1. 
环境准备\n\n在开始之前，请确保您的系统满足以下要求：\n\n*   **操作系统**：推荐 Linux (CentOS 7+, Ubuntu 18.04+) 或 macOS。Windows 用户建议使用 WSL2 或 Docker。\n*   **内存**：至少 4GB RAM（生产环境建议 8GB+）。\n*   **前置依赖**：\n    *   **Docker & Docker Compose**（推荐方式，最简便）\n    *   或者安装 **Java (JDK 8\u002F11)** 和 **Maven**（用于源码编译或独立部署）\n    *   **Python 3.6+** (如需使用 Python SDK)\n\n> **提示**：对于国内开发者，推荐使用 Docker 镜像加速或访问国内镜像站获取资源。\n\n## 2. 安装步骤\n\n### 方式一：使用 Docker 快速启动（推荐）\n\n这是最简单的体验方式，一键启动包含所有组件的单机版 OpenMLDB。\n\n```bash\n# 拉取最新镜像（国内用户可使用阿里云等加速器配置）\ndocker pull 4pdosc\u002Fopenmldb:latest\n\n# 启动 OpenMLDB 容器\ndocker run -it --rm --name openmldb \\\n  -p 9000:9000 -p 6502:6502 \\\n  4pdosc\u002Fopenmldb:latest\n```\n\n启动成功后，您将进入 OpenMLDB 的命令行交互界面 (`openmldb>`).\n\n### 方式二：下载二进制包或源码编译\n\n如果您需要更灵活的部署，可以从官方发布页或国内镜像站下载。\n\n*   **GitHub Release**: [https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Freleases](https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Freleases)\n*   **国内镜像站**: [https:\u002F\u002Fwww.openmldb.com\u002Fdownload\u002F](https:\u002F\u002Fwww.openmldb.com\u002Fdownload\u002F) (推荐国内用户)\n\n下载后解压，并执行以下命令启动单机模式：\n\n```bash\ncd openmldb-\u003Cversion>\nbin\u002Fstart.sh standalone\n```\n\n随后连接客户端：\n```bash\nbin\u002Fopenmldb --zk_cluster=127.0.0.1:2181 --zk_path=\u002Fopenmldb\n```\n\n## 3. 
基本使用\n\nOpenMLDB 的核心工作流分为三步：**导入数据** -> **定义特征 (SQL)** -> **部署服务**。\n\n以下是一个基于出租车行程预测的极简示例。\n\n### 第一步：创建表并导入数据\n\n在 `openmldb>` 命令行中，首先创建一张表并插入测试数据。\n\n```sql\n-- 创建表\nCREATE TABLE taxi_trip_data (\n    trip_id string,\n    vendor_id int,\n    pickup_datetime timestamp,\n    dropoff_datetime timestamp,\n    passenger_count int,\n    trip_distance double,\n    rate_code_id int,\n    store_and_fwd_flag string,\n    pickup_location_id int,\n    dropoff_location_id int,\n    payment_type int,\n    fare_amount double,\n    extra double,\n    mta_tax double,\n    tip_amount double,\n    tolls_amount double,\n    improvement_surcharge double,\n    total_amount double,\n    congestion_surcharge double\n);\n\n-- 插入模拟数据\nINSERT INTO taxi_trip_data VALUES \n('trip_001', 1, 1609459200000, 1609459800000, 1, 2.5, 1, 'N', 100, 200, 1, 10.5, 0.5, 0.5, 2.0, 0.0, 0.3, 13.8, 0.0),\n('trip_002', 1, 1609459300000, 1609460000000, 2, 3.0, 1, 'N', 101, 201, 1, 12.0, 0.5, 0.5, 2.5, 0.0, 0.3, 15.8, 0.0);\n```\n\n### 第二步：使用 SQL 定义特征\n\nOpenMLDB 扩展了 SQL 语法（如 `LAST JOIN` 和 `WINDOW`），用于处理时间序列特征工程。以下示例计算过去 1 小时内同一司机的平均车费。\n\n```sql\n-- 定义特征提取逻辑\nSELECT \n    trip_id,\n    vendor_id,\n    AVG(fare_amount) OVER w AS avg_fare_1h\nFROM taxi_trip_data\nLAST JOIN (\n    SELECT trip_id, vendor_id, fare_amount, pickup_datetime \n    FROM taxi_trip_data\n) t2\nON taxi_trip_data.vendor_id = t2.vendor_id\nWINDOW w AS (\n    PARTITION BY taxi_trip_data.vendor_id \n    ORDER BY taxi_trip_data.pickup_datetime \n    ROWS_RANGE BETWEEN 1h PRECEDING AND CURRENT ROW\n);\n```\n\n### 第三步：部署为在线服务\n\n确认特征逻辑无误后，只需一条命令即可将上述 SQL 部署为在线 API 服务。\n\n```sql\n-- 部署特征服务\nDEPLOY demo_deployment AS \nSELECT \n    trip_id,\n    AVG(fare_amount) OVER w AS avg_fare_1h\nFROM taxi_trip_data\nLAST JOIN (\n    SELECT trip_id, vendor_id, fare_amount, pickup_datetime \n    FROM taxi_trip_data\n) t2\nON taxi_trip_data.vendor_id = t2.vendor_id\nWINDOW w AS (\n    PARTITION BY taxi_trip_data.vendor_id \n    ORDER BY 
taxi_trip_data.pickup_datetime \n    ROWS_RANGE BETWEEN 1h PRECEDING AND CURRENT ROW\n);\n```\n\n部署成功后，您可以通过 REST API 或 JDBC\u002FPython SDK 调用该服务进行实时推理，无需重新编写代码。\n\n### 使用 Python SDK 调用\n\n```python\nimport openmldb\n\n# 初始化客户端\nsdk = openmldb.client()\nsdk.init(\"127.0.0.1:9000\")\n\n# 调用部署好的服务\nresult = sdk.call_procedure(\"demo_deployment\", [\"trip_003\", 1, 1609460000000, ...]) # 填入对应字段\nprint(result)\n```\n\n---\n更多详细文档、高级用例及集群部署方案，请访问 [OpenMLDB 中文文档](https:\u002F\u002Fopenmldb.ai\u002Fdocs\u002Fzh\u002Fmain\u002F)。","某金融科技公司正在构建实时反欺诈系统，数据科学团队需要基于用户历史交易行为快速计算滑动窗口特征（如过去 1 小时的转账次数）以训练风控模型。\n\n### 没有 OpenMLDB 时\n- **特征不一致风险高**：数据科学家使用 Python\u002FPandas 离线计算特征，而工程团队用 Java\u002FC++ 重写逻辑用于线上推理，两套代码极易出现逻辑偏差，导致“训练 - 推理”数据漂移。\n- **开发迭代周期长**：每次调整特征逻辑，都需要算法和工程两组人员分别修改代码并重新验证，沟通成本高，新模型上线往往耗时数周。\n- **实时性能难以保障**：传统的脚本化处理难以满足毫秒级低延迟和高并发要求，必须投入大量精力进行底层架构优化和重构。\n- **数据泄露隐患大**：在手动拼接历史数据时，容易因时间窗口处理不当引入未来数据，导致模型评估虚高但实际失效。\n\n### 使用 OpenMLDB 后\n- **训练推理高度一致**：通过统一的 SQL 定义特征逻辑，OpenMLDB 自动确保离线训练和在线推理使用完全相同的计算引擎，彻底消除特征不一致问题。\n- **实现“开发即部署”**：算法工程师只需编写一次 SQL 脚本即可直接应用于生产环境，无需工程团队重复造轮子，模型迭代效率提升数倍。\n- **原生支持高性能实时计算**：内置的即时特征计算能力天然支持低延迟和高吞吐，轻松应对海量交易请求，无需额外优化底层代码。\n- **自动规避数据泄露**：系统严格基于时间戳管理数据窗口，从机制上杜绝了未来数据混入训练集的风险，保证模型效果真实可靠。\n\nOpenMLDB 通过统一的特征平台打破了算法与工程的壁垒，让实时机器学习应用的开发像写 SQL 一样简单高效。","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002F4paradigm_OpenMLDB_8f914100.png","4paradigm","4Paradigm","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002F4paradigm_945c02ee.jpg","4Paradigm Open Source 
Community",null,"4paradigm.com","https:\u002F\u002Fgithub.com\u002F4paradigm",[81,85,89,93,97,101,105,108,112,116],{"name":82,"color":83,"percentage":84},"C++","#f34b7d",74.4,{"name":86,"color":87,"percentage":88},"Java","#b07219",18.1,{"name":90,"color":91,"percentage":92},"Scala","#c22d40",3.1,{"name":94,"color":95,"percentage":96},"Python","#3572A5",2.9,{"name":98,"color":99,"percentage":100},"Shell","#89e051",0.7,{"name":102,"color":103,"percentage":104},"CMake","#DA3434",0.4,{"name":106,"color":77,"percentage":107},"SWIG",0.2,{"name":109,"color":110,"percentage":111},"Go","#00ADD8",0.1,{"name":113,"color":114,"percentage":115},"Makefile","#427819",0,{"name":117,"color":118,"percentage":115},"LLVM","#185619",1683,325,"2026-04-09T08:28:08","Apache-2.0",4,"Linux","未说明",{"notes":127,"python":125,"dependencies":128},"OpenMLDB 是一个机器学习数据库，核心功能基于 SQL 进行特征工程。它包含定制的 Spark 发行版用于批处理，以及自研的实时 SQL 引擎。虽然提供了 Python SDK（PyPI 上有 openmldb 包），但 README 中未明确指定具体的 Python 版本、内存或 GPU 需求。该系统支持分布式存储和计算，适用于生产环境，可通过 Docker 部署或从源码编译安装。",[129,130,131,132,133,134,135,136],"Spark (定制版)","LightGBM","XGBoost","Apache Kafka","Apache Pulsar","Apache RocketMQ","DolphinScheduler","Byzer",[14,16],[139,140,141,142,143,144,145,146,147,148,149],"feature-engineering","machine-learning","featurestore","in-memory-database","database-for-machine-learning","machine-learning-database","feature-store","feature-extraction","database-for-ai","featureops","mlops","2026-03-27T02:49:30.150509","2026-04-10T15:43:06.321837",[],[154,159,164,169,174,179,184,189,194,199,204,209,214,219,224,229,234,239,244,249],{"id":155,"version":156,"summary_zh":157,"released_at":158},188787,"v0.9.3","## 功能特性\n- 优化嵌套聚合调用的性能 (#4022 @aceforeverd)\n- 为 gcformat 特性签名添加 gcformat_index UDF (#4020 @zhanghaohit)\n- 将 upload-artifact 升级至 v3 版本 (#3984 @nmreadelf)\n- 升级 thirdparty 中的 absl 库 (#3986 @aceforeverd)\n- 添加在线离线一致性校验脚本 (#3974 @oh2024)\n\n## Bug 修复\n- 修复 ast_node_converter.cc 中 ConvertFrameBound 函数将 FrameBound 转换后处理空指针的问题 (#4015 
@Shouren)\n- 修复 Slice 的正确引用计数问题 (#3998 @aceforeverd)\n- 修复在 ARM 架构上构建 Docker 镜像的问题 (#3985 @aceforeverd)\n- 修复 python_quickstart\u002Fdemo.py 在 sqlalchemy 2.0.27 下的兼容性问题 (#3979 @Shouren)\n- 修复 tools\u002Fvalidation 中 CMake 错误“install FILES 给定目录” (#3975 #3976 @Shouren)\n\n## 文档更新\n- 更新 OpenMLDBSdk 参考文档 (#3997 @emmanuel-ferdman)\n- 根据 #3993 更新 Kafka 连接器演示文档 (#3995 @Shouren)\n\n**完整变更日志**: https:\u002F\u002Fgithub.com\u002F4paradigm\u002FOpenMLDB\u002Fcompare\u002Fv0.9.2...v0.9.3","2025-02-21T06:58:05",{"id":160,"version":161,"summary_zh":162,"released_at":163},188788,"v0.9.2","### Bug 修复\n- 修复自托管环境中 OpenMLDB SDK 版本升级问题 (#3962 @aceforeverd)\n- 修复从 JOB_INFO 表查询时应始终处于在线模式的问题 (#3963 @aceforeverd)\n- 修复 UDF 文档生成工作流中将 create-pull-request 操作更新至 v6，并移除已弃用的文件同步文件 (#3964 @Jayaprakash0511)\n- 修复在 CentOS 7 EOL 环境下的构建问题 (#3965 @aceforeverd)\n- 修复 NumPy 版本锁定问题 (#3966 @aceforeverd)","2024-07-27T10:38:00",{"id":165,"version":166,"summary_zh":167,"released_at":168},188789,"v0.9.1","### 功能特性\n- 在 Java SDK 中支持合并 DAG SQL (#3911 @wyl4pd)\n- 支持 Tablet 远程获取用户表 (#3918 @oh2024)\n- 支持设置全局变量 @@execute_mode = 'request' (#3924 @aceforeverd)\n- 支持同步 CRUD 用户操作 (#3928 @oh2024)\n- 支持简单的 ANSI SQL 重写器 (#3934 @aceforeverd)\n- 支持 batchrequest 执行模式 (#3938 @aceforeverd)\n- 支持 isin、array_combine、array_join 和 locate 新的 UDF (#3939、#3945 @aceforeverd，#3940、#3943 @howdb)\n- 支持服务器端授权 (#3941 @oh2024)\n- 支持新的索引类型和 IoT 表 (#3944 @vagetablechicken)\n\n### Bug 修复\n- 修复安装脚本使用了错误的 ZooKeeper 配置文件 (#3901 @greatljn)\n- 修复客户端始终发送认证信息的问题 (#3906 @oh2024)\n- 修复在执行 DROP TABLE 时删除聚合表的问题 (#3908 @vagetablechicken)\n- 修复 SQL 客户端中配置子句里 execute_mode 的检查问题 (#3909 @aceforeverd)\n- 修复 gcformat 格式化时带空格导致连续签名的问题 (#3921 @wyl4pd)\n- 通过移除 S3 依赖并解决 curator 的包冲突来修复打包问题 (#3929 @tobegit3hub)\n- 修复排序键重复以及按整数排序的问题 (#3947 @oh2024)\n- 修复在旧版 glibc 操作系统上执行 checkout 操作的问题 (#3955 @aceforeverd)\n- 修复正确部署 Spark 的 CICD 问题 (#3958 @aceforeverd)","2024-07-18T13:18:26",{"id":170,"version":171,"summary_zh":172,"released_at":173},188790,"v0.9.0","### 重大变更\n- 将 
SQLAlchemy 升级至 2.0.27，不再支持 SQLAlchemy 1.x 版本（#3805 @yht520100）\n- 修正 `first_value` 的语义，使其与 ANSI SQL 兼容（#3861 @aceforeverd）\n- 将执行模式的默认值设置为 `online`，此前在 0.9.0 版本之前为 `offline`（#3862 @aceforeverd）\n- 客户端认证功能已弃用，现可在服务器端启用认证（#3835 #3885 @oh2024）\n\n### 功能特性\n- 支持离线构建 Docker 镜像（#3773 #3787 @QiChenX，#3778 @aceforeverd）\n- 新模块 OpenM(ysq)LDB 支持 MySQL 协议（#3800 @tobegit3hub #3816 #3820 #3823 #3824 #3831 @yangwucheng）\n- SQL 引擎支持 Map 数据类型（#3841 #3847 @aceforeverd）\n- 在线和离线存储均支持 TiDB 后端（#3815 #3839 @yht520100）\n- Kafka 连接器支持字符串格式的时间戳及部分插入操作（#3834 @vagetablechicken）\n- 支持任意 Spark 发行版，移除对 OpenMLDB Spark 的依赖（#3849 @tobegit3hub）\n- 离线模式下支持 INSERT 函数（#3854 @Matagits）\n- 支持使用 SQL 函数生成特征签名（#3877 @wyl4pd）\n- 原生 SQL 中支持请求模式（#3874 @aceforeverd）\n\n### 错误修复\n- 修复使用相同时间戳进行删除时的问题（#3780 @dl239）\n- 修复 Python SDK 预编译 SQL 插入中出现的 `\\x00` 问题（#3788 @yht520100）\n- 修复创建类似 Hive 表时导入 Spark 配置的问题（@3792 @vagetablechicken）\n- 修复离线模式下 SELECT 常量返回空值的问题（#3825 @Matagits）\n- 修复 Notebook Magic 函数中 SHOW 和 LOAD SQL 打印结果集的问题（#3856 @tobegit3hub）\n- 修复不同数据类型 TTL 合并的问题（#3859 @vagetablechicken）\n- 修复 DDL 解析器在获取重复列键时的错误（#3873 @vagetablechicken）\n- 修复调用 zk RegisterName 时 Nameserver 初始化的 bug（#3869 @oh2024）\n\n### 测试\n- 在集成测试中设置 NPROC（#3782 @dl239）\n- YAML 测试框架支持 Map 数据类型（#3765 @aceforeverd）\n- 添加 Go SDK 测试后自动清理表的功能（#3799 @oh2024）\n- 修复 sql_cmd_test 并补充 MakeMergeNode 的实现（#3829 @aceforeverd）\n- 添加查询性能基准测试（#3855 @gaoboal）","2024-04-25T20:02:28",{"id":175,"version":176,"summary_zh":177,"released_at":178},188791,"v0.8.5","### 功能特性\n- 支持 Iceberg 作为离线存储 (#3737 @vagetablechicken)\n- 支持 `UNION ALL` 语句 (#3590 #3653 @aceforeverd)\n- 支持将 `SELECT ... 
INTO OUTFILE` 导出到 OpenMLDB 在线表 (#3616 @tobegit3hub)\n- 离线模式下支持不带 `ORDER BY` 的 `LAST JOIN` 和 `WINDOW` (#3619 @aceforeverd)\n- 支持 `CREATE\u002FALTER\u002FDROP USER` 语句 (#3678 #3745 #3747 @dl239, #3744 @tobegit3hub)\n- 支持在 SDK 中指定 Spark 配置 (#3613 @tobegit3hub)\n- 当服务器端内存使用量超过设定的限制时，`INSERT` 操作会返回失败 (#3631 @dl239)\n- 为 SQL 到 DAG 添加新接口 (#3630 @aceforeverd)\n- 如果部署的 SQL 包含 `LEFT JOIN`，索引将自动创建。(#3667 @aceforeverd)\n- 支持日志的自动删除 (#3704 #3736 #3706 @dl239)\n- 支持 disktable 的 `absandlat\u002Fabsorlat` TTL 类型 (#3716 @dl239)\n- 优化插入失败的错误信息 (#3725 @vagetablechicken)\n- 完善文档 (#3617 #3519 #3690 #3699 @vagetablechicken, #3612 @dl239, #3609 #3672 #3687 @aceforeverd, #3649 #3570 #3569 @TanZiYen @Elliezza, #3665 @DrDub, #3585 #3584 #3579 #3578 #3574 #3573 #3552 #3539 #3488 #3477 #3475 #3586 #3470 #3474 #3568 #3583 #3564 #3764 @TanZiYen, #3688 #3697 #3753 #3721 #3731 #3739 #3754 #3720 #3756 #3762 #3752 #3757 #3719 @Elliezza, #3075 @Elliezza @tobegit3hub, #3710 @tobegit3hub)\n- 其他小功能改进 (#3623 #3636 @aceforeverd, #3651 @tobegit3hub, #3641 #3692 @vagetablechicken, #3582 #3702 @dl239, #3674 @lqy222)\n\n### Bug 修复\n- 执行离线任务会消耗过多的 ZooKeeper 连接 (#3642 @dl239)\n- SDK 在与 ZooKeeper 断开连接后不会自动重连。(#3656 #3668 @vagetablechicken)\n- 如果 `FlexibleRowBuilder` 被设置为 null 值，将会抛出 `NullPointerException` (#3649 @dl239)\n- 如果导入数据中的字符串长度超过 255，可能会抛出 `BufferOverflowException`。(#3729 @ljwh)\n- 如果表中包含大量数据，在执行 `TRUNCATE` 后仍然可能查询到数据 (#3677 @dl239)\n- 删除数据后，仍可通过其他索引检索到数据 (#3693 @dl239)\n- 插入失败时删除脏数据 (#3681 @dl239)\n- 当没有表时，使用 `GetAllDbs` 获取数据库会失败。(#3742 @vagetablechicken)\n- 如果索引名称与之前不同，添加已删除的索引会失败 (#3635 @dl239)\n- 其他小 bug 修复 (#3638 #3654 #3717 #3726 #3743 @vagetablechicken, #3607 #3775 @dl239, #3640 @tobegit3hub, #3686 #3735 #3738 #3740 @aceforeverd, #3759 @yangwucheng)\n\n### 代码重构\n#3666 @vagetablechicken\n\n### Linux AArch64 平台的实验性构建产物\n\n```\n49b691a8a2dc7175823e9fb808e731a8999896cc3ab819cfd32f1ab10c299cde  
openmldb-0.8.5-linux-gnu-aarch64.tar.gz\n```","2024-02-27T13:46:49",{"id":180,"version":181,"summary_zh":182,"released_at":183},188792,"v0.8.4","### 功能特性\n- 支持新的 SQL 语句 `SHOW CREATE TABLE`、`TRUNCATE` 以及 [Alpha] 版本的 `LEFT JOIN` (#3500 #3542 @dl239, #3576 @aceforeverd)\n- 支持在创建表时指定压缩选项 (#3572 @dl239)\n- 优化 Java SDK 的插入性能 (#3525 @dl239)\n- 支持定义不带 `ORDER BY` 子句的窗口 (#3554 @aceforeverd)\n- 支持对 Zookeeper 连接的认证 (#3581 @dl239)\n- [Alpha] 支持在窗口子句上使用 `LAST JOIN` (#3533 #3565 @aceforeverd)\n- 增强监控模块 (#3588 @vagetablechicken)\n- 支持 `datediff` 函数处理 1900 年之前的日期 (#3499 @aceforeverd)\n- 增强诊断工具 (#3559 @vagetablechicken)\n- 在 CLI 启动时检查表的状态 (#3506 @vagetablechicken)\n- 将 brpc 版本升级至 1.6.0 (#3415 #3557 @aceforeverd)\n- 改进文档 (#3517 @dl239, #3520 #3523 @vagetablechicken, #3467 #3468 #3535 #3485 #3478 #3472 #3486 #3487 #3537 #3536 @TanZiYen)\n- 其他小功能改进 (#3587 @vagetablechicken, #3512 @dl239)\n\n### Bug 修复\n- 如果请求模式中的 `WINDOW UNION` 语句里包含 `LAST JOIN`，SQL 编译会失败。(#3493 @aceforeverd)\n- 在某些情况下，删除索引后 Tablet 可能会崩溃 (#3561 @dl239)\n- 维护工具中存在一些语法错误 (#3545 @vagetablechicken)\n- 如果部署 SQL 包含多个数据库，更新 TTL 会失败 (#3503 @dl239)\n- 其他小 bug 修复 (#3518 #3567 #3604 @dl239, #3543 @aceforeverd, #3521 #3580 @vagetablechicken, #3594 #3597 @tobegit3hub)\n\n### 代码重构\n#3547 @aceforeverd","2023-11-21T03:46:41",{"id":185,"version":186,"summary_zh":187,"released_at":188},188793,"v0.8.3","### 功能特性\n- 优化 Java SDK 的性能 (#3445 @dl239)\n- 优化 Spark 连接器的写入性能，并显著降低内存消耗 (#3443 @vagetablechicken)\n- 支持使用自定义 SQL 从 HIVE 加载数据 (#3380 @tobegit3hub)\n- 改进 SDK 和 CLI 的输出信息 (#3384 @vagetablechicken, #3434 #3494 @dl239)\n- 新增内置函数 `json_array_length` 和 `get_json_object` (#3414 #3429 @aceforeverd)\n- 为 `DEPLOYMENT` 语句新增 `RANGE_BIAS` 和 `ROWS_BIAS` 选项 (#3456 @vagetablechicken)\n- 在在线模式下支持 `const` 项目 (#3376 @aceforeverd)\n- 支持指定数据库名称执行 `SHOW DEPLOYMENT` 和 `DROP DEPLOYMENT` (#3353 @emo-coder)\n- 支持为 Spark 继承环境变量 (#3450 @vagetablechicken)\n- 支持在删除表时一并删除 HDFS 文件 (#3369 @tobegit3hub)\n- 增强诊断工具 (#3330 @zhangziheng01233)\n- 增强运维工具 (#3455 @dl239)\n- 
仅当用户设置的超时值大于默认值时，才使用该超时值 (#3484 @vagetablechicken)\n- 从演示 Docker 镜像中移除同步工具 (#3390 @dl239)\n- 改进文档 (#3383 #3392 #3410 @vagetablechicken, #3175 #3447 ##3463 @TanZiYen, #3436 @aceforeverd, #3451 @wangerry, #3453 #3462 #3498 @dl239)\n\n### Bug 修复\n- 执行 `CREATE TABLE LIKE HIVE` 时，即使未找到数据库也会返回成功 (#3379 @emo-coder)\n- 执行 `DROP FUNCTION` 时若发生错误，将导致该函数无法再次被删除。(#3362 @vagetablechicken, #3441 @dl239)\n- `SHOW JOBS` 的结果未按 `id` 排序 (#3371 @emo-coder)\n- 创建系统表失败时，NameServer 会崩溃。(#3432 @dl239)\n- 如果同一张表上的上一条 `CREATE INDEX` 命令尚未完成，`CREATE INDEX` 可能会失败。(#3393 @dl239)\n- 对已删除的索引列执行 `SELECT` 时，结果为空。(#3426 @dl239)\n- 其他一些小 bug 修复 (#3391 #3408 @vagetablechicken, #3386 #3427 #3459 @dl239, #3367 #3495 @aceforeverd)\n\n### 代码重构\n#3397 @emo-coder, #3411 @vagetablechicken, #3435 @aceforeverd, #3473 @lqy222\n\n### 破坏性变更\n- `SQLResultSet` 中的 `GetInternalSchema` 返回类型由原生 Schema 更改为 `com._4paradigm.openmldb.sdk.Schema` #3445\n- 移除已弃用的 TaskManager 配置项 `namenode.uri` #3369","2023-09-16T08:54:41",{"id":190,"version":191,"summary_zh":192,"released_at":193},188794,"v0.8.2","### 功能特性\n- 增强 `delete` 语句 (#3301 #3374 @dl239)\n- 增强 C++ SDK (#3334 @vagetablechicken)\n- 支持 `DROP TABLE\u002FDATABASE` 语句中的新选项 `IF EXISTS` (#3348 @emo-coder)\n- 完善文档 (#3344 #3152 #3355 #3360 @vagetablechicken, #3341 @aceforeverd, #3343 #3372 @dl239, #2968 @selenachenjingxin)\n- 将 Kafka 连接器版本升级至 `10.5.0-SNAPSHOT-0.8.1` (#3365 @vagetablechicken)\n\n### Bug 修复\n- 在某些环境下运行离线任务时，加载外部 UDF 库失败 (#3350 #3359 @vagetablechicken)\n- 使用 Hive 软链接加载数据时失败 (#3349 @vagetablechicken)\n- 插入操作成功，但时间戳无效 (#3313 @aceforeverd)\n- APIServer 中布尔类型未正确打包。(#3366 @vagetablechicken)\n- 当存在重复索引时，表仍能成功创建。(#3306 @dl239)\n\n### 破坏性变更\n- `SHOW TABLE STATUS` 的结果中，字段 `Offline_deep_copy` 将被 `Offline_symbolic_paths` 替代 #3349。","2023-07-20T13:59:58",{"id":195,"version":196,"summary_zh":197,"released_at":198},188795,"v0.8.1","### 功能特性\n- 支持新的 SQL 语句 `ALTER TABLE ... 
ADD\u002FDROP OFFLINE_PATH ...` (#3286 @aceforeverd, #3323 @tobegit3hub)\n- 支持部署涉及已有数据但未定义预聚合的表的 SQL 语句 (#3288 @dl239)\n- 支持新的内置函数 `top_n_value_ratio_cate`、`top_n_key_ratio_cate`、`list_except_by_key` 和 `list_except_by_value` (#3329 @aceforeverd)\n- 新增 SDK API，用于合并多个 SQL 语句进行部署 (#3297 @vagetablechicken)\n- 支持在 Kafka 连接器中映射主题表 (#3282 @vagetablechicken)\n- 支持在 Docker 和 Kubernetes 中部署 Kafka 连接器 (#3276 @tobegit3hub)\n- 支持从 NameServer 获取任务 (#3293 @dl239)\n- 增强诊断工具 (#3224 #3208 #3285 #3258 #3303 @zhangziheng01233)\n- 增强 `SELECT INTO ...` 语句 (#2529 @vagetablechicken)\n- 完善文档 (#3308 @aceforeverd, #3333 @TanZiYen)\n- 其他小功能改进 (#3312 #3314 @vagetablechicken, #3298 @aceforeverd)\n\n### Bug 修复\n- 在某些情况下 SQL 部署会失败 (#3328 @vagetablechicken)\n- 创建 UDF\u002FUDAF 可能会失败，因为默认情况下不存在 `udf` 目录。(#3326 @vagetablechicken)\n- 其他小 bug 修复 (#3281 #3284 @vagetablechicken)\n\n### 代码重构\n#3226 @dl239, #3294 @aceforeverd","2023-06-29T04:07:01",{"id":200,"version":201,"summary_zh":202,"released_at":203},188796,"v0.8.0","### 功能特性\n- 新增同步工具，可自动将在线存储中的数据同步到离线存储中（#3256 @vagetablechicken）\n- 支持新的内置函数 `var_samp`、`var_pop`、`entropy`、`earth_distance`、`nth_value_where` 和 `add_months`（#3046 #3193 @aceforeverd）\n- 支持 openmldb-spark-connector 的批量读取功能（#3070 @tobegit3hub）\n- [Alpha] 支持将 Kubernetes 作为离线引擎的 TaskManager 后端（#3147 #3157 #3185 @tobegit3hub）\n- 支持在 WHERE 子句上使用 LAST JOIN（#3134 @aceforeverd）\n- 支持在 WINDOW UNION 子句中使用 LAST JOIN（#3205 @aceforeverd）\n- 在函数 `round` 中支持将小数位数作为第二个参数（#3221 @aceforeverd）\n- 支持将 Amazon S3 作为离线数据源（#3229 #3261 @tobegit3hub）\n- 新增选项 `SKIP_INDEX_CHECK`，用于在部署 SQL 时跳过索引检查（#3241 @dl239）\n- 支持为离线表使用符号链接路径（#3235 @tobegit3hub）\n- 完善文档（#3104 #2993 @selenachenjingxin，#3113 #3118 #3239 @tobegit3hub，#3150 #3184 #3237 #3255 @aceforeverd，#3160 #3195 #3197 #3223 @lumianph，#3192 #3215 @haseeb-xd，#3201 #3220 #3232 #3236 #3254 @vagetablechicken，#3213 @alexab612，#3189 #3199 @TanZiYen）\n- 其他小幅功能改进（#3115 #3143 #3182 @tobegit3hub，#2818 #3123 @aceforeverd，#3128 #3127 @dl239）\n\n### Bug 修复\n- 在特定情况下执行离线 SQL 
时会出现 curator 冲突问题。（#3090 @tobegit3hub）\n- 如果表名中没有指定数据库，则 `CREATE TABLE ... LIKE HIVE ...` 语句执行会失败。（#3063 @tobegit3hub）\n- 即使 `CREATE TABLE ... LIKE ...` 执行失败，CLI 仍会显示“success”。（#3080 @tobegit3hub）\n- 如果源表不存在，离线模式下执行 `SELECT ... INTO ...` 语句会失败。（#3116 @tobegit3hub）\n- 当对两个 LAST JOIN 使用 `SELECT *` 时，编译会失败。（#3117 @aceforeverd）\n- 如果查询 `JOB_INFO` 失败，同步作业线程将进入无限循环。（#3169 @vagetablechicken）\n- 如果 JOIN 语句中有多个条件，SQL 部署会失败。（#3196 @vagetablechicken）\n- 启用 SparkSQL 后，无法获取已注册的表。（#3057 @tobegit3hub）\n- 其他小幅 Bug 修复（#3097 #3095 @dl239，#3109 #3141 #3162 #3234 @aceforeverd，#3096 #3112 @tobegit3hub，#3231 #3251 @vagetablechicken）\n\n### 代码重构\n#3188 @tobegit3hub","2023-05-12T01:23:02",{"id":205,"version":206,"summary_zh":207,"released_at":208},188797,"v0.7.3","### Features\r\n- Support C\u002FC++ based User-Defined Aggregated Functions (UDAFs) (#2825 @dl239)\r\n- Improve the diagnostic tool to support a few new sub-commands (#3106 @vagetablechicken)\r\n- Add a new script to modify the node environment configuration (#3142 @dl239)\r\n- Change the default value of `max_traverse_cnt` to unlimited to avoid result truncated when performing queries in CLI (#2999 @dl239)\r\n- Improve the documents (#3111 #3093 #3119 @selenachenjingxin, #3105 #3125 #3120 @vagetablechicken, #3114 #3126 @dl239, #3128 @lumianph)\r\n\r\n### Bug Fixes\r\n- The user-provided `SPARK_HOME` does not work in the deployment scripts. (#3085 @zhanghaohit)\r\n- The result of `SELECT timestamp(string_val)` is incorrect at the offline mode. 
(#3088 @tobegit3hub)\r\n\r\n### Code Refactoring\r\n#3122 @haseeb-xd\r\n\r\nNote:\r\nIf the configuration of a tablet has not been updated when upgrading to this new version, the query result still may be truncated as the old version (#2999).","2023-03-22T15:30:56",{"id":210,"version":211,"summary_zh":212,"released_at":213},188798,"v0.7.2","### Features\r\n- [Alpha] Support the new SQL clause `WITH` (#2846 @aceforeverd)\r\n- Support deploying multiple TaskManagers (#3004 @zhanghaohit)\r\n- Support the new built-in functions `std`, `stddev`, `stddev_samp`, `stddev_pop`, `ew_avg` and `drawdown` (#3025 #3032 #3029 @zhanghaohit)\r\n- Add the new configurations to specify the maximum size of RocksDB's log files (#2991 @dl239)\r\n- The `CREATE TABLE ... LIKE PARQUET ...` statement supports a parquet file as the input in the offline mode. (#2996 @tobegit3hub)\r\n- Support showing query results of synchronous jobs in TaskManager (#3034 @vagetablechicken)\r\n- Change the default timeout of synchronous jobs to 30 minutes, and add a corresponding CLI parameter for configuration (#3061 @vagetablechicken)\r\n- Improve the documents (#2938 #2984 #3016 @vagetablechicken, #2958 #2973 #2980 #2987 #2988 #3035 @lumianph, #2990 @lukeAyin, #2997 #3065 @tobegit3hub, #3011 #3027 @dl239, #3020 #3066 #3071 #3074 @aceforeverd, #3033 #3036 @selenachenjingxin)\r\n\r\n### Bug Fixes\r\n- Disk table does not clean the expired data. (#2963 @dl239)\r\n- Incorrect index will be added if there is `LAST JOIN` statement in a deployed SQL. (#2979 @dl239)\r\n- The result is incorrect if a window frame is specified by `EXCLUDE CURRENT_ROW` (#2930 @aceforeverd)\r\n- SQL compiling fails if there is an UDF function in an UDAF expression. (#3018 @aceforeverd)\r\n- Although the return information indicates success, index creation may still fail in some cases. (#3042 @vagetablechicken)\r\n- The `recoverdata` command fails if there are a large number of records in a memory table. 
(#3060 @dl239)\r\n- The `deploy-all` tool deploys the Spark package to local nodes only. (#3022 @zhanghaohit)\r\n- Other minor bug fixes (#2970 #3028 #3026 #3003 #3064 @dl239)\r\n\r\n### Code Refactoring\r\n#2995 #3030 @aceforeverd\r\n\r\nNote:  \r\nWhile we have resolved the overflow issue in the current version of the monitor component #3003, it may still persist when upgrading from an older version.","2023-02-20T14:02:53",{"id":215,"version":216,"summary_zh":217,"released_at":218},188799,"v0.7.1","### Features\r\n- Support data import from Hive using a symbolic link (#2948 @vagetablechicken)\r\n- Support the new SQL statement `CREATE TABLE LIKE` (#2949 @aceforeverd, #2962 @tobegit3hub)\r\n- Improve the non-interactive CLI (#2898 @vagetablechicken)\r\n- Improve the documents (#2904 #2921 #2932 #2942 @selenachenjingxin, #2925 #2928 #2934 #2954 @vagetablechicken, #2924 @dl239, #2945 #2952 @lumianph, #2946 @aceforeverd)\r\n\r\n### Bug Fixes\r\n- The result of `_*_cate` is incorrect. (#2939 @zhanghaohit)\r\n- The deployment of SQL fails if the column name of a major table is a keyword. (#2894 @dl239)\r\n- Tablet may core dump when executing SQLs with disk tables. (#2926 @dl239)\r\n- There is memory leak when writing data into disk tables. (#2943 @dl239)\r\n- The result of `show components` is incorrect in certain cases. (#2940 @dl239)\r\n- Offline jobs execution fails in certain cases because the `Curator` component causes an incompatible issue. 
(#2936 @tobegit3hub)\r\n- Disabling the monitor log (#2953 @dl239)\r\n\r\n### Code Refactoring\r\n#2875 #2937 @dl239","2023-01-14T05:31:36",{"id":220,"version":221,"summary_zh":222,"released_at":223},188800,"v0.7.0","### Features\r\n- Improve the messages and errors when inserting rows (#2834 @vagetablechicken)\r\n- Add a new configuration `max_memory` to limit the memory usage of a tablet (#2815 @dl239)\r\n- Add new maintenance tools `deploy-all` and `start-all` (#2809 @zhanghaohit)\r\n- Insertion returns errors if the value of timestamp field is negative (#2776 @dl239)\r\n- Support the new built-in functions `unix_timestamp`, `pmod`, `datediff`, and `size` (#2843 #2839 #2847 #2864 @zhanghaohit)\r\n- Add the new data type of `ARRAY` in UDFs (#2817 @aceforeverd)\r\n- Improve the documents (#2868 @haseeb-xd, #2878 @Jake-00, #2879 #2876 #2889 #2890 @vagetablechicken, #2881 #2907 #2908 #2897 @selenachenjingxin, #2859 @AdvancedUno, #2893 @lumianph, #2916 @aceforeverd)\r\n\r\n### Bug Fixes\r\n- Window over a subquery(t1 LAST JOIN t2) fails due to column renaming in the subquery. (#2739 @aceforeverd)\r\n- `SHOW JOBLOG` fails under certain circumstances. (#2874 @tobegit3hub)\r\n- `OP` is not deleted if the related table has been dropped. (#2548 @dl239)\r\n- Memory is not released when deleting an index in some cases. (#2806 @dl239)\r\n- Changing a leader to a specified endpoint fails if there are data writing. (#2858 @dl239)\r\n- UDFs do not work for `yarn-client` and `yarn-cluster` in the offline mode. 
(#2802 @tobegit3hub)\r\n- Other minor bug fixes (#2828 #2903 #2906 @vagetablechicken, #2867 #2912 @dl239)\r\n\r\n### Code Refactoring\r\n#2860 @mammar11, #2865 #2863 @vagetablechicken, #2861 #2862 #2871 @Ziy1-Tan","2022-12-31T08:53:37",{"id":225,"version":226,"summary_zh":227,"released_at":228},188801,"v0.6.9","### Features\r\n- Add `pre-upgrade` and `post-upgrade` options in tools for upgrade (#2761 @zhanghaohit)\r\n- Support starting OpenMLDB with daemon (#2833 @vagetablechicken)\r\n- Improve the documents (#2827 @tobegit3hub, #2838 @vagetablechicken)\r\n\r\n### Bug Fixes\r\n- The CLI may crash if executing `show job` without a job ID. (#2771 @aceforeverd)\r\n- `select count(*)` may return an empty result after tablets restart. (#2835 @zhanghaohit)\r\n- A tablet may crash if the output of SQL engine runner is `null`. (#2831 @dl239)","2022-12-08T14:15:18",{"id":230,"version":231,"summary_zh":232,"released_at":233},188802,"v0.6.8","### Features\r\n- Support the `where` clause in the SQL batch engine (#2820 @tobegit3hub)\r\n- Support input and output with the JSON format in APIServer (#2813 @vagetablechicken)\r\n- Improve the documents (#2814 @vagetablechicken)\r\n\r\n### Code Refactoring\r\n#2816 @dl239, #2714 @aceforeverd","2022-11-30T08:22:44",{"id":235,"version":236,"summary_zh":237,"released_at":238},188803,"v0.6.7","### Features\r\n- Support importing and exporting data from\u002Fto Hive (#2778 @vagetablechicken)\r\n- Improve the module of `autofe` (#2777 @vagetablechicken)\r\n- Improve error messages of the `TaskManager` client (#2780 @vagetablechicken)\r\n- Improve the documents (#2781 @zhanghaohit, #2767 #2792 @vagetablechicken, #2805 @selenachenjingxin, #2810 @dl239)\r\n\r\n### Bug Fixes\r\n- Python SDK workflow may fail on MacOS. (#2783 @vagetablechicken, #2788 @dl239)\r\n- There are syntax errors in some log messages. (#2770 @dl239)\r\n- Installing Python SDK requires unnecessary packages.
(#2791 @vagetablechicken)","2022-11-24T15:41:43",{"id":240,"version":241,"summary_zh":242,"released_at":243},188804,"v0.6.6","### Features\r\n- Support the new built-in function `hash64` (#2754 @aceforeverd)\r\n- Improve the documents (#2763 @dl239, #2610 #2606 @michelle-qinqin)\r\n\r\n### Bug Fixes\r\n- `pytest` command is not found in the MacOS virtual machine. (#2765 @tobegit3hub)\r\n- Wrong output schema passes to the `WindowAggRunner`. (#2758 @aceforeverd)\r\n- There is no output when executing the `showopstatus` command if no database is specified (#2773 @dl239)\r\n- The data recovery tool fails in some cases (#2768 @dl239)","2022-11-15T03:39:38",{"id":245,"version":246,"summary_zh":247,"released_at":248},188805,"v0.6.5","### Features\r\n- Optimize the distribution of table partitions (#2701 @jkpjkpjkp)\r\n- Add a new workflow to generate the documents of built-in functions automatically (#2709 #2729 @aceforeverd)\r\n- Support the new SQL statement `show joblog` (#2732 @aceforeverd, #2747 @tobegit3hub)\r\n- Add a warning message for `show table status` (#2738 @zhanghaohit)\r\n- Add a new tool for data recovery and scale-out\u002Fscale-in (#2736 @dl239)\r\n- Improve the documents (#2707 #2727 @aceforeverd, #2718 #2538 #2731 #2752 @vagetablechicken, #2607 #2609 @michelle-qinqin, #2733 @zhanghaohit, #2742 @auula)\r\n\r\n### Bug Fixes\r\n- Incorrect data will be loaded in offline mode if the schema does not match the parquet files.
(#2648 @vagetablechicken)\r\n- Creating index fails if specifying a database in SQL statement (#2720 @dl239)\r\n- `start_time` is not human-readable after submitting a job (#2751 @tobegit3hub)\r\n- Incorrect result of `GetRecordIdxCnt` is produced in `MemTable` (#2719 @jkpjkpjkp)\r\n\r\n### Code Refactoring\r\n#2688 #2717 @vagetablechicken, #2705 #2728 @dl239, #2601 @team-317, #2737 @Jake-00","2022-11-05T10:35:44",{"id":250,"version":251,"summary_zh":252,"released_at":253},188806,"v0.6.4","### Features\r\n- Support a new series of built-in functions `top_n_value_*_cate_where` (#2622 @aceforeverd)\r\n- Support online batch computation and aggregation over a full table (#2620 @zhanghaohit)\r\n- Support `load_mode` and `thread` options for `LOAD DATA` (#2684 @zhanghaohit)\r\n- Improve the documents (#2476 #2486 #2514 #2611 #2693 #2462 @michelle-qinqin, #2695 @lumianph, #2653 @vagetablechicken)\r\n- Support running MacOS compiling jobs in the CICD workflow (#2665 @dl239)\r\n\r\n### Bug Fixes\r\n- Recreating index fails if it has been dropped. (#2440 @dl239)\r\n- The `traverse` method may get duplicate data if there are records with the same `ts` on one `pk` (#2637 @dl239)\r\n- Multiple window union will fail when compiling in batch mode (#2478 @tobegit3hub, #2561 @aceforeverd)\r\n- `select * ...` statement may cause inconsistent output schemas in many cases (#2660 @aceforeverd)\r\n- Result is incorrect if the window is specified as `UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE CURRENT_ROW` (#2674 @aceforeverd)\r\n- Incorrect slice offsets may cause offline jobs to hang (#2687 @aceforeverd)\r\n- Other minor bug fixes (#2669 @dl239, #2683 @zhanghaohit)\r\n\r\n### Code Refactoring\r\n#2541 @dl239, #2573 #2672 @haseeb-xd","2022-10-22T09:32:12"]