[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"similar-LumingSun--ML4DB-paper-list":3,"tool-LumingSun--ML4DB-paper-list":65},[4,23,32,40,49,57],{"id":5,"name":6,"github_repo":7,"description_zh":8,"stars":9,"difficulty_score":10,"last_commit_at":11,"category_tags":12,"status":22},2268,"ML-For-Beginners","microsoft\u002FML-For-Beginners","ML-For-Beginners 是由微软推出的一套系统化机器学习入门课程，旨在帮助零基础用户轻松掌握经典机器学习知识。这套课程将学习路径规划为 12 周，包含 26 节精炼课程和 52 道配套测验，内容涵盖从基础概念到实际应用的完整流程，有效解决了初学者面对庞大知识体系时无从下手、缺乏结构化指导的痛点。\n\n无论是希望转型的开发者、需要补充算法背景的研究人员，还是对人工智能充满好奇的普通爱好者，都能从中受益。课程不仅提供了清晰的理论讲解，还强调动手实践，让用户在循序渐进中建立扎实的技能基础。其独特的亮点在于强大的多语言支持，通过自动化机制提供了包括简体中文在内的 50 多种语言版本，极大地降低了全球不同背景用户的学习门槛。此外，项目采用开源协作模式，社区活跃且内容持续更新，确保学习者能获取前沿且准确的技术资讯。如果你正寻找一条清晰、友好且专业的机器学习入门之路，ML-For-Beginners 将是理想的起点。",84991,2,"2026-04-05T10:45:23",[13,14,15,16,17,18,19,20,21],"图像","数据工具","视频","插件","Agent","其他","语言模型","开发框架","音频","ready",{"id":24,"name":25,"github_repo":26,"description_zh":27,"stars":28,"difficulty_score":29,"last_commit_at":30,"category_tags":31,"status":22},3128,"ragflow","infiniflow\u002Fragflow","RAGFlow 是一款领先的开源检索增强生成（RAG）引擎，旨在为大语言模型构建更精准、可靠的上下文层。它巧妙地将前沿的 RAG 技术与智能体（Agent）能力相结合，不仅支持从各类文档中高效提取知识，还能让模型基于这些知识进行逻辑推理和任务执行。\n\n在大模型应用中，幻觉问题和知识滞后是常见痛点。RAGFlow 通过深度解析复杂文档结构（如表格、图表及混合排版），显著提升了信息检索的准确度，从而有效减少模型“胡编乱造”的现象，确保回答既有据可依又具备时效性。其内置的智能体机制更进一步，使系统不仅能回答问题，还能自主规划步骤解决复杂问题。\n\n这款工具特别适合开发者、企业技术团队以及 AI 研究人员使用。无论是希望快速搭建私有知识库问答系统，还是致力于探索大模型在垂直领域落地的创新者，都能从中受益。RAGFlow 提供了可视化的工作流编排界面和灵活的 API 接口，既降低了非算法背景用户的上手门槛，也满足了专业开发者对系统深度定制的需求。作为基于 Apache 2.0 协议开源的项目，它正成为连接通用大模型与行业专有知识之间的重要桥梁。",77062,3,"2026-04-04T04:44:48",[17,13,20,19,18],{"id":33,"name":34,"github_repo":35,"description_zh":36,"stars":37,"difficulty_score":29,"last_commit_at":38,"category_tags":39,"status":22},519,"PaddleOCR","PaddlePaddle\u002FPaddleOCR","PaddleOCR 是一款基于百度飞桨框架开发的高性能开源光学字符识别工具包。它的核心能力是将图片、PDF 等文档中的文字提取出来，转换成计算机可读取的结构化数据，让机器真正“看懂”图文内容。\n\n面对海量纸质或电子文档，PaddleOCR 
解决了人工录入效率低、数字化成本高的问题。尤其在人工智能领域，它扮演着连接图像与大型语言模型（LLM）的桥梁角色，能将视觉信息直接转化为文本输入，助力智能问答、文档分析等应用场景落地。\n\nPaddleOCR 适合开发者、算法研究人员以及有文档自动化需求的普通用户。其技术优势十分明显：不仅支持全球 100 多种语言的识别，还能在 Windows、Linux、macOS 等多个系统上运行，并灵活适配 CPU、GPU、NPU 等各类硬件。作为一个轻量级且社区活跃的开源项目，PaddleOCR 既能满足快速集成的需求，也能支撑前沿的视觉语言研究，是处理文字识别任务的理想选择。",74913,"2026-04-05T10:44:17",[19,13,20,18],{"id":41,"name":42,"github_repo":43,"description_zh":44,"stars":45,"difficulty_score":46,"last_commit_at":47,"category_tags":48,"status":22},3215,"awesome-machine-learning","josephmisiti\u002Fawesome-machine-learning","awesome-machine-learning 是一份精心整理的机器学习资源清单，汇集了全球优秀的机器学习框架、库和软件工具。面对机器学习领域技术迭代快、资源分散且难以甄选的痛点，这份清单按编程语言（如 Python、C++、Go 等）和应用场景（如计算机视觉、自然语言处理、深度学习等）进行了系统化分类，帮助使用者快速定位高质量项目。\n\n它特别适合开发者、数据科学家及研究人员使用。无论是初学者寻找入门库，还是资深工程师对比不同语言的技术选型，都能从中获得极具价值的参考。此外，清单还延伸提供了免费书籍、在线课程、行业会议、技术博客及线下聚会等丰富资源，构建了从学习到实践的全链路支持体系。\n\n其独特亮点在于严格的维护标准：明确标记已停止维护或长期未更新的项目，确保推荐内容的时效性与可靠性。作为机器学习领域的“导航图”，awesome-machine-learning 以开源协作的方式持续更新，旨在降低技术探索门槛，让每一位从业者都能高效地站在巨人的肩膀上创新。",72149,1,"2026-04-03T21:50:24",[20,18],{"id":50,"name":51,"github_repo":52,"description_zh":53,"stars":54,"difficulty_score":46,"last_commit_at":55,"category_tags":56,"status":22},2234,"scikit-learn","scikit-learn\u002Fscikit-learn","scikit-learn 是一个基于 Python 构建的开源机器学习库，依托于 SciPy、NumPy 等科学计算生态，旨在让机器学习变得简单高效。它提供了一套统一且简洁的接口，涵盖了从数据预处理、特征工程到模型训练、评估及选择的全流程工具，内置了包括线性回归、支持向量机、随机森林、聚类等在内的丰富经典算法。\n\n对于希望快速验证想法或构建原型的数据科学家、研究人员以及 Python 开发者而言，scikit-learn 是不可或缺的基础设施。它有效解决了机器学习入门门槛高、算法实现复杂以及不同模型间调用方式不统一的痛点，让用户无需重复造轮子，只需几行代码即可调用成熟的算法解决分类、回归、聚类等实际问题。\n\n其核心技术亮点在于高度一致的 API 设计风格，所有估算器（Estimator）均遵循相同的调用逻辑，极大地降低了学习成本并提升了代码的可读性与可维护性。此外，它还提供了强大的模型选择与评估工具，如交叉验证和网格搜索，帮助用户系统地优化模型性能。作为一个由全球志愿者共同维护的成熟项目，scikit-learn 以其稳定性、详尽的文档和活跃的社区支持，成为连接理论学习与工业级应用的最",65628,"2026-04-05T10:10:46",[20,18,14],{"id":58,"name":59,"github_repo":60,"description_zh":61,"stars":62,"difficulty_score":10,"last_commit_at":63,"category_tags":64,"status":22},3364,"keras","keras-team\u002Fkeras","Keras 
是一个专为人类设计的深度学习框架，旨在让构建和训练神经网络变得简单直观。它解决了开发者在不同深度学习后端之间切换困难、模型开发效率低以及难以兼顾调试便捷性与运行性能的痛点。\n\n无论是刚入门的学生、专注算法的研究人员，还是需要快速落地产品的工程师，都能通过 Keras 轻松上手。它支持计算机视觉、自然语言处理、音频分析及时间序列预测等多种任务。\n\nKeras 3 的核心亮点在于其独特的“多后端”架构。用户只需编写一套代码，即可灵活选择 TensorFlow、JAX、PyTorch 或 OpenVINO 作为底层运行引擎。这一特性不仅保留了 Keras 一贯的高层易用性，还允许开发者根据需求自由选择：利用 JAX 或 PyTorch 的即时执行模式进行高效调试，或切换至速度最快的后端以获得最高 350% 的性能提升。此外，Keras 具备强大的扩展能力，能无缝从本地笔记本电脑扩展至大规模 GPU 或 TPU 集群，是连接原型开发与生产部署的理想桥梁。",63927,"2026-04-04T15:24:37",[20,14,18],{"id":66,"github_repo":67,"name":68,"description_en":69,"description_zh":70,"ai_summary_zh":70,"readme_en":71,"readme_zh":72,"quickstart_zh":73,"use_case_zh":74,"hero_image_url":75,"owner_login":76,"owner_name":77,"owner_avatar_url":78,"owner_bio":79,"owner_company":80,"owner_location":81,"owner_email":82,"owner_twitter":83,"owner_website":84,"owner_url":85,"languages":83,"stars":86,"forks":87,"last_commit_at":88,"license":83,"difficulty_score":46,"env_os":89,"env_gpu":90,"env_ram":90,"env_deps":91,"category_tags":94,"github_topics":83,"view_count":10,"oss_zip_url":83,"oss_zip_packed_at":83,"status":22,"created_at":95,"updated_at":96,"faqs":97,"releases":98},2291,"LumingSun\u002FML4DB-paper-list","ML4DB-paper-list","Papers for database systems powered by artificial intelligence (machine learning for database)","ML4DB-paper-list 是一个专注于“人工智能赋能数据库系统”领域的开源论文合集。它系统性地整理了将机器学习、深度学习及强化学习应用于数据库内核优化的前沿学术成果，涵盖配置自动调优、索引推荐、查询优化、基数估计、资源管理以及 Text-to-SQL 等核心方向。\n\n面对数据库智能化研究中论文数量激增且分类分散的痛点，该资源库通过精细化的目录结构，将海量文献按功能模块（如物理设计、负载预测、执行引擎等）进行归类，帮助从业者快速定位特定细分领域的高质量研究，有效降低了追踪最新技术动态的门槛。\n\n这份清单特别适合数据库内核开发者、系统架构师以及从事 AI4DB 方向研究的科研人员使用。无论是希望引入自调优机制的工程团队，还是探索新型算法模型的学术界人士，都能从中获取宝贵的理论依据与技术灵感。其独特亮点在于不仅收录了经典的系统性综述与教程，还持续跟进如零样本学习、通用优化模型等新兴趋势，并开放社区协作机制，鼓励全球开发者共同补充完善，是进入智能数据库领域不可或缺的导航图。","\n# [Paper List] AI4DB \u002F ML4DB \u002F Autonomous Database \u002F Self-driving Database \u002F 智能数据库 \u002F 自治数据库\n\nPaper list for database systems with artificial intelligence (machine learning, deep learning, reinforcement learning)\n\nNew 
papers keep coming, remember to **Watch** this repo if you are interested in this topic.\n\n有关机器学习、神经网络、强化学习、自调优技术等在数据库系统中的应用的文章列表，列表持续更新中，记得按赞、分享、打开小铃铛！\n\nWelcome to PR!\n\n欢迎大家补充！\n\nThere are so many papers emerging about [Text-To-SQL](https:\u002F\u002Fgithub.com\u002Feosphoros-ai\u002FAwesome-Text2SQL)! Sadly I'm not an expert with the topic and can not tell the quality of the papers.  \nLooking forward to contributions (PR, comment, discussion) about Text-To-SQL！🫶\n\n如果有同学需要稳定访问GitHub的方式，可以试试这个[链接](https:\u002F\u002Fazabudai.org\u002Fauth\u002Fregister?code=Z4oa)\n\nTable of Contents\n=================\n- [\\[Paper List\\] AI4DB \u002F ML4DB \u002F Autonomous Database \u002F Self-driving Database \u002F 智能数据库 \u002F 自治数据库](#paper-list-ai4db--ml4db--autonomous-database--self-driving-database--智能数据库--自治数据库)\n- [Table of Contents](#table-of-contents)\n  - [System and Tutorial](#system-and-tutorial)\n    - [Training Data Collection](#training-data-collection)\n  - [Data Access](#data-access)\n    - [Configuration Tuning](#configuration-tuning)\n    - [Physical Design](#physical-design)\n      - [Learned structure](#learned-structure)\n      - [Index](#index)\n        - [Index Structure](#index-structure)\n        - [LSM-tree related](#lsm-tree-related)\n        - [Index Recommendation](#index-recommendation)\n    - [Materialized View](#materialized-view)\n      - [Schema \\& Partition](#schema--partition)\n    - [Cache related](#cache-related)\n  - [Workload](#workload)\n    - [Workload generation](#workload-generation)\n    - [Resource Management and Auto-scaling](#resource-management-and-auto-scaling)\n    - [Performance Diagnosis and Modeling](#performance-diagnosis-and-modeling)\n    - [Workload Shift Detection](#workload-shift-detection)\n    - [Workload Characterization \\& Forecasting](#workload-characterization--forecasting)\n  - [Query Optimization](#query-optimization)\n    - [Query Rewrite](#query-rewrite)\n    - [Cardinality 
Estimation](#cardinality-estimation)\n      - [Data-based](#data-based)\n      - [Query-based](#query-based)\n    - [Cost Estimation](#cost-estimation)\n      - [Single Query](#single-query)\n      - [Concurrent](#concurrent)\n    - [Join Optimization](#join-optimization)\n    - [Parametric Query Optimization](#parametric-query-optimization)\n      - [Foundational Theory](#foundational-theory)\n      - [Engineering & Data-driven PQO](#engineering--data-driven-pqo)\n      - [ML-based PQO & Robust Query Optimization](#ml-based-pqo--robust-query-optimization)\n    - [Query Plan](#query-plan)\n  - [Query Execution](#query-execution)\n    - [Sort](#sort)\n    - [Join](#join)\n    - [Adaptive Query Processing](#adaptive-query-processing)\n    - [Approximate Query Processing](#approximate-query-processing)\n    - [Sheduling](#sheduling)\n  - [Text-to-SQL](#text-to-sql)\n  - [SQL Related](#sql-related)\n  \n  - [Stargazers over time](#stargazers-over-time)\n\n\n## System and Tutorial\n* ***SageDB: A Learned Database System (CIDR 2019)***\n* Database Learning: Toward a Database that Becomes Smarter Every Time (SIGMOD 2017)\n* Self-Driving Database Management Systems (CIDR 2017)\n* Self-Driving : From General Purpose to Specialized DBMSs (Phd@PVLDB 2018)  \n* Active Learning for ML Enhanced Database Systems (SIGMOD 2020)\n* Database Meets Artificial Intelligence: A Survey (TKDE 2020)\n* Self-driving database systems: a conceptual approach (Distributed and Parallel Databases 2020)\n* One Model to Rule them All: Towards Zero-Shot Learning for Databases (arXiv 2021)\n* UDO: Universal Database Optimization using Reinforcement Learning (arXiv 2021) [Source Code](https:\u002F\u002Fgithub.com\u002Fjxiw\u002FUDO)\n* Towards a Benchmark for Learned Systems (SMDB workshop 2021)\n* A Unified Transferable Model for ML-Enhanced DBMS [Vision] (arXiv 2021)\n* AI Meets Database: AI4DB and DB4AI (SIGMOD 2021)\n* Expand your Training Limits! 
Generating Training Data for ML-based Data Management (SIGMOD 2021)\n* MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems (SIGMOD 2021)\n* Towards instance-optimized data systems (VLDB 2021 from Tim Kraska)\n* Make Your Database System Dream of Electric Sheep: Towards Self-Driving Operation (VLDB 2021 from Andy Pavlo)\n* openGauss: An Autonomous Database System (VLDB 2021 from Guoliang Li)\n* Experience-Enhanced Learning: One Size Still does not Fit All in Automatic Database Management (arXiv 2021)\n* Baihe: SysML Framework for AI-driven Databases (arXiv 2022)\n* Survey on Learnable Databases: A Machine Learning Perspective (Big Data Research 2021)\n* Database Optimizers in the Era of Learning (ICDE 2022)\n* Machine Learning for Data Management: A System View (ICDE 2022)\n* Tastes Great! Less Filling! High Performance and Accurate Training Data Collection for Self-Driving Database Management Systems (SIGMOD 2022)\n* SAM: Database Generation from Query Workload with Supervised Autoregressive Model (SIGMOD 2022) [Source code](https:\u002F\u002Fgithub.com\u002FJamesyang2333\u002FSAM)\n* Detect, Distill and Update: Learned DB Systems Facing Out of Distribution Data (SIGMOD 2023) [Source code](https:\u002F\u002Fgithub.com\u002Fmeghdadk\u002FDDUp)\n* SageDB: An Instance-Optimized Data Analytics System (VLDB 2023)\n* Towards Building Autonomous Data Services on Azure (SIGMOD-Companion ’23)\n* Database Gyms (CIDR 2023)\n* Check Out the Big Brain on BRAD: Simplifying Cloud Data Processing with Learned Automated Data Meshes (VLDB 2023)\n* Machine Unlearning in Learned Databases: An Experimental Analysis (SIGMOD 2024) [Source code](https:\u002F\u002Fgithub.com\u002Fmeghdadk\u002FDB_unlearning)\n* PilotScope: Steering Databases with Machine Learning Drivers (VLDB 2024) [Source code](https:\u002F\u002Fgithub.com\u002Falibaba\u002Fpilotscope)\n* Machine Learning for Databases: Foundations, Paradigms, and Open problems (SIGMOD 2024)\n* NeurDB: An 
AI-powered Autonomous Data System (arXiv 2024)\n* GaussML: An End-to-End In-Database Machine Learning System (ICDE 2024)\n* NeurDB: On the Design and Implementation of an AI-powered Autonomous Database (arXiv 2024)\n* LLM for Data Management (VLDB 2024)\n* Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD (VLDB 2024)\n* The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-Actions (VLDB 2024)\n* NeurBench: Benchmarking Learned Database Components with Data and Workload Drift Modeling (arXiv 2025)\n* GaussMaster: An LLM-based Database Copilot System (arXiv 2025)\n* D-Bot: An LLM-Powered DBA Copilot (SIGMOD-Companion 2025)\n* Does A Fish Need a Bicycle? The Case for On-Chip NPUs in DBMS (CIDR 2026)\n### Training Data Collection\n* Expand your Training Limits! Generating Training Data for ML-based Data Management (SIGMOD 2021)\n* DataFarm: Farm Your ML-based Query Optimizer's Food! - Human-Guided Training Data Generation -. (CIDR 2022)\n* Farming Your ML-based Query Optimizer's Food. (ICDE 2022, **best demo award**)\n* Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management Systems (VLDB 2024)\n* Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability (ICML 2024)\n\n## Data Access\n### Configuration Tuning\n* SARD: A statistical approach for ranking database tuning parameters (ICDEW, 2008)\n* Regularized Cost-Model Oblivious Database Tuning with Reinforcement Learning （2016）\n* Automatic Database Management System Tuning Through Large-scale Machine Learning (SIGMOD 2017)\n* The Case for Automatic Database Administration using Deep Reinforcement Learning ( 2018 ArXiv)\n* An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning (SIGMOD 2019)\n* External vs. 
Internal : An Essay on Machine Learning Agents for Autonomous Database Management Systems\n* QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning (VLDB 2019)\n* Optimizing Databases by Learning Hidden Parameters of Solid State Drives (VLDB 2019)\n* iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases (VLDB 2019)\n* Black or White? How to Develop an AutoTuner for Memory-based Analytics (SIGMOD 2020)\n* Learning Efficient Parameter Server Synchronization Policies for Distributed SGD (ICLR 2020)\n* Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs (HotStorage 2020)\n* Dynamic Configuration Tuning of Working Database Management Systems (LifeTech 2020)\n* Adaptive Multi-Model Reinforcement Learning for Online Database Tuning (EDBT 2021)\n* An inquiry into machine learning-based automatic configuration tuning services on real-world database management systems (VLDB 2021)\n* The Case for NLP-Enhanced Database Tuning: Towards Tuning Tools that \"Read the Manual\" (VLDB 2021)\n* CGPTuner: a Contextual Gaussian Process Bandit Approach for the Automatic Tuning of IT Configurations Under Varying Workload Conditions (VLDB 2021)\n* ResTune: Resource Oriented Tuning Boosted by Meta-Learning for Cloud Databases (SIGMOD 2021)\n* KML: Using Machine Learning to Improve Storage Systems (arXiv 2021)\n* Database Tuning using Natural Language Processing (SIGMOD Record 2021)\n* Towards Dynamic and Safe Configuration Tuning for Cloud Databases (SIGMOD 2022)\n* Automatic Performance Tuning for Distributed Data Stream Processing Systems (ICDE 2022)\n* Adaptive Code Learning for Spark Configuration Tuning (ICDE 2022)\n* DB-BERT: A Database Tuning Tool that \"Reads the Manual\" (SIGMOD 2022)\n* HUNTER: An Online Cloud Database Hybrid Tuning System for Personalized Requirements (SIGMOD 2022)\n* LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications (SIGMOD 2022)\n* Facilitating Database Tuning 
with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation (VLDB 2022)\n* LlamaTune: Sample-Efficient DBMS Configuration Tuning (VLDB 2022)\n* BLUTune: Query-informed Multi-stage IBM Db2 Tuning via ML (CIKM 2022)\n* A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning (arXiv 2023)\n* Automatic Database Knob Tuning: A Survey (TKDE)\n* Deep learning based Auto Tuning for Database Management System (arXiv 2023)\n* KeenTune: Automated Tuning Tool for Cloud Application Performance Testing and Optimization (ISSTA 2023)\n* ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems (arXiv 2023)\n* GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization (arXiv 2023)\n* An Efficient Transfer Learning Based Configuration Adviser for Database Tuning (VLDB 2024)\n* DB-GPT: Large Language Model Meets Database (DSE 2024)\n* A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning (arXiv 2024)\n* TIE: Fast Experiment-driven ML-based Configuration Tuning for In-memory Data Analytics (IEEE Transactions on Computers)\n* VDTuner: Automated Performance Tuning for Vector Data Management Systems (ICDE 2024) [Source code](https:\u002F\u002Fgithub.com\u002Ftiannuo-yang\u002FVDTuner)\n* Nautilus: A Benchmarking Platform for DBMS Knob Tuning (DEEM 2024) [Source code](https:\u002F\u002Fgithub.com\u002Fuw-mad-dash\u002Fnautilus)\n* Is Large Language Model Good at Database Knob Tuning? 
A Comprehensive Experimental Evaluation (arXiv 2024)\n* CTuner: Automatic NoSQL Database Tuning with Causal Reinforcement Learning (Internetware 2024)\n* KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning (arXiv 2024)\n* KnobCF: Uncertainty-aware Knob Tuning (arXiv 2024)\n* Db2une: Tuning Under Pressure via Deep Learning (VLDB 2024)\n* λ-Tune: Harnessing Large Language Models for Automated Database System Tuning (arXiv 2024)\n* LOFTune: A Low-Overhead and Flexible Approach for Spark SQL Configuration Tuning (TKDE 2025)\n* EAST: An Interpretable Knob Estimation System for Cloud Database (ICDE 2025)\n* AQETuner: Reliable Query-level Configuration Tuning for Analytical Query Engines (arXiv 2025)\n* Automated Database Tuning vs. Human-Based Tuning in a Simulated Stressful Work Environment: A Demonstration of the Database Gym (SIGMOD 2025)\n* Rabbit: Retrieval-Augmented Generation Enables Better Automatic Database Knob Tuning (ICDE 2025)\n* BitTuner: A Toolbox for Automatically Configuring Learned Data Compressors (ICDE 2025)\n* AgentTune: An Agent-Based Large Language Model Framework for Database Knob Tuning (SIGMOD 2025)\n* L2T-Tune: LLM-Guided Hybrid Database Tuning with LHS and TD3 (arXiv 2025)\n\n### Physical Design\n* Tiresias: Enabling Predictive Autonomous Storage and Indexing (VLDB 2022)\n* Hyper: Hybrid Physical Design Advisor with Multi-agent Reinforcement Learning (ICDE 2025)\n#### Learned structure\n* Stacked Filters: Learning to Filter by Structure (VLDB 2021)\n* LEA: A Learned Encoding Advisor for Column Stores (aiDM 2021)\n* Learning over Sets for Databases (EDBT 2024)\n* A Distributed Learned Hash Table (arXiv 2025)\n#### Index\n##### Index Structure\n* Learning to hash for indexing big data - A survey (2016)\n* The Case for Learned Index Structures (SIGMOD 2018)\n* A-Tree: A Bounded Approximate Index Structure (2017)\n* FITing-Tree: A 
Data-aware Index Structure (SIGMOD 2019)\n* Learned Indexes for Dynamic Workloads (2019)\n* SOSD: A Benchmark for Learned Indexes (2019)\n* Learning Multi-dimensional Indexes (2019)\n* ALEX: An Updatable Adaptive Learned Index (SIGMOD 2020)\n* Effectively Learning Spatial Indices (VLDB 2020) [GitHub Link](https:\u002F\u002Fgithub.com\u002FLiuguanli\u002FRSMI)\n* Stable Learned Bloom Filters for Data Streams (VLDB 2020)\n* START — Self-Tuning Adaptive Radix Tree (ICDEW 2020)\n* Learned Data Structures (2020)\n* RadixSpline: a single-pass learned index (aiDM2020)\n* The ML-Index: A Multidimensional, Learned Index for Point, Range, and Nearest-Neighbor Queries (EDBT 2020)\n* The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds (VLDB 2020)\n* A Tutorial on Learned Multi-dimensional Indexes (SIGSPATIAL 2020)\n* Why Are Learned Indexes So Effective? (ICML 2020)\n* Learned Indexes for a Google-scale Disk-based Database (arXiv 2020)\n* SIndex: A Scalable Learned Index for String Keys （APSys 2020)\n* XIndex: A Scalable Learned Index for Multicore Data Storage （PPoPP 2020)\n* Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads (VLDB 2021)\n* A Lazy Approach for Efficient Index Learning (2021)\n* The RLR-Tree: A Reinforcement Learning Based R-Tree for Spatial Data (arXiv 2021)\n* Spatial Interpolation-based Learned Index for Range and kNN Queries (arXiv 2021)\n* APEX: A High-Performance Learned Index on Persistent Memory (arXiv 2021)\n* RUSLI: Real-time Updatable Spline Learned Index (aiDM 2021)\n* PLEX: Towards Practical Learned Indexing (arXiv 2021)\n* SPRIG: A Learned Spatial Index for Range and kNN Queries (SSTD 2021)\n* Benchmarking Learned Indexes (VLDB 2021)\n* Updatable Learned Index with Precise Positions (VLDB 2021)\n* The Case for Learned In-Memory Joins (arXiv 2021)\n* Bounding the Last Mile: Efficient Learned String Indexing (arXiv 2021)\n* FINEdex: A Fine-grained Learned Index Scheme for Scalable 
and Concurrent Memory Systems (VLDB 2022)\n* The next 50 Years in Database Indexing or: The Case for Automatically Generated Index Structures (VLDB 2022)\n* The Concurrent Learned Indexes for Multicore Data Storage (Transactions on Storage 2022)\n* TONE: cutting tail-latency in learned indexes (CHEOPS 2022)\n* A Learned Index for Exact Similarity Search in Metric Spaces (arXiv 2022)\n* RW-tree: A Learned Workload-aware Framework for R-tree Construction (ICDE 2022)\n* The \"AI+R\"-tree: An Instance-optimized R-tree (MDM 2022)\n* LHI: A Learned Hamming Space Index Framework for Efficient Similarity Search (SIGMOD 2022)\n* Entropy Learned Hashing: 10X Faster Hashing with Controllable Uniformity (SIGMOD 2022)\n* Tuning Hierarchical Learned Indexes on Disk and Beyond (SIGMOD 2022)\n* FLIRT: A Fast Learned Index for Rolling Time frames (EDBT 2022)\n* Testing the Robustness of Learned Index Structures (arXiv 2022)\n* The Case for ML-Enhanced High-Dimensional Indexes (2022)\n* PLIN: A Persistent Learned Index for Non-Volatile Memory with High Performance and Instant Recovery (VLDB 2023)\n* A Data-aware Learned Index Scheme for Efficient Writes (ICPP 2022)\n* Frequency Estimation in Data Streams: Learning the Optimal Hashing Scheme (TKDE)\n* FILM: A Fully Learned Index for Larger-Than-Memory Databases (VLDB 2023)\n* WISK: A Workload-aware Learned Index for Spatial Keyword Queries (arXiv 2023)\n* Efficiently Learning Spatial Indices (ICDE 2023)\n* Cutting Learned Index into Pieces: An In-depth Inquiry into Updatable Learned Indexes (ICDE 2023)\n* DILI: A Distribution-Driven Learned Index (arXiv 2023)\n* Learned Index: A Comprehensive Experimental Evaluation (VLDB 2023)\n* LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves (Extended Version) (arXiv 2023)\n* One stone, two birds: A lightweight multidimensional learned index with cardinality support (arXiv 2023)\n* A 
Simple Yet High-Performing On-disk Learned Index: Can We Have Our Cake and Eat it Too? (arXiv 2023)\n* Fast Partitioned Learned Bloom Filter (arXiv 2023)\n* Efficient Index Learning via Model Reuse and Fine-tuning (ICDEW 2023)\n* COAX: Correlation-Aware Indexing (ICDEW 2023)\n* Learned Index with Dynamic ε (openreview 2023)\n* Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads (arXiv 2023)\n* SALI: A Scalable Adaptive Learned Index Framework based on Probability Models (SIGMOD 2024)\n* Sieve: A Learned Data-Skipping Index for Data Analytics (VLDB 2023)\n* Demonstrating Waffle: A Self-driving Grid Index (VLDB Demo 2023)\n* Can LSH (Locality-Sensitive Hashing) Be Replaced by Neural Network? (arXiv 2023)\n* Workload-aware and Learned Z-Indexes (arXiv 2023)\n* AirIndex: Versatile Index Tuning Through Data and Storage (SIGMOD 2024)\n* A Fast Learned Key-Value Store for Concurrent and Distributed Systems (TKDE 2023)\n* When Learned Indexes Meet Persistent Memory: The Analysis and the Optimization (TKDE 2023)\n* PLATON: Top-down R-tree Packing with Learned Partition Policy (PACMMOD 2023)\n* A Learned Cuckoo Filter for Approximate Membership Queries over Variable-sized Sliding Windows on Data Streams (PACMMOD 2023)\n* WIPE: a Write-Optimized Learned Index for Persistent Memory (TACO 2023)\n* Algorithmic Complexity Attacks on Dynamic Learned Indexes (VLDB 2024)\n* A Fully On-disk Updatable Learned Index (ICDE 2024)\n* Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage Engines (SIGMOD 2024)\n* AStore: Uniformed Adaptive Learned Index and Cache for RDMA-enabled Key-Value Store (TKDE 2024)\n* Cabin: A Compressed Adaptive Binned Scan Index (SIGMOD 2024)\n* SWIX: A Memory-efficient Sliding Window Learned Index (SIGMOD 2024)\n* A Survey of Learned Indexes 
for the Multi-dimensional Space (arXiv 2024)\n* Hyper: A High-Performance and Memory-Efficient Learned Index via Hybrid Construction (Proceedings of the ACM on Management of Data 2024)\n* Predicate caching: Query-driven secondary indexing for cloud data warehouses (SIGMOD 2024)\n* Can Learned Indexes be Built Efficiently? A Deep Dive into Sampling Trade-offs (SIGMOD 2024)\n* Making In-Memory Learned Indexes Efficient on Disk (SIGMOD 2024)\n* LeaderKV: Improving Read Performance of KV Stores via Learned Index and Decoupled KV Table (ICDE 2024)\n* Chameleon: Towards Update-Efficient Learned Indexing for Locally Skewed Data (ICDE 2024)\n* Revisiting Learned Index with Byte-addressable Persistent Storage (ICPP 2024)\n* UpLIF: An Updatable Self-Tuning Learned Index Framework (arXiv 2024)\n* LITS: An Optimized Learned Index for Strings (VLDB 2024)\n* Evaluating Learned Indexes for External-Memory Joins (arXiv 2024)\n* Learned Indexes with Distribution Smoothing via Virtual Points (arXiv 2024)\n* VEGA: An Active-tuning Learned Index with Group-Wise Learning Granularity (SIGMOD 2025)\n* ALT-Index: A Hybrid Learned Index for Concurrent Memory Database Systems (ICDE 2025)\n* BMTree: Designing, Learning, and Updating Piecewise Space-Filling Curves for Multi-Dimensional Data Indexing (arXiv 2025)\n* LIOF: Make the Learned Index Learn Faster With Higher Accuracy (TKDE 2025)\n* TELEX: Two-Level Learned Index for Rich Queries on Enclave-based Blockchain Systems (TKDE 2025)\n* Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis (arXiv 2025)\n* Learned Indexes From the One-dimensional to the Multi-dimensional Spaces: Challenges, Techniques, and Opportunities (SIGMOD 2025)\n* Evaluating Learned Indexes in LSM-tree Systems: Benchmarks, Insights and Design Choices (arXiv 2025)\n* leSAX Index: A Learned SAX Representation Index for Time Series 
Similarity Search (ICDE 2025)\n* High Performance or Low Memory? An Updatable Learned Index Framework for Time-Space Tradeoff (SIGMOD 2025)\n* Understanding Robustness Issues of Updatable Learned Indexes: [Experiments & Analysis] (SIGMOD 2025)\n* LETIndex: A Secure Learned Index with TEE (VLDB 2025)\n* Benchmarking RL-Enhanced Spatial Indices Against Traditional, Advanced, and Learned Counterparts (arXiv 2025)\n##### LSM-tree related\n* Leaper: A Learned Prefetcher for Cache Invalidation in LSM-tree based Storage Engines (VLDB 2020)\n* From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees (OSDI 2020)\n* TridentKV: A Read-Optimized LSM-Tree Based KV Store via Adaptive Indexing and Space-Efficient Partitioning (TPDS 2022)\n* LearnedKV: Integrating LSM and Learned Index for Superior Performance on SSD (arXiv 2024)\n* CAMAL: Optimizing LSM-trees via Active Learning (arXiv 2024)\n* DobLIX: A Dual-Objective Learned Index for Log-Structured Merge Trees (arXiv 2025)\n* Learned LSM-trees: Two Approaches Using Learned Bloom Filters (arXiv 2025)\n##### Index Recommendation\n* Index Selection in a Self-Adaptive Data Base Management System (SIGMOD 1976)\n* AutoAdmin 'What-if' Index Analysis Utility (SIGMOD 1998)\n* Self-Tuning Database Systems: A Decade of Progress (VLDB 2007)\n* AI Meets AI: Leveraging Query Executions to Improve Index Recommendations (SIGMOD 2019) \n* Automated Database Indexing using Model-free Reinforcement Learning (ICAPS 2020)\n* DRLindex: deep reinforcement learning index advisor for a cluster database (2020 Symposium on International Database Engineering & Applications)\n* Magic mirror in my hand, which is the best in the land? 
An Experimental Evaluation of Index Selection Algorithms (VLDB 2020) [GitHub Link](https:\u002F\u002Fgithub.com\u002Fhyrise\u002Findex_selection_evaluation)\n* An Index Advisor Using Deep Reinforcement Learning (CIKM 2020) [GitHub Link](https:\u002F\u002Fgithub.com\u002Frmitbggroup\u002FIndexAdvisor)\n* DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees (ICDE 2021)\n* MANTIS: Multiple Type and Attribute Index Selection using Deep Reinforcement Learning (IDEAS 2021)\n* AutoIndex: An Incremental Index Management System for Dynamic Workloads (ICDE 2022) [GitHub Link](https:\u002F\u002Fgithub.com\u002Fzhouxh19\u002FAutoIndex)\n* SWIRL: Selection of Workload-aware Indexes using Reinforcement Learning (EDBT 2022) [GitHub Link](https:\u002F\u002Fgithub.com\u002Fhyrise\u002Frl_index_selection)\n* Indexer++: workload-aware online index tuning with transformers and reinforcement learning (ACM SIGAPP SAC, 2022)\n* Budget-aware Index Tuning with Reinforcement Learning (SIGMOD 2022)\n* ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning (SIGMOD 2022)\n* DISTILL: Low-Overhead Data-Driven Techniques for Filtering and Costing Indexes for Scalable Index Tuning (VLDB 2022)\n* SmartIndex: An Index Advisor with Learned Cost Estimator (CIKM 2022)\n* HMAB: self-driving hierarchy of bandits for integrated physical database design tuning (VLDB 2022)\n* Learned Index Benefits: Machine Learning Based Index Performance Estimation (VLDB 2023) [GitHub Link](https:\u002F\u002Fgithub.com\u002FJC-Shi\u002FLearned-Index-Benefits)\n* AIM: A practical approach to automated index management for SQL databases (ICDE 2023)\n* Updatable Learned Indexes Meet Disk-Resident DBMS - From Evaluations to Design Choices (SIGMOD 2023)\n* Index Tuning with Machine Learning on Quantum Computers for Large-Scale Database Applications (AIDB@VLDB 2023)\n* A Data-Driven Index Recommendation System for Slow Queries (CIKM 2023)\n* ML-Powered 
Index Tuning: An Overview of Recent Progress and Open Challenges (arXiv 2023)\n* Robustness of Updatable Learning-based Index Advisors against Poisoning Attack (SIGMOD 2024)\n* Refactoring Index Tuning Process with Benefit Estimation (VLDB 2024) [GitHub Link](https:\u002F\u002Fgithub.com\u002FHIT-DB-Group\u002FRIBE)\n* Leveraging Dynamic and Heterogeneous Workload Knowledge to Boost the Performance of Index Advisors (VLDB 2024) [GitHub Link](https:\u002F\u002Fgithub.com\u002FXMUDM\u002FBALANCE)\n* MFIX: An Efficient and Reliable Index Advisor via Multi-Fidelity Bayesian Optimization (ICDE 2024)\n* TRAP: Tailored Robustness Assessment for Index Advisors via Adversarial Perturbation (ICDE 2024)\n* Online Index Recommendation for Slow Queries (ICDE 2024)\n* Automatic Index Tuning: A Survey (TKDE)\n* Breaking It Down: An In-Depth Study of Index Advisors (VLDB 2024)\n* Can Uncertainty Quantification Enable Better Learning-based Index Tuning? (arXiv 2024)\n* Hybrid Cost Modeling for Reducing Query Performance Regression in Index Tuning (TKDE 2024)\n* A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning Enhanced Approach (arXiv 2025)\n* LLMIdxAdvis: Resource-Efficient Index Advisor Utilizing Large Language Model (arXiv 2025)\n* Guiding Index Tuning Exploration with Potential Estimation (ICDE 2025)\n* AutoIndexer: A Reinforcement Learning-Enhanced Index Advisor Towards Scaling Workloads (arXiv 2025)\n* Rainbow: Risk-aware Index Benefit Estimation Facing Out Of Distribution Workloads (SIGMOD 2025)\n* Automatic Indexing in Oracle (VLDB 2025)\n\n### Materialized View\n* Automatic View Generation with Deep Learning and Reinforcement Learning (ICDE 2020)\n* An Autonomous Materialized View Management System with Deep Reinforcement Learning (ICDE 2021)\n* A Technical Report on Dynamic Materialized View Management using Graph Neural Network\n* HMAB: self-driving hierarchy of bandits for integrated physical database design tuning (VLDB 2022)\n* AutoView: An Autonomous 
Materialized View Management System with Encoder-Reducer (TKDE 2022)\n* Dynamic Materialized View Management using Graph Neural Network (ICDE 2023)\n#### Schema & Partition\n* Schism: a Workload-Driven Approach to Database Replication and Partitioning (VLDB 2010)\n* Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems (SIGMOD 2012)\n* Automated Data Partitioning for Highly Scalable and Strongly Consistent Transactions (2016 Transactions on Parallel and distributed systems)\n* GridFormation : Towards Self-Driven Online Data Partitioning using Reinforcement Learning (aiDM@SIGMOD 2018)\n* Learning a Partitioning Advisor with Deep Reinforcement Learning (2019)\n* Qd-tree: Learning Data Layouts for Big Data Analytics (SIGMOD 2020)\n* A Genetic Optimization Physical Planner for Big Data Warehouses (2020)\n* Lachesis: Automated Partitioning for UDF-Centric Analytics (VLDB 2021)\n* Instance-Optimized Data Layouts for Cloud Analytics Workloads (SIGMOD 2021)\n* Jigsaw: A Data Storage and Query Processing Engine for Irregular Table Partitioning (SIGMOD 2021)\n* Dalton: Learned Partitioning for Distributed Data Streams (VLDB 2023)\n* Grep: A Graph Learning Based Database Partitioning System (Management of Data 2023)\n* Learned spatial data partitioning （arXiv 2023)\n* Relax and Let the Database Do the Partitioning Online (BIRTE 2011)\n* SWORD: Scalable Workload-Aware Data Placement for Transactional Workloads (EDBT 2013)\n* Online Data Partitioning in Distributed Database Systems (EDBT 2015)\n* A Robust Partitioning Scheme for Ad-Hoc Query Workloads (SOCC 2017)\n* Automated multidimensional data layouts in Amazon Redshift (SIGMOD 2024)\n* Oasis: An Optimal Disjoint Segmented Learned Range Filter (VLDB 2024)\n\n### Cache related\n* A Learned Cache Eviction Framework with Minimal Overhead (arXiv 2023)\n\n## Workload\n### Workload generation \nDemonstrating SQLBarber: Leveraging Large Language Models to Generate Customized and Realistic SQL 
Workloads (SIGMOD 2025)\n\n### Resource Management and Auto-scaling\n\n* Automated Demand-driven Resource Scaling in Relational Database-as-a-Service (SIGMOD 2016)\n* Database Workload Capacity Planning using Time Series Analysis and Machine Learning (SIGMOD 2020)\n* Seagull: An Infrastructure for Load Prediction and Optimized Resource Allocation (VLDB 2020)\n* FIRM: An Intelligent Fine-grained Resource Management Framework for SLO-Oriented Microservices (OSDI 2020)\n* Optimal Resource Allocation for Serverless Queries (arXiv 2021)\n* sinan: ml-based and qos-aware resource management for cloud microservices (ASPLOS 2021)\n* Towards Optimal Resource Allocation for Big Data Analytics (EDBT 2022)\n* Tenant Placement in Over-subscribed Database-as-a-Service Clusters (VLDB 2022)\n* Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing (arXiv 2022)\n* SIMPPO: a scalable and incremental online learning framework for serverless resource management (SoCC 2022)\n* SUFS: A Generic Storage Usage Forecasting Service Through Adaptive Ensemble Learning (ICDE 2023)\n* Auto-WLM: Machine Learning Enhanced Workload Management in Amazon Redshift (SIGMOD-Companion ’23)\n* SeLeP: Learning Based Semantic Prefetching for Exploratory Database Workloads (arXiv 2023)\n* Intelligent scaling in Amazon Redshift (SIGMOD 2024)\n* Forecasting Algorithms for Intelligent Resource Scaling: An Experimental Analysis (Socc 2024)\n* LORE: Learning-Based Resource Recommendation for Big Data Queries (ICDE 2025)\n\n### Performance Diagnosis and Modeling\n\n- Performance and resource modeling in highly-concurrent OLTP workloads (SIGMOD 2013)\n- DBSherlock: A Performance Diagnostic Tool for Transactional Databases (SIGMOD 2016)\n- A Top-Down Approach to Achieving Performance Predictability in Database Systems (SIGMOD 2017)\n- Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases (VLDB 2020)\n- Workload-Aware Performance Tuning for Autonomous DBMSs 
(ICDE 2021)\n- Sage: Practical and Scalable ML-Driven Performance Debugging in Microservices (ASPLOS 2021)\n- D-Bot: Database Diagnosis System using Large Language Models (arXiv 2023)\n- Modeling Shifting Workloads for Learned Database Systems (SIGMOD 2024)\n- Andromeda: Debugging Database Performance Issues with Retrieval-Augmented Large Language Models (SIGMOD 2025)\n\n### Workload Shift Detection\n\n- Towards workload shift detection and prediction for autonomic databases (CIKM 2007)\n- Consistent on-line classification of dbs workload events (CIKM 2009)\n- On predictive modeling for optimizing transaction execution in parallel OLTP systems (VLDB 2011)\n- In-Context Adaptation to Concept Drift for Learned Database Operations (arXiv 2025)\n\n### Workload Characterization & Forecasting\n\n* On Workload Characterization of Relational Database Environments (TSE 1992)\n* Workload Models for Autonomic Database Management Systems (International Conference on Autonomic and Autonomous Systems 2006)\n* Workload characterization and prediction in the cloud: A multiple time series approach (APNOMS 2012）\n* Query-based Workload Forecasting for Self-Driving Database Management Systems (SIGMOD 2018）\n* Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics (Arxiv 2018)\n* Database Workload Characterization with Query Plan Encoders (arXiv 2021)\n* Explaining Inference Queries with Bayesian Optimization (VLDB 2021)\n* Statistical Schema Learning with Occam's Razor (SIGMOD 2022)\n* Intelligent Automated Workload Analysis for Database Replatforming (SIGMOD 2022)\n* Stitcher: Learned Workload Synthesis from Historical Performance Footprints (EDBT 2022)\n* DBAugur: An Adversarial-based Trend Forecasting System for Diversified Workloads (ICDE 2023)\n* An Efficient Online Prediction of Host Workloads Using Pruned GRU Neural Nets (arXiv 2023)\n* Uncertainty-Aware Workload Prediction in Cloud Computing (arXiv 2023)\n* Real-Time Workload Pattern Analysis for 
Large-Scale Cloud Databases (VLDB 2023)\n* Robust Auto-Scaling with Probabilistic Workload Forecasting for Cloud Databases (ICDE 2024)\n* QPSEncoder: A Database Workload Encoder with Deep Learning (DEXA 2024)\n* From Feature Selection to Resource Prediction: An Analysis of Commonly Applied Workflows and Techniques (EDBT 2025)\n\n## Query Optimization\n* Learned Query Optimizer: What is New and What is Next (SIGMOD 2024)\n* GLO: Towards Generalized Learned Query Optimization (ICDE 2024)\n* Robust Query Optimization in the Era of Machine Learning: State-of-the-Art and Future Directions (ICDE 2024)\n* Presto’s History-based Query Optimizer (VLDB 2024)\n* Spatial Query Optimization With Learning (VLDB 2024)\n* DBG-PT: A Large Language Model Assisted Query Performance Regression Debugger (VLDB 2024)\n* How Good are Learned Cost Models, Really? Insights from Query Optimization Tasks (SIGMOD 2025) [GitHub Link](https:\u002F\u002Fgithub.com\u002FDataManagementLab\u002Flcm-eval)\n* SERAG: Self-Evolving RAG System for Query Optimization (arXiv 2025)\n* Logical and Physical Optimizations for SQL Query Execution over Large Language Models (SIGMOD 2025)\n* SEFRQO: A Self-Evolving Fine-Tuned RAG-Based Query Optimizer (arXiv 2025)\n* JOB-Complex: A Challenging Benchmark for Traditional&Learned Query Optimization (arXiv 2025)\n* LLM4Hint: Leveraging Large Language Models for Hint Recommendation in Offline Query Optimization (arXiv 2025)\n* Graph Transformers for Query Plan Representation: Potentials and Challenges (VLDB 2026)\n\n### Query Rewrite\n* Sia: Optimizing Queries using Learned Predicates (SIGMOD 2021)\n* A Learned Query Rewrite System using Monte Carlo Tree Search (VLDB 2022)\n* WeTune: Automatic Discovery and Verification of Query Rewrite Rules (SIGMOD 2022)\n* A Learned Query Rewrite System (VLDB 2023)\n* Query Rewriting via Large Language Models (arXiv 2024)\n* LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency (arXiv 2024) 
[GitHub](https:\u002F\u002Fgithub.com\u002FDAMO-NLP-SG\u002FLLM-R2)\n* R-Bot: An LLM-based Query Rewrite System (arXiv 2024)\n* QUITE: A Query Rewrite System Beyond Rules with LLM Agents (arXiv 2025)\n* Leveraging Query Optimizers to Verify the Soundness of LLM-based Query Rewrites for Real-World Workloads, and More! (CIDR 2026)\n\n### Cardinality Estimation\n* Are We Ready For Learned Cardinality Estimation? (VLDB 2021) [GitHub Link](https:\u002F\u002Fgithub.com\u002Fsfu-db\u002FAreCELearnedYet)\n* A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation (SIGMOD 2021)\n* LATEST: Learning-Assisted Selectivity Estimation Over Spatio-Textual Streams (ICDE 2021)\n* Fauce: Fast and Accurate Deep Ensembles with Uncertainty for Cardinality Estimation (VLDB 2021)\n* Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation (arXiv 2021) [GitHub Link](https:\u002F\u002Fgithub.com\u002FNathaniel-Han\u002FEnd-to-End-CardEst-Benchmark)\n* Learned Cardinality Estimation: A Design Space Exploration and A Comparative Evaluation (VLDB 2022)\n* Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query Size (arXiv 2021)\n* Unsupervised Selectivity Estimation by Integrating Gaussian Mixture Models and an Autoregressive Model (EDBT 2022)\n* Selectivity Functions of Range Queries are Learnable (SIGMOD 2022)\n* Prediction Intervals for Learned Cardinality Estimation: An Experimental Evaluation (ICDE 2022)\n* Learned Cardinality Estimation: An In-depth Study (SIGMOD 2022)\n* FactorJoin: A New Cardinality Estimation Framework for Join Queries (SIGMOD 2023)\n* AutoCE: An Accurate and Efficient Model Advisor for Learned Cardinality Estimation (ICDE 2023)\n* Couper: Memory-Efficient Cardinality Estimation under Unbalanced Distribution (ICDE 2023)\n* ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads (VLDB 2023)\n* Advanced Dataset Discovery: When Multi-Query-Dataset Cardinality Estimation
Matters (arXiv 2024)\n* Sample-Efficient Cardinality Estimation Using Geometric Deep Learning (VLDB 2024)\n* PRICE: A Pretrained Model for Cross-Database Cardinality Estimation (arXiv 2024) [GitHub Link](https:\u002F\u002Fgithub.com\u002FStCarmen\u002FPRICE)\n* ByteCard: Enhancing ByteDance's Data Warehouse with Learned Cardinality Estimation (SIGMOD 2024)\n* ASM in Action: Fast and Practical Learned Cardinality Estimation (SIGMOD 2024)\n* CardBench: A Benchmark for Learned Cardinality Estimation in Relational Database (arXiv 2024)\n* Duet: efficient and scalable hybriD neUral rElation undersTanding (ICDE 2024)\n* Cardinality Estimation of LIKE Predicate Queries using Deep Learning (SIGMOD 2025)\n* TardySketch: A Framework for Cardinality Estimation Adaptable to Sliding Windows (arXiv 2025)\n* Algorithmic Complexity Attacks on All Learned Cardinality Estimators: A Data-centric Approach (arXiv 2025)\n* DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation (TKDE 2026) [GitHub Link](https:\u002F\u002Fgithub.com\u002FGIS-PuppetMaster\u002FDistJoin)\n#### Data-based\n* Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation (SIGMOD 2015)\n* Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models (VLDB 2017)\n* DeepDB: Learn from Data, not from Queries!
(VLDB 2020) [GitHub Link](https:\u002F\u002Fgithub.com\u002FDataManagementLab\u002Fdeepdb-public)\n* Deep Unsupervised Cardinality Estimation (VLDB 2019) \n* Multi-Attribute Selectivity Estimation Using Deep Learning (arXiv 2019)\n* Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries (SIGMOD 2020)\n* NeuroCard: One Cardinality Estimator for All Tables (VLDB 2020) [GitHub Link](https:\u002F\u002Fgithub.com\u002Fneurocard\u002Fneurocard)\n* Learning to Sample: Counting with Complex Queries (VLDB 2020)\n* Selectivity estimation using probabilistic models (SIGMOD 2001)\n* Lightweight graphical models for selectivity estimation without independence assumptions (VLDB 2011)\n* Efficiently adapting graphical models for selectivity estimation (VLDB 2013)\n* An Approach Based on Bayesian Networks for Query Selectivity Estimation (DASFAA 2019)\n* BayesCard: A Unified Bayesian Framework for Cardinality Estimation (arXiv 2020) [GitHub Link](https:\u002F\u002Fgithub.com\u002Fwuziniu\u002FBayesCard)\n* Online Sketch-based Query Optimization (arXiv 2021)\n* LMKG: Learned Models for Cardinality Estimation in Knowledge Graphs (arXiv 2021)\n* LHist: Towards Learning Multi-dimensional Histogram for Massive Spatial Data (ICDE 2021)\n* FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation (VLDB 2021) [GitHub Link](https:\u002F\u002Fgithub.com\u002Fwuziniu\u002FFSPN)\n* Astrid: Accurate Selectivity Estimation for String Predicates using Deep Learning (VLDB 2021)\n* FACE: A Normalizing Flow based Cardinality Estimator (VLDB 2022)\n* Pre-training Summarization Models of Structured Datasets for Cardinality Estimation (VLDB 2022)\n* Cardinality Estimation of Approximate Substring Queries using Deep Learning (VLDB 2022)\n* Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation (Proceedings of the ACM on Management of Data)\n* Cardinality estimation with smoothing autoregressive models (WWW 2023)\n* Cardinality 
estimation using normalizing flow (VLDBJ 2023)\n* LPLM: A Neural Language Model for Cardinality Estimation of LIKE-Queries (SIGMOD 2024)\n* ASM: Harmonizing Autoregressive Model, Sampling, and Multi-dimensional Statistics Merging for Cardinality Estimation (SIGMOD 2024)\n* ASM in Action: Fast and Practical Learned Cardinality Estimation (SIGMOD 2024)\n* SAFE: Sampling-Assisted Fast Learned Cardinality Estimation for Dynamic Spatial Data (DEXA 2024)\n* Updateable Data-Driven Cardinality Estimator with Bounded Q-error (arXiv 2024)\n* Grid-AR: A Grid–based Booster for Learned Cardinality Estimation and Range Joins (arXiv 2024)\n* SSCard: Substring Cardinality Estimation using Suffix Tree-Guided Learned FM-Index (arXiv 2025)\n* A Lightweight Learned Cardinality Estimation Model (TKDE 2025)\n* Downsizing Diffusion Models for Cardinality Estimation (arXiv 2025)\n#### Query-based\n* Adaptive selectivity estimation using query feedback (SIGMOD 1994)\n* Selectivity Estimation in Extensible Databases -A Neural Network Approach （VLDB 1998）\n* Effective query size estimation using neural networks.  
(Applied Intelligence 2002)\n* LEO - DB2's LEarning optimizer （VLDB 2011)\n* A Black-Box Approach to Query Cardinality Estimation (CIDR 07)\n* Cardinality Estimation Using Neural Networks (2015)\n* Towards a learning optimizer for shared clouds (VLDB 2018)\n* Learning State Representations for Query Optimization with Deep Reinforcement Learning  (DEEM@SIGMOD2018)\n* Learned Cardinalities: Estimating Correlated Joins with Deep Learning （CIDR2019）[GitHub Link](https:\u002F\u002Fgithub.com\u002Fandreaskipf\u002Flearnedcardinalities)\n* Estimating Cardinalities with Deep Sketches (SIGMOD 2019) [GitHub Link](https:\u002F\u002Fgithub.com\u002Fandreaskipf\u002Flearnedcardinalities)\n* Selectivity estimation for range predicates using lightweight models (VLDB 2019)\n* (Review) An Empirical Analysis of Deep Learning for Cardinality Estimation (arXiv 2019)\n* Flexible Operator Embeddings via Deep Learning (arXiv 2019)\n* Improved Cardinality Estimation by Learning Queries Containment Rates (EDBT 2020)\n* NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT (2020)\n* QuickSel: Quick Selectivity Learning with Mixture Models (SIGMOD 2020)\n* Efficiently Approximating Selectivity Functions using Low Overhead Regression Models (VLDB 2020)\n* Learned Cardinality Estimation for Similarity Queries (SIGMOD 2021)\n* Uncertainty-aware Cardinality Estimation by Neural Network Gaussian Process (arXiv 2021)\n* Flow-Loss: Learning Cardinality Estimates That Matter (VLDB 2021)\n* Warper: Efficiently Adapting Learned Cardinality Estimators to Data and Workload Drifts (SIGMOD 2022)\n* Lightweight and Accurate Cardinality Estimation by Neural Network Gaussian Process for Approximate Complex Event Processing (SIGMOD 2022)\n* Enhanced Featurization of Queries with Mixed Combinations of Predicates for ML-based Cardinality Estimation (EDBT 2023)\n* Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation (SIGMOD 
2023)\n* Robust Query Driven Cardinality Estimation under Changing Workloads (VLDB 2023)\n* Learned Probing Cardinality Estimation for High-Dimensional Approximate NN Search (ICDE 2023)\n* CEDA: Learned Cardinality Estimation with Domain Adaptation (VLDB 2023)\n* Efficient Cardinality and Cost Estimation with Bidirectional Compressor-based Ensemble Learning (arXiv 2023)\n* Adding Domain Knowledge to Query-Driven Learned Databases (arXiv 2023)\n* PACE: Poisoning Attacks on Learned Cardinality Estimation (SIGMOD 2024)\n* Sample-Efficient Cardinality Estimation Using Geometric Deep Learning (VLDB 2024)\n* Automating localized learning for cardinality estimation based on XGBoost (Knowledge and Information Systems)\n* Data-Agnostic Cardinality Learning from Imperfect Workloads (arXiv 2025)\n* SPACE: Cardinality Estimation for Path Queries Using Cardinality-Aware Sequence-based Learning (SIGMOD 2025)\n### Cost Estimation\n#### Single Query\n* Statistical learning techniques for costing XML queries (VLDB 2005)\n* Predicting multiple metrics for queries: Better decisions enabled by machine learning （icde 2009)\n* The Case for Predictive Database Systems : Opportunities and Challenges （CIDR 2011)\n* Learning-based query performance modeling and prediction (ICDE 2012)\n* Robust estimation of resource consumption for SQL queries using statistical techniques (VLDB 2012)\n* Learning-based SPARQL query performance modeling and prediction (WWW 2017)\n* Plan-Structured Deep Neural Network Models for Query Performance Prediction (arXiv 2019)\n* An End-to-End Learning-based Cost Estimator (arXiv 2019)(VLDB 2019)\n* Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings (2020)\n* DBMS Fitting: Why should we learn what we already know? 
(CIDR 2020)\n* A Note On Operator-Level Query Execution Cost Modeling (2020)\n* ML-based Cross-Platform Query Optimization (ICDE 2020)\n* Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction (VLDB 2022)\n* Efficient Learning with Pseudo Labels for Query Cost Estimation (CIKM 2022)\n* gCBO: A Cost-based Optimizer for Graph Databases (CIKM 2022)\n* QueryFormer: A Tree Transformer Model for Query Plan Representation (VLDB 2022)\n* BASE: Bridging the Gap between Cost and Latency for Query Optimization (VLDB 2023)\n* Rethinking Learned Cost Models: Why Start from Scratch? (PACMMOD 2023)\n* Budget-aware Query Tuning: An AutoML Perspective (arXiv 2024)\n* OS Pre-trained Transformer: Predicting Query Latencies across Changing System Contexts [GitHub Link](https:\u002F\u002Fgithub.com\u002Fparimarjan\u002FLatencyPredictor)\n* Precision Meets Resilience: Cross-Database Generalization with Uncertainty Quantification for Robust Cost Estimation (CIKM 2024)\n* DACE: A Database-Agnostic Cost Estimator (ICDE 2024)\n* QCFE: An Efficient Feature Engineering for Query Cost Estimation(ICDE 2024)\n* T3: Accurate and Fast Performance Prediction for Relational Database Systems With Compiled Decision Trees (arXiv 2025)\n* Evaluating Learned Query Performance Prediction Models at LinkedIn: Challenges, Opportunities, and Findings (arXiv 2025)\n* LEAP: A Low-cost Spark SQL Query Optimizer using Pairwise Comparison (VLDB 2025)\n* CONCERTO: Complex Query Execution Mechanism-Aware Learned Cost Estimation (arXiv 2025)\n* GRACEFUL: A Learned Cost Estimator For UDFs (arXiv 2025)\n* Cross-Database Query Cost Estimation: A Comparative Study of Classic ML, Transformers, and LLMs\n* Bootstrapping Learned Cost Models with Synthetic SQL Queries (arXiv 2025)\n\n#### Concurrent\n* PQR: Predicting query execution times for autonomous workload management （ICAC 2008）\n* Performance Prediction for Concurrent Database Workloads (SIGMOD 2011)\n* Predicting completion times of batch query workloads 
using interaction-aware models and simulation (EDBT 2011)\n* Interaction-aware scheduling of report-generation workloads (VLDB 2011) (includes a scheduling strategy)\n* Towards predicting query execution time for concurrent and dynamic database workloads (not machine learning) (VLDB 2014)\n* Contender: A Resource Modeling Approach for Concurrent Query Performance Prediction (EDBT 2014)\n* Query Performance Prediction for Concurrent Queries using Graph Embedding (VLDB 2020)\n* Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload (SIGMOD 2021)\n* A Resource-Aware Deep Cost Model for Big Data Query Processing (ICDE 2022)\n* Stage: Query Execution Time Prediction in Amazon Redshift (SIGMOD 2024)\n* PlanRGCN: Predicting SPARQL Query Performance (VLDB 2025)\n* Learned Cost Models for Query Optimization: From Batch to Streaming Systems (VLDB 2025)\n### Join Optimization\n* Adaptive Optimization of Very Large Join Queries (SIGMOD 2018) (not machine learning)\n* Deep Reinforcement Learning for Join Order Enumeration (aiDM@SIGMOD 2018)\n* Learning to Optimize Join Queries With Deep Reinforcement Learning (arXiv)\n* Reinforcement Learning with Tree-LSTM for Join Order Selection (ICDE 2020)\n* Research Challenges in Deep Reinforcement Learning-based Join Query Optimization (aiDM 2020)\n* Efficient Join Order Selection Learning with Graph-based Representation (KDD 2022)\n* SOAR: A Learned Join Order Selector with Graph Attention Mechanism (IJCNN 2022)\n* Query Join Order Optimization Method Based on Dynamic Double Deep Q-Network (Electronics 2023)\n* Coral: federated query join order optimization based on deep reinforcement learning (WWW 2023)\n* JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning (arXiv 2023)\n* Join Order Selection with Deep Reinforcement Learning: Fundamentals, Techniques, and Challenges (VLDB 2023)\n* Sub-optimal Join Order Identification with L1-error (SIGMOD 2024)\n* TESSM: Tree-based Selective State Space Models
for Efficient Join Order Selection Learning (CIKM 2024)\n* SOLAR: Scalable Distributed Spatial Joins through Learning-based Optimization (arXiv 2025)\n### Parametric Query Optimization\n  #### Foundational Theory\n  * Dynamic Query Evaluation Plans (SIGMOD 1989)\n  * Parametric Query Optimization (VLDB 1992)\n  * Optimization of Dynamic Query Evaluation Plans (SIGMOD 1994)\n  * Design and Analysis of Parametric Query Optimization Algorithms (VLDB 1998)\n  * Least Expected Cost Query Optimization: What Can We Expect? (SIGMOD 2002)\n  * Parametric Query Optimization for Linear and Piecewise Linear Cost Functions (VLDB 2002)\n  * AniPQO: Almost Non-intrusive Parametric Query  Optimization for Nonlinear Cost Functions (VLDB 2003)\n  #### Engineering & Data-driven PQO\n  * Analyzing Plan Diagrams of Database Query Optimizers (VLDB 2005)\n  * On the Production of Anorexic Plan Diagrams (VLDB 2007)\n  * Identifying Robust Plans through Plan Diagram Reduction (VLDB 2008)\n  * Efficiently Approximating Query Optimizer Plan Diagrams (VLDB 2008)\n  * Closing the Query Processing Loop in Oracle 11g (VLDB 2008)\n  * Progressive Parametric Query Optimization (TKDE 2009)\n  * Dynamic Plan Generation for Parameterized Queries (SIGMOD 2009)\n  * Variance Aware Optimization of Parameterized Queries (SIGMOD 2010)\n  * On the Stability of Plan Costs and the Costs of Plan Stability (VLDB 2010)\n  * Parametric Plan Caching Using Density-Based Clustering (ICDE 2012)\n  * Leveraging Re-costing for Online Optimization of Parameterized Queries with Guarantees (SIGMOD 2017)\n  #### ML-based PQO & Robust Query Optimization\n  * Leveraging Query Logs and Machine Learning for Parametric Query Optimization (VLDB 2022)\n  * Kepler: Robust Learning for Faster Parametric Query Optimization (SIGMOD 2023)\n  * RankPQO: Learning-to-Rank for Parametric Query Optimization (VLDB 2024)\n  * PARQO: Penalty-Aware Robust Plan Selection in Query Optimization (VLDB 2024)\n  * PAR2QO: Parametric Penalty-Aware 
Robust Query Optimization (VLDB 2024)\n  * APQO: An Adaptive Framework for Parametric Query Optimization (SIGMOD 2025)\n### Query Plan\n* Plan Selection Based on Query Clustering （VLDB 2002)\n* Cost-Based Query Optimization via AI Planning (AAAI 2014)\n* Sampling-Based Query Re-Optimization (SIGMOD 2016)\n* Learning State Representations for Query Optimization with Deep Reinforcement Learning  (DEEM@SIGMOD2018)\n* Towards a Hands-Free Query Optimizer through Deep Learning （CIDR 2019)\n* Neo: A Learned Query Optimizer (VLDB 2019)\n* Bao: Learning to Steer Query Optimizers (2020)\n* ML-based Cross-Platform Query Optimization (ICDE 2020)\n* Learning-based Declarative Query Optimization (2021)\n* **Bao: Making Learned Query Optimization Practical** (SIGMOD 2021 **Best Paper**!) [Doc](https:\u002F\u002Frmarcus.info\u002Fbao_docs\u002Fintroduction.html) [GitHub Link](https:\u002F\u002Fgithub.com\u002Flearnedsystems\u002FBaoForPostgreSQL)\n* Microlearner: A fine-grained Learning Optimizer for Big Data Workloads at Microsoft (2021)\n* Steering Query Optimizers: A Practical Take on Big Data Workloads (SIGMOD 2021)\n* A Unified Transferable Model for ML-Enhanced DBMS (CIDR 2021)\n* Balsa: Learning a Query Optimizer Without Expert Demonstrations (SIGMOD 2022)\n* Leveraging Query Logs and Machine Learning for Parametric Query Optimization (VLDB 2022)\n* Deploying a Steered Query Optimizer in Production at Microsoft (SIGMOD 2022)\n* Building Learned Federated Query Optimizers (VLDB 2022 PhD Workshop)\n* Cost-based or Learning-based? 
A Hybrid Query Optimizer for Query Plan Selection (VLDB 2022)\n* Learn What Really Matters: A Learning-to-Rank Approach for ML-based Query Optimization (BTW 2023)\n* Lero: A Learning-to-Rank Query Optimizer (VLDB 2023) [GitHub Link](https:\u002F\u002Fgithub.com\u002FAlibabaIncubator\u002FLero-on-PostgreSQL)\n* Learned Query Superoptimization (arXiv 2023)\n* Kepler: Robust Learning for Faster Parametric Query Optimization (SIGMOD 2023)\n* LOGER: A Learned Optimizer towards Generating Efficient and Robust Query Execution Plans (VLDB 2023)\n* BitE : Accelerating Learned Query Optimization in a Mixed-Workload Environment (arXiv 2023)\n* Reinforcement Learning-based SPARQL Join Ordering Optimizer\n* LEON: A New Framework for ML-Aided Query Optimization (VLDB 2023)\n* AutoSteer: Learned Query Optimization for Any SQL Database (VLDB 2023)\n* FASTgres: Making Learned Query Optimizer Hinting Effective (VLDB 2023)\n* Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis (VLDB 2023)\n* QO-Insight: Inspecting Steered Query Optimizer (VLDB Demo 2023)\n* QPSeeker: An Efficient Neural Planner combining both data and queries through Variational Inference (EDBT 2024)\n* FOSS: A Self-Learned Doctor for Query Optimizer (ICDE 2024)\n* Lemo: A Cache-Enhanced Learned Optimizer for Concurrent Queries (PACMMOD 2023)\n* A Comparative Study and Component Analysis of Query Plan Representation Techniques in ML4DB Studies (VLDB 2024)\n* Learned Optimizer for Online Approximate Query Processing in Data Exploration (TKDE 2024)\n* A learning-based framework for spatial join processing: estimation, optimization and tuning (VLDB 2024)\n* Roq: Robust Query Optimization Based on a Risk-aware Learned Cost Model (arXiv 2024)\n* PLAQUE: Automated Predicate Learning at Query Time (SIGMOD 2024)\n* GLO: Towards Generalized Learned Query Optimization (ICDE 2024)\n* Eraser: Eliminating Performance Regression on Learned Query Optimizer (VLDB 2024)\n* Low Rank Approximation 
for Learned Query Optimization (aiDM 2024)\n* Lero: applying learning-to-rank in query optimizer (VLDB 2024)\n* RobOpt: A Tool for Robust Workload Optimization Based on Uncertainty-Aware Machine Learning (SIGMOD 2024)\n* A Novel Technique for Query Plan Representation Based on Graph Neural (Big Data Analytics and Knowledge Discovery)\n* An Exploratory Case Study of Query Plan Representations (arXiv 2024)\n* JAPO: learning join and pushdown order for cloud-native join optimization (Frontiers of Computer Science 2024)\n* Steering the PostgreSQL query optimizer using hinting: State-Of-The-Art and open challenges (35th GI-Workshop on Foundations of Databases)\n* PARQO: Penalty-Aware Robust Plan Selection in Query Optimization (arXiv 2024)\n* HERO: Hint-Based Efficient and Reliable Query Optimizer (arXiv 2024)\n* Can Large Language Models Be Query Optimizer for Relational Databases? (arXiv 2025)\n* Learned Offline Query Planning via Bayesian Optimization (arXiv 2025)\n* A Query Optimization Method Utilizing Large Language Models (arXiv 2025)\n* RankPQO: Learning-to-Rank for Parametric Query Optimization (VLDB 2025)\n* Low Rank Learning for Offline Query Optimization (arXiv 2025)\n* LIMAO: A Framework for Lifelong Modular Learned Query Optimization (arXiv 2025)\n* Athena: An Effective Learning-based Framework for Query Optimizer Performance Improvement (SIGMOD 2025)\n* Delta: A Learned Mixed Cost-based Query Optimization Framework (arXiv 2025)\n* Training-Free Query Optimization via LLM-Based Plan Similarity (arXiv 2025)\n* A Learned Cost Model-based Cross-engine Optimizer for SQL Workloads (arXiv 2025)\n* FOSS: A learned doctor for query optimization (VLDBJ)\n\n## Query Execution\n### Sort\n* The Case for a Learned Sorting Algorithm (SIGMOD 2020)\n* Defeating duplicates: A re-design of the LearnedSort algorithm (arXiv 2021)\n* Towards Parallel Learned Sorting (arXiv 2022)\n### Join\n* SkinnerDB : Regret-Bounded Query Evaluation via Reinforcement Learning (VLDB 2018)\n*
The Case for Learned In-Memory Joins (arXiv 2021)\n### Adaptive Query Processing\n* Eddies: Continuously adaptive query processing. (SIGMOD 2000)\n* Micro adaptivity in Vectorwise (SIGMOD 2013)\n* Cuttlefish: A Lightweight Primitive for Adaptive Query Processing (2018)\n* Scalable Multi-Query Execution using Reinforcement Learning (SIGMOD 2021)\n### Approximate Query Processing\n* DBEST: Revisiting approximate query processing engines with machine learning models (SIGMOD 2019)\n* LAQP: Learning-based Approximate Query Processing (2020)\n* Approximate Query Processing for Data Exploration using Deep Generative Models (ICDE 2020)\n* ML-AQP: Query-Driven Approximate Query Processing based on Machine Learning (2020)\n* Approximate Query Processing for Group-By Queries based on Conditional Generative Models (2021)\n* Learned Approximate Query Processing: Make it Light, Accurate and Fast (CIDR 2021)\n* NeuroSketch: Fast and Approximate Evaluation of Range Aggregate Queries with Neural Networks (SIGMOD 2023)\n* Exploiting Machine Learning Models for Approximate Query Processing (Big Data 2022)\n* Tuple Bubbles: Learned Tuple Representations for Tunable Approximate Query Processing (aiDM 2023)\n* Learning-Based Sample Tuning for Approximate Query Processing in Interactive Data Exploration (TKDE 2024)\n### Scheduling\n* Workload management for cloud databases via machine learning (ICDE 2016 WiseDB)\n* A learning-based service for cost and performance management of cloud databases (ICDEW 2017) (short version for WiSeDB)\n* WiSeDB: A Learning-based Workload Management Advisor for Cloud Databases (VLDB 2016)\n* Learning Scheduling Algorithms for Data Processing Clusters (SIGCOMM 2019)\n* CrocodileDB: Efficient Database Execution through Intelligent Deferment (CIDR 2020)\n* Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning (2020)\n* Self-Tuning Query Scheduling for Analytical Workloads (SIGMOD 2021)\n* LSched: A Workload-Aware Learned Query Scheduler for
Analytical Database Systems (SIGMOD 2022)\n* DBMLSched: Scheduling In-database Machine Learning Jobs (AIDB@VLDB 2023)\n* Learning Interpretable Scheduling Algorithms for Data Processing Clusters (arXiv 2024)\n* CCaaLF: Concurrency Control as a Learnable Function (arXiv 2025)\n* Laser: Buffer-Aware Learned Query Scheduling in Master-Standby Databases (VLDB 2025)\n* Improving DBMS Scheduling Decisions with Accurate Performance Prediction on Concurrent Queries (VLDB 2025)\n\n(transaction 👇)\n\n* Scheduling OLTP transactions via learned abort prediction (aiDM@SIGMOD 2019)\n* Scheduling OLTP Transactions via Machine Learning （2019）\n* Polyjuice: High-Performance Transactions via Learned Concurrency Control (OSDI 2021)\n\n## Text-to-SQL\n* SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning (arXiv 2017)\n* An End-to-end Neural Natural Language Interface for Databases (arXiv 2018)\n* SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task (EMNLP 2018)\n* Robust Text-to-SQL Generation with Execution-Guided Decoding (arXiv 2018)\n* Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation (ACL 2019)\n* Global Reasoning over Database Structures for Text-to-SQL Parsing (EMNLP 2019)\n* Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing (ACL 2019)\n* Natural language to SQL: Where are we today? 
(VLDB 2020)\n* Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing (EMNLP Findings 2020)\n* RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers (ACL 2020)\n* Exploring Unexplored Generalization Challenges for Cross-Database Semantic Parsing (ACL 2020)\n* TAPAS: Weakly Supervised Table Parsing via Pre-training (ACL 2020)\n* TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data (ACL 2020)\n* Semantic Evaluation for Text-to-SQL with Distilled Test Suites (EMNLP 2020)\n* SMBOP: Semi-autoregressive Bottom-up Semantic Parsing (NAACL-HLT 2021)\n* Natural SQL: Making SQL Easier to Infer from Natural Language Specifications (EMNLP Findings 2021)\n* LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations (ACL 2021)\n* Structure-Grounded Pretraining for Text-to-SQL (NAACL-HLT 2021)\n* GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing (ICLR 2021)\n* SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL (NeurIPS 2021)\n* GP: Context-free Grammar Pre-training for Text-to-SQL Parsers (arXiv 2021)\n* Relation Aware Semi-autoregressive Semantic Parsing for NL2SQL (arXiv 2021)\n* On Robustness of Neural Semantic Parsers (EACL 2021)\n* MT-Teql: Evaluating and Augmenting Neural NLIDB on Real-world Linguistic and Schema Variations (VLDB 2021)\n* PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models (EMNLP 2021)\n* Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training (AAAI 2021)\n* Towards robustness of text-to-sql models against synonym substitution (ACL 2021)\n* Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization (EMNLP 2021)\n* CodexDB: Generating Code for Processing SQL Queries using GPT-3 Codex (arXiv 2022)\n* S2SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers (arXiv 2022)\n* UNIFIEDSKG: Unifying and 
Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models (EMNLP 2022)\n* RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL (EMNLP 2022)\n* UNISAR: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL (arXiv 2022)\n* N-Best Hypotheses Reranking for Text-To-SQL Systems (SLT 2022)\n* Semantic Enhanced Text-to-SQL Parsing via Iteratively Learning Schema Linking Graph (KDD 2022)\n* SeaD: End-to-end Text-to-SQL Generation with Schema-aware Denoising (NAACL-HLT Findings 2022)\n* STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing (EMNLP Findings 2022)\n* Towards Generalizable and Robust Text-to-SQL Parsing (EMNLP Findings 2022)\n* SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers (COLING 2022)\n* Towards robustness of text-to-sql models against natural and realistic adversarial table perturbation (ACL 2022)\n* Evaluating the Text-to-SQL Capabilities of Large Language Models (arXiv 2022)\n* A survey on deep learning approaches for text-to-SQL (VLDBJ 2023)\n* GAR: A Generate-and-Rank Approach for Natural Language to SQL Translation (ICDE 2023)\n* Exploring the Compositional Generalization in Context Dependent Text-to-SQL Parsing (arXiv 2023)\n* Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques (arXiv 2023)\n* Exploring Chain-of-Thought Style Prompting for Text-to-SQL (arXiv 2023)\n* Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning (SIGMOD 2023)\n* Multitask pretraining with structured knowledge for text-to-SQL generation (ACL 2023)\n* Demonstrating GPT-DB: Generating Query-Specific and Customizable Code for SQL Processing with GPT-4 (VLDB Demo 2023)\n* Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing (AAAI 2023)\n* SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (arXiv 2023)\n* Teaching Large Language 
Models to Self-Debug (arXiv 2023)\n* A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability (arXiv 2023)\n* DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction (arXiv 2023)\n* C3: Zero-shot Text-to-SQL with ChatGPT (arXiv 2023)\n* RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL (AAAI 2023)\n* Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness (ICLR 2023)\n* Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL (arXiv 2024)\n* Natural language to SQL [Resource repo](https:\u002F\u002Fgithub.com\u002Fyechens\u002FNL2SQL)\n* Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL (VLDB 2024)\n* Awesome-Text2SQL [Resource repo](https:\u002F\u002Fgithub.com\u002Feosphoros-ai\u002FAwesome-Text2SQL)\n* Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows [Resource repo](https:\u002F\u002Fgithub.com\u002Fxlang-ai\u002FSpider2) (arXiv 2024)\n\n## SQL Related\n* Query2Vec (ArXiv)\n* Facilitating SQL Query Composition and Analysis (ArXiv 2020)\n* From Natural Language Processing to Neural Databases (VLDB 2021)\n* BERT Meets Relational DB: Contextual Representations of Relational Databases\n* LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning (SIGMOD 2022)\n* PreQR: Pre-training Representation for SQL Understanding (SIGMOD 2022)\n* From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management (VLDB 2022)\n* Query Generation based on Generative Adversarial Networks (arXiv 2023)\n\n## Stargazers over time\n\n[![Stargazers over time](https:\u002F\u002Fstarchart.cc\u002FLumingSun\u002FML4DB-paper-list.svg)](https:\u002F\u002Fstarchart.cc\u002FLumingSun\u002FML4DB-paper-list)\n","# [论文列表] AI4DB \u002F ML4DB \u002F 自动化数据库 \u002F 自动驾驶数据库 \u002F 智能数据库 \u002F 自治数据库\n\n包含人工智能（机器学习、深度学习、强化学习）在数据库系统中应用的论文列表\n\n新论文不断涌现，如果你对这个话题感兴趣，请记得 **Watch** 
这个仓库。\n\n关于机器学习、神经网络、强化学习、自调优技术等在数据库系统中的应用的文章列表，列表持续更新中，记得按赞、分享、打开小铃铛！\n\n欢迎提交 PR！\n\n欢迎大家补充！\n\n关于 [Text-To-SQL](https:\u002F\u002Fgithub.com\u002Feosphoros-ai\u002FAwesome-Text2SQL) 的论文层出不穷！可惜我并不是这方面的专家，无法判断这些论文的质量。  \n非常期待大家在 Text-To-SQL 方面的贡献（PR、评论、讨论）！🫶\n\n如果有同学需要稳定访问GitHub的方式，可以试试这个[链接](https:\u002F\u002Fazabudai.org\u002Fauth\u002Fregister?code=Z4oa)\n\n目录\n=================\n- [\\[论文列表\\] AI4DB \u002F ML4DB \u002F 自动化数据库 \u002F 自动驾驶数据库 \u002F 智能数据库 \u002F 自治数据库](#paper-list-ai4db--ml4db--autonomous-database--self-driving-database--智能数据库--自治数据库)\n- [目录](#table-of-contents)\n  - [系统与教程](#system-and-tutorial)\n    - [训练数据收集](#training-data-collection)\n  - [数据访问](#data-access)\n    - [配置调优](#configuration-tuning)\n    - [物理设计](#physical-design)\n      - [学习型结构](#learned-structure)\n      - [索引](#index)\n        - [索引结构](#index-structure)\n        - [LSM-tree相关](#lsm-tree-related)\n        - [索引推荐](#index-recommendation)\n    - [物化视图](#materialized-view)\n      - [模式与分区](#schema--partition)\n    - [缓存相关](#cache-related)\n  - [工作负载](#workload)\n    - [工作负载生成](#workload-generation)\n    - [资源管理与自动扩展](#resource-management-and-auto-scaling)\n    - [性能诊断与建模](#performance-diagnosis-and-modeling)\n    - [工作负载变化检测](#workload-shift-detection)\n    - [工作负载特征分析与预测](#workload-characterization--forecasting)\n  - [查询优化](#query-optimization)\n    - [查询重写](#query-rewrite)\n    - [基数估计](#cardinality-estimation)\n      - [基于数据](#data-based)\n      - [基于查询](#query-based)\n    - [代价估计](#cost-estimation)\n      - [单条查询](#single-query)\n      - [并发执行](#concurrent)\n    - [连接优化](#join-optimization)\n    - [参数化查询优化](#parametric-query-optimization)\n      - [基础理论](#foundational-theory)\n      - [工程实现与数据驱动的PQO](#engineering--data-driven-pqo)\n      - [基于机器学习的PQO与鲁棒查询优化](#ml-based-pqo--robust-query-optimization)\n    - [查询计划](#query-plan)\n  - [查询执行](#query-execution)\n    - [排序](#sort)\n    - [连接](#join)\n    - [自适应查询处理](#adaptive-query-processing)\n    - 
[近似查询处理](#approximate-query-processing)\n    - [调度](#sheduling)\n  - [Text-to-SQL](#text-to-sql)\n  - [SQL相关](#sql-related)\n  \n  - [Star数量随时间变化](#stargazers-over-time)\n\n\n## 系统与教程\n* ***SageDB：一个学习型数据库系统（CIDR 2019）***\n* 数据库学习：迈向每次使用都更智能的数据库（SIGMOD 2017）\n* 自动驾驶数据库管理系统（CIDR 2017）\n* 自动驾驶：从通用到专用的DBMS（Phd@PVLDB 2018）  \n* 面向ML增强型数据库系统的主动学习（SIGMOD 2020）\n* 数据库遇见人工智能：综述（TKDE 2020）\n* 自动驾驶数据库系统：一种概念性方法（分布式与并行数据库 2020）\n* 统一模型统治一切：迈向数据库的零样本学习（arXiv 2021）\n* UDO：利用强化学习进行通用数据库优化（arXiv 2021）[源代码](https:\u002F\u002Fgithub.com\u002Fjxiw\u002FUDO)\n* 走向学习型系统的基准测试（SMDB研讨会 2021）\n* 面向ML增强型DBMS的统一可迁移模型 [愿景]（arXiv 2021）\n* AI遇见数据库：AI4DB与DB4AI（SIGMOD 2021）\n* 扩展你的训练极限！为基于ML的数据管理生成训练数据（SIGMOD 2021）\n* MB2：自动驾驶数据库管理系统的分解行为建模（SIGMOD 2021）\n* 走向实例优化的数据系统（VLDB 2021，Tim Kraska提出）\n* 让你的数据库系统梦见电动绵羊：迈向自动驾驶操作（VLDB 2021，Andy Pavlo提出）\n* openGauss：一个自治数据库系统（VLDB 2021，Guoliang Li提出）\n* 经验增强型学习：在自动数据库管理中，一刀切仍然不适用（arXiv 2021）\n* 百合：面向AI驱动数据库的SysML框架（arXiv 2022）\n* 可学习数据库综述：机器学习视角（大数据研究 2021）\n* 学习时代的数据库优化器（ICDE 2022）\n* 数据管理中的机器学习：系统视角（ICDE 2022）\n* 味道很棒！分量更少！高性能且准确的自动驾驶数据库管理系统训练数据收集（SIGMOD 2022）\n* SAM：基于监督自回归模型从查询工作负载生成数据库（SIGMOD 2022）[源代码](https:\u002F\u002Fgithub.com\u002FJamesyang2333\u002FSAM)\n* 检测、蒸馏与更新：学习型DB系统应对分布外数据（SIGMOD 2023）[源代码](https:\u002F\u002Fgithub.com\u002Fmeghdadk\u002FDDUp)\n* SageDB：一个实例优化的数据分析系统（VLDB 2023）\n* 走向在Azure上构建自治数据服务（SIGMOD-Companion ’23）\n* 数据库健身房（CIDR 2023）\n* 来看看BRAD的大脑吧：用学习型自动化数据网格简化云上数据处理（VLDB 2023）\n* 学习型数据库中的机器遗忘：实验分析（SIGMOD 2024）[源代码](https:\u002F\u002Fgithub.com\u002Fmeghdadk\u002FDB_unlearning)\n* PilotScope：用机器学习驱动程序操控数据库（VLDB 2024）[源代码](https:\u002F\u002Fgithub.com\u002Falibaba\u002Fpilotscope)\n* 数据库中的机器学习：基础、范式与开放问题（SIGMOD 2024）\n* NeurDB：一个由AI驱动的自治数据系统（arXiv 2024）\n* GaussML：一个端到端的数据库内机器学习系统（ICDE 2024）\n* NeurDB：关于AI驱动自治数据库的设计与实现（arXiv 2024）\n* LLM用于数据管理（VLDB 2024）\n* 云端蓝图：用BRAD统一并自动优化云上数据基础设施（VLDB 2024）\n* Holon方法：通过合成原型动作，利用机器学习同时调优自动驾驶数据库管理系统中的多个组件（VLDB 2024）\n* NeurBench：用数据和工作负载漂移建模来评估学习型数据库组件（arXiv 2025）\n* 
GaussMaster：一个基于LLM的数据库副驾驶系统（arXiv 2025）\n* D-Bot：一个LLM驱动的DBA副驾驶（SIGMOD-Companion 2025）\n* 鱼需要自行车吗？DBMS中片上NPU的必要性（CIDR 2026）\n\n### 训练数据收集\n* 扩展你的训练极限！为基于机器学习的数据管理生成训练数据（SIGMOD 2021）\n* DataFarm：为你的基于机器学习的查询优化器“耕种粮食”！——人类引导的训练数据生成——（CIDR 2022）\n* 为你的基于机器学习的查询优化器“耕种粮食”。（ICDE 2022，**最佳演示奖**）\n* 去健身房：加速查询执行，以高效地启动自动驾驶数据库管理系统的行为模型（VLDB 2024）\n* 通过分布可学习性对分布漂移下的学习型数据库操作进行理论分析（ICML 2024）\n\n## 数据访问\n### 配置调优\n* SARD：一种用于对数据库调优参数进行排序的统计方法（ICDEW，2008）\n* 基于强化学习的正则化代价模型无关数据库调优（2016）\n* 通过大规模机器学习实现数据库管理系统的自动调优（SIGMOD 2017）\n* 使用深度强化学习进行自动数据库管理的案例研究（2018 ArXiv）\n* 基于深度强化学习的端到端自动云数据库调优系统（SIGMOD 2019）\n* 外部与内部：关于用于自治数据库管理系统的机器学习智能体的探讨\n* QTune：一种基于深度强化学习的查询感知型数据库调优系统（VLDB 2019）\n* 通过学习固态硬盘的隐藏参数来优化数据库（VLDB 2019）\n* iBTune：面向大规模云数据库的个性化缓冲区调优（VLDB 2019）\n* 黑色还是白色？如何开发用于内存型分析的自动调优器（SIGMOD 2020）\n* 学习分布式SGD的高效参数服务器同步策略（ICLR 2020）\n* 调优的旋钮太多？通过预先选择重要旋钮来加快数据库调优速度（HotStorage 2020）\n* 运行中数据库管理系统的动态配置调优（LifeTech 2020）\n* 用于在线数据库调优的自适应多模型强化学习（EDBT 2021）\n* 关于在真实世界数据库管理系统上使用机器学习进行自动配置调优服务的探究（VLDB 2021）\n* 增强NLP的数据库调优：迈向能够“阅读手册”的调优工具（VLDB 2021）\n* CGPTuner：一种上下文相关的高斯过程多臂赌博机方法，用于在不同工作负载条件下自动调整IT配置（VLDB 2021）\n* ResTune：由元学习驱动的面向资源的云数据库调优增强方案（SIGMOD 2021）\n* KML：利用机器学习改进存储系统（arXiv 2021）\n* 基于自然语言处理的数据库调优（SIGMOD Record 2021）\n* 向云数据库的动态且安全的配置调优迈进（SIGMOD 2022）\n* 分布式数据流处理系统的自动性能调优（ICDE 2022）\n* 用于Spark配置调优的自适应代码学习（ICDE 2022）\n* DB-BERT：一款能够“阅读手册”的数据库调优工具（SIGMOD 2022）\n* HUNTER：一个针对个性化需求的在线云数据库混合调优系统（SIGMOD 2022）\n* LOCAT：面向Spark SQL应用的低开销在线配置自动调优（SIGMOD 2022）\n* 通过超参数优化促进数据库调优：一项全面的实验评估（VLDB 2022）\n* LlamaTune：样本高效的DBMS配置调优（VLDB 2022）\n* BLUTune：基于ML的查询驱动型多阶段IBM Db2调优（CIKM 2022）\n* 用于自治DBMS调优的统一高效协调框架（arXiv 2023）\n* 数据库旋钮自动调优：一项综述（TKDE）\n* 基于深度学习的数据库管理系统自动调优（arXiv 2023）\n* KeenTune：用于云应用性能测试与优化的自动化调优工具（ISSTA 2023）\n* ContTune：基于保守贝叶斯优化的分布式流数据处理系统的连续调优（arXiv 2023）\n* GPTuner：一种通过GPT引导的贝叶斯优化实现的手册阅读型数据库调优系统（arXiv 2023）\n* 一种基于迁移学习的高效数据库调优顾问（VLDB 2024）\n* DB‑GPT：大型语言模型与数据库的结合（DSE 2024）\n* 一款用于自适应、细粒度参数调优的Spark优化器（arXiv 2024）\n* TIE：面向内存数据分析的快速实验驱动型ML配置调优（IEEE计算机汇刊）\n* VDTuner：面向向量数据管理系统的自动性能调优（ICDE 
2024）[源代码](https:\u002F\u002Fgithub.com\u002Ftiannuo-yang\u002FVDTuner)\n* Nautilus：一个用于DBMS旋钮调优的基准测试平台（DEEM 2024）[源代码](https:\u002F\u002Fgithub.com\u002Fuw-mad-dash\u002Fnautilus)\n* 大型语言模型擅长数据库旋钮调优吗？一项全面的实验评估（arXiv 2024）\n* CTuner：基于因果强化学习的自动NoSQL数据库调优（Internetware 2024）\n* KnobTree：通过可解释强化学习实现的智能数据库参数配置（arXiv 2024）\n* KnobCF：不确定性感知的旋钮调优（arXiv 2024）\n* Db2une：通过深度学习在压力下进行调优（VLDB 2024）\n* {\\lambda}-Tune：利用大型语言模型实现数据库系统的自动化调优（arXiv 2024）\n* Db2une：通过深度学习在压力下进行调优（VLDB 2024）\n* LOFTune：一种低开销且灵活的Spark SQL配置调优方法（TKDE 2025）\n* EAST：一个可解释的云数据库旋钮估算系统（ICDE 2025）\n* AQETuner：面向分析型查询引擎的可靠查询级配置调优（arXiv 2025）\n* 自动数据库调优与人工调优在模拟高压工作环境中的对比：数据库健身房的演示（SIGMOD 2025）\n* Rabbit：检索增强生成技术助力更好的自动数据库旋钮调优（ICDE 2025）\n* BitTuner：一套用于自动配置已学习数据压缩器的工具箱（ICDE 2025）\n* AgentTune：一个基于代理的大型语言模型框架，用于数据库旋钮调优（SIGMOD 2025）\n* L2T-Tune：LLM引导的混合数据库调优，结合LHS和TD3（arXiv 2025）\n\n### 物理设计\n* 提瑞西阿斯：实现预测性自治存储与索引（VLDB 2022）\n* 超级：基于多智能体强化学习的混合物理设计顾问（ICDE 2025）\n#### 学习型结构\n* 堆叠过滤器：基于结构的学习式过滤（VLDB 2021）\n* LEA：面向列存数据库的学习型编码顾问（aiDM 2021）\n* 面向数据库的集合学习（EDBT 2024）\n* 分布式学习哈希表（arXiv 2025）\n#### 索引\n##### 索引结构\n* 大数据索引中的学习哈希——综述（2016）\n* 学习型索引结构的必要性（SIGMOD 2018）\n* A-Tree：一种有界近似索引结构（2017）\n* FITing-Tree：一种数据感知型索引结构（SIGMOD 2019）\n* 面向动态工作负载的学习型索引（2019）\n* SOSD：学习型索引基准测试（2019）\n* 多维学习型索引的学习方法（2019）\n* ALEX：一种可更新的自适应学习型索引（SIGMOD 2020）\n* 空间索引的有效学习（VLDB 2020）[GitHub链接](https:\u002F\u002Fgithub.com\u002FLiuguanli\u002FRSMI)\n* 用于数据流的稳定学习布隆过滤器（VLDB 2020）\n* START——自调优自适应基数树（ICDEW 2020）\n* 学习型数据结构（2020）\n* RadixSpline：单次遍历学习型索引（aiDM2020）\n* ML-Index：一种用于点查询、范围查询及最近邻查询的多维学习型索引（EDBT 2020）\n* PGM-index：一种具有可证明最坏情况边界且完全动态的压缩学习型索引（VLDB 2020）\n* 学习型多维索引教程（SIGSPATIAL 2020）\n* 为什么学习型索引如此有效？（ICML 2020）\n* 面向谷歌规模磁盘数据库的学习型索引（arXiv 2020）\n* SIndex：一种适用于字符串键的可扩展学习型索引（APSys 2020）\n* XIndex：一种面向多核数据存储的可扩展学习型索引（PPoPP 2020）\n* 海啸：一种适用于相关数据和倾斜工作负载的多维学习型索引（VLDB 2021）\n* 一种用于高效索引学习的懒惰方法（2021）\n* RLR-Tree：一种基于强化学习的空间R树（arXiv 2021）\n* 基于空间插值的学习型索引，用于范围查询和kNN查询（arXiv 2021）\n* APEX：一种高性能的持久内存学习型索引（arXiv 2021）\n* RUSLI：实时可更新的样条学习型索引（aiDM 2021）\n* 
PLEX：迈向实用的学习型索引（arXiv 2021）\n* SPRIG：一种用于范围查询和kNN查询的学习型空间索引（SSTD 2021）\n* 学习型索引的基准测试（VLDB 2021）\n* 具有精确位置的可更新学习型索引（VLDB 2021）\n* 学习型内存内连接的必要性（arXiv 2021）\n* 限制最后一公里：高效的字符串学习型索引（arXiv 2021）\n* FINEdex：一种针对可扩展并发内存系统的细粒度学习型索引方案（VLDB 2022）\n* 数据库索引的未来五十年，或：自动构建索引结构的必要性（VLDB 2022）\n* 面向多核数据存储的并发学习型索引（Transactions on Storage 2022）\n* TONE：降低学习型索引的尾延迟（CHEOPS 22）\n* 一种用于度量空间中精确相似性搜索的学习型索引（ArXiv 2022）\n* RW-tree：一种学习型的工作负载感知框架，用于R树构建（ICDE 2022）\n* “AI+R”树：一种实例优化的R树（MDM 2022）\n* LHI：一种用于高效相似性搜索的学习型汉明空间索引框架（SIGMOD 2022）\n* 熵学习哈希：在可控均匀性下实现10倍速度的哈希运算（SIGMOD 2022）\n* 磁盘及其他介质上分层学习型索引的调优（SIGMOD 2022）\n* FLIRT：一种用于滚动时间窗口的快速学习型索引（EDBT 2022）\n* 学习型索引结构的鲁棒性测试（arXiv 2022）\n* 机器学习增强型高维索引的必要性（2022）\n* 一种用于度量空间中精确相似性搜索的学习型索引（arxiv 2022）\n* PLIN：一种高性能且可即时恢复的非易失性存储持久化学习型索引（VLDB 2023）\n* 一种面向高效写入的数据感知学习型索引方案（ICPP 2022）\n* 数据流中的频率估计：学习最优哈希方案（TKDE）\n* FILM：一种完全学习型的超内存数据库索引（VLDB 2023）\n* WISK：一种面向空间关键词查询的工作负载感知学习型索引（arXiv 2023）\n* 空间索引的高效学习（ICDE 2023）\n* 将学习型索引拆解分析：对可更新学习型索引的深入探讨（ICDE 2023）\n* DILI：一种由分布驱动的学习型索引（arXiv 2023）\n* 学习型索引：全面实验评估（VLDB 2023）\n* LMSFC：一种基于学习单调填充曲线的新型多维索引（扩展版）（arXiv 2023）\n* 一石二鸟：一种轻量级多维学习型索引，支持基数计算（arXiv 2023）\n* 一种简单却性能卓越的磁盘学习型索引：能否兼得蛋糕与食用？（aiXiv 2023）\n* 快速分区学习布隆过滤器（arXiv 2023）\n* 通过模型复用与微调实现高效索引学习（ICDEW 2023）\n* COAX：关联感知索引（ICDEW 2023）\n* 动态e的学习型索引（openreview 2023）\n* 学习优化LSM树：面向动态工作负载的强化学习键值存储（arXiv 2023）\n* SALI：一种基于概率模型的可扩展自适应学习型索引框架（SIGMODE 2024）\n* Sieve：一种用于数据分析的学习型跳过数据索引（VLDB 2023）\n* 展示华夫格：一种自动驾驶网格索引（VLDB Demo 2023）\n* 局部敏感哈希是否能被神经网络取代？（arXiv 2023）\n* 工作负载感知与学习型Z索引（arXiv 2023）\n* AirIndex：通过数据和存储进行多功能索引调优（SIGMOD 2024）\n* 一种面向并发分布式系统的快速学习型键值存储（TKDE 2023）\n* 当学习型索引遇到持久性内存时：分析与优化（TKDE 2023）\n* PLATON：采用学习型分区策略的自顶向下R树打包（PACMMOD 2023）\n* 一种用于数据流上可变大小滑动窗口的近似成员资格查询的学习型杜鹃过滤器（PACMMOD 2023）\n* WIPE：一种面向持久性内存的写优化学习型索引（TACO 2023）\n* 动态学习型索引的算法复杂度攻击（VLDB 2024）\n* 一种完全可在磁盘上更新的学习型索引（ICDE 2024）\n* 豪华轿车：融合学习型与传统索引，自主设计超内存云存储引擎（SIGMOD 2024）\n* AStore：面向支持RDMA的键值存储的统一自适应学习型索引与缓存（TKDE 2024）\n* Cabin：一种压缩自适应分箱扫描索引（SIGMOD 2024）\n* SWIX：一种内存效率高的滑动窗口学习型索引（SIGMOD 2024）\n* 
豪华轿车：融合学习型与传统索引，自主设计超内存云存储引擎（SIGMOD 2024）\n* 多维空间学习型索引综述（arXiv 2024）\n* 超级：一种高性能且内存高效的混合构建学习型索引（ACM关于数据管理的会议论文集 2024）\n* 谓词缓存：面向云数据仓库的查询驱动二级索引（SIGMOD 2024）\n* AStore：面向支持RDMA的键值存储的统一自适应学习型索引与缓存（TKDE 2024）\n* 学习型索引能否高效构建？抽样权衡的深度探讨（SIGMOD 2024）\n* 让内存中的学习型索引在磁盘上同样高效（SIGMOD 2024）\n* LeaderKV：通过学习型索引和解耦KV表提升KV存储的读取性能（ICDE 2024）\n* 变色龙：面向局部倾斜数据的更新高效学习型索引探索（ICDE 2024）\n* 重新审视使用字节寻址持久性存储的学习型索引（ICPP 2024）\n* UpLIF：一个可更新的自调优学习型索引框架（arXiv 2024）\n* LITS：一种面向字符串的优化学习型索引（VLDB 2024）\n* 对外部内存连接中学习型索引的评估（arXiv 2024）\n* 通过虚拟点平滑分布的学习型索引（arXiv 2024）\n* VEGA：一种具有分组学习粒度的主动调优学习型索引（SIGMOD 2025）\n* ALT-Index：一种面向并发内存数据库系统的混合学习型索引（ICDE 2025）\n* BMTree：为多维数据索引设计、学习并更新分段式空间填充曲线（arXiv 2025）\n* LIOF：让学习型索引以更高精度更快地学习（TKDE 2025）\n* TELEX：一种用于基于Enclave的区块链系统上丰富查询的两层学习型索引（TKDE 2025）\n* 学习型索引结构中的分段线性逼近：理论与实证分析（arXiv 2025）\n* 从一维到多维空间的学习型索引：挑战、技术和机遇（SIGMOD 2025）\n* 在LSM树系统中评估学习型索引：基准、洞见和设计选择（arXiv 2025）\n* leSAX索引：一种用于时间序列相似性搜索的学习型SAX表示索引（ICDE 2025）\n* 高性能还是低内存？一种用于时空权衡的可更新学习型索引框架（SIGMOD 2025）\n* 解析可更新学习型索引的鲁棒性问题：实验与分析（SIGMOD 2025）\n* LETIndex：一种带有TEE的安全学习型索引（VLDB 2025）\n* 对RL增强型空间索引与传统、先进及学习型对应物的基准测试（arxiv 2025）\n##### LSM树相关\n* Leaper：一种用于LSM树存储引擎中缓存失效的学习型预取器（VLDB 2020）\n* 从WiscKey到波本：一种用于日志结构合并树的学习型索引（OSDI 2020）\n* TridentKV：一种通过自适应索引和空间高效分区优化读取性能的LSM树基KV存储（TPDS 2022）\n* LearnedKV：将LSM与学习型索引结合，在SSD上实现卓越性能（arXiv 2024）\n* CAMAL：通过主动学习优化LSM树（arXiv 2024）\n* DobLIX：一种面向日志结构合并树的双目标学习型索引（arXiv 2025）\n* 学习型LSM树：两种利用学习布隆过滤器的方法（aiXiv 2025）\n##### 索引推荐\n* 自适应数据库管理系统中的索引选择（SIGMOD 1976）\n* AutoAdmin“假设”索引分析工具（SIGMOD 1998）\n* 自调优数据库系统：十年进展（VLDB 2007）\n* AI遇见AI：利用查询执行改进索引推荐（SIGMOD 2019）\n* 使用无模型强化学习的自动化数据库索引（ICAPS 2020）\n* DRLindex：面向集群数据库的深度强化学习索引顾问（2020年国际数据库工程与应用研讨会）\n* 手中魔镜啊，请告诉我谁是这世上最好的索引选择算法？索引选择算法的实验评估（VLDB 2020）[GitHub链接](https:\u002F\u002Fgithub.com\u002Fhyrise\u002Findex_selection_evaluation)\n* 基于深度强化学习的索引顾问（CIKM 2020）[GitHub链接](https:\u002F\u002Fgithub.com\u002Frmitbggroup\u002FIndexAdvisor)\n* DBA强盗们：在临时性和分析性工作负载下安全驾驶索引调优（ICDE 2021）\n* MANTIS：使用深度强化学习进行多种类型和属性索引的选择（IDEAS 2021）\n* 
AutoIndex：一种面向动态工作负载的增量索引管理系统（ICDE 2022）[GitHub链接](https:\u002F\u002Fgithub.com\u002Fzhouxh19\u002FAutoIndex)\n* SWIRL：利用强化学习选择工作负载感知索引（EDBT 2022）[GitHub链接](https:\u002F\u002Fgithub.com\u002Fhyrise\u002Frl_index_selection)\n* Indexer++：结合变压器和强化学习进行工作负载感知在线索引调优（ACM SIGAPP SAC，2022）\n* 基于预算的强化学习索引调优（SIGMOD 2022）\n* ISUM：高效压缩大型复杂工作负载，实现可扩展索引调优（SIGMOD 2022）\n* DISTILL：低开销数据驱动技术，用于筛选和估算索引成本，从而实现可扩展索引调优（VLDB 2022）\n* SmartIndex：一款带有学习成本估算器的索引顾问（CIKM 2022）\n* HMAB：用于集成物理数据库设计调优的自驱动强盗层级（VLDB 2022）\n* 学习型索引收益：基于机器学习的索引性能估算（VLDB 2023）[GitHub链接](https:\u002F\u002Fgithub.com\u002FJC-Shi\u002FLearned-Index-Benefits)\n* AIM：一种面向SQL数据库的自动化索引管理实用方法（ICDE 2023）\n* 可更新学习型索引与磁盘驻留DBMS相遇——从评估到设计选择（SIGMOD 2023）\n* 利用量子计算机上的机器学习进行大规模数据库应用的索引调优（AIDB@VLDB 2023）\n* 一种面向慢查询的数据驱动索引推荐系统（CIKM 2023）\n* 机器学习赋能索引调优：近期进展与开放挑战概述（arXiv 2023）\n* 可更新学习型索引顾问对抗中毒攻击的鲁棒性（SIGMOD 2024）\n* 重构索引调优流程，加入收益估算环节（VLDB 2024）[GitHub链接](https:\u002F\u002Fgithub.com\u002FHIT-DB-Group\u002FRIBE)\n* 利用动态和异构工作负载知识提升索引顾问性能（VLDB 2024）[GitHub链接](https:\u002F\u002Fgithub.com\u002FXMUDM\u002FBALANCE）\n* MFIX：一种高效可靠的多保真贝叶斯优化索引顾问（ICDE 2024）\n* TRAP：通过对抗扰动对索引顾问进行定制化鲁棒性评估（ICDE 2024）\n* 面向慢查询的在线索引推荐（ICDE 2024）\n* 自动索引调优：综述（TKDE）\n* 拆解剖析：索引顾问的深度研究（VLDB 2024）\n* 不确定性量化能否促进更好的学习型索引调优？（arXiv 2024）\n* 混合成本建模以减少索引调优中的查询性能退化（TKDE 2024）\n* 学习型索引调优的新范式：强化学习增强方法（arXiv 2025）\n* LLMIdxAdvis：一种利用大型语言模型的资源节约型索引顾问（arXiv 2025）\n* 通过潜在估计引导索引调优探索（ICDE 2025）\n* AutoIndexer：一种面向规模化工作负载的强化学习增强索引顾问（arXiv 2025）\n* Rainbow：面对分布外工作负载的风险感知索引收益估算（SIGMOD 2025）\n* Oracle中的自动索引（VLDB 2025）\n\n### 物化视图\n* 基于深度学习和强化学习的自动视图生成（ICDE 2020）\n* 基于深度强化学习的自主物化视图管理系统（ICDE 2021）\n* 使用图神经网络进行动态物化视图管理的技术报告\n* HMAB：用于集成物理数据库设计调优的自驱动多臂老虎机层次结构（VLDB 2022）\n* AutoView：基于编码器-解码器的自主物化视图管理系统（TKDE 2022）\n* 使用图神经网络的动态物化视图管理（ICDE 2023）\n#### 模式与分区\n* Schism：一种工作负载驱动的数据库复制与分区方法（VLDB 2010）\n* 在无共享并行OLTP系统中考虑倾斜的自动数据库分区（SIGMOD 2012）\n* 面向高度可扩展且强一致事务的自动化数据分区（2016年《并行与分布式系统汇刊》）\n* GridFormation：利用强化学习实现自驱动在线数据分区（aiDM@SIGMOD 2018）\n* 使用深度强化学习学习分区顾问（2019年）\n* Qd-tree：为大数据分析学习数据布局（SIGMOD 2020）\n* 
面向大数据仓库的遗传优化物理规划器（2020年）\n* Lachesis：面向以用户定义函数为中心的分析的自动化分区（VLDB 2021）\n* 针对云分析工作负载的实例优化数据布局（SIGMOD 2021）\n* Jigsaw：用于不规则表分区的数据存储与查询处理引擎（SIGMOD 2021）\n* Dalton：面向分布式数据流的学习型分区（VLDB 2023）\n* Grep：基于图学习的数据库分区系统（Management of Data 2023）\n* 学习空间数据分区（arXiv 2023）\n* 放松并让数据库在线完成分区（BIRTE 2011）\n* SWORD：面向事务类工作负载的可扩展、工作负载感知数据放置策略（EDBT 2013）\n* 分布式数据库系统中的在线数据分区（EDBT 2015）\n* 面向即席查询工作负载的稳健分区方案（SOCC 2017）\n* Amazon Redshift中的自动化多维数据布局（SIGMOD 2024）\n* Oasis：最优的不相交分段学习型范围过滤器（VLDB 2024）\n\n### 缓存相关\n* 开销极低的学习型缓存淘汰框架（arXiv 2023）\n\n## 工作负载\n### 工作负载生成\n展示SQLBarber：利用大型语言模型生成定制化且真实的SQL工作负载（SIGMOD 2025）\n\n### 资源管理与自动伸缩\n\n* 关系型数据库即服务中的自动化按需资源伸缩（SIGMOD 2016）\n* 基于时间序列分析和机器学习的数据库工作负载容量规划（SIGMOD 2020）\n* Seagull：用于负载预测和优化资源分配的基础设施（VLDB 2020）\n* FIRM：面向SLO导向微服务的智能细粒度资源管理框架（OSDI 2020）\n* 无服务器查询的最佳资源分配（arXiv 2021）\n* sinan：基于ML且关注QoS的云原生微服务资源管理（ASPLOS 2021）\n* 向大数据分析的最佳资源分配迈进（EDBT 2022）\n* 超额订阅数据库即服务集群中的租户放置（VLDB 2022）\n* 面向大数据处理的智能资源管理的细粒度建模与优化（arXiv 2022）\n* SIMPPO：用于无服务器资源管理的可扩展增量式在线学习框架（SoCC 2022）\n* SUFS：通过自适应集成学习提供通用存储使用预测服务（ICDE 2023）\n* Auto-WLM：Amazon Redshift中的机器学习增强型工作负载管理（SIGMOD-Companion ’23）\n* SeLeP：面向探索性数据库工作负载的基于学习的语义预取（arXiv 2023）\n* Amazon Redshift中的智能伸缩（SIGMOD 2024）\n* 智能资源伸缩的预测算法：一项实验分析（Socc 2024）\n* LORE：面向大数据查询的学习型资源推荐（ICDE 2025）\n\n### 性能诊断与建模\n\n* 高并发OLTP工作负载中的性能与资源建模（SIGMOD 2013）\n* DBSherlock：事务型数据库的性能诊断工具（SIGMOD 2016）\n* 自顶向下实现数据库系统性能可预测性的方法（SIGMOD 2017）\n* 诊断云数据库中间歇性慢查询的根本原因（VLDB 2020）\n* 面向自治DBMS的工作负载感知性能调优（ICDE 2021）\n* Sage：面向微服务的实用且可扩展的ML驱动性能调试（ASPLOS 2021）\n* D-Bot：基于大型语言模型的数据库诊断系统（arXiv 2023）\n* 为学习型数据库系统建模变化的工作负载（SIGMOD 2024）\n* Andromeda：利用检索增强型大型语言模型调试数据库性能问题（SIGMOD 2025）\n\n### 工作负载变化检测\n\n* 向自治数据库的工作负载变化检测与预测迈进（CIKM 2007）\n* 数据库工作负载事件的一致性在线分类（CIKM 2009）\n* 关于优化并行OLTP系统中事务执行的预测建模（VLDB 2011）\n* 面向学习型数据库操作的概念漂移情境适应（arXiv 2025）\n\n### 工作负载特征描述与预测\n\n* 关于关系型数据库环境的工作负载特征描述（TSE 1992）\n* 自治数据库管理系统的工作负载模型（2006年国际自治与自主系统会议）\n* 云端的工作负载特征描述与预测：多时间序列方法（APNOMS 2012）\n* 面向自动驾驶数据库管理系统的查询驱动型工作负载预测（SIGMOD 2018）\n* Query2Vec：评估NLP技术在通用工作负载分析中的应用（Arxiv 2018）\n* 
使用查询计划编码器进行数据库工作负载特征描述（arXiv 2021）\n* 利用贝叶斯优化解释推理查询（VLDB 2021）\n* 基于奥卡姆剃刀的统计模式学习（SIGMOD 2022）\n* 面向数据库平台迁移的智能自动化工作负载分析（SIGMOD 2022）\n* Stitcher：从历史性能足迹中学习合成工作负载（EDBT 2022）\n* DBAugur：面向多样化工作负载的对抗式趋势预测系统（ICDE 2023）\n* 使用剪枝GRU神经网络高效在线预测主机工作负载（arXiv 2023）\n* 云计算中的不确定性感知工作负载预测（arXiv 2023）\n* 大规模云数据库的实时工作负载模式分析（VLDB 2023）\n* 基于概率性工作负载预测的云数据库稳健自动伸缩（ICDE 2024）\n* QPSEncoder：带有深度学习的数据库工作负载编码器（DEXA 2024）\n* 从特征选择到资源预测：常用工作流程与技术的分析（EDBT 2025）\n\n## 查询优化\n* 学习型查询优化器：最新进展与未来方向（SIGMOD 2024）\n* GLO：迈向通用的学习型查询优化（ICDE 2024）\n* 机器学习时代的鲁棒查询优化：现状与未来方向（ICDE 2024）\n* Presto 的基于历史的查询优化器（VLDB 2024）\n* 基于学习的空间查询优化（VLDB 2024）\n* DBG-PT：大型语言模型辅助的查询性能回归调试器（VLDB 2024）\n* 学习型代价模型究竟有多好？来自查询优化任务的洞察（SIGMOD 2025）[GitHub 链接](https:\u002F\u002Fgithub.com\u002FDataManagementLab\u002Flcm-eval)\n* SERAG：用于查询优化的自进化 RAG 系统（arXiv 2025）\n* 面向大型语言模型的 SQL 查询执行中的逻辑与物理优化（SIGMOD 2025）\n* SEFRQO：一种自进化的微调 RAG 基查询优化器（arXiv 2025）\n* JOB-Complex：面向传统与学习型查询优化的挑战性基准测试集（arXiv 2025）\n* LLM4Hint：利用大型语言模型进行离线查询优化中的提示推荐（arXiv 2025）\n* 用于查询计划表示的图变换器：潜力与挑战（VLDB 2026）\n\n### 查询重写\n* Sia：使用学习型谓词优化查询（SIGMOD 2021）\n* 基于蒙特卡洛树搜索的学习型查询重写系统（VLDB 2022）\n* WeTune：自动发现与验证查询重写规则（SIGMOD 2022）\n* 一种学习型查询重写系统（VLDB 2023）\n* 基于大型语言模型的查询重写（arXiv 2024）\n* LLM-R2：增强规则的大型语言模型重写系统，用于提升查询效率（arXiv 2024）[GitHub](https:\u002F\u002Fgithub.com\u002FDAMO-NLP-SG\u002FLLM-R2)\n* R-Bot：基于 LLM 的查询重写系统（arXiv 2024）\n* QUITE：超越规则、采用 LLM 代理的查询重写系统（arXiv 2025）\n* 利用查询优化器验证基于 LLM 的查询重写的正确性，以应对真实世界的工作负载，以及其他更多内容！（CIDR 2026）\n\n### 基数估计\n* 我们准备好采用学习型基数估计了吗？（VLDB 2021）[GitHub链接](https:\u002F\u002Fgithub.com\u002Fsfu-db\u002FAreCELearnedYet)\n* 面向基数估计的统一深度学习模型：同时从数据和查询中学习（SIGMOD 2021）\n* 最新进展：面向时空文本流的辅助学习选择率估计（ICDE 2021）\n* Fauce：用于基数估计的快速且准确的不确定性深度集成模型（VLDB 2021）\n* 数据库管理系统中的基数估计：全面基准评估（arXiv 2021）[GitHub链接](https:\u002F\u002Fgithub.com\u002FNathaniel-Han\u002FEnd-to-End-CardEst-Benchmark)\n* 学习型基数估计：设计空间探索与比较评估（VLDB 2022）\n* Glue：自适应融合单表基数以估计连接查询规模（aiXiv 2021）\n* 通过整合高斯混合模型与自回归模型实现无监督选择率估计（EDBT 2022）\n* 范围查询的选择率函数可被学习（SIGMOD 2022）\n* 学习型基数估计的预测区间：实验评估（ICDE 2022）\n* 
学习型基数估计：深入研究（SIGMOD 2022）\n* FactorJoin：一种用于连接查询的新基数估计框架（SIGMOD 2023）\n* AutoCE：一款准确高效的模型顾问，用于学习型基数估计（ICDE 2023）\n* Couper：在非均衡分布下进行内存高效的基数估计（ICDE 2023）\n* ALECE：一种基于注意力机制的学习型基数估计器，适用于动态工作负载下的SPJ查询（VLDB 2023）\n* 高级数据集发现：当多查询数据集基数估计至关重要时（aiXiv 2024）\n* 利用几何深度学习实现样本高效基数估计（VLDB 2024）\n* PRICE：一个用于跨数据库基数估计的预训练模型（arXiv 2024）[GitHub链接](https:\u002F\u002Fgithub.com\u002FStCarmen\u002FPRICE)\n* ByteCard：利用学习型基数估计增强字节跳动的数据仓库（SIGMOD 2024）\n* ASM实战：快速实用的学习型基数估计（SIGMOD 2024）\n* CardBench：关系数据库中学习型基数估计的基准测试（arXiv 2024）\n* Duet：高效且可扩展的混合神经关系理解模型。（ICDE 2024）\n* 使用深度学习对LIKE谓词查询进行基数估计（SIGMOD 2025）\n* TardySketch：一个可适应滑动窗口的基数估计框架（arXiv 2025）\n* 针对所有学习型基数估计器的算法复杂度攻击：一种以数据为中心的方法（arXiv 2025）\n* DistJoin：基于自适应神经谓词调制的解耦式连接基数估计器（TKDE 2026）[GitHub链接](https:\u002F\u002Fgithub.com\u002FGIS-PuppetMaster\u002FDistJoin)\n#### 基于数据的方法\n* 自调优、GPU加速的核密度模型，用于多维选择率估计（SIGMOD 2015）\n* 使用带宽优化的核密度模型估计连接选择率（VLDB 2017）\n* DeepDB：从数据中学习，而非从查询中学习！（VLDB 2020）[GitHub链接](https:\u002F\u002Fgithub.com\u002FDataManagementLab\u002Fdeepdb-public)\n* 深度无监督基数估计（VLDB 2019）\n* 利用深度学习进行多属性选择率估计（arXiv 2019）\n* 用于多属性查询选择率估计的深度学习模型（SIGMOD 2020）\n* NeuroCard：一款适用于所有表的基数估计器（VLDB 2020）[GitHub链接](https:\u002F\u002Fgithub.com\u002Fneurocard\u002Fneurocard)\n* 学会采样：使用复杂查询计数（VLDB 2020）\n* 使用概率模型进行选择率估计（SIGMOD 2001）\n* 无需独立性假设的轻量级图形模型用于选择率估计（VLDB 2011）\n* 高效调整图形模型以用于选择率估计（VLDB 2013）\n* 基于贝叶斯网络的查询选择率估计方法（DASFAA 2019）\n* BayesCard：一个用于基数估计的统一贝叶斯框架（arXiv 2020）[GitHub链接](https:\u002F\u002Fgithub.com\u002Fwuziniu\u002FBayesCard)\n* 基于在线草图的查询优化（arXiv 2021）\n* LMKG：知识图谱中用于基数估计的学习模型（arXiv 2021）\n* LHist：迈向为海量空间数据学习多维直方图（ICDE 2021）\n* FLAT：一种快速、轻量且准确的基数估计方法（VLDB 2021）[GitHub链接](https:\u002F\u002Fgithub.com\u002Fwuziniu\u002FFSPN)\n* Astrid：利用深度学习对字符串谓词进行精确选择率估计（VLDB 2021）\n* FACE：基于归一化流的基数估计器（VLDB 2022）\n* 面向基数估计的结构化数据集摘要模型预训练（VLDB 2022）\n* 使用深度学习对近似子串查询进行基数估计（VLDB 2022）\n* 通过基于学习的渐进式基数估计加速端到端查询执行（ACM数据管理学报）\n* 使用平滑自回归模型进行基数估计（WWW 2023）\n* 使用归一化流进行基数估计（VLDBJ 2023）\n* LPLM：用于LIKE查询基数估计的神经语言模型（SIGMOD 2024）\n* 
ASM：将自回归模型、采样和多维统计合并用于基数估计（SIGMOD 2024）\n* ASM实战：快速实用的学习型基数估计（SIGMOD 2024）\n* SAFE：针对动态空间数据的采样辅助快速学习型基数估计（DEXA 2024）\n* 可更新的数据驱动基数估计器，具有有界Q-error（arXiv 2024）\n* Grid-AR：基于网格的助推器，用于学习型基数估计和范围连接（arXiv 2024）\n* SSCard：利用后缀树引导的学习型FM索引进行子串基数估计（arXiv 2025）\n* 一款轻量级的学习型基数估计模型（TKDE 2025）\n* 用于基数估计的扩散模型小型化（arXiv 2025）\n#### 基于查询的方法\n* 利用查询反馈进行自适应选择率估计（SIGMOD 1994）\n* 扩展性数据库中的选择率估计——基于神经网络的方法（VLDB 1998）\n* 使用神经网络有效估计查询规模。（应用智能 2002）\n* LEO——DB2的学习型优化器（VLDB 2011）\n* 查询基数估计的黑盒方法（CIDR 07）\n* 使用神经网络进行基数估计（2015）\n* 向共享云环境中的学习型优化器迈进（VLDB 2018）\n* 利用深度强化学习学习查询优化的状态表示（DEEM@SIGMOD2018）\n* 学习型基数：利用深度学习估计相关联接（CIDR2019）[GitHub链接](https:\u002F\u002Fgithub.com\u002Fandreaskipf\u002Flearnedcardinalities)\n* 使用深度草图估计基数（SIGMOD 2019）[GitHub链接](https:\u002F\u002Fgithub.com\u002Fandreaskipf\u002Flearnedcardinalities)\n* 使用轻量级模型对范围谓词进行选择率估计（VLDB 2019）\n* （综述）关于深度学习在基数估计中应用的实证分析（arXiv 2019）\n* 通过深度学习实现灵活的操作符嵌入（arXiv 2019）\n* 通过学习查询包含率改进基数估计（EDBT 2020）\n* 基于NN的任意SQL基数估计器改造，以处理DISTINCT、AND、OR和NOT操作符（2020）\n* QuickSel：使用混合模型快速学习选择率（SIGMOD 2020）\n* 使用低开销回归模型高效近似选择率函数（VLDB 2020）\n* 面向相似度查询的学习型基数估计（SIGMOD 2021）\n* 神经网络高斯过程支持下的不确定性感知基数估计（arXiv 2021）\n* Flow-Loss：学习真正重要的基数估计值（VLDB 2021）\n* Warper：高效适应数据与工作负载漂移的学习型基数估计器（SIGMOD 2022）\n* 使用神经网络高斯过程实现轻量且准确的基数估计，用于近似复杂事件处理（SIGMOD 2022）\n* 通过混合谓词组合增强查询特征化，以支持基于ML的基数估计（EDBT 2023）\n* 通过基于学习的渐进式基数估计加速端到端查询执行（SIGMOD 2023）\n* 在变化的工作负载下实现稳健的查询驱动基数估计（VLDB 2023）\n* 面向高维近似NN搜索的学习型探测式基数估计（ICDE 2023）\n* CEDA：带有领域适应性的学习型基数估计（VLDB 2023）\n* 使用双向压缩器集成学习实现高效的基数与成本估计（arXiv 2023）\n* 将领域知识添加到查询驱动的学习型数据库中（arXiv 2023）\n* PACE：针对学习型基数估计的中毒攻击（SIGMOD 2024）\n* 利用几何深度学习实现样本高效基数估计（VLDB 2024）\n* 基于XGBoost自动化的局部学习用于基数估计（知识与信息系统）\n* 不依赖特定数据的基数学习，适用于不完美工作负载（arXiv 2025）\n* SPACE：基于序列学习的路径查询基数估计，具备基数意识（SIGMOD 2025）\n\n### 成本估算\n#### 单一查询\n* 面向XML查询成本估算的统计学习技术（VLDB 2005）\n* 查询多指标预测：机器学习赋能更优决策（ICDE 2009）\n* 预测型数据库系统的机遇与挑战（CIDR 2011）\n* 基于学习的查询性能建模与预测（ICDE 2012）\n* 利用统计技术对SQL查询资源消耗进行稳健估计（VLDB 2012）\n* 基于学习的SPARQL查询性能建模与预测（WWW 2017）\n* 面向查询性能预测的计划结构化深度神经网络模型（arXiv 2019）\n* 端到端基于学习的成本估算器（arXiv 
2019）（VLDB 2019）\n* 大数据查询处理的成本模型：学习、改造及我们的发现（2020）\n* DBMS拟合：为何要学习我们已知的知识？（CIDR 2020）\n* 关于算子级查询执行成本建模的注记（2020）\n* 基于ML的跨平台查询优化（ICDE 2020）\n* 零样本成本模型：开箱即用的学习型成本预测（VLDB 2022）\n* 基于伪标签的高效学习用于查询成本估算（CIKM 2022）\n* gCBO：面向图数据库的成本优化器（CIKM 2022）\n* QueryFormer：用于查询计划表示的树形变换器模型（VLDB 2022）\n* BASE：弥合查询优化中成本与延迟之间的鸿沟（VLDB 2023）\n* 重新思考学习型成本模型：为何要从头开始？（PACMMOD 2023）\n* 预算感知的查询调优：AutoML视角（arXiv 2024）\n* OS预训练变换器：在不断变化的系统环境中预测查询延迟 [GitHub链接](https:\u002F\u002Fgithub.com\u002Fparimarjan\u002FLatencyPredictor)\n* 精准与鲁棒并重：结合不确定性量化实现跨数据库泛化，以支持稳健的成本估算（CIKM 2024）\n* DACE：一种数据库无关的成本估算器（ICDE 2024）\n* QCFE：高效的查询成本估算特征工程（ICDE 2024）\n* T3：使用编译型决策树为关系数据库系统提供准确且快速的性能预测（arXiv 2025）\n* 在LinkedIn评估学习型查询性能预测模型：挑战、机遇与发现（arXiv 2025）\n* LEAP：利用成对比较的低成本Spark SQL查询优化器（VLDB 2025）\n* CONCERTO：考虑复杂查询执行机制的学习型成本估算（arXiv 2025）\n* GRACEFUL：面向UDF的学习型成本估算器（arXiv 2025）\n* 跨数据库查询成本估算：经典ML、Transformer与LLM的比较研究\n* 使用合成SQL查询自举学习型成本模型（arXiv 2025）\n\n#### 并发\n* PQR：预测查询执行时间以实现自主工作负载管理（ICAC 2008）\n* 并发数据库工作负载的性能预测（SIGMOD 2011）\n* 利用交互感知模型和仿真预测批处理查询工作负载的完成时间（EDBT 2011）\n* 报表生成工作负载的交互感知调度（VLDB 2011）（包含调度策略）\n* 向并发且动态数据库工作负载的查询执行时间预测迈进（非机器学习）（VLDB 2014）\n* Contender：面向并发查询性能预测的资源建模方法（EDBT 2014）\n* 利用图嵌入进行并发查询性能预测（VLDB 2020）\n* 面向大规模查询工作负载的高效深度学习管道，用于精确成本估算（SIGMOD 2021）\n* 面向大数据查询处理的资源感知深度成本模型（ICDE 2022）\n* Stage：亚马逊Redshift中的查询执行时间预测（SIGMOD 2024）\n* PlanRGCN：预测SPARQL查询性能（VLDB 2025）\n* 学习型成本模型用于查询优化：从批处理到流式系统（VLDB 2025）\n### 连接优化\n* 超大连接查询的适应性优化（SIGMOD 2018）（非机器学习）\n* 基于深度强化学习的连接次序枚举（aiDM@SIGMOD 2018）\n* 利用深度强化学习优化连接查询（ArXiv）\n* 使用Tree-LSTM进行连接次序选择的强化学习（ICDE 2020）\n* 基于深度强化学习的连接查询优化研究挑战（aiDM 2020）\n* 基于图表示的高效连接次序选择学习（KDD 2022）\n* SOAR：具有图注意力机制的学习型连接次序选择器（IJCNN 2022）\n* 基于动态双深度Q网络的查询连接次序优化方法（Electronics 2023）\n* Coral：基于深度强化学习的联邦查询连接次序优化（WWW 2023）\n* JoinGym：面向强化学习的高效查询优化环境（arXiv 2023）\n* 基于深度强化学习的连接次序选择：基础、技术和挑战（VLDB 2023）\n* 利用L1误差识别次优连接次序（SIGMOD 2024）\n* TESSM：基于树状选择性状态空间模型的高效连接次序选择学习（CIKM 2024）\n* SOLAR：通过学习型优化实现可扩展的分布式空间连接（arXiv 2025）\n### 参数化查询优化\n#### 基础理论\n* 动态查询执行计划（SIGMOD 1989）\n* 参数化查询优化（VLDB 1992）\n* 
动态查询执行计划的优化（SIGMOD 1994）\n* 参数化查询优化算法的设计与分析（VLDB 1998）\n* 最小期望成本查询优化：我们能期待什么？（SIGMOD 2002）\n* 针对线性和分段线性成本函数的参数化查询优化（VLDB 2002）\n* AniPQO：几乎无侵入式的非线性成本函数参数化查询优化（VLDB 2003）\n#### 工程与数据驱动的PQO\n* 分析数据库查询优化器的计划图（VLDB 2005）\n* 关于厌食症式计划图的生产（VLDB 2007）\n* 通过计划图约简识别稳健计划（VLDB 2008）\n* 高效近似查询优化器的计划图（VLDB 2008）\n* 在Oracle 11g中闭合查询处理循环（VLDB 2008）\n* 渐进式参数化查询优化（TKDE 2009）\n* 针对参数化查询的动态计划生成（SIGMOD 2009）\n* 考虑方差的参数化查询优化（SIGMOD 2010）\n* 关于计划成本稳定性与成本稳定性的讨论（VLDB 2010）\n* 利用密度聚类进行参数化计划缓存（ICDE 2012）\n* 借助再成本计算实现有保障的参数化查询在线优化（SIGMOD 2017）\n#### 基于ML的PQO与稳健查询优化\n* 利用查询日志和机器学习进行参数化查询优化（VLDB 2022）\n* Kepler：稳健学习助力更快的参数化查询优化（SIGMOD 2023）\n* RankPQO：面向参数化查询优化的学习排序方法（VLDB 2024）\n* PARQO：惩罚感知的稳健计划选择在查询优化中的应用（VLDB 2024）\n* PAR2QO：参数化惩罚感知的稳健查询优化（VLDB 2024）\n* APQO：一个自适应的参数化查询优化框架（SIGMOD 2025）\n\n### 查询计划\n* 基于查询聚类的计划选择（VLDB 2002）\n* 基于成本的查询优化：利用人工智能规划方法（AAAI 2014）\n* 基于采样的查询重优化（SIGMOD 2016）\n* 使用深度强化学习学习查询优化的状态表示（DEEM@SIGMOD2018）\n* 通过深度学习迈向免手动干预的查询优化器（CIDR 2019）\n* Neo：一个基于学习的查询优化器（VLDB 2019）\n* Bao：学习引导查询优化器（2020）\n* 基于机器学习的跨平台查询优化（ICDE 2020）\n* 基于学习的声明式查询优化（2021）\n* **Bao：使基于学习的查询优化走向实用**（SIGMOD 2021 **最佳论文**！）[文档](https:\u002F\u002Frmarcus.info\u002Fbao_docs\u002Fintroduction.html) [GitHub链接](https:\u002F\u002Fgithub.com\u002Flearnedsystems\u002FBaoForPostgreSQL)\n* Microlearner：微软针对大数据工作负载的细粒度学习型优化器（2021）\n* 引导查询优化器：面向大数据工作负载的实用方案（SIGMOD 2021）\n* 面向ML增强DBMS的统一可迁移模型（CIDR 2021）\n* Balsa：无需专家示范即可学习查询优化器（SIGMOD 2022）\n* 利用查询日志与机器学习进行参数化查询优化（VLDB 2022）\n* 在微软生产环境中部署引导式查询优化器（SIGMOD 2022）\n* 构建基于学习的联邦查询优化器（VLDB 2022博士生研讨会）\n* 基于成本还是基于学习？用于查询计划选择的混合查询优化器（VLDB 2022）\n* 学习真正重要的内容：基于排序学习的ML查询优化方法（BTW 2023）\n* Lero：基于排序学习的查询优化器（VLDB 2023）[GitHub链接](https:\u002F\u002Fgithub.com\u002FAlibabaIncubator\u002FLero-on-PostgreSQL)\n* 基于学习的查询超优化（arXiv 2023）\n* Kepler：稳健学习以加速参数化查询优化（SIGMOD 2023）\n* LOGER：学习型优化器，旨在生成高效且稳健的查询执行计划（VLDB 2023）\n* BitE：在混合工作负载环境中加速基于学习的查询优化（arXiv 2023）\n* 基于强化学习的SPARQL连接顺序优化器\n* LEON：用于ML辅助查询优化的新框架（VLDB 2023）\n* AutoSteer：适用于任何SQL数据库的基于学习的查询优化（VLDB 2023）\n* 
FASTgres：使基于学习的查询优化提示机制更加有效（VLDB 2023）\n* 简单自适应查询处理与基于学习的查询优化器：观察与分析（VLDB 2023）\n* QO-Insight：检查引导式查询优化器（VLDB演示2023）\n* QPSeeker：一种高效的神经规划器，通过变分推断结合数据与查询（EDBT 2024）\n* FOSS：为查询优化器服务的自我学习型医生（ICDE 2024）\n* Lemo：一种缓存增强型并发查询学习优化器（PACMMOD 2023）\n* ML4DB研究中查询计划表示技术的比较研究与组件分析（VLDB 2024）\n* 用于数据探索的在线近似查询处理的学习型优化器（TKDE 2024）\n* 基于学习的空间连接处理框架：估计、优化与调优（VLDB 2024）\n* Roq：基于风险感知的学习成本模型的稳健查询优化（arXiv 2024）\n* PLAQUE：查询时自动谓词学习（SIGMOD 2024）\n* GLO：迈向通用的学习型查询优化（ICDE 2024）\n* Eraser：消除基于学习的查询优化器中的性能退化（VLDB 2024）\n* 低秩近似用于学习型查询优化（aiDM 2024）\n* Lero：在查询优化器中应用排序学习（VLDB 2024）\n* RobOpt：基于不确定性感知机器学习的稳健工作负载优化工具（SIGMOD 2024）\n* 一种基于图神经网络的新型查询计划表示技术（大数据分析与知识发现）\n* 查询计划表示的探索性案例研究（aiXiv 2024）\n* JAPO：学习云原生连接优化中的连接与下推顺序（计算机科学前沿2024）\n* 使用提示引导PostgreSQL查询优化器：现状与开放挑战（第35届GI数据库基础研讨会）\n* PARQO：惩罚感知的稳健查询优化计划选择（arXiv 2024）\n* HERO：基于提示的高效可靠查询优化器（arXiv 2024）\n* 大型语言模型能否成为关系数据库的查询优化器？（arXiv 2025）\n* 基于贝叶斯优化的离线查询计划学习（arXiv 2025）\n* 一种利用大型语言模型的查询优化方法（arXiv 2025）\n* RankPQO：面向参数化查询优化的排序学习（VLDB 2025）\n* 低秩学习用于离线查询优化（arXiv 2025）\n* LIMAO：终身模块化学习型查询优化框架（arXiv 2025）\n* Athena：一种有效的基于学习的查询优化器性能提升框架（SIGMOD 2025）\n* Delta：基于混合成本的学习型查询优化框架（arXiv 2025）\n* 基于LLM计划相似性的免训练查询优化（arXiv 2025）\n* 基于学习成本模型的跨引擎SQL工作负载优化器（arXiv 2025）\n* FOSS：为查询优化服务的自学医生（VLDBJ）\n\n## 查询执行\n### 排序\n* 学习型排序算法的必要性（SIGMOD 2020）\n* 消除重复：重新设计LearnedSort算法（aiXiv 2021）\n* 向并行学习排序迈进（arXiv 2022）\n### 连接\n* SkinnerDB：基于强化学习的后悔约束查询评估（VLDB 2018）\n* 内存中连接学习的必要性（arXiv 2021）\n### 自适应查询处理\n* Eddies：持续自适应的查询处理。（SIGMOD 2000）\n* Vectorwise中的微观自适应性（SIGMOD 2013）\n* Cuttlefish：自适应查询处理的轻量级原语（2018）\n* 基于强化学习的可扩展多查询执行（SIGMOD 2021）\n### 近似查询处理\n* DBEST：利用机器学习模型重新审视近似查询处理引擎（SIGMOD 2019）\n* LAQP：基于学习的近似查询处理（2020）\n* 利用深度生成模型进行数据探索的近似查询处理（ICDE 2020）\n* ML-AQP：基于机器学习的查询驱动型近似查询处理（2020）\n* 基于条件生成模型的Group-By查询近似处理（2021）\n* 学习型近似查询处理：轻量、准确且快速（CIDR 2021）\n* NeuroSketch：利用神经网络对范围聚合查询进行快速近似评估（SIGMOD 2023）\n* 利用机器学习模型进行近似查询处理（大数据2022）\n* 元组气泡：用于可调近似查询处理的学习型元组表示（aiDM 2023）\n* 基于学习的样本调优，用于交互式数据探索中的近似查询处理（TKDE 2024）\n\n### 调度\n* 通过机器学习进行云数据库的工作负载管理（ICDE 2016 WiseDB）\n* 
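A Learning-based Service for Cost and Performance Management of Cloud Databases (ICDEW 2017) (short version of WiSeDB)\n* WiSeDB: A Learning-based Workload Management Advisor for Cloud Databases (2016 VLDB)\n* Learning Scheduling Algorithms for Data Processing Clusters (SIGCOMM 2019)\n* CrocodileDB: Efficient Database Execution through Intelligent Deferment (CIDR 2020)\n* Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning (2020)\n* Self-Tuning Query Scheduling for Analytical Workloads (SIGMOD 2021)\n* LSched: A Workload-Aware Learned Query Scheduler for Analytical Database Systems (SIGMOD 2022)\n* DBMLSched: Scheduling In-database Machine Learning Jobs (AIDB@VLDB 2023)\n* Learning Interpretable Scheduling Algorithms for Data Processing Clusters (arXiv 2024)\n* CCaaLF: Concurrency Control as a Learnable Function (arXiv 2025)\n* Laser: Buffer-Aware Learned Query Scheduling in Master-Standby Databases (VLDB 2025)\n* Improving DBMS Scheduling Decisions with Accurate Performance Prediction on Concurrent Queries (VLDB 2025)\n\n(Transactions 👇)\n\n* Scheduling OLTP Transactions via Learned Abort Prediction (aiDM@SIGMOD 2019)\n* Scheduling OLTP Transactions via Machine Learning (2019)\n* Polyjuice: High-Performance Transactions via Learned Concurrency Control (OSDI 2021)\n\n## Text-to-SQL\n* SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning (arXiv 2017)\n* An End-to-end Neural Natural Language Interface for Databases (arXiv 2018)\n* SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task (EMNLP 2018)\n* Robust Text-to-SQL Generation with Execution-Guided Decoding (arXiv 2018)\n* Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation (ACL 2019)\n* Global Reasoning over Database Structures for Text-to-SQL Parsing (EMNLP 2019)\n* Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing (ACL 2019)\n* Natural Language to SQL: Where Are We Today? (VLDB 2020)\n* Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing (EMNLP Findings 2020)\n* RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers (ACL 2020)\n* Exploring Unexplored Generalization Challenges for Cross-Database Semantic Parsing (ACL 2020)\n* TAPAS: Weakly Supervised Table Parsing via Pre-training (ACL 2020)\n* TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data (ACL 2020)\n* Semantic Evaluation for Text-to-SQL with Distilled Test Suites (EMNLP 2020)\n* SMBOP: Semi-autoregressive Bottom-up Semantic Parsing (NAACL-HLT 2021)\n* Natural SQL: Making SQL Easier to Infer from Natural Language Specifications (EMNLP Findings 2021)\n* LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations (ACL 2021)\n* Structure-Grounded Pretraining for Text-to-SQL (NAACL-HLT 2021)\n* GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing (ICLR 2021)\n* SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL (NeurIPS 2021)\n* GP: Context-free Grammar Pre-training for Text-to-SQL Parsers (arXiv 2021)\n* Relation Aware Semi-autoregressive Semantic Parsing for NL2SQL (arXiv 2021)\n* On Robustness of Neural Semantic Parsers (EACL 2021)\n* MT-Teql: Evaluating and Augmenting Neural NLIDB on Real-world Linguistic and Schema Variations (VLDB 2021)\n* PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models (EMNLP 2021)\n* Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training (AAAI 2021)\n* Towards Robustness of Text-to-SQL Models against Synonym Substitution (ACL 2021)\n* Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization (EMNLP 2021)\n* CodexDB: Generating Code for Processing SQL Queries using GPT-3 Codex (arXiv 2022)\n* S2SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers (arXiv 2022)\n* UNIFIEDSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models (EMNLP 2022)\n* RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL (EMNLP 2022)\n* UNISAR: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL (arXiv 2022)\n* 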
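N-Best Hypotheses Reranking for Text-to-SQL Systems (SLT 2022)\n* Semantic Enhanced Text-to-SQL Parsing via Iteratively Learning Schema Linking Graph (KDD 2022)\n* SeaD: End-to-end Text-to-SQL Generation with Schema-aware Denoising (NAACL-HLT Findings 2022)\n* STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing (EMNLP Findings 2022)\n* Towards Generalizable and Robust Text-to-SQL Parsing (EMNLP Findings 2022)\n* SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers (COLING 2022)\n* Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation (ACL 2022)\n* Evaluating the Text-to-SQL Capabilities of Large Language Models (arXiv 2022)\n* A Survey on Deep Learning Approaches for Text-to-SQL (VLDBJ 2023)\n* GAR: A Generate-and-Rank Approach for Natural Language to SQL Translation (ICDE 2023)\n* Exploring Compositional Generalization in Context-dependent Text-to-SQL Parsing (arXiv 2023)\n* Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques (arXiv 2023)\n* Exploring Chain-of-Thought Style Prompting for Text-to-SQL (arXiv 2023)\n* Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning (SIGMOD 2023)\n* Multitask Pretraining with Structured Knowledge for Text-to-SQL Generation (ACL 2023)\n* Demonstrating GPT-DB: Generate Query-Specific and Customizable Code for SQL Processing with GPT-4 (VLDB Demo 2023)\n* Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing (AAAI 2023)\n* SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (arXiv 2023)\n* Teaching Large Language Models to Self-Debug (arXiv 2023)\n* A Comprehensive Evaluation of ChatGPT's Zero-Shot Text-to-SQL Capability (arXiv 2023)\n* DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction (arXiv 2023)\n* C3: Zero-shot Text-to-SQL with ChatGPT (arXiv 2023)\n* RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL (AAAI 2023)\n* Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness (ICLR 2023)\n* Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL (arXiv 2024)\n* Natural Language to SQL [Repo](https:\u002F\u002Fgithub.com\u002Fyechens\u002FNL2SQL)\n* Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL (VLDB 2024)\n* Awesome-Text2SQL [Repo](https:\u002F\u002Fgithub.com\u002Feosphoros-ai\u002FAwesome-Text2SQL)\n* Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows [Repo](https:\u002F\u002Fgithub.com\u002Fxlang-ai\u002FSpider2) (arXiv 2024)\n\n## SQL Related\n* Query2Vec (ArXiv)\n* Facilitating SQL Query Composition and Analysis (ArXiv 2020)\n* From Natural Language Processing to Neural Databases (VLDB 2021)\n* BERT Meets Relational DB: Contextual Representations of Relational Databases\n* LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning (SIGMOD 2022)\n* PreQR: Pre-training Representation for SQL Understanding (SIGMOD 2022)\n* From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management (VLDB 2022)\n* Query Generation based on Generative Adversarial Networks (arXiv 2023)\n\n## Stars Over Time\n\n[![Stars Over Time](https:\u002F\u002Fstarchart.cc\u002FLumingSun\u002FML4DB-paper-list.svg)](https:\u002F\u002Fstarchart.cc\u002FLumingSun\u002FML4DB-paper-list)","# ML4DB-paper-list Quick Start Guide\n\n`ML4DB-paper-list` 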
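is not a software tool to compile and run; it is a continuously updated **index of academic papers**. It systematically organizes the literature on applying artificial intelligence (machine learning, deep learning, reinforcement learning) to database systems. This guide helps developers obtain the list quickly and browse it locally or online.\n\n## Prerequisites\n\nThe project is mainly a collection of documents and links. It has no complex system dependencies and only needs basic version-control tooling and network access.\n\n*   **Operating system**: Windows \u002F macOS \u002F Linux.\n*   **Dependencies**:\n    *   `Git`: for cloning the repository locally.\n    *   A modern browser (Chrome, Edge, Firefox, etc.): for reading the rendered Markdown or opening paper links.\n    *   (Optional) A Markdown editor such as VS Code or Typora, for offline browsing.\n*   **Network**:\n    *   The repository is hosted on GitHub, so access from mainland China may be unstable. Consider configuring a proxy, or use one of the acceleration\u002Fmirror options described in this guide.\n\n## Installation\n\nYou can get the paper list in either of two ways:\n\n### Option 1: Browse online (recommended)\nOpen the GitHub repository page to see the latest table of contents and links:\n> https:\u002F\u002Fgithub.com\u002FLumingSun\u002FML4DB-paper-list\n\n**Accelerated access from mainland China**:\nIf GitHub is not reliably reachable, try one of the following mirror services or acceleration links:\n1.  **GitHub mirror**: replace `github.com` in the URL above with `mirror.ghproxy.com` (or another available public mirror).\n2.  **Channel mentioned in the repository**: see [azabudai.org](https:\u002F\u002Fazabudai.org\u002Fauth\u002Fregister?code=Z4oa), referenced in the original repository description; registration is required.\n\n### Option 2: Clone locally\nTo read offline or contribute (PR), clone the repository with Git.\n\n```bash\n# 1. Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002FLumingSun\u002FML4DB-paper-list.git\n\n# 2. Enter the directory\ncd ML4DB-paper-list\n\n# [Optional] If cloning is slow from mainland China, use a Gitee mirror (if one exists) or set a git proxy\n# Example: set an HTTP proxy (replace with your actual proxy address)\n# git config --global http.proxy http:\u002F\u002F127.0.0.1:7890\n# git config --global https.proxy https:\u002F\u002F127.0.0.1:7890\n```\n\n## Basic Usage\n\nOnce you have the repository, use `README.md` to locate the resources you need. The list is organized by the core modules of a database system.\n\n### 1. Browse the categories\nOpen `README.md` in the project root. It contains the following top-level categories:\n\n*   **System and Tutorial**: surveys, tutorials, and autonomous database system architectures (e.g. SageDB, openGauss).\n*   **Data Access**: data access layer optimization, including Configuration Tuning, Physical Design, Index, and materialized views.\n*   **Workload**: workload management, covering workload generation, resource auto-scaling, performance diagnosis, and workload drift detection.\n*   **Query Optimization**: the core of query optimization, including Cardinality Estimation, cost estimation, join optimization, and parametric query optimization.\n*   **Query Execution**: the query execution layer, covering sorting, adaptive query processing, and approximate query processing.\n*   **Text-to-SQL**: research on translating natural language to SQL.\n\n### 2. Find papers in a specific area\nSuppose you want to study **reinforcement-learning-based automatic tuning of database parameters**:\n\n1.  In `README.md`, go to the `Data Access` -> `Configuration Tuning` section.\n2.  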
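Browse the papers listed in that section, for example:\n    *   *QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning (VLDB 2019)*\n    *   *An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning (SIGMOD 2019)*\n3.  Click a paper title to go to the paper itself (arXiv, ACM DL, IEEE Xplore, etc.) or to its open-source repository (where marked `[Source Code]`).\n\n### 3. Contribute (PR)\nIf you find a relevant new paper and want to add it to the list:\n\n```bash\n# 1. Create a new branch\ngit checkout -b add-new-paper\n\n# 2. Edit README.md and add the paper entry under the matching category\n# Example format: * ***Paper Title*** (Conference Year) [Link]\n\n# 3. Commit the change\ngit add README.md\ngit commit -m \"Add new paper: [Paper Title]\"\n\n# 4. Push to the remote and open a Pull Request\n# (assumes `origin` points at a fork you can push to)\ngit push origin add-new-paper\n# Then open the PR in the GitHub web interface\n```\n\nWith these steps you can use `ML4DB-paper-list` to keep up with the latest research in AI4DB.","The database team at a large e-commerce company is working to improve query performance in its core transaction system and plans to use machine learning for automatic index recommendation and better cardinality estimation.\n\n### Without ML4DB-paper-list\n- Researchers have to search large academic libraries blindly and struggle to tell which papers actually address long-tail query latency in production.\n- Without a systematic taxonomy, the team can easily miss recent advances in key subareas such as LSM-tree optimization or concurrent cost estimation.\n- Reproducing algorithms from scratch is very costly; because high-quality papers with open-source code, such as UDO, are hard to locate quickly, validation can take months.\n- Emerging directions such as Text-to-SQL lack an authoritative assessment, so it is hard to judge which results are ready for deployment, and compute is easily wasted on low-quality work.\n\n### With ML4DB-paper-list\n- Using the \"Index Recommendation\" and \"Cardinality Estimation\" categories, the team goes straight to the latest state-of-the-art algorithms for high-concurrency scenarios.\n- The clear directory structure lets the team quickly find and adopt recent Adaptive Query Processing approaches, filling a gap in its knowledge.\n- With the source-code links in the list (e.g. UDO), algorithm validation shrinks from months to two weeks, and internal benchmarking is completed quickly.\n- Drawing on community discussion and additions in the Text-to-SQL area, the team avoids immature approaches and focuses its resources on high-return self-tuning modules.\n\nML4DB-paper-list turns a scattered body of literature into a structured technical map, so that making a database intelligent goes from searching for a needle in a haystack to following a map.","https:\u002F\u002Foss.gittoolsai.com\u002Fimages\u002FLumingSun_ML4DB-paper-list_ccc0980c.png","LumingSun","Luming Sun","https:\u002F\u002Foss.gittoolsai.com\u002Favatars\u002FLumingSun_4d41e68e.jpg","Lost in DATABASE","Inspur","Beijing","luming_s@126.com",null,"https:\u002F\u002Flumingsun.github.io\u002F","https:\u002F\u002Fgithub.com\u002FLumingSun",770,93,"2026-04-03T09:29:50","","Not specified",{"notes":92,"python":90,"dependencies":93},"This repository is an academic paper list that organizes research on applying artificial intelligence to database systems. It is not runnable software or a code library, so it has no specific operating system, GPU, memory, Python 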
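version, or dependency requirements. Some of the listed papers link to external source code; for the runtime environment of those independent projects, see their own repositories.",[],[14,18],"2026-03-27T02:49:30.150509","2026-04-06T05:37:36.172121",[],[]]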