A Supervised Framework for Learned Index Selection Under Diverse Workloads and Data Distributions

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


MPhil Thesis Defence


Title: "A Supervised Framework for Learned Index Selection Under Diverse 
Workloads and Data Distributions"

By

Mr. Mengxuan LI


Abstract:

Learning-based indexes have emerged as a promising alternative to 
traditional database indexing structures, leveraging machine learning models 
to optimize performance across diverse workloads. Unlike traditional indexes 
such as B-Trees or Hash Tables, learning-based indexes adapt to the 
underlying data distribution, which can yield better performance across a 
wider range of workloads. Several indexes based on this paradigm have 
recently been proposed and have shown great potential.

However, no single index type performs well across all workloads and data 
distributions. In practice, an index type is usually chosen either through 
extensive manual testing on a specific dataset or based on past experience. 
The former is time-consuming and requires substantial computing resources, 
while the latter is often inaccurate. Moreover, as workloads change over 
time, a single static index rarely achieves good performance for all 
operation types (e.g., queries, inserts, deletes).

In this work, we focus on one-dimensional learned indexes and develop a 
predictive model to recommend the best index for an unseen dataset under a 
given workload, without exhaustive testing. First, we generate diverse 
data distributions through data augmentation, and encode each dataset into a 
feature vector. We then conduct comprehensive performance evaluations of 
multiple indexes and collect scores for each index on different operation 
types. Finally, we train a deep learning model to learn the relationship 
between encoded dataset features and index performance. Given an unseen 
dataset, the model can predict the optimal index.
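
As a rough illustration of the encode-then-recommend pipeline described 
above, the sketch below turns a one-dimensional key set into a fixed-length 
feature vector and picks an index for an unseen dataset. Everything in it is 
an assumption made for illustration and is not taken from the thesis: the 
feature scheme (normalised quantiles plus gap statistics), the names 
encode_dataset and recommend, the labels index_A and index_B, and the 
placeholder scores. A nearest-neighbour lookup stands in for the deep 
learning model so that the example runs with the standard library only.

```python
import random
from statistics import mean, pstdev


def encode_dataset(keys, n_quantiles=8):
    """Encode a 1-D key set as a fixed-length feature vector (hypothetical scheme)."""
    ks = sorted(keys)
    lo, hi = ks[0], ks[-1]
    span = (hi - lo) or 1  # avoid division by zero when all keys are equal
    # Normalised quantiles summarise the shape of the empirical CDF.
    quantiles = [
        (ks[(len(ks) - 1) * i // n_quantiles] - lo) / span
        for i in range(1, n_quantiles)
    ]
    # Gap statistics capture how irregular the spacing between keys is.
    gaps = [(b - a) / span for a, b in zip(ks, ks[1:])]
    return quantiles + [mean(gaps), pstdev(gaps)]


def recommend(features, training_set):
    """Return the top-scoring index of the nearest training dataset.

    A 1-nearest-neighbour lookup used here as a stand-in for a trained model.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, scores = min(training_set, key=lambda row: dist(row[0], features))
    return max(scores, key=scores.get)


rng = random.Random(0)
uniform_keys = [rng.uniform(0, 1e6) for _ in range(1000)]
skewed_keys = [rng.lognormvariate(0, 2) for _ in range(1000)]

# Placeholder scores, not measured results: one score per index per dataset.
training_set = [
    (encode_dataset(uniform_keys), {"index_A": 0.9, "index_B": 0.6}),
    (encode_dataset(skewed_keys), {"index_A": 0.4, "index_B": 0.8}),
]

unseen_keys = [rng.uniform(0, 50) for _ in range(500)]
recommended = recommend(encode_dataset(unseen_keys), training_set)
print(recommended)
```

Because the features are normalised by the key range, the unseen dataset is 
matched on the shape of its distribution, not on its scale. In a fuller 
version, the scores would be kept per operation type and combined according 
to the given workload before the best index is chosen.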


Date:                   Friday, 23 May 2025

Time:                   4:00pm - 6:00pm

Venue:                  Room 2128B
                        Lift 19

Chairman:               Prof. Ke YI

Committee Members:      Prof. Lei CHEN (Supervisor)
                        Dr. Wilfred NG