A Supervised Framework for Learned Index Selection Under Diverse Workloads and Data Distributions
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
MPhil Thesis Defence
Title: "A Supervised Framework for Learned Index Selection Under Diverse
Workloads and Data Distributions"
By
Mr. Mengxuan LI
Abstract:
Learning-based indexes have emerged as a promising alternative to
traditional database indexing structures, leveraging machine learning models
to optimize performance across diverse workloads. Unlike traditional indexes
such as B-trees or hash tables, learning-based indexes adapt to the
underlying data distribution, which can yield better performance across a
wider range of workloads. Several indexes based on this paradigm have
recently been proposed and have shown great potential.
However, no single index type performs well across all workloads and data
distributions. In practice, an index type is usually chosen either through
extensive manual testing on a specific dataset or based on past experience.
The former is time-consuming and requires substantial computing resources,
while the latter is often inaccurate. Moreover, as workloads change over
time, a single static index is unlikely to maintain good performance across
all operation types (e.g., queries, inserts, deletes).
In this work, we focus on one-dimensional learned indexes and develop a
predictive model that recommends the best index for an unseen dataset under
a given workload, without exhaustive testing. First, we generate diverse
data distributions through data augmentation and encode each dataset into a
feature vector. We then conduct comprehensive performance evaluations of
multiple indexes and collect scores for each index on different operation
types. Finally, we train a deep learning model to learn the relationship
between the encoded dataset features and index performance. Given an unseen
dataset, the model predicts which index is expected to perform best.
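
The abstract does not specify the feature encoding or the model
architecture, so the following Python sketch is only an illustration of the
general idea under stated assumptions. It encodes a one-dimensional key set
as a normalized histogram (a hypothetical encoding) and uses a
nearest-neighbour lookup as a simple stand-in for the trained deep learning
model. The names encode_dataset and recommend_index, and the labels
"index_A" and "index_B", are placeholders rather than anything taken from
the thesis, and the labels are not measured results.

```python
import random


def encode_dataset(keys, num_bins=8):
    """Encode keys as a normalized histogram (hypothetical feature vector)."""
    lo, hi = min(keys), max(keys)
    span = (hi - lo) or 1  # avoid division by zero when all keys are equal
    counts = [0] * num_bins
    for k in keys:
        # Map the key to a bin; clamp the maximum key into the last bin.
        b = min(int((k - lo) / span * num_bins), num_bins - 1)
        counts[b] += 1
    return [c / len(keys) for c in counts]


def recommend_index(features, labeled_examples):
    """Return the label of the closest training example.

    This 1-nearest-neighbour rule is an assumed stand-in for the trained
    model described in the abstract, not the thesis's actual method.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(labeled_examples, key=lambda ex: squared_distance(features, ex[0]))
    return best[1]


rng = random.Random(0)

# Two synthetic key distributions standing in for the augmented datasets.
uniform_keys = [rng.uniform(0, 1) for _ in range(10000)]
skewed_keys = [rng.paretovariate(1.5) for _ in range(10000)]  # heavy tail

# Placeholder labels: in the described pipeline these would come from
# benchmarking each candidate index on each dataset and workload.
training = [
    (encode_dataset(uniform_keys), "index_A"),
    (encode_dataset(skewed_keys), "index_B"),
]

# An unseen dataset with a different key range but a uniform shape.
unseen_keys = [rng.uniform(50, 60) for _ in range(5000)]
recommendation = recommend_index(encode_dataset(unseen_keys), training)
print(recommendation)
```

Because the histogram is computed over the normalized key range, the unseen
dataset is matched by the shape of its distribution rather than by its
absolute key values, which is what lets a recommendation be made without
benchmarking every index on the new data.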
Date: Friday, 23 May 2025
Time: 4:00pm - 6:00pm
Venue: Room 2128B (Lift 19)
Chairman: Prof. Ke YI
Committee Members: Prof. Lei CHEN (Supervisor), Dr. Wilfred NG