A Supervised Framework for Learned Index Selection Under Diverse Workloads and Data Distributions
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

MPhil Thesis Defence

Title: "A Supervised Framework for Learned Index Selection Under Diverse Workloads and Data Distributions"

By Mr. Mengxuan LI

Abstract:

Learning-based indexes have emerged as a promising alternative to traditional database indexing structures, leveraging machine learning models to optimize performance across diverse workloads. Unlike traditional indexes such as B-Trees or hash tables, learning-based indexes adapt to the underlying data distribution, which can allow them to perform better across a wider range of workloads. Recently, several indexes based on this paradigm have been proposed and have shown great potential. However, no single index type is effective for all workloads and data distributions. In practice, an index type is usually chosen either through extensive manual testing on a specific dataset or based on past experience. The former is time-consuming and requires substantial computing resources, while the latter is inaccurate. Moreover, as workloads change over time, it is difficult for a single statically chosen index to perform well for all operation types (e.g., queries, inserts, deletes).

In this work, we focus on one-dimensional learned indexes and develop a predictive model that recommends the best index for an unseen dataset under a given workload, without exhaustive testing. First, we generate diverse data distributions through data augmentation and encode each dataset into a feature vector. We then conduct comprehensive performance evaluations of multiple indexes and collect scores for each index on different operation types. Finally, we train a deep learning model to learn the relationship between the encoded dataset features and index performance. Given an unseen dataset, the model can predict the optimal index.
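The pipeline described in the abstract (encode a dataset, label it with the best-scoring index, predict for an unseen dataset) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's method: the equal-width histogram encoding, the function names `encode_dataset`, `best_index`, and `recommend`, and the index names `index_A` and `index_B` are all hypothetical, and a nearest-neighbour lookup stands in for the deep learning model.

```python
import math
import random


def encode_dataset(keys, bins=8):
    """Encode 1-D keys as a fixed-length vector: the fraction of keys
    falling into each of `bins` equal-width buckets over the key range.
    (Assumed encoding; the abstract does not specify the features.)"""
    lo, hi = min(keys), max(keys)
    span = (hi - lo) or 1  # avoid division by zero when all keys are equal
    hist = [0] * bins
    for k in keys:
        bucket = min(int((k - lo) / span * bins), bins - 1)
        hist[bucket] += 1
    return [count / len(keys) for count in hist]


def best_index(scores):
    """Turn per-index benchmark scores (higher is better) into a label."""
    return max(scores, key=scores.get)


def recommend(features, training_set):
    """Nearest-neighbour stand-in for the trained model: return the label
    of the training feature vector closest to `features`."""
    nearest = min(training_set, key=lambda pair: math.dist(pair[0], features))
    return nearest[1]


if __name__ == "__main__":
    rng = random.Random(0)
    uniform = [rng.uniform(0, 1) for _ in range(1000)]
    skewed = [rng.expovariate(5.0) for _ in range(1000)]
    # Placeholder labels for illustration only, not measured results.
    # In a real pipeline each label would come from best_index(scores)
    # applied to benchmark scores collected for that dataset.
    training_set = [
        (encode_dataset(uniform), "index_A"),
        (encode_dataset(skewed), "index_B"),
    ]
    unseen = [rng.uniform(0, 1) for _ in range(1000)]
    print(recommend(encode_dataset(unseen), training_set))
```

In this sketch the expensive step, benchmarking every index on every training dataset, happens once offline; recommending an index for a new dataset then needs only the cheap feature encoding and a model lookup.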
Date: Friday, 23 May 2025
Time: 4:00pm - 6:00pm
Venue: Room 2128B, Lift 19

Chairman: Prof. Ke YI
Committee Members: Prof. Lei CHEN (Supervisor)
                   Dr. Wilfred NG