Bridging Database and AI: From Efficient Vector Retrieval to Intelligent Data System

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Bridging Database and AI: From Efficient Vector Retrieval to 
Intelligent Data System"

By

Miss Yao TIAN


Abstract:

Modern database management systems (DBMSs) are undergoing a transformative 
evolution, driven by two complementary trends: (1) the growing demand for 
efficient semantic similarity search on high-dimensional embedding vectors to 
power artificial intelligence (AI) applications; and (2) the increasing 
adoption of AI techniques to enhance DBMS intelligence, performance, and 
autonomy. This dissertation makes fundamental contributions to both 
directions of this evolution.

To support AI applications with more capable DBMS infrastructure, we propose 
a series of vector indexing techniques that push the theoretical and 
practical boundaries of similarity search. Our work includes: (i) a novel 
locality-sensitive hashing (LSH) framework with a query-centric dynamic 
bucketing strategy, which achieves the best-known query complexity for
arbitrary approximation ratios; (ii) an early-termination-driven incremental
LSH variant that enhances query quality and speeds up lookup in both theory 
and practice; and (iii) a hybrid LSH-Graph index that combines the strengths 
of both paradigms for superior real-world performance.

To enhance DBMS
functionality using AI techniques, we design several learned components that 
automate and optimize critical DBMS operations. Our innovations include: (i) 
a learned index that leverages data distribution to address the long-standing 
challenge of similarity search in mid-to-high dimensional spaces; (ii) a
learned filter for efficient membership testing over sliding-window data 
streams, particularly valuable for reducing the cost of repeated large 
language model (LLM) calls; and (iii) a learned query optimizer capable of 
intelligently exploring execution plans while ensuring no performance 
regressions.

Together, these efforts tackle the emerging challenges in the database field 
shaped by the rapid advancement of LLMs, and point toward the future of 
intelligent DBMSs that are deeply integrated with modern AI workloads.


Date:                   Thursday, 5 June 2025

Time:                   2:00pm - 4:00pm

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               Dr. Ki Ling CHEUNG (ISOM)

Committee Members:      Prof. Xiaofang ZHOU (Supervisor)
                        Prof. Cunsheng DING
                        Prof. Ke YI
                        Dr. Jia LIU (MARK)
                        Prof. Reynold C.K. CHENG (HKU)