The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Bridging Database and AI: From Efficient Vector Retrieval to
Intelligent Data System"
By
Miss Yao TIAN
Abstract:
Modern database management systems (DBMSs) are undergoing a transformative
evolution, driven by two complementary trends: (1) the growing demand for
efficient semantic similarity search on high-dimensional embedding vectors to
power artificial intelligence (AI) applications; and (2) the increasing
adoption of AI techniques to enhance DBMS intelligence, performance, and
autonomy. This dissertation makes fundamental contributions to both
directions of this evolution.
To support AI applications with more capable DBMS infrastructure, we propose
a series of vector indexing techniques that push the theoretical and
practical boundaries of similarity search. Our work includes: (i) a novel
locality-sensitive hashing (LSH) framework with a query-centric dynamic
bucketing strategy, which achieves the best-known query complexity for
arbitrary approximation ratios; (ii) an early-termination-driven incremental
LSH variant that enhances query quality and speeds up lookup in both theory
and practice; and (iii) a hybrid LSH-Graph index that combines the strengths
of both paradigms for superior real-world performance. To enhance DBMS
functionality using AI techniques, we design several learned components that
automate and optimize critical DBMS operations. Our innovations include: (i)
a learned index that leverages data distribution to address the long-standing
challenge of similarity search in mid-to-high dimensional spaces; (ii) a
learned filter for efficient membership testing over sliding-window data
streams, particularly valuable for reducing the cost of repeated large
language model (LLM) calls; and (iii) a learned query optimizer capable of
intelligently exploring execution plans while ensuring no performance
regressions.
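To make the phrase "query-centric dynamic bucketing" in contribution (i) more concrete, here is a minimal Python sketch of the general idea under stated assumptions. It is not the algorithm from the thesis. Points are projected with k Gaussian random vectors (a standard p-stable LSH family). Instead of using fixed, pre-partitioned buckets, each round treats an axis-aligned box centred on the query's own projection as the bucket and grows it by a factor c until a close enough point is found. The function names, the parameters w, c, r0, t and max_rounds, and the linear scan that stands in for a real spatial index are all hypothetical choices made for illustration.

```python
import numpy as np


def build_projections(data, k, seed=0):
    """Project every point with k Gaussian random vectors (p-stable LSH family)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((data.shape[1], k))  # hypothetical: one table of k hash functions
    return a, data @ a


def query_centric_ann(data, proj, a, q, w=4.0, c=1.5, r0=1.0, max_rounds=20, t=10):
    """Approximate nearest-neighbour search with buckets centred on the query.

    Each round, the bucket is the box of half-width w*r/2 around q's projection.
    The search stops once a point within distance c*r has been found or t
    candidates have been verified. Otherwise r is multiplied by c and the
    enlarged box is searched.
    """
    qp = q @ a
    r = r0
    seen = set()
    best_idx, best_dist = -1, np.inf
    for _ in range(max_rounds):
        half = w * r / 2.0
        # Linear scan over projections; a real index would answer this box query.
        inside = np.all(np.abs(proj - qp) <= half, axis=1)
        for i in np.flatnonzero(inside):
            if i in seen:
                continue  # already verified in an earlier, smaller box
            seen.add(i)
            d = float(np.linalg.norm(data[i] - q))  # verify with the true distance
            if d < best_dist:
                best_idx, best_dist = int(i), d
        if best_dist <= c * r or len(seen) >= t:
            break
        r *= c  # enlarge the query-centred bucket
    return best_idx, best_dist


rng = np.random.default_rng(1)
data = rng.standard_normal((200, 16))
a, proj = build_projections(data, k=4)
print(query_centric_ann(data, proj, a, data[7]))
```

Because the bucket is always centred on the query, a near neighbour cannot be missed merely because it falls just across a fixed bucket boundary. This is the usual motivation for query-centric designs over static hash partitions.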
Together, these efforts tackle emerging challenges in the database field
shaped by the rapid advancement of LLMs, and point toward a future of
intelligent DBMSs that are deeply integrated with modern AI workloads.
Date: Thursday, 5 June 2025
Time: 2:00pm - 4:00pm
Venue: Room 3494 (Lifts 25/26)
Chairman: Dr. Ki Ling CHEUNG (ISOM)
Committee Members: Prof. Xiaofang ZHOU (Supervisor)
Prof. Cunsheng DING
Prof. Ke YI
Dr. Jia LIU (MARK)
Prof. Reynold C.K. CHENG (HKU)