Bridging Database and AI: From Efficient Vector Retrieval to Intelligent Data System
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Bridging Database and AI: From Efficient Vector Retrieval to Intelligent Data System"

By

Miss Yao TIAN

Abstract:

Modern database management systems (DBMSs) are undergoing a transformative evolution, driven by two complementary trends: (1) the growing demand for efficient semantic similarity search on high-dimensional embedding vectors to power artificial intelligence (AI) applications; and (2) the increasing adoption of AI techniques to enhance DBMS intelligence, performance, and autonomy. This dissertation makes fundamental contributions to both directions of this evolution.

To support AI applications with more capable DBMS infrastructure, we propose a series of vector indexing techniques that push the theoretical and practical boundaries of similarity search. Our work includes: (i) a novel locality-sensitive hashing (LSH) framework with a query-centric dynamic bucketing strategy, which achieves the best-known query complexity for arbitrary approximation ratios; (ii) an early-termination-driven incremental LSH variant that enhances query quality and speeds up lookup in both theory and practice; and (iii) a hybrid LSH-Graph index that combines the strengths of both paradigms for superior real-world performance.

To enhance DBMS functionality using AI techniques, we design several learned components that automate and optimize critical DBMS operations. Our innovations include: (i) a learned index that leverages data distribution to address the long-standing challenge of similarity search in mid-to-high dimensional spaces; (ii) a learned filter for efficient membership testing over sliding-window data streams, particularly valuable for reducing the cost of repeated large language model (LLM) calls; and (iii) a learned query optimizer capable of intelligently exploring execution plans while ensuring no performance regressions.

Together, these efforts tackle the emerging challenges in the database field shaped by the rapid advancement of LLMs, and point toward the future of intelligent DBMSs that are deeply integrated with modern AI workloads.

Date: Thursday, 5 June 2025
Time: 2:00pm - 4:00pm
Venue: Room 3494, Lifts 25/26

Chairman: Dr. Ki Ling CHEUNG (ISOM)

Committee Members:
Prof. Xiaofang ZHOU (Supervisor)
Prof. Cunsheng DING
Prof. Ke YI
Dr. Jia LIU (MARK)
Prof. Reynold C.K. CHENG (HKU)