More about HKUST
Efficient High-Dimensional Search in Modern Vector Databases
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Efficient High-Dimensional Search in Modern Vector Databases"
By
Mr. Xi ZHAO
Abstract:
The rapid expansion of vector datasets, driven by the widespread adoption of
embedding models for processing unstructured data, has significantly amplified
the importance of high-dimensional search and the role of vector databases.
However, the vector search problem in high-dimensional spaces is inherently
computationally expensive due to the curse of dimensionality. As a result,
finding approximate solutions has become a more efficient and viable
alternative. Modern vector databases commonly address two types of similarity
search problems: Approximate Nearest Neighbor Search (ANNS) and Approximate
Maximum Inner Product Search (AMIPS). Various methods have been developed for
efficient vector search in modern vector databases, including locality-sensitive
hashing (LSH) based approaches and graph-based techniques.
This thesis investigates the performance of LSH-based and graph-based methods,
evaluating factors such as index cost (index size and indexing time), query
latency, and query accuracy. To support efficient vector search in modern vector
databases while maintaining acceptable indexing costs, we propose a series of
vector indexing techniques that advance both the theoretical and practical
boundaries of similarity search. Our contributions include: (i) DB-LSH, a
query-centric LSH framework for ANNS that reduces index size while preserving
sub-linear query complexity and improving candidate quality through dynamic
bucketing; (ii) LSH-APG, a hybrid indexing strategy that significantly reduces the
construction cost of approximate proximity graphs for ANNS by integrating a
lightweight LSH framework; (iii) FARGO, an LSH-based AMIPS framework that
improves accuracy and efficiency through asymmetric transformation, global
multi-probing, and adaptive termination; (iv) NAP, a norm-adaptive partitioning
strategy with hybrid index structures tailored to different norm regions for
accurate and efficient AMIPS; and (v) SOSIA, an innovative framework for AMIPS
over sparse vectors that converts sparse vectors into a binary space and provides
an unbiased estimator for the inner product between any two vectors.
Together, these efforts address emerging challenges in vector databases, shaped
by the rapid advancements in large language models (LLMs), and pave the way for
the future of intelligent vector management, seamlessly integrated with modern AI
workloads.
Date: Thursday, 9 April 2026
Time: 9:00am - 11:00am
Venue: Room 2132C
Lift 22
Chairman: Dr. Qiming SHAO (ECE)
Committee Members: Prof. Xiaofang ZHOU (Supervisor)
Prof. Raymond WONG
Prof. Ke YI
Dr. Sirui HAN (EMIA)
Prof. Yunjun GAO (Zhejiang University)