PhD Thesis Proposal Defence
Title: "Efficient High-Dimensional Search in Modern Vector Databases"
by
Mr. Xi ZHAO
Abstract:
The rapid expansion of vector datasets, driven by the widespread adoption of
embedding models for processing unstructured data, has significantly
amplified the importance of high-dimensional search and the role of vector
databases. However, the vector search problem in high-dimensional spaces is
inherently computationally expensive due to the curse of dimensionality. As
a result, finding approximate solutions has become a more efficient and
practical alternative to exact search. Modern vector databases commonly address two types of
similarity search problems: Approximate Nearest Neighbor Search (ANNS) and
Approximate Maximum Inner Product Search (AMIPS). Various methods have been
developed for efficient vector search, including locality-sensitive hashing
(LSH) based approaches and graph-based techniques.
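As background on the LSH-based approaches mentioned above, the following minimal Python sketch (a generic illustration, not any of the indexes proposed in this thesis; the function names and the use of NumPy are assumptions) shows sign-of-random-projection hashing, in which vectors that are close in angle tend to fall into the same hash bucket:

    import numpy as np

    def build_lsh_buckets(X, num_bits=16, seed=0):
        # Sign-of-random-projection (hyperplane) LSH: each bit records which
        # side of a random hyperplane a vector lies on, so angularly close
        # vectors tend to share many bits and land in the same bucket.
        rng = np.random.default_rng(seed)
        planes = rng.standard_normal((X.shape[1], num_bits))
        codes = (X @ planes > 0).astype(np.uint8)       # one bit string per vector
        buckets = {}
        for i, code in enumerate(map(tuple, codes)):
            buckets.setdefault(code, []).append(i)      # bucket key -> vector ids
        return planes, buckets

    def query_bucket(q, planes, buckets):
        # Look up only the query's own bucket; practical systems also probe
        # nearby buckets to trade extra latency for higher recall.
        code = tuple((q @ planes > 0).astype(np.uint8))
        return buckets.get(code, [])

Graph-based methods take a different route, greedily navigating a proximity graph from an entry point toward the query, which typically yields lower query latency at high recall but at a higher index construction cost.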
This thesis investigates the performance of LSH-based and graph-based
methods, evaluating factors such as index cost (index size, indexing time),
query latency, and query accuracy. To support efficient vector search in
modern vector databases while maintaining acceptable index costs, we propose
a series of vector indexing techniques that push the boundaries of both
theoretical and practical similarity search. Our contributions include: (i)
a novel graph construction strategy that significantly reduces the indexing
time of approximate proximity graphs (APGs) for ANNS by integrating a
lightweight LSH framework; (ii) an asymmetric transformation and an adaptive
hash-bucket probing strategy that improve both the accuracy and efficiency
of AMIPS queries; (iii) a new
norm-adaptive partitioning strategy and hybrid index structures tailored to
the unique characteristics of different partitions, optimizing their
respective sizes and norm distributions for accurate and efficient AMIPS;
and (iv) an innovative framework to address the challenges posed by sparse
vectors in AMIPS, which converts sparse vectors into a binary space and
provides an unbiased estimator for the inner product between any two
vectors.
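To make the idea of an asymmetric transformation in contribution (ii) concrete, the classic norm-augmentation reduction from MIPS to Euclidean nearest-neighbor search is sketched below as background (a standard construction assuming a finite maximum data norm, using NumPy; it is not claimed to be the specific transformation developed in this thesis):

    import numpy as np

    def augment_data(X):
        # Append sqrt(M^2 - ||x||^2) to every data vector, where M is the
        # largest data norm, so every augmented vector has norm exactly M.
        norms = np.linalg.norm(X, axis=1)
        M = norms.max()
        extra = np.sqrt(np.maximum(M**2 - norms**2, 0.0))
        return np.hstack([X, extra[:, None]])

    def augment_query(q):
        # Queries get a zero in the extra coordinate (hence "asymmetric"):
        # ||q' - x'||^2 = ||q||^2 + M^2 - 2<q, x>, so the Euclidean nearest
        # neighbor of q' among the augmented data maximizes <q, x>.
        return np.append(q, 0.0)

Under such a reduction, any ANNS index can in principle serve AMIPS queries, which is why transformations of this kind are naturally paired with bucket probing strategies on the LSH side.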
Together, these efforts address emerging challenges in vector databases
shaped by the rapid advancement of large language models (LLMs), and pave
the way for intelligent vector data management that is seamlessly integrated
with modern AI workloads.
Date: Tuesday, 20 January 2026
Time: 10:00am - 12:00noon
Venue: Room 3494
Lifts 25/26
Committee Members: Prof. Xiaofang Zhou (Supervisor)
Prof. Raymond Wong (Chairperson)
Prof. Ke Yi