More about HKUST
Towards Efficient and Scalable Retrieval in Metric Spaces
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Towards Efficient and Scalable Retrieval in Metric Spaces"
By
Mr. Zengyang GONG
Abstract:
In the era of data explosion, characterized by increasing volume, velocity,
and variety, efficiently managing diverse data types is a critical challenge.
Metric spaces provide a powerful and unified framework for handling diverse
data types, including spatio-temporal data and high-dimensional vectors, by
defining distance metrics that satisfy the triangle inequality. This property
enables effective information retrieval across various data representations.
However, as data scales exponentially, efficient query processing becomes a
major bottleneck, necessitating scalable retrieval methods. Advanced
retrieval techniques are essential for enabling real-world applications, from
real-time location-based services to high-dimensional vector search for
retrieval-augmented generation (RAG).
This thesis investigates retrieval methods across various representative
real-world search problems, each reflecting distinct application scenarios.
First, for the origin-destination search problem, we model real-world road
networks as time-dependent graphs and introduce a novel shortcut selection
problem based on tree decomposition. This approach transforms the graph into
a tree structure, and we prove the selection problem is NP-hard. We further
present a solution with theoretical performance guarantees to accelerate
retrieval. Second, for the optimal routing problem in shared mobility
services, we focus on minimizing additional travel time required to serve new
requests (e.g. passengers or parcels) for workers (e.g. drivers or couriers).
To improve efficiency, we design a novel data summarization model that
reduces retrieval time from cubic to linear and enables constant-time delay
queries along routes. Third, for the similarity search problem, which has
gained importance with the rise of large language models (LLMs) and
retrieval-augmented generation. We first propose a dimensionality reduction
technique that preserves distance approximation guarantees between
high-dimensional vectors. Additionally, we develop a data structure that
safely prunes redundant distance calculations, further accelerating
retrieval. Finally, extensive experiments on real datasets demonstrate that
our proposed solutions consistently outperform state-of-the-art algorithms by
significant margins.
Date: Friday, 10 October 2025
Time: 9:00am - 11:00am
Venue: Room 2408
Lifts 17/18
Chairman: Dr. June SHI (MARK)
Committee Members: Prof. Lei CHEN (Supervisor)
Prof. Qiong LUO
Prof. Raymond WONG
Prof. Can YANG (MATH)
Prof. Jianliang XU (HKBU)