Towards Efficient and Scalable Retrieval in Metric Spaces

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Towards Efficient and Scalable Retrieval in Metric Spaces"

By

Mr. Zengyang GONG


Abstract:

In the era of data explosion, characterized by increasing volume, velocity, 
and variety, efficiently managing diverse data types is a critical challenge. 
Metric spaces provide a powerful and unified framework for handling diverse 
data types, including spatio-temporal data and high-dimensional vectors, by 
defining distance metrics that satisfy the triangle inequality. This property 
enables effective information retrieval across various data representations. 
However, as data scales exponentially, efficient query processing becomes a 
major bottleneck, necessitating scalable retrieval methods. Advanced 
retrieval techniques are essential for enabling real-world applications, from 
real-time location-based services to high-dimensional vector search for 
retrieval-augmented generation (RAG).

This thesis investigates retrieval methods across various representative 
real-world search problems, each reflecting distinct application scenarios. 
First, for the origin-destination search problem, we model real-world road 
networks as time-dependent graphs and introduce a novel shortcut selection 
problem based on tree decomposition. This approach transforms the graph into 
a tree structure, and we prove the selection problem is NP-hard. We further 
present a solution with theoretical performance guarantees to accelerate 
retrieval. Second, for the optimal routing problem in shared mobility 
services, we focus on minimizing additional travel time required to serve new 
requests (e.g. passengers or parcels) for workers (e.g. drivers or couriers). 
To improve efficiency, we design a novel data summarization model that 
reduces retrieval time from cubic to linear and enables constant-time delay 
queries along routes. Third, for the similarity search problem, which has 
gained importance with the rise of large language models (LLMs) and 
retrieval-augmented generation. We first propose a dimensionality reduction 
technique that preserves distance approximation guarantees between 
high-dimensional vectors. Additionally, we develop a data structure that 
safely prunes redundant distance calculations, further accelerating 
retrieval. Finally, extensive experiments on real datasets demonstrate that 
our proposed solutions consistently outperform state-of-the-art algorithms by 
significant margins.


Date:                   Friday, 10 October 2025

Time:                   9:00am - 11:00am

Venue:                  Room 2408
                        Lifts 17/18

Chairman:               Dr. June SHI (MARK)

Committee Members:      Prof. Lei CHEN (Supervisor)
                        Prof. Qiong LUO
                        Prof. Raymond WONG
                        Prof. Can YANG (MATH)
                        Prof. Jianliang XU (HKBU)