The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Efficient and Accurate Visual Relocalization with Deep Neural Networks"

By

Mr. Changkun LIU


Abstract:

Accurate visual relocalization is a critical problem in many applications,
such as mobile robotics, augmented reality, virtual reality, and autonomous
driving. This thesis advances the field by enhancing the accuracy,
robustness, and computational efficiency of visual relocalization systems.
It leverages state-of-the-art techniques, including neural radiance fields
(NeRF), 3D Gaussian splatting (3DGS), and 3D vision foundation models, to
develop accurate, efficient, and practical solutions for camera
relocalization. Traditional structure-based hierarchical localization (HLoc)
pipelines can achieve high accuracy but often rely on complex frameworks and
incur substantial runtime costs. The first study (AIR-HLoc) addresses this
limitation by introducing an adaptive strategy that dynamically adjusts the
number of retrieved images, improving computational efficiency while
maintaining accuracy. In parallel, end-to-end regression methods based on
convolutional neural networks and transformers have gained increasing
attention, as they directly learn to predict camera poses or scene
coordinates from input images without relying on explicit scene priors such
as 3D point clouds or maps. However, such methods often exhibit limited
generalization, leading to unreliable predictions that fall short of the
accuracy achieved by structure-based approaches.

The thesis introduces two novel frameworks to address these issues: an
uncertainty-aware hierarchical pose refinement framework (HR-APR) and an
efficient pose refinement framework (GS-CPR) that leverages 3DGS and 3D
vision foundation models. These frameworks help end-to-end regression
methods better estimate the confidence of their predictions and improve the
accuracy of their pose predictions across diverse benchmarks.

Existing 3D vision foundation models make inefficient use of dense point data
and fail to deliver accurate metric pose estimation. To address these
limitations, this thesis introduces Plana3R, a novel 3D vision foundation
model designed for zero-shot, feed-forward metric indoor 3D reconstruction
and pose estimation. The model leverages planar primitives as a compact
representation tailored to man-made environments.


Date:                   Friday, 31 October 2025

Time:                   11:00am - 1:00pm

Venue:                  Room 4475
                        Lifts 25/26

Chairman:               Prof. Weichuan YU (ECE)

Committee Members:      Dr. Tristan BRAUD (Supervisor)
                        Prof. Pedro SANDER
                        Prof. Sai-Kit YEUNG
                        Dr. Yuan LIU (ISD)
                        Prof. Hong ZHANG (SUSTech)