Efficient and Accurate Visual Relocalization with Deep Neural Networks
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Efficient and Accurate Visual Relocalization with Deep Neural Networks"

By Mr. Changkun LIU

Abstract:

Accurate visual relocalization is a critical problem in many applications, such as mobile robots, augmented reality, virtual reality, and autonomous driving. This thesis advances the field by improving the accuracy, robustness, and computational efficiency of visual relocalization systems. It leverages state-of-the-art techniques, including neural radiance fields (NeRF), 3D Gaussian splatting (3DGS), and 3D vision foundation models, to develop accurate, efficient, and practical solutions for camera relocalization.

Traditional structure-based hierarchical localization (HLoc) pipelines can achieve high accuracy but often rely on complex frameworks and incur substantial runtime costs. The first study, AIR-HLoc, addresses this limitation by introducing an adaptive strategy that dynamically adjusts the number of retrieved images, improving computational efficiency while maintaining accuracy (see the illustrative sketch below).

In parallel, end-to-end regression methods based on convolutional neural networks and transformers have gained increasing attention, as they learn to predict camera poses or scene coordinates directly from input images, without relying on explicit scene priors such as 3D point clouds or maps. However, such methods often exhibit limited generalization, leading to unreliable predictions and falling short of the accuracy achieved by structure-based approaches. The thesis introduces two novel frameworks to address these issues: an uncertainty-aware hierarchical pose refinement framework (HR-APR), and an efficient pose refinement framework (GS-CPR) that leverages 3DGS and 3D vision foundation models. These frameworks improve the confidence estimation of end-to-end regression methods and refine their pose predictions, yielding higher accuracy across diverse benchmarks.

Existing 3D vision foundation models make inefficient use of dense point data and fail to deliver accurate metric pose estimation. To address these limitations, this thesis introduces Plana3R, a novel 3D vision foundation model for zero-shot, feed-forward metric indoor 3D reconstruction and pose estimation. The model leverages planar primitives as a compact representation tailored to man-made environments.

Date: Friday, 31 October 2025
Time: 11:00am - 1:00pm
Venue: Room 4475 (Lifts 25/26)

Chairman: Prof. Weichuan YU (ECE)

Committee Members:
Dr. Tristan BRAUD (Supervisor)
Prof. Pedro SANDER
Prof. Sai-Kit YEUNG
Dr. Yuan LIU (ISD)
Prof. Hong ZHANG (SUSTech)
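To make the adaptive retrieval idea described in the abstract concrete, the sketch below shows one way a hierarchical localization pipeline might choose how many retrieved reference images to match per query: few for distinctive queries, more for ambiguous ones. This is a minimal illustration, not AIR-HLoc's actual method; the confidence heuristic, thresholds, and names (adaptive_k, easy_threshold) are assumptions chosen for clarity.

```python
# Illustrative sketch: adapt the number of retrieved images per query
# in a hierarchical localization pipeline. All heuristics below are
# assumptions for illustration, not the thesis's implementation.
import numpy as np

def adaptive_k(similarities, k_min=3, k_max=20, easy_threshold=0.8):
    """Pick how many top retrieved database images to match.

    similarities: cosine similarities between the query's global
    descriptor and the database descriptors, sorted descending.
    """
    top = float(similarities[0])
    # Confident retrieval (distinctive query): a few images suffice.
    if top >= easy_threshold:
        return k_min
    # Ambiguous retrieval: scale k up with the confidence shortfall.
    shortfall = (easy_threshold - top) / easy_threshold
    return min(k_max, k_min + int(shortfall * (k_max - k_min)))

# Usage: scores from a global-descriptor search (e.g. NetVLAD-style).
sims = np.array([0.62, 0.59, 0.58, 0.41, 0.33])
k = adaptive_k(sims)
print(f"match query against top-{k} retrieved images")
```

The intuition: easy queries keep the local feature-matching stage cheap, while hard queries spend extra runtime only where it is likely to pay off in accuracy, which is the efficiency/accuracy trade-off the abstract attributes to the adaptive strategy.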