Efficient and Accurate Visual Relocalization with Deep Neural Networks

PhD Thesis Proposal Defence


Title: "Efficient and Accurate Visual Relocalization with Deep Neural 
Network"

by

Mr. Changkun LIU


Abstract:

Accurate visual relocalization is a crucial problem in many applications, 
such as mobile robotics, augmented reality, virtual reality, and autonomous 
driving. This thesis advances the field by enhancing the accuracy, robustness, 
and computational efficiency of visual relocalization systems.

This research leverages state-of-the-art techniques—including neural radiance 
fields (NeRF), 3D Gaussian splatting (3DGS), and 3D vision foundation 
models—to develop accurate, efficient, and practical solutions for camera 
relocalization. Traditional structure-based hierarchical localization (HLoc) 
pipelines can achieve high accuracy but often rely on complex frameworks and 
incur substantial runtime costs. The first study (AIR-HLoc) addresses this 
limitation by introducing an adaptive strategy that dynamically adjusts the 
number of retrieved images, thereby improving computational efficiency while 
maintaining accuracy. In parallel, end-to-end regression methods based on 
convolutional neural networks and transformers have gained prominence, as 
they directly learn to predict camera poses or scene coordinates from input 
images, further expanding the design space of localization approaches. 
However, they often struggle to match the accuracy of structure-based 
approaches. To address this gap, the thesis introduces two novel frameworks: 
an uncertainty-aware hierarchical pose refinement framework (HR-APR) and an 
efficient pose refinement framework (GS-CPR) that integrates 3DGS and 3D 
vision foundation models. Both frameworks significantly improve the pose 
estimation accuracy of end-to-end regression approaches across various 
benchmarks.

Finally, existing 3D vision foundation models make inefficient use of dense 
point data and fail to deliver accurate metric pose estimation. To address these
limitations, this thesis introduces Plana3R, a novel 3D vision foundation 
model designed for zero-shot, feed-forward metric indoor 3D reconstruction 
and pose estimation. The model leverages planar primitives as a compact 
representation tailored to man-made environments.


Date:                   Wednesday, 24 September 2025

Time:                   2:00pm - 4:00pm

Venue:                  Room 5501
                        Lifts 25/26

Join Zoom Meeting:
https://hkust.zoom.us/j/96886475205?pwd=On4NuQrg7uoCxXVaNObE1YNWvzHl3E.1
Meeting ID:             968 8647 5205
Password:               139237

Committee Members:      Dr. Tristan Braud (Supervisor)
                        Dr. Dan Xu (Chairperson)
                        Prof. Sai-Kit Yeung