PhD Thesis Proposal Defence
Title: "Efficient and Accurate Visual Relocalization with Deep Neural Network"
by
Mr. Changkun LIU
Abstract:
Accurate visual relocalization is crucial for many applications, such as
mobile robotics, augmented reality, virtual reality, and autonomous driving.
This thesis advances the field by enhancing the accuracy, robustness, and
computational efficiency of visual relocalization systems.
This research leverages state-of-the-art techniques—including neural radiance
fields (NeRF), 3D Gaussian splatting (3DGS), and 3D vision foundation
models—to develop accurate, efficient, and practical solutions for camera
relocalization. Traditional structure-based hierarchical localization (HLoc)
pipelines can achieve high accuracy but often rely on complex frameworks and
incur substantial runtime costs. The first study (AIR-HLoc) addresses this
limitation by introducing an adaptive strategy that dynamically adjusts the
number of retrieved images, thereby improving computational efficiency while
maintaining accuracy. In parallel, end-to-end regression methods based on
convolutional neural networks and transformers have gained prominence, as
they directly learn to predict camera poses or scene coordinates from input
images, further expanding the design space of localization approaches.
However, they often struggle to match the accuracy of structure-based
approaches. To address this, the thesis introduces two novel frameworks: an
uncertainty-aware hierarchical pose refinement framework (HR-APR) and an
efficient pose refinement framework (GS-CPR) that integrates 3DGS and 3D
vision foundation models. Both frameworks significantly improve the pose
estimation accuracy of end-to-end regression approaches across various
benchmarks.
Existing 3D vision foundation models make inefficient use of dense point data
and fail to deliver accurate metric pose estimation. To address these
limitations, this thesis introduces Plana3R, a novel 3D vision foundation
model designed for zero-shot, feed-forward metric indoor 3D reconstruction
and pose estimation. The model leverages planar primitives as a compact
representation tailored to man-made environments.
Date: Wednesday, 24 September 2025
Time: 2:00pm - 4:00pm
Venue: Room 5501
Lifts 25/26
Join Zoom Meeting:
https://hkust.zoom.us/j/96886475205?pwd=On4NuQrg7uoCxXVaNObE1YNWvzHl3E.1
Meeting ID: 968 8647 5205
Password: 139237
Committee Members: Dr. Tristan Braud (Supervisor)
Dr. Dan Xu (Chairperson)
Prof. Sai-Kit Yeung