Efficient and Accurate Visual Relocalization with Deep Neural Network
PhD Thesis Proposal Defence

Title: "Efficient and Accurate Visual Relocalization with Deep Neural Network"

by

Mr. Changkun LIU

Abstract:

Accurate visual relocalization is a crucial problem in many applications, such as mobile robots, augmented reality, virtual reality, and autonomous driving. This thesis advances the field by enhancing the accuracy, robustness, and computational efficiency of visual relocalization systems. The research leverages state-of-the-art techniques, including neural radiance fields (NeRF), 3D Gaussian splatting (3DGS), and 3D vision foundation models, to develop accurate, efficient, and practical solutions for camera relocalization.

Traditional structure-based hierarchical localization (HLoc) pipelines can achieve high accuracy but often rely on complex frameworks and incur substantial runtime costs. The first study (AIR-HLoc) addresses this limitation by introducing an adaptive strategy that dynamically adjusts the number of retrieved images, improving computational efficiency while maintaining accuracy.

In parallel, end-to-end regression methods based on convolutional neural networks and transformers have gained prominence, as they learn to predict camera poses or scene coordinates directly from input images, further expanding the design space of localization approaches. However, they often struggle to match the accuracy of structure-based approaches. To address this, the thesis introduces two novel frameworks: an uncertainty-aware hierarchical pose refinement framework (HR-APR) and an efficient pose refinement framework (GS-CPR) that integrates 3DGS and 3D vision foundation models. These frameworks significantly improve the pose estimation accuracy of end-to-end regression approaches across various benchmarks.

Finally, existing 3D vision foundation models make inefficient use of dense point data and fail to deliver accurate metric pose estimation. To address these limitations, the thesis introduces Plana3R, a novel 3D vision foundation model designed for zero-shot, feed-forward metric indoor 3D reconstruction and pose estimation. The model leverages planar primitives as a compact representation tailored to man-made environments.

Date: Wednesday, 24 September 2025
Time: 2:00pm - 4:00pm
Venue: Room 5501 (Lifts 25/26)

Join Zoom Meeting:
https://hkust.zoom.us/j/96886475205?pwd=On4NuQrg7uoCxXVaNObE1YNWvzHl3E.1
Meeting ID: 968 8647 5205
Password: 139237

Committee Members:
Dr. Tristan Braud (Supervisor)
Dr. Dan Xu (Chairperson)
Prof. Sai-Kit Yeung