LEARNING VISUAL CORRESPONDENCES FOR GEOMETRY RECOVERY
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "LEARNING VISUAL CORRESPONDENCES FOR GEOMETRY RECOVERY"

By

Mr. Hongkai CHEN

Abstract:

Identifying robust and accurate visual correspondences across images, also known as image matching, has been a long-standing topic in computer vision research. In particular, image matching serves as a fundamental step in reconstructing real-world geometry from multi-view photos, and has received widespread attention from a range of industrial applications, including the metaverse, AR/VR and autonomous driving. Traditionally, image matching involves a series of discrete steps and hand-crafted algorithms. Although proven effective in general cases, their manually designed features and matching strategies are often insufficient to cope with challenging matching scenarios, such as low-texture regions, large perspective changes or very low-overlap pairs. In this thesis, we are dedicated to further improving the accuracy and robustness of image matching algorithms, particularly through the use of deep learning techniques.

We first propose a graph neural network (GNN), which inherits the traditional keypoint-based matching scheme, to regularize matching costs by jointly reasoning about visual similarity and matching consensus. Specifically, to avoid exhaustive interaction among image keypoints, we leverage a small set of pre-selected, relatively reliable matches, referred to as seed matches, to guide the matching of the whole keypoint set. By integrating seed matches with a series of efficient attentive operations, we show that even a very limited set of seeds can provide strong clues to assist the matching of other keypoints. Through comprehensive experiments, we demonstrate that our approach achieves competitive performance compared with state-of-the-art GNN-based matchers while maintaining modest computational costs.

Moving beyond keypoint-based matching, we then present an end-to-end Transformer-based matcher that works directly on raw image pairs and skips the keypoint detection step. To tackle the quadratic complexity caused by dense operations on images in a vanilla Transformer, we propose a global-local attention framework that ensures both global long-range interaction and local fine-level interaction. Specifically, instead of setting the local attention span to a fixed size, we adjust it according to learned matching uncertainty, which balances matching coverage and interaction granularity in an adaptive way. Through comprehensive evaluation, we show that our attention framework significantly improves the quality of the obtained matches and boosts the accuracy of camera pose estimation. In particular, we outperform counterparts that also adopt efficient Transformer designs by a large margin.

Finally, taking one step further from our previous work, we propose a geometry-aware deformable attention to enhance local attention in the Transformer-based matcher. To better model the ubiquitous local deformation caused by viewpoint changes, we estimate patch-wise parametric deformation fields from intermediate matching results, which are used to shape the local attention pattern. Through this design, we embed deformation priors into the matching process in a principled and intuitive manner. Experiments show that our design considerably improves the effectiveness of the global-local attention framework and produces high-quality visual correspondences for both two-view pose estimation and visual localization.

With intensive investigation and innovation, we aspire to further advance the performance boundaries of image matching and empower a wider range of 2D and 3D applications.
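To illustrate the seed-guided scheme described in the first work, the following is a minimal PyTorch sketch of seeded attention: a small set of seed matches first pools context from all keypoints and then broadcasts it back, avoiding exhaustive keypoint-to-keypoint interaction. The module structure, tensor shapes and the use of nn.MultiheadAttention are illustrative assumptions, not the thesis implementation.

```python
# A minimal sketch of seed-guided attention; names and shapes are illustrative,
# not the thesis code.
import torch
import torch.nn as nn


class SeededAttention(nn.Module):
    """Relay information through a small set of seed matches instead of
    exhaustive keypoint-to-keypoint attention (quadratic in N)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # seeds first gather context from all keypoints...
        self.pool = nn.MultiheadAttention(dim, heads, batch_first=True)
        # ...then every keypoint queries the (few) seeds for matching cues
        self.broadcast = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, desc: torch.Tensor, seed_idx: torch.Tensor) -> torch.Tensor:
        # desc: (B, N, D) keypoint descriptors; seed_idx: (B, S) indices of seeds
        seeds = torch.gather(desc, 1, seed_idx[..., None].expand(-1, -1, desc.size(-1)))
        seeds, _ = self.pool(seeds, desc, desc)      # O(S*N) instead of O(N*N)
        out, _ = self.broadcast(desc, seeds, seeds)  # O(N*S)
        return desc + out                            # residual update


# Usage: descriptors from each image would pass through such layers (and a
# cross-image variant) before computing the matching cost.
desc = torch.randn(1, 2048, 256)              # 2048 keypoints, 256-d descriptors
seed_idx = torch.randint(0, 2048, (1, 128))   # 128 seed matches (illustrative)
print(SeededAttention(256)(desc, seed_idx).shape)  # torch.Size([1, 2048, 256])
```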
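For the second work, the sketch below shows one way a local attention span could be adapted to learned matching uncertainty: confident queries receive a tight window, uncertain ones a wide one. The linear mapping from uncertainty to window size and the 1-D masking are simplifying assumptions for exposition, not the exact scheme in the thesis.

```python
# A minimal sketch of an uncertainty-adaptive local attention span (assumed
# linear mapping from uncertainty to window size).
import torch


def adaptive_span_mask(centers: torch.Tensor, uncertainty: torch.Tensor,
                       length: int, min_span: int = 2, max_span: int = 8) -> torch.Tensor:
    """Build a boolean attention mask over a 1-D token sequence.

    centers:     (M,) predicted match positions for M query tokens
    uncertainty: (M,) learned matching uncertainty in [0, 1]
    Returns:     (M, length) mask, True where attention is allowed.
    """
    # larger uncertainty -> wider local window (more coverage, coarser focus)
    span = (min_span + uncertainty * (max_span - min_span)).round().long()  # (M,)
    pos = torch.arange(length)                       # (length,)
    dist = (pos[None, :] - centers[:, None]).abs()   # (M, length)
    return dist <= span[:, None]


# Example: the second query is uncertain, so it attends over a wider window.
centers = torch.tensor([10, 50])
uncertainty = torch.tensor([0.1, 0.9])
mask = adaptive_span_mask(centers, uncertainty, length=64)
print(mask.sum(dim=1))  # window width per query grows with uncertainty
```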
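For the third work, the sketch below shows one plausible way a patch-wise deformation estimated from intermediate matches could shape the local attention window, by warping the sampling grid before feature sampling. The affine parameterization and the grid_sample-based sampling are assumptions; the thesis design may differ.

```python
# A minimal sketch of deforming a local sampling window with an estimated
# patch-wise affine transform; parameter names are illustrative.
import torch
import torch.nn.functional as F


def deformed_local_samples(feat: torch.Tensor, center: torch.Tensor,
                           affine: torch.Tensor, radius: int = 2) -> torch.Tensor:
    """Sample a (2r+1)x(2r+1) local window around `center`, warped by `affine`.

    feat:   (1, C, H, W) target feature map
    center: (2,) window centre in normalized coords [-1, 1]
    affine: (2, 2) deformation estimated from intermediate matches
    """
    r = torch.linspace(-radius, radius, 2 * radius + 1)
    dy, dx = torch.meshgrid(r, r, indexing="ij")
    offsets = torch.stack([dx, dy], dim=-1) / feat.size(-1)  # (K, K, 2) in grid units
    warped = offsets @ affine.T + center                     # deform, then shift
    grid = warped[None]                                      # (1, K, K, 2)
    return F.grid_sample(feat, grid, align_corners=True)     # (1, C, K, K)


# Example: a 1.5x dilation of the local window around a predicted match.
feat = torch.randn(1, 256, 60, 80)
samples = deformed_local_samples(feat, center=torch.tensor([0.1, -0.2]),
                                 affine=torch.eye(2) * 1.5)
print(samples.shape)  # torch.Size([1, 256, 5, 5])
```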
Date: Tuesday, 22 August 2023
Time: 2:00pm - 4:00pm
Venue: Room 4475 (Lifts 25/26)

Chairman: Prof. Lixin WU (MATH)

Committee Members:
Prof. Long QUAN (Supervisor)
Prof. Chi Keung TANG
Prof. Dan XU
Prof. Weichuan YU (ECE)
Prof. Tien Tsin WONG (CUHK)

**** ALL are Welcome ****