Efficient and Accurate Data Association in Large-Scale Structure-from-Motion and Beyond
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Efficient and Accurate Data Association in Large-Scale Structure-from-Motion and Beyond"

By

Mr. Tianwei SHEN

Abstract

Data association, in the context of Structure-from-Motion (SfM) and Simultaneous Localization and Mapping (SLAM), is the process of associating uncertain measurements (e.g. image pixels, local descriptors and 3D tracks) to the same object or identity. It forms the foundation of many 3D computer vision problems, starting from finding local feature correspondences, identifying similar images with overlaps, up to bundle adjustment and related graph-based optimization problems that seek to achieve a harmonious status in terms of geometric and photometric quantities. Unlike deterministic pose estimation algorithms that typically have closed-form solutions, data association usually works in a noisy setting and does not possess an analytical form. Yet, it greatly affects the efficiency and accuracy of the reconstruction.

In this thesis, we explore the elements of the data association problem in the context of 3D reconstruction and related problems. More specifically, we first give a thorough overview of the modern SfM pipeline, with a focus on the functionality of data association in each of its sub-steps. Then we describe three novel methods to solve the data association in SfM-related 3D computer vision problems.

First, we propose a learning-based algorithm for the efficient and accurate association of similar images that depict the same scene, which often serves as the first step in a large-scale 3D reconstruction to accelerate the later image matching pipeline. Though Convolutional Neural Networks (CNNs) have achieved superior performance on object image retrieval, Bag-of-Words (BoW) models with handcrafted local features still dominate the retrieval of overlapping images in 3D reconstruction.
We narrow down this gap by presenting an efficient CNN-based method to retrieve images with overlaps, which we refer to as the matchable image retrieval problem. We propose a batched triplet-based loss function combined with mesh re-projection to effectively learn the CNN representation. The proposed method significantly accelerates the image retrieval process in 3D reconstruction and outperforms the state-of-the-art CNN-based and BoW methods for matchable image retrieval.

Based on the pairwise image matching, we present a match graph construction method that tackles the issues of completeness, efficiency and consistency in a unified framework. Pairwise image matching of unordered image collections greatly affects the efficiency and accuracy of SfM. Insufficient match pairs may result in disconnected structures or incomplete components, while costly redundant pairs containing erroneous ones may lead to folded and superimposed structures. Our approach starts by chaining all but singleton images using a visual-similarity-based minimum spanning tree. Then the minimum spanning tree is incrementally expanded to form locally consistent strong triplets. Finally, a global community-based graph algorithm is introduced to strengthen the global consistency by reinforcing potentially large connected components. We demonstrate the superior performance of our method in terms of accuracy and efficiency on both benchmark and Internet datasets. Our method also performs remarkably well on challenging datasets with highly ambiguous and duplicated scenes.

The data association problem also arises widely in other domains of 3D reconstruction. We describe our contributions to two related problems, namely generating consistent textures in image-based modeling, and estimating relative camera poses via the deep interplay of photometric and geometric information.
The first one shares the same graph structure with the large-scale SfM problem, while the second combines a traditional geometric motion estimation method with the recent trend of learning-based methods. We bridge the gap between geometric loss and photometric loss by introducing a matching loss constrained by epipolar geometry in a self-supervised framework. Evaluated on the KITTI dataset, our method outperforms the state-of-the-art unsupervised ego-motion estimation methods by a large margin.

We conclude the thesis by laying out future directions of data association with different types of information sources.

Date: Wednesday, 5 June 2019
Time: 2:00pm - 4:00pm
Venue: Room 3494, Lifts 25/26

Chairman: Prof. Danny Tsang (ECE)

Committee Members:
Prof. Long Quan (Supervisor)
Prof. Pedro Sander
Prof. Chiew-Lan Tai
Prof. Chi-Keung Tang
Prof. Ajay Joneja (ISD)
Prof. Hongdong Li (Australian National Univ)

**** ALL are Welcome ****