Efficient and Accurate Data Association in Large-Scale Structure-from-Motion

PhD Thesis Proposal Defence


Title: "Efficient and Accurate Data Association in Large-Scale 
Structure-from-Motion"

by

Mr. Tianwei SHEN


Abstract:

Data association, in the context of Structure-from-Motion (SfM) and 
Simultaneous Localization and Mapping (SLAM), is the process of associating 
uncertain measurements (e.g., image pixels, local descriptors and 3D tracks) 
with the same object or identity. It forms the foundation of many 3D computer 
vision problems, ranging from finding local feature correspondences and 
identifying overlapping images, up to bundle adjustment and related 
graph-based optimization problems that seek a consistent state of geometric 
and photometric quantities. Unlike deterministic pose estimation algorithms 
that typically have closed-form solutions, data association usually operates 
in a noisy setting and does not admit an analytical form, yet it greatly 
affects the efficiency and accuracy of the reconstruction. In this thesis, we 
explore the elements of the data association problem in the context of 3D 
reconstruction and related problems. More specifically, we first give a 
thorough overview of the modern SfM pipeline, with a focus on the role of 
data association in each of its sub-steps. We then describe three novel 
methods for solving data association in SfM-related 3D computer vision 
problems.

First, we propose a learning-based algorithm for the efficient and accurate 
association of similar images that depict the same scene, which often serves 
as the first step in large-scale 3D reconstruction to accelerate the 
subsequent image matching pipeline. Although Convolutional Neural Networks 
(CNNs) have achieved superior performance on object image retrieval, 
Bag-of-Words (BoW) models with handcrafted local features still dominate the 
retrieval of overlapping images in 3D reconstruction. We narrow this gap by 
presenting an efficient CNN-based method to retrieve images with overlaps, 
which we refer to as the matchable image retrieval problem. We propose a 
batched triplet-based loss function combined with mesh re-projection to 
effectively learn the CNN representation. The proposed method significantly 
accelerates the image retrieval process in 3D reconstruction and outperforms 
state-of-the-art CNN-based and BoW methods for matchable image retrieval.
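
To make the triplet formulation concrete, below is a minimal sketch (in 
Python/NumPy) of a standard hinge-style triplet loss on CNN descriptors; the 
function name and the margin value are illustrative assumptions rather than 
the exact batched loss used in the thesis:

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.5):
        # Hinge-style triplet loss over a batch of L2-normalized descriptors.
        # 'positive' descriptors come from images overlapping the anchor,
        # 'negative' from non-overlapping ones; margin=0.5 is an assumed value.
        d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive
        d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative
        return np.maximum(0.0, margin + d_pos - d_neg).mean()

Minimizing such a loss pulls overlapping images together and pushes 
non-overlapping ones apart in the learned descriptor space.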

Building on pairwise image matching, we present a match graph construction 
method that tackles the issues of completeness, efficiency and consistency in 
a unified framework. Pairwise image matching of unordered image collections 
greatly affects the efficiency and accuracy of SfM. Insufficient match pairs 
may result in disconnected structures or incomplete components, while costly 
redundant pairs containing erroneous matches may lead to folded and 
superimposed structures. Our approach starts by chaining all but singleton 
images using a minimum spanning tree built from visual similarity. The 
spanning tree is then incrementally expanded to form locally consistent 
strong triplets. Finally, a community-based graph algorithm is introduced to 
strengthen global consistency by reinforcing potentially large connected 
components. We demonstrate the superior performance of our method in terms of 
accuracy and efficiency on both benchmark and Internet datasets. Our method 
also performs remarkably well on challenging datasets with highly ambiguous 
and duplicated scenes.
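
As an illustration of the first stage only (the triplet expansion and the 
community-based reinforcement are omitted), the sketch below chains images 
with a maximum-similarity spanning tree via Kruskal's algorithm. The names 
and input format are hypothetical, not the thesis implementation:

    def similarity_spanning_tree(num_images, scored_pairs):
        # scored_pairs: list of (similarity, i, j) tuples produced by
        # matchable image retrieval. Returns spanning-tree edges, i.e. a
        # minimum spanning tree on dissimilarity = -similarity. Images that
        # never appear in scored_pairs (singletons) remain unchained.
        parent = list(range(num_images))

        def find(x):  # union-find with path compression
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        tree = []
        for sim, i, j in sorted(scored_pairs, reverse=True):  # most similar first
            ri, rj = find(i), find(j)
            if ri != rj:  # only join different components, avoiding cycles
                parent[ri] = rj
                tree.append((i, j, sim))
        return tree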

The data association problem also arises widely in other areas of 3D 
reconstruction. We describe our contributions to two related problems, namely 
generating consistent textures in image-based modeling, and estimating 
relative camera poses through the deep interplay of photometric and geometric 
information. The first work shares the same graph structure as the 
large-scale SfM problem. The second work combines traditional geometric 
motion estimation with the recent trend of learning-based methods. We bridge 
the gap between the geometric loss and the photometric loss by introducing a 
matching loss constrained by epipolar geometry in a self-supervised 
framework. Evaluated on the KITTI dataset, our method outperforms 
state-of-the-art unsupervised ego-motion estimation methods by a large 
margin.
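
For illustration, the sketch below writes one common form of such an 
epipolar matching loss, the mean Sampson distance of correspondences under a 
fundamental matrix; this is a standard formulation and may differ from the 
exact loss used in this work:

    import numpy as np

    def epipolar_matching_loss(F, x1, x2):
        # Mean Sampson distance of matched points (x1, x2), each of shape
        # (N, 2), under a fundamental matrix F. Smaller values mean the
        # matches agree better with the epipolar geometry, so this can act
        # as a self-supervised loss on a predicted relative pose.
        ones = np.ones((x1.shape[0], 1))
        p1 = np.hstack([x1, ones])  # homogeneous coordinates in image 1
        p2 = np.hstack([x2, ones])  # homogeneous coordinates in image 2
        Fp1 = p1 @ F.T              # epipolar lines in image 2
        Ftp2 = p2 @ F               # epipolar lines in image 1
        num = np.sum(p2 * Fp1, axis=1) ** 2
        den = Fp1[:, 0]**2 + Fp1[:, 1]**2 + Ftp2[:, 0]**2 + Ftp2[:, 1]**2
        return np.mean(num / (den + 1e-12))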


Date:			Monday, 4 March 2019

Time:			4:00pm - 6:00pm

Venue:			Room 2408 (lifts 17/18)

Committee Members:	Prof. Long Quan (Supervisor)
			Dr. Pedro Sander (Chairperson)
			Prof. Huamin Qu
			Prof. Chiew-Lan Tai


**** ALL are Welcome ****