Efficient and Accurate Data Association in Large-Scale Structure-from-Motion
PhD Thesis Proposal Defence
Title: "Efficient and Accurate Data Association in Large-Scale
Structure-from-Motion"
by
Mr. Tianwei SHEN
Abstract:
Data association, in the context of Structure-from-Motion (SfM) and
Simultaneous Localization and Mapping (SLAM), is the process of associating
uncertain measurements (e.g. image pixels, local descriptors and 3D tracks) to
the same object or identity. It forms the foundation of many 3D computer vision
problems, starting from finding local feature correspondences, identifying
similar images with overlaps, up to bundle adjustment and related graph-based
optimization problems that seek a consistent state in terms of
geometric and photometric quantities. Unlike deterministic pose estimation
algorithms that typically have closed-form solutions, data association usually
works in a noisy setting and does not possess an analytical form. Yet, it
greatly affects the efficiency and accuracy of the reconstruction. In this
thesis, we explore the elements of the data association problem in the context
of 3D reconstruction and related problems. More specifically, we first give a
thorough overview of the modern SfM pipeline, with a focus on the functionality
of data association in each of its sub-steps. Then we describe three novel
methods for data association in SfM-related 3D computer vision
problems.
First, we propose a learning-based algorithm for the efficient and accurate
association of similar images that depict the same scene, which often serves as
the first step in a large-scale 3D reconstruction to accelerate the later image
matching pipeline. Though Convolutional Neural Networks (CNNs) have achieved
superior performance on object image retrieval, Bag-of-Words (BoW) models with
handcrafted local features still dominate the retrieval of overlapping images
in 3D reconstruction. We narrow this gap by presenting an efficient
CNN-based method to retrieve images with overlaps, which we refer to as the
matchable image retrieval problem. We propose a batched triplet-based loss
function combined with mesh re-projection to effectively learn the CNN
representation. The proposed method significantly accelerates the image
retrieval process in 3D reconstruction and outperforms the state-of-the-art
CNN-based and BoW methods for matchable image retrieval.
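As a rough illustration of a triplet loss evaluated over a whole batch, the sketch below averages a hinge loss over every (anchor, positive, negative) combination in the batch. It is not the loss proposed in the thesis: the function names, the margin value, the use of squared Euclidean distance, and the boolean `overlaps` matrix (standing in for overlap labels such as those derived from mesh re-projection) are all assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batched_triplet_loss(embeddings, overlaps, margin=0.5):
    """Mean hinge loss over all valid triplets in one batch.

    embeddings: list of equal-length float vectors, one per image.
    overlaps:   overlaps[i][j] is True when images i and j are treated
                as matchable (hypothetical labels for this sketch).
    """
    n = len(embeddings)
    total, count = 0.0, 0
    for a in range(n):
        for p in range(n):
            if p == a or not overlaps[a][p]:
                continue  # p must be a matchable partner of the anchor
            for q in range(n):
                if q == a or overlaps[a][q]:
                    continue  # q must be a non-matchable image
                d_pos = squared_distance(embeddings[a], embeddings[p])
                d_neg = squared_distance(embeddings[a], embeddings[q])
                total += max(0.0, d_pos - d_neg + margin)
                count += 1
    return total / count if count else 0.0


# Toy batch: images 0 and 1 overlap, images 2 and 3 overlap.
toy_embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]]
toy_overlaps = [
    [False, True, False, False],
    [True, False, False, False],
    [False, False, False, True],
    [False, False, True, False],
]
well_separated = batched_triplet_loss(toy_embeddings, toy_overlaps, margin=0.5)
too_strict = batched_triplet_loss(toy_embeddings, toy_overlaps, margin=2.0)
```

With the small margin the toy embeddings already separate the two scenes, so no triplet contributes to the loss; raising the margin makes every triplet violate it.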
Based on pairwise image matching, we present a match graph construction
method that tackles the issues of completeness, efficiency and consistency in a
unified framework. Pairwise image matching of unordered image collections
greatly affects the efficiency and accuracy of SfM. Insufficient match pairs
may result in disconnected structures or incomplete components, while costly
redundant pairs, some of them erroneous, may lead to folded and superimposed
structures. Our approach starts by chaining all but singleton images using a
visual-similarity-based minimum spanning tree. Then the minimum spanning tree
is incrementally expanded to form locally consistent strong triplets. Finally,
a global community-based graph algorithm is introduced to strengthen the global
consistency by reinforcing potentially large connected components. We
demonstrate the superior performance of our method in terms of accuracy and
efficiency on both benchmark and Internet datasets. Our method also performs
remarkably well on challenging datasets with highly ambiguous and duplicated
scenes.
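The first step above, chaining images with a spanning tree built from visual similarity, can be sketched with Kruskal's algorithm. This is only an illustration under stated assumptions: the edge cost `1 - similarity`, the function name, and the `(i, j, similarity)` input format are choices made for this example, and the later triplet expansion and community-based steps are not covered.

```python
def similarity_spanning_tree(num_images, scored_pairs):
    """Kruskal's algorithm with edge cost = 1 - similarity.

    scored_pairs: iterable of (i, j, similarity) candidate image pairs.
    Returns the selected edges as (i, j) tuples. Images that appear in
    no candidate pair stay unconnected (singletons), and a disconnected
    candidate graph yields a spanning forest rather than a single tree.
    """
    parent = list(range(num_images))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for i, j, sim in sorted(scored_pairs, key=lambda e: 1.0 - e[2]):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Five images; image 4 has no candidate pair and remains a singleton.
candidate_pairs = [(0, 1, 0.9), (1, 2, 0.8), (0, 2, 0.7), (2, 3, 0.6)]
tree_edges = similarity_spanning_tree(5, candidate_pairs)
```

Here the pair (0, 2) is skipped because images 0 and 2 are already connected through image 1, so only three matches are kept for four connected images.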
The data association problem also arises widely in other domains of 3D
reconstruction. We describe our contributions in two related problems, namely
generating consistent textures in image-based modeling, and estimating relative
camera poses via the deep interplay of photometric and geometric information.
The first work shares the same graph structure with the large-scale SfM
problem. The second works combines traditional geometric motion estimation
methods with the recent trend of learning-based methods. We bridge the gap
between geometric loss and photometric loss by introducing the matching loss
constrained by epipolar geometry in a self-supervised framework. Evaluated on
the KITTI dataset, our method outperforms the state-of-the-art unsupervised
ego-motion estimation methods by a large margin.
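The epipolar constraint that such a matching loss relies on can be illustrated with the standard algebraic residual x2^T E x1, where E = [t]_x R is the essential matrix. The sketch below is not the loss used in the thesis. It assumes the pose convention X2 = R X1 + t and normalized homogeneous image coordinates, and all function names are hypothetical.

```python
def skew(t):
    """Cross-product matrix [t]_x of a 3-vector."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def epipolar_residual(R, t, x1, x2):
    """Algebraic epipolar residual x2^T E x1 with E = [t]_x R.

    Assumes X2 = R X1 + t, with x1 and x2 given as normalized
    homogeneous coordinates (x, y, 1). The residual is zero for a
    geometrically consistent match.
    """
    E = matmul3(skew(t), R)
    Ex1 = [sum(E[i][k] * x1[k] for k in range(3)) for i in range(3)]
    return sum(x2[i] * Ex1[i] for i in range(3))


# Camera 2 is camera 1 shifted along the x-axis; no rotation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
translation = (1.0, 0.0, 0.0)
x1 = (0.25, 0.1, 1.0)        # projection of X1 = (0.5, 0.2, 2.0)
x2_good = (0.75, 0.1, 1.0)   # projection of X2 = (1.5, 0.2, 2.0)
x2_bad = (0.75, 0.3, 1.0)    # same point displaced off the epipolar line

good_residual = epipolar_residual(identity, translation, x1, x2_good)
bad_residual = epipolar_residual(identity, translation, x1, x2_bad)
```

A correct match gives a residual of essentially zero, while the displaced match gives a clearly non-zero value, which is what lets a residual of this kind penalize matches that disagree with the estimated pose.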
Date: Monday, 4 March 2019
Time: 4:00pm - 6:00pm
Venue: Room 2408
(lifts 17/18)
Committee Members: Prof. Long Quan (Supervisor)
Dr. Pedro Sander (Chairperson)
Prof. Huamin Qu
Prof. Chiew-Lan Tai
**** ALL are Welcome ****