PhD Thesis Proposal Defence
Title: "Learning Large-scale Multi-view Stereopsis"
by
Mr. Yao YAO
Abstract:
Multi-view stereo (MVS) reconstructs 3D representations of a scene from
imagery, a core problem of computer vision that has been studied extensively
for decades. Traditionally, MVS algorithms apply hand-crafted similarity
metrics and engineered regularizations to compute dense correspondences. While
these methods have shown great results under ideal Lambertian conditions,
classical MVS algorithms still suffer from numerous artifacts. In this thesis,
we propose to advance MVS reconstruction using recent deep learning techniques.
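As a concrete illustration of such a hand-crafted metric, below is a minimal
NumPy sketch of zero-mean normalized cross-correlation (NCC) between two image
patches; the function and names are illustrative only, not a metric prescribed
by the thesis.

    import numpy as np

    def ncc(patch_a, patch_b, eps=1e-8):
        # Zero-mean normalized cross-correlation: +1 means a perfect
        # photometric match, values near 0 mean no correlation. Classical
        # MVS pipelines score candidate correspondences with metrics of
        # this kind.
        a = patch_a - patch_a.mean()
        b = patch_b - patch_b.mean()
        return float((a * b).sum() /
                     (np.linalg.norm(a) * np.linalg.norm(b) + eps))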
First, we present an end-to-end deep learning architecture, MVSNet, for depth
map inference from multi-view images. The key contribution of this part is the
careful integration of multi-view geometry with convolutional neural networks
(CNNs). In the network, we extract deep image features and build a 3D cost
volume upon the reference camera frustum via differentiable homography warping.
3D convolutions are then applied to regularize the cost volume and regress the
output depth map. We demonstrate on the DTU dataset that MVSNet significantly
outperforms previous state-of-the-art methods in both reconstruction
completeness and overall quality.
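The PyTorch sketch below illustrates the two geometric steps named above:
warping source-view features onto a fronto-parallel plane of the reference
frustum via a plane-induced homography, and aggregating per-view feature
volumes with a variance metric. Shapes, names, and conventions here are
assumptions for illustration, not the thesis implementation.

    import torch
    import torch.nn.functional as F

    def homography_warp(src_feat, K_src, K_ref, R, t, depth):
        # Warp source-view features (B, C, H, W) into the reference view
        # at one fronto-parallel depth hypothesis. R, t are assumed to map
        # reference-camera coordinates to the source camera.
        B, C, H, W = src_feat.shape
        dev = src_feat.device
        n = torch.tensor([[0.0, 0.0, 1.0]], device=dev)   # plane normal
        # Plane-induced homography: H = K_src (R + t n^T / d) K_ref^{-1}.
        Hmat = K_src @ (R + t.view(3, 1) @ n / depth) @ torch.inverse(K_ref)
        # Reference pixel grid in homogeneous coordinates.
        ys, xs = torch.meshgrid(
            torch.arange(H, dtype=torch.float32, device=dev),
            torch.arange(W, dtype=torch.float32, device=dev),
            indexing="ij")
        pix = torch.stack([xs, ys, torch.ones_like(xs)], -1).reshape(-1, 3)
        # Apply the homography, then dehomogenize to source pixels.
        src = pix @ Hmat.T
        src = src[:, :2] / src[:, 2:3].clamp(min=1e-6)
        # Normalize to [-1, 1] and sample bilinearly (differentiable).
        gx = 2.0 * src[:, 0] / (W - 1) - 1.0
        gy = 2.0 * src[:, 1] / (H - 1) - 1.0
        grid = torch.stack([gx, gy], -1).view(1, H, W, 2).expand(B, H, W, 2)
        return F.grid_sample(src_feat, grid, align_corners=True)

    def variance_cost(ref_feat, warped_feats):
        # Variance across views: low variance means consistent features,
        # i.e. a likely correct depth hypothesis for that pixel.
        feats = torch.stack([ref_feat] + warped_feats, dim=0)
        return feats.var(dim=0, unbiased=False)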
Next, we extend the MVSNet architecture to large-scale MVS reconstruction. One
major limitation of current learning-based approaches is scalability: the
memory-consuming cost volume regularization makes learned MVS hard to apply to
high-resolution scenes. To this end, we sequentially regularize 2D cost maps
with a gated recurrent unit (GRU) rather than regularizing the entire 3D cost
volume in one go. The GRU regularization dramatically reduces memory
consumption and makes high-resolution reconstruction feasible. The proposed
R-MVSNet is evaluated on the large-scale Tanks and Temples dataset and achieves
results comparable to classical large-scale MVS algorithms.
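A minimal PyTorch sketch of this idea follows: a convolutional GRU cell sweeps
the depth dimension one 2D cost map at a time, so only a single cost map and a
hidden state are resident in memory instead of the full 3D volume. Layer sizes
and names are illustrative assumptions, not the R-MVSNet architecture.

    import torch
    import torch.nn as nn

    class ConvGRUCell(nn.Module):
        # Minimal convolutional GRU cell; the hidden state carries
        # context along the depth dimension.
        def __init__(self, in_ch, hid_ch, k=3):
            super().__init__()
            self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k,
                                   padding=k // 2)
            self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k,
                                  padding=k // 2)

        def forward(self, x, h):
            z, r = torch.sigmoid(
                self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
            h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
            return (1 - z) * h + z * h_tilde         # gated update

    def regularize_sequentially(cost_maps, cell, hid_ch):
        # cost_maps: (B, D, C, H, W). Sweep the D depth planes one 2D map
        # at a time instead of regularizing a 3D volume in one go.
        B, D, C, H, W = cost_maps.shape
        h = cost_maps.new_zeros(B, hid_ch, H, W)
        out = []
        for d in range(D):
            h = cell(cost_maps[:, d], h)
            out.append(h)
        return torch.stack(out, dim=1)               # regularized maps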
Finally, we establish a large-scale synthetic MVS dataset, BlendedMVS, based on
blended images and rendered depth maps. While several MVS datasets have been
proposed, they fail to provide accurate depth and occlusion information because
their ground-truth mesh models are usually incomplete. We therefore build a new
MVS dataset based on model rendering. Textured meshes are first reconstructed
from images of different scenes and then rendered into color images, depth
maps, and occlusion maps. We further blend the rendered images with the input
images using high-pass and low-pass filters to generate our training inputs.
Extensive experiments demonstrate that models trained on BlendedMVS generalize
significantly better than models trained on other MVS datasets.
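Below is a minimal sketch of this blending step, assuming Gaussian filters as
the low-pass; the thesis specifies only high-pass and low-pass filtering, so
the filter type, sigma, and function names are assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def blend(rendered, photo, sigma=2.0):
        # Keep high frequencies from the rendering (consistent with the
        # rendered depth) and low frequencies from the real photograph
        # (realistic ambient lighting). Images are float (H, W, 3) arrays.
        low_rendered = gaussian_filter(rendered, sigma=(sigma, sigma, 0))
        high_rendered = rendered - low_rendered      # high-pass component
        low_photo = gaussian_filter(photo, sigma=(sigma, sigma, 0))
        return high_rendered + low_photo             # blended training image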
In sum, this thesis presents a complete learning-based solution to large-scale
multi-view stereopsis, including a baseline network (MVSNet), its large-scale
extension (R-MVSNet), and a large-scale synthetic dataset (BlendedMVS). We
bridge the gap between classical MVS reconstruction and recent deep learning
techniques and demonstrate the effectiveness of learning-based MVS through
extensive experiments on diverse datasets.
Date: Friday, 25 October 2019
Time: 4:30pm - 6:30pm
Venue: Room 2126D (Lift 19)
Committee Members: Prof. Long Quan (Supervisor)
Dr. Xiaojuan Ma (Chairperson)
Dr. Pedro Sander
Prof. Chiew-Lan Tai
**** ALL are Welcome ****