Learning Large-Scale Multi-View Stereopsis

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Learning Large-Scale Multi-View Stereopsis"

By

Mr. Yao YAO



Abstract

Multi-view stereo (MVS) reconstructs a 3D representation of a scene from 
multiple images. It is a core problem in computer vision that has been studied 
extensively for decades. Traditionally, MVS algorithms apply hand-crafted 
similarity metrics and engineered regularizations to compute dense 
correspondences. While these methods perform well in ideal Lambertian scenes, 
they still suffer from numerous artifacts when the Lambertian assumption does 
not hold. In this thesis, we propose to advance MVS reconstruction using recent 
deep learning techniques.

First, we present an end-to-end deep learning architecture, MVSNet, for depth 
map inference from multi-view images. The key contribution of this part is the 
careful integration of multi-view geometry with convolutional neural networks 
(CNNs). In the network, we extract deep image features and build a 3D cost 
volume upon the reference camera frustum via differentiable homography warping. 
3D convolutions are then applied to regularize the cost volume and regress the 
output depth map. We demonstrate on the DTU dataset that MVSNet significantly 
outperforms previous state-of-the-art methods in both reconstruction 
completeness and overall quality.
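The geometric core of the pipeline described above can be sketched with NumPy. 
This is a minimal illustration under stated assumptions, not MVSNet's actual 
implementation: it assumes the world-to-camera convention x_cam = R X + t, 
fronto-parallel depth planes in the reference camera, a variance-based cost 
across views, and depth regression as the softmax-weighted expectation over 
depth hypotheses (soft argmin). All function names are hypothetical, and 
feature extraction, bilinear sampling of the warped features, and the 3D CNN 
regularizer are omitted.

```python
import numpy as np


def plane_homography(K_ref, R_ref, t_ref, K_src, R_src, t_src, depth):
    """Map reference pixels to source pixels via the plane Z_ref = depth.

    Assumed (hypothetical) convention: x_cam = R @ X_world + t.
    """
    R_rel = R_src @ R_ref.T              # rotation: reference -> source camera
    t_rel = t_src - R_rel @ t_ref        # translation: reference -> source camera
    n = np.array([0.0, 0.0, 1.0])        # plane normal in the reference frame
    # Points on the plane satisfy n^T X_ref = depth, hence
    # X_src = R_rel X_ref + t_rel = (R_rel + t_rel n^T / depth) X_ref.
    return K_src @ (R_rel + np.outer(t_rel, n) / depth) @ np.linalg.inv(K_ref)


def variance_cost(features):
    """Variance across N warped views; features has shape (N, D, H, W, C)."""
    mean = features.mean(axis=0)
    return ((features - mean) ** 2).mean(axis=0)   # shape (D, H, W, C)


def soft_argmin_depth(cost, depths):
    """Expected depth under a softmax over depth; cost has shape (D, H, W)."""
    logits = -cost                                   # lower cost = better match
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)          # probability volume
    return np.tensordot(depths, prob, axes=(0, 0))   # depth map, shape (H, W)
```

Because every step is built from differentiable operations, gradients from a 
depth loss can flow back through the cost volume to the feature extractor, 
which is the point of making the warping differentiable.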

Next, we extend the MVSNet architecture to large-scale MVS reconstruction. One 
major limitation of current learning-based approaches is scalability: the 
memory-consuming cost volume regularization makes learned MVS hard to apply to 
high-resolution scenes. To address this, we sequentially regularize 2D cost 
maps along the depth direction with a gated recurrent unit (GRU) instead of 
regularizing the entire 3D cost volume at once. The GRU regularization 
dramatically reduces memory consumption and makes high-resolution 
reconstruction feasible. The proposed R-MVSNet is evaluated on the large-scale 
Tanks and Temples dataset and achieves results comparable to those of classical 
large-scale MVS algorithms.
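The memory argument above can be illustrated by sweeping a GRU over the depth 
slices of a cost volume, so that only one 2D hidden state is alive at a time. 
This is a simplified sketch, not R-MVSNet itself: for brevity it uses 
per-pixel (1x1) gate weights instead of 2D convolutional GRU cells, takes a 
precomputed volume as input rather than building each cost map on the fly, and 
reads out one value per pixel with a hypothetical weight vector w_out.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_sweep(cost_volume, params):
    """Regularize a (D, H, W, C) cost volume one depth slice at a time."""
    Wz, Uz, Wr, Ur, Wh, Uh, w_out = params     # hypothetical learned weights
    D, H, W, _ = cost_volume.shape
    h = np.zeros((H, W, Uz.shape[0]))          # the only state carried across depth
    out = np.empty((D, H, W))
    for d in range(D):
        x = cost_volume[d]                      # 2D cost map, shape (H, W, C)
        z = sigmoid(x @ Wz + h @ Uz)            # update gate
        r = sigmoid(x @ Wr + h @ Ur)            # reset gate
        h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)
        h = (1.0 - z) * h + z * h_tilde         # blend previous and candidate state
        out[d] = h @ w_out                      # regularized cost for this slice
    return out
```

Since the regularizer only ever holds one (H, W, hidden) state plus the current 
cost map, its working memory grows with H·W rather than D·H·W, which is what 
allows the number of depth hypotheses and the image resolution to increase.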

Finally, we establish a large-scale synthetic MVS dataset, BlendedMVS, based on 
blended images and rendered depth maps. Although several MVS datasets have been 
proposed, they do not provide accurate depth and occlusion information because 
their ground truth mesh models are usually incomplete. We therefore build a new 
MVS dataset through model rendering. Textured meshes are first reconstructed 
from images of different scenes and then rendered into color images, depth 
maps, and occlusion maps. We further blend the rendered images with the input 
images using high-pass and low-pass filters to generate our training input. 
Extensive experiments demonstrate that models trained on BlendedMVS generalize 
significantly better than models trained on other MVS datasets.
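One way to read the blending step above is sketched below. The exact filters 
are not specified here, so this assumes the blended image keeps the low-pass 
component of the input photograph (e.g. ambient lighting) and the high-pass 
component of the rendered image (fine visual detail that stays consistent with 
the rendered depth). The Gaussian filter, the sigma value, and the function 
names are hypothetical choices for illustration.

```python
import numpy as np


def gaussian_kernel(sigma):
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()                          # normalized: preserves constants


def low_pass(img, sigma):
    """Separable Gaussian blur of an (H, W) or (H, W, C) image, reflect padding."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    out = img.astype(float)
    for axis in (0, 1):                         # blur rows, then columns
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="reflect")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, padded)
    return out


def blend(input_img, rendered_img, sigma=2.0):
    """Low frequencies from the photo plus high frequencies from the render."""
    high = rendered_img - low_pass(rendered_img, sigma)   # high-pass of render
    return low_pass(input_img, sigma) + high
```

Note that blending a rendered image with itself returns the rendered image 
unchanged, since the low-pass and high-pass parts sum back to the original.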

In sum, this thesis presents a complete learning-based solution to large-scale 
multi-view stereopsis, including a baseline network (MVSNet), its large-scale 
extension (R-MVSNet), and a large-scale synthetic dataset (BlendedMVS). We 
bridge the gap between classical MVS reconstruction and recent deep learning 
techniques and demonstrate the effectiveness of learning-based MVS through 
extensive experiments on different datasets.


Date:			Friday, 6 December 2019

Time:			4:00pm - 6:00pm

Venue:			Room 5508
 			Lifts 25/26

Chairman:		Prof. Bertram Shi (ECE)

Committee Members:	Prof. Long QUAN (Supervisor)
 			Prof. Pedro SANDER
 			Prof. Chiew Lan TAI
 			Prof. Kai TANG (MAE)
 			Prof. Wenping WANG (HKU)


**** ALL are Welcome ****