Learning Large-scale Multi-view Stereopsis
PhD Thesis Proposal Defence

Title: "Learning Large-scale Multi-view Stereopsis"

by Mr. Yao YAO

Abstract:

Multi-view stereo (MVS) reconstructs 3D representations of a scene from imagery and is a core problem of computer vision that has been studied extensively for decades. Traditionally, MVS algorithms apply hand-crafted similarity metrics and engineered regularizations to compute dense correspondences. While these methods have shown great results under ideal Lambertian scenarios, classical MVS algorithms still suffer from numerous artifacts. In this thesis, we propose to advance MVS reconstruction using recent deep learning techniques.

First, we present an end-to-end deep learning architecture, MVSNet, for depth map inference from multi-view images. The key contribution of this part is the careful integration of multi-view geometry and convolutional neural networks (CNNs). In the network, we extract deep image features and build the 3D cost volume upon the camera frustum via differentiable homography warping. Then, 3D convolutions are applied to regularize the cost volume and regress the output depth map. We demonstrate on the DTU dataset that MVSNet significantly outperforms previous state-of-the-art methods in both reconstruction completeness and overall quality.

Next, we propose to extend the MVSNet architecture for large-scale MVS reconstruction. One major limitation of current learning-based approaches is scalability: the memory-consuming cost volume regularization makes learned MVS hard to apply to high-resolution scenes. To this end, we sequentially regularize 2D cost maps via the gated recurrent unit (GRU) rather than regularizing the entire 3D cost volume in one go. The GRU regularization dramatically reduces memory consumption and makes high-resolution reconstructions feasible. The proposed R-MVSNet is evaluated on the large-scale Tanks and Temples dataset and achieves results comparable to classical large-scale MVS algorithms.
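To illustrate why sequential regularization saves memory, the following is a minimal NumPy sketch, not the R-MVSNet implementation. It assumes a toy GRU applied independently at each pixel with random, untrained weights, and it reads out depth with a running winner-take-all, which is an illustrative choice. The names `PixelwiseGRU`, `cost_map_stream`, and `sequential_depth` are hypothetical. The point it shows is that only one 2D cost map and one hidden state are held at a time, so working memory grows with image size but not with the number of depth hypotheses.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class PixelwiseGRU:
    """Toy GRU applied independently at every pixel (hypothetical, untrained)."""

    def __init__(self, in_ch, hid_ch, rng):
        scale = 0.1
        # Each weight matrix acts on the concatenation [input, hidden].
        self.Wz = rng.normal(0.0, scale, (in_ch + hid_ch, hid_ch))
        self.Wr = rng.normal(0.0, scale, (in_ch + hid_ch, hid_ch))
        self.Wh = rng.normal(0.0, scale, (in_ch + hid_ch, hid_ch))
        self.w_out = rng.normal(0.0, scale, (hid_ch,))

    def step(self, x, h):
        """x: (H, W, in_ch) cost map, h: (H, W, hid_ch) hidden state."""
        xh = np.concatenate([x, h], axis=-1)
        z = sigmoid(xh @ self.Wz)  # update gate
        r = sigmoid(xh @ self.Wr)  # reset gate
        h_cand = np.tanh(np.concatenate([x, r * h], axis=-1) @ self.Wh)
        h_new = (1.0 - z) * h + z * h_cand
        score = h_new @ self.w_out  # (H, W) regularized score for this depth
        return h_new, score


def cost_map_stream(num_depths, height, width, channels, rng):
    """Yield one synthetic 2D cost map per depth hypothesis (stand-in data)."""
    for _ in range(num_depths):
        yield rng.normal(size=(height, width, channels))


def sequential_depth(cost_maps, depth_values, gru, hid_ch):
    """Sweep the depth hypotheses in order, keeping only per-pixel running state."""
    h = best_score = best_depth = None
    for d, x in zip(depth_values, cost_maps):
        if h is None:
            h = np.zeros(x.shape[:2] + (hid_ch,))
            best_score = np.full(x.shape[:2], -np.inf)
            best_depth = np.zeros(x.shape[:2])
        h, score = gru.step(x, h)
        better = score > best_score
        best_score = np.where(better, score, best_score)
        best_depth = np.where(better, d, best_depth)
    return best_depth


rng = np.random.default_rng(0)
depth_values = np.linspace(1.0, 2.0, 16)  # assumed depth hypotheses
gru = PixelwiseGRU(in_ch=3, hid_ch=4, rng=rng)
depth_map = sequential_depth(
    cost_map_stream(len(depth_values), 6, 8, 3, rng), depth_values, gru, hid_ch=4
)
print(depth_map.shape)
```

Because the cost maps come from a generator, the full H x W x D volume is never materialized. A 3D regularizer, by contrast, needs the whole volume in memory at once.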
Finally, we establish a large-scale synthetic MVS dataset, BlendedMVS, based on blended images and rendered depth maps. While several MVS datasets have been proposed, they fail to provide accurate depth and occlusion information, as ground-truth mesh models are usually incomplete. We therefore establish a new MVS dataset based on model rendering. Textured meshes are first reconstructed from images of different scenes and are then rendered into color images, depth maps, and occlusion maps. We further blend the rendered images with the input images using high-pass and low-pass filters to generate our training input. Extensive experiments demonstrate that models trained on BlendedMVS achieve significantly better generalization ability than models trained on other MVS datasets.

In sum, this thesis presents a complete learning-based solution to large-scale multi-view stereopsis, including a current baseline network (MVSNet), its large-scale extension (R-MVSNet), and a large-scale synthetic dataset (BlendedMVS). We bridge the gap between classical MVS reconstruction and recent deep learning techniques and demonstrate the effectiveness of learning-based MVS through extensive experiments on different datasets.

Date: Friday, 25 October 2019

Time: 4:30pm - 6:30pm

Venue: Room 2126D lift 19

Committee Members:
    Prof. Long Quan (Supervisor)
    Dr. Xiaojuan Ma (Chairperson)
    Dr. Pedro Sander
    Prof. Chiew-Lan Tai

**** ALL are Welcome ****