Recovering 3D Structures from 2D Images and Videos
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Recovering 3D Structures from 2D Images and Videos"

By

Mr. Likang WANG

Abstract:

As humans, we exist in a three-dimensional space and perceive the world through our eyes and sense of touch. However, capturing the three-dimensional world we observe is far from a straightforward task. Among the available sensors, only cameras can emulate the human visual system. Yet, conventional cameras typically provide only two-dimensional images. This limitation means our understanding of the world is akin to a blind person feeling an elephant. Over the past few decades, many researchers have strived to recover the three-dimensional structure of scenes from these two-dimensional images, but even today, reaching satisfactory quality and efficiency in reconstruction remains a significant challenge.

To navigate this challenge, we aim first to explore the limits of reconstruction quality. Specifically, we propose a novel coarse-to-fine strategy for scene reconstruction. This approach begins with estimating an initial spatial position for each pixel in the image. Next, we introduce a self-supervised method for estimating the error distribution between our preliminary predictions and the ground truth. This function allows us to concentrate our efforts on areas most likely to be accurate and carry out a more refined inspection. As a result, our strategy leads to significant improvements in reconstruction quality under the same time and space constraints.

We then explore how to achieve more satisfactory reconstruction results while meeting the requirements of real-time inference efficiency. For this purpose, we propose two innovative solutions. Firstly, we focus on achieving the highest possible quality in three-dimensional scene reconstruction while maintaining an inference speed of more than 30 frames per second. To do this, we propose a feature fusion method capable of simultaneously extracting and preserving the low-frequency and high-frequency information between video frames. It delivers massive improvements on large planes and fine details without introducing extra computational costs. In addition, based on the sparsity of the three-dimensional space, we propose an accurate and efficient loss correction strategy, enabling more comprehensive scene recovery. Secondly, we set our sights on achieving the most accurate detail recovery, updating at least once every 1/30th of a second. We factor in semantic consistency between frames to facilitate a swift preliminary screening of points in the three-dimensional space. This is followed by a meticulous evaluation focused only on a minority of spatial areas. As a result, our method not only achieves low-latency updates but also provides significantly superior detail quality.

While the novel methods we propose advance the field of three-dimensional structure recovery from two-dimensional images and videos, our approach isn't flawless. Therefore, we also discuss the shortcomings of our current work and propose promising directions for future research.

Date: Friday, 19 January 2024
Time: 1:00pm - 3:00pm
Venue: Room 5510
Lifts 25/26

Chairman: Prof. Howard LUONG (ECE)

Committee Members:
Prof. Lei CHEN (Supervisor)
Prof. Junxian HE
Prof. Qiong LUO
Prof. Can YANG (MATH)
Prof. Haibo HU (HKPU)

**** ALL are Welcome ****