Geometry Inference from Different Modalities: Videos, Polarization Images, and Portrait Images

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Geometry Inference from Different Modalities: Videos, Polarization
Images, and Portrait Images"

By

Miss Jiaxin XIE


Abstract:

Learning geometry from a single image has been a long-standing and challenging
problem. Single-image methods rely heavily on learned image priors, which may
not generalize well to unseen scenes. This thesis explores alternative
methodologies that incorporate additional information from diverse modalities
to enhance the understanding of 3D structures.

In Chapter 2, we propose a novel approach that leverages video frames extracted
from monocular videos. By solving the triangulation problem between two video
frames, initial depth estimates are obtained. This temporal context enhances
the accuracy and robustness of depth estimation, enabling a more comprehensive
reconstruction of the underlying 3D geometry.
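
As a rough illustration of the two-frame triangulation step mentioned above,
the sketch below recovers one 3D point (and hence its depth) from a pair of
views using the standard linear (DLT) formulation. It is not the thesis
implementation: the intrinsics K, the camera poses, the test point, and the
function names triangulate_point and project are all assumptions made for
this example.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 projection matrices; x1, x2: pixel coordinates (u, v).
    """
    # Each view contributes two linear constraints on the homogeneous point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point X with the 3x4 projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical cameras: shared intrinsics, second frame shifted along x.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

X_true = np.array([0.3, -0.1, 4.0])  # hypothetical scene point
X_est = triangulate_point(P1, P2, project(P1, X_true), project(P2, X_true))
depth = X_est[2]  # depth in the first camera's frame
```

With noisy correspondences or a very small baseline the linear estimate
degrades, which is one reason such initial depths are usually refined further.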

In Chapter 3, we use polarization images to aid normal estimation for complex
scenes. Polarization images capture distinct
changes in light polarization as it interacts with surfaces of different shapes
and materials. By analyzing polarization cues, dense surface orientation
information is extracted, facilitating accurate estimation of surface normals.
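
To make the notion of polarization cues concrete, the sketch below computes
two commonly used quantities, the degree and angle of linear polarization,
from four intensity images taken behind a linear polarizer at 0, 45, 90, and
135 degrees. This is the textbook Stokes-parameter computation, not the
method proposed in the thesis; the function name polarization_cues and the
synthetic input values are assumptions made for this example.

```python
import numpy as np

def polarization_cues(i0, i45, i90, i135, eps=1e-8):
    """Return (DoLP, AoLP) from four polarizer-angle intensity images."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical component
    s2 = i45 - i135                      # diagonal components
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)
    aolp = 0.5 * np.arctan2(s2, s1)      # radians, in (-pi/2, pi/2]
    return dolp, aolp

# Synthetic 2x2 "image" with known polarization state (assumed values).
intensity, rho, phi = 1.0, 0.3, 0.6

def observe(theta):
    # Intensity seen through a linear polarizer at angle theta.
    value = 0.5 * intensity * (1.0 + rho * np.cos(2.0 * theta - 2.0 * phi))
    return np.full((2, 2), value)

dolp, aolp = polarization_cues(
    observe(0.0), observe(np.pi / 4), observe(np.pi / 2), observe(3 * np.pi / 4)
)
```

The angle of polarization constrains the azimuth of the surface normal only
up to an ambiguity, and the degree of polarization depends on material as well
as shape, so these cues alone do not determine normals uniquely.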

In Chapter 4, we leverage a pre-trained 3D-aware portrait image generation
model to aid in depth estimation. The pre-trained model exhibits a strong
ability to generate multi-view portrait images. Exploiting this 3D-aware
generation capability, we utilize the model to infer depth from a single input
image. The estimated depth information is then employed to warp pseudo views,
effectively addressing the challenging geometry-texture trade-off encountered
in 3D inversion tasks.
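
The geometric core of depth-based warping can be sketched as follows: each
pixel is back-projected to 3D using its depth, moved into the target camera's
frame, and re-projected. This is a generic pinhole-camera illustration, not
the thesis pipeline; the function name warp_pixels, the intrinsics K, and the
relative pose (R, t) are assumptions made for this example.

```python
import numpy as np

def warp_pixels(uv, depth, K, R, t):
    """Map source pixels to a target view using per-pixel depth.

    uv: (N, 2) pixel coordinates; depth: (N,) depths in the source frame;
    K: 3x3 intrinsics; R, t: rotation and translation from source to target.
    """
    ones = np.ones((uv.shape[0], 1))
    rays = np.linalg.inv(K) @ np.hstack([uv, ones]).T  # 3xN rays with z = 1
    points = rays * depth                              # back-project to 3D
    points_target = R @ points + t.reshape(3, 1)       # change of camera frame
    projected = K @ points_target
    return (projected[:2] / projected[2]).T            # (N, 2) target pixels

# Hypothetical camera and a small sideways shift of the viewpoint.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
uv = np.array([[320.0, 240.0], [100.0, 50.0]])
depth = np.array([2.0, 2.0])

same_view = warp_pixels(uv, depth, K, np.eye(3), np.zeros(3))
shifted = warp_pixels(uv, depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

A full image warp would also need to handle occlusions and holes in the
target view, which this point-wise sketch ignores.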

Collectively, this thesis advances learning 3D geometry from single images by
incorporating information from different modalities, including videos,
polarization images, and portrait images. The proposed methodologies overcome
limitations of naive single-image approaches.


Date:                   Tuesday, 30 January 2024

Time:                   10:00am - 12:00noon

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               Prof. Ding PAN (PHYS)

Committee Members:      Prof. Qifeng CHEN (Supervisor)
                        Prof. Long CHEN
                        Prof. Dan XU
                        Prof. Ling SHI (ECE)
                        Prof. Rynson LAU (HKCityU)


**** ALL are Welcome ****