A deep discriminative representational framework for recovering categorical 3D object attributes from visual data

PhD Thesis Proposal Defence


Title: "A deep discriminative representational framework for recovering 
categorical 3D object attributes from visual data"

by

Mr. Shichao LI


Abstract:

Recognizing 3D properties from 2D RGB images is a fundamental problem in 
computer vision, and enables many applications such as human-computer 
interaction, traffic surveillance, autonomous perception, and augmented 
reality. This problem is challenging due to the loss of depth information in 
the image formation process and a large variation of depth, object geometry, 
and scene illumination. In early studies, David Marr proposed a 
representational framework of vision that begins with a low-level primal 
sketch, which is progressively mapped to an intermediate 2.5D sketch and 
finally a 3D model representation. With the increasing number of 3D labels, 
recent deep representation learning approaches instantiate such a framework by 
learning from data in an end-to-end manner. This thesis explores such an 
instantiation for a set of representative 3D perception problems. After 
introducing the background and niche of this thesis, we present these 
explorations in order of increasing number of 3D attributes, camera views, and 
system capabilities.

We first study the problem of recognizing the 3D orientation of vehicles from a 
single RGB image. In contrast to prior art that directly regresses the angle 
values with a deep neural network, we propose a progressive approach that 
learns geometry-aware representations with perspective points, which achieves 
improved model generalization. We encode the prior knowledge of a projective 
invariant into the training process to further improve representation learning 
with extra unlabeled images.

Secondly, we study inferring the non-rigid 3D posture of humans from 
single-view images. We identify a dataset bias problem in the training phase 
and propose the first method to incorporate synthetic data into the training 
of 2D-to-3D networks, achieving better model generalization to unseen inputs.

We then extend the study of rigid pose estimation problems to two views and 
study learning voxel-based representations for stereo 3D object detection. We 
propose a new multi-resolution approach that enables high-resolution modeling 
of object regions and design a new instance-level model to achieve 
high-precision, transferable pose refinement.

Finally, we push the capability of the perception model beyond rigid pose 
estimation to achieve fine-grained shape inference, making it more 
similar to the binocular human vision system. We design the first model for 
joint stereo 3D object detection and implicit shape estimation with a new 
instance-level model that infers shape with intermediate point-based 
representations. We further extend the pose refinement studies to non-rigid 
object classes such as pedestrians and cyclists.


Date:			Tuesday, 26 July 2022

Time:                  	2:00pm - 4:00pm

Zoom Meeting:		https://hkust.zoom.us/j/9838391022

Committee Members:	Prof. Tim Cheng (Supervisor)
 			Dr. Qifeng Chen (Chairperson)
 			Dr. Dan Xu
 			Prof. Weichuan Yu (ECE)


**** ALL are Welcome ****