A deep discriminative representational framework for recovering categorical 3D object attributes from visual data
PhD Thesis Proposal Defence

Title: "A deep discriminative representational framework for recovering categorical 3D object attributes from visual data"

by

Mr. Shichao LI

Abstract:

Recognizing 3D properties from 2D RGB images is a fundamental problem in computer vision and enables many applications, such as human-computer interaction, traffic surveillance, autonomous perception, and augmented reality. This problem is challenging because depth information is lost in the image formation process and because depth, object geometry, and scene illumination vary widely. In early studies, David Marr proposed a representational framework of vision that begins with a low-level primal sketch, which is progressively mapped to an intermediate 2.5D sketch and finally to a 3D model representation. With the increasing number of 3D labels, recent deep representation learning approaches instantiate such a framework by learning from data in an end-to-end manner. This thesis explores such an instantiation for a set of representative 3D perception problems. After introducing the background and niche of this thesis, these explorations are presented in order of increasing number of 3D attributes, camera views, and system capabilities.

We first study the problem of recognizing the 3D orientation of vehicles from a single RGB image. In contrast to prior art that directly regresses the angle values with a deep neural network, we propose a progressive approach that learns geometry-aware representations with perspective points, which achieves improved model generalization. We encode the prior knowledge of a projective invariant into the training process to further improve representation learning with extra unlabeled images. Secondly, we study inferring the non-rigid 3D posture of humans from single-view images.
We identify a dataset bias problem in the training phase and propose the first method to incorporate synthetic data into the training of 2D-to-3D networks, achieving better model generalization to unseen inputs. We then extend the study of rigid pose estimation to two views and study learning voxel-based representations for stereo 3D object detection. We propose a new multi-resolution approach that enables high-resolution modeling of object regions, and we design a new instance-level model that achieves high-precision and transferable pose refinement. Finally, we push the capability of the perception model beyond rigid pose estimation to fine-grained shape inference, making it more similar to the binocular human vision system. We design the first model for joint stereo 3D object detection and implicit shape estimation, with a new instance-level model that infers shape from intermediate point-based representations. We further extend the pose refinement studies to non-rigid object classes such as pedestrians and cyclists.

Date: Tuesday, 26 July 2022
Time: 2:00pm - 4:00pm
Zoom Meeting: https://hkust.zoom.us/j/9838391022

Committee Members:
Prof. Tim Cheng (Supervisor)
Dr. Qifeng Chen (Chairperson)
Dr. Dan Xu
Prof. Weichuan Yu (ECE)

**** ALL are Welcome ****