A deep discriminative representational framework for recovering categorical 3D object attributes from visual data
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "A deep discriminative representational framework for recovering categorical 3D object attributes from visual data"

By

Mr. Shichao LI

Abstract

How to recognize 3D properties from 2D RGB images is a fundamental problem in computer vision, which enables numerous applications such as human-computer interaction, traffic surveillance, autonomous perception, and augmented reality. This problem is challenging due to the loss of depth information in the image formation process and large variations in depth, object geometry, and scene illumination. In early studies, David Marr proposed a representational framework of vision that begins with a low-level primal sketch, which progresses to an intermediate 2.5D sketch and finally a 3D model representation. With an increasing amount of 3D labels, recent deep representation learning approaches instantiate such a framework by learning from data in an end-to-end manner.

This thesis explores such instantiations for a set of representative 3D perception problems. After introducing the background and the niche of this thesis, these studies are presented in order of increasing number of 3D attributes, camera views, and system capabilities.

We first study the problem of recognizing the 3D orientation of vehicles from a single RGB image. In contrast to prior art that directly regresses the angular values with a deep neural network, we propose a progressive approach by learning geometry-aware representations with perspective points, which achieves improved model generalization. We encode the prior knowledge of a projective invariant into the training process to further improve the representation learning with extra unlabeled images.

Secondly, we study inferring non-rigid 3D posture of humans from single-view images. We identify a dataset bias problem in the training phase and propose the first method to incorporate synthetic data into the training phase of 2D-to-3D networks to achieve better model generalization to unseen inputs.

We then extend the study of rigid pose estimation problems to two views and study learning voxel-based representations for stereo 3D object detection. We propose a new multi-resolution approach that enables high-resolution modeling of object regions and design a new instance-level model to achieve high-precision and transferable pose refinement.

Finally, we push the capability of the perception model to go beyond rigid pose estimation and achieve fine-grained shape inference, making it more similar to the binocular human vision system. We design the first model for joint stereo 3D object detection and implicit shape estimation with a new instance-level model that infers shape with intermediate point-based representations. We further extend the pose refinement studies to non-rigid object classes such as pedestrians and cyclists.

Date: Tuesday, 16 August 2022
Time: 2:00pm - 3:40pm
Zoom Meeting: https://hkust.zoom.us/j/9838391022

Chairperson: Prof. Weiyin HONG (ISOM)

Committee Members:
Prof. Tim CHENG (Supervisor)
Prof. Qifeng CHEN
Prof. Dan XU
Prof. Weichuan YU (ECE)
Prof. Hongsheng LI (CUHK)

**** ALL are Welcome ****