Learning Rigid Object Pose Estimation
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Learning Rigid Object Pose Estimation"

By

Mr. Yisheng HE

Abstract

Rigid object pose estimation aims to predict the target object's orientation, position, and size. It is a key component of many real-world applications, including but not limited to robotic manipulation, augmented reality, and autonomous driving. Recently, the rapid development of deep learning techniques has inspired various learning-based approaches to rigid object pose estimation. In this thesis, we advance learning-based rigid object pose estimation in three aspects: improving pose estimation accuracy, enhancing network generalizability, and eliminating the reliance on manual labels.

First, we improve the accuracy of learning-based object pose estimation by enhancing its two main sub-modules: the representation learning backbone that extracts features from RGBD inputs, and the subsequent output representation for pose estimation. For representation learning, we introduce a full-flow bidirectional fusion network that combines the complementary information residing in the RGB and depth images. The network extracts features rich in semantic and geometric information for precise regression in different downstream tasks. For output representation, we introduce a 3D-keypoint-based algorithm that jointly performs instance semantic segmentation and 3D keypoint detection. The pose parameters are then estimated by least-squares fitting. Our 3D-keypoint-based formulation fully leverages the geometric constraints of the rigid object and is easy for a network to learn and optimize.

Second, we enhance the generalizability of pose estimation algorithms by removing the closed-set assumption and the reliance on high-fidelity object CAD models.
We study a few-shot open-set 6D pose estimation problem, which aims to estimate the 6D pose of unknown objects given only a few support views. We propose a large-scale photorealistic dataset (ShapeNet6D) for network pre-training and introduce a dense prototype matching network to tackle the pose estimation problem. We also establish a benchmark to facilitate future research on this new and challenging problem.

Finally, to eliminate the reliance on time- and labor-consuming manual labels, we propose a self-supervised framework for category-level object pose and size estimation. Specifically, we propose a label-free method that learns to enforce geometric consistency between the category template mesh and the observed object point cloud in a self-supervised manner. Given the category template mesh and the observed scene object point cloud, we leverage differentiable shape deformation, registration, and rendering to enforce this geometric consistency for self-supervision.

Date: Tuesday, 13 December 2022

Time: 5:00pm - 7:00pm

Venue: Room 5566, lifts 27/28

Chairperson: Prof. Jun ZHANG (ECE)

Committee Members: Prof. Qifeng CHEN (Supervisor)
                   Prof. Long QUAN
                   Prof. Dan XU
                   Prof. Ling SHI (ECE)
                   Prof. Hongsheng LI (CUHK)

**** ALL are Welcome ****
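
Illustrative note: the abstract mentions that, once 3D keypoints are detected, the pose parameters are estimated by least-squares fitting. The sketch below is not taken from the thesis. It assumes the standard SVD-based (Kabsch) solution for fitting a rotation and translation between corresponding 3D point sets. The function name fit_rigid_transform and the synthetic keypoints are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_rigid_transform(model_kps, scene_kps):
    """Find R, t minimizing sum_i ||R @ p_i + t - q_i||^2 (hypothetical helper).

    model_kps: (N, 3) keypoints in the object frame (p_i).
    scene_kps: (N, 3) corresponding keypoints in the camera frame (q_i).
    """
    model_kps = np.asarray(model_kps, dtype=float)
    scene_kps = np.asarray(scene_kps, dtype=float)

    # Center both keypoint sets on their centroids.
    mu_p = model_kps.mean(axis=0)
    mu_q = scene_kps.mean(axis=0)

    # 3x3 cross-covariance between the centered sets.
    H = (model_kps - mu_p).T @ (scene_kps - mu_q)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that det(R) = +1 (rotation, not reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_q - R @ mu_p
    return R, t


# Synthetic check: apply a known pose to 8 random keypoints and recover it.
rng = np.random.default_rng(0)
model = rng.normal(size=(8, 3))
angle = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.1, -0.2, 0.5])
scene = model @ R_true.T + t_true

R_est, t_est = fit_rigid_transform(model, scene)
```

With noise-free correspondences, R_est and t_est should match R_true and t_true up to floating-point error. With noisy keypoint predictions the same routine returns the least-squares optimum over all correspondences.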