Learning Rigid Object Pose Estimation

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Learning Rigid Object Pose Estimation"

By

Mr. Yisheng HE


Abstract

Rigid object pose estimation aims to predict the target object's orientation, 
position, and size. It is a significant component of various real-world 
applications, including but not limited to robotic manipulation, augmented 
reality, and autonomous driving. Recently, the rapid development of deep 
learning techniques has inspired various learning-based approaches to promote 
rigid object pose estimation. In this thesis, we advance learning-based rigid 
object pose estimation in three aspects: improving the pose estimation 
accuracy, enhancing the network generalizability, and eliminating the reliance 
on manual labels.

First, we improve the accuracy of learning-based object pose estimation by 
enhancing the two main sub-modules, the representation learning backbone for 
feature extraction from RGBD inputs and the subsequent output representation 
for pose estimation. For representation learning, we introduce a full-flow 
bidirectional fusion network to combine the complementary information residing 
in the RGB and depth images. Features with rich semantic and geometric 
information are extracted for precise regression of different downstream tasks. 
For output representation, we introduce a 3D-keypoint-based algorithm that 
jointly performs instance semantic segmentation and 3D keypoint detection. The 
pose parameters are then estimated by least-squares fitting. Our 
3D-keypoint-based formulation fully leverages the geometric constraint of the 
rigid object and is easy for a network to learn and optimize.
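
As an illustration of the least-squares fitting step, the sketch below 
recovers a rotation and translation from matched 3D keypoints using the 
standard SVD-based (Kabsch) solution. It is a minimal sketch, not the thesis 
implementation: the function name fit_rigid_pose and the argument names 
model_kps and pred_kps are hypothetical, and the keypoints are assumed to be 
already matched one-to-one.

```python
import numpy as np

def fit_rigid_pose(model_kps, pred_kps):
    """Find R, t minimizing sum_i ||R @ m_i + t - p_i||^2 (Kabsch algorithm).

    model_kps: (N, 3) keypoints in the object frame (hypothetical name).
    pred_kps:  (N, 3) matching keypoints in the camera frame (hypothetical name).
    """
    model_kps = np.asarray(model_kps, dtype=float)
    pred_kps = np.asarray(pred_kps, dtype=float)

    # Center both keypoint sets on their centroids.
    mu_m = model_kps.mean(axis=0)
    mu_p = pred_kps.mean(axis=0)

    # 3x3 cross-covariance between the centered sets.
    H = (model_kps - mu_m).T @ (pred_kps - mu_p)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_p - R @ mu_m
    return R, t
```

At least three non-collinear keypoints are needed for the rotation to be 
uniquely determined.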

Second, we enhance the generalizability of pose estimation algorithms by 
eliminating the closed-set assumption and their reliance on high-fidelity object 
CAD models. We study a few-shot open-set 6D pose estimation problem, which aims 
to estimate the 6D pose of unknown objects given only a few support views. We 
propose a large-scale photorealistic dataset (ShapeNet6D) for network 
pre-training and introduce a dense prototype matching network to tackle the 
pose estimation problem. We also establish a benchmark to facilitate future 
research on this new challenging problem.

Finally, to eliminate the reliance on time- and labor-consuming manual labels, 
we propose a self-supervised framework for category-level object pose and size 
estimation. Specifically, we propose a label-free method that learns to enforce 
geometric consistency between the category template mesh and the observed 
object point cloud in a self-supervised manner. Given the category template 
mesh and the observed scene object point cloud, we leverage differentiable 
shape deformation, registration, and rendering to enforce this consistency 
without manual labels.
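
One common way to measure geometric consistency between two point sets is the 
symmetric Chamfer distance. The abstract does not state which loss the thesis 
uses, so the sketch below is an assumption for illustration only; the function 
name chamfer_distance and its arguments are hypothetical, and points sampled 
from the deformed template are assumed to be given as an array.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance (squared Euclidean) between two point sets.

    a: (N, 3) points, e.g. sampled from a deformed template (assumption).
    b: (M, 3) points, e.g. the observed object point cloud (assumption).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Pairwise squared distances, shape (N, M).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)

    # Mean nearest-neighbour distance in each direction.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

The value is zero when every point in each set coincides with a point in the 
other, and it grows as the two shapes drift apart.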


Date:			Tuesday, 13 December 2022

Time:			5:00pm - 7:00pm

Venue:			Room 5566
 			lifts 27/28

Chairperson:		Prof. Jun ZHANG (ECE)

Committee Members:	Prof. Qifeng CHEN (Supervisor)
 			Prof. Long QUAN
 			Prof. Dan XU
 			Prof. Ling SHI (ECE)
 			Prof. Hongsheng LI (CUHK)


**** ALL are Welcome ****