LEARNING RIGID OBJECT POSE ESTIMATION

PhD Thesis Proposal Defence


Title: "LEARNING RIGID OBJECT POSE ESTIMATION"

by

Mr. Yisheng HE


Abstract:

Rigid object pose estimation aims to predict a target object's orientation,
position, and size. It is a key component of many real-world applications,
including robotic manipulation, augmented reality, and autonomous driving.
Traditional algorithms for this problem rely on hand-crafted features to
establish correspondences between images and object mesh models, but their
performance degrades in challenging scenarios such as changing illumination
and heavy occlusion. In this thesis, we leverage deep learning techniques to
advance rigid object pose estimation.

First, we decompose learning-based object pose estimation into two
sub-modules: a representation learning backbone that extracts features from
RGBD inputs, and a subsequent output representation for pose estimation. For
representation learning, we introduce a full-flow bidirectional fusion network
that combines the complementary information residing in the RGB and depth
images. The extracted features carry rich semantic and geometric information
for precise prediction in downstream tasks. For the output representation, we
introduce a 3D-keypoint-based algorithm that jointly performs instance
semantic segmentation and 3D keypoint detection. The pose parameters are then
estimated by least-squares fitting. Our 3D-keypoint-based formulation fully
leverages the geometric constraints of rigid objects and is easy for a network
to learn and optimize.
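
As a concrete illustration of the least-squares fitting step, the minimal
NumPy sketch below recovers the rotation R and translation t that best align
the object's canonical 3D keypoints with their predicted camera-space
locations, using the classic Kabsch/Umeyama closed-form solution. The names
and the toy check are illustrative assumptions, not the exact procedure used
in the thesis.

    import numpy as np

    def fit_pose_least_squares(kps_canonical, kps_predicted):
        """Recover R, t minimizing sum_i ||R @ c_i + t - p_i||^2
        (the classic Kabsch/Umeyama least-squares fit).

        kps_canonical: (N, 3) keypoints on the object mesh model.
        kps_predicted: (N, 3) matching keypoints detected in camera space.
        """
        # Center both point sets on their centroids.
        mu_c = kps_canonical.mean(axis=0)
        mu_p = kps_predicted.mean(axis=0)
        A = kps_canonical - mu_c
        B = kps_predicted - mu_p

        # Optimal rotation from the SVD of the 3x3 cross-covariance matrix.
        U, _, Vt = np.linalg.svd(A.T @ B)
        # Reflection correction keeps R a proper rotation (det(R) = +1).
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = mu_p - R @ mu_c
        return R, t

    # Toy check: a noise-free rigid transform should be recovered exactly.
    rng = np.random.default_rng(0)
    kps = rng.normal(size=(8, 3))
    c, s = np.cos(0.3), np.sin(0.3)
    R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    t_true = np.array([0.1, -0.2, 0.5])
    R_est, t_est = fit_pose_least_squares(kps, kps @ R_true.T + t_true)
    assert np.allclose(R_est, R_true) and np.allclose(t_est, t_true)

The closed-form SVD solution is fast and exact given correct correspondences,
which is part of what makes keypoint-based formulations easy to optimize.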

Second, we study a few-shot open-set 6D pose estimation problem. Our goal is
to remove two limitations of learning-based pose estimation algorithms: the
closed-set assumption and the reliance on high-fidelity object CAD models. The
proposed few-shot 6D pose estimation problem is to estimate the 6D pose of an
unknown object given only a few support views of it. We propose a large-scale
photorealistic dataset (ShapeNet6D) for network pre-training and introduce a
dense prototype matching network to estimate pose parameters. We also
establish a benchmark to facilitate future research on this new and
challenging problem.
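
To make the dense matching idea concrete, here is a hedged, assumption-laden
sketch (illustrative names, not the thesis's actual network): per-point query
features are matched against prototype features aggregated from the support
views by cosine similarity, yielding 3D correspondences that a least-squares
fit like the one above could turn into a pose.

    import numpy as np

    def match_dense_prototypes(query_feats, proto_feats, proto_xyz):
        """Nearest-prototype matching by cosine similarity.

        query_feats: (M, C) per-point features from the query view.
        proto_feats: (K, C) prototype features from the support views.
        proto_xyz:   (K, 3) object-space coordinates of the prototypes.
        Returns (M, 3) matched object-space coordinates and (M,) scores.
        """
        q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
        p = proto_feats / np.linalg.norm(proto_feats, axis=1, keepdims=True)
        sim = q @ p.T                 # (M, K) cosine similarities
        nearest = sim.argmax(axis=1)  # best prototype per query point
        return proto_xyz[nearest], sim.max(axis=1)

Low-score matches can be discarded before pose fitting, a common way to keep
occluded or ambiguous regions from corrupting the least-squares solution.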

Finally, we propose a self-supervised framework for category-level object
pose and size estimation. Our goal is to free learning-based algorithms from
their reliance on time-consuming and labor-intensive manual labels.
Specifically, given the category template mesh and the observed object point
cloud in the scene, our label-free method leverages differentiable shape
deformation, registration, and rendering to enforce geometric consistency
between them as the self-supervision signal.
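
As a hedged sketch of such a geometric-consistency objective (covering the
deformation and registration terms only; the differentiable rendering term is
omitted), the snippet below deforms a category template point set, poses it
with (R, t), and scores agreement with the observed points via a symmetric
Chamfer distance. Names and the loss form are assumptions for illustration,
not the thesis's exact losses.

    import numpy as np

    def chamfer_distance(X, Y):
        """Symmetric Chamfer distance between point sets X (N, 3) and
        Y (M, 3): mean nearest-neighbor squared distance, both ways."""
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # (N, M)
        return d2.min(axis=1).mean() + d2.min(axis=0).mean()

    def consistency_loss(template_pts, deform, R, t, observed_pts):
        """Deform the category template, pose it with (R, t), and measure
        how well it explains the observed object points. `deform` is a
        per-point offset field (N, 3) that a network would predict."""
        shaped = template_pts + deform   # differentiable shape deformation
        posed = shaped @ R.T + t         # differentiable registration
        return chamfer_distance(posed, observed_pts)

    # Sanity check: identity pose and zero deformation on identical clouds
    # should give zero loss.
    pts = np.random.default_rng(1).normal(size=(64, 3))
    assert np.isclose(
        consistency_loss(pts, np.zeros_like(pts), np.eye(3), np.zeros(3), pts),
        0.0)

In a full pipeline these operations would run on autodiff tensors so the
consistency loss can backpropagate into the deformation and pose networks.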


Date:			Friday, 8 July 2022

Time:			4:00pm - 6:00pm

Zoom Meeting:		https://hkust.zoom.us/j/4536985718

Committee Members:	Dr. Qifeng Chen (Supervisor)
			Prof. Long Quan (Chairperson)
			Dr. Dan Xu
			Prof. Ling Shi (ECE)


**** ALL are Welcome ****