PhD Thesis Proposal Defence


Title: "Image and Video Instance Segmentation: Towards Better Quality, 
Efficiency and Robustness"

by

Mr. Lei KE


Abstract:

Instance segmentation is a fundamental task in computer vision with many 
real-world applications, such as image/video editing, robotic perception, 
self-driving and medical imaging. Various image/video instance segmentation 
approaches have been proposed and have achieved remarkable progress. However, 
the quality of their predicted masks remains unsatisfactory, with over-smoothed 
object boundaries. Moreover, the performance of existing image/video instance 
segmentation methods degrades significantly when deployed in complex 
real-world environments, for example when segmenting heavily occluded 
instances. Beyond single images, efficiently leveraging long-range temporal 
correspondence to improve video segmentation also remains underexplored.

For high-quality image-based instance segmentation, we present Mask Transfiner 
and propose the concept of Incoherent Regions. Instead of operating on regular 
dense tensors, Mask Transfiner decomposes and represents the image regions as 
a quadtree and corrects only the sparse error-prone areas. This allows Mask 
Transfiner to predict highly accurate instance masks efficiently, at a low 
computational cost.
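
To make this concrete, the NumPy sketch below shows a single-level 
simplification of the scheme (the actual method builds a multi-level quadtree 
and uses a transformer for the correction): incoherent pixels are those whose 
value is lost when the mask is down- and re-upsampled, and only these sparse 
points are re-predicted. The refine_fn here is a hypothetical stand-in for the 
correction head, not the thesis implementation.

    import numpy as np

    def incoherent_pixels(mask, factor=2):
        # Flag pixels whose value is lost by down- then up-sampling the mask.
        # These sparse, error-prone points (mostly along object boundaries)
        # are the only ones a refinement head needs to re-predict.
        down = mask[::factor, ::factor]                       # nearest-neighbour
        up = np.repeat(np.repeat(down, factor, 0), factor, 1)
        return up != mask

    def refine_sparse(coarse_mask, point_features, refine_fn, factor=2):
        # Re-predict only the incoherent points; the rest of the dense mask
        # is kept as-is, which is where the computational savings come from.
        bad = incoherent_pixels(coarse_mask, factor)
        out = coarse_mask.copy()
        out[bad] = refine_fn(point_features[:, bad])          # (C, K) -> (K,)
        return out

    # Toy usage: a random linear point classifier stands in for the
    # transformer correction head of the real method.
    rng = np.random.default_rng(0)
    mask = (rng.random((16, 16)) > 0.5).astype(np.uint8)
    feats = rng.normal(size=(8, 16, 16))
    w = rng.normal(size=8)
    refined = refine_sparse(mask, feats, lambda f: (w @ f > 0).astype(np.uint8))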

For better video segmentation quality, we build on Mask Transfiner to design 
VMT, the first high-quality video instance segmentation (VIS) method, which 
leverages high-resolution features through a highly efficient video 
transformer structure. To benchmark high-quality VIS, we introduce the 
Tube-Boundary AP metric along with the large-scale HQ-YTVIS benchmark. In 
addition, we design an efficient prototypical cross-attention network that 
distills temporal information to improve multiple object tracking and 
segmentation (MOTS), especially in self-driving scenarios.
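
As a rough illustration of the prototypical cross-attention idea (not the 
exact network), the NumPy sketch below condenses a large memory of past-frame 
features into a few prototype vectors and lets the current frame attend to 
those instead of the full memory; chunked mean pooling here is only a 
placeholder for the real condensation step.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def prototypical_cross_attention(query, memory, num_prototypes=8):
        # Condense the large past-frame memory into a few prototype vectors.
        chunks = np.array_split(memory, num_prototypes, axis=0)
        protos = np.stack([c.mean(axis=0) for c in chunks])    # (P, C)
        # Scaled dot-product cross-attention against P prototypes instead of
        # all M memory entries, so cost drops from O(N*M) to O(N*P).
        scale = np.sqrt(query.shape[1])
        attn = softmax(query @ protos.T / scale, axis=-1)      # (N, P)
        return attn @ protos                                   # (N, C)

    # Toy usage: 100 current-frame feature vectors attend to a memory of 4000.
    rng = np.random.default_rng(0)
    out = prototypical_cross_attention(rng.normal(size=(100, 16)),
                                       rng.normal(size=(4000, 16)))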

To enhance segmentation robustness under heavy occlusion, we propose BCNet, 
which uses a simple bilayer decoupling network to explicitly model the 
occluder-occludee relationship. We extensively investigate the efficacy of the 
bilayer structure with FCN, GCN and ViT network architectures, and demonstrate 
its effectiveness and generalization on six large-scale, popular image and 
video instance segmentation benchmarks.
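
The sketch below gives a minimal NumPy rendering of this bilayer idea under 
our own simplifying assumptions: two sigmoid mask branches share one RoI 
feature map, and the occludee branch is conditioned on the occluder branch's 
features, so the occlusion relation is represented explicitly rather than 
entangled in a single output layer. The 1x1-convolution heads are toy 
stand-ins for the FCN/GCN/ViT variants studied in the thesis.

    import numpy as np

    def conv1x1(x, w):
        # A 1x1 convolution as a channel matmul: (C_in, H, W) -> (C_out, H, W).
        c, h, wd = x.shape
        return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, wd)

    def bilayer_masks(roi_feat, w_mix, w_occluder, w_occludee):
        # Occluder branch: its features and mask are predicted first.
        occluder_feat = np.maximum(conv1x1(roi_feat, w_mix), 0)           # ReLU
        occluder_mask = 1 / (1 + np.exp(-conv1x1(occluder_feat, w_occluder)))
        # Occludee branch: conditioned on the occluder features, so the
        # occlusion relation is explicit instead of entangled in one layer.
        occludee_mask = 1 / (1 + np.exp(-conv1x1(roi_feat + occluder_feat,
                                                 w_occludee)))
        return occluder_mask, occludee_mask

    # Toy usage on a random 8-channel, 14x14 RoI feature map.
    rng = np.random.default_rng(0)
    feat = rng.normal(size=(8, 14, 14))
    occluder, occludee = bilayer_masks(feat,
                                       0.1 * rng.normal(size=(8, 8)),
                                       0.1 * rng.normal(size=(1, 8)),
                                       0.1 * rng.normal(size=(1, 8)))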


Date:			Monday, 6 February 2023

Time:			3:00pm - 5:00pm

Venue: 			Room 4475
 			Lifts 25/26

Committee Members:	Prof. Chi-Keung Tang (Supervisor)
 			Dr. Qifeng Chen (Chairperson)
 			Dr. Dan Xu
 			Prof. Yu-Wing Tai


**** ALL are Welcome ****