Image and Video Instance Segmentation: Towards Better Quality, Efficiency and Robustness
PhD Thesis Proposal Defence

Title: "Image and Video Instance Segmentation: Towards Better Quality, Efficiency and Robustness"

by

Mr. Lei KE

Abstract:

Instance segmentation is a fundamental task in computer vision with many real-world applications, such as image/video editing, robotic perception, self-driving and medical imaging. Various image and video instance segmentation approaches have been proposed, with remarkable progress. However, their predicted mask quality remains unsatisfactory, with over-smoothed object boundaries. Moreover, the performance of existing methods degrades significantly when deployed in complex real-world environments, for example when segmenting heavily occluded instances. Beyond single images, efficiently leveraging long-range temporal correspondence to improve video segmentation also remains underexplored.

For high-quality image-based instance segmentation, we present Mask Transfiner and propose the concept of Incoherent Regions. Instead of operating on regular dense tensors, Mask Transfiner decomposes and represents image regions as a quadtree and corrects only sparse, error-prone areas. This allows it to predict highly accurate instance masks efficiently and at low computational cost.

For better video segmentation quality, we build on Mask Transfiner to design VMT, the first high-quality video instance segmentation (VIS) method, which leverages high-resolution features through a highly efficient video transformer structure. To benchmark high-quality VIS, we introduce the Tube-Boundary AP metric along with the large-scale HQ-YTVIS benchmark. We also design an efficient prototypical cross-attention network that distills temporal information to improve multiple object tracking and segmentation (MOTS), especially in self-driving scenarios.

To enhance segmentation robustness under heavy occlusion, we propose BCNet, a simple bilayer decoupling network for explicit occluder-occludee modeling. We extensively investigate the efficacy of the bilayer structure with FCN, GCN and ViT architectures, and demonstrate its effectiveness and generalization on six large-scale, popular image and video instance segmentation benchmarks.

Date: Monday, 6 February 2023
Time: 3:00pm - 5:00pm
Venue: Room 4475 (Lifts 25/26)

Committee Members:
Prof. Chi-Keung Tang (Supervisor)
Dr. Qifeng Chen (Chairperson)
Dr. Dan Xu
Prof. Yu-Wing Tai

**** ALL are Welcome ****
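For readers curious how the Incoherent Regions idea from the abstract might look in practice, here is a minimal sketch, not the authors' released code: it flags pixels whose value cannot be recovered after the mask is downsampled and upsampled again, a common proxy for the information-loss criterion the abstract describes. The function name incoherent_regions and the bilinear down/up test are assumptions made purely for illustration.

import torch
import torch.nn.functional as F

def incoherent_regions(mask: torch.Tensor, scale: int = 2) -> torch.Tensor:
    """Hypothetical sketch of the incoherent-region criterion.

    mask: (N, 1, H, W) binary instance masks.
    Returns a (N, 1, H, W) boolean map marking pixels that flip when the
    mask is downsampled by `scale` and upsampled back -- a proxy for
    information lost during downsampling. Only these sparse points would
    be re-predicted by the quadtree refinement stage; coherent interior
    pixels are left untouched.
    """
    down = F.interpolate(mask.float(), scale_factor=1.0 / scale,
                         mode="bilinear", align_corners=False)
    up = F.interpolate(down, size=mask.shape[-2:],
                       mode="bilinear", align_corners=False)
    return (up > 0.5) != (mask > 0.5)

# Toy usage: a square mask; only boundary pixels get flagged.
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0
sparse = incoherent_regions(mask)
print(f"{sparse.sum().item()} of {mask.numel()} pixels flagged for refinement")

Because the flagged set is sparse and concentrated along object boundaries, refining only these points is what keeps the quadtree approach cheap relative to dense per-pixel refinement.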
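Similarly, the bilayer occluder-occludee decoupling behind BCNet can be pictured as two stacked mask branches, where the occluder branch's output conditions the occludee branch. The following is a simplified, hypothetical PyTorch sketch; the class name, layer choices and fusion scheme are assumptions for illustration, not the released BCNet architecture.

import torch
import torch.nn as nn

class BilayerMaskHead(nn.Module):
    """Hypothetical two-layer mask head: occluder on top, occludee below."""

    def __init__(self, in_ch: int = 256):
        super().__init__()
        # Top layer: segments the occluder(s) overlapping the RoI.
        self.occluder = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(in_ch, 1, 1),
        )
        # Bottom layer: segments the occludee (target instance),
        # conditioned on the occluder branch via feature fusion.
        self.fuse = nn.Conv2d(in_ch + 1, in_ch, 1)
        self.occludee = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(in_ch, 1, 1),
        )

    def forward(self, roi_feat: torch.Tensor):
        occluder_logits = self.occluder(roi_feat)
        fused = self.fuse(torch.cat([roi_feat, occluder_logits.sigmoid()], dim=1))
        occludee_logits = self.occludee(fused)
        # Both maps would be supervised: the occluder with masks of
        # overlapping instances, the occludee with the target's mask.
        return occluder_logits, occludee_logits

head = BilayerMaskHead()
occ, tgt = head(torch.randn(2, 256, 28, 28))
print(occ.shape, tgt.shape)  # torch.Size([2, 1, 28, 28]) each

The design intuition from the abstract is that explicitly modeling the occluder in its own layer removes its contour from the target's prediction problem, instead of forcing a single mask head to disentangle the two boundaries implicitly.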