Towards Data-Efficient Object Recognition In The Open World

PhD Thesis Proposal Defence


Title: "Towards Data-Efficient Object Recognition In The Open World"

by

Mr. Qi FAN


Abstract:

Conventional object recognition methods typically require a substantial amount
of training data, and preparing such high-quality training data is very
labor-intensive. This has motivated the recent development of data-efficient 
object recognition. Further research is needed to develop novel techniques that 
can enable high-performance object recognition with limited labeled data. The 
success of such techniques can have a significant impact on various 
applications such as autonomous vehicles, robotics, and healthcare.

In this thesis, we present our proposed data-efficient object recognition 
methods. We first introduce a general few-shot object detection model that can
be applied to detect novel objects without re-training or fine-tuning by
exploiting the matching relationship between object pairs in a weight-shared
network at multiple network stages. Central to our method are our
Attention-RPN, Multi-Relation Detector and Contrastive Training strategy, which
exploit the similarity between the few-shot support set and the query set to
detect novel objects while suppressing false detections in the background. To train our
network, we contribute a new dataset that contains 1000 categories of various 
objects with high-quality annotations.

Then we propose a technique to detect objects in videos, with three
contributions to real-world visual learning challenges in our highly diverse
and dynamic world: 1) a large-scale video dataset, FSVOD-500, comprising 500
classes with class-balanced videos in each category for few-shot learning; 2) a
novel Tube Proposal Network (TPN) to generate high-quality video tube proposals
for aggregating feature representations of the target video object, which can
be highly dynamic; 3) a strategically improved Temporal Matching Network (TMN+)
for matching representative query tube features with better discriminative
ability, thus achieving higher diversity.

Next, we further generalize our method to detect objects in unseen domains. We
analyze and investigate effective solutions to overcome domain style
overfitting for robust object detection. Our method, dubbed Normalization
Perturbation (NP), perturbs the channel statistics of source-domain low-level
features to synthesize various latent styles, so that the trained deep model
can perceive diverse potential domains and generalize well even without
observing target-domain data in training. Normalization Perturbation relies on
only a single source domain and is surprisingly simple and effective,
contributing a practical solution by effectively adapting or generalizing
classification domain generalization (DG) methods to robust object
detection.
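
As a rough illustration of the idea of perturbing channel statistics, the
sketch below rescales the mean and standard deviation of each feature channel
by random factors. It is a minimal sketch under stated assumptions, not the
implementation from the thesis: the function name, the Gaussian noise centred
at 1, the noise_std value and the list-of-lists feature layout are all
hypothetical choices made for this example.

```python
import random
import statistics


def perturb_channel_statistics(features, noise_std=0.5, rng=None):
    """Randomly rescale the mean and std of each channel (illustrative only).

    features: list of channels, each a flat list of floats (assumed layout).
    noise_std: spread of the random scaling factors (assumed value).
    """
    rng = rng or random.Random()
    perturbed = []
    for channel in features:
        mean = statistics.fmean(channel)
        alpha = rng.gauss(1.0, noise_std)  # random factor applied to the std
        beta = rng.gauss(1.0, noise_std)   # random factor applied to the mean
        # Normalizing and re-styling, (alpha * std) * (x - mean) / std + beta * mean,
        # simplifies to the expression below and avoids dividing by a zero std.
        perturbed.append([alpha * (x - mean) + beta * mean for x in channel])
    return perturbed
```

In such a scheme the perturbation would be applied only during training and
switched off at test time; setting noise_std to 0 leaves the features
unchanged.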

Finally, we extend our methods to accurately segment objects for high-quality
masks by exploiting object commonalities among different classes.
Specifically, we parse two types of commonalities: 1) shape commonalities, which
are learned by performing supervised learning on instance boundary prediction;
and 2) appearance commonalities, which are captured by modeling pairwise
affinities among pixels of feature maps to optimize the separability between
instances and the background.
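
To make the notion of pairwise affinities among pixels concrete, the sketch
below computes the cosine similarity between every pair of pixel feature
vectors. It is only an illustration: the choice of cosine similarity, the
function name and the flat list-of-vectors layout are assumptions for this
example, and the abstract does not specify how the affinities are defined or
optimized.

```python
import math


def pairwise_affinity(pixel_features):
    """Cosine similarity between every pair of pixel feature vectors.

    pixel_features: list of per-pixel feature vectors (assumed layout).
    Returns a square matrix where entry [i][j] is the affinity of pixels i and j.
    """
    # Use 1.0 in place of a zero norm so all-zero vectors do not divide by zero.
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in pixel_features]
    unit = [[v / n for v in f] for f, n in zip(pixel_features, norms)]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]
```

Under this assumed definition, pixels with similar features have affinity
close to 1, while pixels with orthogonal features have affinity 0.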


Date:			Thursday, 13 April 2023

Time:                  	5:00pm - 7:00pm

Venue:			Room 5501
  			lifts 25/26

Committee Members:	Prof. Chi-Keung Tang (Supervisor)
 			Dr. Hao Chen (Chairperson)
 			Dr. Qifeng Chen
 			Dr. Dan Xu


**** ALL are Welcome ****