TOWARDS DATA-EFFICIENT OBJECT RECOGNITION IN THE OPEN WORLD

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "TOWARDS DATA-EFFICIENT OBJECT RECOGNITION IN THE OPEN WORLD"

By

Mr. Qi FAN


Abstract:

Conventional object recognition methods typically require a substantial amount
of training data, and preparing such high-quality training data is very
labor-intensive. This has motivated the recent development of data-efficient
object recognition. Further research is needed to develop novel techniques that
can enable high-performance object recognition with limited labeled data. The
success of such techniques can have a significant impact on various
applications such as autonomous vehicles, robotics, and healthcare.

In this thesis, we present our proposed data-efficient object recognition
methods. We first introduce a general few-shot object detection (FSOD) model
that can detect novel objects without re-training or fine-tuning, by exploiting
the matching relationship between object pairs in a weight-shared network at
multiple network stages. Central to our method are the Attention-RPN, the
Multi-Relation Detector and the Contrastive Training strategy, which exploit
the similarity between the few-shot support set and the query set to detect
novel objects while suppressing false detections in the background. To train
our network, we contribute a new dataset that contains 1,000 categories of
various objects with high-quality annotations.
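
To make the support-query matching idea concrete, the following minimal
PyTorch sketch shows how a support prototype could attend over query feature
maps via depth-wise correlation, roughly in the spirit of an attention-based
RPN; the function name, tensor shapes and the sigmoid gating are illustrative
assumptions, not the exact architecture.

    import torch
    import torch.nn.functional as F

    def attention_rpn_sketch(query_feat, support_feat):
        """Weight query features by their similarity to a support prototype.

        query_feat:   (B, C, H, W) backbone features of the query image
        support_feat: (B, C, h, w) backbone features of the few-shot support crop
        Returns an attention-weighted query feature map of the same shape.
        """
        B, C, H, W = query_feat.shape
        # Pool the support feature into a 1x1 per-channel prototype kernel.
        kernel = support_feat.mean(dim=(2, 3), keepdim=True)        # (B, C, 1, 1)
        # Depth-wise correlation: each query channel is matched against
        # the corresponding channel of the support prototype.
        attn = F.conv2d(
            query_feat.reshape(1, B * C, H, W),
            kernel.reshape(B * C, 1, 1, 1),
            groups=B * C,
        ).reshape(B, C, H, W)
        return query_feat * torch.sigmoid(attn)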

We extend our box-level FSOD method to accurate pixel-level prediction for
novel classes, i.e., few-shot semantic segmentation and co-salient object
detection (CoSOD). Our proposed self-support few-shot semantic segmentation
method addresses the critical intra-class appearance discrepancy problem
inherent in few-shot segmentation, by leveraging the query feature to generate
self-support prototypes and performing self-support matching with query
features. We then propose a novel group collaborative learning framework,
GCoNet, which introduces effective semantic information to improve both
intra-group compactness and inter-group separability for CoSOD.
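
The self-support idea can be illustrated with a short sketch: an initial
support prototype locates confident query pixels, which are then pooled into a
query-derived (self-support) prototype and matched back against the query
feature. The confidence threshold and function name below are assumptions for
illustration only.

    import torch
    import torch.nn.functional as F

    def self_support_match(query_feat, support_proto, thresh=0.7):
        """Return a similarity map produced by a query-derived prototype.

        query_feat:    (C, H, W) query feature map
        support_proto: (C,) prototype pooled from support foreground features
        """
        C, H, W = query_feat.shape
        q = F.normalize(query_feat.reshape(C, -1), dim=0)       # (C, HW)
        s = F.normalize(support_proto, dim=0)                   # (C,)
        init_sim = (s.unsqueeze(1) * q).sum(dim=0)              # cosine similarity, (HW,)
        # Pool confident query pixels into a self-support prototype.
        conf = init_sim > thresh
        if conf.any():
            self_proto = F.normalize(q[:, conf].mean(dim=1), dim=0)
        else:
            self_proto = s                                      # fall back to the support prototype
        # Match the query against its own (self-support) prototype.
        self_sim = (self_proto.unsqueeze(1) * q).sum(dim=0)
        return self_sim.reshape(H, W)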

We then propose a technique to detect objects in videos, with three
contributions addressing real-world visual learning challenges in our highly
diverse and dynamic world: 1) FSVOD-500, a large-scale video dataset comprising
500 classes with class-balanced videos in each category for few-shot learning;
2) a novel Tube Proposal Network (TPN) to generate high-quality video tube
proposals for aggregating feature representations of the target video object,
which can be highly dynamic; 3) a strategically improved Temporal Matching
Network (TMN+) for matching representative query tube features with better
discriminative ability, thus achieving higher diversity (a minimal matching
sketch follows below).
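
As a rough illustration of matching a video tube against the few-shot support
set (not the actual TMN+ design), the sketch below temporally averages
per-frame tube features and scores them against a support prototype with
cosine similarity; the function name and shapes are assumed.

    import torch
    import torch.nn.functional as F

    def match_tube_to_support(tube_feats, support_proto):
        """Aggregate per-frame tube features and match them to a support prototype.

        tube_feats:    (T, C) per-frame features pooled from one tube proposal
        support_proto: (C,) class prototype from the few-shot support set
        Returns a scalar matching score in [-1, 1].
        """
        tube_repr = F.normalize(tube_feats.mean(dim=0), dim=0)  # temporal average pooling
        proto = F.normalize(support_proto, dim=0)
        return torch.dot(tube_repr, proto)                      # cosine similarity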

Finally, we generalize our method to detect objects in unseen domains. We
analyze domain style overfitting and investigate effective solutions for
robust object detection. Our method, dubbed Normalization Perturbation (NP),
perturbs the channel statistics of source-domain low-level features to
synthesize various latent styles, so that the trained deep model can perceive
diverse potential domains and generalize well even without observing
target-domain data during training. Normalization Perturbation relies on only
a single source domain and is surprisingly simple and effective, contributing
a practical solution that effectively adapts or generalizes classification
domain generalization (DG) methods to robust object detection.
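
A simplified sketch of the perturbation step is given below: the per-channel
mean and standard deviation of a low-level feature map are rescaled by random
factors during training to synthesize latent styles. The sampling range, the
hyper-parameter alpha and where the module is inserted in the network are
assumptions here rather than the exact published configuration.

    import torch

    def normalization_perturbation(feat, alpha=0.5):
        """Perturb channel statistics of low-level features to synthesize latent styles.

        feat: (B, C, H, W) low-level feature map from the source domain.
        alpha controls the perturbation strength (an assumed hyper-parameter).
        Intended to be applied only during training.
        """
        B, C = feat.shape[:2]
        mu = feat.mean(dim=(2, 3), keepdim=True)                # per-channel mean, (B, C, 1, 1)
        sigma = feat.std(dim=(2, 3), keepdim=True) + 1e-6       # per-channel std,  (B, C, 1, 1)
        # Random scaling factors around 1 simulate unseen domain styles.
        scale_mu = 1.0 + alpha * (2 * torch.rand(B, C, 1, 1, device=feat.device) - 1)
        scale_sigma = 1.0 + alpha * (2 * torch.rand(B, C, 1, 1, device=feat.device) - 1)
        normalized = (feat - mu) / sigma
        return normalized * (sigma * scale_sigma) + mu * scale_mu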


Date:                   Friday, 25 August 2023

Time:                   3:00pm - 5:00pm

Venue:                  Room 3494
                        Lifts 25/26

Chairperson:            Prof. Xiangtong QI (IEDA)

Committee Members:      Prof. Chi Keung TANG (Supervisor)
                        Prof. Long CHEN
                        Prof. Long QUAN
                        Prof. Weichuan YU (ECE)
                        Prof. Ping LUO (HKU)


**** ALL are Welcome ****