Towards Data-Efficient Object Recognition In The Open World
PhD Thesis Proposal Defence

Title: "Towards Data-Efficient Object Recognition In The Open World"

by

Mr. Qi FAN

Abstract:

Conventional object recognition methods typically require a substantial amount of training data, and preparing such high-quality training data is very labor-intensive. This has motivated the recent development of data-efficient object recognition. Further research is needed to develop novel techniques that enable high-performance object recognition with limited labeled data. The success of such techniques can have a significant impact on applications such as autonomous vehicles, robotics, and healthcare.

In this thesis, we present our proposed data-efficient object recognition methods. We first introduce a general few-shot object detection model that can detect novel objects without re-training or fine-tuning by exploiting the matching relationship between object pairs in a weight-shared network at multiple network stages. Central to our method are our Attention-RPN, Multi-Relation Detector and Contrastive Training strategy, which exploit the similarity between the few-shot support set and the query set to detect novel objects while suppressing false detections in the background. To train our network, we contribute a new dataset that contains 1000 categories of various objects with high-quality annotations.

Then we propose a technique to detect objects in videos, with three contributions to real-world visual learning challenges in our highly diverse and dynamic world: 1) a large-scale video dataset, FSVOD-500, comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) that generates high-quality video tube proposals for aggregating the feature representation of the target video object, which can be highly dynamic; 3) a strategically improved Temporal Matching Network (TMN+) for matching representative query tube features with better discriminative ability, thus achieving higher diversity.

Next, we further generalize our method to detect objects in unseen domains. We analyze and investigate effective solutions to overcome domain style overfitting for robust object detection. Our method, dubbed Normalization Perturbation (NP), perturbs the channel statistics of source-domain low-level features to synthesize various latent styles, so that the trained deep model can perceive diverse potential domains and generalize well even without observing target-domain data in training. Normalization Perturbation relies on only a single source domain and is surprisingly simple and effective, contributing a practical solution by effectively adapting or generalizing classification domain generalization (DG) methods to robust object detection.

Finally, we extend our methods to accurately segment objects with high-quality masks by exploiting object commonalities among different classes. Specifically, we parse two types of commonalities: 1) shape commonalities, which are learned by performing supervised learning on instance boundary prediction; and 2) appearance commonalities, which are captured by modeling pairwise affinities among pixels of feature maps to optimize the separability between instances and the background.

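As an illustration of the Normalization Perturbation idea described in the abstract, the sketch below perturbs the per-channel mean and standard deviation of a feature map with random scaling factors. It is a minimal NumPy sketch under stated assumptions, not the thesis implementation: the function name `perturb_channel_statistics`, the Gaussian noise centered at 1, and the `noise_std` value are all hypothetical choices made for this example.

```python
import numpy as np


def perturb_channel_statistics(features, noise_std=0.5, rng=None, eps=1e-6):
    """Randomly rescale per-channel mean and std of a feature map.

    features: array of shape (N, C, H, W), e.g. low-level backbone features.
    noise_std: spread of the random scaling factors (assumed value).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Channel statistics, computed per sample over the spatial dimensions.
    mean = features.mean(axis=(2, 3), keepdims=True)
    std = features.std(axis=(2, 3), keepdims=True)

    # Remove the original "style" (channel statistics) from the features.
    normalized = (features - mean) / (std + eps)

    # Assumption: scaling factors are drawn from a Gaussian centered at 1,
    # independently for every sample and channel.
    alpha = rng.normal(1.0, noise_std, size=std.shape)
    beta = rng.normal(1.0, noise_std, size=mean.shape)

    # Re-apply perturbed statistics to synthesize a new latent style.
    return normalized * (alpha * std) + beta * mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(2, 3, 4, 4))
    styled = perturb_channel_statistics(feats, rng=rng)
    print(styled.shape)
```

In such a setup the perturbation would be applied only during training and only to shallow-layer features; at inference the features pass through unchanged.
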
Date: Thursday, 13 April 2023

Time: 5:00pm - 7:00pm

Venue: Room 5501, lifts 25/26

Committee Members:
Prof. Chi-Keung Tang (Supervisor)
Dr. Hao Chen (Chairperson)
Dr. Qifeng Chen
Dr. Dan Xu

**** ALL are Welcome ****