The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Label-efficient Learning by Exploiting Unlabeled Data with
Higher-quality Supervision"
By
Miss Huimin WU
Abstract:
This thesis seeks to explore label-efficient learning techniques that can
mitigate the reliance on large-scale human labels in training deep learning
models. Our primary focus lies in the development of strategies that make
effective use of unlabeled data across various training scenarios, each with
different levels of human labels: 1) the semi-supervised setting,
characterized by a limited amount of labeled data, e.g., around 20%; 2) the
barely-supervised setting, in which very few labeled examples, no more than
5%, are used; 3) self-supervised pre-training, which operates without human
labels; and 4) downstream adaptation, the second phase of self-supervised
learning, which leverages all available labels together with an additional
set of unlabeled data. Intuitively, effective supervision of unlabeled data is
crucial for learning performance. Therefore, our primary goal is to create
high-quality supervision that is expressive, non-degenerating, data-generic,
and transferable.
Firstly, we explore more expressive forms of supervision to enhance
semi-supervised medical image segmentation. Expressive supervision can train
a more compact and better-separated feature space, rather than merely
separating features of different classes, as traditional supervision in the
form of one-hot vectors or their soft counterparts does. To this end, we
propose using a contrastive loss, which yields more discriminative and
generalizable features and ultimately leads to better segmentation
performance.
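
For illustration, below is a minimal sketch of an InfoNCE-style contrastive
loss in PyTorch; the function name, label-based positive pairing, and
temperature value are assumptions for exposition, not the thesis's exact
formulation:

    import torch
    import torch.nn.functional as F

    def info_nce_loss(features, labels, temperature=0.1):
        # features: (N, D) embeddings; labels: (N,) class ids.
        # Embeddings sharing a label are treated as positive pairs, pulling
        # same-class features together and pushing classes apart.
        features = F.normalize(features, dim=1)
        sim = features @ features.T / temperature            # (N, N)
        eye = torch.eye(len(features), dtype=torch.bool,
                        device=features.device)
        sim = sim.masked_fill(eye, float('-inf'))            # drop self-pairs
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        pos = (labels[None, :] == labels[:, None]) & ~eye    # positive mask
        pos_count = pos.sum(1)
        valid = pos_count > 0                  # anchors with >= 1 positive
        loss = -(log_prob.masked_fill(~pos, 0.0).sum(1)[valid]
                 / pos_count[valid])
        return loss.mean()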
Then, we reduce the number of human labels and explore the barely supervised
learning setting. This setting is characterized by the presence of few
labeled data points, posing challenges in accurately supervising unlabeled
data. Pseudo labels generated by state-of-the-art semi-supervised learning
methods often exhibit unsatisfactory accuracy and even degrade during
training. Moreover, previous methods geared towards barely supervised
learning tend to assume structural similarity between data for supervision,
limiting their applicability to diverse datasets such as those for cancer or
disease segmentation. To address these issues, we propose a non-degenerating
pseudo-label generation strategy based on an online confidence thresholding
technique. Aside from improved accuracy, the proposed supervision does not
rely on ground truth similarity, making it adaptable to inter-case variation
and applicable to a broader range of practical segmentation problems,
encompassing both organ and cancer segmentation.
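
A minimal sketch of confidence-thresholded pseudo-labeling is given below;
the fixed threshold stands in for the online schedule and, like the function
names, is an assumption made for illustration:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def make_pseudo_labels(teacher_logits, threshold=0.9):
        # teacher_logits: (B, C, H, W) predictions on unlabeled images.
        # Threshold fixed here for simplicity; the thesis adapts it online.
        probs = teacher_logits.softmax(dim=1)
        conf, pseudo = probs.max(dim=1)     # per-pixel confidence and label
        mask = conf >= threshold            # keep only confident pixels
        return pseudo, mask

    def unsupervised_loss(student_logits, pseudo, mask):
        # Cross-entropy on confident pixels only, so low-quality pseudo
        # labels are filtered out rather than reinforced (one way pseudo
        # labels degenerate during training).
        loss = F.cross_entropy(student_logits, pseudo, reduction='none')
        return (loss * mask).sum() / mask.sum().clamp(min=1)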
Thirdly, we explore the topic of self-supervised pre-training. Instead of
relying on human labels, it extracts supervisory signals directly from the
data itself. This is achieved by solving self-defined pretext tasks, with
contrastive learning being a common example. Contrastive learning performs
the task of instance discrimination, where two augmented views of the same
data are identified as a positive pair, and two different data samples are
identified as a negative pair. Augmentations play a crucial role in providing
supervision, but they are usually data-specific. For a wider range of
applicability, the focus of this work is on finding a data-generic
augmentation that can be applied to any data modality. To achieve this
objective, we propose randomized quantization as an augmentation for
contrastive learning. Our method outperforms previous data-specific and
data-agnostic self-supervised techniques, and we have validated its
effectiveness across various data modalities, including vision, audio, 3D
point clouds, and DABS, a public benchmark for data-agnostic
self-supervised learning.
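
As an illustration of the idea, below is a minimal sketch of randomized
quantization: each channel's value range is split at randomly drawn cut
points and every value is collapsed to a representative of its bin. The bin
count and the midpoint mapping are assumptions; the thesis's exact scheme
may differ:

    import torch

    def randomized_quantize(x, num_bins=8):
        # x: (C, ...) tensor; quantize each channel with random bin edges.
        out = torch.empty_like(x)
        for c in range(x.size(0)):
            v = x[c]
            lo, hi = v.min(), v.max()
            # Random, non-uniform bin edges inside the channel's range.
            cuts = torch.sort(torch.rand(num_bins - 1, device=v.device)).values
            edges = lo + (hi - lo) * cuts
            idx = torch.bucketize(v, edges)   # bin index for every value
            # Collapse each value to its bin's midpoint, discarding
            # within-bin detail (the augmentation's information bottleneck).
            all_edges = torch.cat([lo.view(1), edges, hi.view(1)])
            mids = (all_edges[:-1] + all_edges[1:]) / 2
            out[c] = mids[idx]
        return out

Because it operates only on raw values and never on modality-specific
structure, the same transform applies unchanged to images, audio, or point
cloud coordinates.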
In our latest work, we look for transferable supervision from unlabeled data
that can be beneficial to novel downstream applications. Typically,
foundation models obtained by self-supervised pre-training are adapted to solve
downstream semantic tasks such as classification, segmentation, or detection.
However, it remains unclear how to reuse general-purpose foundation models
for non-semantic tasks like optical flow estimation, which involves
predicting the movement of each pixel from a source image to the
corresponding point in the target image. While conventional practices for
this task involve specialized architectural designs, the architectures of
existing self-supervised pre-training models tend to converge to
transformers, posing a challenge for their adaptation to optical flow
estimation. Our study reveals that pre-training spatiotemporal masked
autoencoders on natural videos offers valuable knowledge for optical flow
estimation. Unlike previous approaches relying on complex task-specific
architectural components, our overall architecture does not contain
task-specific inductive bias, greatly simplifying the architectural design.
The strong performance validates that optical flow estimation can benefit
from pre-training on unlabeled data.
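
Architecture specifics are beyond this abstract, but as a flavor of the
pre-training, below is a minimal sketch of the random masking step at the
core of a spatiotemporal masked autoencoder; the shapes and mask ratio are
assumptions:

    import torch

    def random_masking(tokens, mask_ratio=0.9):
        # tokens: (B, N, D) patch embeddings of a video clip, where
        # N = frames x patches per frame. The encoder sees only the
        # visible tokens; the decoder reconstructs the masked ones.
        B, N, D = tokens.shape
        n_keep = int(N * (1 - mask_ratio))
        noise = torch.rand(B, N, device=tokens.device)
        ids_keep = noise.argsort(dim=1)[:, :n_keep]  # random subset per clip
        visible = torch.gather(tokens, 1,
                               ids_keep[..., None].expand(-1, -1, D))
        mask = torch.ones(B, N, device=tokens.device)
        mask.scatter_(1, ids_keep, 0.0)              # 1 = masked, 0 = visible
        return visible, mask, ids_keep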
Date: Monday, 15 July 2024
Time: 10:00am - 12:00noon
Venue: Room 3494
Lifts 25/26
Chairman: Prof. Weiping LI (MATH)
Committee Members: Prof. Tim CHENG (Supervisor)
Dr. Xiaomeng LI (Co-supervisor)
Prof. Huamin QU
Dr. Dan XU
Dr. Zhiyao XIE (ECE)
Prof. Guotai WANG (UESTC)