Label-efficient Learning by Exploiting Unlabeled Data with Higher-quality Supervision

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Label-efficient Learning by Exploiting Unlabeled Data with 
Higher-quality Supervision"

By

Miss Huimin WU


Abstract:

This thesis seeks to explore label-efficient learning techniques that can 
mitigate the reliance on large-scale human labels in training deep learning 
models. Our primary focus lies in the development of strategies that make 
effective use of unlabeled data across various training scenarios, each with 
different levels of human labels, including: 1) the semi-supervised setting, 
characterized by a limited amount of labeled data, around 20%; 2) the 
barely-supervised setting, wherein very few labels, no more than 5%, are 
available; 3) self-supervised pre-training, which operates without human 
labels; and 4) downstream adaptation, the second phase of self-supervised 
learning, which leverages all available labels together with an additional 
set of unlabeled data. Intuitively, effective supervision of unlabeled data is 
crucial for learning performance. Therefore, our primary goal is to create 
high-quality supervision that is expressive, non-degenerating, data-generic, 
and transferable.

Firstly, we explore more expressive forms of supervision to enhance 
semi-supervised medical image segmentation. Expressive supervision can train 
a more compact and better-separated feature space, rather than merely 
segregating features of different classes, as traditional supervision in the 
form of one-hot vectors or their soft counterparts does. To this end, we 
propose utilizing a contrastive loss, which yields more discriminative and 
generalizable features and ultimately leads to better segmentation 
performance.
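
To make this concrete, below is a minimal sketch of a standard InfoNCE-style 
contrastive loss in PyTorch. It is a generic illustration: how positive and 
negative pairs are drawn from segmentation features in the thesis may differ.

```python
# A generic InfoNCE-style contrastive loss; assumes paired embeddings,
# not the thesis's exact pair-sampling scheme.
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, D) embeddings of two views; row i of z1 and z2 is a positive pair."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # (N, N) cosine-similarity logits
    targets = torch.arange(z1.size(0), device=z1.device)
    # Diagonal entries are positives; every off-diagonal entry is a negative.
    return F.cross_entropy(logits, targets)
```

The loss pulls each positive pair together while pushing all other samples in 
the batch apart, which is what encourages the compact, well-separated feature 
space described above.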

Then, we reduce the number of human labels and explore the barely-supervised 
learning setting. This setting is characterized by very few labeled data 
points, posing challenges in accurately supervising unlabeled data. Pseudo 
labels generated by state-of-the-art semi-supervised learning methods often 
exhibit unsatisfactory accuracy and even degrade during training. Moreover, 
previous methods geared towards barely-supervised learning tend to assume 
structural similarity between data samples to derive supervision, limiting 
their applicability to diverse datasets such as those for cancer or disease 
segmentation. To address these issues, we propose a non-degenerating 
pseudo-label generation strategy based on an online confidence thresholding 
technique. Beyond improved accuracy, the proposed supervision does not 
rely on ground truth similarity, making it adaptable to inter-case variation 
and applicable to a broader range of practical segmentation problems, 
encompassing both organ and cancer segmentation.
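
As an illustration, the sketch below masks out low-confidence pseudo labels 
in PyTorch. For simplicity it uses a fixed threshold; the online scheme in 
the thesis adapts the threshold during training, which is what prevents 
pseudo-label degeneration.

```python
# Confidence-masked pseudo-label loss; the fixed threshold is a
# simplification of the online thresholding described above.
import torch
import torch.nn.functional as F

def pseudo_label_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      threshold: float = 0.9) -> torch.Tensor:
    """Cross-entropy on unlabeled data, keeping only confident pseudo labels.

    student_logits, teacher_logits: (N, C, *spatial) segmentation logits.
    """
    probs = F.softmax(teacher_logits.detach(), dim=1)
    conf, pseudo = probs.max(dim=1)                # per-voxel confidence and label
    mask = (conf >= threshold).float()             # drop unreliable supervision
    loss = F.cross_entropy(student_logits, pseudo, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```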

Thirdly, we explore the topic of self-supervised pre-training. Instead of 
relying on human labels, it extracts supervisory signals directly from the 
data itself. This is achieved by performing self-defined pretext tasks, with 
contrastive learning being a common example. Contrastive learning performs 
the task of instance discrimination, where two augmented views of the same 
data are identified as a positive pair, and two different data samples are 
identified as a negative pair. Augmentations play a crucial role in providing 
supervision, but they are usually data-specific. For a wider range of 
applicability, the focus of this work is on finding a data-generic 
augmentation that can be applied to any data modality. To achieve this 
objective, we propose randomized quantization as an augmentation for 
contrastive learning. Our method outperforms previous data-specific and 
data-agnostic self-supervised techniques. We have validated its effectiveness 
across various data modalities, including vision, audio, 3D point clouds, and 
DABS, a public benchmark for data-agnostic self-supervised learning.
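
A simplified sketch of the augmentation is shown below: each channel's value 
range is divided into randomly placed bins, and every value is mapped to a 
representative of its bin. The bin count and the sampling of bin edges here 
are illustrative; the published method samples them differently.

```python
# Randomized quantization, simplified: random per-channel bins,
# with values mapped to bin midpoints.
import torch

def randomized_quantization(x: torch.Tensor, num_bins: int = 8) -> torch.Tensor:
    """x: (C, L) tensor with one row per channel; returns the quantized tensor."""
    out = torch.empty_like(x)
    for c in range(x.size(0)):
        lo, hi = x[c].min(), x[c].max()
        # Random interior bin edges within the channel's value range.
        inner = torch.sort(torch.rand(num_bins - 1, device=x.device)
                           * (hi - lo) + lo).values
        edges = torch.cat([lo.view(1), inner, hi.view(1)])
        idx = torch.bucketize(x[c], edges[1:-1])   # bin index per element
        mids = (edges[:-1] + edges[1:]) / 2        # representative value per bin
        out[c] = mids[idx]
    return out
```

Because quantization only assumes scalar values, the same operation applies 
unchanged to image pixels, audio samples, or point-cloud coordinates, which 
is what makes it data-generic.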

In our latest work, we look for transferable supervision from unlabeled data 
that can benefit novel downstream applications. Typically, self-supervised 
pre-trained foundation models are adapted to solve downstream semantic tasks 
such as classification, segmentation, or detection. 
However, it remains unclear how to reuse general-purpose foundation models 
for non-semantic tasks like optical flow estimation, which involves 
predicting the movement of each pixel from a source image to the 
corresponding point in the target image. While conventional practices for 
this task involve specialized architectural designs, the architectures of 
existing self-supervised pre-training models tend to converge to 
transformers, posing a challenge for their adaptation to optical flow 
estimation. Our study reveals that pre-training spatiotemporal masked 
autoencoders on natural videos offers valuable knowledge for optical flow 
estimation. Unlike previous approaches that rely on complex task-specific 
architectural components, our overall architecture does not contain 
task-specific inductive bias, greatly simplifying the architectural design. 
The strong performance validates that optical flow estimation can benefit 
from pre-training on unlabeled data.
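
As a rough illustration, the random masking step at the heart of a masked 
autoencoder can be written as below. This is a generic version of token 
masking, not the exact spatiotemporal pre-training pipeline used in the 
thesis.

```python
# Generic random masking of patch tokens, as used in masked autoencoders.
import torch

def random_patch_mask(tokens: torch.Tensor, mask_ratio: float = 0.9):
    """tokens: (B, N, D) patch embeddings of a video clip.

    Returns the visible tokens and the random permutation needed to
    restore token order after decoding.
    """
    B, N, D = tokens.shape
    n_keep = max(1, int(N * (1.0 - mask_ratio)))
    noise = torch.rand(B, N, device=tokens.device)  # one random score per token
    ids_shuffle = noise.argsort(dim=1)              # random permutation of tokens
    ids_keep = ids_shuffle[:, :n_keep]
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, ids_shuffle
```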


Date:                   Monday, 15 July 2024

Time:                   10:00am - 12:00noon

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               Prof. Weiping LI (MATH)

Committee Members:      Prof. Tim CHENG (Supervisor)
                        Dr. Xiaomeng LI (Co-supervisor)
                        Prof. Huamin QU
                        Dr. Dan XU
                        Dr. Zhiyao XIE (ECE)
                        Prof. Guotai WANG (UESTC)