Label-efficient Learning by Exploiting Unlabeled Data with Higher-quality Supervision
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Label-efficient Learning by Exploiting Unlabeled Data with Higher-quality Supervision"

By

Miss Huimin WU

Abstract:

This thesis explores label-efficient learning techniques that mitigate the reliance on large-scale human labels in training deep learning models. Our primary focus lies in developing strategies that make effective use of unlabeled data across training scenarios with different levels of human labels:

1) the semi-supervised setting, characterized by a limited amount of labeled data, say around 20%;
2) the barely-supervised setting, wherein few labeled data, no more than 5%, are utilized;
3) self-supervised pre-training, which operates without human labels;
4) downstream adaptation, the second phase of self-supervised learning, which leverages all available labels and an additional set of unlabeled data.

Intuitively, effective supervision of unlabeled data is crucial for learning performance. Our primary goal is therefore to create high-quality supervision that is expressive, non-degenerating, data-generic, and transferable.

Firstly, we explore more expressive forms of supervision to enhance semi-supervised medical image segmentation. Expressive supervision can train a more compact and better-separated feature space, rather than merely separating features of different classes, as traditional supervision in the form of one-hot vectors or their soft counterparts does. To this end, we propose utilizing a contrastive loss, which yields more discriminative and generalizable features and ultimately leads to better performance.

Then, we reduce the number of human labels and explore the barely-supervised learning setting. This setting is characterized by very few labeled data points, which makes it challenging to accurately supervise unlabeled data. Pseudo labels generated by state-of-the-art semi-supervised learning methods often exhibit unsatisfactory accuracy and may even degrade during training. Moreover, previous methods geared towards barely-supervised learning tend to assume structural similarity between data samples, limiting their applicability to diverse datasets such as those for cancer or disease segmentation. To address these issues, we propose a non-degenerating pseudo-label generation strategy based on an online confidence thresholding technique (a minimal sketch of this idea appears below, after the pre-training discussion). Besides improved accuracy, the proposed supervision does not rely on ground-truth similarity, making it adaptable to inter-case variation and applicable to a broader range of practical segmentation problems, encompassing both organ and cancer segmentation.

Thirdly, we explore self-supervised pre-training. Instead of relying on human labels, it extracts supervisory signals directly from the data itself by performing self-defined pretext tasks, with contrastive learning being a common example. Contrastive learning performs the task of instance discrimination, where two augmented views of the same data sample are treated as a positive pair, and two different data samples are treated as a negative pair. Augmentations play a crucial role in providing supervision, but they are usually data-specific. For wider applicability, this work focuses on finding a data-generic augmentation that can be applied to any data modality.

To achieve this objective, we propose randomized quantization as an augmentation for contrastive learning. Our method outperforms previous data-specific and data-agnostic self-supervised techniques, and we have validated its effectiveness across various data modalities, including vision, audio, 3D point clouds, and DABS, a public benchmark for data-agnostic self-supervised learning.
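As a concrete illustration, the following is a minimal sketch of what such a randomized quantization augmentation might look like. The uniform per-channel bins and the bin-count range here are illustrative assumptions, not the thesis's exact formulation (which may, for instance, use non-uniform or randomly placed bin edges):

```python
import torch

def randomized_quantization(x, min_bins=4, max_bins=16):
    """Hypothetical randomized quantization augmentation (illustrative).

    Per sample and channel, draw a random number of uniform bins over
    the channel's value range and snap every value to its bin centre.
    x: (B, C, ...) tensor of any modality (images, waveforms, points).
    """
    B, C = x.shape[:2]
    flat = x.reshape(B, C, -1)
    lo = flat.amin(dim=2, keepdim=True)
    hi = flat.amax(dim=2, keepdim=True)
    n_bins = torch.randint(min_bins, max_bins + 1, (B, C, 1), device=x.device)
    width = (hi - lo) / n_bins + 1e-12            # avoid zero-width bins
    idx = ((flat - lo) / width).floor().clamp(min=0)
    idx = torch.minimum(idx, n_bins - 1)          # keep the max value in the last bin
    return (lo + (idx + 0.5) * width).reshape(x.shape)
```

Because the operation only needs per-channel value ranges, it makes no assumption about spatial or temporal structure, which is what makes it a candidate data-generic augmentation.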
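For context, the sketch below shows a standard SimCLR-style InfoNCE instance-discrimination loss into which such an augmentation could plug; `encoder` and `batch` in the usage comment are placeholders, not names from the thesis:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """SimCLR-style InfoNCE loss: matching rows of z1 and z2 are the
    two augmented views of the same sample (positives); every other
    row in the batch serves as a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                      # (B, B) cosine similarities
    targets = torch.arange(z1.shape[0], device=z1.device)   # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Usage with a placeholder encoder: two independently quantized views
# of the same batch form the positive pairs.
#   z1 = encoder(randomized_quantization(batch))
#   z2 = encoder(randomized_quantization(batch))
#   loss = info_nce(z1, z2)
```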
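Returning to the barely-supervised setting discussed earlier, here is a minimal sketch of an online confidence thresholding rule for pseudo labels. The exponential-moving-average threshold and the masking rule are assumptions for illustration; the thesis's exact rule may differ:

```python
class OnlineConfidenceThreshold:
    """Hypothetical online threshold: track an exponential moving
    average of the model's mean confidence on unlabeled data and keep
    only pseudo labels above it, so that supervision tightens (rather
    than degrades) as the model improves."""

    def __init__(self, init=0.5, momentum=0.99):
        self.tau = init
        self.momentum = momentum

    def __call__(self, probs):
        # probs: (B, num_classes, H, W) softmax output on unlabeled images
        conf, pseudo = probs.max(dim=1)       # per-pixel confidence and label
        self.tau = self.momentum * self.tau + (1 - self.momentum) * conf.mean().item()
        mask = conf >= self.tau               # supervise only confident pixels
        return pseudo, mask
```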
In our latest work, we look for transferable supervision from unlabeled data that can benefit novel downstream applications. Typically, foundation models produced by self-supervised pre-training are adapted to downstream semantic tasks such as classification, segmentation, or detection. However, it remains unclear how to reuse general-purpose foundation models for non-semantic tasks like optical flow estimation, which predicts the movement of each pixel from a source image to the corresponding point in the target image. While conventional practice for this task involves specialized architectural designs, the architectures of existing self-supervised pre-training models tend to converge to transformers, posing a challenge for their adaptation to optical flow estimation. Our study reveals that pre-training spatiotemporal masked autoencoders on natural videos offers valuable knowledge for optical flow estimation. Unlike previous approaches that rely on complex task-specific architectural components, our overall architecture contains no task-specific inductive bias, greatly simplifying the design. The strong performance validates that optical flow estimation can benefit from pre-training on unlabeled data.

Date:   Monday, 15 July 2024
Time:   10:00am - 12:00noon
Venue:  Room 3494 (Lifts 25/26)

Chairman:          Prof. Weiping LI (MATH)

Committee Members: Prof. Tim CHENG (Supervisor)
                   Dr. Xiaomeng LI (Co-supervisor)
                   Prof. Huamin QU
                   Dr. Dan XU
                   Dr. Zhiyao XIE (ECE)
                   Prof. Guotai WANG (UESTC)