Learning Under Distributional Shift
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Learning Under Distributional Shift"

By

Mr. Yong LIN

Abstract:

Machine learning models often assume that training and testing data are drawn from the same distribution, but this assumption can be violated in real-world applications where the testing distribution differs. Enhancing models' robustness under distributional shift, known as Out-of-Distribution (OOD) Generalization, has gained significant attention in the machine learning community. One popular framework, Invariant Risk Minimization (IRM), focuses on learning invariant features that stably predict labels under distributional shifts while discarding spurious features that are unstable. IRM enjoys theoretical guarantees for linear models given a sufficient number of environments.

In this thesis, we first identify some fundamental limitations of IRM and propose methods to alleviate them. First, we demonstrate that IRM is inherently susceptible to overfitting: we show theoretically that IRM degenerates to Empirical Risk Minimization (ERM) when overfitting occurs, and we provide empirical evidence of IRM's performance degradation with larger neural networks. To mitigate this issue, we propose several methods, including Bayesian inference and sample reweighting. Second, we show theoretically that learning invariant features is generally impossible without explicit environment partitions, and we propose using cheaply available auxiliary information to generate partitions automatically, together with conditions under which this framework succeeds. Despite these improvements, IRM still faces challenges, including non-identifiability with non-linear models under large distributional shift and inadequate performance on large-scale real-world datasets.
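For readers unfamiliar with IRM, its practical form is commonly the IRMv1 relaxation: the sum of per-environment risks plus a penalty on the gradient of each environment's risk with respect to a scalar "dummy" classifier fixed at 1. The sketch below is a minimal toy illustration with squared error, not the thesis's own code; the function names and the single-feature setup are assumptions for exposition.

```python
import numpy as np

def irmv1_penalty(phi, y):
    """IRMv1 penalty for one environment: squared gradient of the
    squared-error risk w.r.t. a scalar dummy classifier w, at w = 1.
    phi: model outputs Phi(x) per example; y: labels."""
    grad = np.mean(2.0 * phi * (phi - y))  # d/dw E[(w*phi - y)^2] at w = 1
    return grad ** 2

def irm_objective(envs, lam):
    """Sum of per-environment risks plus lambda times the invariance
    penalty (toy IRMv1 objective; envs is a list of (phi, y) pairs)."""
    risk = sum(np.mean((phi - y) ** 2) for phi, y in envs)
    pen = sum(irmv1_penalty(phi, y) for phi, y in envs)
    return risk + lam * pen

# Toy check: a calibrated predictor (phi == y) incurs zero penalty,
# while a miscalibrated one is penalized.
rng = np.random.default_rng(0)
y = rng.normal(size=100)
calibrated = irmv1_penalty(y, y)        # exactly 0.0
miscalibrated = irmv1_penalty(1.5 * y, y)  # strictly positive
```

Intuitively, a predictor built only on invariant features is simultaneously optimal (up to the fixed classifier) in every environment, so the per-environment gradients vanish; spurious features make the optimal rescaling differ across environments, which the penalty detects.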
Interestingly, practitioners have found exceptional success with ensemble-based models, such as ensembles of independently trained models, for OOD generalization. However, ensemble models inevitably use spurious features, because each individual model in the ensemble is trained by Empirical Risk Minimization (ERM) and therefore incorporates them. The success of ensemble models thus contradicts IRM theory, which predicts that models relying on spurious features should fail. In this thesis, we resolve this apparent paradox. Our analysis reveals that ensemble-based models reduce OOD prediction errors by leveraging a diverse range of spurious features: contrary to the prevailing belief that emphasizes learning invariant features, incorporating many diverse spurious features diminishes each feature's individual impact and improves overall OOD generalization. Empirical experiments on the MultiColorMNIST dataset substantiate the effectiveness of leveraging diverse spurious features, in line with our theoretical analysis. Building on these insights, we further develop techniques that achieve state-of-the-art (SOTA) OOD performance for foundation models, such as CLIP, on large-scale datasets, including ImageNet variants.

Date: Friday, 1 December 2023
Time: 2:00pm - 4:00pm
Venue: Room 4475 (Lifts 25/26)

Chairman: Prof. Jin QI (IEDA)

Committee Members: Prof. Tong ZHANG (Supervisor)
                   Prof. Nevin ZHANG
                   Prof. Xiaofang ZHOU
                   Prof. Yuan YAO (MATH)
                   Prof. Mingming GONG (The University of Melbourne)

**** ALL are Welcome ****
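The diversity argument in the abstract can be illustrated with a toy simulation: if each of k models adds its own independent spurious error on top of the shared invariant signal, averaging the models shrinks the spurious error's variance roughly by a factor of 1/k. This is a hypothetical setup for intuition only, not the thesis's model or dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 32  # test points, number of ensembled models

# Toy OOD test set: each model predicts the invariant signal plus its own
# independent spurious component (which no longer helps at test time).
signal = rng.normal(size=n)
spurious = rng.normal(size=(k, n))  # a different spurious error per model
preds = signal + spurious

single_mse = np.mean((preds[0] - signal) ** 2)             # about 1.0
ensemble_mse = np.mean((preds.mean(axis=0) - signal) ** 2)  # about 1/k
```

Because the spurious components are diverse (independent across models), they partially cancel under averaging, whereas k copies of the same spurious feature would not; this mirrors the abstract's claim that diversity of spurious features, not their absence, drives the ensemble's OOD gains.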