Learning Under Distributional Shift
PhD Thesis Proposal Defence

Title: "Learning Under Distributional Shift"

by

Mr. Yong LIN

Abstract:

Machine learning models often assume that the training and testing data are drawn from the same distribution, but this assumption can be violated in real-world applications where the testing distribution differs. Enhancing models' robustness under distributional shift, known as Out-of-Distribution (OOD) generalization, has gained significant attention in the machine learning community. One popular framework, called Invariant Risk Minimization (IRM), focuses on learning invariant features that stably predict labels under distributional shifts while discarding spurious features that are unstable. IRM enjoys theoretical guarantees for linear models given a sufficient number of environments (an illustrative sketch of the IRM objective follows this announcement).

In this thesis, we first identify some fundamental limitations of IRM and propose methods to alleviate them. First, we demonstrate that IRM can be inherently susceptible to overfitting: we show theoretically that IRM degenerates to Empirical Risk Minimization (ERM) when overfitting occurs, and we provide empirical evidence of IRM's performance degradation with larger neural networks. To mitigate this issue, we propose several methods, such as incorporating Bayesian inference and sample reweighting. Second, we show theoretically that learning invariant features is generally impossible without explicit environment partitions. We therefore propose utilizing cheaply available auxiliary information to automatically generate partitions, and provide the conditions under which this framework is valid.

Despite these efforts to improve IRM, it still faces challenges, including non-identifiability for non-linear models under large distributional shift and inadequate performance on large-scale real-world datasets. Interestingly, practitioners have found success with ensemble-based models, such as ensembles of independently trained models, for OOD generalization (also sketched following this announcement). However, ensemble models inevitably utilize spurious features, because each individual model in the ensemble is trained by ERM and therefore incorporates spurious features. The success of ensemble models thus contradicts the IRM theory, which predicts that models relying on spurious features should fail. In this thesis, we also try to unravel the mystery surrounding ensemble-based models and their effectiveness in OOD generalization.

Date: Monday, 30 October 2023
Time: 9:30am - 11:30am
Venue: Room 5510 (Lifts 25/26)

Committee Members:
Prof. Tong Zhang (Supervisor)
Prof. Nevin Zhang (Chairperson)
Prof. Raymond Wong
Prof. Xiaofang Zhou

**** ALL are Welcome ****
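For readers unfamiliar with IRM, the following is a minimal PyTorch sketch of the widely used IRMv1 objective (Arjovsky et al., 2019) that the abstract refers to. The model, the per-environment data, and the penalty weight `lam` are illustrative placeholders, not the specific formulation developed in the thesis.

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, labels):
    # Invariance penalty: squared gradient of the environment risk with
    # respect to a fixed scalar "dummy classifier" w = 1.0. The penalty
    # is small when the same classifier is (near-)optimal in every
    # environment, i.e. when the features predict labels stably.
    w = torch.tensor(1.0, requires_grad=True)
    risk = F.binary_cross_entropy_with_logits(logits * w, labels)
    (grad,) = torch.autograd.grad(risk, [w], create_graph=True)
    return grad ** 2

def irmv1_objective(model, envs, lam=1.0):
    # envs: list of (inputs, labels) pairs, one per environment partition.
    # Objective = average over environments of (ERM risk + lam * penalty);
    # with lam = 0 this reduces to plain ERM.
    total = 0.0
    for x, y in envs:
        logits = model(x).squeeze(-1)
        risk = F.binary_cross_entropy_with_logits(logits, y)
        total = total + risk + lam * irmv1_penalty(logits, y)
    return total / len(envs)
```

Note that the objective requires explicit environment partitions `envs`; the thesis's second contribution concerns precisely the case where such partitions are unavailable and must be generated from auxiliary information.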
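The ensemble-based models mentioned in the abstract can be as simple as averaging the predictions of independently ERM-trained networks. The sketch below shows one common instantiation (output averaging), with names chosen for illustration; the thesis studies why such ensembles succeed OOD despite each member relying on spurious features.

```python
import torch

def ensemble_predict(models, x):
    # Average the softmax outputs of independently trained ERM models.
    # Each member may rely on spurious features, yet the averaged
    # prediction is often observed to be more robust out of distribution.
    with torch.no_grad():
        probs = torch.stack([m(x).softmax(dim=-1) for m in models])
    return probs.mean(dim=0)
```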