PhD Thesis Proposal Defence


Title: "Learning Under Distributional Shift"

by

Mr. Yong LIN


Abstract:

Machine learning models often assume that the training and testing data are
drawn from the same distribution, but this assumption can be violated in
real-world applications where the testing distribution differs from the
training one. Enhancing models' robustness under distributional shift, known
as Out-of-Distribution (OOD) Generalization, has gained significant attention
in the machine learning community. One popular framework, called Invariant
Risk Minimization (IRM), focuses on learning invariant features that stably
predict labels under distributional shifts while discarding spurious features
that are unstable. IRM enjoys theoretical guarantees for linear models given
a sufficient number of training environments. In this thesis, we first
identify fundamental limitations of IRM and propose methods to alleviate
them.
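
For readers unfamiliar with the framework, below is a minimal PyTorch sketch
of the widely used IRMv1 relaxation (Arjovsky et al., 2019), illustrative
only and not code from the thesis: each environment's risk is penalized by
the squared gradient with respect to a shared dummy classifier fixed at
w = 1.0.

    import torch
    import torch.nn.functional as F

    def irmv1_penalty(logits, y):
        # IRMv1 penalty: squared gradient of the environment risk with
        # respect to a dummy classifier scale frozen at w = 1.0.
        scale = torch.tensor(1.0, requires_grad=True)
        loss = F.binary_cross_entropy_with_logits(logits * scale, y)
        grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
        return grad.pow(2)

    def irm_objective(model, envs, lam=1.0):
        # envs: a list of (x, y) batches, one per training environment.
        risk = penalty = 0.0
        for x, y in envs:
            logits = model(x).squeeze(-1)
            risk = risk + F.binary_cross_entropy_with_logits(logits, y)
            penalty = penalty + irmv1_penalty(logits, y)
        return risk + lam * penalty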

First, we demonstrate that IRM is inherently susceptible to overfitting.
Specifically, we show that IRM theoretically degenerates to Empirical Risk
Minimization (ERM) once overfitting occurs: when a sufficiently expressive
model drives the training loss to zero, the invariance penalty vanishes as
well, leaving only the ERM term. Through empirical experiments, we also
provide evidence that IRM's performance degrades as the neural network grows
larger. To mitigate this issue, we propose several remedies, such as
incorporating Bayesian inference and sample reweighting.
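
A toy numeric check of this degeneration, under the assumption of a squared
loss and a model that fits the training data exactly (illustrative only, not
an experiment from the thesis):

    import torch

    # With zero training error, the IRMv1 gradient penalty vanishes,
    # so the IRM objective collapses to plain ERM.
    x = torch.randn(64, 5)
    w_true = torch.randn(5, 1)
    y = x @ w_true                # noiseless labels: an exact fit exists

    w = w_true.clone()            # a perfectly overfit "trained" model
    scale = torch.tensor(1.0, requires_grad=True)
    loss = ((x @ w) * scale - y).pow(2).mean()
    penalty = torch.autograd.grad(loss, [scale])[0].pow(2)
    print(loss.item(), penalty.item())  # both print 0.0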

Second, we theoretically show that learning invariant features is generally
impossible without explicit environment partitions. We therefore propose
exploiting cheaply available auxiliary information to generate such
partitions automatically, and we establish the conditions under which this
framework succeeds.
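
One plausible instantiation of this idea, assuming the auxiliary information
is a vector of cheap per-sample metadata (the function name and the use of
k-means below are illustrative, not necessarily the construction in the
thesis):

    import numpy as np
    from sklearn.cluster import KMeans

    def partition_by_auxiliary(aux, n_envs=2):
        # Cluster cheap metadata (e.g., timestamps, device or hospital
        # IDs) into pseudo-environments for an IRM-style learner.
        aux = np.asarray(aux, dtype=float).reshape(len(aux), -1)
        return KMeans(n_clusters=n_envs, n_init=10).fit_predict(aux)

    # Each sample receives an environment label in {0, ..., n_envs - 1}.
    env_ids = partition_by_auxiliary(np.random.randn(1000, 3), n_envs=2)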

Despite these efforts to improve IRM, the framework still faces challenges,
including non-identifiability for non-linear models under large
distributional shifts and inadequate performance on large-scale real-world
datasets. Interestingly, practitioners have found success with
ensemble-based models, such as ensembles of independently trained models,
for OOD generalization. Notably, such ensembles inevitably exploit spurious
features, because each member is trained by ERM and therefore inherently
incorporates them. Their success thus appears to contradict IRM theory,
which predicts that models relying on spurious features should fail under
distributional shift. In this thesis, we also seek to unravel the mystery
surrounding ensemble-based models and their effectiveness in OOD
generalization.
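
For concreteness, a minimal sketch of the kind of ensemble referred to
above: averaging the predicted probabilities of several independently
ERM-trained models (names are illustrative):

    import torch

    @torch.no_grad()
    def ensemble_predict(models, x):
        # Average the class probabilities of independently trained
        # members; each one is an ordinary ERM-trained classifier.
        probs = torch.stack([m(x).softmax(dim=-1) for m in models])
        return probs.mean(dim=0)    # shape: (batch, n_classes)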


Date:                   Monday, 30 October 2023

Time:                   9:30am - 11:30am

Venue:                  Room 5510 (Lifts 25/26)

Committee Members:      Prof. Tong Zhang (Supervisor)
                        Prof. Nevin Zhang (Chairperson)
                        Prof. Raymond Wong
                        Prof. Xiaofang Zhou


**** ALL are Welcome ****