Learning Over-parameterized Linear Models via Stochastic Gradient Descent

Speaker: Dr. Difan ZOU
         Department of Computer Science
         The University of Hong Kong

Title:  "Learning Over-parameterized Linear Models via
        Stochastic Gradient Descent"

Date:   Monday, 17 April 2023

Time:   4:00pm - 5:00pm

Venue:  Lecture Theater F (Leung Yat Sing Lecture Theater)
        near lift 25/26, HKUST

Abstract:

It has been widely observed that standard optimization algorithms, such
as SGD, can find solutions that generalize well even when the problem is
extremely over-parameterized (i.e., the number of model parameters
exceeds the training sample size). To understand this algorithmic
inductive bias, we study the generalization ability of SGD in a basic
setting: (over-parameterized) linear regression. In particular, unlike
conventional works that mostly study worst-case guarantees for SGD, our
work focuses on proving problem- and algorithm-dependent bounds on the
generalization error. I will show that the developed bounds precisely
characterize sufficient conditions on the problem and the algorithm
under which SGD generalizes. I will also discuss the implicit
regularization of SGD by comparing it to the solution of regularized
risk minimization (i.e., ridge regression) via the notion of "sample
inflation": how many more samples SGD (with a tuned stepsize) requires
to perform no worse than the optimal ridge regression solution (with
the optimal regularization parameter). As a result, we can identify
which problem instances are better suited to SGD and vice versa.
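
To make the comparison concrete, below is a minimal NumPy sketch (not
from the talk) of the two estimators the abstract contrasts: one pass
of SGD over an over-parameterized linear regression problem versus
ridge regression with a tuned regularization parameter. The problem
sizes, noise level, stepsize, and regularization grid are illustrative
assumptions.

import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                       # n samples, d parameters (d > n)
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_star = rng.standard_normal(d)      # ground-truth parameter
y = X @ w_star + 0.1 * rng.standard_normal(n)

# One-pass SGD on the squared loss, started from zero.
w_sgd = np.zeros(d)
stepsize = 0.5                       # the "tuned" stepsize is an assumption
for i in rng.permutation(n):
    residual = X[i] @ w_sgd - y[i]
    w_sgd -= stepsize * residual * X[i]

# Ridge regression: w_ridge = (X^T X + n*lam*I)^{-1} X^T y.
def ridge(lam):
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

# Noiseless test set to estimate the generalization (excess) risk.
X_test = rng.standard_normal((1000, d)) / np.sqrt(d)
y_test = X_test @ w_star

def test_risk(w):
    return np.mean((X_test @ w - y_test) ** 2)

# Grid search stands in for the "optimal regularization parameter".
best_ridge = min((ridge(lam) for lam in np.logspace(-4, 1, 20)),
                 key=test_risk)

print(f"SGD test risk:   {test_risk(w_sgd):.4f}")
print(f"Ridge test risk: {test_risk(best_ridge):.4f}")

In the language of the abstract, "sample inflation" asks how much n
would have to grow for the SGD risk above to match the tuned ridge
risk on the same problem instance.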


*****************
Biography:

Difan Zou is currently an assistant professor in the Department of
Computer Science at The University of Hong Kong. He received his Ph.D.
from the Department of Computer Science at UCLA. He also holds a B.S.
degree in Applied Physics from the School of Gifted Young at the
University of Science and Technology of China (USTC), and an M.E.
degree in Electrical Engineering from USTC. He is a recipient of the
Bloomberg Data Science Ph.D. Fellowship. His research interests span
machine learning, deep learning, optimization, stochastic modeling,
and signal processing.