Learning Over-parameterized Linear Models via Stochastic Gradient Descent
Speaker: Dr. Difan ZOU, Department of Computer Science, The University of Hong Kong

Title: "Learning Over-parameterized Linear Models via Stochastic Gradient Descent"

Date: Monday, 17 April 2023
Time: 4:00pm - 5:00pm
Venue: Lecture Theater F (Leung Yat Sing Lecture Theater), near lift 25/26, HKUST

Abstract:

It has been widely observed that standard optimization algorithms, such as SGD, can find solutions that generalize well even when the problem is extremely over-parameterized (i.e., the number of model parameters exceeds the training sample size). To understand this algorithmic inductive bias, we study the generalization ability of SGD in a basic setting: (over-parameterized) linear regression. In particular, unlike conventional works that mostly study worst-case guarantees for SGD, our work focuses on proving problem- and algorithm-dependent bounds on the generalization error. I will show that the developed bounds precisely characterize the sufficient conditions on the problem and the algorithm under which SGD generalizes. I will also discuss the implicit regularization of SGD by comparing it to the solution of regularized risk minimization (i.e., ridge regression) via the notion of "sample inflation": how many more samples SGD (with tuned stepsize) requires in order to perform no worse than the optimal ridge regression solution (with the optimal regularization parameter). As a result, we can identify which problem instances are better suited to learning by SGD and vice versa.

*****************

Biography:

Difan Zou is currently an assistant professor in the Department of Computer Science, The University of Hong Kong. He received his Ph.D. degree from the Department of Computer Science at UCLA. Before that, he received a B.S. degree in Applied Physics from the School of the Gifted Young, University of Science and Technology of China (USTC), and an M.E. degree in Electrical Engineering from USTC. He is a recipient of the Bloomberg Data Science Ph.D. Fellowship. His research interests are broadly in machine learning/deep learning, optimization, stochastic modeling, and signal processing.
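For readers unfamiliar with the setting, below is a minimal illustrative sketch (Python/NumPy) of the comparison the abstract describes: one-pass SGD on an over-parameterized linear regression problem versus the closed-form ridge regression solution. All problem sizes, the stepsize, and the regularization strength are assumed values chosen for demonstration; this is not the speaker's method or analysis.

    # Illustrative sketch: SGD vs. ridge regression on an
    # over-parameterized linear regression problem (d > n).
    # Problem sizes, stepsize eta, and regularization lam are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 50, 200                        # n samples, d parameters: over-parameterized
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    w_star = rng.standard_normal(d)       # ground-truth weights
    y = X @ w_star + 0.1 * rng.standard_normal(n)

    # One-pass SGD with a constant (hand-tuned) stepsize.
    w_sgd = np.zeros(d)
    eta = 0.5
    for i in rng.permutation(n):
        grad = (X[i] @ w_sgd - y[i]) * X[i]   # gradient of 0.5*(x.w - y)^2
        w_sgd -= eta * grad

    # Ridge regression with a hand-picked regularization parameter.
    lam = 0.1
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    # Compare generalization error on fresh data from the same model.
    X_test = rng.standard_normal((1000, d)) / np.sqrt(d)
    y_test = X_test @ w_star
    for name, w in [("SGD", w_sgd), ("ridge", w_ridge)]:
        print(name, np.mean((X_test @ w - y_test) ** 2))

In the talk's framing, one would ask how many additional samples the SGD run above needs before its test error matches that of the best-tuned ridge solution; that gap is the "sample inflation".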