On Generalization and Implicit Bias of Gradient Methods in Deep Learning
Speaker: Prof. Jian Li, Institute for Interdisciplinary Information Sciences, Tsinghua University

Title: "On Generalization and Implicit Bias of Gradient Methods in Deep Learning"

Date: Monday, 12 August 2019
Time: 2:00pm - 3:00pm
Venue: Room 3598 (via lift no. 27/28), HKUST

Abstract:

Deep learning has enjoyed huge empirical success in recent years. Although training a deep neural network is a highly non-convex optimization problem, simple (stochastic) gradient methods are able to produce good solutions that minimize the training error and, more surprisingly, generalize well to out-of-sample data, even when the number of parameters is significantly larger than the amount of training data. It is known that changing the optimization algorithm, even without changing the model, changes the implicit bias and hence the generalization properties. What bias do the optimization algorithms introduce for neural networks? What ensures generalization in neural networks? In this talk, we attempt to answer these questions by proving new generalization bounds and investigating the implicit bias of various gradient methods.

(1) We develop a new framework, termed Bayes-Stability, for proving algorithm-dependent generalization error bounds. Using the new framework, we obtain new data-dependent generalization bounds for stochastic gradient Langevin dynamics (SGLD) and several other noisy gradient methods (e.g., with momentum, mini-batches, acceleration, and Entropy-SGD). Our result recovers (and is typically tighter than) a recent result in Mou et al. (2018) and improves upon the results in Pensia et al. (2018). Our experiments demonstrate that our data-dependent bounds can distinguish randomly labelled data from normal data, which provides an explanation for the intriguing phenomena observed in Zhang et al. (2017a).

(2) We show that gradient descent converges to the max-margin direction for homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations, generalizing previous work on logistic regression with one-layer or multi-layer linear networks. Finally, since margin is closely related to robustness, we discuss the potential benefits of training longer for improving the robustness of the model.

******************

Biography:

Jian Li is currently an associate professor at the Institute for Interdisciplinary Information Sciences (IIIS, previously ITCS), Tsinghua University, headed by Prof. Andrew Yao. He received his BSc degree from Sun Yat-sen (Zhongshan) University, China, his MSc degree in computer science from Fudan University, China, and his PhD degree from the University of Maryland, USA. His major research interests lie in algorithm design and analysis, machine learning, and databases. He has co-authored several research papers published in major computer science conferences and journals. He received the best paper awards at VLDB 2009 and ESA 2010. He is also a recipient of the "221 Basic Research Plan for Young Faculties" at Tsinghua University and the "New Century Excellent Talents" award from the Ministry of Education of China.
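For readers unfamiliar with SGLD, the main noisy gradient method analysed in part (1) of the abstract, the minimal Python sketch below shows a single update in its standard Gaussian-noise form. The step size eta, inverse temperature beta, and the toy quadratic loss are illustrative choices only, not values or models from the talk.

    # Minimal SGLD sketch (illustrative; hyperparameters are arbitrary).
    # One SGLD step: w_{t+1} = w_t - eta * grad_L(w_t) + sqrt(2*eta/beta) * N(0, I)
    import numpy as np

    def sgld_step(w, grad, eta=0.01, beta=1e4, rng=np.random.default_rng(0)):
        """One SGLD update: a gradient step plus isotropic Gaussian noise."""
        noise = rng.normal(size=w.shape) * np.sqrt(2.0 * eta / beta)
        return w - eta * grad + noise

    # Toy usage: noisy gradient descent on the quadratic loss L(w) = 0.5 * ||w||^2.
    w = np.ones(5)
    for _ in range(1000):
        w = sgld_step(w, grad=w)  # gradient of 0.5 * ||w||^2 is w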
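To make the max-margin statement in part (2) concrete, the following LaTeX note sketches the normalized margin that such results for homogeneous networks typically refer to; the notation is an assumption for illustration, not taken from the talk itself.

    % For a classifier f(w; x) that is L-homogeneous in its parameters,
    % i.e. f(c w; x) = c^L f(w; x) for all c > 0, the normalized margin is
    \[
      \bar{\gamma}(w) \;=\; \min_{1 \le i \le n} \frac{y_i \, f(w; x_i)}{\|w\|_2^{L}} .
    \]
    % "Converging to the max-margin direction" means that w_t / \|w_t\|_2
    % approaches a direction that maximizes \bar{\gamma} as training continues.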