Large Scale Optimization Methods for Machine Learning
PhD Thesis Proposal Defence

Title: "Large Scale Optimization Methods for Machine Learning"

by

Mr. Shuai ZHENG

Abstract:

Dealing with large-scale datasets has been a major challenge for optimization methods in machine learning. Typical machine learning problems can be cast as the minimization of an objective over some underlying data distribution. To exploit the data structure and improve generalization performance, a complex regularizer may be added to the objective. The stochastic gradient descent (SGD) method is widely viewed as an ideal approach for large-scale machine learning problems, whereas the conventional batch gradient method typically falters. Despite its flexibility and scalability, the stochastic gradient has high variance, which impedes training.

This thesis proposal presents a number of new optimization algorithms for tackling large-scale machine learning tasks. We first propose a fast and scalable stochastic ADMM method for solving empirical risk minimization problems with complex nonsmooth regularizers such as graph lasso and group lasso, and a stochastic continuation method for convex problems in which both the loss and the regularizer are nonsmooth. While existing approaches rely crucially on the assumption that the dataset is finite, we introduce two SGD-like algorithms for finite sums with infinite data. The proposed algorithms outperform existing methods in terms of both iteration complexity and storage. Inspired by recent advances in adaptive gradient methods for training deep neural networks, we present a fast and powerful optimization algorithm based on the follow-the-proximally-regularized-leader (FTPRL) method. The new algorithm significantly outperforms existing approaches, advancing the state of the art. Recently, there has been growing interest in distributed training, as it can be difficult to store a very large dataset on a single machine. In light of this, we develop a distributed asynchronous gradient-based method that improves upon existing distributed machine learning algorithms and enjoys a fast linear convergence rate. Finally, the scalability of large-scale distributed training of neural networks is often limited by communication overhead. Motivated by recent advances in optimization with compressed gradients, we propose a communication-efficient distributed SGD with error feedback. The proposed method provably converges to a stationary point at the same asymptotic rate as distributed synchronous SGD.

Date: Monday, 1 April 2019
Time: 2:00pm - 4:00pm
Venue: Room 5508 (lifts 25/26)

Committee Members:
Prof. James Kwok (Supervisor)
Prof. Dit-Yan Yeung (Chairperson)
Prof. Daniel Palomar (ECE)
Prof. Tong Zhang (MATH)

**** ALL are Welcome ****