PhD Thesis Proposal Defence
Title: "Large Scale Optimization Methods for Machine Learning"
by
Mr. Shuai ZHENG
Abstract:
Dealing with large-scale datasets has been a major challenge for
optimization methods in machine learning. Typical machine learning problems
can be cast as the minimization of an objective over some underlying data
distribution. To exploit the data structure and improve generalization
performance, a complex regularizer may be added to the objective. The
stochastic gradient descent (SGD) method has been widely viewed as an ideal
approach for large-scale machine learning problems, whereas the conventional
batch gradient method typically falters. Despite its flexibility and
scalability, the stochastic gradient suffers from high variance, which
impedes training. This thesis proposal presents a number of new optimization
algorithms for tackling large-scale machine learning tasks.
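
To make the trade-off concrete, the following sketch (not taken from the
thesis) contrasts a full-batch gradient step with a single-sample stochastic
step on a toy least-squares problem; the data, step sizes, and variable
names are illustrative assumptions only.

    import numpy as np

    # Toy least-squares problem: min_w (1/n) * sum_i (x_i^T w - y_i)^2.
    rng = np.random.default_rng(0)
    n, d = 1000, 10
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=n)

    def batch_gradient(w):
        # Full gradient: averages over all n samples, so each step costs O(n d).
        return 2.0 * X.T @ (X @ w - y) / n

    def stochastic_gradient(w):
        # Unbiased but high-variance estimate from one random sample, O(d) per step.
        i = rng.integers(n)
        return 2.0 * X[i] * (X[i] @ w - y[i])

    w_batch = np.zeros(d)
    w_sgd = np.zeros(d)
    for t in range(500):
        w_batch -= 0.01 * batch_gradient(w_batch)
        # A decaying step size keeps the noisy stochastic updates from
        # oscillating indefinitely around the solution.
        w_sgd -= (0.01 / (1.0 + 0.01 * t)) * stochastic_gradient(w_sgd)

    print("batch gradient error:", np.linalg.norm(w_batch - w_true))
    print("SGD error:           ", np.linalg.norm(w_sgd - w_true))

The stochastic step keeps the per-iteration cost independent of n, which is
what makes SGD attractive at scale, but its noise must be controlled by a
decaying step size or by variance-reduction techniques.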
We first propose a fast and scalable stochastic ADMM method for solving
empirical risk minimization problems with complex nonsmooth regularizers
such as graph lasso and group lasso, together with a stochastic continuation
method for convex problems in which both the loss and the regularizer are
nonsmooth. Whereas existing approaches rely crucially on the assumption that
the dataset is finite, we introduce two SGD-like algorithms for finite sums
with infinite data. The proposed algorithms outperform existing methods in
terms of both iteration complexity and storage. Inspired by recent advances
in adaptive gradient methods for training deep neural networks, we present a
fast and powerful optimization algorithm based on the
follow-the-proximally-regularized-leader (FTPRL) method. The new algorithm
significantly outperforms existing approaches, advancing the state of the
art. Recently, there has been growing interest in distributed training, as
it can be difficult to store a very large dataset on a single machine. In
light of this, we develop a distributed asynchronous gradient-based method
that improves upon existing distributed machine learning algorithms and
enjoys a fast linear convergence rate. Finally, the scalability of
large-scale distributed training of neural networks is often limited by
communication overhead. Motivated by recent advances in optimization with
compressed gradients, we propose a communication-efficient distributed SGD
with error feedback. The proposed method provably converges to a stationary
point at the same asymptotic rate as distributed synchronous SGD.
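
As a rough illustration of the error-feedback idea mentioned above (not the
thesis algorithm itself), the sketch below applies a generic error-feedback
update with a top-k sparsifier standing in for the gradient compressor; the
compressor choice, step size, and toy objective are all assumptions made for
illustration.

    import numpy as np

    def top_k(v, k):
        # Stand-in compressor: keep the k largest-magnitude entries, zero the rest.
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    def ef_sgd_step(w, grad, memory, lr, k):
        # Generic error-feedback update: compress the error-corrected step and
        # carry the compression residual forward in a local memory.
        corrected = lr * grad + memory   # add residual left over from earlier compressions
        update = top_k(corrected, k)     # this sparse vector is what would be communicated
        memory = corrected - update      # residual kept locally, reused at the next step
        return w - update, memory

    # Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is simply w.
    rng = np.random.default_rng(0)
    w = rng.normal(size=100)
    memory = np.zeros_like(w)
    for _ in range(200):
        grad = w + 0.01 * rng.normal(size=w.shape)   # noisy gradient of the toy objective
        w, memory = ef_sgd_step(w, grad, memory, lr=0.1, k=10)
    print("final parameter norm:", np.linalg.norm(w))

Because the residual is added back before the next compression, no gradient
information is permanently discarded, which is the intuition behind
recovering the convergence rate of uncompressed synchronous SGD.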
Date: Monday, 1 April 2019
Time: 2:00pm - 4:00pm
Venue: Room 5508
(lifts 25/26)
Committee Members: Prof. James Kwok (Supervisor)
Prof. Dit-Yan Yeung (Chairperson)
Prof. Daniel Palomar (ECE)
Prof. Tong Zhang (MATH)
**** ALL are Welcome ****