A Survey of Synchronization and Scheduling in General-Purpose Distributed Machine Learning Platforms
PhD Qualifying Examination
Title: "A Survey of Synchronization and Scheduling in General-Purpose
Distributed Machine Learning Platforms"
by
Mr. Chengliang ZHANG
Abstract:
Large datasets and models can achieve state-of-the-art machine learning
results, but training such models is both time-consuming and
computation-intensive. A typical large dataset can occupy terabytes of
storage, while a complex model can have billions of parameters to train;
no single machine can accommodate such demands. Intuitively, one can train
these models in parallel on distributed clusters of commodity
machines. As a result, recent years have witnessed relentless research
efforts on distributed machine learning.
This survey investigates Parameter Server, a state-of-the-art architecture
tailored for large-scale machine learning problems.
Besides the design philosophy of the parameter server, we focus on the
synchronization schemes and the trade-off between computation efficiency
and consistency. We then survey ongoing efforts to improve parameter
server performance by addressing problems such as heterogeneity, machine
failures, and communication scheduling.
Date: Wednesday, 25 April 2018
Time: 3:00pm - 5:00pm
Venue: Room 2611
Lifts 31/32
Committee Members: Dr. Wei Wang (Supervisor)
Prof. James Kwok (Chairperson)
Dr. Kai Chen
Prof. Bo Li
**** ALL are Welcome ****