A Survey of Synchronization and Scheduling in General-Purpose Distributed Machine Learning Platforms
PhD Qualifying Examination

Title: "A Survey of Synchronization and Scheduling in General-Purpose Distributed Machine Learning Platforms"

by Mr. Chengliang ZHANG

Abstract:

Large datasets and models can achieve state-of-the-art machine learning results, but training such models is both time-consuming and computation-intensive. A typical large dataset can take up terabytes of storage, while a complex model can have billions of parameters to train, so no single machine can accommodate such demand. A natural approach is to train these models in parallel on distributed clusters of commodity machines. As a result, recent years have witnessed sustained research efforts on distributed machine learning. This survey investigates the state-of-the-art architecture called Parameter Server, which is tailored for large-scale machine learning problems. Besides the design philosophy of the parameter server, we focus on synchronization schemes and the trade-off between computation efficiency and consistency. We then survey ongoing efforts to improve parameter server performance by addressing problems such as heterogeneity, machine failure, and communication scheduling.

Date: Wednesday, 25 April 2018
Time: 3:00pm - 5:00pm
Venue: Room 2611, Lifts 31/32

Committee Members:
Dr. Wei Wang (Supervisor)
Prof. James Kwok (Chairperson)
Dr. Kai Chen
Prof. Bo Li

**** ALL are Welcome ****