A Survey of Communication Optimizations in Distributed Deep Learning

PhD Qualifying Examination


Title: "A Survey of Communication Optimizations in Distributed Deep Learning"

by

Mr. Lin ZHANG


Abstract:

Distributed deep learning (DL) has become common practice for accelerating the 
training of large deep neural networks (DNNs) across multiple workers. However, 
such distributed training requires extensive communication. The communication 
overhead often consumes a significant portion of the training time and becomes 
a severe performance bottleneck. Addressing these communication issues to 
improve system scalability has attracted much attention from both academia and 
industry. In this article, we present a survey of communication optimization 
techniques for data-parallel distributed DL. In particular, we focus on system 
architecture design and communication scheduling algorithms. The system 
architecture defines how workers exchange information, and communication 
scheduling algorithms can be applied to different architectures to better 
utilize the network capacity. These techniques are important because they do 
not change the training dynamics of the learning algorithms and can be directly 
integrated into existing DL frameworks. Furthermore, we observe that 
distributed second-order algorithms have emerged to accelerate distributed DNN 
training, requiring fewer iterations to converge than first-order SGD 
algorithms. As existing communication solutions are mostly designed for 
first-order algorithms, this motivates us to explore communication optimization 
opportunities for the representative K-FAC algorithm.


Date:			Wednesday, 1 December 2021

Time:                  	2:00pm - 4:00pm

Venue:			Room 3494
 			(lifts 25/26)

Committee Members:	Prof. Bo Li (Supervisor)
 			Dr. Wei Wang (Chairperson)
 			Prof. Qiong Luo
 			Dr. Yangqiu Song


**** ALL are Welcome ****