A Survey of Communication Optimization in Distributed Deep Learning Training

PhD Qualifying Examination


Title: "A Survey of Communication Optimization in Distributed Deep Learning 
Training"

by

Mr. Hao WANG


Abstract:

Deep learning has facilitated the development of numerous sophisticated 
applications, such as image recognition, natural language processing, and 
autonomous driving, over the last decade. Because deep learning training jobs 
may take days or weeks to complete, they are usually processed in a distributed 
manner. In distributed training, however, the bottleneck shifts from 
computation to communication, due primarily to the hundreds to thousands of 
parameter synchronizations among workers. To alleviate this bottleneck, 
researchers have proposed a variety of approaches, such as gradient 
compression, computation-communication overlapping, and in-network aggregation, 
which have achieved notable performance improvements.

In this survey, we first give a comprehensive review of existing communication 
optimization solutions. We organize them into three levels comprising six 
categories: communication-round reduction and gradient compression at the 
application level; synchronization modes, parameter exchange schemes, and 
overlap and scheduling at the framework level; and in-network aggregation at 
the network level. Through this survey, we find that existing solutions mostly 
operate at the application and framework levels, which motivates us to explore 
opportunities at the network level, leveraging specialized network stacks and 
programmable switches to design new algorithms.


Date:			Tuesday, 27 July 2021

Time:                  	4:00pm - 6:00pm

Zoom meeting:
https://hkust.zoom.com.cn/j/3666130280?pwd=SE1yN1lJRG51MERsYXo3Ym5oMTJ5QT09

Committee Members:	Dr. Kai Chen (Supervisor)
 			Dr. Brahim Bensaou (Chairperson)
 			Prof. Bo Li
 			Dr. Wei Wang


**** ALL are Welcome ****