A Survey of Communication Optimization in Distributed Deep Learning Training
PhD Qualifying Examination

Title: "A Survey of Communication Optimization in Distributed Deep Learning Training"

by

Mr. Hao WANG

Abstract:

Deep learning has facilitated the development of numerous sophisticated applications, such as image recognition, natural language processing, and autonomous driving, over the last decade. Because deep learning training jobs may take days or weeks to complete, they are usually processed in a distributed manner. In distributed training, however, the bottleneck shifts from computation to communication, due primarily to the hundreds to thousands of parameter synchronizations among workers. To alleviate this bottleneck, researchers have proposed a wide range of approaches, such as gradient compression, computation-communication overlapping, and in-network aggregation, which have achieved notable performance. In this survey, we first give a comprehensive review of existing communication optimization solutions. We organize them into three levels with a total of six categories: communication-round reduction and gradient compression at the application level; synchronization modes, parameter exchange schemes, and overlap & scheduling at the framework level; and in-network aggregation at the network level. Through this survey, we find that existing solutions are mostly built at the application and framework levels, which motivates us to explore opportunities with specific network stacks and programmable switches for designing new algorithms at the network level.

Date: Tuesday, 27 July 2021
Time: 4:00pm - 6:00pm

Zoom meeting: https://hkust.zoom.com.cn/j/3666130280?pwd=SE1yN1lJRG51MERsYXo3Ym5oMTJ5QT09

Committee Members:
Dr. Kai Chen (Supervisor)
Dr. Brahim Bensaou (Chairperson)
Prof. Bo Li
Dr. Wei Wang

**** ALL are Welcome ****