The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
MPhil Thesis Defence
Title: "Sparse Gradient Communication for Accelerating Distributed Deep
Learning"
By
Miss Zihan LI
Abstract:
Synchronous stochastic gradient descent (S-SGD) with data parallelism has
become the de facto approach for training large-scale deep neural networks (DNNs)
on multi-GPU systems. However, S-SGD requires synchronizing gradients from all
workers iteratively, which incurs excessive communication costs and limits the
scaling efficiency of GPU clusters. Gradient sparsification techniques such as
top-k sparsification can be effective in reducing the communication volume and
thus improving scalability. Yet, existing top-k sparsification approaches still
suffer from high communication complexity: they often require a very low
density to achieve good performance, which easily sacrifices model accuracy. To
this end, we propose a novel sparse communication approach called TopKA2A,
which integrates top-k sparsification with AlltoAll communication to exchange
sparse tensors among GPUs, yielding a notable reduction in communication
complexity. Based on a rigorous theoretical analysis of the conditions under
which TopKA2A can be applied, we design a simple yet effective tensor fusion
algorithm based on binary search. We conduct an in-depth analysis and
evaluation of communication efficiency, comparing TopKA2A with state-of-the-art
solutions on popular DNN models without compromising model accuracy.
Experimental results demonstrate that
TopKA2A yields substantial communication efficiency gains and achieves training
speedup over existing algorithms on a 32-GPU cluster.
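For readers unfamiliar with the pattern, the sketch below illustrates one common
way to combine top-k gradient sparsification with AlltoAll exchange in PyTorch.
It is a simplified illustration of the general idea only, not the thesis's
TopKA2A algorithm; the function name, the density parameter, and the
index-range partitioning scheme are all assumptions made for the example.

    # Illustrative sketch (not the thesis implementation): top-k sparsification
    # combined with AlltoAll. Assumes a flat 1-D gradient and an initialized
    # torch.distributed process group.
    import torch
    import torch.distributed as dist

    def sparse_all_to_all(grad: torch.Tensor, density: float = 0.01) -> torch.Tensor:
        world_size = dist.get_world_size()
        rank = dist.get_rank()
        n = grad.numel()
        k = max(1, int(n * density))

        # 1. Local top-k selection by magnitude.
        _, idx = torch.topk(grad.abs(), k)
        val = grad[idx]

        # 2. Bucket selected entries by the worker that owns each index range.
        chunk = (n + world_size - 1) // world_size
        owner = idx // chunk
        send_idx = [idx[owner == r] for r in range(world_size)]
        send_val = [val[owner == r] for r in range(world_size)]

        # 3. Exchange bucket sizes so receive buffers can be allocated.
        send_counts = torch.tensor([t.numel() for t in send_idx], device=grad.device)
        recv_counts = torch.empty_like(send_counts)
        dist.all_to_all_single(recv_counts, send_counts)

        # 4. AlltoAll exchange of the sparse (index, value) pairs.
        recv_idx = [torch.empty(int(c), dtype=idx.dtype, device=grad.device) for c in recv_counts]
        recv_val = [torch.empty(int(c), dtype=val.dtype, device=grad.device) for c in recv_counts]
        dist.all_to_all(recv_idx, send_idx)
        dist.all_to_all(recv_val, send_val)

        # 5. Accumulate entries for the locally owned slice, then gather the
        #    aggregated slices back to every worker and average.
        local = torch.zeros(chunk, dtype=grad.dtype, device=grad.device)
        for i, v in zip(recv_idx, recv_val):
            local.index_add_(0, i - rank * chunk, v)
        gathered = [torch.empty_like(local) for _ in range(world_size)]
        dist.all_gather(gathered, local)
        return torch.cat(gathered)[:n] / world_size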
Date: Thursday, 27 June 2024
Time: 2:00pm - 4:00pm
Venue: Room 3494
Lifts 25/26
Chairman: Dr. Shuai WANG
Committee Members: Prof. Bo LI (Supervisor)
Dr. Yangqiu SONG (Co-supervisor)
Prof. Song GUO