The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
MPhil Thesis Defence
Title: "Sparse Gradient Communication for Accelerating Distributed Deep
Learning"
By
Miss Zihan LI
Abstract:
Synchronous stochastic gradient descent (S-SGD) with data parallelism has
become the de facto approach for training large-scale deep neural networks (DNNs)
on multi-GPU systems. However, S-SGD requires synchronizing gradients from all
workers iteratively, which incurs excessive communication costs and limits the
scaling efficiency of GPU clusters. Gradient sparsification techniques such as
top-k sparsification can be effective in reducing the communication volume and
thus improving scalability. Yet, existing top-k sparsification approaches still
suffer from high communication complexity: they often require a very low
density to achieve good performance, which easily sacrifices model accuracy. To
this end, we propose a novel sparse communication approach called TopKA2A,
which integrates top-k sparsification with AlltoAll communication to exchange
sparse tensors among GPUs, yielding a notable reduction in communication
complexity. Based on a rigorous theoretical analysis of the conditions under
which TopKA2A can be applied, we design a simple yet effective tensor fusion
algorithm based on binary search. We conduct an in-depth analysis and
evaluation of communication efficiency, comparing TopKA2A with state-of-the-art
solutions on popular DNN models without compromising model accuracy.
Experimental results demonstrate that
TopKA2A yields substantial communication efficiency gains and achieves training
speedup over existing algorithms on a 32-GPU cluster.
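For readers unfamiliar with the pattern, the sketch below illustrates one common
way to combine top-k gradient sparsification with AlltoAll exchange in PyTorch.
It is a simplified illustration of the general idea only, not the thesis's
TopKA2A algorithm; the function name, the density parameter, and the
index-range partitioning scheme are all assumptions made for the example.

    # Illustrative sketch (not the thesis implementation): top-k sparsification
    # combined with AlltoAll. Assumes a flat 1-D gradient and an initialized
    # torch.distributed process group.
    import torch
    import torch.distributed as dist

    def sparse_all_to_all(grad: torch.Tensor, density: float = 0.01) -> torch.Tensor:
        world_size = dist.get_world_size()
        rank = dist.get_rank()
        n = grad.numel()
        k = max(1, int(n * density))

        # 1. Local top-k selection by magnitude.
        _, idx = torch.topk(grad.abs(), k)
        val = grad[idx]

        # 2. Bucket selected entries by the worker that owns each index range.
        chunk = (n + world_size - 1) // world_size
        owner = idx // chunk
        send_idx = [idx[owner == r] for r in range(world_size)]
        send_val = [val[owner == r] for r in range(world_size)]

        # 3. Exchange bucket sizes so receive buffers can be allocated.
        send_counts = torch.tensor([t.numel() for t in send_idx], device=grad.device)
        recv_counts = torch.empty_like(send_counts)
        dist.all_to_all_single(recv_counts, send_counts)

        # 4. AlltoAll exchange of the sparse (index, value) pairs.
        recv_idx = [torch.empty(int(c), dtype=idx.dtype, device=grad.device) for c in recv_counts]
        recv_val = [torch.empty(int(c), dtype=val.dtype, device=grad.device) for c in recv_counts]
        dist.all_to_all(recv_idx, send_idx)
        dist.all_to_all(recv_val, send_val)

        # 5. Accumulate entries for the locally owned slice, then gather the
        #    aggregated slices back to every worker and average.
        local = torch.zeros(chunk, dtype=grad.dtype, device=grad.device)
        for i, v in zip(recv_idx, recv_val):
            local.index_add_(0, i - rank * chunk, v)
        gathered = [torch.empty_like(local) for _ in range(world_size)]
        dist.all_gather(gathered, local)
        return torch.cat(gathered)[:n] / world_size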
Date: Thursday, 27 June 2024
Time: 2:00pm - 4:00pm
Venue: Room 3494
Lifts 25/26
Chairman: Dr. Shuai WANG
Committee Members: Prof. Bo LI (Supervisor)
Dr. Yangqiu SONG (Co-supervisor)
Prof. Song GUO