Sparse Gradient Communication for Accelerating Distributed Deep Learning
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

MPhil Thesis Defence

Title: "Sparse Gradient Communication for Accelerating Distributed Deep Learning"

By

Miss Zihan LI

Abstract:

Synchronous stochastic gradient descent (S-SGD) with data parallelism has become the de-facto approach to training large-scale deep neural networks (DNNs) on multi-GPU systems. However, S-SGD requires synchronizing gradients across all workers at every iteration, which incurs excessive communication costs and limits the scaling efficiency of GPU clusters. Gradient sparsification techniques such as top-k sparsification have been shown to be effective in reducing the communication volume and thereby improving scalability. Yet, existing top-k sparsification approaches still suffer from high communication complexity: they often require a very low density to achieve good performance, which easily sacrifices model accuracy. To this end, we propose a novel sparse communication approach called TopKA2A, which integrates top-k sparsification with AlltoAll communication to exchange sparse tensors among GPUs, resulting in a notable reduction in communication complexity. Guided by a rigorous theoretical analysis of the conditions under which TopKA2A can be applied, we design a simple yet effective tensor fusion algorithm based on binary search. We conduct an in-depth analysis and evaluation of communication efficiency by comparing TopKA2A with state-of-the-art solutions on popular DNN models, without compromising model accuracy. Experimental results demonstrate that TopKA2A yields substantial communication efficiency gains and achieves training speedups over existing algorithms on a 32-GPU cluster.

Date: Thursday, 27 June 2024
Time: 2:00pm - 4:00pm
Venue: Room 3494 (Lifts 25/26)

Chairman: Dr. Shuai WANG
Committee Members: Prof. Bo LI (Supervisor)
                   Dr. Yangqiu SONG (Co-supervisor)
                   Prof. Song GUO
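
For readers unfamiliar with the general idea described in the abstract, the following is a minimal, illustrative sketch (not the thesis's actual TopKA2A implementation) of combining top-k gradient sparsification with an AlltoAll exchange of sparse (index, value) pairs, assuming PyTorch with torch.distributed already initialized (e.g. with the NCCL backend). The function and variable names are hypothetical.

```python
# Illustrative sketch: each worker keeps only its top-k gradient entries,
# buckets them by the index range that each rank "owns", and exchanges the
# sparse (index, value) pairs with AlltoAll so every rank aggregates one shard.
import torch
import torch.distributed as dist


def topk_alltoall_aggregate(grad: torch.Tensor, density: float) -> torch.Tensor:
    world_size = dist.get_world_size()
    flat = grad.flatten()
    n = flat.numel()
    shard = (n + world_size - 1) // world_size  # index range owned by each rank

    # 1. Local top-k sparsification by magnitude.
    k = max(1, int(n * density))
    _, idx = torch.topk(flat.abs(), k)
    idx, _ = torch.sort(idx)        # sorting groups indices by destination rank
    val = flat[idx]

    # 2. Bucket the selected entries by the rank that owns their index range.
    dest = idx // shard
    send_counts = torch.bincount(dest, minlength=world_size)

    # 3. Exchange counts first, then the sparse pairs, with AlltoAll.
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)
    in_splits = send_counts.tolist()
    out_splits = recv_counts.tolist()
    recv_idx = idx.new_empty(sum(out_splits))
    recv_val = val.new_empty(sum(out_splits))
    dist.all_to_all_single(recv_idx, idx, out_splits, in_splits)
    dist.all_to_all_single(recv_val, val, out_splits, in_splits)

    # 4. Each rank sums the received entries that fall in its own shard.
    rank = dist.get_rank()
    local = flat.new_zeros(shard)
    local.index_add_(0, recv_idx - rank * shard, recv_val)

    # 5. Gather the aggregated shards so every rank holds the full gradient.
    out = flat.new_zeros(shard * world_size)
    dist.all_gather_into_tensor(out, local)
    return out[:n].view_as(grad) / world_size
```

In this sketch the per-rank communication volume scales with the number of selected entries rather than the dense gradient size, which is the intuition behind pairing sparsification with AlltoAll; the thesis's actual communication scheme, theoretical conditions, and binary-search-based tensor fusion are presented in the defence.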