TOWARDS SCALABLE DEEP LEARNING WITH COMMUNICATION OPTIMIZATIONS
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "TOWARDS SCALABLE DEEP LEARNING WITH COMMUNICATION OPTIMIZATIONS"

By

Mr. Lin ZHANG

Abstract:

With the rapid growth of data and model sizes, it has become prevalent to parallelize the training of deep neural networks (DNNs) across a cluster of distributed devices, which, however, introduces extensive communication overheads. In this thesis, we study both system-level and algorithm-level communication optimization techniques to improve training efficiency.

First, existing data-parallel training systems rely on the all-reduce primitive for gradient aggregation, which achieves only sub-optimal training performance. We present DeAR, which decouples the all-reduce primitive into two operators to enable fine-grained communication scheduling, and then applies dynamic tensor fusion to derive an optimal schedule.

Second, many gradient compression algorithms have been proposed to compress communication data in synchronous stochastic gradient descent (S-SGD) and thereby accelerate distributed training, but we find that they fail to outperform S-SGD in most cases. To this end, we propose ACP-SGD, which largely reduces the compression and communication overheads and enjoys three system optimizations: all-reduce, pipelining, and tensor fusion.

Third, we study second-order methods such as distributed K-FAC (D-KFAC) for training DNNs, which exploit curvature information to accelerate the training process. However, D-KFAC incurs extensive computation and communication for the curvature information. We present smart parallel D-KFAC (SPD-KFAC) and placement-aware D-KFAC (PAD-KFAC), which accelerate D-KFAC with efficient pipelining and optimal tensor placement scheduling techniques, respectively.

Fourth, we present a memory- and time-efficient second-order algorithm named Eva, with two novel techniques: 1) we approximate the curvature information with two small stochastic vectors to reduce memory and communication consumption, and 2) we derive an efficient update formula without explicitly computing matrix inverses, addressing the high computation overhead.

Date: Friday, 18 August 2023
Time: 2:00pm - 4:00pm
Venue: Room 3494 (lifts 25/26)

Chairperson: Prof. Yong HUANG (CHEM)

Committee Members: Prof. Bo LI (Supervisor)
                   Prof. Yangqiu SONG
                   Prof. Qian ZHANG
                   Prof. Weichuan YU (ECE)
                   Prof. Yuanqing ZHENG (PolyU)

**** ALL are Welcome ****
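
As a companion to the abstract above, the following is a minimal sketch, assuming PyTorch's torch.distributed API, of the general idea of decoupling all-reduce into reduce-scatter and all-gather so that the two halves can be scheduled independently. It is not the DeAR implementation; the function name decoupled_all_reduce and the padding scheme are illustrative assumptions, and a process group must already be initialized.

import torch
import torch.distributed as dist

def decoupled_all_reduce(grad: torch.Tensor) -> torch.Tensor:
    """Aggregate `grad` across workers as reduce-scatter followed by all-gather.

    Unlike a single fused all-reduce call, the two halves can be scheduled
    independently, e.g. the all-gather of one layer can be deferred and
    overlapped with the reduce-scatter of another.
    """
    world_size = dist.get_world_size()
    flat = grad.flatten()
    # Pad so the tensor splits evenly across workers (illustrative choice).
    pad = (-flat.numel()) % world_size
    if pad:
        flat = torch.cat([flat, flat.new_zeros(pad)])
    shards = [c.contiguous() for c in flat.chunk(world_size)]

    # Step 1: reduce-scatter -- each worker receives the sum of one shard.
    reduced_shard = torch.empty_like(shards[0])
    dist.reduce_scatter(reduced_shard, shards)

    # Step 2: all-gather -- every worker collects all reduced shards.
    gathered = [torch.empty_like(reduced_shard) for _ in range(world_size)]
    dist.all_gather(gathered, reduced_shard)

    # Drop padding, restore the shape, and average to match S-SGD semantics.
    return (torch.cat(gathered)[: grad.numel()].view_as(grad)) / world_size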
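
In the same spirit, here is a hedged sketch of the tensor fusion and pipelining optimizations mentioned for ACP-SGD: gradients are fused into fixed-size buckets and each bucket's all-reduce is launched asynchronously, so communication of earlier buckets can overlap with the remaining backward computation. The bucket size, function name, and unpacking logic are assumptions, and the compression step described in the abstract is deliberately omitted.

import torch
import torch.distributed as dist

def fused_async_all_reduce(grads, bucket_bytes=25 * 1024 * 1024):
    """Fuse gradients into buckets and all-reduce each bucket asynchronously."""
    buckets, current, size = [], [], 0
    for g in grads:
        current.append(g)
        size += g.numel() * g.element_size()
        if size >= bucket_bytes:
            buckets.append(current)
            current, size = [], 0
    if current:
        buckets.append(current)

    # Launch one asynchronous all-reduce per fused bucket.
    handles = []
    for bucket in buckets:
        flat = torch.cat([g.flatten() for g in bucket])
        handles.append((bucket, flat, dist.all_reduce(flat, async_op=True)))

    # Later (e.g. right before optimizer.step()), wait and scatter back.
    for bucket, flat, handle in handles:
        handle.wait()
        flat /= dist.get_world_size()
        offset = 0
        for g in bucket:
            g.copy_(flat[offset:offset + g.numel()].view_as(g))
            offset += g.numel()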
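
Finally, the abstract notes that Eva derives an update formula without explicitly computing matrix inverses. The exact formula is not stated in the abstract; as an illustration only, the Sherman-Morrison identity shows how a damped rank-one curvature approximation built from a small vector v can be applied to a gradient g using vector operations alone:

(\lambda I + v v^\top)^{-1} g \;=\; \frac{1}{\lambda}\left( g - \frac{v^\top g}{\lambda + v^\top v}\, v \right)

This costs O(n) memory and time per step, in contrast to the O(n^2) storage and roughly O(n^3) inversion cost of maintaining an explicit curvature matrix.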