A Survey for Optimizing Distributed Deep Learning in GPU Cluster
PhD Qualifying Examination

Title: "A Survey for Optimizing Distributed Deep Learning in GPU Cluster"

by Mr. Xinchen WAN

Abstract:

Deep learning is widely used across many application domains. Because training can take hours or days to complete, distributed systems are adopted to finish training in a timely manner. Meanwhile, GPUs remain the dominant accelerators for deep learning, which motivates large companies to build large-scale GPU clusters and deploy DL applications on them. However, how communication and computation should be coordinated during training in GPU clusters remains to be investigated. In pursuit of high training efficiency, several optimization techniques have been proposed. In this survey, we first provide background on distributed deep learning and GPU clusters. We then present and discuss several techniques, categorizing them along two aspects: communication and computation. Lastly, we conclude by showing the limitations of current studies and suggesting new directions for future work.

Date: Friday, 10 July 2020
Time: 4:00pm - 6:00pm
Zoom meeting: https://hkust.zoom.us/j/99380725107

Committee Members:
Dr. Kai Chen (Supervisor)
Dr. Brahim Bensaou (Chairperson)
Dr. Qifeng Chen
Dr. Yangqiu Song

**** ALL are Welcome ****