Towards Efficient and Effective Distributed Training and Inference System for Large-Scale Machine Learning
PhD Thesis Proposal Defence

Title: "Towards Efficient and Effective Distributed Training and Inference System for Large-Scale Machine Learning"

by

Mr. Weiyan WANG

Abstract:

In recent years, model sizes have grown exponentially to extract knowledge from large-scale training data effectively, and such large models have achieved state-of-the-art accuracy in various tasks, including but not limited to computer vision (CV), natural language processing (NLP), and information retrieval (IR). However, large models pay the price of efficiency for effectiveness, making them impractical to train and serve in some scenarios. Distributed training of large models on large-scale data can suffer from network bottlenecks caused by bandwidth contention and network heterogeneity during global synchronization. Furthermore, inference of a large model on a single device can incur computation costs too high to satisfy real-time requirements after deployment.

This thesis presents our efforts in building efficient distributed training and inference systems for large-scale machine learning while maintaining effectiveness. For distributed training across various tasks, we propose a novel Divide-and-Shuffle Synchronization (DS-Sync) that achieves communication efficiency without sacrificing convergence accuracy. DS-Sync divides workers into independently synchronized groups to avoid bottlenecks, and it ensures global consensus by iteratively shuffling workers among the different groups. For online IR and NLP inference on remote servers, we propose student parallelism and build the distributed inference system Academus for low latency and high throughput. It distills the original large model into an equivalent group of parallel, flat, and homogeneous student models, trading more parallel operators for fewer serial operators and reducing the costs of batching and padding.
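The grouping-and-shuffling idea behind DS-Sync can be illustrated with a minimal sketch (hypothetical helper names, not the actual DS-Sync implementation): workers synchronize only within small groups each iteration, and the group membership is reshuffled so that updates eventually propagate to all workers without a global barrier.

```python
import random

def make_groups(workers, group_size, rng):
    """Partition workers into independently synchronized groups.

    Shuffling before partitioning rotates group membership every
    iteration, so model updates eventually propagate across all
    groups without any global synchronization step.
    """
    order = workers[:]
    rng.shuffle(order)
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]

# Each iteration: synchronize parameters within each group only,
# then reshuffle membership for the next iteration.
rng = random.Random(0)
workers = list(range(8))
for step in range(3):
    groups = make_groups(workers, group_size=4, rng=rng)
    # ... each group would average its members' model parameters here ...
```

The sketch only shows the membership rotation; the real system also handles the in-group parameter averaging and its convergence guarantees, which are the subject of the thesis.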
For CV inference on local edge devices, we further propose Model Decomposition and Parallelization (MDP) for distributed inference on heterogeneous edge devices, motivated by latency and privacy issues. MDP decomposes vision transformers into sub-models with various input and model sizes suited to different devices, and all sub-models are virtually stacked to work together for higher accuracy. In the future, I will continue working towards efficient, effective, and scalable distributed systems for large-scale machine learning in more practical scenarios.

Date: Wednesday, 26 April 2023
Time: 9:30am - 11:30am
Venue: Room 5510 (lifts 25/26)

Committee Members:
Prof. Kai Chen (Supervisor)
Prof. Gary Chan (Chairperson)
Dr. Yangqiu Song
Dr. Shuai Wang

**** ALL are Welcome ****