PhD Thesis Proposal Defence


Title: "Towards Efficient and Effective Distributed Training and Inference 
System for Large-Scale Machine Learning"

by

Mr. Weiyan WANG


Abstract:

In recent years, model sizes have grown exponentially to effectively extract 
knowledge from large-scale training data. Such large models have achieved 
state-of-the-art accuracy in various tasks, including but not limited to 
computer vision (CV), natural language processing (NLP), and information 
retrieval (IR). However, large-scale models pay the price of efficiency for 
effectiveness, making them impractical to train and serve in some scenarios. 
Distributed training of large models on large datasets can suffer from 
network bottlenecks caused by bandwidth contention and network heterogeneity 
during global synchronization. Furthermore, inference of large models on a 
single device can incur computation costs too high to meet real-time 
requirements after deployment.

This thesis presents our efforts in building efficient distributed training 
and inference systems for large-scale machine learning while maintaining 
effectiveness. For distributed training across various tasks, we propose a 
novel Divide-and-Shuffle Synchronization (DS-Sync) to achieve communication 
efficiency without sacrificing convergence accuracy. DS-Sync divides workers 
into independently synchronized groups to avoid bottlenecks, and it ensures 
global consensus by iteratively shuffling workers among different groups. 
For online IR and NLP inference on remote servers, we propose student 
parallelism to build the distributed inference system Academus for low 
latency and high throughput. It distills the original large model into an 
equivalent group of parallel, flat, and homogeneous student models, trading 
more parallel operators for fewer serial operators and reducing the costs of 
batching and padding. For CV inference on local edge devices, we further 
propose Model Decomposition and Parallelization (MDP) for distributed 
inference on heterogeneous edge devices, motivated by latency and privacy 
concerns. MDP decomposes vision transformers into sub-models with different 
input and model sizes suited to different devices, and all sub-models are 
virtually stacked to work together for higher accuracy. In the future, I 
will continue working towards efficient, effective, and scalable distributed 
systems for large-scale machine learning in more practical scenarios.
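
For intuition, the toy Python sketch below simulates the divide-and-shuffle 
idea summarized above. It is illustrative only, not the thesis 
implementation; the worker count, group size, scalar "model", and all names 
are assumptions made for the example.

    # Conceptual sketch of Divide-and-Shuffle Synchronization (DS-Sync):
    # workers synchronize within small groups (no global barrier), and
    # group membership is reshuffled every iteration so updates mix
    # toward a global consensus. Purely illustrative, not the thesis code.
    import random
    from statistics import mean

    NUM_WORKERS = 8
    GROUP_SIZE = 4      # each group synchronizes independently
    ITERATIONS = 5

    # Each worker holds a scalar "model" parameter for simplicity.
    params = [float(i) for i in range(NUM_WORKERS)]

    def divide(workers, group_size, rng):
        """Randomly partition workers into groups (the shuffle step)."""
        shuffled = workers[:]
        rng.shuffle(shuffled)
        return [shuffled[i:i + group_size]
                for i in range(0, len(shuffled), group_size)]

    rng = random.Random(0)
    for step in range(ITERATIONS):
        groups = divide(list(range(NUM_WORKERS)), GROUP_SIZE, rng)
        for group in groups:
            # Intra-group synchronization (stands in for a group all-reduce):
            # no worker waits on slow or congested links outside its group.
            group_avg = mean(params[w] for w in group)
            for w in group:
                params[w] = group_avg
        print(f"step {step}: params={params}")
    # After a few shuffled rounds, all parameters approach the global
    # average, approximating the consensus of full synchronization.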


Date:  			Wednesday, 26 April 2023

Time:                  	9:30am - 11:30am

Venue:			Room 5510
 			(lifts 25/26)

Committee Members:	Prof. Kai Chen (Supervisor)
 			Prof. Gary Chan (Chairperson)
 			Dr. Yangqiu Song
 			Dr. Shuai Wang


**** ALL are Welcome ****