The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Towards Efficient and Effective Distributed Training and Inference
System for Large-Scale Machine Learning"

By

Mr. Weiyan WANG


Abstract:

In recent years, model sizes have grown exponentially in order to
effectively extract knowledge from large-scale training data. Such large
models have achieved state-of-the-art accuracy in various tasks, including
but not limited to computer vision (CV), natural language processing (NLP),
and information retrieval (IR). However, large-scale models pay the price
of efficiency for effectiveness, making them impractical to train and serve
in some scenarios. When training large models on large-scale data in a
distributed manner, global synchronization can suffer from network
bottlenecks due to bandwidth contention and network heterogeneity.
Furthermore, after deployment, inference of large models on a single device
can incur computation costs too high to satisfy real-time requirements.

This thesis presents our efforts in building efficient distributed training
and inference systems for large-scale machine learning while maintaining
effectiveness. For distributed training across various tasks, we propose a
novel Divide-and-Shuffle Synchronization (DS-Sync) scheme that achieves
communication efficiency without sacrificing convergence accuracy. DS-Sync
divides workers into independently synchronized groups to avoid
bottlenecks, and it ensures global consensus by iteratively shuffling
workers among different groups.
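
To make the group-and-shuffle idea concrete, here is a minimal Python
sketch (our own illustration, not the thesis's implementation; each
worker's model is reduced to a single float) showing how intra-group
averaging plus periodic re-shuffling mixes information globally:

    import random

    def ds_sync_step(params, groups):
        # Synchronize within each small group only: no global all-reduce,
        # hence no single network bottleneck in any one step.
        for group in groups:
            avg = sum(params[w] for w in group) / len(group)
            for w in group:
                params[w] = avg

    def shuffle_groups(workers, group_size, rng):
        # Re-partition workers so the next step mixes different peers.
        order = list(workers)
        rng.shuffle(order)
        return [order[i:i + group_size]
                for i in range(0, len(order), group_size)]

    rng = random.Random(0)
    workers = list(range(8))
    params = {w: float(w) for w in workers}  # toy per-worker "model"

    for step in range(10):
        groups = shuffle_groups(workers, group_size=2, rng=rng)
        ds_sync_step(params, groups)

    print(params)  # values approach the global mean (3.5) as shuffles mix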
For online IR and NLP inference on remote servers, we propose student
parallelism to build the distributed inference system Academus for low
latency and high throughput. It distills the original large model into an
equivalent group of parallel, flat, and homogeneous student models,
replacing serial operators with more parallel ones and reducing the
overheads of batching and padding.
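
The following sketch illustrates the serving pattern under our own
simplifying assumptions (the FlatStudent architecture and logit averaging
below are placeholders, not Academus's actual distillation or combination
rule):

    import torch
    import torch.nn as nn

    class FlatStudent(nn.Module):
        """A shallow student standing in for one slice of the large model."""
        def __init__(self, dim=128, num_classes=2):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, num_classes))

        def forward(self, x):
            return self.net(x)

    # A homogeneous group of students; being independent, they could run
    # on separate devices or CUDA streams in parallel.
    students = [FlatStudent() for _ in range(4)]

    @torch.no_grad()
    def infer(x):
        # Few serial layers per student, many students side by side;
        # their logits are combined here by simple averaging.
        return torch.stack([s(x) for s in students]).mean(dim=0)

    print(infer(torch.randn(1, 128)).shape)  # torch.Size([1, 2])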
For CV inference on local edge devices, where latency and privacy concerns
preclude offloading to remote servers, we further propose Model
Decomposition and Parallelization (MDP) for distributed inference on
heterogeneous edge devices. MDP decomposes a vision transformer into
sub-models with different input and model sizes, each suited to a different
device, and all sub-models are virtually stacked to work together for
higher accuracy.
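
As a rough illustration of the decomposition (our assumption only: the
sub-models here are tiny CNNs and the outputs are simply summed, whereas
the thesis decomposes vision transformers), each sub-model takes an input
resolution and width matched to its device:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SubModel(nn.Module):
        """A small net whose input size and width match one device."""
        def __init__(self, in_res, width, num_classes=10):
            super().__init__()
            self.in_res = in_res
            self.body = nn.Sequential(
                nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(width, num_classes))

        def forward(self, img):
            x = F.interpolate(img, size=self.in_res)  # device-sized input
            return self.body(x)

    # Hypothetical device profiles: (input resolution, channel width).
    sub_models = [SubModel(in_res=64, width=32),   # stronger edge device
                  SubModel(in_res=32, width=16)]   # weaker edge device

    @torch.no_grad()
    def predict(img):
        # Each sub-model would run on its own device; the virtually
        # stacked outputs are combined for the final prediction.
        return sum(m(img) for m in sub_models).argmax(dim=1)

    print(predict(torch.randn(1, 3, 64, 64)))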
In the future, I will keep working towards efficient, effective, and
scalable distributed systems for large-scale machine learning in more
practical scenarios.


Date:                   Tuesday, 21 November 2023

Time:                   10:00am - 12:00noon

Venue:                  Room CYTG002
                        Lifts 35/36

Chairman:               Prof. Jia LIU (IEDA)

Committee Members:      Prof. Kai CHEN (Supervisor)
                        Prof. Yangqiu SONG
                        Prof. Ke YI
                        Prof. Zili MENG (ECE)
                        Prof. Yu ZHANG (Southern Univ of Science and Technology)


**** ALL are Welcome ****