Exploring Task Dependencies of Data-Parallel Jobs in Alibaba Cloud

MPhil Thesis Defence


Title: "Exploring Task Dependencies of Data-Parallel Jobs in Alibaba Cloud"

By

Mr. Yunchuan ZHENG


Abstract

Large production data centers routinely run data-parallel computations with 
complicated task dependencies, which are usually represented as Directed 
Acyclic Graphs (DAGs). Understanding the DAG structures and runtime 
characteristics found in production environments would therefore benefit 
scheduler design, yet this remains an important missing piece in the literature.

To bridge this gap, this dissertation conducts a comprehensive study of an 
open-sourced cluster trace from Alibaba Group. We examine the dependency structures of 
Alibaba batch jobs and find that their DAGs have sparsely connected vertices 
and can be approximately decomposed into multiple trees with bounded depth. We 
also investigate the runtime performance of DAGs, and the results indicate that 
dependent tasks may have significant variability in resource usage and 
duration, even for recurring tasks. In both aspects, we compare the SQL jobs in 
the standard TPC benchmarks with the production workloads and find that the 
former are not adequately representative. To better benchmark DAG schedulers at scale, we 
develop a workload generator that can faithfully synthesize task dependencies 
based on the production Alibaba trace. Extensive evaluations show that the 
synthesized DAGs have statistical characteristics consistent with those of the 
production DAGs, and that the synthesized and real workloads yield similar 
scheduling results under various schedulers.


Date:  			Thursday, 30 March 2023

Time:			2:30pm - 4:30pm

Venue:			Room 4475
 			lifts 25/26

Committee Members:	Dr. Wei Wang (Supervisor)
 			Prof. Bo Li (Chairperson)
 			Dr. Shuai Wang


**** ALL are Welcome ****