Towards Optimal Delay and Throughput in Data-Parallel Computing Clusters

PhD Thesis Proposal Defence


Title: "Towards Optimal Delay and Throughput in Data-Parallel Computing 
Clusters"

by

Mr. Jingjie JIANG


Abstract:

Data-parallel frameworks are designed to support the processing of large 
volumes of data with distributed algorithms, such as machine learning and 
graph processing, in computing clusters. Due to the distributed nature of 
data-parallel jobs, computation and network resources are both critical 
factors in improving individual job performance and overall 
system throughput. Therefore, there is a pressing need to coordinate the 
allocation of network bandwidth and the scheduling of computation tasks.

This thesis handles the allocation of both network and computation 
resources through delay-aware bandwidth allocation schemes and 
network-aware task scheduling frameworks. Specifically, we make the 
following three contributions.

First, we design Tailor, a dynamic monitoring and routing system to reduce 
network transfer times between successive computation stages of a job 
(captured as coflow completion time). Tailor is transparent to 
data-parallel applications and requires only minimal modifications to 
end hosts.

Second, for clusters where only edge networks experience severe and 
persistent congestion, we show that achieving work conservation is 
insufficient to maximize the utilization of access links. We then 
propose a hierarchical bandwidth allocation framework, Adia, that 
maximizes link utilization without sacrificing coflow performance.

Finally, we propose to incorporate network awareness into task 
scheduling, since no scheme that aims at reducing network transfer times 
can eliminate the network bottleneck. By introducing a new queueing 
model that decouples the usage of network and computation resources, we 
propose a network-aware scheduling algorithm, Adrestia, which we prove to 
be throughput-optimal. To enable the incremental deployment of 
network-aware scheduling, we further propose a simple yet effective 
scheduling framework, Symbiosis, that reinforces and complements the 
schedulers of existing data-parallel frameworks, and successfully improves 
both delay and throughput performance.


Date:                   Tuesday, 10 January 2017

Time:                   4:00pm - 6:00pm

Venue:                  Room 3494
                        (lifts 25/26)

Committee Members:      Prof. Bo Li (Supervisor)
                        Prof. Lei Chen (Chairperson)
                        Dr. Kai Chen
                        Dr. Yangqiu Song


**** ALL are Welcome ****