Towards Optimal Delay and Throughput in Data-Parallel Computing Clusters
PhD Thesis Proposal Defence
Title: "Towards Optimal Delay and Throughput in Data-Parallel Computing
Clusters"
by
Mr. Jingjie JIANG
Abstract:
Data-parallel frameworks are designed to support the processing of large
volumes of data with distributed algorithms, such as machine learning and
graph processing, in computing clusters. Due to the distributed nature of
data-parallel jobs, computation and network resources are both
critical to individual job performance and overall
system throughput. Therefore, there is a pressing need to coordinate the
allocation of network bandwidth and the scheduling of computation tasks.
This thesis handles the allocation of both network and computation
resources through delay-aware bandwidth allocation schemes and
network-aware task scheduling frameworks. Specifically, we make the
following three contributions.
First, we design Tailor, a dynamic monitoring and routing system to reduce
network transfer times between successive computation stages of a job
(captured as coflow completion time). Tailor is transparent to
data-parallel applications and requires only minimal modifications to
end hosts.
Second, for clusters where only edge networks experience severe and
persistent congestion, we show that achieving work conservation is
insufficient to maximize the utilization of access links. We then
propose a hierarchical bandwidth allocation framework, Adia, that
maximizes link utilization without sacrificing coflow performance.
Finally, we propose incorporating network awareness into task
scheduling, since no scheme that aims to reduce network transfer times
can eliminate the network bottleneck. By introducing a new queueing
model that decouples the usage of network and computation resources, we
propose a network-aware scheduling algorithm, Adrestia, which we prove to
be throughput-optimal. To enable the incremental deployment of
network-aware scheduling, we further propose a simple yet effective
scheduling framework, Symbiosis, that reinforces and complements the
schedulers of existing data-parallel frameworks, and successfully improves
both delay and throughput performance.
Date: Tuesday, 10 January 2017
Time: 4:00pm - 6:00pm
Venue: Room 3494
(lifts 25/26)
Committee Members: Prof. Bo Li (Supervisor)
Prof. Lei Chen (Chairperson)
Dr. Kai Chen
Dr. Yangqiu Song
**** ALL are Welcome ****