Towards Optimal Delay and Throughput in Data-Parallel Computing Clusters
PhD Thesis Proposal Defence

Title: "Towards Optimal Delay and Throughput in Data-Parallel Computing Clusters"

by

Mr. Jingjie JIANG

Abstract:

Data-parallel frameworks are designed to support the processing of large volumes of data with distributed algorithms, such as machine learning and graph processing, in computing clusters. Because data-parallel jobs are distributed by nature, computation and network resources are both critical to improving individual job performance and overall system throughput. There is therefore a pressing need to coordinate the allocation of network bandwidth with the scheduling of computation tasks.

This thesis handles the allocation of both network and computation resources through delay-aware bandwidth allocation schemes and network-aware task scheduling frameworks. Specifically, we make the following three contributions.

First, we design Tailor, a dynamic monitoring and routing system that reduces network transfer times between successive computation stages of a job (captured as coflow completion time). Tailor is transparent to data-parallel applications and requires minimal modifications to end hosts.

Second, for clusters where only edge networks experience severe and persistent congestion, we show that achieving work conservation is insufficient to maximize the utilization of access links. We then propose a hierarchical bandwidth allocation framework, Adia, that maximizes link utilization without sacrificing coflow performance.

Finally, we propose incorporating network awareness into task scheduling, since no scheme that aims only at reducing network transfer times can eliminate the network bottleneck. By introducing a new queueing model that decouples the usage of network and computation resources, we propose a network-aware scheduling algorithm, Adrestia, which is proven to be throughput-optimal. To enable the incremental deployment of network-aware scheduling, we further propose a simple yet effective scheduling framework, Symbiosis, that reinforces and complements the schedulers of existing data-parallel frameworks and improves both delay and throughput performance.

Date: Tuesday, 10 January 2017
Time: 4:00pm - 6:00pm
Venue: Room 3494 (lifts 25/26)

Committee Members:
Prof. Bo Li (Supervisor)
Prof. Lei Chen (Chairperson)
Dr. Kai Chen
Dr. Yangqiu Song

**** ALL are Welcome ****