OPTIMIZE RESOURCE SCHEDULING IN MULTI-TENANT CLUSTERS AT SCALE

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "OPTIMIZE RESOURCE SCHEDULING IN MULTI-TENANT CLUSTERS AT SCALE"

By

Mr. Qizhen WENG


Abstract

To meet the soaring demand for computation over the past decade, tech
companies have amassed large numbers of commodity machines to serve
requests from massive numbers of users. Such large-scale multi-tenant
clusters, with optimized resource scheduling, have the potential to be
highly efficient. However, it is challenging to achieve high performance
and low cost in practice. Given heterogeneous hardware and diverse
workloads, many schedulers either leave resources underutilized, which
increases cost, or cause heavy workload contention, which degrades
performance.

In this dissertation, starting with a characterization study of a
production cluster, we present the challenges posed to resource
scheduling, such as low resource utilization, the presence of
hard-to-schedule tasks demanding high-end GPUs, imbalanced load across
machines, and severe contention for CPU resources. Two major approaches
address these issues: packing and balancing. Bin-packing consolidates
workloads on fewer servers, accommodating demanding tasks and improving 
resource utilization. Load-balancing scatters tasks over the cluster, 
mitigating contention and boosting workload performance.
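
The contrast between the two approaches can be illustrated with a minimal
sketch. It is not the scheduler studied in the dissertation: it assumes a
single resource dimension (e.g. CPU cores), and the names `place` and
`schedule` and the policy labels "pack" and "balance" are hypothetical.
Packing is approximated by best-fit (choose the fitting machine with the
least free capacity) and balancing by worst-fit (choose the one with the
most).

```python
from typing import List, Optional, Tuple


def place(free: List[int], demand: int, policy: str) -> Optional[int]:
    """Pick a machine index for one task, or None if no machine fits.

    policy "pack":    best-fit, the fitting machine with the least free capacity.
    policy "balance": worst-fit, the fitting machine with the most free capacity.
    """
    candidates = [i for i, f in enumerate(free) if f >= demand]
    if not candidates:
        return None
    if policy == "pack":
        return min(candidates, key=lambda i: free[i])
    if policy == "balance":
        return max(candidates, key=lambda i: free[i])
    raise ValueError(f"unknown policy: {policy}")


def schedule(capacities: List[int], demands: List[int],
             policy: str) -> Tuple[List[Optional[int]], List[int]]:
    """Place tasks one by one; return (placement per task, remaining free capacity)."""
    free = list(capacities)
    placement: List[Optional[int]] = []
    for d in demands:
        i = place(free, d, policy)
        placement.append(i)
        if i is not None:
            free[i] -= d
    return placement, free


if __name__ == "__main__":
    # Three 8-core machines, four 2-core tasks.
    print(schedule([8, 8, 8], [2, 2, 2, 2], "pack"))
    print(schedule([8, 8, 8], [2, 2, 2, 2], "balance"))
```

In this toy run, packing fills one machine and leaves two untouched for
demanding tasks, while balancing spreads the tasks so that no machine
carries more than two of them.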

Following the packing method towards higher utilization, we find resource 
fragmentation to be a major obstacle, especially in GPU-sharing clusters 
where conventional bin packing is not viable, because scheduling
GPU-sharing tasks that request a partial GPU cannot be modeled as a
classic bin-packing problem, due to the discrete and interchangeable
nature of GPU resources. Therefore, we take a new approach towards high 
utilization by minimizing fragmentation. We quantify the degree of GPU 
fragmentation statistically, and then use this metric to guide scheduling. 
We propose a novel scheduling heuristic called Fragmentation Gradient 
Descent (FGD), which consistently outperforms a variety of packing-based
schedulers and puts hundreds of additional GPUs to use in large-scale
cluster emulations driven by production traces.
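
The idea of letting a statistical fragmentation measure guide placement can
be sketched as follows. This is a simplified illustration under stated
assumptions, not the FGD definition or implementation from the
dissertation: each task is assumed to request a fraction of a single GPU,
the workload is assumed to be summarized as a mapping from demand to
probability, and a GPU's free share counts as a fragment for a given demand
when it is too small to serve it. The names `fragmentation` and
`place_min_fragmentation` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Node = List[float]  # free fraction of each GPU on one node, each in [0.0, 1.0]


def fragmentation(node: Node, workload: Dict[float, float]) -> float:
    """Expected free GPU capacity on a node that a randomly drawn task cannot use.

    workload maps a per-task GPU demand to its probability (assumed input).
    """
    total = 0.0
    for demand, prob in workload.items():
        # Free capacity on GPUs too small to host this demand is unusable for it.
        unusable = sum(free for free in node if free < demand)
        total += prob * unusable
    return total


def place_min_fragmentation(nodes: List[Node], demand: float,
                            workload: Dict[float, float]) -> Optional[Tuple[int, int]]:
    """Assign the task to the (node, GPU) with the smallest fragmentation increase.

    Updates nodes in place and returns the chosen (node index, GPU index),
    or None if no GPU has enough free capacity.
    """
    best: Optional[Tuple[int, int]] = None
    best_delta = 0.0
    for n, node in enumerate(nodes):
        before = fragmentation(node, workload)
        for g, free in enumerate(node):
            if free < demand:
                continue
            trial = node.copy()
            trial[g] = free - demand
            delta = fragmentation(trial, workload) - before
            if best is None or delta < best_delta:
                best, best_delta = (n, g), delta
    if best is not None:
        n, g = best
        nodes[n][g] -= demand
    return best


if __name__ == "__main__":
    # Half of the tasks want half a GPU, half want a whole GPU (assumed mix).
    workload = {0.5: 0.5, 1.0: 0.5}
    nodes = [[1.0, 0.5], [1.0, 1.0]]
    print(place_min_fragmentation(nodes, 0.5, workload), nodes)
```

In this example the half-GPU task is steered to the already half-used GPU,
which keeps the three fully free GPUs available for whole-GPU tasks.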

Following the balancing method towards better performance, we study the 
placement of long-running application (LRA) containers. LRAs, with 
stringent performance requirements, are difficult to schedule due to their
complex resource interference and I/O dependencies. Existing
schedulers, avoiding contention by minimizing the violations of placement 
constraints, fall short in performance, as manually expressed constraints 
only provide qualitative scheduling guidelines. Hence, we design Metis, a 
data-driven scheduling system that learns to optimally place LRA 
containers using deep reinforcement learning. Metis eliminates the complex 
manual specification of placement constraints and offers concrete 
quantitative scheduling criteria. Enhanced by hierarchical learning 
techniques, Metis scales to large clusters and substantially increases the 
throughput of workloads in real deployments on the public cloud.


Date:			Friday, 25 November 2022

Time:			4:00pm - 6:00pm

Venue:			Room 5501
 			lifts 25/26

Chairperson:		Prof. Can YANG (MATH)

Committee Members:	Prof. Wei WANG (Supervisor)
 			Prof. Shuai WANG
 			Prof. Qian ZHANG
 			Prof. Jun ZHANG (ECE)
 			Prof. Song GUO (PolyU)


**** ALL are Welcome ****