The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "TOWARDS USABLE, EFFICIENT SERVERLESS COMPUTING SYSTEMS"

By

Mr. Minchen YU

Abstract:

Serverless computing has gained much popularity as a new cloud computing paradigm due to its high scalability and fine-grained, pay-per-use billing model. In contrast to traditional serverful cloud offerings such as VMs, serverless computing provides high-level abstractions in the form of stateless functions, which makes it easier for users to develop and run cloud applications. However, these advances also present new challenges, such as application state management and restricted function communication, which can significantly limit the usability and applicability of serverless clouds. This dissertation aims to make serverless computing more usable and efficient, and we explore two scenarios: general-purpose and application-specific serverless systems.

We first discuss general-purpose serverless platforms, which are designed to provide a function abstraction and support diverse applications. Current serverless platforms typically deploy an application as a function workflow, and orchestrate and trigger its functions by following invocation dependencies. However, this design is oblivious to the underlying data exchanges between functions, making it inefficient and difficult to orchestrate complex applications. We therefore propose a novel data-centric approach to function orchestration, which easily and effectively supports complex workflow patterns by making data consumption explicit and allowing it to trigger functions. Following this data-centric design, we present Pheromone, an efficient serverless platform that enables low-latency function interactions and data exchanges and makes it easy to orchestrate many applications.

We next focus on enabling efficient ML model inference on serverless computing, i.e., application-specific serverless systems.
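The data-centric idea above can be illustrated with a small sketch: instead of wiring functions by invocation dependencies, consumers register triggers on named data, and producing that data fires the downstream function. This is a minimal toy model, not Pheromone's actual API; the names `DataBucket`, `trigger_on`, and `put` are hypothetical.

```python
# Toy sketch of data-centric function triggering (hypothetical API,
# illustrating the idea only; not Pheromone's real interface).

class DataBucket:
    """A named data store: writing objects may trigger consumer functions."""
    def __init__(self, name):
        self.name = name
        self.triggers = []   # (predicate over stored objects, consumer fn)
        self.objects = {}

    def trigger_on(self, predicate, consumer):
        # Register a consumer to fire when the data it needs is present.
        self.triggers.append((predicate, consumer))

    def put(self, key, value):
        self.objects[key] = value
        # Data consumption is explicit: fire consumers whose inputs exist.
        for predicate, consumer in self.triggers:
            if predicate(self.objects):
                consumer(self.objects)

results = []
bucket = DataBucket("partial-sums")

# Fan-in pattern: run the aggregator only once both mapper outputs arrive.
bucket.trigger_on(lambda objs: {"m1", "m2"} <= objs.keys(),
                  lambda objs: results.append(objs["m1"] + objs["m2"]))

bucket.put("m1", 3)   # fan-in condition not yet met; nothing fires
bucket.put("m2", 4)   # both inputs present -> aggregator runs
```

The point of the sketch is the inversion: the workflow engine never needs an explicit "invoke aggregator after m1 and m2" edge, because the availability of the data itself drives the orchestration.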
Serverless computing is well-suited to model inference, as it supports fast autoscaling to handle dynamic, bursty inference requests at low cost. However, current serverless functions are limited in CPU and memory resources and do not support GPUs, which hinders efficient model inference. To address these limitations, we present two serverless systems. First, we propose Gillis, a serverless model inference system that tackles the resource limitations of individual functions. Gillis automatically partitions and parallelizes an ML model across multiple functions, leading to faster inference and a reduced memory footprint per function. It supports large models that cannot fit within a single function and effectively meets request-level latency Service Level Objectives (SLOs) through its model partitioning algorithms. Second, we propose Torpor, a GPU-enabled serverless platform for low-latency, resource-efficient model inference. Torpor enables fine-grained GPU sharing among inference functions and supports efficient model swapping between host and GPU memory, which reduces function keep-alive cost and achieves load balancing across GPUs. With its model swapping and request scheduling algorithms, Torpor effectively meets per-function latency SLOs while achieving high GPU utilization.

Date: Thursday, 3 August 2023
Time: 10:30am - 12:30pm
Venue: Room 5501 (Lifts 25/26)

Chairman: Prof. Jun ZHANG (ECE)
Committee Members: Prof. Wei WANG (Supervisor)
Prof. Gary CHAN
Prof. Shuai WANG
Prof. Fengbin TU (ECE)
Prof. Chuan WU (HKU)

**** ALL are Welcome ****
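The partition-and-parallelize idea behind Gillis described in the abstract can be sketched in miniature: split an ordered list of model layers into contiguous groups, assign each group to its own function, and pipeline a request through them. This is a simplified illustration under stated assumptions; `partition_layers` and `run_function` are hypothetical helpers, and Gillis itself uses more sophisticated, SLO-aware partitioning algorithms.

```python
# Illustrative sketch of layer-wise model partitioning across functions
# (hypothetical helpers; a simplification of what Gillis automates).

def partition_layers(layers, num_functions):
    """Split an ordered list of layers into contiguous groups,
    one group per function, as evenly as possible."""
    k, r = divmod(len(layers), num_functions)
    groups, start = [], 0
    for i in range(num_functions):
        end = start + k + (1 if i < r else 0)
        groups.append(layers[start:end])
        start = end
    return groups

def run_function(group, x):
    """Each 'function' applies only its own group of layers,
    so its memory footprint covers just those layers."""
    for layer in group:
        x = layer(x)
    return x

# Toy model: four scalar 'layers', partitioned across two functions.
layers = [lambda x: x + 1, lambda x: x * 2,
          lambda x: x - 3, lambda x: x * x]
groups = partition_layers(layers, 2)

x = 5
for group in groups:          # invoke the functions in pipeline order
    x = run_function(group, x)
# (5+1)*2 = 12, then (12-3)**2 = 81
```

Because each function holds only its slice of the model, a model too large for any single function's memory limit can still be served, at the cost of cross-function data transfer between the groups.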