PhD Thesis Proposal Defence


Title: "Towards Efficient Resource Management for Cloud-Native Applications"

by

Mr. Suyi LI


Abstract:

Cloud-native applications are designed to take advantage of the scalability, 
flexibility, and resilience of cloud computing platforms. Unlike traditional 
applications built for fixed, on-premises infrastructure, cloud-native 
applications are engineered to accommodate the distributed, virtualized, and 
dynamically provisioned nature of the cloud. However, these applications face 
challenges due to unpredictable workloads and traffic patterns, as well as 
their diverse nature and varying resource requirements. Judicious resource 
management strategies are critical for optimally scheduling compute, network, 
and storage resources, maximizing application performance and cost efficiency 
while meeting service level objectives (SLOs).

This dissertation presents tailored resource management strategies for three 
cloud-native application scenarios: long-running applications (LRAs), 
serverless computing, and generative AI inference. For LRAs, which run 
numerous long-lived container replicas to provide real-time services, optimal 
container placement is crucial for attaining the best application 
performance, as co-located containers exhibit complex performance 
interference, e.g., resource contention and I/O dependencies. We propose a 
novel reinforcement learning-based scheduler that captures container 
interference, complies with strict operational constraints, and efficiently 
produces high-quality placement 
decisions.
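
To make this concrete, below is a minimal, hypothetical sketch of 
policy-guided placement, not the proposed scheduler itself: a learned scoring 
function (score, standing in for the RL policy) ranks the feasible hosts for 
each container, and a capacity check enforces the placement constraints. All 
identifiers are illustrative.

    from typing import Callable, Dict, List

    def place_containers(
        containers: List[str],
        hosts: List[str],
        load: Dict[str, int],
        capacity: Dict[str, int],
        score: Callable[[str, str, Dict[str, int]], float],
    ) -> Dict[str, str]:
        """Place each container on the feasible host with the highest learned
        score; `score` stands in for the RL policy, which would capture the
        interference among co-located containers."""
        placement: Dict[str, str] = {}
        for c in containers:
            # Constraint check: only consider hosts with spare capacity.
            feasible = [h for h in hosts if load[h] < capacity[h]]
            if not feasible:
                raise RuntimeError(f"no feasible host for container {c}")
            best = max(feasible, key=lambda h: score(c, h, load))
            placement[c] = best
            load[best] += 1  # update state so later scores see this placement
        return placement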

In serverless computing, also known as Function-as-a-Service (FaaS), 
applications are composed of loosely-coupled cloud functions with 
user-specified resource configurations. Despite the simplified development 
and deployment offered by FaaS platforms, recent studies reveal considerable 
resource underutilization due to users' overclaimed resource configurations 
and the workload pattern of serverless computing. To improve resource 
utilization while meeting applications' latency requirements, we present 
Golgi, a new FaaS scheduling system that judiciously overcommits functions 
and performs automatic vertical scaling for function instances in accordance 
with applications' SLOs.
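
As a rough illustration of SLO-aware vertical scaling, here is a simplified 
sketch, not Golgi's actual policy: a controller scales an instance's memory 
limit up when tail latency threatens the SLO and reclaims resources when 
there is ample headroom. All thresholds and step sizes are assumptions.

    def adjust_memory_limit(current_mb: int, p99_ms: float, slo_ms: float,
                            step_mb: int = 64, min_mb: int = 128,
                            max_mb: int = 4096) -> int:
        """Return the instance's memory limit for the next control window."""
        if p99_ms > slo_ms:
            # SLO violated: back off overcommitment and scale up quickly.
            return min(current_mb + 2 * step_mb, max_mb)
        if p99_ms < 0.8 * slo_ms:
            # Ample headroom: reclaim resources gradually to raise utilization.
            return max(current_mb - step_mb, min_mb)
        return current_mb  # near the SLO boundary: hold steady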

Finally, we focus on text-to-image generation, a representative generative AI 
inference workload in the cloud. Production text-to-image services typically 
employ a workflow consisting of a base diffusion model augmented with various 
adapters, such as ControlNets and LoRAs, to better control output image 
details. However, adapters can significantly increase serving latency due to 
their loading and computation overhead, which escalates as more adapters are 
used. We characterize text-to-image services in production and propose Katz, 
a system that efficiently serves text-to-image workflows by orchestrating 
computation and network resources with efficient serving pipelines.
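
One way a serving pipeline can hide adapter overhead, shown here as an 
illustrative sketch under assumed stage interfaces rather than Katz's actual 
design, is to prefetch adapter weights in the background while an 
adapter-independent stage of the workflow, such as text encoding, runs.

    from concurrent.futures import ThreadPoolExecutor

    def serve(prompt, adapter_names, encode_text, load_adapter, denoise,
              decode):
        """Run a text-to-image workflow; encode_text, load_adapter, denoise,
        and decode are hypothetical stage functions."""
        with ThreadPoolExecutor(max_workers=max(1, len(adapter_names))) as pool:
            # Start adapter loads immediately; they proceed in the background.
            futures = [pool.submit(load_adapter, n) for n in adapter_names]
            text_emb = encode_text(prompt)       # overlaps with adapter loading
            adapters = [f.result() for f in futures]  # block only on leftovers
        latent = denoise(text_emb, adapters)     # adapters applied here
        return decode(latent)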


Date:                   Tuesday, 29 April 2025

Time:                   2:30pm - 4:30pm

Venue:                  Room 1104
                        Lift 19

Committee Members:      Dr. Wei Wang (Supervisor)
                        Prof. Bo Li (Chairperson)
                        Prof. Song Guo
                        Prof. Qian Zhang