The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Towards Efficient Resource Management for Cloud-Native Applications"
By
Mr. Suyi LI
Abstract:
Cloud-native applications are designed to take advantage of the scalability,
flexibility, and resilience of cloud computing platforms. Unlike traditional
applications built for fixed, on-premises infrastructure, cloud-native
applications are engineered to accommodate the distributed, virtualized, and
dynamically provisioned nature of the cloud. However, these applications face
challenges due to unpredictable workloads and traffic patterns, as well as
their diverse natures and varying resource requirements. Judicious resource
management strategies are critical to optimally schedule compute, network,
and storage resources, maximizing application performance and cost efficiency
while ensuring service level objectives (SLOs).
This dissertation presents tailored resource management strategies for three
cloud-native application scenarios: long-running applications (LRAs),
serverless computing, and generative AI inference. For LRAs, which run
numerous long-lived container replicas to provide real-time services, optimal
container placement is crucial for attaining the best application
performance, as containers exhibit complex performance interference, e.g.,
resource contention and I/O dependencies. We propose a novel reinforcement
learning-based scheduler that captures container interference, complies with
strict operational constraints, and efficiently produces high-quality
placement decisions.
In serverless computing, also known as Function-as-a-Service (FaaS),
applications are composed of loosely-coupled cloud functions with
user-specified resource configurations. Despite the simplified development
and deployment offered by FaaS platforms, recent studies reveal considerable
resource underutilization due to users’ overclaimed resource configurations
and the workload pattern of serverless computing. To improve resource
utilization while meeting applications’ latency requirements, we present
Golgi, a new FaaS scheduling system that judiciously overcommits functions
and performs automatic vertical scaling for function instances in accordance
with applications’ SLOs.
Finally, we focus on text-to-image generation, a representative generative AI
inference workload in the cloud. Production text-to-image services typically
employ a workflow consisting of a base diffusion model augmented with various
adapters, such as ControlNets and LoRAs, to better control output image
details. However, adapters can significantly increase serving latency due to
their loading and computation overhead, which escalates as more adapters are
used. We characterize text-to-image services in production and propose KATZ,
a system that efficiently serves text-to-image workflows by orchestrating
computation and network resources through optimized serving pipelines.
Date: Wednesday, 17 September 2025
Time: 10:00am - 12:00noon
Venue: Room 5501
Lifts 25/26
Chairman: Prof. Shuhuai YAO (MAE)
Committee Members: Dr. Wei WANG (Supervisor)
Prof. Song GUO
Dr. Shuai WANG
Prof. Jun ZHANG (ECE)
Dr. Cong WANG (CityU)