Towards Efficient Resource Management for Cloud-Native Applications
PhD Thesis Proposal Defence

Title: "Towards Efficient Resource Management for Cloud-Native Applications"

by

Mr. Suyi LI

Abstract:

Cloud-native applications are designed to take advantage of the scalability, flexibility, and resilience of cloud computing platforms. Unlike traditional applications built for fixed, on-premises infrastructure, cloud-native applications are engineered to accommodate the distributed, virtualized, and dynamically provisioned nature of the cloud. However, these applications face challenges from unpredictable workloads and traffic patterns, as well as from their diverse characteristics and varying resource requirements. Judicious resource management strategies are therefore critical: compute, network, and storage resources must be scheduled to maximize application performance and cost efficiency while meeting service level objectives (SLOs).

This dissertation presents tailored resource management strategies for three cloud-native application scenarios: long-running applications (LRAs), serverless computing, and generative AI inference.

For LRAs, which run numerous long-lived container replicas to provide real-time services, optimal container placement is crucial to application performance because containers exhibit complex performance interference, e.g., resource contention and I/O dependencies. We propose a novel reinforcement learning-based scheduler that captures container interference, complies with strict operational constraints, and efficiently produces high-quality placement decisions.

In serverless computing, also known as Function-as-a-Service (FaaS), applications are composed of loosely coupled cloud functions with user-specified resource configurations. Although FaaS platforms simplify development and deployment, recent studies reveal considerable resource underutilization caused by users' over-claimed resource configurations and the workload patterns of serverless computing. To improve resource utilization while meeting applications' latency requirements, we present Golgi, a new FaaS scheduling system that judiciously overcommits functions and automatically scales function instances vertically in accordance with applications' SLOs.

Finally, we focus on text-to-image generation, a representative generative AI inference workload in the cloud. Production text-to-image services typically employ a workflow consisting of a base diffusion model augmented with various adapters, such as ControlNets and LoRAs, to better control output image details. However, adapters can significantly increase serving latency due to their loading and computation overhead, which escalates as more adapters are used. We characterize production text-to-image services and propose Katz, a system that efficiently serves text-to-image workflows by orchestrating computation and network resources through efficient serving pipelines.

Date: Tuesday, 29 April 2025
Time: 2:30pm - 4:30pm
Venue: Room 1104, Lift 19

Committee Members:
Dr. Wei Wang (Supervisor)
Prof. Bo Li (Chairperson)
Prof. Song Guo
Prof. Qian Zhang