Accommodating LLM Service over Heterogeneous Computational Resources
PhD Qualifying Examination

Title: "Accommodating LLM Service over Heterogeneous Computational Resources"

by

Mr. Ran YAN

Abstract:

Generative inference and training of large language models (LLMs) are crucial components of contemporary AI applications. Because both are computationally intensive, state-of-the-art LLM inference services and training tasks are usually hosted in centralized data centers with homogeneous, high-performance GPUs, which can be very expensive. The high cost of such deployments potentially limits the application and advancement of this technology. In this survey, we explore an alternative approach: deploying inference and training tasks across heterogeneous GPUs to enable greater flexibility and efficiency in resource utilization. However, given the heterogeneity of GPU hardware specifications and the large space of potential parallel strategies, effectively accommodating LLM service over heterogeneous resources is extremely challenging. Even without the heterogeneity factors (i.e., in homogeneous settings), identifying an efficient parallel configuration requires significant effort. To outline future research directions, we first review the state-of-the-art work on scheduling LLM inference and training, and then analyze potential research avenues.

Date: Monday, 17 February 2025
Time: 10:00am - 12:00noon
Venue: Room 2408 (Lifts 25/26)

Committee Members:
Dr. Binhang Yuan (Supervisor)
Dr. Dongdong She (Chairperson)
Dr. Zili Meng
Dr. Wei Wang