PhD Qualifying Examination
Title: "Accommodating LLM Service over Heterogeneous Computational Resources"
by
Mr. Ran YAN
Abstract:
Generative inference and training of large language models (LLMs) are crucial
components of contemporary AI applications. Because of their intensive
computational demands, state-of-the-art LLM inference services and training
tasks are usually hosted in centralized data centers with homogeneous
high-performance GPUs, which can be very expensive. The high cost of such
deployments potentially limits the application and advancement of this
technology. In this survey, we explore an alternative approach: deploying
inference and training tasks across heterogeneous GPUs to enable greater
flexibility and efficiency in resource utilization. However, given the
heterogeneity of GPU hardware specifications and the large number of potential
parallel strategies, effectively accommodating LLM services over heterogeneous
resources is extremely challenging. Even without heterogeneity (i.e., in
homogeneous settings), identifying an efficient parallel configuration
requires significant effort. To outline future research directions, we first
review state-of-the-art work on scheduling LLM inference and training and
then analyze potential research avenues.
Date: Monday, 17 February 2025
Time: 10:00am - 12:00noon
Venue: Room 2408
Lifts 25/26
Committee Members: Dr. Binhang Yuan (Supervisor)
Dr. Dongdong She (Chairperson)
Dr. Zili Meng
Dr. Wei Wang