PhD Qualifying Examination
Title: "A Survey on Communication Optimization for LLM Serving"
by
Mr. Yijun SUN
Abstract:
The rise of Large Language Models (LLMs) has catalyzed a new era for
generative AI applications. To efficiently serve increasingly large models,
modern serving clusters widely employ model parallelism, Key-Value (KV) cache
reuse, and Prefill-Decode (P/D) disaggregation. However, these techniques
introduce high communication overhead, which has become a primary performance
bottleneck severely affecting both latency and throughput.
This survey provides a comprehensive overview of communication optimization
techniques for LLM serving. We first outline the fundamentals of LLM
inference and the serving paradigms that give rise to communication overhead.
Then we systematically classify and explore a wide range of optimization
strategies, categorizing them into two primary approaches: lossy
optimizations, which reduce data volume at the cost of model quality, and
lossless optimizations, which improve communication efficiency without
compromising generation quality. Through a systematic synthesis of the
objectives, methodologies, and inherent trade-offs of existing approaches,
this survey offers valuable insights into building quality-preserving and
communication-efficient serving systems.
Date: Tuesday, 29 July 2025
Time: 3:00pm - 5:00pm
Venue: Room 3494
Lifts 25/26
Committee Members: Prof. Kai Chen (Supervisor)
Prof. Gary Chan (Chairperson)
Dr. Xiaomin Ouyang