A Survey on Communication Optimization for LLM Serving
PhD Qualifying Examination

Title: "A Survey on Communication Optimization for LLM Serving"

by

Mr. Yijun SUN

Abstract:

The rise of Large Language Models (LLMs) has catalyzed a new era for generative AI applications. To efficiently serve increasingly large models, modern serving clusters widely employ model parallelism, Key-Value (KV) cache reuse, and Prefill-Decode (P/D) disaggregation. However, these techniques introduce high communication overhead, which has become a primary performance bottleneck that severely affects both latency and throughput. This survey provides a comprehensive overview of communication optimization techniques for LLM serving. We first outline the fundamentals of LLM inference and the serving paradigms that give rise to communication overhead. We then systematically classify a wide range of optimization strategies into two primary approaches: lossy optimizations, which reduce data volume at the cost of model quality, and lossless optimizations, which improve communication efficiency without compromising generation quality. Through a comprehensive synthesis of the objectives, methodologies, and inherent trade-offs of existing approaches, this survey offers valuable insights into building quality-preserving and communication-efficient serving systems.

Date: Tuesday, 29 July 2025
Time: 3:00pm - 5:00pm
Venue: Room 3494 (Lifts 25/26)

Committee Members:
Prof. Kai Chen (Supervisor)
Prof. Gary Chan (Chairperson)
Dr. Xiaomin Ouyang