Congestion Control in Data Centers using AI: A Survey
PhD Qualifying Examination

Title: "Congestion Control in Data Centers using AI: A Survey"

by

Mr. Waheed Gameel Gadallah GHALI

Abstract:

Today's cloud computing provides various web services to clients over the internet. It relies on many data center networks (DCNs), each consisting of tens of thousands of servers interconnected by high-speed communication networks. However, the unique characteristics of DCNs, such as high bandwidth and low latency, pose serious challenges for traditional congestion control mechanisms. Issues such as buffer bloat, incast, and the partition/aggregate traffic pattern often lead to congestion and degraded performance for delay-sensitive applications such as web search, real-time data processing, and distributed computing frameworks. More sophisticated solutions are required because traditional methods struggle to adapt to the dynamic and unpredictable nature of DCN traffic.

Recently, Artificial Intelligence (AI) has proved to be a potent tool for process control in many areas. In particular, AI-driven congestion control mechanisms for DCNs have emerged that employ reinforcement learning, predictive analytics, and real-time adaptation to overcome the shortcomings of conventional methods. Such mechanisms can leverage network traffic patterns to dynamically adjust parameters, enhance resource allocation, and proactively identify congestion.

In this survey, we explore congestion control approaches and the related DCN architectural designs. First, we categorize DCN congestion control algorithms into incast, hardware, and software-based methods. Subsequently, we explore AI-powered congestion control systems and divide them into four primary groups: reinforcement learning-based algorithms, fine-tuned and adaptable algorithms, offline-trained algorithms, and online-trained algorithms. We analyze the effectiveness, challenges, and potential improvements of these algorithms, shedding light on key research directions for future AI-driven congestion control in DCN environments. By bridging the gap between networking and AI advancements, this survey aims to guide the development of next-generation congestion control algorithms that enhance performance, adaptability, and scalability in DCNs.

Date: Tuesday, 8 April 2025
Time: 2:00pm - 4:00pm
Venue: Room 4472, Lifts 25/26

Committee Members:
Dr. Brahim Bensaou (Supervisor)
Prof. Gary Chan (Chairperson)
Prof. Kai Chen