Congestion Control in Data Centers using AI: A Survey

PhD Qualifying Examination


Title: "Congestion Control in Data Centers using AI: A Survey"

by

Mr. Waheed Gameel Gadallah GHALI


Abstract:

Today’s cloud computing platforms deliver a wide range of web services to 
clients over the Internet. These platforms rely on data center networks 
(DCNs) consisting of tens of thousands of servers interconnected by 
high-speed communication networks. However, the unique characteristics of 
DCNs, such as high bandwidth and low latency, can pose serious challenges 
for traditional congestion control mechanisms. Issues such as bufferbloat, 
incast, and the partition/aggregate traffic pattern often lead to congestion 
and degrade the performance of delay-sensitive applications such as web 
search, real-time data processing, and distributed computing frameworks. 
Because traditional methods struggle to adapt to the dynamic and 
unpredictable nature of DCN traffic, more sophisticated solutions are 
required.

Recently, Artificial Intelligence (AI) has proven to be a potent tool for 
process control in many areas. In the domain of congestion control in DCNs 
in particular, AI-driven mechanisms have emerged that employ reinforcement 
learning, predictive analytics, and real-time adaptation to overcome the 
shortcomings of conventional methods. By learning from network traffic 
patterns, these mechanisms can dynamically adjust parameters, improve 
resource allocation, and identify congestion proactively.

In this survey, we explore congestion control approaches and the related DCN 
architectural designs. First, we categorize DCN congestion control 
algorithms into incast-focused, hardware-based, and software-based methods. 
We then examine AI-powered congestion control systems and divide them into 
four primary groups: reinforcement learning-based algorithms, fine-tuned and 
adaptable algorithms, offline-trained algorithms, and online-trained 
algorithms. We analyze these algorithms' effectiveness, challenges, and 
potential improvements, shedding light on key research directions for future 
AI-driven congestion control in DCN environments. By bridging the gap 
between networking and AI advancements, this survey aims to guide the 
development of next-generation congestion control algorithms that enhance 
performance, adaptability, and scalability in DCNs.


Date:                   Tuesday, 8 April 2025

Time:                   2:00pm - 4:00pm

Venue:                  Room 4472
                        Lifts 25/26

Committee Members:      Dr. Brahim Bensaou (Supervisor)
                        Prof. Gary Chan (Chairperson)
                        Prof. Kai Chen