Congestion Control in Data Centers using AI: A Survey
PhD Qualifying Examination
Title: "Congestion Control in Data Centers using AI: A Survey"
by
Mr. Waheed Gameel Gadallah GHALI
Abstract:
Today’s cloud computing provides various web services to clients over the
internet. Cloud computing requires using many data center networks (DCNs)
that consist of tens of thousands of servers interconnected by high-speed
communication networks. However, the unique characteristics of DCNs, such as
high bandwidth and low latency, may introduce serious challenges for
traditional congestion control mechanisms. Issues such as bufferbloat, incast,
and the partition/aggregate operation often lead to congestion and degrade
performance for delay-sensitive applications such as web search, real-time
data processing, and distributed computing frameworks. More sophisticated
solutions are required since traditional methods find it difficult to adjust
to the dynamic and unpredictable nature of DCN traffic.
Recently, Artificial Intelligence (AI) has proved to be a potent tool for
process control in many areas. In particular, in the domain of congestion control
in DCNs, AI-driven mechanisms have emerged that employ reinforcement
learning, predictive analytics, and real-time adaptation to overcome the
shortcomings of conventional methods. These mechanisms can leverage network
traffic patterns to dynamically adjust parameters, enhance resource
allocation, and proactively identify congestion.
In this survey, we explore congestion control approaches and the related DCN
architectural designs. First, we categorize DCN congestion control
algorithms into incast, hardware, and software-based methods. Subsequently,
we explore AI-powered congestion control systems and divide them into four
primary groups: reinforcement learning-based algorithms, fine-tuned and
adaptable algorithms, offline-trained algorithms, and online-trained
algorithms. We analyze these algorithms' effectiveness, challenges, and
potential improvements, shedding light on key research directions for future
AI-driven congestion control in DCN environments. This survey aims to guide
the development of next-generation congestion control algorithms that
enhance performance, adaptability, and scalability in DCNs by bridging the
gap between networking and AI advancements.
Date: Tuesday, 8 April 2025
Time: 2:00pm - 4:00pm
Venue: Room 4472 (Lifts 25/26)
Committee Members: Dr. Brahim Bensaou (Supervisor)
Prof. Gary Chan (Chairperson)
Prof. Kai Chen