PhD Thesis Proposal Defence
Title: "Towards High Speed Data Center Network: Challenges and Solutions"
by
Mr. Shuihai HU
Abstract:
In recent years, the link speed of data center networks (DCNs) has increased
significantly, from 1Gbps to 10Gbps and then to 40/100Gbps, with 200Gbps on the
horizon. In the era of high speed DCNs, it is increasingly clear that
traditional kernel-based network transports can no longer meet the requirements
of modern data center applications, for two main reasons. First, traditional
network transports adopt reactive congestion control algorithms, which adjust
sending rates only after congestion signals appear and are therefore too slow
and inefficient at high speed. Second, kernel-based transports incur very high
CPU overhead at high speed and thus can hardly deliver low latency and high
throughput to applications and services at low cost. Recognizing these
drawbacks, researchers have devoted great effort to new transports in recent
years. However, existing solutions either fail to achieve the desired
performance or are difficult to deploy in production environments.
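To make "at high speed" concrete, the back-of-the-envelope sketch below computes two numbers per link speed: how long one packet occupies the wire (roughly the per-packet processing budget if a single core must keep the link saturated) and the bandwidth-delay product (BDP), i.e. how much data is already in flight before any congestion feedback can return to a reactive sender. The 1500-byte packet size and the 10 us RTT are illustrative assumptions, not figures from the abstract.

```python
# Back-of-the-envelope numbers for the DCN link speeds mentioned above.
# Illustrative assumptions (not from the source): 1500-byte packets, 10 us RTT.
MTU_BYTES = 1500
BASE_RTT_US = 10.0


def per_packet_ns(link_gbps: float, pkt_bytes: int = MTU_BYTES) -> float:
    """Time to serialize one packet on the link, in nanoseconds.

    This is also the processing budget per packet if a single core must
    keep the link saturated."""
    return pkt_bytes * 8 / link_gbps  # bits / (Gbit/s) == nanoseconds


def bdp_bytes(link_gbps: float, rtt_us: float = BASE_RTT_US) -> float:
    """Bandwidth-delay product: bytes sent in one RTT at line rate, i.e. data
    already in flight before any congestion feedback comes back."""
    return link_gbps * 1e3 * rtt_us / 8  # (Gbit/s) * us == 1e3 bits


for gbps in (1, 10, 40, 100, 200):
    print(f"{gbps:>3} Gbps: {per_packet_ns(gbps):>8.1f} ns per packet, "
          f"BDP {bdp_bytes(gbps) / 1000:>6.2f} KB")
```

Under these assumptions, a 1500-byte packet occupies a 100Gbps link for only 120 ns, and a single 10 us RTT already carries 125 KB, which is why both per-packet kernel processing and feedback-driven rate adjustment struggle as link speed grows.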
Regarding congestion control for high speed DCNs, proactive congestion control
solutions have recently drawn great attention in the research community. By
explicitly scheduling packet transfers based on the availability of network
bandwidth, proactive solutions offer a lossless, near-zero-queueing network for
data transmission. Despite these advantages, a major drawback of proactive
solutions is that an extra RTT is needed to allocate rates to newly arriving
flows. To avoid this delay, existing solutions let new flows blindly transmit
unscheduled packets in the first RTT. Under heavy workloads, however, these
unscheduled packets can cause severe congestion, resulting in large queue
buildups and even the loss of scheduled packets, which undermines the very
properties that make proactive solutions attractive.
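The first-RTT problem can be illustrated with a toy fluid model of one bottleneck link. It assumes, for illustration only, that the proactive scheduler has already allocated the full link to existing flows, that each new flow blindly sends exactly one BDP of unscheduled data at line rate, and that all traffic shares one drop-tail FIFO. The function name, the 125 KB BDP and the 200 KB buffer are hypothetical choices, not values from the thesis.

```python
from dataclasses import dataclass


@dataclass
class FirstRtt:
    arrivals: int  # bytes offered to the bottleneck during the first RTT
    queue: int     # queue left at the end of the RTT with an unlimited buffer
    dropped: int   # bytes a drop-tail buffer of the given size cannot hold


def first_rtt_burst(n_new_flows: int, bdp: int, scheduled: int,
                    buffer_bytes: int) -> FirstRtt:
    """Toy fluid model of a single bottleneck link during the first RTT.

    Assumptions (illustrative): `scheduled` bytes were already admitted by
    the proactive scheduler (at most one BDP), each new flow blindly sends
    one BDP of unscheduled data at line rate, and the link drains one BDP
    per RTT.
    """
    arrivals = scheduled + n_new_flows * bdp
    queue = max(0, arrivals - bdp)           # what the link could not drain
    dropped = max(0, queue - buffer_bytes)   # overflow of a finite buffer
    return FirstRtt(arrivals, queue, dropped)


BDP = 125_000      # assumed: 100 Gbps x 10 us
BUFFER = 200_000   # hypothetical per-port buffer share

for n in (0, 1, 4, 16):
    print(n, first_rtt_burst(n, BDP, scheduled=BDP, buffer_bytes=BUFFER))
```

With no new flows, the scheduled traffic exactly fills the link and the queue stays empty. With four new flows, the queue would reach 500 KB and 300 KB would overflow the hypothetical buffer. A single drop-tail FIFO cannot tell scheduled packets from unscheduled ones, so some of those drops can hit scheduled packets, which is the failure mode described above.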
Regarding providing desirable network performance at low CPU overhead, public
cloud providers such as Microsoft and Google are deploying remote direct memory
access (RDMA) over Converged Ethernet (RoCE) in their data centers to enable
low latency, high throughput data transfers with minimal CPU overhead. RoCE
deployments, however, are vulnerable to deadlocks induced by Priority Flow
Control (PFC). Once a deadlock forms, the throughput of the whole network, or
of part of it, drops to zero because PFC pause frames propagate backpressure
upstream. This dissertation describes my research efforts to address these two
challenges.
First, we present Aeolus, a simple yet effective solution that can augment all
existing proactive solutions. With Aeolus, two seemingly contradictory goals
are achieved simultaneously: eliminating the additional one-RTT delay while
still preserving all the good properties of proactive solutions. Second, we
propose Tagger, a practical deadlock prevention scheme for RDMA DCNs. By
carrying tags in packets and installing pre-generated match-action rules in
switches for tag manipulation and buffer management, Tagger guarantees
deadlock freedom using only modest buffers, without any changes to the routing
protocol or switch hardware.
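PFC deadlock is commonly analyzed with a buffer dependency graph: an edge u -> v means a packet held in buffer u can only move on once buffer v has room, and a cycle in this graph (a cyclic buffer dependency) is a necessary condition for deadlock. The abstract only says that Tagger uses packet tags and pre-generated match-action rules. The sketch below assumes, purely for illustration, one plausible tagging rule: the tag is incremented whenever a packet "bounces" (arrives going up after having arrived going down) in a Clos-like fabric, each tag value selects its own lossless queue, and packets beyond a hypothetical `max_tag` fall back to a lossy queue that never sends PFC pauses. The topology (ToRs `T1`..`T4`, leaves `L1`, `L2`), the rule format and every function name are hypothetical, not Tagger's actual rule set.

```python
import graphlib
from collections.abc import Hashable

UP, DOWN = "up", "down"  # direction a packet travelled to reach a switch


def build_rules(max_tag: int) -> dict[tuple[str, str, int], int]:
    """Hypothetical pre-generated match-action table:
    (previous arrival dir, current arrival dir, tag) -> new tag.
    A down-then-up 'bounce' increments the tag; any other hop keeps it."""
    rules = {}
    for tag in range(1, max_tag + 1):
        for prev in (UP, DOWN):
            for cur in (UP, DOWN):
                bounce = (prev, cur) == (DOWN, UP)
                rules[(prev, cur, tag)] = tag + 1 if bounce else tag
    return rules


def queues_along(path, rules, max_tag):
    """Queue used at each hop of `path` (a list of (switch, arrival dir)).

    Tag t <= max_tag maps to lossless queue (switch, dir, t); beyond max_tag
    the packet falls back to a lossy queue, shown as None (no PFC pause)."""
    out, tag, prev = [], 1, None
    for switch, cur in path:
        if prev is not None and tag <= max_tag:
            tag = rules[(prev, cur, tag)]
        out.append((switch, cur, tag) if tag <= max_tag else None)
        prev = cur
    return out


def dependency_edges(paths, tagged, max_tag=2):
    """Edges u -> v meaning 'a packet in buffer u waits for space in v'."""
    rules = build_rules(max_tag)
    edges = set()
    for path in paths:
        nodes = queues_along(path, rules, max_tag) if tagged else list(path)
        for u, v in zip(nodes, nodes[1:]):
            if u is not None and v is not None:  # lossy queues never pause
                edges.add((u, v))
    return edges


def has_cycle(edges: set[tuple[Hashable, Hashable]]) -> bool:
    """True if the buffer dependency graph contains a cycle."""
    preds: dict[Hashable, set[Hashable]] = {}
    for u, v in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    try:
        tuple(graphlib.TopologicalSorter(preds).static_order())
        return False
    except graphlib.CycleError:
        return True


# Two flows whose rerouted (bounced) paths share leaf and ToR buffers.
FLOW_A = [("T1", UP), ("L1", UP), ("T2", DOWN), ("L2", UP), ("T3", DOWN)]
FLOW_B = [("T4", UP), ("L2", UP), ("T3", DOWN), ("L1", UP), ("T2", DOWN)]

print("untagged cycle:", has_cycle(dependency_edges([FLOW_A, FLOW_B], tagged=False)))
print("tagged cycle:  ", has_cycle(dependency_edges([FLOW_A, FLOW_B], tagged=True)))
```

Without tags, the two bounced paths close the loop (L1, up) -> (T2, down) -> (L2, up) -> (T3, down) -> (L1, up). With tags, each bounced hop lands in a tag-2 queue, so the loop is broken. Under this assumed rule, tags never decrease along a path and, within one tag, a path only goes up and then down, which is the intuition behind using a small number of tagged queues instead of changing routing.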
Date: Monday, 3 December 2018
Time: 11:00am - 1:00pm
Venue: Room 3494
(lifts 25/26)
Committee Members: Dr. Kai Chen (Supervisor)
Dr. Yangqiu Song (Chairperson)
Prof. Lei Chen
Dr. Wei Wang
**** ALL are Welcome ****