Towards High Speed Data Center Network: Challenges and Solutions
PhD Thesis Proposal Defence

Title: "Towards High Speed Data Center Network: Challenges and Solutions"

by

Mr. Shuihai HU

Abstract:

In recent years, the link speed of data center networks (DCNs) has increased significantly, from 1Gbps to 10Gbps to 40/100Gbps, with 200Gbps on the horizon. In the era of high-speed DCNs, it is increasingly clear that traditional kernel-based network transports can no longer meet the requirements of modern data center applications, mainly for two reasons. First, traditional network transports adopt reactive congestion control algorithms, which are too slow and inefficient at high speed. Second, kernel-based transports incur very high CPU overhead at high speed and thus can hardly deliver low latency and high throughput to applications and services at low cost. Recognizing these drawbacks, researchers have made great efforts in recent years. However, existing solutions either fail to achieve desirable performance or are difficult to deploy in production environments.

Regarding congestion control for high-speed DCNs, proactive congestion control solutions have recently drawn great attention in the research community. By explicitly scheduling packet transfers based on the availability of network bandwidth, proactive solutions offer a lossless, near-zero-queueing network for data transmission. Despite these advantages, a major drawback of proactive solutions is that an extra RTT is needed to allocate rates for newly arriving flows. To avoid this delay, existing solutions let new flows blindly transmit unscheduled packets in the first RTT. These unscheduled packets, however, can cause severe congestion under heavy workloads, resulting in large queue buildups and even loss of scheduled packets, which undermines the desirable properties of proactive solutions.
Regarding desirable network performance at low CPU overhead, public cloud providers such as Microsoft and Google are deploying remote direct memory access (RDMA) over Ethernet (RoCE) in their data centers to enable low-latency, high-throughput data transfers with minimal CPU overhead. RoCE deployments, however, are vulnerable to deadlocks induced by Priority Flow Control (PFC). Once a deadlock forms, the throughput of the whole network, or part of it, drops to zero due to the backpressure effect of PFC pause frames.

This dissertation describes my research efforts to address the above two challenges. First, we present Aeolus, a simple yet effective solution that augments all existing proactive solutions. With Aeolus, two seemingly contradictory goals are achieved simultaneously: eliminating the additional one-RTT delay while still preserving all the good properties of proactive solutions. Second, we propose Tagger, a practical deadlock prevention scheme for RDMA DCNs. By carrying tags in packets and installing pre-generated match-action rules in the switches for tag manipulation and buffer management, Tagger guarantees deadlock freedom using only modest buffers, without any changes to the routing protocol or switch hardware.

Date: Monday, 3 December 2018

Time: 11:00am - 1:00pm

Venue: Room 3494
(lifts 25/26)

Committee Members: Dr. Kai Chen (Supervisor)
Dr. Yangqiu Song (Chairperson)
Prof. Lei Chen
Dr. Wei Wang

**** ALL are Welcome ****