Towards High Speed Data Center Network: Challenges and Solutions

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Towards High Speed Data Center Network: Challenges and Solutions"

By

Mr. Shuihai HU


Abstract

In recent years, link speeds in data center networks (DCNs) have increased 
significantly, from 1Gbps to 10Gbps and then to 40/100Gbps, with 200Gbps on the 
horizon. In the era of high-speed DCNs, it is increasingly clear that 
traditional kernel-based network transports can no longer meet the requirements 
of modern data center applications, for two main reasons. First, traditional 
network transports adopt reactive congestion control algorithms, which are too 
slow and inefficient at high speed. Second, kernel-based transports incur very 
high CPU overhead at high speed and thus can hardly deliver low latency and high 
throughput to applications and services at low cost. Recognizing these 
drawbacks, the community has made great efforts in recent years to improve 
network transports. However, existing solutions either fail to achieve 
desirable performance or are difficult to deploy in production environments.

Regarding congestion control for high-speed DCNs, proactive transport has drawn 
great attention in the community. With proactive transport, link capacities are 
proactively allocated as “credits” to each sender, which can then send 
“scheduled packets” at the right rate to ensure zero queueing and high link 
utilization. Despite this promise, proactive transport faces a fundamental 
challenge: at least one RTT is required for the credits to be computed and 
delivered. In this thesis, we reveal that this one-RTT “pre-credit” phase is 
crucial, yet none of the prior solutions handles it properly.

To deliver desirable network performance at low CPU overhead, public cloud 
providers such as Microsoft and Google are deploying remote direct memory access 
(RDMA) over Converged Ethernet (RoCE) in their data centers to enable low 
latency, high throughput data transfers with minimal CPU overhead. RoCE 
deployments, however, are vulnerable to deadlocks induced by Priority Flow 
Control (PFC). Once a deadlock forms, the throughput of part or all of the 
network drops to zero because of the backpressure effect of PFC pause frames.
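PFC deadlock is commonly analyzed with a buffer dependency graph: an edge from 
buffer A to buffer B means packets queued at A must wait for space at B, so a 
PFC pause from B can stall A. If these dependencies form a cycle, each buffer in 
the cycle can be paused by the next one and none of them drains, which is why 
throughput collapses. The following Python sketch is a minimal illustration, not 
the method used in this thesis: it finds such a cycle with a depth-first search. 
The graph encoding, the function name find_cycle, and switch/port labels such as 
"S1.p1" are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical buffer dependency graph: each key is an ingress buffer
# ("switch.port"); its value lists the buffers it must wait on to forward.
Graph = Dict[str, List[str]]


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one cycle of buffers that could deadlock under PFC, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in graph}
    stack: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        stack.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # Back edge: the buffers from nxt up to node form a cycle.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = dfs(node)
            if found:
                return found
    return None


# Three switches whose ingress buffers wait on each other in a ring.
ring = {"S1.p1": ["S2.p1"], "S2.p1": ["S3.p1"], "S3.p1": ["S1.p1"]}
print(find_cycle(ring))  # ['S1.p1', 'S2.p1', 'S3.p1', 'S1.p1']

# Break the ring: without a cycle, these buffers cannot deadlock each other.
line = {"S1.p1": ["S2.p1"], "S2.p1": ["S3.p1"], "S3.p1": []}
print(find_cycle(line))  # None
```

Deadlock prevention schemes typically work by ensuring that no such cycle can 
form among lossless queues; this search only detects a cycle in a given graph 
and says nothing about how any particular scheme avoids one.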

This thesis describes my research efforts to address the above two challenges. 
First, we present Aeolus, a solution for “pre-credit” packet transmission that 
serves as a building block for all proactive transports. Aeolus simultaneously 
achieves two seemingly contradictory goals: it eliminates the additional one-RTT 
delay of the “pre-credit” phase while preserving all the good properties of 
proactive solutions. Second, we propose Tagger, a practical deadlock prevention 
scheme for RDMA DCNs. By carrying tags in packets and installing pre-generated 
match-action rules in switches for tag manipulation and buffer management, 
Tagger guarantees deadlock freedom using only modest buffers and without any 
changes to the routing protocol or switch hardware.
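To make the cost of the one-RTT pre-credit phase concrete, the following Python 
sketch computes how many bytes a sender could have transmitted at line rate 
during a single RTT (the bandwidth-delay product). The 10 µs RTT and the 
function name bytes_in_one_rtt are illustrative assumptions, not figures from 
the thesis.

```python
def bytes_in_one_rtt(link_gbps: float, rtt_us: float) -> float:
    """Bytes a sender could put on the wire at line rate during one RTT."""
    bits = link_gbps * 1e9 * rtt_us * 1e-6  # Gbps -> bit/s, us -> s
    return bits / 8


ASSUMED_RTT_US = 10.0  # hypothetical intra-DC RTT, for illustration only

for speed in (1, 10, 40, 100, 200):
    kb = bytes_in_one_rtt(speed, ASSUMED_RTT_US) / 1e3
    # Capacity a sender leaves unused while waiting for its first credits.
    print(f"{speed:>3} Gbps: {kb:7.2f} KB could be sent during one RTT")
```

Under these assumptions the amount grows from 1.25 KB at 1Gbps to 250 KB at 
200Gbps. Any flow smaller than this could be transmitted completely at line rate 
within the RTT that a purely proactive transport spends waiting for credits, so 
the relative cost of that wait grows with link speed.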


Date:			Wednesday, 12 June 2019

Time:			2:00pm - 4:00pm

Venue:			Room 3494
 			Lifts 25/26

Chairman:		Prof. Weijia Wen (PHYS)

Committee Members:	Prof. Kai Chen (Supervisor)
 			Prof. Wei Wang
 			Prof. Ke Yi
 			Prof. Wei Zhang (ECE)
 			Prof. Chuan Wu (HKU)

**** ALL are Welcome ****