Towards Efficient Transports for Datacenter Networking with High Environmental Variations
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Towards Efficient Transports for Datacenter Networking with High Environmental Variations"

By

Mr. Junxue ZHANG

Abstract

In real-world datacenter networking, high environmental variations exist. For instance, the base RTT, which is assumed to be stable, can vary by up to 2.68X due to the varying processing delay caused by network components such as the networking stack, middleboxes, hypervisors, etc. Furthermore, besides the RTT variations, there are other environmental variations in datacenters, e.g., traffic pattern, topology, failure, etc., posing challenges for transport design in datacenter networks.

At the algorithm level, the high environmental variations make it difficult for heuristic ECN-based transports to deliver optimal performance. One concrete example is that RTT variations make it difficult for datacenter operators to derive a proper ECN marking threshold that simultaneously delivers high throughput, low latency and good burst tolerance. Furthermore, we find that adaptive neural network (NN) driven transports can learn and adapt to the varying environment, which shows their potential to be successful in datacenter networking with high environmental variations. However, current NN-based transports fail to deliver optimal performance at the deployment level, leading to either performance loss or large overhead.

This thesis describes our research efforts in designing efficient transports for datacenter networking with high environmental variations.

First, to solve the problem of degraded performance under high RTT variations, we propose a new heuristic ECN-based transport, ECN#. ECN# extends the current ECN marking mechanism to consider both instantaneous and persistent congestion. Our evaluations show that ECN# can effectively reduce latency without hurting throughput.
For example, compared to the current practice, ECN# achieves up to 23.4% (31.2%) lower average (99th percentile) flow completion time (FCT) for short flows while delivering similar FCT for large flows under production workloads.

Second, to make adaptive NN-based transports available for datacenter networking, we propose LiteFlow. LiteFlow is a hybrid framework to deploy high-performance adaptive NNs for kernel datapath by decoupling the control path of adaptive NNs into a kernel-space fast path for efficient model inference, and a userspace slow path for effective model tuning. We evaluate LiteFlow with two real-world NN-based CC schemes. Experiment results show that for flow goodput, LiteFlow with these NNs can outperform userspace-deployed NNs by up to 44.4% while suffering no more overhead than kernel-space CC algorithms such as BBR and CUBIC.

Date: Friday, 26 August 2022
Time: 4:00pm - 6:00pm
Zoom Meeting: https://hkust.zoom.us/j/94626581311?pwd=bnJXeWZmb1Q1L2ozTDMrTHNpTzNadz09

Chairperson: Prof. Kevin CHEN (ECE)

Committee Members:
Prof. Kai CHEN (Supervisor)
Prof. Gary CHAN
Prof. Ke YI
Prof. Xuanyu CAO (ECE)
Prof. Hong XU (CUHK)

**** ALL are Welcome ****