Towards Efficient and Scalable RDMA Networking for Datacenter Applications
PhD Thesis Proposal Defence
Title: "Towards Efficient and Scalable RDMA Networking for Datacenter
Applications"
by
Miss Wenxue LI
Abstract:
Remote Direct Memory Access (RDMA) has become a cornerstone of high-speed
networking in modern datacenters. As applications such as AI training and HPC
continue to scale, datacenter networks demand both higher performance and
stronger scalability. However, existing RDMA techniques face key limitations,
including sluggish congestion handling, inflexible communication semantics,
and poor network scalability, which collectively constrain datacenter
efficiency.
This thesis addresses these challenges with three contributions. First, we
propose FlowSail to enable timely congestion handling. By adopting hop-by-hop
flow regulation without requiring per-flow queues, it achieves sub-RTT
responsiveness while remaining practical for deployment. Second, we design
Cepheus, which leverages programmable switches to extend RDMA semantics from
one-to-one to one-to-many. Through in-network connection bridging and signal
aggregation, it minimizes transmission hops and maximizes bandwidth
utilization for one-to-many communication. Finally, we introduce DCP, which
revisits RDMA reliability for lossy fabrics. By integrating lightweight
switch-assisted packet trimming with redesigned RDMA NIC reliability logic,
it achieves fast and precise loss recovery under per-packet multipath
transmission, making RDMA both scalable and efficient over lossy networks.
Together, these contributions advance RDMA networking by improving congestion
handling, enriching communication semantics, and enabling scalable
transmission over lossy fabrics, thereby strengthening the foundation for
future datacenter networks.
Date: Thursday, 25 September 2025
Time: 2:00pm - 4:00pm
Venue: Room 4472 (Lifts 25/26)
Committee Members: Prof. Kai Chen (Supervisor)
                   Prof. Song Guo (Chairperson)
                   Dr. Binhang Yuan