Towards Efficient and Scalable RDMA Networking for Datacenter Applications
PhD Thesis Proposal Defence

Title: "Towards Efficient and Scalable RDMA Networking for Datacenter Applications"

by

Miss Wenxue LI

Abstract:

Remote Direct Memory Access (RDMA) has become a cornerstone of high-speed networking in modern datacenters. As applications such as AI training and HPC continue to scale, datacenter networks demand both higher performance and stronger scalability. However, existing RDMA techniques face key limitations, including sluggish congestion handling, inflexible communication semantics, and poor network scalability, which collectively constrain datacenter efficiency.

This thesis addresses these challenges with three contributions. First, we propose FlowSail to enable timely congestion handling. By adopting hop-by-hop flow regulation without requiring per-flow queues, it achieves sub-RTT responsiveness while remaining practical for deployment. Second, we design Cepheus, which leverages programmable switches to extend RDMA semantics from one-to-one to one-to-many. Through in-network connection bridging and signal aggregation, it minimizes transmission hops and maximizes bandwidth utilization for one-to-many communication. Finally, we introduce DCP, which revisits RDMA reliability for lossy fabrics. By integrating lightweight switch-assisted packet trimming with redesigned RDMA NIC reliability logic, it provides fast and precise loss recovery under per-packet multipath transmission, making RDMA transmission over lossy networks scalable and efficient.

Together, these contributions advance RDMA networking by improving congestion handling, enriching communication semantics, and enabling scalable transmission over lossy fabrics, thereby strengthening the foundation for future datacenter networks.

Date: Thursday, 25 September 2025
Time: 2:00pm - 4:00pm
Venue: Room 4472 (Lifts 25/26)

Committee Members:
Prof. Kai Chen (Supervisor)
Prof. Song Guo (Chairperson)
Dr. Binhang Yuan