PhD Thesis Proposal Defence
Title: "Towards Efficient and Secure GPU Interconnect for AI-centric Systems"
by
Mr. Zhenghang REN
Abstract:
The rapid growth of GPU-driven AI applications demands high-speed and secure
GPU interconnects to support large-scale model training and deployment in
parallel and distributed systems. Existing interconnects, however, face
significant challenges, including limited bandwidth, in-network congestion,
and the risk of private data leakage. These issues directly hinder the
performance, scalability, and security of modern AI systems.
This thesis explores novel solutions to enhance GPU interconnect performance
while ensuring data confidentiality in AI-centric systems. It makes the
following three key contributions: First, we propose FuseLink to maximize
GPU communication bandwidth. By efficiently transmitting data across
multiple network interfaces, it leverages both intra- and inter-server
connections to mitigate communication bottlenecks in multi-GPU systems.
Second, we introduce MCC, a novel congestion control scheme. It improves
communication efficiency and resilience in AI-centric networks by leveraging
message-level congestion signals to prevent the excessive rate reduction
common in traditional algorithms. Finally, we present CORA, a
high-performance GPU communication framework for secure machine learning.
This framework combines Remote Direct Memory Access (RDMA) with
cryptographic primitives such as secret sharing to enable low-latency,
privacy-preserving model training and serving across GPU clusters.
Together, these contributions advance the state of GPU interconnect
protocols, addressing critical challenges in bandwidth, congestion
management, and security for AI-centric systems.
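
For readers unfamiliar with secret sharing, the primitive named in the abstract, the following is a minimal sketch of additive secret sharing over a fixed-size integer ring. It is a generic textbook illustration and not CORA's protocol; the modulus, the function names, and the number of parties are all assumptions chosen for the example.

```python
import secrets

# Assumed ring size for the illustration; real systems choose this per protocol.
MODULUS = 2**32


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    # The first n-1 shares are uniformly random and reveal nothing on their own.
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the secret.
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by summing all shares modulo MODULUS."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Add two shared secrets; each party adds its own two shares locally."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


if __name__ == "__main__":
    x = share(5, 3)
    y = share(7, 3)
    # Addition needs no communication between parties in this scheme.
    print(reconstruct(add_shared(x, y)))
```

Addition of shared values is purely local in this scheme. Operations such as multiplication generally require the parties to exchange messages, which is one reason the communication layer matters for secure machine learning workloads.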
Date: Wednesday, 20 August 2025
Time: 2:00pm - 4:00pm
Venue: Room 3494 (Lifts 25/26)
Committee Members: Prof. Kai Chen (Supervisor)
Prof. Song Guo (Chairperson)
Dr. Binhang Yuan