Towards Efficient and Secure GPU Interconnect for AI-centric Systems
PhD Thesis Proposal Defence

Title: "Towards Efficient and Secure GPU Interconnect for AI-centric Systems"

by

Mr. Zhenghang REN

Abstract:

The rapid growth of GPU-driven AI applications demands high-speed and secure GPU interconnects to support large-scale model training and deployment in parallel and distributed systems. Existing interconnects, however, face significant challenges, including limited bandwidth, in-network congestion, and the risk of private data leakage. These issues directly hinder the performance, scalability, and security of modern AI systems.

This thesis explores novel solutions to enhance GPU interconnect performance while ensuring data confidentiality in AI-centric systems. It makes the following three key contributions:

First, we propose FuseLink to maximize GPU communication bandwidth. By efficiently transmitting data across multiple network interfaces, it leverages both intra- and inter-server connections to mitigate communication bottlenecks in multi-GPU systems.

Second, we introduce MCC, a novel congestion control scheme. It improves communication efficiency and resilience in AI-centric networks by leveraging message-level congestion signals to prevent the excessive rate reduction common in traditional algorithms.

Finally, we present CORA, a high-performance GPU communication framework for secure machine learning. This framework incorporates Remote Direct Memory Access (RDMA) with cryptographic primitives like secret sharing to enable low-latency, privacy-preserving model training and serving across GPU clusters.

Together, these contributions advance the state of GPU interconnect protocols, addressing critical challenges in bandwidth, congestion management, and security for AI-centric systems.

Date: Friday, 22 August 2025
Time: 2:00pm - 4:00pm
Venue: Room 3494
       Lifts 25/26

Committee Members:
Prof. Kai Chen (Supervisor)
Prof. Song Guo (Chairperson)
Dr. Binhang Yuan