NETWORK TRANSPORT FOR AI-CENTRIC NETWORKING

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "NETWORK TRANSPORT FOR AI-CENTRIC NETWORKING"

By

Mr. Hao WANG


Abstract:

Driven by the increasing complexity of machine learning (ML) applications, 
such as autonomous driving, generative artificial intelligence (GAI), and 
game AI, the size of ML models is increasing explosively, from ResNet50 with 
23M parameters to GPT-3 with 175B parameters. ML practitioners usually
leverage distributed training systems to parallelize the training process for
large models and datasets, and the communication network can become the
bottleneck in this case. To train large models, practitioners have started to
use customized AI clusters, e.g., xAI’s Colossus with 100,000 GPUs, in place
of traditional data centers. A paradigm shift is occurring from general-purpose
networking to AI-centric networking. Traditional network transport
mechanisms, however, are often inadequate to meet the unique
latency, throughput, scalability, and reliability requirements of AI-centric
applications. Additionally, current transports overlook the inherent
characteristics of AI-centric applications, e.g., loss tolerance and traffic 
predictability.

This thesis addresses this paradigm shift by proposing specialized
network transports tailored for AI-centric networking. We first propose a
domain-specific network transport protocol for distributed training, which
covers message semantics, reliability, rate control, flow scheduling, and
load balancing. Next, to further enhance throughput and scalability, we
introduce a transport that leverages in-network computing with data-plane
memory scheduling on programmable switches. Finally, we discuss how to enable
in-network computing in a cloud environment and design transport-layer
primitives to support its deployment.


Date:                   Thursday, 6 March 2025

Time:                   3:00pm - 5:00pm

Venue:                  Room 4472
                        Lifts 25/26

Chairman:               Prof. Daniel PEREZ PALOMAR (ECE)

Committee Members:      Prof. Kai CHEN (Supervisor)
                        Prof. Qiong LUO
                        Dr. Binhang YUAN
                        Dr. Jun ZHANG (ECE)
                        Dr. Hong XU (CUHK)