Network Transport for AI-Centric Networking

PhD Thesis Proposal Defence


Title: "Network Transport for AI-Centric Networking"

by

Mr. Hao WANG


Abstract:

Driven by the increasing complexity of machine learning (ML) applications, 
such as autonomous driving, generative artificial intelligence (GAI), and 
game AI, the size of ML models has grown explosively, from ResNet50 with 
23M parameters to GPT-3 with 175B parameters. ML practitioners usually 
leverage distributed training systems to parallelize the training process for 
large models and datasets, in which case the communication network can become 
the bottleneck. To train large models, operators have started to use 
customized AI clusters, e.g., xAI’s Colossus with 100,000 GPUs, in place of 
traditional data centers. This marks a paradigm shift from general-purpose 
networking to AI-centric networking. Traditional network transport 
mechanisms, however, are often inadequate to meet the unique latency, 
throughput, scalability, and reliability requirements of AI-centric 
applications. Additionally, current transports overlook the inherent 
characteristics of AI-centric applications, e.g., loss tolerance and traffic 
predictability.

This thesis addresses this paradigm shift by proposing specialized 
network transports tailored for AI-centric networking. We first propose a 
domain-specific network transport protocol for distributed training, which 
includes message semantics, reliability, rate control, flow scheduling, and 
load balancing. Next, to further enhance throughput and scalability, we 
introduce a transport that leverages in-network computing with data-plane 
memory scheduling on programmable switches. Finally, we discuss emerging 
AI-centric networking protocols in industry, such as those of the Ultra 
Ethernet Consortium (UEC), and their relationship to ours, focusing on 
congestion control and load balancing.


Date:                   Monday, 9 December 2024

Time:                   4:00pm - 6:00pm

Venue:                  Room 3494
                        Lifts 25/26

Committee Members:      Prof. Kai Chen (Supervisor)
                        Prof. Qiong Luo (Chairperson)
                        Dr. Yangqiu Song
                        Dr. Binhang Yuan