Towards Efficient and Accessible Systems for Distributed Machine Learning

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Towards Efficient and Accessible Systems for Distributed Machine 
Learning"

By

Mr. Kaiqiang XU


Abstract:

The increasing scale of modern machine learning (ML) models has created 
immense demand for computational resources. ML computing for AI involves 
highly distributed, parallelized workloads with intricate patterns, which 
complicates resource management across distributed environments. Managing 
and executing these workloads efficiently with existing computing 
abstractions and mechanisms poses significant challenges to the underlying 
systems.

This thesis addresses these challenges by designing a comprehensive system 
to optimize ML workloads, built upon an infrastructure for GPU clusters. 
This infrastructure provides efficient resource allocation and scheduling in 
multi-tenant environments, serving as a foundation for enhanced scalability, 
usability, and resource utilization.

Building on this foundation, three complementary systems are introduced: a 
carbon-efficient scheduler that aligns ML workloads with low-carbon energy 
periods, reducing environmental impact while maintaining performance; a 
compiler framework that optimizes distributed data processing for ML 
applications by simplifying development and minimizing communication 
overhead; and a scalable system for graph neural network training that 
introduces hybrid parallelism, locality-aware partitioning, and multi-level 
pipelines to enable efficient processing of billion-edge graphs.

Together, these contributions improve the performance, sustainability, and 
scalability of ML systems, addressing critical infrastructure gaps and 
advancing the state of the art in ML system design. This work provides the 
foundation for meeting the demands of next-generation AI applications in 
research and industry.


Date:                   Friday, 1 August 2025

Time:                   2:00pm - 4:00pm

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               Prof. Minhua SHAO (CBE)

Committee Members:      Prof. Kai CHEN (Supervisor)
                        Prof. Gary CHAN
                        Prof. Song GUO
                        Dr. Jun ZHANG (ECE)
                        Prof. Cong WANG (CityU)