Towards Efficient and Accessible Systems for Distributed Machine Learning
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Towards Efficient and Accessible Systems for Distributed Machine
Learning"
By
Mr. Kaiqiang XU
Abstract:
The increasing scale of modern machine learning (ML) models has created
immense demand for computational resources. ML computing for AI involves
highly distributed, parallelized workloads with intricate patterns, which
complicate resource management across distributed environments. Existing
computing abstractions and mechanisms struggle to manage and execute these
workloads efficiently, which poses significant challenges to the underlying
systems.
This thesis addresses these challenges by designing a comprehensive system
to optimize ML workloads, built upon an infrastructure for GPU clusters.
This infrastructure provides efficient resource allocation and scheduling in
multi-tenant environments, serving as a foundation for enhanced scalability,
usability, and resource utilization.
Building on this foundation, three complementary systems are introduced: a
carbon-efficient scheduler that aligns ML workloads with low-carbon energy
periods, reducing environmental impact while maintaining performance; a
compiler framework that optimizes distributed data processing for ML
applications by simplifying development and minimizing communication
overhead; and a scalable system for graph neural network training that
introduces hybrid parallelism, locality-aware partitioning, and multi-level
pipelines to enable efficient processing of billion-edge graphs.
Together, these contributions improve the performance, sustainability, and
scalability of ML systems, addressing critical infrastructure gaps and
advancing the state of the art in ML system design. This work provides the
foundation for meeting the demands of next-generation AI applications in
research and industry.
Date: Friday, 1 August 2025
Time: 2:00pm - 4:00pm
Venue: Room 3494
Lifts 25/26
Chairman: Prof. Minhua SHAO (CBE)
Committee Members: Prof. Kai CHEN (Supervisor)
                   Prof. Gary CHAN
                   Prof. Song GUO
                   Dr. Jun ZHANG (ECE)
                   Prof. Cong WANG (CityU)