Towards Efficient and Accessible Systems for Distributed Machine Learning
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Towards Efficient and Accessible Systems for Distributed Machine Learning"

By Mr. Kaiqiang XU

Abstract:

The increasing scale of modern machine learning (ML) models has created immense demand for computational resources. ML computing for AI involves highly distributed, parallelized workloads with intricate patterns while presenting challenges in resource management across distributed environments. Efficiently managing and executing these workloads using existing computing abstractions and mechanisms presents significant challenges to underlying systems.

This thesis addresses these challenges by designing a comprehensive system to optimize ML workloads, built upon an infrastructure for GPU clusters. This infrastructure provides efficient resource allocation and scheduling in multi-tenant environments, serving as a foundation for enhanced scalability, usability, and resource utilization. Building on this foundation, three complementary systems are introduced: a carbon-efficient scheduler that aligns ML workloads with low-carbon energy periods, reducing environmental impact while maintaining performance; a compiler framework that optimizes distributed data processing for ML applications by simplifying development and minimizing communication overhead; and a scalable system for graph neural network training that introduces hybrid parallelism, locality-aware partitioning, and multi-level pipelines to enable efficient processing of billion-edge graphs.

Together, these contributions improve the performance, sustainability, and scalability of ML systems, addressing critical infrastructure gaps and advancing the state of the art in ML system design. This work provides the foundation for meeting the demands of next-generation AI applications in research and industry.

Date: Friday, 1 August 2025
Time: 2:00pm - 4:00pm
Venue: Room 3494, Lifts 25/26

Chairman: Prof. Minhua SHAO (CBE)

Committee Members:
Prof. Kai CHEN (Supervisor)
Prof. Gary CHAN
Prof. Song GUO
Dr. Jun ZHANG (ECE)
Prof. Cong WANG (CityU)