Reducing the Ever-growing Cost of Machine Learning Services
Speaker: Dr. Binhang Yuan
         ETH Zurich

Title: "Reducing the Ever-growing Cost of Machine Learning Services"

Date: Monday, 20 February 2023 (Revised)

Time: 3:00pm - 4:00pm HKT

Zoom link: https://hkust.zoom.us/j/465698645?pwd=aVRaNWs2RHNFcXpnWGlkR05wTTk3UT09

Meeting ID: 465 698 645
Passcode: 20222023

Abstract:

The recent success of machine learning (ML) has dramatically benefited from the exponential growth of ML model capacity. However, the enormous capacity of ML models also leads to a significantly higher cost. In practice, the high cost of ML comes from three sources: i) the cost of optimizing/deploying ML services over the ever-changing hardware; ii) the low utilization of the hardware due to parallel/distributed communication overhead; and iii) the high cost of accessing the hardware. My work attempts to reduce the cost in all three categories above:

Developing and deploying ML workflows in ever-changing execution environments is a tedious and time-consuming job and would require a significant amount of engineering effort to scale out the computation; my work proposes new abstractions for ML system design and implementation with expressivity, easy optimization, and high performance.

In parallel/distributed ML training, communication is usually the main bottleneck that restricts hardware efficiency; my work explores system relaxations of communications under different parallel ML training paradigms to increase hardware efficiency without compromising their statistical efficiency.

Given the advances in system optimization and relaxation, my work further investigates how to deploy the ML service over a decentralized open collective environment consisting of much cheaper and underutilized decentralized GPUs; the result is promising: when the decentralized interconnections are 100X slower than the data center network, under efficient scheduling, the end-to-end training throughput is only 1.7~3.5X slower than the state-of-the-art solutions inside a data center.
*******************

Biography:

Binhang Yuan is a postdoctoral research scientist in the Department of Computer Science at ETH Zurich, under the supervision of Ce Zhang. He received his Bachelor of Science (2013) in Computer Science from Fudan University, and his Master of Science (2016) and Ph.D. (2020) in Computer Science from Rice University, where he was advised by Chris Jermaine. Binhang's research interests lie in data management for machine learning and in distributed/decentralized machine learning systems. His work won the Best Paper Honorable Mention Award at VLDB 2019 and the Research Highlight Award at SIGMOD 2020.