Towards Efficient, Secure and Cost-effective Large-Scale Systems for Machine Learning
PhD Thesis Proposal Defence

Title: "Towards Efficient, Secure and Cost-effective Large-Scale Systems for Machine Learning"

by

Mr. Chengliang ZHANG

Abstract:

Machine learning (ML) techniques have advanced by leaps and bounds in the past decade. Because ML's success critically relies on abundant computing power and the availability of big data, it is impractical to host ML applications on a single machine. By distributing the ML workload and training data across multiple machines, we can substantially improve the productivity of ML applications. As large-scale ML applications are increasingly deployed in production systems, improving efficiency, protecting data security, and reducing cost have become pressing needs. Specifically, three unique challenges must be addressed. First, how can we efficiently train an ML model in a cluster in the presence of heterogeneity? Second, once the model is trained, how do we serve it at minimal cost while meeting service-level objectives (SLOs)? Lastly, now that federated learning (FL) has been proposed to protect data privacy, how can we practice it without compromising speed or model quality?

Unfortunately, existing work does not provide satisfactory solutions to these three challenges. First, traditional ML systems often conduct asynchronous training to improve resource utilization; while this maximizes the rate of updates, the price paid is degraded training quality. Second, ML serving is much more computation-intensive and harder to scale, so applying generic cloud scaling methods to ML serving can lead to high resource wastage and poor latency performance. Third, Homomorphic Encryption (HE) can be conveniently adopted to preserve data privacy in FL without sacrificing model accuracy; however, HE induces prohibitively high computation and communication overheads that make it impractical for state-of-the-art models.
To answer these three challenges in large-scale ML systems, we profile, analyze, and propose new strategies to achieve efficiency, security, and cost-effectiveness.

To address the first problem, we propose a new distributed ML scheme, termed speculative synchronization. Our scheme allows workers to speculate about recent parameter updates from others on the fly; if necessary, the workers abort the ongoing computation, pull fresher parameters, and start over to improve the quality of training. We implement our scheme and demonstrate that speculative synchronization achieves substantial speedups over the asynchronous parallel scheme with minimal communication overhead.

Second, to tackle the dual challenge of SLO compliance and cost-effectiveness, we propose a general-purpose ML serving system called MArk (Model Ark). To start, MArk dynamically batches requests and opportunistically serves them using accelerators for an improved performance-cost ratio. Then, instead of relying on over-provisioning, MArk employs predictive autoscaling to hide the provisioning latency at low cost. Last, MArk exploits the stateless nature of inference serving by utilizing flexible yet costly serverless instances to cover unexpected load spikes. We show that MArk can greatly reduce the serving cost while achieving better latency performance than popular industry solutions.

Finally, we present BatchCrypt, a system solution for cross-silo FL that significantly reduces the encryption and communication overhead caused by HE. Instead of encrypting individual gradients at full precision, we encode a batch of quantized gradients into a long integer and encrypt it in one go. To allow gradient-wise aggregation to be performed on the ciphertexts of encoded batches, we develop new quantization and encoding schemes along with a novel gradient clipping technique. Our evaluations confirm that BatchCrypt effectively reduces the computation and communication overhead.
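The speculative synchronization idea can be illustrated with a toy sketch: a worker computes a gradient against a pulled snapshot of the parameters, then checks how many updates others have pushed in the meantime; if the snapshot has grown too stale, it discards the result, pulls fresher parameters, and recomputes. All names here (ParameterServer, speculative_step, the staleness bound) are illustrative, not the thesis's actual API.

```python
import threading

class ParameterServer:
    """Toy in-memory parameter server with a version counter (illustrative only)."""
    def __init__(self, params):
        self.params = list(params)
        self.version = 0
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return list(self.params), self.version

    def updates_since(self, version):
        with self.lock:
            return self.version - version

    def push(self, grad, lr=0.1):
        with self.lock:
            self.params = [p - lr * g for p, g in zip(self.params, grad)]
            self.version += 1

def compute_gradient(params, batch):
    # Placeholder: squared-error gradient on one (param, target) pair per slot.
    return [2 * (p - y) for p, y in zip(params, batch)]

def speculative_step(ps, batch, staleness_bound=2):
    """One worker iteration under speculative synchronization (sketch)."""
    params, version = ps.pull()
    grad = compute_gradient(params, batch)
    # Speculate: if too many updates arrived while we computed, the gradient
    # is stale -- abort it, pull fresher parameters, and recompute.
    if ps.updates_since(version) > staleness_bound:
        params, version = ps.pull()
        grad = compute_gradient(params, batch)
    ps.push(grad)
```

The staleness bound trades extra pulls and recomputation against update quality; the real scheme decides when to abort based on the observed updates rather than a fixed constant.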
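MArk's provisioning strategy can be sketched in a few lines: launch just enough regular instances for the predicted load (predictive autoscaling hides their slow start-up), and route any load above that provisioned capacity to serverless instances, which start quickly but cost more per request. The function names and the single per-VM capacity number are simplifying assumptions, not MArk's implementation.

```python
import math

def provision(predicted_rps, vm_capacity):
    """Proactively launch just enough VM instances for the predicted request
    rate, instead of over-provisioning a safety margin."""
    return math.ceil(predicted_rps / vm_capacity)

def route(actual_rps, num_vms, vm_capacity):
    """Serve load up to the provisioned VM capacity; spill any unexpected
    spike to serverless instances. Returns (vm_rps, serverless_rps)."""
    vm_rps = min(actual_rps, num_vms * vm_capacity)
    serverless_rps = actual_rps - vm_rps
    return vm_rps, serverless_rps
```

For example, with a predicted 250 requests/s and VMs that sustain 100 requests/s each, three VMs are launched ahead of time; if the actual load spikes to 450 requests/s, the extra 150 requests/s overflow to serverless until the autoscaler catches up.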
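BatchCrypt's batch encoding can be sketched as follows, with encryption itself omitted: plain integer addition below stands in for ciphertext addition under an additively homomorphic scheme such as Paillier. Gradients are clipped and quantized to small signed integers, offset to be non-negative, and packed into slots of one long integer; because each slot is wider than the quantized values, summing several packed integers never overflows a slot into its neighbour, so the aggregator's single addition yields all gradient-wise sums at once. All names and the fixed slot/quantization widths are illustrative assumptions.

```python
def quantize(grads, clip=1.0, qbits=8):
    """Clip each gradient to [-clip, clip] and map it to a signed
    qbits-wide integer; returns the integers and the scale used."""
    scale = ((1 << (qbits - 1)) - 1) / clip
    return [round(max(-clip, min(clip, g)) * scale) for g in grads], scale

def pack(qgrads, slot_bits=16, qbits=8):
    """Encode a batch of quantized gradients into one long integer.
    Each value is offset to be non-negative; slot_bits > qbits leaves
    headroom so sums of several packed integers stay within their slot."""
    offset = 1 << (qbits - 1)
    packed = 0
    for q in qgrads:
        packed = (packed << slot_bits) | (q + offset)
    return packed

def unpack(summed, n_slots, n_parties, slot_bits=16, qbits=8):
    """Decode the sum of n_parties packed integers into per-gradient sums,
    removing the accumulated per-party offsets."""
    offset = 1 << (qbits - 1)
    mask = (1 << slot_bits) - 1
    out = []
    for i in range(n_slots):
        shift = slot_bits * (n_slots - 1 - i)
        out.append(((summed >> shift) & mask) - n_parties * offset)
    return out
```

With two clients holding quantized gradients [3, -2] and [1, 5], each packs its batch, the aggregator adds the two integers (as it would add ciphertexts), and unpacking recovers the element-wise sums [4, 3]. One encryption then covers an entire batch of gradients instead of one ciphertext per gradient, which is the source of BatchCrypt's overhead reduction.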
Date: Friday, 12 June 2020
Time: 3:00pm - 5:00pm
Zoom Meeting: https://hkust.zoom.us/j/99778106038

Committee Members:
Dr. Wei Wang (Supervisor)
Prof. Bo Li (Chairperson)
Dr. Kai Chen
Prof. Qian Zhang

**** ALL are Welcome ****