Federated Learning Systems: Performance Profiling and Optimization
Modern machine learning applications cannot succeed without training data. However, in many real-world scenarios, such as healthcare and finance, training data is hard to obtain for two reasons:
- Labeling raw data for machine learning training is expensive because it requires professional knowledge and experience;
- Increasingly strict data privacy and data security regulations restrict the sharing of training data.
Federated Transfer Learning (FTL), proposed by the WeBank AI Group, provides a reliable framework for overcoming these constraints by applying homomorphic encryption and polynomial approximation. FTL is attracting increasing attention in industries such as finance, medicine, and healthcare. To make FTL a practical solution in real-world environments, performance is a key factor. However, FTL's unique privacy-protection components, such as data encryption, introduce potential performance degradation.
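FTL's two privacy-preserving building blocks are easy to illustrate. The sketch below uses the open-source python-paillier (`phe`) package as a stand-in for FATE's own Paillier implementation, and replaces the sigmoid with its first-order Taylor expansion sigmoid(x) ≈ 1/2 + x/4; both choices are illustrative assumptions, not FTL's exact code path.

```python
# A minimal sketch of FTL's two privacy-preserving building blocks, using
# the open-source python-paillier package (pip install phe). FATE ships
# its own Paillier implementation; this is for illustration only.
from phe import paillier

# Additively homomorphic encryption: a party can add and scale
# ciphertexts without ever seeing the plaintexts.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

enc_a = public_key.encrypt(3.5)
enc_b = public_key.encrypt(-1.25)
enc_sum = enc_a + enc_b        # ciphertext + ciphertext
enc_scaled = enc_sum * 2.0     # ciphertext * plaintext scalar
assert abs(private_key.decrypt(enc_scaled) - 4.5) < 1e-9

# Polynomial approximation: Paillier supports only addition and scalar
# multiplication, so nonlinear functions such as the sigmoid are replaced
# by a low-degree polynomial, here the first-order Taylor expansion
# sigmoid(x) ~= 1/2 + x/4 around x = 0.
def approx_sigmoid(enc_x):
    """Evaluate the linearized sigmoid directly on a ciphertext."""
    return enc_x * 0.25 + 0.5

enc_logit = public_key.encrypt(0.8)
print(private_key.decrypt(approx_sigmoid(enc_logit)))  # ~0.70 (true: 0.69)
```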
To understand the factors affecting the performance of FTL, and to offer guidance on deploying and optimizing FTL in practice, the SING Lab led by Prof. Kai Chen quantified its performance on the real-world project FATE, an open-source secure computing framework initiated by the WeBank AI Group to support the federated AI ecosystem. Prof. Chen's team identified three major bottlenecks in FTL, along with their potential solutions:
- Inter-process communication is one of the major bottlenecks of the current FTL implementation. Within one machine, data exchange and memory copies among processes cause extremely high latency. Techniques such as JVM native (off-heap) memory and UNIX domain sockets may mitigate this bottleneck (see the first sketch after this list).
- The requirements of privacy protection add computation overhead to FTL. Software-based encryption consumes excessive CPU cycles (see the second sketch below). Prior work on GPUs, FPGAs, and SmartNICs suggests the possibility of offloading data encryption to dedicated hardware in the FTL infrastructure.
- Traditional congestion-control problems in network communication remain a challenge. Because Internet bandwidth is limited, intensive data exchange over the network, especially after data encryption inflates the payload, makes FTL much slower (see the third sketch below). Deploying advanced networking technologies, such as RDMA or novel bounded-loss-tolerant congestion control algorithms, will help make data transfer faster.
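As a concrete example of the first point, the sketch below exchanges a payload between two processes over a UNIX domain socket, which keeps traffic inside the kernel and avoids the TCP/IP stack. The socket path and payload are hypothetical, and this is a sketch of the general technique, not FATE's actual transport layer.

```python
# A minimal sketch of cross-process data exchange over a UNIX domain
# socket, one of the techniques suggested above for reducing IPC latency.
# The socket path and payload are illustrative; this is not FATE's actual
# transport layer.
import os
import socket
import time
from multiprocessing import Process

SOCK_PATH = "/tmp/ftl_ipc_demo.sock"  # hypothetical path

def server():
    # Binding a stream socket to a filesystem path keeps traffic inside
    # the kernel and bypasses the TCP/IP stack entirely.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(SOCK_PATH)
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(4096))  # echo the payload back

if __name__ == "__main__":
    if os.path.exists(SOCK_PATH):
        os.unlink(SOCK_PATH)
    p = Process(target=server)
    p.start()
    time.sleep(0.2)  # crude startup synchronization, fine for a sketch
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(SOCK_PATH)
        cli.sendall(b"encrypted-gradient-chunk")
        print(cli.recv(4096))
    p.join()
```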
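For the second point, the CPU cost of software encryption can be demonstrated with a rough microbenchmark, again using python-paillier as a stand-in for FATE's implementation; absolute throughput depends heavily on the machine and the key length.

```python
# A rough microbenchmark of software Paillier encryption, illustrating why
# encrypting model updates on the CPU is expensive (and why offloading to
# an FPGA or GPU is attractive). Uses the python-paillier package for
# illustration; absolute numbers vary with the machine and key size.
import time
from phe import paillier

public_key, _ = paillier.generate_paillier_keypair(n_length=2048)

values = [0.001 * i for i in range(100)]  # a small stand-in for a gradient
start = time.perf_counter()
ciphertexts = [public_key.encrypt(v) for v in values]
elapsed = time.perf_counter() - start

# On a typical CPU this achieves on the order of tens to hundreds of
# encryptions per second, so a model with millions of parameters quickly
# becomes encryption-bound.
print(f"{len(values)} encryptions in {elapsed:.2f}s "
      f"({len(values) / elapsed:.1f} ops/s)")
```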
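For the third point, some back-of-the-envelope arithmetic shows why encrypted traffic stresses the network: a Paillier ciphertext is an element of Z_{n^2}, so with a 2048-bit key every value, even a 4-byte float, becomes a 512-byte ciphertext. The model size and link bandwidth below are hypothetical.

```python
# Back-of-the-envelope arithmetic for the network pressure created by
# encryption: a Paillier ciphertext lives in Z_{n^2}, so with a 2048-bit
# key each ciphertext occupies 4096 bits (512 bytes), regardless of how
# small the plaintext is. The model size and bandwidth are assumptions.
KEY_BITS = 2048
CIPHERTEXT_BYTES = (2 * KEY_BITS) // 8          # element of Z_{n^2}: 512 B
PLAINTEXT_BYTES = 4                             # one float32 gradient entry

inflation = CIPHERTEXT_BYTES / PLAINTEXT_BYTES  # 128x per value

params = 1_000_000                              # hypothetical model size
plain_mb = params * PLAINTEXT_BYTES / 1e6
cipher_mb = params * CIPHERTEXT_BYTES / 1e6

bandwidth_mbps = 100                            # hypothetical WAN link
seconds = cipher_mb * 8 / bandwidth_mbps

print(f"inflation: {inflation:.0f}x")
print(f"one encrypted update: {cipher_mb:.0f} MB vs {plain_mb:.0f} MB plain")
print(f"transfer time at {bandwidth_mbps} Mbps: {seconds:.0f} s")
```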
Our paper, "Quantifying the Performance of Federated Transfer Learning," is the first work in the community to systematically quantify and analyze the characteristics and performance bottlenecks of FATE. It won the Best Student Paper Award at the first Federated Learning Workshop at IJCAI 2019.
Furthermore, the findings from this work have guided us in further improving FATE to overcome the identified bottlenecks. For example:
- We made the first code contribution to the FATE open-source community, significantly improving the efficiency of inter-process data exchange.
- We developed an FPGA-based encryption library that effectively speeds up this time-consuming operation; this work was recently nominated for the Best Student Paper Award at the upcoming Federated Learning Workshop at IJCAI 2020.
More follow-up work on optimizing and accelerating federated learning systems is being conducted in the SING Lab.