FEDERATED TRANSFER LEARNING UNDER HETEROGENEOUS DATA

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "FEDERATED TRANSFER LEARNING UNDER HETEROGENEOUS DATA"

By

Mr. Xueyang WU


Abstract

Recent advancements in artificial intelligence (AI) applications rely on
massive amounts of training data. In practice, these valuable data are
independently distributed among multiple data owners (e.g., companies and
individuals), each of whom typically holds only a modest quantity, and the
data are usually heterogeneous. Collecting data from individual users or
acquiring data from data owners is a conventionally popular and
straightforward solution to this issue. However, such solutions have become
obsolete due to rising concerns about data privacy and data security.
Currently, AI systems face the problem of utilizing fragmented and diverse
data that are independently distributed across several data owners.

Federated learning (FL), a novel privacy-preserving collaborative machine
learning paradigm, has been proposed to address the privately isolated small
data learning problem. Its main idea is to compose a federation of data owners
in which all participants virtually assemble their data without sacrificing
data security and privacy. There are several challenges for federated
learning, including communication efficiency, data security and privacy
protection, and statistical learning. Among these challenges, the statistical
learning challenge caused by heterogeneous data significantly affects the
performance of FL systems and thus hinders FL’s application in practice. In
recent years, academics have developed a machine learning paradigm known as
transfer learning, which utilizes heterogeneous data to solve the statistical
learning issue in a target domain with limited or no data. Naturally, this
motivates us to incorporate the spirit of transfer learning into federated
learning to overcome the difficulty of statistical learning in practical FL.

In this thesis, we focus on federated transfer learning, a class of federated
learning methods that employ the transfer learning methodology to tackle the
statistical learning difficulty posed by heterogeneous data. Compared to other
federated learning approaches, which presume that the datasets held by data
owners are independent and identically distributed, federated transfer
learning focuses on how to address data heterogeneity across data owners in
practice and achieves superior performance.

The thesis consists of two parts. First, we provide a brief overview of
federated learning, including its concept, evolution, and categorization. More
specifically, we cover its statistical learning challenges in depth. We offer
a precise categorization of algorithms addressing these challenges in
federated learning, which we refer to as federated transfer learning. Then, we
examine current representative works and incorporate them into our proposed
federated transfer learning architecture. Second, we identify three typical
scenarios of data heterogeneity in federated learning with practical
applications and investigate how our proposed federated transfer learning
methods overcome the challenge in these scenarios. We believe that these
federated transfer learning methods hold great promise for wider applications
of federated learning.


Date:			Monday, 12 December 2022

Time:			10:30am - 12:30pm

Venue:			Room 3494
 			lifts 25/26

Chairperson:		Prof. Lixin XU (MATH)

Committee Members:	Prof. Qiang YANG (Supervisor)
 			Prof. Lei CHEN (Supervisor)
 			Prof. Kai CHEN
 			Prof. Yangqiu SONG
 			Prof. Can YANG (MATH)
 			Prof. Qing LI (PolyU)


**** ALL are Welcome ****