Minimum Exposure Approach for Trustworthy Vertical Federated Learning

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Minimum Exposure Approach for Trustworthy Vertical Federated 
Learning"

By

Mr. Dashan GAO


Abstract:

As artificial intelligence advances, addressing data scarcity and privacy 
concerns becomes crucial. Federated Learning (FL) offers a 
privacy-preserving framework for collaborative model training across 
organizations. Specifically, Vertical Federated Learning (VFL) faces 
unique challenges arising from vertically partitioned data among parties. 
This thesis introduces a minimum-exposure approach for trustworthy VFL, 
aiming to expose only the minimum necessary information, thereby 
balancing trustworthiness objectives such as privacy, utility, robustness, 
and efficiency. By categorizing information exposure into data and model 
parameter exposure, this thesis guides targeted mitigation 
strategies. First, we address intra-sample label exposure in VFL with a 
two-phase framework: offline-phase cleansing and training-phase 
perturbation. Our proposed Label Privacy Source Coding (LPSC) encodes the 
minimum-necessary label information in the offline phase. Then, we employ 
adversarial training to enhance privacy during training. Second, we 
further explore a more challenging VFL scenario with arbitrarily aligned 
samples. We introduce the Complementary Knowledge Distillation (CKD) 
framework to minimize intra-sample information exposure and facilitate 
privacy-preserving knowledge transfer among parties. Third, we tackle 
model parameter exposure in heterogeneous federated transfer learning by 
proposing a cryptography-based framework, PP-HFTL. A model integration 
method in PP-HFTL reduces model parameter exposure and allows local model 
inference. Finally, we address inter-sample information exposure by 
proposing a secure vertical federated dataset condensation (VFDC) 
framework. This framework efficiently condenses the entire real dataset in 
VFL to a small synthetic dataset, reducing inter-sample information 
exposure that could compromise privacy while maintaining model utility. 
Extensive experiments on real-world datasets demonstrate the effectiveness 
and efficiency of our approaches, which outperform existing baselines 
across various objectives.


Date:                   Friday, 28 February 2025

Time:                   10:00am - 12:00noon

Venue:                  Room 4472
                        Lifts 25/26

Chairman:               Prof. Andrew Wing On POON (ECE)

Committee Members:      Prof. Qiang YANG (Supervisor)
                        Prof. Kai CHEN (Supervisor)
                        Dr. Xiaojuan MA
                        Dr. Wei WANG
                        Dr. Can YANG (MATH)
                        Prof. Hongxia YANG (PolyU)