Minimum Exposure Approach for Trustworthy Vertical Federated Learning
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Minimum Exposure Approach for Trustworthy Vertical Federated
Learning"
By
Mr. Dashan GAO
Abstract:
As artificial intelligence advances, addressing data scarcity and privacy
concerns becomes crucial. Federated Learning (FL) offers a
privacy-preserving framework for collaborative model training across
organizations. Specifically, Vertical Federated Learning (VFL) faces
unique challenges arising from vertically partitioned data among parties.
This thesis introduces a minimum-exposure approach for trustworthy VFL,
aiming to expose only the minimum-necessary information, thereby
balancing trustworthiness objectives such as privacy, utility, robustness,
and efficiency. By categorizing information exposure into data and model
parameter exposure, this thesis guides targeted mitigation
strategies. First, we address intra-sample label exposure in VFL with a
two-phase framework: offline-phase cleansing and training-phase
perturbation. Our proposed Label Privacy Source Coding (LPSC) encodes the
minimum-necessary label information in the offline phase. Then, we employ
adversarial training to enhance privacy during training. Second, we
further explore a more challenging VFL scenario with arbitrarily aligned
samples. We introduce the Complementary Knowledge Distillation (CKD)
framework to minimize intra-sample information exposure and facilitate
privacy-preserving knowledge transfer among parties. Third, we tackle
model parameter exposure in heterogeneous federated transfer learning by
proposing a cryptography-based framework, PP-HFTL. A model integration
method in PP-HFTL reduces model parameter exposure and allows local model
inference. Finally, we address inter-sample information exposure by
proposing a secure vertical federated dataset condensation (VFDC)
framework. This framework efficiently condenses the entire real dataset in
VFL to a small synthetic dataset, reducing inter-sample information
exposure that could compromise privacy while maintaining model utility.
Extensive experiments on real-world datasets demonstrate the effectiveness
and efficiency of our approaches, which outperform existing baselines
across various objectives.
Date: Friday, 28 February 2025
Time: 10:00am - 12:00noon
Venue: Room 4472
Lifts 25/26
Chairman: Prof. Andrew Wing On POON (ECE)
Committee Members: Prof. Qiang YANG (Supervisor)
Prof. Kai CHEN (Supervisor)
Dr. Xiaojuan MA
Dr. Wei WANG
Dr. Can YANG (MATH)
Prof. Hongxia YANG (PolyU)