Minimum Exposure Approach for Trustworthy Vertical Federated Learning
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Minimum Exposure Approach for Trustworthy Vertical Federated Learning"

By

Mr. Dashan GAO

Abstract:

As artificial intelligence advances, addressing data scarcity and privacy concerns becomes crucial. Federated Learning (FL) offers a privacy-preserving framework for collaborative model training across organizations. Specifically, Vertical Federated Learning (VFL) faces unique challenges arising from vertically partitioned data among parties. This thesis introduces a minimum-exposure approach for trustworthy VFL, aiming to expose only the minimum necessary information, thereby balancing trustworthiness objectives such as privacy, utility, robustness, and efficiency. By categorizing information exposure into data exposure and model parameter exposure, this thesis proposes targeted mitigation strategies.

First, we address intra-sample label exposure in VFL with a two-phase framework: offline-phase cleansing and training-phase perturbation. Our proposed Label Privacy Source Coding (LPSC) encodes the minimum necessary label information in the offline phase. Then, we employ adversarial training to enhance privacy during training.

Second, we explore a more challenging VFL scenario with arbitrarily aligned samples. We introduce the Complementary Knowledge Distillation (CKD) framework to minimize intra-sample information exposure and facilitate privacy-preserving knowledge transfer among parties.

Third, we tackle model parameter exposure in heterogeneous federated transfer learning by proposing a cryptography-based framework, PP-HFTL. A model integration method in PP-HFTL reduces model parameter exposure and allows local model inference.

Finally, we address inter-sample information exposure by proposing a secure vertical federated dataset condensation (VFDC) framework. This framework efficiently condenses the entire real dataset in VFL into a small synthetic dataset, reducing inter-sample information exposure that could compromise privacy while maintaining model utility.

Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of our approaches, which outperform existing baselines across various objectives.

Date: Friday, 28 February 2025
Time: 10:00am - 12:00noon
Venue: Room 4472, Lifts 25/26

Chairman: Prof. Andrew Wing On POON (ECE)

Committee Members:
Prof. Qiang YANG (Supervisor)
Prof. Kai CHEN (Supervisor)
Dr. Xiaojuan MA
Dr. Wei WANG
Dr. Can YANG (MATH)
Prof. Hongxia YANG (PolyU)