The Promise and Pitfalls of Public Data in Private ML
Speaker: Dr. Gautam Kamath, University of Waterloo
Title: The Promise and Pitfalls of Public Data in Private ML
Date: Monday, 28 April 2025
Time: 4:00pm - 5:00pm
Venue: Lecture Theater F (Leung Yat Sing Lecture Theater), near lift 25/26, HKUST
Abstract:
Machine learning models are frequently trained on large-scale datasets, which may contain sensitive or personal data. Worryingly, without special care, these models are prone to revealing information about data points in their training set, leading to violations of individual privacy. To protect against such privacy risks, we can train models with differential privacy (DP), a rigorous notion of individual data privacy. While training models with DP has previously been observed to result in unacceptable losses in utility, I will discuss recent advances that incorporate public data into the training pipeline, allowing models to guarantee both privacy and utility. I will also discuss potential pitfalls of this approach and directions forward for the community.
This talk is based on the speaker's ICLR 2022 paper and ICML 2024 paper (best paper award).
Biography:
Gautam Kamath is an Assistant Professor at the David R. Cheriton School of Computer Science at the University of Waterloo, and a Canada CIFAR AI Chair and Faculty Member at the Vector Institute. He has a B.S. in Computer Science and Electrical and Computer Engineering from Cornell University, and an M.S. and Ph.D. in Computer Science from the Massachusetts Institute of Technology. He is interested in reliable and trustworthy statistics and machine learning, including considerations such as data privacy and robustness. He was a Microsoft Research Fellow as part of the Simons-Berkeley Research Fellowship Program at the Simons Institute for the Theory of Computing. He serves as an Editor-in-Chief of Transactions on Machine Learning Research, and was the program committee co-chair of the 36th International Conference on Algorithmic Learning Theory (ALT 2025). He is the recipient of several awards, including the Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies, a best paper award at the Forty-first International Conference on Machine Learning (ICML 2024), and the Faculty of Math Golden Jubilee Research Excellence Award.