COMP 6211I: Trustworthy Machine Learning [Spring 2023]

Monday, 13:30-14:50 @ Room 6591

Friday, 9:00-10:20 @ Room 6591




This is an intensive graduate seminar on trustworthy machine learning. The course covers topics in emerging research areas related to the broader study of security and privacy in machine learning. Students will learn about attacks against computer systems that leverage machine learning, as well as defense techniques to mitigate such attacks.


The course assumes students already have a basic understanding of machine learning. Students will familiarize themselves with the emerging body of literature from different research communities investigating these questions. The class is designed to help students explore new research directions and applications. Most of the course readings will come from both seminal and recent papers in the field.

Grading Policy


A one-page summary of the assigned readings is due each class, starting from week 2. A physical copy should be turned in before the beginning of class. The summary should cover the following: (a) what did the papers do well? (b) where did the papers fall short? (c) what did you learn from these papers? and (d) what questions do you have about the papers?

Research Projects

Students are required to do a project in this class. The goal of the course project is to provide the students an opportunity to explore research directions in trustworthy machine learning. The project should be related to the course content. An expected project consists of

Tentative Schedule and Material

Date Topic Slides Readings & links Assignments
Fri 3/2 Overview of Trustworthy Machine Learning lecture_0    
Mon 6/2 Machine learning basics part 1 lecture_1    
Fri 10/2 Machine learning basics part 2 lecture_2    
Mon 13/2 Machine learning basics part 3 lecture_3    
Fri 17/2 Machine learning basics part 4 lecture_4    
Mon 20/2 Exam      
Fri 24/2 Test-time integrity (attack) slides White-box attack:
Goodfellow et al., Explaining and Harnessing Adversarial Examples
Carlini and Wagner, Towards Evaluating the Robustness of Neural Networks
Moosavi-Dezfooli et al., Universal adversarial perturbations
Hard-label black-box attack:
Brendel et al., Decision-based adversarial attacks: reliable attacks against black-box machine learning models
Cheng et al., Query-efficient hard-label black-box attack: an optimization-based approach
Chen et al., HopSkipJumpAttack: A Query-Efficient Decision-Based Attack
Mon 27/2 Test-time integrity (defense) slides Madry et al., Towards Deep Learning Models Resistant to Adversarial Attacks
Wong et al., Fast is better than Free: Revisiting Adversarial Training
Zhang et al., Theoretically Principled Trade-off between Robustness and Accuracy
Fri 3/3 Training-time integrity (backdoor attack) slides Liu et al., Trojaning Attack on Neural Networks
Shafahi et al., Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks
Gu et al., BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
Mon 6/3 Training-time integrity (defense) slides Wang et al., Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks
Wang et al., Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases
Fri 10/3 Test-time integrity (verification) part 1 slides Wong and Kolter, Provable defenses against adversarial examples via the convex outer adversarial polytope
Zhang et al., Efficient Neural Network Robustness Certification with General Activation Functions
Zhang et al., General Cutting Planes for Bound-Propagation-Based Neural Network Verification
Mon 13/3 Test-time integrity (verification) part 2 slides Cohen et al., Certified Adversarial Robustness via Randomized Smoothing
Fri 17/3 Training-time integrity (poisoning attack)   Koh and Liang, Understanding Black-box Predictions via Influence Functions
Carlini and Terzis, Poisoning and Backdooring Contrastive Learning
Carlini, Poisoning the Unlabeled Dataset of Semi-Supervised Learning
Mon 20/3 Confidentiality (data) attack slides Carlini et al., Extracting Training Data from Large Language Models
Kahla et al., Label-Only Model Inversion Attacks via Boundary Repulsion
Fri 24/3 Privacy attacks slides Shokri et al., Membership Inference Attacks against Machine Learning Models
Fredrikson et al., Model inversion attacks that exploit confidence information and basic countermeasures
Choquette-Choo et al., Label-Only Membership Inference Attacks
Mon 27/3 Confidentiality (model) slides Jagielski et al., High Accuracy and High Fidelity Extraction of Neural Networks
Tramèr et al., Stealing Machine Learning Models via Prediction APIs
Fri 31/3 Confidentiality defense slides Huang et al., Unlearnable Examples: Making Personal Data Unexploitable
Maini et al., Dataset Inference: Ownership Resolution in Machine Learning
Mon 3/4 Fairness slides Zhao et al., Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
Dwork et al., Fairness Through Awareness
Caliskan et al., Semantics derived automatically from language corpora contain human-like biases
Fri 7/4 Study break      
Mon 10/4 Study break      
Fri 14/4 Differential privacy part I slides Dwork et al., Calibrating Noise to Sensitivity in Private Data Analysis
Abadi et al., Deep Learning with Differential Privacy
Mon 17/4 Differential privacy part II slides Papernot et al., Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data
Mironov, Rényi Differential Privacy
Fri 21/4 Interpretability (XAI) part 1 slides Simonyan et al., Deep inside convolutional networks: Visualising image classification models and saliency maps
Selvaraju et al., Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Mon 24/4 Interpretability (XAI) part 2 slides Ribeiro et al., “Why Should I Trust You?”: Explaining the Predictions of Any Classifier
Lundberg and Lee, A unified approach to interpreting model predictions
Fri 28/4 Safety   Athalye et al., Synthesizing Robust Adversarial Examples
Xu et al., Adversarial T-shirt! Evading Person Detectors in A Physical World
Mon 1/5 Labor Day
Fri 5/5 Uncertainty slides Guo et al., On Calibration of Modern Neural Networks
Minderer et al., Revisiting the Calibration of Modern Neural Networks
Mon 8/5 Project Presentation      


There is no required textbook for this course. Some recommended readings are