Scalable and Automated Side Channel Analysis for AI Infrastructures

PhD Thesis Proposal Defence


Title: "Scalable and Automated Side Channel Analysis for AI Infrastructures"

by

Mr. Yuanyuan YUAN


Abstract:

After years of rapid development, AI systems have gained widespread adoption
across numerous security- and privacy-critical applications. Side channel 
analysis (SCA) explores unintended secret leakage in a system's non-functional 
characteristics (i.e., side channels) such as execution time or memory access 
patterns. This thesis comprehensively studies the side channel leakage in 
infrastructures that underpin the entire life cycle of modern AI systems, 
including data processing libraries, trusted execution environments, runtime 
interpreters, executables on edge devices, etc. We also propose highly 
practical solutions for SCA and implement a series of scalable and automated 
SCA tools.

Our first work focuses on cache side channels, a severe threat exploitable even 
by remote, non-privileged adversaries. This work addresses the quantification 
and localization problems, two fundamental challenges in detecting side channel 
leakage. We first review the (in-)adequacy of existing side channel detection 
methods and propose eight criteria for designing a full-fledged cache side 
channel detector. Accordingly, we propose CacheQL, which meets all these 
criteria. CacheQL quantifies the leakage as the mutual information between the 
secret and its resulting side channel observation. To localize the leakage 
source, CacheQL distributes the leaked information among vulnerable program 
modules via game theory. With meticulous optimizations, CacheQL is highly 
scalable and applies to production software; it also substantially improves 
quantification precision and reduces the localization cost from exponential to 
nearly constant.
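At a high level, the two steps above can be illustrated with a toy model: leakage is the mutual information between a uniform secret and the (here deterministic) side channel observation, and the leaked bits are attributed to program modules via their Shapley values. This is a minimal sketch under our own assumptions (a 4-bit secret and two hypothetical leaking modules), not CacheQL's actual implementation:

```python
import itertools
import math
from collections import Counter, defaultdict

# Toy victim: a 4-bit secret; each "module" leaks some bits into the
# cache-trace observation. Purely illustrative, not CacheQL itself.
def observe(secret, modules):
    obs = 0
    if "A" in modules:   # hypothetical module A leaks the low 2 bits
        obs |= secret & 0b0011
    if "B" in modules:   # hypothetical module B leaks bit 2
        obs |= secret & 0b0100
    return obs

def mutual_information(modules, n_bits=4):
    # I(S; O) = H(O) - H(O | S); the toy observation is deterministic,
    # so H(O | S) = 0 and I(S; O) = H(O) under a uniform secret.
    counts = Counter(observe(s, modules) for s in range(2 ** n_bits))
    total = 2 ** n_bits
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Exact Shapley values: average each module's marginal contribution to
# the leakage over all orderings, distributing the leaked bits fairly.
def shapley(players, value):
    phi = defaultdict(float)
    perms = list(itertools.permutations(players))
    for perm in perms:
        coalition = set()
        for p in perm:
            before = value(coalition)
            coalition.add(p)
            phi[p] += (value(coalition) - before) / len(perms)
    return dict(phi)

total_leak = mutual_information({"A", "B"})
attribution = shapley(["A", "B"], lambda c: mutual_information(c))
print(f"total leakage: {total_leak} bits")  # total leakage: 3.0 bits
print(attribution)                          # {'A': 2.0, 'B': 1.0}
```

The exponential cost mentioned above comes from enumerating all module orderings, as this brute-force Shapley computation does; CacheQL's contribution is avoiding that enumeration at scale.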

Our second work exploits cache side-channel leakage in data processing 
libraries of AI systems. Modern AI systems accept high-dimensional media data 
(e.g., images, audio) as inputs and adopt data processing libraries (e.g., 
Libjpeg, FFmpeg) to handle different input formats; recovering AI inputs is 
inherently challenging given the high dimensionality and complex formats of such 
data. Unlike prior works that operate on data bytes (e.g., pixel values of an 
image), we focus on semantic information in AI inputs (e.g., what constitutes a 
face in an image) and significantly reduce the complexity of SCA. Our pipeline 
is unified for different formats of AI inputs and is fully automated. We also 
propose low-cost yet effective mitigation for the leakage.

Our third work further investigates data leakage induced by computations of the 
internal AI models and focuses on trusted execution environments (TEEs). 
Although TEEs are widely employed to ensure secure AI computation on 
untrusted host machines, we show that this security guarantee is violated by 
the recently disclosed ciphertext side channels in TEEs: a malicious host can 
precisely recover AI inputs from ciphertexts in TEE-protected memory. We 
systematically examine the leakage in AI runtime interpreters including 
TensorFlow and PyTorch, and study how their different computation paradigms 
affect the leakage. Our results also show that optimizations in AI compilers 
(e.g., TVM, Glow) can enlarge the leakage in AI executables.
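A ciphertext side channel stems from deterministic memory encryption: when the same plaintext block written to the same address always yields the same ciphertext, a host that can read encrypted memory can build a dictionary from observed ciphertexts back to inferred plaintexts. The following toy simulation shows only this determinism principle; the encryption stand-in and the attacker phases are our own illustration, not a real TEE's cipher:

```python
import hashlib

# Toy deterministic, address-tweaked block "encryption": the same
# plaintext at the same address always maps to the same ciphertext.
# Real TEEs use AES with an address-based tweak; this hash-based
# stand-in only models the determinism the side channel exploits.
def encrypt_block(plaintext: bytes, address: int) -> bytes:
    return hashlib.sha256(plaintext + address.to_bytes(8, "little")).digest()[:16]

ADDR = 0x1000  # hypothetical memory slot holding a sensitive value

# Phase 1: the malicious host records ciphertexts while it knows or
# controls the corresponding plaintexts (e.g., via chosen model inputs).
dictionary = {}
for known_value in [0, 1, 2, 3]:
    ct = encrypt_block(known_value.to_bytes(16, "little"), ADDR)
    dictionary[ct] = known_value

# Phase 2: the TEE later writes a secret to the same address; the host
# sees only the ciphertext, yet recovers the plaintext by lookup.
secret = 2
observed_ct = encrypt_block(secret.to_bytes(16, "little"), ADDR)
recovered = dictionary.get(observed_ct)
print(f"recovered secret: {recovered}")  # recovered secret: 2
```

Because AI inference repeatedly writes intermediate values to fixed memory locations, such dictionaries can accumulate quickly, which is why the computation paradigms of different runtimes affect how much leaks.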

Our fourth work studies how AI models can be extracted under TEE protection by 
exploiting ciphertext side channels. This work is the first to demonstrate 
the feasibility and practicality of recovering AI models' weights from 
microarchitectural side channels. We implement a highly stealthy tool, 
HyperTheft, which recovers weights by observing a single execution of the AI 
model, without querying it. The recovered models consistently achieve 77%~97% 
accuracy under different attack scenarios. Based on HyperTheft's results, we 
further show how different downstream attacks can be enabled to leak training 
data and manipulate the AI system's outputs.


Date:                   Friday, 31 May 2024

Time:                   2:00pm - 4:00pm

Venue:                  Room 4475
                        Lifts 25/26

Committee Members:      Dr. Shuai Wang (Supervisor)
                        Prof. Shing-Chi Cheung (Chairperson)
                        Dr. Yangqiu Song
                        Dr. Long Chen