Scalable and Automated Side Channel Analysis for AI Infrastructures
PhD Thesis Proposal Defence

Title: "Scalable and Automated Side Channel Analysis for AI Infrastructures"

by

Mr. Yuanyuan YUAN

Abstract:

With years of growing development, AI systems have gained widespread adoption across numerous security- and privacy-critical applications. Side channel analysis (SCA) explores unintended secret leakage through a system's non-functional characteristics (i.e., side channels), such as execution time or memory access patterns. This thesis comprehensively studies side channel leakage in the infrastructures that underpin the entire life cycle of modern AI systems, including data processing libraries, trusted execution environments, runtime interpreters, and executables on edge devices. We also propose highly practical solutions for SCA and implement a series of scalable and automated SCA tools.

Our first work focuses on cache side channels, a severe threat because they can be exploited by remote, non-privileged adversaries. This work addresses quantification and localization, two fundamental challenges in detecting side channel leakage. We first review the (in)adequacy of existing side channel detection methods and propose eight criteria for designing a full-fledged cache side channel detector. Accordingly, we propose CacheQL, which meets all these criteria. CacheQL quantifies the leakage as the mutual information between the secret and its resulting side channel observation. To localize the leakage source, CacheQL distributes the leaked information among vulnerable program modules via game theory. With meticulous optimizations, CacheQL is highly scalable and applies to production software; it also substantially improves quantification precision and reduces the cost of localization from exponential to almost constant.
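For concreteness, the two quantities described above can be sketched as follows (a minimal illustration in LaTeX notation, not the thesis's own formulation). Here S denotes the secret, O the side channel observation, M the set of program modules, and v(T) the leakage attributable to a module subset T, with v(\emptyset) = 0 and v(M) = I(S; O); the Shapley value is shown only as one standard game-theoretic attribution, since the abstract says "via game theory" without naming a specific scheme.

  I(S; O) = H(S) - H(S \mid O) = \sum_{s,\,o} \Pr[s, o] \, \log_2 \frac{\Pr[s, o]}{\Pr[s] \, \Pr[o]}

  \phi_i(v) = \sum_{T \subseteq M \setminus \{i\}} \frac{|T|! \, (|M| - |T| - 1)!}{|M|!} \, \bigl( v(T \cup \{i\}) - v(T) \bigr)

The first equation measures the leakage in bits: it is zero when the observation is independent of the secret and equals H(S) when the observation fully determines the secret. By the Shapley efficiency property, \sum_{i \in M} \phi_i(v) = v(M) = I(S; O), so the per-module scores add up to the total leakage, which is what makes "distributing" the leaked information among modules well defined. Computing such attributions exactly requires enumerating all subsets of M, i.e., exponential cost (presumably the exponential localization cost the abstract refers to).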
Our second work exploits cache side channel leakage in the data processing libraries of AI systems. Modern AI systems accept high-dimensional media data (e.g., images, audio) as inputs and adopt data processing libraries (e.g., Libjpeg, FFmpeg) to handle different input formats; recovering AI inputs is inherently challenging given the high dimensionality and complex formats of such data. Unlike prior works that operate on data bytes (e.g., the pixel values of an image), we focus on semantic information in AI inputs (e.g., what constitutes a face in an image) and significantly reduce the complexity of SCA. Our pipeline is unified across different formats of AI inputs and is fully automated. We also propose low-cost yet effective mitigations for the leakage.

Our third work further investigates data leakage induced by the computations of the internal AI models and focuses on trusted execution environments (TEEs). Although TEEs are widely employed to ensure secure AI computation on untrusted host machines, we show that this security belief is violated by the recently disclosed ciphertext side channels in TEEs: a malicious host can precisely recover AI inputs from the encrypted ciphertext residing in TEEs. We systematically examine the leakage in AI runtime interpreters, including TensorFlow and PyTorch, and study how their different computation paradigms affect the leakage. Our results also show that optimizations in AI compilers (e.g., TVM, Glow) can enlarge the leakage in AI executables.

Our fourth work studies how AI models can be extracted despite TEE protection by exploiting ciphertext side channels. This work demonstrates, for the first time, the feasibility and practicality of recovering AI models' weights via microarchitectural side channels. We implement a highly stealthy tool, HyperTheft, which recovers weights by observing a single execution of the AI model, without querying it. The recovered model weights consistently achieve 77%~97% accuracy under different attack scenarios. Based on HyperTheft's results, we further show how different downstream attacks can be mounted to leak training data and manipulate the AI system's outputs.

Date: Friday, 31 May 2024
Time: 2:00pm - 4:00pm
Venue: Room 4475 (Lifts 25/26)

Committee Members:
Dr. Shuai Wang (Supervisor)
Prof. Shing-Chi Cheung (Chairperson)
Dr. Yangqiu Song
Dr. Long Chen