Principles and Automation of Low-Level Optimizations on GPUs
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Principles and Automation of Low-Level Optimizations on GPUs"

By

Mr. Da YAN

Abstract

Performance optimizations on GPUs are not yet well understood. This thesis discusses the principles and automation of performance optimizations on NVIDIA GPUs, with a special focus on compute-bound kernels. It focuses on the abstraction layers between portable virtual instruction sets (e.g., LLVM IR, NVIDIA PTX) and native hardware assembly.

We first introduce the native GPU instruction set, Shader ASSembly (SASS). Previously, the public could not customize SASS generation, as the only way to generate SASS was to use the closed-source proprietary compiler ptxas. ptxas hides many important optimizations, including instruction scheduling. We develop an open-source assembler, TuringAs, that lets the public manipulate SASS, and we identify new optimization opportunities at the SASS level. For instance, using some native SASS instructions helps to reduce register pressure, and reordering SASS instructions leads to better instruction-level parallelism, thus increasing throughput. We evaluate the effectiveness of our optimizations with the examples of Winograd convolution (a fast convolution algorithm) and Tensor Core matrix multiplication.

Next, we introduce our effort to automate SASS optimizations to improve productivity. Programming in SASS doesn't scale to a large number of kernels or to new GPU architectures. We develop GASS, an LLVM-based compiler that automatically translates a high-level virtual representation (i.e., LLVM IR) into optimized SASS. We highlight our newly proposed instruction scheduler for compute-bound deep learning kernels, our customization of the if-conversion pass, and our algorithms to resolve data dependencies.
The evaluation shows that our algorithms in GASS outperform LLVM's algorithms by a considerable margin and that GASS is on par with the highly optimized proprietary compiler ptxas.

Date: Monday, 4 July 2022
Time: 2:00pm - 4:00pm
Zoom Meeting: https://hkust.zoom.us/j/96421397372?pwd=a3VPMHJCZS9haGJyUDlIeGNuUWlHdz09

Chairperson: Prof. Robert KO (LIFS)

Committee Members: Prof. Wei WANG (Supervisor)
                   Prof. Lionel PARREAUX
                   Prof. Shuai WANG
                   Prof. Wei ZHANG (ECE)
                   Prof. Bei YU (CUHK)

**** ALL are Welcome ****