Optimizing the Inference Efficiency of Deep Neural Networks on the Graph and the Operator Level
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Optimizing the Inference Efficiency of Deep Neural Networks on the Graph and the Operator Level"

By

Miss Jingzhi FANG

Abstract:

Deep neural networks (DNNs) have achieved great success in many areas, e.g., computer vision and natural language processing. This success is largely driven by increasingly large and computationally intensive models. The increased model size makes DNN training and inference time-consuming, which hinders the development and application of DNNs. Since the implementation of a DNN affects its execution time, we can optimize the implementation to make the DNN run faster. The best implementation of a DNN depends on its model architecture, input workload, and target hardware; therefore, each DNN should be optimized individually, taking its runtime information into account. However, the optimization space of DNN implementations is often huge, making it hard to search for the best implementation. Furthermore, in practice the optimization process may need to be conducted multiple times, e.g., during model architecture design or when the runtime information is dynamic (such as a dynamic input workload), so a long optimization time can be unaffordable. As a result, the efficiency of optimizing DNN implementations is also of great importance.

In this thesis, we focus on the effectiveness and efficiency of optimizing DNN implementations and propose three techniques. Since DNNs are usually represented as computation graphs, where each node corresponds to an operator in the model (e.g., matrix multiplication) and each edge corresponds to a data dependency between operators, our techniques optimize DNN implementations at the graph level and the operator level, respectively. The graph-level optimization method searches for an equivalent computation graph for a DNN by iteratively applying transformations to its original computation graph. The operator-level optimization method searches for equivalent low-level code for an operator by applying transformations to its naive low-level implementation. Specifically, our first two techniques optimize general DNNs at these two levels, focusing on accelerating the optimization of DNN implementations while maintaining good optimization effectiveness. We then turn to a special scenario, DNNs with sparsity, and propose a method to optimize sparse operators (i.e., operators with sparse input tensors). This poses unique challenges because we must also consider the sparse formats used to store the input tensors, which affect the data access patterns and hence the possible operator implementations.

Date: Friday, 28 June 2024
Time: 2:00pm - 4:00pm
Venue: Room 3494 (Lifts 25/26)

Chairman: Dr. Kai LIU (LIFS)

Committee Members: Prof. Lei CHEN (Supervisor)
                   Prof. Qiong LUO
                   Prof. Ke YI
                   Dr. Can YANG (MATH)
                   Prof. Haibo HU (PolyU)
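
For context, below is a minimal illustrative sketch of the two levels of optimization described in the abstract. It is not the thesis' actual system; all class, function, and parameter names are assumptions made for illustration only.

At the graph level, a DNN is a computation graph of operators, and an optimization applies an equivalence-preserving rewrite that changes the implementation but not the computed result (here, fusing an add followed by a relu into one fused operator):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                      # e.g. "matmul", "add", "relu"
    inputs: list = field(default_factory=list)   # upstream nodes (data dependencies)

def fuse_add_relu(output: Node) -> Node:
    """Rewrite relu(add(x, y)) into a single fused 'add_relu' operator.

    The rewrite preserves the computed function, so only the implementation
    changes (e.g., one kernel launch instead of two), not the result.
    """
    if output.op == "relu" and output.inputs and output.inputs[0].op == "add":
        add = output.inputs[0]
        return Node(op="add_relu", inputs=add.inputs)
    return output

# Build relu(add(a, b)) and apply the rewrite.
a, b = Node("input"), Node("input")
y = Node("relu", [Node("add", [a, b])])
print(fuse_add_relu(y).op)  # -> "add_relu"
```

At the operator level, the abstract describes transforming an operator's naive low-level code into equivalent but faster code. One classic transformation of this kind is loop tiling, sketched below for matrix multiplication; the tile size is an arbitrary choice for illustration:

```python
def matmul_naive(A, B, n):
    # Naive triple loop: the straightforward implementation of C = A @ B.
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, n, tile=4):
    # Loop tiling: iterate over tile-sized blocks to improve data locality.
    # The result is identical to matmul_naive; only the schedule differs.
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        for k in range(k0, min(k0 + tile, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C
```

A search-based optimizer of the kind described in the abstract would explore many such graph rewrites and operator schedules and keep the candidates that run fastest on the target hardware.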