The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Optimizing the Inference Efficiency of Deep Neural Networks on the 
Graph and the Operator Level"

By

Miss Jingzhi FANG


Abstract:

Deep neural networks (DNNs) have achieved great success in many areas, such as 
computer vision and natural language processing. This success is largely driven 
by increasingly large and computationally intensive models. The growing model 
size makes the training and inference of DNNs time-consuming, which hinders 
both the development and the application of DNNs. Since the implementation of a 
DNN affects its execution time, we can optimize the implementation to make the 
DNN run faster. The best implementation of a DNN depends on its model 
architecture, input workload, and target hardware; therefore, each DNN should 
be optimized individually, taking its runtime information into account. 
However, the optimization space of DNN implementations is often huge, making it 
hard to search for the best implementation. Furthermore, the optimization 
process may need to be conducted multiple times in practice, e.g., when 
designing the model architecture or when the runtime information is dynamic 
(such as a dynamic input workload), so a long optimization time can be 
unaffordable. As a result, the efficiency of optimizing DNN implementations is 
also of great importance.

In this thesis, we focus on the effectiveness and efficiency of optimizing DNN 
implementations and propose three techniques. Since DNNs are usually 
represented as computation graphs, where each node corresponds to an operator 
in the model (e.g., matrix multiplication) and each edge corresponds to a data 
dependency between operators, our techniques optimize DNN implementations at 
the graph level and the operator level, respectively. The graph-level 
optimization method searches for an equivalent computation graph for a DNN by 
iteratively applying transformations to its original computation graph. The 
operator-level optimization method searches for equivalent low-level code for 
an operator by applying transformations to a naive low-level implementation. 
Specifically, our first two techniques optimize general DNNs at these two 
levels, focusing on accelerating the optimization of DNN implementations while 
maintaining good optimization effectiveness. We then turn to a special 
scenario, namely sparsity in DNNs, and propose a method to optimize sparse 
operators (i.e., operators with sparse input tensors). This setting poses 
unique challenges because we must consider the sparse formats used to store the 
input tensors, which affect the data access pattern and hence the possible 
operator implementations.


Date:                   Friday, 28 June 2024

Time:                   2:00pm - 4:00pm

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               Dr. Kai LIU (LIFS)

Committee Members:      Prof. Lei CHEN (Supervisor)
                        Prof. Qiong LUO
                        Prof. Ke YI
                        Dr. Can YANG (MATH)
                        Prof. Haibo HU (PolyU)