The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Optimizing the Inference Efficiency of Deep Neural Networks on the
Graph and the Operator Level"
By
Miss Jingzhi FANG
Abstract:
Deep neural networks (DNNs) have achieved great success in many areas, such as
computer vision and natural language processing. This success is largely driven
by increasingly large and computationally intensive models, which make both the
training and the inference of DNNs time-consuming and thereby hinder the
development and application of DNNs. Since the implementation of a DNN affects
its execution time, we can optimize the implementation to make the DNN run
faster. The best implementation of a DNN depends on its model architecture, its
input workload, and the hardware it runs on; therefore, each DNN should be
optimized individually, taking its runtime information into account. However,
the optimization space of DNN implementations is often huge, making it hard to
search for the best implementation. Furthermore, the optimization process may
need to be repeated many times in practice, e.g., when designing the model
architecture or when the runtime information is dynamic (such as a dynamic
input workload), so a long optimization time can be unaffordable. As a result,
the efficiency of optimizing DNN implementations is also of great importance.
In this thesis, we focus on both the effectiveness and the efficiency of
optimizing DNN implementations and propose three techniques. Since DNNs are
usually represented as computation graphs, where each node corresponds to an
operator in the model (e.g., a matrix multiplication) and each edge corresponds
to a data dependency between operators, our techniques optimize DNN
implementations at the graph level and the operator level, respectively. The
graph-level optimization method searches for an equivalent computation graph
for a DNN by iteratively applying transformations to its original computation
graph. The operator-level optimization method searches for equivalent low-level
code for an operator by applying transformations to its naively implemented
low-level code. Our first two techniques optimize general DNNs at these two
levels, focusing on accelerating the optimization of DNN implementations while
maintaining good optimization effectiveness. We then move to a special
scenario, namely DNNs with sparsity, and propose a method to optimize sparse
operators (i.e., operators with sparse input tensors). This setting poses
unique challenges because the sparse formats used to store the input tensors
affect the data access patterns and hence the possible operator
implementations.
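
To make the graph-level idea above concrete, the following is a minimal,
self-contained Python sketch, not the system proposed in the thesis: it models
a toy computation graph, a single hypothetical rewrite rule that fuses a matmul
followed by an add into one fused "linear" operator, and a greedy search that
keeps applying transformations while an assumed cost model reports an
improvement. The operator costs, the rewrite rule, and all names here are
illustrative assumptions.

# Illustrative sketch only: toy computation graph, one hypothetical rewrite
# rule (matmul followed by add -> fused linear), and a greedy loop that applies
# transformations while an assumed cost model improves.

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Node:
    op: str                  # operator type, e.g. "input", "matmul", "add"
    inputs: Tuple[str, ...]  # names of producer nodes

Graph = Dict[str, Node]  # node name -> node

# Hypothetical per-operator costs; a real optimizer would measure or model these.
COST = {"input": 0.0, "matmul": 10.0, "add": 1.0, "linear": 10.5, "relu": 0.5}

def graph_cost(graph: Graph) -> float:
    return sum(COST[n.op] for n in graph.values())

def consumers(graph: Graph, name: str) -> List[str]:
    return [k for k, n in graph.items() if name in n.inputs]

def fuse_matmul_add(graph: Graph) -> Optional[Graph]:
    """Apply 'matmul followed by add -> linear' once, if possible."""
    for name, node in graph.items():
        if node.op != "add":
            continue
        for inp in node.inputs:
            prod = graph[inp]
            # Only fuse when the matmul's result is not used elsewhere.
            if prod.op == "matmul" and consumers(graph, inp) == [name]:
                bias = tuple(x for x in node.inputs if x != inp)
                new_graph = dict(graph)
                new_graph[name] = Node("linear", prod.inputs + bias)
                del new_graph[inp]  # the matmul node is now dead
                return new_graph
    return None

def optimize(graph: Graph, rules) -> Graph:
    """Greedily apply transformations while the estimated cost decreases."""
    improved = True
    while improved:
        improved = False
        for rule in rules:
            candidate = rule(graph)
            if candidate is not None and graph_cost(candidate) < graph_cost(graph):
                graph = candidate
                improved = True
    return graph

if __name__ == "__main__":
    # y = relu(x @ W + b), written as separate matmul / add / relu operators.
    g: Graph = {
        "x": Node("input", ()), "W": Node("input", ()), "b": Node("input", ()),
        "mm": Node("matmul", ("x", "W")),
        "sum": Node("add", ("mm", "b")),
        "y": Node("relu", ("sum",)),
    }
    g_opt = optimize(g, [fuse_matmul_add])
    print(graph_cost(g), "->", graph_cost(g_opt))  # 11.5 -> 11.0

A real graph-level optimizer explores many such transformations and graphs
rather than greedily accepting the first improving rewrite, but the sketch
shows the basic loop of rewriting a computation graph under a cost model.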
Date: Friday, 28 June 2024
Time: 2:00pm - 4:00pm
Venue: Room 3494
Lifts 25/26
Chairman: Dr. Kai LIU (LIFS)
Committee Members: Prof. Lei CHEN (Supervisor)
Prof. Qiong LUO
Prof. Ke YI
Dr. Can YANG (MATH)
Prof. Haibo HU (PolyU)