PhD Thesis Proposal Defence
Title: "Optimizing the Inference Efficiency of Deep Neural Networks on the
Graph and the Operator Level"
by
Miss Jingzhi FANG
Abstract:
Deep neural networks (DNNs) have achieved great success in many areas, e.g.,
computer vision and natural language processing. This success is largely
driven by increasingly large and computationally intensive deep learning
models. The increased model size makes the training and inference of DNNs
time-consuming, which severely hinders the development and application of
DNNs. As a result, it is important to reduce the execution time of DNNs. One
way to achieve this goal is to optimize the implementation of a DNN without
changing its outputs. The best implementation of a DNN depends on its model
architecture, input workload, and the hardware it runs on. Therefore, each
DNN should be optimized individually, taking its runtime information into
account. However, the optimization space of DNN implementations is often
huge, making it hard to search for the best implementation. Furthermore, we
may need to conduct the optimization process multiple times in practice,
e.g., when designing the model architecture or when the runtime information
of the DNN is dynamic (such as a dynamic input workload). A long optimization
time can then be unaffordable. As a result, the efficiency of optimizing the
DNN implementation is also of great importance. In this thesis, we introduce
two techniques that accelerate the optimization of DNN implementations while
maintaining good optimization effectiveness. Specifically, since a DNN can be
represented by a computation graph, where each node corresponds to an
operator in the model (e.g., a matrix multiplication) and each edge
corresponds to a data dependency between operators, our techniques optimize
the DNN implementation at the graph level and at the operator level,
respectively. The graph-level optimization method searches for an equivalent
computation graph for a DNN by iteratively applying transformations to its
original computation graph. The operator-level optimization method searches
for equivalent low-level code for an operator by applying transformations to
its naive low-level implementation.
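
The abstract does not specify concrete transformations; as a loose
illustration only (not the methods proposed in the thesis), the Python/NumPy
sketch below shows what such output-preserving rewrites can look like: a
graph-level rewrite that merges two matrix multiplications sharing an input
into one larger multiplication, and an operator-level loop-tiling
transformation applied to a naive matrix-multiplication kernel. The function
names and the tile size T are illustrative assumptions.

    # Illustrative sketch only; not the thesis's actual techniques.
    import numpy as np

    # Graph level. Original graph: Y1 = X @ W1, Y2 = X @ W2 (two matmuls
    # sharing the input X). Equivalent graph: one fused matmul, then a split.
    def original_graph(X, W1, W2):
        return X @ W1, X @ W2

    def rewritten_graph(X, W1, W2):
        Y = X @ np.concatenate([W1, W2], axis=1)   # single larger matmul
        return Y[:, :W1.shape[1]], Y[:, W1.shape[1]:]

    # Operator level. A naive matmul loop nest ("naively implemented
    # low-level code") versus the same computation after loop tiling,
    # which improves data locality without changing the result.
    def matmul_naive(A, B):
        m, k = A.shape
        _, n = B.shape
        C = np.zeros((m, n))
        for i in range(m):
            for j in range(n):
                for p in range(k):
                    C[i, j] += A[i, p] * B[p, j]
        return C

    def matmul_tiled(A, B, T=8):
        m, k = A.shape
        _, n = B.shape
        C = np.zeros((m, n))
        for i0 in range(0, m, T):
            for j0 in range(0, n, T):
                for p0 in range(0, k, T):
                    C[i0:i0+T, j0:j0+T] += A[i0:i0+T, p0:p0+T] @ B[p0:p0+T, j0:j0+T]
        return C

    # Both rewrites change the implementation but preserve the outputs.
    X = np.random.rand(4, 8); W1 = np.random.rand(8, 3); W2 = np.random.rand(8, 5)
    assert all(np.allclose(a, b) for a, b in
               zip(original_graph(X, W1, W2), rewritten_graph(X, W1, W2)))
    A = np.random.rand(16, 16); B = np.random.rand(16, 16)
    assert np.allclose(matmul_naive(A, B), matmul_tiled(A, B))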
Date: Friday, 22 March 2024
Time: 4:00pm - 6:00pm
Venue: Room 5501
Lifts 25/26
Committee Members: Prof. Lei Chen (Supervisor)
Prof. Raymond Wong (Chairperson)
Prof. Qiong Luo
Prof. Ke Yi