Optimizing the Inference Efficiency of Deep Neural Networks on the Graph and the Operator Level
PhD Thesis Proposal Defence

Title: "Optimizing the Inference Efficiency of Deep Neural Networks on the Graph and the Operator Level"

by

Miss Jingzhi FANG

Abstract:

Deep neural networks (DNNs) have achieved great success in many areas, e.g., computer vision and natural language processing. This success is largely driven by increasingly large and computationally intensive models. The increased model size makes the training and inference of DNNs time-consuming, which poses serious problems for the development and deployment of DNNs. It is therefore important to reduce the execution time of DNNs. One way to achieve this goal is to optimize the implementation of a DNN without changing its outputs. The best implementation of a DNN depends on its model architecture, its input workload, and the hardware it runs on. Therefore, each DNN should be optimized individually, taking its runtime information into account. However, the optimization space of DNN implementations is often huge, making it hard to search for the best implementation. Furthermore, the optimization process may need to be carried out many times in practice, e.g., during model architecture design or when the DNN's runtime information is dynamic (such as a dynamic input workload), so a long optimization time can be unaffordable. As a result, the efficiency of optimizing DNN implementations is also of great importance.

In this thesis, we introduce two techniques that accelerate the optimization of DNN implementations while maintaining good optimization effectiveness. Specifically, since a DNN can be represented as a computation graph, where each node corresponds to an operator in the model (e.g., a matrix multiplication) and each edge corresponds to a data dependency between operators, our techniques optimize the DNN implementation at the graph level and the operator level, respectively. The graph-level optimization method searches for an equivalent computation graph for a DNN by iteratively applying transformations to its original computation graph. The operator-level optimization method searches for equivalent low-level code for an operator by applying transformations to its naively implemented low-level code. (A toy sketch of this kind of transformation search is included after the seminar details below.)

Date: Friday, 22 March 2024
Time: 4:00pm - 6:00pm
Venue: Room 5501 (Lifts 25/26)

Committee Members:
Prof. Lei Chen (Supervisor)
Prof. Raymond Wong (Chairperson)
Prof. Qiong Luo
Prof. Ke Yi
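For illustration only, the following is a minimal sketch of the iterative, transformation-based search described in the abstract: starting from an original computation graph, repeatedly apply equivalence-preserving rewrites and keep the fastest variant found. It is not the author's actual algorithm, and all names (Graph, rewrite_rules, rule.apply_all, measure_latency) are hypothetical placeholders.

```python
def optimize_graph(graph, rewrite_rules, measure_latency, max_iters=100):
    """Greedy search over equivalent computation graphs (illustrative sketch only).

    graph           -- hypothetical computation-graph object
    rewrite_rules   -- hypothetical rules; rule.apply_all(g) yields all graphs
                       reachable from g by one equivalence-preserving rewrite
    measure_latency -- hypothetical cost function returning the graph's runtime
    """
    best_graph, best_time = graph, measure_latency(graph)
    for _ in range(max_iters):
        improved = False
        for rule in rewrite_rules:
            # Try every graph reachable by one application of this rewrite rule.
            for candidate in rule.apply_all(best_graph):
                t = measure_latency(candidate)
                if t < best_time:
                    best_graph, best_time, improved = candidate, t, True
        if not improved:  # local optimum: no single rewrite improves latency
            break
    return best_graph
```

A real system would replace the greedy loop with a more sophisticated search strategy and would verify that every rewrite preserves the DNN's outputs; the sketch only conveys the overall iterative structure.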