Network Compression via Quantization and Sparsification

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Network Compression via Quantization and Sparsification"

By

Miss Lu HOU


Abstract

Deep neural network models, though very powerful and highly successful, are 
computationally expensive in terms of both space and time. Recently, there have 
been a number of attempts at compressing the network weights. These methods 
greatly reduce the network size and make it possible to deploy deep models in 
resource-constrained environments.

In this thesis, we focus on two kinds of network compression methods: 
quantization and sparsification. We first propose to directly minimize the loss 
w.r.t. the quantized weights by using the proximal Newton algorithm. We provide 
a closed-form solution for binarization, as well as an efficient approximate 
solution for ternarization and m-bit (where m > 2) quantization. To speed up 
distributed training of weight-quantized networks, we then propose to use 
gradient quantization to reduce the communication cost, and theoretically study 
how the combination of weight and gradient quantization affects convergence. In 
addition, since previous quantization methods usually perform poorly on LSTMs, 
we study why quantized LSTMs are difficult to train, and show that popular 
normalization schemes can help stabilize their training.
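
As a rough illustration of the closed-form binarization step, the sketch below 
(in Python/NumPy, with hypothetical names; it is a simplification of the 
loss-aware formulation, not the exact thesis algorithm) scales the sign of each 
weight by a curvature-weighted mean magnitude, where the diagonal Hessian 
approximation could come from, e.g., Adam's second-moment estimate.

    import numpy as np

    def binarize_loss_aware(w, d, eps=1e-8):
        # Closed-form binarization of one layer's weights:
        #   w : full-precision weight tensor
        #   d : diagonal Hessian approximation (e.g. Adam's second-moment estimate)
        # The quantized weights are alpha * sign(w), where alpha weights each
        # entry's magnitude by its curvature estimate d.
        alpha = np.sum(d * np.abs(w)) / (np.sum(d) + eps)
        return alpha * np.sign(w)

    # Toy usage: with a uniform curvature estimate, alpha reduces to mean(|w|).
    w = np.random.randn(4, 3)
    w_b = binarize_loss_aware(w, np.ones_like(w))

With a uniform d, the step reduces to the familiar mean-magnitude scaling used 
in earlier binarization schemes; the curvature weighting is what makes the step 
loss-aware.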

While weight quantization reduces redundancy in weight representation, 
network sparsification can reduce redundancy in the number of weights. To 
achieve a higher compression rate, we extend the previous quantization-only 
formulation to a more general network compression framework, which allows 
simultaneous quantization and sparsification. Finally, we find that sparse deep 
neural networks obtained by pruning resemble biological neural networks in many 
ways. Inspired by the power-law distributions observed in many biological 
neural networks, we show that these pruned networks also exhibit power-law 
properties, which can be exploited for faster learning and smaller networks in 
continual learning.
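
For intuition on how quantization and sparsification can be combined, the 
following sketch (again with hypothetical names, using a simple 
magnitude-threshold heuristic in the spirit of ternary weight networks rather 
than the thesis's proximal formulation) prunes small weights to zero and 
quantizes the surviving weights to a single shared magnitude.

    import numpy as np

    def ternarize_with_pruning(w, threshold_factor=0.7, eps=1e-8):
        # Simultaneous sparsification + ternarization of one layer:
        # weights below a magnitude threshold are pruned to zero, and the
        # survivors are quantized to +/- alpha (their mean magnitude).
        threshold = threshold_factor * np.mean(np.abs(w))
        mask = np.abs(w) > threshold
        alpha = np.sum(np.abs(w[mask])) / (np.sum(mask) + eps)
        return alpha * np.sign(w) * mask

    w = np.random.randn(256, 128)
    w_q = ternarize_with_pruning(w)
    print("sparsity:", np.mean(w_q == 0))   # fraction of pruned weights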


Date:			Tuesday, 30 July 2019

Time:			10:30am - 12:30pm

Venue:			Room 3494
 			Lifts 25/26

Chairman:		Prof. Chi-Ying Tsui (ISD)

Committee Members:	Prof. James Kwok (Supervisor)
 			Prof. Kai Chen
 			Prof. Dit-Yan Yeung
 			Prof. Yuan Yao (MATH)
 			Prof. Sungroh Yoon (Seoul National University)


**** ALL are Welcome ****