Network Compression via Loss-aware Quantization
PhD Thesis Proposal Defence

Title: "Network Compression via Loss-aware Quantization"

by

Miss Lu HOU

Abstract:

Deep neural network models, though very powerful and highly successful, are computationally expensive in terms of both space and time. Recently, there have been a number of attempts to quantize the network weights and activations. These attempts greatly reduce the network size and open up the possibility of deploying deep models in resource-constrained environments, such as small computing devices. However, most existing quantization schemes are based on simple matrix approximations and ignore the effect of quantization on the loss.

In this thesis, we first propose to directly minimize the loss w.r.t. the quantized weights. The optimization problem can be solved by a proximal Newton algorithm with a diagonal Hessian approximated by the second moments already computed by the RMSProp or Adam optimizer. We show that for binarization, the underlying proximal step has an efficient closed-form solution (an illustrative sketch of such a step follows the announcement below). Experiments on both feedforward and recurrent networks show that the proposed loss-aware binarization algorithm outperforms existing binarization schemes.

Since binarization often causes accuracy degradation on large models, we then extend the loss-aware weight binarization scheme to ternarization and m-bit (where m > 2) quantization. Experiments on both feedforward and recurrent neural networks show that the proposed scheme outperforms state-of-the-art weight quantization algorithms, and is as accurate as (or even more accurate than) the full-precision network.

Though weight-quantized models have small storage and fast inference, training can still be time-consuming. This can be improved with distributed learning. To reduce the high communication cost of gradient synchronization, gradient quantization has also recently been proposed for training deep networks with full-precision weights. We therefore finally study, theoretically, how the combination of weight and gradient quantization affects convergence. Empirical experiments confirm the theoretical convergence results, and demonstrate that quantized networks can speed up training while achieving performance comparable to full-precision networks.

Date: Thursday, 25 April 2019
Time: 4:00pm - 6:00pm
Venue: Room 4475 (lifts 25/26)

Committee Members:
Prof. James Kwok (Supervisor)
Dr. Brian Mak (Chairperson)
Dr. Wei Wang
Prof. Tong Zhang (MATH)

**** ALL are Welcome ****
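
As an illustration of the kind of closed-form proximal step mentioned in the abstract, the short sketch below binarizes one layer's weights using a diagonal curvature estimate taken from Adam/RMSProp-style second moments. The particular form assumed here (a scaling alpha = ||d * w||_1 / ||d||_1 with binary directions b = sign(w)) and all names (loss_aware_binarize, second_moment) are illustrative assumptions for exposition, not the exact formulation used in the thesis.

    import numpy as np

    def loss_aware_binarize(w, second_moment, eps=1e-8):
        # w: full-precision weights of one layer (after the optimizer's update step).
        # second_moment: running average of squared gradients (Adam/RMSProp-style),
        # used here as a diagonal Hessian approximation.
        d = np.sqrt(second_moment) + eps       # nonnegative diagonal curvature proxy
        b = np.sign(w)                         # binary directions in {-1, 0, +1}
        b[b == 0] = 1.0                        # break ties at zero
        alpha = np.abs(d * w).sum() / d.sum()  # curvature-weighted scaling factor
        return alpha * b                       # binarized weights alpha * sign(w)

    # Hypothetical usage: binarize a 256 x 128 weight matrix.
    w = np.random.randn(256, 128)
    v = np.random.rand(256, 128)               # stand-in for the optimizer's second moments
    w_hat = loss_aware_binarize(w, v)

Coordinates with a larger curvature estimate d contribute more to the scaling factor, which is what makes the step "loss-aware" rather than a plain sign-based approximation of the weights.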