PhD Thesis Proposal Defence
Title: "Network Compression via Loss-aware Quantization"
by
Miss Lu HOU
Abstract:
Deep neural network models, though very powerful and highly successful, are
computationally expensive in terms of space and time. Recently, there have been
a number of attempts to quantize the network weights and activations. These
attempts greatly reduce the network size and allow the possibility of deploying
deep models in resource-constrained environments, such as small computing
devices. However, most existing quantization schemes are based on simple matrix
approximations and ignore the effect of quantization on the loss.
In this thesis, we first propose to directly minimize the loss w.r.t. the
quantized weights. The optimization problem can be solved by a proximal Newton
algorithm with a diagonal Hessian approximated by the second moments already
computed by the RMSProp or Adam optimizer.
We show that for binarization, the underlying proximal step has an
efficient closed-form solution. Experiments on both feedforward and
recurrent networks show that the proposed loss-aware binarization
algorithm outperforms existing binarization schemes. Since binarization
often causes accuracy degradation on large models, we then extend the
loss-aware weight binarization scheme to ternarization and m-bit (where m
> 2) quantization. Experiments on both feedforward and recurrent neural
networks show that the proposed scheme outperforms state-of-the-art weight
quantization algorithms, and is as accurate as (or even more accurate
than) the full-precision network.
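
As a rough illustration of the kind of update described above (this sketch is
not taken from the thesis), the following Python snippet shows one loss-aware
binarization step for a single layer's weights, assuming the diagonal Hessian
is approximated by the square root of the optimizer's second-moment estimate;
the function name and interface are hypothetical.

    import numpy as np

    def loss_aware_binarize_step(w_hat, grad, second_moment, eps=1e-8):
        # One proximal Newton step for a single layer, with the diagonal
        # Hessian approximated from the Adam/RMSProp second moment (assumed).
        d = np.sqrt(second_moment) + eps          # diagonal Hessian estimate
        w = w_hat - grad / d                      # preconditioned descent step
        # Closed-form solution of the proximal step under the constraint
        # w_hat = alpha * b, with alpha > 0 and entries of b in {-1, +1}:
        alpha = np.sum(d * np.abs(w)) / np.sum(d) # d-weighted magnitude
        b = np.where(w >= 0, 1.0, -1.0)
        return alpha, b                           # new binarized weights alpha * b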
Though weight-quantized models require little storage and allow fast inference,
training can still be time-consuming. This can be alleviated with distributed
learning. To reduce the high communication cost of gradient synchronization,
gradient quantization has also recently been proposed for training deep
networks with full-precision weights. Finally, we theoretically study how the
combination of weight and gradient quantization affects convergence. Empirical
experiments confirm the theoretical convergence results, and demonstrate that
quantized networks can speed up training while achieving performance comparable
to full-precision networks.
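
To make the last point concrete, here is a minimal sketch (again not from the
thesis) of a single worker's step when both weights and gradients are
quantized; the uniform stochastic-rounding quantizer and the grad_fn callback
are illustrative assumptions, not the schemes analyzed in the thesis.

    import numpy as np

    def stochastic_quantize(x, num_bits=2):
        # Uniform symmetric quantization with stochastic rounding (unbiased).
        scale = np.max(np.abs(x)) + 1e-12
        levels = 2 ** (num_bits - 1) - 1          # e.g. grid {-1, 0, +1} for 2 bits
        y = x / scale * levels
        low = np.floor(y)
        y = low + (np.random.rand(*x.shape) < (y - low))   # round up w.p. fractional part
        return y / levels * scale

    def worker_step(w_full, batch, grad_fn, weight_bits=2, grad_bits=2):
        # grad_fn(w, batch) is a placeholder returning the loss gradient.
        w_q = stochastic_quantize(w_full, weight_bits)     # quantized weights
        g = grad_fn(w_q, batch)                            # gradient at the quantized weights
        return stochastic_quantize(g, grad_bits)           # quantized gradient to transmit

    # The parameter server averages the quantized gradients from all workers
    # and updates the full-precision weights before the next round.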
Date: Thursday, 25 April 2019
Time: 4:00pm - 6:00pm
Venue: Room 4475
(lifts 25/26)
Committee Members: Prof. James Kwok (Supervisor)
Dr. Brian Mak (Chairperson)
Dr. Wei Wang
Prof. Tong Zhang (MATH)
**** ALL are Welcome ****