PhD Thesis Proposal Defence
Title: "Learning Static and Dynamic Sparse Structures for Deep Neural Networks"
by
Mr. Zhourong CHEN
Abstract:
In the past decade, deep neural networks (DNNs) have produced superior results
in a wide range of machine learning applications. However, the structures of
these networks are usually dense and handcrafted by human experts. Learning
sparse structures from data for DNNs remains a challenging problem. In this
thesis, we investigate learning two types of sparse structures for DNNs:
static sparse structures, which are learned from data and then fixed across
input samples, and dynamic sparse structures, which are conditioned on each
individual input sample. Learning such sparse structures is expected to
alleviate overfitting, reduce time and space complexity, and improve the
interpretability of deep models.
For learning static sparse structures, we propose two methods for standard
feedforward neural networks (FNNs): Tree Skeleton Expansion (TSE) and Tree
Receptive Field Growing (TRFG). Both methods rely on learning probabilistic
graphical models (PGMs) to identify groups of strongly correlated units, and
they focus the network connections on modeling those strong correlations. In
TSE, we construct a tree-structured PGM as a skeleton and expand its
connections to form a deep sparse structure for FNNs. TSE is fast, and the
resulting sparse models achieve better performance with far fewer parameters
than dense FNNs. In TRFG, we learn deep structures in a layer-wise manner:
for each layer of units, we build a tree-structured PGM and construct the
next layer by introducing a unit for each local region of the tree. TRFG
efficiently captures the salient correlations at each layer and learns sparse
models with better performance and interpretability than dense FNNs.
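For concreteness, the following is a minimal sketch of the kind of
tree-structured-PGM step that TSE and TRFG build on. The Gaussian
mutual-information estimate, the Chow-Liu-style maximum spanning tree, and
the rule "one next-layer unit per node and its tree neighbours" are
illustrative assumptions for this sketch, not the exact procedure from the
thesis.

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def tree_receptive_fields(activations):
    """activations: (num_samples, num_units) array of one layer's outputs."""
    corr = np.corrcoef(activations, rowvar=False)
    # Gaussian mutual information: MI(i, j) = -0.5 * log(1 - rho^2).
    mi = -0.5 * np.log(np.clip(1.0 - corr ** 2, 1e-12, 1.0))
    np.fill_diagonal(mi, 0.0)
    # Maximum spanning tree over MI = minimum spanning tree over -MI;
    # the tree links the most strongly correlated units.
    tree = minimum_spanning_tree(-mi).toarray()
    adj = (tree != 0) | (tree.T != 0)
    # Each local region (a node plus its tree neighbours) becomes the sparse
    # receptive field of one unit in the next layer.
    return [sorted(map(int, {i, *np.flatnonzero(adj[i])}))
            for i in range(adj.shape[0])]

rng = np.random.default_rng(0)
acts = rng.standard_normal((500, 8))
acts[:, 1] = acts[:, 0] + 0.1 * rng.standard_normal(500)  # correlated pair
print(tree_receptive_fields(acts)[:3])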
For learning dynamic sparse structures, the central problem is how to
configure the network structure for each individual input sample on the fly.
We propose a new framework for this problem in convolutional neural networks
(CNNs), called GaterNet. GaterNet uses a dedicated sub-network to generate
binary gates from the input and, based on the gate values, prunes filters of
the main CNN for that specific input. The result is a dynamic CNN that
effectively processes different samples with different sparse structures. Our
preliminary experiments show that this dynamic pruning significantly improves
the generalization performance of the CNN.
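A minimal sketch of input-dependent filter gating in the spirit of GaterNet
is shown below, assuming a PyTorch setting. The gater architecture, the
straight-through binarization, and all class names here are illustrative
assumptions rather than the thesis's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StraightThroughBinarize(torch.autograd.Function):
    """Binarize gate logits in the forward pass; pass gradients through."""

    @staticmethod
    def forward(ctx, logits):
        return (logits > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

class GatedConvNet(nn.Module):
    def __init__(self, in_channels=3, num_filters=16, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_filters, 3, padding=1)
        # Dedicated gater sub-network: maps the input to one logit per filter.
        self.gater = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, num_filters),
        )
        self.head = nn.Linear(num_filters, num_classes)

    def forward(self, x):
        gates = StraightThroughBinarize.apply(self.gater(x))  # (B, F) in {0,1}
        h = F.relu(self.conv(x))
        # Prune filters for this specific input: zero out gated-off channels,
        # so different samples flow through different sparse structures.
        h = h * gates.unsqueeze(-1).unsqueeze(-1)
        return self.head(h.mean(dim=(2, 3)))

x = torch.randn(4, 3, 32, 32)
print(GatedConvNet()(x).shape)  # torch.Size([4, 10])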
Date: Friday, 10 May 2019
Time: 10:00am - 12:00noon
Venue: Room 5510 (lifts 25/26)
Committee Members: Prof. Nevin Zhang (Supervisor)
Dr. Raymond Wong (Chairperson)
Dr. Yangqiu Song
Prof. Dit-Yan Yeung
**** ALL are Welcome ****