Learning Static and Dynamic Sparse Structures for Deep Neural Networks
PhD Thesis Proposal Defence

Title: "Learning Static and Dynamic Sparse Structures for Deep Neural Networks"

by

Mr. Zhourong CHEN

Abstract:

In the past decade, deep neural networks (DNNs) have produced superior results in a wide range of machine learning applications. However, the structures of these networks are usually dense and handcrafted by human experts. Learning sparse structures from data for DNNs remains a challenging problem in the literature. In this thesis, we investigate learning two types of sparse structures for DNNs. The first are static sparse structures, which are learned from data and fixed across input samples; the second are dynamic sparse structures, which are conditioned on individual input samples. Learning these sparse structures is expected to ease overfitting, reduce time and space complexity, and improve the interpretability of deep models.

For learning static sparse structures, we propose two methods, Tree Skeleton Expansion (TSE) and Tree Receptive Field Growing (TRFG), for standard feedforward neural networks (FNNs). Both methods rely on learning probabilistic graphical models (PGMs) to identify groups of strongly correlated units and focus on modeling those strong correlations. In TSE, we construct a tree-structured PGM as a skeleton and expand the connections in the skeleton to form a deep sparse structure for FNNs. TSE is fast, and the resulting sparse models can achieve better performance with far fewer parameters than dense FNNs. In TRFG, we learn deep structures in a layer-wise manner: for each layer of units, we build a tree-structured PGM and construct the next layer by introducing a unit for each local region in the PGM. TRFG can efficiently capture the salient correlations at different layers and learn sparse models with better performance and interpretability than dense FNNs.

For learning dynamic sparse structures, the most essential problem is how to configure the network structure for each individual input sample on the fly. We propose a new framework called GaterNet for this problem in convolutional neural networks (CNNs). GaterNet uses a dedicated sub-network to generate binary gates from the input and, based on the gate values, prunes filters in the CNN for that specific input. The result is a dynamic CNN that processes different samples with different sparse structures. Our preliminary experiments show that, with the help of this dynamic pruning, the generalization performance of the CNN can be significantly improved.

Date: Friday, 10 May 2019
Time: 10:00am - 12:00noon
Venue: Room 5510 (lifts 25/26)

Committee Members:
Prof. Nevin Zhang (Supervisor)
Dr. Raymond Wong (Chairperson)
Dr. Yangqiu Song
Prof. Dit-Yan Yeung

**** ALL are Welcome ****
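For readers curious about the input-dependent gating idea mentioned in the abstract, the following is a minimal illustrative sketch in PyTorch, not the authors' implementation. The module names, layer sizes, and the use of a straight-through estimator for the non-differentiable binary gates are all assumptions made for illustration.

# Sketch: a gater sub-network produces one binary gate per conv filter
# from the input, so each sample is processed by a different sparse CNN.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryGate(torch.autograd.Function):
    """Hard 0/1 gate in the forward pass; straight-through gradient in backward.
    (The straight-through estimator is an assumption for this sketch.)"""
    @staticmethod
    def forward(ctx, logits):
        return (logits > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass gradients straight through the threshold

class GatedConvNet(nn.Module):
    def __init__(self, num_filters=64, num_classes=10):
        super().__init__()
        # Main network: one conv block whose filters can be switched off.
        self.conv = nn.Conv2d(3, num_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(num_filters, num_classes)
        # Gater sub-network: maps the input to one gate logit per filter.
        self.gater = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(3, num_filters),
        )

    def forward(self, x):
        gates = BinaryGate.apply(self.gater(x))   # (batch, num_filters) in {0, 1}
        h = F.relu(self.conv(x))
        h = h * gates[:, :, None, None]           # prune filters per sample
        h = F.adaptive_avg_pool2d(h, 1).flatten(1)
        return self.fc(h)

model = GatedConvNet()
logits = model(torch.randn(2, 3, 32, 32))        # each sample gets its own gates

Because the gates depend on the input, the two samples in the batch above may activate different subsets of filters, which is the "dynamic sparse structure" the abstract describes.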