PhD Thesis Proposal Defence

Title: "Effective and efficient convolutional architectures for visual recognition"

by

Mr. Ningning MA

Abstract:

Convolutional neural networks can be divided into two categories, static CNNs and dynamic CNNs, according to whether the CNN layers are sample-dependent. Accordingly, this proposal covers two research topics: one is a more efficient improvement of existing methods based on static CNNs, and the other is the exploration of more effective dynamic neural architectures. For static CNNs, we continue to design efficient and accurate building blocks to improve model efficiency. For dynamic CNNs, we present three simple, efficient, and effective methods.

First, we present WeightNet, which decouples the convolutional kernels from the convolutional computation. This differs from the common practice in which all input samples share the same convolutional kernels: there, the kernels are learnable parameters, whereas in our case the kernels are produced by an additional simple network made of fully-connected layers. Our approach is general: it unifies two distinct and highly effective methods, SENet and CondConv, in the same framework on the weight space. We use WeightNet, composed entirely of (grouped) fully-connected layers, to directly output the convolutional weights. This simple change has a large impact: it provides a meta-network design space, improves accuracy significantly, and achieves superior Accuracy-FLOPs and Accuracy-Parameter trade-offs.

Next, we present a new visual activation function we call the funnel activation, which performs the non-linear transformation while simultaneously capturing spatial dependencies. Our method extends ReLU by adding a spatial condition with negligible overhead to replace the hand-designed zero in ReLU, which helps capture complicated visual layouts with regular convolutions. Although it seems a minor change, it has a large impact: it shows great improvements in many visual recognition tasks and even outperforms the more complicated DeformableConv and SENet.

Third, we present a simple, effective, and general activation function we term ACON, which learns whether or not to activate the neurons. Interestingly, we find that Swish, the recently popular NAS-searched activation, can be interpreted as a smooth approximation of ReLU. Intuitively, in the same way, we approximate the more general Maxout family with our novel ACON family, which remarkably improves performance and makes Swish a special case of ACON. We further present meta-ACON, which explicitly learns to optimize the parameter that switches between the non-linear (activated) and linear (inactivated) states, providing a new design space. By simply changing the activation function, we show its effectiveness on both small models and highly optimized large models (e.g., it improves ImageNet top-1 accuracy by 6.7% on MobileNet-0.25 and by 1.8% on ResNet-152). Moreover, ACON transfers naturally to object detection and semantic segmentation, showing that it is an effective alternative in a variety of tasks. (Illustrative code sketches of the three methods appear after the announcement.)

Date: Friday, 22 January 2021
Time: 3:00pm - 5:00pm
Zoom Meeting: https://hkust.zoom.com.cn/j/4468144429

Committee Members:

Prof. Long Quan (Supervisor)
Prof. Chiew-Lan Tai (Chairperson)
Dr. Qifeng Chen
Dr. Ming Liu

**** ALL are Welcome ****
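To make the WeightNet description in the abstract concrete, here is a minimal PyTorch sketch of a sample-dependent convolution whose kernel is emitted by a small fully-connected network. The module and parameter names are illustrative reconstructions, not the authors' released code, and the paper's grouped fully-connected layer is shown as a plain 1x1 convolution for simplicity:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WeightNetConv(nn.Module):
        # Illustrative sample-dependent convolution in the spirit of
        # WeightNet: a tiny network of fully-connected layers (implemented
        # as 1x1 convs on a globally pooled feature) outputs the
        # convolutional kernel, which is then applied to the same input.
        def __init__(self, in_ch, out_ch, k=3, reduction=16):
            super().__init__()
            self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
            hidden = max(in_ch // reduction, 4)
            self.fc1 = nn.Conv2d(in_ch, hidden, 1)
            self.fc2 = nn.Conv2d(hidden, out_ch * in_ch * k * k, 1)

        def forward(self, x):
            b, c, h, w = x.shape
            # Pool the input to a vector, then map it to one kernel per sample.
            z = torch.sigmoid(self.fc1(x.mean((2, 3), keepdim=True)))
            weight = self.fc2(z).view(b * self.out_ch, self.in_ch, self.k, self.k)
            # Fold the batch into channels so each sample is convolved
            # with its own generated kernel (one group per sample).
            out = F.conv2d(x.reshape(1, b * c, h, w), weight,
                           padding=self.k // 2, groups=b)
            return out.view(b, self.out_ch, h, w)

A module like this can stand in wherever a regular nn.Conv2d of the same shape is used; the extra cost is only the small kernel-generating network, which is what enables the decoupling of kernels from the convolutional computation described above.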
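The funnel activation admits an even shorter sketch, again assuming PyTorch. Following the abstract, the hand-designed zero in ReLU's max(x, 0) is replaced by a cheap per-channel spatial condition; a 3x3 depthwise convolution followed by batch normalization is used here as one natural choice:

    import torch
    import torch.nn as nn

    class FunnelActivation(nn.Module):
        # Funnel activation: y = max(x, T(x)), where T is a per-channel
        # (depthwise) spatial condition with negligible overhead that
        # replaces the hand-designed zero in ReLU's max(x, 0).
        def __init__(self, channels, k=3):
            super().__init__()
            self.spatial = nn.Conv2d(channels, channels, k, padding=k // 2,
                                     groups=channels, bias=False)
            self.bn = nn.BatchNorm2d(channels)

        def forward(self, x):
            return torch.max(x, self.bn(self.spatial(x)))

Because the condition is depthwise, the added parameters and FLOPs are negligible next to the surrounding regular convolutions, yet the activation now depends on the local spatial layout rather than a fixed threshold.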
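Finally, a compact sketch of the ACON family and its meta-ACON variant, with illustrative names and initializations. ACON smoothly approximates max(p1*x, p2*x) with a switching factor beta: Swish is recovered at p1 = 1, p2 = 0, and beta -> 0 degenerates to the linear (inactivated) case, which is how the network learns whether or not to activate the neurons:

    import torch
    import torch.nn as nn

    class Acon(nn.Module):
        # Smooth approximation of max(p1*x, p2*x):
        #   (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x.
        # Swish is the special case p1 = 1, p2 = 0; beta -> 0 gives a
        # linear (inactivated) response, beta -> inf the hard maximum.
        def __init__(self, channels):
            super().__init__()
            self.p1 = nn.Parameter(torch.randn(1, channels, 1, 1))
            self.p2 = nn.Parameter(torch.randn(1, channels, 1, 1))
            self.beta = nn.Parameter(torch.ones(1, channels, 1, 1))

        def forward(self, x):
            d = (self.p1 - self.p2) * x
            return d * torch.sigmoid(self.beta * d) + self.p2 * x

    class MetaAcon(Acon):
        # meta-ACON: the switching factor beta is generated per sample by
        # a small bottleneck of 1x1 convolutions on pooled features, so the
        # network explicitly learns to switch between the non-linear
        # (activated) and linear (inactivated) states.
        def __init__(self, channels, reduction=16):
            super().__init__(channels)
            hidden = max(channels // reduction, 4)
            self.fc1 = nn.Conv2d(channels, hidden, 1)
            self.fc2 = nn.Conv2d(hidden, channels, 1)

        def forward(self, x):
            beta = torch.sigmoid(self.fc2(self.fc1(x.mean((2, 3), keepdim=True))))
            d = (self.p1 - self.p2) * x
            return d * torch.sigmoid(beta * d) + self.p2 * x

Either module is a drop-in replacement for an existing activation layer, which is why the abstract can report gains on both small and highly optimized large models by "simply changing the activation function".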