The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Enhancing Attentions in Deep NLP Models"

By

Miss Lanqing XUE


Abstract

Attention is an essential mechanism by which living beings perceive the 
world. Psychologists describe it as the allocation of limited cognitive 
processing resources. Neural attention is a technique motivated by 
cognitive attention. It selectively processes information from its sources 
by computing input-dependent dynamic weights that boost the information 
from relevant portions. Neural attention was first proposed in deep 
learning in 2014 and has developed rapidly over the past decade. A 
milestone in this development is the proposal of the Transformer in 2017, 
the first deep architecture built solely on attention mechanisms, without 
any recurrence or convolution. Today, attention has achieved great success 
in many areas, such as natural language processing, computer vision, and 
social networks, and has become an essential component of neural networks.

In addition to exploring attention in more applications, another 
fascinating line of research explores ways to enhance attention in current 
models. The motivations are to keep networks from being distracted by 
irrelevant information and to improve their interpretability. Existing 
work tends to fall into two categories: one enhances attention by 
improving the attention mechanism itself; the other enhances attention by 
exploiting patterns in the data. In this thesis, we focus on enhancing 
attention in deep natural language processing (NLP) models and introduce 
new methods in both categories. The contributions of this thesis are as 
follows:

First, we propose the Gated Attention Network (GA-Net), a novel sparse 
attention network for sequence data. GA-Net combines the techniques of 
attention and dynamic network configuration: it dynamically selects a 
subset of elements to attend to and filters out irrelevant elements. In 
addition, we design an efficient end-to-end learning method that uses 
Gumbel-softmax to relax the binary gates, enabling back-propagation and 
facilitating GA-Net training. GA-Net outperforms all baseline models with 
global or local attention on text classification tasks and offers better 
interpretability.

Second, we propose DeepRapper, a Transformer-based autoregressive language 
model that carefully models rhymes and rhythms for rap generation. 
DeepRapper adapts the language generation framework and data 
representation to enhance attention to rhyme-related context. 
Specifically, it generates rap lyrics in reverse order with rhyme 
representations and rhyme constraints. To our knowledge, DeepRapper is the 
first system to generate rap with both rhymes and rhythms. Both objective 
and subjective evaluations demonstrate that DeepRapper generates creative, 
high-quality raps with rhymes and rhythms.
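As a toy illustration of the reverse-order idea (not the thesis code), the 
sketch below reverses the tokens of each lyric line so that the rhyming 
word at the end of a line becomes the first token an autoregressive model 
would predict, which makes it straightforward to constrain that token to 
rhyme with the previous line.

def encode_reversed(line):
    # Reverse token order within a line; the final (rhyming) word
    # becomes the first token to be generated.
    return list(reversed(line.split()))

lines = [
    "keep the flow tight and the rhymes in time",
    "every single bar is built to climb",
]
for line in lines:
    print(encode_reversed(line))
# ['time', 'in', 'rhymes', ...] -- the rhyme word comes first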


Date:			Tuesday, 3 August 2021

Time:			10:00am - 12:00noon

Zoom Meeting: 		https://hkust.zoom.us/j/6761083097

Chairperson:		Prof. Wenjing YE (MAE)

Committee Members:	Prof. Nevin ZHANG (Supervisor)
 			Prof. Yangqiu SONG
 			Prof. Raymond WONG
 			Prof. Bing-yi JING (MATH)
 			Prof. Irwin KING (CUHK)


**** ALL are Welcome ****