Enhancing Attentions in Deep NLP Models

PhD Thesis Proposal Defence


Title: "Enhancing Attentions in Deep NLP Models"

by

Miss Lanqing XUE


Abstract:

Attention is an essential mechanism by which living beings perceive the world.
Psychologists describe it as the allocation of limited cognitive processing
resources. Neural attention is a technique motivated by cognitive attention.
It selectively processes information from sources by computing input-dependent
dynamic weights that boost the information from relevant portions. Neural
attention was first introduced in deep learning in 2014 and has developed
rapidly over the past decade. A milestone in this development is the
Transformer, proposed in 2017, which is the first deep architecture built
solely on attention mechanisms, without any recurrence or convolution.
Nowadays, attention has achieved great success in many areas, such as natural
language processing, computer vision, and social networks, and has become an
essential component of neural networks.
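
As a minimal illustration of this idea (a sketch in PyTorch, not taken from
any particular model; the function and variable names are our own), attention
scores each source element against a query, turns the scores into a softmax
distribution, and returns the weighted sum:

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(query, keys, values):
        # Scores depend on the input itself, so the weights are dynamic.
        scores = keys @ query / keys.shape[-1] ** 0.5   # one score per source element
        weights = F.softmax(scores, dim=-1)             # input-dependent dynamic weights
        return weights @ values                         # weighted sum boosts relevant parts

    # toy example: 4 source elements, each an 8-dimensional vector
    keys = values = torch.randn(4, 8)
    query = torch.randn(8)
    context = scaled_dot_product_attention(query, keys, values)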

In addition to exploring attention in more applications, another fascinating
line of research explores ways to enhance attention in existing models. The
motivations are to prevent networks from being distracted by irrelevant
information and to improve their interpretability. Existing works fall into
two categories. One enhances attention by improving the attention mechanism
itself. The other enhances attention by exploiting patterns in the data. In
this thesis, we focus on enhancing attention in deep natural language
processing (NLP) models. We introduce new methods in both categories to
enhance attention. The contributions of this thesis are as follows:

First, we propose the Gated Attention Network (GA-Net), a novel sparse
attention network for sequence data. GA-Net combines attention with dynamic
network configuration: it dynamically selects a subset of elements to attend
to and filters out irrelevant ones. In addition, an efficient end-to-end
learning method based on Gumbel-Softmax is designed to relax the binary gates
and enable back-propagation, facilitating GA-Net training. GA-Net outperforms
all baseline models with global or local attention on text classification
tasks and offers better interpretability.
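
The following is a minimal sketch of the gating idea, assuming PyTorch; the
gate network, the single-vector scorer, the temperature, and the
renormalization are illustrative assumptions rather than GA-Net's exact
design:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GatedAttentionSketch(nn.Module):
        # Illustrative sketch only: a small gate network proposes a binary
        # keep/drop decision per element, and Gumbel-Softmax relaxes these
        # binary gates so back-propagation can train them end to end.
        def __init__(self, dim, temperature=1.0):
            super().__init__()
            self.gate_net = nn.Linear(dim, 2)   # logits for (drop, keep) per element
            self.scorer = nn.Linear(dim, 1)     # ordinary attention scores
            self.temperature = temperature

        def forward(self, x):                                   # x: (batch, seq_len, dim)
            gate_logits = self.gate_net(x)                      # (batch, seq_len, 2)
            # hard=True yields discrete gates in the forward pass while the
            # backward pass uses the relaxed (differentiable) distribution.
            gates = F.gumbel_softmax(gate_logits, tau=self.temperature,
                                     hard=True)[..., 1]         # (batch, seq_len)
            scores = self.scorer(x).squeeze(-1)                 # (batch, seq_len)
            weights = F.softmax(scores, dim=-1) * gates         # zero out filtered elements
            weights = weights / (weights.sum(-1, keepdim=True) + 1e-9)
            return (weights.unsqueeze(-1) * x).sum(dim=1)       # (batch, dim)

    # toy usage
    layer = GatedAttentionSketch(dim=16)
    out = layer(torch.randn(2, 5, 16))      # -> shape (2, 16)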

Second, we propose DeepRapper, a Transformer-based autoregressive language
model that carefully models rhymes and rhythms for rap generation. DeepRapper
adapts the language generation framework and data representation to enhance
attention to rhyme-related context. Specifically, DeepRapper generates rap
lyrics in reverse order with rhyme representations and constraints. To our
knowledge, DeepRapper is the first system to generate rap with both rhymes
and rhythms. Both objective and subjective evaluations demonstrate that
DeepRapper generates creative and high-quality raps with rhymes and rhythms.
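
As a rough illustration of the reverse-order idea (plain Python with made-up
lyrics; this is not DeepRapper's actual data pipeline), reversing each line
places the line-final rhyming word at the beginning, so a left-to-right
autoregressive model conditions on the rhyme-bearing token first:

    def to_reverse_order(lines):
        # Reverse the word order within each line so the line-final (rhyming)
        # word is generated first by a left-to-right language model.
        return [" ".join(line.split()[::-1]) for line in lines]

    lyrics = [
        "I keep the flow tight every night",
        "chasing the spotlight under the light",
    ]
    for reversed_line in to_reverse_order(lyrics):
        print(reversed_line)
    # night every tight flow the keep I
    # light the under spotlight the chasing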


Date:			Tuesday, 18 May 2021

Time:			2:00pm - 4:00pm

Zoom Meeting:		https://hkust.zoom.us/j/3332618904

Committee Members:	Prof. Nevin Zhang (Supervisor)
 			Dr. Yangqiu Song (Chairperson)
 			Prof. Fangzhen Lin
 			Prof. Raymond Wong


**** ALL are Welcome ****