Enhancing Attentions in Deep NLP Models
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Enhancing Attentions in Deep NLP Models"

By

Miss Lanqing XUE

Abstract

Attention is an essential mechanism by which living creatures make sense of the world; psychologists describe it as the allocation of limited cognitive processing resources. Neural attention is a technique motivated by cognitive attention. It selectively processes information from its input by computing input-dependent dynamic weights that boost the contribution of relevant portions. Neural attention was first introduced to deep learning in 2014 and has developed rapidly over the past decade. A milestone in this development was the proposal of the Transformer in 2017, the first deep architecture built solely on attention mechanisms, without any recurrence or convolution. Attention has since achieved great success in many areas, such as natural language processing, computer vision, and social networks, and has become an essential component of neural networks.

In addition to applying attention to more tasks, another fascinating line of research explores ways to enhance attention in existing models. The motivations are to prevent networks from being distracted by irrelevant information and to improve their interpretability. Existing work falls broadly into two categories: one enhances attention by improving the attention mechanism itself; the other enhances attention by exploiting patterns in the data. In this thesis, we focus on enhancing attention in deep natural language processing (NLP) models and introduce new methods in both categories. The contributions of this thesis are as follows.

First, we propose the Gated Attention Network (GA-Net), a novel sparse attention network for sequence data. GA-Net combines attention with dynamic network configuration: it dynamically selects a subset of elements to attend to and filters out irrelevant elements. In addition, an efficient end-to-end learning method based on Gumbel-Softmax relaxes the binary gates and enables back-propagation, facilitating GA-Net training (a minimal sketch of such gated attention appears after the announcement details below). GA-Net outperforms all baseline models with global or local attention on text classification tasks and offers better interpretability.

Second, we propose DeepRapper, a Transformer-based autoregressive language model that carefully models rhymes and rhythms for rap generation. DeepRapper adapts the language generation framework and data representation to enhance attention to rhyme-related context. Specifically, DeepRapper generates rap lyrics in reverse order, with an explicit rhyme representation and rhyme constraint (a small sketch of the reverse-order representation also appears below). To our knowledge, DeepRapper is the first system to generate raps with both rhymes and rhythms. Both objective and subjective evaluations demonstrate that DeepRapper generates creative and high-quality raps.

Date: Tuesday, 3 August 2021
Time: 10:00am - 12:00noon
Zoom Meeting: https://hkust.zoom.us/j/6761083097

Chairperson: Prof. Wenjing YE (MAE)

Committee Members:
Prof. Nevin ZHANG (Supervisor)
Prof. Yangqiu SONG
Prof. Raymond WONG
Prof. Bing-yi JING (MATH)
Prof. Irwin KING (CUHK)

**** ALL are Welcome ****
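
For readers unfamiliar with Gumbel-Softmax gating, the following is a minimal sketch of how a gated attention layer in the spirit of GA-Net could be written in PyTorch. The class name, layer shapes, and pooling behaviour are illustrative assumptions for this announcement, not the thesis implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    """Illustrative sparse attention pooling: a Gumbel-Softmax gate decides
    which positions may receive attention; filtered positions get zero weight.
    (Sketch only; names and shapes are assumptions, not the GA-Net code.)"""

    def __init__(self, hidden_dim: int, temperature: float = 1.0):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)     # per-position attention score
        self.gate_net = nn.Linear(hidden_dim, 2)  # per-position keep/drop logits
        self.temperature = temperature

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_dim) hidden states from a sequence encoder
        gate_logits = self.gate_net(h)                               # (B, T, 2)
        # Gumbel-Softmax relaxes the binary keep/drop decision so that
        # gradients can flow through the (nearly) discrete gates.
        gates = F.gumbel_softmax(gate_logits, tau=self.temperature,
                                 hard=True, dim=-1)[..., 0]          # (B, T), ~0/1
        attn = torch.softmax(self.score(h).squeeze(-1), dim=-1)      # (B, T)
        attn = attn * gates                                          # drop filtered positions
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)        # renormalize
        return torch.einsum('bt,bth->bh', attn, h)                   # attended summary vector

The straight-through sample (hard=True) keeps the forward pass nearly binary while back-propagating through the soft relaxation, which is what makes discrete gates trainable end to end.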
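
Similarly, the reverse-order representation can be illustrated with a small, self-contained sketch: each lyric line is reversed so that the rhyming word (normally the last token) is generated first and every later token can condition on it. The function names and the [SEP] separator are assumptions for illustration, not the DeepRapper codebase.

def to_reverse_order(lines: list[str], sep: str = "[SEP]") -> str:
    """Reverse the tokens of each lyric line and join lines with a separator,
    so an autoregressive model emits the rhyming word of a line first."""
    reversed_lines = [" ".join(line.split()[::-1]) for line in lines]
    return f" {sep} ".join(reversed_lines)

def from_reverse_order(text: str, sep: str = "[SEP]") -> list[str]:
    """Undo the reversal to recover the lyrics in natural reading order."""
    return [" ".join(chunk.split()[::-1]) for chunk in text.split(sep)]

if __name__ == "__main__":
    lyrics = ["I keep the flow so tight", "rhyming all through the night"]
    encoded = to_reverse_order(lyrics)
    print(encoded)  # "tight so flow the keep I [SEP] night the through all rhyming"
    assert from_reverse_order(encoded) == lyrics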