Understanding Transformer in Natural Language Processing
PhD Thesis Proposal Defence

Title: "Understanding Transformer in Natural Language Processing"

by

Mr. Han SHI

Abstract:

Transformer-based models are widely used in natural language processing (NLP) and have shown strong performance on various downstream tasks, such as text classification, text translation, question answering, and text generation. Although Transformer-based models have achieved great success in many fields, few works delve deeply into the Transformer itself. In this proposal, we attempt to analyse and understand the Transformer architecture from three different perspectives.

Firstly, we focus on the self-attention module in the Transformer and propose a differentiable architecture search method to find important attention patterns. In contrast to prior works, we find that the diagonal elements in the attention map can be dropped without harming performance. To explain this observation, we provide a theoretical proof from the perspective of universal approximation. Furthermore, based on our proposed search method, we obtain a series of attention masks for efficient architecture design.

Secondly, we attempt to understand the feed-forward module in the Transformer within a unified framework. Specifically, we introduce the concept of memory tokens and establish the relationship between the feed-forward and self-attention modules. Moreover, we propose a novel architecture named uni-attention, which contains all four types of attention connections in our framework. Uni-attention achieves better performance than previous baselines given the same number of memory tokens.

Finally, we investigate the over-smoothing phenomenon in the whole Transformer architecture. We provide a theoretical analysis by establishing the relationship between self-attention and the graph field. Specifically, we find that layer normalization plays an important role in the over-smoothing problem, and we verify this empirically.
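The claim that diagonal attention entries can be dropped can be illustrated with a small sketch. This is not the proposal's actual search method; the function names and shapes are illustrative, assuming NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V, drop_diagonal=False):
    # scaled dot-product attention over a single sequence;
    # drop_diagonal=True masks each token's attention to itself
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    if drop_diagonal:
        np.fill_diagonal(scores, -np.inf)  # softmax maps -inf to weight 0
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d = 5, 8
Q, K, V = rng.normal(size=(3, n, d))
out, W = self_attention(Q, K, V, drop_diagonal=True)
```

With the diagonal masked, each row of `W` still sums to 1 but assigns zero weight to the token itself, so every output is built purely from the other tokens.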
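The connection between the feed-forward module and attention over memory tokens can also be sketched: each hidden unit of the FFN contributes one key and one value, with ReLU playing the role of the softmax. The names `feed_forward` and `ffn_as_attention` below are illustrative, not the proposal's API:

```python
import numpy as np

def feed_forward(x, W1, W2):
    # standard Transformer FFN (biases omitted): ReLU(x W1) W2
    return np.maximum(x @ W1, 0) @ W2

def ffn_as_attention(x, keys, values):
    # the same computation read as attention over fixed "memory tokens":
    # each hidden unit supplies a key (a column of W1) and a value
    # (a row of W2); ReLU replaces the softmax normalization
    scores = x @ keys.T              # (n_tokens, n_memory)
    weights = np.maximum(scores, 0)  # ReLU instead of softmax
    return weights @ values

rng = np.random.default_rng(0)
n, d, d_ff = 4, 8, 32
x = rng.normal(size=(n, d))
W1 = rng.normal(size=(d, d_ff))
W2 = rng.normal(size=(d_ff, d))
# identical outputs: the FFN is attention over d_ff memory tokens
assert np.allclose(feed_forward(x, W1, W2),
                   ffn_as_attention(x, keys=W1.T, values=W2))
```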
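The over-smoothing phenomenon itself is easy to demonstrate in a stripped-down setting: repeatedly applying a plain self-attention averaging step (no residuals, FFN, or layer normalization, a deliberate simplification of the full architecture) drives token representations toward one another. A minimal sketch, assuming NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mean_cosine_similarity(X):
    # average pairwise cosine similarity between token representations
    Xn = X / np.linalg.norm(X, axis=-1, keepdims=True)
    S = Xn @ Xn.T
    n = len(X)
    return (S.sum() - n) / (n * (n - 1))

rng = np.random.default_rng(1)
n, d = 6, 16
X = rng.normal(size=(n, d))
before = mean_cosine_similarity(X)
for _ in range(20):
    A = softmax(X @ X.T / np.sqrt(d))  # row-stochastic attention matrix
    X = A @ X                          # each token becomes a convex mix
after = mean_cosine_similarity(X)
# tokens that started nearly orthogonal end up almost identical
```

Because each attention matrix is row-stochastic, every update keeps tokens inside the convex hull of the previous ones, which is why the representations collapse toward a common point.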
To alleviate this issue, we propose hierarchical fusion architectures so that the output can be more diverse.

Date: Friday, 22 July 2022
Time: 3:00pm - 5:00pm
Zoom Meeting: https://hkust.zoom.us/j/5599077828

Committee Members:
Prof. James Kwok (Supervisor)
Dr. Brian Mak (Chairperson)
Dr. Hao Chen
Dr. Minhao Cheng

**** ALL are Welcome ****