Leveraging Multi-Grained Global Contexts for Scientific and Social Media Keyphrase Generation

MPhil Thesis Defence


Title: "Leveraging Multi-Grained Global Contexts for Scientific and Social 
Media Keyphrase Generation"

By

Mr. Shizhe DIAO


Abstract

Keyphrase generation aims to produce a set of phrases summarizing the 
essentials of a given document. Conventional methods normally apply an 
encoder-decoder architecture to generate keyphrases for an input document. 
Because they focus on one document at a time, they inevitably omit crucial 
global contexts carried by other relevant documents, e.g., cross-document 
dependencies and latent topics.

In this thesis, we first focus on scientific documents and propose CDKGen, a 
Transformer-based keyphrase generator that extends the Transformer with 
cross-document attention networks. This global attention incorporates 
available documents as references, so that keyphrases are generated with the 
guidance of topic information. Beyond the scientific domain, we also verify 
the effectiveness of our approach in the social media domain. Keyphrase 
generation methods are difficult to transfer directly to this domain, mainly 
because social media posts are often short and extremely informal, so a post 
alone carries insufficient information to infer its keyphrases. To address 
this, we leverage relevant posts and their conversations (replying and 
reposting messages), together with relevant entity relations, to enrich the 
context of the original post. Specifically, we propose MOCHA 
(Multi-grained glObal Contexts Hashtag generAtor), a hashtag generation model 
consisting of two novel modules: RC-ATTENTION and RE-GRAPH. The RC-ATTENTION 
module uses cross-document attention to retrieve relevant posts and 
conversations. The RE-GRAPH module employs a graph attention network to model 
the relevant entity relations.

Experimental results on five scientific document datasets and two social media 
datasets demonstrate the effectiveness of our model, which achieves 
state-of-the-art performance on all datasets. Further analyses show that 
our model is able to generate keyphrases consistent with the topics and 
conversations while maintaining sufficient diversity.


Date:  			Thursday, 12 August 2021

Time:			9:00pm - 11:00pm

Zoom meeting:		https://hkust.zoom.us/j/2627821624

Committee Members:	Prof. Tong Zhang (Supervisor)
 			Prof. Kani Chen (Chairperson, MATH)
 			Dr. Yuan Yao (MATH)


**** ALL are Welcome ****