Leveraging Multi-Grained Global Contexts for Scientific and Social Media Keyphrase Generation
MPhil Thesis Defence

Title: "Leveraging Multi-Grained Global Contexts for Scientific and Social Media Keyphrase Generation"

By

Mr. Shizhe DIAO

Abstract

Keyphrase generation aims to produce a set of phrases that summarize the essentials of a given document. Conventional methods typically apply an encoder-decoder architecture to generate keyphrases for an input document. Because they attend only to the current document, they inevitably omit crucial global contexts carried by other relevant documents, e.g., cross-document dependencies and latent topics.

In this thesis, we first focus on scientific documents and propose CDKGen, a Transformer-based keyphrase generator that extends the Transformer to global attention with cross-document attention networks, incorporating available documents as references so as to generate better keyphrases with the guidance of topic information.

In addition to the scientific domain, we verify the effectiveness of our approach in the social media domain. The nature of social media content makes it difficult to transfer keyphrase generation methods directly to this domain, mainly because posts are often short and extremely informal, so a post alone carries insufficient information to infer its keyphrases. To address this, we leverage relevant posts and their conversations (replying and reposting messages) and relevant entity relations to enrich the context of the original post. Specifically, we propose MOCHA (Multi-grained glObal Contexts Hashtag generAtor), a hashtag generation model consisting of two novel modules: RC-ATTENTION and RE-GRAPH. The RC-ATTENTION module uses cross-document attention to retrieve relevant posts and conversations. The RE-GRAPH module employs a graph attention network to model the relevant entity relations.
Experimental results on five scientific document datasets and two social media datasets demonstrate the validity and effectiveness of our model, which achieves state-of-the-art performance on all datasets. Further analyses show that our model is able to generate keyphrases consistent with the topics and conversations while maintaining sufficient diversity.

Date: Thursday, 12 August 2021

Time: 9:00pm - 11:00pm

Zoom meeting: https://hkust.zoom.us/j/2627821624

Committee Members: Prof. Tong Zhang (Supervisor)
Prof. Kani Chen (Chairperson, MATH)
Dr. Yuan Yao (MATH)

**** ALL are Welcome ****