Commonsense Knowledge Base Population and Reasoning
PhD Thesis Proposal Defence

Title: "Commonsense Knowledge Base Population and Reasoning"

by

Mr. Tianqing FANG

Abstract:

Commonsense Knowledge includes facts about the everyday world that ordinary people are expected to know. It plays a crucial role in Natural Language Processing (NLP) systems, enabling them to make presumptions about common situations encountered by humans. However, acquiring and incorporating Commonsense Knowledge into NLP systems poses challenges, as it is typically implicit and not readily available in standard corpora, hindering the application of downstream commonsense reasoning. To tackle the data scarcity issue, a standard way to study commonsense is to construct Commonsense Knowledge Bases (CSKBs). Previous attempts have focused on 1) human annotation, which has limited scalability, 2) information extraction, which suffers from poor quality and reporting bias, or 3) text generation from Large Language Models (LLMs), which suffers from selection bias and limited novelty of generated knowledge. We propose an alternative commonsense knowledge acquisition framework, called Commonsense Knowledge Base Population (CKBP), which automatically populates expensive complex commonsense knowledge from more affordable linguistic knowledge resources. We establish a benchmark of CKBP based on event-event discourse relations extracted through semantic and discourse parsing of large corpora, and manually annotate 60K populated triples for verification. To carry out the population process, we introduce a graph-neural-network (GNN) based model that leverages the rich contextual information in the knowledge graph as additional supervision signals. Since CKBP is a semi-supervised learning problem with a large amount of unlabeled data (discourse knowledge from large corpora), we also propose a pseudo-labeling based model that achieves excellent performance.
We evaluate the effectiveness of the populated knowledge on downstream commonsense reasoning tasks and observe that it enhances generative commonsense inference and commonsense question answering by providing more diverse knowledge. Furthermore, with knowledge at hand, we explore commonsense reasoning based on Commonsense Knowledge from two perspectives. First, we directly utilize the populated knowledge for downstream commonsense question answering by converting it into question-answering form with templates, serving as supervision data for training QA models. Second, we perform structured reasoning on complex logical queries derived from Commonsense Knowledge graphs. We sample conjunctive logical queries from the graphs and verbalize them using LLMs to generate narratives for both training and evaluating models for complex reasoning. Experimental results demonstrate that while LLMs exhibit proficiency in handling one-hop Commonsense Knowledge, performing complex reasoning involving multiple hops and intersections on Commonsense Knowledge graphs remains challenging. Models trained on complex logical queries show improvement in terms of general narrative understanding and complex commonsense reasoning ability.

Date: Friday, 31 May 2024

Time: 10:00am - 12:00noon

Venue: Room 4475 (Lifts 25/26)

Committee Members:
Dr. Yangqiu Song (Supervisor)
Dr. Junxian He (Chairperson)
Dr. Brian Mak
Dr. Dan Xu