Commonsense Knowledge Base Population and Reasoning

PhD Thesis Proposal Defence


Title: "Commonsense Knowledge Base Population and Reasoning"

by

Mr. Tianqing FANG


Abstract:

Commonsense Knowledge includes facts about the everyday world that ordinary 
people are expected to know. It plays a crucial role in Natural Language 
Processing (NLP) systems, enabling them to make assumptions about common 
situations encountered by humans. However, acquiring and incorporating 
Commonsense Knowledge into NLP systems is challenging, as such knowledge is 
typically implicit and not readily available in standard corpora, which 
hinders downstream commonsense reasoning applications.

A standard way to tackle this data scarcity issue is to construct Commonsense 
Knowledge Bases (CSKBs). Previous attempts have relied on 1) human annotation, 
which has limited scalability, 2) information extraction, which suffers from 
poor quality and reporting bias, or 3) text generation with Large Language 
Models (LLMs), which suffers from selection bias and limited novelty of the 
generated knowledge.

We propose an alternative commonsense knowledge acquisition framework, called 
Commonsense Knowledge Base Population (CKBP), which automatically populates 
expensive complex commonsense knowledge from more affordable linguistic 
knowledge resources. We establish a CKBP benchmark based on event-event 
discourse relations extracted through semantic and discourse parsing of large 
corpora, and manually annotate 60K populated triples for verification.
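To make the population step concrete, the sketch below shows one way a 
discourse edge between two events could be turned into unverified candidate 
commonsense triples. The relation names (xEffect, xIntent, etc.) and the 
discourse-to-commonsense mapping are illustrative assumptions only, not the 
thesis's actual rules.

    # Hypothetical sketch of candidate population from a discourse edge.
    # Relation names follow the ATOMIC convention purely for illustration.
    DISCOURSE_TO_CS = {
        "Result": ["xEffect", "xReact"],
        "Reason": ["xIntent", "xNeed"],
    }

    def populate_candidates(head_event, discourse_rel, tail_event):
        """Generate unverified candidate triples from one discourse edge."""
        return [(head_event, cs_rel, tail_event)
                for cs_rel in DISCOURSE_TO_CS.get(discourse_rel, [])]

    # populate_candidates("PersonX is hungry", "Result", "PersonX eats lunch")
    # -> [("PersonX is hungry", "xEffect", "PersonX eats lunch"),
    #     ("PersonX is hungry", "xReact", "PersonX eats lunch")]

Candidates produced this way are what the manual annotation of 60K triples is 
meant to verify.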

To carry out the population process, we introduce a graph neural network 
(GNN) based model that leverages the rich contextual information in the 
knowledge graph as additional supervision signals. Since CKBP is a 
semi-supervised learning problem with a large amount of unlabeled data 
(discourse knowledge from large corpora), we also propose a pseudo-labeling 
based model that achieves strong performance. We evaluate the effectiveness 
of the populated knowledge on downstream commonsense reasoning tasks and 
observe that it enhances generative commonsense inference and commonsense 
question answering by providing more diverse knowledge.
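For readers unfamiliar with pseudo-labeling, the following minimal sketch 
shows the generic self-training loop it refers to: a classifier trained on the 
small labeled set labels the unlabeled pool, and its confident predictions are 
added back as pseudo-labels. The classifier, confidence threshold, and number 
of rounds here are placeholders, not the thesis's triple-scoring model.

    # Minimal pseudo-labeling loop (illustrative, not the thesis model).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def pseudo_label_loop(X_lab, y_lab, X_unlab, threshold=0.9, rounds=3):
        X, y = X_lab.copy(), y_lab.copy()
        model = LogisticRegression()
        for _ in range(rounds):
            model.fit(X, y)
            probs = model.predict_proba(X_unlab)        # confidence on unlabeled data
            confident = probs.max(axis=1) >= threshold  # keep confident predictions only
            if not confident.any():
                break
            X = np.vstack([X, X_unlab[confident]])
            y = np.concatenate([y, probs[confident].argmax(axis=1)])
            X_unlab = X_unlab[~confident]
        return model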

Furthermore, with the populated knowledge at hand, we explore commonsense 
reasoning from two perspectives. First, we directly utilize the populated 
knowledge for downstream commonsense question answering by converting it into 
question-answering form with templates, which serves as supervision data for 
training QA models. Second, we perform structured reasoning on complex logical 
queries derived from Commonsense Knowledge graphs. We sample conjunctive 
logical queries from the graphs and verbalize them using LLMs to generate 
narratives for both training and evaluating models for complex reasoning. 
Experimental results demonstrate that while LLMs handle one-hop Commonsense 
Knowledge proficiently, complex reasoning involving multiple hops and 
intersections over Commonsense Knowledge graphs remains challenging. Models 
trained on complex logical queries show improvement in both general narrative 
understanding and complex commonsense reasoning ability.
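As a rough illustration of these two set-ups, the sketch below converts a 
triple into a QA pair with a template and samples a simple intersection 
(conjunctive) query from a toy graph. The templates, relation names, and 
graph are hypothetical; the actual templates, query structures, and LLM 
verbalization step are defined in the thesis.

    # (a) Template-based conversion of a commonsense triple into a QA pair.
    import random

    TEMPLATES = {
        "xWant": "After {head}, what does PersonX want to do?",
        "xEffect": "What happens to PersonX as a result of {head}?",
    }

    def triple_to_qa(head, relation, tail):
        return {"question": TEMPLATES[relation].format(head=head),
                "answer": tail}

    # (b) Sampling an intersection query q = V? . (e1, r1, V?) AND (e2, r2, V?):
    # answers are tails reachable from both anchor events via their relations.
    def sample_intersection_query(edges):
        (h1, r1), (h2, r2) = random.sample(list(edges), 2)
        answers = edges[(h1, r1)] & edges[(h2, r2)]
        return {"anchors": [(h1, r1), (h2, r2)], "answers": answers}

    edges = {
        ("PersonX is hungry", "xWant"): {"eat a meal", "order food"},
        ("PersonX skips breakfast", "xEffect"): {"eat a meal", "feel tired"},
    }
    print(triple_to_qa("PersonX is hungry", "xWant", "eat a meal"))
    print(sample_intersection_query(edges))

In the thesis, such sampled queries are further verbalized by LLMs into 
narratives used for training and evaluation.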


Date:                   Friday, 31 May 2024

Time:                   10:00am - 12:00noon

Venue:                  Room 4475
                        Lifts 25/26

Committee Members:      Dr. Yangqiu Song (Supervisor)
                        Dr. Junxian He (Chairperson)
                        Dr. Brian Mak
                        Dr. Dan Xu