Commonsense Knowledge Base Population and Reasoning for Inferential Knowledge

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Commonsense Knowledge Base Population and Reasoning for Inferential 
Knowledge"

By

Mr. Tianqing FANG


Abstract:

Commonsense knowledge includes facts about the everyday world that ordinary 
people are expected to know. It plays a crucial role in natural language 
processing (NLP) systems, enabling them to make presumptions about common 
situations encountered by humans. However, acquiring and incorporating 
commonsense knowledge into NLP systems poses challenges, as such knowledge is 
typically implicit and not readily available in standard corpora.

To tackle this data scarcity issue, a standard approach is to construct 
commonsense knowledge bases (CSKBs). Previous attempts have focused on (1) 
human annotation, which is expensive and has limited scalability; (2) 
information extraction, which suffers from relatively poor quality and 
reporting bias; or (3) text generation from Large Language Models (LLMs), 
which suffers from selection bias and limited novelty of the generated 
knowledge. Moreover, eliciting commonsense knowledge from LLMs itself requires 
fine-tuning on large-scale corpora and human-annotated commonsense data in the 
first place.

We propose an alternative commonsense knowledge acquisition framework, called 
Commonsense Knowledge Base Population (CKBP), which automatically populates 
complex commonsense knowledge from more affordable linguistic knowledge 
resources. We establish a benchmark for CKBP based on event-event discourse 
relations extracted through semantic and discourse parsing of large corpora, 
and we manually annotate 60K populated triples for verification.
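
As a minimal illustration of the population setup (the relation names, the 
discourse-to-commonsense mapping, and all identifiers below are hypothetical 
stand-ins, not the thesis's actual pipeline), a parsed discourse edge between 
two eventualities can be turned into candidate commonsense triples for a 
population model to verify:

    # Sketch: form population candidates from a parsed discourse relation.
    # The mapping and field names are illustrative assumptions only.
    from dataclasses import dataclass

    # Hypothetical mapping from discourse relations (output of a discourse
    # parser) to commonsense relations they may plausibly signal.
    DISCOURSE_TO_CS = {
        "Result": ["xEffect", "oEffect"],
        "Reason": ["xIntent", "xNeed"],
        "Condition": ["xNeed"],
    }

    @dataclass
    class Candidate:
        head: str       # head eventuality, e.g. "PersonX works hard"
        relation: str   # candidate commonsense relation
        tail: str       # tail eventuality, e.g. "PersonX gets promoted"

    def make_candidates(head_event, discourse_rel, tail_event):
        """Turn one (event, discourse relation, event) edge into candidate
        commonsense triples to be scored by a population model."""
        return [Candidate(head_event, r, tail_event)
                for r in DISCOURSE_TO_CS.get(discourse_rel, [])]

    # Example: one "Result" edge yields two candidates for later scoring.
    for c in make_candidates("PersonX works hard", "Result",
                             "PersonX gets promoted"):
        print(c)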

To carry out the population process, we introduce a Graph Neural Network 
(GNN)-based model that leverages the rich contextual information in the 
knowledge graph as additional supervision signals. Since CKBP is a 
semi-supervised learning problem with a large amount of unlabeled data 
(discourse knowledge from large corpora), we also propose a 
pseudo-labeling-based model that achieves strong performance on the benchmark. 
We evaluate the effectiveness of the populated knowledge on downstream 
commonsense reasoning tasks and observe that it enhances generative 
commonsense inference and commonsense question answering by providing more 
diverse knowledge.
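
A minimal sketch of the pseudo-labeling idea, with toy random features and a 
toy classifier standing in for the actual triple encoder and population model: 
a model trained on labeled triples assigns labels to high-confidence unlabeled 
triples, which are then added back into the training set.

    # Sketch of self-training via pseudo-labeling; features, labels, and
    # thresholds are toy assumptions, not the thesis's actual model.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_labeled = rng.normal(size=(200, 16))         # encoded labeled triples
    y_labeled = (X_labeled[:, 0] > 0).astype(int)  # toy plausibility labels
    X_unlabeled = rng.normal(size=(1000, 16))      # encoded unlabeled triples

    model = LogisticRegression().fit(X_labeled, y_labeled)

    for _ in range(3):  # a few self-training rounds
        probs = model.predict_proba(X_unlabeled)[:, 1]
        confident = (probs > 0.9) | (probs < 0.1)  # keep confident triples
        X_pseudo = X_unlabeled[confident]
        y_pseudo = (probs[confident] > 0.5).astype(int)
        model = LogisticRegression().fit(
            np.vstack([X_labeled, X_pseudo]),
            np.concatenate([y_labeled, y_pseudo]),
        )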

Furthermore, with the populated knowledge at hand, we explore commonsense 
reasoning from two perspectives. First, we directly utilize the 
populated knowledge for downstream commonsense question answering by converting 
it into question-answering (QA) form with templates, serving as supervision 
data for training QA models and generative commonsense inference models. 
Second, we perform reasoning on complex logical queries derived from 
commonsense knowledge graphs. We sample conjunctive logical queries from the 
knowledge graphs and verbalize them using LLMs to generate narratives for both 
training and evaluating models for complex reasoning. Experimental results 
demonstrate that while LLMs exhibit proficiency in handling one-hop commonsense 
knowledge, performing complex reasoning involving multiple hops and 
intersections on commonsense knowledge graphs remains challenging. Models 
trained on complex logical queries show improvement in terms of general 
narrative understanding and complex commonsense reasoning ability.
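
As a minimal sketch (the toy graph, relation names, and sampling routine are 
illustrative assumptions), an intersection query V? : r1(a, V) AND r2(b, V) 
can be sampled by picking two edges whose tail sets overlap; the shared tails 
form the answer set, and the query is then verbalized into a narrative:

    # Sketch of sampling a conjunctive (intersection) query from a
    # commonsense knowledge graph; the graph contents are toy examples.
    import random

    # Toy graph: (head eventuality, relation) -> set of tail eventualities
    graph = {
        ("PersonX is hungry", "xWant"): {"to eat food", "to order pizza"},
        ("PersonX is in a hurry", "xWant"): {"to order pizza",
                                             "to take a taxi"},
    }

    def sample_intersection_query(graph):
        """Pick two edges sharing at least one tail; the shared tails are
        the answer set of the conjunctive query."""
        keys = list(graph)
        while True:
            (h1, r1), (h2, r2) = random.sample(keys, 2)
            answers = graph[(h1, r1)] & graph[(h2, r2)]
            if answers:
                return (h1, r1), (h2, r2), answers

    random.seed(0)
    q1, q2, answers = sample_intersection_query(graph)
    print(f"V? : {q1[1]}({q1[0]!r}, V) AND {q2[1]}({q2[0]!r}, V)"
          f" -> {answers}")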


Date:                   Thursday, 8 August 2024

Time:                   4:00pm - 6:00pm

Venue:                  Room 5510
                        Lifts 25/26

Chairman:               Prof. Irene Man Chi LO (CIVL)

Committee Members:      Dr. Yangqiu SONG (Supervisor)
                        Dr. Junxian HE
                        Dr. Brian MAK
                        Dr. Jing WANG (ISOM)
                        Dr. Wen HUA (PolyU)