PhD Thesis Proposal Defence
Title: "Commonsense Knowledge Base Population and Reasoning"
by
Mr. Tianqing FANG
Abstract:
Commonsense knowledge comprises facts about the everyday world that ordinary
people are expected to know. It plays a crucial role in Natural Language
Processing (NLP) systems, enabling them to make presumptions about common
situations encountered by humans. However, acquiring commonsense knowledge and
incorporating it into NLP systems is challenging, as such knowledge is
typically implicit and not readily available in standard corpora, which
hinders downstream commonsense reasoning.
To tackle this data scarcity issue, a standard way to study commonsense is to
construct Commonsense Knowledge Bases (CSKBs). Previous attempts have focused
on 1) human annotation, which has limited scalability, 2) information
extraction, which suffers from poor quality and reporting bias, or 3) text
generation with Large Language Models (LLMs), which suffers from selection
bias and limited novelty of the generated knowledge.
We propose an alternative commonsense knowledge acquisition framework, called
Commonsense Knowledge Base Population (CKBP), which automatically populates
expensive, complex commonsense knowledge from more affordable linguistic
knowledge resources. We establish a CKBP benchmark based on event-event
discourse relations extracted through semantic and discourse parsing of large
corpora, and manually annotate 60K populated triples for verification.
To carry out the population process, we introduce a graph neural network
(GNN) based model that leverages the rich contextual information in the
knowledge graph as additional supervision signals. Since CKBP is a
semi-supervised learning problem with a large amount of unlabeled data
(discourse knowledge from large corpora), we also propose a
pseudo-labeling-based model that achieves strong performance. We evaluate the
effectiveness of the populated knowledge on downstream commonsense reasoning
tasks and observe that it enhances generative commonsense inference and
commonsense question answering by providing more diverse knowledge.
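For intuition, the pseudo-labeling idea can be sketched as a simple
teacher-student loop. The classifier, features, and confidence thresholds
below are toy stand-ins for illustration, not the actual model from the
thesis:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Toy stand-ins: feature vectors for annotated triples and for
    # unlabeled candidate triples extracted from discourse relations.
    X_labeled = rng.normal(size=(200, 16))
    y_labeled = (X_labeled[:, 0] > 0).astype(int)  # 1 = plausible triple
    X_unlabeled = rng.normal(size=(1000, 16))

    # 1) Train a teacher on the small annotated set.
    teacher = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

    # 2) Pseudo-label unlabeled triples the teacher is confident about.
    probs = teacher.predict_proba(X_unlabeled)[:, 1]
    confident = (probs > 0.9) | (probs < 0.1)
    X_pseudo = X_unlabeled[confident]
    y_pseudo = (probs[confident] > 0.5).astype(int)

    # 3) Retrain a student on annotated plus pseudo-labeled data.
    student = LogisticRegression(max_iter=1000).fit(
        np.vstack([X_labeled, X_pseudo]),
        np.concatenate([y_labeled, y_pseudo]),
    )
    print(f"kept {confident.sum()} pseudo-labeled triples")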
Furthermore, with this knowledge in hand, we explore commonsense reasoning
from two perspectives. First, we directly utilize the populated knowledge for
downstream commonsense question answering by converting it into
question-answer pairs with templates, which serve as supervision data for
training QA models.
training QA models. Second, we perform structured reasoning on complex logical
queries derived from Commonsense Knowledge graphs. We sample conjunctive
logical queries from the graphs and verbalize them using LLMs to generate
narratives for both training and evaluating models for complex reasoning.
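To make the query structure concrete, here is a minimal sketch of a
conjunctive (intersection) query over a toy commonsense knowledge graph; the
thesis verbalizes sampled queries with LLMs, whereas the graph, relation, and
fixed template here are invented for illustration:

    # Toy KG edges: (head event, relation, tail event), ATOMIC-style.
    EDGES = [
        ("PersonX cooks dinner", "xEffect", "PersonX feels tired"),
        ("PersonX cooks dinner", "xEffect", "the kitchen gets messy"),
        ("PersonX hosts a party", "xEffect", "the kitchen gets messy"),
        ("PersonX hosts a party", "xEffect", "PersonX feels happy"),
    ]

    def answer_set(head, rel):
        """All tails reachable from `head` via `rel` (one-hop answers)."""
        return {t for (h, r, t) in EDGES if h == head and r == rel}

    # A conjunctive query xEffect(A) AND xEffect(B): its answers are the
    # intersection of two one-hop answer sets.
    a, b = "PersonX cooks dinner", "PersonX hosts a party"
    answers = answer_set(a, "xEffect") & answer_set(b, "xEffect")

    # Fixed-template verbalization standing in for LLM-generated narratives.
    print(f"What tends to happen both after {a} and after {b}?", "->", answers)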
Experimental results demonstrate that while LLMs are proficient at handling
one-hop commonsense knowledge, complex reasoning involving multiple hops and
intersections over commonsense knowledge graphs remains challenging. Models
trained on the complex logical queries improve in both general narrative
understanding and complex commonsense reasoning.
Date: Friday, 31 May 2024
Time: 10:00am - 12:00noon
Venue: Room 4475
Lifts 25/26
Committee Members: Dr. Yangqiu Song (Supervisor)
Dr. Junxian He (Chairperson)
Dr. Brian Mak
Dr. Dan Xu