KNOWLEDGE BASE REFINEMENT WITH INTERNAL AND EXTERNAL DATA
PhD Thesis Proposal Defence
Title: "KNOWLEDGE BASE REFINEMENT WITH INTERNAL AND EXTERNAL DATA"
by
Mr. Hao XIN
Abstract:
In the contemporary digital age, a multitude of publicly accessible knowledge
bases (KBs) have been established to bolster knowledge-centric applications
such as search engines and online recommendations.
Nonetheless, these knowledge bases suffer from incomplete knowledge. First,
certain domain-specific knowledge remains largely uncovered. For instance,
current KBs concentrate primarily on encoding factual data, i.e., objective
knowledge, leaving subjective knowledge underrepresented. Second, in dynamic
real-world scenarios where information evolves constantly, these KBs struggle
to keep up with emerging data and thus remain incomplete.
In this thesis, we investigate the knowledge base refinement task using both
external and internal data sources, addressing three major research problems.
Firstly, we tackle the issue of enriching subjective domain knowledge, aiming
to bridge the gap between existing KBs and subjective knowledge. We propose a
framework for enriching knowledge bases with subjective knowledge, leveraging
knowledge from the crowd and existing KBs.
Secondly, we examine the problem of populating knowledge bases, which involves
extracting knowledge from unstructured text that aligns with the schema of the
target KBs, thereby enriching them. We propose a comprehensive system that
takes an incomplete target KB and documents as input and outputs concise
triples. It first performs joint entity and relation linking to the existing KB based
on both the context of the document and background KB information. It then
summarizes the extracted facts by considering their relevance to the document
and the diversity among them.
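The abstract does not specify how relevance and diversity are combined when
summarizing extracted facts. As a rough illustration only, the sketch below
uses a greedy, maximal-marginal-relevance-style selection, which is a
swapped-in standard technique and not necessarily the method of the thesis.
The names `select_diverse` and `jaccard`, the relevance scores, and the
trade-off weight `lam` are all hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def jaccard(a: Triple, b: Triple) -> float:
    """Toy similarity (assumption): overlap between the elements of two triples."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def select_diverse(relevance: Dict[Triple, float], k: int, lam: float = 0.5) -> List[Triple]:
    """Greedily pick up to k triples, trading relevance against redundancy."""
    selected: List[Triple] = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def score(t: Triple) -> float:
            # Redundancy is the highest similarity to any triple already chosen.
            redundancy = max((jaccard(t, s) for s in selected), default=0.0)
            return lam * relevance[t] - (1 - lam) * redundancy
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two near-duplicate facts and one distinct fact, such a selection would
tend to keep one of the duplicates plus the distinct fact, even when the
second duplicate has the higher relevance score on its own.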
Thirdly, we investigate the issue of updating knowledge bases, which involves
identifying and updating outdated facts in KBs. We employ the revision history
of the target KB to learn how to identify outdated facts and propose a
cost-aware fact selection algorithm to guide the fact update process.
Furthermore, we explore the problem of Knowledge Update Rule Discovery (KURD),
which seeks to derive an optimal subset of knowledge update rules for
performing knowledge updating, taking into account rule quality and coverage.
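The abstract does not state how the rule subset is chosen. To make the
trade-off between rule quality and coverage concrete, the sketch below
assumes each rule comes with a quality score in [0, 1] and the set of facts it
covers, and greedily picks rules by quality-weighted marginal coverage. This is
a generic heuristic used for illustration, not the KURD algorithm itself; the
function `select_rules` and its input format are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def select_rules(rules: Dict[str, Tuple[float, Set[str]]], k: int) -> List[str]:
    """Greedily pick up to k rules by quality-weighted marginal coverage.

    rules maps a rule name to (quality in [0, 1], set of fact ids it covers).
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(rules)
    while remaining and len(chosen) < k:
        def gain(name: str) -> float:
            quality, facts = remaining[name]
            # Only facts not yet covered by a chosen rule count as gain.
            return quality * len(facts - covered)
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # no remaining rule covers anything new
        chosen.append(best)
        covered |= remaining.pop(best)[1]
    return chosen
```

Under these assumptions, a high-quality rule whose facts are already covered
by earlier choices contributes nothing and is skipped in favour of a
lower-quality rule that covers new facts.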
We validate the effectiveness and efficiency of the proposed solutions for each
of the aforementioned problems against cutting-edge techniques, through
comprehensive experiments on real-world datasets. Finally, we conclude the
thesis by outlining future research directions and challenges pertaining to the
task of refining knowledge bases.
Date: Friday, 10 May 2024
Time: 2:00pm - 4:00pm
Venue: Room 4475
Lifts 25/26
Committee Members: Prof. Lei Chen (Supervisor)
Prof. Qiong Luo (Chairperson)
Prof. Ke Yi
Dr. Nan Tang (HKUST-GZ)