KNOWLEDGE BASE REFINEMENT WITH INTERNAL AND EXTERNAL DATA

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "KNOWLEDGE BASE REFINEMENT WITH INTERNAL AND EXTERNAL DATA"

By

Mr. Hao XIN


Abstract:

In the contemporary digital age, many publicly accessible knowledge bases (KBs)
have been built to support knowledge-driven applications, including search
engines, recommendation systems, and retrieval-augmented generation models.
Nonetheless, these knowledge bases suffer from incompleteness. First, some kinds
of domain-specific knowledge remain largely uncharted: current KBs mainly encode
factual data, i.e., objective knowledge, and cover little subjective knowledge.
Second, in dynamic real-world scenarios where information is constantly
evolving, these KBs struggle to keep up with emerging data and therefore become
incomplete. This thesis explores the refinement of knowledge bases using both
internal and external data sources.

First, we tackle the problem of enriching KBs with subjective domain knowledge,
aiming to bridge the gap between existing KBs and subjective knowledge. To this
end, we propose a framework that draws on knowledge from both the crowd and
existing KBs.

Second, we examine the problem of populating knowledge bases, which involves
extracting knowledge from unstructured text that conforms to the schema of the
target KB, thereby enriching it. We propose a comprehensive system that takes an
incomplete target KB and a set of documents as input and outputs a concise set
of triples. It first performs joint entity and relation linking to the existing
KB, using both the context of the document and background information from the
KB. It then summarizes the extracted facts by considering both their relevance
to the document and the diversity among them.
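
The abstract does not specify how relevance and diversity are traded off when
summarizing the extracted facts. One common way to express such a trade-off is
maximal marginal relevance (MMR). The sketch below is a hypothetical
illustration of that general idea, not the thesis's algorithm; the relevance
scores, the triple similarity function `jaccard`, the weight `lam`, and the
budget `k` are all assumptions.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def jaccard(a: Triple, b: Triple) -> float:
    """Hypothetical similarity: overlap between the elements of two triples."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def mmr_select(relevance: Dict[Triple, float], k: int, lam: float = 0.7) -> List[Triple]:
    """Greedily pick up to k triples, trading relevance against redundancy (MMR).

    relevance: assumed precomputed relevance of each triple to the document.
    lam: weight on relevance; (1 - lam) weights the redundancy penalty.
    """
    selected: List[Triple] = []
    candidates = list(relevance)  # a list keeps tie-breaking deterministic
    while candidates and len(selected) < k:
        def score(t: Triple) -> float:
            # Penalize a candidate by its highest similarity to any chosen triple.
            redundancy = max((jaccard(t, s) for s in selected), default=0.0)
            return lam * relevance[t] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` the selection reduces to ranking by relevance alone; lowering
`lam` lets a less relevant but non-redundant triple displace a near-duplicate of
one already chosen.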

Third, we investigate the problem of updating knowledge bases, which involves
identifying and updating outdated facts in KBs. We use the revision history of
the target KB to learn how to identify outdated facts, and we propose a
cost-aware fact selection algorithm to guide the fact update process.
Furthermore, we explore the problem of Knowledge Update Rule Discovery (KURD),
which seeks an optimal subset of knowledge update rules for knowledge updating,
taking both rule quality and coverage into account.

The effectiveness and efficiency of the proposed solution to each problem are
validated through extensive experiments on real-world datasets, in comparison
with state-of-the-art techniques. The thesis concludes by outlining future
research directions and challenges.


Date:                   Friday, 16 August 2024

Time:                   2:00pm - 4:00pm

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               

Committee Members:      Prof. Lei CHEN (Supervisor)
                        Prof. Qiong LUO
                        Prof. Raymond WONG
                        Dr. Can YANG (MATH)
                        Prof. Jianliang XU (HKBU)