Multi-Schema Entity Resolution

MPhil Thesis Defence


Title: "Multi-Schema Entity Resolution"

By

Miss Qiong Huang


Abstract

Entity resolution (ER) is the problem of identifying and merging records
that are judged to represent the same real-world entity. Most previous ER
approaches assume a unified schema (or global schema) under which all
records are compared and merged on a field-by-field basis. We consider the
multi-schema ER problem, in which records come from multiple sources with
different schemas. A prime example is information integration over the
deep web, where the goal is to integrate data from heterogeneous sources.

We formalize the multi-schema ER problem, investigate properties that
hold in the unified-schema setting but not in the multi-schema setting,
and identify the resolution conflicts that can occur when previous ER
approaches are applied in the multi-schema setting. We then propose the
VEOS algorithm, which is free from such conflicts and can also take
advantage of order scheduling to improve accuracy.

We identify schema-level and data-level criteria for distinguishing the
more reliable comparisons, so that performing those comparisons first is
expected to yield a more accurate result. To leverage this information, we
propose constructing a confidence graph, on which our scheduling algorithm
is built. Our experiments on real online shopping data show that (1) our
scheduling algorithm improves accuracy considerably, and (2) VEOS with
scheduling outperforms other methods in both accuracy and efficiency.


Date:				Monday, 23 June 2008

Time:				2:00 p.m. - 4:00 p.m.

Venue:				Room 3501
				Lifts 25-26

Committee Members:		Prof. Frederick Lochovsky (Supervisor)
				Prof. Dik-Lun Lee (Chairperson)
				Dr. Lei Chen


**** ALL are Welcome ****