Multi-Schema Entity Resolution
MPhil Thesis Defence

Title: "Multi-Schema Entity Resolution"

By Miss Qiong Huang

Abstract

Entity resolution (ER) is the problem of identifying and merging records judged to represent the same real-world entity. Most previous ER approaches assume a unified schema (or global schema) under which all records are compared and merged on a field-by-field basis. We consider the multi-schema ER problem, in which records come from multiple sources with different schemas. A prime example is information integration over the deep web, where the goal is to integrate data from heterogeneous sources.

We formalize the multi-schema ER problem, investigate properties that hold in the unified-schema setting but not in the multi-schema setting, and identify the resolution conflicts that can occur when previous ER approaches are applied in the multi-schema setting. We then propose the VEOS algorithm, which is free from such conflicts and can also take advantage of order scheduling to improve accuracy. We identify schema-level and data-level criteria for distinguishing the more reliable comparisons, so that performing those comparisons first is expected to yield a more accurate result. To leverage this information, we propose constructing a confidence graph, on which our scheduling algorithm is built.

Our experiments using real online shopping data show that: (1) our scheduling algorithm improves accuracy considerably, and (2) VEOS with scheduling outperforms other methods in both accuracy and efficiency.

Date: Monday, 23 June 2008
Time: 2:00 p.m. - 4:00 p.m.
Venue: Room 3501, Lifts 25-26

Committee Members:
Prof. Frederick Lochovsky (Supervisor)
Prof. Dik-Lun Lee (Chairperson)
Dr. Lei Chen

**** ALL are Welcome ****