PhD Thesis Proposal Defence

"Towards Fully Automatic Data Integration for Web Databases"

by

Mr. Weifeng Su

Abstract:

An important part of today's Web consists of Web databases, 80% of which are structured databases. To help users retrieve relevant records from different Web databases simultaneously, we propose a meta-querying system comprising three components: a query interface integrator, a data extractor and a result integrator. For each component, we propose a novel method that aims to address the respective problem automatically. In the query interface integrator, the attribute occurrence patterns across multiple query interfaces are used to find the attributes that match in different interfaces. In the data extractor, a domain ontology is first learned from the information overlap among the query results from different Web databases and is then used to extract the data encoded in the HTML pages. In the result integrator, duplicates among the results from different Web databases are identified in two steps: a set of negative records is first constructed by assuming that the records from the same Web database are unique; then, starting from these negative records, an iterative algorithm identifies the duplicates across different Web databases. Preliminary experimental results show that these automatic methods can achieve very high precision and outperform existing methods.

Date: Wednesday, 24 January 2007

Time: 2:00 p.m. - 4:00 p.m.

Venue: Room 3501, lifts 25-26

Committee Members:
Prof. Frederick Lochovsky (Supervisor)
Prof. Dik-Lun Lee (Chairperson)
Dr. Qiong Luo
Dr. Wilfred Ng

**** ALL are Welcome ****