PhD Qualifying Examination

"Data Integration in the Hidden Web"

Miss Qiong Huang

Abstract:

Accessing information in the hidden Web requires not only searching for relevant data but also integrating heterogeneous data sources. Compared with traditional data integration, whose two major issues are schema matching and entity resolution, hidden Web data integration adds new challenges: source modeling and selection, source querying, and data extraction.

To prepare for domain-based data integration, source selection first classifies the data sources so that similar ones are grouped together, while source modeling extracts the schemas from the various data source interfaces. For schema matching, many holistic methods take into account the scalability problem specific to hidden Web data integration. Next, source querying automatically simulates user inputs to retrieve relevant results. Once the results are returned in web pages, data extraction techniques are employed to collect the data entities. Finally, data entities from different data sources are matched and merged.

In this survey, we first study the basic characteristics of hidden Web data integration and then review state-of-the-art techniques for each of these sub-problems. These techniques focus either on modeling the corresponding sub-problem or on identifying and exploiting various kinds of information. We also point out some open research issues.

Date: Thursday, 18 January 2007
Time: 2:00 p.m. - 4:00 p.m.
Venue: Room 3501, lifts 25-26

Committee Members:
Prof. Frederick Lochovsky (Supervisor)
Dr. Lei Chen (Chairperson)
Prof. Dik Lun Lee
Dr. Qiong Luo

**** ALL are Welcome ****