PhD Thesis Proposal Defence

"Information discovery, extraction and integration for the hidden web"

By

Miss Jiying Wang

Abstract

Currently, web pages returned by filling in search forms cannot be indexed by most search engines, since they are dynamically generated by querying a back-end (relational or object-relational) database. The set of such pages, referred to as the Deep Web or Hidden Web, is estimated to be around 500 times the size of the "surface web". There is therefore a need for new information services that help users locate information in the hidden web, i.e., services that discover promising information sources, disseminate queries, extract the corresponding results from web pages and integrate the retrieved data. To minimize user effort in this information retrieval process and to enable such tools to scale with the growth of the web, we explore the problem of automatically interacting with information sources in the hidden web. The problem has four aspects: information discovery, extraction, understanding and integration. In this proposal, we report our initial investigations into two of these aspects: automatically extracting data objects from a given web site, and automatically assigning semantic labels to the extracted data. We also propose future work to address the remaining two aspects: information discovery and integration.

Date: Thursday, 15 May 2003
Time: 2:00 p.m.-4:00 p.m.
Venue: Room 2304, Lifts 17-18

Committee Members:
Prof. Frederick Lochovsky (Supervisor)
Prof. Dik-Lun Lee (Chairman)
Prof. Hongjun Lu
Dr. Wilfred Ng

**** ALL are Welcome ****