More about HKUST
Keyword Search over Relational Data
The Hong Kong University of Science and Technology Department of Computer Science and Engineering PhD Thesis Defence Title: "Keyword Search over Relational Data" By Mr. Alexander Markowetz Abstract Due to its ease of use, relational keyword search (R-KWS) has become increasingly popular. Compared to SQL, it has several benefits. First, R-KWS hides the schema from users and allows searching for combinations of terms without knowing in which data sources they appear. Second, its queries are easy to express. Third, many search tasks only become feasible through R-KWS. However, R-KWS is a young concept that holds many challenges. First, it lacks common semantics, and different implementations encounter various pitfalls. Second, its simplicity comes at a high computational cost. Specifically, R-KWS explores a vast search space, composed of all possible combinations of keyword occurrences in any attribute of every table. There are two general methodologies for query processing: (i) graph based (GB), traversing a materialized data graph, and (ii) operator based (OB), executing a set of relational operator trees. In both cases, a large portion of the computations are wasted on traversals/operators that fail to return results. Finally, while data streams become increasingly important to our information landscapes, R-KWS has been restrained to static data. The contribution of this paper is thus threefold. First, it provides homogenized semantics and matching query processing methods (GB as well as OB) that are formally shown to be correct. Second, it introduces a comprehensive framework for reachability indexing that quickly collapses the search space, and leads to significant savings. Third, it extends R-KWS to continuous queries over relational data streams. In particular, it proposes temporal semantics, as well as matching query processing techniques (GB as well as OB) that are further optimized. Extensive experiments demonstrate the high practicability of R-KWS, on tables as well as streams, and the significant benefits of the proposed optimizations. Date: Friday, 26 September 2008 Time: 2:30p.m.-4:30p.m. Venue: Room 4480 Lifts 25-26 Chairman: Prof. Chi-Ying Tsui (ECE) Committee Members: Prof. Dimitris Papadias (Supervisor) Prof. Lei Chen Prof. Mounir Hamdi Prof. Ajay Joneja (IELM) Prof. Kian-Lee Tan (Comp. Sci., National Univ. of Singapore) **** ALL are Welcome ****