The Hong Kong University of Science and Technology Department of Computer Science PhD Thesis Defence "Efficient Structural Query Processing in XML Databases" By Mr. Haifeng Jiang Abstract As XML is gaining unqualified success in being adopted as a universal data exchange format, particularly in the World Wide Web, more and more information will be stored, exchanged, and presented in XML. The ability to intelligently query XML data sources, therefore, becomes increasingly important. This thesis studies the query processing of a core subset of XML query languages: twig queries with and without OR-predicates. An XML twig query, represented as a labeled query tree, is essentially a complex selection on both the structure and the content of an XML document. Matching a twig query is to find all embedded instances of the query tree in the data tree that represents the XML document. While querying by content can leverage traditional database technologies, evaluating the structural relationships (specifically parent-child and ancestor-descendant relationships) specified in twig queries has imposed a great challenge to existing database technologies. We present in this thesis a series of holistic twig join algorithms by which a query tree is not matched edge by edge; instead, we match it as a whole so that irrelevant intermediate results can be greatly reduced. Specifically, we first present a merge-based algorithm and an index-based algorithm for processing twig queries without OR-predicates. The algorithms access underlying XML data through a generic data access interface so that they are independent from specific physical storage methods. We then extend these two algorithms to handle general twig queries that can contain OR -predicates. To provide an efficient data access interface for our algorithms, we propose a novel external-memory index structure, namely XR -tree, and detail the implementation of a data access interface for XR-tree indexed XML data. We conducted extensive experiments with our algorithms in comparison with existing state-of-the-art ones. Our algorithms clearly being the winner, we recommend that they should be adopted by a performance-critical XML query system. Date: Thursday, 13 May 2004 Time: 10:00a.m.-12:00noon Venue: Room 2404 Lifts 17-18 Chairman: Prof. Chung Chun Yang (MATH) Committee Members: Prof. Hongjun Lu (Supervisor) Prof. Frederick Lochovsky Prof. Qiong Luo Prof. Andrew Lim (IEEM) Prof. Jeffrey Xu Yu (Sys. Engg. & Engg. Mgmt., CUHK) **** ALL are Welcome ****