Efficient Processing of Complex Join Queries on the Cloud
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

PhD Thesis Defence

Title: "Efficient Processing of Complex Join Queries on the Cloud"

By

Mr. Xiaofei ZHANG

Abstract

Join is one of the most expressive and most expensive data analytic operations in traditional database systems. With the exponential growth of data collections of all kinds, NoSQL data stores have risen as the prevailing solution for Big Data. However, without the strong support of heavy indexing, the join operator becomes even more crucial and more challenging for querying and mining massive data. There have been intensive studies of various types of join operations over distributed data, e.g., similarity join, set join, and fuzzy join, all of which focus on efficient join query evaluation by exploiting the massive parallelism of the MapReduce computing framework on the Cloud platform. However, the multi-way generalized join problem, which this thesis refers to as the complex join, has not yet been thoroughly explored. The substantial challenge of complex join is this: given a number of processing units, map a complex join query to a number of parallel tasks and execute them in a well-scheduled sequence such that the total processing time span is minimized. In this thesis, we demonstrate how our complex join solution can be applied to query processing in various data analytic scenarios, e.g., querying RDF data and pattern matching over graph data. To summarize, our study covers the following four aspects:

1) We propose a cost-model-based RDF join processing solution using MapReduce and a general-purpose optimization strategy;
2) We propose a novel representation of RDF data on Cloud platforms, based on which we propose an I/O-efficient strategy to evaluate SPARQL queries as quickly as possible;
3) We study the problem of efficiently processing multi-way theta-join queries using MapReduce from a cost-effective perspective;
4) We develop a complete solution framework for efficient join-based analysis over distributed graphs, using the distance join query as an example.

We validate our solutions through extensive experiments and discuss several interesting research directions for complex join processing on the Cloud.

Date: Wednesday, 19 June 2013
Time: 11:00am – 1:00pm
Venue: Room 3494, Lifts 25/26

Chairman: Prof. Qing Li (ISOM)

Committee Members:
Prof. Lei Chen (Supervisor)
Prof. Dik-Lun Lee
Prof. Ke Yi
Prof. Yeou-Koung Tung (CIVL)
Prof. Jianliang Xu (Comp. Sci., HKBU)

**** ALL are Welcome ****