Efficient Processing of Complex Join Queries on the Cloud

PhD Thesis Proposal Defence


Title: "Efficient Processing of Complex Join Queries on the Cloud"

by

Mr. Xiaofei ZHANG


ABSTRACT:

The join operation is one of the most expressive and expensive data
analytic tools in traditional database systems. With the exponential
growth of available data, NoSQL data storage has risen as a prevailing
solution for Big Data. Without the support of heavy indexing, join
becomes even more crucial and challenging for querying or mining
massive data. There have been intensive studies of
different types of join operations over distributed data, e.g. similarity
join, set join and fuzzy join, all of which focus on efficient join
query evaluation by exploiting the massive parallelism promised by the
MapReduce computing framework on the Cloud platform. Surprisingly, however,
most existing works rarely study the multi-way join problem, which is
referred to as the ``complex join'' in this thesis. The main
challenge of ``complex join'' lies in, given a number of processing units,
mapping a multi-way join query to a number of parallel tasks and having
them executed in a well-scheduled sequence, such that the total processing
time span is minimized. In fact, multi-way join is widely adopted in
many real-world applications. In this thesis, we demonstrate how our
``complex join'' solution can be applied to query processing over RDF
data. To summarize, our study covers the following three aspects:

1) We propose a cost-model-based RDF join processing solution using
MapReduce and a general-purpose optimization strategy;

2) We study the problem of efficient processing of multi-way Theta-join 
queries using MapReduce from a cost-effective perspective;

3) We propose a novel representation of RDF data on Cloud platforms and,
based on it, an I/O-efficient strategy to evaluate SPARQL queries as
quickly as possible.

Moreover, we demonstrate through extensive experiments the efficiency and
effectiveness of our proposed solutions for multi-way join processing on
the Cloud. We also report our preliminary work and discuss future
research plans, including several interesting directions for complex join
processing on the Cloud.


Date:                   Tuesday, 8 January 2013

Time:                   2:30pm - 4:30pm

Venue:                  Room 3501
                         lifts 25/26

Committee Members:      Dr. Lei Chen (Supervisor)
                         Dr. Raymond Wong (Chairperson)
                         Dr. Kai Chen
                         Dr. Ke Yi


**** ALL are Welcome ****