Efficient Processing of Complex Join Queries on the Cloud
PhD Thesis Proposal Defence
Title: "Efficient Processing of Complex Join Queries on the Cloud"
by
Mr. Xiaofei ZHANG
ABSTRACT:
The join operation is one of the most expressive and expensive data analytic
tools in traditional database systems. Along with the exponential growth
of various data becoming available, NoSQL data storage has risen as the
prevailing solution for Big Data. Without the strong support of heavy
indexing, join becomes even more crucial and challenging for querying
or mining massive data. There have been intensive studies of
different types of join operations over distributed data, e.g. similarity
join, set join, and fuzzy join, all of which focus on efficient join
query evaluation by exploiting the massive parallelism promised by the
MapReduce computing framework on the Cloud platform. Surprisingly, however,
most existing work rarely studies the multi-way join problem, which is
referred to as the ``complex join'' in this thesis. The substantial
challenge of ``complex join'' lies in, given a number of processing units,
mapping a multi-way join query to a number of parallel tasks and having
them executed in a well scheduled sequence, such that the total processing
time span is minimized. As a matter of fact, multi-way join is widely
adopted in many real-world applications. In this thesis, we demonstrate how
our ``complex join'' solution can be applied to query processing
over RDF data. To summarize, our study covers the following three aspects:
1) We propose a cost-model-based RDF join processing solution using
MapReduce and a general-purpose optimization strategy;
2) We study the problem of efficient processing of multi-way Theta-join
queries using MapReduce from a cost-effective perspective;
3) We propose a novel representation of RDF data on Cloud platforms,
based on which we propose an I/O efficient strategy to evaluate SPARQL
queries as quickly as possible.
Moreover, we demonstrate through extensive experiments the efficiency and
effectiveness of our proposed solutions for multi-way join processing on
the Cloud. Meanwhile, we report our preliminary work and discuss
future research plans, including several interesting directions in
complex join processing on the Cloud.
Date: Tuesday, 8 January 2013
Time: 2:30pm - 4:30pm
Venue: Room 3501 (lifts 25/26)
Committee Members: Dr. Lei Chen (Supervisor)
Dr. Raymond Wong (Chairperson)
Dr. Kai Chen
Dr. Ke Yi
**** ALL are Welcome ****