Efficient Processing of Complex Join Queries on the Cloud
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Efficient Processing of Complex Join Queries on the Cloud"
By
Mr. Xiaofei ZHANG
Abstract
The join operation is one of the most expressive and expensive data
analytic tools in traditional database systems. Along with the exponential
growth
of various data collections, NoSQL data storage has risen as the
prevailing solution for Big Data. However, without the strong support of
heavy indexing, the join operator becomes even more crucial and
challenging for querying or mining massive data. There have been
intensive studies of different types of join operations over distributed
data, e.g. similarity join, set join, fuzzy join, etc., all of which
focus on efficient join query evaluation by exploiting the massive
parallelism of the MapReduce computing framework on the Cloud platform.
However, the multi-way generalized join problem, which is referred to as
the complex join in this thesis, has not yet been thoroughly explored. The
main challenge of the complex join is, given a number of processing
units, to map a complex join query to a number of parallel tasks and to
execute them in a well-scheduled sequence, such that the total processing
time span is minimized. In this thesis, we demonstrate how our complex
join solution can be applied to query processing in various data analytic
scenarios, e.g., querying RDF data and pattern matching over graph data.
To summarize, our study covers the following four aspects:
1) We propose a cost-model-based RDF join processing solution using
MapReduce and general-purpose optimization strategies;
2) We propose a novel representation of RDF data on Cloud platforms,
based on which we propose an I/O-efficient strategy to evaluate SPARQL
queries as quickly as possible;
3) We study the problem of efficient processing of multi-way Theta-join
queries using MapReduce from a cost-effective perspective;
4) We develop a complete solution framework for join-based efficient
analysis over distributed graphs using the distance join query as an
example.
We validate our solutions through extensive experiments and discuss
several interesting research directions for complex join processing on
the Cloud.
Date: Wednesday, 19 June 2013
Time: 11:00am – 1:00pm
Venue: Room 3494
Lifts 25/26
Chairman: Prof. Qing Li (ISOM)
Committee Members: Prof. Lei Chen (Supervisor)
Prof. Dik-Lun Lee
Prof. Ke Yi
Prof. Yeou-Koung Tung (CIVL)
Prof. Jianliang Xu (Comp. Sci., HKBU)
**** ALL are Welcome ****