Efficient Processing of Complex Join Queries on the Cloud
PhD Thesis Proposal Defence

Title: "Efficient Processing of Complex Join Queries on the Cloud"

by

Mr. Xiaofei ZHANG

ABSTRACT:

The join operation is one of the most expressive and expensive data analytic tools in traditional database systems. With the exponential growth of available data, NoSQL data storage has risen as the prevailing solution for Big Data. Without the strong support of heavy indexing, join becomes even more crucial and challenging for querying or mining massive data. There have been intensive studies of different types of join operations over distributed data, e.g. similarity join, set join and fuzzy join, all of which focus on efficient join query evaluation by exploiting the massive parallelism promised by the MapReduce computing framework on the Cloud platform. Surprisingly, however, most existing works rarely study the multi-way join problem, which is summarized as the "complex join" in this thesis. The substantial challenge of the "complex join" lies in, given a number of processing units, mapping a multi-way join query to a number of parallel tasks and executing them in a well-scheduled sequence, such that the total processing time span is minimized. In fact, multi-way join is widely adopted in many real-world applications. In this thesis, we demonstrate how our "complex join" solution can be applied to query processing over RDF data. To summarize, our study covers the following three aspects:

1) We propose a cost-model-based RDF join processing solution using MapReduce and general-purpose optimization strategies;
2) We study the problem of efficient processing of multi-way Theta-join queries using MapReduce from a cost-effective perspective;
3) We propose a novel representation of RDF data on Cloud platforms, based on which we propose an I/O-efficient strategy to evaluate SPARQL queries as quickly as possible.
Moreover, we demonstrate through extensive experiments the efficiency and effectiveness of our proposed solutions for multi-way join processing on the Cloud. We also report our preliminary work and discuss future research plans, including several interesting directions for complex join processing on the Cloud.

Date: Tuesday, 8 January 2013
Time: 2:30pm - 4:30pm
Venue: Room 3501, lifts 25/26

Committee Members:
Dr. Lei Chen (Supervisor)
Dr. Raymond Wong (Chairperson)
Dr. Kai Chen
Dr. Ke Yi

**** ALL are Welcome ****