Two Popular Queries in Massive Multidimensional Datasets

PhD Thesis Proposal Defence


Title: "Two Popular Queries in Massive Multidimensional Datasets"

by

Miss Wuman Luo


ABSTRACT:

The arrival of the cyber-physical systems era is changing data analysis in many 
ways. Driven by advances in Internet and sensor technologies, the amount of 
multidimensional content, such as images, trajectories, and video clips, has 
grown to an unprecedented level. Supporting multidimensional objects at large 
scale requires significant extensions to traditional databases. One critical 
issue is indexing and query processing. In this proposal, we discuss two types 
of queries in massive multidimensional datasets: high-dimensional similarity 
join and most frequent path finding.

In the first part of this proposal, we study how to perform parallel 
high-dimensional joins in the MapReduce paradigm. Specifically, we propose a 
cost model to demonstrate that it is important to take both communication and 
computation costs into account as dimensionality and data volume increase. To 
this end, we propose an efficient compression approach which can help 
significantly reduce both these costs. Moreover, we design two parallel 
frameworks which can scale up to massive data sizes and very high 
dimensionality. In the second part of this proposal, we address the problem of 
path finding by evaluating the desirability of a path from a novel perspective, 
i.e., how frequently the path has been taken within the given time constraints. 
This new query not only helps users learn from the experiences of past 
travelers, but also takes the variability of road and traffic conditions into 
account. To achieve this goal, we first design two indexes for efficient 
trajectory searching and splitting. After that, we develop a "footmark graph" 
construction algorithm to calculate the road segment frequencies from raw 
trajectories. Finally, we propose a "most frequent path finding" algorithm 
based on the "more frequent" relation in a dynamic programming manner.
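
As a rough illustration of the road segment frequency idea mentioned above, 
the sketch below counts how many trajectories traverse each segment within a 
time window. It is not the footmark graph construction algorithm itself, whose 
details are not given in this abstract. It assumes each trajectory has already 
been matched to a sequence of (node, timestamp) pairs, and the name 
segment_frequencies and the sample data are hypothetical.

```python
from collections import Counter


def segment_frequencies(trajectories, t_start, t_end):
    """Count trajectories that traverse each road segment within [t_start, t_end].

    Assumption: a trajectory is a list of (node, timestamp) pairs, and a
    segment is the directed pair of consecutive nodes.
    """
    freq = Counter()
    for traj in trajectories:
        seen = set()  # count each segment at most once per trajectory
        for (u, t_u), (v, t_v) in zip(traj, traj[1:]):
            # keep the segment only if it is entered and left inside the window
            if t_start <= t_u and t_v <= t_end:
                seen.add((u, v))
        freq.update(seen)
    return freq


# Hypothetical sample data: three short trajectories over nodes A-D.
trajectories = [
    [("A", 0), ("B", 5), ("C", 9)],
    [("A", 2), ("B", 6), ("D", 12)],
    [("B", 1), ("C", 4)],
]
freq = segment_frequencies(trajectories, 0, 10)
```

Segments that fall partly outside the time window are dropped here. That is 
one possible reading of "within the given time constraints" and may differ 
from the definition used in the proposal.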


Date:                   Friday, 15 June 2012

Time:                   1:30pm - 3:30pm

Venue:                  Room 3501
                         lifts 25/26

Committee Members:      Prof. Lionel Ni (Supervisor)
                         Dr. Qiong Luo (Chairperson)
                         Dr. Lei Chen
                         Dr. Lin Gu


**** ALL are Welcome ****