A Hadoop-based Storage System for Big Spatio-Temporal Data Analytics

PhD Thesis Proposal Defence


Title: "A Hadoop-based Storage System for Big
Spatio-Temporal Data Analytics"

by

Mr. Haoyu TAN


ABSTRACT:

During the past decade, various GPS-equipped devices have generated a 
tremendous amount of data with time and location information, which we 
refer to as big spatio-temporal data. As the size of the data is 
continuously growing, it will outgrow the capabilities of any serial 
processing techniques and it is therefore necessary to perform the data 
analytics in parallel. We use the Hadoop platform to implement and execute 
most of our data processing algorithms. However, using Hadoop alone is not 
sufficient for spatio-temporal data processing because the underlying 
storage systems do not support efficient spatio-temporal range queries. 
Existing spatio-temporal storage systems assume a non-distributed 
environment handling datasets at the scale of several gigabytes, which 
makes them undesirable for very large spatio-temporal datasets. Therefore, 
there is a notable gap between big data analytics and spatio-temporal data 
storage. This gap motivates us to build an efficient and scalable storage system 
for big spatio-temporal data analytics.

In this proposal, we present the design and implementation of CloST, a 
scalable big spatio-temporal data storage system to support data analytics 
using Hadoop. The main objective of CloST is to avoid scanning the whole 
dataset when a spatio-temporal range is given. To this end, we propose a 
novel data model that gives special treatment to three core attributes: 
an object id, a location and a time. Based on this data model, 
CloST hierarchically partitions data using all core attributes, which 
enables efficient parallel processing of spatio-temporal range scans. 
According to the data characteristics, we devise a compact storage 
structure which reduces the storage size by an order of magnitude. In 
addition, we propose scalable bulk loading algorithms capable of 
incrementally adding new data into the system. Future work for completing 
the thesis is also outlined in the proposal.
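
As a rough illustration of why partitioning on the core attributes lets a 
range scan skip most of the data, the following minimal sketch groups 
records by a time bucket, a grid cell and an object-id group, then prunes 
partitions that cannot overlap the query range. The bucket size, grid cell 
size, number of id groups, the order of the levels and all names here are 
assumptions made for illustration; the abstract does not specify CloST's 
actual partitioning scheme or storage layout.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical parameters, chosen only for this sketch.
TIME_BUCKET = 3600   # seconds per time bucket
CELL_SIZE = 1.0      # side length of a square grid cell
ID_GROUPS = 4        # number of object-id groups


@dataclass(frozen=True)
class Record:
    object_id: int
    x: float
    y: float
    t: int  # timestamp in seconds


def partition_key(r):
    """Map a record to (time bucket, grid cell, id group)."""
    cell = (math.floor(r.x / CELL_SIZE), math.floor(r.y / CELL_SIZE))
    return (r.t // TIME_BUCKET, cell, r.object_id % ID_GROUPS)


def build_partitions(records):
    """Group records by partition key (an in-memory stand-in for files)."""
    parts = defaultdict(list)
    for r in records:
        parts[partition_key(r)].append(r)
    return parts


def range_scan(parts, x_min, x_max, y_min, y_max, t_min, t_max):
    """Return matching records and the number of partitions actually read."""
    tb_lo, tb_hi = t_min // TIME_BUCKET, t_max // TIME_BUCKET
    cx_lo, cx_hi = math.floor(x_min / CELL_SIZE), math.floor(x_max / CELL_SIZE)
    cy_lo, cy_hi = math.floor(y_min / CELL_SIZE), math.floor(y_max / CELL_SIZE)
    hits, scanned = [], 0
    for (tb, (cx, cy), _group), recs in parts.items():
        # Prune partitions that cannot intersect the query range.
        if not (tb_lo <= tb <= tb_hi):
            continue
        if not (cx_lo <= cx <= cx_hi and cy_lo <= cy <= cy_hi):
            continue
        scanned += 1
        # Only the surviving partitions are filtered record by record.
        hits.extend(
            r for r in recs
            if x_min <= r.x <= x_max
            and y_min <= r.y <= y_max
            and t_min <= r.t <= t_max
        )
    return hits, scanned
```

In a Hadoop setting each surviving partition could be handed to a separate 
task, so pruning reduces the input size and the remaining work can still 
run in parallel.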


Date:                   Friday, 15 June 2012

Time:                   3:30pm - 5:30pm

Venue:                  Room 3501
                         lifts 25/26

Committee Members:      Prof. Lionel Ni (Supervisor)
                         Dr. Lei Chen (Chairperson)
                         Dr. Lin Gu
                         Dr. Qiong Luo


**** ALL are Welcome ****