A Hadoop-based Storage System for Big Spatio-Temporal Data Analytics
PhD Thesis Proposal Defence
Title: "A Hadoop-based Storage System for Big
Spatio-Temporal Data Analytics"
by
Mr. Haoyu TAN
ABSTRACT:
During the past decade, various GPS-equipped devices have generated a
tremendous amount of data with time and location information, which we
refer to as big spatio-temporal data. As the size of the data is
continuously growing, it will outgrow the capabilities of any serial
processing techniques and it is therefore necessary to perform the data
analytics in parallel. We use the Hadoop platform to implement and execute
most of our data processing algorithms. However, using Hadoop alone is not
sufficient for spatio-temporal data processing because the underlying
storage systems do not support efficient spatio-temporal range queries.
Existing spatio-temporal storage systems assume a non-distributed
environment handling datasets at the scale of several gigabytes, which
makes them unsuitable for very large spatio-temporal datasets. Therefore,
there is a notable gap between big data analytics and spatio-temporal data
storage. This gap motivates us to build an efficient and scalable storage system
for big spatio-temporal data analytics.
In this proposal, we present the design and implementation of CloST, a
scalable big spatio-temporal data storage system to support data analytics
using Hadoop. The main objective of CloST is to avoid scanning the whole
dataset when a spatio-temporal range is given. To this end, we propose a
novel data model that gives special treatment to three core attributes:
an object ID, a location, and a time. Based on this data model, CloST
hierarchically partitions data using all core attributes, which enables
efficient parallel processing of spatio-temporal range scans.
Based on the data characteristics, we devise a compact storage
structure that reduces the storage size by an order of magnitude. In
addition, we propose scalable bulk loading algorithms capable of
incrementally adding new data into the system. Future work for completion
of the thesis is also given in the proposal.
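
As a rough illustration of why partitioning on the core attributes lets a
range scan skip most of the data, the sketch below groups records by a
partition key and prunes partitions using the key alone. It is not CloST's
actual design: the abstract does not describe the partitioning scheme, so
the fixed time buckets, the uniform grid, the id grouping, and every name
and constant here (TIME_BUCKET, CELL_SIZE, ID_GROUPS, partition_key,
build_partitions, range_scan) are assumptions made for this example.

```python
import math
from collections import defaultdict

# All constants are hypothetical; the abstract does not give CloST's values.
TIME_BUCKET = 3600   # seconds covered by one time partition
CELL_SIZE = 0.25     # side length of one spatial grid cell, in degrees
ID_GROUPS = 4        # number of object-id groups


def partition_key(obj_id, lon, lat, t):
    """Map one record to a (time bucket, grid cell, id group) partition key."""
    return (math.floor(t / TIME_BUCKET),
            (math.floor(lon / CELL_SIZE), math.floor(lat / CELL_SIZE)),
            obj_id % ID_GROUPS)


def build_partitions(records):
    """Group (obj_id, lon, lat, t) records by partition key."""
    parts = defaultdict(list)
    for rec in records:
        parts[partition_key(*rec)].append(rec)
    return parts


def range_scan(parts, lon_rng, lat_rng, t_rng):
    """Return matching records and the number of partitions actually read."""
    t_lo, t_hi = (math.floor(t / TIME_BUCKET) for t in t_rng)
    x_lo, x_hi = (math.floor(x / CELL_SIZE) for x in lon_rng)
    y_lo, y_hi = (math.floor(y / CELL_SIZE) for y in lat_rng)
    hits, scanned = [], 0
    for (tb, (cx, cy), _group), recs in parts.items():
        # Prune on the key alone; records in skipped partitions are never read.
        if not (t_lo <= tb <= t_hi and x_lo <= cx <= x_hi and y_lo <= cy <= y_hi):
            continue
        scanned += 1
        # Partitions bound the range only coarsely, so filter records exactly.
        hits.extend(
            r for r in recs
            if lon_rng[0] <= r[1] <= lon_rng[1]
            and lat_rng[0] <= r[2] <= lat_rng[1]
            and t_rng[0] <= r[3] <= t_rng[1]
        )
    return hits, scanned
```

Calling range_scan with inclusive (min, max) pairs for longitude, latitude,
and time returns the matching records together with a count of partitions
read, which can be compared with len(parts) to see how much was skipped.
In a Hadoop setting, each surviving partition could in principle be
handled by a separate task, though that mapping is also an assumption here.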
Date: Friday, 15 June 2012
Time: 3:30pm - 5:30pm
Venue: Room 3501
lifts 25/26
Committee Members: Prof. Lionel Ni (Supervisor)
Dr. Lei Chen (Chairperson)
Dr. Lin Gu
Dr. Qiong Luo
**** ALL are Welcome ****