A Hadoop-based Storage System for Big Spatio-Temporal Data Analytics
PhD Thesis Proposal Defence

Title: "A Hadoop-based Storage System for Big Spatio-Temporal Data Analytics"

by

Mr. Haoyu TAN

ABSTRACT:

During the past decade, various GPS-equipped devices have generated a tremendous amount of data with time and location information, which we refer to as big spatio-temporal data. As the data continue to grow, they will outgrow the capabilities of any serial processing technique, so the analytics must be performed in parallel. We use the Hadoop platform to implement and execute most of our data processing algorithms. However, Hadoop alone is not sufficient for spatio-temporal data processing, because its underlying storage systems do not support efficient spatio-temporal range queries. Existing spatio-temporal storage systems assume a non-distributed environment handling datasets of a few gigabytes, which makes them unsuitable for very large spatio-temporal datasets. There is therefore a notable gap between big data analytics and spatio-temporal data storage, and this gap motivates us to build an efficient and scalable storage system for big spatio-temporal data analytics.

In this proposal, we present the design and implementation of CloST, a scalable storage system for big spatio-temporal data that supports data analytics using Hadoop. The main objective of CloST is to avoid scanning the whole dataset when a spatio-temporal range is given. To this end, we propose a novel data model that gives special treatment to three core attributes: an object id, a location and a time. Based on this data model, CloST hierarchically partitions data using all three core attributes, which enables efficient parallel processing of spatio-temporal range scans. Exploiting the characteristics of the data, we devise a compact storage structure that reduces the storage size by an order of magnitude. In addition, we propose scalable bulk loading algorithms capable of incrementally adding new data into the system.
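As a rough illustration of why hierarchical partitioning on the three core attributes lets a range scan skip most of the data, here is a minimal Python sketch. It is not CloST's implementation. The partitioning order (object id bucket, then hourly time bucket, then square spatial grid cell), the granularities NUM_ID_BUCKETS, TIME_BUCKET and CELL_SIZE, and the names Record, partition_key, build_partitions and range_query are all hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    oid: int    # object id
    x: float    # spatial coordinate (e.g. longitude)
    y: float    # spatial coordinate (e.g. latitude)
    t: float    # timestamp in seconds


# Hypothetical granularities for illustration only, not CloST's values.
NUM_ID_BUCKETS = 4
TIME_BUCKET = 3600.0   # one-hour time buckets
CELL_SIZE = 1.0        # side length of a square grid cell


def partition_key(r):
    # Assumed hierarchy: level 1 = object id bucket, level 2 = time bucket,
    # level 3 = spatial grid cell.
    return (r.oid % NUM_ID_BUCKETS,
            int(r.t // TIME_BUCKET),
            (int(r.x // CELL_SIZE), int(r.y // CELL_SIZE)))


def build_partitions(records):
    parts = defaultdict(list)
    for r in records:
        parts[partition_key(r)].append(r)
    return parts


def range_query(parts, x0, y0, x1, y1, t0, t1):
    # Only partitions whose time bucket and grid cell can overlap the query
    # are opened; every other partition is skipped without reading records.
    time_buckets = range(int(t0 // TIME_BUCKET), int(t1 // TIME_BUCKET) + 1)
    cells_x = range(int(x0 // CELL_SIZE), int(x1 // CELL_SIZE) + 1)
    cells_y = range(int(y0 // CELL_SIZE), int(y1 // CELL_SIZE) + 1)
    result, scanned = [], 0
    for id_bucket in range(NUM_ID_BUCKETS):  # no id predicate, so visit all
        for b in time_buckets:
            for i in cells_x:
                for j in cells_y:
                    for r in parts.get((id_bucket, b, (i, j)), []):
                        scanned += 1
                        # Exact filter inside a candidate partition.
                        if x0 <= r.x <= x1 and y0 <= r.y <= y1 and t0 <= r.t <= t1:
                            result.append(r)
    return result, scanned


# Usage: four records, one query box covering cell (0, 0) in the first hour.
records = [
    Record(1, 0.5, 0.5, 100.0),
    Record(2, 5.5, 5.5, 100.0),   # far away in space: partition never opened
    Record(3, 0.2, 0.8, 7300.0),  # outside the time range: never opened
    Record(4, 0.9, 0.1, 200.0),
]
parts = build_partitions(records)
hits, scanned = range_query(parts, 0.0, 0.0, 1.0, 1.0, 0.0, 3600.0)
print(sorted(r.oid for r in hits), scanned)
```

Only two of the four records are read to answer the query. In a Hadoop deployment each partition would presumably be stored as its own file or block so that only the selected partitions are fed to map tasks; the sketch keeps everything in memory for brevity.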
Future work toward completing the thesis is also outlined in the proposal.

Date: Friday, 15 June 2012
Time: 3:30pm - 5:30pm
Venue: Room 3501 (Lifts 25/26)

Committee Members:
Prof. Lionel Ni (Supervisor)
Dr. Lei Chen (Chairperson)
Dr. Lin Gu
Dr. Qiong Luo

**** ALL are Welcome ****