More about HKUST
Techniques and Applications of Random Sampling on Massive Data
PhD Thesis Proposal Defence
Title: "Techniques and Applications of Random Sampling on Massive Data"
by
Mr. Lu WANG
Abstract:
Living in the era of big data, we often need to process and analyze data
sets that have never been so large and fast-growing. Random sampling has
thus received much attention as an effective tool for turning big data
"small". It allows us to significantly reduce the size of input while
maintaining the main features of the original data set we need. It is also
easy to trade off between the computation complexity and the accuracy of
the result, by tweaking the sample size.
Although random sampling is a classical problem with a long history, it
has received revived attention lately motivated by new applications as
well as new constraints in the big data era. This thesis presents several
new techniques and applications of random sampling: (1) a new randomized
streaming algorithm for finding approximate quantiles in a data stream,
which achieves the smallest space complexity of all such algorithms; (2)
an augmented B-tree index that, for any given range query, returns a
sampling-based summary containing the quantiles and heavy hitters of all
tuples in the query range; and (3) an sample-augmented R-tree that, given
any range query, returns random samples from the query range in an online
fashion. Apart from the description and analysis of each algorithm we
propose, experimental results are also provided, confirming the advantages
of the new algorithms.
Date: Tuesday, 13 January 2015
Time: 10:00am - 12:00noon
Venue: Room 3494
lifts 25/26
Committee Members: Dr. Ke Yi (Supervisor)
Prof. Siu-Wing Cheng (Chairperson)
Prof. Dimitris Papadias
Dr. Raymond Wong
**** ALL are Welcome ****