Techniques and Applications of Random Sampling on Massive Data

PhD Thesis Proposal Defence


Title: "Techniques and Applications of Random Sampling on Massive Data"

by

Mr. Lu WANG


Abstract:

Living in the era of big data, we often need to process and analyze data 
sets that have never been so large and fast-growing. Random sampling has 
thus received much attention as an effective tool for turning big data 
"small". It allows us to significantly reduce the size of input while 
maintaining the main features of the original data set we need. It is also 
easy to trade off between the computation complexity and the accuracy of 
the result, by tweaking the sample size.

Although random sampling is a classical problem with a long history, it 
has received revived attention lately motivated by new applications as 
well as new constraints in the big data era. This thesis presents several 
new techniques and applications of random sampling: (1) a new randomized 
streaming algorithm for finding approximate quantiles in a data stream, 
which achieves the smallest space complexity of all such algorithms; (2) 
an augmented B-tree index that, for any given range query, returns a 
sampling-based summary containing the quantiles and heavy hitters of all 
tuples in the query range; and (3) an sample-augmented R-tree that, given 
any range query, returns random samples from the query range in an online 
fashion. Apart from the description and analysis of each algorithm we 
propose, experimental results are also provided, confirming the advantages 
of the new algorithms.


Date:			Tuesday, 13 January 2015

Time:                   10:00am - 12:00noon

Venue:                  Room 3494
                         lifts 25/26

Committee Members:	Dr. Ke Yi (Supervisor)
 			Prof. Siu-Wing Cheng (Chairperson)
 			Prof. Dimitris Papadias
 			Dr. Raymond Wong


**** ALL are Welcome ****