More about HKUST
Sublinear Algorithms for Massive Data
PhD Thesis Proposal Defence
Title: "Sublinear Algorithms for Massive Data"
by
Mr. Di CHEN
Abstract:
Sublinear algorithms address the rapid growth in data volume with a simple
yet powerful premise, that useful tasks can be performed with even fewer
resources than required to simply store the data.
This thesis studies randomized algorithms for massive data. We either
devise new algorithms, or improve analysis on existing algorithms,
resulting in meaningful theoretical guarantees with sublinear space or
communication.
First, we present an algorithm that uses sublinear communication to
perform set reconciliation under a `noisy data' model, where two data
points shall be considered `the same' when the distance between them is
small, modelling tiny perturbations caused in data due to some form of
noise.
The second is a 1-pass streaming algorithm for estimating the number of
distinct entities in the same noisy data model. Geometrically, this may be
seen as counting sparsely placed clusters of small diameter, using space
that is sublinear in the number of clusters.
Finally, we give an improved analysis of random Fourier features (RFF) for
the Gaussian kernel, a popular data-oblivious embedding of kernel
functions. In contrast to competing techniques that are typically
data-dependent, RFF can be used under the data stream setting. Our work
justifies the use of RFF in a variety of learning problems under the
Gaussian kernel distance.
Date: Friday, 23 September 2016
Time: 9:00am - 11:00am
Venue: Room 4475
(lifts 25/26)
Committee Members: Prof. Mordecai Golin (Supervisor)
Dr. Ke Yi (Supervisor)
Dr. Raymond Wong (Chairperson)
Dr. Sunil Arya
**** ALL are Welcome ****