More about HKUST
Sublinear Algorithms for Massive Data
The Hong Kong University of Science and Technology Department of Computer Science and Engineering PhD Thesis Defence Title: "Sublinear Algorithms for Massive Data" By Mr. Di CHEN Abstract Sublinear algorithms address the rapid growth in data volume with a simple yet powerful premise, that useful tasks can be performed with even fewer resources than required to simply store the data. This thesis studies randomized algorithms for massive data. We either devise new algorithms, or improve analysis on existing algorithms, resulting in meaningful theoretical guarantees with sublinear space or communication. First, we present an algorithm that uses sublinear communication to perform set reconciliation under a 'noisy data' model, where two data points shall be considered 'the same' when the distance between them is small, modelling tiny perturbations caused in data due to some form of noise. The second is a 1-pass streaming algorithm for estimating the number of distinct entities in the same noisy data model. Geometrically, this may be seen as counting sparsely placed clusters of small diameter, using space that is sublinear in the number of clusters. Finally, we give an improved analysis of random Fourier features (RFF) for the Gaussian kernel, a popular data-oblivious embedding of kernel functions. In contrast to competing techniques that are typically data-dependent, RFF can be used under the data stream setting. Our work justifies the use of RFF in a variety of learning problems under the Gaussian kernel distance. Date: Friday, 9 December 2016 Time: 9:30am - 11:30am Venue: Room 2129C Lift 19 Chairman: Prof. Wai-Ho Mow (ECE) Committee Members: Prof. Mordecai Golin (Supervisor) Prof. Ke Yi (Supervisor) Prof. Sunil Arya Prof. Raymond Wong Prof. Lancelot James (ISOM) Prof. Xiaoming Sun (Inst. of Comp. Tech., CAS) **** ALL are Welcome ****