Distributed Algorithms for Computing Statistical Information on Massive Data

PhD Qualifying Examination


Title: "Distributed Algorithms for Computing Statistical Information on
Massive Data"

by

Mr. Zengfeng Huang


Abstract:

Consider a distributed system with k nodes, where each node holds a data set, and
the goal is to design communication-efficient algorithms for computing 
functions over the union of the k data sets. In this survey, we focus on
computing some of the most important statistics of the underlying data,
in particular item frequencies, heavy hitters, quantiles, top-m items, and 
random samples. We will consider both a flat network structure and more 
complicated tree networks. We also consider the case where the inputs are not 
static sets, but k data streams, and the goal is to continuously track these 
functions over the data that has arrived at all nodes so far.


Date:                   Friday, 7 May 2010

Time:                   2:00pm - 4:00pm

Venue:                  Room 3304
                         lifts 17/18

Committee Members:      Dr. Ke Yi (Supervisor)
                         Prof. Siu-Wing Cheng (Chairperson)
                         Dr. Sunil Arya
                         Prof. Mordecai Golin


**** ALL are Welcome ****