Computing Statistical Summaries on Massive Distributed Data

PhD Thesis Proposal Defence


Title: "Computing Statistical Summaries on Massive Distributed Data"

by

Mr. Zengfeng Huang


ABSTRACT:

Consider a distributed system with k nodes, where each node holds a
part of the data. Our goal is to design communication-efficient
algorithms for computing functions over the entire data set. In this
thesis, we focus on computing some of the most important statistical
summaries of the underlying data, in particular item frequencies,
heavy hitters, quantiles, and eps-approximations. We will consider
both a flat network structure and more complicated tree networks. We
give efficient algorithms with communication costs that scale
sublinearly in the size of the communication network. We also give
almost tight lower bounds, both deterministic and randomized, for all
the problems we study in this thesis.


Date:                   Wednesday, 30 January 2013

Time:                   4:00pm - 6:00pm

Venue:                  Room 3494
                        Lifts 25/26

Committee Members:      Dr. Ke Yi (Supervisor)
                        Prof. Siu-Wing Cheng (Chairperson)
                        Dr. Sunil Arya
                        Prof. Mordecai Golin


**** ALL are Welcome ****