Computing summaries over Distributed Data

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Computing summaries over Distributed Data"

By

Mr. Zengfeng HUANG


Abstract

Consider a distributed system with k nodes, where each node holds a part 
of the data. The goal is to extract some useful information from the 
entire data set or to compute some functions over the data. We are 
interested in designing communication-efficient algorithms and also 
characterizing the communication complexity for various problems. We 
consider both a flat network structure and more complicated tree networks.

In this thesis, we study some most important statistical summaries of the 
underlying data, in particular item frequencies, heavy hitters, quantiles, 
and eps-approximations, which are extensively studied in database, machine 
learning, computational geometry and data mining. We provide general 
techniques for both designing efficient algorithms and proving 
communication lower bounds, with which we get almost tight bounds for 
these problems.

We also study graph problems in the distributed setting, where the edges 
of the input graph is partitioned across k nodes. We show how to compute 
an approximate maximum matching, one of most important graph problems, 
communication-efficiently. We also prove a tight lower bound for this 
problem. To prove this bound, we develop new techniques, which we believe 
will have a wide applicability to prove distributed communication 
complexity for other graph problems.


Date:			Friday, 16 August 2013

Time:			10:00am – 12:00noon

Venue:			Room 3501
 			Lifts 25/26

Chairman:		Prof. Kam Sing Wong (PHYS)

Committee Members:	Prof. Ke Yi (Supervisor)
 			Prof. Sunil Arya
 			Prof. Siu-Wing Cheng
 			Prof. Ajay Joneja (IELM)
                      	Prof. Xiaoming Sun (CAS, Beijing)


**** ALL are Welcome ****