Classic and New Data Structure Problems in External Memory

PhD Thesis Proposal Defence


Title: "Classic and New Data Structure Problems in External Memory"

by

Mr. Zhewei Wei


ABSTRACT:

The demand of efficient data structures for query processing on massive data 
sets has grown tremendously in the past decades. Traditionally, data structures 
are designed and analyzed in the RAM model, where each memory cell can be 
accessed with unit cost. This assumption, however, is unrealistic for modeling 
modern memory hierarchies which consist of many levels of memories and caches 
with different sizes and access costs. As a consequence, a number of more 
elaborate models were introduced. Among them the most successful ones are the 
I/O model and the cache-oblivious model. In recent years, designing data 
structures that are I/O-efficient or cache-oblivious has become an active 
direction in both the theory and database communities.

This proposal starts by considering the dictionary problem, one of the most 
basic data structure problems, in which we want to store and access a set of 
(key, data) pairs. Hash tables are the most efficient and the most fundamental 
data structure for implementing a dictionary. In this proposal we study hash 
tables in both the I/O model and the cache- oblivious model. We first show an 
inherent query-insertion tradeoff of hashing in the I/O model, which implies 
that the buffering technique is essentially useless for hash tables. In the 
cache-oblivious model, we build a hash table that achieves the same search cost 
as its cache-aware version does, for all block sizes.

The second problem studied in this proposal is the summary query problem, which 
is a natural generalization of the range searching problem. Our goal is to 
design data structures that allow for extracting a statistical summary of all 
the records in the query range. The summaries we support include frequent 
items, quantiles, various sketches, and wavelets, all of which are of central 
importance in massive data analysis.


Date:                   Tuesday, 4 October 2011

Time:                   10:30am - 12:30pm

Venue:                  Room 2578
                         lifts 27/28

Committee Members:      Dr. Ke Yi (Supervisor)
                         Dr. Sunil Arya (Chairperson)
 			Prof. Siu-Wing Cheng
 			Prof. Mordecai Golin


**** ALL are Welcome ****