More about HKUST
Classic and New Data Structure Problems in External Memory
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
PhD Thesis Defence
Title: "Classic and New Data Structure Problems in External Memory"
By
Mr. Zhewei Wei
Abstract
The demand of efficient data structures for query processing on massive
data sets has grown tremendously in the past decades. Traditionally, data
structures are designed and analyzed in the RAM model, where each memory
cell can be accessed with unit cost. This assumption, however, is
unrealistic for modeling modern memory hierarchies which consist of many
levels of memories and caches with different sizes and access costs. As a
consequence, a number of more elaborate models were introduced. Among them
the most successful ones are the I/O model and the cache-oblivious model.
In recent years, designing data structures that are I/O-efficient or
cache-oblivious has become an active direction in both the theory and
database communities.
This thesis starts by considering the dictionary problem, one of the most
basic data structure problems, in which we want to store and access a set
of (key, data) pairs. Hash tables are the most efficient and the most
fundamental data structure for implementing a dictionary. In this thesis
we study hash tables in both the I/O model and the cache-oblivious model.
We first show an inherent query-insertion tradeoff of hashing in the I/O
model, which implies that the buffering technique is essentially useless
for hash tables. In the cache-oblivious model, we build a hash table that
achieves the same search cost as its cache-aware version does, for all
block sizes.
The second problem studied is another fundamental data structure problem,
priority queues. The priority queue problem is well understood in the
comparison based I/O model, where its complexity is known to be the same
as sorting. In this thesis, we establish their equivalence in the I/O
model without any restrictions, by providing a reduction from priority
queues to sorting. Note that the other direction of the reduction is
trivial.
The third problem studied in this thesis is the summary query problem,
which is a natural generalization of the range searching problem. Our goal
is to design data structures that allow for extracting a statistical
summary of all the records in the query range. The summaries we support
include frequent items, quantiles, various sketches, and wavelets, all of
which are of central importance in massive data analysis.
Date: Thursday, 2 February 2012
Time: 1:30pm – 3:30pm
Venue: Room 3402
Lifts 17/18
Chairman: Prof. Anaimalai MUTHUKRISHNAN (MARK)
Committee Members: Prof. Ke Yi (Supervisor)
Prof. Siu-Wing Cheng
Prof. Mordecai Golin
Prof. Xiangtong Qi (IELM)
Prof. Yitong Yin (Comp. Sci. & Tech., Nanjing Univ.)
**** ALL are Welcome ****