More about HKUST
Building Summaries over Streaming Data
PhD Thesis Proposal Defence
Title: "Building Summaries over Streaming Data"
by
Mr. Ge LUO
Abstract:
Many of today's applications generate huge amounts of data in a streaming
fashion. The data often has to be processed in real time with limited
memory space. On the other hand, it has been observed that in most cases,
the value density of these data is rather low, and they can usually be
compacted in a small summary that can be used to serve queries on the
original data without losing much accuracy. Therefore, building various
small summaries over streaming data has received much attention in both
the algorithms and database communities in recent years.
This thesis proposal makes three contributions to this area. First, we
continue the study of building quantile summaries over streaming data.
This is a fundamental problem that has been studied for over 30 years, yet
there has been limited formal comparison of the competing methods, and no
comprehensive study of their performance. In this proposal, we remedy this
deficit by providing a taxonomy of different methods, and describe
efficient implementations. In doing so, we also propose and analyze
variations that have not been explicitly studied before, yet which turn
out to perform the best.
Second, we revisit another classical problem, which is to build a
piecewise linear application (PLA) over streaming time series data. Prior
work addressed two versions of the problem, where either the PLA consists
of disjoint segments, or it is required to be a continuous piecewise
linear function. However, we observe that neither minimizes the true
representation size of the PLA, i.e., the number of parameters required to
represent it. In this proposal, we design an algorithm that generates the
optimal PLA in terms of representation size while meeting the prescribed
max-error guarantee.
Thirdly, we propose a new type of summaries, called persistent data
sketching. All existing data summaries are updated upon the arrival of
every element in the stream, thus destroying its previous version. A
persistent sketch, on the other hand, preserves all its previous versions
as it is updated over time. Every update creates a new version, while all
the versions are kept compactly so that any previous version can still be
queried. This gives the summary the ability to answer queries about the
stream at any prior time.
Date: Tuesday, 18 August 2015
Time: 10:00am - 12:00noon
Venue: Room 2132B
lift 19
Committee Members: Dr. Ke Yi (Supervisor)
Dr. Raymond Wong (Chairperson)
Prof. Siu-Wing Cheng
Dr. Qiong Luo
**** ALL are Welcome ****