More about HKUST
Building Summaries over Streaming Data
PhD Thesis Proposal Defence Title: "Building Summaries over Streaming Data" by Mr. Ge LUO Abstract: Many of today's applications generate huge amounts of data in a streaming fashion. The data often has to be processed in real time with limited memory space. On the other hand, it has been observed that in most cases, the value density of these data is rather low, and they can usually be compacted in a small summary that can be used to serve queries on the original data without losing much accuracy. Therefore, building various small summaries over streaming data has received much attention in both the algorithms and database communities in recent years. This thesis proposal makes three contributions to this area. First, we continue the study of building quantile summaries over streaming data. This is a fundamental problem that has been studied for over 30 years, yet there has been limited formal comparison of the competing methods, and no comprehensive study of their performance. In this proposal, we remedy this deficit by providing a taxonomy of different methods, and describe efficient implementations. In doing so, we also propose and analyze variations that have not been explicitly studied before, yet which turn out to perform the best. Second, we revisit another classical problem, which is to build a piecewise linear application (PLA) over streaming time series data. Prior work addressed two versions of the problem, where either the PLA consists of disjoint segments, or it is required to be a continuous piecewise linear function. However, we observe that neither minimizes the true representation size of the PLA, i.e., the number of parameters required to represent it. In this proposal, we design an algorithm that generates the optimal PLA in terms of representation size while meeting the prescribed max-error guarantee. Thirdly, we propose a new type of summaries, called persistent data sketching. All existing data summaries are updated upon the arrival of every element in the stream, thus destroying its previous version. A persistent sketch, on the other hand, preserves all its previous versions as it is updated over time. Every update creates a new version, while all the versions are kept compactly so that any previous version can still be queried. This gives the summary the ability to answer queries about the stream at any prior time. Date: Tuesday, 18 August 2015 Time: 10:00am - 12:00noon Venue: Room 2132B lift 19 Committee Members: Dr. Ke Yi (Supervisor) Dr. Raymond Wong (Chairperson) Prof. Siu-Wing Cheng Dr. Qiong Luo **** ALL are Welcome ****