Output-optimal Massively Parallel Streaming Joins
MPhil Thesis Defence

Title: "Output-optimal Massively Parallel Streaming Joins"

By Mr. Serafeim PAPADIAS

Abstract

The advent of big data has given rise to huge, rapid and volatile data streams, pushing the research community to design both real-time Distributed Stream Processing Systems (DSPSs) and streaming algorithms that run on top of those systems. DSPSs must exhibit a variety of features, such as high-throughput and low-latency processing of data streams.

In the first part of this thesis, we present the state-of-the-art DSPSs and describe the features that make each of them unique.

In the second part, we focus on the problem of join processing in the streaming context. Specifically, we present the first output-optimal algorithm for stream join processing, called Streaming Randomized HyperCube (SRHC). The algorithm operates optimally in the presence of high skew, taking into account both the input and the output sizes of the join, a feature that makes it well suited to many-to-many joins. Finally, we implement SRHC on top of Flink and evaluate its efficiency against state-of-the-art join algorithms through experiments on both synthetic and real datasets.

Date: Wednesday, 5 September 2018
Time: 3:00pm - 5:00pm
Venue: Room 5566, Lifts 27-28

Committee Members:
Dr. Ke Yi (Supervisor)
Dr. Raymond Wong (Chairperson)
Dr. Qiong Luo

**** ALL are Welcome ****