Join Algorithms for Non-Conventional Databases

PhD Thesis Proposal Defence


Title: "Join Algorithms for Non-Conventional Databases"

by

Mr. Yin YANG


Abstract:

Efficient join processing is a pivotal problem in many novel database 
applications. This proposal presents our initial investigations on 
continuous join queries in a data stream environment, which exhibit the 
following two important properties. First, data characteristics, e.g., 
arrival rates, value distributions, etc., may change in an unpredictable 
manner. Consequently, the initial query plan may gradually become 
inefficient. This necessitates dynamic plan migration, an online 
transition from the old plan to a new one generated based on current 
statistics. In addition to correctness, an effective technique for this 
purpose should achieve the following objectives: (i) minimize the memory 
and CPU overhead of the migration (ii) reduce the duration of the 
transition and (iii) maintain a steady output rate. The only known 
solutions for this problem are the moving states (MS) and parallel track 
(PT) strategies, which have some serious shortcomings related to the above 
objectives. Motivated by these shortcomings, we first propose HybMig, 
which combines the merits of MS and PT, and outperforms both on every 
aspect.

The second key property of streaming a join query is that its data records 
are not available beforehand, but enter the system gradually. Hence, the 
operators in an execution plan need to be co-ordinated online, based on 
tuple arrivals and expirations. Specifically, these operators are 
connected via the “consumer-producer” relationship, i.e., the outputs of 
one operator (the “producer”) serve as the inputs to another (the 
“consumer”). Conventional techniques execute each operator separately and 
push all results to its consumers, without considering whether the 
consumers need them. As a result, considerable CPU and memory resources 
are wasted on producing and storing useless intermediate results. 
Motivated by this, we propose just-in-time (JIT) processing, a novel 
methodology that enables a consumer to return feedback expressing its 
current demand to the producer. The latter selectively generates outputs 
based on this information. Besides achieving significant savings in terms 
of CPU time and memory consumption, JIT is compatible with to various 
semantics including sliding windows and explicit deletions, and applies to 
a broad range of execution plans, e.g., those involving shared operators, 
and/or using sophisticated architectures.


Date:     		Wednesday, 17 December 2008

Time:                   10:00a.m.-12:00noon

Venue:                  Room 3588
 			lifts 27-28

Committee Members:      Prof. Dimitris Papadias (Supervisor)
 			Dr. Wilfred Ng (Chairperson)
                         Dr. Lei Chen
 			Dr. Ke Yi


**** ALL are Welcome ****