More about HKUST
Join Algorithms for Non-Conventional Databases
PhD Thesis Proposal Defence
Title: "Join Algorithms for Non-Conventional Databases"
by
Mr. Yin YANG
Abstract:
Efficient join processing is a pivotal problem in many novel database
applications. This proposal presents our initial investigations on
continuous join queries in a data stream environment, which exhibit the
following two important properties. First, data characteristics, e.g.,
arrival rates, value distributions, etc., may change in an unpredictable
manner. Consequently, the initial query plan may gradually become
inefficient. This necessitates dynamic plan migration, an online
transition from the old plan to a new one generated based on current
statistics. In addition to correctness, an effective technique for this
purpose should achieve the following objectives: (i) minimize the memory
and CPU overhead of the migration (ii) reduce the duration of the
transition and (iii) maintain a steady output rate. The only known
solutions for this problem are the moving states (MS) and parallel track
(PT) strategies, which have some serious shortcomings related to the above
objectives. Motivated by these shortcomings, we first propose HybMig,
which combines the merits of MS and PT, and outperforms both on every
aspect.
The second key property of streaming a join query is that its data records
are not available beforehand, but enter the system gradually. Hence, the
operators in an execution plan need to be co-ordinated online, based on
tuple arrivals and expirations. Specifically, these operators are
connected via the “consumer-producer” relationship, i.e., the outputs of
one operator (the “producer”) serve as the inputs to another (the
“consumer”). Conventional techniques execute each operator separately and
push all results to its consumers, without considering whether the
consumers need them. As a result, considerable CPU and memory resources
are wasted on producing and storing useless intermediate results.
Motivated by this, we propose just-in-time (JIT) processing, a novel
methodology that enables a consumer to return feedback expressing its
current demand to the producer. The latter selectively generates outputs
based on this information. Besides achieving significant savings in terms
of CPU time and memory consumption, JIT is compatible with to various
semantics including sliding windows and explicit deletions, and applies to
a broad range of execution plans, e.g., those involving shared operators,
and/or using sophisticated architectures.
Date: Wednesday, 17 December 2008
Time: 10:00a.m.-12:00noon
Venue: Room 3588
lifts 27-28
Committee Members: Prof. Dimitris Papadias (Supervisor)
Dr. Wilfred Ng (Chairperson)
Dr. Lei Chen
Dr. Ke Yi
**** ALL are Welcome ****