More about HKUST
Join Algorithms for Non-Conventional Databases
PhD Thesis Proposal Defence Title: "Join Algorithms for Non-Conventional Databases" by Mr. Yin YANG Abstract: Efficient join processing is a pivotal problem in many novel database applications. This proposal presents our initial investigations on continuous join queries in a data stream environment, which exhibit the following two important properties. First, data characteristics, e.g., arrival rates, value distributions, etc., may change in an unpredictable manner. Consequently, the initial query plan may gradually become inefficient. This necessitates dynamic plan migration, an online transition from the old plan to a new one generated based on current statistics. In addition to correctness, an effective technique for this purpose should achieve the following objectives: (i) minimize the memory and CPU overhead of the migration (ii) reduce the duration of the transition and (iii) maintain a steady output rate. The only known solutions for this problem are the moving states (MS) and parallel track (PT) strategies, which have some serious shortcomings related to the above objectives. Motivated by these shortcomings, we first propose HybMig, which combines the merits of MS and PT, and outperforms both on every aspect. The second key property of streaming a join query is that its data records are not available beforehand, but enter the system gradually. Hence, the operators in an execution plan need to be co-ordinated online, based on tuple arrivals and expirations. Specifically, these operators are connected via the “consumer-producer” relationship, i.e., the outputs of one operator (the “producer”) serve as the inputs to another (the “consumer”). Conventional techniques execute each operator separately and push all results to its consumers, without considering whether the consumers need them. As a result, considerable CPU and memory resources are wasted on producing and storing useless intermediate results. Motivated by this, we propose just-in-time (JIT) processing, a novel methodology that enables a consumer to return feedback expressing its current demand to the producer. The latter selectively generates outputs based on this information. Besides achieving significant savings in terms of CPU time and memory consumption, JIT is compatible with to various semantics including sliding windows and explicit deletions, and applies to a broad range of execution plans, e.g., those involving shared operators, and/or using sophisticated architectures. Date: Wednesday, 17 December 2008 Time: 10:00a.m.-12:00noon Venue: Room 3588 lifts 27-28 Committee Members: Prof. Dimitris Papadias (Supervisor) Dr. Wilfred Ng (Chairperson) Dr. Lei Chen Dr. Ke Yi **** ALL are Welcome ****