Data Dependencies in the Presence of Difference

PhD Thesis Proposal Defence


Title: "Data Dependencies in the Presence of Difference"

by

Mr. Shaoxu Song


Abstract:

The importance of difference semantics (e.g., “similar” or “dissimilar”) 
has recently been recognized for declaring dependencies among various types 
of data, such as numerical values or text values. We propose a novel form 
of dependency, differential dependencies (DDs), which specify constraints 
on difference rather than the identification function used in traditional 
dependency notations such as functional dependencies. Informally, a differential 
dependency states that if two tuples have distances on attributes X 
agreeing with a certain differential function, then their distances on 
attributes Y should also agree with the corresponding differential 
function on Y. For example, [date(≤ 7)] → [price(< 100)] states that 
the price difference between flights on any two days within a week of 
each other should be less than $100. Such differential dependencies are 
useful in various applications, e.g., violation detection, data 
partitioning, query optimization, and record linkage.
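
As a rough illustration of the violation-detection use case, the flight 
example can be checked with a few lines of Python. This is only a minimal 
sketch under stated assumptions, not the method developed in the proposal: 
the function name, the sample rows, and the use of absolute difference as 
the distance on both attributes are all hypothetical.

```python
from itertools import combinations


def violations(rows, lhs, rhs):
    """Return the pairs of rows that violate a differential dependency.

    lhs and rhs map an attribute name to a predicate on the distance
    between two values of that attribute. Distance is assumed here to be
    the absolute difference, which only suits numeric attributes.
    """
    bad = []
    for a, b in combinations(rows, 2):
        lhs_ok = all(pred(abs(a[attr] - b[attr])) for attr, pred in lhs.items())
        rhs_ok = all(pred(abs(a[attr] - b[attr])) for attr, pred in rhs.items())
        # A pair violates the DD when it agrees with the left-hand side
        # differential function but not with the right-hand side one.
        if lhs_ok and not rhs_ok:
            bad.append((a, b))
    return bad


# Hypothetical flight quotes: date is a day number, price is in dollars.
flights = [
    {"date": 1, "price": 300},
    {"date": 5, "price": 360},
    {"date": 6, "price": 420},
    {"date": 20, "price": 900},
]

# [date(<= 7)] -> [price(< 100)]
dd_lhs = {"date": lambda d: d <= 7}
dd_rhs = {"price": lambda d: d < 100}

result = violations(flights, dd_lhs, dd_rhs)
```

With these sample rows, the only violating pair is the quotes for days 1 
and 6: they are 5 days apart but differ in price by 120. The pairwise scan 
is quadratic in the number of tuples and is meant only to make the 
definition concrete.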

In this proposal, we first report our preliminary work on several 
theoretical issues of differential dependencies, including formal 
definitions of DDs and differential keys, subsumption order relation of 
differential functions, implication of DDs, closure of a differential 
function, a sound and complete inference system, and minimal cover for 
DDs. Then, we investigate a practical problem, i.e., how to discover DDs 
and differential keys from given sample data. Because the discovery 
problem is intrinsically hard, we develop several pruning methods to improve 
discovery efficiency in practice. Next, through an extensive experimental 
evaluation on real data sets, we demonstrate the discovery performance and 
the effectiveness of DDs in several real applications. Finally, we discuss 
our research plan and several directions for future work.


Date:  			Monday, 31 May 2010

Time:           	1:30pm - 3:30pm

Venue:          	Room 4480
 			lifts 25/26

Committee Members:      Dr. Lei Chen (Supervisor)
 			Dr. Raymond Wong (Chairperson)
 			Prof. Frederick Lochovsky
 			Dr. Ke Yi


**** ALL are Welcome ****