EFFICIENT CORRELATED PATTERN DISCOVERY IN DATABASES

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "EFFICIENT CORRELATED PATTERN DISCOVERY IN DATABASES"

By

Miss Yiping Ke


Abstract

Correlation mining has achieved great success in many application domains
because of its ability to capture the underlying dependencies between
objects. However, existing research on correlation mining has mainly been
conducted on boolean databases, even though other, more complex data,
especially in scientific and business domains, have proliferated in recent
years. In this thesis, we study correlated pattern discovery in two types
of widely used databases: quantitative databases and graph databases. In
mining correlations from quantitative databases, we propose a novel notion
of Quantitative Correlated Patterns (QCPs), which is founded on two
correlation measures, normalized mutual information and all-confidence. We
also develop an algorithm to efficiently mine QCPs by utilizing a
supervised interval combining method and performing bi-level pruning. In
mining graph databases, we formalize a new problem of Correlated Graph
Search (CGS) using Pearson's correlation coefficient as a correlation
measure. We devise an efficient algorithm that solves the CGS problem by
mining the candidates from a much smaller projected database. We also make
use of the theoretical bounds on the support of a candidate graph to
directly answer high-support queries without mining the candidates. The
experimental results on both real and synthetic datasets demonstrate the
efficiency and effectiveness of our proposed solutions.
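
As a rough illustration of the three correlation measures named in the
abstract, the sketch below computes them on toy data using their textbook
definitions. It is not the thesis's mining algorithm: the function names are
hypothetical, and dividing mutual information by the larger of the two
entropies is an assumed normalization that may differ from the one used in
the thesis.

```python
import math
from collections import Counter


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def all_confidence(transactions, items):
    """Support of the itemset divided by its largest single-item support."""
    max_single = max(support(transactions, [i]) for i in items)
    return support(transactions, items) / max_single if max_single else 0.0


def pearson_phi(supp_a, supp_b, supp_ab):
    """Pearson's correlation coefficient of two binary occurrence variables."""
    denom = math.sqrt(supp_a * supp_b * (1 - supp_a) * (1 - supp_b))
    return (supp_ab - supp_a * supp_b) / denom if denom else 0.0


def normalized_mutual_information(pairs):
    """I(X;Y) / max(H(X), H(Y)) over observed (x, y) pairs.

    The choice of max(H(X), H(Y)) as the normalizer is an assumption.
    """
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)

    def entropy(counts):
        return -sum(v / n * math.log2(v / n) for v in counts.values())

    mi = sum(
        v / n * math.log2((v / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), v in pxy.items()
    )
    h = max(entropy(px), entropy(py))
    return mi / h if h else 0.0


# Toy data: four transactions over the items a, b, c.
transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
ac = all_confidence(transactions, ["a", "b"])
phi = pearson_phi(
    support(transactions, ["a"]),
    support(transactions, ["b"]),
    support(transactions, ["a", "b"]),
)
nmi_dependent = normalized_mutual_information([(0, 0), (0, 0), (1, 1), (1, 1)])
nmi_independent = normalized_mutual_information([(0, 0), (0, 1), (1, 0), (1, 1)])
```

In this toy example, a and b co-occur in two of four transactions, so both
all-confidence and Pearson's coefficient are positive, while the normalized
mutual information ranges from 0 for independent attributes to 1 for
attributes that determine each other.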


Date:			Friday, 18 January 2008

Time:			10:00 a.m. - 12:00 noon

Venue:			Room 4480
			Lifts 25-26

Chairman:		Prof. Kun Xu (MATH)

Committee Members:	Prof. Wilfred Ng (Supervisor)
			Prof. Dik Lun Lee
			Prof. Ke Yi
			Prof. Oscar Au (ECE)
			Prof. Xindong Wu (Comp. Sci., Univ. of Vermont)


**** ALL are Welcome ****